Itai Sharon, Michael Kertesz, Laura A Hug, Dmitry Pushkarev, Timothy A Blauwkamp, Cindy J Castelle, Mojgan Amirebrahimi, Brian C Thomas, David Burstein, Susannah G Tringe, Kenneth H Williams, Jillian F Banfield.
Abstract
Accurate evaluation of microbial communities is essential for understanding global biogeochemical processes and can guide bioremediation and medical treatments. Metagenomics is most commonly used to analyze microbial diversity and metabolic potential, but assemblies of the short reads generated by current sequencing platforms may fail to recover heterogeneous strain populations and rare organisms. Here we used short (150-bp) and long (multi-kb) synthetic reads to evaluate strain heterogeneity and study microorganisms at low abundance in complex microbial communities from terrestrial sediments. The long-read data revealed multiple (probably dozens of) closely related species and strains from previously undescribed Deltaproteobacteria and Aminicenantes (candidate phylum OP8). Notably, these are the most abundant organisms in the communities, yet short-read assemblies achieved only partial genome coverage, mostly in the form of short scaffolds (N50 = ~2200 bp). Genome architecture and metabolic potential for these lineages were reconstructed using a new synteny-based method. Analysis of long-read data also revealed thousands of species whose abundances were <0.1% in all samples. Most of the organisms in this "long tail" of rare organisms belong to phyla that are also represented by abundant organisms. Genes encoding glycosyl hydrolases are significantly more abundant than expected in rare genomes, suggesting that rare species may augment the capability for carbon turnover and confer resilience to changing environmental conditions. Overall, the study showed that a diversity of closely related strains and rare organisms account for a major portion of the communities. These are probably common features of many microbial communities and can be effectively studied using a combination of long and short reads.
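For readers unfamiliar with the N50 statistic quoted in the abstract, the following is a minimal sketch of the standard definition: the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the total assembled length. The function name `n50` and the example lengths are illustrative assumptions, not the authors' code or data.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp).

    Standard definition (assumed here, not taken from the paper's methods):
    sort lengths from longest to shortest and return the length at which
    the cumulative sum first reaches at least half of the total length.
    """
    if not lengths:
        raise ValueError("n50() requires at least one scaffold length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths, chosen only to illustrate the calculation.
example_lengths = [1000, 2200, 2200, 500, 300]
example_n50 = n50(example_lengths)
```

An N50 near 2200 bp means that half of the assembled bases sit in scaffolds of roughly that length or longer, which is short compared with the multi-kb synthetic long reads.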
Year: 2015 PMID: 25665577 PMCID: PMC4381525 DOI: 10.1101/gr.183012.114
Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.043
Figure 1. (A) Overlap between short-read assembled scaffolds (orange) and synthetic long reads (blue). Numbers are in Mbp and were calculated based on all overlapping regions longer than 1000 bp aligning at 98% identity or more. (B) Coverage distribution of synthetic long reads and short-read assembled scaffolds. Coverage was computed by mapping the short reads from the same data set.
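The overlap criterion in the Figure 1 caption (regions longer than 1000 bp aligning at 98% identity or more, reported in Mbp) can be sketched as a simple filter and sum. The `Alignment` record, the function name `overlap_mbp`, and the sample values are hypothetical. The sketch also assumes the alignments do not overlap one another, which the caption does not state.

```python
from typing import Iterable, NamedTuple


class Alignment(NamedTuple):
    """Hypothetical record for one aligned region between a scaffold and a long read."""
    length_bp: int    # length of the aligned region in base pairs
    identity: float   # fraction of identical positions, 0.0 to 1.0


def overlap_mbp(alignments: Iterable[Alignment],
                min_length_bp: int = 1000,
                min_identity: float = 0.98) -> float:
    """Sum aligned lengths that pass the caption's thresholds, in Mbp.

    Regions must be strictly longer than min_length_bp and have identity
    of at least min_identity. Overlapping alignments are NOT merged here,
    so this assumes each region is counted once in the input.
    """
    total_bp = sum(
        a.length_bp
        for a in alignments
        if a.length_bp > min_length_bp and a.identity >= min_identity
    )
    return total_bp / 1_000_000


# Illustrative values only; not data from the study.
sample = [
    Alignment(1500, 0.99),  # passes both thresholds
    Alignment(1000, 0.99),  # rejected: not longer than 1000 bp
    Alignment(5000, 0.97),  # rejected: identity below 98%
    Alignment(2500, 0.98),  # passes both thresholds
]
sample_total = overlap_mbp(sample)
```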
Figure 2. Rank abundance curve for the 5-m community including all species for which the rpS3 gene could be recovered. (Bottom) While most of the rpS3 genes were recovered from the short-read assembly (orange), least and most abundant species were represented almost exclusively by the long-read data (blue). (Top) Stacked bar graph shows abundance of phyla and Proteobacteria classes; stacked boxes indicate abundance of individual species (number of species indicated). Deltaproteobacteria is the most abundant lineage in the sample, with five of the seven most abundant species being closely related. (Pie chart) Species with zero short-read coverage in the short-read data, detected in the synthetic long reads only.
Figure 3. Alignment of three Deltaproteobacteria reconstructed syntenic regions from the 5-m sample to a closely related genome reconstructed from the planktonic filtrate sample GWC2. Lines connect homologous genes.
Figure 4. Summary of metabolic potential for the Deltaproteobacteria (5-m) and Aminicenantes (4-m) strains.