Esther Singer1, Brian Bushnell1, Devin Coleman-Derr1,2, Brett Bowman3, Robert M Bowers1, Asaf Levy1, Esther A Gies4, Jan-Fang Cheng1, Alex Copeland1, Hans-Peter Klenk5, Steven J Hallam4, Philip Hugenholtz6, Susannah G Tringe1, Tanja Woyke1.
Abstract
Over the past decade, high-throughput short-read 16S rRNA gene amplicon sequencing has eclipsed clone-dependent long-read Sanger sequencing for microbial community profiling. The transition to new technologies has provided more quantitative information at the expense of taxonomic resolution with implications for inferring metabolic traits in various ecosystems. We applied single-molecule real-time sequencing for microbial community profiling, generating full-length 16S rRNA gene sequences at high throughput, which we propose to name PhyloTags. We benchmarked and validated this approach using a defined microbial community. When further applied to samples from the water column of meromictic Sakinaw Lake, we show that while community structures at the phylum level are comparable between PhyloTags and Illumina V4 16S rRNA gene sequences (iTags), variance increases with community complexity at greater water depths. PhyloTags moreover allowed less ambiguous classification. Last, a platform-independent comparison of PhyloTags and in silico generated partial 16S rRNA gene sequences demonstrated significant differences in community structure and phylogenetic resolution across multiple taxonomic levels, including a severe underestimation in the abundance of specific microbial genera involved in nitrogen and methane cycling across the lake's water column. Thus, PhyloTags provide a reliable adjunct or alternative to cost-effective iTags, enabling more accurate phylogenetic resolution of microbial communities and predictions on their metabolic potential.
Year: 2016 PMID: 26859772 PMCID: PMC5029162 DOI: 10.1038/ismej.2015.249
Source DB: PubMed Journal: ISME J ISSN: 1751-7362 Impact factor: 10.302
Figure 1. Workflow of the PhyloTag sequence generation and cluster analysis pipeline with results from one of the five mock community replicate data sets (more details in Supplementary Figure 1). A simulated FL 16S rRNA gene read data set generated from the 23 genomes of the selected bacterial species was used to optimize the clustering steps in the pipeline (see Materials and methods section). Detailed processing steps for iTag and shotgun sequences are illustrated in Supplementary Figure 1.
Platform-dependent properties in Sanger-, Illumina- and PacBio-based community profiling
| Property | Sanger | Illumina | PacBio |
|---|---|---|---|
| Cloning required | Yes | No | No |
| Average sequencing time | ~3 h/96-well plate | 8 h | 2 h/SMRT cell |
| Commonly used primers/amplicons | 4aF/27F, 1392R → near full-length | Various → up to 500 bp | 4aF/27F, 1492R → full-length |
| Amplification during sequencing | No | Yes | No |
| Average data output | ~0.1 Mb per 96-well plate | 8 Gb per flow cell | 0.3 Gb per SMRT cell |
| Approximate cost per Mb | ~US$2000.00 | US$0.11 | US$2.50 |
Abbreviations: F, forward; PacBio, Pacific Biosciences; R, reverse; SMRT, single-molecule real-time.
Clear platform advantages.
Cost is burdened.
Figure 2. Analysis of the mock community profiles. (a) Abundance profiles of the mock community as represented by PhyloTags (pooled from all five replicates), PacBio shotgun sequences and V4 iTags. Nocardiopsis dassonvillei, which was added at very low relative abundance, was exclusively detected in the PacBio shotgun data set. Additional contaminant OTUs were found only in the V4 iTags (Supplementary Table 3). (b) Spearman's rank correlation coefficients and corresponding P-values were calculated to evaluate the strength of relationships between various sequence data sets. (c) Principal coordinate analysis (PCoA) of microbial community structures at various Sakinaw Lake depths according to PhyloTags and iTags. Mean PCoA distances between iTag and PhyloTag pairs of the same depth are stated within parentheses in the legend. The inset shows depths 50–120 m reanalyzed for higher resolution.
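As a rough illustration of the statistic named in panel (b), the following is a minimal sketch of Spearman's rank correlation computed as the Pearson correlation of tie-averaged ranks. The function names and the abundance values are hypothetical and are not taken from the study; the software actually used is not stated here, and P-values are not computed in this sketch.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical relative abundances of five taxa in two data sets.
phylotag_abundance = [0.30, 0.25, 0.20, 0.15, 0.10]
itag_abundance = [0.25, 0.30, 0.20, 0.15, 0.10]
rho = spearman_rho(phylotag_abundance, itag_abundance)
```

Because the coefficient depends only on rank order, two data sets that agree on which taxa are most abundant can correlate strongly even when the absolute abundances differ.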
Figure 3. Conceptual representation of the 16S rRNA gene sequence with conserved (green) and hypervariable (blue) regions. Pink strips represent the abundance of mutations in respective variable regions. Target selection in amplicon sequencing determines community fingerprints. (a) The 16S rRNA gene variability is not homogeneous across taxonomic groups and subregions of the FL sequence. For example, Salmonella spp. are 97.4% identical across the FL 16S rRNA gene sequence, but are 100% identical across the V4 region. (b) In other instances, exclusively considering the hypervariable V4 region may lead to an overestimation of community diversity because mutations may accumulate here more than across the entire 16S rRNA gene.
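The point made in panel (a), that two sequences can differ over their full length yet be identical within one subregion, can be shown with a toy calculation. This is only a sketch: it assumes the two sequences are already aligned and of equal length, and the 20-nucleotide sequences and the region coordinates are invented for illustration, not real 16S rRNA gene data.

```python
def percent_identity(seq_a, seq_b, start=0, end=None):
    """Percent of identical positions between two pre-aligned,
    equal-length sequences within the half-open window [start, end)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    end = len(seq_a) if end is None else end
    pairs = list(zip(seq_a[start:end], seq_b[start:end]))
    if not pairs:
        raise ValueError("empty comparison window")
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Toy sequences that differ at positions 2 and 17 only (0-based).
toy_a = "ACGTACGTACGTACGTACGT"
toy_b = "ACTTACGTACGTACGTAGGT"

full_length_identity = percent_identity(toy_a, toy_b)        # whole sequence
subregion_identity = percent_identity(toy_a, toy_b, 5, 15)   # hypothetical subregion
```

Here the full-length comparison sees both mismatches, while the chosen window sees none, so clustering on the window alone would merge sequences that the full-length comparison separates.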
(a) Significance test of Sakinaw Lake community structure differences between PhyloTags and in silico generated V4 sequences at various taxonomic levels and (b) percentage of PhyloTags and in silico generated V4 sequences classified at various taxonomic levels
| | | P |
|---|---|---|
| 90 | 0.30 | <0.001 |
| 93 | 0.21 | <0.001 |
| 95 | 0.30 | <0.001 |
| 97 | 0.16 | <0.001 |
| 98 | 0.20 | <0.001 |
Abbreviation: FL, full length.
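One possible way to generate partial sequences in silico from full-length reads is to locate primer sites and keep what lies between them. The sketch below assumes the widely used 515F/806R V4 primer pair; the record does not state which primers or trimming method the authors used, so the primer choice, the function names and the synthetic read are all assumptions made for illustration. Only exact matches to the degenerate primers are found; mismatches are not tolerated.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

# Assumed primers (515F/806R); not confirmed by the source.
PRIMER_515F = "GTGCCAGCMGCCGCGGTAA"
PRIMER_806R = "GGACTACHVGGGTWTCTAAT"


def reverse_complement(seq):
    """Reverse complement that also handles IUPAC ambiguity codes."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Translate a degenerate primer into a regular expression."""
    return "".join(IUPAC[base] for base in primer)


def extract_region(full_length, forward_primer, reverse_primer):
    """Return the sequence between the two primer sites (primers excluded),
    or None if either primer site is not found."""
    fwd = re.search(primer_regex(forward_primer), full_length)
    if fwd is None:
        return None
    downstream = full_length[fwd.end():]
    rev_site = primer_regex(reverse_complement(reverse_primer))
    rev = re.search(rev_site, downstream)
    if rev is None:
        return None
    return downstream[:rev.start()]


# Synthetic read: flank + forward site + insert + reverse site + flank.
synthetic_read = (
    "TTTT" + "GTGCCAGCAGCCGCGGTAA" + "ACGTACGT"
    + "ATTAGATACCCTGGTAGTCC" + "GGGG"
)
v4_like = extract_region(synthetic_read, PRIMER_515F, PRIMER_806R)
```

Trimming the same full-length reads in this way keeps the sequencing platform constant, so any difference between the full-length and partial data sets reflects the target region alone.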
Figure 4. Community composition analysis of Sakinaw Lake depth profile at phylum and genus level represented by PhyloTag and in silico generated V4 sequences. (a) Percentage of FL PhyloTag sequences by phylum with ambiguous classifications according to their in silico generated V4 region and vice versa. Phyla with ambiguous sequences (V4: ⩾5.0% of their total; FL: ⩾1.0% of their total) are reported in this figure. Relative sequence abundance of phyla in the total community based on the number of sequences is stated above bars. (b) Community composition analysis of Sakinaw Lake depth profile at genus level and arranged by phylum (with >1% relative abundance). Color pairs denote samples of the same depth represented by FL and V4 sequences. Bubble sizes indicate read abundance of individual genera. Several OTUs showing the largest discrepancy between V4 and FL abundances are highlighted by boxes (solid gray: FL>V4; dotted black: V4>FL). Numbered boxes around bordered bubbles represent genera Methylocaldum (1), uncultivated genus within the Nitrospiraceae (2), Bacillus (3) and Methylotenera (4). Biological importance of these selected genera is discussed in the text. Examples of other genera with >1000 more FL than V4 sequences and >200 more V4 than FL sequences are depicted by bordered bubbles and boxes. Ecological significance of these genera in Sakinaw Lake was difficult to predict, for example, owing to the lack of reference genomes.