Julien Tremblay, Kanwar Singh, Alison Fern, Edward S Kirton, Shaomei He, Tanja Woyke, Janey Lee, Feng Chen, Jeffery L Dangl, Susannah G Tringe.
Abstract
Sequencing of 16S rRNA gene tags is a popular method for profiling and comparing microbial communities. The protocols and methods used, however, vary considerably with regard to amplification primers, sequencing primers, and sequencing technologies, as well as quality filtering and clustering. How results are affected by these choices, and whether data produced with different protocols can be meaningfully compared, is often unknown. Here we compare results obtained using three different amplification primer sets (targeting V4, V6-V8, and V7-V8) and two sequencing technologies (454 pyrosequencing and Illumina MiSeq) using DNA from a mock community containing a known number of species as well as complex environmental samples whose PCR-independent profiles were estimated using shotgun sequencing. We find that paired-end MiSeq reads produce higher quality data and enable the use of more aggressive quality control parameters than 454 reads, resulting in a higher retention rate of high quality reads for downstream data analysis. While primer choice considerably influences quantitative abundance estimations, sequencing platform has relatively minor effects when matched primers are used. Beta diversity metrics are surprisingly robust to both primer and sequencing platform biases.
Keywords: 16S rRNA gene sequencing; amplification; community assembly; high throughput sequencing; microbial diversity; microbial population and community ecology; sequencing error
Year: 2015 PMID: 26300854 PMCID: PMC4523815 DOI: 10.3389/fmicb.2015.00771
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
Figure 1. Timeline indicating major breakthroughs in experimental and theoretical work in the field of 16S rRNA gene sequencing.
Figure 2. Error estimation for various sequencing configurations. (A) Insertion, deletion and substitution error frequency per 1000 P. suwonensis reads before and after lenient and stringent QC. Error frequency was calculated from triplicates for each sequencing condition. Error bars represent standard deviation. (B) Position of insertion, deletion and substitution error frequency in 16S tag amplicon sequences in which no QC filter has been applied.
Reads count summary of full datasets.
| Data type | Sample | Reads 1 + reads 2 | Assembled reads | QC-passed reads | QC passed (%) |
|---|---|---|---|---|---|
| | | 40,292+40,292 | 40,020 | 30,742 | 76.82% |
| | | 27,867+27,867 | 27,656 | 20,775 | 75.12% |
| | | 12,408+12,408 | 12,327 | 9,252 | 75.05% |
| Synthetic community MiSeq V4 | Synthetic community rep. #1 | 37,758+37,758 | 37,496 | 26,608 | 70.96% |
| | Synthetic community rep. #2 | 47,646+47,646 | 47,303 | 34,115 | 72.12% |
| | Synthetic community rep. #3 | 59,307+59,307 | 58,906 | 41,560 | 70.55% |
| | | 0+126,349 | – | 101,907 | 80.66% |
| | | 0+153,474 | – | 132,795 | 86.53% |
| | | 0+180,811 | – | 157,568 | 87.15% |
| Synthetic community MiSeq V6–V8 | Synthetic community rep. #1 | 0+135,496 | – | 113,949 | 84.10% |
| | Synthetic community rep. #2 | 0+158,396 | – | 134,772 | 85.09% |
| | Synthetic community rep. #3 | 0+203,480 | – | 164,724 | 80.95% |
| | | 275,924+275,924 | 275,156 | 180,808 | 65.71% |
| | | 82,862+82,862 | 82,403 | 74,339 | 90.21% |
| | | 391,600+391,600 | 390,326 | 355,689 | 91.13% |
| Synthetic community MiSeq V7–V8 | Synthetic community rep. #1 | 74,930+74,930 | 74,197 | 67,501 | 90.98% |
| | Synthetic community rep. #2 | 359,731+359,731 | 358,811 | 326,660 | 91.04% |
| | Synthetic community rep. #3 | 397,267+397,267 | 395,589 | 354,364 | 89.58% |
| | | 42,694+0 | – | 16,003 | 37.48% |
| | | 32,254+0 | – | 12,138 | 37.63% |
| | | 22,015+0 | – | 8,629 | 39.20% |
| Synthetic community 454 V6–V8 | Synthetic community rep. #1 | 42,370+0 | – | 14,495 | 34.21% |
| | Synthetic community rep. #2 | 48,509+0 | – | 25,175 | 51.90% |
| | Synthetic community rep. #3 | 44,347+0 | – | 17,427 | 39.30% |
| Wetlands MiSeq V4 | WL01 | 66,041+66,041 | 65,502 | 55,256 | 84.36% |
| | WL02 | 92,710+92,710 | 91,973 | 78,082 | 84.90% |
| | WL03 | 114,416+114,416 | 113,675 | 96,205 | 84.63% |
| | WL04 | 62,074+62,074 | 61,568 | 52,365 | 85.05% |
| | WL05 | 73,230+73,230 | 72,750 | 60,398 | 83.02% |
| | WL07 | 51,025+51,025 | 50,681 | 43,513 | 85.86% |
| | WL08 | 80,311+80,311 | 79,766 | 66,653 | 83.56% |
| | WL09 | 90,488+90,488 | 89,766 | 72,952 | 81.27% |
| | WL10 | 55,087+55,087 | 54,632 | 46,582 | 85.27% |
| | WL11 | 77,180+77,180 | 76,634 | 63,900 | 83.38% |
| Wetlands MiSeq V6–V8 | WL01 | 0+126,079 | – | 84,781 | 67.24% |
| | WL02 | 0+108,944 | – | 83,794 | 76.91% |
| | WL03 | 0+136,065 | – | 100,682 | 74.00% |
| | WL04 | 0+186,039 | – | 136,594 | 73.42% |
| | WL05 | 0+176,231 | – | 140,471 | 79.71% |
| | WL07 | 0+165,158 | – | 128,731 | 77.94% |
| | WL08 | 0+109,851 | – | 87,215 | 79.39% |
| | WL09 | 0+172,148 | – | 133,591 | 77.60% |
| | WL10 | 0+159,228 | – | 118,982 | 74.72% |
| | WL11 | 0+114,464 | – | 90,407 | 78.98% |
| Wetlands MiSeq V7–V8 | WL01 | 320,742+320,742 | 318,226 | 228,661 | 71.85% |
| | WL02 | 258,212+258,212 | 256,818 | 207,071 | 80.63% |
| | WL03 | 354,196+354,196 | 350,555 | 267,806 | 76.39% |
| | WL04 | 358,810+358,810 | 355,839 | 268,271 | 75.39% |
| | WL05 | 422,804+422,804 | 419,296 | 330,070 | 78.72% |
| | WL07 | 374,155+374,155 | 370,146 | 289,655 | 78.25% |
| | WL08 | 406,591+406,591 | 403,809 | 310,164 | 76.81% |
| | WL09 | 394,874+394,874 | 391,374 | 307,773 | 78.64% |
| | WL10 | 221,616+221,616 | 219,057 | 168,323 | 76.84% |
| | WL11 | 339,790+339,790 | 335,636 | 264,202 | 78.72% |
| Wetlands 454 V6–V8 | WL02 | 25,977+0 | – | 12,823 | 49.36% |
| | WL03 | 18,852+0 | – | 9,202 | 48.81% |
| | WL04 | 50,363+0 | – | 22,863 | 45.40% |
| | WL05 | 17,490+0 | – | 8,482 | 48.50% |
| | WL07 | 15,617+0 | – | 7,431 | 47.58% |
| | WL08 | 16,173+0 | – | 7,462 | 46.14% |
| | WL09 | 13,899+0 | – | 6,597 | 47.46% |
| | WL10 | 33,079+0 | – | 15,746 | 47.60% |
| | WL11 | 9,894+0 | – | 4,849 | 49.01% |
Reads were first filtered for Illumina adapter sequences and PhiX reads and separated by pairs. Disrupted pairs were discarded and remaining reads were binned by barcodes and processed through our stringent QC filter. QC passed reads were divided by these processed pre-QC reads to obtain percentage values.
Pre-filtered assembled reads have slightly lower counts than their non-assembled counterparts because a small proportion of reads did not assemble.
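The percentage column of the read count table can be reproduced from the counts. The following is a minimal Python sketch, assuming the denominator is the assembled read count when pairs were assembled and the single-read count otherwise. The function name `qc_pass_percent` is illustrative and is not taken from the authors' pipeline.

```python
def qc_pass_percent(qc_passed, assembled=None, reads_1=0, reads_2=0):
    """Return QC-passed reads as a percentage of pre-QC processed reads.

    Assumption: when an assembled count is available it is the denominator;
    otherwise the non-zero single-read count (reads 1 or reads 2) is used.
    """
    if assembled is not None:
        denominator = assembled
    else:
        denominator = reads_1 if reads_1 else reads_2
    if denominator <= 0:
        raise ValueError("pre-QC read count must be positive")
    return 100.0 * qc_passed / denominator


# Paired-end MiSeq V4 row: 40,292+40,292 reads, 40,020 assembled, 30,742 passed.
paired = qc_pass_percent(30_742, assembled=40_020)
# MiSeq V6-V8 row using reads 2 only: 0+126,349 reads, 101,907 passed.
single = qc_pass_percent(101_907, reads_2=126_349)
print(f"{paired:.2f}% {single:.2f}%")
```

Both values agree with the corresponding table rows (76.82% and 80.66%), which supports reading the percentage as QC-passed reads over assembled reads for paired data and over raw reads for single-read data.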
Figure 3. Observed OTUs rarefaction estimation curves for . A dotted black line shows the theoretical number of expected OTUs for P. suwonensis (PS) and a red line for the synthetic community (SC). All OTU tables used to generate rarefaction curves were rarefied to 2,893 reads per sample.
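Rarefying an OTU table to a fixed depth means drawing the same number of reads from every sample at random, without replacement, before counting observed OTUs. The sketch below illustrates the idea for one sample in plain Python. It is not the authors' implementation, and the OTU identifiers and counts are made up for illustration; only the depth of 2893 reads comes from the caption.

```python
import random
from collections import Counter


def rarefy(otu_counts, depth, seed=None):
    """Subsample `depth` reads without replacement from one sample's OTU counts."""
    total = sum(otu_counts.values())
    if depth > total:
        raise ValueError(f"sample has {total} reads, fewer than depth {depth}")
    rng = random.Random(seed)
    # Expand counts into one label per read, then draw without replacement.
    reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
    return Counter(rng.sample(reads, depth))


# Hypothetical sample: OTU names and counts are invented for this example.
sample = {"OTU_1": 5000, "OTU_2": 1200, "OTU_3": 40, "OTU_4": 3}
rarefied = rarefy(sample, depth=2893, seed=42)
observed_otus = len(rarefied)
print(sum(rarefied.values()), observed_otus)
```

Samples with fewer reads than the chosen depth cannot be rarefied and raise an error here; in practice such samples are usually dropped from the comparison.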
Figure 4. Taxonomy heatmaps of all 16S data from (A) the synthetic community DNA pool and (B) samples from a wetlands sampling site. Color scale is defined as log2 of percentage values of each taxon.
Synthetic community microorganism list and expected relative abundance.
| Genome size (bp) | DNA amount | 16S rRNA gene copies | DNA (% of pool) | Expected genome abundance (%) | Expected 16S rRNA gene abundance (%) | Taxonomy |
|---|---|---|---|---|---|---|
| 4,847,594 | 3.50 | 8 | 35.00 | 30.37 | 71.35 | Bacteria; Firmicutes; Clostridia; Clostridiales; Clostridium |
| 3,697,626 | 3.00 | 1 | 30.00 | 34.12 | 10.02 | Archaea; Euryarchaeota; Halobacteria; Halobacteriales; Natrinema |
| 4,368,708 | 1.50 | 1 | 15.00 | 14.44 | 4.24 | Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; Pantoea |
| 4,225,490 | 1.00 | 2 | 10.00 | 9.95 | 5.85 | Bacteria; Proteobacteria; Gammaproteobacteria; Xanthomonadales; Xanthomonadaceae; Rhodanobacter |
| 3,788,356 | 0.68 | 3 | 6.80 | 7.55 | 6.65 | Archaea; Euryarchaeota; Halobacteria; Halobacteriales; Halobacteriaceae; Natronobacterium |
| 3,419,049 | 0.20 | 2 | 2.00 | 2.46 | 1.45 | Bacteria; Proteobacteria; Gammaproteobacteria; Xanthomonadales; Xanthomonadaceae; Pseudoxanthomonas |
| 6,464,916 | 0.06 | 2 | 0.60 | 0.39 | 0.23 | Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Mycobacteriaceae; Mycobacterium |
| 2,846,968 | 0.04 | 1 | 0.40 | 0.59 | 0.17 | Archaea; Euryarchaeota; Halobacteria; Halobacteriales; Halobacteriaceae; Halobacterium |
| 6,805,951 | 0.02 | 1 | 0.20 | 0.12 | 0.04 | Bacteria; Firmicutes; Bacilli; Bacillales; Paenibacillaceae; Paenibacillus; Paenibacillus |
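The two expected-abundance columns of this table can be recomputed from the first three numeric columns. The sketch below assumes, as the arithmetic suggests, that the first column is genome size in bp, the second is the amount of DNA pooled for each organism (units are not given in the source), and the third is the number of 16S rRNA gene copies per genome. Genus labels are taken from the last rank of each taxonomy string; the function names are illustrative.

```python
# (genus, genome size in bp, DNA amount, 16S rRNA gene copies per genome)
# Values are copied from the table; column meanings are an assumption.
MEMBERS = [
    ("Clostridium", 4_847_594, 3.50, 8),
    ("Natrinema", 3_697_626, 3.00, 1),
    ("Pantoea", 4_368_708, 1.50, 1),
    ("Rhodanobacter", 4_225_490, 1.00, 2),
    ("Natronobacterium", 3_788_356, 0.68, 3),
    ("Pseudoxanthomonas", 3_419_049, 0.20, 2),
    ("Mycobacterium", 6_464_916, 0.06, 2),
    ("Halobacterium", 2_846_968, 0.04, 1),
    ("Paenibacillus", 6_805_951, 0.02, 1),
]


def to_percent(weights):
    """Scale a list of non-negative weights so that they sum to 100."""
    total = sum(weights)
    return [100.0 * w / total for w in weights]


def expected_abundances(members):
    # Genome copies scale with DNA amount divided by genome size.
    genome_weights = [dna / size for _, size, dna, _ in members]
    # 16S gene copies scale with genome copies times copies per genome.
    rrna_weights = [g * m[3] for g, m in zip(genome_weights, members)]
    return to_percent(genome_weights), to_percent(rrna_weights)


genome_pct, rrna_pct = expected_abundances(MEMBERS)
for (name, *_), g, r in zip(MEMBERS, genome_pct, rrna_pct):
    print(f"{name:18s} {g:6.2f} {r:6.2f}")
```

Under these assumptions the computed percentages match the fifth and sixth numeric columns to two decimals. The first row shows why copy number matters: an organism contributing about 30% of genomes is expected to contribute about 71% of 16S rRNA gene copies because it carries 8 copies per genome.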
Figure 5. Procrustes rotation comparison of weighted UniFrac, unweighted UniFrac and Bray-Curtis coordinates metrics for various wetland 16S tag data types.
M.
| MiSeq V7–V8 assembled | 0.043 (0.0000) | | |
| 454 V6–V8 | 0.118 (0.0000) | 0.123 (0.0000) | |
| MiSeq V6–V8 reads 2 | 0.209 (0.0004) | 0.217 (0.0004) | 0.200 (0.0004) |
| MiSeq V7–V8 assembled | 0.004 (0.0000) | | |
| 454 V6–V8 | 0.008 (0.0000) | 0.011 (0.0000) | |
| MiSeq V6–V8 reads 2 | 0.015 (0.0001) | 0.010 (0.0000) | 0.022 (0.0012) |
| MiSeq V7–V8 assembled | 0.004 (0.0000) | | |
| 454 V6–V8 | 0.027 (0.0000) | 0.025 (0.0000) | |
| MiSeq V6–V8 reads 2 | 0.031 (0.0000) | 0.036 (0.0000) | 0.038 (0.0000) |
Monte Carlo P-values are in parentheses.