Abstract
Next-generation sequencing of 16S ribosomal RNA is widely used to survey microbial communities. Sequences are typically assigned to Operational Taxonomic Units (OTUs). Closed- and open-reference OTU assignment matches reads to a reference database at 97% identity (closed), then clusters unmatched reads using a de novo method (open). Implementations of these methods in the QIIME package were tested on several mock community datasets with 20 strains using different sequencing technologies and primers. Richness (number of reported OTUs) was often greatly exaggerated, with hundreds or thousands of OTUs generated on Illumina datasets. Between-sample diversity was also found to be highly exaggerated in many cases, with weighted Jaccard distances between identical mock samples often close to one, indicating very low similarity. Non-overlapping hyper-variable regions in 70% of species were assigned to different OTUs. On mock communities with Illumina V4 reads, 56% to 88% of predicted genus names were false positives. Biological inferences obtained using these methods are therefore not reliable.
Keywords: Alpha diversity; Beta diversity; Closed-reference; OTU; Open-reference; QIIME
Year: 2017 PMID: 29018622 PMCID: PMC5631090 DOI: 10.7717/peerj.3889
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
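The two-stage assignment described in the abstract can be illustrated with a toy sketch. This is not QIIME's implementation: the `identity` function below is a simplified position-by-position comparison of equal-length sequences (real tools use alignment-based identity), and the function names and the `denovo` OTU labels are hypothetical.

```python
def identity(a, b):
    """Toy identity: fraction of matching positions for equal-length sequences."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def open_reference_otus(reads, reference, threshold=0.97):
    """Assign each read to an OTU label.

    Closed-reference step: use the best reference match at >= threshold.
    Open-reference step: reads that fail are clustered greedily de novo,
    with the first unmatched read of each cluster acting as its centroid.
    """
    labels = []
    centroids = {}  # de novo OTU label -> centroid sequence
    for read in reads:
        best_id, best_score = None, 0.0
        for ref_id, ref_seq in reference.items():
            score = identity(read, ref_seq)
            if score > best_score:
                best_id, best_score = ref_id, score
        if best_score >= threshold:
            labels.append(best_id)  # closed-reference hit
            continue
        for otu_id, centroid in centroids.items():
            if identity(read, centroid) >= threshold:
                labels.append(otu_id)  # joins an existing de novo OTU
                break
        else:
            otu_id = f"denovo{len(centroids)}"
            centroids[otu_id] = read  # read founds a new de novo OTU
            labels.append(otu_id)
    return labels


reference = {"ref1": "A" * 100}
reads = ["A" * 100, "A" * 98 + "CC", "C" * 100, "C" * 99 + "G", "G" * 100]
labels = open_reference_otus(reads, reference)
```

In this sketch a read with two mismatches in 100 positions (98% identity) still joins the reference OTU, while reads below 97% identity found or join de novo OTUs. Stopping after the closed-reference step and discarding unmatched reads would correspond to pure closed-reference assignment.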
Mock datasets used in this study.
SRA is the NCBI Short Read Archive accession.
| Set | Primers | Sample | Strains | Species | Genera | Families | SRA | Platform | # reads |
|---|---|---|---|---|---|---|---|---|---|
| Extreme | V4F, V4R | Mock1 | 27 | 26 | 11 | 7 | | Illumina | 1,256,239 |
| Bok | V4F, V4R | Mock2 | 22 | 22 | 19 | 19 | – | Illumina | 7,056,809 |
| KozV34 | V3F, V4R | Mock3 | 21 | 21 | 18 | 18 | – | Illumina | 651,731 |
| KozV4 | V4F, V4R | Mock3 | 21 | 21 | 18 | 18 | – | Illumina | 4,758,584 |
| KozV45 | V4F, V5R | Mock3 | 21 | 21 | 18 | 18 | – | Illumina | 2,175,664 |
| HmpV13A | V1F, V3R | Mock3 | 21 | 21 | 18 | 18 | | 454 | 23,164 |
| HmpV13B | V1F, V3R | Mock3 | 21 | 21 | 18 | 18 | | 454 | 52,712 |
| HmpV31A | V3F, V1R | Mock3 | 21 | 21 | 18 | 18 | | 454 | 2,744 |
| HmpV31B | V3F, V1R | Mock3 | 21 | 21 | 18 | 18 | | 454 | 43,024 |
| HmpV35A | V3F, V5R | Mock3 | 21 | 21 | 18 | 18 | | 454 | 16,223 |
| HmpV53A | V5F, V3R | Mock3 | 21 | 21 | 18 | 18 | | 454 | 56,439 |
| HmpV53B | V5F, V3R | Mock3 | 21 | 21 | 18 | 18 | | 454 | 14,150 |
| HmpV69A | V6F, V9R | Mock3 | 21 | 21 | 18 | 18 | | 454 | 17,494 |
| HmpV69B | V6F, V9R | Mock3 | 21 | 21 | 18 | 18 | | 454 | 48,141 |
| HmpV96A | V9F, V6R | Mock3 | 21 | 21 | 18 | 18 | | 454 | 27,473 |
| HmpV96B | V9F, V6R | Mock3 | 21 | 21 | 18 | 18 | | 454 | 12,619 |
Mock OTUs reported by Qclosed and QIIME*.
The first two columns give the numbers of OTUs reported by QIIME closed-reference (Qclosed) and the recommended QIIME protocol (QIIME*). The second two columns show the numbers of chimeras in the OTU sequences for Qclosed and QIIME* respectively as predicted by the high-confidence mode of UCHIME2.
| Set | Strains | OTUs (Qclosed) | OTUs (QIIME*) | Chimeras (Qclosed) | Chimeras (QIIME*) |
|---|---|---|---|---|---|
| Bok | 22 | 955 | 4,482 | 41 | 703 |
| Extreme | 27 | 343 | 298 | 0 | 0 |
| KozV34 | 21 | 531 | 1,607 | 39 | 899 |
| KozV4 | 21 | 2,263 | 2,857 | 47 | 816 |
| KozV45 | 21 | 1,312 | 5,824 | 61 | 2,983 |
| HmpV13A | 21 | 30 | 565 | 13 | 220 |
| HmpV13B | 21 | 36 | 1,414 | 11 | 456 |
| HmpV31A | 21 | 56 | 536 | 14 | 284 |
| HmpV31B | 21 | 60 | 1,171 | 20 | 584 |
| HmpV35A | 21 | 127 | 679 | 20 | 128 |
| HmpV53A | 21 | 218 | 2,143 | 37 | 575 |
| HmpV53B | 21 | 138 | 739 | 23 | 223 |
| HmpV69A | 21 | 61 | 973 | 33 | 387 |
| HmpV69B | 21 | 75 | 2,562 | 56 | 728 |
| HmpV96A | 21 | 68 | 1,606 | 11 | 539 |
| HmpV96B | 21 | 59 | 792 | 9 | 304 |
Richness of OTUs assigned to known tag sequences in Mock3.
Here, richness is the number of OTUs reported by closed-reference (Qclosed) and the recommended QIIME protocol (QIIME*), respectively. In the first two columns, input is the known tag sequences for the strains in the Mock3 community, modeling an idealized case where all biological sequences in the sample are correctly identified, e.g., by a perfect denoiser. QIIME* richness is given as the number of additional OTUs found compared to Qclosed. Naively, we would expect Qclosed to assign all tags to OTUs because they all belong to strains found in GG. In the last two columns, the 1-sub. variants of each tag sequence are included, i.e., all possible sequences that differ by a single substitution, modeling a very low rate (0.2 to 0.4%) of incorrect bases due to PCR and sequencing.
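| Tags | Qclosed (tags only) | QIIME* (tags only) | Qclosed (tags + 1-sub. variants) | QIIME* (tags + 1-sub. variants) |
|---|---|---|---|---|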
| Mock3-V13 | 24 | +2 | 217 | +4 |
| Mock3-V34 | 16 | +4 | 327 | +14 |
| Mock3-V35 | 17 | +4 | 306 | +14 |
| Mock3-V4 | 21 | +0 | 450 | +11 |
| Mock3-V45 | 22 | +1 | 442 | +16 |
| Mock3-V69 | 21 | +0 | 190 | +8 |
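The 1-sub. variant set used for the last two columns can be enumerated directly. The following is a minimal sketch, assuming a plain A/C/G/T alphabet; the function name is hypothetical. A tag of length n has 3n such variants, and a single substitution corresponds to an error rate of 1/n, so the quoted range of 0.2% to 0.4% would correspond to tags of roughly 500 down to 250 bases.

```python
def one_sub_variants(seq, alphabet="ACGT"):
    """Return every sequence that differs from seq by exactly one substitution."""
    variants = set()
    for i, base in enumerate(seq):
        for alt in alphabet:
            if alt != base:
                variants.add(seq[:i] + alt + seq[i + 1:])
    return variants


tag = "ACGTACGTAC"
variants = one_sub_variants(tag)

# Each of the len(tag) positions can take 3 alternative bases.
n_variants = len(variants)

# Per-base error rate implied by one substitution in a tag of a given length.
rate_250 = 1 / 250  # 0.4%
rate_500 = 1 / 500  # 0.2%
```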
Qclosed OTU assignments for known tags in Mock3.
The table shows OTU identifiers assigned by Qclosed for tags in the known 16S rRNA genes in the HMP mock community. Ideally, a given species would always be assigned to the same OTU regardless of which tag or which paralog is being classified, but this is true only of D. radiodurans. Shading indicates cases where two or more tags were assigned to the same OTU; singletons are underlined.
Probability that different tags in a given 16S rRNA sequence are assigned to the same OTU by Qclosed.
For each pair of tag sequences in GG-tagsX, the table shows the fraction which were assigned to the same OTU by the QIIME closed-reference method. Pairs which overlap have darker shading.
Figure 1. Rarefaction curves for Bok reads generated by QIIME.
There are two Even and two Staggered samples of Mock2 (22 strains). The e parameter is the number of reads per sample.
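A rarefaction curve of this kind plots observed richness against the subsampling depth e. The sketch below is an illustration only, not QIIME's procedure: it assumes each read is already labeled with its OTU, subsamples e reads without replacement, and counts the distinct OTUs; the function names and the fixed seed are assumptions.

```python
import random


def rarefied_richness(read_otus, e, seed=0):
    """Number of distinct OTUs among e reads drawn without replacement."""
    rng = random.Random(seed)
    return len(set(rng.sample(read_otus, e)))


def rarefaction_curve(read_otus, depths, seed=0):
    """Observed richness at each subsampling depth e."""
    return [(e, rarefied_richness(read_otus, e, seed)) for e in depths]


# Toy sample: one OTU label per read.
read_otus = ["otu_a"] * 5 + ["otu_b"] * 3 + ["otu_c"] * 2
curve = rarefaction_curve(read_otus, depths=[2, 5, 10])
```

At the full depth (e equal to the number of reads) the subsample contains every read, so richness equals the total number of OTUs in the sample.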
Figure 2. Distribution of closed-reference beta diversities for all pairs of Mock2/3 samples.
The histograms show the distribution of weighted Jaccard (A, C) and weighted UniFrac (B, D) distances on all pairs of samples containing Mock2 or Mock3. A zero value for the Jaccard or UniFrac distance indicates maximum similarity between a pair of samples; one indicates maximum difference. Histograms (A) and (B) show the distribution when the same tag is sequenced (e.g., V4), histograms (C) and (D) when different tags are sequenced (e.g., V13 and V69). The y axis is the frequency, calculated as (number of sample pairs having distances which fall into a given bin) divided by (total number of sample pairs).
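The distance and the y-axis frequency can be sketched as follows. The caption does not spell out the formula, so the code assumes one common definition of the weighted Jaccard distance, one minus the sum of per-OTU minimum abundances divided by the sum of per-OTU maximum abundances; the function names and the choice of ten bins are also assumptions.

```python
def weighted_jaccard_distance(a, b):
    """Weighted Jaccard distance between two OTU abundance tables (dicts).

    0 means identical abundance profiles, 1 means no shared OTUs.
    """
    otus = set(a) | set(b)
    shared = sum(min(a.get(o, 0), b.get(o, 0)) for o in otus)
    total = sum(max(a.get(o, 0), b.get(o, 0)) for o in otus)
    return 1.0 - shared / total if total else 0.0


def histogram_frequencies(distances, n_bins=10):
    """Fraction of sample pairs whose distance falls into each bin of [0, 1]."""
    counts = [0] * n_bins
    for d in distances:
        idx = min(int(d * n_bins), n_bins - 1)  # d == 1.0 goes in the last bin
        counts[idx] += 1
    return [c / len(distances) for c in counts]


same = weighted_jaccard_distance({"otu1": 5, "otu2": 5}, {"otu1": 5, "otu2": 5})
disjoint = weighted_jaccard_distance({"otu1": 10}, {"otu2": 10})
partial = weighted_jaccard_distance({"x": 2, "y": 1}, {"x": 1, "y": 1})
freqs = histogram_frequencies([0.0, 1.0, 0.05, 0.95])
```

Under this definition, two samples of the same mock community should give a distance near zero if OTU assignment were consistent, which is the expectation the histograms are compared against.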
Taxonomy assignment accuracy for QIIME* OTUs.
| Platform | Set | Genera in mock | Genera predicted | True positives | False positives | False negatives | Sensitivity | Precision | False-positive fraction |
|---|---|---|---|---|---|---|---|---|---|
| Illumina | Bok | 19 | 39 | 16 | 23 | 3 | 84% | 41% | 58% |
| Illumina | Extreme | 11 | 16 | 7 | 9 | 4 | 63% | 43% | 56% |
| Illumina | Koz.V34 | 18 | 37 | 16 | 21 | 2 | 88% | 43% | 56% |
| Illumina | Koz.V4 | 18 | 134 | 16 | 118 | 2 | 88% | 11% | 88% |
| Illumina | Koz.V45 | 18 | 73 | 17 | 56 | 1 | 94% | 23% | 76% |
| 454 | HmpV13A | 18 | 10 | 9 | 1 | 9 | 50% | 90% | 10% |
| 454 | HmpV13B | 18 | 11 | 10 | 1 | 8 | 55% | 90% | 9% |
| 454 | HmpV31A | 18 | 12 | 11 | 1 | 7 | 61% | 91% | 8% |
| 454 | HmpV31B | 18 | 10 | 9 | 1 | 9 | 50% | 90% | 10% |
| 454 | HmpV35A | 18 | 18 | 16 | 2 | 2 | 88% | 88% | 11% |
| 454 | HmpV53A | 18 | 18 | 16 | 2 | 2 | 88% | 88% | 11% |
| 454 | HmpV53B | 18 | 17 | 16 | 1 | 2 | 88% | 94% | 5% |
| 454 | HmpV69A | 18 | 17 | 15 | 2 | 3 | 83% | 88% | 11% |
| 454 | HmpV69B | 18 | 16 | 15 | 1 | 3 | 83% | 93% | 6% |
| 454 | HmpV96A | 18 | 14 | 13 | 1 | 5 | 72% | 92% | 7% |
| 454 | HmpV96B | 18 | 14 | 13 | 1 | 5 | 72% | 92% | 7% |
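Reading the three count columns before the percentages as true-positive, false-positive, and false-negative genera (an interpretation consistent with the row arithmetic, not stated in the caption), the percentages can be recomputed as in the sketch below. The function and key names are hypothetical. The table's percentages appear to be truncated rather than rounded, so the sketch truncates as well.

```python
def genus_metrics(true_pos, false_pos, false_neg):
    """Accuracy of predicted genus names against the known mock composition."""
    predicted = true_pos + false_pos  # genera reported by the pipeline
    known = true_pos + false_neg      # genera actually present in the mock
    return {
        "sensitivity": true_pos / known,
        "precision": true_pos / predicted,
        "false_positive_fraction": false_pos / predicted,
    }


def as_truncated_percent(fraction):
    """Whole-number percentage, truncated toward zero."""
    return int(fraction * 100)


# Koz.V4 row: 16 correct genera, 118 spurious genera, 2 missed genera.
koz_v4 = genus_metrics(true_pos=16, false_pos=118, false_neg=2)
koz_v4_percent = {k: as_truncated_percent(v) for k, v in koz_v4.items()}
```

For the Koz.V4 row this gives 88% sensitivity, 11% precision, and an 88% false-positive fraction, the upper end of the 56% to 88% range quoted in the abstract.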
Qclosed results for GG-tags.
Columns are: Sequences, the number of tag sequences (and the percentage of GG sequences these represent; a tag is missing when the full-length sequence is truncated or has >2 primer mismatches); GG97 tags, the number of GG97 sequences from which this tag was extracted (and as a percentage of all GG97 sequences); Fails, the total number of fails (and as a percentage of all tested tags); GG97 fails, the number of fails among GG97 tags (and as a percentage of GG97 tags); and <97%, the number of tags with <97% identity with the full-length GG97 database (and as a percentage of GG-tags).
| Tag | Sequences | GG97 tags | Fails | GG97 fails | <97% |
|---|---|---|---|---|---|
| V13 | 266,317 (21.1%) | 46,426 (46.7%) | 18,404 (6.9%) | 186 (0.4%) | 10,386 (3.9%) |
| V34 | 1,236,137 (97.9%) | 93,280 (93.9%) | 13,956 (1.1%) | 180 (0.2%) | 6,179 (0.5%) |
| V35 | 1,240,170 (98.3%) | 94,370 (95.0%) | 18,477 (1.5%) | 880 (0.9%) | 6,201 (0.5%) |
| V4 | 1,245,904 (98.7%) | 93,610 (94.2%) | 13,018 (1.0%) | 152 (0.2%) | 6,228 (0.5%) |
| V45 | 1,249,794 (99.0%) | 94,621 (95.3%) | 7,866 (0.6%) | 33 (0.0%) | 4,999 (0.4%) |
| V69 | 100,470 (8.0%) | 13,848 (13.9%) | 2,422 (2.4%) | 25 (0.2%) | 1,706 (1.7%) |