Emily A Brown¹, Frédéric J J Chain², Teresa J Crease³, Hugh J MacIsaac⁴, Melania E Cristescu².
Abstract
DNA metabarcoding is a promising method for describing communities and estimating biodiversity. This approach uses high-throughput sequencing of targeted markers to identify species in a complex sample. By convention, sequences are clustered at a predefined sequence divergence threshold (often 3%) into operational taxonomic units (OTUs) that serve as a proxy for species. However, variable levels of interspecific marker variation across taxonomic groups make clustering sequences from a phylogenetically diverse dataset into OTUs at a uniform threshold problematic. In this study, we use mock zooplankton communities to evaluate the accuracy of species richness estimates when following conventional protocols to cluster hypervariable sequences of the V4 region of the small subunit ribosomal RNA gene (18S) into OTUs. By including individually tagged single specimens and "populations" of various species in our communities, we examine the impact of intra- and interspecific diversity on OTU clustering. Communities consisting of single individuals per species generated a correspondence of 59-84% between OTU number and species richness at a 3% divergence threshold. However, when multiple individuals per species were included, the correspondence between OTU number and species richness dropped to 31-63%. Our results suggest that intraspecific variation in this marker can often exceed 3%, such that a single species does not always correspond to one OTU. We advocate the need to apply group-specific divergence thresholds when analyzing complex and taxonomically diverse communities, but also encourage the development of additional filtering steps that allow identification of artifactual rRNA gene sequences or pseudogenes that may generate spurious OTUs.
Keywords: Biodiversity; high-throughput sequencing; metabarcoding; mock community; nSSU; zooplankton
Year: 2015 PMID: 26078859 PMCID: PMC4461424 DOI: 10.1002/ece3.1485
Source DB: PubMed Journal: Ecol Evol ISSN: 2045-7758 Impact factor: 2.912
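The clustering convention described in the abstract can be illustrated with a minimal sketch. It assumes a greedy, abundance-ordered centroid scheme (each unique sequence joins the first existing OTU centroid within the threshold, otherwise it founds a new OTU) and uses edit distance divided by the longer sequence length as the divergence measure. Both are simplifying assumptions: UPARSE, the tool used in this study, has its own alignment scoring and chimera filtering, and the function names below are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions, and deletions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = curr
    return prev[-1]


def divergence(a: str, b: str) -> float:
    """Assumed divergence measure: edit distance over the longer sequence length."""
    return levenshtein(a, b) / max(len(a), len(b))


def greedy_otus(abundances: dict[str, int], threshold: float = 0.03) -> list[list[str]]:
    """Greedy centroid clustering (a simplification, not UPARSE itself).

    Unique sequences are visited from most to least abundant. Each one joins the
    first OTU whose centroid (its first member) is within `threshold`; otherwise
    it becomes the centroid of a new OTU.
    """
    otus: list[list[str]] = []
    for seq in sorted(abundances, key=lambda s: (-abundances[s], s)):
        for otu in otus:
            if divergence(seq, otu[0]) <= threshold:
                otu.append(seq)
                break
        else:  # no centroid close enough: start a new OTU
            otus.append([seq])
    return otus


# Toy data: a 40-nt dominant sequence and two low-abundance variants.
base = "ACGT" * 10
v1 = "T" + base[1:]   # 1 substitution from base (2.5%)
v2 = "TT" + base[2:]  # 2 substitutions from base (5%), 1 from v1 (2.5%)
otus = greedy_otus({base: 100, v1: 10, v2: 5}, threshold=0.03)
print(len(otus))  # 2
```

In this toy case `v2` is only 2.5% from `v1` but 5% from the centroid, so at a 3% threshold it founds a second OTU; this is one way in which variation within a single species can inflate the OTU count.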
Figure 1. The use of complex mock communities that involve tagged primers to allow the separation and independent analysis of the sequences generated by different species or taxonomic groups. This method facilitates the identification of intra- and interspecific divergence levels. It also allows researchers to calibrate the thresholds of sequence divergence for all targeted taxonomic groups.
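Separating tagged samples amounts to demultiplexing reads by their leading tag. The sketch below assumes each read starts with a tag followed immediately by the forward primer, and it discards any read without an exact match to both; this loosely corresponds to the barcode/primer error filtering reported in the read-count table. The tag sequences, primer stub, and sample names are hypothetical placeholders, not those used in the study, and requiring exact matches is a simplifying assumption.

```python
def demultiplex(reads: list[str], tags: dict[str, str],
                primer: str) -> tuple[dict[str, list[str]], int]:
    """Bin reads by an exact leading tag + primer match and trim both off.

    Returns the per-sample bins and the number of reads that matched no tag/primer.
    """
    bins: dict[str, list[str]] = {sample: [] for sample in tags.values()}
    discarded = 0
    for read in reads:
        for tag, sample in tags.items():
            if read.startswith(tag + primer):
                bins[sample].append(read[len(tag) + len(primer):])
                break
        else:  # no tag/primer combination matched this read
            discarded += 1
    return bins, discarded


# Hypothetical 6-nt tags, a hypothetical primer stub, and toy reads.
TAGS = {"ACACGT": "individual_01", "AGTCAC": "individual_02"}
PRIMER = "CCAGCA"
reads = [
    "ACACGT" + "CCAGCA" + "TTGACC",  # individual_01
    "AGTCAC" + "CCAGCA" + "GGATCC",  # individual_02
    "ACACGT" + "CCAGTA" + "TTGACC",  # primer mismatch -> discarded
    "TTTTTT" + "CCAGCA" + "TTGACC",  # unknown tag -> discarded
]
bins, discarded = demultiplex(reads, TAGS, PRIMER)
print(bins, discarded)  # {'individual_01': ['TTGACC'], 'individual_02': ['GGATCC']} 2
```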
Read counts for the two 454 sequencing runs (the Individuals and Populations Communities): raw reads per run, followed by the read and unique-sequence counts at each processing step. Each run included both the Tagged and Untagged Communities.
| Processing step | Individuals Community, Tagged | Individuals Community, Untagged | Populations Community, Tagged | Populations Community, Untagged |
|---|---|---|---|---|
| Raw reads (whole run) | 610,914 (Tagged and Untagged combined) | | 625,239 (Tagged and Untagged combined) | |
| Barcode/primer error-filtered reads | 115,902 | 430,845 | 404,052 | 199,871 |
| Quality-filtered reads | 58,334 | 229,435 | 296,944 | 142,969 |
| Unique reads including singletons | 8736 | 20,487 | 32,745 | 20,077 |
| Singletons | 6338 | 13,575 | 22,181 | 14,102 |
| Unique reads excluding singletons | 2398 | 6730 | 10,564 | 5975 |
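The unique-read rows in this table follow from dereplication: identical reads are collapsed into one unique sequence with an abundance, and a singleton is a unique sequence observed exactly once. Unique reads including singletons therefore split into singletons plus unique reads excluding singletons (for the Tagged Individuals Community, 6338 + 2398 = 8736). A minimal Python sketch of the two steps, using toy reads and hypothetical function names:

```python
from collections import Counter


def dereplicate(reads: list[str]) -> Counter:
    """Collapse identical reads into unique sequences mapped to their read counts."""
    return Counter(reads)


def split_singletons(uniques: Counter) -> tuple[dict[str, int], dict[str, int]]:
    """Separate unique sequences seen exactly once (singletons) from the rest."""
    singletons = {seq: n for seq, n in uniques.items() if n == 1}
    kept = {seq: n for seq, n in uniques.items() if n > 1}
    return singletons, kept


# Toy quality-filtered reads (real reads are V4 fragments, not 4-mers).
reads = ["AAGT", "AAGT", "AAGT", "AAGC", "AAGC", "TAGT", "CAGT"]
uniques = dereplicate(reads)
singletons, kept = split_singletons(uniques)
print(len(uniques), len(singletons), len(kept))  # 4 2 2
```

Dropping singletons before clustering, as in the "singletons excluded" analyses, amounts to passing only `kept` on to the OTU clustering step.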
OTUs generated after clustering the data for the Tagged Individuals Community using a 3% divergence threshold. Results are reported with singletons excluded and with singletons included. The number of filtered reads that were clustered to form each OTU is reported.
| Tagged individual | No. OTUs (singletons excluded) | Species matching OTU(s) (singletons excluded) | No. reads in OTU (singletons excluded) | No. OTUs (singletons included) | Species matching OTU(s) (singletons included) | No. reads in OTU (singletons included) |
|---|---|---|---|---|---|---|
|  | 1 |  | 795 | 2 |  | 962 |
|  |  |  |  |  |  | 2 |
|  | 1 |  | 6231 | 1 |  | 6943 |
|  |  |  |  | 1 |  | 4 |
|  | 1 |  | 418 | 1 |  | 544 |
|  | 1 |  | 259 | 2 |  | 327 |
|  |  |  |  |  |  | 1 |
|  | 1 |  | 726 | 2 |  | 1057 |
|  |  |  |  |  |  | 1 |
|  | 2 |  | 23,420 | 4 |  | 25,497 |
|  |  |  | 3874 |  |  | 3883 |
|  |  |  |  |  |  | 24 |
|  |  |  |  |  |  | 1 |
|  | 1 |  | 62 | 1 |  | 84 |
|  | 1 |  | 158 | 1 |  | 213 |
|  | 1 |  | 2947 | 1 |  | 3324 |
|  | 1 |  | 4209 | 1 |  | 4969 |
|  | 1 |  | 1764 | 1 |  | 2107 |
|  | 1 |  | 6 | 1 |  | 9 |
|  | 1 |  | 215 | 1 |  | 259 |
|  | 1 |  | 240 | 1 |  | 375 |
|  | 2 |  | 3050 | 3 |  | 3437 |
|  |  |  | 6 |  |  | 9 |
|  |  |  |  |  |  | 1 |
|  | 1 |  | 76 | 1 |  | 102 |
|  | 1 |  | 2928 | 1 |  | 3383 |
|  | 1 |  | 26 | 1 |  | 39 |
Figure 2. The number of OTUs generated and species detected when clustering data from (A) the Untagged Individuals Community and (B) the Untagged Populations Community. Filtered sequences were clustered into OTUs that were BLASTed against a reference database to assign species names. The solid horizontal line indicates the expected number of species. Percent divergence thresholds between 1% and 10% were used to cluster unique sequences with UPARSE, with and without including singletons in the analysis.
OTUs generated after clustering the data for the Tagged Populations Community using a 3% divergence threshold. Results are reported with singletons excluded and with singletons included. The number of individuals included within each population is indicated before the species name. For example, "5 ×" indicates that five individuals were present. The number of filtered reads that were clustered to form each OTU is reported.
| Tagged population | No. OTUs (singletons excluded) | Species matching OTU(s) (singletons excluded) | No. reads in OTU (singletons excluded) | No. OTUs (singletons included) | Species matching OTU(s) (singletons included) | No. reads in OTU (singletons included) |
|---|---|---|---|---|---|---|
| 5 × | 2 |  | 14,849 | 3 |  | 15,197 |
|  |  |  | 11 |  |  | 12 |
|  |  |  |  |  |  | 2 |
| 10 × | 1 |  | 1465 | 3 |  | 1735 |
|  |  |  |  |  |  | 3 |
|  |  |  |  |  |  | 1 |
| 30 × | 5 |  | 14,922 | 5 |  | 17,334 |
|  |  |  | 36 |  |  | 50 |
|  |  |  | 30 |  |  | 36 |
|  |  |  | 9 |  |  | 12 |
|  |  |  | 2 |  |  | 4 |
| 3 × | 1 |  | 4186 | 1 |  | 4793 |
| 5 × | 2 |  | 2294 | 2 |  | 2863 |
|  |  |  | 503 |  |  | 746 |
| 10 × | 2 |  | 19,560 | 2 |  | 20,842 |
|  |  |  | 2827 |  |  | 2963 |
| 17 × | 1 |  | 8888 | 1 |  | 9765 |
| 5 × | 1 |  | 2326 | 1 |  | 2631 |
| 10 × | 3 |  | 1681 | 3 |  | 2164 |
|  |  |  | 603 |  |  | 643 |
|  |  |  | 54 |  |  | 71 |
| 5 × | 1 |  | 3901 | 2 |  | 4454 |
|  |  |  |  |  |  | 1 |
| 10 × | 1 |  | 12,825 | 1 |  | 13,394 |
| 31 × | 1 |  | 30,041 | 1 |  | 31,495 |
| 5 × | 2 |  | 12,104 | 4 |  | 12,901 |
|  |  |  | 4 |  |  | 8 |
|  |  |  |  |  |  | 8 |
|  |  |  |  |  |  | 1 |
| 9 × | 2 |  | 3697 | 3 |  | 2884 |
|  |  |  | 10 |  |  | 2190 |
|  |  |  |  |  |  | 14 |
| 30 × | 1 |  | 38,038 | 2 |  | 39,947 |
|  |  |  |  |  |  | 1 |
| 5 × | 1 |  | 565 | 1 |  | 860 |
| 8 × | 1 |  | 1343 | 1 |  | 1832 |
| 27 × | 1 |  | 26,959 | 1 |  | 28,075 |
| 5 × | 1 |  | 10,915 | 2 |  | 6564 |
|  |  |  |  |  |  | 4870 |
| 10 × | 1 |  | 2807 | 3 |  | 3441 |
|  |  |  |  |  |  | 93 |
|  |  |  |  |  |  | 1 |
| 28 × | 2 |  | 10,310 | 4 |  | 12,246 |
|  |  |  | 13 |  |  | 117 |
|  |  |  |  |  |  | 73 |
|  |  |  |  |  |  | 19 |
| 5 × | 1 |  | 3048 | 2 |  | 3387 |
|  |  |  |  |  |  | 1 |
| 10 × | 2 |  | 15,569 | 3 |  | 16,088 |
|  |  |  | 445 |  |  | 577 |
|  |  |  |  |  |  | 6 |
| 30 × | 1 |  | 3924 | 2 |  | 4593 |
|  |  |  |  |  |  | 2 |
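Tallying the table gives a compact view of how often a tagged population collapses into a single OTU at 3%. The counts below are transcribed from the table in row order (species names omitted); the tuple layout and the `summarize` helper are illustrative choices, not part of the study's pipeline. With singletons excluded, 15 of the 24 populations form a single OTU (37 OTUs in total); with singletons included, only 8 do (53 OTUs).

```python
# Rows transcribed from the Tagged Populations table (3% threshold), in table order:
# (individuals in population, OTUs with singletons excluded, OTUs with singletons included)
POPULATIONS = [
    (5, 2, 3), (10, 1, 3), (30, 5, 5), (3, 1, 1), (5, 2, 2), (10, 2, 2),
    (17, 1, 1), (5, 1, 1), (10, 3, 3), (5, 1, 2), (10, 1, 1), (31, 1, 1),
    (5, 2, 4), (9, 2, 3), (30, 1, 2), (5, 1, 1), (8, 1, 1), (27, 1, 1),
    (5, 1, 2), (10, 1, 3), (28, 2, 4), (5, 1, 2), (10, 2, 3), (30, 1, 2),
]


def summarize(populations: list[tuple[int, int, int]]) -> dict[str, int]:
    """Count single-OTU populations and total OTUs with/without singletons."""
    return {
        "populations": len(populations),
        "single_otu_excluded": sum(1 for _, excl, _ in populations if excl == 1),
        "single_otu_included": sum(1 for _, _, incl in populations if incl == 1),
        "total_otus_excluded": sum(excl for _, excl, _ in populations),
        "total_otus_included": sum(incl for _, _, incl in populations),
    }


print(summarize(POPULATIONS))
```

These single-OTU counts agree with the next table: 15 populations need a threshold of at most 3% with singletons excluded, and 8 with singletons included.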
Lowest percentage divergence thresholds required to generate a single OTU when clustering data for the 24 populations in the Tagged Populations Community. The number of individuals included within each population is indicated before the species name. For example, "5 ×" indicates that five individuals were present. Note that >10% indicates that multiple OTUs were still generated even when applying a 10% divergence threshold.
| Tagged population | Lowest divergence threshold (%) for a single OTU, singletons excluded | Lowest divergence threshold (%) for a single OTU, singletons included |
|---|---|---|
| 5 × | 8 | 8 |
| 10 × | 3 | 8 |
| 30 × | 9 | 9 |
| 3 × | 2 | 3 |
| 5 × | 4 | 6 |
| 10 × | 4 | 4 |
| 17 × | 3 | 3 |
| 5 × | 1 | 2 |
| 10 × | 5 | 6 |
| 5 × | 2 | >10 |
| 10 × | 2 | 3 |
| 31 × | 2 | 3 |
| 5 × | 4 | 4 |
| 9 × | 5 | 5 |
| 30 × | 3 | 10 |
| 5 × | 1 | 2 |
| 8 × | 2 | 3 |
| 27 × | 2 | 3 |
| 5 × | 3 | 4 |
| 10 × | 2 | >10 |
| 28 × | 5 | 5 |
| 5 × | 2 | 5 |
| 10 × | 6 | >10 |
| 30 × | 2 | 4 |
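One way to obtain values like these without rerunning a clustering tool at every threshold is sketched below, under two stated assumptions: clustering is greedy and seeded by the most abundant unique sequence (a simplification of UPARSE), and divergence is the percentage of mismatched positions between pre-aligned, equal-length sequences. Under the first assumption, a population forms a single OTU exactly when every unique sequence lies within the threshold of the most abundant one, so only the worst-case divergence matters. The function names and toy sequences are hypothetical.

```python
def mismatch_pct(a: str, b: str) -> float:
    """Percent of differing positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return 100 * sum(x != y for x, y in zip(a, b)) / len(a)


def lowest_single_otu_threshold(abundances: dict[str, int], max_threshold: int = 10) -> str:
    """Smallest integer % threshold (1..max_threshold) giving one OTU, else '>max'.

    Assumes greedy centroid clustering seeded by the most abundant unique sequence:
    one OTU forms iff every sequence is within the threshold of that centroid.
    """
    centroid = min(abundances, key=lambda s: (-abundances[s], s))
    worst = max(mismatch_pct(centroid, s) for s in abundances)
    for t in range(1, max_threshold + 1):
        if worst <= t:
            return str(t)
    return f">{max_threshold}"


COMPLEMENT = str.maketrans("ACGT", "TGCA")
base = "ACGT" * 25                                 # 100 nt
v3 = base[:3].translate(COMPLEMENT) + base[3:]     # 3 mismatches (3%)
v12 = base[:12].translate(COMPLEMENT) + base[12:]  # 12 mismatches (12%)
print(lowest_single_otu_threshold({base: 50, v3: 5}))          # 3
print(lowest_single_otu_threshold({base: 50, v3: 5, v12: 2}))  # >10
```

A single divergent low-abundance variant (here `v12`, which could stand in for an artifactual sequence or pseudogene) is enough to push a population past 10%, which is consistent with including singletons raising several thresholds in the table.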
Figure 3. Dendrograms of OTU sequences from the Tagged Individuals and Tagged Populations Communities in (A) Corbicula fluminea and (B) Leptodiaptomus spp. The divergence threshold used was 3% and singletons were excluded. Representative sequences for the OTUs and a reference sequence were aligned with default settings in MAFFT v 7.150b (Katoh and Standley 2013). Dendrograms were generated using FastTree v 2.1.7 (Price et al. 2010). Each OTU was labeled according to the number of individuals included in the tagged sample (e.g., five individuals) and the number of reads that make up the OTU cluster (e.g., 14,849 reads).
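FastTree infers approximately-maximum-likelihood trees, so the sketch below is not what it computes; it only illustrates, as a simpler stand-in, how divergence between aligned OTU representatives and a reference can be quantified once an alignment (such as MAFFT output) is available. It uses the uncorrected p-distance with pairwise deletion of gapped columns, which is an assumption chosen for clarity; the sequence labels and toy alignment are hypothetical.

```python
import math
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped (pairwise deletion).
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        return math.nan
    return sum(x != y for x, y in sites) / len(sites)


def distance_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """p-distances for every unordered pair of labelled, aligned sequences."""
    return {(i, j): p_distance(aligned[i], aligned[j])
            for i, j in combinations(aligned, 2)}


# Toy alignment standing in for OTU representatives plus a reference.
aligned = {
    "OTU1": "ACGT-ACGTA",
    "OTU2": "ACGTTACGAA",
    "ref":  "ACCT-ACGTA",
}
dists = distance_matrix(aligned)
for (i, j), d in dists.items():
    print(f"{i} vs {j}: {d:.3f}")
```

Multiplying these proportions by 100 puts them on the same percent scale as the 3% clustering threshold.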