Susan M Huse, David Mark Welch, Hilary G Morrison, Mitchell L Sogin.
Abstract
Deep sequencing of PCR amplicon libraries facilitates the detection of low-abundance populations in environmental DNA surveys of complex microbial communities. At the same time, deep sequencing can lead to overestimates of microbial diversity through the generation of low-frequency, error-prone reads. Even with sequencing error rates below 0.005 per nucleotide position, the common method of generating operational taxonomic units (OTUs) by multiple sequence alignment and complete-linkage clustering significantly increases the number of predicted OTUs and inflates richness estimates. We show that a 2% single-linkage preclustering methodology followed by an average-linkage clustering based on pairwise alignments more accurately predicts expected OTUs in both single and pooled template preparations of known taxonomic composition. This new clustering method can reduce the OTU richness in environmental samples by as much as 30-60% but does not reduce the fraction of OTUs in long-tailed rank abundance curves that defines the rare biosphere.
Year: 2010 PMID: 20236171 PMCID: PMC2909393 DOI: 10.1111/j.1462-2920.2010.02193.x
Source DB: PubMed Journal: Environ Microbiol ISSN: 1462-2912 Impact factor: 5.491
Number of OTUs for different clustering methods.
| Sample | Expected OTUs | Maximum expected due to errors | MS-CL | PW-AL | SLP/PW-AL |
|---|---|---|---|---|---|
| Template samples | | | | | |
| | 2 | 378¹ | 1042 | 277 | 88 |
| | 1 | 23¹ | 137 | 25 | 10 |
| | 2 | 769¹ | 1267 | 323 | 128 |
| | 1 | 85¹ | 205 | 37 | 20 |
| Clone-43 (v6) ( | 43 | 630¹ | 2473 | 458 | 275 |
| Clone-43 v4-5 (232 nt, | 42 | 23¹ | 126 | 51 | 54 |
| Clone-90 (241 nt, | 30 | 34–69² | 237 | 65 | 62 |
| Natural samples | | | | | |
| Deep-sea vent | N/A | 63–126² | 709 | 483 | 470 |
| English Channel ( | N/A | 13–26² | 1154 | 880 | 859 |
| Human Gut ( | N/A | 15–30² | 803 | 625 | 566 |
| Sewage ( | N/A | 33–66² | 2383 | 1881 | 1831 |
| North Atlantic Deep Water ( | N/A | 15–30² | 1713 | 1363 | 1339 |
For both known template and environmental samples, we calculated the number of expected OTUs (known templates only) and the number generated using several alignment and clustering methods. We calculated the maximum additional OTUs expected due to errors using either (1) the count of unique sequences having more than 2 errors in the template pool (superscript ‘1’ in the table), or (2) 1–2 OTUs for every 1000 tags (superscript ‘2’ in the table). MS is a multiple sequence alignment, PW is a pairwise alignment, CL is complete-linkage clustering, AL is average-linkage clustering, and SLP is single-linkage preclustering.
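The second rule in the caption (1–2 OTUs for every 1000 tags) is simple arithmetic and can be sketched as below. The function name `max_error_otus`, its parameters, and the 15,000-tag sample are illustrative assumptions; the tag counts behind the table rows are not given here, and rounding down is a choice made for this sketch only.

```python
def max_error_otus(n_tags, low_per_1000=1, high_per_1000=2):
    """Return (low, high) extra OTUs expected from sequencing errors.

    Assumes the '1-2 OTUs per 1000 tags' rule from the caption and
    rounds down to whole OTUs (a choice made for this sketch).
    """
    low = n_tags * low_per_1000 // 1000
    high = n_tags * high_per_1000 // 1000
    return low, high


# Hypothetical sample of 15,000 tags (not a value taken from the table).
print(max_error_otus(15_000))  # (15, 30)
```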
Effect of clustering algorithm on sequences within 3% of their template.
| | | | | | Clone-43 |
|---|---|---|---|---|---|
| Expected | 2 | 2 | 1 | 1 | 43 |
| MS-CL | 129 | 89 | 29 | 26 | 694 |
| MS-AL | 54 | 44 | 6 | 12 | 218 |
| MS-SL | 2 | 2 | 1 | 2 | 57 |
| PW-CL | 6 | 5 | 1 | 1 | 308 |
| PW-AL | 2 | 2 | 1 | 1 | 43 |
| PW-SL | 2 | 1 | 1 | 1 | 43 |
For each of five defined template preparations, we used combinations of alignment and clustering methods to create 3% OTUs.
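To illustrate why the linkage rule alone can change the OTU count, the following is a minimal sketch of threshold-based agglomerative clustering under the three linkage rules named in the table. It is not the authors' implementation: the function `agglomerate`, the toy `positions`, and the resulting distances (in per cent) are assumptions made for illustration, and real distances would come from a multiple sequence or pairwise alignment.

```python
from itertools import combinations


def agglomerate(dist, threshold, linkage="average"):
    """Merge the closest clusters until none are within `threshold`.

    dist     -- symmetric matrix, dist[i][j] = distance between items i and j
    linkage  -- 'single' (min), 'complete' (max) or 'average' (mean)
    """
    summarise = {
        "single": min,
        "complete": max,
        "average": lambda d: sum(d) / len(d),
    }[linkage]
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        # Find the closest pair of clusters under the chosen linkage.
        for a, b in combinations(range(len(clusters)), 2):
            d = summarise([dist[i][j] for i in clusters[a] for j in clusters[b]])
            if best is None or d < best[0]:
                best = (d, a, b)
        if best[0] > threshold:
            break
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return clusters


# Toy data: four sequences placed on a line; distances are in per cent.
positions = [0.0, 2.25, 3.75, 6.75]
dist = [[abs(p - q) for q in positions] for p in positions]

for linkage in ("complete", "average", "single"):
    print(linkage, len(agglomerate(dist, 3.0, linkage)))
# complete 3
# average 2
# single 1
```

On this toy input, complete linkage yields the most clusters at a 3% threshold and single linkage the fewest, which is the same ordering the table shows for the template samples.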
Fig. 1. Effect of clustering method on the number of OTUs. We created OTU clusters of the three known template preparations using combinations of multiple sequence and pairwise alignments, complete-linkage and average-linkage clustering, and single-linkage preclustering. Each method provides distinctly different numbers of OTUs for the same data. For short hypervariable tags sequenced at depth, single-linkage preclustering using pairwise alignments, followed by average-linkage clustering (SLP/PW-AL), provides the most accurate results.
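The preclustering step can be sketched roughly as follows. This is a simplified illustration, not the published procedure: processing sequences in order of decreasing abundance is an assumption, the names `hamming_fraction` and `precluster` are hypothetical, and a mismatch fraction between equal-length strings stands in for a true pairwise-alignment distance.

```python
def hamming_fraction(a, b):
    """Toy distance: fraction of mismatched positions (equal-length strings)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def precluster(counts, max_dist=0.02, distance=hamming_fraction):
    """Group sequences by single linkage at `max_dist`.

    counts -- dict mapping sequence -> read count
    Sequences are visited from most to least abundant (an assumption),
    and a sequence joins the first cluster with ANY member within max_dist.
    """
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    clusters = []
    for seq in ordered:
        for members in clusters:
            if any(distance(seq, m) <= max_dist for m in members):
                members.append(seq)
                break
        else:
            clusters.append([seq])  # no close cluster: start a new one
    return clusters


# Toy 50-nt reads: 1 mismatch equals 2%.
base = "ACGTACGTAC" * 5
one_off = "T" + base[1:]   # 1 mismatch from base
two_off = "TT" + base[2:]  # 2 mismatches from base, 1 from one_off
far = "G" * 50             # unrelated sequence

counts = {base: 100, far: 40, one_off: 3, two_off: 1}
clusters = precluster(counts)
print(len(clusters))  # 2
```

Here `two_off` is 4% from the abundant `base` read but still joins its cluster, because single linkage only requires one member (`one_off`) within 2%. The resulting preclusters would then be passed to average-linkage clustering.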
Fig. 2. Number of additional OTUs as a function of sample depth. For the two genomic templates, E. coli and S. epidermidis, and the multiple template Clone-43 samples, we calculated the number of spurious OTUs as a function of sample depth.
Distribution of most and least abundant pyrotags in MS-CL and SLP/PW-AL clusters.
| | Deep-sea vent | English Channel | Human Gut | Sewage | North Atlantic Deep Water |
|---|---|---|---|---|---|
| Size of most abundant OTU MS-CL | 47 711 | 1613 | 836 | 948 | 1371 |
| Size of most abundant OTU SLP/PW-AL | 49 857 | 1882 | 1436 | 4219 | 1711 |
| Total OTUs MS-CL | 709 | 1154 | 803 | 2383 | 1713 |
| Total OTUs SLP/PW-AL | 470 | 859 | 566 | 1831 | 1339 |
| Per cent OTUs as singletons – tripletons MS-CL | 39% | 66% | 65% | 69% | 74% |
| Per cent OTUs as singletons – tripletons SLP/PW-AL | 42% | 64% | 64% | 69% | 77% |
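As a quick check on the totals in this table, the relative drop in OTU count from MS-CL to SLP/PW-AL can be computed directly. The OTU totals below are copied from the table; the helper name `percent_reduction` and the dictionary layout are illustrative assumptions.

```python
# Total OTUs per sample: (MS-CL, SLP/PW-AL), taken from the table above.
total_otus = {
    "Deep-sea vent": (709, 470),
    "English Channel": (1154, 859),
    "Human Gut": (803, 566),
    "Sewage": (2383, 1831),
    "North Atlantic Deep Water": (1713, 1339),
}


def percent_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100 * (before - after) / before


reductions = {
    site: percent_reduction(ms_cl, slp)
    for site, (ms_cl, slp) in total_otus.items()
}
for site, value in reductions.items():
    print(f"{site}: {value:.1f}% fewer OTUs with SLP/PW-AL")
```

For these five samples the reduction in total OTUs ranges from roughly 22% (North Atlantic Deep Water) to roughly 34% (Deep-sea vent).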