Alexander F Koeppel, Martin Wu.
Abstract
The lack of a consensus bacterial species concept greatly hampers our ability to understand and organize bacterial diversity. Operational taxonomic units (OTUs), which are clustered on the basis of DNA sequence identity alone, are the most commonly used microbial diversity unit. Although it is understood that OTUs can be phylogenetically incoherent, the degree and the extent of the phylogenetic inconsistency have not been explicitly studied. Here, we tested the phylogenetic signal of OTUs in a broad range of bacterial genera from various phyla. Strikingly, we found that very few OTUs were monophyletic, and many showed evidence of multiple independent origins. Using previously established bacterial habitats as benchmarks, we showed that OTUs frequently spanned multiple ecological habitats. We demonstrated that ecological heterogeneity within OTUs is caused by their phylogenetic inconsistency, and not merely due to 'lumping' of taxa resulting from using relaxed identity cut-offs. We argue that ecotypes, as described by the Stable Ecotype Model, are phylogenetically and ecologically more consistent than OTUs and therefore could serve as an alternative unit for bacterial diversity studies. In addition, we introduce QuickES, a new wrapper program for the Ecotype Simulation algorithm, which is capable of demarcating ecotypes in data sets with tens of thousands of sequences.
Year: 2013 PMID: 23571758 PMCID: PMC3664822 DOI: 10.1093/nar/gkt241
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Table 1. Phylogenetic heterogeneity among 16S rRNA OTUs of the skin data set
| Genus | OTU identity threshold | OTUs >1 sequence: number of OTUs | OTUs >1 sequence: % monophyletic | OTUs >50 sequences: number of OTUs | OTUs >50 sequences: % monophyletic |
|---|---|---|---|---|---|
| | 97% | 2 | 0.00% | 1 | 0.00% |
| | 99% | 3 | 0.00% | 1 | 0.00% |
| | 99.5% | 4 | 0.00% | 2 | 0.00% |
| | 97% | 8 | 12.50% | 3 | 0.00% |
| | 99% | 31 | 25.81% | 8 | 0.00% |
| | 99.5% | 45 | 31.11% | 1 | 0.00% |
| | 97% | 31 | 51.61% | 4 | 0.00% |
| | 99% | 64 | 37.50% | 4 | 25.00% |
| | 99.5% | 71 | 11.27% | 3 | 0.00% |
| | 97% | 5 | 60.00% | 1 | 0.00% |
| | 99% | 5 | 60.00% | 1 | 0.00% |
| | 99.5% | 12 | 41.67% | 1 | 0.00% |
| | 97% | 2 | 50.00% | 1 | 0.00% |
| | 99% | 7 | 57.14% | 2 | 0.00% |
| | 99.5% | 44 | 11.36% | 3 | 0.00% |
| | 97% | 53 | 33.96% | 22 | 13.64% |
| | 99% | 160 | 25.00% | 44 | 2.27% |
| | 99.5% | 323 | 18.89% | 45 | 0.00% |
| | 97% | 1 | 0.00% | 1 | 0.00% |
| | 99% | 1 | 0.00% | 1 | 0.00% |
| | 99.5% | 11 | 18.18% | 1 | 0.00% |
| | 97% | 3 | 66.67% | 3 | 66.67% |
| | 99% | 7 | 42.86% | 4 | 25.00% |
| | 99.5% | 88 | 23.86% | 3 | 0.00% |
| | 97% | 6 | 16.67% | 3 | 0.00% |
| | 99% | 59 | 18.64% | 16 | 12.50% |
| | 99.5% | 230 | 24.78% | 12 | 0.00% |
| | 97% | 12 | 75.00% | 5 | 60.00% |
| | 99% | 50 | 22.00% | 7 | 14.29% |
| | 99.5% | 127 | 14.96% | 4 | 0.00% |
This table displays the number of OTUs in each genus, and the percentage of them that were monophyletic, at three different identity thresholds (97, 99 and 99.5%). Only OTUs containing at least two sequences and meeting the support criteria were considered, as a single sequence is monophyletic by definition. Monophyly was even rarer among larger OTUs (OTUs containing at least 50 sequences, right-hand columns).
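The percentages in this table rest on a simple test: an OTU is monophyletic when its members are exactly the leaves under some node of the tree. The following is a minimal sketch of that test, assuming a rooted tree written as nested tuples and a hypothetical leaf-to-OTU mapping. It does not model the bootstrap support criteria mentioned above, and the names `clades` and `percent_monophyletic` are illustrative, not taken from the authors' software.

```python
from typing import Dict, FrozenSet, List, Set, Tuple, Union

# A tree is either a leaf name or a tuple of child subtrees (assumed rooted).
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree, out: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Append the leaf set under every node of the tree to `out`."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(clades(child, out) for child in tree))
    out.append(leaves)
    return leaves


def percent_monophyletic(
    tree: Tree, otu_of: Dict[str, str], min_size: int = 2
) -> Tuple[int, float]:
    """Return (number of OTUs considered, percentage that form a clade).

    OTUs with fewer than `min_size` sequences are skipped, because a single
    sequence is monophyletic by definition.
    """
    all_clades: List[FrozenSet[str]] = []
    clades(tree, all_clades)
    clade_set = set(all_clades)

    members: Dict[str, Set[str]] = {}
    for leaf, otu in otu_of.items():
        members.setdefault(otu, set()).add(leaf)

    eligible = [m for m in members.values() if len(m) >= min_size]
    if not eligible:
        return 0, 0.0
    mono = sum(1 for m in eligible if frozenset(m) in clade_set)
    return len(eligible), 100.0 * mono / len(eligible)


# Toy example: OTU "1" = {a, b} is a clade, OTU "2" = {c, d} is not,
# and OTU "3" = {e} is a singleton and is therefore ignored.
toy_tree = ((("a", "b"), "c"), ("d", "e"))
toy_otus = {"a": "1", "b": "1", "c": "2", "d": "2", "e": "3"}
result = percent_monophyletic(toy_tree, toy_otus)
```

Raising `min_size` to 50 would correspond to the right-hand columns of the table.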
Figure 1. OTU paraphyly is pervasive and pronounced. This graph plots OTU size against PI for all 99% 16S rRNA OTUs among 10 genera. PI values of 0.0 indicate monophyletic groups, whereas a PI close to 1 indicates substantial paraphyly. Genus classifications of OTUs are colour coded as indicated in the key.
Figure 2. Extensive paraphyly and polyphyly among OTUs. Maximum likelihood trees of representative subclades of the genera (A) Aquabacterium and (B) Diaphorobacter. OTUs generated using the 99% identity cut-off are shown with the putative ecotypes (PE) demarcated by ES. Internal nodes with >80% bootstrap support are highlighted with red circles.
Table 2. Phylogenetic heterogeneity of OTUs is robust to methodology
| Clustering method | OTU cut-off | n | FastTree (ML) | RAxML (ML) | RAxML (MP) | QuickTree (NJ) |
|---|---|---|---|---|---|---|
| QIIME (Uclust) | 97% | 5 | 60.00% | 40.00% | 40.00% | 50.00% |
| QIIME (Uclust) | 99% | 5 | 60.00% | 60.00% | 40.00% | 60.00% |
| QIIME (Uclust) | 99.5% | 12 | 41.67% | 60.00% | 45.45% | 80.00% |
| MOTHUR (farthest neighbour) | 97% | 9 | 33.33% | 33.33% | 25.00% | 37.50% |
| MOTHUR (farthest neighbour) | 99% | 41 | 14.63% | 15.79% | 15.79% | 17.65% |
| MOTHUR (farthest neighbour) | 99.5% | 306 | 10.53% | 10.44% | 10.34% | 14.34% |
| MOTHUR (nearest neighbour) | 97% | 1 | 0.00% | 0.00% | 0.00% | 0.00% |
| MOTHUR (nearest neighbour) | 99% | 3 | 66.67% | 66.67% | 66.67% | 66.67% |
| MOTHUR (nearest neighbour) | 99.5% | 3 | 33.33% | 50.00% | 33.33% | 33.33% |
| MOTHUR (average neighbour) | 97% | 2 | 50.00% | 50.00% | 50.00% | 50.00% |
| MOTHUR (average neighbour) | 99% | 5 | 60.00% | 40.00% | 60.00% | 60.00% |
| MOTHUR (average neighbour) | 99.5% | 31 | 48.15% | 58.33% | 60.87% | 77.78% |
| Clusterer (farthest neighbour) | 97% | 11 | 18.18% | 18.18% | 18.18% | 20.00% |
| Clusterer (farthest neighbour) | 99% | 43 | 20.93% | 17.95% | 17.50% | 28.13% |
| Clusterer (farthest neighbour) | 99.5% | 227 | 13.24% | 14.36% | 14.78% | 15.57% |
| Clusterer (nearest neighbour) | 97% | 1 | 0.00% | 0.00% | 0.00% | 0.00% |
| Clusterer (nearest neighbour) | 99% | 5 | 40.00% | 40.00% | 40.00% | 40.00% |
| Clusterer (nearest neighbour) | 99.5% | 7 | 14.29% | 16.67% | 14.29% | 14.29% |
| Clusterer (UPGMA) | 97% | 4 | 25.00% | 25.00% | 25.00% | 25.00% |
| Clusterer (UPGMA) | 99% | 9 | 33.33% | 22.22% | 22.22% | 33.33% |
| Clusterer (UPGMA) | 99.5% | 21 | 45.00% | 52.63% | 47.37% | 47.37% |
This table displays the percentage of monophyletic 16S rRNA OTUs in the genus Aquabacterium at three different identity thresholds (97, 99 and 99.5%). As in Table 1, only OTUs containing at least two sequences and meeting support criteria were considered in the percentage computations, though the total number of OTUs is displayed in the n column. Different phylogenetic methods (columns) and different OTU clustering algorithms (rows) were tested. Cells display the percentage of OTUs that were monophyletic clades.
ML, Maximum likelihood; MP, Maximum parsimony; NJ, Neighbour joining.
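The clustering rows differ mainly in how the distance between two clusters is defined, which is why nearest-neighbour clustering yields far fewer OTUs than farthest-neighbour clustering at the same cut-off. The sketch below illustrates the three linkage rules on a small distance matrix. It is a naive greedy implementation written for this illustration, assuming distances are expressed as 1 minus fractional identity; it is not the algorithm as implemented in MOTHUR, Clusterer or Uclust, and `cluster_otus` is a hypothetical name.

```python
from itertools import combinations
from typing import List, Sequence


def cluster_otus(
    dist: Sequence[Sequence[float]], cutoff: float, linkage: str = "farthest"
) -> List[List[int]]:
    """Greedy agglomerative clustering of sequence indices into OTUs.

    The closest pair of clusters is merged repeatedly until the smallest
    between-cluster distance exceeds `cutoff`.
    """
    aggregate = {
        "nearest": min,                           # single linkage
        "farthest": max,                          # complete linkage
        "average": lambda v: sum(v) / len(v),     # UPGMA-style average
    }[linkage]

    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = aggregate([dist[i][j] for i in clusters[a] for j in clusters[b]])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        if d > cutoff:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


# Three sequences forming a chain: 0 and 2 are each 1% from sequence 1,
# but 2% from each other. The cut-off of 0.01 stands in for 99% identity.
toy_dist = [
    [0.00, 0.01, 0.02],
    [0.01, 0.00, 0.01],
    [0.02, 0.01, 0.00],
]
nearest = cluster_otus(toy_dist, 0.01, "nearest")
farthest = cluster_otus(toy_dist, 0.01, "farthest")
average = cluster_otus(toy_dist, 0.01, "average")
```

In this toy case nearest-neighbour linkage chains all three sequences into one OTU, whereas farthest-neighbour and average linkage keep sequence 2 separate.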
Figure 3. OTUs and ecotypes show distinct habitat associations. An ML tree of a subset of the Vibrio hsp60 sequences (A) and a neighbour-joining tree of the full set of Synechococcus psaA sequences (B). OTUs, ES ecotypes and AdaptML ecotypes are shown. Note that the formatting in the OTU column and the AdaptML column is different. In the OTU column, all leaves marked by the same colour belong to the same OTU. In the AdaptML column, different colours denote different habitats. Each distinct colour bar is its own ecotype, whereas bars of the same colour are ecotypes co-occurring in the same habitat.
Table 3. Performance of OTUs and ecotypes in explaining ecological variation in the Vibrio data set
| Model | Number of classes | Variance explained | AIC |
|---|---|---|---|
| OTU 97% | 68 | 42%* | 764 |
| OTU 99% | 187 | 60%* | 649 |
| OTU 100% | 382 | 73%* | **639** |
| QuickES | 156 | 58%* | **609** |
The variance explained was calculated by dividing the constrained inertia by the total inertia. The two methods that returned the lowest AIC scores are highlighted in bold. Asterisk denotes the significance of P ≤ 0.005 by permutation tests.
AIC, Akaike information criterion.
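The variance-explained column is the ratio of constrained to total inertia. As a rough illustration of why finer classifications explain more variance, the sketch below computes that ratio for a single categorical predictor, assuming inertia is measured as a sum of squared deviations (as in a redundancy-analysis-style decomposition). The ordination method is not stated in this excerpt, so this measure, the toy data and the function name `variance_explained` are all assumptions.

```python
from typing import Dict, List, Sequence


def variance_explained(
    samples: Sequence[Sequence[float]], labels: Sequence[str]
) -> float:
    """Constrained inertia divided by total inertia for one class factor.

    Total inertia: squared deviations of all samples from the grand mean.
    Constrained inertia: squared deviations of class means from the grand
    mean, weighted by class size.
    """
    n, p = len(samples), len(samples[0])
    grand = [sum(row[k] for row in samples) / n for k in range(p)]
    total = sum((row[k] - grand[k]) ** 2 for row in samples for k in range(p))

    groups: Dict[str, List[Sequence[float]]] = {}
    for row, label in zip(samples, labels):
        groups.setdefault(label, []).append(row)

    constrained = 0.0
    for rows in groups.values():
        mean = [sum(r[k] for r in rows) / len(rows) for k in range(p)]
        constrained += len(rows) * sum((mean[k] - grand[k]) ** 2 for k in range(p))
    return constrained / total


# Toy habitat profiles for four isolates (hypothetical data).
profiles = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
one_class = variance_explained(profiles, ["x", "x", "x", "x"])
matched = variance_explained(profiles, ["a", "a", "b", "b"])
mismatched = variance_explained(profiles, ["a", "b", "b", "b"])
```

Because splitting classes further can never lower this ratio, the AIC column is the more informative comparison between models with very different numbers of classes.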
Figure 4. Putative ecotypes are not captured by any single sequence identity cut-off. Graphs display the number of ecotypes whose minimum pairwise sequence identity falls into each of the displayed bins in the skin Aquabacterium (A) and the marine Vibrio (B) data sets.
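The quantity plotted in this figure can be sketched as follows: for each ecotype, take the lowest identity between any two of its aligned sequences, then count ecotypes per identity bin. This is a minimal illustration that assumes pre-aligned sequences of equal length, ignores gapped columns, and uses hypothetical bin edges of 97, 99 and 99.5% (the figure's actual bins are not given in this excerpt).

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence


def identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    cols = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not cols:
        return 0.0
    return 100.0 * sum(x == y for x, y in cols) / len(cols)


def min_pairwise_identity(seqs: Sequence[str]) -> float:
    """Lowest identity between any two sequences (needs at least two)."""
    return min(identity(a, b) for a, b in combinations(seqs, 2))


def bin_label(value: float, edges: Sequence[float]) -> str:
    """Label the first bin whose upper edge exceeds `value`."""
    for edge in edges:
        if value < edge:
            return f"<{edge}"
    return f">={edges[-1]}"


def bin_ecotypes(
    ecotypes: Dict[str, List[str]], edges: Sequence[float] = (97.0, 99.0, 99.5)
) -> Counter:
    """Count ecotypes per bin of minimum pairwise identity; skip singletons."""
    counts: Counter = Counter()
    for seqs in ecotypes.values():
        if len(seqs) >= 2:
            counts[bin_label(min_pairwise_identity(seqs), edges)] += 1
    return counts


# Hypothetical ecotypes: PE1 is identical throughout, PE2 is more divergent.
toy_ecotypes = {
    "PE1": ["ACGTACGTAC", "ACGTACGTAC"],
    "PE2": ["ACGTACGTAC", "ACGTACGTAA", "ACGAACGTAC"],
}
bins = bin_ecotypes(toy_ecotypes)
```

If ecotypes land in several bins, as the figure reports, no single identity cut-off can recover all of them as OTUs.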
Table 4. Ecotype formation and periodic selection rates of three genera
| Genus | n | Ecotype formation rate: mean | Ecotype formation rate: standard deviation | Ecotype formation rate: median | Periodic selection rate: mean | Periodic selection rate: standard deviation | Periodic selection rate: median |
|---|---|---|---|---|---|---|---|
| | 20 | 0.179 | 0.092 | 0.458 | 6.969 | 18.950 | 0.934 |
| | 19 | 0.166 | 0.130 | 0.340 | 0.707 | 0.472 | 0.510 |
| | 10 | 0.312 | 0.370 | 0.105 | 88.467 | 239.166 | 2.780 |
ES (original version) estimates of the mean and median periodic selection and ecotype formation rates for skin Aquabacterium and Diaphorobacter, and for marine Vibrio. The n values are the number of subclades for which the rates were independently calculated.