Ester M Eckert, Diego Fontaneto, Manuela Coci, Cristiana Callieri.
Abstract
The amount of information now available on 16S rRNA sequences of prokaryotes, thanks to high-throughput sequencing, could allow a better understanding of diversity. Nevertheless, applying predetermined thresholds in genetic distances to identify units of diversity (Operative Taxonomic Units, OTUs) may provide biased results. Here we test for the existence of a barcoding gap in several groups of Cyanobacteria, defining units of diversity according to clear differences between within-species and among-species genetic distances in 16S rRNA. The application of a tool developed for animal DNA taxonomy, the Automatic Barcode Gap Detector (ABGD), revealed that a barcoding gap could actually be found in almost half of the datasets that we tested. The identification of units of diversity through this method provided results that were not compatible with those obtained by identifying OTUs with similarity thresholds of 97% or 99%. The main message of our results is a call for caution in estimating diversity from 16S sequences only, given that different subjective choices in the method used to delimit units could provide different results.
Year: 2014 PMID: 25561355 PMCID: PMC4390840 DOI: 10.3390/life5010050
Source DB: PubMed Journal: Life (Basel) ISSN: 2075-1729
Figure 1. Plot of the distribution of pairwise genetic distances in a hypothetical group of organisms of different species. All the genetic distances that belong to pairwise comparisons of organisms within the same species fall in the bars on the left, in this case below 2.5%; all the distances that belong to pairwise comparisons between organisms of different species fall in the bars on the right, in this case between 4.5% and 9.5%; no intermediate distances exist between the two distributions, defining a dataset-specific barcoding gap ranging from 2.5% to 4.5%. In this case, a 97% threshold would provide reliable units of diversity, whereas a 99% threshold would overestimate the actual biological diversity.
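The idea in Figure 1 can be illustrated with a minimal sketch. The six sequences and all distances below are invented for illustration, the helper names `largest_gap` and `count_units` are hypothetical, and the single-linkage grouping is an assumption about how a fixed threshold is applied. This is not the ABGD algorithm, only the simplest form of the underlying idea: find the widest empty interval in the sorted pairwise distances, then compare it with fixed 97% and 99% similarity cut-offs.

```python
# Hypothetical pairwise distances (proportion of differing sites) among six
# invented sequences: a1-a3 stand for one species, b1-b3 for another.
DIST = {
    ("a1", "a2"): 0.005, ("a1", "a3"): 0.020, ("a2", "a3"): 0.022,
    ("b1", "b2"): 0.004, ("b1", "b3"): 0.015, ("b2", "b3"): 0.018,
    ("a1", "b1"): 0.050, ("a1", "b2"): 0.052, ("a1", "b3"): 0.060,
    ("a2", "b1"): 0.051, ("a2", "b2"): 0.055, ("a2", "b3"): 0.062,
    ("a3", "b1"): 0.070, ("a3", "b2"): 0.072, ("a3", "b3"): 0.090,
}


def largest_gap(distances):
    """Return the two consecutive sorted distances that lie furthest apart."""
    ordered = sorted(distances)
    return max(zip(ordered, ordered[1:]), key=lambda pair: pair[1] - pair[0])


def count_units(dist, threshold):
    """Count single-linkage groups, joining pairs no further apart than threshold."""
    parent = {name: name for pair in dist for name in pair}

    def find(name):
        # Follow parent links up to the representative of the group.
        while parent[name] != name:
            name = parent[name]
        return name

    for (a, b), d in dist.items():
        if d <= threshold:
            parent[find(a)] = find(b)
    return len({find(name) for name in parent})


gap_low, gap_high = largest_gap(DIST.values())
print(f"widest gap between {gap_low:.3f} and {gap_high:.3f}")
print("units at 97% similarity:", count_units(DIST, 0.03))
print("units at 99% similarity:", count_units(DIST, 0.01))
```

With these invented values the widest gap lies between 0.022 and 0.050, the 97% cut-off falls inside it and recovers the two species, and the 99% cut-off splits them into four units, mirroring the overestimate described in the caption.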
Summary of the total number of sequences and maximum genetic diversity for each of the 16 datasets, with the existence or not of a clear barcoding gap found through Automatic Barcode Gap Detector (ABGD), the number of units identified by the dataset-specific threshold of ABGD, and the number of Operative Taxonomic Units (OTUs) identified by the 97% and 99% a-priori threshold in genetic diversity. Numbers between brackets identify the number of units when excluding singletons.
| Dataset | Number of Sequences | Maximum Genetic Diversity | Barcoding Gap | ABGD Units | OTU 97% | OTU 99% |
|---|---|---|---|---|---|---|
| | 34 | 0.03 | yes | 2 (2) | 2 (2) | 6 (6) |
| | 508 | 0.05 | no | - | 5 (4) | 33 (28) |
| | 97 | 0.07 | yes | 3 (2) | 3 (3) | 7 (7) |
| | 71 | 0.13 | no | - | 8 (5) | 34 (13) |
| | 88 | 0.11 | no | - | 4 (3) | 25 (9) |
| | 157 | 0.14 | yes | 2 (2) | 3 (3) | 25 (18) |
| | 58 | 0.07 | yes | 6 (4) | 2 (2) | 4 (4) |
| | 400 | 0.26 | no | - | 8 (3) | 153 (67) |
| | 316 | 0.06 | no | - | 2 (2) | 11 (7) |
| | 74 | 0.05 | yes | 2 (2) | 4 (2) | 12 (6) |
| | 323 | 0.04 | no | - | 5 (4) | 19 (14) |
| | 70 | 0.16 | yes | 5 (4) | 3 (2) | 24 (8) |
| | 108 | 0.12 | yes | 13 (6) | 3 (3) | 10 (9) |
| | 1011 | 0.05 | no | - | 3 (2) | 29 (13) |
| | 2448 | 0.17 | no | - | 40 (18) | 286 (99) |
| | 135 | 0.108 | no | - | 1 (1) | 12 (7) |
Figure 2. Plot of the distribution of pairwise genetic distances in six of the 16 datasets. The three datasets on the left (A–C) provided evidence of a barcoding gap through Automatic Barcode Gap Detector (ABGD), whereas no barcoding gap could be found in the three datasets on the right (D–F). Note the different scale bars of the figures.
Results of the statistical assessments of various explanatory variables for the different hypotheses explicitly tested in the study. (A) Generalized Linear Model (GLM) with binomial error for the existence of a barcoding gap as a function of the number of sequences and the maximum genetic diversity in each of the 16 datasets; (B), (C), and (D) GLMs with Poisson error for the number of Automatic Barcode Gap Detector (ABGD) units, Operative Taxonomic Units (OTUs) from the 97% threshold, and OTUs from the 99% threshold, respectively, as a function of the number of sequences. The results from estimates of diversity obtained when excluding singletons are reported between brackets.
| Explanatory variable | Estimate | p |
|---|---|---|
| (A) Barcoding gap | | |
| (intercept) | −14.19 ± 566.51 | 0.898 |
| Number of sequences | −0.03 ± 0.02 | 0.286 |
| Maximum genetic diversity | −22.3 ± 33.3 | 0.504 |
| Number of sequences: Maximum genetic diversity | 0.08 ± 0.14 | 0.578 |
| Shape | 19.66 ± 566.1 | 0.897 |
| (B) ABGD units | | |
| (intercept) | 1.35 ± 0.44 (1.13 ± 0.53) | 0.002 (0.035) |
| Number of sequences | 0.00 ± 0.00 (0.00 ± 0.00) | 0.624 (0.979) |
| (C) OTU 97% | | |
| (intercept) | 1.06 ± 0.15 (0.87 ± 0.17) | <0.0001 (<0.0001) |
| Number of sequences | 0.00 ± 0.00 (0.00 ± 0.00) | <0.0001 (<0.0001) |
| (D) OTU 99% | | |
| (intercept) | 2.99 ± 0.05 (0.22 ± 0.07) | <0.0001 (<0.0001) |
| Number of sequences | 0.00 ± 0.00 (0.01 ± 0.00) | <0.0001 (<0.0001) |
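To make the structure of a model such as (C) concrete, the following is a minimal sketch of a Poisson GLM with log link, fitted by iteratively reweighted least squares to the sequence counts and 97% OTU counts listed in the summary table. It is not the authors' code: the function name `fit_poisson_glm`, the starting values, and the fixed iteration count are assumptions, and the sketch yields point estimates only, with no standard errors or p-values.

```python
import math

# Values copied from the summary table: sequences per dataset and OTUs at 97%.
N_SEQ = [34, 508, 97, 71, 88, 157, 58, 400, 316, 74, 323, 70, 108, 1011, 2448, 135]
OTU97 = [2, 5, 3, 8, 4, 3, 2, 8, 2, 4, 5, 3, 3, 3, 40, 1]


def fit_poisson_glm(x, y, iterations=50):
    """Fit log(E[y]) = a + b*x by iteratively reweighted least squares."""
    mu = [yi + 0.1 for yi in y]  # assumed starting values, close to the data
    a = b = 0.0
    for _ in range(iterations):
        # Working response and weights for the log link.
        z = [math.log(m) + (yi - m) / m for yi, m in zip(y, mu)]
        w = mu
        total = sum(w)
        x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
        z_bar = sum(wi * zi for wi, zi in zip(w, z)) / total
        # Weighted simple linear regression of z on x.
        s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
        s_xz = sum(wi * (xi - x_bar) * (zi - z_bar) for wi, xi, zi in zip(w, x, z))
        b = s_xz / s_xx
        a = z_bar - b * x_bar
        mu = [math.exp(a + b * xi) for xi in x]
    return a, b


intercept, slope = fit_poisson_glm(N_SEQ, OTU97)
print(f"intercept = {intercept:.2f}, slope per sequence = {slope:.5f}")
```

The slope is expressed per additional sequence on the log scale, so a value that rounds to 0.00 at two decimals can still correspond to a clear increase in expected OTU counts across datasets that differ by hundreds or thousands of sequences.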
Figure 3. Rooted Maximum Likelihood phylogenetic trees for the three datasets with a barcoding gap presented in Figure 2. Scale bars represent the number of substitutions per site according to the GTRMIX model. All branches depicted are supported by bootstrap values of 100%. The numbers represent unique sequences within the clade. Colors within each bar correspond to 99% identity Operative Taxonomic Units (OTUs) within the collapsed monophyletic clades and the dark grey squares represent ABGD units (AU). The names correspond to the ones from SILVA database 111 [32].