Alejandro Caro-Quintero, Howard Ochman.
Abstract
For both historical and technical reasons, 16S ribosomal RNA has been the most common molecular marker used to analyze the contents of microbial communities. However, its slow rate of evolution hinders the resolution of closely related bacteria: individual 16S phylotypes, particularly when clustered at 97% sequence identity, conceal vast amounts of species- and strain-level variation. Protein-coding genes, which evolve more quickly, are useful for differentiating among more recently diverged lineages, but their application is complicated by difficulties in designing low-redundancy primers that amplify homologous regions from distantly related taxa. Given the now-common practice of multiplexing hundreds of samples, adopting new genes usually entails the synthesis of large sets of barcoded primers. To circumvent problems associated with the use of protein-coding genes to survey microbial communities, we develop an approach, termed phyloTAGs, that offers an automatic solution for primer design and can be easily adapted to target different taxonomic groups and/or different protein-coding regions. We applied this method to analyze diversity within the gorilla gut microbiome and recovered hundreds of strains that went undetected after deep sequencing of 16S rDNA amplicons. PhyloTAGs provides a powerful way to recover the fine-level diversity within microbial communities and to study the stability and dynamics of bacterial populations.
Keywords: bacterial species; bacterial strain diversity; community profiling; microbiome; phylotypes; population structure
Year: 2015 PMID: 26615218 PMCID: PMC4700968 DOI: 10.1093/gbe/evv234
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Comparison of Diversity Recovered by 16S iTAGs and gyrB phyloTAGs
| Gorilla Sample | Family | Marker | Reads Total | Reads Assigned | 100% OTUs | 99% OTUs | 98% OTUs | 97% OTUs | 88% OTUs |
|---|---|---|---|---|---|---|---|---|---|
| 5248 | | 16S rDNA | 45,699 | 113 | 56 | 31 | 26 | 22 | — |
| 5249 | | 16S rDNA | 36,604 | 116 | 63 | 42 | 34 | 31 | — |
| 5274 | | 16S rDNA | 46,947 | 196 | 81 | 43 | 36 | 29 | — |
| All | | 16S rDNA | 129,250 | 425 | 155 | 73 | 57 | 48 | — |
| 5248 | | gyrB | 4,395 | 4,395 | 3,591 | 1,170 | 444 | 218 | 98 |
| 5249 | | gyrB | 2,033 | 2,033 | 1,691 | 872 | 242 | 171 | 88 |
| 5274 | | gyrB | 2,013 | 2,013 | 1,719 | 929 | 277 | 176 | 89 |
| All | | gyrB | 8,441 | 8,441 | 6,920 | 3,502 | 866 | 366 | 149 |
| 5248 | | 16S rDNA | — | — | — | — | — | — | — |
| 5249 | | 16S rDNA | — | — | — | — | — | — | — |
| 5274 | | 16S rDNA | — | — | — | — | — | — | — |
| All | | 16S rDNA | — | — | — | — | — | — | — |
| 5248 | | gyrB | 376 | 376 | 85 | 70 | 50 | 38 | 22 |
| 5249 | | gyrB | 446 | 446 | 120 | 110 | 70 | 44 | 31 |
| 5274 | | gyrB | 668 | 668 | 283 | 224 | 116 | 65 | 35 |
| All | | gyrB | 1,490 | 1,490 | 488 | 383 | 207 | 109 | 53 |
Correspondence between levels of sequence divergence and estimates of OTU richness for gyrB and 16S rDNA. (A) Association between the degree of sequence identity of 16S rDNA and the gyrB gene for pairs of genomes assigned to the same species. Note that 1) a 16S rDNA sequence identity value of 97%, which is conventionally used to delineate bacterial species, corresponds to 88% nucleotide sequence identity for gyrB, and 2) a 16S rDNA sequence identity value of 99%, which has been used to delineate strains within a designated bacterial species, corresponds to 96% nucleotide sequence identity for gyrB. A total of 604 genomes were examined. (B) Richness of the Lachnospiraceae family within all samples, as estimated by the Chao1 index for gyrB phyloTAGs and the 16S iTAGs at several values of OTU clustering. Parameters were estimated by subsampling the data sets for each marker gene to the same depth with 100 bootstrap replicates. Shaded zones around rarefaction curves represent the 95% confidence intervals. Dashed lines show read numbers obtained after extrapolation to sample sizes larger than the actual total number of reads for the 16S data set. (C) Rarefaction analysis of sample coverage for data sets analyzed in panel (B), using identical subsampling parameters. As in (B), shaded zones around the rarefaction curves represent 95% confidence intervals, and dashed lines indicate trends after extrapolation to sample sizes larger than the actual total number of reads.
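The Chao1 richness estimates and the subsampling with bootstrap replicates described in this caption can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the bias-corrected form of Chao1, subsampling without replacement, percentile-based 95% intervals, the function names, and the toy OTU counts. The record does not state which software or estimator variant the authors used.

```python
import random
from collections import Counter


def chao1(otu_counts):
    """Bias-corrected Chao1 richness estimate from per-OTU read counts."""
    counts = [c for c in otu_counts if c > 0]
    s_obs = len(counts)                    # OTUs actually observed
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def rarefy(otu_counts, depth, rng):
    """Subsample `depth` reads without replacement; return per-OTU counts."""
    reads = [otu for otu, c in enumerate(otu_counts) for _ in range(c)]
    return list(Counter(rng.sample(reads, depth)).values())


def bootstrap_chao1(otu_counts, depth, replicates=100, seed=0):
    """Mean Chao1 and a percentile 95% interval over repeated subsamples."""
    rng = random.Random(seed)
    estimates = sorted(
        chao1(rarefy(otu_counts, depth, rng)) for _ in range(replicates)
    )
    mean = sum(estimates) / replicates
    lower = estimates[int(0.025 * replicates)]
    upper = estimates[min(replicates - 1, int(0.975 * replicates))]
    return mean, lower, upper


# Hypothetical read counts for eight OTUs (not data from the study).
toy_counts = [10, 5, 3, 2, 2, 1, 1, 1]
print(chao1(toy_counts))  # 9.0
print(bootstrap_chao1(toy_counts, depth=15))
```

Subsampling both markers to the same `depth` before estimating richness is what makes the two curves comparable, because Chao1 depends on the number of singletons and doubletons, which changes with sequencing depth.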
Community structure and OTU diversity recovered by gyrB phyloTAGs. Progressive clustering of gyrB phyloTAG sequences at decreasing levels of amino acid identity (based on conceptually translated nucleotide sequences) reconstructs the diversity and number of taxa within the Lachnospiraceae at different phylogenetic depths. Branches in the cladogram are colored according to the level of identity at which sequences are clustered. Four major branches (i.e., lineages) occur when clustered at 55% amino acid identity (black), 5 at 60% identity (turquoise), 12 at 65% identity (blue), and so on, with the amount of diversification at lower taxonomic scales being highly variable among lineages. Black circles at the terminal end of each branch are sized according to the number of sequences affiliated to a cluster at levels >95% amino acid identity, and the five largest clusters are labeled with Roman numerals (I–V). The inset shows the fine-level resolution of OTU variation in Lachnospiraceae cluster I. In this case, progressive clustering of gyrB phyloTAGs was performed on nucleotide sequences and revealed that most of the sequence variation assorts into 99% OTUs, that is, at the level of closely related strains within species.
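The idea of clustering the same sequences at successively lower identity thresholds can be sketched in Python as follows. This is a simplified stand-in rather than the authors' pipeline: it assumes pre-aligned sequences of equal length, uses a greedy centroid scheme (each sequence joins the first centroid it matches), and the function names and toy sequences are hypothetical. Greedy results depend on input order and are not guaranteed to be strictly nested across thresholds.

```python
def identity(a, b):
    """Fraction of identical positions in two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold):
    """Assign each sequence to the first centroid it matches at >= threshold."""
    centroids, clusters = [], []
    for seq in seqs:
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                clusters[i].append(seq)
                break
        else:
            # No centroid is close enough, so this sequence seeds a new OTU.
            centroids.append(seq)
            clusters.append([seq])
    return clusters


def progressive_otu_counts(seqs, thresholds):
    """Number of OTUs obtained at each threshold, from strictest to loosest."""
    return {
        t: len(greedy_cluster(seqs, t))
        for t in sorted(thresholds, reverse=True)
    }


# Hypothetical aligned toy sequences (not data from the study).
toy_seqs = ["ACGTACGTAC", "ACGTACGTAA", "ACGTACGGAA", "TTTTTTTTTT"]
print(progressive_otu_counts(toy_seqs, [1.0, 0.9, 0.8]))
# {1.0: 4, 0.9: 3, 0.8: 2}
```

Lowering the threshold merges OTUs, so the counts shrink from left to right, which is the same pattern seen when moving from strain-level to deeper groupings.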