Nishal Kumar Pinna, Anirban Dutta, Mohammed Monzoorul Haque, Sharmila S Mande.
Abstract
Background: Next-generation sequencing (NGS) technologies have enabled probing of microbial diversity in different environmental niches with unprecedented sequencing depth. However, due to read-length limitations of popular NGS technologies, 16S amplicon sequencing-based microbiome studies rely on targeting short stretches of the 16S rRNA gene encompassing a selection of variable (V) regions. In most cases, such a short stretch constitutes a single V-region or a couple of V-regions placed adjacent to each other on the 16S rRNA gene. Given that different V-regions have different resolving ability with respect to various taxonomic groups, selecting the optimal V-region (or a combination thereof) remains a challenge.
Keywords: amplicon sequencing; metagenomics 16S; microbiome analysis; paired-end sequencing; taxonomic profiling
Year: 2019 PMID: 31354793 PMCID: PMC6640118 DOI: 10.3389/fgene.2019.00653
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Combinatorial strategy for targeting multiple pair-wise combinations of non-contiguous (or contiguous) V-regions. The strategy relies on obtaining taxonomic abundance profiles of a microbial community from two paired-end sequencing experiments, each of which targets a different pair-wise combination of V-regions. The two taxonomic profiles are then combined based on the pre-calculated accuracies of individual V-regions (targeted in the two experiments) in resolving each of the taxonomic groups under consideration.
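The combination step described in the caption can be illustrated with a minimal sketch. The caption does not state the exact combination rule, so the scheme below is an assumption: each experiment's abundance estimate for a taxon is weighted by the pre-calculated accuracy of its targeted V-regions for that taxon, and the result is renormalized to 100%. The function name `combine_profiles`, the taxon labels, and all numbers are hypothetical.

```python
from typing import Dict

Profile = Dict[str, float]  # taxon -> relative abundance (%) or accuracy (0-1)


def combine_profiles(profile_a: Profile, profile_b: Profile,
                     acc_a: Profile, acc_b: Profile) -> Profile:
    """Accuracy-weighted merge of two taxonomic profiles (assumed scheme)."""
    merged: Profile = {}
    for taxon in set(profile_a) | set(profile_b):
        w_a = acc_a.get(taxon, 0.0)
        w_b = acc_b.get(taxon, 0.0)
        if w_a + w_b == 0.0:
            # No accuracy information for this taxon: use a plain average.
            w_a = w_b = 1.0
        merged[taxon] = (w_a * profile_a.get(taxon, 0.0)
                         + w_b * profile_b.get(taxon, 0.0)) / (w_a + w_b)
    total = sum(merged.values())
    if total == 0.0:
        return merged
    # Renormalize so the combined abundances sum to 100%.
    return {taxon: 100.0 * value / total for taxon, value in merged.items()}


# Hypothetical example: experiment A cannot resolve taxon_Z at all.
profile_a = {"taxon_X": 60.0, "taxon_Y": 40.0, "taxon_Z": 0.0}
profile_b = {"taxon_X": 50.0, "taxon_Y": 30.0, "taxon_Z": 20.0}
acc_a = {"taxon_X": 0.9, "taxon_Y": 0.8, "taxon_Z": 0.0}
acc_b = {"taxon_X": 0.9, "taxon_Y": 0.8, "taxon_Z": 1.0}
combined = combine_profiles(profile_a, profile_b, acc_a, acc_b)
```

In this toy case, taxon_Z is missed by experiment A but recovered in the combined profile because experiment B's V-regions carry all of the weight for it.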
Figure 2. Taxonomic classification accuracies at genus level for different variable regions. Plot depicting the percentage of 16S rRNA genes present in the RDP database that could be correctly classified utilizing different variable (V) regions (see Methods). Correct classifications obtained using full-length 16S sequences are also depicted for comparison. Taxonomic classification accuracy at genus level has been considered in this plot and has been cumulated and depicted at the phylum level (only for the five most represented phyla in the downloaded RDP sequences).
Figure 3. Taxonomic classification accuracies at species level for different variable regions. Plot depicting the average taxonomic classification accuracies obtained at species level using different pair-wise combinations of V-regions (both contiguous and non-contiguous) drawn from the 16S rRNA genes. 16S rRNA genes used for the evaluation were retrieved from the RDP database (see Methods).
Figure 4. Taxonomic classification accuracies obtained using different pair-wise combinations of V-regions (contiguous as well as non-contiguous). Accuracy of taxonomic assignments has been evaluated at the species level and cumulated at phylum level for representation (only for the five most represented phyla in the downloaded RDP sequences). Combinations of V-regions achieving a classification accuracy of ≥ 70% (averaged for the depicted phyla) are shown. Combinations of contiguously placed V-regions have been indicated with an asterisk (*).
Taxonomic classification accuracies obtained using different pair-wise combinations of V-regions (both contiguous and non-contiguous), evaluated for mock microbiome datasets, each consisting of 10,000 randomly selected 16S rRNA genes from five different 16S gene pools. Values are classification accuracies (%) at species level, averaged over five mock datasets from each 16S gene pool.
| Combination of V-regions | Mock datasets from 16S gene pool 1 | Mock datasets from 16S gene pool 2 | Mock datasets from 16S gene pool 3 | Mock datasets from 16S gene pool 4 | Mock datasets from 16S gene pool 5 | Average accuracy |
|---|---|---|---|---|---|---|
| V1+V4 | 77.29 | 79.47 | 72.79 | 75.90 | 80.48 | 77.19 |
| V1+V3 | 74.69 | 78.16 | 77.52 | 74.76 | 80.08 | 77.04 |
| V1+V8 | 76.03 | 77.96 | 73.24 | 75.72 | 79.32 | 76.46 |
| V1+V7 | 77.20 | 78.33 | 70.37 | 77.34 | 78.60 | 76.37 |
| V1+V6 | 72.46 | 77.34 | 69.73 | 78.25 | 76.90 | 74.94 |
| V1+V5 | 70.89 | 74.24 | 69.16 | 73.37 | 76.40 | 72.81 |
| V1+V9 | 71.74 | 71.41 | 71.33 | 73.95 | 75.57 | 72.80 |
| V2+V4 | 69.07 | 75.07 | 72.76 | 70.99 | 73.55 | 72.29 |
| V2+V8 | 68.26 | 74.60 | 73.33 | 70.66 | 73.27 | 72.02 |
| V2+V6 | 66.84 | 74.54 | 72.60 | 72.19 | 72.67 | 71.77 |
| V2+V7 | 68.34 | 72.76 | 72.73 | 71.17 | 71.30 | 71.26 |
| V2V3* | 61.53 | 71.52 | 72.03 | 66.31 | 73.92 | 69.06 |
| V2+V9 | 65.03 | 68.85 | 71.60 | 66.32 | 71.81 | 68.72 |
| V1V2* | 64.20 | 70.29 | 66.81 | 65.44 | 72.40 | 67.83 |
| V3+V8 | 68.47 | 61.80 | 69.66 | 66.59 | 67.82 | 66.87 |
| V3+V7 | 68.41 | 61.60 | 71.05 | 66.80 | 65.93 | 66.76 |
| V2+V5 | 61.38 | 68.19 | 68.42 | 65.36 | 69.34 | 66.54 |
| V3+V6 | 63.26 | 59.91 | 68.53 | 67.04 | 65.15 | 64.78 |
| V3+V9 | 63.63 | 55.85 | 67.20 | 65.94 | 63.83 | 63.29 |
| V3+V5 | 60.94 | 56.74 | 65.79 | 62.91 | 62.49 | 61.77 |
Accuracy of taxonomic assignments was evaluated at the species level, treating the assignments obtained with full-length 16S sequences as correct. The top 20 combinations in terms of average classification accuracy are shown. Combinations of contiguous V-regions are marked with an asterisk (*).
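The accuracy measure described in the note above, in which full-length 16S assignments serve as the reference, can be sketched as follows. This is only an illustration under stated assumptions: the function name `classification_accuracy`, the sequence identifiers, and the species labels are hypothetical, and the actual classifier and matching rules are those given in the paper's Methods. The final lines reproduce the average reported for V1+V4 from the five per-pool values in the table.

```python
from typing import Dict


def classification_accuracy(region_calls: Dict[str, str],
                            full_length_calls: Dict[str, str]) -> float:
    """Percentage of sequences whose V-region call matches the full-length call."""
    if not full_length_calls:
        raise ValueError("no reference assignments supplied")
    matches = sum(1 for seq_id, species in full_length_calls.items()
                  if region_calls.get(seq_id) == species)
    return 100.0 * matches / len(full_length_calls)


# Hypothetical toy assignments (sequence id -> species label).
full_length = {"s1": "sp_A", "s2": "sp_B", "s3": "sp_C", "s4": "sp_A"}
v1_v4_calls = {"s1": "sp_A", "s2": "sp_B", "s3": "sp_D", "s4": "sp_A"}
accuracy = classification_accuracy(v1_v4_calls, full_length)  # 3 of 4 match

# Average over the five gene pools, using the V1+V4 row of the table.
v1_v4_pools = [77.29, 79.47, 72.79, 75.90, 80.48]
average_accuracy = round(sum(v1_v4_pools) / len(v1_v4_pools), 2)
```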
Figure 5. Evaluation of taxonomic classification efficiency on simulated microbiomes. Taxonomic classification efficiency of different combinations of V-regions evaluated on nine simulated microbiome datasets mimicking different environmental niches. Taxonomic classification accuracy, in terms of the percentage of correct assignments at species level, is indicated in the heatmap. The color scale (1–36) depicts the performance rank of different combinations of V-regions (total of 36 combinations) in terms of taxonomic classification accuracy for each of the simulated microbiomes (presented in columns).
Utility of proposed combinatorial approach in obtaining refined taxonomic profiles compared to taxonomic abundance estimates obtained with pair-wise combinations of V-regions.
| Species | Abundance (%) estimated with full-length 16S reads | Abundance (%) estimated with 10,000 V1+V4 paired-end reads | Abundance (%) estimated with 10,000 V1+V5 paired-end reads | Abundance (%) estimated with combinatorial approach using 5,000 V1+V4 and 5,000 V1+V5 reads |
|---|---|---|---|---|
| | 11.17 | 12.24 | 12.25 | 11.06 |
| | 10.69 | 11.97 | 11.24 | 11.36 |
| | 6.73 | 0.00 | 6.72 | 7.28 |
| | 6.47 | 6.98 | 6.76 | 6.96 |
| | 5.35 | 6.06 | 3.53 | 4.71 |
| | 4.23 | 4.44 | 4.33 | 4.55 |
| | 3.98 | 4.03 | 4.13 | 4.00 |
| | 3.45 | 3.73 | 3.71 | 3.51 |
| | 2.41 | 2.70 | 2.84 | 2.62 |
| | 2.18 | 2.50 | 2.26 | 2.16 |
| | 2.15 | 2.51 | 2.24 | 2.15 |
| | 2.09 | 2.35 | 2.13 | 2.11 |
| | 2.08 | 2.30 | 2.32 | 2.32 |
| | 2.07 | 2.10 | 2.13 | 2.03 |
| | 2.04 | 2.43 | 2.27 | 2.21 |
| | 2.04 | 2.26 | 2.01 | 2.12 |
| | 2.03 | 1.92 | 2.50 | 2.17 |
| | 2.02 | 2.03 | 2.04 | 1.93 |
| | 2.01 | 2.30 | 2.00 | 2.08 |
| | 2.01 | 2.21 | 2.18 | 2.07 |
| | 1.98 | 2.16 | 0.00 | 1.70 |
| | 1.96 | 2.02 | 2.08 | 2.03 |
| | 1.74 | 1.94 | 1.91 | 1.69 |
| | 1.74 | 2.16 | 1.91 | 1.86 |
| | 1.50 | 1.74 | 1.56 | 1.38 |
| | 1.00 | 1.00 | 1.06 | 1.11 |
| | 0.99 | 0.82 | 0.78 | 0.80 |
| | 0.90 | 1.03 | 1.04 | 0.74 |
| | 0.89 | 1.07 | 0.87 | 0.92 |
| | 0.82 | 0.99 | 0.84 | 0.75 |
| | 0.78 | 0.32 | 0.51 | 0.32 |
| | 0.74 | 0.81 | 0.83 | 0.69 |
| | 0.70 | 0.55 | 0.86 | 0.61 |
| | 0.69 | 0.59 | 0.00 | 0.00 |
| | 0.57 | 0.56 | 0.62 | 0.71 |
| | 0.56 | 0.00 | 0.00 | 0.00 |
| | 0.53 | 0.50 | 0.54 | 0.57 |
| | 0.47 | 0.58 | 0.50 | 0.46 |
| | 0.46 | 0.37 | 0.47 | 0.40 |
| | 0.45 | 0.41 | 0.48 | 0.61 |
| | 0.43 | 0.44 | 0.46 | 0.51 |
| | 0.43 | 0.47 | 0.54 | 0.43 |
| | 0.39 | 0.40 | 0.42 | 0.35 |
| | 0.34 | 0.37 | 0.42 | 0.36 |
| | 0.32 | 0.40 | 0.37 | 0.38 |
| | 0.32 | 0.25 | 0.32 | 0.29 |
| | 0.30 | 0.21 | 0.25 | 0.17 |
| | 0.28 | 0.33 | 0.30 | 0.33 |
| | 0.28 | 0.22 | 0.19 | 0.17 |
| | 0.25 | 0.21 | 0.29 | 0.27 |
Results in the table pertain to the simulated human gut microbiome dataset Gut1.
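One way to compare the three estimates against the full-length reference is a mean absolute deviation per column. The sketch below is illustrative only: the metric is an assumption rather than the measure used by the authors, the helper name `mean_abs_deviation` is hypothetical, and only the first five rows of the table are used, so it says nothing about the remaining rows.

```python
from typing import List, Sequence


def mean_abs_deviation(estimate: Sequence[float],
                       reference: Sequence[float]) -> float:
    """Mean absolute difference (percentage points) between two profiles."""
    if len(estimate) != len(reference) or not reference:
        raise ValueError("profiles must be non-empty and of equal length")
    return sum(abs(e - r) for e, r in zip(estimate, reference)) / len(reference)


# First five rows of the table, in table order (abundance, %).
full_length: List[float] = [11.17, 10.69, 6.73, 6.47, 5.35]
v1_v4: List[float] = [12.24, 11.97, 0.00, 6.98, 6.06]
v1_v5: List[float] = [12.25, 11.24, 6.72, 6.76, 3.53]
combinatorial: List[float] = [11.06, 11.36, 7.28, 6.96, 4.71]

deviation = {
    "V1+V4": mean_abs_deviation(v1_v4, full_length),
    "V1+V5": mean_abs_deviation(v1_v5, full_length),
    "combinatorial": mean_abs_deviation(combinatorial, full_length),
}
```

For these five rows, the combinatorial estimate deviates least from the full-length profile, largely because the third row is missed entirely by V1+V4 (0.00%) but recovered in the combined profile.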