Marcel Martínez-Porchas, Enrique Villalpando-Canchola, Francisco Vargas-Albores.
Abstract
The classification performance of Kraken was evaluated in terms of sensitivity and specificity when using short and long 16S rRNA sequences. A total of 440,738 sequences from bacteria with complete taxonomic classifications were downloaded from the high-quality ribosomal RNA database SILVA. The amplicons (86,371 sequences; 1450 bp) produced by virtual PCR with primers covering the V1-V9 region of the 16S rRNA gene were used as the reference. Virtual PCRs of the internal fragments V3-V4, V4-V5 and V3-V5 were then performed, yielding 81,523, 82,334 and 82,998 amplicons, respectively. Differences in the depth of taxonomic classification were detected among the internal fragments. For instance, the sensitivity and specificity of sequences classified down to subspecies level were higher when the largest internal fragment (V3-V5) was used (54.0 and 74.6%, respectively) than with the V3-V4 (45.1 and 66.7%) and V4-V5 (41.8 and 64.6%) fragments. A similar pattern was detected for sequences classified to more superficial taxonomic categories (e.g. family, order, class). The results also show that internal fragments lose specificity and that some can be misclassified at the deepest taxonomic levels (i.e. species or subspecies). It is concluded that the larger V3-V5 fragment could be considered for massive high-throughput sequencing, as it reduces the loss of sensitivity and specificity.
Keywords: Bioinformatics; Biological sciences; Genetics; Microbiology
Year: 2016 PMID: 27699286 PMCID: PMC5037269 DOI: 10.1016/j.heliyon.2016.e00170
Source DB: PubMed Journal: Heliyon ISSN: 2405-8440
Primers used for the amplification of the complete 16S rRNA gene (V1–V9) and the internal fractions (V3–V4, V4–V5 and V3–V5).
| Name | Sequences | References |
|---|---|---|
| Large | Fw: AGAGTTTGATYMTGGCTCAG | ( |
| V3–V5 | Fw: CCTACGGGNGGCNGCA | ( |
| V4–V5 | Fw: GCCAGCAGCCGCGGTAA | ( |
| V3–V4 | Fw: CCTACGGGNGGCWGCAG | ( |
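
The primers above contain IUPAC degenerate codes (Y, M, N, W), so locating a priming site means matching a set of sequences rather than one literal string. The following is a minimal sketch of that matching step only, not the virtual PCR procedure used in the study: the helper names and the template sequence are hypothetical, only exact matches are considered, and the reverse primer is not handled.

```python
import re

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def primer_to_regex(primer):
    """Translate a degenerate primer into a compiled regular expression."""
    parts = []
    for code in primer.upper():
        bases = IUPAC[code]
        # Unambiguous bases stay literal; degenerate codes become character classes.
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return re.compile("".join(parts))

def find_forward_site(primer, sequence):
    """Return the 0-based start of the first exact primer match, or None."""
    match = primer_to_regex(primer).search(sequence.upper())
    return match.start() if match else None

# Hypothetical template: filler, one concrete instance of the V3–V4 forward
# primer (N -> A, W -> A), then more filler.
template = "TTGACG" + "CCTACGGGAGGCAGCAG" + "TGGGGAATATT"
print(find_forward_site("CCTACGGGNGGCWGCAG", template))
```

A full in silico PCR would also search for the reverse-complemented reverse primer and usually tolerate a small number of mismatches; both are left out here.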
Fig. 1. Size distribution of the amplicons obtained after virtual PCR of the complete 16S rRNA gene (V1–V9). Amplicons with sizes outside the range of the mean ± 2 standard deviations were excluded.
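
The size filter described in the Fig. 1 caption can be sketched as follows. This is an illustration under stated assumptions: the function name and the example lengths are hypothetical, and the population standard deviation is used because the source does not say which estimator was applied.

```python
from statistics import mean, pstdev

def filter_by_length(lengths, k=2.0):
    """Keep only lengths within mean ± k standard deviations."""
    mu = mean(lengths)
    sd = pstdev(lengths)  # assumption: population SD; the source does not specify
    low, high = mu - k * sd, mu + k * sd
    return [n for n in lengths if low <= n <= high]

# Hypothetical amplicon lengths in bp: nine typical amplicons and one short outlier.
lengths = [1450] * 9 + [900]
print(filter_by_length(lengths))
```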
Fig. 2. Cumulative proportion of taxonomic levels assigned to each amplicon type (complete gene or internal fractions) by the Kraken classifier. Unilarge is the set of sequences that remains after removing redundant amplicons and that falls within the 99% confidence range.
Proportion of sequences from the internal fragments (V3–V4, V4–V5, V3–V5) that received exactly the same classification as their unilarge (complete 16S) perfect matches. The proportion of sequences with a different classification result is also shown.
| | V3–V4 | V3–V5 | V4–V5 |
|---|---|---|---|
| Equal to large | 36,973 (45.3%) | 43,485 (52.4%) | 33,794 (41.0%) |
| Different to large | 44,170 (54.2%) | 39,243 (47.3%) | 47,542 (57.8%) |
| No reaction | 380 (0.5%) | 270 (0.3%) | 998 (1.2%) |
| Total | 81,523 | 82,998 | 82,334 |
Fig. 3. Classification output obtained with the internal fractions (V3–V4, V4–V5, V3–V5) relative to the complete 16S rRNA gene sequences (V1–V9). Sequences with the same classification result for the complete sequence and the internal fragment are labeled “Equal”; sequences of internal fractions with a matching classification that stops at a more superficial level are labeled “Less”; and sequences whose taxonomic classification differs from that of the respective complete sequence (V1–V9) are labeled “Low”.
Sensitivity and specificity obtained with the different internal 16S rRNA gene fractions, using the complete sequence of the 16S rRNA gene as the reference. Results were calculated from the number of large sequences whose deepest classification was subspecies, species or genus.
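
The three labels in the Fig. 3 caption can be expressed as a comparison of two lineages. The sketch below is one possible reading of those definitions, not the authors' script: lineages are assumed to be lists ordered from domain downward, the function name and example taxa are hypothetical, and a fragment classified deeper than the complete sequence is counted as “Low” by assumption.

```python
def compare_lineages(full, fragment):
    """Label a fragment's lineage relative to the complete-gene lineage."""
    if fragment == full:
        return "Equal"
    # Same path through the taxonomy, but stopping at a shallower rank.
    if len(fragment) < len(full) and full[:len(fragment)] == fragment:
        return "Less"
    # Anything else disagrees with the complete-gene classification
    # (assumption: this includes fragments classified deeper than the reference).
    return "Low"

# Hypothetical lineage for one complete 16S sequence.
full = ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
        "Vibrionales", "Vibrionaceae", "Vibrio"]

print(compare_lineages(full, full))                           # same lineage
print(compare_lineages(full, full[:4]))                       # stops at order
print(compare_lineages(full, full[:5] + ["Photobacterium"]))  # different genus
```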
| Sequences | Index | Subspecies | Species | Genus |
|---|---|---|---|---|
| Unilarge | Total (Z) | 41,685 | 57,616 | 68,331 |
| V3–V5 | Sensitivity | 54.02% | 56.41% | 67.57% |
| | Specificity | 74.62% | 86.34% | 89.54% |
| | True Positives | 22,520 | 32,501 | 46,173 |
| | False Positives | 7,658 | 5,140 | 5,395 |
| V4–V5 | Sensitivity | 41.83% | 44.00% | 57.27% |
| | Specificity | 64.62% | 78.86% | 83.94% |
| | True Positives | 17,436 | 25,351 | 39,135 |
| | False Positives | 9,546 | 6,795 | 7,485 |
| V3–V4 | Sensitivity | 45.10% | 48.32% | 60.38% |
| | Specificity | 66.67% | 80.13% | 85.11% |
| | True Positives | 18,801 | 27,842 | 41,255 |
| | False Positives | 9,400 | 6,902 | 7,216 |
Note: Let Z be the total number of sequences assigned by Kraken to the different taxonomic levels.
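
The percentages in the table can be reproduced from its counts. The formulas below are inferred from the tabulated values rather than stated in this record: sensitivity appears to be true positives divided by Z, and the reported specificity appears to be true positives divided by the sum of true and false positives. The function names are hypothetical.

```python
def sensitivity(true_pos, total_z):
    """Percentage of the Z reference sequences recovered by the fragment."""
    return 100.0 * true_pos / total_z

def specificity(true_pos, false_pos):
    """Percentage of fragment assignments at this level that match the reference.

    Assumption: this TP / (TP + FP) form is inferred from the table values.
    """
    return 100.0 * true_pos / (true_pos + false_pos)

# V3–V5 fragment at subspecies level, counts taken from the table above.
print(round(sensitivity(22520, 41685), 2))  # 54.02
print(round(specificity(22520, 7658), 2))   # 74.62
```

The same two calls reproduce the other rows, for example 17,436 true positives and 9,546 false positives for V4–V5 at subspecies level give 41.83% and 64.62%.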