Matteo Ramazzotti, Luisa Berná, Claudio Donati, Duccio Cavalieri.
Abstract
Non-targeted metagenomics offers the unprecedented possibility of simultaneously investigating the microbial profile and the genetic capabilities of a sample through direct analysis of its entire DNA content. The microbial taxonomic composition is frequently assessed by mapping reads to genomic databases that, although growing, are still limited and biased. Here we present riboFrame, a novel procedure for microbial profiling based on the identification and classification of 16S rDNA sequences in non-targeted metagenomics datasets. Reads overlapping the 16S rDNA genes are identified using Hidden Markov Models, and a taxonomic assignment is obtained by naïve Bayesian classification. All reads identified as ribosomal are coherently positioned in the 16S rDNA gene, allowing the topology of the gene (i.e., the secondary structure and the location of variable regions) to guide the abundance analysis. We tested and verified the effectiveness of our method on simulated ribosomal data, on simulated metagenomes, and on a real dataset. riboFrame exploits the taxonomic potential of the 16S rDNA gene in the context of non-targeted metagenomics, giving an accurate perspective on the microbial profile in metagenomic samples.
Keywords: 16S rDNA gene; community profiling; metagenomics; non-targeted approach; short reads; variable region
Year: 2015 PMID: 26635865 PMCID: PMC4646959 DOI: 10.3389/fgene.2015.00329
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
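The abundance step described in the abstract (reads positioned on the 16S rDNA gene, then counted with the variable regions as a guide) can be illustrated with a minimal Python sketch. This is not riboFrame's code: the `RiboRead` record, the `overlaps` and `abundance` helpers, the example reads, and the region coordinates are all hypothetical, and the only values borrowed from the record are the 0.5 and 0.8 thresholds, which are assumed here to be classifier confidence cutoffs.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class RiboRead:
    """Hypothetical record for a read already placed on the 16S gene model."""
    start: int         # first position covered on the gene model
    end: int           # last position covered on the gene model
    taxon: str         # name assigned at the rank of interest
    confidence: float  # classifier confidence for that name (0..1)


def overlaps(read, region, min_overlap=1):
    """True if the read shares at least min_overlap positions with region."""
    lo, hi = region
    return min(read.end, hi) - max(read.start, lo) + 1 >= min_overlap


def abundance(reads, region, threshold=0.8):
    """Relative abundance from confident reads that overlap the region."""
    kept = [r for r in reads
            if r.confidence >= threshold and overlaps(r, region)]
    counts = Counter(r.taxon for r in kept)
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()} if total else {}


# Illustrative placeholder coordinates, not taken from the paper.
REGION = (500, 800)

reads = [
    RiboRead(510, 600, "Bacteroides", 0.90),
    RiboRead(550, 650, "Bacteroides", 0.85),
    RiboRead(700, 790, "Prevotella", 0.95),
    RiboRead(520, 610, "Prevotella", 0.40),    # dropped: low confidence
    RiboRead(900, 1000, "Escherichia", 0.99),  # dropped: outside the region
]

profile = abundance(reads, REGION, threshold=0.8)
for taxon, fraction in sorted(profile.items()):
    print(f"{taxon}: {fraction:.2f}")
```

Restricting the count to one region means every retained read covers comparable sequence, which is one plausible reading of how gene topology can "guide" the abundance analysis.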
Result of the extraction of ribosomal reads from the simulated datasets “Random” and “Curated.”
| | Random | Curated |
|---|---|---|
| Original # reads | 347174 | 187000 |
| Extracted by HMM | 308686 (88.91%) | 182687 (97.69%) |
| Missed | 38488 (11.09%) | 4313 (2.31%) |
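The percentages in this table follow directly from the read counts. A short Python check, using only the counts reported above (the `pct` helper is just a name chosen for this sketch), reproduces them:

```python
# Read counts copied from the table above.
original = {"Random": 347174, "Curated": 187000}
extracted = {"Random": 308686, "Curated": 182687}


def pct(part, whole):
    """Percentage of whole represented by part, rounded to two decimals."""
    return round(100 * part / whole, 2)


for name in original:
    missed = original[name] - extracted[name]
    print(f"{name}: extracted {pct(extracted[name], original[name])}%, "
          f"missed {missed} ({pct(missed, original[name])}%)")
```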
Results of the evaluation of riboFrame with true ribosomal reads.
| Rank | % Correct | % Wrong | # Reads |
|---|---|---|---|
| Curated | | | |
| Domain | 100 | 0 | 179965 |
| Phylum | 99.91 | 0.09 | 166673 |
| Class | 99.62 | 0.38 | 156945 |
| Order | 98.92 | 1.08 | 137750 |
| Family | 97.38 | 2.62 | 112094 |
| Genus | 90.17 | 9.83 | 74110 |
| Random | | | |
| Domain | 100 | 0 | 305417 |
| Phylum | 99.97 | 0.03 | 293269 |
| Class | 99.88 | 0.12 | 283741 |
| Order | 99.14 | 0.86 | 248281 |
| Family | 97.1 | 2.9 | 193589 |
| Genus | 92.76 | 7.24 | 143287 |
Results of the evaluation of riboFrame with simulated metagenomics datasets.
| Rank | Thr 0.5: Good | Thr 0.5: Error | Thr 0.5: Reads | Thr 0.5: Reads% | Thr 0.8: Good | Thr 0.8: Error | Thr 0.8: Reads | Thr 0.8: Reads% |
|---|---|---|---|---|---|---|---|---|
| 1M: 3228 reads | | | | | | | | |
| Domain | 99.97 | 0 | 3209 | 100.00 | 99.97 | 0 | 3202 | 100.00 |
| Phylum | 99.59 | 0.37 | 2943 | 91.71 | 99.95 | 0 | 1994 | 62.27 |
| Class | 99.69 | 0.27 | 2568 | 80.02 | 99.93 | 0 | 1467 | 45.82 |
| Order | 97.83 | 2.11 | 1985 | 61.86 | 99.57 | 0.32 | 935 | 29.20 |
| Family | 94.14 | 5.8 | 1517 | 47.27 | 98.09 | 1.77 | 678 | 21.17 |
| Genus | 88.25 | 11.64 | 944 | 29.42 | 95.57 | 4.16 | 360 | 11.24 |
| 2M: 6247 reads | | | | | | | | |
| Domain | 99.95 | 0.03 | 6227 | 100.00 | 99.97 | 0.02 | 6206 | 100.00 |
| Phylum | 99.75 | 0.23 | 5711 | 91.71 | 99.95 | 0.03 | 3833 | 61.76 |
| Class | 99.6 | 0.38 | 5005 | 80.38 | 99.97 | 0 | 2872 | 46.28 |
| Order | 98.38 | 1.6 | 3940 | 63.27 | 99.84 | 0.11 | 1867 | 30.08 |
| Family | 94.62 | 5.35 | 2992 | 48.05 | 98.54 | 1.39 | 1367 | 22.03 |
| Genus | 88.03 | 11.92 | 1895 | 30.43 | 95.99 | 3.87 | 722 | 11.63 |
| 5M: 15531 reads | | | | | | | | |
| Domain | 99.98 | 0.01 | 15462 | 100.00 | 99.99 | 0.01 | 15427 | 100.00 |
| Phylum | 99.69 | 0.3 | 14185 | 91.74 | 99.97 | 0.02 | 9626 | 62.40 |
| Class | 99.6 | 0.4 | 12381 | 80.07 | 99.99 | 0 | 6994 | 45.34 |
| Order | 98.16 | 1.83 | 9558 | 61.82 | 99.65 | 0.33 | 4558 | 29.55 |
| Family | 94.01 | 5.98 | 7158 | 46.29 | 98.25 | 1.72 | 3318 | 21.51 |
| Genus | 86.6 | 13.38 | 4551 | 29.43 | 93.69 | 6.25 | 1742 | 11.29 |
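The Reads% column appears to express the reads classified at each rank as a percentage of the reads classified at Domain level for the same dataset and threshold. That reading is inferred from the numbers, not stated in the record. A minimal Python sketch using the 1M dataset at Thr 0.5 reproduces the column under that assumption:

```python
# Reads classified at each rank for the 1M dataset at Thr 0.5 (from the table).
reads_1m_thr05 = {
    "Domain": 3209,
    "Phylum": 2943,
    "Class": 2568,
    "Order": 1985,
    "Family": 1517,
    "Genus": 944,
}

# Assumption: Reads% is relative to the Domain-level count.
domain_reads = reads_1m_thr05["Domain"]
reads_pct = {
    rank: round(100 * n / domain_reads, 2)
    for rank, n in reads_1m_thr05.items()
}

for rank, value in reads_pct.items():
    print(f"{rank}: {value:.2f}")
```

The steady drop from Phylum to Genus shows the trade-off visible in the table: a stricter threshold or a deeper rank lowers the error but leaves fewer reads to build the profile from.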
Result of the extraction of ribosomal reads from the “Curated” ribosomal reads set (187000 reads) by various extractors.
| Extractor | Recruited | Error% | Time (min)∗∗ |
|---|---|---|---|
| riboFrame∗ | 182687 | 2.31 | 30 |
| Infernal | 184861 | 1.14 | 2860 |
| V-Xtractor | 184161 | 1.52 | 810 |
| Metaxa | 159632 | 15.20 | 525 |
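To put the run times in perspective, the following Python sketch derives two quantities from the table: the share of the 187000 input reads that each tool recruited, and each tool's run time relative to riboFrame. Both are derived here for illustration and are not columns of the original table; the variable names are arbitrary.

```python
TOTAL_READS = 187000  # size of the "Curated" ribosomal read set

# (recruited reads, run time in minutes), copied from the table above.
extractors = {
    "riboFrame": (182687, 30),
    "Infernal": (184861, 2860),
    "V-Xtractor": (184161, 810),
    "Metaxa": (159632, 525),
}

baseline_minutes = extractors["riboFrame"][1]

summary = {}
for name, (recruited, minutes) in extractors.items():
    summary[name] = {
        "recruited_pct": round(100 * recruited / TOTAL_READS, 2),
        "time_ratio": round(minutes / baseline_minutes, 1),
    }

for name, row in summary.items():
    print(f"{name}: {row['recruited_pct']}% recruited, "
          f"{row['time_ratio']}x riboFrame's run time")
```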