Sofia Morfopoulou, Vincent Plagnol.
Abstract
MOTIVATION: Deep sequencing of clinical samples is now an established tool for the detection of infectious pathogens, with direct medical applications. The large amount of data generated produces an opportunity to detect species even at very low levels, provided that computational tools can effectively profile the relevant metagenomic communities. Data interpretation is complicated by the fact that short sequencing reads can match multiple organisms and by the lack of completeness of existing databases, in particular for viral pathogens. Here we present metaMix, a Bayesian mixture model framework for resolving complex metagenomic mixtures. We show that the use of parallel Monte Carlo Markov chains for the exploration of the species space enables the identification of the set of species most likely to contribute to the mixture.
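To make the mixture idea concrete, the following is a minimal sketch of how reads that match several organisms can be shared out among a fixed set of candidate species. It is not the metaMix implementation: metaMix is Bayesian and explores sets of species with MCMC, whereas this sketch swaps in plain expectation-maximization for a fixed species set. The function name `estimate_abundances` and the input layout (one row of per-species read likelihoods per read) are assumptions made for illustration.

```python
def estimate_abundances(likelihoods, n_iter=200):
    """Estimate mixture weights for a fixed set of species with EM.

    likelihoods[i][j] is an assumed, precomputed probability of read i
    given species j (for example derived from alignment scores).
    """
    n_species = len(likelihoods[0])
    weights = [1.0 / n_species] * n_species  # start from equal abundances
    for _ in range(n_iter):
        counts = [0.0] * n_species
        for row in likelihoods:
            joint = [w * p for w, p in zip(weights, row)]
            total = sum(joint)
            if total == 0.0:
                continue  # read matches none of the candidate species
            # E-step: split the read among species in proportion to joint
            for j, value in enumerate(joint):
                counts[j] += value / total
        assigned = sum(counts)
        if assigned == 0.0:
            break  # no read could be assigned; keep the current weights
        # M-step: new weights are the expected shares of assigned reads
        weights = [c / assigned for c in counts]
    return weights


# Six reads unique to species 0, two unique to species 1, and two
# ambiguous reads that match both species equally well.
example = [[1.0, 0.0]] * 6 + [[0.0, 1.0]] * 2 + [[0.5, 0.5]] * 2
abundances = estimate_abundances(example)
```

In this toy input the ambiguous reads are drawn toward the species that already has more unique support, which is the behaviour a mixture model is meant to capture.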
Year: 2015 PMID: 26002885 PMCID: PMC4565032 DOI: 10.1093/bioinformatics/btv317
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. (a) Log-likelihood trace plot for single chain MCMC and (b) for PT chain at temperature T = 1. (c) Schematic of parallel tempering. Exchanges are attempted between chains of neighboring temperatures, where Chain1 at
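The exchange step between neighboring chains can be sketched as follows. This excerpt does not show the acceptance rule metaMix uses, so the sketch assumes the standard Metropolis swap criterion for parallel tempering, in which the chain at temperature T targets the posterior raised to the power 1/T. The function names and the list-based chain layout are hypothetical.

```python
import math
import random


def swap_log_ratio(logpost_i, logpost_j, temp_i, temp_j):
    """Log acceptance ratio for exchanging the states of chains i and j.

    logpost_* are untempered log-posterior values of the current states;
    the chain at temperature T is assumed to target posterior ** (1 / T).
    """
    return (1.0 / temp_i - 1.0 / temp_j) * (logpost_j - logpost_i)


def attempt_swap(states, logposts, temps, i, rng=random):
    """Try to exchange chains i and i + 1 in place; return True if accepted."""
    j = i + 1
    log_ratio = swap_log_ratio(logposts[i], logposts[j], temps[i], temps[j])
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        states[i], states[j] = states[j], states[i]
        logposts[i], logposts[j] = logposts[j], logposts[i]
        return True
    return False


# Two chains: the hotter chain (T = 2) currently holds the better state,
# so the swap that moves it to the T = 1 chain is always accepted.
states = ["species set A", "species set B"]
logposts = [-10.0, -5.0]
accepted = attempt_swap(states, logposts, [1.0, 2.0], 0)
```

Hot chains move more freely across the species space, and accepted swaps pass their good states down to the T = 1 chain, whose samples are the ones used for inference.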
Number of species identified for the FAMeS simLC and simMC datasets, as well as sensitivity, specificity and abundance estimates error measures RRMSE and AVGRE
| | metaMix | Pathoscope | MEGAN |
|---|---|---|---|
| | | | |
| Number of species | 116 | 165 | 232 |
| Sensitivity | 99.96 | 99.1 | 100 |
| Specificity | 99.8 | 97.7 | 95.0 |
| RRMSE | 16.9 | 36.6 | 35.9 |
| AVGRE | 8.3 | 29.7 | 18 |
| | | | |
| Number of species | 114 | 147 | 208 |
| Sensitivity | 98.8 | 97.3 | 100 |
| Specificity | 99.8 | 98.4 | 95.9 |
| RRMSE | 21.1 | 185.6 | 32 |
| AVGRE | 8.9 | 53.3 | 16.1 |
| | | | |
| Number of species | 115 | 144 | 208 |
| Sensitivity | 98.5 | 98.2 | 100 |
| Specificity | 99.8 | 98.6 | 95.9 |
| RRMSE | 29.6 | 152.7 | 31.9 |
| AVGRE | 12.9 | 49.2 | 19.3 |
The metaMix results are based on 25 runs.
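The two abundance error measures can be illustrated with a short sketch. The definitions are not given in this excerpt, so the code assumes the common ones: AVGRE as the mean absolute relative error and RRMSE as the root of the mean squared relative error, both expressed as percentages. The function names and the toy abundances are hypothetical.

```python
import math


def relative_errors(estimated, true):
    """Per-species relative error (estimated - true) / true."""
    return [(e - t) / t for e, t in zip(estimated, true)]


def avgre(estimated, true):
    """Average relative error in percent (assumed definition)."""
    errors = relative_errors(estimated, true)
    return 100.0 * sum(abs(err) for err in errors) / len(errors)


def rrmse(estimated, true):
    """Relative root mean square error in percent (assumed definition)."""
    errors = relative_errors(estimated, true)
    return 100.0 * math.sqrt(sum(err * err for err in errors) / len(errors))


# Toy example with three species; the abundances are made up.
true_abundance = [0.50, 0.25, 0.25]
estimated_abundance = [0.55, 0.20, 0.25]
print(round(avgre(estimated_abundance, true_abundance), 2))
print(round(rrmse(estimated_abundance, true_abundance), 2))
```

Under these definitions RRMSE is never smaller than AVGRE for the same set of species, because squaring gives more weight to the largest relative errors.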
simHC community: number of species detected by metaMix as well as sensitivity, specificity, AVGRE, RRMSE for metaMix at various posterior probability cutoffs (default in bold font)
| Cutoff | 0.9 | 0.8 | 0.7 | 0.6 | 0.5 |
|---|---|---|---|---|---|
| | 99.82 | | 99.96 | 100 | 100 |
| | 0.0036 | | 0.0017 | 0 | 0 |
| | 99.86 | | 99.77 | 99.73 | 99.70 |
| | 0.0004 | | 0.0005 | 0.0003 | 0.0001 |
| | 16.69 | 16.85 | 16.73 | 17.50 | 17.48 |
| | 8.20 | 8.31 | 8.16 | 8.60 | 8.56 |
| | 115 | | 117 | 118 | 119 |
| | 1.2 | | 1.2 | 0.7 | 0.3 |
The results are average values based on 25 runs.
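The effect of the posterior probability cutoff can be sketched with a small example. The code below assumes that a species is reported when its posterior probability is at least the cutoff, and that specificity is computed over a fixed set of candidate species; neither detail is spelled out in this excerpt. The species names and probabilities are made up.

```python
def call_species(posteriors, cutoff):
    """Return the species whose posterior probability reaches the cutoff."""
    return {name for name, prob in posteriors.items() if prob >= cutoff}


def sensitivity_specificity(called, truth, candidates):
    """Sensitivity and specificity in percent over a candidate species set."""
    tp = len(called & truth)
    fn = len(truth - called)
    fp = len(called - truth)
    tn = len(candidates - called - truth)
    sensitivity = 100.0 * tp / (tp + fn) if tp + fn else float("nan")
    specificity = 100.0 * tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical posteriors; species D and E are not truly present.
posteriors = {"A": 0.99, "B": 0.85, "C": 0.65, "D": 0.60, "E": 0.10}
truth = {"A", "B", "C"}
candidates = set(posteriors)

for cutoff in (0.9, 0.7, 0.5):
    called = call_species(posteriors, cutoff)
    sens, spec = sensitivity_specificity(called, truth, candidates)
    print(cutoff, sorted(called), round(sens, 1), round(spec, 1))
```

Lowering the cutoff admits more species, so sensitivity can only rise while specificity can only fall, which matches the direction of the trend across the cutoff columns in the table.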
simHC FAMeS dataset
| Read Support | metaMix | Pathoscope | MEGAN |
|---|---|---|---|
| | 114 (0.9) | 131 | 147 |
| | 99.1–99.9 | 98.2–99.1 | 100–98.5 |
| | 116 (0.95) | 131 | 156 |
| | 99.96–99.8 | 98.2–99.1 | 100–98.2 |
| | 124 (1.65) | 141 | 166 |
| | 100–99.5 | 98.2–98.7 | 100–98 |
| | 155 (1.9) | 155 | 188 |
| | 100–98.2 | 99.1–98.2 | 100–97.4 |
Number of species (SD in parentheses), sensitivity and specificity by metaMix (25 runs), Pathoscope and MEGAN, as a function of the minimum number of reads required for each species to appear in the output. metaMix: r = {10, 20, 30, 50} reads, Pathoscope: thetaPrior post-run threshold = {10, 20, 30, 50} reads, MEGAN: ‘Min Support’ + post-run threshold = {10, 20, 30, 50} reads.
Human clinical sample—novel virus
| Taxon identifier | Scientific name | Assigned reads (metaMix) | Posterior probability (metaMix) | Bayes factor (metaMix) | Final best hit read numbers (Pathoscope) |
|---|---|---|---|---|---|
| 374840 | | 60447 | 1 | 140154 | 65327 |
| NA | | 10257 | 1 | NA | NA |
| 9606 | | 214 | 1 | 564 | 554 |
| 28090 | | 94 | 1 | 197 | 126 |
| 469 | | 71 | 0.99 | 216 | 123 |
| 13690 | | 61 | 0.99 | 216 | 135 |
| 133448 | | 47 | 0.91 | 4 | 169 |
| 645687 | | 46 | 1 | 65 | 46 |
| 199310 | | 30 | 1 | 12 | 35 |
| 56946 | | 29 | 1 | 29 | 77 |
| 409438 | | 19 | 0.98 | 14 | 49 |
| 618 | | 16 | 0.92 | 5 | — |
| 1747 | | 13 | 0.97 | 14 | 35 |
| 1282 | | — | — | — | 10 |
| 28211 | | — | — | — | 10 |
| 28037 | | — | — | — | 8 |
| 562 | | — | — | — | 8 |
| 509173 | | — | — | — | 7 |
| 41297 | | — | — | — | 6 |
| 40214 | | — | — | — | 6 |
| 29391 | | — | — | — | 5 |
| 76122 | | — | — | — | 4 |
| 652103 | | — | — | — | 2 |
| 268747 | | — | — | — | 2 |
Comparison of community profiles: metaMix vs. Pathoscope.
Fig. 2. Human clinical sample: novel virus. The reads (short lines) assigned by metaMix to Astrovirus VA1 are aligned to the genome. The longer lines represent the genes of the virus