Yujing Yao, Zhezhen Jin, Joseph H Lee
Abstract
BACKGROUND: With advances in next-generation sequencing technologies, researchers can now rapidly examine the composition of samples from humans and their surroundings. To improve the accuracy of taxonomic assignment in metagenomic samples, we developed a method that allows the mismatch probability to differ across genomes.
Keywords: EM algorithm; Metagenomics; Taxonomic assignment
Year: 2018 PMID: 30373533 PMCID: PMC6206629 DOI: 10.1186/s12863-018-0680-1
Source DB: PubMed Journal: BMC Genet ISSN: 1471-2156 Impact factor: 2.797
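The abstract describes a model in which each genome may have its own mismatch probability, and it lists the EM algorithm as a keyword. The sketch below shows how such a model could be fitted. It is not the authors' TADIP implementation, and it rests on several assumptions:

- Each read has already been aligned to a set of candidate genomes.
- The input format `reads[r] = {genome_index: (mismatches, aligned_length)}` is hypothetical.
- Mismatch counts follow a binomial distribution with a genome-specific probability `p[g]`.
- Genomes are mixed with proportions `pi[g]`.
- The starting values and clamping bounds are arbitrary illustrative choices.

```python
import math


def em_genome_specific_mismatch(reads, n_genomes, max_iter=500, tol=1e-10):
    """EM for a read-to-genome mixture where genome g has its own mismatch probability p[g].

    Hypothetical input: reads[r] is a dict {genome_index: (mismatches, aligned_length)}
    listing only the candidate genomes that read r aligned to.
    Returns (pi, p, resp): genome proportions, mismatch probabilities, per-read posteriors.
    """
    pi = [1.0 / n_genomes] * n_genomes  # mixing proportions
    p = [0.01] * n_genomes  # assumed starting mismatch probability for every genome
    resp, prev_ll = [], -math.inf
    for _ in range(max_iter):
        # E-step: posterior that read r came from genome g (binomial coefficient cancels).
        resp, ll = [], 0.0
        for hits in reads:
            logw = {
                g: math.log(pi[g]) + m * math.log(p[g]) + (L - m) * math.log1p(-p[g])
                for g, (m, L) in hits.items()
            }
            top = max(logw.values())
            total = sum(math.exp(v - top) for v in logw.values())
            ll += top + math.log(total)  # observed-data log-likelihood up to a constant
            resp.append({g: math.exp(v - top) / total for g, v in logw.items()})
        # M-step: weighted proportions and weighted mismatch rates per genome.
        w = [0.0] * n_genomes
        mism = [0.0] * n_genomes
        length = [0.0] * n_genomes
        for hits, z in zip(reads, resp):
            for g, (m, L) in hits.items():
                w[g] += z[g]
                mism[g] += z[g] * m
                length[g] += z[g] * L
        pi = [max(x / len(reads), 1e-12) for x in w]  # floor keeps log(pi) finite
        s = sum(pi)
        pi = [x / s for x in pi]
        # Clamp p away from 0 so log(p) stays finite; genomes with no reads keep their old p.
        p = [min(max(mism[g] / length[g], 1e-6), 0.5) if length[g] > 0 else p[g]
             for g in range(n_genomes)]
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pi, p, resp


# Toy example: the last read aligns to both genomes but fits genome 1 far better.
reads = [
    {0: (1, 100)},
    {0: (2, 100)},
    {1: (0, 100)},
    {0: (6, 100), 1: (0, 100)},
]
pi, p, resp = em_genome_specific_mismatch(reads, n_genomes=2)
assignments = [max(z, key=z.get) for z in resp]  # assign each read to its most probable genome
```

In the toy example, genome 0 settles at a mismatch rate near 1.5% and genome 1 at the lower clamp. The ambiguous last read is therefore assigned to genome 1. A model that forced one shared mismatch probability would weigh that read's six mismatches differently, and this is the contrast the abstract draws.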
Simulation results for evaluation of the tests
| Measure | Burden Test | Variance Component Test |
|---|---|---|
| Type I error | 0.05 | 0.02 |
| Power (1 − β) | 0.99 | 1.00 |
Type I error and power estimated from 100 simulated datasets with 1000 reads per dataset
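Empirical type I error and power in a simulation like this are rejection rates. They are the fraction of simulated datasets whose test p-value falls below the significance level, with datasets generated under the null hypothesis for type I error and under the alternative for power.

The sketch below computes both rates. It assumes a nominal level of 0.05, which is not stated in the table. The p-values are random placeholders, not the paper's simulation output. The helper names `rejection_rate` and `monte_carlo_se` are hypothetical.

```python
import math
import random


def rejection_rate(p_values, alpha=0.05):
    """Fraction of simulated datasets whose p-value falls below alpha."""
    return sum(pv < alpha for pv in p_values) / len(p_values)


def monte_carlo_se(rate, n_datasets):
    """Binomial standard error of a rejection rate estimated from n_datasets replicates."""
    return math.sqrt(rate * (1 - rate) / n_datasets)


rng = random.Random(2018)
# Placeholder p-values only: Uniform(0, 1) mimics a valid test under H0.
null_p = [rng.random() for _ in range(100)]
# Placeholder p-values standing in for datasets simulated under H1 (all below 0.01 here).
alt_p = [0.01 * rng.random() for _ in range(100)]

type_i_error = rejection_rate(null_p)  # empirical type I error
power = rejection_rate(alt_p)  # empirical power
se_at_nominal = monte_carlo_se(0.05, 100)  # about 0.022
```

With only 100 datasets, the Monte Carlo standard error of a rate near 0.05 is about 0.022. An estimated type I error of 0.02 is therefore roughly 1.4 standard errors below a nominal 0.05.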
Results for simulation study: Long reads. Proportions of reads correctly (TP) and incorrectly (FP) assigned to the taxonomy tree at each rank by the two methods, for reads with an average length of 500 bp
| Rank | simLC TAMER TP | simLC TAMER FP | simLC TADIP TP | simLC TADIP FP | simMC TAMER TP | simMC TAMER FP | simMC TADIP TP | simMC TADIP FP | simHC TAMER TP | simHC TAMER FP | simHC TADIP TP | simHC TADIP FP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Species | 1.0000 | 0.0000 | 1.0000 | 0.0000 | 0.9229 | 0.1731 | 0.9229 | 0.1004 | 0.9565 | 0.0273 | 0.9565 | 0.0165 |
| Genus | 1.0000 | 0.0000 | 1.0000 | 0.0000 | 0.9233 | 0.1731 | 0.9233 | 0.0999 | 0.9644 | 0.0194 | 0.9644 | 0.0087 |
| Family | 1.0000 | 0.0000 | 1.0000 | 0.0000 | 0.9995 | 0.0965 | 0.9995 | 0.0767 | 0.9838 | 0.0000 | 0.9838 | 0.0000 |
| Order | 1.0000 | 0.0000 | 1.0000 | 0.0000 | 0.9995 | 0.0965 | 0.9995 | 0.0766 | 0.9838 | 0.0000 | 0.9838 | 0.0000 |
| Class | 1.0000 | 0.0000 | 1.0000 | 0.0000 | 1.0000 | 0.0960 | 1.0000 | 0.0747 | 0.9838 | 0.0000 | 0.9838 | 0.0000 |
| Phylum | 1.0000 | 0.0000 | 1.0000 | 0.0000 | 1.0000 | 0.0960 | 1.0000 | 0.0747 | 0.9838 | 0.0000 | 0.9838 | 0.0000 |
| Kingdom | 1.0000 | 0.0000 | 1.0000 | 0.0000 | 1.0000 | 0.0960 | 1.0000 | 0.0747 | 0.9838 | 0.0000 | 0.9838 | 0.0000 |
Results for simulation study: Short reads. Proportions of reads correctly (TP) and incorrectly (FP) assigned to the taxonomy tree at each rank by the two methods, for reads with an average length of 150 bp
| Rank | simLC TAMER TP | simLC TAMER FP | simLC TADIP TP | simLC TADIP FP | simMC TAMER TP | simMC TAMER FP | simMC TADIP TP | simMC TADIP FP | simHC TAMER TP | simHC TAMER FP | simHC TADIP TP | simHC TADIP FP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Species | 0.7102 | 0.0000 | 0.7102 | 0.0000 | 0.6704 | 0.0398 | 0.6704 | 0.0065 | 0.7113 | 0.0242 | 0.7289 | 0.0055 |
| Genus | 0.7102 | 0.0000 | 0.7102 | 0.0000 | 0.6704 | 0.0398 | 0.6704 | 0.0051 | 0.7113 | 0.0242 | 0.7289 | 0.0055 |
| Family | 0.7102 | 0.0000 | 0.7102 | 0.0000 | 0.7102 | 0.0000 | 0.7102 | 0.0000 | 0.7002 | 0.0353 | 0.7002 | 0.0342 |
| Order | 0.7102 | 0.0000 | 0.7102 | 0.0000 | 0.7102 | 0.0000 | 0.7102 | 0.0000 | 0.7002 | 0.0353 | 0.7002 | 0.0342 |
| Class | 0.7102 | 0.0000 | 0.7102 | 0.0000 | 0.7102 | 0.0000 | 0.7102 | 0.0000 | 0.7337 | 0.0018 | 0.7337 | 0.0018 |
| Phylum | 0.7102 | 0.0000 | 0.7102 | 0.0000 | 0.7102 | 0.0000 | 0.7102 | 0.0000 | 0.7337 | 0.0018 | 0.7337 | 0.0018 |
| Kingdom | 0.7102 | 0.0000 | 0.7102 | 0.0000 | 0.7102 | 0.0000 | 0.7102 | 0.0000 | 0.7337 | 0.0018 | 0.7337 | 0.0018 |
Fig. 1 Results for the low-complexity simulation study. Numbers of reads assigned by TAMER and TADIP compared with the true values at the species level for simLC with long reads (a) and short reads (b)
Simulation results of hypothesis testing for the three benchmark datasets. P-values of the burden test and the variance component test, indicating whether different mismatch probabilities are needed in the models for these simulated samples
| Group | Read length | Burden Test | Variance Component Test |
|---|---|---|---|
| simLC | Long reads | 0.0004 | 0.02 |
| simLC | Short reads | 0.131 | 0.009 |
| simMC | Long reads | < 0.0001 | < 0.0001 |
| simMC | Short reads | 0.002 | < 0.0001 |
| simHC | Long reads | < 0.0001 | < 0.0001 |
| simHC | Short reads | < 0.0001 | < 0.0001 |
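The table above asks whether one shared mismatch probability suffices or whether genomes need their own. The burden and variance component tests themselves are not specified here. The sketch below therefore swaps in a simpler, clearly different technique: a binomial likelihood ratio test on aggregated per-genome mismatch counts.

It makes the following assumptions:

- The inputs `mismatches[g]` and `lengths[g]` are hypothetical totals of mismatches and aligned bases attributed to genome g.
- Read-assignment uncertainty is ignored.
- The statistic is compared with a chi-square distribution with G − 1 degrees of freedom, where G is the number of genomes.
- The helper `chi2_sf` is a stdlib-only stand-in for a library chi-square tail function. It uses the standard series and continued-fraction evaluation of the regularized incomplete gamma function.

```python
import math


def chi2_sf(x, df):
    """Upper tail P(X > x) for X ~ chi-square(df), via the regularized gamma Q(df/2, x/2)."""
    a, y = df / 2.0, x / 2.0
    if y <= 0:
        return 1.0
    log_prefactor = a * math.log(y) - y - math.lgamma(a)
    eps, tiny = 1e-14, 1e-300
    if y < a + 1:
        # Series for the lower regularized gamma P(a, y); Q = 1 - P.
        term = total = 1.0 / a
        ap = a
        for _ in range(10_000):
            ap += 1
            term *= y / ap
            total += term
            if abs(term) < abs(total) * eps:
                break
        return max(0.0, 1.0 - total * math.exp(log_prefactor))
    # Modified Lentz continued fraction for Q(a, y).
    b = y + 1 - a
    c, d = 1 / tiny, 1 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        d = d if abs(d) >= tiny else tiny
        c = b + an / c
        c = c if abs(c) >= tiny else tiny
        d = 1 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < eps:
            break
    return math.exp(log_prefactor) * h


def homogeneity_lrt(mismatches, lengths):
    """LRT of H0: one shared mismatch probability vs H1: one probability per genome.

    Returns (statistic, df, p_value) with a chi-square(G - 1) reference distribution.
    """
    df = len(lengths) - 1
    if df < 1:
        raise ValueError("need at least two genomes")

    def loglik(m, n, prob):
        # Binomial log-likelihood without the constant; 0 * log(0) is treated as 0.
        out = 0.0
        if m > 0:
            out += m * math.log(prob)
        if n - m > 0:
            out += (n - m) * math.log1p(-prob)
        return out

    p_shared = sum(mismatches) / sum(lengths)
    ll0 = sum(loglik(m, n, p_shared) for m, n in zip(mismatches, lengths))
    ll1 = sum(loglik(m, n, m / n) for m, n in zip(mismatches, lengths))
    stat = max(0.0, 2 * (ll1 - ll0))
    return stat, df, chi2_sf(stat, df)


same = homogeneity_lrt([10, 10], [1000, 1000])  # identical 1% mismatch rates
diff = homogeneity_lrt([5, 40], [1000, 1000])  # 0.5% vs 4% mismatch rates
```

The first call returns a statistic of 0 and a p-value of 1. The second gives a statistic of about 31.6 on 1 degree of freedom and a very small p-value. Small p-values point toward genome-specific mismatch probabilities, which is the same question the table's tests address.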
Fig. 2 Results for the medium-complexity simulation study. Numbers of reads assigned by TAMER and TADIP compared with the true values at the species level for simMC with long reads (a) and short reads (b)
Fig. 3 Results for the high-complexity simulation study. Numbers of reads assigned by TAMER and TADIP compared with the true values at the species level for simHC with long reads (a) and short reads (b)
Fig. 4 Results for the oral metagenomics data. Numbers of reads assigned by TAMER and TADIP to representative classes in each of the eight oral samples
Fig. 5 Results for the gut metagenomics data. Heat maps of representative phyla showing the estimated proportion of reads assigned to each phylum in each of the 11 samples under the TADIP model