Nathan LaPierre, Serghei Mangul, Mohammed Alser, Igor Mandric, Nicholas C Wu, David Koslicki, Eleazar Eskin.
Abstract
BACKGROUND: High-throughput sequencing has spurred the development of metagenomics, the direct analysis of microbial communities in environments such as soil, ocean water, and the human body. Many existing methods based on marker genes or k-mers have limited sensitivity or are too computationally demanding for many users. Additionally, most work in metagenomics has focused on bacteria and archaea, neglecting other key microbes such as viruses and eukaryotes.
Keywords: Abundance estimation; Alignment; Community profiling; Eukaryome; Metagenomics; Virome
Year: 2019 PMID: 31167634 PMCID: PMC6551237 DOI: 10.1186/s12864-019-5699-9
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 MiCoP workflow. Reads are first aligned to viral or eukaryotic genomes in a reference database using BWA-MEM. The alignments provide coverage and read mapping quality information that can be inspected. In the abundance estimation stage, uniquely mapped reads are assigned to species, and species abundances are estimated from them. Multi-mapped reads are then assigned to genomes with probability proportional to those genomes' abundances among uniquely mapped reads. Species with too few mapped reads are filtered out, and the final species abundances are computed.
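The abundance estimation stage in the caption can be illustrated with a minimal Python sketch. It is not MiCoP's implementation and makes several assumptions: each read is represented only by the list of species it aligned to (the hypothetical `read_hits` input), each multi-mapped read is split fractionally across its candidates (the expected value of the probabilistic assignment the caption describes), and the filtering cutoff `min_reads` is a hypothetical parameter.

```python
from collections import defaultdict


def estimate_abundances(read_hits, min_reads=1.0):
    """Sketch of unique-then-multi-mapped abundance estimation.

    read_hits: list of lists; each inner list names the species a read aligned to
               (hypothetical input format, not MiCoP's).
    min_reads: hypothetical filtering threshold on assigned read counts.
    """
    # Stage 1: count reads that map to exactly one species.
    unique_counts = defaultdict(float)
    for hits in read_hits:
        candidates = set(hits)
        if len(candidates) == 1:
            unique_counts[next(iter(candidates))] += 1.0

    # Stage 2: split each multi-mapped read across its candidate species in
    # proportion to their unique-read counts (expected value of a random assignment).
    counts = defaultdict(float, unique_counts)
    for hits in read_hits:
        candidates = set(hits)
        if len(candidates) < 2:
            continue
        weights = {s: unique_counts.get(s, 0.0) for s in candidates}
        total = sum(weights.values())
        if total == 0:
            continue  # no unique-read evidence for any candidate; drop the read
        for species, w in weights.items():
            counts[species] += w / total

    # Stage 3: filter out species with too few reads, then normalize to fractions.
    kept = {s: c for s, c in counts.items() if c >= min_reads}
    grand_total = sum(kept.values())
    if grand_total == 0:
        return {}
    return {s: c / grand_total for s, c in kept.items()}
```

For example, with three reads unique to species A, one unique to B, and two reads hitting both, each shared read is split 3:1, giving A 4.5 reads and B 1.5 reads, or relative abundances of 0.75 and 0.25.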
Abundance estimation performance results on a simulated viral community with 544 species
| Tool | L1 error | Precision | Recall/Sensitivity | F1-Score |
|---|---|---|---|---|
| MiCoP | 0.09124 | 1.0 | 0.98155 | 0.99069 |
| Kraken | 1.15834* | 0.85147 | 0.90959 | 0.87957 |
| MetaPhlAn2 | 1.24357 | 0.84388 | 0.369 | 0.51348 |
Kraken is considered by its authors to be a read classification tool, not an abundance estimation tool, so we put an asterisk next to its results. However, we note that abundance estimation is a common application of Kraken in practice. Overall, MiCoP outperforms the other two methods on all metrics. Kraken and especially MetaPhlAn2 are limited by the poor representation of viruses in their standard databases. L1 error is the sum of the absolute differences between the computed species abundances and the ground-truth species abundances. MiCoP's L1 error was more than an order of magnitude lower than those of the other tools, and MiCoP had the best precision and recall. * Based on read classification proportions; Kraken does not claim to perform abundance estimation.
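The L1 error defined above can be computed as follows. This is a minimal sketch that assumes abundances are relative fractions summing to 1, stored as hypothetical `{species: abundance}` dictionaries. A species that appears on only one side counts as abundance 0 on the other, so a false positive or a missed species adds its full abundance to the error.

```python
def l1_error(estimated, truth):
    """Sum of |estimated - true| relative abundance over all species seen on either side.

    Both arguments map species name -> relative abundance (assumed to sum to 1).
    Species absent from one dict are treated as having abundance 0 there.
    """
    species = set(estimated) | set(truth)
    return sum(abs(estimated.get(s, 0.0) - truth.get(s, 0.0)) for s in species)
```

Under the sum-to-1 assumption, L1 error ranges from 0 (a perfect profile) to 2 (no overlap at all). This puts Kraken's 1.15834 and MetaPhlAn2's 1.24357 on the 544-species community into perspective.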
Abundance estimation performance results on a simulated viral community with 40 species
| Tool | L1 error | Precision | Recall/Sensitivity | F1-Score |
|---|---|---|---|---|
| MiCoP | 0.00909 | 1.0 | 1.0 | 1.0 |
| Kraken | 1.15466* | 0.82222 | 0.925 | 0.87059 |
| MetaPhlAn2 | 0.09844 | 1.0 | 1.0 | 1.0 |
These species were sampled from the intersection of the species detected by all three tools in the previous simulation. This simulation therefore contained only species present in all three reference databases, eliminating reference bias. MetaPhlAn2's performance improved dramatically, predicting the exact set of species in the sample, but its L1 error was an order of magnitude worse than MiCoP's. Kraken's results did not improve markedly in this simulation. * Based on read classification proportions; Kraken does not claim to perform abundance estimation.
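The sampling step described above can be sketched in a few lines of Python. The caption does not say how species were drawn from the intersection, so uniform sampling without replacement is an assumption. `sample_shared_species` and its `detected_by_tool` input are hypothetical names.

```python
import random


def sample_shared_species(detected_by_tool, k, seed=0):
    """Draw k species detected by every tool (hypothetical helper; uniform sampling assumed).

    detected_by_tool: dict mapping tool name -> iterable of detected species.
    Raises ValueError if fewer than k species are shared by all tools.
    """
    # Species in every tool's output are, by construction, present in every reference database.
    shared = set.intersection(*(set(s) for s in detected_by_tool.values()))
    rng = random.Random(seed)  # seeded for a reproducible community
    # Sort before sampling so the draw does not depend on set iteration order.
    return sorted(rng.sample(sorted(shared), k))
```

Restricting the draw to the intersection keeps database coverage out of the comparison, so the remaining differences reflect each tool's profiling method.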
Abundance estimation performance results on a simulated fungal community consisting of 40 genomes derived from 7 different species
| Tool | L1 error | Precision | Recall/Sensitivity | F1-Score |
|---|---|---|---|---|
| MiCoP | 0.00017 | 1.0 | 1.0 | 1.0 |
| Kraken | 0.01420* | 1.0 | 0.83333 | 0.90909 |
| MetaPhlAn2 | 0.00924 | 0.85714 | 1.0 | 0.92308 |
These species were sampled in the same way as for the previous table, by taking the intersection of species detected by all three tools on a higher-complexity community. MiCoP detected exactly the set of species present in the sample, while Kraken had one false negative and MetaPhlAn2 had one false positive. Additionally, MiCoP's L1 error was more than an order of magnitude lower than those of the other tools. * Based on read classification proportions; Kraken does not claim to perform abundance estimation.
Comparison of the performance of MiCoP and MetaPhlAn2 on a mock viral community consisting of 9 species
| Tool | Precision | Recall | F1-Score | Reads per minute |
|---|---|---|---|---|
| MiCoP | 0.875 | 0.77778 | 0.82353 | 87629 |
| MetaPhlAn2 | 1.0 | 0.11111 | 0.19999 | 162845 |
MetaPhlAn2 detects only 1 of the 9 species, with no false positives, while MiCoP detects 7 of the 9 species with one false positive, profiling the community much more accurately. MetaPhlAn2 processed reads about twice as fast as MiCoP.
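As a worked check, the table values follow from the detection counts stated above, using the standard definitions of precision $P$, recall $R$, and their harmonic mean $F_1$. For MiCoP, 7 true positives, 1 false positive, and 2 false negatives give:

$$
P = \frac{TP}{TP + FP},\qquad R = \frac{TP}{TP + FN},\qquad F_1 = \frac{2PR}{P + R} = \frac{2\,TP}{2\,TP + FP + FN}
$$

$$
\text{MiCoP: } P = \tfrac{7}{8} = 0.875,\quad R = \tfrac{7}{9} \approx 0.77778,\quad F_1 = \tfrac{14}{17} \approx 0.82353
$$

For MetaPhlAn2, 1 true positive, 0 false positives, and 8 false negatives give:

$$
\text{MetaPhlAn2: } P = 1,\quad R = \tfrac{1}{9} \approx 0.11111,\quad F_1 = \tfrac{2}{10} = 0.2
$$

The MetaPhlAn2 F1 score agrees with the tabulated 0.19999 to four decimal places. The throughput ratio is $162845 / 87629 \approx 1.86$, which is the "about twice as fast" figure.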
Comparison of the genus-level performance of MiCoP and MetaPhlAn2 on a mock fungal community consisting of 4 genera
| Tool | Precision | Recall | F1-Score | Reads per minute |
|---|---|---|---|---|
| MiCoP | 1.0 | 0.75 | 0.85714 | 6934 |
| MetaPhlAn2 | NaN (0/0) | 0.0 | NaN | 187961 |
MiCoP detects 3 of the 4 genera with no false positives, while MetaPhlAn2 detects nothing. Because MetaPhlAn2 reports no true or false positives, its precision (0/0) is undefined. MetaPhlAn2 was faster, but MiCoP still finished in less than 3 h.
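The NaN entries in the table above come from an undefined 0/0 ratio. The following minimal sketch shows one way to reproduce that convention. `precision_recall_f1` is a hypothetical helper. It propagates NaN into F1 whenever precision or recall is undefined, which matches the table. When both precision and recall are 0, it assumes the common convention F1 = 0.

```python
import math


def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from detection counts; NaN marks an undefined 0/0."""
    precision = tp / (tp + fp) if tp + fp > 0 else math.nan
    recall = tp / (tp + fn) if tp + fn > 0 else math.nan
    if math.isnan(precision) or math.isnan(recall):
        f1 = math.nan  # undefined input propagates, as in the MetaPhlAn2 row
    elif precision + recall == 0:
        f1 = 0.0  # assumed convention: no true positives at all
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

With MiCoP's genus counts (3 true positives, 0 false positives, 1 false negative), this gives precision 1.0, recall 0.75, and F1 ≈ 0.85714. MetaPhlAn2's counts (0, 0, 4) give (NaN, 0.0, NaN).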
Fig. 2 MiCoP and MetaPhlAn2 HMP Virus Profiling Results. Abundance estimation for viruses when applying MiCoP and MetaPhlAn2 to 20 Human Microbiome Project samples, 10 from buccal mucosa and 10 from tongue dorsum. MiCoP detects a total of 34 species, with the sample dominated by bacteriophages, particularly Streptococcus phages. MetaPhlAn2 finds much lower virome diversity than MiCoP, identifying only 12 species. The sample is again dominated by Streptococcus phages, but MetaPhlAn2's results suggest that a single type of this phage dominates the sample, whereas MiCoP suggests that a wide variety of Streptococcus phages are present. MetaPhlAn2's results may stem from the reference bias explored in the simulation studies.
Fig. 3 MiCoP and MetaPhlAn2 HMP Fungi Profiling Results. Abundance estimation for fungi when applying MiCoP and MetaPhlAn2 to 20 Human Microbiome Project samples, 10 from buccal mucosa and 10 from tongue dorsum. MiCoP detects a total of 6 genera. MetaPhlAn2 detects only two taxa (Candida and Aspergillaceae), both of which also appear in MiCoP's results. As the human oral eukaryome is known to be diverse [33–35], these results indicate that MiCoP better captures fungal community diversity.