Pavel Petrenko, Briallen Lobb, Daniel A Kurtz, Josh D Neufeld, Andrew C Doxey.
Abstract
BACKGROUND: Metagenomes provide access to the taxonomic composition and functional capabilities of microbial communities. Although metagenomic analysis methods exist for estimating overall community composition or metabolic potential, identifying specific taxa that encode specific functions or pathways of interest can be more challenging. Here we present MetAnnotate, which addresses the common question: "which organisms perform my function of interest within my metagenome(s) of interest?" MetAnnotate uses profile hidden Markov models to analyze shotgun metagenomes for genes and pathways of interest, classifies retrieved sequences either through a phylogenetic placement or best hit approach, and enables comparison of these profiles between metagenomes.
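As an illustration of the best hit idea mentioned in the abstract, the following is a minimal sketch, not MetAnnotate's actual implementation. It assumes each retrieved sequence has already been compared against reference sequences, and that the results are available as hypothetical `(query_id, reference_id, score)` tuples together with a hypothetical `ref_taxonomy` lookup table. All names and values here are invented for the example.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical input shape (an assumption, not MetAnnotate's data model):
# each hit is (query_id, reference_id, score); a higher score is a better match.
Hit = Tuple[str, str, float]


def best_hit_taxonomy(hits: Iterable[Hit], ref_taxonomy: Dict[str, str]) -> Dict[str, str]:
    """Assign each query the taxon of its highest-scoring reference hit."""
    best: Dict[str, Tuple[str, float]] = {}
    for query_id, ref_id, score in hits:
        # Keep only the top-scoring reference seen so far for this query.
        if query_id not in best or score > best[query_id][1]:
            best[query_id] = (ref_id, score)
    # A best reference without a taxonomy entry is labelled "unclassified".
    return {q: ref_taxonomy.get(ref, "unclassified") for q, (ref, _) in best.items()}


# Toy data, invented for illustration only.
hits = [
    ("read1", "refA", 120.0),
    ("read1", "refB", 95.5),
    ("read2", "refB", 80.0),
]
ref_taxonomy = {"refA": "Escherichia", "refB": "Bacillus"}
assignments = best_hit_taxonomy(hits, ref_taxonomy)
```

A phylogenetic placement approach would instead place each query on a reference tree and infer the taxon from its position, which this sketch does not attempt.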
Year: 2015 PMID: 26541816 PMCID: PMC4636000 DOI: 10.1186/s12915-015-0195-4
Source DB: PubMed Journal: BMC Biol ISSN: 1741-7007 Impact factor: 7.431
Fig. 1 Backend MetAnnotate pipeline for Hidden Markov Model (HMM) search and taxonomic classification. GO, Gene Ontology; ORF, open reading frame
Fig. 2 Screenshots of the MetAnnotate web interface
Taxonomic classification accuracy [proportion of correctly assigned sequences (%)] for MetAnnotate’s best hit and phylogenetic classification approaches
| Annotation method | Species | Genus | Phylum |
|---|---|---|---|
| Best hit | 61.8 | 87.4 | 94.5 |
| Phylogenetic | 60.0 | 87.6 | 97.3 |
| | | | |
| Best hit | 47.4 | 78.7 | 83.3 |
| Phylogenetic | 46.2 | 80.8 | 90.1 |
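The values in the table above are proportions of correctly assigned sequences at a given taxonomic rank. A minimal sketch of that calculation follows. It assumes predicted and true lineages are available as hypothetical rank-to-name dictionaries, and that every evaluated sequence counts in the denominator. Both are assumptions of this example rather than details confirmed by the source.

```python
from typing import Dict, List

# Hypothetical lineage representation: rank name -> taxon name.
Lineage = Dict[str, str]


def rank_accuracy(predicted: List[Lineage], truth: List[Lineage], rank: str) -> float:
    """Percentage of sequences whose predicted taxon matches the truth at `rank`."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    if not truth:
        return 0.0
    correct = 0
    for pred, true in zip(predicted, truth):
        # A missing prediction at this rank counts as incorrect.
        if pred.get(rank) is not None and pred.get(rank) == true.get(rank):
            correct += 1
    return 100.0 * correct / len(truth)


# Toy lineages, invented for illustration only.
truth = [
    {"phylum": "P1", "genus": "G1", "species": "S1"},
    {"phylum": "P1", "genus": "G2", "species": "S2"},
    {"phylum": "P2", "genus": "G3", "species": "S3"},
    {"phylum": "P2", "genus": "G4", "species": "S4"},
]
predicted = [
    {"phylum": "P1", "genus": "G1", "species": "S1"},
    {"phylum": "P1", "genus": "G2", "species": "S9"},
    {"phylum": "P2", "genus": "G3", "species": "S3"},
    {"phylum": "P2", "genus": "G8"},
]
accuracy = {rank: rank_accuracy(predicted, truth, rank) for rank in ("species", "genus", "phylum")}
```

In this toy data accuracy rises from species to phylum, the same direction as in the table, because a prediction can be wrong at a fine rank and still correct at a coarser one.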
Fig. 3 Taxonomic classification accuracy of MetAnnotate based on a simulated metagenome dataset and the best hit classification method. The proportion of correct taxonomic annotations assigned to detected homologs is shown for five different taxonomic markers (a) and five markers of biological functions (b), as well as different read lengths (c) and metagenomic-to-reference sequence identities (d). Results for (c) and (d) are based on all taxonomic marker homologs identified in (a)
The effect of length and similarity to database on taxonomic classification accuracy (genus-level) using best hit and phylogenetic classification. Numbers indicate proportion of correctly assigned sequences (%)
| | Best hit | Phylogenetic |
|---|---|---|
| Length | | |
| 100 | 72.3 | 58.1 |
| 300 | 81.3 | 81.2 |
| 500 | 82.7 | 81.3 |
| Similarity to database | | |
| 40–60 | 77.6 | 82.8 |
| 60–80 | 85.2 | 83.4 |
| 80–100 | 92.4 | 89.6 |
Fig. 4 Example application: taxonomic profiling of cobalamin (vitamin B12) producers in aquatic metagenomes using MetAnnotate. Taxonomic profiles (family level) based on 11 cobalamin synthesis proteins are shown for eight metagenomes. See [25] for additional information
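A family-level profile of the kind described in this caption can be thought of as the relative abundance of each family among the classified sequences of a metagenome. The sketch below shows one way to compute and compare such profiles. The sample names, family labels, and the `family_profile` helper are hypothetical placeholders, not data or code from the study.

```python
from collections import Counter
from typing import Dict, List


def family_profile(assignments: List[str]) -> Dict[str, float]:
    """Relative abundance of each family among one metagenome's classified hits."""
    counts = Counter(assignments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {family: n / total for family, n in counts.items()}


# Placeholder family assignments per metagenome, invented for illustration only.
samples = {
    "metagenome_A": ["family_X", "family_X", "family_Y", "family_Z"],
    "metagenome_B": ["family_Y", "family_Y"],
}
profiles = {name: family_profile(families) for name, families in samples.items()}
```

Normalizing to proportions makes profiles comparable between metagenomes that yield different numbers of hits.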