Jens Roat Kultima1, Luis Pedro Coelho1, Kristoffer Forslund1, Jaime Huerta-Cepas1, Simone S Li2, Marja Driessen1, Anita Yvonne Voigt3, Georg Zeller1, Shinichi Sunagawa1, Peer Bork4.
Abstract
MOCAT2 is a software pipeline for metagenomic sequence assembly and gene prediction with novel features for taxonomic and functional abundance profiling. The automated generation and efficient annotation of non-redundant reference catalogs by propagating pre-computed assignments from 18 databases covering various functional categories allows for fast and comprehensive functional characterization of metagenomes.
Year: 2016 PMID: 27153620 PMCID: PMC4978931 DOI: 10.1093/bioinformatics/btw183
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. The MOCAT2 pipeline. Read quality control, assembly and gene prediction represent the original MOCAT pipeline (dark green box). Blue path: Genes are clustered into reference gene catalogs, which are functionally annotated. Orange path: To quantify functional composition, reads are mapped to the annotated gene catalog and summarized over the respective annotation categories. Taxonomic profiles (mOTU, specI and NCBI) are generated by mapping reads to mOTU and reference marker gene (RefMG) catalogs.
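The orange path, in which per-gene abundances are summarized over annotation categories, can be illustrated with a minimal Python sketch. This is not MOCAT2 code: the function name, the toy gene and category identifiers, and the rule that a gene with several annotations contributes its full abundance to each category are all assumptions made for illustration.

```python
from collections import defaultdict


def summarize_by_category(gene_abundance, gene_annotations):
    """Sum per-gene abundances over each annotation category.

    Assumption: a gene annotated with several categories adds its full
    abundance to every one of them; unannotated genes are skipped.
    """
    profile = defaultdict(float)
    for gene, abundance in gene_abundance.items():
        for category in gene_annotations.get(gene, ()):
            profile[category] += abundance
    return dict(profile)


# Hypothetical toy data (not taken from any real gene catalog).
gene_abundance = {"gene_a": 10.0, "gene_b": 5.0, "gene_c": 2.0}
gene_annotations = {
    "gene_a": ["cat_1"],
    "gene_b": ["cat_1", "cat_2"],
    # gene_c has no annotation and does not contribute to the profile
}

profile = summarize_by_category(gene_abundance, gene_annotations)
print(profile)
```

Whether multi-category genes should be counted in full or split between their categories is a design choice; the sketch picks full counting only to keep the example short.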
Databases from which functional properties are obtained
| Database | Proteins | Coverage (%) | Precision (%) | Recall (%) | Reference |
|---|---|---|---|---|---|
| eggNOG | 7 449 593 | 100 | 100 | 100 | |
| Pfam | 16 230* | 87 | 90 | 94 | |
| Superfamily | 15 438* | 93 | 89 | 94 | |
| KEGG | 7 423 864 | 98 | 93 | 93 | |
| MetaCyc | 388 782 | 100 | 89 | 94 | |
| SEED | 4 247 700 | 99 | 94 | 94 | |
| ARDB | 25 360 | 89 | 99 | 88 | |
| CARD | 2 820 | 100 | 81 | 93 | |
| Resfams | 123* | 80 | 94 | 94 | |
| MvirDB | 29 357 | 100 | 95 | 93 | |
| PATRIC | 2 194 475 | 93 | 93 | 93 | |
| vFam | 29 655 | 35 | 99 | 86 | |
| VFDB | 1 627 380 | 86 | 89 | 91 | |
| Victors | 3 329 893 | 91 | 92 | 94 | |
| dbCAN | 333* | 76 | 99 | 99 | |
| DBETH | 228 | 100 | 99 | 86 | |
| DrugBank | 3 899 | 99 | 88 | 94 | |
| ICEberg | 13 984 | 98 | 79 | 91 | |
| Prophages | 119 183 | 95 | 88 | 91 | |
Coverage, precision and recall are given as percentages. Coverage is the share of a database's entries that are covered, and thus propagated, by the eggNOG database; e.g., of the 18 202 orthologous groups in KEGG (KO), 17 773 (98%) are covered.
*Number of hidden Markov models (HMMs), whereby one HMM can hit several proteins and several HMMs can map to one protein.
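The KEGG coverage figure quoted in the table note can be checked with a short calculation. The helper name below is hypothetical; only the two counts come from the note.

```python
def coverage_percent(covered, total):
    """Return the share of covered entries as a percentage."""
    return 100.0 * covered / total


# Counts from the table note: 17 773 of 18 202 KEGG orthologous groups.
kegg_coverage = coverage_percent(17_773, 18_202)
print(round(kegg_coverage))  # 98
```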