Marc Griesemer, Jeffrey A Kimbrel, Carol E Zhou, Ali Navid, Patrik D'haeseleer.
Abstract
BACKGROUND: Genome-scale metabolic modeling is a cornerstone of systems biology analysis of microbial organisms and communities, yet these genome-scale modeling efforts are invariably based on incomplete functional annotations. Annotated genomes typically contain 30-50% of genes without functional annotation, severely limiting our knowledge of the "parts lists" that the organisms have at their disposal. These incomplete annotations may be sufficient to derive a model of a core set of well-studied metabolic pathways that support growth in pure culture. However, pathways important for growth on unusual metabolites exchanged in complex microbial communities are often less understood, resulting in missing functional annotations in newly sequenced genomes.
Keywords: Enzyme prediction; Functional annotation; Genome annotation; Metabolic modeling; Transport prediction
Year: 2018 PMID: 30567498 PMCID: PMC6299973 DOI: 10.1186/s12864-018-5221-9
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Reference genomes used in this study
| Genome Name | Biocyc ID | Phylum | NCBI Accessions | Proteins |
|---|---|---|---|---|
| | MTBCDC1551 | Actinobacteria | AE000516 | 4189 |
| | MTBH37RV | Actinobacteria | AL123456 | 4018 |
| | SCO | Actinobacteria | NC_003888, NC_003903, NC_003904 | 8152 |
| | BTHE | Bacteroidetes | AE015928, AY171301 | 4825 |
| | CBTQ1 | Bacteroidetes | HG422566, CBQZ010000001-CBQZ010000011 | 739 |
| | SYNEL | Cyanobacteria | CP000100, CP000101 | 2661 |
| | 10403S_RAST | Firmicutes | CP002002 | 2814 |
| | ANTHRA | Firmicutes | NC_003997, AE017335, AE017336 | 5602 |
| | BSUB | Firmicutes | AL009126 | 4185 |
| | CLOSSAC | Firmicutes | CP004121, CP004122 | 5821 |
| | EREC | Firmicutes | CP001107 | 3626 |
| | PDIF272563 | Firmicutes | AM180355, AM180356 | 3809 |
| | AGRO | Proteobacteria | AE008687, AE008688, AE008689, AE008690 | 5402 |
| | AURANTIMONAS | Proteobacteria | AAPJ01000001-AAPJ01000035 | 3650 |
| | CAULO | Proteobacteria | AE005673 | 3737 |
| | CAULONA1000 | Proteobacteria | CP001340 | 3885 |
| | ECOL199310 | Proteobacteria | AE014075 | 5379 |
| | ECOL316407 | Proteobacteria | NC_007779 | 4410 |
| | ECOL413997 | Proteobacteria | CP000819 | 4209 |
| | ECOLI | Proteobacteria | U00096 | 4140 |
| | ECO0157 | Proteobacteria | AE005174, AF074613 | 5449 |
| | EVA | Proteobacteria | LM655252 | 330 |
| | HPY | Proteobacteria | CP003904 | 1594 |
| | MOB3B | Proteobacteria | NZ_ADVE02000001-NZ_ADVE02000003 | 4344 |
| | PABTQVLC | Proteobacteria | CP003867 | 280 |
| | SHIGELLA | Proteobacteria | AE014073 | 4068 |
| | VCHO | Proteobacteria | AE003852, AE003853 | 3828 |
a Tier 1 Pathway Genome Database (EcoCyc)
b Endosymbiont with reduced genome
Fig. 1 Large differences exist between the sets of Gene-EC annotations generated by the four annotation tools across the 27 reference genomes
Percentage of gene-EC annotation agreements that exist between pairs of tools
| Tool Combination | Gene-EC Agreements |
|---|---|
| KEGG-RAST | 16,697/20,915 (79.8%) |
| KEGG-EFICAz | 14,413/16,677 (86.4%) |
| KEGG-BRENDA | 3777/6748 (56.0%) |
| RAST-EFICAz | 12,977/15,694 (82.7%) |
| RAST-BRENDA | 3907/6288 (62.1%) |
| EFICAz-BRENDA | 3902/5601 (69.7%) |
Fig. 2 Gene-EC annotations produced by KEGG and RAST for E. coli K-12, compared to the EcoCyc gold standard. The sets and intersections are drawn proportionally to the number of annotations in each
Fig. 3 Reaction overlap between the annotation tools (average number of EC numbers per genome)
Fig. 4 Precision vs Recall of EC numbers for different combinations of tools on EcoCyc. Individual tools are denoted by B, E, K, or R for BRENDA, EFICAz, KEGG, and RAST, respectively. For each combination of tools, we calculated precision and recall for both the union and intersection of the sets of EC numbers generated by each tool. The union corresponds to the set of EC numbers generated by at least one of the tools in the combination, while the intersection corresponds to those EC numbers generated by every single tool in the combination
Definitions of Precision, Recall and associated terms
| Term | Formula | Definition |
|---|---|---|
| True Positive | TP | EC numbers predicted by tools and found in EcoCyc. |
| False Positive | FP | EC numbers predicted by tools and not found in EcoCyc. |
| False Negative | FN | EC numbers in EcoCyc but not predicted by tools. |
| Precision | TP/(TP + FP) | Fraction of predicted EC numbers that are in EcoCyc. |
| Recall | TP/(TP + FN) | Fraction of EC numbers in EcoCyc correctly predicted by tools. |
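The definitions above, together with the union and intersection scheme described for Fig. 4, can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the function names `precision_recall` and `evaluate_combinations`, the toy EC numbers, and the two-tool example are all assumptions made for demonstration.

```python
from itertools import combinations


def precision_recall(predicted, gold):
    """Return (precision, recall) for a set of predicted EC numbers."""
    tp = len(predicted & gold)   # predicted and present in the gold standard
    fp = len(predicted - gold)   # predicted but absent from the gold standard
    fn = len(gold - predicted)   # in the gold standard but not predicted
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def evaluate_combinations(tool_ecs, gold):
    """Score the union and intersection of every non-empty tool combination."""
    results = {}
    names = sorted(tool_ecs)
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            sets = [tool_ecs[name] for name in combo]
            union = set().union(*sets)           # predicted by at least one tool
            intersection = set.intersection(*sets)  # predicted by every tool
            results["".join(combo)] = {
                "union": precision_recall(union, gold),
                "intersection": precision_recall(intersection, gold),
            }
    return results


# Toy data (hypothetical, not taken from the study).
gold = {"1.1.1.1", "2.7.1.1", "4.2.1.11", "5.3.1.9"}
tool_ecs = {
    "K": {"1.1.1.1", "2.7.1.1", "3.1.3.1"},
    "R": {"1.1.1.1", "4.2.1.11"},
}
results = evaluate_combinations(tool_ecs, gold)
for combo, scores in results.items():
    print(combo, scores)
```

In this toy case the union of K and R raises recall relative to either tool alone, while the intersection keeps precision high at the cost of recall, which is the trade-off the precision-recall comparison is meant to expose.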
Fig. 5 Genome coverage and overlap in annotations vary across genomes. (a) Horizontal bars represent the fraction of the total number of EC numbers for each genome produced by only a single tool, or by two, three or all four tools. The 27 reference genomes were sorted with respect to the fraction of EC numbers that were predicted by 3 or more tools (blue bars). The top of the list is dominated by model organisms such as E. coli, B. subtilis, and closely related organisms. As we move farther away from such well-studied model organisms, the fraction of unique EC numbers predicted only by a single tool (red bars) increases, at the expense of those predicted by multiple tools. (b) The fraction of genes annotated as enzymes by each tool likewise decreases as we move farther away from model organisms such as E. coli. Note that two of the organisms with a drastically reduced genome content, Candidatus Portiera aleyrodidarum BT-QVLC and Candidatus Evansia muelleri, also have a relatively higher fraction of core metabolic enzymes
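The per-genome breakdown in Fig. 5a (fraction of EC numbers supported by one, two, three or four tools) can be computed as sketched below. This is an illustrative reconstruction under stated assumptions: the function name `support_fractions` and the toy EC sets are hypothetical, and the actual pipeline used in the study may differ.

```python
from collections import Counter


def support_fractions(tool_ecs):
    """Fraction of a genome's EC numbers predicted by exactly n tools.

    tool_ecs maps a tool name to the set of EC numbers it predicts
    for one genome. Returns {n: fraction} for n = 1..number of tools.
    """
    support = Counter()
    for ecs in tool_ecs.values():
        support.update(set(ecs))  # each tool counts at most once per EC number
    total = len(support)
    if total == 0:
        return {}
    by_count = Counter(support.values())
    return {n: by_count.get(n, 0) / total for n in range(1, len(tool_ecs) + 1)}


# Toy data for one genome (hypothetical EC assignments).
tool_ecs = {
    "B": {"1.1.1.1"},
    "E": {"1.1.1.1", "2.7.1.1"},
    "K": {"1.1.1.1", "2.7.1.1", "3.1.3.1"},
    "R": {"1.1.1.1", "2.7.1.1", "4.2.1.11"},
}
fractions = support_fractions(tool_ecs)
three_or_more = fractions[3] + fractions[4]  # sort key described in the caption
print(fractions, three_or_more)
```

Sorting genomes by `three_or_more` in descending order would reproduce the ordering the caption describes, with well-studied model organisms expected near the top.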
Fig. 6 (a) Total number of genes annotated as transporters, regardless of substrate. (b) Transporter annotations with substrate predictions specific enough to be included in metabolic models (rank 1 or 2)
Examples of substrate annotation ranking
| Rank | Substrate | Examples |
|---|---|---|
| 1 | Metabolite that can be incorporated as a transport reaction in a metabolic model | • Fe |
| 2 | Substrate(s) that map to a small number of possible transport reactions | • Mg/Co/Ni |
| 3 | Broader substrate classes not directly usable to construct a metabolic network | • dipeptide |
| 4 | Very broad class of substrates | • multidrug efflux |
| 5 | No substrate annotated | |
Transporter substrates were ranked from most specific (rank 1) to least specific (no substrate, rank 5). See Additional file 3 for the full table
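As a minimal sketch of how the ranking could be applied, the snippet below keeps only transporter annotations specific enough for a metabolic model (rank 1 or 2, as in Fig. 6b). The `TransporterAnnotation` class, the `model_ready` function, and the gene identifiers are hypothetical; the substrates and ranks mirror the example rows in the table above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TransporterAnnotation:
    gene: str                 # hypothetical gene identifier
    substrate: Optional[str]  # None when no substrate is annotated
    rank: int                 # 1 (most specific) to 5 (no substrate)


def model_ready(annotations: List[TransporterAnnotation],
                max_rank: int = 2) -> List[TransporterAnnotation]:
    """Keep annotations whose substrate rank is at most max_rank."""
    return [a for a in annotations if a.rank <= max_rank]


annotations = [
    TransporterAnnotation("gene_0001", "Fe", 1),
    TransporterAnnotation("gene_0002", "Mg/Co/Ni", 2),
    TransporterAnnotation("gene_0003", "dipeptide", 3),
    TransporterAnnotation("gene_0004", "multidrug efflux", 4),
    TransporterAnnotation("gene_0005", None, 5),
]
usable = model_ready(annotations)
print([a.substrate for a in usable])
```

Keeping the cutoff as a parameter makes it easy to compare the total transporter count (all ranks) against the model-ready subset, which is the contrast between the two panels of Fig. 6.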