Anurag Passi, Juan D Tibocha-Bonilla, Manish Kumar, Diego Tec-Campos, Karsten Zengler, Cristal Zuniga.
Abstract
Genome-scale metabolic models (GEMs) enable the mathematical simulation of the metabolism of archaea, bacteria, and eukaryotic organisms. GEMs quantitatively define a relationship between genotype and phenotype by contextualizing different types of Big Data (e.g., genomics, metabolomics, and transcriptomics). In this review, we analyze the available Big Data useful for metabolic modeling and compile the available GEM reconstruction tools that integrate Big Data. We also discuss recent applications in industry and research that include predicting phenotypes, elucidating metabolic pathways, producing industry-relevant chemicals, identifying drug targets, and generating knowledge to better understand host-associated diseases. In addition to the up-to-date review of GEMs currently available, we assessed a plethora of tools for developing new GEMs that include macromolecular expression and dynamic resolution. Finally, we provide a perspective on emerging areas, such as annotation, data management, and machine learning, in which GEMs will play a key role in the further utilization of Big Data.
Keywords: ME-models; big data; computational tools; flux balance analysis; genome-scale metabolic models; machine learning; phenotypes; reconstruction
Year: 2021 PMID: 35050136 PMCID: PMC8778254 DOI: 10.3390/metabo12010014
Source DB: PubMed Journal: Metabolites ISSN: 2218-1989
Figure 1. Rate of publications related to different omics-related fields. PubMed search results for keywords such as “genomics”, “transcriptomics”, “proteomics”, “epigenomics”, “metabolomics”, “pharmacogenomics”, “fluxomics”, and “phenomics” in publications from 2000–2021. Stacks with “black” borders represent PubMed search results with the keyword “big data” and above-mentioned omics keywords (Supplementary File S1). Moreover, NCBI has added billions of bases to its sequence database over the last decade. It should be noted that the figure does not intend to represent any correlation of publications to the number of sequences.
Figure 2. Big Data types commonly used in metabolic modeling. The left panel represents different omics data applied to the GEM providing different layers of biological knowledgebase. Machine learning can be applied to increase the predictive capability of the reconstructed GEMs. Different applications of GEMs are shown in the top right panel and discussed in detail in the text.
Figure 3. Reconstructed GEMs for bacteria. Each node represents a different year. The nodes provide information on the number of reconstructed models and their classification into Gram-negative (pink) and Gram-positive (blue). Some genera, such as Escherichia, Staphylococcus, Klebsiella, Liberibacter, and Salmonella, also have multi-strain models, as indicated by an asterisk (Supplementary File S2).
Figure 4. Available models for Archaea. The nodes in brown represent the year of GEM reconstruction and number of GEMs reconstructed for archaea (Supplementary File S3).
Figure 5. Chronological order of GEMs of important model eukaryotic organisms. Each node depicts the year of GEM reconstruction and the number of GEMs reconstructed for that organism. The nodes are color coded to depict the classification of GEMs into Fungi (blue), Animalia (pink) and Phototrophs (green) (Supplementary File S4).
Available GEM reconstruction tools and their features.
| Tool | Reaction Databases | Advantages/Limitations | Platform | Availability | Citations (Average/Year) | Reference |
|---|---|---|---|---|---|---|
| AuReMe | Available GEMs, MetaCyc, and BiGG | It stores the information at each step during the reconstruction process to maintain transparency and reproducibility. | Docker image | Public | 36 (13) | [ |
| AutoKEGGRec | KEGG | It can be used to reconstruct models for a single organism or for a given list of organisms. It generates an intermediate consolidated model that contains all the genes and reactions for all target organisms. This consolidated model can then be used to generate individual models. It does not add transport, exchange, or biomass reactions to the draft model. Gap-filling is also not part of this reconstruction tool. | Matlab | Public | 22 (7.33) | [ |
| CarveMe | BiGG | It is an automatic tool for reconstructing and gap-filling the draft model. CarveMe generates ready-to-use models for flux balance analysis. As a reaction database, manually curated BiGG models are used in the reconstruction process. | Python | Public | 151 (50.33) | [ |
| COBRA toolbox, COBRApy, COBRA.jl | - | These tools do not provide any function to build models from annotated genomes. However, they provide functions to incorporate all the components, such as genes, reactions, and metabolites, into the model. In particular, these tools are useful for expanding upon existing draft models. | Matlab, Python, and Julia | Public | COBRA toolbox v.1-3.0: 2733 (170) | [ |
| COBRAme | Available GEMs | It is used to develop ME (Metabolism and Expression) models, which are the extended version of GEMs. In addition to a high-quality GEM, these models also contain transcription, translation, and tRNA charging reactions. | Python | Public | 73 (24.33) | [ |
| CoReCo | Available GEMs, KEGG | It is a comparative reconstruction approach that uses available high-quality GEMs for comparison and reactions from the KEGG database to build models for closely related species. Its capability to compare models makes this tool useful for conducting evolutionary studies. | Python, R, Perl | Public | 68 (9.71) | [ |
| FAME | KEGG | It only works on the organisms available in the KEGG database. It allows the visualization of FBA results on KEGG pathway maps. | Web-based | Public | 93 (10.33) | [ |
| GEMsiRV | Available GEMs, BiGG, KEGG, MetaCyc, ModelSEED | It generates the model based on orthologous genes between the target and template model provided by the user. It can perform gap-filling using reference databases from BiGG, KEGG, MetaCyc, and ModelSEED. | Web-based | Public | 43 (4.78) | [ |
| Merlin | KEGG, TCDB | It comprises several specific features, such as annotation of both enzymatic and transport genes and prediction of subcellular localization. Therefore, it can be used to reconstruct models for both prokaryotes and eukaryotes. This tool also has a function to visualize all reactions in the model, which can help users in the gap-filling process using the KEGG pathway browser. | Java | Public | 90 (15) | [ |
| MetaDraft | Available GEMs | It uses available GEMs as templates to build models for a new organism. It contains internal template models (BiGG models) as reaction databases; however, users can create and use more templates. | Python GUI | Public | 28 (7) | [ |
| ModelSEED/KBase | ModelSEED | In the first step, it uses RAST to annotate the genome of target organisms. This tool builds the models based on annotated genome and internal reaction databases. It performs gap-filling as a part of an algorithm based on user-provided media or complete media. It is a fully automated tool and does not allow users to customize any steps during reconstruction. It works on the assumption that all the reactions in the internal database are mass and charge-balanced. It also supports model reconstruction for plants. | Web-based | Public | 919 (83.55) | [ |
| Pantograph | Available GEMs | It uses available models as a reaction database and orthology mappings between genomes of target and template organisms to reconstruct the GEM. It does not apply automatic gap-filling to the draft models. | Python | Public | 22 (3.67) | [ |
| Pathway Tools | MetaCyc | It generates the model based on genes, reactions, and metabolites stored in organism-specific PGDB (pathway/genome database) and annotated genome. PGDB also helps in filling the gaps in the pathways. It contains 12 experimentally confirmed biomass reactions. Based on the taxonomy of the targeted organism, one biomass reaction is incorporated into the model. | Web-based, Python (via PythonCyc) | Free for academic and government researchers, | 216 (43.2) | [ |
| RAVEN | Available GEMs, KEGG, MetaCyc | It provides a flexible environment to build a draft model. Users can employ multiple template models simultaneously. This tool can also be used to build the models using reaction databases like KEGG and MetaCyc. Additionally, networks built on different databases can be merged into one model. RAVEN also contains functions for gap-filling and subcellular localization (for eukaryotes). | Matlab | Public | 97 (32.33) | [ |
| rBioNet | - | This is a part of COBRA Toolbox. It is not an automatic tool to populate the reactions in a draft model from any reaction database. Users need to provide manually or automatically created reaction databases as input for this tool. It comprises the functions to check the quality of newly added reactions such as duplication, charge, and mass balances. | Matlab | Public | 71 (7.1) | [ |
| SuBliMinaL Toolbox | KEGG, MetaCyc | It provides modules to extract reactions from KEGG and MetaCyc and merge both versions into a single network. This tool creates biomass reactions based on the biomass precursors present in the draft model. It also has a module to perform subcellular compartmentalization for reactions in the network. | Java | Public | 103 (10.3) | [ |
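
Several rows in the table mention mass and charge balance, either as a quality check on newly added reactions (rBioNet) or as an assumption about the reaction database (ModelSEED/KBase). The sketch below illustrates what such a check involves, using only the Python standard library. It is not the implementation of any listed tool: the function names `parse_formula` and `check_balance` are hypothetical, the parser assumes simple formulas without parentheses or hydrates, and the formulas and charges are illustrative values that should be verified against the reaction database in use.

```python
import re
from collections import Counter

# Illustrative formulas and charges (assumed values; verify against your database).
METABOLITES = {
    "glc": ("C6H12O6", 0),
    "atp": ("C10H12N5O13P3", -4),
    "g6p": ("C6H11O9P", -2),
    "adp": ("C10H12N5O10P2", -3),
    "h": ("H", 1),
}


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H12O6'.

    Assumes no parentheses, hydrates, or generic groups such as 'R'.
    """
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def check_balance(stoichiometry, metabolites):
    """Return (unbalanced_elements, net_charge) for one reaction.

    stoichiometry maps metabolite id -> coefficient (negative = consumed).
    metabolites maps metabolite id -> (formula, charge).
    A balanced reaction yields ({}, 0).
    """
    elements = Counter()
    net_charge = 0
    for met_id, coeff in stoichiometry.items():
        formula, charge = metabolites[met_id]
        for element, n in parse_formula(formula).items():
            elements[element] += coeff * n
        net_charge += coeff * charge
    unbalanced = {e: v for e, v in elements.items() if v != 0}
    return unbalanced, net_charge


# Hexokinase-like reaction: glc + atp -> g6p + adp + h
HEX1 = {"glc": -1, "atp": -1, "g6p": 1, "adp": 1, "h": 1}
# The same reaction with the proton omitted, to show an unbalanced case.
HEX1_NO_PROTON = {k: v for k, v in HEX1.items() if k != "h"}

if __name__ == "__main__":
    print(check_balance(HEX1, METABOLITES))            # ({}, 0)
    print(check_balance(HEX1_NO_PROTON, METABOLITES))  # ({'H': -1}, -1)
```

With the assumed formulas, the complete reaction balances for every element and for charge, while omitting the proton leaves one hydrogen and one unit of charge unaccounted for on the product side. A check of this kind is typically run over every reaction in a draft model before flux balance analysis.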