Emil Karlsen1, Christian Schulz1, Eivind Almaas2,3.
Abstract
BACKGROUND: Constraint-based modeling is a widely used and powerful methodology to assess the metabolic phenotypes and capabilities of an organism. The starting point and cornerstone of all such modeling is a genome-scale metabolic network reconstruction. The creation, further development, and application of such networks is a growing field of research thanks to a plethora of readily accessible computational tools. While the majority of studies focus on single-species analyses, typically of a microbe, the computational study of communities of organisms is gaining attention. Similarly, reconstructions that are unified for a multi-cellular organism have gained in popularity. Consequently, the rapid generation of genome-scale metabolic network reconstructions is crucial. While multiple web-based or stand-alone tools are available for automated network reconstruction, there is currently no publicly available tool that allows the swift assembly of draft reconstructions of community metabolic networks and consolidated metabolic networks for a specified list of organisms.
Keywords: AutoKEGGRec; COBRA; Community model; Consolidated model; Constraint-based analysis; Draft model; Genome-scale metabolic model; Metabolic network reconstruction; Multiple organisms; Pipeline
Year: 2018 PMID: 30514205 PMCID: PMC6280343 DOI: 10.1186/s12859-018-2472-z
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 General workflow of the AutoKEGGRec pipeline. KEGG organism ID(s) are needed as input to the function (leftmost green box), and optional flags may be set (yellow boxes). The KEGG IDs are used to fetch the relevant data from the KEGG database and to handle them within the pipeline: (1) all links between EC numbers and their genes and (2) the further links between EC numbers and reactions are stored. From this information, (3) an Organisms-Reactions-Genes matrix is constructed. (4) Reactions are filtered, e.g. by removing polymerization or generic reactions. A first draft consolidated model reconstruction is returned (rightmost green box).
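Steps (1) to (4) of the workflow can be illustrated with a minimal Python sketch. It is not the AutoKEGGRec implementation: the link tables, organism labels, gene and reaction identifiers, the exclusion list, and the function name `build_org_rxn_gene` are all hypothetical stand-ins for data that the pipeline fetches from KEGG.

```python
from collections import defaultdict

# Toy stand-ins for KEGG link tables; every identifier below is an
# illustrative assumption, not data fetched from KEGG.
ec_to_genes = {  # step (1): EC number -> genes, per organism
    "org1": {"1.1.1.1": ["g1", "g2"], "2.7.1.1": ["g3"]},
    "org2": {"1.1.1.1": ["h1"]},
}
ec_to_reactions = {  # step (2): EC number -> reactions
    "1.1.1.1": ["R_A", "R_B"],
    "2.7.1.1": ["R_C"],
}
excluded_reactions = {"R_B"}  # step (4): hypothetical generic reaction


def build_org_rxn_gene(ec_to_genes, ec_to_reactions, excluded):
    """Step (3): return {reaction: {organism: sorted gene list}}."""
    matrix = defaultdict(dict)
    for org, ec_map in ec_to_genes.items():
        for ec, genes in ec_map.items():
            for rxn in ec_to_reactions.get(ec, []):
                if rxn in excluded:
                    continue  # filtered out, never enters the matrix
                previous = matrix[rxn].get(org, [])
                # Several EC numbers may map to the same reaction,
                # so gene lists are merged rather than overwritten.
                matrix[rxn][org] = sorted(set(previous) | set(genes))
    return dict(matrix)


matrix = build_org_rxn_gene(ec_to_genes, ec_to_reactions, excluded_reactions)
```

In this toy case the matrix holds `R_A` for both organisms and `R_C` for `org1` only, while the excluded `R_B` does not appear.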
Average execution time for AutoKEGGRec within MATLAB 2017b on a Dell Latitude 7490 with an i7-7600U using two-core parallelization
| Tested strains incl. number of organisms | Mean runtime [s] | Reactions/metabolites | Giant component [#, %] |
|---|---|---|---|
| | 2,995±65 | 1226/1188 | 1215/99.10 |
| | 3,957±166 | 1412/1445 | 1400/99.15 |
| | 3,538±216 | 1108/1134 | 1103/99.55 |
| | 4,792±289 | 1342/1302 | 1331/99.18 |
| | 5,618±322 | 1351/1312 | 1340/99.19 |
| | 6,724±130 | 1354/1316 | 1343/99.19 |
| | 7,750±208 | 1358/1319 | 1347/99.19 |
| | 9,115±198 | 1359/1321 | 1348/99.19 |
| | 10,032±651 | 1359/1321 | 1348/99.19 |
| | 11,466±1,065 | 1359/1321 | 1348/99.19 |
| | 12,687±1,269 | 1359/1321 | 1348/99.19 |
| | 13,848±1,658 | 1359/1321 | 1348/99.19 |
| | 15,082±1,850 | 1360/1323 | 1349/99.19 |
The columns are: the queried organisms; the mean and standard deviation of the runtime over five separate software executions per dataset, using ConsolidatedRec, SingleRecs, CommunityRec, OrgRxnGen, OmittedData and DisconnectedReactions as optional flags; the number of reactions and metabolites in the consolidated FDR; and the number and percentage of reactions in the consolidated network’s giant component.
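The giant-component column can be reproduced in principle with a connected-component search. The following Python sketch assumes that the network is treated as an undirected bipartite graph of reactions and metabolites and that the giant component is the component containing the most reactions. Both are assumptions, and the function name `giant_component_stats` and the toy network are hypothetical.

```python
from collections import defaultdict


def giant_component_stats(reactions):
    """reactions: {reaction_id: iterable of metabolite ids}.

    Returns (number of reactions in the largest connected component,
    percentage of all reactions that this represents).
    """
    parent = {}  # union-find forest over reaction and metabolite nodes

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Link every reaction node to each metabolite it involves.
    for rxn, mets in reactions.items():
        find(("rxn", rxn))
        for met in mets:
            union(("rxn", rxn), ("met", met))

    # Count reactions per component and keep the largest count.
    counts = defaultdict(int)
    for rxn in reactions:
        counts[find(("rxn", rxn))] += 1
    if not counts:
        return 0, 0.0
    largest = max(counts.values())
    return largest, 100.0 * largest / len(reactions)


# Toy network: R1-R3 share metabolites, R4 is disconnected.
toy = {"R1": ["A", "B"], "R2": ["B", "C"], "R3": ["C", "D"], "R4": ["X", "Y"]}
n_giant, pct_giant = giant_component_stats(toy)
```

The percentage is the share of reactions in the giant component, which matches the table: in the first row, 1215 of 1226 reactions corresponds to 99.10 %.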
Example output using the OrgRxnGen flag within AutoKEGGRec
| KEGG ID | eco | ecj | ecd | ebw | ecok | Sum | Total | Genes |
|---|---|---|---|---|---|---|---|---|
| R00001 | | | | | | 0 | 0 | 0 |
| R00002 | | | | | | 0 | 0 | 0 |
| R00004 | b4226 | JW4185 | ECDH10B_4421 | BWG_3936 | ECMDS42_3668 | 5 | 1 | 1 |
| R00005 | | | | | | 0 | 0 | 0 |
| R00006 | b0078, b3670, b0077, b3671, b3769 | JW0077, JW3645, JW0076, JW3646, JW3742 | ECDH10B_3853, ECDH10B_3958, ECDH10B_3854 | BWG_0073, BWG_0074, BWG_3454, BWG_3361, BWG_3362 | ECMDS42_3207, ECMDS42_0071, ECMDS42_0072, ECMDS42_3105, ECMDS42_3106 | 5 | 1 | 3;5 |
| R00008 | | | | | | 0 | 0 | 0 |
| R00009 | b1732, b3942 | JW1721, JW3914 | ECDH10B_1870, ECDH10B_4131 | BWG_1545, BWG_3611 | ECMDS42_1407, ECMDS42_3380 | 5 | 1 | 2 |
| R00010 | b1197, b3519 | JW3487, JW1186 | ECDH10B_3696, ECDH10B_1250 | BWG_3208, BWG_1022 | ECMDS42_2954, ECMDS42_0984 | 5 | 1 | 2 |
| R00011 | | | | | | 0 | 0 | 0 |
| R00012 | | | | | | 0 | 0 | 0 |
| R00013 | b0507 | JW0495 | ECDH10B_0463 | BWG_0384 | ECMDS42_0400 | 5 | 1 | 1 |
| R00014 | b0114, b0078, b3670, b0077, b3671, b3769 | JW0110, JW0077, JW3645, JW0076, JW3646, JW3742 | ECDH10B_0094, ECDH10B_3853, ECDH10B_3958, ECDH10B_3854 | BWG_0073, BWG_0074, BWG_3454, BWG_3361, BWG_3362, BWG_0107 | ECMDS42_3207, ECMDS42_0071, ECMDS42_0072, ECMDS42_0105, ECMDS42_3105, ECMDS42_3106 | 5 | 1 | 4;6 |
| R00015 | | | | | | 0 | 0 | 0 |
| R00017 | b3518 | JW3486 | ECDH10B_3695 | BWG_3207 | ECMDS42_2953 | 5 | 1 | 1 |
Here the Organism-Reaction-Gene matrix for the five E. coli K-12 strains is shown. The first column lists the first 14 reactions in KEGG, followed by the genes, if any, that encode each reaction in the different organisms. Also provided with this output, in the three rightmost columns, are the Sum of organisms whose metabolism contains this reaction, the Total fraction of strains whose metabolism contains this reaction, and the number of Genes for the different organisms for that reaction.
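As an illustration of how the three rightmost columns relate to the gene entries, the Python sketch below recomputes them for reaction R00006 from the table. The function name `summarize_row` is hypothetical, and the format of the Genes column (distinct non-zero gene counts joined by ";") is an assumption inferred from the entries "3;5" and "4;6", not a documented AutoKEGGRec rule.

```python
def summarize_row(genes_by_org, organisms):
    """genes_by_org: {organism: list of gene ids} for one reaction.

    Returns (Sum, Total, Genes) in the sense of the table above.
    """
    counts = [len(genes_by_org.get(org, [])) for org in organisms]
    present = [c for c in counts if c > 0]
    n_present = len(present)  # Sum: organisms with the reaction
    fraction = n_present / len(organisms) if organisms else 0.0  # Total
    # Assumed format: distinct non-zero gene counts joined by ";".
    genes_label = ";".join(str(c) for c in sorted(set(present))) or "0"
    return n_present, fraction, genes_label


organisms = ["eco", "ecj", "ecd", "ebw", "ecok"]

# Gene entries for R00006, copied from the table above.
r00006 = {
    "eco": ["b0078", "b3670", "b0077", "b3671", "b3769"],
    "ecj": ["JW0077", "JW3645", "JW0076", "JW3646", "JW3742"],
    "ecd": ["ECDH10B_3853", "ECDH10B_3958", "ECDH10B_3854"],
    "ebw": ["BWG_0073", "BWG_0074", "BWG_3454", "BWG_3361", "BWG_3362"],
    "ecok": ["ECMDS42_3207", "ECMDS42_0071", "ECMDS42_0072",
             "ECMDS42_3105", "ECMDS42_3106"],
}

row_r00006 = summarize_row(r00006, organisms)
row_empty = summarize_row({}, organisms)  # e.g. R00001, no genes found
```

Under these assumptions, R00006 yields a Sum of 5, a Total of 1.0, and the Genes label "3;5", while a reaction without genes yields zeros in all three columns.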
Fig. 2 Consolidated FDR (a) and community FDR (b) generated from five E. coli K-12 strains. The consolidated FDR (a) consists of the union of metabolic reactions for the query organisms. The displayed network consists of 1596 reactions (light green) and 1621 metabolites (dark green), and is based on the five E. coli K-12 strains with KEGG organism IDs eco, ecj, ecd, ebw, and ecok. The vast majority of metabolic reactions reside in the giant component. The community metabolic network (b) generated by AutoKEGGRec keeps the query organisms in separate compartments. The network consists of 8002 metabolites (dark green) and 7855 reactions (light green), with the vast majority of reactions associated with the five largest connected components. Note that the different organisms are not connected because this FDR does not contain transport reactions, which would connect the different organisms/compartments.
Comparison of eco, the FDR generated by AutoKEGGRec, with iAF1260 [31], iJO1366 [32], and ModelSeed model 511145.180 generated with default settings
| Comparison | eco | iAF1260 | iJO1366 | 511145.180 |
|---|---|---|---|---|
| # of reactions BR | 1224 | 2382 | 2583 | 1635 |
| # of metabolites BR | 1185 | 1668 | 1805 | 1573 |
| # of reactions AR | 1224 | 2081 | 2256 | 1510 |
| # of metabolites AR | 1185 | 1668 | 1805 | 1573 |
| network mean degree | 4.203 | 4.727 | 4.765 | 4.527 |
| network mean shortest path | 4.470 | 4.549 | 4.493 | 3.860 |
| # of blocked metabolites | 519 | 369 | 390 | 746 |
| # of same-sign-metabolites | 160 | 89 | 84 | 136 |
| # of genes in model | 1263 | 1260 | 1366 | 1139 |
| # of shared genes with | 1263 | 927 | 981 | 794 |
We removed all transport, biomass, and ATP maintenance reactions from iAF1260, iJO1366 and 511145.180. Except for reported values marked “BR” (before removal) and “AR” (after removal), results are for the reduced models. Definitions of “blocked metabolites” and “same-sign-metabolites” are provided in the main text.