Dipak Barua, Joonhoon Kim, Jennifer L Reed.
Abstract
Integrated constraint-based metabolic and regulatory models can accurately predict cellular growth phenotypes arising from genetic and environmental perturbations. Challenges in constructing such models involve the limited availability of information about transcription factor-gene target interactions and computational methods to quickly refine models based on additional datasets. In this study, we developed an algorithm, GeneForce, to identify incorrect regulatory rules and gene-protein-reaction associations in integrated metabolic and regulatory models. We applied the algorithm to refine integrated models of Escherichia coli and Salmonella typhimurium, and experimentally validated some of the algorithm's suggested refinements. The adjusted E. coli model showed improved accuracy (∼80.0%) for predicting growth phenotypes for 50,557 cases (knockout mutants tested for growth in different environmental conditions). In addition to identifying needed model corrections, the algorithm was used to identify native E. coli genes that, if over-expressed, would allow E. coli to grow in new environments. We envision that this approach will enable the rapid development and assessment of genome-scale metabolic and regulatory network models for less characterized organisms, as such models can be constructed from genome annotations and cis-regulatory network predictions.
Year: 2010 PMID: 21060853 PMCID: PMC2965739 DOI: 10.1371/journal.pcbi.1000970
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Figure 1. Example Network Illustrating the GeneForce Approach.
(A) Predicted fluxes through an un-regulated metabolic network, where all reactions are available (indicated by green arrows) and flux through the biomass reaction (v_Biomass) is maximized. The numbers and thickness of the arrows indicate flux values. (B) Predicted flux through an integrated metabolic and regulatory model (SR-FBA), where numbers and arrow thicknesses indicate flux values. The regulatory network includes regulation of two genes (G1 and G2) by two transcription factors (TF1 and TF2), where TF1 activates G1 and TF2 represses G2. G1 is needed for the B→C reaction and G2 is needed for the A→D reaction. Binary gene expression indicators (y_G1 and y_G2) show the expression status of G1 and G2, and binary transcription factor activity indicators (x_TF1 and x_TF2) show the binding status of TF1 and TF2, with value 1 indicating the expressed/active condition and 0 indicating the unexpressed/inactive condition. Regulatory interactions are shown as dashed lines, where a normal arrowhead indicates activation and a blunt arrowhead indicates repression. The colors indicate the state (active = green, inactive = red) of transcription factors and metabolic gene expression, or the availability of metabolic reactions (available = green, unavailable = red). (C) Fluxes and surrogate gene expression indicator values as predicted by the GeneForce approach. The reactions (B→C and A→D) are now dependent on the surrogate gene expression indicators (y′_G1 and y′_G2) instead of the expression status of genes G1 and G2 (y_G1 and y_G2). A threshold biomass flux (μ_threshold) is set as a constraint, and the GeneForce algorithm minimizes the sum of the differences between the surrogate gene expression indicators (shown in C) and the gene expression indicators (shown in B) while satisfying this constraint.
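The optimization described in panel C can be illustrated with a minimal Python sketch. It is not the authors' implementation: GeneForce is formulated as an optimization over a full metabolic and regulatory model, whereas this sketch swaps in brute-force enumeration over a hypothetical two-gene network. The route capacities, the uptake limit, the helper names (`max_biomass_flux`, `geneforce_toy`), and the use of an absolute difference between surrogate and regulatory indicators are all assumptions made for illustration and do not reproduce the network in Figure 1.

```python
from itertools import product

# Hypothetical toy network (assumed values, not taken from Figure 1):
# each gene opens one route from substrate to biomass with a fixed capacity.
ROUTE_CAPACITY = {"G1": 10.0, "G2": 4.0}
UPTAKE_LIMIT = 10.0  # assumed upper bound on substrate uptake


def max_biomass_flux(expressed):
    """Maximum biomass flux when only routes of expressed genes are open."""
    open_capacity = sum(
        cap for gene, cap in ROUTE_CAPACITY.items() if expressed[gene]
    )
    return min(UPTAKE_LIMIT, open_capacity)


def geneforce_toy(y, mu_threshold):
    """Find surrogate indicators y' closest to the regulatory prediction y
    such that the biomass flux reaches mu_threshold.

    Returns (number_of_rule_violations, y_prime), or None if no setting of
    the surrogate indicators can reach the threshold.
    """
    genes = sorted(y)
    best = None
    for bits in product((0, 1), repeat=len(genes)):
        y_prime = dict(zip(genes, bits))
        if max_biomass_flux(y_prime) < mu_threshold:
            continue  # growth constraint not satisfied
        # Assumption: each difference is counted as an absolute value.
        violations = sum(abs(y_prime[g] - y[g]) for g in genes)
        if best is None or violations < best[0]:
            best = (violations, y_prime)
    return best


if __name__ == "__main__":
    # Regulatory rules predict both genes off; require a biomass flux of 5.
    print(geneforce_toy({"G1": 0, "G2": 0}, mu_threshold=5.0))
```

In this toy setting, forcing G1 on is the smallest change that restores growth, which mirrors how a gene whose regulatory rule blocks growth is flagged as a candidate for rule correction.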
E. coli model refinements and the conditions under which they were identified by GeneForce.
| Refinement Step | Gene | Original Rule | Refined Rule | Condition | Comment |
|---|---|---|---|---|---|
| | | (NOT MetJ) | GPR correction | Gly-Met (N); Met-Ala (N) | Unknown transporter for L-methionine (PMID: 4604763) |
| | | (NagC) | (ON) | N-acetyl-D-glucosamine (C,N); N-acetyl-D-mannosamine (C,N); N-acetyl-neuraminic acid (N) | Essential gene (PMID: 8407787) |
| | | (NOT val-L(e)>0 ) | (ON) | b3773 ( | α-acetolactate or α-acetohydroxybutyrate inducer for |
| | | (ilvY) | (ilvY AND NOT (val-L(e)>0)) OR (NOT ilvY) | b3773 ( | Constitutive expression of |
| | | (Crp AND (NOT Lrp OR (leu-L(e)>0))) | ((Crp AND (NOT Lrp OR (leu-L(e)>0)))) OR (ser-L(e)>0) | L-serine (N) | Transporters for ser-L; |
| | | (NOT Lrp OR (leu-L(e)>0)) | (NOT GcvB) | D-alanine (C,N) | No Lrp binding; CycA transporter for 6 amino acids (PMID: 19118351) |
| | | | (NOT GcvR AND GcvA) | D-alanine (C,N) | New regulatory small RNA (PMID: 10972807) |
| | | | GPR correction DsdC or (DsdC and Crp) | D-serine (C,N) | New ser-D transporter (This study, PMID: 16952954); regulation (PMID: 7592420) |
| | | (NOT (rib-D(e)>0)) | (NOT ((all-D(e)>0) OR (rib-D(e)>0))) | b2914 ( | |
| | | (SoxS) | (ON) | b0118 ( | Two aconitases (PMID: 9202458) |
| | | (NOT Lrp OR (leu-L(e)>0)) | (ON) | b2797 ( | L-serine/L-threonine deaminases; SdaA (anaerobic), TdcB (anaerobic), IlvA (PMID: 13405870, 15155761) |
| | | (NOT ArgR) | (ON) | L-arginine (N) | Required for L-lysine biosynthesis |
| | | ((NOT(Growth>0) AND RpoS) OR (NRI_hi AND RpoN)) AND (NOT Lrp OR (leu-L(e)>0)) | ((NOT(Growth>0) AND RpoS) OR (NRI_hi AND RpoN)) | L-arginine (N) | AST pathway for L-arginine degradation (PMID: 9696779) |
| | | (NOT (PurR)) | (NOT (PurR)) AND (NOT (AGMT>0)) | L-arginine (C) | Putrescine inhibits transcription of |
| | | (MetR) | (metR) OR (met-L(e)>0) | Gly-Met (N); Met-Ala (N) | methionine represses |
| | | (NOT (thr-L(e)>0 OR ile-L(e)>0)) AND (NOT Lrp OR (leu-L(e)>0)) | (NOT (thr-L(e)>0 OR ile-L(e)>0)) | Gly-Met (N); Met-Ala (N) | methionine represses |
| | | (RhaR) | (RhaR OR (RhaR AND Crp)) | L-lyxose (C) | |
| | | (rmn(e)>0) | (rmn(e)>0 OR lyx(e)>0 OR man(e)) | L-lyxose (C) | |
| | | (Lrp AND NOT (leu-L(e)>0) OR (NOT (Crp))) | (ON) | b0889 (lrp) | Essential in glucose and glycerol minimal medium (PMID: 17012394) |
| | | (Lrp AND NOT (leu-L(e)>0)) | (ON) | b0889 (lrp) | Essential in glucose and glycerol minimal medium (PMID: 17012394) |
| | | (NOT(leu-L(e)>0) AND Lrp) | (NOT(leu-L(e)>0)) | b0889 (lrp) | Essential in glucose and glycerol minimal medium (PMID: 17012394) |
| | | (NOT(leu-L(e)>0) AND Lrp) | (NOT(leu-L(e)>0)) | b0889 (lrp) | Essential in glucose and glycerol minimal medium (PMID: 17012394) |
| | | (NOT(leu-L(e)>0) AND Lrp) | (NOT(leu-L(e)>0)) | b0889 (lrp) | Essential in glucose and glycerol minimal medium (PMID: 17012394) |
| | | (NOT(leu-L(e)>0) AND Lrp) | (NOT(leu-L(e)>0)) | b0889 (lrp) | Essential in glucose and glycerol minimal medium (PMID: 17012394) |
| | | (NOT(leu-L(e)>0 OR val-L(e)>0) AND Crp) to (ON) | (ON) | b0889 (lrp); glucose (C); gluconate (C) | |
| | | (NOT(leu-L(e)>0 OR val-L(e)>0) AND Crp) to (ON) | (ON) | b0889 (lrp); glucose (C); gluconate (C) | regulatory subunit of |
| | | (((“CRP noMAN”) AND NOT(ArcA) AND (DcuR)) | (ON) | b0889 (lrp); L-malate (C) | |
*indicates alternative optimal solutions exist for this change.
(C) indicates carbon source and (N) indicates nitrogen source.
A- Rule corrections needed for iMC104+iJR904.
B- Rule corrections needed for iMC105A+iAF1260.
C- Rule corrections needed for iMC105AB+iAF1260.
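The rules in the table are Boolean expressions over transcription factor activities and medium composition. As a minimal sketch, the Python functions below encode the row whose original rule is (Crp AND (NOT Lrp OR (leu-L(e)>0))) and whose refined rule adds OR (ser-L(e)>0). The function names and the reading of `x(e)>0` as "metabolite x has a positive extracellular concentration" are assumptions made for illustration.

```python
def original_rule(crp, lrp, leu_ext):
    """(Crp AND (NOT Lrp OR (leu-L(e)>0)))"""
    return crp and ((not lrp) or leu_ext > 0)


def refined_rule(crp, lrp, leu_ext, ser_ext):
    """((Crp AND (NOT Lrp OR (leu-L(e)>0)))) OR (ser-L(e)>0)"""
    return original_rule(crp, lrp, leu_ext) or ser_ext > 0


if __name__ == "__main__":
    # Assumed scenario: L-serine supplied in the medium, Lrp active,
    # no extracellular L-leucine.
    state = dict(crp=True, lrp=True, leu_ext=0.0)
    print(original_rule(**state))               # gene predicted off
    print(refined_rule(**state, ser_ext=1.0))   # gene predicted on
```

Under the assumed scenario the original rule keeps the gene off, while the refined rule lets extracellular L-serine switch it on, which is the kind of change listed in the "Refined Rule" column.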
Accuracy and number of rule correction and rescue non-growth cases at successive stages of regulatory rule refinements.
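| | Stage 1 | Stage 2 | Stage 3 | Stage 4 | Stage 5 |
|---|---|---|---|---|---|
| Total growth phenotypes analyzed | 32,050 | 32,050 | 50,327 | 50,557 | 50,557 |
| Rule correction cases | 3,079 | 445 | 1,546 | 565 | 510 |
| Rescue non-growth cases | 2,041 | 1,847 | 2,130 | 2,087 | 2,070 |
| Correct predictions (accuracy) | 23,670 (73.9%) | 26,112 (81.5%) | 39,288 (78.1%) | 40,403 (79.9%) | 40,441 (80.0%) |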
Metabolic and regulatory networks used in the integrated models.
Total number of growth phenotypes analyzed.
Number (percent) of cases where the integrated model predictions were in agreement with experimental data.
Figure 2. Accuracy and Number of Rule Correction Cases.
Application of GeneForce to correct growth phenotype predictions by overriding regulatory rules. (A) Growth phenotype prediction accuracy of integrated regulatory-metabolic network models at various steps of regulatory network refinement. Accuracy (solid circles) is calculated by dividing the total number of correct (experimentally consistent) predictions by the total number of cases evaluated (open squares) at each step. The colors correspond to the metabolic networks used in the integrated metabolic and regulatory network models, with red for iJR904 and blue for iAF1260. (B) The total number of ‘rule correction’ cases (solid circles) for each regulatory network is plotted. Such cases are represented by +/+/− (Exp/Met/Met+Reg) in the growth comparison tables (Supporting Information Tables S1 and S2).
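The accuracy calculation in panel A can be checked against the counts reported in the accuracy table. The short Python sketch below recomputes each percentage as correct predictions divided by cases evaluated; the variable and function names are illustrative.

```python
# Counts taken from the accuracy table, in column order.
total_cases = [32050, 32050, 50327, 50557, 50557]
correct_predictions = [23670, 26112, 39288, 40403, 40441]


def accuracy_percent(n_correct, n_total):
    """Percentage of experimentally consistent predictions, to one decimal."""
    return round(100.0 * n_correct / n_total, 1)


accuracies = [
    accuracy_percent(c, t) for c, t in zip(correct_predictions, total_cases)
]

if __name__ == "__main__":
    print(accuracies)
```

The recomputed values agree with the percentages reported in the table.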
Figure 3. Number of Rule Corrections Needed to Correct Model Predictions.
Distribution of rule corrections for +/+/− cases before and after rule corrections for (A) iJR904 with rules from iMC104 (with Lrp modified regulatory rules) and iMC105A, and (B) iAF1260 with rules from iMC105A and iMC105AB. The total number of +/+/− cases for each integrated model is indicated in parentheses in the legend. For each +/+/− case, the minimum number of genes requiring regulatory rule corrections was determined. Panels A and B are histograms of the number of cases in which 1 to 12 genes need regulatory rule corrections.
Figure 4. Phenotyping Experiments to Confirm Rule Corrections.
Growth phenotype screens for (A) BW25113 (parent strain), lrp::kan ΔilvB, lrp::kan ΔilvN, lrp::kan ΔilvH, and lrp::kan ΔilvI on glucose M9 minimal media, (B) BW25113, lrp::kan, ΔdctA, and lrp::kan ΔdctA on L-malate M9 minimal media, (C) BW25113, ΔrpiA, ΔrpiB, and rpiA::kan ΔrpiB on D-ribose M9 minimal media, (D) BW25113, ΔrpiA, ΔrpiB, and rpiA::kan ΔrpiB on D-allose M9 minimal media, (E) BW25113, ΔcycA, ΔdsdX, and cycA::kan ΔdsdX on D-alanine M9 minimal media, and (F) BW25113, ΔcycA, ΔdsdX, and cycA::kan ΔdsdX on D-serine M9 minimal media.
Single genes or operons that are predicted to rescue non-growth phenotypes under aerobic conditions.
| Media | Gene | Condition |
|---|---|---|
| Citrate | | Carbon Source |
| Sucrose | | Carbon Source |
| 1,2 propanediol | | Carbon Source |
| Butyrate | | Carbon Source |
| L-tartrate | | Carbon Source |
| Allantoin | | Nitrogen Source |
| Nitrite | | Nitrogen Source |
Figure 5. Number of Rule Corrections Needed to Rescue Non-Growth Phenotypes.
Distribution of ‘rescue non-growth’ (−/+/−) cases before and after rule corrections for (A) iJR904 with rules from iMC104 (with Lrp modified regulatory rules) and iMC105A, and (B) iAF1260 with rules from iMC105A and iMC105AB. The number in parentheses in the legends indicates the total number of −/+/− cases for the different integrated models. For each −/+/− case, the minimum number of genes requiring regulatory rule violations was determined. Panels A and B are histograms of the number of cases requiring 1 to 12 genes to be overexpressed to rescue non-growth phenotypes.
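The +/+/− and −/+/− labels used in the figure captions follow the Exp/Met/Met+Reg order: experimental growth, growth predicted by the metabolic model alone, and growth predicted by the integrated metabolic and regulatory model. A minimal Python sketch of this classification is given below; the function name and the "other" label for all remaining combinations are assumptions made for illustration.

```python
def classify_case(exp_growth, met_growth, met_reg_growth):
    """Classify one growth phenotype by its (Exp, Met, Met+Reg) pattern."""
    pattern = (exp_growth, met_growth, met_reg_growth)
    if pattern == (True, True, False):
        return "rule correction"      # +/+/-
    if pattern == (False, True, False):
        return "rescue non-growth"    # -/+/-
    return "other"                    # assumed catch-all label


if __name__ == "__main__":
    print(classify_case(True, True, False))
    print(classify_case(False, True, False))
```

Both flagged patterns share the feature that the metabolic network alone supports growth while the regulatory rules block it, which is why overriding gene expression indicators can change the prediction.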
Regulatory rules needing correction when integrated with a S. typhimurium metabolic network.
| Gene | Original Rule | Refined Rule |
|---|---|---|
| | (ppa(e)>0) | (PrpR AND RpoN AND (HimA AND HimD)) |
| | (ppa(e)>0) | (PrpR AND RpoN AND (HimA AND HimD)) |
| | (ppa(e)>0) | (PrpR AND RpoN AND (HimA AND HimD)) |
| | (MCITS>0) | |
| | ON | |
| | ON | |
| | ((NOT (Crp OR FadR OR OmpR))) | ON |
| | (((((FucR) OR (rmn(e)>0)) AND (NOT (o2(e)>0))) AND Crp) OR (((FucR) OR (rmn(e)>0)) AND (NOT (o2(e)>0)))) | (fuc-L(e)>0 OR rmn(e)>0) |
| | (Crp AND RpoN) | ON |
| | (NOT(o2(e)>0) AND (tartr-L(e)>0)) | (tartr-L(e)>0) |
| | (NOT(o2(e)>0) AND (tartr-L(e)>0)) | (tartr-L(e)>0) |
| | (ArcA OR Fnr AND (Crp OR NOT (NarL))) | ON |
| | (NOT ArgR) | (NOT ArgR) OR (arg-L(e)>0) |
| | (NOT PurR) | ON |
| | (NOT (PurR AND Crp)) | ON |
| | (NOT (PurR AND Crp)) | ON |
| | (RhaR) | (RhaR OR (RhaR AND Crp)) |
| | (rmn(e)>0) | (rmn(e)>0 OR lyx(e)>0 OR man(e)) |
Corrections common to E. coli and S. typhimurium.
prpR, himA and himD were added to the regulatory network to update the regulatory rule for the prpBCD operon, and were not part of the original 505 regulatory rules for S. typhimurium.