Do genome-scale models need exact solvers or clearer standards?

Ali Ebrahim¹, Eivind Almaas², Eugen Bauer³, Aarash Bordbar⁴, Anthony P Burgard⁵, Roger L Chang⁶, Andreas Dräger⁷, Iman Famili⁸, Adam M Feist¹, Ronan MT Fleming³, Stephen S Fong⁹, Vassily Hatzimanikatis¹⁰, Markus J Herrgård¹¹, Allen Holder¹², Michael Hucka¹³, Daniel Hyduke¹⁴, Neema Jamshidi¹⁵, Sang Yup Lee¹⁶, Nicolas Le Novère¹⁷, Joshua A Lerman¹, Nathan E Lewis¹⁸, Ding Ma¹⁹, Radhakrishnan Mahadevan²⁰, Costas Maranas²¹, Harish Nagarajan⁵, Ali Navid²², Jens Nielsen²³, Lars K Nielsen²⁴, Juan Nogales²⁵, Alberto Noronha³, Csaba Pal²⁶, Bernhard O Palsson¹, Jason A Papin²⁷, Kiran R Patil²⁸, Nathan D Price²⁹, Jennifer L Reed³⁰, Michael Saunders¹⁹, Ryan S Senger³¹, Nikolaus Sonnenschein¹¹, Yuekai Sun³², Ines Thiele³.

Year: 2015 PMID: 26467284 PMCID: PMC4631202 DOI: 10.15252/msb.20156157

Source DB: PubMed Journal: Mol Syst Biol ISSN: 1744-4292 Impact factor: 11.429

Constraint-based analysis of genome-scale models (GEMs) arose shortly after the first genome sequences became available. As numerous reviews of the field show, this approach has proven successful in studying a wide range of biological phenomena (McCloskey et al, 2013; Bordbar et al, 2014). However, efforts to expand the user base are impeded by hurdles in correctly formulating these problems to obtain numerical solutions. In particular, in a study entitled “An exact arithmetic toolbox for a consistent and reproducible structural analysis of metabolic network models” (Chindelevitch et al, 2014), the authors apply an exact solver to 88 genome-scale constraint-based models of metabolism. The authors claim that COBRA calculations (Orth et al, 2010) are inconsistent with their results and that many published and actively used (Lee et al, 2007; McCloskey et al, 2013) genome-scale models support cellular growth in existing studies only because of numerical errors. They base these broad claims on two observations: (i) three reconstructions (iAF1260, iIT341, and iNJ661) are feasible when computed in COBRA but infeasible when their software, MONGOOSE, applies exact numerical algorithms; (ii) linear programs generated by MONGOOSE for iIT341 were submitted to the NEOS Server (a Web site that runs linear programs through various solvers) and gave inconsistent results. They further claim that a large percentage of these COBRA models are actually unable to produce biomass flux. Here, we demonstrate that the claims made by Chindelevitch et al (2014) stem from an incorrect parsing of models from files rather than actual problems with numerical error or COBRA computations.

Calculating numerically accurate and thermodynamically consistent flux states

To prove the feasibility of biomass production in these three models, along with some others, we used the same rational solver, QSopt_ex (Applegate et al, 2007), to compute feasible flux states. Moreover, we used SymPy, a symbolic math library (Joyner et al, 2012), to show that the exactly computed feasible flux state has no numerical error. Furthermore, the optimal growth rates computed by QSopt_ex matched those computed by several floating-point solvers accessed via cobrapy (CPLEX, gurobi, glpk, and MOSEK) and the COBRA toolbox (gurobi and CPLEX) to well within a precision of 10⁻⁶. Using linear programming problems generated by COBRA for iIT341 and a version of the model we constrained to produce no biomass, we observed consistent results between COBRA and the reputable solvers hosted on the NEOS server. These results unequivocally demonstrate that these COBRA models solve consistently with both rational and floating-point solvers. We were able to extend this analysis to show that 23 out of 29 models that Chindelevitch et al (2014) claim to be “blocked” by FBA have solutions that produce biomass flux without numerical error (Table EV1). Thus, the authors' claim that exact arithmetic is necessary for consistency and reproducibility is inaccurate, as is their finding that these previously published and computed models do not produce biomass flux. The authors further claim that even more models are “energy blocked” and cannot produce a feasible flux state to produce biomass without thermodynamically infeasible cycles (often referred to as type III loops). Using loopless FBA (Schellenberger et al, 2011a), we were able to compute solutions that produce biomass without using these loops. Moreover, we demonstrate that when all reactions allow zero flux (as is the case in the MONGOOSE formulation), all solutions with loops can be converted into solutions without loops and still produce biomass.
As these solutions were obtained using an existing algorithm, the inability of MONGOOSE to identify such solutions is a limitation of the method used by MONGOOSE, not of the published reconstructions as stated by Chindelevitch et al (2014). In total, our analysis shows that for 51 out of 59 models, the claims made by MONGOOSE about model blockage are incorrect (Table EV1).
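The exact verification step described above can be illustrated with a minimal sketch. It uses Python's standard `fractions` module in place of SymPy and QSopt_ex, and a hypothetical three-reaction toy network (the matrix `S`, the reaction names, and the helper functions are assumptions made for illustration, not part of the study's code). It checks the steady-state condition S·v = 0 in rational arithmetic for a flux state given as exact rationals and for the same state stored as floating-point numbers.

```python
from fractions import Fraction

# Hypothetical toy network, not one of the published models:
#   R_in:          -> A
#   R_conv:    A   -> 2 B
#   R_bio:     3 B ->          (stand-in for a biomass drain)
# Rows are metabolites (A, B); columns are reactions (R_in, R_conv, R_bio).
S = [
    [1, -1, 0],   # A
    [0, 2, -3],   # B
]

def residual(S, v):
    """Return S*v in exact rational arithmetic (v must hold Fractions)."""
    return [sum(Fraction(c) * flux for c, flux in zip(row, v)) for row in S]

def is_exactly_balanced(S, v):
    """True only if every mass-balance row is exactly zero."""
    return all(r == 0 for r in residual(S, v))

# Flux state as an exact (rational) solver would report it.
v_rational = [Fraction(3, 10), Fraction(3, 10), Fraction(1, 5)]

# The same state as binary floating-point numbers. Fraction(float) converts
# the stored double exactly, so any rounding already present becomes visible.
v_float = [Fraction(x) for x in (0.3, 0.3, 0.2)]

worst_float_residual = max(abs(r) for r in residual(S, v_float))

print(is_exactly_balanced(S, v_rational))          # exact check, rational state
print(is_exactly_balanced(S, v_float))             # exact check, rounded state
print(worst_float_residual < Fraction(1, 10**6))   # tolerance-based check
```

In this toy case the rational state satisfies mass balance exactly, while the floating-point copy leaves a residual far below 10⁻⁶. This is the sense in which a floating-point solution and an exact solution can agree "to well within" a stated precision.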

A call for clear standards in model formulation

While the article by Chindelevitch et al (2014) has a valid goal of computing flux states that have been diligently checked for numerical error and thermodynamically infeasible loops, its general conclusions about the current state of COBRA models are incorrect. While new tools to ensure model quality are welcome, conventional checks with minimal computational overhead already exist, and are routinely employed by the community of flux balance analysis users to ensure that models produce numerically accurate and thermodynamically consistent flux states. We have identified the primary source of the differences between our computations and those reported by Chindelevitch et al (2014) to be difficulties with parsing reconstructions from published files and their conversion into computable models. Many of the models were read from reconstructions encoded as SBML files. The mechanism of encoding COBRA model information along with a reconstruction in SBML was originally defined by the COBRA toolbox (Schellenberger et al, 2011b), which we therefore consider the reference implementation. For example, as a part of the SBML encoding, boundary metabolites are written with their SBML boundary condition set to true for “exchange” reactions. This convention is meant to signify a system boundary where extracellular metabolites enter and leave the system. The parser developed by Chindelevitch et al (2014) to read models from SBML reconstructions ignores this distinction and therefore adds extra constraints to the model. These incorrectly added constraints block any metabolites from entering the system, so that no growth solution consistent with mass balance is feasible: no mass enters, and therefore no growth is possible. Thus, the erroneous results and conclusions reported by Chindelevitch et al (2014) stem from incorrect parsing of SBML files, which produced ill-formulated models and a misinterpretation of their calculations.
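The effect of ignoring the boundary condition flag can be sketched on a tiny example. The snippet below is an illustration only: the XML string is a hypothetical, namespace-free fragment that borrows SBML element names (it is not a valid SBML file), and the functions `mass_balance_rows` and `structurally_blocked` are assumed helpers, not the parsers used by the COBRA toolbox, cobrapy, or MONGOOSE. It builds mass-balance rows with and without honoring `boundaryCondition="true"`, then flags reactions that are forced to zero flux because some metabolite row contains only that one reaction.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: one exchange reaction and one stand-in consumer.
SBML_LIKE = """
<model>
  <listOfSpecies>
    <species id="glc_b" boundaryCondition="true"/>
    <species id="glc_e" boundaryCondition="false"/>
  </listOfSpecies>
  <listOfReactions>
    <reaction id="EX_glc">
      <listOfReactants><speciesReference species="glc_e" stoichiometry="1"/></listOfReactants>
      <listOfProducts><speciesReference species="glc_b" stoichiometry="1"/></listOfProducts>
    </reaction>
    <reaction id="UPTAKE">
      <listOfReactants><speciesReference species="glc_e" stoichiometry="1"/></listOfReactants>
    </reaction>
  </listOfReactions>
</model>
""".strip()

def mass_balance_rows(xml_text, honor_boundary):
    """Map each balanced metabolite to {reaction id: stoichiometric coefficient}."""
    root = ET.fromstring(xml_text)
    boundary = {s.get("id") for s in root.iter("species")
                if s.get("boundaryCondition") == "true"}
    rows = {}
    for rxn in root.iter("reaction"):
        for list_tag, sign in (("listOfReactants", -1.0), ("listOfProducts", 1.0)):
            for ref in rxn.findall(f"{list_tag}/speciesReference"):
                met = ref.get("species")
                if honor_boundary and met in boundary:
                    continue  # boundary species get no mass-balance constraint
                coef = sign * float(ref.get("stoichiometry", "1"))
                rows.setdefault(met, {})[rxn.get("id")] = coef
    return rows

def structurally_blocked(rows):
    """Reactions forced to zero because a row has a single remaining reaction."""
    blocked = set()
    changed = True
    while changed:
        changed = False
        for coefs in rows.values():
            active = [r for r, c in coefs.items() if c != 0 and r not in blocked]
            if len(active) == 1:
                blocked.add(active[0])
                changed = True
    return blocked

print(structurally_blocked(mass_balance_rows(SBML_LIKE, honor_boundary=True)))
print(structurally_blocked(mass_balance_rows(SBML_LIKE, honor_boundary=False)))
```

When the flag is honored, nothing is blocked. When it is ignored, the boundary metabolite `glc_b` gets its own mass-balance row containing only `EX_glc`, which forces the exchange flux to zero, and the block then propagates to `UPTAKE`. This mirrors, in miniature, how an extra constraint on boundary metabolites can prevent any mass from entering a model.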
Part of the issue, however, rests with difficulties associated with encoding models in a consistent format between different labs and software packages. As is the practice in the field, we contacted the authors of the models that we could not solve in order to resolve the differences; after all, the models had been used to perform COBRA computations in their respective publications. In these cases, the authors were able to supply a “fixed” SBML file after correcting errors in the SBML encoding in their respective codebases. An example of one such error was the presence of both “CO2” and “co2” as metabolites in the SBML file for iVS941 (Satish Kumar et al, 2011). While the GAMS software used in simulating that model is case-insensitive and correctly creates one constraint, parsing the file in other packages (such as the COBRA toolbox, cobrapy, and MONGOOSE) incorrectly created two separate constraints for the uppercase and lowercase versions. Therefore, an inadvertent error in a file encoding led to different mathematical models in different software tools, and working with the authors of the original model was necessary to resolve the differences. Out of the 88 models attempted by Chindelevitch et al (2014), we were able to solve 80, and 9 of these required modifications to fix encoding errors. We attempted to parse 6 of the remaining 8 reconstructions. While the models we parsed from these reconstructions did not solve, this result was still consistent between floating-point and exact solvers. This situation is a symptom of the well-known issue with interoperability of reconstructions between different laboratories and software packages in constraint-based modeling (Ravikrishnan & Raman, 2015). We believe the community can improve on these issues by adhering more closely to the standard practices of openness and reproducibility (Dräger & Palsson, 2014).
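An encoding error of the “CO2” versus “co2” kind can be caught with a simple pre-check before a model is built. The following is a minimal sketch under stated assumptions: the function name `case_collisions` and the example identifier list are hypothetical, and the check is not part of any of the packages named above.

```python
from collections import defaultdict

def case_collisions(metabolite_ids):
    """Group identifiers that differ only by letter case.

    Returns a sorted list of groups; each group is a sorted list of the
    distinct spellings that a case-sensitive parser would treat as
    separate metabolites (and therefore separate constraints).
    """
    groups = defaultdict(set)
    for met_id in metabolite_ids:
        groups[met_id.casefold()].add(met_id)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)

# Hypothetical identifier list containing the kind of error described above.
ids = ["CO2", "co2", "h2o", "atp"]
print(case_collisions(ids))
```

Any non-empty result signals identifiers that a case-insensitive tool would merge into one constraint and a case-sensitive tool would split into several, which is worth resolving with the model's authors before comparing results across software.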
We believe the community needs to standardize on the most recent version of the flux balance constraints (fbc) extension to SBML as the single well-specified format to reliably encode reconstructions, as strict use of fbc version 2 was specifically designed to build genome-scale models unambiguously [SBML-flux Working Group, 2014 SBML Flux Balance Constraints (fbc), http://sbml.org/Documents/Specifications/SBML_Level_3/Packages/Flux_Balance_Constraints_(flux) (Accessed June 13, 2015)]. Therefore, we propose that new reconstructions be published as validated SBML+fbc files and that the authors of existing reconstructions convert them into this format. Moreover, in the interests of reproducibility, studies including flux balance analysis on these genome-scale models should strive to make their code easily reproducible. The models and code used in this study are available as Dataset EV1 and also at https://github.com/opencobra/m_model_collection.

10 in total

1. Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox v2.0.

Authors: Jan Schellenberger; Richard Que; Ronan M T Fleming; Ines Thiele; Jeffrey D Orth; Adam M Feist; Daniel C Zielinski; Aarash Bordbar; Nathan E Lewis; Sorena Rahmanian; Joseph Kang; Daniel R Hyduke; Bernhard Ø Palsson
Journal: Nat Protoc Date: 2011-08-04 Impact factor: 13.491

2. Elimination of thermodynamically infeasible loops in steady-state metabolic models.

Authors: Jan Schellenberger; Nathan E Lewis; Bernhard Ø Palsson
Journal: Biophys J Date: 2011-02-02 Impact factor: 4.033

3. Critical assessment of genome-scale metabolic networks: the need for a unified standard.

Authors: Aarthi Ravikrishnan; Karthik Raman
Journal: Brief Bioinform Date: 2015-02-28 Impact factor: 11.622

4. Constraint-based models predict metabolic and associated cellular functions.

Authors: Aarash Bordbar; Jonathan M Monk; Zachary A King; Bernhard O Palsson
Journal: Nat Rev Genet Date: 2014-01-16 Impact factor: 53.242

5. What is flux balance analysis?

Authors: Jeffrey D Orth; Ines Thiele; Bernhard Ø Palsson
Journal: Nat Biotechnol Date: 2010-03 Impact factor: 54.908

6. Metabolic reconstruction of the archaeon methanogen Methanosarcina Acetivorans.

Authors: Vinay Satish Kumar; James G Ferry; Costas D Maranas
Journal: BMC Syst Biol Date: 2011-02-15

7. Improving collaboration by standardization efforts in systems biology.

Authors: Andreas Dräger; Bernhard Ø Palsson
Journal: Front Bioeng Biotechnol Date: 2014-12-08

8. An exact arithmetic toolbox for a consistent and reproducible structural analysis of metabolic network models.

Authors: Leonid Chindelevitch; Jason Trigg; Aviv Regev; Bonnie Berger
Journal: Nat Commun Date: 2014-10-07 Impact factor: 14.919

9. Basic and applied uses of genome-scale metabolic network reconstructions of Escherichia coli.

Authors: Douglas McCloskey; Bernhard Ø Palsson; Adam M Feist
Journal: Mol Syst Biol Date: 2013 Impact factor: 11.429

10. Systems metabolic engineering of Escherichia coli for L-threonine production.

Authors: Kwang Ho Lee; Jin Hwan Park; Tae Yong Kim; Hyun Uk Kim; Sang Yup Lee
Journal: Mol Syst Biol Date: 2007-12-04 Impact factor: 11.429
