CHRR: coordinate hit-and-run with rounding for uniform sampling of constraint-based models.

Hulda S Haraldsdóttir, Ben Cousins, Ines Thiele, Ronan M T Fleming, Santosh Vempala.

Abstract

SUMMARY: In constraint-based metabolic modelling, physical and biochemical constraints define a polyhedral convex set of feasible flux vectors. Uniform sampling of this set provides an unbiased characterization of the metabolic capabilities of a biochemical network. However, reliable uniform sampling of genome-scale biochemical networks is challenging due to their high dimensionality and inherent anisotropy. Here, we present an implementation of a new sampling algorithm, coordinate hit-and-run with rounding (CHRR). This algorithm is based on the provably efficient hit-and-run random walk and crucially uses a preprocessing step to round the anisotropic flux set. CHRR provably converges to a uniform stationary sampling distribution. We apply it to metabolic networks of increasing dimensionality. We show that it converges several times faster than a popular artificial centering hit-and-run algorithm, enabling reliable and tractable sampling of genome-scale biochemical networks.
AVAILABILITY AND IMPLEMENTATION: https://github.com/opencobra/cobratoolbox . CONTACT: ronan.mt.fleming@gmail.com or vempala@cc.gatech.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2017. Published by Oxford University Press.

Year:  2017        PMID: 28158334      PMCID: PMC5447232          DOI: 10.1093/bioinformatics/btx052

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

A constraint-based model of a metabolic network, with m metabolites and n reactions, consists of a set of equalities and inequalities that define a set Ω of feasible steady state reaction rates, or fluxes, $v \in \mathbb{R}^{n}$. In the linear case,

$$\Omega = \{\, v \mid S v = 0,\; l \le v \le u,\; c^{T} v = \alpha \,\}. \qquad (1)$$

Here, $S \in \mathbb{R}^{m \times n}$ is a generalized incidence matrix known as a stoichiometric matrix. It is defined such that $S_{ij}$ is the stoichiometric coefficient of metabolite i in reaction j. The linear equalities constrain the system to a steady state where fluxes into and out of every node are balanced. Nonequilibrium steady-states are enabled by including metabolite sources and sinks, collectively known as exchange reactions, at the boundary of the system with the environment. The inequalities arise from physicochemical constraints such as thermodynamics, as well as environmental constraints such as nutrient availability. Fluxes can be further constrained to the optimal value of a biologically inspired linear objective (Orth et al., 2010). Uniform sampling of constraint-based models (Thiele et al.) is a powerful tool for unbiased evaluation of the metabolic capabilities of biochemical networks (Lewis et al., 2012). Most applications developed for this purpose (Megchelenbrink et al., 2014; Saa and Nielsen, 2016; Thiele et al.) have been based on the artificial centering hit-and-run (ACHR) algorithm (Kaufman and Smith, 1998). ACHR is a non-Markovian process that is designed to ease exploration of a poorly structured set. However, it has an important drawback: it is not known whether it converges to the uniform distribution (Kaufman and Smith, 1998). Here, we present a Matlab implementation of coordinate hit-and-run with rounding (CHRR) that is compatible with the COnstraint-based Reconstruction and Analysis (COBRA) toolbox (Schellenberger et al., 2011). A major difference in our approach is a preprocessing step that allows us to use a much simpler Markov chain to explore the set of metabolic flows.
Rounding procedures have previously been used prior to sampling (De Martino et al., 2015), but our approach achieves significant improvements in both the quality of the rounding and the efficiency of the sampling method (see Supplementary Methods Section S1). We draw inspiration and guidance from the current state-of-the-art theoretical results for high-dimensional sampling (Lovász and Vempala, 2006a,b), while making small modifications that drastically improve efficiency in practice. We compare the performance of CHRR with a comparable implementation of ACHR (Schellenberger et al., 2011).
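To make the definition of the feasible set concrete, the following minimal Python sketch checks whether a candidate flux vector satisfies the steady-state equalities and the flux bounds. The function name `in_feasible_set`, the tolerance and the toy one-metabolite network are illustrative assumptions, not part of the CHRR implementation or the COBRA toolbox. The optional objective constraint is omitted.

```python
def in_feasible_set(S, l, u, v, tol=1e-9):
    """Return True if v satisfies S v = 0 and l <= v <= u (within tol).

    S is a list of rows (one per metabolite); l, u, v are lists of length n.
    """
    # Steady state: the net production of every metabolite must vanish.
    for row in S:
        net = sum(s_ij * v_j for s_ij, v_j in zip(row, v))
        if abs(net) > tol:
            return False
    # Bounds: every flux must lie between its lower and upper bound.
    return all(lo - tol <= v_j <= hi + tol for lo, v_j, hi in zip(l, v, u))


# Hypothetical toy network: one metabolite A with an uptake reaction (-> A)
# and a secretion reaction (A ->).
S = [[1.0, -1.0]]
l = [0.0, 0.0]
u = [10.0, 10.0]

balanced = in_feasible_set(S, l, u, [3.0, 3.0])         # uptake equals secretion
unbalanced = in_feasible_set(S, l, u, [3.0, 2.0])       # A would accumulate
out_of_bounds = in_feasible_set(S, l, u, [12.0, 12.0])  # exceeds upper bounds
```

In this toy case the feasible set is the line segment where both fluxes are equal and lie between 0 and 10. Genuine models have thousands of reactions, which is why sampling rather than enumeration is needed.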

2 Implementation

CHRR consists of rounding followed by sampling (see Supplementary Methods Section S1 for details). To round an anisotropic polytope, we use a maximum volume ellipsoid algorithm (Zhang and Gao, 2001). The rounded polytope is then sampled with a coordinate hit-and-run algorithm (Berbee et al.). Matlab (Mathworks, Natick, MA) implementations of these algorithms (Cousins and Vempala, 2016) were interfaced with the COBRA toolbox to permit sampling of any constraint-based metabolic model. The algorithmic inputs are a constraint-based metabolic model that minimally includes S, l, u and c from Eq. 1, and parameters that control the length of the random walk and the sampling density (see Supplementary Tutorial).
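The sampling stage can be illustrated with a minimal Python sketch of coordinate hit-and-run. It is not the Matlab implementation described above: it assumes a bounded, full-dimensional polytope written as {x : A x ≤ b}, it skips the rounding step entirely, and the names `chord_along_axis` and `coordinate_hit_and_run` are hypothetical. At each step one coordinate axis is chosen at random, the chord of the polytope along that axis through the current point is computed, and the next point is drawn uniformly on that chord.

```python
import random


def chord_along_axis(A, b, x, k):
    """Range [lo, hi] of t such that x + t*e_k stays inside {x : A x <= b}."""
    lo, hi = float("-inf"), float("inf")
    for row, b_i in zip(A, b):
        # Remaining slack of this constraint at the current point.
        slack = b_i - sum(a * x_j for a, x_j in zip(row, x))
        a_k = row[k]
        if a_k > 0:
            hi = min(hi, slack / a_k)   # constraint caps movement upwards
        elif a_k < 0:
            lo = max(lo, slack / a_k)   # constraint caps movement downwards
    return lo, hi


def coordinate_hit_and_run(A, b, x0, n_steps, rng):
    """Random walk that updates one coordinate per step (polytope must be bounded)."""
    x = list(x0)
    samples = []
    for _ in range(n_steps):
        k = rng.randrange(len(x))            # pick a coordinate direction
        lo, hi = chord_along_axis(A, b, x, k)
        x[k] += rng.uniform(lo, hi)          # uniform point on the chord
        samples.append(list(x))
    return samples


# Toy example (assumption): the unit square 0 <= x1, x2 <= 1.
A = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
b = [1.0, 0.0, 1.0, 0.0]
samples = coordinate_hit_and_run(A, b, [0.5, 0.5], 20000, random.Random(0))
mean_x1 = sum(s[0] for s in samples) / len(samples)
```

Each step touches only one coordinate, so the cost of a step is dominated by a single pass over the constraints. On a strongly anisotropic polytope the chords along most axes are short and the walk mixes slowly, which is the motivation for rounding the polytope first.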

3 Performance

When sampling the feasible set of a constraint-based model, it is important to run the sampling algorithm until the sampling distribution converges to a stationary distribution of fluxes over Ω. Otherwise, the sampling distribution is likely to be unrepresentative, leading to incorrect conclusions about the model (see Supplementary Figure). It is generally not possible to verify empirically that a sampler has converged to the unknown distribution of fluxes over Ω. However, several measures exist that detect the absence of convergence to a stationary sampling distribution. Here, we used the potential scale reduction factor (Gelman et al.) as described in Supplementary Methods Section S2. For CHRR, it is known that the stationary distribution is the uniform distribution (Berbee et al.), but no such guarantees are known for ACHR. We compared the convergence time of CHRR to the COBRA toolbox implementation of ACHR (Fig. 1). We found that CHRR converged to a stationary sampling distribution in up to 730 times fewer steps than ACHR (Fig. 1a) on 15 models with dimensions ranging from 24 to 2430 (see Supplementary Methods Section S3). Moreover, each step of CHRR was up to 10 times faster than a step of ACHR (Fig. 1b). Each step of CHRR uses only a small number of arithmetic operations compared to ACHR, and this difference becomes more pronounced as the dimension increases. Thus the improved scaling cannot be explained by programmatic differences between the two algorithms. These factors combined to give a 40–3500 fold speedup that tended to increase with model dimension.
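As an illustration of this kind of convergence diagnostic, the sketch below computes a potential scale reduction factor for a single flux from several independent chains. It uses the common textbook form (between-chain variance B, mean within-chain variance W, and the square root of the pooled variance estimate divided by W). The exact variant used in this work is the one given in Supplementary Methods Section S2 and may differ. The function name and the toy chains are assumptions.

```python
import math
import statistics


def potential_scale_reduction(chains):
    """Textbook potential scale reduction factor for one scalar quantity.

    chains: list of m >= 2 chains, each a list of n >= 2 samples.
    """
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average of the within-chain sample variances.
    W = statistics.fmean([statistics.variance(c) for c in chains])
    # B: between-chain variance, scaled by the chain length.
    B = n * statistics.variance(chain_means)
    # Pooled estimate of the variance of the target distribution.
    var_hat = (n - 1) / n * W + B / n
    return math.sqrt(var_hat / W)


# Toy chains (assumption): two chains exploring the same range ...
r_mixed = potential_scale_reduction([[0.0, 1.0, 2.0, 3.0],
                                     [3.0, 2.0, 1.0, 0.0]])
# ... versus two chains stuck in different regions of the flux range.
r_stuck = potential_scale_reduction([[0.0, 0.1, 0.0, 0.1],
                                     [5.0, 5.1, 5.0, 5.1]])
```

Values close to 1 indicate that the chains are indistinguishable by this measure, whereas values far above 1 signal that the chains have not yet reached a common stationary distribution. A value near 1 does not prove convergence, consistent with the caveat above.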
Fig. 1.

Convergence times. A comparison between the convergence times of CHRR and ACHR for 15 constraint-based models (see Supplementary Methods Section S3). (a) The number of steps of a random walk required for convergence to a stationary sampling distribution. ACHR did not converge in the maximum walk length of 10^9 steps on two of the 15 models. These were the Synechocystis model iJN678 and the generic human model Recon 2. (b) Average time per step, computed out of 10^6 steps.

4 Conclusions

Coordinate hit-and-run with rounding makes uniform sampling of genome-scale metabolic networks tractable and reliable. The compatibility of our implementation with the COBRA toolbox should facilitate widespread utilization by the constraint-based metabolic modelling community.

Funding

HSH and IT were supported by the Luxembourg National Research Fund (FNR) through the National Centre of Excellence in Research (NCER) on Parkinson’s disease. BC and SV were supported in part by NSF awards CCF-1217793 and EAGER-1415498. RMTF was funded by the Interagency Modeling and Analysis Group, Multi-scale Modeling Consortium U01 awards from the National Institute of General Medical Sciences, award GM102098, and U.S. Department of Energy, Office of Science, Biological and Environmental Research Program, award DE-SC0010429.

Conflict of Interest: none declared.
References

1.  Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox v2.0.

Authors:  Jan Schellenberger; Richard Que; Ronan M T Fleming; Ines Thiele; Jeffrey D Orth; Adam M Feist; Daniel C Zielinski; Aarash Bordbar; Nathan E Lewis; Sorena Rahmanian; Joseph Kang; Daniel R Hyduke; Bernhard Ø Palsson
Journal:  Nat Protoc       Date:  2011-08-04       Impact factor: 13.491

2.  Constraining the metabolic genotype-phenotype relationship using a phylogeny of in silico methods.

Authors:  Nathan E Lewis; Harish Nagarajan; Bernhard O Palsson
Journal:  Nat Rev Microbiol       Date:  2012-02-27       Impact factor: 60.633

3.  ll-ACHRB: a scalable algorithm for sampling the feasible solution space of metabolic networks.

Authors:  Pedro A Saa; Lars K Nielsen
Journal:  Bioinformatics       Date:  2016-03-11       Impact factor: 6.937

4.  What is flux balance analysis?

Authors:  Jeffrey D Orth; Ines Thiele; Bernhard Ø Palsson
Journal:  Nat Biotechnol       Date:  2010-03       Impact factor: 54.908

5.  Candidate metabolic network states in human mitochondria. Impact of diabetes, ischemia, and diet.

Authors:  Ines Thiele; Nathan D Price; Thuy D Vo; Bernhard Ø Palsson
Journal:  J Biol Chem       Date:  2004-11-30       Impact factor: 5.157

6.  A community-driven global reconstruction of human metabolism.

Authors:  Ines Thiele; Neil Swainston; Ronan M T Fleming; Andreas Hoppe; Swagatika Sahoo; Maike K Aurich; Hulda Haraldsdottir; Monica L Mo; Ottar Rolfsson; Miranda D Stobbe; Stefan G Thorleifsson; Rasmus Agren; Christian Bölling; Sergio Bordel; Arvind K Chavali; Paul Dobson; Warwick B Dunn; Lukas Endler; David Hala; Michael Hucka; Duncan Hull; Daniel Jameson; Neema Jamshidi; Jon J Jonsson; Nick Juty; Sarah Keating; Intawat Nookaew; Nicolas Le Novère; Naglis Malys; Alexander Mazein; Jason A Papin; Nathan D Price; Evgeni Selkov; Martin I Sigurdsson; Evangelos Simeonidis; Nikolaus Sonnenschein; Kieran Smallbone; Anatoly Sorokin; Johannes H G M van Beek; Dieter Weichart; Igor Goryanin; Jens Nielsen; Hans V Westerhoff; Douglas B Kell; Pedro Mendes; Bernhard Ø Palsson
Journal:  Nat Biotechnol       Date:  2013-03-03       Impact factor: 54.908

7.  Uniform sampling of steady states in metabolic networks: heterogeneous scales and rounding.

Authors:  Daniele De Martino; Matteo Mori; Valerio Parisi
Journal:  PLoS One       Date:  2015-04-07       Impact factor: 3.240

8.  optGpSampler: an improved tool for uniformly sampling the solution-space of genome-scale metabolic networks.

Authors:  Wout Megchelenbrink; Martijn Huynen; Elena Marchiori
Journal:  PLoS One       Date:  2014-02-14       Impact factor: 3.240
