
COBREXA.jl: constraint-based reconstruction and exascale analysis.

Miroslav Kratochvíl¹, Laurent Heirendt¹,², St Elmo Wilken³, Taneli Pusa¹,⁴, Sylvain Arreckx¹, Alberto Noronha⁴, Marvin van Aalst³, Venkata P Satagopam¹,², Oliver Ebenhöh³, Reinhard Schneider¹,², Christophe Trefois¹,², Wei Gu¹,².

Abstract

SUMMARY: COBREXA.jl is a Julia package for scalable, high-performance constraint-based reconstruction and analysis of very large-scale biological models. Its primary purpose is to facilitate the integration of modern high performance computing environments with the processing and analysis of large-scale metabolic models of challenging complexity. We report the architecture of the package, and demonstrate how the design promotes analysis scalability on several use-cases with multi-organism community models.
AVAILABILITY AND IMPLEMENTATION: https://doi.org/10.17881/ZKCR-BT30.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Year: 2021 PMID: 34791064 PMCID: PMC8796381 DOI: 10.1093/bioinformatics/btab782

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

Understanding metabolic interactions in cells is a crucial step to investigate disease mechanisms and to discover new therapeutics (Apaolaza et al.; Brunk et al.; Cook and Nielsen, 2017). Constraint-Based Reconstruction and Analysis (COBRA) is a promising methodology for analyzing various metabolic processes at the organism and community levels (Fang et al.). The main idea behind COBRA is to represent an organism as a constrained set of interconnected reactions and metabolites based on genomic sequencing data. This leads to a straightforward interpretation of metabolism as a constrained linear system, which enables the utilization of a wide range of well-developed analysis methods (Orth et al.). The increasing ubiquity of genomic sequencing has led to a rapid expansion in the number and complexity of genome-scale metabolic models, e.g. the human metabolic model that has more than 80 000 reactions (Thiele et al.). Recent automated reconstruction tools can generate models spanning the entire primary metabolism of both pro- and eukaryotes (Machado et al.). Consequently, metabolic models are becoming considerably larger in scale than their predecessors, which is further compounded by the construction of multi-member community models. This growth implies increasing analysis complexity (see Supplementary Fig. S1), which in turn drives the need to develop analysis software that can accommodate this complexity. While computing the solutions to the underlying constrained optimization problems is hard to accelerate and parallelize, many analysis types can be decomposed into individual invocations of the optimizer, which may be run in parallel. However, despite continued efforts (Heirendt et al.), this remains challenging due to the scalability limits of existing software implementations. Here, we present COBREXA.jl, a package for implementing and running distributed COBRA workflows.
The package is implemented in the Julia programming language (Bezanson et al.), enabling facile extension with user-defined numeric-computing routines, and interoperability with many high-performance computing packages. It provides a ‘batteries-included’ solution for scaling analyses to make efficient use of high-performance computing (HPC) facilities, giving researchers a powerful toolkit for executing complicated high-volume workflows, such as the creation and exploration of digital metabolic twins in personalized medicine (Björnsson et al.), and analysis of extensive microbial communities in ecology and biotechnology. We report the implementation architecture, and substantiate how the design accommodates future extensions and scaling of common analysis tasks.
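
For orientation, the constrained linear system referred to above is conventionally written as the flux balance analysis (FBA) problem. The notation below is the standard textbook one and is not taken from this text: S is the stoichiometric matrix, v the vector of reaction fluxes, c the objective weights, and l and u the lower and upper flux bounds.

```latex
\begin{aligned}
Z^{*} = \max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S v = 0, \\
& l \le v \le u .
\end{aligned}
```

Flux variability analysis then asks, for each reaction i, for the smallest and largest flux compatible with a near-optimal objective value, where the tolerance 0 ≤ γ ≤ 1 is a user-chosen parameter:

```latex
\begin{aligned}
\min_{v} \;/\; \max_{v}\quad & v_i \\
\text{subject to}\quad & S v = 0, \\
& c^{\top} v \ge \gamma Z^{*}, \\
& l \le v \le u .
\end{aligned}
```

For a model with n reactions this yields 2n linear programs that do not depend on each other, which is the kind of decomposition into individual optimizer invocations that the text describes.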

2 Implementation and results

COBREXA.jl is an open architecture solution, providing interchangeable building blocks for implementing complicated COBRA workflows. Common analysis methods, such as flux balance, flux variability and gene knockout analyses (Gudmundsson and Thiele, 2010), are implemented as ready-to-use functions that may be easily composed and customized. Most importantly, the building blocks are designed so that the constructed workflows can be easily separated into parallelizable analysis steps and executed on multiple computation nodes in HPC environments (as illustrated in Fig. 1). The concurrent execution of such workflows results in significant computational speedups, without requiring user expertise in parallel programming.
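
The separation of a workflow into independent, parallelizable steps can be sketched roughly as follows. This is not COBREXA.jl code: it is written in Python so that it runs with the standard library alone, the four-reaction toy model and all names (`BASE_CAPS`, `knockout`, `flux_range`, `knockout_variability_table`) are assumptions made for this illustration, a closed-form expression stands in for the LP solver, and a thread pool stands in for distributed workers.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical toy model: uptake -> A, then two parallel routes A -> B,
# then export of B. Every reaction has a lower bound of 0 and the upper
# bound (capacity) listed here.
REACTIONS = ["uptake", "route1", "route2", "export"]
BASE_CAPS = {"uptake": 10.0, "route1": 6.0, "route2": 8.0, "export": 1000.0}


def knockout(caps, reaction):
    """Return a model variant in which one reaction cannot carry flux."""
    variant = dict(caps)
    variant[reaction] = 0.0
    return variant


def flux_range(caps, reaction):
    """One independent 'optimizer invocation' for the toy model.

    Returns the (min, max) flux of `reaction` when the export flux is held
    at its optimum. The closed form below is valid only for this toy
    network; a real workflow would solve two linear programs here.
    """
    optimum = min(caps["uptake"], caps["export"],
                  caps["route1"] + caps["route2"])
    if reaction in ("uptake", "export"):
        return (optimum, optimum)
    other = "route2" if reaction == "route1" else "route1"
    # The two routes must jointly carry exactly `optimum`.
    return (max(0.0, optimum - caps[other]), min(caps[reaction], optimum))


def knockout_variability_table(base_caps, workers=4):
    """Flux variability of every reaction in every single-knockout variant.

    Row 0 is the unmodified model; row k (k >= 1) is the knockout of
    REACTIONS[k - 1]. Columns follow the order of REACTIONS.
    """
    variants = [base_caps] + [knockout(base_caps, r) for r in REACTIONS]
    # Each (variant, reaction) pair is an independent task.
    tasks = list(product(variants, REACTIONS))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the order of the submitted tasks.
        flat = list(pool.map(lambda task: flux_range(*task), tasks))
    n = len(REACTIONS)
    return [flat[i * n:(i + 1) * n] for i in range(len(variants))]


if __name__ == "__main__":
    for row in knockout_variability_table(BASE_CAPS):
        print(row)
```

Because each task depends only on its own model variant and reaction, the same task list could be handed to any worker pool without changing the analysis code, which is the property the building-block design relies on.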

Fig. 1.

Schema of an example custom analysis construction that examines flux variability in many variants of a model, its distributed execution with COBREXA.jl, and collection of many results in a multi-dimensional array.

The design of COBREXA.jl distinguishes it from other COBRA implementations, which typically provide parallelization support for only a few selected methods and currently offer no support for parallelizing custom method variants. For example, parallel single-gene deletion analysis is commonly supported, but a variant that explores the flux variability in knockouts must be reimplemented and parallelized by the user. A variety of model exchange and representation formats are supported, including the MATLAB format (Heirendt et al.), the object-oriented JSON format (Ebrahim et al.) and SBML (Keating et al.). In addition, implementing the workflows in Julia results in highly optimized execution of the code at the cost of a minor pre-compilation overhead, which benefits large, data-heavy use cases. A detailed architecture overview is provided in Supplementary Section S1.

To evaluate the effect of the new architecture and optimizations on the performance and scalability of COBRA analyses, we benchmarked COBREXA.jl on use-cases that benefit from parallelization. We compared its performance to that obtained with COBRApy (Ebrahim et al.) and the COBRA Toolbox (Heirendt et al.), which are widely adopted tools for running COBRA workflows. Running on a 256-CPU multi-node cluster, COBREXA.jl was able to fully utilize the available distributed computing resources and outperform the implementation of flux variability analysis in other packages by a factor of between 2× and 10×, even on relatively small models (Supplementary Table S2). We further demonstrated that COBREXA.jl is able to parallelize and distribute custom workloads by re-implementing the production envelope functionality of COBRApy, leading to speedups of over 10×, even on a single 16-core computation node (Supplementary Table S3).
Consequently, we expect that the COBRA methods implemented in COBREXA.jl will enable reliable acceleration of many current and future workloads by simply adding more computing resources. The results are further discussed in Supplementary Section S3.4.

3 Conclusion

COBREXA.jl is a new package developed for large-scale distributed processing of constraint-based biological models. It differs from the other implementations of COBRA methods (Ebrahim et al.; Heirendt et al.) by focusing on computational efficiency and by simplifying the high-level construction of parallelized user-defined analysis methods. These properties are required for extensive analyses of large models, for future-proof extensibility, and for workload distribution that makes effective use of common HPC infrastructure resources. The package thus enables fast analysis of datasets that may pose challenges for the currently available tools, such as the comprehensive human gut microbiome models.

Funding

This work was supported by the European Union’s Horizon 2020 Programme under the PerMedCoE Project (www.permedcoe.eu) [951773]. This work was also partially funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy–EXC-2048/1–project ID 390686111 and EU’s Horizon 2020 research and innovation program [862087]. The presented experiments were carried out using the HPC facilities of the University of Luxembourg (see https://hpc.uni.lu).

Conflict of Interest: none declared.

References

1. Fast automated reconstruction of genome-scale metabolic models for microbial species and communities.

Authors: Daniel Machado; Sergej Andrejev; Melanie Tramontano; Kiran Raosaheb Patil
Journal: Nucleic Acids Res Date: 2018-09-06 Impact factor: 16.971

2. Genome-scale metabolic models applied to human health and disease.

Authors: Daniel J Cook; Jens Nielsen
Journal: Wiley Interdiscip Rev Syst Biol Med Date: 2017-06-23

3. Reconstructing organisms in silico: genome-scale models and their emerging applications.

Authors: Xin Fang; Colton J Lloyd; Bernhard O Palsson
Journal: Nat Rev Microbiol Date: 2020-09-21 Impact factor: 60.633

4. Computationally efficient flux variability analysis.

Authors: Steinn Gudmundsson; Ines Thiele
Journal: BMC Bioinformatics Date: 2010-09-29 Impact factor: 3.169

5. Creation and analysis of biochemical constraint-based models using the COBRA Toolbox v.3.0.

Authors: Laurent Heirendt; Sylvain Arreckx; Thomas Pfau; Sebastián N Mendoza; Anne Richelle; Almut Heinken; Hulda S Haraldsdóttir; Jacek Wachowiak; Sarah M Keating; Vanja Vlasov; Stefania Magnusdóttir; Chiam Yu Ng; German Preciat; Alise Žagare; Siu H J Chan; Maike K Aurich; Catherine M Clancy; Jennifer Modamio; John T Sauls; Alberto Noronha; Aarash Bordbar; Benjamin Cousins; Diana C El Assal; Luis V Valcarcel; Iñigo Apaolaza; Susan Ghaderi; Masoud Ahookhosh; Marouen Ben Guebila; Andrejs Kostromins; Nicolas Sompairac; Hoai M Le; Ding Ma; Yuekai Sun; Lin Wang; James T Yurkovich; Miguel A P Oliveira; Phan T Vuong; Lemmer P El Assal; Inna Kuperstein; Andrei Zinovyev; H Scott Hinton; William A Bryant; Francisco J Aragón Artacho; Francisco J Planes; Egils Stalidzans; Alejandro Maass; Santosh Vempala; Michael Hucka; Michael A Saunders; Costas D Maranas; Nathan E Lewis; Thomas Sauter; Bernhard Ø Palsson; Ines Thiele; Ronan M T Fleming
Journal: Nat Protoc Date: 2019-03 Impact factor: 13.491

6. COBRApy: COnstraints-Based Reconstruction and Analysis for Python.

Authors: Ali Ebrahim; Joshua A Lerman; Bernhard O Palsson; Daniel R Hyduke
Journal: BMC Syst Biol Date: 2013-08-08

7. COBRA methods and metabolic drug targets in cancer.

Authors: Iñigo Apaolaza; Edurne San José-Eneriz; Xabier Agirre; Felipe Prósper; Francisco J Planes
Journal: Mol Cell Oncol Date: 2017-11-30

8. DistributedFBA.jl: high-level, high-performance flux balance analysis in Julia.

Authors: Laurent Heirendt; Ines Thiele; Ronan M T Fleming
Journal: Bioinformatics Date: 2017-05-01 Impact factor: 6.937

9. Digital twins to personalize medicine.

Authors: Bergthor Björnsson; Carl Borrebaeck; Nils Elander; Thomas Gasslander; Danuta R Gawel; Mika Gustafsson; Rebecka Jörnsten; Eun Jung Lee; Xinxiu Li; Sandra Lilja; David Martínez-Enguita; Andreas Matussek; Per Sandström; Samuel Schäfer; Margaretha Stenmarker; X F Sun; Oleg Sysoev; Huan Zhang; Mikael Benson
Journal: Genome Med Date: 2019-12-31 Impact factor: 11.117

10. SBML Level 3: an extensible format for the exchange and reuse of biological models.

Authors: Sarah M Keating; Dagmar Waltemath; Matthias König; Fengkai Zhang; Andreas Dräger; Claudine Chaouiya; Frank T Bergmann; Andrew Finney; Colin S Gillespie; Tomáš Helikar; Stefan Hoops; Rahuman S Malik-Sheriff; Stuart L Moodie; Ion I Moraru; Chris J Myers; Aurélien Naldi; Brett G Olivier; Sven Sahle; James C Schaff; Lucian P Smith; Maciej J Swat; Denis Thieffry; Leandro Watanabe; Darren J Wilkinson; Michael L Blinov; Kimberly Begley; James R Faeder; Harold F Gómez; Thomas M Hamm; Yuichiro Inagaki; Wolfram Liebermeister; Allyson L Lister; Daniel Lucio; Eric Mjolsness; Carole J Proctor; Karthik Raman; Nicolas Rodriguez; Clifford A Shaffer; Bruce E Shapiro; Joerg Stelling; Neil Swainston; Naoki Tanimura; John Wagner; Martin Meier-Schellersheim; Herbert M Sauro; Bernhard Palsson; Hamid Bolouri; Hiroaki Kitano; Akira Funahashi; Henning Hermjakob; John C Doyle; Michael Hucka
Journal: Mol Syst Biol Date: 2020-08 Impact factor: 11.429