On the inconsistent treatment of gene-protein-reaction rules in context-specific metabolic models.

Miguel Ponce-de-León¹, Iñigo Apaolaza², Alfonso Valencia¹,³, Francisco J Planes².

Abstract

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Year: 2020 PMID: 31722383 PMCID: PMC8662768 DOI: 10.1093/bioinformatics/btz832

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

With the publication of high-quality genome-scale metabolic models (GSMs) for several organisms, the Systems Biology community has developed a plethora of algorithms for their analysis that make use of ever-growing omics data (Heirendt ). In particular, in human metabolism, the reconstruction of the first genome-scale model RECON1 (Duarte ) promoted the development of Context-Specific Model (CS-Model) reconstruction methods (Opdam ). This family of algorithms aims to identify the catalogue of metabolic reactions active in a cell in a given condition using omics data, most commonly gene expression levels. CS-Models are particularly suitable for studying metabolic differences between human tissues and are widely used in the area of human health (Uhlen ), with applications including the prediction of metabolic vulnerabilities in cancer (Agren ) and the inference of biomarkers in Alzheimer's disease and diabetes (Geng and Nielsen, 2017), among others. Currently, dozens of these CS-Models are available in different public repositories, usually stored under the Systems Biology Markup Language standard (Hucka ). An essential component of CS-Model reconstruction algorithms is the set of gene-protein-reaction (GPR) rules, which encode how genes relate to protein complexes and isozymes as well as the reactions they catalyze. GPR rules are expressed as logical equations and allow us to classify reactions as active or inactive by mapping expression data onto them. This set of reaction states is used by reconstruction algorithms to extract the final CS-Model, which comprises the reactions mapped as active plus a small set of inactive reactions automatically added to fill gaps in the network (gap-filling reactions). The relevance of GPR rules for CS-Model reconstruction algorithms can be more clearly observed in the Supplementary Information, where a general workflow is depicted.
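As an illustration of how expression data can be mapped onto a GPR rule, the following minimal Python sketch evaluates a rule written with `and` (protein complex) and `or` (isozymes) against a set of genes classified as active. The function name `eval_gpr`, the rule syntax and the gene identifiers are assumptions made for this example; they are not taken from any particular reconstruction tool.

```python
import re


def eval_gpr(rule, active_genes):
    """Return True if the GPR rule is satisfied by the set of active genes.

    Assumed syntax (hypothetical): gene IDs combined with 'and', 'or'
    and parentheses, with 'and' binding more tightly than 'or'.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", rule)
    pos = 0

    def parse_or():
        nonlocal pos
        value = parse_and()
        while pos < len(tokens) and tokens[pos].lower() == "or":
            pos += 1
            rhs = parse_and()  # always consume the right-hand side
            value = value or rhs
        return value

    def parse_and():
        nonlocal pos
        value = parse_atom()
        while pos < len(tokens) and tokens[pos].lower() == "and":
            pos += 1
            rhs = parse_atom()
            value = value and rhs
        return value

    def parse_atom():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_or()
            pos += 1  # skip the closing ")"
            return value
        return tok in active_genes  # a gene is True only if classified as active

    return parse_or()


# Toy rule: a two-subunit complex (g1 and g2) with an isozyme alternative (g3)
rule = "(g1 and g2) or g3"
print(eval_gpr(rule, {"g3"}))  # isozyme g3 alone supports the reaction
print(eval_gpr(rule, {"g1"}))  # incomplete complex, no isozyme
```

The sketch assumes a well-formed, non-empty rule; how reactions without any GPR rule are classified is a separate modelling decision that it does not address.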
Different CS-Model reconstruction algorithms have their own strengths and weaknesses depending on the problem under study and the omics data available. In this direction, Opdam and collaborators performed an extensive benchmark of CS-Model algorithms and found that no particular method outperforms the others (Opdam ). However, after careful inspection, we found that all of these algorithms share a common ‘bug’ in the way GPR rules and gene expression data are treated when reconstructing CS-Models. The first issue we encountered is related to how gap-filling reactions are managed in the reconstruction process. Model extraction algorithms may add reactions classified as inactive to fill gaps in the CS-Model. Figure 1A represents a toy model with four reactions: two categorized as active, {1, 4}, and two as inactive, {2, 3}. In this example, the CS-Model includes the active reactions plus reaction 2 for gap filling. Importantly, the decision to include gap-filling reaction 2 implies updating our assumption about the state of the genes involved in that reaction, which in this case means setting the state of gene B to active. In addition, if the state of gene B is updated, this change must be propagated through the CS-Model. Following the example in Figure 1A, propagating the change in the state of gene B implies that reaction 3 becomes active and thus should be included in the consolidated CS-Model. We found that this consolidation step is not performed by the published CS-Model reconstruction algorithms, and that neglecting the GPR consolidation step worsens the model predictions by overestimating the effect of gene knockouts.
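The consolidation step described for Figure 1A can be sketched as a fixed-point iteration. The code below is a minimal illustration and not the procedure detailed in the Supplementary Information: the GPR rules assigned to the four toy reactions, the gene names A and C, and the choice of activating the complex with the fewest inactive genes are all assumptions made for this example. GPR rules are represented in disjunctive form, as a list of gene sets (one set per alternative complex or isozyme).

```python
def reaction_is_active(gpr, active_genes):
    """A reaction is supported if at least one complex has all its genes active."""
    return any(complex_ <= active_genes for complex_ in gpr)


def consolidate(gprs, active_genes, model_reactions):
    """Make gene states and reaction content mutually consistent.

    gprs:            dict reaction -> list of gene sets (hypothetical encoding)
    active_genes:    genes classified as active from expression data
    model_reactions: reactions returned by the extraction algorithm,
                     including gap-filling reactions
    """
    genes = set(active_genes)
    reactions = set(model_reactions)
    changed = True
    while changed:
        changed = False
        # Step 1: a reaction kept in the model must be supported by its genes.
        # Assumption: activate the complex that needs the fewest extra genes.
        for rxn in sorted(reactions):
            gpr = gprs.get(rxn, [])
            if gpr and not reaction_is_active(gpr, genes):
                cheapest = min(gpr, key=lambda c: len(c - genes))
                genes |= cheapest
                changed = True
        # Step 2: propagate the updated gene states to the remaining reactions.
        for rxn, gpr in gprs.items():
            if rxn not in reactions and gpr and reaction_is_active(gpr, genes):
                reactions.add(rxn)
                changed = True
    return genes, reactions


# Toy network loosely following Figure 1A (GPR assignments are assumed)
gprs = {"R1": [{"A"}], "R2": [{"B"}], "R3": [{"B"}], "R4": [{"C"}]}
genes, reactions = consolidate(gprs, {"A", "C"}, {"R1", "R2", "R4"})
print(sorted(genes), sorted(reactions))
```

In this toy case, keeping gap-filling reaction R2 forces gene B to active, which in turn brings R3 into the consolidated model.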

Fig. 1.

Identified errors in context-specific metabolic model reconstruction. (A) Illustration of the consequences of not managing the gap-filling reactions properly. (B) Illustration of the consequences of not taking the molecular context into account. (C) The probability distribution of CERES essentiality scores (Meyers ) for recovered essential (orange) and non-essential (blue) genes. The recovered essential genes are those genes which are predicted non-essential by standard GIMME (Becker and Palsson, 2008) but essential when the errors in (A) and (B) are amended. The recovered non-essential genes are those genes which are predicted essential by standard GIMME (Becker and Palsson, 2008) but non-essential when the errors in (A) and (B) are amended. Red and green coloring refers to inactive and active genes, respectively. The green arrows correspond to active reactions, the red arrow refers to an inactive reaction which has been selected as gap-filling and the gray arrow corresponds to an inactive reaction which has not been selected as gap-filling.

The second and more striking issue is that the molecular context used to reconstruct the CS-Model is usually not included as part of the CS-Model. If the expression confidences used to infer the reaction states (active/inactive) are left aside after the reaction mapping, part of the information used to reconstruct the CS-Model is lost and, therefore, the model formulation will be incomplete. In the example of Figure 1B, the enolase reaction was set as active based on its expression scores and GPR rule. However, if the GPR of enolase is evaluated without the expression scores, it is not possible to guess which genes are supporting the enolase activity (in Fig. 1B ENO1). Moreover, without the information about gene expression, the default hypothesis will be that all the genes included in the CS-Model are active. Thus, analyzing a CS-Model without considering the context can affect the predictions.
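The effect of discarding the molecular context on a single-gene knockout can be illustrated with the enolase example. The sketch below is a minimal illustration: the GPR is assumed to consist of three isozymes (ENO1, ENO2, ENO3), of which only ENO1 is taken as expressed, and the function names are hypothetical. When no expression context is supplied, every gene in the rule is treated as active, which is the default hypothesis described above.

```python
def reaction_survives(gpr, available_genes):
    """True if at least one complex/isozyme has all of its genes available."""
    return any(complex_ <= available_genes for complex_ in gpr)


def knockout_disrupts(gpr, gene, expressed=None):
    """Does deleting `gene` disable the reaction?

    expressed=None mimics a CS-Model stored without its molecular context:
    all genes appearing in the GPR are assumed to be active.
    """
    all_genes = set().union(*gpr)
    available = all_genes if expressed is None else set(expressed) & all_genes
    return not reaction_survives(gpr, available - {gene})


# Assumed GPR for enolase: three alternative isozymes
enolase_gpr = [{"ENO1"}, {"ENO2"}, {"ENO3"}]

# Without context, ENO2/ENO3 appear to compensate for the loss of ENO1
print(knockout_disrupts(enolase_gpr, "ENO1"))
# With context (only ENO1 expressed), the knockout disables the reaction
print(knockout_disrupts(enolase_gpr, "ENO1", expressed={"ENO1"}))
```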
Specifically, we found that this inconsistent treatment of the CS-Model leads to under-estimating the effect of gene knockouts. This can be observed in Figure 1B, where the deletion of ENO1 should disrupt the enolase reaction, but does not if the context is not taken into account. In order to evaluate the effect of this flaw, we reconstructed ∼400 CS-Models for cell lines from the Cancer Cell Line Encyclopedia (Barretina ) and conducted a gene essentiality analysis. The CS-Models were reconstructed using Gene Expression Barcode to classify genes as active/inactive (McCall ), Recon3D as the reference model (Brunk ) and GIMME as the extraction algorithm (Becker and Palsson, 2008). We amended the output of GIMME to account for the issues discussed above, using a simple and effective approach, which is fully detailed in the Supplementary Information. Summarizing the results over all the cell lines, our approach found 3160 essential genes not predicted by standard GIMME (∼8 per model) and discarded 1061 essential genes predicted by standard GIMME (∼2.5 per model) (Supplementary Fig. S1). These results clearly indicate that ignoring the molecular context has drastic effects on the in-silico predictions. To validate these results, for each cell line we gathered CRISPR–Cas9 essentiality data from DepMap (Tsherniak ) corrected using the CERES essentiality score (Meyers ). Figure 1C shows the probability distributions of the CERES scores for the aforementioned 3160 non-essential genes that become essential (orange curve) and the 1061 essential genes that become non-essential (blue curve) when GIMME considers the context. This same result is shown in Supplementary Figure S2 using absolute frequencies instead of probability density. As expected, the first group is significantly enriched in DepMap essential genes (one-tailed Mann–Whitney test, p-value = 9.96 × 10⁻⁴¹), demonstrating the practical importance of the correct treatment of GPR rules and molecular context.
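For readers unfamiliar with the test used above, the following sketch shows a one-tailed Mann-Whitney comparison of two groups of scores in plain Python. It is an illustration under stated assumptions, not the analysis pipeline of the study: the p-value uses a normal approximation without tie or continuity correction, the function name is hypothetical, and the two score lists are synthetic placeholders rather than CERES data. The alternative hypothesis is that scores in the first group tend to be lower, which is the relevant direction if more negative scores indicate stronger essentiality.

```python
import math


def mann_whitney_less(x, y):
    """One-tailed Mann-Whitney U test, H1: values in x tend to be smaller than in y.

    Returns (U, p), where U counts the pairs with x_i > y_j (ties count 1/2)
    and p comes from a normal approximation (no tie or continuity correction).
    """
    n1, n2 = len(x), len(y)
    u = sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    p = 0.5 * math.erfc(-z / math.sqrt(2))  # P(Z <= z) for a standard normal
    return u, p


# Synthetic placeholder scores (NOT real CERES values)
recovered_essential = [-1.2, -0.9, -1.0, -0.8, -1.1]
recovered_non_essential = [0.1, -0.1, 0.0, 0.2, -0.2, 0.05]

u, p = mann_whitney_less(recovered_essential, recovered_non_essential)
print(f"U = {u}, one-tailed p = {p:.4f}")
```

For real data sets, an established statistics library that provides exact p-values and tie corrections would be preferable to this approximation.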
The same analysis was performed with other model extraction methods, i.e. FastGapFill (Thiele ) and FASTCORMICS (Pacheco ), with similar results (Supplementary Fig. S3). Altogether, our results illustrate the importance of the errors introduced during the GPR translation in many of the published metabolic reconstructions. It is worth noting that, although the main results were obtained using a particular gene expression thresholding, the problem of the inconsistent treatment of GPR rules is independent of the thresholding approach. To overcome this issue, we advocate strict control of the specific molecular context during the translation of the GPR rules to CS-Models. To that end, the existing CS-Model reconstruction algorithms and storage standards should be modified to be GPR consistent and to provide the molecular context. Here, we showed the improvement in the performance of GIMME when this limitation was corrected, and similar results are expected with other algorithms.

Funding

I.A. was supported by a Basque Government predoctoral grant [PRE_2018_2_0065]. This work was supported by the Ministry of Economy and Competitiveness of Spain [BIO2013-48933, BIO2016-77998-R] and by the European Commission under the INFORE project [H2020-ICT-825070].

Conflict of Interest: none declared.

14 in total

1. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models.

Authors: M Hucka; A Finney; H M Sauro; H Bolouri; J C Doyle; H Kitano; A P Arkin; B J Bornstein; D Bray; A Cornish-Bowden; A A Cuellar; S Dronov; E D Gilles; M Ginkel; V Gor; I I Goryanin; W J Hedley; T C Hodgman; J-H Hofmeyr; P J Hunter; N S Juty; J L Kasberger; A Kremling; U Kummer; N Le Novère; L M Loew; D Lucio; P Mendes; E Minch; E D Mjolsness; Y Nakayama; M R Nelson; P F Nielsen; T Sakurada; J C Schaff; B E Shapiro; T S Shimizu; H D Spence; J Stelling; K Takahashi; M Tomita; J Wagner; J Wang
Journal: Bioinformatics Date: 2003-03-01 Impact factor: 6.937

2. Global reconstruction of the human metabolic network based on genomic and bibliomic data.

Authors: Natalie C Duarte; Scott A Becker; Neema Jamshidi; Ines Thiele; Monica L Mo; Thuy D Vo; Rohith Srivas; Bernhard Ø Palsson
Journal: Proc Natl Acad Sci U S A Date: 2007-01-31 Impact factor: 11.205

3. A Systematic Evaluation of Methods for Tailoring Genome-Scale Metabolic Models.

Authors: Sjoerd Opdam; Anne Richelle; Benjamin Kellman; Shanzhong Li; Daniel C Zielinski; Nathan E Lewis
Journal: Cell Syst Date: 2017-02-15 Impact factor: 10.304

4. Creation and analysis of biochemical constraint-based models using the COBRA Toolbox v.3.0.

Authors: Laurent Heirendt; Sylvain Arreckx; Thomas Pfau; Sebastián N Mendoza; Anne Richelle; Almut Heinken; Hulda S Haraldsdóttir; Jacek Wachowiak; Sarah M Keating; Vanja Vlasov; Stefania Magnusdóttir; Chiam Yu Ng; German Preciat; Alise Žagare; Siu H J Chan; Maike K Aurich; Catherine M Clancy; Jennifer Modamio; John T Sauls; Alberto Noronha; Aarash Bordbar; Benjamin Cousins; Diana C El Assal; Luis V Valcarcel; Iñigo Apaolaza; Susan Ghaderi; Masoud Ahookhosh; Marouen Ben Guebila; Andrejs Kostromins; Nicolas Sompairac; Hoai M Le; Ding Ma; Yuekai Sun; Lin Wang; James T Yurkovich; Miguel A P Oliveira; Phan T Vuong; Lemmer P El Assal; Inna Kuperstein; Andrei Zinovyev; H Scott Hinton; William A Bryant; Francisco J Aragón Artacho; Francisco J Planes; Egils Stalidzans; Alejandro Maass; Santosh Vempala; Michael Hucka; Michael A Saunders; Costas D Maranas; Nathan E Lewis; Thomas Sauter; Bernhard Ø Palsson; Ines Thiele; Ronan M T Fleming
Journal: Nat Protoc Date: 2019-03 Impact factor: 13.491

5. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity.

Authors: Jordi Barretina; Giordano Caponigro; Nicolas Stransky; Kavitha Venkatesan; Adam A Margolin; Sungjoon Kim; Christopher J Wilson; Joseph Lehár; Gregory V Kryukov; Dmitriy Sonkin; Anupama Reddy; Manway Liu; Lauren Murray; Michael F Berger; John E Monahan; Paula Morais; Jodi Meltzer; Adam Korejwa; Judit Jané-Valbuena; Felipa A Mapa; Joseph Thibault; Eva Bric-Furlong; Pichai Raman; Aaron Shipway; Ingo H Engels; Jill Cheng; Guoying K Yu; Jianjun Yu; Peter Aspesi; Melanie de Silva; Kalpana Jagtap; Michael D Jones; Li Wang; Charles Hatton; Emanuele Palescandolo; Supriya Gupta; Scott Mahan; Carrie Sougnez; Robert C Onofrio; Ted Liefeld; Laura MacConaill; Wendy Winckler; Michael Reich; Nanxin Li; Jill P Mesirov; Stacey B Gabriel; Gad Getz; Kristin Ardlie; Vivien Chan; Vic E Myer; Barbara L Weber; Jeff Porter; Markus Warmuth; Peter Finan; Jennifer L Harris; Matthew Meyerson; Todd R Golub; Michael P Morrissey; William R Sellers; Robert Schlegel; Levi A Garraway
Journal: Nature Date: 2012-03-28 Impact factor: 49.962

6. Computational correction of copy number effect improves specificity of CRISPR-Cas9 essentiality screens in cancer cells.

Authors: Robin M Meyers; Jordan G Bryan; James M McFarland; Barbara A Weir; Ann E Sizemore; Han Xu; Neekesh V Dharia; Phillip G Montgomery; Glenn S Cowley; Sasha Pantel; Amy Goodale; Yenarae Lee; Levi D Ali; Guozhi Jiang; Rakela Lubonja; William F Harrington; Matthew Strickland; Ting Wu; Derek C Hawes; Victor A Zhivich; Meghan R Wyatt; Zohra Kalani; Jaime J Chang; Michael Okamoto; Kimberly Stegmaier; Todd R Golub; Jesse S Boehm; Francisca Vazquez; David E Root; William C Hahn; Aviad Tsherniak
Journal: Nat Genet Date: 2017-10-30 Impact factor: 38.330

7. The Gene Expression Barcode 3.0: improved data processing and mining tools.

8. Identification of anticancer drugs for hepatocellular carcinoma through personalized genome-scale metabolic modeling.

9. fastGapFill: efficient gap filling in metabolic networks.

10. Transcriptomics resources of human tissues and organs.