Updated ATLAS of Biochemistry with New Metabolites and Improved Enzyme Prediction Power.

Jasmin Hafner, Homa MohammadiPeyhani, Anastasia Sveshnikova, Alan Scheidegger, Vassily Hatzimanikatis.

Abstract

The ATLAS of Biochemistry is a repository of both known and novel predicted biochemical reactions between biological compounds listed in the Kyoto Encyclopedia of Genes and Genomes (KEGG). ATLAS was originally compiled based on KEGG 2015, though the number of KEGG reactions has increased by almost 20 percent since then. Here, we present an updated version of ATLAS created from KEGG 2018 using an increased set of generalized reaction rules. Furthermore, we improved the accuracy of the enzymes that are predicted for catalyzing novel reactions. ATLAS now contains ∼150 000 reactions, out of which 96% are novel. In this report, we present detailed statistics on the updated ATLAS and highlight the improvements with regard to the previous version. Most importantly, 107 reactions predicted in the original ATLAS are now known to KEGG, which validates the predictive power of our approach. The updated ATLAS is available at https://lcsb-databases.epfl.ch/atlas.

Keywords:  biochemical database; enzyme prediction; enzyme promiscuity; metabolic networks; reaction prediction

Year:  2020        PMID: 32421310      PMCID: PMC7309321          DOI: 10.1021/acssynbio.0c00052

Source DB:  PubMed          Journal:  ACS Synth Biol        ISSN: 2161-5063            Impact factor:   5.110


Predicting hypothetical biochemical reactions and their catalyzing enzymes is needed to design novel pathways in metabolic engineering and to fill gaps in our understanding of metabolism. The ATLAS of Biochemistry[1] is a database of known and predicted biochemical reactions that was compiled by taking the biological data available in the Kyoto Encyclopedia of Genes and Genomes (KEGG) and predicting the biochemical reactions that would produce the contained compounds. Since its publication in 2016, ATLAS has been recognized by several reviews as a source of novel metabolic reactions for enzyme and metabolic engineering.[2−4] More recently, Yang et al. experimentally validated hypothetical ATLAS reactions and used them to construct novel one-carbon assimilation pathways.[5] However, ATLAS was created based on the biochemical knowledge available in KEGG 2015.[6] Since then, KEGG has added 802 new metabolites, 918 new reactions, and 633 enzymes to its collection.

The expansion of biochemical reactions within ATLAS relies on the reaction prediction tool BNICE.ch[7−12] (Biochemical Network Integrated Computational Explorer), which consists of (i) a large set of expert-curated, generalized reaction rules that mimic the promiscuous activity of enzymes, and (ii) a network-generating algorithm that applies the reaction rules to molecular structures to generate possible biochemical reactions and compounds. The BNICE.ch reaction rules can reconstruct known biochemical reactions, as well as generate novel, hypothetical reactions. Currently, BNICE.ch has 400 bidirectional reaction rules, each of which accounts for both the forward and the reverse direction of a reaction. More than 130 000 novel biochemical reactions between known biological compounds have been predicted using this strategy.

Herein, we integrated the new KEGG 2018 data into our database and expanded the biochemical space covered by ATLAS from 137 877 to 149 052 reactions. Interestingly, we found that the newly available data validated 107 novel reactions predicted in ATLAS 2015. In the following, we discuss the updated ATLAS statistics and illustrate the improvements compared to the first version. The latest version of ATLAS is available online (https://lcsb-databases.epfl.ch/atlas).

Methods

The ATLAS Workflow

To generate the new version of ATLAS, we applied the BNICE.ch reaction rules to all of the metabolites available in KEGG to generate all possible biochemically consistent reactions between any two or more KEGG compounds. Two types of additional annotations were performed on the generated reactions: First, the new ATLAS reactions were curated with Gibbs free energy of reaction estimated with the Group Contribution Method (GCM).[13] Second, the computational tool BridgIT was used to assign known enzymes to novel, predicted reactions[14] by comparing the molecular structure of the participants in a novel, predicted reaction to a database of known, well-curated reactions with full gene-protein-reaction assignment. It calculates a similarity score between the novel and the known reactions, which makes it possible to find the enzyme with the highest probability of catalyzing the novel reaction.
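The enzyme-assignment step described above can be pictured with a small sketch. The snippet below is not BridgIT itself: it assumes that each reaction has already been reduced to a set of structural feature strings and uses the Tanimoto (Jaccard) coefficient as a stand-in similarity score. The function names, the feature labels, and the reference entries are all hypothetical; only the idea of scoring a novel reaction against known, enzyme-annotated reactions and keeping the best-scoring matches is taken from the text.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical representation: a reaction is a set of structural feature labels.
Features = FrozenSet[str]


def tanimoto(a: Features, b: Features) -> float:
    """Tanimoto (Jaccard) coefficient between two feature sets (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_candidates(
    novel: Features,
    known: Dict[str, Tuple[str, Features]],
    top_n: int = 3,
) -> List[Tuple[str, str, float]]:
    """Return up to top_n (reaction_id, enzyme_label, score) tuples.

    `known` maps a known reaction ID to its enzyme label and feature set.
    Results are sorted by descending score; ties are broken by reaction ID.
    """
    scored = [
        (reaction_id, enzyme, tanimoto(novel, features))
        for reaction_id, (enzyme, features) in known.items()
    ]
    scored.sort(key=lambda item: (-item[2], item[0]))
    return scored[:top_n]


if __name__ == "__main__":
    # Toy data for illustration only; not taken from KEGG or ATLAS.
    novel_reaction = frozenset({"f1", "f2", "f3"})
    reference = {
        "known_A": ("enzyme_A", frozenset({"f1", "f2", "f3", "f4"})),
        "known_B": ("enzyme_B", frozenset({"f1", "f5"})),
        "known_C": ("enzyme_C", frozenset({"f2", "f3"})),
        "known_D": ("enzyme_D", frozenset({"f6"})),
    }
    for reaction_id, enzyme, score in rank_candidates(novel_reaction, reference):
        print(f"{reaction_id}\t{enzyme}\t{score:.2f}")
```

Returning the three best-scoring entries rather than a single one mirrors the top-three reporting described for the updated ATLAS; the actual BridgIT fingerprints and scoring are more elaborate than this set overlap.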

Updated Tools and Methods

Since 2015, two main aspects of our workflow have been updated, and both were applied to generate the updated version of ATLAS. First, the set of bidirectional reaction rules was increased from 360 to 400. The 40 new rules were created to reconstruct the exact reaction mechanism of an additional 510 KEGG reactions that were not considered previously (e.g., KEGG reaction R03223) (Table S1). Second, we applied the most recent version of BridgIT to predict putative enzymes for novel reactions, and we report the top three enzyme matches for each. Chemical structures, substructures, and reactions were drawn, displayed, and characterized with Marvin (Marvin 17.28.0, 2017, ChemAxon, http://www.chemaxon.com).

Results and Discussion

ATLAS 2018, based on KEGG 2018, now has 149 052 reactions, out of which 5779 are known to KEGG. Compared to 2015, we added 385 known and 11 173 novel reactions (Table S2). Thanks to the predicted reactions, ATLAS now integrates 4587 out of 9857 disconnected, or “orphan”, KEGG metabolites, which were not participating in any known biochemical reaction.

Increased Coverage of KEGG Reactions

The KEGG database contained 18 254 compounds as of February 2018 (Table 1). In a first preprocessing step, we removed 999 compounds without clearly defined molecular structures (e.g., polymers, proteins). The filtered data set comprised 17 255 compounds, out of which 9857 were not involved in any KEGG reaction. These orphan compounds did not participate in any known biotransformation in the KEGG metabolic space.

Table 1. Overview of Compound, Reaction, and Enzyme Statistics in KEGG and ATLAS

| Category | Statistic | ATLAS 2015 | ATLAS 2018 | Percent change |
|---|---|---|---|---|
| KEGG compounds | Total number of compounds | 17 450 | 18 254 | +5% |
| | Filtered compounds (fc) | 16 798 | 17 255 | |
| | Orphan KEGG compounds (okc) | 9371 (56% of fc) | 9857 (57% of fc) | |
| KEGG reactions | Total number of reactions | 9135 | 10 829 | +19% |
| | Filtered reactions | 8592 | 10 753 | |
| BNICE.ch | Number of bidirectional enzymatic reaction rules | 360 | 400 | +11% |
| KEGG reaction reconstruction | Covered reactions total | 6651 | 8118 | +22% |
| | Exact coverage | 5270 | 5779 | |
| | Alternative cofactor usage | 916 | 1705 | |
| | 2-step reconstruction | 387 | 408 | |
| | 3-step reconstruction | 78 | 145 | |
| | 4-step reconstruction | | 81 | |
| ATLAS statistics | Total number of reactions | 137 877 | 149 052 | +8% |
| | Novel reactions | 132 607 | 143 272 | |
| | Total number of compounds | 10 362 | 10 939 | |
| | Number of orphan compounds integrated in ATLAS | 3945 (42% of okc) | 4587 (47% of okc) | |
| Consistency of EC numbers (a) | 1st level EC match | 79 058 | 138 168 | +75% |
| | 2nd level EC match | 65 854 | 126 689 | +92% |
| | 3rd level EC match | 47 918 | 94 168 | +96% |

(a) Number of matches between the EC assignment from the reaction rules and the EC numbers assigned by BridgIT for novel reactions in ATLAS.
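
As a reading aid for the percent-change column, the short sketch below recomputes the relative change for a few rows of Table 1 from the 2015 and 2018 counts. The helper name `percent_change` and the row selection are illustrative choices, and the formula (new minus old, divided by old) is an assumption about how the column was derived; it reproduces the rounded values listed for these rows.

```python
def percent_change(old: int, new: int) -> float:
    """Relative change from old to new, expressed in percent."""
    return (new - old) / old * 100.0


# (ATLAS 2015 value, ATLAS 2018 value) for selected rows of Table 1.
ROWS = {
    "KEGG compounds, total": (17_450, 18_254),
    "KEGG reactions, total": (9_135, 10_829),
    "BNICE.ch bidirectional rules": (360, 400),
    "KEGG reactions covered": (6_651, 8_118),
    "ATLAS reactions, total": (137_877, 149_052),
}

if __name__ == "__main__":
    for label, (old, new) in ROWS.items():
        print(f"{label}: {percent_change(old, new):+.1f}%")
```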

Out of the 10 829 reactions in KEGG, 76 involved compounds with an undefined structure and were removed, resulting in a filtered set of 10 753 reactions. Out of these, 8118 reactions were reconstructed with BNICE.ch reaction rules. We observed three different types of reaction reconstruction: 5779 reactions were exactly reconstructed, meaning that the reactions generated by BNICE.ch use the same cofactors as in KEGG. Another 1705 reactions were reconstructed using alternative cofactors, out of which 123 reactions were poorly characterized in KEGG (i.e., reaction mechanism not known, incomplete reaction). The remaining 634 reactions were reconstructed in two (408 reactions), three (145 reactions), or four (81 reactions) consecutive reaction steps.

A total of 2635 KEGG reactions were not reconstructed with BNICE.ch (Table S3). First, 1546 reactions did not fulfill the BNICE.ch requirements for reconstruction, such as reactions involving polymer structures, generic compounds, or compounds without a defined molecular structure, as well as elementally unbalanced reactions and stereoisomerase reactions. Second, the reaction rules are organized according to the Enzyme Commission (EC) classification system, so each reconstructed or predicted reaction is automatically assigned a third-level EC number corresponding to the non-substrate-specific EC classification of the reconstructing reaction rule. Accordingly, another 308 reactions had partial or missing EC number annotations, indicating that their reaction mechanisms are not known, and therefore no rule has been created for them. The remaining 862 reactions were not reconstructed because their reaction mechanisms are very specific and hence not readily generalizable.
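
The reconstruction bookkeeping above can be retraced with a few lines of arithmetic. The sketch below only restates the counts quoted in the text for KEGG 2018; the variable and function names are illustrative.

```python
# Counts quoted in the text for KEGG 2018.
KEGG_REACTIONS_TOTAL = 10_829
REMOVED_UNDEFINED_STRUCTURE = 76
EXACT = 5_779
ALTERNATIVE_COFACTOR = 1_705
MULTI_STEP = {2: 408, 3: 145, 4: 81}  # consecutive steps -> number of reactions


def summarize() -> dict:
    """Derive the filtered, covered, and non-reconstructed reaction counts."""
    filtered = KEGG_REACTIONS_TOTAL - REMOVED_UNDEFINED_STRUCTURE
    multi_step_total = sum(MULTI_STEP.values())
    covered = EXACT + ALTERNATIVE_COFACTOR + multi_step_total
    return {
        "filtered": filtered,
        "multi_step": multi_step_total,
        "covered": covered,
        "not_reconstructed": filtered - covered,
        "coverage_fraction": covered / filtered,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(key, value)
```

With these inputs, about three quarters of the filtered KEGG reactions are covered by the reaction rules, either directly or through multi-step reconstruction.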

Predicted ATLAS Reactions Validated in KEGG and Other Databases

To validate the predicted reactions in ATLAS, we analyzed the novel reactions predicted in 2015 that became known in KEGG 2018. Out of the 958 reactions newly added to KEGG, only 239 reactions involved exclusively compounds that were already present in KEGG 2015, meaning that they could have been predicted in the original ATLAS. Out of these 239 reactions, 107 were already present in ATLAS. In other words, the existence of hypothetical reactions in ATLAS 2015 was confirmed in KEGG 2018, demonstrating the predictive power of BNICE.ch. Next, we examined the enzymes that BridgIT suggested in ATLAS 2015 for these 107 novel reactions, out of which 75 had an enzyme assigned. We found that the predicted EC numbers for 64 out of 75 reactions match the EC number proposed in KEGG up to the third level. For example, the novel reaction rat104204 was predicted to have an EC number of 2.4.1.-. BridgIT suggested R08946 as the most similar reaction, which was known to be catalyzed by 2.4.1.245. In 2018, KEGG confirmed the promiscuous activity of 2.4.1.245 for this reaction and named it R11306.

In ATLAS 2018, we additionally mapped the novel reactions to reaction databases other than KEGG. Interestingly, we found that 1118 predicted reactions in ATLAS were not actually novel, but known to at least one of the repositories BRENDA, Reactome, HMR, MetaCyc, MetaNetX, BiGG, or Rhea, which shows that the predictive power of ATLAS goes beyond KEGG (Table S4). ATLAS reactions that can be found in any of these databases are linked accordingly in the updated version.
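
To make the phrase "match up to the third level" concrete, the sketch below counts how many leading fields two EC numbers share and recomputes the validation rates quoted above. The function `ec_match_level` is a hypothetical helper, not part of BNICE.ch or BridgIT, and it assumes that a dash marks an unspecified field at which the comparison stops.

```python
def ec_match_level(ec_a: str, ec_b: str) -> int:
    """Count the leading EC fields (0 to 4) on which two EC numbers agree.

    Assumption: a '-' field is unspecified, so the comparison stops there.
    """
    level = 0
    for field_a, field_b in zip(ec_a.split("."), ec_b.split(".")):
        if field_a == "-" or field_b == "-" or field_a != field_b:
            break
        level += 1
    return level


# Counts quoted in the text.
PREDICTABLE_NEW_KEGG_REACTIONS = 239
ALREADY_IN_ATLAS_2015 = 107
WITH_ENZYME_ASSIGNED = 75
THIRD_LEVEL_EC_MATCH = 64

recovered_fraction = ALREADY_IN_ATLAS_2015 / PREDICTABLE_NEW_KEGG_REACTIONS
ec_agreement_fraction = THIRD_LEVEL_EC_MATCH / WITH_ENZYME_ASSIGNED

if __name__ == "__main__":
    # EC numbers from the rat104204 example in the text.
    print(ec_match_level("2.4.1.-", "2.4.1.245"))
    print(f"{recovered_fraction:.1%} of predictable new KEGG reactions were in ATLAS 2015")
    print(f"{ec_agreement_fraction:.1%} of enzyme-annotated cases agree at the third EC level")
```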

Improvements in the Prediction of Enzymes for ATLAS Reactions

To find putative enzymes for the reactions in ATLAS, we applied the enzyme prediction tool BridgIT. With the latest version of the tool, the predictions in the updated ATLAS improved markedly: BridgIT matched 92% of ATLAS reactions to the same EC class as the BNICE.ch rules, whereas the previous version matched only around 60% (Table 1). For each ATLAS reaction, we provide the top three candidate enzymes, and we also include BridgIT results for known KEGG reactions to suggest alternative enzymes for them. As a qualitative example of an improved prediction, we analyzed the ATLAS reaction rat109456, whose closest BridgIT candidate in ATLAS 2015 had a low matching score of 0.67. In ATLAS 2018, the reaction is now known, and BridgIT found three very similar reactions, the first of which has a higher score than in the previous version (Figure 1).

Figure 1. Reaction with ATLAS identifier rat109456 is an example of a reaction that was novel in ATLAS 2015 and that is now cataloged in KEGG. (left) In ATLAS 2015, the earlier version of BridgIT provided the most similar known reaction, and its associated enzyme, for this ATLAS reaction. (right) In ATLAS 2018, the same reaction is now cataloged in KEGG as R11332 with EC 5.3.1.33. Other than the native enzyme with EC 5.3.1.33, BridgIT provides three alternative enzyme candidates that might also catalyze the reaction.

Conclusion

We have updated the ATLAS of Biochemistry to integrate new biochemical data from KEGG 2018, using an updated set of generalized reaction rules and an improved version of BridgIT to enhance the enzyme predictions for novel reactions. This study demonstrates the dynamic nature of biochemical knowledge and highlights the need for continuous updates of database-dependent applications. The updated ATLAS database helps to fill the gaps in our current knowledge of metabolism by expanding its boundaries to novel predicted metabolic reactions. The updated ATLAS database is freely available online for academia upon request.

References

1.  KEGG: kyoto encyclopedia of genes and genomes.

Authors:  M Kanehisa; S Goto
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  DREAMS of metabolism.

Authors:  Keng Cher Soh; Vassily Hatzimanikatis
Journal:  Trends Biotechnol       Date:  2010-08-19       Impact factor: 19.536

3.  Exploring the diversity of complex metabolic networks.

Authors:  Vassily Hatzimanikatis; Chunhui Li; Justin A Ionita; Christopher S Henry; Matthew D Jankowski; Linda J Broadbelt
Journal:  Bioinformatics       Date:  2004-12-21       Impact factor: 6.937

4.  Design of computational retrobiosynthesis tools for the design of de novo synthetic pathways. (Review)

Authors:  Noushin Hadadi; Vassily Hatzimanikatis
Journal:  Curr Opin Chem Biol       Date:  2015-07-11       Impact factor: 8.822

5.  Systems Metabolic Engineering Strategies: Integrating Systems and Synthetic Biology with Metabolic Engineering. (Review)

Authors:  Kyeong Rok Choi; Woo Dae Jang; Dongsoo Yang; Jae Sung Cho; Dahyeon Park; Sang Yup Lee
Journal:  Trends Biotechnol       Date:  2019-02-05       Impact factor: 19.536

6.  ATLAS of Biochemistry: A Repository of All Possible Biochemical Reactions for Synthetic Biology and Metabolic Engineering Studies.

Authors:  Noushin Hadadi; Jasmin Hafner; Adrian Shajkofci; Aikaterini Zisaki; Vassily Hatzimanikatis
Journal:  ACS Synth Biol       Date:  2016-07-28       Impact factor: 5.110

7.  Discovery and Evaluation of Biosynthetic Pathways for the Production of Five Methyl Ethyl Ketone Precursors.

Authors:  Milenko Tokic; Noushin Hadadi; Meric Ataman; Dário Neves; Birgitta E Ebert; Lars M Blank; Ljubisa Miskovic; Vassily Hatzimanikatis
Journal:  ACS Synth Biol       Date:  2018-08-07       Impact factor: 5.110

8.  Discovery and analysis of novel metabolic pathways for the biosynthesis of industrial chemicals: 3-hydroxypropanoate.

Authors:  Christopher S Henry; Linda J Broadbelt; Vassily Hatzimanikatis
Journal:  Biotechnol Bioeng       Date:  2010-06-15       Impact factor: 4.530

9.  Computational framework for predictive biodegradation.

Authors:  Stacey D Finley; Linda J Broadbelt; Vassily Hatzimanikatis
Journal:  Biotechnol Bioeng       Date:  2009-12-15       Impact factor: 4.530

10.  Enzyme annotation for orphan and novel reactions using knowledge of substrate reactive sites.

Authors:  Noushin Hadadi; Homa MohammadiPeyhani; Ljubisa Miskovic; Marianne Seijo; Vassily Hatzimanikatis
Journal:  Proc Natl Acad Sci U S A       Date:  2019-03-25       Impact factor: 11.205
