Homa MohammadiPeyhani, Jasmin Hafner, Anastasia Sveshnikova, Victor Viterbo, Vassily Hatzimanikatis.
Abstract
Metabolic "dark matter" describes currently unknown metabolic processes, which form a blind spot in our general understanding of metabolism and slow down the development of biosynthetic cell factories and naturally derived pharmaceuticals. Mapping the dark matter of metabolism remains an open challenge that can be addressed globally and systematically by existing computational solutions. In this work, we use 489 generalized enzymatic reaction rules to map both known and unknown metabolic processes around a biochemical database of 1.5 million biological compounds. We predict over 5 million reactions and integrate nearly 2 million naturally and synthetically derived compounds into the global network of biochemical knowledge, named ATLASx. ATLASx is available to researchers as a powerful online platform that supports the prediction and analysis of biochemical pathways and evaluates the biochemical vicinity of molecule classes (https://lcsb-databases.epfl.ch/Atlas2).
Year: 2022 PMID: 35322036 PMCID: PMC8943196 DOI: 10.1038/s41467-022-29238-z
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 17.694
Fig. 1 ATLAS workflow applied to known biological and bioactive compounds.
1. Unification: metabolic reactions and biochemical compounds were collected from different publicly available databases and merged into a consistent, duplicate-free database called bioDB.
2. Curation: compounds were annotated with molecular identifiers, and reactions were annotated with reaction mechanisms.
3. Expansion: generalized reaction rules from BNICE.ch were applied to bioDB compounds to generate all possible reactions producing known biological or chemical products.
4. Analysis: the connectivity of the biochemical reaction networks was analyzed before and after reaction prediction, along with the integration of compounds not previously connected in known biochemical networks.
5. Distribution: the results were made available online (https://lcsb-databases.epfl.ch/Atlas2).

ΔfGʹ°: estimated Gibbs free energy of formation for compounds under biological conditions.
Compound and reaction statistics for bioDB, bioATLAS, and chemATLAS.
| Category | bioDB | bioATLAS | chemATLAS |
|---|---|---|---|
| Compounds | | | |
| Compounds integrated in reaction | 14,902 | 1,007,776 | 1,870,776 |
| Total number of compounds | 1,500,222 | 1,500,222 | 77,934,143 |
| Reactions | | | |
| Known reactions | 56,087 | 56,087 | 56,087 |
| BNICE.ch-curated known reactions | 11,172 | 11,172 | 11,172 |
| BNICE.ch-predicted reactions | 0 | 1,578,885 | 5,225,661 |
| Total number of reactions | 56,087 | 1,634,972 | 5,281,748 |
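
The reaction totals in the table can be cross-checked with simple arithmetic: for each scope, the total should equal the known reactions plus the BNICE.ch-predicted reactions. The sketch below copies the figures from the table; the variable and function names are illustrative only and are not part of ATLASx.

```python
# Figures copied from the statistics table above; names are illustrative.
KNOWN_REACTIONS = 56_087
predicted = {"bioDB": 0, "bioATLAS": 1_578_885, "chemATLAS": 5_225_661}
reported_total = {"bioDB": 56_087, "bioATLAS": 1_634_972, "chemATLAS": 5_281_748}
integrated = {"bioDB": 14_902, "bioATLAS": 1_007_776, "chemATLAS": 1_870_776}
total_compounds = {"bioDB": 1_500_222, "bioATLAS": 1_500_222, "chemATLAS": 77_934_143}


def check_totals():
    """Return, per scope, whether known + predicted equals the reported total."""
    return {
        scope: KNOWN_REACTIONS + predicted[scope] == reported_total[scope]
        for scope in predicted
    }


def integrated_fraction(scope):
    """Share of a scope's compounds that take part in at least one reaction."""
    return integrated[scope] / total_compounds[scope]


if __name__ == "__main__":
    print(check_totals())
    for scope in integrated:
        print(f"{scope}: {integrated_fraction(scope):.1%} of compounds integrated")
```

Read this way, the table shows the share of bioDB compounds that participate in at least one reaction rising from about 1% in bioDB to roughly two thirds in bioATLAS.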
Fig. 2 Graph-theoretical analysis of biotransformation networks.
(a) Schematic overview of statistics and network properties calculated for bioDB, bioATLAS, and chemATLAS. Reactions exclusively involving biological and bioactive compounds (green nodes) are assigned to the bioATLAS reaction space, and reactions involving one or more chemical compounds (light blue nodes) are assigned to the chemATLAS reaction space. The main component and the second largest component of the network are shown schematically (white highlight). The diameter, that is, the longest shortest path between any two nodes of the main component, has a length of 8 and is highlighted in red. (b) The components of each reaction scope in ATLASx were extracted and ordered by size. The number of compounds (nodes) in each component is plotted on a log-log scale to show the size distribution of disconnected components for bioDB, bioATLAS, and chemATLAS. Exact compound counts are given for the main component (highest number of nodes) and for the second largest component.
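
The two network properties named in this caption, the size-ordered list of components and the diameter of the main component, can be illustrated on a toy graph. This is a minimal sketch that assumes an undirected, unweighted compound graph stored as an adjacency dictionary; the toy graph and function names are hypothetical, and the authors' actual implementation is not described here.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def connected_components(graph):
    """All components as node sets, largest first."""
    seen = set()
    components = []
    for node in graph:
        if node not in seen:
            component = set(bfs_distances(graph, node))
            seen |= component
            components.append(component)
    return sorted(components, key=len, reverse=True)


def diameter(graph, component):
    """Longest shortest path between any two nodes of one component."""
    return max(max(bfs_distances(graph, node).values()) for node in component)


# Hypothetical toy network: a chain of four compounds, a pair, and a singleton.
toy = {
    "A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"},
    "X": {"Y"}, "Y": {"X"},
    "Z": set(),
}
components = connected_components(toy)
sizes = [len(c) for c in components]
main_diameter = diameter(toy, components[0])
```

Running one breadth-first search per node is quadratic in the component size, so this brute-force approach is practical only for small examples, not for networks with millions of compounds.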
Fig. 3 Pathway search comparison to dataset of pathways extracted from MetaCyc.
(a) Coverage of the collected MetaCyc pathway dataset (1131 pathways) with MetaCyc reactions in ATLAS and in the whole ATLASx network. (b) Rank of the MetaCyc pathway according to the NICEpath pathway search algorithm. (c) For the MetaCyc pathway PWY-361 (phenylpropanoid biosynthesis), ATLASx found over 10,000 alternatives with better overall atom conservation. (d) Example of an alternative pathway for the original MetaCyc pathway PWY-361. The LCSB IDs of the ATLASx compounds are given in parentheses.
Fig. 4 Example pathway expansion showcase for the biosynthesis of the natural product staurosporine.
(a) Visual representation of the biosynthesis pathway from tryptophan to staurosporine (obtained from KEGG, steps numbered in bold black), expanded for one generation around the native intermediates. (b) The molecular structure of staurosporine. (c) Network of potential staurosporine biochemistry expanded four generations around the target compound. The size of the compound nodes decreases with each generation. When no compound name is indicated, the LCSB compound ID is provided.
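
Expanding a network "for n generations" around a compound can be read as a breadth-first traversal that records which compounds first appear at each step. The sketch below works under that reading; the adjacency dictionary, compound labels, and function name are hypothetical and do not reflect the ATLASx interface.

```python
def expand_generations(neighbors, seeds, generations):
    """Return a list of sets: layer 0 holds the seeds, layer k the compounds
    first reached after k reaction steps from any seed."""
    layers = [set(seeds)]
    seen = set(seeds)
    for _ in range(generations):
        frontier = set()
        for compound in layers[-1]:
            for neighbor in neighbors.get(compound, ()):
                if neighbor not in seen:
                    frontier.add(neighbor)
        seen |= frontier
        layers.append(frontier)
    return layers


# Hypothetical neighborhood: each key maps to compounds one reaction away.
toy_neighbors = {
    "target": {"a", "b"},
    "a": {"target", "c"},
    "b": {"target"},
    "c": {"a", "d"},
    "d": {"c"},
}
layers = expand_generations(toy_neighbors, ["target"], 2)
```

Keeping one set per generation makes it straightforward to scale node sizes by generation, as the caption describes for panel (c).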
Pathway reconstruction and gap-filling of the staurosporine biosynthesis pathway with ATLASx.
| Step | KEGG ID | EC number | BNICE.ch rule | Top BridgIT hit EC (KEGG ID, score) | Reconstruction within ATLASx |
|---|---|---|---|---|---|
| 1 | R11119 | 1.4.3.- | 1.4.3.- | 1.4.3.23 (R09560, 0.95) | Biotransformation with LCSB ID 2600177067 |
| 2 | R11120 | | | | 2-step reaction (spontaneous + 1.21.98.2) in bioDB (a) |
| 3 | R11121 | 1.13.12.- | | | Not reconstructed by any suite of reaction rules |
| 4 | R11122 | 2.4.-.- | | | 3-step reaction in chemATLAS (b) |
| 5 | R11123 | | | | 2-step reaction in bioATLAS (c) |
| 6 | R11129 | 2.1.1.- | | 2.1.2.5 (R03189, 0.34) | Biotransformation with LCSB ID 2600423725 |
| 7 | R05757 | 2.1.1.139 | 2.1.1.- | 2.1.1.139 (R05757, 1.00) | Biotransformation with LCSB ID 2600261843 |
a https://lcsb-databases.epfl.ch/Graph2/loadPathway/1/1468050408,1469435049,1468050416/2806125367,2806150968/0.
b https://lcsb-databases.epfl.ch/Graph2/loadPathway/1/1468050425,1469288899,277921848,1468050433/2603459454,2603467379,2682146339/0.
c https://lcsb-databases.epfl.ch/Graph2/loadPathway/1/1468050433,1469288674,1468050440/2603455158,2682148818/0.
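
The three footnote links share a common shape: after `loadPathway/1/` come two comma-separated ID lists and a trailing `0`. In each link the first list has one more entry than the second, which matches a pathway with n reaction steps and n+1 compounds, so the sketch below treats the first list as compound IDs and the second as reaction IDs. That interpretation is an assumption inferred from the counts, not documented behavior, and the meaning of the `1` and `0` segments is left uninterpreted.

```python
from urllib.parse import urlparse


def parse_load_pathway(url):
    """Split a loadPathway link into (first_id_list, second_id_list).

    Assumption: the first list holds compound IDs and the second holds
    reaction IDs. The segments before and after the lists are ignored.
    """
    # Drop a sentence-ending period that may follow the link in running text.
    path = urlparse(url.rstrip(".")).path
    parts = path.strip("/").split("/")
    start = parts.index("loadPathway")
    first, second = parts[start + 2], parts[start + 3]
    return [int(x) for x in first.split(",")], [int(x) for x in second.split(",")]


# Link copied from footnote b (3-step reaction in chemATLAS).
FOOTNOTE_B = (
    "https://lcsb-databases.epfl.ch/Graph2/loadPathway/1/"
    "1468050425,1469288899,277921848,1468050433/"
    "2603459454,2603467379,2682146339/0."
)
compound_ids, reaction_ids = parse_load_pathway(FOOTNOTE_B)
```

For footnote b this gives four IDs in the first list and three in the second, in line with the "3-step reaction" entry in the table.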