James G Jeffryes, Ricardo L Colastani, Mona Elbadawi-Sidhu, Tobias Kind, Thomas D Niehaus, Linda J Broadbelt, Andrew D Hanson, Oliver Fiehn, Keith E J Tyo, Christopher S Henry.
Abstract
BACKGROUND: In spite of its great promise, metabolomics has proven difficult to execute in an untargeted and generalizable manner. Liquid chromatography-mass spectrometry (LC-MS) has made it possible to gather data on thousands of cellular metabolites. However, matching metabolites to their spectral features continues to be a bottleneck, meaning that much of the collected information remains uninterpreted and that new metabolites are seldom discovered in untargeted studies. These challenges require new approaches that consider compounds beyond those available in curated biochemistry databases.

DESCRIPTION: Here we present Metabolic In silico Network Expansions (MINEs), an extension of known metabolite databases to include molecules that have not been observed, but are likely to occur based on known metabolites and common biochemical reactions. We utilize an algorithm called the Biochemical Network Integrated Computational Explorer (BNICE) and expert-curated reaction rules based on the Enzyme Commission classification system to propose the novel chemical structures and reactions that comprise MINE databases. Starting from the Kyoto Encyclopedia of Genes and Genomes (KEGG) COMPOUND database, the MINE contains over 571,000 compounds, of which 93% are not present in the PubChem database. However, these MINE compounds have on average higher structural similarity to natural products than compounds from KEGG or PubChem. MINE databases were able to propose annotations for 98.6% of a set of 667 MassBank spectra, 14% more than KEGG alone and equivalent to PubChem, while returning far fewer candidates per spectrum than PubChem (46 vs. 1715 median candidates). Application of MINEs to LC-MS accurate mass data enabled the identity of an unknown peak to be confidently predicted.
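The accurate-mass annotation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a singly charged adduct and a parts-per-million (ppm) tolerance window. The function names, the 5 ppm default, and the three-entry toy library are hypothetical and are not taken from the MINE software or its API.

```python
# Minimal sketch of accurate-mass candidate lookup (not the MINE implementation).
PROTON_MASS = 1.007276  # Da, mass of a proton

# Mass shift added to the neutral monoisotopic mass for each singly charged adduct.
ADDUCT_SHIFT = {"[M+H]+": PROTON_MASS, "[M-H]-": -PROTON_MASS}


def neutral_mass(mz, adduct):
    """Convert an observed m/z into a neutral monoisotopic mass."""
    return mz - ADDUCT_SHIFT[adduct]


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def candidates(mz, adduct, library, tol_ppm=5.0):
    """Return (name, mass, ppm error) for library entries inside the tolerance."""
    target = neutral_mass(mz, adduct)
    hits = [
        (name, mass, ppm_error(target, mass))
        for name, mass in library.items()
        if abs(ppm_error(target, mass)) <= tol_ppm
    ]
    # Closest mass match first.
    return sorted(hits, key=lambda hit: abs(hit[2]))


# Toy library with made-up entries (name -> neutral monoisotopic mass in Da).
TOY_LIBRARY = {
    "compound_A": 180.0634,
    "compound_B": 180.0423,
    "compound_C": 181.0739,
}

hits = candidates(181.0707, "[M+H]+", TOY_LIBRARY)
for name, mass, err in hits:
    print(f"{name}: {mass:.4f} Da ({err:+.2f} ppm)")
```

A larger database returns more entries inside the same window, which is why the median number of candidates per spectrum matters as much as the fraction of spectra annotated.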
Keywords: Enzyme promiscuity; Liquid chromatography–mass spectrometry; Metabolite identification; Untargeted metabolomics
Year: 2015 PMID: 26322134 PMCID: PMC4550642 DOI: 10.1186/s13321-015-0087-1
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Fig. 1 MINE database construction and access methods. The process of constructing a MINE database from the curated source databases is depicted on the left. The methods for accessing the database are shown on the right.
Fig. 2 Generalizing a BNICE reaction rule from known biochemical reactions. The common motif of the hydrolysis of the 1,3-diketone is shaded for emphasis.
Comparison of MINEs generated from various source databases and other databases containing computationally predicted metabolites
| Database | Original database compounds | Final database compounds | Fold increase | Compounds found in PubChem (%) |
|---|---|---|---|---|
| KEGG MINE | 13,307 | 571,368 | 43 | 6.99 |
| EcoCyc MINE | 1,832 | 54,719 | 30 | 11.27 |
| YMDB MINE | 1,978 | 100,755 | 51 | 7.26 |
| IIMDB | 23,035 | 400,414 | 18 | 5.11 |
| MyCompoundID | 8,021 | 375,809 | 47 | Unknown |
| Green Tea metabolites | 75 | 27,170 | 363 | 1.58 |
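The "Fold increase" column is the ratio of final to original compound counts. The short sketch below recomputes it from the table values. Rounding the ratio up to the next integer reproduces every listed figure, but that rounding convention is inferred here from the numbers and is not stated in the source.

```python
import math

# (original, final) compound counts copied from the table above.
COUNTS = {
    "KEGG MINE": (13307, 571368),
    "EcoCyc MINE": (1832, 54719),
    "YMDB MINE": (1978, 100755),
    "IIMDB": (23035, 400414),
    "MyCompoundID": (8021, 375809),
    "Green Tea metabolites": (75, 27170),
}

# Fold increase values as listed in the table.
LISTED = {
    "KEGG MINE": 43,
    "EcoCyc MINE": 30,
    "YMDB MINE": 51,
    "IIMDB": 18,
    "MyCompoundID": 47,
    "Green Tea metabolites": 363,
}

# Exact ratio of final to original counts.
fold = {name: final / original for name, (original, final) in COUNTS.items()}

# Assumed convention: round the ratio up to the next whole number.
rounded_up = {name: math.ceil(value) for name, value in fold.items()}

for name, value in fold.items():
    print(f"{name}: {value:.2f} -> {rounded_up[name]} (listed {LISTED[name]})")
```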
Fig. 3 Histogram of Natural Product Likeness. This plot shows the distribution of Natural Product Likeness Scores for the KEGG Database (mean score 0.77), the KEGG MINE (mean score 0.98) and a random sample of 500,000 PubChem compounds (mean score −0.52). A more positive score indicates more natural atomic features.
Fig. 4 Screenshot of Metabolomics search results. This screenshot displays features of the metabolomics results, including filtering by attributes and highlighting (blue) of compounds present in a specified KEGG genome reconstruction.
Annotation of MassBank data
| Metric | KEGG | KEGG MINE | PubChem |
|---|---|---|---|
| Features annotated | 84.5% | 98.6% | 98.5% |
| Correct annotation present | 68.6% | 66.8% | 89.8% |
| Median # of candidates | 3 | 46 | 1714.5 |
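The headline comparisons in this table reduce to simple arithmetic. The sketch below works them out from the table values. Reading the gain over KEGG as a difference in percentage points is an interpretation of the figures, not something the source spells out.

```python
# Values copied from the table above.
annotated = {"KEGG": 84.5, "KEGG MINE": 98.6, "PubChem": 98.5}   # % of features annotated
median_candidates = {"KEGG": 3, "KEGG MINE": 46, "PubChem": 1714.5}

# Absolute gain in annotated features, in percentage points.
gain_points = annotated["KEGG MINE"] - annotated["KEGG"]

# The same gain expressed relative to the KEGG baseline, in percent.
relative_gain = gain_points / annotated["KEGG"] * 100

# How many times more candidates PubChem returns per spectrum than the KEGG MINE.
candidate_ratio = median_candidates["PubChem"] / median_candidates["KEGG MINE"]

print(f"Gain over KEGG: {gain_points:.1f} percentage points ({relative_gain:.1f}% relative)")
print(f"PubChem returns about {candidate_ratio:.0f}x more candidates than the KEGG MINE")
```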
Fig. 5 Positive MS spectrum (a), positive MS/MS spectrum (b) and negative MS/MS spectrum (c). The positive MS spectrum provides the mass of the precursor ion [M+H]+ = 690.5099 Da and its isotopic abundance pattern. The prominent ion in the positive MS/MS spectrum corresponds to the neutral loss of the phosphoethanolamine head group. The negative MS/MS spectrum shows the molecular ion [M−H]− as well as a pair of ions corresponding to the (16:0) and (16:1) side chains.
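The masses behind this caption can be worked through in a few lines. The sketch below derives the neutral mass from the reported [M+H]+ value and the m/z expected after loss of the head group. The elemental composition used for the phosphoethanolamine neutral loss (C2H8NO4P) is an assumption that does not appear in the caption, and the helper name is hypothetical.

```python
# Monoisotopic masses in Da for the elements needed here.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "P": 30.9737620,
}
PROTON_MASS = 1.007276  # Da


def formula_mass(counts):
    """Monoisotopic mass of an elemental composition given as {element: count}."""
    return sum(MONOISOTOPIC[element] * n for element, n in counts.items())


precursor_mz = 690.5099  # [M+H]+ reported in the caption

# Neutral monoisotopic mass of the unknown compound.
neutral = precursor_mz - PROTON_MASS

# Assumed composition of the neutral phosphoethanolamine loss: C2H8NO4P.
head_group_loss = formula_mass({"C": 2, "H": 8, "N": 1, "O": 4, "P": 1})

# Expected m/z of the prominent positive-mode MS/MS fragment under that assumption.
fragment_mz = precursor_mz - head_group_loss

print(f"Neutral mass:    {neutral:.4f} Da")
print(f"Head-group loss: {head_group_loss:.4f} Da")
print(f"Fragment m/z:    {fragment_mz:.4f}")
```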