Zijuan Lai, Hiroshi Tsugawa, Gert Wohlgemuth, Sajjan Mehta, Matthew Mueller, Yuxuan Zheng, Atsushi Ogiwara, John Meissen, Megan Showalter, Kohei Takeuchi, Tobias Kind, Peter Beal, Masanori Arita, Oliver Fiehn.
Abstract
Novel metabolites distinct from canonical pathways can be identified through the integration of three cheminformatics tools: BinVestigate, which queries the BinBase gas chromatography-mass spectrometry (GC-MS) metabolome database to match unknowns with biological metadata across over 110,000 samples; MS-DIAL 2.0, a software tool for chromatographic deconvolution of high-resolution GC-MS or liquid chromatography-mass spectrometry (LC-MS); and MS-FINDER 2.0, a structure-elucidation program that uses a combination of 14 metabolome databases in addition to an enzyme promiscuity library. We showcase our workflow by annotating N-methyl-uridine monophosphate (UMP), lysomonogalactosyl-monopalmitin, N-methylalanine, and two propofol derivatives.
Year: 2017 PMID: 29176591 PMCID: PMC6358022 DOI: 10.1038/nmeth.4512
Source DB: PubMed Journal: Nat Methods ISSN: 1548-7091 Impact factor: 28.547
Figure 1. Summary for functional and structural identification of unknown metabolites
(a) BinVestigate searches BinBase for the metabolomics study metadata and (nominal) EI-MS spectra of unknown compounds, with results shown as sunburst diagrams that illustrate the biological origin (species, organs, cell types) of the unknowns. (b) MS-DIAL 2.0 performs universal GC-MS or LC-MS/MS deconvolution with high-resolution (HR) mass spectrometry analytics to obtain the deconvoluted HR-MS spectra of unknowns needed for compound identification. (c) MS-FINDER 2.0 performs universal GC-EI-MS and LC-ESI-MS/MS spectral interpretation to annotate unknowns in combination with the enzyme promiscuity structure database (MINE), resulting in the discovery of biologically significant chemical structures. The tools are fully connected in MS-DIAL. Each tool is also available as a standalone program.
Figure 2. Metabolomic meta-analysis for origin exploration by BinVestigate
Bin IDs were queried in over 114,000 samples to show the cross-study specificity and relevance of unknown BinBase ID 160842 (left) and unknown BinBase ID 106699 (right). In the sunburst diagrams, the area of the circular sector for each organ (inner ring) or species (outer ring) is determined by the average signal intensity of the unknown compound in the samples of that origin in which it is present. The Bin ID, Fiehn RI, Kovats RI, number of annotation records, and conclusion on biological significance for the five unknowns discussed in this paper are summarized in the table.
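The sector sizing described in this caption can be illustrated with a minimal sketch. It assumes that each sector's share is linearly proportional to the mean intensity over the samples in which the compound was detected; the exact scaling used by BinVestigate is not stated here. The function name `sector_fractions` and the sample records are hypothetical.

```python
from collections import defaultdict


def sector_fractions(records):
    """Return each origin's share of the sunburst ring.

    records: iterable of (origin, intensity) pairs, one per sample.
    Only samples in which the compound is present (intensity > 0)
    contribute to the per-origin average. Linear scaling is an assumption.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for origin, intensity in records:
        if intensity > 0:
            sums[origin] += intensity
            counts[origin] += 1

    # Average intensity per origin, computed over detections only.
    means = {origin: sums[origin] / counts[origin] for origin in sums}
    total = sum(means.values())

    # Normalize so that the shares of one ring sum to 1.
    return {origin: mean / total for origin, mean in means.items()}


# Hypothetical intensities, for illustration only.
example_records = [
    ("liver", 100.0),
    ("liver", 300.0),
    ("plasma", 200.0),
    ("plasma", 0.0),  # not detected, so it is ignored in the average
]
fractions = sector_fractions(example_records)
```

Averaging over detections only, rather than over all samples, matches the caption's wording "when present"; an origin with a few strong detections can therefore occupy a large sector even if the compound is rare there.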
Figure 3. Identification of N-methyl-UMP by MS-DIAL 2.0 and MS-FINDER 2.0
High-resolution GC-MS analytics was first used for structure elucidation (left), then LC-MS/MS was applied as an additional line of evidence to validate the discovery (right). (a) Spectral deconvolution: fragment ions and molecular adduct ions of BinBase ID 106699 were deconvoluted and confirmed through MS-DIAL 2.0. (b) Formula prediction and validation: C10H15N2O9P was scored and ranked first in MS-FINDER 2.0 based on mass errors, isotope ratio errors, and subformula assignments. For the GC-MS workflow, chemical ionization data with different derivatization reagents (MSTFA vs. MSTFA-d9) were obtained to verify the formula and to yield the number of acidic protons; for the LC-MS workflow, the mass errors between theoretical and experimental values were only 1 mDa, and the isotopic ratio errors were within 1%. (c) Structure prediction, validation, and identification: structure candidates were retrieved from MINE DB in addition to the internal metabolome database, and fragmented in silico based on hydrogen rearrangement rules, bond dissociation energy, and a comprehensive fragmentation rule library (including GC-EI-MS and LC-ESI-MS/MS). N-methyl-UMP was ranked as the most likely structure in MS-FINDER 2.0 with computationally assigned substructures. For final validation, the mass spectra and retention times in GC-MS (left) and LC-MS/MS (right) of BinBase ID 106699 in a cancer cell sample were matched with those of a chemically synthesized N-methyl-UMP standard.
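The arithmetic behind the formula validation in panel (b) can be sketched as follows. This is an illustrative calculation, not the scoring used by MS-FINDER 2.0. It computes the monoisotopic mass of C10H15N2O9P, expresses a mass error in mDa, and estimates the number of trimethylsilyl (TMS) groups from the mass shift between MSTFA and MSTFA-d9 derivatives, assuming each TMS-d9 group carries nine deuterium atoms and the ion retains all of its TMS groups. The function names and the observed values are hypothetical.

```python
# Monoisotopic masses (Da) of the most abundant isotopes.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "P": 30.9737620,
}
DEUTERIUM = 2.01410178  # mass of 2H (Da)


def monoisotopic_mass(formula):
    """Sum the isotope masses for a formula given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def mass_error_mda(observed, theoretical):
    """Signed mass error in millidaltons."""
    return (observed - theoretical) * 1000.0


def count_tms_groups(mz_d0, mz_d9):
    """Estimate the number of TMS groups from the d0/d9 mass shift.

    Assumes the ion keeps every TMS group, so each group adds
    9 * (m(D) - m(H)), about 9.0565 Da, in the d9 derivative.
    """
    shift_per_group = 9 * (DEUTERIUM - MONOISOTOPIC["H"])
    return round((mz_d9 - mz_d0) / shift_per_group)


n_methyl_ump = {"C": 10, "H": 15, "N": 2, "O": 9, "P": 1}
theoretical = monoisotopic_mass(n_methyl_ump)  # about 338.0515 Da

# Hypothetical measured neutral mass, for illustration only.
error = mass_error_mda(338.0525, theoretical)

# Hypothetical pair of molecular-ion m/z values differing by four TMS-d9 shifts.
groups = count_tms_groups(500.0000, 536.2260)
```

Each TMS group replaces one exchangeable (acidic) proton, which is why the group count from the d0/d9 shift yields the number of acidic protons mentioned in the caption. Fragment ions that have lost a methyl group from a TMS moiety would show a smaller shift, so the estimate is only meaningful for ions that retain all TMS groups intact.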