Manor Askenazi, Michal Linial.
Abstract
Gas chromatography-mass spectrometry (GC-MS) acquisitions routinely yield hundreds to thousands of Electron Ionization (EI) mass spectra. The chemical identification of these spectra typically involves a search protocol that seeks an exact match to a reference spectrum. Reference spectra are found in comprehensive libraries of small molecule EI spectra curated by commercial and public entities. We developed ARISTO (Automatic Reduction of Ion Spectra To Ontology), a webtool, which provides information regarding the general chemical nature of the compound underlying an input EI mass spectrum. Importantly, ARISTO can provide such annotation without necessitating an exact match to a specific compound. ARISTO provides assignments to a subset of the ChEBI (Chemical Entities of Biological Interest) dictionary, an ontology, which aims to cover biologically relevant small molecules. Our system takes as input a mass spectrum represented as a series of mass and intensity pairs; the system returns a graphical representation of the supported ontology as well as a detailed table of suggested annotations along with their associated statistical evidence. ARISTO is accessible at this URL: http://www.ionspectra.org/aristo. The system is free, open to all and does not require registration of any sort.
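As an illustration of the input described in the abstract (a series of mass and intensity pairs), the following is a minimal sketch of how such a spectrum might be parsed and turned into a fixed-length vector. The text layout (one pair per line, separated by whitespace or a comma), the unit-mass binning, the `max_mz` cutoff and the function names are all assumptions made for this example; the record does not specify ARISTO's actual input format or preprocessing.

```python
import math


def parse_spectrum(text):
    """Parse one 'mass intensity' pair per line into a list of float tuples.

    Assumed layout: whitespace- or comma-separated pairs (hypothetical).
    """
    peaks = []
    for line in text.strip().splitlines():
        parts = line.replace(",", " ").split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 2:
            raise ValueError(f"expected a mass/intensity pair, got: {line!r}")
        peaks.append((float(parts[0]), float(parts[1])))
    return peaks


def to_unit_vector(peaks, max_mz=500):
    """Bin peaks to integer m/z values and scale to unit Euclidean length."""
    vec = [0.0] * (max_mz + 1)
    for mz, intensity in peaks:
        idx = int(round(mz))
        if 0 <= idx <= max_mz:
            vec[idx] += intensity
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


# Toy spectrum with three peaks (made-up values, not a real compound).
peaks = parse_spectrum("41 30\n43 100\n60, 80\n")
vector = to_unit_vector(peaks)
```

Scaling to unit length makes spectra recorded at different absolute intensities comparable, which is a common choice but not one confirmed by the source.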
Year: 2011 PMID: 21622952 PMCID: PMC3125788 DOI: 10.1093/nar/gkr403
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Learning higher level ChEBI concepts by a naive process of spectral averaging. The ‘learnable’ ChEBI concepts (AUC > 0.8) include long-chain fatty acid and steroid, which are considered to be ‘easily recognizable’ by mass spectrometry experts, while other broad categories such as natural product are relatively unexpected. Note that with increasing compound coverage some of the broader categories (N > 500) may decrease in their AUC, while categories that currently lack adequate spectral support (N < 10) are likely to cross the threshold of learnability.
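The idea of learning a concept by spectral averaging and judging it by AUC can be sketched as follows. This is an illustrative toy, not ARISTO's implementation: the use of cosine similarity as the score, the normalization step, the four-bin toy spectra and every function name are assumptions introduced here. Only the averaging idea and the AUC > 0.8 threshold come from the caption.

```python
import math


def normalize(vec):
    """Scale a spectrum vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def average_spectrum(spectra):
    """Element-wise mean of the unit-normalized member spectra of a concept."""
    unit = [normalize(s) for s in spectra]
    return [sum(column) / len(unit) for column in zip(*unit)]


def cosine(a, b):
    """Cosine similarity between two spectrum vectors (assumed score)."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of AUC).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy data: four m/z bins, made-up intensities.
members = [[0, 10, 0, 5], [0, 8, 1, 4]]    # spectra annotated with the concept
held_out = [[0, 9, 0, 5]]                  # unseen spectra of the same concept
others = [[7, 0, 3, 0], [5, 1, 4, 0]]      # spectra outside the concept

concept_average = average_spectrum(members)
pos = [cosine(concept_average, s) for s in held_out]
neg = [cosine(concept_average, s) for s in others]
concept_auc = auc(pos, neg)
learnable = concept_auc > 0.8  # threshold quoted in the caption
```

In this toy case the held-out member shares its dominant peaks with the average spectrum and the other spectra do not, so the concept separates cleanly. Real categories with heterogeneous members would be expected to score lower.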
Figure 2. Screenshots from ARISTO’s web interface. (A) The user loads a mass spectrum corresponding to an unknown compound; in this example the structure is known (CHEBI:49519) and corresponds to example spectrum #5 in the Examples section of the website. The resulting report produced by ARISTO shows: (B) a DAG representing the estimated precision and AUC for each concept as the color and size of each concept-node, along with its location within the supported ontology (an induced subgraph, highlighted in blue, links all ChEBI annotations passing ARISTO’s default filter back to the DAG’s root node) and (C) a tabular report showing the estimated precision of each prediction and the correctness of the call (marked as True/False and available only for the example input provided by the website), along with image-links to more supporting information. The first image-link yields a mirror plot (D) showing the match between the average annotation-spectrum (top, in blue) for a given annotation, in this case straight-chain saturated fatty acid (CHEBI:39418), and the query spectrum (bottom, in red). See text for more details.