Kerstin Scheubert, Franziska Hufsky, Sebastian Böcker.
Abstract
The identification of small molecules from mass spectrometry (MS) data remains a major challenge in the interpretation of MS data. This review covers the computational aspects of identifying small molecules, from identifying a compound by searching a reference spectral library to the structural elucidation of unknowns. In detail, we describe the basic principles and pitfalls of searching mass spectral reference libraries. Determining the molecular formula of the compound can serve as a basis for subsequent structural elucidation; consequently, we cover different methods for molecular formula identification, focussing on isotope pattern analysis. We then discuss automated methods to deal with mass spectra of compounds that are not present in spectral libraries, and provide an insight into de novo analysis of fragmentation spectra using fragmentation trees. In addition, this review briefly covers the reconstruction of metabolic networks using MS data. Finally, we list available software for different steps of the analysis pipeline.
Year: 2013 PMID: 23453222 PMCID: PMC3648359 DOI: 10.1186/1758-2946-5-12
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Figure 1. Number of EI spectra (top) and tandem mass spectra (bottom) in NIST and Wiley Registry from 2000 until 2011.
Figure 2. Inter-instrument comparability of dixyrazine-specific tandem mass spectra collected on different instrumental platforms. Figure provided by Herbert Oberacher; compare to Figure 1 in Oberacher et al. [71].
Software for the three basic steps of molecular formula identification using isotope patterns
| for arbitrary alphabets of elements | |
| | requires only little memory |
| | swift in practice |
| implementing | |
| | decomposing real-valued masses |
| “Seven Golden Rules” [ | to filter molecular formulas |
| multinomial expansion to predict “center masses” | |
| | memory- and time-consuming |
| pruning by probability thresholds and/or mass range | |
| | reduced memory and time consumption |
| | reduced accuracy of the predictions |
| iterative (stepwise) computation of isotope pattern | |
| | probability-weighted center masses |
| | probabilities and masses are updated as atoms are added |
| models the folding procedure as a Markov process | |
| Newton-Girard theorem and Vieta's formulae to calculate intensities and masses | |
| 2D Fast Fourier Transform that splits up the calculation into a coarse and a fine structure | |
| | running time improvement for large compounds |
| commercial software by Bruker Daltonics | |
| Bayesian statistics for scoring intensities and masses of the isotope pattern | |
| simple scoring based only on intensities | |
*Recommended tools.
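The iterative (stepwise) approach listed in the table, combined with pruning by a probability threshold, can be illustrated with a minimal Python sketch. It is not the implementation of any tool named in the review: the function names `add_atom` and `isotope_pattern`, the restriction to C, H, N and O, the grouping of isotopologues by nominal mass, and the `min_prob` default are all assumptions made for illustration. Isotope masses and abundances are rounded standard values.

```python
from collections import defaultdict

# Approximate isotope masses (Da) and natural abundances (rounded standard values).
ISOTOPES = {
    "C": [(12.0, 0.9893), (13.0033548, 0.0107)],
    "H": [(1.0078250, 0.999885), (2.0141018, 0.000115)],
    "N": [(14.0030740, 0.99636), (15.0001089, 0.00364)],
    "O": [(15.9949146, 0.99757), (16.9991317, 0.00038), (17.9991610, 0.00205)],
}


def add_atom(pattern, isotopes):
    """Fold one atom into the pattern.

    pattern maps a nominal mass shift k (0 = monoisotopic peak) to a pair
    (probability, probability-weighted center mass).
    """
    base = round(isotopes[0][0])  # nominal mass of the lightest isotope
    acc = defaultdict(lambda: [0.0, 0.0])  # k -> [prob, sum of prob * mass]
    for k, (prob, mass) in pattern.items():
        for iso_mass, iso_prob in isotopes:
            q = prob * iso_prob
            slot = acc[k + round(iso_mass) - base]
            slot[0] += q
            slot[1] += q * (mass + iso_mass)
    return {k: (p, pm / p) for k, (p, pm) in acc.items()}


def isotope_pattern(formula, min_prob=1e-6):
    """Build the pattern atom by atom, pruning peaks below min_prob (hypothetical default)."""
    pattern = {0: (1.0, 0.0)}
    for element, count in formula.items():
        for _ in range(count):
            pattern = add_atom(pattern, ISOTOPES[element])
            # Pruning saves time and memory at the cost of some accuracy.
            pattern = {k: v for k, v in pattern.items() if v[0] >= min_prob}
    return dict(sorted(pattern.items()))


if __name__ == "__main__":
    # Glucose, C6H12O6, as an example input.
    for shift, (prob, mass) in isotope_pattern({"C": 6, "H": 12, "O": 6}).items():
        print(f"M+{shift}: center mass {mass:.5f} Da, probability {prob:.5f}")
```

Adding atoms one at a time keeps the intermediate pattern small, which is the trade-off the table describes: less memory and time than a full multinomial expansion, with the pruning threshold controlling how much accuracy is given up.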
Figure 3. Metabolite identification pipeline based on elemental composition calculation, isotope pattern scoring and subsequent database queries. Figure redrawn from Kind and Fiehn [117].
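The first step of such a pipeline, elemental composition calculation, can be sketched as a brute-force enumeration of all formulas whose monoisotopic mass falls within a ppm tolerance of the measured mass. This is a deliberately naive stand-in, not one of the efficient decomposition algorithms covered by the review; the function name `decompose`, the restriction to C, H, N and O, and the 5 ppm default are assumptions for illustration.

```python
# Approximate monoisotopic masses (Da) of the lightest isotopes.
MONO = {"C": 12.0, "H": 1.0078250, "N": 14.0030740, "O": 15.9949146}


def decompose(mass, ppm=5.0, elements=("C", "H", "N", "O")):
    """Return all element-count dicts whose mass is within `ppm` of `mass`."""
    tol = mass * ppm * 1e-6  # absolute tolerance in Da
    results = []

    def recurse(i, remaining, counts):
        m = MONO[elements[i]]
        if i == len(elements) - 1:
            # Last element: only the nearest integer count can fit.
            n = round(remaining / m)
            if n >= 0 and abs(remaining - n * m) <= tol:
                results.append(dict(zip(elements, counts + [n])))
            return
        for n in range(int((remaining + tol) / m) + 1):
            recurse(i + 1, remaining - n * m, counts + [n])

    recurse(0, mass, [])
    return results


if __name__ == "__main__":
    # Neutral monoisotopic mass of glucose (C6H12O6) as an example query.
    for formula in decompose(180.06339, ppm=5.0):
        print(formula)
```

The enumeration returns every arithmetic solution, including chemically implausible ones, which is why candidate formulas are subsequently filtered and ranked by isotope pattern scoring before any database query.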
Approaches for analyzing fragmentation mass spectra of “unexpected” compounds, that is, compounds that are not present in spectral libraries [31]
| | | | | |
|---|---|---|---|---|
| searching for similar spectra in a library, assuming that spectral similarity is based on structural similarity | predicting substructures or compound classes by learning spectral classifiers | predicting spectra by applying fragmentation rules to known molecular structures | mapping the fragmentation spectrum to the compound structure to explain the peaks | computing a fragmentation tree that explains the peaks; aligning fragmentation trees to find similar compounds |
Figure 4. Predicting chemical properties (molecule fingerprints) from tandem MS data using a support vector machine (SVM) as done by Heinonen et al. [169]. The predicted fingerprints are used to search a molecular structure database for metabolite identification. Figure redrawn from Heinonen et al. [169].
Figure 5. Modular structure of xemilofiban. Figure redrawn from Sweeney [200].
Figure 6. MetFrag web interface with an example spectrum from naringenin. Searching KEGG as the compound library with a 10 ppm window returns 15 hits, and the correct molecule is ranked first.
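To make the 10 ppm window concrete, the following short Python sketch converts a relative tolerance in ppm into an absolute mass interval and tests whether a candidate mass falls inside it. It does not reproduce MetFrag's code; the function names `ppm_window` and `within_ppm` are hypothetical, and applying the window to the neutral monoisotopic mass is an assumption.

```python
def ppm_window(mass, ppm):
    """Return the (low, high) mass interval for a relative tolerance in ppm."""
    delta = mass * ppm * 1e-6  # 1 ppm is one millionth of the mass
    return mass - delta, mass + delta


def within_ppm(candidate_mass, query_mass, ppm=10.0):
    """True if the candidate mass lies inside the ppm window around the query."""
    low, high = ppm_window(query_mass, ppm)
    return low <= candidate_mass <= high


if __name__ == "__main__":
    # Approximate neutral monoisotopic mass of naringenin (C15H12O5).
    query = 272.06847
    low, high = ppm_window(query, 10.0)
    print(f"10 ppm window: {low:.5f} to {high:.5f} Da")
```

At a mass of about 272 Da, 10 ppm corresponds to roughly 0.0027 Da on either side, so only database entries with nearly identical exact mass are retrieved as candidates.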
Figure 7. Fragmentation tree of phenylalanine computed from tandem MS data.
Figure 8. Using spectral alignment of tandem MS data to generate a molecular network. The thickness of the edges indicates the similarity between the spectra. Figure redrawn from Watrous et al. [230].
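The idea behind such a network can be sketched in Python with a plain cosine similarity between binned spectra, used here as a simplified stand-in for the spectral alignment shown in the figure. The function names, the bin width, the similarity threshold and the three toy spectra are all made up for illustration and do not come from the review or from Watrous et al.

```python
import math
from itertools import combinations


def binned(spectrum, bin_width=0.01):
    """Sum peak intensities into m/z bins; spectrum is a list of (mz, intensity)."""
    vec = {}
    for mz, intensity in spectrum:
        key = round(mz / bin_width)
        vec[key] = vec.get(key, 0.0) + intensity
    return vec


def cosine(a, b, bin_width=0.01):
    """Cosine similarity of two spectra after binning (0 = disjoint, 1 = identical shape)."""
    va, vb = binned(a, bin_width), binned(b, bin_width)
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def molecular_network(spectra, min_similarity=0.7):
    """Return edges (name1, name2, score) for all pairs above the threshold."""
    edges = []
    for (n1, s1), (n2, s2) in combinations(spectra.items(), 2):
        score = cosine(s1, s2)
        if score >= min_similarity:
            edges.append((n1, n2, score))
    return edges


# Toy spectra, invented for illustration only.
TOY_SPECTRA = {
    "A": [(100.0, 1.0), (150.0, 0.5)],
    "B": [(100.0, 0.9), (150.0, 0.6)],
    "C": [(80.0, 1.0), (200.0, 0.3)],
}

if __name__ == "__main__":
    for n1, n2, score in molecular_network(TOY_SPECTRA):
        print(f"{n1} -- {n2}: similarity {score:.3f}")
```

The score attached to each edge plays the role of the edge thickness in the figure: the higher the similarity between two spectra, the stronger the connection between the corresponding nodes.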