Ivana Blaženović, Tobias Kind, Jian Ji, Oliver Fiehn.
Abstract
The annotation of small molecules remains a major challenge in untargeted mass spectrometry-based metabolomics. Here we critically discuss structure elucidation approaches and software designed to help during the annotation of unknown compounds. Only by elucidating unknown metabolites first is it possible to biologically interpret complex systems, to map compounds to pathways and to create reliable predictive metabolic models for translational and clinical research. These strategies include the construction and quality of tandem mass spectral databases such as the coalition of MassBank repositories and investigations of MS/MS matching confidence. We present in silico fragmentation tools such as MS-FINDER, CFM-ID, MetFrag, ChemDistiller and CSI:FingerID that can annotate compounds from existing structure databases and that have been used in the CASMI (critical assessment of small molecule identification) contests. Furthermore, the use of retention time models from liquid chromatography and the utility of collision cross-section modelling from ion mobility experiments are covered. Workflows and published examples of successfully annotated unknown compounds are included.
Keywords: compound identification; high resolution mass spectrometry; in silico fragmentation; library search; metabolomics; tandem mass spectrometry
Year: 2018 PMID: 29748461 PMCID: PMC6027441 DOI: 10.3390/metabo8020031
Source DB: PubMed Journal: Metabolites ISSN: 2218-1989
New confidence levels of compound annotations, as discussed by the Compound Identification work group of the Metabolomics Society at the Society's 2017 annual meeting (Brisbane, Australia). The new addition is the ‘Level 0’ annotation; the other levels remain as defined by the Metabolomics Standards Initiative.
| Confidence Level | Description | Minimum Data Requirements |
|---|---|---|
| Level 0 | Unambiguous 3D structure: Isolated, pure compound, including full stereochemistry | Following natural product guidelines, determination of 3D structure |
| Level 1 | Confident 2D structure: uses reference standard match or full 2D structure elucidation | At least two orthogonal techniques defining 2D structure confidently, such as MS/MS and RT or CCS |
| Level 2 | Probable structure: matched to literature data or databases by diagnostic evidence | At least two orthogonal pieces of information, including evidence that excludes all other candidates |
| Level 3 | Possible structure or class: Most likely structure, isomers possible, substance class or substructure match | One or several candidates possible, requires at least one piece of information supporting the proposed candidate |
| Level 4 | Unknown feature of interest | Presence in sample |
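
The decision logic implied by the table above can be sketched as a small function. This is only an illustrative simplification: the `Evidence` class, its field names, and the way evidence is counted are assumptions made for this example and are not part of the Metabolomics Society proposal.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical summary of the evidence gathered for one feature."""
    full_3d_structure: bool = False          # isolated, pure compound with stereochemistry
    reference_standard_match: bool = False   # or a full 2D structure elucidation
    orthogonal_evidence: int = 0             # independent techniques, e.g. MS/MS, RT, CCS
    excludes_other_candidates: bool = False  # diagnostic evidence rules out alternatives


def confidence_level(ev: Evidence) -> int:
    """Map the evidence to an annotation confidence level (0 = highest)."""
    if ev.full_3d_structure:
        return 0
    if ev.reference_standard_match and ev.orthogonal_evidence >= 2:
        return 1
    if ev.orthogonal_evidence >= 2 and ev.excludes_other_candidates:
        return 2
    if ev.orthogonal_evidence >= 1:
        return 3
    # Nothing beyond detection: unknown feature of interest.
    return 4
```

For instance, a feature matched to a reference standard by both MS/MS and retention time would be passed as `Evidence(reference_standard_match=True, orthogonal_evidence=2)` and reported at Level 1.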
Figure 1. Computational metabolomics approaches help to unravel the complexity of the metabolome and especially shed light on unknown metabolites. This includes technologies across different disciplines, including quantum chemistry, machine learning, heuristic approaches and reaction chemistry-based methods.
Overview of selected compound databases commonly used for compound identification.
| Database | Targets | Description |
|---|---|---|
| PubChem | All small molecules | Small molecules, metadata |
| ChemSpider | All small molecules | Small molecules, curated data |
| KEGG | Metabolites | Pathway database, multiple species |
| MetaCyc | Metabolites | Pathway database, multiple species |
| BRENDA | Enzymes | Enzyme and metabolism data |
| HMDB | Metabolites | Human metabolites |
| ChEBI | Small molecules | Molecules of biological interest |
| UNPD | Metabolites | Secondary plant metabolites |
| MINE | Metabolites | In silico predicted metabolites |
Overview of selected mass spectral databases commonly used for compound annotations. Specialized reviews that cover other mass spectral databases are referenced in the text.
| Database | Targets | Description |
|---|---|---|
| NIST | EI-MS, CID-MS/MS | Curated DB, graphical interface |
| WILEY | EI-MS, CID-MS/MS | Largest collection of EI-MS data |
| METLIN | CID-MS/MS | Developed for QTOF instruments |
| MoNA | EI, MS/MS, MSn | Autocurated collection of spectra |
| MassBank | EI, MS/MS, MSn | Longest standing community database |
| mzCloud | MSn | Multiple stage MSn |
| GNPS | MS/MS | Community database |
| ReSpect | MS/MS, RT | Plant metabolomics database |
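
Library search against such databases is often scored with a cosine (dot-product) similarity between the query and reference spectra. The sketch below is a minimal illustration of that general idea and does not reproduce the scoring of any specific database listed above; the function name, the greedy peak pairing, and the default m/z tolerance of 0.01 are assumptions made for this example.

```python
import math


def cosine_similarity(query, reference, tol=0.01):
    """Cosine score between two centroided spectra.

    Each spectrum is a list of (mz, intensity) tuples. Peaks are paired
    greedily: each query peak takes the closest unused reference peak
    within `tol` m/z units.
    """
    used = set()
    dot = 0.0
    for mz_q, int_q in query:
        best_j, best_diff = None, tol
        for j, (mz_r, _) in enumerate(reference):
            diff = abs(mz_q - mz_r)
            if j not in used and diff <= best_diff:
                best_j, best_diff = j, diff
        if best_j is not None:
            used.add(best_j)
            dot += int_q * reference[best_j][1]

    norm_q = math.sqrt(sum(i * i for _, i in query))
    norm_r = math.sqrt(sum(i * i for _, i in reference))
    if norm_q == 0 or norm_r == 0:
        return 0.0
    return dot / (norm_q * norm_r)
```

Identical spectra score 1 (up to floating-point rounding) and spectra without shared peaks score 0. Real library search tools typically add intensity weighting and precursor filtering, which are omitted here.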
Overview of methods for in silico generation of mass spectra, including commercially or freely available algorithms. Additional tools are referenced in text.
| In Silico Method | Software | Platform | Description |
|---|---|---|---|
| Quantum chemistry | QCEIMS | EI-MS | Uses chemistry first principles; requires cluster computations |
| Machine learning | CFM-ID/CSI:FingerID | EI-MS | Requires diverse training sets; fast method |
| Heuristic approaches | LipidBlast | CID-MS/MS | For specific compound classes (lipids); fast method |
| Reaction chemistry methods | MassFrontier | EI-MS, CID-MS/MS | Generates only bar code spectra; covers experimental gas phase reactions |
Figure 2. In silico fragmentation tools such as MS-FINDER, CFM-ID, CSI:FingerID and MetFrag use known compounds from structure databases to calculate fragments and compare those theoretical fragmentations against experimental spectra. When combined with MS/MS database search and additional metadata, annotation rates can be increased substantially.
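The general workflow in the caption (retrieve candidates from a structure database, predict fragments, compare against the experimental spectrum) can be sketched as follows. This is a toy illustration under stated assumptions, not the algorithm of any of the named tools: the candidate dictionaries with `name`, `mass` and `fragments` keys are hypothetical, the predicted fragment masses are supplied as input rather than computed, and the score is simply the fraction of experimental peaks explained.

```python
def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def select_candidates(precursor_mass, database, ppm_tol=5.0):
    """Keep database entries whose mass lies within ppm_tol of the precursor."""
    return [c for c in database
            if abs(ppm_error(precursor_mass, c["mass"])) <= ppm_tol]


def explained_fraction(experimental_mz, predicted_mz, tol=0.01):
    """Fraction of experimental peaks matched by any predicted fragment."""
    if not experimental_mz:
        return 0.0
    hits = sum(1 for mz in experimental_mz
               if any(abs(mz - p) <= tol for p in predicted_mz))
    return hits / len(experimental_mz)


def rank_candidates(precursor_mass, experimental_mz, database,
                    ppm_tol=5.0, tol=0.01):
    """Return (score, name) pairs sorted from best to worst score."""
    candidates = select_candidates(precursor_mass, database, ppm_tol)
    scored = [(explained_fraction(experimental_mz, c["fragments"], tol),
               c["name"]) for c in candidates]
    return sorted(scored, key=lambda s: s[0], reverse=True)
```

In practice the fragment prediction step is where the tools differ (rule-based, combinatorial, or machine-learned), and additional metadata such as retention time can be used to re-rank the resulting list.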
Selection of in silico fragmentation software, including commercially or freely available algorithms. Additional algorithms are referenced in the text.
| Tools | Fragmentation Method | Compound DB | Type of Interface |
|---|---|---|---|
| MS-FINDER | Rule-based (hydrogen rearrangement rules) | 15 integrated target DBs plus MINE and PubChem | Windows GUI |
| CFM-ID | Hybrid rule-based machine learning | KEGG, HMDB | Web application and command line tool |
| MetFrag | Hybrid rule-based combinatorial | HMDB, KEGG, PubChem | Web application, command line tool, |
| Mass Frontier | Rule-based (literature reaction mechanisms) | Internal MS database | Windows GUI |
| ChemDistiller | Fingerprint and spectral machine learning | 17 different target databases, 130 million compounds in total | Command line, web-based output |
| MAGMa, MAGMa+ | Rule-based | PubChem, KEGG, HMDB | Web application, command line tool |
| CSI:FingerID | Combination of fragmentation trees and machine learning | PubChem and multiple bio databases | Platform independent GUI, command line tool |
Figure 3. Ion mobility can be used as an additional orthogonal approach to resolve complex mixtures. The experimental collision cross-section (CCS) values can also be used to train machine learning models that enrich compound databases with CCS information.
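One way CCS values could serve as orthogonal evidence is to discard candidates whose predicted CCS deviates too far from the measured value. The sketch below assumes that predicted CCS values are already available for each candidate; the function names and the 3% default threshold are illustrative assumptions, not values taken from the source.

```python
def ccs_deviation_percent(experimental, predicted):
    """Absolute deviation of the measured CCS from the prediction, in percent."""
    return abs(experimental - predicted) / predicted * 100.0


def filter_by_ccs(experimental_ccs, candidates, max_dev_percent=3.0):
    """Keep candidate names whose predicted CCS is close to the measurement.

    `candidates` is a list of (name, predicted_ccs) tuples, with CCS values
    in the same unit as `experimental_ccs`.
    """
    return [name for name, predicted in candidates
            if ccs_deviation_percent(experimental_ccs, predicted) <= max_dev_percent]
```

A suitable threshold depends on the accuracy of both the instrument and the prediction model, so it should be chosen from validation data rather than fixed in advance.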
Overview of collaborative software and data sharing repositories, major metabolomics repositories and mass spectral sharing initiatives.
| Data Sharing | Link | Description |
|---|---|---|
| GitHub | github.com | Software development platform |
| BitBucket | bitbucket.org | Collaborative software sharing |
| SourceForge | sourceforge.net | Collaborative software sharing |
| Zenodo | zenodo.org | Open research data repository |
| Figshare | figshare.com | Online research data repository |
| Metabolomics Workbench | metabolomicsworkbench.org | Experimental metabolomics data |
| MetaboLights | ebi.ac.uk/metabolights | European metabolomics repository |
| OpenMSI | openmsi.nersc.gov | Mass spectral imaging data |
| MetaSpace | metaspace2020.eu | Mass spectral imaging data |
| GNPS | gnps.ucsd.edu | Mass spectral data sharing |
| MassBank | massbank.jp | Mass spectral data sharing |
| MoNA | massbank.us | Mass spectral sharing community |
| Norman MassBank | massbank.eu | Mass spectral data sharing |