Zhitao Tian, Fangzhou Liu, Dongqin Li, Alisdair R Fernie, Wei Chen.
Abstract
LC-MS/MS is a major analytical platform for metabolomics, which has recently become a hotspot in the life and environmental sciences. However, structure elucidation of small molecules from LC-MS/MS data remains a major challenge in the chemical and biological interpretation of untargeted metabolomics datasets. In recent years, several strategies for structure elucidation using LC-MS/MS data from complex biological samples have been proposed. These strategies fall into two broad types: one based on structure annotation of mass spectra, the other on retention time prediction. They have helped many scientists conduct research in metabolite-related fields and are indispensable for the development of future tools. Here, we summarize the characteristics of the current tools and strategies for structure elucidation of small molecules based on LC-MS/MS data, and discuss directions and perspectives for improving their power.
Keywords: Complex biological samples; LC–MS/MS; Structure elucidation
Year: 2022 PMID: 36187931 PMCID: PMC9489805 DOI: 10.1016/j.csbj.2022.09.004
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 6.155
Fig. 1. Strategies for metabolite identification based on in-silico approaches. (A) From top to bottom, the workflows represent the first strategy, indicated by the red arrows (generation of in-silico spectral libraries); the second strategy, indicated by the orange arrows (substructure annotation for ESI-based spectra); and the third strategy, indicated by the blue arrows (network-based strategies for metabolite identification from spectra). (B) The workflow of the fourth strategy, indicated by the yellow arrows (metabolite identification from mass spectra with generative methods). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Publications relevant to RT prediction.
| Publication | Year | LC type | Model type | Size of training data | Molecular type | Variables |
|---|---|---|---|---|---|---|
| Hagiwara et al. | 2010 | RP-LC | SVR and MLR | 150 authentic compounds | | 9 MDs |
| Creek et al. | 2011 | HILIC | MLR | 120 authentic compounds | | 6 MDs |
| D'Archivio, Maggi and Ruggieri | 2014 | RP-LC | MLR and PLS regression | 47 authentic compounds | butyl esters of 47 acylcarnitines | 73 MDs |
| Kouskoura, Hadjipavlou-Litina and Markopoulou | 2014 | RP-LC | PLS regression | 100 authentic compounds | | 66 MDs |
| D'Archivio et al. | 2014 | RP-LC | DNNs | 24 authentic compounds | s-triazines | 5 MDs |
| Cao et al. | 2015 | HILIC | MLR and RF | 93 authentic compounds | | 346 MDs |
| Aicheler et al. | 2015 | RP-LC | SVR | 201 authentic compounds | lipid | 11 MDs |
| Munro et al. | 2015 | RP-LC | DNNs | 166 authentic compounds | drugs | 17 MDs |
| Falchi et al. | 2016 | RP-LC | Four combined (fingerprints + ordinary) KPLS models | 1,383 authentic compounds | | Molecular and fingerprint descriptors |
| Ovcacikova et al. | 2016 | RP-LC | Second-degree polynomial regression | 400 authentic compounds | lipid | Carbon number (CN) and double-bond (DB) number |
| Aalizadeh et al. | 2016 | RP-LC | MLR, DNNs, and SVM | 528 and 298 compounds for positive and negative electrospray ionization modes, respectively | | 6 MDs |
| Wolfer et al. | 2016 | RP-LC | Combination of RF and SVR models | 442 authentic compounds | | 97 MDs |
| Kubik and Wiczling | 2016 | RP-LC | Lasso, Stepwise and PLS regressions | 115 authentic compounds | drugs | 50 MDs |
| Barron and McEneff | 2016 | RP-LC | DNNs | 1,117 authentic compounds | | 16 MDs |
| Randazzo et al. | 2016 | RP-LC | PLS regression | 91 authentic compounds | steroids | 97 MDs |
| Taraji et al. | 2017 | HILIC | PLS regression | 16 authentic compounds | β-adrenergic agonists and related compounds | 321 MDs |
| Taraji et al. | 2017 | HILIC | PLS regression | 98 authentic compounds | pharmaceutical compounds | 321 MDs |
| Zhang et al. | 2017 | RP-LC | MLR | 24 authentic compounds | 16-membered ring macrolides | 8 MDs |
| Park et al. | 2017 | RP-LC | MLR | 41 authentic compounds | drugs | 10 MDs |
| Wen et al. | 2018 | RP-LC | PLS regression | 148 authentic compounds | | 126 MDs |
| Wen et al. | 2018 | RP-LC | PLS regression | 191 authentic compounds | | 128 MDs |
| McEachran et al. | 2018 | RP-LC | PLS regression | 97 authentic compounds | | 7 MDs |
| Hall et al. | 2018 | RP-LC | DNNs | 1,955 authentic compounds | | 47 MDs |
| Bouwmeester, Martens and Degroeve | 2019 | RP-LC (33) & HILIC (3) | Bayesian Ridge Regression (BRR), Least Absolute Shrinkage and Selection Operator (LASSO), DNNs, Adaptive Boosting (AB), Gradient Boosting (GB), RF and SVR | 6,759 authentic compounds | | 151 MDs |
| Bonini et al. | 2020 | HILIC & RP-LC | XGBoost, Bayesian-regularized Neural Network (BRNN), RF, Light Gradient-Boosting Machine (LightGBM), DNNs | 1,023 (HILIC) & 494 (RP-LC) authentic compounds | | 286 MDs |
| Ju et al. | 2021 | HILIC & RP-LC | DNNs + TL | 77,898 authentic compounds (DNNs) and 17 data sets (transfer learning) | | 1,470 MDs |
| Osipenko et al. | 2021 | HILIC & RP-LC | RNNs + TL | 1 million molecules (pre-training) and 269–457 authentic compounds (transfer learning) | | SMILES |
| Kensert et al. | 2021 | HILIC & RP-LC | Graph Convolutional Networks (GCNs) | 77,980 (SMRT), 852 (RIKEN) and 1,400 (Fiehn HILIC) authentic molecules | | Graph and 25 atom and bond features |
| Yang et al. | 2021 | HILIC | GNNs + TL | | | Graph, 16 kinds of atoms and 4 kinds of bonds |
| Yang et al. | 2021 | RP-LC | GNNs + TL | 80,038 authentic molecules (SMRT) for the graph neural network, and the MoNA and PredRet datasets for transfer learning | | Graph |
| Souihi et al. | 2022 | HILIC & RP-LC | RF regression | 78 authentic compounds | | 153 MDs |
| Liapikos et al. | 2022 | RP-LC | Bayesian Ridge Regression (BRidgeR), Extreme Gradient Boosting Regression (XGBR) and SVR | 26–350 authentic compounds | | 70–92 MDs |
| Fedorova et al. | 2022 | RP-LC | 1D CNN + TL | 77,983 authentic molecules (SMRT) for the 1D CNN, 5 data sets for transfer learning | | SMILES |
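
Many of the earlier entries in the table fit a multiple linear regression (MLR) from a handful of molecular descriptors (MDs) to retention time. The sketch below illustrates that general idea only; it is not the method of any cited publication. The two descriptors (a logP-like value and a polar-surface-area-like value), the training values, and the function names `fit_mlr` and `predict_rt` are all assumptions made up for illustration. The retention times are synthetic, generated from a known linear rule so the fit can be checked.

```python
# Minimal MLR sketch for retention time (RT) prediction from molecular
# descriptors. Pure standard-library Python; all data below are synthetic.

def fit_mlr(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    X: list of descriptor tuples, y: list of retention times.
    Returns [intercept, coef_1, ..., coef_n].
    """
    rows = [[1.0] + list(r) for r in X]  # prepend a constant term for the intercept
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        if abs(A[piv][c]) < 1e-12:
            raise ValueError("descriptors are collinear; cannot fit")
        A[c], A[piv] = A[piv], A[c]
        pivot_value = A[c][c]
        A[c] = [v / pivot_value for v in A[c]]
        for k in range(p):
            if k != c:
                factor = A[k][c]
                A[k] = [a - factor * b for a, b in zip(A[k], A[c])]
    return [A[i][p] for i in range(p)]


def predict_rt(coef, descriptors):
    """Predicted RT = intercept + sum(coef_i * descriptor_i)."""
    return coef[0] + sum(c * d for c, d in zip(coef[1:], descriptors))


# Hypothetical training set: (logP-like, polar-surface-area-like) descriptors.
X = [(0.5, 90.0), (1.2, 60.0), (2.0, 40.0), (3.1, 20.0), (4.0, 75.0)]
# Synthetic RTs in minutes, generated as 2.0 + 1.5 * x1 - 0.01 * x2.
y = [1.85, 3.2, 4.6, 6.45, 7.25]

coef = fit_mlr(X, y)
rt_new = predict_rt(coef, (2.5, 50.0))  # RT estimate for an unseen compound
```

In practice the descriptors would be computed by a cheminformatics toolkit and the model validated on held-out compounds. With training sets as small as those in the earlier table rows, the number of descriptors has to stay well below the number of compounds for such a fit to be meaningful.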
Fusion tools for metabolite identification based on LC–MS/MS.
| Name | Function | Availability |
|---|---|---|
| ChemDistiller | FingerScorer + FragScorer | |
| SIRIUS | “Sirius”, CSI:FingerID (with COSMIC), ZODIAC and CANOPUS | |
| msms_rt_score_integration | Mass spectrum and retention time prediction | |
| MetFrag | MetFrag (algorithm) + reference library search + retention time prediction | |
| MetDNA | Structure elucidation from knowns to unknowns | |
| MS-DIAL | MS-FINDER + LipidBlast + reference library search | |
| GNPS | Mass spectrometry ecosystem for sharing MS data and identifying metabolites | |
| NAP | Spectral networks to propagate information from spectral library matching | |
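
Several of the tools above combine evidence from the mass spectrum with retention time prediction. The sketch below shows one simple way such evidence could be fused when ranking candidate structures. It is an illustrative assumption, not the scoring scheme of any listed tool. The binned cosine similarity, the Gaussian retention time term, the parameters `weight` and `rt_sigma`, and the candidate names and peak lists are all hypothetical.

```python
import math

# Illustrative fusion of spectral similarity and retention time (RT) agreement.
# All spectra, RT values, and candidate names are made up for this sketch.

def binned(spectrum, bin_width=0.01):
    """Collapse a peak list [(mz, intensity), ...] into {bin_index: intensity}."""
    vec = {}
    for mz, intensity in spectrum:
        key = round(mz / bin_width)
        vec[key] = vec.get(key, 0.0) + intensity
    return vec


def cosine_similarity(spec_a, spec_b):
    """Cosine score between two binned spectra (0 = no shared peaks, 1 = identical)."""
    va, vb = binned(spec_a), binned(spec_b)
    dot = sum(v * vb.get(k, 0.0) for k, v in va.items())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def combined_score(spectral, rt_observed, rt_predicted, rt_sigma=0.5, weight=0.7):
    """Weighted sum of the spectral score and a Gaussian RT-agreement term."""
    rt_term = math.exp(-0.5 * ((rt_observed - rt_predicted) / rt_sigma) ** 2)
    return weight * spectral + (1.0 - weight) * rt_term


query_spectrum = [(85.03, 100.0), (127.04, 40.0), (145.05, 20.0)]
query_rt = 5.0  # observed RT in minutes

# Each candidate: (name, predicted spectrum, predicted RT).
# candidate_A and candidate_B mimic isomers with the same predicted spectrum.
candidates = [
    ("candidate_B", [(85.03, 100.0), (127.04, 40.0), (145.05, 20.0)], 7.4),
    ("candidate_C", [(91.05, 100.0), (119.05, 30.0)], 5.0),
    ("candidate_A", [(85.03, 100.0), (127.04, 40.0), (145.05, 20.0)], 5.1),
]

scored = [
    (name, combined_score(cosine_similarity(query_spectrum, spec), query_rt, rt))
    for name, spec, rt in candidates
]
ranked = [name for name, _ in sorted(scored, key=lambda item: item[1], reverse=True)]
```

Here the two isomer-like candidates tie on spectral similarity, so the retention time term decides their order. The weighting between the two terms is a free design choice that would need tuning on data with known identities.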