
Probabilistic metabolite annotation using retention time prediction and meta-learned projections.

Constantino A García, Alberto Gil-de-la-Fuente, Coral Barbas, Abraham Otero.

Abstract

Retention time information is used for metabolite annotation in metabolomic experiments. However, its usefulness is hindered by the limited availability of experimental retention time data in metabolomic databases and by the lack of reproducibility between different chromatographic methods. Accurate prediction of retention time for a given chromatographic method would be a valuable support for metabolite annotation. We have trained state-of-the-art machine learning regressors using the 80,038 experimental retention times from the METLIN Small Molecule Retention Time (SMRT) dataset. The models included deep neural networks, deep kernel learning, several gradient boosting models, and a blending approach. A total of 5,666 molecular descriptors and 2,214 fingerprints (MACCS166, Extended Connectivity, and Path fingerprints) were generated with the alvaDesc software. The models were trained using only the descriptors, only the fingerprints, and both types of features simultaneously. Bayesian hyperparameter search was used for parameter tuning. To avoid data leakage when reporting the performance metrics, nested cross-validation was employed. The best results were obtained by a heavily regularized deep neural network trained with cosine annealing warm restarts and stochastic weight averaging, achieving mean and median absolute errors of [Formula: see text] and [Formula: see text], respectively. To the best of our knowledge, these are the most accurate predictions published to date over the SMRT dataset. To project retention times between chromatographic methods, we propose a novel Bayesian meta-learning approach that can learn from just a few molecules. By applying this projection between the deep neural network retention time predictions and a given chromatographic method, our approach can be integrated into a metabolite annotation workflow to obtain z-scores for the candidate annotations.
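The best model combined cosine annealing with warm restarts and stochastic weight averaging (SWA). The framework-free sketch below illustrates both ideas under stated assumptions: the schedule constants (`eta_max`, `eta_min`, `t0`, `t_mult`), the helper names, and the toy weight vectors are hypothetical, not the values or code used in the paper or in the cmmrt repository. Averaging snapshots taken at the end of each cosine cycle, where the learning rate is lowest, is one common way to pair the two techniques; the authors' exact combination may differ. In PyTorch, comparable building blocks exist as `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts` and `torch.optim.swa_utils.AveragedModel`.

```python
import math


def sgdr_lr(epoch, eta_max=1e-3, eta_min=1e-5, t0=10, t_mult=2):
    """Learning rate at `epoch` under cosine annealing with warm restarts (SGDR).

    The first cycle lasts `t0` epochs and each following cycle is `t_mult` times longer.
    All default values are illustrative assumptions.
    """
    t_i, t_cur = t0, epoch
    # Skip over completed cycles to find the position inside the current one.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))


def swa_update(avg_weights, new_weights, n_averaged):
    """Equal-weight running average of weight snapshots (stochastic weight averaging)."""
    if avg_weights is None:
        return list(new_weights)
    return [a + (w - a) / (n_averaged + 1) for a, w in zip(avg_weights, new_weights)]


# Learning rates for the first two cycles (10 + 20 epochs); the rate jumps back to eta_max at epoch 10.
lrs = [sgdr_lr(e) for e in range(30)]

# Toy SWA: average hypothetical weight vectors taken at the end of each cycle.
avg, n = None, 0
for snapshot in ([1.0, 2.0], [3.0, 4.0], [5.0, 0.0]):
    avg = swa_update(avg, snapshot, n)
    n += 1
print(avg)  # [3.0, 2.0]
```

The restart resets the rate to `eta_max`, letting the optimizer escape the current basin, while SWA averages the low-learning-rate solutions reached at the end of each cycle.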
For this projection to be applied, it suffices that as few as 10 molecules of a given experiment have been identified (presumably by using pure metabolite standards). Using z-scores makes it possible to consider not only the accuracy but also the uncertainty of the projection when ranking candidates. In this scenario, our results show that in 68% of the cases the correct molecule was among the top three candidates filtered by mass and ranked according to z-scores. This shows the usefulness of retention time information to support metabolite annotation. Python code is available on GitHub at https://github.com/constantino-garcia/cmmrt.
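The paper's Bayesian meta-learned projection is not reproduced here. As a much simpler stand-in, the sketch below fits a conjugate Bayesian linear regression (known noise, zero-mean Gaussian prior) from about 10 identified standards, mapping deep-network-predicted retention times to the target chromatographic method. It then filters candidates by mass and ranks them by the absolute z-score of the observed retention time. All function names, the noise and prior scales, the 10 ppm tolerance, and every data value are hypothetical.

```python
import math


def fit_bayes_linear(x, y, noise_sd=10.0, prior_sd=1000.0):
    """Posterior of y = a + b*x with known noise; prior a, b ~ N(0, prior_sd**2)."""
    s2, t2 = noise_sd ** 2, prior_sd ** 2
    n, sx, sy = len(x), sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(u * v for u, v in zip(x, y))
    # Posterior precision A = X^T X / s2 + I / t2 (symmetric 2x2).
    a11, a12, a22 = n / s2 + 1 / t2, sx / s2, sxx / s2 + 1 / t2
    det = a11 * a22 - a12 * a12
    S = [[a22 / det, -a12 / det], [-a12 / det, a11 / det]]  # covariance S = A^-1
    r = [sy / s2, sxy / s2]  # X^T y / s2
    m = [S[0][0] * r[0] + S[0][1] * r[1], S[1][0] * r[0] + S[1][1] * r[1]]
    return m, S


def predict(m, S, x_new, noise_sd=10.0):
    """Posterior predictive mean and standard deviation of the projected RT."""
    mean = m[0] + m[1] * x_new
    phi = (1.0, x_new)
    var = sum(phi[i] * S[i][j] * phi[j] for i in range(2) for j in range(2))
    return mean, math.sqrt(noise_sd ** 2 + var)


def rank_candidates(obs_mass, obs_rt, candidates, m, S, ppm_tol=10.0, noise_sd=10.0):
    """Keep candidates within ppm_tol of the observed mass; rank by |z| of the RT."""
    scored = []
    for name, mass, pred_rt in candidates:
        if abs(mass - obs_mass) / obs_mass * 1e6 > ppm_tol:
            continue  # fails the mass filter
        mu, sd = predict(m, S, pred_rt, noise_sd)
        scored.append((abs((obs_rt - mu) / sd), name))
    return [name for _, name in sorted(scored)]


# Hypothetical standards: DNN-predicted RT vs. RT observed in the new method (seconds).
pred_std = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
obs_std = [81, 129, 181, 229, 281, 329, 381, 429, 481, 529]
m, S = fit_bayes_linear(pred_std, obs_std)

candidates = [  # (name, monoisotopic mass, DNN-predicted RT), all hypothetical
    ("cand_A", 180.0634, 450.0),
    ("cand_B", 180.0634, 600.0),
    ("cand_C", 180.0640, 420.0),
    ("cand_D", 181.0700, 455.0),
]
ranking = rank_candidates(180.0634, 255.0, candidates, m, S)
print(ranking[:3])  # ['cand_A', 'cand_C', 'cand_B']
```

Because each z-score divides by the predictive standard deviation, a candidate whose projection is uncertain is penalized less for the same retention time error, which is the property the abstract attributes to ranking by z-scores.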
© 2022. The Author(s).


Keywords:  Bayesian methods; Deep learning; Machine learning; Metabolomics; Retention time

Year:  2022        PMID: 35672784      PMCID: PMC9172150          DOI: 10.1186/s13321-022-00613-8

Source DB:  PubMed          Journal:  J Cheminform        ISSN: 1758-2946            Impact factor:   8.489


