Motivation: Untargeted metabolomics comprehensively characterizes small molecules and elucidates the activities of biochemical pathways within a biological sample. Despite computational advances, interpreting the collected measurements and determining their biological role remains a challenge.
Results: To interpret measurements, we present an inference-based approach, termed Probabilistic modeling for Untargeted Metabolomics Analysis (PUMA). Our approach captures the metabolomics measurements and the biological network of the sample under study in a generative model and uses stochastic sampling to compute posterior probability distributions. PUMA predicts the likelihood of pathways being active and then derives probabilistic annotations, which assign chemical identities to measurements. Unlike prior pathway analysis tools that analyze differentially active pathways, PUMA defines a pathway as active if the likelihood that the pathway generated the observed measurements is above a user-defined threshold. Because there are no "ground truth" metabolomics datasets in which all measurements are annotated and pathway activities are known, PUMA is validated on synthetic datasets designed to mimic cellular processes. On average, PUMA outperforms pathway enrichment analysis by 8%. PUMA is also applied to two case studies, where it suggests many biologically meaningful pathways as active. Its annotation results agree with those obtained using other tools that utilize additional information in the form of spectral signatures. Importantly, PUMA annotates many measurements, suggesting 23 chemical identities for metabolites that were previously identified only as isomers, and a significant number of additional putative annotations over spectral database lookups. For an experimentally validated 50-compound dataset, annotations using PUMA yielded 0.833 precision and 0.676 recall.
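The activity criterion described in the abstract (a pathway is called active when its likelihood exceeds a user-defined threshold) can be illustrated with a minimal sketch. This is not PUMA's implementation: the function name `active_pathways`, the representation of each posterior sample as a dictionary of 0/1 activity indicators, and the toy pathway names and values are all assumptions made for illustration. The sketch estimates each pathway's posterior activity probability as the fraction of samples in which it is marked active, then keeps those above the threshold.

```python
from typing import Dict, List


def active_pathways(samples: List[Dict[str, int]],
                    threshold: float = 0.5) -> Dict[str, float]:
    """Return pathways whose estimated posterior activity exceeds `threshold`.

    Each sample maps a pathway name to 1 (active) or 0 (inactive), as a
    hypothetical stochastic sampler might emit. The posterior probability
    is estimated as the mean of the indicator across samples.
    """
    if not samples:
        raise ValueError("need at least one posterior sample")
    names = samples[0].keys()
    posterior = {
        name: sum(s[name] for s in samples) / len(samples) for name in names
    }
    # Strictly "above" the threshold, following the wording of the abstract.
    return {name: p for name, p in posterior.items() if p > threshold}


# Toy samples: invented values, not data from the paper.
toy_samples = [
    {"glycolysis": 1, "tca_cycle": 1, "urea_cycle": 0},
    {"glycolysis": 1, "tca_cycle": 0, "urea_cycle": 0},
    {"glycolysis": 1, "tca_cycle": 1, "urea_cycle": 0},
    {"glycolysis": 1, "tca_cycle": 0, "urea_cycle": 1},
]

result = active_pathways(toy_samples, threshold=0.6)
print(result)
```

Lowering the threshold admits pathways with weaker support, so in a sketch like this the choice of threshold directly trades the number of pathways reported against confidence in each call.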