ProbMetab: an R package for Bayesian probabilistic annotation of LC-MS-based metabolomics.

Ricardo R Silva, Fabien Jourdan, Diego M Salvanha, Fabien Letisse, Emilien L Jamin, Simone Guidetti-Gonzalez, Carlos A Labate, Ricardo Z N Vêncio.

Abstract

We present ProbMetab, an R package that substantially improves automatic probabilistic annotation of liquid chromatography–mass spectrometry (LC–MS)-based metabolomics data. The core of the inference engine is a Bayesian model implemented to (i) allow diverse sources of experimental data and metadata to be systematically incorporated into the model, with alternative ways to calculate the likelihood function, and (ii) allow sensitive selection of biologically meaningful biochemical reaction databases as a Dirichlet-categorical prior distribution. Additionally, to support interpretation of the results by systems biologists, we display the annotation as a network in which observed mass peaks are connected if their candidate metabolites are substrate/product of known biochemical reactions. This graph can be overlaid with other graph-based analyses, such as partial correlation networks, in a visualization scheme exported to Cytoscape, with web and stand-alone versions.

Year: 2014; PMID: 24443383; PMCID: PMC3998140; DOI: 10.1093/bioinformatics/btu019

Source DB: PubMed; Journal: Bioinformatics; ISSN: 1367-4803; Impact factor: 6.937

1 INTRODUCTION

Metabolomics is an emerging post-genomic field that aims at the comprehensive analysis of small organic molecules in biological systems. Liquid chromatography coupled to mass spectrometry (LC–MS) stands out as a dominant technique in metabolomic experiments. Although computational strategies have been used to filter and annotate mass peaks in LC–MS experiments (Dunn et al.), these methods do not incorporate external information into a mathematical model in a principled way. Recently, Rogers et al. put forward a proof of concept in which incorporating such information into a probabilistic model yields better annotation (Breitling et al.). Their Bayesian model, through an appropriate choice of prior distribution, introduces the elegant idea of using a set of known chemical reactions among candidate compounds to improve annotation, because certain combinations of compounds detected together make more biochemical sense than others. However, the state of the art in probabilistic annotation established by Rogers et al. did not include an integrated computational implementation, a practical connection to public biological databases such as KEGG or MetaCyc (Altman et al.), or a network-based output visualization scheme. Our contribution is to fill these specific gaps, giving the whole metabolomics bioinformatics community easy access to this powerful statistical model.
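To make the idea of a reaction-aware prior concrete, the following Python sketch runs a simple Gibbs sampler over peak-to-compound assignments. It is a simplified illustration loosely following the idea described above, not Rogers et al.'s exact model or ProbMetab's implementation: the Gaussian ppm noise model, the pseudo-counts `alpha` and `delta`, the toy masses and all function names are hypothetical. A candidate's prior weight grows with the number of other peaks currently assigned to one of its reaction partners, so a compound connected to the rest of the annotation is favoured when the mass evidence alone is ambiguous.

```python
import math
import random
from collections import Counter


def log_mass_likelihood(observed_mass, candidate_mass, sigma_ppm=3.0):
    """Log of an unnormalised Gaussian kernel on the ppm mass error (hypothetical noise model)."""
    ppm_error = (observed_mass - candidate_mass) / candidate_mass * 1e6
    return -0.5 * (ppm_error / sigma_ppm) ** 2


def gibbs_annotate(peaks, candidates, reactions, n_iter=2000, burn_in=500,
                   alpha=1.0, delta=1.0, seed=0):
    """Sample peak-to-compound assignments and return posterior frequencies.

    peaks:      peak id -> observed mass
    candidates: peak id -> {compound name: theoretical mass}
    reactions:  set of frozenset({a, b}) pairs linked by a known reaction
    """
    rng = random.Random(seed)
    # Start each peak at its best mass match.
    assign = {p: max(candidates[p], key=lambda c: log_mass_likelihood(peaks[p], candidates[p][c]))
              for p in peaks}
    counts = {p: Counter() for p in peaks}
    for it in range(n_iter):
        for p in peaks:
            others = [assign[q] for q in peaks if q != p]
            names = list(candidates[p])
            log_w = []
            for c in names:
                # Prior pseudo-count: alpha plus delta for every other peak
                # currently assigned to a reaction partner of c.
                n_links = sum(frozenset((c, o)) in reactions for o in others)
                log_w.append(math.log(alpha + delta * n_links)
                             + log_mass_likelihood(peaks[p], candidates[p][c]))
            top = max(log_w)  # subtract the max before exponentiating to avoid underflow
            weights = [math.exp(w - top) for w in log_w]
            assign[p] = rng.choices(names, weights=weights)[0]
            if it >= burn_in:
                counts[p][assign[p]] += 1
    n_kept = n_iter - burn_in
    return {p: {c: k / n_kept for c, k in counts[p].items()} for p in peaks}


# Toy example (hypothetical masses and compounds): peak "A" is equally close to
# X and Y, but only X is linked by a reaction to Z, the sole candidate of peak "B".
peaks = {"A": 100.0, "B": 150.0}
candidates = {"A": {"X": 100.0001, "Y": 99.9999}, "B": {"Z": 150.0}}
reactions = {frozenset(("X", "Z"))}
posterior = gibbs_annotate(peaks, candidates, reactions)
print(posterior)
```

With `delta=0` the prior is flat and peak A's posterior splits roughly evenly between X and Y; the reaction-based pseudo-count is what tips it towards X.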

2 RESULTS AND CONCLUSION

The platform chosen to implement these ideas was the well-established R programming environment, which offers a wide range of analyses, including successful tools that perform the preprocessing of spectral data required for metabolite annotation (Supplementary Fig. S1) (Kuhl et al.; Smith et al.). Following Rogers et al.'s brief suggestion on how their method could be extended to incorporate additional experimental information and metadata, we modified the likelihood term. Expanding the likelihood function $L$ into multiplicative independent terms allows one to account for additional orthogonal (independent) information sources: $L = L_{N} \cdot L_{rt} \cdot L_{iso}$, where the subscripts $N$, $rt$ and $iso$ stand for the measurement noise model, the retention time error model and the isotope profile error model, respectively. For a complete description of the model, we refer the interested reader to the Supplementary Material. The main product of a probabilistic annotation is a list of candidate compounds ranked by their probabilities (Supplementary Fig. S2). To make ProbMetab’s results easy to navigate, we display tabular and dynamic network outputs along with supporting information, which helps practitioners ultimately decide on the most parsimonious annotations instead of forcing them to rely simplistically on the top-probability assignment. Each mass peak is a node of the graph. An edge is drawn between two nodes if any candidate compound assigned to the outgoing node can be metabolized into any candidate compound assigned to the incoming node by a known biochemical reaction (Supplementary Fig. S3). ProbMetab can produce reaction graphs and either export them as standard Cytoscape input files or broadcast the necessary graph data and attributes (colours, shapes, etc.) directly to Cytoscape Desktop using RCytoscape (Shannon et al.). This information can easily be overlaid with other widely used systems biology strategies, such as correlation or partial correlation networks.
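A minimal Python sketch of how a factorised likelihood of the form $L = L_{N} \cdot L_{rt} \cdot L_{iso}$ turns into a ranked candidate list: each independent term contributes one summand in log space, the log prior is added, and the scores are normalised into posterior probabilities. The Gaussian forms, the widths `sigma_ppm`, `sigma_rt` and `sigma_iso`, the predicted retention times and M+1/M isotope ratios, and the `Candidate` fields are illustrative assumptions, not ProbMetab's actual error models (which are described in the Supplementary Material).

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    mass: float            # theoretical mass of the candidate ion
    rt_pred: float         # predicted retention time in minutes (hypothetical predictor)
    iso_ratio_pred: float  # predicted M+1/M intensity ratio
    prior: float = 1.0     # prior weight, e.g. from a reaction-aware prior


def log_gauss(x, mu, sigma):
    # Unnormalised Gaussian log-kernel; the constant depends only on sigma,
    # is shared by all candidates and cancels when scores are normalised.
    return -0.5 * ((x - mu) / sigma) ** 2


def rank_candidates(obs_mass, obs_rt, obs_iso_ratio, candidates,
                    sigma_ppm=3.0, sigma_rt=0.5, sigma_iso=0.02):
    scores = []
    for c in candidates:
        ppm_error = (obs_mass - c.mass) / c.mass * 1e6
        log_l = (log_gauss(ppm_error, 0.0, sigma_ppm)                   # L_N
                 + log_gauss(obs_rt, c.rt_pred, sigma_rt)               # L_rt
                 + log_gauss(obs_iso_ratio, c.iso_ratio_pred, sigma_iso))  # L_iso
        scores.append(log_l + math.log(c.prior))
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    ranking = [(c.name, w / total) for c, w in zip(candidates, weights)]
    return sorted(ranking, key=lambda item: item[1], reverse=True)


# Hypothetical isomers with identical mass that differ only in predicted
# retention time, plus a third candidate about 36 ppm away.
cands = [
    Candidate("cand_A", 180.0634, rt_pred=5.0, iso_ratio_pred=0.066),
    Candidate("cand_B", 180.0634, rt_pred=7.0, iso_ratio_pred=0.066),
    Candidate("cand_C", 180.0700, rt_pred=5.2, iso_ratio_pred=0.066),
]
ranking = rank_candidates(180.0635, 5.1, 0.065, cands)
for name, prob in ranking:
    print(f"{name}\t{prob:.4f}")
```

Because the terms are multiplied (summed in log space), a candidate must be consistent with every information source at once: here cand_A and cand_B tie on mass and isotope profile, and retention time alone separates them.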
If a mass spectrometry time series or biological replicates are available, ProbMetab uses third-party packages, integrated downstream, to export correlation or partial correlation graphs along with their intersection with, and difference from, the reaction graph. Alternatively, a biologist can visualize ProbMetab’s results in a simplified, searchable web interface: a function in the package consumes an online web service, which checks the broadcast results and renders them as a web page. The visualization was developed with the cytoscape.js library (Lopes et al.) and its dependencies and can easily be integrated or embedded into any HTML5 web application. ProbMetab’s documentation includes two detailed case studies that explore all of its features. Moreover, to highlight integration with upstream and downstream third-party R packages, these data analysis examples start from raw data and proceed through preprocessing up to the point where ProbMetab itself takes over. We used publicly available data from Trypanosoma brucei, the causative agent of sleeping sickness, and an original dataset from Saccharum officinarum (sugarcane), an important biofuel source, to illustrate several points of a typical metabolomics analysis. The T. brucei dataset, obtained from the mzMatch.R project website, was chosen because it contains a set of metabolites identified with the aid of internal standard compounds, which makes it especially suited for performance evaluation. With this validation dataset, we compare Rogers et al.'s MetSamp implementation (http://www.dcs.gla.ac.uk/inference/metsamp/) with ProbMetab’s implementation and show that the efficient integrated R/C++ function (Eddelbuettel and François, 2011) runs three times faster than the MATLAB implementation. For both implementations, the highest-probability candidate was the true identity for up to 60% of the metabolites.
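The reaction graph and its comparison with a correlation graph can be sketched in a few lines of Python. This is a hypothetical illustration, not ProbMetab's code: the candidate sets, the substrate/product pairs, the intensity profiles, the 0.8 absolute Pearson threshold and the `rx` interaction label are all assumptions. Edges follow the rule stated earlier (a directed edge u → v when some candidate of u is a substrate of a known reaction producing some candidate of v), and the export uses Cytoscape's Simple Interaction Format (SIF), with one tab-separated `source interaction target` line per edge.

```python
import math
from itertools import combinations


def reaction_graph(candidates, reactions):
    """Directed edge (u, v) if some candidate of u is a substrate of a
    known reaction whose product is some candidate of v."""
    edges = set()
    for u in candidates:
        for v in candidates:
            if u != v and any((a, b) in reactions
                              for a in candidates[u] for b in candidates[v]):
                edges.add((u, v))
    return edges


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def correlation_graph(intensities, threshold=0.8):
    """Undirected edge between peaks whose profiles have |r| >= threshold."""
    return {frozenset((u, v))
            for u, v in combinations(sorted(intensities), 2)
            if abs(pearson(intensities[u], intensities[v])) >= threshold}


def to_sif(edges, interaction="rx"):
    return "\n".join(f"{u}\t{interaction}\t{v}" for u, v in sorted(edges))


# Hypothetical peaks, candidates and substrate -> product pairs.
candidates = {"p1": {"citrate"}, "p2": {"isocitrate", "cmpd_X"},
              "p3": {"cmpd_Y"}, "p4": {"cmpd_Z"}}
reactions = {("citrate", "isocitrate"), ("cmpd_Y", "citrate")}
intensities = {"p1": [1, 2, 3, 4], "p2": [2, 4, 6, 8],
               "p3": [4, 1, 3, 2], "p4": [8, 2, 6, 4]}

rx_edges = reaction_graph(candidates, reactions)
corr_edges = correlation_graph(intensities)
rx_undirected = {frozenset(e) for e in rx_edges}
shared = rx_undirected & corr_edges        # supported by both graphs
rx_only = rx_undirected - corr_edges       # reaction link, no co-variation
corr_only = corr_edges - rx_undirected     # co-variation, no known reaction
sif = to_sif(rx_edges)
print(sif)
```

A partial correlation estimate could replace `pearson` without changing the set operations that produce the intersection and the two differences.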
However, instead of reporting only the highest-probability candidate identity, as proposed by Rogers et al., we show that when the complete ranking is exported in summarized visualizations, up to 90% of metabolite identities are among the three highest-probability candidates. The full or filtered ranking allows the experimenter to relate the candidates to the additional information present in this output and assign the correct identity. The sugarcane dataset was chosen to exemplify differential expression of annotated metabolites under contrasting environmental perturbations. We successfully recovered changes in a known stress response pathway (flavone and flavonol biosynthesis), showing the importance of network-centric visualization of metabolite annotation for tracking changes in metabolism. The benchmark dataset confirms, as anticipated by Rogers et al., that a probabilistic model using orthogonal data and metadata yields better automatic mass peak annotation. The perturbation dataset shows that probabilistic annotation enables interpretations of differential network connectivity that would otherwise be impossible. In summary, we implemented a method to annotate compounds within a computational framework that allows the introduction of prior knowledge and additional spectral information. With the R package ProbMetab, we provide ways to summarize the results of the series of analyses needed to extract information from complex, high-dimensional MS data, and we help the experimenter track metabolic changes in the process of interest.
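The gap between reporting only the top candidate and inspecting the top three can be measured as top-k accuracy over metabolites whose identity is known, for example from internal standards. A minimal Python sketch follows; the function name, rankings and true identities are hypothetical toy values, not the T. brucei results.

```python
def top_k_accuracy(rankings, truth, k):
    """Fraction of peaks with a known identity whose true compound is among
    the k highest-probability candidates.

    rankings: peak id -> list of candidate names, highest probability first
    truth:    peak id -> true compound name (only validated peaks)
    """
    hits = sum(1 for peak, true_name in truth.items()
               if true_name in rankings.get(peak, [])[:k])
    return hits / len(truth)


# Toy rankings for five validated peaks (hypothetical data).
rankings = {
    "p1": ["a", "b", "c"],
    "p2": ["d", "e", "f"],
    "p3": ["g", "h", "i", "j"],
    "p4": ["k"],
    "p5": ["l", "m", "n"],
}
truth = {"p1": "a", "p2": "e", "p3": "j", "p4": "k", "p5": "n"}

top1 = top_k_accuracy(rankings, truth, 1)  # 2 of 5 peaks -> 0.4
top3 = top_k_accuracy(rankings, truth, 3)  # 4 of 5 peaks -> 0.8
print(top1, top3)
```

Computing this value for increasing k shows how deep into the ranking a curator would need to look to recover most true identities.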

REFERENCES

1. XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification.

Authors: Colin A Smith; Elizabeth J Want; Grace O'Maille; Ruben Abagyan; Gary Siuzdak
Journal: Anal Chem; Date: 2006-02-01

2. Probabilistic assignment of formulas to mass peaks in metabolomics experiments.

Authors: Simon Rogers; Richard A Scheltema; Mark Girolami; Rainer Breitling
Journal: Bioinformatics; Date: 2008-12-18

3. Modeling challenges in the synthetic biology of secondary metabolism.

Authors: Rainer Breitling; Fiona Achcar; Eriko Takano
Journal: ACS Synth Biol; Date: 2013-05-20

4. CAMERA: an integrated strategy for compound spectra extraction and annotation of liquid chromatography/mass spectrometry data sets.

Authors: Carsten Kuhl; Ralf Tautenhahn; Christoph Böttcher; Tony R Larson; Steffen Neumann
Journal: Anal Chem; Date: 2011-12-12

5. Cytoscape Web: an interactive web-based network browser.

Authors: Christian T Lopes; Max Franz; Farzana Kazi; Sylva L Donaldson; Quaid Morris; Gary D Bader
Journal: Bioinformatics; Date: 2010-07-23

6. RCytoscape: tools for exploratory network analysis.

Authors: Paul T Shannon; Mark Grimes; Burak Kutlu; Jan J Bot; David J Galas
Journal: BMC Bioinformatics; Date: 2013-07-09

7. A systematic comparison of the MetaCyc and KEGG pathway databases.

Authors: Tomer Altman; Michael Travers; Anamika Kothari; Ron Caspi; Peter D Karp
Journal: BMC Bioinformatics; Date: 2013-03-27
