peakPantheR, an R package for large-scale targeted extraction and integration of annotated metabolic features in LC-MS profiling datasets.

Arnaud M Wolfer^1,2, Gonçalo D S Correia^1,3, Caroline J Sands^1,3, Stephane Camuzeaux^1,3, Ada H Y Yuen^1,3, Elena Chekmeneva^1,3, Zoltán Takáts^1,3, Jake T M Pearce^1, Matthew R Lewis^1,3.

Abstract

Untargeted LC-MS profiling assays are capable of measuring thousands of chemical compounds in a single sample, but unreliable feature extraction and metabolite identification remain considerable barriers to their interpretation and usefulness. peakPantheR (Peak Picking and ANnoTation of High-resolution Experiments in R) is an R package for the targeted extraction and integration of annotated features from LC-MS profiling experiments. It takes advantage of chromatographic and spectral databases and prior information of sample matrix composition to generate annotated and interpretable metabolic phenotypic datasets and power workflows for real time data quality assessment. AVAILABILITY: peakPantheR is available via Bioconductor (https://bioconductor.org/packages/peakPantheR/). Documentation and worked examples are available at https://phenomecentre.github.io/peakPantheR.github.io/ and https://github.com/phenomecentre/metabotyping-dementia-urine. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Year: 2021 PMID: 34125879 PMCID: PMC8665750 DOI: 10.1093/bioinformatics/btab433

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

Liquid chromatography–mass spectrometry (LC–MS) is a key analytical platform in modern metabolic phenotyping workflows, owing to its sensitivity and broad chemical coverage. A state-of-the-art LC–MS metabolic profiling assay is capable of detecting >10 000 ion species in a single sample (Ivanisevic et al., 2013; Lewis et al., 2016; Naser et al., 2017). This information is commonly extracted with untargeted peak-picking algorithms. These algorithms attempt to extract as many peaks as possible from each sample, and account for sample-to-sample analytical variation by establishing correspondences between similar signals across samples, combining peaks into groups known as features. The end product is a data matrix of samples and features, which can be filtered to reduce variable inflation due to false positives at the peak detection stage and to remove poor-quality measurements. Chemical assignment is then performed by matching each feature's retention time and m/z values to spectral databases. An alternative and more direct approach is to treat LC–MS data pre-processing as a targeted feature extraction problem, prioritizing ion species peaks known to be well captured by the analytical methodology. Advances in the characterization of metabolomes (Wishart et al., 2018) and LC–MS assays (Tada et al., 2019), including the improved quality of spectral and chromatographic databases, make such an approach more tractable, even for application to complex sample matrices. However, existing software for targeted feature extraction is tailored to the integration of a limited number of features in targeted LC–MS triple quadrupole experiments, with interfaces and visualization designed for examination of each individual signal in each sample at the cost of extensive manual intervention. While appropriate for supporting targeted bioanalysis workflows, such tools are impractical to apply to high-resolution global profiling data.
For this reason, an unmet need exists for automated, scalable and high-throughput targeted annotation and integration software that is suited to the extraction of hundreds of features in large LC–MS profiling experiments. To address this, we have developed peakPantheR (Peak Picking and ANnoTation of High-resolution Experiments in R), an open-source R package for targeted extraction and integration of annotated chemical compounds from untargeted LC–MS profiling datasets. peakPantheR leverages prior knowledge of LC–MS performance characteristics and provides users with both an automated data extraction solution and direct interface for manual refinement where necessary (see Figure 1).

Fig. 1. Overview of the peakPantheR package functionality and example outputs

2 The peakPantheR package

2.1 Implementation

peakPantheR is an open-source R (v4.0.0 or above) package and is available via Bioconductor (https://bioconductor.org/packages/peakPantheR/). The main functionality is command-line based, but a shiny graphical user interface (GUI) is provided to assist users in visualizing and iteratively refining the integration region boundaries and parameters. Emphasis was placed on providing visualization options and diagnostic metrics adequate for inspection of results at the dataset level, to facilitate robust high-throughput analysis. Tutorial vignettes exemplifying the main functions are available via Bioconductor. An example application to a cohort of 600 human urine biofluid samples profiled by three complementary LC–MS assays can be found at https://github.com/phenomecentre/metabotyping-dementia-urine. In this example, 315 annotated ion species are extracted using peakPantheR from the three LC–MS assays described by Lewis et al. (2016). A detailed instruction manual is also available in Supplementary File S1.

2.2 Features

peakPantheR workflows are structured around the peakPantheRAnnotation object, which represents the outcome of a targeted search and integration of signals in a series of pre-specified regions of interest (ROI). The required inputs for peakPantheR are the raw MS data files, in mzML or any other format supported by mzR (Chambers et al., 2012), and a comma-separated file defining the retention time and m/z boundaries of each ROI to integrate. Although designed for centroided data, profile/continuum data are supported. Functionality to run peakPantheR in parallel across multiple MS files simultaneously is provided via batch commands.
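
To make the idea of an ROI concrete, the following is a minimal Python sketch of how retention time and m/z boundaries read from a comma-separated file could be used to cut an extracted ion chromatogram out of centroided scans. It is an illustration only, not peakPantheR's R implementation: the column names (cpdID, rtMin, rtMax, mzMin, mzMax), the ROI class, the functions read_rois and extract_eic, and the in-memory scan layout are all assumptions made for this example.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class ROI:
    """One region of interest (hypothetical structure for illustration)."""
    cpd_id: str
    rt_min: float
    rt_max: float
    mz_min: float
    mz_max: float


def read_rois(csv_text):
    """Parse ROI boundaries from CSV text (column names are assumed)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        ROI(row["cpdID"], float(row["rtMin"]), float(row["rtMax"]),
            float(row["mzMin"]), float(row["mzMax"]))
        for row in reader
    ]


def extract_eic(scans, roi):
    """Return [(rt, summed intensity)] for scans falling inside the ROI.

    `scans` is a list of (rt, [(mz, intensity), ...]) tuples holding
    centroided data; a real workflow would read these from an mzML file.
    """
    eic = []
    for rt, peaks in scans:
        if roi.rt_min <= rt <= roi.rt_max:
            total = sum(i for mz, i in peaks
                        if roi.mz_min <= mz <= roi.mz_max)
            eic.append((rt, total))
    return eic


# Toy input: one ROI and four scans, two of which fall inside the RT window.
demo_rois = read_rois(
    "cpdID,rtMin,rtMax,mzMin,mzMax\n"
    "ID-1,10.0,20.0,195.08,195.10\n"
)
demo_scans = [
    (5.0, [(195.09, 100.0)]),
    (12.0, [(195.09, 50.0), (300.0, 9.0)]),
    (15.0, [(195.085, 80.0), (195.095, 20.0)]),
    (25.0, [(195.09, 10.0)]),
]
demo_eic = extract_eic(demo_scans, demo_rois[0])
```

Because each ROI is processed independently of the others, and each file independently of the other files, a loop of this kind lends itself to the per-file parallelism described above.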

2.3 Chromatographic peak models and quality metrics

The peakPantheR integration model works by fitting a chromatographic line-shape model to the extracted ion chromatogram (EIC) from each ROI. Two line shapes are supported: a skewed Gaussian and an exponentially modified Gaussian model. These are specifically tailored for chromatographic signals and can reproduce asymmetry and tailing/fronting. If a peak model can be fitted acceptably to the EIC, the line shape is used to obtain the peak integral and other characteristics (i.e. peak width and peak asymmetry); otherwise, a fallback integration of the EIC data points is performed to handle extreme deviations in peak shape. peakPantheR is intended to be applied iteratively to a series of features/samples; to improve the reliability of the integration across the entirety of a dataset, the software automatically suggests refinements of the ROI based on a dataset-wide consensus estimated from a previous run's results. Detailed information about the line-shape models, algorithms and metrics estimated can be found in the Supplementary materials.
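
The model-or-fallback logic described above can be sketched in a few lines of Python. This is a hedged illustration rather than the package's algorithm: the skewed Gaussian parametrisation (a skew-normal density scaled by an amplitude), the acceptance rule based on a relative residual, the 0.2 threshold, and the function names are all assumptions of this example, and the model parameters are taken as given instead of being obtained from a least-squares fit.

```python
import math


def skewed_gaussian(t, amplitude, center, sigma, gamma):
    """Skew-normal line shape scaled so that its total area is `amplitude`.

    gamma > 0 produces tailing, gamma < 0 fronting (assumed parametrisation).
    """
    z = (t - center) / sigma
    gauss = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    skew = 1.0 + math.erf(gamma * z / math.sqrt(2.0))
    return amplitude * gauss * skew


def trapezoid_area(points):
    """Trapezoidal integration of (t, intensity) points."""
    return sum(0.5 * (y0 + y1) * (t1 - t0)
               for (t0, y0), (t1, y1) in zip(points, points[1:]))


def integrate_model(params, t_min, t_max, n=2001):
    """Numerically integrate the fitted line shape over [t_min, t_max]."""
    step = (t_max - t_min) / (n - 1)
    curve = [(t_min + k * step, skewed_gaussian(t_min + k * step, *params))
             for k in range(n)]
    return trapezoid_area(curve)


def integrate_peak(eic, params, max_rel_residual=0.2):
    """Return (area, method): model area if the fit is acceptable, else fallback.

    `params` is (amplitude, center, sigma, gamma) from a prior fit, or None
    if no fit converged. The acceptance rule is a hypothetical stand-in.
    """
    observed = sum(y for _, y in eic)
    if params is not None and observed > 0:
        residual = sum(abs(y - skewed_gaussian(t, *params)) for t, y in eic)
        if residual / observed <= max_rel_residual:
            return integrate_model(params, eic[0][0], eic[-1][0]), "model"
    # Fallback: integrate the raw EIC data points directly.
    return trapezoid_area(eic), "fallback"
```

With this parametrisation the area under the full curve equals the amplitude regardless of the skew, which makes the model integral easy to sanity-check; a flat-topped or otherwise distorted EIC fails the residual test and is integrated point by point instead.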

2.4 Retention time adjustment

Retention time values are empirically derived, and therefore systematic deviations from database values are expected. Functionality for retention time re-calibration based on expected retention times for calibrants (either spiked internal standards or endogenous compounds) is implemented, including a robust RANSAC (Fischler and Bolles, 1981) method for correction based on endogenous compounds.
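
As an illustration of why a robust estimator helps here, the sketch below applies a basic RANSAC loop to pairs of (expected, observed) retention times and then refits a line to the consensus set. It is a simplified example under stated assumptions, not the package's implementation: the linear drift model, the 2-second inlier tolerance, the iteration count, the fixed random seed, and the function names are all choices made for this example.

```python
import random


def fit_line(points):
    """Ordinary least-squares line y = a * x + b through (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def ransac_line(points, tol=2.0, n_iter=200, seed=0):
    """Fit observed_rt = a * expected_rt + b while ignoring outliers.

    Repeatedly draws two calibrants, builds the line through them, and keeps
    the largest set of points lying within `tol` of that line. The final
    coefficients come from a least-squares refit on that consensus set.
    """
    rng = random.Random(seed)
    best = []
    for _ in range(n_iter):
        (x0, y0), (x1, y1) = rng.sample(points, 2)
        if x0 == x1:
            continue  # a vertical line cannot be expressed as y = a * x + b
        a = (y1 - y0) / (x1 - x0)
        b = y0 - a * x0
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) <= tol]
        if len(inliers) > len(best):
            best = inliers
    if len(best) < 2:
        raise ValueError("no consensus set found")
    return fit_line(best), best


# Toy calibrants: a 2% drift plus a 3 s offset, with two misassigned peaks.
expected = [30.0, 60.0, 120.0, 180.0, 240.0, 300.0, 360.0]
calibrants = [(x, 1.02 * x + 3.0) for x in expected]
calibrants[2] = (120.0, 200.0)  # outlier, e.g. a wrong peak was picked
calibrants[5] = (300.0, 250.0)  # outlier
(slope, offset), consensus = ransac_line(calibrants)


def corrected_rt(expected_rt):
    """Predict where a compound should elute in this run."""
    return slope * expected_rt + offset
```

An ordinary least-squares fit over all seven calibrants would be pulled towards the two misassigned peaks, whereas the consensus fit recovers the underlying drift. This matters most for endogenous calibrants, whose peaks are more likely to be misidentified than those of spiked standards.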

2.5 Shiny GUI

A shiny GUI is available to review peakPantheR’s results (Figure 2). The EIC and the corresponding line-shape fits are displayed in interactive plots, with action buttons and forms so the user can review and adjust the ROI boundaries more easily and re-trigger the integration procedure.

Fig. 2. peakPantheR’s shiny graphical user interface

3 Concluding remarks

peakPantheR is a general-purpose, automated and scalable targeted feature extraction tool capable of producing high-fidelity datasets from global profiling LC–MS data. We anticipate that it will be a valuable addition to the existing LC–MS data pre-processing toolkit, as a key component of targeted integration workflows that take advantage of established chromatographic databases to obtain annotated, interpretable and, ultimately, actionable metabolic phenotypic datasets.

Funding

This work was supported by the Medical Research Council (MRC) and National Institute for Health Research (NIHR) [grant number MC_PC_12025] and the MRC UK Consortium for MetAbolic Phenotyping (MAP/UK) [grant number MR/S010483/1]. Infrastructure support was provided by the NIHR Imperial Biomedical Research Centre (BRC).

Conflict of Interest: none declared.

References

1. Two complementary reversed-phase separations for comprehensive coverage of the semipolar and nonpolar metabolome.

Authors: Fuad J Naser; Nathaniel G Mahieu; Lingjue Wang; Jonathan L Spalding; Stephen L Johnson; Gary J Patti
Journal: Anal Bioanal Chem Date: 2017-12-18 Impact factor: 4.142

2. Toward 'omic scale metabolite profiling: a dual separation-mass spectrometry approach for coverage of lipid and central carbon metabolism.

Authors: Julijana Ivanisevic; Zheng-Jiang Zhu; Lars Plate; Ralf Tautenhahn; Stephen Chen; Peter J O'Brien; Caroline H Johnson; Michael A Marletta; Gary J Patti; Gary Siuzdak
Journal: Anal Chem Date: 2013-07-03 Impact factor: 6.986

3. Development and Application of Ultra-Performance Liquid Chromatography-TOF MS for Precision Large Scale Urinary Metabolic Phenotyping.

Authors: Matthew R Lewis; Jake T M Pearce; Konstantina Spagou; Martin Green; Anthony C Dona; Ada H Y Yuen; Mark David; David J Berry; Katie Chappell; Verena Horneffer-van der Sluis; Rachel Shaw; Simon Lovestone; Paul Elliott; John Shockcor; John C Lindon; Olivier Cloarec; Zoltan Takats; Elaine Holmes; Jeremy K Nicholson
Journal: Anal Chem Date: 2016-08-26 Impact factor: 6.986

4. A cross-platform toolkit for mass spectrometry and proteomics.

Authors: Matthew C Chambers; Brendan Maclean; Robert Burke; Dario Amodei; Daniel L Ruderman; Steffen Neumann; Laurent Gatto; Bernd Fischer; Brian Pratt; Jarrett Egertson; Katherine Hoff; Darren Kessner; Natalie Tasman; Nicholas Shulman; Barbara Frewen; Tahmina A Baker; Mi-Youn Brusniak; Christopher Paulse; David Creasy; Lisa Flashner; Kian Kani; Chris Moulding; Sean L Seymour; Lydia M Nuwaysir; Brent Lefebvre; Frank Kuhlmann; Joe Roark; Paape Rainer; Suckau Detlev; Tina Hemenway; Andreas Huhmer; James Langridge; Brian Connolly; Trey Chadick; Krisztina Holly; Josh Eckels; Eric W Deutsch; Robert L Moritz; Jonathan E Katz; David B Agus; Michael MacCoss; David L Tabb; Parag Mallick
Journal: Nat Biotechnol Date: 2012-10 Impact factor: 54.908

5. HMDB 4.0: the human metabolome database for 2018.

Authors: David S Wishart; Yannick Djoumbou Feunang; Ana Marcu; An Chi Guo; Kevin Liang; Rosa Vázquez-Fresno; Tanvir Sajed; Daniel Johnson; Carin Li; Naama Karu; Zinat Sayeeda; Elvis Lo; Nazanin Assempour; Mark Berjanskii; Sandeep Singhal; David Arndt; Yonjie Liang; Hasan Badran; Jason Grant; Arnau Serra-Cayuela; Yifeng Liu; Rupa Mandal; Vanessa Neveu; Allison Pon; Craig Knox; Michael Wilson; Claudine Manach; Augustin Scalbert
Journal: Nucleic Acids Res Date: 2018-01-04 Impact factor: 16.971

6. Creating a Reliable Mass Spectral-Retention Time Library for All Ion Fragmentation-Based Metabolomics.

Authors: Ipputa Tada; Hiroshi Tsugawa; Isabel Meister; Pei Zhang; Rie Shu; Riho Katsumi; Craig E Wheelock; Masanori Arita; Romanas Chaleckis
Journal: Metabolites Date: 2019-10-26
