Jimi Wills (1), Joy Edwards-Hicks (1), Andrew J Finch (1). (1) Cancer Research UK Edinburgh Centre, Institute of Genetics and Molecular Medicine, University of Edinburgh, Crewe Road, Edinburgh EH4 2XR, United Kingdom.
Abstract
Metabolic analyses generally fall into two classes: unbiased metabolomic analyses and analyses that are targeted toward specific metabolites. Both techniques have been revolutionized by the advent of mass spectrometers with detectors that afford high mass accuracy and resolution, such as time-of-flights (TOFs) and Orbitraps. One particular area where this technology is key is in the field of metabolic flux analysis because the resolution of these spectrometers allows for discrimination between 13C-containing isotopologues and those containing 15N or other isotopes. While XCMS-based software is freely available for untargeted analysis of mass spectrometric data sets, it does not always identify metabolites of interest in a targeted assay. Furthermore, there is a paucity of vendor-independent software that deals with targeted analyses of metabolites and of isotopologues in particular. Here, we present AssayR, an R package that takes high resolution wide-scan liquid chromatography-mass spectrometry (LC-MS) data sets and tailors peak detection for each metabolite through a simple, iterative user interface. It automatically integrates peak areas for all isotopologues and outputs extracted ion chromatograms (EICs), absolute and relative stacked bar charts for all isotopologues, and a .csv data file. We demonstrate several examples where AssayR provides more accurate and robust quantitation than XCMS, and we propose that tailored peak detection should be the preferred approach for targeted assays. In summary, AssayR provides easy and robust targeted metabolite and stable isotope analyses on wide-scan data sets from high resolution mass spectrometers.
The goal of an untargeted metabolomic
experiment is usually to identify metabolites that have changed with
the greatest significance or magnitude between two or more experimental
conditions. A typical untargeted mass spectrometric experiment usually
follows a well-defined workflow, using proprietary or open source
software (e.g., XCMS,[1−3] mzMine[4,5]) to give a list of features that
can be quantified and matched to a database to yield probable or verified
metabolite identifications. This approach was recently extended to
include untargeted identification of stable isotope fluxes using the
elegant X13CMS software tool.[6] In contrast,
a targeted metabolite experiment is one in which specific metabolites
must be identified with high confidence in all samples (where detectable),
and this requires prioritization of different analytical parameters.
Targeted workflows based upon XCMS do exist,[7] but the enforcement of a single set of global peak detection
parameters is a limitation that can lead to missed peaks or inaccurate
quantitation. Some peaks are simply not found, particularly with mixed-mode
hydrophilic interaction liquid chromatography (HILIC), where peaks
can be broad and of irregular shape. Furthermore, this approach suffers
from a serious limitation in the analysis of stable isotope tracing experiments
because isotopologues are treated as distinct features during the
peak detection stage when they should be detected in concert. This
also impacts upon data output, since isotopologues are not grouped
together and must therefore be further processed to yield the isotopic
composition of each metabolite.

Targeted metabolic analysis
has traditionally required less postacquisition
analysis because the preferred instrument for such experiments has
been the triple quadrupole mass spectrometer, and the combination
of precursor and product m/z ions
specified at the point of data acquisition is tied to a specific metabolite.[8] With this strategy, metabolite identification
is primarily a preacquisition issue rather than a postacquisition
one. Adding a metabolite tracer into such an analysis, however, necessitates
the addition of MRM (multiple reaction monitoring) transitions for
each expected isotopologue, and this yields a complexity of acquisition
that is not desirable, quickly limiting the number of metabolites
that can be measured. The problem of acquisition complexity is even
more pronounced if isotopic tracers are used that contain more than
one heavy isotope (e.g., 13C5, 15N2-glutamine). It is in this context that the new generation
of high resolution, accurate mass spectrometers excel because relatively
standard wide scan methods can be used for data acquisition, yet many
metabolites and their isotopologues can subsequently be separated
and quantified through data analysis approaches.[9]

We set several criteria for an ideal software tool
that can take
high resolution, high mass accuracy data from any mass spectrometer
and return peak integrals for specific metabolites and their isotopologues.
These criteria are (a) robust peak detection taking into account all
isotopologues, (b) a simple, optional quality control curation step
for all peaks prior to quantitation, (c) reporting of values for separate
(including split) peaks where more than one is found that could be
attributed to a single metabolite, (d) reporting of values and bar
charts for grouped isotopologues, and (e) an interface that is easy
and intuitive to use. Here, we present AssayR, an R package[10] that fulfills the above criteria (Figure 1). Using data obtained on a
Thermo Scientific Q Exactive mass spectrometer, we demonstrate outputs
from XCMS and AssayR that reveal more accurate and robust quantitation
of analytes in AssayR.
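As a rough worked example of why resolution matters for separating 13C-containing from 15N-containing isotopologues, the sketch below computes the mass spacing between the two and the nominal resolving power (m/Δm) needed at a given m/z. It is written in Python purely for illustration and is not part of AssayR; the function name and the choice of glutamine [M−H]− as the example ion are assumptions, and the isotope masses are standard tabulated values.

```python
# Heavy-minus-light isotope mass differences in Da (standard atomic masses).
DELTA_13C = 13.0033548 - 12.0000000  # 13C minus 12C
DELTA_15N = 15.0001089 - 14.0030740  # 15N minus 14N


def required_resolving_power(mz, delta_a=DELTA_13C, delta_b=DELTA_15N, charge=1):
    """Nominal resolving power (m / delta m) needed to separate two
    isotopologues that differ only in which heavy isotope they carry."""
    spacing = abs(delta_a - delta_b) / charge
    return mz / spacing


# Illustrative ion (assumption): glutamine [M-H]-, monoisotopic m/z ~145.0619.
glutamine_mz = 145.0619
spacing_mda = abs(DELTA_13C - DELTA_15N) * 1000
power = required_resolving_power(glutamine_mz)

print(f"13C vs 15N spacing: {spacing_mda:.2f} mDa")
print(f"Nominal resolving power needed at m/z {glutamine_mz}: {power:,.0f}")
```

The spacing is a little over 6 mDa, so at this m/z the nominal requirement is in the low tens of thousands. In practice a comfortable margin above the nominal figure is needed, and the requirement grows in proportion to m/z.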
Figure 1
Schematic of AssayR showing the main concepts and demonstrating
minimal user input (initial config and optional peak picking only).
mzML files undergo extracted ion chromatogram (EIC) analysis based
upon the m/z values in the input
config file. Optional interactive peak picking leads to a final config
file which is used to produce the peak integrals for quantitation.
All required isotopologues are included in the process, and the outputs
are a .csv file of the data as well as EICs and bar charts of relative
(percentage) and absolute values for all isotopologues.
Methods
Analysis of Cellular Metabolites
MRC5 primary human
fibroblasts were switched to DMEM with 25 mM 13C6-glucose for 5 or 60 min. The medium was aspirated; cells were washed
quickly with ice-cold PBS, and metabolites were extracted with 50:30:20
methanol/acetonitrile/water. Samples (triplicates) were analyzed by
liquid chromatography–mass spectrometry (LC-MS) using a 15
cm × 4.6 mm ZIC-pHILIC (Merck Millipore) column fitted with a
guard on a Thermo Ultimate 3000 HPLC. A gradient of decreasing acetonitrile
(with 20 mM ammonium carbonate as the aqueous phase) was used to elute
metabolites into a Q Exactive mass spectrometer. Data were acquired
in wide scan negative mode. To generate mzML files, the command
“msconvert_all()” was run; this uses the msconvert utility
of ProteoWizard[11,12] to generate separate positive
and negative mode mzML files.
Software Description
Input
File Format
AssayR uses the R package mzR to
extract chromatograms from files in mzML format.
Config File
A configuration file in .tsv format is
associated with each analysis (Figure 2). This file specifies the m/z value and the retention time (RT) window of each metabolite
of interest as well as the maximum number of isotopologues to analyze
(split into 13C, 15N, and 2H). For
config file setup purposes, the full retention time range can be selected
(e.g., Initial config file in Figure 2) as well as default values for the width of the peak
detection filter (“seconds”; see Extracted Ion Chromatogram Generation and Peak Detection below)
and intensity threshold. An “interactive” option is
also included so that the user can opt out of the iterative peak detection
step for any metabolite, for instance, if it is known that the peak
is always picked correctly by the current settings. Isotopologue selection
is simply a numerical input for 13C, 15N, or 2H, and combined isotopes can be selected: all possible isotopologues
are analyzed.
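To illustrate what "all possible isotopologues" means for a config row with combined isotopes, the following Python sketch enumerates every combination of 13C, 15N, and 2H counts up to the configured maxima and derives a ppm window around each expected m/z. This is an illustration only and not AssayR's code: the function name, label format, single-charge default, and the symmetric ppm window are assumptions.

```python
from itertools import product

# Heavy-minus-light isotope mass differences in Da (standard atomic masses).
MASS_SHIFT = {"13C": 1.0033548, "15N": 0.9970349, "2H": 1.0062767}


def isotopologue_windows(mono_mz, ppm, max_13c=0, max_15n=0, max_2h=0, charge=1):
    """Return {label: (mz_low, mz_high)} for every isotope combination
    up to the given maximum counts (hypothetical helper)."""
    windows = {}
    counts = product(range(max_13c + 1), range(max_15n + 1), range(max_2h + 1))
    for n_c, n_n, n_h in counts:
        shift = (
            n_c * MASS_SHIFT["13C"]
            + n_n * MASS_SHIFT["15N"]
            + n_h * MASS_SHIFT["2H"]
        ) / charge
        mz = mono_mz + shift
        tol = mz * ppm / 1e6  # half-width of the extraction window
        windows[f"13C{n_c}_15N{n_n}_2H{n_h}"] = (mz - tol, mz + tol)
    return windows


# Hypothetical config row: glutamine [M-H]-, 5 ppm, up to five 13C and two 15N.
windows = isotopologue_windows(145.0619, ppm=5, max_13c=5, max_15n=2)
print(len(windows), "isotopologue windows")
for label, (low, high) in list(windows.items())[:3]:
    print(f"{label}: {low:.4f} to {high:.4f}")
```

With maxima of five 13C and two 15N, the product of (5 + 1) × (2 + 1) × (0 + 1) gives 18 windows, which shows how quickly the number of traces grows when tracers carry more than one heavy element.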
Figure 2
Examples of Initial and Final config files. Typical default
values
are given in the Initial config file. “seconds” refers
to the width of the peak detection filter and not the peak width.
The red box highlights parameters that are modified during interactive
peak picking. The blue box highlights the simple isotopologue number
input.
Extracted Ion Chromatogram
Generation and Peak Detection
A more detailed description
accompanies the R package code, which
is available at https://gitlab.com/jimiwills/assay.R. Briefly, a row from the configuration table (Figure 2), representing an analyte, is read and the
configured m/z ranges (combining m/z, ppm, and isotope settings) are extracted from mzML files via the
mzR package. Interpolation is used to standardize the retention times
across these chromatograms, and the maximal chromatographic profile
is taken forward for peak detection. This means that a peak only needs
to be present in a single sample for a single isotope for that peak
to be detected and measured across all samples and isotopologues. The use of combined
isotopologues (Figure 3A) for metabolite peak identification is particularly important when
a mix of labeled and unlabeled samples is analyzed or for samples
where the labeling in a given metabolite is saturated and, therefore,
the monoisotopic m/z value would
be inappropriate for metabolite peak identification.
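The interpolation and "maximal chromatographic profile" steps described above can be sketched as follows. This is a minimal Python illustration, not AssayR's R implementation: the function names, the use of linear interpolation, and the zero fill outside the measured range are assumptions, and the two traces are synthetic.

```python
from bisect import bisect_right


def interpolate(rt, intensity, grid):
    """Linearly interpolate one chromatogram onto a shared retention-time
    grid. Grid points outside the measured range are set to zero."""
    out = []
    for t in grid:
        if t < rt[0] or t > rt[-1]:
            out.append(0.0)
            continue
        i = min(bisect_right(rt, t), len(rt) - 1)  # index of right neighbour
        t0, t1 = rt[i - 1], rt[i]
        frac = (t - t0) / (t1 - t0)
        out.append(intensity[i - 1] + frac * (intensity[i] - intensity[i - 1]))
    return out


def max_profile(chromatograms, grid):
    """Point-wise maximum over all resampled chromatograms
    (all samples and all isotopologues of one analyte)."""
    resampled = [interpolate(rt, inten, grid) for rt, inten in chromatograms]
    return [max(values) for values in zip(*resampled)]


# Synthetic traces from a heavily labeled sample: the monoisotopic trace is
# almost empty, while the fully labeled trace carries the peak.
grid = [0.0, 1.0, 2.0, 3.0, 4.0]
mono_trace = ([0.0, 2.0, 4.0], [0.0, 1.0, 0.0])
labeled_trace = ([0.5, 1.5, 2.5, 3.5], [0.0, 6.0, 8.0, 0.0])

profile = max_profile([mono_trace, labeled_trace], grid)
print(profile)
```

Because the profile takes the maximum at each retention time, the peak is visible to the detector even though the monoisotopic trace alone would barely register it.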
Figure 3
Peak detection in AssayR.
(A) Peak detection (shaded blue) is specified
for each metabolite based upon all isotopologues in all samples. (B)
Example of peak detection (blue shading) despite poor chromatography.
(C) AssayR enables split peaks to be detected separately (shaded green/yellow)
or together. Shaded areas are detected and quantified.
The retention time minimum and maximum from the
configuration table
are used to select a region of the chromatogram on which peak detection
will be performed. A Mexican hat filter is used with the filter function
in R to translate the chromatogram and to set start and end indices
for each detected peak. The indices are used to define the range to
be summed to generate peak area measurements for each chromatogram.
Individual peaks are marked (in blue); where more than one peak is
identified within a metabolite retention time window, the peaks are
separated and shaded in different colors (e.g., green/yellow for 2
peaks; Figure 3C)
to assist with peak curation. The interactive peak picking procedure
then allows simple alteration of detection parameters through an intuitive
query-based format. Alteration of the width of the Mexican hat (“seconds”
column in the config file) enables most peaks to be picked, even when
the chromatography is poor, such as when measuring glucose on a ZIC-pHILIC
column (Figure 3B).
Split peaks can also be selected as a single metabolite or two peaks
by alteration of the hat width (Figure 3C). The minimum and maximum m/z values, the hat filter width, and the peak detection threshold
are all updated during the interactive peak picking process. Once
the user is satisfied with the result, the updated parameters are
written back to the configuration file (e.g., Final config file in Figure 2) and the peak areas
are saved for output in the data .csv file.
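The effect of the hat width on split peaks can be illustrated with a small Python sketch of Mexican hat filtering followed by summation between the detected start and end indices. This is not AssayR's implementation (which uses the filter function in R): the kernel truncation at four widths, the rule that a peak spans the points where the filtered trace exceeds the threshold, the width expressed in scan points rather than seconds, and all function names are assumptions. The trace is synthetic.

```python
import math


def mexican_hat(width):
    """Ricker ("Mexican hat") kernel sampled at integer offsets.
    `width` is the kernel scale in scan points (integer)."""
    half = 4 * width
    kernel = []
    for k in range(-half, half + 1):
        x = k / width
        kernel.append((1 - x * x) * math.exp(-x * x / 2))
    return kernel


def filter_centered(signal, kernel):
    """Centered convolution with zero padding at the edges."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        total = 0.0
        for k, weight in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                total += weight * signal[j]
        out.append(total)
    return out


def detect_peaks(signal, width, threshold):
    """Return (start, end) index pairs where the filtered trace
    stays above the threshold."""
    filtered = filter_centered(signal, mexican_hat(width))
    peaks, start = [], None
    for i, value in enumerate(filtered):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            peaks.append((start, i - 1))
            start = None
    if start is not None:
        peaks.append((start, len(filtered) - 1))
    return peaks


def peak_area(signal, bounds):
    """Sum the unfiltered intensities between the start and end indices."""
    start, end = bounds
    return sum(signal[start:end + 1])


# Synthetic split peak: two overlapping Gaussians (sigma = 3 scans).
trace = [
    100 * math.exp(-((i - 40) ** 2) / 18) + 100 * math.exp(-((i - 52) ** 2) / 18)
    for i in range(100)
]

narrow = detect_peaks(trace, width=2, threshold=1.0)
wide = detect_peaks(trace, width=8, threshold=1.0)
print("narrow hat:", narrow, [round(peak_area(trace, p)) for p in narrow])
print("wide hat:  ", wide, [round(peak_area(trace, p)) for p in wide])
```

In this sketch a narrow hat reports the two halves of the split peak separately, whereas a wide hat merges them into one integration range, which mirrors the choice offered during interactive peak picking.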
Output
The primary
output from AssayR is a .csv file
with samples separated by column and metabolites/isotopologues separated
by row. Images of the extracted ion chromatograms generated during
peak detection are exported for recording metabolite identification
and quality control: these images are generated even when the software
is run without interactive peak curation. Stacked bar charts of absolute
and relative peak intensity are produced for each metabolite, allowing
quick and easy visualization of the data. These reveal variance between
samples and allow for quick identification of possible outlier samples
(as outlined below). A representative analysis of 13C6-glucose tracing in primary human fibroblasts is presented
in Figure 4.
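As an illustration of the output layout described above (samples in columns, isotopologues in rows) and of how the relative (percentage) values in the stacked bar charts relate to the absolute peak areas, consider the following Python sketch. The row labels, column names, and numbers are invented for the example and do not reproduce AssayR's actual file format or any measured data.

```python
import csv
import io


def relative_abundance(areas):
    """Convert absolute isotopologue peak areas for one sample
    into percentages of the metabolite total."""
    total = sum(areas.values())
    if total == 0:
        return {name: 0.0 for name in areas}
    return {name: 100.0 * value / total for name, value in areas.items()}


# Synthetic peak areas (not measured data), keyed by sample then isotopologue.
areas_by_sample = {
    "sample_1": {"lactate m+0": 750.0, "lactate m+3": 250.0},
    "sample_2": {"lactate m+0": 200.0, "lactate m+3": 800.0},
}

# Write a table with one row per isotopologue and one column per sample.
row_names = list(next(iter(areas_by_sample.values())))
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["isotopologue"] + list(areas_by_sample))
for name in row_names:
    writer.writerow([name] + [areas_by_sample[s][name] for s in areas_by_sample])
print(buffer.getvalue())

# Percentages underlying a relative stacked bar chart.
percent = {s: relative_abundance(a) for s, a in areas_by_sample.items()}
for sample, values in percent.items():
    print(sample, values)
```

Plotting the percentages per sample side by side is what makes an outlier replicate stand out, since its isotopologue proportions differ from those of its neighbours.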
Figure 4
Glycolytic
and related stable isotope tracing of 13C6-glucose
metabolism quantified by AssayR. Relative (percentage)
stacked bar charts of triplicates are shown as produced by AssayR
(absolute stacked bar charts and EICs are also produced automatically).
MRC-5 fibroblasts were pulsed for 5 or 60 min with 13C6-glucose in triplicate. Abbreviations: Glu (glucose), G6P
(glucose 6-phosphate), F6P (fructose 6-phosphate), FBP (fructose 1,6-bisphosphate),
PGA (3-phosphoglyceraldehyde), DHAP (dihydroxyacetone phosphate),
2/3-PG (2-/3-phosphoglycerate), PEP (phosphoenolpyruvate), Pyr (pyruvate),
Ala (alanine), Lac (lactate), Cit/Iso (citrate/isocitrate).
Comparison with XCMS
A popular chromatographic approach in metabolomics is the use of
ZIC-pHILIC columns at high pH[9] because
they capture a wide range of metabolites, including most of the organic
acids of central carbon metabolism. However, the chromatographic performance
of these matrices can be poor, especially in comparison to reversed-phase
chromatographic approaches. Variability in peak shape can be pronounced
as some metabolites can interact with the matrix in more than one
way, and this can lead to spread (e.g., glucose in Figure 3B) or separated peaks. This
variability can be more pronounced if methanol is used during sample
loading due to additional surface effects of the solvent. Using the
metabolomic data set from MRC5 primary human fibroblasts pulsed with 13C6-glucose, we analyzed glycolytic and related
metabolites with AssayR (Figure 4) and XCMS. As described above, the chromatography
of glucose is poor and XCMS did not pick any of the isotopologues,
whereas AssayR showed almost full labeling in all samples (Figure 5A). While fructose
6-phosphate was well resolved and accurately picked by both packages,
the monoisotopic glucose 6-phosphate peak had an overlapping isobaric
peak with slightly later retention time (Figure 5B). During the peak detection stage in AssayR,
it was clear that these were mixed metabolites because some of the
extracted ion chromatograms (EICs) matched the first peak only whereas
some matched both peaks (Figure 5B). AssayR was set up to resolve these peaks, but XCMS
picked them together.
Figure 5
Comparison of AssayR with XCMS. (A) Peaks that fail the
XCMS peak
detection are picked and quantified with AssayR. (B) User control
over peak detection in AssayR allows exclusion of incorrect peaks,
particularly with overlapping isobaric species (the chromatograms
are different because AssayR includes all isotopologues; the monoisotopic
only is shown for XCMS). (C) Example of misquantitation by XCMS due
to partial peak detection of a (m + 2) isotopologue.
XCMS detected/quantified area of the EIC is in red (red asterisk indicates
inaccurate m + 2 quantitation in the corresponding
bar chart). G6P = glucose 6-phosphate.
The stacked bar plots in AssayR revealed that the later peak
was
unlabeled, whereas the earlier peak (glucose 6-phosphate) was predominantly
labeled. Due to the analysis of mixed metabolites, XCMS underestimated
the labeling of glucose 6-phosphate and gave a high variance (Figure 5B). A third problem
was noticed with the quantitation of 13C incorporation
into citrate/isocitrate. Comparison of the stacked bar plots revealed
that the third sample in the 60 min time point (sample 6) was different
from the other two, showing lower 13C2 abundance
(Figure 5C). Examination
of the individual EICs and RT over which they were integrated revealed
that XCMS had only picked part of the peak in sample 6 (red area of
EIC), and therefore, this isotopologue was underrepresented in the
analysis. This type of error cannot occur in AssayR because integration
occurs over a fixed RT window for all isotopologues. Thus, we present
data that strongly support the use of tailored peak detection for
the quantitation of specific metabolites in wide scan high resolution
LC-MS data sets.
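The reasoning behind the partial-peak error can be shown with a toy calculation in Python. The traces below are synthetic and the helper names are hypothetical; the sketch only demonstrates that integrating one isotopologue over a narrower range than the others lowers its apparent fraction, whereas a shared window preserves the true proportion.

```python
def integrate(trace, start, end):
    """Sum intensities between two scan indices (inclusive)."""
    return sum(trace[start:end + 1])


def fraction(areas, key):
    """Share of one isotopologue in the metabolite total."""
    return areas[key] / sum(areas.values())


# Synthetic traces with identical peak shape; m+2 is half as intense as m+0.
m0 = [0, 2, 6, 10, 6, 2, 0]
m2 = [0, 1, 3, 5, 3, 1, 0]

# Shared retention-time window for every isotopologue.
shared = {"m+0": integrate(m0, 1, 5), "m+2": integrate(m2, 1, 5)}

# Per-feature detection that only picks part of the m+2 peak.
partial = {"m+0": integrate(m0, 1, 5), "m+2": integrate(m2, 3, 5)}

print(f"m+2 fraction, shared window:  {fraction(shared, 'm+2'):.3f}")
print(f"m+2 fraction, partial window: {fraction(partial, 'm+2'):.3f}")
```

With the shared window the m+2 fraction equals the true one third; with the partial window it is lower, which is the kind of underrepresentation described for sample 6.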
Conclusion
AssayR is an open source
platform-agnostic R package that enables
straightforward analysis of high resolution mass spectrometric data
sets for targeted analyses, particularly those involving stable isotope
tracers. The increasing availability of high resolution mass spectrometers
renders this a timely addition to the analytical capability of investigators
studying metabolic pathways. While common preference for the reliability
and quantitative capability of triple-quadrupole mass spectrometers
will not be displaced in the immediate future by high resolution spectrometers,
the versatility of the postacquisition approach afforded by the latter
is a very good match for stable isotope labeling studies. AssayR enables
a simple, robust, and powerful approach to the measurement of metabolite
usage in biological samples.