Craig McLean1,2, Elizabeth B Kujawinski1. 1. Department of Marine Chemistry and Geochemistry, Woods Hole Oceanographic Institution, Woods Hole, Massachusetts 02543, United States. 2. MIT/WHOI Joint Program in Oceanography/Applied Ocean Science and Engineering, Department of Marine Chemistry and Geochemistry, Woods Hole Oceanographic Institution, Woods Hole, Massachusetts 02543, United States.
Abstract
Untargeted metabolomics experiments provide a snapshot of cellular metabolism but remain challenging to interpret due to the computational complexity involved in data processing and analysis. Prior to any interpretation, raw data must be processed to remove noise and to align mass-spectral peaks across samples. This step requires selection of dataset-specific parameters, as erroneous parameters can result in noise inflation. While several algorithms exist to automate parameter selection, each depends on gradient descent optimization functions. In contrast, our new parameter optimization algorithm, AutoTuner, obtains parameter estimates from raw data in a single step as opposed to many iterations. Here, we tested the accuracy and the run-time of AutoTuner in comparison to isotopologue parameter optimization (IPO), the most commonly used parameter selection tool, and compared the resulting parameters' influence on the properties of feature tables after processing. We performed a Monte Carlo experiment to test the robustness of AutoTuner parameter selection and found that AutoTuner generated similar parameter estimates from random subsets of samples. We conclude that AutoTuner is a desirable alternative to existing tools, because it is scalable, highly robust, and very fast (∼100-1000× speed improvement from other algorithms going from days to minutes). AutoTuner is freely available as an R package through BioConductor.
Untargeted metabolomics experiments provide a snapshot of cellular metabolism but remain challenging to interpret due to the computational complexity involved in data processing and analysis. Prior to any interpretation, raw data must be processed to remove noise and to align mass-spectral peaks across samples. This step requires selection of dataset-specific parameters, as erroneous parameters can result in noise inflation. While several algorithms exist to automate parameter selection, each depends on gradient descent optimization functions. In contrast, our new parameter optimization algorithm, AutoTuner, obtains parameter estimates from raw data in a single step as opposed to many iterations. Here, we tested the accuracy and the run-time of AutoTuner in comparison to isotopologue parameter optimization (IPO), the most commonly used parameter selection tool, and compared the resulting parameters' influence on the properties of feature tables after processing. We performed a Monte Carlo experiment to test the robustness of AutoTuner parameter selection and found that AutoTuner generated similar parameter estimates from random subsets of samples. We conclude that AutoTuner is a desirable alternative to existing tools, because it is scalable, highly robust, and very fast (∼100-1000× speed improvement from other algorithms going from days to minutes). AutoTuner is freely available as an R package through BioConductor.
Metabolomics
is the study of
all the compounds present within a cell, organism, or tissue. Such
investigations provide a holistic snapshot of the activity within
a biological matrix and have led to a myriad of discoveries ranging
from the elucidation of novel biochemical pathways to the recognition
of molecular adaptive responses to stress and the clarification of
mechanisms driving cell–cell interactions.[1−3] Advances in
mass spectrometry fostered these discoveries, specifically improvements
in instrument sensitivity, accuracy, and data collection capacity.[1,4,5] Parallel advances in computational
tools have historically followed to fulfill the potential of analytical
improvements.[6]Prior to data analysis,
raw data from untargeted metabolomics experiments
must be processed to generate a features table. Features are defined
as peaks with unique mass to charge (m/z) and retention time values, with relative abundances determined
by their height or area. Processing is critical to extract chemical
signals from electrical noise and to correct for retention time drift
across samples.[7] A variety of untargeted
data processing methods exist,[8−11] including two commonly used tools: MZmine2[12] and XCMS.[13] Although
these tools reliably extract true features from complex data, their
performance depends on the selection of algorithmic parameters that
capture the structure of the data, such as matrix effects and differences
in analytical platforms.[14−16] No universal set of parameters
exists for all datasets; hence, parameter optimization must occur
prior to analysis to avoid noise inflation within the feature table.[17−19]Tuning parameters manually is prohibitively time-consuming
due
to the high number of possible numerical combinations. To overcome
this challenge, several methods exist to identify optimal dataset-specific
parameters.[20−22] These methods each use distinct optimization functions
based on maximizing or minimizing a numerical value. Each approach
iteratively runs XCMS peak-picking and retention time correction algorithms
until they identify a set of parameters that optimizes a desired criterion.
For example, isotopologue parameter optimization (IPO), the most commonly
used parameter selection tool, scores groups of parameters by the
number of features detected after XCMS that contain a naturally occurring 13C isotopologue. Many separate XCMS runs are required to find
ideal parameters, sometimes taking weeks to complete.[20−22] Currently, these parameter selection algorithms depend on high performance
computing resources. As users continue to adopt ultrahigh pressure
liquid chromatography systems and rapid scanning mass spectrometers,
the size and abundance of data from these platforms will preclude
the use of unscalable parameter selection algorithms to users without
access to high performance computing resources.[23,24]We designed a novel parameter optimization algorithm, AutoTuner,
to ameliorate these challenges. The method performs statistical inference
on raw data in a single step in order to make parameter estimates
as opposed to iteratively checking estimates. Further, it complements
recent tools focused on generating higher-confidence feature annotations.[25−28] AutoTuner is capable of selecting parameters for seven continuously
valued parameters required for centWave peak selection algorithm used
by both MZmine2 and XCMS, and it determines a key parameter for grouping
in XCMS. AutoTuner is freely distributed through BioConductor as an
R package.
Theory and Design of AutoTuner
Algorithm Overview
AutoTuner makes estimates for the
following mass spectrometry peak-picking and grouping algorithms parameters: , , , , , , and . We chose to optimize these parameters because they have the greatest
influence on the number and quality of features postprocessing and
have the greatest number of possible values.[21,29] We chose to optimize centWave peak-picking parameters over other
peak-picking methods, as centWave is the recommended method for processing
high-resolution untargeted data, which is increasingly becoming the
standard for untargeted metabolomics.[4] See Table for a description
of the parameters and their matching arguments in XCMS. AutoTuner
makes estimates in three steps ( Figure ). (1) Total ion chromatogram (TIC) peak
detection: A user identifies peaks within each sample’s total
ion chromatogram (TIC), which is the plot of the integrated ion intensities
within the mass spectrometer over time. (2) Parameter estimation within
extracted ion chromatograms (EICs) of each TIC peak: AutoTuner isolates
predicted EICs within each identified TIC peak. An EIC is a plot of
one or more selected m/z values
in a series of mass spectra. AutoTuner applies statistical inference
on all EIC peaks to estimate parameters in an unsupervised manner.
(3) Dataset-wide parameter estimates: AutoTuner integrates all peak-specific
estimates into a dataset-wide set.
Table 1
Parameters Estimated through the AutoTuner
Algorithma
parameter
description
XCMS parameter name
functionality
application
Group difference
expected retention time deviation of an mz/rt feature between samples
bw
grouping
XCMS
ppm
parts per million (ppm) error threshold
used to bin consecutive
mass intensities across adjacent scans into a single peak
ppm
centWave (peak-picking)
XCMS and
MZmine2
S/N Threshold
the minimum ratio between peak and average
noise intensity
required to retain a feature
snthresh
centWave (peak-picking)
XCMS and MZmine2
Scan count
minimum number of scans required to retain a peak
prefilter
scan
centWave (peak-picking)
XCMS and
MZmine2
Noise
numerical threshold used to filter out noise from
true masses
noise
centWave (peak-picking)
XCMS and MZmine2
Prefilter
intensity
minimum integrated intensity
required retain a peak
prefilter intensity
centWave (peak-picking)
XCMS and MZmine2
Peak-width
the width of a chromatographically resolved peak
min/max
peak-width
centWave (peak-picking)
XCMS
and MZmine2
We chose to optimize these parameters
due to their influence on the number and quality of features returned
following XCMS data processing.[20,28]Table S4 gives more information on these parameters.
Figure 1
Schematic of the three
stages of the AutoTuner algorithm. (A) Total
ion chromatogram (TIC) peak detection requires user input and is focused
on identifying peaks within each sample’s TIC. The user directly
adjusts a signal processing sliding window analysis to identify peaks
within the TIC. (B) Parameter estimation within extracted ion chromatograms
(EICs) of each TIC peak iteratively looks at each peak to make parameter
estimates from EICs. (C) Data set-wide parameter estimates aggregate
results from the second stage to provide an ideal set of parameters
for the entire dataset. Parameters estimated are in bold. The R package
vignette at BioConductor provides an example on how to use the algorithm.
Schematic of the three
stages of the AutoTuner algorithm. (A) Total
ion chromatogram (TIC) peak detection requires user input and is focused
on identifying peaks within each sample’s TIC. The user directly
adjusts a signal processing sliding window analysis to identify peaks
within the TIC. (B) Parameter estimation within extracted ion chromatograms
(EICs) of each TIC peak iteratively looks at each peak to make parameter
estimates from EICs. (C) Data set-wide parameter estimates aggregate
results from the second stage to provide an ideal set of parameters
for the entire dataset. Parameters estimated are in bold. The R package
vignette at BioConductor provides an example on how to use the algorithm.We chose to optimize these parameters
due to their influence on the number and quality of features returned
following XCMS data processing.[20,28]Table S4 gives more information on these parameters.
Total Ion Chromatogram (TIC) Peak Detection
To identify
TIC peaks, AutoTuner first applies a sliding window analysis, which
detects peaks by testing if an upcoming scan’s intensity is
greater than an intensity threshold determined by the average and
standard deviation of a fixed number of prior scans. To ensure the
correct peak bounds are retained, AutoTuner generates a linear model
from the first three and last three points bounding each TIC peak.
If the model fails to calculate an R2 value
greater than or equal to 0.8 or to reach a local R2 maximum, AutoTuner expands the ending bound by one scan
and reruns the model until the model meets either criterion. The time
difference of a TIC peak’s final bounds represents its width.AutoTuner groups all TIC peaks originating from distinct samples
whose maxima co-occur within each other’s retention-time bounds.
It then determines the time differences between intensity maxima of
all pairs of grouped peaks. AutoTuner returns the largest time difference
as the estimate for the parameter that is used in the grouping step following peak-picking.
Because highly complex datasets may contain distinct sample-specific
peaks occurring at similar retention times, AutoTuner may overestimate
this parameter. Prioritizing the inclusion of experimental replicates
within AutoTuner would limit this issue. The overestimation of does not affect downstream
parameter estimation, as future estimates do not involve comparisons
across samples and instead focus on properties of individual EICs.
At this point, AutoTuner has only collected data to estimate the parameter.
Parameter Estimation
within EICs of Each TIC Peak
AutoTuner
estimates the remaining parameters (, , , , , and ) from raw data contained within each individual
TIC peak. A central assumption in AutoTuner is that TIC peaks represent
chromatographic regions enriched in chemical ions relative to electrical
noise.[7]
Error ()
First, AutoTuner
sorts all m/z values detected in
mass spectra contained within the bounds of a TIC peak. AutoTuner
bins m/z values if the difference
in mass of two adjacent m/z values
is below a user-provided threshold. AutoTuner stores unbinned m/z values as noise peaks. Because peaks
of true features are made up of m/z values within adjacent scans (scan continuity criterion), AutoTuner
sorts each bin by scan number to check that this criterion holds for
the binned m/z values. In the case
where two or more m/z values are
retained from a single scan, only the m/z value with the lowest difference in mass to the previous scan’s
mass is retained. If multiple m/z values occur within the first scan of the bin, the difference in
mass is calculated for the next adjacent scan’s m/z value instead. AutoTuner stores m/z values within bins that fail the scan continuity
criterion as noise peaks, similar to the noise removal step earlier.AutoTuner estimates the parts per million () error parameter from the remaining bins by distinguishing
between bins formed by random associations of noise peaks and those
of hypothesized true features. To do this, AutoTuner first calculates
the of all m/z values within bins. AutoTuner then builds an empirical
distribution of values using a Gaussian
kernel density estimator (KDE) defined bywhereand x is the set of all observations, x is the ith observation, n is the number of observations,
and h is a measure of smoothness for the empirical
distribution. The function between and absolute error is not surjective, meaning two identical absolute
mass error values can have distinct values. Thus, we hypothesize that the value of noise peaks should be scattered widely, while values of real features should be within
a narrow range.[14] Hence, we expect that
by using a user-provided mass difference threshold larger than an
instrument-defined threshold, the KDE will have a long-tail and a
high narrow peak representing the instrument-dependent of real features and a shorter smaller downstream
peak representing the from erroneously
binned m/z values (Figure S1).AutoTuner subsamples the empirical distribution
of all values to speed downstream
calculations when
calculated values are abundant (>500).
To do this, AutoTuner checks the similarity between the original distribution
comprising all values and seven
distributions comprising one-half of all values randomly sampled from the total. Seven was chosen
arbitrarily. The distance between the original distribution and each
subsampled distribution is calculated using the Kullback–Leibler
divergence (KLD), a function that calculates the information theoretic
gain required to describe both distributions. A KLD value of 0.5 represents
an increase of one-half bit of information required to store the two
distributions. If a KLD value of 0.5 or greater is not calculated
across any comparison, AutoTuner replaces the original distribution
with one consisting of half as many values subsampled randomly from the original and repeats the subsampling.AutoTuner then calculates an outlier score for each value to distinguish between error values derived
from real features and those derived from random associations of noise
using the following outlier score function:[30]where C represents the largest
cluster of error values and x is the ith observation similar to (1). To identify this cluster, AutoTuner uses k-means
clustering, a data partitioning technique used to separate a set of
observations into k-many groups. Either the gap statistic or a user-provided
variance-explained threshold is used to determine the appropriate
number of clusters.[31] Using C ensures that the density of each calculated value is normalized by the density of the true error values
(Figure S1).The estimate is calculated by
the following:where x is any calculated value with outlier
scores above 1 and
sd(x) is the standard deviation of all x values. An outlier score value above 1 indicates that the density
of that particular x is at least as great as the
expected value of the density of all elements within C. The statistical properties of probability distributions inspired
this heuristic, as the sum of a probability distribution’s
mean and three times its standard deviation provides an upper bound
containing 99.7% of the total distribution area.[30]
Signal-to-Noise Threshold
We calculate
the maximum
intensity of each bin as well as the mean and standard deviation of
the intensity of all noise features occurring within two peak widths
from the original bin to estimate the signal-to-noise (S/N) threshold,
similar to Myers et al.[14] First, AutoTuner
subtracts the maximum intensity of each bin (xbin,) from the mean intensity of adjacent
noise (μnoise). AutoTuner retains the bin if this
difference is greater than three times the standard deviation (σnoise) of the adjacent noise intensity values:AutoTuner calculates
the from the smallest observed value of bin
and noise intensity difference divided by the standard deviation of
noise intensity across all bins passing the above threshold:
Remaining Parameters
AutoTuner sets
the estimate
as the minimum number of scans
across all bins. AutoTuner estimates and parameters
by first determining the minimum integrated bin and single scan intensities.
Then, it returns 90% of the magnitude of these values as the estimate
to ensure that no AutoTuner-detected bin is removed during actual
peak-picking. The represents the lowest number of scans within any bin multiplied
by the duty cycle of the instrument. To estimate the , AutoTuner expands bins at the
boundaries of the TIC peak. The expansion continues until an adjacent
scan does not contain a m/z value
whose error against the mean m/z of the bin is below the estimated threshold. A correlation checks to ensure that adjacent m/z values are not coming from noise after
a bin has been expanded by 3 scans. For this, AutoTuner requires an
absolute Spearman correlation coefficient of 0.9 between scans and
intensity values for expansion to continue. AutoTuner returns the across bins.
Data Set-Wide
Parameter Estimates
AutoTuner uses the
average of all and values weighed by the number of bins
within each TIC peak to return dataset-wide estimates for these parameters.
For dataset-wide values of , , , and , AutoTuner
returns the minimum values from all bins detected. The maximum calculated parameter represents the
dataset-wide parameter estimate. The average of each sample’s
maximal peak-width represents the estimate.
Experimental Demonstration
Materials
We chose a suite of 85 metabolites that represent
compounds expected in metabolomic experiments, including cofactors,
amino acids, and secondary metabolites. Of these, 41 ionized exclusively
in negative mode; 28 ionized exclusively in positive mode, and 16
ionized in both modes. See Table S1 for
a complete list of standards.We prepared stock solutions of
each metabolite standard in water or a 1:1 mix of methanol and water
at 1000 mg mL–1, unless constrained by solubility.
Some standards required the addition of ammonium hydroxide or formic
acid for dissolution. We stored stock solutions in the dark at −20
°C. We created a standard metabolite mix (10 mg mL–1) from the stock solutions and diluted it with Milli-Q water to obtain
four solutions with standard concentrations of 500 ng mL–1. We obtained standards at the highest grade available through Sigma-Aldrich
(MO, USA) with the exception of dimethylsulfoniopropionate (DMSP),
purchased from 21 Research Plus Inc. (NJ, USA).
Mass Spectrometry
We analyzed four replicates of the
standard mixes with ultrahigh-performance liquid chromatography (UPLC;
Accela Open Autosampler and Accela 1250 Pump (Thermo Scientific)),
coupled via heated electrospray ionization (H-ESI) to an ultrahigh
resolution tribrid mass spectrometer (Orbitrap Fusion Lumos (Thermo
Scientific)). We performed chromatographic separation with a Waters
Acquity HSS T3 column (2.1 × 100 mm, 1.8 μm) equipped with
a Vanguard precolumn, both maintained at 40 °C. We used mobile
phases of (A) 0.1% formic acid in water and (B) 0.1% formic acid in
acetonitrile at a flow rate of 0.5 mL min–1 to elute
the column. The gradient started at 1% B for 1 min, ramped to 15%
B from 1 to 3 min, ramped to 50% from 3 to 6 min, ramped to 95% B
from 6 to 9 min, held until 10 min, ramped to 1% from 10 to 10.2 min,
and finally held at 1% B (total gradient time of 12 min). We made
separate positive and negative ion mode autosampler injections of
5 μL. We set electrospray voltage to 3600 V (positive) and 2600
V (negative) and source gases to 55 (sheath) and 20 (auxiliary). We
set the heated capillary temperature to 375 °C and the vaporizer
temperature to 400 °C. We acquired full scan MS data in the Orbitrap
analyzer (mass resolution of 120 000 fwhm at m/z 200), with an automatic gain control (AGC) target
of 4 × 105, a maximum injection time of 50 s, and
a scan range of 100–1000 m/z. We set the AGC target value for fragmentation spectra at 5 ×
104, and the intensity threshold at 2 × 104. We collected all data in profile mode.
Validation Data
We used two published datasets to validate
the AutoTuner’s performance on experimental data: (1) a bacterial
culture experiment,[32] MetaboLights[33] identifier MTBLS157, and (2) a rat fecal microbiome,
by direct contact with the authors (Table ).[34]
Table 2
Information on the Datasets Used to
Test the AutoTuner’s Performancea
The mass spectrometers
and liquid
chromatography systems herein are some of the most commonly used analytical
platforms for untargeted metabolomics[4].
The mass spectrometers
and liquid
chromatography systems herein are some of the most commonly used analytical
platforms for untargeted metabolomics[4].
Data Processing and Quality
Control
We converted all
raw data files from their proprietary formats to mzML files using
msconvert.[35] All computing of mzML files
took place within an Ubuntu Xenial 16.04 Google Cloud instance with
8 CPUs and 10Gb of memory. During time comparisons, we used 8 and
1 CPU(s) to obtain IPO and AutoTuner data processing parameters, respectively.
We used an m/z mass error threshold
of 0.005 Da for AutoTuner, because this absolute error was sufficiently
large enough to return a broad range of error values (in ppm) greater
than those of the mass analyzers used to generate the validation data
(See Table ).We used XCMS and centWave to generate feature tables for each dataset[13,36] and CAMERA for isotopologue and adduct detection.[37]Table S2 contains parameters
used for processing. For the standards, we searched for the most abundant
parent ion within the EICs (see Table S1). We confirmed the presence of a metabolite standard within feature
tables if a feature had an intensity above 1 × 104, was within an exact mass error of 5 ppm of the parent ion, and
had a retention time error of 5 s from the EIC peak. We identified 12C13C1 and 12C13C2 isotopologue peaks as features with an exact
mass error of 5 ppm of the parent ion isotopologue masses (1.0033
for 13C1 and 2.0066 for 13C2). Additionally, we required that peaks matching m/z values of isotopologues also had a retention
time error less than 5 s from the 12C13C0 peak. Prior to any identification,
we confirmed that standards contained isotopologue peaks by visually
inspecting raw data. For the culture experiment, the data was subjected
to quality control as described previously.[38] Briefly, we removed features detected in process blanks, features
detected within only one replicate, and features representing isotopologues
and adducts. Additionally, we removed features with coefficient of
variation values above 0.4 across six pooled samples. We defined overlapping
features in AutoTuner- and IPO-parametrized feature tables to be those
with ppm error below 5 and retention time differences less than 20
s. The culture experiment allowed a higher retention time correction
because it relied on data collected with HPLC compared to the standards,
which were analyzed with a UPLC system.
Statistical Analyses
We applied several distinct statistical
methods to summarize the various pieces of data used to validate AutoTuner.
We used R programming language to perform all analyses (CRAN R-Project).
We used a Kolmogorov–Smirnov Test (KS-test) to compare empirical
cumulative distribution functions. We used the hypergeometric test
to compare MS2 enrichment of IPO- and AutoTuner-specific
features against features observed in the intersect of the two datasets.
In order to estimate the robustness of AutoTuner parameter estimation,
we performed a Monte Carlo experiment by running AutoTuner on distinct
subsets of the data. We first randomly selected 7 subsets of 11 samples
to compare the variability across parameters. We used the coefficient
of variation from estimates within each group as a measure of variability.
We also performed linear regressions on these values to find trends
between estimate variability and sample numbers used for estimates.
We randomly selected 3 to 9 samples from each of these subsets 55
times. In total, there were 385 estimates for each group of 3–9
samples, resulting in a total of 2695 separate runs of AutoTuner per
dataset. We performed a sensitivity analysis to determine the downstream
data processing effect that different values on mzDiff, the only continuous
valued centWave parameter not optimized by AutoTuner, had on the returned
feature table. To accomplish this, we counted the number of unique
features between pairs of feature tables generated with mzDiff parameters
varying by a value of 0.001.
Results
AutoTuner Accuracy
and Comparison to IPO
At this time,
the only open source method for automated selection of peak-picking
parameters for XCMS is isotopologue parameter optimization (IPO).[22] IPO uses a gradient descent algorithm that requires
users to iteratively run centWave with different combinations of parameters
until the set that maximizes a scoring function is identified. We
used 5 distinct metrics to compare the accuracy, speed, and downstream
data structure of IPO- and AutoTuner-derived parameters. The metrics
include the accuracy, number of features, the peak areas, and shapes
of EIC peaks only detected using parameters from one of the two methods
and MS2 count.We searched for 85 known chemical
standards (a total of 101 possible ions) within feature tables generated
with IPO- and AutoTuner-derived parameters to test the influence of
each parameter selection method on data processing accuracy (Figure , Tables and S1). We detected 82 and 100 standards within the feature table generated
with IPO- and AutoTuner-derived parameters, respectively. Figure S2 provides an example of compounds that
were only detected with AutoTuner and were absent when the IPO-derived
parameters were used. These results were robust to the choice of intensity
thresholds (Figure S3).
Figure 2
AutoTuner and IPO accuracy
comparison. Percentages were determined
from the number of detected standard peaks relative to the total possible
set. M denotes 12C13C0 isotopologue; [M + 1] denotes 12C13C1 isotopologue;
[M + 2] denotes 12C13C2 isotopologue. We normalized percentages
by the total number of possible detectable peaks based on the detection
of 12C standards (12C13C0).
AutoTuner and IPO accuracy
comparison. Percentages were determined
from the number of detected standard peaks relative to the total possible
set. M denotes 12C13C0 isotopologue; [M + 1] denotes 12C13C1 isotopologue;
[M + 2] denotes 12C13C2 isotopologue. We normalized percentages
by the total number of possible detectable peaks based on the detection
of 12C standards (12C13C0).Additionally, we enumerated all features matching 12C13C1 and 12C13C2 isotopologues of standards to determine the influence
of parameter values on the detection of lower intensity features (Figure ). We only considered
these isotopologues if the 12C13C0 ion was present within the feature tables
derived with method-specific parameters. We detected 80 out of 81
and 38 out of 64 possible 12C13C1 and 12C13C2 peaks, respectively, within
the AutoTuner-derived feature table. We detected 46 out of 68 and
8 out of 59 possible 12C13C1 and 12C13C2 peaks, respectively, within
the IPO-derived feature table.We first compared the number
of features from culture data generated
with parameters from each algorithm to understand the influence of
parameter selection on downstream data properties (Figures A and S5A). Each feature table contained a distinct number of total
features following processing and quality control (Table S3). In positive ion mode, AutoTuner-derived parameters
detected fewer unique features (203) compared to 2606 unique features
detected with IPO-derived parameters (Figure A), while sharing 1022 features between them.
A similar situation was observed in negative ion mode where AutoTuner
detected 540 unique features compared to 3420 unique features found
with IPO-derived parameters, while sharing 904 features (Figure S5A).
Figure 3
Comparing the differences between positive
ion mode data generated
by AutoTuner and IPO on the culture dataset. (A) The overlap in the
number of m/z-rt features generated
by both methods. Features with an error of 5 ppm and retention time
error of 20 s are placed in the intersect. (B) Comparison of the differences
in structural properties for the maximum continuous wavelet transform
coefficient (CWT) between peaks detected only within AutoTuner (orange)
and IPO (green). Both curves are empirical cumulative distribution
functions (CDF) of the calculated metric. (C) Three randomly selected
EIC peaks that fall on distinct regions of the maximum CWT empirical
cumulative distribution function to demonstrate how this metric influence
peak shapes. The EIC shape reflects the maximal CWT rather than the
parametrization method. The CDF curves were significantly different
(KS-test, p < 10–4, n = 203). Results for positive mode data area CDF and negative
data were similar to this data and are found in Figures S4 and S5, respectively.
Comparing the differences between positive
ion mode data generated
by AutoTuner and IPO on the culture dataset. (A) The overlap in the
number of m/z-rt features generated
by both methods. Features with an error of 5 ppm and retention time
error of 20 s are placed in the intersect. (B) Comparison of the differences
in structural properties for the maximum continuous wavelet transform
coefficient (CWT) between peaks detected only within AutoTuner (orange)
and IPO (green). Both curves are empirical cumulative distribution
functions (CDF) of the calculated metric. (C) Three randomly selected
EIC peaks that fall on distinct regions of the maximum CWT empirical
cumulative distribution function to demonstrate how this metric influence
peak shapes. The EIC shape reflects the maximal CWT rather than the
parametrization method. The CDF curves were significantly different
(KS-test, p < 10–4, n = 203). Results for positive mode data area CDF and negative
data were similar to this data and are found in Figures S4 and S5, respectively.We then compared the structural differences between features exclusively
detected using IPO- and AutoTuner-derived parameters. We created an
empirical cumulative distribution function (CDF) to compare the distribution
of peak areas (Figure S5A,B) and maximum
observed continuous wavelet transform (CWT) coefficients (Figures B and S5C) of all EIC peaks belonging to features outside
of the intersect. The maximum observed CWT coefficient increases with
peak steepness and may provide a measure of a peak’s chromatographic
resolution (Figure C). The CDF of each
metric was significantly different in positive (KS-Test; area: p < 10–6; CWT: p <
10–4; n = 203) and negative (KS-Test;
area: p < 10–14; CWT: p < 10–8; n = 540)
ionization mode data. The application of the same test on unbalanced
comparisons (e.g., negative ion mode: 3420 IPO- vs 540 AutoTuner-unique
features) was more highly significant than using equivalent numbers
of features obtained through subsampling.Next, we compared
the abundance of features with MS/MS spectra
within each unique feature table because features with MS/MS spectra
can be compared to spectral databases and authenticated standards,
thus enabling potential identification. In total, we observed more
features with MS/MS spectra within the feature table generated with
IPO-derived parameters compared to that generated with AutoTuner-derived
parameters (positive: 448 vs 115; negative: 686 vs 121; both for IPO
vs AutoTuner, respectively). However, this is due primarily to the
greater number of features in the IPO-derived table. Indeed, relative
to total features, IPO-derived features were less likely to have associated
MS/MS spectra than features within the intersect of both datasets
(hypergeometric test; negative ion mode: p < 10–10; positive ion mode: p < 10–10). A similar comparison revealed that MS/MS enrichment
was not significantly different between AutoTuner-derived features
and those within the intersect (Table S4).Finally, using all three of the test datasets, we compared
the
time required to run AutoTuner and IPO (Table ). After accounting for number of CPUs, AutoTuner
ran hundreds to thousands of times faster.
Table 3
Run-Times
for AutoTuner and IPO Required
to Run 6 Common Samples Collected in Positive (+) and Negative (−)
Ionization Modesa
algorithm
culture (−)
culture (+)
standards (−)
standards (+)
community
(−)
community (+)
AutoTuner
2 min
9 min
2 min
3 min
25 min
26 min
IPO
7 h 23 min
28 h 40 min
31 h 56 min
28 h 5 min
38 h 4 min
21 h 27 min
ratio (Auto/IPO)
1479
1518
6970
4238
715
396
samples used
6
6
4
4
6
6
All system time measurements
were done on an 8 CPUs and 10 Gb of memory Ubuntu Xenial 16.04 Google
Cloud instance. IPO ran on 8 CPUs, while AutoTuner ran on 1 CPU. The
ratio accounts for the total computing power used to run both algorithms.
All system time measurements
were done on an 8 CPUs and 10 Gb of memory Ubuntu Xenial 16.04 Google
Cloud instance. IPO ran on 8 CPUs, while AutoTuner ran on 1 CPU. The
ratio accounts for the total computing power used to run both algorithms.
Testing the Robustness
of the AutoTuner Estimation
Figure shows the
coefficient of variation and estimate values for each 11 sample subset
for the parameter obtained from
Monte Carlo analysis on the culture and community datasets. Figures S6–S13 show the complete set of
results from the Monte Carlo analysis for all parameters. For all
parameter estimates, the variability of estimation decreased linearly
with the number of samples used under both ionization modes (Figures A and S6–S13). The rendered parameter estimates
were consistent with expectations based on the mass analyzer used
to generate the data (Table ). With the exception of the parameter estimate in the community dataset and the estimate in the negative culture data,
all parameters had a coefficient of variation (CV) of less than or
equal to 0.1 when using 9 samples to obtain the estimates (Figures S6, S8, S10, and S12).
Figure 4
Results from the Monte
Carlo experiment for parameter in
positive ion mode data. (A) The distribution
of the coefficient of variation from parameters within the 11 sample
group, while (B) shows the distribution of all estimates for ppm.
Blue bars describe data collected by the qTOF instrument (community
data) while yellow bars describe FT-ICR-MS data (culture data). See Figures S5–S12 for results on other parameter
estimates in each ionization mode and dataset.
Results from the Monte
Carlo experiment for parameter in
positive ion mode data. (A) The distribution
of the coefficient of variation from parameters within the 11 sample
group, while (B) shows the distribution of all estimates for ppm.
Blue bars describe data collected by the qTOF instrument (community
data) while yellow bars describe FT-ICR-MS data (culture data). See Figures S5–S12 for results on other parameter
estimates in each ionization mode and dataset.
Discussion
AutoTuner is a robust, rapid, and high-fidelity
estimator of untargeted
mass spectrometry data processing parameters. Its unique design improves
upon previous methods by providing a scalable framework to handle
large datasets, reducing run-time, and generating high-accuracy parameters
that retain known features. AutoTuner’s ease of use make it
an ideal candidate to include within existing data processing pipelines.[39−42] AutoTuner’s high
accuracy indicates that its parameter selection is based on true data
features. One possible explanation for the lower accuracy of IPO is
that the peak-width of the missing standards was below the parameter selected by IPO (Table S2 and Figure S2).AutoTuner parameter
estimates were robust across all datasets and
ionization modes. Some parameters like , , , , and reflect systematic properties
inherent to the platform chromatography, mass analyzer, and/or sample
matrix.[43] Other parameters like are more specific to each sample;
hence, the increase in the total number of samples used to estimate
parameters always strengthened their robustness. The low CV values
for parameter estimates suggests that using a subset of samples to
generate estimates returns a set representative for all samples. On
the basis of our results, we recommend the use of 9 and 12 samples
to generate estimates in culture and community datasets, respectively.
For most of the parameters estimated here, 9 samples proved sufficient
to obtain estimates with CV values less than 0.1. The 12-parameter
recommendation originated from extrapolating the linear fits of these
data to obtain 0.1 CV values for remaining parameters that failed
this criterion (Figures S6, S8, S10, and S12). We were unable to check the robustness of the parameter estimate, as this parameter is
estimated through a nonautomatable cross sample comparison during
the TIC peak detection step of the algorithm.Although other
algorithms return more parameter estimates than
AutoTuner, the parameters calculated by AutoTuner represent continuous
valued ones with the greatest possible number of choices. Performing
a parameter sweeping optimization like previous approaches to estimate
the remaining parameters after fixing the AutoTuner derived ones reduces
the total combinations of available centWave parameters from a space
of at least 24 × 58 possible choices of
parameters to one of 40. This is because the centWave algorithm, used
by both XCMS and MZmine2
data processing tools, requires tuning of 11 distinct parameters.
Of these, 8 are continuously valued, meaning that they can be any
real number. The remainder either are boolean values or can be one
of a few discrete choices (Table S5). The
reduction of the total number of combinations is achieved by optimizing
7 of the 8 continuous valued parameters. In regards to the last continuous
parameter not optimized by AutoTuner, mzDiff, we performed a sensitivity
analysis to show that distinct values had a minimal effect on the
returned feature table (Table S6). Future
contributions towards AutoTuner’s design can help the estimation
of additional parameters not covered within its current design.AutoTuner’s low run-time indicates that the algorithm is
scalable (Table ).
As more and more data is generated due to increases in analytical
throughput and dataset size, AutoTuner will remain a tractable option
to generate estimates of metabolomics data processing parameters.[4,44] Because AutoTuner
estimates parameters much faster than IPO and IPO was shown to perform
at a faster rate than software preceding it, we surmise that AutoTuner
is the fastest parameter selection algorithm available.[22]The evaluation of quality between culture
dataset feature tables
generated with IPO- and AutoTuner-derived parameters is impossible
without performing a complete validation of all possible features.
Such analyses are time-consuming, labor intensive, and beyond the
scope of this manuscript. However, the measured properties of these
datasets may provide some expectation for practitioners of metabolomics
of how the data generated using each method may differ.When
considering the unique features identified by each algorithm
in the culture dataset, AutoTuner found fewer features in each case
(Figures and S4). This may be due to the selection of different parameters between AutoTuner and
IPO; the thresholds selected
by IPO were higher under each ionization mode than those selected
by AutoTuner (Table S2). AutoTuner’s
lower estimates do not appear
to be too stringent, as they are between 4 and 6 times greater than
the instrument-recommended error threshold of 0.5 ppm and they are
consistent with recommendations by the “centWave” developers.[36] The size of the processed data using AutoTuner-derived
parameters was in line with previous work validating the metabolome
of Escherichia coli after performing stringent isotope
labeling experiments and quality control filtering.[26] AutoTuner feature selection does not appear to be biased
toward higher intensity features, because the standard dataset processed
with AutoTuner-derived parameters contained a high percentage of possible 13C isotopologues. The paucity of size-validated metabolome
datasets precludes further evaluation of the feature number comparison.
Within the AutoTuner-derived feature tables, those features unique
to the AutoTuner-derived parameters were enriched in MS/MS relative
to the unique IPO-derived features. We stress that features with MS/MS
spectra cannot be assumed to be more or less important within a metabolomics
dataset; however, the presence of these spectra enhances downstream
identification efforts and may be desirable to some investigators.AutoTuner has several avenues for possible improvement. First,
AutoTuner could be parallelized to reduce computation time by a factor
of the total number of CPUs used. Second, additional algorithms may
be implemented to optimize parameters not covered here. One drawback
from the speed gained in the computation through its “divide-and-conquer”
approach comes at a loss of comparing EIC peaks across samples to
estimate retention time correction algorithms. This challenge leaves
room for the implementation of additional algorithms. Third, the replacement
of the sliding window analysis with a more sophisticated and sensitive
peak detection approach may eliminate the need for user input during
the first portion of AutoTuner. However, this automation comes at
the cost of manual inspection of the raw data. We support manual inspection
of the raw data, because it provides a quality control check for the
data generation steps leading up to the analysis. AutoTuner provides
several built-in plotting functions to facilitate this evaluation
step. Despite these minor caveats, AutoTuner is a viable and time-saving
option to determine proper data processing parameters for untargeted
metabolomics data.
Authors: Namrata S Kale; Kenneth Haug; Pablo Conesa; Kalaivani Jayseelan; Pablo Moreno; Philippe Rocca-Serra; Venkata Chandrasekhar Nainala; Rachel A Spicer; Mark Williams; Xuefei Li; Reza M Salek; Julian L Griffin; Christoph Steinbeck Journal: Curr Protoc Bioinformatics Date: 2016-03-24
Authors: Hannes L Röst; Timo Sachsenberg; Stephan Aiche; Chris Bielow; Hendrik Weisser; Fabian Aicheler; Sandro Andreotti; Hans-Christian Ehrlich; Petra Gutenbrunner; Erhan Kenar; Xiao Liang; Sven Nahnsen; Lars Nilse; Julianus Pfeuffer; George Rosenberger; Marc Rurik; Uwe Schmitt; Johannes Veit; Mathias Walzer; David Wojnar; Witold E Wolski; Oliver Schilling; Jyoti S Choudhary; Lars Malmström; Ruedi Aebersold; Knut Reinert; Oliver Kohlbacher Journal: Nat Methods Date: 2016-08-30 Impact factor: 28.547
Authors: Xavier Domingo-Almenara; J Rafael Montenegro-Burke; Carlos Guijas; Erica L-W Majumder; H Paul Benton; Gary Siuzdak Journal: Anal Chem Date: 2019-02-11 Impact factor: 6.986
Authors: Louis-Félix Nothias; Daniel Petras; Robin Schmid; Kai Dührkop; Johannes Rainer; Abinesh Sarvepalli; Ivan Protsyuk; Madeleine Ernst; Hiroshi Tsugawa; Markus Fleischauer; Fabian Aicheler; Alexander A Aksenov; Oliver Alka; Pierre-Marie Allard; Aiko Barsch; Xavier Cachet; Andres Mauricio Caraballo-Rodriguez; Ricardo R Da Silva; Tam Dang; Neha Garg; Julia M Gauglitz; Alexey Gurevich; Giorgis Isaac; Alan K Jarmusch; Zdeněk Kameník; Kyo Bin Kang; Nikolas Kessler; Irina Koester; Ansgar Korf; Audrey Le Gouellec; Marcus Ludwig; Christian Martin H; Laura-Isobel McCall; Jonathan McSayles; Sven W Meyer; Hosein Mohimani; Mustafa Morsy; Oriane Moyne; Steffen Neumann; Heiko Neuweger; Ngoc Hung Nguyen; Melissa Nothias-Esposito; Julien Paolini; Vanessa V Phelan; Tomáš Pluskal; Robert A Quinn; Simon Rogers; Bindesh Shrestha; Anupriya Tripathi; Justin J J van der Hooft; Fernando Vargas; Kelly C Weldon; Michael Witting; Heejung Yang; Zheng Zhang; Florian Zubeil; Oliver Kohlbacher; Sebastian Böcker; Theodore Alexandrov; Nuno Bandeira; Mingxun Wang; Pieter C Dorrestein Journal: Nat Methods Date: 2020-08-24 Impact factor: 28.547