A mass spectrometry (MS) method is described here that can reproducibly identify hundreds of peptides across multiple experiments. The method uses intelligent data acquisition to precisely target peptides while simultaneously identifying thousands of other, nontargeted peptides in a single nano-LC-MS/MS experiment. We introduce an online peptide elution order alignment algorithm that targets peptides based on their relative elution order, eliminating the need for retention-time-based scheduling. We have applied this method to target 500 mouse peptides across six technical replicate nano-LC-MS/MS experiments and were able to identify 440 of these in all six, compared with only 256 peptides using data-dependent acquisition (DDA). A total of 3757 other peptides were also identified within the same experiment, illustrating that this hybrid method does not eliminate the novel discovery advantages of DDA. The method was also tested on a set of mice in biological quadruplicate and increased the number of identified target peptides in all four mice by over 80% (826 vs 459) compared with the standard DDA method. We envision real-time data analysis as a powerful tool to improve the quality and reproducibility of proteomic data sets.
Large-scale proteomic
studies make use of a variety of tools and
techniques to achieve deep and broad coverage of proteomes. The most
popular method for sequencing proteomes is shotgun proteomics, in which
extracted proteins are digested into peptides, which are separated by liquid chromatography
(HPLC) and then mass-analyzed using mass spectrometry (MS).[1,2] Since complex proteomes can encompass thousands of proteins, leading
to millions of peptides, deciding how to allocate the limited mass
spectrometer bandwidth is key to successful analysis.[3] By far the most successful method for this time management
is data-dependent acquisition (DDA), where intact peptide precursors
are first mass-analyzed (MS1), specific m/z features are then selected to undergo fragmentation,
and finally the fragment ions are mass-analyzed again (MS/MS). This
process is repeated throughout the LC separation, resulting in a large
collection of MS and MS/MS spectra. Peptides are eventually identified
from the fragmentation spectra and then assembled into protein groups.[4−8] This approach has produced outstanding results in the past decade,
but due to a variety of reasons (e.g., large protein dynamic range,
speed of MS instrumentation, separation efficiency, etc.) undersampling
of proteomes is very common. In other words, not every peptide is
identified in every nano-LC–MS/MS experiment. Incomplete data
sets limit the questions researchers can answer; in particular, when
biological replication is used to increase statistical power, many
measurements become unusable if they cannot be made reproducibly.[9] Because proteomics seeks to answer global biological
questions, reproducible peptide identification between data sets is
essential.[10−12]
Many studies have outlined the problem of poor
peptide reproducibility.[13−17] Aebersold succinctly summarized that irreproducibility is a multifaceted
issue, depending on user experience, equipment, and data analysis,
among others.[18] He outlines two main approaches to tackling
irreproducibility. The first is to exhaustively
identify every peptide in a sample, an approach that is becoming
more feasible as technology improves.[19−21] The second and more common approach,
taken by many researchers, is to focus on a smaller
subset of peptides and to thoroughly identify and quantify those using
targeted methods.[22] Methods such as selected
reaction monitoring (SRM)[23] are powerful
and reproducible but are low-throughput, targeting a few hundred peptides
at most in a single nano-LC–MS/MS experiment.[24−27] Targeted methods almost exclusively rely on retention-time-based
scheduling to improve identification reproducibility and throughput,
segmenting the MS duty cycle among the target peptides. In SRM methods,
a series of MS/MS transitions for each targeted peptide is automatically
collected at the appropriate retention time (RT), removing the dependence
on MS1 detection. This requires precise knowledge of the
peptide RT for the LC–MS system and is low-throughput because
only one set of transitions is monitored at a given point in time.
Recent work on intelligent SRM (iSRM) increases throughput by monitoring
only a subset of transitions for each target, switching to normal
SRM when these transitions are detected.[28] We sought to expand upon the idea of intelligent real-time switching
of methods by combining the enhanced reproducibility of targeted scheduled
methods with the novel discovery advantages of DDA in a single hybrid
method. Our goals were three-fold: first, to develop a method that
increases the throughput of targeting; second, to replace retention-time-based
scheduling and its laborious method development with a more
robust and straightforward peptide elution ordering; and last, to
maintain the discovery aspect of DDA sampling while simultaneously
targeting a subset of peptides.
Over the past decades, a few computational
approaches have been aimed
at solving the problem of poor reproducibility. The concept of accurate
mass tags (AMTs) was first introduced by Smith et al. as a means to
identify peptides in multiple runs based on accurate mass and RT.[29] This concept was further expanded with PepMiner
and PEPPeR, tools for clustering features among multiple data sets.[30,31] Most notably, Prakash et al. introduced the concept of aligning
multiple MS data sets based on peptide relative elution order (EO)
into signal maps.[32] To date, these and
other computational methods[33−38] have been applied postacquisition, attempting to improve already
collected data. We seek to improve the reproducibility at the source
by improving the algorithms the MS uses to select precursors to fragment.
We and others have proposed using real-time data analysis and dynamic
MS control as a means for improving the quality of acquired spectra.[39−41] These methods rely on determining peptide spectrum matches (PSMs)
in real time and using those identifications to make informed, dynamic
decisions. However, real-time identification has some drawbacks: (1)
MS/MS spectra are not always identified, leaving the data incomplete,
(2) wrongly assigned PSMs could negatively affect performance, and
(3) a reduction in the instrument duty cycle decreases the number
of MS/MS scans performed. These and other issues have led us to investigate
alternative ways for detecting peptides in real-time, primarily through
accurate mass measurements. Here we present our findings on combining
accurate mass, EOs, and real-time data analysis to improve the sampling
reproducibility of the MS.
Experimental Procedures
Yeast Culture
Saccharomyces cerevisiae strain BY4741 was grown
in yeast extract peptone dextrose media
(YPD) (1% yeast extract, 2% peptone, 2% dextrose). A starter culture
was added to 2 L of media and was propagated for ∼12 generations
(20 h) to a total OD600 of ∼2. The cells were pelleted
with centrifugation at 5000 rpm for 5 min, the supernatant was decanted,
and the pellet was resuspended in chilled NanoPure water. Washing
with water was repeated twice, and the final pelleting was performed
at 5000 rpm for 10 min. The pellet was resuspended in lysis buffer
composed of 50 mM Tris (pH 8), 8 M urea, 75 mM sodium chloride, 100 mM
sodium butyrate, and protease and phosphatase inhibitor tablets (Roche).
Cell lysis was performed with glass bead milling in a stainless-steel
container (Retsch). A 2.5 mL aliquot of resuspended yeast was shaken
with 2 mL of acid-washed glass beads at 30 Hz for 4 min, followed
by 1 min of rest, for eight cycles.
Mouse Handling and Tissue Isolation
Four male C57BL/B6
mice were bred from in-house colonies and housed in an environmentally
controlled facility with free access to water and standard rodent
chow (Purina #5008). Mice were kept in accordance with the University
of Wisconsin-Madison Research Animals Resource Center and NIH guidelines
for care and use of laboratory animals. At 10 weeks of age, mice were
sacrificed by decapitation after a 4 h fast. Eight tissues were dissected
from the mice (cerebellum, cerebrum, kidney, heart, liver, lung, extensor
digitorum longus, and spleen), flash frozen in liquid nitrogen, and
stored at −80 °C. Tissues were homogenized in 1 mL of
lysis buffer/100 mg tissue (8 M urea, 50 mM Tris, 100 mM NaCl, 1 mM
CaCl2, 100 mM sodium butyrate, 5 μM MS-275, 0.2 μM
SAHA, and Roche protease and phosphatase inhibitor tablets).
Sample Preparation
Protein was quantified by BCA assay (Pierce),
reduced with 5 mM dithiothreitol, and incubated for 45 min at 55
°C. Alkylation was performed with 15 mM iodoacetamide for 30
min in the dark and quenched with 5 mM dithiothreitol. The urea concentration
was diluted to 1.5 M with 50 mM Tris pH 8.0. Proteolytic digestion
was performed by the addition of trypsin (Promega) at a 1:50 enzyme to
protein ratio, followed by incubation at ambient temperature overnight. For
quantitative studies, the resulting peptides were labeled with TMT
8-plex isobaric tags (Pierce) and mixed.[42,43] All samples
were desalted using C-18 solid-phase extraction (SPE) columns (Waters,
Milford, MA) prior to nano-LC–MS/MS analysis.
Nano LC–MS/MS Analysis
Peptides were separated
with online reverse-phase chromatography using a nanoACQUITY UPLC
system (Waters, Milford, MA). Peptides were first loaded onto a precolumn
(75 μm ID, 5 cm Magic C18 particles, Bruker, Michrom) for 10
min at a flow rate of 1 μL/min. Peptides were then separated on
a 30 cm analytical column (75 μm ID, 5 cm Magic C18 particles)
for either 100 or 165 min over a linear gradient from 8 to 35% acetonitrile
at 300 nL/min. Mass analysis was performed on an LTQ Orbitrap Elite[44] mass spectrometer (Thermo Fisher Scientific,
San Jose, CA) using MS1 scans at 60 000 resolving power (RP). Peptides selected for MS/MS analysis were isolated with a 2 Th isolation
width, fragmented with HCD (NCE = 35), and analyzed in the
Orbitrap at 15 000 RP (30 000 RP for quantitative
experiments). Unless otherwise noted, data-dependent acquisition was performed,
selecting the top 15 most intense m/z features (charge state >1) for MS/MS analysis. Dynamic exclusion
was enabled for 35 s with a ±10 ppm mass window, a repeat count of 1,
and a maximum of 500 exclusions at any given point in time. Automatic
gain control (AGC) was enabled, with MS1 targets set
to 1 × 10⁶ and MS/MS targets set to 5 ×
10⁴. Accurate mass inclusion list experiments prioritized
MS/MS sampling from a list of targets at a ±10 ppm mass tolerance.
Remaining MS/MS events were filled using the normal top-N DDA approach.
Intelligent data acquisition control was implemented using the ion
trap control language (ITCL, Thermo Fisher Scientific), and the pseudocode
of these modifications is included in the Supporting
Information. In brief, following MS1 analysis, the
spectra were analyzed using algorithms written in ITCL to select targets
for MS/MS analysis (described below). Any remaining MS/MS slots were
filled by the unmodified DDA firmware code. For information on
implementing the modified firmware code, please contact Thermo Fisher
Scientific. All nanoLC–MS/MS experiments in the Thermo .raw
format are located on the Chorus Project Website (https://chorusproject.org/) under the ‘Elution Order Algorithm’ project.
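The inclusion-list selection rule above (list matches at ±10 ppm first, remaining slots filled top-N from features with charge >1) can be sketched in Python. This is a minimal illustration, not the ITCL firmware code; the `Feature` type and function names are hypothetical, applying the charge filter to list matches as well is an assumption, and dynamic exclusion is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """One m/z feature from an MS1 survey scan (hypothetical type)."""
    mz: float
    intensity: float
    charge: int  # 0 when no charge state could be assigned


def ppm_error(observed: float, reference: float) -> float:
    """Signed mass error of `observed` relative to `reference`, in ppm."""
    return (observed - reference) / reference * 1e6


def select_precursors(features, inclusion_mz, top_n=15, tol_ppm=10.0):
    """Choose up to `top_n` MS/MS precursors from one MS1 scan.

    Features within +/- tol_ppm of an inclusion-list m/z are taken first
    (most intense first); remaining slots are filled top-N style.
    Only features with an assigned charge > 1 are considered (assumption).
    """
    def on_list(f):
        return any(abs(ppm_error(f.mz, t)) <= tol_ppm for t in inclusion_mz)

    eligible = sorted((f for f in features if f.charge > 1),
                      key=lambda f: f.intensity, reverse=True)
    listed = [f for f in eligible if on_list(f)]
    others = [f for f in eligible if not on_list(f)]
    return (listed + others)[:top_n]


scan = [Feature(500.2500, 1e6, 2), Feature(600.3000, 5e5, 2),
        Feature(700.0000, 2e6, 1), Feature(650.1234, 3e5, 3)]
picked = select_precursors(scan, inclusion_mz=[600.3030], top_n=2)
print([f.mz for f in picked])  # [600.3, 500.25]
```

In this sketch a list match still requires the precursor to be detected in MS1 with an assigned charge, which is the dependence that the intelligent acquisition method described later is designed to remove.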
Data Analysis
Thermo .raw files were processed using
the Coon OMSSA Proteomic Analysis Software Suite (COMPASS)[45] and in-house software. In brief, raw files were
converted to the dta file format (DTA Generator) and were searched
using the Open Mass Spectrometry Search Algorithm (OMSSA, v 2.1.9).[46] Yeast data were searched against a target-decoy[47] database of yeast ORFs (www.yeastgenome.com, February 3, 2011), and mouse data against the canonical UniProt mouse database.
Searches assumed tryptic digestion with up to three
missed cleavages, carbamidomethylation of cysteine as a fixed modification,
and oxidation of methionine as a variable modification. For quantitative
experiments, the 8-plex TMT tag was added as a fixed modification on lysines
and peptide N-termini, and as a variable modification on
tyrosines. Precursor mass tolerance was 100 ppm using the multiisotope
function (-tem 4 -ti 4), and product ions were searched at 0.015 Da
tolerances. Peptide spectral matches (PSMs) were reduced to unique
peptide sequences (I/L ambiguity removed) and validated using FDR
Optimizer based on q values and precursor mass accuracy
(<10 ppm) at a 1% peptide-level false discovery rate (FDR).[48−50] Protein groups were constructed from peptide identifications according
to the principle of parsimony and filtered to a 1% protein-level FDR (Protein
Hoarder). For quantitative data sets, peptides were quantified with
TagQuant (v1.4) using the generated TMT 8-plex reporter ions, corrected
for isotopic impurities, and normalized to total protein abundance.
Quantitative significance (p values) was determined
by Student’s t test, assuming equal variances, with Storey
correction.[51] Peptide EO
determination algorithms were implemented in custom software developed
in C# with the Microsoft .NET Framework version 4.5. This software
is available for download by visiting www.chem.wisc.edu/∼coon/software.php.
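The peptide-level filtering relies on target-decoy q-values. FDR Optimizer's own procedure is not described here, so the following is only a generic sketch of target-decoy q-value estimation combined with the <10 ppm precursor filter. It assumes a score where higher is better (OMSSA E-values would first need converting, e.g., to −log10 E), and the function names are hypothetical.

```python
def q_values(scores, is_decoy):
    """Target-decoy q-values; higher score = better match.

    FDR at a score threshold is estimated as decoys / targets among PSMs at or
    above it; the q-value is the smallest FDR at which a PSM is accepted.
    Ties in score are not treated specially in this sketch.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fdr, targets, decoys = [], 0, 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdr.append(decoys / max(targets, 1))

    # Walk from the worst score upward, keeping the running minimum FDR.
    q = [0.0] * len(scores)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdr[rank])
        q[order[rank]] = running_min
    return q


def accepted_targets(scores, is_decoy, ppm_errors, max_q=0.01, max_ppm=10.0):
    """Indices of target PSMs passing both the q-value and mass-error filters."""
    q = q_values(scores, is_decoy)
    return [i for i in range(len(scores))
            if not is_decoy[i] and q[i] <= max_q and abs(ppm_errors[i]) < max_ppm]


scores = [10.0, 9.0, 8.0, 7.0, 6.0]
decoy = [False, False, True, False, True]
ppm = [1.0, 2.0, 3.0, 20.0, 1.0]
print(accepted_targets(scores, decoy, ppm))             # [0, 1]
print(accepted_targets(scores, decoy, ppm, max_q=0.4))  # [0, 1]; index 3 fails the ppm filter
```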
Results and Discussion
Irreproducible Peptide Identification
In DDA, peptide
precursors are selected for fragmentation based on intensity in an
MS1 survey scan. This straightforward approach has proven
to be a powerful technique. However, it is plagued by
inconsistent sampling and therefore irregular peptide identification
between experiments. The DDA method is inherently stochastic in nature,
depending heavily on the consistency of the input data (MS1) to deliver reproducible peptide identification (MS/MS). Even the
slightest change in the chromatography or ionization efficiencies
will have repercussions on the collection of the whole data set, as
selecting m/z features for MS/MS
analysis is often dependent on previous decisions (e.g., dynamic exclusion).
To characterize the effect these minor changes have on the reproducibility
of peptide identifications, six replicate injections of a tryptic
digest of yeast whole cell lysate were analyzed using DDA on the same
nano-LC–MS/MS system over a span of 10 days. On average, each
experiment identified 13 289 ± 340 unique peptide sequences
(I/L ambiguity removed) at a 1% peptide-level FDR, indicating a highly
consistent separation and nearly identical instrument performance.
Of the 23 919 unique peptides identified in total, only 5404
(22.6%) were identified in all six experiments (Figure 1). A significant portion were only identified once
(7474, 31.2%), while the remaining peptides were identified in between two
and five experiments. This clearly demonstrates the irreproducibility
of DDA sampling on the same peptide solution. The reproducibility
of identified protein groups fares better; 1708 of 3054 (56%) protein
groups were identified in every experiment. The higher overlap percentage
is because many different peptides can make up one protein group,
minimizing the importance of identifying the same peptides in all
experiments. However, post-translational modification (PTM) analysis
requires identification of the same sites to compare between experiments,
which demands high peptide overlap. PTM analysis and quantitation
are becoming more prominent in the literature, making this a growing
problem in the field. Two factors contribute to the poor reproducibility
of stochastic DDA sampling. First, precursors having low signal-to-noise
(S/N) are affected first by changes in chromatography and ionization.
For example, a precursor with a maximal S/N of four may have been
sampled and identified in one experiment, but in the next experiment,
the S/N may have dropped below the detection threshold, excluding the precursor
from sampling. This is evident when 8883 MS1 features
from peptides identified in one or all of the six experiments were
examined for their maximal S/N (Supplemental Figure 1 in the Supporting Information). For peptides identified
once, 2707 (30.5%) had a maximal S/N ≤ 4, while only 814 (9.2%)
precursors identified in every experiment had similar maximum S/N.
The other reason for inconsistent peptide identification is increased
MS1 spectral complexity, specifically its effect on charge-state
assignment. In proteomic MS/MS workflows, precursors are often only
selected when they exhibit a well-defined charge state—usually
where z > 1, as singly charged precursors fragment
poorly and usually do not lead to positive identifications. Increases
in spectral complexity hinder the charge-state determination algorithms,
especially for low S/N precursors. As a result, precursors can be skipped
even when their S/N is above the sampling threshold.
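The replicate-overlap analysis summarized in Figure 1 amounts to counting, for each unique sequence, how many runs identified it. A minimal Python sketch follows; the input format (one set of sequences per run) is an assumption, and I/L are collapsed by rewriting every I as L, mirroring the "I/L ambiguity removed" convention used in the text.

```python
from collections import Counter


def normalize(seq: str) -> str:
    """Collapse the isobaric I/L ambiguity by writing every I as L."""
    return seq.upper().replace("I", "L")


def overlap_histogram(runs):
    """Map k -> number of unique peptides identified in exactly k runs."""
    seen_in = Counter()
    for run in runs:
        # A set per run so that each peptide counts at most once per run.
        seen_in.update({normalize(p) for p in run})
    return dict(sorted(Counter(seen_in.values()).items()))


runs = [{"PEPTIDEK", "AIK"}, {"PEPTLDEK", "GGK"}, {"PEPTIDEK", "ALK"}]
hist = overlap_histogram(runs)
print(hist)  # {1: 1, 2: 1, 3: 1}
total = sum(hist.values())
print(f"{hist[len(runs)] / total:.1%} identified in all runs")
```

Applied to the six yeast replicates, `hist[6] / total` corresponds to the 5404 of 23 919 (22.6%) figure quoted above.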
Figure 1
Overlap of
peptide identifications among six technical
replicate analyses. Six nano-LC–MS/MS experiments produced 23 919
unique peptide identifications in total, but only about one-fifth of the
identifications were observed in all six replicates. A large percentage
(31.2%) of the peptides were only detected in one of the six experiments.
Retention-Time-Based Targeting
When good peptide identification
reproducibility is needed, RT-based targeting, that is, scheduling,
has been the method of choice. Here peptides of interest are assigned
an expected elution time and MS/MS is triggered, regardless of MS1 detection, during the appropriate time range. This avoids
the two issues with DDA sampling previously described and enables
much higher reproducibility. However, such methods are laborious to
construct and maintain—identical LC and MS parameters must
be kept between experiments to minimize any variances in RTs of the
peptides.
To assess the degree of variance in peptide RTs that
occurs in normal nano-LC–MS/MS experiments, two of the yeast DDA experiments described above, performed 10 days apart, were compared.
The first experiment (July 22, D0) produced 13 529 unique peptides,
and the second experiment (July 31, D9) identified 13 433 yeast peptides. Together, 7589 peptides were in common, and the apex RT of
their elution in each experiment is plotted in Figure 2A. The relationship between RTs of matched peptides is highly
linear (R² = 0.9989) but has a nonunity
slope and nonzero intercept (m = 1.033; b = −0.647). While the slope is very close to 1, even the slightest
deviation (0.033), accumulated over the gradient, leads to large RT differences
late in the separation (e.g., a ∼1.6 min shift at 70 min). On
the whole, the average RT deviation was nearly 1 min (μ = −0.805
min) with a broad distribution over a 2 min range (Figure 2B). Typically, the assigned peptide elution times
must be corrected to accommodate this shift.
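The linear relationship in Figure 2A, and the late-gradient shift it implies, can be reproduced with an ordinary least-squares fit. This sketch assumes D0 apex RTs on the x axis and D9 on the y axis (the text does not state the orientation). With the reported m = 1.033 and b = −0.647, the predicted shift at 70 min is 0.033 × 70 − 0.647 ≈ 1.66 min, consistent with the ∼1.6 min quoted above.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = m*x + b. Returns (m, b, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    m = sxy / sxx
    b = mean_y - m * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return m, b, r_squared


def predicted_shift(t, m, b):
    """Expected RT difference (y - x) for a peptide eluting at time t in run x."""
    return (m - 1.0) * t + b


print(round(predicted_shift(70.0, 1.033, -0.647), 3))  # 1.663
```

Because the shift grows linearly with t, a single global time offset cannot correct both early- and late-eluting peptides, which is why RT windows must be recalibrated.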
Figure 2
To assess the deviation
in retention times for matched samples,
we ran two identical nano-LC–MS/MS experiments 10 days apart
on the same LC–MS system. (A) The relationship between apex
retention times of the 7589 unique peptides common between experiments
displays a high degree of linearity (R² = 0.9989) but a nonunity slope and nonzero intercept (m = 1.033; b = −0.647). (B) The average deviation
from unity was nearly a minute (μ = −0.805 min),
with a broad distribution about 2 min wide. (C) Differences between the relative elution order ranks of matched peptides
exhibit a normal distribution around
zero (μ = −1.097).
We hypothesize that, given the degree of linearity in peptide
RTs, we could avoid these corrections by scheduling peptides
based on their relative EO, as opposed to their absolute RT. Under similar
LC conditions (i.e., same particles, temperature, column length, phase,
etc.) peptides elute in the same relative order regardless of separation
duration or slope. For example, if peptide ‘A’ elutes
before peptide ‘B’ in a 30 min LC gradient, the same
ordering is preserved with a 60 min LC gradient, even if the absolute
RTs vary greatly. When many peptides’ EOs are taken into account
(e.g., thousands of peptides), they provide a simple way to correct
for elution variation dynamically. This is evident when we took the
7589 peptides and rank ordered them based on their apex RTs for both
the D0 and D9 experiments and plotted the difference between matched
peptides (Figure 2C). Here the values are normally
distributed around zero (μ = −1.097) with a full width
at half-maximum (fwhm) of only ∼100. EO can be useful even
under extreme differences in chromatographic conditions.
To simulate dynamic chromatographic conditions, we separated yeast peptides under two different LC gradient profiles. The resulting peptide
identifications were again matched between the runs, and the RT difference
was plotted (Supplemental Figure 2A in the Supporting
Information). These data show an average deviation of 10 min
between the two gradients (Supplemental Figure 2B in the Supporting Information), but when ranked by their
EOs, the two experiments show a linear slope of 1 with a normal distribution
of ranked EOs around zero (Supplemental Figure 2C,D in the Supporting Information).
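The rank-order comparison behind Figure 2C and Supplemental Figure 2 can be sketched as follows; peptide names and RT values are hypothetical. One design choice matters: if ranks are computed only over the shared peptides, both rank lists are permutations of the same integers and the mean difference is forced to exactly zero. The sketch therefore ranks each run over all of its identifications by default and exposes a `shared_only` switch, since the text does not say which convention was used.

```python
def elution_ranks(apex_rt):
    """Map peptide -> elution rank (0 = first to elute) from apex retention times."""
    ordered = sorted(apex_rt, key=apex_rt.get)
    return {pep: rank for rank, pep in enumerate(ordered)}


def rank_differences(run_a, run_b, shared_only=False):
    """Rank difference (b - a) for every peptide identified in both runs."""
    shared = run_a.keys() & run_b.keys()
    if shared_only:
        run_a = {p: run_a[p] for p in shared}
        run_b = {p: run_b[p] for p in shared}
    ranks_a, ranks_b = elution_ranks(run_a), elution_ranks(run_b)
    return {p: ranks_b[p] - ranks_a[p] for p in shared}


d0 = {"A": 10.0, "B": 12.0, "C": 15.0, "D": 20.0}
# Same peptides on a gradient twice as long and offset by 3 min.
slow = {p: 2.0 * rt + 3.0 for p, rt in d0.items()}
print(rank_differences(d0, slow))  # every difference is 0
```

The usage line illustrates the central claim: stretching and shifting the gradient changes every absolute RT but leaves the elution order, and hence every rank difference, unchanged.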
Real-Time Elution Ordering Alignment
We reasoned that
using EO could improve the irreproducible sampling of DDA, similar
to scheduled methods, but on a larger scale and more robustly. The
question shifts from “What RT is it?”, as scheduled methods
ask, to “What is the current EO?” By knowing which peptides
are currently eluting from the LC, combined with a priori knowledge
of their EO, we can predict with high fidelity which peptides are going
to elute next.
Prior knowledge of the sample is needed
to adequately calculate the EOs of its peptides. With
time-based scheduled methods, many cursory experiments are performed
to optimize the RTs of the targeted peptides. To reduce variances
in RTs, it is vital that these initial experiments are conducted in exactly
the same way as the targeted experiments. In stark contrast, EOs can be
determined using a variety of methods. First, much work has been devoted
to determining peptide hydrophobicities from theoretical calculations
of the amino acid sequence.[52−55] A simple list of peptides, ordered by their hydrophobicities,
can produce a highly linear elution ordering. Second, previously collected
data of the sample can produce an accurate elution ordering as long
as the LC conditions are similar enough. This enables the combination
of multiple data sets to produce a single EO versus m/z map (elution order map, EOM), regardless of their
individual separation durations. This is accomplished by rank ordering
all peptide identifications in a given run and normalizing their orderings
between 0 and 100 (where 100 represents the last eluting peptide).
These normalized values are then matched between experiments and aligned
using a simple algorithm to produce the final EOM as shown in Figure 3A. Lastly, the most robust method for determining
peptide EOs is to perform a discovery experiment right before the
targeted experiment. Regardless of how EO is determined, the final
EOM is uploaded onto the instrument and is accessed throughout the
course of the subsequent analyses.
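The normalization step (rank order each run, scale to 0 to 100, then merge runs into one EO versus m/z map) might look like the sketch below. The text calls the alignment "a simple algorithm" without specifying it; averaging each peptide's normalized EO across runs is an assumption, as is the one-m/z-per-peptide input (a real map would carry one entry per observed charge state).

```python
def normalized_order(apex_rt):
    """Map peptide -> elution order scaled to 0..100 (100 = last to elute)."""
    ordered = sorted(apex_rt, key=apex_rt.get)
    if len(ordered) == 1:
        return {ordered[0]: 0.0}
    last = len(ordered) - 1
    return {pep: 100.0 * rank / last for rank, pep in enumerate(ordered)}


def build_eom(runs, precursor_mz):
    """Merge runs (peptide -> apex RT) into a list of (m/z, EO) sorted by m/z.

    Each peptide's EO is the mean of its normalized orders over the runs in
    which it was identified (assumed merging rule).
    """
    totals, counts = {}, {}
    for run in runs:
        for pep, eo in normalized_order(run).items():
            totals[pep] = totals.get(pep, 0.0) + eo
            counts[pep] = counts.get(pep, 0) + 1
    return sorted((precursor_mz[p], totals[p] / counts[p]) for p in totals)


short_run = {"A": 5.0, "B": 10.0, "C": 20.0}             # e.g., 100 min gradient
long_run = {"A": 30.0, "B": 50.0, "C": 60.0, "D": 90.0}  # e.g., 165 min gradient
mz = {"A": 400.0, "B": 500.0, "C": 450.0, "D": 600.0}
for entry in build_eom([short_run, long_run], mz):
    print(entry)
```

Normalizing to 0 to 100 is what allows runs with different gradient lengths to be combined: only the relative position of each peptide within its own run enters the map.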
Figure 3
Real-time elution order alignment algorithm.
46.3 min into a nano-LC–MS/MS
experiment, an MS1 scan is performed (A) and m/z features are matched to a 2D ion map stored on
the instrument. (B) 21 of the peaks match 80 features in the ion map
at a 10 ppm tolerance. Of these, over half (41 of 80) were mapped
to one elution order bin (51 elution order). (C) A rolling elution
order range is continually updated throughout the nano-LC–MS/MS
experiment.
Prior to targeted analysis,
a list of peptide targets, along with
their relative EOs, is also uploaded to the instrument (Figure 4B). Each target is assigned an EO range (first and
last appearance) depending on its elution duration in the discovery
experiments (see Figure 4C for a zoomed-in view). Maintaining
a dynamic EO range for each peptide is needed because different peptides
elute for different amounts of time during the separation. During the
targeted analysis, instead of relying on absolute RT to trigger targeted
MS/MS scans, determining the current EO becomes the main goal of the
method. We have designed an online peptide elution order alignment
(EOA) algorithm that takes a single MS1 spectrum and computes
the current EO therefrom. In brief, following MS1 acquisition,
the EOA algorithm takes the most intense m/z feature and extracts all EO values from the uploaded EOM
at a narrow m/z tolerance (e.g.,
10 ppm) (Figure 3A). Each m/z feature is matched in a similar fashion, and the resulting EO values
are stored in a separate array (Figure 3B).
In this example MS1, 21 m/z features matched a total of 80 EO values. When binned into 1 EO-wide
bins, 41 of these values are contained within a single bin at 50 EO
units. This indicates with high confidence that the current EO is
somewhere near 50. To refine this estimate, the algorithm then
calculates the 95% confidence interval around the max EO bin and stores
the minimum (50.02) and maximum (51.64) EO. This process is repeated
for each MS1, and over time the calculated EO range constructs
a rolling average, as shown in Figure 3C. The
EOA algorithm is fast, taking on average 26 ms per MS1 spectrum to execute, and does not induce a statistically significant change
in the total number of MS/MS scans performed (Supplemental Figure
3 in the Supporting Information).
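A minimal Python sketch of the per-scan EO estimate described above follows: ppm matching of each MS1 peak against the EOM, a histogram of matched EO values in 1-EO bins, and an interval around the most populated bin, smoothed over successive scans. The exact confidence-interval definition is not given in the text; here it is assumed to be a normal-approximation interval (mean ± 1.96 SD) of the hits in the top bin and its two neighbours, and the five-scan rolling window is likewise hypothetical. This is not the instrument code, which was written in ITCL.

```python
import bisect
import statistics
from collections import Counter, deque


def current_eo_range(peaks_mz, eom, tol_ppm=10.0, bin_width=1.0, z=1.96):
    """Estimate the current (low, high) EO from one MS1 scan.

    peaks_mz: m/z values of the MS1 features; eom: (m/z, EO) pairs sorted by m/z.
    Returns None when no feature matches the map.
    """
    eom_mz = [mz for mz, _ in eom]
    hits = []
    for mz in peaks_mz:
        tol = mz * tol_ppm * 1e-6
        lo = bisect.bisect_left(eom_mz, mz - tol)
        hi = bisect.bisect_right(eom_mz, mz + tol)
        hits.extend(eo for _, eo in eom[lo:hi])
    if not hits:
        return None

    bins = Counter(int(eo // bin_width) for eo in hits)
    best = max(bins, key=bins.get)  # most populated EO bin
    near = [eo for eo in hits if abs(int(eo // bin_width) - best) <= 1]
    if len(near) < 2:
        return (best * bin_width, (best + 1) * bin_width)
    mean, sd = statistics.fmean(near), statistics.stdev(near)
    return (mean - z * sd, mean + z * sd)


class RollingEO:
    """Average the last `window` per-scan EO ranges (window size is an assumption)."""

    def __init__(self, window=5):
        self.lows = deque(maxlen=window)
        self.highs = deque(maxlen=window)

    def update(self, eo_range):
        if eo_range is not None:
            self.lows.append(eo_range[0])
            self.highs.append(eo_range[1])
        if not self.lows:
            return None
        return (statistics.fmean(self.lows), statistics.fmean(self.highs))


eom = [(500.000, 10.2), (500.002, 50.3), (600.000, 50.5),
       (600.001, 90.0), (700.000, 50.9), (800.000, 51.2)]
scan_range = current_eo_range([500.001, 600.0005, 700.0, 900.0], eom)
print(scan_range)  # an interval around the 50-51 EO bin
```

Spurious matches (here EO 10.2 and 90.0) fall into sparsely populated bins and are ignored, which is why binning makes the estimate robust to isobaric coincidences in the map.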
Figure 4
Following determination
of the current elution order range (A),
target peptides (B) sharing a similar elution order value are selected
(C, rectangles represent individual peptides). Peptide targets within
the elution order range are filtered based on when they were last
sampled for MS/MS (D), leaving only targets that have been waiting
the longest (e.g., > 5 s, highlighted rectangles). Those filtered
peptides are then immediately sampled by MS/MS, regardless of MS1 detection (D). Unfilled MS/MS events are automatically filled
with m/z features picked by the
intensity-based DDA algorithm using normal sampling parameters (e.g.,
dynamic exclusion, intensity threshold, charge state exclusion, etc.).
Once the current EO range is determined,
peptides sharing a similar
EO are selected for MS/MS analysis. In brief, the current EO range
is intersected with the target peptides already uploaded on the instrument
(Figure 4B), and peptides whose EO overlaps
the current EO range are stored as potential targets (Figure 4C). These peptides have a high probability of imminently
eluting because they share very similar EO values with the current
overall EO value. To prevent oversampling of any given target, potential
targets are filtered based on the time elapsed since they were last sampled.
Peptides that have been waiting the longest (e.g., >5 s) are automatically
triggered for MS/MS analysis regardless of MS1 detection.
Unfilled MS/MS events are then populated using normal DDA top-N approaches,
excluding any m/z previously selected
to be targeted (Figure 4D). This data collection
scheme enables repetitive, consistent targeting of multiple peptides
over their elution while allowing DDA scans to facilitate discovery.
The EOA algorithm is compatible with other quantitative strategies
such as parallel reaction monitoring (PRM),[56,57] where peptide targets are repeatedly sampled (MS/MS) over their
elution, and the resulting fragment ions are extracted to provide
quantitative information (Supplemental Figure 4 in the Supporting Information).
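The target-selection step (intersect the current EO range with each target's EO window, keep only targets not sampled in the last 5 s, trigger those, then fill the remaining slots with DDA picks) can be sketched as follows. The `Target` fields, the per-cycle cap of `top_n` scans, and excluding DDA candidates only within ±10 ppm of targets chosen in the same cycle are assumptions for illustration, not details taken from the firmware.

```python
from dataclasses import dataclass


@dataclass
class Target:
    """An uploaded target with its EO window (hypothetical structure)."""
    name: str
    mz: float
    eo_first: float
    eo_last: float
    last_sampled: float = float("-inf")  # seconds; -inf = never sampled


def due_targets(targets, eo_range, now, min_wait=5.0):
    """Targets whose EO window overlaps eo_range and that have waited > min_wait s."""
    low, high = eo_range
    due = [t for t in targets
           if t.eo_first <= high and t.eo_last >= low
           and now - t.last_sampled > min_wait]
    return sorted(due, key=lambda t: t.last_sampled)  # longest-waiting first


def plan_ms2(targets, eo_range, now, dda_candidates, top_n=15,
             min_wait=5.0, tol_ppm=10.0):
    """Return the m/z values to fragment this cycle: targets first, then DDA fill.

    dda_candidates: m/z values already ranked by the normal top-N DDA logic.
    """
    chosen = due_targets(targets, eo_range, now, min_wait)[:top_n]
    for t in chosen:
        t.last_sampled = now  # triggered regardless of MS1 detection

    def is_targeted(mz):
        return any(abs(mz - t.mz) <= t.mz * tol_ppm * 1e-6 for t in chosen)

    fill = [mz for mz in dda_candidates if not is_targeted(mz)]
    return [t.mz for t in chosen] + fill[: top_n - len(chosen)]


targets = [Target("T1", 500.0, 40.0, 45.0),
           Target("T2", 600.0, 49.0, 52.0),
           Target("T3", 700.0, 50.5, 53.0, last_sampled=98.0)]
print(plan_ms2(targets, (50.0, 51.6), now=100.0,
               dda_candidates=[600.001, 650.0, 750.0], top_n=3))
# [600.0, 650.0, 750.0]: T1 is outside the EO range, T3 was sampled 2 s ago
```

The waiting-time filter spreads MS/MS events across all co-eluting targets instead of repeatedly sampling the same one, which is what allows repeated sampling over each peak for PRM-style quantitation.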
Improving Peptide Identification in Multiple Experiments
We reasoned that the EOA algorithm
would improve the reproducibility
of peptide identification across multiple runs. Additionally, we increased
the proteomic complexity by using a mammalian system (mouse) instead
of yeast to determine how sample complexity affects the algorithm.
Here a male C57BL/B6 mouse was sacrificed at 10 weeks, eight organs
were harvested, and peptides from a tryptic digestion of each organ
were labeled with a TMT 8-plex tag. First, six DDA top-15 nano-LC–MS/MS
experiments were performed on the peptide sample. From the results
of these discovery experiments, 500 peptides—identified in
only three of the six experiments—were randomly selected to
serve as peptide targets. These targets represent peptides that are
difficult to identify reproducibly using standard DDA methodology.
Additionally, we chose 500 targets because this is the maximum
number of targets an inclusion list on
the Orbitrap Elite MS supports in a single nano-LC–MS/MS analysis. Each
target peptide’s EO was calculated from the three discovery
experiments in which it was identified, combined into a single EOM, and
then uploaded to the instrument prior to targeting (Figure 4B). The same vial of mouse peptides, kept at 4 °C
in an autosampler, used in the discovery experiments was then analyzed
using DDA, followed by an accurate mass inclusion list (INC) and last
intelligent data acquisition (IDA). This sequence was repeated for
six technical replicates. On average, only 256 (51%) of the targeted
peptides were identified in each of the DDA experiments (Figure 5A, 1% peptide-level FDR, error bars represent one
σ). This is consistent with targeted peptides, as they originated
from three of the original six discovery experiments (50%). The accurate
mass inclusion list modestly increases identifications to an average
of 280 (56%) targets per experiment. The biggest improvement is realized
with IDA, where 440 of 500 targets (88%) were identified on average
in each nano-LC–MS/MS experiment. When all six experiments
for each method were combined and analyzed together, IDA identified
483 target peptides at least once, while INC identified 456 and DDA
identified 426 at least once. Notably, 69 targets were
identified only by IDA and never by DDA, whereas only 13 targets were
identified only by DDA and never by IDA (Figure 5C and Supplemental Figure 5 in the Supporting
Information). These results indicate that both DDA and the
inclusion list undersampled the targeted peptides; presumably this
is a result of either low S/N or poor charge-state determination of
the precursor ions in the MS1. The IDA method avoids both
of these issues by sampling regardless of MS1 detection,
depending only on the target’s expected elution ordering.
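The EO calculation mentioned above, in which each target's EOs from its three discovery runs are merged into a single EOM, can be illustrated with a short sketch. It assumes, for illustration only, that a peptide's EO in one run is its retention-time rank among all peptides identified in that run scaled to [0, 1], and that per-run values are merged by averaging; the article's own EO definition and merging rule may differ, and `elution_orders` and `build_eom` are hypothetical names.

```python
from statistics import mean


def elution_orders(run):
    """Relative elution order (0 = first, 1 = last) of every peptide identified in one run.

    run -- dict mapping peptide sequence -> retention time of its identification
    """
    ordered = sorted(run, key=run.get)
    last = len(ordered) - 1
    return {pep: (i / last if last > 0 else 0.0) for i, pep in enumerate(ordered)}


def build_eom(runs, targets):
    """Merge per-run EOs of each target into one value (mean over runs that identified it)."""
    per_run = [elution_orders(run) for run in runs]
    eom = {}
    for pep in targets:
        values = [eos[pep] for eos in per_run if pep in eos]
        if values:
            eom[pep] = mean(values)
    return eom
```

Because the values in this sketch are ranks rather than retention times, any strictly increasing remapping of retention time (a shifted or stretched gradient, for example) leaves them unchanged, which illustrates why elution order can be a more portable scheduling coordinate than RT.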
Figure 5
A subset of 500 mouse peptides was targeted with DDA, an accurate
mass inclusion list (INC), and our intelligent data acquisition (IDA)
method in six technical replicates. (A) IDA identified the most target peptides
of the three methods (error bars represent 1 σ). (B) Discovery
identifications by the three methods show only a slight decline in the
total number of peptides identified using IDA. (C) 74% of the targets
were observed in all six technical replicates when IDA was used, compared
with <20% for the inclusion list or data-dependent acquisition.
Since the IDA method enables simultaneous
DDA MS/MS sampling, the total number of peptide identifications can be
compared among the three acquisition methods (Figure 5B). Each method
produced nearly the same number of PSMs. A difference appears at the
level of unique PSMs (i.e., peptides): DDA and INC produced
similar numbers of identifications (∼5800 peptides), whereas IDA produced
∼3700. We attribute this decline primarily to
the redundant sampling of target peptides by the IDA method.
IDA identified each target 4.3 times on average,
compared with 0.59 and 0.63 times for DDA and INC, respectively, a ∼7:1
ratio. This agrees with the ratio of dynamic exclusion times
between the methods: IDA uses 5 s for each target, compared with the longer
dynamic exclusion time (35 s, 1:7) used in the DDA and INC methods.
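The ∼7:1 agreement can be checked directly from the reported numbers. The short calculation below uses only values stated in the text; the variable names are illustrative.

```python
# Average number of times each target was identified per experiment (from the text)
ida_ids_per_target = 4.3
dda_ids_per_target = 0.59
inc_ids_per_target = 0.63

# Dynamic exclusion times in seconds (from the text)
ida_exclusion_s = 5.0
dda_inc_exclusion_s = 35.0

sampling_ratio_vs_dda = ida_ids_per_target / dda_ids_per_target  # about 7.3
sampling_ratio_vs_inc = ida_ids_per_target / inc_ids_per_target  # about 6.8
exclusion_ratio = dda_inc_exclusion_s / ida_exclusion_s          # exactly 7.0

print(f"IDA/DDA {sampling_ratio_vs_dda:.1f}, IDA/INC {sampling_ratio_vs_inc:.1f}, "
      f"exclusion {exclusion_ratio:.1f}")
```

Both sampling ratios bracket the exclusion-time ratio of 7, consistent with the shorter per-target exclusion in IDA being the main driver of the repeated target sampling.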
The oversampling of target peptides in IDA increases the likelihood
of their identification. We consider the slight decline in total identified
peptides an acceptable trade-off for maximizing reproducibility for a subset
of peptides. The increased reproducibility is demonstrated
in Figure 5C: the IDA method identified 370
(74%) of the same target peptides in all six experiments. The same cannot
be said for DDA or INC, which identified only 69 and 84 targets
in all six experiments, respectively. This represents an increase
of over 340% (relative to INC) in the number of peptide targets seen in all
replicates.
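The per-run averages, the "at least once" totals, the all-six counts, and the percent increases reported in this section are simple set and ratio operations over per-run identification lists. A minimal sketch, assuming each run is represented as a set of identified peptide sequences already filtered at 1% peptide-level FDR; the function names are hypothetical.

```python
def target_summary(runs, targets):
    """Summarize how consistently a set of targets is identified across replicate runs.

    runs    -- list of sets of identified peptide sequences, one set per run
    targets -- iterable of targeted peptide sequences
    """
    targets = set(targets)
    hits = [run & targets for run in runs]
    return {
        "mean_per_run": sum(len(h) for h in hits) / len(hits),
        "at_least_once": len(set().union(*hits)),
        "in_all_runs": len(set.intersection(*hits)),
    }


def only_in(runs_a, runs_b, targets):
    """Targets identified at least once with method A but never with method B."""
    targets = set(targets)
    seen_a = set().union(*(run & targets for run in runs_a))
    seen_b = set().union(*(run & targets for run in runs_b))
    return seen_a - seen_b


def percent_increase(new, old):
    """Relative increase of new over old, in percent."""
    return 100.0 * (new - old) / old
```

With the counts from the text, `percent_increase(370, 84)` gives about 340.5, matching the "over 340%" figure; the same helper gives about 80.0 for 826 vs 459 and about 55.6 for 826 vs 531 in the biological replicate experiments.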
Improved Reproducibility in Biological Systems
All
data described so far were collected from technical replicates of
the same sample, injected with the same HPLC and analyzed using the
same MS. Technical replicates are ideal for developing acquisition
methods, primarily because the same peptides should exist in each
injection, which keeps sample variability from obscuring the results.
However, biological replication in proteomic studies is becoming more
prevalent because of the increased statistical power it affords. To test
whether intelligent data acquisition improves reproducibility in biological
systems (Figure 6A,B), four male C57BL/6 mice were sacrificed at 10 weeks,
eight organs were harvested from each mouse, and peptides from a tryptic
digestion of each organ were labeled with a TMT 8-plex tag. The tagged peptides from each mouse were mixed together,
separated over a 165 min gradient, and sampled using a DDA top-15
method to generate a list of peptide targets. An average of 8683 ±
313 peptide sequences were identified in each mouse, for a total of
13 502 unique sequences. Of these, only 3969 (29.4%) peptides
were identified in every mouse (Figure 6C).
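The overlap statistics above (13 502 sequences overall, 3969 in every mouse) and the target selection described in the following paragraph both come from counting how many mice each peptide was identified in. The sketch below assumes each mouse is represented as a set of identified sequences and that `eo` maps sequences to their EOs. Spacing targets by taking equally spaced positions along the EO-sorted pool is an assumption made here, since the text states only that targets were chosen to be evenly distributed in EO; all names are hypothetical.

```python
from collections import Counter


def mice_per_peptide(per_mouse):
    """Count in how many mice each peptide sequence was identified.

    per_mouse -- list of sets of identified peptide sequences, one set per mouse
    """
    return Counter(pep for peptides in per_mouse for pep in peptides)


def fraction_in_all(per_mouse):
    """Fraction of all observed sequences that were identified in every mouse."""
    counts = mice_per_peptide(per_mouse)
    return sum(c == len(per_mouse) for c in counts.values()) / len(counts)


def select_targets(per_mouse, eo, n_targets, min_mice=2, max_mice=3):
    """Pick up to n_targets inconsistently identified peptides, spread along EO."""
    counts = mice_per_peptide(per_mouse)
    # Candidate pool: seen in min_mice..max_mice mice and with a known EO, sorted by EO.
    pool = sorted((p for p, c in counts.items() if min_mice <= c <= max_mice and p in eo),
                  key=eo.get)
    if n_targets <= 0:
        return []
    if len(pool) <= n_targets:
        return pool
    if n_targets == 1:
        return [pool[len(pool) // 2]]
    # Equally spaced ranks in the EO-sorted pool (a simple stand-in for even spacing in EO).
    step = (len(pool) - 1) / (n_targets - 1)
    return [pool[round(i * step)] for i in range(n_targets)]
```

Applied to the counts reported above, `fraction_in_all` would correspond to 3969 / 13 502 ≈ 0.294.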
A subset of 1500 peptides was selected from the peptides detected
in either two or three of the four mice and sorted based on their assigned
EOs (Figure 6D). Here peptide targets were
chosen to be evenly distributed in the EO dimension to limit the number
of coeluting peptides at any given point. In subsequent targeting experiments,
each mouse sample was analyzed twice, once using DDA and once using
IDA, for a total of eight experiments. When the DDA targeting experiments
were analyzed, an average of 810 (54%) target peptides were identified
(Figure 7A, 1% peptide-level FDR, error bars
represent one σ). Using IDA, this number increased to 1072 (71.5%).
In total, over half of the targeted peptides (826, 55.1%) were identified
in all four mice when using IDA compared with only 30.6% (459) using
DDA (Figure 7B). The IDA method represents
a nearly 80% improvement over DDA in the number of peptide targets
it identifies in all mice. This increase in reproducible identification
improves the quantitative results as well. When each tissue is compared
with liver, the number of quantified peptides that are statistically
significant (p value <0.05, Student’s t test with Storey correction) is on average 227 greater
with IDA than with DDA (Figure 7C). For
example, when the quantitative data for muscle are compared with those
for liver (Figure 7D), IDA produced 826 significantly
different peptides, whereas only 531 were significant with DDA, a 56%
increase. This can be directly attributed to the increased reproducibility
of identification across biological samples.
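The significance counts rely on p values from Student's t tests adjusted with Storey's method. A minimal sketch of Storey-style q-values is shown below, assuming the per-peptide p values are already computed. It uses a single fixed λ = 0.5 to estimate the proportion of true null hypotheses (π0), a simplification of the smoothed or bootstrapped estimators used in dedicated q-value software; the function names are hypothetical.

```python
def storey_qvalues(pvalues, lam=0.5):
    """Storey q-values using a single tuning parameter lam to estimate pi0."""
    m = len(pvalues)
    if m == 0:
        return []
    # pi0: estimated fraction of true nulls, from the flat right tail of the p-value histogram.
    pi0 = min(1.0, sum(p > lam for p in pvalues) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so each q value is the minimum over larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[i] / rank)
        qvalues[i] = running_min
    return qvalues


def count_significant(pvalues, alpha=0.05, lam=0.5):
    """Number of tests whose Storey q value falls below alpha."""
    return sum(q < alpha for q in storey_qvalues(pvalues, lam))
```

When π0 is estimated at 1, this reduces to the Benjamini–Hochberg adjustment; a smaller π0 (many true differences, as expected between distinct tissues) makes the adjustment less conservative.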
Figure 6
(A) Four C57BL/6 mice
were sacrificed at 10 weeks of age, and eight
organs were harvested from each mouse. (B) Peptides resulting from
a tryptic digestion of lysates from each organ were labeled with
TMT 8-plex tags in a randomized order. (C) 165 min nano-LC–MS/MS
experiments using a DDA top-15 method identified only 3969 peptides
in all four mice. (D) A subset of 1500 peptide targets was selected
from peptides detected in only two or three of the four mice.
Figure 7
(A) In four subsequent nano-LC–MS/MS
experiments, only 810
of 1500 mouse peptide targets were identified on average with DDA. The identifications
improve to 1072 when IDA is used (error bars represent 1 σ).
(B) In total, 826 target peptides were identified in all four mice
when IDA was used for targeting. This number falls to only 459 peptides
when DDA is used. (C) The number of statistically significant differences
(p value <0.05) quantified when each tissue is
compared with liver is greater with IDA than with DDA. (D) When comparing
target peptides identified in muscle versus liver, IDA quantified
826 statistically significant peptides compared with only 531 when
DDA was used, a 56% increase.
Conclusions
The ability to reproducibly identify the same peptides
in multiple experiments
is increasingly important in proteomic analysis as studies demand
greater statistical power. Historically, the most common
acquisition method, DDA, has been used to sample large portions of
proteomes, but it lacks adequate peptide identification reproducibility.
We expand upon our previous IDA work and introduce the concept of
using EO as a way to schedule and target peptides. Here we have described
an online EOA algorithm that automatically adjusts to different chromatographic
conditions to deliver consistent scheduling and robust reproducibility.
The method is capable of targeting large numbers of peptides (>500)
in a single run with minimal upfront preparation and effort. Using
this method, we have shown improvements in peptide identification
overlap among multiple experiments compared with DDA (88% compared
with 50% identification overlap in six experiments). The EOA algorithm
is capable of improving reproducibility even for highly variable samples.
In four mice, our method identified 826 target peptides in all four,
compared with only 459 using normal DDA sampling.
We believe
that such technologies can now be applied to traditional
SRM methods that use triple quadrupole mass spectrometers. Here periodic
full MS scans could be performed and analyzed to calculate the current
EO and adjust the timing of the SRM transitions. One challenge would
be the decreased specificity in determining EO from low-resolution
scans. However, using a more adaptable metric for scheduling (elution
ordering vs RT) could potentially increase the portability and robustness
of SRM methods while reducing development time. Additionally, improved
quantitative results could be obtained by deliberately oversampling
one particular peptide target during its elution and quantifying with
PRM or label-free methods.
Unlike SRM methods, where every MS/MS
scan is predetermined, a
novel aspect of our method is the flexibility of combining both targeted
and discovery analysis in a single nano-LC–MS/MS experiment.
The MS intelligently switches between targeted and discovery modes
depending on what peptides are currently eluting, without any human
intervention. In one experiment, over 3700 unique mouse peptides were
discovered while 500 other peptides were simultaneously targeted. Such
hybrid MS methods provide both a focused and a holistic view of the same
sample, which is especially valuable when sample is limited.
Until
comprehensive proteomic coverage is routinely obtained, targeted
methods will be heavily used and developed. We have explored increasing
the intelligence of MS methods as a means to improve the throughput
and power of peptide targeting without sacrificing the novel discovery
aspect of DDA sampling. Future work includes improvements to the determination
of EOs, increasing the success rate of target identification, exploring
additional quantitative strategies (e.g., PRM, label-free), and maximizing
the throughput to target larger portions of the proteome without laborious
upfront work.