
Intelligent data acquisition blends targeted and discovery methods.

Derek J Bailey, Molly T McDevitt, Michael S Westphall, David J Pagliarini, Joshua J Coon.

Abstract

A mass spectrometry (MS) method is described here that can reproducibly identify hundreds of peptides across multiple experiments. The method uses intelligent data acquisition to precisely target peptides while simultaneously identifying thousands of other, nontargeted peptides in a single nano-LC-MS/MS experiment. We introduce an online peptide elution order alignment algorithm that targets peptides based on their relative elution order, eliminating the need for retention-time-based scheduling. We have applied this method to target 500 mouse peptides across six technical replicate nano-LC-MS/MS experiments and were able to identify an average of 440 of these per experiment, compared with only 256 peptides using data-dependent acquisition (DDA). A total of 3757 other peptides were also identified within the same experiment, illustrating that this hybrid method does not eliminate the novel discovery advantages of DDA. The method was also tested on a set of mice in biological quadruplicate and increased the number of identified target peptides in all four mice by about 80% (826 vs 459) compared with the standard DDA method. We envision real-time data analysis as a powerful tool to improve the quality and reproducibility of proteomic data sets.

Year: 2014 PMID: 24611583 PMCID: PMC3983381 DOI: 10.1021/pr401278j

Source DB: PubMed Journal: J Proteome Res ISSN: 1535-3893 Impact factor: 4.466

Introduction

Large-scale proteomic studies make use of a variety of tools and techniques to achieve depth and wide coverage of proteomes. The most popular method for sequencing proteomes is shotgun sequencing, where extracted proteins are digested into peptides, separated with chromatography (HPLC), and then mass-analyzed using mass spectrometry (MS).[1,2] Since complex proteomes can encompass thousands of proteins, leading to millions of peptides, deciding how to allocate the limited mass spectrometer bandwidth is key to successful analysis.[3] By far the most successful method for this time management is data-dependent acquisition (DDA), where intact peptide precursors are first mass-analyzed (MS1), specific m/z features are then selected to undergo fragmentation, and finally the fragment ions are mass-analyzed again (MS/MS). This process is repeated throughout the LC separation, resulting in a large collection of MS and MS/MS spectra. Peptides are eventually identified from the fragmentation spectra and then assembled into protein groups.[4−8] This approach has produced outstanding results in the past decade, but for a variety of reasons (e.g., large protein dynamic range, speed of MS instrumentation, separation efficiency) undersampling of proteomes is very common. In other words, not every peptide is identified in every nano-LC–MS/MS experiment. 
Incomplete data sets limit the questions researchers can answer; in particular, when biological replication is used to increase statistical power, many measurements become worthless if they cannot be measured reproducibly.[9] Because proteomics seeks to answer global biological questions, reproducible peptide identification between data sets is mandated.[10−12] Many studies have outlined the problem of poor peptide reproducibility.[13−17] Aebersold succinctly summarized that irreproducibility is a multifaceted issue, depending on user experience, equipment, and data analysis, among others.[18] He outlines that there are two main approaches in tackling irreproducibility. First, exhaustively identify every peptide in a sample—an approach that is becoming more feasible as technology improves.[19−21] The more common approach, as many other researchers have embarked on, is to focus on a smaller subset of peptides and to thoroughly identify and quantify those using targeted methods.[22] Methods such as selected reaction monitoring (SRM)[23] are powerful and reproducible but are low-throughput, targeting a few hundred peptides at most in a single nano-LC–MS/MS experiment.[24−27] Targeted methods almost exclusively rely on retention-time-based scheduling to improve identification reproducibility and throughput, segmenting the MS duty cycle among the target peptides. In SRM methods, a series of MS/MS transitions for each targeted peptide is automatically collected at the appropriate retention time (RT), removing the dependence on MS1 detection. This requires precise knowledge of the peptide RT for the LC–MS system and is low-throughput because only one set of transitions is monitored at a given point in time. 
Recent work on intelligent SRM (iSRM) increases throughput by monitoring only a subset of transitions for each target, switching to normal SRM when these transitions are detected.[28] We sought to expand upon the idea of intelligent real-time switching of methods by combining the enhanced reproducibility of targeted scheduled methods with the novel discovery advantages of DDA in a single hybrid method. Our goals were three-fold: first, to develop a method that increases the throughput of targeting; second, to replace retention-time based scheduling and its laborious method development with a more robust and straightforward peptide elution ordering; and last, to maintain the discovery aspect of DDA sampling while simultaneously targeting a subset of peptides. In the past decades, a few computational approaches have been aimed at solving the problem of poor reproducibility. The concept of accurate mass tags (AMTs) was first introduced by Smith et al. as a means to identify peptides in multiple runs based on accurate mass and RT.[29] This concept was further expanded with PepMiner and PEPPeR, tools for clustering features among multiple data sets.[30,31] Most notably, Prakash et al. introduced the concept of aligning multiple MS data sets based on peptide relative elution order (EO) into signal maps.[32] To date, these and other computational methods[33−38] have been performed postacquisition, attempting to improve already collected data. We seek to improve the reproducibility at the source by improving the algorithms the MS uses to select precursors to fragment. We and others have proposed using real-time data analysis and dynamic MS control as a means for improving the quality of acquired spectra.[39−41] These methods rely on determining peptide spectrum matches (PSMs) in real time and using those identifications to make informed, dynamic decisions. 
However, real-time identification has some setbacks: (1) MS/MS spectra are not always identified, leaving the data incomplete, (2) wrongly assigned PSMs could negatively affect performance, and (3) a reduction in the instrument duty cycle decreases the number of MS/MS scans performed. These and other issues have led us to investigate alternative ways of detecting peptides in real time, primarily through accurate mass measurements. Here we present our findings on combining accurate mass, EOs, and real-time data analysis to improve the sampling reproducibility of the MS.

Experimental Procedures

Yeast Culture

Saccharomyces cerevisiae strain BY4741 was grown in yeast extract peptone dextrose media (YPD) (1% yeast extract, 2% peptone, 2% dextrose). A starter culture was added to 2 L of media and was propagated for ∼12 generations (20 h) to a total OD600 of ∼2. The cells were pelleted with centrifugation at 5000 rpm for 5 min, the supernatant was decanted, and the pellet was resuspended in chilled NanoPure water. Washing with water was repeated twice, and the final pelleting was performed at 5000 rpm for 10 min. The pellet was resuspended in lysis buffer composed of 50 mM Tris pH8, 8 M urea, 75 mM sodium chloride, 100 mM sodium butyrate, protease, and phosphatase inhibitor tablet (Roche). Cell lysing was performed with glass bead milling in a stainless-steel container (Retsch). A 2.5 mL aliquot of resuspended yeast was shaken with 2 mL of acid-washed glass beads at 30 Hz for 4 min, followed by 1 min of rest, for eight cycles.

Mouse Handling and Tissue Isolation

Four male C57BL/B6 mice were bred from in-house colonies and housed in an environmentally controlled facility with free access to water and standard rodent chow (Purina #5008). Mice were kept in accordance with the University of Wisconsin-Madison Research Animals Resource Center and NIH guidelines for care and use of laboratory animals. At 10 weeks of age, mice were sacrificed by decapitation after a 4 h fast. Eight tissues were dissected from the mice (cerebellum, cerebrum, kidney, heart, liver, lung, extensor digitorum longus, and spleen), flash frozen in liquid nitrogen, and stored at −80 °C. Tissues were homogenized in 1 mL of lysis buffer/100 mg tissue (8 M urea, 50 mM Tris, 100 mM NaCl, 1 mM CaCl2, 100 mM sodium butyrate, 5 μM MS-275, 0.2 μM SAHA, Roche protease, and phosphatase inhibitor tablets).

Sample Preparation

Protein was quantified by BCA (Pierce) and reduced with 5 mM dithiothreitol and incubated for 45 min at 55 °C. Alkylation was performed with 15 mM iodoacetamide for 30 min in the dark and quenched with 5 mM dithiothreitol. Urea concentration was diluted to 1.5 M with 50 mM Tris pH 8.0. Proteolytic digestion was performed by the addition of Trypsin (Promega), 1:50 enzyme to protein ratio, and incubated at ambient temperature overnight. For quantitative studies, the resulting peptides were labeled with TMT 8-plex (Pierce) isobaric tag and mixed.[42,43] All samples were desalted using C-18 solid-phase extraction (SPE) columns (Waters, Milford, MA) prior to nano-LC–MS/MS analysis.

Nano LC–MS/MS Analysis

Peptides were separated with online reverse-phase chromatography using a nanoACQUITY UPLC system (Waters, Milford, MA). Peptides were first loaded onto a precolumn (75 μm ID, 5 cm Magic C18 particles, Bruker, Michrom) for 10 min at 1 μL/min flow rates. Peptides were then separated on a 30 cm analytical column (75 μm ID, 5 cm Magic C18 particles) for either 100 or 165 min over a linear gradient from 8 to 35% acetonitrile at 300 nL/min. Mass analysis was performed on an LTQ Orbitrap Elite[44] mass spectrometer (Thermo Fisher Scientific, San Jose, CA) using 60 000 resolving power (RP) MS1 scans. Peptides selected for MS/MS analysis used a 2 Th isolation width, were fragmented with HCD (NCE = 35), and were analyzed in the Orbitrap at 15 000 RP or 30 000 RP for quantitative experiments. Unless otherwise noted, data-dependent analysis was performed selecting the top 15 most intense m/z features (charge state >1) for MS/MS analysis. Dynamic exclusion settings were enabled for 35 s at ±10 ppm mass window, 1 occurrence with a maximum of 500 exclusions at any given point in time. Automatic gain control (AGC) was enabled, and MS1 targets were set to 1 × 10⁶ and MS/MS targets were set to 5 × 10⁴. Accurate mass inclusion list experiments would prioritize MS/MS sampling from a list of targets at ±10 ppm mass tolerances. Remaining MS/MS events were filled with normal top-N DDA approaches. Intelligent data acquisition control was implemented using the ion trap control language (ITCL, Thermo Fisher Scientific), and the pseudocode of these modifications is included in the Supporting Information. In brief, following MS1 analysis, the spectra were analyzed using algorithms written in ITCL to select targets for MS/MS analysis (described herein). Any remaining MS/MS slots would be filled by the unmodified DDA firmware code. For information on implementing the modified firmware code, please contact Thermo Fisher Scientific. 
All nanoLC–MS/MS experiments in the Thermo .raw format are located on the Chorus Project Website (https://chorusproject.org/) under the ‘Elution Order Algorithm’ project.

Data Analysis

Thermo .raw files were processed using the Coon OMSSA Proteomic Analysis Software Suite (COMPASS)[45] and in-house software. In brief, raw files were converted to the dta file format (DTA Generator) and were searched using the Open Mass Spectrometry Search Algorithm (OMSSA, v 2.1.9).[46] Yeast data were searched against a target-decoy[47] database of yeast ORFs (www.yeastgenome.com, February 3, 2011) and mouse data from UniProt canonical database. Peptides were generated from a tryptic digestion with up to three missed cleavages, carbamidomethylation of cysteines as fixed modifications, and oxidation of methionines as variable modifications. For quantitative experiments, a fixed modification of 8-plex TMT tag was added to lysines and peptide n-terminus, with a variable modification of 8-plex TMT tag on tyrosines. Precursor mass tolerance was 100 ppm using the multiisotope function (-tem 4 -ti 4), and product ions were searched at 0.015 Da tolerances. Peptide spectral matches (PSMs) were reduced to unique peptide sequences (I/L ambiguity removed) and validated using FDR Optimizer based on q values and precursor mass accuracy (<10 ppm) at a 1% peptide-level false discovery rate (FDR).[48−50] Protein groups were constructed from peptide identifications according to the law of parsimony and filtered to a 1% protein-level FDR (Protein Hoarder). For quantitative data sets, peptides were quantified with TagQuant (v1.4) using the generated TMT 8-plex reporter ions, corrected for isotopic impurities, and normalized to total protein abundance. Quantitative significance (p value) was determined by the Student’s t test with Storey correction assuming equal variances.[51] Peptide EO determination algorithms were performed by custom software developed in C# with the Microsoft .NET Framework version 4.5. This software is available for download by visiting www.chem.wisc.edu/∼coon/software.php.

Results and Discussion

Irreproducible Peptide Identification

In DDA, peptide precursors are selected for fragmentation based on intensity in an MS1 survey scan. This straightforward approach has proven to be a simple and powerful technique. However, it is plagued by inconsistent sampling and therefore irregular peptide identification between experiments. The DDA method is inherently stochastic, depending heavily on the consistency of the input data (MS1) to deliver reproducible peptide identification (MS/MS). Even the slightest change in the chromatography or ionization efficiencies will have repercussions on the collection of the whole data set, as selecting m/z features for MS/MS analysis is often dependent on previous decisions (e.g., dynamic exclusion). To characterize the extent to which these minor changes affect the reproducibility of peptide identifications, six replicate injections of a tryptic digest of yeast whole cell lysate were analyzed using DDA on the same nano-LC–MS/MS system over a span of 10 days. On average, each experiment identified 13 289 ± 340 unique peptide sequences (I/L ambiguity removed) at a 1% peptide-level FDR, indicating a highly consistent separation and nearly identical instrument performance. Of the 23 919 unique peptides identified in total, only 5404 (22.6%) of those peptides were identified in all six experiments (Figure 1). A significant portion were only identified once (7474, 31.2%), while the remaining peptides were divided between two and five experiments. This clearly demonstrates the irreproducibility of DDA sampling on the same peptide solution. The reproducibility of identified protein groups fares better; 1708 of 3054 (56%) protein groups were identified in every experiment. The higher overlap percentage is because many different peptides can make up one protein group, minimizing the importance of identifying the same peptides in all experiments. 
However, post-translational modification (PTM) analysis requires identification of the same sites to compare between experiments, demanding high peptide overlap. PTM analysis and quantitation are becoming more prominent in the literature, thus making this a growing problem in the field. The poor reproducibility of stochastic DDA sampling can be attributed to two causes. First, precursors having low signal-to-noise (S/N) are affected first by changes in chromatography and ionization. For example, a precursor with a maximal S/N of four may have been sampled and identified in one experiment, but in the next experiment, the S/N may have dropped below the detection threshold, so that the precursor was excluded from being sampled. This is evident when 8883 MS1 features from peptides identified in one or all of the six experiments were examined for their maximal S/N (Supplemental Figure 1 in the Supporting Information). For peptides identified once, 2707 (30.5%) had a maximal S/N ≤ 4, while only 814 (9.2%) precursors identified in every experiment had similar maximum S/N. The other reason for inconsistent peptide identification is increased MS1 spectral complexity, specifically its effect on charge-state assignment. In proteomic MS/MS workflows, precursors are often only selected when they exhibit a well-defined charge state (usually z > 1), as singly charged precursors fragment poorly and usually do not lead to positive identifications. Increases in spectral complexity hinder the charge-state determination algorithms, especially for low S/N precursors. This results in precursors being skipped even if their signal-to-noise is above the sampling threshold.

Figure 1

Overlap of peptide identification among the analysis of six technical replicates. Six nano-LC–MS/MS experiments produced 23 919 unique peptide identifications in total, but only one-fifth of the identifications were observed in all six replicates. A large percentage (31.2%) of the peptides were only detected in one of the six experiments.
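
The overlap statistics summarized in Figure 1 amount to counting, for each unique peptide, the number of replicates in which it was identified. The following is a minimal sketch of that bookkeeping, not the authors' software; the function names and the toy peptide sequences are illustrative assumptions, and collapsing I to L is used here as one simple way to remove I/L ambiguity.

```python
from collections import Counter

def replicate_histogram(runs):
    """Return Counter {number of runs -> number of unique peptides}.

    runs: list of iterables of peptide sequences, one per experiment.
    Isoleucine is rewritten as leucine so I/L variants count as one peptide.
    """
    seen = Counter()
    for run in runs:
        # De-duplicate within a run so each peptide counts once per experiment.
        for pep in {p.replace("I", "L") for p in run}:
            seen[pep] += 1
    return Counter(seen.values())

def fraction_in_all(runs):
    """Fraction of unique peptides identified in every run."""
    hist = replicate_histogram(runs)
    total = sum(hist.values())
    return hist[len(runs)] / total if total else 0.0

# Toy example with hypothetical sequences.
toy_runs = [
    {"PEPTIDE", "LLK", "AAR"},
    {"PEPTLDE", "AAR"},
    {"PEPTIDE", "AAR", "GGK"},
]
print(replicate_histogram(toy_runs))
print(fraction_in_all(toy_runs))

# Percentages reported in the text for the six yeast replicates.
print(round(100 * 5404 / 23919, 1), round(100 * 7474 / 23919, 1))
```

The last line simply recomputes the two percentages quoted above (peptides seen in all six runs and peptides seen only once) from the reported counts.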

Retention-Time-Based Targeting

When good peptide identification reproducibility is needed, RT-based targeting, that is, scheduling, has been the method of choice. Here peptides of interest are assigned an expected elution time and MS/MS is triggered, regardless of MS1 detection, during the appropriate time range. This avoids the two issues with DDA sampling previously described and enables much higher reproducibility. However, such methods are laborious to construct and maintain: identical LC and MS parameters must be kept between experiments to minimize any variances in RTs of the peptides. To assess the degree of variance in peptide RTs that occurs in normal nano-LC–MS/MS experiments, two of the yeast DDA experiments described above, performed 10 days apart, were compared. The first experiment (July 22, D0) produced 13 529 unique peptides, and the second experiment (July 31, D9) identified 13 433 yeast peptides. Together, 7589 peptides were in common and the apex RT of their elution in each experiment is plotted in Figure 2A. The relationship between RTs of matched peptides is highly linear (R² = 0.9989) but has a nonunity slope and nonzero intercept (m = 1.033; b = −0.647). While the slope is very close to 1, even the slightest deviation (0.033), compounded over time, leads to large RT differences late in the separation (e.g., ∼1.6 min shift at 70 min). On the whole, the average RT deviation was nearly 1 min (μ = −0.805 min) with a broad distribution over a 2 min range (Figure 2B). Typically, the assigned peptide elution times must be corrected to encompass this shift.
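
The size of the late-gradient shift follows directly from the fitted line. The sketch below evaluates the reported slope and intercept at a few retention times; which experiment sits on which axis is not stated in the text, so treating D0 as the input is an assumption, as are the function and constant names.

```python
# Linear fit reported in the text (minutes): rt_other = M_SLOPE * rt + B_INTERCEPT.
# Assumption: the input is the D0 retention time; the text does not specify the axes.
M_SLOPE = 1.033
B_INTERCEPT = -0.647

def predicted_rt(rt, m=M_SLOPE, b=B_INTERCEPT):
    """Retention time predicted by the fitted line for a given input RT."""
    return m * rt + b

def rt_shift(rt, m=M_SLOPE, b=B_INTERCEPT):
    """Difference between the predicted RT and the input RT."""
    return predicted_rt(rt, m, b) - rt

for rt in (10.0, 40.0, 70.0, 100.0):
    print(f"RT {rt:6.1f} min -> shift {rt_shift(rt):+.2f} min")
```

At 70 min the shift is 0.033 × 70 − 0.647 ≈ 1.66 min, in line with the ∼1.6 min quoted above, and it keeps growing with retention time, which is why a fixed scheduling window eventually fails.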

Figure 2

To assess the deviation in retention times for matched samples, we ran two identical nano-LC–MS/MS experiments 10 days apart on the same LC–MS system. (A) The relationship between apex retention times of the 7589 unique peptides common between experiments displays a high degree of linearity (R² = 0.9989) but a skewed slope and nonzero intercept (m = 1.033; b = −0.647). (B) Average deviation from unity was nearly a minute off (μ = −0.805 min), with a broad distribution over 2 min wide. (C) Peptides ranked by their relative elution order exhibit a normal distribution around zero (μ = −1.097).

We hypothesize that, due to the degree of linearity in peptide RTs, we could avoid these corrections by scheduling peptides based on their relative EO, as opposed to their absolute RT. Under similar LC conditions (i.e., same particles, temperature, column length, phase, etc.) peptides elute in the same relative order regardless of separation duration or slope. For example, if peptide ‘A’ elutes before peptide ‘B’ in a 30 min LC gradient, the same ordering is preserved with a 60 min LC gradient, even if the absolute RTs vary greatly. When many peptides’ EOs are taken into account (e.g., thousands of peptides), they provide a simple way to correct for elution variation dynamically. This became evident when we took the 7589 peptides, rank-ordered them based on their apex RTs for both the D0 and D9 experiments, and plotted the difference between matched peptides (Figure 2C). Here the values are normally distributed around zero (μ = −1.097) with a full width at half-maximum (fwhm) of only ∼100. EO can also be useful under extreme differences in chromatographic conditions. To simulate dynamic chromatographic conditions, we separated yeast peptides under two different LC gradient profiles. The resulting peptide identifications were again matched between the runs, and the RT difference was plotted (Supplemental Figure 2A in the Supporting Information). 
These data show an average deviation of 10 min between the two gradients (Supplemental Figure 2B in the Supporting Information), but when ranked by their EOs, the two experiments show a linear slope of 1 with a normal distribution of ranked EOs around zero (Supplemental Figure 2C,D in the Supporting Information).

Real-Time Elution Ordering Alignment

We reasoned that using EO could improve the irreproducible sampling of DDA, similarly to scheduled methods, but on a larger scale and more robustly. The question shifts from “What RT is it?” as scheduled methods ask, to “What is the current EO?” By knowing which peptides are currently eluting from the LC, combined with the a priori knowledge of their EO, we predict with high fidelity what peptides are going to subsequently elute. Prior knowledge is needed of the sample to adequately calculate the EOs of the peptides in the sample. With time-based scheduled methods, many cursory experiments are performed to optimize the RTs of the targeted peptides. To reduce variances in RTs, it is vital that these initial experiments are conducted exactly the same as the targeted experiments. In stark contrast, EOs can be determined using a variety of methods. First, much work has been devoted to determining peptide hydrophobicities from theoretical calculations of the amino acid sequence.[52−55] A simple list of peptides, ordered by their hydrophobicities, can produce a highly linear elution ordering. Second, previously collected data of the sample can produce an accurate elution ordering as long as the LC conditions are similar enough. This enables the combination of multiple data sets to produce a single EO versus m/z map (elution order map, EOM), regardless of their individual separation durations. This is accomplished by rank ordering all peptide identifications in a given run and normalizing their orderings between 0 and 100 (where 100 represents the last eluting peptide). These normalized values are then matched between experiments and aligned using a simple algorithm to produce the final EOM as shown in Figure 3A. Lastly, the most robust method for determining peptide EOs is to perform a discovery experiment right before the targeted experiment. 
Regardless of how EO is determined, the final EOM is uploaded onto the instrument and is accessed throughout the course of the subsequent analyses.
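
The rank-and-normalize step described above can be sketched as follows. This is an illustration rather than the authors' C# implementation: the text does not specify the "simple algorithm" used to align runs, so averaging each peptide's normalized EO across runs is an assumption, as are the function names and toy peptide labels.

```python
def normalized_elution_order(apex_rts):
    """Rank peptides by apex RT and scale the ranks to 0..100.

    apex_rts: dict mapping peptide -> apex retention time (min).
    The last-eluting peptide receives an EO of 100.
    """
    ranked = sorted(apex_rts, key=apex_rts.get)
    last = max(len(ranked) - 1, 1)  # avoid dividing by zero for a single peptide
    return {pep: 100.0 * i / last for i, pep in enumerate(ranked)}

def merge_runs(runs):
    """Combine runs into one peptide -> EO map.

    Assumption: runs are merged by averaging the normalized EO of each
    peptide over the runs in which it was identified.
    """
    sums, counts = {}, {}
    for apex_rts in runs:
        for pep, eo in normalized_elution_order(apex_rts).items():
            sums[pep] = sums.get(pep, 0.0) + eo
            counts[pep] = counts.get(pep, 0) + 1
    return {pep: sums[pep] / counts[pep] for pep in sums}

# The same three hypothetical peptides on a 30 min and a 60 min gradient.
run_30 = {"PEPA": 5.0, "PEPB": 12.0, "PEPC": 21.0}
run_60 = {"PEPA": 9.0, "PEPB": 26.0, "PEPC": 44.0}
print(normalized_elution_order(run_30))
print(merge_runs([run_30, run_60]))
```

Although the absolute RTs differ between the two toy runs, the normalized EOs are identical, which is the property that lets runs with different separation durations contribute to a single map. With real data the runs contain different peptide subsets, so a more careful alignment than plain averaging would likely be needed.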

Figure 3

Real-time elution order alignment algorithm. 46.3 min into a nano-LC–MS/MS experiment, an MS1 scan is performed (A) and m/z features are matched to a 2D ion map stored on the instrument. (B) 21 of the peaks match 80 features in the ion map at a 10 ppm tolerance. Of these, over half (41 of 80) were mapped to one elution order bin (51 elution order). (C) A rolling elution order range is continually updated throughout the nano-LC–MS/MS experiment.

Prior to targeted analysis, a list of peptide targets, along with their relative EOs, is also uploaded to the instrument (Figure 4B). Each target is assigned an EO range (first and last appearance) depending on its length of elution in the discovery experiments. (See Figure 4C for zoom in.) Maintaining a dynamic EO range for each peptide is needed as different peptides elute for different amounts of time during the separation. During the targeted analysis, instead of relying on absolute RT to trigger targeted MS/MS scans, determining the current EO becomes the main goal of the method. We have designed an online peptide elution order alignment (EOA) algorithm that takes a single MS1 spectrum and computes the current EO therefrom. In brief, following MS1 acquisition, the EOA algorithm takes the most intense m/z feature and extracts all EO values from the uploaded EOM at a narrow m/z tolerance (e.g., 10 ppm) (Figure 3A). Each m/z feature is matched in a similar fashion, and the resulting EO values are stored in a separate array (Figure 3B). In this example MS1, 21 m/z features matched a total of 80 EO values. When binned into 1 EO-wide bins, 41 of these values are contained within a single bin at 50 EO units. This indicates with high confidence that the current EO is somewhere near 50. To determine the EO precisely, the algorithm then calculates the 95% confidence interval around the max EO bin and stores the minimum (50.02) and maximum (51.64) EO. 
This process is repeated for each MS1, and over time the calculated EO range constructs a rolling average, as shown in Figure 3C. The EOA algorithm is expedient, taking on average 26 ms per MS1 to execute and does not induce a statistically significant change in the total number of MS/MS scans performed (Supplemental Figure 3 in the Supporting Information).
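
The per-scan steps of the EOA algorithm (match m/z features against the EOM, bin the matched EO values, locate the most populated bin, and derive an EO range) can be sketched as below. This is not the ITCL firmware code. The text does not say how the 95% interval is computed, so the sketch substitutes a normal-approximation range (mean ± 1.96 standard deviations) over values within one bin of the modal bin; that choice, the function names, and the toy map are all assumptions.

```python
import bisect
import math
import statistics
from collections import Counter

def match_eo_values(feature_mzs, eom, ppm=10.0):
    """Collect every EO value in the map within `ppm` of any MS1 feature.

    eom: list of (mz, eo) pairs, i.e. the elution order map.
    """
    eom = sorted(eom)
    mzs = [mz for mz, _ in eom]
    hits = []
    for f in feature_mzs:
        tol = f * ppm * 1e-6
        lo = bisect.bisect_left(mzs, f - tol)
        hi = bisect.bisect_right(mzs, f + tol)
        hits.extend(eo for _, eo in eom[lo:hi])
    return hits

def current_eo_range(eo_hits, z=1.96):
    """Estimate the current EO range from the matched EO values.

    Values are binned into 1 EO-wide bins; the range is mean +/- z * SD of
    the values lying within one bin of the most populated bin (assumed rule).
    """
    if not eo_hits:
        return None
    bins = Counter(math.floor(eo) for eo in eo_hits)
    top_bin = bins.most_common(1)[0][0]
    near = [eo for eo in eo_hits if top_bin - 1 <= eo < top_bin + 2]
    mean = statistics.fmean(near)
    sd = statistics.pstdev(near)
    return (mean - z * sd, mean + z * sd)

# Toy map and MS1 feature list (hypothetical values).
toy_eom = [(400.2000, 50.2), (400.2001, 87.5), (512.7500, 50.6),
           (650.3300, 50.9), (650.3302, 12.3), (733.4100, 51.1)]
toy_features = [400.2000, 512.7501, 650.3301, 733.4100, 900.0]

hits = match_eo_values(toy_features, toy_eom)
print(hits)
print(current_eo_range(hits))
```

Chance matches at unrelated EOs (87.5 and 12.3 in the toy data) fall outside the neighborhood of the modal bin and do not affect the estimate, which mirrors how 41 of the 80 matches in the example above dominate a single bin.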

Figure 4

Following determination of the current elution order range (A), target peptides (B) sharing a similar elution order value are selected (C, rectangles represent individual peptides). Peptide targets within the elution order range are filtered based on when they were last sampled for MS/MS (D), leaving only targets that have been waiting the longest (e.g., > 5 s, highlighted rectangles). Those filtered peptides are then immediately sampled by MS/MS, regardless of MS1 detection (D). Unfilled MS/MS events are automatically filled with m/z features picked by the intensity-based DDA algorithm using normal sampling parameters (e.g., dynamic exclusion, intensity threshold, charge state exclusion, etc.).

Once the current EO range is determined, peptides sharing a similar EO are selected for MS/MS analysis. In brief, the current EO range is intersected with the target peptides already uploaded on the instrument (Figure 4B), and peptides whose EO overlaps the current EO range are stored as potential targets (Figure 4C). These peptides have a high probability of imminently eluting because they share very similar EO values with the current overall EO value. To prevent oversampling of any given target, potential targets are filtered based on how long since they were last sampled. Peptides that have been waiting the longest (e.g., >5 s) are automatically triggered for MS/MS analysis regardless of MS1 detection. Unfilled MS/MS events are then populated using normal DDA top-N approaches, excluding any m/z previously selected to be targeted (Figure 4D). This data collection scheme enables repetitive, consistent targeting of multiple peptides over their elution while allowing DDA scans to facilitate discovery. 
The EOA algorithm is compatible with other quantitative strategies such as parallel reaction monitoring (PRM),[56,57] where peptide targets are repeatedly sampled (MS/MS) over their elution, and the resulting fragment ions are extracted to provide quantitative information (Supplemental Figure 4 in the Supporting Information).
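
The target-selection logic described in this section (intersect the current EO range with each target's EO range, keep targets that have waited longer than the threshold, then fill remaining MS/MS slots from the DDA candidates) might look like the sketch below. The `Target` class, the function name, and the example values are hypothetical; the 5 s wait, 15 slots, and 10 ppm exclusion follow values mentioned in the text, and the DDA candidates are assumed to arrive already ranked by intensity.

```python
from dataclasses import dataclass

@dataclass
class Target:
    mz: float
    eo_first: float                      # EO at first appearance
    eo_last: float                       # EO at last appearance
    last_sampled: float = float("-inf")  # time of last MS/MS (s)

def select_precursors(targets, eo_range, now, dda_candidates,
                      n_slots=15, min_wait=5.0, ppm=10.0):
    """Return the m/z values to fragment after the current MS1 scan."""
    eo_lo, eo_hi = eo_range
    # Targets whose EO range overlaps the current range and that have waited long enough.
    due = [t for t in targets
           if t.eo_first <= eo_hi and t.eo_last >= eo_lo
           and now - t.last_sampled > min_wait]
    due.sort(key=lambda t: t.last_sampled)  # longest-waiting first
    chosen = due[:n_slots]
    for t in chosen:
        t.last_sampled = now
    picked = [t.mz for t in chosen]
    target_mzs = list(picked)
    # Fill remaining slots with DDA picks, skipping m/z values already targeted.
    for mz in dda_candidates:
        if len(picked) >= n_slots:
            break
        if all(abs(mz - t) / t * 1e6 > ppm for t in target_mzs):
            picked.append(mz)
    return picked

toy_targets = [
    Target(500.25, 49.0, 51.0),                     # in range, never sampled
    Target(600.30, 50.5, 52.0, last_sampled=98.0),  # in range, sampled 2 s ago
    Target(700.35, 60.0, 62.0),                     # not yet eluting
]
print(select_precursors(toy_targets, (50.02, 51.64), now=100.0,
                        dda_candidates=[500.25, 800.40, 810.41], n_slots=3))
```

In the toy call only the first target is triggered: the second was sampled too recently and the third lies outside the current EO range, so the two remaining slots go to DDA candidates that do not duplicate the targeted m/z.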

Improving Peptide Identification in Multiple Experiments

We reasoned that the EOA algorithm would improve the reproducibility of peptide identification across multiple runs. Additionally, we increased the proteomic complexity by using a mammalian system (mouse) instead of yeast to determine how sample complexity affects the algorithm. Here a male C57BL/B6 mouse was sacrificed at 10 weeks, eight organs were harvested, and peptides from a tryptic digestion of each organ were labeled with a TMT 8-plex tag. First, six DDA top-15 nano-LC–MS/MS experiments were performed on the peptide sample. From the results of these discovery experiments, 500 peptides, each identified in only three of the six experiments, were randomly selected to serve as peptide targets. These targets represent peptides that are difficult to identify reproducibly using standard DDA methodology. Additionally, we chose 500 targets because this is the limit of the number of peptides one could target with an inclusion list on the Orbitrap Elite MS in a single nano-LC–MS/MS analysis. Each target peptide’s EO was calculated from the three discovery experiments in which it was identified, combined into a single EOM, and then uploaded to the instrument prior to targeting (Figure 4B). The same vial of mouse peptides used in the discovery experiments, kept at 4 °C in an autosampler, was then analyzed using DDA, followed by an accurate mass inclusion list (INC) and lastly intelligent data acquisition (IDA). This sequence was repeated for six technical replicates. On average, only 256 (51%) of the targeted peptides were identified in each of the DDA experiments (Figure 5A, 1% peptide-level FDR, error bars represent one σ). This is consistent with how the targets were chosen, as each had been identified in only three of the original six discovery experiments (50%). The accurate mass inclusion list modestly increases identifications to an average of 280 (56%) targets per experiment. 
The biggest improvement is realized with IDA, where 440 of 500 targets (88%) were identified on average in each nano-LC–MS/MS experiment. When all six experiments for each method were combined and analyzed together, IDA identified 483 target peptides at least once, while INC identified 456 and DDA identified 426 at least once. Notably, 69 of the targets were only identified by IDA and not by DDA, while only 13 unique targets were discovered by DDA and not identified by IDA (Figure 5C and Supplemental Figure 5 in the Supporting Information). These results indicate that both DDA and the inclusion list undersampled the targeted peptides; presumably this is a result of either low S/N or poor charge-state determination of the precursor ions in the MS1. The IDA method avoids both of these issues by sampling regardless of MS1 detection, depending only on the target’s expected elution ordering.

Figure 5

A subset of 500 mouse peptides was targeted with DDA, an accurate mass inclusion list (INC), and our intelligent data acquisition (IDA) method in hexplicate. (A) IDA identified the most target peptides of the three methods (error bars represent 1 σ). (B) Discovery identifications by the three methods show only a slight decline in the total number of peptides identified using IDA. (C) 74% of the targets were observed in all six technical replicates when IDA was used compared with <20% for the inclusion list or data-dependent acquisition.

Since the IDA method enables simultaneous DDA MS/MS sampling, comparisons of the total number of peptide identifications between the three acquisition methods can be made (Figure 5B). Each method produced nearly the same number of PSMs. A difference appears at the unique PSMs level (i.e., peptides), where both DDA and INC produced a similar number of identifications (∼5800 peptides) but the number dropped to ∼3700 using IDA. We attributed this decline primarily to the redundant sampling of target peptides with the IDA method compared with the other methods. IDA identified each target 4.3 times on average, compared with 0.59 and 0.63 for DDA and INC, respectively, a ∼7:1 ratio. This is in agreement with the ratio of dynamic exclusion times between methods; IDA uses 5 s for each target compared with the longer dynamic exclusion time (35 s, 1:7) used in the DDA and INC methods. The oversampling of target peptides in IDA increases the likelihood of identification. We feel that it is an acceptable trade-off between maximizing reproducibility for a subset of peptides and a slight decline in total identified peptides. The increased reproducibility is demonstrated in Figure 5C; the IDA method identified 370 (74%) of the same peptides in all six experiments. The same cannot be said for DDA or INC; they managed to identify only 69 and 84 peptides in all six experiments, respectively. 
This represents an increase of over 340% in the number of peptide targets that were seen in all replicates.
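The reproducibility figures quoted above reduce to set operations over the per-replicate identification lists plus a little arithmetic. The following is a minimal sketch of that bookkeeping, not the analysis code used in this work; the function names and the toy peptide labels are hypothetical, and only the numeric constants at the bottom are taken from the text.

```python
from typing import Dict, List, Set


def reproducibility(replicates: List[Set[str]], targets: Set[str]) -> Dict[str, int]:
    """Count targets identified at least once and in every replicate."""
    seen_any = set().union(*replicates) & targets
    seen_all = set.intersection(*replicates) & targets
    return {"at_least_once": len(seen_any), "in_all": len(seen_all)}


def percent_increase(new: float, old: float) -> float:
    """Relative increase of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old


# Toy example with hypothetical peptide labels (three replicates, four targets).
targets = {"PEPA", "PEPB", "PEPC", "PEPD"}
runs = [
    {"PEPA", "PEPB", "PEPC"},
    {"PEPA", "PEPB", "XTRA"},  # "XTRA" is a nontargeted (discovery) identification
    {"PEPA", "PEPB", "PEPD"},
]
toy = reproducibility(runs, targets)

# Figures reported in the text for the six technical replicates.
ida_vs_inc = percent_increase(370, 84)  # targets seen in all six: IDA vs INC
ida_vs_dda = percent_increase(370, 69)  # targets seen in all six: IDA vs DDA
sampling_ratio = 4.3 / 0.59             # identifications per target: IDA vs DDA
exclusion_ratio = 35 / 5                # dynamic exclusion time: DDA/INC vs IDA
```

With the reported values, the increase is just over 340% relative to INC and larger still relative to DDA, and the sampling ratio sits close to the 7:1 ratio of dynamic exclusion times.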

Improved Reproducibility in Biological Systems

All data described so far came from technical replicates of the same sample, injected with the same HPLC and analyzed using the same MS. Technical replicates are ideal for developing acquisition methods, primarily because the same peptides should exist in each injection, which keeps sample variability from obscuring the results. However, biological replication in proteomic studies is becoming more prevalent because of the increase in statistical power it affords. To test whether intelligent data acquisition improves reproducibility in biological systems, four male C57BL/6 mice were sacrificed at 10 weeks, eight organs were harvested, and peptides from a tryptic digestion of each organ were labeled with a TMT 8-plex tag (Figure 6A,B). The tagged peptides from each mouse were mixed together, separated over a 165 min gradient, and sampled using a DDA top-15 method to generate a list of peptide targets. An average of 8683 ± 313 peptide sequences were identified in each mouse for a total of 13 502 unique sequences. Of these, only 3969 (29.4%) peptides were identified in every mouse (Figure 6C). A subset of 1500 peptides was selected from the peptides detected in either two or three of the four mice and sorted based on their assigned EOs (Figure 6D). Here peptide targets were chosen to be evenly distributed in the EO dimension to limit the number of coeluting peptides at any given point.

In subsequent targeting experiments, each mouse sample was analyzed twice, once using DDA and once using IDA, for a total of eight experiments. When the DDA targeting experiments were analyzed, an average of 810 (54%) target peptides were identified (Figure 7A, 1% peptide-level FDR, error bars represent 1 σ). Using IDA, this number increases to 1072 (71.5%). In total, over half of the targeted peptides (826, 55.1%) were identified in all four mice when using IDA compared with only 30.6% (459) using DDA (Figure 7B). The IDA method thus represents a nearly 80% improvement over DDA in the number of peptide targets identified in all mice. This increase in reproducible identification improves the quantitative results as well. When each tissue is compared with liver, the number of quantified peptides that differ significantly (p value <0.05, Student’s t test with Storey correction) is on average 227 greater with IDA than with DDA (Figure 7C). For example, when the quantitative data for muscle are compared with those for liver (Figure 7D), IDA produced 826 significantly different peptides while only 531 were significant for DDA, a 56% increase. This can be directly attributed to increased reproducibility in identification across biological samples.
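The target selection step described above, choosing peptides that are evenly distributed in the EO dimension, can be illustrated with a short sketch. This is only one plausible way to do it and is not taken from this work: the function name `pick_evenly_by_eo`, the (name, EO) tuple layout, and the nearest-to-grid-point rule are all assumptions made for illustration.

```python
from typing import List, Tuple

Candidate = Tuple[str, float]  # (peptide label, assigned EO); hypothetical layout


def pick_evenly_by_eo(candidates: List[Candidate], n: int) -> List[Candidate]:
    """Pick n candidates whose EO values are spread evenly over the EO range.

    The EO range is divided into n equal-width slots, and for each slot the
    unused candidate closest to the slot center is taken.
    """
    ranked = sorted(candidates, key=lambda c: c[1])
    if n <= 0:
        return []
    if n >= len(ranked):
        return ranked
    lo, hi = ranked[0][1], ranked[-1][1]
    width = (hi - lo) / n
    remaining = list(ranked)
    chosen: List[Candidate] = []
    for i in range(n):
        center = lo + (i + 0.5) * width
        best = min(remaining, key=lambda c: abs(c[1] - center))
        remaining.remove(best)  # each candidate may be chosen only once
        chosen.append(best)
    return sorted(chosen, key=lambda c: c[1])


# Toy candidates: three peptides cluster near EO 1, the rest are spread out.
pool = [("p0", 0.0), ("p1", 1.0), ("p2", 1.1), ("p3", 1.2),
        ("p4", 5.0), ("p5", 9.0), ("p6", 10.0)]
picked = pick_evenly_by_eo(pool, 3)
picked_names = [name for name, _ in picked]
```

In the toy pool, only one peptide is taken from the cluster near EO 1, which is the intended effect: fewer targets compete for MS/MS scans at any one point in the gradient.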

Figure 6

(A) Four C57BL/6 mice were sacrificed at 10 weeks of age, and eight organs were harvested from each mouse. (B) Peptides resulting from a tryptic digestion of lysates from each organ were labeled with TMT 8-plex tags in a randomized order. (C) 165 min nano-LC–MS/MS experiments using a DDA top-15 method identified only 3969 peptides in all four mice. (D) A subset of 1500 peptide targets was selected from peptides detected in only two or three of the four mice.

Figure 7

(A) In four subsequent nano-LC–MS/MS experiments, only 810 of 1500 mouse peptide targets were identified with DDA. The identifications improve to 1072 when IDA is used (error bars represent 1 σ). (B) In total, 826 target peptides were identified in all four mice when IDA was used to target. This number falls to only 459 peptides when DDA is used. (C) The number of statistically significant differences (p value <0.05) quantified when each tissue is compared with liver is greater with IDA than DDA. (D) When comparing target peptides identified in the muscle versus the liver, IDA quantified 826 statistically significant peptides compared with only 531 when DDA was used, a 56% increase.

Conclusions

The ability to identify the same peptides reproducibly in multiple experiments is increasingly important in proteomic analysis because greater statistical power is demanded. Historically, the most common acquisition method, DDA, has been used to sample large portions of proteomes, but it lacks adequate peptide identification reproducibility. We expand upon our previous IDA work and introduce the concept of using EO as a way to schedule and target peptides. Here we have described an online EOA algorithm that automatically adjusts to different chromatographic conditions to deliver consistent scheduling and robust reproducibility. The method is capable of targeting a large number of peptides (>500) in a single run with minimal upfront preparation and effort. Using this method, we have shown improvements in peptide identification overlap among multiple experiments compared with DDA (88% compared with 50% identification overlap in six experiments). The EOA algorithm is capable of improving reproducibility even for highly variable samples. In four mice, our method was able to identify 826 target peptides compared with only 459 using normal DDA sampling. We believe that such technologies can now be applied to traditional SRM methods that use triple quadrupole mass spectrometers. Here periodic full MS scans could be performed and analyzed to calculate the current EO and adjust the timing of the SRM transitions. One challenge would be the decreased specificity in determining EO from low-resolution scans. However, using a more adaptable metric for scheduling (elution ordering vs RT) could potentially increase the portability and robustness of SRM methods while reducing development time. Additionally, improved quantitative results could be obtained by deliberately oversampling one particular peptide target during its elution and quantifying with PRM or label-free methods.

Unlike SRM methods, where every MS/MS scan is predetermined, a novel aspect of our method is the flexibility of combining both targeted and discovery analysis in a single nano-LC–MS/MS experiment. The MS intelligently switches between targeted and discovery modes depending on which peptides are currently eluting, without any human intervention. In one experiment, over 3700 unique mouse peptides were discovered while simultaneously targeting 500 other peptides. Such hybrid MS methods enable both a focused and a holistic view of the same sample, which is welcome when the sample is limited. Until comprehensive proteomic coverage is routinely obtained, targeted methods will be heavily used and developed. We have explored increasing the intelligence of MS methods as a means to improve the throughput and power of peptide targeting without sacrificing the novel discovery aspect of DDA sampling. Future work includes improvements to the determination of EOs, increasing the success rate of target identification, exploring additional quantitative strategies (e.g., PRM, label-free), and maximizing the throughput to target larger portions of the proteome without laborious upfront work.
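The idea of scheduling by elution order rather than retention time, and of switching between targeted and discovery modes according to what is currently eluting, can be sketched in a few lines. The sketch below is a deliberately simplified illustration and is not the EOA algorithm described in this work: the class `EoScheduler`, the median-of-recent-observations estimate, the fixed EO window, and the numeric EO values are all assumptions.

```python
from collections import deque
from statistics import median
from typing import Deque, Dict, List, Optional


class EoScheduler:
    """Toy scheduler that activates targets near the current elution order."""

    def __init__(self, target_eo: Dict[str, float], window: float, history: int = 25):
        self.target_eo = target_eo  # expected EO for each target (hypothetical values)
        self.window = window        # targets within +/- window of the current EO are active
        self.recent: Deque[float] = deque(maxlen=history)

    def observe(self, eo: float) -> None:
        """Record the known EO of a peptide recognized in a recent scan."""
        self.recent.append(eo)

    def current_eo(self) -> Optional[float]:
        """Estimate the current EO as the median of recent observations."""
        return median(self.recent) if self.recent else None

    def active_targets(self) -> List[str]:
        """Targets to sample now; an empty list means discovery mode only."""
        now = self.current_eo()
        if now is None:
            return []
        return sorted(name for name, eo in self.target_eo.items()
                      if abs(eo - now) <= self.window)


# Usage with made-up EO values.
sched = EoScheduler({"T1": 100.0, "T2": 105.0, "T3": 300.0}, window=10.0, history=3)
for eo in (98.0, 101.0, 103.0):
    sched.observe(eo)
early = sched.active_targets()
for eo in (295.0, 300.0, 305.0):
    sched.observe(eo)
late = sched.active_targets()
```

Because the schedule depends only on which reference peptides have been seen recently, a shift in absolute retention time moves the whole window with it; a median is used here simply because it tolerates an occasional misassigned observation.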
