James A Dowell1, Logan J Wright1, Eric A Armstrong1, John M Denu1,2. 1. Wisconsin Institute for Discovery, University of Wisconsin-Madison, 330 North Orchard Street, Madison, Wisconsin 53715, United States. 2. Department of Biomolecular Chemistry, University of Wisconsin-Madison, 420 Henry Mall Room 1135 Biochemistry Building, Madison, Wisconsin 53706, United States.
Abstract
Previous benchmarking studies have demonstrated the importance of instrument acquisition methodology and statistical analysis on quantitative performance in label-free proteomics. However, the effects of these parameters in combination with replicate number and false discovery rate (FDR) corrections are not known. Using a benchmarking standard, we systematically evaluated the combined impact of acquisition methodology, replicate number, statistical approach, and FDR corrections. These analyses reveal a complex interaction between these parameters that greatly impacts the quantitative fidelity of protein- and peptide-level quantification. At a high replicate number (n = 8), both data-dependent acquisition (DDA) and data-independent acquisition (DIA) methodologies yield accurate protein quantification across statistical approaches. However, at a low replicate number (n = 4), only DIA in combination with linear models for microarrays (LIMMA) and reproducibility-optimized test statistic (ROTS) produced a high level of quantitative fidelity. Quantitative accuracy at low replicates is also greatly impacted by FDR corrections, with Benjamini-Hochberg and Storey corrections yielding variable true positive rates for DDA workflows. For peptide quantification, replicate number and acquisition methodology are even more critical. A higher number of replicates in combination with DIA and LIMMA produce high quantitative fidelity, while DDA performs poorly regardless of replicate number or statistical approach. These results underscore the importance of pairing instrument acquisition methodology with the appropriate replicate number and statistical approach for optimal quantification performance. Not subject to U.S. Copyright. Published 2021 by American Chemical Society.
Proteomics is a powerful
tool to profile global changes in protein
expression and protein modification states.[1] For these “discovery” experiments to be useful in
guiding follow-up “validation” experiments, they must
have a high degree of quantitative accuracy. As with other “omics
technologies”, many factors can impact the accuracy of quantitative
proteomics, including the type of quantitative workflow, instrument
parameters, replicate number, and statistical approach.

In general, there are three different types of quantitative proteomics workflows: label-free, metabolic labeling, and chemical tagging.
Each of these techniques has certain advantages and disadvantages.
Label-free quantification (LFQ) is the simplest and most straightforward
to set up. However, LFQ generally exhibits higher variance than metabolic labeling or chemical tagging.[2,3] In
contrast, metabolic labeling via stable isotope labeling with amino
acids in cell culture (SILAC) yields highly accurate quantification,
but is limited to cell culture systems and produces lower proteome
coverage than LFQ workflows.[4,5] Chemical tagging via
isobaric tandem mass tags (TMTs) also yields highly accurate quantitation;
however, TMT workflows display quantitative bias unless triple-stage
isolation (MS3) or high-field asymmetric-waveform ion mobility (FAIMS)
is employed, which requires a specific type of mass spectrometer and
an electrospray source.[6−10]

Data-dependent acquisition (DDA) is a highly effective method for
identifying large numbers of peptides and proteins.[11−14] However, DDA presents some unique
quantitative challenges for LFQ. Specifically, DDA precursor ion selection
is based on relative peptide intensities at a given time in the LC–MS
run and thus exhibits run-to-run inconsistencies in peptide identification
and quantification. This stochastic peptide sampling results in a
large number of missing peptide intensities across the samples. These
missing values negatively affect downstream statistical analyses,
especially for low-abundance peptides/proteins.[2,15,16] While newer, faster instruments provide
immense sampling depth, they still suffer from undersampling when
near the limit of detection.[2]

To alleviate the issue of missing values in label-free DDA workflows,
data imputation has been employed.[17−20] A simple application of data
imputation uses the normal distribution to replace the missing values.[21] This methodology has been shown to increase
the quantitative fidelity for low-abundance proteins but not for high-abundance
proteins.[22] More recently, more sophisticated
techniques combining multiple types of data, for example, peptide
intensities and spectral counts, have been used to produce more robust
quantitative performance from DDA data.[23]

In addition to post hoc data imputation, instrument-centric methods
employing data-independent acquisition (DIA) have been developed to
reduce missing values in label-free workflows.[24−26] In contrast
to DDA workflows that produce tandem MS1/MS2 spectra via isolation
and fragmentation of specific peptide precursors, DIA methods cycle
through wide mass isolation windows in which all peptides within a
window are fragmented. These peptide spectra are then matched to a
previously generated peptide library across the LC–MS/MS run.
This unbiased isolation and fragmentation produces very few missing
values.[15] However, DIA produces complex,
multipeptide fragmentation spectra that are generally not suitable
for database searching. To circumvent this issue, DIA workflows perform
peptide identification and quantification using an external peptide
library.[24,25,27] These libraries
are typically generated via parallel DDA experiments and thus require
additional instrument time.

A popular alternative to LFQ is the use of isobaric labeling with
TMTs.[8,28] In this workflow, tryptic peptides are N-terminal-labeled
via amine-reactive isobaric mass tags that contain a sample-specific
reporter fragment ion. After labeling, the samples are combined into
a single sample and then assayed via standard DDA acquisition. Instead
of the quantification being performed on peptide intensities (MS1),
the signal intensity from each unique “reporter” fragment
is compared against the other “reporter” fragments to
yield relative peptide (and protein) quantification. This was traditionally
performed on the peptide fragmentation spectra (MS2); however, it
was discovered that peptide co-isolation leads to interference in
the MS2 channel. To alleviate these issues, triple-stage isolation
(MS3) is used, in which the most intense fragment ions of the precursor
peptides are isolated and further fragmented to produce peptide-specific,
interference-free “reporter” ions.[6−10] This results in almost no missing values because
any peptide that undergoes fragmentation yields a complete quantitative
profile across all samples.

In addition to instrument acquisition methods, quantification is
affected by post-acquisition data handling, including data normalization
and statistical analyses.[29] For protein-level
analyses, typically the peptide intensities for a specific protein
are combined (by sum, mean, or median) to obtain a protein-level intensity
value. Statistical analyses are then performed on these “rolled-up”
protein-level values.[30−32] More recently, peptide-centric statistical approaches
have been developed that perform statistical analysis at the peptide
level before combining results into protein values. These approaches
were shown to have superior quantitative accuracy in comparison with
protein-level statistical approaches.[22,23,33−36]

In addition to protein-level experiments, quantitative mass spectrometry
is employed to analyze the alterations in peptide modification states,
for example, phosphoproteomics and acetylomics.[37−40] The modified peptides are typically
enriched and then analyzed via label-free (DDA or DIA) or isobaric
labeling (TMT) workflows.[41] In contrast
to protein intensities that are a combination of multiple peptide
signals, peptide-level analyses rely on a single intensity of an individual
peptide analyte. This presents more demands on the quantitative accuracy
due to the use of a single intensity for downstream analysis.

While a number of benchmarking studies have evaluated the effects
of instrument acquisition, replicate numbers, and statistical approaches
independently, a holistic benchmarking approach that examines the
synergistic effects of these parameters on quantitative performance
has not been undertaken.[2,29,34,42−48] In this study, we have rigorously determined the combined impact
of these factors on the quantitative performance in label-free proteomics
(DDA and DIA). As a corollary, we also evaluated the impact of statistical
approach on a previously published data set employing isobaric labeling
via TMTs.
Results
Experimental Overview
To assess
the impact of data
acquisition, statistical approach, false discovery rate (FDR) calculations,
and replicate number on quantitative accuracy, we produced a benchmarking
data set by spiking an Escherichia coli lysate into a human HEK293 lysate at 1:4 (1x sample) and 1:2 (2x sample), where the 2x sample contains twice as much total E. coli protein as the 1x sample, while the HEK293 protein amount is held constant (Figure 1). Specifically, 100 μg (1x sample) or 200 μg (2x sample) of the E. coli lysate was spiked into 400 μg of the human lysate (HEK293 cell lysate). Since there is very little protein homology between these species, the expected peptide/protein fold change between the 1x and 2x conditions is known simply based on the species; thus, human peptides/proteins have an ideal fold change of 1 (log2 FC = 0), while bacterial proteins have an ideal fold change of 2 (log2 FC = +1). We analyzed our benchmark standard using
either label-free DIA (LFQ-DIA) or DDA (LFQ-DDA) in combination with
various replicate numbers and statistical approaches. The peptide
library for the DIA analysis was generated from the DDA replicates,
and thus, the identified peptides and proteins are identical between
the two data sets. In this manner, we obviated any identification
bias between data sets.
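The ground truth implied by this spike-in design is easy to encode. Below is a minimal sketch (the protein identifiers are hypothetical, and this is not the authors' analysis code) that maps a protein's species of origin to its expected log2 fold change:

```python
import math

# Spike-in design: E. coli protein doubles from 1x to 2x; human stays constant.
SPIKE_RATIO = {"ECOLI": 2.0, "HUMAN": 1.0}

def expected_log2_fc(protein_id: str) -> float:
    """Expected log2 fold change from the UniProt-style species suffix."""
    species = protein_id.rsplit("_", 1)[-1]
    return math.log2(SPIKE_RATIO[species])

# Hypothetical identifiers, for illustration only.
print(expected_log2_fc("P0A6F5_ECOLI"))  # bacterial: expected log2 FC = +1
print(expected_log2_fc("ALBU_HUMAN"))    # human: expected log2 FC = 0
```

Any protein or peptide can then be scored against this expectation, which is exactly how the true and false positive rates below are defined.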
Figure 1
Workflow overview. Benchmarking standards were
created with an E. coli protein digest
spiked into a human protein
digest (HEK293 cell lysate) at either a 1x or 2x concentration. Benchmark
standards were analyzed by LC–MS/MS using two different acquisition
modes: DDA or DIA. Statistical analysis was performed with four distinct
approaches: t-test, LIMMA, ROPECA, and ROTS.
To extend our evaluation beyond label-free quantitative
methodologies,
we evaluated the impact of statistical approaches on the quantitative
performance of a previously published data set that utilized tandem
mass tagging (TMT-DDA) and label-free DDA.[2] The inclusion
of this data set also served as a laboratory-independent control for
our evaluation of statistical approaches. Gygi et al. employed a similar
experimental design as the one employed in our study, but instead
of using an E. coli digest as their
spike-in proteome, they used a yeast digest. However, in contrast
to our “single-shot” label-free workflow, Gygi et al.
prefractionated their benchmarking sample via offline high pH reverse-phase
HPLC before chemical tagging and LC–MS/MS analysis. Gygi et
al. also employed a Thermo Scientific Lumos mass spectrometer coupled
to a nanoflow HPLC, whereas we used a first-generation Thermo Scientific
Q-Exactive coupled to a microflow HPLC.
Acquisition Methodology
To qualitatively assess the
impact of instrument acquisition modes on quantitative performance,
we analyzed the E. coli benchmarking
standard via DDA and DIA acquisition. These analyses were performed
using the MS1 signal intensities of eight replicates for both the
DDA and DIA acquisition methodologies. We performed moderated t-tests
with a Benjamini–Hochberg (BH) false discovery correction to
identify those peptides and proteins that exhibit significantly different
quantities between the 1x and 2x conditions. We plotted log2 fold change against the negative log10 q-value for each protein or peptide (Figure 2). All proteins and peptides above the horizontal dotted line are statistically significant (q < 0.05).
The expected fold change value is designated by the dotted vertical
trendline at log2 fold change = 1. We assessed the quantitative performance via the overall negative log10 q-values of the bacterial proteins/peptides and the number of bacterial proteins/peptides that most closely adhere to the expected fold change trendline. For the protein-level analysis, the LFQ-DIA exhibited the highest negative log10 q-values as well as a slightly tighter grouping around the expected log2 fold change trendlines (Figure 2A,B). For the peptide-level analysis, the LFQ-DIA data set also had the highest negative log10 q-values. However, in contrast to the protein data, the LFQ-DIA and LFQ-DDA peptide-level analyses displayed a similar data spread and adherence to the expected log2 fold change trendlines (Figure 2C,D). We also visualized
the data as a standard deviation plot and observed similar trends
in the adherence to the expected fold change and data spread as observed
in the scatter plots (Figure S1).
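The q-values used throughout this section come from the Benjamini-Hochberg step-up procedure. A self-contained sketch of that procedure (the p-values are illustrative, not drawn from the benchmarking data set):

```python
def benjamini_hochberg(pvals):
    """BH step-up: q_(i) = min over j >= i of p_(j) * m / j, in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    qvals, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):          # walk from largest p to smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        qvals[i] = running_min
    return qvals

# Illustrative p-values only.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
qvals = benjamini_hochberg(pvals)
hits = sum(q < 0.05 for q in qvals)       # 2 tests survive q < 0.05
```

Note that three raw p-values sit below 0.05 here, but only two survive the correction; this gap between raw and adjusted significance is what the FDR comparisons later in the paper probe.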
Figure 2
Scatter plots.
Log2 fold change plotted against the negative log10 q-value for each protein/peptide.
Red dots are human proteins/peptides. Blue dots are E. coli proteins/peptides. All proteins above the
horizontal dotted line indicate statistical significance (n = 8, moderated t-test with the BH correction, q < 0.05). Vertical lines indicate a log2 fold
change of ±1. (A) Proteins analyzed via DIA, (B) proteins analyzed
via DDA, (C) peptides analyzed via DIA, and (D) peptides analyzed
via DDA.
As an extension of these observations,
we also qualitatively assessed
the protein- and peptide-level quantitative performance of the Gygi
et al. yeast benchmarking data set using the same metrics of the negative log10 p-values and adherence to expected fold change values (Figure S2). The yeast
TMT-DDA data set exhibited the tightest grouping around the expected
log2 fold change trendline, while the yeast LFQ-DDA data set exhibited a much larger data spread. Also, the negative log10 q-values were higher for the yeast TMT-DDA than for the yeast LFQ-DDA data set. For the peptide-level analysis, the yeast TMT-DDA data set also had higher negative log10 q-values and adhered more closely to the expected log2 fold change trendlines than the yeast LFQ-DDA data set.

Interestingly, all the protein data sets exhibited a tighter grouping
around the expected trendline than the peptide data sets, indicating
a higher quantification fidelity at the protein than at the peptide
level. This is likely due to protein-level quantification being a
composite of multiple peptide signals, essentially producing multiple
measurements of its intensity value, whereas peptide-level quantification
relies on a single intensity value, that of the individual peptide
analyte. Thus, peptide quantification is more prone to quantification
errors and requires a more robust experimental design than protein-level
quantification, a theme we will explore more thoroughly below.
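The protein-level "roll-up" described above can be sketched in a few lines; the data and the choice of combiner (sum here, with mean or median as drop-in replacements) are illustrative:

```python
from collections import defaultdict
from statistics import median

def roll_up(peptide_rows, combine=sum):
    """Combine peptide intensities into one protein-level value per sample."""
    by_protein = defaultdict(lambda: defaultdict(list))
    for protein, sample, intensity in peptide_rows:
        by_protein[protein][sample].append(intensity)
    return {prot: {s: combine(v) for s, v in samples.items()}
            for prot, samples in by_protein.items()}

# Hypothetical peptide intensities: (protein, sample, intensity).
rows = [("P1", "1x", 100.0), ("P1", "1x", 300.0),
        ("P1", "2x", 200.0), ("P1", "2x", 600.0)]
print(roll_up(rows))          # sum:    P1 -> {"1x": 400.0, "2x": 800.0}
print(roll_up(rows, median))  # median: P1 -> {"1x": 200.0, "2x": 400.0}
```

Because each protein value averages over several peptide measurements, random peptide-level error partially cancels, which is the intuition behind the tighter protein-level groupings seen above.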
Statistical Approaches
While the LFQ-DIA data display slightly less data spread and higher negative log10 p-values than the LFQ-DDA data, these observations are largely qualitative. To quantitatively assess the impact of acquisition methodology, statistical approach, and replicate number on protein-level quantitative performance, three distinct statistical methods were compared: t-tests, linear models for microarrays (LIMMA), and the reproducibility-optimized test statistic (ROTS and ROPECA). We
used true positive and false positive rates to assess the performance
of each workflow. True positive rate (TPR) measures the proportion
of actual positives that are identified in an experiment in which
the actual positives and negatives are known. In our experimental
design, a true positive is any E. coli protein/peptide that is identified as being significantly different
between the 1x and 2x benchmarking standards. The false positive rate
is the proportion of nonpositives identified as positive. In our experimental
design, a false positive is any human protein/peptide identified as
being significantly different between the 1x and 2x benchmarking standards.

LIMMA fits a linear model to the expression data.[49] ROTS and ROPECA use a version of bootstrapping to optimize
the parameters that maximize reproducibility to identify differentially
expressed proteins.[29,34] ROTS is specific to protein-level
data and ROPECA is the adaptation of ROTS that calculates peptide-level p-values and then aggregates expression values of each peptide
within a protein into a single value for that protein. To control for the multiple hypothesis testing problem, we applied the BH FDR correction with a q-value threshold of <0.05.[50]

To compare these various approaches for protein-level analysis, stacked bar graphs of true and false positives (Figure 3), confusion tables (Figure S3), and receiver operating characteristic (ROC) curves (Figure S4) were generated. The LFQ-DIA
workflow at high replicate numbers (n = 8) generated
a high number of true positives and a low number of false positives across all statistical approaches, with an average 90% TPR (∼500 out of 568 possible true positives) and a false positive rate of ∼3% (Figure 3A). In contrast, the quantitative performance of DIA at low replicate numbers (n = 4) was highly dependent on the statistical approach, with LIMMA and ROPECA performing extremely well (LIMMA had a 75% TPR, 427 out of 568 possible, and ROPECA had a 70% TPR, 396 out of 568 possible), while the t-statistic had only a 40% TPR and ROTS performed even more poorly, identifying a single true positive (Figure 3B).
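The TPR and false positive rate defined above reduce to simple counts over the species labels. A sketch with hypothetical significance calls:

```python
def rates(results):
    """TPR and FPR for the spike-in design: E. coli hits are true positives,
    human hits are false positives."""
    tp = sum(1 for species, sig in results if sig and species == "ECOLI")
    fp = sum(1 for species, sig in results if sig and species == "HUMAN")
    n_pos = sum(1 for species, _ in results if species == "ECOLI")
    n_neg = sum(1 for species, _ in results if species == "HUMAN")
    return tp / n_pos, fp / n_neg

# Hypothetical calls: (species, significant at q < 0.05).
calls = [("ECOLI", True)] * 9 + [("ECOLI", False)] + \
        [("HUMAN", False)] * 19 + [("HUMAN", True)]
tpr, fpr = rates(calls)  # 9/10 = 0.9 TPR, 1/20 = 0.05 FPR
```

Sweeping the q-value threshold and recording (FPR, TPR) pairs from this same function is what produces the ROC curves in Figure S4.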
Figure 3
Protein true and false positive stacked bar graphs. True positives
(dark blue) and false positives (light blue) were plotted as stacked
bar graphs for each protein workflow and statistical approach. (A)
Eight replicates analyzed via DIA, (B) four replicates analyzed via
DIA, (C) eight replicates analyzed via DDA, and (D) four replicates
analyzed via DDA.
For all replicate numbers
and statistical approaches, the LFQ-DDA
workflow had a lower TPR than the LFQ-DIA workflow. Across all statistical
conditions, the high replicate analysis (n = 8) produced
an average 54% TPR (311 out of 573 possible), with LIMMA slightly outperforming the other two statistical approaches (Figure 3C). However, LFQ-DDA exhibited a poor TPR at low replicate numbers, regardless of the statistical approach, with an average 10% TPR (59 out of 573 possible) (Figure 3D). Even so, there were still large differences in statistical performance at low replicate numbers in the LFQ-DDA data set: LIMMA had a 24% TPR (139 out of 573 possible), while the t-statistic and ROTS had TPRs well below 10%. While the reasons for the difference in quantitative
performances between the LFQ-DIA and LFQ-DDA workflows are unclear,
one of the major differences between the DDA and DIA data sets is
the number of missing values. The LFQ-DDA workflow exhibited a significantly
higher number of missing values (12%) versus the LFQ-DIA workflow
(0.078%), pointing to the possible impact of missing values on quantitative performance.

To examine if the trends we observed
in label-free protein quantification
extended to a chemical tagging approach, we compared the performance
of the TMT and DDA workflows from a previously published data set.[2] The results from the yeast LFQ-DDA data set from Gygi et al. exhibited very similar trends to the E. coli LFQ-DDA data set at low replicates (n = 4), with LIMMA strongly outperforming ROTS and the t-statistic (Figure S5). The yeast LFQ-DDA analysis
yielded a 45% TPR (432 out of 945 possible) and a 3% false positive
rate with LIMMA, while ROTS and the t-test performed
much more poorly. Interestingly, the yeast TMT-DDA workflow at low
replicate numbers (n = 4) had a very high average
TPR of 94% (1222 out of 1304 possible) (Figure S5). These results are in stark contrast to all other workflows
at low replicate numbers, which all exhibited highly variable TPRs
between statistical approaches. It also has the fewest missing values
of any data set (less than 0.15%), once again emphasizing the strong
impact of missing values on quantitative fidelity, especially at low
replicate numbers. In terms of statistics, LIMMA and ROPECA performed
the best across workflows and replicate numbers.[34] In summary, these data demonstrate the complex interplay
of replicate number, instrument acquisition, and statistical approach.

For the peptide-level analysis, two statistical approaches were performed: LIMMA and t-tests (Figure 4). ROTS and ROPECA can only be performed on protein-level analyses, so these statistical approaches were not included.[33,34] The LFQ-DIA analysis at high replicate numbers (n = 8) had a 67% TPR (1786 out of 2655 possible) (Figure 4A). In contrast,
the LFQ-DDA analysis, even at high replicates, performed poorly with
a 29% TPR (715 out of 2501 possible) using LIMMA and a 15% TPR using the t-statistic (Figure 4C). However, at low replicate numbers even DIA performed poorly, with LIMMA and t-tests exhibiting 23% and 1% TPRs, respectively (Figure 4B). The LFQ-DDA workflow at a low replicate number performed even worse, with fewer than 10 true positives using LIMMA and no true positives with the t-statistic (Figure 4D).
Figure 4
Peptide true
and false positive stacked bar graphs. True positives
(dark blue) and false positives (light blue) were plotted as stacked
bar graphs for each peptide workflow and statistical approach. (A)
Eight replicates analyzed via DIA, (B) four replicates analyzed via
DIA, (C) eight replicates analyzed via DDA, and (D) four replicates
analyzed via DDA.
To examine if the trends
observed in the peptide quantification
via LFQ-DDA and LFQ-DIA workflows extend to a chemical tagging approach,
we compared the performance of the Gygi et al. TMT and DDA data sets.
The yeast LFQ-DDA data exhibited a 44% TPR for LIMMA and a 2% TPR
for t-tests (Figure S5). In contrast, the yeast TMT-DDA workflow produced 97% true positives
with LIMMA (6434 out of a possible 6610) and a 75% TPR via the t-statistic (Figure S5). While
not directly comparable to our LFQ-DIA and LFQ-DDA data sets, the
statistical performance of LIMMA and the t-statistic
exhibited highly similar trends across all data sets, with LIMMA outperforming
the t-statistic regardless of acquisition methodology.

In summary, accurate peptide quantification is much more demanding
than protein analyses. The LFQ-DIA and TMT-DDA workflows produced
superior results to the LFQ-DDA workflows, with LIMMA producing the
highest TPRs of any statistical approach across all workflows.
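Much of LIMMA's robustness at low replicate numbers comes from empirical-Bayes moderation: the per-protein variance is shrunk toward a prior before the t-statistic is computed. The sketch below is a simplification of that idea; the prior degrees of freedom d0 and prior variance s0_sq are fixed constants here, whereas LIMMA estimates them from the whole data set.

```python
from statistics import mean, variance
import math

def moderated_t(group1, group2, d0=4.0, s0_sq=0.05):
    """Two-sample t with the pooled variance shrunk toward a prior
    (simplified LIMMA-style moderation; d0 and s0_sq are assumed, not fitted)."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    s_tilde_sq = (d0 * s0_sq + df * pooled) / (d0 + df)  # shrunken variance
    se = math.sqrt(s_tilde_sq * (1 / n1 + 1 / n2))
    return (mean(group2) - mean(group1)) / se

# With tiny within-group variance, the ordinary t on these data is roughly 73;
# moderation tempers it to about 3, damping spuriously confident calls.
a = [10.00, 10.01, 10.00, 10.01]
b = [10.30, 10.31, 10.30, 10.31]
print(round(moderated_t(a, b), 2))
```

This damping of implausibly small within-group variances is one reason LIMMA degrades gracefully at n = 4, where sample variances are poorly estimated.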
Replicate Numbers
To further examine the impact of
replicate numbers on statistical performance, we plotted TPR versus
replicate number for each statistical approach in combination with
either the LFQ-DDA or LFQ-DIA workflow (Figure 5). As with our previous analyses, the statistical
performance was highly variable across the replicate number. For LFQ-DIA
protein quantification, both LIMMA and ROPECA exhibited a high, flat
TPR across replicates, which is consistent with the previous comparison
of low and high replicate data that showed robust performance at both
low (n = 4) and high (n = 8) replicates
(Figure 3). However,
the t-statistic exhibited a steadily increasing curve
that approached a plateau near n = 8 for the LFQ-DIA
workflow, indicating that the optimal replicate number for the t-statistic is closer to n = 8. For the
LFQ-DDA data, both the t-statistic and LIMMA analysis
steadily increased with more replicates and did not appear to plateau,
indicating that the optimal replicate number is above n = 8. For the peptide LFQ-DIA data, both the LIMMA and t-statistic steadily increased to n = 8 without plateauing,
once again indicating that the ideal replicate number is above n = 8. For the LFQ-DDA peptide data, very few true positives
are identified until n = 5 for LIMMA and n = 7 for the t-statistic. These data show
that the statistical performance is highly dependent on the data acquisition
methodology and replicate number.
Figure 5
Replicate analysis. TPR plotted against
replicate number for (A)
each protein and (B) peptide workflow and statistical approach from
the E. coli benchmarking data set.
The legend indicates the instrument acquisition and statistical approach
for each line plot.
FDR Correction
Controlling for the multiple hypothesis testing problem is essential for the quantitative accuracy of large
data sets, including microarrays, RNA-seq, and quantitative proteomics
experiments.[50,51] A commonly used FDR correction
for type I errors (false positives) is the Benjamini–Hochberg
(BH) correction.[50] For all previously displayed
data, we applied the BH correction at a q-value threshold of <0.05
(see the Experimental Section). To assess
the potential impact of FDR corrections on quantitative accuracy,
we also applied the Storey correction at the same q-value cutoff.[51] The Storey correction
is slightly less conservative than the BH correction.[51] Overall, the Storey correction produced higher TPRs with
a concurrent increase in the false positive rates. However, the choice
of FDR correction unexpectedly exhibited variable behavior that depended
on replicate number, statistical approach, and acquisition methodology.

The BH correction (q-value <0.05) in combination
with the LFQ-DIA protein workflow at a high replicate number (n = 8) differed very little from the Storey corrected results,
regardless of statistical approach (Table S4). However, the FDR correction had a slightly larger impact on the
high replicate LFQ-DDA workflow with the Storey correction producing
a consistently higher TPR. The choice of FDR correction also noticeably
affected the results from the low replicate LFQ-DIA workflow. However,
the most dramatic effect of altering the FDR correction was manifested
in the low replicate (n = 4) LFQ-DDA workflow in
combination with LIMMA, with the Storey correction producing 37% more
true positives than the BH correction (Table S4).

These results indicate that FDR corrections can greatly
impact
quantitative performance, especially at low replicates using LFQ-DDA
approaches. It is unclear why the FDR correction produces large effects
in certain conditions, for example, low replicate LFQ-DDA using LIMMA,
and not others, for example, high replicate LFQ-DIA. However, the
high replicate workflows clearly exhibit a greatly reduced dependence
on the choice of FDR correction, again, highlighting the importance
of replicate choice in experimental design.
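The difference between the two corrections is the estimate of the null proportion pi0: BH implicitly sets pi0 = 1, while the Storey method estimates it from the upper tail of the p-value distribution. The sketch below uses a single tuning point lambda = 0.5 (a simplification of the published smoother) and illustrative p-values:

```python
def adjusted_q(pvals, pi0=1.0):
    """Step-up q-values: BH when pi0 = 1.0; Storey-style when pi0 < 1."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    qvals, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * pvals[i] * m / rank)
        qvals[i] = running_min
    return qvals

def storey_pi0(pvals, lam=0.5):
    """Estimate the proportion of true nulls from p-values above lam
    (single-lambda simplification of Storey's estimator)."""
    return min(1.0, (sum(p > lam for p in pvals) / len(pvals)) / (1 - lam))

# Illustrative p-values only (not from the benchmarking data set).
pvals = [0.002, 0.009, 0.02, 0.04, 0.055, 0.3, 0.45, 0.6, 0.75, 0.9]
pi0 = storey_pi0(pvals)                       # 3/10 above 0.5 -> pi0 = 0.6
bh = adjusted_q(pvals)                        # BH: 2 hits at q < 0.05
storey = adjusted_q(pvals, pi0=pi0)           # Storey: 3 hits at q < 0.05
```

Because every Storey q-value is the corresponding BH q-value scaled by pi0 <= 1, Storey can only be less conservative, consistent with the higher TPRs (and false positive rates) reported above.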
Discussion
We have employed a quantitative proteomics benchmarking approach
to explore the synergistic impact of instrument acquisition methodology,
replicate number, and statistical approach on quantitative performance.
We have evaluated two LFQ workflows as well as a TMT workflow for protein-
and peptide-level quantitative performance. The relative advantages
of these quantitative workflows have been evaluated before.[2,29,34,42−48] However, the combined impact of replicate number and statistical
approach on these workflows has not been thoroughly explored. Interestingly,
we found a complex relationship between quantitative workflow, statistical
approach, and replicate number.

A common theme across our analyses is that acquisition methodologies with fewer missing values (TMT-DDA and LFQ-DIA) produced superior results at lower replicate numbers. Missing values have been
shown to have a detrimental effect on downstream statistical analyses
across disparate data types.[2,15,16] The effect of missing values is especially detrimental at low replicate
numbers. However, the impact of missing values and/or low replicate
numbers on quantitative performance depends on the statistical approach
employed. Specifically, conventional parametric statistical tests,
such as the t-statistic, are especially affected. In contrast, the
performance of a linear model fitted to expression values, as implemented
in the LIMMA package, was less impacted by low replicate numbers and/or
high numbers of missing values. This is not surprising since LIMMA
has been optimized for smaller sample sizes compared with the t-statistic
and its improved performance might be due to the elimination of small
within-group variance and/or the introduction of a fold-change criterion.[49] Similarly, the performance of the peptide-centric
bootstrapping algorithm employed by ROPECA is also relatively unaffected
by the low number of replicates.[34] Most
likely, this is due to the generation and aggregation of p-values
at the peptide level instead of at the protein level. We would expect
that other peptide-centric statistical methodologies would also exhibit
similar performance, including the algorithm employed in MSstats.[35]

In contrast to protein-level quantification,
which is based on
multiple peptide measurements, peptide-level quantification relies
on a single value. This produces a much more demanding scenario for
downstream statistical analysis, with missing values and/or low replicate
numbers being especially detrimental to the quantitative performance.
Furthermore, the use of a single peptide intensity for quantification
precludes the implementation of peptide-centric statistical methods,
such as those employed by ROPECA and MSstats, that aggregate multiple
peptide p-values into a single protein-level value.
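The peptide-to-protein p-value aggregation described above can be illustrated with a short sketch. This is a simplified stand-in for the PECA/ROPECA idea, not the published algorithm: it uses only the fact that, under the null, the median of n uniform peptide p-values follows a Beta order-statistic distribution, whose CDF for integer parameters equals a binomial tail.

```python
import math

def protein_p_from_peptides(peptide_pvals):
    """Aggregate peptide p-values into one protein-level p-value.

    Under the null, peptide p-values are Uniform(0,1), so the k-th order
    statistic of n of them follows Beta(k, n - k + 1).  Taking the median
    (k = ceil(n/2)) and evaluating its CDF, i.e. P(Binomial(n, x) >= k),
    scores how improbably small the median peptide p-value is.
    """
    p = sorted(peptide_pvals)
    n = len(p)
    k = (n + 1) // 2                      # 1-based index of the lower median
    x = p[k - 1]
    return sum(math.comb(n, j) * x**j * (1 - x)**(n - j)
               for j in range(k, n + 1))

# Peptides that agree on a change yield a strong protein-level call,
# while a single outlier peptide does not:
print(protein_p_from_peptides([0.01, 0.02, 0.03, 0.04, 0.05]))   # small
print(protein_p_from_peptides([0.001, 0.40, 0.55, 0.70, 0.90]))  # large
```

The use of the median is what makes this family of methods robust to individual aberrant peptide measurements, and it also explains why the approach cannot be applied when only a single peptide intensity per feature is available.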
For peptide-level quantification, LIMMA clearly outperforms the t-statistic. As mentioned above, this is not too surprising
given that LIMMA has been optimized for small replicate numbers. However,
even LIMMA does not perform well with low replicates in combination
with high numbers of missing values, such as those encountered in
the LFQ-DDA workflows. Thus, peptide-level quantification requires
more robust experimental designs, including the use of data acquisition
methodologies with fewer missing values (TMT-DDA and LFQ-DIA), higher
replicate numbers (n = 8), and more sophisticated
statistical approaches (LIMMA).

Another important consideration
is instrument requirements. DDA
approaches are the simplest to set up and can be performed on a wide
array of instrumentation. A quantitative DDA experiment can be performed
on almost any mass spectrometer, whether it is a quadrupole time-of-flight,
an ion trap (IT), an Orbitrap, or a hybrid instrument. In contrast
to DDA, DIA requires a high-resolution quadrupole instrument to obtain
suitable MS2 data for peptide library matching and quantification.
Although TMT workflows can theoretically be performed on many proteomics
instruments, quantitative accuracy is greatly enhanced when FAIMS
or MS3-based quantification is used; however, this requires an IT-based
instrument or a FAIMS-capable source. Also, TMT workflows generally
require prefractionation, usually via ion exchange or high pH reverse-phase
chromatography, which adds to the complexity and level of expertise
required to execute this workflow.

Another consideration is
the cost of executing these workflows.
The TMT-DDA workflow requires the purchase of chemical tagging reagents,
which increases the cost per sample. Some LFQ-DIA workflows require
the purchase of retention time standards, which also adds to the analysis
cost. While LFQ-DDA workflows do not require the purchase of these
reagents, they require a higher number of replicates to obtain adequate
quantitative performance. This requirement for higher replicate numbers
increases the upstream sample preparation costs as well as the instrument
time per experiment.

In summary, excellent quantitative performance
in bottom-up proteomics
can be achieved using LFQ-DDA, LFQ-DIA, or TMT-DDA workflows. However,
each workflow exhibits unique characteristics that require careful
attention to experimental design, including the choice of replicate
number and downstream statistical approach.
Experimental Section
Sample
Preparation
HEK293 cells were cultured in Dulbecco’s
modified Eagle’s medium with 10% fetal bovine serum until confluent.
The cells were rinsed once with phosphate-buffered saline, lifted
with trypsin/ethylenediaminetetraacetic acid, and then pelleted. The
pellet was resuspended in 1% sodium dodecyl sulfate in 100 mM Tris–HCl
(pH 7.5) and then tip sonicated at 20% amplitude three times for 5
s each. Cysteines were reduced with 25 mM dithiothreitol at 95 °C
for 10 min and then alkylated with 50 mM iodoacetamide for 20 min
in the dark at room temperature. Proteins were precipitated with 4x
volumes of ice-cold acetone for 60 min and then spun at 21,000 × g to
pellet proteins. Precipitated protein was resuspended in 2 M urea
in 25 mM ammonium bicarbonate with tip sonication (as above). Protein
concentration was determined via the bicinchoninic acid assay. The
protein extract was digested with trypsin at 1:50 overnight at 37
°C. The digest was quenched with 0.5% trifluoroacetic acid, and the protein
extracts were desalted via StageTips.[52] The
samples were dried down and resuspended in 0.1% formic acid. E. coli were cultured in 2xYT media and then pelleted
at 8000 × g and processed in the same manner as the HEK293 samples.

We prepared our benchmark sample as previously published by Mann
and co-workers.[53] Two separate samples
were created by mixing the HEK293 and E. coli digests. For the “1x” sample, 100 μg of E. coli digest was added to 400 μg of HEK293
sample and diluted to a total of 600 μL. For the “2x”
sample, 200 μg of E. coli digest
was added to 400 μg of HEK293 sample and diluted to a total
of 600 μL. iRT peptides (HRM Kit, Biognosys) were spiked into
each peptide digest according to the manufacturer’s instructions.
Eight technical replicates of each sample group (1x and 2x) and each
acquisition type (DDA and DIA) were analyzed by LC–MS/MS for
a total of 32 runs. For the mapping of sample IDs to raw files, the
sequence table was exported into an Excel file (Table S4).
LC–MS/MS Parameters
For both
DDA and DIA, 20
μL of each sample was injected onto a Dionex UltiMate 3000 microflow
HPLC and separated at a flow rate of 6 μL/min using a Thermo
Scientific Hypersil Gold C18 column (150 × 0.18 mm, particle
size 3 μm) at 60 °C coupled to a Thermo Scientific Q-Exactive
mass spectrometer with a HESI-II source equipped with a narrow-bore
spray needle. HPLC mobile phases consisted of water + 0.1% formic
acid and acetonitrile + 0.1% formic acid. Peptides were resolved with
a linear gradient of 2–30% ACN over 85 min. HESI parameters
were as follows: sheath gas = 5; spray voltage = 3 kV; capillary temperature
= 240 °C; S-lens RF = 50; auxiliary gas heater = 30 °C;
auxiliary gas flow = OFF; and sweep gas = OFF. The mass spectrometer
was operated in the DDA mode with dynamic exclusion enabled (exclusion
duration = 15 s), MS1 resolution = 70,000, MS1 automatic gain control
target = 1 × 106, MS1 maximum fill time = 100 ms,
MS2 resolution = 17,500, MS2 automatic gain control target = 2 ×
105, MS2 maximum fill time = 200 ms, and MS2 normalized
collision energy = 28. For each cycle, one full MS1 scan (range = 400–1000 m/z) was followed by MS2 scans with a loop
count of 20 and an isolation window size of 2.0 m/z. In DIA, the mass spectrometer was operated with
an MS1 scan resolution = 70,000, automatic gain control target = 1
× 106, MS1 maximum fill time = 100 ms, followed by
a DIA scan with a loop count of 30 using nonoverlapping windows from
400 to 1000 m/z. The DIA settings
were as follows: window size = 20 m/z, resolution = 17,500, automatic gain control target = 1
× 106, DIA maximum fill time = AUTO, and normalized
collision energy = 30. The total cycle time was 3.1 s.
Data Processing
DDA runs were analyzed with MaxQuant
(version 1.6.5.0) using default settings. The data were searched against
a combined human and E. coli Swiss-Prot
database (downloaded December 12, 2017). Search criteria included
carbamidomethylation of cysteine as a fixed modification, methionine
oxidation and N-terminal protein acetylation as variable modifications,
and trypsin/P with two missed cleavages as the enzyme. Mass tolerance
was set at 4.5 ppm for precursor ions and 20 ppm for fragment ions.
Peptide spectrum match and Protein FDR were set at 0.01. For DDA peptide
quantification, match between runs was performed with a matching time
window of 0.7 min and an alignment window of 20 min. MS1 summed isotope
intensities were used. For protein quantification, all peptide MS1
intensities for a given protein were summed to yield the total protein
intensity. The MaxQuant output file was exported as an Excel file
and is included in the Supporting Information (Table S4). Normalization was performed via the summation of the
total intensity of all the identified human peptides. These calculations
were performed using an internally generated R script (see below).
The normalized peptide and protein values were exported to an Excel
file (Table S4).

The DIA runs were
analyzed with Spectronaut Pulsar X (version 12.0.20491.13.20699) using
default parameters. The spectral libraries were generated from the
MaxQuant results from all 16 DDA runs. The spectral library contained
19,561 precursors, 17,571 modified peptides, and 3092 proteins. For
identification, a “mutated” decoy method was used, and
the precursor and protein q-value cutoffs were set
at 0.01, and the retention time prediction was set to dynamic. MS1
summed isotope intensities were used for peptide quantification. Protein
inference was set to Automatic and interference correction was enabled.
The Spectronaut output file was exported as an Excel file and is included
in the Supporting Information (Table S4).
Protein intensities and normalization were calculated as described
for the DDA analysis (see above). The normalized peptide and protein
values were exported to an Excel file (Table S4).
R-Script
The analyses were performed using the R programming
language (www.r-project.org), version 3.6. Our script accepts peptide- or protein-level intensity
data, performs sum normalization on total peptide intensities, rolls
up peptide-to-protein intensities via summation, runs statistical
tests, and generates plots. Note: Data imputation of missing
values was not performed. We have written a general user
guide for downloading and running the script in RStudio or an editor of
your choice, which includes the necessary parameters for the input files
(see below).
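As a rough illustration of the two central steps the script performs, sum normalization on the human "background" peptides and peptide-to-protein rollup by summation, the following Python sketch mirrors the description above. The actual implementation is the R script on GitHub; the function and variable names here are invented for this sketch, and missing values are simply absent (no imputation, matching the pipeline).

```python
from collections import defaultdict

def sum_normalize(runs, reference_peptides):
    """Scale each run so its summed intensity over the reference
    peptides (here, the constant human background) matches a common
    target, the mean reference total across runs."""
    totals = {run: sum(v for pep, v in peps.items() if pep in reference_peptides)
              for run, peps in runs.items()}
    target = sum(totals.values()) / len(totals)
    return {run: {pep: v * target / totals[run] for pep, v in peps.items()}
            for run, peps in runs.items()}

def rollup(peptide_to_protein, run):
    """Protein intensity = sum of the MS1 intensities of its peptides."""
    protein = defaultdict(float)
    for pep, v in run.items():
        protein[peptide_to_protein[pep]] += v
    return dict(protein)

# toy example: two runs, two peptides mapping to one protein
runs = {"run1": {"PEPA": 100.0, "PEPB": 50.0},
        "run2": {"PEPA": 220.0, "PEPB": 80.0}}
norm = sum_normalize(runs, {"PEPA", "PEPB"})
print(norm)
print(rollup({"PEPA": "P1", "PEPB": "P1"}, norm["run1"]))
```

Normalizing on the human peptides only is what keeps the constant background comparable across runs while leaving the deliberate spike-in difference intact.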
Statistical Analysis
Statistical
analyses were performed
on both protein- and peptide-level intensities using q-adjusted t-tests, LIMMA, and ROTS. FDR was controlled
via BH or Storey corrections.[50,51]
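The moderated t-statistic idea behind LIMMA can be sketched as follows. This is a minimal illustration, not LIMMA itself: the prior degrees of freedom d0 and prior variance s0_sq are fixed by hand here, whereas LIMMA estimates both empirically across all features, which is what stabilizes inference at small sample sizes.

```python
import math
from statistics import mean, variance

def moderated_t(group1, group2, d0=4.0, s0_sq=0.05):
    """LIMMA-style moderated t-statistic (simplified sketch).

    The pooled per-feature variance is shrunk toward a prior variance
    s0_sq carrying d0 prior degrees of freedom; the statistic is then
    referred to a t-distribution with d0 + d degrees of freedom.
    """
    n1, n2 = len(group1), len(group2)
    d = n1 + n2 - 2                                   # residual df
    s_sq = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / d
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)   # shrunken variance
    t = (mean(group1) - mean(group2)) / math.sqrt(s_tilde_sq * (1 / n1 + 1 / n2))
    return t, d0 + d                                  # statistic, moderated df

# toy example: n = 4 vs n = 4 log-intensities for one feature
t, df = moderated_t([10.1, 10.3, 10.2, 10.4], [11.0, 11.2, 10.9, 11.3])
print(t, df)
```

The shrinkage prevents features with an accidentally tiny within-group variance from producing inflated statistics, which is one reason LIMMA outperforms the ordinary t-statistic at low replicate numbers in these benchmarks.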
Yeast Benchmarking Label-Free
DDA and TMT Data Sets
We analyzed a previously published
data set from Gygi et al.[2] This benchmarking
data set is very similar to
our in-house generated sample except a yeast proteome spike-in was
used instead of an E. coli spike-in.
We analyzed both the DDA and TMT data sets using our analysis pipeline.
The original Excel files were downloaded and minimally reformatted
to conform to the data frame requirements of our R script. We only
analyzed the “2x” versus the “1x” spike-in
samples and did not include the “3x” samples. Normalization,
peptide-to-protein rollup, and statistical analysis were performed
as described above. The normalized peptide and protein values were
exported to an Excel file (Table S4).
Data and R Script Availability
The normalized intensity
values for all the experimental conditions are summarized in Table S4. The raw data, spectral libraries, and
output files are publicly available on the ProteomeXchange data repository
(PXD018408) and the MassIVE repository (MSV000085239). The R script
and associated output files are available on GitHub (https://github.com/DenuLab/LFQMagic).