
Systematic assessment of survey scan and MS2-based abundance strategies for label-free quantitative proteomics using high-resolution MS data.

Chengjian Tu¹, Jun Li, Quanhu Sheng, Ming Zhang, Jun Qu.

Abstract

Survey-scan-based label-free methods have shown no compelling benefit over fragment ion (MS2)-based approaches when low-resolution mass spectrometry (MS) was used, but the growing prevalence of high-resolution analyzers may have changed the game. This necessitates an updated, comparative investigation of these approaches for data acquired by high-resolution MS. Here, we compared survey-scan-based (ion current, IC) and MS2-based abundance features, including spectral count (SpC) and MS2 total ion current (MS2-TIC), for quantitative analysis using various high-resolution LC/MS data sets. Key discoveries include: (i) a study with seven different biological data sets revealed that only IC achieved high reproducibility for lower-abundance proteins; (ii) evaluation with five replicate analyses of a yeast sample showed that IC provided much higher quantitative precision and fewer missing data; (iii) IC, SpC, and MS2-TIC all showed good quantitative linearity (R2 > 0.99) over a >1000-fold concentration range; (iv) both MS2-TIC and IC, but not SpC, showed a good linear response to various protein loading amounts; (v) quantification using a well-characterized CPTAC data set showed that IC exhibited markedly higher quantitative accuracy, higher sensitivity, and fewer false positives/false negatives than both SpC and MS2-TIC. Therefore, IC achieved overall superior performance compared with the MS2-based strategies in terms of reproducibility, missing data, quantitative dynamic range, quantitative accuracy, and biomarker discovery.


Substances:
Ions
Proteins

Year: 2014 PMID: 24635752 PMCID: PMC3993956 DOI: 10.1021/pr401206m

Source DB: PubMed Journal: J Proteome Res ISSN: 1535-3893 Impact factor: 4.466

Introduction

An accurate and precise quantitative strategy is critical for reliable proteomic expression profiling and discovery of biomarker candidates. Broadly, LC/MS-based relative quantification methods can be divided into two main categories: isotope labeling and label-free methods. Stable isotope labeling approaches play a prominent role in quantitative proteomics. Since the introduction of the isotope-coded affinity tag (ICAT) in 1999,[1] a variety of chemical- or metabolic-labeling methods have been developed, such as 18O-labeling,[2,3] stable isotope labeling by amino acids in cell culture (SILAC),[4] isobaric tags for relative and absolute quantification (iTRAQ),[5] tandem mass tags (TMT),[6,7] super-SILAC,[8] and more recently, neutron-encoded mass signatures (NeuCode).[9] Although most of these strategies have been tremendously successful and widely applied in proteomics profiling, certain drawbacks do exist, such as the high expense of reagents, which renders the techniques cost-prohibitive for large-scale studies; imperfect efficiency and consistency of labeling for some methods; and, in some cases, complex data interpretation.[10,11] Label-free approaches have emerged as an attractive alternative to isotope-labeling methods, due to their simplicity, cost-effectiveness, and feasibility of quantifying multiple biological samples.[10,12] These approaches consist of two conceptually different types, which employ quantitative features either derived from MS2 product ion scans[13,14] or peptide precursor signals (MS1) obtained by the survey scan[15−17] to measure relative protein abundances in proteolytic digests. One of the classical MS2-based methods, termed spectral counting, estimates protein abundance by counting the total number of MS/MS spectra matched to all peptides from a given protein. 
This approach was recently improved by incorporating the MS2 fragment ion intensities and unique peptide number for quantitative analysis, for example, the normalized spectral index (SIN);[18] other examples in this avenue include the exponentially modified protein abundance index (emPAI)[19] and the normalized spectral abundance factor (NSAF).[20] Nonetheless, the MS2-based approaches are challenged by the nature of current MS/MS sampling techniques such as data-dependent MS2 fragmentation. First, dynamic exclusion of the precursors fragmented in a previous scan, a widely practiced technique to improve the chance of detecting low-abundance peptides, significantly affects spectral acquisition; second, the MS2 acquisition for low-abundance peptides is often suppressed by coeluting peptides of higher abundance; finally, accurate quantitative information for lower-abundance proteins/peptides (e.g., those resulting in spectral counts of “1” and “0”, a very common sight in LC-MS data) is often elusive.[10,21,22] By comparison, the survey scan-based (or ion current-based) approach quantifies proteins by measuring the extracted ion current peak areas of peptide precursors (MS1) from each protein. The calculation of peak areas is independent of MS2 acquisition, and consequently, the above-mentioned problems associated with the MS2 sampling processes are either avoided or greatly alleviated. 
An additional salient advantage of the ion current-based method is that as long as well-defined ion current peaks are observed and aligned properly across all samples, the corresponding peptide can be quantified without missing data (missing abundance values in one or more replicates) even if it was successfully identified only once in the entire sample set.[23,24] Nevertheless, carrying out ion current-based quantification is generally more technically demanding than MS2-based approaches, owing to the requirement of accurate matching and quantification of precursor peaks among all samples, which in turn requires specific and accurate MS detection (i.e., the use of a high-resolution MS analyzer), as well as highly reproducible sample preparation and chromatographic separation.[10,16] In recent years, the rapidly growing availability of high-resolution MS analyzers such as the new generation of time-of-flight, Fourier transform ion cyclotron resonance, and Orbitrap instruments may have favored the application of ion current-based approaches in proteomic studies.[16,25−27] The use of high-resolution analyzers permits extraction of peptide ion currents within a very narrow m/z range (e.g., <0.02 mass unit) to substantially reduce chemical noise and interferences, and therefore greatly improves the sensitivity and specificity of ion current-based quantification.[28,29] Meanwhile, the MS2 total ion current (or MS/MS fragment ion intensities, MS2-TIC) approach was introduced, a technique that utilizes the sum of the product ion intensities in each MS2 spectrum assigned to a given protein as the quantitative feature.[18,30] More recently, on the basis of high-resolution MS data, researchers have described that using MS2 intensities resulted in protein abundance measurements nearly as accurate as those obtained with MS1 intensities,[31] and by combining precursor intensities with spectral counts, these researchers identified more true positives.[32] However, our previous work found that when using data from high-resolution MS, the MS1-based approach dramatically improved the quantitative accuracy compared to MS2-based methods, especially for the quantification of low-abundance proteins.[24,33] Given the above-mentioned developments in both MS1-based and MS2-based approaches, it would be of high value to perform an updated, comprehensive comparison of the quantitative performance of the MS1-based method versus MS2-based methods using high-resolution MS data. Such a comparison would greatly help us to understand the limitations and capacity of each approach and is highly valuable for the development of label-free quantification strategies. However, to our knowledge, a systematic and comprehensive comparison has not been conducted before this study. Here, we assessed the MS1-based ion current method (IC) along with two popular MS2-based methods, spectral count (SpC) and MS2 total ion current (MS2-TIC),[18] using data sets generated by LTQ/Orbitrap MS. Quantitative metrics including reproducibility, precision, accuracy, missing data, dynamic linear range, and sensitivity/specificity for discovery of significantly altered proteins were thoroughly evaluated.

Materials and Methods

Protein Sample Preparation

The human bronchoalveolar lavage fluids, rat brain, rat liver, and rat retina were from Buffalo General Medical Center (Buffalo, NY). The human skeletal muscle cells, E. coli cells, and yeast cells were from Kinex Pharmaceuticals (Buffalo, NY). Cell or tissue samples used in this study were homogenized in an ice-cold lysis buffer (50 mM Tris-formic acid, 150 mM NaCl, 0.5% sodium deoxycholate, 2% SDS, 2% NP-40, pH 8.0) using a Polytron homogenizer (Kinematica AG, Switzerland). Homogenization was performed for a 5–10 s burst at 15 000 rpm, followed by a 20 s cooling period until the foam settled. This procedure was repeated 10 times. The mixture was then sonicated in a cold room for ∼10 min with a low-power sonicator until the solution was clear, followed by centrifugation at 140 000g for 1 h at 4 °C. The supernatant was carefully transferred to a fresh tube, and the protein concentrations were measured using BCA Protein Assay (Pierce, Rockford, IL). The resulted samples were stored at −80 °C until analysis. In order to remove undesirable components in the samples while maintaining high peptide recovery, a precipitation/on-pellet-digestion protocol was employed as previously described.[24,33−35] The precipitation/on-pellet-digestion procedure was directly performed without protein extraction when processing the human bronchoalveolar lavage fluid sample. Specimens (each containing 100 μg of total protein) were reduced with TCEP (3 mM) for 10 min and then alkylated with 20 mM IAM for 30 min in darkness. The mixture was precipitated by stepwise addition of 9 volumes of cold acetone with continuous vortexing and then incubated overnight at −20 °C. After centrifugation at 12 000g for 20 min at 4 °C, the supernatant was removed, and the pellet was allowed to air-dry. Two digestion phases were employed for the on-pellet digestion. 
In phase 1 (pellet-dissolving phase), 50 μL of trypsin solution at an enzyme/substrate ratio of 1:30 (w/w) was added and incubated at 37 °C for 6 h with agitation; then in phase 2 (complete-cleavage phase), another 50 μL of trypsin solution was added at an enzyme/substrate ratio of 1:25 (w/w), and the mixture was incubated overnight to achieve complete digestion.

NanoLC-MS/MS Analysis

The nano-RPLC (reversed-phase liquid chromatography) system consisted of a Spark Endurance autosampler (Emmen, Holland) and an ultrahigh pressure Eksigent (Dublin, CA) nano-2D Ultra capillary/nano-LC system. Mobile phases A and B were 0.1% formic acid in 2% acetonitrile and 0.1% formic acid in 88% acetonitrile, respectively. Four microliters of sample was loaded onto a reversed-phase trap (300 μm i.d. × 1 cm) unless otherwise noted in the paper, with 1% mobile phase B at a flow rate of 10 μL/min, and the trap was washed for 3 min. A series of nanoflow gradients (flow rate, 250 nL/min) was used to back-flush the trapped samples onto the nano-LC column (75 μm i.d. × 75 cm) for separation. The nano-LC column was heated at 52 °C to greatly improve both chromatographic resolution and reproducibility. An LTQ/Orbitrap XL hybrid mass spectrometer (Thermo Fisher Scientific, San Jose, CA) was used for protein identification. The parameters for MS are shown in our previous publications.[24,33−35] In this study, two technical replicates of each of the seven biological samples (human bronchoalveolar lavage fluid, human skeletal muscle cells, rat brain, rat liver, rat retina, and E. coli cells) and five replicates of S. cerevisiae (yeast) cell sample were analyzed in order to assess quantitative reproducibility and missing data by the three label-free approaches. To investigate the correlation between quantitative values given by the three abundance features and protein abundances in a complex proteome, an E. coli extract was spiked with bovine serum albumin (BSA) at six different levels (0.025, 0.1, 0.5, 2.5, 12.5, and 62.5% of total proteins) and analyzed in triplicate. To evaluate the correlation among the quantitative values given by SpC, MS2-TIC, and IC at different sample loading amounts, a pooled digest of a prostate cancer cell line (PC3-LN4) sample was loaded at protein amounts of 0.5, 1, 2, and 4 μg. 
In addition, to assess the performance of biomarker discovery by SpC, MS2-TIC and IC, we employed the “Study 6 LTQ Orbitrap XL @P65” data set generated by the program of Clinical Proteomic Technology Assessment for Cancer (CPTAC).[34,36] According to the publicly available documentation associated with this study, the Universal Proteomics Standard set 1 (UPS1, a 48-protein equimolar standard) was spiked at amounts of 0.25, 0.74, and 2.2 fmol/μL into yeast lysate for sets A, B and C, the subset of studies investigated in the current work. Each sample was analyzed by nano-LC/MS with an Orbitrap XL analyzer in triplicate.

Database Search and Data Validation

The raw data files were searched against the Swiss-Prot protein database (version 06/13/2012) using the Sequest algorithm embedded in Proteome Discoverer 1.2 (Thermo Scientific). A total of 7766, 20 238, 4431, and 7801 protein entries were present in the rat, human, E. coli, and yeast protein databases, respectively. The databases were augmented with the sequences of bovine serum albumin and the 48 UPS1 proteins (Sigma-Aldrich) when appropriate. The search parameters used were as follows: 10 ppm tolerance for precursor ion masses and 1.0 Da for fragment ion masses. Two missed cleavages were permitted for fully tryptic peptides. Carbamidomethylation of cysteines was set as a fixed modification, and a variable modification of methionine oxidation was allowed. The false discovery rate (FDR) was determined by using a target-decoy search strategy.[37] The sequence database contained each sequence in both forward and reversed orientations, enabling FDR estimation. Scaffold 3.6[38] (Proteome Software, Portland, OR), which is capable of handling large-scale proteomic data sets, was used to validate MS2-based peptide and protein identifications based on cutoffs of cross-correlation (Xcorr) and Delta Cn values. The peptide FDR was controlled at 0.1%. Validated peptides were grouped into individual protein clusters by the Scaffold software.
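The target-decoy idea described above can be sketched as follows. This is only an illustration under stated assumptions: the `(score, is_decoy)` record layout and the function name `estimate_fdr` are hypothetical, and the decoy/target ratio is one common estimator; the text does not say which estimator Scaffold or the authors applied.

```python
def estimate_fdr(psms, score_cutoff):
    """Estimate FDR among peptide-spectrum matches at or above a score cutoff.

    psms: iterable of (score, is_decoy) pairs (hypothetical layout).
    Uses the decoy/target ratio, one common estimator (an assumption here).
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= score_cutoff and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= score_cutoff and is_decoy)
    # With no passing target hits there is nothing to estimate.
    return decoys / targets if targets else 0.0


# Toy matches: reversed-sequence hits are flagged is_decoy=True.
psms = [(4.1, False), (3.8, False), (3.5, False),
        (3.2, True), (2.9, False), (2.0, True)]
fdr_loose = estimate_fdr(psms, 2.5)   # 1 decoy among 4 passing targets
fdr_strict = estimate_fdr(psms, 3.4)  # no decoys pass the stricter cutoff
```

Raising the cutoff until the estimate drops below the desired level (0.1% at the peptide level in this study) is the usual way such a function would be used.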

Protein Quantification

The protein quantitative values based on SpC and MS2-TIC were obtained using Scaffold 3.6[38] (Proteome Software, Portland, OR) under the same protein/peptide identification criteria as described above. The quantitative analysis by IC was performed in two steps: procurement of area-under-the-curve (AUC) data for peptides using SIEVE v2.0 (Thermo Scientific, San Jose, CA), followed by a sum-intensity method to aggregate the quantitative data from the peptide level to the protein level as previously reported.[24] SIEVE is a label-free differential expression package that performs chromatographic alignment and global intensity-based MS1 feature extraction.[39] The package performs chromatographic alignment among sequential LC/MS runs using the ChromAlign algorithm.[40] Quantitative “frames” were defined based on m/z (width: 10 ppm) and retention time (width: 2.5 min) of peptide precursors in the aligned runs. Peptide ion current areas were calculated for individual replicates in each frame. Subsequent to extraction of ion current values, MS2 fragmentation scans associated with each frame were identified by importing the msf files created by Proteome Discoverer (cf. the database search and data validation procedure). Peptides shared among different protein groups were excluded from quantitative analysis. For SpC, MS2-TIC, and IC, relative quantification of protein levels was based on the sum of the respective abundance values of all peptides assigned to each protein, without any statistical outlier analysis. Normalization was performed against total abundance values in individual runs. In case of missing data, baseline quantitative values were assigned (0.5 and 1000, respectively, for SpC and MS2-TIC at the protein level and 1000 for IC at the peptide level). The values of 0.5 counts for spectral counting and 1000 for MS2-TIC and ion current (IC) were experimentally determined (Supplemental Figure 1). 
Statistical significance between groups (comparing case vs control samples) was evaluated using a Student’s t test, with a p-value cutoff of 0.05. The relative protein ratio of a protein between the groups was calculated by comparing the average abundance values of the protein in each group. Abundance change > 2-fold and p-value < 0.05 were used as the thresholds to define altered proteins.
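The quantification steps described in this section (sum-intensity aggregation, baseline substitution for missing peptide values, normalization against the run total, and the fold-change plus t-test thresholds) can be illustrated with a minimal sketch. All function names are hypothetical. Two assumptions are made that the text does not confirm: the 2-fold threshold is applied in either direction, and the p < 0.05 criterion is approximated by comparing a pooled-variance Student's t statistic with 2.776, the two-sided critical value for 4 degrees of freedom (three replicates per group).

```python
from math import sqrt
from statistics import mean, variance

BASELINE_IC = 1000.0  # peptide-level IC baseline quoted in the text


def protein_abundance(peptide_areas):
    """Sum-intensity aggregation; None marks a missing peptide value."""
    return sum(BASELINE_IC if area is None else area for area in peptide_areas)


def normalize(run):
    """Scale every protein value by the total abundance of the run."""
    total = sum(run.values())
    return {protein: value / total for protein, value in run.items()}


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def is_altered(case, control, t_crit=2.776):
    """Apply the >2-fold and p < 0.05 thresholds (t_crit assumes 3 vs 3)."""
    ratio = mean(case) / mean(control)
    fold_ok = ratio > 2 or ratio < 0.5  # assumption: either direction counts
    return fold_ok and abs(student_t(case, control)) > t_crit


run = {"P1": protein_abundance([2.0e6, None, 5.0e5]),
       "P2": protein_abundance([1.0e6])}
normalized = normalize(run)
up = is_altered([3.0, 3.2, 2.8], [1.0, 1.1, 0.9])
flat = is_altered([1.0, 1.1, 0.9], [1.0, 1.05, 0.95])
```

For other replicate counts the critical value changes, so a statistics library that returns the p-value directly would be the more general choice.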

Results and Discussion

Label-free approaches play a prominent role for relative proteomic quantification and biomarker discovery. To date, MS2-based methods have been the most common type of label-free approaches, especially for data generated by lower resolution MS.[10,31] This is largely due to the relatively poor specificity of low-resolution MS, which leads to difficulties in precise and selective measurement/match of peptide precursor ion currents among multiple runs of complex proteomic samples. Owing to the drastically increased availability of high-resolution instruments and the technical advances in hybrid instruments that have tremendously improved the robustness, throughput and sensitivity of high-resolution MS,[41−43] the application of the ion-current-based approach has been rapidly growing in the most recent years.[16,25,26] A comprehensive and updated comparison of these label-free methods based on the data generated by the high-resolution analyzers is highly valuable but remains to be conducted. To address this need, here we evaluated the ion current-based and several MS2-based approaches for quantitative reproducibility and accuracy, missing values, dynamic linear range, and performance for discovery of significantly altered proteins, using various high-resolution-MS data sets on complex proteomes that are either generated by our lab or publicly available.

Quantitative Reproducibility by the Three Label-Free Approaches

Good quantitative reproducibility is indispensable for accurate and precise proteomic quantification and reliable biomarker discovery. Here we evaluated the reproducibility of the three approaches (SpC, MS2-TIC, and IC), by correlating the quantitative results between duplicate LC-MS analyses of proteomic samples from seven different biological sources, including human bronchoalveolar lavage fluid, human skeletal muscle cells, rat brain, rat liver, rat retina, E. coli cells, and yeast cells. These samples represent a wide variety of biological matrices seen in typical proteomic investigations. Among the three approaches that are based on different abundance features, only IC is based on survey-scan (MS1). In case of missing data in any of the methods (i.e., the quantitative value of a protein is not measured in one or more replicates) a zero value was assigned to the affected replicate. The normalization was performed against the sum of all individual abundance values in the same replicate. In order to obtain a reliable comparison, a set of strict cutoff criteria for protein identification and validation were employed, resulting in a peptide FDR of 0.1% in individual data set (as determined by the target-decoy database searching strategy, see Materials and Methods). Linear regression of the correlation between the duplicate runs was performed for each of the seven types of proteomic samples. The R-squared values for SpC, MS2-TIC, and IC are 0.993 ± 0.007, 0.990 ± 0.007, and 0.998 ± 0.003, respectively. The good reproducibility achieved by these label-free approaches is in line with previous reports.[16,18,44,45] To further assess whether such correlations are abundance-dependent, we conducted comparison by separating the quantified proteins into two groups: high-abundance proteins (the top 33% abundant proteins as determined by spectral count) and lower-abundance proteins (the remaining 67% proteins). 
For high-abundance proteins, the R2 values for SpC, MS2-TIC, and IC are 0.992 ± 0.008, 0.989 ± 0.008, and 0.998 ± 0.003, respectively; for lower-abundance proteins, the R2 values for SpC, MS2-TIC, and IC are 0.407 ± 0.126, 0.702 ± 0.127, and 0.990 ± 0.008, respectively (Figure 1A). For lower-abundance proteins, only IC achieved a high quantitative reproducibility, which can also be visualized in Figure 1B, which shows representative scatter plots between two replicate runs for SpC, MS2-TIC, and IC. A substantially higher degree of reproducibility for quantifying lower-abundance proteins was observed for IC over SpC and MS2-TIC. All protein abundance values by these three approaches in the paired LC-MS runs of seven different biological samples are in the Supplemental Table I.
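The abundance-tiered reproducibility comparison described above can be sketched as follows, assuming a hypothetical record layout of `(spectral_count, replicate_1, replicate_2)` per protein. For a simple linear regression with an intercept, R2 equals the squared Pearson correlation, which is what `r_squared` computes; the toy numbers are invented for illustration and are not from the paper.

```python
def r_squared(x, y):
    """Coefficient of determination of a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def split_by_abundance(proteins, top_fraction=1 / 3):
    """Rank by spectral count; return (high-abundance, lower-abundance) lists."""
    ranked = sorted(proteins, key=lambda p: p[0], reverse=True)
    k = round(len(ranked) * top_fraction)
    return ranked[:k], ranked[k:]


# Invented values: (spectral count, abundance in run 1, abundance in run 2).
proteins = [(50, 10.0, 11.0), (40, 8.0, 8.5), (5, 1.0, 2.0),
            (3, 0.5, 0.2), (2, 0.4, 0.9), (1, 0.1, 0.3)]
high, low = split_by_abundance(proteins)
r2_low = r_squared([p[1] for p in low], [p[2] for p in low])
r2_all = r_squared([p[1] for p in proteins], [p[2] for p in proteins])
```

Because a few high-abundance proteins dominate the overall fit, `r2_all` can stay close to 1 even when the lower tier correlates poorly, which is why the tiers are examined separately.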

Figure 1

Quantitative reproducibility of the three label-free methods. (A) Comparison of the coefficient-of-determination (R2) of the linear regression by three methods including spectral count (SpC), MS/MS total ion current (MS2-TIC), and ion current (IC). Data of duplicate LC/MS runs of seven types of proteomic samples (human bronchoalveolar lavage fluid, human skeletal muscle cells, rat brain, rat liver, rat retina, E. coli cells, and yeast cells) were analyzed, and each data point represents the R2 of one of the proteome samples. The high-abundance proteins refer to the top 33% of all proteins ranked by spectral count, and the rest are designated as lower-abundance proteins. (B) Representative scatter plots of duplicate LC-MS/MS analyses by SpC, MS2-TIC, and IC. The two axes represent the quantitative abundance values of the same proteins, respectively, by the two duplicate runs.

Assessment of Quantitative Precision and the Level of Missing Data

The precision of SpC, MS2-TIC, and IC was evaluated by measuring coefficients of variation (CV) for the quantification of individual proteins, using five LC-MS runs (technical replicates, N = 5) of a yeast digest. The distributions of CV for the 1196 quantified proteins differed considerably among the three label-free methods. Figure 2A shows the box-and-whisker plot of these distributions, where the bottom and the top of the boxes correspond to the 25th and 75th percentile values of the CV distribution, respectively, the horizontal lines inside the boxes to the median CV values, and the whiskers to the minimum and maximum values. The median CV values for quantification of individual proteins are 38%, 52%, and 12%, respectively, for SpC, MS2-TIC, and IC. Figure 2B shows the distribution of CV versus relative protein abundance. While more than 99% of proteins have CV < 50% using the IC approach, only 56.5% and 49.6% of the proteins are under this threshold, respectively, for SpC and MS2-TIC. Furthermore, for all three methods, lower quantitative precision (i.e., higher CV) was observed for low-abundance proteins compared to the high-abundance ones. Among them, IC achieved much lower CV for low-abundance proteins than the other two methods, which is in agreement with the above observation that IC enabled more reproducible quantification for lower-abundance proteins (cf. Figure 1B). This result suggests that IC may be much more reliable than SpC and MS2-TIC for quantifying lower-abundance proteins. Support for this notion can also be found in previous observations, by various laboratories including ours, that spectral-count-based approaches yielded unreliable quantitative values for low-abundance peptides/proteins.[10,24,46,47]
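A minimal sketch of the per-protein CV summary used in this comparison is given below. The replicate values are invented, the names are hypothetical, and the sample standard deviation (n − 1 denominator) is an assumption, since the text does not state which form was used.

```python
from statistics import mean, median, stdev


def cv_percent(values):
    """Coefficient of variation in percent, using the sample standard deviation."""
    return 100 * stdev(values) / mean(values)


# Invented abundance values for three proteins across five replicate runs.
replicates = {
    "P1": [100.0, 104.0, 98.0, 101.0, 97.0],
    "P2": [10.0, 14.0, 7.0, 12.0, 9.0],
    "P3": [1.0, 3.0, 0.5, 2.5, 0.5],
}
cvs = {protein: cv_percent(vals) for protein, vals in replicates.items()}
median_cv = median(cvs.values())
# Share of proteins below the 50% CV threshold discussed in the text.
share_below_50 = 100 * sum(cv < 50 for cv in cvs.values()) / len(cvs)
```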

Figure 2

Coefficients of variation (CV) of the abundance values of the 1196 quantified yeast proteins by SpC, MS2-TIC, and IC (N = 5 LC-MS analyses). (A) Box-and-whisker plot analysis was employed to show the spread of protein CVs around the median value (the horizontal line inside the box); the bottom and top of the boxes correspond to the 25th and 75th percentiles of the CV distribution and the whiskers to the minimum and maximum values. (B) The distribution of CV vs protein abundance. Red circles indicate SpC, black squares indicate MS2-TIC, and blue triangles indicate IC data points.

Owing to the high complexity and wide dynamic ranges of typical proteomes and certain technical issues such as the sampling nature of data-dependent MS2 analysis, missing data (i.e., missing quantitative values in one or more replicates) is a prevalent challenge in quantitative proteomics, which may severely undermine the reliability of quantification and biomarker discovery.[48,49] Missing values may arise from technical and/or biological sources,[48−50] and here we evaluated the levels of missing values exclusively from technical aspects. The five replicate runs of the yeast sample were utilized to assess the frequency of missing abundance values at the protein level. For both SpC and MS2-TIC, the frequency of missing quantitative values equals the frequency of missing protein identifications. The missing values by SpC/MS2-TIC are summarized in Table 1, which shows that 13.2% of all analyzed proteins were identified/quantified in only one replicate and thus had missing values in the 4 other replicates (denoted as “4 missing” in Table 1); only 58.4% of all proteins had no missing values (i.e., were identified/quantified in all five replicates). By comparison, the IC approach was able to quantify the vast majority of proteins (99.8%) in all five replicates, rendering these proteins free of missing quantitative values (Table 1); the missing values of the remaining 0.2% of proteins exist in only one of the five replicates. This result demonstrated that IC is considerably less prone to the problem of missing quantitative values than MS2-based approaches. The explanation for such a dramatic difference is that the IC approach does not rely on MS2 for calculation of peak areas, thus decoupling missing quantitative values from missed MS2-based identifications in a replicate; given that high-resolution MS such as Orbitrap enables highly sensitive and specific MS1 detection and excellent matching of peptide precursors among different runs,[10,39,44] in many cases the IC approach is capable of quantifying a peptide in all replicates with sufficient sensitivity even if the peptide was identified only once in all LC-MS experiments. This also contributes to a more reliable relative quantification by IC (discussed below). By contrast, as discussed in the Introduction section, both SpC and MS2-TIC are liable to missing values because of the relatively lower sensitivity and reproducibility of MS2 spectra acquisition, stemming from the use of the dynamic exclusion technique and the fact that MS2 spectral acquisitions of low-abundance peptides are often suppressed by coeluting peptides.[10,21,22]

Table 1

Frequency of Missing Quantitative (Abundance) Values among the 1196 Quantified Proteins by Spectral Count (SpC), MS2-TIC, and Ion Current (IC) among Five Replicate LC-MS/MS Runs of a Yeast Sample (N = 5)

	no missing	1 missing	2 missing	3 missing	4 missing
SpC	58.4%	10.7%	10.4%	7.4%	13.2%
MS2-TIC	58.4%	10.7%	10.4%	7.4%	13.2%
IC	99.8%	0.2%	0.0%	0.0%	0.0%

The abundance values of each protein by the three methods and in each replicate are in the Supplemental Table II.
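The tally behind Table 1 can be sketched as below, assuming a hypothetical table that maps each protein to its five replicate values, with `None` marking a missing value. The toy data are invented and do not reproduce the percentages in Table 1.

```python
from collections import Counter


def missing_profile(table, n_reps):
    """Percent of proteins with 0, 1, ..., n_reps - 1 missing replicate values."""
    counts = Counter(sum(v is None for v in values) for values in table.values())
    total = len(table)
    return {k: 100 * counts.get(k, 0) / total for k in range(n_reps)}


# Invented example: four proteins, five replicate runs each.
table = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [1.0, None, 3.0, 4.0, 5.0],
    "C": [None, None, None, None, 5.0],
    "D": [1.0, 2.0, 3.0, 4.0, 5.0],
}
profile = missing_profile(table, n_reps=5)
```

A protein missing in every replicate would not appear in the table at all, which is why the profile stops at `n_reps - 1` missing values, matching the columns of Table 1.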

Quantitative Responses of SpC, MS2-TIC, and IC to Levels of Protein Spiked in a Complex Proteome

For a relative quantification method, the capacity to obtain linear, quantitative responses to protein abundances in a complex sample is important. A number of previous studies showed good linear correlations between the quantitative values by SpC or IC and protein abundances.[14,22,51,52] Here, to evaluate the correlation between the quantitative values obtained by different approaches and protein abundances in complex proteomes, we spiked E. coli extract with bovine serum albumin (BSA) at six different levels (0.025, 0.1, 0.5, 2.5, 12.5, and 62.5% BSA in the total protein). This series of mixtures was independently processed and digested using a precipitation/on-pellet digestion method[34] and then analyzed by LC-MS in triplicate. The relationships of the quantitative values of BSA given by SpC, MS2-TIC, and IC versus the relative BSA abundances are shown in Figure 3A–C. In this study, no peptide was identified by MS2 from the mixture of 0.025% BSA in E. coli; thus, only five different levels (0.1% to 62.5% BSA) were quantified for the SpC and MS2-TIC methods. For the IC approach, the ion currents of BSA peptides from 0.025% BSA were quantified with well-defined peaks, and the derived quantitative values fit the trend line well (Figure 3C). This indicates that the IC method achieves a wider dynamic range of protein quantification than SpC or MS2-TIC. Excellent linearity was observed for all three methods (R2 ≥ 0.99) over the entire concentration range, which spanned at least 3 orders of magnitude. This wide linear range suggests that the three methods may be able to accurately reveal large protein changes. Protein abundance data of BSA by the three methods are available in the Supplemental Table III.
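As a small worked calculation, the span of the spike-in series can be computed directly from the six BSA levels listed above. The sketch below only does this arithmetic: the full series, which IC quantified, spans about 2500-fold, whereas the five levels quantified by SpC and MS2-TIC (0.1% to 62.5%) span about 625-fold. The function name is hypothetical.

```python
from math import log10

# BSA spike levels (% of total protein) as listed in the text.
levels = [0.025, 0.1, 0.5, 2.5, 12.5, 62.5]


def fold_range(values):
    """Ratio of the highest to the lowest level in a dilution series."""
    return max(values) / min(values)


ic_fold = fold_range(levels)        # all six levels, quantified by IC
ms2_fold = fold_range(levels[1:])   # 0.025% level not identified by MS2
ic_orders = log10(ic_fold)          # orders of magnitude covered
ms2_orders = log10(ms2_fold)
```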

Figure 3

Quantitative responses of spectral count (SpC), MS2-TIC, and ion current (IC) vs protein abundance levels. BSA was spiked into E. coli extract at six different levels spanning a >1000-fold concentration range. Excellent linearity was observed for (A) spectral count (SpC), (B) MS2-TIC, and (C) ion current (IC). As no BSA-derived peptide was identified at the lowest level, this level was below the detection limits of SpC and MS2-TIC; by comparison, this level can be quantified by IC with sufficient S/N.

Quantitative Responses to Protein Loading Amounts by SpC, MS2-TIC, and IC

Investigating the correlation between the quantitative values and the total amounts of proteins loaded per LC/MS analysis may reveal the effect of variations in sample preparation and loading and the capacity of each approach to detect or tolerate uneven loading, which would provide valuable information for method development and quality control of label-free quantification approaches. Nonetheless, such a study has rarely been conducted. Griffin et al. demonstrated that protein SIN values (incorporating abundance features of unique peptide number, spectral count, and fragment ion intensity) of two LC-MS analyses with different protein loads exhibited a linear correlation (R2 > 0.94), and the slope of the line corresponded to the ratio of the two loading amounts.[18] Here the changes of quantitative values in response to varying total protein loading amounts by SpC, MS2-TIC, and IC were investigated. LC-MS analyses of a prostate cancer cell (PC3-LN4) sample at the loading levels of 0.5, 1, 2, and 4 μg per injection were utilized for this evaluation. In total, 1122 common protein groups were identified in all four loading levels, and linear regression analysis was performed correlating the quantitative values of these proteins at 1, 2, and 4 μg, respectively, versus those at 0.5 μg. In order to investigate the “native” quantitative responses by the three methods, normalization was not performed. The results are shown in Figure 4. The linearity of the correlations is good for all three methods, whereas the R2 values of IC (ranging from 0.987 to 0.994) are better than those of either SpC (0.899–0.936) or MS2-TIC (0.916–0.932). Interestingly, not all methods showed a linear response to protein loading levels. If a method exhibited a “perfect” linear response to loading amounts, the slopes of the trend lines of the 1, 2, and 4 μg injections (all against 0.5 μg) would be 2.00, 4.00, and 8.00, respectively. 
As shown in Figure 4, the three slopes by SpC were 1.03, 1.14, and 1.17, indicating no perceivable change in quantitative values in response to varying loading amounts. This is likely because “dynamic exclusion” (a commonly used feature in data-dependent MS2 experiments) compensates for the changes in total protein abundance. By comparison, both MS2-TIC and IC responded to loading amounts: the slopes for the 1, 2, and 4 μg vs 0.5 μg injections were, respectively, 2.06, 5.27, and 8.99 for MS2-TIC and 2.39, 4.12, and 7.88 for IC. These results indicate that both MS2-TIC and IC are capable of perceiving differences in sample loading, which may be a useful characteristic for assessing the quality of sample preparation and for cases such as the comparison of protein levels relative to units other than the amount of total protein (e.g., comparing protein levels per volume of a body fluid). On the other hand, these results also indicate that SpC is more tolerant of uneven sample loading, which is a valuable feature when it is difficult to achieve uniform loading across all samples. Finally, the results suggest that for both MS2-TIC and IC, highly reproducible procedures for sample preparation and LC/MS analysis are necessary to minimize the effect of variations, and a proper normalization approach needs to be in place. Protein abundance values determined by SpC, MS2-TIC, and IC for each protein group are shown in Supplemental Table IV.
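The slope comparison above can be made concrete with a short Python sketch. The reported slopes are taken from the text; the helper names `fit_slope` and `slope_recovery`, the use of ordinary least squares with an intercept, and the observed/ideal ratio as a summary are illustrative assumptions and are not stated to be the procedure used in the original analysis.

```python
# Ideal slopes for the 1, 2, and 4 ug injections plotted against 0.5 ug.
IDEAL = [1 / 0.5, 2 / 0.5, 4 / 0.5]  # 2.0, 4.0, 8.0

# Slopes reported in the text (Figure 4).
REPORTED = {
    "SpC": [1.03, 1.14, 1.17],
    "MS2-TIC": [2.06, 5.27, 8.99],
    "IC": [2.39, 4.12, 7.88],
}


def fit_slope(x, y):
    """Ordinary least-squares slope of y on x (with intercept).

    Assumption: the source does not say whether the trend lines were
    forced through the origin, so a free intercept is used here.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


def slope_recovery(observed, ideal=IDEAL):
    """Observed/ideal slope ratios; 1.0 means a perfectly proportional response."""
    return [o / i for o, i in zip(observed, ideal)]


recovery = {method: slope_recovery(s) for method, s in REPORTED.items()}
for method, values in recovery.items():
    print(method, ["%.2f" % v for v in values])
```

A recovery close to 1.0 at every loading level corresponds to a proportional response; values that fall steadily below 1.0 as the load increases, as for SpC here, correspond to a response that barely tracks the loading amount.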

Figure 4

Linear regression analysis correlating the quantitative values with protein loading amounts, by spectral count (SpC), MS2-TIC, and ion current (IC). The quantitative values of individual proteins at the 1, 2, and 4 μg loadings were individually plotted against those at 0.5 μg. Slopes of trend lines and R² values are shown.

Investigation of the Performance in Discovery of Significantly Altered Proteins Using a Publicly Available Data Set (CPTAC)

One of the most common goals of proteomics is to discover proteins that are differentially expressed between two states. In this study, we investigated the performance of SpC, MS2-TIC, and IC in discovering significantly altered proteins in a complex proteome. To rule out the possibility that the findings were associated only with the specific experimental procedures in our lab, a well-characterized, publicly available third-party data set (CPTAC study 6[36,53]) was employed for this investigation. Only the data sets generated by the high-resolution LTQ/Orbitrap were selected. In this subset of CPTAC experiments, the Universal Proteomics Standard set 1 (UPS1 from Sigma-Aldrich, containing 48 human proteins) protein mixture was spiked at different levels into yeast whole lysate, which represents an unchanged, complex proteomic background that is typical of routine biomarker discovery studies. More details of this study set are given in previous publications.[36,53] We first evaluated the three methods based on relative quantification of the Study 6B (0.74 fmol/μL UPS1 spiked into yeast lysate) vs 6A (0.25 fmol/μL UPS1 spiked into yeast lysate) samples, which contain relatively low abundances of spiked UPS proteins. Each sample was analyzed in triplicate. Stringent cutoffs for protein identification were employed to yield an FDR of 0.1% at the peptide level, and two unique peptides were required for each protein group. Quantitative values were normalized against the sum of total spectral counts (SpC), total product ion intensity (MS2-TIC), or total ion current peak area (IC). In the case of missing values, baseline values of 0.5, 1000, and 1000 were assigned for SpC, MS2-TIC, and IC, respectively.
A total of 761 yeast proteins and 14 UPS proteins were identified and quantified in the 6B vs 6A set, and the distributions of the ratios of these proteins (6B over 6A) are illustrated in Figure 5. The theoretical 6B/6A ratios of UPS proteins and yeast proteins are, respectively, ∼3 (1.57 on Log2 scale) and 1 (0 on Log2 scale). A previous study demonstrated that accurate quantification in the CPTAC 6B vs 6A data set may be difficult due to the low concentrations of the UPS proteins in these samples.[32] In this study, the observed mean ratios of the 14 UPS proteins were 3.94 ± 2.84, 29.02 ± 35.01, and 3.97 ± 0.99 by SpC, MS2-TIC, and IC, respectively. As shown in Figure 5A, the ratios of the 14 UPS proteins determined by IC were much more tightly centered around the theoretical value than those determined by either SpC or MS2-TIC. This is in agreement with the good quantitative performance observed for low-abundance proteins with IC (cf. Figures 1B and 2B) but not with SpC or MS2-TIC. A similar trend was also observed in the ratio distribution of the yeast proteins (Figure 5B), where the mean ratios of the 761 yeast proteins were 1.04 ± 0.35, 1.35 ± 2.15, and 1.03 ± 0.13, respectively, for SpC, MS2-TIC, and IC, and the ratios of individual yeast proteins by IC were also more tightly centered around the true value than those by the other two approaches. These results indicate that IC showed better accuracy and precision than SpC and MS2-TIC for relative protein quantification.
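As a worked illustration of the numbers above, the following Python sketch derives the theoretical 6B/6A ratio from the stated spike levels and summarizes the reported UPS ratios. The spike levels and the mean ± SD values come from the text; the function name `summarize` and the choice of fold bias and coefficient of variation as summary statistics are assumptions made for illustration only.

```python
import math

# UPS1 spike levels (fmol/uL) stated in the text for CPTAC Study 6A and 6B.
SPIKE_6A, SPIKE_6B = 0.25, 0.74

theoretical_ratio = SPIKE_6B / SPIKE_6A          # about 2.96, i.e. roughly 3
theoretical_log2 = math.log2(theoretical_ratio)  # about 1.57

# Mean and SD of the observed 6B/6A ratios of the 14 UPS proteins (from the text).
UPS_RATIOS = {
    "SpC": (3.94, 2.84),
    "MS2-TIC": (29.02, 35.01),
    "IC": (3.97, 0.99),
}


def summarize(mean_ratio, sd, expected):
    """Return (fold bias relative to the expected ratio, coefficient of variation)."""
    return mean_ratio / expected, sd / mean_ratio


summary = {
    method: summarize(mean_ratio, sd, theoretical_ratio)
    for method, (mean_ratio, sd) in UPS_RATIOS.items()
}
for method, (bias, cv) in summary.items():
    print(f"{method}: bias x{bias:.2f}, CV {cv:.2f}")
```

On these reported values, the coefficient of variation is smallest for IC (about 0.25, versus about 0.72 for SpC and 1.21 for MS2-TIC), which is the spread that the phrase "more tightly centered" refers to.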

Figure 5

Distribution of the protein ratios in a CPTAC data set (Study 6B over 6A) for (A) the 14 UPS proteins and (B) 761 yeast proteins quantified by spectral count (SpC), MS2-TIC, and ion current (IC).

In this study set, the levels of UPS proteins were significantly different between the two groups (mimicking significantly altered proteins between two proteomic samples), whereas the levels of all yeast proteins remained constant. Here we compared the three methods for their capacity to accurately discover the changed UPS proteins, the specificity and sensitivity of discovery, and the levels of false discoveries. The cutoff thresholds for significantly altered proteins were set as at least 2-fold differences between the two groups and p-values ≤0.05 (by Student’s t-test) for all three methods. The volcano plots (log2 ratios vs p-values) of UPS and yeast proteins by SpC, MS2-TIC, and IC are shown in Figure 6. The black dashed lines denote the cutoff thresholds, and the altered proteins meeting these thresholds are indicated by red dots. As shown in Figure 6A,B and Table 2, SpC discovered 11 significantly altered proteins, among which 5 are UPS proteins (i.e., true positives) and 6 are yeast proteins (i.e., false positives); with MS2-TIC, 5 UPS proteins and 16 yeast proteins were determined to be significantly altered (Figure 6C,D and Table 2). In contrast, all 14 UPS proteins were correctly discovered as significantly altered by IC with no false positives, as demonstrated in Figure 6E,F and Table 2. For this CPTAC study 6B vs 6A data set, the sensitivity of altered-protein discovery was 36%, 36%, and 100% by SpC, MS2-TIC, and IC, respectively, and the false discovery rate of IC was far lower than those of the other two methods (Table 2).
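The two-part cutoff described above (at least 2-fold change and p ≤ 0.05 by Student’s t-test) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: exactly three replicates per group, an equal-variance two-sided t-test applied to untransformed quantitative values, and the significance test expressed as a comparison of |t| with the 5% critical value for 4 degrees of freedom (about 2.7764). The function name `is_altered` and the example values are hypothetical; the source does not specify whether the test was run on raw or log-transformed values.

```python
import math
from statistics import mean, variance

# Two-sided 5% critical value of Student's t with 4 degrees of freedom
# (3 + 3 replicates minus 2); |t| >= this value corresponds to p <= 0.05.
T_CRIT_DF4 = 2.7764


def is_altered(group_a, group_b, min_fold=2.0, t_crit=T_CRIT_DF4):
    """Flag a protein as altered when it passes both the fold and t-test cutoffs.

    Assumption: triplicate measurements per group, so the fixed critical
    value for 4 degrees of freedom applies.
    """
    if len(group_a) != 3 or len(group_b) != 3:
        raise ValueError("this sketch assumes triplicate measurements per group")

    mean_a, mean_b = mean(group_a), mean(group_b)
    ratio = mean_b / mean_a
    fold_ok = ratio >= min_fold or ratio <= 1.0 / min_fold

    # Pooled-variance (equal-variance) two-sample t statistic for n = 3 per group.
    pooled_var = (variance(group_a) + variance(group_b)) / 2.0
    std_err = math.sqrt(pooled_var * (1.0 / 3.0 + 1.0 / 3.0))
    if std_err == 0.0:
        significant = mean_a != mean_b
    else:
        significant = abs(mean_b - mean_a) / std_err >= t_crit

    return fold_ok and significant


# Hypothetical example values, not taken from the data set.
print(is_altered([100, 110, 105], [300, 320, 310]))  # large, consistent change
print(is_altered([100, 110, 105], [102, 108, 111]))  # essentially unchanged
```

Requiring both conditions is what produces the corner regions of a volcano plot: a protein with a large but highly variable ratio fails the t-test, and a protein with a small but very reproducible shift fails the fold-change cutoff.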

Figure 6

Table 2

Sensitivity and Specificity for the Discovery of Altered Proteins (Biomarkers) by SpC, MS2-TIC, and IC Based on CPTAC Study 6 Data Setsᵃ

	spectral count (SpC)		MS2-TIC		ion current (IC)
	6B/6A	6C/6B	6B/6A	6C/6B	6B/6A	6C/6B
identified biomarkersᵇ	11	28	21	36	14	39
true positives (TP)ᶜ	5	11	5	13	14	28
true negatives (TN)	755	712	745	706	761	718
false positives (FP)	6	17	16	23	0	11
false negatives (FN)	9	21	9	19	0	4
sensitivity, TP/(TP + FN)	36%	34%	36%	41%	100%	88%
specificity, TN/(TN + FP)	99%	98%	98%	97%	100%	98%
false discovery rate, FP/(TP + FP)	55%	61%	76%	64%	0%	28%

ᵃ The data set consists of the high-resolution MS data of CPTAC study 6A, 6B, and 6C sets, which are triplicate analyses of UPS1 protein mixture spiked, respectively, at 0.25, 0.74, and 2.2 fmol/μL into yeast proteins (representing an unchanged proteomic background).

ᵇ The cutoff thresholds for biomarker discovery are >2-fold changes and p-value ≤0.05.

ᶜ Definition of the terms: if a UPS1 protein was determined as a biomarker, it is a true positive (TP), otherwise a false negative (FN); if a yeast protein was NOT determined as a biomarker, it is a true negative (TN), otherwise a false positive (FP).
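The derived rows of Table 2 follow directly from the four counts and the formulas given in the row labels. The Python sketch below recomputes them from the tabulated TP, TN, FP, and FN values; the function name `discovery_metrics` and the dictionary layout are illustrative choices, not part of the original analysis.

```python
def discovery_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, and false discovery rate from confusion counts."""
    called = tp + fp  # number of proteins called "altered"
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        # With no proteins called, the FDR is reported as 0.0 by convention here.
        "fdr": fp / called if called else 0.0,
    }


# (TP, TN, FP, FN) counts copied from Table 2.
TABLE2 = {
    ("SpC", "6B/6A"): (5, 755, 6, 9),
    ("SpC", "6C/6B"): (11, 712, 17, 21),
    ("MS2-TIC", "6B/6A"): (5, 745, 16, 9),
    ("MS2-TIC", "6C/6B"): (13, 706, 23, 19),
    ("IC", "6B/6A"): (14, 761, 0, 0),
    ("IC", "6C/6B"): (28, 718, 11, 4),
}

metrics = {key: discovery_metrics(*counts) for key, counts in TABLE2.items()}
for (method, comparison), m in metrics.items():
    print(
        f"{method} {comparison}: "
        f"sensitivity {m['sensitivity']:.1%}, "
        f"specificity {m['specificity']:.1%}, "
        f"FDR {m['fdr']:.1%}"
    )
```

Because the yeast background is large (761 or 729 proteins), specificity stays near 100% for every method even when false positives outnumber true positives; the false discovery rate is therefore the more informative column for comparing the methods.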

Volcano plots illustrating the discovery of altered proteins in CPTAC study 6B vs 6A set by spectral count (SpC, panels A and B), MS2-TIC (C and D), and ion current (IC, E and F) approaches. The levels of the 14 UPS proteins are different between the two groups (nominal 6B/6A ratio ≈ 3), whereas the levels of yeast proteins are the same. The Y-axis shows the log2 ratios of proteins quantified, and the X-axis shows the p-values (by Student’s t-test) for the comparison. Each dot represents a unique protein group, and the dashed lines denote the cutoff thresholds (p ≤ 0.05 and >2-fold change) that define significantly altered proteins, which are in turn shown as red dots.

We further investigated the performance of relative quantification and altered-protein discovery using the next tier of the quantification data set, study 6C (2.2 fmol/μL UPS1 spiked into yeast lysate) versus 6B (0.74 fmol/μL UPS1 spiked into yeast lysate), which also demonstrated the superior performance of the IC approach (Table 2). A total of 729 yeast proteins and 32 UPS proteins were identified and quantified in this data set. Consistent with the observations in the 6B vs 6A study, IC achieved the best sensitivity (88%) for altered-protein discovery, compared to 34% and 41%, respectively, for SpC and MS2-TIC (Table 2); moreover, the false-positive and false-negative levels of IC were far lower (Table 2). The quantitative values by SpC, MS2-TIC, and IC for each quantified protein group in Study 6B vs 6A and 6C vs 6B are shown in Supplemental Tables V and VI, respectively.

Conclusions

A comprehensive comparison of two types of label-free quantification approaches, ion current-based (IC) and MS2-based (SpC and MS2-TIC), was conducted using various data sets acquired by high-resolution MS. To date, SpC and MS2-TIC remain powerful and play an important role in classic biomarker discovery studies with apposite statistical tools, especially when low-resolution MS data are used. Nonetheless, it is evident that when high-resolution MS is used, the IC approach is considerably superior to SpC and MS2-TIC in terms of quantitative reproducibility and accuracy and is much less prone to the missing-data problem, thereby enabling more reliable proteomics quantification. Furthermore, IC proved to be a more sensitive, accurate, and reliable tool for biomarker discovery than SpC or MS2-TIC, with markedly lower false-positive and false-negative rates. Although high sample-to-sample reproducibility is more crucial for the IC approach, the development of informatics tools, such as good algorithms for LC-MS alignment, normalization of quantitative values in each replicate, and statistical outlier analysis, may significantly reduce this demand. Moreover, when coupled with extensive fractionation and separation approaches such as long-gradient nano-LC and SDS-PAGE fractionation,[16,24,33] an IC-based strategy may provide a dependable means for in-depth analysis and biomarker discovery in complex proteomes.[16,24] Given these favorable characteristics of the IC approach, it is expected that further research on this technique, including highly reproducible LC/MS analysis and statistical tools, and its clinical applications will emerge rapidly, and that its popularity among users of high-resolution MS will continue to grow.

DATA SHARING

All raw files and data processing files associated with this paper will be made available to the public for download upon request.

53 in total

1. Refinements to label free proteome quantitation: how to deal with peptides shared by multiple proteins.

Authors: Ying Zhang; Zhihui Wen; Michael P Washburn; Laurence Florens
Journal: Anal Chem Date: 2010-03-15 Impact factor: 6.986

2. Mass spectrometric discovery and selective reaction monitoring (SRM) of putative protein biomarker candidates in first trimester Trisomy 21 maternal serum.

Authors: Mary F Lopez; Ramesh Kuppusamy; David A Sarracino; Amol Prakash; Michael Athanas; Bryan Krastins; Taha Rezai; Jennifer N Sutton; Scott Peterman; Kypros Nicolaides
Journal: J Proteome Res Date: 2010-06-04 Impact factor: 4.466

3. Differential proteomics via probabilistic peptide identification scores.

Authors: Jacques Colinge; Diego Chiappe; Sophie Lagache; Marc Moniatte; Lydie Bougueleret
Journal: Anal Chem Date: 2005-01-15 Impact factor: 6.986

4. ChromAlign: A two-step algorithmic procedure for time alignment of three-dimensional LC-MS chromatographic surfaces.

Authors: Rovshan G Sadygov; Fernando Martin Maroto; Andreas F R Hühmer
Journal: Anal Chem Date: 2006-12-15 Impact factor: 6.986

Review 5. Proteomics by mass spectrometry: approaches, advances, and applications.

Authors: John R Yates; Cristian I Ruse; Aleksey Nakorchevsky
Journal: Annu Rev Biomed Eng Date: 2009 Impact factor: 9.590

6. XDIA: improving on the label-free data-independent analysis.

Authors: Paulo C Carvalho; Xuemei Han; Tao Xu; Daniel Cociorva; Maria da Gloria Carvalho; Valmir C Barbosa; John R Yates
Journal: Bioinformatics Date: 2010-01-26 Impact factor: 6.937

7. A novel alignment method and multiple filters for exclusion of unqualified peptides to enhance label-free quantification using peptide intensity in LC-MS/MS.

Authors: Xianyin Lai; Lianshui Wang; Haixu Tang; Frank A Witzmann
Journal: J Proteome Res Date: 2011-09-21 Impact factor: 4.466

8. Quantitative proteome analysis of human plasma following in vivo lipopolysaccharide administration using 16O/18O labeling and the accurate mass and time tag approach.

Authors: Wei-Jun Qian; Matthew E Monroe; Tao Liu; Jon M Jacobs; Gordon A Anderson; Yufeng Shen; Ronald J Moore; David J Anderson; Rui Zhang; Steve E Calvano; Stephen F Lowry; Wenzhong Xiao; Lyle L Moldawer; Ronald W Davis; Ronald G Tompkins; David G Camp; Richard D Smith
Journal: Mol Cell Proteomics Date: 2005-03-07 Impact factor: 5.911

9. An ion-current-based, comprehensive and reproducible proteomic strategy for comparative characterization of the cellular responses to novel anti-cancer agents in a prostate cell model.

Authors: Chengjian Tu; Jun Li; Yahao Bu; David Hangauer; Jun Qu
Journal: J Proteomics Date: 2012-09-07 Impact factor: 4.044

10. Neutron-encoded mass signatures for multiplexed proteome quantification.

Authors: Alexander S Hebert; Anna E Merrill; Derek J Bailey; Amelia J Still; Michael S Westphall; Eric R Strieter; David J Pagliarini; Joshua J Coon
Journal: Nat Methods Date: 2013-02-24 Impact factor: 28.547
