ICan: an optimized ion-current-based quantification procedure with enhanced quantitative accuracy and sensitivity in biomarker discovery.

Chengjian Tu¹, Quanhu Sheng, Jun Li, Xiaomeng Shen, Ming Zhang, Yu Shyr, Jun Qu.

Abstract

The rapidly expanding availability of high-resolution mass spectrometry has substantially enhanced the ion-current-based relative quantification techniques. Despite the increasing interest in ion-current-based methods, quantitative sensitivity, accuracy, and false discovery rate remain the major concerns; consequently, comprehensive evaluation and development in these regards are urgently needed. Here we describe an integrated, new procedure for data normalization and protein ratio estimation, termed ICan, for improved ion-current-based analysis of data generated by high-resolution mass spectrometry (MS). ICan achieved significantly better accuracy and precision, and a lower false-positive rate for discovering altered proteins, than currently popular pipelines. A spiked-in experiment was used to evaluate the performance of ICan in detecting small changes. In this study E. coli extracts were spiked with moderate-abundance proteins from human plasma (MAP, enriched by IgY14-SuperMix procedure) at two different levels to set a small change of 1.5-fold. Forty-five (92%, with an average ratio of 1.71 ± 0.13) of 49 identified MAP proteins (i.e., the true positives) and none of the reference proteins (1.0-fold) were determined as significantly altered proteins, with cutoff thresholds of ≥ 1.3-fold change and p ≤ 0.05. This is the first study to evaluate and prove competitive performance of the ion-current-based approach for assigning significance to proteins with small changes. By comparison, other methods showed remarkably inferior performance. ICan can be broadly applicable to reliable and sensitive proteomic survey of multiple biological samples with the use of high-resolution MS. Moreover, many key features evaluated and optimized here such as normalization, protein ratio determination, and statistical analyses are also valuable for data analysis by isotope-labeling methods.

Keywords: ion current; label-free; normalization; protein ratio determination; quantitative proteomics

Year: 2014 PMID: 25285707 PMCID: PMC4261937 DOI: 10.1021/pr5008224

Source DB: PubMed Journal: J Proteome Res ISSN: 1535-3893 Impact factor: 4.466

Introduction

Liquid chromatography–mass spectrometry (LC–MS) techniques are widely employed for the identification and relative/absolute quantification of proteins. LC–MS-based quantification approaches can be roughly divided into two main categories: (i) labeling techniques such as isobaric tags for relative and absolute quantification (iTRAQ),[1] tandem mass tags (TMTs),[2] stable isotope labeling by amino acids in cell culture (SILAC),[3] and neutron-encoded mass signatures (NeuCode)[4] and (ii) label-free methods such as spectral counting[5,6] and peptide ion-current-based[7−9] approaches. Recently, because of their simplicity, cost-effectiveness, and suitability for analyzing multiple biological samples,[10−12] ion-current-based approaches have emerged as an attractive tool in quantitative proteomics. This trend has also been boosted by the dramatically increasing availability of high-resolution MS instrumentation in the past few years.[13]

Besides well-controlled sample preparation and LC/MS procedures, an appropriate method for data analysis is also essential to achieve confident and accurate ion-current-based quantification. For instance, normalization is often applied in label-free quantitative proteomics to reduce the effect of complicated analytical variability and systematic bias.[14,15] Many normalization methods such as central tendency, lowess regression, and quantile normalization were first used in the analysis of microarray data[16,17] and have recently been adapted for analyzing proteomics data.[18,19] Different normalization approaches have been widely evaluated based on high-abundance peptides (common to all or the majority of LC–MS runs) in label-free quantitative proteomics.[14,19] Kultima et al. demonstrated that RegrRun (linear regression followed by analysis order normalization) effectively decreased the median SD by 43% on average compared with raw data in peaks that successfully matched across more than 50% of LC–MS analyses.[19] In addition, many factors involved in the normalization procedure, such as imputation (for missing values), retention time, precursor m/z, and prefractionation of sample, have also been studied in label-free quantification.[15,18,20]

Another important issue for data analysis is the choice of method to compute protein ratios from peptide quantitative information, which has been widely studied for labeling techniques.[21] It has been demonstrated that a simple sum-of-intensities algorithm achieved superior performance over other algorithms such as average of the ratios, libra ratio, linear regression, and total least-squares for estimation of true protein ratios.[21] A systematic evaluation in this regard has not been conducted for ion-current-based label-free methods, although various methods are applied in popular packages and procedures. The sum or average intensity method has been employed in packages such as the intensity-based absolute quantification (iBAQ, though the intensity is divided by the number of theoretically observable peptides),[22] Progenesis LC–MS software (Nonlinear Dynamics Limited, Newcastle upon Tyne, U.K.),[23] and the ion-current-based method we developed previously.[9,13] Packages such as Census[24] and SIEVE (Thermo Fisher Scientific, San Jose, CA)[12,25] have applied a variance-weighted method (based on the standard deviation or coefficient of variation of peaks/peptides) to calculate protein quantitation ratios. Other protein ratio estimation methods such as TOP3 (using the sum intensities of the top-three unique peptides)[26] and average ratios[27] are also employed in quantitative proteomics.

In this study, we developed and optimized a new label-free quantitative procedure for ion-current-based quantification, ICan (ion-current-based analysis), and evaluated its capacity for proteomic quantification and the discovery of significantly different proteins, even for those with small fold changes (1.5-fold). Key quantitative features such as frame filtering, normalization, protein ratio determination, and statistical analysis were comprehensively evaluated and optimized. With these optimizations, ICan significantly improved quantitative accuracy, sensitivity, and performance in discovering altered proteins over existing methods.
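As a rough illustration of how the protein ratio estimators named in this introduction can disagree, the following Python sketch applies sum-of-intensities, average-of-ratios, and TOP3 to one made-up protein. The function names, the peptide intensities, and the rule of ranking peptides by their combined intensity for TOP3 are assumptions made for this example and are not taken from any of the cited packages.

```python
from statistics import mean


def sum_of_intensities_ratio(group_a, group_b):
    """Ratio of summed peptide intensities (B over A)."""
    return sum(group_b) / sum(group_a)


def average_of_ratios(group_a, group_b):
    """Mean of the per-peptide B/A ratios."""
    return mean(b / a for a, b in zip(group_a, group_b))


def top3_ratio(group_a, group_b):
    """Sum-of-intensities ratio restricted to the three most intense peptides.

    Ranking by combined intensity in both groups is an assumption of this sketch.
    """
    ranked = sorted(zip(group_a, group_b), key=lambda ab: ab[0] + ab[1], reverse=True)[:3]
    return sum(b for _, b in ranked) / sum(a for a, _ in ranked)


# Hypothetical peptide intensities for one protein; the last, weak peptide is noisy.
a = [1000.0, 2000.0, 4000.0, 100.0]
b = [1500.0, 3000.0, 6000.0, 400.0]

print(round(sum_of_intensities_ratio(a, b), 3))  # weak peptide has little weight
print(average_of_ratios(a, b))                   # weak peptide counts as much as any other
print(top3_ratio(a, b))                          # weak peptide is ignored
```

In this toy case the single noisy low-intensity peptide shifts the average of ratios far more than the intensity-weighted estimators, which is one intuition behind preferring summed intensities.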

Materials and Methods

Sample Preparation

The PC3-LN4 cells and E. coli cells were from Kinex Pharmaceuticals (Buffalo, NY). The rat brain samples were from Buffalo General Medical Center (Buffalo, NY). Cell or tissue samples were homogenized in an ice-cold lysis buffer (50 mM Tris-formic acid, 150 mM NaCl, 0.5% sodium deoxycholate, 2% SDS, 2% NP-40, pH 8.0) using a Polytron homogenizer (Kinematica AG, Switzerland). Homogenization was performed in 10 bursts of 5–10 s each at 15 000 rpm, and the mixture was then sonicated in a cold room for ∼10 min with a low-power sonicator until the solution was clear. Lysates were centrifuged at 140 000g for 1 h at 4 °C. The supernatant was collected and stored at −80 °C until analysis. For preparation of moderate-abundance proteins (MAPs), the plasma sample (∼200 μL) from a healthy young woman was fractionated with an IgY14-SuperMix tandem column (Sigma-Aldrich), as previously reported.[28] Three buffers (dilution/washing buffer: 10 mM Tris-HCl, 150 mM NaCl, pH 7.4 (TBS); stripping buffer: 100 mM glycine, pH 2.5; neutralization buffer: 100 mM Tris-HCl, pH 8.0) were used for loading/washing, eluting, and neutralization, respectively. The resulting flow-through fraction (low-abundance proteins) and the bound/eluted fractions from IgY-14 (high-abundance proteins) and from SuperMix (MAPs) were collected separately. All fractions were then individually concentrated in an Amicon centrifugal filter with a 3-kDa molecular mass cutoff (EMD Millipore), followed by buffer exchange to 50 mM NH4HCO3 according to the manufacturer’s instructions. Protein concentration was measured using the BCA Protein Assay (Pierce, Rockford, IL). E. coli extracts of 100 μg and 90 μg were spiked, respectively, with bovine serum albumin (BSA) at four different levels (0.025, 0.05, 0.075, and 0.1% of total proteins) and with MAPs at two different levels (5 μg and 7.5 μg). All samples (each containing ∼100 μg of total protein) were reduced with TCEP (3 mM) for 10 min and then alkylated with 20 mM IAM for 30 min in darkness. A precipitation/on-pellet-digestion procedure was applied to perform precipitation and tryptic digestion as previously described.[9,29]

NanoLC–MS/MS Analysis

Peptide samples were analyzed using an ultrahigh pressure Eksigent (Dublin, CA) nano-2D Ultracapillary/nano-LC system coupled to an LTQ/Orbitrap XL hybrid mass spectrometer (Thermo Fisher Scientific, San Jose, CA). The mobile phase consisted of 0.1% formic acid in 2% acetonitrile (A) and 0.1% formic acid in 88% acetonitrile (B). Samples were loaded onto a reversed-phase trap (300 μm ID × 1 cm), with 1% mobile phase B at a flow rate of 10 μL/min, and the trap was washed for 3 min. A series of nanoflow gradients (flow rate, 250 nL/min) was used to back-flush the trapped samples onto the nano-LC column (75 μm ID × 75 cm, packed with 3 μm particles) for separation. The nano-LC column was heated to 52 °C to greatly improve both chromatographic resolution and reproducibility. To stabilize ionization efficiency, the spray tip was cleaned by dripping 50% methanol by gravity after every three runs. The parameters for MS were described in our previous publications.[9,12] In this study, for the spiked-in BSA experiment, each group at a different BSA concentration was analyzed four times; for the spiked-in MAP experiments, the two groups at different MAP concentrations were analyzed alternately, five times each. Five consecutive runs of the rat brain sample and six runs with different loading amounts of the PC3-LN4 cell sample (1 and 2 μg, three replicates per group) were further analyzed to assess different normalization methods. In addition, to assess the performance of biomarker discovery by ICan and iBAQ, we employed the “Study 6 LTQ Orbitrap XL @P65” data set generated by the Clinical Proteomic Technology Assessment for Cancer (CPTAC) program.[30] According to the publicly available documentation associated with this study, the Universal Proteomics Standard set 1 (UPS1, a 48-protein equimolar standard) was spiked at amounts of 0.25 and 0.74 fmol/μL into yeast lysate for sets A and B, the subset of studies investigated in the current work. Each sample was analyzed by nano-LC–MS with an Orbitrap XL analyzer in triplicate.

Database Search and Validation

Proteome Discoverer version 1.4.1.14 (Thermo-Scientific) was used to perform database searching against the Swiss-Prot protein database (version 06/13/2012) for the BSA spiked-in experiment, the experiment with five consecutive LC–MS runs, and the experiment with six LC–MS runs at different loading amounts. MaxQuant[31] v1.4.1.2, incorporated with the Andromeda search engine,[32] was used for the MAP spiked-in experiment and CPTAC study 6 data. The rat, human, E. coli, and yeast databases contained 7766, 20 238, 4431, and 7801 protein entries, respectively. The databases were augmented with the sequences of BSA, the UPS1 48 proteins (Sigma-Aldrich), and 118 MAPs (obtained from three replicate LC–MS/MS runs of the MAP sample) when appropriate. The search parameters used were as follows: 10 ppm tolerance for precursor ion masses and 1.0 Da for fragment ion masses. Two missed cleavages were permitted for fully tryptic peptides. Carbamidomethylation of cysteines was set as a fixed modification, and a variable modification of methionine oxidation was allowed. The false discovery rate (FDR) was determined by using a target-decoy search strategy.[33] The sequence database contained each sequence in both forward and reversed orientations, enabling FDR estimation. For result files from Proteome Discoverer, Scaffold v4.2.0 (Proteome Software, Portland, OR) was used to validate MS2-based peptide and protein identification based on cutoffs of cross-correlation (Xcorr) and Delta Cn values. The FDR was set to 0.01 and 0.05, respectively, for peptide and protein identifications. For MaxQuant, the FDR was set to 0.01 for both peptide and protein identifications. The identifications from the reverse database and common contaminants were eliminated.

Protein Quantification

The protein quantitative values based on MS2-TIC, NASF, and emPAI for each data set were obtained using Scaffold v4.2.0 under the same peptide/protein identification criteria. The iBAQ intensities, the sum of intensities of all peptides divided by the number of theoretically observable peptides, were obtained from MaxQuant using standard settings with the option “match between runs” selected. The iBAQ values for each protein were normalized against the sum of quantitative values in individual runs. The quantitative analysis by ICan was performed as shown in the pipeline (Figure 1). Peak detection and chromatographic alignment based on retention time, m/z, and charge states were performed by SIEVE v2.1 (Thermo Scientific, San Jose, CA). Quantitative frames/features were defined based on m/z (width: 10 ppm) and retention time (width: 2.5 min) of peptide precursors in the aligned runs. Peptide ion current areas were calculated for individual replicates in each frame. Subsequently, using in-house tools, the MS2 fragmentation scans associated with each frame were assigned to the peptide/protein identifications from Proteome Discoverer or MaxQuant as previously described. Frames assigned to multiple peptides were excluded in ICan. LOESS normalization[34] was performed to reduce systematic bias. In the case of missing data, a value of 1000 was assigned as the baseline quantitative value.[13] After further excluding frames shared among multiple proteins, intensities for frames with the same sequence were combined into the unique peptide intensity, and intensities for unique peptides of the same protein were then combined into the protein intensity, with Grubbs’ test analysis in both steps. Grubbs’ test was performed by the ListPOR (version 2.2.2104) program (panomics.pnnl.gov). A minimum data set presence of 3 and 2 and p value cutoffs of 0.01 and 0.05 were set at the frame level and unique peptide level, respectively. The relative protein ratio was calculated by comparing the summed abundance values of the protein in each group. Student’s t-test was applied to the log-transformed protein intensities for all of these methods. Abundance change ≥ 1.3-fold and p value ≤ 0.05 were used as the thresholds to define altered proteins. The p-value adjustments for multiple testing were evaluated according to sequential Bonferroni correction (SB),[35] Benjamini–Hochberg FDR control (BH),[36] and sequential Fisher’s combined probability test (SFisher).[37]
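The last steps of this procedure (baseline imputation, summation to protein intensity, group ratio, and a t statistic on log-transformed values) can be sketched in Python as follows. This is a minimal illustration and not the ICan code: the function names, the example intensities, the use of log base 2, and the pooled-variance form of the t statistic are assumptions, and the outlier rejection step is left out here.

```python
import math
from statistics import mean, variance

BASELINE = 1000.0  # baseline value assigned to missing data, as stated in the text


def protein_intensities(peptide_rows):
    """Sum peptide intensities per run, replacing missing values (None) by BASELINE.

    peptide_rows[i][j] is the intensity of peptide i in run j.
    """
    n_runs = len(peptide_rows[0])
    return [
        sum(BASELINE if row[j] is None else row[j] for row in peptide_rows)
        for j in range(n_runs)
    ]


def student_t(x, y):
    """Two-sample Student t statistic (pooled variance) and its degrees of freedom."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(y) - mean(x)) / math.sqrt(pooled * (1 / nx + 1 / ny))
    return t, nx + ny - 2


# Hypothetical peptide-by-run intensities for one protein (None = missing value).
group_a = [[1000.0, 1100.0, 900.0], [2000.0, None, 2100.0]]
group_b = [[1500.0, 1600.0, 1400.0], [3000.0, 3100.0, 2900.0]]

prot_a = protein_intensities(group_a)
prot_b = protein_intensities(group_b)
ratio = sum(prot_b) / sum(prot_a)  # summed abundance of group B over group A
t_stat, dof = student_t(
    [math.log2(v) for v in prot_a],  # log base 2 is an assumption
    [math.log2(v) for v in prot_b],
)
print(prot_a, prot_b, round(ratio, 3), round(t_stat, 2), dof)
```

Converting the t statistic into a p value requires the t distribution; a library routine such as `scipy.stats.ttest_ind` would normally be used for that step.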

Figure 1

Flowchart of ICan. The ICan supports identification results from Proteome Discoverer, MaxQuant, and Mascot. The optimal normalization and protein ratio estimation approaches were also integrated.

Results and Discussion

For label-free proteomic quantification, accurate and precise quantification of low-abundance proteins remains challenging. As demonstrated by various laboratories including ours, spectral count-based approaches resulted in suboptimal quantification of low-abundance proteins due to the inherent biases and variations in data-dependent sampling of fragment ions (MS2).[9,10,38] By comparison, ion-current-based approaches have been shown to afford markedly improved quantification for low-abundance proteins when efficient and reproducible liquid-chromatography (LC) separation and high-resolution MS are employed.[9,10,13,39] To date, owing to the prevalent use of high-resolution MS, ion-current-based methods have become the most promising label-free approaches.[10,13] However, comprehensive evaluation and optimization of data analysis approaches for ion-current-based quantification have not been adequately reported. Here, based on extensive evaluation and optimization, we developed an optimized ion-current-based procedure (Figure 1) termed ICan (ion-current-based analysis) and assessed its capacity for proteomic quantification and discovery of significantly altered proteins, even for those with small fold changes (1.5-fold). ICan is designed for data generated from high-resolution MS; in the current work, we chose to interface this pipeline with SIEVE (Thermo Scientific, San Jose, CA), which performs peak detection and chromatographic alignment based on retention time, m/z, and charge states. Each aligned quantitative feature (i.e., a frame, the set of peak areas of a specific peptide) was correlated with the peptide/protein ID information from popular software such as Proteome Discoverer, MaxQuant, and Mascot with scripts developed in-house. Streamlined processes for frame filtering, LOESS normalization,[34] and outlier detection by Grubbs’ test[40] on both frame and peptide levels were integrated in ICan. These processes were comprehensively optimized and shown to significantly improve quantitative accuracy, sensitivity, and performance in discovering altered proteins over existing strategies.

Frame Identification and Filtering

In this study, the frame identification is derived from the spectrum identification results from popular database search algorithms such as Proteome Discoverer and MaxQuant. We used in-house scripts to assign these peptide identifications to the individual frames. In our previous studies,[9,13] it was observed that some frames contained multiple unique peptides and thus may lead to unreliable quantification. Here we examined the shared frame issue using the analysis of a series of E. coli extracts spiked with BSA at four different levels (0.025, 0.05, 0.075, and 0.1% of total proteins; four replicates per group). A total of 818 proteins including BSA were identified with a peptide FDR of 0.1% (Supplemental Table 1 in the Supporting Information). Among the total of 13 801 quantitative frames (Supplemental Table 2 in the Supporting Information), 654 (4.7%) were assigned to multiple peptide IDs. Of these shared frames, 617 (94.3%) contained only two unique peptides (Supplemental Figure 1 in the Supporting Information). The peptides in shared frames likely have indistinguishable m/z and retention time, or some of them were derived from peptides misassigned during database search. We evaluated different cutoff thresholds for peptide/protein identification and found that more stringent cutoffs for identification (e.g., a lower identification FDR threshold) reduced the percentage of shared frames, and thus stringent identification criteria are advisable. Some representative data are shown in Table 1. Moreover, the peptide FDR in shared frame-associated spectra is much higher (∼9-fold) than the determined global peptide FDR (Table 1), indicating increased incorrect identifications in shared frames. Thus, in this study, shared frames containing multiple unique peptides were eliminated.
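The shared-frame filter described above amounts to dropping every frame whose MS2 identifications map to more than one unique peptide. A minimal Python sketch follows; the frame IDs, peptide sequences, and function name are hypothetical, and the in-house scripts used in the study are not reproduced here.

```python
def split_shared_frames(frame_to_peptides):
    """Separate frames mapped to a single unique peptide from shared frames.

    Assumes every frame carries at least one peptide identification.
    """
    unique, shared = {}, {}
    for frame_id, peptides in frame_to_peptides.items():
        target = unique if len(set(peptides)) == 1 else shared
        target[frame_id] = set(peptides)
    return unique, shared


# Hypothetical frame IDs and peptide sequences.
frames = {
    "F1": ["LVNELTEFAK"],
    "F2": ["LVNELTEFAK", "LVNELTEFAK"],  # two MS2 scans, same peptide: kept
    "F3": ["AEFVEVTK", "YLYEIAR"],       # two different peptides: shared, removed
}
unique, shared = split_shared_frames(frames)

# Percentage of shared frames from the counts reported for the spiked-in BSA data.
pct_shared = 100 * 654 / 13801
print(sorted(unique), sorted(shared), round(pct_shared, 2))
```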

Table 1

Evaluation of the Percentage of Shared Frames in Multiple Data Sets with Different Peptide FDR^a

	five replicates of yeast		20 replicates of rat brain		spiked-in BSA experiment
peptide FDR	0.51%	0.10%	0.52%	0.10%	0.45%	0.11%
protein FDR	7.69%	1.55%	9.92%	2.26%	8.35%	2.08%
identified proteins	1119	970	1199	1063	886	818
total frames	15508	12486	34264	29832	14557	13801
shared frames	502	236	2533	1553	958	654
percentage of shared frames	3.24%	1.89%	7.39%	5.21%	6.58%	4.74%
peptide FDR in shared frames	4.09%	0.61%	4.77%	0.92%	5.79%	1.10%

^a The spiked-in BSA experiment, five replicates of yeast, and 20 replicates of rat brain were analyzed in this study. Shared frames: frames assigned to multiple unique peptides.

Evaluation of Normalization Approaches

An optimal normalization method is indispensable to reduce systematic biases and variations and thus to ensure the accuracy and precision of relative quantification in multiple samples. Previously, many normalization approaches have been evaluated for label-free quantification on relatively high-abundance peptides that are commonly identified in all or the majority of LC–MS runs in an experimental set.[14,19] Here we evaluated all of the identified peptides, covering a wide range of abundance levels, with six different normalization methods: LOESS, quantile, upper-quantile, maximum intensity, median intensity, and total intensity normalization (Supporting Information). LOESS and quantile normalization performed best in the spiked-in BSA experiment, decreasing the median coefficients of variation (CVs) of E. coli peptide intensities by an average of 29% compared with the original data (Figure 2). We also evaluated these methods on two other data sets, five LC–MS runs of the same rat brain digest and six runs of the same digest with different load amounts, respectively, representing data sets with minimal and substantial variations of sample preparation and loading. The LOESS approach showed the most effective normalization (Supplemental Figure 2 in the Supporting Information) and thus was employed in ICan and subsequent studies.
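To make the evaluation criterion concrete, the Python sketch below applies one of the simpler methods listed above, median intensity normalization, and compares the median peptide CV before and after. It is only an illustration: the exact definitions used in the study are in the Supporting Information, so the scaling rule (matching each run median to the mean of the run medians), the function names, and the toy data are assumptions, and LOESS, the method adopted in ICan, is not implemented here.

```python
from statistics import mean, median, stdev


def median_normalize(runs):
    """Scale each run so that its median intensity equals the mean of the run medians."""
    medians = [median(run) for run in runs]
    target = mean(medians)
    return [[v * target / m for v in run] for run, m in zip(runs, medians)]


def peptide_cvs(runs):
    """CV (%) of each peptide across runs; runs[i][j] is peptide j in run i."""
    return [100 * stdev(values) / mean(values) for values in zip(*runs)]


# Hypothetical data: run 2 is the same sample as run 1 loaded at twice the amount.
runs = [[100.0, 200.0, 400.0], [200.0, 400.0, 800.0]]

raw_cv = median(peptide_cvs(runs))
norm_cv = median(peptide_cvs(median_normalize(runs)))
print(round(raw_cv, 2), round(norm_cv, 2))
```

In this toy case the loading difference is the only source of variation, so the normalized CV drops to zero; real data retain peptide-specific variation that a global scaling factor cannot remove.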

Figure 2

Evaluation of different normalization approaches using the spiked-in BSA data. BSA was spiked into E. coli extracts at four different levels (0.025, 0.05, 0.075, and 0.1% of total proteins; four replicates/group). A box-and-whisker plot (1–99 percentile) was used to analyze the coefficients of variation (CVs) of E. coli peptide intensities among these 16 LC–MS runs using different normalization methods.

After normalization, the level of missing data and the reproducibility of the quantitative features by ICan were further evaluated based on replicate LC–MS runs. One of the most prominent advantages of the ion-current-based approach over spectral counting or fragment-ion intensities (MS2-TIC) is the reliable quantification of low-abundance peptides, even when a peptide was identified only once in the entire sample set,[9,10,13] thus substantially reducing the frequency of missing values and improving the analytical reproducibility. As shown in Supplemental Figure 3 in the Supporting Information, although several methods were employed to improve the reproducibility of LC–MS analysis as previously described,[9] only 655 (80.1% of all identified) proteins were identified in all 16 LC–MS runs and thereby quantifiable by spectral counting or MS2-TIC without missing data. In contrast, ICan was able to quantify 816 (99.8%) proteins without any missing value across the 16 LC–MS runs. Two proteins (0.2%) were filtered out because all frames assigned to these two proteins were shared frames. We evaluated the quantitative reproducibility of ICan by correlating the protein intensities between any two of the four LC–MS analyses in the spiked-in BSA (0.075%) group. Here the protein intensity was obtained by summing the areas of all peptide peaks assigned to the specific protein. Linear regression of the correlation between two replicate runs was performed, and the R-squared values for paired correlations were all above 0.99, indicating excellent quantitative reproducibility (Figure 3). Moreover, high quantitative reproducibility was achieved for both high- (the upper segment of each line) and low-abundance proteins (the lower segment of each line). The reproducibility of spectral counting or MS2-TIC methods was far inferior, particularly for low-abundance proteins (Supplemental Figure 4 in the Supporting Information). These results are in agreement with previous reports.[9,13]
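For a simple linear regression between two replicate runs, R-squared equals the squared Pearson correlation of the paired protein intensities. The Python sketch below computes it for two made-up replicate runs; the intensities are hypothetical, and applying a log10 transform before the regression is an assumption, since the text does not state the scale used.

```python
import math


def r_squared(x, y):
    """Squared Pearson correlation between two equally long lists of values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical protein intensities of the same four proteins in two replicate runs.
run1 = [1.0e4, 1.0e5, 1.0e6, 1.0e7]
run2 = [1.1e4, 9.5e4, 1.05e6, 1.0e7]

r2 = r_squared([math.log10(v) for v in run1], [math.log10(v) for v in run2])
print(round(r2, 4))
```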

Figure 3

Scatter plot of quantitative feature pairs to evaluate the analytical reproducibility. The excellent correlation of protein intensities between different replicates of a spiked-in BSA (0.075%) group was observed. The two axes represent the quantitative abundance values of the same proteins, respectively, by the two duplicate runs.

Accuracy and Precision of Relative Quantification by ICan

The preliminary assessment of quantitative accuracy and precision by ICan was performed using the BSA spiked-in E. coli data. The expected ratio of the reference proteins (E. coli) was 1.00, and the five BSA comparisons have expected ratios of 1.33 (0.1%/0.075% BSA in E. coli), 1.50 (0.075%/0.05%), 2.00 (0.1%/0.05%), 3.00 (0.075%/0.025%), and 4.00 (0.1%/0.025%), respectively. As shown in Figure 4, ICan quantified nearly all identified proteins without missing data (as previously described), and the measured BSA ratios agreed very well with the expected values, with small relative deviations (0.3–9.5%). Excellent linearity between the nominal and observed ratios was achieved (Supplemental Figure 5 in the Supporting Information). The ratios of reference proteins determined by ICan were tightly centered around the theoretical value. The means and standard deviations of the ratios of reference proteins were, respectively, 0.99 ± 0.06, 1.01 ± 0.07, 1.00 ± 0.05, 1.00 ± 0.08, and 0.98 ± 0.05 for the five comparisons previously mentioned (N = 4/group), reflecting the high accuracy and precision achieved by ICan in calculating protein expression ratios. The popular MS2-based methods such as MS2-TIC, the normalized spectral abundance factor (NASF),[41] and exponentially modified protein abundance index (emPAI)[42] were also evaluated (Figure 4B–D). To achieve optimal analysis for these methods, we employed only the 655 proteins that had no missing data in any of the replicates when performing quantification with these approaches. Even though ICan quantified the lowest 20% of proteins by abundance, which the MS2-based methods did not, it still performed significantly better in terms of quantitative accuracy and precision, as shown in Figure 4. On the basis of these results, the following sections are focused on the comparison of ion-current-based strategies.
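The expected ratios quoted above follow directly from the four spike levels, and the relative deviation is the absolute difference between a measured and an expected ratio divided by the expected ratio. A short Python check is given below; the variable names are chosen for this example, and the measured ratio of 1.56 passed to the function is hypothetical.

```python
from itertools import combinations

# BSA spike levels in units of 0.001% of total protein (0.025, 0.05, 0.075, 0.1%).
levels = [25, 50, 75, 100]

# Every pairwise comparison, expressed as higher level over lower level.
pairs = {(hi, lo): hi / lo for lo, hi in combinations(levels, 2)}
distinct = sorted({round(r, 2) for r in pairs.values()})


def relative_deviation(measured, expected):
    """Relative deviation (%) of a measured ratio from its expected value."""
    return 100 * abs(measured - expected) / expected


print(len(pairs), distinct)
print(round(relative_deviation(1.56, 1.50), 1))  # hypothetical measured ratio
```

Four levels give six pairwise comparisons but only five distinct expected ratios, because 0.05%/0.025% also equals 2.00.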

Figure 4

Evaluation of accuracy and precision of relative quantification analysis by ICan, MS2-TIC, NASF, and emPAI using the spiked-in BSA data. BSA was spiked into E. coli extracts at four different levels (0.025, 0.05, 0.075, and 0.1% of total proteins). The expected ratio of 1 for reference proteins and five theoretical fold changes of 1.33 (0.1/0.075), 1.50 (0.075/0.05), 2.00 (0.1/0.05), 3.00 (0.075/0.025), and 4.00 (0.1/0.025) for BSA were investigated.

Sensitivity and False-Positive Rate for Discovering Altered Proteins

For proteomics quantification, one of the major aims is to discover the truly altered proteins as completely as possible while minimizing false-positives, which can otherwise lead to misleading biological clues and waste of resources in informatics analysis and validation. To evaluate the sensitivity and false positive rate (FPR) of biomarker discovery by ion-current-based quantification methods, we spiked a mixture of MAPs obtained from human plasma into E. coli extracts at two different levels (MAP-A: 90 μg of E. coli and 5 μg of MAP; MAP-B: 90 μg of E. coli and 7.5 μg of MAP). In this set, the expected ratios of reference proteins (E. coli) and MAPs were 1.00 and 1.50 (MAP-B/MAP-A), respectively. A total of 775 proteins including 49 MAPs were identified with a peptide and protein FDR of 1% each, using MaxQuant[31] (version 1.4.1.2). The list of peptide and protein identifications is shown in Supplemental Table 3 in the Supporting Information. When using 1.3-fold change (the lowest quantifiable fold-change by our ion-current-based quantitative method based on our previous investigation[9]) and p ≤ 0.05 (t test) as the cutoff thresholds, 45 of 49 (91.8%) MAPs and none of the reference proteins were determined as altered proteins (FPR = 0%) by ICan, as shown in Figure 5A (details in Supplemental Table 4 in the Supporting Information). The reference proteins and MAPs are indicated by blue and red dots, respectively. The spots in the gray-shaded area have ratios below 1.3-fold change. The mean and standard deviation of the ratios of the 45 altered MAPs quantified by ICan were 1.71 ± 0.13, demonstrating excellent sensitivity and accuracy in discovery of changed proteins. The outstanding ability of ICan to identify altered proteins was further demonstrated by an area under the curve of 0.97 in receiver-operating characteristic (ROC) analysis (Supplemental Figure 6 in the Supporting Information). In this study, it was shown that ICan could quantify nearly all identified proteins and assign significance with high sensitivity and low FPR to small changes of 1.5-fold, providing a competitive ability in the field of quantitative proteomics.
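The decision rule and the two summary rates for ICan reported above can be written out as a short Python sketch. The function names are hypothetical; treating the 1.3-fold cutoff symmetrically for increases and decreases and defining FPR as false positives divided by all reference proteins are assumptions of this example.

```python
def is_altered(ratio, p_value, fold_cutoff=1.3, alpha=0.05):
    """Call a protein altered if it changes at least fold_cutoff-fold and p <= alpha.

    The cutoff is applied in both directions (up or down), which is an assumption.
    """
    change = max(ratio, 1 / ratio)
    return change >= fold_cutoff and p_value <= alpha


def sensitivity(true_positives, total_positives):
    """Percentage of truly altered proteins that were called altered."""
    return 100 * true_positives / total_positives


def false_positive_rate(false_positives, total_negatives):
    """Percentage of unchanged (reference) proteins that were called altered."""
    return 100 * false_positives / total_negatives


n_maps = 49
n_reference = 775 - n_maps  # identified proteins that are not MAPs

sens_ican = sensitivity(45, n_maps)
fpr_ican = false_positive_rate(0, n_reference)
print(round(sens_ican, 1), fpr_ican)
print(is_altered(1.71, 0.01), is_altered(1.71, 0.20), is_altered(1.10, 0.001))
```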

Figure 5

Relative ratios obtained by (A) ICan and (B) iBAQ for a quantitative experiment of E. coli extracts spiked with human plasma moderate-abundance proteins (MAPs) (N = 5/group). In total, 49 MAP proteins (red dots, expected ratio is 1.5 between two groups) and 726 E. coli proteins (blue dots, expected ratio of 1.0) were quantified. Gray shade denotes ≤1.3-fold change (i.e., the cutoff threshold).

The iBAQ method, which divides the sum of intensities of all peptides by the number of theoretically observable peptides, was shown to be the most accurate among different absolute quantification methods in a previous work.[43] The intensities or iBAQ values for proteins were also obtained from MaxQuant using standard settings with the option of “match between runs” selected. Thus, here the same list of peptide/protein identifications from MaxQuant was shared and analyzed by ICan and iBAQ. We calculated the relative protein ratios from the iBAQ values (Supplemental Table 4 in the Supporting Information) for comparison with ICan. Using the same threshold, 42 of 49 (85.7%) MAPs and 7 reference proteins (Figure 5B) were determined as altered proteins (FPR = 14.3%) by iBAQ, with a significantly lower sensitivity and higher FPR than ICan. The mean and standard deviation of the ratios of the 42 altered MAPs quantified by iBAQ were 1.8 ± 0.2, also indicating good accuracy and precision in discovery of changed proteins. A third-party, publicly available data set, the Clinical Proteomic Technology Assessment for Cancer (CPTAC) study 6 data,[30] was employed for further investigation of these two methods in relative quantification. Here Study 6B (0.74 fmol/μL UPS1 spiked into yeast lysate) versus 6A (0.25 fmol/μL UPS1 spiked into yeast lysate) samples, which contain relatively low abundance of UPS proteins, were selected for analysis. The expected ratios for yeast proteins and UPS were, respectively, 1.0 and 3.0. After database searching by MaxQuant, a total of 777 proteins including 15 UPS proteins were identified in this study. Using the same threshold (≥1.3-fold and p ≤ 0.05), all 15 (100%) UPS proteins, with a median ratio of 3.25, and 1 yeast protein were determined by ICan as significantly altered proteins, while 12 (80%) UPS proteins, with a median ratio of 4.78, and 5 yeast proteins were determined by iBAQ (Supplemental Figure 7 and Supplemental Table 5 in the Supporting Information). Again, ICan was demonstrated to be superior in that it identified more true-positives (UPS proteins) with higher quantitative accuracy and lower FPR than the iBAQ method.

Evaluation of Protein Ratio Determination and Multiple Hypothesis Testing

Wrong peptide identification or incorrect assignment of a peptide ID to quantitative frames may severely compromise the quantification of the affected proteins; in quantitative analysis, these incorrectly identified or assigned peptides often appear as outliers, which must be removed to ensure reliable quantification. Here we used Grubbs’ test[40] to identify and eliminate outliers arising from wrong peptide assignment or large biological/technical variation before calculating the quantitative values of unique peptides and proteins. We further evaluated the protein ratio determination method by comparing the sum-of-intensities method with outlier removal against other popular approaches. Using the abundance values obtained by ICan, approaches for aggregating quantitative data from the peptide level to the protein level, namely TOP3, sum-of-intensity, average ratio, variance-weighted (based on the coefficient of variation of the peptides), and linear regression (Supporting Information), were evaluated against ICan. As previously described, these approaches have been widely used in quantitative proteomics. As shown in Figure 6A, the ICan, variance-weighted, average ratio, and sum-of-intensity approaches achieved similar sensitivity for biomarker discovery using the spiked-in MAP data. The ICan and variance-weighted approaches showed the lowest and second-lowest FPR, respectively, for identifying altered proteins. Without outlier analysis, the variance-weighted approach achieved sensitivity comparable to that of ICan in discovering altered proteins in this study, while the sensitivity of the other approaches was inferior (Figure 6A). In addition, Grubbs’ test outlier analysis clearly reduced the number of false positives (ICan vs sum-of-intensity) (Figure 6A). For instance, the E. coli protein glutamine-fructose-6-phosphate transaminase (Glms, expected ratio 1.0) was determined as an altered protein (1.56-fold, p value = 0.02) by the sum-of-intensity method. It was quantified by 11 unique peptides, 10 (90.9%) of which had ratios close to the expected value, as shown in Supplemental Figure 8 in the Supporting Information. The ICan analysis (sum-of-intensity with rejection) removes the outlier (red spots in Supplemental Figure 8 in the Supporting Information) and gives a protein ratio (0.97-fold, p value = 0.43) that agrees well with most of the peptide ratios. Therefore, we used sum-of-intensity with rejection for protein ratio estimation in the ion-current-based quantification procedure, replacing the sum-of-intensity approach we described in previous studies.[9,13]
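The effect of outlier rejection on a sum-of-intensity ratio can be illustrated with a minimal sketch. This is not the ICan implementation. It assumes that Grubbs’ test is applied once to the log2 peptide ratios of a protein and that the protein ratio is the ratio of summed peptide intensities. The peptide intensities are invented for illustration, and the critical value of 2.355 (n = 11, two-sided α = 0.05) is taken from standard Grubbs tables rather than from the paper.

```python
import math
from statistics import mean, stdev


def grubbs_outlier_index(values, g_crit):
    """Return the index of the most extreme value if its Grubbs statistic
    G = |x - mean| / sd exceeds g_crit; otherwise return None."""
    m, s = mean(values), stdev(values)
    if s == 0:
        return None
    idx = max(range(len(values)), key=lambda i: abs(values[i] - m))
    g = abs(values[idx] - m) / s
    return idx if g > g_crit else None


def protein_ratio(group_a, group_b, g_crit=None):
    """Sum-of-intensity protein ratio (B / A).

    If g_crit is given, a single Grubbs pass on the log2 peptide ratios
    removes at most one outlying peptide before the intensities are summed.
    """
    keep = list(range(len(group_a)))
    if g_crit is not None:
        log_ratios = [math.log2(b / a) for a, b in zip(group_a, group_b)]
        outlier = grubbs_outlier_index(log_ratios, g_crit)
        if outlier is not None:
            keep.remove(outlier)
    return sum(group_b[i] for i in keep) / sum(group_a[i] for i in keep)


# HYPOTHETICAL intensities for 11 peptides of one protein; the last peptide
# is an intense, misassigned feature with a 4-fold ratio.
control = [100, 120, 80, 90, 110, 95, 105, 130, 70, 85, 400]
spiked = [98, 125, 78, 93, 108, 97, 101, 128, 72, 83, 1600]

G_CRIT_N11 = 2.355  # ASSUMPTION: tabulated two-sided 5% critical value for n = 11

print("sum-of-intensity:", round(protein_ratio(control, spiked), 2))
print("with rejection:  ", round(protein_ratio(control, spiked, G_CRIT_N11), 2))
```

Because summed intensities are dominated by the most intense peptides, a single intense outlier can shift the protein ratio far from the value supported by the remaining peptides, which is the behavior described for Glms above. A production procedure would typically repeat the test until no further outlier is found and compute the critical value for each n.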

Figure 6

Evaluation of (A) different methods for aggregating quantitative data from the peptide level to the protein level and (B) multiple testing approaches for ICan-based quantification. The false-positive rate (FPR) and sensitivity for discovering altered proteins were investigated with the combination of statistical analysis and a fold-change filter (1.3-fold). A p value of 0.05 was adopted. For investigation of multiple testing, critical significance levels of both 0.05 and 0.10 were evaluated. SB, Sequential Bonferroni test; BH, Benjamini and Hochberg test; SFisher, Sequential Fisher combined probability test.

We also evaluated multiple hypothesis testing methods, namely Sequential Bonferroni correction (SB),[35] Benjamini-Hochberg FDR control (BH),[36] and Sequential Fisher’s combined probability test (SFisher),[37] to adjust the p values of the t test (Supporting Information). For this investigation, we used 0.05 and 0.10 as the critical significance levels. With the combination of a fold-change filter (1.3-fold threshold) and statistical testing (0.05 or 0.10), SFisher showed superior biomarker-discovery performance compared with the other two methods (Figure 6B and Supplemental Table 6 in the Supporting Information) in ion-current-based quantification. Forty-three (87.8%) MAPs and none of the reference proteins were identified as altered proteins by SFisher with the thresholds of ≥1.3-fold and p ≤ 0.05. The CPTAC data described above were also evaluated with multiple testing, and similar results were obtained (Supplemental Figure 9 and Supplemental Table 7 in the Supporting Information), indicating the superiority of SFisher.
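Two of the adjustments named above can be sketched in a few lines. The example assumes that the Sequential Bonferroni correction corresponds to Holm’s step-down procedure and that the fold-change filter is applied symmetrically (≥1.3 or ≤1/1.3). SFisher is not sketched because its steps are not described in this section. The function names, ratios, and p values are hypothetical.

```python
def holm_adjust(pvals):
    """Holm step-down (sequential Bonferroni) adjusted p values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # the k-th smallest p value (k = rank + 1) is multiplied by m - k + 1
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p values (step-up FDR control)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):  # walk from the largest p value down
        i = order[rank]
        running_min = min(running_min, pvals[i] * m / (rank + 1))
        adjusted[i] = running_min
    return adjusted


def is_altered(ratio, p_adj, fold=1.3, alpha=0.05):
    """Combine the fold-change filter with the adjusted significance level.
    ASSUMPTION: decreases (ratio <= 1/fold) count as changes too."""
    return p_adj <= alpha and (ratio >= fold or ratio <= 1.0 / fold)


# HYPOTHETICAL protein ratios and unadjusted t test p values
ratios = [1.71, 1.05, 1.45, 0.60]
pvals = [0.01, 0.04, 0.03, 0.005]

for name, adjust in (("SB", holm_adjust), ("BH", bh_adjust)):
    calls = [is_altered(r, p) for r, p in zip(ratios, adjust(pvals))]
    print(name, calls)
```

In this toy input the third protein passes the BH adjustment but not the Holm adjustment, which shows how the choice of correction changes the set of proteins reported as altered at the same nominal threshold.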

Conclusions

The ion-current-based quantitative approach has emerged as an attractive alternative to both spectral counting and labeling methods, and it can analyze many biological samples in large-scale studies such as clinical and pharmaceutical investigations. Recently, the wide availability of high-resolution MS has greatly improved the quality of ion-current-based analysis. Moreover, substantial advances in MS instrumentation (e.g., analysis of ∼4000 unique yeast proteins in a 1 h LC–MS/MS run using a hybrid Orbitrap MS instrument[44]) will markedly enhance the coverage of ion-current-based analysis. For the ion-current-based strategy, a data-processing procedure enabling accurate, precise, and sensitive quantification is critical. Here we demonstrated that the ICan procedure is optimal for ion-current-based quantitative analysis, providing superior quantitative accuracy and higher sensitivity for biomarker discovery with a lower FDR than the popular methods we tested. Furthermore, the comparative investigation of various quantitative features in this study provides valuable information for the development and evaluation of algorithms for both labeling and label-free methods.

Data Sharing

All raw files associated with this paper are available at https://chorusproject.org/pages/dashboard.html for free downloads.

40 in total

1. Tandem mass tags: a novel quantification strategy for comparative analysis of complex protein mixtures by MS/MS.

Authors: Andrew Thompson; Jürgen Schäfer; Karsten Kuhn; Stefan Kienle; Josef Schwarz; Günter Schmidt; Thomas Neumann; R Johnstone; A Karim A Mohammed; Christian Hamon
Journal: Anal Chem Date: 2003-04-15 Impact factor: 6.986

2. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias.

Authors: B M Bolstad; R A Irizarry; M Astrand; T P Speed
Journal: Bioinformatics Date: 2003-01-22 Impact factor: 6.937

3. Normalization of cDNA microarray data.

Authors: Gordon K Smyth; Terry Speed
Journal: Methods Date: 2003-12 Impact factor: 3.608

4. A model for random sampling and estimation of relative protein abundance in shotgun proteomics.

Authors: Hongbin Liu; Rovshan G Sadygov; John R Yates
Journal: Anal Chem Date: 2004-07-15 Impact factor: 6.986

5. Differential mass spectrometry: a label-free LC-MS method for finding significant differences in complex peptide and protein mixtures.

Authors: Matthew C Wiener; Jeffrey R Sachs; Ekaterina G Deyanova; Nathan A Yates
Journal: Anal Chem Date: 2004-10-15 Impact factor: 6.986

6. Exponentially modified protein abundance index (emPAI) for estimation of absolute protein amount in proteomics by the number of sequenced peptides per protein.

Authors: Yasushi Ishihama; Yoshiya Oda; Tsuyoshi Tabata; Toshitaka Sato; Takeshi Nagasu; Juri Rappsilber; Matthias Mann
Journal: Mol Cell Proteomics Date: 2005-06-14 Impact factor: 5.911

7. Comparative evaluation of mass spectrometry platforms used in large-scale proteomics investigations.

Authors: Joshua E Elias; Wilhelm Haas; Brendan K Faherty; Steven P Gygi
Journal: Nat Methods Date: 2005-09 Impact factor: 28.547

8. Normalization approaches for removing systematic biases associated with mass spectrometry and label-free proteomics.

Authors: Stephen J Callister; Richard C Barry; Joshua N Adkins; Ethan T Johnson; Wei-Jun Qian; Bobbie-Jo M Webb-Robertson; Richard D Smith; Mary S Lipton
Journal: J Proteome Res Date: 2006-02 Impact factor: 4.466

9. Guidelines for the routine application of the peptide hits technique.

Authors: Ji Gao; Mark S Friedrichs; Ashok R Dongre; Gregory J Opiteck
Journal: J Am Soc Mass Spectrom Date: 2005-08 Impact factor: 3.109

10. Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents.

Authors: Philip L Ross; Yulin N Huang; Jason N Marchese; Brian Williamson; Kenneth Parker; Stephen Hattan; Nikita Khainovski; Sasi Pillai; Subhakar Dey; Scott Daniels; Subhasish Purkayastha; Peter Juhasz; Stephen Martin; Michael Bartlet-Jones; Feng He; Allan Jacobson; Darryl J Pappin
Journal: Mol Cell Proteomics Date: 2004-09-22 Impact factor: 5.911
