
Statistical analysis of variation in the human plasma proteome.

Todd H Corzett, Imola K Fodor, Megan W Choi, Vicki L Walsworth, Kenneth W Turteltaub, Sandra L McCutchen-Maloney, Brett A Chromy.

Abstract

Quantifying the variation in the human plasma proteome is an essential prerequisite for disease-specific biomarker detection. We report here on the longitudinal and individual variation in human plasma characterized by two-dimensional difference gel electrophoresis (2-D DIGE) using plasma samples from eleven healthy subjects collected three times over a two-week period. Fixed-effects modeling was used to remove dye and gel variability. Mixed-effects modeling was then used to quantitate the sources of proteomic variation. The subject-to-subject variation represented the largest variance component, while the time-within-subject variation was comparable to the experimental variation found in a previous technical variability study in which one human plasma sample was processed eight times in parallel and each was then analyzed by 2-D DIGE in triplicate. Here, 21 protein spots had CVs greater than 50%, suggesting that these proteins may not be appropriate as biomarkers and should be carefully scrutinized in future studies. Seventy-eight protein spots showing differential protein levels between individuals or between collections from the same individual were identified by mass spectrometry and further characterized using hierarchical clustering. The results are a first step toward understanding the complexity of longitudinal and individual variation in the human plasma proteome and provide a baseline for improved biomarker discovery.


Year:  2010        PMID: 20130815      PMCID: PMC2814230          DOI: 10.1155/2010/258494

Source DB:  PubMed          Journal:  J Biomed Biotechnol        ISSN: 1110-7243


1. Introduction

Mapping the human proteome presents a significant scientific challenge, partly because of the complexity of the population and partly because of technological limitations [1]. However, the potential rewards in the diagnosis and treatment of disease make proteomic characterization of human plasma a worthwhile endeavor. The Human Proteome Organization (HUPO) is an international consortium of academic and industrial partners whose common goal is to foster collaboration and facilitate a better understanding of the human proteome. Recognizing the need for reproducibility, and following the lead of the more mature field of gene expression analysis, the community has begun to develop proteomic standards [2, 3]. The Human Plasma Proteome (HPP) project [4] of HUPO, which specifically targets plasma proteins, has made considerable progress while highlighting the complexity of plasma proteomics. For example, repeated protein identification of the same specimen yielded less than 50% agreement [5, 6], reflecting the challenges of biomarker discovery from human plasma [7, 8] and underlining the need for improved plasma proteomic characterization. Studies of prefractionation and other aspects of sample preparation aim to improve this process [9-12]. A primary technological problem is quantifying the experimental variation on a given proteomic platform. Next, the baseline variation within individuals over time and the variation between individuals also need to be quantitated. Searching for disease-specific biomarkers makes sense only after these two steps are addressed.
Our recent study, referred to as the Technical Variation Study (TVS) [13] throughout the manuscript, addressed the first question for two-dimensional difference gel electrophoresis (2-D DIGE) experiments by processing one human plasma sample eight times and analyzing each of the resulting eight technical replicates in triplicate on twelve gels [13-16]. The present study is a follow-up to the TVS, in which plasma samples from eleven healthy volunteer subjects, taken at three time points separated by two weeks, were analyzed in triplicate on 50 gels. The goal of this study was to assess longitudinal and individual variation in human plasma and to compare the results to the experimental variation detected in the previously reported TVS [13]. While differences were detected in the plasma proteome within individuals over time, our analyses indicate that individual variation contributes the largest share of the observed proteomic variability. Further, our results demonstrate that gender-related proteomic differences can be detected by 2-D DIGE and should also be considered in biomarker discovery. Overall, this work represents a first step in quantitating the individual and longitudinal proteomic variation in human plasma.

2. Materials and Methods

2.1. Sample Collection

Blood samples were collected from eleven healthy volunteers (five males, six females) at three time points separated by two weeks, with informed consent under Institutional Review Board approval from Lawrence Livermore National Laboratory. To minimize the effect of daily variation within an individual, each subject's samples were taken in the morning at approximately the same time, within a thirty-minute window, at every time point. Other variables (age, fasting, illness, medication, etc.) were not controlled, in order to better mimic the variability in typical human plasma samples. The two-week interval between collections was chosen to better examine longitudinal variation and to minimize the chance that an individual provided samples while experiencing an underlying condition such as a cold. Each individual was assigned an identification number to blind the samples and simplify the experimental design.

2.2. Top-6 High-Abundance Protein Depletion and Sample Preparation

To increase the resolution of the 2-D DIGE technique, the six most abundant plasma proteins were depleted using affinity chromatography, as previously reported [12]. Sample cleanup and protein assays were performed as described previously [12, 13, 17, 18].

2.3. 2-D DIGE and Gel Imaging

The 33 top-6-depleted plasma samples from the 11 individuals were analyzed in triplicate in a 50-gel 2-D DIGE experiment [13-16] (see Table 1 in Supplementary Material available online at doi: 10.1155/2010/258494). Each gel contained three samples: one internal pooled standard and two experimental samples. The internal pooled standard consisted of equal amounts of each of the 33 samples and was labeled with the Cy2 dye (GE Healthcare). Each experimental sample was dye-swapped, that is, labeled with both the Cy3 and the Cy5 dyes (GE Healthcare) across the experimental design, to mitigate potential dye-specific effects [19]. Some gels compared samples from one individual obtained at different times, while others compared samples from two different individuals. Gels were run in random order in batches of twelve to minimize batch-to-batch variability. Supplementary Table 1 shows the complete experimental design. Labeling, first-dimension (pI) separation, second-dimension (MW) separation, and gel imaging were performed as described previously [13]. Mass spectrometry was carried out as previously described [20].

2.4. Data Analysis

The DeCyder Differential Analysis Software v5.01 (GE Healthcare) was used for quantitating differential abundance of proteins. The Differential In-gel Analysis (DIA) module was used to determine the optimal spot detection settings. Images were loaded into the Batch Processor module with the estimated number of spots set to 2,500. The master gel was assigned automatically to the gel with the most spots detected. Each sample was grouped for analysis in the Biological Variation Analysis (BVA) module. During batch processing, the Cy2 channel from each gel was used for normalization of the spot intensities and for automated matching between gels. For each spot on each gel, the software reported the standardized abundance (SA) as the ratio of the volume in the Cy3 (or Cy5) sample to the volume of the pooled standard sample labeled with Cy2, where the volumes were normalized across the gels. Standardized log abundance (SLA), defined as log10(SA), was used in quantifying differential expression. Fold change between groups was calculated as the ratio of the average SA in the two groups. If R denotes that ratio, the fold change F was defined as F = R if R ≥ 1 and F = −1/R otherwise. A k-fold expression increase/decrease corresponded to a +k/−k value of F. Using DeCyder, all possible pairwise comparisons were made between the 33 groups defined by the eleven subjects at the three times. Within the BVA module, each comparison was filtered to find the spots having (a) P-value ≤.05 and (b) greater than 1.5-fold change in expression between the groups. The analysis was converted into DeCyder 2D (v6.5), and the Extended Data Analysis (EDA) module (GE Healthcare) was used to perform expression pattern clustering [21-26]. Data from the TVS were integrated into the analysis, pooled standards were normalized, and principal component analysis was conducted [27-30] on the spots that were successfully matched on >75% of the gels from the TVS and the present study [13].
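The SLA transform, the signed fold change, and the two BVA filtering criteria described above can be illustrated with a short Python sketch. This is not DeCyder's implementation. The function names are hypothetical, the direction of the ratio (group 2 relative to group 1) is our convention, and the P-value is assumed to be supplied by whatever test the caller used.

```python
import numpy as np

def standardized_log_abundance(sa):
    """SLA = log10(SA); SA is the Cy3/Cy5 spot volume divided by the Cy2 pooled-standard volume."""
    return np.log10(np.asarray(sa, dtype=float))

def fold_change(sa_group1, sa_group2):
    """Signed fold change of group 2 relative to group 1 (direction is our assumption).

    R is the ratio of the average SA in the two groups; F = R if R >= 1, else -1/R,
    so a k-fold increase maps to +k and a k-fold decrease maps to -k.
    """
    r = np.mean(sa_group2) / np.mean(sa_group1)
    return r if r >= 1 else -1.0 / r

def passes_filter(f, p_value, min_fold=1.5, alpha=0.05):
    """Hypothetical helper for the two criteria in the text: P <= .05 and |F| > 1.5."""
    return p_value <= alpha and abs(f) > min_fold
```

For example, average SAs of 1.0 and 2.0 give F = +2 in one direction and F = −2 in the other, so increases and decreases are treated symmetrically by the |F| > 1.5 threshold.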
Spot characteristics calculated by DeCyder were exported for further statistical processing into the R statistical computing environment [19] (http://www.r-project.org/). Summary statistics for spot matching across the gels were calculated. The high-quality spots, defined as the spots matched on at least 75% of the gels, were subjected to further analysis. Spotwise standard deviations (SDs) of the SLA values and coefficients of variation (CVs) of the SA values were calculated, first using all the data at a given spot to obtain one SD and one CV for that spot, and then performing the same calculations separately for each of the eleven subjects, thus obtaining eleven SD and CV values for each spot. The former method estimated the protein expression variation among all subjects and time points, while the latter addressed the variation within individuals through time. The results were compared to the spotwise SD and CV values obtained from the TVS [13]. To quantitate the relative contribution of the components of variation, mixed-effects statistical modeling [31] was performed. Let $y_{ijk}$ denote the SLA at spot i on gel j measured with dye k, with i = 1,…, I, where I is the number of spots matched on at least 75% of the gels, j = 1,…, 50, and k = 1, 2. In addition, let l = 1,…, 11 and m = 1, 2, 3 indicate the subject and time indices, respectively, and let g = 1, 2 indicate male and female gender, respectively. The assumed model was of the form

$$y_{ijkglm} = \mu + \alpha_j + \beta_k + (\alpha\beta)_{jk} + \gamma_{ig} + a_{il} + b_{ilm} + e_{ijkglm}, \qquad (1)$$

where $\mu$ is the overall mean, $\alpha_j$ the coefficients for the fixed gel effects, $\beta_k$ the coefficients for the fixed dye effects, $(\alpha\beta)_{jk}$ the coefficients for the gel-dye interactions, $\gamma_{ig}$ the coefficient for the fixed gender effect at spot i, and $a_{il}$, $b_{ilm}$, and $e_{ijkglm}$ the random effects for subject (individual), time (longitudinal), and error at spot i, independent and normally distributed [32] with mean zero and variances $\sigma^2_{a,i}$, $\sigma^2_{b,i}$, and $\sigma^2_{e,i}$, respectively.
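The high-quality filter and the two ways of computing spotwise variability (all data pooled versus separately per subject) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' R code. It assumes a hypothetical layout in which, for one spot, the SA values from all gel channels are held in a flat array with a parallel array of subject labels, and NaN marks channels where the spot was not matched. The use of sample SDs (ddof=1) is also an assumption, since the divisor is not stated.

```python
import numpy as np

def is_high_quality(matched_mask, min_fraction=0.75):
    """Keep a spot if it was matched on at least min_fraction of the gels.

    matched_mask: boolean array (n_gels,) for one spot.
    """
    return np.mean(matched_mask) >= min_fraction

def spot_sd_cv(sa, subjects):
    """Return (overall SD of SLA, overall CV of SA, per-subject SDs, per-subject CVs) for one spot.

    sa: standardized abundances for this spot, one per channel (NaN where unmatched)
    subjects: subject label for each entry of sa
    """
    sa = np.asarray(sa, dtype=float)
    subjects = np.asarray(subjects)
    ok = ~np.isnan(sa)                     # drop channels where the spot was not matched
    sa, subjects = sa[ok], subjects[ok]
    sla = np.log10(sa)                     # SDs are computed on the SLA scale

    def sd(x):
        return np.std(x, ddof=1)

    def cv(x):
        return np.std(x, ddof=1) / np.mean(x)   # CVs are computed on the SA scale

    labels = np.unique(subjects)
    per_subject_sd = {s: sd(sla[subjects == s]) for s in labels}
    per_subject_cv = {s: cv(sa[subjects == s]) for s in labels}
    return sd(sla), cv(sa), per_subject_sd, per_subject_cv
```

The overall values mix between-subject and within-subject variation, whereas each per-subject value reflects only the variation of one subject's samples across time points and replicates.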
The gender, subject, and time subscripts in (1) are redundant, because for any given j and k the identity of the sample, including the subject, gender, and time, was known; we included them for clarity of the model description. Since the gel and dye factors were balanced with respect to the spots (the first four terms in the model were common to all spots), (1) was fit in two stages, giving results equivalent to those of the full one-stage solution of (1) repeated at each spot while being computationally more efficient. Similar methods have been established for microarrays [33]. In the first stage, the data from all I spots were used to estimate the global dye and gel effects; that is, only the first four terms in (1) plus error were included in the model. In the second stage, the last four terms in (1) were fit to the residuals from the first stage, one spot at a time. In essence, the first stage amounted to a normalization step, in which the fixed dye and gel effects were estimated and removed by pooling information across all the spots. In the second stage, a fixed gender effect and the random variance components of subject, time, and error were estimated separately at each spot. Thus, at spot i, the total variance $\sigma^2_i$ was decomposed into its random components as $\sigma^2_i = \sigma^2_{a,i} + \sigma^2_{b,i} + \sigma^2_{e,i}$. The effect of additional statistical normalizations of the SLA on the variance component estimates and on the differentially expressed spots was also investigated: the SLA values obtained from DeCyder were further normalized by statistical methods that corrected for potential dye biases within gels and range differences among the gels, as previously described [34]. The spots determined to be of differential abundance (>1.5-fold difference with P-value <.05) were excised from the pick gel and identified by mass spectrometry as previously reported [12, 18, 20]. Identified spots were selected in DeCyder for additional expression pattern clustering.
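A minimal sketch of the two-stage idea follows, written in NumPy rather than the R code used in the study. Two simplifications are ours and should be read as assumptions. For stage 1, the data are arranged as a spots × channels matrix with one column per gel-dye combination; because each channel carries a single sample, the ordinary-least-squares fit of the saturated fixed part μ + α_j + β_k + (αβ)_jk pooled over spots is each column's mean, so stage 1 reduces to subtracting it. For stage 2, classical nested-ANOVA (method-of-moments) estimators stand in for the likelihood-based fit a mixed-model package would perform, with subjects nested in gender (fixed), time points nested in subjects, and a balanced layout (equal numbers of time points and replicates per subject) assumed. Negative moment estimates are truncated at zero.

```python
import numpy as np

def stage1_remove_gel_dye(y):
    """Stage 1: y has shape (n_spots, n_channels), one column per (gel, dye) channel.

    Subtracting each channel's mean over spots is the OLS fit of the saturated
    fixed model mu + alpha_j + beta_k + (alpha beta)_jk pooled across spots.
    """
    return y - np.nanmean(y, axis=0, keepdims=True)

def stage2_variance_components(r, gender):
    """Stage 2 for one spot, via nested-ANOVA moment estimators (not REML).

    r: stage-1 residuals, shape (L subjects, M time points, N replicates)
    gender: length-L array of gender labels (treated as a fixed effect)
    Returns (var_subject, var_time, var_error).
    """
    r = np.asarray(r, dtype=float)
    gender = np.asarray(gender)
    L, M, N = r.shape
    G = len(np.unique(gender))

    time_means = r.mean(axis=2)                   # (L, M) means over replicates
    subj_means = time_means.mean(axis=1)          # (L,) means over time points
    gender_means = {g: subj_means[gender == g].mean() for g in np.unique(gender)}
    subj_centered = subj_means - np.array([gender_means[g] for g in gender])

    # Mean squares for error, time within subject, and subject within gender.
    ms_error = ((r - time_means[:, :, None]) ** 2).sum() / (L * M * (N - 1))
    ms_time = N * ((time_means - subj_means[:, None]) ** 2).sum() / (L * (M - 1))
    ms_subject = M * N * (subj_centered ** 2).sum() / (L - G)

    # Expected mean squares: E[MS_E] = s2e, E[MS_T] = s2e + N s2b,
    # E[MS_S] = s2e + N s2b + M N s2a; solve and truncate at zero.
    var_error = ms_error
    var_time = max(0.0, (ms_time - ms_error) / N)
    var_subject = max(0.0, (ms_subject - ms_time) / (M * N))
    return var_subject, var_time, var_error
```

With unbalanced data or missing spots, a likelihood-based fit (for example, lme4 in R or statsmodels MixedLM in Python) would be the more appropriate choice for stage 2.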

3. Results and Discussion

3.1. Experimental Design

The experimental design is shown in Supplementary Table 1 and Supplementary Figure 1. Rather than pairing the samples randomly on the gels, we selected the design to minimize the experimental variation among the samples whose comparison was of most interest. Placing samples from one subject at different time points on the same gel minimized gel-related variation in intrasubject comparisons. Essentially, the design was based on two requirements: (1) each of the 33 samples had three replicates, and (2) comparisons of samples from the same subject were of more interest than comparisons of different subjects across time points. This led to a design with 22 gels comparing the same individual at different time points and 28 gels comparing two individuals at different time points. In addition, dye swapping and triplicate analysis contributed to the overall quality of the data. Our results suggest that gender-related variability may also be present. Because individual and longitudinal variability was our main objective, we did not attempt to control for gender differences; we included both males and females to obtain a more representative human sample set. Future work on human proteome variability should account for gender-specific variability in the experimental design.
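A gel layout of this kind can be checked mechanically against its stated requirements. The sketch below assumes a hypothetical representation in which each gel is a (Cy3 sample, Cy5 sample) pair and each sample is a (subject, time) tuple, with the Cy2 pooled standard left implicit. It counts replicates per sample and per dye and tallies within-subject versus between-subject gels, which can then be compared with the requirements above (three replicates per sample; 22 within-subject and 28 between-subject gels).

```python
from collections import Counter

def summarize_design(gels):
    """Summarize a 2-D DIGE layout (hypothetical representation).

    gels: list of (cy3_sample, cy5_sample) pairs; each sample is a (subject, time) tuple.
    Returns (replicates per sample, per-dye counts, number of within-subject gels,
    number of between-subject gels).
    """
    replicates = Counter()
    dye_counts = {"Cy3": Counter(), "Cy5": Counter()}
    within = between = 0
    for cy3, cy5 in gels:
        replicates[cy3] += 1
        replicates[cy5] += 1
        dye_counts["Cy3"][cy3] += 1
        dye_counts["Cy5"][cy5] += 1
        if cy3[0] == cy5[0]:      # same subject on both channels
            within += 1
        else:
            between += 1
    return replicates, dye_counts, within, between
```

The per-dye counts make it easy to confirm that each sample appears under both Cy3 and Cy5, which is the purpose of the dye swap.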

3.2. Spot Matching

Landmarks were placed manually on each gel to assist in spot matching across the gels. Spots of interest identified through the analyses were verified to have the three-dimensional profile characteristics of a protein spot. Spots with volumes close to background level, and dust particles with very large slopes and small areas, were eliminated. The total number of protein spots detected on the master gel was 2556. Three hundred ninety-seven spots (15%) were matched on at least 37 gels, and 1215 spots (46%) were matched on at least 25 gels. The following statistical analyses were restricted to the 397 high-quality spots matched on 75% of the gels. These spots were chosen to focus on spots that did not require warping or imputation of missing data. Future analyses may help determine whether warping and data imputation can expand the set of high-quality gel proteomic data. The latest version of DeCyder can warp gels, potentially adding missing data. Such additions may provide additional spots that can be studied as high quality, but this type of data manipulation may also skew expression values, as the results depend on the postrun imputation model used [35]. Our previous study with technical replicates of one human plasma sample [13] had 42% of the spots matched on 8 of 12 gels. The addition of biological samples from different subjects at varying time points added complexity to the current dataset and reduced the matching accuracy. While most of the decrease in matching accuracy is expected to stem from the biological complexity of the experiment, part of it may be attributed to the larger number of gels, which inevitably increases the expected experimental variation.
A study of five commercial software programs showed that, on average, only 3% of the total analysis time was automated rather than manual, and that as the number of gels increased, the percentage of correct automatically generated matches dropped dramatically [36]. Taken together, these studies suggest that improved spot detection and matching software and algorithms are needed to increase the quality of spot matching. One such effort developed algorithms that improve spot matching through an integrated approach combining hierarchical-based and optimization-based methods [37].

3.3. Differential Expression and Protein Identifications

The pairwise comparisons among the 33 samples identified over 1400 spots with P-value <.05 and fold change >1.5. Manual inspection eliminated most of these spots (because their three-dimensional profiles were not representative of protein or because the spot was not represented on enough gels), leaving 427 spots of further interest. Most of the spots that did not pass manual verification were similar to background levels and lacked the visual characteristics of protein spots. The sensitive detection parameters used in this study, while allowing the detection of low-abundance proteins, also increase the detection of artifacts that require manual verification. Of the 427 spots with differential abundance, those exhibiting the greatest differences in abundance were further characterized, and 78 protein spots were identified by mass spectrometry. The identified proteins are listed in Table 2, along with the theoretical pI and molecular weight calculated from the full-length amino acid sequence of each protein. Figure 1 depicts the spatial distribution of the protein spots detected in human plasma by 2-D DIGE. Identified proteins are denoted by blue dots. Sixteen proteins (red dots) showed differences between their theoretical and observed molecular weights. The discrepancies are potentially due to posttranslational modifications or experimental processing; however, since all samples were treated identically, posttranslational modification is more likely. For example, spots 2189 and 2184, both identified as complement component C4A, were statistically significant with at least a 1.5-fold difference between individuals. Complement component C4A has a theoretical molecular weight of 192.8 kDa, yet the identified protein spots indicate a fragment of approximately 32 kDa.
Since only C-terminal peptides were detected by mass spectrometry, the protein spot likely corresponds to the active complement C4c fragment (MW = 33 kDa), a known cleavage product of complement C4A [38, 39]. Variability in the amount of the C4c fragment between individuals could reflect immune status, which may be an important variable when comparing human clinical subjects.
Table 2

Variable proteins identified from human plasma.

Protein number | Protein identity | Accession number | MW (kDa)^a | pI^a | Gene
614a | clusterin | IPI00400826 | 57.8 | 6.25 | CLU
614b | alpha-2-macroglobulin precursor | IPI00478003 | 163.3 | 6 | A2M
835a | alpha-2-macroglobulin precursor | IPI00478003 | 163.3 | 6 | A2M
835b | Complement C3 precursor | IPI00783987 | 187.1 | 6.02 | C3
849 | Plasminogen | IPI00019580 | 90.6 | 7.04 | PLG
856 | Plasminogen | IPI00019580 | 90.6 | 7.04 | PLG
881a | complement factor B preproprotein | IPI00019591 | 85.5 | 6.67 | CFB
881b | complement protein C7 precursor | IPI00296608 | 93.5 | 6.09 | C7
881c | Complement C3 precursor | IPI00783987 | 187.1 | 6.02 | C3
884 | Plasminogen | IPI00019580 | 90.6 | 7.04 | PLG
893 | complement factor B preproprotein | IPI00019591 | 85.5 | 6.67 | CFB
899a | complement factor B preproprotein | IPI00019591 | 85.5 | 6.67 | CFB
899b | complement protein C7 precursor | IPI00296608 | 93.5 | 6.09 | C7
899c | Complement C3 precursor | IPI00783987 | 187.1 | 6.02 | C3
910a | fibrinogen gamma | IPI00219713 | 49.5 | 5.7 | FGG
910b | complement factor B preproprotein | IPI00019591 | 85.5 | 6.67 | CFB
956a | complement component 1, r subcomponent | IPI00296165 | 80.2 | 5.89 | C1R
956b | complement component C4A | IPI00032258 | 192.8 | 6.66 | C4A
963 | complement component 1, r subcomponent | IPI00296165 | 80.2 | 5.89 | C1R
1002 | complement component 1, s subcomponent | IPI00017696 | 76.7 | 4.86 | C1S
1004a | gelsolin | IPI00646773 | 80.6 | 5.58 | GSN
1004b | complement component 2 | IPI00303963 | 83.3 | 7.23 | C2
1004c | complement factor B preproprotein | IPI00019591 | 85.5 | 6.67 | CFB
1004d | complement protein C7 precursor | IPI00296608 | 93.5 | 6.09 | C7
1004e | alpha-2-macroglobulin precursor | IPI00478003 | 163.3 | 6 | A2M
1004f | Complement C3 precursor | IPI00783987 | 187.1 | 6.02 | C3
1004g | complement component C4A | IPI00032258 | 192.8 | 6.66 | C4A
1027 | transferrin | IPI00022463 | 77 | 6.81 | TF
1110 | IGHM protein | IPI00828205 | 65.3 | 8.1 | IGHM
1113a | IGHM protein | IPI00828205 | 65.3 | 8.1 | IGHM
1113b | transferrin | IPI00022463 | 77 | 6.81 | TF
1128a | IGHM protein | IPI00828205 | 65.3 | 8.1 | IGHM
1128b | transferrin | IPI00022463 | 77 | 6.81 | TF
1129a | histidine-rich glycoprotein precursor | IPI00022371 | 59.6 | 7.09 | HRG
1129b | coagulation factor XII precursor | IPI00019581 | 67.5 | 7.94 | F12
1129c | transferrin | IPI00022463 | 77 | 6.81 | TF
1142a | histidine-rich glycoprotein precursor | IPI00022371 | 59.6 | 7.09 | HRG
1142b | transferrin | IPI00022463 | 77 | 6.81 | TF
1156 | transferrin | IPI00022463 | 77 | 6.81 | TF
1185 | transferrin | IPI00022463 | 77 | 6.81 | TF
1254 | hemopexin precursor | IPI00022488 | 51.5 | 6.57 | HPX
1263a | histidine-rich glycoprotein precursor | IPI00022371 | 59.6 | 7.09 | HRG
1263b | Complement C3 precursor | IPI00783987 | 187.1 | 6.02 | C3
1276a | hemopexin precursor | IPI00022488 | 51.5 | 6.57 | HPX
1276b | Heparin cofactor II precursor | IPI00292950 | 57.1 | 6.41 | HCF2
1276c | peptidoglycan recognition protein L precursor | IPI00163207 | 62.2 | 7.25 | PGLYRP
1382 | kininogen | IPI00215894 | 47.9 | 6.29 | KNG
1394 | albumin | IPI00745872 | 69.1 | 5.85 | ALB
1456 | alpha-1-antichymotrypsin precursor | IPI00550991 | 45.5 | 5.32 | AACT
1471a | immunoglobulin alpha-1 heavy chain | IPI00166866 | 37.6 | 6.06 | IGHA1
1471b | kininogen | IPI00215894 | 47.9 | 6.29 | KNG
1471c | antithrombin III | IPI00032179 | 52.6 | 6.32 | AT3
1525 | Vitronectin precursor | IPI00298971 | 54.3 | 5.55 | VTN
1526a | kininogen | IPI00215894 | 47.9 | 6.29 | KNG
1526b | Angiotensinogen | IPI00032220 | 53.2 | 5.78 | AGT
1555 | Vitronectin precursor | IPI00298971 | 54.3 | 5.55 | VTN
1558 | immunoglobulin alpha-2 heavy chain | IPI00641229 | 36.4 | 5.71 | IGH
1568a | kininogen | IPI00215894 | 47.9 | 6.29 | KNG
1568b | antithrombin III | IPI00032179 | 52.6 | 6.32 | AT3
1568c | Angiotensinogen | IPI00032220 | 53.2 | 5.78 | AGT
1577 | Alpha-2-HS-glycoprotein | IPI00022431 | 39.3 | 5.43 | AHSG
1589a | immunoglobulin alpha-1 heavy chain | IPI00166866 | 37.6 | 6.06 | IGHA1
1589b | apolipoprotein H precursor | IPI00298828 | 38.3 | 8.34 | APOH
1589c | prepro-plasma carboxypeptidase B | IPI00329775 | 48.4 | 7.61 | pCPB
1589d | fibrinogen beta chain | IPI00298497 | 55.9 | 8.54 | FGB
1589e | alpha-2-macroglobulin precursor | IPI00478003 | 163.3 | 6 | A2M
1616a | apolipoprotein H precursor | IPI00298828 | 38.3 | 8.34 | APOH
1616b | fibrinogen beta chain | IPI00298497 | 55.9 | 8.54 | FGB
1626 | Alpha-2-HS-glycoprotein | IPI00022431 | 39.3 | 5.43 | AHSG
1648 | fibrinogen beta chain | IPI00298497 | 55.9 | 8.54 | FGB
1650a | apolipoprotein D | IPI00006662 | 28 | 5.14 | APOD
1650b | Alpha-2-HS-glycoprotein | IPI00022431 | 39.3 | 5.43 | AHSG
1650c | alpha-1-antichymotrypsin precursor | IPI00550991 | 45.5 | 5.32 | AACT
1650d | fibrinogen gamma | IPI00219713 | 49.5 | 5.7 | FGG
1652 | Alpha-2-HS-glycoprotein | IPI00022431 | 39.3 | 5.43 | AHSG
1725 | vitamin D-binding protein precursor | IPI00742696 | 52.9 | 5.32 | GC
1731 | vitamin D-binding protein precursor | IPI00742696 | 52.9 | 5.32 | GC
1740 | vitamin D-binding protein precursor | IPI00742696 | 52.9 | 5.32 | GC
1741 | vitamin D-binding protein precursor | IPI00742696 | 52.9 | 5.32 | GC
1744 | vitamin D-binding protein precursor | IPI00742696 | 52.9 | 5.32 | GC
1749 | vitamin D-binding protein precursor | IPI00742696 | 52.9 | 5.32 | GC
1752 | vitamin D-binding protein precursor | IPI00742696 | 52.9 | 5.32 | GC
1843a | pigment epithelial-differentiating factor | IPI00006114 | 46.3 | 5.84 | PEDF
1843b | fibrinogen gamma | IPI00219713 | 49.5 | 5.7 | FGG
1898 | vitamin D-binding protein precursor | IPI00742696 | 52.9 | 5.32 | GC
1911a | serum paraoxonase | IPI00218732 | 37.8 | 4.96 | PON
1911b | fibrinogen gamma | IPI00219713 | 49.5 | 5.7 | FGG
1911c | Complement C3 precursor | IPI00783987 | 187.1 | 6.02 | C3
1918 | serum paraoxonase | IPI00218732 | 37.8 | 4.96 | PON
1925a | serum paraoxonase | IPI00218732 | 37.8 | 4.96 | PON
1925b | fibrinogen gamma | IPI00219713 | 49.5 | 5.7 | FGG
1985a | serum paraoxonase | IPI00218732 | 37.8 | 4.96 | PON
1985b | haptoglobin | IPI00641737 | 45.2 | 6.13 | HP
1986a | apolipoprotein A-IV precursor | IPI00304273 | 43.4 | 5.22 | APOA4
1986b | haptoglobin | IPI00641737 | 45.2 | 6.13 | HP
1986c | serum paraoxonase | IPI00218732 | 37.8 | 4.96 | PON
1998 | apolipoprotein A-IV precursor | IPI00304273 | 43.4 | 5.22 | APOA4
2008 | apolipoprotein A-IV precursor | IPI00304273 | 43.4 | 5.22 | APOA4
2029 | alpha-2-glycoprotein 1 | IPI00166729 | 34.3 | 5.71 | AZGP1
2030a | apolipoprotein A-IV precursor | IPI00304273 | 43.4 | 5.22 | APOA4
2030b | haptoglobin | IPI00641737 | 45.2 | 6.13 | HP
2065a | haptoglobin | IPI00641737 | 45.2 | 6.13 | HP
2065b | Complement factor I | IPI00291867 | 65.8 | 7.72 | CFI
2095a | alpha-1-antichymotrypsin precursor | IPI00550991 | 45.5 | 5.32 | AACT
2095b | clusterin | IPI00400826 | 57.8 | 6.25 | CLU
2130a | Proapolipoprotein | IPI00021841 | 29 | 5.45 | APOA1
2130b | Complement factor I | IPI00291867 | 65.8 | 7.72 | CFI
2137 | clusterin | IPI00400826 | 57.8 | 6.25 | CLU
2184 | complement component C4A | IPI00032258 | 192.8 | 6.66 | C4A
2189 | complement component C4A | IPI00032258 | 192.8 | 6.66 | C4A
2191 | transthyretin | IPI00022432 | 15.9 | 5.5 | TTR
2236 | immunoglobulin kappa light chain | IPI00784070 | 26 | 8.16 | IGKC
2259 | amyloid P component | IPI00022391 | 25.4 | 6.1 | APCS
2260 | amyloid P component | IPI00022391 | 25.4 | 6.1 | APCS
2272a | lambda-chain precursor | IPI00154742 | 24.7 | 7.54 | IGL
2272b | immunoglobulin kappa light chain | IPI00784070 | 26 | 8.16 | IGKC
2284 | immunoglobulin kappa light chain | IPI00784070 | 26 | 8.16 | IGKC
2314 | Proapolipoprotein | IPI00021841 | 29 | 5.45 | APOA1
2325 | Proapolipoprotein | IPI00021841 | 29 | 5.45 | APOA1
2326 | Proapolipoprotein | IPI00021841 | 29 | 5.45 | APOA1
2338 | Proapolipoprotein | IPI00021841 | 29 | 5.45 | APOA1
2346 | plasma glutathione peroxidase | IPI00026199 | 16.7 | 8.93 | GPx-P
2415 | haptoglobin | IPI00641737 | 45.2 | 6.13 | HP
2468 | transthyretin | IPI00022432 | 15.9 | 5.5 | TTR
2520 | haptoglobin | IPI00641737 | 45.2 | 6.13 | HP

^a Theoretical molecular weight (MW, in kDa) and isoelectric point (pI) values.

Figure 1

The spatial distribution of protein spots (yellow dots) detected in human plasma by 2-D DIGE. Identified proteins (blue dots) and those showing differences between the theoretical and observed molecular weights (numbered red dots) are highlighted.

3.4. Spotwise Variation

The distribution of the SLA was consistent across the gels. When all samples were considered, the spotwise SD values of the SLA for the 397 high-quality spots ranged from 0.04 to 0.53, with a median of 0.10. When calculated separately for each subject, the range was 0.0002 to 0.50, with a median of 0.06, reflecting lower time-within-subject variation than between-subject variation. Both sets of values represent an increase over the spotwise SDs observed among technical replicates of the same human plasma sample [13], where the maximum was 0.20 and the median 0.04. In the previous TVS work [13], the CV values of the SA had a median of 10% and a maximum of 42%. Here, the spotwise CVs ranged from 10% to 93%, with a median of 23%. The higher CVs of the present study reflect the additional complexity due to the heterogeneity of samples from multiple human subjects. These results are comparable to the 6% (min), 108% (max), and 19% (median) CVs recently reported in a 2-D DIGE study of normal liver samples from ten human subjects [40]. Here, about 90% of the spots had CVs below 40%, and only 21 spots (5% of the 397 high-quality spots) had CVs above 50%. These spots are likely not good biomarker candidates because of their high individual or longitudinal variability. Spots showing relatively high variation may correspond to single isoforms of individual proteins rather than to all isoforms of a given protein. Notably, several of the identified proteins (albumin, transferrin, haptoglobin, IgG, and IgA) are among those removed by the Top-6 depletion process [12], which was subsequently found to introduce variability when multiple samples are processed in series. In future studies, column equilibration steps are recommended between samples to reduce this variability and ensure more complete depletion of high-abundance proteins.
In summary, most spots had CVs small enough to indicate that the corresponding protein levels were relatively constant across individuals, so these proteins could potentially be used as biomarkers. When calculated separately for each subject, the minimum, maximum, and median CVs were 0.05%, 131%, and 14%, respectively. Over 95% of these CVs were below 35%, indicating that for most subjects the variation over the three time points was comparable to the experimental variation in the previously published TVS data [13].
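The SDs (on the log10 scale) and the CVs (on the linear SA scale) reported above are two views of the same spread. If one assumes, as an approximation not made in the original analysis, that SA is log-normally distributed at each spot, then CV = sqrt(exp((ln 10 · SD)²) − 1). Because this relation is monotone in SD, median SDs map to median CVs under that assumption. The sketch below applies it to the reported medians as a rough consistency check only.

```python
import math

def cv_from_log10_sd(sd_log10):
    """CV of SA implied by the SD of log10(SA), assuming SA is log-normal."""
    sigma_ln = sd_log10 * math.log(10)        # convert the log10-scale SD to the natural-log scale
    return math.sqrt(math.expm1(sigma_ln ** 2))

# Median SDs quoted in the text for the three settings.
for label, sd in [("all samples", 0.10), ("within subject", 0.06), ("TVS", 0.04)]:
    print(f"{label}: median SD {sd:.2f} -> implied CV ~ {100 * cv_from_log10_sd(sd):.1f}%")
```

The implied values (about 23.3%, 13.9%, and 9.2%) lie close to the reported median CVs of 23%, 14%, and 10%. This suggests that the SD-based and CV-based summaries tell a consistent story, although individual spots need not follow the log-normal relation exactly.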

3.5. Statistical Normalization, Gel and Dye Effect Removal, and Variance Decomposition

The SLA values were further normalized as explained in the methods, and the effect of this normalization is addressed where relevant in the following sections. The F-tests from the analysis of variance corresponding to (1) indicated significant gel (P-value < 2.2e−16), dye (P-value < 3.2e−16), and gel-dye interaction (P-value < 2.2e−16) effects. The residual diagnostic plots did not reveal major departures from the model assumptions, suggesting that the model was adequate. Similar analyses using the normalized SLA resulted in slightly higher P-values (gel effect P-value < 2.2e−16, dye effect P-value = .003, gel-dye interaction P-value < 3.0e−09), but the conclusions were consistent with those based on the unnormalized SLA. The standard deviations corresponding to the random variance component estimates from the mixed-effects model (Figure 2) show the relative contribution of the three components at each of the spots matched on at least 75% of the gels. Overall, the time-within-subject component contributed least to the total variance, while the subject component contributed most. The corresponding frequency distributions of the three variance components (Table 1) confirm that for most spots (89%) the time component contributed less than 30% of the total variance. Only 0.5% of the spots had 70% to 80% of their variation explained by the time component, and no spot had a time component greater than 80% of the total variance. For 21% of the spots, the subject component comprised over 70% of the total variance, and for about 44% of the spots it represented over 50% of the total variation.
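The percentages summarized in Table 1 can be tabulated from spotwise variance-component estimates as sketched below. The input arrays are assumed to hold one estimate per spot (for instance, from a stage-2 fit). Bins are 10 percentage points wide; under NumPy's histogram convention each bin includes its left edge and only the last bin also includes 100%. How the original table handled values falling exactly on a bin edge is not stated.

```python
import numpy as np

def contribution_table(var_subject, var_time, var_error, bin_width=10):
    """Bin each spot's percentage contribution of each component to its total variance.

    Returns the bin edges and, per component, (percent of spots per bin,
    cumulative percent of spots), mirroring columns (a) and (b) of Table 1.
    """
    comps = {"subject": np.asarray(var_subject, dtype=float),
             "time": np.asarray(var_time, dtype=float),
             "error": np.asarray(var_error, dtype=float)}
    total = comps["subject"] + comps["time"] + comps["error"]   # spotwise total variance
    edges = np.arange(0, 100 + bin_width, bin_width)             # 0, 10, ..., 100
    table = {}
    for name, v in comps.items():
        pct = 100 * v / total
        counts, _ = np.histogram(pct, bins=edges)
        share = 100 * counts / len(pct)
        table[name] = (share, np.cumsum(share))
    return edges, table
```

Reading the cumulative column at the 20–30 bin for the time component gives the "less than 30% of the total variance" fraction quoted above.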
Figure 2

The subject ($\sigma_a$), time-within-subject ($\sigma_b$), and random error ($\sigma_e$) variance component estimates (on the SD scale) for the 397 protein spots matched on at least 75% of the gels, ordered by the magnitude of the subject component.

Table 1

Frequency distribution of the variance component estimates.

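% contribution to total variance | Subject (a) | Subject (b) | Time within subject (a) | Time within subject (b) | Error (a) | Error (b)
0–10 | 6.31 | 6.31 | 63.89 | 63.89 | 6.57 | 6.57
10–20 | 9.34 | 15.66 | 17.68 | 81.57 | 17.17 | 23.74
20–30 | 12.89 | 28.50 | 7.83 | 89.39 | 16.67 | 40.40
30–40 | 14.14 | 42.68 | 4.29 | 93.69 | 9.34 | 49.75
40–50 | 13.63 | 56.31 | 2.27 | 95.96 | 10.86 | 60.61
50–60 | 11.36 | 67.68 | 3.03 | 98.99 | 11.87 | 72.47
60–70 | 11.36 | 79.04 | 0.51 | 99.49 | 10.37 | 82.83
70–80 | 11.61 | 90.66 | 0.50 | 100.00 | 8.08 | 90.91
80–90 | 7.83 | 98.48 | – | – | 4.80 | 95.71
90–100 | 1.51 | 100.00 | – | – | 4.29 | 100.00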

The components of subject ($\sigma^2_a$), time-within-subject ($\sigma^2_b$), and random error ($\sigma^2_e$) are shown separately as (a) the percentage of spots and (b) the cumulative percentage of spots whose contribution to the total variance falls in the range indicated in the first column.

To further elucidate the factors contributing to spot variance, we performed a meta-analysis of these data. The spotwise total variances of the 397 high-quality spots were summed, and the fractions of this summed total explained by the summed spotwise subject, time-within-subject, and error components were determined. A pooled estimate of each variance component was obtained by averaging that component over the spots. Aggregated over all spots matched on at least 75% of the gels, the subject components explained 59% of the total variation and the time-within-subject components explained 12%. The average subject variance component across the spots was 0.0097 (corresponding to $\sigma_a$ = 0.098 on the SD scale), and the average time-within-subject variance component was 0.0019 ($\sigma_b$ = 0.044).
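The pooled quantities can be computed from arrays of spotwise variance-component estimates as sketched below (a hypothetical input, one entry per spot). Each share is the summed component divided by the summed total variance. The SD-scale value is the square root of the averaged variance; for example, √0.0097 ≈ 0.098 and √0.0019 ≈ 0.044, matching the conversions quoted above.

```python
import numpy as np

def pooled_summary(var_subject, var_time, var_error):
    """Aggregate spotwise variance components across spots.

    Returns (shares, averages): the fraction of the summed total variance explained
    by each summed component, and each component's average on the variance and SD scales.
    """
    comps = {"subject": np.asarray(var_subject, dtype=float),
             "time": np.asarray(var_time, dtype=float),
             "error": np.asarray(var_error, dtype=float)}
    total = sum(v.sum() for v in comps.values())       # summed spotwise total variance
    shares = {name: v.sum() / total for name, v in comps.items()}
    averages = {name: (v.mean(), np.sqrt(v.mean())) for name, v in comps.items()}
    return shares, averages
```

Note that averaging variances and then taking the square root is not the same as averaging spotwise SDs; the former matches the conversion used in the text.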

3.6. Multivariate Analysis of Expression Patterns

The EDA module of DeCyder 2D was used to visualize the results of the current study together with those of the previous TVS study [13] (Figure 3). The multivariate expression profiles of the samples across the 328 spots matched on >75% of the spot maps from both studies were transformed into the principal component basis, and the projections of the samples onto the first two principal components were displayed (Figure 3). The tight scatter of the TVS samples (encircled in black) indicates the small magnitude of the experimental variability when technical replicates of the same human sample are analyzed. The longitudinal variation exceeded the technical variation, as evidenced by the larger scatter of a given subject's samples across the different time points; for example, the red and green ellipses (Figure 3) highlight the longitudinal variation for subjects 1 and 11, respectively. The separation of the different subjects' samples into distinct regions of the principal component plot indicates that the subject-to-subject variation exceeded the longitudinal variation within subjects.
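DeCyder's EDA module performed this projection internally, and its exact algorithm is not described here. As a generic illustration only, the sketch below computes PCA scores on the first two components by power iteration with deflation on the covariance matrix of the centered sample-by-spot data. The function `pca_scores_2d` and its parameters are assumptions; it is written with the standard library for clarity and would be slow for hundreds of spots, where a linear-algebra library would normally be used.

```python
import math
import random


def pca_scores_2d(X, n_iter=500, seed=0):
    """Project samples (rows of X) onto the first two principal components.

    X is a list of equal-length rows (samples x spots). The components are
    found by power iteration with Hotelling deflation on the sample
    covariance matrix; their signs are arbitrary.
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    C = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(p)] for a in range(p)]

    rng = random.Random(seed)
    components = []
    for _ in range(2):
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(n_iter):
            w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # no variance left to explain
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: eigenvalue paired with the converged vector
        eig = sum(v[a] * sum(C[a][b] * v[b] for b in range(p)) for a in range(p))
        components.append(v)
        # Deflate so the next pass finds the next-largest component
        C = [[C[a][b] - eig * v[a] * v[b] for b in range(p)] for a in range(p)]

    return [[sum(r[j] * pc[j] for j in range(p)) for pc in components] for r in Xc]
```

Because the component signs are arbitrary, only the relative positions of the samples (for example, whether a subject's three time points fall close together) carry meaning in such a plot.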
Figure 3

Principal component analysis of the 33 samples from the present study and the 8 replicates from the previous Technical Variation Study (TVS) [13], color-coded according to the legend and projected onto the first two principal components. Ellipses highlighting subjects 1 (red), 11 (green), and the TVS (black) are added for illustrative purposes only.

Hierarchical clustering was used to group the 33 samples based on the similarity of their protein expression profiles across the 397 high-quality spots matched on >75% of the gels (Figure 4). Both the proteins and the experimental samples were clustered, using Euclidean distance and average linkage to define similarity. For every subject, the first clustering step placed that subject's three samples into one cluster: samples of the same subject collected at the three time points were most similar to each other, as evidenced by the succession of self-similar bands of three rows (highlighted by the yellow lines in Figure 4). The clustering also shows a general tendency for the samples to group by gender (highlighted by the blue and red bars for males and females, respectively, in Figure 4).
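The clustering criterion used here (Euclidean distance with average linkage) can be sketched in standard-library Python. This naive agglomerative version, with the hypothetical function `average_linkage`, records each merge so one can check, as in Figure 4, whether a subject's three samples (labeled in the SubjectNumberTime format, e.g., "01x") are joined before any cross-subject merge. It is an illustration, not the software the authors used.

```python
import math
from itertools import combinations


def average_linkage(samples):
    """Agglomerative clustering with Euclidean distance and average linkage.

    samples: dict mapping a sample label (e.g. "01x") to its abundance vector.
    Returns the merges in order as (cluster_a, cluster_b, distance), where each
    cluster is a frozenset of sample labels.
    """
    labels = list(samples)
    pair_dist = {
        frozenset((a, b)): math.dist(samples[a], samples[b])
        for a, b in combinations(labels, 2)
    }

    def cluster_distance(c1, c2):
        # Average linkage: mean of all pairwise distances between members
        total = sum(pair_dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [frozenset([label]) for label in labels]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        merges.append((c1, c2, cluster_distance(c1, c2)))
        clusters = [c for c in clusters if c != c1 and c != c2] + [c1 | c2]
    return merges
```

This version recomputes cluster distances at every step, which is acceptable for a few dozen samples; an optimized library routine would normally be used for larger data.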
Figure 4

Hierarchical clustering of the 33 samples (y-axis) based on the abundance of the 397 high-quality protein spots (x-axis), using Euclidean distance and average linkage. The samples are in SubjectNumberTime format, where SubjectNumber ranges from 01 to 11 and the Time values {x, y, z} correspond to {T1, T2, T3}. The intensities range from −1.5-fold change (bright green) to 1.5-fold change (bright red). The dendrogram on the right indicates the order of the sample grouping, with more similar samples grouped together first. The color band on the left shows the gender of each sample, with red for females and blue for males.

3.7. Gender Effects

In addition to the results seen in the hierarchical clustering (Figure 4), fitting the mixed-effects model to the residuals from the SLA at the spots matched on at least 75% of the gels yielded 17 spots with a gender-effect P-value < .01. None of these spots remained significant (P-value < .01) after the False Discovery Rate (FDR) method [41] for multiple comparisons was applied, suggesting that larger numbers of samples are needed to validate gender differences in the human plasma proteome. Despite the lack of statistically significant gender differences, trends in this dataset suggest that future, larger datasets might enable protein expression levels to be differentiated by gender. One spot, 1659, had an FDR-adjusted gender-effect P-value of .055, with a 1.49-fold change between the male and female groups (Figure 5). Five additional spots (466; 1626, 1650, and 1652, each alpha-2-HS-glycoprotein; and 1678) had adjusted P-values of .11. The results were similar when the same model was fitted to the residuals from the statistically normalized SLA, albeit with P-values slightly exceeding the corresponding values based on the SLA. Three of the spots exhibiting gender effects were identified as alpha-2-HS-glycoprotein, which has been shown to vary between males and females: its concentration undergoes a progressive age-related decrease in women, while men show no noticeable change [36].
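Reference [41] is not reproduced in this section. Assuming it describes the Benjamini-Hochberg step-up procedure, the adjustment can be sketched as follows (the function name is hypothetical): each sorted P-value is scaled by m / rank, and a running minimum taken from the largest P-value downward keeps the adjusted values monotone.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDR-adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so the adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

This also shows why 17 raw P-values below .01 need not survive correction: if, as assumed, the adjustment runs over the 397 spots, a raw P-value of .001 ranked third is scaled to 0.001 × 397 / 3 ≈ 0.13 before the monotonicity step.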
Figure 5

Expression data for alpha-2-HS-glycoprotein, with an average increase of 1.49-fold between the female and male groups and an FDR-adjusted gender-effect P-value = .055. The samples are in SubjectNumberTime format, where SubjectNumber ranges from 01 to 11 and the Time values {x, y, z} correspond to {T1, T2, T3}. The annotations indicate the gels (numbers) and the dyes (red for Cy5, green for Cy3) corresponding to the samples. Dotted lines connect samples multiplexed on the same gel. Crosses indicate sample averages over the technical replicates, and the solid line connects all sample averages. Boxes around the three Time values of each SubjectNumber indicate male (blue) and female (red) subjects and are added for illustrative purposes only.

The removal of the dye and gel effects using the model in (1) proved to be a beneficial preprocessing step. Without this step, when the mixed-effects model was fitted to the original SLA, the smallest FDR-adjusted gender-effect P-value was .19 (spot 1659). When the same model was fitted to the statistically normalized SLA, the smallest FDR-adjusted P-value was also .19 (spot 1659). The statistical normalization thus improved the quality of the data slightly but did not dramatically reduce the observed P-values. In contrast, pooling the information across the gels to remove the common dye and gel effects strengthened the signal and markedly reduced the FDR-adjusted P-values: as described above, the adjusted P-value for spot 1659 decreased to .055, close to .05.
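Model (1) is not shown in this section. Assuming it contains only gel, dye, and gel × dye terms fitted jointly to all spots, its least-squares fitted value for each observation is the mean SLA of that gel/dye image, so the residuals passed to the mixed-effects model are each image's SLA values centered on their own mean. The hypothetical helper below sketches that assumed preprocessing; if (1) includes other terms, the residuals would differ.

```python
from collections import defaultdict


def remove_gel_dye_effects(records):
    """Residuals after an (assumed) gel + dye + gel:dye fixed-effects fit.

    records: list of (gel, dye, spot, sla) tuples pooled over all spots.
    With the full gel x dye interaction, the fitted value of every
    observation is the mean SLA of its (gel, dye) image, so each residual
    is the SLA minus that image mean.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for gel, dye, _spot, sla in records:
        sums[(gel, dye)] += sla
        counts[(gel, dye)] += 1
    return {
        (gel, dye, spot): sla - sums[(gel, dye)] / counts[(gel, dye)]
        for gel, dye, spot, sla in records
    }
```

Because every image mean is estimated from all of its spots, this step borrows strength across spots, which is consistent with the stronger signal described above.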

3.8. Multivariate Analysis of Identified Proteins

Hierarchical clustering of the identified proteins (Figure 6) was conducted using Euclidean distance and average linkage. For all subjects other than subject 10, the first clustering step placed the three samples of that subject into one cluster. For subject 10, two time points (x and z) were grouped together, while the third (y) was separated from them by subject 3. Because the 78 identified protein spots were those that differed most between time points and subjects, it is not unexpected that the clustering does not perfectly align all subjects or time points. Along the protein dimension, all spots identified as transferrin (TF) clustered together because of their similar expression patterns, whereas the vitamin D-binding protein (GC) spots were found in multiple clusters because of differences in expression patterns between individuals. Multiple proteins may cluster together because of coregulation and similar functions; in the case of APOA4 and APOA1 (Figure 6), coregulation has been reported [37, 42].
Figure 6

Hierarchical clustering of the 33 samples (x-axis) based on the abundance of the 78 identified protein spots (y-axis), using Euclidean distance and average linkage. The samples are in SubjectNumberTime format, where SubjectNumber ranges from 01 to 11 and the Time values (x, y, z) correspond to (T1, T2, T3). The intensities range from −1.5 (bright green) to 1.5 (bright red). The dendrogram at the top indicates the order of the sample grouping, with samples corresponding to the lower leaves grouped together first. Similarly, the dendrogram on the left indicates the ordering of the protein spots. All transferrin (TF) and vitamin D-binding protein (GC) identifications are highlighted in blue and red, respectively.

4. Conclusion

Statistical analysis of a 2-D DIGE experiment involving triplicate plasma samples from eleven human subjects taken at three time points separated by several weeks demonstrated that the subject-to-subject variation exceeded the time-within-subject variation. The variation in the human plasma proteome reported here was greater than that found in a previous technical variation study in which one plasma sample was processed multiple times [13]. Here, for 70% of the high-quality protein spots, the coefficient of variation (CV) of the SLA was less than 30% across all subjects and time points, indicating that the baseline expression levels of those proteins are relatively stable in the population represented by the subjects in this study. Only 21 spots had a CV larger than 50%, suggesting that these protein isoforms should be avoided as biomarker candidates. Many of these spots represent medium- to high-abundance plasma proteins. Because of their higher abundance, they might bias LC/MS datasets; however, because these spots are few relative to the total plasma proteome, their overall influence on a sample is likely small. In addition, protein spots with gender-related differences should be considered separately for males and females. More thorough studies, including a larger population with additional time points over longer periods, are nevertheless recommended to more fully address individual, longitudinal, and gender variability as related to biomarker discovery. We found that preprocessing the data by first removing the fixed effects of the gels and dyes was important and improved the quality of the data. This step resulted in six protein spots showing a statistically significant gender effect at an FDR-adjusted 11% significance level. Without the preprocessing step, the smallest FDR-adjusted gender-effect P-value was .19.
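The CV summary above can be sketched as follows, assuming the CV is the sample standard deviation divided by the mean of one spot's values across all samples, and that those values are on a positive scale (a CV is not meaningful for values centered near zero). The helper names are hypothetical; the 30% and 50% thresholds are the ones quoted in the text.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample CV: standard deviation divided by the mean (assumes a positive mean)."""
    return stdev(values) / mean(values)


def summarize_cv(spot_values, low=0.30, high=0.50):
    """Return (fraction of spots with CV < low, sorted spots with CV > high).

    spot_values maps a spot identifier to its values across all samples.
    """
    cvs = {spot: coefficient_of_variation(v) for spot, v in spot_values.items()}
    frac_low = sum(cv < low for cv in cvs.values()) / len(cvs)
    high_spots = sorted(spot for spot, cv in cvs.items() if cv > high)
    return frac_low, high_spots
```

Applied to the 397 high-quality spots, the two returned values would correspond to the "70% of spots below 30% CV" and "21 spots above 50% CV" summaries.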
While removing the gel and dye effects led to stronger conclusions, the additional statistical normalization of the SLA had only marginal effects and did not alter the conclusions. Spot matching confounds gel- and software-related protein differences with real biological effects. In the present study, we only considered spots that were matched on at least 75% of the gels. Spots with lower matching quality can be investigated separately, as they may correspond to proteins that are absent or have very low expression in certain individuals but may nonetheless have biological significance. We envision that such studies will become more relevant as the field of personalized medicine matures and as detection and matching algorithms continue to improve. This study represents a first step toward quantitating the longitudinal and individual variation in the human plasma proteome, as measured on the 2-D DIGE platform. Interestingly, gender-related variations were also detected, suggesting that gender variability should be considered in biomarker discovery. Future, larger-scale experiments that include more subjects representative of various population segments, encompassing differences in ethnicity, age, gender, disease status, and other relevant factors, have the potential to define baseline proteomic similarities and differences in the human population, which will in turn facilitate improved biomarker discovery.

The supplementary material contains one table and one figure, both of which illustrate the experimental design used in this project. In Supplementary Table 1, each individual sample is listed as it appears on one of the 33 gels run for this experiment. Each individual was randomly assigned a number (1, 2, 3,…, 11), and each time point was assigned a letter (x, y, or z). Each gel contained three samples: one internal pooled standard and two experimental samples.
Each experimental sample was dye-swapped and labeled with both the Cy3 and the Cy5 dyes, while the pooled standard, consisting of an equal amount of each of the 33 samples, was labeled with Cy2 and included on every gel. Supplementary Figure 1 shows a graphical representation of the experimental design and of how any number of samples could be incorporated into it to obtain similarly valid statistical results. Rather than pairing the samples randomly on the gels, the design was selected to minimize the experimental variation among the samples whose comparison was of most interest. Samples, represented as circles, are joined by arrows to indicate the two samples that are directly compared on a given gel.
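Supplementary Table 1 is not reproduced here, so the exact pairing is unknown. One simple assignment that satisfies the stated constraints (33 gels, each sample labeled once with Cy3 and once with Cy5, and the Cy2 pooled standard on every gel) is a loop design, sketched below with a hypothetical `loop_design` function; it is an illustration, not the authors' layout.

```python
def loop_design(samples):
    """Hypothetical dye-swap loop: gel i pairs sample i (Cy3) with sample i+1 (Cy5).

    Every sample appears once with Cy3 and once with Cy5, and the pooled
    internal standard is labeled with Cy2 on every gel.
    """
    n = len(samples)
    return [
        {"gel": i + 1, "Cy2": "pooled standard", "Cy3": samples[i], "Cy5": samples[(i + 1) % n]}
        for i in range(n)
    ]


# Samples in SubjectNumberTime format: 01x, 01y, 01z, 02x, ..., 11z
samples = [f"{s:02d}{t}" for s in range(1, 12) for t in "xyz"]
design = loop_design(samples)
```

With the samples ordered by subject and time, 22 of the 33 gels in this loop compare adjacent time points of the same subject directly, which is one way to favor within-subject comparisons; only the gels at subject boundaries (e.g., 01z with 02x, and 11z with 01x) pair different subjects.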
References

1. Proteomic analysis of human serum by two-dimensional differential gel electrophoresis after depletion of high-abundant proteins.
Authors: Brett A Chromy; Arlene D Gonzales; Julie Perkins; Megan W Choi; Michele H Corzett; Brian C Chang; Christopher H Corzett; Sandra L McCutchen-Maloney
Journal: J Proteome Res. Date: 2004 Nov-Dec.

2. The development of the DIGE system: 2D fluorescence difference gel analysis technology.
Authors: Rita Marouga; Stephen David; Edward Hawkins
Journal: Anal Bioanal Chem. Date: 2005-05-18.

3. All about DIGE: quantification technology for differential-display 2D-gel proteomics.
Authors: Kathryn S Lilley; David B Friedman
Journal: Expert Rev Proteomics. Date: 2004-12.

4. Utilizing human blood plasma for proteomic biomarker discovery.
Authors: Jon M Jacobs; Joshua N Adkins; Wei-Jun Qian; Tao Liu; Yufeng Shen; David G Camp; Richard D Smith
Journal: J Proteome Res. Date: 2005 Jul-Aug.

5. Depletion of multiple high-abundance proteins improves protein profiling capacities of human serum and plasma.
Authors: Lynn A Echan; Hsin-Yao Tang; Nadeem Ali-Khan; KiBeom Lee; David W Speicher
Journal: Proteomics. Date: 2005-08.

6. Population proteomics: addressing protein diversity in humans.
Authors: Dobrin Nedelkov
Journal: Expert Rev Proteomics. Date: 2005-06.

7. Human complement component C4. Structural studies on the fragments derived from C4b by cleavage with C3b inactivator.
Authors: E M Press; J Gagnon
Journal: Biochem J. Date: 1981-11-01.

8. Cluster analysis and display of genome-wide expression patterns.
Authors: M B Eisen; P T Spellman; P O Brown; D Botstein
Journal: Proc Natl Acad Sci U S A. Date: 1998-12-08.

9. The apolipoprotein A-I/C-III/A-IV gene cluster: ApoC-III and ApoA-IV expression is regulated by two common enhancers.
Authors: L Vergnes; T Taniguchi; K Omori; M M Zakin; A Ochoa
Journal: Biochim Biophys Acta. Date: 1997-10-18.

10. Variations in the serum concentration and urine excretion of alpha 2HS-glycoprotein, a bone-related protein, in normal individuals and in patients with osteogenesis imperfecta.
Authors: I R Dickson; M Bagga; C R Paterson
Journal: Calcif Tissue Int. Date: 1983.

