Mathieu Lavallée-Adam, Navin Rauniyar, Daniel B McClatchy, John R Yates.
Abstract
The majority of large-scale proteomics quantification methods yield long lists of quantified proteins that are often difficult to interpret and poorly reproduced. Computational approaches are required to analyze such intricate quantitative proteomics data sets. We propose a statistical approach to computationally identify protein sets (e.g., Gene Ontology (GO) terms) that are significantly enriched with abundant proteins with reproducible quantification measurements across a set of replicates. To this end, we developed PSEA-Quant, a protein set enrichment analysis algorithm for label-free and label-based protein quantification data sets. It offers an alternative approach to classic GO analyses, models protein annotation biases, and allows the analysis of samples originating from a single condition, unlike analogous approaches such as GSEA and PSEA. We demonstrate that PSEA-Quant produces results complementary to GO analyses. We also show that PSEA-Quant provides valuable information about the biological processes involved in cystic fibrosis using label-free protein quantification of a cell line expressing a CFTR mutant. Finally, PSEA-Quant highlights the differences in the mechanisms taking place in the human, rat, and mouse brain frontal cortices based on tandem mass tag quantification. Our approach, which is available online, will thus improve the analysis of proteomics quantification data sets by providing meaningful biological insights.
Keywords: bioinformatics; computational biology; cystic fibrosis; gene ontology; gene set enrichment analysis; isobaric tandem mass tagging; mass spectrometry; protein quantification; spectral counting; statistics
Year: 2014 PMID: 25177766 PMCID: PMC4258137 DOI: 10.1021/pr500473n
Source DB: PubMed Journal: J Proteome Res ISSN: 1535-3893 Impact factor: 4.466
Figure 1. PSEA-Quant workflow. PSEA-Quant computes a PES for each protein set and assesses the statistical significance of each PES by providing a p-value. These p-values are then transformed to q-values to correct for multiple hypothesis testing. Finally, the core of each statistically significant protein set is identified.
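The p-value to q-value step in this workflow can be illustrated with a short Python sketch. The caption does not say which multiple-testing procedure PSEA-Quant uses, so the Benjamini-Hochberg adjustment below is an assumption chosen for illustration, and the function name `bh_qvalues` is hypothetical.

```python
def bh_qvalues(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Assumption: BH is used here only as a common example of a
    p-value -> q-value transformation; it is not confirmed to be
    the procedure used by PSEA-Quant.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


if __name__ == "__main__":
    # One hypothetical p-value per protein set.
    print(bh_qvalues([0.01, 0.04, 0.03, 0.005]))
```

Protein sets whose q-value falls below a chosen threshold (the tables in this record use cutoffs such as 0.1, 0.06, 0.02, and 0.01) would then be reported as significant.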
Figure 2. Graphical representation of the PES calculation for label-free quantification. (A) Heat map representations of log normalized spectral counts log(SC) and normalized spectral count coefficients of variation (CV) for all proteins identified in the three biological replicates of the CFBE data set. A fictitious example of an annotation protein set of seven proteins is illustrated on the heat maps. (B) Color-coded representation of the enrichment score weight matrix W. Proteins p from the fictitious protein set in (A) are mapped onto W based on their respective protein mean abundances and abundance coefficients of variation CV. The sum of the enrichment score weights corresponding to their position in the matrix is then assigned as the PES of the fictitious protein set.
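The mapping described in panel (B) can be sketched in Python as a lookup into a weight matrix followed by a sum. The bin edges, the weight values, and the function name `compute_pes` are all hypothetical; the caption does not specify how W is constructed, so this shows only the "map each protein onto W, then sum" step.

```python
import bisect


def compute_pes(proteins, abundance_edges, cv_edges, weights):
    """Sum the weight-matrix entries selected by each protein.

    proteins        : list of (mean_abundance, cv) pairs for one protein set
    abundance_edges : sorted bin edges for mean abundance (hypothetical)
    cv_edges        : sorted bin edges for the coefficient of variation
    weights         : matrix W indexed as W[abundance_bin][cv_bin]
    """
    score = 0.0
    for mean_abundance, cv in proteins:
        # bisect_right returns the index of the bin containing the value.
        row = bisect.bisect_right(abundance_edges, mean_abundance)
        col = bisect.bisect_right(cv_edges, cv)
        score += weights[row][col]
    return score


if __name__ == "__main__":
    # Illustrative W: larger weights for high abundance and low CV.
    W = [
        [0.3, 0.1, 0.0],  # low abundance
        [0.6, 0.3, 0.1],  # medium abundance
        [1.0, 0.6, 0.3],  # high abundance
    ]
    protein_set = [(2.5, 0.1), (1.5, 0.3), (0.5, 0.8)]
    print(compute_pes(protein_set, [1.0, 2.0], [0.2, 0.5], W))
```

In this toy matrix, a set scores highly only when its members are both abundant and reproducibly quantified, which matches the kind of protein set the abstract says the method targets.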
Figure 3. FDRs associated with enrichment p-values computed by PSEA-Quant for the CFBE data set. The p-values were computed using the uniform and weighted Monte Carlo sampling procedures with mean abundance CV tolerance values of 1.0, 0.5, and 0.1 of the sampled protein sets (as described in Methods). When no CV tolerance was applied, the data are labeled as “Protein Independence”. FDRs were estimated using the “protein number randomization” strategy (as described in Methods).
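A Monte Carlo enrichment p-value of the kind mentioned in this caption can be sketched as follows. This covers only a uniform sampling variant with no CV tolerance. The per-protein weights, the add-one correction, and the function name `monte_carlo_pvalue` are assumptions for illustration, and the weighted procedure described in the Methods is not reproduced here.

```python
import random


def monte_carlo_pvalue(observed_score, protein_weights, set_size,
                       n_samples=10000, seed=0):
    """Estimate P(score of a random protein set >= observed score).

    protein_weights : per-protein enrichment weights for every protein in
                      the data set (hypothetical input)
    set_size        : number of proteins in the annotated set being tested
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # Uniform sampling without replacement of a same-sized random set.
        sample = rng.sample(protein_weights, set_size)
        if sum(sample) >= observed_score:
            hits += 1
    # Add-one correction (an assumption) keeps the estimate above zero.
    return (hits + 1) / (n_samples + 1)


if __name__ == "__main__":
    weights = [0.0, 0.1, 0.1, 0.3, 0.3, 0.6, 0.6, 1.0, 1.0, 1.0]
    print(monte_carlo_pvalue(observed_score=2.6, protein_weights=weights,
                             set_size=3))
```

The smallest p-value such a procedure can report is bounded by the number of random sets drawn, so the resolution of the estimate is a direct function of `n_samples`.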
Significant GO Terms Identified by PSEA-Quant (q-value < 0.1) and Not Significant According to the GO Enrichment Analysis of Both Ontologizer and GOrilla (p-value ≥ 0.001) for the CFBE Data Set
| GO term | PSEA-Quant p-value | PSEA-Quant q-value | best Ontologizer p-value | best GOrilla p-value | GO term total size | number of proteins with GO term in CFBE data set |
|---|---|---|---|---|---|---|
| phosphopyruvate hydratase complex | <10⁻⁵ | <0.01 | 0.237 | >0.001 | 4 | 3 |
| RNA splicing, via transesterification reactions with bulged adenosine as nucleophile | <10⁻⁵ | <0.01 | 0.521 | >0.001 | 218 | 148 |
| UTP binding | 3.0 × 10⁻⁵ | 0.01 | 0.009 | >0.001 | 3 | 3 |
| pyrimidine ribonucleoside binding | 3.0 × 10⁻⁵ | 0.01 | 0.011 | >0.001 | 3 | 3 |
| nuclear envelope disassembly | 4.0 × 10⁻⁵ | 0.01 | 0.057 | >0.001 | 39 | 36 |
| positive regulation of cell size | 5.0 × 10⁻⁵ | 0.01 | 0.103 | >0.001 | 8 | 3 |
| protein binding involved in protein folding | 6.0 × 10⁻⁵ | 0.01 | 0.020 | >0.001 | 5 | 3 |
| exopeptidase activity | 7.0 × 10⁻⁵ | 0.01 | 0.016 | >0.001 | 109 | 37 |
| pyridoxal phosphate binding | 8.0 × 10⁻⁵ | 0.01 | 0.015 | >0.001 | 55 | 27 |
| positive regulation of protein import into nucleus, translocation | 9.0 × 10⁻⁵ | 0.01 | 0.011 | >0.001 | 11 | 6 |
| cell cortex part | 1.7 × 10⁻⁴ | 0.02 | 0.005 | >0.001 | 101 | 53 |
| glucose transport | 2.0 × 10⁻⁴ | 0.02 | 0.958 | >0.001 | 118 | 35 |
| fatty-acyl-CoA metabolic process | 2.0 × 10⁻⁴ | 0.02 | 0.316 | >0.001 | 26 | 11 |
| response to salt stress | 2.3 × 10⁻⁴ | 0.03 | 0.528 | >0.001 | 22 | 6 |
| hexose transport | 2.5 × 10⁻⁴ | 0.03 | 0.342 | >0.001 | 119 | 35 |
| adenylate cyclase-activating G-protein coupled receptor signaling pathway | 2.6 × 10⁻⁴ | 0.03 | 0.746 | >0.001 | 52 | 6 |
| NADP binding | 2.9 × 10⁻⁴ | 0.03 | 0.053 | >0.001 | 47 | 25 |
| sarcoplasm | 3.2 × 10⁻⁴ | 0.03 | 0.313 | >0.001 | 61 | 3 |
| transferase activity, transferring nitrogenous groups | 3.2 × 10⁻⁴ | 0.03 | 0.052 | >0.001 | 27 | 12 |
| dATP binding | 3.8 × 10⁻⁴ | 0.03 | 0.013 | >0.001 | 4 | 4 |
| cerebellar Purkinje cell layer development | 4.7 × 10⁻⁴ | 0.03 | 0.030 | >0.001 | 23 | 6 |
| protein N-linked glycosylation via asparagine | 4.7 × 10⁻⁴ | 0.03 | 0.052 | >0.001 | 92 | 54 |
| positive regulation of striated muscle contraction | 5.0 × 10⁻⁴ | 0.03 | 0.034 | >0.001 | 9 | 3 |
| mRNA binding | 6.1 × 10⁻⁴ | 0.05 | 0.031 | >0.001 | 105 | 73 |
| polypurine tract binding | 6.2 × 10⁻⁴ | 0.06 | 0.044 | >0.001 | 14 | 10 |
| aminopeptidase activity | 7.8 × 10⁻⁴ | 0.08 | 0.005 | >0.001 | 36 | 19 |
| GTPase inhibitor activity | 8.0 × 10⁻⁴ | 0.08 | 0.017 | >0.001 | 13 | 4 |
| opsonin binding | 9.2 × 10⁻⁴ | 0.09 | 0.348 | >0.001 | 9 | 3 |
| misfolded protein binding | 9.9 × 10⁻⁴ | 0.09 | 0.283 | >0.001 | 7 | 3 |
Redundant GO terms were removed and p-values and q-values were rounded.
Top Significant GO Terms Identified by PSEA-Quant in the CFBE Data Set (q-value < 0.06) but Not Significant in the Wild Type Data Set (q-value ≥ 0.1)
| GO term | CFBE p-value | CFBE q-value | Wild Type p-value | Wild Type q-value | GO term total size | number of proteins with GO term in CFBE data set |
|---|---|---|---|---|---|---|
| catalytic step 2 spliceosome | <10⁻⁵ | <0.01 | 0.002 | 0.16 | 80 | 73 |
| regulation of apoptotic process | <10⁻⁵ | <0.01 | 0.030 | 0.65 | 1162 | 418 |
| cytosolic part | <10⁻⁵ | <0.01 | 0.004 | 0.22 | 184 | 131 |
| lyase activity | <10⁻⁵ | <0.01 | 0.002 | 0.13 | 387 | 64 |
| COP9 signalosome | 1.0 × 10⁻⁵ | <0.01 | 0.003 | 0.19 | 35 | 26 |
| response to interleukin-4 | 1.0 × 10⁻⁵ | <0.01 | 0.008 | 0.30 | 29 | 13 |
| nuclear pore | 2.0 × 10⁻⁵ | 0.01 | 0.009 | 0.33 | 67 | 49 |
| nitric-oxide synthase regulator activity | 4.0 × 10⁻⁵ | 0.01 | 0.001 | 0.11 | 6 | 4 |
| negative regulation of cell cycle phase transition | 5.0 × 10⁻⁵ | 0.01 | 0.103 | 0.91 | 172 | 97 |
| negative regulation of dephosphorylation | 6.0 × 10⁻⁵ | 0.01 | 0.250 | 1.00 | 6 | 5 |
| exopeptidase activity | 7.0 × 10⁻⁵ | 0.01 | 0.027 | 0.55 | 109 | 37 |
| positive regulation of protein insertion into mitochondrial membrane involved in apoptotic signaling pathway | 7.0 × 10⁻⁵ | 0.01 | 0.090 | 0.87 | 24 | 16 |
| pyridoxal phosphate binding | 8.0 × 10⁻⁵ | 0.01 | 0.050 | 0.76 | 55 | 27 |
| antigen processing and presentation of peptide antigen | 8.0 × 10⁻⁵ | 0.01 | 0.011 | 0.36 | 186 | 157 |
| structural constituent of cytoskeleton | 8.0 × 10⁻⁵ | 0.01 | 0.002 | 0.16 | 96 | 59 |
| cell junction assembly | 9.0 × 10⁻⁵ | 0.01 | 0.001 | 0.11 | 186 | 74 |
| pyridine nucleotide metabolic process | 1.2 × 10⁻⁴ | 0.01 | 0.017 | 0.44 | 52 | 28 |
| pyrimidine nucleotide binding | 1.4 × 10⁻⁴ | 0.02 | 0.020 | 0.48 | 8 | 5 |
| fatty-acyl-CoA metabolic process | 2.0 × 10⁻⁴ | 0.02 | 0.003 | 0.20 | 26 | 11 |
| response to salt stress | 2.3 × 10⁻⁴ | 0.03 | 0.030 | 0.65 | 22 | 6 |
| signal sequence binding | 2.3 × 10⁻⁴ | 0.03 | 0.022 | 0.50 | 21 | 12 |
| positive regulation of protein modification process | 2.5 × 10⁻⁴ | 0.03 | 0.030 | 0.65 | 803 | 273 |
| adenyl deoxyribonucleotide binding | 2.7 × 10⁻⁴ | 0.03 | 0.012 | 0.37 | 5 | 4 |
| NADP binding | 2.9 × 10⁻⁴ | 0.03 | 0.005 | 0.24 | 47 | 25 |
| zona pellucida receptor complex | 3.0 × 10⁻⁴ | 0.03 | 0.003 | 0.17 | 11 | 9 |
| intracellular organelle part | 3.1 × 10⁻⁴ | 0.03 | 0.007 | 0.28 | 6964 | 3132 |
| response to endoplasmic reticulum stress | 3.1 × 10⁻⁴ | 0.03 | 0.014 | 0.41 | 126 | 67 |
| sarcoplasm | 3.2 × 10⁻⁴ | 0.03 | 0.280 | 1.00 | 61 | 3 |
| protein N-linked glycosylation | 3.4 × 10⁻⁴ | 0.03 | 0.030 | 0.65 | 100 | 55 |
| dATP binding | 3.8 × 10⁻⁴ | 0.03 | 0.080 | 0.86 | 4 | 4 |
| organic acid metabolic process | 3.8 × 10⁻⁴ | 0.03 | 0.005 | 0.25 | 1042 | 382 |
| ribosomal protein import into nucleus | 4.2 × 10⁻⁴ | 0.03 | 0.009 | 0.33 | 4 | 4 |
| positive regulation of cell cycle process | 4.3 × 10⁻⁴ | 0.03 | 0.030 | 0.65 | 185 | 93 |
| positive regulation of catalytic activity | 4.3 × 10⁻⁴ | 0.03 | 0.015 | 0.43 | 1242 | 354 |
| peptidyl-asparagine modification | 4.4 × 10⁻⁴ | 0.03 | 0.050 | 0.76 | 93 | 55 |
| cerebellar Purkinje cell layer development | 4.7 × 10⁻⁴ | 0.03 | 0.005 | 0.24 | 23 | 6 |
| positive regulation of molecular function | 4.7 × 10⁻⁴ | 0.03 | 0.030 | 0.65 | 1565 | 420 |
| positive regulation of striated muscle contraction | 5.0 × 10⁻⁴ | 0.03 | 0.002 | 0.15 | 9 | 3 |
| oxoacid metabolic process | 5.2 × 10⁻⁴ | 0.03 | 0.005 | 0.25 | 1026 | 378 |
| calcium-dependent phospholipid binding | 6.0 × 10⁻⁴ | 0.05 | 0.020 | 0.48 | 33 | 15 |
Redundant GO terms were removed and p-values and q-values were rounded.
Figure 4. Scatter plot of the union of all proteins identified in all three replicates of the CFBE data set. The normalized spectral count coefficient of variation across all three replicates of each protein is plotted against its mean normalized spectral count. Proteins annotated with representative examples of GO terms related to cystic fibrosis that PSEA-Quant identified as significant in the CFBE data set, but not in the Wild Type data set, are color-coded. HSPA1A and HSP90AB1 are colored brown and purple, respectively, because they belong to more than one protein set. Protein names in each protein set are listed; names in bold correspond to proteins belonging to the core of a protein set.
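The two quantities plotted for each protein, the mean normalized spectral count and its coefficient of variation across replicates, can be computed as in the sketch below. The use of the sample standard deviation and the function name `mean_and_cv` are assumptions, since the caption does not state which estimator was used.

```python
import statistics


def mean_and_cv(replicate_values):
    """Return (mean, coefficient of variation) for one protein.

    replicate_values : normalized spectral counts of the protein in each
                       replicate (at least two values)
    Assumption: CV = sample standard deviation / mean.
    """
    mean = statistics.mean(replicate_values)
    if mean == 0:
        # CV is undefined when the mean abundance is zero.
        return mean, float("nan")
    return mean, statistics.stdev(replicate_values) / mean


if __name__ == "__main__":
    # Hypothetical normalized spectral counts in three replicates.
    print(mean_and_cv([2.0, 4.0, 6.0]))
```

A protein with a high mean and a low CV sits in the region of the plot that contributes most to a protein set being called enriched.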
Top Significant GO Terms Identified by PSEA-Quant As Enriched for Upregulated Proteins with Low Coefficient of Variation in Human vs Rat in the Human and Rat (HR) Frontal Cortex TMT Protein Quantification Data Set (q-value < 0.01)
| GO term | p-value | q-value | number of proteins with GO term in HR data set |
|---|---|---|---|
| generation of precursor metabolites and energy | <10⁻⁵ | <0.01 | 162 |
| NADH dehydrogenase complex | <10⁻⁵ | <0.01 | 33 |
| single-organism biosynthetic process | <10⁻⁵ | <0.01 | 146 |
| cellular response to reactive oxygen species | <10⁻⁵ | <0.01 | 35 |
| response to hydrogen peroxide | <10⁻⁵ | <0.01 | 39 |
| oxidoreductase activity | <10⁻⁵ | <0.01 | 291 |
| mitochondrial part | <10⁻⁵ | <0.01 | 409 |
| monocarboxylic acid metabolic process | <10⁻⁵ | <0.01 | 148 |
| electron transport chain | <10⁻⁵ | <0.01 | 63 |
| organelle membrane | <10⁻⁵ | <0.01 | 766 |
| extracellular region | <10⁻⁵ | <0.01 | 195 |
| mitochondrial respiratory chain complex I | <10⁻⁵ | <0.01 | 33 |
| NADH dehydrogenase (ubiquinone) activity | <10⁻⁵ | <0.01 | 28 |
| respiratory chain complex I | <10⁻⁵ | <0.01 | 33 |
| response to wounding | <10⁻⁵ | <0.01 | 98 |
| response to ionizing radiation | <10⁻⁵ | <0.01 | 21 |
| cellular response to hydrogen peroxide | <10⁻⁵ | <0.01 | 25 |
| mitochondrion | <10⁻⁵ | <0.01 | 649 |
| antioxidant activity | <10⁻⁵ | <0.01 | 34 |
| lysosomal lumen | <10⁻⁵ | <0.01 | 24 |
| hydrogen ion transmembrane transporter activity | <10⁻⁵ | <0.01 | 44 |
| cellular modified amino acid metabolic process | <10⁻⁵ | <0.01 | 81 |
| mitochondrial ATP synthesis coupled proton transport | <10⁻⁵ | <0.01 | 11 |
| respiratory electron transport chain | <10⁻⁵ | <0.01 | 63 |
| vacuolar lumen | <10⁻⁵ | <0.01 | 25 |
| cytochrome | <10⁻⁵ | <0.01 | 14 |
| peroxidase activity | <10⁻⁵ | <0.01 | 18 |
| mitochondrial membrane | <10⁻⁵ | <0.01 | 262 |
| oxidation–reduction process | <10⁻⁵ | <0.01 | 245 |
| mitochondrial electron transport, NADH to ubiquinone | <10⁻⁵ | <0.01 | 26 |
| bicarbonate transport | 1.0 × 10⁻⁵ | <0.01 | 6 |
| cellular response to oxidative stress | 1.0 × 10⁻⁵ | <0.01 | 49 |
| regulation of blood vessel size | 1.0 × 10⁻⁵ | <0.01 | 14 |
| glutathione derivative metabolic process | 1.0 × 10⁻⁵ | <0.01 | 13 |
| sterol metabolic process | 1.0 × 10⁻⁵ | <0.01 | 40 |
| fatty acid catabolic process | 1.0 × 10⁻⁵ | <0.01 | 33 |
| carboxylic ester hydrolase activity | 2.0 × 10⁻⁵ | <0.01 | 30 |
| steroid metabolic process | 2.0 × 10⁻⁵ | <0.01 | 60 |
| protein activation cascade | 2.0 × 10⁻⁵ | <0.01 | 13 |
| lipid metabolic process | 2.0 × 10⁻⁵ | <0.01 | 321 |
| organonitrogen compound biosynthetic process | 3.1 × 10⁻⁵ | <0.01 | 185 |
| response to axon injury | 3.1 × 10⁻⁵ | <0.01 | 18 |
Redundant GO terms were removed and p-values and q-values were rounded.
Top Significant GO Terms Identified by PSEA-Quant As Enriched for Upregulated Proteins with Low Coefficient of Variation in Human vs Mouse in the Human, Mouse, and Rat (HMR) Frontal Cortex TMT Protein Quantification Data Set (q-value < 0.02)
| GO term | p-value | q-value | number of proteins with GO term in HMR data set |
|---|---|---|---|
| mitochondrial respiratory chain complex I | <10⁻⁵ | 0.01 | 38 |
| NADH dehydrogenase (quinone) activity | <10⁻⁵ | 0.01 | 30 |
| blood coagulation | <10⁻⁵ | 0.01 | 90 |
| protein folding | <10⁻⁵ | 0.01 | 88 |
| hydrogen ion transmembrane transporter activity | <10⁻⁵ | 0.01 | 48 |
| mitochondrial electron transport, NADH to ubiquinone | <10⁻⁵ | 0.01 | 28 |
| single-organism metabolic process | <10⁻⁵ | 0.01 | 1014 |
| respiratory electron transport chain | <10⁻⁵ | 0.01 | 65 |
| cofactor binding | <10⁻⁵ | 0.01 | 138 |
| cytoplasmic membrane-bounded vesicle lumen | <10⁻⁵ | 0.01 | 25 |
| catabolic process | <10⁻⁵ | 0.01 | 640 |
| oxidoreductase activity, acting on NAD(P)H | <10⁻⁵ | 0.01 | 56 |
| antioxidant activity | <10⁻⁵ | 0.01 | 33 |
| mitochondrial membrane part | <10⁻⁵ | 0.01 | 105 |
| extracellular matrix | <10⁻⁵ | 0.01 | 69 |
| mitochondrion | <10⁻⁵ | 0.01 | 672 |
| response to oxidative stress | <10⁻⁵ | 0.01 | 95 |
| hemostasis | <10⁻⁵ | 0.01 | 92 |
| cytosol | <10⁻⁵ | 0.01 | 765 |
| monocarboxylic acid metabolic process | <10⁻⁵ | 0.01 | 147 |
| organelle membrane | <10⁻⁵ | 0.01 | 738 |
| intracellular organelle lumen | <10⁻⁵ | 0.01 | 178 |
| platelet activation | <10⁻⁵ | 0.01 | 48 |
| cell activation | <10⁻⁵ | 0.01 | 104 |
| regulation of body fluid levels | <10⁻⁵ | 0.01 | 113 |
| protease binding | <10⁻⁵ | 0.01 | 19 |
| platelet degranulation | <10⁻⁵ | 0.01 | 28 |
| mitochondrial part | <10⁻⁵ | 0.01 | 402 |
| generation of precursor metabolites and energy | <10⁻⁵ | 0.01 | 151 |
| carboxylic acid metabolic process | <10⁻⁵ | 0.01 | 336 |
| oxoacid metabolic process | <10⁻⁵ | 0.01 | 351 |
| melanosome | 1.0 × 10⁻⁵ | 0.01 | 52 |
| organic acid catabolic process | 1.0 × 10⁻⁵ | 0.01 | 91 |
| sulfur compound metabolic process | 1.0 × 10⁻⁵ | 0.01 | 78 |
| cellular amino acid metabolic process | 1.0 × 10⁻⁵ | 0.01 | 186 |
| lipid metabolic process | 1.0 × 10⁻⁵ | 0.01 | 303 |
| xenobiotic metabolic process | 2.0 × 10⁻⁵ | 0.01 | 30 |
| small molecule catabolic process | 2.0 × 10⁻⁵ | 0.01 | 113 |
| GTPase activity | 2.0 × 10⁻⁵ | 0.01 | 113 |
| organonitrogen compound metabolic process | 2.0 × 10⁻⁵ | 0.01 | 558 |
| cellular modified amino acid metabolic process | 3.0 × 10⁻⁵ | 0.01 | 82 |
| cell junction assembly | 3.0 × 10⁻⁵ | 0.01 | 54 |
Redundant GO terms were removed and p-values and q-values were rounded.