Parminder Kaur, Daniela Schlatzer, Kenneth Cooke, Mark R Chance.
Abstract
BACKGROUND: An approach to molecular classification based on the comparative expression of protein pairs is presented. The method overcomes some of the present limitations of using peptide intensity data for class prediction in problems such as disease detection, disease prognosis, or prediction of treatment response. Data analysis is particularly challenging in these settings because the sample size (typically tens) is much smaller than the number of peptides (typically thousands). Methods based on high-dimensional statistical models, machine learning, or other complex classifiers generate decisions that may be very accurate but are difficult to interpret in simple or biologically meaningful terms. A classification scheme called ProtPair is presented that generates simple decision rules leading to accurate classification. The rules are based on measurements of very few proteins and require only relative expression values, providing specific, targeted hypotheses suitable for straightforward validation.
Year: 2012 PMID: 22870920 PMCID: PMC3468399 DOI: 10.1186/1471-2105-13-191
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
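The abstract describes decision rules that use only the relative expression of a pair of proteins. The record does not give the rule itself, so the following is a minimal sketch under stated assumptions: the feature is the log2 ratio of two abundances, and the threshold is taken as the midpoint between the two class means. The function names (`fit_ratio_rule`, `predict`) and the toy abundance values are hypothetical and are not taken from the ProtPair paper.

```python
import math
from statistics import mean


def log_ratio(a, b):
    """Log2 ratio of two positive abundance values."""
    return math.log2(a / b)


def fit_ratio_rule(abund_a, abund_b, labels):
    """Fit a one-threshold rule on log2(A/B).

    Assumption (not from the source): the threshold is the midpoint
    between the class means of the log-ratio. Labels must be 0 or 1.
    Returns (threshold, high_class), where high_class is the label
    assigned to samples whose log-ratio exceeds the threshold.
    """
    ratios = [log_ratio(a, b) for a, b in zip(abund_a, abund_b)]
    m0 = mean(r for r, y in zip(ratios, labels) if y == 0)
    m1 = mean(r for r, y in zip(ratios, labels) if y == 1)
    threshold = (m0 + m1) / 2
    high_class = 1 if m1 > m0 else 0
    return threshold, high_class


def predict(a, b, threshold, high_class):
    """Classify one sample from the abundances of the two proteins."""
    return high_class if log_ratio(a, b) > threshold else 1 - high_class


# Toy data (invented for illustration): class 0 has A > B, class 1 has A < B.
abund_a = [8.0, 16.0, 4.0, 1.0, 2.0, 1.0]
abund_b = [2.0, 2.0, 1.0, 4.0, 4.0, 8.0]
labels = [0, 0, 0, 1, 1, 1]

threshold, high_class = fit_ratio_rule(abund_a, abund_b, labels)
print(predict(10.0, 2.0, threshold, high_class))
print(predict(1.0, 5.0, threshold, high_class))
```

Because the rule depends only on the ratio A/B, multiplying both abundances of a sample by the same factor leaves the prediction unchanged, which is one reason a relative-expression rule can be attractive when absolute intensities vary between runs.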
Protein pairs ranked by discriminability index (DI) score
| Protein 1 | Protein 2 | DI score | p-value | FDR |
|---|---|---|---|---|
| APCS Serum amyloid P-component | HGFAC Hepatocyte growth factor activator | 2.01 | 0 | 0.00 |
| C8G Complement component C8 gamma chain | HGFAC Hepatocyte growth factor activator | 1.60 | 1.6×10⁻⁶ | 0.01 |
| APCS Serum amyloid P-component | CFHR1;LOC100293069 Complement factor H-related 1 | 1.58 | 1.6×10⁻⁶ | 0.01 |
| C4BPA C4b-binding protein alpha chain | ENO2 Gamma-enolase | 1.48 | 6.3×10⁻⁶ | 0.01 |
| C4BPA C4b-binding protein alpha chain | HGFAC Hepatocyte growth factor activator | 1.45 | 9.5×10⁻⁶ | 0.01 |
| APOA4 Apolipoprotein A-IV | ALB Putative uncharacterized protein ALB | 1.44 | 9.5×10⁻⁶ | 0.01 |
| APCS Serum amyloid P-component | CFHR2 Isoform Short of Complement factor H-related protein 2 | 1.43 | 1.1×10⁻⁶ | 0.01 |
| F2 Prothrombin (Fragment) | HGFAC Hepatocyte growth factor activator | 1.37 | 3.5×10⁻⁶ | 0.03 |
| APOD Apolipoprotein D | HGFAC Hepatocyte growth factor activator | 1.35 | 5.7×10⁻⁶ | 0.05 |
| CPB2 Isoform 1 of Carboxypeptidase B2 | HGFAC Hepatocyte growth factor activator | 1.31 | 9.3×10⁻⁶ | 0.08 |
| SERPINA6 Corticosteroid-binding globulin | HGFAC Hepatocyte growth factor activator | 1.30 | 9.8×10⁻⁶ | 0.09 |
| APCS Serum amyloid P-component | F10 Coagulation factor X | 1.30 | 1.1×10⁻⁴ | 0.09 |
| C8A Complement component C8 alpha chain | HGFAC Hepatocyte growth factor activator | 1.26 | 1.6×10⁻⁴ | 0.11 |
| APOA4 Apolipoprotein A-IV | GPX3 Glutathione peroxidase 3 | 1.25 | 1.7×10⁻⁴ | 0.10 |
| APOB Apolipoprotein B-100 | HGFAC Hepatocyte growth factor activator | 1.24 | 1.8×10⁻⁴ | 0.10 |
| FGG Isoform Gamma-B of Fibrinogen gamma chain | AZGP1 alpha-2-glycoprotein 1, zinc | 1.23 | 2.0×10⁻⁴ | 0.10 |
| AGT Angiotensinogen | HGFAC Hepatocyte growth factor activator | 1.22 | 2.1×10⁻⁴ | 0.10 |
| APCS Serum amyloid P-component | ENO2 Gamma-enolase | 1.18 | 2.8×10⁻⁴ | 0.13 |
| CPB2 Isoform 1 of Carboxypeptidase B2 | C1QC Complement C1q subcomponent subunit C | 1.17 | 2.9×10⁻⁴ | 0.12 |
| APCS Serum amyloid P-component | GPX3 Glutathione peroxidase 3 | 1.17 | 3.0×10⁻⁴ | 0.11 |
| C4BPA C4b-binding protein alpha chain | LBP Lipopolysaccharide-binding protein | 1.16 | 3.0×10⁻⁴ | 0.11 |
| APCS Serum amyloid P-component | LOC653879 similar to complement component 3 | 1.15 | 3.3×10⁻⁴ | 0.11 |
| CPB2 Isoform 1 of Carboxypeptidase B2 | AFM Afamin | 1.15 | 3.3×10⁻⁴ | 0.10 |
Figure 1. (a) and (b) Scatter plots for two pairs of peptides from the top protein pair, APCS (Serum amyloid P-component) and HGFAC (Hepatocyte growth factor activator). The two classes are shown in red and blue, the axes represent the abundance levels of the two peptides, and the black line represents the decision boundary. The APCS peptide sequences in (a) and (b) are GYVIIKPLVWV and DNELLVYK; the corresponding HGFAC sequences are LCNIEPDER and LHKPGVYTR. (c) and (d) Distributions of peptide signal abundance ratios from two unique peptide pairs originating from proteins APCS and HGFAC. Red and blue indicate control and IPS progressors, respectively.
Figure 2. (a) DI score distribution of peptide pairs under random permutation of class labels; the location of the true highest-scoring peptide pair is indicated by an arrow. (b) Median DI score distribution for protein pairs with randomly assigned class labels, with the true top-scoring protein pair indicated by an arrow. (c) DI score distribution across all constituent peptides of a randomly picked protein pair. (d) DI score distribution for the highest-scoring protein pair, APCS and HGFAC.
Figure 3. Ingenuity Pathway Analysis (IPA): biological processes and diseases most significantly associated with the top 20 proteins identified by ProtPair.
Figure 4. Top-scoring biological network obtained using IPA, representing a cluster of highly significant proteins identified by ProtPair.
Figure 5. Most discriminating feature pair based on MS1-only features.
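The Figure 2 caption refers to two ideas that a short example can make concrete: scoring a protein pair by the median score over its constituent peptide pairs, and comparing the observed score against a null distribution obtained by randomly permuting class labels. This record does not define the DI score, so the sketch below substitutes a hypothetical separation score (absolute difference of class means divided by the sum of class standard deviations). The score, the function names, and the toy log-ratio values are all assumptions made for illustration, not the paper's method or data.

```python
import random
from statistics import mean, median, pstdev


def separation_score(ratios, labels):
    """Hypothetical stand-in for a discriminability score (NOT the paper's DI).

    Absolute difference of the class means divided by the sum of the
    class standard deviations; a small constant avoids division by zero.
    """
    g0 = [r for r, y in zip(ratios, labels) if y == 0]
    g1 = [r for r, y in zip(ratios, labels) if y == 1]
    return abs(mean(g0) - mean(g1)) / (pstdev(g0) + pstdev(g1) + 1e-12)


def protein_pair_score(peptide_pair_ratios, labels):
    """Median of the per-peptide-pair scores for one protein pair."""
    return median(separation_score(r, labels) for r in peptide_pair_ratios)


def permutation_p_value(peptide_pair_ratios, labels, n_perm=1000, seed=0):
    """Empirical p-value of the observed score under shuffled class labels."""
    rng = random.Random(seed)
    observed = protein_pair_score(peptide_pair_ratios, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if protein_pair_score(peptide_pair_ratios, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy log-ratios (invented) for two peptide pairs of one protein pair,
# measured in six samples of class 0 followed by six samples of class 1.
labels = [0] * 6 + [1] * 6
peptide_pair_ratios = [
    [2.1, 1.8, 2.4, 2.0, 1.9, 2.2, -1.0, -1.3, -0.8, -1.1, -0.9, -1.2],
    [1.5, 1.7, 1.4, 1.6, 1.8, 1.5, -0.5, -0.7, -0.4, -0.6, -0.8, -0.5],
]

observed, p_value = permutation_p_value(peptide_pair_ratios, labels)
print(f"observed score: {observed:.2f}, permutation p-value: {p_value:.4f}")
```

Taking the median over peptide pairs makes the protein-level score less sensitive to a single outlying peptide, and the add-one correction is a common convention for permutation tests with a finite number of shuffles.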