Chang Jin Shin, Simon Wong, Melissa J Davis, Mark A Ragan.
Abstract
BACKGROUND: Many biological processes are mediated by dynamic interactions between and among proteins. In order to interact, two proteins must co-occur spatially and temporally. As protein-protein interactions (PPIs) and subcellular location (SCL) are discovered via separate empirical approaches, PPI and SCL annotations are independent and might complement each other in helping us to understand the role of individual proteins in cellular networks. We expect reliable PPI annotations to show that proteins interacting in vivo are co-located in the same cellular compartment. Our goal here is to evaluate the potential of using PPI annotation in determining SCL of proteins in human, mouse, fly and yeast, and to identify and quantify the factors that contribute to this complementarity.
Year: 2009 PMID: 19243629 PMCID: PMC2663780 DOI: 10.1186/1752-0509-3-28
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Figure 1. Comparison of PPI databases. (a) Numbers of proteins in each PPI database. The bars indicate, for each of the four species, the total number of unique proteins in the six databases (DBs) before (BF: black) and after (AF: grey) standardization of identifiers (IDs). (b) Numbers of PPIs in each database before (black) and after (grey) ID standardization. (c) Numbers of PPIs present in one, two, three, or four or more databases after redundancy reduction. Human PPIs have been accessioned into all six databases, mouse PPIs into five, and fly and yeast PPIs into four. The counts include homodimeric as well as heterodimeric interactions. (d) Numbers of PPIs for which one, two, or three or more distinct literature items are cited as evidence. (e) Numbers of PPIs supported by each of the four high-level experimental approaches. For a-e: H (human), M (mouse), F (fly) and Y (yeast).
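Panel (c) depends on two steps: ID standardization and redundancy reduction. The sketch below illustrates both. It is a minimal illustration and not the authors' pipeline. The ID mapping table, database labels and record layout are hypothetical. Each interaction is stored as an unordered pair, so A-B and B-A collapse to one PPI, and a homodimer becomes a pair with a single member.

```python
from collections import Counter

# Hypothetical mapping from source-specific identifiers to one standard ID space.
ID_MAP = {"acc_X1": "X1", "alias_X1": "X1", "acc_X2": "X2", "alias_X2": "X2", "acc_X3": "X3"}

# Hypothetical raw records: (database, interactor A, interactor B) as each DB reports them.
RAW = [
    ("DB1", "acc_X1", "acc_X2"),
    ("DB2", "alias_X2", "alias_X1"),  # same PPI: reversed order, different IDs
    ("DB3", "acc_X1", "acc_X2"),
    ("DB2", "acc_X1", "acc_X3"),
    ("DB1", "acc_X1", "alias_X1"),    # homodimer after standardization
]

def standardize(raw_id):
    """Map a source identifier to the standard ID; unmapped IDs are kept unchanged."""
    return ID_MAP.get(raw_id, raw_id)

def canonical_pair(a, b):
    """Unordered key: A-B and B-A give the same key; a homodimer has one member."""
    return frozenset((standardize(a), standardize(b)))

def support_histogram(records):
    """Group records into non-redundant PPIs and count them by number of DBs (1, 2, 3, 4+)."""
    dbs_per_ppi = {}
    for db, a, b in records:
        dbs_per_ppi.setdefault(canonical_pair(a, b), set()).add(db)
    hist = Counter(min(len(dbs), 4) for dbs in dbs_per_ppi.values())  # 4 means "4 or more"
    return dbs_per_ppi, hist

ppis, hist = support_histogram(RAW)
print(len(ppis), dict(hist))
```

Capping the count at 4 mirrors the "four or more" bin in panel (c).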
Figure 2. Subcellular location of proteins and PPIs. (a) Numbers of proteins with at least one Gene Ontology (GO) cellular component (CC) term. For each species, the six bars from left to right represent the numbers of proteins (1) in our in-house dataset (purple), (2) with at least one GO CC term (orange), (3) with a GO CC term after filtration based on evidence code, (4) with a GO CC term after collapse of terms (green), (5) with only one GO CC term (blue), and (6) with multiple GO CC terms (grey). Bars (5) and (6) count proteins after evidence-based filtration and collapse of GO terms. (b) Proportional distribution of proteins across cellular compartments. Proteins are counted once in each annotated location, so those with multiple locations are counted a corresponding number of times. The bar on the far left shows the overall proportion of each compartment. The other three bars show soluble (left), soluble secreted (middle), and membrane (right) proteins. GO CC terms were recorded after quality filtration and collapse. (c) For each species, numbers of co-located PPIs (co-PPIs) in the three most abundant CC locations. Both interacting proteins are required to have at least one GO CC term. For a-c: CMV (cytoplasmic membrane-bound vesicle), Cyt (cytoplasm), End (endosome), ER (endoplasmic reticulum), ExM (extracellular matrix), ExR (extracellular region), Gol (Golgi apparatus), LP (lipid particle), Lys (lysosome), Mel (melanosome), Mit (mitochondrion), Nuc (nucleus), Per (peroxisome), PM (plasma membrane), SV (synaptic vesicle).
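The evidence-code filtration and term collapse behind panel (a), and the counting rule for multi-location proteins in panel (b), can be sketched as follows. The excluded evidence codes (here IEA and ND) and the term-to-compartment map are assumptions made for illustration. The caption does not list the codes or the mapping that were actually used.

```python
from collections import Counter

# Assumed exclusion list: annotations carrying these GO evidence codes are dropped.
EXCLUDED_CODES = {"IEA", "ND"}

# Hypothetical collapse map from specific GO CC terms to compartment labels (Figure 2 abbreviations).
COLLAPSE = {
    "GO:0005634": "Nuc",  # nucleus
    "GO:0005654": "Nuc",  # nucleoplasm
    "GO:0005737": "Cyt",  # cytoplasm
    "GO:0005829": "Cyt",  # cytosol
    "GO:0005739": "Mit",  # mitochondrion
    "GO:0005886": "PM",   # plasma membrane
}

# Hypothetical annotations: (protein, GO CC term, evidence code).
ANNOTATIONS = [
    ("p1", "GO:0005634", "IDA"),
    ("p1", "GO:0005654", "TAS"),
    ("p2", "GO:0005737", "IEA"),  # removed by the evidence filter
    ("p2", "GO:0005829", "IDA"),
    ("p2", "GO:0005739", "IDA"),
    ("p3", "GO:0005886", "IEA"),  # p3 loses its only annotation
]

def collapsed_locations(annotations):
    """Return protein -> set of compartments after evidence filtering and term collapse."""
    locs = {}
    for protein, term, code in annotations:
        if code in EXCLUDED_CODES or term not in COLLAPSE:
            continue
        locs.setdefault(protein, set()).add(COLLAPSE[term])
    return locs

locs = collapsed_locations(ANNOTATIONS)
single = [p for p, s in locs.items() if len(s) == 1]   # bar (5)
multiple = [p for p, s in locs.items() if len(s) > 1]  # bar (6)
# Panel (b) rule: a multi-location protein is counted once per annotated compartment.
per_compartment = Counter(c for s in locs.values() for c in s)
print(locs, single, multiple, dict(per_compartment))
```

Collapsing nucleoplasm into nucleus means p1 ends up with a single location, even though it had two raw GO terms.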
Co-located PPIs in reference and evidence-supported subsets
| PPI set | Human: PPIs | Human: co-located | Human: % co-located | Mouse: PPIs | Mouse: co-located | Mouse: % co-located | Fly: PPIs | Fly: co-located | Fly: % co-located | Yeast: PPIs | Yeast: co-located | Yeast: % co-located |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Known locations | 3298 | 2115 | 64% | 740 | 512 | 69% | 540 | 310 | 57% | 16110 | 9693 | 60% |
| Randomized locations | 3298 | 1045 | 32% | 740 | 268 | 36% | 540 | 220 | 41% | 16110 | 6118 | 38% |
| BPscore | 366 (11%) | 308 | 84% | 79 (11%) | 68 | 86% | 65 (12%) | 63 | 97% | 3387 (21%) | 3177 | 94% |
| DDI | 512 (16%) | 368 | 72% | 158 (21%) | 130 | 82% | 49 (9%) | 46 | 94% | 806 (5%) | 671 | 83% |
| Interologs | 134 (4%) | 114 | 85% | 73 (10%) | 56 | 77% | 22 (4%) | 20 | 91% | 256 (2%) | 234 | 91% |
| DB | 581 (18%) | 426 | 73% | 105 (14%) | 82 | 78% | 264 (49%) | 123 | 47% | 9579 (59%) | 5943 | 62% |
| Method | 378 (15%) | 280 | 74% | 110 (16%) | 75 | 68% | 27 (5%) | 24 | 89% | 2879 (20%) | 2329 | 81% |
| PMID | 281 (9%) | 213 | 76% | 56 (8%) | 46 | 82% | 23 (4%) | 21 | 91% | 1706 (11%) | 1462 | 86% |
| Biological evidence type (BIO) | 838 (25%) | 642 | 77% | 253 (34%) | 207 | 82% | 101 (19%) | 94 | 93% | 3825 (24%) | 3508 | 92% |
| Recorded evidence type (EVI) | 793 (23%) | 583 | 74% | 216 (30%) | 161 | 75% | 295 (54%) | 149 | 51% | 9733 (60%) | 6087 | 63% |
* The proportion of the reference set covered by total PPIs in each subset is indicated in parentheses.
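Each percentage in this table is the number of co-located PPIs divided by the number of PPIs in the set. For the human reference set with known locations, for example, that is 2115/3298 ≈ 64%. The sketch below computes that fraction and a shuffled baseline like the "Randomized locations" row. It rests on two assumptions: a PPI counts as co-located when its partners share at least one compartment, and randomization permutes location sets among proteins. The table does not state how locations were randomized, and the proteins and locations shown are hypothetical.

```python
import random

def colocated_fraction(ppis, locs):
    """Fraction of PPIs whose two partners share at least one compartment."""
    hits = sum(1 for a, b in ppis if locs[a] & locs[b])
    return hits / len(ppis)

def randomized_fraction(ppis, locs, n_rounds=1000, seed=0):
    """Mean co-located fraction after shuffling location sets among proteins (assumed null model)."""
    rng = random.Random(seed)
    proteins = list(locs)
    total = 0.0
    for _ in range(n_rounds):
        shuffled = [locs[p] for p in proteins]
        rng.shuffle(shuffled)
        total += colocated_fraction(ppis, dict(zip(proteins, shuffled)))
    return total / n_rounds

# Hypothetical reference set: every partner has at least one known compartment.
LOCS = {"a": {"Nuc"}, "b": {"Nuc", "Cyt"}, "c": {"Cyt"}, "d": {"PM"}, "e": {"Mit"}}
PPIS = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e")]

observed = colocated_fraction(PPIS, LOCS)
baseline = randomized_fraction(PPIS, LOCS)
print(f"observed {observed:.2f}, randomized {baseline:.2f}")
```

Shuffling whole location sets keeps each protein's number of locations and the overall compartment frequencies, while breaking any link between interaction and location.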
Evaluation of prediction method variants on human reference and supported subsets
| Reference | 0.64 | 0.35 | 0.50 | 0.27 | 0.72 | 0.26 | 0.62 | 0.23 | 0.65 | 0.40 | 0.53 | 0.32 | 0.68 | 0.34 | 0.58 | 0.27 |
| BPscore | 0.84 | 0.56 | 0.76 | 0.46 | 0.86 | 0.55 | 0.79 | 0.42 | 0.85 | 0.57 | 0.77 | 0.47 | 0.86 | 0.55 | 0.79 | 0.43 |
| Interologs | 0.85 | 0.58 | 0.70 | 0.46 | 0.84 | 0.54 | 0.69 | 0.42 | 0.83 | 0.57 | 0.67 | 0.45 | 0.84 | 0.56 | 0.69 | 0.45 |
| DDI | 0.72 | 0.46 | 0.65 | 0.43 | 0.75 | 0.43 | 0.70 | 0.42 | 0.72 | 0.48 | 0.66 | 0.49 | 0.74 | 0.45 | 0.69 | 0.44 |
| PMID | 0.76 | 0.46 | 0.60 | 0.28 | 0.76 | 0.42 | 0.60 | 0.27 | 0.74 | 0.45 | 0.56 | 0.30 | 0.76 | 0.43 | 0.59 | 0.28 |
| Method | 0.74 | 0.38 | 0.61 | 0.29 | 0.76 | 0.37 | 0.63 | 0.27 | 0.73 | 0.41 | 0.58 | 0.32 | 0.75 | 0.38 | 0.62 | 0.29 |
| DB | 0.73 | 0.37 | 0.53 | 0.25 | 0.74 | 0.35 | 0.61 | 0.26 | 0.71 | 0.40 | 0.57 | 0.30 | 0.73 | 0.37 | 0.59 | 0.28 |
| ALL | 0.74 | 0.42 | 0.61 | 0.35 | 0.78 | 0.38 | 0.69 | 0.33 | 0.74 | 0.45 | 0.64 | 0.40 | 0.76 | 0.41 | 0.67 | 0.36 |
| BIO | 0.77 | 0.49 | 0.67 | 0.43 | 0.79 | 0.45 | 0.71 | 0.38 | 0.77 | 0.50 | 0.67 | 0.45 | 0.78 | 0.47 | 0.70 | 0.41 |
| EVI | 0.74 | 0.37 | 0.55 | 0.25 | 0.75 | 0.34 | 0.62 | 0.25 | 0.71 | 0.41 | 0.57 | 0.30 | 0.74 | 0.37 | 0.61 | 0.27 |
ALL: Union of all subsets (BPscore ∪ Interologs ∪ DDI ∪ PMID ∪ Method ∪ DB)
BIO: Union of biological evidence type subsets (BPscore ∪ Interologs ∪ DDI)
EVI: Union of recorded evidence type subsets (PMID ∪ Method ∪ DB)
M: Interactions involving an integral membrane protein
Permissive accuracy (PA) and strict accuracy (SA) were calculated for each of the four variants (DISCRETE, MERGED, COMMON and MAJORITY), both over all interactions and over interactions involving an integral membrane protein (M).
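The union subsets ALL, BIO and EVI defined in this legend are plain set unions, provided each PPI is stored as an unordered pair. That way an interaction recorded as A-B in one subset and B-A in another is counted once. A minimal sketch with hypothetical subset contents:

```python
def pair(a, b):
    """Unordered PPI key so that (a, b) and (b, a) are the same interaction."""
    return frozenset((a, b))

# Hypothetical evidence-supported subsets; real ones are drawn from the reference set.
subsets = {
    "BPscore": {pair("a", "b"), pair("b", "c")},
    "Interologs": {pair("b", "a")},  # duplicates a-b from BPscore
    "DDI": {pair("c", "d")},
    "PMID": {pair("a", "b"), pair("e", "f")},
    "Method": {pair("e", "f")},
    "DB": {pair("g", "h")},
}

BIO = subsets["BPscore"] | subsets["Interologs"] | subsets["DDI"]
EVI = subsets["PMID"] | subsets["Method"] | subsets["DB"]
ALL = BIO | EVI  # equals the union of all six subsets
print(len(BIO), len(EVI), len(ALL))
```

The overlap between subsets is why ALL can be smaller than the sum of the subset sizes.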
Evaluation of SCL prediction methods using human union set (ALL)
| Total # of predicted proteins | 155 | 162 | 162 | 35 | 162 | 162 | 79 |
| # of correctly predicted proteins in PA | 133 | 140 | 141 | 21 | 88 | 99 | 53 |
| # of correctly predicted proteins in SA | 92 | 91 | 84 | 9 | 61 | 56 | 38 |
| PA | 0.86 | 0.86 | 0.87 | 0.60 | 0.54 | 0.61 | 0.67 |
| SA | 0.59 | 0.56 | 0.52 | 0.26 | 0.38 | 0.35 | 0.48 |
| Total # of predicted proteins | 686 | 803 | 807 | 411 | 807 | 807 | 499 |
| # of correctly predicted proteins in PA | 556 | 669 | 689 | 343 | 508 | 605 | 308 |
| # of correctly predicted proteins in SA | 350 | 372 | 314 | 211 | 320 | 367 | 193 |
| PA | 0.81 | 0.83 | 0.85 | 0.83 | 0.63 | 0.75 | 0.62 |
| SA | 0.51 | 0.46 | 0.39 | 0.51 | 0.40 | 0.45 | 0.39 |
| Total # of predicted proteins | 193 | 205 | 206 | 64 | 206 | 206 | 95 |
| # of correctly predicted proteins in PA | 75 | 87 | 92 | 47 | 121 | 116 | 56 |
| # of correctly predicted proteins in SA | 39 | 37 | 35 | 32 | 92 | 75 | 39 |
| PA | 0.39 | 0.42 | 0.45 | 0.73 | 0.59 | 0.56 | 0.59 |
| SA | 0.20 | 0.18 | 0.17 | 0.50 | 0.45 | 0.36 | 0.41 |
| Total # of predicted proteins | 1034 | 1170 | 1175 | 510 | 1175 | 1175 | 673 |
| # of correctly predicted proteins in PA | 764 | 896 | 922 | 411 | 717 | 820 | 417 |
| # of correctly predicted proteins in SA | 481 | 500 | 433 | 252 | 473 | 498 | 270 |
| PA | 0.74 | 0.77 | 0.78 | 0.81 | 0.61 | 0.70 | 0.62 |
| SA | 0.47 | 0.43 | 0.37 | 0.49 | 0.40 | 0.42 | 0.40 |
* Numbers in parentheses indicate the total number of proteins to be predicted in each category.
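The last block of this table appears to pool the three category blocks above it. Each count in the last block is the sum of the corresponding counts above; in the first column, for example, 155 + 686 + 193 = 1034 predicted proteins. Each accuracy is the number of correctly predicted proteins divided by the number predicted (133/155 ≈ 0.86). The pooled accuracy is therefore the pooled correct count over the pooled total, not the mean of the category accuracies. The sketch below reproduces the first column's pooled PA and SA from the counts in the table.

```python
# First-column counts from the table, one tuple per category block:
# (total predicted proteins, correctly predicted under PA, correctly predicted under SA).
blocks = [(155, 133, 92), (686, 556, 350), (193, 75, 39)]

total = sum(n for n, _, _ in blocks)
pa_correct = sum(pa for _, pa, _ in blocks)
sa_correct = sum(sa for _, _, sa in blocks)

# Pooled (micro-averaged) accuracy: pooled correct count / pooled total.
pooled_pa = pa_correct / total
pooled_sa = sa_correct / total

# For comparison: unweighted mean of the three per-category accuracies (macro average).
macro_pa = sum(pa / n for n, pa, _ in blocks) / len(blocks)

print(total, pa_correct, sa_correct, round(pooled_pa, 2), round(pooled_sa, 2), round(macro_pa, 2))
```

This gives 1034, 764 and 481, matching the last block, with pooled PA 0.74 and SA 0.47. The macro average of PA would be about 0.69, because the small third category has a much lower PA (0.39) than the other two.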
Evaluation of prediction method variants using LOCSCL
| Reference | 0.70 | 0.34 | 0.76 | 0.32 | 0.73 | 0.36 | 0.75 | 0.36 |
| ALL | 0.71 | 0.43 | 0.82 | 0.45 | 0.79 | 0.52 | 0.79 | 0.48 |
| BIO | 0.76 | 0.48 | 0.78 | 0.43 | 0.77 | 0.50 | 0.78 | 0.48 |
| EVI | 0.72 | 0.38 | 0.83 | 0.39 | 0.81 | 0.43 | 0.78 | 0.43 |
The four methods were applied to PPIs in which one protein has a known SCL and the other protein's SCL is unknown, generating new SCL predictions for the previously unlocated proteins. Permissive (PA) and strict (SA) accuracy were then calculated against the newly available SCL data for these proteins.
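The procedure described above can be sketched as two steps. First, predict a location for a protein whose SCL is unknown by aggregating the known locations of its interaction partners. Second, score the prediction against SCL data that became available later. Several parts of this sketch are assumptions not confirmed by the source. The three aggregation rules (union, intersection, majority vote) are illustrative, and the section does not say whether they match the DISCRETE, MERGED, COMMON or MAJORITY variants. The PA and SA criteria are also assumed: PA counts a prediction as correct if any predicted compartment is correct, and SA only if the predicted set matches exactly. All data are hypothetical.

```python
from collections import Counter

def partner_locations(protein, ppis, known):
    """Location sets of the protein's interaction partners whose SCL is known."""
    partners = [b if a == protein else a for a, b in ppis if protein in (a, b)]
    return [known[q] for q in partners if q != protein and q in known]

def predict(protein, ppis, known, rule):
    """Predict compartments for an unlocated protein with an illustrative aggregation rule."""
    sets = partner_locations(protein, ppis, known)
    if not sets:
        return set()  # no located partner: no prediction
    if rule == "union":
        return set().union(*sets)
    if rule == "intersection":
        return set.intersection(*sets)
    if rule == "majority":
        counts = Counter(c for s in sets for c in s)
        return {c for c, k in counts.items() if k > len(sets) / 2}
    raise ValueError(f"unknown rule: {rule}")

def evaluate(predictions, actual):
    """(PA, SA) over proteins with a non-empty prediction, using the assumed criteria."""
    scored = [p for p, s in predictions.items() if s]
    if not scored:
        return None
    pa = sum(bool(predictions[p] & actual[p]) for p in scored) / len(scored)  # any overlap
    sa = sum(predictions[p] == actual[p] for p in scored) / len(scored)      # exact match
    return pa, sa

# Hypothetical data: "x" and "y" had no SCL; "actual" holds their later-available SCL.
known = {"a": {"Nuc"}, "b": {"Nuc", "Cyt"}, "c": {"Cyt", "Mit"}, "d": {"PM"}}
ppis = [("x", "a"), ("b", "x"), ("x", "c"), ("y", "d"), ("a", "b")]
actual = {"x": {"Nuc"}, "y": {"PM"}}

results = {}
for rule in ("union", "intersection", "majority"):
    preds = {p: predict(p, ppis, known, rule) for p in actual}
    results[rule] = evaluate(preds, actual)
    print(rule, preds, results[rule])
```

In this toy example the intersection rule scores highest, but it makes a prediction for only one of the two proteins. Accuracy therefore needs to be read alongside the number of proteins predicted, as the tables above report.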