Jonathan Q Jiang, Maoying Wu.
Abstract
BACKGROUND: Proteins that interact in vivo tend to reside within the same or "adjacent" subcellular compartments. This observation provides opportunities to reveal protein subcellular localization in the context of the protein-protein interaction (PPI) network. However, so far, only a few efforts based on heuristic rules have been made in this regard.
Year: 2012 PMID: 22759426 PMCID: PMC3314587 DOI: 10.1186/1471-2105-13-S10-S20
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
The classification of 2513 annotated proteins into 22 subcellular localizations.
| order | subcellular localization | number of proteins |
|---|---|---|
| 1 | Actin | 30 |
| 2 | Bud | 12 |
| 3 | Bud neck | 51 |
| 4 | Cell periphery | 59 |
| 5 | Cytoplasm | 1195 |
| 6 | Early Golgi | 40 |
| 7 | Endosome | 39 |
| 8 | Endoplasmic reticulum (ER) | 125 |
| 9 | ER to Golgi | 6 |
| 10 | Golgi | 29 |
| 11 | Late Golgi | 36 |
| 12 | Lipid particle | 9 |
| 13 | Microtubule | 15 |
| 14 | Mitochondrion | 206 |
| 15 | Nuclear periphery | 51 |
| 16 | Nucleolus | 145 |
| 17 | Nucleus | 1071 |
| 18 | Peroxisome | 18 |
| 19 | Punctate composite | 96 |
| 20 | Spindle pole | 57 |
| 21 | Vacuolar membrane | 31 |
| 22 | Vacuole | 48 |
Protein co-localization for 28 experiment sources in the BioGRID database.
| Experiment system | Throughput technique | Number of interactions | Fraction with ≥ 1 common localization (reliability) | Fraction with ≥ 2 common localizations |
|---|---|---|---|---|
| Affinity Capture-Luminescence | low throughput | 29 | 0.3448 | 0 |
| Affinity Capture-MS | high throughput | 44399 | 0.5343 | 0.0798 |
| | low throughput | 5627 | 0.5873 | 0.0945 |
| Affinity Capture-RNA | high throughput | 3657 | 0.2625 | 0.0014 |
| | low throughput | 86 | 0.3140 | 0.0581 |
| Affinity Capture-Western | high throughput | 213 | 0.6526 | 0.0798 |
| | low throughput | 11257 | 0.5477 | 0.0759 |
| Biochemical Activity | high throughput | 4211 | 0.3363 | 0.0686 |
| | low throughput | 3427 | 0.4471 | 0.1100 |
| Co-crystal Structure | low throughput | 364 | 0.6593 | 0.1703 |
| Co-fractionation | high throughput | 102 | 0.0098 | 0 |
| | low throughput | 585 | 0.4821 | 0.0598 |
| Co-localization | low throughput | 448 | 0.5357 | 0.0588 |
| Co-purification | high throughput | 11 | 0.8182 | 0.5455 |
| | low throughput | 1667 | 0.6155 | 0.0834 |
| FRET | high throughput | 13 | 0.1538 | 0 |
| | low throughput | 121 | 0.6364 | 0.0579 |
| Far Western | low throughput | 74 | 0.5811 | 0.0541 |
| | high throughput | 4738 | 0.4185 | 0.0319 |
| PCA | low throughput | 409 | 0.5232 | 0.0098 |
| | high throughput | 9 | 0.3140 | 0 |
| Protein-RNA | low throughput | 168 | 0.1116 | 0.0129 |
| | high throughput | 328 | 0.3333 | 0 |
| Protein-peptide | low throughput | 233 | 0.4940 | 0.0833 |
| | high throughput | 27 | 0.5556 | 0.0370 |
| Reconstructed Complex | low throughput | 3347 | 0.5088 | 0.0986 |
| | high throughput | 6624 | 0.3578 | 0.0773 |
| Two-hybrid | low throughput | 4622 | 0.4799 | 0.1019 |
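The co-localization columns of the table above can be read as the share of interactions whose two partners have at least one (or at least two) annotated localizations in common. The sketch below illustrates that calculation on toy data. The function name `colocalization_fraction`, the `annotations` mapping, and the choice to skip interactions involving an unannotated protein are all assumptions made for illustration, not details taken from the paper.

```python
def colocalization_fraction(interactions, annotations, min_common=1):
    """Fraction of interactions whose partners share >= min_common localizations.

    Interactions with an unannotated partner are skipped (an assumption).
    """
    shared = 0
    counted = 0
    for a, b in interactions:
        if a not in annotations or b not in annotations:
            continue
        counted += 1
        # Set intersection gives the localizations common to both proteins.
        if len(annotations[a] & annotations[b]) >= min_common:
            shared += 1
    return shared / counted if counted else 0.0


# Toy data, purely illustrative (hypothetical protein names).
annotations = {
    "P1": {"nucleus", "cytoplasm"},
    "P2": {"nucleus", "cytoplasm"},
    "P3": {"nucleus"},
    "P4": {"mitochondrion"},
}
interactions = [("P1", "P2"), ("P1", "P3"), ("P3", "P4"), ("P2", "P4")]

reliability = colocalization_fraction(interactions, annotations, min_common=1)
at_least_two = colocalization_fraction(interactions, annotations, min_common=2)
```

Computing this per experiment system and throughput class would give one row of such a table per group.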
MAP of 5-fold cross validation for four graph-based semi-supervised learning algorithms.
| Algorithm | MAP (%), PPI-only | MAP (%), PPI-weight |
|---|---|---|
| Majority | 42.13 | 42.39 |
| Merged | 32.53 | 32.53 |
| Common | 24.36 | 24.36 |
| | 33.07 | |
| | 19.77 | |
| | 14.59 | |
| GMC | 53.43 | 53.66 |
| FunFlow | 62.07 | 62.16 |
The χ2-score method can only be applied to the "PPI-only" case. The GenMultiCut method was performed through integer linear programming (ILP), as suggested by [22].
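For readers unfamiliar with the metric, the following is a minimal sketch of mean average precision, assuming MAP is used here in its usual ranking sense: for each localization, proteins are ranked by predicted score, average precision is computed against the true members, and the values are averaged over localizations. The function names and toy rankings are illustrative assumptions; the paper's exact evaluation protocol is not reproduced.

```python
def average_precision(ranked_items, relevant):
    """Average of precision@k taken at each rank k where a relevant item occurs."""
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked_items, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(rankings, relevant_sets):
    """Mean of the per-ranking average precisions."""
    scores = [average_precision(r, s) for r, s in zip(rankings, relevant_sets)]
    return sum(scores) / len(scores) if scores else 0.0


# Toy example: two localizations, each with a ranked list of proteins.
rankings = [["a", "b", "c", "d"], ["x", "y"]]
relevant_sets = [{"a", "c"}, {"y"}]

ap_first = average_precision(rankings[0], relevant_sets[0])   # (1/1 + 2/3) / 2
map_score = mean_average_precision(rankings, relevant_sets)
```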
Figure 1. Average precision and F1 micro score for each subcellular localization in the "PPI-only" scenario. Bars of different colours correspond to the results obtained by different algorithms. The first row shows the average precision for the first 11 subcellular localizations and the second row shows it for the last 11. The third and fourth rows show the F1 micro score in the same arrangement.
Figure 2. Average precision and F1 micro score for each subcellular localization in the "PPI-weight" scenario. Bars of different colours correspond to the results obtained by different algorithms. The first row shows the average precision for the first 11 subcellular localizations and the second row shows it for the last 11. The third and fourth rows show the F1 micro score in the same arrangement.
Figure 3. Subgraphs of the PPI network in our case studies. (a) The subgraph consists of 72 proteins annotated with the localizations "ER to Golgi" and "Lipid particle" together with their immediate neighbors, and 204 interactions between these proteins. (b) The subgraph consists of 83 proteins annotated with the localization "Bud" together with their immediate neighbors, and 164 interactions between these proteins.
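A subgraph of the kind described for the case studies (proteins carrying a given localization, their immediate neighbors, and the interactions among them) can be extracted as sketched below. The function name `neighborhood_subgraph`, the edge-list representation, and the toy data are assumptions for illustration only.

```python
def neighborhood_subgraph(edges, annotations, target_localizations):
    """Return (nodes, edges) for proteins annotated with any target
    localization, plus their immediate neighbors in the network."""
    targets = set(target_localizations)
    seeds = {p for p, locs in annotations.items() if locs & targets}
    nodes = set(seeds)
    # Add every protein that interacts directly with a seed protein.
    for a, b in edges:
        if a in seeds:
            nodes.add(b)
        if b in seeds:
            nodes.add(a)
    # Keep only interactions with both endpoints inside the node set.
    sub_edges = [(a, b) for a, b in edges if a in nodes and b in nodes]
    return nodes, sub_edges


# Toy network with hypothetical protein names.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
annotations = {"A": {"Bud"}, "D": {"Cytoplasm"}}

nodes, sub_edges = neighborhood_subgraph(edges, annotations, ["Bud"])
```

Here only "A" is a seed, so the subgraph contains "A", "B" and "C" and the three interactions among them; "D" is excluded because it is two steps away.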
Top 5 predictions of each type for the first group of 60 proteins annotated as "ambiguous".
| Type | Protein (ORF) | Annotation | Prediction | UniProt | SGD |
|---|---|---|---|---|---|
| Correct | YBL034C | ambiguous; spindle pole | spindle pole | Nucleus. Spindle. | spindle pole body (IDA) |
| | YDR181C | ambiguous; cytoplasm; | cytoplasm; | Nucleus | nuclear chromatin (IDA); nuclear chromosome, telomeric region (IC) |
| | YGR020C | ambiguous; vacuolar membrane | vacuolar membrane | | fungal-type vacuole membrane (TAS); vacuolar proton-transporting V-type ATPase, V1 domain (TAS) |
| | YHR119W | ambiguous; nucleus | nucleus | Nucleus (Probable). Chromosome (Probable). | Set1C/COMPASS complex (IPI) |
| | YHR183W | ambiguous; cytoplasm | cytoplasm | Cytoplasm | cytoplasm (IDA) |
| Partial Correct | YAL029C | ambiguous; | Bud | Bud | cellular bud (IDA) |
| | YBR102C | ambiguous; | cytoplasm; | secretory vesicle. | cellular bud neck (IDA) |
| | YBR130C | ambiguous; | bud | | actin cap (TAS) |
| | YBR260C | ambiguous; | cytoplasm | Cytoplasm. | actin cortical patch (IDA) |
| | YFR016C | ambiguous; | cytoplasm | | cellular bud (IDA) |
| Mismatch | YAR019C | ambiguous; spindle pole | cytoplasm | | cellular bud neck (TAS) |
| | YBL105C | ambiguous; | actin | | cytoplasm (IDA) |
| | YDL146W | ambiguous; | actin | Bud. | colocalizes with actin cortical patch (IDA) |
| | YDR309C | ambiguous; | actin | Bud neck (By similarity). | actin cap (TAS) |
| | YHR158C | ambiguous; | cytoplasm; | | cellular bud neck (IDA) |
| | YCL024W | ambiguous; | | Bud neck | cellular bud neck (IDA); cellular bud neck septin collar (IDA) |
| | YDL089W | ambiguous; | | Membrane | nuclear periphery (IDA) |
| Unknown | YDR069C | ambiguous; | | Cytoplasm. | endosome (IDA) |
| | YDR507C | ambiguous; bud | | Cytoplasm. | cellular bud neck (IDA) |
| | YHL019C | ambiguous; | | coated pit. | AP-1 adaptor complex (IPI) |
In this table, "Annotation" denotes the experimentally observed subcellular localizations in the Yeast GFP Fusion Localization Database [17]. "UniProt" denotes the subcellular localization given in the general annotation (comments) of the UniProt database [25]. "SGD" denotes the cellular component GO annotation in the SGD database [26]. Each type corresponds to a different degree of agreement between our prediction and the experimental validation.
Top 5 predictions of each type for the 606 proteins without prior knowledge.
| Type | Protein (ORF) | Prediction | UniProt | SGD |
|---|---|---|---|---|
| Correct | Q0045 | mitochondrion | Mitochondrion inner membrane. | mitochondrion (IDA) |
| | Q0080 | mitochondrion | Mitochondrion membrane. | mitochondrion (IDA) |
| | YAL020C | cytoplasm | | cytoplasm (IDA, IPI) |
| | YAL029C | bud | Bud. | cellular bud (IDA) |
| | YBL041W | cytoplasm; | Cytoplasm. | endoplasmic reticulum membrane (IC) |
| Partial Correct | YAL042W | ER | Endoplasmic reticulum membrane; | ER to Golgi transport vesicle (IDA) |
| | YBL088C | cytoplasm; | Nucleus. | nucleus (IC) |
| | YBR020W | cytoplasm; | | cytoplasm (IGI) |
| | YBR072W | cytoplasm | | cytoplasm (IDA) |
| | YBR108W | actin; | Membrane raft; | actin cortical patch (IDA) |
| Mismatch | YAL003W | cytoplasm | | ribosome (TAS) |
| | YAL028W | cytoplasm; | Endoplasmic reticulum membrane | endoplasmic reticulum (IDA) |
| | YAL030W | lipid particle | Endomembrane system | cellular bud neck (IDA) |
| | YAL040C | cytoplasm | | nucleus (IDA, IMP) |
| | YAL062W | actin; | | nucleus (IDA) |
| Unknown | Q0120 | | Mitochondrion. | mitochondrion (IDA) |
| | YAL034C | nucleus | | |
| | YAR018C | spindle pole | | |
| | YAR027W | | Nucleus membrane; | nuclear envelope (IDA) |
| | YAR042W | | Cytoplasm | early endosome (IDA) |
In this table, "UniProt" denotes the subcellular localization given in the general annotation (comments) of the UniProt database [25]. "SGD" denotes the cellular component GO annotation in the SGD database [26]. Each type corresponds to a different degree of agreement between our prediction and the experimental validation.
Annotation results for 5 proteins in the Yeast GFP Fusion Localization Database by the ensemble classifier and the basic classifiers.
| Protein | Annotation | Majority | GMC | FunFlow | Ensemble | |
|---|---|---|---|---|---|---|
| YAL029C | cell periphery;bud neck; cytoplasm;bud | bud neck; cytoplasm;nucleus | cell periphery; | nucleus | cytoplasm | bud neck;cytoplasm; nucleus;bud |
| YBR130C | cell periphery; | cell periphery; cytoplasm;nucleus | cell periphery; | cytoplasm | cell periphery; cytoplasm;bud | |
| YBR260C | bud neck; cytoplasm;bud | bud neck; | mitochondrion; | cytoplasm | bud neck; | |
| YDR181C | cytoplasm; | cytoplasm; | mitochondrion; | cytoplasm;nucleus | nucleus | cytoplasm; |
| YNL298W | cell periphery; | cell periphery; | cell periphery; bud neck;cytoplasm | cytoplasm | cell periphery; | |
Our method can predict all the labels for the proteins, while other approaches can only recover part of the labels.
Figure 4. Flowchart of the ensemble classifier. The ensemble classifier ℂ is formed by fusing four basic individual classifiers ℂ1, ℂ2, ℂ3 and ℂ4, derived from four graph-based semi-supervised learning algorithms.
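The caption does not state how the four basic classifiers are fused. As one possible illustration only, the sketch below combines multi-label predictions by label-wise voting: a localization is kept if at least `min_votes` classifiers predict it. The voting rule, the threshold, and the function name `fuse_predictions` are assumptions and should not be read as the authors' fusion scheme.

```python
from collections import Counter


def fuse_predictions(predictions, min_votes=2):
    """Label-wise voting over multi-label predictions (assumed fusion rule).

    predictions: one set of predicted localizations per basic classifier.
    Returns the labels predicted by at least min_votes classifiers.
    """
    votes = Counter(label for labels in predictions for label in set(labels))
    return {label for label, count in votes.items() if count >= min_votes}


# Toy outputs from four hypothetical basic classifiers for one protein.
basic_outputs = [
    {"cytoplasm", "nucleus"},
    {"cytoplasm"},
    {"nucleus", "bud"},
    {"cytoplasm"},
]

fused = fuse_predictions(basic_outputs, min_votes=2)
```

Lowering `min_votes` to 1 yields the union of all predicted labels, which favours recovering every label of a multiplex protein at the cost of more false positives.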