Abstract
The subcellular location of a protein is an important functional attribute, so an effective and efficient predictor is needed to annotate subcellular locations rapidly and reliably. Most existing subcellular localization methods handle only single-location proteins. In fact, a protein may exist at, or move between, two or more subcellular locations simultaneously. To better reflect the characteristics of such multiplex proteins, new methods that can deal with them are highly desirable. In this paper we develop a new predictor, Euk-ECC-mPLoc, which can handle systems containing both singleplex and multiplex eukaryotic proteins. It introduces a powerful multi-label learning approach that exploits correlations between subcellular locations, and it hybridizes gene ontology information with dipeptide composition information. It can be used to identify eukaryotic proteins among the following 22 locations: (1) acrosome, (2) cell membrane, (3) cell wall, (4) centrosome, (5) chloroplast, (6) cyanelle, (7) cytoplasm, (8) cytoskeleton, (9) endoplasmic reticulum, (10) endosome, (11) extracellular, (12) Golgi apparatus, (13) hydrogenosome, (14) lysosome, (15) melanosome, (16) microsome, (17) mitochondrion, (18) nucleus, (19) peroxisome, (20) spindle pole body, (21) synapse, and (22) vacuole. Jackknife cross-validation on a stringent benchmark dataset of eukaryotic proteins shows that the average success rate and overall success rate obtained by Euk-ECC-mPLoc were 69.70% and 81.54%, respectively, indicating that the approach is quite promising. In particular, the success rates achieved by Euk-ECC-mPLoc for small subsets improved remarkably, indicating that it holds high potential for stimulating the development of the area.
Euk-ECC-mPLoc is freely accessible to the public as a user-friendly web server at http://levis.tongji.edu.cn:8080/bioinfo/Euk-ECC-mPLoc/. We believe that Euk-ECC-mPLoc may become a useful high-throughput tool, or at least play a complementary role to existing predictors, in identifying subcellular locations of eukaryotic proteins.
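The abstract names dipeptide composition as one of the two feature sources. As a minimal sketch of what such an encoding usually looks like, the snippet below counts the 400 ordered pairs of the 20 standard amino acids and normalizes them to frequencies. The function name, the normalization, and the handling of non-standard residues are assumptions made for illustration; the abstract does not specify how Euk-ECC-mPLoc computes or combines this feature.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def dipeptide_composition(sequence):
    """Return a 400-element list of relative dipeptide frequencies.

    Assumption: overlapping pairs are counted, and pairs containing a
    non-standard residue (for example X) are skipped.
    """
    counts = dict.fromkeys(DIPEPTIDES, 0)
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short (or no standard pairs): return an all-zero vector.
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


vec = dipeptide_composition("ACAC")  # pairs: AC, CA, AC
```

Each protein is mapped to a fixed-length vector regardless of its sequence length, which is what lets a standard classifier consume it.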
Year: 2012 PMID: 22629314 PMCID: PMC3358325 DOI: 10.1371/journal.pone.0036317
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Schematic illustration of the 22 subcellular locations of eukaryotic proteins.
They are: (1) acrosome, (2) cell membrane, (3) cell wall, (4) centrosome, (5) chloroplast, (6) cyanelle, (7) cytoplasm, (8) cytoskeleton, (9) endoplasmic reticulum, (10) endosome, (11) extracellular, (12) Golgi apparatus, (13) hydrogenosome, (14) lysosome, (15) melanosome, (16) microsome, (17) mitochondrion, (18) nucleus, (19) peroxisome, (20) spindle pole body, (21) synapse, and (22) vacuole. Adopted from [24] with permission.
Breakdown of the eukaryotic protein benchmark dataset taken from [24].
| Subset | Subcellular location | Number of proteins |
|---|---|---|
| 1 | Acrosome | 14 |
| 2 | Cell membrane | 697 |
| 3 | Cell wall | 49 |
| 4 | Centrosome | 96 |
| 5 | Chloroplast | 385 |
| 6 | Cyanelle | 79 |
| 7 | Cytoplasm | 2186 |
| 8 | Cytoskeleton | 139 |
| 9 | Endoplasmic reticulum | 457 |
| 10 | Endosome | 41 |
| 11 | Extracellular | 1048 |
| 12 | Golgi apparatus | 254 |
| 13 | Hydrogenosome | 10 |
| 14 | Lysosome | 57 |
| 15 | Melanosome | 47 |
| 16 | Microsome | 13 |
| 17 | Mitochondrion | 610 |
| 18 | Nucleus | 2320 |
| 19 | Peroxisome | 110 |
| 20 | Spindle pole body | 68 |
| 21 | Synapse | 47 |
| 22 | Vacuole | 170 |
| | Total number of locative proteins N(loc) | 8897 |
| | Total number of different proteins N(seq) | |
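The per-location counts above can be summed to obtain the number of locative proteins, where a protein annotated with several locations is counted once per location. The short sketch below does that arithmetic and also lists the smallest subsets, which are the ones the abstract singles out as hardest. The dictionary simply transcribes the table; the variable names are illustrative.

```python
# Per-location protein counts transcribed from the benchmark table.
LOCATION_COUNTS = {
    "Acrosome": 14, "Cell membrane": 697, "Cell wall": 49, "Centrosome": 96,
    "Chloroplast": 385, "Cyanelle": 79, "Cytoplasm": 2186, "Cytoskeleton": 139,
    "Endoplasmic reticulum": 457, "Endosome": 41, "Extracellular": 1048,
    "Golgi apparatus": 254, "Hydrogenosome": 10, "Lysosome": 57,
    "Melanosome": 47, "Microsome": 13, "Mitochondrion": 610, "Nucleus": 2320,
    "Peroxisome": 110, "Spindle pole body": 68, "Synapse": 47, "Vacuole": 170,
}

# Locative proteins: every (protein, location) pair counts once.
n_loc = sum(LOCATION_COUNTS.values())

# Subsets ordered from smallest to largest, to make the class imbalance visible.
by_size = sorted(LOCATION_COUNTS.items(), key=lambda item: item[1])

print("N(loc) =", n_loc)
print("three smallest subsets:", by_size[:3])
```

The largest subset (nucleus) is more than 200 times the size of the smallest (hydrogenosome), which is why per-location success rates matter alongside the overall rate.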
Figure 2. Illustration of the complete process of the BR method.
Figure 3. Illustration of the complete process of the ECC method.
Figure 4. Flowchart of the prediction process of Euk-ECC-mPLoc.
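Since the figures themselves are not reproduced here, the following sketch may help convey the difference between the two schemes, assuming BR denotes binary relevance and ECC denotes an ensemble of classifier chains as these terms are commonly used in multi-label learning. Binary relevance trains one independent binary classifier per label. A classifier chain orders the labels and feeds the earlier labels into later classifiers as extra features, so label correlations can be used; an ensemble averages several chains with random orders. The 1-nearest-neighbour base learner, the toy data, and all function names are assumptions chosen to keep the example dependency-free; they are not the paper's implementation, and the usual per-chain resampling of training data is omitted.

```python
import random


def nearest_label(train_X, train_y, x):
    """1-nearest-neighbour binary base learner (squared Euclidean distance)."""
    best = min(range(len(train_X)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[best]


def predict_br(X, Y, x):
    """Binary relevance: one independent classifier per label."""
    return [nearest_label(X, [y[l] for y in Y], x) for l in range(len(Y[0]))]


def predict_chain(X, Y, order, x):
    """One classifier chain: later labels see earlier labels as features."""
    pred = [0] * len(order)
    fed_back = []  # labels predicted so far, in chain order
    for pos, label in enumerate(order):
        earlier = order[:pos]
        # Training features are augmented with the TRUE earlier labels.
        aug_X = [list(xi) + [yi[l] for l in earlier] for xi, yi in zip(X, Y)]
        target = [yi[label] for yi in Y]
        # At prediction time the PREDICTED earlier labels are fed back.
        pred[label] = nearest_label(aug_X, target, list(x) + fed_back)
        fed_back.append(pred[label])
    return pred


def predict_ecc(X, Y, x, n_chains=5, threshold=0.5, seed=0):
    """Ensemble of chains with random label orders, combined by voting."""
    rng = random.Random(seed)
    n_labels = len(Y[0])
    votes = [0] * n_labels
    for _ in range(n_chains):
        order = list(range(n_labels))
        rng.shuffle(order)
        for l, v in enumerate(predict_chain(X, Y, order, x)):
            votes[l] += v
    return [1 if v / n_chains >= threshold else 0 for v in votes]


# Hypothetical toy data: 2 features, 3 labels (label 2 = label 0 AND label 1).
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [[0, 0, 0], [0, 1, 0], [1, 0, 0], [1, 1, 1]]
print(predict_br(X, Y, [0.9, 0.1]), predict_ecc(X, Y, [0.9, 0.1]))
```

On data this small both schemes agree; the chain structure only pays off when some labels are easier to predict once others are known, which is the situation the abstract describes for co-occurring subcellular locations.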
Comparison of the jackknife success rates of iLoc-Euk [24] and the proposed Euk-ECC-mPLoc on the benchmark dataset covering 22 location sites of eukaryotic proteins, in which none of the proteins included has pairwise sequence identity to any other in the same location.
| Code | Subcellular location | iLoc-Euk (jackknife success rate) | Euk-ECC-mPLoc (jackknife success rate) |
|---|---|---|---|
| 1 | Acrosome | 7.14% | 71.43% |
| 2 | Cell membrane | 80.49% | 79.20% |
| 3 | Cell wall | 16.33% | 51.02% |
| 4 | Centrosome | 69.79% | 66.67% |
| 5 | Chloroplast | 87.79% | 87.01% |
| 6 | Cyanelle | 64.56% | 60.76% |
| 7 | Cytoplasm | 76.72% | 77.77% |
| 8 | Cytoskeleton | 27.34% | 28.78% |
| 9 | Endoplasmic reticulum | 89.06% | 87.96% |
| 10 | Endosome | 7.32% | 36.59% |
| 11 | Extracellular | 90.46% | 91.60% |
| 12 | Golgi apparatus | 63.39% | 69.29% |
| 13 | Hydrogenosome | 0.00% | 90.00% |
| 14 | Lysosome | 31.58% | 73.68% |
| 15 | Melanosome | 2.13% | 53.19% |
| 16 | Microsome | 0.00% | 38.46% |
| 17 | Mitochondrion | 77.05% | 83.11% |
| 18 | Nucleus | 87.93% | 87.28% |
| 19 | Peroxisome | 54.55% | 85.45% |
| 20 | Spindle pole body | 66.18% | 83.82% |
| 21 | Synapse | 38.30% | 46.81% |
| 22 | Vacuole | 71.76% | 83.53% |
| | Average | 50.45% | 69.70% |
| | Overall | 79.06% | 81.54% |
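The two summary rows appear to be different aggregates of the same per-location rates. The sketch below assumes that "Average" is the unweighted mean over the 22 locations and that "Overall" weights each location by its number of proteins in the benchmark dataset; under those assumptions the tabulated values are reproduced to within rounding. The function names are illustrative, and the data are transcribed from the two tables.

```python
# (location, number of proteins, iLoc-Euk %, Euk-ECC-mPLoc %)
ROWS = [
    ("Acrosome", 14, 7.14, 71.43), ("Cell membrane", 697, 80.49, 79.20),
    ("Cell wall", 49, 16.33, 51.02), ("Centrosome", 96, 69.79, 66.67),
    ("Chloroplast", 385, 87.79, 87.01), ("Cyanelle", 79, 64.56, 60.76),
    ("Cytoplasm", 2186, 76.72, 77.77), ("Cytoskeleton", 139, 27.34, 28.78),
    ("Endoplasmic reticulum", 457, 89.06, 87.96), ("Endosome", 41, 7.32, 36.59),
    ("Extracellular", 1048, 90.46, 91.60), ("Golgi apparatus", 254, 63.39, 69.29),
    ("Hydrogenosome", 10, 0.00, 90.00), ("Lysosome", 57, 31.58, 73.68),
    ("Melanosome", 47, 2.13, 53.19), ("Microsome", 13, 0.00, 38.46),
    ("Mitochondrion", 610, 77.05, 83.11), ("Nucleus", 2320, 87.93, 87.28),
    ("Peroxisome", 110, 54.55, 85.45), ("Spindle pole body", 68, 66.18, 83.82),
    ("Synapse", 47, 38.30, 46.81), ("Vacuole", 170, 71.76, 83.53),
]


def average_rate(rates):
    """Unweighted mean: every location counts equally, however small."""
    return sum(rates) / len(rates)


def overall_rate(rates, sizes):
    """Size-weighted mean: large locations dominate the result."""
    return sum(r * n for r, n in zip(rates, sizes)) / sum(sizes)


sizes = [n for _, n, _, _ in ROWS]
iloc = [r for _, _, r, _ in ROWS]
ecc = [r for _, _, _, r in ROWS]

print(f"iLoc-Euk:      average {average_rate(iloc):.2f}%, overall {overall_rate(iloc, sizes):.2f}%")
print(f"Euk-ECC-mPLoc: average {average_rate(ecc):.2f}%, overall {overall_rate(ecc, sizes):.2f}%")
```

This explains why the gap between the two predictors is large in the average (which rewards the gains on small subsets such as hydrogenosome and acrosome) but small in the overall rate (which is dominated by nucleus, cytoplasm, and extracellular proteins).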
Comparison of the jackknife “exact match” success rates of iLoc-Euk [24] and the proposed Euk-ECC-mPLoc on the benchmark dataset covering 22 location sites of eukaryotic proteins, in which none of the proteins included has pairwise sequence identity to any other in the same location.
| Number of locations | Euk-ECC-mPLoc | iLoc-Euk | Random |
|---|---|---|---|
| 1 | 75% | - | |
| 2 | 59.09% | - | |
| 3 | 10.42% | - | |
| 4 | 0% | - | |
| Overall | 72.59% | 71.27% | - |
Predicted outputs of iLoc-Euk and Euk-ECC-mPLoc, together with the corresponding experimental annotations from DBMLoc [89].
| UniProt entry | UniProt entry name | Locations predicted by iLoc-Euk | Locations predicted by Euk-ECC-mPLoc | Annotations in DBMLoc |
|---|---|---|---|---|
| P38143 | GPX2_YEAST | Cytoplasm | Cytoplasm | Cytoplasm |
| | | | Nucleus | Nucleus |
| P25823 | TUD_DROME | Mitochondrion | Cytoplasm | Cytoplasm |
| | | | Mitochondrion | Mitochondrion |
| P28829 | BYR2_SCHPO | Cytoplasm | Cell membrane | Cell membrane |
| | | | Cytoplasm | Cytoplasm |
| P32614 | FRDS_YEAST | Cytoplasm | Cytoplasm | Cytoplasm |
| | | Mitochondrion | Mitochondrion | Mitochondrion |
| Nucleus | ||||
| Q9H190 | SDCB2_HUMAN | Cytoplasm | Cell membrane | Cell membrane |
| | | | Cytoplasm | Cytoplasm |
| Q9Y7Q2 | GST1_SCHPO | Cytoplasm | Cytoplasm | Cytoplasm |
| | | | Nucleus | Nucleus |
| O59827 | GST2_SCHPO | Cytoplasm | Cytoplasm | Cytoplasm |
| | | | Nucleus | Nucleus |
| P27476 | NSR1_YEAST | Nucleus | Mitochondrion | Mitochondrion |
| | | | Nucleus | Nucleus |
| P47119 | ITPA_YEAST | Nucleus | Cytoplasm | Cytoplasm |
| | | | Nucleus | Nucleus |
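To make the "exact match" criterion concrete, the sketch below treats each prediction and each annotation as a set of locations and counts a protein as correct only when the two sets are identical. It uses eight of the example proteins from the table above and assumes that every second-line location belongs to the Euk-ECC-mPLoc and DBMLoc columns (the only reading under which no column lists the same location twice). P32614 is left out because the column of its third location line is not clear from the table. Function and variable names are illustrative.

```python
# accession: (iLoc-Euk prediction, Euk-ECC-mPLoc prediction, DBMLoc annotation)
EXAMPLES = {
    "P38143": ({"Cytoplasm"}, {"Cytoplasm", "Nucleus"}, {"Cytoplasm", "Nucleus"}),
    "P25823": ({"Mitochondrion"}, {"Cytoplasm", "Mitochondrion"}, {"Cytoplasm", "Mitochondrion"}),
    "P28829": ({"Cytoplasm"}, {"Cell membrane", "Cytoplasm"}, {"Cell membrane", "Cytoplasm"}),
    "Q9H190": ({"Cytoplasm"}, {"Cell membrane", "Cytoplasm"}, {"Cell membrane", "Cytoplasm"}),
    "Q9Y7Q2": ({"Cytoplasm"}, {"Cytoplasm", "Nucleus"}, {"Cytoplasm", "Nucleus"}),
    "O59827": ({"Cytoplasm"}, {"Cytoplasm", "Nucleus"}, {"Cytoplasm", "Nucleus"}),
    "P27476": ({"Nucleus"}, {"Mitochondrion", "Nucleus"}, {"Mitochondrion", "Nucleus"}),
    "P47119": ({"Nucleus"}, {"Cytoplasm", "Nucleus"}, {"Cytoplasm", "Nucleus"}),
}


def exact_match_rate(pairs):
    """Fraction of (predicted, annotated) pairs whose location sets are equal."""
    pairs = list(pairs)
    return sum(pred == true for pred, true in pairs) / len(pairs)


iloc_rate = exact_match_rate((iloc, db) for iloc, _, db in EXAMPLES.values())
ecc_rate = exact_match_rate((ecc, db) for _, ecc, db in EXAMPLES.values())

# Annotated locations that the single-location iLoc-Euk output misses.
missed_by_iloc = {acc: sorted(db - iloc) for acc, (iloc, _, db) in EXAMPLES.items()}

print("exact match on these examples: iLoc-Euk", iloc_rate, "Euk-ECC-mPLoc", ecc_rate)
```

Because a partially correct prediction scores zero under this criterion, it is stricter than the per-location success rate, which is consistent with the lower exact-match figures reported for proteins with more locations.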