Kuo-Chen Chou, Zhi-Cheng Wu, Xuan Xiao.
Abstract
Predicting protein subcellular localization is an important and difficult problem, particularly when query proteins may have a multiplex character, i.e., simultaneously exist at, or move between, two or more different subcellular location sites. Most existing protein subcellular location predictors can only deal with single-location or "singleplex" proteins. However, multiple-location or "multiplex" proteins should not be ignored because they usually possess unique biological functions worthy of special notice. By introducing "multi-labeled learning" and an "accumulation-layer scale", a new predictor, called iLoc-Euk, has been developed that can deal with systems containing both singleplex and multiplex proteins. As a demonstration, the jackknife cross-validation was performed with iLoc-Euk on a benchmark dataset of eukaryotic proteins classified into the following 22 location sites: (1) acrosome, (2) cell membrane, (3) cell wall, (4) centriole, (5) chloroplast, (6) cyanelle, (7) cytoplasm, (8) cytoskeleton, (9) endoplasmic reticulum, (10) endosome, (11) extracellular, (12) Golgi apparatus, (13) hydrogenosome, (14) lysosome, (15) melanosome, (16) microsome, (17) mitochondrion, (18) nucleus, (19) peroxisome, (20) spindle pole body, (21) synapse, and (22) vacuole, where none of the proteins included has ≥25% pairwise sequence identity to any other in the same subset. The overall success rate thus obtained by iLoc-Euk was 79%, which is significantly higher than that of any existing predictor that also has the capacity to deal with such a complicated and stringent system. As a user-friendly web-server, iLoc-Euk is freely accessible to the public at http://icpr.jci.edu.cn/bioinfo/iLoc-Euk.
It is anticipated that iLoc-Euk may become a useful bioinformatics tool for Molecular Cell Biology, Proteomics, Systems Biology, and Drug Development. Also, its novel approach will further stimulate the development of methods for predicting other protein attributes.
Year: 2011 PMID: 21483473 PMCID: PMC3068162 DOI: 10.1371/journal.pone.0018258
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Illustration to show the 22 subcellular locations of eukaryotic proteins.
The 22 locations are: (1) acrosome, (2) cell membrane, (3) cell wall, (4) centriole, (5) chloroplast, (6) cyanelle, (7) cytoplasm, (8) cytoskeleton, (9) endoplasmic reticulum, (10) endosome, (11) extracellular, (12) Golgi apparatus, (13) hydrogenosome, (14) lysosome, (15) melanosome, (16) microsome, (17) mitochondrion, (18) nucleus, (19) peroxisome, (20) spindle pole body, (21) synapse, and (22) vacuole. Adapted from [73] with permission.
Breakdown of the benchmark dataset of eukaryotic proteins classified into 22 subcellular location sites (cf. Eq. 17), giving the number of proteins in each site (subset). Note that since a protein may belong to more than one subcellular location, the sum of the per-site counts is generally larger than the number of different proteins in the dataset.
| Subset | Subcellular location | Number of proteins |
| 1 | Acrosome | 14 |
| 2 | Cell membrane | 697 |
| 3 | Cell wall | 49 |
| 4 | Centrosome | 96 |
| 5 | Chloroplast | 385 |
| 6 | Cyanelle | 79 |
| 7 | Cytoplasm | 2186 |
| 8 | Cytoskeleton | 139 |
| 9 | Endoplasmic reticulum | 457 |
| 10 | Endosome | 41 |
| 11 | Extracellular | 1048 |
| 12 | Golgi apparatus | 254 |
| 13 | Hydrogenosome | 10 |
| 14 | Lysosome | 57 |
| 15 | Melanosome | 47 |
| 16 | Microsome | 13 |
| 17 | Mitochondrion | 610 |
| 18 | Nucleus | 2320 |
| 19 | Peroxisome | 110 |
| 20 | Spindle pole body | 68 |
| 21 | Synapse | 47 |
| 22 | Vacuole | 170 |
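The relation between the per-site counts and the number of different proteins can be illustrated with a minimal sketch. The protein identifiers and their annotations below are hypothetical, invented purely for illustration and not taken from the benchmark dataset; only the counting logic reflects the note in the table caption.

```python
from collections import Counter

# Hypothetical annotations (NOT from the benchmark dataset): each protein
# maps to the set of subcellular locations where it is found.
annotations = {
    "protein_A": {"Cytoplasm"},                              # singleplex
    "protein_B": {"Cytoplasm", "Nucleus"},                   # multiplex
    "protein_C": {"Cytoplasm", "Nucleus", "Mitochondrion"},  # multiplex
}

# Count how many proteins fall into each location subset.
per_site = Counter(loc for locs in annotations.values() for loc in locs)

n_different = len(annotations)              # number of different proteins
n_per_site_total = sum(per_site.values())   # sum of the subset sizes

print(dict(per_site))
print(n_different, n_per_site_total)
```

In this toy case the three proteins contribute six entries across the subsets, because each multiplex protein is counted once for every location it occupies.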
A comparison of iLoc-Euk with Euk-mPLoc 2.0 [40] using the jackknife cross-validation test on the benchmark dataset taken from the Online Supporting Information S1 of [40].
| Code | Subcellular location site | Success rate by jackknife test: Euk-mPLoc 2.0 | Success rate by jackknife test: iLoc-Euk |
| 1 | Acrosome | 1/14 = 7.14% | 1/14 = 7.14% |
| 2 | Cell membrane | 452/697 = 64.85% | 561/697 = 80.49% |
| 3 | Cell wall | 6/49 = 12.24% | 8/49 = 16.33% |
| 4 | Centrosome | 22/96 = 22.92% | 67/96 = 69.79% |
| 5 | Chloroplast | 318/385 = 82.60% | 338/385 = 87.79% |
| 6 | Cyanelle | 47/79 = 59.49% | 51/79 = 64.56% |
| 7 | Cytoplasm | 1418/2186 = 64.87% | 1677/2186 = 76.72% |
| 8 | Cytoskeleton | 44/139 = 31.65% | 38/139 = 27.34% |
| 9 | Endoplasmic reticulum | 348/457 = 76.15% | 407/457 = 89.06% |
| 10 | Endosome | 2/41 = 4.88% | 3/41 = 7.32% |
| 11 | Extracellular | 858/1048 = 81.87% | 948/1048 = 90.46% |
| 12 | Golgi apparatus | 56/254 = 22.05% | 161/254 = 63.39% |
| 13 | Hydrogenosome | 2/10 = 20.00% | 0/10 = 0.00% |
| 14 | Lysosome | 26/57 = 45.61% | 18/57 = 31.58% |
| 15 | Melanosome | 0/47 = 0.00% | 1/47 = 2.13% |
| 16 | Microsome | 1/13 = 7.69% | 0/13 = 0.00% |
| 17 | Mitochondrion | 427/610 = 70.00% | 470/610 = 77.05% |
| 18 | Nucleus | 1501/2320 = 64.70% | 2040/2320 = 87.93% |
| 19 | Peroxisome | 56/110 = 50.91% | 60/110 = 54.55% |
| 20 | Spindle pole body | 23/68 = 33.82% | 45/68 = 66.18% |
| 21 | Synapse | 0/47 = 0.00% | 18/47 = 38.30% |
| 22 | Vacuole | 101/170 = 59.41% | 122/170 = 71.76% |
| Overall | | 5709/8897 = 64.17% | 7034/8897 = 79.06% |
The dataset contains 7,766 different eukaryotic protein sequences covering 22 location sites, where none of the proteins included has ≥25% pairwise sequence identity to any other in the same location.
Euk-mPLoc 2.0: the predictor from [40].
iLoc-Euk: the predictor proposed in this paper.
Overall: note that instead of 7,766 (the total number of different proteins), 8,897 (the total number of different virtual proteins) is used as the denominator, because some proteins have two or more location sites. For the definition of “virtual protein”, see Eqs. 2–3 of [40] and the relevant explanation there.
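The overall row can be recomputed from the per-site counts as a check on the arithmetic. The sketch below copies the counts from the comparison table; the variable names are illustrative and are not part of iLoc-Euk or Euk-mPLoc 2.0.

```python
# (location, virtual proteins, correct by Euk-mPLoc 2.0, correct by iLoc-Euk),
# copied from the comparison table above.
rows = [
    ("Acrosome", 14, 1, 1),
    ("Cell membrane", 697, 452, 561),
    ("Cell wall", 49, 6, 8),
    ("Centrosome", 96, 22, 67),
    ("Chloroplast", 385, 318, 338),
    ("Cyanelle", 79, 47, 51),
    ("Cytoplasm", 2186, 1418, 1677),
    ("Cytoskeleton", 139, 44, 38),
    ("Endoplasmic reticulum", 457, 348, 407),
    ("Endosome", 41, 2, 3),
    ("Extracellular", 1048, 858, 948),
    ("Golgi apparatus", 254, 56, 161),
    ("Hydrogenosome", 10, 2, 0),
    ("Lysosome", 57, 26, 18),
    ("Melanosome", 47, 0, 1),
    ("Microsome", 13, 1, 0),
    ("Mitochondrion", 610, 427, 470),
    ("Nucleus", 2320, 1501, 2040),
    ("Peroxisome", 110, 56, 60),
    ("Spindle pole body", 68, 23, 45),
    ("Synapse", 47, 0, 18),
    ("Vacuole", 170, 101, 122),
]

# The denominator is the number of virtual proteins, i.e. the sum over sites.
total_virtual = sum(n for _, n, _, _ in rows)
correct_euk_mploc = sum(c for _, _, c, _ in rows)
correct_iloc_euk = sum(c for _, _, _, c in rows)

rate_euk_mploc = round(100 * correct_euk_mploc / total_virtual, 2)
rate_iloc_euk = round(100 * correct_iloc_euk / total_virtual, 2)

print(f"Euk-mPLoc 2.0: {correct_euk_mploc}/{total_virtual} = {rate_euk_mploc}%")
print(f"iLoc-Euk:      {correct_iloc_euk}/{total_virtual} = {rate_iloc_euk}%")
```

The per-site denominators sum to 8,897 rather than 7,766, which is why the overall rates are expressed per virtual protein.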
Figure 2. A flowchart to show the prediction process of iLoc-Euk.
Figure 3. A semi-screenshot to show the top page of the iLoc-Euk web-server.
Its address is http://icpr.jci.edu.cn/bioinfo/iLoc-Euk.
Figure 4. A semi-screenshot to show the output of iLoc-Euk.
The input was taken from the three protein sequences listed in the Example window of the iLoc-Euk web-server (cf. Fig. 3).