Kuo-Chen Chou, Hong-Bin Shen.
Abstract
Information on the subcellular locations of proteins is important for in-depth studies of cell biology. It is also very useful for proteomics, systems biology, and drug development. However, most existing methods for predicting protein subcellular location cover only 5 to 12 location sites. They are also limited to single-location proteins and hence fail for multiplex proteins, which can simultaneously exist at, or move between, two or more location sites. Multiplex proteins of this kind usually possess important biological functions that deserve special attention. A new predictor called "Euk-mPLoc 2.0" was developed by hybridizing gene ontology information, functional domain information, and sequential evolutionary information through three different modes of pseudo amino acid composition. It can be used to identify eukaryotic proteins among the following 22 locations: (1) acrosome, (2) cell wall, (3) centriole, (4) chloroplast, (5) cyanelle, (6) cytoplasm, (7) cytoskeleton, (8) endoplasmic reticulum, (9) endosome, (10) extracell, (11) Golgi apparatus, (12) hydrogenosome, (13) lysosome, (14) melanosome, (15) microsome, (16) mitochondria, (17) nucleus, (18) peroxisome, (19) plasma membrane, (20) plastid, (21) spindle pole body, and (22) vacuole. Compared with existing methods for predicting eukaryotic protein subcellular localization, the new predictor is much more powerful and flexible, particularly in dealing with proteins with multiple locations and proteins without available accession numbers. For a newly constructed stringent benchmark dataset, which contains both single- and multiple-location proteins and in which none of the proteins has ≥25% pairwise sequence identity to any other in the same location, the overall jackknife success rate achieved by Euk-mPLoc 2.0 is more than 24% higher than that of any existing predictor.
As a user-friendly web server, Euk-mPLoc 2.0 is freely accessible at http://www.csbio.sjtu.edu.cn/bioinf/euk-multi-2/. For a query protein sequence of 400 amino acids, the web server takes about 15 seconds to return the predicted result; longer sequences generally take more time. It is anticipated that the novel approach and the powerful predictor presented in this paper will have a significant impact on molecular cell biology, systems biology, proteomics, bioinformatics, and drug development.
Year: 2010 PMID: 20368981 PMCID: PMC2848569 DOI: 10.1371/journal.pone.0009931
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Illustration showing the 22 subcellular locations of eukaryotic proteins.
The 22 location sites are: (1) acrosome, (2) cell wall, (3) centriole, (4) chloroplast, (5) cyanelle, (6) cytoplasm, (7) cytoskeleton, (8) endoplasmic reticulum, (9) endosome, (10) extracell, (11) Golgi apparatus, (12) hydrogenosome, (13) lysosome, (14) melanosome, (15) microsome, (16) mitochondria, (17) nucleus, (18) peroxisome, (19) plasma membrane, (20) plastid, (21) spindle pole body, and (22) vacuole. Reprinted from [14] with permission.
Breakdown of the eukaryotic protein benchmark dataset derived from Swiss-Prot database (release 55.3) according to the procedures described in the Materials section.
| Subcellular location | Number of proteins |
|---|---|
| Acrosome | 14 |
| Cell membrane | 697 |
| Cell wall | 49 |
| Centrosome | 96 |
| Chloroplast | 385 |
| Cyanelle | 79 |
| Cytoplasm | 2186 |
| Cytoskeleton | 139 |
| Endoplasmic reticulum | 457 |
| Endosome | 41 |
| Extracell | 1048 |
| Golgi apparatus | 254 |
| Hydrogenosome | 10 |
| Lysosome | 57 |
| Melanosome | 47 |
| Microsome | 13 |
| Mitochondrion | 610 |
| Nucleus | 2320 |
| Peroxisome | 110 |
| Spindle pole body | 68 |
| Synapse | 47 |
| Vacuole | 170 |
| Number of total virtual proteins | 8,897 |
| Number of total different proteins | 7,766 |
None of the proteins included here has ≥25% sequence identity to any other in the same subcellular location.
See Fig. 1 and Eq. 1, as well as the relevant text, for the definitions of the subsets listed in this table.
See Eqs. 2–3 for the definition of the number of virtual proteins and its relation to the number of different proteins.
Of the 7,766 different proteins, 6,687 belong to one subcellular location, 1,029 to two locations, 48 to three locations, and 2 to four locations. See Online for the protein sequences.
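The relation between the two totals can be cross-checked with a short Python sketch. It assumes that a protein located at m sites is counted m times as a "virtual" protein; this is a reading of the table notes, not a reproduction of Eqs. 2–3, and the variable and function names are illustrative only.

```python
# Number of different proteins by how many locations each occupies,
# taken from the note under the benchmark table.
# Assumption: a protein with m locations contributes m virtual proteins.
proteins_by_multiplicity = {1: 6687, 2: 1029, 3: 48, 4: 2}


def count_different(by_multiplicity):
    """Each protein is counted once, regardless of how many sites it has."""
    return sum(by_multiplicity.values())


def count_virtual(by_multiplicity):
    """Each protein is counted once per location site it occupies."""
    return sum(m * n for m, n in by_multiplicity.items())


n_different = count_different(proteins_by_multiplicity)
n_virtual = count_virtual(proteins_by_multiplicity)

print("different proteins:", n_different)
print("virtual proteins:  ", n_virtual)
```

Under this assumption the two printed totals should agree with the last two rows of the table, and the virtual total should also equal the sum of the per-location counts.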
Figure 2. A flowchart showing the prediction process of Euk-mPLoc 2.0.
Figure 3. Semi-screenshots showing the prediction steps.
(a) The top page of the Euk-mPLoc 2.0 web server at http://www.csbio.sjtu.edu.cn/bioinf/euk-multi-2/. (b) The input of a query protein in FASTA format. (c) The output predicted by Euk-mPLoc 2.0 for the query protein 1 in the Example window. (d) The output for the query protein 2 in the Example window.
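Since the server expects the query in FASTA format, it can be useful to sanity-check a sequence locally before pasting it into the input box. The sketch below is a minimal, generic FASTA check and is not part of Euk-mPLoc 2.0; the function names, the restriction to the 20 standard amino acid letters, and the example record are all assumptions made for illustration.

```python
# The 20 standard amino acid one-letter codes (assumption: the query
# should contain only these; the server's actual rules are not stated here).
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta_record(text):
    """Split a single-record FASTA string into (header, sequence)."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("a FASTA record must start with a '>' header line")
    header = lines[0][1:].strip()
    sequence = "".join(lines[1:]).upper()
    if not sequence:
        raise ValueError("the FASTA record contains no sequence")
    return header, sequence


def nonstandard_residues(sequence):
    """Return the sorted list of characters outside the standard alphabet."""
    return sorted(set(sequence) - STANDARD_RESIDUES)


# Hypothetical query, for illustration only.
example = ">query protein (hypothetical)\nMKTAYIAKQR\nQISFVKSHFS\n"
header, sequence = parse_fasta_record(example)
print(header, len(sequence), nonstandard_residues(sequence))
```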
A comparison of Euk-mPLoc 2.0 with Euk-mPLoc in the jackknife cross-validation test on the benchmark dataset covering 22 location sites, where none of the eukaryotic proteins included has ≥25% pairwise sequence identity to any other in the same location.
| Subcellular location site | Euk-mPLoc (jackknife success rate) | Euk-mPLoc 2.0 (jackknife success rate) |
|---|---|---|
| Acrosome | 0/14 = 0.00% | 1/14 = 7.14% |
| Cell membrane | 262/697 = 37.59% | 452/697 = 64.85% |
| Cell wall | 4/49 = 8.16% | 6/49 = 12.24% |
| Centrosome | 9/96 = 9.38% | 22/96 = 22.92% |
| Chloroplast | 117/385 = 30.39% | 318/385 = 82.60% |
| Cyanelle | 12/79 = 15.19% | 47/79 = 59.49% |
| Cytoplasm | 918/2186 = 41.99% | 1418/2186 = 64.87% |
| Cytoskeleton | 4/139 = 2.88% | 44/139 = 31.65% |
| Endoplasmic reticulum | 115/457 = 25.16% | 348/457 = 76.15% |
| Endosome | 1/41 = 2.44% | 2/41 = 4.88% |
| Extracell | 678/1048 = 64.69% | 858/1048 = 81.87% |
| Golgi apparatus | 5/254 = 1.97% | 56/254 = 22.05% |
| Hydrogenosome | 0/10 = 0.00% | 2/10 = 20.00% |
| Lysosome | 5/57 = 8.77% | 26/57 = 45.61% |
| Melanosome | 0/47 = 0.00% | 0/47 = 0.00% |
| Microsome | 0/13 = 0.00% | 1/13 = 7.69% |
| Mitochondrion | 143/610 = 23.44% | 427/610 = 70.00% |
| Nucleus | 1212/2320 = 52.24% | 1501/2320 = 64.70% |
| Peroxisome | 1/110 = 0.91% | 56/110 = 50.91% |
| Spindle pole body | 0/68 = 0.00% | 23/68 = 33.82% |
| Synapse | 0/47 = 0.00% | 0/47 = 0.00% |
| Vacuole | 7/170 = 4.12% | 101/170 = 59.41% |
Note that in order to make the comparison under exactly the same condition, only the sequences of proteins in the Online but not their accession numbers were used as inputs during the prediction.
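The per-location counts in the comparison table can be pooled into an overall success rate for each predictor. The Python sketch below does this by dividing total correct predictions by total virtual proteins; treating the overall rate as this pooled ratio is an assumption, since the exact definition used for the overall jackknife success rate is not reproduced in this section. The names `ROWS` and `overall_rate` are illustrative.

```python
# (location, number of proteins, correct by Euk-mPLoc, correct by Euk-mPLoc 2.0)
# Counts are copied from the comparison table above.
ROWS = [
    ("Acrosome", 14, 0, 1),
    ("Cell membrane", 697, 262, 452),
    ("Cell wall", 49, 4, 6),
    ("Centrosome", 96, 9, 22),
    ("Chloroplast", 385, 117, 318),
    ("Cyanelle", 79, 12, 47),
    ("Cytoplasm", 2186, 918, 1418),
    ("Cytoskeleton", 139, 4, 44),
    ("Endoplasmic reticulum", 457, 115, 348),
    ("Endosome", 41, 1, 2),
    ("Extracell", 1048, 678, 858),
    ("Golgi apparatus", 254, 5, 56),
    ("Hydrogenosome", 10, 0, 2),
    ("Lysosome", 57, 5, 26),
    ("Melanosome", 47, 0, 0),
    ("Microsome", 13, 0, 1),
    ("Mitochondrion", 610, 143, 427),
    ("Nucleus", 2320, 1212, 1501),
    ("Peroxisome", 110, 1, 56),
    ("Spindle pole body", 68, 0, 23),
    ("Synapse", 47, 0, 0),
    ("Vacuole", 170, 7, 101),
]


def overall_rate(rows, hits_column):
    """Pool hits and totals over all locations; return (hits, total, percent)."""
    hits = sum(row[hits_column] for row in rows)
    total = sum(row[1] for row in rows)
    return hits, total, 100.0 * hits / total


old_hits, total, old_pct = overall_rate(ROWS, 2)
new_hits, _, new_pct = overall_rate(ROWS, 3)

print(f"Euk-mPLoc:     {old_hits}/{total} = {old_pct:.2f}%")
print(f"Euk-mPLoc 2.0: {new_hits}/{total} = {new_pct:.2f}%")
print(f"Difference:    {new_pct - old_pct:.2f} percentage points")
```

With the counts as tabulated, the pooled rates come out at roughly 39% and 64%, which is consistent with the gain of more than 24% quoted in the abstract.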