Kuo-Chen Chou, Hong-Bin Shen.
Abstract
One of the fundamental goals in proteomics and cell biology is to identify the functions of proteins in various cellular organelles and pathways. Information about the subcellular locations of proteins can provide useful insights for revealing their functions and understanding how they interact with each other in cellular network systems. Most existing methods for predicting plant protein subcellular localization cover only three or four location sites, and none of them can handle multiplex plant proteins, which may simultaneously exist at, or move between, two or more different location sites. Such multiplex proteins may have special biological functions worthy of particular notice. The present study was devoted to improving existing plant protein subcellular location predictors in these two respects. A new predictor called "Plant-mPLoc" has been developed by integrating gene ontology information, functional domain information, and sequential evolutionary information through three different modes of pseudo amino acid composition. It can identify plant proteins among the following 12 location sites: (1) cell membrane, (2) cell wall, (3) chloroplast, (4) cytoplasm, (5) endoplasmic reticulum, (6) extracellular, (7) Golgi apparatus, (8) mitochondrion, (9) nucleus, (10) peroxisome, (11) plastid, and (12) vacuole. Compared with existing methods for predicting plant protein subcellular localization, the new predictor is much more powerful and flexible. In particular, it can also deal with multiple-location proteins, which is beyond the reach of any existing predictor specialized for plant protein subcellular localization. Plant-mPLoc is freely accessible as a user-friendly web server at http://www.csbio.sjtu.edu.cn/bioinf/plant-multi/.
Moreover, for the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the web server to get the desired results. It is anticipated that the Plant-mPLoc predictor presented in this paper will become a very useful tool in plant science as well as in related areas.
Year: 2010 PMID: 20596258 PMCID: PMC2893129 DOI: 10.1371/journal.pone.0011335
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Schematic illustration of the 12 subcellular locations of plant proteins.
The 12 location sites are: (1) cell membrane, (2) cell wall, (3) chloroplast, (4) cytoplasm, (5) endoplasmic reticulum, (6) extracellular, (7) Golgi apparatus, (8) mitochondrion, (9) nucleus, (10) peroxisome, (11) plastid, and (12) vacuole.
Breakdown of the plant protein benchmark dataset derived from the Swiss-Prot database (release 55.3) according to the procedures described in the Materials section.
| Subset | Subcellular location | Number of proteins |
|---|---|---|
| 1 | Cell membrane | 56 |
| 2 | Cell wall | 32 |
| 3 | Chloroplast | 286 |
| 4 | Cytoplasm | 182 |
| 5 | Endoplasmic reticulum | 42 |
| 6 | Extracellular | 22 |
| 7 | Golgi apparatus | 21 |
| 8 | Mitochondrion | 150 |
| 9 | Nucleus | 152 |
| 10 | Peroxisome | 21 |
| 11 | Plastid | 39 |
| 12 | Vacuole | 52 |
| | Total number of locative proteins | 1,055 |
| | Total number of different proteins | 978 |
None of the proteins included here has ≥25% sequence identity to any other protein in the same subcellular location.
The benchmark dataset covers 12 plant subcellular locations; the “Golgi apparatus” is newly added compared with the dataset in [13], which covered 11 location sites.
See Eqs. 2–3 for the definition of the number of locative proteins and its relation to the number of different proteins.
Of the 978 different proteins, 904 have one subcellular location, 71 have two locations, 3 have three locations, and none have four or more locations.
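The two totals above can be reconciled directly from these counts. A minimal sketch, assuming (as the numbers suggest) that a protein with m locations is counted once in each of its m location subsets, so the number of locative proteins is Σ m·N(m) while the number of different proteins is Σ N(m); the exact notation of Eqs. 2–3 may differ, and the names `proteins_by_multiplicity`, `per_location`, `different_proteins`, and `locative_proteins` are illustrative only.

```python
# Counts taken from the benchmark breakdown above.
# Keys: number of locations a protein has (m); values: number of proteins N(m).
proteins_by_multiplicity = {1: 904, 2: 71, 3: 3}

# Number of proteins listed under each of the 12 location subsets.
per_location = {
    "Cell membrane": 56,
    "Cell wall": 32,
    "Chloroplast": 286,
    "Cytoplasm": 182,
    "Endoplasmic reticulum": 42,
    "Extracellular": 22,
    "Golgi apparatus": 21,
    "Mitochondrion": 150,
    "Nucleus": 152,
    "Peroxisome": 21,
    "Plastid": 39,
    "Vacuole": 52,
}


def different_proteins(by_multiplicity: dict[int, int]) -> int:
    """Each distinct protein is counted once, however many locations it has."""
    return sum(by_multiplicity.values())


def locative_proteins(by_multiplicity: dict[int, int]) -> int:
    """A protein with m locations is counted m times (once per location subset)."""
    return sum(m * n for m, n in by_multiplicity.items())


print(different_proteins(proteins_by_multiplicity))  # 978
print(locative_proteins(proteins_by_multiplicity))   # 1055
print(sum(per_location.values()))                    # 1055
```

Both routes arrive at 1,055 (904 + 2×71 + 3×3 from the multiplicity breakdown, and the column sum of the table), which is what makes the 978 different proteins and the 1,055 locative proteins mutually consistent.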
Figure 2. A flowchart showing the prediction process of Plant-mPLoc.
Figure 3. Semi-screenshot showing the prediction steps.
(a) The top page of the Plant-mPLoc web server at http://www.csbio.sjtu.edu.cn/bioinf/plant-multi/, (b) input of a query protein in FASTA format, (c) the output predicted by Plant-mPLoc for query protein 1 in the Example window, and (d) the output for query protein 2 in the Example window.
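Step (b) expects the query protein in FASTA format: one header line beginning with `>`, followed by the amino acid sequence. A minimal sketch for preparing such a record before pasting it into the input box, assuming a line width of 60 residues (a common convention; whether the server requires any particular width is not stated here). The function name `to_fasta` and the example header and sequence are hypothetical.

```python
import textwrap


def to_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Format one protein sequence as a FASTA record.

    Whitespace (including newlines) inside `sequence` is removed and the
    residues are upper-cased, then wrapped to `width` characters per line.
    In this sketch, anything other than letters (e.g. '*' or digits) and an
    empty sequence are rejected.
    """
    residues = "".join(sequence.split()).upper()
    if not residues.isalpha():
        raise ValueError("sequence must contain only amino acid letters")
    lines = textwrap.wrap(residues, width)
    return "\n".join([">" + header.strip(), *lines]) + "\n"


# Hypothetical query: the header and sequence are made up for illustration.
record = to_fasta("query_protein_1", "mastlv kafe\nlnrkq", width=8)
print(record, end="")
# >query_protein_1
# MASTLVKA
# FELNRKQ
```

Normalizing whitespace and case locally keeps sequences copied from other tools (which often contain spaces or line breaks) in a single, predictable form before submission.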
Comparison of the jackknife success rates of Plant-PLoc [13] and the current Plant-mPLoc on the benchmark dataset (cf. the dataset breakdown given earlier), which covers 12 location sites of plant proteins and in which none of the included proteins has ≥25% pairwise sequence identity to any other protein in the same location.
| Subcellular location | Success rate: Plant-PLoc | Success rate: Plant-mPLoc |
|---|---|---|
| Cell membrane | 15/56 = 26.8% | 24/56 = 42.9% |
| Cell wall | 7/32 = 21.9% | 8/32 = 25.0% |
| Chloroplast | 184/286 = 64.3% | 248/286 = 86.7% |
| Cytoplasm | 51/182 = 28.0% | 72/182 = 39.6% |
| Endoplasmic reticulum | 1/42 = 2.4% | 17/42 = 40.5% |
| Extracellular | 4/22 = 18.2% | 3/22 = 13.6% |
| Golgi apparatus | 6/21 = 28.6% | 6/21 = 28.6% |
| Mitochondrion | 26/150 = 17.3% | 114/150 = 76.0% |
| Nucleus | 92/152 = 60.5% | 136/152 = 89.5% |
| Peroxisome | 2/21 = 9.5% | 14/21 = 66.7% |
| Plastid | 9/39 = 23.1% | 4/39 = 10.3% |
| Vacuole | 4/52 = 7.7% | 26/52 = 50.0% |
| Total | 401/1055 = 38.0% | 672/1055 = 63.7% |
Note that, to make the comparison under exactly the same conditions, only the sequences of the proteins in the benchmark dataset, not their accession numbers, were used as inputs during prediction.
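Each entry in the comparison table is the number of correctly predicted locative proteins in a subset divided by the size of that subset, and the Total row pools the counts over all 12 subsets rather than averaging the per-location percentages. A minimal sketch that recomputes the table from its raw counts; the dictionary layout and the helper names `success_rate` and `pooled_rate` are illustrative, and the rule by which a multiplex protein is judged correctly predicted is not reproduced here.

```python
# (correct, total) pairs copied from the comparison table above:
# the first pair is Plant-PLoc, the second is Plant-mPLoc.
results = {
    "Cell membrane": ((15, 56), (24, 56)),
    "Cell wall": ((7, 32), (8, 32)),
    "Chloroplast": ((184, 286), (248, 286)),
    "Cytoplasm": ((51, 182), (72, 182)),
    "Endoplasmic reticulum": ((1, 42), (17, 42)),
    "Extracellular": ((4, 22), (3, 22)),
    "Golgi apparatus": ((6, 21), (6, 21)),
    "Mitochondrion": ((26, 150), (114, 150)),
    "Nucleus": ((92, 152), (136, 152)),
    "Peroxisome": ((2, 21), (14, 21)),
    "Plastid": ((9, 39), (4, 39)),
    "Vacuole": ((4, 52), (26, 52)),
}


def success_rate(correct: int, total: int) -> float:
    """Percentage of locative proteins in a subset that were predicted correctly."""
    return 100.0 * correct / total


def pooled_rate(pairs: list[tuple[int, int]]) -> tuple[int, int, float]:
    """Overall rate: sum correct and total counts over all subsets, then divide once."""
    correct = sum(c for c, _ in pairs)
    total = sum(t for _, t in pairs)
    return correct, total, success_rate(correct, total)


# Per-location rates, formatted like the table.
for name, (old, new) in results.items():
    print(f"{name:<22} {success_rate(*old):5.1f}%  {success_rate(*new):5.1f}%")

# Total row: pooled over all 12 subsets.
for label, index in (("Plant-PLoc", 0), ("Plant-mPLoc", 1)):
    c, t, r = pooled_rate([pair[index] for pair in results.values()])
    print(f"{label}: {c}/{t} = {r:.1f}%")
# Plant-PLoc: 401/1055 = 38.0%
# Plant-mPLoc: 672/1055 = 63.7%
```

Pooling weights each subset by its size, so the large chloroplast, nucleus, and mitochondrion subsets dominate the Total row; the unweighted mean of Plant-PLoc's 12 per-location rates would be roughly 25.7% rather than 38.0%.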