Xuan Xiao, Zhi-Cheng Wu, Kuo-Chen Chou.
Abstract
Prediction of protein subcellular localization is a challenging problem, particularly when the system concerned contains both singleplex and multiplex proteins. In this paper, by introducing the "multi-label scale" and hybridizing the information of gene ontology with the sequential evolution information, a novel predictor called iLoc-Gneg is developed for predicting the subcellular localization of Gram-negative bacterial proteins with both single-location and multiple-location sites. To facilitate comparison, the same stringent benchmark dataset used to estimate the accuracy of Gneg-mPLoc was adopted to demonstrate the power of iLoc-Gneg. The dataset contains 1,392 Gram-negative bacterial proteins classified into the following eight locations: (1) cytoplasm, (2) extracellular, (3) fimbrium, (4) flagellum, (5) inner membrane, (6) nucleoid, (7) outer membrane, and (8) periplasm. Of the 1,392 proteins, 1,328 each have only one subcellular location and the other 64 each have two subcellular locations, but none of the proteins included has ≥25% pairwise sequence identity to any other in the same subset (subcellular location). The overall success rate of iLoc-Gneg by jackknife test on this stringent benchmark dataset was over 91%, which is about 6% higher than that of Gneg-mPLoc. As a user-friendly web-server, iLoc-Gneg is freely accessible to the public at http://icpr.jci.edu.cn/bioinfo/iLoc-Gneg. A step-by-step guide is also provided on how to use the web-server to get the desired results. Furthermore, for the user's convenience, the iLoc-Gneg web-server accepts batch job submission, which is not available in the existing version of the Gneg-mPLoc web-server. It is anticipated that iLoc-Gneg may become a useful high throughput tool for Molecular Cell Biology, Proteomics, System Biology, and Drug Development.
Year: 2011 PMID: 21698097 PMCID: PMC3117797 DOI: 10.1371/journal.pone.0020592
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Illustration to show the 8 subcellular locations of Gram-negative bacterial proteins.
The 8 locations are: (1) cytoplasm, (2) extracellular, (3) fimbrium, (4) flagellum, (5) inner membrane, (6) nucleoid, (7) outer membrane, and (8) periplasm. Note that in prokaryotic life forms, the nucleoid region is the part of the cell that contains the DNA molecule; unlike the true nucleus of eukaryotes, it is not delimited by a membrane.
Table 1. Breakdown of the Gram-negative bacterial protein benchmark dataset taken from [11].
| Subcellular location (subset) | Number of proteins |
| --- | --- |
| Cell inner membrane | 557 |
| Cell outer membrane | 124 |
| Cytoplasm | 410 |
| Extracellular | 133 |
| Fimbrium | 32 |
| Flagellum | 12 |
| Nucleoid | 8 |
| Periplasm | 180 |
| Total number of locative proteins | 1,456 |
| Total number of different proteins | 1,392 |
None of the proteins included here has ≥25% sequence identity to any other in the same subcellular location.
See Eqs. 36–38 of [2] for the definition of the number of locative proteins and its relation to the number of different proteins.
Of the 1,392 different proteins, 1,328 have one subcellular location, 64 have two locations, and none have three or more locations.
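The two totals in the table can be reconciled with simple counting: a protein with two locations is counted once as a different protein but twice as a locative protein. The following minimal sketch only illustrates that arithmetic using the numbers quoted above; it is not the formal definition given in [2], and the variable and function names are hypothetical.

```python
# Number of proteins grouped by how many subcellular locations each one has
# (values quoted in the footnote above).
proteins_by_location_count = {1: 1328, 2: 64}

# Per-location subset sizes from the benchmark dataset table.
subset_sizes = {
    "Cell inner membrane": 557,
    "Cell outer membrane": 124,
    "Cytoplasm": 410,
    "Extracellular": 133,
    "Fimbrium": 32,
    "Flagellum": 12,
    "Nucleoid": 8,
    "Periplasm": 180,
}


def count_different(groups):
    """Each protein is counted once, regardless of its number of locations."""
    return sum(groups.values())


def count_locative(groups):
    """Each protein is counted once per location it occupies."""
    return sum(n_locations * n_proteins for n_locations, n_proteins in groups.items())


print(count_different(proteins_by_location_count))  # 1392
print(count_locative(proteins_by_location_count))   # 1456
print(sum(subset_sizes.values()))                   # 1456
```

The locative total obtained from the location counts matches the sum of the eight subset sizes, which is why the subsets add up to 1,456 rather than 1,392.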
Table 2. A comparison of the jackknife success rates by Gneg-mPLoc [11] and the current iLoc-Gneg on the benchmark dataset (cf. Table 1) that covers 8 location sites of Gram-negative bacterial proteins, in which none of the proteins included has ≥25% pairwise sequence identity to any other in the same location.
| Code | Subcellular location | Jackknife success rate, Gneg-mPLoc | Jackknife success rate, iLoc-Gneg |
| --- | --- | --- | --- |
| 1 | Cell inner membrane | 525/557 = 94.3% | 539/557 = 96.8% |
| 2 | Cell outer membrane | 105/124 = 84.7% | 103/124 = 83.1% |
| 3 | Cytoplasm | 357/410 = 87.1% | 367/410 = 89.5% |
| 4 | Extracellular | 79/133 = 59.4% | 115/133 = 86.5% |
| 5 | Fimbrium | 28/32 = 87.5% | 30/32 = 93.8% |
| 6 | Flagellum | 0/12 = 0.0% | 12/12 = 100% |
| 7 | Nucleoid | 0/8 = 0.0% | 4/8 = 50% |
| 8 | Periplasm | 154/180 = 85.6% | 161/180 = 89.4% |
| | Overall | 1248/1456 = 85.7% | 1331/1456 = 91.4% |
Gneg-mPLoc: the predictor from [11].
iLoc-Gneg: the predictor proposed in this paper.
Note that instead of 1,392 (the total number of different Gram-negative bacterial proteins), here we use 1,456 (the total number of locative proteins) for the denominator. This is because some of the Gram-negative bacterial proteins in the benchmark dataset may have more than one location site. See footnotes a and b of Table 1 for further explanation.
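As a check on the overall row, the per-location counts can be summed and divided by the number of locative proteins. The sketch below is an illustration built only from the counts listed in the comparison table; the data layout and the function name `overall_rate` are hypothetical and not part of the original work.

```python
# Per-location jackknife results copied from the comparison table:
# (subset size, correct by Gneg-mPLoc, correct by iLoc-Gneg).
results = {
    "Cell inner membrane": (557, 525, 539),
    "Cell outer membrane": (124, 105, 103),
    "Cytoplasm": (410, 357, 367),
    "Extracellular": (133, 79, 115),
    "Fimbrium": (32, 28, 30),
    "Flagellum": (12, 0, 12),
    "Nucleoid": (8, 0, 4),
    "Periplasm": (180, 154, 161),
}

# Column index of each predictor inside the tuples above.
PREDICTOR_COLUMN = {"Gneg-mPLoc": 1, "iLoc-Gneg": 2}


def overall_rate(table, predictor):
    """Return (correct, total, percent) summed over all locations."""
    column = PREDICTOR_COLUMN[predictor]
    correct = sum(row[column] for row in table.values())
    total = sum(row[0] for row in table.values())  # locative proteins
    return correct, total, 100.0 * correct / total


for name in PREDICTOR_COLUMN:
    correct, total, percent = overall_rate(results, name)
    print(f"{name}: {correct}/{total} = {percent:.1f}%")
# Gneg-mPLoc: 1248/1456 = 85.7%
# iLoc-Gneg: 1331/1456 = 91.4%
```

The difference of roughly 5.7 percentage points is consistent with the "about 6% higher" figure quoted in the abstract.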
Figure 2. A flowchart to show the prediction process of iLoc-Gneg.
Figure 3. A semi-screenshot to show the top page of the iLoc-Gneg web-server.
Its website address is at http://icpr.jci.edu.cn/bioinfo/iLoc-Gneg.
Figure 4. A semi-screenshot to show the output of iLoc-Gneg.
The input was taken from the three protein sequences listed in the Example window of the iLoc-Gneg web-server (cf. Fig. 3).