Jhih-Rong Lin, Ananda Mohan Mondal, Rong Liu, Jianjun Hu.
Abstract
BACKGROUND: Computational prediction of protein subcellular localization can greatly help to elucidate protein function. Despite the existence of dozens of protein localization prediction algorithms, prediction accuracy and coverage remain low. Several ensemble algorithms have been proposed to improve prediction performance; these typically combine 10 or more individual localization algorithms. However, their performance is still limited by high running complexity and by redundancy among the individual prediction algorithms.
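The ensembles evaluated in the tables and figures below combine the outputs of individual predictors with logistic regression (LR). One common way to do this, assumed here rather than taken from the paper, is stacking: encode each predictor's predicted location as a one-hot vector, concatenate the vectors, and fit a multinomial (softmax) LR on the result. The sketch uses plain Python with full-batch gradient descent; the function names, learning rate, epoch count, and toy data are all illustrative choices.

```python
import math
from typing import List, Sequence

def one_hot_features(pred_labels: Sequence[str], classes: Sequence[str]) -> List[float]:
    """Concatenate one one-hot block per predictor (its predicted location)."""
    feats: List[float] = []
    for label in pred_labels:
        feats.extend(1.0 if label == c else 0.0 for c in classes)
    return feats

def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(W: List[List[float]], x: List[float]) -> List[float]:
    xb = x + [1.0]  # constant 1 so the last weight of each row acts as a bias
    return [sum(w * v for w, v in zip(row, xb)) for row in W]

def train_softmax_lr(X, y, n_classes, lr=0.5, epochs=1000):
    """Fit multinomial LR by full-batch gradient descent on cross-entropy."""
    n = len(X[0]) + 1
    W = [[0.0] * n for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * n for _ in range(n_classes)]
        for x, t in zip(X, y):
            probs = softmax(logits(W, x))
            xb = x + [1.0]
            for c in range(n_classes):
                err = probs[c] - (1.0 if c == t else 0.0)
                for j in range(n):
                    grad[c][j] += err * xb[j]
        for c in range(n_classes):
            for j in range(n):
                W[c][j] -= lr * grad[c][j] / len(X)
    return W

def predict(W, x) -> int:
    z = logits(W, x)
    return max(range(len(z)), key=z.__getitem__)

# Hypothetical toy data: predictor 1 is always right, predictor 2 is noisy.
classes = ["cytosol", "mitochondrion", "nucleus"]
truth = ["cytosol", "mitochondrion", "nucleus", "cytosol", "mitochondrion", "nucleus"]
pred2 = ["mitochondrion", "mitochondrion", "cytosol", "nucleus", "cytosol", "nucleus"]
X = [one_hot_features([p1, p2], classes) for p1, p2 in zip(truth, pred2)]
y = [classes.index(t) for t in truth]
W = train_softmax_lr(X, y, len(classes))
ensemble = [classes[predict(W, x)] for x in X]
```

On this toy data the fitted ensemble reproduces the reliable predictor's labels, because LR learns to weight that predictor's one-hot block more heavily than the noisy one.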
Year: 2012 PMID: 22759391 PMCID: PMC3426488 DOI: 10.1186/1471-2105-13-157
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Features used in localization prediction algorithms
| NetLoc | | | | | | X |
| YLoc | X | X | X | X | | |
| MultiLoc2 | X | X | X | | X | |
| KnowPred | | | | X | | |
| Subcell | | X | | | | |
| WoLFPSORT | X | X | X | | | |
| BaCelLo | | X | | | X | |
| CELLO | | X | | | | |
| SubLoc | X | | | | | |
The distributions of proteins in different locations for the test datasets
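| Dataset | Cytosol | Mitochondrion | Nucleus | Secretory | Total |
|---|---|---|---|---|---|
| Yeast-LowRes | 498 | 175 | 234 | 315 | 1222 |
| Human | 361 | 327 | 159 | 458 | 1305 |

| Dataset | Cytosol | Mitochondrion | Nucleus | ER | Vacuole | Golgi | Cell Periphery | Total |
|---|---|---|---|---|---|---|---|---|
| Yeast-HighRes | 530 | 165 | 233 | 149 | 103 | 33 | 34 | 1247 |
| Overlap¹ | 451 | 133 | 218 | 132 | 90 | 32 | 0 | 1056 |

¹ Overlap of the Yeast Low-Res and Yeast High-Res datasets.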
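As a quick consistency check on the protein-distribution table above, each row's per-location counts sum to its last value, the dataset total. The share of the largest class is also the overall accuracy that a trivial predictor always outputting that one location would reach, which gives a simple reference point for the overall-accuracy rows below. The dictionary just transcribes the table's numbers; no column names are assumed, and the function name is hypothetical.

```python
from typing import Dict, List

# Counts transcribed from the table above; the last entry of each row is the total.
ROWS: Dict[str, List[int]] = {
    "Yeast-LowRes": [498, 175, 234, 315, 1222],
    "Human": [361, 327, 159, 458, 1305],
    "Yeast-HighRes": [530, 165, 233, 149, 103, 33, 34, 1247],
    "Overlap": [451, 133, 218, 132, 90, 32, 0, 1056],
}

def majority_baseline(rows: Dict[str, List[int]]) -> Dict[str, float]:
    """Check row totals and return the largest-class share for each dataset."""
    shares = {}
    for name, counts in rows.items():
        *per_location, total = counts
        if sum(per_location) != total:
            raise ValueError(f"{name}: counts do not sum to {total}")
        shares[name] = max(per_location) / total
    return shares

for name, share in majority_baseline(ROWS).items():
    print(f"{name}: largest-class share {share:.3f}")
```

This prints shares of 0.408 (Yeast-LowRes), 0.351 (Human), 0.425 (Yeast-HighRes), and 0.427 (Overlap).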
Prediction performance of individual predictors on the Yeast Low-Res dataset (MCC score per location; the last row gives overall accuracy)
| Cytosol | 0.146 | 0.270 | 0.268 | 0.286 | 0.134 | 0.265 | 0.164 | 0.261 | 0.184 | 0.429 | 0.504 |
| Mitochondrion | 0.556 | 0.350 | 0.581 | 0.415 | 0.243 | 0.549 | 0.526 | 0.547 | 0.354 | 0.668 | 0.666 |
| Nucleus | 0.367 | 0.484 | 0.420 | 0.345 | 0.181 | 0.312 | 0.291 | 0.302 | 0.260 | 0.476 | 0.550 |
| Secretory | 0.314 | 0.473 | 0.339 | 0.534 | 0.326 | 0.568 | 0.339 | 0.534 | 0.391 | 0.607 | 0.664 |
| Overall Accuracy | 0.453 | 0.556 | 0.558 | 0.51 | 0.399 | 0.484 | 0.468 | 0.493 | 0.439 | 0.668 | 0.707 |
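The per-location scores in these tables are Matthews correlation coefficients (MCC), MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below assumes each location is scored one-vs-rest (that location as the positive class, all others negative), the usual convention for multi-class localization; the labels in the example are hypothetical.

```python
import math
from typing import Sequence

def mcc_one_vs_rest(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """MCC for one location, treating `label` as positive and everything else as negative."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == label and p == label:
            tp += 1
        elif t != label and p == label:
            fp += 1
        elif t == label and p != label:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: report 0 when any marginal count is zero (MCC undefined).
    return (tp * tn - fp * fn) / denom if denom else 0.0

def overall_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of proteins whose predicted location matches the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = ["cyto", "mito", "nuc", "sec", "cyto", "mito"]
y_pred = ["cyto", "mito", "cyto", "sec", "cyto", "nuc"]
print(f"{mcc_one_vs_rest(y_true, y_pred, 'cyto'):.3f}")  # 0.707
print(f"{overall_accuracy(y_true, y_pred):.3f}")         # 0.667
```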
Prediction performance of individual predictors on the Yeast High-Res dataset (MCC score per location; the last row gives overall accuracy)
| Cytosol | 0.441 | 0.293 | 0.146 | 0.251 | 0.255 | 0.247 | 0.459 | 0.555 |
| Mitochondrion | 0.689 | 0.496 | 0.251 | 0.510 | 0.501 | 0.318 | 0.684 | 0.713 |
| Nucleus | 0.405 | 0.275 | 0.181 | 0.311 | 0.306 | 0.434 | 0.351 | 0.473 |
| ER | 0.207 | 0.203 | 0.022 | 0.059 | 0.000 | 0.340 | 0.431 | 0.463 |
| Vacuole | 0.115 | 0.045 | 0.034 | 0.000 | 0.061 | 0.189 | 0.174 | 0.191 |
| Golgi | 0.008 | 0.010 | 0.054 | 0.118 | −0.005 | 0.465 | 0.038 | 0.275 |
| Cell Periphery | 0.107 | 0.044 | 0.068 | 0.142 | 0.090 | 0.449 | 0.04 | 0.269 |
| Overall Accuracy | 0.506 | 0.473 | 0.300 | 0.362 | 0.354 | 0.523 | 0.585 | 0.640 |
Prediction performance of individual predictors on the Human dataset (MCC score per location; the last row gives overall accuracy)
| Cytosol | 0.308 | 0.334 | 0.307 | 0.050 | 0.261 | 0.220 | 0.117 | 0.065 | 0.362 |
| Mitochondrion | 0.546 | 0.451 | 0.048 | 0.080 | 0.329 | 0.439 | 0.369 | 0.264 | 0.515 |
| Nucleus | 0.454 | 0.293 | 0.419 | 0.122 | 0.277 | 0.233 | 0.234 | 0.162 | 0.375 |
| Secretory | 0.720 | 0.627 | 0.477 | 0.205 | 0.553 | 0.607 | 0.428 | 0.339 | 0.712 |
| Overall Accuracy | 0.628 | 0.581 | 0.514 | 0.303 | 0.527 | 0.54 | 0.419 | 0.375 | 0.646 |
Figure 1 Prediction performance of the logistic regression ensemble method with K individual predictors selected by exhaustive search. (a) Yeast Low-Res dataset; (b) Human dataset. Each dot represents one combination of predictors. The x-axis shows the number of predictors K, and the y-axis shows the performance of the logistic regression ensemble. The dots connected by the line represent the predictor combinations chosen by the minimalist algorithm for each value of K.
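Exhaustive search scores every combination of K predictors and keeps the best one. With 9 predictors there are only 2^9 − 1 = 511 non-empty subsets in total, so this is cheap here, but the count grows exponentially with the number of predictors. In the sketch below the scoring function is a hypothetical stand-in for the cross-validated performance of an LR ensemble trained on that subset.

```python
from itertools import combinations
from typing import Callable, Optional, Sequence, Tuple

Subset = Tuple[str, ...]

def exhaustive_best_subset(
    predictors: Sequence[str],
    k: int,
    score: Callable[[Subset], float],
) -> Tuple[Optional[Subset], float]:
    """Evaluate all k-combinations of `predictors` and return the best one."""
    best: Optional[Subset] = None
    best_score = float("-inf")
    for subset in combinations(predictors, k):
        s = score(subset)
        if s > best_score:  # strict '>' keeps the first subset on ties
            best, best_score = subset, s
    return best, best_score

# Hypothetical scorer: a fixed per-predictor weight, summed over the subset.
weights = {"A": 0.5, "B": 0.4, "C": 0.3, "D": 0.1}
best, value = exhaustive_best_subset(list(weights), 2, lambda s: sum(weights[p] for p in s))
print(best, round(value, 2))  # ('A', 'B') 0.9
```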
Figure 2 Contribution scores of individual predictors. (a) 9 predictors for the Yeast Low-Res dataset, (b) 8 predictors for the Human dataset.
The predictors most frequently selected by the minimalist algorithm for each subset size K during 10-fold cross-validation (marked M) and the best combination of K predictors according to exhaustive search with the logistic regression ensemble (marked B), on the Yeast dataset. Rows give K; the remaining columns correspond to individual predictors.
| 2 | | BM | | | | BM | | | |
| 3 | B | BM | M | | | BM | | | |
| 4 | B | BM | BM | M | | BM | | | |
| 5 | B | BM | M | BM | | BM | | M | B |
| 6 | BM | BM | M | BM | | BM | | BM | B |
| 7 | BM | BM | M | M | B | BM | BM | BM | B |
| 8 | BM | BM | BM | BM | B | BM | M | BM | BM |
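One plausible reading of how the minimalist algorithm builds its selections for each K, assumed here and not confirmed by this section, is greedy forward selection: start from an empty set and repeatedly add the predictor that most improves the ensemble score, which needs only O(nK) score evaluations instead of scoring every subset. The scorer below is a hypothetical coverage measure (the fraction of toy proteins predicted correctly by at least one member), chosen to show how redundancy steers the choice.

```python
from typing import Callable, List, Sequence, Tuple

Subset = Tuple[str, ...]

def greedy_forward_selection(
    predictors: Sequence[str],
    max_k: int,
    score: Callable[[Subset], float],
) -> List[Tuple[Subset, float]]:
    """Grow a subset one predictor at a time; return (subset, score) for each K."""
    selected: List[str] = []
    remaining = list(predictors)
    path: List[Tuple[Subset, float]] = []
    while remaining and len(selected) < max_k:
        # Pick the candidate whose addition gives the highest score (first wins ties).
        best = max(remaining, key=lambda p: score(tuple(selected + [p])))
        selected.append(best)
        remaining.remove(best)
        path.append((tuple(selected), score(tuple(selected))))
    return path

# Hypothetical toy data: which of 6 proteins each predictor gets right.
correct = {"A": {1, 2, 3, 4}, "B": {1, 2, 3}, "C": {5, 6}, "D": {4}}

def covered_fraction(subset: Subset) -> float:
    return len(set().union(*(correct[p] for p in subset))) / 6

for subset, value in greedy_forward_selection(list(correct), 2, covered_fraction):
    print(subset, round(value, 3))
# ('A',) 0.667
# ('A', 'C') 1.0
```

B is individually the second-best predictor here, but it adds nothing once A is chosen, so the greedy step picks C instead.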
Figure 3 Performance of the best ensemble on the Yeast dataset using different ensemble schemes with K (K = 2..9) predictors selected by exhaustive search. (a) 9 predictors, including NetLoc (PPI); (b) 8 predictors, excluding NetLoc (PPI).
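The ensemble schemes compared in this figure are not detailed in this section. For orientation only, one of the simplest schemes is a plurality vote over the predictors' outputs for a protein, sketched below; it is not necessarily one of the schemes the paper evaluates, and the tie-breaking rule (the label from the earliest predictor wins) is an arbitrary choice.

```python
from collections import Counter
from typing import Sequence

def plurality_vote(labels: Sequence[str]) -> str:
    """Return the most frequent location; ties go to the label listed first."""
    if not labels:
        raise ValueError("need at least one prediction")
    counts = Counter(labels)
    top = max(counts.values())
    # Scan in predictor order so ties resolve to the earliest predictor's label.
    return next(label for label in labels if counts[label] == top)

print(plurality_vote(["nucleus", "cytosol", "nucleus"]))  # nucleus
print(plurality_vote(["mitochondrion", "cytosol"]))       # mitochondrion
```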
Figure 4 Performance of different ensemble schemes on the Yeast Low-Res dataset with K (K = 2..9) predictors selected by the minimalist algorithm and by the Top-K accurate method. (a) Different ensemble methods with K predictors selected by the minimalist algorithm; (b) different ensemble methods with K predictors selected by the Top-K accurate method.
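The Top-K accurate method, as its name indicates, keeps the K predictors with the highest individual accuracy. A minimal sketch follows; the function name and accuracy values are hypothetical. Because each predictor is ranked on its own, this baseline ignores redundancy between predictors, the limitation the abstract raises for large ensembles.

```python
from typing import Dict, List

def top_k_accurate(accuracy: Dict[str, float], k: int) -> List[str]:
    """Return the k predictors with the highest individual accuracy."""
    # sorted() is stable (also with reverse=True), so equal accuracies keep input order.
    return sorted(accuracy, key=accuracy.__getitem__, reverse=True)[:k]

# Hypothetical individual accuracies.
acc = {"A": 0.67, "B": 0.50, "C": 0.33, "D": 0.17}
print(top_k_accurate(acc, 2))  # ['A', 'B']
```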
Performance comparison of ConLoc and the minimalist LR ensemble algorithm with 13 predictors on the Yeast Low-Res dataset
| Cytosol | 0.301 | 0.441 | 0.489 | 0.472 |
| Mitochondrion | 0.574 | 0.622 | 0.708 | 0.731 |
| Nucleus | 0.341 | 0.461 | 0.537 | 0.541 |
| Secretory | 0.533 | 0.537 | 0.608 | 0.605 |
| Overall Accuracy | 0.529 | 0.616 | 0.696 | 0.693 |
Performance comparison of ConLoc and the minimalist LR ensemble algorithm with 13 predictors on the Human dataset
| Cytosol | 0.390 | 0.414 | 0.429 | 0.460 |
| Mitochondrion | 0.613 | 0.628 | 0.641 | 0.645 |
| Nucleus | 0.463 | 0.415 | 0.371 | 0.392 |
| Secretory | 0.754 | 0.721 | 0.749 | 0.758 |
| Overall Accuracy | 0.644 | 0.664 | 0.689 | 0.703 |