Liqi Li, Yuan Zhang, Lingyun Zou, Changqing Li, Bo Yu, Xiaoqi Zheng, Yue Zhou.
Abstract
With the rapid increase of protein sequences in the post-genomic age, it is challenging to develop accurate, automated methods that predict their subcellular localizations reliably and quickly. Many approaches have been tried so far, but most rely on a single algorithm. In this paper, we propose an ensemble classifier that combines KNN (k-nearest neighbor) and SVM (support vector machine) algorithms through a voting system to predict the subcellular localization of eukaryotic proteins. With the one-versus-one strategy, the overall prediction accuracies are 78.17%, 89.94% and 75.55% on three benchmark datasets of eukaryotic proteins. The improved prediction accuracies indicate that GO annotations and amino acid hydrophobicity help to predict the subcellular locations of eukaryotic proteins.
Year: 2012 PMID: 22303481 PMCID: PMC3268814 DOI: 10.1371/journal.pone.0031057
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Three benchmark datasets used to train and test our predictor.
| iLoc8897 | | Euk7579 | | Hum3681 | |
| Subcellular location | Number of proteins | Subcellular location | Number of proteins | Subcellular location | Number of proteins |
| Acrosome | 14 | Chloroplast | 671 | Centriole | 77 |
| Cell membrane | 697 | Cytoplasm | 1241 | Cytoplasm | 817 |
| Cell wall | 49 | Cytoskeleton | 40 | Cytoskeleton | 79 |
| Centrosome | 96 | Endoplasmic reticulum | 114 | Endosome | 24 |
| Chloroplast | 385 | Extracell | 861 | Endoplasmic reticulum | 229 |
| Cyanelle | 79 | Golgi apparatus | 47 | Extracell | 385 |
| Cytoplasm | 2186 | Lysosomal | 93 | Golgi apparatus | 161 |
| Cytoskeleton | 139 | Mitochondrion | 727 | Lysosome | 77 |
| Endoplasmic reticulum | 457 | Nucleus | 1932 | Microsome | 24 |
| Endosome | 41 | Peroxisomal | 125 | Mitochondrion | 364 |
| Extracell | 1048 | Plasma membrane | 1674 | Nucleus | 1021 |
| Golgi apparatus | 254 | Vacuolar | 54 | Peroxisome | 47 |
| Hydrogenosome | 10 | - | - | Plasma membrane | 354 |
| Lysosome | 57 | - | - | Synapse | 22 |
| Melanosome | 47 | - | - | - | - |
| Microsome | 13 | - | - | - | - |
| Mitochondrion | 610 | - | - | - | - |
| Nucleus | 2320 | - | - | - | - |
| Peroxisome | 110 | - | - | - | - |
| Spindle pole body | 68 | - | - | - | - |
| Synapse | 47 | - | - | - | - |
| Vacuole | 170 | - | - | - | - |
| Total | 8897 | Total | 7579 | Total | 3681 |
Figure 1. This graph shows the contribution scores of the top 45 features on the iLoc8897 dataset.
Figure 2. This graph shows the contribution scores of the top 45 features on the Euk7579 dataset.
Hydrophobicity: 6, 2, 5 … stand for the 6th, 2nd, 5th … elements in the hydrophobicity vectors respectively.
Figure 3. This graph shows the contribution scores of the top 45 features on the Hum3681 dataset.
Figure 4. This graph shows the flow chart for the application of the KNN and LIBSVM algorithms.
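The abstract reports accuracies obtained with a one-versus-one strategy. As a minimal sketch of how such a scheme reaches a multi-class decision, the code below lets every pair of classes cast one vote and returns the class with the most votes. The function names, the toy 1-D centroids and the nearest-centroid rule are assumptions made for illustration only; they stand in for trained binary classifiers and are not the authors' implementation.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_predict(x, classes, pairwise_decide):
    """Return the class that wins the most pairwise contests for sample x.

    pairwise_decide(x, a, b) must return either a or b.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise_decide(x, a, b)] += 1
    # Ties are broken by the order of `classes` (an arbitrary choice here).
    return max(classes, key=lambda c: votes[c])


# Hypothetical stand-in for trained binary classifiers: made-up 1-D centroids.
CENTROIDS = {"Nucleus": 0.0, "Cytoplasm": 5.0, "Extracell": 10.0}


def nearest_centroid(x, a, b):
    """Pick whichever of the two classes has the closer centroid."""
    return a if abs(x - CENTROIDS[a]) <= abs(x - CENTROIDS[b]) else b


prediction = one_vs_one_predict(4.2, list(CENTROIDS), nearest_centroid)
print(prediction)
```

With K classes this scheme evaluates K(K-1)/2 pairwise deciders per sample, which is worth keeping in mind for the 22-location iLoc8897 dataset.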
Prediction performance of different top-N features on the iLoc8897 dataset by LIBSVM.
| | Top10 | Top15 | Top20 | Top25 | Top30 | Top35 | Top40 | Top45 | Top50 |
| | 0.03125 | 0.5 | 0.5 | 0.125 | 0.125 | 0.125 | 0.125 | 0.125 | 0.125 |
| | 512 | 0.03125 | 0.03125 | 2 | 2 | 2 | 2 | 2 | 2 |
| Overall accuracy (%) | 51.14 | 73.08 | 75.12 | 74.18 | 74.40 | 77.46 | 77.65 | 78.01 | 77.98 |
| | - | - | - | - | - | - | - | 5 | - |
| Overall accuracy (%) | - | - | - | - | - | - | - | 74.70 | - |
Performance comparison of eukaryotic protein subcellular location prediction methods on the iLoc8897 dataset.
| Subcellular location | Euk-mPLoc 2.0 (2010) (Chou and Shen 2010) | iLoc-Euk (2011) (Chou et al. 2011) | LIBSVM | | KNN | | The proposed method | |
| | Jackknife | Jackknife | Jackknife | | Jackknife | | Jackknife | |
| | Accuracy (%) | Accuracy (%) | Accuracy (%) | MCC | Accuracy (%) | MCC | Accuracy (%) | MCC |
| Acrosome | 7.14 | 7.14 | 57.14 | 0.8526 | 71.43 | 0.8449 | 64.29 | 0.8659 |
| Cell membrane | 64.85 | 80.49 | 84.52 | 0.9123 | 96.67 | 0.8558 | 85.09 | 0.9121 |
| Cell wall | 12.24 | 16.33 | 91.84 | 0.8750 | 85.71 | 0.8981 | 91.84 | 0.8750 |
| Centrosome | 22.92 | 69.79 | 86.17 | 0.8650 | 92.55 | 0.6513 | 88.30 | 0.8688 |
| Chloroplast | 82.60 | 87.79 | 99.73 | 0.9943 | 99.73 | 0.9873 | 99.73 | 0.9943 |
| Cyanelle | 59.49 | 64.56 | 100.00 | 1.0000 | 98.73 | 1.0000 | 100.00 | 1.0000 |
| Cytoplasm | 64.87 | 76.72 | 45.24 | 0.9399 | 90.34 | 0.8198 | 45.70 | 0.9361 |
| Cytoskeleton | 31.65 | 27.34 | 50.36 | 0.7629 | 6.47 | 0.8318 | 49.64 | 0.7640 |
| Endoplasmic reticulum | 76.15 | 89.06 | 87.72 | 0.9529 | 84.65 | 0.9457 | 87.72 | 0.9542 |
| Endosome | 4.88 | 7.32 | 21.95 | 0.7272 | 19.51 | 0.8163 | 21.95 | 0.7497 |
| Extracell | 81.87 | 90.46 | 91.82 | 0.9812 | 88.64 | 0.9902 | 91.92 | 0.9824 |
| Golgi apparatus | 22.05 | 63.39 | 76.59 | 0.8997 | 46.83 | 0.9633 | 77.38 | 0.9131 |
| Hydrogenosome | 20.00 | 0.00 | 100.00 | 1.0000 | 70.00 | 1.0000 | 100.00 | 1.0000 |
| Lysosome | 45.61 | 31.58 | 87.72 | 0.8813 | 57.89 | 0.9851 | 87.72 | 0.8813 |
| Melanosome | 0.00 | 2.13 | 76.60 | 0.9474 | 14.89 | 1.0000 | 76.60 | 0.9474 |
| Microsome | 7.69 | 0.00 | 69.23 | 0.8579 | 15.38 | 1.0000 | 69.23 | 0.8579 |
| Mitochondrion | 70.00 | 77.05 | 78.03 | 0.9749 | 80.66 | 0.9688 | 78.20 | 0.9750 |
| Nucleus | 64.70 | 87.93 | 93.69 | 0.8865 | 50.65 | 0.9943 | 93.60 | 0.8873 |
| Peroxisome | 50.91 | 54.55 | 100.00 | 0.9650 | 74.55 | 1.0000 | 100.00 | 0.9650 |
| Spindle pole body | 33.82 | 66.18 | 95.59 | 0.9110 | 4.41 | 1.0000 | 95.59 | 0.9181 |
| Synapse | 0.00 | 38.30 | 80.85 | 0.7918 | 25.53 | 0.8399 | 80.85 | 0.7918 |
| Vacuole | 59.41 | 71.76 | 95.88 | 0.9399 | 80.59 | 0.9819 | 93.53 | 0.9606 |
| Overall accuracy | 64.17 | 79.06 | 78.01 | - | 74.70 | - | 78.17 | - |
| | - | 71.27 | 75.54 | - | 72.84 | - | 75.64 | - |
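The table reports a per-location accuracy and an MCC for each classifier. The source does not give its formulas, so the sketch below assumes the standard definitions: per-location accuracy as the share of proteins in that location that are predicted correctly, and the Matthews correlation coefficient computed one-versus-rest from the confusion counts. The function name and the toy labels are illustrative only.

```python
from math import sqrt


def per_class_metrics(y_true, y_pred, cls):
    """Return (accuracy in %, MCC) for one class, treated one-versus-rest.

    MCC is None when its denominator is zero, for example when the
    class is never predicted.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)
    tn = sum(t != cls and p != cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)

    accuracy = 100.0 * tp / (tp + fn) if (tp + fn) else None
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else None
    return accuracy, mcc


# Toy labels, not taken from the benchmark datasets.
y_true = ["Nucleus", "Nucleus", "Nucleus", "Cytoplasm", "Cytoplasm", "Extracell"]
y_pred = ["Nucleus", "Nucleus", "Cytoplasm", "Cytoplasm", "Cytoplasm", "Extracell"]

acc, mcc = per_class_metrics(y_true, y_pred, "Nucleus")
print(acc, mcc)
```

Returning `None` for an undefined MCC keeps such cases distinguishable from a genuine score of zero.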
Performance comparison of eukaryotic protein subcellular location prediction methods on the Euk7579 dataset.
| Subcellular location | Park et al. (2003) (Park and Kanehisa 2003) | LOCSVMPSI (2005) (Xie et al. 2005) | Complexity-based method (2009) (Zheng et al. 2009) | LIBSVM | KNN | The proposed method | ||||
| Jackknife | 5-Fold cross | 5-Fold cross | Jackknife | Jackknife | Jackknife | Jackknife | ||||
| Accuracy (%) | Accuracy (%) | Accuracy (%) | Accuracy (%) | Accuracy (%) | MCC | Accuracy (%) | MCC | Accuracy (%) | MCC | |
| Chloroplast | 57 | 72.3 | 76.5 | 86.4 | 93.21 | 0.9982 | 85.52 | 0.9689 | 93.21 | 0.9982 |
| Cytoplasm | 88 | 72.2 | 76.4 | 81.6 | 87.81 | 0.9035 | 89.13 | 0.7444 | 87.81 | 0.9013 |
| Cytoskeleton | 44 | 58.5 | 60.0 | 77.5 | 12.82 | 1.0000 | 35.90 | 0.9660 | 35.90 | 0.9660 |
| Endoplasmic reticulum | 31 | 46.5 | 61.4 | 78.9 | 59.82 | 0.9708 | 27.68 | 0.9276 | 59.82 | 0.9708 |
| Extracell | 57 | 78.0 | 89.7 | 84.0 | 91.01 | 0.9746 | 85.92 | 0.8879 | 91.01 | 0.9739 |
| Golgi apparatus | 12 | 14.6 | 46.8 | 61.7 | 33.33 | 1.0000 | 22.22 | 0.9127 | 33.33 | 0.9682 |
| Lysosomal | 54 | 61.8 | 62.4 | 73.1 | 67.74 | 0.9691 | 16.13 | 0.9392 | 67.74 | 0.9691 |
| Mitochondrion | 42 | 57.4 | 68.2 | 62.9 | 87.02 | 0.9502 | 70.99 | 0.9017 | 87.15 | 0.9494 |
| Nucleus | 73 | 89.6 | 91.5 | 84.4 | 95.94 | 0.8710 | 81.85 | 0.9441 | 95.94 | 0.8741 |
| Peroxisomal | 4 | 25.2 | 41.6 | 62.4 | 66.94 | 0.9648 | 20.16 | 0.8446 | 66.94 | 0.9648 |
| Plasma membrane | 91 | 92.2 | 94.7 | 86.7 | 93.07 | 0.9647 | 93.98 | 0.9140 | 93.07 | 0.9647 |
| Vacuolar | 25 | 25.0 | 40.7 | 66.7 | 50.94 | 0.9648 | 0.00 | - | 50.94 | 0.9330 |
| Overall accuracy | 75 | 78.2 | 83.5 | 81.6 | 89.80 | - | 81.60 | - | 89.94 | - |
| | - | - | - | - | 89.65 | - | 81.60 | - | 89.73 | - |
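Most columns in these tables are labelled "Jackknife", that is, leave-one-out testing: each protein is held out in turn, the classifier is built on the remaining proteins, and the held-out protein is then predicted. The sketch below illustrates the procedure with a small pure-Python k-nearest-neighbor classifier. The feature vectors, labels and function names are made up for illustration and are not the features or code used in the paper.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Predict the label of `query` by majority vote among its k nearest neighbours.

    `train` is a list of (feature_tuple, label) pairs; distance is squared Euclidean.
    Vote ties fall to the label encountered first among the neighbours.
    """
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def jackknife_accuracy(samples, k):
    """Leave-one-out accuracy in percent."""
    hits = 0
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]  # everything except the held-out sample
        if knn_predict(rest, features, k) == label:
            hits += 1
    return 100.0 * hits / len(samples)


# Toy 2-D samples, not taken from the benchmark datasets.
samples = [
    ((0.0, 0.1), "Nucleus"), ((0.2, 0.0), "Nucleus"), ((0.1, 0.3), "Nucleus"),
    ((5.0, 5.1), "Cytoplasm"), ((5.2, 4.9), "Cytoplasm"), ((4.8, 5.3), "Cytoplasm"),
]
accuracy = jackknife_accuracy(samples, k=1)
print(accuracy)
```

Because every sample requires a fresh fit, the jackknife costs one training run per protein, which is cheap for KNN but can be expensive for classifiers that need an explicit training phase.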
Performance comparison of human protein subcellular location prediction methods on the Hum3681 dataset.
| Subcellular location | Hum-mPLoc 2.0 (2009) (Shen and Chou 2009) | LIBSVM | | KNN | | The proposed method | |
| | Jackknife | Jackknife | | Jackknife | | Jackknife | |
| | Accuracy (%) | Accuracy (%) | MCC | Accuracy (%) | MCC | Accuracy (%) | MCC |
| Centriole | - | 93.51 | 0.9240 | 93.51 | 0.8867 | 94.81 | 0.9249 |
| Cytoplasm | - | 39.66 | 0.9151 | 91.43 | 0.7218 | 41.37 | 0.9007 |
| Cytoskeleton | - | 51.90 | 0.8138 | 8.86 | 0.8816 | 51.90 | 0.8232 |
| Endosome | - | 54.17 | 0.7012 | 33.33 | 0.7552 | 54.17 | 0.7417 |
| Endoplasmic reticulum | - | 78.85 | 0.9046 | 79.30 | 0.8960 | 78.85 | 0.9043 |
| Extracell | - | 86.23 | 0.9705 | 82.60 | 0.9029 | 86.23 | 0.9689 |
| Golgi apparatus | - | 70.19 | 0.8853 | 39.75 | 0.9284 | 70.19 | 0.8887 |
| Lysosome | - | 93.51 | 0.9407 | 57.14 | 0.9777 | 93.51 | 0.9407 |
| Microsome | - | 50.00 | 0.8008 | 0.00 | - | 50.00 | 0.8008 |
| Mitochondrion | - | 84.89 | 0.9569 | 81.04 | 0.9763 | 83.79 | 0.9596 |
| Nucleus | - | 91.67 | 0.8876 | 50.15 | 0.9833 | 91.77 | 0.8932 |
| Peroxisome | - | 97.87 | 0.9380 | 51.06 | 0.9605 | 97.87 | 0.9481 |
| Plasma membrane | - | 84.66 | 0.8887 | 60.80 | 0.9618 | 84.66 | 0.8870 |
| Synapse | - | 86.36 | 0.8487 | 27.27 | 0.8657 | 86.36 | 0.8487 |
| Overall accuracy | 62.7 | 75.22 | - | 67.75 | - | 75.55 | - |
| | - | 72.22 | - | 65.19 | - | 72.25 | - |
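As a consistency check on the Hum3681 results, the overall accuracy can be approximated as the size-weighted mean of the per-location accuracies. The sketch below assumes that this is how the overall figure relates to the per-location rows. It uses the protein counts from the Hum3681 column of the dataset table and the proposed method's accuracies from the table above; the counts actually used in testing may differ slightly from those listed, so the result should land close to, rather than exactly on, the reported 75.55%.

```python
# Protein counts per location, copied from the Hum3681 column of the dataset table.
counts = {
    "Centriole": 77, "Cytoplasm": 817, "Cytoskeleton": 79, "Endosome": 24,
    "Endoplasmic reticulum": 229, "Extracell": 385, "Golgi apparatus": 161,
    "Lysosome": 77, "Microsome": 24, "Mitochondrion": 364, "Nucleus": 1021,
    "Peroxisome": 47, "Plasma membrane": 354, "Synapse": 22,
}

# Per-location jackknife accuracies (%) of the proposed method, from the table above.
accuracy = {
    "Centriole": 94.81, "Cytoplasm": 41.37, "Cytoskeleton": 51.90, "Endosome": 54.17,
    "Endoplasmic reticulum": 78.85, "Extracell": 86.23, "Golgi apparatus": 70.19,
    "Lysosome": 93.51, "Microsome": 50.00, "Mitochondrion": 83.79, "Nucleus": 91.77,
    "Peroxisome": 97.87, "Plasma membrane": 84.66, "Synapse": 86.36,
}


def weighted_overall_accuracy(counts, accuracy):
    """Size-weighted mean of per-class accuracies, in percent."""
    total = sum(counts.values())
    return sum(counts[loc] * accuracy[loc] for loc in counts) / total


overall = weighted_overall_accuracy(counts, accuracy)
print(f"{overall:.2f}")
```

The weighting explains why the low cytoplasm accuracy pulls the overall figure down so strongly: cytoplasmic proteins make up more than a fifth of the dataset.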
Performance comparison of eukaryotic protein subcellular location prediction methods on the Euk6181 dataset.
| Subcellular location | Euk-mPloc | KNN-SVM ensemble classifier (2010) | The proposed method | ||||
| Jackknife | Jackknife | Resubstitution | Jackknife | ||||
| Accuracy(%) | Accuracy(%) | MCC | Accuracy(%) | MCC | Accuracy(%) | MCC | |
| Acrosome | - | 41.2 | 0.641 | 76.5 | 0.874 | 76.47 | 0.9308 |
| Cell wall | - | 67.9 | 0.711 | 88.7 | 0.903 | 92.45 | 0.9028 |
| Centriole | - | 62.5 | 0.690 | 81.3 | 0.786 | 89.06 | 0.8857 |
| Chloroplast | - | 97.4 | 0.879 | 99.0 | 0.918 | 97.80 | 0.9956 |
| Cyanelle | - | 91.8 | 0.957 | 91.8 | 0.957 | 100.00 | 1.0000 |
| Cytoplasm | - | 88.2 | 0.640 | 91.8 | 0.729 | 82.64 | 0.7946 |
| Cytoskeleton | - | 24.3 | 0.491 | 41.9 | 0.645 | 0.00 | 0.0000 |
| Endoplasmic reticulum | - | 79.7 | 0.776 | 86.8 | 0.839 | 77.20 | 0.8906 |
| Endosome | - | 62.9 | 0.770 | 67.4 | 0.812 | 65.17 | 0.7867 |
| Golgi apparatus | - | 74.0 | 0.802 | 79.5 | 0.828 | 81.89 | 0.8355 |
| Hydrogenosome | - | 38.5 | 0.620 | 69.2 | 0.692 | 100.00 | 1.0000 |
| Lysosome | - | 65.0 | 0.662 | 72.5 | 0.772 | 98.75 | 0.9106 |
| Melanosome | - | 53.9 | 0.733 | 84.6 | 0.880 | 76.92 | 1.0000 |
| Microsome | - | 19.4 | 0.380 | 41.9 | 0.647 | 9.68 | 0.5996 |
| Mitochondrion | - | 85.1 | 0.872 | 87.5 | 0.910 | 89.91 | 0.9425 |
| Nucleus | - | 84.6 | 0.824 | 85.7 | 0.862 | 61.97 | 0.9642 |
| Peroxisome | - | 37.1 | 0.589 | 74.2 | 0.860 | 98.97 | 0.9896 |
| Plasma membrane | - | 81.4 | 0.766 | 84.4 | 0.817 | 71.86 | 0.9373 |
| Extracell | - | 83.3 | 0.864 | 85.9 | 0.894 | 92.81 | 0.9537 |
| Spindle pole body | - | 50.0 | 0.669 | 75.0 | 0.850 | 72.22 | 0.8679 |
| Synapse | - | 66.7 | 0.816 | 66.7 | 0.816 | 53.33 | 1.0000 |
| Vacuole | - | 42.2 | 0.610 | 82.4 | 0.865 | 92.16 | 0.9181 |
| Overall accuracy | 67.4 | 70.5 | - | 77.6 | - | 79.14 | - |
| | - | - | - | - | - | 77.62 | - |
Example predictions from the three predictors.
| Accession number | Entry name | Swiss-Prot annotation | iLoc-Euk (2011) | Hum-mPLoc 2.0 (2009) | The proposed method |
| Trained by iLoc8897 dataset | |||||
| P55287 | Cad11_human | Plasma membrane | Plasma membrane | Plasma membrane, Cytoplasm, Extracell | Plasma membrane |
| P02751 | Finc_human | Extracell | Extracell | Extracell | Extracell |
| Q8IZC6 | Cora1_human | Extracell | Extracell | Extracell | |
| Q9EPU7 | Z354c_rat | Nucleus | Nucleus | - | Nucleus |
| Q5QNQ9 | Cora1_mouse | Extracell | Extracell | - | Extracell |
| Q5BKR2 | Nhdc2_mouse | Mitochondrion | Plasma membrane | - | Mitochondrion |
| P12645 | Bmp3_human | Extracell | Extracell | Extracell | Extracell |
| P51690 | Arse_human | Golgi apparatus | Cytoplasm | Lysosome | Golgi apparatus |
| Q8C341 | Ospt_mouse | Endoplasmic reticulum | Plasma membrane | - | Cytoplasm |
| P00922 | Cah2_sheep | Cytoplasm | Cytoplasm | - | Cytoplasm |
| Q30D77 | Cooa1_mouse | Extracell | Extracell | - | Extracell |
Example predictions from the three predictors for multiple-location proteins.
| Accession number | Entry name | Swiss-Prot annotation | iLoc-Euk (2011) | Hum-mPLoc 2.0 (2009) | The proposed method |
| Trained by iLoc8897 dataset | |||||
| Q05329 | DCE2_human | Plasma membrane, Golgi apparatus, Synapse | Plasma membrane, Golgi apparatus, Synapse | Cytoplasm, Mitochondrion, Synapse | Plasma membrane, Golgi apparatus, Synapse |
| P58335 | Antr2_human | Endoplasmic reticulum, Plasma membrane, Extracell | Extracell | Endoplasmic reticulum | Endoplasmic reticulum, Plasma membrane, Extracell |
| P30622 | Clip1_human | Cytoplasm, Cytoskeleton | Cytoplasm | Cytoskeleton, Endosome | Cytoplasm, Cytoskeleton, Endosome |
| P13395 | Sptca_drome | Cytoskeleton, Golgi apparatus, Plasma membrane | Golgi apparatus | - | Cytoskeleton, Golgi apparatus |
| P11279 | Lamp1_human | Endosome, Lysosome, Plasma membrane | Plasma membrane | Lysosome | Plasma membrane, Lysosome, Melanosome |
| Q15942 | Zyx_human | Cytoplasm, Cytoskeleton | Cytoskeleton | Plasma membrane | Cytoplasm, Cytoskeleton, Nucleus |