Jianjun He, Hong Gu, Wenqi Liu.
Abstract
It is well known that an important step toward understanding the functions of a protein is to determine its subcellular location. Although numerous prediction algorithms have been developed, most of them focus on proteins with a single location. In recent years, researchers have begun to address subcellular localization prediction for proteins with multiple sites. However, almost all existing approaches fail to take into account the correlations among locations that arise from proteins residing at multiple sites, even though these correlations may be important information for improving prediction accuracy on such proteins. In this paper, we propose a new algorithm, based on a Gaussian process model, that can effectively exploit the correlations among locations. The algorithm also realizes an optimal linear combination of various feature extraction techniques and can be robust to imbalanced data sets. Experimental results on a human protein data set show that the proposed algorithm is valid and achieves better performance than existing approaches.
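The abstract names two modeling ingredients: a Gaussian process model and an optimal linear combination of feature extraction techniques. The sketch below illustrates only the second idea in a generic form and is not the authors' model. Assumptions: each feature view (the hypothetical `view_a_*` and `view_b_*` lists) gets an RBF kernel; the views are merged as a weighted sum of kernels with hand-set nonnegative weights (the paper describes optimizing the combination, which is not reproduced here); and a single location is scored with the Gaussian process regression posterior mean, a swapped-in simplification of the paper's multi-label model, which additionally captures correlations among locations. Pure Python is used so the example runs without third-party packages.

```python
import math

def rbf_kernel(xs, ys, gamma=1.0):
    """RBF kernel matrix k(x, y) = exp(-gamma * ||x - y||^2) between two lists of vectors."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y))) for y in ys]
            for x in xs]

def combine_kernels(kernels, weights):
    """Weighted sum K = sum_m w_m * K_m of same-shaped kernel matrices.
    Nonnegative weights keep the sum a valid (positive semidefinite) kernel."""
    if any(w < 0 for w in weights):
        raise ValueError("kernel weights must be nonnegative")
    rows, cols = len(kernels[0]), len(kernels[0][0])
    return [[sum(w * k[i][j] for w, k in zip(weights, kernels)) for j in range(cols)]
            for i in range(rows)]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def gp_posterior_mean(k_train, k_test_train, y, noise=0.1):
    """GP regression mean f* = K_* (K + noise * I)^{-1} y for one location label."""
    n = len(k_train)
    a = [[k_train[i][j] + (noise if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = solve(a, y)
    return [sum(k * al for k, al in zip(row, alpha)) for row in k_test_train]

# Hypothetical example: 4 training proteins described by two feature views.
view_a_train = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
view_b_train = [[1.0], [0.8], [0.1], [0.0]]
y = [1.0, 1.0, -1.0, -1.0]  # +1: annotated with this location, -1: not annotated
weights = [0.6, 0.4]        # hand-set (assumption); the paper optimizes the combination

k_train = combine_kernels([rbf_kernel(view_a_train, view_a_train),
                           rbf_kernel(view_b_train, view_b_train)], weights)
view_a_test, view_b_test = [[0.1, 0.05]], [[0.9]]  # one hypothetical query protein
k_test = combine_kernels([rbf_kernel(view_a_test, view_a_train),
                          rbf_kernel(view_b_test, view_b_train)], weights)
print(gp_posterior_mean(k_train, k_test, y))  # positive: the query resembles the +1 proteins
```

The weights are restricted to be nonnegative because a nonnegative combination of positive semidefinite kernels is itself positive semidefinite, so the combined matrix remains a valid Gaussian process covariance.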
Year: 2012 PMID: 22715364 PMCID: PMC3371015 DOI: 10.1371/journal.pone.0037155
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Graphical model for IMMMLGP.
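Mean experimental results of the proposed algorithm on human protein data sets, used to assess the usefulness of the correlations among locations. The gap is Normal minus Variation, so for coverage and ranking loss, where lower values are better, a negative gap favors Normal.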
| Test subset | Evaluation metric | Original data set: Normal | Original data set: Variation | Original data set: The gap | New data set (40%): Normal | New data set (40%): Variation | New data set (40%): The gap |
|---|---|---|---|---|---|---|---|
| The whole test set | Average precision | 0.661 | 0.655 | 0.006 | 0.653 | 0.636 | 0.017 |
| | Recall | 0.595 | 0.587 | 0.008 | 0.562 | 0.543 | 0.019 |
| | F1-score | 0.530 | 0.522 | 0.008 | 0.516 | 0.504 | 0.012 |
| | Absolute true success rate | 0.274 | 0.261 | 0.013 | 0.204 | 0.189 | 0.015 |
| | Coverage | 2.003 | 2.047 | −0.044 | 2.630 | 2.711 | −0.081 |
| | Ranking loss | 0.129 | 0.132 | −0.003 | 0.143 | 0.148 | −0.005 |
| Samples with multiple sites | Average precision | 0.688 | 0.673 | 0.015 | 0.700 | 0.678 | 0.022 |
| | Recall | 0.478 | 0.459 | 0.019 | 0.535 | 0.498 | 0.037 |
| | F1-score | 0.535 | 0.518 | 0.017 | 0.572 | 0.545 | 0.027 |
| | Absolute true success rate | 0.179 | 0.148 | 0.031 | 0.231 | 0.181 | 0.050 |
| | Coverage | 3.889 | 4.030 | −0.141 | 3.825 | 3.954 | −0.129 |
| | Ranking loss | 0.152 | 0.158 | −0.006 | 0.148 | 0.155 | −0.007 |
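The table above reports ranking-based multi-label metrics, but this excerpt does not give their formulas. The sketch below assumes the definitions commonly used in multi-label learning: coverage is the mean rank of the lowest-ranked true location minus 1 (how far down the ranked list one must go to cover every true location), ranking loss is the mean fraction of (true, false) location pairs in which the false location scores at least as high as the true one, and average precision averages, over each true location, the fraction of locations ranked at or above it that are also true. The location indices and scores are hypothetical, and some papers define coverage without subtracting 1.

```python
def ranks(scores):
    """Return the rank of each label (1 = highest score); ties are broken by label index."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    r = [0] * len(scores)
    for pos, j in enumerate(order, start=1):
        r[j] = pos
    return r

def coverage(scores_list, true_sets):
    """Mean over samples of (rank of the worst-ranked true label) - 1."""
    total = 0
    for s, truth in zip(scores_list, true_sets):
        r = ranks(s)
        total += max(r[j] for j in truth) - 1
    return total / len(true_sets)

def ranking_loss(scores_list, true_sets):
    """Mean fraction of (true, false) label pairs ordered wrongly; ties count as errors.
    Assumes every sample has at least one true and one false label."""
    total = 0.0
    for s, truth in zip(scores_list, true_sets):
        false = [j for j in range(len(s)) if j not in truth]
        bad = sum(1 for t in truth for f in false if s[t] <= s[f])
        total += bad / (len(truth) * len(false))
    return total / len(true_sets)

def average_precision(scores_list, true_sets):
    """Mean over samples of the average, over true labels t, of
    (number of true labels ranked at or above t) / rank(t)."""
    total = 0.0
    for s, truth in zip(scores_list, true_sets):
        r = ranks(s)
        per_label = [sum(1 for t2 in truth if r[t2] <= r[t]) / r[t] for t in truth]
        total += sum(per_label) / len(truth)
    return total / len(true_sets)

# Hypothetical scores for 2 proteins over 4 locations (indices 0-3).
scores = [[0.9, 0.2, 0.7, 0.1], [0.3, 0.8, 0.4, 0.6]]
truth = [{0, 2}, {0}]
print(coverage(scores, truth), ranking_loss(scores, truth), average_precision(scores, truth))
# 2.0 0.5 0.625
```

Lower values are better for coverage and ranking loss; higher values are better for average precision.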
Figure 2. Subcellular distribution of the test samples.
The performance comparison between the proposed algorithm and Hum-mPLoc 2.0.
| Test subset | Evaluation metric | Value |
|---|---|---|
| The whole test set | Average precision | 0.579 |
| | Recall | 0.519 |
| | F1-score | 0.506 |
| | Absolute true success rate | 0.202 |
| | Coverage | 5.317 |
| | Ranking loss | 0.496 |
| Samples with multiple sites | Average precision | 0.568 |
| | Recall | 0.443 |
| | F1-score | 0.548 |
| | Absolute true success rate | 0.114 |
| | Coverage | 8.453 |
| | Ranking loss | 0.568 |

Only one value per metric survives in this copy of the table; which of the two methods (the proposed algorithm or Hum-mPLoc 2.0) it belongs to cannot be recovered from the extracted text.
Example predictions output by the two algorithms, compared with the Swiss-Prot annotations.
| Accession number | Locations annotated in Swiss-Prot database | Predicted by Hum-mPLoc 2.0 | Predicted by the proposed algorithm |
|---|---|---|---|
| P60852 | Plasma membrane; Extracell | Extracell | Plasma membrane; Extracell |
| O75396 | Endoplasmic reticulum; Golgi apparatus | Endoplasmic reticulum | Endoplasmic reticulum; Golgi apparatus |
| Q2VWA4 | Cytoplasm; Nucleus | Nucleus | Cytoplasm; Nucleus |
| Q6NT55 | Endoplasmic reticulum; Microsome | Endoplasmic reticulum; Microsome; Extracell | Endoplasmic reticulum; Microsome |
| P42261 | Plasma membrane; Endoplasmic reticulum; Synapse | Plasma membrane; Synapse; Extracell | Plasma membrane; Endoplasmic reticulum; Synapse |
| Q9Y3A5 | Cytoplasm; Nucleus; Cytoskeleton | Mitochondrion | Cytoplasm; Nucleus |
| P49419 | Cytoplasm; Nucleus; Mitochondrion | Mitochondrion | Cytoplasm; Mitochondrion |
| Q86WV6 | Endoplasmic reticulum; Cytoplasm; Mitochondrion; Plasma membrane | Cytoplasm | Cytoplasm; Endoplasmic reticulum |
| Q99527 | Plasma membrane; Golgi apparatus; Endoplasmic reticulum | Plasma membrane | Plasma membrane; Endoplasmic reticulum |
| O75410 | Cytoplasm; Nucleus; Centriole | Nucleus | Cytoplasm; Nucleus; Centriole; Mitochondrion |
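The example table can also be scored directly. The sketch below assumes the usual definition of the absolute true success rate (a prediction counts only if the predicted location set equals the annotated set exactly) and uses example-based recall, |annotated ∩ predicted| / |annotated|, as an illustrative recall; the paper's own recall formula is not shown in this excerpt. The ten rows are hand-picked examples, so these figures illustrate the metrics and are not estimates of either method's overall performance. The helper names are hypothetical.

```python
# (accession, Swiss-Prot locations, Hum-mPLoc 2.0 prediction, proposed-algorithm prediction),
# transcribed from the example table.
EXAMPLES = [
    ("P60852", {"Plasma membrane", "Extracell"}, {"Extracell"},
     {"Plasma membrane", "Extracell"}),
    ("O75396", {"Endoplasmic reticulum", "Golgi apparatus"}, {"Endoplasmic reticulum"},
     {"Endoplasmic reticulum", "Golgi apparatus"}),
    ("Q2VWA4", {"Cytoplasm", "Nucleus"}, {"Nucleus"}, {"Cytoplasm", "Nucleus"}),
    ("Q6NT55", {"Endoplasmic reticulum", "Microsome"},
     {"Endoplasmic reticulum", "Microsome", "Extracell"},
     {"Endoplasmic reticulum", "Microsome"}),
    ("P42261", {"Plasma membrane", "Endoplasmic reticulum", "Synapse"},
     {"Plasma membrane", "Synapse", "Extracell"},
     {"Plasma membrane", "Endoplasmic reticulum", "Synapse"}),
    ("Q9Y3A5", {"Cytoplasm", "Nucleus", "Cytoskeleton"}, {"Mitochondrion"},
     {"Cytoplasm", "Nucleus"}),
    ("P49419", {"Cytoplasm", "Nucleus", "Mitochondrion"}, {"Mitochondrion"},
     {"Cytoplasm", "Mitochondrion"}),
    ("Q86WV6", {"Endoplasmic reticulum", "Cytoplasm", "Mitochondrion", "Plasma membrane"},
     {"Cytoplasm"}, {"Cytoplasm", "Endoplasmic reticulum"}),
    ("Q99527", {"Plasma membrane", "Golgi apparatus", "Endoplasmic reticulum"},
     {"Plasma membrane"}, {"Plasma membrane", "Endoplasmic reticulum"}),
    ("O75410", {"Cytoplasm", "Nucleus", "Centriole"}, {"Nucleus"},
     {"Cytoplasm", "Nucleus", "Centriole", "Mitochondrion"}),
]

def absolute_true_rate(pairs):
    """Fraction of (annotated, predicted) pairs whose sets match exactly."""
    return sum(1 for truth, pred in pairs if truth == pred) / len(pairs)

def mean_recall(pairs):
    """Example-based recall: mean of |annotated & predicted| / |annotated|."""
    return sum(len(truth & pred) / len(truth) for truth, pred in pairs) / len(pairs)

hum = [(truth, h) for _, truth, h, _ in EXAMPLES]
ours = [(truth, p) for _, truth, _, p in EXAMPLES]
for name, pairs in (("Hum-mPLoc 2.0", hum), ("proposed algorithm", ours)):
    print(f"{name}: absolute true = {absolute_true_rate(pairs):.2f}, "
          f"recall = {mean_recall(pairs):.3f}")
# Hum-mPLoc 2.0: absolute true = 0.00, recall = 0.442
# proposed algorithm: absolute true = 0.50, recall = 0.850
```

On these ten rows the proposed algorithm reproduces the full annotated location set for 5 proteins, while Hum-mPLoc 2.0 reproduces it for none.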