Xiao Wang, Hui Li, Qiuwen Zhang, Rong Wang.
Abstract
Apoptosis proteins play a key role in maintaining the stability of an organism. Their functions are related to their subcellular locations, and knowing these locations helps in understanding the mechanism of programmed cell death. In this paper, we use GO annotation information for apoptosis proteins and their homologous proteins, retrieved from the GOA database, to construct feature vectors. We then combine these vectors with a distance-weighted KNN classification algorithm to address the data imbalance problem in the CL317 data set and to predict the subcellular locations of apoptosis proteins. We find that the number of homologous proteins affects the overall prediction accuracy. With the optimal number of homologous proteins, the overall prediction accuracy of our method on the CL317 data set reaches 96.8% under the Jackknife test. Compared with existing methods, the proposed method achieves a higher overall accuracy in predicting the subcellular localization of apoptosis proteins.
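The classification scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the binary vectors stand in for GO-term presence features, the weight `1 / (squared distance)` is one common choice of distance weighting (the abstract does not state the exact formula), and the function names, the toy data, and `k=3` are all hypothetical. The Jackknife test is implemented as leave-one-out evaluation.

```python
from collections import defaultdict


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train_X, train_y, query, k=3, weighted=True, eps=1e-9):
    """Vote among the k nearest neighbours of `query`.

    With weighted=True each neighbour votes with weight 1 / (d^2 + eps),
    an assumed weighting scheme; with weighted=False every vote counts 1.
    """
    neighbours = sorted(
        (squared_distance(x, query), label) for x, label in zip(train_X, train_y)
    )[:k]
    votes = defaultdict(float)
    for sq_dist, label in neighbours:
        votes[label] += 1.0 / (sq_dist + eps) if weighted else 1.0
    return max(votes, key=votes.get)


def jackknife_accuracy(X, y, k=3, weighted=True):
    """Leave-one-out accuracy: each sample is predicted from all the others."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k, weighted) == y[i]:
            correct += 1
    return correct / len(X)


# Hypothetical toy data: 1 = GO term present, 0 = absent.
# "Cy" is the majority class and "Nu" the minority class.
X = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
y = ["Cy", "Cy", "Cy", "Nu", "Nu"]

weighted_acc = jackknife_accuracy(X, y, k=3, weighted=True)
unweighted_acc = jackknife_accuracy(X, y, k=3, weighted=False)
print(weighted_acc, unweighted_acc)
```

On this toy set, plain majority voting misclassifies both minority-class samples because two more distant majority-class neighbours outvote the single close minority-class neighbour. Distance weighting lets the close neighbour dominate, which is the intuition behind using it on an imbalanced data set.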
Year: 2016 PMID: 27213149 PMCID: PMC4860209 DOI: 10.1155/2016/1793272
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Number of proteins in each of the 6 subcellular locations.
| Subset | Subcellular location | Number of proteins |
|---|---|---|
| 1 | Cytoplasmic | 110 |
| 2 | Membrane | 55 |
| 3 | Mitochondrial | 34 |
| 4 | Secreted | 17 |
| 5 | Nuclear | 51 |
| 6 | Endoplasmic reticulum | 47 |
| Total number | | 314 |
Figure 1: A flowchart showing the prediction process.
Figure 2: Effect of the number of homologous proteins on the overall accuracy.
The prediction result for the data set.
| Location | SN (%) | SP (%) | MCC |
|---|---|---|---|
| Cy | 98.2 | 97.5 | 0.951 |
| Me | 98.2 | 99.6 | 0.978 |
| Mi | 97.1 | 99.3 | 0.951 |
| Se | 94.1 | 100 | 0.968 |
| Nu | 90.2 | 99.2 | 0.917 |
| En | 100 | 100 | 1.0 |
| ACC (%) | 96.8 | | |
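As a consistency check, the overall accuracy can be recovered from the per-class sensitivities and the class sizes reported in the two tables, since the overall accuracy is the total number of correctly predicted proteins divided by the total number of proteins. The sketch below assumes that each per-class count of correct predictions is the reported SN times the class size, rounded to the nearest whole protein; the variable names are illustrative.

```python
# Class sizes from the data set table and sensitivities (SN, %) from the results table.
sizes = {"Cy": 110, "Me": 55, "Mi": 34, "Se": 17, "Nu": 51, "En": 47}
sn = {"Cy": 98.2, "Me": 98.2, "Mi": 97.1, "Se": 94.1, "Nu": 90.2, "En": 100.0}

# Assumption: correctly predicted proteins per class = SN * class size,
# rounded to a whole number because SN is reported to one decimal place.
correct = {loc: round(sizes[loc] * sn[loc] / 100) for loc in sizes}

total = sum(sizes.values())
total_correct = sum(correct.values())
acc = 100 * total_correct / total

print(correct)
print(f"{total_correct}/{total} = {acc:.1f}%")
```

Under this assumption, 304 of the 314 proteins are classified correctly, which agrees with the reported ACC of 96.8%.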
Comparison of different methods on CL317 data set.
| Method | SN (%): Cy | SN (%): Me | SN (%): Mi | SN (%): Se | SN (%): Nu | SN (%): En | ACC (%) |
|---|---|---|---|---|---|---|---|
| ID [ | 81.3 | 81.8 | 85.3 | 88.2 | 82.7 | 83.0 | 82.7 |
| ID_SVM [ | 91.1 | 89.1 | 79.4 | 58.8 | 73.1 | 87.2 | 84.2 |
| DF_SVM [ | 92.9 | 85.5 | 76.5 | 76.5 | 93.6 | 86.5 | 88.0 |
| Auto_Cova [ | 86.4 | 90.7 | 93.8 | 85.7 | 92.1 | 93.8 | 90.0 |
| FKNN [ | 93.8 | 92.7 | 82.4 | 76.5 | 90.4 | 93.6 | 90.9 |
| PseAAC_SVM [ | 93.8 | 90.9 | 85.3 | 76.5 | 90.4 | 95.7 | 91.1 |
| EN_FKNN [ | 98.2 | 83.6 | 79.4 | 82.4 | 90.4 | 97.9 | 91.5 |
| PSSM-AC [ | 93.8 | 90.9 | 91.2 | 82.4 | 86.5 | 95.7 | 91.5 |
| APSLAP [ | 99.1 | 89.1 | 85.3 | 88.2 | 84.3 | 95.8 | 92.4 |
| Trigram encoding [ | 98.2 | 96.4 | 94.1 | 82.4 | 96.2 | 95.7 | 95.9 |
| Our method | 98.2 | 98.2 | 97.1 | 94.1 | 90.2 | 100 | 96.8 |