Bin Yu1,2,3, Shan Li1,2, Wen-Ying Qiu1,2, Cheng Chen1,2, Rui-Xin Chen1,2, Lei Wang4, Ming-Hui Wang1,2, Yan Zhang2,5.
Abstract
Subcellular localization information for apoptosis proteins is very important for understanding the mechanism of programmed cell death and for drug development. Predicting the subcellular localization of an apoptosis protein remains a challenging task, and accurate prediction can help to clarify the protein's function and its role in metabolic processes. In this paper, we propose a novel method for protein subcellular localization prediction. First, features of the protein sequence are extracted by combining Chou's pseudo amino acid composition (PseAAC) and the pseudo-position specific scoring matrix (PsePSSM); the extracted feature information is then denoised by two-dimensional (2-D) wavelet denoising. Finally, the optimal feature vectors are input to an SVM classifier to predict the subcellular location of apoptosis proteins. Promising predictions are obtained using the jackknife test on three widely used datasets and are compared with other state-of-the-art methods. The results indicate that the proposed method can remarkably improve the prediction accuracy of apoptosis protein subcellular localization and can serve as a supplementary tool for future proteomics research.
Keywords: apoptosis protein subcellular location; pseudo-amino acid composition; pseudo-position specific scoring matrix; support vector machine; two-dimensional wavelet denoising
Year: 2017 PMID: 29296195 PMCID: PMC5746097 DOI: 10.18632/oncotarget.22585
Source DB: PubMed Journal: Oncotarget ISSN: 1949-2553
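All accuracies reported in the tables come from the jackknife (leave-one-out) test named in the abstract. The following is a minimal sketch of that procedure, not the authors' code: a 1-nearest-neighbour rule stands in for the SVM so that the example needs only the standard library, and the function names and toy data are hypothetical.

```python
import math


def nearest_neighbor_predict(train_x, train_y, query):
    """Return the label of the training vector closest to `query` (Euclidean)."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return train_y[best]


def jackknife_accuracy(features, labels, predict=nearest_neighbor_predict):
    """Hold out each sample once, fit on the rest, and score the held-out sample."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Toy data (hypothetical): two well-separated clusters standing in for two locations.
toy_x = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [1.0, 0.9], [0.9, 1.0], [1.1, 1.0]]
toy_y = ["Cy", "Cy", "Cy", "Nu", "Nu", "Nu"]
print(jackknife_accuracy(toy_x, toy_y))
```

Any classifier with the same `predict(train_x, train_y, query)` shape could be passed in place of the nearest-neighbour rule.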
Prediction results of subcellular localization of the CL317 dataset by selecting different λ values (jackknife test, %)
| Location | λ = 0 | λ = 5 | λ = 10 | λ = 15 | λ = 20 | λ = 25 | λ = 30 | λ = 35 | λ = 40 | λ = 45 | λ = 49 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Cy | 80.36 | 79.46 | 86.61 | 88.39 | 86.61 | 84.82 | 87.50 | 85.71 | 83.03 | 83.93 | 83.93 |
| Me | 74.55 | 80.00 | 85.45 | 90.91 | 89.09 | 89.09 | 87.27 | 83.64 | 87.27 | 87.27 | 85.45 |
| Mi | 58.82 | 67.65 | 70.59 | 73.53 | 64.71 | 73.53 | 61.76 | 73.53 | 67.65 | 61.76 | 67.65 |
| Se | 29.41 | 47.06 | 58.82 | 41.18 | 47.06 | 47.06 | 58.82 | 64.71 | 70.59 | 58.82 | 58.82 |
| Nu | 57.69 | 76.92 | 71.15 | 78.85 | 71.15 | 80.76 | 71.15 | 75.00 | 75.00 | 75.00 | 75.00 |
| En | 89.36 | 93.62 | 93.62 | 95.74 | 95.74 | 97.87 | 95.74 | 95.74 | 95.74 | 93.62 | 93.62 |
| OA | 71.92 | 78.23 | 81.70 | 84.23 | 81.39 | 83.60 | 81.70 | 82.65 | 82.02 | 80.76 | 81.07 |
Prediction results of subcellular localization of the ZW225 dataset by selecting different λ values (jackknife test, %)
| Location | λ = 0 | λ = 5 | λ = 10 | λ = 15 | λ = 20 | λ = 25 | λ = 30 | λ = 35 | λ = 40 | λ = 45 | λ = 49 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Cy | 81.43 | 85.71 | 80.00 | 85.71 | 84.29 | 80.00 | 80.00 | 81.43 | 78.57 | 81.43 | 78.57 |
| Me | 83.15 | 86.52 | 91.01 | 87.64 | 91.01 | 88.76 | 88.76 | 91.01 | 88.76 | 87.64 | 85.39 |
| Mi | 52.00 | 52.00 | 56.00 | 64.00 | 56.00 | 60.00 | 56.00 | 56.00 | 56.00 | 60.00 | 56.00 |
| Nu | 70.73 | 65.85 | 56.10 | 63.41 | 60.98 | 65.85 | 68.29 | 75.61 | 70.73 | 68.29 | 75.61 |
| OA | 76.89 | 78.67 | 77.33 | 80.00 | 79.56 | 78.67 | 78.67 | 81.33 | 78.67 | 79.11 | 78.22 |
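The λ sweeps above vary the number of sequence-order correlation tiers in PseAAC, which yields a vector of 20 + λ components. The sketch below follows Chou's type-1 formulation under loud assumptions: it uses a single physicochemical property (Kyte-Doolittle hydropathy) and a weight factor of 0.05, neither of which is stated in this excerpt, and the function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: Kyte-Doolittle hydropathy is the only property used in this sketch.
HYDROPATHY = {
    "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8, "G": -0.4, "H": -3.2,
    "I": 4.5, "K": -3.9, "L": 3.8, "M": 1.9, "N": -3.5, "P": -1.6, "Q": -3.5,
    "R": -4.5, "S": -0.8, "T": -0.7, "V": 4.2, "W": -0.9, "Y": -1.3,
}


def standardize(scale):
    """Rescale a property table to zero mean and unit variance over the 20 residues."""
    mean = sum(scale.values()) / len(scale)
    std = (sum((v - mean) ** 2 for v in scale.values()) / len(scale)) ** 0.5
    return {aa: (v - mean) / std for aa, v in scale.items()}


def pseaac(sequence, lam, weight=0.05):
    """Return a (20 + lam)-dimensional PseAAC vector; weight=0.05 is an assumed default."""
    n = len(sequence)
    if not 0 <= lam < n:
        raise ValueError("lam must satisfy 0 <= lam < len(sequence)")
    prop = standardize(HYDROPATHY)
    freqs = [sequence.count(aa) / n for aa in AMINO_ACIDS]
    thetas = []
    for k in range(1, lam + 1):
        # k-th tier correlation: mean squared property difference at lag k
        total = sum((prop[sequence[i]] - prop[sequence[i + k]]) ** 2 for i in range(n - k))
        thetas.append(total / (n - k))
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


vector = pseaac("MKTAYIAKQRQISFVKSHFSRQ", lam=5)
print(len(vector))
```

Because each correlation tier at lag k needs at least one residue pair, λ has to stay below the length of the shortest sequence in a dataset.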
Prediction results of subcellular localization of the CL317 dataset by selecting different ξ values (jackknife test, %)
| Location | ξ = 0 | ξ = 1 | ξ = 2 | ξ = 3 | ξ = 4 | ξ = 5 | ξ = 6 | ξ = 7 | ξ = 8 | ξ = 9 | ξ = 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Cy | 83.04 | 83.93 | 87.50 | 87.50 | 87.50 | 89.29 | 91.96 | 91.96 | 91.96 | 91.07 | 91.96 |
| Me | 78.18 | 81.82 | 90.91 | 90.91 | 89.09 | 90.91 | 90.91 | 89.09 | 89.09 | 90.91 | 89.09 |
| Mi | 50.00 | 55.88 | 70.59 | 73.53 | 82.35 | 82.35 | 82.35 | 85.29 | 85.29 | 85.29 | 88.24 |
| Se | 88.24 | 82.35 | 82.35 | 88.24 | 82.35 | 82.35 | 82.35 | 82.35 | 82.35 | 82.35 | 82.35 |
| Nu | 67.31 | 61.54 | 78.85 | 80.77 | 86.54 | 90.38 | 90.38 | 88.46 | 88.46 | 90.38 | 88.46 |
| En | 87.23 | 91.49 | 93.62 | 93.62 | 93.62 | 93.62 | 93.62 | 95.74 | 97.87 | 97.87 | 97.87 |
| OA | 76.97 | 77.92 | 85.49 | 86.44 | 87.70 | 89.27 | 90.22 | 90.22 | 90.54 | 90.85 | 90.85 |
Prediction results of subcellular localization of the ZW225 dataset by selecting different ξ values (jackknife test, %)
| Location | ξ = 0 | ξ = 1 | ξ = 2 | ξ = 3 | ξ = 4 | ξ = 5 | ξ = 6 | ξ = 7 | ξ = 8 | ξ = 9 | ξ = 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Cy | 81.43 | 80.00 | 82.86 | 84.29 | 80.00 | 81.43 | 85.71 | 84.29 | 84.29 | 84.29 | 84.29 |
| Me | 82.02 | 85.39 | 89.89 | 91.01 | 91.01 | 91.01 | 91.01 | 92.13 | 91.01 | 91.01 | 91.01 |
| Mi | 36.00 | 60.00 | 68.00 | 68.00 | 72.00 | 72.00 | 72.00 | 76.00 | 72.00 | 72.00 | 72.00 |
| Nu | 63.41 | 65.85 | 70.73 | 75.61 | 80.49 | 78.05 | 82.93 | 82.93 | 82.93 | 82.93 | 85.37 |
| OA | 73.33 | 77.33 | 81.78 | 83.56 | 83.56 | 83.56 | 85.78 | 86.22 | 85.33 | 85.33 | 85.78 |
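The ξ sweeps above control how many lag terms PsePSSM adds to the 20 column means of the PSSM, for 20 + 20ξ features in total. A minimal sketch of that construction follows. The sigmoid normalization of raw scores and the toy matrix are assumptions made for illustration; a real PSSM would come from a PSI-BLAST run, which is outside this sketch.

```python
import math


def psepssm(pssm, xi):
    """Build PsePSSM features from an L x 20 score matrix (list of rows).

    Returns 20 column means followed by 20 lag-correlation terms for each lag 1..xi.
    """
    n = len(pssm)
    if not 0 <= xi < n:
        raise ValueError("xi must satisfy 0 <= xi < number of residues")
    # Assumption: raw scores are squashed into (0, 1) with a logistic function.
    norm = [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in pssm]
    cols = len(norm[0])
    features = [sum(row[j] for row in norm) / n for j in range(cols)]
    for lag in range(1, xi + 1):
        for j in range(cols):
            total = sum((norm[i][j] - norm[i + lag][j]) ** 2 for i in range(n - lag))
            features.append(total / (n - lag))
    return features


# Hypothetical 6-residue PSSM with small integer scores.
toy_pssm = [[(i * j) % 5 - 2 for j in range(20)] for i in range(6)]
print(len(psepssm(toy_pssm, xi=2)))
```

With ξ = 10, as in the last column of the tables, the same call would produce 220 PsePSSM features per protein.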
Figure 1. Effect of selecting different values of λ on the prediction results of subcellular localization for the CL317 and ZW225 datasets
Figure 2. Effect of selecting different values of ξ on the prediction results of subcellular localization for the CL317 and ZW225 datasets
Prediction results of subcellular localization in the datasets CL317 and ZW225 under different wavelet functions and different decomposition scales (jackknife test, overall accuracy, %)
| Dataset | Decomposition scale | db1 | db4 | db8 | sym3 | sym7 | coif2 | coif4 | bior1.1 | bior2.4 | bior3.3 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CL317 | 3 | 99.05 | 98.11 | 99.05 | 98.74 | 98.11 | 98.11 | 97.48 | 99.05 | 98.42 | 98.74 |
| CL317 | 4 | 98.74 | 97.48 | 99.37 | 98.11 | 97.79 | 98.74 | 98.42 | 98.74 | 98.42 | 98.74 |
| CL317 | 5 | 98.74 | 97.79 | 99.05 | 98.74 | 98.42 | 98.11 | 98.42 | 98.74 | 98.42 | 98.74 |
| ZW225 | 3 | 98.67 | 98.67 | 100 | 99.11 | 99.11 | 99.56 | 99.56 | 98.67 | 98.22 | 99.11 |
| ZW225 | 4 | 98.22 | 98.67 | 100 | 98.22 | 98.67 | 99.11 | 99.11 | 98.22 | 97.78 | 99.11 |
| ZW225 | 5 | 97.78 | 98.22 | 99.56 | 99.11 | 99.11 | 98.67 | 99.56 | 97.78 | 97.78 | 98.67 |
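To make the roles of the wavelet function and the decomposition scale concrete, here is a deliberately simplified sketch of wavelet denoising. It is a swapped-in variant, not the method evaluated above: it works on a 1-D signal with the Haar wavelet (db1) and a fixed soft threshold, whereas the table concerns 2-D denoising with several wavelet families. All function names and the threshold value are hypothetical.

```python
import math

_S = 1.0 / math.sqrt(2.0)


def haar_forward(signal):
    """One Haar analysis step: pairwise scaled sums (approximation) and differences (detail)."""
    approx = [(signal[i] + signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert one Haar analysis step."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * _S, (a - d) * _S])
    return out


def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; anything within [-t, t] becomes zero."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def haar_denoise(signal, levels, threshold):
    """Decompose to `levels` scales, soft-threshold the details, and reconstruct."""
    if levels < 1 or len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_forward(approx)
        details.append(soft_threshold(detail, threshold))
    for detail in reversed(details):
        approx = haar_inverse(approx, detail)
    return approx


noisy = [1.0, 1.2, 0.9, 1.1, 3.0, 3.1, 2.9, 3.2]
print(haar_denoise(noisy, levels=2, threshold=0.2))
```

A larger decomposition scale exposes coarser detail bands to thresholding, which is the parameter varied across the rows of the table; a library such as PyWavelets would be the usual route to the other wavelet families and to the 2-D case.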
Figure 3. Prediction performance of dataset CL317 under different wavelet functions and different decomposition scales
Figure 4. Prediction performance of dataset ZW225 under different wavelet functions and different decomposition scales
Prediction results of subcellular localization by four different feature extraction methods on dataset CL317 (jackknife test, %)
| Algorithm | Cy | Me | Mi | Se | Nu | En | OA |
|---|---|---|---|---|---|---|---|
| PseAAC | 88.39 | 90.91 | 73.53 | 41.18 | 78.85 | 95.74 | 84.23 |
| PsePSSM | 91.96 | 89.09 | 88.24 | 82.35 | 88.46 | 97.87 | 90.85 |
| PseAAC-PsePSSM | 90.18 | 89.09 | 85.29 | 88.24 | 90.38 | 93.62 | 89.91 |
| PseAAC-PsePSSM-WD | 100 | 100 | 94.12 | 100 | 100 | 100 | 99.37 |
Prediction results of subcellular localization by four different feature extraction methods on dataset ZW225 (jackknife test, %)
| Algorithm | Cy | Me | Mi | Nu | OA |
|---|---|---|---|---|---|
| PseAAC | 85.71 | 87.64 | 64.00 | 63.41 | 80.00 |
| PsePSSM | 84.29 | 91.01 | 72.00 | 85.37 | 85.78 |
| PseAAC-PsePSSM | 87.14 | 93.26 | 80.00 | 80.49 | 87.56 |
| PseAAC-PsePSSM-WD | 100 | 100 | 100 | 100 | 100 |
Figure 5. ROC curves for the CL317 dataset
Figure 6. ROC curves for the ZW225 dataset
Prediction results of subcellular localization of the CL317 dataset under different kernel functions (jackknife test, %)
| Location | Kernel function 0 | Kernel function 1 | Kernel function 2 | Kernel function 3 |
|---|---|---|---|---|
| Cy | 100 | 100 | 100 | 87.50 |
| Me | 100 | 30.91 | 100 | 80.00 |
| Mi | 94.12 | 0 | 91.18 | 61.76 |
| Se | 100 | 94.12 | 100 | 0.00 |
| Nu | 100 | 71.15 | 100 | 73.08 |
| En | 100 | 95.74 | 100 | 93.62 |
| OA | 99.37 | 71.61 | 99.05 | 77.29 |
Prediction results of subcellular localization of the ZW225 dataset under different kernel functions (jackknife test, %)
| Location | Kernel function 0 | Kernel function 1 | Kernel function 2 | Kernel function 3 |
|---|---|---|---|---|
| Cy | 100 | 84.29 | 100 | 100 |
| Me | 100 | 100 | 98.88 | 86.52 |
| Mi | 100 | 0.00 | 100 | 52.00 |
| Nu | 100 | 97.56 | 100 | 80.49 |
| OA | 100 | 83.56 | 99.56 | 85.78 |
Figure 7. The overall prediction accuracy of the two apoptosis datasets CL317 and ZW225 under four different kernel functions
Prediction results of subcellular localization of the CL317 dataset under different classification algorithms (jackknife test)
| Classifier | Sens (%) | Spec (%) | MCC | OA (%) |
|---|---|---|---|---|
| SVM | 99.02 | 99.87 | 0.9908 | 99.37 |
| KNN | 99.01 | 99.79 | 0.9889 | 99.05 |
| RF | 97.26 | 99.52 | 0.9690 | 97.79 |
| Naïve Bayes | 89.61 | 97.51 | 0.8629 | 88.33 |
| DT | 93.61 | 98.98 | 0.9164 | 94.64 |
Prediction results of subcellular localization of the ZW225 dataset under different classification algorithms (jackknife test)
| Classifier | Sens (%) | Spec (%) | MCC | OA (%) |
|---|---|---|---|---|
| SVM | 100 | 100 | 1 | 100 |
| KNN | 93.72 | 98.77 | 0.9447 | 96.89 |
| RF | 98.72 | 99.65 | 0.9871 | 99.11 |
| Naïve Bayes | 95.35 | 98.04 | 0.9087 | 93.78 |
| DT | 97.44 | 99.34 | 0.9710 | 98.22 |
Figure 8. The overall prediction accuracy of subcellular localization of the five classification algorithms for datasets CL317 and ZW225
Prediction performance of different test methods for protein subcellular localization on the CL317 dataset
| Location | Self-consistency Sens (%) | Self-consistency Spec (%) | Self-consistency MCC | Jackknife Sens (%) | Jackknife Spec (%) | Jackknife MCC |
|---|---|---|---|---|---|---|
| Cy | 100 | 100 | 1 | 100 | 100 | 1 |
| Me | 100 | 100 | 1 | 100 | 99.62 | 0.9891 |
| Mi | 100 | 100 | 1 | 94.12 | 100 | 0.9667 |
| Se | 100 | 100 | 1 | 100 | 100 | 1 |
| Nu | 100 | 100 | 1 | 100 | 99.62 | 0.9886 |
| En | 100 | 100 | 1 | 100 | 100 | 1 |
| OA (%) | 100 | | | 99.37 | | |
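The per-class sensitivity, specificity, and Matthews correlation coefficient (MCC) in this table follow from one-versus-rest confusion counts. The sketch below uses the standard definitions. The counts are not given in this excerpt; they are inferred here on the assumption that the Mi class holds 34 of the 317 proteins, of which 32 are recovered with no false positives, which reproduces the jackknife Mi row.

```python
import math


def class_metrics(tp, fn, fp, tn):
    """Return (sensitivity, specificity, MCC) from one-vs-rest confusion counts."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sens, spec, mcc


# Assumed counts for the Mi class of CL317 (inferred, not taken from the source).
sens, spec, mcc = class_metrics(tp=32, fn=2, fp=0, tn=283)
print(f"{100 * sens:.2f} {100 * spec:.2f} {mcc:.4f}")  # prints "94.12 100.00 0.9667"
```

Under the same assumption, two misclassified proteins out of 317 give an overall accuracy of 315/317, or 99.37%, in line with the OA row.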
Prediction performance of different test methods for protein subcellular localization on the ZW225 dataset
| Location | Self-consistency Sens (%) | Self-consistency Spec (%) | Self-consistency MCC | Jackknife Sens (%) | Jackknife Spec (%) | Jackknife MCC |
|---|---|---|---|---|---|---|
| Cy | 100 | 100 | 1 | 100 | 100 | 1 |
| Me | 100 | 100 | 1 | 100 | 100 | 1 |
| Mi | 100 | 100 | 1 | 100 | 100 | 1 |
| Nu | 100 | 100 | 1 | 100 | 100 | 1 |
| OA (%) | 100 | | | 100 | | |
Prediction performance of different test methods for protein subcellular localization on the ZD98 dataset
| Location | Self-consistency Sens (%) | Self-consistency Spec (%) | Self-consistency MCC | Jackknife Sens (%) | Jackknife Spec (%) | Jackknife MCC |
|---|---|---|---|---|---|---|
| Cy | 100 | 100 | 1 | 100 | 100 | 1 |
| Me | 100 | 100 | 1 | 100 | 100 | 1 |
| Mi | 100 | 100 | 1 | 100 | 98.82 | 0.9579 |
| Other | 100 | 100 | 1 | 91.67 | 100 | 0.9519 |
| OA (%) | 100 | | | 98.98 | | |
Figure 9. ROC curves for the ZD98 dataset
Prediction results with different methods on the CL317 dataset using the jackknife test (sensitivity for each class and overall accuracy, %)
| Method | Cy | Me | Mi | Se | Nu | En | OA |
|---|---|---|---|---|---|---|---|
| ID | 81.3 | 81.8 | 85.3 | 88.2 | 82.7 | 83.0 | 82.7 |
| ID_SVM | 91.1 | 89.1 | 79.4 | 58.8 | 73.1 | 87.2 | 84.2 |
| DF_SVM | 92.9 | 85.5 | 76.5 | 76.5 | 93.6 | 86.5 | 88.0 |
| FKNN | 93.8 | 92.7 | 82.4 | 76.5 | 90.4 | 93.6 | 90.9 |
| PseAAC_SVM | 93.8 | 90.9 | 85.3 | 76.5 | 90.4 | 95.7 | 91.1 |
| EN_FKNN | 98.2 | 83.6 | 79.4 | 82.4 | 90.4 | 97.9 | 91.5 |
| DWT_SVM | 100 | 98.2 | 82.4 | 94.1 | 100 | 100 | 97.5 |
| APSLAP | 99.1 | 89.1 | 85.3 | 88.2 | 84.3 | 95.8 | 92.4 |
| Liu et al. | 98.2 | 96.4 | 94.1 | 82.4 | 96.2 | 95.7 | 95.9 |
| Auto_Cova | 86.4 | 90.7 | 93.8 | 85.7 | 92.1 | 93.8 | 90.0 |
| DCCA coefficient | 91.1 | 92.7 | 82.4 | 76.5 | 80.8 | 93.6 | 88.3 |
| PseAAC-PsePSSM-WD | 100 | 100 | 94.1 | 100 | 100 | 100 | 99.4 |
Prediction results with different methods on the ZW225 dataset using the jackknife test (sensitivity for each class and overall accuracy, %)
| Method | Cy | Me | Mi | Nu | OA |
|---|---|---|---|---|---|
| ID_SVM | 92.9 | 91.0 | 68.0 | 73.2 | 85.8 |
| DF_SVM | 87.1 | 92.1 | 64.0 | 73.2 | 84.0 |
| FKNN | 84.3 | 93.3 | 72.0 | 85.5 | 85.8 |
| EN_FKNN | 94.3 | 94.4 | 60.0 | 80.5 | 88.0 |
| DWT_SVM | 87.1 | 93.2 | 64.0 | 90.2 | 87.6 |
| Liu et al. | 97.1 | 98.9 | 96.0 | 97.6 | 97.8 |
| Auto_Cova | 81.3 | 93.3 | 85.7 | 84.6 | 87.1 |
| EBGW_SVM | 90.0 | 93.3 | 60.0 | 63.4 | 83.1 |
| Dual-layer SVM | 91.4 | 94.4 | 76.0 | 78.1 | 88.4 |
| PseAAC-PsePSSM-WD | 100 | 100 | 100 | 100 | 100 |
Performance comparison on the independent testing dataset ZD98 using the jackknife test (sensitivity for each class and overall accuracy, %)
| Method | Cy | Me | Mi | Other | OA |
|---|---|---|---|---|---|
| ID | 90.7 | 90.0 | 92.3 | 91.7 | 90.8 |
| ID_SVM | 95.3 | 93.3 | 84.6 | 58.3 | 88.8 |
| DF_SVM | 97.7 | 96.7 | 92.3 | 75.0 | 93.9 |
| FKNN | 95.3 | 96.7 | 100 | 91.7 | 95.9 |
| PseAAC_SVM | 95.3 | 93.3 | 92.3 | 83.3 | 92.9 |
| DWT_SVM | 95.4 | 93.3 | 53.9 | 91.7 | 88.8 |
| APSLAP | 95.3 | 90.0 | 100 | 91.7 | 94.9 |
| Liu et al. | 95.3 | 100 | 100 | 91.7 | 96.9 |
| EBGW_SVM | 97.7 | 90.0 | 92.3 | 83.3 | 92.9 |
| DCCA coefficient | 93.0 | 86.7 | 92.3 | 75.0 | 88.9 |
| PseAAC-PsePSSM-WD | 100 | 100 | 100 | 91.7 | 99.0 |
Figure 10. Wavelet denoising using the db8 wavelet function with decomposition scale j = 4. The x axis indicates the residue position along the sequence; the y axis indicates the signal intensity
Figure 11. Flow chart of apoptosis protein subcellular localization prediction based on the PseAAC-PsePSSM-WD method