Xiaoli Ruan, Dongming Zhou, Rencan Nie, Yanbu Guo.
Abstract
Apoptosis proteins are strongly associated with many diseases and play an indispensable role in maintaining the dynamic balance between cell death and cell division in vivo. Knowing where apoptosis proteins are localized is necessary for understanding their function. To date, few researchers have addressed class imbalance in apoptosis protein datasets before classification, even though such imbalance readily leads to misclassification. In this work, we therefore introduce a method that addresses this problem and improves prediction accuracy. First, features of the protein sequence are extracted from the position-specific scoring matrix (PSSM) by combining the Improving Pseudo-Position-Specific Scoring Matrix (IM-Psepssm) with the Bidirectional Correlation Coefficient (Bid-CC) algorithm. Second, different feature fusion and resampling strategies are used to reduce the impact of imbalance in the apoptosis protein datasets. Finally, the resulting feature vectors are used to train a Support Vector Machine (SVM) classification model, and prediction accuracy is evaluated with jackknife cross-validation tests. The experimental results indicate that, with the same feature vector, resampling markedly improves many key indicators relative to no resampling when predicting the localization of apoptosis proteins on the ZD98, ZW225, and CL317 datasets. We also provide user-friendly local software; the code and software are freely available at https://github.com/ruanxiaoli/Im-Psepssm.
Year: 2020 PMID: 32420339 PMCID: PMC7201498 DOI: 10.1155/2020/4071508
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
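The jackknife test mentioned in the abstract leaves each protein out once, trains on the remaining samples, and predicts the held-out one. The following is a minimal sketch of that loop, not the authors' implementation: it uses a 1-nearest-neighbour rule as a stand-in for the SVM, and the function names and the toy 2-D feature vectors are assumptions made purely for illustration.

```python
from math import dist


def nearest_neighbor_predict(train_X, train_y, query):
    """Stand-in classifier (1-NN); the paper itself trains an SVM."""
    best_index = min(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    return train_y[best_index]


def jackknife_predictions(X, y, predict=nearest_neighbor_predict):
    """Hold out each sample once, fit on the rest, predict the held-out sample."""
    predictions = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        predictions.append(predict(train_X, train_y, X[i]))
    return predictions


# Toy, made-up feature vectors for two locations (not real PSSM features).
X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (1.0, 1.1), (0.9, 1.0), (1.1, 0.9)]
y = ["Cy", "Cy", "Cy", "Me", "Me", "Me"]

predicted = jackknife_predictions(X, y)
overall_accuracy = sum(p == t for p, t in zip(predicted, y)) / len(y)
```

Any classifier with the same `predict(train_X, train_y, query)` shape could be swapped in; the loop itself is what makes the evaluation a jackknife test.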
Figure 1. Framework of the proposed prediction model.
Data and the distribution of the sequence identity percentage for the apoptosis protein datasets.

| Datasets | ≤40% | 41%–80% | 81%–90% | ≥91% | Cy | Me | Mi | Nu | En | Se | Sum |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CL317 | 40.1 | 15.5 | 18.9 | 25.6 | 112 | 55 | 34 | 52 | 47 | 17 | 317 |
| ZW225 | 52.9 | 16 | 16 | 15.1 | 70 | 89 | 25 | 41 | — | — | 225 |

| Dataset | ≤40% | 41%–80% | 81%–90% | ≥91% | Cy | Me | Mi | Others | Sum |
|---|---|---|---|---|---|---|---|---|---|
| ZD98 | 34.69 | 30.61 | 17.35 | 17.35 | 43 | 30 | 13 | 12 | 98 |
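The class counts in the table above show how uneven the datasets are. As a quick illustration, the sketch below computes the ratio of the largest to the smallest class for each dataset; the counts are copied from the table, while the `imbalance_ratio` helper is a hypothetical name and this ratio is only one common way to summarise imbalance, not a measure the paper is known to report.

```python
# Class counts copied from the dataset table.
class_counts = {
    "CL317": {"Cy": 112, "Me": 55, "Mi": 34, "Nu": 52, "En": 47, "Se": 17},
    "ZW225": {"Cy": 70, "Me": 89, "Mi": 25, "Nu": 41},
    "ZD98": {"Cy": 43, "Me": 30, "Mi": 13, "Others": 12},
}


def imbalance_ratio(counts):
    """Largest class size divided by smallest class size (hypothetical helper)."""
    return max(counts.values()) / min(counts.values())


totals = {name: sum(counts.values()) for name, counts in class_counts.items()}
ratios = {name: round(imbalance_ratio(counts), 2) for name, counts in class_counts.items()}
```

On these counts the largest class is roughly 6.6 times the smallest in CL317 and roughly 3.6 times the smallest in ZW225 and ZD98.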
Figure 2. The segment of submatrix for Psepssm and IM-Psepssm.
Figure 3. Effect of selecting different values of ξ on CL317, ZW225, and ZD98 datasets by jackknife test.
Figure 4. Effect of selecting different values of k on CL317, ZW225, and ZD98 datasets by jackknife test.
Contribution of the two feature submodels to the final overall accuracy (%).
| Datasets | Index | IM-Psepssm | T1-IM-PSSM | T2-IM-PSSM | T3-IIM-PSSM | BID-CC | BIM-PSSM |
|---|---|---|---|---|---|---|---|
| ZD98 | Sn | 95.92 | 93.26 | 93.84 | 88.0 | 93 | 94.41 |
| | Sp | 98.59 | 97.85 | 98.14 | 96.49 | 97.60 | 98.27 |
| | | 96.59 | 94.77 | 95.97 | 94.11 | 96.19 | 97.88 |
| | Mcc | 93.82 | 90.41 | 92.0 | 86.28 | 91.52 | 94.32 |
| | | 97.23 | 95.48 | 95.90 | 91.81 | 95.20 | 96.24 |
| | OA | 95.91 | 93.87 | 94.89 | 90.81 | 93.87 | 95.91 |
| ZW225 | Sn | 83.80 | 79.67 | 78.03 | 75.87 | 81 | 84.55 |
| | Sp | 95.28 | 93.82 | 93.34 | 92.68 | 94.4 | 95.52 |
| | | 91.36 | 88.24 | 87.39 | 87.24 | 89.32 | 90.34 |
| | Mcc | 81.00 | 75.27 | 73.36 | 71.81 | 77.36 | 80.69 |
| | | 89.21 | 86.29 | 85.11 | 83.55 | 87.29 | 89.79 |
| | OA | 86.66 | 82.66 | 81.33 | 80 | 84.44 | 87.55 |
| CL317 | Sn | 88.21 | 88.64 | 89.25 | 81.64 | 87.54 | 88.84 |
| | Sp | 97.88 | 97.76 | 97.89 | 96.79 | 97.85 | 98 |
| | | 93.97 | 92.91 | 93.93 | 93.09 | 93.80 | 95.42 |
| | Mcc | 87.22 | 86.38 | 87.66 | 82.18 | 86.71 | 88.88 |
| | | 92.87 | 93.06 | 93.44 | 88.56 | 92.45 | 93.22 |
| | OA | 90.22 | 89.58 | 90.22 | 86.12 | 90.22 | 91.48 |
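The indices Sn, Sp, Mcc, and OA in the table above have standard definitions in terms of true/false positives and negatives. The sketch below computes them for one class treated as positive against all others; it uses the textbook formulas, and how the paper aggregates per-class values into the single figures reported per dataset is not shown in this excerpt, so that step is deliberately left out. The labels are made up for illustration.

```python
from math import sqrt


def one_vs_rest_metrics(y_true, y_pred, positive):
    """Sensitivity, specificity and MCC for one class versus all others."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Mcc": mcc}


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted location matches the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up labels for illustration only.
y_true = ["Cy", "Cy", "Cy", "Cy", "Me", "Me", "Me", "Mi"]
y_pred = ["Cy", "Cy", "Cy", "Me", "Me", "Me", "Cy", "Mi"]

cy = one_vs_rest_metrics(y_true, y_pred, "Cy")
oa = overall_accuracy(y_true, y_pred)
```

The values come out as fractions in [0, 1] (MCC in [-1, 1]); multiplying by 100 gives the percentage scale used in the tables.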
Figure 5. Comparison of sampling methods for each class of samples. (a) ZD98. (b) ZW225. (c) CL317.
Performance comparison of original data and sampling methods for each class of sample.
| Dataset | Location | Original dataset | | | Resampling 1 | | | Resampling 2 | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | | Mcc | | | Mcc | | | Mcc | | |
| ZD98 | Cy | 91.83 | 93.94 | 96.1 | 95.17 | 97.56 | 97.57 | 97.3 | 100 | 97.7 |
| | Me | 100 | 100 | 100 | 98.35 | 100 | 98.83 | 100 | 100 | 100 |
| | Mi | 90.24 | 100 | 91.29 | 96.44 | 96.43 | 99.09 | 98.2 | 98.2 | 99.3 |
| | Others | 95.2 | 97.59 | 97.59 | 98.33 | 100 | 98.8 | 100 | 100 | 100 |
| CL317 | Cy | 86.43 | 91.26 | 93.7 | 96.56 | 96.56 | 99.63 | 97.31 | 97.28 | 99.71 |
| | Me | 94.95 | 98.7 | 96.58 | 98.86 | 99.43 | 99.43 | 100 | 100 | 100 |
| | Mi | 92.51 | 94.62 | 97.22 | 99.14 | 99.41 | 99.9 | 98.83 | 98.83 | 99.79 |
| | Se | 89.91 | 96.44 | 93.6 | 97.78 | 99.44 | 98.54 | 96.13 | 100 | 96.85 |
| | Nu | 82.62 | 91.48 | 90.76 | 95.2 | 96.98 | 97.98 | 99.37 | 99.37 | 99.9 |
| | En | 86.87 | 100 | 87.45 | 96.23 | 99.4 | 97.28 | 99.41 | 99.41 | 99.89 |
| ZW225 | Cy | 79.44 | 88.62 | 89.98 | 91.65 | 92.79 | 97.61 | 94.7 | 94.64 | 99.11 |
| | Me | 89.75 | 94.37 | 93.85 | 97.85 | 97.85 | 99.53 | 98.9 | 98.9 | 99.76 |
| | Mi | 78.83 | 92.03 | 86.52 | 97.4 | 99.13 | 98.54 | 99.15 | 100 | 99.42 |
| | Nu | 76.62 | 86.35 | 88.81 | 94.69 | 100 | 96.18 | 96.47 | 100 | 97.5 |
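This excerpt does not say which algorithms "Resampling 1" and "Resampling 2" refer to. To make the general idea of rebalancing concrete, the sketch below uses plain random oversampling, which is a swapped-in, simpler technique and should not be read as either of the paper's methods. The class sizes are the ZD98 counts from the dataset table; the feature tuples and the `random_oversample` name are placeholders.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of smaller classes up to the largest class size."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        pool = [x for x, t in zip(X, y) if t == label]
        for _ in range(target - count):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


# ZD98 class sizes from the dataset table; features are placeholder tuples.
sizes = {"Cy": 43, "Me": 30, "Mi": 13, "Others": 12}
y = [label for label, n in sizes.items() for _ in range(n)]
X = [(i,) for i in range(len(y))]

X_bal, y_bal = random_oversample(X, y)
balanced_counts = Counter(y_bal)
```

One design point worth keeping in mind: if copies of a held-out sample remain in the training set during a jackknife test, the accuracy estimate can be optimistic. Whether and how the authors guard against this is not stated in this excerpt.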
Performance comparison of different models on the ZD98 dataset (prediction accuracy, %).

| Methods | Cy | Me | Mi | Others | OA |
|---|---|---|---|---|---|
| OF-SVM | 97.7 | 86.3 | 92.3 | 66.7 | 90.8 |
| FTD-SVM | 95.4 | 93.3 | 76.9 | 83.3 | 90.8 |
| BOW-SVM | 97.7 | 92.9 | 76.9 | 83.3 | 91.7 |
| GA_DCCA-SVM | 95.4 | 90.0 | 92.3 | 83.3 | 91.8 |
| OA-SVM | 95.3 | 88.9 | 97.4 | 91.7 | 93.2 |
| PSSMP | 95.3 | 93.3 | 84.6 | 91.7 | 92.9 |
| OA-MLSC | 100 | 96.7 | 92.3 | 95.9 | 96.7 |
| This paper | | | | | |
Performance comparison of different models on the ZW225 dataset (prediction accuracy, %).

| Methods | Cy | Me | Mi | Nu | OA |
|---|---|---|---|---|---|
| OF-SVM | 85.7 | 91.0 | 68.0 | 82.9 | 85.3 |
| FTD-SVM | 88.6 | 93.3 | 64.0 | 75.6 | 85.3 |
| GA_DCCA-SVM | 87.1 | 91.0 | 68.0 | 75.6 | 84.4 |
| OA-SVM | 93.3 | 92.1 | 96.0 | 93.5 | 92.2 |
| IACC-SVM | 88.6 | 92.1 | 64.0 | 75.6 | 84.9 |
| EN-FKNN | 94.3 | 94.4 | 60.0 | 80.5 | 88.0 |
| Dual-layer SVM | 91.4 | 94.4 | 76.0 | 78.1 | 88.4 |
| This paper | | | | | |
Performance comparison of different models on the CL317 dataset (prediction accuracy, %).

| Methods | Cy | Me | Mi | Se | Nu | En | OA |
|---|---|---|---|---|---|---|---|
| OF-SVM | 94.6 | 90.9 | 76.5 | 92.2 | 86.5 | 93.6 | 89.6 |
| FTD-SVM | 92.9 | 89.1 | 82.4 | 70.6 | 86.5 | 93.6 | 89.0 |
| BOW-SVM | 94.6 | 87.3 | 82.4 | 82.4 | 84.3 | 91.5 | 89.2 |
| GA_DCCA-SVM | 92.9 | 89.1 | 82.4 | 76.5 | 84.6 | 93.6 | 89.0 |
| OA-SVM | 96.1 | 95.7 | 93.9 | | 95.5 | 100 | 96.0 |
| IACC-SVM | 96.4 | 94.5 | 82.4 | 76.5 | 80.8 | 93.6 | 90.5 |
| PSSMP | 92.0 | 92.7 | 82.4 | 76.5 | 90.4 | 93.6 | 90.5 |
| EN-FKNN | 98.2 | 83.6 | 79.4 | 82.4 | 90.4 | 97.9 | 91.5 |
| OA-MLSC | 95.5 | 93.6 | 96.4 | 94.1 | 94.2 | 94.1 | 94.8 |
| This paper | | | 94.69 | | | | |