Xin Liu¹, Liang Wang²,³, Jian Li⁴, Junfeng Hu², Xiao Zhang⁵.
Abstract
BACKGROUND: Malonylation is a recently discovered post-translational modification that is associated with a variety of diseases, such as type 2 diabetes mellitus and several types of cancer. Compared with experimental identification of malonylation sites, computational methods are time-efficient and comparatively low-cost.
Keywords: Machine learning; Malonylation; Post-translational modification; Principal component analysis; Support vector machine
Year: 2020 PMID: 33225896 PMCID: PMC7682087 DOI: 10.1186/s12864-020-07166-w
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Comparison of accuracy of different CKSAAP feature combinations
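CKSAAP (composition of k-spaced amino acid pairs) counts, for each gap k, how often every ordered residue pair occurs with exactly k residues between its two members. The excerpt does not state the window length or which k values Fig. 1 compares, so the following is a minimal sketch under assumptions: the function `cksaap`, the parameter `k_max`, and normalising each gap's counts by the number of pairs at that gap are illustrative choices, not the authors' code.

```python
from itertools import product

# Assumption: only the 20 standard residues form pairs; pairs involving any other
# symbol (e.g. padding "X") are skipped but still count toward the denominator.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def cksaap(window: str, k_max: int) -> list[float]:
    """Composition of k-spaced amino acid pairs for k = 0..k_max.

    Returns (k_max + 1) * 400 features: for each gap k, the frequency of every
    ordered pair (a, b) in which exactly k residues separate a from b.
    """
    features: list[float] = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_pairs = len(window) - k - 1  # positions i with i + k + 1 inside the window
        for i in range(n_pairs):
            pair = window[i] + window[i + k + 1]
            if pair in counts:
                counts[pair] += 1
        denom = n_pairs if n_pairs > 0 else 1  # guard against very short windows
        features.extend(counts[p] / denom for p in PAIRS)
    return features
```

With `k_max = 3`, each window already yields 4 × 400 = 1600 features, which illustrates why a dimension-reduction step becomes attractive.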
5-fold cross-validation results for different feature dimensions after PCA
| dimensions | Acc (%) | Sen (%) | Spec (%) | F1 (%) | MCC (%) |
|---|---|---|---|---|---|
| 50 | 85.91 | 86.09 | 85.68 | 85.94 | 75.77 |
| 100 | | | | | |
| 150 | 90.20 | 91.00 | 89.48 | 90.30 | 82.31 |
| 200 | 88.11 | 89.53 | 86.65 | 88.39 | 79.02 |
| 250 | 84.61 | 86.15 | 83.10 | 84.73 | 73.91 |
| 300 | 82.31 | 84.34 | 80.36 | 82.27 | 70.89 |
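The tables report accuracy (Acc), sensitivity (Sen), specificity (Spec), F1 and the Matthews correlation coefficient (MCC) as percentages. The excerpt does not define these metrics, so the sketch below assumes the standard confusion-matrix definitions; `binary_metrics` is a hypothetical helper that scales every value, MCC included, by 100.

```python
import math


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Standard confusion-matrix metrics, returned as percentages.

    Assumes at least one positive and one negative sample (non-zero denominators).
    """
    acc = (tp + tn) / (tp + tn + fp + fn)
    sen = tp / (tp + fn)               # sensitivity: recall on positive sites
    spec = tn / (tn + fp)              # specificity: recall on negative sites
    f1 = 2 * tp / (2 * tp + fp + fn)   # harmonic mean of precision and sensitivity
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # ranges from -1 to 1
    scores = {"Acc": acc, "Sen": sen, "Spec": spec, "F1": f1, "MCC": mcc}
    return {name: 100 * value for name, value in scores.items()}
```

For example, `binary_metrics(tp=90, fn=10, tn=80, fp=20)` gives Acc = 85, Sen = 90 and Spec = 80; under this convention an MCC of about 0.70 is listed as roughly 70 (%).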
Fig. 2 ROC curves of 5-fold cross-validation performed by SVM (number of dimensions = 100)
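An ROC curve is usually summarised by its area under the curve (AUC), the quantity compared across approaches in the state-of-the-art table of this section. A minimal sketch, assuming classifier scores where larger means "more likely malonylated": the AUC equals the probability that a randomly chosen positive site outscores a randomly chosen negative site, with ties counted as one half. `roc_auc` is a hypothetical name, and this quadratic-time version favours clarity over speed.

```python
def roc_auc(scores_pos: list[float], scores_neg: list[float]) -> float:
    """AUC via its Mann-Whitney interpretation: the fraction of
    (positive, negative) pairs ranked correctly, ties counting one half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

In a 5-fold setting, one AUC is typically computed per held-out fold and the per-fold curves or values are then compared or averaged.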
Fig. 3 Comparison of metric results with and without PCA
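Principal component analysis (PCA) projects the high-dimensional feature vectors onto the directions of largest variance; the dimension table above varies the number of retained components from 50 to 300. The following is a minimal NumPy sketch via the singular value decomposition; `pca_reduce` and `n_components` are illustrative names, and the excerpt does not say which PCA implementation the authors used.

```python
import numpy as np


def pca_reduce(X: np.ndarray, n_components: int) -> np.ndarray:
    """Project the rows of X (samples x features) onto the top principal components."""
    X_centered = X - X.mean(axis=0)  # centre every feature column
    # Rows of Vt are principal directions, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    if n_components > Vt.shape[0]:
        raise ValueError("n_components exceeds min(n_samples, n_features)")
    return X_centered @ Vt[:n_components].T
```

Inside cross-validation, the column means and principal directions should be estimated on the training folds only and then applied to the held-out fold; fitting PCA on all samples at once leaks information from the test data.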
Performance comparison of different classical classifiers
| classifier | Acc (%) | Sen (%) | Spec (%) | F1 (%) | MCC (%) |
|---|---|---|---|---|---|
| KNN | 59.68 | 26.73 | 92.18 | 34.34 | 34.98 |
| NB | 83.24 | 84.17 | 82.36 | 83.39 | 72.11 |
| RF | 68.25 | 62.41 | 74.04 | 66.27 | 56.36 |
| Ensemble | 64.11 | 60.11 | 68.20 | 62.50 | 53.69 |
| Mal-Prec (SVM) | | | | | |
Performance of different feature combinations on the independent data set
| Features | Acc (%) | Sen (%) | Spec (%) | F1 (%) | MCC (%) |
|---|---|---|---|---|---|
| CKSAAP | 77.55 | 77.14 | 77.97 | 77.59 | 65.18 |
| AAindex | 61.73 | 65.43 | 57.97 | 63.26 | 52.61 |
| One-hot | 58.71 | 61.43 | 55.94 | 59.97 | 51.44 |
| CKSAAP (exclude) | 86.19 | 86.86 | 85.51 | 86.36 | 76.19 |
| AAindex (exclude) | 79.42 | 80.57 | 78.26 | 79.77 | 67.30 |
| One-hot (exclude) | 71.08 | 76.29 | 65.80 | 72.65 | 58.65 |
| ALL | 90.65 | 89.71 | 91.59 | 90.62 | 83.04 |
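One-hot (binary) encoding, one of the three feature types in the table above, represents each residue of a sequence window as an indicator vector. A minimal sketch, assuming a 21-letter alphabet in which "X" stands for padding or unknown residues; the alphabet, `one_hot`, and the fallback to "X" are assumptions of this illustration. One common way to form a combined representation such as "ALL" is to concatenate the per-window vectors of each feature type.

```python
# Assumption: 20 standard residues plus "X" for padding/unknown positions
ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def one_hot(window: str) -> list[int]:
    """Binary encoding: one len(ALPHABET)-dimensional indicator vector per position."""
    vec = [0] * (len(window) * len(ALPHABET))
    for pos, aa in enumerate(window):
        # Symbols outside the alphabet are mapped to "X"
        vec[pos * len(ALPHABET) + INDEX.get(aa, INDEX["X"])] = 1
    return vec
```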
Fig. 4 ROC curves performed by different feature combinations on the independent data set
Comparison of state-of-the-art approaches in terms of Acc and AUC in different organisms
| Approach | Feature | Species | Acc (%) | AUC (%) |
|---|---|---|---|---|
| mRMR+SVM [ | K-gram+AAindex | N/A | N/A | 79.35 |
| IG + SVM [ | AAC + BINA (sequence-based) | | 72.30 | 75.50 |
| | EBGW (physicochemical) | Mouse | 74.65 | 82.70 |
| | KNN + PSSM (evolutionary) | | 73.72 | 87.10 |
| IG + SVM [ | PKsaap+AAindex+DC | Mouse | N/A | 73.90 |
| N/A | 74.30 | |||
| LSTM+RF [ | EAAC+word embedding | Mouse | 88.00 | 82.40 |
| SVM [ | Binary+PSSM+AAindex+Structured (ASA + SS + HSE + IDR) | Mouse | N/A | 76.00 |
| Proposed method | AAindex+One-hot+CKSAAP | | | |
Fig. 5 Statistical two-sample logo of the human data sets, with positions tested by t-test (P-value < 0.05)
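A two-sample logo highlights residues whose frequency at a given window position differs between positive (malonylated) and negative windows. A minimal sketch of the per-position test statistic, assuming one 0/1 indicator per window; `welch_t` and `residue_t` are hypothetical names, and this sketch picks Welch's unequal-variance t statistic because the excerpt does not state which t-test variant the figure used.

```python
import math
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's two-sample t statistic; needs at least two values per group."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # Both groups are constant: either no difference or an unbounded one
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def residue_t(pos_windows: list[str], neg_windows: list[str],
              position: int, residue: str) -> float:
    """Compare how often `residue` occurs at `position` in positive versus
    negative windows, using one 0/1 indicator per window."""
    a = [1.0 if w[position] == residue else 0.0 for w in pos_windows]
    b = [1.0 if w[position] == residue else 0.0 for w in neg_windows]
    return welch_t(a, b)
```

Turning the statistic into a P-value requires the t distribution; SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` returns both the statistic and the P-value for the same Welch test.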
Fig. 6 Schematic illustration of the Mal-Prec method from protein data selection to k-fold cross-validation
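The workflow ends in k-fold cross-validation, reported above with k = 5: the samples are split into k folds, and each fold serves once as the test set while the remaining folds are used for training. A minimal sketch of the index split; `k_fold_indices` and its `seed` parameter are hypothetical, and stratifying the folds by class label is a common refinement not shown here.

```python
import random
from collections.abc import Iterator


def k_fold_indices(n_samples: int, k: int = 5,
                   seed: int = 0) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Samples are shuffled once; fold sizes differ by at most one.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size
```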
Nine physicochemical properties used in this study
| Properties description | Reference |
|---|---|
| Hydrophilicity value | Hopp and Woods [ |
| Mean polarity | Radzicka and Wolfenden [ |
| Isoelectric point | Zimmerman et al. [ |
| Refractivity | Treece et al. [ |
| Average flexibility indices | Bhaskaran and Ponnuswamy [ |
| Average volume of buried residues | Chothia [ |
| Electron-ion interaction potential values | Cosic [ |
| Transfer free energy to surface | Bull and Breese [ |
| Consensus normalized hydrophobicity | Eisenberg [ |
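The AAindex feature presumably replaces each residue in a window with its values for the nine physicochemical properties listed above. A minimal sketch under that assumption; `property_encode` is a hypothetical helper, the zero vector for residues missing from the table is an assumed convention, and the real per-residue values would have to be taken from the AAindex database entries for the cited scales.

```python
def property_encode(window: str, table: dict[str, list[float]]) -> list[float]:
    """Replace each residue with its vector of physicochemical property values.

    `table` maps a residue letter to its property values (nine in this study);
    residues missing from the table (e.g. padding) are encoded as zeros.
    """
    n_props = len(next(iter(table.values())))
    features: list[float] = []
    for aa in window:
        features.extend(table.get(aa, [0.0] * n_props))
    return features


# Placeholder values for illustration only, NOT real AAindex entries
HYPOTHETICAL_TABLE = {"A": [1.0, 2.0], "K": [3.0, 4.0]}
```

A window of length L thus yields L × 9 features with the nine scales used here.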