Alessia Auriemma Citarella, Luigi Di Biasi, Michele Risi, Genoveffa Tortora.
Abstract
BACKGROUND: SNARE proteins play an important role in different biological functions. This study investigates how a new class of molecular descriptors, called SNARER and related to the chemical-physical properties of proteins, contributes to the performance of binary classifiers for SNARE proteins.
Keywords: AdaBoost; KNN; Machine learning; Protein classification; Random forest; SNARE
Year: 2022 PMID: 35462533 PMCID: PMC9035248 DOI: 10.1186/s12859-022-04677-z
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Fig. 1 Visualization of the layers of the bundle of the fusion complex between the 4 parallel α-helices of the SNARE: 7 upstream layers (layers from −1 to −7) and 8 downstream layers (layers from +1 to +8) of the ionic layer (the layer 0) [4]
The SNARER descriptors
| Code | Description | Source |
|---|---|---|
| ARGP820102 | Signal sequence helical potential | AAindex |
| CHAM830101 | The Chou-Fasman parameter of the coil conformation | |
| CHAM830107 | A parameter of charge transfer capability | |
| CHAM830108 | A parameter of charge transfer donor capability | |
| CHOP780204-CHOP780206 | Normalized frequency of N-terminal helix-non helical region | |
| CHOP780205-CHOP780207 | Normalized frequency of C-terminal helix-non helical region | |
| EISD860101 | Solvation free energy | |
| FAUJ880108 | Localized electrical effect | |
| FAUJ880111 | Positive charge | |
| FAUJ880112 | Negative charge | |
| GUYH850101 | Partition energy | |
| JANJ780101 | Average accessible surface area | |
| KRIW790101 | Side chain interaction parameter | |
| ZIMJ680102 | Bulkiness | |
| ONEK900102 | Helix formation parameters (delta delta G) | |
| | Steric parameter | Fauchere et al. |
| | Polarizability | |
| | Volume | |
| | Isoelectric point | |
| | Helix probability | |
| | Sheet probability | |
| | Hydrophobicity | |
Fig. 2 Comparison between GAAC, CTDT, CKSAAP and 188D ACC with related extended classes with SNARER (on DUNI dataset)
Performance of average ACC on the DUNI dataset
| | Accuracy | | |
|---|---|---|---|
| Descriptor | RF (%) | KNN (%) | ADA (%) |
| GAAC | 76.1 | 85.1 | 77.9 |
| GAAC.ext | |||
| CTDT | 76.1 | 83.1 | 76.7 |
| CTDT.ext | |||
| CKSAAP | 91.1 | 90.04 | 83.7 |
| CKSAAP.ext | 90.02 | ||
| 188D | 93.9 | 94.8 | 88.1 |
| 188D.ext | |||
The highest values are shown in bold
Performance for average SN and SP on the DUNI dataset
| | Sensitivity | | | Specificity | | |
|---|---|---|---|---|---|---|
| Descriptor | RF (%) | KNN (%) | ADA (%) | RF (%) | KNN (%) | ADA (%) |
| GAAC | 90.3 | 83.6 | 7 | 7 | ||
| GAAC.ext | 97.2 | 60.6 | ||||
| CTDT | 89.1 | 83.3 | 7.1 | 65.6 | ||
| CTDT.ext | 96.6 | 54 | ||||
| CKSAAP | 97.8 | 98 | 89.9 | 71.7 | 66.7 | 65.5 |
| CKSAAP.ext | 97.8 | 98 | 66.7 | |||
| 188D | 92 | 85 | 89.5 | 76.7 | ||
| 188D.ext | 96.8 | 96.5 | ||||
The highest values are shown in bold
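The accuracy (ACC), sensitivity (SN) and specificity (SP) reported in these tables are standard confusion-matrix quantities. A minimal Python sketch of how they can be computed follows; the function names and the toy labels are illustrative assumptions, not the authors' code or data.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = SNARE, 0 = non-SNARE)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def acc_sn_sp(tp, tn, fp, fn):
    """Return (accuracy, sensitivity, specificity) as fractions in [0, 1]."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total if total else 0.0
    sn = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    sp = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    return acc, sn, sp


# Toy labels, made up for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
acc, sn, sp = acc_sn_sp(tp, tn, fp, fn)
print(f"ACC={acc:.3f} SN={sn:.3f} SP={sp:.3f}")
```

Multiplying each fraction by 100 gives percentages in the form used in the tables.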
Performance of the average ACC on the DUNI dataset with oversampling and subsampling
| | Oversampling | | | Subsampling | | |
|---|---|---|---|---|---|---|
| Descriptor | RF (%) | KNN (%) | ADA (%) | RF (%) | KNN (%) | ADA (%) |
| GAAC | 94.7 | 96.3 | 73.12 | 75.2 | 79.2 | 72.6 |
| GAAC.ext | ||||||
| CTDT | 93.9 | 96.1 | 70.4 | 74.6 | 78.1 | 71.7 |
| CTDT.ext | ||||||
| CKSAAP | 84 | 83.5 | ||||
| CKSAAP.ext | 99.01 | 98.6 | 79 | 84.2 | ||
| 188D | 98.5 | 98.90 | 89.5 | 93.1 | 86.6 | |
| 188D.ext | 98.5 | 94 | ||||
The highest values are shown in bold
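Oversampling and subsampling both rebalance the two classes before training. This record does not say which resampling procedure was used, so the sketch here assumes the simplest variants: random oversampling (duplicating minority-class rows) and random subsampling (dropping majority-class rows). The function names and the `seed` parameter are hypothetical.

```python
import random


def _group_by_class(X, y):
    """Group feature rows by their class label."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def random_subsample(X, y, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, target):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


# Toy imbalanced data: 3 positives and 7 negatives (illustrative only).
X = [[i] for i in range(10)]
y = [1] * 3 + [0] * 7
X_over, y_over = random_oversample(X, y)
X_sub, y_sub = random_subsample(X, y)
print(len(y_over), len(y_sub))
```

Resampling is normally applied to the training split only, so that evaluation data keep the original class ratio.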
Performance for average SN and SP on the DUNI dataset with oversampling
| | Sensitivity | | | Specificity | | |
|---|---|---|---|---|---|---|
| Descriptor | RF (%) | KNN (%) | ADA (%) | RF (%) | KNN (%) | ADA (%) |
| GAAC | 91.9 | 95 | 74.9 | 97.5 | 97.6 | 71.3 |
| GAAC.ext | ||||||
| CTDT | 88.9 | 94 | 68.8 | 98.8 | 98.2 | 72 |
| CTDT.ext | ||||||
| CKSAAP | 80.8 | 99.2 | 98.2 | 87.2 | ||
| CKSAAP.ext | 98.7 | 99.1 | 98.2 | |||
| 188D | 97.5 | 98.3 | 99.5 | 88.8 | ||
| 188D.ext | 98.3 | 89.4 | 99.3 | |||
The highest values are shown in bold
Performance for average SN and SP on the DUNI dataset with subsampling
| | Sensitivity | | | Specificity | | |
|---|---|---|---|---|---|---|
| Descriptor | RF (%) | KNN (%) | ADA (%) | RF (%) | KNN (%) | ADA (%) |
| GAAC | 75.7 | 76.1 | 73.6 | 74.6 | 82.2 | 71.7 |
| GAAC.ext | ||||||
| CTDT | 78.3 | 76.4 | 73.9 | 71 | 79.7 | 69.6 |
| CTDT.ext | ||||||
| CKSAAP | 83.3 | 70.3 | 83.7 | |||
| CKSAAP.ext | 76.4 | 98.2 | 81.5 | 70.3 | ||
| 188D | 90.9 | 88 | 94.6 | 85.1 | ||
| 188D.ext | 93.1 | 94.9 | ||||
The highest values are shown in bold
Performance of average ACC for the D128 dataset
| | Accuracy | | |
|---|---|---|---|
| Descriptor | RF (%) | KNN (%) | ADA (%) |
| GAAC | 71.1 | 64.2 | 70 |
| GAAC.ext | |||
| CTDT | 73.4 | 66.4 | 70.3 |
| CTDT.ext | |||
| CKSAAP | 92.2 | 72.4 | 80.7 |
| CKSAAP.ext | |||
| 188D | |||
| 188D.ext | 95.3 | 88.6 | 90 |
The highest values are shown in bold
Fig. 3 Comparison between GAAC, CTDT, CKSAAP and 188D ACC with related extended classes with SNARER (on D128 dataset)
Performance for average SN and SP on the D128 dataset
| | Sensitivity | | | Specificity | | |
|---|---|---|---|---|---|---|
| Descriptor | RF (%) | KNN (%) | ADA (%) | RF (%) | KNN (%) | ADA (%) |
| GAAC | 80.1 | 74.5 | 62.2 | 63 | 65.4 | |
| GAAC.ext | 62.2 | |||||
| CTDT | 74.7 | 70 | 72.2 | 62.3 | 70.5 | |
| CTDT.ext | 64.7 | |||||
| CKSAAP | 89.7 | 55.4 | 80.2 | 95 | 89.4 | 81.3 |
| CKSAAP.ext | 95 | |||||
| 188D | 88.5 | 95.1 | ||||
| 188D.ext | 95.5 | 88 | 95.1 | 89.2 | 91.2 | |
The highest values are shown in bold
Comparison of MCC for the DUNI and D128 datasets
| | | Matthews correlation coefficient | | |
|---|---|---|---|---|
| Descriptor | Dataset | MCC RF | MCC KNN | MCC ADA |
| GAAC.ext | DUNI | 0.61 | ||
| D128 | 0.69 | 0.32 | ||
| CTDT.ext | DUNI | 0.77 | 0.49 | |
| D128 | 0.77 | 0.39 | ||
| CKSAAP.ext | DUNI | 0.77 | 0.69 | |
| D128 | 0.53 | |||
| 188D.ext | DUNI | 0.84 | 0.70 | |
| D128 | 0.81 | |||
The highest values are shown in bold
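The Matthews correlation coefficient (MCC) summarizes all four confusion-matrix counts in one value between −1 and +1, which makes it informative on imbalanced data. A minimal sketch of the standard formula follows; the function name and the zero-denominator convention (returning 0.0) are assumptions made for illustration.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Convention assumed here: undefined MCC is reported as 0.0.
        return 0.0
    return numerator / denominator


# Toy counts, made up for illustration only.
print(round(matthews_corrcoef(tp=3, tn=4, fp=2, fn=1), 3))
```

A perfect classifier gives +1, a classifier that inverts every label gives −1, and a value near 0 indicates performance no better than chance.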
Fig. 4 Graphic visualization of MCC for RF, KNN and ADA algorithms
Average AUC and AUPRC on the DUNI dataset
| | AUC | | | AUPRC | | |
|---|---|---|---|---|---|---|
| Descriptor | RF | KNN | ADA | RF | KNN | ADA |
| GAAC | 0.84 | 0.81 | 0.78 | 0.93 | 0.89 | 0.89 |
| GAAC.ext | ||||||
| CTDT | 0.84 | 0.79 | 0.79 | 0.94 | 0.88 | 0.89 |
| CTDT.ext | ||||||
| CKSAAP | 0.98 | 0.84 | 0.89 | 0.99 | 0.90 | 0.96 |
| CKSAAP.ext | 0.98 | 0.84 | 0.99 | 0.90 | ||
| 188D | 0.98 | 0.94 | 0.94 | 0.99 | 0.96 | 0.98 |
| 188D.ext | 0.98 | 0.94 | 0.99 | 0.96 | 0.98 | |
The highest values are shown in bold
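The AUC values above measure how well a classifier's scores rank positive examples ahead of negative ones. As a rough illustration, the area under the ROC curve equals the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The sketch here uses that pairwise definition; the function name and the toy scores are hypothetical, and AUPRC is not covered.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy scores, made up for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(round(roc_auc(y_true, scores), 3))
```

The pairwise loop is quadratic in the number of examples, which is fine for small datasets; rank-based implementations are preferable at scale.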
Average AUC and AUPRC on the D128 dataset
| | AUC | | | AUPRC | | |
|---|---|---|---|---|---|---|
| Descriptor | RF | KNN | ADA | RF | KNN | ADA |
| GAAC | 0.76 | 0.64 | 0.75 | 0.76 | 0.61 | 0.74 |
| GAAC.ext | ||||||
| CTDT | 0.82 | 0.66 | 0.77 | 0.84 | 0.62 | 0.78 |
| CTDT.ext | ||||||
| CKSAAP | 0.97 | 0.72 | 0.90 | 0.98 | 0.70 | 0.91 |
| CKSAAP.ext | 0.97 | 0.98 | ||||
| 188D | 0.99 | 0.97 | 0.99 | 0.97 | ||
| 188D.ext | 0.99 | 0.89 | 0.97 | 0.99 | 0.85 | 0.97 |
The highest values are shown in bold
Comparison with reference literature
| Authors | Methods | ACC | SP | SN |
|---|---|---|---|---|
| Kloepper et al. | HMM | 95% | – | – |
| Nguyen et al. | 2D-CNN | 89.7% | 93.5% | 76.6% |
| Guilin Li | 188D-RF-oversample | 90–95% | 95–100% | 75–80% |
| (highest value) | RF-188D.ext | 95.3% | 95.1% | 95.5% |
| (best value) | RF-CKSAAP.ext | 92.3% | 95% | 90.1% |
The highest values are shown in bold