Liu Liu, Xiuzhen Hu, Zhenxing Feng, Xiaojin Zhang, Shan Wang, Shuang Xu, Kai Sun.
Abstract
BACKGROUND: Proteins perform their functions by interacting with acid radical ions. Precisely predicting the binding residues of acid radical ion ligands remains a challenging task in molecular drug design.
Keywords: Acid radical ions; Binding residues; K-nearest neighbors classifier
Year: 2019 PMID: 31823720 PMCID: PMC6904995 DOI: 10.1186/s12860-019-0238-8
Source DB: PubMed Journal: BMC Mol Cell Biol ISSN: 2661-8850
Benchmark dataset of four acid radical ions
| Acid radical ion | Chains | Positive segments | Negative segments |
|---|---|---|---|
| NO₂⁻ | 22 | 98 | 8144 |
| CO₃²⁻ | 62 | 316 | 22,766 |
| SO₄²⁻ | 303 | 2125 | 99,729 |
| PO₄³⁻ | 339 | 2168 | 112,279 |
Hydrophilic-hydrophobic classification of amino acids
| Classification | Amino Acids | Classification | Amino Acids |
|---|---|---|---|
| strongly hydrophilic | R, D, E, N, Q, K, H | Proline | P |
| strongly hydrophobic | L, I, V, A, M, F | Glycine | G |
| weakly hydrophilic | S, T, Y, W | Cysteine | C |
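A grouping like this is typically used to recode a residue segment into a reduced alphabet before features are computed. The following is a minimal sketch of that recoding, not the authors' implementation: the one-letter class codes (`H`, `O`, `W`, `P`, `G`, `C`), the `recode` helper, and the `X` placeholder for non-standard residues are all assumptions made for illustration.

```python
# Six-class grouping of the 20 standard amino acids.
# The one-letter class codes are hypothetical labels chosen for this sketch.
CLASSES = {
    "H": "RDENQKH",  # strongly hydrophilic
    "O": "LIVAMF",   # strongly hydrophobic
    "W": "STYW",     # weakly hydrophilic
    "P": "P",        # proline
    "G": "G",        # glycine
    "C": "C",        # cysteine
}

# Invert the grouping: amino acid letter -> class code.
AA_TO_CLASS = {aa: code for code, members in CLASSES.items() for aa in members}


def recode(segment: str) -> str:
    """Map each residue to its class code; unknown residues become 'X'."""
    return "".join(AA_TO_CLASS.get(aa, "X") for aa in segment.upper())


example = recode("RLSPGC")
```

Each of the 20 standard residues falls into exactly one class, so the recoded segment has the same length as the input.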
Fig. 1 The relation between the values of k and MCC for SO₄²⁻
Fig. 2 The MCC values at different L
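The k-versus-MCC relation in Fig. 1 comes from scanning candidate k values and scoring each one. The sketch below illustrates such a scan with a plain Euclidean KNN and leave-one-out validation. The validation protocol, the distance measure, the function names, and the toy two-dimensional samples are all assumptions for illustration and are not taken from the paper. Only odd k values are scanned, matching the odd optimal k values reported in the tables and avoiding tied votes.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def knn_predict(train, query, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def scan_k(samples, k_values):
    """Leave-one-out MCC for every candidate k; returns {k: mcc}."""
    scores = {}
    for k in k_values:
        tp = fp = tn = fn = 0
        for i, (x, y) in enumerate(samples):
            rest = samples[:i] + samples[i + 1:]  # hold out sample i
            pred = knn_predict(rest, x, k)
            if pred == 1:
                tp, fp = tp + (y == 1), fp + (y == 0)
            else:
                tn, fn = tn + (y == 0), fn + (y == 1)
        scores[k] = mcc(tp, fp, tn, fn)
    return scores


# Toy data (invented): four non-binding (0) and four binding (1) points.
samples = [
    ((0.0, 0.0), 0), ((0.1, 0.2), 0), ((0.2, 0.1), 0), ((0.3, 0.3), 0),
    ((1.0, 1.0), 1), ((0.9, 1.1), 1), ((1.1, 0.9), 1), ((0.8, 0.8), 1),
]
scores = scan_k(samples, k_values=(1, 3, 5, 7))
best_k = max(scores, key=scores.get)  # first k with the highest MCC
```

On this toy set the score collapses once k exceeds the number of same-class neighbours, which is the kind of dependence on k that a plot of k against MCC makes visible.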
Evaluation metrics of position combination features at different L for PO₄³⁻
| L | Optimal k value | Sn (%) | Sp (%) | Acc (%) | FPR (%) | MCC |
|---|---|---|---|---|---|---|
| 5 | 77 | 76.6 | 69.6 | 73.1 | 30.4 | 0.463 |
| 7 | 23 | 76.9 | 71.8 | 74.4 | 28.2 | 0.488 |
| 9 | 33 | 76.1 | 73.8 | 75.0 | 26.2 | 0.500 |
| 11 | 17 | 76.2 | 74.5 | 75.3 | 25.5 | 0.507 |
| 13 | 15 | 76.0 | 75.4 | 75.7 | 24.6 | 0.514 |
| 15 | 21 | 76.7 | 74.4 | 75.5 | 25.6 | 0.510 |
| 17 | 21 | 78.0 | 72.8 | 75.4 | 27.2 | 0.508 |
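The Acc, FPR, and MCC columns in this table appear consistent with evaluation on equal numbers of positive and negative segments. That is an inference from the numbers, not something stated here. Under that assumption all three follow directly from Sn and Sp, as the sketch below shows for the L = 13 row. The function name `balanced_metrics` is hypothetical.

```python
import math


def balanced_metrics(sn: float, sp: float):
    """Return (Acc, FPR, MCC) as fractions from Sn and Sp.

    Assumes equal numbers of positive and negative samples, so that
    TP = sn, FN = 1 - sn, TN = sp, FP = 1 - sp (per unit of class size).
    """
    acc = (sn + sp) / 2
    fpr = 1 - sp
    mcc = (sn + sp - 1) / math.sqrt((sn + 1 - sp) * (sp + 1 - sn))
    return acc, fpr, mcc


# L = 13 row: Sn = 76.0 %, Sp = 75.4 %
acc, fpr, mcc_value = balanced_metrics(0.760, 0.754)
```

Rounded to the table's precision, this reproduces the tabulated Acc of 75.7 %, FPR of 24.6 %, and MCC of 0.514 for that row.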
Performance of the amino acid composition feature with the KNN classifier
| Ligand | Optimal k value | Sn (%) | Sp (%) | Acc (%) | FPR (%) | MCC |
|---|---|---|---|---|---|---|
| NO₂⁻ | 23 | 54.1 | 72.4 | 63.3 | 27.6 | 0.270 |
| CO₃²⁻ | 37 | 63.9 | 52.8 | 58.4 | 47.2 | 0.169 |
| SO₄²⁻ | 91 | 59.3 | 61.1 | 60.2 | 38.9 | 0.204 |
| PO₄³⁻ | 41 | 62.9 | 62.1 | 62.5 | 37.9 | 0.250 |
Performance of the composition combination features with the KNN classifier
| Ligand | Optimal k value | Sn (%) | Sp (%) | Acc (%) | FPR (%) | MCC |
|---|---|---|---|---|---|---|
| NO₂⁻ | 77 | 57.1 | 76.5 | 66.8 | 23.5 | 0.343 |
| CO₃²⁻ | 35 | 63.6 | 59.5 | 61.6 | 40.5 | 0.231 |
| SO₄²⁻ | 25 | 66.0 | 61.7 | 63.9 | 38.3 | 0.277 |
| PO₄³⁻ | 71 | 69.1 | 66.3 | 67.7 | 33.7 | 0.355 |
Performance of the position combination features with the KNN classifier
| Ligand | Optimal k value | Sn (%) | Sp (%) | Acc (%) | FPR (%) | MCC |
|---|---|---|---|---|---|---|
| NO₂⁻ | 75 | 81.6 | 61.2 | 71.4 | 38.8 | 0.438 |
| CO₃²⁻ | 31 | 75.6 | 67.7 | 71.7 | 32.3 | 0.435 |
| SO₄²⁻ | 33 | 73.5 | 71.2 | 72.3 | 28.8 | 0.447 |
| PO₄³⁻ | 15 | 76.0 | 75.4 | 75.7 | 24.6 | 0.514 |
Comparison of prediction results of three features
| Ligand | Feature | Optimal k value | Sn (%) | Sp (%) | Acc (%) | FPR (%) | MCC |
|---|---|---|---|---|---|---|---|
| NO₂⁻ | C | 77 | 57.1 | 76.5 | 66.8 | 23.5 | 0.343 |
| | P | 75 | 81.6 | 61.2 | 71.4 | 38.8 | 0.438 |
| | R | 75 | 81.6 | 79.6 | 80.6 | 20.4 | 0.612 |
| CO₃²⁻ | C | 35 | 63.6 | 59.5 | 61.6 | 40.5 | 0.231 |
| | P | 31 | 75.6 | 67.7 | 71.7 | 32.3 | 0.435 |
| | R | 115 | 74.4 | 78.5 | 76.4 | 21.5 | 0.529 |
| SO₄²⁻ | C | 25 | 66.0 | 61.7 | 63.9 | 38.3 | 0.277 |
| | P | 33 | 73.5 | 71.2 | 72.3 | 28.8 | 0.447 |
| | R | 37 | 75.8 | 69.2 | 72.5 | 30.8 | 0.450 |
| PO₄³⁻ | C | 71 | 69.1 | 66.3 | 67.7 | 33.7 | 0.355 |
| | P | 15 | 76.0 | 75.4 | 75.7 | 24.6 | 0.514 |
| | R | 61 | 76.4 | 74.0 | 75.2 | 26.0 | 0.504 |
The data of the training dataset and independent test dataset
| Ligand | Training chains | Training Pᵃ | Training Nᵇ | Test chains | Test Pᵃ | Test Nᵇ |
|---|---|---|---|---|---|---|
| NO₂⁻ | 17 | 76 | 6218 | 5 | 22 | 1926 |
| CO₃²⁻ | 49 | 252 | 18,066 | 13 | 64 | 4700 |
| SO₄²⁻ | 242 | 1751 | 79,164 | 61 | 374 | 20,565 |
| PO₄³⁻ | 271 | 1730 | 90,786 | 68 | 438 | 21,493 |
ᵃ The number of positive (binding) samples
ᵇ The number of negative (non-binding) samples
Comparison of our independent test with IonSeq
| Ligand | Method | L | Optimal k value | Sn (%) | Sp (%) | Acc (%) | FPR (%) | MCC |
|---|---|---|---|---|---|---|---|---|
| NO₂⁻ | IonSeq | 11 | – | 18.00 | 99.78 | 98.79 | – | 0.2847 |
| | Ours | 13 | 75 | 40.90 | 98.60 | 97.90 | 1.40 | 0.3100 |
| CO₃²⁻ | IonSeq | 13 | – | 10.62 | 99.82 | 98.58 | – | 0.2127 |
| | Ours | 15 | 115 | 48.40 | 95.00 | 94.40 | 5.00 | 0.2170 |
| SO₄²⁻ | IonSeq | 11 | – | 13.65 | 99.32 | 97.53 | – | 0.1906 |
| | Ours | 13 | 37 | 43.90 | 86.80 | 85.80 | 13.20 | 0.1160 |
| PO₄³⁻ | IonSeq | 11 | – | 24.15 | 99.38 | 97.95 | – | 0.3121 |
| | Ours | 13 | 61 | 63.20 | 84.60 | 84.20 | 15.40 | 0.1810 |
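On the independent test sets the classes are heavily imbalanced, so Acc and MCC have to be computed from the full confusion matrix rather than from Sn and Sp alone. The sketch below does this for the NO₂⁻ row of our method. The TP and TN counts are not given in the tables; they are reconstructed here by rounding Sn and Sp against the independent-test sample counts (22 positive, 1926 negative), which is an assumption of this example. The function name `metrics_from_counts` is hypothetical.

```python
import math


def metrics_from_counts(tp: int, fp: int, tn: int, fn: int):
    """Return (Sn, Sp, Acc, MCC) as fractions from confusion-matrix counts."""
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, acc, mcc


# NO2- independent test set: 22 positive and 1926 negative samples.
positives, negatives = 22, 1926

# Counts reconstructed from the reported Sn = 40.90 % and Sp = 98.60 %
# (an assumption; the tables do not list them).
tp = round(0.409 * positives)
tn = round(0.986 * negatives)
fn = positives - tp
fp = negatives - tn

sn, sp, acc, mcc_value = metrics_from_counts(tp, fp, tn, fn)
```

With these reconstructed counts the result rounds to Acc 97.9 % and MCC 0.310, in line with the tabulated row. It also shows why accuracy stays high while sensitivity is low: the negatives dominate the total.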
Benchmark dataset of six metal ion ligands
| Metal ions | Chains | Binding residues | Non-binding residues |
|---|---|---|---|
| Zn²⁺ | 1428 | 6408 | 405,113 |
| Fe²⁺ | 92 | 382 | 29,345 |
| Fe³⁺ | 217 | 1057 | 68,829 |
| Cu²⁺ | 117 | 485 | 33,948 |
| Mn²⁺ | 459 | 2124 | 156,625 |
| Co²⁺ | 194 | 875 | 55,050 |
Comparison of results between the KNN classifier and SVM
| Ligand | Method | L | Optimal k value | Sn (%) | Sp (%) | Acc (%) | FPR (%) | MCC |
|---|---|---|---|---|---|---|---|---|
| Zn²⁺ | Ours | 7 | 103 | 94.3 | 83.8 | 89.1 | 16.2 | 0.786 |
| | SVM | – | – | 99.8 | 99.5 | 99.7 | – | 0.993 |
| Fe²⁺ | Ours | 9 | 41 | 92.1 | 80.4 | 86.3 | 19.6 | 0.730 |
| | SVM | – | – | 91.9 | 90.7 | 91.3 | – | 0.826 |
| Fe³⁺ | Ours | 9 | 15 | 84.6 | 84.9 | 84.7 | 15.1 | 0.694 |
| | SVM | – | – | 86.9 | 88.7 | 87.8 | – | 0.756 |
| Cu²⁺ | Ours | 13 | 49 | 92.4 | 86.6 | 89.5 | 13.4 | 0.791 |
| | SVM | – | – | 95.5 | 97.1 | 96.3 | – | 0.926 |
| Mn²⁺ | Ours | 7 | 23 | 79.1 | 80.9 | 80.0 | 19.1 | 0.600 |
| | SVM | – | – | 82.1 | 84.4 | 83.2 | – | 0.664 |
| Co²⁺ | Ours | 11 | 99 | 77.6 | 83.1 | 80.3 | 16.9 | 0.608 |
| | SVM | – | – | 80.8 | 85.1 | 83.0 | – | 0.660 |