Xinke Zhan, Zhuhong You, Changqing Yu, Liping Li, Jie Pan.
Abstract
Identifying drug-target interactions (DTIs) plays an essential role in new drug development. However, knowledge of DTIs is still limited, and a significant number of DTI pairs remain unknown. Moreover, traditional experimental methods are costly and time-consuming. Developing computational methods for predicting DTIs is therefore attracting increasing attention. In this study, we report a novel computational approach for predicting DTIs using the GIST feature, the position-specific scoring matrix (PSSM), and rotation forest (RF). Specifically, each target protein is first converted into a PSSM to retain evolutionary information. The GIST feature is then extracted from the PSSM, and substructure fingerprint information is used to represent the drug. Finally, the protein and drug features are combined to form a drug-target pair vector, which is used as the input feature for the RF classifier. In the experiments, the proposed method achieves average accuracies of 89.25%, 85.93%, 82.36%, and 73.89% on the enzyme, ion channel, G protein-coupled receptor (GPCR), and nuclear receptor datasets, respectively. To further evaluate its prediction performance, we compare it with the state-of-the-art support vector machine (SVM) classifier on the same gold standard dataset. These promising results illustrate that the proposed method is more effective and stable than other methods. We expect the proposed method to be a useful tool for predicting large-scale DTIs.
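The last step of the pipeline, combining a protein descriptor with a drug substructure fingerprint into one drug-target pair vector, can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the toy identifiers, and the short toy vectors are hypothetical, PSSM/GIST extraction and the rotation forest classifier are out of scope, and treating every unlisted pair as a negative is an assumption made only to keep the example small.

```python
from itertools import product


def make_pair_vector(protein_features, drug_fingerprint):
    """Concatenate a protein descriptor and a drug fingerprint into one vector."""
    return list(protein_features) + [float(bit) for bit in drug_fingerprint]


def build_dataset(proteins, drugs, known_interactions):
    """Return (vector, label) samples for every drug-target combination.

    label 1 = pair listed in known_interactions, label 0 = pair not listed
    (an illustrative assumption, not a statement about the source's sampling).
    """
    samples = []
    for (drug_id, fp), (prot_id, feat) in product(drugs.items(), proteins.items()):
        label = 1 if (drug_id, prot_id) in known_interactions else 0
        samples.append((make_pair_vector(feat, fp), label))
    return samples


# Hypothetical toy inputs: 3-value protein descriptors, 4-bit drug fingerprints.
proteins = {"P1": [0.12, 0.80, 0.33], "P2": [0.45, 0.10, 0.91]}
drugs = {"D1": [1, 0, 1, 0], "D2": [0, 1, 1, 1]}
known = {("D1", "P1"), ("D2", "P2")}

dataset = build_dataset(proteins, drugs, known)
for vector, label in dataset:
    print(label, vector)
```

Each resulting vector has the protein features first and the fingerprint bits after them, so its length is the sum of the two input lengths; any classifier that accepts fixed-length numeric vectors could consume these samples.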
Year: 2020 PMID: 32908888 PMCID: PMC7463380 DOI: 10.1155/2020/4516250
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Statistics of the four drug-target datasets.
| Dataset | Drugs | Target proteins | Interactions |
|---|---|---|---|
| Enzyme | 445 | 664 | 2926 |
| Ion channels | 210 | 204 | 1476 |
| GPCRs | 223 | 95 | 635 |
| Nuclear receptors | 54 | 26 | 90 |
5-fold cross-validation results of the proposed method on the enzyme dataset.
| Testing set | Accuracy (%) | Precision (%) | Sensitivity (%) | MCC (%) | AUC | AUPR |
|---|---|---|---|---|---|---|
| 1 | 88.89 | 89.70 | 87.86 | 80.24 | 0.9570 | 0.8579 |
| 2 | 89.66 | 91.07 | 88.14 | 81.45 | 0.9450 | 0.8920 |
| 3 | 88.63 | 88.87 | 88.10 | 79.85 | 0.9444 | 0.8803 |
| 4 | 89.91 | 91.97 | 87.20 | 81.83 | 0.9538 | 0.8728 |
| 5 | 89.15 | 91.88 | 86.13 | 80.62 | 0.9391 | 0.8788 |
| Average | 89.25 | | | | | |
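The Average row can be recomputed from the five folds. The sketch below does this for the enzyme values listed in the table above; the dictionary layout and the helper name `fold_summary` are illustrative choices, and the sample standard deviation is included only as an assumed measure of fold-to-fold spread.

```python
from statistics import mean, stdev

# Per-fold values copied from the enzyme table (folds 1 to 5).
enzyme_folds = {
    "accuracy": [88.89, 89.66, 88.63, 89.91, 89.15],
    "precision": [89.70, 91.07, 88.87, 91.97, 91.88],
    "sensitivity": [87.86, 88.14, 88.10, 87.20, 86.13],
    "mcc": [80.24, 81.45, 79.85, 81.83, 80.62],
    "auc": [0.9570, 0.9450, 0.9444, 0.9538, 0.9391],
    "aupr": [0.8579, 0.8920, 0.8803, 0.8728, 0.8788],
}


def fold_summary(values, ndigits):
    """Return (mean, sample standard deviation) rounded to ndigits places."""
    return round(mean(values), ndigits), round(stdev(values), ndigits)


summary = {}
for name, values in enzyme_folds.items():
    # Percent metrics are reported with 2 decimals, AUC and AUPR with 4.
    ndigits = 4 if name in ("auc", "aupr") else 2
    summary[name] = fold_summary(values, ndigits)

for name, (avg, sd) in summary.items():
    print(f"{name}: mean={avg}, sd={sd}")
```

The accuracy mean computed this way is 89.25, which agrees with the average enzyme accuracy reported in the abstract.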
5-fold cross-validation results of the proposed method on the nuclear receptor dataset.
| Testing set | Accuracy (%) | Precision (%) | Sensitivity (%) | MCC (%) | AUC | AUPR |
|---|---|---|---|---|---|---|
| 1 | 75.00 | 76.92 | 62.50 | 60.78 | 0.7688 | 0.6444 |
| 2 | 80.56 | 78.95 | 83.33 | 68.62 | 0.8148 | 0.7007 |
| 3 | 69.44 | 81.25 | 61.90 | 56.69 | 0.8254 | 0.8484 |
| 4 | 72.22 | 60.00 | 85.71 | 58.61 | 0.8442 | 0.7507 |
| 5 | 72.22 | 72.00 | 85.71 | 56.06 | 0.7524 | 0.7053 |
| Average | 73.89 | | | | | |
Figure 1. The curves obtained by the proposed method on the enzyme dataset: (a) ROC curves and (b) PR curves.
Figure 4. The curves obtained by the proposed method on the nuclear receptor dataset: (a) ROC curves and (b) PR curves.
5-fold cross-validation results of the proposed RF classifier and the SVM classifier on the enzyme dataset.
| Testing set | Accuracy (%) | Precision (%) | Sensitivity (%) | MCC (%) | AUC |
|---|---|---|---|---|---|
| PSSM+GIST+RF | | | | | |
| 1 | 88.89 | 89.70 | 87.86 | 80.24 | 0.9570 |
| 2 | 89.40 | 89.90 | 88.98 | 81.05 | 0.9450 |
| 3 | 88.63 | 88.87 | 88.10 | 79.85 | 0.9444 |
| 4 | 89.91 | 91.97 | 87.20 | 81.83 | 0.9538 |
| 5 | 89.15 | 91.88 | 86.13 | 80.62 | 0.9391 |
| Average | | | | | |
| PSSM+GIST+SVM | | | | | |
| 1 | 81.11 | 82.85 | 78.46 | 69.32 | 0.8794 |
| 2 | 82.65 | 83.65 | 81.53 | 71.31 | 0.8895 |
| 3 | 82.22 | 83.94 | 79.31 | 70.71 | 0.8901 |
| 4 | 81.28 | 82.23 | 79.24 | 69.53 | 0.8820 |
| 5 | 81.88 | 84.02 | 79.19 | 70.29 | 0.8770 |
| Average | | | | | |
Figure 5. The ROC curves obtained by the SVM classifier on the enzyme dataset.
Comparison of AUC values between the proposed method and four existing methods on the four datasets.
| Dataset | Our method | NetCBP | Mousavian et al. | Li | RFDTI |
|---|---|---|---|---|---|
| Enzyme | | 0.8251 | 0.9480 | 0.9288 | 0.9172 |
| Ion channels | | 0.8034 | 0.8890 | 0.9171 | 0.8827 |
| GPCRs | | 0.8235 | 0.8720 | 0.8856 | 0.8557 |
| Nuclear receptors | | 0.8394 | 0.8690 | 0.9300 | 0.7531 |
5-fold cross-validation results of the proposed method on the ion channel dataset.
| Testing set | Accuracy (%) | Precision (%) | Sensitivity (%) | MCC (%) | AUC | AUPR |
|---|---|---|---|---|---|---|
| 1 | 85.59 | 86.76 | 84.12 | 75.33 | 0.9292 | 0.8435 |
| 2 | 86.95 | 87.63 | 86.15 | 77.30 | 0.9431 | 0.8686 |
| 3 | 87.63 | 88.18 | 87.29 | 78.31 | 0.9323 | 0.8633 |
| 4 | 85.59 | 83.16 | 87.59 | 75.31 | 0.9370 | 0.8056 |
| 5 | 83.90 | 86.01 | 81.73 | 72.96 | 0.9146 | 0.8287 |
| Average | 85.93 | | | | | |
5-fold cross-validation results of the proposed method on the GPCR dataset.
| Testing set | Accuracy (%) | Precision (%) | Sensitivity (%) | MCC (%) | AUC | AUPR |
|---|---|---|---|---|---|---|
| 1 | 83.86 | 81.89 | 85.25 | 72.91 | 0.8853 | 0.7820 |
| 2 | 81.89 | 79.37 | 83.33 | 70.29 | 0.8894 | 0.7493 |
| 3 | 78.74 | 84.75 | 73.53 | 66.35 | 0.8660 | 0.8154 |
| 4 | 85.04 | 88.29 | 79.67 | 74.34 | 0.8998 | 0.8591 |
| 5 | 82.28 | 82.48 | 84.33 | 70.71 | 0.8992 | 0.7993 |
| Average | 82.36 | | | | | |