Ying Wang, Lei Wang, Leon Wong, Bowei Zhao, Xiaorui Su, Yang Li, Zhuhong You.
Abstract
As the basis for screening drug candidates, the identification of drug-target interactions (DTIs) plays a crucial role in innovative drug research. However, wet-lab experiments are small in scale and time-consuming, which makes experimental DTI identification difficult. In the present study, we developed a computational approach called RoFDT that predicts DTIs by combining a feature-weighted Rotation Forest (FwRF) with protein sequence information. Specifically, we first encode each protein sequence as a numerical matrix, its Position-Specific Scoring Matrix (PSSM), and extract features from it using the Pseudo Position-Specific Scoring Matrix (PsePSSM). We then combine these protein features with drug structure information in the form of molecular fingerprints and feed the result into the FwRF classifier. We validated RoFDT on the Enzyme, GPCR, Ion Channel and Nuclear Receptor datasets, on which it achieved accuracies of 91.68%, 84.72%, 88.11% and 78.33%, respectively. RoFDT outperforms a support vector machine (SVM) model in accuracy and AUC on all four datasets and is competitive with previously published approaches. Furthermore, 7 of the 10 DTIs with the highest RoFDT prediction scores were confirmed in the SuperTarget database. These results suggest that RoFDT can serve as a powerful approach for predicting DTIs and can provide theoretical support for innovative drug discovery.
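The abstract describes the feature pipeline only in outline. As a minimal sketch, assuming the commonly used PsePSSM definition (the per-column mean of a normalised L×20 PSSM, where L is the sequence length, plus, for each lag g = 1..λ, the per-column mean squared difference between rows i and i+g), the protein descriptor and its concatenation with a binary drug fingerprint could look like the code below. The logistic normalisation, the default lag of 2, and the helper names `psepssm` and `pair_vector` are illustrative assumptions, not details confirmed for RoFDT.

```python
import math
from typing import List, Sequence


def normalise_pssm(pssm: Sequence[Sequence[float]]) -> List[List[float]]:
    """Map raw PSSM log-odds scores into (0, 1) with a logistic function (assumed choice)."""
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in pssm]


def psepssm(pssm: Sequence[Sequence[float]], lag: int = 2) -> List[float]:
    """Return cols + cols*lag PsePSSM features (cols = 20 for a standard PSSM); needs L > lag."""
    p = normalise_pssm(pssm)
    length, cols = len(p), len(p[0])
    if length <= lag:
        raise ValueError("sequence must be longer than the lag")
    # Part 1: average normalised score of each amino-acid column.
    features = [sum(row[j] for row in p) / length for j in range(cols)]
    # Part 2: for each lag g, mean squared difference between rows i and i+g, per column.
    for g in range(1, lag + 1):
        for j in range(cols):
            diff = sum((p[i][j] - p[i + g][j]) ** 2 for i in range(length - g))
            features.append(diff / (length - g))
    return features


def pair_vector(pssm: Sequence[Sequence[float]], fingerprint: Sequence[int], lag: int = 2) -> List[float]:
    """Hypothetical helper: concatenate protein PsePSSM features with a drug fingerprint bit vector."""
    return psepssm(pssm, lag) + [float(bit) for bit in fingerprint]


if __name__ == "__main__":
    toy_pssm = [[(i * 7 + j) % 5 - 2 for j in range(20)] for i in range(6)]  # toy 6 x 20 matrix
    toy_fp = [1, 0, 0, 1, 1, 0, 1, 0]  # toy 8-bit fingerprint
    print(len(pair_vector(toy_pssm, toy_fp, lag=2)))  # 20 + 20*2 + 8 = 68
```

The resulting vector has a fixed length regardless of protein length, which is what allows proteins of different sizes to be fed to a single classifier.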
Keywords: drug; rotation forest; support vector machine; target protein
Year: 2022 PMID: 35625469 PMCID: PMC9138819 DOI: 10.3390/biology11050741
Source DB: PubMed Journal: Biology (Basel) ISSN: 2079-7737
Figure 1. Flow diagram of the RoFDT model framework.
Figure 2. The influence of different parameters of the FwRF classifier on classification accuracy.
Five-fold cross-validation (5FCV) prediction results obtained by RoFDT on the Enzyme dataset.
| Test Set | Accu.(%) | Sen.(%) | Prec.(%) | MCC(%) | AUC(%) |
|---|---|---|---|---|---|
| 1 | 90.51 | 89.20 | 91.27 | 81.03 | 90.04 |
| 2 | 92.82 | 93.22 | 92.59 | 85.64 | 92.96 |
| 3 | 91.62 | 90.19 | 92.74 | 83.28 | 92.09 |
| 4 | 91.97 | 89.68 | 94.40 | 84.05 | 91.79 |
| 5 | 91.47 | 91.90 | 90.96 | 82.94 | 91.73 |
| Average | 91.68 ± 0.84 | 90.84 ± 1.68 | 92.39 ± 1.37 | 83.39 ± 1.68 | 91.72 ± 1.06 |
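Accuracy (Accu.), sensitivity (Sen.), precision (Prec.) and the Matthews correlation coefficient (MCC) in these tables are per-fold quantities. Assuming the usual confusion-matrix definitions (the excerpt does not spell them out), they can be computed for one test fold as sketched below; the counts in the usage line are made up for illustration and are not from the paper.

```python
import math
from typing import Dict


def fold_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification metrics, returned as percentages."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # recall on the positive (interacting) class
    precision = tp / (tp + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention if undefined
    return {
        "accuracy": 100 * accuracy,
        "sensitivity": 100 * sensitivity,
        "precision": 100 * precision,
        "mcc": 100 * mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts for a single fold.
    print(fold_metrics(tp=90, tn=85, fp=15, fn=10))
```

MCC uses all four cells of the confusion matrix, so it drops sharply when a model trades sensitivity for precision, which is why it is reported alongside accuracy.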
5FCV prediction results obtained by RoFDT on the Ion Channel dataset.
| Test Set | Accu.(%) | Sen.(%) | Prec.(%) | MCC(%) | AUC(%) |
|---|---|---|---|---|---|
| 1 | 86.61 | 90.38 | 83.76 | 76.76 | 86.16 |
| 2 | 87.63 | 91.92 | 84.78 | 78.22 | 87.83 |
| 3 | 88.98 | 91.61 | 87.22 | 80.36 | 89.68 |
| 4 | 88.31 | 89.67 | 87.62 | 79.33 | 89.07 |
| 5 | 89.02 | 87.93 | 89.47 | 80.43 | 88.59 |
| Average | 88.11 ± 1.01 | 90.30 ± 1.61 | 86.57 ± 2.29 | 79.02 ± 1.55 | 88.27 ± 1.36 |
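Each Average row reports the mean of the five folds with a spread. Recomputing the Ion Channel accuracy column suggests the spread is the sample standard deviation (n − 1 denominator): the short check below reproduces 88.11 ± 1.01 under that assumption, whereas the population standard deviation would give 0.91.

```python
import statistics
from typing import Sequence


def summarise(fold_values: Sequence[float]) -> str:
    """Format fold results as 'mean ± sample standard deviation' to two decimals."""
    mean = statistics.mean(fold_values)
    spread = statistics.stdev(fold_values)  # n - 1 denominator
    return f"{mean:.2f} ± {spread:.2f}"


if __name__ == "__main__":
    ion_channel_accuracy = [86.61, 87.63, 88.98, 88.31, 89.02]  # Accu. column above
    print(summarise(ion_channel_accuracy))  # 88.11 ± 1.01
```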
5FCV prediction results obtained by RoFDT on the GPCR dataset.
| Test Set | Accu.(%) | Sen.(%) | Prec.(%) | MCC(%) | AUC(%) |
|---|---|---|---|---|---|
| 1 | 82.28 | 86.21 | 77.52 | 70.73 | 82.86 |
| 2 | 87.01 | 88.62 | 85.16 | 77.38 | 88.82 |
| 3 | 86.22 | 86.52 | 88.41 | 76.00 | 86.72 |
| 4 | 84.63 | 80.33 | 86.73 | 73.83 | 84.37 |
| 5 | 83.46 | 81.95 | 85.83 | 72.37 | 85.11 |
| Average | 84.72 ± 1.94 | 84.73 ± 3.45 | 84.73 ± 4.21 | 74.06 ± 2.68 | 85.57 ± 2.28 |
5FCV prediction results obtained by RoFDT on the Nuclear Receptor dataset.
| Test Set | Accu.(%) | Sen.(%) | Prec.(%) | MCC(%) | AUC(%) |
|---|---|---|---|---|---|
| 1 | 69.44 | 83.33 | 65.22 | 55.90 | 72.22 |
| 2 | 77.78 | 85.00 | 77.27 | 64.34 | 69.69 |
| 3 | 80.56 | 92.31 | 66.67 | 67.47 | 74.25 |
| 4 | 83.33 | 77.78 | 87.50 | 72.05 | 75.31 |
| 5 | 80.56 | 71.43 | 93.75 | 68.03 | 85.08 |
| Average | 78.33 ± 5.34 | 81.97 ± 7.85 | 78.08 ± 12.56 | 65.56 ± 6.05 | 75.31 ± 5.87 |
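Each of the per-fold tables above comes from five-fold cross-validation: the dataset is divided into five folds, and each fold serves once as the test set while the other four are used for training. A minimal sketch of such a split, assuming a simple shuffled, non-stratified partition with a fixed seed (the paper's exact splitting scheme is not given in this excerpt, and `kfold_splits` is a hypothetical name):

```python
import random
from typing import Iterator, List, Tuple


def kfold_splits(n_samples: int, k: int = 5, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k shuffled, near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        start += size
        yield train, test


if __name__ == "__main__":
    for i, (train, test) in enumerate(kfold_splits(12), start=1):
        print(f"Test Set {i}: {len(train)} train / {len(test)} test")
```

With small datasets such as Nuclear Receptor, each fold contains few samples, which is consistent with the larger fold-to-fold variation in that table.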
Figure 3. ROC curves of the 5FCV experiment obtained by RoFDT on the Enzyme dataset.
Figure 4. ROC curves of the 5FCV experiment obtained by RoFDT on the Ion Channel dataset.
Figure 5. ROC curves of the 5FCV experiment obtained by RoFDT on the GPCR dataset.
Figure 6. ROC curves of the 5FCV experiment obtained by RoFDT on the Nuclear Receptor dataset.
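The AUC values in the tables summarise ROC curves like these. One way to compute AUC without drawing the curve is the Mann-Whitney formulation: the probability that a randomly chosen interacting pair is scored above a randomly chosen non-interacting pair, with ties counted as one half. The sketch below uses a direct pairwise count, which is quadratic in the number of samples and intended only for illustration; `pairwise_auc` is a hypothetical name.

```python
from typing import Sequence


def pairwise_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score_pos > score_neg) + 0.5 * P(tie); labels are 1 (DTI) or 0."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(pairwise_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```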
5FCV results of LPQ features combined with the FwRF classifier, compared with RoFDT, on the four gold-standard datasets.
| Dataset | Model | Accu.(%) | Sen.(%) | Prec.(%) | MCC(%) | AUC(%) |
|---|---|---|---|---|---|---|
| Enzyme | LPQ | 89.63 ± 0.39 | 89.69 ± 1.82 | 89.64 ± 2.16 | 79.32 ± 0.79 | 89.40 ± 0.98 |
|  | RoFDT | 91.68 ± 0.84 | 90.84 ± 1.68 | 92.39 ± 1.37 | 83.39 ± 1.68 | 91.72 ± 1.06 |
| Ion Channel | LPQ | 83.97 ± 2.32 | 86.93 ± 3.03 | 81.89 ± 3.66 | 68.13 ± 4.54 | 84.66 ± 2.01 |
|  | RoFDT | 88.11 ± 1.01 | 90.30 ± 1.61 | 86.57 ± 2.29 | 79.02 ± 1.55 | 88.27 ± 1.36 |
| GPCR | LPQ | 82.52 ± 2.17 | 83.87 ± 3.58 | 81.79 ± 3.78 | 65.19 ± 4.15 | 83.19 ± 1.79 |
|  | RoFDT | 84.72 ± 1.94 | 84.73 ± 3.45 | 84.73 ± 4.21 | 74.06 ± 2.68 | 85.57 ± 2.28 |
| Nuclear Receptor | LPQ | 66.67 ± 7.35 | 67.64 ± 16.23 | 67.97 ± 9.98 | 35.46 ± 10.89 | 69.56 ± 6.85 |
|  | RoFDT | 78.33 ± 5.34 | 81.97 ± 7.85 | 78.08 ± 12.56 | 65.56 ± 6.05 | 75.31 ± 5.87 |
5FCV results of different classifier models (SVM and RoFDT) on the four gold-standard datasets.
| Dataset | Model | Accu.(%) | Sen.(%) | Prec.(%) | MCC(%) | AUC(%) |
|---|---|---|---|---|---|---|
| Enzyme | SVM | 84.20 ± 0.60 | 69.90 ± 1.70 | 98.00 ± 0.50 | 71.50 ± 1.00 | 84.30 ± 1.20 |
|  | RoFDT | 91.68 ± 0.84 | 90.84 ± 1.68 | 92.39 ± 1.37 | 83.39 ± 1.68 | 91.72 ± 1.06 |
| Ion Channel | SVM | 81.90 ± 1.20 | 69.70 ± 3.70 | 92.40 ± 2.20 | 66.00 ± 1.90 | 81.70 ± 1.20 |
|  | RoFDT | 88.11 ± 1.01 | 90.30 ± 1.61 | 86.57 ± 2.29 | 79.02 ± 1.55 | 88.27 ± 1.36 |
| GPCR | SVM | 70.00 ± 2.10 | 50.40 ± 7.80 | 82.30 ± 3.30 | 42.80 ± 4.90 | 70.10 ± 2.70 |
|  | RoFDT | 84.72 ± 1.94 | 84.73 ± 3.45 | 84.73 ± 4.21 | 74.06 ± 2.68 | 85.57 ± 2.28 |
| Nuclear Receptor | SVM | 63.30 ± 3.60 | 57.60 ± 7.90 | 67.50 ± 14.60 | 29.60 ± 7.40 | 61.80 ± 5.80 |
|  | RoFDT | 78.33 ± 5.34 | 81.97 ± 7.85 | 78.08 ± 12.56 | 65.56 ± 6.05 | 75.31 ± 5.87 |
AUC comparison of RoFDT with previously published models on the four gold-standard datasets.
| Dataset | NetCBP | KBMF2K | RFDT | SIMCOMP | RoFDT |
|---|---|---|---|---|---|
| Enzyme | 0.8251 | 0.832 | 0.915 | 0.863 | 0.9172 |
| Ion Channel | 0.8034 | 0.799 | 0.890 | 0.776 | 0.8827 |
| GPCR | 0.8235 | 0.857 | 0.845 | 0.867 | 0.8557 |
| Nuclear Receptor | 0.8394 | 0.824 | 0.723 | 0.856 | 0.7531 |
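The RoFDT column matches the average AUC values reported in the cross-validation tables (for example 0.9172 against 91.72% on Enzyme), so the scores in this comparison appear to be AUCs. A short script that picks the highest score per dataset makes the per-dataset ranking explicit; the values are transcribed from the table, and `best_method` is a hypothetical helper.

```python
from typing import Dict

# AUC values transcribed from the comparison table above.
AUC_TABLE: Dict[str, Dict[str, float]] = {
    "Enzyme": {"NetCBP": 0.8251, "KBMF2K": 0.832, "RFDT": 0.915, "SIMCOMP": 0.863, "RoFDT": 0.9172},
    "Ion Channel": {"NetCBP": 0.8034, "KBMF2K": 0.799, "RFDT": 0.890, "SIMCOMP": 0.776, "RoFDT": 0.8827},
    "GPCR": {"NetCBP": 0.8235, "KBMF2K": 0.857, "RFDT": 0.845, "SIMCOMP": 0.867, "RoFDT": 0.8557},
    "Nuclear Receptor": {"NetCBP": 0.8394, "KBMF2K": 0.824, "RFDT": 0.723, "SIMCOMP": 0.856, "RoFDT": 0.7531},
}


def best_method(scores: Dict[str, float]) -> str:
    """Return the method name with the highest AUC for one dataset."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    for dataset, scores in AUC_TABLE.items():
        print(f"{dataset}: {best_method(scores)}")
```

Run on the transcribed values, RoFDT has the highest AUC on Enzyme, RFDT on Ion Channel, and SIMCOMP on GPCR and Nuclear Receptor.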
Top 10 DTI pairs predicted by RoFDT and their validation against the SuperTarget database.
| Drug Name | Drug ID | Target Protein Name | Target Protein ID | Validation Database |
|---|---|---|---|---|
| Dihydroxypropyltheophylline | D00691 | PDE7A_HUMAN | hsa5150 | SuperTarget |
| Isotretinoin | D00348 | RXRA_HUMAN | hsa6256 | SuperTarget |
| Xanthotoxin | D00139 | CP1A1_HUMAN | hsa1543 | SuperTarget |
| Loxapine succinate | D02340 | DRD1_HUMAN | hsa1812 | SuperTarget |
| Prochlorperazine | D00493 | 5HT2A_HUMAN | hsa3356 | unconfirmed |
| Bromochlorotrifluoroethane | D00542 | CP2E1_HUMAN | hsa1571 | SuperTarget |
| Mifepristone | D00585 | ESR1_HUMAN | hsa2099 | SuperTarget |
| Olanzapine | D00454 | DRD2_HUMAN | hsa1813 | unconfirmed |
| Transdermal Nicotine | D03365 | ACHA4_HUMAN | hsa1137 | SuperTarget |
| Epoprostenol | D00106 | PE2R3_HUMAN | hsa5733 | unconfirmed |
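The validation step amounts to ranking candidate pairs by predicted score, keeping the top k, and checking each against a reference database. A minimal sketch, assuming the ranked list and the set of database-confirmed pairs are already available; the helper name and data layout are hypothetical, and here the confirmed set is simply derived from the table's own labels rather than from a SuperTarget query.

```python
from typing import List, Set, Tuple

Pair = Tuple[str, str]  # (KEGG drug ID, target gene ID)

# The ten pairs from the table above, in the order listed.
TOP10: List[Pair] = [
    ("D00691", "hsa5150"), ("D00348", "hsa6256"), ("D00139", "hsa1543"),
    ("D02340", "hsa1812"), ("D00493", "hsa3356"), ("D00542", "hsa1571"),
    ("D00585", "hsa2099"), ("D00454", "hsa1813"), ("D03365", "hsa1137"),
    ("D00106", "hsa5733"),
]
# Pairs marked "unconfirmed" in the table.
UNCONFIRMED: Set[Pair] = {("D00493", "hsa3356"), ("D00454", "hsa1813"), ("D00106", "hsa5733")}


def top_k_confirmation_rate(ranked_pairs: List[Pair], confirmed: Set[Pair], k: int = 10) -> float:
    """Fraction of the k highest-ranked predictions found in the reference set."""
    top = ranked_pairs[:k]
    if not top:
        return 0.0
    hits = sum(1 for pair in top if pair in confirmed)
    return hits / len(top)


if __name__ == "__main__":
    confirmed = set(TOP10) - UNCONFIRMED
    print(top_k_confirmation_rate(TOP10, confirmed))  # 0.7
```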