Farshid Rayhan, Sajid Ahmed, Swakkhar Shatabda, Dewan Md Farid, Zaynab Mousavian, Abdollah Dehzangi, M Sohel Rahman.
Abstract
Prediction of new drug-target interactions is critically important, as it can lead researchers to find new uses for old drugs and to disclose their therapeutic profiles or side effects. However, experimental determination of drug-target interactions is expensive and time-consuming. As a result, computational methods for predicting new drug-target interactions have gained tremendous interest in recent times. Here we present iDTI-ESBoost, a prediction model for the identification of drug-target interactions using evolutionary and structural features. Our proposed method uses a novel data balancing and boosting technique to predict drug-target interactions. On four benchmark datasets taken from gold standard data, iDTI-ESBoost outperforms the state-of-the-art methods in terms of area under the receiver operating characteristic (auROC) curve. iDTI-ESBoost also outperforms the latest and best-performing method found in the literature in terms of area under the precision-recall (auPR) curve. This is significant because auPR curves are argued to be a suitable metric for comparison on imbalanced datasets such as the ones studied here. Our reported results show the effectiveness of the classifier, the balancing methods and the novel features incorporated in iDTI-ESBoost. iDTI-ESBoost is a novel prediction method that, for the first time, exploits structural features along with evolutionary features to predict drug-protein interactions. We believe the excellent performance of iDTI-ESBoost in terms of both auROC and auPR will motivate researchers and practitioners to use it to predict drug-target interactions. To facilitate that, iDTI-ESBoost is implemented and made publicly available at: http://farshidrayhan.pythonanywhere.com/iDTI-ESBoost/
Year: 2017 PMID: 29255285 PMCID: PMC5735173 DOI: 10.1038/s41598-017-18025-2
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Summary of evolutionary and structural features used for protein targets and fingerprint features for drugs.
| Feature Group | Number of Features | Feature Type | Group |
|---|---|---|---|
| Molecular fingerprint | 881 | drug | |
| PSSM bigram | 400 | target | A |
| Secondary Structure Composition | 3 | target | B |
| Accessible Surface Area Composition | 1 | target | B |
| Torsional Angles Composition | 8 | target | B |
| Torsional Angles Auto-Covariance | 80 | target | C |
| Structural Probabilities Auto-Covariance | 30 | target | C |
| Torsional Angles bigram | 64 | target | D |
| Structural Probabilities bigram | 9 | target | D |
| Total | 1476 | | |
The “Group” column shows different feature groups used in our experiments and will be discussed in a later section.
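As a quick check on the table, the sketch below recomputes the length of a drug-target feature vector for each cumulative feature combination used later (A; A, B; A, B, C; A, B, C, D). It assumes that a pair is represented by concatenating the 881 drug fingerprint features with the selected target groups, which is consistent with the stated total of 1476 but is not spelled out in this section. The names `DRUG_FEATURES`, `TARGET_GROUPS` and `vector_length` are illustrative only.

```python
# Feature counts copied from the summary table above.
DRUG_FEATURES = {"Molecular fingerprint": 881}
TARGET_GROUPS = {
    "A": {"PSSM bigram": 400},
    "B": {
        "Secondary Structure Composition": 3,
        "Accessible Surface Area Composition": 1,
        "Torsional Angles Composition": 8,
    },
    "C": {
        "Torsional Angles Auto-Covariance": 80,
        "Structural Probabilities Auto-Covariance": 30,
    },
    "D": {
        "Torsional Angles bigram": 64,
        "Structural Probabilities bigram": 9,
    },
}


def vector_length(groups):
    """Length of a drug-target vector, assuming simple concatenation
    of the drug fingerprint with the chosen target feature groups."""
    drug = sum(DRUG_FEATURES.values())
    target = sum(sum(TARGET_GROUPS[g].values()) for g in groups)
    return drug + target


for combo in ("A", "AB", "ABC", "ABCD"):
    print(", ".join(combo), "->", vector_length(combo))
# A -> 1281
# A, B -> 1293
# A, B, C -> 1403
# A, B, C, D -> 1476
```

The last line matches the "Total" row of the table.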
Performance of AdaBoost, Random Forest and Support Vector Machine classifiers on the gold standard datasets in terms of area under Receiver Operating Characteristic (ROC) curve (auROC) and area under precision recall curve (auPR) using different feature group combinations and random under sampling.
| Dataset | Feature Combination | Classifier | auPR | auROC |
|---|---|---|---|---|
| enzymes | A | AdaBoost | 0.54 | 0.9530 |
| enzymes | A | Random Forest | 0.43 | 0.9457 |
| enzymes | A | SVM | | |
| enzymes | A, B | AdaBoost | | 0.9431 |
| enzymes | A, B | Random Forest | 0.49 | 0.9445 |
| enzymes | A, B | SVM | 0.48 | |
| enzymes | A, B, C | AdaBoost | | |
| enzymes | A, B, C | Random Forest | 0.48 | 0.9334 |
| enzymes | A, B, C | SVM | 0.41 | 0.9360 |
| enzymes | A, B, C, D | AdaBoost | | |
| enzymes | A, B, C, D | Random Forest | 0.50 | 0.9493 |
| enzymes | A, B, C, D | SVM | 0.63 | 0.9628 |
| ion channels | A | AdaBoost | | 0.9271 |
| ion channels | A | Random Forest | 0.33 | 0.9232 |
| ion channels | A | SVM | 0.25 | |
| ion channels | A, B | AdaBoost | | 0.9191 |
| ion channels | A, B | Random Forest | 0.30 | 0.8898 |
| ion channels | A, B | SVM | 0.23 | |
| ion channels | A, B, C | AdaBoost | | 0.9202 |
| ion channels | A, B, C | Random Forest | 0.31 | 0.8734 |
| ion channels | A, B, C | SVM | 0.23 | |
| ion channels | A, B, C, D | AdaBoost | | |
| ion channels | A, B, C, D | Random Forest | 0.40 | 0.9234 |
| ion channels | A, B, C, D | SVM | 0.14 | 0.6723 |
| GPCRs | A | AdaBoost | | |
| GPCRs | A | Random Forest | 0.23 | 0.8743 |
| GPCRs | A | SVM | 0.18 | 0.7832 |
| GPCRs | A, B | AdaBoost | | |
| GPCRs | A, B | Random Forest | 0.22 | 0.8698 |
| GPCRs | A, B | SVM | 0.15 | 0.7802 |
| GPCRs | A, B, C | AdaBoost | | |
| GPCRs | A, B, C | Random Forest | 0.31 | 0.9034 |
| GPCRs | A, B, C | SVM | 0.15 | 0.7945 |
| GPCRs | A, B, C, D | AdaBoost | | |
| GPCRs | A, B, C, D | Random Forest | 0.30 | 0.9168 |
| GPCRs | A, B, C, D | SVM | 0.21 | 0.7896 |
| nuclear receptors | A | AdaBoost | | |
| nuclear receptors | A | Random Forest | 0.23 | 0.7519 |
| nuclear receptors | A | SVM | 0.19 | 0.7898 |
| nuclear receptors | A, B | AdaBoost | | |
| nuclear receptors | A, B | Random Forest | 0.29 | 0.7723 |
| nuclear receptors | A, B | SVM | 0.20 | 0.6789 |
| nuclear receptors | A, B, C | AdaBoost | | |
| nuclear receptors | A, B, C | Random Forest | 0.21 | 0.7234 |
| nuclear receptors | A, B, C | SVM | 0.21 | 0.6971 |
| nuclear receptors | A, B, C, D | AdaBoost | | |
| nuclear receptors | A, B, C, D | Random Forest | 0.29 | 0.7145 |
| nuclear receptors | A, B, C, D | SVM | 0.20 | 0.7287 |
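To make the two metrics in this table concrete, here is a minimal pure-Python sketch of how auROC and auPR can be computed from labels and prediction scores. It is illustrative only: auROC is computed through the rank (Mann-Whitney) formulation, and auPR is estimated as average precision, which is one common estimator and an assumption here, since this section does not say which estimator the authors used. The function names and the toy data are hypothetical.

```python
def au_roc(labels, scores):
    """Area under the ROC curve as the probability that a random positive
    is scored above a random negative (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve:
    sum of precision at each threshold weighted by the gain in recall."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive")
    tp = fp = 0
    ap = 0.0
    prev_recall = 0.0
    i = 0
    while i < len(ranked):
        # Consume all samples that share the current score (one threshold).
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return ap


# Toy, imbalanced example: 2 positives among 8 samples.
labels = [1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
print(au_roc(labels, scores))             # 11/12, about 0.917
print(average_precision(labels, scores))  # 5/6, about 0.833
```

In the toy example a single highly ranked negative costs auPR more than auROC, which is the kind of sensitivity that makes auPR informative on imbalanced data.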
Performance of AdaBoost classifier on different datasets in terms of area under Receiver Operating Characteristic (ROC) curve (auROC) and area under precision recall curve (auPR) using different feature group combinations and balancing methods.
| Dataset | Feature Combination | Balancing Method | auPR | auROC |
|---|---|---|---|---|
| enzymes | A | random | 0.54 | 0.9530 |
| enzymes | A | clustered | 0.58 | 0.9493 |
| enzymes | A, B | random | 0.51 | 0.9431 |
| enzymes | A, B | clustered | 0.59 | 0.9353 |
| enzymes | A, B, C | random | 0.66 | 0.9638 |
| enzymes | A, B, C | clustered | 0.63 | 0.9577 |
| enzymes | A, B, C, D | random | 0.65 | |
| enzymes | A, B, C, D | clustered | | 0.9598 |
| ion channels | A | random | 0.36 | 0.9271 |
| ion channels | A | clustered | 0.38 | 0.8982 |
| ion channels | A, B | random | 0.33 | 0.9191 |
| ion channels | A, B | clustered | 0.41 | 0.8902 |
| ion channels | A, B, C | random | 0.34 | 0.9202 |
| ion channels | A, B, C | clustered | 0.45 | 0.9021 |
| ion channels | A, B, C, D | random | 0.43 | |
| ion channels | A, B, C, D | clustered | | 0.9051 |
| GPCRs | A | random | 0.29 | 0.8856 |
| GPCRs | A | clustered | 0.48 | 0.9189 |
| GPCRs | A, B | random | 0.29 | 0.8834 |
| GPCRs | A, B | clustered | 0.49 | 0.8968 |
| GPCRs | A, B, C | random | 0.35 | 0.9116 |
| GPCRs | A, B, C | clustered | | 0.8890 |
| GPCRs | A, B, C, D | random | 0.31 | 0.9128 |
| GPCRs | A, B, C, D | clustered | 0.48 | |
| nuclear receptors | A | random | 0.41 | 0.8145 |
| nuclear receptors | A | clustered | 0.79 | 0.9270 |
| nuclear receptors | A, B | random | 0.43 | 0.7969 |
| nuclear receptors | A, B | clustered | 0.32 | 0.8715 |
| nuclear receptors | A, B, C | random | 0.36 | 0.7590 |
| nuclear receptors | A, B, C | clustered | 0.57 | 0.8935 |
| nuclear receptors | A, B, C, D | random | 0.33 | 0.7946 |
| nuclear receptors | A, B, C, D | clustered | | |
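The "random" balancing method in this table refers to random undersampling of the majority (non-interacting) class. A minimal sketch of that idea follows. It assumes the negatives are reduced to the same number as the positives (a 1:1 ratio), which this section does not state, and `random_undersample` is a hypothetical helper name. The cluster-based variant is not sketched because its details are not given here.

```python
import random


def random_undersample(samples, labels, seed=0):
    """Keep every positive sample and a random subset of negatives of the
    same size (1:1 ratio is an assumption made for this illustration)."""
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    kept_neg = rng.sample(neg_idx, min(len(pos_idx), len(neg_idx)))
    keep = sorted(pos_idx + kept_neg)
    return [samples[i] for i in keep], [labels[i] for i in keep]


# Toy data: 3 positives and 30 negatives (samples are just integer ids here).
samples = list(range(33))
labels = [1] * 3 + [0] * 30
balanced_x, balanced_y = random_undersample(samples, labels)
print(len(balanced_y), sum(balanced_y))  # 6 3
```

Fixing the seed makes the draw reproducible; different seeds discard different negatives, which is one reason results under random undersampling can vary between runs.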
Figure 1. Precision-recall curves of different classifier algorithms using random under sampling and all the feature combinations on four datasets: (a) enzymes (b) ion channels (ic) (c) GPCRs (d) nuclear receptors (nr).
Figure 2. Receiver operating characteristic (ROC) curves of different classifier algorithms using random under sampling and all the feature combinations on four datasets: (a) enzymes (b) ion channels (ic) (c) GPCRs (d) nuclear receptors (nr).
Parameters of AdaBoost Algorithm used with decision tree as weak classifier along with different balancing methods on four datasets.
| Balancing method | Dataset | Max depth | Min samples split | Min samples leaf | Criterion |
|---|---|---|---|---|---|
| random | enzymes | 100 | 16 | 1 | Gini impurity |
| random | ion channels | 8 | 4 | 1 | Gini impurity |
| random | GPCRs | 6 | 3 | 1 | Gini impurity |
| random | nuclear receptors | 5 | 7 | 2 | Gini impurity |
| clustered | enzymes | 110 | 1 | 1 | Gini impurity |
| clustered | ion channels | 9 | 2 | 1 | Gini impurity |
| clustered | GPCRs | 6 | 3 | 1 | Gini impurity |
| clustered | nuclear receptors | 150 | 2 | 1 | Gini impurity |
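The parameters above map naturally onto a decision-tree weak learner inside a boosting ensemble. The sketch below stores them as a lookup table and shows how they could be passed to scikit-learn, assuming a scikit-learn-style implementation (this section does not name the library the authors used). The number of boosting rounds (`n_estimators`) is not given in the table and is a placeholder. All function and variable names are hypothetical.

```python
# (balancing method, dataset) -> (max depth, min samples split, min samples leaf)
# Values copied from the table above; the criterion is Gini impurity throughout.
TREE_PARAMS = {
    ("random", "enzymes"): (100, 16, 1),
    ("random", "ion channels"): (8, 4, 1),
    ("random", "GPCRs"): (6, 3, 1),
    ("random", "nuclear receptors"): (5, 7, 2),
    ("clustered", "enzymes"): (110, 1, 1),
    ("clustered", "ion channels"): (9, 2, 1),
    ("clustered", "GPCRs"): (6, 3, 1),
    ("clustered", "nuclear receptors"): (150, 2, 1),
}


def tree_kwargs(balancing, dataset):
    """Return the table row as keyword arguments named after
    scikit-learn's DecisionTreeClassifier parameters."""
    max_depth, min_split, min_leaf = TREE_PARAMS[(balancing, dataset)]
    return {
        "max_depth": max_depth,
        "min_samples_split": min_split,
        "min_samples_leaf": min_leaf,
        "criterion": "gini",
    }


def build_adaboost(balancing, dataset, n_estimators=50):
    """Build an AdaBoost ensemble of decision trees (requires scikit-learn).
    n_estimators is an assumption; the table does not specify it."""
    # Imported lazily so the lookup helpers above work without scikit-learn.
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.tree import DecisionTreeClassifier

    kwargs = tree_kwargs(balancing, dataset)
    # scikit-learn rejects an integer min_samples_split below 2, so the
    # value 1 listed for clustered/enzymes is raised to 2 in this sketch.
    kwargs["min_samples_split"] = max(2, kwargs["min_samples_split"])
    tree = DecisionTreeClassifier(**kwargs)
    return AdaBoostClassifier(tree, n_estimators=n_estimators)


print(tree_kwargs("random", "nuclear receptors"))
```

Passing the tree positionally keeps the call independent of whether the installed scikit-learn version names that argument `estimator` or `base_estimator`.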
Figure 3. Receiver operating characteristic (ROC) curves of AdaBoost classifier showing differences between random under sampling and cluster based sampling using all the feature combinations on four datasets: (a) enzymes (b) ion channels (ic) (c) GPCRs (d) nuclear receptors (nr).
Figure 4. Precision vs Recall curves of AdaBoost classifier showing differences between random under sampling and cluster based sampling using all the feature combinations on four datasets: (a) enzymes (b) ion channels (ic) (c) GPCRs (d) nuclear receptors (nr).
Performance of iDTI-ESBoost on the four benchmark gold datasets in terms of area under receiver operating characteristic curve (auROC) with comparison to other state-of-the-art methods.
| Dataset | DBSI | KBMF2K | NetCBP | Yamanishi | Yamanishi | Wang | Mousavian | iDTI-ESBoost |
|---|---|---|---|---|---|---|---|---|
| enzymes | 0.8075 | 0.8320 | 0.8251 | 0.904 | 0.8920 | 0.8860 | 0.9480 | |
| ion channels | 0.8029 | 0.7990 | 0.8034 | 0.8510 | 0.8120 | 0.8930 | 0.8890 | |
| GPCRs | 0.8022 | 0.8570 | 0.8235 | 0.8990 | 0.8270 | 0.8730 | 0.8720 | |
| nuclear receptors | 0.7578 | 0.8240 | 0.8394 | 0.8430 | 0.8350 | 0.8240 | 0.8690 | |
Comparison of the performance of iDTI-ESBoost on the four benchmark gold datasets in terms of area under the precision-recall curve (auPR) with the state-of-the-art method in Mousavian et al.[22].
| Predictor | Enzymes | Ion channels | GPCRs | Nuclear receptors |
|---|---|---|---|---|
| Mousavian | 0.546 | 0.390 | 0.282 | 0.411 |
| iDTI-ESBoost | | | | |
Specificity, Sensitivity, Precision, MCC and F1 score for four datasets as achieved by iDTI-ESBoost using different feature groups.
| Dataset | Feature Group | Specificity | Sensitivity | Precision | MCC | F1 score |
|---|---|---|---|---|---|---|
| enzymes | A | 0.83 | 0.9 | 0.05 | 0.1962 | 0.10 |
| enzymes | A, B | 0.82 | 0.89 | 0.05 | 0.1812 | 0.09 |
| enzymes | A, B, C | 0.83 | 0.87 | 0.05 | 0.1762 | 0.09 |
| enzymes | A, B, C, D | 0.85 | 0.85 | 0.15 | 0.1889 | 0.10 |
| Ion channels | A | 0.78 | 0.81 | 0.13 | 0.2615 | 0.22 |
| Ion channels | A, B | 0.78 | 0.84 | 0.14 | 0.256 | 0.24 |
| Ion channels | A, B, C | 0.8 | 0.86 | 0.12 | 0.2980 | 0.20 |
| Ion channels | A, B, C, D | 0.78 | 0.84 | 0.13 | 0.2913 | 0.20 |
| GPCRs | A | 0.78 | 0.84 | 0.12 | 0.254 | 0.20 |
| GPCRs | A, B | 0.8 | 0.85 | 0.11 | 0.2760 | 0.20 |
| GPCRs | A, B, C | 0.79 | 0.89 | 0.11 | 0.2797 | 0.19 |
| GPCRs | A, B, C, D | 0.8 | 0.84 | 0.11 | 0.2647 | 0.19 |
| Nuclear receptors | A | 0.85 | 0.91 | 0.16 | 0.2141 | 0.27 |
| Nuclear receptors | A, B | 0.77 | 0.88 | 0.11 | 0.2154 | 0.19 |
| Nuclear receptors | A, B, C | 0.81 | 0.88 | 0.12 | 0.1798 | 0.20 |
| Nuclear receptors | A, B, C, D | 0.92 | 0.87 | 0.14 | 0.2253 | 0.24 |
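For reference, the five threshold-based metrics in this table follow from the confusion matrix using their standard definitions. The sketch below computes them; the counts in the example are hypothetical and chosen only to show how high sensitivity and specificity can coexist with low precision when negatives vastly outnumber positives. `binary_metrics` is an illustrative name.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics for a binary classifier.
    Assumes tp + fn > 0, tn + fp > 0 and tp + fp > 0."""
    specificity = tn / (tn + fp)
    sensitivity = tp / (tp + fn)  # also called recall
    precision = tp / (tp + fp)
    pr_sum = precision + sensitivity
    f1 = 2 * precision * sensitivity / pr_sum if pr_sum else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "specificity": specificity,
        "sensitivity": sensitivity,
        "precision": precision,
        "mcc": mcc,
        "f1": f1,
    }


# Hypothetical counts: 100 positives and 10000 negatives.
for name, value in binary_metrics(tp=90, fp=2000, tn=8000, fn=10).items():
    print(f"{name}: {value:.4f}")
```

With these made-up counts the classifier recovers 90% of the positives and rejects 80% of the negatives, yet fewer than 5% of its positive calls are correct, because 2000 false positives swamp 90 true positives.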
New predictions made by iDTI-ESBoost for the four gold standard datasets used in this paper.
| Dataset | Protein Id | Drug Id | Drug Name | Score |
|---|---|---|---|---|
| Enzymes | hsa:10825 | D00041 | Threonine (USP) | 0.7207 |
| Enzymes | hsa:4759 | D00041 | Threonine (USP) | 0.7163 |
| Enzymes | hsa:129807 | D00041 | Threonine (USP) | 0.7163 |
| Enzymes | hsa:4953 | D00041 | Threonine (USP) | 0.7095 |
| Enzymes | hsa:1845 | D00041 | Threonine (USP) | 0.7078 |
| Enzymes | hsa:9610 | D00041 | Threonine (USP) | 0.7073 |
| Enzymes | hsa:6652 | D00041 | Threonine (USP) | 0.7034 |
| Enzymes | hsa:1734 | D00136 | Haloperidol (JP17/USP/INN) | 0.6995 |
| Enzymes | hsa:1178 | D03643 | Dalvastatin (USAN/INN) | 0.6985 |
| Enzymes | hsa:8435 | D03643 | Dalvastatin (USAN/INN) | 0.6962 |
| Ion channels | hsa:285242 | D00294 | Diazoxide (JAN/USP/INN) | 0.9407 |
| Ion channels | hsa:779 | D00294 | Diazoxide (JAN/USP/INN) | 0.9366 |
| Ion channels | hsa:2561 | D00294 | Diazoxide (JAN/USP/INN) | 0.9357 |
| Ion channels | hsa:785 | D00294 | Diazoxide (JAN/USP/INN) | 0.9353 |
| Ion channels | hsa:11254 | D00294 | Diazoxide (JAN/USP/INN) | 0.935 |
| Ion channels | hsa:3775 | D00225 | Alprazolam (JP17/USP/INN) | 0.9339 |
| Ion channels | hsa:6263 | D00294 | Diazoxide (JAN/USP/INN) | 0.932 |
| Ion channels | hsa:6324 | D02261 | Quinine hydrochloride hydrate (JP17) | 0.9305 |
| Ion channels | hsa:6324 | D02262 | Quinine sulfate (USP) | 0.9305 |
| Ion channels | hsa:6332 | D02262 | Quinine sulfate (USP) | 0.8464 |
| GPCRs | hsa:9052 | D04625 | Isoetharine (USP) | 0.9311 |
| GPCRs | hsa:9052 | D00632 | Dobutamine hydrochloride (JP17/USP) | 0.9311 |
| GPCRs | hsa:9052 | D03880 | Dobutamine lactobionate (USAN) | 0.9311 |
| GPCRs | hsa:9052 | D03881 | Dobutamine tartrate (USP) | 0.9311 |
| GPCRs | hsa:1909 | D03621 | Cyclizine (INN) | 0.931 |
| GPCRs | hsa:57105 | D01712 | Theophylline sodium acetate (JAN) | 0.9307 |
| GPCRs | hsa:155 | D02671 | Mesoridazine (USAN/INN) | 0.9306 |
| GPCRs | hsa:148 | D02614 | Denopamine (JAN/INN) | 0.9303 |
| GPCRs | hsa:155 | D00480 | Promethazine hydrochloride (JP17/USP) | 0.9302 |
| GPCRs | hsa:1909 | D00480 | Promethazine hydrochloride (JP17/USP) | 0.9302 |
| Nuclear receptors | hsa:2099 | D01132 | Tazarotene (JAN/USAN/INN) | 0.9792 |
| Nuclear receptors | hsa:2101 | D00956 | Nandrolone phenpropionate (USP) | 0.9755 |
| Nuclear receptors | hsa:2101 | D00443 | Spironolactone (JP17/USP/INN) | 0.9758 |
| Nuclear receptors | hsa:2099 | D00316 | Etretinate (JAN/USAN/INN) | 0.9602 |
| Nuclear receptors | hsa:9971 | D00316 | Etretinate (JAN/USAN/INN) | 0.9593 |
| Nuclear receptors | hsa:2101 | D00327 | Fluoxymesterone (JP17/USP/INN) | 0.9591 |
| Nuclear receptors | hsa:2101 | D00088 | Hydrocortisone (JP17/USP/INN) | 0.9571 |
| Nuclear receptors | hsa:2101 | D00075 | Testosterone (JAN/USP) | 0.9558 |
| Nuclear receptors | hsa:2099 | D00565 | Fenofibrate (JAN/INN) | 0.9557 |
| Nuclear receptors | hsa:2101 | D00462 | Oxandrolone (JAN/USP/INN) | 0.9557 |
Figure 5. Schematic diagram of the training module of iDTI-ESBoost showing the steps of the training phase.
Description of the gold standard datasets[16].
| Dataset | Drugs | Proteins | Positive Interactions | Imbalance Ratio |
|---|---|---|---|---|
| Enzyme | 445 | 664 | 2926 | 99.98 |
| Ion Channel | 210 | 204 | 1476 | 28.02 |
| GPCR | 223 | 95 | 635 | 32.36 |
| Nuclear Receptor | 54 | 26 | 90 | 14.6 |
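The imbalance ratios in this table can be reproduced from the other three columns if every drug-protein pair that is not a known positive interaction is counted as a negative. That reading is an assumption, but it matches the listed values to two decimal places. The sketch below performs the calculation; `imbalance_ratio` and `DATASETS` are illustrative names.

```python
# (drugs, proteins, positive interactions), copied from the table above.
DATASETS = {
    "Enzyme": (445, 664, 2926),
    "Ion Channel": (210, 204, 1476),
    "GPCR": (223, 95, 635),
    "Nuclear Receptor": (54, 26, 90),
}


def imbalance_ratio(drugs, proteins, positives):
    """Negatives per positive, assuming all non-positive
    drug-protein pairs are treated as negatives."""
    negatives = drugs * proteins - positives
    return negatives / positives


for name, (drugs, proteins, positives) in DATASETS.items():
    print(name, round(imbalance_ratio(drugs, proteins, positives), 2))
# Enzyme 99.98
# Ion Channel 28.02
# GPCR 32.36
# Nuclear Receptor 14.6
```

For the enzyme dataset, for example, 445 × 664 = 295480 possible pairs, of which 2926 are positive, leaving 292554 negatives and a ratio of about 99.98 negatives per positive.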