Zhengwei Li, Pengyong Han, Zhu-Hong You, Xiao Li, Yusen Zhang, Haiquan Yu, Ru Nie, Xing Chen.
Abstract
Analysis of drug-target interactions (DTIs) is of great importance for developing new drug candidates for known protein targets and for discovering new targets for existing drugs. However, experimental approaches for identifying DTIs are expensive, laborious and challenging. In this study, we report a novel computational method for predicting DTIs that combines highly discriminative information about drug-target interactions with our newly developed discriminative vector machine (DVM) classifier. More specifically, each target protein sequence is transformed into a position-specific scoring matrix (PSSM), which retains the evolutionary information; the local binary pattern (LBP) operator is then used to calculate an LBP histogram descriptor. For a drug molecule, a novel fingerprint representation is used to describe its chemical structure, encoding the presence of certain functional groups or fragments. When applying the proposed method to four datasets (Enzyme, GPCR, Ion Channel and Nuclear Receptor), we obtained average accuracies of 93.16%, 89.37%, 91.73% and 92.22%, respectively. Furthermore, we compared the performance of the proposed model with that of the state-of-the-art SVM model and other previous methods. The results demonstrate that our method is effective and robust and can serve as a useful tool for predicting DTIs.
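The abstract describes turning a PSSM into an LBP histogram descriptor but does not give the operator's settings. As a rough illustration only, the sketch below applies the basic 8-neighbour, radius-1 LBP to a small matrix and returns a normalised 256-bin histogram. The function name `lbp_histogram`, the toy matrix, and the choice of the basic (non-uniform) LBP variant are assumptions, not details taken from the paper.

```python
def lbp_histogram(matrix):
    """Return a 256-bin, sum-normalised histogram of basic 8-neighbour LBP codes.

    Assumption: radius-1 LBP over interior cells only; the paper's exact
    LBP variant is not stated in the abstract.
    """
    rows, cols = len(matrix), len(matrix[0])
    # Clockwise neighbour offsets, starting at the top-left cell.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            centre = matrix[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbour is at least as large as the centre.
                if matrix[r + dr][c + dc] >= centre:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Toy 4 x 5 matrix standing in for a PSSM (a real PSSM has 20 columns).
toy_pssm = [
    [1, 2, 3, 4, 5],
    [5, 4, 3, 2, 1],
    [0, 1, 0, 1, 0],
    [2, 2, 2, 2, 2],
]
descriptor = lbp_histogram(toy_pssm)
```

Because the histogram has a fixed length regardless of the number of rows, proteins of different sequence lengths map to feature vectors of the same size, which is what a fixed-input classifier needs.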
Year: 2017 PMID: 28894115 PMCID: PMC5593914 DOI: 10.1038/s41598-017-10724-0
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Five-fold cross validation results by our method on the Enzyme dataset.
| Test set | Pre (%) | Acc (%) | Sen (%) | MCC (%) | AUC (%) |
|---|---|---|---|---|---|
| 1 | 94.10 | 93.33 | 92.12 | 86.67 | 93.21 |
| 2 | 92.37 | 93.33 | 94.29 | 86.69 | 92.94 |
| 3 | 93.06 | 92.56 | 92.28 | 85.13 | 91.93 |
| 4 | 92.92 | 92.91 | 91.73 | 85.74 | 92.22 |
| 5 | 93.46 | 93.68 | 94.09 | 87.35 | 94.13 |
| Average | | | | | |
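The Average row of this table did not survive extraction, but the per-fold values did. As a minimal sketch, the fold means and spreads can be recomputed from the five rows above. Using the sample standard deviation is an assumption, since the original convention is not stated here.

```python
from statistics import mean, stdev

# Per-fold values copied from the Enzyme table above (percentages).
enzyme_folds = {
    "Pre": [94.10, 92.37, 93.06, 92.92, 93.46],
    "Acc": [93.33, 93.33, 92.56, 92.91, 93.68],
    "Sen": [92.12, 94.29, 92.28, 91.73, 94.09],
    "MCC": [86.67, 86.69, 85.13, 85.74, 87.35],
    "AUC": [93.21, 92.94, 91.93, 92.22, 94.13],
}

# Mean and sample standard deviation per metric (the std convention is assumed).
summary = {name: (mean(vals), stdev(vals)) for name, vals in enzyme_folds.items()}

for name, (m, s) in summary.items():
    print(f"{name}: {m:.2f} ± {s:.2f}")
```

The accuracy mean computed this way rounds to 93.16, which agrees with the average Enzyme accuracy quoted in the abstract.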
Five-fold cross validation results by our method on the Nuclear Receptor dataset.
| Test set | Pre (%) | Acc (%) | Sen (%) | MCC (%) | AUC (%) |
|---|---|---|---|---|---|
| 1 | 94.12 | 94.44 | 94.12 | 88.85 | 93.50 |
| 2 | 83.33 | 88.89 | 93.75 | 78.26 | 91.67 |
| 3 | 90.91 | 91.67 | 95.24 | 82.83 | 86.69 |
| 4 | 90.00 | 94.44 | 100.00 | 89.44 | 97.81 |
| 5 | 85.00 | 91.67 | 100.00 | 84.60 | 95.31 |
| Average | | | | | |
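The columns Pre, Acc, Sen and MCC follow the usual confusion-matrix definitions of precision, accuracy, sensitivity and Matthews correlation coefficient. The sketch below writes those definitions out and checks them against fold 1 of the table above. The counts TP = 16, FP = 1, TN = 18, FN = 1 are inferred here from the reported percentages and are not given in the source; `fold_metrics` is a hypothetical helper name.

```python
import math


def fold_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics, returned as percentages."""
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; report 0.0 in that case.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "Pre": 100 * precision,
        "Acc": 100 * accuracy,
        "Sen": 100 * sensitivity,
        "MCC": 100 * mcc,
    }


# Inferred (not source-provided) counts for Nuclear Receptor fold 1.
fold1 = fold_metrics(tp=16, fp=1, tn=18, fn=1)
print({name: round(value, 2) for name, value in fold1.items()})
```

With these counts the four metrics round to 94.12, 94.44, 94.12 and 88.85, matching the first row of the table. AUC is not derivable from a single confusion matrix and is therefore left out.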
Five-fold cross validation results by our method on the GPCR dataset.
| Test set | Pre (%) | Acc (%) | Sen (%) | MCC (%) | AUC (%) |
|---|---|---|---|---|---|
| 1 | 88.80 | 87.80 | 86.72 | 75.61 | 87.73 |
| 2 | 89.68 | 90.55 | 91.13 | 81.11 | 88.71 |
| 3 | 90.32 | 88.58 | 86.82 | 77.23 | 88.31 |
| 4 | 90.40 | 90.55 | 90.40 | 81.10 | 91.37 |
| 5 | 87.79 | 89.37 | 91.27 | 78.81 | 86.69 |
| Average | | | | | |
Five-fold cross validation results by our method on the Ion Channel dataset.
| Test set | Pre (%) | Acc (%) | Sen (%) | MCC (%) | AUC (%) |
|---|---|---|---|---|---|
| 1 | 90.72 | 91.19 | 91.35 | 82.37 | 90.89 |
| 2 | 90.28 | 91.36 | 93.51 | 82.71 | 91.91 |
| 3 | 91.26 | 93.39 | 94.91 | 86.81 | 93.75 |
| 4 | 91.47 | 91.02 | 90.54 | 82.04 | 90.14 |
| 5 | 90.79 | 91.69 | 92.93 | 83.41 | 91.85 |
| Average | | | | | |
Figure 1. ROC curves by our method on the Enzyme dataset.
Figure 4. ROC curves by our method on the Nuclear Receptor dataset.
Comparisons of five-fold cross validation prediction performance using five different randomly selected negative training samples on the GPCR dataset.
| Negative samples | Pre (%) | Acc (%) | Sen (%) | MCC (%) | AUC (%) |
|---|---|---|---|---|---|
| 1 | 89.65 ± 2.20 | 88.61 ± 0.95 | 89.50 ± 1.62 | 79.21 ± 1.92 | 89.84 ± 1.26 |
| 2 | 89.13 ± 2.04 | 88.98 ± 2.25 | 87.87 ± 2.64 | 77.98 ± 2.59 | 86.67 ± 2.15 |
| 3 | 90.39 ± 2.33 | 90.16 ± 1.72 | 89.91 ± 1.39 | 80.35 ± 1.41 | 91.14 ± 1.64 |
| 4 | 91.68 ± 2.89 | 90.08 ± 2.51 | 88.36 ± 2.74 | 80.50 ± 2.03 | 90.04 ± 2.99 |
| 5 | 89.74 ± 2.69 | 89.37 ± 1.95 | 89.06 ± 2.98 | 78.88 ± 2.15 | 89.76 ± 2.43 |
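The table above compares five independent random draws of negative training samples. The sampling protocol is not described in this excerpt, so the following is only a sketch under stated assumptions: negatives are drawn uniformly from drug-target pairs with no known interaction, and as many negatives are drawn as there are positives. The function `sample_negatives` and all identifiers in the toy data are hypothetical.

```python
import random


def sample_negatives(drugs, targets, positives, seed):
    """Draw len(positives) drug-target pairs that are not known interactions.

    Assumptions: uniform sampling without replacement and a balanced
    positive/negative ratio; neither is confirmed by the source.
    """
    rng = random.Random(seed)
    known = set(positives)
    candidates = [(d, t) for d in drugs for t in targets if (d, t) not in known]
    return rng.sample(candidates, len(known))


# Toy data with made-up identifiers.
drugs = ["D1", "D2", "D3", "D4"]
targets = ["T1", "T2", "T3"]
positives = {("D1", "T1"), ("D2", "T2"), ("D3", "T3"), ("D4", "T1")}

# Five draws with different seeds, mirroring the five rows of the table.
draws = [sample_negatives(drugs, targets, positives, seed) for seed in range(5)]
```

Repeating training and evaluation over several such draws, as in the table, indicates how sensitive the reported metrics are to the particular negatives chosen.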
Five-fold cross validation results on the Enzyme dataset of DVM and SVM.
| Model | Testing Set | Pre (%) | Acc (%) | Sen (%) | MCC (%) | AUC (%) |
|---|---|---|---|---|---|---|
| SVM | 1 | 90.24 | 90.68 | 91.31 | 81.37 | 90.70 |
| SVM | 2 | 88.64 | 89.15 | 89.71 | 78.30 | 88.47 |
| SVM | 3 | 89.85 | 90.00 | 90.60 | 79.99 | 89.96 |
| SVM | 4 | 90.39 | 91.71 | 92.78 | 83.44 | 91.37 |
| SVM | 5 | 90.81 | 89.68 | 88.51 | 79.38 | 89.26 |
| SVM | Average | | | | | |
| DVM | 1 | 94.10 | 93.33 | 92.12 | 86.67 | 93.21 |
| DVM | 2 | 92.37 | 93.33 | 94.29 | 86.69 | 92.94 |
| DVM | 3 | 93.06 | 92.56 | 92.28 | 85.13 | 91.93 |
| DVM | 4 | 92.92 | 92.91 | 91.73 | 85.74 | 92.22 |
| DVM | 5 | 93.46 | 93.68 | 94.09 | 87.35 | 94.13 |
| DVM | Average | | | | | |
Five-fold cross validation results on the Nuclear Receptor dataset of DVM and SVM.
| Model | Testing Set | Pre (%) | Acc (%) | Sen (%) | MCC (%) | AUC (%) |
|---|---|---|---|---|---|---|
| SVM | 1 | 83.33 | 83.33 | 83.33 | 66.67 | 84.26 |
| SVM | 2 | 80.00 | 86.11 | 94.12 | 73.41 | 86.38 |
| SVM | 3 | 86.67 | 75.00 | 65.00 | 52.92 | 71.56 |
| SVM | 4 | 76.47 | 75.00 | 72.22 | 50.08 | 73.46 |
| SVM | 5 | 82.35 | 83.33 | 82.35 | 66.56 | 83.59 |
| SVM | Average | | | | | |
| DVM | 1 | 94.12 | 94.44 | 94.12 | 88.85 | 93.50 |
| DVM | 2 | 83.33 | 88.89 | 93.75 | 78.26 | 91.67 |
| DVM | 3 | 90.91 | 91.67 | 95.24 | 82.83 | 86.69 |
| DVM | 4 | 90.00 | 94.44 | 100.00 | 89.44 | 97.81 |
| DVM | 5 | 85.00 | 91.67 | 100.00 | 84.60 | 95.31 |
| DVM | Average | | | | | |
Figure 5. Comparison of ROC curves between DVM and SVM on the Enzyme dataset.
Figure 8. Comparison of ROC curves between DVM and SVM on the Nuclear Receptor dataset.
Prediction performances of NetCBP[21], Yamanishi et al.[22], KBMF2K[7], and our method on the four benchmark datasets in terms of average AUC.
| Dataset | Our method | Shen | NetCBP | Yamanishi | KBMF2K |
|---|---|---|---|---|---|
| Enzyme | | 0.812 | 0.8251 | 0.845 | 0.832 |
| GPCR | | 0.875 | 0.8235 | 0.812 | 0.857 |
| Ion Channel | | 0.811 | 0.8034 | 0.731 | 0.799 |
| Nuclear Receptor | | 0.871 | 0.8394 | 0.830 | 0.824 |
The four drug–target interaction datasets.
| Dataset | Drug compounds | Target proteins | Interactions |
|---|---|---|---|
| Enzyme | 445 | 664 | 2926 |
| GPCR | 223 | 95 | 635 |
| Ion Channel | 210 | 204 | 1476 |
| Nuclear Receptor | 54 | 26 | 90 |
Figure 9. Flow chart of the proposed method.
Five-fold cross validation results on the GPCR dataset of DVM and SVM.
| Model | Testing Set | Pre (%) | Acc (%) | Sen (%) | MCC (%) | AUC (%) |
|---|---|---|---|---|---|---|
| SVM | 1 | 85.04 | 85.43 | 85.71 | 70.87 | 84.92 |
| SVM | 2 | 85.82 | 83.86 | 83.94 | 67.59 | 84.86 |
| SVM | 3 | 84.03 | 84.25 | 82.64 | 68.42 | 85.43 |
| SVM | 4 | 83.87 | 85.43 | 85.95 | 70.85 | 87.18 |
| SVM | 5 | 87.41 | 88.58 | 90.77 | 77.19 | 88.64 |
| SVM | Average | | | | | |
| DVM | 1 | 88.80 | 87.80 | 86.72 | 75.61 | 87.73 |
| DVM | 2 | 89.68 | 90.55 | 91.13 | 81.11 | 88.71 |
| DVM | 3 | 90.32 | 88.58 | 86.82 | 77.23 | 88.31 |
| DVM | 4 | 90.40 | 90.55 | 90.40 | 81.10 | 91.37 |
| DVM | 5 | 87.79 | 89.37 | 91.27 | 78.81 | 86.69 |
| DVM | Average | | | | | |
Five-fold cross validation results on the Ion Channel dataset of DVM and SVM.
| Model | Testing Set | Pre (%) | Acc (%) | Sen (%) | MCC (%) | AUC (%) |
|---|---|---|---|---|---|---|
| SVM | 1 | 84.46 | 85.93 | 87.11 | 71.90 | 85.40 |
| SVM | 2 | 85.71 | 84.58 | 83.11 | 69.19 | 85.19 |
| SVM | 3 | 87.75 | 86.10 | 85.48 | 72.20 | 86.11 |
| SVM | 4 | 83.55 | 84.75 | 86.39 | 69.53 | 85.48 |
| SVM | 5 | 84.11 | 86.61 | 89.12 | 73.36 | 87.28 |
| SVM | Average | | | | | |
| DVM | 1 | 90.72 | 91.19 | 91.35 | 82.37 | 90.89 |
| DVM | 2 | 90.28 | 91.36 | 93.51 | 82.71 | 91.91 |
| DVM | 3 | 91.26 | 93.39 | 94.91 | 86.81 | 93.75 |
| DVM | 4 | 91.47 | 91.02 | 90.54 | 82.04 | 90.14 |
| DVM | 5 | 90.79 | 91.69 | 92.93 | 83.41 | 91.85 |
| DVM | Average | | | | | |