Cong Shen, Yijie Ding, Jijun Tang, Xinying Xu, Fei Guo.
Abstract
Computational prediction of drug-target interactions (DTIs) plays a crucial role in reducing experimental cost, and a variety of state-of-the-art methods have been proposed to improve prediction accuracy. In this paper, we propose a drug-target interaction predictor based on the multi-scale discrete wavelet transform and network features (named DAWN). We encode each drug molecule as a substructure fingerprint using a dictionary of substructure patterns, and we apply the discrete wavelet transform (DWT) to extract features from target sequences. We then concatenate and normalize the target, drug, and network features to construct feature vectors, which are fed into a support vector machine (SVM) classifier to obtain the prediction model. Extensive experiments show that the predictive ability of DAWN is comparable to that of other DTI prediction schemes. The areas under the precision-recall curves (AUPRs) on the four datasets are 0.895 (Enzyme), 0.921 (Ion Channel), 0.786 (guanosine-binding protein coupled receptor, GPCR), and 0.603 (Nuclear Receptor).
Keywords: discrete wavelet transform; drug–target interactions; network property; support vector machine
Year: 2017 PMID: 28813000 PMCID: PMC5578170 DOI: 10.3390/ijms18081781
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Statistics of DTI datasets [4].
| Dataset | Drugs | Targets | Interactions | Ratio (drugs/targets) |
|---|---|---|---|---|
| Enzyme | 445 | 664 | 2926 | 0.67 |
| IC | 210 | 204 | 1476 | 1.03 |
| GPCR | 223 | 95 | 635 | 2.35 |
| Nuclear receptors | 54 | 26 | 90 | 2.08 |
IC: ion channel; GPCR: guanosine-binding protein coupled receptor.
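The ratio column can be reproduced from the counts in the table. The short sketch below also computes the fraction of all drug-target pairs that are known interactions, which is not a column in the table; it is added here only to illustrate how sparse the positive class is. The function name `dataset_stats` is hypothetical.

```python
# Counts copied from the dataset statistics table: (drugs, targets, known interactions).
DATASETS = {
    "Enzyme": (445, 664, 2926),
    "IC": (210, 204, 1476),
    "GPCR": (223, 95, 635),
    "Nuclear receptors": (54, 26, 90),
}


def dataset_stats(drugs, targets, interactions):
    """Return (drugs/targets ratio, fraction of drug-target pairs known to interact)."""
    ratio = drugs / targets
    positive_fraction = interactions / (drugs * targets)
    return ratio, positive_fraction


for name, counts in DATASETS.items():
    ratio, positive_fraction = dataset_stats(*counts)
    print(f"{name}: ratio={ratio:.2f}, positive fraction={positive_fraction:.4f}")
```

In every dataset the known interactions are a small share of all possible pairs, which is why results are reported separately for balanced and imbalanced settings.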
Comparison of the prediction performance between different features on balanced datasets.
| Dataset | Feature | ACC | Sn | Sp | AUC |
|---|---|---|---|---|---|
| Enzyme | DWT + MACCS | 0.867 ± 0.002 | 0.861 ± 0.004 | 0.873 ± 0.003 | 0.925 ± 0.003 |
| | DWT + MACCS (FS) | 0.895 ± 0.001 | 0.901 ± 0.003 | 0.889 ± 0.003 | 0.949 ± 0.001 |
| | DWT + NET + MACCS | 0.932 ± 0.003 | 0.933 ± 0.002 | 0.933 ± 0.002 | 0.977 ± 0.002 |
| | DWT + NET + MACCS (FS) | 0.938 ± 0.002 | 0.938 ± 0.002 | 0.939 ± 0.004 | 0.980 ± 0.001 |
| IC | DWT + MACCS | 0.864 ± 0.003 | 0.868 ± 0.004 | 0.861 ± 0.005 | 0.929 ± 0.004 |
| | DWT + MACCS (FS) | 0.879 ± 0.004 | 0.891 ± 0.004 | 0.866 ± 0.007 | 0.935 ± 0.003 |
| | DWT + NET + MACCS | 0.940 ± 0.004 | 0.932 ± 0.005 | 0.943 ± 0.006 | 0.978 ± 0.003 |
| | DWT + NET + MACCS (FS) | 0.943 ± 0.002 | 0.938 ± 0.003 | 0.949 ± 0.003 | 0.983 ± 0.001 |
| GPCR | DWT + MACCS | 0.826 ± 0.005 | 0.831 ± 0.003 | 0.822 ± 0.007 | 0.872 ± 0.004 |
| | DWT + MACCS (FS) | 0.836 ± 0.006 | 0.846 ± 0.007 | 0.827 ± 0.009 | 0.892 ± 0.005 |
| | DWT + NET + MACCS | 0.872 ± 0.004 | 0.872 ± 0.005 | 0.872 ± 0.003 | 0.934 ± 0.005 |
| | DWT + NET + MACCS (FS) | 0.890 ± 0.005 | 0.888 ± 0.009 | 0.891 ± 0.011 | 0.950 ± 0.002 |
| Nuclear receptor | DWT + MACCS | 0.750 ± 0.011 | 0.619 ± 0.013 | 0.879 ± 0.021 | 0.816 ± 0.015 |
| | DWT + MACCS (FS) | 0.791 ± 0.017 | 0.790 ± 0.018 | 0.793 ± 0.036 | 0.850 ± 0.016 |
| | DWT + NET + MACCS | 0.805 ± 0.021 | 0.767 ± 0.017 | 0.837 ± 0.013 | 0.866 ± 0.011 |
| | DWT + NET + MACCS (FS) | 0.860 ± 0.009 | 0.855 ± 0.013 | 0.867 ± 0.024 | 0.931 ± 0.009 |
DWT: discrete wavelet transform; FS: feature selection; NET: network features; MACCS: drug features of molecular access system.
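For readers unfamiliar with the column headers, the sketch below computes accuracy (ACC), sensitivity (Sn), and specificity (Sp) from binary labels using their standard confusion-matrix definitions. It assumes the table uses these standard definitions; the function name `acc_sn_sp` and the toy labels are illustrative only.

```python
def acc_sn_sp(y_true, y_pred):
    """Return (ACC, Sn, Sp) for binary labels, where 1 = interaction and 0 = no interaction."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = (tp + tn) / (tp + tn + fp + fn)
    sn = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    sp = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    return acc, sn, sp


# Toy example: 4 positives and 4 negatives, one error in each class.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(acc_sn_sp(y_true, y_pred))  # (0.75, 0.75, 0.75)
```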
Figure 1. Receiver operating characteristic (ROC) curves and the corresponding areas under the curve (AUC) obtained on balanced datasets (with FS). The blue curve uses the combined MACCS (chem), DWT (bio), and net features; the red curve uses the MACCS (chem) and DWT (bio) features only. (a) Enzyme’s ROC curve with network feature; (b) IC’s ROC curve with network feature; (c) GPCR’s ROC curve with network feature; (d) Nuclear receptor’s ROC curve with network feature.
Figure 2. Precision–recall (PR) curves and the corresponding areas under the curve (AUPR) obtained on balanced datasets (with FS). The blue curve uses the combined MACCS (chem), DWT (bio), and net features; the red curve uses the MACCS (chem) and DWT (bio) features only. (a) Enzyme’s PR curve with network feature; (b) IC’s PR curve with network feature; (c) GPCR’s PR curve with network feature; (d) Nuclear receptor’s PR curve with network feature.
The mean AUC values of five methods on balanced datasets.
| Methods | Enzyme | IC | GPCR | Nuclear Receptor |
|---|---|---|---|---|
| Cao’s work | 0.979 | | 0.951 | 0.924 |
| BGL | 0.904 | 0.851 | 0.899 | 0.843 |
| BLM | 0.976 | 0.973 | | 0.881 |
| NetLapRLS | 0.956 | 0.947 | 0.931 | 0.856 |
| RLS | 0.978 | 0.984 | 0.954 | 0.922 |
| DAWN (our method) | 0.980 | 0.983 | 0.950 | 0.931 |
Results excerpted from [14]. The best result in each column is in boldface. BGL: bipartite graph learning; BLM: bipartite local model; NetLapRLS: Laplacian regularized least squares based on interaction network; RLS: regularized least squares; DAWN: prediction of Drug–tArget interactions based on multi-scale discrete Wavelet transform and Network features.
Overall AUC and AUPR values of different methods on imbalanced dataset for four species.
| Evaluation | Method | Enzyme | Ion Channel | GPCR | Nuclear Receptor |
|---|---|---|---|---|---|
| AUC | NetLapRLS | 0.972 ± 0.002 | 0.969 ± 0.003 | 0.915 ± 0.006 | 0.850 ± 0.021 |
| | BLM-NII | 0.978 ± 0.002 | 0.981 ± 0.002 | 0.950 ± 0.006 | 0.905 ± 0.023 |
| | WNN-GIP | 0.964 ± 0.003 | 0.959 ± 0.003 | 0.944 ± 0.005 | 0.901 ± 0.017 |
| | KBMF2K | 0.905 ± 0.003 | 0.961 ± 0.003 | 0.926 ± 0.006 | 0.877 ± 0.023 |
| | CMF | 0.969 ± 0.002 | 0.981 ± 0.002 | 0.940 ± 0.007 | 0.864 ± 0.026 |
| | NRLMF | | | | |
| | DAWN | | | | |
| AUPR | NetLapRLS | 0.789 ± 0.005 | 0.837 ± 0.009 | 0.616 ± 0.015 | 0.465 ± 0.044 |
| | BLM-NII | 0.752 ± 0.011 | 0.821 ± 0.012 | 0.524 ± 0.024 | |
| | WNN-GIP | 0.706 ± 0.017 | 0.717 ± 0.020 | 0.520 ± 0.021 | 0.589 ± 0.034 |
| | KBMF2K | 0.654 ± 0.008 | 0.771 ± 0.009 | 0.578 ± 0.018 | 0.534 ± 0.050 |
| | CMF | 0.877 ± 0.005 | | 0.745 ± 0.013 | 0.584 ± 0.042 |
| | NRLMF | | 0.906 ± 0.008 | | |
| | DAWN | 0.895 | 0.921 | 0.786 | 0.603 ± 0.087 |
Results excerpted from [12]. The best result in each column is in boldface and the second best is underlined. BLM-NII: improved BLM with neighbor-based interaction-profile inferring; CMF: collaborative matrix factorization; KBMF2K: kernelized Bayesian matrix factorization with twin kernels; NRLMF: neighborhood regularized logistic matrix factorization; WNN-GIP: weighted nearest neighbor with Gaussian interaction profile kernels.
Figure 3. ROC curves on imbalanced datasets by 10-fold cross-validation. (a) Enzyme’s ROC curve with network feature; (b) IC’s ROC curve with network feature; (c) GPCR’s ROC curve with network feature; (d) Nuclear receptor’s ROC curve with network feature.
Figure 4. PR curves (AUPR) on imbalanced datasets by 10-fold cross-validation. (a) Enzyme’s PR curve with network feature; (b) IC’s PR curve with network feature; (c) GPCR’s PR curve with network feature; (d) Nuclear receptor’s PR curve with network feature.
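The two summary metrics reported for these curves can be computed directly from prediction scores. The sketch below is a minimal pure-Python illustration, not the authors' evaluation code: AUC is computed as the probability that a random positive outscores a random negative, and AUPR is approximated by average precision (a step-wise area under the PR curve). The exact interpolation used in the paper is not stated here, so treat the AUPR variant as an assumption. The function names are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / n_pos


# Toy example: two positives among six candidate drug-target pairs.
labels = [1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
print(roc_auc(labels, scores), average_precision(labels, scores))
```

On imbalanced data the PR-based metric is usually the more demanding of the two, because a single highly ranked negative lowers precision far more than it lowers the ROC curve.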
Top five new DTIs predicted by DAWN on four data sets.
| Dataset | Rank | Drug | Target | Databases |
|---|---|---|---|---|
| Enzyme | 1 | D00545 | hsa1571 | |
| | 2 | D03365 | hsa1571 | |
| | 3 | D00437 | hsa1559 | M |
| | 4 | D00546 | hsa1571 | |
| | 5 | D00184 | hsa5478 | D |
| Ion channel | 1 | D00542 | hsa6262 | |
| | 2 | D00542 | hsa6263 | M |
| | 3 | D00349 | hsa6263 | |
| | 4 | D00477 | hsa6336 | C |
| | 5 | D01448 | hsa3782 | |
| GPCR | 1 | D01051 | hsa3269 | |
| | 2 | D00563 | hsa3269 | D, M |
| | 3 | D00563 | hsa1812 | D |
| | 4 | D00715 | hsa1129 | D, K |
| | 5 | D00563 | hsa1129 | |
| Nuclear receptor | 1 | D01689 | hsa5241 | |
| | 2 | D01115 | hsa5241 | |
| | 3 | D00443 | hsa5241 | D |
| | 4 | D00443 | hsa367 | D |
| | 5 | D00187 | hsa2099 | |
C: ChEMBL; D: DrugBank; K: KEGG; M: Matador.
Six physicochemical properties of 20 amino acid types.
| Amino Acid | H | VSC | P1 | P2 | SASA | NCISC |
|---|---|---|---|---|---|---|
| A | 0.62 | 27.5 | 8.1 | 0.046 | 1.181 | 0.007187 |
| C | 0.29 | 44.6 | 5.5 | 0.128 | 1.461 | −0.03661 |
| D | −0.9 | 40 | 13 | 0.105 | 1.587 | −0.02382 |
| E | −0.74 | 62 | 12.3 | 0.151 | 1.862 | 0.006802 |
| F | 1.19 | 115.5 | 5.2 | 0.29 | 2.228 | 0.037552 |
| G | 0.48 | 0 | 9 | 0 | 0.881 | 0.179052 |
| H | −0.4 | 79 | 10.4 | 0.23 | 2.025 | −0.01069 |
| I | 1.38 | 93.5 | 5.2 | 0.186 | 1.81 | 0.021631 |
| K | −1.5 | 100 | 11.3 | 0.219 | 2.258 | 0.017708 |
| L | 1.06 | 93.5 | 4.9 | 0.186 | 1.931 | 0.051672 |
| M | 0.64 | 94.1 | 5.7 | 0.221 | 2.034 | 0.002683 |
| N | −0.78 | 58.7 | 11.6 | 0.134 | 1.655 | 0.005392 |
| P | 0.12 | 41.9 | 8 | 0.131 | 1.468 | 0.239531 |
| Q | −0.85 | 80.7 | 10.5 | 0.18 | 1.932 | 0.049211 |
| R | −2.53 | 105 | 10.5 | 0.291 | 2.56 | 0.043587 |
| S | −0.18 | 29.3 | 9.2 | 0.062 | 1.298 | 0.004627 |
| T | −0.05 | 51.3 | 8.6 | 0.108 | 1.525 | 0.003352 |
| V | 1.08 | 71.5 | 5.9 | 0.14 | 1.645 | 0.057004 |
| W | 0.81 | 145.5 | 5.4 | 0.409 | 2.663 | 0.037977 |
| Y | 0.26 | 117.3 | 6.2 | 0.298 | 2.368 | 0.023599 |
H: hydrophobicity; VSC: volumes of side chains of amino acids; P1: polarity; P2: polarizability; SASA: solvent-accessible surface area; NCISC: net charge index of side chains.
Figure 5. Wavelet decomposition tree.
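To make the target-side feature extraction concrete, the sketch below encodes a protein sequence with the hydrophobicity (H) column of the property table and applies a multi-level discrete wavelet decomposition. Several choices here are assumptions rather than details confirmed by this record: the Haar wavelet, four decomposition levels, padding odd-length signals by repeating the last sample, and summarizing each band by min, max, mean, and standard deviation. The function names are hypothetical.

```python
import math
import statistics

# Hydrophobicity (H) values copied from the physicochemical property table.
HYDROPHOBICITY = {
    "A": 0.62, "C": 0.29, "D": -0.9, "E": -0.74, "F": 1.19,
    "G": 0.48, "H": -0.4, "I": 1.38, "K": -1.5, "L": 1.06,
    "M": 0.64, "N": -0.78, "P": 0.12, "Q": -0.85, "R": -2.53,
    "S": -0.18, "T": -0.05, "V": 1.08, "W": 0.81, "Y": 0.26,
}


def encode(sequence, scale=HYDROPHOBICITY):
    """Map an amino acid sequence to a numeric signal using one property scale."""
    return [scale[aa] for aa in sequence]


def haar_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    if len(signal) % 2:
        signal = signal + [signal[-1]]  # assumption: pad odd lengths with the last sample
    root2 = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    return approx, detail


def dwt_features(sequence, levels=4):
    """Summarize the approximation and detail bands of each level (assumed statistics)."""
    signal = encode(sequence)
    features = []
    for _ in range(levels):
        if len(signal) < 2:
            break
        signal, detail = haar_step(signal)  # next level decomposes the approximation
        for band in (signal, detail):
            features += [min(band), max(band),
                         statistics.fmean(band), statistics.pstdev(band)]
    return features


features = dwt_features("ACDEFGHIKLMNPQRSTVWY")
print(len(features))
```

Because the summary statistics have a fixed size per band, sequences of different lengths map to feature vectors of the same length, which is what a downstream classifier needs. Repeating the procedure for the other five property columns would give one such vector per property.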
Figure 6. Overview of the drug–target interaction (DTI) prediction.
Figure 7. Flow chart. DWT: discrete wavelet transform; DCT: discrete cosine transform; Std: standard deviation; SVM: support vector machine.
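The step between feature extraction and the classifier, concatenating and normalizing the target, drug, and network features, can be sketched as follows. The normalization scheme is not specified in this record, so column-wise min-max scaling is used purely as an assumption; the function names and toy feature values are hypothetical.

```python
def min_max_columns(rows):
    """Rescale every feature column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def build_pair_vectors(pairs):
    """pairs: iterable of (target_features, drug_features, network_features) per drug-target pair."""
    raw = [list(t) + list(d) + list(n) for t, d, n in pairs]
    return min_max_columns(raw)


# Toy example: two drug-target pairs with 2 target, 2 drug, and 1 network feature each.
pairs = [
    ([0.5, 2.0], [1, 0], [3.0]),
    ([1.5, 2.0], [0, 1], [1.0]),
]
for vector in build_pair_vectors(pairs):
    print(vector)
```

Scaling all blocks to a common range keeps features with large raw magnitudes from dominating those with small ones when the vectors are passed to a kernel-based classifier such as an SVM.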