Huiqing Wang1, Jingjing Wang1, Chunlin Dong2, Yuanyuan Lian1, Dan Liu1, Zhiliang Yan1.
Abstract
Drug targets are biomacromolecules or biomolecular structures that bind to specific drugs and produce therapeutic effects. Predicting drug-target interactions (DTIs) is therefore important for disease therapy. Incorporating multiple similarity measures for drugs and targets is essential for improving the accuracy of DTI prediction. However, existing studies that use multiple similarity measures ignore the global structure information of those measures and require manual feature extraction for drug-target pairs, which overlooks the non-linear relationships among features. In this paper, we propose MDADTI, a novel approach to DTI prediction based on a multimodal deep autoencoder (MDA). MDADTI applies the random walk with restart (RWR) method and positive pointwise mutual information (PPMI) to calculate topological similarity matrices of drugs and targets, capturing the global structure information of the similarity measures. It then applies the MDA to fuse the multiple topological similarity matrices of drugs and targets, automatically learning their low-dimensional features, and applies a deep neural network (DNN) to predict DTIs. Results from 5 repeats of 10-fold cross-validation under three different cross-validation settings indicate that MDADTI outperforms four baseline methods. In addition, we validated the predictions of MDADTI against six drug-target interaction reference databases, and the results show that MDADTI can effectively identify unknown DTIs.
Keywords: drug-target interactions; multimodal deep autoencoder; multiple similarity measures; positive pointwise mutual information; random walk with restart
Year: 2020 PMID: 32047432 PMCID: PMC6997437 DOI: 10.3389/fphar.2019.01592
Source DB: PubMed Journal: Front Pharmacol ISSN: 1663-9812 Impact factor: 5.810
Figure 1. The overall framework of the MDADTI method. (A) MDADTI applied the RWR method and PPMI to calculate topological similarity matrices of drugs (targets); (B) MDA was applied to fuse multiple topological similarity matrices of drugs (targets) and automatically learned the low-dimensional features of drugs (targets); (C) DNN was applied to predict DTIs.
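
Step (A) can be illustrated with a minimal sketch. It assumes a common formulation of RWR (row-normalized transition matrix, fixed restart probability) followed by the standard PPMI transform; the function names, the `restart` and `steps` parameters, and the toy 3x3 similarity matrix are hypothetical and are not taken from the paper. Plain Python lists are used only to keep the example self-contained.

```python
import math

def row_normalize(sim):
    """Scale each row to sum to 1 so it can serve as transition probabilities."""
    out = []
    for row in sim:
        total = sum(row)
        out.append([v / total if total > 0 else 0.0 for v in row])
    return out

def rwr(sim, restart=0.5, steps=50):
    """Random walk with restart from every node.

    Row i of the result is the visiting distribution of a walker that
    starts at node i and returns to it with probability `restart`
    (an assumed value) at each step.
    """
    n = len(sim)
    trans = row_normalize(sim)
    start = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    p = [row[:] for row in start]
    for _ in range(steps):
        nxt = []
        for i in range(n):
            walked = [sum(p[i][k] * trans[k][j] for k in range(n)) for j in range(n)]
            nxt.append([(1 - restart) * walked[j] + restart * start[i][j] for j in range(n)])
        p = nxt
    return p

def ppmi(mat):
    """Positive pointwise mutual information: max(0, log(M_ij * total / (row_i * col_j)))."""
    total = sum(sum(row) for row in mat)
    row_sums = [sum(row) for row in mat]
    col_sums = [sum(col) for col in zip(*mat)]
    out = []
    for i, row in enumerate(mat):
        out_row = []
        for j, v in enumerate(row):
            denom = row_sums[i] * col_sums[j]
            if v > 0 and denom > 0:
                out_row.append(max(0.0, math.log(v * total / denom)))
            else:
                out_row.append(0.0)
        out.append(out_row)
    return out

# Toy drug-drug similarity matrix (illustrative values only).
sim = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
diffusion = rwr(sim)      # global structure via diffusion
topo = ppmi(diffusion)    # topological similarity matrix
```

Each row of `diffusion` remains a probability distribution, and `topo` is non-negative by construction; one such matrix would be computed per similarity measure before fusion.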
Summary of drug-target interaction data.
| Datasets | Number of drugs | Number of targets | Number of positive interactions | Number of negative interactions | Total number of interactions |
|---|---|---|---|---|---|
| NR | 54 | 26 | 90 | 1314 | 1404 |
| GPCR | 223 | 95 | 635 | 20550 | 21185 |
| IC | 210 | 204 | 1476 | 41364 | 42840 |
| E | 445 | 664 | 2926 | 292554 | 295480 |
| DrugBank_FDA | 1482 | 1408 | 9881 | 2076775 | 2086656 |
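
The counts in this table are consistent with treating every drug-target pair as a candidate interaction and every pair without a known interaction as a negative. The short check below reproduces the last two columns from the first three under that assumption; the `pair_counts` helper is hypothetical.

```python
# (number of drugs, number of targets, number of positive interactions)
datasets = {
    "NR": (54, 26, 90),
    "GPCR": (223, 95, 635),
    "IC": (210, 204, 1476),
    "E": (445, 664, 2926),
    "DrugBank_FDA": (1482, 1408, 9881),
}

def pair_counts(n_drugs, n_targets, n_positive):
    """Return (total pairs, negative pairs), assuming all non-positive pairs are negatives."""
    total = n_drugs * n_targets
    return total, total - n_positive

for name, (n_drugs, n_targets, n_positive) in datasets.items():
    total, negative = pair_counts(n_drugs, n_targets, n_positive)
    ratio = negative / n_positive
    print(f"{name}: total={total}, negative={negative}, negatives per positive={ratio:.1f}")
```

The negatives-per-positive ratio makes the class imbalance explicit, which is one reason AUPR is reported alongside AUC.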
Summary of multiple similarity measures of drugs and targets.
| Dataset | Entity | Information source | Similarity measures |
|---|---|---|---|
| | Drug | Chemical structure fingerprints | TAN-Tanimoto Kernel |
| | | Side-effects | AERS-bit-AERS bit |
| | Target | Functional annotation | GO - Gene Ontology Semantic Similarity |
| | | Sequences | MIS-k3m1-Mismatch kernel |
| | | Protein-protein interactions | PPI-Proximity in |
| DrugBank_FDA | Drug | Molecular fingerprints | CDK_Standard, CDK_Graph, |
| | | ATC code | _FDA_FirstLevel, |
| | | Drug interaction profile | D_interactions_FDA |
| | | Side-effects | SIDER-Side-effects Similarity |
| | | Drug-induced gene expression | Cmap_v2_MCF7 |
| | | Drug pathways profiles | KEGG_Drug_2_Pathway |
| | | Drug disease profiles | KEGG_Drug_Compound_ |
| | Target | Amino acid sequence | mismatch_kernel_3_1, |
| | | GO annotations | CC_WANG_BMA |
| | | Proximity in the PPI network | shortest_path_networkX_distance_UP_ID_Sim_Perlman, |
| | | Protein domain profiles | protein2ipr_binaryMatrix |
| | | Gene expression similarity profiles | Cmap_v2_MCF7 |
| | | Target disease | KEGG_Gene_2_Disease |
| | | Target pathway | KEGG_Gene_2_Pathway |
Figure 2. Structure diagram of MDA. The MDA consists of two parts, an encoder and a decoder. The inputs of the encoder are multiple topological similarity matrices; the hidden layer in the red box is the feature layer, whose output is the low-dimensional feature matrix of drugs; the outputs of the decoder are multiple reconstructed topological similarity matrices.
Figure 3. The layer configuration diagram of MDA. The layer configuration is [n*m, n*100, n*75, 50, n*75, n*100, n*m]. It consists of 7 layers of neurons: 1 input layer of size n*m, where n is the number of input similarity measures and m is the number of columns of each similarity matrix, i.e. the number of drugs (targets); 2 encoding layers of sizes n*100 and n*75; 1 feature layer with 50 neurons; 2 decoding layers of sizes n*75 and n*100; and 1 output layer of size n*m.
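
The configuration in the Figure 3 caption can be written out as a small helper that returns the layer sizes for a given n and m. This is only a sketch of the size arithmetic, not of the autoencoder itself; the function name and its `hidden` and `feature` parameters are hypothetical, and n = 2, m = 54 are illustrative values.

```python
def mda_layer_sizes(n, m, hidden=(100, 75), feature=50):
    """Layer sizes [n*m, n*100, n*75, 50, n*75, n*100, n*m] for the default arguments.

    n: number of input similarity measures
    m: number of columns of each similarity matrix (number of drugs or targets)
    """
    encoder = [n * m] + [n * h for h in hidden]  # input layer plus encoding layers
    decoder = list(reversed(encoder))            # decoding layers mirror the encoder
    return encoder + [feature] + decoder

sizes = mda_layer_sizes(n=2, m=54)
print(sizes)  # [108, 200, 150, 50, 150, 200, 108]
```

Passing a different `hidden` tuple or `feature` size gives the shallower or deeper variants of the same symmetric pattern.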
The comparison results of MDADTI models with different layer configurations of the two MDAs under 5 repeats of 10-fold cross-validation on four datasets. The AUC and AUPR values in bold are the highest among the three sets of evaluation values corresponding to the three different layer configurations of MDAs in each dataset.
| Datasets | Drug MDA layer configuration | Target MDA layer configuration | AUC | AUPR |
|---|---|---|---|---|
| GPCR | [n*nd,50,n*nd] | [n*nt,25,n*nt] | | |
| | [n*nd,n*75,50,n*75,n*nd] | [n*nt,n*50,25,n*50,n*nt] | 0.965 | 0.963 |
| | [n*nd,n*150,n*75,50,n*75,n*150,n*nd] | [n*nt,n*75,n*50,25,n*50,n*75,n*nt] | 0.930 | 0.925 |
| IC | [n*nd,50,n*nd] | [n*nt,50,n*nt] | | |
| | [n*nd,n*75,50,n*75,n*nd] | [n*nt,n*75,50,n*75,n*nt] | 0.944 | 0.923 |
| | [n*nd,n*150,n*75,50,n*75,n*150,n*nd] | [n*nt,n*150,n*75,50,n*75,n*150,n*nt] | 0.914 | 0.906 |
| E | [n*nd,100,n*nd] | [n*nt,100,n*nt] | 0.956 | 0.947 |
| | [n*nd,n*200,100,n*200,n*nd] | [n*nt,n*200,100,n*200,n*nt] | | |
| | [n*nd,n*300,n*200,100,n*200,n*300,n*nd] | [n*nt,n*300,n*200,100,n*200,n*300,n*nt] | 0.893 | 0.886 |
| DrugBank_FDA | [n*nd,100,n*nd] | [n*nt,100,n*nt] | 0.925 | 0.912 |
| | [n*nd,n*200,100,n*200,n*nd] | [n*nt,n*200,100,n*200,n*nt] | | |
| | [n*nd,n*300,n*200,100,n*200,n*300,n*nd] | [n*nt,n*300,n*200,100,n*200,n*300,n*nt] | 0.946 | 0.938 |
Figure 4. The ROC and precision-recall curves of the first repeat of 10-fold cross-validation for five datasets; the left is the ROC curve and the right is the precision-recall curve. (A) The ROC and precision-recall curves for the NR dataset; (B) The ROC and precision-recall curves for the GPCR dataset; (C) The ROC and precision-recall curves for the IC dataset; (D) The ROC and precision-recall curves for the E dataset; (E) The ROC and precision-recall curves for the DrugBank_FDA dataset.
Figure 5. Comparison of AUC and AUPR among the MDADTI, DDR, KronRLS-MKL, NRLMF, and BLM-NII methods on the NR, GPCR, IC, E, and DrugBank_FDA datasets under the CVS1, CVS2, and CVS3 settings. (A) Comparison of AUC and AUPR under the CVS1 setting; (B) Comparison of AUC and AUPR under the CVS2 setting; (C) Comparison of AUC and AUPR under the CVS3 setting. The symbols +/- denote whether the differences between our method MDADTI and the other methods are statistically significant (+) or not (-) at the significance level of 0.05.
Figure 6. The comparison of AUC and AUPR between the MDADTI and RWR_DNN methods on the NR, GPCR, IC, and E datasets under the CVS1, CVS2, and CVS3 settings. (A) Comparison of AUC; (B) Comparison of AUPR.
Top 30 unknown DTIs predicted by MDADTI model on E dataset. DTIs in bold indicate that they are validated in one or more reference databases.
| Rank | Drug | Target | Probability | Databases |
|---|---|---|---|---|
| 15 | D00043 | hsa1504 | 0.977 | |
| 19 | D01223 | hsa5538 | 0.9616 | |
| 20 | D00002 | hsa31 | 0.9553 | |
| 23 | D00139 | hsa5742 | 0.9344 | |
| 26 | D00002 | hsa7298 | 0.9207 | |
| 27 | D03670 | hsa1579 | 0.8932 | |
| 28 | D01441 | hsa3551 | 0.8806 | |
| 30 | D00043 | hsa686 | 0.8688 | |
Figure 7. Network visualization of the top 100 unknown DTIs in the E dataset. Yellow and blue nodes represent drugs and targets, respectively. Solid lines represent verified interactions and dashed lines represent unverified interactions. There are 40 unknown DTIs that were verified.
The fractions of true DTIs among the predicted Top N (N = 10, 30, 50, 100) unknown DTIs in five datasets.
| Dataset | Top N | Fraction |
|---|---|---|
| NR | Top10 | 50.00% |
| | Top30 | 43.33% |
| | Top50 | 28.00% |
| | Top100 | 20.00% |
| GPCR | Top10 | 80.00% |
| | Top30 | 66.67% |
| | Top50 | 60.00% |
| | Top100 | 40.00% |
| IC | Top10 | 80.00% |
| | Top30 | 50.00% |
| | Top50 | 40.00% |
| | Top100 | 32.00% |
| E | Top10 | 100.00% |
| | Top30 | 73.33% |
| | Top50 | 52.00% |
| | Top100 | 40.00% |
| DrugBank_FDA | Top10 | 80.00% |
| | Top30 | 66.67% |
| | Top50 | 68.00% |
| | Top100 | 46.00% |
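
Each fraction is the share of the N highest-ranked unknown DTIs that were validated in a reference database. The sketch below reproduces the E dataset's Top10 and Top30 values, assuming that the ranks listed in the Top 30 table without a database entry (15, 19, 20, 23, 26, 27, 28, 30) are the unvalidated predictions; the `top_n_fraction` helper is hypothetical.

```python
def top_n_fraction(validated_flags, n):
    """Percentage of the first n ranked predictions that are flagged as validated."""
    top = validated_flags[:n]
    return 100.0 * sum(top) / len(top)

# Assumed unvalidated ranks among the top 30 predictions on the E dataset.
unvalidated_ranks = {15, 19, 20, 23, 26, 27, 28, 30}
flags = [rank not in unvalidated_ranks for rank in range(1, 31)]

print(f"Top10: {top_n_fraction(flags, 10):.2f}%")  # Top10: 100.00%
print(f"Top30: {top_n_fraction(flags, 30):.2f}%")  # Top30: 73.33%
```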