Zhixian Liu, Qingfeng Chen, Wei Lan, Haiming Pan, Xinkun Hao, Shirui Pan.
Abstract
Identifying drug-target interactions (DTIs) is the basis of drug development. However, discovering drug-target interactions through biochemical experiments has low coverage and high cost. Many computational methods have been developed to predict potential drug-target interactions from known ones, but their accuracy still needs to be improved. In this article, a graph autoencoder approach for DTI prediction (GADTI) is proposed to discover potential interactions between drugs and targets using a heterogeneous network that integrates diverse drug-related and target-related datasets. Its encoder consists of two components: a graph convolutional network (GCN) and a random walk with restart (RWR). The decoder is DistMult, a matrix factorization model that uses the embedding vectors from the encoder to discover potential DTIs. Combining GCN and RWR gives each node information from a larger neighborhood while avoiding the over-smoothing and computational complexity caused by multi-layer message passing. Using 10-fold cross-validation, we conduct three experiments in different scenarios. The results show that GADTI outperforms the baseline methods in both the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). In addition, a case study based on the latest DrugBank dataset (V5.1.8) shows that GADTI predicts 54.8% of newly approved DTIs.
Keywords: autoencoder; drug-target interaction prediction; graph convolutional network; heterogeneous network; network embedding; random walk
Year: 2021 PMID: 33912218 PMCID: PMC8072283 DOI: 10.3389/fgene.2021.650821
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
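The encoder and decoder described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a single homogeneous adjacency matrix instead of the heterogeneous network, one GCN layer, a restart-style propagation of the GCN output, and random untrained weights. All function names, the `restart` and `steps` parameters, and the toy graph are hypothetical.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetrically normalize A + I, as in a standard GCN layer."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def encode(adj, features, weight, restart=0.1, steps=10):
    """One GCN layer followed by random-walk-with-restart propagation.

    Assumption: the walk restarts to the GCN output h, so each node keeps
    part of its own representation while gathering information from a
    larger neighborhood without stacking more GCN layers.
    """
    a_norm = normalize_adjacency(adj)
    h = np.maximum(a_norm @ features @ weight, 0.0)   # GCN layer with ReLU
    z = h
    for _ in range(steps):
        z = (1.0 - restart) * (a_norm @ z) + restart * h
    return z

def distmult_scores(z_drug, z_target, relation):
    """DistMult decoder: sigmoid(z_d^T diag(r) z_t) for every drug-target pair."""
    logits = (z_drug * relation) @ z_target.T
    return 1.0 / (1.0 + np.exp(-logits))

# Toy graph: 3 drugs (nodes 0-2) and 4 targets (nodes 3-6).
rng = np.random.default_rng(0)
n_drugs, n_targets, n_feat, dim = 3, 4, 5, 8
n = n_drugs + n_targets
adj = np.zeros((n, n))
for d, t in [(0, 3), (1, 4), (2, 5), (0, 6)]:         # known interactions
    adj[d, t] = adj[t, d] = 1.0

features = rng.normal(size=(n, n_feat))               # random node features
weight = rng.normal(size=(n_feat, dim))               # untrained GCN weights
relation = rng.normal(size=dim)                       # untrained DistMult diagonal

z = encode(adj, features, weight)
scores = distmult_scores(z[:n_drugs], z[n_drugs:], relation)
print(scores.shape)  # (3, 4): one score per drug-target pair
```

In a trained model the weights and the relation vector would be learned from the known interactions; here they only show how embeddings flow from the encoder into the decoder.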
Sources of datasets and their statistical information.
| Node or edge type | Count | Source |
| Drug | 708 | |
| Target | 1,512 | |
| Disease | 5,603 | |
| Side effect | 4,192 | |
| Drug–target interaction | 1,923 | DrugBank v3.0 (Knox et al., |
| Drug–drug interaction | 10,036 | DrugBank v3.0 (Knox et al., |
| Protein–protein interaction | 7,363 | HPRD Release 9 (Keshava Prasad et al., |
| Drug–disease association | 199,214 | Comparative Toxicogenomics Database (Davis et al., |
| Drug–side effect association | 80,164 | SIDER Version 2 (Kuhn et al., |
| Protein–disease association | 1,596,745 | Comparative Toxicogenomics Database (Davis et al., |
| Drug structure similarity | | Based on Morgan fingerprints (Rogers and Hahn, |
| Protein sequence similarity | | Based on Smith–Waterman scores (Smith and Waterman, |
| Total (edges) | 1,895,445 | |
Similarity edges are not counted because all node pairs are connected.
Figure 1. Overview of the GADTI model architecture.
Figure 2. A small example of the heterogeneous network.
Figure 3. The process of the encoder (taking a drug node as an example).
Figure 4. Comparison between MSCMF, TL_HGBI, DTINet, NeoDTI, and GADTI in terms of AUROC and AUPRC based on 10-fold cross-validation (#positive : #negative = 1:10).
Figure 5. The ROC curves and PR curves of different methods (#positive : #negative = 1:10).
Figure 6. Comparison between different methods in terms of AUROC and AUPRC based on 10-fold cross-validation (all unknown pairs were treated as negative examples).
Figure 7. Comparison between different methods in terms of AUROC and AUPRC based on 10-fold cross-validation (#positive : #negative = 1:10, DTIs with similar drugs or targets were removed).
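The 1:10 positive-to-negative setting named in the captions above can be sketched as follows. This is an illustration under stated assumptions, not the paper's evaluation code: unknown drug-target pairs are assumed to act as negatives and to be sampled uniformly, the 10-fold split is omitted, and `sample_negatives` and `auroc` are hypothetical helper names.

```python
import numpy as np

def sample_negatives(interactions, ratio=10, seed=0):
    """Return known pairs and `ratio` times as many sampled unknown pairs."""
    rng = np.random.default_rng(seed)
    pos = np.argwhere(interactions == 1)        # known drug-target pairs
    neg = np.argwhere(interactions == 0)        # unknown pairs, assumed negative
    n_neg = min(len(neg), ratio * len(pos))
    chosen = neg[rng.choice(len(neg), size=n_neg, replace=False)]
    return pos, chosen

def auroc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative.

    Ties count as 0.5, which matches the area under the ROC curve.
    """
    pos = np.asarray(pos_scores)[:, None]
    neg = np.asarray(neg_scores)[None, :]
    return float(((pos > neg) + 0.5 * (pos == neg)).mean())

# Toy interaction matrix: 4 drugs x 30 targets with 2 known interactions.
interactions = np.zeros((4, 30), dtype=int)
interactions[0, 1] = 1
interactions[2, 7] = 1

pos, neg = sample_negatives(interactions)
print(len(pos), len(neg))                   # 2 20
print(auroc([0.9, 0.8], [0.1, 0.2]))        # 1.0
```

Treating all unknown pairs as negatives, as in Figure 6, would correspond to skipping the sampling step and scoring every zero entry.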
Hit numbers of GADTI in different configurations.
| Configuration A: | 211 | 406 | 508 | 570 |
| Configuration B: | 291 | 402 | 475 | 523 |
| Configuration C: | 149 | 265 | 351 | 422 |