Zichen Wang, Mu Zhou, Corey Arnold.
Abstract
MOTIVATION: Mining drug-disease associations and related interactions is essential for developing in silico drug repurposing (DR) methods and understanding the underlying biological mechanisms. Large-scale biological databases have recently become increasingly available for pharmaceutical research, allowing deep characterization for molecular informatics and drug discovery. However, DR is challenging due to the molecular heterogeneity of disease and the diversity of drug-disease associations. Importantly, the complexity of molecular target interactions, such as protein-protein interactions (PPI), remains to be elucidated. DR thus requires deep exploration of a multimodal biological network in an integrative context.
Year: 2020 PMID: 32657387 PMCID: PMC7355266 DOI: 10.1093/bioinformatics/btaa437
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Overview of our heterogeneous information network. The multirelational network has 592 disease, 1012 drug and 13 460 protein nodes connected by 3204 drug–disease, 7713 drug–protein, 104 716 disease–protein and 141 296 protein–protein edges.
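To put these counts in perspective, the short sketch below tabulates them and computes how sparse the known drug–disease associations are relative to all possible drug–disease pairs. Only the counts come from the caption; the dictionary and function names are illustrative assumptions, not part of the original work.

```python
# Node and edge counts as reported in the Fig. 1 caption.
NODE_COUNTS = {"disease": 592, "drug": 1012, "protein": 13460}
EDGE_COUNTS = {
    ("drug", "disease"): 3204,
    ("drug", "protein"): 7713,
    ("disease", "protein"): 104716,
    ("protein", "protein"): 141296,
}


def bipartite_density(src, dst):
    """Fraction of all possible src-dst pairs that are observed edges."""
    return EDGE_COUNTS[(src, dst)] / (NODE_COUNTS[src] * NODE_COUNTS[dst])


total_nodes = sum(NODE_COUNTS.values())
total_edges = sum(EDGE_COUNTS.values())
drug_disease_density = bipartite_density("drug", "disease")

print(f"nodes: {total_nodes}, edges: {total_edges}")
print(f"drug-disease density: {drug_disease_density:.4%}")
```

Fewer than 1% of the possible drug–disease pairs are observed, which is one reason the much denser protein-level edges are a useful source of additional signal.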
Fig. 2. Overview of the BiFusion model architecture. (A) The BiFusion pipeline contains a bipartite GCN encoder and an MLP decoder. The encoder takes similarity features of drug and disease nodes as inputs and generates drug–disease pair embeddings by fusing heterogeneous information through message passing across drug, disease and protein nodes. Each BiFusion layer consists of three computing steps, shown in the following subfigures. The BiFusion decoder takes the pair embeddings to produce prediction scores and reconstruct the drug–disease association matrix. (B) The first step in a BiFusion layer: a single bipartite graph attention convolution layer projects information from the drug and disease domains to the protein domain. (C) The second step in a BiFusion layer: a single graph attention convolution layer is applied within the PPI network. (D) The third step in a BiFusion layer: another bipartite graph attention convolution layer updates the drug and disease features based on the learnt protein node embeddings.
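The three steps in panels B to D can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: it assumes all features already share one dimension, uses a plain dot-product softmax as the attention score, and omits learned weights, nonlinearities and the MLP decoder. All function and node names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attend(target_feats, source_feats, neighbors):
    """Attention-weighted aggregation of source features into each target node.

    Attention is a softmax over dot products (an assumption made for this
    sketch). Targets without neighbors keep their current features.
    """
    updated = {}
    for t, feat in target_feats.items():
        nbrs = neighbors.get(t, [])
        if not nbrs:
            updated[t] = list(feat)
            continue
        weights = softmax([dot(feat, source_feats[s]) for s in nbrs])
        updated[t] = [
            sum(w * source_feats[s][k] for w, s in zip(weights, nbrs))
            for k in range(len(feat))
        ]
    return updated


def invert(adjacency):
    """Turn {a: [b, ...]} into {b: [a, ...]}."""
    inverse = {}
    for a, bs in adjacency.items():
        for b in bs:
            inverse.setdefault(b, []).append(a)
    return inverse


def bifusion_layer(entity_feats, protein_feats, entity_to_protein, ppi):
    # Step 1 (panel B): project drug/disease information onto proteins.
    proteins = attend(protein_feats, entity_feats, invert(entity_to_protein))
    # Step 2 (panel C): propagate within the PPI network (self-loop added here).
    ppi_with_self = {p: [p] + ppi.get(p, []) for p in proteins}
    proteins = attend(proteins, proteins, ppi_with_self)
    # Step 3 (panel D): update drug/disease features from protein embeddings.
    entities = attend(entity_feats, proteins, entity_to_protein)
    return entities, proteins


# Toy network: one drug and one disease, linked only through interacting proteins.
entity_feats = {"drug:A": [1.0, 0.0], "disease:X": [0.0, 1.0]}
protein_feats = {"P1": [0.0, 0.0], "P2": [0.0, 0.0]}  # proteins start without features
entity_to_protein = {"drug:A": ["P1"], "disease:X": ["P2"]}
ppi = {"P1": ["P2"], "P2": ["P1"]}

entities, proteins = bifusion_layer(entity_feats, protein_feats, entity_to_protein, ppi)
print(entities)
```

In the toy network the drug and the disease share no direct edge, yet after one layer each embedding carries a component from the other, because the information travels through the P1–P2 interaction.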
The summary of model performance on repoDB dataset under 10-fold cross validation
| Method | AUROC | AUPRC | Overall accuracy | Drug-centric accuracy | Disease-centric accuracy |
|---|---|---|---|---|---|
| BiFusion | | | | | |
| BiFusion-v2 (w/o PPI) | 0.837 ± 0.003 | 0.810 ± 0.003 | 0.712 ± 0.003 | 0.687 ± 0.002 | 0.674 ± 0.003 |
| GCN | 0.792 ± 0.004 | 0.774 ± 0.005 | 0.700 ± 0.003 | 0.651 ± 0.004 | 0.659 ± 0.004 |
| DeepWalk | 0.769 ± 0.003 | 0.764 ± 0.003 | 0.672 ± 0.003 | 0.617 ± 0.003 | 0.637 ± 0.003 |
| cVAE | 0.743 ± 0.003 | 0.739 ± 0.003 | 0.665 ± 0.002 | 0.623 ± 0.003 | 0.616 ± 0.003 |
| SSLIM | 0.701 ± 0.002 | 0.703 ± 0.002 | 0.635 ± 0.002 | 0.590 ± 0.002 | 0.625 ± 0.002 |
| Network-based proximity | 0.663 ± 0.004 | 0.678 ± 0.004 | 0.608 ± 0.004 | 0.568 ± 0.004 | 0.603 ± 0.005 |
The best results are highlighted in bold.
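For readers who want to reproduce entries of this form, the sketch below computes AUROC per fold and summarizes the folds as "mean ± spread". It assumes the ± in the table denotes the standard deviation across folds, which the table itself does not state. The fold data are invented for illustration, and the function names are hypothetical.

```python
import statistics


def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random
    negative, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(fold_values):
    """Format per-fold metric values as 'mean ± sample standard deviation'."""
    mean = statistics.mean(fold_values)
    std = statistics.stdev(fold_values)
    return f"{mean:.3f} ± {std:.3f}"


# Two made-up folds of (labels, scores); a 10-fold run would have ten.
folds = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.7]),
    ([1, 1, 0, 0], [0.8, 0.7, 0.3, 0.1]),
]
fold_aurocs = [auroc(y, s) for y, s in folds]
print(summarize(fold_aurocs))
```

The pairwise formulation is quadratic in the number of examples, which is fine for illustration; a rank-based implementation is preferable for large test sets.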
The results of novel predictions on repoDB dataset
| Method | AUROC | AUPRC | Overall accuracy | Drug-centric accuracy | Disease-centric accuracy |
|---|---|---|---|---|---|
| BiFusion | | | | 0.666 ± 0.003 | |
| BiFusion-v2 (w/o PPI) | 0.749 ± 0.003 | 0.732 ± 0.003 | 0.674 ± 0.003 | 0.663 ± 0.003 | 0.668 ± 0.004 |
| GCN | 0.740 ± 0.004 | 0.726 ± 0.005 | 0.687 ± 0.004 | | 0.656 ± 0.005 |
| DeepWalk | 0.712 ± 0.004 | 0.700 ± 0.004 | 0.663 ± 0.003 | 0.647 ± 0.003 | 0.655 ± 0.004 |
| cVAE | 0.696 ± 0.003 | 0.698 ± 0.003 | 0.637 ± 0.002 | 0.631 ± 0.002 | 0.641 ± 0.003 |
| SSLIM | 0.671 ± 0.002 | 0.699 ± 0.003 | 0.616 ± 0.002 | 0.575 ± 0.003 | 0.591 ± 0.002 |
| Network-based proximity | 0.661 ± 0.004 | 0.692 ± 0.004 | 0.622 ± 0.004 | 0.574 ± 0.004 | 0.594 ± 0.005 |
The summary of model performance on external dataset
| Method | AUROC | AUPRC | Overall accuracy | Drug-centric accuracy | Disease-centric accuracy |
|---|---|---|---|---|---|
| BiFusion | | | 0.671 ± 0.004 | | |
| BiFusion-v2 (w/o PPI) | 0.722 ± 0.005 | 0.677 ± 0.005 | | 0.670 ± 0.004 | 0.636 ± 0.005 |
| GCN | 0.717 ± 0.004 | 0.676 ± 0.004 | 0.664 ± 0.003 | 0.667 ± 0.003 | 0.624 ± 0.004 |
| DeepWalk | 0.649 ± 0.003 | 0.628 ± 0.003 | 0.611 ± 0.003 | 0.604 ± 0.003 | 0.572 ± 0.003 |
| cVAE | 0.676 ± 0.006 | 0.653 ± 0.005 | 0.637 ± 0.005 | 0.629 ± 0.006 | 0.639 ± 0.006 |
| SSLIM | 0.652 ± 0.003 | 0.607 ± 0.003 | 0.602 ± 0.002 | 0.614 ± 0.003 | 0.625 ± 0.003 |
| Network-based proximity | 0.610 ± 0.003 | 0.579 ± 0.002 | 0.573 ± 0.003 | 0.566 ± 0.002 | 0.563 ± 0.003 |
New candidate drugs for breast carcinoma and Parkinson’s disease, ranked by BiFusion prediction score
| Disease | Rank | Candidate drug | Evidence |
|---|---|---|---|
| Breast carcinoma | 1 | Clofarabine | |
| | 3 | Cimetidine | |
| | 4 | Thiamine | |
| | 5 | Arsenic trioxide | |
| Parkinson disease | 1 | Dextromethorphan | |
| | 2 | Solifenacin | |
| | 4 | Atomoxetine | |
| | 7 | Venlafaxine | |
| | 8 | Tapentadol | |
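A ranked list of this kind can be obtained by sorting one disease's prediction scores and skipping drugs already associated with that disease. The sketch below shows that step only. Excluding known associations is an assumption about how "new" candidates are defined, and the drug names and scores are placeholders, not model output.

```python
def rank_new_candidates(scores, known_drugs, top_k=10):
    """Rank drugs for one disease by descending score, skipping known associations.

    scores: mapping of drug name to prediction score for a single disease.
    known_drugs: set of drugs already associated with that disease.
    Returns a list of (rank, drug, score) tuples, with rank starting at 1.
    """
    candidates = [(drug, s) for drug, s in scores.items() if drug not in known_drugs]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [
        (rank, drug, s)
        for rank, (drug, s) in enumerate(candidates[:top_k], start=1)
    ]


# Placeholder scores for a single disease.
scores = {"drug_a": 0.91, "drug_b": 0.97, "drug_c": 0.42, "drug_d": 0.88}
known = {"drug_b"}  # already an approved association, so not a new candidate

ranking = rank_new_candidates(scores, known, top_k=2)
for rank, drug, score in ranking:
    print(rank, drug, score)
```

The gaps in the rank column of the table suggest that only a subset of the ranked candidates is listed there, so a full ranking would contain more rows than are shown.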
Fig. 3. Effect of the number of layers on model performance. The x-axis denotes the number of BiFusion layers and the y-axis the model performance on the test set.