Ming He, Chen Huang, Bo Liu, Yadong Wang, Junyi Li.
Abstract
BACKGROUND: Exploring the relationship between diseases and genes is of great significance for understanding disease pathogenesis and developing corresponding therapeutic measures. Predicting disease-gene associations with computational methods accelerates this process.
Keywords: Disease-gene association prediction; Factorization; Graph neural network; Heterogeneous network
Year: 2021 PMID: 33781206 PMCID: PMC8006390 DOI: 10.1186/s12859-021-04099-3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Experimental results (%) of the link prediction task on the dataset
| Model | P@1000 | R@1000 | P@10,000 | R@10,000 | P@20,000 | R@20,000 |
|---|---|---|---|---|---|---|
| Metapath2vec | 99.60 ± 0.202 | 3.81 ± 0.301 | 95.40 ± 0.202 | 36.46 ± 0.023 | 82.65 ± 0.011 | 63.18 ± 0.022 |
| HIN2vec | 99.60 ± 0.051 | 3.81 ± 0.241 | 74.22 ± 0.102 | 28.37 ± 0.043 | 58.99 ± 0.031 | 45.09 ± 0.021 |
| HERec | 63.30 ± 0.063 | 2.42 ± 0.182 | 72.58 ± 0.035 | 27.74 ± 0.061 | 71.55 ± 0.102 | 54.69 ± 0.100 |
| GAT | 94.90 ± 0.171 | 3.62 ± 0.304 | 93.67 ± 0.043 | 35.80 ± 0.102 | 90.45 ± 0.211 | 69.14 ± 0.117 |
| HAN | 99.60 ± 0.093 | 3.81 ± 0.301 | | | 96.28 ± 0.130 | 73.59 ± 0.317 |
| MAGNN | 99.30 ± 0.122 | 3.80 ± 0.103 | 98.11 ± 0.351 | 37.50 ± 0.203 | 94.99 ± 0.033 | 72.61 ± 0.091 |
| FactorHNE | | | 99.07 ± 0.121 | 37.87 ± 0.082 | | |
Bold values are the highest among all baselines
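The P@k and R@k columns above are precision and recall over the k highest-scoring candidate links. The following is a minimal sketch of how such values could be computed, assuming each candidate link has one predicted score and one binary label; the function name `precision_recall_at_k` and the toy data are illustrative and not taken from the paper.

```python
def precision_recall_at_k(scores, labels, k):
    """Return (P@k, R@k) as percentages for one ranked list of candidate links.

    scores: predicted score per candidate link (higher means more likely).
    labels: 1 if the link is a true association, else 0.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Rank candidate links by descending score and keep the top k.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_k = ranked[:k]
    if not top_k:
        return 0.0, 0.0
    hits = sum(label for _, label in top_k)
    total_positives = sum(labels)
    precision = 100.0 * hits / len(top_k)
    recall = 100.0 * hits / total_positives if total_positives else 0.0
    return precision, recall


# Toy data (hypothetical, not from the paper): 6 candidate links, 3 true ones.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 0, 1, 0, 1, 0]
p_at_2, r_at_2 = precision_recall_at_k(scores, labels, k=2)
```

Because the number of true links is fixed, R@k / P@k equals k divided by the total number of positives, which is why recall rises with k in the table even as precision falls.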
Fig. 1 a ROC curves of all models; b P–R curves of all models
Fig. 5 An example of node aggregation based on self-attention mechanism
Fig. 2 Parameter analysis of FactorHNE
Case study results
| CUI | Disease name | CUI | Disease name |
|---|---|---|---|
| C0575081 | Gait abnormality | C0678230 | Congenital Epicanthus |
In the prediction results in the table above, candidate genes with known associations are those labeled in the original database, while candidate genes marked with "*" are newly discovered associated genes: they are absent from the dataset but recorded in the latest online database. The results show that our model can mine new disease-gene associations, such as OFD1-C0575081 and FLNA-C0678230. Rather than memorizing the existing associations in the original dataset, the model predicts new candidate genes by mining hidden patterns. This matters because new genes are difficult to discover if a model merely assigns high scores to known associations. Our model can therefore help decipher the relationship between diseases and genes, which is of biomedical significance.
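The labeling convention described above can be sketched as follows. This is only an illustration, assuming a per-disease dictionary of predicted gene scores and a set of genes already associated with that disease in the dataset; the function `rank_candidates`, the scores, and the placeholder gene names `GENE_A` and `GENE_B` are hypothetical and not taken from the paper.

```python
def rank_candidates(scores, known_genes, top_n):
    """Rank genes for one disease and flag genes absent from the dataset with '*'."""
    # Sort genes by descending predicted score and keep the top_n candidates.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_n]
    # A gene without a known association in the dataset is marked as novel.
    return [gene if gene in known_genes else gene + "*" for gene, _ in ranked]


# Hypothetical scores for a single disease (values invented for illustration).
scores = {"GENE_A": 0.95, "OFD1": 0.90, "GENE_B": 0.40}
known = {"GENE_A", "GENE_B"}
result = rank_candidates(scores, known, top_n=2)
```

In this toy run, `OFD1` is flagged because it is not in the known set; whether a flagged gene is a genuine association would still need to be checked against an external database.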
An overview of the heterogeneous network dataset
| Node | Number | Relation | Metapath |
|---|---|---|---|
| Gene (G) | 21584 | G–G, G–D, G–O | GG, GDG, GOG |
| Disease (D) | 15030 | D–G, D–S | DGD, DSD |
| GO (O) | 14204 | O–G | – |
| Symptom (S) | 6540 | S–D | – |
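A metapath such as GDG in the table above is a sequence of node types; two genes are connected along it when they share a disease. The sketch below illustrates this on a toy graph, assuming undirected typed edges and node names that are unique across types. The helpers `build_adjacency` and `metapath_neighbors` and all node names are hypothetical and are not the paper's implementation.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map keyed by (node, neighbor_type)."""
    adj = {}
    for (u, u_type), (v, v_type) in edges:
        adj.setdefault((u, v_type), set()).add(v)
        adj.setdefault((v, u_type), set()).add(u)
    return adj


def metapath_neighbors(adj, start, metapath):
    """Return nodes reachable from `start` by following the type sequence `metapath`.

    `metapath` is a string of type letters such as "GDG"; `start` is assumed to
    have type metapath[0]. For symmetric metapaths the result includes `start`.
    """
    frontier = {start}
    for next_type in metapath[1:]:
        # Step to every neighbor of the required type from the current frontier.
        frontier = {nbr for node in frontier for nbr in adj.get((node, next_type), ())}
    return frontier


# Toy heterogeneous graph: genes g1-g3, diseases d1-d2, GO term o1.
edges = [
    (("g1", "G"), ("d1", "D")),
    (("g2", "G"), ("d1", "D")),
    (("g3", "G"), ("d2", "D")),
    (("g1", "G"), ("o1", "O")),
    (("g3", "G"), ("o1", "O")),
]
adj = build_adjacency(edges)
gdg = metapath_neighbors(adj, "g1", "GDG")  # genes sharing a disease with g1
gog = metapath_neighbors(adj, "g1", "GOG")  # genes sharing a GO term with g1
```

Here `g1` reaches `g2` through the shared disease `d1` and `g3` through the shared GO term `o1`, so different metapaths expose different neighborhoods of the same node.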
Fig. 3 Metagraph for heterogeneous networks
Fig. 4 The overall architecture of FactorHNE. a Model global architecture; b neighborhood subgraph factor decomposition; c inter-metapath factor graph aggregation