Yoonbee Kim, Jong-Hoon Park, Young-Rae Cho.
Abstract
Genome-wide association studies (GWAS) can be used to infer genome intervals that are involved in genetic diseases. However, investigating a large number of putative mutations for GWAS is resource- and time-intensive. Network-based computational approaches are therefore being used for efficient disease-gene association prediction. These methods rest on the assumption that genes causing the same disease are located close to each other in a molecular network, such as a protein-protein interaction (PPI) network. In this survey, we provide an overview of network-based disease-gene association prediction methods in three categories: graph-theoretic algorithms, machine learning algorithms, and an integration of the two. We experimented with six selected methods to compare their prediction performance using a heterogeneous network constructed by combining a genome-wide weighted PPI network, an ontology-based disease network, and disease-gene associations. The experiment was conducted in two settings, with and without known disease-associated genes. The results revealed that HerGePred, an integrative method, performed best in the presence of known disease-associated genes, whereas PRINCE, which adopts a network propagation algorithm, was the most competitive in their absence. Overall, the results demonstrated that the integrative methods performed better than the methods using graph theory only, and that the methods using a heterogeneous network performed better than those using a homogeneous PPI network only.
Keywords: disease gene prioritization; disease networks; disease-gene associations; heterogeneous networks; protein-protein interaction networks
Year: 2022; PMID: 35806415; PMCID: PMC9266751; DOI: 10.3390/ijms23137411
Source DB: PubMed; Journal: Int J Mol Sci; ISSN: 1422-0067; Impact factor: 6.208
List of network-based methods for disease-gene association prediction including data sources, input network type, and key techniques.
| Method | Data Source | Network Format | Technique |
|---|---|---|---|
| RWR | OMIM, HPRD, BIND, BioGrid, IntAct, STRING | homogeneous network | random walk |
| RWRH | OMIM, HPRD, MimMiner | heterogeneous network | random walk |
| PRINCE | OMIM, HPRD, GO | heterogeneous network | network propagation |
| DADA | OMIM, HPRD, BIND, BioGrid, MimMiner | homogeneous network | random walk |
| RWR-MH | OMIM, HPO, Interactome | heterogeneous, multiplex network | random walk |
| PhenoRank | OMIM, HPRD, BioGrid, IntAct, HPO | heterogeneous network | network propagation |
| NetCore | DisGeNet, ConsensusPathDB | homogeneous network | random walk |
| PRYNT | CTD, STRING | homogeneous network | path search, random walk |
| CIPHER | OMIM, HPRD, BIND, MimMiner | heterogeneous network | linear regression |
| CrossRank | OMIM, PubMed | tissue-specific heterogeneous network | network propagation |
| pBRIT | HPO, ConsensusPathDB, GO | homogeneous network | Bayesian ridge regression |
| Scuba | HPRD, STRING, Reactome, PID | homogeneous network | graph node kernels |
| IDLP | OMIM, HPRD, BioGrid, IntAct, Interactome, MimMiner | heterogeneous network | network propagation |
| HerGePred | HPO, DisGeNet, MalaCard, Orphanet | heterogeneous network | network embedding, random walk |
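
Random walk is the most frequent technique in the list above. As a rough illustration, and not the implementation used by any of the listed tools, a random walk with restart over a small homogeneous network might be sketched as follows. The function name `random_walk_with_restart`, the restart probability of 0.3, and the five-node toy network are assumptions made for this example.

```python
import numpy as np

def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-8, max_iter=1000):
    """Return steady-state visiting probabilities for every node.

    adj     : symmetric adjacency matrix of the (toy) gene network
    seeds   : indices of known disease genes where the walker restarts
    restart : probability of jumping back to a seed at each step (assumed value)
    """
    adj = np.asarray(adj, dtype=float)
    # Column-normalize so that each column holds transition probabilities.
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # isolated nodes: avoid division by zero
    transition = adj / col_sums
    # Initial distribution: uniform over the seed genes.
    p0 = np.zeros(adj.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * transition @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:  # L1 change below tolerance
            return p_next
        p = p_next
    return p

# Hypothetical toy network: a path of five genes, 0 - 1 - 2 - 3 - 4.
toy_adj = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
])
scores = random_walk_with_restart(toy_adj, seeds=[0])
ranking = np.argsort(-scores)  # candidate genes ordered by decreasing score
```

In this toy path network the seed gene receives the highest score and the scores fall off with distance from it, which mirrors the assumption that genes causing the same disease lie close together in the network.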
Numbers of nodes and edges in each network for the experiment.
| Experimental Data | Number of Nodes | Number of Edges |
|---|---|---|
| Gene network | 14,663 | 258,476 |
| Disease network | 6,465 | 4,354,956 |
| Disease-gene associations | - | 5024 |
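
The three data sets above are combined into one heterogeneous network for the experiment. One common way to represent such a network, shown here only as a hedged sketch, is a block adjacency matrix with the gene network and the disease network on the diagonal and the disease-gene associations off the diagonal. The function name `build_heterogeneous_adjacency` and the toy inputs are hypothetical; at the sizes reported above, a sparse matrix format would likely be needed instead of dense arrays.

```python
import numpy as np

def build_heterogeneous_adjacency(gene_adj, disease_adj, associations):
    """Stack gene, disease, and association layers into one adjacency matrix.

    gene_adj     : (n_genes x n_genes) weighted gene network
    disease_adj  : (n_diseases x n_diseases) weighted disease network
    associations : iterable of (gene_index, disease_index) pairs
    """
    gene_adj = np.asarray(gene_adj, dtype=float)
    disease_adj = np.asarray(disease_adj, dtype=float)
    n_genes, n_diseases = gene_adj.shape[0], disease_adj.shape[0]
    # Bipartite block: rows are genes, columns are diseases.
    bipartite = np.zeros((n_genes, n_diseases))
    for gene_idx, disease_idx in associations:
        bipartite[gene_idx, disease_idx] = 1.0
    # Genes occupy the first n_genes indices, diseases the remaining ones.
    return np.block([[gene_adj, bipartite],
                     [bipartite.T, disease_adj]])

# Hypothetical toy data: three genes, two diseases, two known associations.
toy_genes = np.array([[0.0, 0.8, 0.0],
                      [0.8, 0.0, 0.5],
                      [0.0, 0.5, 0.0]])
toy_diseases = np.array([[0.0, 0.6],
                         [0.6, 0.0]])
hetero = build_heterogeneous_adjacency(toy_genes, toy_diseases, [(0, 0), (2, 1)])
```

Keeping genes and diseases in one index space lets a propagation or random walk algorithm move between the two node types through the association block.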
AUC results evaluated by gene ranks for disease-gene association prediction with known disease genes from sample-1.
| Method | 500 | 1000 | 5000 | 10,000 | |
|---|---|---|---|---|---|
| RWR | 0.813 | 0.800 | 0.791 | 0.775 | 0.733 |
| PRINCE | 0.561 | 0.636 | 0.620 | 0.694 | 0.703 |
| DADA | 0.816 | 0.789 | 0.785 | 0.782 | 0.741 |
| IDLP | 0.596 | 0.559 | 0.524 | 0.650 | 0.791 |
| HerGePred | 0.838 | 0.863 | 0.835 | 0.800 | 0.778 |
| NetCore | 0.790 | 0.805 | 0.792 | 0.764 | 0.745 |
Recall values evaluated by gene ranks for disease-gene association prediction with known disease genes from sample-1.
| Method | 500 | 1000 | 5000 | 10,000 | |
|---|---|---|---|---|---|
| RWR | 0.225 | 0.350 | 0.425 | 0.657 | 0.863 |
| PRINCE | 0.077 | 0.197 | 0.306 | 0.673 | 0.898 |
| DADA | 0.230 | 0.357 | 0.436 | 0.680 | 0.889 |
| IDLP | 0.019 | 0.091 | 0.176 | 0.617 | 0.661 |
| HerGePred | 0.302 | 0.401 | 0.471 | 0.719 | 0.889 |
| NetCore | 0.197 | 0.295 | 0.364 | 0.633 | 0.817 |
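
To make the rank-based recall values concrete, the following minimal sketch computes the fraction of known disease genes that appear among the top k ranked candidates. It assumes that recall is measured at a rank cutoff k; the exact evaluation procedure of the survey is not given here, and the function name `recall_at_k` and the gene labels are hypothetical.

```python
def recall_at_k(ranked_genes, true_genes, k):
    """Fraction of true disease genes found in the top-k ranked candidates."""
    true_set = set(true_genes)
    if not true_set:
        raise ValueError("true_genes must contain at least one gene")
    top_k = set(ranked_genes[:k])  # candidates at ranks 1..k
    return len(top_k & true_set) / len(true_set)

# Hypothetical ranking produced by some method, best candidate first.
toy_ranking = ["g3", "g1", "g5", "g2", "g4"]
toy_truth = {"g1", "g2"}

recalls = {k: recall_at_k(toy_ranking, toy_truth, k) for k in (1, 2, 4)}
```

Under this definition recall can only stay flat or grow as k increases, which matches the left-to-right increase in most rows of the recall tables.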
Figure 1. ROC (a) and recall (b) curves evaluated by gene ranks for disease-gene association prediction with known disease-associated genes.
AUC results evaluated by prediction scores for disease-gene association prediction with known disease genes from sample-1.
| Method | 500 | 1000 | 5000 | 10,000 | |
|---|---|---|---|---|---|
| RWR | 0.391 | 0.533 | 0.443 | 0.528 | 0.550 |
| PRINCE | 0.543 | 0.643 | 0.620 | 0.694 | 0.702 |
| DADA | 0.442 | 0.682 | 0.540 | 0.548 | 0.484 |
| IDLP | 0.663 | 0.532 | 0.543 | 0.709 | 0.632 |
| HerGePred | 0.795 | 0.834 | 0.835 | 0.794 | 0.774 |
| NetCore | 0.622 | 0.738 | 0.764 | 0.761 | 0.732 |
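
For reference, the area under the ROC curve can be computed directly from prediction scores as the probability that a true disease gene scores higher than a non-disease gene, with ties counted as one half. The sketch below uses this standard pairwise formulation; it is an illustration under that assumption, not the survey's evaluation code, and how the survey restricts the computation at each column cutoff is not specified here. The name `auc_from_scores` and the toy values are hypothetical.

```python
def auc_from_scores(scores, labels):
    """ROC AUC via pairwise comparison of positive and negative scores.

    scores : prediction score per candidate gene (higher means more likely)
    labels : 1 for a known disease gene, 0 otherwise
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    # Count a win when the positive outscores the negative, half for a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Hypothetical scores for four candidate genes, two of them true positives.
toy_auc = auc_from_scores([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
tied_auc = auc_from_scores([0.5, 0.5], [1, 0])
```

The tie handling shows one reason why evaluating by scores and evaluating by ranks can differ: candidates with identical scores are indistinguishable by score, whereas a ranking forces an order on them.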
Recall values evaluated by prediction scores for disease-gene association prediction with known disease genes from sample-1.
| Method | 500 | 1000 | 5000 | 10,000 | |
|---|---|---|---|---|---|
| RWR | 0.007 | 0.037 | 0.104 | 0.439 | 0.745 |
| PRINCE | 0.079 | 0.195 | 0.306 | 0.673 | 0.898 |
| DADA | 0.012 | 0.030 | 0.058 | 0.209 | 0.715 |
| IDLP | 0.016 | 0.095 | 0.176 | 0.357 | 0.552 |
| HerGePred | 0.302 | 0.411 | 0.469 | 0.722 | 0.889 |
| NetCore | 0.179 | 0.325 | 0.394 | 0.638 | 0.826 |
Figure 2. ROC (a) and recall (b) curves evaluated by prediction scores for disease-gene association prediction with known disease-associated genes.
AUC results evaluated by gene ranks for disease-gene association prediction without known disease genes from sample-2.
| Method | 500 | 1000 | 5000 | 10,000 | |
|---|---|---|---|---|---|
| PRINCE | 0.485 | 0.655 | 0.576 | 0.685 | 0.663 |
| IDLP | 0.333 | 0.478 | 0.559 | 0.698 | 0.644 |
| HerGePred | 0.257 | 0.647 | 0.536 | 0.534 | 0.520 |
Recall values evaluated by gene ranks for disease-gene association prediction without known disease genes from sample-2.
| Method | 500 | 1000 | 5000 | 10,000 | |
|---|---|---|---|---|---|
| PRINCE | 0.086 | 0.171 | 0.303 | 0.594 | 0.846 |
| IDLP | 0.011 | 0.086 | 0.166 | 0.446 | 0.749 |
| HerGePred | 0.011 | 0.034 | 0.069 | 0.343 | 0.657 |
Figure 3. ROC (a) and recall (b) curves evaluated by gene ranks for disease-gene association prediction without known disease-associated genes.
AUC results evaluated by prediction scores for disease-gene association prediction without known disease genes from sample-2.
| Method | 500 | 1000 | 5000 | 10,000 | |
|---|---|---|---|---|---|
| PRINCE | 0.480 | 0.651 | 0.571 | 0.684 | 0.663 |
| IDLP | 0.550 | 0.367 | 0.468 | 0.642 | 0.657 |
| HerGePred | 0.351 | 0.566 | 0.567 | 0.499 | 0.531 |
Recall values evaluated by prediction scores for disease-gene association prediction without known disease genes from sample-2.
| Method | 500 | 1000 | 5000 | 10,000 | |
|---|---|---|---|---|---|
| PRINCE | 0.086 | 0.171 | 0.303 | 0.594 | 0.846 |
| IDLP | 0.011 | 0.097 | 0.194 | 0.451 | 0.640 |
| HerGePred | 0.023 | 0.040 | 0.074 | 0.371 | 0.646 |
Figure 4. ROC (a) and recall (b) curves evaluated by prediction scores for disease-gene association prediction without known disease-associated genes.
Figure 5. Numbers of disease-associated genes (i.e., ground truth) and true positives for each method according to their degree in the PPI network, in prediction with known disease genes from sample-1 (a) and without known disease genes from sample-2 (b).