Kapil Devkota, James M Murphy, Lenore J Cowen.
Abstract
MOTIVATION: One of the core problems in the analysis of biological networks is the link prediction problem. In particular, existing interaction networks are noisy and incomplete snapshots of the true network, with many true links missing because those interactions have not yet been experimentally observed. Methods to predict missing links have been studied more extensively for social networks than for biological networks; it was recently argued that protein-protein interaction (PPI) network data have some special structure that may allow alternative methods to outperform the best methods for social networks. Based on a generalization of the diffusion state distance, we design a new embedding-based link prediction method called global and local integrated diffusion embedding (GLIDE). GLIDE is designed to capture global network structure effectively, combined with alternative, network type-specific customized measures that capture local network structure. We test GLIDE in rigorous cross-validation experiments on a collection of three recently curated human biological networks derived from the 2016 DREAM disease module identification challenge, as well as on a classical version of the yeast PPI network.
Year: 2020 PMID: 32657369 PMCID: PMC7355260 DOI: 10.1093/bioinformatics/btaa459
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Solid lines represent actual network edges. In triadic closure, we would predict the dotted yellow edge (p, q) with high confidence, because p and q have many common neighbors. In L3, we would first predict the dotted purple edges and
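The contrast drawn in the caption can be illustrated with a minimal Python sketch. It assumes an undirected, unweighted graph stored as a dictionary of neighbor sets. The common-neighbor count stands in for triadic closure, and the L3 score uses the commonly cited degree-normalized count of length-3 paths; the source text does not spell out either formula, so both function bodies and the toy graph are illustrative assumptions rather than the authors' implementation.

```python
from math import sqrt


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors_score(adj, p, q):
    """Triadic-closure style score: number of neighbors shared by p and q."""
    return len(adj[p] & adj[q])


def l3_score(adj, x, y):
    """Degree-normalized count of length-3 paths x-u-v-y (assumed L3 form)."""
    score = 0.0
    for u in adj[x]:
        for v in adj[u]:
            if v in adj[y]:
                score += 1.0 / sqrt(len(adj[u]) * len(adj[v]))
    return score


# Hypothetical toy graph: p and q share three neighbors,
# while x and y are joined only by the length-3 path x-u-v-y.
toy_edges = [
    ("p", "a"), ("p", "b"), ("p", "c"),
    ("q", "a"), ("q", "b"), ("q", "c"),
    ("x", "u"), ("u", "v"), ("v", "y"),
]
toy_adj = build_adjacency(toy_edges)

print(common_neighbors_score(toy_adj, "p", "q"))  # shared neighbors of p and q
print(l3_score(toy_adj, "x", "y"))                # one length-3 path, degree-normalized
```

In this toy graph the pair (p, q) scores highly under common neighbors but gets an L3 score of zero, while (x, y) has no common neighbors yet receives a positive L3 score, which is the kind of difference between the two local measures that the caption points to.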
Graph properties of the largest connected components of the human DREAM 1–3 networks and the yeast BioGRID network
| Graph | # Nodes | # Edges | Average degree | Diameter | Clustering coefficient |
|---|---|---|---|---|---|
| DREAM1 | 17 388 | 2 232 398 | 11.63 | 7 | 0.34 |
| DREAM2 | 12 325 | 397 254 | 11.63 | 9 | 0.34 |
| DREAM3 | 5009 | 18 270 | 11.45 | 12 | 0.20 |
| BioGRID (2014) | 4996 | 76 010 | 11.45 | 5 | 0.31 |
| BioGRID (2017) | 4996 | 107 769 | 11.45 | 5 | 0.38 |
Note: In addition, we restricted the connected components of the BioGRID (2014) and BioGRID (2017) networks to have the same set of vertices. We also removed edges in BioGRID (2014) that were not present in BioGRID (2017).
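The preprocessing described in the note can be sketched as follows. This is a simplified illustration under stated assumptions: edges are given as pairs of node identifiers, the shared vertex set is taken to be the nodes appearing in both snapshots, and the extraction of the largest connected component is omitted. The function names are hypothetical and do not come from the authors' code.

```python
def nodes_of(edges):
    """Return the set of all endpoints appearing in a collection of edges."""
    return {n for e in edges for n in e}


def harmonize_snapshots(edges_old, edges_new):
    """Restrict two snapshots to a shared vertex set and make old a subset of new.

    Assumption: the shared vertex set is the intersection of both node sets;
    the connected-component step mentioned in the note is not reproduced here.
    """
    # Treat edges as undirected and drop self-loops.
    old = {frozenset(e) for e in edges_old if e[0] != e[1]}
    new = {frozenset(e) for e in edges_new if e[0] != e[1]}

    shared = nodes_of(old) & nodes_of(new)
    old = {e for e in old if e <= shared}
    new = {e for e in new if e <= shared}

    # Remove edges of the older snapshot that are absent from the newer one.
    return old & new, new


# Hypothetical toy snapshots.
old_h, new_h = harmonize_snapshots(
    [("A", "B"), ("B", "C"), ("C", "D")],
    [("A", "B"), ("B", "C"), ("A", "C"), ("B", "E")],
)
print(len(old_h), len(new_h))
```

After this step, every edge of the older snapshot also appears in the newer one, so edges found only in the newer snapshot can serve as held-out positives when evaluating predictions made from the older snapshot.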
AUPRC and AUROC scores for different link prediction methods under global and node-based settings for DREAM1
| Method | AUPRC | AUROC |
|---|---|---|
| Performance for global link ranking | | |
| Common-weighted | 0.0737 ± 0.0004 | 0.9519 ± 0.0002 |
| GLIDE (CW) | 0.0737 ± 0.0004 | 0.9519 ± 0.0002 |
| GLIDE (L3) | | 0.9450 ± 0.0002 |
| L3 | 0.1736 ± 0.0006 | |
| node2vec | 0.0573 ± 0.0020 | 0.9298 ± 0.0012 |
| Performance for node-based link ranking | | |
| Common-weighted | 0.0329 ± 0.0010 | 0.9146 ± 0.0040 |
| GLIDE (CW) | 0.0329 ± 0.0010 | |
| GLIDE (L3) | | 0.9002 ± 0.0025 |
| L3 | 0.0366 ± 0.0010 | 0.9011 ± 0.0028 |
| node2vec | 0.0276 ± 0.0013 | 0.8955 ± 0.0064 |
Notes: Best performing method in bold.
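For readers unfamiliar with the two metrics reported in these tables, the following minimal sketch shows one way to compute them from a list of prediction scores and binary labels (1 for a true held-out link, 0 for a non-link). AUROC is computed as the fraction of positive-negative pairs ranked correctly, and AUPRC is approximated by average precision, which is one common estimator; the source does not state which estimator the authors used, so this is an assumption. The example scores are made up for illustration.

```python
def auroc(scores, labels):
    """AUROC as the probability that a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Average precision: mean of precision values at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive example")
    return total / hits


# Hypothetical scores for five candidate links, two of which are true links.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
example_labels = [1, 0, 1, 0, 0]
print(auroc(example_scores, example_labels))
print(average_precision(example_scores, example_labels))
```

Because true links are a tiny fraction of all candidate pairs in sparse networks, AUROC can be high even when AUPRC is small, which is consistent with the gap between the two columns in the tables.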
AUPRC and AUROC scores for different link prediction methods under both global and node-based settings for DREAM2
| Method | AUPRC | AUROC |
|---|---|---|
| Performance for global link ranking | | |
| Common-weighted | | 0.9569 ± 0.0007 |
| GLIDE (CW) | 0.1074 ± 0.0007 | |
| GLIDE (L3) | 0.0923 ± 0.0002 | 0.9540 ± 0.0005 |
| L3 | 0.0921 ± 0.0003 | 0.9584 ± 0.0005 |
| node2vec | 0.0206 ± 0.0011 | 0.9035 ± 0.0027 |
| Performance for node-based link ranking | | |
| Common-weighted | 0.0307 ± 0.0027 | 0.8697 ± 0.0122 |
| GLIDE (CW) | | |
| GLIDE (L3) | 0.0214 ± 0.0017 | 0.8855 ± 0.0078 |
| L3 | 0.0211 ± 0.0016 | 0.8857 ± 0.0085 |
| node2vec | 0.0159 ± 0.0012 | 0.8640 ± 0.0113 |
Notes: Best performing method in bold.
AUPRC and AUROC scores for different link prediction methods under both global and node-based settings for DREAM3
| Method | AUPRC | AUROC |
|---|---|---|
| Performance for global link ranking | | |
| Common-weighted | 0.0039 ± 0.0003 | 0.8078 ± 0.0074 |
| GLIDE (CW) | 0.0041 ± 0.0003 | 0.8503 ± 0.0060 |
| GLIDE (L3) | 0.0087 ± 0.0005 | |
| L3 | | 0.8896 ± 0.0040 |
| node2vec | 0.0035 ± 0.0002 | 0.8191 ± 0.0050 |
| Performance for node-based link ranking | | |
| Common-weighted | 0.0046 ± 0.0003 | 0.7480 ± 0.0079 |
| GLIDE (CW) | 0.0055 ± 0.0003 | 0.7983 ± 0.0070 |
| GLIDE (L3) | | |
| L3 | 0.0096 ± 0.0006 | 0.8608 ± 0.0070 |
| node2vec | 0.0061 ± 0.0005 | 0.8310 ± 0.072 |
Notes: Best performing method in bold.
AUPRC and AUROC scores for different link prediction methods under global and node-based settings for BioGRID
| Method | AUPRC | AUROC |
|---|---|---|
| Performance for global link ranking | | |
| Common-weighted | 0.0168 | 0.7656 |
| GLIDE (CW) | 0.0169 | 0.7757 |
| GLIDE (L3) | | 0.8027 |
| L3 | | |
| node2vec | 0.0064 | 0.6301 |
| Performance for node-based link ranking | | |
| Common-weighted | 0.0111 | 0.7719 |
| GLIDE (CW) | 0.0115 | 0.7963 |
| GLIDE (L3) | | 0.8147 |
| L3 | | |
| node2vec | 0.0054 | 0.6629 |
Notes: Best performing method in bold.
Percentage of top 25 links predicted from DREAM3, using different link prediction methods, present in DREAM1, DREAM2 or both
| Link prediction metrics | In DREAM1 (%) | In DREAM2 (%) | In both (%) |
|---|---|---|---|
| Common neighbors (weighted) | 84 | 76 | 76 |
| L3 | 72 | 60 | 56 |
| node2vec | 56 | 44 | 40 |
| DSE | 84 | 60 | 60 |
Note: Details of gene names and overlap between methods appear in Supplementary Tables S5–S7 and Supplementary Fig. S7.
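The percentages in this table are support rates: the share of the top-ranked predicted links that appear as edges in a denser reference network. A minimal sketch of that calculation is shown below, assuming undirected links given as pairs of gene identifiers; the function name and the toy data are hypothetical and are not taken from the paper.

```python
def percent_supported(predicted_links, reference_edges, k=25):
    """Percentage of the top-k predicted links that occur in a reference network.

    predicted_links: list of (gene_a, gene_b) pairs, best prediction first.
    reference_edges: iterable of (gene_a, gene_b) pairs from the reference network.
    """
    reference = {frozenset(e) for e in reference_edges}  # ignore edge direction
    top = predicted_links[:k]
    if not top:
        raise ValueError("no predicted links given")
    supported = sum(1 for link in top if frozenset(link) in reference)
    return 100.0 * supported / len(top)


# Hypothetical ranked predictions and reference network.
toy_predictions = [("g1", "g2"), ("g3", "g4"), ("g5", "g6"), ("g7", "g8")]
toy_reference = [("g2", "g1"), ("g3", "g4"), ("g9", "g10")]
print(percent_supported(toy_predictions, toy_reference, k=4))
```

The "In both" column corresponds to calling the same function with a reference set formed by intersecting the edge sets of the two reference networks.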
Top 25 links predicted by the two variants of GLIDE in the DREAM3 network between Crohn’s disease genes from the study of Franke and the study of Marigorta, considering only links between Crohn’s disease genes and genes of degree at most 25 in DREAM3
| (a) 2010-Glide (CW) | |
|---|---|
| LRRK2 | TP53RK |
| | IL2RB |
| | PLA2G4A |
| | EIF4EBP1 |
| | GAB2 |
| | CAMK4 |
| | KRT8 |
| | EIF4EBP1 |
| | ELK1 |
| | GJA1 |
| | GAB1 |
| | GRB10 |
| | CTTN |
| | PYCARD |
| | PLCG2 |
| | CAMK4 |
| | CAMK4 |
| | GTF2I |
| | IRS2 |
| | HSF1 |
| | CAV1 |
| | Q4LE43 |
| | HNRNPK |
| | Q9UFY1 |
| | STMN1 |
| (b) 2010-Glide (L3) | |
| | IL2RB |
| | MEF2A |
| | SMURF2 |
| | TGIF1 |
| | MEF2C |
| | SOCS1 |
| | BMPR1B |
| | UBE2I |
| | SKP2 |
| | DOK1 |
| | GHR |
| | SNIP1 |
| | CBLB |
| | IL2RG |
| | MAPK11 |
| | INPP5D |
| | CSF2RB |
| | GNB2L1 |
| | IFNAR2 |
| | IL2RB |
| | NLK |
| | GAB1 |
| | AXL |
| | PIAS1 |
| | GRAP |
| (c) 2017-Glide (CW) | |
| | TP53RK |
| | Q4LE43 |
| | ACACA |
| | Q9UFY1 |
| | GAB2 |
| | GAB1 |
| | CBLB |
| | PLCG2 |
| | PTPRA |
| | CAV1 |
| | LCP2 |
| | PTPN2 |
| | Q59GM6 |
| | VAV2 |
| | TNK2 |
| | ACP1 |
| | ITGB3 |
| | RICTOR |
| | STAT5B |
| | PTK6 |
| | MET |
| | IL2RB |
| | EIF4EBP1 |
| | CTNND1 |
| | INPPL1 |
| (d) 2017-Glide (L3) | |
| | FZD1 |
| | FZD8 |
| | CBLB |
| | IL2RB |
| | S1PR3 |
| | DOK1 |
| | LAT |
| | FZD7 |
| | NCK1 |
| | STAT5B |
| | RYK |
| | HCK |
| | LCP2 |
| | TBXA2R |
| | AGTR1 |
| | GRAP |
| | EDNRA |
| | FZD5 |
| | VAV2 |
| | GAB2 |
| | CD247 |
| | FZD4 |
| | PLCG2 |
| | PTPN2 |
| | LRP5 |
Note: Genes identified already in the article as Crohn’s disease genes in bold. Links supported by the link existing in at least one of the DREAM1 and DREAM2 networks denoted by *. The fraction of supported links and associated P-values are: ; ; ; . Details are in the Supplementary Material.
Top 25 links predicted by the two variants of GLIDE in the DREAM3 network between Crohn’s disease genes from the study of Franke and the study of Marigorta, considering only links between Crohn’s disease genes and genes of degree at most 25 in DREAM3, and further restricted to gene pairs with no common neighbors in DREAM3
| (a) 2010-Glide (CW) | |
|---|---|
| | TP53RK |
| | V9HWE1 |
| | ACACA |
| | EIF4EBP1 |
| | V9HWE1 |
| | TP53RK |
| | V9HWE1 |
| | MAPK13 |
| | ACACA |
| | LMNA |
| | Q4LE43 |
| | Q9UFY1 |
| | MARCKS |
| | CAMK4 |
| | RPS6 |
| | NCK1 |
| | CAMK4 |
| | MAPK12 |
| | REL |
| | TP53RK |
| | MAPK13 |
| | HSF1 |
| | VIM |
| | PIM1 |
| | KRT18 |
| (b) 2010-Glide (L3) | |
| | MMP9 |
| | PTPRA |
| | ACKR2 |
| | MMP9 |
| | RASA1 |
| | MAPK12 |
| | GRIN2A |
| | IRF5 |
| | ACKR4 |
| | APC |
| | CXCR2 |
| | VCAN |
| | ULK1 |
| | IL22RA1 |
| | WNT3A |
| | TRPV4 |
| | REL |
| | YES1 |
| | CDC20 |
| | IFNLR1 |
| | APAF1 |
| | ILK |
| | CDC14B |
| | CFTR |
| | KEAP1 |
| (c) 2017-Glide (CW) | |
| | STMN1 |
| | VIM |
| | TOP2A |
| | CAMK4 |
| | TP53RK |
| | HSF1 |
| | V9HWE1 |
| | ACACA |
| | MAP3K8 |
| | MAP2K2 |
| | WEE1 |
| | NCOA3 |
| | MARCKS |
| | NFKBIB |
| | MAPK13 |
| | KRT8 |
| | MAP2K3 |
| | KRT18 |
| | MCL1 |
| | REL |
| | TAB2 |
| | TAB1 |
| | BIRC5 |
| | CCNE1 |
| | RPS6 |
| (d) 2017-Glide (L3) | |
| | S1PR3 |
| | FZD7 |
| | RYK |
| | TBXA2R |
| | AGTR1 |
| | EDNRA |
| | FZD4 |
| | LRP5 |
| | RDX |
| | GNA12 |
| | MUSK |
| | GNAI1 |
| | FRZB |
| | FZD9 |
| | GPC4 |
| | PTGER3 |
| | CHRNA1 |
| | MMP2 |
| | FZD1 |
| | PLD2 |
| | PIK3CB |
| | GNRHR |
| | GPSM1 |
| | F2RL1 |
| | BLK |
Note: Genes identified already in the article as Crohn’s disease genes in bold. Links supported by the link existing in at least one of the DREAM1 and DREAM2 networks denoted by *. The fraction of supported links and associated P-values are: ; ; ; . Details are in the Supplementary Material.