Mustafa Coşkun1,2, Mehmet Koyutürk3,4. 1. Department of Computer Engineering, Abdullah Gül University. 2. Hakkari University, Kayseri, 38080, Turkey. 3. Department of Computer and Data Sciences. 4. Center for Proteomics and Bioinformatics, Case Western Reserve University, Cleveland, OH, 44106, USA.
Abstract
BACKGROUND: Link prediction is an important and well-studied problem in network biology. Recently, graph representation learning methods, including Graph Convolutional Network (GCN)-based node embedding, have drawn increasing attention in link prediction. MOTIVATION: An important component of GCN-based network embedding is the convolution matrix, which is used to propagate features across the network. Existing algorithms use the degree-normalized adjacency matrix for this purpose, as this matrix is closely related to the graph Laplacian and captures the spectral properties of the network. In parallel, it has been shown that GCNs with a single layer can generate more robust embeddings by reducing the number of parameters. Laplacian-based convolution is not well suited to single-layered GCNs, as it limits the propagation of information to the immediate neighbors of a node. RESULTS: Capitalizing on the rich literature on unsupervised link prediction, we propose using node similarity-based convolution matrices in GCNs to compute node embeddings for link prediction. We consider eight representative node similarity measures (Common Neighbors, Jaccard Index, Adamic-Adar, Resource Allocation, Hub Depressed Index, Hub Promoted Index, Sørensen Index, Salton Index) for this purpose. We systematically compare the performance of the resulting algorithms against GCNs that use the degree-normalized adjacency matrix for convolution, as well as other link prediction algorithms. In our experiments, we use three link prediction tasks involving biomedical networks: drug-disease association (DDA) prediction, drug-drug interaction (DDI) prediction, and protein-protein interaction (PPI) prediction. Our results show that node similarity-based convolution matrices significantly improve the link prediction performance of GCN-based embeddings.
CONCLUSION: As sophisticated machine learning frameworks are increasingly employed in biological applications, historically well-established methods can provide a useful head start. AVAILABILITY: Our method, SiGraC, is implemented as a Python library and is freely available at https://github.com/mustafaCoskunAgu/SiGraC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
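To make the idea in the abstract concrete, the following is a minimal sketch of how the eight node similarity measures could be computed from an adjacency matrix and used in place of the degree-normalized adjacency matrix in a single-layer propagation step. It is not the SiGraC implementation: the function names `similarity_matrix` and `single_layer_embedding`, the toy graph, the row normalization, the `tanh` activation, and the choice to keep the diagonal of the similarity matrix are all assumptions made for illustration.

```python
import numpy as np


def _safe_div(num, den):
    """Element-wise num / den, returning 0 wherever den is 0."""
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)


def similarity_matrix(A, measure="common_neighbors"):
    """Dense node-similarity matrix for an undirected, unweighted graph.

    Illustrative sketch only; the diagonal is kept (an assumption).
    """
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    cn = A @ A  # cn[i, j] = number of common neighbors of i and j
    di, dj = deg[:, None], deg[None, :]

    if measure == "common_neighbors":
        return cn
    if measure == "jaccard":
        return _safe_div(cn, di + dj - cn)
    if measure == "salton":
        return _safe_div(cn, np.sqrt(di * dj))
    if measure == "sorensen":
        return _safe_div(2.0 * cn, di + dj)
    if measure == "hub_promoted":
        return _safe_div(cn, np.minimum(di, dj))
    if measure == "hub_depressed":
        return _safe_div(cn, np.maximum(di, dj))
    if measure in ("adamic_adar", "resource_allocation"):
        # Weight each potential common neighbor k by 1/log(deg k) or 1/deg k.
        w = np.zeros_like(deg)
        if measure == "adamic_adar":
            mask = deg > 1
            w[mask] = 1.0 / np.log(deg[mask])
        else:
            mask = deg > 0
            w[mask] = 1.0 / deg[mask]
        return (A * w[None, :]) @ A
    raise ValueError(f"unknown measure: {measure}")


def single_layer_embedding(S, X, W):
    """One propagation step tanh(S_norm @ X @ W) with row-normalized S.

    Row normalization and tanh are assumptions, not taken from the source.
    """
    row_sums = S.sum(axis=1, keepdims=True)
    S_norm = _safe_div(S, np.broadcast_to(row_sums, S.shape).copy())
    return np.tanh(S_norm @ X @ W)


# Toy graph with edges (0,1), (0,2), (1,2), (2,3).
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])

rng = np.random.default_rng(0)
X = np.eye(4)                    # featureless nodes: one-hot input
W = rng.normal(size=(4, 2))      # random (untrained) weights
S = similarity_matrix(A, "jaccard")
Z = single_layer_embedding(S, X, W)
print(Z.shape)
```

In this toy graph, nodes 0 and 3 are not adjacent but share neighbor 2, so every similarity-based matrix gives them a nonzero entry. The adjacency matrix has a zero there, which is the limitation of single-layer Laplacian-based convolution that the abstract describes.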