Abstract
The problem of determining the likelihood of the existence of a link between two nodes in a network is called link prediction. This is made possible thanks to the existence of a topological structure in most real-life networks. In other words, the topologies of networked systems such as the World Wide Web, the Internet, metabolic networks, and human society are far from random, which implies that partial observations of these networks can be used to infer information about undiscovered interactions. Significant research efforts have been invested into the development of link prediction algorithms, and some researchers have made the implementation of their methods available to the research community. These implementations, however, are often written in different languages and use different modalities of interaction with the user, which hinders their effective use. This paper introduces LinkPred, a high-performance parallel and distributed link prediction library that includes the implementation of the major link prediction algorithms available in the literature. The library can handle networks with up to millions of nodes and edges and offers a unified interface that facilitates the use and comparison of link prediction algorithms by researchers as well as practitioners. ©2021 Kerrache.
Keywords: Complex networks; High performance computing; Software library; Graph embedding; Link prediction
Year: 2021 PMID: 34084927 PMCID: PMC8157017 DOI: 10.7717/peerj-cs.521
Source DB: PubMed Journal: PeerJ Comput Sci ISSN: 2376-5992
Comparison of LinkPred against the most important free/open-source link prediction software packages.
| Functionality | LinkPred | NetworkX | R package | GEM | SNAP | linkpred | scikit-network |
|---|---|---|---|---|---|---|---|
| Supported languages | C++, Python (a subset of the functionalities), Java (a subset of the functionalities) | Python | R | Python | C++, Python (a subset of the functionalities) | Python | Python |
| Topological similarity methods | Yes (with shared memory and distributed parallelism) | Yes (no parallelism) | Yes (no parallelism) | No | No (A limited number of algorithms is included as an experimental component) | Yes (no parallelism) | Yes (no parallelism) |
| Global link prediction methods | Yes (with shared memory parallelism and for some predictors also distributed parallelism) | No | No | No | No | Yes (Rooted PageRank, SimRank, Katz, shortest path) | No |
| Graph embedding algorithms | LLE, Laplacian Eigenmaps, Graph Factorization, DeepWalk, LINE, LargeVis, node2vec, and HMSM | No | No | LLE, Laplacian Eigenmaps, Graph Factorization, HOPE, SDNE, and node2vec | node2vec and GraphWave | No | Spectral, SVD, GSVD, PCA, Random Projection, Louvain, Hierarchical Louvain, Force Atlas, and Spring. |
| Classifiers | Yes (mainly via mlpack) | No | No | No | No | No | No |
| Similarity measures | Yes | No | No | Yes | Yes | No | No |
| Test data generation | Yes | No | No | No | No | No | No |
| Performance measures | Yes | No | No | No | No | No | No |
Figure 1. Architecture of LinkPred.
Figure 2. A node map associates values with nodes (A), whereas an edge map associates values with edges (B).
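
The distinction in Figure 2 can be illustrated with a minimal sketch. It is not LinkPred's own data structure: the class name `EdgeMap` and the choice of a plain dictionary as the node map are assumptions made for illustration, and the graph is assumed to be undirected.

```python
class EdgeMap:
    """Hypothetical edge map: one value per undirected edge.

    (u, v) and (v, u) refer to the same entry because keys are stored
    in a canonical (smaller, larger) order.
    """

    def __init__(self):
        self._data = {}

    @staticmethod
    def _key(u, v):
        # Canonical ordering so that edge direction does not matter.
        return (u, v) if u <= v else (v, u)

    def __setitem__(self, edge, value):
        self._data[self._key(*edge)] = value

    def __getitem__(self, edge):
        return self._data[self._key(*edge)]

    def __contains__(self, edge):
        return self._key(*edge) in self._data

    def __len__(self):
        return len(self._data)


# A node map can simply be a dictionary keyed by node identifier.
node_degree = {0: 2, 1: 1, 2: 1}

# An edge map stores, for example, a weight or a score per edge.
edge_weight = EdgeMap()
edge_weight[(1, 0)] = 0.5
edge_weight[(0, 2)] = 1.5
```

Storing keys in canonical order keeps the map at one entry per edge, which matters when the same structure is reused for link scores.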
Figure 3. The first stage in a graph embedding method is accomplished by an encoder class, which uses a graph embedding algorithm to assign coordinates to nodes.
In the class UESMPredictor, this is followed by a similarity measure to predict link scores, whereas UECLPredictor uses a classifier to make the prediction.
Figure 4. Example of performance curves generated by LinkPred (the plots are created using an external tool).
The area under the curve (shown in gray) is the value associated with the performance curve.
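
As an illustration of how an area under a performance curve can be obtained, the sketch below builds a ROC curve from link scores and integrates it with the trapezoidal rule. It is a generic computation under stated assumptions, not LinkPred's implementation: the function names are hypothetical, a label of 1 marks a true (removed) link, and a label of 0 marks a true non-link.

```python
def roc_points(scores, labels):
    """Return ROC points (false positive rate, true positive rate)."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative")
    # Visit candidate links from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        # Links with equal scores are processed together (one ROC point).
        j = i
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            if labels[order[j]]:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def area_under_curve(points):
    """Integrate a piecewise-linear curve with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy example: two true links and two non-links.
auc = area_under_curve(roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```

In the toy example, three of the four (link, non-link) pairs are ranked correctly, so the area is 0.75.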
Description of the networks used in the experimental analysis.
Columns n and m represent the number of nodes and edges in the network, respectively.
| Network | Description | n | m |
|---|---|---|---|
| Amazon | Amazon product co-purchasing network. An edge indicates that two products have been co-purchased. Data available at | 334,863 | 925,872 |
| Brightkite | Friendship network on the social platform Brightkite. The data is available at | 58,228 | 214,078 |
| CA Roads | California road network. Data available at | 1,965,206 | 2,766,607 |
| Diseasome | A network of disorders and disease genes linked by known disorder–gene associations. The data is available at | 1,419 | 2,738 |
| Email | The symmetrized network of email communication at the University Rovira i Virgili (Tarragona, Spain). The nodes represent users, and edges indicate that an email communication took place between the two users. The dataset is available at | 1,133 | 5,451 |
| Erdos 02 | The 2002 version of Erdős’ co-authorship network. The network is available at | 6,927 | 11,850 |
| Indochina 2004 | A WWW network available at | 11,358 | 47,606 |
| Internet | Network of Internet routers. The network is available at | 124,651 | 193,620 |
| Java | The symmetrized version of a network where nodes represent Java classes and edges represent compile-time dependencies between two classes. The dataset can be found at | 1,538 | 7,817 |
| Oregon | Autonomous Systems (AS) peering network inferred from Oregon route-views on May 26, 2001. The data is available at | 11,174 | 23,409 |
| PGP | A social network of users of the Pretty Good Privacy (PGP) algorithm. The network is available at | 10,680 | 24,316 |
| Political Blogs | A network of hyperlinks among political web blogs. The data is available at | 643 | 2,280 |
| Power | The Western States Power Grid of the United States. Data available at | 4,941 | 6,594 |
| Spam | A WWW network available at | 4,767 | 37,375 |
| Twitter | A Twitter network of follow relationships. Data available at | 404,719 | 713,319 |
| Web Edu | A WWW network available at | 3,031 | 6,474 |
| Wiki Talks | A symmetrized version of the Wikipedia talk network. A node represents a user, and an edge indicates that one user edited the talk page of another user. Data available at | 2,394,385 | 4,659,565 |
| World Transport | A worldwide airport network. Nodes represent cities, and edges indicate a flight connecting two cities. The data is available at | 3,618 | 14,142 |
| Yahoo IM | Network of sample Yahoo! Messenger communication events. The data is available at | 100,001 | 587,964 |
| Youtube | A YouTube friendship network. Data available at | 1,134,890 | 2,987,624 |
| Zachary’s Karate Club | A friendship network among members of a karate club at an American university. The data was collected in the 1970s by Wayne Zachary and is available at | 34 | 78 |
Time (in seconds) required to compute the score of all non-existing links using Resource Allocation index on a single core.
| Network | LinkPred (C++) | LinkPred (Java) | LinkPred (Python) | Python package NetworkX | R package | Python package linkpred | Python package scikit-network |
|---|---|---|---|---|---|---|---|
| Political Blogs | 0.02 | 0.03 | 0.14 | 1.83 | 3.70 | 0.68 | 3.13 |
| Diseasome | 0.04 | 0.16 | 0.86 | 6.33 | 2.53 | 1.26 | 14.98 |
| | 0.05 | 0.12 | 0.56 | 7.78 | 6.88 | 1.60 | 9.63 |
| Web Edu | 0.14 | 0.72 | 4.16 | 36.92 | 8.67 | 5.31 | 68.71 |
| Java | 0.08 | 0.23 | 1.04 | 17.08 | 55.54 | 8.95 | 17.82 |
| Power | 0.36 | 1.83 | 11.05 | 80.55 | 3.80 | 11.16 | 183.71 |
| Erdos 02 | 0.76 | 3.62 | 21.71 | 179.75 | 44.15 | 30.42 | 358.37 |
| World Air | 0.31 | 1.10 | 5.79 | 81.06 | 55.06 | 11.71 | 97.91 |
| Oregon | 2.32 | 9.62 | 56.96 | 525.76 | 573.47 | 157.60 | 936.84 |
| PGP | 2.42 | 9.12 | 51.31 | 603.75 | 35.74 | 57.32 | 862.56 |
| Spam | 0.99 | 2.33 | 10.33 | 318.16 | 199.83 | 42.80 | 171.68 |
| Indochina 2004 | 2.48 | 10.04 | 59.16 | 1,086.26 | 91.95 | 74.61 | 1,003.82 |
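
For context, the task timed above can be sketched in a few lines. The Resource Allocation index of a node pair is the sum, over their common neighbors, of the inverse of each common neighbor's degree. The code below is a straightforward single-core illustration on an undirected graph; it is not the implementation of any of the benchmarked packages, and the function names are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency structure: node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def resource_allocation(adj, u, v):
    """Sum of 1/degree over the common neighbors of u and v."""
    return sum(1.0 / len(adj[z]) for z in adj[u] & adj[v])


def score_non_edges(adj):
    """Score every node pair that is not connected by an edge."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:
            scores[(u, v)] = resource_allocation(adj, u, v)
    return scores


# Toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = build_adjacency([(0, 1), (0, 2), (1, 2), (2, 3)])
ra_scores = score_non_edges(adj)
```

Enumerating all node pairs is quadratic in the number of nodes, which is why the timings in the table grow quickly with network size.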
Time achieved by LinkPred on different prediction tasks.
Column n contains the number of nodes in the network, whereas m shows the number of edges.
| Network | n | m | Task | Hardware | Time (sec.) |
|---|---|---|---|---|---|
| Brightkite | 58,228 | 214,078 | Compute ROC using 10% removed edges for ADA. | 1 node, 6 cores (Core i7-8750H) | 32.92 |
| Yahoo IM | 100,001 | 587,964 | Find the top 10^4 edges using RAL. | 1 node, 1 core (Core i7-8750H) | 6.70 |
| Twitter | 404,719 | 713,319 | Find the top 10^5 edges using RAL. | 1 node, 1 core (Core i7-8750H) | 16.93 |
| Youtube | 1,134,890 | 2,987,624 | Find the top 10^5 edges using CNE. | 1 node, 6 cores (Core i7-8750H) | 79.41 |
| CA Roads | 1,965,206 | 2,766,607 | Find the top 10^5 edges using CNE. | 1 node, 6 cores (Core i7-8750H) | 7.08 |
| Wiki Talks | 2,394,385 | 4,659,565 | Find the top 10^5 edges using CNE. | 1 node, 6 cores (Core i7-8750H) | 470.04 |
| Internet | 124,651 | 193,620 | Compute top-precision using 10% removed edges for eight algorithms. | 8 nodes, 16 cores in each node (Xeon E5-2650) | 3.73 |
| Amazon | 334,863 | 925,872 | Compute top-precision using 10% removed edges for eight algorithms. | 8 nodes, 16 cores in each node (Xeon E5-2650) | 24.17 |
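
The evaluation tasks in the table follow a common pattern: remove a fraction of the edges, score the non-edges of the remaining network, and check how many of the top-ranked pairs are removed edges. The sketch below illustrates that pattern under stated assumptions. It is not LinkPred's API: all function names are hypothetical, the predictor is a plain common-neighbors count, and top-precision is taken to be the fraction of true removed edges among the k highest-scored pairs.

```python
import random
from itertools import combinations


def split_edges(edges, ratio, seed=0):
    """Randomly split edges into (observed, removed) with the given ratio removed."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_removed = int(round(ratio * len(shuffled)))
    return shuffled[n_removed:], shuffled[:n_removed]


def common_neighbors_scores(nodes, observed):
    """Score each non-edge of the observed graph by its number of common neighbors."""
    adj = {u: set() for u in nodes}
    for u, v in observed:
        adj[u].add(v)
        adj[v].add(u)
    return {
        (u, v): len(adj[u] & adj[v])
        for u, v in combinations(sorted(nodes), 2)
        if v not in adj[u]
    }


def top_precision(scores, removed, k):
    """Fraction of the k highest-scored pairs that are removed edges (k > 0)."""
    truth = {tuple(sorted(e)) for e in removed}
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    return sum(1 for e in top if e in truth) / k


# Toy run on a ring of 20 nodes with 10% of the edges removed.
ring = [(i, (i + 1) % 20) for i in range(20)]
observed, removed = split_edges(ring, 0.1)
scores = common_neighbors_scores(range(20), observed)
precision = top_precision(scores, removed, k=len(removed))
```

Setting k to the number of removed edges is one common convention; other choices of k are equally valid and change only the last call.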