Purushottam Kumar, Dolly Sharma.
Abstract
Link prediction in networks has applications in computer science, graph theory, biology, economics, and other fields, and it is a well-studied problem. Of its different versions, link prediction for unipartite graphs has attracted the most attention. In this work we focus on link prediction for bipartite graphs, based on two important concepts: potential energy and mutual information. The approach has three steps. First, the bipartite graph is converted into a unipartite graph with the help of a weighted projection. Next, the potential energy and mutual information between each node pair in the projected graph are computed. Finally, we present a Potential Energy-Mutual Information based similarity metric, which helps in the prediction of potential links. To evaluate the performance of the proposed algorithm, four evaluation metrics, namely AUC, Precision, Prediction-power and Precision@K, were calculated and compared with eleven baseline algorithms. The experimental results show that the proposed method outperforms the baseline algorithms.
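The first of the three steps, projecting a bipartite graph onto one of its node sets, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the simplest weighting, in which the weight of a projected edge is the number of neighbors the two nodes share in the bipartite graph. The function name `weighted_projection` and the toy graph are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def weighted_projection(bipartite_edges):
    """Project a bipartite graph onto its first node set.

    bipartite_edges: iterable of (u, v) pairs, with u from the node set
    being kept and v from the other node set.
    Returns a dict mapping each pair of kept nodes (a sorted tuple) to a weight.

    Assumption: the weight is the number of shared neighbors. The weighting
    used in the paper may differ.
    """
    members = defaultdict(set)  # other-side node -> its kept-side neighbors
    for u, v in bipartite_edges:
        members[v].add(u)

    weights = defaultdict(int)
    for kept_nodes in members.values():
        # Every pair of kept nodes sharing this neighbor gains one unit of weight.
        for a, b in combinations(sorted(kept_nodes), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Toy bipartite graph (hypothetical data): letters on one side, numbers on the other.
edges = [("A", 1), ("B", 1), ("B", 2), ("C", 2), ("A", 2)]
projection = weighted_projection(edges)
print(projection)
```

Here A and B share both neighbors 1 and 2, so their projected edge carries weight 2, while the other two pairs share only neighbor 2.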
Year: 2020 PMID: 33244025 PMCID: PMC7691373 DOI: 10.1038/s41598-020-77364-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Projection of bipartite graph G.
Figure 2. Link prediction process.
Figure 3. Graph G2.
Potential energy computation for some missing edges in Graph G2.
| Edge | Product of the degrees of the nodes | Sum of the clustering coefficients of the common neighbors | Inverse of the shortest distance between the nodes | Potential energy |
|---|---|---|---|---|
| (B, E) | 3 | .1666 | .50 | .25 |
| (C, E) | 2 | .1666 | .50 | .16 |
| (E, G) | 2 | .1 | .20 | .04 |
| (E, F) | 3 | .1 | .33 | .1 |
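The rows above are consistent with the last column being the product of the three preceding columns. The short check below recomputes that product for each listed edge. Treating the potential energy as this plain product is an inference from the table values, not a formula stated here, and the variable names are illustrative.

```python
# Rows of the table: edge -> (degree product, clustering-coefficient sum,
# inverse shortest distance, value reported in the last column).
rows = {
    ("B", "E"): (3, 0.1666, 0.50, 0.25),
    ("C", "E"): (2, 0.1666, 0.50, 0.16),
    ("E", "G"): (2, 0.1, 0.20, 0.04),
    ("E", "F"): (3, 0.1, 0.33, 0.1),
}

# Assumption: the last column is the product of the first three columns.
recomputed = {
    edge: degree_product * cc_sum * inv_distance
    for edge, (degree_product, cc_sum, inv_distance, _) in rows.items()
}

for edge, value in recomputed.items():
    reported = rows[edge][3]
    print(edge, "recomputed:", round(value, 4), "reported:", reported)
```

Each recomputed value agrees with the reported one to within 0.01, which is the precision at which the table is printed.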
Figure 4. Projection of bipartite graph.
Statistics of the datasets. ML (MovieLens), EN (Enzyme), SWN (Southern Women Network), CL (Corporate Leadership), CM (Club Membership), IC (Ion channels), C2O (Country-Organizations), Mal (Malaria), GPC (G-protein coupled receptors) and Drug (Drug Target) are the names of the datasets, and the dataset type gives the domain of each dataset.
| Name of dataset | Number of nodes | Number of edges | Average degree | Dataset types |
|---|---|---|---|---|
| ML | 2625 | 85,250 | 32.48 | Entertainment |
| EN | 1109 | 2926 | 5.2 | Biological |
| SWN | 32 | 89 | 5.5 | Social network |
| CL | 64 | 99 | 4.5 | Management |
| CM | 65 | 95 | 4.7 | Social network |
| IC | 414 | 1476 | 3.5 | Biological |
| C2O | 295 | 12,170 | 41.2 | Global network |
| Mal | 1103 | 2965 | 2.6 | Genetic network |
| GPC | 318 | 635 | 2 | Biological |
| Drug | 350 | 454 | 1.3 | Chemical network |
AUC comparison results on ten datasets (ML, EN, SWN, CL, CM, IC, C2O, Mal, GPC and Drug).
| Dataset | ML | EN | SWN | CL | CM | IC | C2O | Mal | GPC | Drug |
|---|---|---|---|---|---|---|---|---|---|---|
| PA | .881 | .788 | .648 | .773 | .764 | .823 | .901 | .591 | .720 | .880 |
| CN | .871 | .851 | .730 | .811 | .801 | .910 | .990 | .901 | .812 | .920 |
| JC | .791 | .880 | .663 | .821 | .798 | .850 | .950 | .901 | .821 | .910 |
| CAR | .912 | .867 | .726 | .940 | .906 | .916 | .990 | .910 | .811 | .901 |
| CJC | .882 | .867 | .762 | .960 | .940 | .925 | .990 | .910 | .831 | .910 |
| CAA | .910 | .851 | .760 | .950 | .942 | .920 | .831 | .910 | ||
| CRA | .921 | .890 | .772 | .945 | .955 | .931 | .910 | .821 | .930 | |
| BPR | .911 | .891 | .742 | .943 | .959 | .920 | .990 | .901 | .840 | .920 |
| CS | .831 | .836 | .761 | .775 | .882 | .835 | .960 | .821 | .801 | .871 |
| PLP | .930 | .889 | .936 | .905 | .960 | .965 | .906 | .849 | .938 | |
| NMF | .891 | .761 | .692 | .854 | .846 | .850 | .990 | .861 | .702 | .890 |
| PMIL | .940 | .938 | .971 |
Each dataset is divided into a training set (90%) and a test set (10%), and the results are averaged over 1000 runs.
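In link prediction, AUC is commonly computed as the probability that a held-out test edge receives a higher similarity score than a nonexistent edge, with ties counted as one half. The sketch below assumes that definition, which this record does not spell out, and compares every test edge with every nonexistent edge rather than sampling pairs. The function name and the toy scores are hypothetical.

```python
def link_prediction_auc(test_scores, nonexistent_scores):
    """AUC as the fraction of (test edge, nonexistent edge) comparisons
    won by the test edge, counting ties as half a win."""
    higher = 0
    ties = 0
    for s_test in test_scores:
        for s_non in nonexistent_scores:
            if s_test > s_non:
                higher += 1
            elif s_test == s_non:
                ties += 1
    total = len(test_scores) * len(nonexistent_scores)
    return (higher + 0.5 * ties) / total


# Toy similarity scores (hypothetical data).
auc = link_prediction_auc([0.9, 0.4, 0.7], [0.1, 0.4, 0.3, 0.8])
print(round(auc, 3))
```

In the evaluation described above, a value of this kind would be computed on each random 90%/10% split and then averaged over the runs.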
Precision comparison results on ten datasets averaged over 1000 runs.
| Dataset | ML | EN | SWN | CL | CM | IC | C2O | Mal | GPC | Drug |
|---|---|---|---|---|---|---|---|---|---|---|
| PA | .153 | .023 | .122 | .110 | .157 | .036 | .871 | .022 | .081 | .313 |
| CN | .141 | .370 | .141 | .210 | .202 | .230 | .871 | .192 | .310 | .610 |
| JC | .001 | .031 | .021 | .042 | .036 | .021 | .601 | .250 | .012 | .383 |
| CAR | .177 | .507 | .188 | .189 | .202 | .432 | .871 | .191 | .332 | .601 |
| CJC | .184 | .496 | .188 | .217 | .231 | .494 | .871 | .232 | .361 | .191 |
| CAA | .181 | .502 | .122 | .621 | .531 | .870 | .191 | .320 | .591 | |
| CRA | .181 | .651 | .210 | .612 | .631 | .560 | .880 | .251 | .373 | .631 |
| BPR | .181 | .501 | .162 | .620 | .641 | .442 | .253 | .271 | ||
| CS | .120 | .330 | .163 | .165 | .455 | .349 | .661 | .142 | .201 | .491 |
| PLP | .191 | .491 | .410 | .210 | .620 | .631 | .221 | .283 | .301 | |
| NMF | .001 | .001 | .031 | .022 | .031 | .011 | .001 | .001 | .012 | .021 |
| PMIL | .205 | .581 | .601 | .310 |
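Precision-style metrics rank the candidate links by similarity score and count how many of the top-ranked ones are actually in the test set. The sketch below shows Precision@K under that common definition, which is an assumption here since this record does not define the metric. The function name and the toy data are hypothetical.

```python
def precision_at_k(scored_pairs, test_edges, k):
    """Fraction of the k highest-scored candidate pairs that are test edges.

    scored_pairs: list of (pair, score) tuples for candidate links.
    test_edges: set of pairs held out as the test set.
    """
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)
    top_k = [pair for pair, _ in ranked[:k]]
    hits = sum(1 for pair in top_k if pair in test_edges)
    return hits / k


# Toy candidate links with similarity scores (hypothetical data).
scores = [
    (("A", "B"), 0.9),
    (("A", "C"), 0.8),
    (("B", "D"), 0.5),
    (("C", "D"), 0.2),
]
held_out = {("A", "B"), ("B", "D")}
p_at_2 = precision_at_k(scores, held_out, 2)
print(p_at_2)
```

With these scores the two top-ranked pairs are (A, B) and (A, C), of which only the first is a held-out edge.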
Figure 5. AUC values for the baseline algorithms on the Drug dataset as the training-set fraction varies from 0.4 to 0.9.
Figure 6. Comparison of precision@10 on the four datasets.
Figure 7. Comparison of precision@20 on the four datasets.
Figure 8. Comparison of precision@50 on the four datasets.
Figure 9. Comparison of prediction-power on the four datasets.