Bo Ouyang, Lurong Jiang, Zhaosheng Teng.
Abstract
Link prediction plays an important role both in finding missing links in networked systems and in complementing our understanding of how networks evolve. The network science community has paid much attention to efficiently predicting missing or future links from the observed topology. Real-world information always contains noise, and an observed network is no exception, yet existing methods rarely consider this problem. In this paper, we treat the existence of observed links as known information. By filtering out the noise in this information, we retrieve the underlying regularity of the connections and use it to predict missing or future links. Experiments on various empirical networks show that our method performs noticeably better than baseline algorithms.
Year: 2016 PMID: 26788737 PMCID: PMC4720285 DOI: 10.1371/journal.pone.0146925
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Topological parameters of the real-world networks.
| Network | \|V\| | \|E\| | C | r | 〈k〉 | H |
|---|---|---|---|---|---|---|
| Karate | 34 | 78 | 0.571 | -0.476 | 4.588 | 1.693 |
| FoodWeb | 128 | 2075 | 0.335 | -0.112 | 32.422 | 1.237 |
| Jazz | 198 | 2742 | 0.617 | 0.020 | 27.697 | 1.395 |
| Neural | 297 | 2148 | 0.292 | -0.163 | 14.465 | 1.801 |
| USAir | 332 | 2126 | 0.625 | -0.208 | 12.807 | 3.464 |
| Metabolic | 453 | 2025 | 0.646 | -0.226 | 8.940 | 4.485 |
| | 1133 | 5451 | 0.220 | 0.078 | 9.622 | 1.942 |
| PB | 1490 | 16715 | 0.263 | -0.221 | 22.436 | 3.622 |
| Yeast | 2361 | 6646 | 0.130 | -0.099 | 5.630 | 2.944 |
| EPA | 4772 | 8909 | 0.064 | -0.303 | 3.734 | 7.573 |
| Router | 5022 | 6258 | 0.012 | -0.138 | 2.492 | 5.503 |
| WikiVote | 8297 | 100762 | 0.121 | -0.083 | 24.289 | 5.985 |
|V| and |E| are the numbers of nodes and links. C is the clustering coefficient and r the degree-degree correlation coefficient. 〈k〉 is the average degree, 〈d〉 is the average shortest distance, and H is the degree heterogeneity, H = 〈k²〉/〈k〉².
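Two of these parameters follow directly from counts and degrees, which makes the table easy to spot-check. The sketch below assumes an undirected simple graph, so that 〈k〉 = 2|E|/|V|; the function names and the star-graph degree sequence are illustrative and do not come from the paper.

```python
def average_degree(num_nodes, num_links):
    """<k> = 2|E| / |V| for an undirected simple graph."""
    return 2 * num_links / num_nodes


def degree_heterogeneity(degrees):
    """H = <k^2> / <k>^2 for a degree sequence."""
    n = len(degrees)
    mean_k = sum(degrees) / n
    mean_k2 = sum(k * k for k in degrees) / n
    return mean_k2 / mean_k ** 2


# Karate row of the table: |V| = 34, |E| = 78.
karate_avg_degree = average_degree(34, 78)

# Hypothetical star graph with one hub and four leaves (not from the paper).
star_h = degree_heterogeneity([4, 1, 1, 1, 1])

# A regular graph has H = 1; heterogeneous degrees push H above 1.
regular_h = degree_heterogeneity([3, 3, 3, 3])
```

For the Karate row this gives 2 × 78 / 34 ≈ 4.588, which agrees with the tabulated 〈k〉.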
Fig 1. Demonstration of using rows of A + I as the feature vectors for nodes.
In the network, nodes 4 and 5 are topologically equivalent. However, the 4th row of A reads [0, 1, 1, 0, 1] and the 5th reads [0, 1, 1, 1, 0], which differ. After adding I, the 4th and 5th rows of A + I are both [0, 1, 1, 1, 1], which is exactly what we want. The same holds for nodes 2 and 3. The k-th feature of a node can be interpreted as whether the distance between it and node k is no more than 1. For example, the distance between nodes 1 and 4 is greater than 1, while every other node is within distance 1 of node 4, so the 4th feature, read across nodes 1 to 5, is [0, 1, 1, 1, 1].
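The effect of adding I can be checked on a small graph. The sketch below rebuilds a 5-node network whose rows 4 and 5 match the caption; the figure itself is not reproduced here, so the two links touching node 1 (1-2 and 1-3) are an assumption chosen to be consistent with the caption, and the helper names are illustrative.

```python
# Links of a 5-node example graph, nodes numbered from 1.
# Rows 4 and 5 follow the caption; links 1-2 and 1-3 are assumed.
EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)]
N = 5


def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix A as a list of rows."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u - 1][v - 1] = 1
        a[v - 1][u - 1] = 1
    return a


def add_identity(a):
    """Return A + I, i.e. set every diagonal entry to 1."""
    n = len(a)
    return [[a[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]


A = adjacency(N, EDGES)
F = add_identity(A)  # row i is the feature vector of node i + 1

# The 4th feature across all nodes is the 4th column of A + I.
fourth_feature = [row[3] for row in F]
```

In A, rows 4 and 5 differ only because each node has a 0 in its own position; once the diagonal is filled in, the two rows coincide, and so do rows 2 and 3.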
Comparison of the prediction accuracy under the AUC metric in real-world networks.
| Network | CN | AA | RA | PA | LP | Katz | NF |
|---|---|---|---|---|---|---|---|
| Karate | 0.6994(162) | 0.7338(202) | 0.7281(182) | 0.7006(297) | 0.7206(200) | 0.7375(284) | |
| FoodWeb | 0.6104(11) | 0.6094(11) | 0.6120(8) | 0.7332(9) | 0.6235(11) | 0.6770(10) | |
| Jazz | 0.9545(2) | 0.9619(2) | 0.7668(8) | 0.9591(1) | 0.9485(2) | 0.9663(1) | |
| Neural | 0.8441(4) | 0.8589(4) | 0.8644(4) | 0.7529(7) | 0.8595(6) | 0.8575(5) | |
| USAir | 0.9359(3) | 0.9477(3) | 0.9537(2) | 0.8856(5) | 0.9427(3) | 0.9242(3) | |
| Metabolic | 0.9198(3) | 0.9506(2) | 0.8172(7) | 0.9233(3) | 0.9195(4) | 0.9319(2) | |
| | 0.8442(1) | 0.8464(1) | 0.8467(1) | 0.7779(3) | 0.8942(2) | 0.8973(1) | |
| PB | 0.9368(0) | 0.9396(0) | 0.9398(0) | 0.9325(0) | 0.9495(0) | 0.9336(1) | |
| Yeast | 0.7061(0) | 0.7066(0) | 0.7061(1) | 0.7865(3) | 0.8184(2) | 0.7989(3) | |
| EPA | 0.5860(0) | 0.5865(0) | 0.5868(0) | 0.7371(2) | 0.7855(0) | 0.7376(1) | |
| Router | 0.5580(0) | 0.5579(0) | 0.5579(0) | 0.4694(3) | 0.6320(0) | 0.3738(3) | |
| WikiVote | 0.9337(0) | 0.9347(0) | 0.9344(0) | 0.9484(0) | 0.9616(0) | 0.9584(0) |
Each value is the average over 100 implementations with independent random divisions of the links into a training set (90%) and a probe set (10%). The method proposed in this paper, NF (Noise Filtering), is in the last column. The best result for each network is in boldface. The numbers in brackets denote the standard deviations. For example, 0.6994(162) means that the AUC value is 0.6994 and the standard deviation is 162 × 10⁻⁴.
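This section does not spell out how AUC is computed, so the sketch below assumes the definition that is usual in link prediction: the probability that a randomly chosen probe link scores higher than a randomly chosen nonexistent link, with ties counted as half. The function name `auc` and the toy scores are illustrative, and the exhaustive pairwise comparison stands in for the random sampling often used on large networks.

```python
def auc(probe_scores, nonexistent_scores):
    """Fraction of (probe, nonexistent) pairs ranked correctly, ties count 0.5."""
    wins = 0
    ties = 0
    for p in probe_scores:
        for q in nonexistent_scores:
            if p > q:
                wins += 1
            elif p == q:
                ties += 1
    total = len(probe_scores) * len(nonexistent_scores)
    return (wins + 0.5 * ties) / total


# Toy scores (not from the paper): two probe links, two nonexistent links.
example_auc = auc([0.9, 0.5], [0.5, 0.1])  # 3 wins and 1 tie out of 4 pairs
```

Under this definition a predictor that scores pairs at random gives about 0.5, so a value below 0.5 indicates a ranking that does worse than chance.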
Comparison of the prediction accuracy under the precision metric in real-world networks.
| Network | CN | AA | RA | PA | LP | Katz | NF |
|---|---|---|---|---|---|---|---|
| Karate | 0.1525(96) | 0.1538(156) | 0.1538(146) | 0.0863(68) | 0.1613(123) | 0.1487(93) | |
| FoodWeb | 0.0707(2) | 0.0755(2) | 0.0754(3) | 0.1607(4) | 0.0758(2) | 0.1023(3) | |
| Jazz | 0.5044(6) | 0.5244(6) | 0.5393(5) | 0.1300(4) | 0.5120(7) | 0.4920(6) | |
| Neural | 0.0962(2) | 0.1039(3) | 0.1025(3) | 0.0575(2) | 0.0985(3) | 0.1027(2) | |
| USAir | 0.3730(8) | 0.3898(8) | 0.3164(7) | 0.3738(9) | 0.3695(8) | 0.3905(9) | |
| Metabolic | 0.1378(4) | 0.1932(4) | 0.2680(5) | 0.0999(4) | 0.1449(5) | 0.1408(4) | |
| | 0.1392(2) | 0.1552(2) | 0.1400(2) | 0.0174(0) | 0.1469(1) | 0.1355(2) | |
| PB | 0.1729(0) | 0.1716(0) | 0.1493(0) | 0.0652(0) | 0.1735(0) | 0.0861(11) | |
| Yeast | 0.0924(0) | 0.0912(0) | 0.0736(0) | 0.0093(0) | 0.0950(1) | 0.0925(0) | |
| EPA | 0.0090(0) | 0.0148(0) | 0.0198(0) | 0.0044(0) | 0.0135(0) | 0.0136(0) | |
| Router | 0.0166(0) | 0.0162(0) | 0.0096(0) | 0.0096(0) | 0.0212(0) | 0.0226(0) | |
| WikiVote | 0.1009(0) | 0.0999(0) | 0.0833(0) | 0.0616(0) | 0.1005(0) | 0.1028(0) |
Each value is the average over 100 implementations with independent random divisions of the links into a training set (90%) and a probe set (10%). The method proposed in this paper, NF (Noise Filtering), is in the last column. The best result for each network is in boldface. The numbers in brackets denote the standard deviations. For example, 0.1525(96) means that the precision value is 0.1525 and the standard deviation is 96 × 10⁻⁴.
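Precision is likewise not defined in this section. The sketch below assumes the common link-prediction definition: rank all unobserved node pairs by score, take the top L, and report the fraction that belong to the probe set. The value of L used in the paper is not stated here, so `top_l` is left as a parameter; the function name and the toy scores are illustrative.

```python
def precision_at(scores, probe_links, top_l):
    """Fraction of the top_l highest-scored pairs that are probe links.

    scores: dict mapping an unobserved node pair to its similarity score.
    probe_links: set of held-out pairs that really are links.
    """
    # Sort by descending score; break ties by pair so the order is deterministic.
    ranked = sorted(scores, key=lambda pair: (-scores[pair], pair))
    hits = sum(1 for pair in ranked[:top_l] if pair in probe_links)
    return hits / top_l


# Toy scores for four unobserved pairs (not from the paper).
toy_scores = {(1, 2): 0.9, (1, 3): 0.8, (2, 4): 0.7, (2, 3): 0.1}
toy_probe = {(1, 2), (2, 4)}

precision_top2 = precision_at(toy_scores, toy_probe, 2)
precision_top3 = precision_at(toy_scores, toy_probe, 3)
```

Because the probe set is small relative to the number of unobserved pairs, precision values are typically far lower than AUC values for the same network.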
Fig 2. Comparison of prediction accuracy under the AUC metric.
The fraction f of links assigned to the training set is varied from 0.5 to 0.9.
Fig 3. Comparison of prediction accuracy under the precision metric.
The fraction f of links assigned to the training set is varied from 0.5 to 0.9.
Fig 4. Prediction accuracy for different values of the cutoff threshold t in the proposed noise-filtering method.
The symbol f denotes the fraction of links in the training set.
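The role of f can be made concrete with a random division of the observed links. This is a minimal sketch, assuming a uniform random split with no further constraint; whether the paper imposes extra conditions, such as keeping the training network connected, is not stated in this section, and `split_links` and `seed` are illustrative names.

```python
import random


def split_links(links, f, seed=None):
    """Randomly divide links into a training set (fraction f) and a probe set."""
    rng = random.Random(seed)
    shuffled = list(links)
    rng.shuffle(shuffled)
    n_train = round(f * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical ring of 20 nodes, giving 20 links (not one of the paper's data sets).
ring = [(i, (i + 1) % 20) for i in range(20)]

# f = 0.9 corresponds to the 90% / 10% division used in the tables.
train, probe = split_links(ring, 0.9, seed=0)

# Sweeping f as in the figure captions.
sizes = {f: len(split_links(ring, f, seed=0)[0]) for f in (0.5, 0.6, 0.7, 0.8, 0.9)}
```

Passing a different `seed` for each repetition gives the independent random divisions over which results are averaged.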
Comparison of the computational efficiency in real-world networks.
| Network | CN | AA | RA | PA | LP | Katz | NF |
|---|---|---|---|---|---|---|---|
| Karate | 0.2722 | 0.0863 | 0.0794 | 0.0741 | 0.0765 | 0.0767 | 0.3953 |
| FoodWeb | 0.1265 | 0.1332 | 0.1319 | 0.1408 | 0.1511 | 0.1659 | 0.2652 |
| Jazz | 0.1928 | 0.1939 | 0.2207 | 0.2117 | 0.2357 | 0.2596 | 0.4017 |
| Neural | 0.2041 | 0.2295 | 0.2302 | 0.2499 | 0.2491 | 0.3566 | 0.4580 |
| USAir | 0.2428 | 0.2789 | 0.2220 | 0.2788 | 0.3358 | 0.3118 | 0.5731 |
| Metabolic | 0.3635 | 0.3644 | 0.3881 | 0.4841 | 0.5499 | 0.5712 | 0.6719 |
| | 1.3969 | 1.6700 | 1.5221 | 3.2462 | 2.4013 | 5.0422 | 3.1099 |
| PB | 4.5587 | 5.0003 | 5.1569 | 6.0813 | 6.8293 | 9.7084 | 6.0244 |
| Yeast | 4.9859 | 6.8925 | 6.1101 | 13.9745 | 7.1093 | 21.4607 | 12.9680 |
| EPA | 20.3863 | 29.6148 | 26.7357 | 53.4191 | 24.4295 | 89.9699 | 50.5754 |
| Router | 19.2990 | 33.6029 | 25.6877 | 64.5513 | 23.8481 | 72.9081 | 89.1175 |
| WikiVote | 366.9862 | 387.7545 | 386.7545 | 447.2611 | 453.6359 | 704.4210 | 526.8122 |
Each value is the total time (in seconds) for 100 runs, each with an independent random division of the links into a training set (90%) and a probe set (10%). The method proposed in this paper, NF (Noise Filtering), is in the last column.
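A total over 100 runs can be measured with a simple wall-clock harness. The sketch below is an assumption about how such timings might be collected, not the authors' benchmarking code; it times a common-neighbours scorer (the index abbreviated CN) on a hypothetical toy graph, and both function names are illustrative.

```python
import time
from itertools import combinations


def common_neighbours_scores(adj):
    """Score every unlinked pair by its number of shared neighbours."""
    return {
        (u, v): len(adj[u] & adj[v])
        for u, v in combinations(sorted(adj), 2)
        if v not in adj[u]
    }


def total_time(run_once, repeats=100):
    """Total wall-clock seconds spent on `repeats` calls of run_once(run_index)."""
    start = time.perf_counter()
    for run_index in range(repeats):
        run_once(run_index)
    return time.perf_counter() - start


# Hypothetical toy graph as an adjacency dict of sets (not from the paper).
toy = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2}, 4: {2}}
toy_scores_cn = common_neighbours_scores(toy)
elapsed = total_time(lambda run_index: common_neighbours_scores(toy))
```

The run index is passed to the callback so that each repetition can draw its own random division before scoring.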