Jinxuan Yang, Xiao-Dong Zhang.
Abstract
Algorithms based on the common-neighbors metric are very popular for predicting missing links in complex networks, but most of them do not account for missing links between nodes with no common neighbors. In some cases these methods do not reconstruct networks accurately enough, especially when pairs of nodes have few common neighbors. In this paper we propose a new algorithm based on common neighbors and distance to improve the accuracy of link prediction. The proposed algorithm is markedly effective at predicting missing links between nodes with no common neighbors and, for a variety of real-world networks, performs better than most currently used methods without increasing complexity.
Year: 2016 PMID: 27905526 PMCID: PMC5131303 DOI: 10.1038/srep38208
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Illustration of properties of networks.
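| Networks | Nodes | Edges | c | CN coefficient (G) | CN coefficient (G′, 10%) | CN coefficient (G′, 20%) | 〈d〉 | 〈k〉 | r | Degree heterogeneity |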
|---|---|---|---|---|---|---|---|---|---|---|
| Karate | 34 | 78 | 0.571 | 0.859 | 0.793 | 0.771 | 2.408 | 4.588 | −0.476 | 1.693 |
| Dolphins | 62 | 159 | 0.259 | 0.761 | 0.710 | 0.715 | 3.357 | 5.129 | −0.044 | 1.327 |
| Polbook | 105 | 441 | 0.488 | 0.959 | 0.937 | 0.927 | 3.079 | 8.400 | −0.128 | 1.421 |
| Word | 112 | 425 | 0.173 | 0.725 | 0.694 | 0.672 | 2.536 | 7.589 | −0.129 | 1.815 |
| Neural | 297 | 2148 | 0.292 | 0.945 | 0.913 | 0.916 | 2.455 | 14.465 | −0.163 | 1.801 |
| Circuit | 512 | 819 | 0.055 | 0.137 | 0.118 | 0.115 | 6.858 | 3.199 | −0.030 | 1.259 |
| Email | 1133 | 5451 | 0.220 | 0.776 | 0.734 | 0.733 | 3.606 | 9.622 | 0.078 | 1.942 |
| Power | 4941 | 6594 | 0.080 | 0.208 | 0.179 | 0.176 | 18.989 | 2.669 | 0.003 | 1.450 |
Parameters are measured in the original networks G, except for the CN coefficients measured in G′ = G − E, where E is the removed probe set.
〈d〉: average distance; 〈k〉: average degree; c: clustering coefficient; r: assortativity coefficient (see Methods section). The table also lists the CN coefficient and the degree heterogeneity. The CN coefficient values in G′ are averages over 20 realizations, with E removed at random in each realization.
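As a quick consistency check on the table above, the average degree of an undirected simple graph follows from the node and edge counts as 〈k〉 = 2M/N. The sketch below assumes that the first two numeric columns are the node count N and the edge count M; the names `NETWORKS` and `average_degree` are illustrative and not from the paper. Only rows that carry a network name in the table are included.

```python
# (nodes, edges) pairs copied from the named rows of the properties table.
# Assumption: the first two numeric columns are node and edge counts.
NETWORKS = {
    "Karate": (34, 78),
    "Dolphins": (62, 159),
    "Polbook": (105, 441),
    "Word": (112, 425),
    "Neural": (297, 2148),
    "Circuit": (512, 819),
    "Power": (4941, 6594),
}


def average_degree(num_nodes, num_edges):
    """Average degree of an undirected simple graph: <k> = 2M / N."""
    return 2 * num_edges / num_nodes


for name, (n, m) in NETWORKS.items():
    print(f"{name:9s} <k> = {average_degree(n, m):.3f}")
```

The printed values can be compared against the 〈k〉 column; for Karate, 2 × 78 / 34 rounds to 4.588.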
Root mean square error (EMSE) and Pearson correlation coefficient (CC) between c , , c and clustering coefficient c in the 8 networks.
| | | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| 0.012 | 0.999 | 0.777 | 0.999 | ||||||
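The two statistics named in the caption above can be computed as follows. This is a minimal sketch using the textbook definitions of root mean square error and Pearson correlation; the function names and the toy input are illustrative, and the paper's exact procedure is not given here.

```python
import math


def rmse(xs, ys):
    """Root mean square error between two equally long sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("sequences must be non-empty and of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def pearson(xs, ys):
    """Pearson correlation coefficient between two sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy input (not data from the paper): a perfectly linear relationship.
print(pearson([1, 2, 3], [2, 4, 6]))
print(rmse([1, 2], [1, 4]))
```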
Figure 1. The distributions p for d = 2, 3, 4 and the rest in 8 real-world networks.
The AUC of different methods under 10% and 20% probe set in 8 networks.
| Probe set | Methods | Karate | Dolphins | Polbook | Word | Neural | Circuit | Email | Power |
|---|---|---|---|---|---|---|---|---|---|
| 10% | RA | 0.721(78) | 0.775(71) | 0.899(25) | 0.675(38) | | 0.552(13) | 0.848(11) | 0.586(5) |
| | AA | 0.711(75) | 0.776(71) | 0.898(25) | 0.677(41) | 0.863(12) | 0.552(13) | 0.849(11) | 0.586(5) |
| | Jaccard | 0.591(63) | 0.770(67) | 0.878(25) | 0.621(36) | 0.792(11) | 0.552(13) | 0.845(11) | 0.586(5) |
| | LHN | 0.578(74) | 0.751(62) | 0.850(26) | 0.584(31) | 0.727(10) | 0.552(13) | 0.838(10) | 0.586(5) |
| | HDI | 0.581(64) | 0.772(69) | 0.865(23) | 0.620(35) | 0.781(12) | 0.552(13) | 0.844(11) | 0.586(5) |
| | HPI | 0.696(79) | 0.754(63) | 0.895(26) | 0.637(37) | 0.808(12) | 0.552(13) | 0.841(10) | 0.586(5) |
| | Sen | 0.591(63) | 0.770(67) | 0.878(25) | 0.621(36) | 0.792(11) | 0.552(13) | 0.845(11) | 0.586(5) |
| | Sal | 0.617(69) | 0.765(66) | 0.886(25) | 0.624(36) | 0.800(10) | 0.552(13) | 0.844(11) | 0.586(5) |
| | CN | 0.679(72) | 0.772(70) | 0.889(26) | 0.678(42) | 0.844(13) | 0.552(13) | 0.847(11) | 0.586(5) |
| | Our | | | | | 0.844(10) | | | |
| 20% | RA | | 0.753(39) | 0.883(18) | 0.658(22) | | 0.541(10) | 0.822(6) | 0.571(2) |
| | AA | 0.683(59) | 0.754(39) | 0.881(18) | 0.660(22) | 0.842(10) | 0.541(10) | 0.822(6) | 0.571(2) |
| | Jaccard | 0.597(41) | 0.749(37) | 0.856(17) | 0.609(19) | 0.773(9) | 0.541(10) | 0.819(6) | 0.571(2) |
| | LHN | 0.588(50) | 0.737(36) | 0.829(19) | 0.583(18) | 0.724(8) | 0.541(10) | 0.813(6) | 0.571(2) |
| | HDI | 0.590(37) | 0.751(38) | 0.847(16) | 0.609(19) | 0.765(9) | 0.541(10) | 0.819(6) | 0.571(2) |
| | HPI | 0.662(63) | 0.738(35) | 0.868(20) | 0.623(21) | 0.789(10) | 0.541(10) | 0.816(6) | 0.571(2) |
| | Sen | 0.597(41) | 0.749(37) | 0.856(17) | 0.609(19) | 0.773(9) | 0.541(10) | 0.819(6) | 0.571(2) |
| | Sal | 0.613(47) | 0.746(36) | 0.862(18) | 0.611(20) | 0.780(10) | 0.541(10) | 0.818(6) | 0.571(2) |
| | CN | 0.662(52) | 0.740(37) | 0.858(17) | 0.655(21) | 0.824(10) | 0.540(10) | 0.821(6) | 0.571(2) |
| | Our | 0.678(71) | | | | 0.828(6) | | | |
The results are averages over 20 realizations for each network, with the probe set E removed at random in each realization. The highest value for each network is labeled in boldface. The numbers in brackets denote the standard deviations. For example, 0.721(78) denotes that the AUC value is 0.721 and the standard deviation is 78 × 10⁻⁴.
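The bracket notation described in the note above can be decoded mechanically. The following sketch parses a table cell such as 0.721(78) into a mean and a standard deviation scaled by 10⁻⁴; the helper name `parse_entry` is illustrative and not from the paper.

```python
import re

# A cell looks like "0.721(78)": mean, then the standard deviation in
# units of 1e-4 inside brackets. Optional spaces are tolerated.
_ENTRY = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*\((\d+)\)\s*$")


def parse_entry(cell):
    """Return (mean, standard_deviation) for a cell such as '0.721(78)'."""
    match = _ENTRY.match(cell)
    if match is None:
        raise ValueError(f"not a value(std) entry: {cell!r}")
    mean = float(match.group(1))
    std = int(match.group(2)) * 1e-4
    return mean, std


mean, std = parse_entry("0.721(78)")
print(f"{mean:.3f} +/- {std:.4f}")
```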
The computation of link prediction methods.
| CN | | Sal (Salton) | |
| Jaccard | | Sen (Sørensen) | |
| HPI (Hub Promoted) | | HDI (Hub Depressed) | |
| LHN (Leicht-Holme-Newman) | | AA (Adamic-Adar) | |
| RA (Resource Allocation) | | | |
| PA (Preferential Attachment) | | LP (Local Path) | |
k_i is the degree of node i. The PA and LP methods are not directly based on common neighbors, but they still use only local information; A is the adjacency matrix and β = 0.01.
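The formulas of the table above did not survive in this record. As a hedged illustration, the sketch below computes the indices using their standard definitions from the link-prediction literature; these are assumed, not confirmed, to match the paper's formulas. In particular, LP is taken as (A² + βA³)ᵢⱼ with β = 0.01. The graph representation (a dict mapping each node to its neighbor set) and the function name `similarity_scores` are illustrative choices.

```python
import math


def similarity_scores(adj, i, j, beta=0.01):
    """Standard local similarity indices for the node pair (i, j).

    adj maps each node to the set of its neighbors (undirected graph).
    """
    ni, nj = adj[i], adj[j]
    common = ni & nj
    ki, kj = len(ni), len(nj)
    cn = len(common)

    def ratio(num, den):
        # Guard against isolated nodes (zero denominators).
        return num / den if den else 0.0

    # (A^3)_ij: number of walks i - u - v - j of length 3.
    walks3 = sum(1 for u in ni for v in adj[u] if j in adj[v])

    return {
        "CN": cn,
        "Sal": ratio(cn, math.sqrt(ki * kj)),
        "Jaccard": ratio(cn, len(ni | nj)),
        "Sen": ratio(2 * cn, ki + kj),
        "HPI": ratio(cn, min(ki, kj)),
        "HDI": ratio(cn, max(ki, kj)),
        "LHN": ratio(cn, ki * kj),
        # A common neighbor has degree >= 2, so log(k_z) > 0.
        "AA": sum(1 / math.log(len(adj[z])) for z in common),
        "RA": sum(1 / len(adj[z]) for z in common),
        "PA": ki * kj,
        "LP": cn + beta * walks3,  # (A^2)_ij equals the CN count
    }


# Path graph 0 - 1 - 2 - 3: nodes 0 and 3 share no neighbor.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
scores = similarity_scores(path, 0, 3)
print(scores["CN"], scores["PA"], scores["LP"])
```

On the path graph, every index built on the common-neighbor count is zero for the pair (0, 3), while PA and LP remain positive, which is why those two methods can be compared on pairs without common neighbors.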
Figure 2. The changes of AUC when the probe set increases from 10% to 20% in 8 real-world networks (a–h).
The Precision of different methods under 10% and 20% probe set in 8 networks.
| Probe set | Methods | Karate | Dolphins | Polbook | Word | Neural | Circuit | Email | Power |
|---|---|---|---|---|---|---|---|---|---|
| 10% | RA | 0.154(109) | 0.123(73) | 0.185(55) | 0.054(27) | 0.103(17) | 0.012(9) | 0.143(12) | 0.028(9) |
| | AA | 0.132(125) | 0.128(81) | 0.172(53) | 0.068(39) | 0.105(20) | 0.012(9) | 0.158(13) | 0.031(8) |
| | Jaccard | 0.004(16) | 0.087(58) | 0.122(46) | 0.002(4) | 0.021(10) | 0.031(18) | 0.074(9) | 0.016(2) |
| | LHN | 0.007(22) | 0.017(30) | 0.077(48) | 0.001(5) | 0.000(1) | 0.007(9) | 0.004(3) | 0.009(2) |
| | HDI | 0.000(0) | 0.083(64) | 0.105(45) | 0.002(7) | 0.023(8) | 0.020(12) | 0.075(9) | 0.020(1) |
| | HPI | 0.171(91) | 0.022(25) | 0.142(52) | 0.011(9) | 0.007(4) | 0.012(9) | 0.013(4) | 0.030(3) |
| | Sen | 0.004(16) | 0.087(58) | 0.122(46) | 0.002(4) | 0.021(10) | 0.031(18) | 0.074(9) | 0.016(2) |
| | Sal | 0.000(0) | 0.075(57) | 0.120(39) | 0.000(0) | 0.021(10) | 0.015(15) | 0.056(8) | 0.015(2) |
| | CN | 0.143(73) | 0.135(57) | 0.148(46) | 0.063(32) | 0.099(20) | 0.058(21) | 0.149(15) | 0.069(18) |
| | Our | | | | | | | | |
| 20% | RA | 0.160(76) | 0.156(38) | | 0.092(33) | 0.150(12) | 0.007(4) | 0.174(9) | 0.022(3) |
| | AA | 0.155(78) | 0.161(43) | 0.278(32) | 0.102(31) | 0.155(14) | 0.007(4) | 0.188(9) | 0.023(2) |
| | Jaccard | 0.028(50) | 0.131(53) | 0.157(34) | 0.014(17) | 0.043(10) | 0.027(13) | 0.097(5) | 0.013(4) |
| | LHN | 0.015(23) | 0.027(25) | 0.089(27) | 0.001(3) | 0.002(2) | 0.027(12) | 0.012(3) | 0.010(3) |
| | HDI | 0.045(47) | 0.159(47) | 0.138(35) | 0.017(18) | 0.050(10) | 0.018(7) | 0.113(7) | 0.018(4) |
| | HPI | 0.145(70) | 0.018(14) | 0.192(36) | 0.006(6) | 0.006(2) | 0.015(7) | 0.012(2) | 0.027(3) |
| | Sen | 0.028(50) | 0.131(53) | 0.157(34) | 0.014(17) | 0.043(10) | 0.027(13) | 0.097(5) | 0.013(4) |
| | Sal | 0.023(42) | 0.108(47) | 0.158(33) | 0.009(12) | 0.036(9) | 0.025(11) | 0.071(8) | 0.013(4) |
| | CN | 0.200(59) | 0.226(44) | 0.243(59) | 0.107(30) | 0.146(13) | 0.047(9) | 0.163(16) | |
| | Our | | | 0.252(44) | | | | | 0.060(4) |
The results are averages over 20 realizations for each network, with the probe set E removed at random in each realization. The highest value for each network is labeled in boldface. The numbers in brackets denote the standard deviations. For example, 0.154(109) denotes that the Precision value is 0.154 and the standard deviation is 109 × 10⁻⁴.
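For orientation, the conventional Precision metric in link prediction ranks all candidate node pairs by score, keeps the top L, and reports the fraction of those that are in the probe set. The sketch below implements that conventional definition; whether the paper uses exactly this form, and which L it chooses, is an assumption here. The name `precision_at_l` and the toy scores are illustrative.

```python
def precision_at_l(scores, probe_edges, top_l):
    """Fraction of the top_l highest-scored pairs that are probe edges.

    scores maps candidate node pairs (u, v) to a similarity score.
    Ties keep the insertion order of `scores` (sorting is stable).
    """
    if top_l <= 0:
        raise ValueError("top_l must be positive")
    # Undirected edges: compare pairs regardless of node order.
    probe = {frozenset(edge) for edge in probe_edges}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    hits = sum(1 for pair, _ in ranked[:top_l] if frozenset(pair) in probe)
    return hits / top_l


# Toy example (not data from the paper).
toy_scores = {(0, 3): 0.9, (1, 4): 0.7, (2, 5): 0.4, (0, 5): 0.1}
toy_probe = [(3, 0), (2, 5)]
print(precision_at_l(toy_scores, toy_probe, top_l=2))
```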
Figure 3. The changes of Precision when the probe set increases from 10% to 20% in 8 real-world networks (a–h).
Figure 4. Pearson correlation coefficient of different methods for prediction accuracy metrics vs. c under 10% and 20% probe set in 8 networks.
(a) The correlation coefficient of different methods for AUC metric vs. c. (b) The correlation coefficient of different methods for Precision metric vs. c.
The Precision of LP, PA and our method under 10% and 20% probe set in 8 networks.
| Networks | PA (10%) | LP (10%) | Our (10%) | PA (20%) | LP (20%) | Our (20%) |
|---|---|---|---|---|---|---|
| Karate | 0.068(82) | 0.175(140) | | 0.118(70) | 0.177(88) | |
| Dolphins | 0.020(29) | 0.133(70) | | 0.025(31) | 0.199(47) | |
| Polbook | 0.044(32) | 0.172(44) | | 0.088(25) | 0.221(39) | |
| Word | 0.082(43) | 0.083(31) | | 0.150(32) | 0.102(30) | |
| Neural | 0.054(18) | 0.099(18) | | 0.098(9) | 0.145(14) | |
| Circuit | 0.002(4) | 0.005(7) | | 0.003(3) | 0.020(8) | |
| Email | 0.018(7) | 0.136(11) | | 0.029(5) | 0.175(10) | |
| Power | 0.001(1) | 0.042(2) | | 0.002(1) | 0.045(6) | |
The results are averages over 20 realizations for each network, with the probe set removed at random in each realization. The highest value is labeled in boldface. The numbers in brackets denote the standard deviations. For example, 0.068(82) denotes that the Precision value is 0.068 and the standard deviation is 82 × 10⁻⁴.
The Precision of different methods to predict missing links between nodes with no common neighbors under 10% and 20% probe set in 8 networks.
| Networks | PA (10%) | LP (10%) | Our (10%) | PA (20%) | LP (20%) | Our (20%) |
|---|---|---|---|---|---|---|
| Karate | 0.05(224) | 0(0) | | 0.064(116) | 0(0) | |
| Dolphins | 0(0) | 0(0) | | 0.004(19) | 0(0) | |
| Polbook | 0(0) | 0(0) | | 0(0) | 0(0) | |
| Word | 0(0) | 0(0) | | 0.003(9) | 0(0) | |
| Neural | 0(0) | 0(0) | | 0.005(8) | 0(0) | |
| Circuit | 0.002(5) | 0(0) | | 0.004(3) | 0(0) | |
| Email | 0.001(2) | 0(0) | | 0.004(4) | 0(0) | |
| Power | 0(1) | 0(0) | | 0(0) | 0(0) | |
Precision here denotes the proportion of relevant links in the probe set. The results are averages over 20 realizations for each network, with the probe set E removed at random in each realization. The highest value for each network is labeled in boldface. The numbers in brackets denote the standard deviations. For example, 0.064(116) denotes that the Precision value is 0.064 and the standard deviation is 116 × 10⁻⁴. The previously mentioned methods based on common neighbors cannot find any missing links between nodes with no common neighbors, so they are not listed here.
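The setting of this last table can be illustrated with a small sketch: remove a random fraction of edges as the probe set, then keep only those probe edges whose endpoints share no neighbor in the remaining (training) graph. The splitting procedure, the seed handling, and all function names below are assumptions for illustration; the paper's exact sampling procedure is not given here.

```python
import random


def split_probe(edges, fraction, seed=0):
    """Randomly split edges into (training_edges, probe_edges)."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_probe = round(fraction * len(shuffled))
    return shuffled[n_probe:], shuffled[:n_probe]


def build_adjacency(edges):
    """Map each node to the set of its neighbors (undirected graph)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def probe_without_common_neighbors(train_edges, probe_edges):
    """Probe edges whose endpoints share no neighbor in the training graph."""
    adj = build_adjacency(train_edges)
    return [
        (u, v)
        for u, v in probe_edges
        if not (adj.get(u, set()) & adj.get(v, set()))
    ]


# Toy example: in the training path 0 - 1 - 2 - 3, the probe edge (0, 2)
# has the common neighbor 1, while (0, 3) has none.
train = [(0, 1), (1, 2), (2, 3)]
probe = [(0, 2), (0, 3)]
print(probe_without_common_neighbors(train, probe))
```

Any index built purely on the common-neighbor count assigns a score of zero to every pair returned by `probe_without_common_neighbors`, which is consistent with the note above that such methods are not listed in this table.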