Tao Wang, Hongjue Wang, Xiaoxia Wang.
Abstract
Many similarity-based algorithms have been designed for link prediction over the past decade. To improve prediction accuracy, this paper proposes a novel cosine similarity index, CD, based on the distance between nodes and the cosine value between vectors. First, a node coordinate matrix, which differs from the distance matrix, is obtained from node distances, and its row vectors are regarded as the coordinates of the nodes. Then, the cosine value between node coordinates is used as their similarity index. A local community density index, LD, is also proposed. A series of CD-based indices, including CD-LD-k, CD*LD-k, CD-k and CDI, is then presented and applied to ten real networks. Experimental results demonstrate the effectiveness of the CD-based indices. The effects of the network clustering coefficient and assortative coefficient on the prediction accuracy of the indices are analyzed. CD-LD-k and CD*LD-k improve prediction accuracy regardless of whether the assortative coefficient of the network is negative or positive. According to an analysis of the relative precision of each method on each network, the CD-LD-k and CD*LD-k indices have excellent average performance and robustness. The CD and CD-k indices perform better on positively assortative networks than on negatively assortative networks. For negatively assortative networks, we refine the CD index into the CDI index, which combines the advantages of the CD index with the evolutionary mechanism of the BA network model. Experimental results reveal that the CDI index increases the prediction accuracy of CD on negatively assortative networks.
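The core idea in the abstract (treat a row of a distance-derived matrix as a node's coordinates, then score node pairs by the cosine between those rows) can be illustrated with a minimal sketch. The abstract does not give the construction of the coordinate matrix and says only that it differs from the distance matrix, so this sketch uses plain hop-count distance rows as a stand-in. That choice, the function names `bfs_distances`, `coordinate_rows` and `cosine`, and the toy graph are all assumptions for illustration, not the authors' method.

```python
from collections import deque
from math import sqrt


def bfs_distances(adj, source):
    """Hop-count distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def coordinate_rows(adj):
    """ASSUMPTION: each node's row of hop distances serves as its coordinates.

    The paper's coordinate matrix is stated to differ from the distance
    matrix; this stand-in only illustrates the cosine step.
    """
    nodes = sorted(adj)
    rows = {}
    for u in nodes:
        d = bfs_distances(adj, u)
        # ASSUMPTION: unreachable nodes get the placeholder distance len(nodes).
        rows[u] = [d.get(v, len(nodes)) for v in nodes]
    return rows


def cosine(x, y):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = sqrt(sum(a * a for a in x))
    norm_y = sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


# Toy path graph 0 - 1 - 2 - 3 (hypothetical data).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
rows = coordinate_rows(adj)

# Score only node pairs that are not already linked.
scores = {
    (u, v): cosine(rows[u], rows[v])
    for u in adj for v in adj
    if u < v and v not in adj[u]
}
```

On this toy graph the pair (0, 2), two hops apart, scores higher than the pair (0, 3), three hops apart, which is the qualitative behavior a distance-based similarity is expected to show.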
Year: 2016 PMID: 26752405 PMCID: PMC4713445 DOI: 10.1371/journal.pone.0146727
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Basic topological features of example networks.
Here, e is the efficiency of a network, defined as in [51]; c is the clustering coefficient [4]; r is the assortative coefficient [33]; h is the degree heterogeneity, defined as h = 〈k²〉/〈k〉², where 〈k〉 is the average degree of the network [16]; d is the diameter of the network; and lcp is the correlation between the LCP and CN indices presented in [3]. For more definitions and details of these topological measures, please refer to [51–53].
| n | m | e | c | r | h | d | lcp | |
|---|---|---|---|---|---|---|---|---|
| 332 | 2126 | 0.406 | 0.749 | -0.208 | 3.46 | 6 | 0.9799 | |
| 1224 | 19090 | 0.397 | 0.361 | -0.079 | 3.13 | 8 | 0.9286 | |
| 5022 | 6258 | 0.167 | 0.033 | -0.138 | 5.05 | 15 | 0.8067 | |
| 297 | 2148 | 0.308 | 0.2924 | -0.1632 | 1.8008 | 5 | 0.9056 | |
| 112 | 425 | 0.442 | 0.1728 | -0.1293 | 1.8149 | 5 | 0.8528 | |
| 1461 | 2742 | 0.016 | 0.878 | 0.462 | 1.85 | 17 | 0.9474 | |
| 4941 | 6594 | 0.063 | 0.107 | 0.003 | 1.45 | 46 | 0.8456 | |
| 115 | 613 | 0.4504 | 0.4032 | 0.1624 | 1.01 | 4 | 0.8931 | |
| 1133 | 5451 | 0.2999 | 0.254 | 0.0782 | 1.94 | 8 | 0.8538 | |
| 198 | 2742 | 0.5132 | 0.633 | 0.0202 | 1.3951 | 6 | 0.9484 |
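As an illustration of one column in the table above, the degree heterogeneity h can be computed directly from an edge list. The sketch assumes the commonly used definition h = 〈k²〉/〈k〉², since the formula in the caption appears to have been lost in extraction. The function name `degree_heterogeneity` and the two toy graphs are hypothetical.

```python
def degree_heterogeneity(edges):
    """Return <k^2> / <k>^2 for an undirected edge list.

    ASSUMPTION: this is the common definition of degree heterogeneity.
    Nodes that appear in no edge are not counted.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    n = len(degree)
    mean_k = sum(degree.values()) / n
    mean_k2 = sum(k * k for k in degree.values()) / n
    return mean_k2 / mean_k ** 2


# Hypothetical toy graphs: a star (one hub) and a ring (all degrees equal).
star = [(0, 1), (0, 2), (0, 3)]
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]

h_star = degree_heterogeneity(star)
h_ring = degree_heterogeneity(ring)
```

A regular graph such as the ring gives h = 1, and h grows as the degree distribution becomes more uneven, which is consistent with the larger h values reported for the hub-dominated networks in the table.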
Precision values of link prediction indices on example networks.
The order of the networks is organized according to their increasing assortative coefficient (from negative to positive), and values in brackets under the network names are the coefficient of each network.
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | |
|---|---|---|---|---|---|---|---|---|---|---|
| USAir | Neural | INT | Word | PB | Grid | Jazz | FT | NS | ||
| (-0.208) | (-0.1632) | (-0.138) | (-0.1293) | (-0.079) | (0.003) | (0.0202) | (0.0782) | (0.1624) | (0.462) | |
| 0.3862 | 0.0943 | 0.0566 | 0.0744 | 0.1764 | 0.0568 | 0.5062 | 0.1421 | 0.2894 | 0.5098 | |
| 0.0554 | 0.0233 | 0 | 0 | 0.0099 | 0.0152 | 0.5404 | 0.044 | 0.3484 | 0.5913 | |
| 0.3313 | 0.0579 | 0.0192 | 0.1023 | 0.0671 | 0.0008 | 0.1305 | 0.0173 | 0.0003 | 0.04 | |
| 0.0742 | 0.0288 | 0 | 0 | 0.0196 | 0.0147 | 0.5287 | 0.0648 | 0.3452 | 0.5905 | |
| 0.0103 | 0 | 0 | 0 | 0.0005 | 0.0121 | 0.1018 | 0.0018 | 0.2429 | ||
| 0.0994 | 0.0195 | 0.0614 | 0.1562 | 0.0326 | 0.5397 | 0.1447 | 0.2826 | |||
| 0.3822 | 0.0993 | 0.0577 | 0.0812 | 0.0614 | 0.4727 | 0.1368 | 0.2631 | 0.4982 | ||
| 0.3728 | 0.0958 | 0.0511 | 0.1821 | 0.448 | 0.133 | 0.271 | 0.4982 | |||
| 0.1191 | 0.0397 | 0.067 | 0.095 | 0.0094 | 0.3569 | 0.0667 | 0.2531 | 0.3927 | ||
| 0.3697 | 0.0862 | 0.1019 | 0.1462 | 0.0579 | 0.2753 | 0.0852 | 0.1865 | 0.3655 | ||
| 0.3771 | 0.0921 | 0.062 | 0.0549 | 0.1729 | 0.035 | 0.5187 | 0.1402 | 0.3366 | 0.5062 | |
| 0.3778 | 0.098 | 0.0208 | 0.0695 | 0.173 | 0.005 | 0.5168 | 0.1293 | 0.3323 | 0.3193 | |
| 0.3792 | 0.1001 | 0.0617 | 0.047 | 0.1731 | 0.0314 | 0.5338 | 0.1434 | 0.3576 | 0.584 | |
| 0.4028 | 0.115 | 0.0617 | 0.0491 | 0.1796 | 0.0314 | 0.3581 | 0.6269 | |||
| 0.3587 | 0.0701 | 0.0607 | 0.0249 | 0.1616 | 0.0314 | 0.5568 | 0.1523 | 0.3563 | 0.5585 | |
| 0.3841 | 0.0977 | 0.0515 | 0.0298 | 0.1765 | 0.0562 | 0.5128 | 0.1478 | 0.3111 | 0.5331 | |
| 0.4178 | 0.0837 | 0.0655 | 0.106 | 0.1758 | 0.0621 | 0.5164 | 0.1381 | 0.3387 | 0.54 | |
| 0.4178 | 0.0837 | 0.0687 | 0.1023 | 0.1764 | 0.0621 | 0.5236 | 0.1344 | 0.3548 | 0.5436 | |
| 0.4178 | 0.0977 | 0.0703 | 0.098 | 0.1758 | 0.0621 | 0.5236 | 0.141 | 0.3468 | 0.5418 | |
| 0.4178 | 0.0977 | 0.0687 | 0.0977 | 0.1755 | 0.0621 | 0.5236 | 0.1447 | 0.3468 | 0.5418 | |
| 0.4178 | 0.0977 | 0.0703 | 0.0977 | 0.1755 | 0.0621 | 0.5236 | 0.1458 | 0.3468 | 0.54 | |
| 0.3839 | 0.0915 | 0.0475 | 0.0395 | 0.1768 | 0.0562 | 0.5186 | 0.1465 | 0.3192 | 0.5076 | |
| 0.4131 | 0.0814 | 0.0495 | 0.0447 | 0.1755 | 0.0621 | 0.5236 | 0.1341 | 0.3306 | 0.54 | |
| 0.4178 | 0.0744 | 0.0607 | 0.0726 | 0.1767 | 0.0621 | 0.5345 | 0.1337 | 0.3629 | 0.54 | |
| 0.4178 | 0.0837 | 0.0527 | 0.0726 | 0.1764 | 0.0621 | 0.5345 | 0.141 | 0.3629 | 0.5327 | |
| 0.4178 | 0.0837 | 0.0591 | 0.0726 | 0.1764 | 0.0621 | 0.5345 | 0.1443 | 0.3629 | 0.5255 | |
| 0.4178 | 0.0837 | 0.0607 | 0.0781 | 0.1767 | 0.0621 | 0.5345 | 0.1454 | 0.3629 | 0.5236 | |
| 0.0085 | 0.0051 | 0.0002 | 0.0058 | 0.001 | 0.0106 | 0.2589 | 0.012 | 0.3477 | 0.12 | |
| 0.0188 | 0.0093 | 0 | 0.0078 | 0.003 | 0.0152 | 0.36 | 0.0308 | 0.2419 | 0.3945 | |
| 0.0188 | 0.0047 | 0 | 0.0078 | 0.0015 | 0.0136 | 0.36 | 0.0484 | 0.371 | 0.2709 | |
| 0.0141 | 0.0047 | 0 | 0.0078 | 0.0012 | 0.0152 | 0.3055 | 0.0319 | 0.379 | 0.2073 | |
| 0.0141 | 0.0047 | 0 | 0.0078 | 0.0018 | 0.0121 | 0.2945 | 0.0117 | 0.379 | 0.1855 | |
| 0.0141 | 0.0047 | 0 | 0.0078 | 0.0018 | 0.0106 | 0.2945 | 0.0099 | 0.379 | 0.1527 | |
| 0.3352 | 0.0558 | 0.0272 | 0.0744 | 0.0726 | 0.0083 | 0.1404 | 0.0257 | 0.0032 | 0.0938 | |
| 0.0047 | 0.0047 | 0 | 0.0065 | 0.0025 | 0 | 0.0161 | 0.0018 | 0.0102 | 0 |
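This section does not restate how precision is computed. A minimal sketch of the definition commonly used in link prediction is given below: rank the candidate links by score, take the top L, and report the fraction that appear in the held-out probe set. The function name `precision_at_L`, the scores and the probe set are hypothetical, and the paper's exact evaluation protocol may differ.

```python
def precision_at_L(scores, probe, L):
    """Fraction of the L highest-scored candidate links found in `probe`.

    ASSUMPTION: the standard top-L precision used in link prediction.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)[:L]
    hits = sum(1 for pair in ranked if pair in probe)
    return hits / L


# Hypothetical similarity scores for four candidate (non-observed) links.
scores = {("a", "b"): 0.9, ("a", "c"): 0.7, ("b", "d"): 0.4, ("c", "d"): 0.1}
# Hypothetical probe set: links removed from the network for testing.
probe = {("a", "b"), ("c", "d")}

p = precision_at_L(scores, probe, L=2)
```

Here the top two candidates are ("a", "b") and ("a", "c"), of which one is in the probe set, so the sketch yields a precision of 0.5.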
Affiliations of the proposed indices.
"CD-based indices" denotes the four proposed indices CD-LD-k, CD*LD-k, CD-k and CDI, where k is the threshold value in Eq 1.
| CD-based | |||
|---|---|---|---|
| CD-LD-k | CD*LD-k | CD-k | CDI |
Relative precision of each index.
The networks are ordered by increasing assortative coefficient (from negative to positive), and the values in brackets under the network names are the coefficients of each network. The mean and minimum relative precision values of each index are shown in the last two columns. The mean value is used as an indicator of average performance, and the minimum value is used as a measure of robustness.
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | Mean | Minimum | |
|---|---|---|---|---|---|---|---|---|---|
| USAir | Neural | Word | PB | Jazz | FT | ||||
| (-0.208) | (-0.1632) | (-0.1293) | (-0.079) | (0.0202) | (0.0782) | (0.1624) | |||
| 82.26059 | 20.2745 | 11.44615 | 70.56 | 31.44099 | 77.58659 | 28.37255 | 45.99163 | 11.44615 | |
| 11.8002 | 5.0095 | 0 | 3.96 | 33.56522 | 24.024 | 34.15686 | 16.07368 | 0 | |
| 70.5669 | 12.4485 | 15.73846 | 26.84 | 8.10559 | 9.445799 | 0.029412 | 20.45352 | 0.029412 | |
| 15.8046 | 6.192 | 0 | 7.84 | 32.83851 | 35.3808 | 33.84314 | 18.84272 | 0 | |
| 2.1939 | 0 | 0 | 0.2 | 6.322981 | 0.9828 | 6.852055 | 0 | ||
| 21.371 | 9.446154 | 62.48 | 33.52174 | 79.00619 | 27.70588 | 47.69951 | 9.446154 | ||
| 81.40859 | 21.3495 | 12.49231 | 29.36025 | 74.69279 | 25.79412 | 45.45965 | 12.49231 | ||
| 79.40639 | 20.597 | 72.84 | 27.82609 | 72.61799 | 26.56863 | 45.18823 | |||
| 25.3683 | 10.30769 | 38 | 22.1677 | 36.4182 | 24.81373 | 26.78544 | 10.30769 | ||
| 78.65957 | 18.34043 | 15.67692 | 58.48 | 17.09938 | 47.33333 | 18.28431 | 36.2677 | 15.67692 | |
| 80.32229 | 19.8015 | 8.446154 | 69.16 | 32.21739 | 76.54919 | 33 | 45.64236 | 8.446154 | |
| 80.47139 | 21.07 | 10.69231 | 69.2 | 32.09938 | 70.59779 | 32.57843 | 45.24419 | 10.69231 | |
| 80.76959 | 21.5215 | 7.230769 | 69.24 | 33.15528 | 78.29639 | 35.05882 | 46.46748 | 7.230769 | |
| 85.79639 | 24.725 | 7.553846 | 71.84 | 35.10784 | 7.553846 | ||||
| 76.40309 | 15.0715 | 3.830769 | 64.64 | 34.58385 | 83.15579 | 34.93137 | 44.65948 | 3.830769 | |
| 81.81329 | 21.0055 | 4.584615 | 70.6 | 31.85093 | 80.69879 | 30.5 | 45.86473 | 4.584615 | |
| 88.99139 | 17.9955 | 16.30769 | 70.32 | 32.07453 | 75.40259 | 33.20588 | 47.7568 | 16.30769 | |
| 88.99139 | 17.9955 | 15.73846 | 70.56 | 32.52174 | 73.38239 | 34.78431 | 47.71054 | 15.73846 | |
| 88.99139 | 21.0055 | 15.07692 | 70.32 | 32.52174 | 76.98599 | 34 | 48.41451 | 15.07692 | |
| 88.99139 | 21.0055 | 15.03077 | 70.2 | 32.52174 | 79.00619 | 34 | 48.67937 | 15.03077 | |
| 88.99139 | 21.0055 | 15.03077 | 70.2 | 32.52174 | 79.60679 | 34 | 48.76517 | 15.03077 | |
| 81.77069 | 19.6725 | 6.076923 | 70.72 | 32.21118 | 79.98899 | 31.29412 | 45.96206 | 6.076923 | |
| 87.99029 | 17.501 | 6.876923 | 70.2 | 32.52174 | 73.21859 | 32.41176 | 45.81719 | 6.876923 | |
| 88.99139 | 15.996 | 11.16923 | 70.68 | 33.19876 | 73.00019 | 35.57843 | 46.94486 | 11.16923 | |
| 88.99139 | 17.9955 | 11.16923 | 70.56 | 33.19876 | 76.98599 | 35.57843 | 47.78276 | 11.16923 | |
| 88.99139 | 17.9955 | 11.16923 | 70.56 | 33.19876 | 78.78779 | 35.57843 | 48.04016 | 11.16923 | |
| 88.99139 | 17.9955 | 12.01538 | 70.68 | 33.19876 | 79.38839 | 35.57843 | 48.26398 | 12.01538 | |
| 1.8105 | 1.0965 | 0.892308 | 0.4 | 16.08075 | 6.551999 | 34.08824 | 8.702898 | 0.4 | |
| 4.0044 | 1.9995 | 1.2 | 1.2 | 22.36025 | 16.8168 | 23.71569 | 10.18523 | 1.2 | |
| 4.0044 | 1.0105 | 1.2 | 0.6 | 22.36025 | 26.4264 | 36.37255 | 13.13916 | 0.6 | |
| 3.0033 | 1.0105 | 1.2 | 0.48 | 18.97516 | 17.4174 | 37.15686 | 11.32046 | 0.48 | |
| 3.0033 | 1.0105 | 1.2 | 0.72 | 18.29193 | 6.388199 | 37.15686 | 9.681541 | 0.72 | |
| 3.0033 | 1.0105 | 1.2 | 0.72 | 18.29193 | 5.4054 | 37.15686 | 9.541141 | 0.72 | |
| 71.3976 | 11.997 | 11.44615 | 29.04 | 8.720497 | 14.0322 | 0.313725 | 20.99245 | 0.313725 |
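The last two columns of the table above can be reproduced from the seven per-network values in a row. The sketch below uses the first data row of the table; the function name `summarize` is hypothetical, and how relative precision itself is derived from raw precision is not stated in this section, so it is not modeled here.

```python
from statistics import mean

# First data row of the relative-precision table (seven networks).
row = [82.26059, 20.2745, 11.44615, 70.56, 31.44099, 77.58659, 28.37255]


def summarize(values):
    """Return (mean, minimum): average performance and robustness."""
    return mean(values), min(values)


avg, worst = summarize(row)
```

For this row the computed mean agrees with the tabulated 45.99163 up to rounding of the inputs, and the minimum is 11.44615.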
Index complexity.
All index complexities in this table are worst-case estimates over networks. CP denotes the complexity of an index.
| Index | CN | Salton | PA | Sorensen | LHN | RA | LP | LRW | LB | LCP-based | CD-LD-k | CD*LD-k | CD-k | CDI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|