Fei Tan, Yongxiang Xia, Boyao Zhu.
Abstract
Topological properties of networks have recently been widely applied to the link-prediction problem. Common Neighbors, for example, is a natural yet efficient framework. Many variants of Common Neighbors have thus been proposed to further boost the discriminative resolution of candidate links. In this paper, we reexamine the role of network topology in predicting missing links from the perspective of information theory, and present a practical approach based on the mutual information of network structures. It not only improves the prediction accuracy substantially, but also has reasonable computational complexity.
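To make the neighborhood-based baselines concrete, the following is a minimal sketch of the Common Neighbors (CN) score and of the Resource Allocation (RA) variant that appears in the comparison tables. The definitions used are the standard ones from the link-prediction literature and are an assumption here, since this record does not spell them out. The toy graph and the helper names (`neighbors`, `cn_score`, `ra_score`) are hypothetical, and the mutual-information index itself is not reproduced.

```python
from itertools import combinations

def neighbors(edges):
    """Build an adjacency map {node: set of neighbors} from undirected links."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def cn_score(adj, x, y):
    """Common Neighbors: number of nodes adjacent to both x and y."""
    return len(adj[x] & adj[y])

def ra_score(adj, x, y):
    """Resource Allocation: each common neighbor z contributes 1 / degree(z)."""
    return sum(1.0 / len(adj[z]) for z in adj[x] & adj[y])

# Toy graph (hypothetical, not one of the ten networks in the tables).
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5)]
adj = neighbors(edges)

# Every currently unconnected node pair is a candidate link.
candidates = [(x, y) for x, y in combinations(sorted(adj), 2) if y not in adj[x]]
cn_scores = {pair: cn_score(adj, *pair) for pair in candidates}
ra_scores = {pair: ra_score(adj, *pair) for pair in candidates}
```

On this toy graph the pair (1, 4) shares the two neighbors 2 and 3, so it receives the highest CN score among the candidates. RA additionally discounts each common neighbor by its degree.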
Year: 2014 PMID: 25207920 PMCID: PMC4160214 DOI: 10.1371/journal.pone.0107056
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. An illustration of the calculation of the MI model.
Comparison of the prediction accuracy measured by AUC on ten real-world networks.
| Network | CN | RA | LNB-CN | LNB-RA | CAR | CRA | MI |
| | | | | | | | 0.8917 |
| PB | | | | | | | 0.9322 |
| Yeast | | | | | | | 0.9368 |
| SciMet | | | | | | | 0.871 |
| Kohonen | | | | | | | 0.9111 |
| EPA | | | | | | | 0.9249 |
| Grid | 0.6257 | 0.6255 | 0.6258 | 0.6256 | | | 0.6076 |
| INT | | | | | | | 0.9559 |
| Wikivote | | | | | | | 0.9663 |
| Lederberg | | | | | | | 0.9449 |
Each value is averaged over 100 independent runs with random divisions into a training set and a probe set. Bold font indicates that MI outperforms the corresponding prediction index.
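As an illustration of how an AUC figure of this kind is commonly obtained, the sketch below divides the links into a training set and a probe set, scores node pairs on the training set only, and compares every probe link against every nonexistent link. The formula (wins plus half the ties, divided by the number of comparisons) is the usual one in the link-prediction literature and is an assumption here, as is the probe fraction, which this record does not state. CN stands in as the scoring index; `split_links`, `cn_scorer` and the toy graph are hypothetical.

```python
import random
from itertools import combinations

def split_links(edges, probe_fraction, seed):
    """Randomly divide links into (training, probe); the fraction is an assumption."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_probe = max(1, round(len(shuffled) * probe_fraction))
    return shuffled[n_probe:], shuffled[:n_probe]

def cn_scorer(training):
    """Return a function scoring a node pair by Common Neighbors on the training links."""
    adj = {}
    for u, v in training:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return lambda pair: len(adj.get(pair[0], set()) & adj.get(pair[1], set()))

def auc(score, probe, nonexistent):
    """Exhaustive AUC: (wins + 0.5 * ties) over all probe/nonexistent comparisons."""
    higher = ties = 0
    for p in probe:
        for q in nonexistent:
            if score(p) > score(q):
                higher += 1
            elif score(p) == score(q):
                ties += 1
    return (higher + 0.5 * ties) / (len(probe) * len(nonexistent))

# Toy network (hypothetical).
all_links = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5), (1, 4)]
nodes = sorted({n for link in all_links for n in link})
existing = {frozenset(link) for link in all_links}
nonexistent = [p for p in combinations(nodes, 2) if frozenset(p) not in existing]

# A fixed division keeps the illustration reproducible.
probe = [(1, 4), (4, 5)]
training = [link for link in all_links if link not in probe]
toy_auc = auc(cn_scorer(training), probe, nonexistent)  # 0.75 on this toy graph
```

To mimic the averaging described in the table note, one would call `split_links` with different seeds, recompute the AUC for each division, and take the mean. On large networks the exhaustive comparison is usually replaced by random sampling of probe/nonexistent pairs.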
Comparison of the prediction accuracy measured by precision (top-100) on ten real-world networks.
| Network | CN | RA | LNB-CN | LNB-RA | CAR | CRA | MI |
| | | | | | | 0.3442 | 0.3293 |
| PB | | | | | 0.4795 | 0.4876 | 0.4765 |
| Yeast | | | | | | | 0.8264 |
| SciMet | | | | | 0.1707 | 0.1791 | 0.166 |
| Kohonen | | | | | | 0.2345 | 0.224 |
| EPA | | | | | | | 0.0578 |
| Grid | | | | | | 0.1846 | 0.1749 |
| INT | | | | | | | 0.217 |
| Wikivote | | | | | 0.2639 | 0.2849 | 0.1933 |
| Lederberg | | | | | | 0.3422 | 0.3312 |
Each value is averaged over 100 independent runs with random divisions into a training set and a probe set. Bold font indicates that MI outperforms the corresponding prediction index.
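Precision over the top-ranked candidates can be sketched in the same spirit: rank all non-observed pairs (probe links plus nonexistent links) by score and report the fraction of the top entries that belong to the probe set. This is the usual definition and is assumed here rather than taken from this record. The function name `precision_at`, the random tie-breaking and the toy scores are hypothetical.

```python
import random

def precision_at(score, probe, nonexistent, top, seed=0):
    """Fraction of the `top` highest-scored candidate pairs that are probe links."""
    candidates = list(probe) + list(nonexistent)
    # Shuffle first so that ties are broken at random rather than by list order.
    random.Random(seed).shuffle(candidates)
    ranked = sorted(candidates, key=score, reverse=True)
    probe_set = {frozenset(pair) for pair in probe}
    hits = sum(1 for pair in ranked[:top] if frozenset(pair) in probe_set)
    return hits / top

# Hypothetical scores for five candidate pairs; two of them are probe links.
toy_scores = {
    ("a", "b"): 0.9,
    ("a", "c"): 0.7,
    ("b", "d"): 0.6,
    ("c", "d"): 0.4,
    ("a", "d"): 0.1,
}
probe = [("a", "b"), ("b", "d")]
nonexistent = [("a", "c"), ("c", "d"), ("a", "d")]

p_top2 = precision_at(toy_scores.get, probe, nonexistent, top=2)  # 1 hit out of 2
p_top3 = precision_at(toy_scores.get, probe, nonexistent, top=3)  # 2 hits out of 3
```

With `top=100`, as in the table caption, the same function would report the share of probe links among the 100 highest-scored candidates.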
Comparison of the computational efficiency of seven methods on ten real-world networks.
| Network | CN | RA | LNB-CN | LNB-RA | CAR | CRA | MI |
| | 0.115 | 0.201 | 0.161 | 0.161 | 1.65 | 1.64 | 0.472 |
| PB | 0.263 | 0.351 | 0.454 | 0.455 | 2.56 | 2.44 | 0.746 |
| Yeast | 0.414 | 0.802 | 0.499 | 0.499 | 15.3 | 15.3 | 1.97 |
| SciMet | 0.556 | 1.04 | 0.689 | 0.69 | 19.4 | 19.4 | 2.52 |
| Kohonen | 1.21 | 2.13 | 1.6 | 1.6 | 73.9 | 73.7 | 4.51 |
| EPA | 1.25 | 2.45 | 1.4 | 1.4 | 118 | 118 | 5.1 |
| Grid | 1.39 | 3 | 1.49 | 1.49 | 184 | 184 | 7.69 |
| INT | 1.75 | 3.56 | 1.9 | 1.91 | 240 | 239 | 7.69 |
| Wikivote | 5.42 | 8.61 | 9.85 | 9.85 | 573 | 569 | 19.6 |
| Lederberg | 5.38 | 9.8 | 6.9 | 6.9 | 952 | 952 | 22.3 |
Each value is the average time in seconds for 10 independent runs.
The basic structural parameters of the giant components of example networks.
| Network | N | M | e | C | r | H | $\langle k \rangle$ | $\langle d \rangle$ |
| | 1133 | 5451 | 0.2999 | 0.2540 | 0.0782 | 1.9421 | 9.6222 | 3.6028 |
| PB | 1222 | 16714 | 0.3982 | 0.3600 | −0.2213 | 2.9707 | 27.3552 | 2.7353 |
| Yeast | 2375 | 11693 | 0.2181 | 0.3883 | 0.4539 | 3.4756 | 9.8467 | 5.0938 |
| SciMet | 2678 | 10368 | 0.2569 | 0.2026 | −0.0352 | 2.4265 | 7.7431 | 4.1781 |
| Kohonen | 3704 | 12673 | 0.2957 | 0.3044 | −0.1211 | 9.3170 | 6.8429 | 3.6693 |
| EPA | 4253 | 8897 | 0.2356 | 0.1360 | −0.3041 | 6.7668 | 4.1839 | 4.4993 |
| Grid | 4941 | 6594 | 0.0629 | 0.1065 | 0.0035 | 1.4504 | 2.6691 | 18.9853 |
| INT | 5022 | 6258 | 0.1667 | 0.0329 | −0.1384 | 5.5031 | 2.4922 | 6.4475 |
| Wikivote | 7066 | 100736 | 0.3268 | 0.2090 | −0.0833 | 5.0992 | 28.5129 | 3.2471 |
| Lederberg | 8212 | 41430 | 0.2560 | 0.3634 | −0.1001 | 6.1339 | 10.0901 | 4.4071 |
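$N$ and $M$ are the network size and the number of links, respectively. $e$ is the network efficiency [36], defined as $e = \frac{2}{N(N-1)} \sum_{x<y} d_{xy}^{-1}$, where $d_{xy}$ is the shortest distance between nodes $x$ and $y$. $C$ and $r$ are the clustering coefficient [12] and the assortativity coefficient [33], respectively. $\langle k \rangle$ and $\langle d \rangle$ are the average degree and the average shortest distance. $H$ denotes the degree heterogeneity, defined as $H = \langle k^2 \rangle / \langle k \rangle^2$.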
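Several of the tabulated quantities follow directly from the link list. The sketch below computes the size, the number of links, the average degree, the degree heterogeneity, the efficiency and the average shortest distance for a connected, undirected graph. The formulas used are the standard definitions (average degree 2M/N, heterogeneity as mean squared degree over squared mean degree, efficiency as the mean inverse shortest distance) and are assumed to match those of the paper. The function name and the toy path graph are hypothetical.

```python
from collections import deque

def structural_parameters(edges):
    """Return N, M, <k>, H, e and <d> for a connected, undirected, simple graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    n = len(adj)
    degrees = [len(nbrs) for nbrs in adj.values()]
    m = sum(degrees) // 2
    k_mean = sum(degrees) / n                  # equals 2M / N
    k2_mean = sum(k * k for k in degrees) / n
    heterogeneity = k2_mean / k_mean ** 2      # H = <k^2> / <k>^2

    inv_sum = 0.0
    dist_sum = 0
    for source in adj:
        # Breadth-first search gives shortest distances in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        for target, d in dist.items():
            if target != source:
                inv_sum += 1.0 / d
                dist_sum += d
    # Sums run over ordered pairs, so dividing by N(N-1) is equivalent to
    # 2 / (N(N-1)) times the sum over unordered pairs.
    ordered_pairs = n * (n - 1)
    return {"N": n, "M": m, "k_mean": k_mean, "H": heterogeneity,
            "e": inv_sum / ordered_pairs, "d_mean": dist_sum / ordered_pairs}

# Toy path graph 1 - 2 - 3 (hypothetical).
params = structural_parameters([(1, 2), (2, 3)])

# Consistency check against the first table row: N = 1133, M = 5451.
first_row_k_mean = 2 * 5451 / 1133  # about 9.6222, matching the tabulated average degree
```

The last line shows that the tabulated average degree can be recovered from $N$ and $M$ alone, which is a quick way to confirm which column holds which quantity.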
Figure 2. Cumulative distribution function of local assortativity for networks PB, Yeast and EPA, respectively, giving the percentage of nodes whose local assortativity value is not larger than a given value.
$r$ is the assortativity coefficient of the network, which is presented in Table 4.