Peng Zhang, Futian Wang, Xiang Wang, An Zeng, Jinghua Xiao.
Abstract
Link prediction is a fundamental problem with applications in many fields, ranging from biology to computer science. In the literature, most effort has been devoted to estimating the likelihood that a link exists between two nodes, based on the observed links and the nodes' attributes in a network. In this paper, we apply several representative link prediction methods to reconstruct the network, that is, to add the missing links with a high likelihood of existence back to the network. We find that all these existing methods fail to identify the links connecting different communities, resulting in a poor reproduction of the topological and dynamical properties of the true network. To solve this problem, we propose a community-based link prediction method. We find that our method has high prediction accuracy and is very effective in reconstructing the inter-community links.
Year: 2015 PMID: 26620158 PMCID: PMC4664866 DOI: 10.1038/srep17287
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Illustration of the community-based link prediction method.
The network on the left consists of the links in the training set. Nodes within the same community are marked with the same color. Solid links represent the observed links and dashed links represent the predicted links. When β = 0, the inter-community missing links are ranked higher than the intra-community missing links in the prediction list, so mainly inter-community links are added to the network by the link prediction method. When β = 1, the intra-community missing links are ranked higher than the inter-community missing links, and mainly intra-community links are added to the network. When β = 0.05, the result is mixed: both inter- and intra-community missing links are added to the network. The similarity measure used in this toy network is CN.
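As a rough illustration of the ranking behaviour described in this caption, the Python sketch below scores every unlinked node pair with CN (taken here to be the number of common neighbours) and then reweights the score by community membership. The weighting rule, which multiplies the score by β inside a community and by 1 − β across communities, is an assumption chosen only because it reproduces the limiting cases β = 0 and β = 1 stated above; it is not taken from the paper. The toy graph, the community labels, and the function names are likewise hypothetical.

```python
from itertools import combinations


def common_neighbors(adj, x, y):
    """CN similarity: number of neighbours shared by x and y."""
    return len(adj[x] & adj[y])


def community_weighted_ranking(adj, community, beta):
    """Rank unlinked pairs by an ASSUMED community-weighted CN score.

    Assumption (not from the paper): intra-community pairs are weighted
    by beta and inter-community pairs by 1 - beta.
    """
    scored = []
    for x, y in combinations(sorted(adj), 2):
        if y in adj[x]:
            continue  # only unlinked pairs are prediction candidates
        weight = beta if community[x] == community[y] else 1.0 - beta
        scored.append((weight * common_neighbors(adj, x, y), (x, y)))
    # Highest score first; ties are broken by node labels for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored


# Hypothetical toy network with two communities: {a, b, c, d} and {e, f, g}.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("c", "e"),
         ("d", "e"), ("e", "f"), ("e", "g"), ("f", "g")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
community = {"a": 0, "b": 0, "c": 0, "d": 0, "e": 1, "f": 1, "g": 1}

for beta in (0.0, 0.05, 1.0):
    ranking = community_weighted_ranking(adj, community, beta)
    # Show only the pairs that keep a positive score at this beta.
    print(beta, [pair for score, pair in ranking if score > 0])
```

On this toy graph, β = 0 leaves only inter-community pairs with a positive score, β = 1 leaves only the intra-community pairs (a, d) and (b, d), and β = 0.05 keeps both kinds, with the inter-community pairs ranked first.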
Figure 2. The influence of β on AUC and 〈B〉 in the GN-benchmark networks.
(a–d) are the results of CBCN and CBRA, respectively. The solid lines are the results of the community-based link prediction methods (CBCN and CBRA) and the dashed lines are the results of the classic link prediction methods (CN and RA). The results are averaged over 100 independent realizations.
Basic structural properties (network size N, number of edges E, average degree 〈k〉) of the real networks, together with β* for CBCN and CBRA and the AUC of the four methods when applied to these networks (the AUC of CBCN and CBRA is obtained at β = β*).
| Network | N | E | 〈k〉 | β* (CBCN) | β* (CBRA) | AUC | AUC | AUC | AUC |
|---|---|---|---|---|---|---|---|---|---|
| ZK | 34 | 78 | 4.59 | 0.72 | 0.82 | 0.724 | 0.695 | 0.752 | 0.734 |
| NS | 379 | 914 | 4.82 | 0.35 | 0.74 | 0.982 | 0.978 | 0.984 | 0.981 |
| | 1133 | 5451 | 9.62 | 0.26 | 0.63 | 0.875 | 0.855 | 0.873 | 0.856 |
| C.elegans | 297 | 2148 | 14.5 | 0.80 | 0.93 | 0.852 | 0.850 | 0.847 | 0.870 |
The results are averaged over 100 independent realizations.
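The text here does not say how AUC is computed. The sketch below assumes the definition commonly used in link prediction: the probability that a held-out (probe) link receives a higher score than a nonexistent link, with ties counted as one half. It compares all pairs exactly instead of sampling them, and the scores, link lists, and function name are hypothetical.

```python
def link_prediction_auc(scores, probe_links, nonexistent_links):
    """Exact AUC over every (probe link, nonexistent link) comparison.

    Assumed definition: AUC = (wins + 0.5 * ties) / comparisons, where a
    win means the probe link scores strictly higher.
    """
    wins = ties = 0
    for p in probe_links:
        for q in nonexistent_links:
            if scores[p] > scores[q]:
                wins += 1
            elif scores[p] == scores[q]:
                ties += 1
    total = len(probe_links) * len(nonexistent_links)
    return (wins + 0.5 * ties) / total


# Hypothetical similarity scores for four unlinked pairs.
scores = {("a", "d"): 2, ("b", "e"): 1, ("a", "f"): 1, ("b", "g"): 0}
probe_links = [("a", "d"), ("b", "e")]        # held-out true links
nonexistent_links = [("a", "f"), ("b", "g")]  # pairs with no link

auc = link_prediction_auc(scores, probe_links, nonexistent_links)
print(auc)  # 3 wins and 1 tie out of 4 comparisons -> 0.875
```

A value of 0.5 corresponds to random ranking under this definition, so the AUC values in the table are best read relative to that baseline.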
Figure 3. The influence of β on AUC and 〈B〉 in four real networks.
(a–d) are the results of CBCN and CBRA, respectively. The solid lines are the results of the community-based link prediction methods (CBCN and CBRA) and the dashed lines are the results of the classic link prediction methods (CN and RA). The results are averaged over independent realizations.
Figure 4. The influence of κ on β* and AUC* in the GN-benchmark networks.
(a–d) are the results of CBCN and CBRA, respectively. The solid lines are the results of the community-based link prediction methods (CBCN and CBRA) and the dashed lines are the results of the classic link prediction methods (CN and RA). The results are averaged over 100 independent realizations.
The description of the parameters β *, and Constrained .
| Parameter | Data division | Description |
|---|---|---|
| | 10% | determined when |
| | 10% | determined when |
| Constrained | 10% | (1) If |
Here, AUCo means the AUC value of the original link prediction methods such as CN or RA.
The properties of the reconstructed networks when different link prediction methods are applied.
| Net | properties | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| ZK | 2.41 | 2.28 | 2.45 | 2.58 | 2.25 | 2.46 | 2.49 | |||
| 0.571 | 0.595 | 0.550 | 0.612 | 0.611 | 0.623 | 0.668 | ||||
| −0.476 | −0.369 | −0.388 | − | −0.193 | −0.389 | −0.430 | −0.204 | |||
| 462 | 502 | 466 | 469 | 492 | 466 | 469 | ||||
| 38.7 | 57.6 | 42.5 | 48.6 | 52.9 | 42.4 | 49.2 | ||||
| 7.77 | 8.64 | 7.54 | 8.66 | 8.41 | 7.46 | 8.80 | ||||
| NS | 6.04 | 6.24 | 6.42 | 6.80 | 5.83 | 6.42 | 7.12 | |||
| 0.741 | 0.654 | 0.668 | 0.667 | 0.694 | 0.713 | 0.724 | ||||
| −0.0817 | 0.0183 | 0.0335 | 0.0670 | −0.1004 | − | −0.0712 | 0.0485 | |||
| 5.7·10⁴ | 6.1·10⁴ | 6.2·10⁴ | 5.9·10⁴ | 6.3·10⁴ | 6.0·10⁴ | 5.7·10⁴ | ||||
| 2305 | 3421 | 2861 | 3447 | 3128 | 2787 | 3150 | ||||
| 8.02 | 8.73 | 8.70 | 9.21 | 8.38 | 7.76 | 8.74 | ||||
| 3.61 | 3.65 | 3.68 | 3.75 | 3.57 | 3.63 | 3.71 | ||||
| 0.220 | 0.232 | 0.238 | 0.233 | 0.327 | 0.358 | 0.339 | ||||
| 0.0782 | 0.163 | 0.165 | 0.238 | 0.0753 | 0.0756 | 0.212 | ||||
| 217 | 331 | 307 | 372 | 235 | 262 | 273 | ||||
| 18.7 | 21.7 | 21.5 | 21.6 | 19.7 | 19.4 | 19.9 | ||||
| C.elegans | 2.45 | 2.47 | 2.49 | 2.53 | 2.40 | 2.48 | 2.56 | |||
| 0.292 | 0.333 | 0.351 | 0.349 | 0.385 | 0.369 | 0.384 | ||||
| −0.163 | −0.113 | −0.0980 | − | −0.0405 | −0.130 | −0.129 | − | −0.0428 | ||
| 159 | 176 | 148 | 195 | 185 | 151 | 217 | ||||
| 26.1 | 29.7 | 28.2 | 31.5 | 29.7 | 27.3 | 32.1 | ||||
represents the original networks, and , , stand for the reconstructed networks, when β = 0, β = constrained , β = 1 respectively. is the reconstructed networks of the traditional methods CN and RA. , , , , , in turn, represent the average shortest path, the clustering coefficient, the assortativity coefficient, congestability, synchronizability and spreading ability of the networks. We highlight the values that are closest to the original networks in bold font. The results are averaged over 100 independent realizations.
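For readers who want to check the first two properties on their own data, the sketch below computes an average shortest path and a clustering coefficient in plain Python. The exact definitions behind the table are not given here, so two assumptions are made: the path length is averaged over all reachable pairs of distinct nodes, and the clustering coefficient is the mean of the local coefficients, with nodes of degree below 2 counted as zero. The toy graph and the function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def average_shortest_path(adj):
    """Mean BFS distance over all reachable ordered pairs of distinct nodes."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs


def average_clustering(adj):
    """Mean local clustering coefficient (assumed definition, see above)."""
    coeffs = []
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        # Count links among the neighbours of this node.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        coeffs.append(2.0 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs)


# Hypothetical toy graph: a triangle a-b-c with a pendant node d on c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(average_shortest_path(toy))
print(average_clustering(toy))
```

Applying both functions to the original network and to each reconstructed network allows a side-by-side comparison in the spirit of the table. `average_shortest_path` assumes the graph has at least one edge.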