Liming Pan, Tao Zhou, Linyuan Lü, Chin-Kun Hu.
Abstract
Real network data are often incomplete and noisy, which is where link prediction algorithms and spurious link identification algorithms can be applied. So far, there has been no general method for transforming network organizing mechanisms into link prediction algorithms. Here we use an algorithmic framework in which a network's probability is calculated according to a predefined structural Hamiltonian that takes into account the network organizing principles, and a non-observed link is scored by the conditional probability of adding the link to the observed network. Extensive numerical simulations show that the proposed algorithm has remarkably higher accuracy than the state-of-the-art methods in uncovering missing links and identifying spurious links in many complex biological and social networks. The method also finds applications in exploring the underlying network evolutionary mechanisms.
Year: 2016 PMID: 26961965 PMCID: PMC4785364 DOI: 10.1038/srep22955
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Illustrating network (graph) G(V, E) with nodes and links for predicting missing links.
Figure 2. Illustrating network (graph) G(V, E) with nodes and links for identifying spurious links.
The basic topological features of seven real networks.
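| Network | $N$ | $M$ | $C$ | $r$ | $\langle k \rangle$ | $\langle d \rangle$ | $H$ |
|---|---|---|---|---|---|---|---|
| Jazz | 198 | 2742 | 0.618 | 0.020 | 27.697 | 2.235 | 1.395 |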
| Metabolic | 453 | 2025 | 0.647 | −0.226 | 8.940 | 2.664 | 4.485 |
| C. elegans | 297 | 2148 | 0.292 | −0.163 | 14.465 | 2.455 | 1.801 |
| USAir | 332 | 2126 | 0.625 | −0.208 | 12.807 | 2.738 | 3.464 |
| FWF | 128 | 2075 | 0.335 | −0.112 | 32.422 | 1.776 | 1.237 |
| FWM | 97 | 1446 | 0.468 | −0.151 | 29.814 | 1.693 | 1.266 |
| Macaca | 94 | 1515 | 0.774 | −0.151 | 32.234 | 1.771 | 1.238 |
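$N$ and $M$ are the numbers of nodes and links. $C$ is the clustering coefficient [38] and $r$ the assortative coefficient [43]. $\langle k \rangle$ is the average degree, $\langle d \rangle$ is the average shortest distance, and $H$ is the degree heterogeneity, defined as $H = \langle k^2 \rangle / \langle k \rangle^2$.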
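As a rough illustration of how some of the tabulated quantities relate to one another, the sketch below computes the number of nodes, the number of links, the average degree, and the degree heterogeneity from an undirected edge list. It assumes the usual definition of heterogeneity as the ratio of the mean squared degree to the squared mean degree; the function name `basic_features` is hypothetical and not taken from the paper.

```python
from collections import defaultdict


def basic_features(edges):
    """Return (N, M, <k>, H) for an undirected simple graph.

    Assumption: H = <k^2> / <k>^2 (the usual degree-heterogeneity ratio).
    Only nodes that appear in at least one link are counted in N.
    """
    # Deduplicate links and drop self-loops; frozenset ignores direction.
    unique_links = {frozenset(e) for e in edges if e[0] != e[1]}

    degree = defaultdict(int)
    for link in unique_links:
        u, v = tuple(link)
        degree[u] += 1
        degree[v] += 1

    n = len(degree)
    m = len(unique_links)
    mean_k = sum(degree.values()) / n
    mean_k2 = sum(k * k for k in degree.values()) / n
    return n, m, mean_k, mean_k2 / mean_k ** 2
```

The average degree is simply twice the number of links divided by the number of nodes, which can be checked against the table: for Jazz, 2 × 2742 / 198 ≈ 27.697.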
The prediction accuracy measured by precision for the seven real networks.
| Precision | Ours | CN | AA | RA | Katz | HSM | SBM | CAR | CPA | CAA | CRA | CJC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Jazz | 0.699 | 0.506 | 0.525 | 0.541 | 0.548 | 0.326 | 0.480 | 0.512 | 0.512 | 0.530 | 0.555 | 0.517 |
| Metabolic | 0.384 | 0.137 | 0.190 | 0.147 | 0.145 | 0.100 | 0.213 | 0.142 | 0.142 | 0.153 | 0.209 | 0.133 |
| C. elegans | 0.200 | 0.095 | 0.105 | 0.107 | 0.104 | 0.073 | 0.143 | 0.089 | 0.091 | 0.101 | 0.118 | 0.086 |
| USAir | 0.483 | 0.374 | 0.394 | 0.455 | 0.373 | 0.216 | 0.376 | 0.380 | 0.380 | 0.382 | 0.403 | 0.376 |
| FWF | 0.577 | 0.073 | 0.075 | 0.076 | 0.175 | 0.249 | 0.451 | 0.084 | 0.084 | 0.089 | 0.093 | 0.087 |
| FWM | 0.566 | 0.121 | 0.123 | 0.130 | 0.212 | 0.304 | 0.463 | 0.120 | 0.119 | 0.126 | 0.129 | 0.123 |
| Macaca | 0.755 | 0.528 | 0.533 | 0.513 | 0.586 | 0.462 | 0.662 | 0.543 | 0.542 | 0.551 | 0.549 | 0.550 |
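The following sketch shows one way such precision values could be produced for the simplest baselines. It assumes that CN, AA and RA denote the standard common neighbours, Adamic-Adar and resource allocation indices, and that precision is the fraction of the top-L ranked non-observed links that belong to the probe set. The helper names (`build_adjacency`, `local_scores`, `precision_at_L`) are illustrative, and the choice of L is left to the caller because the section does not state it.

```python
import math
from itertools import combinations


def build_adjacency(edges):
    """Adjacency sets of an undirected graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_scores(adj, index="CN"):
    """Score every non-observed pair (u, v), u < v, with a local index."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # skip links that are already observed
        common = adj[u] & adj[v]
        if index == "CN":    # number of common neighbours
            s = float(len(common))
        elif index == "AA":  # each common neighbour z weighted by 1 / log(k_z)
            s = sum(1.0 / math.log(len(adj[z])) for z in common)
        elif index == "RA":  # each common neighbour z weighted by 1 / k_z
            s = sum(1.0 / len(adj[z]) for z in common)
        else:
            raise ValueError(f"unknown index: {index}")
        scores[(u, v)] = s
    return scores


def precision_at_L(scores, probe_links, L):
    """Fraction of the L highest-scored candidate links found in the probe set.

    Ties are broken by insertion order here; random tie-breaking is a
    common alternative.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)[:L]
    hits = sum(1 for link in ranked if link in probe_links)
    return hits / L
```

A common neighbour always has degree at least 2, so the logarithm in the AA branch is strictly positive and the division is safe.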
The prediction accuracy measured by AUC for the seven real networks.
| AUC | Ours | CN | AA | RA | Katz | HSM | SBM | CAR | CPA | CAA | CRA | CJC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Jazz | 0.981 | 0.955 | 0.962 | 0.971 | 0.964 | 0.881 | 0.940 | 0.952 | 0.948 | 0.955 | 0.961 | 0.952 |
| Metabolic | 0.964 | 0.921 | 0.953 | 0.958 | 0.922 | 0.852 | 0.926 | 0.853 | 0.776 | 0.862 | 0.868 | 0.851 |
| C. elegans | 0.909 | 0.847 | 0.863 | 0.867 | 0.856 | 0.810 | 0.889 | 0.756 | 0.749 | 0.757 | 0.760 | 0.754 |
| USAir | 0.972 | 0.935 | 0.946 | 0.952 | 0.943 | 0.896 | 0.942 | 0.907 | 0.890 | 0.909 | 0.914 | 0.906 |
| FWF | 0.949 | 0.610 | 0.611 | 0.614 | 0.738 | 0.809 | 0.917 | 0.625 | 0.633 | 0.633 | 0.638 | 0.631 |
| FWM | 0.942 | 0.709 | 0.712 | 0.715 | 0.774 | 0.822 | 0.914 | 0.710 | 0.711 | 0.718 | 0.723 | 0.715 |
| Macaca | 0.988 | 0.944 | 0.944 | 0.948 | 0.946 | 0.949 | 0.978 | 0.936 | 0.935 | 0.937 | 0.940 | 0.936 |
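For reference, the AUC in link prediction is usually read as the probability that a randomly chosen probe (missing) link receives a higher score than a randomly chosen nonexistent link, with ties counted as one half. The sketch below computes this exactly over all pairs; it assumes that `scores` maps every non-observed link to its score and that `probe_links` is the set of held-out true links. Both names are illustrative, and the source may instead estimate the value from a sample of comparisons.

```python
def link_prediction_auc(scores, probe_links):
    """Exact AUC = (n_higher + 0.5 * n_tied) / n over all probe/non-probe pairs."""
    missing = [s for link, s in scores.items() if link in probe_links]
    nonexistent = [s for link, s in scores.items() if link not in probe_links]
    if not missing or not nonexistent:
        raise ValueError("need at least one probe link and one nonexistent link")

    higher = 0
    tied = 0
    for s_missing in missing:
        for s_none in nonexistent:
            if s_missing > s_none:
                higher += 1
            elif s_missing == s_none:
                tied += 1
    return (higher + 0.5 * tied) / (len(missing) * len(nonexistent))
```

A value of 0.5 corresponds to random scoring, which is why AUC values in the table are best compared against that baseline rather than against zero.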
Figure 3. Predicting missing links for different sizes of probe set.
The prediction accuracy is measured by precision.
Figure 4. Predicting missing links for different sizes of probe set.
The prediction accuracy is measured by AUC.
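The probe set in these experiments is presumably obtained by hiding a random fraction of the observed links and treating the remainder as the training network. A minimal sketch of such a split follows; the function name `split_probe`, the default fraction and the seed handling are assumptions made for illustration, not details taken from the paper.

```python
import random


def split_probe(edges, probe_fraction=0.1, seed=0):
    """Randomly divide undirected links into a training set and a probe set.

    Note: this simple split does not check that the training network
    stays connected.
    """
    if not 0.0 < probe_fraction < 1.0:
        raise ValueError("probe_fraction must be strictly between 0 and 1")

    # Canonical (min, max) form so each undirected link appears once.
    links = sorted({tuple(sorted(e)) for e in edges})
    rng = random.Random(seed)
    rng.shuffle(links)

    n_probe = round(probe_fraction * len(links))
    probe = set(links[:n_probe])
    training = set(links[n_probe:])
    return training, probe
```

Varying `probe_fraction` and repeating the split with different seeds would give curves of the kind the captions describe, averaged over realizations.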
The accuracy of spurious link identification measured by precision for the seven real networks.
| Precision | Ours | CN | AA | RA | Katz | HSM | SBM | CAR | CPA | CAA | CRA | CJC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Jazz | 0.794 | 0.701 | 0.723 | 0.761 | 0.745 | 0.471 | 0.582 | 0.697 | 0.690 | 0.703 | 0.731 | 0.695 |
| Metabolic | 0.769 | 0.716 | 0.763 | 0.762 | 0.749 | 0.183 | 0.548 | 0.627 | 0.611 | 0.627 | 0.627 | 0.628 |
| C. elegans | 0.593 | 0.446 | 0.465 | 0.465 | 0.433 | 0.277 | 0.599 | 0.243 | 0.253 | 0.243 | 0.243 | 0.246 |
| USAir | 0.749 | 0.642 | 0.686 | 0.686 | 0.626 | 0.311 | 0.738 | 0.513 | 0.536 | 0.513 | 0.513 | 0.513 |
| FWF | 0.672 | 0.232 | 0.229 | 0.218 | 0.220 | 0.342 | 0.575 | 0.239 | 0.241 | 0.243 | 0.246 | 0.242 |
| FWM | 0.657 | 0.322 | 0.328 | 0.332 | 0.313 | 0.420 | 0.603 | 0.333 | 0.323 | 0.331 | 0.332 | 0.329 |
| Macaca | 0.897 | 0.614 | 0.633 | 0.686 | 0.620 | 0.636 | 0.861 | 0.588 | 0.598 | 0.591 | 0.617 | 0.599 |
The accuracy of spurious link identification measured by AUC for the seven real networks.
| AUC | Ours | CN | AA | RA | Katz | HSM | SBM | CAR | CPA | CAA | CRA | CJC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Jazz | 0.983 | 0.954 | 0.960 | 0.969 | 0.970 | 0.884 | 0.933 | 0.956 | 0.953 | 0.958 | 0.964 | 0.955 |
| Metabolic | 0.972 | 0.942 | 0.966 | 0.969 | 0.943 | 0.815 | 0.920 | 0.926 | 0.916 | 0.941 | 0.954 | 0.924 |
| C. elegans | 0.909 | 0.858 | 0.872 | 0.875 | 0.867 | 0.806 | 0.894 | 0.804 | 0.822 | 0.806 | 0.811 | 0.813 |
| USAir | 0.974 | 0.942 | 0.953 | 0.958 | 0.940 | 0.868 | 0.951 | 0.925 | 0.923 | 0.928 | 0.934 | 0.924 |
| FWF | 0.955 | 0.621 | 0.623 | 0.626 | 0.729 | 0.779 | 0.917 | 0.641 | 0.643 | 0.651 | 0.658 | 0.650 |
| FWM | 0.945 | 0.717 | 0.719 | 0.721 | 0.777 | 0.819 | 0.923 | 0.734 | 0.731 | 0.740 | 0.744 | 0.736 |
| Macaca | 0.990 | 0.944 | 0.945 | 0.947 | 0.943 | 0.920 | 0.984 | 0.943 | 0.946 | 0.944 | 0.947 | 0.946 |
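Spurious link identification reverses the ranking direction: links of the observed network are scored, and the lowest-scored ones are flagged as likely spurious. The sketch below assumes the common protocol in which random nonexistent links are added to the true network as known spurious links, precision is the fraction of the L lowest-scored observed links that are truly spurious, and AUC is the probability that a spurious link scores lower than a true link. All function names are hypothetical, and the paper's exact protocol may differ.

```python
import random


def add_spurious_links(edges, n_spurious, seed=0):
    """Return (observed_links, spurious_links) after adding random false links."""
    rng = random.Random(seed)
    nodes = sorted({u for e in edges for u in e})
    true_links = {tuple(sorted(e)) for e in edges}
    max_new = len(nodes) * (len(nodes) - 1) // 2 - len(true_links)
    if n_spurious > max_new:
        raise ValueError("not enough nonexistent links to add")

    spurious = set()
    while len(spurious) < n_spurious:
        link = tuple(sorted(rng.sample(nodes, 2)))
        if link not in true_links:
            spurious.add(link)
    return true_links | spurious, spurious


def spurious_precision(scores, spurious, L):
    """Fraction of the L lowest-scored observed links that are spurious."""
    ranked = sorted(scores, key=scores.get)[:L]  # ascending: least likely first
    return sum(1 for link in ranked if link in spurious) / L


def spurious_auc(scores, spurious):
    """Probability that a spurious link scores lower than a true link (ties = 0.5)."""
    false_s = [s for link, s in scores.items() if link in spurious]
    true_s = [s for link, s in scores.items() if link not in spurious]
    lower = sum(1 for f in false_s for t in true_s if f < t)
    tied = sum(1 for f in false_s for t in true_s if f == t)
    return (lower + 0.5 * tied) / (len(false_s) * len(true_s))
```

Here `scores` is expected to hold one score per observed link, produced by whichever method is being evaluated.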
Figure 5. Identifying spurious links for different sizes of probe set.
The prediction accuracy is measured by precision.
Figure 6. Identifying spurious links for different sizes of probe set.
The prediction accuracy is measured by AUC.
The accuracy of missing link prediction of the macaque brain network.
| Accuracy | Precision | AUC |
|---|---|---|
| Uncertain links excluded | 0.452 | 0.877 |
| Uncertain links included | 0.295 | 0.855 |
The 16 most likely latent links among the uncertain links and their corresponding values of the Hamiltonian.
| link | Hamiltonian | link | Hamiltonian | link | Hamiltonian | link | Hamiltonian |
|---|---|---|---|---|---|---|---|
| FEF-TF | 4.4565e3 | PITd-PITv | 4.4558e3 | TF-TH | 4.4553e3 | V4t-LIP | 4.4547e3 |
| MSTd-MSTI | 4.4565e3 | V3A-VIP | 4.4557e3 | PIP-V3A | 4.4553e3 | PITd-TF | 4.4546e3 |
| CITd-CITv | 4.4563e3 | CITd-TF | 4.4555e3 | PITd-CITd | 4.4550e3 | PIP-V4t | 4.4546e3 |
| DP-VIP | 4.4560e3 | V3A-V4t | 4.4555e3 | PITd-TH | 4.4548e3 | PITv-STPp | 4.4545e3 |