Tiziano Squartini, Giulio Cimini, Andrea Gabrielli, Diego Garlaschelli.
Abstract
Reconstructing weighted networks from partial information is necessary in many important circumstances, e.g. for a correct estimation of systemic risk. It has been shown that, in order to achieve an accurate reconstruction, it is crucial to reliably replicate the empirical degree sequence, which is however unknown in many realistic situations. More recently, it has been found that knowledge of the degree sequence can be replaced by knowledge of the strength sequence, which is typically accessible, complemented by the total number of links, thus considerably relaxing the observational requirements. Here we further relax these requirements and devise a procedure that remains valid even when the total number of links is unavailable. We assume that, apart from the heterogeneity induced by the degree sequence itself, the network is homogeneous, so that its (global) link density can be estimated by sampling subsets of nodes with representative density. We show that the best way of sampling nodes is the random selection scheme, any other procedure being biased towards unrealistically large, or small, link densities. We then introduce in detail our core technique for reconstructing both the topology and the link weights of the unknown network. When tested on real economic and financial data sets, our method achieves a remarkable accuracy and is very robust with respect to the sampled subsets, thus representing a reliable practical tool whenever the available topological information is restricted to small portions of nodes.
Keywords: 89.75.Hc; 89.65.Gh; 02.50.Tt
Year: 2017 PMID: 30533511 PMCID: PMC6245230 DOI: 10.1007/s41109-017-0021-8
Source DB: PubMed Journal: Appl Netw Sci ISSN: 2364-8228
Statistical indicators used to evaluate the performance of our sample-based reconstruction method, for different cardinalities n of the known subset I. Results are shown together with the 95% confidence intervals (omitted whenever the interval bounds differ from the reported value only beyond the third digit)
| WTW | n=5 | n=10 | n=20 | n=50 | n=100 |
|---|---|---|---|---|---|
| 2000 - True positive rate | 0.794 [0.772,0.816] | 0.779 [0.765,0.793] | 0.804 [0.796,0.812] | 0.801 [0.797,0.806] | 0.801 [0.799,0.804] |
| 2000 - Specificity | 0.700 [0.669,0.731] | 0.742 [0.726,0.758] | 0.721 [0.710,0.731] | 0.728 [0.723,0.734] | 0.729 [0.726,0.733] |
| 2000 - Positive predictive value | 0.796 [0.784,0.808] | 0.810 [0.803,0.817] | 0.799 [0.795,0.804] | 0.802 [0.800,0.805] | 0.802 [0.801,0.803] |
| 2000 - Accuracy | 0.755 [0.750,0.760] | 0.763 [0.762,0.766] | 0.769 [0.768,0.770] | 0.771 | 0.771 |
| 2000 - Cosine similarity | 0.712 | 0.712 | 0.712 | 0.712 | 0.712 |
| e-MID | | | | | |
| 1999 - True positive rate | 0.641 [0.601,0.673] | 0.633 [0.614,0.653] | 0.633 [0.620,0.646] | 0.637 [0.623,0.643] | 0.636 [0.632,0.640] |
| 1999 - Specificity | 0.839 [0.823,0.856] | 0.856 [0.848,0.864] | 0.860 [0.854,0.865] | 0.860 [0.857,0.863] | 0.861 [0.859,0.862] |
| 1999 - Positive predictive value | 0.623 [0.611,0.637] | 0.632 [0.625,0.639] | 0.633 [0.628,0.638] | 0.632 [0.623,0.635] | 0.633 [0.631,0.634] |
| 1999 - Accuracy | 0.785 [0.780,0.790] | 0.795 [0.794,0.796] | 0.798 [0.797,0.799] | 0.799 [0.798,0.800] | 0.799 |
| 1999 - Cosine similarity | 0.810 [0.805,0.815] | 0.814 [0.811,0.816] | 0.816 [0.815,0.817] | 0.817 | 0.817 |
The considered cardinalities n=5,10,20,50,100 correspond to percentages ranging from ≃2% to ≃50% of the total number of nodes. As reference values, the link density is c=0.578 for the WTW (in the year 2000) and c=0.274 for e-MID (in the year 1999)
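The indicators in the table above can be illustrated with a minimal sketch. It assumes an undirected network without self-loops and compares a single binary reconstruction with the true adjacency matrix; the paper may instead compute these quantities as ensemble averages, so this is only one possible reading. The function names and the four-node toy matrices are hypothetical and are not taken from the paper.

```python
import math


def _ratio(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def from_edges(n, edges):
    """Build a symmetric 0/1 matrix from a list of undirected edges."""
    matrix = [[0] * n for _ in range(n)]
    for i, j in edges:
        matrix[i][j] = matrix[j][i] = 1
    return matrix


def confusion_counts(true_adj, pred_adj):
    """Count (tp, fp, tn, fn) over node pairs i < j of two matrices."""
    n = len(true_adj)
    tp = fp = tn = fn = 0
    for i in range(n):
        for j in range(i + 1, n):
            t, p = bool(true_adj[i][j]), bool(pred_adj[i][j])
            if t and p:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def indicators(true_adj, pred_adj):
    """Topological indicators of a reconstruction against the truth."""
    tp, fp, tn, fn = confusion_counts(true_adj, pred_adj)
    return {
        "true_positive_rate": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "positive_predictive_value": _ratio(tp, tp + fp),
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
    }


def cosine_similarity(true_w, pred_w):
    """Cosine of the angle between the upper-triangle weight vectors."""
    n = len(true_w)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    dot = sum(true_w[i][j] * pred_w[i][j] for i, j in pairs)
    norm_t = math.sqrt(sum(true_w[i][j] ** 2 for i, j in pairs))
    norm_p = math.sqrt(sum(pred_w[i][j] ** 2 for i, j in pairs))
    return _ratio(dot, norm_t * norm_p)


# Hypothetical four-node example: the reconstruction misses link (1, 2)
# and adds the spurious link (1, 3).
true_adj = from_edges(4, [(0, 1), (0, 2), (1, 2), (2, 3)])
pred_adj = from_edges(4, [(0, 1), (0, 2), (1, 3), (2, 3)])
scores = indicators(true_adj, pred_adj)
similarity = cosine_similarity(true_adj, pred_adj)
print(scores, similarity)
```

The first four indicators depend only on the reconstructed topology, whereas the cosine similarity compares link weights, which is consistent with it being reported as a separate row.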
Link density estimation for different cardinalities n of the random sampled subset I. Results are based on 1000 samples and are shown together with the 95% confidence intervals
| Link density | n=5 | n=10 | n=20 | n=50 | n=100 |
|---|---|---|---|---|---|
| WTW 2000 (true: 0.578) | 0.586 [0.560;0.611] | 0.559 [0.544;0.574] | 0.583 [0.574;0.592] | 0.578 [0.573;0.583] | 0.577 [0.574;0.580] |
| e-MID 1999 (true: 0.274) | 0.292 [0.271;0.313] | 0.278 [0.267;0.289] | 0.276 [0.268;0.283] | 0.276 [0.272;0.280] | 0.275 [0.273;0.278] |
The considered cardinalities n=5,10,20,50,100 correspond to percentages ranging from ≃2% to ≃50% of the total number of nodes. The true link densities calculated on the entire networks are shown in brackets for reference
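The sampling-based density estimate reported in the table above can be sketched as follows. This is a minimal illustration, assuming an undirected network without self-loops, subsets drawn uniformly at random, and a 95% interval obtained from the normal approximation to the standard error of the mean; the source does not state how the intervals were computed, so that choice is an assumption. The function names and the ring-shaped toy network are hypothetical.

```python
import math
import random
import statistics


def subset_density(adj, nodes):
    """Link density of the subgraph induced by `nodes` (needs >= 2 nodes)."""
    nodes = list(nodes)
    n = len(nodes)
    links = sum(
        1
        for a in range(n)
        for b in range(a + 1, n)
        if adj[nodes[a]][nodes[b]]
    )
    return 2 * links / (n * (n - 1))


def estimate_density(adj, n, samples=1000, seed=0):
    """Average the density of `samples` random subsets of cardinality n.

    Returns the mean and an approximate 95% confidence interval
    (normal approximation, an assumption of this sketch).
    """
    rng = random.Random(seed)
    population = range(len(adj))
    values = [
        subset_density(adj, rng.sample(population, n))
        for _ in range(samples)
    ]
    mean = statistics.mean(values)
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(samples)
    return mean, (mean - half_width, mean + half_width)


# Toy network (not the paper's data): a ring of 10 nodes,
# whose true density is 10 links out of 45 possible pairs.
N = 10
ring = [[0] * N for _ in range(N)]
for i in range(N):
    j = (i + 1) % N
    ring[i][j] = ring[j][i] = 1

estimate, (low, high) = estimate_density(ring, n=4, samples=2000, seed=1)
print(f"estimated {estimate:.3f} [{low:.3f};{high:.3f}], true {10 / 45:.3f}")
```

Under uniform random selection every node pair is equally likely to fall inside the subset, so the expected subset density equals the global one; this is the property that makes the random scheme unbiased in this sketch.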
Fig. 1. Left panels: scatter plots of the link density c versus the internal total strength of the subset I. Nodes characterized by large values of the total strength tend to form densely-connected groups, while nodes characterized by small values of the total strength tend, on the contrary, to form loosely-connected groups. Right panels: empirical probability distributions of the link density c, when nodes belonging to I are chosen randomly. Each distribution is peaked around the density value of the whole network. Top panels refer to the WTW, bottom panels to e-MID
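The bias described in the left panels can be reproduced on a toy network. The sketch below is an illustration only, not the paper's data or method: it assumes a hypothetical deterministic model in which each node has a fitness value and two nodes are linked, with weight equal to the product of their fitnesses, whenever that product reaches a threshold. The names `fitness` and `THRESHOLD` are invented for this example.

```python
def subset_density(adj, nodes):
    """Link density of the subgraph induced by `nodes` (undirected)."""
    nodes = list(nodes)
    n = len(nodes)
    links = sum(
        1
        for a in range(n)
        for b in range(a + 1, n)
        if adj[nodes[a]][nodes[b]]
    )
    return 2 * links / (n * (n - 1))


# Hypothetical toy model: fitness 1..20, link iff the product >= 100.
N = 20
THRESHOLD = 100
fitness = [i + 1 for i in range(N)]
weights = [[0] * N for _ in range(N)]
for i in range(N):
    for j in range(i + 1, N):
        product = fitness[i] * fitness[j]
        if product >= THRESHOLD:
            weights[i][j] = weights[j][i] = product

# Node strength is the sum of the weights of its links.
strength = [sum(row) for row in weights]
ranked = sorted(range(N), key=lambda node: strength[node])

n = 5
density_weak = subset_density(weights, ranked[:n])     # lowest strengths
density_strong = subset_density(weights, ranked[-n:])  # highest strengths
density_global = subset_density(weights, range(N))     # whole network
print(density_weak, density_global, density_strong)
```

In this toy setting the low-strength subset contains no links at all and the high-strength subset is fully connected, while the global density lies strictly between the two, which mirrors the qualitative trend in the scatter plots.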
Statistical indicators used to evaluate the performance of our sample-based reconstruction method, for different cardinalities n of the known subset I. Results are shown together with the 95% confidence intervals (omitted whenever the interval bounds differ from the reported value only beyond the third digit)
| WTW | n=5 | n=10 | n=20 | n=50 |
|---|---|---|---|---|
| 1950 - Link density (true: 0.402) | 0.401 [0.375;0.426] | 0.402 [0.387;0.416] | 0.401 [0.393;0.409] | 0.400 [0.396;0.403] |
| 1950 - Accuracy | 0.736 [0.731;0.741] | 0.747 [0.746;0.749] | 0.751 | 0.752 |
| 1950 - Cosine similarity | 0.460 [0.458;0.462] | 0.463 | 0.463 | 0.463 |
| 1960 - Link density (true: 0.383) | 0.329 [0.305;0.352] | 0.343 [0.330;0.357] | 0.346 [0.338;0.355] | 0.348 [0.344;0.353] |
| 1960 - Accuracy | 0.737 [0.734;0.741] | 0.746 | 0.749 [0.748;0.750] | 0.751 |
| 1960 - Cosine similarity | 0.586 | 0.591 | 0.591 | 0.591 |
| 1970 - Link density (true: 0.460) | 0.464 [0.436;0.492] | 0.478 [0.462;0.496] | 0.461 [0.451;0.471] | 0.464 [0.458;0.469] |
| 1970 - Accuracy | 0.695 [0.691;0.699] | 0.704 [0.702;0.706] | 0.709 | 0.709 |
| 1970 - Cosine similarity | 0.669 | 0.669 | 0.669 | 0.669 |
| 1980 - Link density (true: 0.468) | 0.484 [0.458;0.510] | 0.470 [0.455;0.485] | 0.471 [0.461;0.481] | 0.463 [0.458;0.469] |
| 1980 - Accuracy | 0.719 [0.715;0.723] | 0.731 [0.730;0.733] | 0.734 [0.733;0.735] | 0.736 |
| 1980 - Cosine similarity | 0.732 | 0.732 | 0.732 | 0.732 |
| 1990 - Link density (true: 0.505) | 0.495 [0.467;0.522] | 0.516 [0.500;0.532] | 0.506 [0.497;0.515] | 0.507 [0.503;0.512] |
| 1990 - Accuracy | 0.731 [0.726;0.736] | 0.743 [0.741;0.745] | 0.748 | 0.749 |
| 1990 - Cosine similarity | 0.751 | 0.751 | 0.751 | 0.751 |
The considered cardinalities n=5,10,20,50 correspond to percentages ranging from ≃2% to ≃25% of the total number of nodes. The true link densities calculated on the entire networks for the various periods are shown in brackets for reference
Statistical indicators used to evaluate the performance of our sample-based reconstruction method, for different cardinalities n of the known subset I. Results are shown together with the 95% confidence intervals (omitted whenever the interval bounds differ from the reported value only beyond the third digit)
| e-MID | n=5 | n=10 | n=20 | n=50 |
|---|---|---|---|---|
| 2000 - Link density (true: 0.278) | 0.293 [0.269;0.317] | 0.279 [0.263;0.295] | 0.273 [0.264;0.281] | 0.280 [0.273;0.283] |
| 2000 - Accuracy | 0.763 [0.759;0.768] | 0.772 [0.769;0.775] | 0.778 [0.777;0.779] | 0.778 [0.777;0.779] |
| 2000 - Cosine similarity | 0.573 [0.566;0.580] | 0.578 [0.576;0.582] | 0.582 | 0.582 |
| 2001 - Link density (true: 0.263) | 0.279 [0.256;0.303] | 0.264 [0.249;0.278] | 0.257 [0.246;0.267] | 0.266 [0.261;0.272] |
| 2001 - Accuracy | 0.763 [0.757;0.770] | 0.774 [0.772;0.776] | 0.777 [0.775;0.779] | 0.778 [0.777;0.779] |
| 2001 - Cosine similarity | 0.560 [0.554;0.566] | 0.566 [0.563;0.569] | 0.569 | 0.570 |
| 2002 - Link density (true: 0.233) | 0.253 [0.230;0.276] | 0.237 [0.221;0.252] | 0.235 [0.225;0.246] | 0.233 [0.228;0.239] |
| 2002 - Accuracy | 0.759 [0.752;0.766] | 0.767 [0.763;0.771] | 0.770 [0.767;0.772] | 0.772 [0.770;0.773] |
| 2002 - Cosine similarity | 0.684 [0.675;0.694] | 0.670 [0.682;0.697] | 0.699 [0.697;0.701] | 0.701 [0.700;0.702] |
| 2003 - Link density (true: 0.214) | 0.248 [0.223;0.273] | 0.225 [0.208;0.243] | 0.217 [0.205;0.228] | 0.213 [0.208;0.219] |
| 2003 - Accuracy | 0.746 [0.737;0.756] | 0.758 [0.752;0.763] | 0.763 [0.759;0.766] | 0.766 [0.764;0.767] |
| 2003 - Cosine similarity | 0.461 [0.453;0.470] | 0.462 [0.454;0.470] | 0.472 [0.469;0.475] | 0.476 [0.475;0.477] |
| 2004 - Link density (true: 0.190) | 0.210 [0.185;0.235] | 0.183 [0.168;0.199] | 0.194 [0.182;0.205] | 0.187 [0.181;0.192] |
| 2004 - Accuracy | 0.772 [0.762;0.783] | 0.785 [0.780;0.790] | 0.784 [0.780;0.788] | 0.788 [0.786;0.790] |
| 2004 - Cosine similarity | 0.481 [0.470;0.492] | 0.482 [0.474;0.491] | 0.497 [0.493;0.502] | 0.503 [0.501;0.504] |
| 2005 - Link density (true: 0.201) | 0.232 [0.205;0.258] | 0.210 [0.190;0.222] | 0.210 [0.200;0.221] | 0.208 [0.203;0.214] |
| 2005 - Accuracy | 0.751 [0.740;0.762] | 0.767 [0.760;0.773] | 0.767 [0.763;0.771] | 0.769 [0.767;0.771] |
| 2005 - Cosine similarity | 0.461 [0.448;0.474] | 0.476 [0.470;0.483] | 0.486 [0.483;0.490] | 0.491 [0.490;0.492] |
The considered cardinalities n=5,10,20,50 correspond to percentages ranging from ≃2% to ≃25% of the total number of nodes. The true link densities calculated on the entire networks for the various periods are shown in brackets for reference