Alexander J Freund, Philippe J Giabbanelli.
Abstract
Node centrality measures are among the most commonly used analytical techniques for networks. They have long helped analysts identify "important" nodes: those that hold power in a social context, those whose damage could have dire consequences in transportation applications, or those that should be a focus for prevention in epidemiology. Given the ubiquity of network data, new measures have been proposed, occasionally motivated by emerging applications or by the ability to interpolate existing measures. Before analysts use these measures and interpret results, the fundamental question is: are these measures likely to complete within the time window allotted to the analysis? In this paper, we comprehensively examine how the time necessary to run 18 new measures (introduced from 2005 to 2020) scales as a function of the number of nodes in the network. Our focus is on giving analysts a simple and practical estimate for sparse networks. As the time consumption depends on the properties of the network, we nuance our analysis by considering whether the network is scale-free, small-world, or random. Our results show that several measures run in the order of O(n log n) and could scale to large networks, whereas others can require O(n^2) or O(n^3) and may become prime targets in future work on approximation algorithms or distributed implementations.
Keywords: empirical study; node centrality; scale-free; small-world; synthetic graph generation
Year: 2022 PMID: 35252851 PMCID: PMC8889076 DOI: 10.3389/fdata.2022.797584
Source DB: PubMed Journal: Front Big Data ISSN: 2624-909X
The 18 centrality measures used in this study.
| Measure | Year | Reference | Description |
|---|---|---|---|
| Subgraph | 2005 | Estrada and Rodriguez-Velazquez | Eigenvalues to count closed walks |
| Geodesic K-Path | 2006 | Borgatti and Everett | Number of nodes reachable |
| Maximum neighborhood component | 2008 | Lin et al. | Size of the largest connected component within the direct neighbors of a given node |
| Density of maximum neighborhood component | | Lin et al. | Ratio of edges to nodes within the largest connected component among a node's neighbors |
| Decay | | Jackson | Proximity between a given node and every other node, weighted by a decay rate |
| Topological coefficient | 2009 | Zhuge and Zhang | Average number of neighbors of a given node that are also neighbors to a different node |
| Lobby Index | | Campiteli et al. | Largest integer k such that the node has at least k neighbors with a degree of at least k |
| Coreness | 2010 | Kitsak et al. | Sum of k-shell indexes of a given node's neighbors |
| Leverage | | Joyce et al. | Degree of a node relative to its neighbors |
| Group | | Narayanam and Narahari | Game theory, to measure the marginal increase in group influence |
| Wiener Index | 2011 | Caporossi et al. | Average distance from a given node to all other nodes |
| K-Path | | Alahakoon et al. | Number of random paths of length k from all nodes that include a given node |
| Diffusion Degree | | Kundu et al. | Degree contribution of a node and its neighbors, weighted by a propagation probability |
| LeaderRank | | Lü et al. | Convergence of a random walk |
| Laplacian | | Gutman and Zhou | Degrees of a node and its neighbors. Equivalent to using eigenvalues in the Laplacian |
| Local Bridging | 2016 | Macker | Ratio of shortest paths going through a node, modulated by its degree and degree of neighbors |
| VoteRank | | Zhang et al. | Spreading ability, measured by the convergence of an election process between neighbors |
| Heatmap | 2020 | Durón | Sum of distance from a node to all others (i.e., farness) and average farness of the neighbors |
Divisions in the table emphasize publication years.
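To make two of the one-line definitions in the table concrete, here is a minimal sketch of the Lobby Index and the maximum neighborhood component (MNC), written directly from the descriptions above. The adjacency-dictionary representation and the function names `lobby_index` and `mnc` are assumptions made for illustration; they are not taken from the original papers or from any particular library.

```python
def lobby_index(adj, v):
    """Largest k such that v has at least k neighbors of degree >= k."""
    degrees = sorted((len(adj[u]) for u in adj[v]), reverse=True)
    k = 0
    for rank, degree in enumerate(degrees, start=1):
        if degree >= rank:
            k = rank
        else:
            break
    return k


def mnc(adj, v):
    """Size of the largest connected component induced by v's direct neighbors."""
    neighbors = set(adj[v])
    seen = set()
    best = 0
    for start in neighbors:
        if start in seen:
            continue
        # Depth-first search restricted to the neighborhood of v.
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for other in adj[node]:
                if other in neighbors and other not in seen:
                    seen.add(other)
                    stack.append(other)
        best = max(best, size)
    return best


# Hypothetical toy graph: node 0 is a hub; nodes 1 and 2 are also linked together.
toy = {0: {1, 2, 3, 4}, 1: {0, 2}, 2: {0, 1}, 3: {0}, 4: {0}}
```

On this toy graph, node 0 has neighbor degrees 2, 2, 1, 1, so its Lobby Index is 2, and its neighbors split into the components {1, 2}, {3}, and {4}, so its MNC is 2. Both functions only inspect the neighborhood of a single node, which is consistent with treating them as local measures.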
Density of networks used in previous experimental assessments of centrality measures.
| Measure | Network | Nodes | Edges | Density |
|---|---|---|---|---|
| Group Centrality | Western States Power Grid | 4,940 | 6,594 undirected | 0.0005 |
| | Collaborations in astrophysics | 16,705 | 121,251 undirected | 0.0008 |
| K-Path | Kazaa file sharing | 2,424 | 13,354 undirected | 0.0045 |
| | SciMet citations | 2,729 | 10,416 undirected | 0.0027 |
| | Co-authorships in condensed matter | 23,133 | 186,936 undirected | 0.0007 |
| | Citations (Cit-HepPh) | 34,546 | 421,578 directed | 0.0007 |
| | Company emails at Enron | 36,692 | 367,662 undirected | 0.0005 |
| | Social (Epinions1) | 75,879 | 508,837 directed | 0.00008 |
| | Social (Slashdot0922) | 82,168 | 948,464 directed | 0.00014 |
| Heatmap | University emails in Spain | 1,133 | 5,451 undirected | 0.0085 |
| | Hyperlinks in US political blogs | 1,222 | 16,714 undirected | 0.0224 |
| | US Airline Flights in 2010 | 1,572 | 17,214 undirected | 0.0139 |
| | Facebook from UC Irvine students | 1,893 | 13,835 undirected | 0.0077 |
| Laplacian | Terrorist network mapped by Krebs | 37 | 170 directed | 0.1276 |
| LeaderRank | Users of | 1,675,008 | 169,378 undirected | 0.0000001 |
| Subgraph | Protein–protein interaction (yeast) | 2,224 | 6,608 undirected | 0.0026 |
| | Protein–protein interaction (bacterium) | 710 | 1,396 undirected | 0.0055 |
| | Words in Roget's Thesaurus of English | 994 | 3,640 undirected | 0.0073 |
| | Words in Online Dict. of Library & Info. Science | 2,898 | 16,376 undirected | 0.0039 |
| | Collaborations in computational geometry | 3,621 | 9,461 undirected | 0.0014 |
| | Citations of papers on graph drawing | 249 | 635 undirected | 0.0205 |
| | Internet at the autonomous system (1997) | 3,015 | 5,156 undirected | 0.0011 |
| | Internet at the autonomous system (1998) | 3,522 | 6,324 undirected | 0.0010 |
| Topological | DBLP research database | 664,188 | 79,128 directed | 0.0000001 |
| VoteRank | Friendships of Youtube users | 1,134,890 | 2,987,624 undirected | 0.000004 |
| | Co-authorship in condensed matters (arXiv) | 23,133 | 93,497 undirected | 0.0003 |
| | Hyperlinks in Berkeley/Stanford webpages | 685,230 | 7,600,595 directed | 0.00001 |
| | Hyperlinks in U. Notre Dame webpages | 325,729 | 1,497,134 directed | 0.00001 |
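Note that density is calculated as 2m / (n(n - 1)) for an undirected network and m / (n(n - 1)) for a directed network, where n is the number of nodes and m is the number of edges.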
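As an illustration of how density values of this kind can be computed, the sketch below assumes the standard graph-density definitions: edges divided by the number of possible node pairs, counting each unordered pair once for undirected networks and each ordered pair for directed ones. The function name `density` and its `directed` flag are hypothetical. Under this assumption the function reproduces several entries in the table, although not necessarily every row.

```python
def density(n, m, directed=False):
    """Edge density of a simple graph with n nodes and m edges.

    Assumes the standard definitions:
      undirected: 2m / (n(n - 1))
      directed:    m / (n(n - 1))
    """
    if n < 2:
        raise ValueError("density is undefined for fewer than two nodes")
    ordered_pairs = n * (n - 1)
    return m / ordered_pairs if directed else 2 * m / ordered_pairs


# Rows taken from the table above.
power_grid = density(4940, 6594)                 # Western States Power Grid
kazaa = density(2424, 13354)                     # Kazaa file sharing
krebs = density(37, 170, directed=True)          # Terrorist network mapped by Krebs
print(round(power_grid, 4), round(kazaa, 4), round(krebs, 4))
```

Rounded to four decimals, these three calls give 0.0005, 0.0045, and 0.1276, matching the corresponding rows.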
Generate comparable scale-free, small-world, and random networks for the desired sizes
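The generation procedure itself is not reproduced in this extract, so the following is only a sketch of one way to obtain three network types with similar edge counts. It assumes a preferential-attachment model for scale-free networks, a rewired ring lattice for small-world networks, and a uniformly random edge set for random networks. All function names and the parameters `m`, `p`, and `seed` are hypothetical choices, not the authors' settings.

```python
import random


def scale_free(n, m, rng):
    """Preferential attachment: each new node links to m existing nodes."""
    edges = set()
    targets = list(range(m))          # the first new node links to nodes 0..m-1
    pool = []                         # each node appears once per incident edge
    for new in range(m, n):
        edges.update((t, new) for t in targets)
        pool.extend(targets)
        pool.extend([new] * m)
        chosen = set()
        while len(chosen) < m:        # degree-proportional, no repeated target
            chosen.add(rng.choice(pool))
        targets = list(chosen)
    return edges


def small_world(n, k, p, rng):
    """Ring lattice with k neighbors per node; each edge is rewired with probability p."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for j in range(1, k // 2 + 1):
            u = (v + j) % n
            adj[v].add(u)
            adj[u].add(v)
    for j in range(1, k // 2 + 1):
        for v in range(n):
            u = (v + j) % n
            if u in adj[v] and rng.random() < p:
                candidates = [w for w in range(n) if w != v and w not in adj[v]]
                if candidates:
                    w = rng.choice(candidates)
                    adj[v].remove(u)
                    adj[u].remove(v)
                    adj[v].add(w)
                    adj[w].add(v)
    return {(min(a, b), max(a, b)) for a in adj for b in adj[a]}


def random_graph(n, edge_count, rng):
    """Uniformly random simple graph with exactly edge_count edges."""
    edges = set()
    while len(edges) < edge_count:
        a, b = rng.sample(range(n), 2)
        edges.add((min(a, b), max(a, b)))
    return edges


def comparable_networks(n, m=2, p=0.1, seed=0):
    """Three edge sets over n nodes with approximately equal edge counts."""
    rng = random.Random(seed)
    sf = scale_free(n, m, rng)
    sw = small_world(n, 2 * m, p, rng)     # n * m edges, close to m * (n - m)
    rd = random_graph(n, len(sf), rng)     # same edge count as the scale-free graph
    return {"scale-free": sf, "small-world": sw, "random": rd}
```

With `n = 200` and `m = 2`, the scale-free and random networks have 396 edges and the small-world network has 400, so the three densities differ by about one percent. Rewiring preserves the edge count, which is why the small-world size depends only on `n` and `k`.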
Average edge densities of simulated network types for a sample of the sizes considered.
| Network size (n) | | | | |
|---|---|---|---|---|
| 100 | 0.03600 | 0.03593 | 0.03576 | 0.00010 |
| 200 | 0.01903 | 0.01900 | 0.01893 | 0.00005 |
| 400 | 0.01017 | 0.01031 | 0.01017 | 0.00006 |
| 800 | 0.00530 | 0.00558 | 0.00530 | 0.00013 |
| 1,600 | 0.00283 | 0.00316 | 0.00284 | 0.00015 |
| 3,200 | 0.00148 | 0.00170 | 0.00147 | 0.00011 |
| 6,400 | 0.00078 | 0.00093 | 0.00078 | 0.00007 |
| 12,800 | 0.00041 | 0.00050 | 0.00041 | 0.00004 |
Experimental time complexities of 18 node centrality metrics on 3 network types.
| Measure | | | |
|---|---|---|---|
| Subgraph | |||
| Geodesic K-Path | |||
| MNC | |||
| DMNC | |||
| Decay | |||
| Topological | |||
| Lobby index | |||
| Coreness | |||
| Leverage | |||
| Group | |||
| Wiener Index | |||
| K-Path | |||
| Diffusion degree | |||
| LeaderRank | |||
| Laplacian | |||
| Local bridging | |||
| VoteRank | |||
| HeatMap | |||
Situations with good scaling behaviors are shown in .
Figure 1. Sample of our experimental results on Coreness, K-Path, and Lobby Index for three network types (scale-free, small-world, random) of comparable network edges at varying network sizes (n = 100, 200, …, 16,000). Scaling is obtained by fitting on each column, for each network type.
Figure 2. Sample of our experimental results on Heatmap, Topological, and VoteRank for three network types (scale-free, small-world, random) of comparable network edges at varying network sizes (n = 100, 200, …, 16,000). Scaling is obtained by fitting on each column, for each network type.
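The captions state that scaling is obtained by fitting, without specifying the fitting procedure in this extract. One common approach, shown here only as a sketch under that caveat, is to time a measure at each network size and estimate the exponent b of a power law t ≈ c · n^b by ordinary least squares on log-transformed values. The helper names `time_measure` and `fit_power_law` are hypothetical.

```python
import math
import time


def time_measure(measure, networks):
    """Wall-clock seconds taken by measure(network) for each network."""
    timings = []
    for network in networks:
        start = time.perf_counter()
        measure(network)
        timings.append(time.perf_counter() - start)
    return timings


def fit_power_law(sizes, timings):
    """Least-squares fit of log(t) = log(c) + b * log(n); returns (b, c)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in timings]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    c = math.exp(mean_y - b * mean_x)
    return b, c


# Synthetic timings that grow quadratically, to check the fit itself.
sizes = [100, 200, 400, 800, 1600]
synthetic = [3e-6 * n ** 2 for n in sizes]
exponent, constant = fit_power_law(sizes, synthetic)
```

On the synthetic quadratic timings the fitted exponent is 2 up to floating-point error. A pure power-law fit cannot represent O(n log n) exactly, so a measure with that behavior would show a fitted exponent slightly above 1 over a finite range of sizes. Distinguishing the two would require fitting t ≈ c · n log n as a separate candidate model.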