Sadamori Kojaku, Naoki Masuda.
Abstract
Many empirical networks have community structure, in which nodes are densely interconnected within each community (i.e., a group of nodes) and sparsely interconnected across different communities. Like other local and meso-scale structures of networks, communities are generally heterogeneous in various aspects, such as size, edge density, connectivity to other communities and significance. In the present study, we propose a method to statistically test the significance of individual communities in a given network. Compared to previous methods, the present algorithm is unique in that it accepts different community-detection algorithms and the corresponding quality functions for single communities. The method requires that the quality of each community can be quantified and that community detection is performed as optimisation of such a quality function summed over the communities. Various community-detection algorithms, including modularity maximisation and graph partitioning, meet this criterion. Our method estimates the distribution of the quality function for randomised networks to calculate the likelihood of each community in the given network. We illustrate our algorithm using synthetic and empirical networks.
Year: 2018 PMID: 29743534 PMCID: PMC5943579 DOI: 10.1038/s41598-018-25560-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
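Modularity is one quality function that satisfies the abstract's requirement of being a sum over communities. As a minimal sketch, assuming the standard Newman–Girvan form (the exact normalisation of qmod in the paper may differ), each community c contributes q(c) = e_c/M - (vol_c/(2M))^2, where e_c is the number of edges inside c, vol_c is the sum of degrees of its nodes and M is the total number of edges. The function name and toy network are hypothetical.

```python
from collections import defaultdict


def community_modularity_terms(edges, membership):
    """Per-community modularity terms q(c) = e_c/M - (vol_c/(2M))^2.

    edges: list of undirected (u, v) pairs.
    membership: dict mapping node -> community label.
    Assumes the standard Newman-Girvan modularity, not a paper-specific variant.
    """
    m = len(edges)
    internal = defaultdict(int)  # e_c: edges with both endpoints in community c
    volume = defaultdict(int)    # vol_c: sum of degrees of nodes in community c
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        volume[cu] += 1
        volume[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return {c: internal[c] / m - (volume[c] / (2 * m)) ** 2 for c in volume}


# Hypothetical toy network: two triangles joined by a single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
membership = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
terms = community_modularity_terms(edges, membership)
print(terms)                # each community contributes 5/28, about 0.179
print(sum(terms.values()))  # total modularity 5/14, about 0.357
```

Summing the per-community terms recovers the usual modularity, so maximising the total is the same as maximising a sum of single-community qualities.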
Figure 1. (a) A network with 31 non-overlapping communities generated by the LFR model. Circles represent nodes, and lines between them represent edges. The colour of each node indicates the planted community to which it belongs. (b–e) Quality of a community (i.e., qmod, qint, qexp and qcnd) plotted against its number of nodes, n. Circles indicate the planted communities shown in panel (a). Crosses indicate the communities detected in 500 randomised networks generated by the configuration model. To find communities in the randomised networks, we use the Louvain algorithm[26] for qmod (panel (b)) and a variant of the Kernighan–Lin algorithm[27] for qint, qexp and qcnd (panels (c–e)).
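The randomised networks in panels (b–e) come from the configuration model, which keeps every node's degree fixed while rewiring edges at random. The sketch below uses plain stub matching, which can produce self-loops and multi-edges; whether the authors discard these is not stated here, so this variant is an assumption. The helper names are hypothetical, and running Louvain or Kernighan–Lin on each randomised network is not shown.

```python
import random
from collections import Counter


def configuration_model(degrees, seed=None):
    """Randomise a network by stub matching, preserving each node's degree.

    degrees: dict mapping node -> degree (the degree sum must be even).
    Returns a list of (u, v) edges; self-loops and multi-edges may occur,
    as in the plain multigraph configuration model.
    """
    stubs = [node for node, k in degrees.items() for _ in range(k)]
    if len(stubs) % 2:
        raise ValueError("sum of degrees must be even")
    rng = random.Random(seed)
    rng.shuffle(stubs)
    # Pairing consecutive stubs of a uniformly shuffled list gives a
    # uniformly random matching of stubs.
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]


def degree_sequence(edges):
    """Degree of every node; a self-loop (u, u) contributes 2 to deg[u]."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


# Hypothetical observed network: two triangles joined by one edge.
observed = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
degrees = degree_sequence(observed)
# 500 randomisations, matching the number used in the caption.
null_networks = [configuration_model(degrees, seed=r) for r in range(500)]
print(all(degree_sequence(g) == degrees for g in null_networks))  # True
```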
Properties of 12 empirical networks.
| Network | Nodes | Edges | C | n (min) | n (max) | vol (min) | vol (max) |
|---|---|---|---|---|---|---|---|
| Karate | 34 | 78 | 3 | 5 | 17 | 16 | 78 |
| Dolphin | 62 | 159 | 4 | 7 | 22 | 37 | 123 |
| Les Misérables | 77 | 254 | 10 | 2 | 16 | 3 | 147 |
| Email | 151 | 1527 | 6 | 16 | 50 | 258 | 1081 |
| Jazz | 198 | 2742 | 6 | 3 | 63 | 9 | 2029 |
| Network science | 379 | 914 | 11 | 6 | 65 | 27 | 290 |
| Blog | 1222 | 16,714 | 2 | 565 | 657 | 15,755 | 17,673 |
| Airport | 2939 | 15,677 | 20 | 2 | 712 | 2 | 12,638 |
| Protein | 3023 | 6149 | 161 | 2 | 312 | 2 | 1832 |
| Chess | 7115 | 55,779 | 409 | 2 | 812 | 3 | 23,034 |
| Astro-ph (co-authorship) | 18,771 | 198,050 | 116 | 2 | 3547 | 2 | 98,628 |
| Internet | 34,761 | 107,720 | 65 | 4 | 13,710 | 7 | 106,881 |
Column C indicates the number of communities detected by the Louvain algorithm. Columns n and vol give the minimum and maximum, over the detected communities, of the number of nodes in a community and of the sum of the degrees of the nodes in a community, respectively.
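Both size measures in this table follow directly from a partition. A minimal sketch with a hypothetical helper name and toy network, computing n and vol per community and then the minimum and maximum reported in the table:

```python
from collections import Counter


def size_functions(edges, membership):
    """Return {community: (n, vol)}.

    n is the number of nodes in the community; vol is the sum of their degrees.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = Counter(membership.values())
    vol = Counter()
    for node, c in membership.items():
        vol[c] += degree[node]  # nodes without edges contribute 0
    return {c: (n[c], vol[c]) for c in n}


# Hypothetical network: two triangles joined by an edge, plus a pendant node 6.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3), (5, 6)]
membership = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b", 6: "b"}
sizes = size_functions(edges, membership)
print(sizes)  # {'a': (3, 7), 'b': (4, 9)}

ns = [s[0] for s in sizes.values()]
vols = [s[1] for s in sizes.values()]
print(min(ns), max(ns), min(vols), max(vols))  # 3 4 7 9
```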
Figure 2. True positive rate for the statistical tests applied to networks generated by the LFR model. Legends S, L, (qmod, n) and (qmod, vol) indicate the S–test, the L–test, the (qmod, n)–test and the (qmod, vol)–test, respectively. Error bars indicate ±1 standard deviation.
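Assuming that a true positive is a planted LFR community that a test judges significant (every planted community being treated as genuine), the rate and its error bar can be computed as below. The verdicts are invented for illustration, and the choice of the sample rather than the population standard deviation is an assumption.

```python
import statistics


def true_positive_rate(significant):
    """Fraction of planted communities judged significant.

    With every planted community treated as real, a significant verdict is
    a true positive and a non-significant one a false negative, so this
    equals TP / (TP + FN).
    """
    return sum(significant) / len(significant)


# Hypothetical verdicts for the planted communities of three LFR realisations.
runs = [
    [True, True, False, True],
    [True, True, True, True],
    [True, False, False, True],
]
rates = [true_positive_rate(r) for r in runs]
mean = statistics.mean(rates)
sd = statistics.stdev(rates)  # sample standard deviation for the error bar
print(f"TPR = {mean:.3f} +/- {sd:.3f}")  # TPR = 0.750 +/- 0.250
```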
Fraction of significant communities identified by the S–test, the L–test, the (qmod, s)–test, the (qint, s)–test, the (qexp, s)–test and the (qcnd, s)–test in the 12 empirical networks.
| Network | S | L | (qmod, n) | (qmod, vol) | (qint, n) | (qint, vol) | (qexp, n) | (qexp, vol) | (qcnd, n) | (qcnd, vol) |
|---|---|---|---|---|---|---|---|---|---|---|
| Karate | 1.00 | 0.33 | 0.67 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.33 |
| Dolphin | 1.00 | 0.50 | 1.00 | 0.75 | 0.00 | 0.00 | 0.00 | 0.00 | 0.50 | 0.50 |
| Les Misérables | 0.40 | 0.40 | 0.40 | 0.60 | 0.20 | 0.40 | 0.00 | 0.00 | 0.50 | 0.40 |
| Enron | 1.00 | 0.00 | 1.00 | 1.00 | 0.33 | 0.67 | 0.00 | 0.00 | 1.00 | 1.00 |
| Jazz | 0.67 | 0.67 | 0.67 | 1.00 | 0.67 | 0.83 | 0.00 | 0.00 | 1.00 | 1.00 |
| Netscience | 1.00 | 0.64 | 1.00 | 1.00 | 0.91 | 0.82 | 0.09 | 0.09 | 0.91 | 1.00 |
| Blog | 0.00 | 1.00 | 1.00 | 1.00 | 0.50 | 0.50 | 0.00 | 0.00 | 1.00 | 1.00 |
| Airport | 0.00 | 0.60 | 0.70 | 0.80 | 0.15 | 0.55 | 0.00 | 0.00 | 0.40 | 0.20 |
| Protein | 0.00 | 0.35 | 0.14 | 0.22 | 0.03 | 0.12 | 0.01 | 0.01 | 0.00 | 0.00 |
| Chess | 0.00 | 0.25 | 0.13 | 0.15 | 0.36 | 0.58 | 0.00 | 0.00 | 0.01 | 0.03 |
| Astro-ph | - | 0.61 | 0.24 | 0.53 | 1.00 | 1.00 | 0.00 | 0.00 | 0.33 | 0.12 |
| Internet | - | 0.55 | 0.65 | 0.60 | 0.00 | 0.18 | 0.00 | 0.00 | 0.00 | 0.02 |
A hyphen indicates that the test did not terminate within 64 days on our computer (Intel 2.6 GHz Sandy Bridge processors and 4 GB of memory).
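The fractions above rest on a per-community p-value derived from communities found in randomised networks. The sketch below is a deliberately simplified stand-in, not the paper's estimator: it compares a community only with null communities of similar size (a hypothetical relative window of ±20%) and reports the empirical fraction whose quality is at least as high. The significance level alpha = 0.05, the null data and the helper names are assumptions, and no correction for multiple comparisons is applied.

```python
def empirical_p_value(q0, s0, null_pairs, window=0.2):
    """Simplified one-sided p-value for a community with quality q0 and size s0.

    null_pairs: (size, quality) pairs of communities detected in randomised
    networks. Only null communities whose size lies within +/- window
    (relative) of s0 are compared. The +1 terms keep p above zero; with no
    comparable null communities, p is 1.
    """
    lo, hi = s0 * (1 - window), s0 * (1 + window)
    comparable = [q for s, q in null_pairs if lo <= s <= hi]
    exceed = sum(1 for q in comparable if q >= q0)
    return (exceed + 1) / (len(comparable) + 1)


def fraction_significant(p_values, alpha=0.05):
    """Share of communities with p < alpha (no multiple-testing correction)."""
    return sum(p < alpha for p in p_values) / len(p_values)


# Hypothetical null sample: (size, quality) of communities in randomised networks.
null_pairs = [(10, k / 100) for k in range(100)] + [(50, 0.9)]
# Hypothetical (quality, size) of the communities detected in the real network.
observed = [(0.995, 10), (0.5, 10), (0.95, 50)]
p_values = [empirical_p_value(q, s, null_pairs) for q, s in observed]
print(p_values)                        # about [0.0099, 0.505, 0.5]
print(fraction_significant(p_values))  # one of three communities, about 0.333
```

Conditioning on size matters because, as Figure 1 suggests, quality in randomised networks varies with community size, so a single pooled null distribution would favour some sizes over others.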
Agreement between pairs of statistical tests.
| Test | S | L | (qmod, n) | (qmod, vol) |
|---|---|---|---|---|
| S | 1.00 | 0.42 | 0.73 | 0.66 |
| L | 0.42 | 1.00 | 0.49 | 0.58 |
| (qmod, n) | 0.73 | 0.49 | 1.00 | 0.84 |
| (qmod, vol) | 0.66 | 0.58 | 0.84 | 1.00 |
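The table does not spell out how agreement is measured. One natural reading, assumed here, is the fraction of communities on which two tests give the same verdict (both significant or both not), which produces a symmetric matrix with ones on the diagonal. The helper name and verdicts are hypothetical.

```python
def agreement(verdicts_a, verdicts_b):
    """Fraction of communities on which two tests reach the same verdict.

    Each argument is a list of booleans (True = significant), one entry per
    community, in the same community order for both tests.
    """
    if len(verdicts_a) != len(verdicts_b):
        raise ValueError("both tests must judge the same communities")
    same = sum(a == b for a, b in zip(verdicts_a, verdicts_b))
    return same / len(verdicts_a)


# Hypothetical verdicts of two tests on the same 10 communities.
test_1 = [True] * 5 + [False] * 5
test_2 = [True] * 4 + [False] * 6
print(agreement(test_1, test_2))  # 0.9: the tests disagree on one community
```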
Figure 3. True positive rate as a function of the mixing parameter, μ, for the six (q, s)–tests.
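In the LFR benchmark, μ sets the fraction of each node's edges that connect to nodes in other communities, so a larger μ makes the planted communities less distinct. The sketch below measures the realised per-node fraction for a given partition; the helper name and toy network are hypothetical, and the LFR generator itself is not shown.

```python
from collections import defaultdict


def mixing_parameters(edges, membership):
    """Per-node mixing parameter: fraction of a node's edges leaving its community."""
    total = defaultdict(int)
    external = defaultdict(int)
    for u, v in edges:
        crossing = membership[u] != membership[v]
        for node in (u, v):
            total[node] += 1
            external[node] += crossing  # bool counts as 0 or 1
    return {node: external[node] / total[node] for node in total}


# Hypothetical network: two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
membership = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
mu = mixing_parameters(edges, membership)
mean_mu = sum(mu.values()) / len(mu)
print(mu[2], mean_mu)  # 1/3 for each bridge node; network average 1/9
```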
Agreement between the (q, n)–test and the (q, vol)–test.
| Network | qint | qexp | qcnd |
|---|---|---|---|
| Karate | 1.00 | 1.00 | 0.67 |
| Dolphin | 1.00 | 1.00 | 1.00 |
| Les Misérables | 0.60 | 1.00 | 0.90 |
| Enron | 0.67 | 1.00 | 1.00 |
| Jazz | 0.50 | 1.00 | 1.00 |
| Netscience | 0.73 | 1.00 | 0.91 |
| Blog | 1.00 | 1.00 | 1.00 |
| Airport | 0.60 | 1.00 | 0.60 |
| Protein | 0.90 | 0.99 | 1.00 |
| Chess | 0.77 | 1.00 | 0.98 |
| Astro-ph | 1.00 | 1.00 | 0.76 |
| Internet | 0.82 | 1.00 | 0.98 |