Yan Chen¹, Pei Zhao¹, Ping Li¹, Kai Zhang², Jie Zhang³.
Abstract
Detecting communities or clusters in real-world networked systems is of considerable interest in fields such as sociology, biology, physics, engineering science, and interdisciplinary research, and significant effort has been devoted to it in recent years. Many existing algorithms are designed only to identify the composition of communities, not their structure. We believe, however, that the local structure of communities can also shed important light on their detection. In this work, we develop a simple yet effective approach that simultaneously uncovers communities and their centers. The idea rests on the premise that a community can generally be viewed as a high-density node surrounded by neighbors of lower density, and that community centers lie far apart from each other. We propose a so-called "community centrality" to quantify the likelihood of a node being a community center in such a landscape, and then propagate multiple significant center likelihoods throughout the network via a diffusion process. Our approach is an efficient linear algorithm and has demonstrated superior performance on a wide spectrum of synthetic and real-world networks, especially those with sparse connections among the community centers.
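The center-finding premise can be illustrated with a minimal Python sketch in the style of density-peak clustering. It rests on assumptions that this abstract does not state: the density η of a node is taken to be its degree, ψ is the hop distance to the nearest node of higher density, and the center score is γ = ηψ (the product plotted in Figure 3). The last step lets every non-center node inherit the label of its nearest denser node; this is a simple stand-in for, not a reproduction of, the diffusion process the authors describe. `center_scores` and `find_communities` are hypothetical names, and node ids are assumed to be sortable.

```python
def center_scores(adj):
    """Density-peak-style scores on a connected, undirected graph.

    adj maps each node to a list of neighbours. Returns the nodes in
    descending density order plus dicts eta (degree), psi (hop distance to
    the nearest denser node) and parent (that denser node).
    """
    eta = {v: len(adj[v]) for v in adj}
    # Density order: higher degree first, node id breaks ties.
    order = sorted(adj, key=lambda v: (-eta[v], v))
    rank = {v: i for i, v in enumerate(order)}
    psi, parent = {}, {}
    for v in order[1:]:
        seen = {v}
        frontier, depth, found = [v], 0, None
        # Level-by-level BFS until a denser node appears.
        while frontier and found is None:
            depth += 1
            nxt = []
            for u in frontier:
                for w in adj[u]:
                    if w not in seen:
                        seen.add(w)
                        nxt.append(w)
            denser = [w for w in nxt if rank[w] < rank[v]]
            if denser:
                # Among equally near denser nodes, prefer the densest one.
                found = min(denser, key=rank.__getitem__)
            frontier = nxt
        if found is None:
            raise ValueError("graph must be connected")
        psi[v], parent[v] = depth, found
    top = order[0]
    # Density-peak convention: the densest node gets the largest psi.
    psi[top] = max(psi.values(), default=1)
    parent[top] = None
    return order, eta, psi, parent


def find_communities(adj, k):
    """Take the k nodes with largest gamma = eta * psi as centers, then label."""
    if k < 1:
        raise ValueError("k must be at least 1")
    order, eta, psi, parent = center_scores(adj)
    rank = {v: i for i, v in enumerate(order)}
    gamma = {v: eta[v] * psi[v] for v in adj}
    centers = sorted(adj, key=lambda v: (-gamma[v], rank[v]))[:k]
    center_set = set(centers)
    label = {}
    for v in order:  # denser nodes first, so each parent is already labelled
        label[v] = v if v in center_set else label[parent[v]]
    return centers, label


# Toy graph: two five-node groups with hubs 0 and 5, joined by edge 4-9.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (3, 4),
         (5, 6), (5, 7), (5, 8), (5, 9), (6, 7), (8, 9), (4, 9)]
adj = {v: [] for v in range(10)}
for u, w in edges:
    adj[u].append(w)
    adj[w].append(u)
centers, label = find_communities(adj, k=2)
print(centers)  # [0, 5]
```

On this toy graph the two hubs come out as centers and each group inherits its hub's label. The per-node BFS makes the sketch quadratic in the worst case, so it says nothing about the efficiency claimed for the authors' algorithm.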
Year: 2016 PMID: 27053090 PMCID: PMC4823754 DOI: 10.1038/srep24017
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Networks used in the experiments.
| Synthetic networks | n | k | ϕ(r) | μ | t₁ | t₂ | c_min | c_max | o_n | o_m |
|---|---|---|---|---|---|---|---|---|---|---|
| LFR-1 | 50 | 3 | 0 | 0.1 | 2 | 1 | 25 | 25 | 0 | 0 |
| LFR-2 | 1000 | 15 | 0.18 | 0.1 | 2 | 0 | 500 | 500 | 0 | 0 |
| LFR-3 | 1000 | 20 | 0.07 | 0.1 | 2 | 1 | 100 | 500 | 0 | 0 |
| LFR-4 | 50 | 3 | 0 | 0.1 | 2 | 1 | 25 | 25 | 5 | 2 |

| Real-world networks | n | k | ϕ(r) | Description |
|---|---|---|---|---|
| Karate | 34 | 4.59 | 0 | Zachary’s social network of a karate club |
| Dolphins | 62 | 5.13 | 0.67 | Dolphin social network |
| Polbooks | 105 | 8.40 | 0.4 | Books about US politics |
| Football | 115 | 10.66 | 0.3 | Network of American football games |
| SFI | 118 | 3.40 | 0.3 | Collaboration network of scientists at the Santa Fe Institute |
| Jazz | 198 | 27.70 | 0.94 | Network of jazz musicians |
| E-coli | 328 | 3.03 | 0.13 | Transcriptional regulation network of Escherichia coli |
| Email | 1133 | 9.62 | 0.27 | Network of e-mail interchanges |
| Polblogs | 1222 | 27.36 | 0.5 | Blogs about politics |
| Power Grid | 4941 | 2.67 | 0.02 | The Western States Power Grid of the United States |
| Wiki-vote | 7066 | 28.51 | 0.55 | Wikipedia who-votes-on-whom network |
| CA-HepTh | 9877 | 5.74 | 0.1 | Collaboration network of arXiv High Energy Physics Theory |
| PGP | 10680 | 4.55 | 0.17 | Web of trust of PGP |
| CA-CondMat | 23133 | 8.55 | 0.28 | Collaboration network of arXiv Condensed Matter |
| Email-Enron | 36692 | 10.73 | 0.45 | Email communication network from Enron |
Here n denotes the number of vertices; for networks that are not connected, only the largest connected component is considered. k is the average node degree, μ is the mixing parameter, t₁ is the negative exponent of the degree distribution, t₂ is the negative exponent of the community size distribution, c_min and c_max are the minimum and maximum community sizes, respectively, o_n is the number of overlapping nodes, and o_m is the number of community memberships of each overlapping node. ϕ(r) is the rich-club connectivity of the network⁴⁰; here we choose r ∼ log N/N.
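The rich-club connectivity ϕ(r) can be computed as the edge density among the top fraction r of nodes ranked by degree, ϕ = 2E_r / (n_r(n_r − 1)). A minimal Python sketch follows; the rounding of rN to a node count, the floor of two rich nodes, and the base-10 logarithm in r ∼ log N/N are assumptions, since the note does not specify them, and `rich_club_connectivity` is a hypothetical name.

```python
import math


def rich_club_connectivity(adj, r):
    """Edge density among the top fraction r of nodes ranked by degree.

    phi = 2 * E_r / (n_r * (n_r - 1)), where n_r is the number of "rich"
    nodes and E_r is the number of edges among them.
    """
    n = len(adj)
    # Assumption: round r * n to the nearest integer, keep at least 2 nodes.
    n_r = min(n, max(2, round(r * n)))
    if n_r < 2:
        return 0.0
    deg = {v: len(adj[v]) for v in adj}
    rich = set(sorted(adj, key=lambda v: (-deg[v], v))[:n_r])
    # Each internal edge is seen from both endpoints, hence the // 2.
    e_r = sum(1 for v in rich for w in adj[v] if w in rich) // 2
    return 2 * e_r / (n_r * (n_r - 1))


# Two hubs (0 and 1) that are not linked to each other.
edges = [(0, 2), (0, 3), (0, 4), (1, 5), (1, 6), (1, 7), (4, 5)]
adj = {v: [] for v in range(8)}
for u, w in edges:
    adj[u].append(w)
    adj[w].append(u)
n = len(adj)
r = math.log10(n) / n  # assumption: base-10 logarithm for r ~ log N / N
print(rich_club_connectivity(adj, r))    # 0.0
print(rich_club_connectivity(adj, 0.5))  # 0.5
```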
Performance comparison on networks with ground truth.
| Networks | Ground truth C | Ground truth Q | Ours C | Ours Q | Louvain NMI | Louvain Q | Fastgreedy NMI | Fastgreedy Q | Infomap NMI | Infomap Q | Eigenvector NMI | Eigenvector Q | LP NMI | LP Q |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LFR-1 | 2 | 0.43 | 2 | 0.43 | 0.40 | 0.56 | 0.51 | 0.56 | 0.46 | 0.56 | 0.51 | 0.52 | 0.40 | 0.51 |
| LFR-2 | 3 | 0.52 | 3 | 0.52 | 1.00 | 0.40 | 0.88 | 0.39 | 1.00 | 0.40 | 1.00 | 0.40 | 1.00 | 0.40 |
| LFR-3 | 2 | 0.40 | 2 | 0.40 | 1.00 | 0.52 | 0.99 | 0.51 | 1.00 | 0.52 | 0.88 | 0.49 | 1.00 | 0.52 |
| LFR-4 | 2 | 0.39 | 2 | 0.41 | 0.45 | 0.54 | 0.34 | 0.53 | 0.39 | 0.55 | 0.43 | 0.51 | 0.33 | 0.46 |
| Karate | 2 | 0.37 | 2 | 0.37 | 0.59 | 0.42 | 0.69 | 0.38 | 0.70 | 0.40 | 0.68 | 0.39 | 0.70 | 0.40 |
| Polbooks | 2 | 0.41 | 2 | 0.46 | 0.51 | 0.52 | 0.53 | 0.50 | 0.49 | 0.52 | 0.52 | 0.47 | 0.57 | 0.50 |
| Football | 12 | 0.55 | 12 | 0.59 | 0.88 | 0.60 | 0.70 | 0.55 | 0.60 | 0.70 | 0.49 | 0.60 | | |
| Dolphins | 2 | 0.38 | 2 | 0.38 | 0.48 | 0.52 | 0.61 | 0.50 | 0.50 | 0.52 | 0.54 | 0.49 | 0.69 | 0.50 |
| Polblogs | 2 | 0.41 | 2 | 0.42 | 0.63 | 0.43 | 0.65 | 0.43 | 0.48 | 0.42 | 0.69 | 0.42 | 0.69 | 0.43 |
Here C is the number of communities, Q is the modularity, and NMI is the normalized mutual information between the detected partition and the ground truth.
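For readers who want to compute scores of this kind, here is a minimal Python sketch of both metrics. Modularity follows Newman's standard definition, Q = Σ_c [L_c/m − (d_c/2m)²]. For NMI the sketch assumes the arithmetic-mean normalization 2I(A;B)/(H(A) + H(B)); the source does not say which normalization was used. The function names are illustrative.

```python
import math
from collections import Counter


def modularity(edges, label):
    """Newman modularity Q = sum_c [L_c / m - (d_c / 2m)^2], undirected graph."""
    m = len(edges)
    internal = Counter()    # L_c: edges with both ends in community c
    degree_sum = Counter()  # d_c: total degree of the nodes in community c
    for u, v in edges:
        degree_sum[label[u]] += 1
        degree_sum[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


def _entropy(counts, n):
    """Shannon entropy (natural log) of a label distribution."""
    return -sum(c / n * math.log(c / n) for c in counts.values())


def nmi(a, b):
    """NMI = 2 I(A;B) / (H(A) + H(B)) for two labelings of the same nodes."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    h_a, h_b = _entropy(ca, n), _entropy(cb, n)
    if h_a + h_b == 0:
        return 1.0  # both labelings put every node in one community
    mi = sum(c / n * math.log(c * n / (ca[x] * cb[y]))
             for (x, y), c in cab.items())
    return 2 * mi / (h_a + h_b)


# Two triangles joined by the edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
label = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(modularity(edges, label))          # 5/14, about 0.357
print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))   # 1.0: renaming labels does not matter
```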
Figure 1. Two networks from the LFR benchmark.
(a) LFR-2 with 1000 nodes and three communities; the result of our algorithm agrees with the ground truth. (b) The ground-truth communities of LFR-4 produced by the LFR benchmark generator; the 5 overlapping nodes are drawn as pie vertices in two different colors. (c) Our partition of LFR-4.
Figure 2. Partitions found by the proposed method for real-world networks.
(a) Zachary karate club network: the two communities we detect are identical to the real communities. (b) Dolphins network: the two communities we detect are identical to the two real groups of males and females. (c) Polblogs network: the two communities we discovered. (d) SFI collaboration network: this network has an obvious tree-like structure, and the degree density index can be used to find the centers.
Performance comparison on networks without ground truth.
| Networks | Ours C | Ours Q | Louvain C | Louvain Q | Fastgreedy C | Fastgreedy Q | Infomap C | Infomap Q | Eigenvector C | Eigenvector Q | LP C | LP Q |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CA-CondMat | 105 | 0.63 | 55 | 0.72 | 261 | 0.63 | 1347 | 0.63 | 34 | 0.54 | 1537 | 0.62 |
| Email-Enron | 267 | 0.42 | 246 | 0.60 | 560 | 0.51 | 1554 | 0.52 | 2 | 0.34 | 887 | 0.32 |
| CA-HepTh | 60 | 0.76 | 53 | 0.82 | 79 | 0.78 | 520 | 0.73 | 22 | 0.57 | 541 | 0.74 |
| PGP | 121 | 0.82 | 102 | 0.88 | 190 | 0.85 | 1070 | 0.80 | 25 | 0.68 | 955 | 0.81 |
| Power Grid | 35 | 0.90 | 40 | 0.93 | 39 | 0.93 | 483 | 0.82 | 35 | 0.83 | 479 | 0.81 |
| Wiki-vote | 9 | 0.34 | 9 | 0.43 | 31 | 0.34 | 254 | 0.38 | 10 | 0.42 | 3 | 9e-05 |
| Jazz | 2 | 0.29 | 4 | 0.44 | 4 | 0.44 | 7 | 0.28 | 3 | 0.39 | 2 | 0.28 |
| Email | 19 | 0.43 | 12 | 0.54 | 16 | 0.51 | 68 | 0.52 | 7 | 0.49 | 8 | 0.28 |
| E-coli | 10 | 0.66 | 14 | 0.75 | 15 | 0.75 | 39 | 0.71 | 11 | 0.64 | 42 | 0.68 |
| SFI | 5 | 0.70 | 8 | 0.75 | 8 | 0.73 | 14 | 0.72 | 7 | 0.71 | 11 | 0.70 |
Here C is the number of communities and Q is the modularity.
Impact of the density index (strong-tie vs. degree) on the performance of our approach.
| Networks | Strong-tie | Degree | ||
|---|---|---|---|---|
| Karate | 2 | 2 | ||
| Dolphins | 3 | 3 | 0.43 | |
| Polbooks | 7 | 3 | 0.44 | |
| Polblogs | 2 | 2 | ||
| Football | 12 | 15 | 0.51 | |
| E-coli | 10 | 15 | 0.63 | |
| SFI | 3 | 0.65 | 5 | |
| Jazz | 44 | 41 | 0.32 | |
| Email | 18 | 0.41 | 19 | |
| CA-CondMat | 105 | 0.61 | 145 | |
| Email-Enron | 224 | 0.41 | 267 | |
| CA-HepTh | 86 | 0.74 | 60 | |
| PGP | 126 | 0.81 | 121 | |
| Power Grid | 63 | 0.88 | 35 | |
| Wiki-vote | 7 | 0.32 | 9 | |
The table reports the number of communities (C) identified by the proposed method and the resulting modularity (Q) for each density index.
Figure 3. Log-log plot of γ = ηψ, sorted in descending order, for six small networks.
η and ψ are normalized, so that the values for most nodes are very small.
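The descending γ curve suggests a way to choose how many centers to keep: only a few nodes have large values. The sketch below is one hypothetical rule, not the authors' procedure: divide η and ψ by their maxima, sort γ = ηψ in descending order, and cut at the largest drop on a log scale. Both the max-normalization and the largest-log-gap cut are assumptions, and the function names are illustrative.

```python
import math


def normalized_gamma(eta, psi):
    """gamma_i = (eta_i / max eta) * (psi_i / max psi); assumes positive maxima."""
    eta_max, psi_max = max(eta.values()), max(psi.values())
    return {v: (eta[v] / eta_max) * (psi[v] / psi_max) for v in eta}


def pick_centers_by_gap(gamma, floor=1e-12):
    """Sort nodes by descending gamma and cut where log(gamma) drops the most."""
    ranked = sorted(gamma, key=lambda v: (-gamma[v], v))
    if len(ranked) < 2:
        return ranked
    # floor guards against log(0) for nodes whose gamma is exactly zero.
    logs = [math.log(max(gamma[v], floor)) for v in ranked]
    drops = [logs[i] - logs[i + 1] for i in range(len(logs) - 1)]
    k = drops.index(max(drops)) + 1
    return ranked[:k]


gamma = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.09, "e": 0.05}
print(pick_centers_by_gap(gamma))  # ['a', 'b']
```

Cutting on a log scale mirrors the log-log presentation of the figure, where a clear break separates the few candidate centers from the bulk of nodes; on real data the largest gap may not be unique or meaningful, so this rule should be treated as a heuristic.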