Zhen Li, Yong Gong, Zhisong Pan, Guyu Hu.
Abstract
Community detection is an important task across a number of research fields, including social science, biology, and physics. In the real world, topology information alone is often inadequate for accurately identifying community structure because of its sparsity and noise. Potentially useful prior information, such as pairwise constraints in the form of must-link and cannot-link constraints, can be obtained from domain knowledge in many applications. Thus, combining network topology with prior information to improve community detection accuracy is promising. Previous methods mainly use must-link constraints and cannot make full use of cannot-link constraints. In this paper, we propose a semi-supervised community detection framework that can effectively incorporate both types of pairwise constraints into the detection process. In particular, must-link and cannot-link constraints are represented as positive and negative links, and we encode them by adding different graph regularization terms to penalize closeness of the nodes. Experiments on multiple real-world datasets show that the proposed framework significantly improves the accuracy of community detection.
Year: 2017 PMID: 28542520 PMCID: PMC5441628 DOI: 10.1371/journal.pone.0178046
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Notation definitions.
| number of nodes | - | |
| number of communities | - | |
| the adjacency matrix of the original network | | |
| the adjacency matrix of a positive network | | |
| the adjacency matrix of a negative network | | |
| clustering indicator matrix | | |
| Laplacian matrix | | |
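The quantities listed above suggest how pairwise constraints can enter an objective as graph regularizers. The following is a minimal sketch, not the paper's exact formulation: it assumes must-link pairs form a positive network whose Laplacian term penalizes distance between indicator rows, and cannot-link pairs form a negative network whose adjacency term penalizes closeness. The names `A_pos`, `A_neg`, `H`, `gamma1`, and `gamma2`, and all function names, are illustrative assumptions.

```python
import numpy as np

def constraint_adjacency(n, pairs):
    """Symmetric 0/1 adjacency matrix with one edge per constrained pair."""
    A = np.zeros((n, n))
    for i, j in pairs:
        A[i, j] = A[j, i] = 1.0
    return A

def laplacian(A):
    """Unnormalized graph Laplacian L = D - A."""
    return np.diag(A.sum(axis=1)) - A

def must_link_penalty(H, A_pos):
    """tr(H^T L_pos H): sum over must-link pairs of ||h_i - h_j||^2."""
    return float(np.trace(H.T @ laplacian(A_pos) @ H))

def cannot_link_penalty(H, A_neg):
    """tr(H^T A_neg H): twice the sum over cannot-link pairs of h_i . h_j."""
    return float(np.trace(H.T @ A_neg @ H))

def regularizer(H, A_pos, A_neg, gamma1, gamma2):
    """Weighted sum of both penalties (assumed form, weights are hypothetical)."""
    return gamma1 * must_link_penalty(H, A_pos) + gamma2 * cannot_link_penalty(H, A_neg)

# Toy example: 4 nodes, 2 communities, one must-link and one cannot-link pair.
A_pos = constraint_adjacency(4, [(0, 1)])   # nodes 0 and 1 must share a community
A_neg = constraint_adjacency(4, [(1, 2)])   # nodes 1 and 2 must not

H_good = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)  # satisfies both
H_bad = np.array([[1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)   # violates both

good = regularizer(H_good, A_pos, A_neg, gamma1=1.0, gamma2=1.0)
bad = regularizer(H_bad, A_pos, A_neg, gamma1=1.0, gamma2=1.0)
```

In this toy setting an assignment that respects both constraints incurs no penalty, while one that violates them is penalized by each term, so adding the regularizer to a topology-based objective would steer the indicator matrix toward the priors.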
Details of the networks.
| Datasets | Nodes | Links | Communities |
|---|---|---|---|
| | 34 | 78 | 2 |
| | 62 | 159 | 2 |
| | 115 | 613 | 12 |
| | 105 | 441 | 3 |
| | 1490 | 16718 | 2 |
| | 419 | 19950 | 5 |
| | 464 | 7787 | 28 |
Fig 1. Experimental results of different algorithms with different percentages of must-link priors on seven real-world datasets.
Fig 2. Experimental results of different algorithms with different percentages of cannot-link priors on seven real-world datasets.
Fig 3. Experimental results of different algorithms with different percentages of priors.
Half of the priors are must-link priors and the other half are cannot-link priors.
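One way such a half-and-half set of priors could be generated from ground-truth community labels is sketched below. This is an illustration under stated assumptions, not the authors' procedure: the record does not say how the percentage is defined, so the sketch treats it as a fraction of all node pairs, and `sample_priors` with its parameters is a hypothetical helper.

```python
import itertools
import random

def sample_priors(labels, fraction, seed=0):
    """Sample equal numbers of must-link and cannot-link pairs.

    labels   -- ground-truth community label of each node
    fraction -- share of all node pairs to use as priors (assumed definition)
    """
    rng = random.Random(seed)
    pairs = list(itertools.combinations(range(len(labels)), 2))
    # Same label -> must-link candidate; different labels -> cannot-link candidate.
    must = [(i, j) for i, j in pairs if labels[i] == labels[j]]
    cannot = [(i, j) for i, j in pairs if labels[i] != labels[j]]
    # Split the budget in half, capped by the number of available candidates.
    half = min(int(fraction * len(pairs)) // 2, len(must), len(cannot))
    return rng.sample(must, half), rng.sample(cannot, half)

# Toy example: 10 nodes in two communities of 5, using 20% of the 45 node pairs.
labels = [0] * 5 + [1] * 5
must_link, cannot_link = sample_priors(labels, fraction=0.2)
```

The fixed seed makes the sampled constraints reproducible across runs; a different definition of the percentage would only change how `half` is computed.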
Fig 4. Experimental results of our algorithm with respect to parameters γ1 and γ2.
In the experiments, 5% must-link priors and 5% cannot-link priors are added.