Jing Yang, Limin Chen, Jianpei Zhang.
Abstract
Clustering heterogeneous information networks is an important task. This paper proposes a fast clustering algorithm for heterogeneous information networks with a star network schema; the algorithm is based on an approximate commute time embedding and exploits the sparsity of such networks. First, the heterogeneous information network is transformed into multiple compatible bipartite graphs from the compatible point of view. Second, the approximate commute time embedding of each bipartite graph is computed using random mapping and a linear time solver. All of the indicator subsets in each embedding simultaneously determine the target dataset. Finally, a general model is formulated from these indicator subsets, and a fast algorithm is derived that clusters all of the indicator subsets simultaneously, using the sum of the weighted distances of all indicators for an identical target object. Theoretical analysis and experimental verification show that the proposed algorithm, FctClus, is efficient and generalizable, with high clustering accuracy and fast computation speed.
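The embedding step in the abstract can be illustrated with a minimal sketch, assuming the standard random-projection construction of an approximate commute time embedding. It is not the authors' implementation: the function name, the `biadj` input (a target-by-attribute weight matrix for one bipartite graph), and the projection dimension `k` are hypothetical, and a dense pseudo-inverse stands in for the linear time solver so that the example stays self-contained.

```python
import numpy as np

def approx_commute_time_embedding(biadj, k=50, seed=0):
    """Embed the nodes of one bipartite graph so that squared Euclidean
    distances approximate commute times (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n_targets, n_attrs = biadj.shape
    n = n_targets + n_attrs

    # Edge list: target node i is linked to attribute node n_targets + j.
    rows, cols = np.nonzero(biadj)
    weights = biadj[rows, cols].astype(float)
    m = len(weights)

    # Signed edge-node incidence matrix B (m x n).
    B = np.zeros((m, n))
    B[np.arange(m), rows] = 1.0
    B[np.arange(m), n_targets + cols] = -1.0

    # Weighted graph Laplacian L = B^T W B and graph volume.
    L = B.T @ (weights[:, None] * B)
    volume = 2.0 * weights.sum()

    # Random mapping: k x m matrix with entries +-1/sqrt(k).
    Q = rng.choice([-1.0, 1.0], size=(k, m)) / np.sqrt(k)
    Y = Q @ (np.sqrt(weights)[:, None] * B)  # k x n

    # Assumption: pinv replaces the linear time solver for L z = y.
    Z = Y @ np.linalg.pinv(L)  # k x n

    # One k-dimensional row per node (targets first, then attributes).
    return np.sqrt(volume) * Z.T

# Usage: a path-shaped bipartite graph with 3 target and 2 attribute nodes.
biadj = np.array([[1, 1], [1, 0], [0, 1]])
embedding = approx_commute_time_embedding(biadj, k=200)
approx_commute = np.sum((embedding[1] - embedding[2]) ** 2)
```

On real sparse networks the dense matrices and the pseudo-inverse would be replaced by sparse storage and an iterative Laplacian solver; the dense version is used here only to keep the sketch short.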
Year: 2015 PMID: 26090857 PMCID: PMC4474961 DOI: 10.1371/journal.pone.0130086
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. The influence of k for clustering papers on s
Fig 2. The influence of k for clustering authors on s
Fig 3. The influence of u for clustering papers on s
Fig 4. The influence of u for clustering authors on s
Comparison of clustering accuracy (%).
| Target object & dataset | CIT | NetClus | ComClus | FctClus |
|---|---|---|---|---|
| Papers on | 73.91 | 71.54 | 72.83 | 78.87 |
| Authors on | 74.41 | 69.13 | 74.91 | 81.33 |
| Papers on | 70.84 | 71.28 | 72.93 | 76.36 |
| Authors on | 71.02 | 68.29 | 73.01 | 77.94 |
Comparison of computation time (s).
| Target object & dataset | CIT | NetClus | ComClus | FctClus |
|---|---|---|---|---|
| Papers on | 78.5 | 37.3 | 40.3 | 37.1 |
| Authors on | 79.8 | 36.9 | 39.8 | 38.3 |
| Papers on | 1469.3 | 802.6 | 827.3 | 808.4 |
| Authors on | 1484.7 | 743.7 | 781.4 | 774.9 |
Fig 5. A stability comparison of the 3 algorithms over 10 runs.
Distribution of running time for FctClus.
| Target object & dataset | Embedding time (s) | Clustering time (s) | Total time (s) |
|---|---|---|---|
| Papers on | 19.6 | 17.5 | 37.1 |
| Authors on | 18.1 | 20.2 | 38.3 |
| Papers on | 398.8 | 409.6 | 808.4 |
| Authors on | 382.4 | 392.5 | 774.9 |
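The running-time breakdown can be checked with a few lines of arithmetic. The sketch below copies the four rows of the table above, confirms that embedding time plus clustering time matches the reported total, and computes the fraction of the total spent on embedding. The row labels and the helper name `embedding_share` are illustrative only, since the dataset names are not given in this record.

```python
# (label, embedding time, clustering time, reported total), all in seconds.
TIMINGS = [
    ("Papers, row 1", 19.6, 17.5, 37.1),
    ("Authors, row 2", 18.1, 20.2, 38.3),
    ("Papers, row 3", 398.8, 409.6, 808.4),
    ("Authors, row 4", 382.4, 392.5, 774.9),
]

def embedding_share(embedding_s, clustering_s):
    """Fraction of the total running time spent on the embedding step."""
    return embedding_s / (embedding_s + clustering_s)

for label, emb, clu, total in TIMINGS:
    # The two stages should add up to the reported total (within rounding).
    consistent = abs(emb + clu - total) < 0.05
    print(f"{label}: embedding share = {embedding_share(emb, clu):.1%}, "
          f"total consistent = {consistent}")
```

In every row the two stages sum to the reported total, and each stage accounts for roughly half of the running time.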