Wenfen Liu¹,²,³, Mao Ye⁴, Jianghong Wei³, Xuexian Hu³
Abstract
Constrained spectral clustering (CSC) can greatly improve clustering accuracy by incorporating constraint information into spectral clustering, and it has therefore attracted wide academic attention. In this paper, we propose a fast CSC algorithm that encodes landmark-based graph construction into a new CSC model and applies random sampling to reduce the data size after spectral embedding. As its model size increases, the new algorithm's results asymptotically approach those of the original model; compared with the most efficient known CSC algorithm, the new algorithm runs faster and is suitable for a wider range of data sets. We also propose a scalable semisupervised cluster ensemble algorithm that combines our fast CSC algorithm with dimensionality reduction by random projection in the spectral ensemble clustering process. Theoretical analysis and empirical results demonstrate that the new cluster ensemble algorithm has advantages in both efficiency and effectiveness. Furthermore, the approximate preservation of clustering accuracy under random projection, which we prove for the consensus clustering stage, also holds for weighted k-means clustering, and thus provides a theoretical guarantee for this special kind of k-means clustering in which each point has its own weight.
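The abstract's final claim concerns weighted k-means under random projection. The sketch below illustrates the property involved under stated assumptions: a dense Gaussian projection matrix scaled by 1/sqrt(r) (the record does not say which random projection the authors use), synthetic data, and the hypothetical helper names `weighted_kmeans_cost` and `gaussian_random_projection`. For one fixed partition it compares the weighted k-means objective, the sum of w_i‖x_i - μ_c(i)‖² with weighted centroids μ_c, before and after projection.

```python
import numpy as np


def weighted_kmeans_cost(X, labels, weights, k):
    """Weighted k-means objective of a fixed partition:
    sum_i w_i * ||x_i - mu_c(i)||^2, where mu_c is the weighted centroid of cluster c."""
    cost = 0.0
    for c in range(k):
        mask = labels == c
        if not mask.any():
            continue
        w = weights[mask]
        mu = (w[:, None] * X[mask]).sum(axis=0) / w.sum()
        cost += float((w * ((X[mask] - mu) ** 2).sum(axis=1)).sum())
    return cost


def gaussian_random_projection(X, r, rng):
    """Map rows of X from d to r dimensions with a d x r Gaussian matrix scaled by 1/sqrt(r),
    so squared lengths are preserved in expectation (assumed projection type)."""
    R = rng.standard_normal((X.shape[1], r)) / np.sqrt(r)
    return X @ R


rng = np.random.default_rng(0)
n, d, k, r = 2000, 500, 5, 100
labels = rng.integers(0, k, size=n)                    # a fixed partition of the points
X = rng.normal(scale=5.0, size=(k, d))[labels] + rng.normal(size=(n, d))
weights = rng.uniform(0.5, 2.0, size=n)                # one positive weight per point

Y = gaussian_random_projection(X, r, rng)
ratio = weighted_kmeans_cost(Y, labels, weights, k) / weighted_kmeans_cost(X, labels, weights, k)
print(f"projected / original weighted k-means cost: {ratio:.3f}")
```

Because a weighted centroid is a linear combination of its points, the centroid of the projected points equals the projection of the original centroid. The projected cost is therefore a weighted sum of projected squared distances, and the printed ratio concentrates near 1 as r grows.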
Year: 2017; PMID: 29312447; PMCID: PMC5632995; DOI: 10.1155/2017/2658707
Source DB: PubMed; Journal: Comput Intell Neurosci
Algorithm 1. Fast constrained spectral clustering.
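Only the caption of Algorithm 1 survives in this record. The sketch below illustrates its two speed-ups, landmark-based graph construction and random sampling after spectral embedding, on plain unconstrained spectral clustering; it does not encode any constraint information, so it is not the authors' CSC model. Every design choice is an assumption: landmarks drawn uniformly from the data, a Gaussian kernel restricted to each point's r nearest landmarks with a data-driven bandwidth, row normalization of the embedding, k-means++ seeding, and the hypothetical names `landmark_embedding` and `fast_landmark_spectral_clustering`.

```python
import numpy as np


def kmeans_centers(X, k, rng, n_iter=50):
    """Lloyd's k-means with k-means++ seeding; returns the k centers."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest already-chosen center.
        d2 = ((X[:, None, :] - np.array(centers)[None]) ** 2).sum(axis=2).min(axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers


def landmark_embedding(X, k, p, r, rng):
    """Spectral embedding from a point-to-landmark graph (hypothetical helper)."""
    n = len(X)
    landmarks = X[rng.choice(n, size=p, replace=False)]              # assumption: uniform sampling
    d2 = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=2)  # n x p squared distances
    nearest = np.argsort(d2, axis=1)[:, :r]                          # r nearest landmarks per point
    rows = np.arange(n)[:, None]
    sigma2 = d2[rows, nearest].mean()                                # assumption: data-driven bandwidth
    Z = np.zeros((n, p))
    Z[rows, nearest] = np.exp(-d2[rows, nearest] / (2 * sigma2))
    Z /= Z.sum(axis=1, keepdims=True) + 1e-12                        # each row sums to (about) 1
    Zhat = Z / np.sqrt(Z.sum(axis=0, keepdims=True) + 1e-12)         # scale columns by D^{-1/2}
    # Top-k left singular vectors of Zhat are the top-k eigenvectors of W = Zhat Zhat^T.
    U, _, _ = np.linalg.svd(Zhat, full_matrices=False)
    return U[:, :k]


def fast_landmark_spectral_clustering(X, k, p=50, r=5, sample_rate=0.2, seed=0):
    """Landmark embedding, k-means on a random sample, then nearest-center assignment."""
    rng = np.random.default_rng(seed)
    E = landmark_embedding(X, k, p, r, rng)
    E /= np.linalg.norm(E, axis=1, keepdims=True) + 1e-12            # assumption: row normalization
    m = max(k, int(sample_rate * len(X)))
    sample = rng.choice(len(X), size=m, replace=False)                # random sampling after embedding
    centers = kmeans_centers(E[sample], k, rng)
    # Every point, sampled or not, goes to its nearest center in the embedding.
    return ((E[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)


# Usage on three well-separated synthetic blobs.
rng_demo = np.random.default_rng(42)
truth = np.repeat(np.arange(3), 1000)
X = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])[truth] + 0.5 * rng_demo.standard_normal((3000, 2))
pred = fast_landmark_spectral_clustering(X, k=3)
```

The n × n graph W = ẐẐᵀ is never formed: the n × p matrix Ẑ is decomposed directly, and k-means runs only on the sampled rows, so `sample_rate` trades accuracy for speed.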
Algorithm 2. Spectral ensemble clustering with random projection.
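Algorithm 2 is likewise only named in this record. The sketch below is a hypothetical illustration, not the authors' algorithm. It assumes the consensus step of spectral ensemble clustering is posed as weighted k-means on the rows of the binary membership matrix B, each row b_i scaled by 1/d_i and weighted by d_i, where d_i is the i-th row sum of the co-association matrix S = BBᵀ; it then inserts a Gaussian random projection to r_d dimensions before that step. The function names, the projection type, and the restart strategy are all assumptions.

```python
import numpy as np


def membership_matrix(base_labels):
    """Stack one-hot encodings of m base clusterings into B (n x (K_1 + ... + K_m))."""
    blocks = []
    for lab in base_labels:
        _, inv = np.unique(np.asarray(lab), return_inverse=True)
        blocks.append(np.eye(inv.max() + 1)[inv])
    return np.hstack(blocks)


def weighted_kmeans(X, w, k, rng, n_init=10, n_iter=100):
    """Weighted Lloyd's k-means with weighted k-means++ seeding; keeps the lowest-cost run."""
    best_cost, best_labels = np.inf, None
    for _init in range(n_init):
        centers = [X[rng.choice(len(X), p=w / w.sum())]]
        for _ in range(1, k):
            d2 = ((X[:, None, :] - np.array(centers)[None]) ** 2).sum(axis=2).min(axis=1)
            centers.append(X[rng.choice(len(X), p=w * d2 / (w * d2).sum())])
        centers = np.array(centers)
        for _ in range(n_iter):
            d2 = ((X[:, None, :] - centers[None]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            new = centers.copy()
            for c in range(k):
                mask = labels == c
                if mask.any():
                    new[c] = (w[mask][:, None] * X[mask]).sum(axis=0) / w[mask].sum()
            if np.allclose(new, centers):
                break
            centers = new
        cost = float((w * d2.min(axis=1)).sum())           # weighted cost of this run's labels
        if cost < best_cost:
            best_cost, best_labels = cost, labels
    return best_labels


def sec_random_projection(base_labels, k, r_d, seed=0):
    """Hypothetical consensus step: assumed weighted k-means reduction plus random projection."""
    rng = np.random.default_rng(seed)
    B = membership_matrix(base_labels)                     # n x sum(K_i), binary
    d = B @ B.sum(axis=0)                                  # row sums of S = B B^T; S is never formed
    Z = B / d[:, None]                                     # assumed data points: b_i / d_i
    R = rng.standard_normal((B.shape[1], r_d)) / np.sqrt(r_d)  # Gaussian random projection
    return weighted_kmeans(Z @ R, d, k, rng)               # assumed weights: d_i


# Usage: ten noisy copies of a 3-cluster ground truth serve as base clusterings.
rng_demo = np.random.default_rng(1)
truth = np.repeat(np.arange(3), 100)
base = []
for _ in range(10):
    lab = truth.copy()
    flip = rng_demo.random(300) < 0.1
    lab[flip] = rng_demo.integers(0, 3, size=flip.sum())
    base.append(rng_demo.permutation(3)[lab])              # arbitrary label ids per base clustering
pred = sec_random_projection(base, k=3, r_d=10)
```

Computing the degrees as B(Bᵀ1) avoids forming the n × n co-association matrix, so memory grows with n times the total number of base clusters rather than with n².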
Data set information.
| Data set | #instances | #attributes | #classes |
|---|---|---|---|
| Letter recognition | 20,000 | 16 | 26 |
| MNIST | 70,000 | 784 | 10 |
| CoverType | 581,012 | 54 | 7 |
Figure 1. Performance of clustering algorithms with different constraint information.
Figure 2. Influence of sample rates on the proposed algorithms.
Figure 3. Performance of ensemble clustering algorithms with different constraint information.
Figure 4. Performance of ensemble clustering algorithms with different dimensions.
Decrease in running time of SECRP relative to SEC for different dimensions r_d (columns).

| Data set | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 |
|---|---|---|---|---|---|---|---|---|---|---|
| LetterRec | 2.44 | 2.39 | 2.11 | 2.07 | 2.04 | 2.03 | 1.92 | 1.74 | 1.67 | 1.56 |
| MNIST | 2.76 | 2.68 | 2.66 | 2.58 | 2.51 | 2.34 | 2.31 | 2.11 | 2.16 | 2.03 |
| CoverType | 18.85 | 18.64 | 15.34 | 15.26 | 14.04 | 11.43 | 9.73 | 8.31 | 7.72 | 7.44 |