Wenna Huang, Yong Peng, Yuan Ge, Wanzeng Kong.
Abstract
Kmeans clustering and spectral clustering are two popular methods for grouping similar data points according to their similarities. However, the performance of Kmeans clustering can be quite unstable because the cluster centroids are initialized randomly. Spectral clustering methods generally obtain the cluster assignment in two steps: spectral embedding, followed by discretization postprocessing. The postprocessing step can easily produce a solution that deviates far from the true discrete solution. In this paper, we build on the connection between Kmeans clustering and spectral clustering to propose a new Kmeans formulation, termed KMSR. KMSR jointly performs spectral embedding and spectral rotation, which is an effective postprocessing approach to discretization. Further, instead of directly using the dot-product data similarity measure, we generalize KMSR by incorporating more advanced data similarity measures, and we call the generalized model KMSR-G. We derive an efficient optimization method to solve the KMSR (KMSR-G) objective and provide its complexity and convergence analysis. Experiments on a wide range of benchmark datasets validate the proposed models, and the results show that our models perform better than the related methods in most cases.
©2021 Huang et al.
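A standard way to state the connection between Kmeans and spectral clustering is an identity about the Kmeans sum of squared errors (SSE). The SSE depends on the data only through the dot-product similarity K = XXᵀ: SSE = tr(K) − tr(FᵀKF), where F = Y(YᵀY)^(-1/2) is the scaled cluster-indicator matrix. If F is relaxed to any matrix with orthonormal columns, minimizing the SSE becomes an eigenvector problem, which is exactly a spectral embedding. The pure-Python sketch below checks the identity numerically on made-up toy data. The function names are illustrative and are not taken from the paper.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sse_direct(X, labels):
    """Kmeans objective: sum of squared distances from points to their cluster means."""
    total = 0.0
    for k in set(labels):
        members = [x for x, l in zip(X, labels) if l == k]
        mean = [sum(col) / len(members) for col in zip(*members)]
        for x in members:
            diff = [a - m for a, m in zip(x, mean)]
            total += dot(diff, diff)
    return total

def sse_from_gram(K, labels):
    """Same objective from the dot-product similarity K[i][j] = x_i . x_j:
    tr(K) - sum_k (1/n_k) * sum_{i,j in C_k} K[i][j], i.e. tr(K) - tr(F^T K F)."""
    n = len(K)
    total = sum(K[i][i] for i in range(n))
    for k in set(labels):
        idx = [i for i in range(n) if labels[i] == k]
        total -= sum(K[i][j] for i in idx for j in idx) / len(idx)
    return total

# Hypothetical toy data: two clusters in 2-D.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
labels = [0, 0, 0, 1, 1]
K = [[dot(a, b) for b in X] for a in X]  # dot-product (linear kernel) similarity
print(sse_direct(X, labels), sse_from_gram(K, labels))  # both ≈ 1.8333 (= 11/6)
```

Only K enters the second form. Swapping the dot product for another similarity, such as a kernel or a learned graph, therefore changes the objective without changing the rest of the formulation. Loosely, this is the kind of generalization KMSR-G makes.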
Keywords: Data similarity; Kmeans clustering; Spectral clustering; Spectral rotation
Year: 2021 PMID: 33834111 PMCID: PMC8022527 DOI: 10.7717/peerj-cs.450
Source DB: PubMed Journal: PeerJ Comput Sci ISSN: 2376-5992
Figure 1. The framework of the proposed models, KMSR and KMSR-G.
Figure 2. Illustration of the procedure for updating Y (Chen et al., 2017).
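Figure 2 concerns the Y-update step. For orientation, here is a minimal sketch of the generic spectral rotation alternation that this kind of discretization builds on. Given a spectral embedding F (n × c), it minimizes ‖Y − FR‖²_F over a one-hot indicator Y and an orthogonal R:

- With R fixed, each row of Y is the argmax of the corresponding row of FR.
- With Y fixed, R is the orthogonal Procrustes solution, the polar factor UVᵀ of FᵀY.

This is not the Chen et al. (2017) procedure shown in Figure 2. The identity initialization of R and the Newton polar iteration are simplifications chosen to keep the code dependency-free.

```python
import math

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def inverse(A, eps=1e-12):
    """Gauss-Jordan inverse with partial pivoting; raises ValueError if A is singular."""
    n = len(A)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < eps:
            raise ValueError("singular matrix")
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def polar_factor(M, iters=50):
    """Orthogonal polar factor U V^T of a nonsingular M via Newton's iteration
    X <- (X + X^{-T}) / 2; it is the orthogonal R maximizing tr(M^T R)."""
    X = [list(map(float, row)) for row in M]
    for _ in range(iters):
        Xinv_T = transpose(inverse(X))
        X = [[(a + b) / 2 for a, b in zip(r1, r2)] for r1, r2 in zip(X, Xinv_T)]
    return X

def spectral_rotation(F, c, iters=20):
    """Alternately minimize ||Y - F R||_F^2 over one-hot Y and orthogonal R."""
    R = [[1.0 if i == j else 0.0 for j in range(c)] for i in range(c)]  # simple start
    labels = None
    for _ in range(iters):
        G = matmul(F, R)
        new = [max(range(c), key=lambda j: row[j]) for row in G]  # Y-step: row-wise argmax
        if new == labels:
            break
        labels = new
        Y = [[1.0 if j == l else 0.0 for j in range(c)] for l in labels]
        try:
            R = polar_factor(matmul(transpose(F), Y))  # R-step: orthogonal Procrustes
        except ValueError:  # an empty cluster makes F^T Y singular; keep current R
            break
    return labels, R

# Hypothetical embedding: scaled indicator of two 3-point clusters, rotated by 30 degrees.
theta = math.radians(30)
Q = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]
s = 1 / math.sqrt(3)
base = [[s, 0.0] for _ in range(3)] + [[0.0, s] for _ in range(3)]
F = matmul(base, Q)
labels, R = spectral_rotation(F, 2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

If the embedding is strongly rotated, the identity start can leave a cluster empty, which makes FᵀY singular. The sketch simply stops in that case. Practical implementations use a better initialization for R.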
Basic characteristics of the twelve data sets used in the experiments.
| Data Sets | # Samples | # Dimensions | # Clusters |
|---|---|---|---|
| ecoli | 327 | 7 | 5 |
| abalone | 4177 | 8 | 3 |
| scale | 625 | 4 | 3 |
| COIL20 | 1440 | 1024 | 20 |
| umist | 575 | 644 | 20 |
| AT&T | 400 | 189 | 40 |
| YaleB | 2414 | 1024 | 38 |
| Yale | 165 | 105 | 15 |
| PIE | 1428 | 1024 | 68 |
| AR | 1200 | 261 | 100 |
| MNIST | 1000 | 784 | 10 |
| jaffe | 212 | 177 | 7 |
Clustering performance (%) of Kmeans and KMSR on the benchmark data sets.
| Data Sets | Acc (KMSR) | NMI (KMSR) | Purity (KMSR) |
|---|---|---|---|
| ecoli | 65.50 ± 6.85 | 58.70 ± 2.27 | 78.46 ± 1.50 |
| abalone | 51.55 ± 0.39 | 13.23 ± 0.63 | 53.50 ± 0.03 |
| scale | 55.31 ± 5.66 | 17.42 ± 8.74 | 69.72 ± 5.33 |
| COIL20 | 56.06 ± 4.94 | 70.51 ± 2.18 | 59.64 ± 4.34 |
| umist | 39.51 ± 2.07 | 58.33 ± 1.80 | 46.51 ± 1.54 |
| AT&T | 52.94 ± 4.21 | 73.06 ± 2.51 | 61.25 ± 3.41 |
| YaleB | 8.49 ± 0.70 | 10.46 ± 1.00 | 9.22 ± 0.73 |
| Yale | 39.09 ± 4.61 | 45.19 ± 3.61 | 41.36 ± 3.92 |
| PIE | 33.56 ± 1.97 | 66.40 ± 0.92 | 39.22 ± 1.27 |
| AR | 15.25 ± 0.31 | 48.43 ± 0.65 | 16.18 ± 0.39 |
| MNIST | 52.04 ± 3.16 | 49.83 ± 1.93 | 55.83 ± 2.63 |
| jaffe | 27.88 ± 2.92 | 12.51 ± 3.02 | 29.62 ± 3.38 |
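The tables report clustering accuracy (Acc), normalized mutual information (NMI) and purity, all in percent. These metrics are not defined in this excerpt. The sketch below uses common textbook definitions, which may differ in detail from those the authors used; for example, NMI has several normalization variants, and this sketch uses the geometric mean of the two entropies. Accuracy is computed by brute force over cluster-to-class mappings, which only suits a handful of clusters. A real evaluation would use the Hungarian algorithm instead.

```python
import math
from collections import Counter
from itertools import permutations

def purity(truth, pred):
    """Fraction of points whose cluster's majority true class equals their own class."""
    total = 0
    for k in set(pred):
        total += Counter(t for t, p in zip(truth, pred) if p == k).most_common(1)[0][1]
    return total / len(truth)

def accuracy(truth, pred):
    """Accuracy under the best one-to-one mapping of clusters to classes.
    Brute force over permutations; assumes no more clusters than classes."""
    classes, clusters = sorted(set(truth)), sorted(set(pred))
    best = 0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        best = max(best, sum(mapping[p] == t for t, p in zip(truth, pred)))
    return best / len(truth)

def nmi(truth, pred):
    """Mutual information normalized by sqrt(H(truth) * H(pred))."""
    n = len(truth)
    ct, cp, joint = Counter(truth), Counter(pred), Counter(zip(truth, pred))
    mi = sum(c / n * math.log(c * n / (ct[t] * cp[p])) for (t, p), c in joint.items())
    ht = -sum(c / n * math.log(c / n) for c in ct.values())
    hp = -sum(c / n * math.log(c / n) for c in cp.values())
    return mi / math.sqrt(ht * hp) if ht > 0 and hp > 0 else 0.0

# Hypothetical labels: one point of class 0 is put in the cluster of class 1.
truth = [0, 0, 0, 1, 1, 1]
pred = [1, 1, 0, 0, 0, 0]
print(100 * accuracy(truth, pred), 100 * purity(truth, pred), 100 * nmi(truth, pred))
```

Here Acc and purity are both 5/6. Cluster IDs need not match class IDs; both metrics are invariant to relabeling.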
Figure 3. Box plots of the clustering results obtained by Kmeans and KMSR.
(A) Acc. (B) NMI. (C) Purity.
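Box plots such as those in Figure 3 summarize the spread of results across runs. The abstract attributes the instability of Kmeans to random centroid initialization. The following pure-Python sketch, on hypothetical data, shows how that alone can produce such spread. The four points sit at the corners of a 4 × 1 rectangle. Depending only on the starting centroids, Lloyd's iterations converge either to a left/right split (SSE = 1) or to a top/bottom split (SSE = 16).

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lloyd(X, centroids, max_iter=100):
    """Plain Lloyd iterations from the given initial centroids; returns (labels, SSE)."""
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        new = [min(range(len(centroids)), key=lambda k: sq_dist(x, centroids[k])) for x in X]
        if new == labels:
            break
        labels = new
        for k in range(len(centroids)):
            members = [x for x, l in zip(X, labels) if l == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(sq_dist(x, centroids[l]) for x, l in zip(X, labels))
    return labels, sse

def kmeans_random_init(X, k, seed):
    """Kmeans with centroids initialized as k randomly chosen data points."""
    rng = random.Random(seed)
    return lloyd(X, rng.sample(X, k))

X = [[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]]
print(lloyd(X, [X[0], X[1]]))  # ([0, 1, 0, 1], 16.0): top/bottom split, a poor local optimum
print(lloyd(X, [X[0], X[2]]))  # ([0, 0, 1, 1], 1.0): left/right split
print(sorted({kmeans_random_init(X, 2, s)[1] for s in range(20)}))
```

The last line collects the final SSE over 20 random seeds. Which of the two optima appear depends on the seeds, so no output is claimed for it.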
Clustering performance (%) of KMSR-G and related models on twelve data sets.
| Data Sets | NCut+KM | NCut+SR | RCut+KM | RCut+SR | KMSR-G |
|---|---|---|---|---|---|
| Acc | | | | | |
| ecoli | 73.70 ± 0.55 | 75.26 ± 1.04 | 75.15 ± 5.25 | 77.84 ± 1.49 | |
| abalone | 49.42 ± 3.85 | 50.10 ± 2.68 | 48.87 ± 4.30 | 50.97 ± 0.00 | |
| scale | 65.61 ± 0.31 | 66.15 ± 0.50 | 63.37 ± 3.08 | 65.17 ± 1.61 | |
| COIL20 | 72.46 ± 4.80 | 76.58 ± 5.16 | 71.45 ± 4.76 | 77.77 ± 3.31 | |
| umist | 50.77 ± 2.10 | 53.97 ± 1.98 | 51.32 ± 3.86 | 55.29 ± 1.56 | |
| AT&T | 62.70 ± 2.15 | 66.30 ± 2.53 | 62.38 ± 1.25 | 63.46 ± 1.01 | |
| YaleB | 32.69 ± 1.19 | 43.47 ± 0.33 | 33.62 ± 1.31 | 37.76 ± 0.03 | |
| Yale | 42.55 ± 0.66 | 42.61 ± 1.73 | 41.64 ± 1.56 | 43.42 ± 1.60 | |
| PIE | 82.56 ± 3.13 | 84.93 ± 3.16 | 81.55 ± 2.07 | 84.82 ± 3.96 | |
| AR | 16.65 ± 0.23 | 16.80 ± 0.15 | 16.56 ± 0.26 | 16.92 ± 0.11 | |
| MNIST | 49.29 ± 3.46 | 50.01 ± 1.45 | 49.70 ± 2.33 | 50.75 ± 1.24 | |
| jaffe | 24.43 ± 0.60 | 25.68 ± 0.24 | 24.62 ± 1.12 | 25.94 ± 0.48 | |
| NMI | |||||
| ecoli | 55.94 ± 0.00 | 56.75 ± 0.47 | 57.86 ± 1.31 | 58.37 ± 0.19 | |
| abalone | 10.05 ± 4.54 | 11.42 ± 3.06 | 9.47 ± 4.54 | 12.41 ± 0.00 | |
| scale | 34.23 ± 0.35 | 34.64 ± 0.46 | 32.26 ± 3.88 | 34.13 ± 1.75 | |
| COIL20 | 84.50 ± 1.81 | 87.66 ± 2.39 | 84.91 ± 2.41 | 87.78 ± 2.43 | |
| umist | 71.14 ± 1.14 | 72.66 ± 1.17 | 70.37 ± 2.01 | 73.17 ± 0.30 | |
| AT&T | 79.04 ± 0.80 | 80.39 ± 1.20 | 78.95 ± 0.63 | 79.20 ± 0.55 | |
| YaleB | 43.10 ± 0.80 | 41.41 ± 0.91 | 50.00 ± 0.05 | 44.37 ± 0.09 | |
| Yale | 49.94 ± 0.73 | 50.29 ± 1.16 | 49.18 ± 1.27 | 50.95 ± 1.00 | |
| PIE | 94.67 ± 1.08 | 95.12 ± 1.10 | 93.97 ± 0.97 | 95.03 ± 1.41 | |
| AR | 51.84 ± 0.22 | 50.59 ± 0.38 | 52.11 ± 0.28 | 52.12 ± 0.32 | |
| MNIST | 53.78 ± 2.21 | 55.47 ± 0.95 | 52.68 ± 1.12 | 53.69 ± 1.11 | |
| jaffe | 8.71 ± 0.82 | 9.40 ± 0.22 | 9.48 ± 0.88 | 9.69 ± 0.40 | |
| Purity | |||||
| ecoli | 74.68 ± 0.14 | 75.72 ± 0.61 | 77.91 ± 0.29 | 78.12 ± 0.29 | |
| abalone | 50.54 ± 4.42 | 51.93 ± 3.31 | 49.89 ± 4.90 | 53.00 ± 0.00 | |
| scale | 78.74 ± 0.38 | 78.91 ± 0.55 | 77.41 ± 1.35 | 78.17 ± 0.73 | |
| COIL20 | 75.23 ± 4.13 | 81.32 ± 5.24 | 76.56 ± 4.05 | 82.67 ± 3.33 | |
| umist | 61.96 ± 2.12 | 63.31 ± 1.97 | 60.16 ± 3.21 | 64.33 ± 1.66 | |
| AT&T | 66.00 ± 1.79 | 68.53 ± 2.86 | 65.75 ± 1.32 | 67.90 ± 1.02 | |
| YaleB | 34.72 ± 1.03 | 45.28 ± 0.35 | 35.65 ± 1.04 | 39.08 ± 0.03 | |
| Yale | 42.91 ± 0.51 | 43.00 ± 1.14 | 42.52 ± 1.78 | 43.42 ± 1.60 | |
| PIE | 85.51 ± 2.82 | 85.53 ± 3.24 | 82.46 ± 2.53 | 85.46 ± 3.96 | |
| AR | 17.29 ± 0.23 | 17.32 ± 0.13 | 17.50 ± 0.25 | 17.56 ± 0.17 | |
| MNIST | 55.74 ± 2.62 | 58.00 ± 2.81 | 55.55 ± 1.70 | 57.55 ± 1.59 | |
| jaffe | 24.65 ± 0.88 | 25.68 ± 0.24 | 24.79 ± 1.36 | 26.44 ± 0.50 | |
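Each baseline column pairs a graph-cut relaxation with a discretization step. In the usual formulation, NCut and RCut relax the normalized cut and the ratio cut. These correspond to the symmetric normalized Laplacian I − D^(-1/2)WD^(-1/2) and the unnormalized Laplacian D − W. The suffixes +KM and +SR then discretize the resulting eigenvector embedding with Kmeans or with spectral rotation, respectively. The sketch below builds both Laplacians from a dense Gaussian affinity and checks their known null vectors. The Gaussian kernel, σ = 1 and the toy data are assumptions; this excerpt does not specify the affinity used in the experiments. The eigenvectors themselves would in practice be computed with a linear-algebra library.

```python
import math

def gaussian_affinity(X, sigma=1.0):
    """Dense heat-kernel similarity W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), zero diagonal."""
    n = len(X)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
                W[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return W

def ratio_cut_laplacian(W):
    """Unnormalized Laplacian L = D - W (RCut relaxation)."""
    n = len(W)
    deg = [sum(row) for row in W]
    return [[(deg[i] if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]

def normalized_cut_laplacian(W):
    """Symmetric normalized Laplacian I - D^{-1/2} W D^{-1/2} (NCut relaxation)."""
    n = len(W)
    inv_sqrt = [1 / math.sqrt(sum(row)) for row in W]
    return [[(1.0 if i == j else 0.0) - inv_sqrt[i] * W[i][j] * inv_sqrt[j]
             for j in range(n)] for i in range(n)]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

# Hypothetical data: two tight pairs of points far apart.
X = [[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9]]
W = gaussian_affinity(X)
L_rcut = ratio_cut_laplacian(W)
L_ncut = normalized_cut_laplacian(W)
print(matvec(L_rcut, [1.0] * 4))                        # ~0 vector: L 1 = 0
print(matvec(L_ncut, [math.sqrt(sum(r)) for r in W]))   # ~0 vector: L_sym D^{1/2} 1 = 0
```

The two Laplacians share the same graph but weight vertices differently. This is why the NCut and RCut embeddings, and hence their discretized results, can differ.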
Figure 4. Decrease of the KMSR-G objective function value over iterations (λ = 0.1).
(A) abalone. (B) scale. (C) PIE. (D) YaleB. (E) jaffe. (F) umist.
Figure 5. Accuracy of KMSR-G under different settings of λ.
(A) abalone. (B) scale. (C) AT&T. (D) PIE. (E) Yale. (F) YaleB. (G) MNIST. (H) umist.
Figure 6. Number of iterations required to update Y on the AT&T and COIL20 data sets.
(A) AT&T. (B) COIL20.
Clustering performance (%) of KMSR-G using the graphs learned by CAN and PCAN.
| Data Sets | CAN | KMSR-GC | PCAN | KMSR-GPC |
|---|---|---|---|---|
| Acc | ||||
| ecoli | 81.96 | 81.96 | ||
| abalone | 50.97 | 50.97 | ||
| scale | 53.28 | 66.72 | ||
| COIL20 | 83.96 | 81.81 | ||
| umist | 69.04 | 54.78 | ||
| AT&T | 55.25 | 60.00 | ||
| YaleB | 37.12 | 38.69 | ||
| Yale | 41.21 | 40.00 | ||
| PIE | 100.0 | 100.0 | ||
| AR | 13.67 | 13.33 | ||
| MNIST | 45.80 | 45.80 | ||
| jaffe | 22.64 | 25.00 | ||
| NMI | ||||
| ecoli | 65.74 | 66.32 | ||
| abalone | 12.49 | 12.41 | ||
| scale | 17.80 | 17.04 | ||
| COIL20 | 91.34 | 89.60 | ||
| umist | 81.72 | 65.02 | ||
| AT&T | 73.88 | 74.13 | ||
| YaleB | 39.98 | 39.99 | ||
| Yale | 44.95 | 42.64 | ||
| PIE | 100.0 | 100.0 | ||
| AR | 42.93 | 35.88 | ||
| MNIST | 49.87 | 46.35 | ||
| jaffe | 10.80 | 11.10 | ||
| Purity | ||||
| ecoli | 82.87 | 82.87 | ||
| abalone | 53.08 | 53.00 | ||
| scale | 67.20 | 70.08 | ||
| COIL20 | 87.22 | 86.11 | ||
| umist | 74.78 | 62.78 | ||
| AT&T | 64.25 | 67.50 | ||
| YaleB | 39.73 | 40.56 | ||
| Yale | 43.03 | 42.42 | ||
| PIE | 100.0 | 100.0 | ||
| AR | 16.83 | 17.17 | ||
| MNIST | 50.90 | 51.10 | ||
| jaffe | 23.11 | 25.94 | ||
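CAN and PCAN learn the similarity graph instead of fixing it in advance. To illustrate the kind of graph involved, this sketch computes closed-form adaptive-neighbor weights of the sort commonly used to initialize CAN (clustering with adaptive neighbors). Each point spreads one unit of similarity over its k nearest neighbors, with weights decreasing linearly in squared distance. This is a hedged sketch based on our understanding of that method. CAN's subsequent rank-constrained refinement and PCAN's projection step are omitted, and the data and k are made up.

```python
def can_neighbor_weights(X, k):
    """Adaptive-neighbor similarities: for point i with sorted squared distances
    d_1 <= ... <= d_{k+1} to the other points,
    s_ij = (d_{k+1} - d_j) / (k * d_{k+1} - sum_{h<=k} d_h) for its k nearest neighbors,
    and 0 otherwise, so each row sums to 1. Requires k < len(X) - 1."""
    n = len(X)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        dists = sorted((sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
                       for j in range(n) if j != i)
        d_next = dists[k][0]  # squared distance to the (k+1)-th neighbor
        denom = k * d_next - sum(d for d, _ in dists[:k])
        for d, j in dists[:k]:
            # Ties among all k+1 distances make denom 0; fall back to uniform weights.
            S[i][j] = (d_next - d) / denom if denom > 0 else 1.0 / k
    return S

# Hypothetical 1-D data with k = 2 neighbors per point.
S = can_neighbor_weights([[0.0], [1.0], [3.0], [10.0]], 2)
for row in S:
    print([round(v, 4) for v in row])
```

The resulting S is generally not symmetric. A symmetrized version such as (S + Sᵀ)/2 is the usual input for a graph Laplacian.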