Wei-Chang Yeh, Chyh-Ming Lai.
Abstract
Data clustering is commonly employed in many disciplines. The aim of clustering is to partition a set of data into clusters such that objects within the same cluster are similar to one another and dissimilar to objects in other clusters. Over the past decade, evolutionary algorithms have been commonly used to solve clustering problems. This study presents a novel algorithm based on simplified swarm optimization (SSO), an emerging population-based stochastic optimization approach with the advantages of simplicity, efficiency, and flexibility. The approach combines variable vibrating search (VVS) and a rapid centralized strategy (RCS) to address the clustering problem. VVS is an exploitation search scheme that refines the quality of solutions by searching the extreme points near the global best position. RCS is developed to accelerate the convergence rate of the algorithm by using the arithmetic average. To empirically evaluate the performance of the proposed algorithm, experiments are conducted on 12 benchmark datasets, and the results are compared with those of recent works. Statistical analysis indicates that the proposed algorithm is competitive in terms of solution quality.
Year: 2015 PMID: 26348483 PMCID: PMC4562660 DOI: 10.1371/journal.pone.0137246
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
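As an illustrative aside on the clustering objective summarized in the abstract: the result tables report SICD, assumed here to denote the sum of intra-cluster distances (each object's Euclidean distance to its nearest cluster centroid, summed over all objects). The Python sketch below evaluates that quantity for a centroid-based solution. The function names are hypothetical, and the `recenter` step is only a K-means-style illustration of moving centroids to an arithmetic average; it is not claimed to be the authors' RCS procedure.

```python
import math

def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]

def sicd(points, centroids):
    """Sum of each point's Euclidean distance to its nearest centroid."""
    return sum(min(math.dist(p, c) for c in centroids) for p in points)

def recenter(points, centroids):
    """Move each centroid to the arithmetic mean of its assigned points.

    Assumption: a K-means-style step used only for illustration.
    """
    labels = assign(points, centroids)
    updated = []
    for j, c in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            updated.append(c)  # leave the centroid of an empty cluster unchanged
            continue
        updated.append(tuple(sum(m[d] for m in members) / len(members)
                             for d in range(len(c))))
    return updated

# Toy data: two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(1.0, 1.0), (9.0, 1.0)]

before = sicd(points, centroids)
after = sicd(points, recenter(points, centroids))
print(before, after)
```

On this toy input the averaging step lowers the objective, but that is not guaranteed in general, because the arithmetic mean minimizes squared distances rather than plain distances.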
Fig 1. Example of a solution string.
Fig 2. Amplitude of V(t) over the iterations when v = 5.
Fig 3. Particle movement behavior under the accelerated strategy.
Fig 4. Flowchart of the VSSO-RCS algorithm.
Characteristics of the considered data sets.
| Categorization | Dataset | Features | Instances | Clusters |
|---|---|---|---|---|
| Small | Vowel | 3 | 871 | 6 |
| | Iris | 4 | 150 | 3 |
| | Crude oil | 5 | 56 | 3 |
| | Cancer | 9 | 683 | 2 |
| | CMC | 9 | 1473 | 3 |
| | Glass | 9 | 214 | 6 |
| | MG Telescope | 10 | 19020 | 2 |
| | Wine | 13 | 178 | 3 |
| | EGG eye | 15 | 14980 | 2 |
| Medium | WDBC | 30 | 569 | 2 |
| | Ionosphere | 34 | 351 | 2 |
| Large | Sonar | 60 | 208 | 2 |
Value of parameters in six algorithms.
| Parameter | KSRPSO | SSO-based | VSSO-RCS | GSA-KM |
|---|---|---|---|---|
| | 0.5, 2.5 | - | - | - |
| | 0.5× | - | - | - |
| | 0.2, 0.2, 0.8 | - | - | - |
| | - | 0.1, 0.4, 0.9 | 0.2, -, 0.9 | - |
| | - | - | 10, 0.1 | - |
| | - | - | - | 100, 20 |
a No parameters need to be set in BH [3].
Results of three accelerated strategies.
| Dataset | Criteria | KM | SSO | SSO-KMO | SSO-OFK | SSO-RCS |
|---|---|---|---|---|---|---|
| Cancer | Best | 2,986.96 | 2,965.13 | 2,965.56 | 2,964.82 | 2,964.93 |
| | Avg. | 2,987.84 | 2,967.87 | 2,967.11 | 2,966.65 | |
| | Worst | 2,988.43 | 2,975.96 | 2,969.20 | 2,970.40 | 2,967.93 |
| | Std. | 0.73 | 2.55 | 1.05 | 1.40 | 0.82 |
| | CT(s) | 1.45 | 5.52 | 10.84 | 5.79 | 5.83 |
| Glass | Best | 213.24 | 213.52 | 211.53 | 211.70 | 211.34 |
| | Avg. | 223.58 | 223.85 | 220.55 | 214.71 | |
| | Worst | 253.83 | 240.66 | 243.43 | 218.57 | 215.60 |
| | Std. | 10.21 | 9.74 | 10.00 | 1.65 | 1.01 |
| | CT(s) | 2.75 | 45.64 | 85.25 | 48.05 | 48.42 |
| INSP | Best | 796.33 | 811.75 | 795.90 | 795.42 | 795.13 |
| | Avg. | 796.40 | 817.62 | 796.23 | 796.10 | |
| | Worst | 796.47 | 824.36 | 796.66 | 796.33 | 795.59 |
| | Std. | 0.07 | 3.72 | 0.28 | 0.28 | 0.15 |
| | CT(s) | 1.88 | 110.06 | 197.01 | 108.65 | 114.02 |
| Sonar | Best | 234.77 | 235.25 | 240.00 | 234.72 | 234.51 |
| | Avg. | 235.10 | 238.32 | 240.42 | 235.06 | |
| | Worst | 235.21 | 240.32 | 241.66 | 235.21 | 234.60 |
| | Std. | 0.15 | 2.33 | 0.50 | 0.20 | 0.03 |
| | CT(s) | 2.29 | 664.94 | 693.93 | 432.56 | 425.46 |
Fig 5. Convergence comparison of different accelerated strategies.
Results obtained by the algorithms on Vowel, Iris, Crude oil and Cancer datasets.
| Dataset | Criteria | SSO | SSO-RCS | SSO-ELS | BH | KSRPSO | GSA-KM | VSSO-RCS |
|---|---|---|---|---|---|---|---|---|
| Vowel | Best | 149,247.64 | 149,021.16 | 149,096.69 | | 149,089.96 | 149,076.71 | |
| | Avg | 150,421.69 | 149,315.13 | 150,118.24 | 151,684.62 | 151,758.39 | 152,289.92 | |
| | Worst | 154,119.51 | 150,267.43 | 153,232.13 | 168,379.82 | 170,433.64 | 158,612.03 | |
| | Std | 1,249.07 | 394.08 | 863.56 | 5,100.30 | 4,205.04 | 2,947.95 | |
| | CT | 6.54 | 6.97 | 10.86 | | 6.65 | 13.38 | 6.44 |
| Iris | Best | 96.75 | 96.67 | 96.72 | | 96.68 | | |
| | Avg | 97.11 | 96.73 | 97.17 | | 98.98 | 96.71 | |
| | Worst | 98.06 | 96.82 | 97.94 | 96.71 | 127.67 | 97.22 | |
| | Std | 0.35 | 0.04 | 0.42 | 0.01 | 6.96 | 0.13 | |
| | CT | 0.70 | 0.74 | 1.20 | | 0.75 | 1.85 | |
| Crude oil | Best | 277.25 | 277.25 | 277.24 | | 277.22 | 277.21 | |
| | Avg | 277.53 | 277.35 | 277.99 | 277.27 | 277.35 | 277.63 | |
| | Worst | 278.21 | 277.42 | 293.58 | | 277.86 | 285.76 | 277.36 |
| | Std | 0.25 | | 2.95 | | 0.13 | 1.56 | 0.05 |
| | CT | 0.99 | 1.06 | 1.64 | | 1.02 | 3.76 | |
| Cancer | Best | 2,965.13 | 2,964.93 | 2,964.95 | 2,964.39 | 2,964.86 | 2,965.00 | |
| | Avg | 2,967.87 | 2,965.91 | 2,967.05 | 2,964.40 | 2,966.32 | 2,973.42 | |
| | Worst | 2,975.96 | 2,967.93 | 2,970.83 | 2,964.41 | 2,969.62 | 2,985.84 | |
| | Std | 2.55 | 0.82 | 1.32 | 0.00 | 1.24 | 6.57 | |
| | CT | 5.52 | 5.83 | 10.03 | | 5.90 | 11.28 | 5.39 |
Results obtained by the algorithms on EGG eye, WDBC, INSP and Sonar datasets.
| Dataset | Criteria | SSO | SSO-RCS | SSO-ELS | BH | KSRPSO | GSA-KM | VSSO-RCS |
|---|---|---|---|---|---|---|---|---|
| EGG eye | Best | 8,032,669.11 | 5,644,264.31 | 2,385,500.94 | | 3,010,467.48 | 2,778,514.06 | 2,354,756.19 |
| | Avg | 16,361,259.24 | 14,464,277.41 | 2,505,596.68 | 2,586,299.09 | 3,210,719.24 | 2,791,675.62 | |
| | Worst | 26,910,999.74 | 29,333,498.64 | 2,762,809.53 | 3,214,000.36 | 3,456,696.67 | 2,867,748.16 | |
| | Std | 6,668,788.32 | 6,757,831.04 | 108,461.85 | 271,990.85 | 114,894.72 | 27,242.75 | |
| | CT | 856.77 | 874.94 | 1116.37 | 850.98 | 880.40 | 912.04 | |
| WDBC | Best | 149,474.46 | 149,474.20 | 149,474.40 | | 149,473.89 | | |
| | Avg | 149,480.77 | 149,477.49 | 149,479.23 | | 149,474.13 | | |
| | Worst | 149,494.82 | 149,483.18 | 149,493.45 | 149,473.87 | 149,474.62 | | |
| | Std | 5.93 | 2.76 | 4.10 | | 0.20 | | |
| | CT | 94.57 | 105.77 | 185.15 | 90.91 | | 732.59 | 90.76 |
| INSP | Best | 814.52 | 794.96 | 794.32 | 793.92 | 793.78 | | |
| | Avg | 819.07 | 795.13 | 794.86 | 794.30 | 793.87 | | |
| | Worst | 827.72 | 795.30 | 796.37 | 795.34 | 794.02 | | 793.72 |
| | Std | 4.76 | 0.10 | 0.61 | 0.42 | 0.07 | | |
| | CT | 105.74 | 114.02 | 195.46 | 105.38 | | 1,171.57 | 96.35 |
| Sonar | Best | 247.73 | 234.51 | 238.85 | 234.22 | 233.77 | | |
| | Avg | 249.19 | 234.58 | 239.87 | 245.02 | 233.86 | | |
| | Worst | 251.12 | 234.60 | 245.27 | 266.59 | 234.08 | | 233.77 |
| | Std | 1.11 | 0.03 | 1.93 | 14.97 | 0.09 | | |
| | CT | 395.85 | 425.46 | 650.07 | 347.83 | | 11,132.94 | 328.31 |
Results of Friedman ranks on the average of SICD.
| Dataset | SSO | SSO-RCS | SSO-ELS | BH | KSRPSO | GSA-KM | VSSO-RCS |
|---|---|---|---|---|---|---|---|
| Vowel | 4.00 | 2.00 | 3.00 | 5.00 | 6.00 | 7.00 | 1.00 |
| Iris | 5.00 | 4.00 | 6.00 | 1.50 | 7.00 | 3.00 | 1.50 |
| Crude oil | 5.00 | 3.50 | 7.00 | 2.00 | 3.50 | 6.00 | 1.00 |
| Cancer | 6.00 | 3.00 | 5.00 | 2.00 | 4.00 | 7.00 | 1.00 |
| CMC | 7.00 | 5.00 | 6.00 | 3.00 | 4.00 | 2.00 | 1.00 |
| Glass | 6.00 | 2.00 | 4.00 | 7.00 | 3.00 | 5.00 | 1.00 |
| MG T | 7.00 | 4.00 | 6.00 | 2.00 | 5.00 | 1.00 | 3.00 |
| Wine | 6.00 | 4.00 | 7.00 | 1.00 | 3.00 | 5.00 | 2.00 |
| EGG eye | 7.00 | 6.00 | 2.00 | 3.00 | 5.00 | 4.00 | 1.00 |
| WDBC | 7.00 | 5.00 | 6.00 | 2.00 | 4.00 | 2.00 | 2.00 |
| Ionosphere | 7.00 | 6.00 | 5.00 | 4.00 | 3.00 | 1.50 | 1.50 |
| Sonar | 7.00 | 4.00 | 5.00 | 6.00 | 3.00 | 1.50 | 1.50 |
| Average | 6.17 | 4.04 | 5.17 | 3.21 | 4.21 | 3.75 | 1.46 |
Results of Friedman tests on the average of SICD.
| Method | Statistical value | p-value | Hypothesis |
|---|---|---|---|
| Friedman | 34.07 | 0.000 | Rejected |
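The reported statistic can be checked against the rank table. The sketch below is a minimal Python illustration that copies the per-dataset SICD ranks from the Friedman ranks table and applies the textbook Friedman chi-square formula without a tie correction. The choice of that variant is an assumption, since the record does not state which form was used, although it does reproduce the tabulated value.

```python
# Ranks copied from the table of Friedman ranks on the average of SICD.
# Columns: SSO, SSO-RCS, SSO-ELS, BH, KSRPSO, GSA-KM, VSSO-RCS.
ranks = [
    [4, 2, 3, 5, 6, 7, 1],        # Vowel
    [5, 4, 6, 1.5, 7, 3, 1.5],    # Iris
    [5, 3.5, 7, 2, 3.5, 6, 1],    # Crude oil
    [6, 3, 5, 2, 4, 7, 1],        # Cancer
    [7, 5, 6, 3, 4, 2, 1],        # CMC
    [6, 2, 4, 7, 3, 5, 1],        # Glass
    [7, 4, 6, 2, 5, 1, 3],        # MG T
    [6, 4, 7, 1, 3, 5, 2],        # Wine
    [7, 6, 2, 3, 5, 4, 1],        # EGG eye
    [7, 5, 6, 2, 4, 2, 2],        # WDBC
    [7, 6, 5, 4, 3, 1.5, 1.5],    # Ionosphere
    [7, 4, 5, 6, 3, 1.5, 1.5],    # Sonar
]

n = len(ranks)      # number of datasets
k = len(ranks[0])   # number of algorithms

# Average rank of each algorithm over all datasets.
avg_ranks = [sum(row[j] for row in ranks) / n for j in range(k)]

# Friedman chi-square statistic (no tie correction; an assumption).
chi2 = 12 * n / (k * (k + 1)) * (
    sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4
)

print([round(r, 2) for r in avg_ranks])
print(round(chi2, 2))
```

Rounded to two decimals, the average ranks and the statistic agree with the "Average" row of the rank table and with the Friedman test table.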
Results of the post hoc test on the average of SICD.
| Method | z | p-value | Hypothesis |
|---|---|---|---|
| SSO | 5.339 | 0.000 | Rejected |
| SSO-RCS | 2.929 | 0.003 | Rejected |
| SSO-ELS | 4.205 | 0.000 | Rejected |
| BH | 1.984 | 0.047 | Rejected |
| KSRPSO | 3.118 | 0.002 | Rejected |
| GSA-KM | 2.598 | 0.009 | Rejected |
Results of Friedman ranks on the average of CPU time.
| Dataset | SSO | SSO-RCS | SSO-ELS | BH | KSRPSO | GSA-KM | VSSO-RCS |
|---|---|---|---|---|---|---|---|
| Vowel | 3.00 | 5.00 | 6.00 | 1.00 | 4.00 | 7.00 | 2.00 |
| Iris | 3.00 | 4.00 | 6.00 | 1.50 | 5.00 | 7.00 | 1.50 |
| Crude oil | 3.00 | 5.00 | 6.00 | 1.50 | 4.00 | 7.00 | 1.50 |
| Cancer | 3.00 | 4.00 | 6.00 | 1.00 | 5.00 | 7.00 | 2.00 |
| CMC | 4.00 | 5.00 | 6.00 | 1.00 | 3.00 | 7.00 | 2.00 |
| Glass | 4.00 | 5.00 | 6.00 | 2.00 | 1.00 | 7.00 | 3.00 |
| MG T | 3.00 | 4.00 | 7.00 | 1.00 | 5.00 | 6.00 | 2.00 |
| Wine | 4.00 | 5.00 | 6.00 | 1.00 | 3.00 | 7.00 | 2.00 |
| EGG eye | 3.00 | 4.00 | 7.00 | 2.00 | 5.00 | 6.00 | 1.00 |
| WDBC | 4.00 | 5.00 | 6.00 | 3.00 | 1.00 | 7.00 | 2.00 |
| INSP | 4.00 | 5.00 | 6.00 | 3.00 | 1.00 | 7.00 | 2.00 |
| Sonar | 4.00 | 5.00 | 6.00 | 3.00 | 1.00 | 7.00 | 2.00 |
| Average | 3.50 | 4.67 | 6.17 | 1.75 | 3.17 | 6.83 | 1.92 |
Results of the post hoc test on the average of CPU time.
| Method | z | p-value | Hypothesis |
|---|---|---|---|
| SSO | 1.795 | 0.073 | Not rejected |
| SSO-RCS | 3.118 | 0.002 | Rejected |
| SSO-ELS | 4.819 | 0.000 | Rejected |
| BH | 0.189 | 0.850 | Not rejected |
| KSRPSO | 1.417 | 0.156 | Not rejected |
| GSA-KM | 5.575 | 0.000 | Rejected |
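The z values in this table are consistent with a standard post hoc comparison of each method against a control on the average CPU-time ranks, z = |R_i - R_control| / sqrt(k(k+1)/(6n)). The Python sketch below works under two assumptions that the record does not state explicitly: VSSO-RCS is the control, and p-values are two-sided normal probabilities. The rank sums are the column totals of the CPU-time rank table (12 datasets, 7 algorithms).

```python
import math

# Column sums of the CPU-time rank table (derived from the tabulated ranks).
rank_sums = {
    "SSO": 42.0, "SSO-RCS": 56.0, "SSO-ELS": 74.0, "BH": 21.0,
    "KSRPSO": 38.0, "GSA-KM": 82.0, "VSSO-RCS": 23.0,
}
n, k = 12, 7            # datasets, algorithms
control = "VSSO-RCS"    # assumption: the proposed method is the control

# Standard error of a difference between two average ranks.
se = math.sqrt(k * (k + 1) / (6 * n))

avg = {name: total / n for name, total in rank_sums.items()}
z = {name: abs(avg[name] - avg[control]) / se
     for name in avg if name != control}

# Two-sided p-value from the standard normal distribution.
p = {name: math.erfc(value / math.sqrt(2)) for name, value in z.items()}

for name in z:
    print(f"{name:8s} z = {z[name]:.3f}  p = {p[name]:.3f}")
```

By construction a two-sided p-value lies between 0 and 1; for BH (z of about 0.189) this sketch gives roughly 0.850.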
Empirical analysis results for a varying number of data instances.
| i | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Instances | 16000 | 32000 | 64000 | 128000 | 256000 | 512000 | 1024000 | 2048000 |
| Max CT | 15.21 | 29.72 | 56.83 | 121.20 | 246.08 | 496.77 | 999.67 | 2149.43 |
| Mean CT | 14.47 | 28.49 | 55.80 | 119.06 | 241.86 | 484.74 | 966.41 | 1958.85 |
| Min CT | 13.85 | 27.38 | 54.90 | 116.95 | 237.07 | 473.70 | 945.32 | 1989.48 |
| Std CT | 0.48 | 0.73 | 0.69 | 1.26 | 2.77 | 6.65 | 18.57 | 51.56 |
| Ratio | | 1.97 | 1.96 | 2.13 | 2.03 | 2.00 | 1.99 | 2.03 |
a Ratio = Mean CT_i / Mean CT_{i-1}
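The Ratio row can be recomputed directly from the Mean CT row, and a ratio close to 2 for each doubling of the instance count suggests roughly linear growth of CPU time. The Python sketch below reproduces the ratios and estimates a growth exponent from the two endpoint measurements. The endpoint-based slope is an illustrative simplification chosen here, not a fitting method described in the record.

```python
import math

# Values copied from the table for a varying number of data instances.
instances = [16000, 32000, 64000, 128000, 256000, 512000, 1024000, 2048000]
mean_ct = [14.47, 28.49, 55.80, 119.06, 241.86, 484.74, 966.41, 1958.85]

# Ratio of each mean CPU time to the previous one (instances double each step).
ratios = [later / earlier for earlier, later in zip(mean_ct, mean_ct[1:])]

# Growth exponent from the endpoints: slope of log(CT) against log(instances).
slope = (math.log(mean_ct[-1] / mean_ct[0])
         / math.log(instances[-1] / instances[0]))

print([round(r, 2) for r in ratios])
print(round(slope, 3))
```

A slope near 1 corresponds to an approximately straight line of unit gradient on a log-log plot of CPU time against instance count.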
Empirical analysis results for a varying number of data features.
| i | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Features | 400 | 800 | 1600 | 3200 | 6400 | 12800 | 25600 | 51200 |
| Max CT | 55.82 | 109.48 | 217.14 | 434.14 | 881.76 | 1764.43 | 3592.34 | 7413.43 |
| Mean CT | 54.76 | 108.70 | 214.54 | 431.66 | 877.09 | 1754.31 | 3574.15 | 7227.01 |
| Min CT | 53.43 | 107.58 | 211.54 | 428.05 | 871.33 | 1727.27 | 3557.24 | 7230.68 |
| Std CT | 0.71 | 0.60 | 1.95 | 2.07 | 2.88 | 10.68 | 10.60 | 54.29 |
| Ratio | | 1.99 | 1.97 | 2.01 | 2.03 | 2.00 | 2.04 | 2.02 |
Fig 6. Log-log plot of CPU time vs. instance size.
Fig 7. Log-log plot of CPU time vs. feature size.
Results obtained by the algorithms on CMC, Glass, MG Telescope and Wine datasets.
| Dataset | Criteria | SSO | SSO-RCS | SSO-ELS | BH | KSRPSO | GSA-KM | VSSO-RCS |
|---|---|---|---|---|---|---|---|---|
| CMC | Best | 5,533.46 | 5,532.77 | 5,532.76 | 5,532.19 | 5,532.36 | 5,532.19 | |
| | Avg | 5,535.93 | 5,533.45 | 5,534.84 | 5,532.24 | 5,532.87 | 5,532.19 | |
| | Worst | 5,540.84 | 5,534.59 | 5,539.57 | 5,532.45 | 5,533.59 | 5,532.19 | |
| | Std | 1.77 | 0.45 | 1.48 | 0.05 | 0.32 | | |
| | CT | 27.90 | 28.66 | 46.22 | | 27.45 | 54.62 | 27.12 |
| Glass | Best | 213.52 | 211.34 | 211.19 | 237.65 | | 213.11 | 210.43 |
| | Avg | 223.85 | 212.11 | 218.38 | 260.93 | 217.87 | 218.52 | |
| | Worst | 240.66 | 215.60 | 235.35 | 269.64 | 245.32 | 227.58 | |
| | Std | 9.74 | | 8.50 | 8.36 | 7.91 | 3.60 | 1.78 |
| | CT | 45.64 | 48.42 | 78.82 | 38.66 | | 508.16 | 39.26 |
| MG Telescope | Best | 1,623,698.99 | 1,623,296.10 | 1,623,772.73 | 1,623,042.28 | 1,623,322.11 | | 1,623,042.28 |
| | Avg | 1,631,490.90 | 1,625,505.00 | 1,629,648.90 | 1,623,042.31 | 1,627,770.46 | | 1,623,045.45 |
| | Worst | 1,637,591.98 | 1,632,364.00 | 1,635,628.40 | 1,623,042.38 | 1,635,781.99 | | 1,623,072.86 |
| | Std | 4,685.46 | 3,544.38 | 3,877.04 | 0.03 | 4,191.48 | | 9.63 |
| | CT | 1145.16 | 1175.22 | 1709.08 | | 1211.86 | 1341.66 | 1084.09 |
| Wine | Best | 16,292.44 | 16,292.45 | 16,292.43 | 16,292.19 | 16,292.22 | 16,292.67 | |
| | Avg | 16,294.07 | 16,293.86 | 16,294.14 | | 16,292.78 | 16,293.90 | 16,292.76 |
| | Worst | 16,296.90 | 16,295.72 | 16,295.67 | | 16,294.47 | | |
| | Std | 1.09 | 0.89 | 1.05 | 0.70 | 0.67 | | 0.82 |
| | CT | 20.00 | 21.65 | 34.01 | | 18.33 | 141.91 | 18.08 |
Results of Friedman tests on the average of CPU time.
| Method | Statistical value | p-value | Hypothesis |
|---|---|---|---|
| Friedman | 60.464 | 0.000 | Rejected |