Jinchao Ji, Wei Pang, Yanlin Zheng, Zhe Wang, Zhiqiang Ma.
Abstract
Data with categorical attributes are ubiquitous in the real world. However, existing partitional clustering algorithms for categorical data are prone to falling into local optima. To address this issue, we propose a novel clustering algorithm, ABC-K-Modes (Artificial Bee Colony clustering based on K-Modes), which combines the traditional k-modes clustering algorithm with the artificial bee colony approach. We first introduce a one-step k-modes procedure and then integrate it with the artificial bee colony approach to handle categorical data. In the search performed by scout bees, we adopt a multi-source search, inspired by batch processing, to accelerate the convergence of ABC-K-Modes. The performance of ABC-K-Modes is evaluated in a series of experiments against other popular algorithms for categorical data.
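To make the "one-step k-modes procedure" mentioned in the abstract more concrete, the following is a minimal sketch of what a single assign-then-update pass of k-modes might look like. It assumes the simple matching dissimilarity (the count of mismatched attributes) and a per-attribute most-frequent-category update; the function names `matching_dissimilarity` and `one_step_k_modes` are hypothetical, and the procedure in the paper may differ in its details.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Count the attributes on which two categorical objects differ."""
    return sum(x != y for x, y in zip(a, b))


def one_step_k_modes(data, modes):
    """Run one assign-then-update pass of k-modes (illustrative sketch).

    Returns (new_modes, labels, cost), where cost is the total
    dissimilarity between each object and the input mode it was assigned to.
    """
    labels = []
    cost = 0
    # Assignment step: each object goes to its nearest mode.
    for obj in data:
        dists = [matching_dissimilarity(obj, m) for m in modes]
        best = min(range(len(modes)), key=dists.__getitem__)
        labels.append(best)
        cost += dists[best]

    # Update step: each mode becomes the most frequent category per attribute.
    new_modes = []
    for k, old_mode in enumerate(modes):
        members = [obj for obj, lab in zip(data, labels) if lab == k]
        if not members:
            # Assumption: an empty cluster simply keeps its previous mode.
            new_modes.append(tuple(old_mode))
            continue
        new_modes.append(
            tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
        )
    return new_modes, labels, cost


if __name__ == "__main__":
    toy = [("a", "x"), ("a", "y"), ("b", "z"), ("b", "z")]
    print(one_step_k_modes(toy, [("a", "x"), ("b", "z")]))
```

In a bee-colony setting, a pass like this could plausibly be applied to each candidate set of modes (a "food source"), with the returned cost serving as its fitness; that coupling is an assumption here, not a description of the authors' implementation.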
Year: 2015 PMID: 25993469 PMCID: PMC4439097 DOI: 10.1371/journal.pone.0127125
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
The AC of the four algorithms on the Zoo dataset.
| Algorithms | AC (Best) | AC (Avg.) | AC (Std.) |
|---|---|---|---|
| | 0.9307 | 0.9134 | 0.0124 |
| | 0.9109 | 0.8287 | 0.0502 |
| | 0.9208 | 0.8371 | 0.0490 |
| | 0.9208 | 0.9074 | 0.0141 |
The RI of the four algorithms on the Zoo dataset.
| Algorithms | RI (Best) | RI (Avg.) | RI (Std.) |
|---|---|---|---|
| | 0.9766 | 0.8969 | 0.0079 |
| | 0.9549 | 0.8758 | 0.0465 |
| | 0.9604 | 0.8787 | 0.0388 |
| | 0.9018 | 0.8934 | 0.0079 |
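For readers unfamiliar with the RI column, the sketch below shows one common way to compute a Rand index between ground-truth classes and cluster assignments, assuming that RI here denotes the standard Rand index (the fraction of object pairs on which the two labelings agree). The function name `rand_index` is hypothetical, and the paper's exact evaluation code is not reproduced here.

```python
from itertools import combinations


def rand_index(true_labels, pred_labels):
    """Fraction of object pairs on which two labelings agree.

    A pair agrees when both labelings put the two objects together,
    or both put them apart.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("labelings must have the same length")
    agree = 0
    total = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        agree += same_true == same_pred
        total += 1
    # With fewer than two objects there are no pairs; treat as full agreement.
    return agree / total if total else 1.0


if __name__ == "__main__":
    # Cluster ids are arbitrary, so a relabeled perfect clustering scores 1.0.
    print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

Because the measure is computed over pairs, it does not depend on how cluster ids are matched to class names. The pairwise loop is quadratic in the number of objects, which is fine for a sketch but slow on datasets with thousands of objects.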
The AC of the four algorithms on the Breast Cancer dataset.
| Algorithms | AC (Best) | AC (Avg.) | AC (Std.) |
|---|---|---|---|
| | 0.9399 | 0.9199 | 0.0134 |
| | 0.9399 | 0.8568 | 0.1201 |
| | 0.9399 | 0.7694 | 0.1338 |
| | 0.6552 | 0.6552 | 0.0000 |
The RI of the four algorithms on the Breast Cancer dataset.
| Algorithms | RI (Best) | RI (Avg.) | RI (Std.) |
|---|---|---|---|
| | 0.8869 | 0.8527 | 0.0229 |
| | 0.8869 | 0.7744 | 0.1565 |
| | 0.8869 | 0.6571 | 0.1757 |
| | 0.5197 | 0.5178 | 0.0027 |
The AC of the four algorithms on the Soybean dataset.
| Algorithms | AC (Best) | AC (Avg.) | AC (Std.) |
|---|---|---|---|
| | 1.0000 | 0.9862 | 0.0199 |
| | 1.0000 | 0.8883 | 0.1193 |
| | 1.0000 | 0.8032 | 0.1074 |
| | 1.0000 | 0.9829 | 0.0191 |
The RI of the four algorithms on the Soybean dataset.
| Algorithms | RI (Best) | RI (Avg.) | RI (Std.) |
|---|---|---|---|
| | 1.0000 | 0.9850 | 0.0216 |
| | 1.0000 | 0.9023 | 0.1078 |
| | 1.0000 | 0.8507 | 0.0726 |
| | 1.0000 | 0.9813 | 0.0208 |
The AC of the four algorithms on the Lung Cancer dataset.
| Algorithms | AC (Best) | AC (Avg.) | AC (Std.) |
|---|---|---|---|
| | 0.6563 | 0.5578 | 0.0529 |
| | 0.5938 | 0.5344 | 0.0417 |
| | 0.5938 | 0.5313 | 0.0517 |
| | 0.6562 | 0.5562 | 0.0561 |
The RI of the four algorithms on the Lung Cancer dataset.
| Algorithms | RI (Best) | RI (Avg.) | RI (Std.) |
|---|---|---|---|
| | 0.6593 | 0.6019 | 0.0270 |
| | 0.6431 | 0.5919 | 0.0294 |
| | 0.6331 | 0.5976 | 0.0241 |
| | 0.6452 | 0.6010 | 0.0260 |
The AC of the four algorithms on the Mushroom dataset.
| Algorithms | AC (Best) | AC (Avg.) | AC (Std.) |
|---|---|---|---|
| | 0.8946 | 0.8573 | 0.0993 |
| | 0.8000 | 0.5998 | 0.0698 |
| | 0.8326 | 0.7033 | 0.0997 |
| | 0.6489 | 0.5523 | 0.0448 |
The RI of the four algorithms on the Mushroom dataset.
| Algorithms | RI (Best) | RI (Avg.) | RI (Std.) |
|---|---|---|---|
| | 0.8114 | 0.7740 | 0.0914 |
| | 0.6799 | 0.5291 | 0.0518 |
| | 0.7212 | 0.6015 | 0.0796 |
| | 0.5443 | 0.5089 | 0.0144 |
The AC of the four algorithms on the Dermatology dataset.
| Algorithms | AC (Best) | AC (Avg.) | AC (Std.) |
|---|---|---|---|
| | 0.8361 | 0.7652 | 0.0339 |
| | 0.7951 | 0.6984 | 0.0752 |
| | 0.7240 | 0.6848 | 0.0449 |
| | 0.7404 | 0.6246 | 0.0693 |
The RI of the four algorithms on the Dermatology dataset.
| Algorithms | RI (Best) | RI (Avg.) | RI (Std.) |
|---|---|---|---|
| | 0.9073 | 0.8548 | 0.0255 |
| | 0.8882 | 0.8206 | 0.0503 |
| | 0.8630 | 0.8274 | 0.0297 |
| | 0.8801 | 0.8293 | 0.0257 |
The average running time of the four algorithms on different datasets.
| Dataset (number of data objects, number of attributes) | ABC-K-Modes (s) | K-Modes (s) | Fuzzy K-Modes (s) | Genetic K-Modes (s) |
|---|---|---|---|---|
| | 11.3711 | 0.0237 | 0.0352 | 0.2774 |
| | 30.2426 | 0.3631 | 0.0441 | 9.1701 |
| | 6.4793 | 0.0129 | 0.0252 | 0.2059 |
| | 6.9895 | 0.0125 | 0.0256 | 0.1663 |
| | 738.1470 | 90.2529 | 1.8341 | 3270.2254 |
| | 92.7335 | 0.2294 | 0.2330 | 6.6162 |
The PR of the four algorithms on the Zoo dataset.
| Algorithms | PR (Best) | PR (Avg.) | PR (Std.) |
|---|---|---|---|
| | 0.9089 | 0.8796 | 0.0162 |
| | 0.8798 | 0.8331 | 0.0452 |
| | 0.8828 | 0.7761 | 0.0988 |
| | 0.8819 | 0.8694 | 0.0126 |
The RE of the four algorithms on the Zoo dataset.
| Algorithms | RE (Best) | RE (Avg.) | RE (Std.) |
|---|---|---|---|
| | 0.8286 | 0.8144 | 0.0083 |
| | 0.8145 | 0.6126 | 0.1036 |
| | 0.8143 | 0.6375 | 0.1026 |
| | 0.8143 | 0.8024 | 0.0244 |
The PR of the four algorithms on the Breast Cancer dataset.
| Algorithms | PR (Best) | PR (Avg.) | PR (Std.) |
|---|---|---|---|
| | 0.9385 | 0.9320 | 0.0044 |
| | 0.9385 | 0.8785 | 0.0994 |
| | 0.9385 | 0.7988 | 0.1187 |
| | 0.7439 | 0.7096 | 0.0480 |
The RE of the four algorithms on the Breast Cancer dataset.
| Algorithms | RE (Best) | RE (Avg.) | RE (Std.) |
|---|---|---|---|
| | 0.9276 | 0.8924 | 0.0236 |
| | 0.9276 | 0.7998 | 0.1791 |
| | 0.9276 | 0.6768 | 0.2021 |
| | 0.5000 | 0.5000 | 0.0000 |
The PR of the four algorithms on the Soybean dataset.
| Algorithms | PR (Best) | PR (Avg.) | PR (Std.) |
|---|---|---|---|
| | 1.0000 | 0.9864 | 0.0195 |
| | 1.0000 | 0.9409 | 0.0497 |
| | 1.0000 | 0.8419 | 0.1046 |
| | 1.0000 | 0.9829 | 0.0188 |
The RE of the four algorithms on the Soybean dataset.
| Algorithms | RE (Best) | RE (Avg.) | RE (Std.) |
|---|---|---|---|
| | 1.0000 | 0.9904 | 0.0137 |
| | 1.0000 | 0.8765 | 0.1444 |
| | 1.0000 | 0.7728 | 0.1279 |
| | 1.0000 | 0.9882 | 0.0131 |
The PR of the four algorithms on the Lung Cancer dataset.
| Algorithms | PR (Best) | PR (Avg.) | PR (Std.) |
|---|---|---|---|
| | 0.7152 | 0.6142 | 0.0662 |
| | 0.6955 | 0.5992 | 0.0790 |
| | 0.7033 | 0.5757 | 0.0880 |
| | 0.6905 | 0.5974 | 0.0783 |
The RE of the four algorithms on the Lung Cancer dataset.
| Algorithms | RE (Best) | RE (Avg.) | RE (Std.) |
|---|---|---|---|
| | 0.6530 | 0.5654 | 0.0501 |
| | 0.6333 | 0.5390 | 0.0560 |
| | 0.6333 | 0.5504 | 0.0648 |
| | 0.6481 | 0.5619 | 0.0546 |
The PR of the four algorithms on the Mushroom dataset.
| Algorithms | PR (Best) | PR (Avg.) | PR (Std.) |
|---|---|---|---|
| | 0.9128 | 0.8725 | 0.1047 |
| | 0.8574 | 0.6058 | 0.0874 |
| | 0.8695 | 0.7273 | 0.1165 |
| | 0.6598 | 0.5529 | 0.0472 |
The RE of the four algorithms on the Mushroom dataset.
| Algorithms | RE (Best) | RE (Avg.) | RE (Std.) |
|---|---|---|---|
| | 0.8910 | 0.8537 | 0.0993 |
| | 0.7927 | 0.5956 | 0.0686 |
| | 0.8269 | 0.6969 | 0.1007 |
| | 0.6433 | 0.5418 | 0.0508 |
The PR of the four algorithms on the Dermatology dataset.
| Algorithms | PR (Best) | PR (Avg.) | PR (Std.) |
|---|---|---|---|
| | 0.8961 | 0.8221 | 0.0538 |
| | 0.8866 | 0.7742 | 0.0795 |
| | 0.8205 | 0.7322 | 0.0609 |
| | 0.7294 | 0.6918 | 0.0645 |
The RE of the four algorithms on the Dermatology dataset.
| Algorithms | RE (Best) | RE (Avg.) | RE (Std.) |
|---|---|---|---|
| | 0.7620 | 0.6508 | 0.0570 |
| | 0.7316 | 0.5716 | 0.0796 |
| | 0.6660 | 0.5661 | 0.0527 |
| | 0.6358 | 0.5185 | 0.0539 |