Parvaneh Shabanzadeh, Rubiyah Yusof.
Abstract
Unsupervised data classification (clustering) is one of the most useful descriptive tasks in data mining. It seeks to group objects into homogeneous clusters based on similarity and is used in many medical disciplines and other applications. No single algorithm is suitable for all types of data, conditions, and applications; each has its own advantages, limitations, and deficiencies. Hence, research into novel and effective approaches for unsupervised data classification is still active. In this paper, a heuristic algorithm, Biogeography-Based Optimization (BBO), which is inspired by the natural biogeographic distribution of species, was adapted to data clustering problems by modifying its main operators. Like other population-based algorithms, BBO starts with an initial population of candidate solutions to an optimization problem and an objective function that is evaluated for each of them. To evaluate its performance, the proposed algorithm was assessed on six medical and real-life datasets and compared with eight well-known and recent unsupervised data classification algorithms. Numerical results demonstrate that the proposed evolutionary optimization algorithm is efficient for unsupervised data classification.
Year: 2015 PMID: 26336509 PMCID: PMC4532808 DOI: 10.1155/2015/802754
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
Figure 1. The encoding of an example candidate solution.
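The figure itself is not reproduced in this record. As a minimal sketch, assuming the common encoding in which a candidate solution is the concatenation of k cluster centroids of d features each, the mapping between a centroid list and a flat solution vector could look as follows. The function names `encode` and `decode` and the example values are illustrative assumptions, not taken from the paper.

```python
from typing import List


def encode(centroids: List[List[float]]) -> List[float]:
    """Flatten k centroids of d features each into one vector of length k*d."""
    return [value for centroid in centroids for value in centroid]


def decode(solution: List[float], k: int, d: int) -> List[List[float]]:
    """Split a flat candidate solution back into k centroids of d features."""
    if len(solution) != k * d:
        raise ValueError(f"expected {k * d} values, got {len(solution)}")
    return [list(solution[i * d:(i + 1) * d]) for i in range(k)]


# Illustrative values only: 3 clusters in a 4-feature space (the Iris shape).
example_centroids = [
    [5.0, 3.4, 1.5, 0.2],
    [5.9, 2.8, 4.4, 1.4],
    [6.7, 3.1, 5.6, 2.1],
]
solution = encode(example_centroids)
restored = decode(solution, k=3, d=4)
```

With this layout, position `p` of the vector holds feature `p % d` of centroid `p // d`, which keeps operators that act on single positions simple to write.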
Algorithm 1. Pseudocode of the proposed method.
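The pseudocode is not included in this record, and the authors' modified operators are not described here. The sketch below therefore shows only a generic, textbook-style BBO loop applied to clustering, under several assumptions: rank-based linear migration rates, uniform random mutation within the data bounds, elitism, and a cost equal to the sum of Euclidean distances from each object to its nearest centroid. All names and default parameter values (`bbo_cluster`, `pop_size`, `p_mutate`, `n_elite`, and so on) are hypothetical.

```python
import math
import random


def cost(solution, data, k, d):
    """Assumed objective: sum of distances from each object to its nearest centroid."""
    centroids = [solution[i * d:(i + 1) * d] for i in range(k)]
    return sum(min(math.dist(x, c) for c in centroids) for x in data)


def bbo_cluster(data, k, pop_size=20, generations=50,
                p_mutate=0.02, n_elite=2, seed=0):
    """Generic BBO sketch (not the paper's modified operators)."""
    rng = random.Random(seed)
    d = len(data[0])
    lo = [min(x[j] for x in data) for j in range(d)]
    hi = [max(x[j] for x in data) for j in range(d)]
    n_genes = k * d

    def random_gene(pos):
        j = pos % d  # feature index of this position in the flat vector
        return rng.uniform(lo[j], hi[j])

    def key(s):
        return cost(s, data, k, d)

    pop = [[random_gene(p) for p in range(n_genes)] for _ in range(pop_size)]
    # Rank-based linear rates; rank 0 is the best habitat.
    mu = [(pop_size - i) / (pop_size + 1) for i in range(pop_size)]  # emigration
    lam = [1.0 - m for m in mu]                                      # immigration
    history = []
    for _ in range(generations):
        pop.sort(key=key)
        history.append(key(pop[0]))
        elites = [s[:] for s in pop[:n_elite]]
        new_pop = []
        for i, habitat in enumerate(pop):
            child = habitat[:]
            for p in range(n_genes):
                if rng.random() < lam[i]:
                    # Migration: copy this position from a habitat chosen
                    # with probability proportional to its emigration rate.
                    src = rng.choices(range(pop_size), weights=mu)[0]
                    child[p] = pop[src][p]
                if rng.random() < p_mutate:
                    child[p] = random_gene(p)  # mutation
            new_pop.append(child)
        new_pop.sort(key=key)
        # Elitism: the best previous habitats replace the worst new ones.
        pop = elites + new_pop[:pop_size - n_elite]
    pop.sort(key=key)
    return pop[0], key(pop[0]), history


# Toy data: two well-separated groups in two dimensions.
toy = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
best, best_cost, history = bbo_cluster(toy, k=2)
```

Because the elite habitats are carried over unchanged, the best cost recorded in `history` cannot increase from one generation to the next, which is a convenient sanity check for any variant of the operators.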
Summarized characteristics of the test datasets.
| Name of dataset | Number of data objects | Number of features | Number of clusters (cluster sizes) |
|---|---|---|---|
| Cancer | 683 | 9 | 2 (444, 239) |
| CMC | 1473 | 9 | 3 (629, 334, 510) |
| Glass | 214 | 9 | 6 (70, 76, 17, 13, 9, 29) |
| Iris | 150 | 4 | 3 (50, 50, 50) |
| Vowel | 871 | 3 | 6 (72, 89, 172, 151, 207, 180) |
| Wine | 178 | 13 | 3 (59, 71, 48) |
Intracluster distances for real life datasets.
| Dataset | Criteria | | TS | SA | PSO | BB-BC | GA | GSA | ACO | BBO |
|---|---|---|---|---|---|---|---|---|---|---|
| Cancer | Average | 3032.2478 | 3251.37 | 3239.17 | 2981.7865 | 2964.3880 | 3249.46 | 2972.6631 | 3046.06 | 2964.3879 |
| Cancer | Best | 2986.9613 | 2982.84 | 2993.45 | 2974.4809 | 2964.3875 | 2999.32 | 2965.7639 | 2970.49 | 2964.3875 |
| Cancer | Worst | 5216.0895 | 3434.16 | 3421.95 | 3053.4913 | 2964.3890 | 3427.43 | 2993.2446 | 3242.01 | 2964.3887 |
| Cancer | Std. | 315.1456 | 232.217 | 230.192 | 10.43651 | 0.00048 | 229.734 | 8.91860 | 90.50028 | 0.00036 |
| CMC | Average | 5543.4234 | 5993.59 | 5893.48 | 5547.8932 | 5574.7517 | 5756.59 | 5581.9450 | 5819.1347 | 5532.2550 |
| CMC | Best | 5542.1821 | 5885.06 | 5849.03 | 5539.1745 | 5534.0948 | 5705.63 | 5542.2763 | 5701.9230 | 5532.2113 |
| CMC | Worst | 5545.3333 | 5999.80 | 5966.94 | 5561.6549 | 5644.7026 | 5812.64 | 5658.7629 | 5912.4300 | 5532.432 |
| CMC | Std. | 1.5238 | 40.845 | 50.867 | 7.35617 | 39.4349 | 50.369 | 41.13648 | 45.634700 | 0.06480 |
| Glass | Average | 227.9779 | 283.79 | 282.19 | 230.49328 | 231.2306 | 255.38 | 233.5433 | 273.46 | 215.2097 |
| Glass | Best | 215.6775 | 279.87 | 275.16 | 223.90546 | 223.8941 | 235.50 | 224.9841 | 269.72 | 210.6173 |
| Glass | Worst | 260.8385 | 286.47 | 287.18 | 246.08915 | 243.2088 | 278.37 | 248.3672 | 280.08 | 233.9314 |
| Glass | Std. | 14.1389 | 4.19 | 4.238 | 4.79320 | 4.6501 | 12.47 | 6.13946 | 3.5848 | 3.525 |
| Iris | Average | 105.7290 | 97.8680 | 99.95 | 98.1423 | 96.7654 | 125.1970 | 96.7311 | 97.1715 | 96.5653 |
| Iris | Best | 97.3259 | 97.3659 | 97.45 | 96.8793 | 96.6765 | 113.9865 | 96.6879 | 97.1007 | 96.5403 |
| Iris | Worst | 128.4042 | 98.56949 | 102.01 | 99.7695 | 97.4287 | 139.7782 | 96.8246 | 97.8084 | 96.6609 |
| Iris | Std. | 12.3876 | 72.86 | 2.018 | 0.84207 | 0.20456 | 14.563 | 0.02761 | 0.367 | 0.0394 |
| Vowel | Average | 153660.8071 | 162108.53 | 161566.28 | 153218.23418 | 151010.0339 | 159153.49 | 152931.8104 | 159458.1438 | 149072.9042 |
| Vowel | Best | 149394.8040 | 149468.26 | 149370.47 | 152461.56473 | 149038.5168 | 149513.73 | 151317.5639 | 149395.602 | 148967.2544 |
| Vowel | Worst | 168474.2659 | 165996.42 | 165986.42 | 158987.08231 | 153090.4407 | 165991.65 | 155346.6952 | 165939.8260 | 153051.96931 |
| Vowel | Std. | 4123.04203 | 2846.235 | 0.645 | 2945.23167 | 1859.3235 | 3105.544 | 2486.70285 | 3485.3816 | 137.7311 |
| Wine | Average | 16963.0441 | 16785.46 | 17521.09 | 16316.2745 | 16303.4121 | 16530.53 | 16374.3091 | 16530.53381 | 16292.6782 |
| Wine | Best | 16555.6794 | 16666.22 | 16473.48 | 16304.4858 | 16298.6736 | 16530.53 | 16313.8762 | 16530.53381 | 16292.6782 |
| Wine | Worst | 23755.0495 | 16837.54 | 18083.25 | 16342.7811 | 16310.1135 | 16530.53 | 16428.8649 | 16530.53381 | 16292.6782 |
| Wine | Std. | 1180.6942 | 52.073 | 753.084 | 12.60275 | 2.6620 | 0 | 34.67122 | 0 | 0 |
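This record does not spell out how the intracluster distance is computed. A common definition in this line of clustering work, assumed here, is the sum over all data objects of the Euclidean distance to the nearest centroid. The sketch below illustrates that assumed definition on made-up data; the function name and the example points are hypothetical.

```python
import math
from typing import Sequence


def intracluster_distance(data: Sequence[Sequence[float]],
                          centroids: Sequence[Sequence[float]]) -> float:
    """Sum, over all objects, of the Euclidean distance to the nearest centroid."""
    return sum(min(math.dist(x, c) for c in centroids) for x in data)


# Made-up example: four 2-D points and two centroids.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 12.0)]
centers = [(0.0, 0.5), (10.0, 11.0)]
total = intracluster_distance(points, centers)  # 0.5 + 0.5 + 1.0 + 1.0
```

Lower values indicate more compact clusters, so under this definition the smallest entries in each row of the table correspond to the best-performing algorithm on that criterion.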
The best centroid coordinates obtained for the Cancer data.
| Cancer data | Cluster 1 | Cluster 2 |
|---|---|---|
| Feature A | 7.1156 | 2.8896 |
| Feature B | 6.6398 | 1.1278 |
| Feature C | 6.6238 | 1.2018 |
| Feature D | 5.6135 | 1.1646 |
| Feature E | 5.2402 | 1.9943 |
| Feature F | 8.0995 | 1.1215 |
| Feature G | 6.0789 | 2.0059 |
| Feature H | 6.0198 | 1.1014 |
| Feature I | 2.3282 | 1.0320 |
The best centroid coordinates obtained for the CMC data.
| CMC data | Cluster 1 | Cluster 2 | Cluster 3 |
|---|---|---|---|
| Feature A | 43.6354 | 33.4957 | 24.4102 |
| Feature B | 3.0140 | 3.1307 | 3.0417 |
| Feature C | 3.4513 | 3.5542 | 3.5181 |
| Feature D | 4.582 | 3.6511 | 1.7947 |
| Feature E | 0.7965 | 0.7928 | 0.9275 |
| Feature F | 0.7629 | 0.6918 | 0.7928 |
| Feature G | 1.8245 | 2.0903 | 2.2980 |
| Feature H | 3.4355 | 3.29183 | 2.9754 |
| Feature I | 0.094 | 0.0573 | 0.037 |
The best centroid coordinates obtained for the Glass data.
| Glass data | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6 |
|---|---|---|---|---|---|---|
| Feature A | 1.5260 | 1.5156 | 1.5228 | 1.5266 | 1.5203 | 1.5243 |
| Feature B | 11.9759 | 13.0863 | 14.6577 | 13.2229 | 13.7277 | 13.8085 |
| Feature C | 0.006 | 3.5272 | 0.0061 | 0.4232 | 3.5127 | 2.3414 |
| Feature D | 1.0514 | 1.3618 | 2.2170 | 1.5242 | 1.0249 | 2.5919 |
| Feature E | 72.0540 | 72.8710 | 73.2504 | 73.0610 | 71.9072 | 71.1423 |
| Feature F | 0.2552 | 0.5768 | 0.0299 | 0.3865 | 0.2067 | 2.5749 |
| Feature G | 14.3566 | 8.3588 | 8.6714 | 11.1471 | 9.4166 | 5.9948 |
| Feature H | 0.1808 | 0.0046 | 1.047 | 0.00979 | 0.0281 | 1.3373 |
| Feature I | 0.1254 | 0.0568 | 0.0196 | 0.1544 | 0.0498 | 0.2846 |
The best centroid coordinates obtained for the Iris data.
| Iris data | Cluster 1 | Cluster 2 | Cluster 3 |
|---|---|---|---|
| Feature A | 5.0150 | 5.9338 | 6.7343 |
| Feature B | 3.4185 | 2.7974 | 3.0681 |
| Feature C | 1.4681 | 4.4173 | 5.6299 |
| Feature D | 0.2380 | 1.4165 | 2.1072 |
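As a brief illustration of how such centroids can be used, the sketch below assigns a new sample to the cluster whose centroid is closest. The centroid values are copied from the Iris table above; the use of Euclidean distance, the function name, and the sample measurements are assumptions made for the example.

```python
import math

# Centroids copied from the Iris table above (features A-D, clusters 1-3).
IRIS_CENTROIDS = [
    (5.0150, 3.4185, 1.4681, 0.2380),
    (5.9338, 2.7974, 4.4173, 1.4165),
    (6.7343, 3.0681, 5.6299, 2.1072),
]


def nearest_cluster(sample, centroids=IRIS_CENTROIDS):
    """Return the 1-based number of the centroid closest to `sample`."""
    distances = [math.dist(sample, c) for c in centroids]
    return distances.index(min(distances)) + 1


# Hypothetical measurements, chosen only to illustrate the assignment.
small_flower = (5.1, 3.5, 1.4, 0.2)
large_flower = (6.5, 3.0, 5.5, 2.0)
small_cluster = nearest_cluster(small_flower)
large_cluster = nearest_cluster(large_flower)
```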
The best centroid coordinates obtained for the Vowel data.
| Vowel data | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6 |
|---|---|---|---|---|---|---|
| Feature A | 357.8349 | 375.8459 | 508.1135 | 407.9219 | 623.6778 | 439.6126 |
| Feature B | 2,291.6435 | 2,148.4110 | 1,838.2133 | 1,0182.0145 | 1,309.8038 | 987.4300 |
| Feature C | 2,978.2399 | 2,678.8524 | 2,555.9085 | 2,317.2847 | 2,332.7767 | 2,665.4154 |
The best centroid coordinates obtained for the Wine data.
| Wine data | Cluster 1 | Cluster 2 | Cluster 3 |
|---|---|---|---|
| Feature A | 13.3856 | 12.7859 | 12.7093 |
| Feature B | 1.9976 | 2.3535 | 2.3219 |
| Feature C | 2.3150 | 2.4954 | 2.4497 |
| Feature D | 16.9836 | 19.5480 | 21.1983 |
| Feature E | 105.2124 | 98.9327 | 92.6449 |
| Feature F | 3.0255 | 2.0964 | 2.1366 |
| Feature G | 3.1380 | 1.4428 | 1.9187 |
| Feature H | 0.51050 | 0.31322 | 0.3520 |
| Feature I | 2.3769 | 1.7629 | 1.4966 |
| Feature J | 5.7760 | 5.8415 | 4.3213 |
| Feature K | 0.8339 | 1.1220 | 1.2229 |
| Feature L | 3.0686 | 1.9611 | 2.5417 |
| Feature M | 1137.4923 | 687.3041 | 463.8856 |
Error rates for real life datasets.
| Dataset | | PSO | GSA | BBO |
|---|---|---|---|---|
| Cancer | 4.08 | 5.11 | 3.74 | 3.7 |
| CMC | 54.49 | 54.41 | 55.67 | 54.22 |
| Glass | 37.71 | 45.59 | 41.39 | 36.47 |
| Iris | 17.80 | 12.53 | 10.04 | 10.03 |
| Vowel | 44.26 | 44.65 | 42.26 | 41.36 |
| Wine | 31.12 | 28.71 | 29.15 | 28.65 |
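The record does not define the error rate. One plausible reading, assumed here, is the percentage of objects placed in a cluster that does not match their true class. Since cluster numbers are arbitrary, the sketch below tries every one-to-one mapping from cluster numbers to class labels and reports the lowest resulting percentage. The function name and the toy labels are hypothetical, and the exhaustive search is only practical for a small number of clusters such as those in the datasets above.

```python
from itertools import permutations
from typing import Sequence


def error_rate(true_labels: Sequence[int],
               cluster_ids: Sequence[int], k: int) -> float:
    """Percentage of misplaced objects under the best cluster-to-class mapping.

    Both label sequences are assumed to use the integers 0..k-1.
    """
    n = len(true_labels)
    best_correct = 0
    for mapping in permutations(range(k)):
        # mapping[c] is the class label assigned to cluster c.
        correct = sum(1 for t, c in zip(true_labels, cluster_ids)
                      if mapping[c] == t)
        best_correct = max(best_correct, correct)
    return 100.0 * (n - best_correct) / n


# Toy example: six objects, two classes, one object in the wrong cluster.
truth = [0, 0, 0, 1, 1, 1]
found = [1, 1, 0, 0, 0, 0]
rate = error_rate(truth, found, k=2)  # one of six objects is misplaced
```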