Ahmad Abubaker, Adam Baharum, Mahmoud Alrefaei.
Abstract
This paper proposes a new automatic clustering algorithm, MOPSOSA, based on Multi-Objective Particle Swarm Optimization and Simulated Annealing. The algorithm performs automatic clustering, partitioning a dataset into a suitable number of clusters. MOPSOSA combines features of multi-objective particle swarm optimization (PSO) and Multi-Objective Simulated Annealing (MOSA). Three cluster validity indices are optimized simultaneously to establish the suitable number of clusters and the appropriate clustering for a dataset: the first is based on Euclidean distance, the second on point symmetry distance, and the third on short distance. MOPSOSA is compared with several other algorithms on clustering problems, in terms of determining the actual number of clusters and the optimal clustering. Computational experiments were carried out on fourteen artificial and five real-life datasets.
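Optimizing three validity indices simultaneously means that candidate clusterings are compared by Pareto dominance rather than by a single score. The sketch below illustrates that comparison only; it is not the authors' implementation. It assumes, for illustration, that all three indices are to be maximized (the abstract does not say), and the names `dominates` and `non_dominated` are hypothetical.

```python
from typing import List, Sequence, Tuple

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every objective and
    strictly better on at least one (assumes maximization)."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better

def non_dominated(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the objective vectors that no other vector dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

# Hypothetical (index1, index2, index3) scores for three candidate clusterings.
candidates = [(0.9, 0.5, 0.7), (0.8, 0.4, 0.6), (0.7, 0.9, 0.1)]
front = non_dominated(candidates)
```

Here the second candidate is worse than the first on all three scores and is dropped, while the first and third are kept because each is better on at least one index.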
Year: 2015 PMID: 26132309 PMCID: PMC4488466 DOI: 10.1371/journal.pone.0130995
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Flowchart for the proposed MOPSOSA algorithm.
Fig 2. Flowchart for initializing particle swarm.
Fig 3. Flowchart for the MOSA technique applied in MOPSOSA.
Description of the artificial and real-life datasets.
| Dataset | # Points | Dimension | # Clusters |
|---|---|---|---|
| Sph_5_2 | 250 | 2 | 5 |
| Sph_4_3 | 400 | 3 | 4 |
| Sph_6_2 | 300 | 2 | 6 |
| Sph_10_2 | 500 | 2 | 10 |
| Sph_9_2 | 900 | 2 | 9 |
| Pat1 | 557 | 2 | 3 |
| Pat2 | 417 | 2 | 2 |
| Long1 | 1000 | 2 | 2 |
| Sizes5 | 1000 | 2 | 4 |
| Spiral | 1000 | 2 | 2 |
| Square1 | 1000 | 2 | 4 |
| Square4 | 1000 | 2 | 4 |
| Twenty | 1000 | 2 | 20 |
| Fourty | 1000 | 2 | 40 |
| Iris | 150 | 4 | 3 |
| Cancer | 683 | 9 | 2 |
| Newthyroid | 215 | 5 | 3 |
| LiverDisorder | 345 | 6 | 2 |
| Glass | 214 | 9 | 6 |
Fig 4. Graphs of the artificial datasets.
(a) Sph_5_2. (b) Sph_4_3. (c) Sph_6_2. (d) Sph_10_2. (e) Sph_9_2. (f) Pat1. (g) Pat2. (h) Long1. (i) Sizes5. (j) Spiral. (k) Square1. (l) Square4. (m) Twenty. (n) Fourty.
Parameter settings used in MOPSOSA algorithm.
| Description | Parameter | Value |
|---|---|---|
| Swarm size | | 50 |
| Number of iterations | | 100 |
| Probability value to generate | | 0.95 |
| Probability value to generate | | 0.90 |
| Probability value to generate | | 0.90 |
| Minimum number of clusters | | 2 |
| Maximum number of clusters | | |
| Initial temperature | | 100 |
F-measure value and the number of clusters for different datasets obtained by MOPSOSA compared with those acquired by GenClustMOO, GenClustPESA2, MOCK, and VGAPS algorithms.
| Dataset | # Clusters | MOPSOSA k | MOPSOSA FM | GenClustMOO k | GenClustMOO FM | GenClustPESA2 k | GenClustPESA2 FM | MOCK k | MOCK FM | VGAPS k | VGAPS FM |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Sph_5_2 | 5 | 5 | 0.98 | 5 | 0.97 | 5 | 0.94 | 6 | 0.91 | 5 | 0.55 |
| Sph_4_3 | 4 | 4 | 1.00 | 4 | 1.00 | 4 | 1.00 | 4 | 1.00 | 4 | 1.00 |
| Sph_6_2 | 6 | 6 | 1.00 | 6 | 1.00 | 6 | 1.00 | 6 | 1.00 | 6 | 1.00 |
| Sph_10_2 | 10 | 10 | 0.99 | 10 | 0.99 | 12 | 0.94 | 6 | 0.72 | 7 | 0.76 |
| Sph_9_2 | 9 | 9 | 0.92 | 9 | 0.69 | 8 | 0.66 | 9 | 0.73 | 9 | 0.49 |
| Pat1 | 3 | 3 | 1.00 | 3 | 0.95 | 3 | 0.95 | 10 | 0.55 | 4 | 0.42 |
| Pat2 | 2 | 2 | 1.00 | 2 | 1.00 | 2 | 1.00 | 11 | 0.55 | 4 | 0.59 |
| Long1 | 2 | 2 | 1.00 | 2 | 1.00 | 2 | 1.00 | 2 | 1.00 | 3 | 0.50 |
| Sizes5 | 4 | 4 | 0.98 | 4 | 0.97 | 3 | 0.88 | 2 | 0.80 | 5 | 0.82 |
| Spiral | 2 | 2 | 1.00 | 2 | 1.00 | 2 | 1.00 | 3 | 0.95 | 6 | 0.38 |
| Square1 | 4 | 4 | 0.99 | 4 | 0.99 | 4 | 0.99 | 4 | 0.99 | 4 | 0.99 |
| Square4 | 4 | 4 | 0.94 | 4 | 0.92 | 4 | 0.88 | 4 | 0.90 | 2 | 0.93 |
| Twenty | 20 | 20 | 1.00 | 20 | 1.00 | 24 | 0.95 | 20 | 1.00 | 20 | 0.48 |
| Fourty | 40 | 40 | 1.00 | 40 | 1.00 | 40 | 0.98 | 40 | 1.00 | 2 | 0.10 |
| Iris | 3 | 3 | 0.92 | 3 | 0.79 | 3 | 0.93 | 2 | 0.78 | 3 | 0.76 |
| Cancer | 2 | 2 | 0.98 | 2 | 0.97 | 2 | 0.98 | 2 | 0.82 | 2 | 0.95 |
| Newthyroid | 3 | 3 | 0.89 | 3 | 0.86 | 9 | 0.69 | 2 | 0.74 | 5 | 0.66 |
| Liver Disorder | 2 | 2 | 0.69 | 2 | 0.67 | 5 | 0.60 | 2 | 0.67 | 2 | 0.70 |
| Glass | 6 | 6 | 0.57 | 6 | 0.49 | 5 | 0.53 | 5 | 0.53 | 5 | 0.53 |
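The FM columns compare an obtained clustering against the known class labels. This record does not define the F-measure used, so the sketch below assumes one common class-weighted definition: for each true class, take the best F-score over all obtained clusters, then average these weighted by class size. The function name `clustering_f_measure` is hypothetical.

```python
from collections import Counter

def clustering_f_measure(true_labels, pred_labels):
    """Class-weighted F-measure of a clustering against reference labels.

    Assumed definition (not confirmed by the source): for each true class,
    the maximum F-score over clusters, weighted by the class's share of points.
    """
    n = len(true_labels)
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(pred_labels)
    overlap = Counter(zip(true_labels, pred_labels))

    total = 0.0
    for c, n_c in class_sizes.items():
        best = 0.0
        for k, n_k in cluster_sizes.items():
            n_ck = overlap.get((c, k), 0)
            if n_ck == 0:
                continue  # no shared points, F-score is zero
            precision = n_ck / n_k
            recall = n_ck / n_c
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_c / n) * best
    return total

# Cluster IDs need not match class IDs; only the grouping matters.
perfect = clustering_f_measure([0, 0, 1, 1], [1, 1, 0, 0])  # 1.0
merged = clustering_f_measure([0, 0, 1, 1], [0, 0, 0, 0])
```

Under this definition a clustering that recovers every class exactly scores 1.0 regardless of how the clusters are numbered, while merging two classes into one cluster lowers precision and therefore the score.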
Averages and standard deviations for the F-measure values on the different datasets obtained from MOPSOSA, GenClustMOO, GenClustPESA2, MOCK, VGAPS, KM, and SL algorithms.
| Dataset | MOPSOSA | GenClustMOO | GenClustPESA2 | MOCK | VGAPS | KM | SL |
|---|---|---|---|---|---|---|---|
| Sph_5_2 | | 0.957 ± 0.021 | 0.936 ± 0.012 | 0.902 ± 0.011 | 0.541 ± 0.011 | 0.938 ± 0.015 | 0.661 ± 0.012 |
| Sph_4_3 | | | | | | | |
| Sph_6_2 | | | | | | | |
| Sph_10_2 | | 0.981 ± 0.011 | 0.931 ± 0.021 | 0.717 ± 0.013 | 0.752 ± 0.011 | 0.891 ± 0.014 | 0.841 ± 0.011 |
| Sph_9_2 | | 0.681 ± 0.012 | 0.652 ± 0.018 | 0.717 ± 0.009 | 0.481 ± 0.012 | 0.683 ± 0.013 | 0.250 ± 0.014 |
| Pat1 | | 0.946 ± 0.013 | 0.946 ± 0.009 | 0.547 ± 0.011 | 0.418 ± 0.014 | 0.618 ± 0.008 | 0.882 ± 0.011 |
| Pat2 | | | | 0.545 ± 0.013 | 0.582 ± 0.021 | 0.754 ± 0.013 | |
| Long1 | | | | | 0.487 ± 0.021 | 0.500 ± 0.011 | |
| Sizes5 | | 0.968 ± 0.001 | 0.883 ± 0.011 | 0.791 ± 0.012 | 0.816 ± 0.013 | 0.226 ± 0.021 | 0.181 ± 0.011 |
| Spiral | | | | 0.948 ± 0.011 | 0.373 ± 0.016 | 0.509 ± 0.011 | 0.504 ± 0.015 |
| Square1 | | | | | | 0.732 ± 0.021 | 0.368 ± 0.006 |
| Square4 | | 0.918 ± 0.014 | 0.878 ± 0.011 | 0.895 ± 0.011 | 0.925 ± 0.013 | 0.715 ± 0.015 | 0.368 ± 0.016 |
| Twenty | | | 0.948 ± 0.015 | | 0.479 ± 0.022 | 0.809 ± 0.003 | 0.947 ± 0.009 |
| Fourty | | | 0.979 ± 0.015 | | 0.950 ± 0.006 | 0.798 ± 0.018 | 0.909 ± 0.023 |
| Iris | | 0.788 ± 0.011 | 0.926 ± 0.015 | 0.775 ± 0.022 | 0.754 ± 0.013 | 0.887 ± 0.001 | 0.764 ± 0.009 |
| Cancer | | 0.969 ± 0.009 | 0.979 ± 0.014 | 0.918 ± 0.014 | 0.953 ± 0.012 | 0.961 ± 0.013 | 0.688 ± 0.008 |
| Newthyroid | | 0.863 ± 0.016 | 0.687 ± 0.015 | 0.739 ± 0.014 | 0.659 ± 0.011 | 0.677 ± 0.013 | 0.648 ± 0.009 |
| Liver Disorder | | 0.673 ± 0.002 | 0.603 ± 0.015 | 0.671 ± 0.012 | 0.705 ± 0.009 | 0.655 ± 0.013 | 0.672 ± 0.006 |
| Glass | | 0.494 ± 0.012 | 0.534 ± 0.012 | 0.534 ± 0.006 | 0.534 ± 0.008 | 0.492 ± 0.014 | 0.422 ± 0.007 |
The best F-measure for each dataset is marked in bold. Each algorithm was run 30 times independently.
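Entries of the form "mean ± standard deviation" summarize the F-measure across the independent runs of one algorithm on one dataset. A minimal sketch of that summary follows, assuming the sample standard deviation (the record does not say whether the sample or population form was used); `summarize_runs` and the scores are hypothetical, not results from the paper.

```python
import statistics

def summarize_runs(scores):
    """Format per-run F-measure values as 'mean ± std' to three decimals.

    Uses the sample standard deviation (n - 1 denominator); this choice
    is an assumption, not something the source states.
    """
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)
    return f"{mean:.3f} ± {std:.3f}"

# Hypothetical F-measure values from a handful of runs.
example_scores = [0.95, 0.97, 0.96, 0.94, 0.98]
summary = summarize_runs(example_scores)
```

Swapping `statistics.stdev` for `statistics.pstdev` gives the population form instead, which yields slightly smaller values for a small number of runs.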
Fig 5. Graphs of the artificial datasets after applying the MOPSOSA algorithm.
(a) Sph_5_2. (b) Sph_4_3. (c) Sph_6_2. (d) Sph_10_2. (e) Sph_9_2. (f) Pat1. (g) Pat2. (h) Long1. (i) Sizes5. (j) Spiral. (k) Square1. (l) Square4. (m) Twenty. (n) Fourty.