Kaushik Suresh, Debarati Kundu, Sayan Ghosh, Swagatam Das, Ajith Abraham, Sang Yong Han.
Abstract
This paper applies the Differential Evolution (DE) algorithm to automatic fuzzy clustering in a Multi-objective Optimization (MO) framework. It compares the performance of two multi-objective variants of DE on the fuzzy clustering problem, where two conflicting fuzzy validity indices are optimized simultaneously. The Pareto-optimal set produced by each algorithm consists of a number of non-dominated solutions, from which the user can choose the most promising ones according to the problem specifications. DE uses a real-coded representation of the search variables that accommodates a variable number of cluster centers. The performance of the multi-objective DE variants is also compared with that of two of the best-known MO clustering schemes, the Non-dominated Sorting Genetic Algorithm (NSGA-II) and Multi-Objective Clustering with an unknown number of Clusters K (MOCK). Experimental results on six artificial and four real-life datasets of varying complexity indicate that DE holds immense promise as a candidate algorithm for devising MO clustering schemes.
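The abstract mentions a real-coded representation that accommodates a variable number of cluster centers and two conflicting fuzzy validity indices, but this record specifies neither. The sketch below assumes one common choice that is not confirmed here: each vector holds K_max activation thresholds followed by K_max candidate centers, a center is active when its threshold exceeds 0.5 (at least two are always kept), and the two objectives are the fuzzy c-means measure J_m and the Xie-Beni index XB, both minimized. The names `decode`, `memberships`, and `objectives` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = List[float]


def decode(vector: Sequence[float], k_max: int, dim: int,
           threshold: float = 0.5) -> List[Point]:
    """Return the active cluster centers encoded in a flat real-coded vector.

    Assumed layout: k_max activation thresholds, then k_max centers of
    `dim` coordinates each. A center is active if its threshold > `threshold`.
    """
    activations = list(vector[:k_max])
    centers = [list(vector[k_max + j * dim:k_max + (j + 1) * dim])
               for j in range(k_max)]
    active = [j for j in range(k_max) if activations[j] > threshold]
    if len(active) < 2:
        # Keep at least two clusters: fall back to the two largest thresholds.
        active = sorted(range(k_max), key=lambda j: activations[j],
                        reverse=True)[:2]
    return [centers[j] for j in sorted(active)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def memberships(data: Sequence[Point], centers: Sequence[Point],
                m: float = 2.0) -> List[List[float]]:
    """Fuzzy c-means memberships u[i][j] of point i in cluster j."""
    u = []
    for x in data:
        d = [sq_dist(x, z) for z in centers]
        if 0.0 in d:
            # The point sits on a center: give it crisp membership there.
            hits = [1.0 if dj == 0.0 else 0.0 for dj in d]
            u.append([h / sum(hits) for h in hits])
            continue
        power = 1.0 / (m - 1.0)  # distances are squared, so exponent is 1/(m-1)
        u.append([1.0 / sum((dj / dk) ** power for dk in d) for dj in d])
    return u


def objectives(data: Sequence[Point], centers: Sequence[Point],
               m: float = 2.0) -> Tuple[float, float]:
    """Return (J_m, XB) for one decoded solution; both are to be minimized."""
    u = memberships(data, centers, m)
    jm = sum(u[i][j] ** m * sq_dist(x, z)
             for i, x in enumerate(data) for j, z in enumerate(centers))
    # Smallest squared distance between any two centers (cluster separation).
    sep = min(sq_dist(centers[a], centers[b])
              for a in range(len(centers)) for b in range(a + 1, len(centers)))
    if sep == 0.0:
        return jm, math.inf  # coincident centers: treat XB as infinitely bad
    # XB reuses the J_m numerator; this is the standard Xie-Beni index at m = 2.
    return jm, jm / (len(data) * sep)


data = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centers = decode([0.9, 0.2, 0.7, 0, 0, 5, 5, 10, 10], k_max=3, dim=2)
print(centers, objectives(data, centers))  # centers: [[0, 0], [10, 10]]
```

Under this encoding, DE can switch centers on and off simply by moving threshold values across 0.5, so the number of clusters evolves along with the center coordinates.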
Keywords: differential evolution; fuzzy clustering; micro-array data clustering; multi-objective optimization
Year: 2009 PMID: 22412346 PMCID: PMC3297137 DOI: 10.3390/s90503981
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Non-dominated Pareto front for artificial dataset_3.
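As a minimal illustration of the "non-dominated" solutions plotted in Figure 1, the sketch below filters a list of objective vectors down to its first Pareto front, assuming every objective is minimized. It is a plain pairwise check costing O(M·N²) for N solutions and M objectives, not the fast non-dominated sort used by NSGA-II; `dominates` and `non_dominated` are hypothetical names, and the example objective values are made up.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` Pareto-dominates `b` when every objective is minimized."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def non_dominated(points: Sequence[Sequence[float]]) -> List[int]:
    """Indices of the points that no other point dominates (first front)."""
    return [i for i, p in enumerate(points)
            if not any(dominates(q, p) for j, q in enumerate(points) if j != i)]


# Hypothetical two-objective values for five candidate clusterings.
candidates = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (2.5, 3.0)]
print(non_dominated(candidates))  # [0, 1, 3]
```

Here (3.0, 4.0) and (2.5, 3.0) are both dominated by (2.0, 3.0), so the front keeps the other three trade-offs, from which a user would pick according to the problem at hand.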
Details of the datasets used.
| Dataset | No. of points | No. of clusters | Dimension |
|---|---|---|---|
| Dataset_1 | 900 | 9 | 2 |
| Dataset_2 | 76 | 3 | 2 |
| Dataset_3 | 400 | 4 | 3 |
| Dataset_4 | 300 | 6 | 2 |
| Dataset_5 | 500 | 10 | 2 |
| Dataset_6 | 810 | 3 | 2 |
| Iris | 150 | 3 | 4 |
| Wine | 178 | 3 | 13 |
| Breast Cancer | 683 | 2 | 9 |
| Yeast Sporulation | 474 | 7 | 7 |
Figure 2. Final clustering results for artificial datasets 5 and 6 with MODE for different settings of the scale factor F.
Figure 3. Final clustering results for artificial datasets 5 and 6 with MODE for different settings of the crossover rate Cr.
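Figures 2 and 3 vary the scale factor F and the crossover rate Cr. This record does not give the exact MODE operators, so the sketch below shows the classic DE/rand/1/bin trial-vector construction of Storn and Price as an assumed stand-in: F scales the difference between two random donors, and Cr is the probability that each component of the trial vector comes from the mutant rather than the target. The function name and default values are illustrative, and the population needs at least four members.

```python
import random
from typing import List, Sequence


def de_rand_1_bin(population: Sequence[Sequence[float]], i: int,
                  F: float = 0.8, Cr: float = 0.9,
                  rng=random) -> List[float]:
    """Trial vector for target population[i] via DE/rand/1/bin."""
    n, dim = len(population), len(population[i])
    # Three distinct donors, all different from the target.
    r1, r2, r3 = rng.sample([k for k in range(n) if k != i], 3)
    # Mutation: base vector plus F times a difference vector.
    mutant = [population[r1][d] + F * (population[r2][d] - population[r3][d])
              for d in range(dim)]
    # Binomial crossover: take each component from the mutant with
    # probability Cr; j_rand guarantees at least one mutant component.
    j_rand = rng.randrange(dim)
    target = population[i]
    return [mutant[d] if (rng.random() < Cr or d == j_rand) else target[d]
            for d in range(dim)]


rng = random.Random(42)
pop = [[rng.uniform(0.0, 1.0) for _ in range(4)] for _ in range(6)]
trial = de_rand_1_bin(pop, 0, F=0.5, Cr=0.7, rng=rng)
```

A larger F takes bigger exploratory steps, while Cr near 1 makes the trial resemble the mutant and Cr near 0 keeps it close to the target. In a multi-objective DE the trial then typically competes with its target through Pareto dominance rather than a single fitness comparison; how ties and mutually non-dominated pairs are handled differs between MODE variants.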
Mean adjusted Rand index and number of clusters found, with standard deviations in parentheses, for four contestant algorithms over 30 independent runs on nine datasets.
| Dataset | No. of clusters | Adjusted Rand index | No. of clusters | Adjusted Rand index | No. of clusters | Adjusted Rand index |
|---|---|---|---|---|---|---|
| Dataset_1 | 9.43 (0.843) | 0.828437 (0.046182) | 9.37 (1.72) | 0.802180 (0.004782) | 8.52 (2.81) | 0.810934 (0.0059348) |
| Dataset_2 | 3.74 (0.363) | 0.9273464 (0.0008573) | 3.16 (0.072) | 0.9378123 (0.006821) | 3.33 (1.03) | 0.946547 (0.004536) |
| Dataset_3 | 4.14 (0.36) | 0.951786 (0.004827) | 3.57 (0.51) | 0.963841 (0.0046719) | 3.78 (1.25) | 0.878732 (0.0712523) |
| Dataset_4 | 6.13 (1.27) | 0.857463 (0.065639) | 6.28 (0.46) | 0.957818 (0.004678) | 6.08 (0.51) | 0.978761 (0.006734) |
| Dataset_5 | 9.24 (3.89) | 0.983785 (0.076764) | 12.43 (0.939) | 0.947641 (0.006646) | 10.41 (0.80) | 0.9454568 (0.0012043) |
| Dataset_6 | 5.62 (0.867) | 0.881136 (0.078348) | 4.65 (1.58) | 0.881395 (0.056483) | 5.16 (0.38) | 0.910294 (0.016743) |
| Iris | 3.04 (0.16) | 0.738626 (0.0756779) | 2.16 (1.06) | 0.715898 (0.005739) | 3.05 (0.37) | 0.736574 (0.075763) |
| Wine | 3.65 (0.83) | 0.858876 (0.0035287) | 3.88 (0.67) | 0.828645 (0.0074653) | 3.59 (0.46) | 0.864764 (0.0034398) |
| Breast Cancer | 2.68 (0.64) | 0.912173 (0.0043247) | 2.57 (0.60) | 0.944236 (0.006521) | 2.10 (0.53) | 0.9465731 (0.006748) |
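The adjusted Rand index in this table scores agreement between a clustering and the known class labels, corrected for chance: 1 means the two partitions are identical up to relabeling, and values near 0 mean chance-level agreement. The sketch below computes the Hubert and Arabie form from the contingency table of two label lists using only the standard library; it is an illustration, not the authors' evaluation code, `adjusted_rand_index` is a hypothetical name, and it assumes at least two points.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_true: Sequence[Hashable],
                        labels_pred: Sequence[Hashable]) -> float:
    """Adjusted Rand index (Hubert and Arabie) between two flat partitions."""
    n = len(labels_true)
    pair_counts = Counter(zip(labels_true, labels_pred))  # contingency table
    sum_cells = sum(comb(c, 2) for c in pair_counts.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # chance-level agreement
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:
        # Degenerate case (e.g. both partitions trivial): treat as perfect.
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))  # 4/7, about 0.571
```

Because only pair counts matter, cluster labels need not match the class labels, which is what allows runs that find different numbers of clusters to be compared on one scale.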
Mean Silhouette index and number of clusters found, with standard deviations in parentheses, for four contestant algorithms over 30 independent runs on the yeast sporulation dataset.
| Dataset | No. of clusters | Silhouette index | No. of clusters | Silhouette index | No. of clusters | Silhouette index |
|---|---|---|---|---|---|---|
| Yeast Sporulation | 6.34 (0.32) | 0.558619 (0.057832) | 7.22 (0.68) | 0.641306 (0.04813) | 6.67 (0.857) | 0.613567 (0.005738) |
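For the yeast sporulation data the tables report the Silhouette index rather than the adjusted Rand index. For each point i, a(i) is its mean distance to the other members of its own cluster and b(i) is its smallest mean distance to the members of any other cluster; s(i) = (b(i) - a(i)) / max(a(i), b(i)), and the index is the mean of s(i) over all points, so values near 1 indicate compact, well-separated clusters. The sketch assumes Euclidean distance and crisp labels (fuzzy memberships would first be hardened, for example by taking the largest membership), sets s(i) = 0 for singleton clusters by convention, and uses the hypothetical name `silhouette_index`.

```python
import math
from typing import Hashable, Sequence


def silhouette_index(data: Sequence[Sequence[float]],
                     labels: Sequence[Hashable]) -> float:
    """Mean silhouette width s(i) over all points (Euclidean distance)."""
    labels = list(labels)
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("the Silhouette index needs at least two clusters")
    total = 0.0
    for i, x in enumerate(data):
        # Summed distance and member count per cluster, excluding x itself.
        sums = {c: 0.0 for c in clusters}
        counts = {c: 0 for c in clusters}
        for j, y in enumerate(data):
            if j != i:
                sums[labels[j]] += math.dist(x, y)
                counts[labels[j]] += 1
        own = labels[i]
        if counts[own] == 0:
            continue  # singleton cluster: s(i) = 0 by convention
        a = sums[own] / counts[own]
        b = min(sums[c] / counts[c] for c in clusters if c != own)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(data)


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
print(silhouette_index(points, [0, 0, 1, 1]))  # about 0.90
```

This direct version costs O(N²) distance evaluations, which is unproblematic for datasets of a few hundred points such as the 474 yeast genes.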
Unpaired t-test results for the adjusted Rand index.
| Dataset | Std. error of difference | t | 95% confidence interval of difference | Two-tailed P | Significance |
|---|---|---|---|---|---|
| Dataset_1 | 0.021 | 2.9201 | -0.1050 to -0.0189 | 0.0059 | |
| Dataset_2 | 0.013 | 5.0453 | -0.0922 to -0.0394 | < 0.0001 | |
| Dataset_3 | 0.002 | 17.965 | -0.0452 to -0.0360 | < 0.0001 | |
| Dataset_4 | 0.005 | 6.4431 | -0.0419 to -0.0219 | < 0.0001 | |
| Dataset_5 | 0.009 | 1.3744 | -0.0309 to 0.0059 | 0.1774 | Not significant |
| Dataset_6 | 0.003 | 2.3999 | -0.0118 to -0.0010 | 0.0214 | |
| Iris | 0.009 | 6.3744 | -0.0309 to 0.0059 | 0.1774 | |
| Wine | 0.003 | 2.3999 | -0.0118 to -0.0010 | 0.0278 | |
| Breast Cancer | 0.009 | 1.3744 | -0.0309 to 0.0059 | 0.1774 | Not significant |
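These t-test tables list, per dataset, the standard error of the difference, the t statistic, the 95% confidence interval of the difference, and the two-tailed P value. The record does not say which pair of algorithms is compared or which variant of the test was used. As a sketch under those gaps, the code below runs a pooled-variance (Student) unpaired t-test from summary statistics, assuming 30 runs per algorithm as in the table captions; the Student t CDF is integrated numerically with Simpson's rule so that only the standard library is needed. All function names and the example numbers are hypothetical.

```python
import math
from typing import Dict


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """CDF by Simpson's rule on [0, |x|], using symmetry about zero."""
    h = abs(x) / steps  # `steps` must be even for Simpson's rule
    s = t_pdf(0.0, df) + t_pdf(abs(x), df)
    s += sum((4 if k % 2 else 2) * t_pdf(k * h, df) for k in range(1, steps))
    area = s * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p: float, df: int) -> float:
    """Inverse CDF by bisection; assumes the quantile lies in [-50, 50]."""
    if p < 0.5:
        return -t_quantile(1 - p, df)
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def unpaired_t_test(mean1: float, sd1: float, n1: int,
                    mean2: float, sd2: float, n2: int,
                    level: float = 0.95) -> Dict[str, object]:
    """Student's (pooled-variance) unpaired t-test from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # std. error of difference
    diff = mean1 - mean2
    t = diff / se
    p = 2 * (1 - t_cdf(abs(t), df))  # two-tailed P value
    t_crit = t_quantile(0.5 + level / 2, df)
    return {"se": se, "t": t, "p": p,
            "ci": (diff - t_crit * se, diff + t_crit * se)}


# Hypothetical summary statistics for two algorithms, 30 runs each.
result = unpaired_t_test(0.95, 0.01, 30, 0.94, 0.01, 30)
print(result)
```

If SciPy is available, `scipy.stats.ttest_ind_from_stats` performs the same pooled test from means, standard deviations, and sample sizes and can replace the hand-rolled CDF; the confidence interval would still be built from `se` and the t quantile as above.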
Unpaired t-test results for the Silhouette index.
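| Dataset | Std. error of difference | t | 95% confidence interval of difference | Two-tailed P |
|---|---|---|---|---|
| Yeast Sporulation | 0.003 | 2.3999 | -0.0118 to -0.0010 | 0.0214 |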
Figure 4. Clustering result for artificial dataset_1.
Figure 9. Clustering result for artificial dataset_6.
Figure 10. Cluster profile plots for the clustering solution obtained by the MODE-based clustering algorithm for yeast sporulation data.
Figure 11. Heatmaps (Eisen plots) for the clustering solution obtained by the MODE-based clustering algorithm for yeast sporulation data.
Figure 12. Part of the FatiGO results for (a) cluster 6 and (b) cluster 2 of the best-performing multi-objective clustering algorithm on the yeast sporulation dataset.