Yi Peng, Yong Zhang, Gang Kou, Yong Shi.
Abstract
Determining the number of clusters in a data set is an essential yet difficult step in cluster analysis. Since this task involves more than one criterion, it can be modeled as a multiple criteria decision making (MCDM) problem. This paper proposes an MCDM-based approach to estimate the number of clusters for a given data set. In this approach, MCDM methods treat the candidate numbers of clusters as alternatives and the outputs of any clustering algorithm on validity measures as criteria. The proposed method is examined in an experimental study using three MCDM methods, the well-known k-means clustering algorithm, ten relative measures, and fifteen public-domain UCI machine learning data sets. The results show that the MCDM methods work fairly well in estimating the number of clusters in the data and outperform the ten relative measures considered in the study.
Year: 2012 PMID: 22870181 PMCID: PMC3411440 DOI: 10.1371/journal.pone.0041713
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. An MCDM-based approach for determining the number of clusters in a dataset.
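The idea of treating candidate numbers of clusters as alternatives and validity measures as criteria can be illustrated with a minimal sketch of a weighted sum model (WSM). The function name `wsm_rank`, the toy scores, the equal weights, and the min-max normalization are all illustrative assumptions and are not taken from the paper; in practice each row would hold the validity-measure outputs of k-means for one candidate K.

```python
def wsm_rank(scores, benefit, weights=None):
    """Rank alternatives with a simple weighted sum model.

    scores  : dict mapping a candidate K to a list of criterion values
    benefit : list of bools, True if larger is better for that criterion
    weights : optional list of criterion weights (defaults to equal weights)
    """
    ks = sorted(scores)
    n_crit = len(benefit)
    weights = weights or [1.0 / n_crit] * n_crit
    totals = {k: 0.0 for k in ks}
    for j in range(n_crit):
        column = [scores[k][j] for k in ks]
        lo, hi = min(column), max(column)
        span = hi - lo
        for k in ks:
            # Min-max normalization to [0, 1]; a constant column carries no information.
            norm = 0.0 if span == 0 else (scores[k][j] - lo) / span
            if not benefit[j]:
                norm = 1.0 - norm  # flip cost criteria so larger is always better
            totals[k] += weights[j] * norm
    ranking = sorted(ks, key=lambda k: totals[k], reverse=True)
    return totals, ranking


# Hypothetical scores: column 0 is a "larger is better" measure,
# column 1 is a "smaller is better" measure.
toy_scores = {
    2: [0.40, 1.20],
    3: [0.55, 0.80],
    4: [0.30, 1.00],
}
totals, ranking = wsm_rank(toy_scores, benefit=[True, False])
print(ranking)  # candidate K values, best first
```

The first entry of `ranking` is the estimated number of clusters under these assumptions. A different normalization or weighting would generally change the totals and can change the ranking.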
Data set structures.
| Data Sets | Number of Records | Number of Attributes | Number of Classes |
| Breast cancer | 699 | 10 | 2 |
| Breast tissue | 106 | 10 | 6 |
| Acute inflammations | 120 | 6 | 2 |
| Ecoli | 336 | 8 | 8 |
| Glass | 214 | 10 | 6 |
| Haberman’s survival | 306 | 3 | 2 |
| Ionosphere | 351 | 34 | 2 |
| Iris | 150 | 4 | 3 |
| Parkinsons | 197 | 23 | 2 |
| Pima Indians diabetes | 768 | 8 | 2 |
| Sonar | 208 | 60 | 2 |
| Transfusion | 748 | 5 | 2 |
| Wine | 178 | 13 | 3 |
| Wine quality (red) | 1599 | 11 | 6 |
| Yeast | 1484 | 8 | 10 |
Rankings of numbers of clusters for the yeast data set.
| | PROMETHEE II | | TOPSIS | | WSM | |
| Number of clusters | Value | Order | Value | Order | Value | Order |
| K = 2 | −0.2265 | 8 | 0.400601 | 9 | −0.25409 | 9 |
| K = 3 | 0.1125 | 3 | 0.537494 | 5 | −0.1994 | 3 |
| K = 4 | −0.17975 | 7 | 0.451931 | 8 | −0.2342 | 7 |
| K = 5 | 0.102 | 4 | 0.539354 | 4 | −0.2154 | 4 |
| K = 6 | −0.31675 | 9 | 0.481188 | 7 | −0.2463 | 8 |
| K = 7 | 0.02575 | 5 | 0.544836 | 3 | −0.2213 | 5 |
| K = 8 | −0.10825 | 6 | 0.529223 | 6 | −0.2336 | 6 |
| K = 9 | 0.29475 | 2 | | 1 | | 1 |
| K = 10 | | 1 | 0.603641 | 2 | −0.185 | 2 |
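The TOPSIS columns above pair a closeness value with its order, with larger values ranked first. A minimal sketch of standard TOPSIS shows how such values and orders can be produced. The function name `topsis`, the toy decision matrix, and the equal weights are hypothetical and do not reproduce the paper's inputs.

```python
import math


def topsis(matrix, benefit, weights=None):
    """Return the TOPSIS closeness coefficient of each row (alternative).

    Assumes no column is all zeros and that the alternatives are not all identical.
    """
    n_crit = len(benefit)
    weights = weights or [1.0 / n_crit] * n_crit
    # Step 1: vector-normalize each column and apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    # Step 2: ideal and anti-ideal points, respecting the criterion direction.
    cols = list(zip(*v))
    ideal = [max(c) if b else min(c) for c, b in zip(cols, benefit)]
    anti = [min(c) if b else max(c) for c, b in zip(cols, benefit)]
    # Step 3: Euclidean distances to both points and the closeness coefficient.
    closeness = []
    for row in v:
        d_plus = math.sqrt(sum((x - y) ** 2 for x, y in zip(row, ideal)))
        d_minus = math.sqrt(sum((x - y) ** 2 for x, y in zip(row, anti)))
        closeness.append(d_minus / (d_plus + d_minus))
    return closeness


ks = [2, 3, 4]  # candidate numbers of clusters (alternatives)
toy_matrix = [
    [0.40, 1.20],  # hypothetical validity scores for K = 2
    [0.55, 0.80],  # K = 3
    [0.30, 1.00],  # K = 4
]
scores = topsis(toy_matrix, benefit=[True, False])
order = sorted(range(len(ks)), key=lambda i: scores[i], reverse=True)
ranking = [ks[i] for i in order]
print(ranking)  # candidate K values, best first
```

The closeness coefficient always lies in [0, 1], which is consistent with the range of the TOPSIS values reported in the table.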
Estimations of number of clusters by the relative measures.
| | Relative measures | | | | | | | | | | |
| Data sets | Dunn | Sil | PBM | Hubert | Normalized Hubert | DB | SD | S_Dbw | CS | C-index | #Cluster |
| Breast cancer | 5 | 2 | 2 | 2 | 2 | 10 | 2 | 10 | 2 | 5 | 2 |
| Breast tissue | 3 | 2 | 6 | 2 | 2 | 3 | 2 | 7 | 6 | 10 | 6 |
| Acute inflammations | 4 | 2 | 9 | 2 | 2 | 10 | 4 | 10 | 9 | 4 | 2 |
| Ecoli | 3 | 2 | 3 | 2 | 2 | 10 | 4 | 7 | 4 | 4 | 8 |
| Glass | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 10 | 8 | 2 | 6 |
| Haberman’s survival | 8 | 2 | 5 | 2 | 2 | 10 | 4 | 10 | 4 | 10 | 2 |
| Ionosphere | 2 | 2 | 2 | 2 | 3 | 10 | 2 | 9 | 9 | 10 | 2 |
| Iris | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 10 | 2 | 2 | 3 |
| Parkinsons | 3 | 3 | 5 | 2 | 2 | 8 | 3 | 9 | 8 | 10 | 2 |
| Pima Indians diabetes | 2 | 2 | 4 | 2 | 2 | 10 | 3 | 10 | 10 | 10 | 2 |
| Sonar | 4 | 2 | 2 | 2 | 2 | 10 | 4 | 10 | 4 | 4 | 2 |
| Transfusion | 9/10 | 2 | 7 | 2 | 2 | 2 | 2 | 10 | 7 | 9 | 2 |
| Wine | 6 | 3 | 3 | 2 | 2 | 3 | 2 | 7 | 3 | 6 | 3 |
| Wine quality (red) | 2 | 2 | 3 | 2 | 2 | 9 | 3 | 3 | 3 | 9 | 6 |
| Yeast | 10 | 2 | 2 | 2 | 2 | 10 | 3 | 9 | 10 | 10 | 10 |
Estimations of number of clusters by the MCDM methods.
| | MCDM Methods | | | |
| Data sets | PROMETHEE II | TOPSIS | WSM | #Cluster |
| Breast cancer | 2 | 2 | 2 | 2 |
| Breast tissue | 6 | 6 | 6 | 6 |
| Acute inflammations | 2 | 4 | 4 | 2 |
| Ecoli | 4 | 3 | 3 | 8 |
| Glass | 8 | 2 | 2 | 6 |
| Haberman’s survival | 2 | 2 | 2 | 2 |
| Ionosphere | 2 | 2 | 2 | 2 |
| Iris | 2 | 2 | 2 | 3 |
| Parkinsons | 5 | 3 | 3 | 2 |
| Pima Indians diabetes | 2 | 2 | 2 | 2 |
| Sonar | 2 | 2 | 2 | 2 |
| Transfusion | 2 | 2 | 2 | 2 |
| Wine | 3 | 3 | 3 | 3 |
| Wine quality (red) | 6 | 6 | 3 | 6 |
| Yeast | 10 | 9 | 9 | 10 |
Results summary.
| | Relative Measures | | | | | | | | | | MCDM Methods | | |
| | Dunn | Silhouette | PBM | Hubert | Normalized Hubert | DB | SD | S_Dbw | CS | C-index | PROMETHEE | TOPSIS | WSM |
| Correct number | 3 | 8 | 5 | 8 | 7 | 3 | 3 | 0 | 4 | 1 | 11 | 9 | 8 |
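The MCDM counts in the summary row can be re-derived by comparing each method's estimate with the true number of classes. The sketch below transcribes the MCDM estimates table. Cells that did not survive extraction are assumed to be correct estimates, that is, equal to the true number of classes; this is an inference, not something the source states. The names `ROWS` and `count_correct` are illustrative.

```python
# Per data set: (PROMETHEE II, TOPSIS, WSM, true number of classes).
# Assumption: estimates missing from the extracted table equal the true class count.
ROWS = {
    "Breast cancer": (2, 2, 2, 2),
    "Breast tissue": (6, 6, 6, 6),
    "Acute inflammations": (2, 4, 4, 2),
    "Ecoli": (4, 3, 3, 8),
    "Glass": (8, 2, 2, 6),
    "Haberman's survival": (2, 2, 2, 2),
    "Ionosphere": (2, 2, 2, 2),
    "Iris": (2, 2, 2, 3),
    "Parkinsons": (5, 3, 3, 2),
    "Pima Indians diabetes": (2, 2, 2, 2),
    "Sonar": (2, 2, 2, 2),
    "Transfusion": (2, 2, 2, 2),
    "Wine": (3, 3, 3, 3),
    "Wine quality (red)": (6, 6, 3, 6),
    "Yeast": (10, 9, 9, 10),
}
METHODS = ("PROMETHEE II", "TOPSIS", "WSM")


def count_correct(rows):
    """Count, per method, the data sets whose estimate equals the true class count."""
    return {
        method: sum(1 for est in rows.values() if est[i] == est[3])
        for i, method in enumerate(METHODS)
    }


counts = count_correct(ROWS)
print(counts)  # {'PROMETHEE II': 11, 'TOPSIS': 9, 'WSM': 8}
```

Under that assumption the tallies agree with the "Correct number" row for the three MCDM methods.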