Abstract
Cluster evaluation is of great importance in machine learning and data mining, because the quality of the resulting clusters shows how competent a proposed approach or algorithm is. Nevertheless, evaluating cluster quality remains an open issue. Although many cluster validity indices have been proposed, most of them measure cluster quality correctly only when the clusters are spherical, and very few real-world clusters are spherical. To overcome this issue, this study proposes a new Validity Index for Arbitrary-Shaped Clusters based on kernel density estimation (the VIASCKDE Index). In the VIASCKDE Index, we use the separation and compactness of each data point to support arbitrary-shaped clusters, and we use kernel density estimation (KDE) to give more weight to the denser areas of the clusters to support cluster compactness. To evaluate the performance of our approach, we compared it with state-of-the-art cluster validity indices. The experimental results demonstrate that the VIASCKDE Index outperforms the compared indices.
Year: 2022 PMID: 35720897 PMCID: PMC9200537 DOI: 10.1155/2022/4059302
Source DB: PubMed Journal: Comput Intell Neurosci
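The abstract describes the index only in words. As a rough illustration, the Python sketch below scores each point by comparing the distance to its nearest neighbour in its own cluster (compactness) with the distance to its nearest neighbour in any other cluster (separation), then averages the scores with Gaussian KDE weights so that points in denser regions count more. The exact formula is not given in this excerpt, so the per-point score, the weight normalisation, the size-weighted average over clusters, and all function names here are assumptions, not the authors' definition.

```python
import math


def euclid(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def gaussian_kde_weights(points, bandwidth):
    """Unnormalised Gaussian KDE value at each point, estimated from the same points."""
    weights = []
    for p in points:
        total = sum(math.exp(-0.5 * (euclid(p, q) / bandwidth) ** 2) for q in points)
        weights.append(total / len(points))
    return weights


def kde_weighted_index(points, labels, bandwidth=0.05):
    """Hypothetical KDE-weighted compactness/separation score in [-1, 1]; higher is better."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    total = 0.0
    for c in clusters:
        members = [p for p, l in zip(points, labels) if l == c]
        others = [p for p, l in zip(points, labels) if l != c]
        weights = gaussian_kde_weights(members, bandwidth)
        max_w = max(weights)  # always > 0 because each point contributes exp(0) = 1 to itself
        num = den = 0.0
        for i, p in enumerate(members):
            same = [euclid(p, q) for j, q in enumerate(members) if j != i]
            a = min(same) if same else 0.0          # compactness: nearest point in own cluster
            b = min(euclid(p, q) for q in others)   # separation: nearest point in another cluster
            score = 0.0 if max(a, b) == 0 else (b - a) / max(a, b)
            w = weights[i] / max_w                  # denser points get weights closer to 1
            num += w * score
            den += w
        total += len(members) * (num / den)         # weight each cluster by its size
    return total / len(points)


# Usage: two tight, well-separated groups of made-up points.
pts = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0), (5.0, 5.0), (5.0, 5.1), (5.1, 5.0)]
print(kde_weighted_index(pts, [0, 0, 0, 1, 1, 1], bandwidth=0.5))
```

Because both distances are nearest-neighbour distances rather than distances to a centroid, a score of this kind does not assume that clusters are spherical, which is the property the abstract emphasises.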
Figure 1. An example of the relationship between the compactness and separation concepts of two clusters in a two-dimensional data space.
Figure 2. An example of the Silhouette Index.
Comparison of clustering validity indices that were used for experimentation in the present study.
| Cluster validity index | Notation | Runtime complexity | Optimal value | Considering denser region? | Handling arbitrary-shaped clusters? | Advantages | Disadvantages |
|---|---|---|---|---|---|---|---|
| Silhouette Index [ | SI | | Max. | ✗ | ✗ | The score is higher when the clusters are dense and well separated | Good at handling the spherical clusters, high computational complexity |
| Dunn Index [ | DI | | Max. | ✗ | ✓ | Competent at cluster validity task | High computational cost with high-dimensional data and the number of clusters |
| Calinski-Harabasz Index [ | CH | | Max. | ✗ | ✗ | Good at well separated and compact clusters, its computational complexity is very low | It is not competent enough at the cluster validation task. |
| Davies–Bouldin Index [ | DB | | Min. | ✗ | ✗ | Good at well separated and compact clusters, its computational complexity is very low | It is not competent enough at the cluster validation task. |
| S_Dbw validity Index [ | S_Dbw | | Min. | ✗ | ✓ | Its computational complexity is very low | Affected negatively by the distribution of data |
| Distance-based Separability Index [ | DSI | | Min. | ✗ | ✓ | Useful to discover the shape of clusters | Affected negatively when clusters are too close and its computational complexity is high |
| Root-mean-square std dev [ | RMSSTD | | Min. | ✗ | ✗ | Good for hierarchical clustering | Has issues when the clusters are close to each other |
| VIASCKDE Index (proposed) | VIASCKDE | | Max. | ✓ | ✓ | It can handle the arbitrary-shaped clusters, take into account the denser regions, can be used for density-based and micro-cluster-based approaches | Has issues when the clusters are close to each other |
Figure 3. Some examples of arbitrary-shaped clusters.
Figure 4. An example of various densities in clusters, using the Aggregation dataset. (a) Density distribution of the dataset. (b) Density distribution inside a cluster.
Figure 5. Relationship between the compactness and separation values of any data point in the VIASCKDE Index.
Figure 6. An example of kernel density estimation and its histogram.
Figure 7. Types of kernel density estimation curves.
Datasets used in the experiments.
| Dataset | Type | # of Features | # of data | # of classes | Reference |
|---|---|---|---|---|---|
| Half-kernel | Synthetic | 2 | 1000 | 2 | [ |
| Two spirals | Synthetic | 2 | 312 | 3 | [ |
| Outlier | Synthetic | 2 | 700 | 4 | [ |
| Corners | Synthetic | 2 | 2000 | 4 | [ |
| Cluster in cluster | Synthetic | 2 | 1012 | 2 | [ |
| Crescent full moon | Synthetic | 2 | 1000 | 2 | [ |
| Moon | Synthetic | 2 | 514 | 4 | [ |
| Face | Synthetic | 2 | 322 | 4 | [ |
| Wave | Synthetic | 2 | 287 | 2 | [ |
| Aggregation | Synthetic | 2 | 788 | 7 | [ |
| Zelnik1 | Synthetic | 2 | 622 | 4 | [ |
| Zelnik5 | Synthetic | 2 | 512 | 4 | [ |
| Xclara | Synthetic | 2 | 3000 | 3 | [ |
| Banana | Synthetic | 2 | 4811 | 2 | [ |
| Ds2c2sc13 | Synthetic | 2 | 588 | 13 | [ |
| 2sp2glob | Synthetic | 2 | 999 | 3 | [ |
| Cure-t1-2000n | Synthetic | 2 | 2000 | 5 | [ |
| Thyroid | Real | 4 | 215 | 2 | [ |
| Fisher iris | Real | 4 | 150 | 3 | [ |
| Breast cancer | Real | 8 | 699 | 2 | [ |
Figure 8. The distributions of some of the datasets used.
ARI results obtained with the parametric and nonparametric methods.
| Dataset | ARI with Gaussian weight | ARI with KDE weight |
|---|---|---|
| Half-kernel | | |
| Two spirals | | |
| Outlier | | |
| Corners | | |
| Cluster in cluster | | |
| Crescent full moon | | |
| Moon | | |
| Face | 0.9949 | |
| Wave | 1.0000 | |
| Fisher iris | 0.7493 | |
| Breast cancer | 0.7540 | |
| Aggregation | 0.7338 | |
| Thyroid | −0.0619 | |
| Zelnik1 | | 0.9488 |
| Zelnik5 | | |
| Xclara | 0.0001 | 0.0001 |
| Banana | | |
| Ds2c2sc13 | 0.3187 | |
| 2sp2glob | | 0.9880 |
| Cure-t1-2000n | | |
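The Adjusted Rand Index (ARI) used as the external reference score in these tables can be computed from the contingency counts of two labelings. The sketch below follows the standard pair-counting definition; the function name is illustrative, and nothing here is taken from the study's own code.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Standard pair-counting ARI: 1.0 for identical partitions, about 0 for random ones."""
    n = len(labels_true)
    if n != len(labels_pred) or n < 2:
        raise ValueError("need two labelings of the same length (at least 2 items)")
    contingency = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)
    sum_ij = sum(comb(c, 2) for c in contingency.values())  # pairs grouped together in both
    sum_a = sum(comb(c, 2) for c in rows.values())          # pairs grouped together in truth
    sum_b = sum(comb(c, 2) for c in cols.values())          # pairs grouped together in prediction
    expected = sum_a * sum_b / comb(n, 2)                   # agreement expected by chance
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put everything in one cluster
    return (sum_ij - expected) / (max_index - expected)


# Usage: cluster labels only need to match up to renaming.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

The chance correction is what allows negative values such as those reported for the Thyroid dataset: a labeling that agrees with the ground truth less often than a random one scores below zero.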
Results obtained with different kernels.
| Kernel | VIASCKDE: Face | VIASCKDE: Aggregation | VIASCKDE: Outlier | VIASCKDE: Thyroid | VIASCKDE: Crescent full moon | VIASCKDE: Cure-t1-2000n | ARI: Face | ARI: Aggregation | ARI: Outlier | ARI: Thyroid | ARI: Crescent full moon | ARI: Cure-t1-2000n |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Gaussian | 0.7063 | 0.6368 | 0.6797 | 0.4947 | 0.6623 | 0.6555 | | | | | | |
| Cosine | 0.5967 | 0.6564 | 0.6499 | 0.1699 | 0.6340 | 0.6343 | | 0.8089 | | | | |
| Exponential | 0.7005 | 0.6371 | 0.6714 | 0.5541 | 0.6426 | 0.6653 | 0.0386 | 0.8089 | | 0.5034 | | |
| Linear | 0.5736 | 0.6427 | 0.6306 | 0.1594 | 0.6169 | 0.6371 | | 0.8089 | | | | |
| Epanechnikov | 0.6021 | 0.6562 | 0.6581 | 0.1758 | 0.6388 | 0.6295 | | 0.8089 | | | | |
| Tophat | 0.6457 | 0.6165 | 0.6433 | 0.2306 | 0.6664 | 0.6299 | | 0.0333 | | | | |
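The six kernel names compared in this table are common choices for kernel density estimation. The sketch below gives their usual one-dimensional textbook forms, each normalised to integrate to one, together with a plain KDE evaluator. Whether the study used exactly these normalisations, and how it extended them to multivariate data, is not stated in this excerpt, so the definitions and names here are assumptions for illustration only.

```python
import math

# Standard 1-D kernel profiles K(u); each integrates to 1 over the real line.
KERNELS = {
    "gaussian": lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi),
    "tophat": lambda u: 0.5 if abs(u) <= 1 else 0.0,
    "epanechnikov": lambda u: 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0,
    "exponential": lambda u: 0.5 * math.exp(-abs(u)),
    "linear": lambda u: (1 - abs(u)) if abs(u) <= 1 else 0.0,
    "cosine": lambda u: (math.pi / 4) * math.cos(math.pi * u / 2) if abs(u) <= 1 else 0.0,
}


def kde_1d(x, samples, bandwidth, kernel="gaussian"):
    """Density estimate at x: the mean of kernels centred on each sample, scaled by bandwidth."""
    k = KERNELS[kernel]
    return sum(k((x - s) / bandwidth) for s in samples) / (len(samples) * bandwidth)


# Usage: compare the estimates of each kernel at one point (sample values are made up).
samples = [0.0, 0.2, 1.0]
for name in KERNELS:
    print(name, round(kde_1d(0.1, samples, bandwidth=0.5, kernel=name), 4))
```

The bandwidth plays the role examined in the bandwidth table: small values make the estimate spiky around individual samples, while large values flatten it until every point receives nearly the same weight.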
Figure 9. Types of kernel density estimation curves.
Results obtained with different bandwidth values.
| Bandwidth | VIASCKDE: Face | VIASCKDE: Aggregation | VIASCKDE: Outlier | VIASCKDE: Thyroid | VIASCKDE: Crescent full moon | VIASCKDE: Cure-t1-2000n | ARI: Face | ARI: Aggregation | ARI: Outlier | ARI: Thyroid | ARI: Crescent full moon | ARI: Cure-t1-2000n |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.01 | 0.3377 | 0.3444 | 0.4650 | 0.0556 | 0.4780 | 0.5264 | −0.0386 | 0.8089 | | | | |
| 0.03 | 0.6627 | 0.6565 | 0.6508 | 0.3493 | 0.6608 | 0.6421 | | 0.8089 | | 0.5034 | | |
| 0.05 | 0.7063 | 0.6388 | 0.6797 | 0.4947 | 0.6623 | 0.6555 | | | | 0.5034 | | |
| 0.1 | 0.7365 | 0.6225 | 0.6851 | 0.6306 | 0.6486 | 0.6565 | −0.0386 | 0.8089 | | 0.5034 | | |
| 0.3 | 0.7857 | 0.5947 | 0.6773 | 0.7402 | 0.6143 | 0.6189 | −0.0386 | 0.7338 | | 0.2099 | | |
| 0.5 | 0.7586 | 0.5689 | 0.5481 | 0.7591 | 0.5945 | 0.6039 | −0.0386 | 0.7338 | | 0.2099 | | |
| 1.0 | 0.7412 | 0.5636 | 0.5257 | 0.7618 | 0.5927 | 0.6018 | −0.0386 | 0.7338 | | 0.2099 | | |
| 1.5 | 0.7362 | 0.5629 | 0.5236 | 0.7618 | 0.5923 | 0.6016 | −0.0386 | 0.7338 | | 0.2099 | | |
| 2 | 0.7339 | 0.5626 | 0.5229 | 0.7618 | 0.5921 | 0.6015 | −0.0386 | 0.7338 | | 0.2099 | | |
| 2.5 | 0.7328 | 0.5625 | 0.5226 | 0.7618 | 0.5920 | 0.6015 | −0.0386 | 0.7338 | | 0.2099 | | |
| 3 | 0.7322 | 0.5624 | 0.5225 | 0.7618 | 0.5920 | 0.6015 | −0.0386 | 0.7338 | | 0.2099 | | |
| 3.5 | 0.7317 | 0.5624 | 0.5223 | 0.7618 | 0.5919 | 0.6015 | −0.0386 | 0.7338 | | 0.2099 | | |
| 4 | 0.7314 | 0.5623 | 0.5222 | 0.7617 | 0.5919 | 0.6015 | −0.0386 | 0.7338 | | 0.2099 | | |
| 4.5 | 0.3377 | 0.3444 | 0.4650 | 0.0556 | 0.4780 | 0.5264 | −0.0386 | 0.8089 | | | | |
| 5 | 0.6627 | 0.6565 | 0.6508 | 0.3493 | 0.6608 | 0.6421 | | 0.8089 | | 0.5034 | | |
Figure 10. The clustering results suggested by each validity index when the DBSCAN algorithm was tested on the Aggregation dataset.
The best parameters for datasets that were detected by the cluster validity indices with the DBSCAN algorithm.
| Dataset | DBSCAN parameter | SI | DI | DB | CH | S_Dbw | DSI | RMSSTD | VIASCKDE |
|---|---|---|---|---|---|---|---|---|---|
| Half-kernel | ε | 0.08 | 0.08 | 0.05 | 0.08 | 0.05 | 0.05 | 0.08 | 0.08 |
| | MinPts | 7 | 7 | 11 | 7 | 15 | 11 | 7 | 7 |
| Two spirals | ε | 0.1 | 0.1 | 0.05 | 0.1 | 0.05 | 0.1 | 0.05 | 0.1 |
| | MinPts | 11 | 11 | 15 | 11 | 15 | 11 | 14 | 11 |
| Outlier | ε | 0.07 | 0.07 | 0.07 | 0.07 | 0.05 | 0.07 | 0.05 | 0.07 |
| | MinPts | 15 | 15 | 15 | 15 | 8 | 15 | 14 | 15 |
| Corners | ε | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| | MinPts | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 |
| Cluster in cluster | ε | 0.06 | 0.06 | 0.06 | 0.06 | 0.06 | 0.06 | 0.06 | 0.06 |
| | MinPts | 12 | 12 | 12 | 12 | 12 | 12 | 14 | 12 |
| Crescent full moon | ε | 0.07 | 0.07 | 0.07 | 0.07 | 0.05 | 0.06 | 0.05 | 0.07 |
| | MinPts | 14 | 14 | 14 | 14 | 15 | 12 | 15 | 14 |
| Moon | ε | 0.06 | 0.08 | 0.06 | 0.06 | 0.05 | 0.05 | 0.06 | 0.06 |
| | MinPts | 7 | 11 | 9 | 7 | 9 | 9 | 15 | 15 |
| Face | ε | 0.06 | 0.1 | 0.1 | 0.06 | 0.06 | 0.05 | 0.06 | 0.1 |
| | MinPts | 15 | 8 | 5 | 6 | 15 | 12 | 11 | 8 |
| Wave | ε | 0.09 | 0.09 | 0.06 | 0.09 | 0.05 | 0.06 | 0.05 | 0.06 |
| | MinPts | 12 | 5 | 12 | 12 | 9 | 12 | 15 | 12 |
| Fisher iris | ε | 0.14 | 0.19 | 0.14 | 0.14 | 0.08 | 0.14 | 0.06 | 0.19 |
| | MinPts | 15 | 6 | 15 | 15 | 5 | 15 | 7 | 6 |
| Breast cancer | ε | 0.39 | 0.33 | 0.39 | 0.39 | 0.06 | 0.06 | 0.05 | 0.4 |
| | MinPts | 8 | 5 | 8 | 8 | 5 | 5 | 14 | 5 |
| Aggregation | ε | 0.06 | 0.09 | 0.06 | 0.06 | 0.06 | 0.06 | 0.05 | 0.06 |
| | MinPts | 13 | 7 | 13 | 13 | 14 | 12 | 14 | 13 |
| Thyroid | ε | 0.1 | 0.1 | 0.06 | 0.09 | 0.07 | 0.05 | 0.05 | 0.1 |
| | MinPts | 5 | 5 | 12 | 5 | 6 | 8 | 9 | 5 |
| Zelnik1 | ε | 0.08 | 0.08 | 0.05 | 0.1 | 0.07 | 0.07 | 0.08 | 0.07 |
| | MinPts | 6 | 15 | 14 | 7 | 5 | 5 | 15 | 5 |
| Zelnik5 | ε | 0.06 | 0.1 | 0.05 | 0.1 | 0.06 | 0.05 | 0.05 | 0.1 |
| | MinPts | 14 | 13 | 12 | 13 | 15 | 12 | 14 | 13 |
| Xclara | ε | 0.05 | 0.08 | 0.09 | 0.05 | 0.05 | 0.05 | 0.08 | 0.05 |
| | MinPts | 13 | 12 | 15 | 13 | 13 | 13 | 12 | 13 |
| Banana | ε | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 |
| | MinPts | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 |
| Ds2c2sc13 | ε | 0.09 | 0.09 | 0.06 | 0.06 | 0.05 | 0.06 | 0.09 | 0.05 |
| | MinPts | 10 | 10 | 14 | 14 | 13 | 14 | 10 | 8 |
| 2sp2glob | ε | 0.1 | 0.1 | 0.05 | 0.07 | 0.08 | 0.1 | 0.06 | 0.07 |
| | MinPts | 9 | 9 | 12 | 14 | 6 | 9 | 5 | 14 |
| Cure-t1-2000n | ε | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| | MinPts | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 |
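A table of "best parameters detected by each index" implies a search in which every candidate setting is clustered, scored by the index, and the setting with the optimal score is kept. The sketch below shows one way such a search could look. The optimum directions are copied from the "Optimal value" column of the comparison table, but the search procedure itself, the toy gap-based clustering that stands in for DBSCAN, the one-dimensional Dunn-style index, and the data are all hypothetical and are not taken from the study.

```python
from itertools import product

# Direction of the optimum for each index, as listed in the comparison table.
OPTIMAL = {"SI": "max", "DI": "max", "CH": "max", "VIASCKDE": "max",
           "DB": "min", "S_Dbw": "min", "DSI": "min", "RMSSTD": "min"}


def grid_search(data, param_grid, cluster_fn, index_fn, index_name):
    """Return (best_params, best_value) for one validity index over a parameter grid."""
    maximise = OPTIMAL[index_name] == "max"
    best_params, best_value = None, None
    names = sorted(param_grid)
    for combo in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, combo))
        labels = cluster_fn(data, **params)
        if len(set(labels)) < 2:
            continue  # skip settings that yield a single cluster
        value = index_fn(data, labels)
        if best_value is None or (value > best_value if maximise else value < best_value):
            best_params, best_value = params, value
    return best_params, best_value


def gap_clustering(data, eps):
    """Toy stand-in for DBSCAN on 1-D data: start a new cluster at every gap wider than eps."""
    order = sorted(range(len(data)), key=lambda i: data[i])
    labels = [0] * len(data)
    current = 0
    for prev, nxt in zip(order, order[1:]):
        if data[nxt] - data[prev] > eps:
            current += 1
        labels[nxt] = current
    return labels


def dunn_1d(data, labels):
    """Dunn-style ratio: smallest between-cluster distance over largest within-cluster distance."""
    intra, inter = 0.0, float("inf")
    for i in range(len(data)):
        for j in range(i + 1, len(data)):
            d = abs(data[i] - data[j])
            if labels[i] == labels[j]:
                intra = max(intra, d)
            else:
                inter = min(inter, d)
    return inter / intra if intra > 0 else 0.0  # all-singleton clusterings are treated as worst


# Usage with made-up 1-D data and a made-up eps grid.
data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 9.0, 9.1]
best, value = grid_search(data, {"eps": [0.15, 0.5, 4.5, 10.0]}, gap_clustering, dunn_1d, "DI")
print(best, value)
```

Ties keep the first setting encountered, which is one possible convention; the excerpt does not say how the study resolved ties between equally scored parameter settings.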
Obtained values for each index based on the parameters given in Table 6.
| Dataset | SI | DI | DB | CH | S_Dbw | DSI | RMSSTD | VIASCKDE |
|---|---|---|---|---|---|---|---|---|
| Half-kernel | 0.2010 | 0.0949 | 1.8818 | 127.8905 | 0.5419 | 0.5068 | 0.2495 | 0.7125 |
| Two spirals | 0.0588 | 0.1317 | 3.3241 | 152.9447 | 0.5848 | 0.1069 | 0.28 | 0.7903 |
| Outlier | 0.5608 | 0.4291 | 0.4037 | 1075.5609 | 0.2099 | 0.9654 | 0.1302 | 0.6797 |
| Corners | 0.4614 | 0.2872 | 0.7436 | 2020.1068 | 0.4976 | 0.6358 | 0.1187 | 0.6295 |
| Cluster in cluster | 0.2231 | 0.2341 | 208.8458 | 0.0169 | 0.8536 | 0.7332 | 0.2276 | 0.595 |
| Crescent full moon | 0.2784 | 0.1923 | 1.1646 | 285.1423 | 0.3255 | 0.6568 | 0.2449 | 0.6623 |
| Moon | 0.2371 | 0.1052 | 0.9739 | 244.1722 | 0.2081 | 0.8788 | 0.2525 | 0.7508 |
| Face | 0.4569 | 0.2217 | 1.1099 | 213.0246 | 0.3725 | 0.7627 | 0.2423 | 0.6631 |
| Wave | 0.4525 | 0.1291 | 0.7119 | 366.1095 | 0.2344 | 0.8935 | 0.2696 | 0.6495 |
| Fisher iris | 0.5692 | 0.1222 | 0.5234 | 223.6137 | 0.3386 | 0.8296 | 0.2527 | 0.443 |
| Breast cancer | 0.5698 | 0.1228 | 0.8037 | 900.1988 | 0.3606 | 0.9617 | 0.2993 | 0.2944 |
| Aggregation | 0.4763 | 0.1432 | 0.5461 | 1156.7539 | 0.2073 | 0.9442 | 0.1878 | 0.6388 |
| Thyroid | 0.433 | 0.0598 | 2.7626 | 16.6429 | 0.5343 | 0.7486 | 0.1528 | 0.3275 |
| Zelnik1 | 0.2045 | 0.0992 | 5.6978 | 95.196 | 0.2523 | 0.8939 | 0.2171 | 0.6604 |
| Zelnik5 | 0.4971 | 0.2224 | 0.8098 | 413.8835 | 0.3651 | 0.8338 | 0.1534 | 0.7739 |
| Xclara | 0.6654 | 0.0656 | 1.1863 | 6889.0154 | 0.3492 | 0.7462 | 0.229 | 0.8101 |
| Banana | 0.3589 | 0.1258 | 1.1322 | 3532.2201 | 0.7625 | 0.4334 | 0.2146 | 0.8076 |
| Ds2c2sc13 | 0.5724 | 0.237 | 0.5891 | 1907.2388 | 0.1921 | 0.9193 | 0.1091 | 0.605 |
| 2sp2glob | 0.3899 | 0.1278 | 2.7559 | 158.5187 | 0.6374 | 0.8003 | 0.2089 | 0.8819 |
| Cure-t1-2000n | 0.4514 | 0.1196 | 0.6775 | 1365.0774 | 0.3054 | 0.787 | 0.1721 | 0.6555 |
The best parameters for the datasets that were detected by the cluster validity indices with the Spectral Clustering algorithm are given in Table 7.
| Dataset | Spectral Clustering parameter | SI | DI | DB | CH | S_Dbw | DSI | RMSSTD | VIASCKDE |
|---|---|---|---|---|---|---|---|---|---|
| Half-kernel | n_clusters | 14 | 2 | 15 | 15 | 14 | 15 | 2 | 2 |
| Two spirals | n_clusters | 15 | 2 | 15 | 15 | 15 | 15 | 2 | 2 |
| Outlier | n_clusters | 2 | 4 | 4 | 13 | 3 | 4 | 2 | 4 |
| Corners | n_clusters | 12 | 4 | 12 | 12 | 15 | 14 | 2 | 2 |
| Cluster in cluster | n_clusters | 4 | 2 | 4 | 15 | 15 | 15 | 2 | 2 |
| Crescent full moon | n_clusters | 5 | 2 | 5 | 13 | 15 | 14 | 2 | 6 |
| Moon | n_clusters | 15 | 2 | 15 | 15 | 15 | 15 | 2 | 2 |
| Face | n_clusters | 11 | 2 | 10 | 12 | 15 | 13 | 2 | 2 |
| Wave | n_clusters | 7 | 2 | 15 | 15 | 15 | 15 | 2 | 2 |
| Fisher iris | n_clusters | 2 | 2 | 2 | 3 | 15 | 2 | 2 | 3 |
| Breast cancer | n_clusters | 2 | 2 | 2 | 2 | 11 | 14 | 15 | 12 |
| Aggregation | n_clusters | 4 | 2 | 6 | 14 | 2 | 15 | 2 | 2 |
| Thyroid | n_clusters | 3 | 2 | 3 | 3 | 15 | 15 | 2 | 3 |
| Zelnik1 | n_clusters | 12 | 2 | 13 | 12 | 15 | 13 | 3 | 3 |
| Zelnik5 | n_clusters | 8 | 2 | 8 | 15 | 15 | 15 | 2 | 4 |
| Xclara | n_clusters | 3 | 2 | 3 | 3 | 10 | 3 | 2 | 3 |
| Banana | n_clusters | 9 | 2 | 9 | 15 | 14 | 15 | 2 | 2 |
| Ds2c2sc13 | n_clusters | 3 | 3 | 5 | 8 | 2 | 15 | 2 | 5 |
| 2sp2glob | n_clusters | 7 | 2 | 15 | 15 | 15 | 15 | 2 | 7 |
| Cure-t1-2000n | n_clusters | 5 | 2 | 4 | 13 | 2 | 12 | 2 | 3 |
The best parameters for the datasets that were detected by the cluster validity indices with the HDBSCAN algorithm.
| Dataset | HDBSCAN parameter | SI | DI | DB | CH | S_Dbw | DSI | RMSSTD | VIASCKDE |
|---|---|---|---|---|---|---|---|---|---|
| Half-kernel | n_clusters_size | 24 | 24 | 2 | 25 | 25 | 25 | 24 | 24 |
| | n_samples | 6 | 6 | 10 | 25 | 25 | 25 | 6 | 6 |
| Two spirals | n_clusters_size | 3 | 25 | 3 | 17 | 2 | 2 | 15 | 6 |
| | n_samples | 2 | 17 | 2 | 7 | 2 | 2 | 19 | 12 |
| Outlier | n_clusters_size | 16 | 16 | 16 | 16 | 16 | 16 | 16 | 16 |
| | n_samples | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 |
| Corners | n_clusters_size | 8 | 8 | 8 | 8 | 2 | 2 | 8 | 8 |
| | n_samples | 8 | 8 | 8 | 8 | 2 | 2 | 8 | 8 |
| Cluster in cluster | n_clusters_size | 20 | 20 | 9 | 11 | 7 | 7 | 20 | 20 |
| | n_samples | 10 | 10 | 2 | 2 | 3 | 3 | 10 | 10 |
| Crescent full moon | n_clusters_size | 20 | 20 | 3 | 20 | 3 | 3 | 20 | 20 |
| | n_samples | 12 | 12 | 2 | 12 | 2 | 2 | 12 | 12 |
| Moon | n_clusters_size | 22 | 6 | 22 | 22 | 10 | 2 | 10 | 6 |
| | n_samples | 3 | 4 | 3 | 3 | 24 | 25 | 24 | 4 |
| Face | n_clusters_size | 21 | 13 | 9 | 21 | 9 | 9 | 13 | 9 |
| | n_samples | 5 | 19 | 8 | 5 | 8 | 8 | 19 | 8 |
| Wave | n_clusters_size | 16 | 6 | 16 | 16 | 3 | 4 | 6 | 2 |
| | n_samples | 13 | 3 | 23 | 13 | 13 | 19 | 3 | 5 |
| Fisher iris | n_clusters_size | 5 | 5 | 14 | 5 | 5 | 18 | 9 | 5 |
| | n_samples | 12 | 12 | 16 | 12 | 12 | 21 | 25 | 12 |
| Breast cancer | n_clusters_size | 11 | 5 | 2 | 5 | 2 | 2 | 22 | 5 |
| | n_samples | 34 | 55 | 3 | 55 | 3 | 3 | 53 | 55 |
| Aggregation | n_clusters_size | 17 | 12 | 9 | 12 | 23 | 2 | 12 | 2 |
| | n_samples | 25 | 14 | 16 | 14 | 13 | 4 | 14 | 4 |
| Thyroid | n_clusters_size | 3 | 2 | 3 | 3 | 3 | 3 | 2 | 8 |
| | n_samples | 2 | 7 | 2 | 2 | 4 | 2 | 16 | 4 |
| Zelnik1 | n_clusters_size | 11 | 3 | 5 | 3 | 20 | 2 | 14 | 3 |
| | n_samples | 16 | 11 | 25 | 15 | 16 | 17 | 19 | 11 |
| Zelnik5 | n_clusters_size | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 |
| | n_samples | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| Xclara | n_clusters_size | 9 | 22 | 3 | 13 | 3 | 3 | 3 | 13 |
| | n_samples | 2 | 6 | 3 | 9 | 3 | 3 | 3 | 9 |
| Banana | n_clusters_size | 21 | 21 | 13 | 21 | 21 | 16 | 21 | 21 |
| | n_samples | 14 | 14 | 16 | 14 | 14 | 24 | 14 | 14 |
| Ds2c2sc13 | n_clusters_size | 22 | 22 | 16 | 22 | 4 | 22 | 24 | 16 |
| | n_samples | 19 | 19 | 20 | 19 | 6 | 19 | 24 | 10 |
| 2sp2glob | n_clusters_size | 21 | 21 | 21 | 21 | 21 | 21 | 21 | 21 |
| | n_samples | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 |
| Cure-t1-2000n | n_clusters_size | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 25 |
| | n_samples | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 4 |
Obtained values for each index based on the parameters given in Table 8.
| Dataset | SI | DI | DB | CH | S_Dbw | RMSSTD | DSI | VIASCKDE |
|---|---|---|---|---|---|---|---|---|
| Half-kernel | 0.4748 | 0.0949 | 0.6066 | 1761.6198 | 0.2246 | 0.9163 | 0.2495 | 0.7395 |
| Two spirals | 0.3175 | 0.1317 | 1.058 | 1378.878 | 0.2857 | 0.7829 | 0.2865 | 0.8151 |
| Outlier | 0.6178 | 0.4291 | 0.4037 | 1804.463 | 0.1176 | 0.9654 | 0.2924 | 0.6863 |
| Corners | 0.5672 | 0.2872 | 0.5315 | 4102.5883 | 0.1873 | 0.9439 | 0.207 | 0.6575 |
| Cluster in cluster | 0.4547 | 0.2341 | 0.9465 | 832.9385 | 0.2764 | 0.857 | 0.2275 | 0.6052 |
| Crescent full moon | 0.4993 | 0.1923 | 0.5792 | 2022.7022 | 0.2055 | 0.9103 | 0.2423 | 0.6689 |
| Moon | 0.4543 | 0.1285 | 0.6781 | 602.0907 | 0.2169 | 0.9098 | 0.2689 | 0.7527 |
| Face | 0.4996 | 0.2361 | 0.5473 | 1055.0573 | 0.1705 | 0.9271 | 0.2481 | 0.7575 |
| Wave | 0.4957 | 0.1291 | 0.631 | 681.3681 | 0.1639 | 0.9124 | 0.2541 | 0.617 |
| Fisher iris | 0.6295 | 0.3581 | 0.4877 | 356.289 | 0.2163 | 0.8923 | 0.1432 | 0.4539 |
| Breast cancer | 0.5839 | 0.1291 | 0.7738 | 993.0158 | 0.1796 | 0.7795 | 0.2031 | 0.4341 |
| Aggregation | 0.4541 | 0.1091 | 0.589 | 1623.9684 | 0.1434 | 0.921 | 0.2966 | 0.6944 |
| Thyroid | 0.5517 | 0.0973 | 0.85 | 138.1291 | 0.3809 | 0.685 | 0.1309 | 0.4832 |
| Zelnik1 | 0.5042 | 0.0992 | 0.663 | 194.586 | 0.2836 | 0.8614 | 0.2171 | 0.6544 |
| Zelnik5 | 0.5948 | 0.2651 | 0.5353 | 1832.5626 | 0.1548 | 0.9495 | 0.2763 | 0.7686 |
| Xclara | 0.6946 | 0.023 | 0.4203 | 10843.7203 | 0.2779 | 0.946 | 0.1612 | 0.8164 |
| Banana | 0.5087 | 0.1258 | 0.5734 | 14012.5597 | 0.1806 | 0.9343 | 0.2146 | 0.82 |
| Ds2c2sc13 | 0.3939 | 0.0639 | 0.8082 | 1133.5545 | 0.1434 | 0.9064 | 0.2896 | 0.6187 |
| 2sp2glob | 0.6102 | 0.1456 | 0.6921 | 1548.8465 | 0.2544 | 0.8693 | 0.2396 | 0.725 |
| Cure-t1-2000n | 0.4994 | 0.1921 | 0.6581 | 3615.5302 | 0.1582 | 0.9016 | 0.2817 | 0.6589 |
ARI values obtained with the parameters given in Table 6 that were proposed by each index.
| Dataset | SI | DI | DB | CH | S_Dbw | DSI | RMSSTD | VIASCKDE |
|---|---|---|---|---|---|---|---|---|
| Half-kernel | | | 0.9940 | | 0.9153 | 0.9940 | | |
| Two spirals | | | 0.9804 | | 0.9804 | | 0.9990 | |
| Outlier | | | | | 0.9973 | | 0.8621 | |
| Corners | | | | | | | | |
| Cluster in cluster | | | | | | | 0.8879 | |
| Crescent full moon | | | 0.9968 | | 0.9105 | 0.9873 | 0.8509 | |
| Moon | | 0.6322 | 0.9256 | | 0.7874 | 0.7874 | 0.7949 | 0.7949 |
| Face | 0.2645 | 0.9949 | | 0.2892 | 0.1304 | 0.1226 | 0.8521 | |
| Wave | 0.3514 | | 0.1441 | 0.3514 | 0.1913 | 0.1441 | 0.0508 | 0.0536 |
| Fisher iris | 0.4518 | | 0.4518 | 0.4518 | 0.2369 | 0.4518 | 0.0106 | |
| Breast cancer | 0.8240 | 0.8189 | 0.8240 | 0.8240 | −0.0779 | −0.0779 | −0.0780 | |
| Aggregation | | 0.7338 | | | 0.8770 | 0.9866 | 0.6330 | |
| Thyroid | 0.6715 | 0.6715 | −0.0664 | | 0.2940 | −0.1332 | −0.1396 | 0.6715 |
| Zelnik1 | 0.7708 | | 0.3409 | 0.7852 | 0.7724 | 0.7724 | | 0.7781 |
| Zelnik5 | 0.9214 | | 0.9278 | | 0.9216 | 0.9126 | 0.9839 | |
| Xclara | | 0.0001 | 0.0001 | | | | 0.0001 | |
| Banana | | | | | | | | |
| Ds2c2sc13 | 0.3187 | 0.3187 | 0.4911 | 0.4911 | 0.5325 | 0.4911 | 0.3187 | |
| 2sp2glob | 1.0000 | | 0.9850 | 0.9940 | 0.9985 | | 0.9970 | 0.9940 |
| Cure-t1-2000n | | | | | | | | |
ARI values obtained with the parameters given in Table 8 that were proposed by each index.
| Dataset | SI | DI | DB | CH | S_Dbw | RMSSTD | DSI | VIASCKDE |
|---|---|---|---|---|---|---|---|---|
| Half-kernel | 0.1514 | | 0.1422 | 0.1421 | 0.1515 | 0.1421 | | |
| Two spirals | 0.1401 | | 0.1435 | 0.1401 | 0.1401 | 0.1401 | 0.2047 | |
| Outlier | 0.8463 | | | 0.2236 | 0.2322 | | 0.2271 | |
| Corners | 0.4581 | | 0.4581 | 0.4581 | 0.3917 | 0.4199 | 0.3330 | 0.3330 |
| Cluster in cluster | 0.6584 | | 0.6584 | 0.1365 | 0.1368 | 0.1365 | | |
| Crescent full moon | 0.2934 | | 0.2934 | 0.1021 | 0.0869 | 0.0955 | | 0.2341 |
| Moon | 0.3629 | 0.2973 | 0.3629 | 0.3629 | 0.3092 | 0.3092 | | |
| Face | 0.0646 | | 0.0747 | 0.0580 | 0.0443 | 0.0538 | | |
| Wave | 0.2970 | | 0.1333 | 0.1323 | 0.1323 | 0.1356 | | |
| Fisher iris | 0.5681 | 0.5681 | 0.5681 | | 0.2395 | 0.5681 | 0.5681 | |
| Breast cancer | | | | | 0.2875 | 0.1779 | 0.0669 | 0.2534 |
| Aggregation | 0.7975 | 0.0646 | | 0.4453 | 0.0486 | 0.4156 | 0.1149 | 0.0646 |
| Thyroid | | 0.4204 | | | 0.0830 | 0.0830 | 0.4204 | |
| Zelnik1 | 0.3170 | 0.4352 | 0.3004 | 0.3170 | 0.2225 | 0.3007 | | |
| Zelnik5 | 0.6567 | 0.3096 | 0.6567 | 0.3638 | 0.3790 | 0.3638 | 0.5003 | |
| Xclara | | 0.6270 | | | 0.3602 | | 0.6270 | |
| Banana | 0.2394 | | 0.2394 | 0.1369 | 0.1463 | 0.1369 | 1.0000 | |
| Ds2c2sc13 | 0.3267 | 0.3267 | 0.2766 | 0.4531 | 0.0244 | | 0.0244 | 0.2394 |
| 2sp2glob | | 0.5709 | 0.3226 | 0.3195 | 0.3185 | 0.3226 | 0.5709 | |
| Cure-t1-2000n | 0.6334 | 0.3423 | 0.7818 | 0.3303 | 0.1757 | 0.3546 | 0.1757 | |
Obtained values for each index based on the parameters given in Table 9.
| Dataset | SI | DI | DB | CH | S_Dbw | DSI | RMSSTD | VIASCKDE |
|---|---|---|---|---|---|---|---|---|
| Half-kernel | 0.201 | 0.0949 | 1.8878 | 171.8984 | 0.5589 | 0.4662 | 0.2495 | 0.7125 |
| Two spirals | 0.4071 | 0.1317 | 1.1858 | 259.0349 | 0.0136 | 0.9957 | 0.28 | 0.8151 |
| Outlier | 0.5608 | 0.4291 | 0.4037 | 1075.5609 | 0.2099 | 0.9654 | 0.1235 | 0.6881 |
| Corners | 0.4614 | 0.2872 | 0.7436 | 2020.1068 | 0.0437 | 0.9791 | 0.1187 | 0.6268 |
| Cluster in cluster | 0.2231 | 0.2341 | 4.4083 | 2.5624 | 0.0642 | 0.947 | 0.2275 | 0.6052 |
| Crescent full moon | 0.2784 | 0.1923 | 1.0934 | 285.1423 | 0.0527 | 0.9829 | 0.2423 | 0.6623 |
| Moon | 0.2371 | 0.0794 | 1.1729 | 244.1722 | 0.3243 | 0.7021 | 0.2628 | 0.7002 |
| Face | 0.417 | 0.2217 | 0.9539 | 204.5665 | 0.4031 | 0.8557 | 0.2339 | 0.6654 |
| Wave | 0.3746 | 0.1291 | 1.1785 | 168.9936 | 0.3155 | 0.7862 | 0.2541 | 0.617 |
| Fisher iris | 0.6295 | 0.3581 | 0.4659 | 353.3674 | 0.4488 | 0.9296 | 0.1478 | 0.4722 |
| Breast cancer | 0.4306 | 0.1125 | 1.1919 | 493.4632 | 0.1958 | 0.9575 | 0.2983 | 0.0143 |
| Aggregation | 0.4925 | 0.1432 | 0.6452 | 778.9448 | 0.2701 | 0.8481 | 0.1497 | 0.6108 |
| Thyroid | 0.4359 | 0.0683 | 1.68 | 38.4235 | 0.6519 | 0.7913 | 0.1532 | 0.3833 |
| Zelnik1 | 0.0008 | 0.0992 | 13.2535 | 12.7433 | 0.3022 | 0.7287 | 0.2171 | 0.541 |
| Zelnik5 | 0.4663 | 0.2224 | 1.0459 | 413.8835 | 0.4593 | 0.7425 | 0.1493 | 0.7739 |
| Xclara | 0.6745 | 0.0295 | 1.213 | 7008.8746 | 0.0475 | 0.9918 | 0.1114 | 0.7814 |
| Banana | 0.3589 | 0.1258 | 1.0288 | 3532.2201 | 0.7625 | 0.7003 | 0.2146 | 0.82 |
| Ds2c2sc13 | 0.5724 | 0.237 | 0.5829 | 1785.9002 | 0.1831 | 0.8928 | 0.1093 | 0.6045 |
| 2sp2glob | 0.3899 | 0.1278 | 2.7973 | 158.408 | 0.6374 | 0.8003 | 0.2088 | 0.7146 |
| Cure-t1-2000n | 0.4514 | 0.1196 | 0.6775 | 1365.0774 | 0.3054 | 0.787 | 0.1721 | 0.655 |
ARI values obtained with the parameters given in Table 9 that were proposed by each index.
| Dataset | SI | DI | DB | CH | S_Dbw | DSI | RMSSTD | VIASCKDE |
|---|---|---|---|---|---|---|---|---|
| Half-kernel | | | 0.9980 | 0.7901 | 0.7901 | 0.7901 | | |
| Two spirals | 0.0079 | | 0.0079 | 0.7524 | 0.0076 | 0.0076 | 0.9990 | |
| Outlier | | | | | | | | |
| Corners | | | | | 0.8261 | 0.8261 | | |
| Cluster in cluster | | | 0.5285 | 0.5360 | 0.5274 | 0.5274 | | |
| Crescent full moon | | | 0.1160 | | 0.1160 | 0.1160 | | |
| Moon | 0.9379 | | 0.9379 | 0.9379 | 0.2933 | 0.3697 | 0.2933 | |
| Face | 0.1883 | 0.9949 | | 0.1883 | | | 0.9949 | |
| Wave | 0.2609 | | 0.1709 | 0.2609 | 0.2528 | 0.2140 | | |
| Fisher iris | 0.5681 | 0.5681 | 0.5657 | 0.5681 | 0.5681 | 0.5638 | | |
| Breast cancer | 0.8349 | | 0.0011 | | 0.0011 | 0.0011 | −0.0707 | |
| Aggregation | 0.7962 | 0.7338 | 0.7323 | 0.7338 | | 0.8089 | 0.7338 | 0.8089 |
| Thyroid | 0.4885 | | 0.4885 | 0.4885 | 0.4880 | 0.4885 | −0.0255 | 0.4873 |
| Zelnik1 | 0.9313 | | 0.3207 | 0.8880 | | 0.9680 | 0.9771 | |
| Zelnik5 | | | | | | | | |
| Xclara | 0.9861 | | 0.3936 | 0.9880 | 0.3936 | 0.3936 | 0.3936 | |
| Banana | | | 0.8308 | | | 0.8278 | | |
| Ds2c2sc13 | 0.3187 | 0.3187 | 0.3180 | 0.3187 | | 0.3187 | 0.3165 | 0.4260 |
| 2sp2glob | | | | | | | | |
| Cure-t1-2000n | | | | | | | | |
The number of highest ARI values that each index reached.
| Index | DBSCAN | Spectral Clustering | HDBSCAN | Total |
|---|---|---|---|---|
| SI | | | | |
| DI | | | | |
| DB | | | | |
| CH | | | | |
| S_Dbw | | | | |
| DSI | | | | |
| RMSSTD | | | | |
| VIASCKDE (proposed index) | | | | |