Variance-Based Cluster Selection Criteria in a K-Means Framework for One-Mode Dissimilarity Data.

Abstract

One of the main problems in cluster analysis is determining the number of groups in the data. In general, the approach taken depends on the clustering method used. For K-means, some of the most widely employed criteria are formulated in terms of the decomposition of the total point scatter for a two-mode data set of N points in p dimensions, optimally arranged into K classes. This paper addresses the formulation of criteria to determine the number of clusters in the general situation in which the information available for clustering is a one-mode N × N dissimilarity matrix describing the objects. In this framework, p and the coordinates of the points are usually unknown, and criteria originally formulated for two-mode data sets can be applied only if they can be reformulated for the one-mode situation. The decomposition of the variability of the clustered objects is proposed in terms of the corresponding block-shaped partition of the dissimilarity matrix. Within-block and between-block dispersion values for the partitioned dissimilarity matrix are derived, and variance-based criteria are then formulated to determine the number of groups in the data. A Monte Carlo experiment was carried out to study the performance of the proposed criteria. For simulated clustered points in p dimensions, the number of clusters is recovered more efficiently when the criteria are calculated from the related Euclidean distances instead of the known two-mode data set, in general for unequal-sized clusters and for low-dimensional situations. For simulated dissimilarity data sets, the proposed criteria always outperform the results obtained when these criteria are calculated from their original formulation using dissimilarities instead of distances.
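The block-wise decomposition described in the abstract can be illustrated with a minimal sketch. It is not the paper's own derivation or criteria: it relies on the standard identity that, for Euclidean distances, the sum of squared deviations from a centroid equals the sum of squared pairwise distances divided by twice the number of points, and it plugs the resulting within and between values into a Calinski-Harabasz-style ratio as one example of a variance-based criterion. The function names `scatter_from_dissimilarities` and `calinski_harabasz` are hypothetical, and the input is assumed to be a symmetric matrix with a zero diagonal.

```python
def scatter_from_dissimilarities(d, labels):
    """Split total scatter into within- and between-cluster parts.

    d      : symmetric N x N list of lists of dissimilarities (zero diagonal)
    labels : length-N list of cluster labels
    Returns (total, within, between), all based on squared dissimilarities.
    """
    n = len(d)
    # Total scatter: squared dissimilarities over all ordered pairs, divided by 2N.
    total = sum(d[i][j] ** 2 for i in range(n) for j in range(n)) / (2 * n)

    # Group object indices by cluster label.
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)

    # Within scatter: the same quantity on each diagonal block of the
    # partitioned matrix, divided by twice the block size.
    within = 0.0
    for members in clusters.values():
        block = sum(d[i][j] ** 2 for i in members for j in members)
        within += block / (2 * len(members))

    # Between scatter is what remains of the total.
    return total, within, total - within


def calinski_harabasz(d, labels):
    """Variance-ratio criterion computed only from the dissimilarity matrix.

    Assumes 1 < K < N and a nonzero within-cluster scatter.
    """
    n = len(labels)
    k = len(set(labels))
    _, within, between = scatter_from_dissimilarities(d, labels)
    return (between / (k - 1)) / (within / (n - k))


# Illustrative data: four points on a line, used only to build distances.
points = [0.0, 1.0, 10.0, 11.0]
d = [[abs(a - b) for b in points] for a in points]
total, within, between = scatter_from_dissimilarities(d, [0, 0, 1, 1])
ch = calinski_harabasz(d, [0, 0, 1, 1])
```

For this toy matrix the total scatter is 101, the within-cluster part is 1, the between-cluster part is 100, and the ratio is 200. To choose K in practice, one would evaluate such a criterion on the partition obtained for each candidate K and compare the values; for dissimilarities that are not Euclidean distances, the identity above no longer holds exactly, which is the case the paper's reformulated criteria are meant to handle.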

Keywords: K-means; SYNCLUS; cluster analysis; dissimilarity; number of clusters; variance-based criterion

Year: 2017 PMID: 28194550 DOI: 10.1007/s11336-017-9561-1

Source DB: PubMed Journal: Psychometrika ISSN: 0033-3123 Impact factor: 2.500

2 in total

1. K-means clustering: a half-century synthesis.

Authors: Douglas Steinley
Journal: Br J Math Stat Psychol Date: 2006-05 Impact factor: 3.380

2. Scaling and clustering in the study of semantic disruptions in patients with schizophrenia: a re-evaluation.

Authors: Brita Elvevåg; Gert Storms
Journal: Schizophr Res Date: 2003-10-01 Impact factor: 4.939

1 in total

1. On the Behaviour of K-Means Clustering of a Dissimilarity Matrix by Means of Full Multidimensional Scaling.

Authors: J Fernando Vera; Rodrigo Macías
Journal: Psychometrika Date: 2021-05-19 Impact factor: 2.500
