Variance-Based Cluster Selection Criteria in a K-Means Framework for One-Mode Dissimilarity Data.

J Fernando Vera, Rodrigo Macías.

Abstract

One of the main problems in cluster analysis is that of determining the number of groups in the data. In general, the approach taken depends on the cluster method used. For K-means, some of the most widely employed criteria are formulated in terms of the decomposition of the total point scatter, regarding a two-mode data set of N points in p dimensions, which are optimally arranged into K classes. This paper addresses the formulation of criteria to determine the number of clusters, in the general situation in which the available information for clustering is a one-mode N × N dissimilarity matrix describing the objects. In this framework, p and the coordinates of points are usually unknown, and the application of criteria originally formulated for two-mode data sets is dependent on their possible reformulation in the one-mode situation. The decomposition of the variability of the clustered objects is proposed in terms of the corresponding block-shaped partition of the dissimilarity matrix. Within-block and between-block dispersion values for the partitioned dissimilarity matrix are derived, and variance-based criteria are subsequently formulated in order to determine the number of groups in the data. A Monte Carlo experiment was carried out to study the performance of the proposed criteria. For simulated clustered points in p dimensions, greater efficiency in recovering the number of clusters is obtained when the criteria are calculated from the related Euclidean distances instead of the known two-mode data set, in general, for unequal-sized clusters and for low dimensionality situations. For simulated dissimilarity data sets, the proposed criteria always outperform the results obtained when these criteria are calculated from their original formulation, using dissimilarities instead of distances.
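
As an illustration of the block-shaped decomposition described above, the following is a minimal sketch and not the authors' formulation. It assumes the standard pairwise identity, exact for Euclidean distances, in which the scatter of a group of n objects equals the sum of its squared pairwise dissimilarities divided by 2n. The function names are hypothetical, and the Calinski-Harabasz ratio is used only as one familiar example of a variance-based criterion; the abstract does not state which criteria the paper reformulates.

```python
import numpy as np

def scatter_from_dissimilarities(D, labels):
    """Return (total, within, between) scatter from an N x N dissimilarity matrix.

    Assumption: each diagonal block of the squared matrix, summed and divided
    by twice the block size, gives that cluster's within-cluster scatter.
    """
    D2 = np.asarray(D, dtype=float) ** 2
    labels = np.asarray(labels)
    n = D2.shape[0]
    total = D2.sum() / (2.0 * n)
    within = 0.0
    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)
        block = D2[np.ix_(idx, idx)]  # diagonal block for cluster k
        within += block.sum() / (2.0 * idx.size)
    return total, within, total - within

def calinski_harabasz(D, labels):
    """Variance-ratio criterion computed without point coordinates."""
    n = len(labels)
    k = np.unique(labels).size
    _, within, between = scatter_from_dissimilarities(D, labels)
    return (between / (k - 1)) / (within / (n - k))

# Four objects on a line, given only through their pairwise distances.
x = np.array([0.0, 1.0, 10.0, 12.0])
D = np.abs(x[:, None] - x[None, :])
print(scatter_from_dissimilarities(D, [0, 0, 1, 1]))
print(calinski_harabasz(D, [0, 0, 1, 1]))
```

For non-Euclidean dissimilarities the same arithmetic still runs, but the quantities no longer correspond to a scatter around centroids in some coordinate space, which is the situation the abstract's one-mode criteria are meant to address.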

Keywords:  K-means; SYNCLUS; cluster analysis; dissimilarity; number of clusters; variance-based criterion

Year:  2017        PMID: 28194550     DOI: 10.1007/s11336-017-9561-1

Source DB:  PubMed          Journal:  Psychometrika        ISSN: 0033-3123            Impact factor:   2.500


  2 in total

1.  K-means clustering: a half-century synthesis.

Authors:  Douglas Steinley
Journal:  Br J Math Stat Psychol       Date:  2006-05       Impact factor: 3.380

2.  Scaling and clustering in the study of semantic disruptions in patients with schizophrenia: a re-evaluation.

Authors:  Brita Elvevåg; Gert Storms
Journal:  Schizophr Res       Date:  2003-10-01       Impact factor: 4.939

  1 in total

1.  On the Behaviour of K-Means Clustering of a Dissimilarity Matrix by Means of Full Multidimensional Scaling.

Authors:  J Fernando Vera; Rodrigo Macías
Journal:  Psychometrika       Date:  2021-05-19       Impact factor: 2.500

