
How many clusters? An information-theoretic perspective.

Susanne Still, William Bialek.

Abstract

Clustering provides a common means of identifying structure in complex data, and there is renewed interest in clustering as a tool for the analysis of large data sets in many fields. A natural question is how many clusters are appropriate for the description of a given system. Traditional approaches to this problem are based on either a framework in which clusters of a particular shape are assumed as a model of the system or on a two-step procedure in which a clustering criterion determines the optimal assignments for a given number of clusters and a separate criterion measures the goodness of the classification to determine the number of clusters. In a statistical mechanics approach, clustering can be seen as a trade-off between energy- and entropy-like terms, with lower temperature driving the proliferation of clusters to provide a more detailed description of the data. For finite data sets, we expect that there is a limit to the meaningful structure that can be resolved and therefore a minimum temperature beyond which we will capture sampling noise. This suggests that correcting the clustering criterion for the bias that arises due to sampling errors will allow us to find a clustering solution at a temperature that is optimal in the sense that we capture maximal meaningful structure--without having to define an external criterion for the goodness or stability of the clustering. We show that in a general information-theoretic framework, the finite size of a data set determines an optimal temperature, and we introduce a method for finding the maximal number of clusters that can be resolved from the data in the hard clustering limit.
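The sampling bias mentioned in the abstract can be illustrated with a small numerical sketch. It is not the authors' procedure. It only shows, under stated assumptions, that the plug-in estimate of mutual information (a typical ingredient of information-theoretic clustering criteria) is biased upward for a finite data set: even for independent variables the estimate is positive on average, close to the well-known leading-order term (Kx - 1)(Ky - 1) / (2 N ln 2) bits. The function names, the uniform test distribution, and the sample sizes are all illustrative choices, not taken from the paper.

```python
import math
import random
from collections import Counter


def plugin_mutual_information(pairs):
    """Plug-in (maximum-likelihood) estimate of I(X;Y) in bits from (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), with empirical frequencies
        # substituted for the unknown probabilities.
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def first_order_bias(kx, ky, n):
    """Leading-order upward bias of the plug-in estimate, in bits."""
    return (kx - 1) * (ky - 1) / (2 * n * math.log(2))


def average_plugin_mi(kx, ky, n, trials, seed=0):
    """Average plug-in estimate over repeated samples of independent uniform X and Y.

    The true mutual information is zero here, so the average is pure sampling bias.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        pairs = [(rng.randrange(kx), rng.randrange(ky)) for _ in range(n)]
        total += plugin_mutual_information(pairs)
    return total / trials


if __name__ == "__main__":
    kx, ky, n = 4, 4, 200
    print("average plug-in estimate (bits):", average_plugin_mi(kx, ky, n, trials=200))
    print("leading-order bias term (bits): ", first_order_bias(kx, ky, n))
```

Subtracting a bias estimate of this kind from the criterion is one way to keep finer partitions from looking better simply because they fit sampling noise. The specific correction used in the paper may differ from this first-order term.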

Year:  2004        PMID: 15516271     DOI: 10.1162/0899766042321751

Source DB:  PubMed          Journal:  Neural Comput        ISSN: 0899-7667            Impact factor:   2.026


  10 in total

1.  Modularity-based graph partitioning using conditional expected models.

Authors:  Yu-Teng Chang; Richard M Leahy; Dimitrios Pantazis
Journal:  Phys Rev E Stat Nonlin Soft Matter Phys       Date:  2012-01-12

2.  Connectivity cluster analysis for discovering discriminative subnetworks in schizophrenia.

Authors:  Gowtham Atluri; Michael Steinbach; Kelvin O Lim; Vipin Kumar; Angus MacDonald
Journal:  Hum Brain Mapp       Date:  2014-11-13       Impact factor: 5.038

3.  An information-theoretic approach to curiosity-driven reinforcement learning.

Authors:  Susanne Still; Doina Precup
Journal:  Theory Biosci       Date:  2012-07-12       Impact factor: 1.919

4.  Predictive modeling of EEG time series for evaluating surgery targets in epilepsy patients.

Authors:  Andreas Steimer; Michael Müller; Kaspar Schindler
Journal:  Hum Brain Mapp       Date:  2017-02-16       Impact factor: 5.038

5.  Pareto-Optimal Clustering with the Primal Deterministic Information Bottleneck.

Authors:  Andrew K Tan; Max Tegmark; Isaac L Chuang
Journal:  Entropy (Basel)       Date:  2022-05-30       Impact factor: 2.738

6.  Systems analysis of high-throughput data. (Review)

Authors:  Rosemary Braun
Journal:  Adv Exp Med Biol       Date:  2014       Impact factor: 2.622

7.  Probabilistic Prediction of Protein Phosphorylation Sites Using Classification Relevance Units Machines.

Authors:  Mark Menor; Kyungim Baek; Guylaine Poisson
Journal:  ACM SIGAPP Appl Comput Rev       Date:  2012-12-01

8.  Partition decoupling for multi-gene analysis of gene expression profiling data.

Authors:  Rosemary Braun; Gregory Leibon; Scott Pauls; Daniel Rockmore
Journal:  BMC Bioinformatics       Date:  2011-12-30       Impact factor: 3.169

9.  Estimating the Mutual Information between Two Discrete, Asymmetric Variables with Limited Samples.

Authors:  Damián G Hernández; Inés Samengo
Journal:  Entropy (Basel)       Date:  2019-06-25       Impact factor: 2.524

10.  Pattern Recognition Analysis Reveals Unique Contrast Sensitivity Isocontours Using Static Perimetry Thresholds Across the Visual Field.

Authors:  Jack Phu; Sieu K Khuu; Lisa Nivison-Smith; Barbara Zangerl; Agnes Yiu Jeung Choi; Bryan W Jones; Rebecca L Pfeiffer; Robert E Marc; Michael Kalloniatis
Journal:  Invest Ophthalmol Vis Sci       Date:  2017-09-01       Impact factor: 4.799

