

Variable selection for clustering with Gaussian mixture models.

Cathy Maugis, Gilles Celeux, Marie-Laure Martin-Magniette.

Abstract

This article addresses variable selection for cluster analysis, treating it as a model selection problem in the model-based clustering framework. A model generalizing that of Raftery and Dean (2006, Journal of the American Statistical Association 101, 168-178) is proposed to specify the role of each variable. This model requires no prior assumptions about the linear link between the selected and discarded variables. Models are compared using the Bayesian information criterion (BIC). The role of each variable is determined by an algorithm that embeds two backward stepwise algorithms, one for variable selection in clustering and one for variable selection in linear regression. Identifiability of the model is established, and consistency of the resulting criterion is proved under regularity conditions. Numerical experiments on simulated datasets and a genomic application illustrate the usefulness of the procedure.
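
As a rough illustration of the kind of comparison the abstract describes, the sketch below scores a one-component and a two-component Gaussian mixture on a single variable with the Bayesian information criterion (BIC). It is not the authors' algorithm: the EM fit, the function names (`gmm_loglik_1d`, `bic_1d`), the simulated data, and the sign convention BIC = 2 log L - (number of free parameters) log n, where larger is better, are all assumptions made for this example.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log density of a univariate Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def gmm_loglik_1d(data, k, n_iter=200):
    """Fit a k-component 1-D Gaussian mixture by EM; return its log-likelihood."""
    n = len(data)
    ordered = sorted(data)
    # Deterministic start: component means at evenly spaced sample quantiles.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(data) / n
    pooled_var = max(sum((x - grand_mean) ** 2 for x in data) / n, 1e-6)
    variances = [pooled_var] * k
    weights = [1.0 / k] * k

    def component_logs(x):
        # Log of weight times density, one entry per component.
        return [math.log(weights[j]) + log_normal_pdf(x, means[j], variances[j])
                for j in range(k)]

    for _ in range(n_iter):
        # E-step: posterior membership probabilities for every observation.
        resp = []
        for x in data:
            logs = component_logs(x)
            norm = log_sum_exp(logs)
            resp.append([math.exp(v - norm) for v in logs])
        # M-step: update weights, means, and variances (with small floors).
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)

    return sum(log_sum_exp(component_logs(x)) for x in data)


def bic_1d(data, k):
    """BIC under the assumed convention 2*logL - n_params*log(n); larger is better."""
    n_params = 3 * k - 1  # k means, k variances, k - 1 free mixing weights
    return 2.0 * gmm_loglik_1d(data, k) - n_params * math.log(len(data))


# Simulated variable with two well-separated groups (illustrative data only).
rng = random.Random(0)
clustered = ([rng.gauss(-3.0, 1.0) for _ in range(100)]
             + [rng.gauss(3.0, 1.0) for _ in range(100)])

print("BIC, 1 component :", bic_1d(clustered, 1))
print("BIC, 2 components:", bic_1d(clustered, 2))
```

With this convention, a variable whose two-component BIC clearly exceeds its one-component BIC shows evidence of cluster structure. The procedure in the article goes further, comparing a clustering model for the selected variables against a linear regression of the discarded variables on a subset of the selected ones.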


Year: 2009; PMID: 19210744; DOI: 10.1111/j.1541-0420.2008.01160.x

Source DB: PubMed; Journal: Biometrics; ISSN: 0006-341X; Impact factor: 2.571

Cited

15 in total

1. A Dirichlet process mixture model for clustering longitudinal gene expression data.

2. A Model-Based Approach to Simultaneous Clustering and Dimensional Reduction of Ordinal Data.

3. A framework for feature selection in clustering.

4. Comparing Model Selection and Regularization Approaches to Variable Selection in Model-Based Clustering.

5. Accounting for measurement error in biomarker data and misclassification of subtypes in the analysis of tumor data.

6. Unobserved classes and extra variables in high-dimensional discriminant analysis.

7. clustvarsel: A Package Implementing Variable Selection for Gaussian Model-Based Clustering in R.

8. Clustering and variable selection in the presence of mixed variable types and missing data.

9. Identification of significant features in DNA microarray data.

10. Hierarchical Bayesian nonparametric mixture models for clustering with variable relevance determination.