Variable selection for clustering with Gaussian mixture models.

Cathy Maugis, Gilles Celeux, Marie-Laure Martin-Magniette.

Abstract

This article is concerned with variable selection for cluster analysis. The problem is treated as a model selection problem in the context of model-based cluster analysis. A model generalizing that of Raftery and Dean (2006, Journal of the American Statistical Association 101, 168-178) is proposed to specify the role of each variable. This model requires no prior assumptions about the linear link between the selected and discarded variables. Models are compared using the Bayesian information criterion (BIC). The role of each variable is determined by an algorithm that embeds two backward stepwise algorithms, one for variable selection in clustering and one for variable selection in linear regression. The identifiability of the model is established, and the consistency of the resulting criterion is proved under regularity conditions. Numerical experiments on simulated datasets and a genomic application illustrate the usefulness of the procedure.
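
The kind of comparison the abstract describes can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes scikit-learn's `GaussianMixture`, a fixed number of components, and full covariance matrices, and it regresses each candidate variable on all remaining variables rather than running a second stepwise selection inside the regression. That makes it closer to the Raftery and Dean comparison than to the generalized model. The function names `gmm_bic`, `regression_bic`, and `backward_select` are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def gmm_bic(X, n_components, seed=0):
    """BIC of a full-covariance Gaussian mixture fitted to X.

    Uses scikit-learn's sign convention: lower is better.
    """
    gmm = GaussianMixture(n_components=n_components, covariance_type="full",
                          n_init=3, random_state=seed)
    return gmm.fit(X).bic(X)


def regression_bic(y, Z):
    """BIC (lower is better) of a Gaussian linear regression of y on Z."""
    n = len(y)
    design = np.column_stack([np.ones(n), Z])        # add an intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = max(resid @ resid / n, 1e-12)           # ML estimate of the noise variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    n_params = design.shape[1] + 1                   # coefficients plus variance
    return -2 * loglik + n_params * np.log(n)


def backward_select(X, n_components):
    """Simplified backward stepwise selection of clustering variables.

    Assumption: a removed variable is explained by a regression on ALL
    remaining variables (no selection of regressors).
    """
    selected = list(range(X.shape[1]))
    while len(selected) > 1:
        bic_keep = gmm_bic(X[:, selected], n_components)
        best_gain, best_j = 0.0, None
        for j in selected:
            rest = [k for k in selected if k != j]
            # Model where variable j carries no extra clustering information:
            # mixture on the rest, plus a regression of j on the rest.
            bic_drop = (gmm_bic(X[:, rest], n_components)
                        + regression_bic(X[:, j], X[:, rest]))
            gain = bic_keep - bic_drop               # positive means dropping j is preferred
            if gain > best_gain:
                best_gain, best_j = gain, j
        if best_j is None:                           # no removal improves the criterion
            break
        selected.remove(best_j)
    return selected
```

Calling `backward_select(X, n_components=2)` on an `(n_samples, n_features)` array returns the column indices retained for clustering. In practice the number of components and the covariance structure would also be chosen by BIC; fixing them here keeps the sketch short.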

Year:  2009        PMID: 19210744     DOI: 10.1111/j.1541-0420.2008.01160.x

Source DB:  PubMed          Journal:  Biometrics        ISSN: 0006-341X            Impact factor:   2.571


  15 in total

1.  A Dirichlet process mixture model for clustering longitudinal gene expression data.

Authors:  Jiehuan Sun; Jose D Herazo-Maya; Naftali Kaminski; Hongyu Zhao; Joshua L Warren
Journal:  Stat Med       Date:  2017-06-15       Impact factor: 2.373

2.  A Model-Based Approach to Simultaneous Clustering and Dimensional Reduction of Ordinal Data.

Authors:  Monia Ranalli; Roberto Rocci
Journal:  Psychometrika       Date:  2017-09-06       Impact factor: 2.500

3.  A framework for feature selection in clustering.

Authors:  Daniela M Witten; Robert Tibshirani
Journal:  J Am Stat Assoc       Date:  2010-06-01       Impact factor: 5.033

4.  Comparing Model Selection and Regularization Approaches to Variable Selection in Model-Based Clustering.

Authors:  Gilles Celeux; Marie-Laure Martin-Magniette; Cathy Maugis-Rabusseau; Adrian E Raftery
Journal:  J Soc Fr Statistique (2009)       Date:  2014

5.  Accounting for measurement error in biomarker data and misclassification of subtypes in the analysis of tumor data.

Authors:  Daniel Nevo; David M Zucker; Rulla M Tamimi; Molin Wang
Journal:  Stat Med       Date:  2016-08-24       Impact factor: 2.373

6.  Unobserved classes and extra variables in high-dimensional discriminant analysis.

Authors:  Michael Fop; Pierre-Alexandre Mattei; Charles Bouveyron; Thomas Brendan Murphy
Journal:  Adv Data Anal Classif       Date:  2022-03-01

7.  clustvarsel: A Package Implementing Variable Selection for Gaussian Model-Based Clustering in R.

Authors:  Luca Scrucca; Adrian E Raftery
Journal:  J Stat Softw       Date:  2018-04-17       Impact factor: 6.440

8.  Clustering and variable selection in the presence of mixed variable types and missing data.

Authors:  C B Storlie; S M Myers; S K Katusic; A L Weaver; R G Voigt; P E Croarkin; R E Stoeckel; J D Port
Journal:  Stat Med       Date:  2018-05-17       Impact factor: 2.373

9.  Identification of significant features in DNA microarray data.

Authors:  Eric Bair
Journal:  Wiley Interdiscip Rev Comput Stat       Date:  2013-07

10.  Hierarchical Bayesian nonparametric mixture models for clustering with variable relevance determination.

Authors:  Christopher Yau; Chris Holmes
Journal:  Bayesian Anal       Date:  2011-07-01       Impact factor: 3.728
