Comparing Model Selection and Regularization Approaches to Variable Selection in Model-Based Clustering.

Gilles Celeux, Marie-Laure Martin-Magniette, Cathy Maugis-Rabusseau, Adrian E Raftery.

Abstract

We compare two major approaches to variable selection in clustering: model selection and regularization. Based on previous results, we select the method of Maugis et al. (2009b), which modified the method of Raftery and Dean (2006), as a current state-of-the-art model selection method. We select the method of Witten and Tibshirani (2010) as a current state-of-the-art regularization method. We compared the methods by simulation in terms of their accuracy in both classification and variable selection. In the first simulation experiment, all the variables were conditionally independent given cluster membership. We found that variable selection (of either kind) yielded substantial gains in classification accuracy when the clusters were well separated, but few gains when the clusters were close together. We found that the two variable selection methods had comparable classification accuracy, but that the model selection approach had substantially better accuracy in selecting variables. In our second simulation experiment, there were correlations among the variables given the cluster memberships. We found that the model selection approach was substantially more accurate in terms of both classification and variable selection than the regularization approach, and that both gave more accurate classifications than K-means without variable selection. However, the model selection approach cannot be used in very high-dimensional settings.
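
The design of the first simulation experiment (variables conditionally independent given cluster membership, with classification accuracy as the criterion) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the sample sizes, the number of informative and noise variables, and the separation `delta` are hypothetical and are not taken from the paper. It runs plain K-means on all variables and on an oracle subset of the informative variables; it implements neither the method of Maugis et al. (2009b) nor that of Witten and Tibshirani (2010).

```python
import random


def simulate(n_per_cluster, n_informative, n_noise, delta, rng):
    """Two Gaussian clusters with unit variances; all variables are
    independent given the cluster. Informative variables have mean 0 in
    cluster 0 and mean `delta` in cluster 1; noise variables have mean 0
    in both clusters. (Illustrative settings, not the paper's.)"""
    data, labels = [], []
    for k in (0, 1):
        for _ in range(n_per_cluster):
            row = [rng.gauss(k * delta, 1.0) for _ in range(n_informative)]
            row += [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
            data.append(row)
            labels.append(k)
    return data, labels


def sqdist(x, y):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def kmeans2(data, rng, n_iter=50, n_starts=10):
    """Lloyd's algorithm for K = 2; returns the assignment with the
    lowest within-cluster sum of squares over several random starts."""
    best_assign, best_cost = None, float("inf")
    for _ in range(n_starts):
        centers = [list(c) for c in rng.sample(data, 2)]
        assign = None
        for _ in range(n_iter):
            new_assign = [0 if sqdist(x, centers[0]) <= sqdist(x, centers[1]) else 1
                          for x in data]
            if new_assign == assign:
                break  # converged: assignments no longer change
            assign = new_assign
            for k in (0, 1):
                members = [x for x, a in zip(data, assign) if a == k]
                if members:  # keep the old center if a cluster is empty
                    centers[k] = [sum(col) / len(members) for col in zip(*members)]
        cost = sum(sqdist(x, centers[a]) for x, a in zip(data, assign))
        if cost < best_cost:
            best_assign, best_cost = assign, cost
    return best_assign


def accuracy(assign, labels):
    """Classification accuracy for two clusters, up to label switching."""
    match = sum(a == b for a, b in zip(assign, labels)) / len(labels)
    return max(match, 1.0 - match)


if __name__ == "__main__":
    rng = random.Random(1)
    n_inf = 3
    for delta in (1.0, 4.0):  # close together vs. well separated
        data, labels = simulate(100, n_inf, 20, delta, rng)
        acc_all = accuracy(kmeans2(data, rng), labels)
        acc_sub = accuracy(kmeans2([r[:n_inf] for r in data], rng), labels)
        print(f"delta={delta}: all variables {acc_all:.3f}, "
              f"informative only {acc_sub:.3f}")
```

The oracle subset stands in for a perfect variable selection step, so the gap between the two accuracies only bounds what a real selection method could gain in this toy setting.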

Keywords:  Model selection; Model-based clustering; Regularization approach; Variable selection

Year:  2014        PMID: 25279246      PMCID: PMC4178956     

Source DB:  PubMed          Journal:  J Soc Fr Statistique (2009)        ISSN: 1962-5197


  9 in total

1.  PATTERN CLUSTERING BY MULTIVARIATE MIXTURE ANALYSIS.

Authors:  J H Wolfe
Journal:  Multivariate Behav Res       Date:  1970-04-01       Impact factor: 5.923

2.  Simultaneous feature selection and clustering using mixture models.

Authors:  Martin H C Law; Mário A T Figueiredo; Anil K Jain
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2004-09       Impact factor: 6.226

3.  Variable selection for model-based high-dimensional clustering and its application to microarray data.

Authors:  Sijian Wang; Ji Zhu
Journal:  Biometrics       Date:  2007-10-26       Impact factor: 2.571

4.  Variable selection for clustering with Gaussian mixture models.

Authors:  Cathy Maugis; Gilles Celeux; Marie-Laure Martin-Magniette
Journal:  Biometrics       Date:  2009-02-04       Impact factor: 2.571

5.  Penalized model-based clustering with cluster-specific diagonal covariance matrices and grouped variables.

Authors:  Benhuai Xie; Wei Pan; Xiaotong Shen
Journal:  Electron J Stat       Date:  2008       Impact factor: 1.125

6.  A framework for feature selection in clustering.

Authors:  Daniela M Witten; Robert Tibshirani
Journal:  J Am Stat Assoc       Date:  2010-06-01       Impact factor: 5.033

7.  Penalized model-based clustering with unconstrained covariance matrices.

Authors:  Hui Zhou; Wei Pan; Xiaotong Shen
Journal:  Electron J Stat       Date:  2009-01-01       Impact factor: 1.125

8.  Pairwise variable selection for high-dimensional model-based clustering.

Authors:  Jian Guo; Elizaveta Levina; George Michailidis; Ji Zhu
Journal:  Biometrics       Date:  2010-09       Impact factor: 2.571

9.  CATdb: a public access to Arabidopsis transcriptome data from the URGV-CATMA platform.

Authors:  Séverine Gagnot; Jean-Philippe Tamby; Marie-Laure Martin-Magniette; Frédérique Bitton; Ludivine Taconnat; Sandrine Balzergue; Sébastien Aubourg; Jean-Pierre Renou; Alain Lecharny; Véronique Brunaud
Journal:  Nucleic Acids Res       Date:  2007-10-16       Impact factor: 16.971

  3 in total

1.  Prevalence and Risk Factors of PrEP Use Stigma Among Adolescent Girls and Young Women in Johannesburg, South Africa and Mwanza, Tanzania Participating in the EMPOWER Trial.

Authors:  R J Munthali; A L Stangl; D Baron; I Barré; S Harvey; L Ramskin; M Colombini; N Naicker; S Kapiga; S Delany-Moretlwe
Journal:  AIDS Behav       Date:  2022-07-01

2.  clustvarsel: A Package Implementing Variable Selection for Gaussian Model-Based Clustering in R.

Authors:  Luca Scrucca; Adrian E Raftery
Journal:  J Stat Softw       Date:  2018-04-17       Impact factor: 6.440

3.  A Bayesian probit model with spatially varying coefficients for brain decoding using fMRI data.

Authors:  Fengqing Zhang; Wenxin Jiang; Patrick Wong; Ji-Ping Wang
Journal:  Stat Med       Date:  2016-05-24       Impact factor: 2.373
