Comparing Model Selection and Regularization Approaches to Variable Selection in Model-Based Clustering.

Gilles Celeux, Marie-Laure Martin-Magniette, Cathy Maugis-Rabusseau, Adrian E Raftery.

Abstract

We compare two major approaches to variable selection in clustering: model selection and regularization. Based on previous results, we selected the method of Maugis et al. (2009b), which modified the method of Raftery and Dean (2006), as a current state-of-the-art model selection method, and the method of Witten and Tibshirani (2010) as a current state-of-the-art regularization method. We compared the two methods by simulation in terms of their accuracy in both classification and variable selection. In the first simulation experiment, all the variables were conditionally independent given cluster membership. We found that variable selection of either kind yielded substantial gains in classification accuracy when the clusters were well separated, but few gains when the clusters were close together. The two variable selection methods had comparable classification accuracy, but the model selection approach was substantially more accurate in selecting variables. In the second simulation experiment, the variables were correlated given cluster membership. Here the model selection approach was substantially more accurate than the regularization approach in both classification and variable selection, and both gave more accurate classifications than K-means without variable selection. However, the model selection approach is not available in very high-dimensional settings.
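The kind of simulation described in the abstract can be illustrated with a minimal sketch. It is not the authors' experimental design and implements neither the Maugis et al. nor the Witten and Tibshirani method. It assumes two Gaussian clusters, two informative variables, and a set of irrelevant standard-normal variables, all conditionally independent given cluster membership. Plain K-means is run once on all variables and once on the informative variables only (an "oracle" selection standing in for a variable selection method), and the two partitions are scored with the adjusted Rand index. All function names, sample sizes, and separation values are hypothetical choices made for illustration.

```python
import random
from collections import Counter


def simulate(n_per_cluster, n_noise, separation, rng):
    """Two clusters: 2 informative variables (means differ by `separation`)
    followed by `n_noise` irrelevant N(0, 1) variables."""
    data, labels = [], []
    for k in (0, 1):
        for _ in range(n_per_cluster):
            informative = [rng.gauss(k * separation, 1.0) for _ in range(2)]
            noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
            data.append(informative + noise)
            labels.append(k)
    return data, labels


def kmeans(data, k, rng, n_iter=50):
    """Plain Lloyd's algorithm with randomly chosen initial centers."""
    centers = [list(p) for p in rng.sample(data, k)]
    assign = None
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        new_assign = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centers[c])))
            for p in data
        ]
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
        # Recompute each center as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(data, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings of the same points."""
    def pairs(m):
        return m * (m - 1) / 2

    sum_ij = sum(pairs(v) for v in Counter(zip(a, b)).values())
    sum_a = sum(pairs(v) for v in Counter(a).values())
    sum_b = sum(pairs(v) for v in Counter(b).values())
    expected = sum_a * sum_b / pairs(len(a))
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both labelings are trivial
    return (sum_ij - expected) / (max_index - expected)


# Compare well-separated and close clusters (separation values are arbitrary).
for separation in (4.0, 1.0):
    rng = random.Random(0)
    data, truth = simulate(n_per_cluster=50, n_noise=20,
                           separation=separation, rng=rng)
    all_vars = kmeans(data, 2, random.Random(1))
    selected = kmeans([row[:2] for row in data], 2, random.Random(1))
    print(f"separation={separation}: "
          f"ARI all variables={adjusted_rand_index(truth, all_vars):.3f}, "
          f"ARI informative only={adjusted_rand_index(truth, selected):.3f}")
```

Because the informative variables are supplied by hand, this sketch only indicates how much discarding irrelevant variables can matter. It says nothing about how accurately either published method recovers those variables.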

Keywords: Model selection; Model-based clustering; Regularization approach; Variable selection

Year: 2014 PMID: 25279246 PMCID: PMC4178956

Source DB: PubMed Journal: J Soc Fr Statistique (2009) ISSN: 1962-5197

9 in total

1. Pattern clustering by multivariate mixture analysis.

Authors: J H Wolfe
Journal: Multivariate Behav Res Date: 1970-04-01 Impact factor: 5.923

2. Simultaneous feature selection and clustering using mixture models.

Authors: Martin H C Law; Mário A T Figueiredo; Anil K Jain
Journal: IEEE Trans Pattern Anal Mach Intell Date: 2004-09 Impact factor: 6.226

3. Variable selection for model-based high-dimensional clustering and its application to microarray data.

Authors: Sijian Wang; Ji Zhu
Journal: Biometrics Date: 2007-10-26 Impact factor: 2.571

4. Variable selection for clustering with Gaussian mixture models.

Authors: Cathy Maugis; Gilles Celeux; Marie-Laure Martin-Magniette
Journal: Biometrics Date: 2009-02-04 Impact factor: 2.571

5. Penalized model-based clustering with cluster-specific diagonal covariance matrices and grouped variables.

Authors: Benhuai Xie; Wei Pan; Xiaotong Shen
Journal: Electron J Stat Date: 2008 Impact factor: 1.125

6. A framework for feature selection in clustering.

Authors: Daniela M Witten; Robert Tibshirani
Journal: J Am Stat Assoc Date: 2010-06-01 Impact factor: 5.033

7. Penalized model-based clustering with unconstrained covariance matrices.

Authors: Hui Zhou; Wei Pan; Xiaotong Shen
Journal: Electron J Stat Date: 2009-01-01 Impact factor: 1.125

8. Pairwise variable selection for high-dimensional model-based clustering.

Authors: Jian Guo; Elizaveta Levina; George Michailidis; Ji Zhu
Journal: Biometrics Date: 2010-09 Impact factor: 2.571

9. CATdb: a public access to Arabidopsis transcriptome data from the URGV-CATMA platform.

Authors: Séverine Gagnot; Jean-Philippe Tamby; Marie-Laure Martin-Magniette; Frédérique Bitton; Ludivine Taconnat; Sandrine Balzergue; Sébastien Aubourg; Jean-Pierre Renou; Alain Lecharny; Véronique Brunaud
Journal: Nucleic Acids Res Date: 2007-10-16 Impact factor: 16.971

3 in total

1. Prevalence and Risk Factors of PrEP Use Stigma Among Adolescent Girls and Young Women in Johannesburg, South Africa and Mwanza, Tanzania Participating in the EMPOWER Trial.

Authors: R J Munthali; A L Stangl; D Baron; I Barré; S Harvey; L Ramskin; M Colombini; N Naicker; S Kapiga; S Delany-Moretlwe
Journal: AIDS Behav Date: 2022-07-01

2. clustvarsel: A Package Implementing Variable Selection for Gaussian Model-Based Clustering in R.

Authors: Luca Scrucca; Adrian E Raftery
Journal: J Stat Softw Date: 2018-04-17 Impact factor: 6.440

3. A Bayesian probit model with spatially varying coefficients for brain decoding using fMRI data.

Authors: Fengqing Zhang; Wenxin Jiang; Patrick Wong; Ji-Ping Wang
Journal: Stat Med Date: 2016-05-24 Impact factor: 2.373
