Simultaneous feature selection and clustering using mixture models.

Martin H C Law, Mário A T Figueiredo, Anil K Jain.

Abstract

Clustering is a common unsupervised learning technique used to discover group structure in a set of data. While there exist many algorithms for clustering, the important issue of feature selection, that is, what attributes of the data should be used by the clustering algorithms, is rarely touched upon. Feature selection for clustering is difficult because, unlike in supervised learning, there are no class labels for the data and, thus, no obvious criteria to guide the search. Another important problem in clustering is the determination of the number of clusters, which clearly impacts and is influenced by the feature selection issue. In this paper, we propose the concept of feature saliency and introduce an expectation-maximization (EM) algorithm to estimate it, in the context of mixture-based clustering. Due to the introduction of a minimum message length model selection criterion, the saliency of irrelevant features is driven toward zero, which corresponds to performing feature selection. The criterion and algorithm are then extended to simultaneously estimate the feature saliencies and the number of clusters.
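The abstract gives no equations, so the following is only a minimal sketch of one way a feature saliency can enter a mixture model, under stated assumptions. It assumes each feature of a point is drawn from a cluster-specific Gaussian with probability rho (the saliency) and from a single cluster-independent Gaussian otherwise, and it fits the model by plain maximum-likelihood EM. The function name `fit_feature_saliency`, its parameters, and the choice of Gaussian densities are hypothetical. The minimum message length penalty and the estimation of the number of clusters described in the abstract are not implemented here, so this sketch does not by itself drive the saliency of irrelevant features toward zero.

```python
import math
import random

TINY = 1e-300  # avoids division by zero when densities underflow


def normal_pdf(x, mean, var):
    """Univariate Gaussian density."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_feature_saliency(data, k, n_iter=50, seed=0, min_var=1e-3):
    """Plain ML EM for a k-component mixture with one saliency per feature.

    Assumed model (not taken from the abstract): feature l of a point in
    cluster j follows N(mean[j][l], var[j][l]) with probability rho[l],
    and a common N(c_mean[l], c_var[l]) with probability 1 - rho[l].
    Returns (alpha, rho): mixing weights and feature saliencies.
    """
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    cols = [[row[l] for row in data] for l in range(d)]
    # Common (cluster-independent) densities start at the global fit.
    c_mean = [sum(col) / n for col in cols]
    c_var = [max(sum((x - m) ** 2 for x in col) / n, min_var)
             for col, m in zip(cols, c_mean)]
    # Cluster-specific densities start at k randomly chosen points.
    mean = [list(row) for row in rng.sample(data, k)]
    var = [list(c_var) for _ in range(k)]
    alpha = [1.0 / k] * k
    rho = [0.5] * d

    for _ in range(n_iter):
        # Weighted sufficient statistics: count, sum of x, sum of x^2.
        w_sum = [0.0] * k
        u0 = [[0.0] * d for _ in range(k)]
        u1 = [[0.0] * d for _ in range(k)]
        u2 = [[0.0] * d for _ in range(k)]
        v0, v1, v2 = [0.0] * d, [0.0] * d, [0.0] * d
        for y in data:
            # E-step: a = salient part, b = common part of each feature density.
            a = [[rho[l] * normal_pdf(y[l], mean[j][l], var[j][l])
                  for l in range(d)] for j in range(k)]
            b = [(1.0 - rho[l]) * normal_pdf(y[l], c_mean[l], c_var[l])
                 for l in range(d)]
            joint = [alpha[j] * math.prod(a[j][l] + b[l] for l in range(d))
                     for j in range(k)]
            total = sum(joint)
            w = [q / total for q in joint] if total > 0.0 else [1.0 / k] * k
            for j in range(k):
                w_sum[j] += w[j]
                for l in range(d):
                    # u: point is in cluster j AND feature l is salient.
                    u = w[j] * a[j][l] / (a[j][l] + b[l] + TINY)
                    v = w[j] - u  # in cluster j, feature l not salient
                    u0[j][l] += u
                    u1[j][l] += u * y[l]
                    u2[j][l] += u * y[l] ** 2
                    v0[l] += v
                    v1[l] += v * y[l]
                    v2[l] += v * y[l] ** 2
        # M-step: re-estimate weights, densities, and saliencies.
        alpha = [s / n for s in w_sum]
        for j in range(k):
            for l in range(d):
                if u0[j][l] > TINY:
                    mean[j][l] = u1[j][l] / u0[j][l]
                    var[j][l] = max(u2[j][l] / u0[j][l] - mean[j][l] ** 2, min_var)
        for l in range(d):
            if v0[l] > TINY:
                c_mean[l] = v1[l] / v0[l]
                c_var[l] = max(v2[l] / v0[l] - c_mean[l] ** 2, min_var)
            rho[l] = min(1.0, sum(u0[j][l] for j in range(k)) / n)
    return alpha, rho


if __name__ == "__main__":
    gen = random.Random(42)
    # Feature 0 separates two groups; feature 1 is pure noise.
    points = [[gen.gauss(3.0 if i % 2 else -3.0, 1.0), gen.gauss(0.0, 1.0)]
              for i in range(300)]
    weights, saliency = fit_feature_saliency(points, k=2)
    print("mixing weights:", weights)
    print("feature saliencies:", saliency)
```

In this unpenalized form the saliency of a feature whose distribution does not depend on the cluster is only weakly determined by the data, which is consistent with the abstract's point that a model selection criterion is needed to push such saliencies to zero.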

Year:  2004        PMID: 15742891     DOI: 10.1109/TPAMI.2004.71

Source DB:  PubMed          Journal:  IEEE Trans Pattern Anal Mach Intell        ISSN: 0098-5589            Impact factor:   6.226


  13 in total

1.  Stability and change in patterns of concerns related to eating, weight, and shape in young adult women: a latent transition analysis.

Authors:  Angela S Cain; Amee J Epler; Douglas Steinley; Kenneth J Sher
Journal:  J Abnorm Psychol       Date:  2010-05

2.  Mixture models with multiple levels, with application to the analysis of multifactor gene expression data.

Authors:  Rebecka Jörnsten; Sündüz Keleş
Journal:  Biostatistics       Date:  2008-02-05       Impact factor: 5.899

3.  Concerns related to eating, weight, and shape: typologies and transitions in men during the college years.

Authors:  Angela S Cain; Amee J Epler; Douglas Steinley; Kenneth J Sher
Journal:  Int J Eat Disord       Date:  2011-07-08       Impact factor: 4.861

4.  Efficient neural spike sorting using data subdivision and unification.

Authors:  Masood Ul Hassan; Rakesh Veerabhadrappa; Asim Bhatti
Journal:  PLoS One       Date:  2021-02-10       Impact factor: 3.240

5.  Comparing Model Selection and Regularization Approaches to Variable Selection in Model-Based Clustering.

Authors:  Gilles Celeux; Marie-Laure Martin-Magniette; Cathy Maugis-Rabusseau; Adrian E Raftery
Journal:  J Soc Fr Statistique (2009)       Date:  2014

6.  Bayesian clustering and feature selection for cancer tissue samples.

Authors:  Pekka Marttinen; Samuel Myllykangas; Jukka Corander
Journal:  BMC Bioinformatics       Date:  2009-03-18       Impact factor: 3.169

7.  Hierarchical Bayesian nonparametric mixture models for clustering with variable relevance determination.

Authors:  Christopher Yau; Chris Holmes
Journal:  Bayesian Anal       Date:  2011-07-01       Impact factor: 3.728

8.  Inferring latent heterogeneity using many feature variables supervised by survival outcome.

Authors:  Beilin Jia; Donglin Zeng; Jason J Z Liao; Guanghan F Liu; Xianming Tan; Guoqing Diao; Joseph G Ibrahim
Journal:  Stat Med       Date:  2021-04-05       Impact factor: 2.497

9.  Compatibility Evaluation of Clustering Algorithms for Contemporary Extracellular Neural Spike Sorting.

Authors:  Rakesh Veerabhadrappa; Masood Ul Hassan; James Zhang; Asim Bhatti
Journal:  Front Syst Neurosci       Date:  2020-06-30

10.  Model-based clustering of array CGH data.

Authors:  Sohrab P Shah; K-John Cheung; Nathalie A Johnson; Guillaume Alain; Randy D Gascoyne; Douglas E Horsman; Raymond T Ng; Kevin P Murphy
Journal:  Bioinformatics       Date:  2009-06-15       Impact factor: 6.937
