
Penalized model-based clustering with cluster-specific diagonal covariance matrices and grouped variables.

Benhuai Xie, Wei Pan, Xiaotong Shen.

Abstract

Clustering analysis is one of the most widely used statistical tools in many emerging areas, such as microarray data analysis. For microarray and other high-dimensional data, the presence of many noise variables may mask underlying clustering structures. Hence, removing noise variables via variable selection is necessary. For simultaneous variable selection and parameter estimation, existing penalized likelihood approaches in model-based clustering analysis all assume a common diagonal covariance matrix across clusters, an assumption that may not hold in practice. To analyze high-dimensional data, particularly those with relatively small sample sizes, this article introduces a novel approach that shrinks the variances together with the means in the more general setting of cluster-specific (diagonal) covariance matrices. Furthermore, a specific form of penalty permits grouped variable selection, in which a group of variables is included or excluded altogether; this facilitates incorporating subject-matter knowledge, such as gene functions when clustering microarray samples for disease subtype discovery. For implementation, EM algorithms are derived for parameter estimation; their M-steps clearly show the effects of shrinkage and thresholding. Numerical examples, including an application to acute leukemia subtype discovery with microarray gene expression data, are provided to demonstrate the utility and advantages of the proposed method.
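The abstract describes the EM algorithm only in words. The sketch below illustrates the kind of shrinkage and thresholding such M-steps can produce, under stated assumptions: a K-component Gaussian mixture with cluster-specific diagonal covariances fitted to standardized data, an L1 penalty lam1 * sum|mu_kj| on the cluster means, and a hypothetical penalty lam2 * sum|log sigma^2_kj| that pulls each variance toward 1. Holding the variances fixed, the penalized mean update has the closed form mu_kj = sign(xbar_kj) * max(|xbar_kj| - lam1 * sigma^2_kj / n_k, 0), where n_k = sum_i tau_ik and xbar_kj is the tau-weighted cluster mean. The penalty forms, the conditional (ECM-style) update order, and the names `soft_threshold`, `update_means`, `update_vars`, and `penalized_em` are assumptions for illustration, not necessarily the paper's exact formulation; the grouped-variable penalty is omitted.

```python
import numpy as np


def soft_threshold(z, t):
    """Elementwise soft-thresholding: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def update_means(xbar, var, nk, lam1):
    """Conditional maximizer of the penalized Q-function in mu_kj (var fixed):
    the weighted cluster mean xbar soft-thresholded at lam1 * var / n_k."""
    return soft_threshold(xbar, lam1 * var / nk[:, None])


def update_vars(S, nk, lam2):
    """Conditional maximizer in s of -(n_k/2) log s - S/(2s) - lam2*|log s|,
    with S = sum_i tau_ik (x_ij - mu_kj)^2. The |log s| penalty is an
    ASSUMPTION for illustration; it shrinks each variance toward 1."""
    n = np.broadcast_to(nk[:, None], S.shape)
    s = np.ones_like(S)                 # |S - n_k| <= 2*lam2: variance set to 1
    up = S > n + 2 * lam2               # shrunk toward 1 from above
    down = S < n - 2 * lam2             # shrunk toward 1 from below
    s[up] = S[up] / (n[up] + 2 * lam2)
    s[down] = S[down] / (n[down] - 2 * lam2)
    return s


def penalized_em(X, K, lam1=0.0, lam2=0.0, n_iter=200, tol=1e-8, seed=0):
    """Hypothetical helper: ECM-style fit of a Gaussian mixture with
    cluster-specific diagonal covariances and the assumed penalties above."""
    rng = np.random.default_rng(seed)
    X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize; assumes no constant columns
    n, p = X.shape
    tau = rng.dirichlet(np.ones(K), size=n)    # random soft initial assignment
    var = np.ones((K, p))
    prev = -np.inf
    for _ in range(n_iter):
        # M-step: weights, then means given current variances, then variances
        nk = tau.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        xbar = (tau.T @ X) / nk[:, None]
        mu = update_means(xbar, var, nk, lam1)
        resid2 = (X[:, None, :] - mu[None, :, :]) ** 2      # shape (n, K, p)
        S = np.einsum("ik,ikj->kj", tau, resid2)
        var = np.maximum(update_vars(S, nk, lam2), 1e-6)   # floor avoids degeneracy
        # E-step: log weight_k + sum_j log N(x_ij; mu_kj, var_kj), via log-sum-exp
        logp = (np.log(weights)[None, :]
                - 0.5 * np.log(2 * np.pi * var).sum(axis=1)[None, :]
                - 0.5 * np.einsum("ikj,kj->ik", resid2, 1.0 / var))
        mx = logp.max(axis=1, keepdims=True)
        loglik = mx[:, 0] + np.log(np.exp(logp - mx).sum(axis=1))
        tau = np.exp(logp - loglik[:, None])
        # penalized log-likelihood of the current (weights, mu, var)
        obj = loglik.sum() - lam1 * np.abs(mu).sum() - lam2 * np.abs(np.log(var)).sum()
        if abs(obj - prev) < tol * (1 + abs(obj)):
            break
        prev = obj
    return weights, mu, var, tau, obj
```

For example, `penalized_em(X, K=2, lam1=10.0, lam2=5.0)` returns mixing weights, thresholded means, shrunk variances, posterior membership probabilities, and the final penalized log-likelihood; choosing lam1 and lam2 would require a model-selection criterion, which this sketch does not implement. Under these assumptions, a variable whose estimated means are zero in every cluster and whose variances all equal 1 contributes the same term to every cluster's log-density, so it has no influence on cluster assignment; this is how thresholding acts as variable selection.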

Year:  2008        PMID: 19920875      PMCID: PMC2777718          DOI: 10.1214/08-EJS194

Source DB:  PubMed          Journal:  Electron J Stat        ISSN: 1935-7524            Impact factor:   1.125


References (25 in total; first 10 shown)

1.  KEGG: kyoto encyclopedia of genes and genomes.

Authors:  M Kanehisa; S Goto
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  A mixture model-based approach to the clustering of microarray expression data.

Authors:  G J McLachlan; R W Bean; D Peel
Journal:  Bioinformatics       Date:  2002-03       Impact factor: 6.937

3.  Mixture modelling of gene expression data from microarray experiments.

Authors:  Debashis Ghosh; Arul M Chinnaiyan
Journal:  Bioinformatics       Date:  2002-02       Impact factor: 6.937

4.  Comparing three methods for variance estimation with duplicated high density oligonucleotide arrays.

Authors:  Xiaohong Huang; Wei Pan
Journal:  Funct Integr Genomics       Date:  2002-07-24       Impact factor: 3.410

5.  Incorporating gene functions as priors in model-based clustering of microarray gene expression data.

Authors:  Wei Pan
Journal:  Bioinformatics       Date:  2006-01-24       Impact factor: 6.937

6.  Variable selection for model-based high-dimensional clustering and its application to microarray data.

Authors:  Sijian Wang; Ji Zhu
Journal:  Biometrics       Date:  2007-10-26       Impact factor: 2.571

7.  Logistic regression for disease classification using microarray data: model selection in a large p and small n case.

Authors:  J G Liao; Khew-Voon Chin
Journal:  Bioinformatics       Date:  2007-05-31       Impact factor: 6.937

8.  Cluster analysis and display of genome-wide expression patterns.

Authors:  M B Eisen; P T Spellman; P O Brown; D Botstein
Journal:  Proc Natl Acad Sci U S A       Date:  1998-12-08       Impact factor: 11.205

9.  Optimization models for cancer classification: extracting gene interaction information from microarray expression data.

Authors:  Alexey V Antonov; Igor V Tetko; Michael T Mader; Jan Budczies; Hans W Mewes
Journal:  Bioinformatics       Date:  2004-01-22       Impact factor: 6.937

10.  Cluster-Rasch models for microarray gene expression data.

Authors:  H Li; F Hong
Journal:  Genome Biol       Date:  2001-07-31       Impact factor: 13.583

Cited by (17 in total; first 10 shown)

1.  Sparse Biclustering of Transposable Data.

Authors:  Kean Ming Tan; Daniela M Witten
Journal:  J Comput Graph Stat       Date:  2014       Impact factor: 2.302

2.  Penalized mixtures of factor analyzers with application to clustering high-dimensional microarray data.

Authors:  Benhuai Xie; Wei Pan; Xiaotong Shen
Journal:  Bioinformatics       Date:  2009-12-23       Impact factor: 6.937

3.  Integrative Sparse K-Means With Overlapping Group Lasso in Genomic Applications for Disease Subtype Discovery.

Authors:  Zhiguang Huo; George Tseng
Journal:  Ann Appl Stat       Date:  2017-07-20       Impact factor: 2.083

4.  Statistical Significance of Clustering using Soft Thresholding.

Authors:  Hanwen Huang; Yufeng Liu; Ming Yuan; J S Marron
Journal:  J Comput Graph Stat       Date:  2015-12-10       Impact factor: 2.302

5.  Cluster Analysis: Unsupervised Learning via Supervised Learning with a Non-convex Penalty.

Authors:  Wei Pan; Xiaotong Shen; Binghui Liu
Journal:  J Mach Learn Res       Date:  2013-07-01       Impact factor: 3.654

6.  A framework for feature selection in clustering.

Authors:  Daniela M Witten; Robert Tibshirani
Journal:  J Am Stat Assoc       Date:  2010-06-01       Impact factor: 5.033

7.  Comparing Model Selection and Regularization Approaches to Variable Selection in Model-Based Clustering.

Authors:  Gilles Celeux; Marie-Laure Martin-Magniette; Cathy Maugis-Rabusseau; Adrian E Raftery
Journal:  J Soc Fr Statistique (2009)       Date:  2014

8.  Penalized model-based clustering with unconstrained covariance matrices.

Authors:  Hui Zhou; Wei Pan; Xiaotong Shen
Journal:  Electron J Stat       Date:  2009-01-01       Impact factor: 1.125

9.  Penalized unsupervised learning with outliers.

Authors:  Daniela M Witten
Journal:  Stat Interface       Date:  2013       Impact factor: 0.582

10.  Meta-analytic framework for sparse K-means to identify disease subtypes in multiple transcriptomic studies.

Authors:  Zhiguang Huo; Ying Ding; Silvia Liu; Steffi Oesterreich; George Tseng
Journal:  J Am Stat Assoc       Date:  2016-05-05       Impact factor: 5.033
