Penalized model-based clustering with unconstrained covariance matrices.

Hui Zhou, Wei Pan, Xiaotong Shen.

Abstract

Clustering is one of the most useful tools for high-dimensional analysis, e.g., of microarray data. It becomes challenging in the presence of a large number of noise variables, which may mask the underlying clustering structure. Noise removal through variable selection is therefore necessary. One effective approach is regularization, which performs parameter estimation and variable selection simultaneously in model-based clustering. However, existing methods focus on regularizing the mean parameters that represent cluster centers and ignore dependencies among variables within clusters, which can lead to incorrect orientations or shapes of the resulting clusters. In this article, we propose a regularized Gaussian mixture model that permits general covariance matrices and thus takes various dependencies into account. At the same time, the approach shrinks both the means and the covariance matrices, achieving better clustering and variable selection. To overcome the technical challenge of estimating possibly large covariance matrices, we derive an EM algorithm that uses the graphical lasso (Friedman et al., 2007) for parameter estimation. Numerical examples, including applications to microarray gene expression data, demonstrate the utility of the proposed method.
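
The abstract does not state the objective function, so the following is only a sketch of the kind of criterion it describes. It assumes L1 penalties on the cluster means and on the off-diagonal entries of the cluster-specific precision (inverse covariance) matrices. All symbols ($\pi_k$, $\mu_k$, $W_k$, $\tau_{ik}$, $S_k$, $\lambda_1$, $\lambda_2$) are notation chosen here for illustration and may differ from the paper's.

```latex
% Assumed penalized log-likelihood for a K-component Gaussian mixture,
% with W_k = \Sigma_k^{-1} the precision matrix of cluster k:
\log L_P(\Theta)
  = \sum_{i=1}^{n} \log \Bigl[ \sum_{k=1}^{K} \pi_k \,
      \phi\bigl(x_i;\, \mu_k,\, W_k^{-1}\bigr) \Bigr]
  - \lambda_1 \sum_{k=1}^{K} \sum_{j=1}^{p} \lvert \mu_{kj} \rvert
  - \lambda_2 \sum_{k=1}^{K} \sum_{j \neq l} \lvert W_{k;jl} \rvert

% With posterior weights \tau_{ik} from the E-step, n_k = \sum_i \tau_{ik}, and
% S_k = n_k^{-1} \sum_i \tau_{ik} (x_i - \mu_k)(x_i - \mu_k)^{\top},
% the M-step for each W_k (means held fixed) reduces to a
% graphical-lasso-type problem:
\widehat{W}_k
  = \arg\max_{W \succ 0} \;
    \log\det W - \operatorname{tr}(S_k W)
    - \frac{2\lambda_2}{n_k} \sum_{j \neq l} \lvert W_{jl} \rvert
```

Under these assumptions, each precision-matrix update has the same form as the graphical lasso problem, which would explain why that solver can be reused inside the EM iterations; a variable is uninformative for clustering when its penalized means are shrunk to a common value and its precision entries do not differ across clusters.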

Year:  2009        PMID: 20463857      PMCID: PMC2867492          DOI: 10.1214/09-EJS487

Source DB:  PubMed          Journal:  Electron J Stat        ISSN: 1935-7524            Impact factor:   1.125

Cited by: 10 in total

1.  Graph-based sparse linear discriminant analysis for high-dimensional classification.

Authors:  Jianyu Liu; Guan Yu; Yufeng Liu
Journal:  J Multivar Anal       Date:  2018-12-17       Impact factor: 1.473

2.  Penalized mixtures of factor analyzers with application to clustering high-dimensional microarray data.

Authors:  Benhuai Xie; Wei Pan; Xiaotong Shen
Journal:  Bioinformatics       Date:  2009-12-23       Impact factor: 6.937

3.  Comparing Model Selection and Regularization Approaches to Variable Selection in Model-Based Clustering.

Authors:  Gilles Celeux; Marie-Laure Martin-Magniette; Cathy Maugis-Rabusseau; Adrian E Raftery
Journal:  J Soc Fr Statistique (2009)       Date:  2014

4.  Integrative clustering methods for multi-omics data.

Authors:  Xiaoyu Zhang; Zhenwei Zhou; Hanfei Xu; Ching-Ti Liu
Journal:  Wiley Interdiscip Rev Comput Stat       Date:  2021-02-07

5.  Clustering High-Dimensional Landmark-based Two-dimensional Shape Data.

Authors:  Chao Huang; Martin Styner; Hongtu Zhu
Journal:  J Am Stat Assoc       Date:  2015-04-16       Impact factor: 5.033

6.  Estimation of multiple networks in Gaussian mixture models.

Authors:  Chen Gao; Yunzhang Zhu; Xiaotong Shen; Wei Pan
Journal:  Electron J Stat       Date:  2016-05-02       Impact factor: 1.125

7.  HeteroGGM: an R package for Gaussian graphical model-based heterogeneity analysis.

Authors:  Mingyang Ren; Sanguo Zhang; Qingzhao Zhang; Shuangge Ma
Journal:  Bioinformatics       Date:  2021-02-26       Impact factor: 6.937

8.  Cancer subtype discovery and biomarker identification via a new robust network clustering algorithm.

Authors:  Meng-Yun Wu; Dao-Qing Dai; Xiao-Fei Zhang; Yuan Zhu
Journal:  PLoS One       Date:  2013-06-17       Impact factor: 3.240

9.  Penalized model-based clustering of fMRI data.

Authors:  Andrew Dilernia; Karina Quevedo; Jazmin Camchong; Kelvin Lim; Wei Pan; Lin Zhang
Journal:  Biostatistics       Date:  2022-07-18       Impact factor: 5.279

10.  Molecular heterogeneity at the network level: high-dimensional testing, clustering and a TCGA case study.

Authors:  Nicolas Städler; Frank Dondelinger; Steven M Hill; Rehan Akbani; Yiling Lu; Gordon B Mills; Sach Mukherjee
Journal:  Bioinformatics       Date:  2017-09-15       Impact factor: 6.937
