Statistical Significance of Clustering using Soft Thresholding.

Hanwen Huang, Yufeng Liu, Ming Yuan, J S Marron.

Abstract

Clustering methods have led to a number of important discoveries in bioinformatics and beyond. A major challenge in their use is determining which clusters represent important underlying structure, as opposed to spurious sampling artifacts. This challenge is especially serious, and very few methods are available, when the data are very high in dimension. Statistical Significance of Clustering (SigClust) is a recently developed cluster evaluation tool for high dimensional low sample size data. An important component of the SigClust approach is the very definition of a single cluster as a subset of data sampled from a multivariate Gaussian distribution. The implementation of SigClust requires the estimation of the eigenvalues of the covariance matrix for the null multivariate Gaussian distribution. We show that the original eigenvalue estimation can lead to a test that suffers from severe inflation of type-I error, in the important case where there are a few very large eigenvalues. This paper addresses this critical challenge using a novel likelihood based soft thresholding approach to estimate these eigenvalues, which leads to a much improved SigClust. Major improvements in SigClust performance are shown by both mathematical analysis, based on the new notion of Theoretical Cluster Index, and extensive simulation studies. Applications to some cancer genomic data further demonstrate the usefulness of these improvements.
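
The abstract does not give the soft thresholding formula, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes each estimated eigenvalue has the form max(lambda - tau, noise_var), where `noise_var` is a hypothetical background-noise variance acting as a floor, and tau >= 0 is chosen so that the estimates sum to the same total as the sample eigenvalues whenever that is possible. The function name and parameters are illustrative and are not taken from the paper or from any SigClust software.

```python
def soft_threshold_eigenvalues(sample_eigs, noise_var, tol=1e-10):
    """Shrink sample eigenvalues toward a noise floor (illustrative sketch).

    Assumed form (not stated in the abstract): each estimate is
    max(lam - tau, noise_var), with tau >= 0 found by bisection so that
    the estimates keep the same total as the sample eigenvalues.
    """
    total = sum(sample_eigs)

    def shrunk(tau):
        return [max(lam - tau, noise_var) for lam in sample_eigs]

    # tau = 0 is plain flooring at noise_var. If flooring does not
    # inflate the total, no further shrinkage is applied.
    if sum(shrunk(0.0)) <= total:
        return shrunk(0.0)

    # sum(shrunk(tau)) is non-increasing in tau, so bisection applies.
    # If even tau = max(sample_eigs) cannot restore the total, the loop
    # converges to that bound and every estimate equals noise_var.
    lo, hi = 0.0, max(sample_eigs)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(shrunk(mid)) > total:
            lo = mid
        else:
            hi = mid
    return shrunk(hi)


# One large eigenvalue and several below the assumed noise floor of 1.0.
estimates = soft_threshold_eigenvalues([10.0, 1.0, 0.5, 0.1], noise_var=1.0)
```

In this example, flooring alone would raise the total from 11.6 to 13.0. The sketch compensates by shrinking the large eigenvalue by about 1.4, which keeps the total near 11.6 while the small eigenvalues sit at the floor.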

Keywords:  Clustering; Covariance Estimation; High Dimension; Invariance Principles; Unsupervised Learning

Year:  2015        PMID: 26755893      PMCID: PMC4706235          DOI: 10.1080/10618600.2014.948179

Source DB:  PubMed          Journal:  J Comput Graph Stat        ISSN: 1061-8600            Impact factor:   2.302


  12 in total

1.  Statistical significance for hierarchical clustering.

Authors:  Patrick K Kimes; Yufeng Liu; David Neil Hayes; James Stephen Marron
Journal:  Biometrics       Date:  2017-01-18       Impact factor: 2.571

2.  Molecular profiling predicts meningioma recurrence and reveals loss of DREAM complex repression in aggressive tumors.

Authors:  Akash J Patel; Ying-Wooi Wan; Rami Al-Ouran; Jean-Pierre Revelli; Maria F Cardenas; Mazen Oneissi; Liu Xi; Ali Jalali; John F Magnotti; Donna M Muzny; HarshaVardhan Doddapaneni; Sherly Sebastian; Kent A Heck; J Clay Goodman; Shankar P Gopinath; Zhandong Liu; Ganesh Rao; Sharon E Plon; Daniel Yoshor; David A Wheeler; Huda Y Zoghbi; Tiemo J Klisch
Journal:  Proc Natl Acad Sci U S A       Date:  2019-10-07       Impact factor: 11.205

3.  MultiK: an automated tool to determine optimal cluster numbers in single-cell RNA sequencing data.

Authors:  Siyao Liu; Aatish Thennavan; Joseph P Garay; J S Marron; Charles M Perou
Journal:  Genome Biol       Date:  2021-08-19       Impact factor: 13.583

4.  Phenotypes of osteoarthritis: current state and future implications. (Review)

Authors:  Leticia A Deveza; Amanda E Nelson; Richard F Loeser
Journal:  Clin Exp Rheumatol       Date:  2019-10-15       Impact factor: 4.473

5.  Integrative Analysis Identifies Four Molecular and Clinical Subsets in Uveal Melanoma.

Authors:  A Gordon Robertson; Juliann Shih; Christina Yau; Ewan A Gibb; Junna Oba; Karen L Mungall; Julian M Hess; Vladislav Uzunangelov; Vonn Walter; Ludmila Danilova; Tara M Lichtenberg; Melanie Kucherlapati; Patrick K Kimes; Ming Tang; Alexander Penson; Ozgun Babur; Rehan Akbani; Christopher A Bristow; Katherine A Hoadley; Lisa Iype; Matthew T Chang; Andrew D Cherniack; Christopher Benz; Gordon B Mills; Roel G W Verhaak; Klaus G Griewank; Ina Felau; Jean C Zenklusen; Jeffrey E Gershenwald; Lynn Schoenfield; Alexander J Lazar; Mohamed H Abdel-Rahman; Sergio Roman-Roman; Marc-Henri Stern; Colleen M Cebulla; Michelle D Williams; Martine J Jager; Sarah E Coupland; Bita Esmaeli; Cyriac Kandoth; Scott E Woodman
Journal:  Cancer Cell       Date:  2017-08-14       Impact factor: 31.743

6.  Multi-omics Analysis of Microenvironment Characteristics and Immune Escape Mechanisms of Hepatocellular Carcinoma.

Authors:  Wenli Li; Huimei Wang; Zhanzhong Ma; Jian Zhang; Wen Ou-Yang; Yan Qi; Jun Liu
Journal:  Front Oncol       Date:  2019-10-15       Impact factor: 6.244

7.  Evidence for Multiple Subpopulations of Herpesvirus-Latently Infected Cells.

Authors:  Justin T Landis; Ryan Tuck; Yue Pan; Carson N Mosso; Anthony B Eason; Razia Moorad; J Stephen Marron; Dirk P Dittmer
Journal:  mBio       Date:  2022-01-04       Impact factor: 7.867

8.  Dysregulated BMP2 in the Placenta May Contribute to Early-Onset Preeclampsia by Regulating Human Trophoblast Expression of Extracellular Matrix and Adhesion Molecules.

Authors:  Yuyin Yi; Hua Zhu; Christian Klausen; Hsun-Ming Chang; Amy M Inkster; Jefferson Terry; Peter C K Leung
Journal:  Front Cell Dev Biol       Date:  2021-12-14

9.  Gene co-expression modules as clinically relevant hallmarks of breast cancer diversity.

Authors:  Denise M Wolf; Marc E Lenburg; Christina Yau; Aaron Boudreau; Laura J van 't Veer
Journal:  PLoS One       Date:  2014-02-07       Impact factor: 3.240

10.  diceR: an R package for class discovery using an ensemble driven approach.

Authors:  Derek S Chiu; Aline Talhouk
Journal:  BMC Bioinformatics       Date:  2018-01-15       Impact factor: 3.169
