
MODEL ASSISTED VARIABLE CLUSTERING: MINIMAX-OPTIMAL RECOVERY AND ALGORITHMS.

Florentina Bunea, Christophe Giraud, Xi Luo, Martin Royer, Nicolas Verzelen.

Abstract

The problem of variable clustering is that of estimating groups of similar components of a p-dimensional vector X = (X_1, …, X_p) from n independent copies of X. Many algorithms return data-dependent groups of variables, but the interpretation of these groups is limited to the algorithm that produced them. An alternative is model-based clustering, in which one begins by defining population-level clusters relative to a model that embeds notions of similarity. Algorithms tailored to such models yield estimated clusters with a clear statistical interpretation. We take this view here and introduce the class of G-block covariance models as a background model for variable clustering. In such models, two variables in a cluster are deemed similar if they have similar associations with all other variables. This can arise, for instance, when groups of variables are noise-corrupted versions of the same latent factor. We quantify the difficulty of clustering data generated from a G-block covariance model in terms of cluster proximity, measured with respect to two related, but different, cluster separation metrics. We derive minimax cluster separation thresholds, which are the metric values below which no algorithm can recover the model-defined clusters exactly, and show that they are different for the two metrics. We therefore develop two algorithms, COD and PECOK, tailored to G-block covariance models, and study their minimax-optimality with respect to each metric. Of independent interest is the fact that the analysis of the PECOK algorithm, which is based on a corrected convex relaxation of the popular K-means algorithm, provides the first statistical analysis of such algorithms for variable clustering. Additionally, we compare our methods with another popular clustering method, spectral clustering. Extensive simulation studies, as well as our data analyses, confirm the applicability of our approach.
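The notion of similarity described in the abstract (two variables belong together when they have similar associations with all other variables) can be illustrated with a small sketch. The code below is not the authors' COD or PECOK algorithm, whose details are not given in this record. It assumes one plausible formalization: the dissimilarity of variables a and b is the largest absolute difference between their covariances with any third variable. The function names `cod_dissimilarity` and `threshold_clusters`, the threshold `alpha`, and the toy covariance values are all hypothetical choices made for illustration.

```python
import numpy as np

def cod_dissimilarity(S):
    """Assumed dissimilarity: D[a, b] = max over c not in {a, b} of |S[a, c] - S[b, c]|."""
    p = S.shape[0]
    D = np.zeros((p, p))
    for a in range(p):
        for b in range(a + 1, p):
            mask = np.ones(p, dtype=bool)
            mask[[a, b]] = False  # exclude a and b themselves
            D[a, b] = D[b, a] = np.max(np.abs(S[a, mask] - S[b, mask]))
    return D

def threshold_clusters(D, alpha):
    """Greedy grouping: each unlabeled seed collects unlabeled variables with D <= alpha."""
    p = D.shape[0]
    labels = -np.ones(p, dtype=int)
    k = 0
    for a in range(p):
        if labels[a] >= 0:
            continue
        members = (labels < 0) & (D[a] <= alpha)
        members[a] = True
        labels[members] = k
        k += 1
    return labels

# Toy population covariance: each variable is a latent factor plus independent noise.
labels_true = np.array([0, 0, 0, 1, 1, 1])
A = np.eye(2)[labels_true]               # p x K cluster membership matrix
C = np.array([[1.0, 0.2], [0.2, 1.0]])   # covariance of the two latent factors (assumed values)
Sigma = A @ C @ A.T + 0.5 * np.eye(6)    # block structure plus noise variance on the diagonal

D = cod_dissimilarity(Sigma)
labels_hat = threshold_clusters(D, alpha=0.1)
print(labels_hat)
```

In this toy example, variables in the same block have identical covariances with every third variable, so their dissimilarity is zero, while variables in different blocks differ by the gap between within-block and between-block covariance. With data, `Sigma` would be replaced by a sample covariance matrix and `alpha` would need to account for estimation noise.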


Keywords:  Convergence rates; convex optimization; covariance matrices; high-dimensional inference

MSC classification:  Primary 62H30; secondary 62C20

Year:  2020        PMID: 35847529      PMCID: PMC9286061          DOI: 10.1214/18-aos1794

Source DB:  PubMed          Journal:  Ann Stat        ISSN: 0090-5364            Impact factor:   4.904


