
The Cluster Elastic Net for High-Dimensional Regression With Unknown Variable Grouping.

Daniela M Witten, Ali Shojaie, Fan Zhang.   

Abstract

In the high-dimensional regression setting, the elastic net produces a parsimonious model by shrinking all coefficients towards the origin. However, in certain settings, this behavior might not be desirable: if some features are highly correlated with each other and associated with the response, then we might wish to perform less shrinkage on the coefficients corresponding to that subset of features. We propose the cluster elastic net, which selectively shrinks the coefficients for such variables towards each other, rather than towards the origin. Instead of assuming that the clusters are known a priori, the cluster elastic net infers clusters of features from the data, on the basis of correlation among the variables as well as association with the response. These clusters are then used in order to more accurately perform regression. We demonstrate the theoretical advantages of our proposed approach, and explore its performance in a simulation study, and in an application to HIV drug resistance data. Supplementary Materials are available online.
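The idea of shrinking coefficients "towards each other, rather than towards the origin" can be illustrated with a small sketch. The abstract does not state the objective function, so everything below is an assumption made for illustration: the clusters are taken as fixed and known (whereas the cluster elastic net infers them from the data), and the penalty is one plausible form, a lasso term plus a within-cluster term that penalizes differences between the fitted contributions of features in the same cluster. The function names, the tuning parameters `delta` and `lam`, and the toy data are all hypothetical.

```python
from typing import List, Sequence


def l1_penalty(beta: Sequence[float]) -> float:
    """Lasso-style term: shrinks every coefficient towards the origin."""
    return sum(abs(b) for b in beta)


def fusion_penalty(
    columns: Sequence[Sequence[float]],
    beta: Sequence[float],
    clusters: Sequence[Sequence[int]],
) -> float:
    """Assumed within-cluster term: for each pair of features (i, j) in the
    same cluster, add ||x_i * beta_i - x_j * beta_j||^2 divided by the
    cluster size. It is zero when clustered features contribute identically
    to the fit, so it pulls their contributions towards each other."""
    total = 0.0
    for cluster in clusters:
        for a in range(len(cluster)):
            for b in range(a + 1, len(cluster)):
                i, j = cluster[a], cluster[b]
                sq_dist = sum(
                    (xi * beta[i] - xj * beta[j]) ** 2
                    for xi, xj in zip(columns[i], columns[j])
                )
                total += sq_dist / len(cluster)
    return total


def penalized_loss(
    columns: Sequence[Sequence[float]],
    y: Sequence[float],
    beta: Sequence[float],
    clusters: Sequence[Sequence[int]],
    delta: float,
    lam: float,
) -> float:
    """Half the residual sum of squares plus the two penalty terms."""
    fitted: List[float] = [
        sum(columns[j][r] * beta[j] for j in range(len(beta)))
        for r in range(len(y))
    ]
    rss = 0.5 * sum((yr - fr) ** 2 for yr, fr in zip(y, fitted))
    return (
        rss
        + delta * l1_penalty(beta)
        + lam * fusion_penalty(columns, beta, clusters)
    )


# Toy data (hypothetical): features 0 and 1 are identical, feature 2 is not.
columns = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
y = [1.0, 2.0, 3.0]
clusters = [[0, 1], [2]]

beta_shared = [0.5, 0.5, 0.0]  # weight split evenly across the cluster
beta_single = [1.0, 0.0, 0.0]  # all weight on one feature

loss_shared = penalized_loss(columns, y, beta_shared, clusters, delta=0.1, lam=1.0)
loss_single = penalized_loss(columns, y, beta_single, clusters, delta=0.1, lam=1.0)
```

Both coefficient vectors fit `y` exactly and have the same L1 norm, so a lasso penalty alone cannot distinguish them. Under the assumed within-cluster term, the evenly split vector incurs no extra penalty while the single-feature vector does, which is the sense in which such a penalty shrinks correlated coefficients towards each other.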

Keywords:  correlated variables; feature clustering; feature selection; lasso; p ≫ n; ridge; structured sparsity

Year:  2014        PMID: 24817772      PMCID: PMC4011669          DOI: 10.1080/00401706.2013.810174

Source DB:  PubMed          Journal:  Technometrics        ISSN: 0040-1706


