Daniela M Witten, Ali Shojaie, Fan Zhang.
Abstract
In the high-dimensional regression setting, the elastic net produces a parsimonious model by shrinking all coefficients towards the origin. However, in certain settings this behavior might not be desirable: if some features are highly correlated with each other and associated with the response, then we might wish to perform less shrinkage on the coefficients corresponding to that subset of features. We propose the cluster elastic net, which selectively shrinks the coefficients for such variables towards each other, rather than towards the origin. Instead of assuming that the clusters are known a priori, the cluster elastic net infers clusters of features from the data, on the basis of correlation among the variables as well as association with the response. These clusters are then used to perform regression more accurately. We demonstrate the theoretical advantages of our proposed approach and explore its performance in a simulation study and in an application to HIV drug resistance data. Supplementary materials are available online.
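To make the contrast between shrinking "towards the origin" and shrinking "towards each other" concrete, the sketch below evaluates two penalties on a coefficient vector. It is a minimal illustration under stated assumptions, not the cluster elastic net objective from the paper: the elastic net is written as `lam1 * ||beta||_1 + lam2 * ||beta||_2^2` (one common parameterization), the clusters are supplied by hand rather than inferred from the data, and the within-cluster term (sum of squared deviations from each cluster's mean) is a hypothetical stand-in for a penalty that pulls coefficients in a cluster towards each other. The function names are illustrative only.

```python
from typing import List, Sequence


def elastic_net_penalty(beta: Sequence[float], lam1: float, lam2: float) -> float:
    """Elastic net penalty lam1 * ||beta||_1 + lam2 * ||beta||_2^2.

    Both terms are minimized at beta = 0, so every coefficient is
    pulled towards the origin.
    """
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return lam1 * l1 + lam2 * l2


def within_cluster_penalty(beta: Sequence[float], clusters: List[List[int]]) -> float:
    """Sum over clusters of squared deviations from the cluster mean.

    Hypothetical illustration: the penalty is zero whenever all coefficients
    in a cluster are equal, whatever their common value, so it pulls them
    towards each other rather than towards zero. Clusters are given by the
    caller here (an assumption); the paper infers them from the data.
    """
    total = 0.0
    for members in clusters:
        mean = sum(beta[j] for j in members) / len(members)
        total += sum((beta[j] - mean) ** 2 for j in members)
    return total


# Features 0-2 form one correlated cluster with a shared effect; feature 3 is noise.
beta = [2.0, 2.0, 2.0, 0.0]
clusters = [[0, 1, 2], [3]]

print(elastic_net_penalty(beta, lam1=1.0, lam2=0.5))  # 1.0*6.0 + 0.5*12.0 = 12.0
print(within_cluster_penalty(beta, clusters))          # 0.0
```

With a shared effect of 2.0 across the three clustered features, the elastic net penalty is 12.0 and grows with that common value, whereas the within-cluster penalty is zero; only unequal coefficients inside a cluster are penalized.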
Keywords: correlated variables; feature clustering; feature selection; lasso; p ≫ n; ridge; structured sparsity
Year: 2014 PMID: 24817772 PMCID: PMC4011669 DOI: 10.1080/00401706.2013.810174
Source DB: PubMed Journal: Technometrics ISSN: 0040-1706