Stephen Reid, Robert Tibshirani.
Abstract
We propose a new approach for sparse regression and marginal testing, for data with correlated features. Our procedure first clusters the features, and then chooses as the cluster prototype the most informative feature in that cluster. Then we apply either sparse regression (lasso) or marginal significance testing to these prototypes. While this kind of strategy is not entirely new, a key feature of our proposal is its use of the post-selection inference theory of Taylor and others (2014, Exact post-selection inference for forward stepwise and least angle regression, Preprint, arXiv:1401.3889) and Lee and others (2014, Exact post-selection inference with the lasso, Preprint, arXiv:1311.6238v5) to compute exact p-values and confidence intervals that properly account for the selection of prototypes. We also apply the recent "knockoff" idea of Barber and Candès (2014, Controlling the false discovery rate via knockoffs, Preprint, arXiv:1404.5609) to provide exact finite sample control of the FDR of our regression procedure. We illustrate our proposals on both real and simulated data.
Keywords: Clustering; Correlated predictors; Knockoff; Lasso; Marginal screening; Post-selection inference
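The prototype-selection step described in the abstract can be illustrated with a minimal sketch. It assumes that "most informative" means the largest absolute marginal correlation with the response, and that cluster labels are already available from some clustering step; both are assumptions made for illustration, as are the function names and the toy data. The sketch does not implement the lasso fit, the post-selection inference, or the knockoff procedure.

```python
import math
import random


def abs_corr(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / math.sqrt(sxx * syy)


def select_prototypes(columns, y, labels):
    """For each cluster label, return the index of the member feature
    with the largest absolute marginal correlation with y (assumed criterion)."""
    best = {}
    for j, (col, c) in enumerate(zip(columns, labels)):
        score = abs_corr(col, y)
        if c not in best or score > best[c][1]:
            best[c] = (j, score)
    return {c: j for c, (j, _) in best.items()}


# Hypothetical toy data: two clusters of three correlated features each.
random.seed(0)
n = 200
z1 = [random.gauss(0, 1) for _ in range(n)]
z2 = [random.gauss(0, 1) for _ in range(n)]
columns = [[z + 0.3 * random.gauss(0, 1) for z in z1] for _ in range(3)]
columns += [[z + 0.3 * random.gauss(0, 1) for z in z2] for _ in range(3)]
labels = [0, 0, 0, 1, 1, 1]  # assumed output of a prior clustering step
y = [2.0 * v + random.gauss(0, 1) for v in columns[1]]

prototypes = select_prototypes(columns, y, labels)
print(prototypes)  # one feature index per cluster
```

Because the prototypes are chosen using the response, naive p-values computed on the selected features would not account for that selection, which is the gap the post-selection inference theory cited in the abstract is used to close.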
Year: 2015 PMID: 26614384 PMCID: PMC5006118 DOI: 10.1093/biostatistics/kxv049
Source DB: PubMed Journal: Biostatistics ISSN: 1465-4644 Impact factor: 5.899