Sparse regression and marginal testing using cluster prototypes.

Stephen Reid, Robert Tibshirani.

Abstract

We propose a new approach for sparse regression and marginal testing for data with correlated features. Our procedure first clusters the features and then chooses as the cluster prototype the most informative feature in that cluster. We then apply either sparse regression (lasso) or marginal significance testing to these prototypes. While this kind of strategy is not entirely new, a key feature of our proposal is its use of the post-selection inference theory of Taylor and others (2014, Exact post-selection inference for forward stepwise and least angle regression, Preprint, arXiv:1401.3889) and Lee and others (2014, Exact post-selection inference with the lasso, Preprint, arXiv:1311.6238v5) to compute exact p-values and confidence intervals that properly account for the selection of prototypes. We also apply the recent "knockoff" idea of Barber and Candès (2014, Controlling the false discovery rate via knockoffs, Preprint, arXiv:1404.5609) to provide exact finite-sample control of the FDR of our regression procedure. We illustrate our proposals on both real and simulated data.
© The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
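The prototype-selection step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the clusters are taken as given cluster labels (the abstract does not specify the clustering algorithm), and "most informative" is assumed to mean the feature with the largest absolute marginal correlation with the response. The function name `cluster_prototypes` is hypothetical. The sketch does not compute the exact selection-adjusted p-values or the knockoff filter; naive tests applied to the selected prototypes would ignore the selection step, which is exactly what the paper's post-selection inference is meant to correct.

```python
import numpy as np


def cluster_prototypes(X, y, labels):
    """Return {cluster label: column index of that cluster's prototype}.

    Assumption (for illustration): the prototype is the feature whose
    centered, unit-norm column has the largest absolute inner product
    with the centered response, i.e. the largest |marginal correlation|.
    Assumes no constant (zero-variance) columns.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    labels = np.asarray(labels)

    # Center and scale each column so scores are comparable across features.
    Xc = X - X.mean(axis=0)
    Xc = Xc / np.linalg.norm(Xc, axis=0)
    yc = y - y.mean()

    # Absolute marginal association of every feature with the response.
    scores = np.abs(Xc.T @ yc)

    prototypes = {}
    for k in np.unique(labels):
        members = np.flatnonzero(labels == k)  # column indices in cluster k
        best = members[np.argmax(scores[members])]
        prototypes[k.item()] = int(best)
    return prototypes


# Usage: two clusters of correlated features; y depends on feature 1.
rng = np.random.default_rng(0)
n = 200
z1, z2 = rng.standard_normal(n), rng.standard_normal(n)
X = np.column_stack([
    z1 + 0.5 * rng.standard_normal(n),
    z1 + 0.5 * rng.standard_normal(n),
    z2 + 0.5 * rng.standard_normal(n),
    z2 + 0.5 * rng.standard_normal(n),
])
y = 3 * X[:, 1] + rng.standard_normal(n)
proto = cluster_prototypes(X, y, [0, 0, 1, 1])
print(proto)
```

The returned prototype columns would then be passed to a lasso fit or to marginal tests. Reducing each correlated cluster to a single representative is what keeps the lasso from splitting weight arbitrarily among near-duplicate features, while the selection of that representative is the step that the exact post-selection theory cited above accounts for.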

Keywords:  Clustering; Correlated predictors; Knockoff; Lasso; Marginal screening; Post-selection inference

Year:  2015        PMID: 26614384      PMCID: PMC5006118          DOI: 10.1093/biostatistics/kxv049

Source DB:  PubMed          Journal:  Biostatistics        ISSN: 1465-4644            Impact factor:   5.899


  6 in total

1.  Statistical Approaches to Address Multi-Pollutant Mixtures and Multiple Exposures: the State of the Science. (Review)

Authors:  Massimo Stafoggia; Susanne Breitner; Regina Hampel; Xavier Basagaña
Journal:  Curr Environ Health Rep       Date:  2017-12

2.  Prediction of risk-associated genes and high-risk liver cancer patients from their mutation profile: benchmarking of mutation calling techniques.

Authors:  Sumeet Patiyal; Anjali Dhall; Gajendra P S Raghava
Journal:  Biol Methods Protoc       Date:  2022-05-27

3.  Bayesian Hyper-LASSO Classification for Feature Selection with Application to Endometrial Cancer RNA-seq Data.

Authors:  Lai Jiang; Celia M T Greenwood; Weixin Yao; Longhai Li
Journal:  Sci Rep       Date:  2020-06-16       Impact factor: 4.379

4.  Selection-adjusted inference: an application to confidence intervals for cis-eQTL effect sizes.

Authors:  Snigdha Panigrahi; Junjie Zhu; Chiara Sabatti
Journal:  Biostatistics       Date:  2021-01-28       Impact factor: 5.899

5.  Output-Related and -Unrelated Fault Monitoring with an Improvement Prototype Knockoff Filter and Feature Selection Based on Laplacian Eigen Maps and Sparse Regression.

Authors:  Cuiping Xue; Tie Zhang; Dong Xiao
Journal:  ACS Omega       Date:  2021-04-19

6.  An analytic approach for interpretable predictive models in high-dimensional data in the presence of interactions with exposures.

Authors:  Sahir Rai Bhatnagar; Yi Yang; Budhachandra Khundrakpam; Alan C Evans; Mathieu Blanchette; Luigi Bouchard; Celia M T Greenwood
Journal:  Genet Epidemiol       Date:  2018-02-08       Impact factor: 2.135
