
A SIGNIFICANCE TEST FOR THE LASSO.

Richard Lockhart, Jonathan Taylor, Ryan J Tibshirani, Robert Tibshirani.

Abstract

In the sparse linear regression setting, we consider testing the significance of the predictor variable that enters the current lasso model, in the sequence of models visited along the lasso solution path. We propose a simple test statistic based on lasso fitted values, called the covariance test statistic, and show that when the true model is linear, this statistic has an Exp(1) asymptotic distribution under the null hypothesis (the null being that all truly active variables are contained in the current lasso model). Our proof of this result for the special case of the first predictor to enter the model (i.e., testing for a single significant predictor variable against the global null) requires only weak assumptions on the predictor matrix X. On the other hand, our proof for a general step in the lasso path places further technical assumptions on X and the generative model, but still allows for the important high-dimensional case p > n, and does not necessarily require that the current lasso model achieves perfect recovery of the truly active variables. Of course, for testing the significance of an additional variable between two nested linear models, one typically uses the chi-squared test, comparing the drop in residual sum of squares (RSS) to a χ²_1 distribution. But when this additional variable is not fixed, and has been chosen adaptively or greedily, this test is no longer appropriate: adaptivity makes the drop in RSS stochastically much larger than χ²_1 under the null hypothesis. Our analysis explicitly accounts for adaptivity, as it must, since the lasso builds an adaptive sequence of linear models as the tuning parameter λ decreases. In this analysis, shrinkage plays a key role: though additional variables are chosen adaptively, the coefficients of lasso active variables are shrunken due to the ℓ_1 penalty. Therefore, the test statistic (which is based on lasso fitted values) is in a sense balanced by these two opposing properties, adaptivity and shrinkage, and its null distribution is tractable and asymptotically Exp(1).
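The Exp(1) limit described in the abstract can be checked numerically in the simplest setting. With an orthonormal design and unit noise variance, the covariance test statistic at the first lasso step reduces to T_1 = λ_1(λ_1 - λ_2), where λ_1 ≥ λ_2 are the two largest values of |x_j^T y|; under the global null these are the top two order statistics of p iid |N(0,1)| draws. The following Monte Carlo sketch is illustrative only (the function name and simulation sizes are our own, not from the paper):

```python
# Monte Carlo check of the Exp(1) limit for the covariance test statistic
# at the FIRST lasso step, in the orthonormal-design case with sigma = 1.
# Under the global null, the inner products x_j^T y are iid N(0, 1), so the
# first two knots on the lasso path are the top two order statistics of
# |N(0, 1)|, and T_1 = lambda_1 * (lambda_1 - lambda_2).
import random

def covariance_stat_first_step(p, rng):
    # Absolute inner products |x_j^T y| under the global null, sorted.
    z = sorted(abs(rng.gauss(0.0, 1.0)) for _ in range(p))
    lam1, lam2 = z[-1], z[-2]          # first two knots on the lasso path
    return lam1 * (lam1 - lam2)        # covariance statistic, first step

rng = random.Random(0)
p, reps = 200, 3000
stats = [covariance_stat_first_step(p, rng) for _ in range(reps)]
mean_T = sum(stats) / reps                            # Exp(1) mean is 1
frac_below_1 = sum(t <= 1.0 for t in stats) / reps    # Exp(1): P(T <= 1) = 1 - 1/e ~ 0.632
```

Note the contrast with the naive chi-squared comparison: the drop in RSS at the first step is λ_1², the maximum of p chi-squared variables, which is stochastically far larger than a single χ²_1 draw; the subtraction of λ_2 is what restores a tractable null distribution.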

Keywords:  Lasso; least angle regression; p-value; significance test

Year:  2014        PMID: 25574062      PMCID: PMC4285373          DOI: 10.1214/13-AOS1175

Source DB:  PubMed          Journal:  Ann Stat        ISSN: 0090-5364            Impact factor:   4.028


References:  5 in total

1.  Variance estimation using refitted cross-validation in ultrahigh dimensional regression.

Authors:  Jianqing Fan; Shaojun Guo; Ning Hao
Journal:  J R Stat Soc Series B Stat Methodol       Date:  2012-01-01       Impact factor: 4.488

2.  Regularization Paths for Generalized Linear Models via Coordinate Descent.

Authors:  Jerome Friedman; Trevor Hastie; Rob Tibshirani
Journal:  J Stat Softw       Date:  2010       Impact factor: 6.440

3.  A Perturbation Method for Inference on Regularized Regression Estimates.

Authors:  Jessica Minnier; Lu Tian; Tianxi Cai
Journal:  J Am Stat Assoc       Date:  2012-01-24       Impact factor: 5.033

4.  HIGH DIMENSIONAL VARIABLE SELECTION.

Authors:  Larry Wasserman; Kathryn Roeder
Journal:  Ann Stat       Date:  2009-01-01       Impact factor: 4.028

5.  Human immunodeficiency virus reverse transcriptase and protease sequence database.

Authors:  Soo-Yon Rhee; Matthew J Gonzales; Rami Kantor; Bradley J Betts; Jaideep Ravela; Robert W Shafer
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

Cited by:  117 in total

1.  POWERFUL TEST BASED ON CONDITIONAL EFFECTS FOR GENOME-WIDE SCREENING.

Authors:  Yaowu Liu; Jun Xie
Journal:  Ann Appl Stat       Date:  2018-03-09       Impact factor: 2.083

2.  High Dimensional EM Algorithm: Statistical Optimization and Asymptotic Normality.

Authors:  Zhaoran Wang; Quanquan Gu; Yang Ning; Han Liu
Journal:  Adv Neural Inf Process Syst       Date:  2015

3.  Too many covariates and too few cases? - a comparative study.

Authors:  Qingxia Chen; Hui Nian; Yuwei Zhu; H Keipp Talbot; Marie R Griffin; Frank E Harrell
Journal:  Stat Med       Date:  2016-06-30       Impact factor: 2.373

4.  Cross-validation and hypothesis testing in neuroimaging: An irenic comment on the exchange between Friston and Lindquist et al.

Authors:  Philip T Reiss
Journal:  Neuroimage       Date:  2015-04-25       Impact factor: 6.556

5.  Statistical learning and selective inference.

Authors:  Jonathan Taylor; Robert J Tibshirani
Journal:  Proc Natl Acad Sci U S A       Date:  2015-06-23       Impact factor: 11.205

6.  Collaborative regression.

Authors:  Samuel M Gross; Robert Tibshirani
Journal:  Biostatistics       Date:  2014-11-17       Impact factor: 5.899

7.  (Review) Statistical learning approaches in the genetic epidemiology of complex diseases.

Authors:  Anne-Laure Boulesteix; Marvin N Wright; Sabine Hoffmann; Inke R König
Journal:  Hum Genet       Date:  2019-05-02       Impact factor: 4.132

8.  Disentangling the effects of farmland use, habitat edges, and vegetation structure on ground beetle morphological traits.

Authors:  Katherina Ng; Philip S Barton; Wade Blanchard; Maldwyn J Evans; David B Lindenmayer; Sarina Macfadyen; Sue McIntyre; Don A Driscoll
Journal:  Oecologia       Date:  2018-06-06       Impact factor: 3.225

9.  Prefrontal cortical activation during working memory task anticipation contributes to discrimination between bipolar and unipolar depression.

Authors:  Anna Manelis; Satish Iyengar; Holly A Swartz; Mary L Phillips
Journal:  Neuropsychopharmacology       Date:  2020-02-18       Impact factor: 7.853

10.  Graphical Models via Univariate Exponential Family Distributions.

Authors:  Eunho Yang; Pradeep Ravikumar; Genevera I Allen; Zhandong Liu
Journal:  J Mach Learn Res       Date:  2015-12       Impact factor: 3.654

