Mariusz Kubkowski, Jan Mielniczuk.
Abstract
We consider selection of random predictors for a high-dimensional regression problem with a binary response for a general loss function. An important special case is when the binary model is semi-parametric and the response function is misspecified under a parametric model fit. When the true response coincides with a postulated parametric response for a certain value of parameter, we obtain a common framework for parametric inference. Both cases of correct specification and misspecification are covered in this contribution. Variable selection for such a scenario aims at recovering the support of the minimizer of the associated risk with large probability. We propose a two-step selection Screening-Selection (SS) procedure which consists of screening and ordering predictors by Lasso method and then selecting the subset of predictors which minimizes the Generalized Information Criterion for the corresponding nested family of models. We prove consistency of the proposed selection method under conditions that allow for a much larger number of predictors than the number of observations. For the semi-parametric case when distribution of random predictors satisfies linear regressions condition, the true and the estimated parameters are collinear and their common support can be consistently identified. This partly explains robustness of selection procedures to the response function misspecification.Entities:
Keywords: consistent selection; generalized information criterion; high-dimensional regression; loss function; misspecification; random predictors; robustness; subgaussianity
Year: 2020 PMID: 33285928 PMCID: PMC7516565 DOI: 10.3390/e22020153
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
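The two-step Screening-Selection (SS) procedure described in the abstract can be sketched in Python. This is a minimal illustration, not the authors' implementation: it assumes the logistic loss, uses scikit-learn's L1-penalized logistic regression for the Lasso screening step, and takes an illustrative GIC penalty a_n = log(p)·log(n); the paper's penalty sequence and regularity conditions differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def screening_selection(X, y, a_n=None):
    """Sketch of the two-step Screening-Selection (SS) procedure:
    (1) screen and order predictors by the magnitude of L1-penalized
        (Lasso) logistic-regression coefficients;
    (2) over the induced nested family of models, select the subset S
        minimizing GIC(S) = -2 * loglik(S) + a_n * |S|.
    """
    n, p = X.shape
    if a_n is None:
        a_n = np.log(p) * np.log(n)  # illustrative penalty, not the paper's choice

    # Step 1: Lasso screening -- order predictors by |coefficient|.
    lasso = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
    lasso.fit(X, y)
    order = np.argsort(-np.abs(lasso.coef_.ravel()))

    # Step 2: scan the nested family and minimize GIC.
    k_max = min(p, n // 2)  # cap the size of the nested family
    best_gic, best_support = np.inf, np.array([], dtype=int)
    for k in range(1, k_max + 1):
        S = order[:k]
        # Large C approximates an unpenalized maximum-likelihood fit.
        m = LogisticRegression(C=1e6, solver="lbfgs", max_iter=1000)
        m.fit(X[:, S], y)
        prob = m.predict_proba(X[:, S])[:, 1]
        eps = 1e-12  # guard against log(0)
        loglik = np.sum(y * np.log(prob + eps) + (1 - y) * np.log(1 - prob + eps))
        gic = -2.0 * loglik + a_n * k
        if gic < best_gic:
            best_gic, best_support = gic, S
    return np.sort(best_support)
```

On data generated from a sparse logistic model with strong signal, the returned support typically coincides with the true one, illustrating the consistency result proved in the paper.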
Figures 1–4: results for Models M1 and M2.