Yingying Fan, Emre Demirkaya, Jinchi Lv.
Abstract
Evaluating the joint significance of covariates is of fundamental importance in a wide range of applications. To this end, p-values are frequently employed and produced by algorithms that are powered by classical large-sample asymptotic theory. It is well known that the conventional p-values in the Gaussian linear model are valid even when the dimensionality is a non-vanishing fraction of the sample size, but can break down when the design matrix becomes singular in higher dimensions or when the error distribution deviates from Gaussianity. A natural question is when the conventional p-values in generalized linear models become invalid in diverging dimensions. We establish that such a breakdown can occur early in nonlinear models. Our theoretical characterizations are confirmed by simulation studies.
Keywords: Nonuniformity; breakdown point; generalized linear model; high dimensionality; joint significance testing; p-value
Year: 2019; PMID: 32190012; PMCID: PMC7079742
Source DB: PubMed; Journal: J Mach Learn Res; ISSN: 1532-4435; Impact factor: 5.177
Figure 1: Results of KS and AD tests for testing the uniformity of GLM p-values in simulation example 1 for diverging-dimensional logistic regression model with uniform orthonormal design under global null. The vertical axis represents the p-value from the KS and AD tests, and the horizontal axis stands for the growth rate α0 of dimensionality p = [n^{α0}].
Figure 2: Results of KS and AD tests for testing the uniformity of GLM p-values in simulation example 2 for diverging-dimensional logistic regression model with correlated Gaussian design under global null for varying correlation level ρ. The vertical axis represents the p-value from the KS and AD tests, and the horizontal axis stands for the growth rate α0 of dimensionality p = [n^{α0}].
Figure 3: Results of KS test for testing the uniformity of GLM p-values in simulation example 3 for diverging-dimensional logistic regression model with uncorrelated Gaussian design under global null for varying sparsity s. The vertical axis represents the p-value from the KS test, and the horizontal axis stands for the growth rate α0 of dimensionality p = [n^{α0}].
Figure 4: Histograms of null p-values in simulation example 1 from the first simulation repetition for different growth rates α0 of dimensionality p = [n^{α0}].
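The uniformity check behind these figures can be sketched in a few lines. This is a minimal illustration, not the authors' code. It assumes the dimensionality grows as p = [n^{α0}] with [·] read as the integer part, and it covers only a one-sample KS test against Uniform(0, 1) using the asymptotic Kolmogorov tail probability; the GLM fit and the AD test are omitted. The function names `dimension` and `ks_uniform` are hypothetical, and the demo p-values are synthetic stand-ins.

```python
import math
import random


def dimension(n, alpha0):
    # Assumption: p = [n^alpha0], with [.] read as the integer part.
    # The small offset guards against floating-point error at exact powers.
    return math.floor(n ** alpha0 + 1e-9)


def ks_uniform(pvalues):
    """One-sample KS test of H0: p-values ~ Uniform(0, 1).

    Returns (D, approximate p-value from the asymptotic Kolmogorov law).
    """
    xs = sorted(pvalues)
    m = len(xs)
    # D = sup |F_m(x) - x|, attained at or just before an order statistic.
    d = max(max((i + 1) / m - x, x - i / m) for i, x in enumerate(xs))
    t = math.sqrt(m) * d
    if t < 0.2:
        # The tail probability equals 1 to machine precision in this range,
        # and the alternating series below converges slowly for tiny t.
        return d, 1.0
    tail = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * t * t)
                     for k in range(1, 101))
    return d, min(1.0, max(0.0, tail))


if __name__ == "__main__":
    rng = random.Random(0)
    uniform_p = [rng.random() for _ in range(1000)]
    # Synthetic stand-in for anti-conservative p-values (too many small ones).
    skewed_p = [u ** 1.5 for u in uniform_p]
    print("p for n = 1000, alpha0 = 0.5:", dimension(1000, 0.5))
    print("uniform sample:", ks_uniform(uniform_p))
    print("skewed sample: ", ks_uniform(skewed_p))
```

A small KS p-value signals that the null p-values deviate from uniformity, which is how the vertical axes in the figures are meant to be read. For small samples an exact KS distribution would be preferable to the asymptotic approximation used here.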
Means and standard deviations (SD) for estimated probabilities of making a type I error in simulation example 1, with α0 the growth rate of dimensionality p = [n^{α0}]. Two significance levels, a = 0.05 and a = 0.1, are considered.
| a | Statistic | α0 = 0.10 | α0 = 0.47 | α0 = 0.57 | α0 = 0.67 | α0 = 0.77 | α0 = 0.87 |
|---|---|---|---|---|---|---|---|
| 0.05 | Mean | 0.050 | 0.052 | 0.055 | 0.063 | 0.082 | 0.166 |
| 0.05 | SD | 0.006 | 0.007 | 0.007 | 0.007 | 0.001 | 0.011 |
| 0.1 | Mean | 0.098 | 0.104 | 0.107 | 0.118 | 0.144 | 0.247 |
| 0.1 | SD | 0.008 | 0.010 | 0.009 | 0.011 | 0.012 | 0.013 |
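One plausible reading of the table is that, within each simulation repetition, the type I error probability is estimated as the fraction of null p-values at or below the level a, and the mean and SD are then taken across repetitions. The sketch below illustrates that bookkeeping only. The aggregation scheme is an assumption (this page does not spell it out), the function names are hypothetical, and the demo uses placeholder uniform p-values rather than the paper's data.

```python
import random
import statistics


def type_one_error_rate(null_pvalues, a):
    """Fraction of null p-values rejected at significance level a."""
    return sum(p <= a for p in null_pvalues) / len(null_pvalues)


def summarize(repetitions, a):
    """Mean and sample SD of the per-repetition rejection rates."""
    rates = [type_one_error_rate(pv, a) for pv in repetitions]
    return statistics.mean(rates), statistics.stdev(rates)


if __name__ == "__main__":
    rng = random.Random(1)
    # Placeholder data: 100 repetitions of 1000 exactly uniform p-values,
    # i.e. what valid p-values would look like under the null.
    reps = [[rng.random() for _ in range(1000)] for _ in range(100)]
    for a in (0.05, 0.1):
        mean, sd = summarize(reps, a)
        print(f"a = {a}: mean = {mean:.3f}, SD = {sd:.3f}")
```

With valid p-values the mean rejection rate should sit close to a. The inflated means in the table at larger α0 (for example 0.166 at α0 = 0.87 for a = 0.05) are what a breakdown of the conventional p-values looks like under this measure.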