Abstract
Testing the significance of a set of covariates is of fundamental interest in statistics. For example, in genome-wide association studies, a joint null hypothesis of no genetic effect is tested for a set of multiple genetic variants. The minimum p-value method, higher criticism, and Berk-Jones tests are particularly effective when the covariates with nonzero effects are sparse. However, correlations among the covariates and a non-Gaussian distribution of the response make it very challenging to calculate p-values for these three tests. In practice, permutation is commonly used to obtain accurate p-values, but it is computationally very intensive, especially when a large number of hypothesis tests must be conducted. In this paper, we propose a Gaussian approximation method based on a Monte Carlo scheme, which is computationally more efficient than permutation while still achieving similar accuracy. We derive non-asymptotic approximation error bounds that can vanish in the limit even if the number of covariates is much larger than the sample size. Through real-genotype-based simulations and data analysis of a genome-wide association study of Crohn's disease, we compare the accuracy and computational cost of our proposed method, of permutation, and of the method based on the asymptotic distribution.
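The general idea of a Monte Carlo Gaussian approximation can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not specify the details of their scheme. The sketch assumes that marginal z-scores for the covariates and an estimate of their correlation matrix are already available, and that the null distribution of the z-scores is approximated by a multivariate normal with that correlation. The function names (`minp_stat`, `higher_criticism_stat`, `gaussian_mc_pvalue`) and the parameters `n_draws` and `seed` are hypothetical, and the higher criticism statistic is written in a simplified form that maximizes over all ordered p-values.

```python
import numpy as np
from scipy.stats import norm


def minp_stat(z):
    """Minimum p-value statistic: the smallest two-sided p-value
    corresponds to the largest |z|, so we return max |z| per row."""
    return np.max(np.abs(z), axis=-1)


def higher_criticism_stat(z):
    """Simplified higher criticism statistic (assumed form): the maximum
    standardized gap between k/d and the k-th smallest p-value."""
    z = np.atleast_2d(z)
    d = z.shape[-1]
    p = np.sort(2.0 * norm.sf(np.abs(z)), axis=-1)
    p = np.clip(p, 1e-15, 1.0 - 1e-15)  # avoid division by zero
    k = np.arange(1, d + 1)
    hc = np.sqrt(d) * (k / d - p) / np.sqrt(p * (1.0 - p))
    return hc.max(axis=-1)


def gaussian_mc_pvalue(z_obs, sigma, stat, n_draws=10000, seed=0):
    """Approximate the p-value of stat(z_obs) by drawing from N(0, sigma)
    instead of permuting the data (illustrative assumption)."""
    rng = np.random.default_rng(seed)
    z_obs = np.asarray(z_obs, dtype=float)
    draws = rng.multivariate_normal(np.zeros(z_obs.size), sigma, size=n_draws)
    null_stats = stat(draws)              # one statistic per simulated draw
    obs = float(np.squeeze(stat(z_obs)))  # observed statistic as a scalar
    # Add-one correction keeps the estimated p-value strictly positive.
    return (1 + np.sum(null_stats >= obs)) / (1 + n_draws)


# Toy usage: 20 covariates with AR(1)-type correlation, one strong signal.
d = 20
idx = np.arange(d)
sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])
z_signal = np.zeros(d)
z_signal[3] = 6.0
p_minp = gaussian_mc_pvalue(z_signal, sigma, minp_stat, n_draws=2000)
p_hc = gaussian_mc_pvalue(z_signal, sigma, higher_criticism_stat, n_draws=2000)
p_null = gaussian_mc_pvalue(np.zeros(d), sigma, minp_stat, n_draws=2000)
```

Each Gaussian draw costs only a matrix-vector operation, whereas a permutation requires recomputing every marginal test statistic from the raw data, which is the source of the computational saving the abstract describes. In this sketch the correlation among covariates enters only through `sigma`.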
Keywords: Berk-Jones test; Genome-wide association study; High dimensionality; Higher criticism; Monte Carlo method
Year: 2018; PMID: 31130762; PMCID: PMC6530914; DOI: 10.1080/01621459.2017.1407776
Source DB: PubMed; Journal: J Am Stat Assoc; ISSN: 0162-1459; Impact factor: 5.033