Christoph Lippert, Jing Xiang, Danilo Horta, Christian Widmer, Carl Kadie, David Heckerman, Jennifer Listgarten.
Abstract
MOTIVATION: Set-based variance component tests have been identified as a way to increase power in association studies by aggregating weak individual effects. However, the choice of test statistic has been largely ignored, even though it may play an important role in obtaining optimal power. We compared a standard statistical test, a score test, with a recently developed likelihood ratio (LR) test. Further, when correction for hidden structure is needed, or gene-gene interactions are sought, state-of-the-art algorithms for both the score and LR tests can be computationally impractical. Thus we develop new computationally efficient methods.
Year: 2014 PMID: 25075117 PMCID: PMC4221116 DOI: 10.1093/bioinformatics/btu504
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Number of significant associations on real datasets
Number of significant associations on several real datasets using a Bonferroni-corrected threshold α = 0.05, when no background variance component is applied (1K), and when a background variance component computed from all SNPs is used to correct for potential confounding factors (2K). Grayed boxes denote cases where the score test found more associations than the LR test.
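The Bonferroni correction named in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the example p-values are hypothetical, and the number of tests is simply taken as the number of p-values supplied.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def count_significant(pvalues, alpha=0.05):
    """Count p-values that pass the Bonferroni-corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(pvalues))
    return sum(1 for p in pvalues if p <= threshold)


# Hypothetical p-values for five variant sets (illustrative only, not from the paper).
example_pvalues = [2e-9, 4e-3, 0.6, 9e-3, 0.02]
n_hits = count_significant(example_pvalues)  # per-test threshold is 0.05 / 5
```

Counting associations this way for each method on each dataset gives the kind of tally the caption describes.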
Type I error for common variant sets
| Algorithm | α = 10⁻⁵ | α = 10⁻⁴ | α = 10⁻³ |
|---|---|---|---|
| Gaussian phenotype | | | |
| Linear score | 1.3 × 10⁻⁵ | 1.1 × 10⁻⁴ | 1.0 × 10⁻³ |
| Linear LR test | 1.4 × 10⁻⁵ | 1.1 × 10⁻⁴ | 1.0 × 10⁻³ |
| Binary phenotype | | | |
| Linear score | 7.0 × 10⁻⁶ | 1.1 × 10⁻⁴ | 9.7 × 10⁻⁴ |
| Linear LR test | 1.0 × 10⁻⁵ | 1.1 × 10⁻⁴ | 1.0 × 10⁻³ |
| Logistic score | 7.0 × 10⁻⁶ | 1.1 × 10⁻⁴ | 9.7 × 10⁻⁴ |
No statistically significant deviations from expectation according to a binomial test at a significance level of 0.05.
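The binomial check mentioned in this note can be sketched as follows. This is a minimal, standard-library illustration, not the authors' procedure. It assumes the common two-sided convention that sums the probabilities of all outcomes no more likely than the observed count, and the number of null tests (100 000) and the false-positive counts are hypothetical, since the record does not state them.

```python
import math


def binom_logpmf(k, n, p):
    """Log of the binomial probability of k successes in n trials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def binom_two_sided_pvalue(k_obs, n, p):
    """Exact two-sided p-value: total probability of outcomes that are
    no more likely than the observed count k_obs."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    log_obs = binom_logpmf(k_obs, n, p)
    tol = 1e-7  # small slack so ties are not lost to rounding
    total = 0.0
    for k in range(n + 1):
        log_pk = binom_logpmf(k, n, p)
        if log_pk <= log_obs + tol:
            total += math.exp(log_pk)
    return min(1.0, total)


# Hypothetical setting: 100 000 null tests evaluated at threshold alpha = 1e-3,
# so about 100 false positives are expected if type I error is controlled.
n_null_tests = 100_000
alpha = 1e-3
p_calibrated = binom_two_sided_pvalue(100, n_null_tests, alpha)  # as expected
p_inflated = binom_two_sided_pvalue(200, n_null_tests, alpha)    # twice as many
```

A p-value above 0.05 from such a check is what "no statistically significant deviation from expectation" means for a given table cell.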
Type I error for rare variant sets
| Algorithm | α = 10⁻⁵ | α = 10⁻⁴ | α = 10⁻³ |
|---|---|---|---|
| Gaussian phenotype | | | |
| Linear score | 9.9 × 10⁻⁶ | 1.0 × 10⁻⁴ | 9.7 × 10⁻⁴ |
| Linear LR test | 6.9 × 10⁻⁶ | 1.1 × 10⁻⁴ | 1.0 × 10⁻³ |
| Binary phenotype | | | |
| Linear score | 1.4 × 10⁻⁵ | 9.6 × 10⁻⁵ | 9.7 × 10⁻⁵ |
| Linear LR test | 1.6 × 10⁻⁵ | 1.0 × 10⁻⁴ | 9.8 × 10⁻⁴ |
| Logistic score | 1.4 × 10⁻⁵ | 9.9 × 10⁻⁵ | 1.0 × 10⁻³ |
No statistically significant deviations from expectation according to a binomial test at a significance level of 0.05.
Fig. 1. Power on synthetic data for each method in each setting, at the lowest signal strength. The vertical axis shows the fraction of tests deemed significant by each method; the horizontal axis shows the significance threshold. Other signal strengths are shown in Figure 2 and Supplementary Figures S1 and S2.
Fig. 2. Power on synthetic data for each method in each setting, at another signal strength. The vertical axis shows the fraction of tests deemed significant by each method; the horizontal axis shows the significance threshold. Other signal strengths are shown in Figure 1 and Supplementary Figures S1 and S2.
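The quantity plotted in these figures, the fraction of tests deemed significant at each threshold, can be sketched as follows. The function name, p-values and thresholds are hypothetical and only illustrate the computation; they are not taken from the paper.

```python
def fraction_significant(pvalues, thresholds):
    """For each threshold, return the fraction of p-values at or below it."""
    n = len(pvalues)
    if n == 0:
        raise ValueError("at least one p-value is required")
    return [sum(1 for p in pvalues if p <= t) / n for t in thresholds]


# Hypothetical p-values from tests on data simulated with a true signal.
example_pvalues = [1e-6, 3e-4, 2e-3, 0.04, 0.3, 0.7]
thresholds = [1e-5, 1e-3, 1e-2, 5e-2]  # horizontal axis of a power curve
power_curve = fraction_significant(example_pvalues, thresholds)  # vertical axis
```

Applied to p-values from data simulated without any signal, the same function estimates type I error rather than power.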
Runtimes and time complexity for the 13,500 WTCCC dataset
| Algorithm | Time | Time complexity |
|---|---|---|
| One variance component model | | |
| SKAT | 0.03 s | |
| FaST-LMM-Set score | 0.03 s | |
| FaST-LMM-Set LR test | 0.04 s | |
| Two variance component model, full rank background kernel | | |
| FaST-LMM-Set score | 2 s | |
| FaST-LMM-Set LR test | 1.6 h | |
| LMM-Set LR test (before improvement) | 150 h | |
| Two variance component model, low rank background kernel | | |
| FaST-LMM-Set score | See text | |
| LMM-Set score (before improvement) | See text | |
| FaST-LMM-Set LR test | See text | |
Runtimes on a single core and time complexities for various linear set tests, both without a background kernel (one variance component model) and with one (two variance component model), after applying our improvements, with exceptions noted. The time reported is the time per test, averaged over 13 850 tests from the WTCCC1 type 1 diabetes dataset. Runtimes and complexities for the two variance component, full rank cases exclude the O(N³) computations that are shared across all tests and done upfront (2 s, when amortized over the 13 850 tests). The logistic score model had approximately the same timing as the linear score, so only the linear score is reported here. For the LR test, the time includes the 10 permutations that are required. In the time-complexity notation, separate symbols denote the size of the background component and the size of the test component.
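To put the per-test times for the full rank, two variance component LR test in perspective, the arithmetic below scales them to all 13 850 tests. It uses only the figures in the table and caption (1.6 h and 150 h per test, single core). The function and variable names are hypothetical, and the totals ignore the shared upfront computations, as the caption does.

```python
N_TESTS = 13850            # number of tests, from the caption
LR_IMPROVED_HOURS = 1.6    # LR test per test, after improvement (from the table)
LR_BEFORE_HOURS = 150.0    # LR test per test, before improvement (from the table)


def total_hours(per_test_hours, n_tests):
    """Single-core time to run every test, excluding shared upfront work."""
    return per_test_hours * n_tests


def speedup(before, after):
    """How many times faster the improved method is per test."""
    return before / after


lr_speedup = speedup(LR_BEFORE_HOURS, LR_IMPROVED_HOURS)  # roughly 94-fold
lr_total_improved = total_hours(LR_IMPROVED_HOURS, N_TESTS)
lr_total_before = total_hours(LR_BEFORE_HOURS, N_TESTS)
```

Because these are single-core totals, the wall-clock time in practice would depend on how many tests are run in parallel.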