Kelsey E Grinde, Jaron Arbet, Alden Green, Michael O'Connell, Alessandra Valcarcel, Jason Westra, Nathan Tintle.
Abstract
To date, gene-based rare variant testing approaches have focused on aggregating information across sets of variants to maximize statistical power in identifying genes showing significant association with diseases. Beyond identifying genes that are associated with diseases, the identification of causal variant(s) in those genes and estimation of their effect is crucial for planning replication studies and characterizing the genetic architecture of the locus. However, we illustrate that straightforward single-marker association statistics can suffer from substantial bias introduced by conditioning on gene-based test significance, due to the phenomenon often referred to as "winner's curse." We illustrate the ramifications of this bias on variant effect size estimation and variant prioritization/ranking approaches, outline parameters of genetic architecture that affect this bias, and propose a bootstrap resampling method to correct for this bias. We find that our correction method significantly reduces the bias due to winner's curse (average two-fold decrease in bias, p < 2.2 × 10−6) and, consequently, substantially improves mean squared error and variant prioritization/ranking. The method is particularly helpful in adjustment for winner's curse effects when the initial gene-based test has low power and for relatively more common, non-causal variants. Adjustment for winner's curse is recommended for all post-hoc estimation and ranking of variants after a gene-based test. Further work is necessary to continue seeking ways to reduce bias and improve inference in post-hoc analysis of gene-based tests under a wide variety of genetic architectures.
Keywords: SKAT; burden test; case-control; next-generation sequencing; winner's curse
Year: 2017 PMID: 28959274 PMCID: PMC5603735 DOI: 10.3389/fgene.2017.00117
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Flow chart depicting the bootstrap resampling approach. A high-level overview of the steps involved in the proposed bootstrap resampling approach to adjust for winner's curse in post-hoc single variant analyses.
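The individual steps of the flow chart are not reproduced in this text, so the following is only a minimal sketch of one generic bootstrap scheme for winner's curse adjustment, not necessarily the authors' procedure. It assumes a simple z-test on per-subject minor allele counts as the Step 1 gene-based test, the case-control difference in minor allele frequency as the Step 2 statistic, and an out-of-bag comparison to estimate the selection bias. All function names (`maf_difference`, `burden_significant`, `bootstrap_adjust`) are hypothetical.

```python
import math
import numpy as np


def maf_difference(genotypes, is_case):
    """Per-variant minor allele frequency in cases minus controls."""
    return genotypes[is_case].mean(axis=0) / 2.0 - genotypes[~is_case].mean(axis=0) / 2.0


def burden_significant(genotypes, is_case, alpha):
    """Assumed Step 1 test: two-sample z-test on per-subject minor allele counts."""
    burden = genotypes.sum(axis=1).astype(float)
    cases, controls = burden[is_case], burden[~is_case]
    se = math.sqrt(cases.var(ddof=1) / cases.size + controls.var(ddof=1) / controls.size)
    if se == 0.0:
        return False  # no variation in the burden score, nothing to test
    z = (cases.mean() - controls.mean()) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return p_value <= alpha


def _usable(labels):
    """Require at least two cases and two controls."""
    n_cases = int(labels.sum())
    return 2 <= n_cases <= labels.size - 2


def bootstrap_adjust(genotypes, is_case, alpha=0.01, n_boot=200, seed=0):
    """Return (naive, adjusted) per-variant MAF differences.

    The bias is estimated from bootstrap samples in which the Step 1 test is
    significant, as the mean gap between the in-sample statistic and the
    statistic computed on the subjects left out of that bootstrap sample.
    """
    genotypes = np.asarray(genotypes)
    is_case = np.asarray(is_case, dtype=bool)
    rng = np.random.default_rng(seed)
    n = genotypes.shape[0]
    naive = maf_difference(genotypes, is_case)
    gaps = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)           # bootstrap (in-sample) subjects
        oob = np.setdiff1d(np.arange(n), idx)      # out-of-bag subjects
        if not (_usable(is_case[idx]) and _usable(is_case[oob])):
            continue
        if not burden_significant(genotypes[idx], is_case[idx], alpha):
            continue                               # mimic conditioning on Step 1
        gaps.append(maf_difference(genotypes[idx], is_case[idx])
                    - maf_difference(genotypes[oob], is_case[oob]))
    if not gaps:
        return naive, naive.copy()                 # no significant resamples
    return naive, naive - np.mean(gaps, axis=0)
```

In this sketch `genotypes` is a subjects-by-variants array of minor allele counts (0, 1, or 2) and `is_case` is a boolean vector. The choice of Step 1 test, the number of resamples, and the out-of-bag bias estimate would all need to be matched to the method in the paper before any real use.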
Overall improvement in bias and MSE of bias-adjusted statistics across 10,000 replications of the 40 alternative hypothesis simulation settings.
| Step 1/Step 2 pair | Significance level (α) | Measure | Proportion improved | Median improvement | Median decline | Ratio |
|---|---|---|---|---|---|---|
| First | 1 | Bias | 0.94 | 1.63 × 10−04 | 3.14 × 10−05 | 5.20 |
| | | MSE | 0.90 | 7.32 × 10−08 | 1.34 × 10−08 | 5.47 |
| | 0.01 | Bias | 0.97 | 2.10 × 10−04 | 9.48 × 10−05 | 2.22 |
| | | MSE | 0.97 | 2.04 × 10−07 | 6.04 × 10−08 | 3.37 |
| Second | 1 | Bias | 0.64 | 2.37 × 10−07 | 2.06 × 10−08 | 11.51 |
| | | MSE | 0.77 | 1.11 × 10−12 | 1.68 × 10−14 | 65.95 |
| | 0.01 | Bias | 0.69 | 4.85 × 10−07 | 2.81 × 10−08 | 17.29 |
| | | MSE | 0.81 | 2.85 × 10−12 | 1.92 × 10−14 | 148.22 |
Results shown for two pairs of Step 1 and Step 2 test statistics: Q.
Bias and MSE: bias is computed as the average difference between the estimated single-marker post-hoc statistic and its expected value. MSE is computed as the average squared difference between the estimated single-marker post-hoc statistic and its expected value.
Proportion improved: the proportion of variants for which bias (or MSE, depending on the row of the table) decreased after implementing our bootstrap bias-correction strategy.
Median improvement: the median change in bias (or MSE) among variants that show an improvement after adjustment (i.e., bias (or MSE) is smaller after adjustment).
Median decline: the median change in bias (or MSE) among variants that show a decline after adjustment (i.e., bias (or MSE) is larger after adjustment).
Ratio: the median improvement divided by the median decline.
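The summary quantities defined in these notes can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and it assumes that "bias decreased" means the absolute value of the bias got smaller, which the text does not state explicitly.

```python
import numpy as np


def bias_and_mse(estimates, expected):
    """Per-variant bias and MSE.

    estimates: array of shape (replications, variants)
    expected:  array of shape (variants,) with the expected value per variant
    """
    errors = np.asarray(estimates, dtype=float) - np.asarray(expected, dtype=float)
    return errors.mean(axis=0), (errors ** 2).mean(axis=0)


def summarize_improvement(before, after):
    """Summaries over variants for one measure (bias or MSE).

    Assumption: magnitudes are compared, so a bias moving from -0.4 to 0.1
    counts as an improvement.
    """
    change = np.abs(np.asarray(before, dtype=float)) - np.abs(np.asarray(after, dtype=float))
    improved = change > 0
    declined = change < 0
    median_improvement = np.median(change[improved]) if improved.any() else float("nan")
    median_decline = np.median(-change[declined]) if declined.any() else float("nan")
    return {
        "proportion_improved": float(improved.mean()),
        "median_improvement": float(median_improvement),
        "median_decline": float(median_decline),
        "ratio": float(median_improvement / median_decline),
    }
```

When no variant declines, the median decline and the ratio come out as NaN in this sketch; that case is not addressed in these notes.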
Figure 2. Distribution of the estimated difference in minor allele frequencies before and after application of the post-hoc adjustment strategy. Panel (A) displays histograms of naive (light red) and bias-corrected (blue) differences in minor allele frequencies for each SNP in a single simulated gene. The dark pink color is the overlap of the two histograms. The black dashed line represents the expected difference in minor allele frequencies. Results are shown for post-hoc analyses conducted after a burden test with α = 0.01 at Step 1. The SNPs in the top row (1–5) are risk-increasing with a relative risk of 8, and the SNPs in the bottom row (6–10) are neutral. All SNPs are very rare (MAF = 0.0001), with the exception of SNP 6, which has a minor allele frequency of 0.01. The Step 1 power of this gene was 11.7% across 10,000 replications. Panel (B) displays the relative improvement in bias and mean squared error of the bias-corrected post-hoc statistic for each SNP, where relative improvement is computed as the estimated bias (MSE) before adjustment divided by the estimated bias (MSE) after adjustment. Panel (C) illustrates the results for SNP 6. In particular, we show results for the post-hoc difference in minor allele frequencies after conducting Q (top left) and Q (top right) tests with α = 0.01 at Step 1, as well as results for the post-hoc squared difference in minor allele frequencies after conducting Q (bottom left) and Q (bottom right) tests with α = 0.01 at Step 1.
Improvement in bias and MSE of bias-adjusted statistics stratified by the power of the Step 1 test.
| Step 1/Step 2 pair | Significance level (α) | Measure | Step 1 power | Proportion improved | Median improvement | Median decline | Ratio |
|---|---|---|---|---|---|---|---|
| First | 1 | Bias | 0–0.05 | 0.90 | 1.77 × 10−04 | 3.24 × 10−05 | 5.45 |
| | | | 0.05–0.2 | 1.00 | 1.76 × 10−04 | – | – |
| | | | 0.2–0.5 | 1.00 | 1.52 × 10−04 | – | – |
| | | | 0.5–1 | 0.85 | 1.51 × 10−04 | 1.67 × 10−05 | 9.04 |
| | | MSE | 0–0.05 | 0.99 | 1.89 × 10−07 | 1.07 × 10−09 | 177 |
| | | | 0.05–0.2 | 0.96 | 6.97 × 10−08 | 1.13 × 10−08 | 6.16 |
| | | | 0.2–0.5 | 0.90 | 3.79 × 10−08 | 1.28 × 10−08 | 2.96 |
| | | | 0.5–1 | 0.60 | 4.66 × 10−08 | 3.52 × 10−08 | 1.32 |
| | 0.01 | Bias | 0–0.05 | 0.96 | 2.09 × 10−04 | 9.48 × 10−05 | 2.21 |
| | | | 0.05–0.2 | 1.00 | 2.67 × 10−04 | – | – |
| | | | 0.2–0.5 | 1.00 | 1.83 × 10−04 | – | – |
| | | | 0.5–1 | – | – | – | – |
| | | MSE | 0–0.05 | 0.96 | 2.25 × 10−07 | 6.04 × 10−08 | 3.72 |
| | | | 0.05–0.2 | 1.00 | 2.15 × 10−07 | – | – |
| | | | 0.2–0.5 | 1.00 | 5.10 × 10−08 | – | – |
| | | | 0.5–1 | – | – | – | – |
| Second | 1 | Bias | 0–0.05 | 0.68 | 1.49 × 10−07 | 1.73 × 10−08 | 8.60 |
| | | | 0.05–0.2 | 0.62 | 4.75 × 10−07 | 2.15 × 10−08 | 22.1 |
| | | | 0.2–0.5 | 0.55 | 2.43 × 10−07 | 2.06 × 10−08 | 11.8 |
| | | | 0.5–1 | – | – | – | – |
| | | MSE | 0–0.05 | 0.74 | 9.76 × 10−13 | 1.37 × 10−14 | 71.4 |
| | | | 0.05–0.2 | 0.81 | 2.45 × 10−12 | 1.87 × 10−14 | 131 |
| | | | 0.2–0.5 | 0.77 | 1.52 × 10−12 | 1.85 × 10−14 | 82.4 |
| | | | 0.5–1 | – | – | – | – |
| | 0.01 | Bias | 0–0.05 | 0.69 | 4.85 × 10−07 | 2.81 × 10−08 | 17.3 |
| | | | 0.05–0.2 | – | – | – | – |
| | | | 0.2–0.5 | – | – | – | – |
| | | | 0.5–1 | – | – | – | – |
| | | MSE | 0–0.05 | 0.81 | 2.85 × 10−12 | 1.92 × 10−14 | 148 |
| | | | 0.05–0.2 | – | – | – | – |
| | | | 0.2–0.5 | – | – | – | – |
| | | | 0.5–1 | – | – | – | – |
Results shown for two pairs of Step 1 and Step 2 test statistics: Q.
Bias and MSE: bias is computed as the average difference between the estimated single-marker post-hoc statistic and its expected value. MSE is computed as the average squared difference between the estimated single-marker post-hoc statistic and its expected value.
Proportion improved: the proportion of variants for which bias (or MSE, depending on the row of the table) decreased after implementing our bootstrap bias-correction strategy. Set to – if no Step 1 tests had power in that range.
Median improvement: the median change in bias (or MSE) among variants that show an improvement after adjustment (i.e., bias (or MSE) is smaller after adjustment). Set to – if no tests had power in that range.
Median decline: the median change in bias (or MSE) among variants that show a decline after adjustment (i.e., bias (or MSE) is larger after adjustment). Set to – if no tests had power in that range or no variants showed a decline (i.e., the proportion of improved variants is 1).
Ratio: the median improvement divided by the median decline. Set to – if either of those two columns is –.
Figure 3. Relationship between bias, minor allele frequency, and GBT power. These scatterplots show the bias of unadjusted (A) and adjusted (B) single-marker post-hoc statistics vs. the power of the GBT in which that variant is contained. All combinations of Step 1 test, Step 2 single-marker statistic, and significance level are shown for all SNPs and all simulation settings, so a total of 4,000 (=2*2*2*10*50) points are in each scatterplot. The points are colored by the MAF of the variant, with lighter colors corresponding to larger MAF.
Figure 4. Percent of times that the top-ranked SNP is causal before and after adjustment. After conducting a GBT (Step 1) which yields a significant result, all SNPs in the gene are then ranked by one of the two Step 2 single-marker statistics, both before and after adjustment. The figure shows the percent of times the top-ranked SNP is causal when ranking is based on the adjusted statistic vs. the unadjusted statistic. Points are colored by whether or not the adjusted statistic provides "better" ranking results, where a "better" ranking result is one in which the top-ranked SNP is causal a higher proportion of the time (across the 10,000 simulation replications). The dashed line is y = x, so that points falling above the line are settings where the adjusted statistics are better (the top-ranked SNP is more likely to be causal after adjustment), and points falling below the line are settings where the adjusted statistics perform worse (the top-ranked SNP is less likely to be causal after adjustment). This figure depicts all 40 non-null simulations and all four combinations of the two Step 1 tests and the two Step 2 statistics.
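The quantity plotted on each axis of this figure can be illustrated with a short sketch. It assumes SNPs are ranked by the magnitude of the Step 2 statistic and that the input contains only replications in which the Step 1 test was significant. The function name `top_ranked_causal_rate` is hypothetical.

```python
import numpy as np


def top_ranked_causal_rate(statistics, is_causal):
    """Fraction of replications in which the top-ranked SNP is causal.

    statistics: array of shape (replications, snps), one Step 2 statistic per
                SNP per replication (assumed already restricted to
                replications with a significant Step 1 test)
    is_causal:  boolean array of shape (snps,)
    """
    statistics = np.asarray(statistics, dtype=float)
    is_causal = np.asarray(is_causal, dtype=bool)
    # Assumption: rank by absolute value; ties go to the lowest SNP index.
    top = np.argmax(np.abs(statistics), axis=1)
    return float(is_causal[top].mean())
```

Calling the function once on the unadjusted statistics and once on the adjusted statistics for the same simulation setting would give the x and y coordinates of one point, under the assumptions above.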