Amy V Spencer, Angela Cox, Wei-Yu Lin, Douglas F Easton, Kyriaki Michailidou, Kevin Walters.
Abstract
Bayes factors (BFs) are becoming increasingly important tools in genetic association studies, partly because they provide a natural framework for including prior information. The Wakefield BF (WBF) approximation is easy to calculate and assumes a normal prior on the log odds ratio (logOR) with a mean of zero. However, the prior variance (W) must be specified. Because of the potentially high sensitivity of the WBF to the choice of W, we propose several new BF approximations with logOR ∼ N(0, W), but allow W to take a probability distribution rather than a fixed value. We provide several prior distributions for W that lead to BFs that can be calculated easily in freely available software packages. These priors allow a wide range of densities for W and provide considerable flexibility. We examine some properties of the priors and BFs and show how to determine the most appropriate prior based on elicited quantiles of the prior odds ratio (OR). We show by simulation that our novel BFs have superior true-positive rates at low false-positive rates compared to those from both P-value and WBF analyses across a range of sample sizes and ORs. We give an example of utilizing our BFs to fine-map the CASP8 region using genotype data on approximately 46,000 breast cancer case and 43,000 healthy control samples from the Collaborative Oncological Gene-environment Study (COGS) Consortium, and compare the single-nucleotide polymorphism ranks to those obtained using WBFs and P-values from univariate logistic regression.
Keywords: elicitation; empirical; expert knowledge; filtering; fine mapping; flexibility; hyperparameter; sensitivity; single-nucleotide polymorphism
Year: 2015 PMID: 25727067 PMCID: PMC4406822 DOI: 10.1002/gepi.21891
Source DB: PubMed Journal: Genet Epidemiol ISSN: 0741-0395 Impact factor: 2.135
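The WBF described in the abstract has a closed form once logOR ∼ N(0, W) is assumed. The following is a minimal sketch, assuming the usual normal approximation in which the logOR estimate has sampling variance V, and orienting the BF as alternative over null. The function name, argument names, and the summary statistics in the loop are illustrative and are not taken from the paper.

```python
import math


def wakefield_bf(beta_hat: float, V: float, W: float) -> float:
    """Approximate Bayes factor for association (H1 over H0).

    beta_hat: estimated logOR
    V:        sampling variance of beta_hat (squared standard error)
    W:        prior variance of the logOR under H1, logOR ~ N(0, W)

    Under H0 the estimate is N(0, V); under H1 it is N(0, V + W).
    The BF is the ratio of those two normal densities at beta_hat.
    """
    z_squared = beta_hat ** 2 / V
    shrinkage = W / (V + W)
    return math.sqrt(V / (V + W)) * math.exp(0.5 * z_squared * shrinkage)


# Hypothetical summary statistics: logOR estimate 0.05, standard error 0.012.
beta_hat, V = 0.05, 0.012 ** 2
for W in (0.001, 0.01, 0.04, 0.1):
    # The same data give different BFs as W changes, which is the
    # sensitivity to W that the abstract refers to.
    print(f"W = {W:<5}  BF = {wakefield_bf(beta_hat, V, W):.1f}")
```

Some presentations report the reciprocal (null over alternative); only the orientation differs.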
Prior densities for W
| Name of prior | Density (up to proportionality) | Restrictions on hyperparameters |
|---|---|---|
| Power | | |
| Exponential | | |
| Hybrid | | |
| Reciprocal | | |
Proportional density functions for each of the four prior forms (applies for ).
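When W follows a prior density rather than a fixed value, the BF is the fixed-W BF averaged over that density. The sketch below does the averaging numerically with a trapezoid rule. It is an illustration only: the density `hypothetical_density`, its scale, and the support limits are assumptions chosen for the example, not any of the four prior forms in the table, and the paper describes approximations that avoid this kind of brute-force integration.

```python
import math


def marginal_ratio(beta_hat: float, V: float, W: float) -> float:
    """BF (H1 over H0) for a fixed prior variance W, with logOR ~ N(0, W)."""
    exponent = 0.5 * beta_hat ** 2 * W / (V * (V + W))
    return math.sqrt(V / (V + W)) * math.exp(exponent)


def mixture_bf(beta_hat, V, density, lo, hi, n=2000):
    """Average the fixed-W BF over an unnormalised density for W on [lo, hi].

    Composite trapezoid rule; the density is normalised numerically, so it
    only needs to be known up to a constant of proportionality.
    """
    step = (hi - lo) / n
    numerator = 0.0
    normaliser = 0.0
    for i in range(n + 1):
        w = lo + i * step
        weight = 0.5 if i in (0, n) else 1.0  # trapezoid end-point weights
        p = density(w)
        numerator += weight * p * marginal_ratio(beta_hat, V, w)
        normaliser += weight * p
    return numerator / normaliser


def hypothetical_density(w: float) -> float:
    """Illustrative decreasing density for W (an assumption, not the paper's)."""
    return 1.0 / (1.0 + w / 0.01) ** 2


# Hypothetical summary statistics and support for W.
beta_hat, V = 0.05, 0.012 ** 2
print(mixture_bf(beta_hat, V, hypothetical_density, lo=0.001, hi=0.2))
```

Because the result is a weighted average of fixed-W BFs, it always lies between the smallest and largest fixed-W BF over the support.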
Figure 1. Densities of three families of tractable priors and one specific prior for . β1 is logOR with and a value of is used in all plots.
Prior distribution functions for W
| Type of prior | Distribution function | Limitations |
|---|---|---|
| Power | | |
| Exponential | | |
| Hybrid | | |
| Reciprocal | | |
Distribution functions for each of the four prior forms.
Figure 2. Prior densities of W (plots (a), (c), and (e)) and as a function of V (plots (b), (d), and (f)) for empirical forms of the prior (). Prior densities are given for minimum, median, and maximum values of V for SNPs with MAF > 0.005 in a sample size of 20,000. is given over a range of V likely to be seen in sample sizes of 2,000 or greater with different values of the hyperparameters, where relevant.
TPRs (× 1,000) from the simulated data analysis by sample size, OR, and FPR
| OR | Method (parameter values) | SS = 2,000, FPR 5% | SS = 2,000, FPR 10% | SS = 2,000, FPR 15% | SS = 2,000, FPR 20% | SS = 4,000, FPR 5% | SS = 4,000, FPR 10% | SS = 4,000, FPR 15% | SS = 4,000, FPR 20% | SS = 20,000, FPR 1% | SS = 20,000, FPR 5% | SS = 20,000, FPR 10% | SS = 20,000, FPR 15% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.10 | P-value | 72 | 118 | 163 | 259 | 151 | 249 | 320 | 384 | 202 | 497 | 677 | 786 |
| 1.10 | WBF | 48 | | | | 136 | | | | | | | |
| 1.10 | WBF | | | | 173 | | 221 | 242 | 259 | 173 | 433 | 530 | 582 |
| 1.10 | PPBF | | | | 245 | | | 318 | 346 | | | 676 | 758 |
| 1.10 | HPBF | | | | 248 | | | 308 | 334 | | | 655 | 707 |
| 1.14 | P-value | 146 | 247 | 331 | 395 | 232 | 357 | 457 | 531 | 508 | 831 | 924 | 962 |
| 1.14 | WBF | 106 | | | | 222 | | | | 502 | | | |
| 1.14 | WBF | | | 290 | 307 | | 335 | 373 | 387 | 503 | 789 | 875 | 899 |
| 1.14 | PPBF | | | | 372 | | | | 486 | | | | 956 |
| 1.14 | HPBF | | | | 363 | | | 450 | 470 | | | 919 | 943 |
| 1.18 | P-value | 178 | 286 | 382 | 454 | 323 | 475 | 551 | 619 | 753 | 950 | 984 | 994 |
| 1.18 | WBF | 134 | | | | 290 | | | | 730 | | | |
| 1.18 | WBF | | | 340 | 353 | | 452 | 478 | 495 | 751 | 947 | 980 | 988 |
| 1.18 | PPBF | | | | 439 | | | | 602 | | | | |
| 1.18 | HPBF | | | | 434 | | | | 582 | | | | |
True-positive rates (TPRs) multiplied by a thousand at the most relevant false-positive rates (FPRs) for different filtering methods (PPBF, HPBF, Wakefield Bayes factors, and P-values) applied to 1,000 simulated datasets with 2,871 SNPs. For PPBF and HPBF, the support is . The data were simulated using the LD structure of the CASP8 region for a scenario with a single causal SNP with an MAF of 0.08 for various sample sizes, odds ratios, and FPRs. Figures in bold are those that exceed the TPR obtained using P-values.
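One common way to obtain a TPR at a fixed FPR is sketched below: choose the score threshold that lets the stated proportion of non-causal SNPs through the filter, then count how often the causal SNP clears it. This is an assumed reading of the filtering comparison, not the authors' stated procedure; the function name and the toy scores are hypothetical. Any score where larger means stronger evidence works, for example a BF or a negated P-value.

```python
import math


def tpr_at_fpr(causal_scores, null_scores, fpr):
    """Fraction of causal scores above the threshold that retains a
    proportion `fpr` of the null (non-causal) scores.

    Higher scores are taken to mean stronger evidence of association.
    """
    ranked = sorted(null_scores, reverse=True)
    # Number of null scores allowed through the filter (small epsilon
    # guards against floating-point products such as 28.999999...).
    allowed = math.floor(fpr * len(ranked) + 1e-9)
    if allowed >= len(ranked):
        return 1.0  # every SNP passes, so every causal SNP passes too
    threshold = ranked[allowed]  # scores strictly above this are retained
    hits = sum(score > threshold for score in causal_scores)
    return hits / len(causal_scores)


# Toy scores: 100 non-causal SNPs and the causal SNP from four datasets.
null_scores = list(range(1, 101))
causal_scores = [96.5, 50, 120, 95]
print(1000 * tpr_at_fpr(causal_scores, null_scores, fpr=0.05))
```

Multiplying by 1,000, as in the last line, puts the result on the same scale as the table.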
Figure 3. ROC curves showing the results of WBF, PPBF, HPBF, and P-value filtering. (A) Shows the whole ROC space while (B) shows the ROC curve for false-positive rates below 20%. We use and for the WBF analysis. The prior for the PPBF analysis has and puts most of the mass close to . The prior for the HPBF analysis has and and puts most of the weight close to . All filtering was carried out on 1,000 datasets simulated using the LD structure of the CASP8 region for a scenario with a single causal SNP that has an OR of 1.14, an MAF of 0.08, and a total sample size of 4,000.
Results of analysis of iCOGS data
| SNP number | OR (95% CI) | MAF | PPBF value | Rank: P-value | Rank: WBF (0.003) | Rank: WBF (0.1) | Rank: PPBF |
|---|---|---|---|---|---|---|---|
| 980b | 1.048 (1.027, 1.071) | 0.294 | 1,387 | 1 | 1 | 1 | 1 |
| 1,027 | 1.046 (1.024, 1.068) | 0.285 | 664 | 2 | 2 | 2 | 2 |
| 992b | 1.045 (1.022, 1.067) | 0.287 | 334 | 3 | 3 | 3 | 3 |
| 909 | 1.043 (1.021, 1.065) | 0.287 | 234 | 9 | 4 | 6 | 4 |
| 878a | 1.081 (1.039, 1.125) | 0.061 | 228 | 12 | 13 | 4 | 5 |
| 1,272a | 1.075 (1.036, 1.116) | 0.071 | 217 | 14 | 11 | 5 | 6 |
| 950b | 1.043 (1.021, 1.065) | 0.286 | 217 | 10 | 5 | 7 | 7 |
| 838 | 1.041 (1.020, 1.062) | 0.338 | 213 | 5 | 6 | 10 | 8 |
| 960b | 1.043 (1.021, 1.065) | 0.285 | 213 | 7 | =7 | =8 | =9 |
| 961b | 1.043 (1.021, 1.065) | 0.285 | 213 | 8 | =7 | =8 | =9 |
| 985b | 1.043 (1.021, 1.066) | 0.286 | 206 | 4 | 9 | 11 | 11 |
| 837 | 1.042 (1.021, 1.064) | 0.299 | 200 | 6 | 10 | 13 | 12 |
| 907 | 1.042 (1.020, 1.064) | 0.287 | 167 | 11 | 12 | 16 | 13 |
| 896 | 1.042 (1.020, 1.064) | 0.287 | 166 | 13 | =14 | =17 | =14 |
| 912 | 1.042 (1.020, 1.064) | 0.287 | 166 | 15 | =14 | =17 | =14 |
| 956a, b | 1.052 (1.025, 1.080) | 0.170 | 159 | 16 | 16 | 15 | 16 |
| 681a | 1.074 (1.035, 1.116) | 0.069 | 149 | 17 | 19 | 14 | 17 |
| 1,004a, b | 1.051 (1.024, 1.078) | 0.173 | 124 | 18 | 18 | 20 | 18 |
| 885 | 1.041 (1.019, 1.063) | 0.287 | 119 | 19 | 17 | 23 | 19 |
| 955a, b | 1.050 (1.023, 1.078) | 0.173 | 112 | 21 | 21 | 21 | 20 |
a: For these SNPs, the major allele is associated with a higher disease risk.
b: These SNPs were not genotyped but imputed.
Top-ranked SNPs in the CASP8 region based on the power prior Bayes factor (PPBF) approximation with hyperparameter and . Rankings using P-value and Wakefield Bayes factor (WBF) are also included, as are the logistic regression estimate and 95% confidence interval (CI) of the odds ratio (OR) for each SNP. The genotype data for the CASP8 region come from the iCOGS study, with a total sample size of 89,050 and 1,733 SNPs.
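The OR and 95% CI columns are enough to recompute approximate WBFs for individual SNPs. The sketch below recovers the logOR and its variance from three rows of the table and ranks them, assuming that the header values 0.003 and 0.1 are prior variances W and that the CI is a symmetric Wald interval on the log scale. Both are assumptions, as are the function names. Because the CI limits are rounded to three decimals, the BF values are rough and only the ordering is of interest.

```python
import math


def log_or_and_variance(or_hat, ci_low, ci_high, z_crit=1.96):
    """Recover the logOR estimate and its variance V from an OR and 95% CI."""
    beta_hat = math.log(or_hat)
    std_err = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    return beta_hat, std_err ** 2


def wakefield_bf(beta_hat, V, W):
    """Approximate BF (H1 over H0) with logOR ~ N(0, W) under H1."""
    exponent = 0.5 * beta_hat ** 2 * W / (V * (V + W))
    return math.sqrt(V / (V + W)) * math.exp(exponent)


def rank_by_wbf(rows, W):
    """Return SNP labels ordered from largest to smallest WBF."""
    def score(row):
        beta_hat, V = log_or_and_variance(*row[1:])
        return wakefield_bf(beta_hat, V, W)
    return [row[0] for row in sorted(rows, key=score, reverse=True)]


# Three rows copied from the table: (SNP number, OR, CI lower, CI upper).
rows = [
    ("980", 1.048, 1.027, 1.071),
    ("878", 1.081, 1.039, 1.125),
    ("838", 1.041, 1.020, 1.062),
]
for W in (0.003, 0.1):  # assumed to be the two WBF prior variances
    print(W, rank_by_wbf(rows, W))
```

Comparing the two printed orderings shows how the choice of W alone can reorder a low-MAF SNP with a larger OR relative to common SNPs with smaller ORs.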
Results of subset analysis of iCOGS data
| SNP number | OR (95% CI) | MAF | PPBF value | Rank: P-value | Rank: WBF (0.003) | Rank: WBF (0.1) | Rank: PPBF |
|---|---|---|---|---|---|---|---|
| 822a | 1.514 (1.215, 1.886) | 0.037 | 45.1 | 1 | 28 | 1 | 1 |
| 807a | 1.520 (1.216, 1.900) | 0.036 | 42.2 | 2 | 35 | 2 | 2 |
| 820a | 1.515 (1.213, 1.893) | 0.036 | 40.0 | 3 | 37 | 3 | 3 |
| 824a | 1.514 (1.212, 1.891) | 0.036 | 39.7 | 4 | 39 | 4 | 4 |
| 868a | 1.508 (1.209, 1.881) | 0.038 | 38.5 | 5 | 38 | 5 | 5 |
| 378a | 1.431 (1.174, 1.745) | 0.046 | 33.3 | 7 | 16 | 6 | 6 |
| 858a | 1.495 (1.198, 1.866) | 0.036 | 30.6 | 6 | 47 | 7 | 7 |
| 379a | 1.409 (1.162, 1.709) | 0.047 | 29.2 | 8 | 15 | 8 | 8 |
| 854 | 1.470 (1.181, 1.829) | 0.036 | 24.0 | 9 | 56 | 9 | 9 |
| 346 | 1.262 (1.099, 1.449) | 0.093 | 23.7 | 22 | 2 | 27 | 10 |
| 845a | 1.469 (1.180, 1.829) | 0.037 | 23.6 | 11 | 57 | 10 | 11 |
| 879a | 1.480 (1.184, 1.851) | 0.037 | 23.2 | 10 | 64 | 11 | 12 |
| 823a | 1.480 (1.183, 1.851) | 0.036 | 22.9 | 12 | 65 | 12 | 13 |
| 339a | 1.266 (1.099, 1.459) | 0.091 | 21.7 | 28 | 3 | 37 | 14 |
| 705a | 1.439 (1.161, 1.761) | 0.043 | 20.5 | 15 | 53 | 15 | 15 |
| 752a | 1.449 (1.168, 1.798) | 0.039 | 20.2 | 14 | 61 | 14 | 16 |
| 900a | 1.475 (1.177, 1.849) | 0.036 | 19.7 | 13 | 89 | 13 | 17 |
| 698a | 1.454 (1.167, 1.812) | 0.036 | 18.3 | 16 | 79 | 16 | 18 |
| 699a | 1.454 (1.167, 1.812) | 0.036 | 18.3 | 17 | 80 | 17 | 19 |
| 700a | 1.432 (1.159, 1.771) | 0.038 | 18.2 | 20 | 63 | 19 | 20 |
a: These SNPs were imputed.
Ranks of the top-ranked SNPs in the CASP8 region based on the power prior Bayes factor (PPBF) approximation with hyperparameter and . Rankings using P-value and Wakefield Bayes factor (WBF) are also included, as are the logistic regression estimate and 95% confidence interval (CI) of the odds ratio (OR) for each SNP. Values of the Bayes factors for the PPBF are also provided. The genotype data for the CASP8 region come from a subset of the iCOGS study, with a total sample size of 5,238 and 1,733 SNPs.