Nanye Long, Samuel P Dickson, Jessica M Maia, Hee Shin Kim, Qianqian Zhu, Andrew S Allen.
Abstract
Although many methods are available to test sequence variants for association with complex diseases and traits, methods that specifically seek to identify causal variants are less developed. Here we develop and evaluate a Bayesian hierarchical regression method that incorporates prior information on the likelihood of variant causality through weighting of variant effects. In simulation studies using both simulated and real sequence variants, we compared the proposed method, under several weighting schemes, with a standard single variant test of variant-disease association. We found that by leveraging linkage disequilibrium (LD) of variants with known GWAS signals and sequence conservation (phastCons), the proposed method provides a powerful approach for detecting causal variants while controlling false positives.
Year: 2013 PMID: 23762022 PMCID: PMC3675126 DOI: 10.1371/journal.pcbi.1003093
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Figure 1. Workflow of the simulation study.
Before these steps were carried out, a large pool of haplotypes (n = 15,000) was simulated. Given the genotype relative risk (GRR) and minor allele frequency (MAF) of the causal variants, cases and controls were simulated by randomly drawing pairs of haplotypes, calculating each individual's disease risk, and assigning phenotype probabilistically.
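The phenotype-assignment step can be illustrated with a minimal Python sketch. It is not the authors' simulation code: the function name `simulate_case_control`, the `baseline_risk` parameter, and the multiplicative combination of GRRs across causal variants are assumptions. The dominant model, GRR of 3 and 500 cases/500 controls follow the settings reported with the power table. Haplotypes are assumed to be lists of 0/1 with 1 marking the minor (risk) allele.

```python
import random

def simulate_case_control(haplotypes, causal_idx, grr=3.0, baseline_risk=0.01,
                          n_cases=500, n_controls=500, seed=0, max_draws=10_000_000):
    """Hypothetical sketch: draw individuals from a haplotype pool until the
    requested numbers of cases and controls have been collected."""
    rng = random.Random(seed)
    cases, controls = [], []
    for _ in range(max_draws):
        # Form one individual from a random pair of distinct haplotypes in the pool.
        h1, h2 = rng.sample(haplotypes, 2)
        # Dominant model: carrying at least one risk allele at a causal variant
        # multiplies risk by the GRR (combining variants multiplicatively is an assumption).
        risk = baseline_risk
        for j in causal_idx:
            if h1[j] == 1 or h2[j] == 1:
                risk *= grr
        risk = min(risk, 1.0)
        genotype = [a + b for a, b in zip(h1, h2)]  # minor-allele counts 0/1/2
        # Assign phenotype probabilistically from the individual's risk.
        if rng.random() < risk:
            if len(cases) < n_cases:
                cases.append(genotype)
        elif len(controls) < n_controls:
            controls.append(genotype)
        if len(cases) == n_cases and len(controls) == n_controls:
            return cases, controls
    raise RuntimeError("max_draws reached before enough cases and controls were sampled")
```

Sampling continues until both groups are full, so each replicate has exactly the requested numbers of cases and controls; `baseline_risk` controls how many draws that takes.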
Power of different methods in the simulation analysis.
| Causal MAF | Number of causal variants detected | Single variant test | Bayesian w/o wt | Bayesian wt = r | Bayesian wt = phastCons | Bayesian wt = r × phastCons |
|---|---|---|---|---|---|---|
| 0.2–1% | At least one | 0.120 | 0.010 | 0.085 | 0.220 | 0.670 |
| | At least two | 0.005 | 0 | 0 | 0.070 | 0.310 |
| | All three | 0 | 0 | 0 | 0.020 | 0.050 |
| | Average | 0.042 | 0.003 | 0.028 | 0.103 | 0.343 |
| 1–2% | At least one | 0.445 | 0.270 | 0.420 | 0.845 | 0.945 |
| | At least two | 0.085 | 0.020 | 0.075 | 0.590 | 0.685 |
| | All three | 0 | 0 | 0.015 | 0.260 | 0.285 |
| | Average | 0.177 | 0.097 | 0.170 | 0.565 | 0.638 |
| 2–3% | At least one | 0.660 | 0.700 | 0.795 | 0.990 | 1 |
| | At least two | 0.180 | 0.220 | 0.420 | 0.875 | 0.895 |
| | All three | 0.010 | 0.040 | 0.130 | 0.515 | 0.535 |
| | Average | 0.283 | 0.320 | 0.448 | 0.793 | 0.810 |
| 3–4% | At least one | 0.795 | 0.885 | 0.935 | 1 | 1 |
| | At least two | 0.290 | 0.510 | 0.625 | 0.950 | 0.940 |
| | All three | 0.030 | 0.100 | 0.145 | 0.610 | 0.570 |
| | Average | 0.372 | 0.498 | 0.568 | 0.853 | 0.837 |
| 4–5% | At least one | 0.835 | 0.980 | 0.990 | 0.995 | 0.995 |
| | At least two | 0.380 | 0.785 | 0.785 | 0.975 | 0.950 |
| | All three | 0.060 | 0.285 | 0.320 | 0.720 | 0.660 |
| | Average | 0.425 | 0.683 | 0.698 | 0.897 | 0.868 |
Results are based on 200 replicates. In each replicate, each method was applied to 500 cases and 500 controls to identify three causal variants among a total of ∼1000 variants. Causal variants were assumed to have a constant GRR of 3 and to confer disease susceptibility under a dominant model.
At least one: the proportion of replicate simulations in which at least one causal variant was detected.
At least two: the proportion of replicate simulations in which at least two causal variants were detected.
All three: the proportion of replicate simulations in which all three causal variants were detected.
Average: the average power across the three causal variants.
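The Average row is the mean of the three detection rates in each MAF block; for the single variant test at 0.2–1%, (0.120 + 0.005 + 0)/3 ≈ 0.042. Because the expected number N of detected causal variants equals P(N ≥ 1) + P(N ≥ 2) + P(N ≥ 3), this mean also equals E[N]/3, the expected fraction of the three causal variants detected. The Python sketch below computes these summaries from per-replicate detection counts; the function name `power_summary` and its input format are hypothetical.

```python
def power_summary(n_detected, n_causal=3):
    """Hypothetical helper: summarize per-replicate counts of detected causal variants."""
    reps = len(n_detected)
    # Proportion of replicates detecting at least k causal variants, k = 1..n_causal.
    at_least = [sum(1 for n in n_detected if n >= k) / reps
                for k in range(1, n_causal + 1)]
    # Mean of the tail proportions; this equals mean(n_detected) / n_causal.
    average = sum(at_least) / n_causal
    return at_least, average

counts = [0, 1, 2, 3]  # toy counts from four replicates, not data from the study
at_least, average = power_summary(counts)
print(at_least, average)  # [0.75, 0.5, 0.25] 0.5
```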
Figure 2. Distributions of three informative weights (r, phastCons and r × phastCons) for causal and non-causal variants on the causal and null chromosomes in the simulation study.
Within each MAF range, weights were collected from 200 replicates; within each replicate, weights were scaled by dividing each by the maximum value so that the final weights lie between 0 and 1.
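The per-replicate scaling can be sketched as follows. The sketch assumes that r is the LD correlation between a variant and the known GWAS signal (its absolute value is used) and that phastCons scores already lie in [0, 1]; the function name `informative_weights` and the handling of an all-zero replicate are hypothetical choices, not taken from the paper.

```python
def informative_weights(r, phastcons, scheme="r*phastCons"):
    """Hypothetical helper: build one replicate's weights and scale them by their maximum."""
    if scheme == "r":
        raw = [abs(x) for x in r]
    elif scheme == "phastCons":
        raw = list(phastcons)
    elif scheme == "r*phastCons":
        raw = [abs(x) * p for x, p in zip(r, phastcons)]
    else:
        raise ValueError(f"unknown weighting scheme: {scheme}")
    top = max(raw)
    if top == 0:
        return [0.0] * len(raw)  # assumption: no informative signal in this replicate
    # Divide by the replicate maximum so every weight lies in [0, 1].
    return [w / top for w in raw]
```

For example, `informative_weights([0.5, -0.2, 0.9], [0.8, 1.0, 0.1])` gives raw products 0.4, 0.2 and 0.09, which scale to roughly 1.0, 0.5 and 0.225.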
NOD2 and ITPA causal variants in the exome sequencing data.
| Gene | Causal variant | Chromosome | Build 37 position (bp) | MAF | LD (…) | Composite phastCons score |
|---|---|---|---|---|---|---|
| NOD2 | rs2066844 | 16 | 50745926 | 5.29% | 0.39 | 0.32 (0.16, 0.24, 0.95) |
| | rs2066845 | 16 | 50756540 | 1.10% | 0.16 | 0.99 (1, 1, 0.96) |
| ITPA | rs1127354 | 20 | 3193842 | 7.55% | 0.34 | 0.99 (1, 1, 0.99) |
| | rs7270101 | 20 | 3193893 | 13.85% | 0.58 | 0.17 (0.016, 0, 0.95) |
The composite phastCons score is a weighted sum of the vertebrate, mammal and primate phastCons scores (shown in parentheses), with weights 1/2, 1/3 and 1/6, respectively.
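As a check, the composite score can be recomputed from the three component scores in the table; for rs2066844, 0.16/2 + 0.24/3 + 0.95/6 ≈ 0.318, which rounds to the reported 0.32. A minimal Python version follows (the function name `composite_phastcons` is hypothetical):

```python
# Weights for vertebrate, mammal and primate phastCons, as stated in the table note.
PHASTCONS_WEIGHTS = {"vertebrate": 1 / 2, "mammal": 1 / 3, "primate": 1 / 6}

def composite_phastcons(vertebrate, mammal, primate):
    """Weighted sum of the three phastCons scores; the weights sum to 1."""
    return (PHASTCONS_WEIGHTS["vertebrate"] * vertebrate
            + PHASTCONS_WEIGHTS["mammal"] * mammal
            + PHASTCONS_WEIGHTS["primate"] * primate)

for rsid, parts in [("rs2066844", (0.16, 0.24, 0.95)),
                    ("rs2066845", (1, 1, 0.96)),
                    ("rs7270101", (0.016, 0, 0.95))]:
    print(rsid, round(composite_phastcons(*parts), 2))
# rs2066844 0.32
# rs2066845 0.99
# rs7270101 0.17
```

Because the weights sum to 1, the composite score stays on the same [0, 1] scale as its components.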
Figure 3. Causal variant detection in the exome sequencing data analysis.
(A) NOD2 data; (B) ITPA data. The two top panels are from one replicate of the simulation. For the single variant test, SNP effect size was represented by −log10 of the p value from a logistic regression model; for the Bayesian liability model, it was represented by the standardized effect estimated at each SNP. Red dots indicate the two causal variants (see Table 1 for more information). Blue vertical bars show SNP weights (r × phastCons). The horizontal dashed line indicates the effect size at the significance threshold (permutation p value = 0.01). The bottom panel shows the proportion of simulations in which a variant was detected (i.e., significant at the permutation p = 0.01 level). Causal variants are marked in red.
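The caption does not say how the permutation p values were computed. One common construction, shown here purely as an assumption, is a max-statistic permutation test: case/control labels are shuffled, the largest statistic across all variants is recorded for each permutation, and each variant's adjusted p value is the fraction of permutation maxima at least as large as its observed statistic. The sketch uses a simple hypothetical statistic (absolute difference in mean minor-allele count between cases and controls) in place of the paper's logistic-regression p values or standardized Bayesian effects; all function names are illustrative.

```python
import random

def mean_diff(genotypes, labels, j):
    """Absolute difference in mean minor-allele count at variant j between cases and controls."""
    cases = [g[j] for g, y in zip(genotypes, labels) if y == 1]
    controls = [g[j] for g, y in zip(genotypes, labels) if y == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

def max_stat_permutation_pvalues(genotypes, labels, n_perm=1000, seed=0):
    """Permutation p values adjusted via the maximum statistic over all variants."""
    rng = random.Random(seed)
    n_var = len(genotypes[0])
    observed = [mean_diff(genotypes, labels, j) for j in range(n_var)]
    shuffled = list(labels)
    exceed = [0] * n_var
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype association
        perm_max = max(mean_diff(genotypes, shuffled, j) for j in range(n_var))
        for j in range(n_var):
            if perm_max >= observed[j]:
                exceed[j] += 1
    # Add one to numerator and denominator so that no p value is exactly zero.
    return [(1 + e) / (1 + n_perm) for e in exceed]

def detected(genotypes, labels, alpha=0.01, **kwargs):
    """Indices of variants significant at the given permutation p value level."""
    pvals = max_stat_permutation_pvalues(genotypes, labels, **kwargs)
    return [j for j, p in enumerate(pvals) if p <= alpha]
```

Comparing each variant against the permutation distribution of the maximum, rather than against its own null distribution, controls the chance of any false detection across all variants tested together.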