Brunilda Balliu, Jeanine J Houwing-Duistermaat, Stefan Böhringer.
Abstract
Marginal tests based on individual SNPs are routinely used in genetic association studies. Studies have shown that haplotype-based methods may provide more power in disease mapping than methods based on single markers when, for example, multiple disease-susceptibility variants occur within the same gene. A limitation of haplotype-based methods is that the number of parameters increases exponentially with the number of SNPs, inducing a commensurate increase in the degrees of freedom and weakening the power to detect associations. To address this limitation, we introduce a hierarchical linkage disequilibrium model for disease mapping, based on a reparametrization of the multinomial haplotype distribution, where every parameter corresponds to the cumulant of each possible subset of a set of loci. This hierarchy in the parameters enables us to employ flexible testing strategies over a range of parameter sets, from standard single-SNP analyses through to tests of the full haplotype distribution, reducing degrees of freedom and increasing the power to detect associations. We show via extensive simulations that our approach maintains the type I error at the nominal level and has increased power under many realistic scenarios, as compared to single-SNP and standard haplotype-based studies. To evaluate the performance of our proposed methodology in real data, we analyze genome-wide data from the Wellcome Trust Case-Control Consortium.
Keywords: cis interactions; genome-wide association study; haplotype association study; linkage disequilibrium
Year: 2019 PMID: 30693553 PMCID: PMC6637384 DOI: 10.1002/bimj.201800053
Source DB: PubMed Journal: Biom J ISSN: 0323-3847 Impact factor: 2.207
Results on type I error rate (%) and power (%) for the scenarios simulated based on parameters from significant findings from the WTCCC data
| Test | | df | Triplet 1 | Triplet 2 | Triplet 3 | Triplet 4 |
|---|---|---|---|---|---|---|
| Type I Error Rate (%) | | | | | | |
| | | 3 | 5.13 | 5.31 | 5.21 | 5.06 |
| | | 6 | 4.94 | 4.82 | 4.98 | 4.91 |
| | | 7 | 5.28 | 4.81 | 4.95 | 5.45 |
| | | | 8.90 | 8.86 | 8.83 | 9.13 |
| | | | 6.70 | 6.79 | 6.63 | 6.55 |
| | | 1 | 5.27 | 4.91 | 5.08 | 5.33 |
| | | 4 | 5.53 | 5.56 | 5.02 | 5.83 |
| Single SNP | SNP 1 | 1 | 5.05 | 4.89 | 4.83 | 5.12 |
| | SNP 2 | 1 | 5.15 | 4.56 | 4.87 | 4.82 |
| | SNP 3 | 1 | 5.33 | 5.14 | 5.28 | 4.91 |
| | | | 14.68 | 13.92 | 13.56 | 13.98 |
| | | 7* | 5.65 | 5.14 | 4.98 | 4.89 |
| Power (%) | | | | | | |
| | | 3 | 65.49 | 74.96 | 88.58 | 69.65 |
| | | 6 | 69.00 | 69.39 | 94.60 | 97.57 |
| | | 7 | 65.37 | 65.96 | 97.18 | 97.15 |
| | | | 73.58 | 77.74 | 97.54 | 97.79 |
| | | | 71.45 | 76.83 | 96.48 | 97.07 |
| | | 1 | 0.03 | 0.02 | 5.29 | 24.49 |
| | | 4 | 0.00 | 0.00 | 0.21 | 0.00 |
| Single SNP | SNP 1 | 1 | 21.45 | 22.64 | 11.49 | 9.76 |
| | SNP 2 | 1 | 0.00 | 17.76 | 1.51 | 10.63 |
| | SNP 3 | 1 | 16.43 | 0.00 | 44.57 | 0.01 |
| | | | 33.99 | 35.67 | 51.78 | 19.06 |
| | | 7* | 68.43 | 69.12 | 97.28 | 97.19 |
Parameter values for each scenario are listed in Table A.1
*df might be different because the package automatically groups rare haplotypes.
Estimated haplotype frequencies in the cases (Ca), controls (Co), and pool (P) of cases and controls samples for each of the four triplets identified from the WTCCC data analysis (Burton et al., 2007)
| Haplotype | Triplet 1 P | Triplet 1 Ca | Triplet 1 Co | Triplet 2 P | Triplet 2 Ca | Triplet 2 Co | Triplet 3 P | Triplet 3 Ca | Triplet 3 Co | Triplet 4 P | Triplet 4 Ca | Triplet 4 Co |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| θ000 | 0.596 | 0.569 | 0.613 | 0.499 | 0.479 | 0.512 | 0.477 | 0.464 | 0.486 | 0.358 | 0.340 | 0.370 |
| θ001 | 0.059 | 0.063 | 0.056 | 0.200 | 0.189 | 0.208 | 0.135 | 0.172 | 0.110 | 0.088 | 0.115 | 0.071 |
| θ010 | 0.104 | 0.098 | 0.107 | 0.015 | 0.015 | 0.015 | 0.029 | 0.031 | 0.028 | 0.240 | 0.228 | 0.247 |
| θ011 | 0.003 | 0.006 | 0.002 | 0.047 | 0.054 | 0.043 | 0.010 | 0.008 | 0.011 | 0.101 | 0.084 | 0.112 |
| θ100 | 0.192 | 0.211 | 0.180 | 0.147 | 0.165 | 0.135 | 0.115 | 0.112 | 0.115 | 0.148 | 0.166 | 0.137 |
| θ101 | 0.028 | 0.037 | 0.022 | 0.062 | 0.059 | 0.064 | 0.067 | 0.060 | 0.071 | 0.004 | 0.004 | 0.004 |
| θ110 | 0.017 | 0.014 | 0.019 | 0.025 | 0.033 | 0.020 | 0.132 | 0.116 | 0.142 | 0.054 | 0.058 | 0.052 |
| θ111 | 0.002 | 0.002 | 0.001 | 0.004 | 0.006 | 0.003 | 0.036 | 0.035 | 0.037 | 0.006 | 0.006 | 0.006 |
Linkage disequilibrium parameters in the cases (Ca), controls (Co), and pool (P) of cases and controls samples for each of the four triplets identified from the WTCCC data analysis
| Parameter | Triplet 1 P | Triplet 1 Ca | Triplet 1 Co | Triplet 2 P | Triplet 2 Ca | Triplet 2 Co | Triplet 3 P | Triplet 3 Ca | Triplet 3 Co | Triplet 4 P | Triplet 4 Ca | Triplet 4 Co |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| δ1 | 0.239 | 0.264 | 0.223 | 0.239 | 0.264 | 0.223 | 0.350 | 0.324 | 0.366 | 0.213 | 0.234 | 0.199 |
| δ2 | 0.126 | 0.120 | 0.130 | 0.091 | 0.108 | 0.081 | 0.207 | 0.191 | 0.218 | 0.401 | 0.376 | 0.417 |
| δ3 | 0.091 | 0.108 | 0.081 | 0.314 | 0.308 | 0.319 | 0.247 | 0.276 | 0.229 | 0.199 | 0.209 | 0.193 |
| δ12 | −0.374 | −0.502 | −0.279 | 0.111 | 0.135 | 0.081 | 0.712 | 0.696 | 0.720 | −0.297 | −0.268 | −0.306 |
| δ13 | 0.111 | 0.135 | 0.081 | −0.112 | −0.199 | −0.046 | 0.103 | 0.033 | 0.170 | −0.759 | −0.790 | −0.739 |
| δ23 | −0.544 | −0.382 | −0.663 | 0.363 | 0.351 | 0.377 | −0.105 | −0.174 | −0.040 | 0.227 | 0.088 | 0.335 |
| δ123 | 0.172 | 0.084 | 0.229 | −0.697 | −0.665 | −0.716 | −0.160 | −0.119 | −0.190 | 0.203 | 0.521 | −0.126 |
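As a rough illustration of how the two tables above relate, the following sketch recomputes the first-order parameters (δ1, δ2, δ3) for the pooled sample of Triplet 1 as marginal allele frequencies of the tabulated haplotype frequencies, together with a raw pairwise disequilibrium coefficient for loci 1 and 2. It assumes that the subscript of θ lists the alleles at loci 1, 2, 3 in that order; the function names are illustrative and not taken from the paper. The tabulated higher-order δ values are on a different scale from the raw coefficient computed here, and the sketch does not attempt to reproduce that scaling.

```python
# Pooled (P) haplotype frequencies for Triplet 1, copied from the table above.
# Assumption: each key lists the alleles at loci 1, 2, 3 in that order.
theta = {
    "000": 0.596, "001": 0.059, "010": 0.104, "011": 0.003,
    "100": 0.192, "101": 0.028, "110": 0.017, "111": 0.002,
}

def allele_frequency(theta, locus):
    """Marginal frequency of allele 1 at a 0-based locus index."""
    return sum(f for hap, f in theta.items() if hap[locus] == "1")

def pairwise_d(theta, i, j):
    """Raw two-locus disequilibrium: P(allele 1 at i and j) - p_i * p_j."""
    joint = sum(f for hap, f in theta.items()
                if hap[i] == "1" and hap[j] == "1")
    return joint - allele_frequency(theta, i) * allele_frequency(theta, j)

# First-order parameters: one marginal allele frequency per locus.
delta = [allele_frequency(theta, k) for k in range(3)]
# Raw (unstandardized) disequilibrium between loci 1 and 2.
d12 = pairwise_d(theta, 0, 1)

print([round(d, 3) for d in delta])
print(round(d12, 4))
```

The recomputed marginals agree with the tabulated δ1, δ2, δ3 up to rounding of the published frequencies, and the raw coefficient for loci 1 and 2 is negative, matching the sign of the tabulated δ12.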
Corresponding haplotype frequencies for SNPs in linkage equilibrium (Scenario 1), and SNPs in LD (Scenario 2)
| Haplotype | Scenario 1 | Scenario 2 |
|---|---|---|
| θ0000 | 0.292 | 0.416 |
| θ0001 | 0.239 | 0.183 |
| θ0010 | 0.135 | 0.066 |
| θ0011 | 0.111 | 0.126 |
| θ0100 | 0.065 | 0.022 |
| θ0101 | 0.054 | 0.042 |
| θ0110 | 0.030 | 0.029 |
| θ0111 | 0.025 | 0.065 |
| θ1000 | 0.015 | 0.001 |
| θ1001 | 0.013 | 0.008 |
| θ1010 | 0.007 | 0.006 |
| θ1011 | 0.006 | 0.010 |
| θ1100 | 0.003 | 0.007 |
| θ1101 | 0.003 | 0.005 |
| θ1110 | 0.002 | 0.003 |
| θ1111 | 0.001 | 0.011 |
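One way to see the difference between the two scenarios is to compare each haplotype frequency with the product of its marginal allele frequencies, which is what linkage equilibrium implies. The sketch below does this for the values in the table above; it assumes the θ subscript lists the alleles at loci 1 to 4 in order, renormalizes the rounded frequencies so they sum to one, and uses an illustrative function name.

```python
from itertools import product

# Haplotype labels in table order: 0000, 0001, 0010, ..., 1111.
labels = ["".join(bits) for bits in product("01", repeat=4)]

# Frequencies copied from the table above, in the same order as `labels`.
scenario1 = [0.292, 0.239, 0.135, 0.111, 0.065, 0.054, 0.030, 0.025,
             0.015, 0.013, 0.007, 0.006, 0.003, 0.003, 0.002, 0.001]
scenario2 = [0.416, 0.183, 0.066, 0.126, 0.022, 0.042, 0.029, 0.065,
             0.001, 0.008, 0.006, 0.010, 0.007, 0.005, 0.003, 0.011]

def max_le_deviation(freqs):
    """Largest |theta - product of marginals| over all haplotypes."""
    total = sum(freqs)  # rounded table values need not sum to exactly 1
    theta = {lab: f / total for lab, f in zip(labels, freqs)}
    # Marginal frequency of allele 1 at each of the four loci.
    p = [sum(f for lab, f in theta.items() if lab[k] == "1")
         for k in range(4)]
    worst = 0.0
    for lab, f in theta.items():
        expected = 1.0
        for k, allele in enumerate(lab):
            expected *= p[k] if allele == "1" else 1.0 - p[k]
        worst = max(worst, abs(f - expected))
    return worst

dev1 = max_le_deviation(scenario1)
dev2 = max_le_deviation(scenario2)
print(f"Scenario 1: {dev1:.4f}  Scenario 2: {dev2:.4f}")
```

For Scenario 1 the deviation stays within rounding error, whereas for Scenario 2 it is substantially larger, consistent with the caption's description of SNPs in LD.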
Results on type I error rate (%) for scenarios simulated under different disease generating models
| Test | | df | Scenario 1 | Scenario 2 |
|---|---|---|---|---|
| Type I Error Rate | | | | |
| | | 4 | 5.15 | 5.27 |
| | | 10 | 5.14 | 5.24 |
| | | 14 | 4.78 | 4.44 |
| | | 15 | 4.59 | 4.31 |
| | | | 7.03 | 7.14 |
| | | | 10.39 | 10.22 |
| | | 1 | 4.54 | 5.78 |
| | | 5 | 4.35 | 4.64 |
| | | 11 | 4.45 | 4.13 |
| Single SNP | SNP 1 | 1 | 4.82 | 5.21 |
| | SNP 2 | 1 | 5.05 | 5.37 |
| | SNP 3 | 1 | 4.93 | 4.91 |
| | SNP 4 | 1 | 4.90 | 5.03 |
| | MinPvalSingle | | 18.38 | 18.47 |
| | | 15* | 6.55 | 4.68 |
| | Pair 1 | 5 | 5.29 | 5.72 |
| | Pair 2 | 5 | 5.01 | 4.78 |
| | | | 10.05 | 10.20 |
Parameter values for each scenario are listed in Table A.2
* df might be different because the package automatically groups rare haplotypes.
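The MinPvalSingle rate in the table above is far higher than the rates of the individual single-SNP tests. A back-of-the-envelope calculation suggests why: if the smallest of k independent level-α P-values is compared against α without adjustment, the probability of at least one false rejection is 1 − (1 − α)^k. The sketch below assumes α = 0.05 (suggested by the individual rates near 5%, but not stated in this excerpt), assumes that MinPvalSingle is unadjusted, and assumes independent tests, which holds only approximately for SNPs in LD.

```python
def familywise_error(alpha, k):
    """Chance that at least one of k independent level-alpha tests rejects
    under the null, i.e. the error rate of an unadjusted minimum P-value."""
    return 1.0 - (1.0 - alpha) ** k

# Four single-SNP tests at an assumed nominal level of 5%.
rate = familywise_error(0.05, 4)
print(f"{100 * rate:.2f}%")
```

With four tests this gives roughly 18.5%, close to the 18.38% and 18.47% reported for MinPvalSingle in the two scenarios.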
Power results for Scenario 1 (%)
| Test | | df | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 |
|---|---|---|---|---|---|---|---|---|
| Power | | | | | | | | |
| | | 4 | 90.24 | 90.84 | 69.78 | 65.94 | 29.53 | 13.83 |
| | | 10 | 73.79 | 85.76 | 88.70 | 62.80 | 20.19 | 10.97 |
| | | 14 | 63.16 | 77.91 | 81.83 | 51.59 | 14.95 | 9.74 |
| | | 15 | 60.76 | 75.91 | 80.26 | 48.94 | 13.90 | 9.08 |
| | | | 90.29 | 91.86 | 87.75 | 70.22 | 31.21 | 16.45 |
| | | | 90.37 | 92.27 | 89.61 | 71.85 | 32.15 | 17.84 |
| | | 1 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | | 5 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | | 11 | 0.00 | 0.00 | 1.66 | 0.00 | 0.00 | 0.01 |
| Single SNP | SNP 1 | 1 | 0.03 | 0.00 | 4.43 | 0.00 | 0.00 | 0.10 |
| | SNP 2 | 1 | 3.04 | 79.20 | 17.49 | 30.50 | 1.94 | 0.14 |
| | SNP 3 | 1 | 10.74 | 11.10 | 3.21 | 15.10 | 1.30 | 0.25 |
| | SNP 4 | 1 | 15.57 | 0.00 | 0.00 | 0.00 | 0.90 | 0.34 |
| | MinPvalSingle | | 27.04 | 81.25 | 23.53 | 40.65 | 4.09 | 0.83 |
| | | 15* | 64.21 | 78.82 | 81.58 | 54.40 | 18.17 | 22.29 |
| | Pair 1 | 5 | 2.10 | 53.55 | 46.70 | 11.35 | 0.42 | 0.85 |
| | Pair 2 | 5 | 47.56 | 2.74 | 0.47 | 4.01 | 5.49 | 0.89 |
| | | | 48.67 | 54.78 | 46.93 | 14.86 | 5.89 | 1.72 |
Nonzero parameters for Model 1: , ; Model 2: , ; Model 3: , , ; Model 4: , ; Model 5: , ; Model 6: , ,
*df might be different because the package automatically groups rare haplotypes.
Power results of each test for Scenario 2 (%)
| Test | | df | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 |
|---|---|---|---|---|---|---|---|---|
| Power | | | | | | | | |
| | | 4 | 72.95 | 94.70 | 81.76 | 78.77 | 86.70 | 79.31 |
| | | 10 | 47.81 | 89.14 | 80.27 | 65.76 | 74.49 | 71.28 |
| | | 14 | 35.48 | 81.36 | 69.80 | 52.33 | 64.64 | 62.47 |
| | | 15 | 33.19 | 79.69 | 68.04 | 50.33 | 62.67 | 60.75 |
| | | | 73.00 | 95.12 | 85.06 | 79.64 | 87.31 | 81.42 |
| | | | 73.10 | 95.24 | 85.93 | 80.16 | 87.67 | 82.20 |
| | | 1 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | | 5 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | | 11 | 0.00 | 0.01 | 0.01 | 0.00 | 0.00 | 0.00 |
| Single SNP | SNP 1 | 1 | 1.61 | 0.00 | 31.24 | 0.01 | 0.00 | 17.36 |
| | SNP 2 | 1 | 38.07 | 86.59 | 49.42 | 66.91 | 51.38 | 16.20 |
| | SNP 3 | 1 | 10.72 | 66.42 | 11.92 | 38.04 | 40.71 | 15.93 |
| | SNP 4 | 1 | 6.93 | 0.11 | 0.00 | 0.00 | 23.63 | 10.14 |
| | MinPvalSingle | | 45.84 | 92.79 | 64.03 | 75.17 | 70.60 | 42.64 |
| | | 15* | 36.84 | 80.80 | 72.23 | 54.34 | 65.60 | 66.56 |
| | Pair 1 | 5 | 30.97 | 65.51 | 74.01 | 38.73 | 25.93 | 37.72 |
| | Pair 2 | 5 | 13.82 | 39.86 | 4.60 | 15.72 | 50.10 | 20.96 |
| | | | 38.24 | 74.62 | 74.43 | 45.44 | 58.92 | 47.57 |
Nonzero parameters for Model 1: , ; Model 2: , ; Model 3: , , ; Model 4: , ; Model 5: , ; Model 6: , ,
*df might be different because the package automatically groups rare haplotypes.
Results on real data
| SNPs in the triplet | | | | | | | Single SNP tests | | |
|---|---|---|---|---|---|---|---|---|---|
| SNP rs6920220 | |||||||||
| rs11961920 | rs11970411 | 2.6e‐08 | 7.9e‐08 | 2.3e‐07 | 0.89 | 0.21 | 5e‐06 | 0.16 | 1.2e‐05 |
| rs11970411 | rs674451 | 8.5e‐09 | 9.9e‐08 | 2.8e‐07 | 0.81 | 0.56 | 5e‐06 | 1.2e‐05 | 0.25 |
| SNP rs12723859 | |||||||||
| rs12739961 | rs1113523 | 1.8e‐10 | 4.4e‐11 | 5.6e‐12 | 7.78e‐03 | 8.50e‐04 | 3e‐05 | 0.0013 | 2.2e‐07 |
| rs12739961 | rs17013326 | 2.4e‐10 | 7.3e‐11 | 7.9e‐12 | 6.40e‐03 | 9.29e‐04 | 3e‐05 | 0.0013 | 3.1e‐07 |
| SNP rs12205634 | |||||||||
| rs411136 | rs210137 | 1.9e‐08 | 4.8e‐12 | 1.2e‐11 | 0.41 | 2.26e‐05 | 5.2e‐05 | 4.3e‐05 | 6.9e‐02 |
| rs411136 | rs210138 | 2.1e‐08 | 1.1e‐11 | 2.1e‐11 | 3.7e‐05 | 5.1e‐05 | 5.2e‐05 | 4.3e‐05 | 6.9e‐02 |
Summary of GWAS analyses by all tests for haplotypes spanning 2 or 3 loci (number after ':')
| Test:2 | Infl.:2 | #:2 | #T:2 | Test:3 | Infl.:3 | #:3 | #T:3 |
|---|---|---|---|---|---|---|---|
| | 1.06 | 277 | 22 | – | – | – | – |
| | 1.43 | 8690 | 33 | – | – | – | – |
| | 1.37 | 6836 | 84 | | 1.53 | 10824 | 230 |
| | 1.18 | 889 | 59 | | 1.38 | 1047 | 26 |
| – | – | – | – | | 1.27 | 4201 | 35 |
| | 1.36 | 7151 | 17 | | 1.29 | 4247 | 29 |
| – | – | – | – | | 1.17 | 3945 | 33 |
| | 1.20 | 6662 | 26 | | 1.40 | 3465 | 341 |
| | 1.97 | 7156 | 18 | | 2.66 | 4613 | 22 |
| | 1.57 | 7145 | 18 | | 1.98 | 4608 | 24 |
Test is the name of the test. Infl. denotes the inflation factor; # denotes the number of significant P‐values; #T denotes the number of significant P‐values filtered by the tower criterion (see text). Single SNP is given for reference and refers to a SNP‐by‐SNP logistic regression.
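The excerpt does not say how the inflation factor is computed. A common definition in GWAS practice (genomic control) is the ratio of the median observed test statistic to the median expected under the null. The sketch below implements that definition for 1-df chi-square statistics recovered from P-values, using only the Python standard library; it is an assumption about the general technique rather than the authors' procedure, and tests with more degrees of freedom would need the matching chi-square reference distribution.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

def chi2_1df_from_p(p):
    """Chi-square (1 df) statistic whose upper-tail probability is p.

    Uses the identity chi2_1 = z^2 with z a standard normal quantile.
    Requires 0 < p < 1.
    """
    z = _STD_NORMAL.inv_cdf(p / 2.0)
    return z * z

def inflation_factor(p_values):
    """Median observed statistic over the null median (about 0.455)."""
    observed = median(chi2_1df_from_p(p) for p in p_values)
    expected = chi2_1df_from_p(0.5)
    return observed / expected

# Evenly spread P-values behave like the null, so the factor is 1.
lam_null = inflation_factor([0.1, 0.3, 0.5, 0.7, 0.9])
# P-values concentrated near zero give a factor above 1.
lam_inflated = inflation_factor([0.01, 0.05, 0.2])
print(lam_null, lam_inflated)
```

Under this definition, inflation-corrected P-values are usually obtained by dividing each statistic by the factor before converting back to a P-value.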
Figure 1. QQ plot for all two‐locus tests, including the marginal test. P‐values were inflation‐corrected before plotting (see text). Blue lines represent point‐wise confidence limits for ordered P‐values.
Figure 2. Manhattan plot of P‐values for all tests when haplotypes span two loci. Colors are applied to P‐values . P‐values were filtered by the tower criterion (see text). P‐values were truncated
Positions (chr, pos) and P‐values for tests for which at least three tests reached a P‐value when haplotypes span two loci
| chr | pos | Kim et al | Full | HLDmin | iterHLD | haplo.stats |
|
| Single SNP |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 31008523 | 1.8e‐249 | 3.3e‐309 | 3.3e‐309 | 6.5e‐309 | 8.6e‐171 | 7.1e‐311 | – | – |
| 1 | 31032093 | 1.7e‐28 | 2.9e‐33 | 2.9e‐33 | 5.8e‐33 | – | 7.4e‐35 | – | – |
| 1 | 31033624 | 1.5e‐185 | 1.1e‐224 | 1.1e‐224 | 2.2e‐224 | 1.6e‐08 | 2.7e‐226 | – | – |
| 1 | 90440693 | 1.8e‐32 | – | 1.2e‐33 | 2.3e‐33 | 6.0e‐33 | – | – | – |
| 1 | 90481133 | 9.6e‐185 | – | 2.1e‐224 | 4.2e‐224 | 5.4e‐165 | – | – | – |
| 5 | 59580410 | 2.2e‐09 | 2.6e‐10 | 2.6e‐10 | 5.2e‐10 | – | – | – | – |
| 5 | 122166044 | – | – | 4.1e‐151 | 8.2e‐151 | 2.1e‐119 | – | – | – |
| 5 | 153251949 | – | 3.3e‐189 | 3.3e‐189 | 6.5e‐189 | 2.3e‐140 | – | – | – |
| 6 | 26564048 | 1.7e‐251 | 5.5e‐302 | 5.5e‐302 | 1.1e‐301 | 5.5e‐172 | – | – | – |
| 7 | 88709037 | – | 3.4e‐140 | – | – | 6.7e‐145 | 4.5e‐90 | – | – |
| 7 | 149498823 | 2.9e‐100 | 3.2e‐09 | 6.3e‐10 | 6.3e‐10 | 1.0e‐11 | – | 6.3e‐10 | – |
| 7 | 149504002 | – | 4.6e‐11 | 1.1e‐11 | 1.1e‐11 | 1.0e‐13 | – | 1.1e‐11 | 2.5e‐13 |
| 12 | 68141467 | – | 8.8e‐79 | 8.8e‐79 | 1.8e‐78 | 1.1e‐68 | – | – | – |
| 12 | 99342932 | 5.1e‐56 | 1.0e‐59 | 1.0e‐59 | 2.0e‐59 | 1.2e‐54 | – | – | – |
| 13 | 82025812 | 8.4e‐246 | 1.1e‐320 | 1.1e‐320 | 2.1e‐320 | – | 1.5e‐323 | – | – |
| 16 | 60188415 | 8.5e‐10 | 8.0e‐11 | 8.0e‐11 | 1.6e‐10 | 1.3e‐10 | – | – | – |
| 17 | 66951514 | – | 1.1e‐35 | 1.1e‐35 | 2.2e‐35 | – | – | – | – |
| 18 | 56526010 | – | 2.5e‐162 | 2.5e‐162 | 5.0e‐162 | 3.3e‐126 | 9.9e‐165 | – | – |
| 19 | 63603580 | 4.5e‐17 | 6.6e‐19 | 6.6e‐19 | 1.3e‐18 | 3.1e‐18 | 8.9e‐21 | – | – |
P‐values were filtered by the tower criterion (see text)