Clive J Hoggart, John C Whittaker, Maria De Iorio, David J Balding.
Abstract
Testing one SNP at a time does not fully realise the potential of genome-wide association studies to identify multiple causal variants, which is a plausible scenario for many complex diseases. We show that simultaneous analysis of the entire set of SNPs from a genome-wide study to identify the subset that best predicts disease outcome is now feasible, thanks to developments in stochastic search methods. We used a Bayesian-inspired penalised maximum likelihood approach in which every SNP can be considered for additive, dominant, and recessive contributions to disease risk. Posterior mode estimates were obtained for regression coefficients that were each assigned a prior with a sharp mode at zero. A non-zero coefficient estimate was interpreted as corresponding to a significant SNP. We investigated two prior distributions and show that the normal-exponential-gamma prior leads to improved SNP selection in comparison with single-SNP tests. We also derived an explicit approximation for type-I error that avoids the need to use permutation procedures. As well as genome-wide analyses, our method is well-suited to fine mapping with very dense SNP sets obtained from re-sequencing and/or imputation. It can accommodate quantitative as well as case-control phenotypes, covariate adjustment, and can be extended to search for interactions. Here, we demonstrate the power and empirical type-I error of our approach using simulated case-control data sets of up to 500 K SNPs, a real genome-wide data set of 300 K SNPs, and a sequence-based dataset, each of which can be analysed in a few hours on a desktop workstation.
Year: 2008 PMID: 18654633 PMCID: PMC2464715 DOI: 10.1371/journal.pgen.1000130
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
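The abstract's idea of a posterior mode under a prior with a sharp mode at zero can be illustrated with a small sketch. It is not the authors' algorithm: it assumes a quantitative phenotype with a linear model and unit noise variance, uses only the double-exponential (DE) prior rather than the NEG prior, and swaps in plain coordinate descent with soft-thresholding. The names `soft_threshold`, `de_posterior_mode`, and `lam` are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def de_posterior_mode(X, y, lam, sweeps=100):
    """Minimise 0.5 * sum((y - X b)^2) + lam * sum(|b_j|) by coordinate descent.

    This is the negative log-posterior for a linear model with unit noise
    variance and independent DE priors with rate lam (an assumed setup).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - X beta, with beta starting at zero
    for _ in range(sweeps):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            ss = sum(v * v for v in col)
            if ss == 0.0:
                continue  # constant-zero column carries no information
            # Inner product of column j with the residual, adding back j's own effect.
            z = sum(col[i] * resid[i] for i in range(n)) + ss * beta[j]
            new = soft_threshold(z, lam) / ss
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= col[i] * delta
                beta[j] = new
    return beta


# Toy data (made up): two centred, orthogonal predictors; only the first has an effect.
X = [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]]
y = [-2.1, -1.9, 1.9, 2.1]
beta = de_posterior_mode(X, y, lam=1.0)
selected = [j for j, b in enumerate(beta) if b != 0.0]  # predictors with non-zero mode
```

On this toy input the second coefficient is shrunk to exactly zero, so only the first predictor is selected, which mirrors the rule that a non-zero estimate marks a selected SNP.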
Figure 1. Logarithms of NEG and DE densities.
The two densities are fixed to have the same density at the origin.
Main simulation study: the results shown are summed over the 500 datasets, each with 6 causal variants; a causal variant is “tagged” if ≥1 selected SNP has r² > 0.05 with it.
| Method | SNPs selected | Causal SNPs tagged | False positives, min. separation 0 Kb | False positives, min. separation 20 Kb | False positives, min. separation 40 Kb | False positives, min. separation 100 Kb |
| NEG | 2097 | 1576 | 368 | 368 | 368 | 366 |
| DE | 2622 | 1501 | 297 | 277 | 276 | 271 |
| ATT | 6810 | 1554 | 696 | 536 | 486 | 441 |
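The tagging criterion used in this table can be sketched as follows. The sketch assumes that r² is the squared Pearson correlation between genotype vectors coded as 0/1/2 allele counts; the source does not say how r² was computed, and the function names are hypothetical.

```python
def r_squared(g1, g2):
    """Squared Pearson correlation between two equal-length genotype vectors."""
    n = len(g1)
    m1 = sum(g1) / n
    m2 = sum(g2) / n
    s11 = sum((a - m1) ** 2 for a in g1)
    s22 = sum((b - m2) ** 2 for b in g2)
    if s11 == 0 or s22 == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treated here as no tagging
    s12 = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    return s12 * s12 / (s11 * s22)


def is_tagged(causal, selected_snps, threshold=0.05):
    """True if at least one selected SNP has r^2 above the threshold with the causal SNP."""
    return any(r_squared(causal, g) > threshold for g in selected_snps)


# Made-up genotypes for six individuals.
causal = [0, 1, 2, 1, 0, 2]
uncorrelated = [1, 0, 1, 2, 1, 1]
tagged_by_unrelated = is_tagged(causal, [uncorrelated])          # False
tagged_by_itself = is_tagged(causal, [uncorrelated, causal])     # True
```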
Figure 2. Main simulation study.
Histograms of the number of selected SNPs tagging (at r² > 0.05) each causal SNP for (A) NEG and (B) ATT analyses.
Main simulation study: numbers of causal SNPs tagged, out of the 500 for each MAF and risk ratio.
| Method | MAF 15%, risk ratio 1.4 | MAF 15%, risk ratio 1.5 | MAF 5%, risk ratio 1.8 | MAF 5%, risk ratio 2.2 | MAF 2%, risk ratio 2.5 | MAF 2%, risk ratio 3.0 |
| NEG | 252 | 360 | 209 | 370 | 146 | 239 |
| DE | 233 | 347 | 194 | 366 | 135 | 227 |
| ATT | 244 | 353 | 209 | 370 | 143 | 235 |
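Because each count is out of 500 causal SNPs, dividing by 500 gives the empirical power for each setting. The short sketch below does this for the NEG row; it assumes that each MAF heading spans two risk-ratio columns (15%: 1.4 and 1.5; 5%: 1.8 and 2.2; 2%: 2.5 and 3.0), and the variable names are hypothetical.

```python
# NEG counts from the table above, keyed by (MAF in %, allelic risk ratio).
neg_tagged = {
    (15, 1.4): 252, (15, 1.5): 360,
    (5, 1.8): 209, (5, 2.2): 370,
    (2, 2.5): 146, (2, 3.0): 239,
}
CAUSAL_PER_SETTING = 500  # one causal SNP of each type in each of the 500 datasets

# Empirical power: fraction of causal SNPs tagged in each setting.
neg_power = {key: count / CAUSAL_PER_SETTING for key, count in neg_tagged.items()}

# The row total is the number of causal SNPs tagged by NEG across all settings.
neg_total = sum(neg_tagged.values())
```

For example, `neg_power[(15, 1.4)]` is 252/500, about 50%, and `neg_total` is 1576.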
Null simulation: empirical per-SNP type-I error rates from 1,000 permutations of case-control labels of 2 K individuals genotyped at 80 K SNPs.
| Method | Error rate, additive only (per million SNPs) | Error rate, additive, dominant and recessive terms (per million SNPs) |
| NEG | 6.44 | 12.8 |
| DE | 6.39 | 12.7 |
| ATT | 6.48 | - |
In each case the nominal per-SNP type-I error rate for the additive-only model was 10⁻⁵ (= 10 per million SNPs).
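The per-million scale can be made concrete with a little arithmetic. The sketch below assumes the empirical rate is the number of selected SNPs divided by the total number of SNP tests (SNPs × permutations); the source does not report raw counts, so the count of 515 is only a back-calculated illustration, and the function names are hypothetical.

```python
def per_million_error_rate(false_positives, n_snps, n_permutations):
    """Empirical per-SNP type-I error rate, expressed per million SNPs tested."""
    tests = n_snps * n_permutations
    return 1e6 * false_positives / tests


def expected_false_positives(rate_per_million, n_snps):
    """Expected number of false positives in one analysis of n_snps SNPs."""
    return rate_per_million * n_snps / 1e6


# 80 K SNPs x 1,000 permutations = 80 million SNP tests under the null.
# Illustrative count: about 515 selections would give a rate close to 6.44 per million.
illustrative_rate = per_million_error_rate(515, 80_000, 1_000)

# At the nominal 10 per million, one 80 K-SNP analysis expects under one false positive.
nominal_expected = expected_false_positives(10, 80_000)
```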
Figure 3. GWA simulation.
(A) Locations of the ten causal variants (vertical blue lines) on the 20 Mb chromosome; also shown are the SNPs selected by NEG (red dots) and the SNPs with ATT p-value <5×10⁻⁷ (black dots), plotted against −log10(p-value). (B) and (C) show zooms of two sub-intervals of (A).
Figure 4. Re-sequencing simulation.
Histograms of the maximum r² for each selected SNP with a causal variant for (A) NEG and (B) ATT analyses.
SNPs included in the best-fitting model for association with type 2 diabetes from the NEG analysis of Human Hap300 BeadArray genotype data that were validated in a second stage analysis [19].
| SNP | Chromosome | Position | Closest gene | Model |
| rs13266634 | 8 | 118,253,964 | SLC30A8 | Dominant |
| rs7923837 | 10 | 94,471,897 | HHEX | Additive |
| rs7903146 | 10 | 114,748,339 | TCF7L2 | Additive |
| rs7480010 | 11 | 42,203,294 | LOC387761 | Dominant |
| rs729287 | 11 | 44,236,666 | EXT2 | Dominant |