Ian Johnston, Luis E Carvalho.
Abstract
The primary goal of genome-wide association studies is to determine which genetic markers are associated with genetic traits, most commonly human diseases. As a result of the "large p, small n" nature of genome-wide association study data sets, and especially because of the collinearity due to linkage disequilibrium, multivariate regression results in an ill-posed problem. To overcome these obstacles, we propose preprocessing single-nucleotide polymorphisms to adjust for linkage disequilibrium, and a novel Bayesian statistical model that exploits a hierarchical structure between single-nucleotide polymorphisms and genes. We obtain posterior samples using a hybrid Metropolis-within-Gibbs sampler, and further conduct inference on single-nucleotide polymorphism and gene associations using centroid estimation. Finally, we illustrate the proposed model and estimation procedure and discuss results obtained on the data provided for the Genetic Analysis Workshop 18.
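As a rough illustration of the final inference step, the sketch below computes a posterior probability of association (PPA) for each SNP from sampled binary association indicators and then selects SNPs by thresholding. It assumes a plain Hamming loss, under which a centroid estimator reduces to keeping every indicator whose marginal posterior probability exceeds 0.5; the loss actually used in the paper may differ. The function names and the toy draws are hypothetical and are not taken from the paper or its data.

```python
def ppa_from_samples(indicator_samples):
    """Marginal PPA per SNP: the fraction of posterior draws in which
    the SNP's association indicator equals 1.

    indicator_samples is a list of draws, each a list of 0/1 values
    (one value per SNP).
    """
    n_draws = len(indicator_samples)
    n_snps = len(indicator_samples[0])
    return [
        sum(draw[j] for draw in indicator_samples) / n_draws
        for j in range(n_snps)
    ]


def centroid_estimate(ppa, threshold=0.5):
    """Select SNPs whose marginal PPA strictly exceeds the threshold.

    Assumption: with Hamming loss the centroid estimator is this
    marginal thresholding rule at 0.5.
    """
    return [p > threshold for p in ppa]


# Toy posterior draws, made up for illustration: 4 draws, 3 SNPs.
toy_samples = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
]
toy_ppa = ppa_from_samples(toy_samples)
toy_selected = centroid_estimate(toy_ppa)
```

In practice the draws would come from the sampler's output after burn-in, and the same thresholding could be applied to gene-level indicators.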
Year: 2014 PMID: 25519327 PMCID: PMC4143727 DOI: 10.1186/1753-6561-8-S1-S45
Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561
Top 5 SNPs for original raw (normal text) and latent genotypes (bold)
| SNP | Position | MAF | SNP PPA | Gene | Gene PPA |
|---|---|---|---|---|---|
| rs17688430 | 62458083 | 0.16 | 0.95 | | 0.012 |
| rs7616789 | 27024158 | 0.23 | 0.73 | -- | |
| rs1565471 | 72736592 | 0.43 | 0.70 | -- | |
| rs3773282 | 13630307 | 0.29 | 0.58 | | 0.006 |
| rs13068005 | 192388678 | 0.47 | 0.50 | | 0.022 |
Figure 1. Posterior probability of association (PPA) of SNPs on chromosome 3. The 10 SNPs with the highest PPA are shown as opaque dots (genotypes: raw in red, latent in blue).
Figure 2. Expected values of the posterior distributions of , , and α. Histograms of estimates across all windows (genotypes: raw on top, latent on bottom).