Lewei Duan, Duncan C Thomas.
Abstract
A variety of methods have been proposed for studying the association of multiple genes thought to be involved in a common pathway for a particular disease. Here, we present an extension of a Bayesian hierarchical modeling strategy that allows for multiple SNPs within each gene, with external prior information at either the SNP or gene level. The model involves variable selection at the SNP level through latent indicator variables and Bayesian shrinkage at the gene level towards a prior mean vector and covariance matrix that depend on external information. The entire model is fitted using Markov chain Monte Carlo methods. Simulation studies show that the approach is capable of recovering many of the truly causal SNPs and genes, depending upon their frequency and size of their effects. The method is applied to data on 504 SNPs in 38 candidate genes involved in DNA damage response in the WECARE study of second breast cancers in relation to radiotherapy exposure.
Year: 2013 PMID: 24490143 PMCID: PMC3892936 DOI: 10.1155/2013/406217
Source DB: PubMed Journal: Int J Genomics ISSN: 2314-436X Impact factor: 2.326
Figure 1. Directed acyclic graph describing the structure of the model. Boxes describe observed data; circles represent latent variables or model parameters. Single arrows denote stochastic relationships, while double arrows denote deterministic relationships. The first rectangle illustrates the relations of disease status and genes at the subject (i) level; the second rectangle illustrates the relations of external information and the first-level coefficient β at the gene (g) level; the third rectangle illustrates the relations of weighted SNP effects and the gene burden index at the SNP (s) level.
Figure 2. Graphical representation of the A matrix derived from the Gene Ontology. The lower levels of the graph indicate sets of genes with high correlations across the 860 GO terms.
Figure 3. Posterior distributions of numbers of genes (a) and numbers of SNPs (b) included in the analysis of the WECARE study data.
Figure 4. Posterior probabilities (a) and Bayes factors for gene inclusion (b) and SNP inclusion (c) in the model for the real WECARE study data.
Association between selected variants in DNA-damage response genes and CBC risk in the WECARE study.
| Gene | rs number | Homozygous; case | Homozygous; control | Heterozygous; case | Heterozygous; control | Homozygous; case | Homozygous; control | lnRRc (95% CI) | P valued | BF SNP | BF gene |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | rs1800057a | 680 | 1322 | 28 | 76 | 0 | 1 | −0.47 (−0.95, −0.01) | 0.046 | 4.58 | 1.41 |
| | rs4987951a | 674 | 1278 | 34 | 121 | 0 | 0 | −0.66 (−1.32, −0.25) | 0.002 | 9.04 | |
| | rs6005861a,b | 680 | 1311 | 27 | 86 | 1 | 2 | −0.40 (−0.85, 0.06) | 0.086 | 7 | 0.36 |
| | rs4713354a,b | 535 | 1116 | 157 | 267 | 16 | 16 | 0.47 (0.26, 0.68) | <0.001 | 9.72 | 20.71 |
| | rs2269705a | 589 | 1220 | 113 | 175 | 6 | 4 | 0.50 (0.25, 0.76) | <0.001 | 15.91 | |
| | rs13447682a,b | 690 | 1343 | 18 | 54 | 0 | 2 | −0.56 (−1.12, −0.01) | 0.046 | 5.7 | 0.52 |
| | rs14448b | 640 | 1215 | 60 | 171 | 8 | 13 | −0.11 (−0.40, 0.18) | 0.447 | 0.2 | 2.62 |
| | rs9297757a,b | 651 | 1233 | 148 | 52 | 5 | 18 | −0.26 (−0.58, 0.05) | 0.097 | 27.33 | |
| | rs3736640a,b | 676 | 1288 | 32 | 107 | 0 | 4 | −0.64 (−1.27, −0.21) | 0.003 | 4.14 | |
| | rs1801320a | 646 | 1209 | 58 | 186 | 4 | 4 | −0.31 (−0.62, 0.00) | 0.048 | 21.38 | 3.51 |
a SNPs identified by Model I based on Bayes factors. Only those SNPs with BF exceeding 3 are listed.
b SNPs identified by Brooks et al. 2012 [25] based on per-allele RR. Only those SNPs with P value for trend <0.05 are listed.
c lnRR: regression coefficient of each SNP from simple logistic regression, adjusted for age, menarche, menopause, family history, pregnancy, histology, treatment, the FGFR2 GWAS-identified SNP, deleterious variants in ATM, BRCA1, BRCA2, and CHEK2, and an offset term.
d P values from the Wald z-test for the lnRR estimates from simple logistic regression, adjusted for the fixed covariates listed in c.
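As a rough illustration of how the lnRR, 95% CI, and Wald P value columns relate to one another, the following minimal sketch back-calculates a standard error and a two-sided Wald P value from a tabulated estimate and interval. It assumes the interval is a symmetric Wald interval on the log scale (estimate ± 1.96 × SE), which the source does not state explicitly; the function name `wald_from_ci` is hypothetical. Because the tabulated values are rounded to two decimals, the recomputed P value only approximates the published one.

```python
import math


def wald_from_ci(ln_rr, lower, upper, z_crit=1.959964):
    """Back-calculate SE, Wald z, and two-sided P from an estimate and its 95% CI.

    Assumption (not confirmed by the source): the CI is ln_rr +/- z_crit * SE.
    """
    se = (upper - lower) / (2.0 * z_crit)    # half-width of the CI divided by z_crit
    z = ln_rr / se                           # Wald z statistic
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal tail probability
    return se, z, p


# First row of the table (rs1800057): lnRR = -0.47, 95% CI (-0.95, -0.01)
se, z, p = wald_from_ci(-0.47, -0.95, -0.01)
print(f"SE = {se:.3f}, z = {z:.2f}, P = {p:.3f}")
```

For this row the recomputed P value is about 0.05, close to the tabulated 0.046; the difference is consistent with rounding of the estimate and interval limits.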
(a)
| True SNP | Average countsa | Posterior SNP inclusionb: −1 | Posterior SNP inclusionb: 0 | Posterior SNP inclusionb: 1 | BFc >3 | BFc >20 | BFc >150 |
|---|---|---|---|---|---|---|---|
| −1 | 17.5 | 24.14% | 71.75% | 4.11% | 25.54% | 17.49% | 12.46% |
| 0 | 348.1 | 3.19% | 93.76% | 3.05% | 3.90% | 0.68% | 0.19% |
| 1 | 18.4 | 3.88% | 70.19% | 25.94% | 28.15% | 19.13% | 15.54% |
(b)
| True gene | Average countsd | Posterior gene inclusione: not included | Posterior gene inclusione: included | BFf >3 | BFf >20 | BFf >150 |
|---|---|---|---|---|---|---|
| Not included | 13.9 | 55.95% | 44.05% | 3.67% | 0.58% | 0.15% |
| Included | 24.1 | 36.55% | 63.45% | 27.71% | 20.01% | 17.14% |
a Average counts of simulated SNP inclusion indicators based on 10 × 10 replicates.
b Average row percentages of the distribution of posterior SNP inclusion indicators based on 10 × 10 replicates.
c Average row percentages of the SNP counts among the range of the indicated Bayes factors based on 10 × 10 replicates.
d Average counts of simulated gene inclusion indicators based on 10 × 10 replicates.
e Average row percentages of the distribution of posterior gene inclusions based on 10 × 10 replicates.
f Average row percentages of the gene counts among the range of the indicated Bayes factors based on 10 × 10 replicates.
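The Bayes factor columns above use the thresholds 3, 20, and 150. One common way to obtain a Bayes factor for inclusion from MCMC output is as the ratio of posterior odds to prior odds of inclusion. The sketch below illustrates that general definition only; the prior inclusion probability used in the example is a hypothetical value, since the source excerpt does not state the prior, and the function names are illustrative rather than taken from the authors' software.

```python
def inclusion_bayes_factor(posterior_prob, prior_prob):
    """Bayes factor for inclusion = posterior odds / prior odds.

    posterior_prob: fraction of MCMC samples in which the SNP or gene is included.
    prior_prob: prior probability of inclusion (hypothetical here).
    """
    if not 0.0 < prior_prob < 1.0:
        raise ValueError("prior_prob must lie strictly between 0 and 1")
    if not 0.0 <= posterior_prob < 1.0:
        raise ValueError("posterior_prob must lie in [0, 1)")
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    prior_odds = prior_prob / (1.0 - prior_prob)
    return posterior_odds / prior_odds


def bf_category(bf):
    """Return the highest threshold among 3, 20, and 150 that the Bayes factor exceeds."""
    for threshold in (150, 20, 3):
        if bf > threshold:
            return f">{threshold}"
    return "<=3"


# Hypothetical example: posterior inclusion 0.60 under a prior inclusion of 0.25
bf = inclusion_bayes_factor(0.60, 0.25)
print(round(bf, 2), bf_category(bf))
```

Note that the tabulated percentages decrease from the >3 column to the >150 column, which suggests the columns are cumulative; `bf_category` instead reports only the highest threshold exceeded.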