Yuanjia Wang, Nanshi Sha, Yixin Fang.
Abstract
Single-locus analysis is often used to analyze genome-wide association (GWA) data, but such analysis is subject to severe multiple-comparisons adjustment. Multivariate logistic regression is proposed to fit a multi-locus model for case-control data. However, when the sample size is much smaller than the number of single-nucleotide polymorphisms (SNPs) or when correlation among SNPs is high, traditional multivariate logistic regression breaks down. To accommodate the scale of data from a GWA study while controlling for collinearity and overfitting in a high-dimensional predictor space, we propose a variable selection procedure using Bayesian logistic regression. We explored a connection between Bayesian regression with certain priors and L1- and L2-penalized logistic regression. After analyzing a large number of SNPs simultaneously in a Bayesian regression, we selected important SNPs for further consideration. With far fewer SNPs of interest, problems of multiple comparisons and collinearity are less severe. We conducted simulation studies to examine the probability of correctly selecting disease-contributing SNPs and applied the developed methods to analyze the Genetic Analysis Workshop 16 North American Rheumatoid Arthritis Consortium data.
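The connection between priors and penalties mentioned in the abstract can be sketched with the standard maximum a posteriori (MAP) argument. The abstract does not name the priors, so the Gaussian and Laplace choices below, and the symbols $\tau^2$ and $b$, are assumptions based on the usual correspondence rather than the authors' exact formulation.

```latex
% MAP estimate: maximize log-likelihood plus log-prior
\hat{\beta}_{\mathrm{MAP}}
  = \arg\max_{\beta}\,\bigl\{ \ell(\beta) + \log \pi(\beta) \bigr\},
\qquad
\ell(\beta) = \sum_{i=1}^{n} \Bigl[ y_i\, x_i^{\top}\beta
  - \log\bigl(1 + e^{x_i^{\top}\beta}\bigr) \Bigr]

% Gaussian prior -> L2 (ridge) penalty
\beta_j \sim N(0, \tau^2):\quad
\log \pi(\beta) = -\frac{1}{2\tau^2} \sum_{j=1}^{p} \beta_j^2 + \text{const}

% Laplace (double-exponential) prior -> L1 (lasso) penalty
\pi(\beta_j) = \frac{1}{2b}\, e^{-|\beta_j|/b}:\quad
\log \pi(\beta) = -\frac{1}{b} \sum_{j=1}^{p} |\beta_j| + \text{const}
```

Under these assumptions, the MAP estimate coincides with penalized logistic regression with penalty weight $\lambda = 1/(2\tau^2)$ for the L2 case and $\lambda = 1/b$ for the L1 case, so a tighter prior corresponds to stronger shrinkage.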
Year: 2009 PMID: 20018005 PMCID: PMC2795912 DOI: 10.1186/1753-6561-3-s7-s16
Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561
Figure 1.
Bayesian logistic regression of 2705 SNPs on chromosome 9
| | Additive model | | | Dominant model | | |
|---|---|---|---|---|---|---|
| Rank | SNP | Position | abs ( | SNP | Position | abs ( |
| 1 | rs1407869 | 101353456 | 8.02 | rs7864653 | 100860678 | 7.16 |
| 2 | rs4437724 | 113188649 | 7.76 | rs10989329 | 100794635 | 7.16 |
| 3 | rs10120479 | 111426956 | 6.97 | rs4237190 | 97922972 | 6.58 |
| 4 | rs9697192 | 116879138 | 6.97 | rs6478644 | 123942505 | 6.42 |
| 5 | rs3824535 | 122763410 | 6.90 | rs1407869 | 101353456 | 6.03 |
| 6 | rs10491578 | 116463442 | 6.39 | rs2229594 | 101204219 | 5.97 |
| 7 | rs10121681 | 111718477 | 6.37 | rs10820559 | 103716588 | 5.87 |
| 8 | rs694428 | 117692812 | 6.13 | rs1536705 | 126851425 | 5.86 |
| 9 | rs2900180 | 120785936 | 5.96 | rs2564362 | 123365200 | 5.74 |
| 10 | rs11243755 | 132287257 | 5.96 | rs10978456 | 106155366 | 5.73 |
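A ranking of this kind could be produced along the following lines. This is a minimal sketch, not the authors' implementation: it assumes a Gaussian prior on each slope (equivalent to an L2 penalty), fits the MAP estimate by plain gradient descent, and ranks SNPs by absolute coefficient as a stand-in for the statistic in the truncated "abs (" column. The function names, the toy data, and all tuning values are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_map_logistic(X, y, prior_var=1.0, lr=0.1, n_iter=300):
    """MAP fit under independent N(0, prior_var) priors on the slopes.

    Equivalent to adding sum(beta_j^2) / (2 * prior_var) to the negative
    log-likelihood. The intercept is left unpenalized. Hypothetical helper
    that uses plain gradient descent for readability, not speed.
    """
    n, p = len(X), len(X[0])
    intercept, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            resid = sigmoid(intercept + sum(b * x for b, x in zip(beta, xi))) - yi
            g0 += resid
            for j in range(p):
                g[j] += resid * xi[j]
        intercept -= lr * g0 / n
        # Gradient of the negative log-posterior: likelihood term plus prior term.
        beta = [b - lr * (gj + b / prior_var) / n for b, gj in zip(beta, g)]
    return intercept, beta


def rank_top(beta, k):
    """Indices of the k largest |beta_j|, in descending order."""
    return sorted(range(len(beta)), key=lambda j: abs(beta[j]), reverse=True)[:k]


# Toy data (assumed, not the GAW16 data): additive coding with 0/1/2 copies of
# the minor allele; SNP index 3 is the only one that affects disease risk.
random.seed(0)
n_subjects, n_snps, causal = 150, 15, 3
X = [[sum(random.random() < 0.3 for _ in range(2)) for _ in range(n_snps)]
     for _ in range(n_subjects)]
y = [int(random.random() < sigmoid(-0.6 + 1.0 * row[causal])) for row in X]

_, beta = fit_map_logistic(X, y)
top = rank_top(beta, k=5)
print("top SNP indices by |beta|:", top)
```

A dominant-model version would recode each genotype as 0 or 1 (carrier of at least one minor allele) before fitting; the ranking step stays the same.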
Single-SNP analysis of the top 300 selected SNPs
| | Additive model | | | Dominant model | | |
|---|---|---|---|---|---|---|
| Rank | SNP | Position | | SNP | Position | |
| 1 | rs2900180 | 120785936 | 6.24 × 10⁻⁹ | rs2900180 | 120785936 | 6.24 × 10⁻⁹ |
| 2 | rs1953126 | 120720054 | 2.76 × 10⁻⁸ | rs11787779 | 114820894 | 6.89 × 10⁻⁵ |
| 3 | rs942152 | 121031239 | 3.94 × 10⁻⁶ | rs17148869 | 132180015 | 1.00 × 10⁻⁴ |
| 4 | rs7858974 | 91959665 | 1.26 × 10⁻⁵ | rs7862566 | 117133575 | 2.00 × 10⁻⁴ |
| 5 | rs11787779 | 114820894 | 6.89 × 10⁻⁵ | rs4978629 | 107708375 | 3.00 × 10⁻⁴ |
| 6 | rs6478300 | 117115323 | 7.12 × 10⁻⁵ | rs4978890 | 110046695 | 3.00 × 10⁻⁴ |
| 7 | rs989980 | 106309592 | 1.00 × 10⁻⁴ | rs1333914 | 119662788 | 4.00 × 10⁻⁴ |
| 8 | rs17148869 | 132180015 | 1.00 × 10⁻⁴ | rs1332408 | 122271713 | 4.00 × 10⁻⁴ |
| 9 | rs7862566 | 117133575 | 2.00 × 10-4 | rs2095069 | 94782055 | 0.001 |
| 10 | rs945246 | 119953710 | 2.00 × 10-4 | rs4743420 | 100567644 | 0.0011 |
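The abstract's point that multiple comparisons are less severe after selection can be illustrated with a short calculation. This sketch assumes a Bonferroni correction at a family-wise level of 0.05 (neither is stated in the source) and reads the unlabeled numeric column of the additive model as p-values, with the flattened exponents interpreted as powers of ten. It compares the per-test threshold for all 2705 chromosome 9 SNPs with the threshold for the 300 selected SNPs.

```python
# Additive-model values for the ten SNPs listed in the table above,
# read as p-values (an assumption; the column header is missing).
additive_p = [6.24e-9, 2.76e-8, 3.94e-6, 1.26e-5, 6.89e-5,
              7.12e-5, 1.00e-4, 1.00e-4, 2.00e-4, 2.00e-4]

ALPHA = 0.05  # assumed family-wise error level, not taken from the source


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


thr_all = bonferroni_threshold(ALPHA, 2705)      # all SNPs on chromosome 9
thr_selected = bonferroni_threshold(ALPHA, 300)  # top 300 selected SNPs

pass_all = sum(p < thr_all for p in additive_p)
pass_selected = sum(p < thr_selected for p in additive_p)

print(f"threshold with 2705 tests: {thr_all:.2e}, SNPs passing: {pass_all}")
print(f"threshold with 300 tests:  {thr_selected:.2e}, SNPs passing: {pass_selected}")
```

Under these assumptions the per-test threshold loosens from roughly 1.8 × 10⁻⁵ to roughly 1.7 × 10⁻⁴, and eight of the ten listed additive-model SNPs fall below it rather than four.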
Figure 2.