Zhe Liu, Yuanyuan Shen, Jurg Ott.
Abstract
BACKGROUND: In genome-wide association studies, it is widely accepted that multilocus methods are more powerful than testing single-nucleotide polymorphisms (SNPs) one at a time. Among statistical approaches considering many predictors simultaneously, scan statistics are an effective tool for detecting susceptibility genomic regions and mapping disease genes. In this study, inspired by the idea of scan statistics, we propose a novel sliding window-based method for identifying a parsimonious subset of contiguous SNPs that best predict disease status.
Year: 2011 PMID: 21958005 PMCID: PMC3224109 DOI: 10.1186/1471-2105-12-384
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Summary of steps in the GRLR method. Step 1: Construct search region; Step 2: SNP truncation; Step 3: Apply forward selection to the region, fitting the GRLR model in each iteration; Step 4: Calculate the p-value for the final model and switch to the next region.
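Steps 1 and 2 of the figure can be illustrated with a minimal Python sketch. It assumes that a search region spans up to u SNPs on each side of a central SNP (so u = 10 gives at most 21 SNPs), that central SNPs are those whose single-locus p-value does not exceed a seed threshold, and that truncation drops SNPs whose single-locus p-value exceeds t. The truncation rule, the function name `build_search_regions`, and its parameter names are illustrative assumptions, not the authors' implementation. Steps 3 and 4 (forward selection with the GRLR model and the final p-value) are not shown.

```python
def build_search_regions(p_values, seed_threshold=0.05, u=10, t=0.05):
    """Return {central_index: [kept SNP indices]} for one chromosome.

    p_values: single-locus p-values ordered by genomic position.
    seed_threshold, u, t: assumed meanings, described in the text above.
    """
    regions = {}
    n = len(p_values)
    for center, p in enumerate(p_values):
        if p > seed_threshold:
            continue  # only SNPs passing the seed threshold anchor a region
        # Step 1: up to u SNPs on each side, clipped at the chromosome ends.
        lo, hi = max(0, center - u), min(n, center + u + 1)
        # Step 2 (assumed rule): keep SNPs with p-value <= t; the central
        # SNP is always retained.
        kept = [i for i in range(lo, hi) if p_values[i] <= t or i == center]
        regions[center] = kept
    return regions


# Toy p-values (hypothetical), one seed SNP at index 3.
toy_regions = build_search_regions(
    [0.5, 0.04, 0.2, 0.001, 0.03, 0.9], seed_threshold=0.01, u=1, t=0.05
)
```

In this toy call the region around index 3 covers indices 2 to 4, and index 2 is dropped by truncation because its p-value (0.2) exceeds t.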
Power calculation under Scenario (A)
| Strategy | | | | | | |
|---|---|---|---|---|---|---|
| I | 0.196 | 0.206 | 0.198 | 0.210 | 0.226 | 0.196 |
| I | 0.638 | 0.620 | 0.602 | 0.688 | 0.670 | 0.650 |
| I | 0.792 | 0.734 | 0.730 | 0.790 | 0.828 | 0.826 |
| I | 0.184 | 0.216 | 0.194 | 0.194 | 0.204 | 0.206 |
| I | 0.712 | 0.642 | 0.622 | 0.690 | 0.726 | 0.704 |
| I | 0.860 | 0.794 | 0.788 | 0.902 | 0.888 | 0.870 |
| II | 0.008 | 0.078 | 0.070 | 0.032 | 0.052 | 0.124 |
| II | 0.122 | 0.266 | 0.252 | 0.340 | 0.392 | 0.558 |
| II | 0.354 | 0.304 | 0.306 | 0.526 | 0.624 | 0.768 |
| II | 0.018 | 0.088 | 0.064 | 0.066 | 0.078 | 0.126 |
| II | 0.202 | 0.296 | 0.300 | 0.364 | 0.458 | 0.608 |
| II | 0.400 | 0.370 | 0.360 | 0.628 | 0.690 | 0.806 |
Results of power simulations for six methods under Scenario (A) (two causal SNPs). Two definitions of testing success are compared: Strategy I (at least one of the two SNPs is significant) and Strategy II (both SNPs are significant). All simulations use 500 replication runs. Corr = correlation coefficient; MAF = minor allele frequency.
Power calculation under Scenario (B)
| Strategy | | | | | | |
|---|---|---|---|---|---|---|
| I | 0.302 | 0.346 | 0.320 | 0.330 | 0.388 | 0.310 |
| I | 0.740 | 0.716 | 0.732 | 0.826 | 0.832 | 0.696 |
| I | 0.906 | 0.888 | 0.900 | 0.932 | 0.940 | 0.872 |
| I | 0.342 | 0.422 | 0.422 | 0.436 | 0.404 | 0.368 |
| I | 0.814 | 0.846 | 0.832 | 0.838 | 0.854 | 0.784 |
| I | 0.948 | 0.942 | 0.946 | 0.960 | 0.966 | 0.900 |
| II | 0.040 | 0.200 | 0.180 | 0.158 | 0.202 | 0.270 |
| II | 0.280 | 0.478 | 0.490 | 0.630 | 0.704 | 0.680 |
| II | 0.544 | 0.622 | 0.612 | 0.804 | 0.862 | 0.856 |
| II | 0.058 | 0.256 | 0.264 | 0.210 | 0.238 | 0.314 |
| II | 0.440 | 0.580 | 0.582 | 0.646 | 0.730 | 0.758 |
| II | 0.700 | 0.672 | 0.678 | 0.858 | 0.914 | 0.890 |
| III | 0.002 | 0.022 | 0.016 | 0.044 | 0.066 | 0.100 |
| III | 0.030 | 0.082 | 0.066 | 0.340 | 0.434 | 0.460 |
| III | 0.140 | 0.084 | 0.076 | 0.558 | 0.640 | 0.656 |
| III | 0.004 | 0.056 | 0.050 | 0.062 | 0.080 | 0.122 |
| III | 0.124 | 0.088 | 0.090 | 0.324 | 0.422 | 0.490 |
| III | 0.292 | 0.094 | 0.092 | 0.550 | 0.672 | 0.704 |
Results of power simulations for six methods under Scenario (B) (three causal SNPs). Three definitions of testing success are compared: Strategy I (at least one of the three SNPs is significant), Strategy II (at least two of the three SNPs are significant), and Strategy III (all three SNPs are significant). All simulations use 500 replication runs. Corr = correlation coefficient; MAF = minor allele frequency.
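The success definitions above amount to counting, across replication runs, how often at least k of the causal SNPs are declared significant. A minimal Python sketch follows; the function name `empirical_power` and the toy input are hypothetical and only illustrate the counting rule, not the simulation design of the study.

```python
def empirical_power(significant, min_hits):
    """Fraction of replicates with at least `min_hits` significant causal SNPs.

    significant: one tuple of booleans per replicate, one entry per causal SNP.
    """
    successes = sum(1 for rep in significant if sum(rep) >= min_hits)
    return successes / len(significant)


# Toy input (hypothetical): 4 replicates, 3 causal SNPs.
replicates = [
    (True, True, True),
    (True, False, True),
    (False, False, True),
    (False, False, False),
]
power_I = empirical_power(replicates, 1)    # Strategy I: at least one SNP
power_II = empirical_power(replicates, 2)   # Strategy II: at least two SNPs
power_III = empirical_power(replicates, 3)  # Strategy III: all three SNPs
```

With 500 replication runs, each estimate is a count divided by 500, so every tabulated power is a multiple of 0.002. Scenario (A) follows the same rule with two causal SNPs, where Strategy II requires both.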
Figure 2. Impact of different thresholds on power. The effects of different truncation thresholds t = 0.05 or 0.10, and different sizes of regions 11, 21, or 41 (i.e. u = 5, 10, or 20) on power (y-axis). Only Scenario (B) (three causal SNPs) is considered, and the minor allele frequency is set to 0.50.
Results for heroin addiction data using logistic regression
| Rank | Chr | Position (bp) | Odds ratio | p-value | Bonferroni p-value | Empirical p-value |
|---|---|---|---|---|---|---|
| 1 | 1 | 189929064 | 2.36 | 2.23E-04 | 1.0000 | 0.573 |
| 2 | 20 | 7174248 | 2.04 | 4.20E-04 | 1.0000 | 0.841 |
| 3 | 13 | 58410147 | 2.23 | 4.44E-04 | 1.0000 | 0.866 |
| 4 | 13 | 58410016 | 2.26 | 4.54E-04 | 1.0000 | 0.876 |
| 5 | 4 | 99955054 | 0.48 | 6.68E-04 | 1.0000 | 0.955 |
| 6 | 11 | 42426136 | 2.33 | 8.68E-04 | 1.0000 | 0.984 |
| 7 | 5 | 158958294 | 2.19 | 8.70E-04 | 1.0000 | 0.984 |
| 8 | 17 | 12558426 | 2.26 | 9.37E-04 | 1.0000 | 0.986 |
| 9 | 9 | 15340199 | 0.50 | 9.48E-04 | 1.0000 | 0.986 |
| 10 | 4 | 126424833 | 0.17 | 1.34E-03 | 1.0000 | 0.998 |
Analysis results for the published dataset on heroin addiction using logistic regression. Odds ratios, original p-values, p-values after Bonferroni correction, and empirical p-values (derived from 1000 permutations) of the top 10 SNPs are listed.
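The two adjusted p-values named in this caption can be sketched in Python as follows. The Bonferroni adjustment multiplies the original p-value by the number of tests and caps the result at 1. For the empirical p-value, the sketch assumes a genome-wide scheme in which the observed p-value is compared with the smallest p-value obtained in each permuted dataset; the source states only that 1000 permutations were used, so this scheme, the function names, and all numbers below are illustrative assumptions.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


def empirical_p(observed_p, permuted_min_ps):
    """Share of permutations whose smallest p-value is <= the observed one.

    permuted_min_ps: one genome-wide minimum p-value per permuted dataset
    (assumed scheme, see the text above).
    """
    hits = sum(1 for p in permuted_min_ps if p <= observed_p)
    return hits / len(permuted_min_ps)


# Hypothetical numbers: 100,000 tests and four permutations.
adjusted = bonferroni(2.23e-4, 100_000)          # product exceeds 1, so capped
emp = empirical_p(0.01, [0.5, 0.005, 0.2, 0.01])  # two of four are <= 0.01
```

Under this scheme, a reported empirical p-value of 0.000 would mean that none of the 1000 permuted datasets produced a p-value as small as the observed one.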
Results for both datasets using GRLR
| Dataset | Chr | Selected SNPs | p-value | Empirical p-value |
|---|---|---|---|---|
| heroin addiction | 1 | {rs1408830, rs965972} | 1.26 × 10^-7 | 0.027 |
| AMD HK | 10 | {rs2736911, rs10490924, rs763720} | 1.32 × 10^-11 | 0.000 |
Analysis results for two published datasets using our GRLR method. Genomic search regions are constructed based on the SNPs whose single-locus p-values are no larger than a fixed threshold 0.05 for the heroin addiction data, and 0.01 for the AMD Hong Kong data; the truncation threshold within each search region is set to 0.05; the maximal length on each side of the central SNP is set to 10; the tuning parameter λ in GRLR model is set to 1.0. Empirical p-values are derived from 1000 permutations.
Results for AMD Hong Kong data using logistic regression
| Rank | Chr | Position (bp) | Odds ratio | p-value | Bonferroni p-value | Empirical p-value |
|---|---|---|---|---|---|---|
| 1 | 10 | 124204438 | 0.26 | 1.25E-09 | 0.0001 | 0.001 |
| 2 | 8 | 54292668 | 0.18 | 5.45E-06 | 0.4428 | 0.062 |
| 3 | 7 | 4340896 | 0.36 | 3.88E-05 | 1.0000 | 0.550 |
| 4 | 13 | 69496879 | 0.42 | 5.86E-05 | 1.0000 | 0.738 |
| 5 | 8 | 53838124 | 3.09 | 7.49E-05 | 1.0000 | 0.836 |
| 6 | 4 | 182400252 | 3.21 | 8.73E-05 | 1.0000 | 0.888 |
| 7 | 1 | 53333649 | 0.41 | 1.10E-04 | 1.0000 | 0.940 |
| 8 | 5 | 33931083 | 3.03 | 1.17E-04 | 1.0000 | 0.943 |
| 9 | 5 | 65314237 | 0.45 | 1.55E-04 | 1.0000 | 0.980 |
| 10 | 20 | 95685 | 0.42 | 1.66E-04 | 1.0000 | 0.987 |
Analysis results for the published dataset on AMD Hong Kong using logistic regression. Odds ratios, original p-values, p-values after Bonferroni correction, and empirical p-values (derived from 1000 permutations) of the top 10 SNPs are listed.