Jing Li
Abstract
Large-scale genome-wide association studies are increasingly common, due in large part to recent advances in genotyping technology. Despite a dramatic drop in genotyping costs, genotyping thousands of individuals for hundreds of thousands of single-nucleotide polymorphisms (SNPs) in a whole-genome association study is still too expensive for many researchers. A two-stage design offers a promising alternative. In the first stage, only a small fraction of the samples are genotyped and tested on a dense set of SNPs. In the second stage, only the small subset of markers that show moderate association with the disease are genotyped. In this report, I develop an approach to select and prioritize SNPs for association studies with a two-stage or multi-stage design. In the first stage, the method not only evaluates the association of each SNP with the disease of interest but also explicitly explores correlations among SNPs. I applied the approach to the simulated Genetic Analysis Workshop 15 Problem 3 data sets, which model the complex genetic architecture of rheumatoid arthritis. Results show that the method can greatly reduce the number of SNPs required in later stage(s) without sacrificing mapping precision.
Year: 2007 PMID: 18466479 PMCID: PMC2367522 DOI: 10.1186/1753-6561-1-s1-s136
Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561
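The abstract describes the first stage only in outline and does not say which association test or which clustering procedure is used. The following is a minimal Python sketch of a first stage under stated assumptions, not the author's method. The trend-test statistic n·r² between allele dosage and case status (referred to a chi-square distribution with 1 df) stands in for the association test. A greedy, correlation-based grouping step (keep the most significant SNP, drop later SNPs whose r² with it reaches a cutoff) stands in for the exploration of correlations among SNPs. The function names, the `r2_threshold` cutoff, and the random draw of the stage-one subsample via `seed` are all hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def trend_test_p(dosages: Sequence[int], status: Sequence[int]) -> float:
    """Assumed stage-one test: n * r^2 referred to a chi-square with 1 df."""
    stat = len(status) * pearson_r(dosages, status) ** 2
    # Upper tail of chi-square(1): P(Z^2 > stat) = erfc(sqrt(stat / 2))
    return math.erfc(math.sqrt(stat / 2))


def stage_one_select(
    geno: Dict[str, List[int]],  # SNP id -> allele dosages (0/1/2), one per sample
    status: List[int],           # 1 = case, 0 = control
    f: float,                    # fraction of samples genotyped in stage one
    alpha: float,                # stage-one significance level
    r2_threshold: float = 0.8,   # hypothetical cutoff for "correlated" SNPs
    seed: int = 0,               # hypothetical seed for drawing the stage-one subsample
) -> List[str]:
    """Return one representative SNP per correlated group, to be genotyped in stage two."""
    n1 = max(2, round(f * len(status)))
    idx = random.Random(seed).sample(range(len(status)), n1)
    sub_status = [status[i] for i in idx]
    sub_geno = {snp: [d[i] for i in idx] for snp, d in geno.items()}

    pvals = {snp: trend_test_p(d, sub_status) for snp, d in sub_geno.items()}
    # SNPs passing alpha, most significant first (ties keep input order)
    passing = sorted((s for s, p in pvals.items() if p < alpha), key=lambda s: pvals[s])

    representatives: List[str] = []
    for snp in passing:
        # Keep a SNP only if it is not strongly correlated with a more significant one
        if all(pearson_r(sub_geno[snp], sub_geno[r]) ** 2 < r2_threshold for r in representatives):
            representatives.append(snp)
    return representatives


if __name__ == "__main__":
    status = [0, 1] * 10
    geno = {
        "snp_a": [0, 2] * 10,        # tracks case status exactly
        "snp_b": [0, 2] * 10,        # duplicate of snp_a (r^2 = 1)
        "snp_c": [0, 0, 1, 1] * 5,   # uncorrelated with case status
        "snp_d": [0, 1, 1, 1] * 5,   # moderately associated; r^2 = 1/3 with snp_a
    }
    print(stage_one_select(geno, status, f=1.0, alpha=0.05))  # ['snp_a', 'snp_d']
```

In the toy run, `snp_b` passes the significance filter but is dropped as redundant with `snp_a`, so only two SNPs would be carried into stage two instead of three. This mirrors the idea of shrinking the stage-two list by exploiting correlation, although the paper's own clustering may work differently.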
Mean (SD) number of positive SNPs by significance level α, fraction f of samples genotyped in stage one, and sample size (100, 200, 300), for each method: one-stage design (1 stage), two-stage design (2 stage), and two-stage design with clustering (2 stage-c)
| α | f | 1 stage (n = 100) | 2 stage (n = 100) | 2 stage-c (n = 100) | 1 stage (n = 200) | 2 stage (n = 200) | 2 stage-c (n = 200) | 1 stage (n = 300) | 2 stage (n = 300) | 2 stage-c (n = 300) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.05 | 0.3 | – | 15.8 (± 0.42) | 6.8 (± 0.20) | – | 27.6 (± 0.54) | 10.7 (± 0.25) | – | 39.7 (± 0.62) | 15.0 (± 0.29) |
| 0.05 | 0.4 | 17.9 (± 0.44) | 16.1 (± 0.43) | 6.5 (± 0.19) | 31.1 (± 0.62) | 28.3 (± 0.58) | 11.3 (± 0.25) | 44.9 (± 0.71) | 40.1 (± 0.63) | 15.3 (± 0.27) |
| 0.05 | 0.5 | – | 16.1 (± 0.42) | 6.9 (± 0.18) | – | 28.2 (± 0.56) | 11.4 (± 0.26) | – | 40.5 (± 0.64) | 15.7 (± 0.27) |
| 0.01 | 0.3 | – | 14.0 (± 0.41) | 6.0 (± 0.20) | – | 24.6 (± 0.48) | 9.8 (± 0.23) | – | 35.6 (± 0.57) | 13.5 (± 0.27) |
| 0.01 | 0.4 | 15.7 (± 0.41) | 14.2 (± 0.38) | 6.0 (± 0.18) | 27.6 (± 0.54) | 25.0 (± 0.47) | 10.4 (± 0.22) | 39.4 (± 0.62) | 36.0 (± 0.54) | 14.0 (± 0.26) |
| 0.01 | 0.5 | – | 14.2 (± 0.39) | 6.2 (± 0.16) | – | 25.1 (± 0.48) | 10.3 (± 0.23) | – | 36.1 (± 0.56) | 14.3 (± 0.25) |

–: not applicable. The one-stage design genotypes all samples in a single stage, so f does not apply; its value is given only in the f = 0.4 row.
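The size of the reduction reported in the abstract can be read off the table above. The short Python calculation below uses only the mean values reported there and computes the fraction of two-stage SNPs that remain after clustering (2 stage-c divided by 2 stage). The dictionary layout and the `retained_fraction` helper are just a hypothetical encoding of the table, not part of the method.

```python
# Mean numbers of positive SNPs copied from the table above:
# key = (alpha, f, sample size), value = (2 stage, 2 stage-c)
MEANS = {
    (0.05, 0.3, 100): (15.8, 6.8), (0.05, 0.3, 200): (27.6, 10.7), (0.05, 0.3, 300): (39.7, 15.0),
    (0.05, 0.4, 100): (16.1, 6.5), (0.05, 0.4, 200): (28.3, 11.3), (0.05, 0.4, 300): (40.1, 15.3),
    (0.05, 0.5, 100): (16.1, 6.9), (0.05, 0.5, 200): (28.2, 11.4), (0.05, 0.5, 300): (40.5, 15.7),
    (0.01, 0.3, 100): (14.0, 6.0), (0.01, 0.3, 200): (24.6, 9.8), (0.01, 0.3, 300): (35.6, 13.5),
    (0.01, 0.4, 100): (14.2, 6.0), (0.01, 0.4, 200): (25.0, 10.4), (0.01, 0.4, 300): (36.0, 14.0),
    (0.01, 0.5, 100): (14.2, 6.2), (0.01, 0.5, 200): (25.1, 10.3), (0.01, 0.5, 300): (36.1, 14.3),
}


def retained_fraction(two_stage: float, clustered: float) -> float:
    """Share of the plain two-stage SNP list that survives clustering."""
    return clustered / two_stage


fractions = {key: retained_fraction(*pair) for key, pair in MEANS.items()}
for (alpha, f, n), frac in sorted(fractions.items()):
    print(f"alpha={alpha} f={f} n={n}: {frac:.2f}")
# Last line printed: range: 0.38 to 0.44
print(f"range: {min(fractions.values()):.2f} to {max(fractions.values()):.2f}")
```

On these means, clustering carries roughly 38 to 44 percent of the two-stage SNPs forward, which is a reduction of about 56 to 62 percent across every combination of α, f, and sample size in the table.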
Mean (SD) significance level, -log10(p), by fraction f of samples genotyped in stage one and sample size (100, 200, 300), for each method (1 stage, 2 stage, 2 stage-c)
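| f | 1 stage (n = 100) | 2 stage (n = 100) | 2 stage-c (n = 100) | 1 stage (n = 200) | 2 stage (n = 200) | 2 stage-c (n = 200) | 1 stage (n = 300) | 2 stage (n = 300) | 2 stage-c (n = 300) |
|---|---|---|---|---|---|---|---|---|---|
| 0.3 | – | 10.8 (± 0.29) | 11.0 (± 0.30) | – | 25.1 (± 0.40) | 25.3 (± 0.41) | – | 39.7 (± 0.47) | 39.9 (± 0.49) |
| 0.4 | 11.6 (± 0.30) | 10.7 (± 0.29) | 11.0 (± 0.30) | 26.1 (± 0.40) | 25.1 (± 0.40) | 25.4 (± 0.40) | 40.8 (± 0.47) | 39.7 (± 0.47) | 40.1 (± 0.47) |
| 0.5 | – | 10.8 (± 0.30) | 11.1 (± 0.30) | – | 25.2 (± 0.40) | 25.5 (± 0.40) | – | 39.7 (± 0.47) | 40.1 (± 0.47) |

–: not applicable. The one-stage design genotypes all samples in a single stage, so f does not apply; its value is given only in the f = 0.4 row.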
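Because the table above reports -log10(p), a difference of d between two columns corresponds to a 10^d-fold difference in the p-value. The Python sketch below applies this to the f = 0.4 means and compares the one-stage design with the two-stage design with clustering. It uses only the reported means, says nothing about the variability of the differences, and the helper names are hypothetical.

```python
# Mean -log10(p) at f = 0.4, copied from the table above:
# n -> (1 stage, 2 stage, 2 stage-c)
NEG_LOG10_P = {100: (11.6, 10.7, 11.0), 200: (26.1, 25.1, 25.4), 300: (40.8, 39.7, 40.1)}


def p_value(neg_log10_p: float) -> float:
    """Invert the -log10 transform."""
    return 10.0 ** (-neg_log10_p)


def p_fold_change(neg_log10_a: float, neg_log10_b: float) -> float:
    """How many times larger p_b is than p_a, given both on the -log10 scale."""
    return 10.0 ** (neg_log10_a - neg_log10_b)


for n, (one_stage, two_stage, clustered) in NEG_LOG10_P.items():
    print(f"n={n}: p(1 stage) = {p_value(one_stage):.1e}; "
          f"p(2 stage-c) is {p_fold_change(one_stage, clustered):.1f}x larger")
```

A gap of 0.6 to 0.7 on the -log10 scale means the clustered two-stage p-value is roughly 4 to 5 times larger than the one-stage value. Even the smallest mean in the table, -log10(p) = 10.7, still corresponds to p of about 2 × 10^-11.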
Mean (SD) distance (kbp) of the predicted locus from the disease locus, by fraction f of samples genotyped in stage one and sample size (100, 200, 300), for each method (1 stage, 2 stage, 2 stage-c)
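| f | 1 stage (n = 100) | 2 stage (n = 100) | 2 stage-c (n = 100) | 1 stage (n = 200) | 2 stage (n = 200) | 2 stage-c (n = 200) | 1 stage (n = 300) | 2 stage (n = 300) | 2 stage-c (n = 300) |
|---|---|---|---|---|---|---|---|---|---|
| 0.3 | – | 68 (± 5.8) | 85 (± 5.8) | – | 85 (± 6.0) | 90 (± 5.9) | – | 82 (± 5.7) | 83 (± 5.7) |
| 0.4 | 68 (± 5.7) | 67 (± 5.7) | 75 (± 5.8) | 86 (± 6.0) | 87 (± 6.0) | 91 (± 6.0) | 81 (± 5.6) | 81 (± 5.6) | 82 (± 5.7) |
| 0.5 | – | 69 (± 5.8) | 73 (± 5.9) | – | 86 (± 6.0) | 90 (± 6.0) | – | 80 (± 5.7) | 81 (± 5.7) |

–: not applicable. The one-stage design genotypes all samples in a single stage, so f does not apply; its value is given only in the f = 0.4 row.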
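The claim of no loss in mapping precision can be checked against the table above by expressing each two-stage mean distance relative to the one-stage mean. The Python sketch below does this for the f = 0.4 row. It is plain arithmetic on the reported means, and the `relative_change` helper is hypothetical.

```python
# Mean distance (kbp) of the predicted locus from the disease locus at f = 0.4,
# copied from the table above: n -> (1 stage, 2 stage, 2 stage-c)
DISTANCE_KBP = {100: (68, 67, 75), 200: (86, 87, 91), 300: (81, 81, 82)}


def relative_change(reference: float, other: float) -> float:
    """Change of `other` relative to `reference`, as a fraction."""
    return (other - reference) / reference


for n, (one_stage, two_stage, clustered) in DISTANCE_KBP.items():
    print(f"n={n}: 2 stage {relative_change(one_stage, two_stage):+.1%}, "
          f"2 stage-c {relative_change(one_stage, clustered):+.1%} vs 1 stage")
```

At n = 200 and n = 300 the clustered design's mean distance is within 5 kbp (about 6 percent or less) of the one-stage value, which is smaller than the reported ± of roughly 6 kbp. At n = 100 the gap is 7 kbp (about 10 percent).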