Minghui Wang, Lin Wang, Ning Jiang, Tianye Jia, Zewei Luo.
Abstract
BACKGROUND: The theoretical basis of genome-wide association studies (GWAS) is statistical inference of linkage disequilibrium (LD) between any polymorphic marker and a putative disease locus. Most methods widely implemented for such analyses are vulnerable to several key demographic factors, deliver poor statistical power for detecting genuine associations, and have a high false positive rate. Here, we present a likelihood-based statistical approach that properly accounts for the non-random nature of case-control samples with regard to the genotypic distribution at the loci in the populations under study, and that confers flexibility to test for genetic association in the presence of confounding factors such as population structure and non-randomness of samples.
Year: 2013 PMID: 23394771 PMCID: PMC3626840 DOI: 10.1186/1471-2164-14-88
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Conditional probability distributions
[Table body not recoverable: panels (a), (b) and (c), with panel (c) split into Cases and Controls. The probability expressions in the cells were lost.]
Conditional probability distributions of (a) marker genotypes given a disease genotype, (b) disease genotypes given a marker genotype, and (c) marker genotypes given a genotype at the disease locus under the penetrance model of the disease gene in case/control samples. f_i is the penetrance, i.e. the probability that an individual in the population is affected by the disease given that its genotype at the disease locus is i (i = 1, 2 and 3 for genotypes AA, Aa and aa, respectively).
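The role of the penetrances f_i can be illustrated with Bayes' rule: selecting individuals by disease status reweights the genotype distribution at the disease locus by f_i in cases and by (1 - f_i) in controls. The sketch below is illustrative only and is not the table's own derivation. It assumes Hardy-Weinberg genotype frequencies in the population, and the function names and example values are hypothetical.

```python
def genotype_freqs(q):
    """Frequencies of AA, Aa, aa for allele A frequency q.

    Assumption (not stated in the caption): Hardy-Weinberg equilibrium.
    """
    return [q * q, 2 * q * (1 - q), (1 - q) ** 2]


def genotype_given_status(q, f):
    """Return (P(genotype | case), P(genotype | control)) by Bayes' rule.

    f = (f1, f2, f3) are the penetrances of genotypes AA, Aa, aa.
    """
    prior = genotype_freqs(q)
    # Population prevalence: total probability of being affected.
    prevalence = sum(fi * gi for fi, gi in zip(f, prior))
    cases = [fi * gi / prevalence for fi, gi in zip(f, prior)]
    controls = [(1 - fi) * gi / (1 - prevalence) for fi, gi in zip(f, prior)]
    return cases, controls


# Hypothetical example values: q = 0.5 and penetrances (0.2, 0.1, 0.0).
cases, controls = genotype_given_status(q=0.5, f=(0.2, 0.1, 0.0))
print("cases:   ", [round(x, 4) for x in cases])
print("controls:", [round(x, 4) for x in controls])
```

With these values the AA genotype is enriched in cases and depleted in controls relative to its population frequency of 0.25, which is the sense in which case-control samples are non-random with respect to genotype.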
Figure 1. Genome-wide association scan. Association results from (a) stage I, (b) stage II and (c) two-stage combined case and control samples. Each of the three datasets was analysed using Method 1 (black circles), Method 2 (red circles) and Method 3 (blue circles). The red horizontal dashed lines indicate the Bonferroni significance thresholds of P = 1.1 × 10⁻⁷ in (a) and P = 1.5 × 10⁻⁴ in (b) and (c). The triangle at the bottom of (a) shows the estimated linkage disequilibrium structure for the 44 most significant SNPs listed in Table 2. The diamonds and squares in (a) mark the SNPs at which the bootstrap posterior probability for genetic association is either > 80% or within 60–80%.
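A Bonferroni threshold is the family-wise error rate divided by the number of tests. The sketch below shows that relationship and inverts it to see how many tests each quoted threshold would correspond to. The family-wise rate of 0.05 is an assumption, since the caption does not state it or the number of SNPs tested, and the function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold for family-wise error rate alpha."""
    return alpha / n_tests


def implied_tests(alpha, threshold):
    """Number of tests that a given Bonferroni threshold corresponds to."""
    return alpha / threshold


ALPHA = 0.05  # assumed family-wise error rate, not given in the caption

for threshold in (1.1e-7, 1.5e-4):
    n = implied_tests(ALPHA, threshold)
    print(f"threshold {threshold:g} corresponds to about {n:,.0f} tests")
```

Under that assumption, the much looser threshold in panels (b) and (c) reflects a far smaller number of SNPs carried forward than were scanned in panel (a).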
Summary of top associations from stage I dataset
| Chromosome region | SNP | Distance (kb)* | P value (M1) | P value (M2) | P value (M3) | BPP % (M1) | BPP % (M2) | BPP % (M3) |
|---|---|---|---|---|---|---|---|---|
| 1p13.2-13.3 | rs17654531 | - | 1.9 × 10⁻⁹ | 3.2 × 10⁻⁶ | 1.2 × 10⁻⁵ | 37 | 22 | 14 |
| | rs10857899 | 328 | 2.7 × 10⁻⁸ | 3.1 × 10⁻⁶ | 3.1 × 10⁻⁶ | 57 | 25 | 27 |
| 2p23.3 | rs7564397 | - | 9.7 × 10⁻⁸ | 0.013 | 0.033 | 55 | 0 | 0 |
| 2q21.2 | rs1474406 | - | 4.3 × 10⁻⁸ | 2.3 × 10⁻³ | 0.001 | 57 | 1 | 3 |
| 2q36.1 | rs1447108 | - | 5.5 × 10⁻⁸ | 2.5 × 10⁻⁴ | 4.4 × 10⁻⁴ | 59 | 4 | 3 |
| 3p24.3 | rs1605527 | - | 2.0 × 10⁻⁸ | 1.0 × 10⁻⁴ | 9.4 × 10⁻⁵ | 53 | 9 | 10 |
| 4p15.2 | rs6820719 | - | 1.6 × 10⁻⁹ | 0.23 | 0.30 | 74 | 0 | 0 |
| | rs7676830 | 23 | 8.6 × 10⁻¹⁰ | 0.12 | 0.15 | 77 | 0 | 0 |
| | rs12649499 | 11 | 4.8 × 10⁻¹⁰ | 0.20 | 0.26 | 77 | 0 | 0 |
| 4q21 | rs11931074 | - | 3.9 × 10⁻⁸ | 5.1 × 10⁻⁸ | 4.8 × 10⁻⁸ | 56 | 54 | 54 |
| | rs356220 | 2 | 7.7 × 10⁻¹¹ | 3.4 × 10⁻⁸ | 7.0 × 10⁻⁸ | 81 | 56 | 52 |
| | rs3857059 | 34 | 5.3 × 10⁻⁸ | 4.0 × 10⁻⁸ | 3.6 × 10⁻⁸ | 56 | 55 | 56 |
| | rs2736990 | 3 | 6.3 × 10⁻¹² | 2.9 × 10⁻⁹ | 5.7 × 10⁻⁹ | 88 | 71 | 67 |
| 6q27 | rs2072638 | - | 1.1 × 10⁻¹¹ | 0.014 | 0.012 | 86 | 0 | 0 |
| 7p14-p13 | rs859522 | - | 1.8 × 10⁻⁸ | 9.7 × 10⁻⁶ | 3.4 × 10⁻⁵ | 62 | 21 | 14 |
| 7q21 | rs3779331 | - | 6.6 × 10⁻⁸ | 0.028 | 0.01 | 56 | 0 | 0 |
| 7q21.11 | rs10246477 | - | 9.3 × 10⁻⁸ | 2.3 × 10⁻⁵ | 5.3 × 10⁻⁵ | 56 | 13 | 10 |
| 8p23.2 | rs7013027 | - | 5.8 × 10⁻⁸ | 4.3 × 10⁻⁶ | 1.9 × 10⁻⁶ | 56 | 23 | 29 |
| | rs4875773 | 63 | 1.6 × 10⁻⁸ | 0.02 | 0.044 | 63 | 0 | 0 |
| 8p22 | rs7828611 | - | 8.4 × 10⁻⁸ | 1.2 × 10⁻⁴ | 6.2 × 10⁻⁴ | 55 | 6 | 3 |
| | rs2736050 | 1 | 9.9 × 10⁻¹⁰ | 1.0 × 10⁻⁵ | 2.0 × 10⁻⁴ | 74 | 18 | 5 |
| | rs2009817 | 3 | 2.0 × 10⁻⁹ | 1.3 × 10⁻⁵ | 2.1 × 10⁻⁴ | 72 | 16 | 5 |
| 8q24.23-24.3 | rs4556079 | - | 4.8 × 10⁻⁸ | 5.0 × 10⁻⁶ | 4.8 × 10⁻⁶ | 60 | 20 | 22 |
| | rs11781101 | 14 | 7.3 × 10⁻⁸ | 5.4 × 10⁻⁶ | 5.3 × 10⁻⁶ | 56 | 21 | 22 |
| | rs7004938 | 12 | 3.1 × 10⁻⁸ | 3.0 × 10⁻⁶ | 3.0 × 10⁻⁶ | 59 | 24 | 25 |
| | rs11783351 | 1 | 7.7 × 10⁻⁸ | 5.0 × 10⁻⁶ | 5.5 × 10⁻⁶ | 53 | 21 | 21 |
| 9q21.31 | rs2378554 | - | 6.6 × 10⁻⁸ | 2.0 × 10⁻⁶ | 2.9 × 10⁻⁵ | 54 | 29 | 13 |
| 10p11.21 | rs2492448 | - | 3.8 × 10⁻⁸ | 1.6 × 10⁻⁶ | 3.8 × 10⁻⁶ | 61 | 29 | 24 |
| | rs11591754 | 12 | 4.8 × 10⁻¹⁰ | 2.5 × 10⁻⁷ | 1.7 × 10⁻⁶ | 80 | 43 | 30 |
| | rs7923172 | 102 | 7.0 × 10⁻⁸ | 1.1 × 10⁻⁵ | 1.4 × 10⁻⁵ | 54 | 17 | 16 |
| | rs4934704 | 23 | 7.3 × 10⁻⁸ | 1.2 × 10⁻⁵ | 1.5 × 10⁻⁵ | 54 | 17 | 16 |
| | rs10827492 | 97 | 9.7 × 10⁻⁸ | 1.3 × 10⁻⁵ | 1.7 × 10⁻⁵ | 52 | 16 | 16 |
| 10q24.3 | rs17115100 | - | 2.7 × 10⁻⁸ | 6.9 × 10⁻⁶ | 2.5 × 10⁻⁵ | 37 | 19 | 13 |
| 11p15.2 | rs11605276 | - | 3.4 × 10⁻¹¹ | 0.079 | 0.19 | 86 | 0 | 0 |
| | rs10500796 | 45 | 1.9 × 10⁻⁸ | 0.18 | 0.30 | 61 | 0 | 0 |
| 11q13 | rs1726764 | - | 6.6 × 10⁻⁸ | 0.088 | 0.20 | 53 | 0 | 0 |
| 12p13 | rs10849446 | - | 6.7 × 10⁻⁹ | 1.1 × 10⁻⁴ | 3.7 × 10⁻⁵ | 68 | 6 | 12 |
| 16p13.3 | rs11648673 | - | 5.5 × 10⁻⁸ | 1.3 × 10⁻⁵ | 4.8 × 10⁻⁷ | 56 | 15 | 38 |
| 17q21 | rs169201 | - | 1.0 × 10⁻⁷ | 6.5 × 10⁻⁶ | 1.2 × 10⁻⁷ | 57 | 19 | 49 |
| | rs199533 | 39 | 4.1 × 10⁻⁸ | 2.8 × 10⁻⁶ | 5.0 × 10⁻⁸ | 60 | 24 | 55 |
| 17q24.3 | rs558076 | - | 6.6 × 10⁻⁸ | 1.0 × 10⁻⁴ | 2.5 × 10⁻⁵ | 57 | 7 | 14 |
| | rs817097 | 42 | 5.0 × 10⁻⁸ | 8.1 × 10⁻⁶ | 6.2 × 10⁻⁶ | 56 | 18 | 18 |
| 20p12.1 | rs6041636 | - | 9.9 × 10⁻⁹ | 0.16 | 0.24 | 66 | 0 | 0 |
| 21q22.3 | rs2070535 | - | 5.0 × 10⁻⁸ | 0.060 | 0.096 | 54 | 0 | 0 |
Significance and bootstrap posterior probabilities (BPP) for the 44 significant SNPs detected by Method 1 (M1) from the stage I dataset. Shaded are the regions at which the genetic association was tested by Method 2 (M2) and Method 3 (M3) at the same significance level. *Distance (kb) from the previous significant SNP in the same chromosome region.
Figure 2. Significance of Parkinson’s disease candidate genes. The most significant SNP within the ±2.5 Mb chromosome region surrounding each of 25 Parkinson’s disease (PD) candidate genes. The physical distance (Mb) of the SNP from the corresponding PD candidate gene is given in parentheses. P values are calculated from analysis of the stage I dataset with Method 1 (squares), Method 2 (up triangles) and Method 3 (down triangles), and are presented against the color bar depicting varying levels of significance probability. Note that some data points overlap. n refers to the estimated number of false discoveries for a given P value.
Parameters and results of scheme a simulation
| Population | p | q | D | Estimate of q (mean ± s.d.) | Estimate of D (mean ± s.d.) | χ² (mean ± s.d.) | ρ (%) | χ² (mean ± s.d.) | ρ (%) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.5 | 0.5 | 0 | - | 0.004 ± 0.012 | 1.9 ± 2.5 | 6.9 | 1.0 ± 1.3 | 4.2 |
| 2 | 0.3 | 0.7 | 0 | - | 0.005 ± 0.011 | 2.0 ± 2.8 | 7.3 | 1.0 ± 1.4 | 4.5 |
| 3 | 0.7 | 0.3 | 0 | - | 0.002 ± 0.011 | 1.9 ± 2.7 | 6.7 | 1.0 ± 1.5 | 5.0 |
| 4 | 0.5 | 0.5 | 0.15 | 0.50 ± 0.05 | 0.148 ± 0.015 | 184.4 ± 42.8 | 100 | 73.3 ± 14.0 | 100 |
| 5 | 0.5 | 0.5 | 0.10 | 0.50 ± 0.09 | 0.097 ± 0.018 | 73.9 ± 26.5 | 99.7 | 33.3 ± 10.6 | 96.6 |
| 6 | 0.5 | 0.5 | 0.05 | 0.50 ± 0.20 | 0.043 ± 0.020 | 18.1 ± 12.0 | 36.8 | 8.8 ± 5.6 | 10.8 |
| 7 | 0.3 | 0.7 | 0.07 | 0.72 ± 0.12 | 0.064 ± 0.026 | 68.4 ± 25.4 | 99.6 | 29.6 ± 10.2 | 91.5 |
| 8 | 0.3 | 0.7 | 0.05 | 0.70 ± 0.15 | 0.047 ± 0.023 | 33.2 ± 17.6 | 77.3 | 15.1 ± 7.5 | 38.2 |
| 9 | 0.7 | 0.3 | -0.07 | 0.28 ± 0.14 | -0.062 ± 0.028 | 54.8 ± 23.4 | 96.8 | 26.3 ± 9.6 | 85.2 |
| 10 | 0.7 | 0.3 | -0.05 | 0.31 ± 0.20 | -0.042 ± 0.024 | 27.8 ± 15.6 | 66.1 | 13.7 ± 6.9 | 31.0 |
Population genetic parameters for 10 simulated populations and statistical inference of model parameters from 200 cases and 200 controls repeatedly sampled from the simulated populations. p and q are the allelic frequencies at the marker and disease loci, respectively, and D is the coefficient of linkage disequilibrium (LD) between the two loci. Means and standard deviations (s.d.) of the estimates of the model parameters q and D and of the χ² test statistic were calculated from 1000 repeated samples. ρ (%) is the proportion of the 1000 repeats in which the association test surpassed the P-value threshold of 0.05 when LD is absent and the Bonferroni threshold of 5 × 10⁻⁵ when LD is present.
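The three parameters p, q and D together fix the four two-locus haplotype frequencies, using the standard definition of D as the excess of the coupling haplotype over its expectation under independence. A minimal sketch of that relationship follows. The allele labels M/m (marker) and A/a (disease locus) and the function name are illustrative assumptions, and p and q are taken to be the frequencies of M and A.

```python
def haplotype_frequencies(p, q, D):
    """Two-locus haplotype frequencies from allele frequencies and LD.

    p: frequency of marker allele M, q: frequency of disease allele A,
    D: coefficient of linkage disequilibrium, defined as P(MA) - p * q.
    """
    freqs = {
        "MA": p * q + D,
        "Ma": p * (1 - q) - D,
        "mA": (1 - p) * q - D,
        "ma": (1 - p) * (1 - q) + D,
    }
    # D is only admissible if every haplotype frequency is non-negative.
    if min(freqs.values()) < 0:
        raise ValueError("D is outside the range allowed by p and q")
    return freqs


# Values taken from two rows of the table above (populations 4 and 9).
print(haplotype_frequencies(0.5, 0.5, 0.15))
print(haplotype_frequencies(0.7, 0.3, -0.07))
```

Setting D = 0 recovers the independence case, in which each haplotype frequency is simply the product of its allele frequencies.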
Parameters and results of scheme b simulation
| 1 | 0.40 | 0.10 | 0.00 | 0.70 | 0.10 | 0.00 | 0.1 | 0.0 | 0.0 | 1.6 | 0.0 | 0.0 | 1.2 | 0.0 | 25.3 |
| 2 | 0.45 | 0.10 | 0.00 | 0.70 | 0.10 | 0.00 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.6 | 0.0 | 12.6 |
| 3 | 0.50 | 0.10 | 0.00 | 0.70 | 0.10 | 0.00 | 0.3 | 0.0 | 0.0 | 1.4 | 0.0 | 0.0 | 1.2 | 0.0 | 3.7 |
| 4 | 0.55 | 0.10 | 0.00 | 0.70 | 0.10 | 0.00 | 0.2 | 0.0 | 0.0 | 2.1 | 0.0 | 0.0 | 1.1 | 0.0 | 0.9 |
| 5 | 0.60 | 0.10 | 0.00 | 0.70 | 0.10 | 0.00 | 0.0 | 0.0 | 0.0 | 1.1 | 0.0 | 0.0 | 1.0 | 0.0 | 0.3 |
| 6 | 0.65 | 0.10 | 0.00 | 0.70 | 0.10 | 0.00 | 0.1 | 0.1 | 0.1 | 0.9 | 0.0 | 0.0 | 0.5 | 0.0 | 0.0 |
| 7 | 0.40 | 0.10 | 0.00 | 0.50 | 0.10 | 0.02 | 0.1 | 0.0 | 0.0 | 94.3 | 44.8 | 45.6 | 91.1 | 2.9 | 50.8 |
| 8 | 0.45 | 0.10 | 0.00 | 0.50 | 0.10 | 0.02 | 0.0 | 0.0 | 0.0 | 93.4 | 45.7 | 47.2 | 90.8 | 1.4 | 28.4 |
| 9 | 0.40 | 0.10 | 0.02 | 0.50 | 0.10 | 0.00 | 99.5 | 93.9 | 94.7 | 1.1 | 0.0 | 0.0 | 99.4 | 70.1 | 90.0 |
| 10 | 0.45 | 0.10 | 0.02 | 0.50 | 0.10 | 0.00 | 99.7 | 95.4 | 95.5 | 1.1 | 0.0 | 0.0 | 99.3 | 69.3 | 77.4 |
| 11 | 0.40 | 0.10 | 0.02 | 0.50 | 0.10 | 0.02 | 99.6 | 95.0 | 95.1 | 93.2 | 43.7 | 45.7 | 100.0 | 99.7 | 100.0 |
| 12 | 0.45 | 0.10 | 0.02 | 0.50 | 0.10 | 0.02 | 99.6 | 95.2 | 95.6 | 93.1 | 47.5 | 49.0 | 100.0 | 99.7 | 100.0 |
| 13 | 0.40 | 0.10 | 0.02 | 0.50 | 0.10 | -0.02 | 99.4 | 95.1 | 95.3 | 92.2 | 45.6 | 47.0 | 100.0 | 4.2 | 6.1 |
| 14 | 0.45 | 0.10 | 0.02 | 0.50 | 0.10 | -0.02 | 99.1 | 93.9 | 94.0 | 94.2 | 45.8 | 47.8 | 100.0 | 3.0 | 1.4 |
Population genetic parameters defining two genetically divergent populations and empirical statistical powers of Methods 1–3 (M1–3) for detecting significance of linkage disequilibrium between a polymorphic marker and a putative disease locus. The empirical power was calculated from 1,000 repeated samples of 1,000 cases and 1,000 controls as the proportion of test statistics surpassing the Bonferroni threshold of 5 × 10⁻⁵. The admixed samples were made up of 57% of cases and 76% of controls from Population 1, with the rest from Population 2.
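Why admixture alone can produce a spurious association can be seen from the expected allele frequencies in the mixed samples. The sketch below uses the mixing proportions from the caption (57% of cases and 76% of controls from Population 1). It assumes, without confirmation from the table, that the first frequency column of each population is the marker allele frequency, and uses 0.40 and 0.70 from the first row. The function name is hypothetical.

```python
def mixed_frequency(weight_pop1, freq_pop1, freq_pop2):
    """Expected allele frequency in a sample drawn from two populations."""
    return weight_pop1 * freq_pop1 + (1 - weight_pop1) * freq_pop2


# Assumed marker allele frequencies in Populations 1 and 2 (first table row).
FREQ_POP1, FREQ_POP2 = 0.40, 0.70

cases = mixed_frequency(0.57, FREQ_POP1, FREQ_POP2)
controls = mixed_frequency(0.76, FREQ_POP1, FREQ_POP2)
print(f"cases: {cases:.3f}, controls: {controls:.3f}, "
      f"difference: {cases - controls:.3f}")
```

Under these assumptions the marker allele frequency differs between cases and controls even though there is no LD in either population, and the difference shrinks to zero as the two populations' frequencies converge.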
Parameters and results of scheme c simulation
| 1 | 0.5 | 0.5 | 0 | 0.1 | 0.05 | 0 | 0.50 ± 0.02 | 2.0 ± 6.1 | 2 | 1.9 ± 2.9 | 0 | 0.9 ± 1.3 | 0 |
| 2 | 0.3 | 0.7 | 0 | 0.1 | 0.05 | 0 | 0.30 ± 0.02 | 2.2 ± 5.9 | 2.1 | 1.8 ± 2.9 | 0.2 | 1.0 ± 1.3 | 0 |
| 3 | 0.7 | 0.3 | 0 | 0.2 | 0.1 | 0 | 0.70 ± 0.02 | 1.5 ± 3.9 | 0.5 | 2.0 ± 2.8 | 0 | 1.0 ± 1.3 | 0 |
| 4 | 0.5 | 0.5 | 0.15 | 0.2 | 0.1 | 0 | 0.48 ± 0.02 | 57.8 ± 21.8 | 99.2 | 52.5 ± 19.0 | 98 | 24.1 ± 9.2 | 79.1 |
| 5 | 0.5 | 0.5 | 0.1 | 0.1 | 0 | 0 | 0.49 ± 0.02 | 74.5 ± 24.1 | 99.7 | 68.7 ± 20.9 | 99.6 | 35.0 ± 11.0 | 97.7 |
| 6 | 0.5 | 0.5 | 0.05 | 0.1 | 0 | 0 | 0.50 ± 0.02 | 20.0 ± 12.3 | 42.6 | 19.6 ± 11.9 | 41.5 | 9.3 ± 5.8 | 11.8 |
| 7 | 0.3 | 0.7 | 0.07 | 0.3 | 0.1 | 0 | 0.29 ± 0.02 | 16.4 ± 12.7 | 32.4 | 14.2 ± 10.9 | 25 | 6.5 ± 4.9 | 4.5 |
| 8 | 0.3 | 0.7 | 0.05 | 0.3 | 0.1 | 0 | 0.29 ± 0.02 | 9.5 ± 8.9 | 12.6 | 8.2 ± 7.7 | 8.8 | 3.7 ± 3.6 | 1 |
| 9 | 0.7 | 0.3 | -0.07 | 0.1 | 0 | 0 | 0.70 ± 0.02 | 102.6 ± 29.7 | 100 | 93.5 ± 26.1 | 99.9 | 44.7 ± 12.0 | 99.6 |
| 10 | 0.7 | 0.3 | -0.05 | 0.1 | 0 | 0 | 0.70 ± 0.02 | 53.7 ± 21.1 | 96.6 | 50.5 ± 19.3 | 96.2 | 24.0 ± 9.0 | 80.1 |
Means and standard deviations (s.d.) of the test statistic and estimates of empirical statistical power (ρ), based on 200 cases and 200 controls from 1000 repeated computer simulations. The left panel lists the values of the simulation parameters and the right panel the estimates. ρ is estimated as the proportion (%) of significant tests at the Bonferroni threshold of 5 × 10⁻⁵ in 1000 simulations.
* When the true simulated parameters were used in the association test.
** When the penetrance parameters f were fixed at (1, ½, 0).
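The empirical power ρ described above is a simple proportion: the share of simulated test statistics whose P value falls below the chosen threshold. A minimal sketch follows, assuming the statistic is referred to a χ² distribution with 1 degree of freedom. That choice is an assumption, as the captions do not state the degrees of freedom, and the statistics in the example are made-up numbers rather than simulation output.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def empirical_power(statistics, alpha):
    """Percentage of test statistics whose P value falls below alpha."""
    hits = sum(1 for x in statistics if chi2_sf_1df(x) < alpha)
    return 100.0 * hits / len(statistics)


# Made-up statistics standing in for repeated simulation results.
toy_statistics = [1.0, 20.0, 30.0, 2.0]
power = empirical_power(toy_statistics, alpha=5e-5)
print(f"empirical power: {power:.1f}%")
```

When LD is absent, the same proportion estimates the false positive rate rather than the power.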