Minghui Wang, Tianye Jia, Ning Jiang, Lin Wang, Xiaohua Hu, Zewei Luo.
Abstract
BACKGROUND: Linkage disequilibrium (LD) plays a fundamental role in population genetics and in the current surge of studies that screen for subtle genetic variants affecting complex traits. Methods widely implemented in LD analyses require samples to be collected at random, but this requirement is usually ignored, which raises the general question of how non-random sampling affects statistical inference of genetic association. Here we propose a new approach for inferring LD from a sample collected non-randomly from the population of interest.
Year: 2010 PMID: 20504300 PMCID: PMC2890561 DOI: 10.1186/1471-2164-11-328
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Conditional probability distribution of disease genotypes for a given marker genotype
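| Marker genotype | AA | Aa | aa |
|---|---|---|---|
| MM | $Q^2$ | $2Q(1 - Q)$ | $(1 - Q)^2$ |
| Mm | $QR$ | $Q(1 - R) + R(1 - Q)$ | $(1 - Q)(1 - R)$ |
| mm | $R^2$ | $2R(1 - R)$ | $(1 - R)^2$ |
$Q = q + D/p$ and $R = q - D/(1 - p)$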
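The definitions Q = q + D/p and R = q - D/(1 - p) are the probabilities that a haplotype carries the disease allele A given that it carries marker allele M or m, respectively. A minimal sketch of how the conditional genotype distribution follows from them is given below, assuming random union of gametes; the function name, the dictionary layout, and the labels m and a for the alternative alleles are illustrative choices, not taken from the paper.

```python
def conditional_disease_probs(p, q, D):
    """P(disease genotype | marker genotype), assuming random union of gametes.

    p: frequency of marker allele M; q: frequency of disease allele A;
    D: coefficient of linkage disequilibrium between the two loci.
    """
    Q = q + D / p            # P(A | haplotype carries M)
    R = q - D / (1.0 - p)    # P(A | haplotype carries m)
    return {
        "MM": {"AA": Q * Q, "Aa": 2 * Q * (1 - Q), "aa": (1 - Q) ** 2},
        "Mm": {"AA": Q * R, "Aa": Q * (1 - R) + R * (1 - Q), "aa": (1 - Q) * (1 - R)},
        "mm": {"AA": R * R, "Aa": 2 * R * (1 - R), "aa": (1 - R) ** 2},
    }


# Parameters of population 1 in the Sampling Scheme I table.
table = conditional_disease_probs(p=0.5, q=0.5, D=0.20)
for marker, row in table.items():
    # Each row is a probability distribution over disease genotypes.
    print(marker, {g: round(v, 4) for g, v in row.items()})
```

Each row sums to one, which is a quick check that the three entries for a marker genotype form a proper distribution.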
Prediction of Sampling Scheme I
| Pop. | p | q | ($D_{min}$, $D_{max}$) | D | $\hat{D}_H$ | $\hat{D}_L$ |
|---|---|---|---|---|---|---|
| 1 | 0.5 | 0.5 | (-0.25, 0.25) | 0.20 | 0.1999 ± 0.0078 | 0.2004 ± 0.0078 |
| 2 | 0.5 | 0.5 | (-0.25, 0.25) | 0.10 | 0.1002 ± 0.0145 | 0.1003 ± 0.0145 |
| 3 | 0.3 | 0.3 | (-0.09, 0.21) | 0.09 | 0.0898 ± 0.0133 | 0.0899 ± 0.0125 |
| 4 | 0.7 | 0.7 | (-0.09, 0.21) | 0.09 | 0.0895 ± 0.0133 | 0.0896 ± 0.0126 |
| 5 | 0.3 | 0.5 | (-0.15, 0.15) | 0.10 | 0.0997 ± 0.0120 | 0.0998 ± 0.0111 |
| 6 | 0.5 | 0.3 | (-0.15, 0.15) | 0.10 | 0.0995 ± 0.0121 | 0.0993 ± 0.0109 |
| 7 | 0.5 | 0.5 | (-0.25, 0.25) | -0.20 | -0.1995 ± 0.0081 | -0.1998 ± 0.0081 |
| 8 | 0.5 | 0.5 | (-0.25, 0.25) | -0.10 | -0.0996 ± 0.0146 | -0.0997 ± 0.0146 |
| 9 | 0.3 | 0.3 | (-0.09, 0.21) | -0.09 | -0.0896 ± 0.0074 | -0.0899 ± 0.0068 |
| 10 | 0.7 | 0.7 | (-0.09, 0.21) | -0.09 | -0.0897 ± 0.0073 | -0.0899 ± 0.0065 |
| 11 | 0.3 | 0.5 | (-0.15, 0.15) | -0.10 | -0.1000 ± 0.0124 | -0.1000 ± 0.0117 |
| 12 | 0.5 | 0.3 | (-0.15, 0.15) | -0.10 | -0.0995 ± 0.0120 | -0.0993 ± 0.0111 |
Linkage disequilibrium parameters were estimated from 1000 simulations of n = 200 individuals for each of 12 different populations. p and q are the frequencies of alleles at the marker (M) and trait (A) loci; D is the coefficient of linkage disequilibrium between the marker and the disease loci; $D_{min}$ and $D_{max}$ are, respectively, the minimum and maximum possible coefficients of linkage disequilibrium given allele frequencies p and q; $\hat{D}_H$ and $\hat{D}_L$ are the estimates from Methods H and L, respectively, reported as means ± standard deviations (s.d.) over the 1000 simulations.
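The bounds on D quoted in the table follow from the requirement that all four haplotype frequencies stay non-negative. A short sketch of that standard calculation is shown below; the function name `d_bounds` is a hypothetical helper, not something defined in the paper.

```python
def d_bounds(p, q):
    """Return (D_min, D_max) for allele frequencies p (marker) and q (disease).

    The haplotype frequencies pq + D, p(1-q) - D, (1-p)q - D and
    (1-p)(1-q) + D must all be non-negative, which bounds D.
    """
    d_min = max(-p * q, -(1 - p) * (1 - q))
    d_max = min(p * (1 - q), q * (1 - p))
    return d_min, d_max


# Allele frequencies of populations 1, 3 and 5 in the table.
for p, q in [(0.5, 0.5), (0.3, 0.3), (0.3, 0.5)]:
    lo, hi = d_bounds(p, q)
    print(f"p={p}, q={q}: D in ({lo:.2f}, {hi:.2f})")
# p=0.5, q=0.5: D in (-0.25, 0.25)
# p=0.3, q=0.3: D in (-0.09, 0.21)
# p=0.3, q=0.5: D in (-0.15, 0.15)
```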
Prediction of Sampling Scheme II
| Pop. | D | $n_{1\cdot} = 0$ | $n_{2\cdot} = 0$ | $n_{3\cdot} = 0$ | $n_{11} = 0$ | $n_{22} = 0$ | $n_{33} = 0$ |
|---|---|---|---|---|---|---|---|
| 1 | 0.20 | 0.18 ± 0.01 | 0.20 ± 0.01 | 0.18 ± 0.01 | 0.17 ± 0.01 | 0.17 ± 0.01 | 0.17 ± 0.01 |
| 2 | 0.10 | 0.09 ± 0.02 | 0.10 ± 0.01 | 0.09 ± 0.02 | 0.05 ± 0.02 | 0.07 ± 0.01 | 0.05 ± 0.02 |
| 3 | 0.09 | 0.08 ± 0.02 | 0.06 ± 0.01 | 0.10 ± 0.02 | 0.07 ± 0.01 | 0.04 ± 0.01 | 0.00 ± 0.02 |
| 4 | 0.09 | 0.10 ± 0.02 | 0.06 ± 0.01 | 0.08 ± 0.01 | 0.00 ± 0.02 | 0.04 ± 0.01 | 0.07 ± 0.01 |
| 5 | 0.10 | 0.08 ± 0.01 | 0.06 ± 0.01 | 0.12 ± 0.01 | 0.07 ± 0.01 | 0.07 ± 0.01 | 0.05 ± 0.02 |
| 6 | 0.10 | 0.09 ± 0.01 | 0.10 ± 0.01 | 0.09 ± 0.01 | 0.07 ± 0.01 | 0.07 ± 0.01 | 0.05 ± 0.02 |
Means and standard deviations of the estimated coefficients of linkage disequilibrium from 1,000 replicates for Methods H and L, when 200 individuals were generated under Sampling Scheme II, in which either a marker genotype ($n_{i\cdot} = 0$, i = 1, 2, 3) or a marker-disease genotype ($n_{ii} = 0$, i = 1, 2, 3) was missing. The population numbers are the same as those in Table 2, and D is the true value of the disequilibrium coefficient.
Prediction of Sampling Scheme III
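| p | q | ($D_{min}$, $D_{max}$) | D | $\hat{D}_H$ (q from survey) | $\hat{D}_L$ (q from survey) | $\hat{D}_H$ (q from sample) | $\hat{D}_L$ (q from sample) |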
|---|---|---|---|---|---|---|---|
| 0.6 | 0.005 | (-0.003, 0.002) | -0.002 | -0.011 ± 0.149 | -0.002 ± 0.000 | -0.113 ± 0.015 | -0.071 ± 0.011 |
| 0.5 | 0.01 | (-0.005, 0.005) | 0.004 | 0.280 ± 0.011 | 0.004 ± 0.001 | 0.108 ± 0.013 | 0.080 ± 0.013 |
| 0.5 | 0.02 | (-0.010, 0.010) | 0.008 | 0.273 ± 0.010 | 0.008 ± 0.001 | 0.110 ± 0.014 | 0.081 ± 0.013 |
| 0.3 | 0.03 | (-0.009, 0.021) | 0.010 | 0.191 ± 0.026 | 0.011 ± 0.002 | 0.104 ± 0.019 | 0.057 ± 0.018 |
| 0.7 | 0.04 | (-0.028, 0.012) | 0.010 | 0.309 ± 0.016 | 0.011 ± 0.002 | 0.071 ± 0.012 | 0.061 ± 0.017 |
| 0.3 | 0.05 | (-0.015, 0.035) | 0.020 | 0.192 ± 0.044 | 0.021 ± 0.004 | 0.122 ± 0.017 | 0.066 ± 0.017 |
| 0.5 | 0.10 | (-0.050, 0.050) | 0.040 | 0.227 ± 0.008 | 0.045 ± 0.006 | 0.124 ± 0.014 | 0.088 ± 0.012 |
Simulation parameters for the case-control sampling scheme, with means and standard deviations of the estimates of the disequilibrium coefficient, $\hat{D}_H$ and $\hat{D}_L$, from Methods H and L respectively. p, q and D are the simulated values of the population allele frequencies at the marker and disease loci and of the linkage disequilibrium, respectively. ($D_{min}$, $D_{max}$) are the theoretical minimum and maximum values of D. The disequilibrium coefficients were calculated with the marker allele frequency p estimated directly from the control sub-samples, while the disease allele frequency q was either taken from a population survey or estimated directly from the case-control samples.
Prediction from case-control samples with various sample sizes
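Estimates of the disequilibrium coefficient (mean ± s.d.); the four rightmost columns correspond to the different sample sizes:
| p | q | D | | | | |
|---|---|---|---|---|---|---|
| 0.6 | 0.005 | -0.002 | 0.002 ± 0.001 | -0.002 ± 0.002 | -0.002 ± 0.000 | -0.002 ± 0.000 |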
| 0.5 | 0.010 | 0.004 | 0.004 ± 0.001 | 0.004 ± 0.001 | 0.004 ± 0.000 | 0.004 ± 0.000 |
| 0.5 | 0.020 | 0.008 | 0.008 ± 0.002 | 0.008 ± 0.001 | 0.008 ± 0.001 | 0.008 ± 0.001 |
| 0.3 | 0.030 | 0.010 | 0.010 ± 0.003 | 0.011 ± 0.002 | 0.010 ± 0.002 | 0.010 ± 0.001 |
| 0.7 | 0.040 | 0.010 | 0.010 ± 0.004 | 0.011 ± 0.002 | 0.010 ± 0.002 | 0.010 ± 0.001 |
| 0.3 | 0.050 | 0.020 | 0.021 ± 0.005 | 0.021 ± 0.004 | 0.021 ± 0.003 | 0.021 ± 0.002 |
| 0.5 | 0.100 | 0.040 | 0.044 ± 0.008 | 0.045 ± 0.006 | 0.046 ± 0.004 | 0.045 ± 0.003 |
Corresponding LOD scores (mean ± s.d.):
| p | q | D | | | | |
|---|---|---|---|---|---|---|
| 0.6 | 0.005 | -0.002 | 1.852 ± 1.047 | 3.619 ± 1.467 | 6.913 ± 2.112 | 13.774 ± 2.927 |
| 0.5 | 0.010 | 0.004 | 1.949 ± 1.036 | 3.692 ± 1.454 | 7.397 ± 1.959 | 14.390 ± 2.809 |
| 0.5 | 0.020 | 0.008 | 2.010 ± 1.045 | 3.843 ± 1.483 | 7.657 ± 2.070 | 14.955 ± 2.832 |
| 0.3 | 0.030 | 0.010 | 1.545 ± 1.047 | 2.957 ± 1.395 | 5.596 ± 1.977 | 11.085 ± 2.717 |
| 0.7 | 0.040 | 0.010 | 1.115 ± 0.707 | 2.093 ± 1.043 | 3.958 ± 1.377 | 7.682 ± 2.063 |
| 0.3 | 0.050 | 0.020 | 2.325 ± 1.278 | 4.393 ± 1.690 | 8.648 ± 2.452 | 17.191 ± 3.332 |
| 0.5 | 0.100 | 0.040 | 2.493 ± 1.100 | 4.860 ± 1.549 | 9.560 ± 2.165 | 18.948 ± 2.994 |
Means and standard deviations of the estimates of the disequilibrium coefficient and of the corresponding LOD scores, obtained from case-control samples of various sizes.
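This excerpt does not say how the LOD scores were computed. A common convention is the base-10 logarithm of the likelihood ratio between a fitted value of D and D = 0. The sketch below illustrates that convention only; it is not the authors' estimator. It assumes a likelihood conditional on marker genotype, built from Q = q + D/p and R = q - D/(1 - p) under random union of gametes, and the genotype counts are hypothetical.

```python
import math


def conditional_probs(p, q, D):
    """Rows: marker genotypes MM, Mm, mm; columns: disease genotypes AA, Aa, aa."""
    Q = q + D / p            # P(A | haplotype carries M)
    R = q - D / (1.0 - p)    # P(A | haplotype carries m)
    return [
        [Q * Q, 2 * Q * (1 - Q), (1 - Q) ** 2],
        [Q * R, Q * (1 - R) + R * (1 - Q), (1 - Q) * (1 - R)],
        [R * R, 2 * R * (1 - R), (1 - R) ** 2],
    ]


def log_likelihood(counts, p, q, D):
    """Log-likelihood of disease genotype counts conditional on marker genotype.

    Cells with zero count are skipped; D must lie strictly inside its bounds
    so that every observed cell has positive probability.
    """
    probs = conditional_probs(p, q, D)
    return sum(
        n * math.log(pr)
        for row_n, row_p in zip(counts, probs)
        for n, pr in zip(row_n, row_p)
        if n > 0
    )


def lod_score(counts, p, q, D):
    """Base-10 log likelihood ratio of D against the null hypothesis D = 0."""
    diff = log_likelihood(counts, p, q, D) - log_likelihood(counts, p, q, 0.0)
    return diff / math.log(10)


# Hypothetical 3x3 genotype counts for 200 individuals (not data from the paper).
counts = [[40, 9, 1], [9, 82, 9], [1, 9, 40]]
print(lod_score(counts, p=0.5, q=0.5, D=0.2))
```

Conditioning on marker genotype keeps the likelihood free of the marker genotype sampling proportions, which is why it is used here rather than the full joint likelihood.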
LD estimation from case and control samples with varying proportions
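Estimates of the disequilibrium coefficient (mean ± s.d.); the four rightmost columns correspond to the different case:control proportions:
| p | q | D | | | | |
|---|---|---|---|---|---|---|
| 0.6 | 0.005 | -0.002 | 0.002 ± 0.000 | -0.002 ± 0.000 | -0.002 ± 0.000 | -0.002 ± 0.000 |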
| 0.5 | 0.010 | 0.004 | 0.004 ± 0.001 | 0.004 ± 0.001 | 0.004 ± 0.001 | 0.004 ± 0.001 |
| 0.5 | 0.020 | 0.008 | 0.008 ± 0.001 | 0.008 ± 0.001 | 0.008 ± 0.001 | 0.008 ± 0.002 |
| 0.3 | 0.030 | 0.010 | 0.010 ± 0.003 | 0.010 ± 0.003 | 0.010 ± 0.002 | 0.010 ± 0.003 |
| 0.7 | 0.040 | 0.010 | 0.010 ± 0.003 | 0.011 ± 0.002 | 0.010 ± 0.002 | 0.010 ± 0.003 |
| 0.3 | 0.050 | 0.020 | 0.021 ± 0.004 | 0.021 ± 0.004 | 0.021 ± 0.004 | 0.021 ± 0.004 |
| 0.5 | 0.100 | 0.040 | 0.043 ± 0.007 | 0.044 ± 0.006 | 0.045 ± 0.006 | 0.045 ± 0.007 |
Corresponding LOD scores (mean ± s.d.):
| p | q | D | | | | |
|---|---|---|---|---|---|---|
| 0.6 | 0.005 | -0.002 | 1.814 ± 0.855 | 2.420 ± 1.063 | 4.707 ± 2.172 | 5.362 ± 2.695 |
| 0.5 | 0.010 | 0.004 | 1.906 ± 0.832 | 2.493 ± 0.997 | 4.915 ± 1.994 | 5.715 ± 2.502 |
| 0.5 | 0.020 | 0.008 | 1.969 ± 0.812 | 2.579 ± 1.029 | 5.112 ± 2.160 | 5.916 ± 2.653 |
| 0.3 | 0.030 | 0.010 | 1.481 ± 0.812 | 1.950 ± 1.025 | 3.888 ± 2.011 | 4.463 ± 2.519 |
| 0.7 | 0.040 | 0.010 | 1.063 ± 0.580 | 1.453 ± 0.724 | 2.714 ± 1.368 | 3.130 ± 1.783 |
| 0.3 | 0.050 | 0.020 | 2.162 ± 0.966 | 2.982 ± 1.161 | 5.876 ± 2.446 | 6.695 ± 3.029 |
| 0.5 | 0.100 | 0.040 | 2.275 ± 0.816 | 3.152 ± 1.071 | 6.657 ± 2.241 | 7.616 ± 2.772 |
Means and standard deviations of the estimates of the disequilibrium coefficient and of the corresponding LOD scores, obtained from case-control samples of a constant size of 200 individuals but with varying case:control proportions, c:c.
Figure 1. Linkage disequilibrium analysis. Distribution of linkage disequilibrium between each polymorphic site and the β-thalassemia-causing mutation in a 20.693 kb region surrounding the human β-globin gene. (a) Estimates of the coefficient of LD from three different methods. The dotted lines represent the lower and upper theoretical bounds of the disequilibrium parameter, which are determined by the allele frequencies at the polymorphic marker and disease-causing sites. (b) LOD scores calculated for the LD estimates from the three methods.
Figure 2. LOD score profile. LOD score profiles for the MLE of linkage disequilibrium between the disease-causing mutant and each of two linked polymorphic markers, M516 and M2046, over different values of the disease allele frequency. The two markers lie 516 and 2046 bp from the disease-causing mutant site.