Joanna M Biernacka, Heather J Cordell.
Abstract
In a small chromosomal region, a number of polymorphisms may be both linked to and associated with a disease. Distinguishing the potential causal sites from those indirectly associated due to linkage disequilibrium (LD) with a causal site is an important problem. This problem may be approached by determining which of the associations can explain the observed linkage signal. Recently, several methods have been proposed to aid in the identification of disease-associated polymorphisms that may explain an observed linkage signal, using genotype data from affected sib pairs (ASPs) [Li et al. [2005] Am. J. Hum. Genet. 76:934-949; Sun et al. [2002] Am. J. Hum. Genet. 70:399-411]. These methods can be used to test the null hypothesis that a candidate single nucleotide polymorphism (SNP) is the sole causal variant in the region, or is in complete LD with the sole causal variant in the region. We extend variations of these methods to test for complete LD between a disease locus and haplotypes composed of two or more tightly linked candidate SNPs. We study properties of the proposed methods by simulation and apply them to type 1 diabetes data for ASPs and their parents at candidate SNP and microsatellite marker loci in the Insulin (INS) gene region.
Year: 2007 PMID: 17508343 PMCID: PMC2682330 DOI: 10.1002/gepi.20236
Source DB: PubMed Journal: Genet Epidemiol ISSN: 0741-0395 Impact factor: 2.135
Simulation models: single SNP analysis
| Model | Description | Risk contributions for haplotypes (11, 12, 21, 22) | LD (D′) | LD (r²) | Frequencies of haplotypes (11, 12, 21, 22) | Disease prevalence |
|---|---|---|---|---|---|---|
| Model 1, full LD | Multiplicative | (0.15, 0.30, 0.15, 0.30) | 1.00 | 1.00 | (0.70, 0.00, 0.00, 0.30) | 0.038 |
| Model 1, high LD | | | 0.74 | 0.44 | (0.60, 0.05, 0.10, 0.25) | |
| Model 1, mid LD | | | 0.44 | 0.13 | (0.50, 0.10, 0.20, 0.20) | |
| Model 1, low LD | | | 0.23 | 0.025 | (0.40, 0.12, 0.30, 0.18) | |
| Model 2, full LD | Multiplicative | (0.10, 0.30, 0.10, 0.30) | 1.00 | 1.00 | (0.50, 0.00, 0.00, 0.50) | 0.040 |
| Model 2, high LD | | | 0.80 | 0.64 | (0.45, 0.05, 0.05, 0.45) | |
| Model 2, mid LD | | | 0.52 | 0.27 | (0.38, 0.12, 0.12, 0.38) | |
| Model 2, low LD | | | 0.20 | 0.04 | (0.30, 0.20, 0.20, 0.30) | |
| Model 3, full LD | Rare disease allele | (0.10, 0.30, 0.10, 0.30) | 1.00 | 1.00 | (0.95, 0.00, 0.00, 0.05) | 0.0121 |
| Model 3, mid LD | | | 0.70 | 0.05 | (0.65, 0.01, 0.30, 0.04) | |
| Model 4, full LD | Non-multiplicative | (0.01, 0.01, 0.05) | 1.00 | 1.00 | (0.70, 0.00, 0.00, 0.30) | 0.0136 |
| Model 4, high LD | | | 0.67 | 0.44 | (0.63, 0.07, 0.07, 0.23) | |
| Model 4, mid LD | | | 0.52 | 0.27 | (0.60, 0.10, 0.10, 0.20) | |
| Model 5 | Causal haplotype | (0.10, 0.15, 0.15, 0.20, 0.15, 0.20, 0.20, 0.30) | | | (0.20, 0.15, 0.05, 0.10, 0.10, 0.05, 0.15, 0.20) | 0.0342 |
Risks are calculated by multiplying the risk contributions of a person's two haplotypes, except for Model 4. For Model 4, the table shows the genotype risks for genotypes 11, 12 and 22 at the second SNP in the haplotype. Under Model 5, there are three tightly linked disease-susceptibility SNPs forming a haplotype. For this model, the table shows the haplotype risks and frequencies for haplotypes 111, 112, 121, 122, 211, 212, 221, 222.
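The LD and prevalence columns above can be recomputed from the haplotype frequencies and risks. The sketch below is an illustration under stated assumptions, not the authors' code: it assumes the two LD columns are Lewontin's D′ and r² between the first SNP (candidate) and the second SNP (disease locus), and that haplotypes pair at random (Hardy-Weinberg proportions), so that under the multiplicative models prevalence is (Σ_h f_h r_h)². The function names are hypothetical. Under these assumptions the tabulated values are recovered to the precision shown, for example D′ = 0.74 and r² = 0.44 for the Model 1 high-LD frequencies, and prevalences 0.038, 0.0342 and 0.0136 for Models 1, 5 and 4.

```python
def two_locus_ld(freqs):
    """Return (D', r^2) for two biallelic loci.

    freqs: haplotype frequencies ordered (11, 12, 21, 22), where the first
    digit is the allele at locus 1 and the second the allele at locus 2.
    """
    f11, f12, f21, _ = freqs
    p1 = f11 + f12                      # allele 1 frequency at locus 1
    q1 = f11 + f21                      # allele 1 frequency at locus 2
    p2, q2 = 1.0 - p1, 1.0 - q1
    d = f11 - p1 * q1                   # disequilibrium coefficient D
    # Lewontin's normalizing bound depends on the sign of D
    if d >= 0:
        d_max = min(p1 * q2, p2 * q1)
    else:
        d_max = min(p1 * q1, p2 * q2)
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p1 * p2 * q1 * q2)
    return d_prime, r2


def multiplicative_prevalence(risks, freqs):
    """Prevalence when a person's risk is the product of the risk
    contributions of their two haplotypes. Assumes haplotypes pair at random
    (Hardy-Weinberg), so prevalence = (sum_h f_h * r_h) ** 2."""
    mean_contribution = sum(r * f for r, f in zip(risks, freqs))
    return mean_contribution ** 2


def genotype_prevalence(genotype_risks, allele2_freq):
    """Prevalence from risks for genotypes 11, 12, 22 at one SNP,
    assuming Hardy-Weinberg genotype frequencies."""
    q = allele2_freq
    p = 1.0 - q
    r11, r12, r22 = genotype_risks
    return p * p * r11 + 2 * p * q * r12 + q * q * r22


if __name__ == "__main__":
    # Model 1, high LD frequencies (tabulated D' = 0.74, r^2 = 0.44)
    print(two_locus_ld((0.60, 0.05, 0.10, 0.25)))
    # Model 1, multiplicative (tabulated prevalence 0.038)
    print(multiplicative_prevalence((0.15, 0.30, 0.15, 0.30),
                                    (0.70, 0.00, 0.00, 0.30)))
    # Model 5, eight haplotypes 111..222 (tabulated prevalence 0.0342)
    print(multiplicative_prevalence(
        (0.10, 0.15, 0.15, 0.20, 0.15, 0.20, 0.20, 0.30),
        (0.20, 0.15, 0.05, 0.10, 0.10, 0.05, 0.15, 0.20)))
    # Model 4: genotype risks at the second SNP, allele 2 frequency 0.30
    # (tabulated prevalence 0.0136)
    print(genotype_prevalence((0.01, 0.01, 0.05), 0.30))
```

Because the prevalence depends only on the haplotype risks and the disease-allele frequency (0.30 in Model 1), it stays the same across the LD levels of a model, which is why one prevalence value is shown per model.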
Simulation results: single SNP analysis
Type 1 error/power

| Model | Sample size | LAMP-LD | Li-cpg | Sun | Sun-cpg |
|---|---|---|---|---|---|
| Model 1, full LD | 500 | 0.015 | 0.044 | 0.054 | 0.047 |
| Model 1, high LD | 500 | 0.118 | 0.145 | 0.199 | 0.191 |
| Model 1, mid LD | 500 | 0.253 | 0.220 | 0.367 | 0.327 |
| Model 1, low LD | 500 | 0.324 | 0.275 | 0.447 | 0.371 |
| Model 1, full LD | 1,000 | 0.006 | 0.058 | 0.052 | 0.064 |
| Model 1, high LD | 1,000 | 0.194 | 0.280 | 0.327 | 0.288 |
| Model 1, mid LD | 1,000 | 0.483 | 0.495 | 0.602 | 0.531 |
| Model 1, low LD | 1,000 | 0.608 | 0.530 | 0.704 | 0.607 |
| Model 2, full LD | 500 | 0.019 | 0.058 | 0.040 | 0.041 |
| Model 2, high LD | 500 | 0.296 | 0.260 | 0.277 | 0.240 |
| Model 2, mid LD | 500 | 0.699 | 0.680 | 0.732 | 0.646 |
| Model 2, low LD | 500 | 0.848 | 0.775 | 0.903 | 0.833 |
| Model 2, full LD | 1,000 | 0.022 | 0.044 | 0.046 | 0.051 |
| Model 2, high LD | 1,000 | 0.545 | 0.435 | 0.428 | 0.404 |
| Model 2, mid LD | 1,000 | 0.961 | 0.895 | 0.929 | 0.879 |
| Model 2, low LD | 1,000 | 0.993 | 0.990 | 0.990 | 0.976 |
| Model 3, full LD | 1,000 | 0.016 | 0.040 | 0.049 | 0.050 |
| Model 3, mid LD | 1,000 | 0.797 | 0.708 | 0.757 | 0.641 |
| Model 4, full LD | 1,000 | 0.023 | 0.043 | 0.039 | 0.037 |
| Model 4, high LD | 1,000 | 0.999 | 0.976 | 0.917 | 0.868 |
| Model 4, mid LD | 1,000 | 0.999 | 0.994 | 0.992 | 0.981 |
| Model 5 | 1,000 | 0.232 | 0.251 | 0.366 | 0.308 |
For Li-cpg, type 1 error estimates are based on 500 data replicates, and power estimates are based on 200 data replicates. For all other methods type 1 error and power estimates are based on 1,000 replicates. When data are generated under “full” LD, the null hypothesis is true, and values in the table are estimates of type 1 error for a test of nominal size 0.05.
LAMP-LD is the test for complete LD implemented in the software LAMP.
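Because the tabulated type 1 error rates are themselves Monte Carlo estimates from 500 or 1,000 replicates, it helps to know how far an estimate can stray from 0.05 by chance alone. The sketch below uses the standard binomial standard error with a normal approximation (z = 1.96); this is an illustrative check, not a calculation reported in the paper, and the helper names are hypothetical.

```python
from math import sqrt


def mc_interval(nominal, n_reps, z=1.96):
    """Approximate 95% range for an estimated rejection rate when the true
    rate equals `nominal`, using the binomial normal approximation."""
    se = sqrt(nominal * (1.0 - nominal) / n_reps)
    return nominal - z * se, nominal + z * se


def flag_type1(estimate, n_reps, nominal=0.05):
    """Classify an empirical type 1 error against the nominal level."""
    lo, hi = mc_interval(nominal, n_reps)
    if estimate < lo:
        return "conservative"
    if estimate > hi:
        return "anti-conservative"
    return "consistent with nominal"


if __name__ == "__main__":
    print(mc_interval(0.05, 1000))  # roughly (0.0365, 0.0635)
    print(mc_interval(0.05, 500))   # roughly (0.0309, 0.0691)
    # Model 1, full LD, sample size 500:
    # LAMP-LD (1,000 replicates) and Li-cpg (500 replicates)
    print(flag_type1(0.015, 1000))  # conservative
    print(flag_type1(0.044, 500))   # consistent with nominal
```

Under this approximation, every LAMP-LD type 1 error estimate in the table (0.006 to 0.023) falls below the lower limit, consistent with a conservative test at nominal size 0.05, whereas the estimates for the other methods lie inside or at the edge of their ranges.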
Simulation models: haplotype analysis
| Model | Haplotype risksᵃ | LD (D′)ᵇ | Haplotype frequenciesᵃ |
|---|---|---|---|
| Null 1 | (0.1, 0.3, 0.1, 0.3, 0.1, 0.3, 0.1, 0.3) | 1.00 | (0.20, 0.00, 0.15, 0.00, 0.15, 0.00, 0.50) |
| Null 2 | (0.1, 0.1, 0.2, 0.2, 0.2, 0.2, 0.4, 0.4) | 1.00 | (0.00, 0.25, 0.00, 0.25, 0.00, 0.25, 0.00, 0.25) |
| Alt 1 | (0.1, 0.3, 0.1, 0.3, 0.1, 0.3, 0.1, 0.3) | 0.39 | (0.10, 0.05, 0.10, 0.05, 0.10, 0.05, 0.15, 0.40) |
| Alt 2 | (0.1, 0.3, 0.1, 0.3, 0.1, 0.3, 0.1, 0.3) | 0.17 | (0.09, 0.06, 0.09, 0.06, 0.09, 0.06, 0.25, 0.30) |
ᵃ Risks are calculated by multiplying the risk contributions of a person's two haplotypes. Risks and frequencies are given for haplotypes (111, 112, 121, 122, 211, 212, 221, 222).
ᵇ Here D′ represents Hedrick's D′ measure of LD for multi-allelic markers [Hedrick, 1987]. We use it to represent the LD between the haplotype formed by loci 1 and 2 (treated as a four-allele marker) and the third locus, which is the disease SNP. Under the “Null 2” model the loci 1–2 haplotype is itself causal (rather than locus 3). Therefore in this case we report D′ between the candidate loci 1–2 haplotype and the causal (loci 1–2) haplotype, which is clearly D′ = 1.
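Hedrick's D′ as used in this footnote can be sketched as the standard weighted average of Lewontin's normalized disequilibria over all allele pairs, D′ = Σ_i Σ_j p_i q_j |D_ij| / D_ij,max. The function names and the 4 × 2 layout (rows: loci 1–2 haplotypes 11, 12, 21, 22; columns: locus 3 allele) are illustrative choices, not code from the paper. Applied to the Alt 1 and Alt 2 frequencies it gives about 0.39 and 0.17, matching the LD column, and for a 2 × 2 table it reduces to Lewontin's |D′|.

```python
def hedrick_d_prime(table):
    """Hedrick's (1987) multi-allelic D'.

    table[i][j] is the frequency of the haplotype carrying allele i at
    marker A and allele j at marker B (all entries sum to 1).
    """
    p = [sum(row) for row in table]            # marker A allele frequencies
    q = [sum(col) for col in zip(*table)]      # marker B allele frequencies
    total = 0.0
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            d = table[i][j] - pi * qj          # pairwise disequilibrium D_ij
            # Lewontin's bound for D_ij depends on its sign
            if d >= 0:
                d_max = min(pi * (1 - qj), (1 - pi) * qj)
            else:
                d_max = min(pi * qj, (1 - pi) * (1 - qj))
            if d_max > 0:
                total += pi * qj * abs(d) / d_max
    return total


def split_haplotypes(freqs):
    """Reshape frequencies for haplotypes 111, 112, ..., 222 into a 4x2
    table: rows = loci 1-2 haplotype (11, 12, 21, 22), columns = locus 3."""
    return [list(freqs[k:k + 2]) for k in range(0, 8, 2)]


if __name__ == "__main__":
    alt1 = (0.10, 0.05, 0.10, 0.05, 0.10, 0.05, 0.15, 0.40)
    alt2 = (0.09, 0.06, 0.09, 0.06, 0.09, 0.06, 0.25, 0.30)
    print(round(hedrick_d_prime(split_haplotypes(alt1)), 2))  # 0.39
    print(round(hedrick_d_prime(split_haplotypes(alt2)), 2))  # 0.17
```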
Results are shown in Table IV. Table IV also shows the average Kong and Cox [1997] LOD score obtained when testing for initial linkage. In simulations, these methods gave type 1 errors close to the nominal 5%. Note that under the “Null 2” model both loci in the tested haplotype are causal, so the Li-cpg haplotype method's assumption of a single causal SNP in LD with the candidate haplotype is violated. Nevertheless, the type 1 error remains close to the nominal level, showing that the method is robust to violation of this assumption. This robustness follows from the way the significance of the statistic is assessed, namely by a simulation procedure that fixes all the candidate SNP genotypes, as discussed in Appendix A.
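The general shape of simulation-based significance assessment can be sketched as an empirical p-value computed against statistics regenerated under the null. This is a generic Monte Carlo p-value with the usual +1 correction, not the procedure of Appendix A, whose details are not given here; `simulate_once` is a hypothetical placeholder marking where the conditioning on the observed candidate SNP genotypes would enter, and `toy_null` is only a stand-in so the sketch runs.

```python
import random


def monte_carlo_p_value(observed, null_stats):
    """Upper-tail empirical p-value with the usual +1 correction:
    p = (1 + #{T_sim >= T_obs}) / (B + 1)."""
    exceed = sum(1 for t in null_stats if t >= observed)
    return (1 + exceed) / (len(null_stats) + 1)


def simulate_null_stats(simulate_once, n_sims, seed=0):
    """Collect n_sims statistics from a caller-supplied null simulator.

    simulate_once(rng) is a hypothetical placeholder. In the setting above it
    would regenerate the data under the null hypothesis while keeping the
    observed candidate SNP genotypes fixed, then recompute the test statistic.
    """
    rng = random.Random(seed)
    return [simulate_once(rng) for _ in range(n_sims)]


def toy_null(rng):
    """Toy stand-in for the real simulator: a chi-square(1) draw."""
    return rng.gauss(0.0, 1.0) ** 2


if __name__ == "__main__":
    stats = simulate_null_stats(toy_null, 999)
    # 3.84 is roughly the 95th percentile of chi-square(1), so p should be near 0.05
    print(monte_carlo_p_value(3.84, stats))
```

Because the reference distribution is generated by the same simulator that holds the observed genotypes fixed, the p-value is calibrated to that conditional null rather than to a parametric model of how many causal SNPs there are, which is the intuition behind the robustness noted above.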
Simulation results: haplotype analyses
| Model | Sample size | LD level | Kong and Cox LOD score | Haplotype extension of Li-cpg (type 1 error/power) | Haplotype extension of Sun-cpg (type 1 error/power) |
|---|---|---|---|---|---|
| Null 1 | 500 | Full | 2.75 | 0.056 | 0.036 |
| Null 2 | 500 | Full | 2.46 | 0.056 | 0.034 |
| Alt 1 | 500 | Mid | 3.04 | 0.44 | 0.43 |
| Alt 1 | 1,000 | Mid | 5.70 | 0.73 | 0.80 |
| Alt 2 | 500 | Low | 2.36 | 0.62 | 0.70 |
| Alt 2 | 1,000 | Low | 4.41 | 0.95 | 0.92 |
Type 1 error estimates for a test of nominal size 0.05 are based on 500 data replicates. Power estimates are based on 100 data replicates. When the null hypothesis is true, the values in the table are estimates of type 1 error.
The extension of Sun-cpg for haplotypes used here is based on the NPL-type statistic with weights = σ.
The same marker map was used here as for the simulations in Table II (markers with four equally frequent alleles at 0.0, 2.5, 5.0, 7.5 and 10.0 cM; candidate and disease SNPs at 5.2 cM).