Marc Joiret1,2, Jestinah M Mahachie John1, Elena S Gusareva1, Kristel Van Steen1,3.
Abstract
BACKGROUND: In Genome-Wide Association Studies (GWAS), linkage disequilibrium (LD) is important because it allows genetic markers that tag the actual causal variants to be identified. In Genome-Wide Association Interaction Studies (GWAIS), similar principles hold for pairs of causal variants. However, LD may also interfere with the detection of genuine epistasis signals, in that there may be complete confounding between Gametic Phase Disequilibrium (GPD) and interaction. GPD may involve unlinked genetic markers, even ones residing on different chromosomes. GPD is often eliminated in GWAIS, via feature selection schemes or so-called pruning algorithms, to obtain unconfounded epistasis results. However, little is known about the optimal degree of GPD/LD pruning that balances false positive control against sufficient power of epistasis detection statistics. Here, we focus on Model-Based Multifactor Dimensionality Reduction (MB-MDR) as one large-scale epistasis detection tool. Its performance has been thoroughly investigated in terms of false positive control and power, under a variety of scenarios involving different trait types and study designs, as well as error-free and noisy data, but never with respect to multicollinear SNPs.
Keywords: 1000 genomes project; Ankylosing spondylitis; Gametic phase disequilibrium (GPD); Genome-wide association interaction studies (GWAIS); Model-based multifactor-dimensionality reduction (MB-MDR); Signal sensitivity
Year: 2019 PMID: 31198442 PMCID: PMC6558841 DOI: 10.1186/s13040-019-0199-7
Source DB: PubMed Journal: BioData Min ISSN: 1756-0381 Impact factor: 2.522
Fig. 1 Two LD block structures on two chromosomes: Presented are two LD blocks corresponding to the HapMap3 GBR subpopulation of 91 unrelated individuals. The selected regions, from chromosome 8 (left) and chromosome 7 (right), consist of 787 and 964 SNPs, respectively. The positions of causal epistatic variants are indicated by arrows
Fig. 2 LD block structures in the selected region of chromosome 8. Zoom-in on the positions of causal DSLs on chromosome 8 corresponding to 3 out of 4 epistatic scenarios
Fig. 3 Trajectory of the minor allele frequencies. Simulated forward-time trajectory of allele frequency over 500 generations. The blue line is the trajectory for DSL 2 A, moving from 0.42 to 0.40 in 500 generations. The orange line is the trajectory for DSL 1, moving from 0.088 to 0.05
Imposed genotype penetrance table and disease prevalence calculation in the general population with allele frequencies under assumption of Hardy-Weinberg equilibrium
| DSL 2 A genotype (frequency) | DSL 1: TT, (1−p)² = 0.9025 | DSL 1: AT, 2p(1−p) = 0.095 | DSL 1: AA, p² = 0.0025 | Marginal penetrance |
|---|---|---|---|---|
| AA, (1−p)² = 0.36 | 0.0067 | 0.0911 | 0.0911 | 0.015 |
| CA, 2p(1−p) = 0.48 | 0.0067 | 0.0392 | 0.0392 | 0.010 |
| CC, p² = 0.16 | 0.0067 | 0.0163 | 0.0163 | 0.008 |
| Marginal penetrance | 0.0067 | 0.054 | 0.054 | p(D)=0.0113 |

Odds ratio as compared to double homozygous

| DSL 2 A genotype | DSL 1: TT | DSL 1: AT | DSL 1: AA |
|---|---|---|---|
| AA | 1.00 | 14.88 | 14.88 |
| CA | 1.00 | 6.05 | 6.05 |
| CC | 1.00 | 2.46 | 2.46 |
In all settings, the minor allele frequency is p=0.05 for DSL 1 and p=0.40 for DSL 2. Upper part: probabilities of disease given the genotype; values are for simulated datasets in setting A (DSL 1 and DSL 2 A) with epistasis effect size β3=0.90 (see text). Lower part: odds ratios with the major homozygous genotype (TT) as baseline in setting A with epistasis effect size β3=0.90. The prevalence in the general population with this setting is around 1%
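The marginal penetrances and the prevalence p(D) in the penetrance table can be checked by weighting each penetrance by its Hardy-Weinberg genotype frequency. The following is a minimal sketch of that arithmetic, not the authors' code; the function and variable names are illustrative, and the penetrances are the rounded values printed in the table, so results agree with the table only to within rounding.

```python
def hwe(maf):
    """Hardy-Weinberg genotype frequencies for a minor allele frequency:
    (major homozygote, heterozygote, minor homozygote)."""
    return ((1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2)

def odds(p):
    """Convert a probability into odds."""
    return p / (1 - p)

f1 = hwe(0.05)  # DSL 1 genotype frequencies: 0.9025, 0.095, 0.0025
f2 = hwe(0.40)  # DSL 2 A genotype frequencies: 0.36, 0.48, 0.16

# Rounded penetrances copied from the table.
# Rows follow f2 (0.36, 0.48, 0.16); columns follow f1 (0.9025, 0.095, 0.0025).
penetrance = [
    [0.0067, 0.0911, 0.0911],
    [0.0067, 0.0392, 0.0392],
    [0.0067, 0.0163, 0.0163],
]

# Marginal penetrance of each DSL 2 A genotype: average over DSL 1 genotypes.
row_marginals = [sum(p * w for p, w in zip(row, f1)) for row in penetrance]

# Marginal penetrance of each DSL 1 genotype: average over DSL 2 A genotypes.
col_marginals = [
    sum(penetrance[i][j] * f2[i] for i in range(3)) for j in range(3)
]

# Disease prevalence: average over both loci.
prevalence = sum(m * w for m, w in zip(row_marginals, f2))

# Odds ratios relative to the first column (the DSL 1 major homozygote).
odds_ratios = [[odds(p) / odds(row[0]) for p in row] for row in penetrance]

print([round(m, 3) for m in row_marginals])
print([round(m, 3) for m in col_marginals])
print(round(prevalence, 4))
```

Because the inputs are already rounded to four decimals, the recomputed odds ratios differ slightly from the tabulated ones in the second decimal place.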
Allele frequencies of DSLs in founder and expanded populations (first allele in each pair is the minor allele)
| Causal SNP | Alleles | Minor allele frequency, founder population (91 individuals) | Minor allele frequency, expanded population (10000 individuals) |
|---|---|---|---|
| DSL 1 (rs17644404) | A/T | 0.09 | 0.05 |
| DSL 2 A (rs10956767) | C/A | 0.42 | 0.40 |
| DSL 2 B (rs2073640) | T/C | 0.33 | 0.40 |
| DSL 2 C (rs1476427) | T/C | 0.35 | 0.40 |
| DSL 2 D (rs112698197) | T/C | 0.19 | 0.40 |
Fig. 4 Disease odds ratios conditioned on the genotype of 2 causal loci: Odds ratio effect sizes conditioned on pure epistatic pairs of loci for disease status in the simulated case-control datasets. Causal effects for DSL 1 and DSL 2 A are conditioned on allele A for DSL 1. The risk allele A of DSL 2 A only increases risk for individuals carrying at least one copy of the DSL 1 risk allele (DSL 1 is epistatic to DSL 2 A). The low-risk CC/TT genotype is set as the baseline (OR=1). The other genotype combinations are coded according to g1, g2 and their product g1×g2. Odds ratios are obtained by exponentiating the β3 coefficient of the interaction term from the logistic regression (see text). Error bars: 95% confidence intervals of the odds ratios obtained across different simulated case-control samples
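To illustrate how exponentiating the interaction coefficient yields an odds ratio, the sketch below evaluates a logistic model containing only an intercept and the interaction term. Several things here are assumptions rather than statements from the source: the intercept value of -5 (chosen because it gives a baseline penetrance of about 0.0067), the absence of main effects (suggested by "pure epistatic"), and the use of product-term values k = 0, 1, 2, 3, whose mapping to genotypes is not given in this excerpt.

```python
import math

BETA3 = 0.90  # interaction effect size quoted for setting A
BETA0 = -5.0  # ASSUMPTION: intercept is not stated in this excerpt

def odds_ratio(k, beta3=BETA3):
    """Odds ratio versus baseline when the product term g1*g2 equals k."""
    return math.exp(beta3 * k)

def penetrance(k, beta0=BETA0, beta3=BETA3):
    """P(disease) from logit(P) = beta0 + beta3 * k (no main effects)."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta3 * k)))

# k = 0 is the baseline; k = 1, 2, 3 are hypothetical product-term values.
for k in range(4):
    print(k, round(odds_ratio(k), 2), round(penetrance(k), 4))
```

Under these assumptions the sketch gives odds ratios of 2.46, 6.05 and 14.88 and penetrances of 0.0163, 0.0392 and 0.0911 for k = 1, 2, 3, which match the values tabulated for setting A.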
Number of tag SNPs associated with causal variants for different LD (r2) values
| Causal SNP | Number of tag SNPs at LD (r2) | | | | |
|---|---|---|---|---|---|
| DSL 1 | 60 | 2 | 2 | 1 | 1 |
| DSL 2 A | 115 | 114 | 114 | 111 | 98 |
| DSL 2 B | 110 | 110 | 109 | 107 | 107 |
| DSL 2 C | 81 | 80 | 80 | 78 | 78 |
| DSL 2 D | 76 | 48 | 31 | 31 | 24 |
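Counting tag SNPs at several r2 thresholds can be sketched as follows. This is an illustration only: it uses the squared Pearson correlation between genotype dosage vectors as a stand-in for r2 (the excerpt does not say how LD was computed), and the function names and toy genotype matrix are hypothetical.

```python
import numpy as np

def r2_with_causal(genotypes, causal_idx):
    """Squared Pearson correlation between every SNP and the causal SNP.

    genotypes: (n_individuals, n_snps) array of 0/1/2 allele counts.
    """
    g = np.asarray(genotypes, dtype=float)
    centered = g - g.mean(axis=0)
    sd = centered.std(axis=0)
    cov = centered.T @ centered[:, causal_idx] / g.shape[0]
    with np.errstate(divide="ignore", invalid="ignore"):
        r = cov / (sd * sd[causal_idx])
    # Monomorphic SNPs have zero variance; treat their r2 as 0.
    return np.nan_to_num(r) ** 2

def count_tag_snps(genotypes, causal_idx, thresholds):
    """Number of SNPs (excluding the causal one) with r2 >= each threshold."""
    r2 = r2_with_causal(genotypes, causal_idx)
    r2[causal_idx] = 0.0  # the causal SNP is not its own tag
    return {t: int((r2 >= t).sum()) for t in thresholds}

# Toy data: 6 individuals, 4 SNPs; column 0 plays the causal SNP.
toy = np.array([
    [0, 0, 2, 1],
    [1, 1, 1, 0],
    [2, 2, 0, 1],
    [0, 0, 2, 2],
    [1, 1, 1, 0],
    [2, 1, 0, 1],
])
print(count_tag_snps(toy, 0, [0.2, 0.5, 0.8]))
```

Lower thresholds admit more tag SNPs, so the counts can only stay equal or shrink as the threshold rises.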
Heritabilities associated with effect sizes for the epistatic interaction in all simulated datasets
| Simulated setting | Interaction | Heritability |
|---|---|---|
| Effect size 1 | | |
| Effect size 2 | | |
| Effect size 3 | | |
Fig. 5 Sensitivities of MB-MDR to detect two-locus pure epistatic interaction in 4 settings at three effect sizes and with different LD pruning levels: Signal sensitivities (upper panel) and exact sensitivities (lower panel) are displayed at different LD pruning thresholds (unpruned data or LD pruning at 0.75, 0.60, 0.50 and 0.20). Signal sensitivities determined with tag-SNP subsets at LD r2≥0.45 with causal SNPs
Fig. 6 Sensitivities of MB-MDR to detect two-locus pure epistatic interaction in 4 settings at three effect sizes and with different LD pruning levels: Signal sensitivities (upper panel) and exact sensitivities (lower panel) are displayed at different LD pruning thresholds (unpruned data or LD pruning at 0.75, 0.60, 0.50 and 0.20). Signal sensitivities determined with tag-SNP subsets at LD r2≥0.20 with causal SNPs
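One plausible reading of the two sensitivity measures can be sketched in code. The definitions below are assumptions, not the source's: exact sensitivity is the fraction of simulated datasets in which the causal pair itself is reported, and signal sensitivity is the fraction in which any reported pair has one SNP in each causal SNP's tag set (the causal SNP counting as part of its own set). All SNP identifiers are hypothetical.

```python
def _spans(pair, region_1, region_2):
    """True if the pair has one SNP in each region, in either order."""
    a, b = pair
    return (a in region_1 and b in region_2) or (a in region_2 and b in region_1)

def sensitivities(detected_per_dataset, causal_pair, tags_1, tags_2):
    """Return (exact, signal) sensitivity over a list of simulated datasets.

    detected_per_dataset: one collection of significant SNP pairs per dataset.
    causal_pair: identifiers of the two causal SNPs.
    tags_1, tags_2: SNPs in LD (above a chosen r2 threshold) with each causal SNP.
    """
    causal = set(causal_pair)
    region_1 = set(tags_1) | {causal_pair[0]}
    region_2 = set(tags_2) | {causal_pair[1]}
    exact_hits = 0
    signal_hits = 0
    for pairs in detected_per_dataset:
        if any(set(p) == causal for p in pairs):
            exact_hits += 1
        if any(_spans(p, region_1, region_2) for p in pairs):
            signal_hits += 1
    n = len(detected_per_dataset)
    return exact_hits / n, signal_hits / n

# Four toy datasets: exact hit, tag-only hit, unrelated pair, nothing detected.
datasets = [
    {("rsA", "rsB")},
    {("tagA1", "rsB")},
    {("rsX", "rsY")},
    set(),
]
exact, signal = sensitivities(datasets, ("rsA", "rsB"), {"tagA1"}, {"tagB1"})
print(exact, signal)
```

With these definitions signal sensitivity can never fall below exact sensitivity, and pruning that removes a causal SNP drives exact sensitivity to zero while retained tag SNPs can still carry the signal.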
Sensitivity of MB-MDR in detecting a two-locus model of pure epistatic interaction in 1200 simulated datasets with real human genome LD patterns, for 3 effect sizes and 5 LD pruning levels
| LD block setting | LD pruning | Effect size | Exact sensitivity | Signal sensitivity (LD r2≥0.45) | Signal sensitivity (LD r2≥0.20) |
|---|---|---|---|---|---|
| Two SNPs in same LD block | unpruned | | 0.61 | 0.67 | 0.73 |
| | | | 0.55 | 0.65 | 0.77 |
| | | | 0.70 | 0.85 | 0.89 |
| | LD (0.75) | | 0.01 | 0.90 | 0.91 |
| | | | 0.04 | 0.92 | 0.94 |
| | | | 0.03 | 0.93 | 0.93 |
| | LD (0.60) | | 0.01 | 0.93 | 0.94 |
| | | | 0.00 | 0.94 | 0.94 |
| | | | 0.01 | 0.92 | 0.94 |
| | LD (0.50) | | 0.00 | 0.91 | 0.92 |
| | | | 0.00 | 0.90 | 0.91 |
| | | | 0.01 | 0.91 | 0.95 |
| | LD (0.20) | | 0.00 | 0.61 | 0.74 |
| | | | 0.00 | 0.69 | 0.80 |
| | | | 0.01 | 0.66 | 0.84 |
| Two SNPs in middle of two separate LD blocks | unpruned | | 0.54 | 0.75 | 0.75 |
| | | | 0.46 | 0.70 | 0.71 |
| | | | 0.41 | 0.75 | 0.76 |
| | LD (0.75) | | 0.64 | 0.91 | 0.91 |
| | | | 0.58 | 0.91 | 0.91 |
| | | | 0.44 | 0.93 | 0.94 |
| | LD (0.60) | | 0.49 | 0.92 | 0.92 |
| | | | 0.41 | 0.93 | 0.93 |
| | | | 0.27 | 0.94 | 0.95 |
| | LD (0.50) | | 0.39 | 0.92 | 0.92 |
| | | | 0.32 | 0.93 | 0.93 |
| | | | 0.23 | 0.94 | 0.95 |
| | LD (0.20) | | 0.19 | 0.57 | 0.81 |
| | | | 0.16 | 0.69 | 0.91 |
| | | | 0.21 | 0.83 | 0.92 |
| One SNP in a block and one in the edge of a separate LD block | unpruned | | 0.18 | 0.33 | 0.43 |
| | | | 0.23 | 0.36 | 0.49 |
| | | | 0.18 | 0.36 | 0.51 |
| | LD (0.75) | | 0.0 | 0.65 | 0.74 |
| | | | 0.0 | 0.72 | 0.83 |
| | | | 0.0 | 0.57 | 0.76 |
| | LD (0.60) | | 0.0 | 0.56 | 0.74 |
| | | | 0.0 | 0.59 | 0.81 |
| | | | 0.0 | 0.47 | 0.74 |
| | LD (0.50) | | 0.0 | 0.48 | 0.71 |
| | | | 0.0 | 0.50 | 0.81 |
| | | | 0.0 | 0.36 | 0.70 |
| | LD (0.20) | | 0.0 | 0.07 | 0.60 |
| | | | 0.0 | 0.05 | 0.62 |
| | | | 0.0 | 0.04 | 0.57 |
| Two SNPs on LD blocks of separate chromosomes | unpruned | | 0.39 | 0.68 | 0.82 |
| | | | 0.40 | 0.69 | 0.81 |
| | | | 0.58 | 0.76 | 0.84 |
| | LD (0.75) | | 0.18 | 0.86 | 0.94 |
| | | | 0.18 | 0.93 | 0.99 |
| | | | 0.23 | 0.84 | 0.90 |
| | LD (0.60) | | 0.14 | 0.87 | 0.94 |
| | | | 0.13 | 0.93 | 0.98 |
| | | | 0.17 | 0.85 | 0.90 |
| | LD (0.50) | | 0.13 | 0.85 | 0.92 |
| | | | 0.13 | 0.93 | 0.97 |
| | | | 0.16 | 0.83 | 0.89 |
| | LD (0.20) | | NA | NA | NA |
| | | | 0.10 | 0.67 | 0.86 |
| | | | 0.17 | 0.75 | 0.84 |
Estimated false positive rates (type I error), in %, for different LD patterns and pruning levels
| LD pruning | LD block settings: | | | |
|---|---|---|---|---|
| unpruned | < 1% | < 1% | < 1% | < 1% |
| LD (0.75) | < 1% | < 1% | < 1% | < 1% |
| LD (0.60) | < 1% | < 1% | < 1% | < 1% |
| LD (0.50) | < 1% | < 1% | < 1% | < 1% |
| LD (0.20) | < 1% | < 1% | < 1% | < 1% |
Null data with no disease association to the investigated pair of SNPs as disease susceptibility loci