Guoqiang Yu, David J Miller, Chiung-Ting Wu, Eric P Hoffman, Chunyu Liu, David M Herrington, Yue Wang.
Abstract
Most genetic and environmental factors work together in determining complex disease risk. Detecting gene-environment interactions may allow us to elucidate novel and targetable molecular mechanisms by which environmental exposures modify genetic effects. Unfortunately, standard logistic regression (LR) assumes a mathematically convenient structure for the null hypothesis, which results in both low detection power and poor type I error control; LR is also susceptible to confounding from missing factors, imperfect surrogates, and disease heterogeneity. Here we describe a new baseline framework, the asymmetric independence model (AIM) in case-control studies, and provide mathematical proofs and simulation studies verifying its validity across a wide range of conditions. We show that AIM, unlike LR, mathematically preserves the asymmetric nature of maintaining health versus acquiring a disease, and is thus more powerful and robust in detecting synergistic interactions. We present examples from four clinically discrete domains where AIM identified interactions that were previously either inconsistent or recognized with less statistical certainty.
Year: 2019 PMID: 30792419 PMCID: PMC6385186 DOI: 10.1038/s41598-019-38983-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Mathematical formulation and illustrative comparison between LR and AIM. (a) Theoretical discrepancy between Logistic Regression (LR) prediction and ground truth probability in the case of missing variables (Appendix B). (b) Theoretical capability of the Asymmetric Independence Model (AIM) to accurately predict the ground truth probability in the case of missing variables. (c) Mathematical expression of LR. (d) Mathematical expression of AIM.
Figure 2. Comparative performance assessment of AIM and LR using extensive simulation datasets. The simulation studies evaluate the type I error and detection power of AIM and LR in a controlled setting, under varying parameter settings that characterize the population being studied, as well as under the three confounding scenarios prominently identified in this paper: missing factors, surrogate factors, and disease subtypes. The goal is to understand how different parameter settings and these scenarios affect the performance of both models. (a) The empirical type I error (evaluated when the null hypothesis of no interaction is valid) at significance level 0.05. The gray region is the 95% confidence interval. (b) Power versus sample size, with an interaction effect size (odds ratio) of 1.5, a case fraction of 50%, and a main effect size of 1.5 for both risk factors. (c) Power versus case-control ratio. The fraction of cases is varied by adjusting the baseline parameter in the LR model possessing an interaction term. The sample size is 2000 and the interaction effect size is 1.5. The main effect size for both risk factors is 1.5. (d) Power versus frequency of risk allele, with sample size 2000, main effect size 1.5 for both risk factors, interaction effect size 1.5, and case fraction 50%. (e) Power to detect an interaction versus correlation between the risk factors for AIM and LR models; both methods achieve their greatest detection power when risk factors are uncorrelated. (f) Power versus main effect size, with sample size 1000, interaction effect size 1.5, and case fraction 50%. (g) Sample size versus p-value threshold, with main effect size 1.5, interaction effect size 1.5, and case fraction 50%. (h) Statistical significance (log p-values) of five ground-truth interactions, as detected by the AIM and LR models (Appendix D–E).
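The simulation setup in the Figure 2 caption can be illustrated with a minimal sketch for the LR side only. It assumes two independent binary risk factors with a common frequency `freq`, draws outcomes from a logistic model with an interaction term, and tests the interaction with a Wald test on the saturated 2×2×2 table. The function names, `freq`, `intercept`, and the number of replicates are hypothetical choices for illustration, not the authors' settings. AIM is not reproduced here because its expression is not given in this excerpt.

```python
import math
import random


def interaction_pvalue(counts):
    """Two-sided Wald p-value for the interaction term of a saturated logistic model.

    counts[(a, b, y)] is the number of subjects with exposures a, b and outcome y.
    """
    if min(counts.values()) == 0:
        return 1.0  # estimate undefined with an empty cell; treat as non-significant
    # Log of the interaction odds ratio OR11 / (OR10 * OR01).
    log_ior = (
        math.log(counts[1, 1, 1]) - math.log(counts[1, 1, 0])
        - math.log(counts[1, 0, 1]) + math.log(counts[1, 0, 0])
        - math.log(counts[0, 1, 1]) + math.log(counts[0, 1, 0])
        + math.log(counts[0, 0, 1]) - math.log(counts[0, 0, 0])
    )
    se = math.sqrt(sum(1.0 / n for n in counts.values()))
    return math.erfc(abs(log_ior / se) / math.sqrt(2))


def simulate_rejection_rate(n, or_main, or_int, freq, intercept, reps, alpha=0.05, seed=0):
    """Fraction of simulated datasets in which the LR interaction test rejects."""
    rng = random.Random(seed)
    b_main, b_int = math.log(or_main), math.log(or_int)
    rejections = 0
    for _ in range(reps):
        counts = {(a, b, y): 0 for a in (0, 1) for b in (0, 1) for y in (0, 1)}
        for _ in range(n):
            a = int(rng.random() < freq)  # assumed: independent binary exposures
            b = int(rng.random() < freq)
            eta = intercept + b_main * (a + b) + b_int * a * b
            y = int(rng.random() < 1.0 / (1.0 + math.exp(-eta)))
            counts[a, b, y] += 1
        if interaction_pvalue(counts) < alpha:
            rejections += 1
    return rejections / reps


# Interaction odds ratio 1.5 gives an empirical power; 1.0 gives an empirical type I error.
power = simulate_rejection_rate(2000, 1.5, 1.5, freq=0.3, intercept=-0.4, reps=100)
type1 = simulate_rejection_rate(2000, 1.5, 1.0, freq=0.3, intercept=-0.4, reps=100)
print(f"power={power:.2f}, type I error={type1:.2f}")
```

Changing `intercept` shifts the fraction of cases, which mirrors the caption's description of varying the baseline parameter in panel (c).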
Figure 3. Empirical type I error rate at significance level 0.05 for LR (dark grey) and AIM (light grey). (a) A few missing factors with large effect size; (b) Surrogate markers with strong marginal effects; (c) Three subtypes.
Legnani et al. study: risk of venous thrombosis according to the presence of a thrombophilic genetic mutation and the use of oral contraceptives.
| Thrombophilic genetic risk mutation | Oral contraceptive | Controls | Cases | Odds ratio |
|---|---|---|---|---|
| − | − | 444 | 118 | 1 |
| − | + | 166 | 86 | 1.95 |
| + | − | 33 | 42 | 4.79 |
| + | + | 7 | 51 | 27.4 |
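The odds ratios in this table follow directly from the counts. As a minimal sketch, the snippet below recomputes them against the unexposed reference group and compares the observed joint odds ratio with the product of the two single-factor odds ratios, which is what a multiplicative (LR-style) null of no interaction would predict. The function name `odds_ratios` and the dictionary layout are illustrative choices, not part of the source.

```python
def odds_ratios(table, reference):
    """Odds ratio of each exposure group relative to the reference group.

    table maps an exposure label to a (controls, cases) pair.
    """
    ref_controls, ref_cases = table[reference]
    ref_odds = ref_cases / ref_controls
    return {
        group: (cases / controls) / ref_odds
        for group, (controls, cases) in table.items()
    }


# Keys are (mutation, oral contraceptive); values are (controls, cases) from the table.
legnani = {
    ("-", "-"): (444, 118),
    ("-", "+"): (166, 86),
    ("+", "-"): (33, 42),
    ("+", "+"): (7, 51),
}

ors = odds_ratios(legnani, reference=("-", "-"))
expected_joint = ors[("-", "+")] * ors[("+", "-")]  # multiplicative expectation
excess = ors[("+", "+")] / expected_joint           # interaction odds ratio

for group, value in ors.items():
    print(group, round(value, 2))
print("expected joint OR:", round(expected_joint, 2))
print("observed / expected:", round(excess, 2))
```

The observed joint odds ratio of about 27.4 is roughly three times the multiplicative expectation of about 9.3, which is the kind of synergy the interaction tests are meant to detect.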
Joint association of alcohol drinking and tobacco smoking statuses with esophageal cancer risk.
| Alcohol | Smoking | Men: controls | Men: cases | Men: odds ratio | Women: controls | Women: cases | Women: odds ratio | All: controls | All: cases | All: odds ratio |
|---|---|---|---|---|---|---|---|---|---|---|
| never | never | 189 | 8 | 1 | 234 | 83 | 1 | 423 | 91 | 1 |
| never | ever | 298 | 61 | 4.84 | 55 | 27 | 1.38 | 353 | 88 | 1.16 |
| ever | never | 144 | 24 | 3.94 | 63 | 29 | 1.30 | 207 | 53 | 1.19 |
| ever | ever | 777 | 562 | 17.1 | 19 | 36 | 5.34 | 796 | 598 | 3.49 |

| Model | Men | Women | All |
|---|---|---|---|
| LR (p-value) | 0.81 | 0.014 | 5.10e-5 |
| AIM (p-value) | 5.43e-6 | 0.0031 | 2.11e-8 |
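To make the LR row concrete, here is a minimal sketch of a Wald test for the interaction term of a saturated logistic model, computed directly from the eight cell counts of each stratum. This is an assumption about the form of the test: the source does not say which LR test produced the tabulated values, so the results should be of similar magnitude but need not match exactly. The function name `wald_interaction` is hypothetical, and the AIM row is not reproduced because its expression is not given in this excerpt.

```python
import math


def wald_interaction(cells):
    """Wald test for a multiplicative interaction in a 2x2 exposure table.

    cells lists (controls, cases) for the exposure groups in the order
    (never, never), (never, ever), (ever, never), (ever, ever).
    Returns the interaction odds ratio and a two-sided p-value.
    """
    (c00, d00), (c01, d01), (c10, d10), (c11, d11) = cells
    # Interaction odds ratio: OR11 / (OR10 * OR01).
    ior = (d11 * d00 * c10 * c01) / (c11 * c00 * d10 * d01)
    # Standard error of log(ior): square root of the summed reciprocal counts.
    se = math.sqrt(sum(1.0 / n for pair in cells for n in pair))
    z = math.log(ior) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return ior, p_value


# (controls, cases) per alcohol/smoking group, taken from the table.
strata = {
    "men": [(189, 8), (298, 61), (144, 24), (777, 562)],
    "women": [(234, 83), (55, 27), (63, 29), (19, 36)],
    "all": [(423, 91), (353, 88), (207, 53), (796, 598)],
}

results = {name: wald_interaction(cells) for name, cells in strata.items()}
for name, (ior, p_value) in results.items():
    print(f"{name}: interaction OR = {ior:.2f}, Wald p = {p_value:.2g}")
```

For men the interaction odds ratio falls slightly below 1, so a multiplicative test sees no interaction there even though the joint odds ratio of 17.1 is far larger than either single-factor odds ratio.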
Figure 4. Re-analysis of the interaction between the ALDH2 gene and alcohol consumption.
Joint association of tobacco smoking status and NAT2 acetylation genotype with bladder cancer risk.
| NAT2 acetylation genotype | Smoking status | Controls | Cases | Odds ratio |
|---|---|---|---|---|
| Fast | never | 131 | 66 | 1 |
| Fast | ever | 362 | 340 | 1.86 |
| Slow | never | 199 | 91 | 0.91 |
| Slow | ever | 438 | 637 | 2.89 |