Iryna Lobach, Joshua Sampson, Alexander Alekseyenko, Siarhei Lobach, Li Zhang.
Abstract
Case-control Genome-Wide Association Studies (GWAS) provide a rich resource for studying the genetic architecture of complex diseases. A key goal is to elucidate how genetic effects vary with the environment, which is traditionally defined as Gene-Environment interaction (GxE). An overlooked complication is that multiple, distinct pathophysiologic mechanisms may lead to the same clinical diagnosis, and these mechanisms often have distinct genetic bases. In this paper, we first show that using the clinically diagnosed status can lead to severely biased estimates of GxE interactions when the frequency of the pathologic diagnosis of interest, relative to other diagnoses, depends on the environment. We then propose a pseudo-likelihood solution to correct the bias. Finally, we demonstrate our method in extensive simulations and in a GWAS of Alzheimer's disease.
Year: 2018 PMID: 30133451 PMCID: PMC6104951 DOI: 10.1371/journal.pone.0201140
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Parameter estimates in Alzheimer’s disease study.
| KIAA0922 | TLR2 | uLR | -0.30, p = 0.09 | |||||
| KIAA0922 | TLR2 | uLR | -0.16, p = 0.43 | |||||
| TLR2 | uLR | 0.43, p = 0.06 | 0.01, p = 0.49 | ||||
| SORBS2 | TLR3 | uLR | -0.31, p = 0.06 | |||||
| SORBS2 | TLR3 | uLR | -0.24, p = 0.21 | |||||
| SORBS2 | TLR3 | uLR | 0.08, p = 0.24 | 0.03, p = 0.55 | ||||
| SORBS2 | TLR3 | uLR | 0.13, p = 0.62 | -0.64, p = 0.14 | ||||
| SORBS2 | TLR3 | uLR | -0.33, p = 0.15 | 0.46, p = 0.86 | ||||
| SORBS2 | TLR3 | uLR | -0.24, p = 0.05 | |||||
| SORBS2 | TLR3 | uLR | -0.01, p = 0.51 | 0.21, p = 0.16 | ||||
| SORBS2 | TLR3 | uLR | -0.27, p = 0.26 | |||||
| NQO1 | LOC100132364 | uLR | 0.59, p = 0.07 | |||||
| TLR3 | uLR | -0.02, p = 0.50 | -0.61, p = 0.06 | ||||
| TLR3 | uLR | -0.14, p = 0.25 | 0.01, p = 0.52 | ||||
| TLR3 | FAM149A | uLR | -0.12, p = 0.20 | 0.12, p = 0.26 | ||||
| ASTN2 | TLR4 | uLR | -0.04, p = 0.65 | |||||
| TNFRSF19 | uLR | -0.28, p = 0.06 | 0.27, p = 0.13 | ||||
| ASTN2 | TLR4 | uLR | 0.68, p = 0.06 | 0.74, p = 0.06 | ||||
| ASTN2 | TLR4 | uLR | 0.04, p = 0.59 | 0.16, p = 0.69 | ||||
| ASTN2 | TLR4 | uLR | 0.06, p = 0.56 | -0.28, p = 0.31 | ||||
| ASTN2 | TLR4 | uLR | -0.06, p = 0.34 | |||||
| ASTN2 | TLR4 | uLR | -0.03, p = 0.43 | 0.27, p = 0.08 | ||||
| ASTN2 | TLR4 | uLR | 0.42, p = 0.10 | |||||
| ASTN2 | TLR4 | uLR | -0.009, p = 0.49 | |||||
| ASTN2 | TLR4 | uLR | 0.05, p = 0.32 | 0.05, p = 0.54 | ||||
| ASTN2 | TLR4 | uLR | 0.21, p = 0.63 | 0.11, p = 0.54 | 1.2, p = 0.17 | |||
| ASTN2 | TLR4 | uLR | 0.14, p = 0.66 | 0.59, p = 0.11 | ||||
| ASTN2 | TLR4 | uLR | -0.07, p = 0.34 | |||||
| ASTN2 | TLR4 | uLR | 0.75, p = 0.08 | 0.07, p = 0.59 | 0.68, p = 0.08 | |||
| ASTN2 | TLR4 | uLR | 0.52, p = 0.33 | |||||
| ASTN2 | TLR4 | uLR | ||||||
| ASTN2 | TLR4 | uLR | -0.02, p = 0.43 | |||||
| ASTN2 | TLR4 | uLR | -0.23, p = 0.33 | |||||
| ASTN2 | TLR4 | uLR | -0.19, p = 0.10 | 0.24, p = 0.10 | ||||
| ASTN2 | TLR4 | uLR | -1.3, p = 0.11 | |||||
| ASTN2 | TLR4 | uLR | -0.01, p = 0.47 | |||||
| ASTN2 | TLR4 | uLR | -0.2, p = 0.00 | |||||
| ASTN2 | TLR4 | uLR | 0.02, p = 0.59 | -0.04, p = 0.44 | ||||
| ASTN2 | TLR4 | uLR | -0.24, p = 0.27 | 0.26, p = 0.26 | ||||
| ASTN2 | TLR4 | uLR | -0.005, p = 0.56 | -0.14, p = 0.48 | 1.4, p = 0.15 | |||
| ASTN2 | TLR4 | uLR | 0.17, p = 0.12 | 0.04, p = 0.58 | ||||
| TLR4 | uLR | 0.63, p = 0.25 | -0.56, p = 0.30 | 0.78, p = 0.27 | |||
| TLR4 | uLR | 0.14, p = 0.36 | |||||
| TLR4 | LOC100129489 | uLR | -0.02, p = 0.46 | 0.47, p = 0.11 | ||||
| TLR4 | LOC100129489 | uLR | -0.02, p = 0.43 | 0.07, p = 0.63 | ||||
| TLR4 | LOC100129489 | uLR | 0.08, p = 0.69 | -0.11, p = 0.33 | ||||
| TLR4 | LOC100129489 | uLR | -0.02, p = 0.45 | 0.08, p = 0.63 | ||||
| TLR4 | LOC100129489 | uLR | 0.25, p = 0.18 | -0.20, p = 0.28 | ||||
| TLR4 | LOC100129489 | uLR | -0.06, p = 0.33 | |||||
| TLR4 | LOC100129489 | uLR | 0.20, p = 0.11 | -0.31, p = 0.09 | ||||
| TLR4 | LOC100129489 | uLR | -0.28, p = 0.28 | |||||
| TLR4 | LOC100129489 | uLR | 0.36, p = 0.21 | 0.16, p = 0.28 | ||||
| TLR4 | LOC100129489 | uLR | 0.07, p = 0.30 | |||||
| AGER | uLR | -0.12, p = 0.08 | -0.28, p = 0.05 | ||||
| AGER | uLR | 0.03, p = 0.85 | |||||
| AGER | uLR | -0.12, p = 0.09 |
Analyses are performed using the usual logistic regression (uLR), which uses the clinical diagnosis as the outcome, and using the pseudo-likelihood method, which assumes that the proportion of nuisance disease within clinically diagnosed AD is 36% for ε4 non-carriers and 6% for ε4 carriers. The pseudo-likelihood analysis pMLE-DX estimates parameters for D = 1 vs. D = 0 and D = 1* combined. The pseudo-likelihood analysis pMLE-DX*, however, estimates two sets of risk coefficients, i.e. βs for D = 0 vs. D = 1 and β*s for D = 0 vs. D = 1*. Estimates of β*s are reported in .
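To make the assumed misdiagnosis proportions concrete, the following minimal Python sketch splits clinically diagnosed cases into the expected numbers with the disease of interest and with the nuisance disease. The proportions (36% and 6%) come from the caption; the case counts and the names `NUISANCE_FRACTION` and `expected_split` are hypothetical and chosen only for illustration.

```python
# Proportion of nuisance disease among clinically diagnosed AD cases,
# by ApoE ε4 carrier status (values taken from the caption above).
NUISANCE_FRACTION = {"e4_noncarrier": 0.36, "e4_carrier": 0.06}


def expected_split(clinical_cases):
    """Split clinically diagnosed case counts into expected true and nuisance cases.

    clinical_cases maps carrier status to a number of clinically diagnosed cases.
    Returns a dict mapping carrier status to (expected_true, expected_nuisance).
    """
    split = {}
    for status, n_cases in clinical_cases.items():
        nuisance = n_cases * NUISANCE_FRACTION[status]
        split[status] = (n_cases - nuisance, nuisance)
    return split


# Hypothetical counts, not taken from the study.
example_cases = {"e4_noncarrier": 1000, "e4_carrier": 1500}
example_split = expected_split(example_cases)
for status, (n_true, n_nuisance) in example_split.items():
    print(f"{status}: {n_true:.0f} expected true cases, "
          f"{n_nuisance:.0f} expected nuisance cases")
```

Under these assumed proportions the contamination of the clinical case group differs sharply by ε4 status: 360 of 1000 non-carrier cases versus 90 of 1500 carrier cases in this made-up example.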
Bias and RMSE in parameter estimates when β ≠ 0.
Columns uLR (usual logistic) and pMLE (pseudo-likelihood method) use the clinical disease status as the outcome; pMLE-DX (pseudo-likelihood method) takes the clinical-pathological diagnoses relationship into consideration.

| Parameter | True value | uLR Bias | uLR RMSE | pMLE Bias | pMLE RMSE | pMLE-DX Bias | pMLE-DX RMSE |
|---|---|---|---|---|---|---|---|
| | -1 | 0.46 | 0.46 | 0.98 | 0.98 | -0.0002 | 0.07 |
| | 0.406 | -0.13 | 0.16 | -0.13 | 0.16 | -0.008 | 0.13 |
| | 1.098 | -0.35 | 0.35 | -0.35 | 0.35 | 0.003 | 0.08 |
| | -0.083 | 0.02 | 0.06 | 0.02 | 0.06 | -0.004 | 0.08 |
| | 2.079 | -0.31 | 0.33 | -0.31 | 0.33 | 0.005 | 0.12 |
| | 0.693 | 0.56 | 2.4 | 0.26 | 0.91 | 0.22 | 0.93 |
| Pr(G = 1) | 0.10 | -0.0004 | 0.004 | 0.02 | 0.02 | |||
The Bias and Root Mean Squared Error (RMSE) in parameter estimates from simulations using the usual logistic regression with clinical diagnosis as the outcome (uLR), the pseudo-likelihood approach (pMLE), and our newly proposed pseudo-likelihood approach that accounts for misdiagnosis (pMLE-DX). For these simulations, the study included n0 = 3000 controls and n1 = 3000 cases. The frequency of the ApoE ε4 allele in the population is 14%. Variables Z1 and Z2 are Bernoulli with frequencies 0.50 and 0.52, respectively. The frequency of the true disease status is 46% in the population; it is 40% in the subpopulation with no ApoE ε4 alleles and 82% in the subpopulation with at least one ApoE ε4 allele. The frequency of nuisance disease within the clinical diagnosis varies by ApoE ε4 status: pr(D = 1*|D = 1,ε4−) = 0.36 and pr(D = 1*|D = 1,ε4+) = 0.06.
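For readers who want to reproduce the two summary measures in these tables, a minimal Python sketch is given here. It assumes that a simulation produces one estimate per replicate and that the true value is known; the function name `bias_and_rmse` and the replicate values are hypothetical and are not taken from the study.

```python
import math


def bias_and_rmse(estimates, true_value):
    """Return (bias, RMSE) of replicate estimates around a known true value."""
    n = len(estimates)
    if n == 0:
        raise ValueError("need at least one replicate estimate")
    errors = [est - true_value for est in estimates]
    bias = sum(errors) / n                             # mean error
    rmse = math.sqrt(sum(e * e for e in errors) / n)   # root of mean squared error
    return bias, rmse


# Hypothetical replicate estimates of a coefficient whose true value is -1.
replicates = [-0.55, -0.52, -0.56, -0.53]
bias, rmse = bias_and_rmse(replicates, true_value=-1.0)
print(f"bias = {bias:.3f}, RMSE = {rmse:.3f}")
```

Because the mean squared error equals the squared bias plus the variance of the estimates, rows in which Bias and RMSE nearly coincide (for example 0.46 and 0.46) indicate error dominated by bias rather than by sampling variability.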
Bias and RMSE in parameter estimates when β = 0.
Columns uLR (usual logistic) and pMLE (pseudo-likelihood method) use the clinical disease status as the outcome; pMLE-DX (pseudo-likelihood method) takes the clinical-pathological relationship into consideration.

| Parameter | True value | uLR Bias | uLR RMSE | pMLE Bias | pMLE RMSE | pMLE-DX Bias | pMLE-DX RMSE |
|---|---|---|---|---|---|---|---|
| | -1 | 0.45 | 0.45 | 0.93 | 0.93 | -0.0004 | 0.07 |
| | 1.099 | -0.12 | 0.15 | -0.07 | -0.15 | 0.002 | 0.13 |
| | 1.098 | -0.33 | 0.34 | -0.33 | 0.34 | 0.001 | 0.08 |
| | -0.083 | 0.02 | 0.06 | 0.02 | 0.06 | -0.003 | 0.08 |
| | 2.079 | -0.26 | 0.28 | -0.26 | 0.28 | 0.007 | 0.12 |
| | 0 | 0.12 | 0.41 | 0.13 | 0.41 | 0.04 | 0.43 |
| Pr(G = 1) | 0.10 | -0.000 | 0.004 | 0.03 | 0.03 | ||
The Bias and Root Mean Squared Error (RMSE) in parameter estimates from simulations using the usual logistic regression with clinical diagnosis as the outcome (uLR), the pseudo-likelihood approach (pMLE), and our newly proposed pseudo-likelihood approach that accounts for misdiagnosis (pMLE-DX). For these simulations, the study included n0 = 3000 controls and n1 = 3000 cases. The frequency of the ApoE ε4 allele in the population is 14%. Variables Z1 and Z2 are Bernoulli with frequencies 0.50 and 0.52, respectively. The frequency of the true disease status is 46% in the population; it is 40% in the subpopulation with no ApoE ε4 alleles and 82% in the subpopulation with at least one ApoE ε4 allele. The frequency of nuisance disease within the clinical diagnosis varies by ApoE ε4 status: pr(D = 1*|D = 1,ε4−) = 0.36 and pr(D = 1*|D = 1,ε4+) = 0.06.
Bias and RMSE in parameter estimates when = 0 and = 0.
Columns uLR (usual logistic) and pMLE (pseudo-likelihood method) use the clinical disease status as the outcome; pMLE-DX (pseudo-likelihood method) takes the clinical-pathological diagnoses relationship into consideration.

| Parameter | True value | uLR Bias | uLR RMSE | pMLE Bias | pMLE RMSE | pMLE-DX Bias | pMLE-DX RMSE |
|---|---|---|---|---|---|---|---|
| | -1 | 0.97 | 0.97 | 0.74 | 0.74 | 0.02 | 0.06 |
| | -1.7 | | | | | 0.008 | 0.05 |
| | -0.69 | 0.30 | 0.31 | -0.39 | 0.39 | 0.005 | 0.10 |
| | 0 | | | | | -0.02 | 0.14 |
| | 0.10 | 0.002 | 0.31 | 0.004 | 0.05 | 0.002 | 0.05 |
| | -0.083 | -0.004 | 0.05 | -0.0008 | 0.05 | -0.004 | 0.05 |
| | 1.3 | -0.22 | 0.24 | -0.21 | 0.23 | -0.006 | 0.10 |
| | 0.5 | | | | | -0.007 | 0.05 |
| | 0.10 | -0.13 | 0.29 | -0.28 | 0.36 | 0.01 | 0.25 |
| | 0 | | | | | 0.001 | 0.11 |
| Pr(G = 1) | 0.10 | 0.05 | 0.05 | 0.0001 | 0.004 | ||
The Bias and Root Mean Squared Error (RMSE) in parameter estimates from simulations using the usual logistic regression with clinical diagnosis as the outcome (uLR), the pseudo-likelihood approach (pMLE), and our newly proposed pseudo-likelihood approach that accounts for misdiagnosis (pMLE-DX). For these simulations, the study included n0 = 3000 controls and n1 = 3000 cases. The risk of the disease of interest is defined by a set of parameters β, while the risk of the nuisance disease is parametrized by β*. The frequency of the ApoE ε4 allele in the population is 14%. Variables Z1 and Z2 are Bernoulli with frequencies 0.50 and 0.52, respectively. Frequencies of the disease of interest and the nuisance disease are pr(D = 1) = 24.8%, pr(D = 1*) = 12.5%, pr(D = 1|ε4+) = 43%, pr(D = 1*|ε4+) = 16.1%, pr(D = 1|ε4−) = 20%, pr(D = 1*|ε4−) = 11.6%. The frequency of the nuisance disease within the clinical diagnosis varies by ApoE ε4 status: pr(D = 1*|D = 1,ε4−) = 0.36 and pr(D = 1*|D = 1,ε4+) = 0.06.
Bias and RMSE in parameter estimates when = 0, β = 0 and = 0.
Columns uLR (usual logistic) and pMLE (pseudo-likelihood method) use the clinical disease as the outcome; pMLE-DX (pseudo-likelihood method) takes the clinical-pathological diagnoses relationship into consideration.

| Parameter | True value | uLR Bias | uLR RMSE | pMLE Bias | pMLE RMSE | pMLE-DX Bias | pMLE-DX RMSE |
|---|---|---|---|---|---|---|---|
| | -1 | 0.97 | 0.97 | 0.75 | 0.75 | 0.03 | 0.06 |
| | -1.7 | | | | | 0.01 | 0.05 |
| | -0.69 | 0.30 | 0.31 | -0.38 | 0.39 | 0.004 | 0.09 |
| | 0 | | | | | -0.01 | 0.13 |
| | 0.10 | 0.002 | 0.05 | 0.001 | 0.09 | 0.002 | 0.05 |
| | -0.083 | -0.004 | 0.05 | -0.003 | 0.05 | -0.004 | 0.05 |
| | 1.3 | -0.22 | 0.24 | -0.22 | 0.23 | -0.006 | 0.10 |
| | 0.5 | | | | | -0.009 | 0.06 |
| | 0 | -0.23 | 0.28 | -0.23 | 0.28 | 0.01 | 0.25 |
| | 0 | | | | | -0.0008 | 0.12 |
| Pr(G = 1) | 0.10 | 0.000 | 0.004 | ||||
The Bias and Root Mean Squared Error (RMSE) in parameter estimates from simulations using the usual logistic regression with clinical diagnosis as the outcome (uLR), the pseudo-likelihood approach (pMLE), and our newly proposed pseudo-likelihood approach that accounts for misdiagnosis (pMLE-DX). For these simulations, the study included n0 = 3000 controls and n1 = 3000 cases. The risk of the disease of interest is defined by a set of parameters β, while the risk of the nuisance disease is parametrized by β*. The frequency of the ApoE ε4 allele in the population is 14%. Variables Z1 and Z2 are Bernoulli with frequencies 0.50 and 0.52, respectively. Frequencies of the disease of interest and the nuisance disease are pr(D = 1) = 24.8%, pr(D = 1*) = 12.5%, pr(D = 1|ε4+) = 43%, pr(D = 1*|ε4+) = 16.1%, pr(D = 1|ε4−) = 20%, pr(D = 1*|ε4−) = 11.6%. The frequency of the nuisance disease within the clinical diagnosis varies by ApoE ε4 status: pr(D = 1*|D = 1,ε4−) = 0.36 and pr(D = 1*|D = 1,ε4+) = 0.06.
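The direction of the uLR bias can be illustrated with the ε4-specific frequencies quoted in this caption alone. The sketch below is a crude marginal calculation that assumes no covariates and no interaction terms, so it is not the model fitted in the simulations. It compares the ε4 log odds ratio for the disease of interest versus disease-free subjects with the log odds ratio obtained when the nuisance disease is pooled into the case group, as happens with a clinical diagnosis. The names `freq`, `disease_free` and `log_odds_ratio` are hypothetical.

```python
import math

# Population frequencies quoted in the caption, by ApoE ε4 carrier status.
# "interest" is pr(D = 1 | ε4 status); "nuisance" is pr(D = 1* | ε4 status).
freq = {
    "e4_pos": {"interest": 0.43, "nuisance": 0.161},
    "e4_neg": {"interest": 0.20, "nuisance": 0.116},
}


def disease_free(group):
    """Probability of having neither the disease of interest nor the nuisance disease."""
    return 1.0 - group["interest"] - group["nuisance"]


def log_odds_ratio(case_pos, ctrl_pos, case_neg, ctrl_neg):
    """Log odds ratio of case status for ε4 carriers versus non-carriers."""
    return math.log((case_pos / ctrl_pos) / (case_neg / ctrl_neg))


pos, neg = freq["e4_pos"], freq["e4_neg"]

# Cases defined by the disease of interest only.
lor_interest = log_odds_ratio(
    pos["interest"], disease_free(pos), neg["interest"], disease_free(neg)
)

# Cases defined clinically: disease of interest and nuisance disease pooled.
lor_clinical = log_odds_ratio(
    pos["interest"] + pos["nuisance"], disease_free(pos),
    neg["interest"] + neg["nuisance"], disease_free(neg),
)

print(f"log OR, disease of interest: {lor_interest:.2f}")
print(f"log OR, clinical diagnosis:  {lor_clinical:.2f}")
print(f"difference:                  {lor_clinical - lor_interest:.2f}")
```

In this simplified setting the pooled, clinically defined case group yields a smaller log odds ratio than the disease of interest alone, that is, an estimate attenuated toward zero.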