Stefan Konigorski, Yuan Wang, Candemir Cigsar, Yildiz E Yilmaz.
Abstract
In genetic association studies, it is important to distinguish direct and indirect genetic effects in order to build truly functional models. For this purpose, we consider a directed acyclic graph setting with genetic variants, primary and intermediate phenotypes, and confounding factors. In order to make valid statistical inference on direct genetic effects on the primary phenotype, it is necessary to consider all potential effects in the graph, and we propose to use the estimating equations method with robust Huber-White sandwich standard errors. We evaluate the proposed causal inference based on estimating equations (CIEE) method and compare it with traditional multiple regression methods, the structural equation modeling method, and sequential G-estimation methods through a simulation study for the analysis of (completely observed) quantitative traits and time-to-event traits subject to censoring as primary phenotypes. The results show that CIEE provides valid estimators and inference by successfully removing the effect of intermediate phenotypes from the primary phenotype and is robust against measured and unmeasured confounding of the indirect effect through observed factors. All other methods except the sequential G-estimation method for quantitative traits fail in some scenarios where their test statistics yield inflated type I errors. In the analysis of the Genetic Analysis Workshop 19 dataset, we estimate and test genetic effects on blood pressure accounting for intermediate gene expression phenotypes. The results show that CIEE can identify genetic variants that would be missed by traditional regression analyses. CIEE is computationally fast, widely applicable to different fields, and available as an R package.
Keywords: causal inference; direct effect; directed acyclic graph; estimating equations; genetic association study; time-to-event phenotype
Year: 2017 PMID: 29265408 PMCID: PMC6619348 DOI: 10.1002/gepi.22107
Source DB: PubMed Journal: Genet Epidemiol ISSN: 0741-0395 Impact factor: 2.135
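The abstract relies on robust Huber-White sandwich standard errors. As a rough illustration of that idea only, the Python sketch below computes the basic (HC0) sandwich estimator for a single ordinary least squares fit and compares it with the classical standard error. It is not the CIEE R package and does not reproduce the stacked estimating equations of the paper; the function name `ols_with_sandwich`, the simulated data, and all numeric settings are assumptions made for this example.

```python
import numpy as np

def ols_with_sandwich(design, y):
    """OLS fit returning coefficients, classical SEs, and HC0 sandwich SEs."""
    n, p = design.shape
    bread = np.linalg.inv(design.T @ design)      # (X'X)^{-1}
    beta = bread @ design.T @ y
    resid = y - design @ beta
    # Classical SEs assume one common error variance for all observations.
    sigma2 = resid @ resid / (n - p)
    se_classical = np.sqrt(np.diag(sigma2 * bread))
    # Sandwich: bread * meat * bread, with meat = sum_i e_i^2 * x_i x_i'.
    meat = design.T @ (design * resid[:, None] ** 2)
    se_sandwich = np.sqrt(np.diag(bread @ meat @ bread))
    return beta, se_classical, se_sandwich

# Hypothetical heteroscedastic data: the error spread grows with |x|.
rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + x * rng.normal(size=n)
design = np.column_stack([np.ones(n), x])

beta, se_classical, se_sandwich = ols_with_sandwich(design, y)
print("slope estimate:", beta[1])
print("classical SE:", se_classical[1], "sandwich SE:", se_sandwich[1])
```

In this toy setting the classical formula understates the uncertainty of the slope, while the sandwich form does not depend on the equal-variance assumption.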
Figure 1. Overview of the directed acyclic graph considered in this study. Y is the primary outcome measure of interest; K is a secondary phenotype; X is the genetic marker of interest and αXY is the direct effect of interest. It is assumed that L has no direct effect on Y, so that L is a measured predictive factor of K; however, CIEE is also valid if L is a measured confounder of the relationship between K and Y (i.e., L affects both K and Y). U represents unmeasured factors and confounders potentially influencing L and Y.
Figure 2. Overview of the scenarios considered in the simulation study for investigation of the type I error. The models are submodels of the DAG in Figure 1 with some of the effects set to 0. Scenario 7 equals scenario 4 in this figure with larger effect sizes. Scenario 6 contains a nonzero effect of L on Y in the data generation, providing a test of robustness against model misspecification. Nonzero direct effects of X on Y are considered under each scenario for investigation of the power of the test statistics.
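To make the DAG of Figure 1 concrete, the following Python sketch simulates one dataset under the null hypothesis of no direct effect of X on Y and contrasts two point estimates. It is a minimal two-stage illustration of removing the effect of K from Y before testing X, not the CIEE implementation. All effect sizes, the sample size, the variable names, and the choice to let the multiple regression adjust for K but not for L are assumptions made for this example.

```python
import numpy as np

def ols(design, y):
    """Least-squares coefficients for y regressed on the columns of design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 50_000                                        # hypothetical sample size

# Hypothetical data-generating model following the arrows of Figure 1.
x = rng.binomial(2, 0.2, size=n).astype(float)    # genotype, MAF 0.2
u = rng.normal(size=n)                            # unmeasured factor
l_obs = u + rng.normal(size=n)                    # measured factor, U -> L
k = 0.5 * x + 1.0 * l_obs + rng.normal(size=n)    # intermediate phenotype
alpha_xy = 0.0                                    # true direct effect (null)
y = alpha_xy * x + 0.5 * k + 1.0 * u + rng.normal(size=n)
ones = np.ones(n)

# Multiple regression of Y on X and K: conditioning on K opens the
# path X -> K <- L <- U -> Y, so the X coefficient is biased.
mr_estimate = ols(np.column_stack([ones, x, k]), y)[1]

# Stage 1: estimate the effect of K on Y, adjusting for X and L.
alpha_ky_hat = ols(np.column_stack([ones, k, x, l_obs]), y)[1]
# Stage 2: remove the estimated effect of K from Y, then regress on X.
y_adjusted = y - alpha_ky_hat * k
two_stage_estimate = ols(np.column_stack([ones, x]), y_adjusted)[1]

print("multiple regression estimate of alpha_XY:", mr_estimate)
print("two-stage estimate of alpha_XY:", two_stage_estimate)
```

Under these assumed settings the multiple regression estimate is pulled away from zero although the true direct effect is zero, while the two-stage estimate stays close to zero. A valid test additionally needs standard errors that account for the first stage, which is the role of the sandwich estimator in the abstract.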
Empirical type I error estimates under the null model of a quantitative primary phenotype
| Scenario | MAF of X | CIEE | BS | G‐EST | MR | RR | SEM |
|---|---|---|---|---|---|---|---|
| 1 | 0.05 | 5.45% | 5.40% | 5.03% | 5.35% | 5.29% | 5.04% |
| 1 | 0.1 | 5.26% | 5.18% | 5.05% | 5.34% | 5.23% | 5.05% |
| 1 | 0.2 | 4.83% | 4.88% | 4.76% | 4.94% | 4.87% | 5.34% |
| 1 | 0.4 | 5.16% | 5.17% | 5.12% | 5.06% | 4.80% | 5.24% |
| 2 | 0.05 | 5.17% | 5.12% | 4.77% | 4.71% | 4.67% | 5.57% |
| 2 | 0.1 | 5.16% | 5.13% | 5.02% | 5.12% | 4.99% | 4.75% |
| 2 | 0.2 | 5.01% | 4.91% | 4.87% | 5.16% | 4.90% | 5.46% |
| 2 | 0.4 | 5.14% | 5.14% | 5.06% | 4.91% | 4.54% | 4.86% |
| 3 | 0.05 | 5.37% | 5.27% | 4.89% | 4.89% | 4.82% | 5.18% |
| 3 | 0.1 | 5.17% | 5.11% | 4.99% | 5.00% | 4.87% | 4.85% |
| 3 | 0.2 | 4.89% | 4.90% | 4.81% | 5.25% | 4.95% | 5.19% |
| 3 | 0.4 | 5.06% | 4.96% | 4.97% | 4.77% | 4.38% | 5.21% |
| 4 | 0.05 | 5.44% | 5.30% | 4.98% | 5.15% | 5.10% | 5.32% |
| 4 | 0.1 | 5.25% | 5.21% | 4.99% | 5.27% | 5.13% | 4.87% |
| 4 | 0.2 | 4.81% | 4.79% | 4.73% | 6.03% | 5.68% | 4.98% |
| 4 | 0.4 | 5.09% | 5.14% | 5.03% | 5.90% | 5.48% | 5.43% |
| 5 | 0.05 | 5.26% | 5.11% | 4.83% | 4.94% | 4.82% | 5.42% |
| 5 | 0.1 | 5.08% | 5.02% | 4.91% | 5.42% | 5.23% | 5.24% |
| 5 | 0.2 | 4.91% | 4.93% | 4.88% | 6.01% | 5.69% | 5.42% |
| 5 | 0.4 | 5.12% | 5.14% | 5.07% | 6.11% | 5.75% | 5.40% |
| 6 | 0.05 | 5.14% | 5.21% | 4.57% | 5.35% | 5.29% | 5.62% |
| 6 | 0.1 | 5.29% | 5.27% | 5.10% | 5.13% | 5.08% | 5.83% |
| 6 | 0.2 | 5.03% | 4.99% | 4.83% | 5.25% | 5.01% | 6.01% |
| 6 | 0.4 | 5.09% | 5.04% | 4.96% | 4.94% | 4.68% | 6.33% |
| 7 | 0.05 | 5.06% | 4.97% | 4.61% | 36.14% | 30.45% | 21.33% |
| 7 | 0.1 | 5.05% | 5.16% | 4.94% | 56.37% | 45.31% | 33.26% |
| 7 | 0.2 | 4.97% | 4.93% | 4.94% | 73.78% | 54.86% | 45.47% |
| 7 | 0.4 | 5.18% | 5.23% | 5.17% | 82.96% | 59.54% | 55.24% |
Data were generated for individuals and replicates. CIEE is the proposed method using estimating equations; BS is CIEE using nonparametric bootstrap standard errors; G‐EST is the sequential G‐estimation approach (Vansteelandt et al., 2009); MR is multiple regression; RR is residual regression; and SEM is structural equation modeling.
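When reading empirical type I error rates such as those in the table above, it helps to know how much a rate can deviate from the nominal 5% by Monte Carlo error alone. The Python sketch below computes an approximate 95% band from the binomial standard error. The replicate count used here is a hypothetical placeholder, not the study's value, and the function name `type1_error_band` is likewise an assumption made for this example.

```python
import math

def type1_error_band(alpha, n_replicates, z=1.96):
    """Approximate 95% Monte Carlo band around a nominal type I error rate."""
    se = math.sqrt(alpha * (1.0 - alpha) / n_replicates)
    return alpha - z * se, alpha + z * se

n_replicates = 10_000   # hypothetical replicate count, chosen for illustration
lower, upper = type1_error_band(0.05, n_replicates)
print(f"band: {lower:.4f} to {upper:.4f}")

# Two entries taken from the table (scenario 7, MAF of X = 0.4).
for label, rate in [("CIEE", 0.0518), ("MR", 0.8296)]:
    status = "within band" if lower <= rate <= upper else "outside band"
    print(label, rate, status)
```

With a different number of replicates the band widens or narrows in proportion to the inverse square root of that number, so borderline entries should be judged against the actual simulation size.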
Empirical type I error estimates under the null model of a time‐to‐event primary phenotype
| Scenario | Censoring | CIEE | BS | G‐EST | MR |
|---|---|---|---|---|---|
| 1 | 10% | 5.29% | 5.29% | 22.81% | 4.82% |
| 1 | 30% | 5.24% | 5.13% | 24.98% | 5.00% |
| 1 | 50% | 5.29% | 5.33% | 20.24% | 5.28% |
| 2 | 10% | 5.15% | 5.45% | 34.48% | 5.28% |
| 2 | 30% | 5.13% | 5.29% | 37.83% | 5.15% |
| 2 | 50% | 5.14% | 5.20% | 30.33% | 4.74% |
| 3 | 10% | 5.10% | 5.12% | 34.54% | 5.34% |
| 3 | 30% | 4.94% | 4.92% | 37.25% | 5.30% |
| 3 | 50% | 4.88% | 4.77% | 30.66% | 4.84% |
| 4 | 10% | 5.23% | 5.19% | 31.59% | 6.07% |
| 4 | 30% | 5.15% | 5.15% | 35.40% | 6.17% |
| 4 | 50% | 5.24% | 5.14% | 29.43% | 5.68% |
| 5 | 10% | 5.15% | 5.27% | 4.94% | 6.17% |
| 5 | 30% | 4.98% | 5.08% | 4.80% | 5.79% |
| 5 | 50% | 4.93% | 4.84% | 4.33% | 5.73% |
Data were generated for individuals and replicates. The MAF of the marker X was set to 0.2. CIEE is the proposed method using estimating equations; BS is CIEE using nonparametric bootstrap standard errors; G‐EST is the sequential G‐estimation approach (Lipman et al., 2011); and MR is multiple log‐linear censored regression.
Power estimates under the alternative hypothesis models of a quantitative primary phenotype
| Scenario | αXY | CIEE | BS | G‐EST | MR | RR | SEM |
|---|---|---|---|---|---|---|---|
| 1 | 0.1 | 42.33% | 42.26% | 41.98% | 43.13% | 42.63% | 42.31% |
| 1 | 0.2 | 94.62% | 94.59% | 94.54% | 94.81% | 94.68% | 94.13% |
| 2 | 0.1 | 42.52% | 42.40% | 42.22% | 41.55% | 40.53% | 43.51% |
| 2 | 0.2 | 94.22% | 94.09% | 94.09% | 94.18% | 93.81% | 94.15% |
| 3 | 0.1 | 42.35% | 42.30% | 41.98% | 42.85% | 41.90% | 42.32% |
| 3 | 0.2 | 94.20% | 94.17% | 94.06% | 94.03% | 93.68% | 94.17% |
| 4 | 0.1 | 39.90% | 39.74% | 39.53% | 30.12% | 29.30% | 35.85% |
| 4 | 0.2 | 91.88% | 91.88% | 91.78% | 87.92% | 87.38% | 90.26% |
| 5 | 0.1 | 39.04% | 38.99% | 38.76% | 28.79% | 28.11% | 35.56% |
| 5 | 0.2 | 92.48% | 92.44% | 92.38% | 87.10% | 86.66% | 90.46% |
In all scenarios, data were generated for individuals and replicates. The MAF of the marker X was set to 0.2. CIEE is the proposed method using estimating equations; BS is CIEE using nonparametric bootstrap standard errors; G‐EST is the sequential G‐estimation approach (Vansteelandt et al., 2009); MR is multiple regression; RR is residual regression; and SEM is structural equation modeling.
Figure 3. Overview of the assumed DAG for the analysis of the GAW19 data. Systolic blood pressure (BP) is the primary outcome; gene expression is the secondary phenotype and sex, age, and smoking are factors potentially influencing both phenotypes but unrelated to the investigated genetic markers.
Top five SNPs with the smallest P‐values in the GAW19 genetic association analysis using CIEE
| SNP | MAF | Gene | Estimate (SE), CIEE | Estimate (SE), MR1 | Estimate (SE), MR2 | 95% CI for αXY, CIEE | 95% CI for αXY, MR1 | 95% CI for αXY, MR2 | P‐value, CIEE | P‐value, MR1 | P‐value, MR2 | Adjusted P‐value, CIEE | Adjusted P‐value, MR1 | Adjusted P‐value, MR2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rs56202530 | 0.14 | | –0.15 (0.03) | –0.08 (0.03) | –0.09 (0.03) | (–0.21; –0.09) | (–0.14; –0.02) | (–0.15; –0.02) | | | | 0.04 | 1 | 1 |
| rs3746061 | 0.05 | | –0.11 (0.02) | –0.06 (0.04) | –0.06 (0.04) | (–0.15; –0.06) | (–0.14; 0.03) | (–0.14; 0.03) | | | | 0.28 | 1 | 1 |
| rs60458566 | 0.13 | | –0.12 (0.03) | –0.08 (0.03) | –0.08 (0.03) | (–0.18; –0.07) | (–0.14; –0.02) | (–0.14; –0.02) | | | | 0.46 | 1 | 1 |
| rs62117661 | 0.09 | | 0.26 (0.06) | 0.18 (0.04) | 0.18 (0.04) | (0.14; 0.37) | (0.10; 0.26) | (0.10; 0.26) | | | | 0.55 | 0.46 | 1 |
| rs883394 | 0.25 | | –0.10 (0.02) | –0.08 (0.02) | –0.06 (0.02) | (–0.15; –0.06) | (–0.12; –0.04) | (–0.11; –0.02) | | | | 0.62 | 1 | 1 |
Top five SNPs with the strongest association with systolic blood pressure obtained through the CIEE genetic association analysis of 113,890 SNPs on chromosome 19, with the shown gene (expression) as intermediate phenotype. The SNP is described by its rs identification number. For these SNPs, point estimates, standard error estimates, approximate 95% confidence intervals (CI), raw P‐values and Bonferroni‐corrected (adjusted) P‐values obtained through CIEE and the multiple regression approaches MR1 and MR2 are shown. MAF is the observed minor allele frequency of the SNP.
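The adjusted P-values and approximate confidence intervals described in this caption can be related to the raw quantities with two standard formulas. The Python sketch below assumes the usual Bonferroni rule (raw P-value times the number of tests, capped at 1) with the 113,890 SNPs named in the caption, and a Wald interval using the normal quantile 1.96. Whether the authors used exactly these conventions is an assumption, and the function names are hypothetical.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted P-value: multiply by the number of tests, cap at 1."""
    return min(1.0, p_value * n_tests)

def wald_ci(estimate, se, z=1.96):
    """Approximate 95% Wald confidence interval for an estimate."""
    return estimate - z * se, estimate + z * se

N_SNPS = 113_890   # number of SNPs on chromosome 19 stated in the caption

# A raw P-value of 0.04 / N_SNPS corresponds to an adjusted P-value of 0.04.
raw_p = 0.04 / N_SNPS
print("raw P:", raw_p, "adjusted P:", bonferroni(raw_p, N_SNPS))

# Any raw P-value above 1 / N_SNPS is reported as an adjusted P-value of 1.
print("adjusted P for raw P = 1e-4:", bonferroni(1e-4, N_SNPS))

# CIEE row for rs56202530: estimate -0.15 with standard error 0.03.
ci_low, ci_high = wald_ci(-0.15, 0.03)
print("Wald CI:", round(ci_low, 2), round(ci_high, 2))
```

Because the table reports rounded estimates and standard errors, intervals recomputed this way can differ from the printed ones in the last digit.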