Ralph Møller Trane, Hyunseung Kang.
Abstract
Recently, in genetic epidemiology, Mendelian randomization (MR) has become a popular approach to estimate causal exposure effects by using single nucleotide polymorphisms from genome-wide association studies (GWAS) as instruments. The most popular type of MR study, a two-sample summary-data MR study, relies on having summary statistics from two independent GWAS and using parametric methods for estimation. However, little is understood about using a nonparametric bound-based analysis, a popular approach in traditional instrumental variables frameworks, to study causal effects in two-sample MR. In this article, we explore using a nonparametric, bound-based analysis in two-sample MR studies, focusing primarily on implications for practice. We also propose a framework to assess how likely one is to obtain more informative bounds under a different MR design, notably a one-sample MR design. We conclude by demonstrating our findings through two real data analyses concerning the causal effect of smoking on lung cancer and the causal effect of high cholesterol on heart attacks. Overall, our results suggest that while a bound-based analysis may be appealing due to its nonparametric nature, it is far more conservative in two-sample settings than in one-sample settings, which makes informative bounds on the causal exposure effect harder to obtain.
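To make the idea of a nonparametric bound concrete, the following is a minimal sketch of the classical "natural" (Robins-Manski) bounds on the average treatment effect for a binary exposure and binary outcome. It is not the two-sample bound studied in the article: it assumes the joint distribution of exposure and outcome given the instrument is observed, which corresponds to a one-sample design. The function name `natural_ate_bounds` and all probabilities in the example are hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple

# joint[z][(x, y)] = P(X = x, Y = y | Z = z) for binary exposure X and outcome Y.
Joint = Dict[int, Dict[Tuple[int, int], float]]


def natural_ate_bounds(joint: Joint) -> Tuple[float, float]:
    """Natural (Robins-Manski) bounds on the ATE E[Y(1)] - E[Y(0)].

    Illustrative only: assumes a valid instrument Z and binary X, Y.
    """
    for z, cell in joint.items():
        if abs(sum(cell.values()) - 1.0) > 1e-8:
            raise ValueError(f"probabilities for Z={z} do not sum to 1")

    cells = list(joint.values())
    # E[Y(1)] is at least P(Y=1, X=1 | z) and at most that plus P(X=0 | z),
    # for every z; intersect the intervals across instrument levels.
    lo1 = max(c[(1, 1)] for c in cells)
    hi1 = min(c[(1, 1)] + c[(0, 0)] + c[(0, 1)] for c in cells)
    # Same argument for E[Y(0)], with the roles of X=0 and X=1 swapped.
    lo0 = max(c[(0, 1)] for c in cells)
    hi0 = min(c[(0, 1)] + c[(1, 0)] + c[(1, 1)] for c in cells)
    return lo1 - hi0, hi1 - lo0


# Hypothetical strong instrument: P(X=1 | Z=0) = 0.1, P(X=1 | Z=1) = 0.9.
strong = {
    0: {(1, 1): 0.05, (1, 0): 0.05, (0, 1): 0.18, (0, 0): 0.72},
    1: {(1, 1): 0.45, (1, 0): 0.45, (0, 1): 0.02, (0, 0): 0.08},
}
# Hypothetical useless instrument: the distribution does not change with Z.
flat = {z: {(1, 1): 0.2, (1, 0): 0.3, (0, 1): 0.1, (0, 0): 0.4} for z in (0, 1)}

strong_lower, strong_upper = natural_ate_bounds(strong)
flat_lower, flat_upper = natural_ate_bounds(flat)
print("strong instrument:", round(strong_lower, 2), round(strong_upper, 2))
print("flat instrument:  ", round(flat_lower, 2), round(flat_upper, 2))
```

In this toy setting the strong instrument yields an interval that excludes zero, while the instrument with no effect on the exposure yields an interval of width one that always contains zero, which illustrates why instrument strength matters for how informative such bounds are.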
Keywords: Mendelian randomization; causal inference; instrument strength; nonparametric bounds; two-sample studies
Year: 2022 PMID: 35355302 PMCID: PMC9314714 DOI: 10.1002/sim.9368
Source DB: PubMed Journal: Stat Med ISSN: 0277-6715 Impact factor: 2.497
FIGURE 1 Illustration of the relationship between instrument strength, length of bounds, and coefficients from logistic regression model in two‐sample MR settings. (A) Relationship between instrument strength (ST) and length of the IV bounds. Black line is the upper bound on the two‐sample IV bounds from Theorem 1. Each black dot indicates one of the 10,000 IV bounds. Colored dots indicate bounds from real data; see Section 5 for details. (B) Coefficients from logistic regression model and instrument strength (ST). Each color represents a different magnitude of unmeasured confounding
FIGURE 2 Relationship between the smallest needed for a two‐sample IV bound to exclude 0 and the average treatment effect (ATE). Each color corresponds to a different level of unmeasured confounding
FIGURE 3 Two‐sample bounds with or instruments. Bounds from strongest instruments are highlighted in red. Blue lines denote the true average treatment effects (ATEs). Columns represent effect size of the exposure and the unmeasured confounder on the outcome on the logit scale. Rows represent different scenarios of multiple instruments. The y‐axis represents instrument strength measured by and the x‐axis represents the average treatment effect
FIGURE 4 Two‐sample bounds (horizontal lines) and average treatment effects (vertical blue lines) under pleiotropy. Columns represent the effect size of the exposure on the logit scale, and rows represent the magnitude of violation of assumption (A3). The x‐axis shows average treatment effect (ATE), and the y‐axis represents instrument strength as measured by
FIGURE 5 Nonparametric bounds based on a dichotomized exposure. Columns represent the effect size of the exposure on the logit scale. Rows represent different values of the intercept . The y‐axis shows the effect of the instrument on the continuous exposure, and the x‐axis shows the average treatment effect
Values of and used to illustrate our approach. For each cell (eg, row A, column 1), we have on the first row and on the second row
| | Column 1 | Column 2 | Column 3 |
|---|---|---|---|
| Row A | {0.125, 0.399, 0.080}<br>{0.699, 0.840, 0.742} | {0.244, 0.275, 0.185}<br>{0.238, 0.089, 0.146} | {0.603, 0.469, 0.310}<br>{0.638, 0.346, 0.719} |
| Row B | {0.886, 0.968, 0.874}<br>{0.805, 0.822, 0.951} | {0.139, 0.441, 0.334}<br>{0.179, 0.359, 0.559} | {0.901, 0.909, 0.935}<br>{0.821, 0.810, 0.905} |
| Row C | {0.175, 0.079, 0.365}<br>{0.599, 0.358, 0.087} | {0.493, 0.911, 0.085}<br>{0.360, 0.480, 0.441} | {0.434, 0.045, 0.733}<br>{0.747, 0.370, 0.169} |
FIGURE 6 One‐sample bounds (horizontal lines) and two‐sample bounds (vertical dotted lines). Red color represents one‐sample bounds that do not cover zero and gray color represents one‐sample bounds that do cover zero
FIGURE 7 Two‐sample IV bounds for the two real data examples with 8 SNPs from each data set. (A) Two‐sample IV bounds for the ATE of smoking on the incidence of lung cancer. (B) Two‐sample IV bounds for the ATE of high cholesterol on the incidence of heart attack
FIGURE 8 Potential one‐sample IV bounds for the two real data examples using the method described in Section 4. (A) One‐sample IV bounds for the ATE of smoking on the incidence of lung cancer from 500 potential one‐sample distributions for each SNP. (B) One‐sample IV bounds for the ATE of high cholesterol on the incidence of heart attack from 500 potential one‐sample distributions for each SNP