Brandon L Pierce, Stephen Burgess.
Abstract
Mendelian randomization (MR) is a method for estimating the causal relationship between an exposure and an outcome using a genetic factor as an instrumental variable (IV) for the exposure. In the traditional MR setting, data on the IV, exposure, and outcome are available for all participants. However, obtaining complete exposure data may be difficult in some settings, due to high measurement costs or lack of appropriate biospecimens. We used simulated data sets to assess statistical power and bias for MR when exposure data are available for a subset (or an independent set) of participants. We show that obtaining exposure data for a subset of participants is a cost-efficient strategy, often having negligible effects on power in comparison with a traditional complete-data analysis. The size of the subset needed to achieve maximum power depends on IV strength, and maximum power is approximately equal to the power of traditional IV estimators. Weak IVs are shown to lead to bias towards the null when the subsample is small and towards the confounded association when the subset is relatively large. Various approaches for confidence interval calculation are considered. These results have important implications for reducing the costs and increasing the feasibility of MR studies.
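As a rough illustration of the subsample design described in the abstract, the sketch below simulates a genetic instrument G, an exposure X, and an outcome Y with an unmeasured confounder U. It then estimates the causal effect as the ratio of the reduced-form coefficient (Y on G, all participants) to the first-stage coefficient (X on G, subsample only). The data-generating model, the allele frequency, the default parameter values, and the function names (`simulate`, `slope_and_se`, `subsample_iv`) are assumptions made for illustration, not the authors' simulation code. The standard error is a first-order delta-method approximation that ignores any covariance between the two coefficients.

```python
import numpy as np

def simulate(n_y, beta_xy=0.1, beta_ux=0.2, beta_uy=0.2, r2=0.025, maf=0.3, seed=0):
    """Simulate G, X, Y for n_y people (hypothetical model; var(X) is roughly 1)."""
    rng = np.random.default_rng(seed)
    g = rng.binomial(2, maf, size=n_y).astype(float)   # additive SNP coding 0/1/2
    u = rng.normal(size=n_y)                           # unmeasured confounder
    beta_gx = np.sqrt(r2 / (2 * maf * (1 - maf)))      # so G explains about r2 of var(X)
    x = beta_gx * g + beta_ux * u + rng.normal(scale=np.sqrt(1 - r2 - beta_ux**2), size=n_y)
    y = beta_xy * x + beta_uy * u + rng.normal(size=n_y)
    return g, x, y

def slope_and_se(x, y):
    """OLS slope of y on x (with intercept) and its conventional standard error."""
    x_c = x - x.mean()
    sxx = np.sum(x_c**2)
    b = np.sum(x_c * (y - y.mean())) / sxx
    resid = (y - y.mean()) - b * x_c
    se = np.sqrt(np.sum(resid**2) / (len(x) - 2) / sxx)
    return b, se

def subsample_iv(g_all, y_all, g_sub, x_sub):
    """Ratio estimate: reduced-form slope / first-stage slope, with delta-method SE."""
    b_gy, se_gy = slope_and_se(g_all, y_all)   # reduced form, all participants
    b_gx, se_gx = slope_and_se(g_sub, x_sub)   # first stage, subsample with X measured
    est = b_gy / b_gx
    se = np.sqrt(se_gy**2 / b_gx**2 + b_gy**2 * se_gx**2 / b_gx**4)
    return est, se

# Example: 10,000 people with G and Y; X treated as measured in the first 2,000 only.
g, x, y = simulate(n_y=10_000)
n_x = 2_000
est, se = subsample_iv(g, y, g[:n_x], x[:n_x])
print(f"estimate={est:.3f}  SE={se:.3f}  "
      f"95% CI=({est - 1.96 * se:.3f}, {est + 1.96 * se:.3f})")
```

Passing an independent sample as `g_sub` and `x_sub`, rather than a slice of the same participants, gives the corresponding two-sample version of the same ratio estimate.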
Keywords: Mendelian randomization; epidemiologic methods; instrumental variable
Year: 2013 PMID: 23863760 PMCID: PMC3783091 DOI: 10.1093/aje/kwt084
Source DB: PubMed Journal: Am J Epidemiol ISSN: 0002-9262 Impact factor: 4.897
Figure 1. Power (left) and median standard error (right) of the subsample instrumental-variable (IV) estimate for different values of the causal effect size (βXY) and the sample size of the first-stage regression (nX), with a strong IV (R2 = 0.025), a sample size for the reduced-form regression (nY) of 10,000, and a confounding variable with equal effects on X and Y (βUX = βUY = 0.2). βXY values are 0.0 (filled diamond), 0.05 (open diamond), 0.1 (filled triangle), 0.15 (open triangle), 0.2 (filled square), and 0.3 (open square).
Figure 2. Power (left) and median standard error (right) of the subsample instrumental-variable (IV) estimate for different values of the first-stage R2 and the sample size of the first-stage regression (nX), with a constant effect size (βXY = 0.2), a sample size for the reduced-form regression (nY) of 10,000, and a confounding variable with equal effects on X and Y (βUX = βUY = 0.2). First-stage R2 values are 0.002 (filled diamond), 0.004 (open diamond), 0.007 (filled triangle), 0.01 (open triangle), 0.015 (filled square), 0.02 (open square), 0.03 (filled circle), and 0.05 (open circle).
Figure 3. Bias in the subsample instrumental-variable (IV) estimate in confounded (left) and unconfounded (right) scenarios for different values of the average first-stage F statistic and the relative size of the subsample used in the first-stage regression (nX:nY), with a constant causal effect size (βXY = 0.1) and a confounding variable with equal effects on X and Y (βUX = βUY = 0.3). Values for nX:nY are 1 (filled diamond), 0.75 (open diamond), 0.5 (filled triangle), 0.25 (open triangle), and 0.1 (filled square). The sample size for the reduced-form regression equation (nY, on the right vertical axis) is shown as dots connected with a dashed line.
Figure 4. Bias in the subsample instrumental-variable (IV) estimate for different values of the first-stage R2 and the relative size of the sample used in the first-stage regression (nX:nY). The sample size for the reduced-form regression equation (nY) is 10,000 (top), 3,000 (middle), and 1,000 (bottom), with a constant causal effect size (βXY = 0.1) and a confounding variable with equal effects on X and Y (βUX = βUY = 0.3). Values for nX:nY are 1 (filled diamond), 0.75 (open diamond), 0.5 (filled triangle), 0.25 (open triangle), and 0.1 (filled square).
A Comparison of Different Methods of Estimating 95% Confidence Intervals for Selected Simulated Data Sets^a
| | Strong IV ( | | | Moderate IV ( | | | Weak IV ( | | |
|---|---|---|---|---|---|---|---|---|---|
| Method | β | SE | CI | β | SE | CI | β | SE | CI |
| Subsample IV approach | | | | | | | | | |
| Delta method | 0.148 | 0.057 | 0.037, 0.259 | 0.152 | 0.132 | −0.108, 0.411 | 0.081 | 0.161 | −0.234, 0.397 |
| Sequential regression^c | | 0.055 | 0.039, 0.256 | | 0.128 | −0.099, 0.403 | | 0.159 | −0.231, 0.394 |
| Fieller's theorem | | N/A | 0.040, 0.272 | | N/A | −0.108, 0.562 | | N/A | −0.291, 0.602 |
| Bootstrap^d | | 0.068 | 0.014, 0.280 | | 0.137 | −0.117, 0.421 | | 0.551 | −0.999, 1.162 |
| Bayesian | 0.143 | 0.056 | 0.040, 0.258 | 0.174 | 0.289 | −0.161, 0.778 | 0.089 | 0.443 | −0.563, 0.975 |
| 2-sample IV approach | | | | | | | | | |
| Delta method | 0.117 | 0.068 | −0.015, 0.250 | 0.051 | 0.119 | −0.182, 0.284 | −0.086 | 0.163 | −0.405, 0.232 |
| Sequential regression^c | | 0.065 | −0.011, 0.245 | | 0.118 | −0.181, 0.282 | | 0.160 | −0.440, 0.227 |
| Fieller's theorem | | N/A | −0.012, 0.267 | | N/A | −0.201, 0.336 | | N/A | −0.610, 0.280 |
| Bootstrap^d | | 0.071 | −0.023, 0.257 | | 0.138 | −0.221, 0.322 | | 0.997 | −2.041, 1.868 |
| Bayesian | 0.119 | 0.072 | −0.013, 0.273 | 0.055 | 0.169 | −0.232, 0.390 | −0.100 | 0.456 | −1.012, 0.554 |
Abbreviations: CI, confidence interval; IV, instrumental variable; N/A, not applicable; SE, standard error.
a The simulated data sets consisted of 10,000 persons with data on G and Y and 2,000 persons with data on G and X. The true effect of X on Y was set to 0.1, and a confounding variable U had the effect of 0.2 on both X and Y.
b Theoretical F values were obtained using the following equation: F = R2 (n − 1)/(1 − R2).
c For the second-stage regression (of sequential regression), robust SEs are reported.
d Bootstrapping was conducted using 1,000 replications, with samples of size nX and nY randomly selected (with replacement) from the original samples of size nX and nY.
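Footnotes b and d can be made concrete with a short sketch. `theoretical_f` evaluates the formula in footnote b as written, taking n to be the first-stage sample size (an assumption, since the footnote does not say which sample size is meant). `bootstrap_ci` resamples each data set with replacement and takes percentiles of the ratio estimate. The percentile interval, the function names, and the independent resampling of the two samples are illustrative choices that the footnotes do not specify; independent resampling fits the 2-sample design more naturally than the overlapping subsample design.

```python
import numpy as np

def theoretical_f(r2, n):
    """Footnote b as written: F = R2 (n - 1) / (1 - R2); n assumed to be first-stage n."""
    return r2 * (n - 1) / (1 - r2)

def ols_slope(x, y):
    """OLS slope of y on x with an intercept."""
    x_c = x - x.mean()
    return np.sum(x_c * (y - y.mean())) / np.sum(x_c**2)

def bootstrap_ci(g_y, y, g_x, x, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap for the ratio estimate (hypothetical helper).

    (g_y, y): instrument and outcome in the reduced-form sample.
    (g_x, x): instrument and exposure in the first-stage sample.
    Returns (lower, upper, bootstrap SE).
    """
    rng = np.random.default_rng(seed)
    ests = np.empty(n_boot)
    for i in range(n_boot):
        iy = rng.integers(0, len(y), size=len(y))   # resample rows with replacement
        ix = rng.integers(0, len(x), size=len(x))
        ests[i] = ols_slope(g_y[iy], y[iy]) / ols_slope(g_x[ix], x[ix])
    alpha = 1 - level
    lower, upper = np.quantile(ests, [alpha / 2, 1 - alpha / 2])
    return lower, upper, ests.std(ddof=1)

# Illustrative inputs only: R2 = 0.025 is the strong-IV value from the Figure 1
# caption and n = 2,000 is the first-stage sample size from footnote a.
print(theoretical_f(0.025, 2000))
```

With a weak instrument some resamples yield a first-stage slope close to zero, which makes the ratio unstable and may contribute to the wide bootstrap intervals in the weak-IV columns of the table.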