Jack Bowden¹, George Davey Smith¹, Philip C Haycock¹, Stephen Burgess²
Abstract
Developments in genome-wide association studies and the increasing availability of summary genetic association data have made application of Mendelian randomization relatively straightforward. However, obtaining reliable results from a Mendelian randomization investigation remains problematic, as the conventional inverse-variance weighted method only gives consistent estimates if all of the genetic variants in the analysis are valid instrumental variables. We present a novel weighted median estimator for combining data on multiple genetic variants into a single causal estimate. This estimator is consistent even when up to 50% of the information comes from invalid instrumental variables. In a simulation analysis, it is shown to have better finite-sample Type 1 error rates than the inverse-variance weighted method, and is complementary to the recently proposed MR-Egger (Mendelian randomization-Egger) regression method. In analyses of the causal effects of low-density lipoprotein cholesterol and high-density lipoprotein cholesterol on coronary artery disease risk, the inverse-variance weighted method suggests a causal effect of both lipid fractions, whereas the weighted median and MR-Egger regression methods suggest a null effect of high-density lipoprotein cholesterol that corresponds with the experimental evidence. Both median-based and MR-Egger regression methods should be considered as sensitivity analyses for Mendelian randomization investigations with multiple genetic variants.
Keywords: Egger regression; Mendelian randomization; instrumental variables; pleiotropy; robust statistics
Year: 2016 PMID: 27061298 PMCID: PMC4849733 DOI: 10.1002/gepi.21965
Source DB: PubMed Journal: Genet Epidemiol ISSN: 0741-0395 Impact factor: 2.135
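The abstract's point that the inverse-variance weighted (IVW) method is only consistent when every variant is a valid instrument can be illustrated with summary data. The sketch below assumes the common first-order form of the IVW estimate, sum(bx·by/se_y²) / sum(bx²/se_y²), with standard error 1/sqrt(sum(bx²/se_y²)). The function name `ivw_estimate` and all numbers are hypothetical, chosen so that the true causal effect is 0.1.

```python
import math


def ivw_estimate(beta_x, beta_y, se_y):
    """First-order fixed-effect IVW estimate and SE from summary statistics.

    Assumed form (not taken from the record): weights bx**2 / se_y**2,
    equivalent to a weighted regression of beta_y on beta_x through the origin.
    """
    num = sum(bx * by / sy**2 for bx, by, sy in zip(beta_x, beta_y, se_y))
    den = sum(bx**2 / sy**2 for bx, sy in zip(beta_x, se_y))
    return num / den, 1.0 / math.sqrt(den)


# Hypothetical data: true causal effect 0.1, five variants.
beta_x = [0.10, 0.20, 0.15, 0.05, 0.10]
se_y = [0.01] * 5
valid_y = [0.1 * bx for bx in beta_x]
# The last variant is made invalid: it also affects the outcome directly (+0.05).
invalid_y = valid_y[:4] + [valid_y[4] + 0.05]

est_valid, _ = ivw_estimate(beta_x, valid_y, se_y)
est_invalid, se_invalid = ivw_estimate(beta_x, invalid_y, se_y)
print(f"all valid: {est_valid:.3f}; one invalid: {est_invalid:.3f} (SE {se_invalid:.3f})")
```

With every variant valid the estimate recovers 0.1; a single invalid variant pulls it to about 0.16, which is the sense in which IVW has a breakdown level of 0%.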
Figure 1. Illustrative diagram representing the hypothesized relationships between a genetic variant, exposure X, disease Y, and confounders U when the genetic variant is a valid instrumental variable (IV). Crosses indicate violations of assumptions IV2 and IV3 that can lead to invalid inferences from conventional methods.
Figure 2. Fictional example of a Mendelian randomization analysis with 10 genetic variants (six valid instrumental variables, hollow circles; four invalid instrumental variables, solid circles) for finite sample size (left) and infinite sample size (right), showing the IVW (solid line) and simple median (dashed line) estimates compared with the true causal effect (dotted line). The ratio estimate for each genetic variant is the gradient of the line connecting that variant's datapoint to the origin; the simple median estimate is the median of these ratio estimates.
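The construction in Figure 2 translates directly into code: each variant's ratio estimate is beta_y / beta_x (the gradient from the origin), and the simple median is their unweighted median. The data below are hypothetical, mirroring the figure's six valid and four invalid variants with a true effect of 0.1; the helper names are illustrative only.

```python
import statistics


def ratio_estimates(beta_x, beta_y):
    """Ratio estimate per variant: gradient of the line from the origin to (beta_x, beta_y)."""
    return [by / bx for bx, by in zip(beta_x, beta_y)]


def simple_median(beta_x, beta_y):
    """Unweighted median of the per-variant ratio estimates."""
    return statistics.median(ratio_estimates(beta_x, beta_y))


# Hypothetical data: 10 variants, true effect 0.1; the last four carry
# positive pleiotropic effects on the outcome (invalid IVs).
beta_x = [0.10, 0.20, 0.15, 0.05, 0.12, 0.08, 0.10, 0.20, 0.15, 0.05]
pleiotropy = [0, 0, 0, 0, 0, 0, 0.03, 0.05, 0.04, 0.02]
beta_y = [0.1 * bx + a for bx, a in zip(beta_x, pleiotropy)]

print([round(r, 3) for r in ratio_estimates(beta_x, beta_y)])
print(round(simple_median(beta_x, beta_y), 3))
```

Because six of the ten variants are valid, the fifth and sixth ordered ratio estimates are both valid, so the simple median equals 0.1 whatever values the four invalid ratio estimates take.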
Table 1. Weights and percentiles of weighted median function
| Ordered ratio estimate | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Simple median | | | | | | | | | | |
| Weight | 0.100 | 0.100 | 0.100 | 0.100 | 0.100 | 0.100 | 0.100 | 0.100 | 0.100 | 0.100 |
| Percentile (%) | 5 | 15 | 25 | 35 | 45 | 55 | 65 | 75 | 85 | 95 |
| Weighting 1 | | | | | | | | | | |
| Weight | 0.033 | 0.067 | 0.100 | 0.133 | 0.167 | 0.167 | 0.133 | 0.100 | 0.067 | 0.033 |
| Percentile (%) | 1.67 | 6.67 | 15.00 | 26.67 | 41.67 | 58.33 | 73.33 | 85.00 | 93.33 | 98.33 |
| Weighting 2 | | | | | | | | | | |
| Weight | 0.056 | 0.083 | 0.278 | 0.222 | 0.139 | 0.083 | 0.056 | 0.028 | 0.028 | 0.028 |
| Percentile (%) | 2.78 | 9.72 | 27.78 | 52.78 | 70.83 | 81.94 | 88.89 | 93.06 | 95.83 | 98.61 |
Weights and percentiles of the empirical distribution function assigned to the ordered ratio instrumental variable estimates for the hypothetical examples given in Figure 3.
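Every percentile row in the table is consistent with the rule p_j = 100·(S_j − w_j/2), where w_j is the normalized weight of the j-th ordered estimate and S_j is the cumulative sum of weights up to and including j. The sketch below assumes that rule; the integer weight vectors are hypothetical inputs chosen because, after normalization, they reproduce the three percentile rows to two decimal places.

```python
def edf_percentiles(weights):
    """Percentile for each ordered estimate under the assumed rule 100 * (S_j - w_j / 2).

    weights are in the order of the sorted ratio estimates and are normalized here.
    """
    total = float(sum(weights))
    cumulative = 0.0
    percentiles = []
    for w in weights:
        w_norm = w / total
        cumulative += w_norm
        percentiles.append(100 * (cumulative - w_norm / 2))
    return percentiles


simple = [1] * 10
weighting_1 = [1, 2, 3, 4, 5, 5, 4, 3, 2, 1]   # sums to 30
weighting_2 = [2, 3, 10, 8, 5, 3, 2, 1, 1, 1]  # sums to 36

for name, w in [("Simple median", simple), ("Weighting 1", weighting_1), ("Weighting 2", weighting_2)]:
    print(name, [round(p, 2) for p in edf_percentiles(w)])
```

Placing each estimate at the midpoint of its weight step (rather than at the top of the step) is what makes equal weights give the symmetric percentiles 5, 15, ..., 95.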
Figure 3. Empirical distribution functions of the ordered ratio instrumental variable estimates used to calculate the simple median estimate (black) and two weighted median estimates (red and blue), using the weights given in Table 1.
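The estimate is read off the weighted empirical distribution function where it crosses the 50th percentile. The sketch below assumes linear interpolation between the two ordered estimates whose percentiles bracket 50% and, in `weighted_median_mr`, first-order inverse-variance weights bx²/se_y² for the ratio estimates; both function names are hypothetical. With equal weights it reduces to the ordinary (simple) median.

```python
def weighted_median(estimates, weights):
    """Weighted median by linear interpolation on the weighted EDF.

    Each ordered estimate beta_(j) sits at percentile p_j = S_j - w_j/2 (0-1 scale);
    the result is where the interpolated function crosses 0.5.
    """
    pairs = sorted(zip(estimates, weights))
    beta = [b for b, _ in pairs]
    total = float(sum(w for _, w in pairs))
    p, cumulative = [], 0.0
    for _, w in pairs:
        cumulative += w / total
        p.append(cumulative - w / (2 * total))
    below = [j for j, pj in enumerate(p) if pj < 0.5]
    if not below:              # the first estimate already sits at or above 0.5
        return beta[0]
    k = below[-1]              # largest index with p_k < 0.5
    if k == len(beta) - 1:     # defensive: cannot interpolate past the last estimate
        return beta[-1]
    return beta[k] + (beta[k + 1] - beta[k]) * (0.5 - p[k]) / (p[k + 1] - p[k])


def weighted_median_mr(beta_x, beta_y, se_y):
    """Weighted median of ratio estimates; ASSUMED first-order weights bx**2 / se_y**2."""
    ratios = [by / bx for bx, by in zip(beta_x, beta_y)]
    weights = [bx**2 / sy**2 for bx, sy in zip(beta_x, se_y)]
    return weighted_median(ratios, weights)


# Ordered estimates 1..10 with the Weighting 2 weights (2, 3, 10, 8, 5, 3, 2, 1, 1, 1) / 36.
print(weighted_median(range(1, 11), [2, 3, 10, 8, 5, 3, 2, 1, 1, 1]))
```

Here the 50th percentile falls between the third estimate (27.78%) and the fourth (52.78%), so the result is 3 + 8/9, about 3.889.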
Results from simulation study in two‐sample setting with null causal effect
| Sample size | Proportion of invalid IVs | Mean F statistic | Mean R² | Inverse‐variance weighted: mean estimate (mean SE) | Inverse‐variance weighted: power (%) | Weighted median: mean estimate (mean SE) | Weighted median: power (%) | Penalized weighted median: mean estimate (mean SE) | Penalized weighted median: power (%) | MR‐Egger regression: mean estimate (mean SE) | MR‐Egger regression: power (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Scenario 1. Balanced pleiotropy, InSIDE assumption satisfied | |||||||||||
| 10,000 | 0.1 | 10.7 | 2.6% | −0.001 (0.114) | 5.4 | −0.001 (0.093) | 3.2 | −0.001 (0.093) | 3.4 | −0.003 (0.287) | 6.3 |
| 10,000 | 0.2 | 10.7 | 2.6% | 0.001 (0.153) | 6.2 | 0.001 (0.098) | 4.5 | 0.001 (0.098) | 4.0 | −0.001 (0.386) | 6.2 |
| 10,000 | 0.3 | 10.7 | 2.6% | 0.003 (0.185) | 6.3 | 0.001 (0.103) | 6.2 | 0.001 (0.104) | 5.2 | 0.000 (0.467) | 6.0 |
| 20,000 | 0.1 | 20.5 | 2.5% | −0.001 (0.107) | 5.1 | 0.000 (0.067) | 3.4 | 0.000 (0.067) | 3.6 | 0.000 (0.305) | 6.0 |
| 20,000 | 0.2 | 20.5 | 2.5% | 0.002 (0.150) | 5.3 | 0.001 (0.071) | 4.4 | 0.001 (0.071) | 4.4 | −0.004 (0.426) | 6.1 |
| 20,000 | 0.3 | 20.5 | 2.5% | −0.004 (0.184) | 5.7 | −0.001 (0.075) | 6.4 | −0.001 (0.077) | 6.3 | −0.004 (0.523) | 6.2 |
| Scenario 2. Directional pleiotropy, InSIDE assumption satisfied | |||||||||||
| 10,000 | 0.1 | 10.7 | 2.6% | 0.126 (0.111) | 14.6 | 0.033 (0.093) | 4.9 | 0.024 (0.093) | 4.2 | 0.013 (0.279) | 6.3 |
| 10,000 | 0.2 | 10.7 | 2.6% | 0.256 (0.145) | 37.0 | 0.078 (0.100) | 10.7 | 0.071 (0.102) | 9.6 | 0.037 (0.363) | 6.5 |
| 10,000 | 0.3 | 10.7 | 2.6% | 0.384 (0.169) | 62.7 | 0.139 (0.109) | 21.8 | 0.149 (0.114) | 22.1 | 0.046 (0.421) | 6.3 |
| 20,000 | 0.1 | 20.5 | 2.5% | 0.134 (0.104) | 15.0 | 0.026 (0.067) | 4.9 | 0.026 (0.068) | 5.2 | 0.003 (0.295) | 6.1 |
| 20,000 | 0.2 | 20.5 | 2.5% | 0.271 (0.141) | 42.9 | 0.061 (0.072) | 11.9 | 0.080 (0.078) | 15.8 | 0.011 (0.398) | 6.2 |
| 20,000 | 0.3 | 20.5 | 2.5% | 0.404 (0.166) | 70.4 | 0.115 (0.080) | 25.4 | 0.177 (0.095) | 35.9 | 0.016 (0.467) | 6.0 |
| Scenario 3. Directional pleiotropy, InSIDE assumption not satisfied | |||||||||||
| 10,000 | 0.1 | 13.5 | 3.3% | 0.182 (0.092) | 48.0 | 0.145 (0.095) | 29.9 | 0.062 (0.094) | 12.6 | 0.363 (0.195) | 50.9 |
| 10,000 | 0.2 | 16.3 | 3.9% | 0.318 (0.105) | 77.2 | 0.303 (0.097) | 61.3 | 0.186 (0.097) | 37.9 | 0.555 (0.204) | 72.5 |
| 10,000 | 0.3 | 19.2 | 4.6% | 0.421 (0.110) | 91.1 | 0.435 (0.092) | 82.5 | 0.335 (0.095) | 65.9 | 0.651 (0.204) | 83.2 |
| 20,000 | 0.1 | 26.0 | 3.1% | 0.189 (0.084) | 53.5 | 0.131 (0.072) | 32.4 | 0.059 (0.070) | 13.2 | 0.412 (0.184) | 57.5 |
| 20,000 | 0.2 | 31.7 | 3.8% | 0.327 (0.100) | 81.0 | 0.290 (0.075) | 63.8 | 0.176 (0.077) | 40.5 | 0.607 (0.198) | 77.1 |
| 20,000 | 0.3 | 37.2 | 4.4% | 0.427 (0.105) | 93.5 | 0.428 (0.072) | 83.9 | 0.321 (0.077) | 68.4 | 0.697 (0.197) | 86.9 |
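Mean estimates, mean standard errors, and power (%) of the 95% confidence interval to reject the null hypothesis for the inverse‐variance weighted, weighted median, penalized weighted median, and MR‐Egger regression methods in a simulation study of two‐sample Mendelian randomization with a null (zero) causal effect. Because the causal effect is null, the power values are empirical Type 1 error rates.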
Abbreviations: IV, instrumental variable; SE, standard error.
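The power columns are the percentage of simulated data sets in which the 95% confidence interval excludes zero. The sketch below is a deliberately stylized, summary-level simulation, not the design behind these tables: 25 variants, variant-exposure associations drawn uniformly on [0.05, 0.15], standard errors of 0.005, and positive pleiotropic effects on [0.02, 0.04] for the invalid variants are all assumptions. It reports the mean IVW estimate, the IVW rejection rate using its fixed-effect standard error, and the mean weighted median estimate; a weighted median standard error would need resampling (for example a bootstrap) and is omitted.

```python
import math
import random
import statistics


def ivw(beta_x, beta_y, se_y):
    """First-order fixed-effect IVW estimate and its standard error."""
    num = sum(bx * by / sy**2 for bx, by, sy in zip(beta_x, beta_y, se_y))
    den = sum(bx**2 / sy**2 for bx, sy in zip(beta_x, se_y))
    return num / den, 1.0 / math.sqrt(den)


def weighted_median(estimates, weights):
    """Weighted median by interpolation on the weighted EDF (percentiles S_j - w_j/2)."""
    pairs = sorted(zip(estimates, weights))
    beta = [b for b, _ in pairs]
    total = float(sum(w for _, w in pairs))
    p, cumulative = [], 0.0
    for _, w in pairs:
        cumulative += w / total
        p.append(cumulative - w / (2 * total))
    below = [j for j, pj in enumerate(p) if pj < 0.5]
    if not below:
        return beta[0]
    k = below[-1]
    if k == len(beta) - 1:
        return beta[-1]
    return beta[k] + (beta[k + 1] - beta[k]) * (0.5 - p[k]) / (p[k + 1] - p[k])


def simulate(prop_invalid, causal=0.0, n_variants=25, n_reps=200, seed=2016):
    """Stylized two-sample summary-data simulation with directional pleiotropy (all settings assumed)."""
    rng = random.Random(seed)
    n_invalid = round(prop_invalid * n_variants)
    se_x = se_y = 0.005
    ivw_estimates, wm_estimates, rejections = [], [], 0
    for _ in range(n_reps):
        gamma = [rng.uniform(0.05, 0.15) for _ in range(n_variants)]
        # Invalid variants all have a positive direct effect on the outcome.
        alpha = [rng.uniform(0.02, 0.04) if j < n_invalid else 0.0 for j in range(n_variants)]
        beta_x = [rng.gauss(g, se_x) for g in gamma]
        beta_y = [rng.gauss(causal * g + a, se_y) for g, a in zip(gamma, alpha)]
        sy = [se_y] * n_variants
        est, se = ivw(beta_x, beta_y, sy)
        ivw_estimates.append(est)
        rejections += abs(est / se) > 1.96  # 95% CI excludes zero
        ratios = [by / bx for bx, by in zip(beta_x, beta_y)]
        weights = [bx**2 / s**2 for bx, s in zip(beta_x, sy)]
        wm_estimates.append(weighted_median(ratios, weights))
    return statistics.mean(ivw_estimates), 100 * rejections / n_reps, statistics.mean(wm_estimates)


for prop in (0.0, 0.1, 0.2, 0.3):
    ivw_mean, ivw_reject, wm_mean = simulate(prop)
    print(f"invalid {prop:.1f}: IVW mean {ivw_mean:.3f} (rejects {ivw_reject:.1f}%), "
          f"weighted median mean {wm_mean:.3f}")
```

Under these assumed settings the IVW estimate should drift upward and reject the null increasingly often as the proportion of invalid variants grows, while the weighted median drifts less, the same qualitative pattern as Scenario 2.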
Results from simulation study in two‐sample setting with positive causal effect
| Sample size | Proportion of invalid IVs | Mean F statistic | Mean R² | Inverse‐variance weighted: mean estimate (mean SE) | Inverse‐variance weighted: power (%) | Weighted median: mean estimate (mean SE) | Weighted median: power (%) | Penalized weighted median: mean estimate (mean SE) | Penalized weighted median: power (%) | MR‐Egger regression: mean estimate (mean SE) | MR‐Egger regression: power (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Scenario 1. Balanced pleiotropy, InSIDE assumption satisfied | |||||||||||
| 10,000 | 0.1 | 10.7 | 2.6% | 0.090 (0.116) | 16.2 | 0.085 (0.098) | 12.3 | 0.086 (0.098) | 12.4 | 0.049 (0.292) | 6.7 |
| 10,000 | 0.2 | 10.7 | 2.6% | 0.092 (0.155) | 11.7 | 0.088 (0.103) | 13.5 | 0.089 (0.103) | 12.7 | 0.052 (0.390) | 6.5 |
| 10,000 | 0.3 | 10.7 | 2.6% | 0.094 (0.186) | 9.4 | 0.088 (0.109) | 13.4 | 0.089 (0.109) | 12.8 | 0.053 (0.470) | 6.4 |
| 20,000 | 0.1 | 20.5 | 2.5% | 0.095 (0.108) | 22.1 | 0.092 (0.071) | 24.0 | 0.093 (0.071) | 24.2 | 0.064 (0.309) | 6.8 |
| 20,000 | 0.2 | 20.5 | 2.5% | 0.097 (0.150) | 13.7 | 0.093 (0.075) | 24.4 | 0.094 (0.075) | 24.1 | 0.060 (0.428) | 6.5 |
| 20,000 | 0.3 | 20.5 | 2.5% | 0.092 (0.184) | 9.6 | 0.091 (0.079) | 22.6 | 0.092 (0.080) | 22.7 | 0.061 (0.525) | 6.3 |
| Scenario 2. Directional pleiotropy, InSIDE assumption satisfied | |||||||||||
| 10,000 | 0.1 | 10.7 | 2.6% | 0.217 (0.114) | 45.9 | 0.121 (0.099) | 20.9 | 0.111 (0.099) | 18.3 | 0.066 (0.285) | 7.4 |
| 10,000 | 0.2 | 10.7 | 2.6% | 0.348 (0.148) | 68.0 | 0.168 (0.107) | 32.5 | 0.160 (0.108) | 28.7 | 0.090 (0.367) | 7.3 |
| 10,000 | 0.3 | 10.7 | 2.6% | 0.475 (0.171) | 84.3 | 0.232 (0.116) | 47.6 | 0.239 (0.121) | 46.1 | 0.099 (0.425) | 7.0 |
| 20,000 | 0.1 | 20.5 | 2.5% | 0.230 (0.105) | 61.4 | 0.120 (0.071) | 37.2 | 0.119 (0.072) | 36.0 | 0.067 (0.298) | 7.1 |
| 20,000 | 0.2 | 20.5 | 2.5% | 0.366 (0.143) | 80.3 | 0.157 (0.077) | 52.5 | 0.173 (0.082) | 53.9 | 0.076 (0.401) | 6.7 |
| 20,000 | 0.3 | 20.5 | 2.5% | 0.500 (0.168) | 92.3 | 0.213 (0.086) | 66.3 | 0.269 (0.099) | 72.0 | 0.081 (0.469) | 6.4 |
| Scenario 3. Directional pleiotropy, InSIDE assumption not satisfied | |||||||||||
| 10,000 | 0.1 | 13.5 | 3.3% | 0.274 (0.095) | 71.1 | 0.238 (0.101) | 48.5 | 0.154 (0.099) | 29.1 | 0.432 (0.202) | 55.5 |
| 10,000 | 0.2 | 16.3 | 3.9% | 0.411 (0.107) | 89.9 | 0.400 (0.103) | 75.8 | 0.283 (0.103) | 55.6 | 0.634 (0.209) | 76.5 |
| 10,000 | 0.3 | 19.2 | 4.6% | 0.515 (0.112) | 96.8 | 0.533 (0.099) | 90.5 | 0.433 (0.101) | 78.5 | 0.736 (0.209) | 86.9 |
| 20,000 | 0.1 | 26.0 | 3.1% | 0.285 (0.085) | 81.1 | 0.229 (0.076) | 63.8 | 0.153 (0.074) | 47.6 | 0.491 (0.189) | 62.1 |
| 20,000 | 0.2 | 31.7 | 3.8% | 0.423 (0.101) | 93.4 | 0.391 (0.079) | 85.0 | 0.274 (0.081) | 71.1 | 0.694 (0.201) | 81.2 |
| 20,000 | 0.3 | 37.2 | 4.4% | 0.525 (0.106) | 98.0 | 0.529 (0.076) | 94.7 | 0.420 (0.082) | 87.1 | 0.788 (0.200) | 90.2 |
Mean estimates, mean standard errors, and power (%) of the 95% confidence interval to reject the null hypothesis for the inverse‐variance weighted, weighted median, penalized weighted median, and MR‐Egger regression methods in a simulation study of two‐sample Mendelian randomization with a positive causal effect.
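Both simulation tables include a penalized weighted median, which downweights variants whose ratio estimates are heterogeneous. The sketch below is one way to implement that downweighting and is labeled an assumption: each variant's contribution to Cochran's Q about the IVW estimate, w_j(β̂_j − β̂_IVW)², is converted to a chi-square (1 df) p-value q_j, and its weight is multiplied by min(1, 20·q_j). The helper names, the constant 20, and the example data are hypothetical.

```python
import math


def weighted_median(estimates, weights):
    """Weighted median by interpolation on the weighted EDF (percentiles S_j - w_j/2)."""
    pairs = sorted(zip(estimates, weights))
    beta = [b for b, _ in pairs]
    total = float(sum(w for _, w in pairs))
    p, cumulative = [], 0.0
    for _, w in pairs:
        cumulative += w / total
        p.append(cumulative - w / (2 * total))
    below = [j for j, pj in enumerate(p) if pj < 0.5]
    if not below:
        return beta[0]
    k = below[-1]
    if k == len(beta) - 1:
        return beta[-1]
    return beta[k] + (beta[k + 1] - beta[k]) * (0.5 - p[k]) / (p[k + 1] - p[k])


def penalized_weights(ratios, weights, factor=20.0):
    """ASSUMED penalty: w_j * min(1, factor * q_j), with q_j the chi-square (1 df)
    p-value of variant j's contribution to Cochran's Q about the IVW estimate."""
    centre = sum(w * r for r, w in zip(ratios, weights)) / sum(weights)  # IVW estimate
    penalized = []
    for r, w in zip(ratios, weights):
        q_contribution = w * (r - centre) ** 2
        p_value = math.erfc(math.sqrt(q_contribution / 2))  # P(chi2_1 > q_contribution)
        penalized.append(w * min(1.0, factor * p_value))
    return penalized


# Hypothetical ratio estimates: three consistent variants and two outlying ones.
ratios = [0.08, 0.10, 0.12, 0.50, 0.60]
weights = [100.0, 100.0, 100.0, 50.0, 50.0]
pen = penalized_weights(ratios, weights)
print("penalized weights:", [round(w, 1) for w in pen])
print("weighted median:", round(weighted_median(ratios, weights), 3))
print("penalized weighted median:", round(weighted_median(ratios, pen), 3))
```

In this example the three consistent variants keep their full weight, the two outliers are downweighted, and the estimate moves from 0.11 toward the consistent cluster, which is the mechanism behind the better finite-sample behavior claimed for directional pleiotropy.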
Figure 4. Scatter plots of genetic associations with the outcome (coronary artery disease risk, CAD) against genetic associations with the exposure (low‐density lipoprotein cholesterol, LDL‐c; high‐density lipoprotein cholesterol, HDL‐c; triglycerides). Left: all genetic variants; right: genetic variants with a primary association with the target exposure. The solid line represents the IVW estimate, the dashed line the weighted median estimate, and the dotted line the MR‐Egger estimate.
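The IVW and MR‐Egger lines in Figure 4 can be read as weighted regressions on this scatter plot: IVW is a line through the origin, while MR‐Egger adds a free intercept. The sketch below assumes weights 1/se_y² and orients each variant so that its association with the exposure is positive (flipping both signs leaves its ratio estimate unchanged); the function names and data are hypothetical, and standard errors are omitted.

```python
def weighted_line(x, y, w, intercept=True):
    """Weighted least-squares line; returns (intercept, slope)."""
    if not intercept:
        slope = sum(wi * xi * yi for xi, yi, wi in zip(x, y, w)) / sum(wi * xi**2 for xi, wi in zip(x, w))
        return 0.0, slope
    sw = sum(w)
    x_bar = sum(wi * xi for xi, wi in zip(x, w)) / sw
    y_bar = sum(wi * yi for yi, wi in zip(y, w)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for xi, yi, wi in zip(x, y, w))
    sxx = sum(wi * (xi - x_bar) ** 2 for xi, wi in zip(x, w))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def fit_lines(beta_x, beta_y, se_y):
    """IVW (through the origin) and MR-Egger (free intercept) fits, weights 1/se_y**2 (assumed)."""
    # Orient each variant so its association with the exposure is positive.
    x = [abs(bx) for bx in beta_x]
    y = [by if bx >= 0 else -by for bx, by in zip(beta_x, beta_y)]
    w = [1 / sy**2 for sy in se_y]
    return {"IVW": weighted_line(x, y, w, intercept=False),
            "MR-Egger": weighted_line(x, y, w, intercept=True)}


# Hypothetical data: causal effect 0.5 plus a constant pleiotropic shift of 0.02
# in every oriented variant-outcome association (directional pleiotropy).
beta_x = [0.10, -0.20, 0.15, 0.05, 0.12]
beta_y = [0.07, -0.12, 0.095, 0.045, 0.08]
se_y = [0.01, 0.02, 0.01, 0.015, 0.01]
for name, (a, b) in fit_lines(beta_x, beta_y, se_y).items():
    print(f"{name}: intercept {a:.3f}, slope {b:.3f}")
```

Because the pleiotropic shift is the same for every variant, MR‐Egger recovers intercept 0.02 and slope 0.5, whereas the line forced through the origin has a slope above 0.5. A non-zero intercept is what the MR‐Egger test of directional pleiotropy looks for.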
Results from applied example
| Analysis method | All genetic variants: estimate (SE) | Variants with primary association with target exposure: estimate (SE) |
|---|---|---|
| Low‐density lipoprotein cholesterol (LDL‐c) | | |
| Inverse‐variance weighted | 0.482 (0.060) | 0.470 (0.055) |
| Simple median | 0.429 (0.070) | 0.429 (0.079) |
| Weighted median | 0.458 (0.065) | 0.457 (0.065) |
| Penalized weighted median | 0.457 (0.063) | 0.457 (0.067) |
| MR‐Egger regression: slope | 0.617 (0.103) | 0.562 (0.094) |
| MR‐Egger regression: intercept | −0.009 (0.005) | −0.006 (0.005) |
| High‐density lipoprotein cholesterol (HDL‐c) | | |
| Inverse‐variance weighted | −0.254 (0.070) | −0.137 (0.066) |
| Simple median | −0.267 (0.090) | −0.224 (0.085) |
| Weighted median | −0.069 (0.071) | −0.066 (0.065) |
| Penalized weighted median | −0.071 (0.068) | −0.064 (0.066) |
| MR‐Egger regression: slope | −0.013 (0.115) | 0.092 (0.107) |
| MR‐Egger regression: intercept | −0.014 (0.005) | −0.013 (0.005) |
| Triglycerides | | |
| Inverse‐variance weighted | 0.416 (0.081) | 0.417 (0.095) |
| Simple median | 0.512 (0.101) | 0.565 (0.105) |
| Weighted median | 0.516 (0.084) | 0.521 (0.087) |
| Penalized weighted median | 0.528 (0.078) | 0.539 (0.089) |
| MR‐Egger regression: slope | 0.422 (0.140) | 0.464 (0.155) |
| MR‐Egger regression: intercept | −0.000 (0.008) | −0.004 (0.009) |
Estimates (standard errors) of causal effects of lipid fractions on coronary artery disease risk. Estimates are log odds ratios per 1 standard deviation increase in the exposure. The intercept term in MR‐Egger regression provides a test of directional pleiotropy.
a P‐values are indicated as: *, **, ***
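Because the estimates are log odds ratios per 1 standard deviation increase in the exposure, exponentiating gives odds ratios, and an approximate P‐value follows from each estimate and its SE. The sketch below assumes a normal (Wald) approximation, which may differ from how the original P‐values were computed, and works from the rounded values in the table; the function name `summarize` is hypothetical.

```python
import math


def summarize(log_or, se, z_crit=1.96):
    """Odds ratio, 95% CI and two-sided normal (Wald) p-value from a log OR and its SE."""
    z = log_or / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return {
        "OR": math.exp(log_or),
        "CI": (math.exp(log_or - z_crit * se), math.exp(log_or + z_crit * se)),
        "p": p_value,
    }


# HDL-c estimates using all genetic variants, taken from the applied example table.
for method, est, se in [("IVW", -0.254, 0.070),
                        ("Weighted median", -0.069, 0.071),
                        ("MR-Egger slope", -0.013, 0.115)]:
    s = summarize(est, se)
    print(f"{method}: OR {s['OR']:.2f} (95% CI {s['CI'][0]:.2f} to {s['CI'][1]:.2f}), p = {s['p']:.3g}")
```

For HDL‐c the IVW estimate corresponds to an odds ratio of about 0.78 per SD with a confidence interval excluding 1, whereas the weighted median gives about 0.93 (95% CI roughly 0.81 to 1.07), consistent with the null effect described in the abstract.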
Summary of methods considered in this paper
| Method | Breakdown | IV2 | IV3 | Comments |
|---|---|---|---|---|
| Two‐stage least squares | 0% | ✗ | ✗ | Requires individual‐level data. Biased when at least one genetic variant is an invalid IV. |
| Inverse‐variance weighted (IVW) | 0% | ✗ | ✗ | Equivalent to two‐stage least squares method with summary data. Also biased when at least one genetic variant is an invalid IV. |
| Simple median | 50% | ✓ | ✓ | Consistent when at least 50% of genetic variants are valid IVs. Less efficient than the IVW and weighted median methods. |
| Weighted median | 50% | ✓ | ✓ | Consistent when at least 50% of the weight comes from valid genetic variants. Efficiency is similar to that of the IVW method. |
| Penalized weighted median | 50% | ✓ | ✓ | Equivalent to weighted median when there is no causal effect heterogeneity. Downweights the contribution of heterogeneous variants, so may have better finite sample properties, particularly if there is directional pleiotropy. |
| MR‐Egger regression | 100% | ✗ | ✓ | Consistent even when all genetic variants are invalid IVs, but requires the variants to satisfy a weaker assumption (the InSIDE assumption). This assumption is not automatically violated by an association between a genetic variant and a confounder, but it would be violated if several variants were associated with the same confounder. Substantially less efficient than the IVW and median‐based methods, and more susceptible to weak instrument bias in a one‐sample setting. |
Breakdown refers to the breakdown level, the proportion of information that can come from invalid instrumental variables (IVs) before the method gives biased estimates. IV2 and IV3 refer to whether violations of the second (no association with confounders) and third (no direct effect on the outcome) instrumental variable assumptions are allowed (✓) or not allowed (✗).
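The difference between breakdown measured by the number of variants and by their weight can be shown with noise-free, hypothetical summary data: four strong valid variants (ratio estimate 0.1) and six weaker invalid variants (ratio estimate 0.5). The sketch assumes first-order weights bx²/se_y², and the helper functions are illustrative only.

```python
import math
import statistics


def ivw(beta_x, beta_y, se_y):
    """First-order fixed-effect IVW estimate."""
    num = sum(bx * by / sy**2 for bx, by, sy in zip(beta_x, beta_y, se_y))
    den = sum(bx**2 / sy**2 for bx, sy in zip(beta_x, se_y))
    return num / den


def weighted_median(estimates, weights):
    """Weighted median by interpolation on the weighted EDF (percentiles S_j - w_j/2)."""
    pairs = sorted(zip(estimates, weights))
    beta = [b for b, _ in pairs]
    total = float(sum(w for _, w in pairs))
    p, cumulative = [], 0.0
    for _, w in pairs:
        cumulative += w / total
        p.append(cumulative - w / (2 * total))
    below = [j for j, pj in enumerate(p) if pj < 0.5]
    if not below:
        return beta[0]
    k = below[-1]
    if k == len(beta) - 1:
        return beta[-1]
    return beta[k] + (beta[k + 1] - beta[k]) * (0.5 - p[k]) / (p[k + 1] - p[k])


# Hypothetical noise-free data, true causal effect 0.1.
beta_x = [0.2] * 4 + [0.1] * 6
beta_y = [0.02] * 4 + [0.05] * 6       # invalid variants have ratio 0.5
se_y = [0.01] * 10
ratios = [by / bx for bx, by in zip(beta_x, beta_y)]
weights = [bx**2 / sy**2 for bx, sy in zip(beta_x, se_y)]
invalid_share = sum(weights[4:]) / sum(weights)

print(f"share of weight from invalid variants: {invalid_share:.0%}")
print(f"IVW:             {ivw(beta_x, beta_y, se_y):.3f}")
print(f"simple median:   {statistics.median(ratios):.3f}")
print(f"weighted median: {weighted_median(ratios, weights):.3f}")
print(f"math check, IVW = 0.046 / 0.22 = {0.046 / 0.22:.3f}")
```

Six of the ten variants are invalid, so the simple median breaks down (0.5), but they contribute only about 27% of the weight, so the weighted median still returns 0.1; IVW is pulled to about 0.21.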