Stephen Burgess, Neil M Davies, Simon G Thompson.
Abstract
Mendelian randomization analyses are often performed using summarized data. The causal estimate from a one-sample analysis (in which data are taken from a single data source) with weak instrumental variables is biased in the direction of the observational association between the risk factor and outcome, whereas the estimate from a two-sample analysis (in which data on the risk factor and outcome are taken from non-overlapping datasets) is less biased and any bias is in the direction of the null. When using genetic consortia that have partially overlapping sets of participants, the direction and extent of bias are uncertain. In this paper, we perform simulation studies to investigate the magnitude of bias and Type 1 error rate inflation arising from sample overlap. We consider both a continuous outcome and a case-control setting with a binary outcome. For a continuous outcome, bias due to sample overlap is a linear function of the proportion of overlap between the samples. So, in the case of a null causal effect, if the relative bias of the one-sample instrumental variable estimate is 10% (corresponding to an F parameter of 10), then the relative bias with 50% sample overlap is 5%, and with 30% sample overlap is 3%. In a case-control setting, if risk factor measurements are only included for the control participants, unbiased estimates are obtained even in a one-sample setting. However, if risk factor data on both control and case participants are used, then bias is similar with a binary outcome as with a continuous outcome. Consortia releasing publicly available data on the associations of genetic variants with continuous risk factors should provide estimates that exclude case participants from case-control samples.
Keywords: Mendelian randomization; aggregated data; instrumental variables; summarized data; weak instrument bias
Year: 2016 PMID: 27625185 PMCID: PMC5082560 DOI: 10.1002/gepi.21998
Source DB: PubMed Journal: Genet Epidemiol ISSN: 0741-0395 Impact factor: 2.135
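The linear relationship stated in the abstract can be sketched in a few lines of Python. This is a minimal illustration, assuming a null causal effect and a one-sample relative bias of about 1/F; the function name `relative_bias_with_overlap` is hypothetical and not taken from the paper.

```python
def relative_bias_with_overlap(f_statistic: float, overlap: float) -> float:
    """Approximate relative bias of the IV estimate under a null causal effect.

    Assumption: the one-sample relative bias is about 1 / F, and bias scales
    linearly with the proportion of sample overlap (0 = two-sample, 1 = one-sample).
    """
    if f_statistic <= 0:
        raise ValueError("f_statistic must be positive")
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must be a proportion between 0 and 1")
    return overlap / f_statistic


# Reproduce the worked example from the abstract (F parameter of 10).
for overlap in (1.0, 0.5, 0.3):
    bias = relative_bias_with_overlap(10, overlap)
    print(f"overlap {overlap:.0%}: relative bias {bias:.0%}")
```

With an F parameter of 10 this gives 10%, 5%, and 3% for full, 50%, and 30% overlap, matching the figures quoted in the abstract.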
Simulation 1 with continuous outcome and different overlap proportions
| α | Mean F | Mean R² | Percentage overlap | Positive effect (confounder value 1) | Positive effect (confounder value 2) | Positive effect (confounder value 3) | Null effect (confounder value 1) | Null effect (confounder value 2) | Null effect (confounder value 3) |
|---|---|---|---|---|---|---|---|---|---|
| 0.04 | 4.4 | 0.9% | OLS | 0.498 | 0.697 | 1.193 | 0.298 | 0.497 | 0.993 |
| | | | 0% | 0.156 | 0.157 | 0.161 | −0.001 | 0.000 | 0.004 |
| | | | 10% | 0.167 | 0.173 | 0.186 | 0.006 | 0.012 | 0.024 |
| | | | 20% | 0.178 | 0.187 | 0.211 | 0.012 | 0.022 | 0.045 |
| | | | 30% | 0.189 | 0.203 | 0.237 | 0.019 | 0.033 | 0.067 |
| | | | 40% | 0.200 | 0.218 | 0.262 | 0.026 | 0.044 | 0.087 |
| | | | 50% | 0.211 | 0.233 | 0.287 | 0.033 | 0.055 | 0.108 |
| | | | 60% | 0.223 | 0.249 | 0.313 | 0.040 | 0.066 | 0.130 |
| | | | 70% | 0.234 | 0.264 | 0.339 | 0.047 | 0.077 | 0.152 |
| | | | 80% | 0.245 | 0.280 | 0.363 | 0.053 | 0.088 | 0.172 |
| | | | 90% | 0.256 | 0.294 | 0.389 | 0.060 | 0.098 | 0.194 |
| | | | 100% | 0.266 | 0.309 | 0.415 | 0.066 | 0.109 | 0.215 |
| 0.06 | 8.5 | 1.7% | OLS | 0.495 | 0.692 | 1.185 | 0.295 | 0.492 | 0.985 |
| | | | 0% | 0.178 | 0.179 | 0.176 | 0.000 | 0.000 | −0.002 |
| | | | 10% | 0.184 | 0.186 | 0.189 | 0.003 | 0.005 | 0.009 |
| | | | 20% | 0.189 | 0.193 | 0.203 | 0.006 | 0.011 | 0.020 |
| | | | 30% | 0.194 | 0.201 | 0.215 | 0.009 | 0.016 | 0.031 |
| | | | 40% | 0.200 | 0.209 | 0.228 | 0.013 | 0.022 | 0.041 |
| | | | 50% | 0.206 | 0.216 | 0.242 | 0.016 | 0.027 | 0.052 |
| | | | 60% | 0.211 | 0.224 | 0.255 | 0.019 | 0.032 | 0.063 |
| | | | 70% | 0.216 | 0.231 | 0.268 | 0.023 | 0.038 | 0.074 |
| | | | 80% | 0.221 | 0.239 | 0.282 | 0.026 | 0.043 | 0.086 |
| | | | 90% | 0.226 | 0.246 | 0.295 | 0.028 | 0.048 | 0.097 |
| | | | 100% | 0.232 | 0.254 | 0.309 | 0.031 | 0.054 | 0.109 |
| 0.08 | 14.4 | 2.8% | OLS | 0.492 | 0.687 | 1.174 | 0.292 | 0.487 | 0.974 |
| | | | 0% | 0.187 | 0.187 | 0.187 | 0.000 | 0.000 | 0.000 |
| | | | 10% | 0.191 | 0.191 | 0.195 | 0.002 | 0.003 | 0.007 |
| | | | 20% | 0.194 | 0.195 | 0.203 | 0.004 | 0.006 | 0.014 |
| | | | 30% | 0.197 | 0.199 | 0.211 | 0.006 | 0.009 | 0.020 |
| | | | 40% | 0.200 | 0.204 | 0.218 | 0.008 | 0.011 | 0.026 |
| | | | 50% | 0.203 | 0.208 | 0.225 | 0.010 | 0.014 | 0.032 |
| | | | 60% | 0.206 | 0.213 | 0.232 | 0.012 | 0.018 | 0.038 |
| | | | 70% | 0.210 | 0.218 | 0.240 | 0.014 | 0.022 | 0.044 |
| | | | 80% | 0.213 | 0.222 | 0.248 | 0.016 | 0.024 | 0.050 |
| | | | 90% | 0.217 | 0.226 | 0.255 | 0.018 | 0.027 | 0.056 |
| | | | 100% | 0.220 | 0.230 | 0.264 | 0.020 | 0.030 | 0.064 |
Notes: Mean two-stage least squares (or equivalently, inverse-variance weighted) estimates with a positive true causal effect and with a null causal effect, for three values of the genetic associations with the risk factor (α), three values of the confounder effect on the outcome, and 11 values of the percentage overlap between the two samples. The mean F statistic (F), mean coefficient of determination (R²), and mean ordinary least squares (OLS) estimate are given to judge the strength of the instrumental variables and the degree of confounding.
Figure 1. Mean two-stage least squares/inverse-variance weighted estimates plotted against sample overlap for three values of instrument strength (plotted as circles, triangles, and plus signs) and three values of the confounder effect on the outcome (black solid, mid-gray dashed, and light-gray dotted lines). Left panel: positive causal effect; right panel: null causal effect.
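The claim that bias grows linearly with overlap can be checked against the Simulation 1 results. The sketch below uses one column copied from the table (α = 0.04, null causal effect, last confounder column), draws a straight line between the 0% and 100% estimates, and reports the largest deviation from it. The variable names are illustrative, and the endpoint-based line is a simplification rather than the authors' own analysis.

```python
# Mean IV estimates under a null causal effect for alpha = 0.04,
# last confounder column of the Simulation 1 table, at 0%, 10%, ..., 100% overlap.
overlaps = [i / 10 for i in range(11)]
estimates = [0.004, 0.024, 0.045, 0.067, 0.087, 0.108,
             0.130, 0.152, 0.172, 0.194, 0.215]

# Straight line through the two-sample (0%) and one-sample (100%) estimates.
intercept = estimates[0]
slope = estimates[-1] - estimates[0]

deviations = [
    abs(observed - (intercept + slope * overlap))
    for overlap, observed in zip(overlaps, estimates)
]
max_deviation = max(deviations)

print(f"slope per unit overlap: {slope:.3f}")
print(f"largest deviation from the straight line: {max_deviation:.4f}")
```

The deviations are on the order of the table's rounding precision, which is consistent with the linear relationship described in the abstract.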
Simulation 2 with continuous outcome to validate bias and type 1 error rate formulae
| α | Mean F | Mean R² | Mean OLS estimate | Mean IV estimate | Relative bias | (Mean F)⁻¹ | Empirical Type 1 error | Expected Type 1 error |
|---|---|---|---|---|---|---|---|---|
| 0.01 | 1.2 | 0.2% | 0.999 | 0.829 | 0.829 | 0.825 | 84.8% | 89.5% |
| 0.02 | 1.8 | 0.4% | 0.998 | 0.531 | 0.532 | 0.545 | 65.5% | 69.8% |
| 0.03 | 2.9 | 0.6% | 0.996 | 0.333 | 0.334 | 0.346 | 46.9% | 46.1% |
| 0.04 | 4.4 | 0.9% | 0.993 | 0.216 | 0.218 | 0.229 | 32.5% | 30.6% |
| 0.05 | 6.3 | 1.2% | 0.990 | 0.149 | 0.151 | 0.160 | 24.1% | 21.8% |
| 0.06 | 8.6 | 1.7% | 0.985 | 0.109 | 0.111 | 0.117 | 19.2% | 16.8% |
| 0.07 | 11.3 | 2.2% | 0.980 | 0.082 | 0.083 | 0.089 | 15.6% | 13.6% |
| 0.08 | 14.4 | 2.8% | 0.974 | 0.062 | 0.064 | 0.069 | 12.4% | 11.5% |
| 0.09 | 18.0 | 3.5% | 0.967 | 0.050 | 0.052 | 0.056 | 11.0% | 10.1% |
| 0.10 | 22.0 | 4.2% | 0.960 | 0.040 | 0.042 | 0.046 | 9.8% | 9.0% |
| 0.15 | 48.3 | 8.8% | 0.914 | 0.019 | 0.021 | 0.021 | 7.5% | 6.6% |
| 0.20 | 85.0 | 14.5% | 0.856 | 0.010 | 0.012 | 0.012 | 5.9% | 5.8% |
Notes: Simulation results with a null causal effect and a single value of the confounder effect, used to estimate the relative bias and empirical Type 1 error rate (5% nominal significance level) of the two-stage least squares (or equivalently, inverse-variance weighted) instrumental variable (IV) estimate. The relative bias is the bias of the IV estimate divided by the bias of the ordinary least squares (OLS) estimate. The relative bias is theoretically predicted to be close to the reciprocal of the mean value of the F statistic, (Mean F)⁻¹.
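The relative bias column can be recomputed from the table itself. Because the true causal effect is null, the mean estimates equal the biases, so relative bias is simply the mean IV estimate divided by the mean OLS estimate. The sketch below does this for the Simulation 2 rows and compares the result with the reported relative bias and (Mean F)⁻¹ columns; the tuple layout and names are illustrative only.

```python
# Rows copied from the Simulation 2 table:
# (alpha, mean OLS estimate, mean IV estimate, reported relative bias, reported 1 / mean F)
rows = [
    (0.01, 0.999, 0.829, 0.829, 0.825),
    (0.02, 0.998, 0.531, 0.532, 0.545),
    (0.03, 0.996, 0.333, 0.334, 0.346),
    (0.04, 0.993, 0.216, 0.218, 0.229),
    (0.05, 0.990, 0.149, 0.151, 0.160),
    (0.06, 0.985, 0.109, 0.111, 0.117),
    (0.07, 0.980, 0.082, 0.083, 0.089),
    (0.08, 0.974, 0.062, 0.064, 0.069),
    (0.09, 0.967, 0.050, 0.052, 0.056),
    (0.10, 0.960, 0.040, 0.042, 0.046),
    (0.15, 0.914, 0.019, 0.021, 0.021),
    (0.20, 0.856, 0.010, 0.012, 0.012),
]

# Under a null effect, bias equals the estimate, so relative bias = IV / OLS.
recomputed = [iv / ols for _, ols, iv, _, _ in rows]

for (alpha, _, _, reported, inv_f), value in zip(rows, recomputed):
    print(f"alpha={alpha:.2f}  recomputed={value:.3f}  "
          f"reported={reported:.3f}  1/F={inv_f:.3f}")
```

The recomputed values agree with the reported relative bias to within rounding, and both track (Mean F)⁻¹ closely across the range of instrument strengths.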
Simulation 3 with binary outcome to validate bias and type 1 error rate formulae
| α | Mean F | Mean R² | Mean observational estimate | Mean IV estimate | Relative bias | (Mean F)⁻¹ | Empirical Type 1 error | Expected Type 1 error |
|---|---|---|---|---|---|---|---|---|
| Risk factor measurements taken in controls only | ||||||||
| 0.01 | 1.1 | 0.4% | 0.481 | 0.001 | 0.002 | ‐ | 4.9% | ‐ |
| 0.02 | 1.4 | 0.6% | 0.481 | 0.000 | 0.000 | ‐ | 5.2% | ‐ |
| 0.03 | 2.0 | 0.8% | 0.479 | −0.003 | 0.007 | ‐ | 4.8% | ‐ |
| 0.04 | 2.7 | 1.1% | 0.478 | −0.001 | −0.001 | ‐ | 5.0% | ‐ |
| 0.05 | 3.7 | 1.5% | 0.476 | 0.000 | 0.000 | ‐ | 5.2% | ‐ |
| 0.08 | 7.9 | 3.1% | 0.469 | 0.000 | 0.001 | ‐ | 4.7% | ‐ |
| Risk factor measurements taken in all participants | ||||||||
| 0.01 | 1.2 | 0.3% | 0.481 | 0.360 | 0.748 | 0.837 | 24.4% | 29.1% |
| 0.02 | 1.8 | 0.4% | 0.481 | 0.237 | 0.493 | 0.561 | 17.4% | 21.1% |
| 0.03 | 2.8 | 0.5% | 0.479 | 0.149 | 0.311 | 0.363 | 12.8% | 15.2% |
| 0.04 | 4.1 | 0.8% | 0.478 | 0.099 | 0.207 | 0.242 | 10.0% | 11.7% |
| 0.05 | 5.9 | 1.2% | 0.476 | 0.068 | 0.142 | 0.170 | 8.4% | 9.6% |
| 0.08 | 13.6 | 2.6% | 0.469 | 0.030 | 0.064 | 0.074 | 6.3% | 6.9% |
Notes: Mean instrumental variable (IV) estimates and empirical Type 1 error rate (5% nominal significance level) from the inverse-variance weighted method with a binary outcome, for a null causal effect and six values of genetic associations with the risk factor (α) in a case-control setting, with the risk factor measurements taken in control participants only and with the risk factor measurements taken in all participants. Observational estimates are log odds ratios from logistic regression of the outcome on the risk factor, and IV estimates are log odds ratios calculated using logistic regression for the IV–outcome association and linear regression for the IV–risk factor association.
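For readers unfamiliar with the inverse-variance weighted method named in these notes, the sketch below shows how such an estimate is commonly computed from summarized data. The formula is the standard fixed-effect IVW estimator, which this record names but does not spell out, so treat it as an assumption; the function name `ivw_estimate` and all input numbers are hypothetical and are not results from the paper.

```python
import math


def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect inverse-variance weighted estimate from summarized data.

    beta_x: per-variant associations with the risk factor (linear regression)
    beta_y: per-variant associations with the outcome (log odds ratios for a
            binary outcome, from logistic regression)
    se_y:   standard errors of beta_y
    Returns (estimate, standard_error).
    """
    if not (len(beta_x) == len(beta_y) == len(se_y)) or not beta_x:
        raise ValueError("inputs must be non-empty and of equal length")
    numerator = sum(bx * by / sy**2 for bx, by, sy in zip(beta_x, beta_y, se_y))
    denominator = sum(bx**2 / sy**2 for bx, sy in zip(beta_x, se_y))
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Hypothetical summary statistics for three variants (illustrative values only).
beta_x = [0.10, 0.08, 0.12]
beta_y = [0.020, 0.016, 0.024]
se_y = [0.01, 0.01, 0.02]

estimate, standard_error = ivw_estimate(beta_x, beta_y, se_y)
print(f"IVW estimate: {estimate:.3f} (SE {standard_error:.3f})")
```

In this toy input every variant has the same ratio of outcome association to risk factor association, so the weighted estimate equals that common ratio. In a case-control analysis, the table above suggests the risk factor associations would ideally be estimated in controls only.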