Neil M Davies, Stephanie von Hinke Kessler Scholder, Helmut Farbmacher, Stephen Burgess, Frank Windmeijer, George Davey Smith.
Abstract
Instrumental variable estimates of causal effects can be biased when using many instruments that are only weakly associated with the exposure. We describe several techniques to reduce this bias and estimate corrected standard errors. We present our findings using a simulation study and an empirical application. For the latter, we estimate the effect of height on lung function, using genetic variants as instruments for height. Our simulation study demonstrates that, using many weak individual variants, two-stage least squares (2SLS) is biased, whereas the limited information maximum likelihood (LIML) and the continuously updating estimator (CUE) are unbiased and have accurate rejection frequencies when standard errors are corrected for the presence of many weak instruments. Our illustrative empirical example uses data on 3631 children from England. We used 180 genetic variants as instruments and compared conventional ordinary least squares estimates with results for the 2SLS, LIML, and CUE instrumental variable estimators using the individual height variants. We further compare these with instrumental variable estimates using an unweighted or weighted allele score as single instruments. In conclusion, the allele scores and CUE gave consistent estimates of the causal effect. In our empirical example, estimates using the allele score were more efficient. CUE with corrected standard errors, however, provides a useful additional statistical tool in applications with many weak instruments. The CUE may be preferred over an allele score if the population weights for the allele score are unknown or when the causal effects of multiple risk factors are estimated jointly.
Keywords: ALSPAC; Mendelian randomization; allele scores; continuously updating estimator; height; many weak instruments
Year: 2014 PMID: 25382280 PMCID: PMC4305205 DOI: 10.1002/sim.6358
Source DB: PubMed Journal: Stat Med ISSN: 0277-6715 Impact factor: 2.373
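As a rough illustration of the allele-score approach named in the abstract, the sketch below builds an unweighted and a weighted score from a matrix of allele counts and then uses a score as a single instrument through the ratio (Wald) estimator. The function names, the 0/1/2 allele coding, and the `weights` vector are illustrative assumptions rather than the authors' code; in practice the weights would come from an external study.

```python
import numpy as np


def allele_scores(G, weights):
    """Return (unweighted, weighted) allele scores.

    G       : (n, k) array of allele counts, assumed coded 0/1/2.
    weights : (k,) array of per-allele effects, assumed to come from
              an external source (hypothetical in this sketch).
    """
    G = np.asarray(G, dtype=float)
    unweighted = G.sum(axis=1)            # count of height-increasing alleles
    weighted = G @ np.asarray(weights)    # counts weighted by external effects
    return unweighted, weighted


def wald_iv(y, x, z):
    """IV estimate with a single instrument z: cov(z, y) / cov(z, x)."""
    y, x, z = (np.asarray(a, dtype=float) for a in (y, x, z))
    zc = z - z.mean()                     # centring z absorbs the intercept
    return (zc @ y) / (zc @ x)
```

With a single instrument the 2SLS estimate reduces to this ratio, which is why a score can stand in for the full set of variants.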
Simulation results comparing the properties of two-stage least squares, LIML, CUE, and allele score instrumental variable estimators when using 9, 25, and 100 variants as instrumental variables.
| No. of variants (k) | Estimator | Bias (theoretical) | Median (empirical) | IQR (empirical) | RF (empirical) |
|---|---|---|---|---|---|
| 9 | 2SLS | 0.057 | 0.065 | 0.158 | 0.101 |
| | LIML | −0.009 | 0.002 | 0.186 | 0.057 |
| | Corrected LIML | | | | 0.047 |
| | CUE | | 0.002 | 0.187 | 0.079 |
| | Corrected CUE | | | | 0.045 |
| | Allele score IV | | 0.001 | 0.178 | 0.047 |
| 25 | 2SLS | 0.147 | 0.150 | 0.139 | 0.321 |
| | LIML | −0.009 | 0.001 | 0.204 | 0.079 |
| | Corrected LIML | | | | 0.049 |
| | CUE | | 0.002 | 0.207 | 0.159 |
| | Corrected CUE | | | | 0.046 |
| | Allele score IV | | 0.000 | 0.181 | 0.044 |
| 100 | 2SLS | 0.318 | 0.318 | 0.095 | 0.988 |
| | LIML | −0.009 | 0.002 | 0.277 | 0.173 |
| | Corrected LIML | | | | 0.046 |
| | CUE | | 0.001 | 0.297 | 0.481 |
| | Corrected CUE | | | | 0.045 |
| | Allele score IV | | 0.000 | 0.176 | 0.042 |
Note that μ² = 56.70 in all designs. E[F] ≈ 7.30, 3.27, and 1.57 for k = 9, 25, and 100, respectively. The means of the empirical F-statistics are 7.40, 3.32, and 1.58. The rejection frequency is the proportion of replications for which the true parameter (β = 0) lies outside the 95% confidence interval. Median is the median estimate; the true value is β = 0; n = 3000; 10,000 replications. 2SLS rejection frequencies assume a homoskedastic error term. Rejection frequencies are for H0: β = 0 using Wald tests.
LIML, limited information maximum likelihood; CUE, continuously updating estimator; 2SLS, two-stage least squares; IQR, interquartile range; RF, rejection frequency.
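The expected F-statistics in the note are consistent with the standard approximation E[F] ≈ 1 + μ²/k, and the contrast between 2SLS and LIML can be reproduced in a small Monte Carlo. The sketch below is a minimal version under stated assumptions, not the authors' simulation code: instruments are independent standard normal, all first-stage coefficients are equal, there are no covariates or intercept, and the error correlation `rho = 0.5` is a hypothetical choice (the value used in the paper is not given here). Only n, μ², k, and β = 0 are taken from the note.

```python
import numpy as np


def expected_first_stage_f(mu2, k):
    """Approximate expected first-stage F-statistic: 1 + mu^2 / k."""
    return 1.0 + mu2 / k


def tsls_and_liml(y, x, Z):
    """2SLS and LIML for one endogenous regressor, no covariates."""
    W = np.column_stack([y, x])
    ZtW = Z.T @ W
    WPW = ZtW.T @ np.linalg.solve(Z.T @ Z, ZtW)   # W' P_Z W
    WW = W.T @ W
    WMW = WW - WPW                                # W' M_Z W
    b_2sls = WPW[0, 1] / WPW[1, 1]
    # LIML is the k-class estimator with kappa equal to the smallest
    # eigenvalue of (W' M_Z W)^{-1} W' W.
    kappa = np.linalg.eigvals(np.linalg.solve(WMW, WW)).real.min()
    b_liml = (WW[0, 1] - kappa * WMW[0, 1]) / (WW[1, 1] - kappa * WMW[1, 1])
    return b_2sls, b_liml


def simulate(k, n=3000, mu2=56.70, rho=0.5, beta=0.0, reps=200, seed=0):
    """Return the median 2SLS and LIML estimates over `reps` replications."""
    rng = np.random.default_rng(seed)
    pi = np.full(k, np.sqrt(mu2 / (n * k)))       # equal, weak first-stage effects
    out = np.empty((reps, 2))
    for r in range(reps):
        Z = rng.standard_normal((n, k))
        v = rng.standard_normal(n)
        u = rho * v + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
        x = Z @ pi + v                            # exposure
        y = beta * x + u                          # outcome, confounded through u
        out[r] = tsls_and_liml(y, x, Z)
    return np.median(out, axis=0)
```

Calling `expected_first_stage_f(56.70, k)` for k = 9, 25, and 100 gives values that round to the E[F] figures in the note. Calling `simulate(100)` should show the median 2SLS estimate pulled well away from zero while the median LIML estimate stays close to it, although the exact numbers depend on the assumed `rho` and the seed.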
Figure 1. Plot of the association of 180 genetic variants and height in ALSPAC and Lango et al. [42].
Baseline characteristics of included ALSPAC participants.
| Characteristic | Mean | Standard deviation | N |
|---|---|---|---|
| Height at age 15 years (in cm) | 169.35 | 8.42 | 3631 |
| Male | 0.48 | 0.50 | 3631 |
| Age in months at Teen Focus 3 Clinic | 185.31 | 3.58 | 3631 |
| Birth weight (g) | 3444 | 525 | 3435 |
| No. of older siblings in household | 1.02 | 0.99 | 3319 |
| No. of younger siblings in household | 0.92 | 0.95 | 3319 |
| Ln(income) | 5.74 | 0.46 | 3141 |
| Mother's education | 3.39 | 1.18 | 3391 |
| Father's education | 3.39 | 1.37 | 3320 |
| Mothers' mother's education | 2.35 | 1.33 | 2627 |
| Mothers' father's education | 2.57 | 1.46 | 2471 |
| Child not ever raised by natural father | 0.11 | 0.32 | 3314 |
| Father's social class at birth | 2.78 | 1.27 | 3146 |
| Mother works part time | 0.43 | 0.50 | 3113 |
| Mother works full time | 0.09 | 0.28 | 3113 |
| Partner works full time | 0.93 | 0.25 | 1608 |
| Mother drank during pregnancy | 0.57 | 0.50 | 3419 |
| Mother smoked during pregnancy | 0.15 | 0.36 | 3418 |
| Ever breast-fed | 0.86 | 0.34 | 3226 |
| Mother's age | 29.82 | 4.51 | 3495 |
| Participant had tried tobacco at age 8 years | 0.02 | 0.15 | 3133 |
Association of height at age 15 years with baseline characteristics.
| Covariate | N | Coef (actual height) | p-value |
|---|---|---|---|
| Age in months at Teen Focus 3 Clinic | 3631 | 0.14 | 0.02 |
| Male | 3631 | 0.29 | <0.001 |
| Older siblings in household | 3321 | 0.00 | 0.98 |
| Younger siblings in household | 3321 | 0.02 | 0.30 |
| Ln(income) | 3141 | 0.02 | 0.03 |
| Mother's education | 3391 | 0.06 | 0.002 |
| Father's education | 3320 | 0.08 | 0.001 |
| Mothers' mother's education | 2627 | 0.06 | 0.02 |
| Mothers' father's education | 2471 | 0.06 | 0.05 |
| Child not ever raised by natural father | 3314 | 0.00 | 0.62 |
| Father's social class at birth | 3146 | −0.04 | 0.07 |
| Mother works part time | 3113 | 0.00 | 0.72 |
| Mother works full time | 3113 | 0.01 | 0.07 |
| Partner works full time | 1608 | 0.00 | 0.57 |
| Mother drank during pregnancy | 3419 | 0.00 | 0.85 |
| Mother smoked during pregnancy | 3418 | −0.01 | 0.08 |
| Ever breast-fed | 3226 | 0.02 | <0.001 |
| Mother's age | 3495 | 0.27 | <0.001 |
| Participant had tried tobacco at age 8 years | 3133 | 0.01 | 0.03 |
Coef, coefficient from a robust ordinary least squares regression of covariate on normalized height.
Association of variants and allele scores with covariates.
| Covariate | Unweighted allele score: Coef (1) | Unweighted allele score: p-value (2) | Weighted allele score: Coef (3) | Weighted allele score: p-value (4) |
|---|---|---|---|---|
| Male | 0.01 | 0.41 | 0.01 | 0.33 |
| Birth weight (g) | 31.47 | <0.001 | 34.43 | <0.001 |
| Older siblings in household | 0.01 | 0.60 | 0.01 | 0.45 |
| Younger siblings in household | −0.02 | 0.19 | −0.02 | 0.25 |
| Ln(income) | 0.00 | 0.62 | −0.01 | 0.37 |
| Mother's education | 0.00 | 0.94 | 0.00 | 0.99 |
| Father's education | 0.00 | 0.94 | 0.00 | 0.84 |
| Mothers' mother's education | 0.02 | 0.50 | 0.02 | 0.42 |
| Mothers' father's education | 0.02 | 0.50 | 0.03 | 0.29 |
| Child not ever raised by natural father | 0.00 | 0.51 | 0.01 | 0.36 |
| Father's social class at birth | 0.00 | 0.83 | 0.00 | 0.99 |
| Mother works part time | 0.01 | 0.15 | 0.01 | 0.24 |
| Mother works full time | 0.00 | 0.66 | 0.00 | 0.50 |
| Partner works full time | −0.01 | 0.21 | −0.01 | 0.08 |
| Mother drank during pregnancy | −0.01 | 0.52 | −0.01 | 0.25 |
| Mother smoked during pregnancy | −0.01 | 0.35 | −0.01 | 0.18 |
| Ever breast-fed | 0.00 | 0.58 | 0.00 | 0.71 |
| Mother's age | −0.09 | 0.21 | −0.09 | 0.23 |
| Participant had tried tobacco at age 8 years | 0.00 | 0.58 | 0.00 | 0.32 |
Association of allele scores and normalized height at age 15 years (N = 3631).
| | Unweighted score: coefficient (95% confidence interval) | p-value | Weighted score: coefficient (95% confidence interval) | p-value |
|---|---|---|---|---|
| Allele score | 0.21 (0.18,0.24) | <0.001 | 0.22 (0.19,0.25) | <0.001 |
| R² | 0.043 | | 0.049 | |
| F-statistic | 164 | | 190 | |
The relationship between height and lung function (N = 3631).
| Method | Mean difference (95% confidence interval) | Robust standard error | p-value | F-statistic | Sargan/Hansen p-value | Hausman endogeneity test statistic | Hausman p-value |
|---|---|---|---|---|---|---|---|
| Ordinary least squares | 0.67 (0.65,0.70) | 0.013 | <0.001 | | | | |
| Adjusted OLS | 0.53 (0.50,0.56) | 0.015 | <0.001 | | | | |
| Two-stage least squares | 0.60 (0.52,0.68) | 0.040 | <0.001 | 2.03 | 0.02 | 4.08 | 0.04 |
| LIML | 0.47 (0.34,0.60) | 0.067 | <0.001 | 2.03 | 0.01 | 3.70 | 0.05 |
| LIML corrected | 0.47 (0.25,0.68) | 0.109 | <0.001 | | | | |
| CUE | 0.43 (0.35,0.50) | 0.039 | <0.001 | 2.03 | 0.05 | | |
| CUE corrected | 0.43 (0.21,0.64) | 0.110 | <0.001 | | | | |
| Unweighted allele score | 0.44 (0.32,0.56) | 0.062 | <0.001 | 164.18 | | 16.80 | <0.001 |
| Weighted allele score | 0.42 (0.31,0.53) | 0.057 | <0.001 | 190.15 | | 22.71 | <0.001 |
Robust confidence intervals. Adjusted OLS adjusts for covariates described in Table II.
OLS, ordinary least squares; LIML, limited information maximum likelihood; CUE, continuously updating estimator.
We used the Sargan test for LIML and the Hansen J-test for the other estimators. The Hausman test assumes homoskedasticity; therefore, we do not include it for CUE.
Figure 2. Estimated effect of standardized height on lung function. These results are presented in Table VI. OLS, ordinary least squares; 2SLS, two-stage least squares regression; LIML, limited information maximum likelihood; CUE, continuously updating estimator. The horizontal line indicates the OLS estimate. Lung function standardized to mean zero and standard deviation one.
Figure 3. Coefficients and standard errors from two-stage least squares and continuously updating estimator by number of variants that are included as instruments. Variants are included in order of association with height; thus, the analysis with one variant uses the strongest variant reported by Lango et al., the analysis with two variants uses the two strongest variants, and so on.
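The procedure behind Figure 3 can be sketched as a loop that adds variants one at a time, strongest first, and re-estimates the effect at each step. This is a minimal sketch covering 2SLS only (the CUE is omitted), and it rests on assumptions: `external_effects` is a hypothetical vector of per-variant effect sizes from an external study, and "strongest" is taken to mean largest absolute effect, which may differ from the ordering the authors used.

```python
import numpy as np


def order_variants(external_effects):
    """Indices of variants sorted from strongest to weakest (by |effect|)."""
    return np.argsort(-np.abs(np.asarray(external_effects, dtype=float)))


def tsls(y, x, Z):
    """2SLS with one endogenous regressor; centring absorbs the intercept."""
    Zc = Z - Z.mean(axis=0)
    xc = x - x.mean()
    yc = y - y.mean()
    # First stage: fitted exposure from a regression of x on the instruments.
    x_hat = Zc @ np.linalg.lstsq(Zc, xc, rcond=None)[0]
    # Second stage: ratio form of the 2SLS estimator.
    return (x_hat @ yc) / (x_hat @ xc)


def estimates_by_number_of_variants(y, x, G, external_effects):
    """2SLS estimates using the 1, 2, ..., k strongest variants as instruments."""
    order = order_variants(external_effects)
    return [tsls(y, x, G[:, order[:m]]) for m in range(1, G.shape[1] + 1)]
```

Plotting the returned list against the number of variants gives the 2SLS half of a figure of this kind; standard errors would need to be computed separately.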