William R P Denault1,2, Jon Bohlin2,3, Christian M Page2,4, Stephen Burgess5, Astanand Jugessur2,6.
Abstract
Bias from weak instruments may undermine the ability to estimate causal effects in instrumental variable regression (IVR). We present here a new approach to handling weak instrument bias through the application of a new type of instrumental variable coined 'Cross-Fitted Instrument' (CFI). CFI splits the data at random and estimates the impact of the instrument on the exposure in each partition. These estimates are then used to perform an IVR on each partition. We adapt CFI to the Mendelian randomization (MR) setting and term this adaptation 'Cross-Fitting for Mendelian Randomization' (CFMR). We show that, even when using weak instruments, CFMR is, at worst, biased towards the null, which makes it a conservative one-sample MR approach. In particular, CFMR remains conservative even when the two samples used to perform the MR analysis completely overlap, whereas current state-of-the-art approaches (e.g., MR RAPS) display substantial bias in this setting. Another major advantage of CFMR lies in its use of all of the available data to select genetic instruments, which maximizes statistical power, as opposed to traditional two-sample MR where only part of the data is used to select the instrument. Consequently, CFMR is able to enhance statistical power in consortia-led meta-analyses by enabling a conservative one-sample MR to be performed in each cohort prior to a meta-analysis of the results across all the cohorts. In addition, CFMR enables a cross-ethnic MR analysis by accounting for ethnic heterogeneity, which is particularly important in meta-analyses where the participating cohorts may have different ethnicities. To our knowledge, none of the current MR approaches can account for such heterogeneity. Finally, CFMR enables the application of MR to exposures that are either rare or difficult to measure, which would normally preclude their analysis in the regular two-sample MR setting.
Year: 2022 PMID: 36037248 PMCID: PMC9462731 DOI: 10.1371/journal.pcbi.1010268
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.779
Fig 1. Schematic overview of two-sample MR and two-fold CFMR.
Panel (a) shows the two-sample MR setup, in which the first sample is used to build the instrument and the second sample is used to estimate the causal effect. Panel (b) shows the two-fold CFMR setup. Step 1 in panel (b) describes the random splitting of the dataset into two sub-samples. In step 2, two separate GWASs of the exposure are performed, one in sub-sample 1 and one in sub-sample 2. The predictors of the exposure are subsequently built based on sub-sample 1 (IV1) and sub-sample 2 (IV2). Step 3 refers to the 2SLS in which IV1 is applied to sub-sample 2 and IV2 is applied to sub-sample 1 to obtain the estimates of these IVRs. Finally, in step 4, the two 2SLS estimates from step 3 are averaged to obtain the final estimate.
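The four steps of panel (b) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names `simulate_data` and `two_fold_cfmr` are hypothetical, an ordinary least-squares fit of the exposure on all SNPs stands in for the GWAS and predictor-building of step 2, and the simulated effect sizes are arbitrary assumptions.

```python
import numpy as np


def simulate_data(n, n_snps, beta, seed=0):
    """Toy data (assumed setup): SNPs G, confounded exposure x, outcome y."""
    rng = np.random.default_rng(seed)
    G = rng.binomial(2, 0.3, size=(n, n_snps)).astype(float)
    u = rng.normal(size=n)  # unmeasured confounder of x and y
    x = G @ np.full(n_snps, 0.3) + u + rng.normal(size=n)
    y = beta * x + u + rng.normal(size=n)
    return G, x, y


def two_fold_cfmr(G, x, y, seed=0):
    """Two-fold cross-fitted IV estimate of the effect of x on y."""
    rng = np.random.default_rng(seed)
    n = len(x)
    # Step 1: split the dataset at random into two sub-samples.
    perm = rng.permutation(n)
    folds = [perm[: n // 2], perm[n // 2:]]
    estimates = []
    for k in (0, 1):
        train, test = folds[k], folds[1 - k]
        # Step 2: build a predictor of the exposure in one sub-sample
        # (plain OLS here, as a stand-in for GWAS + instrument building).
        design = np.column_stack([np.ones(len(train)), G[train]])
        coef, *_ = np.linalg.lstsq(design, x[train], rcond=None)
        # Step 3: apply that predictor to the OTHER sub-sample and use it
        # as a single instrument; with one instrument, 2SLS reduces to
        # the ratio cov(iv, y) / cov(iv, x).
        iv = np.column_stack([np.ones(len(test)), G[test]]) @ coef
        iv_c = iv - iv.mean()
        beta_k = (iv_c @ (y[test] - y[test].mean())) / (
            iv_c @ (x[test] - x[test].mean())
        )
        estimates.append(beta_k)
    # Step 4: average the two 2SLS estimates.
    return float(np.mean(estimates))


G, x, y = simulate_data(n=20000, n_snps=10, beta=0.5, seed=1)
print(two_fold_cfmr(G, x, y, seed=2))
```

The key design point is that each instrument is evaluated only on individuals who were not used to fit it, which is what separates this setup from a naive one-sample analysis.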
Fig 2. Power curves for CFMR versus two-sample MR (2SMR) using the simulation setup described in the Simulations section in the main text (with h² = 20%).
The dashed lines represent power curves for CFMR, while the solid lines represent the theoretical power for 2SMR [24]. Note that the solid pink line covers the solid red line perfectly (top part of the graph). These lines fully overlap as a result of symmetry. Given that the solid lines are generated from the theoretical power formula of two-sample MR (see Deng et al. [24]), the red and pink curves correspond to the same effect size (in terms of magnitude) but have opposite signs. The same is the case for the solid blue and gold lines in the middle part of the graph.
Fig 3. Summary of the results of the simulations to assess bias due to complete sample overlap and weak instruments across different MR methods.
For details, see Section 3.2 in the main text. Each panel displays the box plots of the estimated effect according to the method used (1SMR, Barry et al. [26], CFMR, and MR RAPS) and sample size (1,000, 5,000, 10,000, and 50,000). The y-axis corresponds to the estimated effect. The solid horizontal black line corresponds to the true value of the effect to be estimated. The different types of box plots correspond to the variance of X explained by the genetic markers used as instruments (10% and 20%). The red box plots correspond to the estimates based on one-sample MR, the green box plots to the estimates based on the Barry et al. [26] method, the purple box plots to the estimates using MR RAPS, and the blue box plots to the estimates using CFMR.
CFMR estimates of maternal pre-pregnancy BMI on newborn’s birth weight per 1 SD increase in maternal pre-pregnancy BMI.
| −log10 SNP P-value | 1SMR estimate | 1SMR Std. error | CFI variance explained (%) | CFMR estimate | CFMR Std. error | P-value | 95% CI | SNPs per split |
|---|---|---|---|---|---|---|---|---|
| -3 | 84.4 | 4.4 | 1.112 | 101.6 | 25.0 | 0.00005 | 52.6–150.6 | 1798 |
| -4 | 88.6 | 5.8 | 1.102 | 113.8 | 26.3 | 0.00002 | 62.2–165.5 | 624 |
| -5 | 88.1 | 8.4 | 1.101 | 94.3 | 26.6 | 0.00038 | 42.3–146.4 | 198 |
| -6 | 94.5 | 12.3 | 1.112 | 82.4 | 24.9 | 0.00093 | 33.6–131.2 | 52 |
| -7 | 85.6 | 16.3 | 0.951 | 73.4 | 27.0 | 0.00657 | 20.5–126.2 | 22 |
| -8 | 108.1 | 19.2 | 0.044 | 87.0 | 38.4 | 0.02351 | 11.7–162.3 | 6 |
‘−log10 SNP P-value’ corresponds to the cutoff used to build the CFI.
‘1SMR estimate’ corresponds to the estimation using one-sample MR.
‘1SMR Std. error’ corresponds to the standard error of the estimation based on one-sample MR.
‘CFI variance explained (%)’ corresponds to the percentage of the variance in maternal pre-pregnancy BMI explained by the CFI.
‘SNPs per split’ corresponds to the average number of SNPs with a P-value below a given threshold after clumping the output of each GWAS of maternal pre-pregnancy BMI.
‘Selected SNPs per fold’ corresponds to the average number of SNPs selected by LASSO to build the instrument in each fold.
Abbreviations: 1SMR, one-sample MR; CFI, Cross-Fitted Instrument; Std. error, standard error; CFMR, Cross-Fitting for Mendelian Randomization; CI, confidence interval.
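As a reading aid for the table, the sketch below shows how a 95% CI and a two-sided P-value follow from an estimate and its standard error. It assumes a normal (Wald) approximation, which the page does not state explicitly, and the helper name `wald_summary` is hypothetical. Under that assumption, the first row (CFMR estimate 101.6, standard error 25.0) is reproduced; other rows may differ slightly because the reported values are rounded.

```python
import math


def wald_summary(estimate, std_error, z_crit=1.959964):
    """95% Wald interval and two-sided normal P-value (assumed method)."""
    lower = estimate - z_crit * std_error
    upper = estimate + z_crit * std_error
    z = estimate / std_error
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return lower, upper, p_value


lower, upper, p_value = wald_summary(101.6, 25.0)
print(round(lower, 1), round(upper, 1), p_value)
```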
Fig 4. Schematic overview of the application of CFMR to a dataset comprising two ethnicities.
In step 1, the two ethnicities are first separated into two distinct datasets, where each dataset contains individuals of the same ethnicity. In step 2, the dataset is split at random for each ethnicity. In step 3, two separate GWASs of the exposure are performed, one in sub-sample 1 and one in sub-sample 2. The predictors of the exposure are subsequently built based on sub-sample 1 (IV1) and sub-sample 2 (IV2). Step 4 refers to the 2SLS using IV1 on sub-sample 2 and IV2 on sub-sample 1; for each dataset, the two resulting 2SLS estimates are averaged. Finally, in step 5, the two ethnicity-specific estimates are meta-analyzed to obtain the final estimate.
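The final meta-analysis step can be illustrated with a fixed-effect, inverse-variance weighted combination of the per-ethnicity estimates. This is an assumption made for illustration: the caption does not say which meta-analysis model is used, and the function name `ivw_meta` and the input numbers are hypothetical.

```python
import math


def ivw_meta(estimates, std_errors):
    """Fixed-effect inverse-variance weighted pooled estimate and its SE."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


# Hypothetical per-ethnicity CFMR estimates and standard errors.
pooled, pooled_se = ivw_meta([100.0, 80.0], [20.0, 40.0])
print(pooled, pooled_se)
```

The more precise estimate receives the larger weight, so the pooled value lies closer to it than a simple average would.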