Survival Bias in Mendelian Randomization Studies: A Threat to Causal Inference.

Roelof A J Smit^1,2, Stella Trompet^1,2, Olaf M Dekkers^3,4,5, J Wouter Jukema^1,6, Saskia le Cessie^4,7.

Abstract

It has been argued that survival bias may distort results in Mendelian randomization studies in older populations. Through simulations of a simple causal structure, we investigate the degree to which instrumental variable (IV)-estimators may become biased in the context of exposures that affect survival. We observed that selecting on survival decreased instrument strength and, for exposures with directionally concordant effects on survival (and outcome), introduced downward bias in the IV-estimator when the exposures reduced the probability of survival until study inclusion. Higher ages at study inclusion generally increased this bias, particularly when the true causal effect was not null. Moreover, the bias in the estimated exposure-outcome relation depended on whether the estimation was conducted in the one- or two-sample setting. Finally, we briefly discuss which statistical approaches might help to alleviate this and other types of selection bias. See video abstract at http://links.lww.com/EDE/B589.


Year: 2019; PMID: 31373921; PMCID: PMC6784762; DOI: 10.1097/EDE.0000000000001072

Source DB: PubMed; Journal: Epidemiology; ISSN: 1044-3983

It has been argued that, in Mendelian randomization studies in older populations, survival bias may distort results, as these populations necessarily consist of the nonrandom subset of the population who have survived long enough to be included.[1,2] We aimed to investigate the impact of survival bias on Mendelian randomization analyses with a continuous outcome through a simulation study. In particular, we will examine whether instrumental variable (IV) estimators become biased within aging populations, for one- or two-sample Mendelian randomization settings. We will also discuss which statistical approaches may help to minimize or address this bias.

METHODS

Suppose we are interested in estimating the causal effect of an exposure X (e.g., cholesterol), proxied by a genetic instrument G, on an outcome Y (e.g., cognitive test performance) in older individuals (Figure 1), where survival until study inclusion (S) is influenced by X. If a second exposure R (e.g., smoking), uncorrelated with X in the general population, also affects S (Figure 1A), conditioning on survival (S = 1) will induce an association between X and R, and therefore also between G and R. Because R also affects Y, this opens an indirect path from G to Y through R.

FIGURE 1.

For two exposures that increase the risk of death, conditioning on survival (S) may induce an association between the previously uncorrelated risk factors X (and its genetic proxy G) and R (panel A). Conditioning on survival may also induce an association between the genetic instrument G and any confounders U of the X–Y association (panel B), even in the absence of risk factor R.

In addition, because X affects survival, conditioning on S implies partial conditioning on X. Therefore, if confounders U (e.g., alcohol intake) of the X–Y association exist, G and U may become correlated (Figure 1B).
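The collider mechanism in Figure 1A can be illustrated with a short simulation. The sketch below uses Python with only the standard library; all parameter values (allele frequency 0.5, prevalence of R 0.3, a G–X effect of 1, and a logistic survival model) are hypothetical choices made to make the induced association visible. They are not the parameters in the Table, and survival is reduced here to a single binary draw rather than an age of death.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)

def collider_demo(n=100_000, seed=1):
    """Return corr(G, R) in the full sample and among survivors (S = 1)."""
    rng = random.Random(seed)
    g_all, r_all, g_surv, r_surv = [], [], [], []
    for _ in range(n):
        g = 1 if rng.random() < 0.5 else 0   # binary instrument
        r = 1 if rng.random() < 0.3 else 0   # second exposure, independent of G
        x = g + rng.gauss(0.0, 1.0)          # exposure proxied by G
        # Hypothetical logistic survival model: higher X and R lower survival.
        p_survive = 1.0 / (1.0 + math.exp(-(1.0 - x - 1.5 * r)))
        g_all.append(g)
        r_all.append(r)
        if rng.random() < p_survive:         # keep survivors only
            g_surv.append(g)
            r_surv.append(r)
    return pearson(g_all, r_all), pearson(g_surv, r_surv)

full, survivors = collider_demo()
print(f"corr(G, R): full sample {full:+.3f}, survivors {survivors:+.3f}")
```

In the full sample G and R are independent by construction, so their correlation should be close to zero. Among survivors it becomes negative: a survivor who carries the risk-raising allele is less likely to also have the other risk factor R.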

Data Generation

All simulation scenarios assume the basic causal structure shown in Figure 1A. The causal associations are chosen such that an increase in the cause leads to an increase in the consequence, except for the effect on survival, where higher exposure values correspond to shorter survival times. In our simulations, we used linear models to generate the exposure and outcome. We assumed a homogeneous treatment effect, meaning that there was no additive effect modification by the confounder, the instrument, or the other exposure.

For each scenario we generated a dataset of 10 million observations with the following randomly generated variables: a binary genetic instrument (G), a continuous exposure (X) influenced by G, a binary exposure (R), a continuous outcome (Y) influenced by R and, with a strength that varied across scenarios, by X, and finally an age of death influenced by both X and R. In secondary analyses, we added a continuous confounder (U) with equal effects on X and Y. We also repeated the simulations with a normally distributed R, and with an interaction between X and R on age of death.[3] Details of data generation and parameter values are presented in the Table, and results of the secondary analyses are presented in eAppendix 1; http://links.lww.com/EDE/B568.

To generate survival times, we used the 2016 mortality data for the United States from the Human Mortality Database.[4] Using the MortalityLaws R package, we estimated the parameters of the Gompertz model (eFigure 1; http://links.lww.com/EDE/B568), which were subsequently used to generate survival times for our simulated population. Effects of both X and R on age of death were modeled as hazard ratios, with higher levels of X and/or R translating into earlier death (on average). We then considered different age boundaries for study inclusion, from 75 to 95 years, thereby steadily decreasing the number of surviving participants (S = 1).
We used R (version 3.4.1) for all data generation and analyses. Annotated code is provided as eAppendix 2; http://links.lww.com/EDE/B569.
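The survival-time step can be sketched with inverse-transform sampling from a Gompertz model with proportional hazards: S(t) = exp(−HR · (a/b) · (e^(bt) − 1)), so solving S(t) = u gives t = ln(1 + b · (−ln u) / (a · HR)) / b. This is an independent Python illustration, not the annotated R code of eAppendix 2. The values of a and b, the G–X effect, the prevalence of R, and the log hazard ratios are hypothetical placeholders rather than the parameters fitted with MortalityLaws, and the function names are illustrative.

```python
import math
import random

# Hypothetical Gompertz baseline hazard h0(t) = A * exp(B * t), t in years.
# The paper instead estimated these from 2016 US mortality data.
A, B = 3e-5, 0.09

def gompertz_survival(t, hazard_ratio=1.0, a=A, b=B):
    """S(t) = exp(-HR * (a / b) * (exp(b * t) - 1)) under proportional hazards."""
    return math.exp(-hazard_ratio * (a / b) * (math.exp(b * t) - 1.0))

def gompertz_age_at_death(u, hazard_ratio=1.0, a=A, b=B):
    """Inverse-transform sample: solve S(t) = u for t, with u in (0, 1]."""
    return math.log(1.0 + b * (-math.log(u)) / (a * hazard_ratio)) / b

def simulate_ages(n, log_hr_x, log_hr_r, seed=2019):
    """Draw (G, X, R, age at death); X and R act through log hazard ratios."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        g = 1 if rng.random() < 0.5 else 0   # hypothetical allele frequency
        x = 0.46 * g + rng.gauss(0.0, 1.0)   # hypothetical G -> X effect
        r = 1 if rng.random() < 0.3 else 0   # hypothetical prevalence of R
        hr = math.exp(log_hr_x * x + log_hr_r * r)
        u = 1.0 - rng.random()               # in (0, 1], so log(u) is finite
        rows.append((g, x, r, gompertz_age_at_death(u, hr)))
    return rows

def alive_at(rows, cutoffs):
    """Number of simulated individuals still alive at each inclusion age."""
    return {c: sum(1 for row in rows if row[3] > c) for c in cutoffs}

rows = simulate_ages(50_000, log_hr_x=0.2, log_hr_r=0.4)
print(alive_at(rows, (75, 80, 85, 90, 95)))
```

Counting individuals whose simulated age at death exceeds each inclusion age mirrors the design in which raising the inclusion age from 75 to 95 years shrinks the surviving sample (S = 1).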

Effects on Instrumental Variable Estimators

Increasingly, summarized data (coefficients and standard errors) from large genome-wide association study consortia are made publicly available, which enables researchers to perform two-sample Mendelian randomization even if their own study does not allow estimation of both coefficients needed to calculate the Wald ratio.[5] These external datasets are more likely to consist mainly of middle-aged participants,[6,7] and are therefore less likely to be affected by survival bias. Therefore, under the assumption of no age-related effect modification, we considered two scenarios: one in which both coefficients are estimated in the same, increasingly selected dataset ("internal" estimation), and one in which the G–X association is taken from an external dataset not selected on survival ("external" estimation, using the fixed value from our total simulated population). We assumed different true effects of X on Y (Table). We calculated confidence intervals for the internally estimated Wald ratio using the SEM R-package.
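The difference between internal and external estimation lies only in where the denominator of the Wald ratio comes from. A minimal Python sketch, assuming simple linear regression slopes for both associations; the function names and toy numbers are hypothetical and are not taken from the paper's code.

```python
def ols_slope(x, y):
    """Slope of a simple linear regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def wald_ratio(beta_gy, beta_gx):
    """IV estimate of the X -> Y effect: (G -> Y slope) / (G -> X slope)."""
    return beta_gy / beta_gx

def internal_and_external(g, x, y, beta_gx_external):
    """Wald ratios from a selected sample (g, x, y).

    internal: both slopes estimated in the selected sample (one-sample setting)
    external: G -> Y from the selected sample, G -> X supplied from an
              unselected source (two-sample setting)
    """
    beta_gy = ols_slope(g, y)
    internal = wald_ratio(beta_gy, ols_slope(g, x))
    external = wald_ratio(beta_gy, beta_gx_external)
    return internal, external

# Toy numbers only, to show the arithmetic.
g = [0, 0, 1, 1]
x = [0.0, 1.0, 1.0, 2.0]
y = [0.0, 2.0, 2.0, 4.0]
print(internal_and_external(g, x, y, beta_gx_external=2.0))  # (2.0, 1.0)
```

Here the selected G–X slope (1.0) is smaller than the external one (2.0), so dividing the same G–Y slope by the external value halves the estimate; the external route mixes slopes from two differently selected populations.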

RESULTS

For our instrument, which explained 5% of the variance in the exposure in the unselected (i.e., entire) sample, R² declined from 4.9% at 75 years to 4.5% at 95 years. The prevalence of G declined from 0.49 at age 75 years to 0.46 at age 95 years. Furthermore, of the population alive at 75 years, 15.6% were still alive at 95 years.
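For a binary instrument G with prevalence p and effect β on X, and residual variance σ² of X, the variance explained is R² = β²p(1 − p) / (β²p(1 − p) + σ²). The sketch below assumes this simple additive model, with a hypothetical β calibrated to R² = 5% at p = 0.5 and σ² = 1. Holding β and σ² fixed, p(1 − p) only falls from 0.2499 at p = 0.49 to 0.2484 at p = 0.46, so in this stylized model a change in allele frequency by itself moves R² very little.

```python
import math

def r2_binary_instrument(beta, p, resid_var):
    """Share of Var(X) explained by a binary G with prevalence p and effect beta."""
    explained = beta ** 2 * p * (1.0 - p)
    return explained / (explained + resid_var)

def beta_for_r2(r2, p, resid_var):
    """Invert r2_binary_instrument for beta > 0."""
    return math.sqrt(r2 * resid_var / ((1.0 - r2) * p * (1.0 - p)))

beta = beta_for_r2(0.05, p=0.5, resid_var=1.0)  # hypothetical calibration
for p in (0.49, 0.46):
    print(p, round(r2_binary_instrument(beta, p, 1.0), 4))
```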

Bias to Instrumental Variable Estimator

The bias in the IV-estimator depended on (1) whether the association between G and X was estimated within the same selected dataset as the association between G and Y, or within an external source not selected on age, and (2) whether the true effect of X on Y was null (Figure 2). In general, higher ages at study inclusion increased the amount of bias. Where the true effect of X on Y was positive, a clear downward bias was seen, underestimating the true effect. Where the true effect of X on Y was null, the resulting association became nominally negative (Figure 2A).

FIGURE 2.

Estimating the causal effect of X on Y. Wald ratios (95% CI) with the G–X association estimated internally (white ribbon) versus externally (gray ribbon), for different true effects of exposure X on outcome Y. Dashed lines denote the true (i.e., unselected) Wald ratio, which equals the true causal effect of X on Y. CI, confidence interval.

When both the numerator (Y ~ G) and the denominator (X ~ G) of the Wald ratio were estimated in the same selected dataset, we observed that they were similarly biased. Taking the ratio therefore seemingly cancels out much of the bias, compared with the situation in which only the numerator is estimated in the selected population. In this latter situation, the relative degree of bias equals that seen for the association measure between G and Y (eFigures 2–3; http://links.lww.com/EDE/B568). The two IV-estimators diverge more as the true effect of X on Y becomes stronger.
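A stylized calculation makes the cancellation argument explicit. Assume, purely for illustration, that selection multiplies both the G–Y and the G–X slope by the same factor k (0 < k < 1), and write the unselected G–Y slope as the product of the G–X slope and the causal effect β_XY:

```latex
\hat\beta_{\mathrm{IV}}^{\mathrm{int}} = \frac{\hat\beta_{GY}^{(S=1)}}{\hat\beta_{GX}^{(S=1)}},
\qquad
\hat\beta_{\mathrm{IV}}^{\mathrm{ext}} = \frac{\hat\beta_{GY}^{(S=1)}}{\beta_{GX}}

% Stylized assumption: equal proportional attenuation by k, with beta_GY = beta_GX * beta_XY
\beta_{GY}^{(S=1)} = k\,\beta_{GX}\,\beta_{XY},
\qquad
\beta_{GX}^{(S=1)} = k\,\beta_{GX}

\Rightarrow\quad
\hat\beta_{\mathrm{IV}}^{\mathrm{int}} \to \beta_{XY},
\qquad
\hat\beta_{\mathrm{IV}}^{\mathrm{ext}} \to k\,\beta_{XY},
\qquad
\hat\beta_{\mathrm{IV}}^{\mathrm{int}} - \hat\beta_{\mathrm{IV}}^{\mathrm{ext}} \to (1-k)\,\beta_{XY}
```

In this idealized case the internal ratio is unbiased, the external ratio carries the same relative bias (k − 1) as the G–Y slope, and the gap between the two grows with β_XY. The simplification ignores the path through R, which is why the simulations can still yield a nonzero estimate when the true effect is null.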

Secondary Analyses

Simulation results for the causal structure depicted under Figure 1B, and for the combination of Figure 1A and B, did not show markedly different results (eFigures 4–6; http://links.lww.com/EDE/B568). For the normally distributed R, we observed similar results, though selection bias partially persisted for the internally estimated IV-estimator (eFigure 3; http://links.lww.com/EDE/B568). Positive interaction between X and R on age of death increased the amount of downward bias. In contrast, sufficiently strong negative interaction led to upward bias (eFigure 7; http://links.lww.com/EDE/B568).

DISCUSSION

We observed that, for selection-related exposures with directionally concordant effects on survival (and outcome), the IV-estimator based on a genetic proxy of that exposure became downwardly biased. In addition, instrument strength, as measured by R², decreased as selection increased. While our simulations specifically examined age-related selection, researchers with data on populations selected on other characteristics (e.g., disease status) will similarly have to consider the possible influence of selection bias in genetic analyses.[8-10] Alternative causal structures that might give rise to selection bias in Mendelian randomization studies have been presented elsewhere.[11] Recent work by Canan et al.[2] suggests that, for the causal structure under investigation in our simulations, selection bias may be corrected via inverse probability weighting. In general, we expect that if selection depends solely on measured variables that are available for the entire original study population (i.e., also for individuals who are not selected into the study sample), and assuming a constant treatment effect, both inverse probability weighting and multiple imputation could be suitable solutions for selection bias. If data are available only for the selected individuals, but a sufficient set of selection-related variables is precisely measured, then including these variables in correctly specified multivariable regression models may resolve the bias. The value of representative cohorts with little selection (e.g., birth cohorts) cannot be overstated in this context,[11,12] though genotyping genetically informative family members may hold promise as well.[13] Alternative strategies have been proposed in the context of hazard models,[14-16] which may fare better when selection depends on (partially) unobserved variables.
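To illustrate the inverse probability weighting idea mentioned above, the sketch below weights each survivor by 1/P(S = 1 | X, R) before computing the Wald ratio. It is a minimal sketch under strong assumptions: the data-generating model and all parameter values are hypothetical, selection depends only on the measured X and R, and the true selection probabilities are used as weights. In practice these probabilities would have to be estimated, which requires data on the unselected individuals as well.

```python
import math
import random

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def weighted_slope(x, y, w):
    """Weighted least-squares slope of y on x (with intercept)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx

def simulate_selected(n=200_000, beta_xy=0.5, seed=7):
    """Hypothetical data: G -> X -> S <- R, with R -> Y. Returns survivors only."""
    rng = random.Random(seed)
    g_s, x_s, y_s, w_s = [], [], [], []
    for _ in range(n):
        g = 1 if rng.random() < 0.5 else 0
        r = 1 if rng.random() < 0.3 else 0
        x = g + rng.gauss(0.0, 1.0)
        y = beta_xy * x + r + rng.gauss(0.0, 1.0)
        p = expit(1.0 - x - 1.5 * r)  # P(S = 1 | X, R); known only in a simulation
        if rng.random() < p:          # only survivors are observed
            g_s.append(g)
            x_s.append(x)
            y_s.append(y)
            w_s.append(1.0 / p)       # inverse probability of selection
    return g_s, x_s, y_s, w_s

g, x, y, w = simulate_selected()
ones = [1.0] * len(g)
naive = weighted_slope(g, y, ones) / weighted_slope(g, x, ones)
ipw = weighted_slope(g, y, w) / weighted_slope(g, x, w)
print(f"survivors: {len(g)}, naive Wald: {naive:.3f}, IPW Wald: {ipw:.3f}")
```

With these settings the unweighted estimate among survivors should fall below the true effect of 0.5, because survivors with the risk-raising allele are depleted of R, while the weighted estimate should lie close to 0.5.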
Methods that use covariate balance to detect dependent censoring in longitudinal studies also exist, though these approaches have not been extended to IV analysis, where bias amplification may occur.[17,18]

In our simulations, we assumed that survival bias would similarly affect different components of the causal structure (e.g., both the numerator and the denominator of the Wald ratio). In addition, we considered only one commonly occurring genetic instrument and uncorrelated exposures with directionally concordant effects on survival (and on the outcome of interest), though R could be considered a combined vector of many possible competing causes of death. Furthermore, we did not consider a binary outcome, to avoid the issue of non-collapsibility, and we restricted our investigations to a linear instrument–exposure association. It will be of interest to conduct more detailed simulations with greater numbers of instruments and exposures to derive bias formulas (as others have done for collider bias in binary variable structures[19]). Of particular interest would be whether sets of polygenic instruments, whose individual metabolic pathways to the intermediate phenotype may differ, are differentially affected by survival bias. Finally, future work should explore the implications of alternative IV assumptions, such as monotonicity.

TABLE.

Parameter Values and Details of Data Generation

REFERENCES (15 in total)

1. Hernán MA, Hernández-Díaz S, Robins JM. A structural approach to selection bias. Epidemiology. 2004 Sep.

2. Chen L, Weinberg CR, Chen J. Using family members to augment genetic case-control studies of a life-threatening disease. Stat Med. 2016 Feb 11.

3. Boef AGC, le Cessie S, Dekkers OM. Mendelian randomization studies in the elderly. Epidemiology. 2015 Mar.

4. Vansteelandt S, Dukes O, Martinussen T. Survivor bias in Mendelian randomization analysis. Biostatistics. 2018 Oct 1.

5. Anderson CD, Nalls MA, Biffi A, Rost NS, Greenberg SM, Singleton AB, et al. The effect of survival bias on case-control genetic association studies of highly lethal diseases. Circ Cardiovasc Genet. 2011 Feb 3.

6. Canan C, Lesko C, Lau B. Instrumental variable analyses and selection bias. Epidemiology. 2017 May.

7. Yaghootkar H, Bancks MP, Jones SE, McDaid A, Beaumont R, Donnelly L, et al. Quantifying the extent to which index event biases influence large genetic association studies. Hum Mol Genet. 2017 Mar 1.

8. Burgess S, Butterworth A, Thompson SG. Mendelian randomization analysis with multiple genetic variants using summarized data. Genet Epidemiol. 2013 Sep 20.

9. Hu YJ, Schmidt AF, Dudbridge F, Holmes MV, Brophy JM, Tragante V, et al. Impact of selection bias on estimation of subsequent event risk. Circ Cardiovasc Genet. 2017 Oct.

10. Willer CJ, Schmidt EM, Sengupta S, Boehnke M, Deloukas P, Kathiresan S, et al. Discovery and refinement of loci associated with lipid levels. Nat Genet. 2013 Oct 6.
