Elizabeth W Diemer1, Jeremy Labrecque2, Henning Tiemeier1,3, Sonja A Swanson2,4. 1. From the Department of Child Psychiatry, Erasmus MC, Rotterdam, The Netherlands. 2. Department of Epidemiology, Erasmus MC, Rotterdam, The Netherlands. 3. Department of Social and Behavioral Sciences, Harvard T.H. Chan School of Public Health, Boston, MA. 4. Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA.
Abstract
BACKGROUND: Investigators often support the validity of Mendelian randomization (MR) studies, an instrumental variable approach proposing genetic variants as instruments, via subject matter knowledge. However, the instrumental variable model implies certain inequalities, offering an empirical method of falsifying (but not verifying) the underlying assumptions. Although these inequalities are said to detect only extreme assumption violations in practice, to our knowledge they have not been used in settings with multiple proposed instruments. METHODS: We applied the instrumental inequalities to an MR analysis of the effect of maternal pregnancy vitamin D on offspring psychiatric outcomes, proposing four independent maternal genetic variants as instruments. We assessed whether the proposed instruments satisfied the instrumental inequalities separately and jointly and explored the instrumental inequalities' properties via simulations. RESULTS: The instrumental inequalities were satisfied (i.e., we did not falsify the MR model) when considering each variant separately. However, the inequalities were violated when considering four variants jointly and for some combinations of two or three variants (two of 36 two-variant combinations and 18 of 24 three-variant combinations). In simulations, the inequalities detected structural biases more often when assessing proposed instruments jointly, although falsification in the absence of structural bias remained rare. CONCLUSIONS: The instrumental inequalities detected violations of the MR assumptions for genetic variants jointly proposed as instruments in our study, although the instrumental inequalities were satisfied when considering each proposed instrument separately. We discuss how investigators can assess instrumental inequalities to eliminate clearly invalid analyses in settings with many proposed instruments and provide appropriate code.
Mendelian randomization (MR), an increasingly popular tool for studying causal effects even when unmeasured confounding appears insurmountable, is a type of instrumental variable (IV) model in which genetic variants are proposed as instruments. Briefly, a valid MR analysis with one genetic variant requires that:
(1) The genetic variant Z is associated with the exposure X.
(2) The genetic variant Z does not affect the outcome Y except through its effect on the exposure X.
(3) Individuals at different levels of the genetic variant Z are exchangeable (i.e., comparable) with regard to their counterfactual outcomes.
Conditions 2 and 3 are unverifiable. Forms of these conditions are necessary but not usually sufficient for all versions of MR analyses: obtaining point estimates of an average causal effect requires additional assumptions,[1] although these three conditions suffice for estimating bounds and for testing the sharp causal null.[2-4]
Frequently, MR analyses propose that multiple single-nucleotide polymorphisms (SNPs) act as instruments and therefore that those SNPs “jointly” satisfy the MR assumptions. Leveraging multiple proposed instruments mitigates issues with power and weak instrument bias that can arise in analyses with a single proposed instrument,[5,6] although investigators are then challenged to support that the MR assumptions are satisfied for each SNP and for all SNPs jointly. Because many genetic loci jointly proposed as instruments are derived from genome-wide association studies and the exact biologic mechanisms are often poorly understood, it is likely that these required assumptions do not hold for many MR analyses. Given this, several recently developed estimators allow for specific relaxations in exchange for additional, different assumptions.[7-12] For example, some approaches require only that a subset of proposed instruments be true instruments.[8,13]
Often missing from the MR literature, however, is any discussion of whether the data are consistent with the proposed MR model.
Over 2 decades ago, Pearl[14] showed that the IV assumptions imply the following inequality for discrete proposed instruments, exposures, and outcomes:
$$\max_{x} \sum_{y} \max_{z} \Pr[Y = y, X = x \mid Z = z] \le 1,$$
which is equivalent to the set of inequalities resulting from
$$\sum_{y} \Pr[Y = y, X = x \mid Z = z_y] \le 1$$
for all exposure levels $x$ and all assignments of an instrument level $z_y$ to each outcome level $y$. For a dichotomous instrument, exposure, and outcome, this gives four non-trivial inequalities, such as $\Pr[Y = 0, X = 0 \mid Z = 0] + \Pr[Y = 1, X = 0 \mid Z = 1] \le 1$.
Later, Bonet[15] proved that the IV model also implies additional constraints, and that such inequalities can be generalized to settings in which the proposed instrument and outcome, but not the exposure, are continuous. Although the additional constraints derived by Bonet[15] are often difficult to state with straightforward equations, he did provide one expression for the case of a trichotomous instrument, dichotomous exposure, and dichotomous outcome:
If the inequalities presented by Pearl[14] and Bonet[15], known as the instrumental inequalities, do not hold, the IV model cannot hold. This means that investigators who have a dataset with measures of the proposed instrument, exposure, and outcome can attempt to falsify the IV model with their data alone: if the instrumental inequalities are not satisfied, the data tell us that one or more of our assumptions are not satisfied. Recognizing the importance of falsification strategies (when available) for causal inference, multiple reporting guidelines recommend assessing the instrumental inequalities in all IV analyses.[16-18] Despite this, few MR analyses use them, perhaps because, for dichotomous proposed instruments, it has been suggested that only extreme assumption violations will be detected in practice.[17,18] No study has applied the instrumental inequalities to investigate the validity of multiple genetic loci jointly proposed as instruments. Here, we aim to explore the utility of the instrumental inequalities in identifying violations of the assumptions required for MR with multiple proposed instruments in real and simulated data and to provide adaptable software for the implementation and visualization of the instrumental inequalities.
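As a hedged illustration, Pearl's inequality can be checked in a sample by plugging observed proportions into the conditional probabilities. The Python sketch below is not the authors' R code from the eAppendix; the function name `pearl_max` is hypothetical. It returns the left-hand side, maximized over exposure levels, so a value above 1 falsifies the IV model in that sample. It ignores sampling variability and does not implement Bonet's additional constraints.

```python
from collections import Counter


def pearl_max(z, x, y):
    """Plug-in estimate of max_x sum_y max_z Pr[Y=y, X=x | Z=z].

    z, x, y: equal-length sequences of discrete codes (z may hold tuples of
    genotypes when several SNPs are proposed jointly). A value above 1
    falsifies the IV model in this sample; a value of 1 or less does not verify it.
    """
    if not (len(z) == len(x) == len(y)) or len(z) == 0:
        raise ValueError("z, x, and y must be non-empty and of equal length")
    n_z = Counter(z)               # number of individuals in each instrument stratum
    n_zxy = Counter(zip(z, x, y))  # joint cell counts; unseen cells count as 0
    best = 0.0
    for xv in set(x):
        # For this exposure level, add up, over outcome levels, the largest
        # conditional joint probability found in any instrument stratum.
        total = sum(max(n_zxy[(zv, xv, yv)] / n_z[zv] for zv in n_z) for yv in set(y))
        best = max(best, total)
    return best


# Toy data: Z=0 individuals all have (X=0, Y=0) and Z=1 individuals all have
# (X=0, Y=1), so Z appears to change Y without changing X.
z = [0, 0, 1, 1]
x = [0, 0, 0, 0]
y = [0, 0, 1, 1]
print(pearl_max(z, x, y))  # 2.0, which exceeds 1: the IV model is falsified here
```

With a trichotomous SNP or a joint genotype variable as `z`, the same function evaluates the inequality over all instrument levels.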
We begin by describing how to interpret the results of the instrumental inequalities when applied to a specific MR model and dataset.
INTERPRETATION OF THE INSTRUMENTAL INEQUALITIES
Because such falsification tests are relatively uncommon, let us begin by considering for illustrative purposes a scenario in which we believe that the two causal diagrams in Figure 1 are the only possible relationships between a particular SNP, exposure, and outcome. If the instrumental inequalities failed to hold, Figure 1A could not be true, meaning that Figure 1B must be true and the SNP has a direct effect on the outcome. However, if the instrumental inequalities hold, the data are consistent with the SNP having a direct effect or having no direct effect on the outcome, as we have failed to falsify Figure 1A.
FIGURE 1.
Causal diagrams representing a Mendelian randomization (MR) study with one genetic variant, Z, proposed as an instrument for the effect of X on Y. In A, Z is a valid instrument. In B, the MR assumptions are violated by a direct effect of Z on Y.
The same logic applies when multiple SNPs are believed to be instruments. Figure 2 presents a causal diagram in which four independent SNPs are valid instruments both individually and as a single joint variable. When multiple SNPs are available, MR analyses using different subsets of SNPs, and thus slightly different assumptions, can be proposed. As such, the instrumental inequalities can be applied to each SNP individually, to any combination of two, three, or four of the SNPs, or to a summary score derived from these SNPs (e.g., an allele score) to evaluate the validity of each subset as a (jointly) proposed instrument. For example, one could propose all four SNPs jointly as instruments by combining them into an 81-level (3^4) variable, in which each level represents a different combination of genotypes for the four SNPs. Violations of the instrumental inequalities when proposing this combined variable as an instrument provide evidence against the causal diagram in Figure 2. Likewise, violations of the instrumental inequalities when considering any SNP individually or any subset of SNPs would also provide evidence against this particular causal diagram.
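The combined variable described above can be built by reading each person's trichotomous genotypes (0, 1, or 2 risk alleles) as the digits of a base-3 number. The sketch below assumes that coding; `joint_instrument` is a hypothetical helper, not a function from the eAppendix.

```python
from itertools import product


def joint_instrument(*snps):
    """Combine trichotomous SNP codes (0, 1, 2) into one joint instrument.

    Each individual's genotype combination is mapped to a single integer in
    0 .. 3**k - 1 by treating the k genotypes as base-3 digits.
    """
    if not snps:
        raise ValueError("at least one SNP is required")
    n = len(snps[0])
    if any(len(s) != n for s in snps):
        raise ValueError("all SNP vectors must have the same length")
    codes = []
    for genotypes in zip(*snps):
        code = 0
        for g in genotypes:
            if g not in (0, 1, 2):
                raise ValueError("SNPs must be coded 0, 1, or 2")
            code = code * 3 + g  # shift previous digits and append this genotype
        codes.append(code)
    return codes


# Every possible genotype combination for four SNPs, one column per SNP.
all_combos = list(product((0, 1, 2), repeat=4))
columns = [[combo[j] for combo in all_combos] for j in range(4)]
levels = joint_instrument(*columns)
print(len(set(levels)))  # 81 distinct levels, i.e., 3**4
```

An integer code is convenient for tabulation; a tuple of genotypes would serve equally well as a stratum label.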
FIGURE 2.
A causal diagram representing a Mendelian randomization study with four independent genetic variants, Z1, Z2, Z3, and Z4, proposed as instruments for the effect of X on Y. Here, all four genetic variants are valid instruments individually and jointly.
It is possible to apply the instrumental inequalities directly to an allele score. Violations of the instrumental inequalities when proposing this allele score as an instrument could also provide evidence against the causal diagram in Figure 2. However, allele scores imply additional linearity and additivity assumptions, which are not required for MR or for the instrumental inequalities, and may result in a loss of power,[6] although this approach may be useful to investigators considering using the allele score in their particular MR analysis.
Importantly, the instrumental inequalities do not actually require us to specify an alternative causal diagram as we did in Figure 1. The instrumental inequalities can only show us that a proposed MR model is false. In fact, without additional assumptions, the instrumental inequalities give no evidence as to “how” the MR assumptions are violated, only that the MR model cannot be true in the dataset.
In practice, the usefulness of the instrumental inequalities for evaluating many proposed instruments may be limited by sample size. As the number of SNPs jointly proposed as instruments increases, the number of individuals within a given stratum of the proposed joint instrument becomes increasingly small, and it becomes more likely that the instrumental inequalities will fail to hold by random chance. The concept of random violations of MR assumptions is similar to that of “random confounding”[1,19]: in randomized trials, although randomization implies that we expect covariates to be balanced across trial arms on average, it does not guarantee balance within a particular study.
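To make the point about shrinking strata concrete, the sketch below computes the expected number of individuals in each joint-genotype stratum. It assumes, purely for illustration, Hardy-Weinberg equilibrium, independent SNPs, a risk-allele frequency of 0.3 for every SNP, and 2,000 individuals; these are hypothetical values, not Generation R estimates.

```python
from itertools import product


def expected_stratum_counts(n, allele_freqs):
    """Expected number of individuals per joint-genotype stratum.

    Hypothetical assumptions: Hardy-Weinberg equilibrium for each SNP and
    independence between SNPs. allele_freqs holds risk-allele frequencies.
    """
    per_snp = []
    for p in allele_freqs:
        q = 1.0 - p
        per_snp.append((q * q, 2 * p * q, p * p))  # Pr[0], Pr[1], Pr[2] risk alleles
    counts = {}
    for genotype in product((0, 1, 2), repeat=len(allele_freqs)):
        prob = 1.0
        for snp_probs, g in zip(per_snp, genotype):
            prob *= snp_probs[g]
        counts[genotype] = n * prob
    return counts


counts = expected_stratum_counts(2000, [0.3] * 4)
print(len(counts))                           # 81 strata
print(round(min(counts.values()), 2))        # 0.13: rarest stratum (2 risk alleles at every SNP)
print(sum(c < 10 for c in counts.values()))  # 33 strata expected to hold fewer than 10 people
```

Under these assumptions, any stratum with two or more homozygous risk genotypes is expected to hold fewer than 10 people, which is where chance departures from the super-population distribution become likely.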
If there are imbalances in the distribution of a risk factor for the outcome in a study, adjustment for the imbalanced risk factor is recommended to produce unbiased causal effect estimates. Analogously, even if the MR assumptions for a proposed joint instrument are met in a theoretical super-population, the distribution of the proposed instrument, exposure, and outcome within a particular sample might deviate substantially from the expected distribution in the super-population, especially in small samples, which are more prone to notable deviations from what is expected. As a result, the MR assumptions, and thus the instrumental inequalities, could fail to hold by chance. As in a randomized trial with “random confounding,” an MR analysis in a sample where the assumptions were violated by chance is expected to produce biased estimates of causal effects. Thus, any evidence of a violation of the MR assumptions should be considered important evidence about the validity of an MR analysis for that specific dataset. It remains important to understand the impact of sample size on the ability to detect structural violations of the MR assumptions, as it would otherwise remain unclear whether a violation found in one dataset provides evidence against a similar MR model in another dataset.
The application of the instrumental inequalities to multiple proposed instruments allows for many layers of falsification strategies: we can attempt to falsify the model for any proposed instrument individually, any combination of proposed instruments jointly, and any summary score.
A potential advantage of applying the instrumental inequalities to each of these is that they might be used to identify subsets of SNPs for which the MR assumptions definitely do not hold, and subsets of SNPs for which an MR analysis could be pursued with caution.
In the next section, we explore this possibility in a study of the effects of maternal prenatal vitamin D levels on childhood behavioral health outcomes and introduce a new visualization for the instrumental inequalities. We follow this application with a simulation study to better understand the impact of sample size on the instrumental inequalities. All analyses were conducted in R 3.4.1 (www.r-project.org, R Core Development Team). We provide adaptable R functions, available in the eAppendix; http://links.lww.com/EDE/B605, that allow the user to calculate the instrumental inequalities for multiple proposed instruments and display the results in a novel graph format.
DATA EXAMPLE: ESTIMATING THE EFFECTS OF MATERNAL PREGNANCY VITAMIN D ON CHILDHOOD BEHAVIORAL HEALTH OUTCOMES IN GENERATION R
Study Population
Generation R is a population-based cohort from fetal life to young adulthood, based in Rotterdam, the Netherlands. Mothers with a delivery date between April 2002 and January 2006 who lived in the study area were eligible for participation. Further information about the study is available elsewhere.[20] In total, 8,880 mothers were enrolled during pregnancy. To avoid overt violation of the MR assumptions by population stratification or relatedness, we restricted our analysis to the 3,188 mother-child pairs in which mothers were of self-reported Dutch ancestry and the child was the mother's first offspring included in the cohort. For each MR model investigated, analysis was restricted to individuals with complete data on exposure, outcome, and all proposed instruments, resulting in analytic samples of 1,970 (pervasive developmental problems [PDP]), 1,971 (mother-reported attention deficit hyperactivity disorder [ADHD] symptoms), and 1,146 (teacher-reported ADHD symptoms) (see eTable 1; http://links.lww.com/EDE/B605 for descriptive statistics). This complete-case approach aligns with common practice in MR analyses, but it can violate the MR assumptions (and may in fact be the reason for violations of the instrumental inequalities in these samples).[21,22] Future studies might mitigate this issue by applying the instrumental inequalities and fitting MR models in samples weighted by the inverse probability of selection.[21] The study was approved by the Medical Ethics Committee of Erasmus Medical Center and was conducted in accordance with the World Medical Association Declaration of Helsinki.
Proposed Instruments
Maternal genotyping was performed using the TaqMan allelic discrimination assay (Applied Biosystems, Foster City, CA), with an error rate of less than 1% confirmed in a random subsample (n = 276).[23] Based on existing literature, we proposed four independent maternal SNPs (rs2282679, rs12785878, rs6013897, rs10741657) as instruments. These SNPs have been associated with serum vitamin D at genome-wide significance in a sample of 42,274 individuals[24] and are often used in MR studies of vitamin D.[25-27] For all models, we coded SNPs trichotomously, based on the presence of 0, 1, or 2 risk alleles.
Exposure
Pregnancy serum vitamin D status was defined using the storage form of vitamin D, total 25OHD, measured in venous blood drawn between 18.1 and 24.9 weeks of gestation.[28] We defined exposure both dichotomously and trichotomously, based on established clinical cutoffs at which treatment for vitamin D is recommended.[29-32] Total serum 25OHD was dichotomized at 75 nmol/L (sufficiency) and trichotomized as deficiency (0–50 nmol/L), insufficiency (50–74.99 nmol/L), and sufficiency (≥ 75 nmol/L). Although these categorizations imply strong assumptions about a step-function relationship between vitamin D and offspring behavioral health, modeling vitamin D continuously in MR typically makes a similarly strong and potentially inaccurate assumption of a linear relationship.
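A minimal sketch of the two exposure codings follows. Because the text gives both deficiency and insufficiency as including 50 nmol/L, the sketch assumes that a value of exactly 50 nmol/L counts as insufficient; `code_vitamin_d` is a hypothetical name.

```python
def code_vitamin_d(nmol_per_l, levels=2):
    """Categorize total serum 25OHD (nmol/L) with the clinical cutoffs in the text.

    levels=2: 0 = below 75 nmol/L, 1 = sufficient (>= 75 nmol/L)
    levels=3: 0 = deficient (< 50), 1 = insufficient (50 to < 75), 2 = sufficient (>= 75)
    Assumption: exactly 50 nmol/L is treated as insufficient.
    """
    if nmol_per_l < 0:
        raise ValueError("25OHD concentration cannot be negative")
    if levels == 2:
        return int(nmol_per_l >= 75)
    if levels == 3:
        if nmol_per_l < 50:
            return 0
        return 1 if nmol_per_l < 75 else 2
    raise ValueError("levels must be 2 or 3")


# Both codings share the 75 nmol/L cutoff, so the three categories nest within the two.
values = [32.0, 50.0, 61.5, 74.99, 75.0, 88.0]
print([code_vitamin_d(v, 2) for v in values])  # [0, 0, 0, 0, 1, 1]
print([code_vitamin_d(v, 3) for v in values])  # [0, 1, 1, 1, 2, 2]
```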
Outcomes
Mother-reported pervasive developmental problems (PDP) and ADHD symptoms at age 5 years were assessed with the Pervasive Developmental Problems and the Attention Deficit-Hyperactivity subscales, respectively, of the Dutch translation of the Child Behavior Checklist.[33,34] The former subscale has been used as a screening tool to identify children with autism spectrum disorder,[35] while the latter has shown good convergent validity with clinician ratings.[36,37] We used the 98th percentile of each subscale’s T-scores (PDP: T ≥ 8.98; ADHD: T ≥ 9) as cutoffs to classify children with mother-reported PDP and ADHD symptoms in the clinical range. Teacher-reported ADHD symptoms at age 7 were defined as a T-score at or above the 98th percentile on the Teacher Report Form Attention Problems subscale (T ≥ 15).[38-40]
Analysis
We assessed whether the instrumental inequalities would identify violations of MR models for the causal effect of maternal serum vitamin D during pregnancy on offspring PDP and ADHD symptoms, using the four SNPs described above as proposed instruments. For each possible combination of SNPs, we applied the instrumental inequalities to MR models for the causal effect of maternal vitamin D on each outcome. We then extracted the maximum value of the instrumental inequalities, along with the number of strata of the proposed instrument containing exactly 0 or fewer than 10 individuals. For binary exposure models, we also applied the Bonet[15] inequality for trichotomous instruments to each SNP marginally. Although in any plausible scenario in which an allele score satisfies the MR assumptions, each contributing SNP would also satisfy those assumptions individually and jointly,[5] we also applied the instrumental inequalities to MR models with a categorical, unweighted allele score proposed as an instrument.
Although the instrumental inequalities cannot be applied to continuous exposures, evaluating models based on categorized exposures could still be informative. However, the MR assumptions can be violated if the exposure is inappropriately categorized,[41] implying that the instrumental inequalities might be detecting this mismeasurement rather than another MR assumption violation. If that were the case, we might expect the instrumental inequalities to be violated less often as the number of exposure categories increases, although evaluating this property might require prohibitively large samples. To see whether the coding of the exposure variable altered the conclusions, we evaluated the instrumental inequalities using the dichotomous and trichotomous exposure definitions described above.
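A minimal sketch of this analysis loop is given below, assuming genotypes coded 0/1/2 and discrete exposure and outcome codes. All names are hypothetical and this is not the authors' R implementation; it is self-contained, with its own compact plug-in estimate of Pearl's inequality, and it does not implement Bonet's trichotomous-instrument inequality or the allele score. Each non-empty subset of SNPs is proposed jointly via genotype tuples (equivalent to a 3^k-level variable), and the sketch reports the maximum value of the inequality, the number of empty strata, and the number of non-empty strata with fewer than 10 individuals.

```python
import random
from collections import Counter
from itertools import combinations, product


def pearl_max(z, x, y):
    """Plug-in estimate of max_x sum_y max_z Pr[Y=y, X=x | Z=z]; values > 1 falsify the IV model."""
    n_z = Counter(z)
    n_zxy = Counter(zip(z, x, y))
    return max(
        sum(max(n_zxy[(zv, xv, yv)] / n_z[zv] for zv in n_z) for yv in set(y))
        for xv in set(x)
    )


def summarize_combinations(snps, x, y, small=10):
    """Propose every non-empty subset of SNPs jointly as the instrument.

    snps: dict mapping SNP name -> list of genotype codes (0, 1, 2).
    Returns (subset, max inequality value, n empty strata, n non-empty strata < small).
    """
    names = sorted(snps)
    rows = []
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            z = list(zip(*(snps[s] for s in subset)))     # joint genotype tuple per person
            sizes = Counter(z)
            possible = list(product((0, 1, 2), repeat=k))  # all 3**k joint levels
            n_empty = sum(sizes[level] == 0 for level in possible)
            n_small = sum(0 < sizes[level] < small for level in possible)
            rows.append((subset, pearl_max(z, x, y), n_empty, n_small))
    return rows


# Toy data with no real structure (independent random codes), for illustration only.
random.seed(1)
n = 500
snps = {f"snp{j}": [random.choice((0, 1, 2)) for _ in range(n)] for j in range(1, 5)}
x = [random.randint(0, 1) for _ in range(n)]
y = [random.randint(0, 1) for _ in range(n)]
for subset, value, n_empty, n_small in summarize_combinations(snps, x, y):
    print(subset, round(value, 3), n_empty, n_small, "violated" if value > 1 else "not falsified")
```

With four SNPs this yields 15 proposed instruments (4 single SNPs, 6 pairs, 4 triplets, and 1 quadruplet); a trichotomous exposure can be passed as `x` without changes.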
Results
For all definitions of exposures and outcomes, the instrumental inequalities, including the stronger inequalities developed by Bonet[15], held for each SNP individually, indicating that there was no evidence in the data alone against each specific proposed instrument being valid. However, as the number of SNPs jointly proposed as instruments increased, the instrumental inequalities increasingly failed to hold (Figure 3).
FIGURE 3.
In these visualizations, each horizontal line represents a single nucleotide polymorphism (SNP), and each vertical line connects a set of SNPs proposed as instruments (with the number of included SNPs increasing from left to right). The color of each node represents the maximum value of the instrumental inequalities, with white indicating a value less than one and darker colors indicating larger values that represent violations (see Legend). See eAppendix; http://links.lww.com/EDE/B605 for further details of visualization technique.
When the instrumental inequalities were applied to MR models for the causal effect of maternal vitamin D coded dichotomously on mother-reported PDP symptoms, the instrumental inequalities failed to hold for half of the combinations of three SNPs jointly proposed as instruments and for the combination of all four SNPs (Tables 1–3). When applied to MR models for the causal effect of maternal vitamin D on mother-reported ADHD symptoms, the instrumental inequalities failed to hold for all three-SNP and four-SNP combinations, as well as for the allele score. For teacher-reported ADHD symptoms, the instrumental inequalities failed to hold for the allele score, all three-SNP and four-SNP combinations, and one two-SNP combination.
TABLE 1.
Summary of Instrumental Inequalities for Studying the Effect of Maternal Vitamin D on Mother-reported Pervasive Developmental Problems Symptoms With Varying Combinations of Proposed Instruments and Definitions of Exposure
TABLE 2.
Summary of Instrumental Inequalities for Studying the Effect of Maternal Vitamin D on Mother-reported Attention Deficit Hyperactivity Disorder Symptoms With Varying Combinations of Proposed Instruments and Definitions of Exposure
TABLE 3.
Summary of Instrumental Inequalities for Studying the Effect of Maternal Vitamin D on Teacher-reported Attention Deficit Hyperactivity Disorder Symptoms With Varying Combinations of Proposed Instruments and Definitions of Exposure
When we coded maternal vitamin D trichotomously, the maximum value of the instrumental inequalities for each possible combination of SNPs proposed as instruments was less than or equal to the maximum value of the inequalities in models with a dichotomized measure of maternal vitamin D. For some models, the instrumental inequalities held in the trichotomous exposure case but not in the dichotomous exposure case, including two settings in which the allele score was the proposed instrument.
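For Pearl's inequality, this ordering can be explained by a short derivation, assuming (as here) that the trichotomous categories nest within the dichotomous ones because both codings share the 75 nmol/L cutoff. Write $X^{(3)} \in \{0, 1, 2\}$ for deficiency, insufficiency, and sufficiency, and let $X^{(2)} = 1$ if $X^{(3)} = 2$ and $X^{(2)} = 0$ otherwise. The sketch covers only Pearl's form, not Bonet's inequality.

```latex
\begin{align*}
\sum_{y} \max_{z} \Pr[Y = y, X^{(2)} = 0 \mid Z = z]
  &= \sum_{y} \max_{z} \bigl( \Pr[Y = y, X^{(3)} = 0 \mid Z = z] + \Pr[Y = y, X^{(3)} = 1 \mid Z = z] \bigr) \\
  &\ge \sum_{y} \max_{z} \Pr[Y = y, X^{(3)} = j \mid Z = z], \qquad j \in \{0, 1\}, \\
\sum_{y} \max_{z} \Pr[Y = y, X^{(2)} = 1 \mid Z = z]
  &= \sum_{y} \max_{z} \Pr[Y = y, X^{(3)} = 2 \mid Z = z],
\end{align*}
\[
\text{hence}\quad
\max_{x} \sum_{y} \max_{z} \Pr[Y = y, X^{(2)} = x \mid Z = z]
  \;\ge\; \max_{x} \sum_{y} \max_{z} \Pr[Y = y, X^{(3)} = x \mid Z = z].
\]
```

The inequality step uses only the non-negativity of each probability, so the argument applies equally to sample proportions: with nested categories, merging exposure levels cannot decrease the maximum value of Pearl's inequality.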
SIMULATION STUDY
Methods
We simulated four independent binary genetic variants, Z1–Z4, with causal effects on the exposure X. Although Z2, Z3, and Z4 were true causal instruments, Z1 also had a direct causal effect on the outcome Y, thereby violating the MR assumptions. We then applied the instrumental inequalities in scenarios with varying sample sizes (n = 1,000; 10,000; 100,000), proposed instrument strengths, and strengths of the direct effect of Z1 on Y. R code for the simulations and details of simulated parameters are available in the eAppendix; http://links.lww.com/EDE/B605.
Results
The instrumental inequalities were increasingly violated for combinations of proposed instruments that included Z1 as the strength of the violation and the number of proposed instruments in a combination increased (Figure 4). When the violation was relatively weak, the instrumental inequalities were more often violated for combinations including Z1 in the smaller (n = 1,000) samples.
FIGURE 4.
Results of six simulations with four dichotomous proposed instruments Z1, Z2, Z3, and Z4, a dichotomous exposure X, a dichotomous outcome Y, and a continuous exposure-outcome confounder U. For each setting, we simulated 1,000 samples such that Z1i ~ Bernoulli(0.5), Z2i ~ Bernoulli(0.5), Z3i ~ Bernoulli(0.5), Z4i ~ Bernoulli(0.5), Ui ~ Normal(0, 1), and Xi ~ Bernoulli(expit[0.6 + 0.1 × Ui + 0.1 × Z1i + 0.1 × Z2i + 0.1 × Z3i + 0.1 × Z4i]). We varied sample sizes (n = 1,000; 10,000; 100,000) across simulations. In each of the six depicted simulations, Z1 violated the Mendelian randomization (MR) conditions, with Yi ~ Bernoulli(expit[0.02 + 0.1 × Ui + β × Z1i]). Thus, each simulation represents a setting in which one of the four proposed instruments violates the MR assumptions, with differing degrees of violation and differing sample sizes. In these visualizations, each horizontal line represents a genetic variant, and each vertical line connects a set of genetic variants proposed as instruments (with the number of included genetic variants increasing from left to right). The color of each node represents the number of simulated samples, out of 1,000 for each setting, in which the instrumental inequalities were violated. This differs from Figure 3, where the color of each node represents the maximum value of the instrumental inequalities for a set of genetic variants jointly proposed as instruments within a single dataset. See eAppendix; http://links.lww.com/EDE/B605 for further details of visualization technique.
In samples of 100,000 individuals, the instrumental inequalities were never violated for combinations not including Z1, regardless of instrument strength or strength of violation (eAppendix; http://links.lww.com/EDE/B605). In simulated samples of 10,000 and 1,000 individuals, the instrumental inequalities were occasionally violated for some combinations not including Z1 (i.e., combinations for which no structural bias was present), although this occurred in less than 1% of simulations for each true instrument considered marginally (eAppendix; http://links.lww.com/EDE/B605). Such violations were especially likely when the three valid instruments were considered jointly in the smallest sample size with the strongest proposed instrument strength simulated, a setting in which the inequalities were violated 90% of the time. In all cases in which the inequalities were violated for a combination that did not include Z1, the instrumental inequalities were also violated for combinations including Z1. When we proposed Z1–Z4 jointly as instruments in these settings, the instrumental inequalities were violated in more than 95% of simulations.
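The data-generating process in the Figure 4 caption can be sketched as follows, as a rough Python analogue rather than the authors' R simulation code. Assumptions to note: the caption's "β × Z" term is read as β × Z1, only Pearl's inequality is checked, and the illustrative settings (n = 1,000, β = 2, 20 replicate samples) are hypothetical rather than the paper's 1,000 replicates per setting. As written in the caption, Y depends on U and Z1 but not on X.

```python
import math
import random
from collections import Counter


def expit(v):
    return 1.0 / (1.0 + math.exp(-v))


def simulate(n, beta, rng):
    """Draw one sample following the Figure 4 caption (beta: direct effect of Z1 on Y)."""
    rows = []
    for _ in range(n):
        z = tuple(int(rng.random() < 0.5) for _ in range(4))  # Z1..Z4 ~ Bernoulli(0.5)
        u = rng.gauss(0.0, 1.0)                                # confounder U ~ N(0, 1)
        x = int(rng.random() < expit(0.6 + 0.1 * u + 0.1 * sum(z)))
        # Y depends on U and directly on Z1 only, as in the caption.
        y = int(rng.random() < expit(0.02 + 0.1 * u + beta * z[0]))
        rows.append((z, x, y))
    return rows


def pearl_max(z, x, y):
    """Plug-in estimate of max_x sum_y max_z Pr[Y=y, X=x | Z=z]."""
    n_z = Counter(z)
    n_zxy = Counter(zip(z, x, y))
    return max(
        sum(max(n_zxy[(zv, xv, yv)] / n_z[zv] for zv in n_z) for yv in set(y))
        for xv in set(x)
    )


def violation_rate(n, beta, subset, reps, seed=0):
    """Share of simulated samples in which the inequality fails for the SNPs in subset.

    subset holds 0-based indices (0 = Z1, ..., 3 = Z4) proposed jointly as the instrument.
    Reusing the seed means different subsets are compared on identical samples.
    """
    rng = random.Random(seed)
    violations = 0
    for _ in range(reps):
        rows = simulate(n, beta, rng)
        z = [tuple(zs[j] for j in subset) for zs, _, _ in rows]
        x = [xv for _, xv, _ in rows]
        y = [yv for _, _, yv in rows]
        violations += pearl_max(z, x, y) > 1
    return violations / reps


for subset in [(0,), (1,), (1, 2, 3), (0, 1, 2, 3)]:
    print(subset, violation_rate(1000, 2.0, subset, reps=20))
```

Comparing subsets with and without index 0 (Z1) mirrors the contrast between structurally biased and unbiased combinations described above.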
DISCUSSION
Our results indicate that, for studies of the causal effect of maternal pregnancy vitamin D on offspring PDP and ADHD within Generation R, there are clear violations of the MR assumptions when proposing four SNPs (rs2282679, rs12785878, rs6013897, rs10741657) jointly as instruments, as well as for several combinations of three of the four SNPs. We did not detect violations of the MR assumptions when each SNP was proposed as an instrument marginally, or for most combinations of two of the four SNPs. The results of our simulations suggest that the instrumental inequalities are increasingly violated as the magnitude of the violation of the MR assumptions grows, that they are more sensitive to violations of the MR assumptions when multiple instruments are proposed jointly, and that small sample sizes appear to increase the probability of detecting a true structural violation with limited risk of incorrectly detecting a structural violation when none exists.
Because a violation of the instrumental inequalities for any of the sets of SNPs proposed as instruments indicates that the four SNPs are not jointly valid instruments, our results clearly demonstrate that certain MR analyses would be biased if conducted in our dataset. Moreover, for teacher-reported and mother-reported ADHD using a dichotomous exposure, the MR assumptions fail to hold for every possible combination of three of the four SNPs proposed jointly as instruments, which, for independent SNPs, logically implies that the MR assumptions cannot hold for at least two of the included SNPs individually (if only one SNP were invalid, the remaining three would be jointly valid and could not violate the inequalities). Altogether, our results suggest that MR analyses requiring that all four SNPs jointly be instruments (e.g., analyses proposing an allele score) are inappropriate in our dataset, and that MR analyses requiring only that a subset of SNPs be instruments (e.g., the median-based approach[8]) should be pursued with extreme caution.
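The logical step for the ADHD results can be checked by brute force: find the smallest sets of SNPs whose invalidity would explain every violated combination (a minimum hitting set). The sketch assumes, as the text does for independent SNPs, that a combination of only valid instruments cannot violate the inequalities; a combination that satisfies them adds no constraint, because the inequalities falsify but do not verify. `min_invalid_sets` is a hypothetical helper.

```python
from itertools import combinations


def min_invalid_sets(snps, violated_subsets):
    """Smallest sets of SNPs whose invalidity could explain every observed violation.

    Hypothetical helper. Assumes a combination made up only of valid instruments
    cannot violate the inequalities, so each violated combination must contain
    at least one invalid SNP. Returns (size, list of minimal candidate sets).
    """
    violated = [set(v) for v in violated_subsets]
    for k in range(len(snps) + 1):
        candidates = [set(c) for c in combinations(snps, k)
                      if all(v & set(c) for v in violated)]
        if candidates:
            return k, candidates
    raise ValueError("every violated combination must use SNPs from `snps`")


snps = ["rs2282679", "rs12785878", "rs6013897", "rs10741657"]
# Pattern described for ADHD with a dichotomous exposure: every three-SNP combination violated.
size, candidates = min_invalid_sets(snps, combinations(snps, 3))
print(size)             # 2
print(len(candidates))  # 6: every pair of SNPs is consistent with this pattern
```

Because all six pairs remain candidates, this pattern alone cannot single out which SNPs are invalid.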
In our dataset, we found no pattern suggestive of a specific problematic SNP, so the inequalities were not helpful in pruning clearly invalid instruments. On the other hand, our simulations suggest that a pattern consistent with one “bad apple” can be detected and may aid in pruning clearly invalid instruments: investigators might consider removing the offending SNP from their proposed instrument set and continuing with an MR analysis. Investigators could also consider MR estimators that allow all proposed instruments to be invalid in specific ways, although these methods require alternative assumptions beyond those considered here,[7,10] and the results of the instrumental inequalities would be informative only if coupled with a strong biologic rationale for these alternative assumptions. Finally, two points on interpretation are worth reiterating. First, the instrumental inequalities can falsify but not verify the MR model. Thus, even if an application of the inequalities detects no violation, the MR analysis may still be biased. Investigators should still weigh subject matter knowledge, perform other falsification strategies and sensitivity analyses, and choose an appropriate method if they decide to pursue an MR analysis, as outlined in prior guidelines.[17] Our simulations underscore this point: a structural bias was always present, yet it remained undetected in several simulated samples. Second, the instrumental inequalities are a falsification strategy for the core MR assumptions but do not assess the additional point-identifying assumptions.[18] Finding that the instrumental inequalities are not satisfied, however, does not tell us why they are not satisfied.
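One simple way to screen for the “bad apple” pattern is to ask which SNPs appear in every violated combination. The sketch below does this with a set intersection; the patterns it is run on are hypothetical, and a single shared SNP is only a candidate for pruning, since satisfied combinations never verify validity.

```python
from itertools import combinations


def bad_apple_candidates(results):
    """SNPs present in every violated combination.

    results: {combination (tuple of SNP names): violated?}.
    Returns an empty set when nothing is violated or no SNP is shared.
    """
    violated = [set(combo) for combo, bad in results.items() if bad]
    if not violated:
        return set()
    return set.intersection(*violated)


snps = ["Z1", "Z2", "Z3", "Z4"]
all_combos = [c for k in range(1, 5) for c in combinations(snps, k)]

# Hypothetical pattern like the simulations: only combinations containing Z1 fail.
one_bad = {c: "Z1" in c and len(c) > 1 for c in all_combos}
# Hypothetical diffuse pattern: every three- and four-SNP set fails.
diffuse = {c: len(c) >= 3 for c in all_combos}

print(bad_apple_candidates(one_bad))  # {'Z1'}
print(bad_apple_candidates(diffuse))  # set()
```

The diffuse case resembles what we observed for ADHD with a dichotomous exposure: no SNP is common to all violated sets, so the check offers no single SNP to remove.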
In our data example, there are several structural reasons why the MR assumptions could be violated, some of which are depicted in Figure 5 and described in the eAppendix; http://links.lww.com/EDE/B605.[7,8,11,41-43] It is also possible that the falsification of the MR model indicated by our findings is specific to our dataset, which motivated our simulations. As previously discussed, as sample size decreases and the number of proposed instruments increases, the MR assumptions, and thus the instrumental inequalities, can be more readily violated by chance. In the simple scenario constructed in our simulations, the instrumental inequalities appear to be violated for combinations excluding the invalid proposed instrument only when the bias for the invalid instrument is very strong and the sample is relatively small; in these cases, the instrumental inequalities also indicate that the set of four jointly proposed instruments violates the MR conditions. The frequency of this type of sample-specific violation appears to decline with sample size, and we found no violations for combinations with no structural bias in simulated samples of 100,000 participants. Overall, the results of our simple simulations suggest that, even in settings with small samples and strong instruments, where detected violations may be sample-specific, the instrumental inequalities still provide strong evidence regarding the validity of MR analyses within a particular dataset. However, in such settings, it may be difficult, if not impossible, to determine the source of the violations if they are truly limited to a subset of the proposed instruments. It is unclear how this property of the inequalities will be affected when larger numbers of SNPs are proposed as instruments. Although the instrumental inequalities may be affected by sample size, statistical inference procedures for them have not been fully developed outside the all-binary case.[44,45]
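A rough way to see why chance violations grow with the number of jointly proposed SNPs is to count strata. With k biallelic SNPs crossed, there are 3^k genotype strata, whereas an unweighted allele score has only 2k + 1 levels. The sketch below assumes independent SNPs in Hardy-Weinberg equilibrium; the allele frequency of 0.3 and the sample size of 1,000 are hypothetical values, not those of the study SNPs or of Generation R.

```python
from itertools import product
from math import prod


def genotype_probs(p):
    """Hardy-Weinberg frequencies of 0, 1, 2 copies of an allele with frequency p."""
    q = 1 - p
    return [q * q, 2 * p * q, p * p]


def joint_genotype_probs(freqs):
    """Probabilities of all 3**k crossed-genotype strata for independent SNPs."""
    return [prod(cell) for cell in product(*(genotype_probs(p) for p in freqs))]


def allele_score_probs(freqs):
    """Distribution of an unweighted allele score (0..2k) for independent SNPs."""
    dist = [1.0]
    for p in freqs:
        g = genotype_probs(p)
        new = [0.0] * (len(dist) + 2)
        for i, a in enumerate(dist):
            for j, b in enumerate(g):
                new[i + j] += a * b  # convolve current score with this SNP's genotype
        dist = new
    return dist


def count_sparse(probs, n, min_expected=5):
    """Number of strata whose expected count in a sample of n is below min_expected."""
    return sum(1 for pr in probs if n * pr < min_expected)


freqs = [0.3] * 4  # hypothetical allele frequencies
n = 1000           # hypothetical sample size
joint = joint_genotype_probs(freqs)
score = allele_score_probs(freqs)
print(len(joint), count_sparse(joint, n))  # 81 33
print(len(score), count_sparse(score, n))  # 9 2
```

Under these assumptions, roughly 40% of the crossed-genotype strata are expected to hold fewer than five people, and sparse strata make the plug-in maximum over z more likely to exceed the bound by chance.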
FIGURE 5.
Causal diagrams depicting some reasons for possible violations of the Mendelian randomization (MR) assumptions. For simplicity, in each causal diagram, Z4 alone (and therefore any combination involving Z4) violates the MR assumptions: (A) pleiotropy, (B) violation by population stratification, (C) violation by offspring genotype, (D) violation by postnatal effects of exposure, (E) violation by changing exposure-instrument association over pregnancy, (F) violation by selection on fertility, (G) violation by exposure dichotomization, (H) violation by missing data. See eAppendix; http://links.lww.com/EDE/B605 for further discussion of each of these possible violations in the context of our data analysis.
In our data example, two observations suggest that not all the violations in our dataset are attributable to sample size: some of the instrumental inequalities applied to allele scores, which have fewer strata, detected violations for SNPs jointly proposed as instruments, and the proposed instruments were relatively weak. If the detected violations are not sample-specific but instead reflect structural biases related to the SNPs proposed as instruments, these four SNPs should perhaps not be used as instruments for the effect of maternal vitamin D on offspring behavioral outcomes.
More broadly, our data example provides a concrete case in which the instrumental inequalities falsified a model proposing multiple variables jointly as instruments, underscoring previous calls for the use of the instrumental inequalities in all IV analyses.[16-18] Like all observational research, MR requires strong, unverifiable assumptions. However, in the context of one-sample MR with multiple proposed instruments, the instrumental inequalities may allow us to eliminate clearly invalid analyses and focus efforts on potentially more informative studies.
ACKNOWLEDGMENTS
We thank Vanessa Didelez for helpful discussions.
TABLE 2.
Summary of Instrumental Inequalities for Studying the Effect of Maternal Vitamin D on Mother-reported Attention Deficit Hyperactivity Disorder Symptoms With Varying Combinations of Proposed Instruments and Definitions of Exposure