Gemma Hammerton (1, 2), Marcus R Munafò (2, 3).
Abstract
The goal of much observational research is to identify risk factors that have a causal effect on health and social outcomes. However, observational data are subject to biases from confounding, selection and measurement, which can result in an underestimate or overestimate of the effect of interest. Various advanced statistical approaches offer certain advantages in addressing these potential biases. However, although these statistical approaches have different underlying assumptions, in practice they cannot always completely remove key sources of bias; therefore, using design-based approaches to improve causal inference is also important. In design-based approaches, it is the design of the study that addresses the problem of potential bias, either by ensuring that it is not present (under certain assumptions) or by comparing results across methods with different sources and directions of potential bias. The distinction between statistical and design-based approaches is not an absolute one, but it provides a framework for triangulation: the thoughtful application of multiple approaches (e.g. statistical and design-based), each with its own strengths and weaknesses, and in particular its own sources and directions of bias. It is unlikely that any single method can provide a definitive answer to a causal question, but the triangulation of evidence provided by different approaches can provide a stronger basis for causal inference. Triangulation can be considered part of wider efforts to improve the transparency and robustness of scientific research, and the wider scientific infrastructure and system of incentives.
Keywords: causal inference; epidemiology; mental health; observational data; triangulation
Year: 2021 PMID: 33682654 PMCID: PMC8020490 DOI: 10.1017/S0033291720005127
Source DB: PubMed Journal: Psychol Med ISSN: 0033-2917 Impact factor: 7.723
Assumptions and limitations of statistical and design-based approaches to causal inference
| Statistical approaches | Description | Assumptions | Limitations | Example |
|---|---|---|---|---|
| Confounding | ||||
| Multivariable regression | Potential confounders are included in the regression model for the effect of the exposure on the outcome | No residual confounding (all confounders are accurately measured and correctly included in the statistical model); for multivariable regression, the outcome is modelled correctly given the exposure and confounders; for propensity score methods, the exposure is modelled correctly given the confounders | Assumptions are difficult to meet with full confidence, resulting in bias from residual confounding; although propensity scores carry some advantages over multivariable regression (e.g. statistical efficiency and flexibility), the different methods to incorporate a propensity score into the analysis model (e.g. stratifying, matching, adjusting, weighting) each have their own limitations – see Haukoos and Lewis (Haukoos & Lewis, | Harrison and colleagues (Harrison et al., |
| Propensity scores | Propensity scores are used to control for time-invariant confounding, calculated by estimating the probability that an individual is exposed, given the values of their observed baseline confounders; can be extended to address time-varying confounding via marginal structural models | As above (shared with multivariable regression) | As above (shared with multivariable regression) | Bray and colleagues (Bray et al., |
| Fixed-effects regression | This approach uses repeated measures of an exposure and an outcome to account for the possibility of an association between the exposure and the unexplained variability in the outcome (representing unmeasured confounding); can adjust for all time-invariant confounders, including unobserved confounders, and can incorporate observed time-varying confounders | Potential time-varying confounders are measured accurately and correctly included in the statistical model | Requires repeated assessments of exposure and outcome; model cannot control for unobserved fixed confounding factors whose effects vary with age, or that combine interactively with the exposure to influence the outcome, or unobserved time-varying confounders | Fergusson and Horwood (Fergusson & Horwood, |
| Selection bias | ||||
| Complete case analysis with covariate adjustment | Analyses are performed on those with complete data on all variables, but covariates are included in the model that are associated with missingness | Data are MAR or MCAR; results can be unbiased when data are MNAR as long as the chance of being a complete case does not depend on the outcome after adjusting for covariates | Cannot address lack of power due to missing data; results biased when outcome MNAR; must be aware of and measure predictors of missingness; cannot include information from variables not included in main analysis that are associated with missingness | Hughes and colleagues (Hughes et al., |
| Approaches based on the MAR assumption, e.g. multiple imputation | Multiple imputation is a two-stage process: first, multiple imputed data sets are created, with each missing value replaced by imputed values using models fitted to the observed data; second, each imputed data set is analysed and the results are combined in an appropriate way; can address both lack of power and bias (extensions exist to allow for MNAR mechanisms using sensitivity parameters) | Data are MAR or MCAR; imputation model is compatible with analysis model; imputation is performed multiple times and performed ‘properly’; final analysis combines appropriately over the multiple data sets (e.g. using Rubin's rules); for a more in-depth discussion of potential pitfalls in multiple imputation see the review by Sterne and colleagues (Sterne et al., | If exposure is MNAR, multiple imputation can cause more bias than complete case analysis; requires information to be collected on auxiliary variables closely associated with the variables to be imputed; all aspects of the analysis model must be included in the imputation model, so if changes are made at a later date (e.g. testing an interaction), the imputation model needs to be redone; computationally intensive, and can therefore run into computational problems (particularly with small sample sizes) | |
| Approaches based on the MNAR assumption, e.g. using linkage to external routinely collected health records | Routinely collected health data can be used to examine biases from selective non-response by providing data on those who did and did not respond to assessments within population cohorts or surveys; they can also be used as a proxy for the missing study outcome in multiple imputation, or to derive weights that adjust for potential bias and make the MAR assumption more plausible | High correlation between study outcome and linked proxy; if the outcome is not MNAR but missingness depends on the proxy, inclusion of the proxy in a multiple imputation model would increase bias – see Cornish and colleagues (Cornish et al., | Requires access to closely related routinely collected data; not all participants may consent to linkage, which could introduce bias if there are differences between non-consenters and non-responders; linkage to external datasets can be costly and complicated; use of a proxy in multiple imputation can increase bias depending on the missing data mechanism | Gorman and colleagues (Gorman et al., |
| Measurement bias | ||||
| Latent variables using multiple sources of data | A latent variable is a source of variance not directly measured but estimated from the covariation between a set of strongly related observed variables; if these observed variables are assessed using multiple methods, each with different sources of bias, variability due to method-specific bias (i.e. bias not shared across items) can be removed from the latent variable | Latent variable indicators all measure the same underlying construct, and responses on the indicators are a result of an individual's position on the latent variable; latent variable variance is independent of measurement residual variance; indicators assessed using different methods have different sources of bias; for a description of all assumptions in latent variable modelling see Kline (Kline, | Requires at least four strongly correlated measures assessed using different methods, each with different sources of bias; items included must make theoretical sense given the underlying construct; important to think carefully about the meaning of the latent variable | Palmer and colleagues (Palmer et al., |
| Mechanisms | ||||
| Counterfactual mediation | Mediation approach based on conceptualizing ‘potential outcomes’ for each individual [ | Main assumptions include conditional exchangeability, no interference and consistency; see de Stavola and colleagues (De Stavola, Daniel, Ploubidis, & Micali, | Still subject to the same threats to causality as traditional approaches to mediation analyses (including poorly measured or unmeasured confounding and measurement error); challenging to extend to examine individual paths via multiple mediators; each specific counterfactual mediation method subject to its own limitations – see VanderWeele (VanderWeele, | Using a sequential counterfactual mediation approach, Aitken and colleagues (Aitken, Simpson, Gurrin, Bentley, & Kavanagh, |
| Design-based approaches | ||||
| RCTs | In an RCT, participants are randomly assigned to a treatment or control group, and the outcome is compared across groups; when performed well, RCTs can account for both known and unknown confounders and are therefore considered to be the gold standard for estimating causal effects | Assignment to treatment and control groups is random, and so groups are similar except with respect to the intervention | Prone to potential bias, such as lack of concealment of the random allocation, failure to maintain randomization, lack of blinding to which group participants have been randomized, non-adherence, and differential loss to follow-up between groups; often recruit highly selected samples which are not representative of the population of interest, threatening the generalizability of results; can be expensive and time-consuming and not always feasible or ethical, particularly in mental health research | Ford and colleagues (Ford et al., |
| Natural experiments | Populations are compared before and after (or with and without exposure to) a ‘natural’ exposure at a specific time point, with the assumption that potential biases (such as confounding) are similar between them; exposure may occur naturally (e.g. famine), or be quasi-random (e.g. introduction of policies) | Populations compared are comparable (e.g. with respect to the underlying confounding structure) except for the naturally occurring (or quasi-randomized) exposure | Potential sources of bias include differences on characteristics that may confound any observed association, or misclassification of outcome that relates to the naturally occurring exposure; relies on the occurrence of appropriate natural experiments that manipulate exposure of interest; selection bias can be present as exposure is not manipulated by researcher | Davies and colleagues (Davies et al., |
| Instrumental variables | An instrumental variable is a variable that is robustly associated with an exposure of interest, but not with confounders of the exposure and outcome; MR is an extension of this approach where a genetic variant is used as a proxy for the exposure | The instrument is associated with the exposure (relevance assumption); the instrument is not associated with confounders of the exposure-outcome association (exchangeability assumption); the instrument is not associated with the outcome other than via its association with the exposure (exclusion restriction assumption) | Weak instrument bias can result from a weak association between the instrument and the exposure; another source of bias is violation of the exclusion restriction assumption – this is the main source of bias in MR (due to horizontal pleiotropy), and therefore a number of extensions have been developed that are robust to horizontal pleiotropy; population stratification is also a source of bias in MR, which may require focusing on an ethnically homogeneous population, or adjusting for genetic principal components that reflect different population sub-groups | Taylor and colleagues (Taylor et al., |
| Different confounding structures | Multiple samples with different confounding structures are used, for example, comparing multiple control groups within a case-control design, or multiple populations with different confounding structures | The bias introduced by confounding is different across samples so that congruent results are more likely to reflect causal effects; different results across samples are due to different confounding structures and not true differences in causal effect; no other sources of bias that could explain results being the same or different across samples | Assessment and quality of measures must be similar across samples; misclassification of exposure or outcome (or other unknown sources of bias) can produce misleading results; strong a priori hypotheses required about confounding structures across samples | Sellers and colleagues (Sellers et al., |
| Positive and negative controls | This approach allows a test of whether an exposure or outcome is behaving as expected (a positive control), or not as expected (a negative control); a positive control is known to be causally related to the outcome (or exposure), whereas a negative control is not plausibly causally related to the outcome (or exposure) | The real exposure (or outcome) and negative control exposure (or outcome) have the same sources of bias; the negative control exposure is not causally related to the outcome (and vice versa for negative control outcome); the positive control exposure is causally related to the outcome (and vice versa for positive control outcome) | Important to consider assortative mating in the prenatal negative control design, and mutually adjust for maternal and paternal exposures [see Madley-Dowd and colleagues (Madley-Dowd et al., | Caramaschi and colleagues (Caramaschi et al., |
| Discordant siblings | Family-based study designs can provide a degree of control over family-level confounding by comparing outcomes for siblings who are discordant for an exposure; for example, two siblings born to a mother who smoked during one pregnancy, but not the other, provide information on the intrauterine effects of tobacco exposure, while controlling for observed and unobserved genetic and shared environmental familial confounding | Any misclassification of the exposure or outcome is similar across siblings, and there is little or no individual-level confounding (for example, one sibling being exposed to a potential confounder when the other was not) | The assumption of no individual-level confounding is unlikely to be met (for example, the plausible scenario where a mother is both older and less likely to smoke during the second pregnancy); the method depends on the availability of suitable samples, which means sample size can be limited (particularly for use of identical twins within a discordant-sibling design); bias due to individual-level confounding or misclassification of exposure/outcome will be larger than in studies of unrelated individuals – see Frisell and colleagues (Frisell, Oberg, Kuja-Halkola, & Sjolander, | Madley-Dowd and colleagues (Madley-Dowd et al., |

MAR, missing at random; MCAR, missing completely at random; MNAR, missing not at random; SEM, structural equation modelling; RCT, randomized controlled trial; MR, Mendelian randomization.
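
The propensity-score row of the table above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the method of any study cited here: the data are hypothetical, with one binary confounder `c` and a true exposure effect fixed at 2.0, and the score is used by inverse probability weighting, which is only one of the options (stratifying, matching, adjusting, weighting) listed in the table. Because the confounder is binary, the propensity score is estimated as the exposed proportion within each stratum rather than with a logistic model.

```python
import random


def simulate(n, seed=1):
    """Hypothetical data: binary confounder c raises both exposure x and outcome y.
    The true causal effect of x on y is set to 2.0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = 1 if rng.random() < 0.5 else 0
        x = 1 if rng.random() < 0.2 + 0.6 * c else 0  # P(x=1) is 0.2 or 0.8
        y = 2.0 * x + 3.0 * c + rng.gauss(0, 1)
        rows.append((c, x, y))
    return rows


def naive_difference(rows):
    """Unadjusted difference in mean outcome, exposed minus unexposed."""
    exposed = [y for _, x, y in rows if x == 1]
    unexposed = [y for _, x, y in rows if x == 0]
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)


def propensity_scores(rows):
    """P(x=1 | c), estimated as the exposed proportion within each level of c."""
    totals, exposed = {}, {}
    for c, x, _ in rows:
        totals[c] = totals.get(c, 0) + 1
        exposed[c] = exposed.get(c, 0) + x
    return {c: exposed[c] / totals[c] for c in totals}


def ipw_difference(rows):
    """Normalised inverse-probability-weighted difference in mean outcome."""
    ps = propensity_scores(rows)
    sums = {1: [0.0, 0.0], 0: [0.0, 0.0]}  # exposure level -> [sum of w*y, sum of w]
    for c, x, y in rows:
        w = 1.0 / ps[c] if x == 1 else 1.0 / (1.0 - ps[c])
        sums[x][0] += w * y
        sums[x][1] += w
    return sums[1][0] / sums[1][1] - sums[0][0] / sums[0][1]


rows = simulate(20_000)
print(f"naive: {naive_difference(rows):.2f}")  # about 3.8 in expectation
print(f"IPW:   {ipw_difference(rows):.2f}")    # about 2.0 in expectation
```

With the single confounder measured, weighting removes the bias; the sketch says nothing about residual confounding from variables that were never measured, which is the main limitation noted in the table.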
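
The fixed-effects row can be sketched in the same way. In this hypothetical panel, an unmeasured time-invariant trait `u` drives both exposure and outcome; pooled regression absorbs its effect, whereas the within-person (demeaned) estimator removes it. The data-generating values are assumptions chosen for illustration, and the sketch leaves out time-varying confounders, which this approach can only handle if they are observed and modelled.

```python
import random


def simulate_panel(n_people, n_waves, seed=2):
    """Hypothetical panel: u is an unmeasured, time-invariant trait that raises
    both exposure x and outcome y. True within-person effect of x on y is 1.5."""
    rng = random.Random(seed)
    rows = []  # (person_id, x, y)
    for person in range(n_people):
        u = rng.gauss(0, 1)
        for _ in range(n_waves):
            x = u + rng.gauss(0, 1)
            y = 1.5 * x + 2.0 * u + rng.gauss(0, 1)
            rows.append((person, x, y))
    return rows


def ols_slope(xs, ys):
    """Slope from a simple least-squares regression of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def fixed_effects_slope(rows):
    """Within estimator: subtract each person's own mean from x and y, then fit OLS.
    Demeaning removes every time-invariant person-level factor, observed or not."""
    by_person = {}
    for person, x, y in rows:
        by_person.setdefault(person, []).append((x, y))
    xs, ys = [], []
    for obs in by_person.values():
        mx = sum(x for x, _ in obs) / len(obs)
        my = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            xs.append(x - mx)
            ys.append(y - my)
    return ols_slope(xs, ys)


rows = simulate_panel(2_000, 4)
pooled = ols_slope([x for _, x, _ in rows], [y for _, _, y in rows])
print(f"pooled OLS:    {pooled:.2f}")                     # about 2.5 in expectation
print(f"fixed effects: {fixed_effects_slope(rows):.2f}")  # about 1.5 in expectation
```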
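
The complete case row states that results can remain unbiased as long as the chance of being a complete case does not depend on the outcome after adjusting for covariates. A hypothetical simulation (true slope 1.0, missingness imposed by a simple threshold, both assumptions made for illustration) contrasts missingness driven by the covariate with missingness driven by the outcome:

```python
import random


def ols_slope(xs, ys):
    """Slope from a simple least-squares regression of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def complete_case_slope(n, missing_depends_on, seed=3):
    """Hypothetical data with true slope 1.0; a record is a complete case only when
    the chosen variable ('x' or 'y') exceeds -0.5. Returns the complete-case slope."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = x + rng.gauss(0, 1)
        driver = x if missing_depends_on == "x" else y
        if driver > -0.5:  # record is observed (complete case)
            xs.append(x)
            ys.append(y)
    return ols_slope(xs, ys)


print(f"missingness depends on x: {complete_case_slope(20_000, 'x'):.2f}")
print(f"missingness depends on y: {complete_case_slope(20_000, 'y'):.2f}")
```

In the first scenario the data are MAR given `x` and the complete-case slope stays close to the truth; in the second, selection on the outcome attenuates it (to roughly 0.6 under these assumed values), and a larger sample would not fix this.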
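
The final step of multiple imputation, combining results with Rubin's rules, is short enough to write out. The sketch covers only that pooling step for a single parameter, not the imputation models themselves, and the inputs are hypothetical estimates and squared standard errors from three imputed data sets. It uses the standard formulas: the pooled estimate is the mean of the estimates, and the total variance is T = W + (1 + 1/m)B, where W is the mean within-imputation variance, B the between-imputation variance and m the number of imputations.

```python
import math
import statistics


def rubins_rules(estimates, variances):
    """Pool one parameter across m imputed data sets using Rubin's rules.
    estimates: point estimates from each imputed data set
    variances: squared standard errors from each imputed data set"""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = statistics.fmean(estimates)  # pooled point estimate
    w = statistics.fmean(variances)      # within-imputation variance
    b = statistics.variance(estimates)   # between-imputation variance
    t = w + (1 + 1 / m) * b              # total variance
    if b == 0:
        df = math.inf
    else:
        df = (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2  # Rubin's degrees of freedom
    return q_bar, t, df


# Hypothetical coefficients and variances from m = 3 imputed data sets.
est, total_var, df = rubins_rules([1.0, 1.2, 1.1], [0.04, 0.05, 0.06])
print(f"pooled estimate {est:.3f}, SE {math.sqrt(total_var):.3f}, df {df:.1f}")
```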
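
Davies and colleagues, the example given for natural experiments, analysed the raising of the school leaving age with, among other methods, a difference-in-difference analysis. The core arithmetic is sketched below with hypothetical records; a real analysis would usually fit a regression with an exposure-by-period interaction and report standard errors. The estimate relies on the parallel-trends assumption: without the exposure, both groups would have changed by the same amount.

```python
from statistics import fmean


def difference_in_differences(records):
    """records: iterable of (exposed_group: bool, post_period: bool, outcome: float).
    Returns (change in exposed group) - (change in comparison group)."""
    cells = {}
    for exposed, post, y in records:
        cells.setdefault((exposed, post), []).append(y)
    mean = {key: fmean(values) for key, values in cells.items()}
    change_exposed = mean[(True, True)] - mean[(True, False)]
    change_comparison = mean[(False, True)] - mean[(False, False)]
    return change_exposed - change_comparison


# Hypothetical outcome scores, two people per group and period.
records = [
    (True, False, 10.0), (True, False, 12.0),    # exposed group, before
    (True, True, 7.0), (True, True, 9.0),        # exposed group, after
    (False, False, 11.0), (False, False, 13.0),  # comparison group, before
    (False, True, 10.0), (False, True, 12.0),    # comparison group, after
]
print(difference_in_differences(records))  # -2.0
```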
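
For the instrumental-variable row, the simplest estimator with a single instrument is the ratio (Wald) estimate, cov(z, y) / cov(z, x). The hypothetical simulation below builds the three IV assumptions in by construction: the genotype `z` affects the exposure (relevance), is independent of the unmeasured confounder `u` (exchangeability), and affects the outcome only through the exposure (exclusion restriction). Allele frequency and effect sizes are arbitrary illustrative values.

```python
import random


def simulate_mr(n, seed=4):
    """Hypothetical individual-level data: z counts copies of an allele (0, 1 or 2),
    u is an unmeasured confounder. The true causal effect of x on y is 0.8."""
    rng = random.Random(seed)
    zs, xs, ys = [], [], []
    for _ in range(n):
        z = sum(1 for _ in range(2) if rng.random() < 0.3)  # allele frequency 0.3
        u = rng.gauss(0, 1)
        x = 0.5 * z + u + rng.gauss(0, 1)
        y = 0.8 * x + 1.5 * u + rng.gauss(0, 1)  # z reaches y only through x
        zs.append(z)
        xs.append(x)
        ys.append(y)
    return zs, xs, ys


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((i - ma) * (j - mb) for i, j in zip(a, b)) / (len(a) - 1)


def wald_ratio(zs, xs, ys):
    """Instrumental variable (ratio) estimate: cov(z, y) / cov(z, x)."""
    return cov(zs, ys) / cov(zs, xs)


zs, xs, ys = simulate_mr(50_000)
print(f"observational OLS: {cov(xs, ys) / cov(xs, xs):.2f}")  # about 1.5 in expectation
print(f"Wald ratio:        {wald_ratio(zs, xs, ys):.2f}")      # about 0.8 in expectation
```

Giving `z` a direct path to `y` (horizontal pleiotropy) would bias the ratio, which is why the pleiotropy-robust extensions mentioned in the table exist.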
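
The prenatal negative-control design (partner's smoking as a negative control for intrauterine exposure, as used by Brand and colleagues and Caramaschi and colleagues) can also be illustrated by simulation. In this hypothetical sketch a family-level confounder `u` affects both parents' smoking equally, and partner smoking has no causal effect by construction, so after mutual adjustment any excess of the maternal over the partner coefficient points to an intrauterine effect. The key assumption that both exposures share the same sources of bias is built into the simulation; real data need not satisfy it.

```python
import random


def simulate_families(n, maternal_effect, seed=5):
    """Hypothetical data: u is a family-level confounder (e.g. socio-economic
    position) that raises both parents' smoking and the child outcome equally.
    Only maternal smoking has a causal (intrauterine) effect, of size
    maternal_effect; partner smoking has no causal effect on y by construction."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.gauss(0, 1)
        mother = 1 if u + rng.gauss(0, 1) > 0.5 else 0
        partner = 1 if u + rng.gauss(0, 1) > 0.5 else 0
        y = maternal_effect * mother + u + rng.gauss(0, 1)
        rows.append((mother, partner, y))
    return rows


def mutually_adjusted(rows):
    """Least-squares regression of y on maternal and partner smoking together;
    returns (maternal coefficient, partner coefficient)."""
    n = len(rows)
    mean_m = sum(m for m, _, _ in rows) / n
    mean_p = sum(p for _, p, _ in rows) / n
    mean_y = sum(y for _, _, y in rows) / n
    s_mm = sum((m - mean_m) ** 2 for m, _, _ in rows)
    s_pp = sum((p - mean_p) ** 2 for _, p, _ in rows)
    s_mp = sum((m - mean_m) * (p - mean_p) for m, p, _ in rows)
    s_my = sum((m - mean_m) * (y - mean_y) for m, _, y in rows)
    s_py = sum((p - mean_p) * (y - mean_y) for _, p, y in rows)
    det = s_mm * s_pp - s_mp ** 2  # solve the 2x2 normal equations
    b_mother = (s_pp * s_my - s_mp * s_py) / det
    b_partner = (s_mm * s_py - s_mp * s_my) / det
    return b_mother, b_partner


for effect in (0.0, 1.0):
    b_m, b_p = mutually_adjusted(simulate_families(20_000, effect))
    print(f"true maternal effect {effect}: mother {b_m:.2f}, partner {b_p:.2f}")
```

With no true effect the two coefficients are similar, both reflecting confounding; with a true effect the maternal coefficient exceeds the partner coefficient by roughly that amount.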
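
The discordant-siblings row can be sketched with hypothetical sibling pairs who share a family-level confounder `f`. A comparison of unrelated individuals picks up `f`, whereas the within-pair difference among exposure-discordant pairs cancels it. The simulation assumes no individual-level confounding and no misclassification, which the table notes are unlikely to hold exactly in practice.

```python
import random


def simulate_sibling_pairs(n_families, effect, seed=6):
    """Hypothetical data: f is a family-level confounder shared by both siblings
    (shared genes and environment). Individual-level confounding is absent by
    construction, which is the key assumption of the design."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_families):
        f = rng.gauss(0, 1)
        pair = []
        for _ in range(2):
            x = 1 if f + rng.gauss(0, 1) > 0 else 0
            y = effect * x + 2.0 * f + rng.gauss(0, 1)
            pair.append((x, y))
        pairs.append(pair)
    return pairs


def unrelated_comparison(pairs):
    """Exposed minus unexposed mean outcome, ignoring family membership."""
    people = [person for pair in pairs for person in pair]
    exposed = [y for x, y in people if x == 1]
    unexposed = [y for x, y in people if x == 0]
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)


def discordant_sibling_estimate(pairs):
    """Mean within-pair difference (exposed sibling minus unexposed sibling),
    using only pairs that are discordant for the exposure."""
    diffs = []
    for (x1, y1), (x2, y2) in pairs:
        if x1 != x2:
            diffs.append(y1 - y2 if x1 == 1 else y2 - y1)
    return sum(diffs) / len(diffs)


pairs = simulate_sibling_pairs(10_000, effect=0.5)
print(f"unrelated individuals: {unrelated_comparison(pairs):.2f}")        # about 2.8 in expectation
print(f"discordant siblings:   {discordant_sibling_estimate(pairs):.2f}")  # about 0.5 in expectation
```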
Studies using triangulation to address a research question in mental health epidemiology
| Study | Exposure | Outcome | Approach used | Description | Comments |
|---|---|---|---|---|---|
| Brand et al. ( | Maternal smoking in pregnancy | Longitudinal foetal growth from 12–16 to 40 weeks gestation | Linear regression | Multilevel fractional polynomial models of estimated foetal weight, and multivariable linear regression between maternal smoking in pregnancy and foetal weight, adjusting for potential confounders | The study states that findings were triangulated from three approaches with differing sources of bias to improve causal inference; evidence was consistent with a causal effect for maternal smoking in pregnancy on foetal growth (i.e. results from all three methods were consistent with a causal effect) |
| | | | MR | MR of smoking quantity and ease of quitting on estimated foetal weight using individual-level data | |
| | | | Negative control exposure | Partner's smoking was used as a negative control for intrauterine exposure | |
| Thapar et al. ( | Maternal smoking in pregnancy | Child Attention Deficit/Hyperactivity Disorder (ADHD) and birth weight | Natural experiment | Natural experiment comparing offspring conceived via | Study does not specifically refer to triangulation; evidence was consistent with a causal effect for maternal smoking in pregnancy on lower birth weight but not ADHD symptoms (i.e. consistent results were found for unrelated and related mother–offspring pairs for birth weight but not ADHD) |
| Sellers et al. ( | Maternal smoking in pregnancy | Child conduct and hyperactivity, cognition and birth weight | Cross-cohort design | Two national UK cohorts born in 1958 and 2000/2001 with different confounding structures were compared | The study highlights the utility of cross-cohort designs in helping triangulate conclusions about the role of putative causal risk factors in observational epidemiology; evidence was consistent with a causal effect for maternal smoking in pregnancy on lower birth weight but not the other child outcomes (i.e. consistent results were found across cohorts for birth weight but not conduct problems, hyperactivity and reading) |
| Caramaschi et al. ( | Maternal smoking in pregnancy | Autism spectrum disorder (ASD) | Logistic and linear regression | Multivariable regression using self-reported smoking and an epigenetic score as the exposure and ASD diagnosis or traits as the outcome, adjusted for potential confounders | Study states that it integrated evidence from several epidemiological approaches with differing and unrelated sources of bias, but does not specifically refer to triangulation; evidence was not consistent with a causal effect for maternal smoking in pregnancy on autism or related traits (i.e. all three methods showed weak or no evidence for a causal effect) |
| | | | Negative control exposure | Partner's smoking was used as a negative control for intrauterine exposure | |
| | | | MR | MR between heaviness of smoking and ASD or autistic traits using individual-level data | |
| Gage et al. ( | Smoking | Educational attainment and cognitive ability | Linear regression | Multivariable linear regression between smoking heaviness and educational attainment and cognitive ability, adjusting for potential confounders and earlier measures of the outcome | Study highlights that the triangulation of results across different methods, each with their own strengths, limitations and sources of bias, is a strength; evidence was consistent with a causal effect for smoking on lower educational attainment, but results were less consistent for cognitive ability (i.e. results from both methods were consistent with a causal effect for education and cognition; however, cognition results were less robust to various sensitivity analyses) |
| | | | MR | Two-sample MR of two smoking phenotypes (smoking initiation and lifetime smoking) on cognitive ability and educational attainment | |
| Harrison et al. ( | Smoking behaviours (initiation, smoking status, heaviness, lifetime smoking) | Suicidal ideation and attempts | Logistic regression | Multivariable logistic regression between smoking behaviours and suicidal ideation and attempts, adjusting for potential confounders | Study states that they triangulated across multiple methods, multiple smoking behaviours and multiple suicidal behaviours to improve causal inference; evidence was not consistent with a causal effect for smoking on suicidal ideation and attempts (i.e. an association was found in observational analyses but not MR) |
| | | | MR | Two-sample MR of smoking initiation on suicide attempt using five different MR methods; MR of lifetime smoking behaviour on suicidal ideation and attempt using individual-level data | |
| Itani et al. ( | Prescription of varenicline | Smoking cessation at 2 years | Logistic regression | Multivariable logistic regression between varenicline prescription and smoking cessation, adjusting for potential confounders, both in those with and in those without a neurodevelopmental disorder | Study highlights that triangulating three different analytical methods to address confounding is a strength; evidence was consistent with a causal effect for varenicline on smoking cessation (i.e. results from all three methods were consistent with a causal effect) |
| | | | Propensity score matching | Participants were matched based on the association between their exposure and all baseline characteristics | |
| | | | Instrumental variable analysis | Physicians’ previously recorded prescribing preferences for varenicline were used as the instrument | |
| Taylor et al. ( | Prescription of varenicline | Smoking cessation and mental health | Logistic regression | Multivariable logistic regression between varenicline prescription and smoking cessation and mental health outcomes, adjusting for potential confounders, both in those with and in those without a mental disorder | Study states that results were triangulated from three analytical techniques; evidence was consistent with a causal effect for varenicline on smoking cessation (i.e. results from all three methods were consistent with a causal effect); this study is not independent from Itani et al. ( |
| | | | Propensity score matching | Participants were matched based on the association between their exposure and all baseline characteristics | |
| | | | Instrumental variable analysis | Physicians’ previously recorded prescribing preferences for varenicline were used as the instrument | |
| Davies et al. ( | Remaining in school | Various health outcomes including depression diagnosis, alcohol use and smoking | Natural experiment | The raising of the school leaving age from 15 to 16 years was used as a natural experiment for testing whether remaining in school at 15 years of age affected later outcomes; data analysed using a regression discontinuity design, instrumental variable analysis and difference-in-difference analysis | Study does not refer to triangulation; evidence was consistent with a causal effect for remaining in school on reduced diabetes and mortality (i.e. results from all three methods were consistent with a causal effect) |
| Sanderson, Davey Smith, Bowden, & Munafo ( | Educational attainment | Smoking behaviour (current smoking, smoking initiation and smoking cessation) | Logistic regression | Multivariable logistic regression between educational attainment and smoking behaviours, adjusting for general cognitive ability and potential confounders | Study states that results were compared within a triangulation framework; evidence was consistent with a causal effect for more years of education on smoking behaviour (i.e. results from both methods were consistent with a causal effect) |
| | | | MR | Multivariable MR of educational attainment and general cognitive ability on smoking behaviour using individual-level data; univariable and multivariable two-sample MR of educational attainment and general cognitive ability on smoking initiation and cessation | |
| Fancourt & Steptoe ( | Cultural engagement | Depression | Logistic regression | Multivariable regression between cultural engagement and depression, adjusting for potential confounders related to socio-economic status (SES) and baseline depression symptoms | Study states that a statistical triangulation approach was used, running three separate sets of analyses that each have different strengths and address different statistical limitations or biases; evidence was consistent with a causal effect for cultural engagement on depression (i.e. results from all three methods were consistent with a causal effect) |
| | | | Propensity score matching | Participants were matched based on the association between their exposure and SES | |
| | | | Fixed-effects regression | Regression model which takes account of all time-invariant factors (which include multiple aspects of SES) even if unobserved | |
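
Several studies in the table above (Gage et al., Harrison et al., Sanderson et al.) used two-sample MR. A common starting estimator for summary data is the fixed-effect inverse-variance weighted (IVW) estimate, which combines per-variant ratio estimates. The sketch below is not necessarily the estimator any of these studies relied on (Harrison and colleagues, for example, used five MR methods); it assumes independent variants that are all valid instruments, the summary statistics are hypothetical, and IVW is not robust to horizontal pleiotropy.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate from per-variant summary
    statistics: sum(bx*by/se^2) / sum(bx^2/se^2), with standard error
    1/sqrt(sum(bx^2/se^2)). Assumes independent variants that are all valid instruments."""
    if not len(beta_exposure) == len(beta_outcome) == len(se_outcome):
        raise ValueError("summary statistic lists must have equal length")
    num = sum(bx * by / se ** 2 for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome))
    return num / den, 1 / math.sqrt(den)


# Hypothetical variant-exposure and variant-outcome associations for three variants.
estimate, se = ivw_estimate([0.10, 0.20, 0.05], [0.08, 0.16, 0.04], [0.01, 0.02, 0.01])
print(f"IVW estimate {estimate:.2f} (SE {se:.3f})")
```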

Mechanistic evidence can strengthen causal inference; indeed, some argue that causality cannot be established until a mechanism is identified (Glennan,