Michal Shimonovich, Anna Pearce, Hilary Thomson, Katherine Keyes, Srinivasa Vittal Katikireddi.
Abstract
The nine Bradford Hill (BH) viewpoints (sometimes referred to as criteria) are commonly used to assess causality within epidemiology. However, causal thinking has since developed, with three of the most prominent approaches implicitly or explicitly building on the potential outcomes framework: directed acyclic graphs (DAGs), sufficient-component cause models (SCC models, also referred to as 'causal pies') and the grading of recommendations, assessment, development and evaluation (GRADE) methodology. This paper explores how these approaches relate to BH's viewpoints and considers implications for improving causal assessment. We mapped the three approaches above against each BH viewpoint. We found overlap across the approaches and BH viewpoints, underscoring BH viewpoints' enduring importance. Mapping the approaches helped elucidate the theoretical underpinning of each viewpoint and articulate the conditions when the viewpoint would be relevant. Our comparisons identified commonality on four viewpoints: strength of association (including analysis of plausible confounding); temporality; plausibility (encoded by DAGs or SCC models to articulate mediation and interaction, respectively); and experiments (including implications of study design on exchangeability). Consistency may be more usefully operationalised by considering an effect size's transportability to a different population or unexplained inconsistency in effect sizes (statistical heterogeneity). Because specificity rarely occurs, falsification exposures or outcomes (i.e., negative controls) may be more useful. The presence of a dose-response relationship may be less informative than widely perceived, as it can easily arise from confounding. We found limited utility for coherence and analogy. This study highlights a need for greater clarity on BH viewpoints to improve causal assessment.
Keywords: Bradford Hill; Causal inference; Directed acyclic graphs; GRADE; Sufficient component cause models
Year: 2020 PMID: 33324996 PMCID: PMC8206235 DOI: 10.1007/s10654-020-00703-7
Source DB: PubMed Journal: Eur J Epidemiol ISSN: 0393-2990 Impact factor: 12.434
Bradford Hill viewpoints and explanatory quotations
| Viewpoint | Explanatory quotations from Bradford Hill [ |
|---|---|
| Strength of association | “But to explain the pronounced excess in cancer of the lung [ |
| Consistency | “We have, therefore, the somewhat paradoxical position that the different results of a different inquiry certainly cannot be held to refute the original evidence; yet the same results from precisely the same form of inquiry will not invariably greatly strengthen the original evidence. I would myself put a good deal of weight upon similar results reached in quite different ways, e.g. prospectively and retrospectively.” p. 296–297 |
| Specificity | “If, as here, the association [ |
| Temporality | “Which is the cart and which the horse? This is a question which might be particularly relevant with diseases of slow development.” p. 297 |
| Dose-response | “For instance, the fact that the death rate from cancer of the lung rises linearly with the number of cigarettes smoked daily, adds a very great deal to the simpler evidence that cigarette smokers have a higher death rate than non-smokers.” p. 298 |
| Plausibility | “But this is a feature I am convinced we cannot demand. What is biologically plausible depends upon the biological knowledge of the day.” p. 298 |
| Coherence | “On the other hand, the cause-and-effect interpretation of our data should not seriously conflict with the generally known facts of the natural history and biology of the disease.” p. 298 |
| Experiment | “Occasionally it is possible to appeal to experimental, or semi-experimental, evidence. For example, because of an observed association some preventive action is taken. Does it in fact prevent? The dust in the workshop is reduced, lubricating oils are changed, persons stop smoking cigarettes. Is the frequency of the associated events affected? Here the strongest support for the causation hypothesis may be revealed.” p. 298–299 |
| Analogy | “In some circumstances it would be fair to judge by analogy. With the effects of thalidomide and rubella before us we would surely be ready to accept slighter but similar evidence with another drug or another viral disease in pregnancy.” p. 299 |
Fig. 1 Directed acyclic graph representing the relationship between alcohol consumption and active-TB. The confounding variable, overcrowding, affects both the exposure and the outcome and should be conditioned on, as indicated by the bold square around overcrowding
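The structure described in the Fig. 1 caption can be written down as a small graph and queried for common causes of the exposure and outcome. The following is a minimal sketch, not the authors' method: the dictionary encoding, the variable names and the helper functions `descendants` and `common_causes` are all assumptions made for illustration.

```python
# Hypothetical encoding of the DAG in Fig. 1: each key is a cause and
# each value is the set of variables it directly affects.
dag = {
    "overcrowding": {"alcohol_consumption", "active_tb"},
    "alcohol_consumption": {"active_tb"},
    "active_tb": set(),
}


def descendants(graph, node, blocked=frozenset()):
    """Return every node reachable from `node` along directed edges,
    never entering a node listed in `blocked`."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for child in graph.get(current, ()):
            if child not in seen and child not in blocked:
                seen.add(child)
                stack.append(child)
    return seen


def common_causes(graph, exposure, outcome):
    """Variables that cause the exposure and also cause the outcome
    through a path that does not pass through the exposure."""
    found = set()
    for node in graph:
        if node in (exposure, outcome):
            continue
        causes_exposure = exposure in descendants(graph, node)
        causes_outcome = outcome in descendants(graph, node, blocked={exposure})
        if causes_exposure and causes_outcome:
            found.add(node)
    return found


confounders = common_causes(dag, "alcohol_consumption", "active_tb")
print(confounders)
```

For this graph the only common cause is overcrowding, which matches the variable the caption says should be conditioned on. A full analysis would apply the back-door criterion rather than this simplified common-cause check.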
Sufficient component cause models and corresponding prevalence rates and risk ratios (RRs) for each sufficient-cause between two populations
Columns 4 to 7 (population 1) and 8 to 11 (population 2) give the prevalence of outcome for each sufficient-cause.

| 1: Causal pies | 2: Alcohol consumption | 3: Overcrowding | Population 1, 4: Active-TB | Population 1, 5: Not-active-TB | Population 1, 6: Risk of active-TB | Population 1, 7: Risk ratio (RR) | Population 2, 8: Active-TB | Population 2, 9: Not-active-TB | Population 2, 10: Risk of active-TB | Population 2, 11: Risk ratio (RR) |
|---|---|---|---|---|---|---|---|---|---|---|
| | 0 | 0 | 20 | 80 | 0.2 | Reference group | 20 | 80 | 0.2 | Reference group |
| | 1 | 0 | 60 | 40 | 0.6 | 3.0 | 60 | 40 | 0.6 | 3.0 |
| | 0 | 1 | 70 | 30 | 0.7 | 3.5 | 40 | 60 | 0.4 | 2.0 |
| | 1 | 1 | 90 | 10 | 0.9 | 4.5 | 90 | 10 | 0.9 | 4.5 |
The prevalence of each causal pie differs in each population, and as a result the RR differs in each population
Unknown factors may differ in each combination of components, as indicated by the different subscripts of U corresponding to each SCC model. In a hypothetical dataset of 400 individuals, A and O are measured and U is not. The causal pies can be found in column one (see label). Columns two and three indicate if the individual has been exposed to each measured causal component (A and O, where A = 1 indicates individuals represented in the corresponding SCC models have been exposed). Columns four and five for population 1 and columns eight and nine for population 2 show the number of individuals in the example dataset who developed active-TB (T = 1) and who did not (T = 0), respectively. The sum of columns four and five for population 1 and eight and nine for population 2 is the total number of individuals exposed to each causal pie for each population. Finally, column seven for population 1 and eleven for population 2 is the risk ratio (RR) for each pie calculated using S as the reference group
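The risk and risk ratio columns follow directly from the counts in the table. The sketch below recomputes them, using the counts transcribed from the table and the unexposed row (A = 0, O = 0) as the reference group. The function name `risks_and_ratios` and the dictionary layout are illustrative assumptions.

```python
# Counts transcribed from the table, keyed by (A, O):
# (number with active-TB, number without active-TB).
population_1 = {(0, 0): (20, 80), (1, 0): (60, 40), (0, 1): (70, 30), (1, 1): (90, 10)}
population_2 = {(0, 0): (20, 80), (1, 0): (60, 40), (0, 1): (40, 60), (1, 1): (90, 10)}


def risks_and_ratios(counts, reference=(0, 0)):
    """Return {(A, O): (risk, risk ratio)} with values rounded to two decimals."""
    risks = {key: cases / (cases + non_cases) for key, (cases, non_cases) in counts.items()}
    reference_risk = risks[reference]
    return {
        key: (round(risk, 2), round(risk / reference_risk, 2))
        for key, risk in risks.items()
    }


results_1 = risks_and_ratios(population_1)
results_2 = risks_and_ratios(population_2)

for key in population_1:
    print(key, results_1[key], results_2[key])
```

The two populations differ only in the overcrowding-only row (A = 0, O = 1), where the risk ratio is 3.5 in population 1 and 2.0 in population 2.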
The initial level of certainty, according to GRADE, differs between randomised controlled trials (RCTs) and nonrandomised studies (NRSs)
| Type of evidence corresponding to initial level of certainty | Level of certainty | Definition of level of certainty |
|---|---|---|
| Randomised controlled trials (RCTs) | High (four plus: ⊕⊕⊕⊕) | We are very confident that the true effect lies close to that of the estimate of the effect |
| | Moderate (three plus: ⊕⊕⊕○) | We are moderately confident in the effect estimate: The true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different |
| Nonrandomised studies (NRSs) | Low (two plus: ⊕⊕○○) | Our confidence in the effect estimate is limited: The true effect may be substantially different from the estimate of the effect |
| | Very low (one plus: ⊕○○○) | We have very little confidence in the effect estimate: The true effect is likely to be substantially different from the estimate of effect |

Domains that may downgrade or upgrade (for observational evidence) a level of certainty

| Direction | Domain | Adjustment |
|---|---|---|
| Upgrade | Large effect | +1 Large; +2 Very large |
| Upgrade | Dose response | +1 Evidence of a gradient |
| Upgrade | All plausible residual confounding | +1 if it would reduce a demonstrated effect; +1 if it would suggest a spurious effect if no effect was observed |
| Downgrade | Risk of bias | −1 Serious; −2 Very serious |
| Downgrade | Inconsistency | −1 Serious; −2 Very serious |
| Downgrade | Indirectness | −1 Serious; −2 Very serious |
| Downgrade | Imprecision | −1 Serious; −2 Very serious |
| Downgrade | Publication bias | −1 Likely; −2 Very likely |
The level of certainty indicates the confidence of investigators that the estimated effect is close to the true causal effect. GRADE provides domains that may upgrade or downgrade the level of certainty. Based on tables in [38]
Concerns about directness, inconsistency, imprecision and publication bias may reduce certainty. Directness refers to how closely the research evidence relates to the research question of interest, with different study populations (such as available evidence only focusing on adults, rather than children) or the use of surrogate outcomes being examples of ‘indirectness’. Inconsistency reflects differences in the effect size across studies (often identified through high levels of heterogeneity in a meta-analysis) which cannot be adequately explained. Imprecision occurs when effect estimates have wide confidence intervals. Publication bias may arise if studies with a positive or exciting result are more likely to be published than those without a large association
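The way the starting level and the domains combine can be illustrated with a small calculation. This is only a sketch under stated assumptions: GRADE judgements are qualitative, and treating them as integer steps on a four-level scale, as well as the function name `grade_certainty` and its parameters, are simplifications introduced here. In line with the table, upgrades are applied only to nonrandomised evidence.

```python
LEVELS = ["Very low", "Low", "Moderate", "High"]


def grade_certainty(randomised, downgrades=0, upgrades=0):
    """Return an illustrative certainty level.

    randomised: True for RCT evidence (starts at High),
                False for nonrandomised evidence (starts at Low).
    downgrades: total steps removed across risk of bias, inconsistency,
                indirectness, imprecision and publication bias.
    upgrades:   total steps added for large effect, dose response and
                plausible residual confounding (observational evidence only).
    """
    start = LEVELS.index("High") if randomised else LEVELS.index("Low")
    applied_upgrades = 0 if randomised else upgrades
    score = start - downgrades + applied_upgrades
    # Keep the result on the four-level scale.
    score = max(0, min(score, len(LEVELS) - 1))
    return LEVELS[score]


# Nonrandomised evidence with a large effect (+1) and serious imprecision (-1).
print(grade_certainty(randomised=False, downgrades=1, upgrades=1))
```

In this sketch a nonrandomised body of evidence with one upgrade and one downgrade remains at its starting level, while an RCT body of evidence with two downgrades falls from High to Low.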
Summary of utilisation of each Bradford Hill (BH) viewpoint by each causal assessment approach: BH viewpoints, directed acyclic graphs (DAGs), sufficient-component cause models and GRADE methodology. Based on comparative analysis of causal assessment approaches
| | Strength of association | Consistency | Specificity | Temporality | Dose-response | Plausibility | Coherence | Experiment | Analogy |
|---|---|---|---|---|---|---|---|---|---|
| Bradford Hill viewpoints | A strong association between an exposure and outcome indicates that the association is less likely due to something other than causality | Consistent observations of associations in different settings or populations indicate that the associations are less likely due to something other than causality | Evidence of specificity (one-to-one relationship) indicates that the association is less likely due to alternative variables (confounding) but absence of specificity does not undermine causality | Temporality is necessary for a causal argument to be made but may not always be clear, particularly with exposures that have an incubation period | Similar to strength of association, evidence of a dose-response relationship indicates that the association is less likely due to confounding | Critically evaluating plausible explanations for an association, other than causality, may strengthen a causal argument | Coherence is determined by how well assumptions about the causal relationship fit into existing theory | An association observed in an experiment provides strongest evidence that the association is not due to something other than causality | Associations between analogous exposures and outcomes indicate a similar causal mechanism and may strengthen a causal argument |
| Directed acyclic graphs | DAGs facilitate bias analysis, which encourages articulating plausible confounding variables. Though DAGs cannot represent the size of an association, they can be used to consider the degree and implications of unmeasured and residual confounding | DAGs and SCC models provide a framework to elucidate the transportability of effect estimates. Transportability may be impacted by confounding structures in different settings or if the characteristics of different settings interact with the exposure. This may be useful for developing a causal explanation, which may then increase confidence in causality | DAGs cannot be used to articulate specificity, but they can be used to identify falsification outcomes (i.e. an outcome which cannot be plausibly associated with the exposure unless confounded) or falsification exposures (the opposite). The absence of a relationship between an exposure or outcome and its falsification variable is used to examine residual or unmeasured confounding and thus increase confidence in causality | DAGs can be used to articulate the potential for reverse causality, which may undermine temporality | DAGs can be used to articulate confounding variables relevant to the relationship under study. A high number of confounding variables may undermine the relevance of a dose-response relationship in causal inference | SCC models and DAGs make the assumptions behind a causal relationship explicit, making it easier to consider the plausibility of the evidence and relationship | DAGs and SCC models are not helpful for considering coherence | DAGs can be used to articulate the extent to which exposure in certain study designs, such as natural experiments, resembles random exposure | DAGs and SCC models do not account for analogous relationships in their assessment, but analogous relationships may be part of developing the assumptions and theories encoded in the diagrams |
| Sufficient-component cause models | SCC models help to visualise how the prevalence of the outcome in the reference group affects the observed association | See directed acyclic graphs row | Specificity arises when a causal component is both necessary and sufficient to produce the outcome. The multifactorial nature of SCC models illustrates the rarity of specificity | Time may be a component of a sufficient cause, indicating a latent period that contributes to the outcome being observed | The unknown and unmeasured variables in SCC models limit their utility in understanding a dose-response relationship | See directed acyclic graphs row | See directed acyclic graphs row | Because unknown variables may differ between SCC models, they have limited utility for considering exchangeability between comparison groups | See directed acyclic graphs row |
| GRADE methodology | GRADE provides guidance for what may be considered a large association. It upgrades NRSs if a large effect size is observed across a body of evidence | GRADE methodology underscores that consistent effect estimates, as described by Bradford Hill, may not give more confidence in causality as the consistency could be due to the same bias. Rather, unexplained inconsistency (heterogeneous effect sizes) reduces confidence about the effect of the exposure on the outcome | GRADE does not take specificity into account, although it may be incorrectly conflated with indirectness | Evidence that proves participants were exposed before the outcome was recorded (such as an RCT) is graded higher than evidence that does not | GRADE suggests upgrading NRSs if a dose-response gradient is present because, alongside a strong effect, it indicates that the effect is less likely due to residual confounding | GRADE upgrades for adjustment for plausible confounding, but not for plausibility of the relationship | Coherence may be incorrectly conflated with indirectness, but GRADE does not account for coherence | Evidence from experimental studies is graded higher than evidence from non-experimental studies | Evidence of an effect of the exposure on analogous outcomes may prevent downgrading evidence, but this has more to do with the applicability of surrogate outcomes than with analogy as Bradford Hill described it |
Fig. 2 Directed acyclic graph (DAG) of target population with high baseline risk of HIV. The high baseline risk of HIV means that HIV has been conditioned upon, indicated by square around HIV. The estimated effect of alcohol consumption on active-TB in this population will be modified by the higher risk of HIV. This needs to be considered when comparing the effect estimates between this target population and the one described in Fig. 1 with low risk of HIV
Fig. 3 The directed acyclic graph (DAG) shows the relationship between the exposure (alcohol consumption), the outcome (active-TB), the confounding variable (overcrowding) and the falsification outcome (head lice). The bold square around overcrowding indicates that it has been conditioned on. If there is no effect of alcohol consumption on head lice, there is a greater likelihood that overcrowding has been accurately conditioned upon
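The logic of the falsification outcome in Fig. 3 can be checked with exact probabilities. The sketch below assumes head lice depends on overcrowding but not on alcohol consumption. Every probability in it is hypothetical and chosen only for illustration, and the function names are likewise assumptions.

```python
# Hypothetical probabilities, chosen only for illustration.
P_OVERCROWDING = 0.5                      # P(O = 1)
P_ALCOHOL_GIVEN_O = {0: 0.2, 1: 0.6}      # P(A = 1 | O)
P_LICE_GIVEN_O = {0: 0.1, 1: 0.4}         # P(L = 1 | O)


def p_lice(alcohol, overcrowding):
    """P(L = 1 | A, O). Alcohol is ignored on purpose: by assumption
    it has no effect on head lice."""
    return P_LICE_GIVEN_O[overcrowding]


def p_lice_given_alcohol(alcohol):
    """P(L = 1 | A), averaging over overcrowding without conditioning on it."""
    numerator = 0.0
    denominator = 0.0
    for overcrowding in (0, 1):
        p_o = P_OVERCROWDING if overcrowding == 1 else 1 - P_OVERCROWDING
        p_a1 = P_ALCOHOL_GIVEN_O[overcrowding]
        p_a = p_a1 if alcohol == 1 else 1 - p_a1
        numerator += p_o * p_a * p_lice(alcohol, overcrowding)
        denominator += p_o * p_a
    return numerator / denominator


# Crude risk ratio for head lice, ignoring overcrowding.
crude_rr = p_lice_given_alcohol(1) / p_lice_given_alcohol(0)

# Risk ratios within each level of overcrowding (conditioned on the confounder).
stratum_rr = {o: p_lice(1, o) / p_lice(0, o) for o in (0, 1)}

print(round(crude_rr, 3), stratum_rr)
```

Under these assumptions the crude risk ratio is above 1 even though alcohol has no effect on head lice, while the risk ratio within each level of overcrowding is exactly 1. A residual association after conditioning would therefore point to overcrowding being inadequately measured or adjusted for.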
Fig. 4 Temporality using directed acyclic graphs (DAGs). Investigators may be more confident that the effect of alcohol consumption on active-TB is not due to reverse causality if (1) they condition upon active-TB before diagnosis and continue to observe an effect of alcohol consumption on active-TB after diagnosis or (2) if they do not observe an effect of active-TB before diagnosis on alcohol consumption
Fig. 5 Directed acyclic graph (DAG) with randomisation as the instrumental variable. According to this DAG, randomisation causes alcohol consumption. If this were true, there is a greater likelihood that the effect estimated would be similar or equivalent to the true causal effect
Summary of conclusions. Interpretation of each BH viewpoint based on mapping of DAGs, SCC models and GRADE
| Bradford Hill viewpoint | Summary of comparisons | Implications for causal assessment |
|---|---|---|
| Strength of association | Bradford Hill argued that the stronger an association, the less likely it could be explained by confounding, but did not make clear what should count as strong. DAGs and SCC models can be used to consider how other variables might impact investigators’ confidence in a strong association and the extent to which it should be relevant to causal assessment. This includes the impact of several confounding variables or unknown and unmeasured confounding variables depicted by DAGs and the impact of competing causes depicted by SCC models, respectively. GRADE suggests potential thresholds for what constitutes a strong association | Strength of association should be considered in relation to potential residual confounding from unknown or unmeasured variables |
| Consistency | DAGs highlight that transportability (using the causal effect in one context to make causal inferences about a different population) issues may emerge due to differences in the confounding structures. SCCs illustrate that differences in prevalence of competing causes may result in variable effect sizes. GRADE draws attention to the importance of focusing on unexplained statistical heterogeneity (unexplained effect sizes that differ between populations) | A distinction needs to be made between different types of consistency, namely transportability and unexplained statistical heterogeneity. Factors that may undermine transportability to another population may not undermine the causal relationship in that population. However, unexplained statistical heterogeneity may be used as evidence against a causal relationship |
| Specificity | One potential reason for specificity helping in causal assessment is that confounding cannot account for a specific relationship. DAGs can be used to extend this thinking to identify falsification exposures and outcomes. GRADE and SCC models reinforce Bradford Hill’s understanding of specificity, which is that a lack of specificity does not help with causal assessment | Specificity itself is rare and generally unhelpful in epidemiology. Falsification exposures or outcomes may strengthen evidence for a causal relationship, but may be difficult to identify |
| Temporality | DAGs explicitly incorporate the temporal ordering of variables and can be used to identify the potential biases due to reverse causality. Causal pies do not provide more insight, while GRADE privileges RCTs where the exposure necessarily precedes the outcome | Unchanged |
| Dose-response | Bradford Hill did not provide detailed explanations for how dose-response strengthened the evidence for causality. Similar to their use in strength of association, DAGs can be used to identify confounding variables which may create a spurious dose-response relationship. SCCs do not explicitly consider dose-response. GRADE currently uses the presence of a dose-response gradient to upgrade the certainty for a causal relationship | Dose-response is considered in both BH viewpoints and GRADE. However, it may not add as much to causal assessment as is commonly assumed, particularly if the impact of confounding variables is not considered alongside a dose-response gradient |
| Plausibility | DAGs and causal pies make assumptions about causal relationships explicit, thus they should be built upon plausibility. This transparency allows the plausibility of those assumptions to be interrogated by others. This, as well as the certainty assessed using GRADE, may provide evidence for the plausibility of the assumptions made in causal assessment | Plausibility can be formally encoded within DAGs to articulate the causal chain and in SCC models to articulate causal mechanisms, such as interaction between variables |
| Coherence | DAGs and causal pies do not typically consider coherence. GRADE does not consider coherence either, though it has been confused with indirectness. In practice, it is poorly delineated from plausibility | Utility not clearly supported |
| Experiment | Bradford Hill argued that experiment was the most important viewpoint for assessing causality. DAGs may help identify exchangeable groups (e.g. instrumental variables). SCC models do not explicitly consider experiments. GRADE privileges RCTs but does not discriminate between natural experiment studies and other NRSs | Consistent with what Bradford Hill argued, genuine experiments (trials), as well as quasi-experiments, can substantially strengthen causal inference |
| Analogy | Certainty in the causality of analogous relationships or analogous outcomes may strengthen a causal argument or may be useful in developing assumptions about the relationship; however, these are not embedded into DAGs or SCC models. GRADE considers analogous exposures within the body of evidence, but not whether assumptions about analogous relationships can be transported to the causal relationship under study | Utility not clearly supported |