Paul Montgomery, Ani Movsisyan, Sean P Grant, Geraldine Macdonald, Eva Annette Rehfuess.
Abstract
Public health interventions and health technologies are commonly described as 'complex', as they involve multiple interacting components and outcomes, and their effects are largely influenced by contextual interactions and system-level processes. Systematic reviewers and guideline developers evaluating the effects of these complex interventions and technologies report difficulties in using existing methods and frameworks, such as the Grading of Recommendations Assessment, Development and Evaluation (GRADE). As part of a special series of papers on the implications of complexity for WHO guideline development, this paper serves as a primer on how to consider sources of complexity when using the GRADE approach to rate the certainty of evidence. Relevant sources of complexity in systematic reviews, health technology assessments and guidelines in public health are outlined and mapped onto the reported difficulties in rating estimates of the effect of these interventions. Recommendations on how to address these difficulties are then outlined, and the need for integrated use of GRADE from the beginning of the review or guideline development process is emphasised. The content of this paper is informed by existing GRADE guidance, an ongoing research project on considering sources of complexity when applying the GRADE approach to rate certainty of evidence in systematic reviews, and the authors' own experiences with using GRADE.
Keywords: public health; systematic review
Year: 2019 PMID: 30775013 PMCID: PMC6350753 DOI: 10.1136/bmjgh-2018-000848
Source DB: PubMed Journal: BMJ Glob Health ISSN: 2059-7908
Mapping the main sources of complexity onto difficulties in rating estimates of the effect of interventions (data taken from Movsisyan et al, 2016; Petticrew et al, 2013; Petticrew et al, 2019; Rehfuess and Akl, 2013)2 3 10 69
| Source of complexity | Difficulties in rating estimates of the effect of interventions |
| Multiple components | Interventions comprise different components, which may interact (synergistically or dys-synergistically). Need to assess the effects of interventions as bundles or of specific intervention components |
| Flexibility or tailoring or non-standardisation of implementation | Ambiguities around how to assess fidelity to intervention implementation |
| Long causal pathways | Lack of direct evidence linking interventions with distal outcomes. Need to integrate different pieces of evidence from potentially different bodies of evidence to estimate the distal effects |
| Effects are contingent on recipients’ and providers’ agency | It may be impossible to blind recipients and providers of interventions |
| Multiple outcomes | Need to prioritise between a range of important (health and non-health) outcomes |
| Effects at different levels, for example, individual and population levels | Need to consider outcomes at different levels (eg, individual, family and societal levels). Population-level interventions are frequently impossible to evaluate using RCTs, which results in downgrading the ‘best evidence possible’ for these interventions because GRADE initially categorises evidence by study design |
| Moderating effects of context | Need to account for various implementation and contextual factors, when conceptualising and rating estimates of the effect |
GRADE, Grading of Recommendations Assessment, Development and Evaluation; RCT, randomised controlled trial.
Approaches for setting thresholds or ranges for certainty of evidence ratings (adapted from Hultcrantz et al, 2017)38
| Setting | Contextualisation | Threshold or range | How to set | What certainty rating represents |
| Primarily for systematic reviews and health technology assessment | Non-contextualised | Range: 95% CI | Using existing limits of the 95% CI, which implies that precision is not routinely part of the rating | Certainty that the effect lies within the CI |
| | | Threshold: OR≠1; RR≠1; HR≠1; RD≠0 | Using the threshold of null effect | Certainty that the effect of one treatment differs from another |
| Primarily for systematic reviews and health technology assessment | Partly contextualised | Specified magnitude of effect | For example, a ‘small’ effect is one small enough that the intervention would not be used if adverse effects or costs are appreciable | Certainty in a specified magnitude of effect for one outcome (eg, trivial, small, moderate or large) |
| Primarily for practice guidelines | Fully contextualised | Threshold determined with consideration of all critical outcomes | Considering the range of effects on all critical outcomes, and the values & preferences for those ranges | Confidence that the direction of the net effect will not differ from one end of the certainty range to the other |
CI, confidence interval; HR, hazard ratio; OR, odds ratio; RD, risk difference; RR, risk ratio.
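The difference between the non-contextualised (null-effect) and partly contextualised (specified-magnitude) thresholds in the table can be illustrated with a short sketch. The risk ratio, its 95% CI and the RR=0.90 smallest-important-effect threshold below are hypothetical values chosen for illustration, not figures from the paper:

```python
def crosses(ci_low, ci_high, threshold):
    """True if the 95% CI spans the given threshold on the ratio scale."""
    return ci_low < threshold < ci_high

# Hypothetical pooled risk ratio with its 95% CI
rr, lo, hi = 0.80, 0.68, 0.94

# Non-contextualised rating: the threshold is the null effect (RR = 1)
print("CI crosses null (RR=1):", crosses(lo, hi, 1.0))

# Partly contextualised rating: smallest important effect assumed at RR = 0.90
print("CI crosses RR=0.90:", crosses(lo, hi, 0.90))
```

Here the CI excludes the null, so a null-effect threshold raises no imprecision concern, yet the same CI crosses the assumed RR=0.90 threshold, so a partly contextualised rating would consider downgrading for imprecision: the certainty rating depends on which definition of 'certainty' is adopted.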
Key considerations for rating certainty in systematic reviews on the effects of complex interventions
| Recommendation | Rationale |
| 1. Use logic models to develop PICO and review questions | Logic models help in scoping, defining and conducting the review and in making the review relevant to policy and practice. Approaches have been developed to assist with this |
| 2. Identify which tools to use to best describe the sources of complexity that users will require | There are several newly developed tools on using a complexity perspective in systematic reviews, such as the approach by Petticrew |
| 3. Using these tools, identify contextual and implementation factors and other moderators of effect that may help explain heterogeneity and that will need separate GRADE certainty ratings | In addition to the standard PICO question, identify, in both the intervention and the system in which it is being used, all the complexities and interactions that review users will want to know about. Under intervention complexities, consider all aspects of its implementation, including the theory of why and how the intervention is expected to work, the process, the components, implementers, mediators, moderators, causal pathways (linear and non-linear) and important process outcomes. Under system complexities, consider context, setting (eg, individual or population level) and any other independent interventions taking place |
| 4. Define ‘certainty’ in a manner that matches the needs of the intended users of the review | Decide among the three approaches to defining certainty of evidence: ‘non-contextualised’, ‘partly contextualised’ and ‘fully contextualised’. In each case, specify the threshold or ranges used to rate certainty of evidence. For ‘non-contextualised’ reviews, consider the utility of using GRADE for the ‘non-null’ effect |
| 1. Initially rate any body of evidence as ‘high’ if a rigorous tool is used to assess risk of bias in NRSs (ie, ROBINS-I); otherwise, use the ‘standard’ GRADE guidance | Consider using the Cochrane Risk of Bias (RoB V.2.0) tool for randomised controlled trials. Consider using ROBINS-I for cohort-type studies |
| 2. Give extra scrutiny to the impact of lack of blinding of providers/participants on overall risk of bias for outcomes | If lack of blinding of either participants or providers is unlikely to affect assessment of an outcome (such as when using objective outcome measures, for example, mortality), then consider not downgrading evidence for lack of blinding for that outcome |
| 3. Consider the effect of bias associated with deviation from the intended intervention | Deviations, such as poor adherence, poor implementation and cointerventions in relation to the effect of starting and adhering to an intervention, may lead to bias, and evidence may be downgraded by one level. Consider not downgrading if assessing the effect of assignment to the intervention, when deviations do not occur in relation to usual practice and groups remain balanced |
| 4. Consider multiple criteria for judging inconsistency of evidence | Assessment of heterogeneity should always start with an appraisal of study heterogeneity, including heterogeneity in PICO elements as well as methodological aspects. Assessment of heterogeneity should take account of multiple rather than single criteria for inconsistency (eg, I2 and its p value, overlap of CIs and degree of variation within chosen thresholds). Consider whether the definition of certainty of evidence influences the nature of the inconsistency assessment (eg, when effect sizes across all studies are consistently in the same direction outside of the null effect or a given threshold of interest, downgrading for inconsistency is not warranted despite other measures). Consider different analytical methods to explain heterogeneity (eg, subgroup analysis, meta-regression and qualitative comparative analysis) |
| 5. Rate imprecision of evidence with regard to the adopted definition of ‘certainty’ | Consider whether the definition of certainty of evidence influences the nature of the imprecision assessment. For ‘non-contextualised’ systematic reviews, which define certainty as certainty that the effect lies within the estimated CIs or prediction intervals, a separate GRADE assessment for imprecision can usually be omitted, as precision is built into the chosen range. For ‘partly contextualised’ systematic reviews, consider whether the point estimate would represent a trivial, small, moderate or large absolute effect. For ‘fully contextualised’ systematic reviews, simultaneously consider all important outcomes to determine the precision of the effect estimate |
| 6. Examine indirectness of evidence by assessing important differences in the evidence base beyond what is expected | Consider grouping studies, synthesising evidence and rating certainty in the estimates of effect for separate outcomes according to the relevant sources of complexity identified at the start of the review. Consider splitting the questions to answer subset conditions, downgrading only for those with less certain evidence. Do not downgrade for indirectness if observed differences are unlikely to affect the outcome |
| 7. Consider publication bias | Conduct extensive grey literature searches and expert contacts to identify reports and working papers. Consider sponsorship of studies by any vested industries as well as potential ‘allegiance bias’ |
| 8. Consider upgrading evidence | Consider upgrading certainty of evidence for a dose–response relationship related to the level of implementation. Consider upgrading certainty for a body of evidence from studies with low implementation fidelity that nevertheless show positive results, as this counteracts plausible residual bias or confounding |
| 9. Use logic models to investigate coherence of evidence across the causal pathway | Consider assessing the coherence of evidence across different links in the causal pathway at the end of evidence synthesis. This judgement should be made outside of the GRADE framework |
CICI, Context and Implementation of Complex Interventions; GRADE, Grading of Recommendations Assessment, Development and Evaluation; iCAT-SR, Intervention Complexity Assessment Tool for Systematic Reviews; NRS, non-randomised study; PICO, Population, Intervention, Comparison, Outcome; PRISMA-CI, Preferred Reporting Items for Complex Interventions for Systematic Reviews and Meta-analyses; ROBINS-I, risk of bias in non-randomised studies; TIDieR, Template for Intervention Description and Replication.
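Recommendation 4's advice to weigh I2 alongside other criteria can be sketched numerically. The study effects and variances below are made-up illustrative numbers, and the formulas are the standard Cochran's Q and Higgins' I2 under inverse-variance weighting, not anything specified in the paper:

```python
def cochran_q_i2(effects, variances):
    """Cochran's Q and Higgins' I^2 (%) under inverse-variance weighting."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I^2 = (Q - df) / Q, floored at 0 and expressed as a percentage
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2

# Hypothetical log risk ratios and their variances for five studies
effects = [-0.50, -0.05, -0.40, 0.02, -0.30]
variances = [0.01, 0.01, 0.02, 0.01, 0.02]

q, i2 = cochran_q_i2(effects, variances)
print(f"Q = {q:.2f}, I^2 = {i2:.1f}%")
```

Even a high I2 such as this would not by itself settle the inconsistency rating: per the table, one would also examine the overlap of CIs and whether all effect estimates fall on the same side of the null or of the chosen threshold before deciding to downgrade.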
Figure 1. Example chain of evidence approach: screening and interventions for overweight in childhood. Arrow 1: Is there direct evidence that screening (and intervention) for overweight in childhood improves age-appropriate behavioural or physiological measures or health outcomes? Arrow 2: (1) What are appropriate standards for overweight in childhood, and what is the prevalence of overweight based on these? (2) What clinical screening tests for overweight in childhood are reliable and valid in predicting obesity in childhood? (3) What clinical screening tests for overweight in childhood are reliable and valid in predicting poor health outcomes in adulthood? Arrow 3: What are the adverse effects of screening, including labelling? Is screening acceptable to patients? Arrow 4: (1) Do weight control interventions lead to improved intermediate outcomes? (2) What are common behavioural and health system elements of efficacious interventions? (3) Are there differences in efficacy between patient subgroups? Arrow 5: Do weight control interventions lead to improved health outcomes and/or improved functioning? Arrow 6: What are the adverse effects of interventions? Are interventions acceptable to patients? Arrow 7: Are improvements in intermediate outcomes associated with improved health outcomes? (Only evaluated if there is no direct evidence for link 1 or link 5 and if there is sufficient evidence for link 4.) BMI, body mass index. Taken from Whitlock et al, 2005.57