
Interpreting and assessing confidence in network meta-analysis results: an introduction for clinicians.

Alan Yang1, Petros Pechlivanoglou1, Kazuyoshi Aoyama2,3.   

Abstract

PURPOSE: We aimed to provide clinicians with introductory guidance for interpreting and assessing confidence in network meta-analysis (NMA) results.
METHODS: We reviewed current literature on NMA and summarized key points.
RESULTS: Network meta-analysis (NMA) is a statistical method for comparing the efficacy of three or more interventions simultaneously in a single analysis by synthesizing both direct and indirect evidence across a network of randomized clinical trials. It has become increasingly popular in healthcare, since direct evidence (head-to-head randomized clinical trials) is not always available. NMA methods are categorized as either Bayesian or frequentist. While the two mostly provide similar results, they are theoretically different and require different interpretations of the results.
CONCLUSIONS: We recommend a careful approach to interpreting NMA results, because the validity of an NMA depends on its underlying statistical assumptions and the quality of the evidence used in the NMA.
© 2022. The Author(s).

Keywords:  Confidence intervals; Credible intervals; Indirect treatment comparisons; Multiple treatment comparisons; Network meta-analysis

Year:  2022        PMID: 35641661      PMCID: PMC9338903          DOI: 10.1007/s00540-022-03072-5

Source DB:  PubMed          Journal:  J Anesth        ISSN: 0913-8668            Impact factor:   2.931


Introduction

The highest level of evidence for the comparative effectiveness of different clinical interventions generally comes from systematic reviews of randomized controlled trials (RCTs) [1-3]. The most conventional and widely used method for synthesizing the results of different RCTs is pairwise meta-analysis [4, 5]. While this statistical approach is useful, it is limited because it can compare only two interventions at a time and can draw only on head-to-head RCTs of the comparison of interest [6]. Network meta-analysis (NMA) is a statistical method that extends the principles of pairwise meta-analysis to the concurrent evaluation of multiple interventions in a single process, which is achieved by combining both direct and indirect evidence from a network of clinical trials [4, 5, 7–9]. Direct evidence represents evidence obtained from head-to-head RCTs [4]. For example, in an RCT comparing interventions A and B, the estimate of relative effectiveness of A versus B counts as direct evidence. Indirect evidence represents evidence obtained through one or more common comparators; for example, in the absence of RCTs that evaluate interventions A and B directly, interventions A and B can be indirectly compared if both have been compared to a common intervention C in existing trials [4]. The combination of direct and indirect evidence is at the core of a network meta-analysis [5, 7, 8]. Other names for NMA include multiple treatment meta-analysis, indirect treatment comparisons, and mixed treatment comparisons [1, 10]. NMA has become attractive among clinicians and health-care researchers in recent years because of its ability to evaluate the comparative clinical effectiveness of different clinical interventions based on clinical evidence through a robust quantitative framework [3, 8, 11].
However, due to its complex structure and methodological requirements, a careful approach is required when interpreting NMA results, to avoid drawing biased or incorrect conclusions [3, 12]. This article aims to provide clinicians with introductory guidance for interpreting and assessing confidence in NMA results.
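The indirect comparison through a common comparator described above can be sketched numerically. The following is a minimal illustration of an adjusted indirect comparison (often called the Bucher method) on the log odds ratio scale, assuming that the A-versus-C and B-versus-C estimates come from independent sets of trials. The function name and all input values are hypothetical and are not taken from any study cited here.

```python
import math


def indirect_odds_ratio(or_ac, se_log_ac, or_bc, se_log_bc, z=1.96):
    """Adjusted indirect comparison of A versus B through comparator C.

    Inputs are odds ratios together with the standard errors of their
    logarithms. Returns (OR_AB, lower, upper) for an approximate 95% interval.
    """
    # On the log scale the indirect effect is a simple difference.
    log_or_ab = math.log(or_ac) - math.log(or_bc)
    # Variances of independent estimates add.
    se_log_ab = math.sqrt(se_log_ac ** 2 + se_log_bc ** 2)
    lower = math.exp(log_or_ab - z * se_log_ab)
    upper = math.exp(log_or_ab + z * se_log_ab)
    return math.exp(log_or_ab), lower, upper


# Hypothetical inputs: A vs C has OR 0.50, B vs C has OR 0.80.
or_ab, lower, upper = indirect_odds_ratio(0.50, 0.20, 0.80, 0.25)
print(f"Indirect OR, A vs B: {or_ab:.3f} (95% CI {lower:.2f} to {upper:.2f})")
```

Because the variances add, an indirect estimate built this way is always less precise than either of the two direct estimates it is derived from.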

Interpretation of NMA results

NMA has matured in recent years. NMA models are available for different types of individual-level and trial-level data and for different summary effect measures (e.g., odds ratio, risk difference), and they are implemented in both frequentist and Bayesian frameworks [2, 13, 14]. The interventions and the comparisons between them are typically displayed as a network, called a network diagram. Statistical approaches to NMA are broadly classified as frequentist or Bayesian [1, 2, 15]. The Bayesian framework is often considered a more natural fit for the indirect and multiple comparisons that are essential to an NMA, and 60–70% of NMA studies have adopted a Bayesian approach [16, 17]. The differences between the two methodological frameworks are outlined below. Although the two frameworks approach the NMA model from different fundamental concepts, they produce almost identical results if the sample size is large [17, 18]. Table 1 explains the common terms used in NMA in plain language, to help readers navigate the following paragraphs [1–5, 8, 11, 13, 17–28].
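To make the idea of a network diagram concrete, the sketch below derives the nodes and edges of an evidence network from a list of trials. The trial list and the function name are hypothetical, and a multi-arm trial is assumed to contribute one direct comparison for every pair of its arms.

```python
from collections import Counter
from itertools import combinations


def network_geometry(trials):
    """Return (nodes, edge_counts) for a list of trials.

    Each trial is given as the collection of interventions it compared.
    edge_counts maps an intervention pair to the number of trials that
    compare that pair head-to-head.
    """
    nodes = set()
    edge_counts = Counter()
    for arms in trials:
        arms = sorted(set(arms))
        nodes.update(arms)
        # Every pair of arms within a trial is a direct comparison.
        for pair in combinations(arms, 2):
            edge_counts[pair] += 1
    return nodes, edge_counts


# Hypothetical network: three two-arm trials and one three-arm trial.
trials = [("A", "C"), ("A", "C"), ("B", "C"), ("A", "B", "D")]
nodes, edges = network_geometry(trials)
for (first, second), n_trials in sorted(edges.items()):
    print(f"{first} - {second}: {n_trials} trial(s)")
```

In this made-up network, C and D are never compared head-to-head, so any estimate of C versus D would rest entirely on indirect evidence.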
Table 1

Network meta-analysis concepts and definitions

Term | Framework | Concept/definition
Indirect treatment comparison (ITC) | Bayesian and frequentist | A comparison of the relative effectiveness across different clinical interventions using data from separate non-head-to-head RCTs
Fixed effects model (FE) | Bayesian and frequentist | The fixed effects model assumes that there is one true effect size that underlies all the RCTs for each comparison in the network, and that all differences in the observed effect sizes are due to sampling error
Random effects model (RE) | Bayesian and frequentist | The random effects model assumes that the true effect size can differ from trial to trial
Likelihood function | Frequentist | The likelihood function characterizes the joint probability of the observed data as a function of the parameters of the statistical model
P value | Frequentist | The P value is the probability of obtaining a result at least as extreme as the observed result if the null hypothesis were true. P values are used to help determine whether to reject the null hypothesis: the smaller the P value, the stronger the evidence against it. If the P value is smaller than a pre-specified significance level (usually 5%), the null hypothesis is rejected at that significance level
Confidence interval | Frequentist | A confidence interval is a range of values, calculated from the observed data, that is likely to include an unknown population parameter. The confidence level of a confidence interval (usually 95%) is the probability that an interval produced by the method used to calculate it includes the true value of the parameter
Prior distribution | Bayesian | A prior distribution, or prior, of an unknown parameter, usually the mean effect size, is the probability distribution that represents one’s beliefs about this parameter before considering any evidence or observed data
Posterior distribution | Bayesian | The posterior distribution encapsulates all information about an unknown parameter, usually an effect size, after evidence and observed data are considered. It combines information from the prior distribution and the likelihood function
Posterior summaries | Bayesian | Summary statistics of a posterior distribution; often the mean, median, maximum, minimum, and standard deviation are reported
Credible intervals | Bayesian | A credible interval is an interval within which an unknown parameter value, usually an effect size, falls with a specific probability. It is an interval within a posterior distribution
Ranking probabilities; probability of best treatment; surface under the cumulative ranking curve (SUCRA) | Bayesian and frequentist | A ranking probability is the probability that an intervention is at a specific rank (first, second, etc.) when compared with the other interventions based on a statistic (e.g., mean odds, mean risk, median survival probability). The probability of best treatment is the probability that an intervention is ranked first. The SUCRA is a single number that summarizes the overall ranking of each intervention. Ranking probabilities and SUCRA range from 0 to 100%
Predictive distributions | Bayesian | The predictive distribution is the distribution of possible unobserved (new or forecasted) values given the observed values
Akaike information criterion (AIC) and Bayesian information criterion (BIC) | Frequentist | The AIC and the BIC are model fit assessments that attempt to explicitly balance model complexity with fit to the observed data. The BIC tends to penalize complex models more than the AIC
Deviance information criterion (DIC) | Bayesian | The DIC compares the relative fit of a set of Bayesian models. Like the AIC and the BIC, it is a model selection method that tries to explicitly balance model complexity with fit to the data
Network geometry | Bayesian and frequentist | The geometry of the network, usually presented as a network plot, consists of nodes (i.e., interventions), edges (i.e., direct comparison evidence), and the number of included studies (shown as the thickness of the edges)
Transitivity, similarity or exchangeability | Bayesian and frequentist | The selection of RCTs to formulate the NMA should be based on rigorous criteria, so that the included RCTs are similar and there are no systematic differences between them other than the interventions. That is, the trials being compared do not differ with respect to the distribution of effect modifiers
Heterogeneity | Bayesian and frequentist | The variation in trial outcomes between RCTs within the same comparison
Consistency | Bayesian and frequentist | The degree of agreement between estimates of effect sizes from direct and indirect evidence
Convergence | Bayesian | Samples from the fitted posterior distributions tend to the theoretical posterior distributions as the number of samples becomes adequately large
Effect modifiers | Bayesian and frequentist | Characteristics that impact the relative clinical intervention effects
Meta-regression | Bayesian and frequentist | A regression model that models trial-level or arm-level effect sizes with trial-level covariates. It is often used to reduce heterogeneity and inconsistency between RCTs in the network
I² | Frequentist | The I² statistic is the percentage of variation across RCTs that is due to unexplained heterogeneity rather than chance
T² | Frequentist | T² is the between-studies variance (the variance of the true effect size parameters across all RCTs) parametrized in the random effects model
τ² | Bayesian | τ² is the precision parameter, the inverse of the between-trial variance parameter in the random effects model. The lower the between-trial variance, the higher the precision
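
Several of the terms in Table 1 can be illustrated with a short calculation. The sketch below pools trial-level effect sizes for a single comparison with inverse-variance weights under a fixed effects model, then computes Cochran's Q, the I² statistic, and an estimate of the between-trial variance. The DerSimonian-Laird estimator used for the between-trial variance is one common choice and is an assumption here, since the text does not prescribe an estimator. The function name and the input values are hypothetical.

```python
def heterogeneity_summary(effects, std_errors):
    """Summarize heterogeneity for one pairwise comparison.

    effects: trial-level effect sizes (e.g., log odds ratios)
    std_errors: the standard errors of those effect sizes
    """
    # Fixed effects pooling: weight each trial by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_weight

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # I-squared: share of the variation not explained by sampling error.
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0

    # DerSimonian-Laird estimate of the between-trial variance.
    scale = total_weight - sum(w ** 2 for w in weights) / total_weight
    between_var = max(0.0, (q - df) / scale) if scale > 0 else 0.0

    return {"pooled": pooled, "Q": q, "I2": i_squared, "between_var": between_var}


# Hypothetical log odds ratios from three trials of the same comparison.
summary = heterogeneity_summary([-0.5, -0.2, -0.8], [0.20, 0.25, 0.30])
print(f"Pooled effect: {summary['pooled']:.3f}")
print(f"Q = {summary['Q']:.2f}, I2 = {summary['I2']:.1%}")
print(f"Between-trial variance: {summary['between_var']:.4f}")
```

A random effects model would add the estimated between-trial variance to each trial's own variance before weighting, which widens the interval around the pooled estimate when heterogeneity is present.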
The Bayesian method combines information known from the past (the prior) with the present data (the likelihood) to calculate the posterior (“post” data observation) probability that the research hypothesis holds [29]. The Bayesian method therefore takes a probabilistic approach: it allows us to calculate the probability that the research hypothesis holds true, the range within which the true effect size falls with a given probability (the 95% credible interval, CrI), and the ranking probabilities of interventions [8, 29, 30]. Moreover, these probabilities can change depending on the prior information [30]. The frequentist method, in contrast, uses only the present data to calculate a P value or a 95% confidence interval (CI), which are used to decide whether to reject the null hypothesis [7, 8, 17]. Table 2 highlights differences and similarities between frequentist and Bayesian approaches for NMA [4, 5, 15, 17, 18, 26, 31].
Table 2

Differences and similarities between frequentist and Bayesian approaches for network meta-analysis

Aspect | Frequentist framework | Bayesian framework
Prior information | Prior information is informally introduced, often in the form of supplementary text, and is underemphasized | Incorporated within user-specified prior distributions
Basic interpretation | How likely is it to observe the data given a specific parameter value? | How likely is a specific parameter value given the observed data?
Presentation of results | P values, confidence intervals, ranking probabilities | Posterior distributions, credible intervals, ranking probabilities
Caveat | P values are often misinterpreted as the probability that the alternative hypothesis is true. Confidence intervals are often misinterpreted as the probability that the true effect size lies in a particular interval | Priors may be difficult to choose. Readers often uncritically overemphasize the subjective component induced by the prior and therefore undermine the quality of the analysis. More complex to conduct
Additional features | Model fit and quality assessed with the Akaike information criterion or other similar criteria | Model fit and quality assessed with the deviance information criterion
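
The way a prior and a likelihood combine into a posterior can be sketched for the simplest case: a normal prior and a normally distributed estimate on the log odds ratio scale, for which the posterior has a closed form. This is a simplified single-comparison illustration, not a full Bayesian NMA, which is typically fitted by simulation in software such as WinBUGS or JAGS. The function name and all numbers are hypothetical.

```python
import math
from statistics import NormalDist


def normal_posterior(prior_mean, prior_sd, estimate, std_error):
    """Posterior for a normal prior combined with a normal likelihood."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / std_error ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    # The posterior mean is a precision-weighted average of prior and data.
    post_mean = post_var * (prior_precision * prior_mean + data_precision * estimate)
    return NormalDist(post_mean, math.sqrt(post_var))


# Hypothetical data: observed OR of 0.60 with a standard error of 0.25 (log scale).
log_or, se = math.log(0.60), 0.25
for label, prior_sd in [("vague prior", 10.0), ("sceptical prior", 0.30)]:
    post = normal_posterior(0.0, prior_sd, log_or, se)
    lower, upper = (math.exp(post.inv_cdf(p)) for p in (0.025, 0.975))
    prob_or_below_1 = post.cdf(0.0)  # posterior probability that OR < 1
    print(f"{label}: OR {math.exp(post.mean):.2f}, "
          f"95% CrI ({lower:.2f}, {upper:.2f}), P(OR < 1) = {prob_or_below_1:.3f}")
```

With the vague prior the posterior is driven almost entirely by the data, whereas the sceptical prior pulls the estimate toward no effect. This is why posterior probabilities and credible intervals can change with the prior.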

Illustration of interpretation of NMA results through a recent publication in the Journal of Anesthesia

The Journal of Anesthesia has recently published several NMAs [32-36]. We illustrate the interpretation of NMA results through published studies in the journal. One NMA examined the comparative effectiveness of interventions for managing postoperative catheter-related bladder discomfort (CRBD) [33]. A Bayesian NMA including 29 trials with 2841 participants was performed for this study. A total of 14 interventions including placebo were included in the evidence network. The effect sizes of interest were the odds ratios (ORs) of CRBD at 0 and 1 h after surgery. The results of a Bayesian NMA are usually presented as estimates of relative effect sizes accompanied by 95% CrIs. Relative effect sizes are often ratios (e.g., OR, risk ratio, hazard ratio); in such cases, if the credible interval contains 1, the comparators are not considered different in effect size. If the credible interval lies entirely above or below 1, the comparators are considered different in effect size, and whether the difference is favorable depends on the nature of the outcome of interest [5, 37]. For example, the estimated OR of CRBD at 0 h after surgery for ketamine versus placebo is 0.17 with a 95% CrI of (0.04, 0.82), which means the odds of CRBD at 0 h after surgery are lower with ketamine than with placebo. The 95% CrI also implies that the true OR of CRBD at 0 h after surgery for ketamine versus placebo has a 95% probability of lying between 0.04 and 0.82. The estimated OR of CRBD at 0 h after surgery for tramadol versus placebo is 0.26 with a 95% CrI of (0.04, 1.73). Since this 95% CrI contains 1, tramadol and placebo are not considered different in the odds of CRBD at 0 h after surgery, even though the point estimate favors tramadol.
A 95% CI under the frequentist approach does not have the same intuitive and practical interpretation; it indicates only whether the two interventions are statistically different in effect size at the 5% level of significance [37, 38]. A significance level of 5% means that there is a 5% risk of concluding that there is a difference when there is actually none. That is, if a result is statistically significant, it is unlikely to have occurred solely by chance.
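The frequentist reading of a 95% CI concerns the long-run behavior of the method across repeated studies, not the probability attached to any single interval. The simulation below is a sketch under simplifying assumptions: normally distributed estimates, a known standard error, and a true effect of zero. All values and the function name are hypothetical.

```python
import random


def interval_coverage(true_effect, std_error, n_trials=10_000, z=1.96, seed=1):
    """Share of simulated 95% CIs that contain the true effect."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(n_trials):
        # Each simulated study yields one estimate and one interval.
        estimate = rng.gauss(true_effect, std_error)
        lower = estimate - z * std_error
        upper = estimate + z * std_error
        if lower <= true_effect <= upper:
            covered += 1
    return covered / n_trials


# With a true effect of zero, an interval that misses zero is a false positive.
coverage = interval_coverage(true_effect=0.0, std_error=0.5)
print(f"Intervals containing the true effect: {coverage:.1%}")
print(f"Intervals wrongly excluding no effect: {1 - coverage:.1%}")
```

Roughly 95% of the simulated intervals contain the true effect and roughly 5% wrongly exclude it, which corresponds to the 5% risk of declaring a difference when none exists.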
Table 3

Available software and statistical packages for network meta-analysis as of December 13, 2021

Statistical package | Framework | Pros | Cons | URL
R | Bayesian and frequentist | Great flexibility, high-quality customizable graphs, free access | Limited user friendliness, steep learning curve, requiring extensive programming knowledge | https://www.r-project.org
WinBUGS/OpenBUGS/JAGS | Bayesian | Great flexibility, free access, accessible through other software (e.g., R) | Limited user friendliness, steep learning curve, requiring extensive programming knowledge, limited graphical functionality | https://www.mrc-bsu.cam.ac.uk/software/bugs/the-bugs-project-winbugs; https://www.mrc-bsu.cam.ac.uk/software/bugs/openbugs; https://mcmc-jags.sourceforge.io
SAS | Bayesian and frequentist | Great flexibility | Limited user friendliness, requiring fundamental programming knowledge, cost | https://www.sas.com
Stata | Bayesian and frequentist | High-quality graphs, variety of analyses available | Limited user friendliness, cost | https://www.stata.com
ADDIS/GeMTC | Bayesian | User friendliness, embeds well-developed methods and techniques that are ready to use | Limited modeling capabilities, limited graphical options | https://gemtc.drugis.org
We illustrate the interpretation of the results of a frequentist NMA through a study that examined the effects of individualized positive end-expiratory pressure (PEEP) combined with recruitment maneuver (RM) on intraoperative oxygenation during abdominal surgery [32]. A frequentist NMA including 15 trials with 3634 participants was performed for this study. A total of eight interventions were included in the evidence network. The main effect size of interest was the mean difference in oxygenation index. The results of a frequentist NMA are usually presented as estimates of absolute or relative effect sizes accompanied by 95% CIs. If the CI does not contain the value that indicates no difference (0 for difference-type effect sizes, 1 for ratio-type effect sizes), the comparators are statistically different in effect size, and whether the difference is favorable depends on the nature of the outcome of interest. For example, the estimated mean difference in oxygenation index between Individualized PEEP + RM and High PEEP is 145.0 with a 95% CI of (87.0, 202.9), which means the oxygenation index with Individualized PEEP + RM is estimated to be 145.0 higher than with High PEEP. The difference is statistically significant at the 5% level because the lower bound of the 95% CI (87.0) is greater than 0.
It is worthwhile to discuss the interpretation of ranking probabilities and the surface under the cumulative ranking curve (SUCRA), since these are often misinterpreted in the literature [27, 28, 39]. Table 1 also provides an explanation of these terms.
When interpreting these ranking statistics, one should also consider (1) the quality of evidence used in the NMA; (2) confidence in NMA results (further described in the next section); (3) the magnitude of differences in intervention effects; and (4) random chance that may explain any apparent differences between intervention rankings [3, 26, 27, 40]. That is, clinicians and decision makers should not assume an intervention to be “best” simply because it is ranked first, unless the aforementioned aspects of the NMA are fully considered.
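As a concrete illustration of how a SUCRA value relates to ranking probabilities, the sketch below computes it as the average of the cumulative ranking probabilities over all ranks except the last. The ranking probabilities are hypothetical and the function name is illustrative only.

```python
def sucra(rank_probabilities):
    """SUCRA for one intervention.

    rank_probabilities[j] is the probability that the intervention
    holds rank j + 1, where rank 1 is the best.
    """
    n_ranks = len(rank_probabilities)
    if n_ranks < 2:
        raise ValueError("SUCRA needs at least two interventions")
    cumulative = 0.0
    total = 0.0
    # Sum the cumulative probabilities for ranks 1 to n - 1.
    for probability in rank_probabilities[:-1]:
        cumulative += probability
        total += cumulative
    return total / (n_ranks - 1)


# Hypothetical ranking probabilities for three interventions.
ranking = {
    "A": [0.7, 0.2, 0.1],
    "B": [0.2, 0.5, 0.3],
    "C": [0.1, 0.3, 0.6],
}
for name, probabilities in ranking.items():
    print(f"{name}: P(best) = {probabilities[0]:.0%}, SUCRA = {sucra(probabilities):.0%}")
```

The calculation uses only the ordering of interventions, so a high SUCRA says nothing about how large the difference between interventions is or how certain the underlying evidence is.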

Confidence in NMA results

NMA inherits all challenges present in a conventional pairwise meta-analysis, but with magnified complexity due to the large number of comparisons within the evidence network [37]. To cope with these challenges, NMA relies on a set of assumptions that should be satisfied: (1) similarity or exchangeability, (2) homogeneity and (3) transitivity or consistency [8, 22, 23]. Definitions and concepts of these assumptions are described in Table 1. Typically, if the trial populations, trial designs and outcome measures are similar across the trials that compose the NMA, and the trials are comparable on effect modifiers (Table 1), these assumptions are adequately satisfied [22, 23]. If one or more assumptions are not satisfied, the NMA is prone to bias and may in turn yield biased and inaccurate results [41]. To prevent this, remedial measures and adjustments should be applied where appropriate. Methods for assessing NMA assumptions and remedial measures have been developed and widely adopted over the past few years [22, 23]. In addition to these statistical assumptions, the characteristics of trials in the evidence network that affect the certainty of evidence should be evaluated [42]. These characteristics include risk of bias and publication bias, and they are often assessed as part of the systematic review. These biases usually increase the uncertainty of individual trial evidence and subsequently of the synthesized evidence in an NMA [3]. In summary, violation of the similarity, homogeneity and consistency assumptions, as well as the presence of risk of bias or publication bias, affects the overall confidence in the results of an NMA. Therefore, when reviewing a published NMA, one should examine whether these issues were identified and how they were dealt with, and base one’s confidence in the NMA on these factors.
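To illustrate what checking consistency can look like for a single comparison, the sketch below contrasts a direct estimate with an indirect estimate using a simple z-test and also combines the two by inverse-variance weighting. It assumes that the two estimates are independent and approximately normal. Published NMAs typically rely on more general approaches, such as node-splitting, so this is only a simplified illustration. The function name and the input values are hypothetical.

```python
import math
from statistics import NormalDist


def consistency_check(direct, se_direct, indirect, se_indirect):
    """Compare direct and indirect estimates of the same effect."""
    # Inconsistency: difference between the two sources of evidence.
    difference = direct - indirect
    se_difference = math.sqrt(se_direct ** 2 + se_indirect ** 2)
    z = difference / se_difference
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))

    # Mixed estimate: inverse-variance weighted average of both sources.
    w_direct = 1.0 / se_direct ** 2
    w_indirect = 1.0 / se_indirect ** 2
    mixed = (w_direct * direct + w_indirect * indirect) / (w_direct + w_indirect)
    se_mixed = math.sqrt(1.0 / (w_direct + w_indirect))

    return {"difference": difference, "z": z, "p_value": p_value,
            "mixed": mixed, "se_mixed": se_mixed}


# Hypothetical log odds ratios for A versus B.
result = consistency_check(direct=-0.60, se_direct=0.20,
                           indirect=-0.10, se_indirect=0.30)
print(f"Direct minus indirect: {result['difference']:.2f} "
      f"(z = {result['z']:.2f}, P = {result['p_value']:.3f})")
print(f"Mixed estimate: {result['mixed']:.2f} (SE {result['se_mixed']:.2f})")
```

A large difference relative to its standard error suggests that direct and indirect evidence disagree, in which case a combined estimate should be interpreted with caution.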
GRADE (Grading of Recommendations, Assessment, Development and Evaluations) is a transparent framework for developing and presenting summaries of evidence [42, 43]. It is the most widely adopted tool for grading the quality of evidence with over 100 organizations worldwide officially endorsing GRADE [42]. GRADE provides a tool to assess the aforementioned statistical assumptions and evidence characteristics for any NMA [42-44]. We recommend reviewing the GRADE assessment of a published NMA if it is available. Other tools to assess the quality of an NMA include checklists published by the National Institute for Health and Care Excellence (NICE), the Professional Society for Health Economics and Outcomes Research (ISPOR), PRISMA and Medical Decision Making (MDM) [3, 26, 40, 45].

Using individual patient data in a network meta-analysis

Nowadays, as data become easier to collect and assess, we are entering an era of “big data”, with big data analysis emerging as a new analysis technique in clinical research [46]. We can utilize big data to improve the precision of an NMA. An NMA can turn into a big data analysis by incorporating individual patient data (IPD) into its evidence synthesis process [47, 48]. Conducting an NMA using IPD has benefits over a usual NMA using aggregated trial-level data. If there is interest in patient-specific covariates, either to explain between-study inconsistency or to explore intervention effects in subgroups of patients, using IPD can provide much more statistical power than using aggregated trial-level covariates [48]. Furthermore, several studies have shown that the use of IPD in NMA considerably improves the precision of estimates of intervention effects and regression coefficients in most scenarios [49, 50]. However, IPD may not provide a meaningful improvement for NMAs with large and dense intervention networks, since the amount of evidence is already large and adding IPD does little to improve the precision of the intervention effect estimates [47]. Because IPD may not be available from all eligible RCTs in most NMAs, techniques for combining IPD and aggregated trial-level data in an NMA have been developed [47, 50].

Conclusions

Network meta-analysis has become increasingly popular for synthesizing multiple sources of clinical evidence. By combining direct and indirect evidence from a network of clinical trials, it makes it possible to compare multiple clinical interventions even where head-to-head trials are not available. In doing so, it can produce less biased and more precise estimates of intervention efficacy. While Bayesian and frequentist methods often yield similar results, the two approaches differ fundamentally in their theoretical principles and, more importantly, require different interpretations of the results. The major limitation of NMA is that its results hinge on the inherent statistical assumptions of the NMA and the quality of the evidence used. The assumptions are strict and often difficult to satisfy, and the quality of the evidence is often difficult to ensure. Multiple requirements need to be met for the results to be sound and useful. Therefore, we recommend a thorough, careful, and conservative approach to interpreting and evaluating the results of an NMA. We also recommend using big data analysis techniques to integrate IPD into the NMA to improve the overall quality and precision of the NMA.
