Jonathan R Treadwell, Stephen J Tregear, James T Reston, Charles M Turkelson.
Abstract
BACKGROUND: Methods for describing one's confidence in the available evidence are useful for end-users of evidence reviews. Analysts inevitably make judgments about the quality, quantity, consistency, robustness, and magnitude of effects observed in the studies identified. The subjectivity of these judgments in several areas underscores the need for transparency.
DISCUSSION: This paper introduces a new system for rating medical evidence. The system requires explicit judgments and provides explicit rules for balancing these judgments. Unlike other systems for rating the strength of evidence, our system draws a distinction between two types of conclusions: quantitative and qualitative. A quantitative conclusion addresses the question, "How well does it work?", whereas a qualitative conclusion addresses the question, "Does it work?" In our system, quantitative conclusions are tied to stability ratings, and qualitative conclusions are tied to strength ratings. Our system emphasizes extensive a priori criteria for judgments to reduce the potential for bias. Further, the system makes explicit the impact of heterogeneity testing, meta-analysis, and sensitivity analyses on evidence ratings. This article provides details of our system, including graphical depictions of how the numerous judgments that an analyst makes can be combined. We also describe two worked examples of how the system can be applied to both interventional and diagnostic technologies.
Year: 2006 PMID: 17052350 PMCID: PMC1624842 DOI: 10.1186/1471-2288-6-52
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
A Guide to Terminology
| Quality | The extent to which studies are protected from bias |
| Quantity | The number of studies and the number of patients |
| Consistency | The extent to which different studies found similar results |
| Robustness | The extent to which minor alterations in the data do not change conclusions drawn from that data |
| Magnitude of effect | The effect size |
| | Quantitative conclusion | Qualitative conclusion |
| Clinical interpretation: | How well does it work? | Does it work? |
| Type of rating: | Stability | Strength |
| Interpretation of rating: | Confidence that future evidence will not indicate a different effect size | Confidence that future evidence will not indicate a different direction of effect |
| Possible ratings: | High, Moderate, Low, or Unstable | Strong, Moderate, Weak, or Inconclusive |
Judgments Involved in Systematic Reviews
| • What method will be used to assess study quality? |
| • If a quality scale is to be used, what scoring method will apply? |
| • What is the threshold for excluding a study from analysis due to poor quality? |
| • How will the individual study quality ratings be summarized to yield a single overall rating of quality to the evidence base (High, Moderate, or Low)? |
| • What is the minimum number of studies required to permit a quantitative estimate? |
| • What is the minimum percentage of studies reporting accurate information (i.e., calculable effect sizes) required to permit a quantitative estimate? |
| • What imputation methods will be used for studies that did not report sufficient information for a calculable effect size? |
| • What effect size measure will be used? |
| • How will heterogeneity be measured (Q or I²)? |
| • What is the threshold for considering an evidence base heterogeneous? |
| • Will the summary effect size estimate be derived from a fixed-effects or random-effects meta-analysis? |
| • What robustness tests will be used? |
| • If a cumulative meta-analysis is one of the robustness tests, will it be a fixed-effects or random-effects model? |
| • If a cumulative meta-analysis is one of the robustness tests, in what order will studies be entered into the cumulation? |
| • If a cumulative meta-analysis is one of the robustness tests, how many steps (i.e., study removals) will be examined to determine robustness? |
| • If a cumulative meta-analysis is one of the robustness tests, what threshold for a change in the summary effect size will be used to determine robustness? |
| • If publication bias testing is one of the robustness tests, which method of testing for publication bias will be used? |
| • If confidence interval width is one of the robustness tests, how narrow must the interval be for the summary estimate to be considered robust? |
| • Will overall robustness be judged based on passing all of the robustness tests, or simply a majority, or what percentage? |
| • What is the minimum number of studies required to perform meta-regression? |
| • Which covariates will be included in multiple regression models? |
| • How many covariates are permitted in any given regression model? |
| • Does "explaining heterogeneity" require a statistically significant covariate, or the lack of resultant heterogeneity, or both of these? |
| • What robustness tests will be performed for the meta-regression? |
| • What robustness tests will be performed? |
| • If a cumulative meta-analysis is one of the robustness tests, in what order will studies be entered into the cumulation? |
| • If a cumulative meta-analysis is one of the robustness tests, how many steps (i.e., study removals) will be examined to determine robustness? |
| • What is the size of a lowest possible effect that is still clinically important? |
| • What is the p value for statistical significance? |
| • What is the definition of a large magnitude of effect? |
| • How will qualitative consistency be defined (based on point estimates, confidence intervals, some percentage of studies, etc.)? |
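The heterogeneity judgments listed above (Q or I², and the threshold for calling an evidence base heterogeneous) can be illustrated with a minimal sketch. It assumes inverse-variance weights and the usual definitions of Cochran's Q and I²; the function name `heterogeneity` and the example numbers are hypothetical and not taken from the paper.

```python
def heterogeneity(effects, variances):
    """Return (Q, I2 in percent) for a set of study effect sizes.

    Assumes inverse-variance weights; names and inputs are illustrative only.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]
    # Fixed-effect (inverse-variance) pooled estimate.
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I2: share of total variation beyond what chance alone would explain.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2


q, i2 = heterogeneity([0.1, 0.5], [0.04, 0.04])
print(f"Q = {q:.2f}, I2 = {i2:.0f}%")
```

Whichever statistic is chosen, the threshold applied to it would be fixed a priori, in keeping with the list above.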
USPSTF System for Evaluating the Quality of Evidence (Harris et al. 2002)[1]
| • Aggregate internal validity of studies addressing the linkage |
| • Aggregate external validity of studies addressing the linkage |
| • Coherence/consistency of studies addressing the linkage |
| • Quality of the evidence for each linkage in the analytic framework |
| • Degree to which a complete chain of linkages supported by adequate evidence connects the preventive service to health outcomes |
| • Degree to which the complete chain of linkages "fit" together (a) |
| • Degree to which the evidence connecting the preventive service and health outcomes is "direct" (b) |
(a) "Fit" refers to the degree to which the linkages refer to the same population and conditions. For example, if studies of a screening linkage identify people who are different from those involved in studies of the treatment linkage, the linkages are not supported by evidence that "fits" together.
(b) "Directness" of evidence is inversely proportional to the number of bodies of evidence required to make the connection between the preventive service and health outcomes. Evidence is direct when a single body of evidence makes the connection, and more indirect if two or more bodies of evidence are required.
The GRADE System for Grading Quality of Evidence and Strength of Recommendations[2,5]
| |
| |
| 1) Limitations in quality can decrease grade one or two levels. |
| 2) Evidence of reporting bias can also decrease grade one level. |
| 3) Grade can be increased one level if all plausible confounders would have reduced the treatment effect. |
| |
| 1) Important inconsistency can decrease grade one level. |
| |
| 1) Some or major uncertainty about directness lowers the grade one or two levels. |
| |
| 1) Magnitude of effect can increase the grade of evidence. Strong evidence of association (significant relative risk of >2 or <0.5 based on consistent evidence from 2 or more observational studies with no plausible confounders) increases the grade by one level. Very strong evidence of association (significant relative risk of >5 or <0.2 based on direct evidence with no major threats to validity) increases the grade by two levels. |
| 2) Evidence of a dose-response gradient increases the grade by one level. |
| 3) Imprecise or sparse data can lower the grade by one level. |
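The up- and down-grading rules above amount to simple arithmetic on an ordinal scale, which the following sketch illustrates. It assumes GRADE's four levels (High, Moderate, Low, Very low) and a starting grade supplied by the caller; the function names, the `criteria_met` flag (standing in for the analyst's judgment about significance, consistency, confounders, and threats to validity), and the example values are hypothetical.

```python
LEVELS = ["Very low", "Low", "Moderate", "High"]  # assumed ordinal scale


def magnitude_bonus(relative_risk, criteria_met=True):
    """Levels gained for magnitude of effect, using the thresholds in the table.

    criteria_met is a placeholder for the accompanying judgments, which this
    sketch does not attempt to automate.
    """
    if not criteria_met:
        return 0
    if relative_risk > 5 or relative_risk < 0.2:
        return 2  # very strong evidence of association
    if relative_risk > 2 or relative_risk < 0.5:
        return 1  # strong evidence of association
    return 0


def adjust_grade(start, adjustments):
    """Apply signed level changes to a starting grade, clamped to the scale."""
    index = LEVELS.index(start) + sum(adjustments)
    return LEVELS[max(0, min(len(LEVELS) - 1, index))]


# Example: start at "Low", gain two levels for a relative risk of 6.0,
# then lose one level for sparse data.
print(adjust_grade("Low", [magnitude_bonus(6.0), -1]))
```

Clamping keeps the result on the scale when several downgrades accumulate.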
Figure 1. Entry Into System.
Figure 2. Overview of the High Quality Arm.
Figure 3. High Quality Arm: Homogeneous Data.
Figure 4. High Quality Arm: Heterogeneous Data.
Figure 5. High Quality Arm: Small Evidence Base.
Figure 6. Forest Plot Demonstrating the Quantitative/Qualitative Distinction. Plot showing the results of 14 randomized trials that compared drug-eluting stents (DES) to bare metal stents (BMS) and reported the percentages of patients who underwent target lesion revascularization after stent implantation. Sizes of the squares are proportional to study size, and 95% confidence intervals are shown as horizontal lines. There was unexplained heterogeneity among the trial results, so we did not estimate the size of the difference between groups. However, the random-effects meta-analytic confidence interval at the bottom of the plot showed the summary statistic was statistically significant.
Figure 7. Example of a Qualitative Robustness Test. Cumulative meta-analytic test of the qualitative robustness of seven randomized trials that compared pharmacotherapeutic treatment for bulimia to placebo and reported mean purging frequency. We performed a cumulative meta-analysis in which the study with the largest weight was entered first (the topmost horizontal segment in the plot), then the study with the next largest weight (the second one from the top), and so on. Each horizontal segment in the plot is a 95% confidence interval around a random-effects summary Hedges' d, a standardized mean difference. (The point estimates are omitted to emphasize that the analysis focuses only on confidence intervals, not point estimates.) In each of the last five analyses, the effect was statistically significant in the same direction. This met our a priori definition of qualitative robustness, which was that the qualitative conclusion must have remained the same after each of the last three or more studies were added.
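The test described in the Figure 7 caption can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses the DerSimonian-Laird estimator for the random-effects model, orders studies by inverse-variance weight, uses a normal-approximation 95% interval, and reads "the conclusion remained the same" as "statistically significant in the same direction". All function names are hypothetical.

```python
import math


def dl_interval(effects, variances):
    """95% CI for a DerSimonian-Laird random-effects summary (assumed model)."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Between-study variance, truncated at zero; zero when only one study.
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]
    est = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return est - 1.96 * se, est + 1.96 * se


def direction(interval):
    """+1 or -1 if the interval excludes zero, otherwise 0."""
    low, high = interval
    if low > 0:
        return 1
    if high < 0:
        return -1
    return 0


def qualitatively_robust(effects, variances, min_steps=3):
    """Enter studies largest weight first; check the last min_steps intervals."""
    order = sorted(range(len(effects)), key=lambda i: variances[i])
    intervals = []
    for k in range(1, len(order) + 1):
        chosen = order[:k]
        intervals.append(dl_interval([effects[i] for i in chosen],
                                     [variances[i] for i in chosen]))
    if len(intervals) < min_steps:
        return False, intervals
    last = [direction(ci) for ci in intervals[-min_steps:]]
    return last[0] != 0 and all(d == last[0] for d in last), intervals


robust, cis = qualitatively_robust([-0.5] * 5, [0.01, 0.02, 0.03, 0.04, 0.05])
print(robust)
```

The `min_steps` parameter corresponds to the a priori choice of how many final steps must agree; the caption's example used three or more.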
Figure 8. Informative and Non-Informative Effect Sizes. This figure is adapted from Armitage and Berry.[23] Each open diamond denotes a hypothetical meta-analytic summary statistic, and the horizontal segments denote 95% confidence intervals. The dashed vertical line indicates the effect size that was determined a priori to represent the minimum effect size that is considered clinically important. A meta-analytic summary statistic is considered informative if its confidence interval either excludes 0 or excludes a clinically important effect (or both). Thus, meta-analyses A through D each show informative results, whereas meta-analysis E shows a non-informative result.
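The informativeness rule in the Figure 8 caption can be written as a short check. The sketch below assumes effect sizes are oriented so that positive values favor the treatment and that a single positive threshold marks the minimum clinically important effect, as with the single dashed line in the figure; the function name and example intervals are hypothetical.

```python
def is_informative(ci_low, ci_high, min_important_effect):
    """True if the CI excludes 0, excludes a clinically important effect, or both.

    Assumes positive values favor the treatment and min_important_effect > 0.
    """
    excludes_zero = ci_low > 0 or ci_high < 0
    # The whole interval lies below the smallest clinically important effect.
    excludes_important = ci_high < min_important_effect
    return excludes_zero or excludes_important


# Hypothetical intervals with a minimum important effect of 0.5.
print(is_informative(0.6, 0.9, 0.5))   # excludes 0
print(is_informative(-0.2, 0.3, 0.5))  # excludes an important effect
print(is_informative(-0.2, 0.8, 0.5))  # excludes neither
```

A non-informative interval is one wide enough to contain both no effect and a clinically important effect, so it supports neither conclusion.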