Sonal Singh, Stephanie M Chang, David B Matchar, Eric B Bass.
INTRODUCTION: Grading the strength of a body of diagnostic test evidence involves challenges over and above those related to grading the evidence from health care intervention studies. This chapter identifies challenges and outlines principles for grading the body of evidence related to diagnostic test performance. CHALLENGES: Diagnostic test evidence is challenging to grade because standard tools for grading evidence were designed for questions about treatment rather than diagnostic testing; and the clinical usefulness of a diagnostic test depends on multiple links in a chain of evidence connecting the performance of a test to changes in clinical outcomes. PRINCIPLES: Reviewers grading the strength of a body of evidence on diagnostic tests should consider the principle domains of risk of bias, directness, consistency, and precision, as well as publication bias, dose response association, plausible unmeasured confounders that would decrease an effect, and strength of association, similar to what is done to grade evidence on treatment interventions. Given that most evidence regarding the clinical value of diagnostic tests is indirect, an analytic framework must be developed to clarify the key questions, and strength of evidence for each link in that framework should be graded separately. However if reviewers choose to combine domains into a single grade of evidence, they should explain their rationale for a particular summary grade and the relevant domains that were weighed in assigning the summary grade.Entities:
Year: 2012 PMID: 22648675 PMCID: PMC3364356 DOI: 10.1007/s11606-012-2021-9
Source DB: PubMed Journal: J Gen Intern Med ISSN: 0884-8734 Impact factor: 5.128
Example of the Impact of Precision of Sensitivity on Negative Predictive Value
Each cell shows the post-biopsy probability of having cancer after a negative core-needle biopsy result.

| Type of biopsy | Analysis results | Analysis overestimated sensitivity by 1% (e.g., sensitivity 97% rather than 98%) | Analysis overestimated sensitivity by 5% (e.g., sensitivity 93% rather than 98%) | Analysis overestimated sensitivity by 10% (e.g., sensitivity 88% rather than 98%) |
|---|---|---|---|---|
| Freehand automated gun | 6% | 6% | 8% | 9% |
| Ultrasound guidance automated gun | 1% | 1% | 3% | 5% |
| Stereotactic guidance automated gun | 1% | 1% | 3% | 5% |
| Ultrasound guidance vacuum-assisted | 2% | 2% | 3% | 6% |
| Stereotactic guidance vacuum-assisted | 0.4% | 0.8% | 3% | 5% |
*For a woman with a BI-RADS® 4 score following mammography and expected to have an approximate prebiopsy risk of malignancy of 30%. Note that an individual woman’s risk may differ from these estimates depending on her own individual characteristics [11]
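The post-biopsy probabilities above follow from Bayes’ theorem: P(cancer | negative result) = prevalence × (1 − sensitivity) / [prevalence × (1 − sensitivity) + (1 − prevalence) × specificity]. A minimal Python sketch shows how a 10-point overestimate of sensitivity shifts the post-negative-test risk; the 95% specificity is an illustrative assumption, not a value taken from the table:

```python
def post_neg_prob(prev: float, sens: float, spec: float) -> float:
    """Probability of disease after a negative test result (Bayes' theorem)."""
    false_neg = prev * (1 - sens)   # diseased patients the test misses
    true_neg = (1 - prev) * spec    # non-diseased patients correctly cleared
    return false_neg / (false_neg + true_neg)

# 30% pre-biopsy risk (BI-RADS 4); specificity assumed at 95% for illustration
for sens in (0.98, 0.93, 0.88):
    p = post_neg_prob(prev=0.30, sens=sens, spec=0.95)
    print(f"sensitivity {sens:.0%} -> post-negative-test risk {p:.1%}")
```

Under these assumed inputs the risk after a negative result climbs from roughly 1% at 98% sensitivity to roughly 5% at 88% sensitivity, mirroring the pattern in the table: a modest error in the sensitivity estimate materially changes the negative predictive value.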
Required and Additional Domains and their Definitions*
| Domain | Definition and Elements | Application to evaluation of diagnostic test performance |
|---|---|---|
| Risk of Bias | Risk of bias is the degree to which the included studies for a given outcome or comparison have a high likelihood of adequate protection against bias (i.e., good internal validity), assessed through main elements: | Use one of three levels of aggregate risk of bias: |
| | • Study design (e.g., RCTs or observational studies) | • Low risk of bias |
| | • Aggregate quality of the studies under consideration from the rating of quality (good/fair/poor) done for individual studies | • Medium risk of bias |
| | | • High risk of bias |
| | | Well-designed and executed studies of new tests compared to an adequate criterion standard are rated as “low risk of bias” |
| Consistency | Consistency is the degree to which reported study results (e.g., sensitivity, specificity, likelihood ratios) from included studies are similar. This can be assessed through two main elements: | Use one of three levels of consistency: |
| | • The range of study results is narrow | • Consistent (i.e., no inconsistency) |
| | • | • Inconsistent |
| | | • Unknown or not applicable (e.g., single study) |
| | | Single-study evidence bases should be considered as “consistency unknown (single study)” |
| Directness | Directness relates to whether the evidence links the interventions directly to outcomes. For a comparison of two diagnostic tests, directness implies head-to-head comparisons against a common criterion standard | Score dichotomously as one of two levels of directness: |
| | Directness may be contingent on the outcomes of interest | • Direct |
| | | • Indirect |
| | | When assessing the directness of the overarching question, if there are no studies linking the test to a clinical outcome, then evidence that only provides diagnostic accuracy outcomes would be considered indirect. If indirect, specify which of the two types of indirectness accounts for the rating (or both, if that is the case): namely, use of intermediate/surrogate outcomes rather than health outcomes, and use of indirect comparisons. If the decision is made to grade the strength of evidence of an intermediate outcome such as diagnostic accuracy, then the reviewer does not need to automatically “downgrade” that outcome for being indirect |
| Precision | Precision is the degree of certainty surrounding an effect estimate with respect to a given outcome (i.e., for each outcome separately) | Score dichotomously as one of two levels of precision: |
| | If a meta-analysis was performed, this will be the confidence interval around the summary measure(s) of test performance (e.g., sensitivity, true-positive rate) | • Precise |
| | | • Imprecise |
| | | A precise estimate is an estimate that would allow a clinically useful conclusion. An imprecise estimate is one for which the confidence interval is wide enough to include clinically distinct conclusions |
| Publication bias† | Publication bias indicates that studies may have been published selectively, with the result that the estimate of test performance based on published studies does not reflect the true effect. Methods to detect publication bias for medical test studies are not robust. Evidence from small studies of new tests or asymmetry in funnel plots should raise suspicion for publication bias | Publication bias can influence ratings of consistency, precision, magnitude of effect (and, to a lesser degree, risk of bias and directness). Reviewers should comment on publication bias when circumstances suggest that relevant empirical findings, particularly negative or no-difference findings, have not been published or are unavailable |
| Dose-response association | This association, either across or within studies, refers to a pattern of a larger effect with greater exposure (dose, duration, and adherence) | The dose-response association may support an underlying mechanism of detection and potential relevance for some tests that have continuous outcomes and possibly multiple cutoffs [e.g., gene expression, serum PSA (prostate-specific antigen) levels, and ventilation/perfusion scanning] |
| Plausible unmeasured confounding and bias that would decrease an observed effect or increase an effect if none was observed | Occasionally, in an observational study, plausible confounding factors would work in the direction opposite to that of the observed effect. Had these confounders not been present, the observed effect would have been larger. In such cases the evidence can be upgraded | The impact of plausible unmeasured confounders may be relevant to testing strategies that predict outcomes. A study may be biased to find low diagnostic accuracy via spectrum bias and yet despite this find very high diagnostic accuracy |
| Strength of association (magnitude of effect) | Strength of association refers to the likelihood that the observed effect or association is large enough that it cannot have occurred solely as a result of bias from potential confounding factors | The strength of association may be relevant when comparing the accuracy of two different medical tests with one being more accurate than the other |
| | It is possible that the accuracy of a test is better than the reference standard because of an imperfect reference standard | It is important to consider this and modify the analysis to take into consideration alternative assumptions about the best reference standard |
*Adapted from the Methods Guide for Effectiveness and Comparative Effectiveness Reviews [3]
†The GRADE approach is moving towards considering publication bias a GRADE principal domain
Abbreviations: EPC = Evidence-based Practice Center
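To make the precision domain above concrete: judging whether a confidence interval around a sensitivity estimate “allows a clinically useful conclusion” requires computing that interval. A sketch using the Wilson score interval for a proportion, with hypothetical study counts (not data from this chapter):

```python
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion such as sensitivity (z=1.96 -> ~95% CI)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Hypothetical single study: 45 true positives among 50 diseased patients
lo, hi = wilson_ci(45, 50)
print(f"point estimate 90%, 95% CI {lo:.1%} to {hi:.1%}")
```

With these counts the interval spans roughly 79% to 96%; a reviewer might rate such an estimate imprecise if clinical decisions would differ at the two extremes of the interval.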
Steps in Grading a Body of Evidence on Diagnostic Test Accuracy Outcomes
*Adapted from the Methods Guide for Effectiveness and Comparative Effectiveness Reviews [3]
Illustration of the Approach to Grading a Body of Evidence on Diagnostic Tests: Identifying Norovirus in a Healthcare Setting*
The columns from Risk of Bias through Publication Bias record the decrease applied to the starting grade.

| Outcome | Quantity and type of evidence | Findings | Starting grade | Risk of Bias‡ | Consistency‡ | Directness‡ | Precision‡ | Publication Bias‡ | GRADE of Evidence for Outcome | Overall GRADE§ |
|---|---|---|---|---|---|---|---|---|---|---|
| Sensitivity† | 1 DIAG | 68% | High | 0 | 0 | 0 | -1 | 0 | Moderate | Moderate |
| Specificity† | 1 DIAG | 99% | High | 0 | 0 | 0 | -1 | 0 | Moderate | |
| PPV† | 1 DIAG | 97% | High | 0 | 0 | 0 | -1 | 0 | Moderate | |
| NPV† | 1 DIAG | 82% | High | 0 | 0 | 0 | -1 | 0 | Moderate | |
*Adapted from MacCannell T, Umscheid CA, Agarwal RK, Lee I, Kuntz G, Stevenson KB, and the Healthcare Infection Control Practices Advisory Committee. Guideline for the prevention and control of norovirus gastroenteritis outbreaks in healthcare settings. Infection Control and Hospital Epidemiology. 2011; 32(10): 939-969 [20, 21]
†These outcomes were considered the most critical by the guideline developers
‡These modifiers can impact the GRADE by 1 or 2 points
§Consider the additional domains of strength of association, dose-response and impact of plausible confounders if applicable
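The arithmetic in the worked example above, starting at High and subtracting the modifier columns, can be sketched as follows. The four-level scale and the floor at Insufficient reflect the EPC strength-of-evidence convention; treat this mapping as an illustration of the bookkeeping, not the official grading algorithm:

```python
LEVELS = ["Insufficient", "Low", "Moderate", "High"]  # EPC strength-of-evidence scale

def grade_outcome(deductions: dict[str, int], start: str = "High") -> str:
    """Apply per-domain deductions (0, -1, or -2) to a starting grade."""
    score = LEVELS.index(start) + sum(deductions.values())
    return LEVELS[max(score, 0)]  # never drop below Insufficient

# Sensitivity row from the norovirus example: only precision is downgraded
mods = {"risk_of_bias": 0, "consistency": 0, "directness": 0,
        "precision": -1, "publication_bias": 0}
print(grade_outcome(mods))  # prints "Moderate"
```

Per footnote § above, the additional domains (strength of association, dose-response, plausible confounders) can also upgrade the result; that step is omitted here for brevity.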