Thomas A Trikalinos, Cynthia M Balion, Craig I Coleman, Lauren Griffith, Pasqualina L Santaguida, Ben Vandermeer, Rongwei Fu.
Abstract
Synthesizing information on test performance metrics such as sensitivity, specificity, predictive values and likelihood ratios is often an important part of a systematic review of a medical test. Because many metrics of test performance are of interest, the meta-analysis of medical tests is more complex than the meta-analysis of interventions or associations. Sometimes, a helpful way to summarize medical test studies is to provide a "summary point", that is, a summary sensitivity and a summary specificity. Other times, when the sensitivity or specificity estimates vary widely or when the test threshold varies, it is more helpful to synthesize data using a "summary line" that describes how the average sensitivity changes with the average specificity. Choosing the most helpful summary is subjective, and in some cases both summaries provide meaningful and complementary information. Because sensitivity and specificity are not independent across studies, the meta-analysis of medical tests is fundamentally a multivariate problem and should be addressed with multivariate methods. More complex analyses are needed if studies report results at multiple thresholds for positive tests. At the same time, quantitative analyses are used to explore and explain any observed dissimilarity (heterogeneity) in the results of the examined studies. This can be performed in the context of proper (multivariate) meta-regressions.
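
One common way to write down the joint (multivariate) model the abstract refers to is a bivariate random-effects model with binomial within-study likelihoods. The notation below is an assumption introduced for illustration and does not appear in the source: study $i$ reports $TP_i$ true positives among $n_{1i}$ diseased participants and $TN_i$ true negatives among $n_{0i}$ non-diseased participants.

$$
\begin{aligned}
TP_i &\sim \mathrm{Binomial}(n_{1i},\, Se_i), \qquad TN_i \sim \mathrm{Binomial}(n_{0i},\, Sp_i), \\
\begin{pmatrix} \operatorname{logit} Se_i \\ \operatorname{logit} Sp_i \end{pmatrix}
&\sim \mathcal{N}\!\left(
\begin{pmatrix} \mu_{Se} \\ \mu_{Sp} \end{pmatrix},
\begin{pmatrix} \sigma_{Se}^2 & \rho\,\sigma_{Se}\sigma_{Sp} \\ \rho\,\sigma_{Se}\sigma_{Sp} & \sigma_{Sp}^2 \end{pmatrix}
\right)
\end{aligned}
$$

In this sketch the "summary point" is $\left(\operatorname{logit}^{-1}\mu_{Se},\ \operatorname{logit}^{-1}\mu_{Sp}\right)$, and $\rho$ carries the between-study correlation that separate univariate analyses would ignore.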
Year: 2012 PMID: 22648676 PMCID: PMC3364353 DOI: 10.1007/s11606-012-2029-1
Source DB: PubMed Journal: J Gen Intern Med ISSN: 0884-8734 Impact factor: 5.128
Figure 1. Typical data on the performance of a medical test (D-dimers for venous thromboembolism). Eleven studies on ELISA-based D-dimer assays for the diagnosis of venous thromboembolism [15]. The top panel (a) depicts studies as markers, labeled by author names and thresholds for a positive test (in ng/mL). Studies in the lightly shaded area on the left have a positive likelihood ratio of at least 10. Studies in the lightly shaded area at the top have a negative likelihood ratio of at most 0.1. Studies at the intersection of the gray areas (darker gray polygon) have both a positive likelihood ratio of at least 10 and a negative likelihood ratio of 0.1 or less. The second panel (b) shows ‘paired’ forest plots in ascending order of sensitivity (left) along with the corresponding specificity (right). Note how sensitivity increases with decreasing specificity, which could be explained by a “threshold effect”. The third panel (c) shows the respective negative and positive likelihood ratios.
Figure 2. Obtaining summary (overall) metrics for medical test performance. PLR/NLR = positive (negative) likelihood ratio; PPV/NPV = positive (negative) predictive value; Prev = prevalence; Se = sensitivity; Sp = specificity. The approach recommended here is to perform a meta-analysis for sensitivity and specificity across the K studies, and then use the summary sensitivity and specificity (Se and Sp; a row of two boxes after the horizontal black line) to back-calculate “overall” values for the other metrics (second row of boxes after the horizontal black line). In most cases it is not meaningful to synthesize prevalences (see text).
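
The back-calculation described in this caption can be sketched in a few lines of Python. This is a minimal illustration using the standard definitions of the metrics, not code from the article; the function names are hypothetical, and the prevalence of 10% in the example is an arbitrary assumption (the sensitivity and specificity of 80% and 97% are the values used in Fig. 4b).

```python
def _check_proportion(name, value):
    """Reject values outside the open interval (0, 1) to avoid division by zero."""
    if not 0.0 < value < 1.0:
        raise ValueError(f"{name} must lie strictly between 0 and 1, got {value}")


def likelihood_ratios(se, sp):
    """Return (PLR, NLR) from a summary sensitivity and specificity (proportions)."""
    _check_proportion("se", se)
    _check_proportion("sp", sp)
    plr = se / (1.0 - sp)        # positive likelihood ratio
    nlr = (1.0 - se) / sp        # negative likelihood ratio
    return plr, nlr


def predictive_values(se, sp, prev):
    """Return (PPV, NPV) for an assumed prevalence, using Bayes' theorem."""
    _check_proportion("se", se)
    _check_proportion("sp", sp)
    _check_proportion("prev", prev)
    true_pos = se * prev
    false_pos = (1.0 - sp) * (1.0 - prev)
    true_neg = sp * (1.0 - prev)
    false_neg = (1.0 - se) * prev
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Example: Se = 0.80 and Sp = 0.97 as in Fig. 4b; prevalence 0.10 is assumed.
plr, nlr = likelihood_ratios(0.80, 0.97)
ppv, npv = predictive_values(0.80, 0.97, 0.10)
print(f"PLR={plr:.2f} NLR={nlr:.3f} PPV={ppv:.3f} NPV={npv:.3f}")
```

Because the predictive values depend on prevalence, the prevalence is passed in explicitly rather than pooled across studies, in line with the caption's remark that synthesizing prevalences is usually not meaningful.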
Table 1. Commonly Used Methods for Meta-Analysis of Medical Test Performance
| Method | Description or comment | Does it have desired characteristics? |
|---|---|---|
| Independent meta-analysis of sensitivity and specificity | Separate meta-analyses per metric; within-study variability preferably modeled by the binomial distribution | Ignores correlation between sensitivity and specificity; underestimates summary sensitivity and specificity and gives incorrect confidence intervals |
| Joint (multivariate) meta-analysis of sensitivity and specificity based on hierarchical modeling | Based on multivariate (joint) modeling of sensitivity and specificity; two families of models; modeling preferably using binomial likelihood rather than normal approximations | The generally preferred method |
| Moses and Littenberg model | Summary line based on a simple regression of the difference of logit-transformed true and false positive rates versus their average | Ignores unexplained variation between studies (fixed effects); does not account for correlation between sensitivity and specificity; does not account for variability in the independent variable; cannot weight studies optimally and yields wrong inferences when covariates are used |
| Random intercept augmentation of the Moses-Littenberg model | Regression of the difference of logit-transformed true and false positive rates versus their average, with random effects to allow for variability across studies | Does not account for correlation between sensitivity and specificity; does not account for variability in the independent variable |
| Summary ROC based on hierarchical modeling | Same as for multivariate meta-analysis to obtain a summary point (hierarchical modeling); many ways to obtain a (hierarchical) summary ROC: Rutter-Gatsonis (most common), several alternative curves | Most theoretically motivated method; Rutter-Gatsonis HSROC recommended in the Cochrane handbook |
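
To make the Moses and Littenberg row concrete, the following is a rough sketch of the unweighted regression it describes: the difference of the logit-transformed true and false positive rates is regressed on their average. It is shown only to illustrate the mechanics, since the table lists this method's shortcomings. The function names are hypothetical, the 2×2 counts are made up for illustration, and adding 0.5 to every cell is an assumed continuity correction that the source does not specify.

```python
import numpy as np


def logit(p):
    """Log-odds of a proportion (or array of proportions)."""
    p = np.asarray(p, dtype=float)
    return np.log(p / (1.0 - p))


def fit_line(logit_tpr, logit_fpr):
    """Unweighted least squares of D on A; returns (intercept, slope)."""
    logit_tpr = np.asarray(logit_tpr, dtype=float)
    logit_fpr = np.asarray(logit_fpr, dtype=float)
    d = logit_tpr - logit_fpr             # difference (log diagnostic odds ratio)
    a = (logit_tpr + logit_fpr) / 2.0     # average (proxy for the threshold)
    slope, intercept = np.polyfit(a, d, 1)
    return intercept, slope


def moses_littenberg(tp, fn, fp, tn, correction=0.5):
    """Fit the summary line from per-study 2x2 counts.

    `correction` is added to every cell (an assumption made here).
    """
    tp, fn, fp, tn = (np.asarray(x, dtype=float) + correction
                      for x in (tp, fn, fp, tn))
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return fit_line(logit(tpr), logit(fpr))


def sroc_sensitivity(fpr, intercept, slope):
    """Sensitivity implied by the fitted line at a given false positive rate."""
    lf = logit(fpr)
    # Solve lt - lf = intercept + slope * (lt + lf) / 2 for lt.
    lt = (intercept + lf * (1.0 + slope / 2.0)) / (1.0 - slope / 2.0)
    return 1.0 / (1.0 + np.exp(-lt))


# Made-up counts for four hypothetical studies (illustration only).
tp = [45, 60, 38, 72]
fn = [5, 15, 4, 30]
fp = [20, 12, 25, 6]
tn = [80, 110, 70, 140]
intercept, slope = moses_littenberg(tp, fn, fp, tn)
print(intercept, slope, sroc_sensitivity(0.10, intercept, slope))
```

Note that the sketch treats the average logit as a fixed predictor and weights all studies equally, which is exactly the behavior the table flags as a limitation.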
Figure 3. Graphical presentation of studies reporting data at multiple thresholds. Ability of early total serum bilirubin measurements to identify postdischarge total serum bilirubin above the 95th hour-specific percentile. Sensitivity and 100 percent minus specificity pairs from the same study (obtained with different cut-offs for the early total serum bilirubin measurement) are connected with lines. These lines are reconstructed based on the reported cut-offs, and are not perfect representations of the actual ROC curves in each study (they show only a few thresholds that could be extracted from the study). Studies in the lightly shaded area on the left have a positive likelihood ratio of at least 10. Studies in the lightly shaded area at the top have a negative likelihood ratio of at most 0.1. Studies at the intersection of the gray areas (darker gray polygon) have both a positive likelihood ratio of at least 10 and a negative likelihood ratio of 0.1 or less [41].
Figure 4. HSROC for the ELISA-based D-dimer tests. (a) Hierarchical summary receiver-operator curve (HSROC) of the studies plotted in Fig. 1a. (b) Calculated negative predictive value for the ELISA-based D-dimer test if the sensitivity and specificity are fixed at 80% and 97%, respectively, and prevalence of venous thromboembolism varies from 5 to 50%.
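
The prevalence dependence in panel (b) follows directly from Bayes' theorem. As a worked sketch at the two ends of the plotted range (the symbol $p$ for prevalence is introduced here for illustration, and the results are rounded to three decimals):

$$
\begin{aligned}
\mathrm{NPV}(p) &= \frac{Sp\,(1-p)}{Sp\,(1-p) + (1-Se)\,p}, \\
\mathrm{NPV}(0.05) &= \frac{0.97 \times 0.95}{0.97 \times 0.95 + 0.20 \times 0.05} = \frac{0.9215}{0.9315} \approx 0.989, \\
\mathrm{NPV}(0.50) &= \frac{0.97 \times 0.50}{0.97 \times 0.50 + 0.20 \times 0.50} = \frac{0.485}{0.585} \approx 0.829.
\end{aligned}
$$

With sensitivity and specificity held fixed, the negative predictive value therefore drops from roughly 99% to roughly 83% as prevalence rises from 5% to 50%.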
Figure 5. Sensitivity versus 1 − specificity plot for studies of serial CK-MB measurements. The left panel shows the sensitivity and specificity of 14 studies according to the timing of the last serial CK-MB measurement for diagnosis of acute cardiac ischemia. The numbers next to each study point are the actual length of the time interval from symptom onset to last serial CK-MB measurement. Filled circles: at most 3 hours; “x” marks: longer than 3 hours. The right panel plots the summary points and the 95% confidence regions for the aforementioned subgroups of studies (at most 3 hours: filled circles; longer than 3 hours: “x” marks). Estimates are based on a bivariate meta-regression using the time interval as a predictor. The predictor has distinct effects for sensitivity and specificity. This is the same analysis as in Table 2.
Table 2. Meta-Regression-Based Comparison of Diagnostic Performance
| Meta-analysis metric | ≤3 hours | >3 hours | p-Value for the comparison across subgroups |
|---|---|---|---|
| Summary sensitivity (percent) | 80 (64 to 90) | 96 (85 to 99) | 0.036 |
| Summary specificity (percent) | 97 (94 to 98) | 97 (95 to 99) | 0.56 |
Results are based on a bivariate meta-regression that effectively compared the summary sensitivity and summary specificity according to the timing of the last serial CK-MB measurement for diagnosis of acute cardiac ischemia. The meta-regression is on a variable that takes the value 1 if the time from the onset of symptoms to testing was 3 hours or less, and the value 0 if that interval was more than 3 hours. The bivariate meta-regression model allows for different effects of timing on sensitivity and specificity. To facilitate interpretation, we present the summary sensitivity and specificity in each subgroup, calculated from the parameters of the meta-regression model, which also gave the p-values for the effect of timing on test performance.
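
One way to read this footnote is as a bivariate model with a separate covariate effect on each outcome. The parameterization below is an assumption for illustration: the source does not state the link function or the symbols, $x_i$ is the timing indicator (1 for 3 hours or less, 0 otherwise), and $(u_i, v_i)$ are correlated study-level random effects.

$$
\begin{aligned}
\operatorname{logit} Se_i &= \alpha_{Se} + \beta_{Se}\,x_i + u_i, \\
\operatorname{logit} Sp_i &= \alpha_{Sp} + \beta_{Sp}\,x_i + v_i, \\
\text{summary } Se\ (>3\text{ hours}) &= \operatorname{logit}^{-1}(\alpha_{Se}), \qquad
\text{summary } Se\ (\le 3\text{ hours}) = \operatorname{logit}^{-1}(\alpha_{Se} + \beta_{Se}).
\end{aligned}
$$

Under this assumed logit parameterization, the reported summary sensitivities of 96% and 80% would correspond to roughly $\alpha_{Se} \approx 3.18$ and $\beta_{Se} \approx -1.79$, and the p-value in the table tests whether the timing effect is zero. The subgroup specificities are obtained analogously from $\alpha_{Sp}$ and $\beta_{Sp}$.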