Richard D Riley, Ikhlaaq Ahmed, Thomas P A Debray, Brian H Willis, J Pieter Noordzij, Julian P T Higgins, Jonathan J Deeks.
Abstract
Following a meta-analysis of test accuracy studies, the translation of summary results into clinical practice is potentially problematic. The sensitivity, specificity and positive (PPV) and negative (NPV) predictive values of a test may differ substantially from the average meta-analysis findings, because of heterogeneity. Clinicians thus need more guidance: given the meta-analysis, is a test likely to be useful in new populations, and if so, how should test results inform the probability of existing disease (for a diagnostic test) or future adverse outcome (for a prognostic test)? We propose ways to address this. Firstly, following a meta-analysis, we suggest deriving prediction intervals and probability statements about the potential accuracy of a test in a new population. Secondly, we suggest strategies on how clinicians should derive post-test probabilities (PPV and NPV) in a new population based on existing meta-analysis results and propose a cross-validation approach for examining and comparing their calibration performance. Application is made to two clinical examples. In the first example, the joint probability that both sensitivity and specificity will be >80% in a new population is just 0.19, because of a low sensitivity. However, the summary PPV of 0.97 is high and calibrates well in new populations, with a probability of 0.78 that the true PPV will be at least 0.95. In the second example, post-test probabilities calibrate better when tailored to the prevalence in the new population, with cross-validation revealing a probability of 0.97 that the observed NPV will be within 10% of the predicted NPV.
Keywords: calibration; diagnostic; discrimination; meta-analysis; prognostic; test accuracy
Year: 2015 PMID: 25800943 PMCID: PMC4973708 DOI: 10.1002/sim.6471
Source DB: PubMed Journal: Stat Med ISSN: 0277-6715 Impact factor: 2.373
Summary of the 11 temperature studies identified for meta‐analysis; each study used a threshold of 38 °C to define fever, an electronic rectal thermometer and a FirstTemp ear thermometer.
| First author | r11i | n1 | Sensitivity | r00i | n0 | Specificity | Observed prevalence |
|---|---|---|---|---|---|---|---|
| Brennan | 150 | 203 | 0.74 | 155 | 167 | 0.93 | 0.55 |
| Davis | 9 | 18 | 0.50 | 46 | 48 | 0.96 | 0.27 |
| Green | 8 | 9 | 0.89 | 12 | 12 | 1.00 | 0.43 |
| Greenes | 53 | 109 | 0.49 | 193 | 195 | 0.99 | 0.36 |
| Hoffman | 30 | 42 | 0.71 | 56 | 58 | 0.97 | 0.42 |
| Hooker | 10 | 15 | 0.67 | 24 | 24 | 1.00 | 0.38 |
| Lanham | 53 | 103 | 0.51 | 74 | 75 | 0.99 | 0.58 |
| Muma | 48 | 87 | 0.55 | 136 | 136 | 1.00 | 0.39 |
| Nypaver | 282 | 425 | 0.66 | 445 | 453 | 0.98 | 0.48 |
| Rhoads | 7 | 27 | 0.26 | 38 | 38 | 1.00 | 0.42 |
| Stewart | 57 | 59 | 0.97 | 20 | 20 | 1.00 | 0.75 |
r11i, number of true positives; n1, number with fever.
r00i, number of true negatives; n0, number without fever.
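As a quick check on how the columns relate, the sketch below recomputes sensitivity, specificity and observed prevalence from the four counts in a row. The function name `accuracy_from_counts` is illustrative and does not come from the paper.

```python
def accuracy_from_counts(r11, n1, r00, n0):
    """Return (sensitivity, specificity, prevalence) for one study.

    r11: true positives, n1: number with the condition,
    r00: true negatives, n0: number without the condition.
    """
    sensitivity = r11 / n1
    specificity = r00 / n0
    prevalence = n1 / (n1 + n0)
    return sensitivity, specificity, prevalence


# Brennan row of the temperature table: r11=150, n1=203, r00=155, n0=167
sens, spec, prev = accuracy_from_counts(150, 203, 155, 167)
print(round(sens, 2), round(spec, 2), round(prev, 2))  # 0.74 0.93 0.55
```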
Summary of five cohort studies evaluating the accuracy of a >65% change in PTH (measured pre‐operation to 0–20 min post‐thyroidectomy) for identifying hypocalcaemia.
| First author | r11i | n1 | Sensitivity | r00i | n0 | Specificity | Observed proportion with outcome |
|---|---|---|---|---|---|---|---|
| Lo | 11 | 11 | 1.0 | 56 | 89 | 0.63 | 0.11 |
| Lombardi | 13 | 16 | 0.81 | 31 | 35 | 0.89 | 0.31 |
| McLeod | 9 | 13 | 0.69 | 33 | 43 | 0.77 | 0.23 |
| Warren, 2002 | 3 | 4 | 0.75 | 10 | 12 | 0.83 | 0.25 |
| Warren, 2004 | 2 | 3 | 0.67 | 20 | 23 | 0.87 | 0.12 |
r11i, number of true positives; n1, number with hypocalcaemia.
r00i, number of true negatives; n0, number without hypocalcaemia.
PTH, parathyroid.
Summary of five cohort studies evaluating the accuracy of a >65% change in PTH (measured pre‐operation to 1–2 h post‐thyroidectomy) for identifying hypocalcaemia.
| First author | r11i | n1 | Sensitivity | r00i | n0 | Specificity | Observed proportion with outcome |
|---|---|---|---|---|---|---|---|
| Lam | 12 | 12 | 1.0 | 24 | 26 | 0.92 | 0.32 |
| Lombardi | 15 | 16 | 0.94 | 29 | 35 | 0.83 | 0.31 |
| McLeod | 11 | 12 | 0.92 | 19 | 25 | 0.76 | 0.32 |
| Warren, 2002 | 1 | 2 | 0.50 | 6 | 8 | 0.75 | 0.2 |
| Warren, 2004 | 3 | 3 | 1.0 | 17 | 23 | 0.74 | 0.12 |
r11i, number of true positives; n1, number with hypocalcaemia.
r00i, number of true negatives; n0, number without hypocalcaemia.
PTH, parathyroid.
Figure 1. Confidence and prediction regions following application of model (1) to the temperature data.
Results of the internal–external cross‐validation procedure for the temperature data.
Option A: post‐test predictions obtained using summary sensitivity, summary specificity and summary prevalence. Option B: post‐test predictions obtained using summary sensitivity, summary specificity and observed prevalence. Option C: post‐test predictions obtained after bivariate meta‐regression of PPV and NPV on prevalence.
| Study | Observed no. with fever (O) | Observed prevalence | Summary prevalence from meta‐analysis model | Summary sensitivity from meta‐analysis model | Summary specificity from meta‐analysis model | Option A: predicted PPV | Option A: predicted NPV | Option A: expected no. with fever | Option A: O/E | Option B: predicted PPV | Option B: predicted NPV | Option B: expected no. with fever | Option B: O/E | Option C: predicted PPV | Option C: predicted NPV | Option C: expected no. with fever | Option C: O/E |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Brennan | 203 | 0.55 | 0.45 | 0.644 | 0.99 | 0.97 | 0.77 | 207.59 | 0.98 | 0.98 | 0.69 | 226.11 | 0.90 | 0.98 | 0.75 | 211.83 | 0.96 |
| Davis | 18 | 0.27 | 0.48 | 0.661 | 0.98 | 0.97 | 0.76 | 24.41 | 0.74 | 0.94 | 0.89 | 16.85 | 1.07 | 0.96 | 0.78 | 22.71 | 0.79 |
| Green | 9 | 0.43 | 0.46 | 0.631 | 0.98 | 0.97 | 0.76 | 10.92 | 0.82 | 0.97 | 0.78 | 10.58 | 0.85 | 0.96 | 0.76 | 10.80 | 0.83 |
| Greenes | 109 | 0.36 | 0.46 | 0.67 | 0.98 | 0.97 | 0.77 | 110.91 | 0.98 | 0.96 | 0.83 | 94.12 | 1.16 | 0.95 | 0.78 | 106.11 | 1.03 |
| Hoffman | 42 | 0.42 | 0.47 | 0.645 | 0.99 | 0.97 | 0.77 | 47.23 | 0.89 | 0.96 | 0.80 | 44.56 | 0.94 | 0.97 | 0.76 | 47.06 | 0.89 |
| Hooker | 15 | 0.38 | 0.46 | 0.65 | 0.98 | 0.97 | 0.76 | 16.63 | 0.90 | 0.96 | 0.82 | 14.97 | 1.00 | 0.96 | 0.78 | 16.07 | 0.93 |
| Lanham | 103 | 0.58 | 0.47 | 0.666 | 0.98 | 0.97 | 0.76 | 81.98 | 1.26 | 0.98 | 0.67 | 94.01 | 1.10 | 0.98 | 0.76 | 82.71 | 1.25 |
| Muma | 87 | 0.39 | 0.46 | 0.662 | 0.98 | 0.97 | 0.78 | 84.73 | 1.03 | 0.96 | 0.83 | 75.98 | 1.15 | 0.95 | 0.78 | 84.66 | 1.03 |
| Nypaver | 425 | 0.48 | 0.45 | 0.652 | 0.98 | 0.97 | 0.78 | 408.91 | 1.04 | 0.97 | 0.76 | 426.06 | 1.00 | 0.97 | 0.76 | 421.87 | 1.01 |
| Rhoads | 27 | 0.42 | 0.47 | 0.683 | 0.98 | 0.96 | 0.77 | 20.25 | 1.33 | 0.96 | 0.80 | 18.12 | 1.49 | 0.96 | 0.77 | 19.78 | 1.36 |
| Stewart | 59 | 0.75 | 0.43 | 0.65 | 0.98 | 0.97 | 0.78 | 59.88 | 0.99 | 0.99 | 0.49 | 67.79 | 0.87 | 0.98 | 0.56 | 65.61 | 0.90 |
Obtained from meta‐analysing all studies excluding the study in the row.
Expected number for the study in the row, based on using Equation (14) and the positive predictive value (PPV) and negative predictive value (NPV) in the prior two columns.
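The Option A and Option B columns can be illustrated with a short sketch. It applies Bayes' theorem to obtain PPV and NPV from sensitivity, specificity and prevalence, then converts them to an expected number with the condition. Equation (14) is not reproduced on this page, so the form used in `expected_with_condition` (PPV × test positives + (1 − NPV) × test negatives) is an assumption, and all function names are illustrative. Because the inputs are the rounded table values, the results only approximate the tabulated ones.

```python
def post_test_probabilities(sens, spec, prev):
    """PPV and NPV from sensitivity, specificity and prevalence (Bayes' theorem)."""
    ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
    return ppv, npv


def expected_with_condition(ppv, npv, n_test_pos, n_test_neg):
    """ASSUMED form of Equation (14): expected number who truly have the condition."""
    return ppv * n_test_pos + (1 - npv) * n_test_neg


# Brennan row: summary sensitivity 0.644, summary specificity 0.99,
# summary prevalence 0.45 (Option A), observed prevalence 0.55 (Option B).
n_test_pos = 150 + (167 - 155)   # true positives + false positives
n_test_neg = (203 - 150) + 155   # false negatives + true negatives
observed = 203

for label, prev in [("Option A", 0.45), ("Option B", 0.55)]:
    ppv, npv = post_test_probabilities(0.644, 0.99, prev)
    expected = expected_with_condition(ppv, npv, n_test_pos, n_test_neg)
    print(label, round(ppv, 2), round(npv, 2), round(observed / expected, 2))
```

Swapping the summary prevalence for the observed prevalence is the only difference between the two options in this sketch; sensitivity and specificity are held at their summary values in both.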
Meta‐analysis results for overall calibration (O/E) when using model (16), as estimated in a frequentist or Bayesian framework.
| Example | Predicted PPV and NPV obtained using… | Statistical framework | Summary O/E (95% CI) | τ (95% CI) | 95% prediction interval for O/E in a new population | Probability 0.9 < O/E < 1.1 in a new population |
|---|---|---|---|---|---|---|
| Temperature data | Option A | Bayesian | 1.02 (0.93 to 1.11) | 0.10 (0.01 to 0.23) | 0.78 to 1.31 | 0.67 |
| Temperature data | Option A | Frequentist | 1.02 (0.95 to 1.10) | 0.08 | 0.85 to 1.24 | — |
| PTH data 1–2 h | Option B | Bayesian | 1.02 (0.73 to 1.38) | 0.14 (0.01 to 0.45) | 0.56 to 1.74 | 0.40 |
| PTH data 1–2 h | Option B | Frequentist | 1.01 (0.79 to 1.29) | 0 | 0.68 to 1.50 | — |
All frequentist analyses used method of moments to estimate the model.
All Bayesian analyses used a prior N(0, 1 000 000) for the mean ln(O/E) and a prior uniform(0, 0.25) for τ, with a 10 000 burn‐in followed by 100 000 samples for posterior inferences.
O/E, observed/expected; PPV, positive predictive value; NPV, negative predictive value.
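For readers who want to see what a method‐of‐moments random‐effects analysis of ln(O/E) involves, here is a minimal sketch using the DerSimonian–Laird estimator, one common method‐of‐moments approach. Model (16) is not reproduced on this page, so this is a generic illustration rather than the authors' exact procedure. The within‐study variance 1/O is a Poisson‐type assumption introduced here, and the critical value 2.262 is the 97.5% point of a t distribution with 11 − 2 = 9 degrees of freedom. The O and E values are the Option A columns of the temperature cross‐validation table.

```python
import math


def dersimonian_laird(y, v):
    """Random-effects summary of estimates y with within-study variances v.

    Returns (summary mean, its standard error, between-study variance tau^2).
    """
    k = len(y)
    if k < 2:
        raise ValueError("need at least two studies")
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)          # truncated at zero
    w_star = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return mu, se, tau2


def prediction_interval(mu, se, tau2, t_crit):
    """Approximate 95% prediction interval; t_crit is the t quantile with k - 2 df."""
    half = t_crit * math.sqrt(tau2 + se ** 2)
    return mu - half, mu + half


observed = [203, 18, 9, 109, 42, 15, 103, 87, 425, 27, 59]
expected = [207.59, 24.41, 10.92, 110.91, 47.23, 16.63,
            81.98, 84.73, 408.91, 20.25, 59.88]
y = [math.log(o / e) for o, e in zip(observed, expected)]
v = [1.0 / o for o in observed]                  # ASSUMED variance of ln(O/E)

mu, se, tau2 = dersimonian_laird(y, v)
low, high = prediction_interval(mu, se, tau2, t_crit=2.262)
print("summary O/E:", math.exp(mu))
print("prediction interval:", math.exp(low), math.exp(high))
```

Working on the log scale keeps the interval for O/E positive after back‐transformation. Results from this sketch should not be expected to match the table exactly, since the variance formula is assumed.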
Meta‐analysis results for calibration of either PPV or NPV, when using meta‐analysis model (17) as estimated in a frequentist or Bayesian framework.
| Example | Meta‐analysis method | Statistical framework | Summary calibration, a (95% CI) | τ (95% CI) | 95% prediction interval for O/E in a new population | Probability 0.9 < O/E < 1.1 in a new population |
|---|---|---|---|---|---|---|
| Calibration for just NPV | | | | | | |
| PTH data 1–2 h | Option B | Bayesian | 0.24 (−0.97 to 1.81) | 0.51 (0.03 to 0.98) | 0.86 to 1.05 | 0.95 |
| PTH data 1–2 h | Option B | Frequentist | 0.093 (−1.06 to 1.25) | 0 | 0.87 to 1.03 | — |
| PTH data 0–20 min | Option B | Bayesian | 0.021 (−0.82 to 1.02) | 0.51 (0.03 to 0.97) | 0.86 to 1.04 | 0.95 |
| PTH data 0–20 min | Option B | Frequentist | −0.044 (−0.83 to 0.74) | 0.34 | 0.80 to 1.03 | — |
| Calibration for just PPV | | | | | | |
| Temperature data | Option A | Bayesian | −0.017 (−0.63 to 0.77) | 0.65 (0.11 to 0.99) | 0.90 to 1.03 | 0.98 |
| Temperature data | Option A | Frequentist | 0.015 (−0.75 to 0.78) | 0.70 | 0.87 to 1.03 | — |
All frequentist analyses used maximum likelihood estimation of model (17). All Bayesian analyses used a prior distribution of N(0, 1 000 000) for a, and a prior distribution of uniform(0, 1) for τ, with a 10 000 burn‐in followed by 100 000 samples for posterior inferences. Median values of the posterior distribution are shown for a and τ.
Based on a predicted positive predictive value (PPV) of 0.97 in the temperature analysis, and a negative predictive value (NPV) of 0.95 in the parathyroid (PTH) analysis.
O/E, observed/expected.
Results of the internal–external cross‐validation procedure for the PTH data.
Option A: post‐test predictions obtained using summary sensitivity, summary specificity and summary prevalence. Option B: post‐test predictions obtained using summary sensitivity, summary specificity and observed prevalence.
| Study | Observed no. with outcome (O) | Observed prevalence | Summary prevalence from meta‐analysis model | Summary sensitivity from meta‐analysis model | Summary specificity from meta‐analysis model | Option A: predicted PPV | Option A: predicted NPV | Option A: expected no. with outcome | Option A: O/E | Option B: predicted PPV | Option B: predicted NPV | Option B: expected no. with outcome | Option B: O/E |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PTH measured 1–2 h post‐thyroidectomy | | | | | | | | | | | | | |
| Lam | 12 | 0.32 | 0.26 | 0.91 | 0.78 | 0.60 | 0.96 | 9.37 | 1.28 | 0.66 | 0.95 | 10.51 | 1.14 |
| Lombardi | 16 | 0.31 | 0.26 | 0.93 | 0.80 | 0.62 | 0.97 | 14.01 | 1.14 | 0.68 | 0.96 | 15.64 | 1.02 |
| McLeod | 12 | 0.32 | 0.26 | 0.94 | 0.82 | 0.65 | 0.97 | 11.77 | 1.02 | 0.72 | 0.97 | 13.11 | 0.92 |
| Warren, 2002 | 2 | 0.20 | 0.28 | 0.95 | 0.82 | 0.67 | 0.98 | 2.21 | 0.90 | 0.57 | 0.99 | 1.82 | 1.10 |
| Warren, 2004 | 3 | 0.12 | 0.31 | 0.93 | 0.83 | 0.71 | 0.96 | 7.22 | 0.42 | 0.42 | 0.99 | 3.99 | 0.75 |
| PTH measured 0–20 min post‐thyroidectomy | | | | | | | | | | | | | |
| Lo | 11 | 0.11 | 0.24 | 0.75 | 0.83 | 0.59 | 0.91 | 33.60 | 0.33 | 0.36 | 0.96 | 18.82 | 0.58 |
| Lombardi | 16 | 0.31 | 0.16 | 0.81 | 0.76 | 0.39 | 0.96 | 8.32 | 1.92 | 0.60 | 0.90 | 14.12 | 1.13 |
| McLeod | 13 | 0.23 | 0.18 | 0.87 | 0.81 | 0.50 | 0.97 | 11.15 | 1.17 | 0.58 | 0.95 | 13.22 | 0.98 |
| Warren, 2002 | 4 | 0.25 | 0.18 | 0.83 | 0.78 | 0.47 | 0.95 | 2.94 | 1.36 | 0.56 | 0.93 | 3.69 | 1.08 |
| Warren, 2004 | 3 | 0.12 | 0.21 | 0.84 | 0.77 | 0.49 | 0.95 | 3.70 | 0.81 | 0.32 | 0.97 | 2.26 | 1.33 |
Obtained from meta‐analysing all studies excluding the study in the row.
Expected number for the study in the row, based on using Equation (14) and the positive predictive value (PPV) and negative predictive value (NPV) in the prior two columns.
PTH, parathyroid.
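The internal–external cross‐validation loop behind the table can be sketched as follows: each study is omitted in turn and summary accuracy is re‐estimated from the remaining studies. To keep the example self‐contained, the summary step here is simple pooling of counts, which is a deliberate simplification and not the meta‐analysis model used in the paper. The counts are taken from the 1–2 h PTH table, and the function name is illustrative.

```python
# (first author, r11, n1, r00, n0) from the 1-2 h PTH table
studies = [
    ("Lam", 12, 12, 24, 26),
    ("Lombardi", 15, 16, 29, 35),
    ("McLeod", 11, 12, 19, 25),
    ("Warren, 2002", 1, 2, 6, 8),
    ("Warren, 2004", 3, 3, 17, 23),
]


def leave_one_out_summaries(studies):
    """For each study, pool sensitivity and specificity over all OTHER studies.

    Pooling raw counts is a crude stand-in for a proper meta-analysis model.
    """
    results = {}
    for i, held_out in enumerate(studies):
        rest = studies[:i] + studies[i + 1:]
        r11 = sum(s[1] for s in rest)
        n1 = sum(s[2] for s in rest)
        r00 = sum(s[3] for s in rest)
        n0 = sum(s[4] for s in rest)
        results[held_out[0]] = (r11 / n1, r00 / n0)
    return results


for name, (sens, spec) in leave_one_out_summaries(studies).items():
    print(f"{name}: summary sensitivity {sens:.2f}, summary specificity {spec:.2f}")
```

With Lam held out, this pooling gives a sensitivity of 30/33 ≈ 0.91 and a specificity of 71/91 ≈ 0.78. The held‐out study's own data are then used only to judge how well predictions built from these summaries calibrate.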
Figure 2. Meta‐analysis of the observed/expected (O/E) calibration statistics (frequentist estimation of model (16)) from the internal–external cross‐validation approach applied to the ear temperature data for diagnosis of fever. PPV, positive predictive value; NPV, negative predictive value.
Figure 3. Meta‐analysis of the observed/expected (O/E) calibration statistics (frequentist estimation of model (16)) from the internal–external cross‐validation approach applied to the parathyroid data at 1–2 h for prediction of hypocalcaemia. PPV, positive predictive value; NPV, negative predictive value.
Figure 4. Calibration of predicted and observed post‐test probabilities, for (a) positive predictive value (PPV) derived using option A in the temperature example and (b) negative predictive value (NPV) derived using option B in the parathyroid example. Each circle represents a study and is proportional to the study sample size.
Figure 5. Posterior distributions for the true observed/expected (O/E), positive predictive value (PPV) or negative predictive value (NPV) in a new population, derived from Bayesian estimation of model (17).