Kym IE Snell1, Joie Ensor1, Thomas PA Debray2,3, Karel GM Moons2,3, Richard D Riley1.
Abstract
If individual participant data are available from multiple studies or clusters, then a prediction model can be externally validated multiple times. This allows the model's discrimination and calibration performance to be examined across different settings. Random-effects meta-analysis can then be used to quantify overall (average) performance and heterogeneity in performance. This typically assumes a normal distribution of 'true' performance across studies. We conducted a simulation study to examine this normality assumption for various performance measures relating to a logistic regression prediction model. We simulated data across multiple studies with varying degrees of variability in baseline risk or predictor effects and then evaluated the shape of the between-study distribution in the C-statistic, calibration slope, calibration-in-the-large, and E/O statistic, and possible transformations thereof. We found that a normal between-study distribution was usually reasonable for the calibration slope and calibration-in-the-large; however, the distributions of the C-statistic and E/O were often skewed across studies, particularly in settings with large variability in the predictor effects. Normality was vastly improved when using the logit transformation for the C-statistic and the log transformation for E/O, and therefore we recommend these scales to be used for meta-analysis. An illustrated example is given using a random-effects meta-analysis of the performance of QRISK2 across 25 general practices.
Keywords: C-statistic; Validation; between-study distribution; calibration; discrimination; heterogeneity; meta-analysis; performance statistics; simulation
Year: 2017 PMID: 28480827 PMCID: PMC6193210 DOI: 10.1177/0962280217705678
Source DB: PubMed Journal: Stat Methods Med Res ISSN: 0962-2802 Impact factor: 3.021
Parameter values of the true prediction model in nine base scenarios.
| Scenario | μα | μβ | Prevalence | C-statistic |
|---|---|---|---|---|
| 1 | −1.274 | 0.010 | 0.22 | 0.55 |
| 2 | −2.957 | 0.010 | 0.05 | 0.55 |
| 3 | 2.210 | 0.010 | 0.90 | 0.55 |
| 4 | −1.425 | 0.045 | 0.22 | 0.7 |
| 5 | −3.215 | 0.045 | 0.05 | 0.7 |
| 6 | 2.440 | 0.045 | 0.90 | 0.7 |
| 7 | −2.386 | 0.145 | 0.22 | 0.9 |
| 8 | −5.133 | 0.145 | 0.05 | 0.9 |
| 9 | 3.987 | 0.145 | 0.90 | 0.9 |
The μα and μβ values are selected to give the corresponding average prevalence and C-statistic (average from 100 large samples each of 1,000,000 patients).
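As a rough illustration of how such scenario values can be checked, the sketch below simulates one large sample from the assumed true model logit(p) = μα + μβ·x and computes its C-statistic via the rank-sum (Mann-Whitney) formulation. The Uniform(0, 100) predictor distribution is a hypothetical choice (the paper's predictor distribution is not given here), so the resulting prevalence and C-statistic will not exactly reproduce the table.

```python
import math
import random

random.seed(7)

def simulate_c_statistic(mu_alpha, mu_beta, n=50_000):
    """Simulate one large validation sample from the assumed true model
    logit(p_i) = mu_alpha + mu_beta * x_i and return (prevalence, C-statistic).
    The Uniform(0, 100) predictor is an assumption made for illustration."""
    x = [random.uniform(0, 100) for _ in range(n)]
    lp = [mu_alpha + mu_beta * xi for xi in x]            # linear predictor
    p = [1 / (1 + math.exp(-l)) for l in lp]              # true event probability
    y = [1 if random.random() < pi else 0 for pi in p]    # simulated outcome
    # C-statistic via ranks of the linear predictor (no ties: x is continuous)
    ranked = sorted(range(n), key=lambda i: lp[i])
    rank = [0.0] * n
    for r, i in enumerate(ranked, start=1):
        rank[i] = r
    n1 = sum(y)
    n0 = n - n1
    rank_sum_events = sum(rank[i] for i in range(n) if y[i] == 1)
    c = (rank_sum_events - n1 * (n1 + 1) / 2) / (n1 * n0)
    return n1 / n, c

prevalence, c = simulate_c_statistic(-1.274, 0.010)  # base scenario 1 parameters
```

The paper averages such quantities over 100 samples of 1,000,000 patients; a single smaller sample, as here, carries visible Monte Carlo error.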
Defined settings for simulation, with variability in either α or β across studies.
| Simulation setting | Standard deviation for α | Standard deviation for β |
|---|---|---|
| 1: No variability in α or β | σα = 0 | σβ = 0 |
| 2: Little variability in α | σα = 0.1 | σβ = 0 |
| 3: Moderate variability in α | σα = 0.5 | σβ = 0 |
| 4: Large variability in α | σα = 1.0 | σβ = 0 |
| 5: Little variability in β | σα = 0 | σβ = 0.005 |
| 6: Moderate variability in β | σα = 0 | σβ = 0.020 |
| 7: Large variability in β | σα = 0 | σβ = 0.070 |
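Under these settings, study-specific 'true' parameters are drawn around the scenario means; a minimal sketch, assuming the normal between-study model αj ~ N(μα, σα²) and βj ~ N(μβ, σβ²) described above, with a skewness helper matching the coefficient of skewness plotted in the figures:

```python
import math
import random

random.seed(42)

def draw_study_parameters(mu_alpha, mu_beta, sigma_alpha, sigma_beta, n_studies):
    """Study-specific true parameters: alpha_j ~ N(mu_alpha, sigma_alpha^2),
    beta_j ~ N(mu_beta, sigma_beta^2), as in the simulation settings table."""
    return [(random.gauss(mu_alpha, sigma_alpha),
             random.gauss(mu_beta, sigma_beta))
            for _ in range(n_studies)]

def skewness(values):
    """Sample coefficient of skewness (third standardised moment)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum(((v - mean) / sd) ** 3 for v in values) / n

# Setting 3 (moderate variability in alpha) under base scenario 1; the
# number of studies here is an arbitrary illustrative choice.
params = draw_study_parameters(-1.274, 0.010, 0.5, 0.0, n_studies=200)
```

Each (αj, βj) pair would then define one study's true model, from which that study's C-statistic and calibration measures are computed and their between-study distribution examined.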
Figure 1. Coefficient of skewness for the C-statistic for all scenarios across different simulation settings, with level of variability in either the study-specific intercept α or predictor effect β along the x-axis.
Figure 2. Histograms for (a) the C-statistic and (b) logit C-statistic when there is large variability in the predictor effect β (setting 7: σβ = 0.07).
Figure 3. Skewness for the C-statistic and transformations of the C-statistic for different levels of variability in βj (settings 1, 5, 6 and 7) along the x-axis.
Figure 5. Histograms for (a) E/O in all scenarios when variability in β was large (setting 7: σβ = 0.07), and (b) log(E/O) in all scenarios when variability in α was large (setting 4: σα = 1.0). Note the different x-axes used across scenarios.
Figure 6. Skewness for E/O and transformations of E/O for different levels of variability in α (settings 1–4) along the x-axis.
Figure 7. Skewness for the calibration-in-the-large for different levels of variability in α or β along the x-axis.
Random-effects meta-analysis results for the QRISK2 example.
| Scale used | Pooled C-statistic | 95% Confidence interval | 95% Prediction interval | I² (%) |
|---|---|---|---|---|
| Original | 0.829 | 0.800–0.859 | 0.691–0.968 | 91.2 |
| Logit | 0.830 | 0.800–0.856 | 0.672–0.921 | 88.6 |
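The recommended approach of pooling on the logit scale can be sketched as follows. This is a minimal DerSimonian-Laird random-effects implementation using hypothetical study-level C-statistics and variances (not the QRISK2 data), with the delta-method variance var(logit c) ≈ var(c) / (c(1 − c))² and a back-transformed pooled estimate and approximate 95% prediction interval; the paper's exact estimation method (e.g. REML, t-based prediction intervals) may differ.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

def dl_meta_analysis(estimates, variances):
    """DerSimonian-Laird random-effects pooling on the supplied scale.
    Returns (pooled estimate, between-study variance tau2, SE of pooled)."""
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c)
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, tau2, se

# Hypothetical study-level C-statistics and their variances (illustration only)
c_stats = [0.80, 0.84, 0.78, 0.86, 0.82]
var_c = [0.0004, 0.0003, 0.0005, 0.0004, 0.0003]

# Meta-analyse on the logit scale, then back-transform
y = [logit(ci) for ci in c_stats]
v = [vc / (ci * (1 - ci)) ** 2 for ci, vc in zip(c_stats, var_c)]
pooled, tau2, se = dl_meta_analysis(y, v)

pooled_c = inv_logit(pooled)
# Approximate 95% prediction interval on the logit scale, back-transformed
half_width = 1.96 * math.sqrt(tau2 + se ** 2)
pred_interval = (inv_logit(pooled - half_width), inv_logit(pooled + half_width))
```

Back-transforming after pooling on the logit scale keeps the pooled C-statistic and its intervals within (0, 1), which is part of why that scale gives the better-behaved between-study distribution reported above.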