Menelaos Pavlou¹, Chen Qu¹, Rumana Z Omar¹, Shaun R Seaman², Ewout W Steyerberg³, Ian R White⁴, Gareth Ambler¹.
Abstract
Risk-prediction models for health outcomes are used in practice as part of clinical decision-making, and it is essential that their performance be externally validated. An important aspect in the design of a validation study is choosing an adequate sample size. In this paper, we investigate the sample size requirements for validation studies with binary outcomes to estimate measures of predictive performance: the C-statistic for discrimination, and the calibration slope and calibration in the large for calibration. We aim for sufficient precision in the estimated measures. In addition, we investigate the sample size needed to achieve sufficient power to detect a difference from a target value. Under normality assumptions on the distribution of the linear predictor, we obtain simple estimators for sample size calculations based on the measures above. Simulation studies show that the estimators perform well for common values of the C-statistic and outcome prevalence when the linear predictor is marginally Normal. Their performance deteriorates only slightly when the normality assumptions are violated. We also propose estimators that do not require normality assumptions but do require specification of the marginal distribution of the linear predictor and the use of numerical integration. These estimators were also seen to perform very well under marginal normality. Our sample size equations require a specified standard error (SE) and the anticipated C-statistic and outcome prevalence. The sample size requirement varies according to the prognostic strength of the model, the outcome prevalence, the choice of performance measure and the study objective. For example, to achieve an SE < 0.025 for the C-statistic, 60–170 events are required if the true C-statistic and outcome prevalence lie between 0.64 and 0.85 and between 0.05 and 0.3, respectively. For the calibration slope and calibration in the large, achieving an SE < 0.15 would require 40–280 and 50–100 events, respectively.
Our estimators may also be used for survival outcomes when the proportion of censored observations is high.
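The three performance measures named in the abstract can be computed on a validation sample from the model's linear predictor (LP) and the observed binary outcomes. The following is a pure-Python sketch under standard definitions, not the authors' code: the C-statistic uses the Mann-Whitney rank-sum identity, the calibration slope is the coefficient of the LP in a logistic regression of the outcome on the LP, and calibration in the large is the intercept of a logistic regression with the LP entered as an offset. The function names and the simulated example with LP ~ N(−2.5, 1) are hypothetical.

```python
import math
import random


def expit(x):
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def c_statistic(lp, y):
    """C-statistic via the Mann-Whitney rank-sum identity (ties count 1/2)."""
    order = sorted(range(len(lp)), key=lambda i: lp[i])
    ranks = [0.0] * len(lp)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied linear-predictor values.
        while j + 1 < len(order) and lp[order[j + 1]] == lp[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the run
        i = j + 1
    n1 = sum(y)
    n0 = len(y) - n1
    r1 = sum(r for r, yi in zip(ranks, y) if yi == 1)
    return (r1 - n1 * (n1 + 1) / 2) / (n1 * n0)


def calibration_slope(lp, y, iters=50, tol=1e-10):
    """Slope b in logit P(y=1) = a + b*lp, fitted by Newton-Raphson."""
    a, b = 0.0, 1.0  # start at perfect calibration
    for _ in range(iters):
        ga = gb = iaa = iab = ibb = 0.0
        for x, yi in zip(lp, y):
            p = expit(a + b * x)
            w = p * (1.0 - p)
            ga += yi - p          # score for the intercept
            gb += (yi - p) * x    # score for the slope
            iaa += w              # Fisher information entries
            iab += w * x
            ibb += w * x * x
        det = iaa * ibb - iab * iab
        da = (ibb * ga - iab * gb) / det
        db = (iaa * gb - iab * ga) / det
        a, b = a + da, b + db
        if max(abs(da), abs(db)) < tol:
            break
    return b


def calibration_in_the_large(lp, y, iters=50, tol=1e-12):
    """Intercept a in logit P(y=1) = a + offset(lp), fitted by Newton-Raphson."""
    a = 0.0
    for _ in range(iters):
        ps = [expit(a + x) for x in lp]
        step = sum(yi - p for yi, p in zip(y, ps)) / sum(p * (1 - p) for p in ps)
        a += step
        if abs(step) < tol:
            break
    return a


# Hypothetical example: a correctly specified model with LP ~ N(-2.5, 1).
random.seed(1)
lp = [random.gauss(-2.5, 1.0) for _ in range(5000)]
y = [1 if random.random() < expit(x) else 0 for x in lp]
print(round(c_statistic(lp, y), 3),
      round(calibration_slope(lp, y), 3),
      round(calibration_in_the_large(lp, y), 3))
```

Because the outcomes here are generated from the same model, the slope should be close to 1 and calibration in the large close to 0; a slope well below 1 would indicate predictions that are too extreme.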
Keywords: C-statistic; Sample size calculation; calibration; discrimination; prediction model
Year: 2021 PMID: 33881369 PMCID: PMC8529102 DOI: 10.1177/09622802211007522
Source DB: PubMed Journal: Stat Methods Med Res ISSN: 0962-2802 Impact factor: 3.021
Figure 1. Standard error of the estimated C-statistic (a), calibration slope (b) and calibration in the large (c) as the true value of the C-statistic varies and the number of events is fixed at 100, corresponding to sample sizes of 2000, 1000, 500, 334 and 250 for outcome prevalences of 0.05, 0.1, 0.2, 0.3 and 0.4, respectively.
SE: standard error.
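The sample sizes in the Figure 1 caption follow from dividing the fixed number of events by the outcome prevalence and rounding up, which is why a prevalence of 0.3 gives 334 rather than 333. A minimal sketch of that arithmetic; the helper name is hypothetical, and prevalences are passed as strings so that `Fraction` represents them exactly and binary floating-point error cannot push the ceiling up by one.

```python
import math
from fractions import Fraction


def sample_size_from_events(events, prevalence):
    """Total sample size N = events / prevalence, rounded up to a whole subject."""
    return math.ceil(Fraction(events) / Fraction(prevalence))


for p in ["0.05", "0.1", "0.2", "0.3", "0.4"]:
    print(p, sample_size_from_events(100, p))
```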
Figure 2. Number of events required to achieve a target standard error of (a) 0.025 for the estimated C-statistic (width of 95% CI ≈ 0.1), (b) 0.15 for the estimated calibration slope (width of 95% CI ≈ 0.6) or (c) 0.15 for the estimated calibration in the large, as the true value of the C-statistic and the outcome prevalence vary.
SE: standard error.
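The confidence-interval widths quoted in the Figure 2 caption are consistent with a normal-approximation (Wald) 95% interval, estimate ± 1.96 × SE, whose full width is about 3.92 × SE. A small sketch of the conversion in both directions; the function names are hypothetical.

```python
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)  # two-sided 95% normal quantile, about 1.96


def ci_width(se):
    """Full width of a Wald-type 95% CI: 2 * 1.96 * SE."""
    return 2 * Z975 * se


def se_for_width(width):
    """SE needed for a Wald-type 95% CI of the given full width."""
    return width / (2 * Z975)


print(round(ci_width(0.025), 3))  # C-statistic target SE -> 0.098
print(round(ci_width(0.15), 3))   # calibration slope / in the large -> 0.588
```

The exact widths, about 0.098 and 0.588, round to the 0.1 and 0.6 given in the caption.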
Figure 3. Number of events required to detect a difference d, of magnitude between 0.03 and 0.1, from a target C-statistic of C₀ = 0.72 (C₁ = C₀ + d).
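Figure 3 concerns power rather than precision. One generic way to frame it, under stated assumptions, is a two-sided Wald test of H0: C = C₀, choosing the smallest number of events E such that z(1−α/2)·SE(C₀, E) + z(1−β)·SE(C₁, E) ≤ |d|. The authors derive the SE from normality assumptions on the linear predictor; this sketch instead plugs in the Hanley and McNeil (1982) standard error of an AUC as a stand-in, so its numbers will not reproduce Figure 3. The function names, α = 0.05, power = 0.8 and the prevalence of 0.1 in the example are hypothetical choices.

```python
import math
from statistics import NormalDist


def hanley_mcneil_se(auc, n1, n0):
    """Hanley & McNeil (1982) SE of an AUC with n1 events and n0 non-events."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc) + (n1 - 1) * (q1 - auc ** 2)
           + (n0 - 1) * (q2 - auc ** 2)) / (n1 * n0)
    return math.sqrt(var)


def events_to_detect(c0, d, prevalence, alpha=0.05, power=0.8, max_events=100_000):
    """Smallest number of events for which a two-sided Wald test of H0: C = c0
    reaches the requested power when the true C-statistic is c1 = c0 + d."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    c1 = c0 + d
    for events in range(2, max_events + 1):
        n0 = events * (1 - prevalence) / prevalence  # non-events implied by prevalence
        se0 = hanley_mcneil_se(c0, events, n0)       # SE under H0
        se1 = hanley_mcneil_se(c1, events, n0)       # SE under the alternative
        if z_a * se0 + z_b * se1 <= abs(d):
            return events
    raise ValueError("no solution up to max_events")


for d in (0.03, 0.05, 0.1):
    print(d, events_to_detect(c0=0.72, d=d, prevalence=0.1))
```

As in Figure 3, the required number of events falls steeply as the difference to be detected grows, because E scales roughly with 1/d².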
DGM 3. Percentage bias of the estimated standard errors of the C-statistic, calibration slope and calibration in the large, calculated over 10,000 simulations for true prevalence values of 10% and 30% and true C-statistics of 0.64, 0.72, 0.8 and 0.85.
| Prevalence | True C | Events | SE (C-statistic) | %Bias | %Bias | SE (calibration slope) | %Bias | %Bias | SE (calibration in the large) | %Bias | %Bias |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.1 | 0.64 | 50 | 0.041 | −3 | −3 | 0.295 | −2 | −1 | 0.145 | 1 | 1 |
| 0.1 | 0.64 | 100 | 0.028 | 0 | 0 | 0.205 | 0 | 0 | 0.104 | −1 | −1 |
| 0.1 | 0.64 | 200 | 0.020 | 0 | 0 | 0.147 | −1 | −1 | 0.074 | −2 | −2 |
| 0.1 | 0.72 | 50 | 0.036 | 0 | 0 | 0.188 | −3 | −1 | 0.148 | 0 | 0 |
| 0.1 | 0.72 | 100 | 0.026 | 1 | 0 | 0.134 | −4 | −2 | 0.106 | −1 | −1 |
| 0.1 | 0.72 | 200 | 0.018 | 0 | 1 | 0.093 | −2 | 0 | 0.074 | 0 | 0 |
| 0.1 | 0.8 | 50 | 0.032 | 0 | 0 | 0.142 | −8 | −1 | 0.155 | −1 | −1 |
| 0.1 | 0.8 | 100 | 0.022 | 0 | −1 | 0.100 | −7 | 0 | 0.109 | 0 | 0 |
| 0.1 | 0.8 | 200 | 0.016 | 0 | −1 | 0.069 | −6 | 1 | 0.076 | 0 | 0 |
| 0.1 | 0.85 | 50 | 0.027 | 1 | 0 | 0.126 | −13 | −1 | 0.160 | 1 | 0 |
| 0.1 | 0.85 | 100 | 0.019 | 2 | −1 | 0.088 | −12 | 1 | 0.113 | 1 | 0 |
| 0.1 | 0.85 | 200 | 0.014 | 2 | 0 | 0.062 | −11 | 1 | 0.079 | 2 | 1 |
| 0.3 | 0.64 | 50 | 0.041 | −4 | −4 | 0.305 | −1 | −1 | 0.154 | −2 | −2 |
| 0.3 | 0.64 | 100 | 0.029 | −1 | 0 | 0.215 | −1 | −1 | 0.107 | −1 | 0 |
| 0.3 | 0.64 | 200 | 0.020 | 0 | 0 | 0.151 | 0 | 0 | 0.076 | −1 | −1 |
| 0.3 | 0.72 | 50 | 0.037 | −1 | −1 | 0.200 | −4 | −1 | 0.158 | −4 | −3 |
| 0.3 | 0.72 | 100 | 0.027 | 0 | 0 | 0.142 | −5 | −3 | 0.108 | −1 | 1 |
| 0.3 | 0.72 | 200 | 0.019 | −1 | −1 | 0.098 | −2 | 1 | 0.077 | −2 | 0 |
| 0.3 | 0.8 | 50 | 0.033 | −1 | −2 | 0.157 | −9 | −3 | 0.163 | −5 | −1 |
| 0.3 | 0.8 | 100 | 0.023 | 0 | −1 | 0.108 | −7 | 0 | 0.115 | −6 | −1 |
| 0.3 | 0.8 | 200 | 0.016 | 0 | −1 | 0.076 | −7 | 0 | 0.080 | −4 | 1 |
| 0.3 | 0.85 | 50 | 0.028 | 0 | 0 | 0.141 | −14 | −3 | 0.172 | −10 | −2 |
| 0.3 | 0.85 | 100 | 0.020 | 0 | −1 | 0.098 | −13 | −1 | 0.121 | −9 | −1 |
| 0.3 | 0.85 | 200 | 0.014 | 1 | 0 | 0.067 | −10 | 2 | 0.085 | −8 | 0 |
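The %Bias columns compare a formula-based SE with the "true" SE, which in a simulation study is commonly taken to be the empirical standard deviation of the estimates across replications. A small Monte Carlo sketch of that comparison for the C-statistic, assuming %Bias = 100 × (formula SE − empirical SE) / empirical SE (a common definition; the table caption does not spell it out). The LP distribution N(−2.5, 1), the 1000 subjects, the 200 replications (far fewer than the 10,000 behind the table) and the Hanley and McNeil formula used as the "formula SE" are all hypothetical stand-ins, not the authors' DGM 3 or estimators.

```python
import math
import random
import statistics


def expit(x):
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def c_statistic(lp, y):
    """Rank-sum C-statistic; assumes a continuous LP, so ties are ignored."""
    n1 = sum(y)
    n0 = len(y) - n1
    ranked = sorted(zip(lp, y))
    r1 = sum(rank for rank, (_, yi) in enumerate(ranked, start=1) if yi == 1)
    return (r1 - n1 * (n1 + 1) / 2) / (n1 * n0)


def hanley_mcneil_se(auc, n1, n0):
    """Hanley & McNeil (1982) SE of an AUC, used here only as an example formula."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc) + (n1 - 1) * (q1 - auc ** 2)
           + (n0 - 1) * (q2 - auc ** 2)) / (n1 * n0)
    return math.sqrt(var)


def simulate_c(mu, sigma, n, reps, seed=2021):
    """Replicate validation samples of size n with LP ~ N(mu, sigma) and
    P(y=1 | LP) = expit(LP); return the C estimates and event counts."""
    rng = random.Random(seed)
    cs, events = [], []
    while len(cs) < reps:
        lp = [rng.gauss(mu, sigma) for _ in range(n)]
        y = [1 if rng.random() < expit(x) else 0 for x in lp]
        if 0 < sum(y) < n:  # C is undefined unless both outcomes occur
            cs.append(c_statistic(lp, y))
            events.append(sum(y))
    return cs, events


def percent_bias(se_formula, se_empirical):
    return 100.0 * (se_formula - se_empirical) / se_empirical


n = 1000
cs, events = simulate_c(mu=-2.5, sigma=1.0, n=n, reps=200)
se_empirical = statistics.stdev(cs)  # "true" SE across replications
mean_events = statistics.mean(events)
se_formula = hanley_mcneil_se(statistics.mean(cs), mean_events, n - mean_events)
print(round(se_empirical, 4), round(se_formula, 4),
      round(percent_bias(se_formula, se_empirical), 1))
```

With only 200 replications the empirical SE is itself noisy, which is one reason simulation studies such as this one use many thousands of replications.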
DGM 3. Number of events required for a specified standard error of the C-statistic, calibration slope and calibration in the large.
| Prevalence | True C | Target SE (C-statistic) | Events (C-statistic) | %Bias | %Bias | Target SE (calibration slope and calibration in the large) | Events (calibration slope) | %Bias | %Bias | Events (calibration in the large) | %Bias | %Bias |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.10 | 0.64 | 0.0125 | 541 | −1 | 0 | 0.1 | 453 | 0 | 1 | 115 | −1 | −1 |
| 0.10 | 0.72 | 0.0125 | 447 | 1 | 2 | 0.1 | 198 | −6 | −2 | 120 | −3 | 0 |
| 0.10 | 0.80 | 0.0125 | 329 | 3 | 1 | 0.1 | 118 | −15 | −4 | 130 | −9 | 1 |
| 0.10 | 0.85 | 0.0125 | 241 | 4 | −2 | 0.1 | 94 | −23 | −4 | 146 | −17 | −3 |
| 0.30 | 0.64 | 0.0125 | 695 | −1 | 0 | 0.1 | 625 | 0 | −1 | 153 | −1 | −1 |
| 0.30 | 0.72 | 0.0125 | 586 | −1 | −1 | 0.1 | 282 | −2 | −2 | 166 | −1 | 0 |
| 0.30 | 0.80 | 0.0125 | 420 | 3 | 2 | 0.1 | 180 | −9 | −4 | 192 | −5 | 0 |
| 0.30 | 0.85 | 0.0125 | 323 | 1 | −1 | 0.1 | 149 | −14 | −2 | 216 | −15 | 0 |
| 0.10 | 0.64 | 0.025 | 136 | −1 | 0 | 0.15 | 209 | −5 | −3 | 52 | −2 | −2 |
| 0.10 | 0.72 | 0.025 | 112 | 1 | −1 | 0.15 | 89 | −7 | −4 | 55 | −5 | −3 |
| 0.10 | 0.80 | 0.025 | 86 | −1 | 1 | 0.15 | 54 | −17 | −7 | 58 | −9 | 0 |
| 0.10 | 0.85 | 0.025 | 62 | 2 | −3 | 0.15 | 44 | −27 | −9 | 65 | −17 | −3 |
| 0.30 | 0.64 | 0.025 | 173 | −1 | 0 | 0.15 | 282 | −1 | −2 | 70 | −3 | −4 |
| 0.30 | 0.72 | 0.025 | 144 | 1 | 0 | 0.15 | 125 | −2 | −1 | 76 | −3 | −3 |
| 0.30 | 0.80 | 0.025 | 110 | −2 | −2 | 0.15 | 81 | −10 | −5 | 86 | −6 | −1 |
| 0.30 | 0.85 | 0.025 | 81 | 0 | −2 | 0.15 | 70 | −19 | −8 | 97 | −15 | −1 |
Note. Percentage bias of the estimated sample size (and number of events), calculated over 10,000 simulations for true prevalence values of 10% and 30% and true C-statistics of 0.64, 0.72, 0.8 and 0.85. "Events" denotes the required number of events.
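Because an SE shrinks roughly in proportion to 1/√E in large samples, the required number of events scales approximately with 1/SE². The table is consistent with this: for prevalence 0.10 and C = 0.64, halving the target SE of the C-statistic from 0.025 to 0.0125 raises the required events from 136 to 541, close to a fourfold increase. A minimal sketch of this rule of thumb, with a hypothetical helper name; it is an asymptotic approximation, not a substitute for the authors' equations.

```python
import math


def rescale_events(events, se_from, se_to):
    """Approximate events needed to reach `se_to`, given that `events` give
    `se_from`, assuming SE is proportional to 1/sqrt(events)."""
    return math.ceil(events * (se_from / se_to) ** 2)


print(rescale_events(136, 0.025, 0.0125))  # C-statistic, prevalence 0.10, C = 0.64
print(rescale_events(209, 0.15, 0.1))      # calibration slope, same scenario
```

These give 544 and 471 events, against 541 and 453 in the table, so the rule of thumb is a rough check rather than an exact recalculation.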