Richard D Riley, Kym I E Snell, Glen P Martin, Rebecca Whittle, Lucinda Archer, Matthew Sperrin, Gary S Collins.
Abstract
OBJECTIVES: When developing a clinical prediction model, penalization techniques are recommended to address overfitting, as they shrink predictor effect estimates toward the null and reduce mean-square prediction error in new individuals. However, shrinkage and penalty terms ('tuning parameters') are estimated with uncertainty from the development data set. We examined the magnitude of this uncertainty and the subsequent impact on prediction model performance. STUDY DESIGN AND …
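As a worked illustration of how a penalty term shrinks effect estimates toward the null, consider ridge regression for a linear model. Ridge is chosen here as one common penalization method, which is an assumption: the abstract does not name a specific method. With a single predictor and centred x and y (so the intercept is zero), the penalized estimate is the ordinary least squares (OLS) estimate multiplied by a factor between 0 and 1 that depends on the tuning parameter λ:

```latex
\hat{\boldsymbol{\beta}}_{\lambda}
  = \arg\min_{\beta_0,\boldsymbol{\beta}}
    \Bigl\{ \sum_{i=1}^{n} \bigl(y_i - \beta_0 - \mathbf{x}_i^{\top}\boldsymbol{\beta}\bigr)^2
      + \lambda \sum_{j=1}^{p} \beta_j^2 \Bigr\},
\qquad
\hat{\beta}_{\lambda}
  = \frac{\sum_i x_i y_i}{\sum_i x_i^2 + \lambda}
  = \hat{\beta}_{\mathrm{OLS}} \cdot \frac{\sum_i x_i^2}{\sum_i x_i^2 + \lambda}
  \quad (p = 1,\ \text{centred } x \text{ and } y).
```

Setting λ = 0 recovers OLS, and the multiplier falls toward 0 as λ grows. Because λ is itself estimated from the development data, any uncertainty in λ carries straight through to the amount of shrinkage applied.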
Keywords: Overfitting; Penalization; Risk prediction models; Sample size; Shrinkage
Year: 2020 PMID: 33307188 PMCID: PMC8026952 DOI: 10.1016/j.jclinepi.2020.12.005
Source DB: PubMed Journal: J Clin Epidemiol ISSN: 0895-4356 Impact factor: 6.437
Table 1. Three prediction models developed using linear regression, with a summary of model performance and the bootstrap estimate of the uniform shrinkage factor
| Model | Outcome | Model equation derived using ordinary least squares estimation (i.e., before any shrinkage) | Number of patients/predictor parameters | R² | Uniform shrinkage factor, bootstrap estimate (95% CI) |
|---|---|---|---|---|---|
| A | Systolic blood pressure (SBP) (low CVD risk population) | 28.10 + 0.46∗SBP + 0.41∗DBP + 0.013∗BMI + 0.45∗age − 2.05∗sex − 17.81∗treat − 2.08∗smoker | 262/7 = 37 | 0.23 | 0.94 (0.77 to 1.18) |
| B | Systolic blood pressure (SBP) (high CVD risk population) | −12.69 + 0.94∗SBP + 0.21∗DBP − 0.001∗BMI + 0.06∗age + 1.72∗sex − 1.04∗treat + 0.17∗smoker | 253/7 = 36 | 0.56 | 0.98 (0.87 to 1.10) |
| C | ln(FEV) | −2.07 + 0.02∗age + 0.04∗height + 0.03∗sex + 0.05∗smoker | 654/4 = 164 | 0.81 | 1.00 (0.96 to 1.04) |
DBP, diastolic blood pressure; BMI, body mass index; CVD, cardiovascular disease.
Fig. 1. The mean estimate and 95% confidence interval of the uniform shrinkage factor, as derived from 1,000 bootstrap samples, across different sample sizes for developing models A, B, and C as described in Table 1. Curves were created using a lowess smoother.
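Table 1 and Fig. 1 rely on a bootstrap estimate of the uniform shrinkage factor. Below is a minimal sketch of one common version of that procedure, under stated assumptions. Each OLS model fitted in a bootstrap sample is applied back to the original data, and its calibration slope (observed outcome regressed on its predictions) is recorded. The shrinkage estimate is the mean of these slopes. The interval is taken here as the 2.5th and 97.5th percentiles of the slopes, which is an assumption about how the reported interval was formed. The data generator `simulate`, its effect sizes, and the sample sizes 50 and 400 are hypothetical and unrelated to the data behind models A to C. `n_boot` is reduced to 200 in the usage lines to keep the run short.

```python
import random
import statistics


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(X, y):
    """OLS coefficients [b0, b1, ..., bp] via the normal equations (intercept added)."""
    Z = [[1.0] + list(row) for row in X]
    k = len(Z[0])
    ztz = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    zty = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(k)]
    return solve(ztz, zty)


def linear_predictor(coef, X):
    """Predictions b0 + b1*x1 + ... + bp*xp for each row of X."""
    return [coef[0] + sum(b * v for b, v in zip(coef[1:], row)) for row in X]


def slope(pred, y):
    """Calibration slope: OLS slope of observed y regressed on predictions."""
    mp, my = statistics.fmean(pred), statistics.fmean(y)
    sxy = sum((p - mp) * (t - my) for p, t in zip(pred, y))
    sxx = sum((p - mp) ** 2 for p in pred)
    return sxy / sxx


def bootstrap_shrinkage(X, y, n_boot=1000, seed=1):
    """Mean calibration slope of bootstrap-sample models applied to the original data,
    plus a 2.5th to 97.5th percentile interval of those slopes."""
    rng = random.Random(seed)
    n = len(y)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        coef = ols([X[i] for i in idx], [y[i] for i in idx])
        slopes.append(slope(linear_predictor(coef, X), y))
    slopes.sort()
    lo = slopes[round(0.025 * (n_boot - 1))]
    hi = slopes[round(0.975 * (n_boot - 1))]
    return statistics.fmean(slopes), (lo, hi)


def simulate(n, rng):
    """Hypothetical data: three standard-normal predictors, the third with no true effect."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n)]
    y = [1.0 + 0.5 * r[0] + 0.3 * r[1] + rng.gauss(0.0, 1.0) for r in X]
    return X, y


rng = random.Random(7)
results = {}
for n in (50, 400):
    X, y = simulate(n, rng)
    results[n] = bootstrap_shrinkage(X, y, n_boot=200)
    s, (lo, hi) = results[n]
    print(f"n = {n}: S = {s:.2f} (95% interval {lo:.2f} to {hi:.2f})")
```

An OLS model evaluated in its own development data always has a calibration slope of exactly 1. Any shortfall in the bootstrap slopes therefore reflects overfitting. As in Fig. 1, the interval would be expected to be noticeably wider at the smaller sample size.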
Fig. 2. Difference in predicted systolic blood pressure (SBP) values, in mmHg, when using the lower or upper bound of the bootstrap-derived 95% confidence interval for the shrinkage factor to revise model A, after model development using (A) 262 participants and (B) 50 participants.
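The size of the discrepancy in Fig. 2 follows from simple arithmetic. Suppose all predictor effects are shrunk by S and the intercept is re-estimated so that the mean prediction still equals the mean outcome ȳ. For an OLS model, the shrunk prediction is then ȳ + S × (LP − ȳ), where LP is the unshrunk prediction. Switching between the lower and upper confidence bounds of S therefore changes a patient's prediction by (S_upper − S_lower) × (LP − ȳ). The sketch below applies this to the model A equation and the n = 262 interval (0.77 to 1.18) from Table 1. The patient, the 0/1 coding of sex, treat and smoker, and the development-data mean SBP of 140 mmHg are all hypothetical. This intercept re-estimation rule is an assumption and may differ from how Fig. 2 was produced.

```python
# Model A coefficients (ordinary least squares, before shrinkage), copied from Table 1.
MODEL_A = {
    "intercept": 28.10, "SBP": 0.46, "DBP": 0.41, "BMI": 0.013,
    "age": 0.45, "sex": -2.05, "treat": -17.81, "smoker": -2.08,
}


def linear_predictor(coefs, patient):
    """Unshrunk prediction (mmHg) from the OLS equation."""
    return coefs["intercept"] + sum(coefs[name] * value for name, value in patient.items())


def shrunk_prediction(lp, s, mean_outcome):
    """Uniform shrinkage of all predictor effects by s, with the intercept re-estimated
    so that the mean prediction still equals the mean outcome (true for OLS fits)."""
    return mean_outcome + s * (lp - mean_outcome)


# Hypothetical patient (sex/treat/smoker assumed coded 0/1) and hypothetical mean SBP.
patient = {"SBP": 150, "DBP": 90, "BMI": 27, "age": 60, "sex": 1, "treat": 0, "smoker": 0}
mean_sbp = 140.0

lp = linear_predictor(MODEL_A, patient)
low = shrunk_prediction(lp, 0.77, mean_sbp)   # lower 95% bound of S for n = 262 (Table 1)
high = shrunk_prediction(lp, 1.18, mean_sbp)  # upper 95% bound of S for n = 262 (Table 1)
print(f"unshrunk {lp:.1f} mmHg; S = 0.77 gives {low:.1f}; S = 1.18 gives {high:.1f}; "
      f"difference {high - low:.1f} mmHg")
```

With these hypothetical inputs, the 0.77 to 1.18 interval for S translates into a difference of about 7.9 mmHg for one patient. The gap grows linearly with the distance between a patient's unshrunk prediction and the mean outcome, so patients with extreme predictions are affected most.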
Fig. 3. Median values (short horizontal lines) and scatter plots showing variability in (A) the tuning parameter estimate and (B) the predictive performance of the developed model in the large validation data, for various methods across varying sample sizes for model development. For each sample size, 500 data sets were simulated as described in Section 2.3. For each data set, a model was developed with each method; (A) its tuning parameter was estimated, and (B) its predictive performance was then tested. In (B), the long horizontal lines are the large-sample performance values. CITL, calibration-in-the-large. Horizontal spread within each sample-size grouping is random jitter added to aid display. The c-index is not shown for heuristic or bootstrap shrinkage: these methods do not change the ranking of predictions, so their c-index is identical to that of maximum likelihood estimation.
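The performance measures in Fig. 3(B) can be sketched for a binary outcome as follows. This sketch makes three assumptions. CITL is taken as the intercept α in logit P(y=1) = α + offset(LP). The calibration slope is taken as the coefficient of LP in a logistic recalibration model. Both are common definitions, but the paper's exact definitions are not reproduced in this section. Finally, the validation data, the "overfitted" model and the shrinkage factor S = 2/3 are hypothetical. The final comparison illustrates the caption's point. Uniform shrinkage plus an intercept shift is a positive linear rescaling of the linear predictor, so it preserves the ranking of predictions and hence the c-index.

```python
import math
import random


def expit(z):
    """Numerically stable inverse logit."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def citl(lp, y, iters=25):
    """Calibration-in-the-large: alpha in logit P(y=1) = alpha + offset(lp), via Newton-Raphson."""
    a = 0.0
    for _ in range(iters):
        ps = [expit(a + x) for x in lp]
        score = sum(t - p for t, p in zip(y, ps))
        info = sum(p * (1.0 - p) for p in ps)
        a += score / info
    return a


def calibration_slope(lp, y, iters=25):
    """Slope b in logit P(y=1) = a + b*lp, fitted by two-parameter Newton-Raphson."""
    a, b = 0.0, 1.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, t in zip(lp, y):
            p = expit(a + b * x)
            w = p * (1.0 - p)
            g0 += t - p
            g1 += (t - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return b


def c_index(lp, y):
    """Share of event/non-event pairs in which the event has the higher prediction (ties count 0.5)."""
    events = [s for s, t in zip(lp, y) if t == 1]
    non_events = [s for s, t in zip(lp, y) if t == 0]
    score = 0.0
    for e in events:
        for ne in non_events:
            score += 1.0 if e > ne else (0.5 if e == ne else 0.0)
    return score / (len(events) * len(non_events))


# Hypothetical validation data from a true model logit P(y=1) = -1 + 0.8x.
rng = random.Random(2020)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
y = [1 if rng.random() < expit(-1.0 + 0.8 * xi) else 0 for xi in x]

# Hypothetical overfitted model whose predictor effect is 1.5 times too large.
lp_over = [-1.0 + 1.2 * xi for xi in x]
# Uniform shrinkage by S = 2/3 plus an intercept shift: a positive linear rescaling.
S = 2.0 / 3.0
lp_shrunk = [S * v - 0.3 for v in lp_over]

for name, lp in (("overfitted", lp_over), ("shrunk", lp_shrunk)):
    print(f"{name:>10}: CITL = {citl(lp, y):.3f}, "
          f"slope = {calibration_slope(lp, y):.3f}, c-index = {c_index(lp, y):.3f}")
```

Because the logistic recalibration fit is equivariant under linear rescaling, the calibration slope of S × LP + c is exactly the slope of LP divided by S. Uniform shrinkage therefore changes the calibration measures while leaving the c-index untouched.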