Romin Pajouheshnia, Wiebe R Pestman, Steven Teerenstra, Rolf H H Groenwold.
Abstract
BACKGROUND: It is often unclear which approach to fitting, assessing and adjusting a model will yield the most accurate prediction model. We present an extension of an approach for comparing modelling strategies in linear regression to the setting of logistic regression and demonstrate its application in clinical prediction research.
Year: 2016 PMID: 27557642 PMCID: PMC4997720 DOI: 10.1186/s12874-016-0209-0
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Fig. 1 An example of the comparison of two linear regression modelling strategies. Strategies A and B are individually applied to a data set and the ratio SSE(B)/SSE(A) is calculated. The process is repeated 10,000 times, yielding a comparison distribution. The left tail below a cut-off value of 1 represents the victory rate of strategy B over strategy A, that is, the proportion of times strategy B outperformed strategy A
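The procedure in the Fig. 1 caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the data-generating model, the sample sizes, the fixed shrinkage factor used as "strategy B", and the function names (`ols_fit`, `comparison_distribution`) are all hypothetical, and SSE is assumed here to be evaluated on an independent sample drawn from the same simulated model.

```python
import numpy as np


def ols_fit(X, y):
    """Least squares fit with intercept; returns [intercept, slopes...]."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def sse(y, y_hat):
    """Sum of squared errors."""
    return float(np.sum((y - y_hat) ** 2))


def comparison_distribution(n_reps=2000, n_train=30, n_test=1000,
                            n_vars=5, shrink=0.9, seed=0):
    """Return the ratios SSE(B)/SSE(A) and the victory rate of B over A.

    Strategy A: ordinary least squares (assumed baseline).
    Strategy B: OLS slopes multiplied by a fixed, hypothetical factor
    `shrink`, with the intercept re-estimated afterwards.
    """
    rng = np.random.default_rng(seed)
    true_beta = np.full(n_vars, 0.2)  # assumed true slopes
    ratios = np.empty(n_reps)
    for r in range(n_reps):
        # Simulate a development sample and an independent evaluation sample.
        X = rng.normal(size=(n_train, n_vars))
        y = X @ true_beta + rng.normal(size=n_train)
        X_new = rng.normal(size=(n_test, n_vars))
        y_new = X_new @ true_beta + rng.normal(size=n_test)

        beta_a = ols_fit(X, y)
        pred_a = beta_a[0] + X_new @ beta_a[1:]

        slopes_b = shrink * beta_a[1:]
        intercept_b = y.mean() - X.mean(axis=0) @ slopes_b
        pred_b = intercept_b + X_new @ slopes_b

        ratios[r] = sse(y_new, pred_b) / sse(y_new, pred_a)
    # Victory rate: proportion of replicates with a ratio below 1.
    victory_rate = float(np.mean(ratios < 1.0))
    return ratios, victory_rate


if __name__ == "__main__":
    ratios, rate = comparison_distribution()
    print(f"Victory rate of B over A: {100 * rate:.1f}%")
    print(f"Median ratio: {np.median(ratios):.3f}")
```

A histogram of `ratios` with a vertical line at 1 would give a picture of the kind the caption describes; the area to the left of the line is the victory rate.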
A comparison of modelling strategies against the null strategy in the full Oudega DVT data
| Strategy | Victory rate (%) | Median | IQR | Mean shrinkage |
|---|---|---|---|---|
| 1. Heuristic shrinkage | 56.9 | −0.2 | 1.5 | 0.97 |
| 2. Split sample shrinkage | 66.8 | −0.2 | 0.7 | 0.98 |
| 3. 10-fold CV shrinkage | 48.0 | 0.0 | 0.1 | 1.00 |
| 4. Bootstrap shrinkage | 66.4 | −0.3 | 1.0 | 0.97 |
| 5. Firth penalization | 66.9 | −0.2 | 0.6 | -* |
Victory rates and associated metrics are presented. Values are based on 5000 comparison replicates. Abbreviations: IQR interquartile range, CV cross-validation
*No mean shrinkage for the Firth penalization strategy is presented as shrinkage occurs during the coefficient estimation process
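The "Mean shrinkage" column reports a multiplicative factor applied to the estimated regression coefficients. The record does not give the formula used for each strategy, so the following is only a sketch of one commonly used definition of a heuristic shrinkage factor, (model chi-square − number of predictors) / model chi-square; the function names and the example numbers are hypothetical.

```python
def heuristic_shrinkage_factor(model_chi2, n_predictors):
    """Heuristic uniform shrinkage factor (assumed formula).

    model_chi2: likelihood ratio chi-square of the fitted model.
    n_predictors: number of estimated predictor coefficients.
    """
    if model_chi2 <= 0:
        raise ValueError("model_chi2 must be positive")
    return (model_chi2 - n_predictors) / model_chi2


def shrink_slopes(slopes, factor):
    """Multiply every slope by the same factor (uniform shrinkage).

    The intercept is not handled here; it would normally be
    re-estimated after the slopes have been shrunk.
    """
    return [factor * b for b in slopes]


if __name__ == "__main__":
    # Hypothetical model: chi-square of 100 with 3 predictors.
    s = heuristic_shrinkage_factor(100.0, 3)
    print(round(s, 2))
    print(shrink_slopes([1.0, -0.5, 2.0], s))
```

A factor close to 1 leaves the coefficients almost unchanged, while smaller values pull them more strongly towards zero.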
Fig. 2 Histograms of the distributions resulting from comparisons between five modelling strategies and the null strategy in the full Oudega data set. The victory rate of each strategy over the null strategy is represented by the proportion of trials to the left of the blue indicator line. The distributions each represent 5000 comparison replicates
A comparison of modelling strategies in three additional clinical data sets
| Strategy | Victory rate (%) | Mean shrinkage | Victory rate (%) | Mean shrinkage | Victory rate (%) | Mean shrinkage |
|---|---|---|---|---|---|---|
| 1. Heuristic shrinkage | 63.8 | 0.93 | 60.8 | 0.93 | 3.9 | 0.71 |
| 2. Split sample shrinkage | 61.9 | 0.92 | 42.0 | 0.94 | 93.8 | 0.98 |
| 3. 10-fold CV shrinkage | 38.3 | 1.00 | 39.6 | 0.99 | 90.9 | 0.99 |
| 4. Bootstrap shrinkage | 56.4 | 0.89 | 42.6 | 0.94 | 94.9 | 0.97 |
| 5. Firth penalization | 73.8 | -* | 66.0 | -* | 65.8 | -* |
Victory rates of each strategy over the null strategy are presented, as well as the mean shrinkage factor applied in each of the shrinkage-based strategies. Values are based on 5000 comparison replicates. Abbreviations: CV cross-validation
*No mean shrinkage for the Firth penalization strategy is presented as shrinkage occurs during the coefficient estimation process
Fig. 3 a-e The influence of data characteristics on the performance of different modelling strategies compared to the null strategy. Victory rates were estimated across a range of values of a data parameter, keeping all other parameters fixed. a Linear regression using simulated data; the number of observations in the data per model variable was varied. b Linear regression using simulated data; the fraction of explained variance (R²) of the least squares model was varied. c Logistic regression using simulated data based on the full Oudega data; the number of outcome events in the data per model variable was varied. d Logistic regression using simulated data based on the full Oudega data; the explained variance (Nagelkerke's R²) of the maximum likelihood model was varied. e Logistic regression using simulated data based on the Deepvein data; the number of outcome events in the data per model variable was varied. * A loess smoother was applied to (c), (d) and (e)