Gordon C S Smith, Shaun R Seaman, Angela M Wood, Patrick Royston, Ian R White.
Abstract
The C statistic is a commonly reported measure of screening test performance. Optimistic estimation of the C statistic is a frequent problem because of overfitting of statistical models in small data sets, and methods exist to correct for this issue. However, many studies do not use such methods, and those that do correct for optimism use diverse methods, some of which are known to be biased. We used clinical data sets (United Kingdom Down syndrome screening data from Glasgow (1991-2003), Edinburgh (1999-2003), and Cambridge (1990-2006), as well as Scottish national pregnancy discharge data (2004-2007)) to evaluate different approaches to adjustment for optimism. We found that sample splitting, cross-validation without replication, and leave-1-out cross-validation produced optimism-adjusted estimates of the C statistic that were biased and/or associated with greater absolute error than other available methods. Cross-validation with replication, bootstrapping, and a new method (leave-pair-out cross-validation) all generated unbiased optimism-adjusted estimates of the C statistic and had similar absolute errors in the clinical data set. Larger simulation studies confirmed that all 3 methods performed similarly with 10 or more events per variable, or when the C statistic was 0.9 or greater. However, with lower events per variable or lower C statistics, bootstrapping tended to be optimistic but with lower absolute and mean squared errors than both methods of cross-validation.
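As a rough illustration of the quantities compared in the abstract, the Python sketch below computes the C statistic as the proportion of (case, non-case) pairs in which the case receives the higher score, then contrasts the apparent C statistic with leave-pair-out and bootstrap optimism-corrected estimates. The model `fit_mean_difference` is a hypothetical stand-in chosen so the sketch runs with the standard library alone; it is not the logistic modelling used in the study. The bootstrap shown is the common Harrell-style optimism bootstrap, assumed here rather than confirmed from the paper, and all function names are illustrative.

```python
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]
Scorer = Callable[[Vector], float]
Fitter = Callable[[List[Vector], List[int]], Scorer]


def c_statistic(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Share of (case, non-case) pairs in which the case scores higher; ties count 1/2."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    total = 0.0
    for sc in cases:
        for sn in controls:
            total += 1.0 if sc > sn else (0.5 if sc == sn else 0.0)
    return total / (len(cases) * len(controls))


def fit_mean_difference(xs: List[Vector], ys: List[int]) -> Scorer:
    """Hypothetical stand-in model (not the paper's): linear score whose weights
    are the case feature means minus the non-case feature means."""
    dim = len(xs[0])

    def mean(rows: List[Vector]) -> List[float]:
        return [sum(r[j] for r in rows) / len(rows) for j in range(dim)]

    w = [a - b for a, b in zip(mean([x for x, y in zip(xs, ys) if y == 1]),
                               mean([x for x, y in zip(xs, ys) if y == 0]))]
    return lambda x: sum(wj * xj for wj, xj in zip(w, x))


def apparent_c(xs, ys, fit: Fitter = fit_mean_difference) -> float:
    """Fit and evaluate on the same data (optimistic in small samples)."""
    model = fit(xs, ys)
    return c_statistic([model(x) for x in xs], ys)


def leave_pair_out_c(xs, ys, fit: Fitter = fit_mean_difference) -> float:
    """For every (case, non-case) pair, refit without that pair and check whether
    the refitted model ranks the held-out case above the held-out non-case.
    Assumes at least two cases and two non-cases."""
    cases = [i for i, y in enumerate(ys) if y == 1]
    controls = [i for i, y in enumerate(ys) if y == 0]
    total = 0.0
    for i in cases:
        for j in controls:
            keep = [k for k in range(len(ys)) if k not in (i, j)]
            model = fit([xs[k] for k in keep], [ys[k] for k in keep])
            si, sj = model(xs[i]), model(xs[j])
            total += 1.0 if si > sj else (0.5 if si == sj else 0.0)
    return total / (len(cases) * len(controls))


def bootstrap_corrected_c(xs, ys, fit: Fitter = fit_mean_difference,
                          n_boot: int = 200, seed: int = 0) -> float:
    """Harrell-style optimism bootstrap (assumed): apparent C minus the mean of
    (C on the resample - C of the same model on the original data)."""
    if len(set(ys)) < 2:
        raise ValueError("need both cases and non-cases")
    rng = random.Random(seed)
    n = len(ys)
    optimism: List[float] = []
    while len(optimism) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        bx, by = [xs[k] for k in idx], [ys[k] for k in idx]
        if len(set(by)) < 2:
            continue  # skip resamples that lack either outcome
        model = fit(bx, by)
        c_boot = c_statistic([model(x) for x in bx], by)
        c_orig = c_statistic([model(x) for x in xs], ys)
        optimism.append(c_boot - c_orig)
    return apparent_c(xs, ys, fit) - sum(optimism) / n_boot


if __name__ == "__main__":
    xs = [[0.0], [2.0], [1.0], [3.0]]  # hypothetical toy data
    ys = [0, 0, 1, 1]
    print(apparent_c(xs, ys))        # 0.75
    print(leave_pair_out_c(xs, ys))  # 0.5
```

On this four-subject toy example the apparent C statistic is 0.75 while leave-pair-out gives 0.5, a miniature version of the optimism the abstract describes. Any other model can be plugged in through the `fit` argument, provided it returns a scoring function.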
Keywords: logistic models; models, statistical; multivariate analysis; receiver operating characteristic curve
Year: 2014 PMID: 24966219 PMCID: PMC4108045 DOI: 10.1093/aje/kwu140
Source DB: PubMed Journal: Am J Epidemiol ISSN: 0002-9262 Impact factor: 4.897
Figure 1. The medians and interquartile ranges of the difference between the C statistic estimated using different methods and the true C statistic for A) 50 small Down syndrome data sets and B) 150 small cesarean delivery data sets. The data are United Kingdom Down syndrome screening results from Glasgow (1991–2003), Edinburgh (1999–2003), and Cambridge (1990–2006), as well as Scottish national pregnancy discharge data (2004–2007). Bars, 95% confidence intervals. CV, cross-validation.
Figure 2. The medians and interquartile ranges of the absolute (unsigned) difference between the C statistic estimated using different methods of correcting for optimism and the true C statistic for A) 50 small Down syndrome data sets and B) 150 small cesarean delivery data sets. (See Table 1 for a pairwise comparison of absolute errors between methods.) The data are United Kingdom Down syndrome screening results from Glasgow (1991–2003), Edinburgh (1999–2003), and Cambridge (1990–2006), as well as Scottish national pregnancy discharge data (2004–2007). Bars, 95% confidence intervals. CV, cross-validation.
Table 1. Pairwise Comparison of Absolute Errors Using Different Methods of Adjustment for Optimism Using Data on Down Syndrome and Cesarean Delivery, United Kingdom, 1990–2007
Values are median differences in absolute error (IQR)a.
| Method of Adjustment by Data Set | Sample Splitting | Bootstrapping | 10-Fold CV | 10-Fold CV (20 Replications) | Leave-Pair-Out CV |
|---|---|---|---|---|---|
| Down syndrome data set | | | | | |
| Bootstrapping | 0.022*** (–0.005 to 0.053) | | | | |
| 10-Fold CV | 0.015 (–0.028 to 0.046) | –0.006** (–0.024 to 0.003) | | | |
| 10-Fold CV (20 replications) | 0.025*** (–0.008 to 0.052) | 0.002 (–0.006 to 0.007) | 0.011*** (–0.003 to 0.025) | | |
| Leave-pair-out CV | 0.018*** (–0.005 to 0.052) | 0.000 (–0.004 to 0.005) | 0.007** (–0.004 to 0.024) | –0.002 (–0.005 to 0.003) | |
| Leave-1-out CV | 0.014** (–0.006 to 0.053) | –0.003 (–0.031 to 0.016) | 0.010 (–0.016 to 0.029) | –0.006 (–0.028 to 0.014) | –0.010* (–0.026 to 0.013) |
| Cesarean delivery data set | | | | | |
| Bootstrapping | 0.012*** (–0.018 to 0.079) | | | | |
| 10-Fold CV | 0.013** (–0.032 to 0.066) | –0.008** (–0.035 to 0.017) | | | |
| 10-Fold CV (20 replications) | 0.015*** (–0.023 to 0.069) | –0.001 (–0.014 to 0.008) | 0.008 (–0.018 to 0.028) | | |
| Leave-pair-out CV | 0.015*** (–0.023 to 0.072) | 0.000 (–0.009 to 0.006) | 0.007* (–0.018 to 0.030) | –0.001 (–0.006 to 0.007) | |
| Leave-1-out CV | 0.016** (–0.031 to 0.060) | –0.012** (–0.037 to 0.023) | 0.000 (–0.026 to 0.029) | –0.009** (–0.029 to 0.016) | –0.019*** (–0.029 to 0.020) |
Abbreviations: CV, cross-validation; IQR, interquartile range.
Significance levels: *P < 0.05, **P < 0.01, ***P < 0.001.
a The absolute error of the method in the row is subtracted from the absolute error of the method in the column, and the medians and IQRs of these differences are presented across the small data sets (50 for Down syndrome and 150 for cesarean delivery). Hence, positive values indicate greater absolute error with the method in the column, and negative values indicate lower absolute error with the method in the column. Statistical comparison is by the Wilcoxon signed-rank test against the null hypothesis of no difference.
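The sign convention and test in the footnote can be sketched as follows for one table cell. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, quartiles come from Python's `statistics.quantiles` with its default (exclusive) method, and the p-value uses the large-sample normal approximation to the Wilcoxon signed-rank test with no tie or continuity correction, which may differ from the software used in the study.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def absolute_error_differences(true_c: Sequence[float], row_est: Sequence[float],
                               col_est: Sequence[float]) -> List[float]:
    """Column-method absolute error minus row-method absolute error, per data set.
    Positive values mean the column method was further from the true C statistic."""
    return [abs(c - t) - abs(r - t) for t, r, c in zip(true_c, row_est, col_est)]


def median_iqr(diffs: Sequence[float]) -> Tuple[float, float, float]:
    """Median plus lower and upper quartiles (statistics.quantiles default method)."""
    q1, _, q3 = statistics.quantiles(diffs, n=4)
    return statistics.median(diffs), q1, q3


def wilcoxon_signed_rank_p(diffs: Sequence[float]) -> float:
    """Two-sided p-value, normal approximation to the signed-rank statistic.
    Zero differences are dropped; tied |d| values get average ranks."""
    nz = [d for d in diffs if d != 0]
    n = len(nz)
    if n == 0:
        return 1.0
    order = sorted(range(n), key=lambda i: abs(nz[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(nz[order[j + 1]]) == abs(nz[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based ranks i+1..j+1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, nz) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical estimates for five small data sets, not data from the paper.
    true_c = [0.80] * 5
    row_est = [0.81, 0.79, 0.80, 0.82, 0.78]
    col_est = [0.85, 0.75, 0.83, 0.76, 0.90]
    d = absolute_error_differences(true_c, row_est, col_est)
    print(median_iqr(d), wilcoxon_signed_rank_p(d))
```

Here the column method is further from the truth in every data set, so the median difference is positive (about 0.04), matching the footnote's reading that positive values favour the row method.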
Figure 3. The C statistics for 50 consecutive analyses of a representative subsample illustrating the variability of repeated analysis using different methods. A) For the Down syndrome data, the standard deviations of the 50 repeated analyses were 0.004 (range, 0.876–0.893) for bootstrapping, 0.035 (range, 0.805–0.964) for 10-fold cross-validation (CV), and 0.008 (range, 0.869–0.901) for 10-fold cross-validation with 20 replications. B) For the cesarean delivery data, the standard deviations of the 50 repeated analyses were 0.004 (range, 0.686–0.700) for bootstrapping, 0.042 (range, 0.594–0.773) for 10-fold cross-validation, and 0.009 (range, 0.669–0.707) for 10-fold cross-validation with 20 replications. The data are United Kingdom Down syndrome screening results from Glasgow (1991–2003), Edinburgh (1999–2003), and Cambridge (1990–2006), as well as Scottish national pregnancy discharge data (2004–2007).
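Figure 3 contrasts a single run of 10-fold cross-validation with 10-fold cross-validation repeated 20 times. A minimal sketch of both follows, assuming stratified folds and pooling the out-of-fold scores into one C statistic per run (averaging per-fold C statistics is another convention; the caption does not say which was used). The model is a hypothetical difference-of-class-means stand-in, and the demo uses simulated data, not the study data.

```python
import random
import statistics
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]


def c_statistic(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Share of (case, non-case) pairs in which the case scores higher; ties count 1/2."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    total = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in cases for b in controls)
    return total / (len(cases) * len(controls))


def fit_mean_difference(xs: List[Vector], ys: List[int]) -> Callable[[Vector], float]:
    """Hypothetical stand-in model: weights = case feature means - non-case feature means."""
    def mean(rows: List[Vector]) -> List[float]:
        return [sum(col) / len(rows) for col in zip(*rows)]
    w = [a - b for a, b in zip(mean([x for x, y in zip(xs, ys) if y == 1]),
                               mean([x for x, y in zip(xs, ys) if y == 0]))]
    return lambda x: sum(wj * xj for wj, xj in zip(w, x))


def kfold_c(xs, ys, k: int, rng: random.Random) -> float:
    """One run of stratified k-fold CV with pooled out-of-fold scores.
    Assumes at least two cases and two non-cases, and k >= 2."""
    cases = [i for i, y in enumerate(ys) if y == 1]
    controls = [i for i, y in enumerate(ys) if y == 0]
    rng.shuffle(cases)
    rng.shuffle(controls)
    folds = [cases[f::k] + controls[f::k] for f in range(k)]
    scores = [0.0] * len(ys)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(len(ys)) if i not in held_out]
        model = fit_mean_difference([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            scores[i] = model(xs[i])  # scored by a model that never saw subject i
    return c_statistic(scores, ys)


def replicated_kfold_c(xs, ys, k: int = 10, reps: int = 20,
                       rng: Optional[random.Random] = None) -> float:
    """Mean of `reps` independent k-fold runs, each with a fresh random split."""
    if rng is None:
        rng = random.Random(0)
    return statistics.mean(kfold_c(xs, ys, k, rng) for _ in range(reps))


if __name__ == "__main__":
    gen = random.Random(42)
    # Simulated data: 10 cases, 30 non-cases, weak signal in the first feature only.
    ys = [1] * 10 + [0] * 30
    xs = [[gen.gauss(0.5 * y, 1.0), gen.gauss(0.0, 1.0), gen.gauss(0.0, 1.0)] for y in ys]
    single = [kfold_c(xs, ys, 10, random.Random(s)) for s in range(50)]
    repeated = [replicated_kfold_c(xs, ys, 10, 20, random.Random(s)) for s in range(50)]
    print("SD of 50 single 10-fold runs:", statistics.stdev(single))
    print("SD of 50 runs with 20 replications:", statistics.stdev(repeated))
```

Averaging over replications removes much of the split-to-split randomness that a single random partition carries, which is the kind of variability Figure 3 reports.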