Abstract
As a performance measure for a prediction model, the area under the receiver operating characteristic curve (AUC) is insensitive to the addition of strong markers. A number of measures sensitive to performance change have recently been proposed; however, these relative-performance measures may lead to self-contradictory conclusions. This paper examines alternative performance measures for prediction models: the Lorenz curve-based Gini and Pietra indices, and a standardized version of the Brier score, the scaled Brier. Computer simulations are performed to study the sensitivity of these measures to performance change when a new marker is added to a baseline model. When the discrimination power of the added marker is concentrated in the gray zone of the baseline model, the AUC and the Gini show minimal performance improvements, whereas the Pietra and the scaled Brier show larger improvements. The Pietra and the scaled Brier indices are therefore recommended for prediction model performance measurement, in light of their ease of interpretation, clinical relevance and sensitivity to gray-zone resolving markers.
Year: 2014 PMID: 24608868 PMCID: PMC3946724 DOI: 10.1371/journal.pone.0091249
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Computing formulas and interpretations of various performance measures.
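The formulas of Figure 1 are not reproduced in this record. As a rough sketch, the four measures can be computed from observed outcomes and predicted probabilities using their commonly cited definitions, which are assumptions here rather than the paper's exact formulas: the AUC as the probability that a diseased subject receives a higher predicted probability than a non-diseased one (ties counted as one half), the Gini as 2 × AUC − 1, the Pietra as the largest gap between the true-positive and false-positive rates over all cut-offs, and the scaled Brier as one minus the Brier score divided by the Brier score of a prevalence-only prediction. The function name `performance_measures` and the toy data are hypothetical.

```python
from bisect import bisect_left, bisect_right


def performance_measures(y, p):
    """Sketch of AUC, Gini, Pietra and scaled Brier (assumed definitions).

    y: list of 0/1 outcomes (1 = diseased); p: predicted probabilities.
    """
    pos = sorted(pi for yi, pi in zip(y, p) if yi == 1)  # diseased scores
    neg = sorted(pi for yi, pi in zip(y, p) if yi == 0)  # non-diseased scores
    n1, n0 = len(pos), len(neg)

    # AUC: share of (diseased, non-diseased) pairs ranked correctly, ties = 1/2.
    wins = 0.0
    for s in pos:
        below = bisect_left(neg, s)
        ties = bisect_right(neg, s) - below
        wins += below + 0.5 * ties
    auc = wins / (n1 * n0)

    # Gini: assumed here to be the rescaled AUC.
    gini = 2 * auc - 1

    # Pietra: largest TPR - FPR over cut-offs c (test positive when score > c).
    pietra = 0.0
    for c in set(p):
        tpr = (n1 - bisect_right(pos, c)) / n1
        fpr = (n0 - bisect_right(neg, c)) / n0
        pietra = max(pietra, tpr - fpr)

    # Scaled Brier: 1 - Brier / Brier of predicting the prevalence for everyone.
    n = len(y)
    brier = sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / n
    prevalence = n1 / n
    sbrier = 1 - brier / (prevalence * (1 - prevalence))

    return {"auc": auc, "gini": gini, "pietra": pietra, "sbrier": sbrier}


# Hypothetical toy data: two non-diseased and two diseased subjects.
measures = performance_measures([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(measures)
```

The pairwise AUC loop is kept simple for readability. Both outcome classes must be present, otherwise the function divides by zero.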
Figure 2. Disease odds ratios (discrimination powers) of the new markers (solid line: when the discrimination power of the new marker is independent of the baseline score; dotted line: when the discrimination power of the new marker is concentrated in the gray zone of the baseline model).
Figure 3. Distribution of the predicted probabilities for a baseline model (A), and for the model with one new marker added (B) or the other new marker added (C).
The discrimination power of the marker added in (B) is independent of the baseline score, and that of the marker added in (C) is concentrated in the gray zone of the baseline model. The solid vertical bar indicates the grand mean of the predicted probabilities, and the two dotted vertical bars indicate the means of the predicted probabilities for the diseased subjects and the non-diseased subjects, respectively.
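The three vertical bars described in this caption are plain averages of the predicted probabilities. A minimal sketch, assuming outcomes coded 0/1 and a hypothetical helper name `figure3_bars`, might look like this:

```python
def figure3_bars(y, p):
    """Return (grand mean, mean among diseased, mean among non-diseased).

    y: list of 0/1 outcomes (1 = diseased); p: predicted probabilities.
    Assumes both outcome classes are present.
    """
    diseased = [pi for yi, pi in zip(y, p) if yi == 1]
    non_diseased = [pi for yi, pi in zip(y, p) if yi == 0]
    grand_mean = sum(p) / len(p)                          # solid bar
    mean_diseased = sum(diseased) / len(diseased)         # dotted bar
    mean_non_diseased = sum(non_diseased) / len(non_diseased)  # dotted bar
    return grand_mean, mean_diseased, mean_non_diseased


# Hypothetical toy data, not taken from the paper's simulations.
print(figure3_bars([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```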
Improvements in prediction performance when each of two new markers is added to a baseline model.
| Model | AUC | Gini | Pietra | sBrier |
| --- | --- | --- | --- | --- |
| Baseline | 0.822 | 0.644 | 0.485 | 0.306 |
| Baseline + first new marker | 0.841 | 0.683 | 0.521 | 0.344 |
| Baseline + second new marker | 0.844 | 0.687 | 0.568 | 0.363 |
| Absolute (relative) improvement over baseline | | | | |
| From adding the first new marker | +0.019 (+2.3%) | +0.039 (+6.1%) | +0.036 (+7.4%) | +0.038 (+12.4%) |
| From adding the second new marker | +0.022 (+2.7%) | +0.043 (+6.7%) | +0.083 (+17.1%) | +0.057 (+18.6%) |
The discrimination power of the first new marker is independent of the baseline score, whereas that of the second new marker is concentrated in the gray zone of the baseline model.
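The improvement rows follow directly from the performance rows: the absolute improvement is the difference from the first (baseline) row, and the relative improvement is that difference divided by the baseline value. The sketch below recomputes them from the rounded table entries. The dictionary names are hypothetical, and treating the first row as the baseline and the next two rows as the two marker-augmented models is an assumption based on the order of the table.

```python
# Values transcribed from the table above (rounded to three decimals in the source).
baseline = {"AUC": 0.822, "Gini": 0.644, "Pietra": 0.485, "sBrier": 0.306}
with_first_marker = {"AUC": 0.841, "Gini": 0.683, "Pietra": 0.521, "sBrier": 0.344}
with_second_marker = {"AUC": 0.844, "Gini": 0.687, "Pietra": 0.568, "sBrier": 0.363}


def improvement(new, old):
    """Return {measure: (absolute change, relative change in percent)}."""
    result = {}
    for name, old_value in old.items():
        delta = new[name] - old_value
        result[name] = (round(delta, 3), round(100 * delta / old_value, 1))
    return result


for label, model in [("first", with_first_marker), ("second", with_second_marker)]:
    for name, (abs_gain, rel_gain) in improvement(model, baseline).items():
        print(f"{label} marker, {name}: {abs_gain:+.3f} ({rel_gain:+.1f}%)")
```

Recomputed this way, the relative gain in the Pietra for the second marker (+17.1%) is several times its gain in the AUC (+2.7%), which matches the pattern described in the abstract.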