M Shafiqur Rahman, Gareth Ambler, Babak Choodari-Oskooei, Rumana Z Omar.
Abstract
BACKGROUND: When developing a prediction model for survival data it is essential to validate its performance in external validation settings using appropriate performance measures. Although a number of such measures have been proposed, there is only limited guidance regarding their use in the context of model validation. This paper reviewed and evaluated a wide range of performance measures to provide some guidelines for their use in practice.
Keywords: Prognostic model; Survival analysis; Validation; calibration; discrimination
Year: 2017 PMID: 28420338 PMCID: PMC5395888 DOI: 10.1186/s12874-017-0336-2
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Summary of the performance measures
| Types of Measures | Measures | Characteristics | Range and Interpretation | Software |
|---|---|---|---|---|
| Overall Performance | R2 BS | Assesses the relative gain in predictive accuracy at a specific time point, quantified using a squared error loss function. | Range: 0 to 1 | Available in SAS and R and easy to implement in other software |
| Overall Performance | R2 IBS | Same approach as R2 BS but provides a summary over a range of time points. | Range: same as R2 BS. Interpretation: % gain in predictive accuracy over a range of time points relative to the null model. | Available in SAS and R and easy to implement in other software |
| Overall Performance | R2 SH | Assesses the relative gain in predictive accuracy, quantified using an absolute error loss function. It is not robust to model mis-specification. | Same as R2 IBS | Available in SAS and R and easy to implement in other software |
| Overall Performance | R2 S | Modified version of R2 SH which is robust to model mis-specification. | Same as R2 IBS | Available in SAS and R and easy to implement in other software |
| Overall Performance | R2 PM | Measures the variation in the outcome explained by the covariates in the model. Assumes that the model is correctly specified. Requires re-calibration in the validation data. | Range: 0 to 1 | Easy to implement in any software |
| Overall Performance | R2 D | Measures the relative gain in prognostic separation quantified by the D statistic. Assumes that the prognostic index (PI) is normally distributed. | Range: 0 to 1 | Available in Stata and easy to implement in other software |
| Discrimination | CH | Rank order statistic based on usable pairs in which the shorter time corresponds to an event. | Range: 0.5 to 1 | Available in R and Stata and easy to implement in other software |
| Discrimination | CU | Rank order statistic based on usable pairs. Inverse probability weighting is used to compensate for censoring. | Same as CH | Available in R and easy to implement in other software |
| Discrimination | CGH | Rank order statistic based on all patient pairs. Assumes that the Cox PH model is correctly specified. Requires re-calibration in the validation data. | Same as CH | Available in R and Stata and easy to implement in other software |
| Discrimination | D | Quantifies the observed separation between low and high risk groups. Assumes that the PI is normally distributed. | Range: 0 to ∞ | Available in Stata and easy to implement in other software |
| Calibration | Cal Slope | Regression slope of the PI; assesses the agreement between the observed and predicted survival. | Range: −∞ to ∞ | Easy to implement in any software |
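The CH row above describes a rank order statistic over usable pairs, where a pair is usable when the subject with the shorter observed time had an event. The following is a minimal Python sketch of that idea, not the implementation used in the paper. The function name `harrell_c`, the argument names, the convention that a higher risk score means shorter expected survival, and the choice to skip pairs with tied times are all assumptions made for illustration.

```python
from itertools import combinations


def harrell_c(time, event, risk):
    """Concordance over usable pairs (illustrative sketch).

    time  : observed follow-up times
    event : 1 if the event was observed, 0 if censored
    risk  : risk scores; higher is assumed to mean shorter survival
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(time)), 2):
        # Order the pair so that subject a has the shorter observed time.
        a, b = (i, j) if time[i] < time[j] else (j, i)
        # Assumption: pairs with tied times are skipped, and a pair is
        # usable only if the shorter time corresponds to an event.
        if time[a] == time[b] or not event[a]:
            continue
        usable += 1
        if risk[a] > risk[b]:
            concordant += 1.0   # higher risk failed first: concordant
        elif risk[a] == risk[b]:
            concordant += 0.5   # tied risk scores count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Toy data (hypothetical): risk scores perfectly ordered against time.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
scores = [2.0, 1.5, 1.0, 0.5]
print(harrell_c(times, events, scores))  # prints 1.0
```

The pair formed by the third and fourth subjects is dropped because the shorter time is censored, which is why censoring reduces the number of usable pairs in this kind of statistic.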
Values of the performance measures estimated in the breast cancer validation data
| Measure | Value (95% CI) |
|---|---|
| R2 IBS(3) | 0.107 (0.036 to 0.178) |
| R2 SH(3) | 0.130 (0.089 to 0.171) |
| R2 S(3) | 0.128 (0.090 to 0.167) |
| R2 BS(3) | 0.141 (0.033 to 0.250) |
| R2 PM | 0.194 (0.094 to 0.294) |
| R2 D | 0.192 (0.093 to 0.291) |
| CH | 0.674 (0.622 to 0.726) |
| CU | 0.666 (0.610 to 0.722) |
| CGH | 0.659 (0.616 to 0.701) |
| CH(3) | 0.685 (0.633 to 0.737) |
| CU(3) | 0.676 (0.619 to 0.734) |
| D | 0.998 (0.672 to 1.323) |
| Cal. Slope | 0.764 (0.531 to 0.996) |
Fig. 1 Prediction errors over time for the breast cancer risk model: a) prediction error (based on a quadratic loss function) for calculating R2 IBS and R2 BS; b) prediction error (based on an absolute loss function) for calculating R2 SH
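To make the quadratic-loss prediction error behind R2 BS concrete, here is a minimal Python sketch of an R2-type measure computed as the relative reduction in Brier score against a null model at a single time point. It assumes that no subject is censored before the time point, so the null model is simply the observed proportion event-free; with censoring, some form of weighting would be needed, which this sketch omits. The function and variable names are hypothetical and the data are made up for illustration.

```python
def brier_score(event_free, surv_prob):
    """Mean squared difference between the event-free indicator (1/0)
    at time t and the predicted survival probability at time t."""
    n = len(event_free)
    return sum((y - p) ** 2 for y, p in zip(event_free, surv_prob)) / n


def r2_brier(event_free, surv_prob):
    """Relative gain in predictive accuracy over the null model:
    1 - BS(model) / BS(null). Assumes no censoring before time t."""
    n = len(event_free)
    # Null model: the same survival probability for every subject.
    null_prob = sum(event_free) / n
    bs_null = brier_score(event_free, [null_prob] * n)
    if bs_null == 0:
        raise ValueError("null model has zero prediction error")
    return 1.0 - brier_score(event_free, surv_prob) / bs_null


# Toy example (hypothetical): two subjects event-free at t, two not.
status = [1, 1, 0, 0]
predicted = [0.9, 0.8, 0.3, 0.2]
print(round(r2_brier(status, predicted), 3))  # prints 0.82
```

A model that predicts the null probability for everyone scores 0 under this definition, and predictions worse than the null model give a negative value.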
Mean (SD) of the overall performance measures for the breast cancer data over 5000 simulations
| Profile | Censoring | R2 IBS(3) | R2 SH(3) | R2 S(3) | R2 BS(3) | R2 PM | R2 D |
|---|---|---|---|---|---|---|---|
| Low | 0% | 0.099 (0.032) | 0.100 (0.018) | 0.101 (0.018) | 0.128 (0.037) | 0.232 (0.034) | 0.225 (0.034) |
| Low | 20% | 0.098 (0.033) | 0.100 (0.019) | 0.101 (0.019) | 0.128 (0.038) | 0.232 (0.038) | 0.228 (0.038) |
| Low | 50% | 0.099 (0.034) | 0.101 (0.019) | 0.101 (0.019) | 0.129 (0.040) | 0.234 (0.045) | 0.238 (0.048) |
| Low | 80% | 0.098 (0.041) | 0.100 (0.024) | 0.099 (0.023) | 0.127 (0.060) | 0.235 (0.065) | 0.255 (0.075) |
| Medium | 0% | 0.131 (0.032) | 0.133 (0.018) | 0.135 (0.018) | 0.176 (0.039) | 0.279 (0.035) | 0.277 (0.036) |
| Medium | 20% | 0.133 (0.032) | 0.135 (0.018) | 0.135 (0.018) | 0.177 (0.040) | 0.280 (0.038) | 0.280 (0.038) |
| Medium | 50% | 0.131 (0.034) | 0.135 (0.019) | 0.134 (0.019) | 0.176 (0.045) | 0.279 (0.046) | 0.283 (0.047) |
| Medium | 80% | 0.130 (0.045) | 0.133 (0.025) | 0.131 (0.025) | 0.176 (0.082) | 0.281 (0.068) | 0.292 (0.071) |
| High | 0% | 0.121 (0.028) | 0.123 (0.015) | 0.125 (0.015) | 0.165 (0.035) | 0.247 (0.035) | 0.243 (0.034) |
| High | 20% | 0.121 (0.028) | 0.124 (0.016) | 0.124 (0.016) | 0.165 (0.038) | 0.247 (0.038) | 0.242 (0.037) |
| High | 50% | 0.121 (0.031) | 0.125 (0.016) | 0.124 (0.017) | 0.164 (0.046) | 0.247 (0.047) | 0.243 (0.046) |
| High | 80% | 0.120 (0.048) | 0.121 (0.022) | 0.120 (0.026) | 0.168 (0.114) | 0.250 (0.070) | 0.252 (0.071) |
Fig. 2 Box plots showing the distribution of the overall performance measures for 3 risk profiles (low, medium and high) and 4 levels of censoring (0, 20, 50 and 80%) for the breast cancer data over 5000 simulations
Fig. 3 Scatter plot showing the relationships between the overall performance measures for the breast cancer data with the medium risk profile over 5000 simulations
Mean (SD) of the discrimination and calibration measures for the breast cancer data over 5000 simulations
| Profile | Censoring | CH | CU(τmax) | CGH | CH(3) | CU(3) | D | Cal. Slope |
|---|---|---|---|---|---|---|---|---|
| Low | 0% | 0.667 (0.015) | 0.667 (0.015) | 0.667 (0.012) | 0.684 (0.028) | 0.684 (0.028) | 1.103 (0.107) | 0.981 (0.108) |
| Low | 20% | 0.670 (0.018) | 0.667 (0.016) | 0.667 (0.014) | 0.684 (0.029) | 0.684 (0.029) | 1.111 (0.121) | 0.982 (0.116) |
| Low | 50% | 0.679 (0.023) | 0.668 (0.022) | 0.668 (0.017) | 0.687 (0.030) | 0.685 (0.029) | 1.144 (0.152) | 0.987 (0.136) |
| Low | 80% | 0.689 (0.039) | 0.673 (0.060) | 0.667 (0.024) | 0.690 (0.040) | 0.684 (0.040) | 1.197 (0.243) | 0.989 (0.190) |
| Medium | 0% | 0.690 (0.015) | 0.690 (0.015) | 0.689 (0.013) | 0.704 (0.023) | 0.704 (0.023) | 1.269 (0.113) | 0.979 (0.101) |
| Medium | 20% | 0.694 (0.017) | 0.690 (0.015) | 0.690 (0.014) | 0.705 (0.024) | 0.704 (0.024) | 1.278 (0.123) | 0.984 (0.107) |
| Medium | 50% | 0.701 (0.022) | 0.690 (0.021) | 0.689 (0.017) | 0.706 (0.026) | 0.704 (0.026) | 1.288 (0.152) | 0.980 (0.126) |
| Medium | 80% | 0.711 (0.037) | 0.698 (0.056) | 0.689 (0.024) | 0.711 (0.037) | 0.704 (0.037) | 1.316 (0.231) | 0.986 (0.177) |
| High | 0% | 0.677 (0.015) | 0.677 (0.015) | 0.676 (0.013) | 0.684 (0.021) | 0.684 (0.021) | 1.158 (0.108) | 0.977 (0.108) |
| High | 20% | 0.679 (0.017) | 0.677 (0.016) | 0.676 (0.014) | 0.684 (0.022) | 0.683 (0.021) | 1.155 (0.118) | 0.979 (0.116) |
| High | 50% | 0.684 (0.023) | 0.677 (0.021) | 0.676 (0.018) | 0.686 (0.025) | 0.683 (0.024) | 1.158 (0.148) | 0.980 (0.139) |
| High | 80% | 0.692 (0.038) | 0.683 (0.058) | 0.676 (0.026) | 0.692 (0.038) | 0.685 (0.042) | 1.187 (0.230) | 0.987 (0.198) |
Fig. 4 Box plots showing the distribution of the concordance measures for 3 risk profiles (low, medium and high) and 4 levels of censoring (0, 20, 50 and 80%) for the breast cancer data over 5000 simulations
Fig. 5 Scatter plot showing the relationships between the discrimination measures for the breast cancer data with the medium risk profile over 5000 simulations
Mean (SD) of the overall performance measures for the HCM data over 5000 simulations
| Profile | Censoring | R2 IBS(5) | R2 SH(5) | R2 S(5) | R2 BS(5) | R2 PM | R2 D |
|---|---|---|---|---|---|---|---|
| Low | 0% | 0.013 (0.015) | 0.013 (0.006) | 0.014 (0.006) | 0.020 (0.019) | 0.173 (0.021) | 0.166 (0.021) |
| Low | 20% | 0.013 (0.014) | 0.013 (0.006) | 0.013 (0.006) | 0.020 (0.019) | 0.173 (0.022) | 0.173 (0.023) |
| Low | 50% | 0.014 (0.015) | 0.013 (0.006) | 0.013 (0.006) | 0.020 (0.019) | 0.174 (0.026) | 0.184 (0.029) |
| Low | 80% | 0.014 (0.015) | 0.014 (0.007) | 0.014 (0.006) | 0.020 (0.020) | 0.174 (0.037) | 0.201 (0.047) |
| Medium | 0% | 0.018 (0.014) | 0.018 (0.006) | 0.019 (0.006) | 0.027 (0.019) | 0.221 (0.022) | 0.221 (0.023) |
| Medium | 20% | 0.018 (0.014) | 0.018 (0.006) | 0.019 (0.006) | 0.027 (0.018) | 0.221 (0.023) | 0.226 (0.024) |
| Medium | 50% | 0.018 (0.014) | 0.018 (0.006) | 0.019 (0.006) | 0.027 (0.019) | 0.221 (0.028) | 0.233 (0.031) |
| Medium | 80% | 0.018 (0.015) | 0.018 (0.008) | 0.019 (0.007) | 0.027 (0.019) | 0.222 (0.038) | 0.241 (0.042) |
| High | 0% | 0.018 (0.013) | 0.018 (0.005) | 0.018 (0.005) | 0.026 (0.017) | 0.199 (0.022) | 0.200 (0.022) |
| High | 20% | 0.018 (0.013) | 0.018 (0.005) | 0.018 (0.005) | 0.027 (0.017) | 0.199 (0.023) | 0.201 (0.023) |
| High | 50% | 0.018 (0.013) | 0.018 (0.006) | 0.018 (0.005) | 0.026 (0.017) | 0.200 (0.028) | 0.203 (0.029) |
| High | 80% | 0.018 (0.013) | 0.018 (0.007) | 0.018 (0.006) | 0.026 (0.017) | 0.201 (0.040) | 0.206 (0.041) |
Fig. 6 Box plots showing the distribution of the overall performance measures for 3 risk profiles (low, medium and high) and 4 levels of censoring (0, 20, 50 and 80%) for the HCM data over 5000 simulations
Mean (SD) of the discrimination and calibration measures for the HCM data over 5000 simulations
| Profile | Censoring | CH | CU(τmax) | CGH | CH(5) | CU(5) | D | Cal. Slope |
|---|---|---|---|---|---|---|---|---|
| Low | 0% | 0.645 (0.011) | 0.645 (0.011) | 0.645 (0.009) | 0.675 (0.061) | 0.675 (0.061) | 0.911 (0.070) | 0.983 (0.082) |
| Low | 20% | 0.649 (0.012) | 0.645 (0.011) | 0.645 (0.009) | 0.676 (0.061) | 0.676 (0.061) | 0.934 (0.075) | 0.986 (0.086) |
| Low | 50% | 0.656 (0.016) | 0.645 (0.014) | 0.645 (0.011) | 0.676 (0.062) | 0.676 (0.062) | 0.971 (0.095) | 0.989 (0.098) |
| Low | 80% | 0.666 (0.026) | 0.649 (0.039) | 0.645 (0.016) | 0.676 (0.063) | 0.676 (0.063) | 1.025 (0.151) | 0.988 (0.136) |
| Medium | 0% | 0.670 (0.010) | 0.670 (0.010) | 0.670 (0.009) | 0.694 (0.049) | 0.694 (0.049) | 1.090 (0.072) | 0.985 (0.075) |
| Medium | 20% | 0.674 (0.012) | 0.670 (0.011) | 0.670 (0.009) | 0.695 (0.048) | 0.695 (0.048) | 1.105 (0.077) | 0.986 (0.079) |
| Medium | 50% | 0.680 (0.015) | 0.670 (0.013) | 0.670 (0.011) | 0.694 (0.049) | 0.694 (0.049) | 1.127 (0.097) | 0.985 (0.091) |
| Medium | 80% | 0.688 (0.022) | 0.675 (0.033) | 0.670 (0.015) | 0.695 (0.050) | 0.695 (0.050) | 1.153 (0.134) | 0.989 (0.115) |
| High | 0% | 0.661 (0.011) | 0.661 (0.011) | 0.661 (0.009) | 0.676 (0.043) | 0.676 (0.043) | 1.022 (0.070) | 0.982 (0.079) |
| High | 20% | 0.663 (0.011) | 0.661 (0.011) | 0.661 (0.010) | 0.677 (0.043) | 0.677 (0.043) | 1.025 (0.075) | 0.983 (0.083) |
| High | 50% | 0.667 (0.015) | 0.661 (0.013) | 0.661 (0.011) | 0.676 (0.043) | 0.676 (0.043) | 1.032 (0.092) | 0.984 (0.097) |
| High | 80% | 0.672 (0.023) | 0.664 (0.034) | 0.661 (0.016) | 0.676 (0.044) | 0.676 (0.044) | 1.042 (0.133) | 0.987 (0.132) |
Fig. 7 Box plots showing the distribution of the concordance measures for 3 risk profiles (low, medium and high) and 4 levels of censoring (0, 20, 50 and 80%) for the HCM data over 5000 simulations