Susan Mallett, Patrick Royston, Rachel Waters, Susan Dutton, Douglas G Altman.
Abstract
BACKGROUND: Appropriate choice and use of prognostic models in clinical practice require good methods both for model development and for developing prognostic indices and risk groups from the models. To assess their reliability and generalizability, models need to have been validated and measures of model performance reported. We reviewed published articles to assess the methods and reporting used to develop and evaluate the performance of prognostic indices and risk groups from prognostic models.
Year: 2010 PMID: 20353579 PMCID: PMC2857810 DOI: 10.1186/1741-7015-8-21
Source DB: PubMed Journal: BMC Med ISSN: 1741-7015 Impact factor: 8.775
Numerical and graphical presentation of model (n = 47)
| | % (n) articles |
|---|---|
| Statistical model used | |
| Cox only | 88 (41) |
| Cox plus other (two RPA, one ANN) | 6 (3) |
| Other (one Weibull, one RPA, one unclear) | 6 (3) |
| Assumption of proportional hazards tested† | 21 (10) |
| Final prognostic model reported* | 96 (45) |
| Regression coefficient reported** | 72 (34) |
| Reproducibility of model development assessed†† | 11 (5) |
| Model with same variables, not same coefficients | 9 (4) |
| Model generating both new variables and coefficients | 4 (2) |
† In three articles the assumption of proportional hazards was not applicable because the models used were RPA (recursive partitioning analysis) or ANN (artificial neural network).
* Two articles did not report the final model.
** Not applicable in two articles using RPA or ANN model
†† One article used both methods to examine model reproducibility.
Prognostic index, risk groups and model fitting
| | % (n) articles |
|---|---|
| Prognostic index (PI) developed | 81 (38) |
| Components of final model used to create PI | |
| Same variables and coefficients | 34 (13/38) |
| Same variables but not same coefficients | 21 (8/38) |
| Neither same variables nor coefficients | 29 (11/38) |
| Method unclear | 16 (6/38) |
| Risk groups are created from prognostic model | 76 (36) |
| Method used to create risk groups | |
| Data driven | 28 (10/36) |
| Equal size groups created | 14 (5/36) |
| Other non data driven method | 11 (4/36) |
| Method unclear | 8 (3/36) |
| Method not reported | 39 (14/36) |
| Number of risk groups created | |
| Two risk groups | 11 (4/36) |
| Three risk groups | 39 (14/36) |
| Four risk groups | 31 (11/36) |
| Five or more groups | 11 (4/36) |
| Several different risk groupings used | 8 (3/36) |
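
The table above distinguishes prognostic indices that reuse the model's variables and coefficients, and risk groups formed as equal-size groups. As a minimal sketch of that one combination (not the method of any reviewed article), the Python below computes a prognostic index as the linear predictor of a fitted model and splits patients into equal-size groups by rank. The coefficient values, variable names, and patient records are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def prognostic_index(coefs: Dict[str, float], patient: Dict[str, float]) -> float:
    """Linear predictor: sum of (coefficient * covariate value) over model variables."""
    return sum(beta * patient[name] for name, beta in coefs.items())


def equal_size_groups(scores: List[float], n_groups: int) -> List[int]:
    """Assign each score to a rank-based group (0 = lowest risk) of roughly equal size."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    groups = [0] * len(scores)
    for rank, i in enumerate(order):
        groups[i] = rank * n_groups // len(scores)
    return groups


# Hypothetical log hazard ratios; not taken from any reviewed article.
coefs = {"age_per_decade": 0.30, "stage_III": 0.80, "marker_high": 0.50}

# Hypothetical patients, with covariates coded to match the coefficients.
patients = [
    {"age_per_decade": 5, "stage_III": 0, "marker_high": 0},
    {"age_per_decade": 6, "stage_III": 1, "marker_high": 0},
    {"age_per_decade": 7, "stage_III": 1, "marker_high": 1},
    {"age_per_decade": 4, "stage_III": 0, "marker_high": 1},
    {"age_per_decade": 6, "stage_III": 0, "marker_high": 0},
    {"age_per_decade": 5, "stage_III": 1, "marker_high": 1},
]

scores = [prognostic_index(coefs, p) for p in patients]
groups = equal_size_groups(scores, n_groups=3)
print(groups)
```

Reusing the reported coefficients keeps the index tied to the published model, whereas rounding or re-weighting them (the "same variables but not same coefficients" row) yields a different score that would need its own evaluation.
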
Model performance on data used to develop model and usability* (n = 42)
| | Articles with risk groups | Articles with PI but no risk groups |
|---|---|---|
| Presentation of discrimination of model predictions† | | |
| KM for risk groups | 34 | NA |
| Nomogram | 2 | 4 |
| Other graphical | 2 | 2 |
| % survival probability at fixed time†† | 22 | 0 |
| Index of discrimination (see below) | 9 | 2 |
| Log rank | 17 | NA |
| Unspecified | 6 | 0 |
| No presentation | 0 | 0 |
| Index of discrimination$ | | |
| c-index | 7 | 1 |
| R squared or goodness of fit or Brier score | 1 | 1 |
| D | 0 | 0 |
| Other - K (Begg), sensitivity and specificity | 2 | 0 |
| Reclassification of patient risk | 0 | 0 |
| Calibration | | |
| Yes | 1 | 1 |
| No | 35 | 5 |
| Model usability from article$$ | | |
| Prognostic score or risk group can be assigned | 33 | 6 |
| Survival presented for risk group and/or prognostic score | 36 | 5 |
| Instructions for use suitable for physicians included | 3 | 3 |
* Five articles did not develop PI or use risk groups.
** Four of six articles had some commonality: three articles included the same author, one the same department.
† More than one option possible.
†† All articles have either KM or % survival by risk group.
$ Two articles with risk groups report two indices of discrimination.
$$ Four articles are unusable, lacking one or other criteria.
Model performance on validation data
| | Articles with risk groups |
|---|---|
| Presentation of discrimination of model predictions | |
| KM for risk groups | 1 |
| Other graphical | 0 |
| % survival probability at fixed time | 2 |
| Index of discrimination (see below) | 11 |
| Log rank | 1 |
| Unspecified | 0 |
| No presentation of discrimination | 4 |
| Index of discrimination$ | |
| c-index | 10 |
| R squared or goodness of fit | 4 |
| D | 0 |
| Other - k (Begg), SEP (Graf) | 2 |
| Reclassification of patient risk | 0 |
| Calibration | |
| Yes | 2 |
| No | 14 |
† More than one option possible
$ Two studies reported two indices of discrimination, one study three indices
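
The c-index is the index of discrimination reported most often in the tables above. As a rough illustration of what it measures, the sketch below implements a simplified form of Harrell's concordance index for right-censored data: among pairs where the patient with the shorter follow-up had an observed event, it counts how often that patient also has the higher predicted risk. The function name and the toy data are assumptions made for this example, tied survival times are simply skipped, and the code is not drawn from any reviewed article.

```python
from typing import Sequence


def harrell_c(times: Sequence[float], events: Sequence[int], risk: Sequence[float]) -> float:
    """Simplified Harrell's c-index.

    times:  follow-up time per patient
    events: 1 if the event was observed, 0 if censored
    risk:   predicted risk score (higher = worse prognosis)
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a usable pair
        for j in range(n):
            if times[i] < times[j]:  # patient i had the event before j's last follow-up
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical toy data: four patients, the third is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [3.0, 2.0, 2.5, 1.0]
c = harrell_c(times, events, risk)
print(round(c, 3))
```

A value of 0.5 corresponds to predictions no better than chance and 1.0 to perfect ranking. The c-index describes discrimination only, so it does not replace the calibration assessment counted separately in the tables.
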
Reproducibility and validation of models
| Topic | % (n = 47) articles |
|---|---|
| Model validation included | 34 (16) |
| Validation dataset* | |
| Same data (bootstrap) | 13 (6) |
| Same population, new data** | 23 (11) |
| External (that is, new population setting) | 11 (5) |
| Larger series including original sample | 0 (0) |
| Validation of models | |
| Final model with same coefficients and variables | 26 (12) |
| Unclear reporting | 9 (4) |
| Modifications suggested to model in light of validation? | 0 (0) |
* Five studies included validation with more than one dataset.
** Methods used by studies were as follows: five studies used a random split, two used a temporal split, five used cross-validation, and one used the jackknife method. Two studies used two and three methods respectively.