Walter Bouwmeester, Nicolaas P A Zuithoff, Susan Mallett, Mirjam I Geerlings, Yvonne Vergouwe, Ewout W Steyerberg, Douglas G Altman, Karel G M Moons.
Abstract
BACKGROUND: We investigated the reporting and methods of prediction studies, focusing on aims, designs, participant selection, outcomes, predictors, statistical power, statistical methods, and predictive performance measures.
Year: 2012 PMID: 22629234 PMCID: PMC3358324 DOI: 10.1371/journal.pmed.1001221
Source DB: PubMed Journal: PLoS Med ISSN: 1549-1277 Impact factor: 11.069
Figure 1. Flowchart of included studies.
a The hand search included only studies with an abstract, published in 2008 in The New England Journal of Medicine, The Lancet, JAMA: the Journal of the American Medical Association, Annals of Internal Medicine, BMJ, and PLoS Medicine. The following publication types were excluded beforehand: editorials, bibliographies, biographies, comments, dictionaries, directories, festschrifts, interviews, letters, news, and periodical indexes.
b Studies, generally conducted in an as yet healthy population, aimed at quantifying a causal relationship between a particular determinant or risk factor and an outcome, adjusting for other risk factors (i.e., confounders).
c For example, see [72].
Aim of the included multivariable prediction studies, subdivided by clinical domains.
| Study Aim | Cardiovascular (n = 24) | Oncology (n = 13) | Other (n = 34) | Total Papers (n = 71) | Number of Models (n = 135) | Number of Diagnostic Studies |
|---|---|---|---|---|---|---|
| Predictor finding studies | | | | | | |
| Prediction was primary aim | 46 (11) | 62 (8) | 44 (15) | 48 (34) | 49 (66) | 1 |
| Prediction was secondary aim | 17 (4) | 31 (4) | 26 (9) | 24 (17) | 21 (28) | 0 |
| Development without external validation | 21 (5) | 8 (1) | 15 (5) | 15 (11) | 14 (19) | 1 |
| Development with external validation | 4 (1) | 0 (0) | 6 (2) | 4 (3) | 8 (11) | 0 |
| External validation (without updating) | 8 (2) | 0 (0) | 3 (1) | 4 (3) | 5 (7) | 1 |
| Impact analysis | 4 (1) | 0 (0) | 6 (2) | 4 (3) | 3 (4) | 2 |
Numbers are column percentages, with absolute numbers in parentheses.
The "Other" domain includes studies from infectious diseases (n = 7), diabetes (n = 5), neonatology and child health (n = 6), mental disorders (e.g., dementia) (n = 4), and musculoskeletal disorders (e.g., lower back pain) (n = 4).
There were no external validation studies of a previously published model that also updated the model after poor validation.
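As a quick check on how the column percentages relate to the absolute numbers in parentheses, the sketch below recomputes the Cardiovascular column from its counts. It assumes the column total is simply the sum of the counts shown; `column_percentages` is an illustrative helper name, not something taken from the paper.

```python
def column_percentages(counts):
    """Return each count as a whole-number percentage of the column total."""
    total = sum(counts)
    # int(x + 0.5) rounds to the nearest integer, with halves rounded up.
    return [int(100 * c / total + 0.5) for c in counts]


# Absolute numbers from the Cardiovascular column, top to bottom.
cardiovascular = [11, 4, 5, 1, 2, 1]

print(sum(cardiovascular))                 # column total
print(column_percentages(cardiovascular))  # percentages to compare with the table
```

The recomputed values agree with the percentages printed in that column, which supports reading each cell as "column percentage (count)".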
Study design in relation to study aim.
| Study Design | Total (n = 71) | Predictor Finding Studies (n = 51) | Development without External Validation (n = 11) | Development with External Validation (n = 3) | External Validation (without Updating) (n = 3) | Impact Analysis (n = 3) | Specifications (n) |
|---|---|---|---|---|---|---|---|
| | 62 (44) | 53 (27) | 82 (9) | 100 (3) | 67 (2) | 100 (3) | Cross-sectional (1); randomized trial (13) |
| | 14 (10) | 16 (8) | 9 (1) | 0 (0) | 33 (1) | 0 (0) | Cross-sectional (2) |
| Case-control | 8 (6) | 12 (6) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | Nested (4); non-nested (2) |
| | 15 (11) | 20 (10) | 9 (1) | 0 (0) | 0 (0) | 0 (0) | |
Numbers are column percentages, with absolute numbers in parentheses, except for the column “Specifications”, which includes only absolute numbers.
Some cohort studies had a cross-sectional cohort design, which was possible because the predictor values did not change (gender, genes, etc.) or because the study was a diagnostic prediction model study.
Of the 13 studies that used randomized trial data, 11 were predictor finding or model development studies. Of these 11 studies, five adjusted for the treatment effect, three did not adjust because there was no treatment effect, one did not adjust despite an effective treatment, and in two studies both reporting of and adjustment for treatment effects were entirely missing.
One study used two designs: a cross-sectional case-cohort and a cross-sectional nested case-control (here both scored as nested case-control).
Reporting of outcomes.
| Reporting and Analysis of Outcomes | Percentage (n) |
|---|---|
| | 91 (62) |
| | 22 (12) |
| | 93 (63) |
| | 9 (6) |
| Linear regression | 83 (5) |
| Logistic regression | 17 (1) |
| | 34 (23) |
| Logistic regression | 91 (21) |
| Non-regression | 9 (2) |
| | 12 (8) |
| Polytomous regression | 38 (3) |
| Logistic regression | 50 (4) |
| CART | 13 (1) |
| | 48 (30) |
| Survival analysis | 97 (29) |
| Logistic regression | 3 (1) |
Impact studies were excluded from this table because these studies had outcomes of a different type (e.g., costs). Hence, the total number of studies is 68.
Not applicable in 11/68 studies, because all cause death was the outcome.
Types of outcomes and how they were analysed (unclear for five studies). The sum 6+23+8+30 is higher than 63 because some outcomes were analysed in more than one way (e.g., a time-to-event outcome that was analysed as time to event and as a binary outcome neglecting time). If a study analysed two binary outcomes, it was here counted as one binary outcome.
After dichotomization of a continuous outcome.
One study used the Cochran–Mantel–Haenszel procedure, another calculated odds ratios.
CART, classification and regression tree.
Reporting of candidate predictors.
| Reporting and Handling of Candidate Predictors | Percentage (n) |
|---|---|
| | 87 (59) |
| | 75 (51) |
| | 1 (1) |
| | 55 (36) |
| | 67 (43) |
| Kept linear (continuous) | 67 (43) |
| (Fractional) polynomial transformation or any spline transformation | 19 (12) |
| Categorised | 47 (30) |
| Dichotomized | 33 (21) |
| Other | 3 (2) |
Impact studies (n = 3) were excluded from this table as their aim is not to develop or validate a prediction model, but rather to quantify the effect or impact of using a prediction model on physicians' behaviour, patient outcome, or cost-effectiveness of care relative to not using the model or usual care. Hence, for this table total n = 68.
Not applicable for the three external validation studies. Hence, n = 65.
Not applicable in four studies, because one studied no continuous predictors, and the others were the three external validation studies. Hence, n = 64. Of these, handling was unclear in 19 studies, not described in two studies. The sum 43+12+30+21+2 is more than 43 because some studies handled continuous predictors in two ways (e.g., dichotomizing blood pressure and categorising body mass index into four categories).
Effective sample size of the included studies (reflecting statistical power).
| Effective Sample Size | Prediction as Primary Aim (n = 96) | Prediction as Secondary Aim (n = 28) |
|---|---|---|
| | | |
| <5 | 8 (8) | 0 (0) |
| 5–10 | 6 (6) | 25 (7) |
| 10–15 | 11 (11) | 4 (1) |
| >15 | 63 (60) | 46 (13) |
| Number of participants or events not described | 11 (11) | 25 (7) |
| | | |
| <5 | 7 (7) | 14 (4) |
| 5–10 | 7 (7) | 11 (3) |
| 10–15 | 0 (0) | 0 (0) |
| >15 | 19 (18) | 11 (3) |
| Number of candidate predictors not described | 67 (64) | 64 (18) |
Numbers are column percentages, with absolute numbers in parentheses. For continuous outcomes, the effective sample size is the number of participants divided by the number of predictors; for dichotomous outcomes, the effective sample size is the number of participants in the smallest category divided by the number of predictors; for time-to-event outcomes, the effective sample size is the number of events divided by the number of predictors.
Excluding impact and external validation studies, because they require very different statistical power calculations.
The number of candidate predictors was the total number of degrees of freedom (i.e., the sum of all candidate predictors, interactions, and dummy variables).
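The effective sample size rules in these footnotes can be written out as a short calculation. The sketch below is only an illustration under stated assumptions: the function names, argument names, and example numbers are hypothetical, each continuous predictor and each interaction term is assumed to use one degree of freedom, and a categorical predictor with k levels is assumed to need k - 1 dummy variables.

```python
def candidate_degrees_of_freedom(n_continuous, category_levels, n_interactions=0):
    """Count candidate predictors as total degrees of freedom.

    Assumptions (not from the paper): one degree of freedom per continuous
    predictor and per interaction term; k - 1 dummies per k-level predictor.
    """
    dummies = sum(k - 1 for k in category_levels)
    return n_continuous + dummies + n_interactions


def effective_sample_size(outcome_type, n_predictors, n_participants=None,
                          n_smallest_category=None, n_events=None):
    """Apply the footnote's rule for the given outcome type."""
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    if outcome_type == "continuous":
        numerator = n_participants        # all participants
    elif outcome_type == "dichotomous":
        numerator = n_smallest_category   # size of the smaller outcome group
    elif outcome_type == "time-to-event":
        numerator = n_events              # number of events
    else:
        raise ValueError(f"unknown outcome type: {outcome_type}")
    if numerator is None:
        raise ValueError("the count needed for this outcome type is missing")
    return numerator / n_predictors


# Hypothetical study: 4 continuous predictors, two categorical predictors with
# 3 and 4 levels, one interaction term, and 80 participants in the smaller group.
df = candidate_degrees_of_freedom(4, [3, 4], n_interactions=1)
print(df, effective_sample_size("dichotomous", df, n_smallest_category=80))
```

With these made-up inputs the study has 10 candidate degrees of freedom and an effective sample size of 8 per candidate predictor, which would place it in the 5–10 band of the table.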
Method of predictor selection, stratified by whether prediction was the primary or secondary study aim.
| Selection Method | Prediction as Primary Aim (n = 48) | Prediction as Secondary Aim (n = 17) | Total (n = 65) |
|---|---|---|---|
| | | | |
| Method described | 75 (36) | 47 (8) | 68 (44) |
| Literature based | 71 (34) | 29 (5) | 60 (39) |
| A priori hypothesis/clinical reasoning | 29 (14) | 29 (5) | 29 (19) |
| | | | |
| Screening by univariable analysis | 13 (6) | 24 (4) | 15 (10) |
| | | | |
| Backward selection | 17 (8) | 18 (3) | 17 (11) |
| Forward selection | 6 (3) | 0 (0) | 5 (3) |
| Added value of a specific predictor to existing predictors or model | 25 (12) | 0 (0) | 18 (12) |
| All predictors included regardless of statistical significance | 40 (19) | 47 (8) | 42 (27) |
| Similar predictors combined | 17 (8) | 6 (1) | 11 (7) |
| Method not described | 27 (13) | 35 (6) | 29 (19) |
| | | | |
| | 21; 29 (10) | 12; 18 (2) | 18; 26 (12) |
| | 4; 6 (2) | 12; 18 (2) | 6; 9 (4) |
| Akaike's Information Criterion | 4; 6 (2) | 0; 0 (0) | 3; 4 (2) |
| Bayesian Information Criterion | 2; 6 (1) | 6; 9 (1) | 3; 4 (2) |
| Explained variance (R²) | 4; 6 (2) | 0; 0 (0) | 3; 4 (2) |
| Change in | 10; 14 (5) | 0; 0 (0) | 9; 13 (6) |
Numbers are column percentages, with absolute numbers in parentheses. Impact and external validation studies (n = 6) were excluded from this table as these issues are not applicable for these types of studies. Hence, n = 65.
More than one method may be used within a study; percentages do not add up to 100%.
Percentage (number) of studies that reported the applied method for selecting which predictors were included in the multivariable analyses, if it was not based on statistical analysis (i.e., univariable predictor–outcome associations).
Predictor inclusion in multivariable model was pre-specified, as the specific aim was to quantify the added value of a new predictor to existing predictors.
For example, systolic and diastolic blood pressure combined to mean blood pressure.
For the items below, two percentages are given. The first percentage includes all studies (i.e., 48 predictor finding studies, 17 model development studies, or 65 total); the second is the percentage of all studies that applied some type of predictor selection in the multivariable analysis (35 predictor finding studies, 11 model development studies, and 46 total; the excluded studies did not apply any predictor selection in the multivariable analysis but simply pre-specified the final model).
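The two-percentage notation can be reproduced with a small calculation. The sketch below uses the totals stated in the footnote (65 studies overall, 46 that applied some predictor selection in the multivariable analysis); `dual_percentages` is an illustrative helper name, and rounding to whole percentages is an assumption about how the table was prepared.

```python
def dual_percentages(count, n_all, n_with_selection):
    """Format a count as 'pct of all studies; pct of studies with selection (count)'."""
    pct_all = round(100 * count / n_all)
    pct_selection = round(100 * count / n_with_selection)
    return f"{pct_all}; {pct_selection} ({count})"


# Two cells from the Total column, using the denominators given in the footnote.
print(dual_percentages(12, 65, 46))
print(dual_percentages(4, 65, 46))
```

Both calls reproduce the corresponding Total cells, so the first figure in each pair uses the larger denominator and the second uses only the studies that applied predictor selection.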
Handling of missing values, stratified by whether prediction was the primary or secondary study aim.
| Reporting and Handling of Missing Values | Prediction as Primary Aim (n = 48) | Prediction as Secondary Aim (n = 17) | External Validation Studies (n = 3) | Impact Studies (n = 3) | Total (n = 71) |
|---|---|---|---|---|---|
| | | | | | |
| Not reported or unclear | 35 (18) | 53 (9) | 0 (0) | 0 (0) | 38 (27) |
| Number of participants with missing values | 23 (11) | 12 (2) | 67 (2) | 0 (0) | 21 (15) |
| Number of missing values per predictor | 60 (29) | 47 (8) | 33 (1) | 100 (3) | 58 (41) |
| Number lost to follow-up | 40 (16) | 50 (7) | 50 (1) | 100 (3) | 46 (27) |
| | | | | | |
| Complete case analysis | 71 (33) | 53 (9) | 67 (2) | 33 (1) | 65 (45) |
| Predictor with missing values omitted | 2 (1) | 12 (2) | 0 (0) | 0 (0) | 4 (3) |
| Missing indicator method | 14 (7) | 12 (2) | 0 (0) | 0 (0) | 13 (9) |
| Single imputation | 2 (1) | 6 (1) | 0 (0) | 0 (0) | 3 (2) |
| Multiple imputation | 10 (5) | 0 (0) | 0 (0) | 0 (0) | 7 (5) |
| Sensitivity analysis | 6 (3) | 24 (4) | 0 (0) | 0 (0) | 10 (7) |
| Not reported or unclear | 50 (23) | 65 (11) | 33 (1) | 67 (2) | 54 (37) |
Numbers are column percentages, with absolute numbers in parentheses.
Some studies reported more than one item. Hence, percentages do not add up to 100%.
Cross-sectional studies were excluded for this item (item not applicable).
More than one method could be applied. Hence, the percentages do not add up to 100%. Items were not applicable for two primary-aim studies that had no missing values. Hence, total n = 69.
Only participants with completely observed data were analysed.
For example: in a diagnostic study [73], the investigators assumed that among participants who did not undergo follow-up colonoscopy, the detection rates for any adenoma and for an advanced adenoma ranged from half to twice the rates among participants who did undergo follow-up colonoscopy.
Presentation of the results, stratified by type of prediction study.
| Type of Result Presented | Predictor Finding Studies (n = 51) | Development Studies (n = 14) | Total (n = 65) |
|---|---|---|---|
| Unadjusted (univariable) candidate predictor-outcome association | 18 (9) | 21 (3) | 18 (12) |
| Unadjusted association only of the predictors eventually included in the final model (i.e., after predictor selection) | 37 (19) | 29 (4) | 35 (23) |
| Adjusted associations of each predictor in full multivariable model | 18 (9) | 29 (4) | 20 (13) |
| Adjusted associations of each predictor in final multivariable model | 65 (33) | 79 (11) | 68 (44) |
| Simplified risk score/nomogram/score chart | 4 (2) | 36 (5) | 11 (7) |
Numbers are column percentages, with absolute numbers in parentheses. Impact and external validation studies (n = 6) were excluded from this table as these items were not applicable. Hence, total n = 65.
The percentages do not add up to 100%, because studies reported univariable and multivariable models. Further, all studies reporting the full model also reported the final model.
Model performance measures, stratified by type of prediction study.
| Performance Measure | Predictor Finding Studies (n = 51) | Development (n = 15) | Total (n = 66) |
|---|---|---|---|
| | | | |
| Calibration plot | 0 (0) | 27 (4) | 6 (4) |
| Calibration intercept and slope | 0 (0) | 0 (0) | 0 (0) |
| Hosmer-Lemeshow statistic | 4 (2) | 27 (4) | 9 (6) |
| | | | |
| | 12 (6) | 80 (12) | 27 (18) |
| | | | |
| NRI | 2 (1) | 40 (6) | 11 (7) |
| Sensitivity/specificity | 2 (1) | 7 (1) | 3 (2) |
| Other | 2 (1) | 33 (5) | 9 (6) |
| | | | |
| Brier score | 0 (0) | 7 (1) | 2 (1) |
| | 8 (4) | 13 (2) | 9 (6) |
| | | | |
| Apparent | 18 (9) | 60 (9) | 27 (18) |
| Internal with jack-knife | 0 (0) | 7 (1) | 2 (1) |
| Internal with (random) split sample | 0 (0) | 13 (2) | 3 (2) |
| Internal with bootstrapping techniques | 4 (2) | 13 (2) | 6 (4) |
| External | 0 (0) | 27 (4) | 6 (4) |
Numbers are column percentages, with absolute numbers in parentheses. The percentages sometimes do not add up to 100% because development studies commonly reported more than one performance measure or validity assessment.
Impact studies (n = 3) were excluded since none of the items were applicable. Additionally, two external validation studies were excluded because they evaluated risk stratification tools that did not provide predicted probabilities (the Manchester triage system [74] and predictive life support tools [75]), so almost all items were not applicable. Hence, for this table, total n = 66 studies.
The predictive performance (e.g., C-statistic, calibration, or net reclassification index) of the prediction model as estimated from the same data from which the model was developed.
AUC-ROC, area under the receiver operating characteristic curve; NRI, net reclassification index.
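To make the C-statistic mentioned in these footnotes concrete, the sketch below computes it for a binary outcome as the proportion of event/non-event pairs in which the participant with the event received the higher predicted risk, counting ties as one half. For a binary outcome this equals the AUC-ROC. The data are made up for illustration and `c_statistic` is a hypothetical helper. If the predicted risks come from a model fitted to the same participants, the result is the apparent performance in the sense defined above.

```python
def c_statistic(predicted_risks, outcomes):
    """Concordance (C) statistic for a binary outcome coded 1 = event, 0 = no event."""
    events = [p for p, y in zip(predicted_risks, outcomes) if y == 1]
    non_events = [p for p, y in zip(predicted_risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for risk_event in events:
        for risk_non_event in non_events:
            if risk_event > risk_non_event:
                concordant += 1.0   # pair ranked correctly
            elif risk_event == risk_non_event:
                concordant += 0.5   # tie counts as half
    return concordant / (len(events) * len(non_events))


# Made-up predicted risks and observed outcomes for six participants.
risks = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
outcomes = [1, 1, 1, 0, 0, 0]
print(c_statistic(risks, outcomes))
```

In this toy data set eight of the nine event/non-event pairs are ranked correctly. A value of 0.5 corresponds to no discrimination and 1.0 to perfect discrimination.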