David Vergouw, Martijn W Heymans, George M Peat, Ton Kuijpers, Peter R Croft, Henrica C W de Vet, Henriëtte E van der Horst, Daniëlle A W M van der Windt.
Abstract
BACKGROUND: In prognostic studies, model instability and missing data can be troubling factors. Proposed methods for handling these situations are bootstrapping (B) and multiple imputation (MI). The authors examined the influence of these methods on model composition.
Year: 2010 PMID: 20846460 PMCID: PMC2954918 DOI: 10.1186/1471-2288-10-81
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Patient characteristics at baseline
| variable | n | (%) |
|---|---|---|
| Age (years); mean (SD) | 51 | (14) |
| gender (male) | 292 | (50) |
| education | | |
| low* | 210 | (36) |
| middle | 234 | (40) |
| high | 135 | (23) |
| shoulder complaints in the past year | 321 | (55) |
| neck complaints in the past year | 252 | (43) |
| duration of complaints | | |
| 0-6 weeks | 205 | (35) |
| 7-12 weeks | 139 | (24) |
| > 3 months | 242 | (41) |
| gradual onset (vs. acute) | 363 | (62) |
| shoulder pain (0-10); mean (SD) | 4.8 | (2) |
| shoulder disability (0-100); mean (SD) | 59.9 | (24) |
| both shoulders afflicted | 74 | (13) |
| co-morbidity | 469 | (80) |
| upper extremity joint pain | 245 | (42) |
| neck pain | 197 | (34) |
| upper extremity joint pain | 174 | (30) |
| low back pain | 139 | (24) |
| high back pain | 53 | (9) |
| psychological complaints | 55 | (9) |
| pain coping (0-6); mean (SD) | 2.98 | (0.98) |
| catastrophizing (0-6); mean (SD) | 2.2 | (0.8) |
| internal locus of control (0-6); mean (SD) | 3.3 | (0.9) |
| external locus of control (0-6); mean (SD) | 3.2 | (0.88) |
| anxiety (0-6); mean (SD) | 0.3 | (1.2) |
| depression (0-6); mean (SD) | 0.2 | (1.3) |
| somatisation (0-6); mean (SD) | 3.3 | (4.1) |
| distress (0-6); mean (SD) | 2.3 | (4.5) |
| fear-avoidance (0-6); mean (SD) | 14.1 | (5.6) |
| kinesophobia (0-6); mean (SD) | 3.2 | (3.5) |
| physical load at work (0-5); mean (SD) | 1.2 | (1.5) |
| physical activity | | |
| less active than others | 110 | (39) |
| equally active | 245 | (42) |
| more active | 226 | (19) |
| inability to perform daily activities last year | | |
| 1-30 days | 184 | (31) |
| 1-12 months | 61 | (10) |
| sporting activities | 230 | (39) |
| cause of shoulder problem: sporting injury | 29 | (5) |
SD = standard deviation
Complete case and multiple imputed model compositions for the outcome measure persistent shoulder disability
| | missing values | CCA | MI-5 |
|---|---|---|---|
| | | rank | rank |
| persistent shoulder disability* | 72 (12.3%) | | |
| inability to perform daily activities | 8 (1.4%) | 1 | 4 |
| shoulder complaints in the past year | 27 (4.6%) | 2 | |
| both shoulders afflicted | 0 (0%) | 3 | 3 |
| concomitant lower back pain | 0 (0%) | 4 | 1 |
| concomitant lower extremity pain | 0 (0%) | 5 | |
| more disability at baseline | 2 (0.3%) | 6 | 7 |
| longer duration of complaints | 1 (0.2%) | | 2 |
| higher scores for somatisation | 3 (0.5%) | | 5 |
| higher scores for external locus of control | 33 (5.6%) | | 6 |
| older age | 0 (0%) | | 8 |
CCA - complete case analysis
MI-5 - multiple imputation using 5 imputation files
rank - the order of appearance of predictors in the derived model arranged by their predictive ability (regression coefficient)
* - outcome measure
Complete case and multiple imputed model compositions for the outcome measure persistent shoulder pain intensity
| | missing values | CCA | MI-5 |
|---|---|---|---|
| | | rank | rank |
| persistent shoulder pain intensity* | 76 (12.9%) | | |
| sporting injury | 0 (0%) | 1 | 1 |
| concomitant lower back pain | 0 (0%) | 2 | 3 |
| longer duration of complaints | 1 (0.2%) | 3 | 2 |
| both shoulders afflicted | 0 (0%) | 4 | 4 |
| inability to perform daily activities | 8 (1.4%) | 5 | 5 |
| concomitant upper extremity pain | 0 (0%) | 6 | 6 |
| sporting activities | 0 (0%) | 7 | |
| higher physical workload | 0 (0%) | 8 | |
CCA - complete case analysis
MI-5 - multiple imputation using 5 imputation files
rank - the order of appearance of predictors in the derived model arranged by their predictive ability (regression coefficient)
% - inclusion frequency; the proportion of times that a variable with a univariable association with the outcome is retained in the automated backward selected models. When a variable was selected in each of the replications, the inclusion frequency was 100%
* - outcome measure
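The inclusion frequency defined in the footnote above can be illustrated with a minimal Python sketch. It assumes the backward selection has already been run on each replicate and that each result is available as a set of retained predictor names. The function name `inclusion_frequency` and the four replicates are hypothetical and are not taken from the study.

```python
from collections import Counter


def inclusion_frequency(selected_models, candidates):
    """Percentage of replicates in which each candidate predictor was retained."""
    n = len(selected_models)
    # Count every predictor at most once per replicate.
    counts = Counter(v for model in selected_models for v in set(model))
    return {v: 100.0 * counts[v] / n for v in candidates}


# Hypothetical selections from four bootstrap replicates (illustrative only).
replicates = [
    {"duration", "low_back_pain"},
    {"duration"},
    {"duration", "both_shoulders"},
    {"duration", "low_back_pain"},
]
candidates = ["duration", "low_back_pain", "both_shoulders", "age"]
freq = inclusion_frequency(replicates, candidates)

for name in candidates:
    print(f"{name}: {freq[name]:.0f}%")
```

A predictor retained in every replicate, such as `duration` in this toy input, gets an inclusion frequency of 100%, matching the definition in the footnote.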
Complete case bootstrap model selection for the outcome measure persistent disability
| | most frequently selected models | | | | | rank | |
|---|---|---|---|---|---|---|---|
| Predictors* | 1 | 2 | 2 | 4 | 5 | B | CCA |
| inability to perform daily activities | X | X | X | X | X | 1 | 1 |
| both shoulders afflicted | X | - | - | X | X | 2 | 3 |
| shoulder complaints in the past year | X | X | X | X | - | 3 | 2 |
| concomitant lower extremity pain | X | X | X | X | X | 4 | 5 |
| more disability at baseline | X | X | X | X | X | 5 | 6 |
| concomitant lower back pain | - | - | X | X | X | - | 4 |
| older age | - | - | - | - | - | - | - |
| longer duration of complaints | - | - | - | - | - | - | - |
| acute onset | - | - | - | - | - | - | - |
| Count | 33 | 23 | 23 | 22 | 16 | | |
| % | 6.6 | 4.6 | 4.6 | 4.4 | 3.2 | | |
* - only those predictors that appeared in ≥40% of the first bootstrap model selection step are presented
rank - the order of appearance of predictors in the derived models arranged by their predictive ability (regression coefficient estimates)
B - the complete case data based bootstrap selected model (i.e. the most frequently occurring combination of predictors in 500 replicate data sets of the second bootstrap model selection step)
CCA - the complete case data based model derived without additional bootstrap was the fourth most occurring combination of predictors in the bootstrap model selection procedure
Count - the number of times the model was selected in the 500 replicate data sets of the second bootstrap model selection step
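The Count and % rows can be reproduced in principle by tallying how often each combination of predictors is selected across the replicate data sets. The sketch below is an illustration under assumptions, not the authors' code: `rank_models` is a hypothetical helper, and the five replicates are invented. Each selected model is treated as an unordered set of predictor names.

```python
from collections import Counter


def rank_models(selected_models):
    """Return (predictors, count, percent) per distinct model, most frequent first."""
    n = len(selected_models)
    # frozenset makes each combination hashable and independent of predictor order.
    counts = Counter(frozenset(model) for model in selected_models)
    return [
        (sorted(model), count, round(100.0 * count / n, 1))
        for model, count in counts.most_common()
    ]


# Hypothetical selections from five replicate data sets (illustrative only).
replicates = [
    {"duration", "low_back_pain"},
    {"duration"},
    {"low_back_pain", "duration"},
    {"duration", "low_back_pain"},
    {"duration"},
]
ranking = rank_models(replicates)

for predictors, count, percent in ranking:
    print(predictors, count, f"{percent}%")
```

With 500 replicates, a model selected 33 times gets 100 × 33 / 500 = 6.6%, which agrees with the first column of the table above.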
Complete case bootstrap model selection results for the outcome measure persistent pain intensity
| | most frequently selected models | | | | | rank | |
|---|---|---|---|---|---|---|---|
| Predictors | 1 | 2 | 3 | 4 | 5 | B | CCA |
| longer duration of complaints | X | X | X | X | X | 1 | 3 |
| concomitant lower back pain | X | X | X | X | X | 2 | 2 |
| both shoulders afflicted | X | X | - | X | X | 3 | 4 |
| concomitant upper extremity pain | X | X | X | - | - | 4 | 6 |
| shoulder complaints in the past year | X | - | X | X | - | 5 | - |
| sporting injury* | | | | | | - | 1 |
| inability to perform daily activities* | | | | | | - | 5 |
| sporting activities* | | | | | | - | 7 |
| higher physical workload* | | | | | | - | 8 |
| Count | 120 | 96 | 58 | 47 | 37 | | |
| % | 24.0 | 19.2 | 11.6 | 9.4 | 7.4 | | |
* - predictors that appeared in < 40% of the replicates in the first step of the bootstrap model selection are not used in the second model selection step
rank - the order of appearance of predictors in the derived models arranged by their predictive ability (regression coefficient estimates)
B - the complete case data based bootstrap selected model (i.e. the most frequently occurring combination of predictors in 500 replicate data sets of the second bootstrap model selection step)
CCA - the complete case data based model derived without additional bootstrap did not occur in the bootstrap model selection, since some of the included predictors occurred in < 40% of the replicates in the first selection step
Count - the number of times the model was selected in the 500 replicate data sets of the second bootstrap model selection step
Imputed bootstrap model selection results for the outcome measure persistent pain intensity
| | most frequently selected models | | | | | rank | |
|---|---|---|---|---|---|---|---|
| Predictors* | 1 | 2 | 3 | 4 | 5 | MI-5+B | MI-5 |
| sporting injury | X | X | X | X | X | 1 | 1 |
| longer duration of complaints | X | X | X | X | X | 2 | 2 |
| concomitant lower back pain | X | X | X | X | X | 3 | 3 |
| both shoulders afflicted | X | X | X | X | X | 4 | 4 |
| inability to perform daily activities | X | X | X | X | X | 5 | 5 |
| higher level of education | X | X | X | - | X | 6 | - |
| shoulder complaints in the past year | X | X | - | - | X | 7 | - |
| concomitant upper extremity pain | X | - | X | X | X | 8 | 6 |
| higher physical workload | X | X | X | - | X | 9 | - |
| Count | 163 | 158 | 113 | 111 | 105 | | |
| % | 6.5 | 6.3 | 4.5 | 4.4 | 4.2 | | |
* - only those predictors that appeared in ≥40% of the first bootstrap model selection step are presented
rank - the order of appearance of predictors in the derived models arranged by their predictive ability (regression coefficient estimates)
MI-5+B - the multiple imputation based bootstrap selected model (i.e. the most frequently occurring combination of predictors in 2500 replicate data sets of the second bootstrap model selection step)
MI-5 - the multiple imputation based model using 5 imputed data sets was the fourth most occurring combination of predictors in the bootstrap model selection procedure.
Count - the number of times the model was selected in the 2500 replicate data sets of the second bootstrap model selection step
Imputed bootstrap model selection results for the outcome measure persistent disability
| | most frequently selected models | | | | | rank | |
|---|---|---|---|---|---|---|---|
| Predictors* | 1 | 2 | 3 | 4 | 5 | MI-5+B | MI-5 |
| concomitant lower back pain | X | X | X | X | X | 1 | 1 |
| longer duration of complaints | X | X | X | X | - | 2 | 2 |
| both shoulders afflicted | X | X | X | X | X | 3 | 3 |
| inability to perform daily activities | X | X | X | - | X | 4 | 4 |
| higher scores for somatisation | X | X | X | X | X | 5 | 5 |
| higher scores for external locus of control | X | X | X | X | X | 6 | 6 |
| more disability at baseline | X | X | X | X | X | 7 | 7 |
| older age | X | X | - | X | X | 8 | 8 |
| shoulder complaints in the past year | - | X | X | - | X | - | - |
| concomitant lower extremity pain | - | - | - | - | - | - | - |
| Count | 91 | 77 | 56 | 54 | 52 | | |
| % | 3.6 | 3.1 | 2.2 | 2.2 | 2.1 | | |
* - only those predictors that appeared in ≥40% of the first bootstrap model selection step are presented
rank - the order of appearance of predictors in the derived models arranged by their predictive ability (regression coefficient estimates)
MI-5+B - the multiple imputation based bootstrap selected model (i.e. the most frequently occurring combination of predictors in 2500 replicate data sets of the second bootstrap model selection step)
MI-5 - the multiple imputation based model using 5 imputed data sets was also the most frequently occurring combination of predictors in the 2500 bootstrap replicate data sets
Count - the number of times the model was selected in the 2500 replicate data sets of the second bootstrap model selection step.
Model performance parameters.
| | Persistent disability | | | | Persistent shoulder pain intensity | | | |
|---|---|---|---|---|---|---|---|---|
| | CCA | MI-5 | B | MI-5 + B | CCA | MI-5 | B | MI-5 + B |
| calibration slope | 0.978 | 0.978 | 1.077 | 0.978 | 0.985 | 0.973 | 0.998 | 0.986 |
| R2N | 0.119 | 0.120 | 0.088 | 0.120 | 0.188 | 0.162 | 0.135 | 0.174 |
| Ac (95% CI) | 0.666 (0.616, 0.715) | 0.667 (0.624, 0.710) | 0.645 (0.596, 0.694) | 0.667 (0.624, 0.710) | 0.717 (0.668, 0.766) | 0.702 (0.660, 0.745) | 0.684 (0.637, 0.732) | 0.710 (0.668, 0.752) |
| Opt | 0.027 | 0.022 | 0.023 | 0.022 | 0.030 | 0.014 | 0.018 | 0.022 |
| Oc | 0.639 | 0.646 | 0.622 | 0.646 | 0.686 | 0.688 | 0.667 | 0.688 |
Ac - apparent c-index
B - bootstrapping based on a complete case data set
CCA - complete case analysis
MI-5+B - multiple imputation combined with bootstrapping
MI-5 - multiple imputation using 5 imputation files
Oc - optimism corrected c-index
Opt - estimation of the overoptimism
R2N - explained variance (Nagelkerke's R-squared)
95% CI - 95% confidence interval
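The three c-index rows are related in the usual way for bootstrap internal validation: the optimism-corrected c-index is the apparent c-index minus the estimated optimism (Oc = Ac − Opt). The short Python check below applies that relation to the values reported in the table. The function name and the tolerance of 0.0015, which allows for rounding to three decimals in the published figures, are assumptions made for this sketch.

```python
def optimism_corrected(apparent, optimism):
    """Optimism-corrected c-index: apparent c-index minus estimated optimism."""
    return apparent - optimism


# (Ac, Opt, Oc) as reported in the table, keyed by outcome and method.
reported = {
    ("disability", "CCA"): (0.666, 0.027, 0.639),
    ("disability", "MI-5"): (0.667, 0.022, 0.646),
    ("disability", "B"): (0.645, 0.023, 0.622),
    ("disability", "MI-5 + B"): (0.667, 0.022, 0.646),
    ("pain", "CCA"): (0.717, 0.030, 0.686),
    ("pain", "MI-5"): (0.702, 0.014, 0.688),
    ("pain", "B"): (0.684, 0.018, 0.667),
    ("pain", "MI-5 + B"): (0.710, 0.022, 0.688),
}

TOLERANCE = 0.0015  # assumed allowance for rounding in the published values

consistent = {
    key: abs(optimism_corrected(ac, opt) - oc) <= TOLERANCE
    for key, (ac, opt, oc) in reported.items()
}

for key, ok in consistent.items():
    print(key, "consistent" if ok else "inconsistent")
```

For example, for the complete case analysis of persistent disability, 0.666 − 0.027 = 0.639, which matches the reported Oc.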