Zoran Bursac, C Heath Gauss, David Keith Williams, David W Hosmer.
Abstract
BACKGROUND: The main problem in many model-building situations is to choose, from a large set of covariates, those that should be included in the "best" model. A decision to keep a variable in the model might be based on clinical or statistical significance. Several variable selection algorithms exist. Those methods are mechanical and as such carry some limitations. Hosmer and Lemeshow describe a purposeful selection of covariates within which an analyst makes a variable selection decision at each step of the modeling process.
Year: 2008 PMID: 19087314 PMCID: PMC2633005 DOI: 10.1186/1751-0473-3-17
Source DB: PubMed Journal: Source Code Biol Med ISSN: 1751-0473
Simulation results.
| n | Purposeful | Stepwise | Backward | Forward |
| 60 | 5.1 | 4.5 | 4.9 | 9 |
| 120 | 24.2 | 22.4 | 22.8 | 24.2 |
| 240 | 52.6 | 52.6 | 52.5 | 36 |
| 360 | 69.8 | 69.8 | 69.8 | 42.5 |
| 480 | 71.1 | 71.2 | 71.1 | 44.2 |
| 600 | 70.5 | 70.6 | 70.5 | 40.4 |
Retention of the correct model for purposeful, stepwise, backward, and forward selection methods, with no confounding present.
Simulation results.
| Confounding (%) | Non-candidate inclusion | n | Purposeful | Stepwise | Backward | Forward |
| 20 | 0.1 | 60 | 5 | 3.6 | 6.3 | 9.1 |
| 20 | 0.1 | 120 | 17.3 | 15.6 | 18.2 | 18.8 |
| 20 | 0.1 | 240 | 39.7 | 39.6 | 40.1 | 30.3 |
| 20 | 0.1 | 360 | 55.2 | 54.4 | 54.4 | 36.6 |
| 20 | 0.1 | 480 | 64.3 | 64.3 | 64.3 | 37.5 |
| 20 | 0.1 | 600 | 65.8 | 65.7 | 65.7 | 41.3 |
| 20 | 0.15 | 60 | 9.2 | 4.6 | 6.4 | 8.1 |
| 20 | 0.15 | 120 | 18.7 | 14.8 | 17.2 | 18.5 |
| 20 | 0.15 | 240 | 43.1 | 37.1 | 38.2 | 30.5 |
| 20 | 0.15 | 360 | 56.5 | 53.7 | 53.9 | 37 |
| 20 | 0.15 | 480 | 63.6 | 62.6 | 62 | 43 |
| 20 | 0.15 | 600 | 70.3 | 69 | 68.7 | 41 |
| 15 | 0.1 | 60 | 6.6 | 4.1 | 6.1 | 9.6 |
| 15 | 0.1 | 120 | 17.8 | 15.6 | 18.6 | 19.2 |
| 15 | 0.1 | 240 | 39.7 | 36.6 | 37.6 | 29.8 |
| 15 | 0.1 | 360 | 53.3 | 52.2 | 52.6 | 38.3 |
| 15 | 0.1 | 480 | 62.4 | 62.1 | 62.1 | 40.1 |
| 15 | 0.1 | 600 | 68.5 | 67.9 | 68 | 40.2 |
| 15 | 0.15 | 60 | 9.7 | 4.4 | 6.7 | 9 |
| 15 | 0.15 | 120 | 21.9 | 16.8 | 21.3 | 19.6 |
| 15 | 0.15 | 240 | 46.6 | 40.2 | 41.4 | 32.3 |
| 15 | 0.15 | 360 | 57.7 | 52.5 | 52.5 | 35.3 |
| 15 | 0.15 | 480 | 64 | 63.1 | 63.1 | 39.3 |
| 15 | 0.15 | 600 | 70.4 | 69.6 | 69.6 | 41.4 |
Retention of the correct model for purposeful, stepwise, backward, and forward selection methods, under 24 simulated conditions that vary confounding, non-candidate inclusion, and sample size levels.
Simulation results.
| β2 | n | Purposeful | Stepwise | Backward | Forward |
| 0.13 | 60 | 9.7 | 6.3 | 10.3 | 10.8 |
| 0.13 | 120 | 25.8 | 19.8 | 24.9 | 23 |
| 0.13 | 240 | 55.5 | 52 | 54.9 | 37.4 |
| 0.13 | 360 | 66.4 | 65.5 | 65.8 | 38.7 |
| 0.13 | 480 | 72.5 | 72.7 | 72.8 | 41.1 |
| 0.13 | 600 | 71.4 | 72.9 | 72.9 | 42.9 |
| 0.07 | 60 | 7.5 | 3.1 | 4.4 | 6.7 |
| 0.07 | 120 | 18.6 | 11.3 | 12.2 | 15.8 |
| 0.07 | 240 | 32.2 | 22.5 | 22.9 | 21.4 |
| 0.07 | 360 | 41.5 | 35.5 | 35.5 | 26.9 |
| 0.07 | 480 | 47.9 | 44.5 | 44.5 | 34.6 |
| 0.07 | 600 | 52 | 50.5 | 50.5 | 35.5 |
Retention of the correct model for purposeful, stepwise, backward, and forward selection methods, for two values of β2 while specifying confounding at 15% and non-candidate inclusion at 0.15.
WHAS data set variables.
| Variable | Description |
| FSTAT | Status as of last follow-up (0 = Alive, 1 = Dead) |
| AGE | Age at hospital admission (Years) |
| SEX | Gender (0 = Male, 1 = Female) |
| HR | Initial heart rate (Beats per minute) |
| BMI | Body mass index (kg/m²) |
| CVD | History of cardiovascular disease (0 = No, 1 = Yes) |
| AFB | Atrial fibrillation (0 = No, 1 = Yes) |
| SHO | Cardiogenic shock (0 = No, 1 = Yes) |
| CHF | Congestive heart complications (0 = No, 1 = Yes) |
| AV3 | Complete heart block (0 = No, 1 = Yes) |
| MIORD | MI order (0 = First, 1 = Recurrent) |
| MITYPE | MI type (0 = non-Q-wave, 1 = Q-wave) |
WHAS data set variables retained in the final models for purposeful selection method under two different settings.
| AGE | <0.0001 | AGE | <0.0001 | AGE | <0.0001 |
| SHO | 0.0018 | SHO | 0.0029 | SHO | 0.0039 |
| HR | 0.0025 | HR | 0.0019 | HR | 0.0011 |
| MITYPE | 0.091 | MITYPE | 0.0586 | MITYPE | 0.0149 |
| MIORD | 0.1087 | AV3 | 0.0760 | AV3 | 0.0672 |
| BMI | 0.2035 | MIORD | 0.1285 | | |
| | | BMI | 0.2107 | | |
SAS PROC LOGISTIC forward, backward, and stepwise selection methods.
Figure 1. Macro flow chart.
%PurposefulSelection macro variables.
| Variable | Description |
| DATASET | Input data set |
| OUTCOME | Main outcome (Y) |
| COVARIATES | All covariates (X1...X |
| PVALUEI | Inclusion criterion for the multivariate model |
| PVALUER | Retention criterion for the multivariate model |
| CHBETA | % change in parameter estimate indicating confounding |
| PVALUENC | Inclusion criterion for non-candidate covariates |
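The CHBETA setting in the table above can be illustrated with a short Python sketch of a change-in-estimate check. This is not the macro's SAS code. The class `SelectionSettings`, the functions `percent_change` and `is_confounder`, the choice to measure the change relative to the full-model estimate, and the strict `>` comparison are all assumptions made for illustration. The coefficients in the usage lines are invented, as are the PVALUEI and PVALUER values; the CHBETA and PVALUENC values match one of the simulated conditions reported above.

```python
from dataclasses import dataclass


@dataclass
class SelectionSettings:
    """Hypothetical container mirroring the macro variables listed above."""
    pvalue_i: float   # PVALUEI: inclusion criterion for the multivariate model
    pvalue_r: float   # PVALUER: retention criterion for the multivariate model
    chbeta: float     # CHBETA: % change in an estimate that indicates confounding
    pvalue_nc: float  # PVALUENC: inclusion criterion for non-candidates


def percent_change(beta_full: float, beta_reduced: float) -> float:
    """Absolute % change in an estimate after a covariate is dropped.

    Assumption: the change is expressed relative to the full-model estimate.
    """
    if beta_full == 0:
        raise ValueError("full-model estimate must be non-zero")
    return 100.0 * abs(beta_reduced - beta_full) / abs(beta_full)


def is_confounder(full: dict, reduced: dict, settings: SelectionSettings) -> bool:
    """Flag the dropped covariate as a confounder if removing it shifts any
    remaining estimate by more than CHBETA percent.

    `full` and `reduced` map covariate names to fitted coefficients from the
    model with and without the dropped covariate, respectively.
    """
    return any(
        percent_change(full[name], reduced[name]) > settings.chbeta
        for name in reduced
    )


# Illustrative values only: the coefficients and the first two thresholds are invented.
settings = SelectionSettings(pvalue_i=0.25, pvalue_r=0.1, chbeta=15.0, pvalue_nc=0.15)
full_model = {"X1": 0.50, "X2": 0.010, "X3": 0.30}  # X3 is the covariate being dropped
reduced_model = {"X1": 0.40, "X2": 0.010}           # estimates after dropping X3
keep_x3 = is_confounder(full_model, reduced_model, settings)
```

In this example the X1 estimate moves from 0.50 to 0.40, a 20% change, so under a 15% threshold the dropped covariate would be treated as a confounder and kept in the model even if it is not statistically significant on its own.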