| Literature DB >> 23940836 |
Amy S Nowacki, Brian J Wells, Changhong Yu, Michael W Kattan.
Abstract
Background. Propensity score usage seems to be growing in popularity, leading researchers to question the possible role of propensity scores in prediction modeling, despite the lack of a theoretical rationale. It is suspected that such requests stem from a failure to differentiate the goals of predictive modeling from those of causal inference modeling. The purpose of this study is therefore to formally examine the effect of propensity scores on predictive performance. Our hypothesis is that a multivariable regression model that adjusts for all covariates will perform as well as or better than models utilizing propensity scores with respect to model discrimination and calibration.

Methods. The most commonly encountered statistical scenarios for medical prediction (logistic and proportional hazards regression) were used to investigate this research question. Random cross-validation was performed 500 times to correct for optimism. The multivariable regression models adjusting for all covariates were compared with models that included adjustment for, or weighting with, the propensity scores. The methods were compared on three predictive performance measures: (1) concordance indices; (2) Brier scores; and (3) calibration curves.

Results. Multivariable models adjusting for all covariates had the highest average concordance index, the lowest average Brier score, and the best calibration. Propensity score adjustment and inverse probability weighting models without adjustment for all covariates performed worse than the full models, and adding propensity scores to models with full covariate adjustment failed to improve predictive performance.

Conclusion. Propensity score techniques did not improve prediction performance measures beyond multivariable adjustment. Propensity scores are not recommended when the analytical goal is pure prediction modeling.
Keywords: Calibration curve; Concordance index; Multivariable regression; Prediction; Propensity score
Year: 2013 PMID: 23940836 PMCID: PMC3740143 DOI: 10.7717/peerj.123
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
List of models used for comparison of prediction performance measures.
| Model | Description |
|---|---|
| Naïve | Treatment only |
| All | Treatment + all covariates |
| PS | Treatment + continuous propensity score |
| PS quintile | Treatment + categorical propensity score (quintiles) |
| PS + Select | Treatment + continuous propensity score + select covariates |
| PS + All | Treatment + continuous propensity score + all covariates |
| IPW | Treatment, with inverse probability weighting |
| IPW + All | Treatment + all covariates, with inverse probability weighting |
| Multi PS | Treatment + continuous multinomial propensity scores |
| Multi PS + All | Treatment + continuous multinomial propensity scores + all covariates |
| Multi IPW | Treatment, with multinomial inverse probability weighting |
| Multi IPW + All | Treatment + all covariates, with multinomial inverse probability weighting |
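The distinctions in the table above can be illustrated with a small sketch (synthetic data; all variable names are hypothetical, not from the study). A propensity score is the estimated probability of treatment given the covariates; the models then differ only in which columns enter the outcome regression, or in how subjects are weighted:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 3))                           # covariates
treat = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))   # treatment depends on X

# Propensity model: P(treatment = 1 | covariates)
ps = LogisticRegression().fit(X, treat).predict_proba(X)[:, 1]

# Design matrices for four representative models from the table
designs = {
    "Naive":    treat.reshape(-1, 1),              # treatment only
    "All":      np.column_stack([treat, X]),       # treatment + all covariates
    "PS":       np.column_stack([treat, ps]),      # treatment + continuous PS
    "PS + All": np.column_stack([treat, ps, X]),   # treatment + PS + all covariates
}

# IPW uses the "Naive" design but weights each subject by the inverse
# probability of the treatment actually received
ipw = np.where(treat == 1, 1 / ps, 1 / (1 - ps))
```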
Steps of the modeling approach.
| Modeling approach |
|---|
| 1. Begin with full dataset. |
| 2. Randomly select 90% of full dataset as Training dataset; remaining 10% of full dataset is Test dataset. |
| 3. Fit propensity model to the Training dataset. Use this model to obtain propensity scores for patients in both the Training and Test datasets. |
| 4. Fit each of the 12 predictive models to the Training dataset. |
| 5. Use model coefficients to obtain predicted probabilities for the Test dataset; do this for each of the 12 predictive models. |
| 6. Calculate prediction performance measures (concordance index, Brier score, calibration curve) for the Test dataset. |
| 7. Repeat steps 2–6, 500 times. |
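The steps above can be sketched as follows for a single replicate, using synthetic data and a logistic outcome model (all names here are illustrative, not the study's code):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, brier_score_loss

rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 3))                                      # covariates
treat = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))              # treatment
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * treat + X[:, 1]))))  # outcome

# Step 2: random 90/10 Training/Test split
idx = rng.permutation(n)
train, test = idx[: int(0.9 * n)], idx[int(0.9 * n):]

# Step 3: fit the propensity model on Training, score all patients
ps = LogisticRegression().fit(X[train], treat[train]).predict_proba(X)[:, 1]

# Steps 4-5: fit two representative predictive models, predict on Test
designs = {
    "All": np.column_stack([treat, X]),   # treatment + all covariates
    "PS":  np.column_stack([treat, ps]),  # treatment + propensity score
}
results = {}
for name, Z in designs.items():
    p = LogisticRegression().fit(Z[train], y[train]).predict_proba(Z[test])[:, 1]
    # Step 6: c-index (equal to AUC for a binary outcome) and Brier score
    results[name] = {"c": roc_auc_score(y[test], p),
                     "brier": brier_score_loss(y[test], p)}

# Step 7 would wrap the split/fit/score steps in a loop over 500 random splits
# and average the measures to correct for optimism.
```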
Figure 1. Predictive accuracy by calibration curve among the models in the NSQIP, UNOS and DIABETES studies.
Discrimination by concordance index and overfitting by shrinkage factor among the models in the NSQIP, UNOS and DIABETES studies.
| Study | Performance measure | Naïve | All | PS | PS quintile | PS + Select | PS + All | IPW | IPW + All | Multi PS | Multi PS + All | Multi IPW | Multi IPW + All |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NSQIP | median c-index | 0.54 | 0.66 | 0.57 | 0.56 | 0.57 | 0.66 | 0.54 | 0.64 | | | | |
| | std error | 0.001 | 0.003 | 0.003 | 0.003 | 0.003 | 0.003 | 0.001 | 0.003 | | | | |
| | median shrinkage factor | 0.88 | 0.82 | 0.84 | 0.75 | 0.76 | 0.82 | 0.93 | 0.93 | | | | |
| UNOS | median c-index | 0.50 | 0.71 | 0.49 | 0.49 | 0.62 | 0.71 | 0.48 | 0.71 | | | | |
| | std error | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | | | | |
| | median shrinkage factor | −1 | 0.95 | −0.71 | −0.37 | 0.96 | 0.95 | −7 | 0.98 | | | | |
| Diabetes | median c-index | 0.63 | 0.75 | 0.74 | 0.74 | 0.75 | 0.62 | 0.75 | 0.74 | 0.75 | 0.62 | 0.75 | |
| | std error | 0.0008 | 0.0006 | 0.0008 | 0.0006 | 0.0011 | 0.0006 | 0.0008 | 0.0006 | 0.0006 | 0.0008 | 0.0006 | |
| | median shrinkage factor | 0.996 | 0.98 | 0.998 | 0.997 | 0.98 | 0.99 | 0.99 | 0.996 | 0.98 | 0.99 | 0.996 | |
Notes.
Shading represents the best-performing model(s) according to the c-statistic.
Negative shrinkage factors result when the treatment variable is a poor predictor of the outcome, yielding a very small likelihood ratio value.
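The shrinkage factor reported in the table can be estimated as a calibration slope: regress the Test outcomes on the model's linear predictor, and read off the fitted coefficient. A slope near 1 indicates little overfitting, while a slope far from 1 (or negative, as in the note above) signals that the linear predictor carries little or misleading information. A minimal sketch on synthetic data (names are illustrative; regularization is effectively disabled so the slope is not artificially shrunk):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 1500
X = rng.normal(size=(n, 2))                          # covariates
y = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))      # outcome

train, test = np.arange(1350), np.arange(1350, n)
model = LogisticRegression(C=1e6).fit(X[train], y[train])  # ~unpenalized fit

# Linear predictor (log-odds scale) on the Test set
lp = model.decision_function(X[test])

# Calibration slope ("shrinkage factor"): coefficient from regressing the
# Test outcomes on the linear predictor alone; 1.0 means no overfitting.
slope = LogisticRegression(C=1e6).fit(lp.reshape(-1, 1), y[test]).coef_[0, 0]
```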