Stephanie Watkins, Michele Jonsson-Funk, M Alan Brookhart, Steven A Rosenberg, T Michael O'Shea, Julie Daniels.
Abstract
OBJECTIVE: To illustrate the use of ensemble tree-based methods (random forest classification [RFC] and bagging) for propensity score estimation and to compare these methods with logistic regression, in the context of evaluating the effect of physical and occupational therapy on preschool motor ability among very low birth weight (VLBW) children. DATA SOURCE: We used secondary data from the Early Childhood Longitudinal Study Birth Cohort (ECLS-B) between 2001 and 2006. STUDY DESIGN: We estimated the predicted probability of treatment using tree-based methods and logistic regression (LR). We then modeled the exposure-outcome relation using weighted LR models while considering covariate balance and precision for each propensity score estimation method. PRINCIPAL FINDINGS: Among approximately 500 VLBW children, therapy receipt was associated with moderately improved preschool motor ability. Overall, ensemble methods produced better covariate balance (Mean Squared Difference: 0.03-0.07) and more precise effect estimates than LR (Mean Squared Difference: 0.11). The overall magnitude of the effect estimates was similar between RFC and LR estimation methods. CONCLUSION: Propensity score estimation using RFC and bagging produced better covariate balance with increased precision compared to LR. Ensemble methods are a useful alternative to logistic regression to control confounding in observational studies. © Health Research and Educational Trust.
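The study design above (estimate a propensity score, weight, then check covariate balance) can be illustrated with a minimal sketch. It assumes inverse probability of treatment weighting, which the abstract does not specify, and uses a plain weighted mean difference as the balance check, which is not necessarily the balance metric reported above. The toy data, the propensity values, and the function names are all hypothetical; in the study the propensities would come from RFC, bagging, or LR.

```python
def iptw_weights(treated, propensity):
    """Inverse probability of treatment weights (an assumed weighting scheme):
    1/e for treated units and 1/(1 - e) for untreated units."""
    weights = []
    for t, e in zip(treated, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        weights.append(1.0 / e if t else 1.0 / (1.0 - e))
    return weights


def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def mean_difference(covariate, treated, weights):
    """Weighted covariate mean among treated minus that among untreated."""
    x_t = [x for x, t in zip(covariate, treated) if t]
    w_t = [w for w, t in zip(weights, treated) if t]
    x_c = [x for x, t in zip(covariate, treated) if not t]
    w_c = [w for w, t in zip(weights, treated) if not t]
    return weighted_mean(x_t, w_t) - weighted_mean(x_c, w_c)


# Hypothetical toy data: one binary covariate that drives treatment receipt.
covariate = [1, 1, 1, 1, 0, 0, 0, 0]
treated = [1, 1, 0, 0, 1, 0, 0, 0]
# Hypothetical propensity scores: P(treated | x=1) = 0.5, P(treated | x=0) = 0.25.
propensity = [0.5, 0.5, 0.5, 0.5, 0.25, 0.25, 0.25, 0.25]

# Balance before weighting (every unit has weight 1) and after weighting.
raw_diff = mean_difference(covariate, treated, [1.0] * len(treated))
weighted_diff = mean_difference(covariate, treated, iptw_weights(treated, propensity))

print(f"unweighted difference: {raw_diff:.3f}")
print(f"weighted difference:   {weighted_diff:.3f}")
```

In this toy example the propensities match the true treatment frequencies within each covariate level, so weighting removes the imbalance. A poorly estimated propensity score would leave a nonzero weighted difference, which is the kind of comparison the abstract draws between estimation methods.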
Keywords: Propensity scores; ensemble methods; tree-based methods
Year: 2013
PMID: 23701015 PMCID: PMC3796115 DOI: 10.1111/1475-6773.12068
Source DB: PubMed Journal: Health Serv Res ISSN: 0017-9124 Impact factor: 3.402