Chengcheng Zhang1, Yujia Ding1,2, Qidi Peng1,2. 1. Claremont Graduate University, Department of Economic Sciences, 150 E. 10th Street, California Claremont, USA. 2. Claremont Graduate University, Institute of Mathematical Sciences, 150 E. 10th Street, California Claremont, USA.
Abstract
PURPOSE: Healthcare out-of-pocket (OOP) costs consist of the annual expenses paid by individuals or families that are not reimbursed by insurance. In the U.S, broadening healthcare disparities are caused by the rapid increase in OOP costs. With a precise forecast of the OOP costs, governments can improve the design of healthcare policies to better control the OOP costs. This study designs a purely data-driven ensemble learning procedure to achieve a collection of factors that best predict OOP costs. METHODS: We propose a voting ensemble learning procedure to rank and select factors of OOP costs based on the Medical Expenditure Panel Survey dataset. The method involves utilizing votes from the base learners forward subset selection, backward subset selection, random forest, and LASSO. RESULTS: The top-ranking factors selected by our proposed method are insurance type, age, asthma, family size, race, and number of physician office visits. The predictive models using these factors outperform the models that employ the factors commonly considered by the literature through improving the prediction error (test MSE of the OOP costs' log-odds) from 0.462 to 0.382. CONCLUSION: Our results indicate a set of factors which best explain the OOP costs behavior based on a purely data-driven solution. These findings contribute to the discussions regarding demand-side needs for containing rapidly rising OOP costs. Instead of estimating the impact of a single factor on OOP costs, our proposed method allows for the selection of arbitrary-sized factors to best explain OOP costs.
PURPOSE: Healthcare out-of-pocket (OOP) costs consist of the annual expenses paid by individuals or families that are not reimbursed by insurance. In the U.S, broadening healthcare disparities are caused by the rapid increase in OOP costs. With a precise forecast of the OOP costs, governments can improve the design of healthcare policies to better control the OOP costs. This study designs a purely data-driven ensemble learning procedure to achieve a collection of factors that best predict OOP costs. METHODS: We propose a voting ensemble learning procedure to rank and select factors of OOP costs based on the Medical Expenditure Panel Survey dataset. The method involves utilizing votes from the base learners forward subset selection, backward subset selection, random forest, and LASSO. RESULTS: The top-ranking factors selected by our proposed method are insurance type, age, asthma, family size, race, and number of physician office visits. The predictive models using these factors outperform the models that employ the factors commonly considered by the literature through improving the prediction error (test MSE of the OOP costs' log-odds) from 0.462 to 0.382. CONCLUSION: Our results indicate a set of factors which best explain the OOP costs behavior based on a purely data-driven solution. These findings contribute to the discussions regarding demand-side needs for containing rapidly rising OOP costs. Instead of estimating the impact of a single factor on OOP costs, our proposed method allows for the selection of arbitrary-sized factors to best explain OOP costs.