Akihiro Shimoda (1), Daisuke Ichikawa (2), Hiroshi Oyama (3)
1. Department of Clinical Information Engineering, Division of Social Medicine, Graduate School of Medicine, The University of Tokyo, 7-3-1, Hongo, Bunkyo-ku, Tokyo 113-0033, Japan. Electronic address: shimoda-tky@umin.ac.jp.
2. Department of Clinical Information Engineering, Division of Social Medicine, Graduate School of Medicine, The University of Tokyo, 7-3-1, Hongo, Bunkyo-ku, Tokyo 113-0033, Japan.
3. Department of Clinical Information Engineering, Division of Social Medicine, Graduate School of Medicine, The University of Tokyo, 7-3-1, Hongo, Bunkyo-ku, Tokyo 113-0033, Japan; Department of Clinical Information Engineering, School of Public Health, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan.
Abstract
BACKGROUND: Since the launch of a nationwide general health check-up and instruction program in Japan in 2008, interest in formulating an effective and efficient strategy to improve the participation rate has been growing. The aim of this study was to develop and evaluate models that identify individuals who are unlikely to undergo general health check-ups. We used machine-learning methods to select interventional targets more efficiently.
METHODS: We used information from a local government database in Japan. The study population included 7290 individuals aged 40-74 years who underwent at least one general health check-up between 2012 and 2015. Using machine-learning techniques, we developed four predictive models based on the extreme gradient boosting (XGBoost), random forest (RF), support vector machines (SVMs), and logistic regression (LR) algorithms, and compared the areas under the curves (AUCs) of the models with that of the heuristic method (which presumes that individuals who underwent a general health check-up in the previous year will do so again in the following year).
RESULTS: The AUCs for the XGBoost, RF, SVMs, LR, and heuristic models/method were 0.829 (95% confidence interval [CI]: 0.806-0.853), 0.821 (95% CI: 0.797-0.845), 0.812 (95% CI: 0.787-0.837), 0.816 (95% CI: 0.791-0.841), and 0.683 (95% CI: 0.657-0.708), respectively. The XGBoost model exhibited the highest AUC, and its performance was significantly better than that of SVMs (p = 0.034), LR (p = 0.017), and the heuristic method (p < 0.001). However, the performance of XGBoost did not differ significantly from that of RF (p = 0.229).
CONCLUSION: Predictive models using machine-learning techniques outperformed the existing heuristic method in predicting participation in a general health check-up system by eligible participants.
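The comparison between the heuristic method and a model score can be illustrated with a minimal sketch. It assumes the AUC is the usual rank-based quantity (the probability that a randomly chosen attendee receives a higher score than a randomly chosen non-attendee, with ties counted as one half). The function name `auc`, the variable names, and all data values below are hypothetical and are not taken from the study.

```python
def auc(labels, scores):
    """Rank-based AUC: P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 1 = attended, 0 = did not attend.
attended_this_year = [1, 1, 1, 1, 0, 0, 0, 0]

# Heuristic method: predict this year's attendance from last year's attendance.
attended_last_year = [1, 1, 1, 0, 1, 0, 0, 0]

# Hypothetical continuous scores, standing in for the output of a trained model.
model_score = [0.9, 0.8, 0.7, 0.6, 0.65, 0.3, 0.2, 0.1]

heuristic_auc = auc(attended_this_year, attended_last_year)
model_auc = auc(attended_this_year, model_score)

print(f"heuristic AUC: {heuristic_auc:.4f}")
print(f"model AUC:     {model_auc:.4f}")
```

Because the heuristic produces only a binary prediction, its AUC reduces to the mean of sensitivity and specificity, which limits how finely it can rank individuals compared with a model that outputs a continuous score.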