Akihiro Shimoda (1), Daisuke Ichikawa (2), Hiroshi Oyama (3)

1. Department of Clinical Information Engineering, Division of Social Medicine, Graduate School of Medicine, the University of Tokyo, Bunkyo-ku, Tokyo, Japan. Electronic address: shimoda-tky@umin.ac.jp.
2. Department of Clinical Information Engineering, Division of Social Medicine, Graduate School of Medicine, the University of Tokyo, Bunkyo-ku, Tokyo, Japan.
3. Department of Clinical Information Engineering, Division of Social Medicine, Graduate School of Medicine, the University of Tokyo, Bunkyo-ku, Tokyo, Japan; Department of Clinical Information Engineering, School of Public Health, Graduate School of Medicine, the University of Tokyo, Bunkyo-ku, Tokyo, Japan.
Abstract
OBJECTIVES: Since the launch of a nationwide general health check-up and instruction program in Japan in 2008, interest has grown in using predictive analytics to improve implementation of the program. We investigated the performance of prediction models developed to identify individuals classified as "requiring instruction" (high-risk) who were unlikely to participate in a health intervention program.

METHODS: Data were obtained from one large health insurance union in Japan. The study population included individuals who underwent at least one general health check-up between 2008 and 2013 and were identified as "requiring instruction" in 2013. We developed three prediction models using machine-learning techniques, based on the gradient boosted trees (GBT), random forest (RF), and logistic regression (LR) algorithms, and compared their areas under the curve (AUC) with those of two conventional methods. The aim of the models was to identify at-risk individuals who were unlikely to participate in the instruction program in 2013 after being classified as requiring instruction at their general health check-up that year.

RESULTS: First, we performed the analysis using data without multiple imputation. The AUC values for the GBT, RF, and LR prediction models and conventional methods 1 and 2 were 0.893 (95% CI: 0.882-0.905), 0.889 (95% CI: 0.877-0.901), 0.885 (95% CI: 0.872-0.897), 0.784 (95% CI: 0.767-0.800), and 0.757 (95% CI: 0.741-0.773), respectively. Subsequently, we performed the analysis using data after multiple imputation. The AUC values for the GBT, RF, and LR prediction models and conventional methods 1 and 2 were 0.894 (95% CI: 0.882-0.906), 0.889 (95% CI: 0.887-0.901), 0.885 (95% CI: 0.872-0.898), 0.784 (95% CI: 0.767-0.800), and 0.757 (95% CI: 0.741-0.773), respectively. In both analyses, the GBT model showed the highest AUC of all the models, and statistically significant differences were found in comparison with the LR model, conventional method 1, and conventional method 2.

CONCLUSION: The prediction models using machine-learning techniques outperformed the existing conventional methods in predicting participation in the instruction program among participants identified as "requiring instruction" (high-risk).
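As an illustration of the evaluation metric used above, the following is a minimal sketch of how an AUC and a 95% confidence interval might be computed for one model's predicted scores. The abstract does not state how the intervals were obtained; the percentile bootstrap used here is an assumption, and the function names (`auc`, `bootstrap_auc_ci`) and the synthetic data are hypothetical, not taken from the study.

```python
import random


def auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney) formulation; tied scores get average ranks."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied scores starting at i.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (an assumed method, not the study's)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled_labels = [labels[i] for i in idx]
        # Skip resamples that contain only one class; AUC is undefined there.
        if 0 < sum(resampled_labels) < n:
            stats.append(auc(resampled_labels, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int(round(alpha / 2 * n_boot))]
    hi = stats[int(round((1 - alpha / 2) * n_boot)) - 1]
    return lo, hi


if __name__ == "__main__":
    # Synthetic data only: label 1 = did not participate, score = model output.
    rng = random.Random(42)
    labels = [1 if rng.random() < 0.7 else 0 for _ in range(500)]
    scores = [rng.gauss(0.6 if y == 1 else 0.4, 0.15) for y in labels]
    point = auc(labels, scores)
    lo, hi = bootstrap_auc_ci(labels, scores)
    print(f"AUC = {point:.3f} (95% CI: {lo:.3f}-{hi:.3f})")
```

Running the same two functions on each model's scores for a shared test set would give figures in the format reported in the RESULTS. A formal test of the difference between two AUCs is a separate step that this sketch does not cover.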