Qiong Wang1,2, Min Yang1,2, Bo Pang1,2, Mei Xue1,2, Yicheng Zhang1,2, Zhixin Zhang3,4, Wenquan Niu5.
1. Graduate School, Beijing University of Chinese Medicine, Beijing, China.
2. Department of Pediatrics, China-Japan Friendship Hospital, Beijing, China.
3. Department of Pediatrics, China-Japan Friendship Hospital, Beijing, China. zhangzhixin032@163.com.
4. International Medical Services, China-Japan Friendship Hospital, Beijing, China. zhangzhixin032@163.com.
5. Institute of Clinical Medical Sciences, China-Japan Friendship Hospital, Beijing, China. niuwenquan_shcn@163.com.
Abstract
OBJECTIVES: We used machine-learning algorithms and a deep-learning sequential model to identify and refine the most important factors associated with overweight and obesity in Chinese preschool-aged children. METHODS: This cross-sectional survey was conducted in 2020 in Beijing and Tangshan. Children aged 3-6 years were enrolled using a stratified cluster random sampling strategy. Data were analyzed in Python using the PyCharm environment. RESULTS: A total of 9478 children were eligible for inclusion, of whom 1250 had overweight or obesity. All children were randomly divided into a training group and a testing group at a 6:4 ratio. In terms of accuracy, support vector machine (SVM) outperformed the other algorithms (0.9457), followed by gradient boosting machine (GBM) (0.9454). Among the other 4 performance indexes, GBM had the highest F1 score (0.7748), followed by SVM (0.7731). After importance ranking, the top 5 factors seemed sufficient to obtain decent performance under the GBM algorithm: age, eating speed, number of relatives with obesity, consumption of sweet drinks, and paternal education. The performance of the top 5 factors was reinforced by the deep-learning sequential model. CONCLUSIONS: We identified 5 important factors that can be fed to the GBM algorithm to differentiate children with overweight or obesity from children in general, with decent prediction performance.
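The abstract reports both accuracy and F1 score, and the two can diverge sharply when one class is rare: 1250 of 9478 children (about 13.2%) had overweight or obesity, so a classifier that labels every child as unaffected would already reach roughly 86.8% accuracy on the full sample while detecting no cases. The sketch below illustrates this with a 6:4 random split and an all-negative baseline. It is not the authors' code; the helper names `split_indices` and `accuracy_and_f1`, the seed, and the use of a simple (non-stratified) shuffle are assumptions made for illustration, and only the two counts come from the abstract.

```python
import random


def split_indices(n, train_fraction=0.6, seed=0):
    """Shuffle indices 0..n-1 and split them into train and test lists.

    Hypothetical helper: a plain random split, not necessarily the
    procedure used in the study.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(n * train_fraction)
    return idx[:cut], idx[cut:]


def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    return accuracy, 2 * precision * recall / (precision + recall)


# Counts taken from the abstract; the individual labels are synthetic.
n_total, n_cases = 9478, 1250
labels = [1] * n_cases + [0] * (n_total - n_cases)

train_idx, test_idx = split_indices(n_total, train_fraction=0.6)
y_test = [labels[i] for i in test_idx]

# Baseline that predicts "no overweight or obesity" for every child.
baseline_pred = [0] * len(y_test)
acc, f1 = accuracy_and_f1(y_test, baseline_pred)

print(f"train={len(train_idx)} test={len(test_idx)}")
print(f"baseline accuracy={acc:.4f} F1={f1:.4f}")
```

Because the baseline's F1 is zero despite its high accuracy, F1 is the more informative index for comparing SVM and GBM here, which is consistent with the abstract's emphasis on GBM's F1 score.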