Hongwei Liu (1), Jing Li (1), Junhong Leng (2), Hui Wang (1), Jinnan Liu (1), Weiqin Li (2), Hongyan Liu (2), Shuo Wang (2), Jun Ma (1), Juliana CN Chan (3, 4), Zhijie Yu (5), Gang Hu (6), Changping Li (1), Xilin Yang (1).
1. Department of Epidemiology and Biostatistics, School of Public Health, Tianjin Medical University, Tianjin, China.
2. Tianjin Women and Children's Health Center, Tianjin, China.
3. Department of Medicine and Therapeutics, Hong Kong Institute of Diabetes and Obesity, The Chinese University of Hong Kong, Hong Kong SAR, China.
4. International Diabetes Federation Centre of Education, The Chinese University of Hong Kong, Hong Kong SAR, China.
5. Population Cancer Research Program and Department of Pediatrics, Dalhousie University, Halifax, Canada.
6. Chronic Disease Epidemiology Laboratory, Pennington Biomedical Research Center, Baton Rouge, Louisiana, USA.
Abstract
AIMS: This study aimed to develop a machine learning-based prediction model for gestational diabetes mellitus (GDM) in early pregnancy in Chinese women.
MATERIALS AND METHODS: We used an established population-based prospective cohort of 19,331 women who registered their pregnancy before the 15th gestational week in Tianjin, China, from October 2010 to August 2012. The dataset was randomly divided into a training set (70%) and a test set (30%). Risk factors collected at registration were examined and used to construct the prediction model in the training dataset. A machine learning method, extreme gradient boosting (XGBoost), was used to develop the model, and a traditional logistic model was developed for comparison. In the test dataset, model performance was assessed with calibration plots for calibration and the area under the receiver operating characteristic curve (AUR) for discrimination.
RESULTS: In total, 1484 (7.6%) women developed GDM. Pre-pregnancy body mass index, maternal age, fasting plasma glucose at registration, and alanine aminotransferase were selected as risk factors. The probability of GDM predicted by the XGBoost model was similar to the observed probability in the test dataset, while the logistic model tended to overestimate the risk at the highest risk level (Hosmer-Lemeshow test p value: 0.243 vs. 0.099). The XGBoost model achieved a higher AUR than the logistic model (0.742 vs. 0.663, p < 0.001). The XGBoost model was deployed through a free, publicly available software interface (https://liuhongwei.shinyapps.io/gdm_risk_calculator/).
CONCLUSION: The XGBoost model achieved better performance than the logistic model.
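The evaluation steps named in the abstract (a random 70/30 split, discrimination by area under the ROC curve, and calibration by comparing predicted with observed risk across risk groups) can be illustrated with a minimal standard-library Python sketch. This is not the authors' code: the function names are hypothetical, the demo data are synthetic, and model fitting (XGBoost or logistic regression) is left out. The scores are assumed to come from an already fitted model. The Hosmer-Lemeshow statistic is computed in its usual grouped form; converting it to a p value would additionally need a chi-square distribution, which is omitted here.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Randomly split rows into (train, test); 30% test mirrors the abstract."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def roc_auc(labels, scores):
    """AUC via the Mann-Whitney identity; tied scores share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one case and one non-case")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def calibration_table(labels, probs, n_groups=10):
    """Per risk group: (group size, mean predicted risk, observed event rate)."""
    pairs = sorted(zip(probs, labels))
    n = len(pairs)
    table = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        obs_rate = sum(y for _, y in chunk) / len(chunk)
        table.append((len(chunk), mean_pred, obs_rate))
    return table


def hosmer_lemeshow_statistic(table):
    """Sum over groups of (observed - expected)^2 / (n * p * (1 - p))."""
    stat = 0.0
    for n, mean_pred, obs_rate in table:
        denom = n * mean_pred * (1 - mean_pred)
        if denom > 0:
            stat += (n * obs_rate - n * mean_pred) ** 2 / denom
    return stat


if __name__ == "__main__":
    # Synthetic stand-in data (NOT the Tianjin cohort): each row is
    # (predicted risk, outcome), with outcomes drawn from that risk.
    rng = random.Random(42)
    rows = []
    for _ in range(1000):
        risk = rng.uniform(0.01, 0.30)
        rows.append((risk, 1 if rng.random() < risk else 0))
    _, test = train_test_split(rows, test_fraction=0.3, seed=1)
    probs = [p for p, _ in test]
    labels = [y for _, y in test]
    print("AUC:", round(roc_auc(labels, probs), 3))
    table = calibration_table(labels, probs)
    print("Hosmer-Lemeshow statistic:", round(hosmer_lemeshow_statistic(table), 3))
```

In a calibration plot, each row of `calibration_table` would be one point, with mean predicted risk on one axis and observed rate on the other. A model that overestimates risk in the highest group, as reported for the logistic model, would show a top-group predicted mean above its observed rate.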
Authors: Seung Mi Lee; Suhyun Hwangbo; Errol R Norwitz; Ja Nam Koo; Ig Hwan Oh; Eun Saem Choi; Young Mi Jung; Sun Min Kim; Byoung Jae Kim; Sang Youn Kim; Gyoung Min Kim; Won Kim; Sae Kyung Joo; Sue Shin; Chan-Wook Park; Taesung Park; Joong Shin Park
Journal: Clin Mol Hepatol
Date: 2021-10-15