Juyoung Shin1,2, Jaewon Kim3, Chanjung Lee3, Joon Young Yoon3, Seyeon Kim3, Seungjae Song3, Hun-Sung Kim2,4.
Abstract
BACKGROUND: There are many models for predicting diabetes mellitus (DM), but their clinical implication remains vague. Therefore, we aimed to create various DM prediction models using easily accessible health screening test parameters.
Keywords: Diabetes mellitus; Electronic health records; Machine learning; Probability; Risk assessment
Year: 2022 PMID: 35272434 PMCID: PMC9353566 DOI: 10.4093/dmj.2021.0115
Source DB: PubMed Journal: Diabetes Metab J ISSN: 2233-6079 Impact factor: 5.893
Fig. 1. Design of the four diabetes prediction models and selection of study subjects. The medical records contained 3,952 diabetic and 134,691 non-diabetic individuals. Model-1 and Model-2 were 2-year and 1-year prediction models, respectively, for non-diabetic subjects. Subjects with data from the 24 months before diabetes mellitus (DM) diagnosis were included in Model-1 (752 diabetic and 26,175 non-diabetic individuals). Subjects with data from the 12 months before DM diagnosis were included in Model-2 (641 diabetic and 33,380 non-diabetic individuals). Model-3 and Model-4 were 1-year prediction models for prediabetic subjects; Model-4 was constructed after learning the difference between 1 and 2 years before DM diagnosis. From the subjects of Model-2, those with a prediabetic condition in the previous 12 months were selected for Model-3 (519 diabetic and 6,345 prediabetic individuals). From the subjects of Model-3, those with data from the previous 24 months were selected for Model-4 (281 diabetic and 3,814 prediabetic individuals). Non-diabetic subjects were randomly selected from subjects without diabetes according to the design of each model. The number of non-diabetic or prediabetic subjects was adjusted to be the same in each model. Gradient boosting algorithms were used for Model-1, Model-2, and Model-3, and random forest algorithms were used for Model-4.
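The caption states that comparison subjects were randomly selected and their number adjusted, but it does not describe the procedure. One possible reading is random undersampling of the comparison group to match the number of diabetic subjects. The sketch below illustrates that reading only; the function name `sample_controls`, the 1:1 ratio, and the fixed seed are assumptions, not details from the study.

```python
import random


def sample_controls(case_ids, control_ids, seed=0):
    """Randomly draw as many controls as there are cases (assumed 1:1 ratio)."""
    if len(control_ids) < len(case_ids):
        raise ValueError("not enough controls to match the cases")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(control_ids, len(case_ids))


# Hypothetical subject identifiers, not study data.
cases = list(range(5))
controls = list(range(100, 150))
matched = sample_controls(cases, controls)
print(len(cases), len(matched))
```

Sampling without replacement keeps each comparison subject at most once, so the resulting training set is balanced without duplicated records.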
Diabetes risk prediction model performance evaluation parameters
| Variable | Model-1 | Model-2 | Model-3 | Model-4 |
|---|---|---|---|---|
| No. of diabetic subjects | 752 | 641 | 519 | 281 |
| With 62 variables | | | | |
| Accuracy | 0.858 | 0.867 | 0.834 | 0.841 |
| Recall | 0.856 | 0.872 | 0.823 | 0.850 |
| Precision | 0.857 | 0.870 | 0.837 | 0.840 |
| ROC-AUC | 0.916 | 0.928 | 0.891 | 0.925 |
| With 27 variables | | | | |
| Accuracy | 0.807 | 0.815 | 0.770 | 0.793 |
| Recall | 0.817 | 0.779 | 0.744 | 0.834 |
| Precision | 0.802 | 0.853 | 0.793 | 0.771 |
| ROC-AUC | 0.878 | 0.880 | 0.842 | 0.873 |
ROC-AUC, area under the receiver operating characteristic curve.
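For readers less familiar with the metrics in the table, the following minimal sketch shows how accuracy, recall, precision, and ROC-AUC are conventionally defined. It uses the standard textbook definitions, not code from the study; the function names and the example counts and scores are hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, recall, and precision from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn),      # share of true cases that were detected
        "precision": tp / (tp + fp),   # share of predicted cases that were true
    }


def roc_auc(labels, scores):
    """ROC-AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores, not taken from the study.
metrics = classification_metrics(tp=80, fp=20, tn=80, fn=20)
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(metrics, auc)  # auc is 0.75 for these four scores
```

The pairwise ROC-AUC is quadratic in the number of subjects, which is fine for illustration; library implementations typically use a rank-based equivalent.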
Fig. 2. Variable importance in the simplified diabetes risk prediction models with 27 variables. The Gini importance of the 27 variables is presented in reference to that of fasting blood glucose, which was set as 1.0. BMI, body mass index; GTP, glutamyl transpeptidase; ALT, alanine aminotransferase; LDL, low-density lipoprotein; HDL, high-density lipoprotein; BP, blood pressure; AST, aspartate transaminase; DM, diabetes mellitus.
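The scaling described in the caption, where each variable's importance is expressed relative to fasting blood glucose set to 1.0, can be sketched as a simple division by the reference value. The helper name `relative_importance`, the variable keys, and the raw importance values below are hypothetical and are not results from the study.

```python
def relative_importance(importances, reference="fasting_blood_glucose"):
    """Rescale importances so that the reference variable equals 1.0."""
    ref = importances[reference]
    if ref <= 0:
        raise ValueError("reference importance must be positive")
    return {name: value / ref for name, value in importances.items()}


# Hypothetical raw Gini importances, for illustration only.
raw = {"fasting_blood_glucose": 0.40, "bmi": 0.10, "age": 0.08}
scaled = relative_importance(raw)
print(scaled)
```

Dividing by the reference leaves the ranking of variables unchanged and only changes the scale on which they are plotted.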
Dependence of the 2-year diabetes risk prediction model performance evaluation parameters on the “fasting glucose” parameters, compared with Model-1 (n=641)
| Variable | Model-1 | Model-1A | Model-1B | Model-1C | Model-1D | Model-1E |
|---|---|---|---|---|---|---|
| Accuracy | 0.807 (0.030) | 0.714 (0.037) | 0.803 (0.035) | 0.804 (0.046) | 0.809 (0.042) | 0.820 (0.026) |
| Recall | 0.817 (0.043) | 0.747 (0.047) | 0.791 (0.035) | 0.809 (0.040) | 0.836 (0.052) | 0.826 (0.038) |
| Precision | 0.802 (0.037) | 0.702 (0.036) | 0.813 (0.048) | 0.803 (0.053) | 0.794 (0.044) | 0.818 (0.038) |
| ROC-AUC | 0.878 (0.023) | 0.793 (0.027) | 0.862 (0.030) | 0.879 (0.034) | 0.882 (0.030) | 0.898 (0.021) |
| Kappa score | 0.575 | 0.420 | 0.595 | 0.563 | 0.608 | 0.598 |
Model-1: 27 variables; Model-1A: variables of Model-1 except fasting blood glucose; Model-1B: fasting blood glucose; Model-1C: age, sex, body mass index (BMI), fasting blood glucose; Model-1D: age, BMI, waist circumference, systolic and diastolic blood pressure, pulse rate, family history of diabetes, hemoglobin, fasting blood glucose; Model-1E: fasting blood glucose, glycosylated hemoglobin.
ROC-AUC, area under the receiver operating characteristic curve.
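The Kappa score row is not defined in the footnotes. Assuming it refers to Cohen's kappa for a binary classifier, it compares the observed agreement between predictions and true labels with the agreement expected by chance. The sketch below uses the standard formula; the function name and the example counts are hypothetical.

```python
def cohen_kappa(tp, fp, tn, fn):
    """Cohen's kappa for a 2x2 confusion matrix: (p_o - p_e) / (1 - p_e)."""
    n = tp + fp + tn + fn
    p_observed = (tp + tn) / n
    # Chance agreement from the marginal totals of predictions and true labels.
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical counts, not taken from the study.
kappa = cohen_kappa(tp=80, fp=20, tn=80, fn=20)
print(round(kappa, 3))  # 0.6
```

A kappa of 0 means agreement no better than chance and 1 means perfect agreement, which is why kappa is lower than accuracy for the same predictions.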