Abstract
Background: Insulin resistance (IR) is becoming more common in childhood, so rates of diabetes and cardiovascular disease may rise in the future and seriously threaten the healthy development of children. An easy way to predict IR in children would help pediatricians identify affected children in time and intervene appropriately, which is particularly important for practitioners in primary health care.
Keywords: artificial intelligence; children; insulin resistance; machine learning
Year: 2022; PMID: 36193541; PMCID: PMC9526431; DOI: 10.2147/DMSO.S380772
Source DB: PubMed; Journal: Diabetes Metab Syndr Obes; ISSN: 1178-7007; Impact factor: 3.249
Grid Search Parameter Intervals for Model Training
| Model | Parameter 1 | Parameter 2 | Parameter 3 | Parameter 4 | Parameter 5 |
|---|---|---|---|---|---|
| RF | n_estimators: range(30, 80, 10) | max_depth: range(3, 10, 2) | min_samples_leaf: [5, 6, 7] | max_features: [1, 2, 3] | |
| LR | max_iter: [20, 40, 60, 100] | C: [0.01, 0.1, 1, 10] | | | |
| SVM | kernel: [linear, poly, rbf] | C: [1, 10, 100] | gamma: [1, 0.1, 0.01, 0.001] | | |
| XGBoost | n_estimators: range(80, 200, 20) | max_depth: range(2, 15, 2) | subsample: np.linspace(0.7, 0.9, 3) | colsample_bytree: np.linspace(0.5, 0.98, 4) | min_child_weight: range(1, 9, 3) |
| CatBoost | depth: range(4, 10, 1) | learning_rate: [0.05, 0.1, 0.15] | l2_leaf_reg: [1, 4, 9] | | |
Abbreviations: LR, logistic regression; SVM, support vector machine; RF, random forest; XGBoost, extreme gradient boosting; CatBoost, gradient boosting with categorical features support.
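To make the size of each search concrete, the intervals in the table can be expanded into explicit grids. The following is a minimal sketch, not the authors' code: it assumes the intervals follow Python semantics (`range` excludes its stop value, `np.linspace` includes both endpoints), spells parameter names in lowercase as the libraries usually do, and uses a small hypothetical `linspace` helper in place of NumPy so that the snippet needs only the standard library.

```python
from itertools import product


def linspace(start, stop, num):
    """Evenly spaced values from start to stop inclusive (stand-in for np.linspace)."""
    step = (stop - start) / (num - 1)
    return [round(start + i * step, 10) for i in range(num)]


# Parameter intervals transcribed from the table (names lowercased; an assumption).
PARAM_GRIDS = {
    "RF": {
        "n_estimators": list(range(30, 80, 10)),
        "max_depth": list(range(3, 10, 2)),
        "min_samples_leaf": [5, 6, 7],
        "max_features": [1, 2, 3],
    },
    "LR": {
        "max_iter": [20, 40, 60, 100],
        "C": [0.01, 0.1, 1, 10],
    },
    "SVM": {
        "kernel": ["linear", "poly", "rbf"],
        "C": [1, 10, 100],
        "gamma": [1, 0.1, 0.01, 0.001],
    },
    "XGBoost": {
        "n_estimators": list(range(80, 200, 20)),
        "max_depth": list(range(2, 15, 2)),
        "subsample": linspace(0.7, 0.9, 3),
        "colsample_bytree": linspace(0.5, 0.98, 4),
        "min_child_weight": list(range(1, 9, 3)),
    },
    "CatBoost": {
        "depth": list(range(4, 10, 1)),
        "learning_rate": [0.05, 0.1, 0.15],
        "l2_leaf_reg": [1, 4, 9],
    },
}


def expand(grid):
    """Return every parameter combination in a grid as a list of dicts."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


def grid_sizes(grids):
    """Number of candidate settings an exhaustive search would try per model."""
    return {model: len(expand(grid)) for model, grid in grids.items()}


if __name__ == "__main__":
    for model, n in grid_sizes(PARAM_GRIDS).items():
        print(f"{model}: {n} candidate settings")
```

Under these assumptions the XGBoost grid is by far the largest, so its search cost dominates when each candidate is also cross-validated. Dictionaries of this shape are what a grid-search utility such as scikit-learn's `GridSearchCV` accepts as `param_grid`, though the record does not state which tool was used.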
Figure 1. Flowchart of the model training and test procedure in this study.
Figure 2. Distribution of missing features in the training set and test set.
Feature Comparison Between the Training Set and Test Set - Continuous Variables (Mean ± SD)
| Variable | Training Set | Test Set | t | p |
|---|---|---|---|---|
| Age | 9.64±1.74 | 9.40±1.75 | 1.351 | 0.177 |
| BMI (kg/m²) | 16.69±3.04 | 24.63±3.40 | −24.112 | <0.001 |
| SBP (mmHg) | 97.43±12.62 | 113.93±11.50 | −13.462 | <0.001 |
| DBP (mmHg) | 64.98±9.00 | 64.36±8.28 | 0.715 | 0.475 |
| Hip circumference (cm) | 71.40±9.87 | 99.24±10.77 | −27.967 | <0.001 |
| Waist circumference (cm) | 59.51±8.93 | 82.57±10.82 | −22.289 | <0.001 |
| Sleep duration (hours/day) | 9.91±1.01 | 9.13±0.80 | 0.688 | 0.492 |
| Sports activities in school per week (times/week) | 7.71±5.55 | 4.24±1.97 | 11.395 | <0.001 |
| HGB (g/L) | 134.30±15.15 | 137.83±8.78 | −3.423 | 0.001 |
| WBC (×10⁹/L) | 6.75±1.85 | 8.38±2.07 | −8.106 | <0.001 |
| RBC (×10⁹/L) | 4.79±0.56 | 5.02±0.33 | −6.007 | <0.001 |
| PLT (×10⁹/L) | 280.44±78.09 | 309.60±62.05 | −4.487 | <0.001 |
| Glucose (mmol/L) | 4.89±0.82 | 5.02±0.41 | −1.805 | 0.072 |
Note: p values <0.05 were considered significant.
Abbreviations: BMI, body mass index; SBP, systolic blood pressure; DBP, diastolic blood pressure; HGB, haemoglobin; WBC, leukocytes; RBC, erythrocytes; PLT, platelets.
Feature Comparison Between the Training Set and Test Set - Categorical Variables
| Variable | Training Set | Test Set | χ² | p |
|---|---|---|---|---|
| Males, n (%) | 271 (55.1%) | 78 (60.5%) | 1.203 | 0.273 |
| Children with IR, n (%) | 155 (31.6%) | 96 (74.4%) | 95.891 | <0.001 |
| Outdoor sports activities, n (%) | 174 (35.4%) | 89 (69.0%) | 47.333 | <0.001 |
| Sports activities in school, n (%) | 425 (86.4%) | 103 (79.8%) | 3.043 | 0.064 |
Note: p values <0.05 were considered significant.
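The test statistics for the categorical variables can be checked with a Pearson chi-square test on a 2×2 table. The sketch below rests on assumptions that the record does not state: group sizes of 492 (training) and 129 (test) are inferred from the reported counts and percentages, no continuity correction is applied, and the function names are illustrative.

```python
from math import erfc, sqrt

# Group sizes are an assumption, inferred from counts and percentages in the table.
TRAIN_N, TEST_N = 492, 129


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return erfc(sqrt(chi2 / 2))


def compare_groups(train_yes, test_yes):
    """Chi-square statistic and p value for a yes/no feature across the two sets."""
    chi2 = chi_square_2x2(train_yes, TRAIN_N - train_yes, test_yes, TEST_N - test_yes)
    return chi2, p_value_1df(chi2)


if __name__ == "__main__":
    for label, train_yes, test_yes in [
        ("Males", 271, 78),
        ("Outdoor sports activities", 174, 89),
    ]:
        chi2, p = compare_groups(train_yes, test_yes)
        print(f"{label}: chi2 = {chi2:.3f}, p = {p:.3g}")
```

With these inferred group sizes, the statistics for the "Males" and "Outdoor sports activities" rows agree with the tabulated values to three decimals, which suggests (but does not prove) that an uncorrected Pearson test was used.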
Figure 3. Violin plot depicting the comparison of basic information between the training set and the external test set.
Figure 4. Feature selection for different models by the RFE method. (A) LR; (B) RF; (C) SVM; (D) XGBoost; (E) CatBoost.
Features Selected by Different Models, Displayed in Descending Order of Importance Percentage
| Rank | LR | SVM | RF | XGBoost | CatBoost |
|---|---|---|---|---|---|
| 1 | Glucose (44.69%) | Glucose (42.23%) | Glucose (16.62%) | Age (17.31%) | PLT (18.87%) |
| 2 | Age (17.63%) | Gender (16.27%) | Hip circumference (12.77%) | Hip circumference (8.82%) | SBP (18.47%) |
| 3 | Sports activities in school (16.80%) | Age (15.73%) | DBP (12.49%) | Glucose (8.72%) | Glucose (17.49%) |
| 4 | Gender (15.29%) | Sports activities in school (12.96%) | Waist circumference (12.30%) | Waist circumference (8.04%) | HGB (16.62%) |
| 5 | WBC (3.89%) | Outdoor sports activities (6.74%) | SBP (9.26%) | BMI (7.94%) | Hip circumference (15.40%) |
| 6 | Waist circumference (1.70%) | WBC (1.85%) | WBC (8.31%) | HGB (6.90%) | Waist circumference (13.16%) |
| 7 | | DBP (1.69%) | HGB (7.86%) | WBC (6.16%) | |
| 8 | | Waist circumference (1.33%) | RBC (6.99%) | PLT (5.95%) | |
| 9 | | RBC (1.21%) | PLT (6.86%) | Gender (5.73%) | |
| 10 | | | Sports activities in school per week (6.54%) | Sports activities in school (5.46%) | |
| 11 | | | | RBC (5.31%) | |
| 12 | | | | DBP (5.06%) | |
| 13 | | | | Sports activities in school per week (4.57%) | |
| 14 | | | | SBP (4.03%) | |
Abbreviations: LR, logistic regression; SVM, support vector machine; RF, random forest; XGBoost, extreme gradient boosting; CatBoost, gradient boosting with categorical features support; HGB, haemoglobin; WBC, leukocytes; RBC, erythrocytes; PLT, platelets; BMI, body mass index; SBP, systolic blood pressure; DBP, diastolic blood pressure.
Figure 5. ROC curves for the test set.
Evaluation Metrics for the Different Models
| Model | Accuracy | Sensitivity | Specificity | Precision | F1 Score |
|---|---|---|---|---|---|
| RF | 0.81 | 0.88 | 0.61 | 0.87 | 0.87 |
| LR | 0.80 | 0.89 | 0.55 | 0.85 | 0.87 |
| SVM | 0.74 | 0.77 | 0.67 | 0.87 | 0.82 |
| XGBoost | 0.78 | 0.78 | 0.79 | 0.91 | 0.84 |
| CatBoost | 0.77 | 0.79 | 0.69 | 0.88 | 0.84 |
Abbreviations: LR, logistic regression; SVM, support vector machine; RF, random forest; XGBoost, extreme gradient boosting; CatBoost, gradient boosting with categorical features support.
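All five metrics in the table derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions. The counts in the usage example are hypothetical, chosen only to be on the scale of a 129-child test set with 96 IR cases. The record does not report confusion matrices, so these numbers are not the authors' results.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard binary metrics from confusion-matrix counts.

    tp: IR children predicted as IR      fn: IR children predicted as non-IR
    tn: non-IR children predicted non-IR fp: non-IR children predicted as IR
    """
    sensitivity = tp / (tp + fn)  # recall, true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)    # positive predictive value
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical counts (an assumption, not reported data): 96 positives, 33 negatives.
    metrics = classification_metrics(tp=84, fn=12, tn=20, fp=13)
    for name, value in metrics.items():
        print(f"{name}: {value:.2f}")
```

Because about three quarters of the external test set has IR, accuracy and F1 are driven mostly by sensitivity, so specificity is the more informative column when comparing how well the models rule out IR.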