Sunmin Park, Chaeyeon Kim, Xuangao Wu.
Abstract
BACKGROUND: Insulin resistance is a common etiology of metabolic syndrome, but receiver operating characteristic (ROC) curve analysis shows only a weak association between the two in Koreans. Using a machine learning (ML) approach, we aimed to generate the best model for predicting insulin resistance in Korean adults aged > 40 of the Ansan/Ansung cohort.
Keywords: HOMA-IR; XGBoost; insulin resistance; liver function; machine learning; obesity
Year: 2022 PMID: 35054379 PMCID: PMC8774355 DOI: 10.3390/diagnostics12010212
Source DB: PubMed Journal: Diagnostics (Basel) ISSN: 2075-4418
Figure 1. Analysis process for generating a prediction model in the participants. (A) A total of 8842 adults participated, and 99 features were selected manually from 1411 in the Ansan/Ansung cohort to build insulin resistance prediction models using seven machine learning (ML) approaches. Missing data were filled with the mean values for continuous variables and the mode values for categorical variables. Data were normalized using the z-score. HOMA-IR was used as an indirect insulin resistance index, and 2.31 was used as the cutoff for participants of both genders. (B) The Ansan/Ansung cohort participants were randomly divided into a training set (80%) and a test set (20%). The best model was selected by a random grid search with 1000 repetitions across seven ML algorithms: linear regression, support vector machines (SVM), XGBoost (XGB), decision tree, random forest, K-nearest neighbor (KNN), and artificial neural network (ANN). The best prediction model was selected using the AUC of the ROC. The accuracy and k-fold cross-validation of the prediction models were assessed in the test set.
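The preprocessing steps named in the caption (mean/mode imputation, z-score normalization, labeling by the HOMA-IR cutoff of 2.31, and an 80/20 random split) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the function names, the toy values, and the random seed are all hypothetical, and the caption does not say whether normalization was fitted before or after the split.

```python
import random
import statistics

HOMA_IR_CUTOFF = 2.31  # cutoff stated in the Figure 1 caption

def impute(values, categorical=False):
    """Fill None with the mode (categorical) or the mean (continuous)."""
    observed = [v for v in values if v is not None]
    fill = statistics.mode(observed) if categorical else statistics.fmean(observed)
    return [fill if v is None else v for v in values]

def zscore(values):
    """Standardize to mean 0 and population standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and split them into (train, test)."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test

# Hypothetical toy data, not taken from the cohort.
waist = [82.0, None, 90.5, 78.0, 88.0]   # continuous feature with a gap
sex = ["M", "F", None, "F", "F"]         # categorical feature with a gap
homa_ir = [1.2, 3.4, 2.31, 0.9, 2.8]

waist_z = zscore(impute(waist))
sex_filled = impute(sex, categorical=True)
labels = [int(h > HOMA_IR_CUTOFF) for h in homa_ir]  # 1 = High-IR

rows = list(zip(waist_z, sex_filled, labels))
train, test = train_test_split(rows)
```

In this sketch the z-score is computed on all rows before splitting; fitting the mean and standard deviation on the training rows only is a common alternative that avoids leaking test-set statistics.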
Characteristics of the participants in the Ansan/Ansung cohort.
| | Men, Low-IR | Men, High-IR | Women, Low-IR | Women, High-IR |
|---|---|---|---|---|
| Age (year) | 52.0 ± 0.15 b | 50.6 ± 0.34 c | 52.4 ± 0.14 a | 53.7 ± 0.31 a*** |
| HOMA-IR | 1.22 ± 0.03 c | 3.43 ± 0.08 a | 1.37 ± 0.03 b | 3.41 ± 0.07 a**### |
| BMI (kg/m2) | 24.0 ± 0.06 d | 26.2 ± 0.13 b | 24.5 ± 0.05 c | 26.7 ± 0.12 a***### |
| Waist circumference (cm) | 82.2 ± 0.21 c | 88.4 ± 0.54 a | 80.3 ± 0.22 d | 86.5 ± 0.46 b***### |
| Skeletal muscle mass index (%) | 35.4 ± 0.04 a | 33.9 ± 0.10 b | 30.8 ± 0.04 c | 29.4 ± 0.09 d***### |
| Fat mass (%) | 21.3 ± 0.09 d | 24.8 ± 0.21 c | 31.3 ± 0.09 b | 34.4 ± 0.20 a***### |
| MetS (n, %) | 558 (15.9) | 256 (37.8) *** | 813 (21.1) | 350 (43.3) *** |
| Serum glucose (mg/dL) | 86.0 ± 0.34 c | 112.4 ± 0.77 a | 81.7 ± 0.33 d | 101.2 ± 0.70 b***### |
| HbA1c (%) | 5.71 ± 0.15 c | 6.44 ± 0.04 a | 5.69 ± 0.15 c | 6.30 ± 0.03 b**### |
| Serum total cholesterol (mg/dL) | 190 ± 0.61 b | 199 ± 1.38 a | 190 ± 0.58 b | 199 ± 1.26 a### |
| Serum HDL (mg/dL) | 44.1 ± 0.17 b | 41.0 ± 0.39 d | 46.1 ± 0.16 a | 43.0 ± 0.35 c***### |
| Serum LDL (mg/dL) | 105 ± 0.83 c | 103 ± 2.2 c | 113 ± 0.85 b | 118 ± 1.83 a*** |
| Serum Triglyceride (mg/dL) | 169 ± 1.74 c | 227 ± 3.96 a | 142 ± 1.66 d | 183 ± 3.62 b***### |
| Serum CRP (mg/dL) | 0.24 ± 0.01 | 0.29 ± 0.02 | 0.21 ± 0.01 | 0.26 ± 0.02 |
| Pulse | 62.8 ± 0.13 | 64.8 ± 0.29 | 64.0 ± 0.12 | 67.3 ± 0.27 ***### |
| SBP (mmHg) | 119 ± 0.46 b | 125 ± 1.18 a | 119 ± 0.47 b | 127 ± 1.01 a### |
| DBP (mmHg) | 76.5 ± 0.27 | 80.7 ± 0.70 | 74.9 ± 0.28 | 80.0 ± 0.60 **### |
| Serum AST (U/L) | 32.4 ± 0.31 b | 34.5 ± 0.70 a | 27.0 ± 0.29 c | 28.0 ± 0.64 c***## |
| Serum ALT (U/L) | 31.8 ± 0.45 b | 43.6 ± 1.02 a | 22.4 ± 0.43 d | 27.8 ± 0.94 c***### |
Low-IR, low insulin resistance (≤2.31 HOMA-IR); High-IR, high insulin resistance (>2.31 HOMA-IR). HOMA-IR, homeostasis model assessment of insulin resistance; BMI, body mass index; MetS, metabolic syndrome; HbA1c, hemoglobin A1c; HDL, high-density lipoprotein; LDL, low-density lipoprotein; CRP, high-sensitivity C-reactive protein; SBP, systolic blood pressure; DBP, diastolic blood pressure; AST, aspartate aminotransferase; ALT, alanine aminotransferase. Skeletal muscle mass index was calculated as skeletal muscle mass divided by body weight, multiplied by 100. * Significantly different by gender at p < 0.05, ** at p < 0.01, *** at p < 0.001. ## Significantly different by HOMA-IR at p < 0.01, ### at p < 0.001. a,b,c,d Different superscript letters of the means indicate significant differences among the groups by Tukey’s test at p < 0.05.
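The grouping and the derived index defined in this footnote can be written out as a short sketch. The record does not state how HOMA-IR was computed, so the widely used formula (fasting insulin in µU/mL × fasting glucose in mg/dL ÷ 405) is an assumption here; the function names and the example inputs are hypothetical.

```python
def homa_ir(fasting_insulin_uU_ml, fasting_glucose_mg_dl):
    """HOMA-IR by the commonly used formula (assumed, not stated in the source)."""
    return fasting_insulin_uU_ml * fasting_glucose_mg_dl / 405.0

def ir_group(homa_ir_value, cutoff=2.31):
    """Low-IR is <= cutoff and High-IR is > cutoff, as defined in the footnote."""
    return "High-IR" if homa_ir_value > cutoff else "Low-IR"

def skeletal_muscle_mass_index(skeletal_muscle_kg, body_weight_kg):
    """Skeletal muscle mass as a percentage of body weight."""
    return skeletal_muscle_kg / body_weight_kg * 100.0

# Hypothetical example values, not cohort data.
example_homa = homa_ir(10.0, 81.0)          # 10 * 81 / 405 = 2.0
example_group = ir_group(example_homa)      # "Low-IR"
example_smi = skeletal_muscle_mass_index(24.5, 70.0)
```

A value exactly equal to 2.31 falls in the Low-IR group, because the footnote defines Low-IR with "≤".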
Nutrient intake and lifestyle-related variables.
| | Men, Low-IR | Men, High-IR | Women, Low-IR | Women, High-IR |
|---|---|---|---|---|
| Energy (EER%) | 96.8 ± 0.66 b | 97.1 ± 1.49 b | 106 ± 0.62 a | 109 ± 1.38 a*** |
| CHO (En%) | 69.7 ± 0.12 b | 68.8 ± 0.27 b | 71.7 ± 0.11 a | 72.4 ± 0.25 a***++ |
| Fat (En%) | 15.3 ± 0.09 a | 15.9 ± 0.21 a | 13.6 ± 0.09 b | 13.0 ± 0.19 c***++ |
| SFA (En%) | 3.76 ± 0.04 a | 3.96 ± 0.09 a | 3.15 ± 0.04 b | 2.93 ± 0.08 b***++ |
| MUFA (En%) | 4.88 ± 0.04 a | 5.04 ± 0.09 a | 4.00 ± 0.04 b | 3.76 ± 0.09 b***++ |
| PUFA (En%) | 2.29 ± 0.02 a | 2.37 ± 0.04 a | 1.94 ± 0.02 b | 1.90 ± 0.03 b***+ |
| Protein (En%) | 13.7 ± 0.04 b | 14.1 ± 0.09 a | 13.5 ± 0.04 c | 13.4 ± 0.09 c***++ |
| Dietary fiber (g) | 6.92 ± 0.08 | 7.11 ± 0.16 | 7.16 ± 0.07 | 7.31 ± 0.15 |
| Vitamin C (mg) | 121 ± 2.17 b | 126 ± 4.57 b | 136 ± 2.09 a | 141 ± 4.30 a*** |
| Calcium (mg) | 486 ± 5.86 | 481 ± 12.3 | 482 ± 5.63 | 477 ± 11.6 |
| Sodium (g) | 3.37 ± 0.04 a | 3.39 ± 0.07 a | 3.02 ± 0.03 b | 3.03 ± 0.07 b*** |
| Alcohol intake (g/day) | 19.1 ± 0.35 a | 19.5 ± 0.80 a | 1.29 ± 0.33 b | 1.48 ± 0.73 b*** |
| Smoking | ||||
| Former smoker | 166 (4.8) | 33 (4.9) | 46 (1.22) | 13 (1.65) |
| Smoker | 1567 (44.9) | 298 (44.1) | 86 (2.28) | 19 (2.41) |
| Regular exercise (yes, %) | 1043 (73.9) | 144 (68.3) | 933 (71.9) | 213 (77.2) |
EER, estimated energy requirement; CHO, carbohydrate; En%, energy percent; SFA, saturated fatty acids; MUFA, monounsaturated fatty acids; PUFA, polyunsaturated fatty acids. *** Significantly different by gender at p < 0.001. + Significant interaction between gender and HOMA-IR at p < 0.05 and ++ at p < 0.01. a,b,c Different superscript letters of the means indicate significant differences among the groups by Tukey’s test at p < 0.05.
Figure 2. Receiver operating characteristic (ROC) curve with insulin resistance and metabolic syndrome components for the metabolic syndrome risk.
The area under the curve (AUC) of the receiver operating characteristic (ROC) curve, accuracy, and k-fold of prediction models generated from machine-learning algorithms in the Ansan/Ansung cohort.
| 99 features | Logistic | XGBoost | Decision tree | KNN | SVM | Random forest | ANN |
|---|---|---|---|---|---|---|---|
| AUC of ROC | 0.866 | 0.866 | 0.647 | 0.662 | 0.597 | 0.836 | 0.816 |
| Accuracy | 0.867 | 0.868 | 0.793 | 0.826 | 0.859 | 0.841 | |
| k-fold | 0.858 | 0.859 | 0.786 | 0.821 | 0.851 | 0.833 | |
| Top 15 features | |||||||
| AUC of ROC | 0.849 | 0.853 (0.853–0.854) | 0.639 (0.638–0.640) | 0.694 (0.693–0.695) | 0.574 | 0.831 | 0.822 |
| Accuracy | 0.868 | 0.877 | 0.798 | 0.837 | 0.855 | 0.860 | |
| k-fold | 0.856 | 0.861 | 0.777 | 0.827 | 0.850 | 0.856 | |
| Top 9 features | |||||||
| AUC of ROC | 0.849 | 0.853 (0.852–0.853) | 0.636 (0.635–0.636) | 0.691 (0.690–0.692) | 0.561 (0.560–0.561) | 0.836 | 0.862 |
| Accuracy | 0.867 | 0.868 | 0.791 | 0.834 | 0.853 | 0.862 | |
| k-fold | 0.856 | 0.861 | 0.779 | 0.828 | 0.848 | 0.857 | |
Prediction models were generated from a training set comprising 80% of the Ansan/Ansung cohort, and the remaining 20% was used as the test set. KNN, K-nearest neighbor; SVM, support vector machine; ANN, artificial neural network. The top 15-feature prediction model generated from XGBoost included serum glucose, waist circumference, blood HbA1c, serum total bilirubin, season of enrollment in the study, body fat, pulse, hip circumference, serum HDL, serum ALT, serum γ-GTP, gender, serum creatinine, residence area, and PRS for insulin resistance. The top 9-feature prediction model generated from XGBoost contained serum glucose, waist circumference, body fat, serum ALT, serum total bilirubin, pulse, serum HDL, and gender.
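The three metrics reported in the table can be illustrated with a small standard-library sketch. It assumes that "k-fold" denotes the mean accuracy over k cross-validation folds; the value of k, the 0.5 decision threshold, and the toy labels and scores below are hypothetical and not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random
    negative, counting ties as one half (rank-based definition)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(labels, scores, threshold=0.5):
    """Share of samples whose thresholded score matches the label."""
    preds = [int(s >= threshold) for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def k_fold_indices(n, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        val = list(range(start, stop))
        train = [i for i in range(n) if i < start or i >= stop]
        yield train, val
        start = stop

# Hypothetical labels (1 = High-IR) and predicted probabilities.
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

auc = roc_auc(y_true, y_score)     # 8 of 9 positive/negative pairs ranked correctly
acc = accuracy(y_true, y_score)    # 5 of 6 samples classified correctly
folds = [val for _, val in k_fold_indices(len(y_true), 3)]
```

AUC depends only on how the scores rank the samples, whereas accuracy depends on the chosen threshold, so a model can score well on one metric and poorly on the other.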
Figure 3. The relative importance of the top 15 features for predicting insulin resistance (IR), as determined by the XGBoost and random forest algorithms. (a) IR prediction model by the XGBoost algorithm. (b) IR prediction model by the random forest algorithm. ALT, alanine aminotransferase; HbA1c, hemoglobin A1c; γ-GTP, γ-glutamyl transpeptidase; HDL, high-density lipoprotein; CRP, high-sensitivity C-reactive protein; PRS, polygenic risk score.
Figure 4. Positive and negative impact of the top 15 features on the prediction of insulin resistance (IR), explained using SHAP values. (a) Impact of each feature on IR in the XGBoost prediction model, by SHAP values. (b) Impact of each feature on IR in the random forest prediction model, by SHAP values. ALT, alanine aminotransferase; HbA1c, hemoglobin A1c; γ-GTP, γ-glutamyl transpeptidase; HDL, high-density lipoprotein; CRP, high-sensitivity C-reactive protein; PRS, polygenic risk score.
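To make the idea behind SHAP values concrete, the sketch below computes exact Shapley values for one sample by enumerating every feature subset, with absent features replaced by a baseline. It is a brute-force illustration only: the toy scoring function, its coefficients, and the input values are hypothetical, and this is not the procedure or library the authors ran on their XGBoost and random forest models.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample.
    Features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Hypothetical toy score on z-scored glucose, waist, and HDL.
def toy_model(z):
    glucose, waist, hdl = z
    return 0.8 * glucose + 0.5 * waist - 0.3 * hdl + 0.2 * glucose * waist

x = [1.0, 2.0, -1.0]          # sample to explain
baseline = [0.0, 0.0, 0.0]    # reference point (cohort mean after z-scoring)
phi = shapley_values(toy_model, x, baseline)
```

The contributions sum to the difference between the score at `x` and the score at the baseline, and the sign of each contribution indicates whether that feature value pushes the score up or down. Enumeration costs grow exponentially with the number of features, which is why practical tools rely on faster model-specific algorithms.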
Figure 5. The relative importance of the top nine features for the insulin resistance (IR) prediction, as determined by the XGBoost and random forest algorithms. (a) IR prediction model with top 9 features using the XGBoost algorithm. (b) IR prediction model with top 9 features using the random forest algorithm. (c) Explanation of each feature impact on the IR prediction model by SHAP values using the XGBoost algorithm.
Figure 6. Summary of the main findings to explore the prediction model for insulin resistance. HOMA-IR, homeostasis model assessment of insulin resistance.