Junho Kim, Sujeong Mun, Siwoo Lee, Kyoungsik Jeong, Younghwa Baek.
Abstract
BACKGROUND: Metabolic syndrome (MetS) is a complex condition that appears as a cluster of metabolic abnormalities and is closely associated with the prevalence of various diseases. Early prediction of the risk of MetS in the middle-aged population provides greater benefits for cardiovascular disease-related health outcomes. This study aimed to apply the latest machine learning techniques to find the optimal MetS prediction model for the middle-aged Korean population.
Keywords: Data sampling method; Feature importance; Machine learning; Metabolic syndrome; SMOTE
Year: 2022 PMID: 35387629 PMCID: PMC8985311 DOI: 10.1186/s12889-022-13131-x
Source DB: PubMed Journal: BMC Public Health ISSN: 1471-2458 Impact factor: 3.295
General participant characteristics
| Variable | Total | Normal group | MetS group | P-value† |
|---|---|---|---|---|
| Sex | 1991 (100) | 1317 (66.1) | 674 (33.9) | |
| Male | 608 (30.5) | 297 (48.8) | 311 (51.2) | < 0.001 |
| Female | 1383 (69.5) | 1020 (73.8) | 363 (26.2) | |
| Age (years) | 43.81 ± 6.86 | 43.12 ± 6.83 | 45.17 ± 6.73 | < 0.001 |
| 30–44 | 1006 (50.5) | 728 (72.4) | 278 (27.6) | 0.373 |
| 45–55 | 985 (49.5) | 589 (59.8) | 396 (40.2) | 0.003 |
| BMI (kg/m2) | 24.34 ± 3.62 | 22.96 ± 2.79 | 27.05 ± 3.53 | < 0.001 |
| WHR | 0.86 ± 0.06 | 0.84 ± 0.05 | 0.90 ± 0.05 | < 0.001 |
| Alcohol status | ||||
| non-drinker | 716 (100) | 500 (69.8) | 216 (30.2) | 0.007 |
| former drinker | 76 (100) | 52 (68.4) | 24 (31.6) | |
| current drinker | 1199 (100) | 765 (63.8) | 434 (36.2) | |
| Smoking status | ||||
| non-smoker | 1585 (100) | 1122 (70.8) | 463 (29.2) | < 0.001 |
| former smoker | 162 (100) | 75 (46.3) | 87 (53.7) | |
| current smoker | 244 (100) | 120 (49.2) | 124 (50.8) | |
| KM type | ||||
| Taeumin | 1012 (100) | 492 (48.6) | 520 (51.4) | < 0.001 |
| Soeumin | 397 (100) | 351 (88.4) | 46 (11.6) | |
| Soyangin | 582 (100) | 474 (81.4) | 108 (18.6) | |
| PA (METs) | 2538 ± 3798.53 | 2606.85 ± 39,258.16 | 2405.60 ± 3527.73 | 0.264 |
| Sleep time (h) | 6.71 ± 1.06 | 6.74 ± 1.04 | 6.66 ± 1.09 | 0.139 |
| Sleep quality | 4.69 ± 2.86 | 4.69 ± 2.81 | 4.68 ± 2.97 | 0.959 |
| Eating index | 51.42 ± 10.50 | 51.72 ± 10.46 | 50.82 ± 10.53 | 0.069 |
| Stress | 17.65 ± 7.07 | 17.60 ± 7.26 | 17.75 ± 6.69 | 0.657 |
| AST (U/L) | 24.89 ± 12.14 | 23.53 ± 10.06 | 27.55 ± 15.07 | < 0.001 |
| ALT (U/L) | 23.97 ± 19.81 | 20.19 ± 14.78 | 31.35 ± 25.48 | < 0.001 |
| ALP (U/L) | 63.70 ± 18.68 | 60.66 ± 17.68 | 69.62 ± 19.15 | < 0.001 |
| hsCRP (mg/L) | 1.25 ± 2.76 | 1.00 ± 2.69 | 1.73 ± 2.85 | < 0.001 |
| HbA1c (%) | 5.48 ± 0.60 | 5.35 ± 0.31 | 5.75 ± 0.87 | < 0.001 |
| Insulin (mIU/L) | 6.10 ± 4.34 | 4.90 ± 2.90 | 8.45 ± 5.56 | < 0.001 |
| GGT (U/L) | 30.18 ± 38.20 | 22.64 ± 27.18 | 44.92 ± 50.40 | < 0.001 |
| HOMA-IR | 1.30 ± 1.14 | 0.99 ± 0.64 | 1.94 ± 1.57 | < 0.001 |
| MetS components | ||||
| Waist circumference (cm) | 82.72 ± 9.67 | 78.92 ± 7.71 | 90.13 ± 8.76 | < 0.001 |
| Triglyceride (mg/dL) | 132.31 ± 124.14 | 95.05 ± 50.38 | 205.12 ± 180.42 | < 0.001 |
| HDL-C (mg/dL) | 56.87 ± 13.89 | 61.54 ± 12.97 | 47.74 ± 10.72 | < 0.001 |
| Systolic BP (mmHg) | 116.97 ± 15.34 | 112.10 ± 12.47 | 126.49 ± 15.95 | < 0.001 |
| Diastolic BP (mmHg) | 73.52 ± 12.08 | 69.75 ± 9.95 | 80.88 ± 12.48 | < 0.001 |
| Glucose (mg/dL) | 84.16 ± 16.20 | 80.59 ± 8.00 | 91.14 ± 24.01 | < 0.001 |
MetS Metabolic syndrome, BMI Body mass index, WHR Waist-to-hip ratio, KM type Korean medicine type, PA Physical activity, METs Metabolic equivalent of task, AST Aspartate transaminase, ALT Alanine transaminase, ALP Alkaline phosphatase, hsCRP High sensitivity C-reactive protein, HbA1c Hemoglobin A1c, GGT Gamma-glutamyl transferase, HOMA-IR Homeostatic model assessment for insulin resistance, HDL-C High-density lipoprotein-cholesterol, BP Blood pressure
Values are presented as n (%) or mean ± standard deviation
†P-values for continuous variables are based on independent t-tests; P-values for categorical variables are based on Fisher’s exact test or the chi-square test between the normal and MetS groups
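To illustrate how a categorical P-value of this kind can be obtained, the sketch below runs a Pearson chi-square test on the sex-by-group counts from the table (male 297/311, female 1020/363). It is a minimal pure-Python illustration, not the authors' analysis code: the function name `chi_square_2x2` is hypothetical, no continuity correction is applied, and the source does not say which of the two named tests was used for which variable.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the 2x2 table
    [[a, b], [c, d]]. Returns (statistic, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    # Expected counts under independence: row total * column total / n.
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom the chi-square survival function
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Counts from the table: rows = male, female; columns = normal, MetS.
stat, p = chi_square_2x2(297, 311, 1020, 363)
print(f"chi2 = {stat:.1f}, p = {p:.3g}")
```

For these counts the resulting P-value falls far below 0.001, which agrees with the value reported for sex in the table.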
The models’ performance with 95% confidence interval according to the number of features used
| Model | F1-score (Original) | F1-score (SMOTE) | Accuracy (Original) | Accuracy (SMOTE) | Sensitivity (Original) | Sensitivity (SMOTE) | Specificity (Original) | Specificity (SMOTE) | AUC (Original) | AUC (SMOTE) |
|---|---|---|---|---|---|---|---|---|---|---|
| Decision Tree | 0.711 (0.66–0.76) | 0.758 (0.71–0.80) | 0.711 (0.66–0.76) | 0.758 (0.71–0.80) | 0.573 (0.52–0.63) | 0.758 (0.71–0.80) | 0.782 (0.74–0.83) | 0.758 (0.71–0.80) | 0.677 (0.63–0.73) | 0.758 (0.71–0.80) |
| Gaussian NB | 0.789 (0.75–0.83) | 0.780 (0.74–0.82) | 0.790 (0.75–0.83) | 0.780 (0.74–0.82) | 0.684 (0.63–0.73) | 0.790 (0.75–0.83) | 0.844 (0.80–0.88) | 0.769 (0.72–0.81) | 0.764 (0.72–0.81) | 0.780 (0.74–0.82) |
| KNN | 0.774 (0.73–0.82) | 0.783 (0.74–0.83) | 0.777 (0.73–0.82) | 0.783 (0.74–0.83) | 0.619 (0.57–0.67) | 0.826 (0.79–0.87) | 0.859 (0.82–0.90) | 0.740 (0.69–0.79) | 0.739 (0.69–0.79) | 0.783 (0.74–0.83) |
| XGBoost | 0.771 (0.73–0.82) | 0.802 (0.76–0.84) | 0.773 (0.73–0.82) | 0.802 (0.76–0.85) | 0.626 (0.57–0.68) | 0.812 (0.77–0.85) | 0.848 (0.81–0.89) | 0.792 (0.75–0.84) | 0.737 (0.69–0.78) | 0.802 (0.76–0.85) |
| RF | 0.772 (0.73–0.82) | 0.813 (0.77–0.86) | 0.774 (0.73–0.82) | 0.814 (0.77–0.86) | 0.628 (0.58–0.68) | 0.832 (0.79–0.87) | 0.850 (0.81–0.89) | 0.795 (0.75–0.84) | 0.739 (0.69–0.79) | 0.814 (0.77–0.86) |
| Logistic R | 0.777 (0.73–0.82) | 0.783 (0.74–0.83) | 0.787 (0.74–0.83) | 0.784 (0.74–0.83) | 0.558 (0.50–0.61) | 0.799 (0.76–0.84) | 0.904 (0.87–0.94) | 0.768 (0.72–0.81) | 0.731 (0.68–0.78) | 0.784 (0.74–0.83) |
| SVM | 0.787 (0.74–0.83) | 0.785 (0.74–0.83) | 0.795 (0.75–0.84) | 0.785 (0.74–0.83) | 0.585 (0.53–0.64) | 0.809 (0.77–0.85) | 0.903 (0.87–0.93) | 0.762 (0.72–0.81) | 0.744 (0.70–0.79) | 0.786 (0.74–0.83) |
| MLP | 0.785 (0.74–0.83) | 0.770 (0.72–0.82) | 0.792 (0.75–0.84) | 0.772 (0.73–0.82) | 0.607 (0.55–0.66) | 0.735 (0.69–0.78) | 0.887 (0.85–0.92) | 0.809 (0.77–0.85) | 0.747 (0.70–0.79) | 0.772 (0.73–0.82) |
| 1D-CNN | 0.779 (0.73–0.82) | 0.783 (0.74–0.83) | 0.782 (0.74–0.83) | 0.784 (0.74–0.83) | 0.657 (0.61–0.71) | 0.784 (0.74–0.83) | 0.846 (0.81–0.88) | 0.784 (0.74–0.83) | 0.752 (0.71–0.80) | 0.784 (0.74–0.83) |
| Decision Tree | 0.722 (0.67–0.77) | 0.765 (0.72–0.81) | 0.724 (0.68–0.77) | 0.765 (0.72–0.81) | 0.570 (0.52–0.62) | 0.776 (0.73–0.82) | 0.803 (0.76–0.85) | 0.755 (0.71–0.80) | 0.686 (0.64–0.74) | 0.765 (0.72–0.81) |
| Gaussian NB | 0.775 (0.73–0.82) | 0.766 (0.72–0.81) | 0.774 (0.73–0.82) | 0.766 (0.72–0.81) | 0.685 (0.64–0.74) | 0.773 (0.73–0.82) | 0.820 (0.78–0.86) | 0.759 (0.71–0.81) | 0.753 (0.71–0.80) | 0.766 (0.72–0.81) |
| KNN | 0.738 (0.69–0.78) | 0.780 (0.73–0.82) | 0.743 (0.70–0.79) | 0.782 (0.74–0.83) | 0.551 (0.50–0.60) | 0.879 (0.84–0.91) | 0.842 (0.80–0.88) | 0.685 (0.63–0.73) | 0.696 (0.65–0.75) | 0.782 (0.74–0.83) |
| XGBoost | 0.778 (0.73–0.82) | 0.834 (0.79–0.87) | 0.782 (0.74–0.83) | 0.834 (0.79–0.87) | 0.622 (0.57–0.67) | 0.837 (0.80–0.88) | 0.863 (0.83–0.90) | 0.832 (0.79–0.87) | 0.743 (0.70–0.79) | 0.834 (0.79–0.87) |
| RF | 0.791 (0.75–0.83) | 0.838 (0.80–0.88) | 0.795 (0.75–0.84) | 0.838 (0.80–0.88) | 0.635 (0.58–0.69) | 0.850 (0.81–0.89) | 0.876 (0.84–0.91) | 0.826 (0.79–0.87) | 0.756 (0.71–0.80) | 0.838 (0.80–0.88) |
| Logistic R | 0.785 (0.74–0.83) | 0.779 (0.73–0.82) | 0.792 (0.75–0.84) | 0.779 (0.73–0.82) | 0.595 (0.54–0.65) | 0.791 (0.75–0.83) | 0.893 (0.86–0.93) | 0.767 (0.72–0.81) | 0.744 (0.70–0.79) | 0.779 (0.73–0.82) |
| SVM | 0.790 (0.75–0.83) | 0.783 (0.74–0.83) | 0.797 (0.75–0.84) | 0.783 (0.74–0.83) | 0.605 (0.55–0.66) | 0.796 (0.75–0.84) | 0.894 (0.86–0.93) | 0.770 (0.72–0.82) | 0.750 (0.70–0.80) | 0.783 (0.74–0.83) |
| MLP | 0.772 (0.73–0.82) | 0.797 (0.75–0.84) | 0.778 (0.73–0.82) | 0.798 (0.75–0.84) | 0.619 (0.57–0.67) | 0.790 (0.75–0.83) | 0.859 (0.82–0.90) | 0.806 (0.76–0.85) | 0.739 (0.69–0.79) | 0.798 (0.75–0.84) |
| 1D-CNN | 0.771 (0.73–0.82) | 0.770 (0.72–0.82) | 0.776 (0.73–0.82) | 0.774 (0.73–0.82) | 0.635 (0.58–0.69) | 0.861 (0.82–0.90) | 0.848 (0.81–0.89) | 0.688 (0.64–0.74) | 0.742 (0.69–0.79) | 0.775 (0.73–0.82) |
| Decision Tree | 0.743 (0.70–0.79) | 0.777 (0.73–0.82) | 0.743 (0.70–0.79) | 0.778 (0.73–0.82) | 0.631 (0.58–0.68) | 0.797 (0.75–0.84) | 0.801 (0.76–0.84) | 0.758 (0.71–0.80) | 0.716 (0.67–0.76) | 0.778 (0.73–0.82) |
| Gaussian NB | 0.786 (0.74–0.83) | 0.759 (0.71–0.81) | 0.795 (0.75–0.84) | 0.762 (0.72–0.81) | 0.577 (0.52–0.63) | 0.646 (0.59–0.70) | 0.906 (0.87–0.94) | 0.878 (0.84–0.91) | 0.741 (0.69–0.79) | 0.762 (0.72–0.81) |
| KNN | 0.748 (0.70–0.79) | 0.787 (0.74–0.83) | 0.756 (0.71–0.80) | 0.788 (0.74–0.83) | 0.540 (0.49–0.59) | 0.871 (0.83–0.91) | 0.866 (0.83–0.90) | 0.705 (0.66–0.75) | 0.703 (0.65–0.75) | 0.788 (0.74–0.83) |
| XGBoost | 0.801 (0.76–0.84) | 0.851 (0.81–0.89) | 0.804 (0.76–0.85) | 0.851 (0.81–0.89) | 0.662 (0.61–0.71) | 0.859 (0.82–0.90) | 0.877 (0.84–0.91) | 0.843 (0.80–0.88) | 0.769 (0.72–0.81) | 0.851 (0.81–0.89) |
| RF | 0.815 (0.77–0.86) | 0.843 (0.80–0.88) | 0.818 (0.78–0.86) | 0.844 (0.80–0.88) | 0.690 (0.64–0.74) | 0.857 (0.82–0.89) | 0.883 (0.85–0.92) | 0.831 (0.79–0.87) | 0.786 (0.74–0.83) | 0.844 (0.80–0.88) |
| Logistic R | 0.812 (0.77–0.85) | 0.804 (0.76–0.85) | 0.818 (0.78–0.86) | 0.804 (0.76–0.85) | 0.638 (0.59–0.69) | 0.812 (0.77–0.85) | 0.910 (0.88–0.94) | 0.796 (0.75–0.84) | 0.774 (0.73–0.82) | 0.804 (0.76–0.85) |
| SVM | 0.811 (0.77–0.85) | 0.810 (0.77–0.85) | 0.817 (0.78–0.86) | 0.810 (0.77–0.85) | 0.636 (0.58–0.69) | 0.831 (0.79–0.87) | 0.909 (0.88–0.94) | 0.790 (0.75–0.83) | 0.773 (0.73–0.82) | 0.810 (0.77–0.85) |
| MLP | 0.807 (0.76–0.85) | 0.811 (0.77–0.85) | 0.812 (0.77–0.85) | 0.812 (0.77–0.85) | 0.638 (0.59–0.69) | 0.836 (0.80–0.88) | 0.901 (0.87–0.93) | 0.787 (0.74–0.83) | 0.770 (0.72–0.81) | 0.812 (0.77–0.85) |
| 1D-CNN | 0.799 (0.76–0.84) | 0.814 (0.77–0.86) | 0.803 (0.76–0.85) | 0.815 (0.77–0.86) | 0.662 (0.61–0.71) | 0.807 (0.76–0.85) | 0.875 (0.84–0.91) | 0.822 (0.78–0.86) | 0.768 (0.72–0.81) | 0.815 (0.77–0.86) |
Presented are the results before (Original) and after (SMOTE) applying the synthetic minority oversampling technique
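For readers unfamiliar with the technique, the sketch below shows the core idea of SMOTE: each synthetic minority-class sample is placed at a random position on the line segment between an existing minority sample and one of its nearest minority neighbours. This is a simplified pure-Python illustration with made-up data. The function name `smote`, the parameters `k` and `seed`, and the toy points are assumptions for the example and do not reflect the authors' implementation or settings. It needs at least two minority samples.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(p, base))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Toy, made-up minority-class points with two features each.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(minority, n_new=6, k=2)
print(len(new_points), "synthetic samples generated")
```

Because every synthetic point is a convex combination of two real minority samples, it always lies within the range already spanned by the minority class. Oversampling of this kind is normally applied to the training split only, so that the test data contain no synthetic samples.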
AUC Area under the receiver operating characteristic curve, Gaussian NB Gaussian naïve bayes classifier, KNN K-nearest neighbor, XGBoost Extreme gradient boosting, Logistic R Logistic regression, RF Random forest, SVM Support vector machine, MLP Multilayer perceptron, 1D-CNN 1-dimensional convolutional neural network
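As a reminder of how the tabulated metrics relate to one another, the sketch below derives them from the four cells of a binary confusion matrix. The counts are hypothetical and the function name `binary_metrics` is illustrative. The F1-score shown is the positive-class F1; the source does not state which averaging it used. The last lines check one row of the table: the reported AUC values appear numerically close to the mean of sensitivity and specificity, which is what the area under the ROC curve reduces to when a classifier is scored on hard labels. That is an observation about the numbers, not something the source states.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall on the positive class
    specificity = tn / (tn + fp)          # recall on the negative class
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
        "balanced_accuracy": balanced_accuracy,
    }

# Hypothetical counts, not taken from the paper.
m = binary_metrics(tp=80, fp=30, tn=170, fn=40)
for name, value in m.items():
    print(f"{name}: {value:.3f}")

# First Decision Tree row, Original columns: sensitivity 0.573,
# specificity 0.782, reported AUC 0.677.
mean_sens_spec = (0.573 + 0.782) / 2
print(f"mean of sensitivity and specificity: {mean_sens_spec:.4f}")
```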
Fig. 1 Feature importance in the MetS prediction model. (a) Feature importance when using 12 features; (b) feature importance when using 20 features. Variable importance results when building the model are presented. BMI, body mass index; WHR, waist-to-hip ratio; PA, physical activity; KM type, Korean medicine type; HOMA-IR, homeostatic model assessment for insulin resistance; GGT, gamma-glutamyl transferase; HbA1c, hemoglobin A1c; hsCRP, high sensitivity C-reactive protein; ALT, alanine transaminase; ALP, alkaline phosphatase; AST, aspartate transaminase