Haizhen Yang1,2,3, Baoxian Yu4,5,6, Ping OUYang7, Xiaoxi Li8, Xiaoying Lai8, Guishan Zhang9, Han Zhang10,11,12.
Abstract
Metabolic syndrome (MetS) is a cluster of metabolic disorders that may increase the risk of diabetes, cardiovascular disease, and other diseases. It is therefore important to predict the onset of MetS and to identify the corresponding risk factors. In this study, we investigate risk prediction for MetS using a data set of 67,730 samples with physical examination records from three consecutive years, provided by the Department of Health Management, Nanfang Hospital, Southern Medical University, P.R. China. Specifically, the prediction uses the numerical features of the examination records together with differential features derived from the records of the past two consecutive years, namely the differential numerical feature (DNF) and the differential state feature (DSF). The risk factors associated with these features are statistically analyzed with respect to age and gender. Numerical results show that the proposed DSF, in addition to the numerical features of the examination records, contributes significantly to the risk prediction of MetS. Moreover, the proposed scheme with these features outperforms the state-of-the-art MetS prediction model, which indicates its potential for effective prescreening of MetS.
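The two differential features can be illustrated with a minimal sketch. The abstract does not give their exact definitions, so the following is an assumption: the DNF is taken as the year-over-year change in a measurement, and the DSF as the transition between normal (N) and abnormal (A) states, matching the N2A and A2A labels used in the tables. The function names and the TG cutoff are hypothetical and for illustration only.

```python
from typing import Callable


def dnf(prev: float, curr: float) -> float:
    """Differential numerical feature (assumed): change from previous to current year."""
    return curr - prev


def dsf(prev: float, curr: float, is_abnormal: Callable[[float], bool]) -> str:
    """Differential state feature (assumed): one of N2N, N2A, A2N, A2A."""
    def state(value: float) -> str:
        return "A" if is_abnormal(value) else "N"
    return f"{state(prev)}2{state(curr)}"


def tg_abnormal(value: float) -> bool:
    """Illustrative cutoff only: treat TG >= 1.7 mmol/L as abnormal."""
    return value >= 1.7


# Hypothetical TG readings (mmol/L) from two consecutive examinations.
print(round(dnf(1.2, 1.9), 2))
print(dsf(1.2, 1.9, tg_abnormal))  # sudden abnormality
print(dsf(2.1, 1.9, tg_abnormal))  # persistent abnormality
```

Under these assumptions, a feature that crosses its cutoff between the two years is labelled N2A, and one that stays above it is labelled A2A.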
Year: 2022 PMID: 35145200 PMCID: PMC8831522 DOI: 10.1038/s41598-022-06235-2
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Results based on three models with and without differential features.
| Model | Threshold | AUC | Accuracy | Precision | Recall | F1-score | Specificity | F2-score |
|---|---|---|---|---|---|---|---|---|
| XGBoost | 0.147 | | | | 0.85 | | | |
| Stacking | 0.116 | 0.917 ± 0.003 | 0.812 | 0.37 | | 0.52 | 0.80 | |
| Random Forest | 0.156 | 0.908 ± 0.003 | 0.804 | 0.36 | | 0.51 | 0.79 | 0.68 |
| XGBoost | 0.144 | | | | 0.87 | | | |
| Stacking | 0.125 | 0.928 ± 0.002 | 0.837 | 0.41 | | 0.56 | 0.83 | |
| Random Forest | 0.177 | 0.916 ± 0.002 | 0.825 | 0.39 | 0.87 | 0.54 | 0.82 | 0.70 |
The best result for each metric across the classifiers is marked in bold.
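The F1 and F2 columns can be cross-checked against precision and recall. The sketch below assumes the standard F-beta definition, which the source does not state explicitly, and uses the fully populated Random Forest row (precision 0.39, recall 0.87) as input. The function name `f_beta` is illustrative.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score: weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily than precision.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall from the Random Forest row that reports all metrics.
p, r = 0.39, 0.87
print(f"F1 = {f_beta(p, r, beta=1):.2f}")  # F1 = 0.54
print(f"F2 = {f_beta(p, r, beta=2):.2f}")  # F2 = 0.70
```

Both values agree with the F1-score (0.54) and F2-score (0.70) reported in that row. F2 weights recall more heavily than precision, which suits a prescreening setting where missed cases are costly.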
Figure 1. Receiver operating characteristic (ROC) curves of models with/without DNFs and DSFs using the XGBoost classifier.
Figure 2. (a) Feature importance ranking chart based on the XGBoost model (top 20). (b) SHAP analysis of the important features. BMI_DSF, DSF of BMI; BP_DSF, DSF of BP; DBP, diastolic blood pressure; DBP_DSF, DSF of DBP; FGLU, fasting blood glucose; FGLU_DSF, DSF of FGLU; FL_DSF, DSF of FL; HGB_DSF, DSF of hemoglobin (HGB); SBP, systolic blood pressure; SBP_DSF, DSF of SBP; TG_DSF, DSF of TG; WC_DSF, DSF of WC.
Prevalence of MetS in the forthcoming year for different gender and age groups.
| Gender | Measure | Age 18–44 | Age 45–59 | Age ≥ 60 |
|---|---|---|---|---|
| Male | N | 23,839 | 10,540 | 2975 |
| | Prevalence | 13.52% | 21.94% | 25.41% |
| Female | N | 21,224 | 6793 | 1641 |
| | Prevalence | 1.67% | 6.52% | 19.07% |
The odds ratio (OR) of feature abnormality for MetS by age group in males.
| Features | Age 18–44 (95% CI) | Age 45–59 (95% CI) | Age ≥ 60 (95% CI) |
|---|---|---|---|
| TG (mmol/L) | 2.961 (2.426–3.613) | ||
| WC (cm) | 2.453 (2.209–2.724) | 2.263 (1.848–2.770) | |
| BMI (kg/m²) | |||
| HDL-C (mmol/L) | 2.890 (2.644–3.159) | 2.355 (2.067–2.684) | 2.478 (1.896–3.237) |
| WHR (–) | 4.021 (3.733–4.331) | 2.352 (2.144–2.580) | 2.844 (2.440–3.408) |
| FL (%) | |||
| SBP (mmHg) | 1.947 (1.796–2.111) | 1.656 (1.504–1.824) | 1.546 (1.312–1.823) |
| FGLU (mmol/L) | 3.527 (2.840–4.381) | 2.762 (2.373–3.216) | 2.146 (1.719–2.680) |
| DBP (mmHg) | 3.169 (2.774–3.620) | 2.024 (1.778–2.305) | 1.686 (1.339–2.122) |
| BP (%) | 2.017 (1.865–2.181) | 1.666 (1.517–1.830) | 1.578 (1.337–1.861) |
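As an illustration of how an OR and its 95% CI can be obtained from a 2×2 table, the sketch below uses the log (Woolf) method. The source does not state which method was used, so this choice is an assumption, and the counts in the example are hypothetical, not taken from the study.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio with a Woolf (log-based) confidence interval.

    a: abnormal feature, MetS      b: abnormal feature, no MetS
    c: normal feature, MetS        d: normal feature, no MetS
    z: normal quantile (1.96 for an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
or_value, lower, upper = odds_ratio_ci(20, 80, 10, 90)
print(f"OR = {or_value:.3f} ({lower:.3f}–{upper:.3f})")
```

An interval that excludes 1 indicates an association between the abnormal feature and MetS at the chosen confidence level.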
The odds ratio (OR) of feature abnormality for MetS by age group in females.
| Features | Age 18–44 (95% CI) | Age 45–59 (95% CI) | Age ≥ 60 (95% CI) |
|---|---|---|---|
| TG (mmol/L) | 2.337 (1.777–3.074) | ||
| WC (cm) | 9.177 (7.176–11.735) | 5.138 (4.164–6.339) | 3.402 (2.557–4.524) |
| BMI (kg/m²) | 4.060 (2.653–6.213) | ||
| HDL-C (mmol/L) | 8.257 (6.445–10.579) | 4.356 (3.076–6.168) | |
| WHR (–) | 6.100 (4.586–8.115) | ||
| FL (%) | 3.537 (2.644–4.731) | ||
| SBP (mmHg) | 4.618 (3.582–5.955) | 2.561 (2.111–3.107) | 2.072 (1.615–2.659) |
| FGLU (mmol/L) | 2.597 (1.843–3.658) | ||
| DBP (mmHg) | 7.171 (4.673–11.005) | 2.884 (2.118–3.926) | 2.223 (1.426–3.467) |
| BP (%) | 4.410 (3.461–5.619) | 2.628 (2.172–3.180) | 2.154 (1.676–2.767) |
The OR of persistent abnormality (A2A) compared with sudden abnormality (N2A) in males.
| Features | Age 18–44, N2A | Age 18–44, A2A | Age 45–59, N2A | Age 45–59, A2A | Age ≥ 60, N2A | Age ≥ 60, A2A |
|---|---|---|---|---|---|---|
| TG_DSF | 2.810 | 2.501 | ||||
| WC_DSF | 3.498 | 5.827 | 2.235 | 3.017 | 1.954 | 2.815 |
| BP_DSF | 1.807 | 2.632 | 1.403 | 2.065 | 1.512 | 1.932 |
| BMI_DSF | ||||||
| SBP_DSF | 1.675 | 2.581 | 1.484 | 2.027 | 1.457 | 1.930 |
| FGLU_DSF | 2.591 | 5.319 | 2.592 | 3.051 | ||
| FL_DSF | 3.319 | 6.034 | 2.507 | 4.338 | 3.260 | |
| HGB_DSF | 2.792 | 4.475 | 2.370 | |||
| DBP_DSF | 2.348 | 3.663 | 1.513 | 2.195 | 1.240 | 1.871 |
The OR of persistent abnormality (A2A) compared with sudden abnormality (N2A) in females.
| Features | Age 18–44, N2A | Age 18–44, A2A | Age 45–59, N2A | Age 45–59, A2A | Age ≥ 60, N2A | Age ≥ 60, A2A |
|---|---|---|---|---|---|---|
| TG_DSF | 5.826 | 1.683 | 3.116 | |||
| WC_DSF | 6.916 | 17.346 | 4.870 | 6.689 | 4.691 | |
| BP_DSF | 3.585 | 7.419 | 2.769 | 3.258 | 1.837 | 3.152 |
| BMI_DSF | 16.165 | 1.028 | ||||
| SBP_DSF | 3.859 | 7.728 | 2.719 | 3.232 | 1.605 | 3.148 |
| FGLU_DSF | 9.978 | 4.512 | 8.325 | |||
| FL_DSF | 9.287 | 23.430 | 2.119 | |||
| HGB_DSF | 3.341 | 6.003 | 2.154 | 3.511 | 2.007 | 3.344 |
| DBP_DSF | 4.182 | 7.233 | 2.682 | 3.655 | ||
Comparison between the proposed MetS model and the state-of-the-art contributions.
| References | Interval | Data type | Sample size | Method | Performance |
|---|---|---|---|---|---|
| [ | 14 years | Physical examination data, SC types | 3529 | Logistic regression | AUC = 0.817 |
| [ | 10 years | MetS diagnosis indicators, follow-up time, RBP4 | 352 | Logistic regression | AUC = 0.813 |
| [ | 7 years | Physical examination data | 2107 | Support Vector Machines | AUC = 0.774 |
| [ | 3 years | BMI, DBP, HDL, FPG | 4395 | Statistic methods | AUC = 0.680 |
| [ | 2 years | Clinical, diet and anthropometric indicators | 27,945 | XGBoost | AUC = 0.880 |
Basic statistical characteristics of the raw data set.
| Indicators (unit) | Male (N = 537,283) | Female (N = 403,899) | Indicators (unit) | Male (N = 537,283) | Female (N = 403,899) |
|---|---|---|---|---|---|
| Age (year) | 39.25 ± 13.68 | 36.84 ± 13.37 | ALT (U/L) | 26.18 ± 15.12 | 15.87 ± 9.15 |
| WC (cm) | 83.59 ± 9.29 | 73.02 ± 8.90 | AST (U/L) | 24.12 ± 8.09 | 20.12 ± 6.42 |
| FGLU (mmol/L) | 5.05 ± 1.22 | 4.88 ± 0.91 | HGB (g/L) | 153.24 ± 11.35 | 130.76 ± 11.59 |
| PG (mmol/L) | 7.36 ± 3.08 | 6.93 ± 2.67 | RBC (10¹²/L) | 5.20 ± 0.50 | 4.57 ± 0.43 |
| DBP (mmHg) | 75.27 ± 10.91 | 69.41 ± 10.04 | WBC (10⁹/L) | 6.80 ± 1.69 | 6.26 ± 1.58 |
| SBP (mmHg) | 124.00 ± 15.34 | 114.62 ± 16.04 | PLT (10⁹/L) | 239.81 ± 54.05 | 261.42 ± 60.44 |
| BP (ratio) | 15.73% | 7.79% | CR (μmol/L) | 79.25 ± 16.38 | 54.49 ± 13.73 |
| TG (mmol/L) | 1.77 ± 1.59 | 1.13 ± 0.85 | DM_H (ratio) | 1.27% | 0.60% |
| HDL-C (mmol/L) | 1.27 ± 0.32 | 1.54 ± 0.37 | HTN_H (ratio) | 3.74% | 1.96% |
| Hip (cm) | 94.38 ± 6.40 | 90.04 ± 6.49 | SMK_H (ratio) | – | – |
| WHR (–) | 1.04 ± 0.20 | 1.02 ± 0.13 | FL (ratio) | 32.86% | 12.62% |
| HBA1c (%) | 5.87 ± 0.80 | 5.74 ± 0.69 | TN (ratio) | 7.92% | 13.79% |
| BMI (kg/m²) | 24.12 ± 3.36 | 21.96 ± 3.19 | HM (ratio) | – | 10.20% |
| TC (mmol/L) | 5.23 ± 1.02 | 5.03 ± 1.00 | MGH (ratio) | – | 78.09% |
| LDL-C (mmol/L) | 3.19 ± 0.78 | 2.89 ± 0.77 | UALB (ratio) | 2.47% | 1.08% |
| UA (μmol/L) | 414.36 ± 88.06 | 298.97 ± 70.83 | MS_result (ratio) | 11.48% | 2.88% |
Continuous indicators are expressed as mean ± standard deviation; discrete indicators are expressed as percentages (%). MS_result is the binary MetS label: 0 indicates no disease and 1 indicates disease. A dash (–) means that less than 0.1% of the data are available, owing to missing or gender-specific examinations.
ALT, alanine aminotransferase; AST, aspartate aminotransferase; CR, creatinine; DM_H, history of diabetes mellitus; HBA1c, hemoglobin A1c; HM, hysteromyoma; HTN_H, history of hypertension; LDL-C, low-density lipoprotein cholesterol; MGH, mammary gland hyperplasia; N, number of records; PLT, platelets; RBC, red blood cell count; SMK_H, history of smoking; TC, total cholesterol; TN, thyroid nodules; UA, uric acid; UALB, urine albumin; WBC, white blood cell count.
Figure 3. Schematic diagram of the risk prediction model for MetS within the next 1 year.
Figure 4. Framework of our MetS risk predictive model.