Mahdi Akbarzadeh, Nadia Alipour, Hamed Moheimani, Asieh Sadat Zahedi, Firoozeh Hosseini-Esfahani, Hossein Lanjanian, Fereidoun Azizi, Maryam S Daneshpour.
Abstract
BACKGROUND: Metabolic syndrome (MetS) is a prevalent multifactorial disorder that can increase the risk of developing diabetes, cardiovascular diseases, and cancer. We aimed to compare different machine learning classification methods in predicting metabolic syndrome status and in identifying influential genetic or environmental risk factors.
Keywords: Decision tree; Discriminant analysis; Logistic Regression; Metabolic syndrome; Random Forest; Support vector machines
Year: 2022 PMID: 35397593 PMCID: PMC8994379 DOI: 10.1186/s12967-022-03349-z
Source DB: PubMed Journal: J Transl Med ISSN: 1479-5876 Impact factor: 5.531
Fig. 1 Study design and participant selection flowchart; 4754 eligible participants from the Tehran Cardio-metabolic genetic study (TCGS) were included: those with available genotype information, older than 19 years, without prevalent MetS at the 1st phase, and with complete follow-up data
Comparing independent demographic and genetic predictors of MetS in the healthy and unhealthy groups
| Variables | Unhealthy (MetS) (%) | Healthy (No MetS) (%) | P value |
|---|---|---|---|
| Group size (%) | 2365(50.6) | 2309(49.4) | |
| Age (mean ± SD) | 40.53 ± 12.93 | 33.04 ± 12.47 | < 0.001a |
| Schooling years (mean ± SD) | 9.19 ± 4.34 | 10.41 ± 4.63 | < 0.001a |
| BMI (mean ± SD) | 27.08 ± 4.09 | 23.8 ± 3.95 | < 0.001a |
| Physical activity (mean ± SD) | 575.16 ± 923.29 | 452.48 ± 808.96 | < 0.001a |
| Sex (%) | |||
| Male | 1249(52.81) | 867 (37.55) | < 0.001b |
| Female | 1116 (47.19) | 1442 (62.45) | |
| Smoking status (%) | |||
| Never | 1213(51.29) | 1323(55.94) | < 0.001b |
| Former smoker | 146(6.17) | 73(3.09) | |
| Current smoker | 336 (14.21) | 259 (10.95) | |
| Second hand smoker | 670 (28.33) | 654 (27.65) | |
| Marital status (%) | |||
| Divorced | 24 (1.01) | 19 (0.80) | < 0.001b |
| Married | 1967 (83.17) | 1612 (68.16) | |
| Single | 312(13.19) | 652(27.57) | |
| Widowed | 62(2.62) | 26 (1.10) | |
| rs1260326 (%) | |||
| CC | 662(27.99) | 734 (31.04) | < 0.01b |
| TC | 1156(48.88) | 1090 (46.09) | |
| TT | 547(23.13) | 485 (20.51) | |
| rs780094 (%) | |||
| CC | 675 (28.54) | 752 (31.80) | < 0.01b |
| TC | 1156 (48.88) | 1079 (45.62) | |
| TT | 534 (22.58) | 478 (20.21) | |
| rs780093 (%) | |||
| CC | 668(28.25) | 735(31.08) | < 0.01b |
| TC | 1143(48.33) | 1107(46.81) | |
| TT | 554(23.42) | 476(20.13) | |
MetS Metabolic Syndrome, BMI Body Mass Index, SD Standard Deviation; significant differences were observed in the GCKR SNP genotypes and the independent variables between healthy and unhealthy participants
a Student's t-test
b Chi-square test
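As an illustration of footnote b, the chi-square comparison for the Sex rows can be reproduced from the counts in the table above. This is a minimal sketch, assuming a plain Pearson chi-square without continuity correction; the source does not state which variant or software was used, and the function name `pearson_chi_square` is hypothetical.

```python
import math

# Sex-by-MetS counts copied from the table above
# (rows: male, female; columns: MetS, no MetS).
observed = [[1249, 867], [1116, 1442]]


def pearson_chi_square(table):
    """Return (statistic, dof) for an r x c contingency table.

    Assumption: plain Pearson statistic, no continuity correction.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


chi2, dof = pearson_chi_square(observed)
# With 1 degree of freedom the upper-tail probability has a closed form.
p_value = math.erfc(math.sqrt(chi2 / 2))
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p_value:.2e}")
```

The resulting p-value is far below 0.001, which agrees with the "< 0.001" reported for Sex. The closed-form tail probability only holds for one degree of freedom; the multi-category rows (smoking, marital status, genotypes) would need a general chi-square distribution function.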
Baseline characteristics of study participants and non-responders by common SNPs of GCKR genotypes
| Variables | Male: Responders (%) | Male: Non-responders (%) | Male: P value | Female: Responders (%) | Female: Non-responders (%) | Female: P value |
|---|---|---|---|---|---|---|
| Group size (%) | 2164 (72.28) | 830 (27.72) | | 2590 (68.28) | 1203 (31.72) | |
| Age (mean ± SD) | 39.35 ± 14.59 | 41.48 ± 15.31 | 0.0614a | 36.25 ± 11.51 | 37.06 ± 14.6 | 0.0649a |
| Schooling years (mean ± SD) | 10.02 ± 4.65 | 9.91 ± 4.4 | 0.582a | 9.58 ± 4.4 | 9.32 ± 4.36 | 0.145a |
| BMI (mean ± SD) | 24.86 ± 3.87 | 25.12 ± 4.38 | 0.0246a | 26.31 ± 4.68 | 26.63 ± 4.92 | 0.0539a |
| Physical activity (mean ± SD) | 607.79 ± 1049.11 | 524.49 ± 1034.92 | 0.0510a | 429.78 ± 696.21 | 384.48 ± 682.76 | 0.0607a |
| Smoking status (%) | | | | | | |
| Never smoker | 825 (38.12) | 283 (34.10) | 0.221b | 1732 (66.87) | 717 (59.60) | 0.002b |
| Former smoker | 219 (10.12) | 88 (10.60) | | 8 (0.31) | 13 (1.08) | |
| Current smoker | 547 (25.28) | 196 (23.61) | | 59 (2.28) | 35 (2.91) | |
| Second hand | 573 (26.48) | 171 (20.60) | | 791 (30.54) | 371 (30.84) | |
| Marital status (%) | | | | | | |
| Divorced | 12 (0.55) | 1 (0.05) | 0.104b | 33 (1.27) | 18 (1.50) | < 0.001b |
| Married | 1628 (75.23) | 649 (29.99) | | 2011 (77.64) | 954 (79.30) | |
| Single | 520 (24.03) | 176 (8.13) | | 462 (17.84) | 155 (12.88) | |
| Widowed | 4 (0.18) | 3 (0.14) | | 84 (3.24) | 76 (6.32) | |
| rs1260326 (%) | | | | | | |
| CC | 662 (30.59) | 265 (31.93) | 0.233b | 755 (29.15) | 370 (30.76) | 0.602b |
| TC | 1015 (46.90) | 402 (48.43) | | 1265 (48.84) | 574 (47.71) | |
| TT | 487 (22.50) | 163 (19.64) | | 570 (22.01) | 259 (21.53) | |
| rs780094 (%) | | | | | | |
| CC | 676 (31.24) | 251 (30.24) | 0.30b | 773 (28.30) | 358 (29.76) | 0.998b |
| TC | 1015 (46.90) | 414 (49.88) | | 1256 (48.49) | 584 (48.55) | |
| TT | 473 (21.86) | 165 (19.88) | | 561 (21.66) | 261 (21.70) | |
| rs780093 (%) | | | | | | |
| CC | 672 (31.05) | 179 (21.57) | 0.176b | 754 (29.11) | 280 (23.28) | 0.087b |
| TC | 1010 (46.67) | 315 (37.95) | | 1275 (49.23) | 404 (33.58) | |
| TT | 482 (22.27) | 125 (15.06) | | 561 (21.66) | 214 (17.79) | |
BMI Body Mass Index, SD Standard Deviation; there were no significant differences between responders and non-responders in males and females, other than higher BMI and lower physical activity in male non-responders and a different distribution of smoking and marital status between female responders and non-responders
a Student's t-test
b Chi-square test
Logistic regression assessing the significance of the relationship between independent demographic and genetic variables and metabolic syndrome
| Variables | B | Odds ratio (OR) | P value | |
|---|---|---|---|---|
| Age | 0.025 | 1.025 | < 0.001 | |
| Gender (female = 0) | 0.864 | 2.373 | < 0.001 | |
| Schooling years | − 0.021 | 0.978 | 0.009 | |
| BMI | 0.207 | 1.230 | < 0.001 | |
| Physical activity | 0.0001 | 1.000 | 0.005 | |
| Smoking status | Current smoker(reference) | |||
| Never smoker | 0.005 | 1.005 | 0.962 | |
| Former smoker | 0.072 | 1.075 | 0.698 | |
| Second hand | 0.064 | 1.066 | 0.593 | |
| Marital status | Divorced (reference) | |||
| Married | − 0.087 | 0.916 | 0.808 | |
| Single | − 0.198 | 0.820 | 0.595 | |
| Widowed | − 0.010 | 0.989 | 0.981 | |
| rs1260326 | CC(reference) | |||
| TC | 0.206 | 1.229 | 0.347 | |
| TT | 0.472 | 1.603 | 0.133 | |
| rs780094 | CC(reference) | |||
| TC | 0.149 | 1.161 | 0.664 | |
| TT | − 1.211 | 0.298 | 0.008 | |
| rs780093 | CC(reference) | |||
| TC | − 0.122 | 0.884 | 0.664 | |
| TT | 1.066 | 2.903 | 0.002 | |
BMI Body Mass Index; logistic regression is used to predict the metabolic syndrome status of the participants in TCGS. The metabolic syndrome was significantly associated with age, gender, schooling years, BMI, physical activity, rs780094, and rs780093 (P < 0.05)
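The B and odds ratio columns above are linked by the standard logistic regression identity OR = exp(B). The following minimal sketch checks that identity for a few rows copied from the table; the helper name `odds_ratio` and the row labels are illustrative, not taken from the source.

```python
import math

# (B, reported OR) pairs copied from the table above.
# "Gender" is coded female = 0, so its OR compares males with females.
reported = {
    "Age (per year)": (0.025, 1.025),
    "Gender (male vs female)": (0.864, 2.373),
    "BMI (per unit)": (0.207, 1.230),
    "rs780093 TT vs CC": (1.066, 2.903),
}


def odds_ratio(b):
    """Convert a logistic regression coefficient to an odds ratio."""
    return math.exp(b)


for name, (b, reported_or) in reported.items():
    print(f"{name}: exp({b}) = {odds_ratio(b):.3f} (reported {reported_or})")
```

Read this way, each additional BMI unit multiplies the odds of MetS by about 1.23, holding the other variables fixed. Small differences in the third decimal place are expected because the published coefficients are themselves rounded.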
Performance metrics for the LR, SVM, DT, RF, LDA, and QDA algorithms
| Group | Model | Accuracy | Sensitivity | Specificity | Kappa | AUC-ROC | AUC-PR |
|---|---|---|---|---|---|---|---|
| Total | SVM | 0.725 | 0.661 | 0.785 | 0.447 | 0.785 | 0.761 |
| | DT | 0.738 | 0.667 | 0.804 | 0.473 | 0.771 | 0.730 |
| | RF | 0.743 | 0.699 | 0.784 | 0.484 | 0.804 | 0.776 |
| | LR | 0.705 | 0.677 | 0.732 | 0.409 | 0.770 | 0.748 |
| | LDA | 0.562 | 0.915 | 0.230 | 0.141 | 0.658 | 0.666 |
| | QDA | 0.546 | 0.492 | 0.598 | 0.089 | 0.563 | 0.555 |
| Male | SVM | 0.712 | 0.475 | 0.870 | 0.366 | 0.733 | 0.766 |
| | DT | 0.735 | 0.527 | 0.874 | 0.421 | 0.739 | 0.753 |
| | RF | 0.729 | 0.559 | 0.842 | 0.415 | 0.754 | 0.782 |
| | LR | 0.711 | 0.519 | 0.839 | 0.373 | 0.732 | 0.768 |
| | LDA | 0.591 | 0.754 | 0.482 | 0.217 | 0.679 | 0.734 |
| | QDA | 0.547 | 0.394 | 0.649 | 0.044 | 0.531 | 0.616 |
| Female | SVM | 0.733 | 0.783 | 0.671 | 0.456 | 0.802 | 0.565 |
| | DT | 0.748 | 0.753 | 0.742 | 0.492 | 0.785 | 0.706 |
| | RF | 0.744 | 0.767 | 0.715 | 0.482 | 0.815 | 0.754 |
| | LR | 0.738 | 0.798 | 0.663 | 0.465 | 0.803 | 0.741 |
| | LDA | 0.608 | 0.752 | 0.427 | 0.184 | 0.664 | 0.617 |
| | QDA | 0.635 | 0.749 | 0.491 | 0.245 | 0.670 | 0.572 |
LR Logistic Regression, SVM Support Vector Machines, DT Decision Tree, RF Random Forest, LDA Linear Discriminant Analysis, QDA Quadratic Discriminant Analysis, AUC Area Under Curve. Machine learning methods outperform the traditional statistical methods
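For readers unfamiliar with the threshold-based columns above, accuracy, sensitivity, specificity, and Cohen's kappa can all be derived from a single 2 x 2 confusion matrix. The sketch below uses hypothetical counts, since the source reports only the final metrics and not the underlying confusion matrices; the function name `classification_metrics` is likewise an assumption.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute accuracy, sensitivity, specificity, and Cohen's kappa.

    tp/fn: MetS cases predicted as MetS / as healthy.
    fp/tn: healthy participants predicted as MetS / as healthy.
    """
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    # Agreement expected by chance, from the marginal totals.
    p_pos = ((tp + fn) / n) * ((tp + fp) / n)
    p_neg = ((fp + tn) / n) * ((fn + tn) / n)
    p_chance = p_pos + p_neg
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Hypothetical counts for illustration only (not from the study).
metrics = classification_metrics(tp=70, fn=30, fp=20, tn=80)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Kappa rescales accuracy against chance agreement, which is why LDA and QDA score close to zero on kappa in the table even though their accuracy is above 0.5. AUC-ROC and AUC-PR are not covered here because they require the predicted probabilities rather than a single confusion matrix.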
Fig. 2 Assessing the importance of predictors with Gini and Accuracy importance indices based on the implementation of the random forest model; we confirmed that BMI, physical activity, and age were the most influential variables in MetS prediction
Fig. 3 Classification decision tree, with probabilities of success for metabolic syndrome shown in each node; a combination of BMI, physical activity, and age is an accurate predictor of MetS