Habibollah Esmaeily, Maryam Tayefi, Majid Ghayour-Mobarhan, Alireza Amirabadizadeh.
Abstract
Background: The increasing prevalence of type 2 diabetes has created a global health burden and a concern for health service providers and administrators. This study aimed to develop and compare statistical models for identifying the risk factors associated with type 2 diabetes. To this end, artificial neural network (ANN), support vector machine (SVM), and multiple logistic regression (MLR) models were applied to the demographic, anthropometric, and biochemical characteristics of a sample of 9528 individuals from Mashhad City, Iran.
Keywords: Support vector machine; Data mining; Diabetes type 2
Year: 2018 PMID: 29374085 PMCID: PMC6058191 DOI: 10.29252/ibj.22.5.303
Source DB: PubMed Journal: Iran Biomed J ISSN: 1028-852X
Comparison of baseline characteristics between diabetes and non-diabetes groups
| Variables | Non-diabetes (n = 8167) | Diabetes (n = 1361) | P value |
|---|---|---|---|
| Age (year) | 52.01 ± 7.2 | 47.70 ± 8.1 | <0.001 |
| BMI (kg/m2) | 27.76 ± 4.7 | 28.78 ± 4.4 | <0.001 |
| Gender | |||
| Male | 3277 (40.1%) | 518 (38.1%) | =0.04 |
| Female | 4890 (59.9%) | 843 (61.9%) | |
| Marital status | | | |
| Single | 54 (0.7%) | 5 (0.4%) | <0.001 |
| Married | 7636 (93.5%) | 1239 (91%) | |
| Divorced | 111 (1.4%) | 21 (1.5%) | |
| Widowed | 366 (4.5%) | 96 (7.1%) | |
| Education level | | | |
| Low | 4319 (52.9%) | 878 (64.5%) | <0.001 |
| Moderate | 2912 (35.7%) | 374 (27.5%) | |
| High | 936 (11.5%) | 109 (8.0%) | |
| Occupation status | |||
| Employed | 3125 (38.3%) | 400 (29.4%) | <0.001 |
| Unemployed | 4283 (52.4%) | 783 (57.5%) | |
| Retired | 759 (9.3%) | 178 (13.1%) | |
| Smoking status | |||
| Yes | 1775 (21.7%) | 272 (20.0%) | =0.05 |
| No | 6392 (78.3%) | 1089 (80.0%) | |
| Family history of diabetes | |||
| Yes | 1994 (24.4%) | 647 (47.5%) | <0.001 |
| No | 6173 (75.6%) | 714 (52.5%) | |
| Depression | |||
| Yes | 2226 (27.3%) | 461 (33.9%) | =0.001 |
| No | 5941 (72.7%) | 900 (66.1%) | |
| SBP (mmHg) | 121.14 ± 18.2 | 128.81 ± 18.4 | <0.001 |
| DBP (mmHg) | 78.91 ± 11.1 | 81.36 ± 10.4 | <0.001 |
| Cholesterol (mg/dL) | 189.69 ± 37.8 | 201.46 ± 46.3 | <0.001 |
| LDL (mg/dL) | 116.74 ± 34.6 | 120.49 ± 39.1 | <0.001 |
| HDL (mg/dL) | 42.73 ± 9.9 | 42.81 ± 9.6 | =0.004 |
| PAL (h per week) | 1.59 ± 0.86 | 1.60 ± 0.64 | =0.04 |
| hs-CRP (mg/L) | 1.56 [0.95-3.23] | 2.7 [1.3-4.9] | <0.001 |
| TG (mg/dL) | 117 [82-165] | 160 [105-222] | <0.001 |
hs-CRP and TG are reported as median [IQR].
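The P values in the baseline table come from between-group tests, but the source does not state which tests were used. As one illustration, assuming a Pearson chi-square test of independence (without continuity correction) for a two-category variable, the family-history row can be checked from its counts alone. The function name and the choice of test are assumptions of this sketch, not the authors' stated method.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the 2x2 table
    [[a, b], [c, d]], where rows are groups and columns are categories."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, chi2 = Z**2, so P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Family history of diabetes, counts (Yes, No) per group.
group_8167 = (1994, 6173)  # the 8167 participants without diabetes
group_1361 = (647, 714)    # the 1361 participants with diabetes
chi2, p = chi_square_2x2(*group_8167, *group_1361)
print(f"chi2 = {chi2:.1f}, p = {p:.2e}")
```

A P value this small is consistent with the "<0.001" reported for that row.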
Multiple logistic regression analysis of factors associated with type 2 diabetes in the training dataset
| Variables | B (SE) | OR (95% CI) | P value |
|---|---|---|---|
| Age(year) | 0.05 (0.004) | 1.05 (1.04-1.06) | <0.001 |
| Education | | | |
| High | Reference | | |
| Moderate | 0.12 (0.11) | 1.13 (0.91-1.39) | =0.27 |
| Low | -0.04 (0.07) | 0.96 (0.84-1.11) | =0.59 |
| Occupation | |||
| Unemployed | Reference | | |
| Retired | 0.16 (0.11) | 1.18 (0.94-1.47) | =0.15 |
| Employed | 0.46 (0.12) | 1.59 (1.27-1.99) | <0.001 |
| Marital status | | | |
| Divorced | Reference | ||
| Married | -0.22 (0.24) | 0.80 (0.50-1.28) | =0.36 |
| Single | -0.67 (0.54) | 0.51 (0.17-1.48) | =0.21 |
| Widowed | -0.23 (0.26) | 0.79 (0.47-1.33) | =0.37 |
| Smoking status | |||
| No | Reference | ||
| Yes | -0.11 (0.07) | 0.89 (0.77-1.03) | =0.13 |
| Family history of diabetes | |||
| No | Reference | ||
| Yes | 1.03 (0.06) | 2.81 (2.5-3.16) | <0.001 |
| Depression | |||
| No | Reference | ||
| Yes | | 0.76 (0.61-0.95) | <0.001 |
| BMI (kg/m2) | 0.02 (0.007) | 1.02 (1.01-1.02) | <0.001 |
| SBP (mmHg) | 0.01 (0.002) | 1.02 (1.01-1.02) | <0.001 |
| DBP (mmHg) | -0.01 (0.004) | 0.99 (0.98-0.99) | =0.009 |
| LDL (mg/dL) | -0.003 (0.002) | 0.99 (0.99-1.001) | =0.1 |
| HDL (mg/dL) | -0.003 (0.004) | 0.99 (0.99-1.005) | =0.47 |
| Cholesterol (mg/dL) | 0.005 (0.002) | 1.005 (1.001-1.008) | =0.007 |
| hs-CRP (mg/L) | 0.02 (0.003) | 1.02 (1.01-1.02) | <0.001 |
| TG (mg/dL) | 0.003 (0.0001) | 1.003 (1.002-1.004) | <0.001 |
| PAL (h per week) | 0.001 (0.0001) | 1.001 (1.001-1.003) | =0.001 |
B, regression coefficient; SE, standard error; OR, odds ratio; CI, confidence interval
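In a logistic regression model, the OR column follows from the B (SE) column: OR = exp(B), and a Wald 95% CI is exp(B ± 1.96 × SE). The sketch below assumes Wald intervals and Wald P values, which the source does not state explicitly; the function name is illustrative. Because B and SE are printed rounded, recomputed values can differ slightly from the table.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal distribution


def odds_ratio_summary(b, se):
    """Convert a logistic-regression coefficient B and its standard error SE
    into an odds ratio, a Wald 95% CI, and a two-sided Wald P value."""
    odds_ratio = math.exp(b)
    lower = math.exp(b - Z_975 * se)
    upper = math.exp(b + Z_975 * se)
    z = b / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return odds_ratio, lower, upper, p


# Family history of diabetes: B = 1.03, SE = 0.06 in the table above.
or_fh, lo_fh, hi_fh, p_fh = odds_ratio_summary(1.03, 0.06)
print(f"OR = {or_fh:.2f} ({lo_fh:.2f}-{hi_fh:.2f}), p = {p_fh:.1e}")
```

For this row the recomputed OR and CI (about 2.80, 2.49 to 3.15) match the reported 2.81 (2.5-3.16) up to rounding.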
Fig. 1. The importance of input variables in the MLR (A), ANN (B), and SVM (C) models.
Mean square error of the neural network by number of hidden-layer nodes and learning rate
| Learning rate | 20 nodes | 18 nodes | 16 nodes | 14 nodes | 12 nodes | 10 nodes |
|---|---|---|---|---|---|---|
| 0.05 | 27.04 | 27.12 | 27.52 | 27.56 | 27.74 | 28.51 |
| 0.01 | 26.84 | 26.94 | 27.56 | 27.44 | 27.80 | 28.46 |
| 0.20 | 26.8 | 27.23 | 27.39 | 27.54 | 27.87 | 28.38 |
| 0.50 | 26.94 | 27.37 | 27.38 | 27.51 | 27.86 | 28.37 |
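The hidden-layer table reports a grid search over two ANN hyperparameters. Assuming the configuration with the lowest mean square error is the one to prefer (the source does not say which configuration was finally used), the selection can be sketched as follows. The error values are copied from the table; the variable and function names are illustrative.

```python
# Mean square error copied from the table: hidden-node counts 20 down to 10,
# one list per learning rate.
nodes = [20, 18, 16, 14, 12, 10]
mse_by_rate = {
    0.05: [27.04, 27.12, 27.52, 27.56, 27.74, 28.51],
    0.01: [26.84, 26.94, 27.56, 27.44, 27.80, 28.46],
    0.20: [26.8, 27.23, 27.39, 27.54, 27.87, 28.38],
    0.50: [26.94, 27.37, 27.38, 27.51, 27.86, 28.37],
}


def best_configuration(mse_by_rate, nodes):
    """Return (mse, learning_rate, hidden_nodes) for the lowest error in the grid."""
    return min(
        (mse, rate, n)
        for rate, row in mse_by_rate.items()
        for n, mse in zip(nodes, row)
    )


best_mse, best_rate, best_nodes = best_configuration(mse_by_rate, nodes)
print(f"lowest MSE {best_mse} at learning rate {best_rate} with {best_nodes} hidden nodes")
```

Across all four learning rates, the 20-node column has the lowest error in each row, so the choice of hidden-layer size matters more in this grid than the learning rate.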
Performance of the three models in identifying risk factors associated with type 2 diabetes
| Metric (%) | ANN (95% CI) | MLR (95% CI) | SVM (95% CI) |
|---|---|---|---|
| Sensitivity | 63.1 (59.8-67.5) | 60.1 (58.4-63.1) | 64.5 (59.8-66.4) |
| Specificity | 81.2 (78.4-84.6) | 80.5 (76.4-85.3) | 78.9 (76.4-81.7) |
| Accuracy | 78.7 (73.5-82.6) | 77.7 (73.5-80.9) | 76.8 (73.5-80.9) |
| AUC | 71.5 (68.0-75.9) | 70.4 (68.6-73.9) | 73.1 (69.2-77.6) |
AUC, area under the receiver operating characteristic (ROC) curve
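Sensitivity, specificity, and accuracy all derive from the confusion matrix on the testing dataset, and accuracy is also the prevalence-weighted average of sensitivity and specificity. That identity gives a quick consistency check on the table. The sketch assumes the testing dataset has the same diabetes prevalence as the full sample (1361 of 9528); the source does not report the test-set composition. Function names and the confusion-matrix counts in the test are hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def accuracy_from_rates(sensitivity, specificity, prevalence):
    """Accuracy = prevalence * sensitivity + (1 - prevalence) * specificity."""
    return prevalence * sensitivity + (1 - prevalence) * specificity


# Assumption: the test set has the same diabetes prevalence as the full sample.
prevalence = 1361 / 9528
reported = {  # (sensitivity, specificity, accuracy) in %, from the table above
    "ANN": (63.1, 81.2, 78.7),
    "MLR": (60.1, 80.5, 77.7),
    "SVM": (64.5, 78.9, 76.8),
}
implied = {}
for model, (sens, spec, acc) in reported.items():
    implied[model] = accuracy_from_rates(sens, spec, prevalence)
    print(f"{model}: reported accuracy {acc}, implied {implied[model]:.1f}")
```

Under that assumption, the implied accuracies agree with the reported values to within about 0.1 percentage point for all three models.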
Fig. 2. ROC curves of the SVM, ANN, and MLR models in the testing dataset.
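The AUC summarizes an ROC curve in a single number: the probability that a randomly chosen case with diabetes receives a higher predicted score than a randomly chosen case without it, with ties counted as one half. This equals the Mann-Whitney U statistic divided by the number of positive-negative pairs. The sketch below uses hypothetical scores, not data from the study; the source does not state how its AUC values or their confidence intervals were computed.

```python
def auc_mann_whitney(pos_scores, neg_scores):
    """AUC as the probability that a random positive case scores higher than
    a random negative case; ties contribute one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical predicted probabilities, not data from the study.
diabetic = [0.9, 0.8, 0.4]
non_diabetic = [0.7, 0.3, 0.2, 0.4]
auc = auc_mann_whitney(diabetic, non_diabetic)
print(auc)  # 0.875
```

The pairwise loop is O(n × m) and is meant only to make the definition concrete; rank-based implementations compute the same quantity more efficiently on large test sets.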