Mingyue Xue1,2, Yinxia Su2, Chen Li3, Shuxia Wang4, Hua Yao4.
Abstract
BACKGROUND: An estimated 425 million people worldwide have diabetes, which accounts for 12% of global health expenditure. The number continues to grow, placing a heavy burden on healthcare systems, especially in remote, underserved areas.
Year: 2020 PMID: 33029536 PMCID: PMC7532405 DOI: 10.1155/2020/6873891
Source DB: PubMed Journal: J Diabetes Res Impact factor: 4.011
Characteristics of variables.
| Variables | Diabetes (n = 72,027) | Nondiabetes (n = 510,411) | P value |
|---|---|---|---|
| Age (years) | 66.43 ± 13.43 | 52.41 ± 16.06 | <0.001 |
| BMI (kg/m2) | 25.92 ± 3.65 | 24.37 ± 3.42 | <0.001 |
| Waist circumference (cm) | 90.20 ± 10.75 | 84.95 ± 10.71 | <0.001 |
| Systolic pressure (mmHg) | 130.20 ± 16.52 | 121.30 ± 14.27 | <0.001 |
| Diastolic pressure (mmHg) | 77.80 ± 10.56 | 75.14 ± 9.65 | <0.001 |
| Ethnicity, n (%) | | | <0.001 |
| Han | 50,691 (70.38) | 331,413 (64.93) | |
| Uygur | 10,864 (15.08) | 95,913 (18.79) | |
| Kazak | 1147 (1.59) | 18,893 (3.70) | |
| Hui | 8126 (11.28) | 52,838 (10.35) | |
| Mongolian | 76 (0.11) | 1214 (0.24) | |
| Other nationalities | 1123 (1.56) | 10,140 (1.99) | |
| Gender, n (%) | | | <0.001 |
| Male | 34,641 (48.09) | 239,875 (47.00) | |
| Female | 37,386 (51.91) | 270,536 (53.00) | |
| Physical activity, n (%) | | | <0.001 |
| Yes | 26,239 (36.43) | 154,585 (30.29) | |
| No | 45,788 (63.57) | 355,826 (69.71) | |
| Drinking status, n (%) | | | <0.001 |
| Yes | 15,944 (22.14) | 102,852 (20.15) | |
| No | 56,083 (77.86) | 407,559 (79.85) | |
| Drinking amount (g), n (%) | | | <0.001 |
| ≥170 | 6687 (9.30) | 39,479 (7.73) | |
| <170 | 65,240 (90.70) | 470,932 (92.27) | |
| Smoking amount (cigarettes) | 10 (8-20)∗ | 10 (7-20)∗ | <0.001 |
| Smoking status, n (%) | | | <0.001 |
| Yes | 10,683 (14.83) | 63,920 (12.52) | |
| No | 61,344 (85.17) | 446,491 (87.48) | |
| Dietary ratio, n (%) | | | <0.001 |
| Meat based | 2849 (3.96) | 13,554 (2.66) | |
| Meat balanced | 66,603 (92.47) | 482,864 (94.60) | |
| Vegetarian based | 2575 (3.58) | 13,993 (2.74) | |
| Sugar loving, n (%) | | | <0.001 |
| Yes | 940 (1.31) | 4560 (0.89) | |
| No | 71,087 (98.69) | 505,851 (99.11) | |
| Oil loving, n (%) | | | <0.001 |
| Yes | 2722 (3.78) | 13,068 (2.56) | |
| No | 69,305 (96.22) | 497,343 (97.44) | |
| Salt loving, n (%) | | | <0.001 |
| Yes | 4261 (5.92) | 20,896 (4.09) | |
| No | 67,766 (94.08) | 489,515 (95.91) | |
| Fatty liver, n (%) | | | <0.001 |
| Yes | 22,331 (31.00) | 52,800 (10.34) | |
| No | 49,696 (69.00) | 457,611 (89.66) | |
| Hypertension, n (%) | | | <0.001 |
| Yes | 29,937 (41.56) | 112,348 (22.01) | |
| No | 42,090 (58.44) | 398,063 (77.99) | |
∗Median (IQR). Abbreviation: BMI: body mass index.
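The excerpt does not state which tests produced the P values above. Assuming, hypothetically, a Pearson chi-square test for the categorical rows and a Welch two-sample test for the continuous rows (reading each "±" value as a standard deviation), the comparisons can be reproduced from the summary figures alone. The function names `chi2_2x2` and `welch_z` are illustrative, not from the paper; only the Python standard library is used.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple:
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]], no continuity correction.

    Returns (statistic, p). With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, math.erfc(math.sqrt(stat / 2))


def welch_z(m1: float, s1: float, n1: int, m2: float, s2: float, n2: int) -> tuple:
    """Welch two-sample statistic from summary data (mean, SD, n per group).

    With tens of thousands of subjects per group the t distribution is effectively
    normal, so the two-sided p-value uses the normal approximation erfc(|t| / sqrt(2)).
    """
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    t = (m1 - m2) / se
    return t, math.erfc(abs(t) / math.sqrt(2))


# Gender: rows = diabetes / nondiabetes, columns = male / female (counts from the table)
chi2, p_gender = chi2_2x2(34_641, 37_386, 239_875, 270_536)
print(f"Gender: chi2 = {chi2:.1f}, P = {p_gender:.2e}")

# Age (years), assuming "mean ± x" means mean ± SD
t_age, p_age = welch_z(66.43, 13.43, 72_027, 52.41, 16.06, 510_411)
print(f"Age: t = {t_age:.1f}, P = {p_age:.2e}")
```

With groups this large, even a small difference such as 48.09% versus 47.00% male reaches P < 0.001, which is why nearly every row in the table is significant; effect sizes are more informative here than P values.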
Figure 1: Machine learning flowchart of this study. Abbreviations: LR: logistic regression; DT: decision tree; RF: random forest; AB: AdaBoost; XGB: XGBoost; ML: machine learning.
Figure 2: Parameter selection for the prediction models built with four tree-based classifiers: (a) decision tree, (b) random forest, (c) AdaBoost, and (d) XGBoost. Note: the F1 score was evaluated for maximum depth values between 10 and 50.
Risk factors for type 2 diabetes mellitus (T2DM) screened by multiple logistic regression (CI: confidence interval).
| Intercept and variable | Odds ratio | 95% CI | z | P value |
|---|---|---|---|---|
| Age (years) | 1.047 | (1.046-1.048) | 113.625 | <0.001 |
| BMI (kg/m2) | 1.016 | (1.012-1.020) | 7.894 | <0.001 |
| Waist circumference (cm) | 1.016 | (1.015-1.018) | 23.905 | <0.001 |
| Systolic pressure (mmHg) | 1.002 | (1.001-1.003) | 5.304 | <0.001 |
| Diastolic pressure (mmHg) | 1.001 | (0.999-1.002) | 1.650 | 0.099 |
| Ethnicity | | | | |
| Han | 1 | Ref | — | — |
| Uygur | 1.011 | (0.981-1.043) | 0.734 | 0.463 |
| Kazak | 0.460 | (0.426-0.497) | -19.669 | <0.001 |
| Hui | 1.075 | (1.040-1.111) | 4.269 | <0.001 |
| Mongolian | 0.464 | (0.342-0.616) | -5.127 | <0.001 |
| Other nationalities | 0.989 | (0.912-1.072) | -0.263 | 0.793 |
| Gender | | | | |
| Male | 1 | Ref | — | — |
| Female | 1.017 | (0.994-1.041) | 1.444 | 0.149 |
| Physical activity | | | | |
| No | 1 | Ref | — | — |
| Yes | 0.715 | (0.699-0.731) | -29.179 | <0.001 |
| Drinking status | | | | |
| No | 1 | Ref | — | — |
| Yes | 0.891 | (0.864-0.918) | -7.424 | <0.001 |
| Drinking amount (g) | | | | |
| <170 | 1 | Ref | — | — |
| ≥170 | 1.239 | (1.185-1.296) | 9.432 | <0.001 |
| Smoking amount (cigarettes) | 1.005 | (1.002-1.007) | 3.921 | <0.001 |
| Smoking status | | | | |
| No | 1 | Ref | — | — |
| Yes | 1.137 | (1.086-1.191) | 5.452 | <0.001 |
| Dietary ratio | | | | |
| Meat based | 1 | Ref | — | — |
| Meat balanced | 0.917 | (0.869-0.969) | -3.105 | 0.002 |
| Vegetarian based | 1.019 | (0.941-1.103) | 0.455 | 0.649 |
| Sugar loving | | | | |
| No | 1 | Ref | — | — |
| Yes | 0.994 | (0.896-1.101) | -0.119 | 0.906 |
| Oil loving | | | | |
| No | 1 | Ref | — | — |
| Yes | 1.157 | (1.072-1.249) | 3.730 | <0.001 |
| Salt loving | | | | |
| No | 1 | Ref | — | — |
| Yes | 0.989 | (0.932-1.049) | -0.362 | 0.718 |
| Fatty liver | | | | |
| No | 1 | Ref | — | — |
| Yes | 2.224 | (2.168-2.280) | 62.430 | <0.001 |
| Hypertension | | | | |
| No | 1 | Ref | — | — |
| Yes | 2.373 | (2.312-2.435) | 65.334 | <0.001 |
Abbreviation: BMI: body mass index.
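In a logistic regression, each odds ratio is OR = exp(β) for the fitted log-odds coefficient β; a Wald 95% CI is exp(β ± 1.96·SE), and the Wald statistic is z = β / SE. Assuming the intervals in this table were computed that way (the excerpt does not say), the z column can be approximately recovered from the odds ratio and CI alone. The helper name `wald_from_or` is hypothetical; small discrepancies from the tabulated z values come from the OR and CI being rounded to three decimals.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal (two-sided 95% CI)


def wald_from_or(odds_ratio: float, lower: float, upper: float) -> tuple:
    """Recover (beta, SE, z, two-sided p) from an odds ratio and its 95% CI.

    Assumes a Wald interval, CI = exp(beta ± Z_975 * SE), i.e. symmetric on the
    log-odds scale, so the log-width of the CI equals 2 * Z_975 * SE.
    """
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return beta, se, z, p


rows = {
    "Fatty liver (yes vs no)": (2.224, 2.168, 2.280),
    "Kazak (vs Han)": (0.460, 0.426, 0.497),
}
for name, (or_, lo, hi) in rows.items():
    beta, se, z, p = wald_from_or(or_, lo, hi)
    print(f"{name}: beta = {beta:.3f}, SE = {se:.4f}, z = {z:.2f}")
```

The recovered z values land close to the tabulated 62.430 and -19.669, which supports reading the fourth column as a Wald z statistic on the log-odds scale.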
Dataset description.
| Dataset | Sample distribution (nondiabetes/diabetes) | Ratio | Description |
|---|---|---|---|
| Original data | 510,411/72,027 | 7 : 1 | Original data with all instances |
| SMOTE data | 510,411/510,411 | 1 : 1 | Minority (diabetes) class oversampled with SMOTE to balance the classes |
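SMOTE (Synthetic Minority Over-sampling Technique) creates new minority-class samples by interpolating between a minority sample and one of its k nearest minority-class neighbours, rather than duplicating existing rows. The sketch below is a minimal standard-library illustration; k = 5, the random seed, and the function names are assumptions, since the excerpt does not report the settings used. For the real data, a library implementation such as `imblearn.over_sampling.SMOTE` (used via `fit_resample`) with an efficient neighbour search is the practical choice.

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def k_nearest(points: Sequence[Point], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of points[i] by Euclidean distance, excluding i."""
    others = sorted((math.dist(points[i], points[j]), j) for j in range(len(points)) if j != i)
    return [j for _, j in others[:k]]


def smote(minority: Sequence[Point], n_new: int, k: int = 5, seed: int = 0) -> List[Point]:
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Each new point is x + u * (nb - x) with u ~ Uniform(0, 1), where x is a randomly
    chosen minority sample and nb is one of x's k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    neighbours = [k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic: List[Point] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        nb = minority[rng.choice(neighbours[i])]
        u = rng.random()
        synthetic.append([a + u * (b - a) for a, b in zip(minority[i], nb)])
    return synthetic


# Toy example: 3 minority vs 10 majority points, balanced to 1 : 1.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
n_majority = 10
new_points = smote(minority, n_new=n_majority - len(minority), k=2)
print(len(minority) + len(new_points))  # 10
```

On the study data, balancing 510,411 against 72,027 requires 438,384 synthetic samples. Plain SMOTE interpolates every feature, so binary or categorical inputs such as smoking status come out fractional; imbalanced-learn also provides a `SMOTENC` variant intended for mixed numeric and categorical data.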
Results of the classification algorithms.
| Testing criteria | DT | RF | AB | XGB |
|---|---|---|---|---|
| Accuracy | 0.832 | 0.873 | 0.878 | 0.906 |
| Precision | 0.823 | 0.862 | 0.871 | 0.910 |
| Recall | 0.845 | 0.889 | 0.888 | 0.902 |
| F1 score | 0.834 | 0.875 | 0.879 | 0.906 |
| AUC | 0.832 | 0.947 | 0.948 | 0.968 |
Abbreviations: AUC: area under the receiver operating characteristic (ROC) curve; DT: decision tree; RF: random forest; AB: AdaBoost; XGB: XGBoost.
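The threshold-based criteria in this table follow directly from confusion-matrix counts (TP, FP, FN, TN), and F1 is the harmonic mean of precision and recall. As a consistency check, recomputing F1 from the reported precision and recall reproduces the F1 row for all four models. The function names are illustrative, and the counts in the last line are hypothetical.

```python
def f1_from(precision: float, recall: float) -> float:
    """F1 score: harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# Consistency check against each model's reported precision and recall
reported = {"DT": (0.823, 0.845), "RF": (0.862, 0.889), "AB": (0.871, 0.888), "XGB": (0.910, 0.902)}
for model, (p, r) in reported.items():
    print(model, round(f1_from(p, r), 3))
# DT 0.834, RF 0.875, AB 0.879, XGB 0.906

# Hypothetical counts, for illustration only
print(classification_metrics(tp=90, fp=10, fn=20, tn=80))
```

Because the evaluation set was balanced with SMOTE, accuracy and F1 here are not directly comparable to what the models would achieve at the original 7 : 1 class ratio, where precision in particular tends to drop.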
Figure 3: ROC curves of all algorithms. Abbreviations: DT: decision tree; RF: random forest; AB: AdaBoost; XGB: XGBoost.
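The AUC summarized by these curves equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (the Mann-Whitney formulation), with ties counted as one half. The quadratic-time sketch below is illustrative only; `sklearn.metrics.roc_auc_score` computes the same quantity efficiently. The function name and example scores are hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative), ties counted as 1/2.

    O(n_pos * n_neg) pairwise comparison; fine for illustration, not for large datasets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Probability-like scores
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75

# Hard 0/1 predictions: sensitivity 0.5, specificity 2/3
print(roc_auc([0, 0, 0, 1, 1], [0, 0, 1, 1, 0]))  # equals (0.5 + 2/3) / 2
```

When a model outputs only hard labels, its ROC curve has a single interior point and the AUC reduces to (sensitivity + specificity) / 2, which on a balanced dataset coincides with accuracy.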
Figure 4: Feature importance in the XGBoost model, measured by the F score.
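In the XGBoost Python package, `plot_importance` labels its axis "F score" by default and, with its default `importance_type="weight"`, plots how many times each feature is used to split a node across all trees; `Booster.get_score(importance_type="weight")` returns the same counts. Assuming the figure used that default (the excerpt does not say), the sketch below shows the computation on a simplified, hypothetical tree representation: nested dicts with a `split` key on internal nodes. It is not XGBoost's internal format, and the feature names are illustrative.

```python
from collections import Counter
from typing import List


def split_counts(trees: List[dict]) -> Counter:
    """Count how many internal nodes split on each feature across all trees ("weight")."""
    counts: Counter = Counter()
    stack = list(trees)
    while stack:
        node = stack.pop()
        if "split" in node:  # internal node; leaves carry no "split" key
            counts[node["split"]] += 1
            stack.extend(node.get("children", []))
    return counts


# Hypothetical two-tree ensemble
trees = [
    {"split": "age", "children": [
        {"split": "fatty_liver", "children": [{"leaf": -0.2}, {"leaf": 0.1}]},
        {"split": "age", "children": [{"leaf": 0.05}, {"leaf": 0.3}]},
    ]},
    {"split": "hypertension", "children": [{"leaf": -0.1}, {"leaf": 0.2}]},
]
print(split_counts(trees).most_common())
```

Split counts tend to favour continuous features such as age or BMI, which offer many candidate thresholds, so a weight-based ranking can differ from a gain-based one for the same model.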