Dongmei Pei, Tengfei Yang, Chengpu Zhang.
Abstract
BACKGROUND: Prediction and early diagnosis of diabetes are critical in populations at high risk of the disease, one of the most devastating diseases worldwide. Conventional blood tests are recommended for screening suspected patients; however, these tests can have health side effects and are costly. The goal of this study was to establish a simple and reliable predictive model, based on risk factors associated with diabetes, using a decision tree algorithm.
Keywords: J48 algorithm; decision tree; diabetes; risk factors
Year: 2020 PMID: 33273837 PMCID: PMC7705272 DOI: 10.2147/DMSO.S279329
Source DB: PubMed Journal: Diabetes Metab Syndr Obes ISSN: 1178-7007 Impact factor: 3.168
Figure 1. Flow chart of records that were excluded from the physical examination database of Shengjing Hospital of China Medical University (January–July, 2017).
Characteristics of Variables of the Study Participants
| Variables | Possible Values | Diabetes (N=541) | No Diabetes (N=2913) | p-value |
|---|---|---|---|---|
| Age | 20–34 years old | 118(21.8%) | 1773(60.9%) | <0.001 |
| | 35–49 years old | 230(42.5%) | 796(27.3%) | |
| | 50–65 years old | 193(35.7%) | 344(11.8%) | |
| Gender | Male | 331(61.2%) | 1156(39.7%) | <0.001 |
| | Female | 210(38.8%) | 1757(60.3%) | |
| Marital status | Single/widow/divorced/separated | 125(23.1%) | 417(14.3%) | <0.001 |
| | Married/cohabitation | 416(76.9%) | 2496(85.7%) | |
| Education levels | Junior college* | 79(14.6%) | 329(11.3%) | <0.001 |
| | Undergraduate | 379(70.1%) | 2088(71.7%) | |
| | Graduate | 83(15.3%) | 496(17.0%) | |
| Annual income (USD) | ≤5000 | 186(34.4%) | 887(30.4%) | 0.162 |
| | 5000–10,000 | 292(54.0%) | 1682(57.7%) | |
| | >10,000 | 63(11.6%) | 344(11.8%) | |
| Workweek (hours) | ≤40 | 376(69.5%) | 2031(69.7%) | 0.918 |
| | >40 | 165(30.5%) | 882(30.3%) | |
| BMI | <25 | 278(51.4%) | 2403(82.5%) | <0.001 |
| | ≥25 | 263(48.6%) | 510(17.5%) | |
| History of hypertension | No | 395(73.0%) | 2489(85.4%) | <0.001 |
| | Yes | 146(27.0%) | 424(14.6%) | |
| History of cardiovascular disease or stroke | No | 443(81.9%) | 2555(87.7%) | <0.001 |
| | Yes | 98(18.1%) | 358(12.3%) | |
| History of hyperlipidemia | No | 430(79.5%) | 2514(86.3%) | <0.001 |
| | Yes | 111(20.5%) | 399(13.7%) | |
| Family history of diabetes | No | 308(56.9%) | 2376(81.6%) | <0.001 |
| | Yes | 233(43.1%) | 537(18.4%) | |
| Family history of hypertension | No | 401(74.1%) | 2149(73.8%) | 0.865 |
| | Yes | 140(25.9%) | 764(26.2%) | |
| Family history of cardiovascular disease or stroke | No | 448(82.8%) | 2512(86.2%) | <0.001 |
| | Yes | 93(17.2%) | 401(13.8%) | |
| Family history of hyperlipidemia | No | 414(76.5%) | 2229(76.5%) | 0.998 |
| | Yes | 127(23.5%) | 684(23.5%) | |
| Smoking | No | 370(68.4%) | 2301(79.0%) | <0.001 |
| | Yes | 171(31.6%) | 612(21.0%) | |
| Alcohol consumption | No | 416(76.9%) | 2291(78.6%) | 0.363 |
| | Yes | 125(23.1%) | 622(21.4%) | |
| Tea preference | No | 202(37.3%) | 1217(41.8%) | 0.054 |
| | Yes | 339(62.7%) | 1696(58.2%) | |
| Fruit preference | No | 268(49.5%) | 1376(47.2%) | 0.325 |
| | Yes | 273(50.5%) | 1537(52.8%) | |
| Fish preference | No | 292(54.0%) | 1668(57.3%) | 0.157 |
| | Yes | 249(46.0%) | 1245(42.7%) | |
| Vegetable preference | No | 123(22.7%) | 697(23.9%) | 0.550 |
| | Yes | 418(77.3%) | 2216(76.1%) | |
| Meat preference | No | 108(20.0%) | 608(20.9%) | 0.632 |
| | Yes | 433(80.0%) | 2305(79.1%) | |
| Milk preference | No | 414(76.5%) | 2167(74.4%) | 0.294 |
| | Yes | 127(23.5%) | 746(25.6%) | |
| Sleep duration (hours) | ≤4 | 111(20.5%) | 450(15.4%) | <0.001 |
| | 5–6 | 225(41.6%) | 1163(39.9%) | |
| | >6 | 205(37.9%) | 1300(44.6%) | |
| Physical activity | Less than 30 minutes a day | 305(56.4%) | 1117(38.3%) | <0.001 |
| | 30 minutes or more a day | 236(43.6%) | 1796(61.7%) | |
| Work-related stress | No | 220(40.7%) | 2148(73.7%) | <0.001 |
| | Yes | 321(59.3%) | 765(26.3%) | |
Notes: *Participants who enrolled in a 3-year program.
Abbreviation: BMI, body mass index.
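The p-values in the table compare the diabetes and no-diabetes groups, but the text shown here does not name the test used. As a rough sketch, assuming a Pearson chi-square test without continuity correction, a two-level row can be checked using only the Python standard library. The function name `chi_square_2x2` and the variable names are illustrative, not taken from the study.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table.

    `table` is [[a, b], [c, d]]: rows are factor levels, columns are
    outcome groups. Returns (statistic, p_value) for 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the chi-square tail probability
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Counts copied from the table; columns = (diabetes, no diabetes).
workweek = [[376, 2031], [165, 882]]   # rows: <=40 h, >40 h
bmi = [[278, 2403], [263, 510]]        # rows: BMI <25, BMI >=25

for name, table in (("Workweek", workweek), ("BMI", bmi)):
    stat, p = chi_square_2x2(table)
    print(f"{name}: chi2 = {stat:.3f}, p = {p:.3g}")
```

Under this assumption the workweek row should come out close to the reported 0.918 and the BMI row well below 0.001. Rows with three levels (age, education, income, sleep duration) would need a general chi-square distribution function, which this sketch does not cover.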
Confusion Matrix of Test Dataset
| | Actual outcome | Predicted: person without diabetes | Predicted: person with diabetes |
|---|---|---|---|
| Total dataset | Person without diabetes | 844 | 35 |
| | Person with diabetes | 65 | 90 |
| Accuracy (%) | 90.3 | | |
| Precision (%) | 89.7 | | |
| Recall (%) | 90.3 | | |
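The summary figures under the confusion matrix appear to be class-weighted averages rather than values for the diabetes class alone. The sketch below recomputes them from the four counts under that assumption; the function name `weighted_metrics` is illustrative and is not part of the study's tooling.

```python
# Confusion-matrix counts from the table (rows = actual, columns = predicted).
# Class order: [without diabetes, with diabetes].
matrix = [
    [844, 35],
    [65, 90],
]

def weighted_metrics(m):
    """Return accuracy plus support-weighted precision, recall and F-measure."""
    total = sum(sum(row) for row in m)
    accuracy = sum(m[k][k] for k in range(len(m))) / total
    precision = recall = f_measure = 0.0
    for k in range(len(m)):
        true_pos = m[k][k]
        support = sum(m[k])                    # actual members of class k
        predicted = sum(row[k] for row in m)   # predictions of class k
        p_k = true_pos / predicted if predicted else 0.0
        r_k = true_pos / support if support else 0.0
        f_k = 2 * p_k * r_k / (p_k + r_k) if (p_k + r_k) else 0.0
        weight = support / total               # share of all test records
        precision += weight * p_k
        recall += weight * r_k
        f_measure += weight * f_k
    return accuracy, precision, recall, f_measure

accuracy, precision, recall, f_measure = weighted_metrics(matrix)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f_measure={f_measure:.3f}")
```

With weighting by class size, the results round to 0.903, 0.897 and 0.903, matching the table, and the weighted F-measure rounds to 0.899, the value listed for J48 between recall and AUC in the classifier comparison. The weighting hides a weaker result for the minority class: recall for persons with diabetes alone is 90/155, roughly 0.58.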
The Results of Classification Algorithms
| Model | Accuracy | Precision | Recall | F-measure | AUC |
|---|---|---|---|---|---|
| AdaBoostM1 | 0.901 | 0.893 | 0.901 | 0.894 | 0.866 |
| J48 | 0.903 | 0.897 | 0.903 | 0.899 | 0.872 |
| Logistic | 0.897 | 0.887 | 0.897 | 0.887 | 0.838 |
| Naïve Bayes | 0.885 | 0.878 | 0.885 | 0.888 | 0.833 |
| Bayes Net | 0.884 | 0.880 | 0.884 | 0.882 | 0.843 |
Notes: AUC: the area under the receiver operating characteristic (ROC) curve.
Figure 2. Graphical representation of the decision tree model of the dataset. Sample sizes are given in brackets for each node.
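J48 is an implementation of the C4.5 algorithm, which ranks candidate splits by gain ratio (information gain divided by the entropy of the split itself). The sketch below shows that criterion applied to two attributes, using the whole-sample counts from the characteristics table. It is an illustration only: the published tree was fitted on a training subset, so these numbers are not the values J48 computed internally, and the helper names are hypothetical.

```python
import math

def entropy(counts):
    """Shannon entropy in bits of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def gain_ratio(branches):
    """C4.5-style gain ratio for one candidate split.

    `branches` holds one [diabetes, no_diabetes] pair per attribute value.
    """
    n = sum(sum(b) for b in branches)
    # Entropy of the class labels before splitting.
    parent = entropy([sum(b[k] for b in branches) for k in range(2)])
    # Size-weighted entropy of the class labels after splitting.
    children = sum(sum(b) / n * entropy(b) for b in branches)
    gain = parent - children
    # Entropy of the branch sizes; penalises splits into many small groups.
    split_info = entropy([sum(b) for b in branches])
    return gain / split_info if split_info else 0.0

# Whole-sample counts; each pair is [diabetes, no diabetes].
bmi = [[278, 2403], [263, 510]]                    # BMI <25, BMI >=25
family_hyperlipidemia = [[414, 2229], [127, 684]]  # No, Yes

print(f"BMI gain ratio: {gain_ratio(bmi):.4f}")
print(f"Family history of hyperlipidemia gain ratio: "
      f"{gain_ratio(family_hyperlipidemia):.6f}")
```

An attribute whose class proportions barely differ between groups, such as family history of hyperlipidemia, scores close to zero, while BMI scores markedly higher, which is in line with BMI appearing as the first condition in every rule.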
A List of the 20 Rules Used for Constructing the Decision Tree
| Rule 1: IF BMI≤25 and sleep time>6 hours, THEN patient without diabetes (723/731 or 98.9%) |
| Rule 2: IF BMI≤25 and sleep time≤6 hours, without stress, THEN patient without diabetes (933/987 or 94.5%) |
| Rule 3: IF BMI≤25 and sleep time≤6 hours, with stress, single and age≤34, THEN patient without diabetes (3/3 or 100%) |
| Rule 4: IF BMI≤25 and sleep time≤6 hours, with stress, single marital status and age>34, THEN patient with diabetes (23/26 or 88.5%) |
| Rule 5: IF BMI≤25 and sleep time≤6 hours, with stress, married and junior education level, THEN patient with diabetes (4/5 or 80%) |
| Rule 6: IF BMI≤25 and sleep time≤6 hours, with stress, married and undergraduate or graduate education level, THEN patient without diabetes (47/54 or 87%) |
| Rule 7: IF BMI>25, with activity, THEN patient without diabetes (179/188 or 95.2%) |
| Rule 8: IF BMI>25, without activity, with sleep time≤6 hours, negative family history of diabetes, graduate education level and history of hyperlipidemia, THEN patient with diabetes (28/29 or 96.6%) |
| Rule 9: IF BMI>25, without activity, with sleep time≤6 hours, negative family history of diabetes, graduate education level and negative history of hyperlipidemia, THEN patient without diabetes (3/3 or 100%) |
| Rule 10: IF BMI>25, without activity, with sleep time≤4 hours, negative family history of diabetes, undergraduate or junior education level, negative history of cardiovascular, age>34 and male, THEN patient with diabetes (7/9 or 77.8%) |
| Rule 11: IF BMI>25, without activity, with sleep time≤4 hours, negative family history of diabetes, undergraduate or junior education level, negative history of cardiovascular, age>34 and female, THEN patient without diabetes (2/2 or 100%) |
| Rule 12: IF BMI>25, without activity, with 4<sleep time≤6 hours, negative family history of diabetes, undergraduate or junior education level, negative history of cardiovascular, and age>34, THEN patient without diabetes (8/10 or 80%) |
| Rule 13: IF BMI>25, without activity, with sleep time≤6 hours, negative family history of diabetes, undergraduate or junior education level, negative history of cardiovascular and age≤34, THEN patient without diabetes (21/22 or 95.5%) |
| Rule 14: IF BMI>25, without activity, with sleep time≤6 hours, negative family history of diabetes, undergraduate or junior education level, negative history of cardiovascular and positive history of hypertension, THEN patient with diabetes (36/41 or 87.8%) |
| Rule 15: IF BMI>25, without activity, with sleep time≤6 hours, negative family history of diabetes, undergraduate or junior education level, positive history of cardiovascular and negative history of hypertension, THEN patient without diabetes (5/5 or 100%) |
| Rule 16: IF BMI>25, without activity, with sleep time≤6 hours, positive history of diabetes and positive history of cardiovascular, THEN patient with diabetes (94/96 or 97.9%) |
| Rule 17: IF BMI>25, without activity, with sleep time≤6 hours, negative family history of diabetes, without stress and married, THEN patient without diabetes (7/8 or 87.5%) |
| Rule 18: IF BMI>25, without activity, with sleep time≤6 hours, negative family history of diabetes, without stress and single, THEN patient with diabetes (19/22 or 86.4%) |
| Rule 19: IF BMI>25, without activity, with sleep time≤6 hours, negative family history of diabetes, and stress, THEN patient with diabetes (40/44 or 90.9%) |
| Rule 20: IF BMI>25, without activity, with sleep time>6 hours, THEN patient without diabetes (96/103 or 93.2%) |
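The rules above read naturally as nested conditionals. As a partial sketch, the function below encodes only the four shallow rules (1, 2, 7 and 20) that depend on BMI, sleep time, stress and activity, and returns `None` wherever a deeper rule would be needed. The function name, the parameter names and the reading of "with activity" as 30 minutes or more of physical activity a day are assumptions made for illustration.

```python
from typing import Optional

def classify_shallow(bmi: float, sleep_hours: float,
                     stressed: bool, active: bool) -> Optional[str]:
    """Apply Rules 1, 2, 7 and 20; return None when a deeper rule is needed."""
    if bmi <= 25:
        if sleep_hours > 6:
            return "without diabetes"   # Rule 1
        if not stressed:
            return "without diabetes"   # Rule 2
        return None                     # Rules 3-6: marital status, age, education
    if active:
        return "without diabetes"       # Rule 7
    if sleep_hours > 6:
        return "without diabetes"       # Rule 20
    return None                         # Rules 8-19: further history attributes

# Hypothetical inputs, not records from the study.
print(classify_shallow(bmi=23.0, sleep_hours=7.0, stressed=True, active=False))
print(classify_shallow(bmi=28.0, sleep_hours=5.0, stressed=True, active=False))
```

The first call falls under Rule 1; the second reaches the region covered by Rules 8 to 19 and returns `None`. All four shallow rules end in "without diabetes", so every "with diabetes" prediction in the list requires at least one further attribute.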