Byoung Geol Choi, Seung Woon Rha, Suhng Wook Kim, Jun Hyuk Kang, Ji Young Park, Yung Kyun Noh.
Abstract
PURPOSE: Many studies have proposed predictive models for type 2 diabetes mellitus (T2DM). However, these models are limited in areas such as user convenience and reproducibility. The purpose of this study was to develop a T2DM predictive model using electronic medical records (EMRs) and machine learning, and to compare its performance with that of traditional statistical methods.
Keywords: Type 2 diabetes mellitus; big data; diabetes; machine learning; prediction
Year: 2019 PMID: 30666841 PMCID: PMC6342710 DOI: 10.3349/ymj.2019.60.2.191
Source DB: PubMed Journal: Yonsei Med J ISSN: 0513-5796 Impact factor: 2.759
Fig. 1. Study flow chart. KUGH: Korea University Guro Hospital; EMR: electronic medical record.
Baseline Characteristics and Relative Risk Analysis for New-Onset Type 2 Diabetes Mellitus up to 5-Year Follow-up
| Features | Total (n=8454) | T2DM (n=404) | Non-DM (n=8050) | p value | Relative risk (95% CI) |
|---|---|---|---|---|---|
| Sex, male | 3970 (46.9) | 208 (51.4) | 3762 (46.7) | 0.062 | 1.20 (0.99–1.47) |
| Age (yr) | 53.9±14.1 | 60.8±11.4 | 53.5±14.1 | <0.001 | 1.04 (1.03–1.05) |
| Hypertension | 3644 (43.1) | 242 (59.9) | 3402 (42.2) | <0.001 | 2.04 (1.66–2.50) |
| CAD | 948 (11.2) | 77 (19.0) | 871 (10.8) | <0.001 | 1.94 (1.49–2.51) |
| Prior MI | 226 (2.6) | 10 (2.4) | 216 (2.6) | 0.800 | 0.92 (0.48–1.74) |
| Prior PCI | 463 (5.4) | 44 (10.8) | 419 (5.2) | <0.001 | 2.22 (1.60–3.09) |
| Dyslipidemia | 377 (4.4) | 28 (6.9) | 349 (4.3) | 0.014 | 1.64 (1.10–2.44) |
| Stroke | 832 (9.8) | 82 (20.2) | 750 (9.3) | <0.001 | 2.47 (1.92–3.19) |
| Chronic kidney disease | 42 (0.4) | 2 (0.4) | 40 (0.4) | 0.996 | 0.99 (0.23–4.13) |
| CKD-MDRD stage | | | | <0.001 | 1.38 (1.23–1.57) |
| Stage 0 | 4163 (49.2) | 161 (39.8) | 4002 (49.7) | ||
| Stage 1 | 3810 (45.0) | 199 (49.2) | 3611 (44.8) | ||
| Stage 2 | 350 (4.1) | 28 (6.9) | 322 (4.0) | ||
| Stage 3 | 89 (1.0) | 14 (3.4) | 75 (0.9) | ||
| Stage 4 | 29 (0.3) | 2 (0.4) | 27 (0.3) | ||
| Stage 5 | 13 (0.1) | 0 (0.0) | 13 (0.1) | ||
| Hyperuricemia | 621 (7.3) | 50 (12.3) | 571 (7.0) | <0.001 | 1.85 (1.35–2.51) |
| Atrial fibrillation | 283 (3.3) | 20 (5.0) | 263 (3.3) | 0.066 | 1.54 (0.96–2.45) |
| A1c (%) | 5.51±0.30 | 5.69±0.29 | 5.50±0.30 | <0.001 | 11.5 (7.69–17.4) |
| Glucose (mg/dL) | 92.8±8.35 | 96.4±8.5 | 92.6±8.3 | <0.001 | 1.06 (1.05–1.08) |
| Medications | |||||
| ARB | 1827 (21.6) | 162 (40.0) | 1665 (20.6) | <0.001 | 2.56 (2.08–3.15) |
| ACEI | 579 (6.8) | 39 (9.6) | 540 (6.7) | 0.022 | 1.48 (1.05–2.09) |
| Diuretic | 1641 (19.4) | 164 (40.5) | 1477 (18.3) | <0.001 | 3.04 (2.47–3.73) |
| β-blockers | |||||
| Selective | 620 (7.3) | 54 (13.3) | 566 (7.0) | <0.001 | 2.04 (1.51–2.75) |
| Non-selective | 871 (10.3) | 90 (22.2) | 781 (9.7) | <0.001 | 2.66 (2.08–3.41) |
| CCB | |||||
| DHP | 1680 (19.8) | 137 (33.9) | 1543 (19.1) | <0.001 | 2.16 (1.74–2.67) |
| Non-DHP | 1023 (12.1) | 79 (19.5) | 944 (11.7) | <0.001 | 1.82 (1.41–2.36) |
| Nitrate | 1632 (19.3) | 132 (32.6) | 1500 (18.6) | <0.001 | 2.11 (1.70–2.62) |
| Aspirin | 88 (1.0) | 10 (2.4) | 78 (0.9) | 0.009 | 2.59 (1.33–5.04) |
| Clopidogrel | 814 (9.6) | 96 (23.7) | 718 (8.9) | <0.001 | 3.18 (2.49–4.05) |
| Cilostazol | 290 (3.4) | 32 (7.9) | 258 (3.2) | <0.001 | 2.59 (1.77–3.80) |
| Warfarin | 181 (2.1) | 22 (5.4) | 159 (1.9) | <0.001 | 2.85 (1.80–4.51) |
| PPI | 103 (1.2) | 14 (3.4) | 89 (1.1) | <0.001 | 3.21 (1.81–5.69) |
| Statin | 1605 (18.9) | 150 (37.1) | 1455 (18) | <0.001 | 2.67 (2.17–3.30) |
T2DM, type 2 diabetes mellitus; CI, confidence interval; CAD, coronary artery disease; MI, myocardial infarction; CKD-MDRD, chronic kidney disease–the modification of diet in renal disease; PCI, percutaneous coronary intervention; ARB, angiotensin receptor blockers; ACEI, angiotensin-converting enzyme inhibitors; CCB, calcium channel blockers; DHP, dihydropyridine; PPI, proton pump inhibitors.
Variables are expressed as mean±standard deviation or number (percentage).
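The effect estimates in the table above can be sanity-checked from the tabulated counts. The sketch below is not the authors' code: it assumes a plain 2×2 table and a Woolf (log-scale) 95% confidence interval, and the function names are illustrative. For the hypertension row, the odds-ratio form gives 2.04 (1.66–2.50), the same as the tabulated value, while the crude risk ratio gives about 1.97. Whether every row was computed this way is an assumption.

```python
from math import exp, log, sqrt

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval.

    a: cases with the feature      b: cases without the feature
    c: non-cases with the feature  d: non-cases without the feature
    """
    or_ = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    return or_, exp(log(or_) - z * se), exp(log(or_) + z * se)

def risk_ratio(a, b, c, d):
    """Risk of the outcome with the feature divided by risk without it."""
    return (a / (a + c)) / (b / (b + d))

# Hypertension row: 242 of 404 T2DM patients and 3402 of 8050 non-DM patients.
a, c = 242, 3402
b, d = 404 - a, 8050 - c

or_, lo, hi = odds_ratio_ci(a, b, c, d)
rr = risk_ratio(a, b, c, d)
print(f"OR = {or_:.2f} ({lo:.2f}-{hi:.2f}), risk ratio = {rr:.2f}")
```

The same two functions apply to any binary row of the table; continuous rows such as age or A1c need a regression model and are not covered by this sketch.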
Fig. 2. Selection of features for type 2 diabetes mellitus prediction model generation using ‘Information Gain Attribute Evaluation.’ CAD, coronary artery disease; CKD-MDRD, chronic kidney disease–the modification of diet in renal disease; PCI, percutaneous coronary intervention; ARB, angiotensin receptor blockers; ACEI, angiotensin-converting enzyme inhibitors; CCB, calcium channel blockers; DHP, dihydropyridine; BB, beta blockers.
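Information gain ranks a feature by how much knowing it reduces uncertainty about the outcome: IG = H(Y) − H(Y | X). The sketch below is a minimal illustration for binary features, using counts from the baseline table. It is not the evaluator the authors ran, the function names are hypothetical, and the handling of continuous features (which need discretization) is left out.

```python
from math import log2

def entropy(pos, neg):
    """Shannon entropy in bits of a binary outcome, given class counts."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:  # a zero count contributes nothing to the entropy
            p = count / total
            h -= p * log2(p)
    return h

def information_gain(pos_with, neg_with, pos_without, neg_without):
    """H(Y) - H(Y | X) for a binary feature X and a binary outcome Y."""
    n_with = pos_with + neg_with
    n_without = pos_without + neg_without
    n = n_with + n_without
    h_y = entropy(pos_with + pos_without, neg_with + neg_without)
    h_y_given_x = ((n_with / n) * entropy(pos_with, neg_with)
                   + (n_without / n) * entropy(pos_without, neg_without))
    return h_y - h_y_given_x

# Hypertension: 242 of 404 T2DM patients and 3402 of 8050 non-DM patients.
ig_htn = information_gain(242, 3402, 404 - 242, 8050 - 3402)
# Prior MI: 10 of 404 T2DM patients and 216 of 8050 non-DM patients.
ig_mi = information_gain(10, 216, 404 - 10, 8050 - 216)

print(f"hypertension: {ig_htn:.5f} bits, prior MI: {ig_mi:.6f} bits")
```

A feature whose prevalence is nearly the same in both groups, such as prior MI here, yields a gain close to zero and would rank low under this criterion.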
Fig. 3. ROC analysis of the cross-validation tests ranging from 0 to 30 quartile according to the learning model. Change in AUC (A) and amount of change in AUC (B). ROC, receiver-operating characteristic; AUC, area under the curve; KNN, K-nearest neighbor.
Performance Evaluation of the Predictive Model for New-Onset Type 2 Diabetes Mellitus up to 5-Year Follow-up based on Hyper-Parameters (Number of Neighbors, Distance Measurement Method) in K-Nearest Neighbor Algorithm Learning Model
| No. neighbors | Area under the curve according to the distance metrics | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| | City block | Euclidean | Cosine | Minkowski | Mahalanobis | Hamming | Jaccard | Correlation | Spearman | Chebyshev |
| 1 | 0.53 | 0.53 | 0.53 | 0.52 | 0.53 | 0.53 | 0.53 | 0.53 | 0.53 | 0.53 |
| 10 | 0.65 | 0.64 | 0.65 | 0.64 | 0.63 | 0.63 | 0.63 | 0.64 | 0.62 | 0.63 |
| 100 | 0.76 | 0.75 | 0.75 | 0.74 | 0.73 | 0.75 | 0.75 | 0.74 | 0.74 | 0.72 |
| 200 | 0.77 | 0.76 | 0.76 | 0.75 | 0.75 | 0.75 | 0.75 | 0.74 | 0.73 | 0.72 |
| 300 | 0.77 | 0.77 | 0.76 | 0.76 | 0.76 | 0.75 | 0.75 | 0.74 | 0.73 | 0.71 |
| 500 | 0.77 | 0.77 | 0.77 | 0.77 | 0.76 | 0.75 | 0.75 | 0.72 | 0.73 | 0.71 |
| 1000 | 0.77 | 0.77 | 0.77 | 0.77 | 0.77 | 0.75 | 0.75 | 0.70 | 0.72 | 0.71 |
The performance evaluation of the prediction model is based on the results of Fig. 3, and a 10-fold cross-validation test was applied.
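The grid in the table above (number of neighbors × distance metric, scored by cross-validated AUC) can be sketched in plain Python. Everything here is an illustrative assumption rather than the study's pipeline: the data are synthetic, only two of the ten metrics are implemented, the helper names are hypothetical, and the AUC is computed on predictions pooled across the 10 folds.

```python
import random

def cityblock(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def knn_score(train_X, train_y, x, k, dist):
    """Fraction of positive labels among the k nearest training points."""
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def auc(scores, labels):
    """Probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def cross_val_auc(X, y, k, dist, n_folds=10, seed=0):
    """Pooled AUC from n-fold cross-validation of a KNN scorer."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    scores, labels = [], []
    for f, test_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        for i in test_idx:
            scores.append(knn_score(train_X, train_y, X[i], k, dist))
            labels.append(y[i])
    return auc(scores, labels)

# Synthetic stand-in data: two features, positives shifted by one unit.
rng = random.Random(42)
X, y = [], []
for _ in range(300):
    label = 1 if rng.random() < 0.3 else 0
    X.append([rng.gauss(label, 1.0), rng.gauss(label, 1.0)])
    y.append(label)

for k in (1, 10, 50):
    for name, dist in (("city block", cityblock), ("Euclidean", euclidean)):
        print(k, name, round(cross_val_auc(X, y, k, dist), 2))
```

With a single neighbor the score is binary, so the ROC curve has only one operating point. This is one plausible reason AUC values near 0.5 appear in the first row of the table and rise as more neighbors are averaged.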
Fig. 4. 10-fold cross-validation test of the predictive models of type 2 diabetes mellitus. KNN, K-nearest neighbor; AUC, area under the curve.