Hao Yang1, Jiaxi Li2, Siru Liu3, Xiaoling Yang4, Jialin Liu1,5.
Abstract
BACKGROUND: Hypoglycemia is a common adverse event in the treatment of diabetes. To efficiently cope with hypoglycemia, effective hypoglycemia prediction models need to be developed.
Keywords: EHR; XGBoost; diabetes; electronic health record; hypoglycemia; learning; machine learning model; natural language processing; type 2 diabetes
Year: 2022 PMID: 35708754 PMCID: PMC9247813 DOI: 10.2196/36958
Source DB: PubMed Journal: JMIR Med Inform
Figure 1. The patient selection process.
Statistics of missing values (N=29,843).
| Features | Missing data, n (%) |
| Red blood cell count | 1860 (6.2) |
| Hemoglobin | 1858 (6.2) |
| Blood platelet count | 1883 (6.3) |
| White blood cell count | 1858 (6.2) |
| Total protein | 1791 (6.0) |
| Albumin | 1768 (5.9) |
| Globulin | 1812 (6.1) |
| Urea | 1755 (5.9) |
| Alanine aminotransferase | 1821 (6.1) |
| Aspartate aminotransferase | 1809 (6.1) |
| Cholesterol | 2126 (7.1) |
| High-density lipoprotein | 2128 (7.1) |
| Low-density lipoprotein | 2131 (7.1) |
| Sodium | 1516 (5.1) |
| Chlorine | 1585 (5.3) |
| Thrombin time | 3970 (13.3) |
| Creatinine | 1749 (5.9) |
| Uric acid | 1769 (5.9) |
| C-reactive protein | 18,249 (61.1) |
| Procalcitonin | 20,101 (67.3) |
| Glycosylated hemoglobin or HbA1cᵃ | 14,410 (48.3) |
| Prothrombin time | 3725 (12.5) |
| Activated partial thromboplastin time | 3779 (12.7) |
ᵃHbA1c: glycated hemoglobin.
Figure 2. Variable importance weights. ALT: alanine aminotransferase; APIT: activated partial thromboplastin time; AST: aspartate aminotransferase; CRP: C-reactive protein; DBP: diastolic blood pressure; FIB: fibrinogen; GLB: globulin; GLU: glucose; HB: hemoglobin; HbA1c: glycated hemoglobin; HDL: high-density lipoprotein; hr: heart rate; iod-Nateglinide: iodine urea and Nateglinide; LDL: low-density lipoprotein; p: pulse; PCT: procalcitonin; PLT: blood platelet count; PT: prothrombin time; r: respiratory rate; RBC: red blood cell count; SBP: systolic pressure; t: body temperature; TP: total protein; TT: thrombin time; UA: uric acid; WBC: white blood cell count. (A) Accuracy as a function of the number of features. (B) Variable importance weights (when accuracy reaches 90%).
Figure 3. Training process of the Paragraph Vector–Distributed Memory (PV-DM) model.
Demographics of patients with diabetes (N=29,843).
| Variables | Normoglycemia (blood glucose >3.9 mmol/L; n=27,039) | Hypoglycemia (blood glucose <3.9 mmol/L; n=2804) | P value |
| Sex, n (%) |  |  | .002 |
| Female | 9479 (35.1) | 1065 (38) |  |
| Male | 17,560 (64.9) | 1739 (62) |  |
| Age (years), mean (SD; range) | 64.2 (12.3; 18-104) | 64.8 (12.6; 19-98) | .03 |
| BMI, mean (SD) | 24.3 (4.26) | 23.6 (5.24) | <.001 |
|  |  |  | <.001 |
| No | 19,733 (73) | 1229 (43.8) |  |
| Yes | 7306 (27) | 1575 (56.2) |  |
|  |  |  | <.001 |
| No | 17,766 (65.7) | 1422 (50.7) |  |
| Yes | 9273 (34.3) | 1382 (49.3) |  |
Figure 4. Weights of the variables in the different models. ALT: alanine aminotransferase; APIT: activated partial thromboplastin time; AST: aspartate aminotransferase; CRP: C-reactive protein; DBP: diastolic blood pressure; FIB: fibrinogen; GLB: globulin; GLU: glucose; HB: hemoglobin; HDL: high-density lipoprotein; hr: heart rate; iod-Nateglinide: iodine urea and Nateglinide; LDL: low-density lipoprotein; p: pulse; PCT: procalcitonin; PLT: blood platelet count; PT: prothrombin time; r: respiratory rate; RBC: red blood cell count; SBP: systolic pressure; t: body temperature; TP: total protein; TT: thrombin time; UA: uric acid; WBC: white blood cell count.
Accuracy and area under the receiver operating characteristic curve (AUC) of different models.
| Model | Embedding method | AUC, mean (SD) | Accuracy, mean (SD) | P value |
| XGBoost | XGBoost | 0.718 (0.0014) | 0.892 (0.002) | N/Aᵃ |
| XGBoost1 | XGBoost+CCᵇ | 0.785 (0.0012) | 0.919 (0.002) | <.001 vs XGBoost |
| XGBoost2 | XGBoost+CC+HPIᶜ | 0.817 (0.0023) | 0.928 (0.001) | <.001 vs XGBoost; <.001 vs XGBoost1 |
| XGBoost3 | XGBoost+CC+HPI+FHᵈ | 0.822 (0.0024) | 0.934 (0.002) | <.001 vs XGBoost; <.001 vs XGBoost1; <.001 vs XGBoost2 |
ᵃN/A: not applicable.
ᵇCC: chief complaints.
ᶜHPI: history of present illness.
ᵈFH: family history.
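The two metrics reported in the table above can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration. AUC is computed here in its rank-based (Mann-Whitney) form, that is, the probability that a randomly chosen hypoglycemia case receives a higher score than a randomly chosen normoglycemia case.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outranks a negative.

    A tie between a positive and a negative score counts as half a win
    (the Mann-Whitney U formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (1 = hypoglycemia, 0 = normoglycemia).
y_true = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.6, 0.3, 0.7, 0.4]

# Hypothetical 0.5 decision threshold; the source does not state the one used.
y_pred = [int(s >= 0.5) for s in scores]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"AUC      = {roc_auc(y_true, scores):.3f}")
```

Accuracy depends on the chosen threshold, whereas AUC depends only on how the scores rank the two classes, which is one reason the two columns can move by different amounts between models.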
Figure 5. Comparison of the receiver operating characteristic (ROC) and decision curve analysis (DCA) curves of the different models. (A) The ROC curves of the 4 models. (B) The DCA curves of the 4 models.
Figure 6. The confusion matrix of XGBoost3.