Yuxi Li, Tak-Ming Chan, Jinghan Feng, Liang Tao, Jie Jiang, Bo Zheng, Yong Huo, Jianping Li.
Abstract
BACKGROUND: Clinical data repositories (CDRs), including electronic health record (EHR) data, have great potential for outcome prediction and risk modeling. We built a prediction tool integrated with a CDR based on pattern discovery and demonstrated it in a case study on contrast-related acute kidney injury (AKI).
Keywords: Acute kidney injury; Machine learning; Pattern discovery; Predictive tool
Year: 2022 PMID: 35428291 PMCID: PMC9013021 DOI: 10.1186/s12911-022-01841-6
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 3.298
Fig. 1 Pattern discovery based visual analytics tool using the in-mode of knowledge incorporation: the age variable in the pattern is changed by the clinician on the fly. Note that all prediction metrics are for the training data. The left panel displays the pattern; the modified variable Age [AKI], highlighted in blue, illustrates the incorporation of the clinician's domain knowledge (in-mode). The prediction target (AKI-Yes) is shown at the top of the pattern. Pattern variables are connected by arcs indicating statistical significance under the chi-square test of independence [13]. A click on a variable removes that attribute (dimming the blue vertical bar). A click on "Add attribute" shows a pop-up list of variables that users can add. The top right panel shows the training predictive metrics of the pattern once "Update pattern" is clicked. The bottom right panel shows the pattern history summary, where the last pattern is generated in the pre-mode. A click on "Export results" exports the current pattern for post-mode refinement.
Training and testing statistics of the AKI case study
| Variable | Training (N = 1791) | Testing (N = 769) | P value |
|---|---|---|---|
| Age, mean (SD), years | 64.37 (11.07) | 64.21 (11.00) | 0.742 |
| Age (> 60)*, n (%) | 1104 (61.6%) | 488 (63.5%) | 0.413 |
| Male, n (%) | 1189 (66.4%) | 516 (67.1%) | 0.769 |
| Anemia, n (%) | 33 (1.8%) | 27 (3.5%) | 0.016 |
| Diabetes, n (%) | 783 (43.7%) | 345 (44.8%) | 0.623 |
| Heart Failure, n (%) | 127 (7.1%) | 63 (8.2%) | 0.372 |
| Hypotension, n (%) | 20 (1.1%) | 8 (1.0%) | 0.970 |
| MI history, n (%) | 127 (7.1%) | 49 (6.4%) | 0.566 |
| Hypercholesterolemia, n (%) | 1542 (86.1%) | 683 (88.8%) | 0.071 |
| Urgent PCI, n (%) | 204 (11.4%) | 114 (14.8%) | 0.019 |
| Hypertension, n (%) | 1251 (69.8%) | 553 (71.9%) | 0.612 |
| IABP, n (%) | 4 (0.5%) | 12 (0.7%) | 0.867 |
| Contrast volume, mean (SD), mL | 135.23 (71.17) | 124.46 (63.90) | < 0.001 |
| GFR, mean (SD), ml/min | 77.76 (26.44) | 82.56 (26.86) | 0.248 |
| HDL-C, mean (SD), mmol/L | 1.02 (0.26) | 1.02 (0.25) | 0.784 |
| Pre peak creatinine, mean (SD), μmol/L | 109.78 (18.80) | 106.76 (19.58) | 0.558 |
| LVEF, mean (SD), % | 66.27 (11.37) | 66.36 (10.96) | 0.841 |
| LVEF (≤ 45%)*, n (%) | 103 (5.7%) | 44 (5.7%) | 0.949 |
| AKI, n (%) | 149 (8.3%) | 40 (5.2%) | 0.007 |
SD, standard deviation; MI, myocardial infarction; PCI, percutaneous coronary intervention; IABP, intra-aortic balloon pump; GFR, glomerular filtration rate; HDL-C, high density lipoprotein cholesterol; LVEF, left ventricular ejection fraction; AKI, acute kidney injury
*Categorized versions to illustrate training–testing consistency of the variables even after categorization
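P values for the categorical rows of such a comparison can be obtained with a chi-square test of independence on a 2 × 2 table (group by outcome). The sketch below is an illustration only: the source does not state which test produced the P value column, and the function name `chi_square_2x2` and the use of Yates' continuity correction are assumptions. It is applied to the AKI row (149 of 1791 in training, 40 of 769 in testing).

```python
import math


def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square test of independence for the 2x2 table [[a, b], [c, d]].

    Hypothetical helper for illustration; returns (statistic, p_value).
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))

    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                # Continuity correction (an assumption, not stated in the source)
                diff = max(0.0, diff - 0.5)
            stat += diff * diff / expected

    # For 1 degree of freedom the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Rows: training, testing; columns: AKI yes, AKI no
stat, p = chi_square_2x2(149, 1791 - 149, 40, 769 - 40)
print(f"chi-square = {stat:.2f}, P = {p:.4f}")
```

With the continuity correction, the P value for this row comes out close to the 0.007 listed in the table. Continuous rows (means and SDs) would need a different test, which is not sketched here.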
Fig. 2 Risk factor importance from Random Forest for AKI. PCI, percutaneous coronary intervention; HDL, high density lipoprotein cholesterol; IABP, intra-aortic balloon pump; AKI, acute kidney injury
Testing results of the three knowledge incorporation models in comparison with other risk scores and machine learning methods
| Model | AUC | Sensitivity | Specificity | F-score | G-mean |
|---|---|---|---|---|---|
| (1) Pre-mode | 0.77 | 0.57 | 0.17 | 0.69 | |
| (2) In-mode | 0.80 | 0.70 | 0.80 | 0.26 | |
| (3) Post-mode | 0.60 | 0.88 | 0.73 | | |
| Mehran’s (> 7.8) | 0.70 | 0.24 | 0.20 | 0.47 | |
| Chen's (≥ 13) | 0.72 | 0.42 | 0.88 | 0.24 | 0.61 |
| Gao's (> 5) | 0.67 | 0.34 | 0.29 | 0.57 | |
| AGEF (≥ 0.66) | 0.62 | 0.37 | 0.88 | 0.21 | 0.57 |
| Logistic regression | 0.59 | 0.84 | 0.33 | 0.12 | 0.53 |
| Decision tree | 0.58 | 0.61 | 0.55 | 0.12 | 0.58 |
| Random forest | 0.64 | 0.58 | 0.72 | 0.17 | 0.64 |
| Easy ensemble | 0.70 | 0.61 | 0.79 | 0.23 | 0.69 |
The evaluation metrics are defined as follows:
Specificity = TN/(TN + FP); Sensitivity = TP/(TP + FN); Precision = TP/(TP + FP); F-score = 2 × Precision × Recall/(Precision + Recall) if TP > 0, and 0 if TP = 0, where Recall is identical to Sensitivity. TP, FP, TN and FN are the counts of true positives, false positives, true negatives and false negatives, respectively
AUC, area under the curve; AGEF, Age, Glomerular filtration rate and Ejection Fraction
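The metric definitions above can be written as a small function. This is a minimal sketch, not the authors' code: the name `classification_metrics` is hypothetical, and because the footnote does not define G-mean, the sketch assumes the usual geometric mean of sensitivity and specificity, which agrees with the fully populated rows of the table. The confusion-matrix counts in the example are hypothetical; they only respect the 40 AKI and 729 non-AKI cases of the testing set and are not reported in the source.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute the footnote's metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # also called recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    # F-score is defined as 0 when there are no true positives
    if tp > 0:
        f_score = 2 * precision * sensitivity / (precision + sensitivity)
    else:
        f_score = 0.0

    # Assumption: G-mean = sqrt(sensitivity * specificity)
    g_mean = math.sqrt(sensitivity * specificity)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": f_score,
        "g_mean": g_mean,
    }


# Hypothetical counts for a testing set with 40 positives and 729 negatives
example = classification_metrics(tp=28, fp=146, tn=583, fn=12)
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

With so few positive cases, precision and therefore F-score stay low even when sensitivity and specificity are both reasonable, which is why the table also reports G-mean.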