Stefan Hegselmann, Christian Ertmer, Thomas Volkert, Antje Gottschalk, Martin Dugas, Julian Varghese.
Abstract
Background: Intensive care unit (ICU) readmissions are associated with mortality and poor outcomes. To improve discharge decisions, machine learning (ML) could help identify patients at risk of ICU readmission. However, because many models are black boxes, dangerous properties may remain unnoticed, and widely used post hoc explanation methods have inherent limitations. Few studies evaluate inherently interpretable ML models for health care or involve clinicians in inspecting the trained model.
Keywords: artificial intelligence; doctor-in-the-loop; explainable AI; human evaluation; intensive care unit; interpretable machine learning; machine learning; readmission
Year: 2022 PMID: 36082270 PMCID: PMC9445989 DOI: 10.3389/fmed.2022.960296
Source DB: PubMed Journal: Front Med (Lausanne) ISSN: 2296-858X
Figure 1. Flowchart of the study. (A) We created a local cohort for the development of machine learning (ML) models. Information on intensive care unit (ICU) transfers was extracted from the hospital information system (HIS), and ICU data was extracted from the patient data management system (PDMS). Extensive preprocessing was applied to clean the data. We generated labels for 3 day ICU readmission and descriptive statistics as features. (B) Four ML models were developed for comparison. For LR, we also performed feature selection. The RNN directly uses the time series data. (C) The development of the EBM model involved four steps [see 1–4 in (C)]. We conducted parameter tuning for EBM (and our other models) and performed greedy risk function selection based on the importance determined on the temporal splits. In step 3, we inspected the model with a team of clinicians to identify and remove problematic risk functions. The remaining risk functions were used for the predictions. (D) We evaluated all models for their area under the precision-recall curve (PR-AUC) and area under the receiver operating characteristic curve (ROC-AUC) on the hold-out split. (E) External validation for the EBM and GBM models was performed on the Medical Information Mart for Intensive Care (MIMIC) version IV. (D,E) Error bars were determined with the standard deviation on five temporal splits. EBM, explainable boosting machine; SAPS II, Simplified Acute Physiology Score II; LR, logistic regression; GBM, gradient boosting machine; RNN, recurrent neural network.
Figure 2. Flowchart of the cohort selection for the University Hospital Münster (UKM) cohort. Transfers to ICU and IMC wards of the UKM between 2006 and 2019 served as initial data. We included four ICUs managed by the ANIT-UKM department. Transfers had to be merged using a manual procedure to obtain consecutive ICU stays. Patients who died in the ICU and those who were discharged to an external ICU or IMC were excluded. We required an observation period of at least 3 days to ensure readmission to an ICU in the UKM. Lastly, implausible cases were removed.
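The 3 day ICU readmission label mentioned in Figure 1 can be illustrated with a minimal sketch. It assumes that consecutive ICU stays have already been merged per patient and that a stay counts as positive when the same patient's next ICU admission starts within 3 days of discharge. The record layout (`patient_id`, `admit`, `discharge`) and the example dates are hypothetical and do not come from the study.

```python
from datetime import datetime, timedelta

# Window taken from the "3 day ICU readmission" label in the text.
READMISSION_WINDOW = timedelta(days=3)


def label_readmissions(stays):
    """Return one 0/1 label per stay, in input order.

    `stays` is a list of dicts with the hypothetical keys
    'patient_id', 'admit', and 'discharge' (datetime objects).
    """
    # Group stay indices by patient so each patient is handled separately.
    indices_by_patient = {}
    for idx, stay in enumerate(stays):
        indices_by_patient.setdefault(stay["patient_id"], []).append(idx)

    labels = [0] * len(stays)
    for indices in indices_by_patient.values():
        # Order each patient's stays chronologically by admission time.
        indices.sort(key=lambda i: stays[i]["admit"])
        for current, following in zip(indices, indices[1:]):
            gap = stays[following]["admit"] - stays[current]["discharge"]
            # Positive label: the next admission starts within the window.
            if timedelta(0) <= gap <= READMISSION_WINDOW:
                labels[current] = 1
    return labels


# Made-up example stays, for illustration only.
example_stays = [
    {"patient_id": "A", "admit": datetime(2019, 1, 1, 8), "discharge": datetime(2019, 1, 4, 10)},
    {"patient_id": "A", "admit": datetime(2019, 1, 6, 9), "discharge": datetime(2019, 1, 8, 12)},
    {"patient_id": "B", "admit": datetime(2019, 2, 1, 8), "discharge": datetime(2019, 2, 3, 8)},
    {"patient_id": "B", "admit": datetime(2019, 2, 10, 8), "discharge": datetime(2019, 2, 12, 8)},
]
example_labels = label_readmissions(example_stays)
```

Because labels are assigned per stay, a patient with several stays contributes several rows, which is consistent with the note that a single patient can be counted more than once.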
Overview of the UKM cohort.
| Characteristic | All ICU stays | No 3 day ICU readmission | 3 day ICU readmission |
|---|---|---|---|
| Number of ICU stays, n (%) | 15,589 (100.0) | 14,698 (94.3) | 891 (5.7) |
| Number of patients, n (%) | 14,188 (100.0) | 13,349 (94.1) | 839 (5.9) |
| Age, mean ± SD, years | 63.33 ± 14.73 | 63.16 ± 14.77 | 66.08 ± 13.85 |
| Female sex, n (%) | 4,919 (100.0) | 4,659 (94.7) | 260 (5.3) |
| Male sex, n (%) | 10,670 (100.0) | 10,039 (94.1) | 631 (5.9) |
| Length of ICU stay, mean ± SD, days | 3.70 ± 8.08 | 3.67 ± 8.11 | 4.23 ± 7.53 |
| ICU at discharge | ICU 1 ( | ICU 1 ( | ICU 1 ( |
| | ICU 2 ( | ICU 2 ( | ICU 2 ( |
| | ICU 3 ( | ICU 3 ( | ICU 3 ( |
| | ICU 4 ( | ICU 4 ( | ICU 4 ( |
The key characteristics of all included ICU stays and the ICU stays divided by their labels. This information is based on ICU stays, so a single patient can be considered more than once.
Overview of the variables and features of the risk functions included in the final EBM model ordered by importance.
| Rank | Variable(s) | Feature(s) | Relative importance | Excluded |
|---|---|---|---|---|
| 1 | Age [years], Base excess (BE) [mmol/L] | Static per patient, IQR 3 days | 4.20 | X |
| 2 | Drugs for constipation, Leucocytes [thousand/μL] | Unique 1 day, median 1 day | 3.52 | X |
| 3 | Blood volume out [mL], Procalcitonin [ng/mL] | Extrapolate 7 days, maximum 7 days | 2.57 | X |
| 4 | Hematocrit [%], Blood volume out [mL] | Maximum 3 days, extrapolate 3 days | 2.19 | X |
| 5 | Leucocytes [thousand/μL], Blood volume out [mL] | Median 1 day, extrapolate 3 days | 1.87 | X |
| 6 | Endotracheal tube (tubus) exists | Days since last application | 1.71 | |
| 7 | Age [years] | Static per patient | 1.70 | |
| 8 | Antithrombotic agents prophylactic dosage | Days since last application | 1.65 | |
| 9 | Partial thromboplastin time (PTT) [s] | Maximum 1 day | 1.63 | X |
| 10 | O2 saturation [%] | Minimum 12 hours | 1.58 | |
| 11 | Blood volume out [mL] | Extrapolate 7 days | 1.52 | |
| 12 | Gamma-GT [U/L] | Median 7 days | 1.46 | |
| 13 | Chloride [mmol/L] | Trend per day 3 days | 1.40 | |
| 14 | Heart rate [bpm] | Minimum 4 hours | 1.39 | |
| 15 | Partial thromboplastin time (PTT) [s] | Maximum 3 days | 1.37 | X |
| 16 | Chloride [mmol/L] | Minimum 1 day | 1.37 | |
| 17 | Hemoglobin [mmol/L] | Maximum 3 days | 1.30 | |
| 18 | Length of stay before ICU [days] | Manually added | 1.28 | |
| 19 | Hematocrit [%] | Maximum 3 days | 1.26 | |
| 20 | Calcium [mmol/L] | Trend per day 3 days | 1.26 | X |
| 21 | Estimated glomerular filtration rate (eGFR) [mL/min/1.73 m²] | Trend per day 7 days | 1.24 | |
| 22 | Richmond agitation sedation (RAS) scale | Maximum 3 days | 1.24 | |
| 23 | Urine volume out [mL] | Extrapolate 1 day | 1.24 | |
| 24 | Thrombocytes [thousand/μL] | Trend per day 7 days | 1.24 | |
| 25 | Blood volume out [mL] | Extrapolate 3 days | 1.23 | |
| 26 | paO2/FiO2 [mmHg/FiO2] | Median 1 day | 1.21 | |
| 27 | pH | Trend per day 3 days | 1.21 | |
| 28 | Phosphate [mg/dL] | Minimum 7 days | 1.20 | |
| 29 | pH | Median 1 day | 1.20 | |
| 30 | Body core temperature [°C] | Minimum 1 day | 1.18 | X |
| 31 | Creatine kinase (CK) [U/L] | Minimum 7 days | 1.15 | |
| 32 | Richmond agitation sedation (RAS) scale | Trend per day 12 hours | 1.13 | X |
| 33 | Potassium [mmol/L] | Median 1 day | 1.13 | |
| 34 | Glasgow coma scale (GCS) score | Minimum 3 days | 1.11 | |
| 35 | Body core temperature [°C] | Median 1 day | 1.10 | |
| 36 | Base excess (BE) [mmol/L] | IQR 3 days | 1.10 | X |
| 37 | Blood urea nitrogen [mg/dL] | Minimum 3 days | 1.10 | |
| 38 | paO2/FiO2 [mmHg/FiO2] | Trend per day 3 days | 1.09 | |
| 39 | Drugs for constipation | Unique 1 day | 1.09 | |
| 40 | Urine volume out [mL] | Extrapolate 7 days | 1.09 | |
| 41 | Partial thromboplastin time (PTT) [s] | Minimum 7 days | 1.07 | X |
| 42 | Diastolic blood pressure [mmHg] | Median 1 day | 1.06 | |
| 43 | Partial pressure of oxygen (pO2) [mmHg] | Minimum 12 hours | 1.06 | |
| 44 | Creatine kinase-MB (CK-MB) [U/L] | Maximum 3 days | 1.05 | |
| 45 | Richmond agitation sedation (RAS) scale | Maximum 1 day | 1.05 | |
| 46 | Partial thromboplastin time (PTT) [s] | Minimum 3 days | 1.05 | X |
| 47 | Systolic blood pressure [mmHg] | IQR 12 hours | 1.05 | |
| 48 | paO2/FiO2 [mmHg/FiO2] | Median 3 days | 1.04 | |
| 49 | Creatine kinase (CK) [U/L] | Median 7 days | 1.04 | X |
| 50 | Lactate [mmol/L] | Maximum 3 days | 1.04 | |
| 51 | Creatine kinase-MB (CK-MB) [U/L] | Median 3 days | 1.04 | |
| 52 | Lactate [mmol/L] | Minimum hours | 1.00 | |
| 53 | Phosphate [mg/dL] | Maximum 1 day | 1.00 | |
| 54 | Partial thromboplastin time (PTT) [s] | Maximum 7 days | 0.98 | X |
| 55 | Partial pressure of carbon dioxide (PCO2) [mmHg] | Median 1 day | 0.98 | |
| 56 | Base excess (BE) [mmol/L] | Trend per day 3 days | 0.97 | |
| 57 | Glucose [mg/dL] | Median 3 days | 0.97 | |
| 58 | Base excess (BE) [mmol/L] | Minimum hours | 0.96 | |
| 59 | Methemoglobinemia (MetHb) [%] | Minimum hours | 0.96 | |
| 60 | Is on automatic ventilation | Days since last application | 0.95 | |
| 61 | Body core temperature [°C] | Minimum 4 hours | 0.95 | X |
| 62 | Partial pressure of carbon dioxide (PCO2) [mmHg] | IQR 1 day | 0.95 | |
| 63 | Sodium [mmol/L] | Median 3 days | 0.93 | |
| 64 | Leucocytes [thousand/μL] | Median 1 day | 0.92 | |
| 65 | Sodium [mmol/L] | Trend per day 3 days | 0.92 | |
| 66 | Procalcitonin [ng/mL] | Maximum 7 days | 0.91 | |
| 67 | Base excess (BE) [mmol/L] | Median hours | 0.91 | |
| 68 | Mean blood pressure [mmHg] | Median 4 hours | 0.87 | |
| 69 | Leucocytes [thousand/μL] | Trend per day 3 days | 0.84 | X |
| 70 | pH | Median 3 days | 0.84 | |
| 71 | Bilirubin total [mg/dL] | Maximum 7 days | 0.84 | |
| 72 | Partial pressure of oxygen (pO2) [mmHg] | IQR hours | 0.84 | |
| 73 | Base excess (BE) [mmol/L] | IQR 1 day | 0.83 | |
| 74 | Body core temperature [°C] | Trend per day 1 day | 0.83 | |
| 75 | C-reactive protein [mg/dL] | Maximum 3 days | 0.83 | |
| 76 | Heart rate [bpm] | Minimum 1 day | 0.82 | |
| 77 | Hematocrit [%] | Median hours | 0.80 | |
| 78 | Partial pressure of carbon dioxide (PCO2) [mmHg] | Minimum 3 days | 0.76 | |
| 79 | Mean blood pressure [mmHg] | Median hours | 0.72 | |
| 80 | Calcium [mmol/L] | Maximum 1 day | 0.69 | |
| 81 | Estimated respiratory rate | Median 1 day | 0.68 | |
| 82 | pH | IQR 1 day | 0.67 | |
| 83 | Leucocytes [thousand/μL] | IQR 3 days | 0.63 | |
| 84 | Heart rate [bpm] | IQR 4 hours | 0.60 | |
| 85 | Reduced hemoglobin (RHb) | Median hours | 0.60 | X |
These risk functions were selected from a total of 1,423 based on their importance on the five temporal splits. Risk functions 1–5 are two-dimensional, and the remaining functions are one-dimensional. The relative importance was determined on the final training split. The last column indicates whether a risk function was excluded during the model inspection by a team of physicians. Visualizations of all risk functions and the detailed reasons for exclusion are given in the supplement.
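The feature names in the table (for example "Median 1 day", "IQR 3 days", or "Trend per day 3 days") suggest descriptive statistics computed over a time window before discharge. The following is a minimal sketch of that idea, not the study's preprocessing code: the function names, the `(hours, value)` series layout, the quartile method used for the IQR, and the least-squares definition of the trend are all assumptions.

```python
from statistics import median, quantiles


def window_points(series, discharge_h, window_h):
    """Keep (hour, value) pairs that fall in the window before discharge."""
    start = discharge_h - window_h
    return [(t, v) for t, v in series if start <= t <= discharge_h]


def trend_per_day(points):
    """Least-squares slope of value over time, in units per day (assumed definition)."""
    if len(points) < 2:
        return None
    days = [t / 24 for t, _ in points]
    values = [v for _, v in points]
    day_mean = sum(days) / len(days)
    value_mean = sum(values) / len(values)
    denom = sum((d - day_mean) ** 2 for d in days)
    if denom == 0:
        return None  # all measurements share one timestamp
    numer = sum((d - day_mean) * (v - value_mean) for d, v in zip(days, values))
    return numer / denom


def describe(series, discharge_h, window_h):
    """Compute min, max, median, IQR, and trend for one variable and one window."""
    points = window_points(series, discharge_h, window_h)
    if not points:
        return None  # no measurement in the window
    values = [v for _, v in points]
    if len(values) >= 2:
        q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' method (assumption)
        iqr = q3 - q1
    else:
        iqr = 0.0
    return {
        "min": min(values),
        "max": max(values),
        "median": median(values),
        "iqr": iqr,
        "trend_per_day": trend_per_day(points),
    }


# Made-up heart rate series: (hours since admission, bpm).
heart_rate = [(0, 80), (24, 84), (48, 88), (72, 92)]
features_3d = describe(heart_rate, discharge_h=72, window_h=72)
features_1d = describe(heart_rate, discharge_h=72, window_h=24)
```

Computing every statistic for every variable and window length is one plausible way to arrive at a large candidate pool such as the 1,423 risk functions mentioned above, from which a subset is then selected.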
Figure 3. Two most important risk functions and four excluded risk functions of the EBM model. (A,B) The two most important risk functions that are included in the EBM model. (A) Number of days since an endotracheal tube was last present. Patients who had an endotracheal tube immediately before discharge have a highly increased risk. Lower risk is assigned to values between 0.4 and 4.1 days. Patients with no endotracheal tube (unknown) also receive an increased risk. (B) The risk function for age shows an increased risk at higher ages. There is a peak at 60 years with no obvious explanation. (C) A maximum PTT value over the last 3 days before discharge between 82.5 and 115.5 s is assigned a lower risk of 3 day ICU readmission. This was identified as an artifact of the previous procedure for determining the PTT in cardiac surgery patients and will not generalize to future data. (D) For a median hematocrit between 24.875 and 28.525%, the model determined an elevated risk. For slightly lower and higher values, the assigned risk is negative. This contradicts common medical knowledge, according to which a decreasing hematocrit value should be associated with increased risk. (E) The interquartile range (IQR) of the partial pressure of carbon dioxide (pCO2) over the last day before discharge receives an increased risk for values between 0 and 0.863 mmHg and between 2.513 and 3.313 mmHg. The team could neither interpret this behavior nor determine its clinical implications. (F) The 2D risk function for age and the IQR of the base excess (BE) over 3 days. Patients over 71.5 years have a high risk for a high IQR of the BE. Patients between 59.5 and 71.5 years have only a slightly increased risk for low IQR values, and younger patients have a decreased risk across all BE values. The team excluded it due to a lack of interpretability.
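To make the idea of a risk function concrete: an EBM is a generalized additive model, so each risk function contributes a score to the log-odds, and one-dimensional risk functions are piecewise-constant lookups over value bins. The sketch below illustrates this and shows how excluding a risk function simply drops its term from the sum. It is a simplified illustration and not the study's model: the class, the intercept, and all scores are hypothetical, and only the PTT bin edges (82.5 and 115.5 s) are taken from the caption.

```python
import math
from bisect import bisect_right


class RiskFunction1D:
    """Piecewise-constant risk function: one score per bin, plus one for missing values."""

    def __init__(self, name, edges, scores, missing_score=0.0):
        if len(scores) != len(edges) + 1:
            raise ValueError("need exactly one score per bin")
        self.name = name
        self.edges = edges            # sorted bin boundaries
        self.scores = scores          # log-odds contribution per bin
        self.missing_score = missing_score

    def score(self, value):
        if value is None:
            return self.missing_score
        # bisect_right returns the index of the bin that contains `value`.
        return self.scores[bisect_right(self.edges, value)]


def predict_probability(intercept, risk_functions, patient, excluded=frozenset()):
    """Sum the contributions of all non-excluded risk functions and apply the sigmoid."""
    log_odds = intercept
    for rf in risk_functions:
        if rf.name in excluded:
            continue  # removed during model inspection
        log_odds += rf.score(patient.get(rf.name))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical scores; only the PTT bin edges come from the caption.
ptt_max_3d = RiskFunction1D("ptt_max_3d", [82.5, 115.5], [0.0, -0.5, 0.0])
age = RiskFunction1D("age", [60.0], [-0.2, 0.3])
toy_model = [ptt_max_3d, age]
toy_patient = {"ptt_max_3d": 100.0, "age": 72}

p_before_inspection = predict_probability(-2.8, toy_model, toy_patient)
p_after_inspection = predict_probability(
    -2.8, toy_model, toy_patient, excluded={"ptt_max_3d"}
)
```

In this toy setting, removing the PTT term raises the predicted risk for a patient whose PTT falls in the artifact range, because the risk-lowering contribution no longer applies.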
Figure 4. Performance evaluation on the University Hospital Münster (UKM) cohort (A,B) and external validation on the Medical Information Mart for Intensive Care version IV (MIMIC-IV) database (C,D). (A) The area under the precision-recall curve (PR-AUC) was considered the most relevant performance indicator owing to the imbalanced label distribution. We optimized the PR-AUC during the parameter tuning and selection procedures for all models. The differences between models are relatively small. The explainable boosting machines (EBMs) and gradient boosting machines (GBMs) show the highest PR-AUC. (B) The area under the receiver operating characteristic curve (ROC-AUC) was determined as an additional performance measure. Again, the EBM and GBM models performed best. (C,D) The same performance indicators were determined on the MIMIC-IV database. Both models again showed similar results. The confidence intervals for all curves were determined with the standard deviation on the five temporal splits.
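The two performance measures can be sketched in plain Python. The text does not state how the PR-AUC was computed, so this example uses average precision, one common estimator of the area under the precision-recall curve, as an assumption. Summarizing splits with a sample standard deviation is also an assumption, and all labels and scores are made-up toy values.

```python
from statistics import mean, stdev


def roc_auc(y_true, y_score):
    """ROC-AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def average_precision(y_true, y_score):
    """Mean of the precision values at the rank of each positive (no special tie handling)."""
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive label")
    return precision_sum / hits


def summarize(metric, splits):
    """Return mean and sample standard deviation of a metric across splits."""
    values = [metric(y_true, y_score) for y_true, y_score in splits]
    return mean(values), stdev(values)


# Toy splits, each a (labels, scores) pair; the study used five temporal splits.
toy_splits = [
    ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
    ([0, 1, 0, 1], [0.2, 0.9, 0.3, 0.6]),
]
toy_roc_mean, toy_roc_sd = summarize(roc_auc, toy_splits)
toy_ap_mean, toy_ap_sd = summarize(average_precision, toy_splits)
```

With a positive rate of roughly 6%, as in the UKM cohort, a random classifier has an expected average precision close to that rate, whereas its ROC-AUC stays near 0.5. This is one reason the PR-AUC is often preferred for imbalanced labels.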