Kaouter Karboub, Mohamed Tabaa.
Abstract
This paper targets a major challenge: how to effectively allocate medical resources in intensive care units (ICUs). We trained multiple regression models using the Medical Information Mart for Intensive Care III (MIMIC III) database, recorded between 2001 and 2012. The training and validation dataset included recorded data of patients with pneumonia, sepsis, congestive heart failure, hypotension, chest pain, coronary artery disease, fever, respiratory failure, acute coronary syndrome, shortness of breath, seizure and transient ischemic attack, and aortic stenosis. We then tested the models on unseen data of patients diagnosed with coronary artery disease, congestive heart failure or acute coronary syndrome. We included the admission characteristics, clinical prescriptions, physiological measurements, and discharge characteristics of those patients. We assessed the models' performance using mean residuals and running times as metrics, and ran multiple experiments to study the impact of the data partition on the learning phase. The total running time of our best-evaluated model is 123,450.9 ms. The best model achieves an average accuracy of 98%, highlighting location of discharge, initial diagnosis, location of admission, drug therapy, length of stay and internal transfers as the patterns most influencing a patient's readiness for discharge.
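The evaluation protocol described in the abstract (regression on ICU-stay features, scored by residual mean and prediction time) can be sketched as follows. The synthetic features and the choice of scikit-learn's `GradientBoostingRegressor` are illustrative assumptions, not the paper's exact pipeline.

```python
# Sketch of the paper's evaluation loop: train a regressor on ICU-stay features,
# then score it with the residual mean (RM) and prediction time.
# The data below is synthetic; real inputs would come from MIMIC-III.
import time
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 6))  # placeholder features (e.g. LOS, transfers, dosage)
y = X @ rng.normal(size=6) + rng.normal(scale=0.1, size=n)  # discharge target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.28, random_state=0)

model = GradientBoostingRegressor(n_estimators=200, learning_rate=0.3)
model.fit(X_train, y_train)

start = time.perf_counter()
pred = model.predict(X_test)
elapsed_ms = (time.perf_counter() - start) * 1000

mean_residual = float(np.mean(np.abs(y_test - pred)))  # residual mean (RM)
print(f"RM = {mean_residual:.5f}, prediction time = {elapsed_ms:.1f} ms")
```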
Keywords: Electronic Health Records; cardiovascular diseases; discharge; intensive care units; machine learning
Year: 2022 PMID: 35742018 PMCID: PMC9222879 DOI: 10.3390/healthcare10060966
Source DB: PubMed Journal: Healthcare (Basel) ISSN: 2227-9032
Figure 1. Flow dynamics in ICUs.
Figure 2. MIMIC III dataset generation process.
Baseline patients’ characteristics and outcome measures.
| | Overall Population | Dead at Discharge | Alive at Discharge |
|---|---|---|---|
| Age (years) | 65 | 73 | 64 |
| Gender (%) | 53% | 54% | 57% |
| Admission type | | | |
| Emergency | 2804 | 226 | 2577 |
| Elective | 1466 | 14 | 1451 |
| Urgent | 132 | 11 | 121 |
| Initial diagnosis | | | |
| Coronary artery disease | 2808 | 54 | 2753 |
| Congestive heart failure | 1315 | 175 | 1140 |
| Acute coronary syndrome | 279 | 22 | 257 |
| Heart rate (bpm) | 80 | 100–110 | 60–100 |
| Respiratory rate (breaths/min) | 21 | 12–20 | ≤12 or ≥20 |
| Drug therapy | | | |
| Cholesterol lowering medications | 12.83% | 0.14% | 99.86% |
| ACE inhibitors | 14.62% | 0.38% | 99.62% |
| Bronchodilators | 10.66% | 0.47% | 99.53% |
| Diuretics | 9.1% | 1% | 99% |
| Insulins | 7.85% | 1.04% | 98.96% |
| Anticoagulants | 7.42% | 0.8% | 99.2% |
| Electrolytes | 13.22% | 0.66% | 99.34% |
| Beta blockers | 7.2% | 1.78% | 98.22% |
| Antiplatelet agents and DAPT | 3.46% | 3.58% | 96.42% |
| Anti-histamines | 3.44% | 0.53% | 99.47% |
| Quinolone antibiotics | 3.2% | 1.42% | 98.58% |
| Nitrates | 2.63% | 1.21% | 98.79% |
| Peptides | 1.36% | 1.01% | 98.99% |
| Glucose elevating agents | 1.63% | 5.88% | 94.12% |
| Antidysrhythmics | 0.89% | 4.6% | 95.4% |
| Calcium channel blockers | 0.35% | 3.95% | 96.05% |
| Sulfonic acid | 0.15% | 6.06% | 93.94% |
| Length of stay | 2.9 days | 3.1 days | 2.7 days |
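Baseline summaries like the table above can be recomputed from an admissions table with pandas; the column names and toy data below are assumptions for illustration, not the MIMIC-III schema.

```python
# Illustrative recomputation of baseline characteristics from an admissions table.
# Column names ("age", "los_days", ...) are hypothetical, not MIMIC-III's schema.
import pandas as pd

adm = pd.DataFrame({
    "age": [65, 73, 64, 70, 58, 81],
    "los_days": [2.9, 3.1, 2.7, 4.0, 1.5, 3.3],
    "admission_type": ["Emergency", "Elective", "Urgent",
                       "Emergency", "Elective", "Emergency"],
    "dead_at_discharge": [0, 1, 0, 0, 0, 1],
})

# Median age and length of stay, overall and split by discharge outcome
overall = adm[["age", "los_days"]].median()
by_outcome = adm.groupby("dead_at_discharge")[["age", "los_days"]].median()

# Admission-type counts per outcome, as in the "Admission type" rows
type_counts = adm.groupby("dead_at_discharge")["admission_type"].value_counts()
print(overall, by_outcome, type_counts, sep="\n")
```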
Figure 3. Feature selection of informative features. Panels (a,b) show the correlation of drug dosage and drug type, respectively, with time of discharge. Panel (c) shows the time marking the end of a specified drug therapy. Panels (d,e) show the chronological timing of admission to the ICUs and the actual discharge time.
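A minimal version of the correlation-based screen that Figure 3 visualizes: rank candidate features by absolute Pearson correlation with the discharge-time target and keep those above a threshold. Feature names and data here are synthetic placeholders.

```python
# Correlation-based feature screening sketch (synthetic stand-in for Figure 3's data).
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "drug_dosage": rng.normal(size=n),
    "los_days": rng.normal(size=n),
    "internal_transfers": rng.integers(0, 5, size=n),
})
# Target depends strongly on length of stay, weakly on dosage, not on transfers
df["discharge_time"] = (2.0 * df["los_days"] + 0.5 * df["drug_dosage"]
                        + rng.normal(scale=0.1, size=n))

# Absolute Pearson correlation of each feature with the target
corr = df.corr()["discharge_time"].drop("discharge_time").abs()
selected = corr[corr > 0.3].sort_values(ascending=False)  # keep informative features
print(selected)
```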
Parameters of models.
| Decision Trees | Gradient Boosted Models |
|---|---|
| Criterion: Entropy | Number of estimators: 2000 |
| Max depth: 10 | Learning rate: 0.3 |
| Splitter: Best | Criterion: MSE |
| Max features: log2 | Min sample leaf: 2 |
| Min samples leaf: 4 | Min samples split: 5 |
| Min samples split: 10 | |
Figure 4. Architecture of the proposed approach.
Performance comparison summary of residual mean (RM), accuracy, and prediction time metrics.
| Exp | Metric | | LR * | TR1 * | TR2 * | TR3 * | TR4 * | B1 * | B2 * | B3 * |
|---|---|---|---|---|---|---|---|---|---|---|
| Exp 1 | Accuracy | | 0.89 | 0.783 | 0.917 | 0.761 | 0.726 | 0.98 | 0.935 | 0.961 |
| | RM | Validation (18%) | 0.00682 | 0.18 | 0.00399 | 0.01714 | 0.01353 | - | - | - |
| | RM | Cross validation (36%) | 0.00575 | 0.09 | 0.00261 | 0.00953 | 0.01357 | - | - | - |
| | RM | Holdout (72%) | 0.00539 | - | 0.00173 | 0.00521 | 0.00312 | 0.0012 | 0.00158 | 0.00058 |
| | Prediction time (ms) | | 5627.83 | 3734.79 | 6914.43 | 22,443.8 | 17,452.56 | 123,450.9 | 123,800 | 69,376.8 |
| Exp 2 | Accuracy | | 0.88 | 0.77 | 0.91 | 0.77 | 0.756 | 0.978 | 0.91 | 0.956 |
| | RM | Validation (18%) | 0.00679 | 0.18 | 0.00399 | 0.01714 | 0.01353 | - | - | - |
| | RM | Cross validation (36%) | 0.00571 | 0.09 | 0.00261 | 0.00953 | 0.01357 | - | - | - |
| | RM | Holdout (80%) | 0.00512 | - | 0.00159 | 0.00503 | 0.00298 | 0.0012 | 0.00147 | 0.00054 |
| | Prediction time (ms) | | 5628.01 | 3699.45 | 6999 | 23,166.4 | 20,514.6 | 123,551 | 132,500 | 70,015.2 |
| Exp 3 | Accuracy | | 0.89 | 0.78 | 0.914 | 0.76 | 0.754 | 0.937 | 0.914 | 0.915 |
| | RM | Validation (18%) | 0.00659 | 0.18 | 0.00373 | 0.01694 | 0.0112 | - | - | - |
| | RM | Cross validation (46%) | 0.00551 | 0.085 | 0.00191 | 0.00958 | 0.01057 | - | - | - |
| | RM | Holdout (72%) | 0.00512 | - | 0.00159 | 0.00503 | 0.00298 | 0.80319 | 0.00147 | 0.00054 |
| | Prediction time (ms) | | 5568 | 4697 | 7859 | 23,894.2 | 25,735.02 | 215,693 | 151,236 | 78,020 |
| Exp 4 | Accuracy | | 0.89 | 0.781 | 0.925 | 0.709 | 0.781 | 0.95 | 0.965 | 0.89 |
| | RM | Validation (25%) | 0.00614 | 0.1 | 0.00329 | 0.01658 | 0.01123 | - | - | - |
| | RM | Cross validation (36%) | 0.00541 | 0.09 | 0.00261 | 0.00953 | 0.01357 | - | - | - |
| | RM | Holdout (72%) | 0.00481 | - | 0.00148 | 0.00493 | 0.00298 | 0.000001 | 0.9947 | 0.00054 |
| | Run time for 100 predictions (ms) | | 5750 | 3610.9 | 6990 | 32,548 | 24,590.4 | 133,511 | 15,039.3 | 699,081 |
* LR: Auto-tuned Stochastic Gradient Descent Regression; * TR1: Decision Tree; * TR2: Extreme Gradient Boosted Trees Regression with Early Stopping; * TR3: Gradient Boosted Greedy Trees Regression with Early Stopping; * TR4: Light Gradient Boosted Trees Regressor with Early Stopping; * B1: Advanced Generalized Linear Regression Model (GLRM); * B2: Efficient Neural Network (ENET); * B3: AVG Blender.
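B3 in the table is an AVG blender, i.e. a model that averages the predictions of fitted base regressors. A minimal sketch, with placeholder base models standing in for the paper's ensemble:

```python
# Minimal AVG-blender sketch: average the predictions of fitted base regressors.
# The base models and data below are placeholders, not the paper's ensemble.
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))
y = X.sum(axis=1) + rng.normal(scale=0.05, size=300)

base_models = [SGDRegressor(max_iter=2000, random_state=0),
               DecisionTreeRegressor(max_depth=10, random_state=0)]
for m in base_models:
    m.fit(X, y)

def avg_blend(models, X):
    """Average the base models' predictions (the AVG-blender rule)."""
    return np.mean([m.predict(X) for m in models], axis=0)

pred = avg_blend(base_models, X)
rm = float(np.mean(np.abs(y - pred)))  # residual mean of the blend
print(f"blended residual mean = {rm:.4f}")
```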
Figure 5. MIMIC III: training results.
Figure 6. Residuals of different models (yellow represents expected outputs; blue, ground-truth outputs).
Results and Benchmarking.
| Ref | Methods and Approach | Dataset | Metrics and Results | Scoring of Recommendation Strength |
|---|---|---|---|---|
| [ | Focus: prognostication of clinical outcomes in ICUs. | Critical Care Health Informatics Collaborative (CCHIC) data infrastructure (22,514 intensive care admissions of which 21,911 were used in the study; 90.8% of them were alive at discharge.) | On day 2 (AUC): | Larger data improves model’s performance. |
| [ | Focus: mortality prediction, LOS prediction and ICD-9 code group prediction. | Medical Information Mart for Intensive Care III (MIMIC-III) (v1.4) | SuperLearner-I: AUROC = 0.8448 and AUPRC = 0.4351. | Larger data improves model’s performance. |
| [ | Focus: prediction of final diagnosis and clinical outcomes. Methods: universal language model fine-tuning for text classification (ULMFiT) | Medical Information Mart for Intensive Care III (MIMIC-III) | Accuracy: 80.3% for diagnosis top10, 80.5% procedure top10, 70.7% diagnosis top50, 63.9% procedures top50. | Larger data improves model’s performance. |
| [ | Focus: in-hospital mortality prediction. Methods: deep learning networks. | Medical Information Mart for Intensive Care III (MIMIC-III) (42,818 hospital admissions of 35,348 patients) | Mortality prediction: AUROC: 0.9178 with data of all sources (AS) and 0.9029 with chart data (CD). PRAUC: 0.6251 for AS, 0.5701 for CD. | Larger data improves model’s performance. |
| [ | Focus: ICUs discharge prediction. Methods: random forest (RF) and logistic classifier (LC). | Bristol Royal Infirmary general intensive care unit (GICU) (1870 intensive care patients) and 7592 from MIMIC-III. | On the MIMIC dataset: AUROC(RF):0.8859, AUROC(LC): 0.8726. Accuracy (RF): 0.8531 and accuracy (LC): 0.8494. sensitivity (RF): 0.9049 and sensitivity (LC) is 0.9001. | Larger data improves model’s performance. |
| [ | Focus: prediction of discharge location in ICUs. Methods: National Early Warning Score (NEWS/NEWS 2) | Surgical, coronary, cardiac surgery recovery, medical and trauma surgical intensive care patients with single admission in ICUs in a US hospital. | The NEWS AUROC (95% CI): all patients 0.727 (0.709–0.745); Coronary Care Unit (CCU) 0.829 (0.821–0.837); Cardiac Surgery Recovery Unit (CSRU) 0.844 (0.838–0.850); Medical Intensive Care Unit (MICU) 0.778 (0.767–0.791); Surgical Intensive Care Unit (SICU) 0.775 (0.762–0.788); Trauma Surgical Intensive Care Unit (TSICU) 0.765 (0.751–0.773). | Larger data improves model’s performance. |
| [ | Focus: risk scoring in ICUs. Methods: attentive deep Markov model (AttDMM). | MIMIC-III with 53,423 ICU stays. | AttDMM with AUROC of 0.876. | Not specified |
| [ | Focus: ICU readmission prediction after 24 to 72 h of discharge. | MIMIC II (data of 4 different ICUs 26,655 patients, of which 19,075 are adults; 38% of the adult patients stayed at the medical ICU (MICU), 27% at the surgical ICU (SICU), 20% at the cardiac surgery recovery unit (CSRU) and 15% at the critical care unit (CCU)) | AUROC of 0.76 and | Not specified |
| [ | Focus: length of stay prediction in ICUs. Methods: neural network (random forest) | MIMIC-III (31,018 chosen data points) | Accuracy of 80%. | Larger dataset might improve the model’s performance. |
| Our model | Preprocessing: regularized linear processing, ordinal encoding of categorical variables; tree-based algorithm. | MIMIC-III | Validation 18%, cross validation 36%, holdout 72%: accuracy = 98%, residual mean = 0.000001, prediction time = 123,450.9 ms. | Larger dataset improves model's performance |
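The "Our model" row above mentions ordinal encoding of categorical variables; a minimal sketch with scikit-learn's `OrdinalEncoder` (the admission-type values are illustrative):

```python
# Ordinal encoding of a categorical variable, as listed in the preprocessing step.
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

admission_type = np.array([["Emergency"], ["Elective"], ["Urgent"], ["Emergency"]])
enc = OrdinalEncoder()                      # assigns integer codes alphabetically
codes = enc.fit_transform(admission_type)   # Elective=0, Emergency=1, Urgent=2
print(codes.ravel())
```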