Chih-Chou Chiu1, Chung-Min Wu1, Te-Nien Chien2, Ling-Jing Kao1, Jiantai Timothy Qiu3,4.
Abstract
Predicting the vital signs of clinical patients is a critical issue in intensive care unit (ICU) research. Early prediction of the mortality of ICU patients can reduce overall mortality and the cost of treating complications. Some studies have predicted mortality from electronic health record (EHR) data by using machine learning models. However, semi-structured data (i.e., patients' diagnosis data and inspection reports) are rarely used in these models. This study used data from the Medical Information Mart for Intensive Care III (MIMIC-III). We used a latent Dirichlet allocation (LDA) model to classify the text in the semi-structured data into topics, and then built and compared five models: classification and regression trees (CART), logistic regression (LR), multivariate adaptive regression splines (MARS), random forest (RF), and gradient boosting (GB). A total of 46,520 ICU patients were included, with 11.5% mortality in the MIMIC-III group. Our results revealed that the semi-structured data (diagnosis data and inspection reports) of ICU patients contain useful information that can assist clinical doctors in making critical clinical decisions. In our comparison of the five machine learning models, the GB model showed the best performance, with the highest area under the receiver operating characteristic curve (AUROC) (0.9280), specificity (93.16%), and sensitivity (83.25%). The RF, LR, and MARS models (AUROCs of 0.9096, 0.8987, and 0.8935, respectively) performed better than CART (0.8511). The GB model thus performed better than the other machine learning models (CART, LR, MARS, and RF) in predicting the mortality of ICU patients. These results could be used to develop a clinically useful decision support system.
Keywords: electronic health records; intensive care units; latent dirichlet allocation; machine learning; topic model
Year: 2022 PMID: 35742138 PMCID: PMC9222812 DOI: 10.3390/healthcare10061087
Source DB: PubMed Journal: Healthcare (Basel) ISSN: 2227-9032
Figure 1. Research scheme.
Details of the MIMIC-III patient population for patients aged 16 years and above.
| Adult Patients Critical Care Unit | Total |
|---|---|
| Distinct patients | 38,597 |
| Hospital admissions | 49,785 |
| Distinct ICU stays | 53,423 |
| Coronary Care Unit (CCU) | 7726 (14.5%) |
| Cardiac Surgery Recovery Unit (CSRU) | 9854 (18.4%) |
| Medical Intensive Care Unit (MICU) | 21,087 (39.5%) |
| Surgical Intensive Care Unit (SICU) | 8891 (16.6%) |
| Trauma Surgical Intensive Care Unit (TSICU) | 5865 (11.1%) |
| Age, years, median [Q1–Q3] | 65.8 [52.8–77.8] |
| Gender, male | 27,983 (55.9%) |
| ICU length of stay, median days [Q1–Q3] | 2.1 [1.2–4.6] |
| Hospital length of stay, median days [Q1–Q3] | 6.9 [4.1–11.9] |
| ICU mortality | 4565 (8.5%) |
| Hospital mortality | 5748 (11.5%) |
| Mean number of records per hospital admission, by table (total records in parentheses) | |
| Chartevents (330,712,483) | 6642.81 |
| Inputevents (21,136,926) | 424.56 |
| Outputevents (4,349,218) | 87.36 |
| Labevents (27,854,055) | 559.49 |
| Noteevents (2,083,180) | 41.84 |
ICU, intensive care unit.
Figure 2. LDA model framework.
Figure 3. The detailed process of data extraction.
Selected Patient Demographic Information.
| Characteristic | 12 h: Overall | 12 h: Dead at Hospital | 12 h: Alive at Hospital | 24 h: Overall | 24 h: Dead at Hospital | 24 h: Alive at Hospital |
|---|---|---|---|---|---|---|
| General (%) | ||||||
| Number | 24,252 (100%) | 2384 (9.83%) | 21,868 (90.17%) | 27,809 (100%) | 2559 (9.20%) | 25,250 (90.80%) |
| Age [Q1–Q3] | 63.02 [50.96–78.16] | 70.76 [61.19–83.32] | 62.17 [50.07–77.27] | 63.06 [51.32–77.82] | 70.88 [61.43–83.32] | 62.26 [50.51–76.97] |
| Gender (male) | 13,675 (56.38%) | 1267 (9.27%) | 12,408 (90.73%) | 15,805 (56.83%) | 1353 (8.56%) | 14,452 (91.44%) |
| Ethnicity (%) | ||||||
| Asian | 598 (2.47%) | 56 (9.36%) | 542 (90.64%) | 680 (2.45%) | 62 (9.12%) | 618 (90.88%) |
| Black | 1930 (7.96%) | 106 (5.49%) | 1824 (94.51%) | 2142 (7.70%) | 113 (5.28%) | 2029 (94.72%) |
| Hispanic | 841 (3.47%) | 44 (5.23%) | 797 (94.77%) | 919 (3.30%) | 49 (5.33%) | 870 (94.67%) |
| White | 17,262 (71.18%) | 1604 (9.29%) | 15,658 (90.71%) | 19,809 (71.23%) | 1733 (8.75%) | 18,076 (91.25%) |
| Other | 3621 (14.93%) | 574 (15.85%) | 3047 (84.15%) | 4259 (15.32%) | 602 (14.13%) | 3657 (85.87%) |
| Admission Type (%) | ||||||
| Urgent | 562 (2.32%) | 69 (12.28%) | 493 (87.72%) | 667 (2.40%) | 77 (11.54%) | 590 (88.46%) |
| Emergency | 21,096 (86.99%) | 2284 (10.83%) | 18,812 (89.17%) | 22,890 (82.31%) | 2427 (10.60%) | 20,463 (89.40%) |
| Elective | 2594 (10.70%) | 31 (1.20%) | 2563 (98.80%) | 4252 (15.29%) | 55 (1.29%) | 4197 (98.71%) |
| Site (%) | ||||||
| MICU | 9654 (39.81%) | 1099 (11.38%) | 8555 (88.62%) | 10,309 (37.07%) | 1187 (11.51%) | 9122 (88.49%) |
| SICU | 3942 (16.25%) | 476 (12.08%) | 3466 (87.92%) | 4543 (16.34%) | 501 (11.03%) | 4042 (88.97%) |
| CCU | 3925 (16.18%) | 334 (8.51%) | 3591 (91.49%) | 4316 (15.52%) | 360 (8.34%) | 3956 (91.66%) |
| CSRU | 2955 (12.18%) | 126 (4.26%) | 2829 (95.74%) | 4482 (16.12%) | 149 (3.32%) | 4333 (96.68%) |
| TSICU | 3776 (15.57%) | 349 (9.24%) | 3427 (90.76%) | 4159 (14.96%) | 362 (8.70%) | 3797 (91.30%) |
| Outcomes | ||||||
| Hospital LOS (days) [Q1–Q3] | 8.98 [3.79–10.66] | 9.16 [2.76–11.40] | 8.97 [3.86–10.58] | 8.95 [3.88–10.47] | 9.27 [2.77–11.49] | 8.92 [3.96–10.34] |
| ICU LOS (days) [Q1–Q3] | 4.26 [1.37–4.57] | 6.64 [2.12–8.13] | 4.00 [1.24–3.97] | 4.15 [1.26–4.17] | 6.68 [2.08–8.12] | 3.89 [1.22–3.89] |
| Hospital death (%) | 2384 (9.83%) | - | - | 2559 (9.20%) | - | - |
MICU, medical ICU; SICU, surgical ICU; CCU, coronary care unit; CSRU, cardiac surgery recovery unit; TSICU, trauma surgical ICU; LOS, length of stay.
Demographic information of the selected patient cohort.
| Dataset | Measure | In-Hospital Mortality | Short-Term Mortality (48 h) | Short-Term Mortality (72 h) | Long-Term Mortality (30 Days) | Long-Term Mortality (1 Year) |
|---|---|---|---|---|---|---|
| 12 h | Number of survivors | 21,868 | 23,873 | 23,590 | 21,932 | 21,839 |
| 12 h | Number of deaths | 2384 | 379 | 662 | 2320 | 2413 |
| 12 h | Mortality ratio | 9.83% | 1.56% | 2.73% | 9.57% | 9.95% |
| 24 h | Number of survivors | 25,250 | 27,409 | 27,103 | 25,324 | 25,219 |
| 24 h | Number of deaths | 2559 | 400 | 706 | 2485 | 2590 |
| 24 h | Mortality ratio | 9.20% | 1.44% | 2.54% | 8.94% | 9.31% |
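
The mortality ratios in this table can be reproduced from the survivor and death counts. The snippet below is a minimal sketch of that arithmetic; the function name `mortality_ratio` is illustrative and does not come from the paper.

```python
def mortality_ratio(survived: int, died: int) -> float:
    """Deaths as a percentage of the whole cohort (survivors plus deaths)."""
    return 100 * died / (survived + died)


# Counts copied from the table above.
ratio_12h_in_hospital = round(mortality_ratio(21868, 2384), 2)
ratio_24h_48h = round(mortality_ratio(27409, 400), 2)

print(ratio_12h_in_hospital)  # 9.83
print(ratio_24h_48h)          # 1.44
```

The very low short-term ratios (about 1.5% at 48 h) show how imbalanced the classes are before any oversampling.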
Number of instances increased by SMOTE technique.
| Hours after Hospital Admission | Mortality | Percentage of SMOTE Increase | Class “Survived” | Class “Died” |
|---|---|---|---|---|
| 12 h | In-Hospital | 900% | 21,868 | 21,456 |
| 12 h | Short Term, 48 h | 6200% | 23,873 | 23,498 |
| 12 h | Short Term, 72 h | 3500% | 23,590 | 23,170 |
| 12 h | Long Term, 30 Days | 900% | 21,932 | 20,880 |
| 12 h | Long Term, 1 Year | 900% | 21,839 | 21,717 |
| 24 h | In-Hospital | 900% | 25,250 | 23,031 |
| 24 h | Short Term, 48 h | 6800% | 27,409 | 27,200 |
| 24 h | Short Term, 72 h | 3800% | 27,103 | 26,828 |
| 24 h | Long Term, 30 Days | 1000% | 25,324 | 24,850 |
| 24 h | Long Term, 1 Year | 900% | 25,219 | 23,310 |
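
For readers unfamiliar with SMOTE, the following is a minimal sketch of the classic interpolation step, written with the Python standard library only. It is not the authors' implementation: the function name `smote`, the neighbour count `k`, and the toy data are assumptions made for illustration. Each synthetic point lies on the segment between a minority sample and one of its nearest minority neighbours.

```python
import math
import random


def smote(minority, percent, k=5, seed=0):
    """Return synthetic minority samples (classic SMOTE interpolation).

    minority: list of equal-length numeric feature vectors
    percent:  oversampling amount; 900 means 9 synthetic points per sample
    k:        number of nearest minority neighbours to interpolate towards
    """
    rng = random.Random(seed)
    per_sample = percent // 100
    synthetic = []
    for i, x in enumerate(minority):
        # k nearest minority neighbours of x, excluding x itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        for _ in range(per_sample):
            nb = rng.choice(neighbours)
            gap = rng.random()  # position along the segment from x to nb
            synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Hypothetical toy minority class with two features.
toy_minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(toy_minority, percent=900, k=2)
print(len(new_points))  # 36 (4 samples x 9 synthetic points each)

# Arithmetic check against the table: 2384 deaths (12 h, in-hospital) at 900%.
print(2384 * 900 // 100)  # 21456
```

The tabulated "Died" counts are consistent with the original number of deaths multiplied by the SMOTE percentage divided by 100 (for example, 379 × 62 = 23,498 for the 12 h, 48 h row). Whether that figure includes the original records is not stated in this section.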
Confusion Matrix.
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | True Positive (TP) | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN) |
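
The metrics reported in the comparison tables (sensitivity, specificity, precision, and F1) follow from the four confusion-matrix cells using their standard definitions. The sketch below shows them; the function names and the example counts are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Standard metrics derived from the confusion-matrix cells."""
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)    # positive predictive value
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1_score(precision, sensitivity),
    }


def f1_score(precision: float, sensitivity: float) -> float:
    """Harmonic mean of precision and sensitivity."""
    return 2 * precision * sensitivity / (precision + sensitivity)


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=80, fn=20, fp=30, tn=870)
print(round(metrics["sensitivity"], 4))  # 0.8
print(round(metrics["f1"], 4))           # 0.7619

# Consistency check with a reported row (GB, 24 h dataset, 1-year mortality):
# precision 0.1495 and sensitivity 0.7563 give the tabulated F1.
print(round(f1_score(0.1495, 0.7563), 4))  # 0.2497
```

Because precision depends on the class balance of the evaluation data, a model can combine high sensitivity and specificity with low precision and F1 when deaths are rare.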
Comparisons between different models constructed using 12 h dataset in terms of their prediction accuracy.
| Metric | Method | In-Hospital Mortality | Short-Term Mortality (48 h) | Short-Term Mortality (72 h) | Long-Term Mortality (30 Days) | Long-Term Mortality (1 Year) |
|---|---|---|---|---|---|---|
| AUROC | CART | 0.8101 | 0.8033 | 0.8006 | 0.7925 | 0.8471 |
| AUROC | LR | 0.8029 | 0.8659 | 0.8222 | 0.8224 | 0.9082 |
| AUROC | MARS | 0.8124 | 0.8502 | 0.8195 | 0.8170 | 0.8716 |
| AUROC | RF | 0.8415 | 0.8867 | 0.8498 | 0.8543 | 0.8953 |
| AUROC | GB | 0.8489 | 0.8862 | 0.8542 | 0.8556 | 0.9171 |
| Specificity | CART | 0.7416 | 0.8425 | 0.7071 | 0.7544 | 0.8181 |
| Specificity | LR | 0.7352 | 0.7921 | 0.7455 | 0.7743 | 0.8521 |
| Specificity | MARS | 0.5696 | 0.6243 | 0.5861 | 0.6344 | 0.7090 |
| Specificity | RF | 0.1735 | 0.2408 | 0.1638 | 0.1900 | 0.3796 |
| Specificity | GB | 0.7528 | 0.8712 | 0.7578 | 0.7907 | 0.9259 |
| Sensitivity | CART | 0.7460 | 0.7211 | 0.7864 | 0.7553 | 0.8080 |
| Sensitivity | LR | 0.7126 | 0.8188 | 0.7457 | 0.7158 | 0.8000 |
| Sensitivity | MARS | 0.8599 | 0.8842 | 0.8665 | 0.8275 | 0.8480 |
| Sensitivity | RF | 0.9912 | 0.9842 | 0.9970 | 0.9933 | 0.9760 |
| Sensitivity | GB | 0.7810 | 0.7211 | 0.7864 | 0.7647 | 0.6720 |
| Precision | CART | 0.2323 | 0.1167 | 0.2179 | 0.2605 | 0.0788 |
| Precision | LR | 0.2273 | 0.1041 | 0.2400 | 0.2756 | 0.0952 |
| Precision | MARS | 0.1731 | 0.0636 | 0.1785 | 0.2059 | 0.0531 |
| Precision | RF | 0.1117 | 0.0361 | 0.1101 | 0.1231 | 0.0294 |
| Precision | GB | 0.2487 | 0.1391 | 0.2520 | 0.2950 | 0.1487 |
| F1-Statistic | CART | 0.3542 | 0.2009 | 0.3413 | 0.3874 | 0.1436 |
| F1-Statistic | LR | 0.3447 | 0.1848 | 0.3632 | 0.3979 | 0.1702 |
| F1-Statistic | MARS | 0.2882 | 0.1187 | 0.2960 | 0.3297 | 0.1000 |
| F1-Statistic | RF | 0.2007 | 0.0696 | 0.1983 | 0.2191 | 0.0571 |
| F1-Statistic | GB | 0.3773 | 0.2332 | 0.3817 | 0.4258 | 0.2435 |
CART, classification and regression trees; LR, logistic regression; MARS, multivariate adaptive regression splines; RF, random forest; and GB, gradient boosting.
Comparisons between different models constructed using 24 h dataset in terms of their prediction accuracy.
| Metric | Method | In-Hospital Mortality | Short-Term Mortality (48 h) | Short-Term Mortality (72 h) | Long-Term Mortality (30 Days) | Long-Term Mortality (1 Year) |
|---|---|---|---|---|---|---|
| AUROC | CART | 0.8049 | 0.8246 | 0.8064 | 0.8140 | 0.8511 |
| AUROC | LR | 0.8331 | 0.9014 | 0.8438 | 0.8434 | 0.8987 |
| AUROC | MARS | 0.8053 | 0.8843 | 0.8250 | 0.8102 | 0.8935 |
| AUROC | RF | 0.8623 | 0.9203 | 0.8705 | 0.8710 | 0.9096 |
| AUROC | GB | 0.8623 | 0.9249 | 0.8760 | 0.8736 | 0.9280 |
| Specificity | CART | 0.7639 | 0.7927 | 0.7537 | 0.7658 | 0.8197 |
| Specificity | LR | 0.7707 | 0.8313 | 0.7812 | 0.7684 | 0.8772 |
| Specificity | MARS | 0.6443 | 0.6151 | 0.6115 | 0.6054 | 0.6765 |
| Specificity | RF | 0.2520 | 0.3011 | 0.2056 | 0.2301 | 0.4212 |
| Specificity | GB | 0.8184 | 0.8507 | 0.8123 | 0.7882 | 0.9316 |
| Sensitivity | CART | 0.7402 | 0.8071 | 0.7578 | 0.7607 | 0.7983 |
| Sensitivity | LR | 0.7278 | 0.8090 | 0.7597 | 0.7683 | 0.7642 |
| Sensitivity | MARS | 0.8023 | 0.9137 | 0.8512 | 0.8409 | 0.8992 |
| Sensitivity | RF | 0.9899 | 0.9848 | 0.9932 | 0.9866 | 0.9580 |
| Sensitivity | GB | 0.7427 | 0.8325 | 0.7618 | 0.7821 | 0.7563 |
| Precision | CART | 0.2499 | 0.0905 | 0.2303 | 0.2425 | 0.0658 |
| Precision | LR | 0.2658 | 0.1168 | 0.2673 | 0.2601 | 0.0955 |
| Precision | MARS | 0.1933 | 0.0572 | 0.1756 | 0.1736 | 0.0423 |
| Precision | RF | 0.1233 | 0.0348 | 0.1084 | 0.1122 | 0.0257 |
| Precision | GB | 0.3030 | 0.1248 | 0.2831 | 0.2669 | 0.1495 |
| F1-Statistic | CART | 0.3736 | 0.1628 | 0.3532 | 0.3678 | 0.1216 |
| F1-Statistic | LR | 0.3894 | 0.2041 | 0.3955 | 0.3887 | 0.1698 |
| F1-Statistic | MARS | 0.3116 | 0.1077 | 0.2912 | 0.2878 | 0.0809 |
| F1-Statistic | RF | 0.2193 | 0.0672 | 0.1955 | 0.2014 | 0.0500 |
| F1-Statistic | GB | 0.4304 | 0.2171 | 0.4128 | 0.3980 | 0.2497 |
CART, classification and regression trees; LR, logistic regression; MARS, multivariate adaptive regression splines; RF, random forest; and GB, gradient boosting.
Figure 4. The AUROCs of different classifiers based on (A) the 12 h dataset and (B) the 24 h dataset.
Figure 5. AUROC of different classifiers based on the 24 h dataset.
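
As background for the AUROC values plotted in these figures, the sketch below computes AUROC through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `auroc` and the toy labels and scores are hypothetical; this is not the evaluation code used in the study.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 class labels (1 = positive, e.g. died)
    scores: predicted risk scores, higher meaning more likely positive
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: one positive case is ranked below a negative case.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
toy_auc = auroc(toy_labels, toy_scores)
print(round(toy_auc, 4))  # 0.8889
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; a sort-based rank computation would be preferable for cohorts of the size analysed here.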
The selected six important variables for 12 h and 24 h datasets by using GB.
| Dataset | Order of Variable Importance | In-Hospital Mortality | Short-Term Mortality (48 h) | Short-Term Mortality (72 h) | Long-Term Mortality (30 Days) | Long-Term Mortality (1 Year) |
|---|---|---|---|---|---|---|
| 12 h | 1 | x1 | x1 | x1 | x1 | x1 |
| 12 h | 2 | x12 | x5 | x12 | x12 | x9 |
| 12 h | 3 | x5 | x12 | x5 | x5 | TOPICA3 |
| 12 h | 4 | x2 | TOPICA3 | x6 | x9 | x12 |
| 12 h | 5 | x6 | x9 | TOPICA3 | x2 | x2 |
| 12 h | 6 | x4 | x3 | x4 | x4 | x4 |
| 24 h | 1 | x1 | x7 | x7 | x7 | x7 |
| 24 h | 2 | x5 | x10 | TOPICB1 | x12 | TOPICB1 |
| 24 h | 3 | x12 | TOPICB1 | x12 | x6 | x10 |
| 24 h | 4 | x9 | x12 | x10 | TOPICB1 | x12 |
| 24 h | 5 | TOPICB1 | x8 | x3 | x10 | x3 |
| 24 h | 6 | x2 | x1 | x5 | x3 | x1 |
Comparisons (made using the GB method) of the prediction results generated by models based on 24 h dataset.
| Dataset | Metric | In-Hospital Mortality | Short-Term Mortality (48 h) | Short-Term Mortality (72 h) | Long-Term Mortality (30 Days) | Long-Term Mortality (1 Year) |
|---|---|---|---|---|---|---|
| With semi-structured data | AUROC | 0.8623 | 0.9249 | 0.8760 | 0.8736 | 0.9280 |
| With semi-structured data | Specificity | 0.8184 | 0.8507 | 0.8123 | 0.7882 | 0.9316 |
| With semi-structured data | Sensitivity | 0.7427 | 0.8325 | 0.7618 | 0.7821 | 0.7563 |
| With semi-structured data | Precision | 0.3030 | 0.1248 | 0.2831 | 0.2669 | 0.1495 |
| With semi-structured data | F1-Statistic | 0.4304 | 0.2171 | 0.4128 | 0.3980 | 0.2497 |
| Without semi-structured data | AUROC | 0.8545 | 0.9141 | 0.8643 | 0.8683 | 0.9152 |
| Without semi-structured data | Specificity | 0.8113 | 0.8276 | 0.8046 | 0.7932 | 0.9215 |
| Without semi-structured data | Sensitivity | 0.7389 | 0.8426 | 0.7564 | 0.7687 | 0.7143 |
| Without semi-structured data | Precision | 0.2939 | 0.1111 | 0.2735 | 0.2682 | 0.1265 |
| Without semi-structured data | F1-Statistic | 0.4205 | 0.1963 | 0.4017 | 0.3976 | 0.2149 |
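
To make the effect of the semi-structured data easier to read, the AUROC difference between the two GB models can be computed for each mortality horizon. The snippet below is a small sketch of that subtraction using the values from the table; the variable names are illustrative.

```python
# AUROC values copied from the table (GB model, 24 h dataset).
auroc_with_text = {
    "In-hospital": 0.8623, "48 h": 0.9249, "72 h": 0.8760,
    "30 days": 0.8736, "1 year": 0.9280,
}
auroc_without_text = {
    "In-hospital": 0.8545, "48 h": 0.9141, "72 h": 0.8643,
    "30 days": 0.8683, "1 year": 0.9152,
}

# Gain in AUROC attributable to adding the semi-structured data.
auroc_gain = {
    horizon: round(auroc_with_text[horizon] - auroc_without_text[horizon], 4)
    for horizon in auroc_with_text
}

for horizon, gain in auroc_gain.items():
    print(f"{horizon}: +{gain:.4f}")
# In-hospital: +0.0078
# 48 h: +0.0108
# 72 h: +0.0117
# 30 days: +0.0053
# 1 year: +0.0128
```

The AUROC gain is positive at every horizon, although not every metric improves: sensitivity at 48 h and specificity and precision at 30 days are slightly higher without the semi-structured data.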