Dina A Alabbad, Abdullah M Almuhaideb, Shikah J Alsunaidi, Kawther S Alqudaihi, Fatimah A Alamoudi, Maha K Alhobaishi, Naimah A Alaqeel, Mohammed S Alshahrani.
Abstract
The COVID-19 virus has spread rapidly throughout the world. Managing resources is one of the biggest challenges that healthcare providers around the world face during the pandemic. Allocating Intensive Care Unit (ICU) bed capacity is important because COVID-19 is a respiratory disease, and some patients must be admitted to hospital with an urgent need for oxygen support, ventilation, and/or intensive medical care. In the battle against COVID-19, many governments used technology, especially Artificial Intelligence (AI), to contain the pandemic and limit its hazardous effects. In this paper, Machine Learning (ML) models were developed to help detect COVID-19 patients' need for the ICU and estimate the duration of their stay. Four ML algorithms were used: Random Forest (RF), Gradient Boosting (GB), Extreme Gradient Boosting (XGBoost), and an Ensemble model. They were trained and validated on a dataset of 895 COVID-19 patients admitted to King Fahad University Hospital in the Eastern Province of Saudi Arabia. The experiments show that the RF model predicts the Length of Stay (LoS) in the ICU with the highest accuracy, 94.16%. Regarding the factors contributing to LoS in the ICU, the correlation results showed that age, C-Reactive Protein (CRP), and the number of days on nasal oxygen support are the most strongly related factors. To the best of our knowledge, no published work has used a Saudi Arabian dataset to predict both the need for the ICU and the number of days needed. This contribution is hoped to pave the way for hospitals and healthcare providers to manage their resources more efficiently and to help save lives.
Keywords: Coronavirus disease 2019 (COVID-19); Intensive care unit (ICU); Length of stay (LoS); Machine learning (ML); Prediction; Resource management
Year: 2022 PMID: 35441086 PMCID: PMC9010025 DOI: 10.1016/j.imu.2022.100937
Source DB: PubMed Journal: Inform Med Unlocked ISSN: 2352-9148
Summary of works aimed at estimating the need and LoS in ICU for COVID-19 patients worldwide.
| Ref | Aim | Model | Algorithms/ Methods | Dataset | Model inputs/Extracted features | Results |
|---|---|---|---|---|---|---|
| [ | Predict ICU admission, LoS in the ICU, and mortality for COVID-19 patients | Multivariate logistic regression | LR | EHRs of | Clinical data | |
| [ | Predict patient census and estimate ventilator needs for a specific hospital during the COVID-19 pandemic | Analytical model (Weibull distribution) | Linear and log-linear regression | EHRs from | LoS in hospital, and duration of using the ventilator | |
| [ | Estimate LoS of hospitalized COVID-19 patients | Non-parametric model | – | EHRs of | Age, and gender | – |
| [ | Estimate the LoS of COVID-19 patients in ICU | Semiparametric distributional index model | Distributional regression model | EHRs of | Age, and gender | |
| [ | Describe COVID-19 clinical characteristics outside of Wuhan and predict the risk of long LoS in hospital | Multivariate regression model | Statistical methods ( | EHRs of | Demographic data, comorbidities, laboratory results, symptoms, and vital signs | |
| [ | Predict ICU admission, LoS in the ICU, and mortality for COVID-19 patients | ML model | SVM | EHRs of | Demographic, laboratory, and clinical data | |
| [ | Estimate average LoS in the ICU for COVID-19 patients | Mathematical model | Two estimation methods: DPE and CPE | EHRs of COVID-19 patients who entered the ICU of | Age and gender | |
Note: Area Under the Curve (AUC), Acute Respiratory Distress Syndrome (ARDS), Electronic Healthcare Records (EHR), Discharged Patient Estimation (DPE), Censored Patient Estimation (CPE), Linear Regression (LR), Mean Absolute Error (MAE), Support Vector Machine (SVM), University of Iowa Hospitals and Clinics (UIHC), University of Miami UHealth Tower (UHT), Zhongnan Hospital of Wuhan University (ZHWU).
Relationship between LoS and clinical features of patients.
| Factor | Studies that confirmed relation | Studies that confirmed no relation |
|---|---|---|
|  | [ | [ |
|  | [ | [ |
|  | [ |  |
|  | [ |  |
|  | [ | – |
|  | [ | – |
|  | [ | – |
|  | [ | – |
|  | [ | – |
|  | [ | – |
|  | [ | – |
|  | [ | – |
|  | [ | – |
|  | [ | – |
|  | [ | – |
|  | [ | – |
Strengths and weaknesses of the implemented ML algorithms: RF, GB, XGBoost, and the ensemble classifier.
| Algorithm | Strength | Weakness |
|---|---|---|
| Random Forest (RF) | A collection of decision trees that fit the data and produce high variation in classification. Data are classified by majority vote. Lower chance of variation during training. Scales well to large datasets. Identifies which features are most useful for classification [ | Very sensitive to the training data, which makes it error-prone. Complex and computationally expensive. The base classifiers need to be defined. It favors attributes that take a larger number of distinct values [ |
| Gradient Boosting (GB) | It improves prediction performance [ The algorithm builds relations by reducing the errors of the previous weak classifiers [ | Up-sampling similar data does not improve the results [ |
| Extreme Gradient Boosting (XGBoost) | Designed for large, complex datasets and to avoid model overfitting. The method is scalable in all scenarios. It handles sparse data and supports parallel and distributed computation, which makes the learning process faster [ It always involves many classification and regression trees [ | Complex and computationally expensive [ |
| Ensemble Classifier | It combines a collection of single classifiers by weighted averaging or voting. The ensemble method combines multiple weak classifiers into a strong classifier. An empirical study shows that the cost of building a base classifier is lower than the cost of building a strong classifier. It can maximize the information of the base learners and improve the overall classification ability [ | The method's robustness is affected by the quality of the dataset [ |
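The table describes RF as classifying by majority vote and the ensemble classifier as combining single classifiers by voting or weighted averaging. A minimal sketch of both combination rules in plain Python follows; the class labels, probabilities, and weights are hypothetical and are not the study's outputs or configuration.

```python
from collections import Counter

def majority_vote(predictions):
    """Hard voting: return the class label predicted by the most base classifiers.

    Counter.most_common orders equal counts by first occurrence, so ties go
    to the label that appeared first in `predictions`.
    """
    return Counter(predictions).most_common(1)[0][0]

def weighted_average_vote(probabilities, weights):
    """Soft voting: weight each classifier's class probabilities, average them,
    and return the class with the highest averaged probability."""
    total = sum(weights)
    classes = probabilities[0].keys()
    averaged = {
        c: sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in classes
    }
    return max(averaged, key=averaged.get), averaged

# Hypothetical outputs of three base classifiers for one patient.
votes = ["short_stay", "long_stay", "short_stay"]
probs = [
    {"short_stay": 0.60, "long_stay": 0.40},
    {"short_stay": 0.30, "long_stay": 0.70},
    {"short_stay": 0.55, "long_stay": 0.45},
]
print(majority_vote(votes))                        # short_stay
print(weighted_average_vote(probs, [1, 2, 1])[0])  # long_stay
```

The two rules can disagree: giving the second classifier double weight shifts the averaged probability toward `long_stay` even though most classifiers vote for `short_stay`.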
Fig. 1. Class distribution of days to discharge from the ICU before oversampling.
Fig. 2. Class distribution of days to discharge from the ICU after oversampling.
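Figs. 1 and 2 show the class distribution before and after oversampling. This excerpt does not name the oversampling technique, so the sketch below assumes plain random oversampling (duplicating randomly drawn minority-class samples until every class matches the largest one); the study may have used a different method such as SMOTE. The helper name and toy data are hypothetical.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Random oversampling: duplicate randomly drawn samples of each minority
    class until every class matches the size of the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels

# Hypothetical toy data: feature vectors with an imbalanced LoS class label.
X = [[1], [2], [3], [4], [5]]
y = ["short", "short", "short", "long", "long"]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # Counter({'short': 3, 'long': 3})
```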
Entropy values for features included in the study.
| Attribute | Entropy Evaluation |
|---|---|
|  | 0.64 |
|  | 4.03 |
|  | 0.71 |
|  | 0.68 |
|  | 0.40 |
|  | 0.31 |
|  | 0.2 |
|  | 0.04 |
|  | 0.23 |
|  | 0.12 |
|  | 0.10 |
|  | 0.71 |
|  | 0.71 |
|  | 0.45 |
|  | 0.35 |
|  | 0.64 |
|  | 0.36 |
|  | 0.53 |
|  | 2.86 |
|  | 3.28 |
|  | 0.80 |
|  | 0.08 |
|  | 0.72 |
|  | 0.43 |
|  | 0.63 |
|  | 0.56 |
|  | 0.17 |
|  | 0.10 |
|  | 0.06 |
|  | 0.06 |
|  | 0.03 |
|  | 0.03 |
|  | 0.06 |
|  | 0.13 |
|  | 5.35 |
|  | 5.56 |
|  | 5.64 |
|  | 4.34 |
|  | 4.57 |
|  | 4.73 |
|  | 5.93 |
|  | 5.47 |
|  | 5.90 |
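The entropy table does not state the logarithm base or how continuous attributes were discretized. The sketch below assumes Shannon entropy in bits over the observed values. Under that assumption, entropy is bounded by log2 of the number of distinct values, so an attribute scoring 5.93 must take at least 61 distinct values, which fits continuous measurements rather than binary flags. The function names are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy in bits: H(X) = sum over observed values of p(x) * log2(1 / p(x))."""
    n = len(values)
    return sum((c / n) * math.log2(n / c) for c in Counter(values).values())

def min_distinct_values(entropy_bits):
    """Since H(X) <= log2(k) for k distinct values, at least 2**H distinct values are needed."""
    return math.ceil(2 ** entropy_bits)

print(shannon_entropy(["M", "F", "M", "F"]))  # 1.0
print(shannon_entropy([7, 7, 7, 7]))          # 0.0
print(min_distinct_values(5.93))              # 61
```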
Results of predicting the class of days to discharge from the ICU for Model 1 (RF).
| K-fold | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
|  | 94.16% | 94.14% | 94.16% | 94.14% |
|  | 93.55% | 93.51% | 93.55% | 93.53% |
|  | 87.23% | 87.23% | 87.23% | 87.20% |
|  | 86.07% | 86.11% | 86.07% | 85.96% |
|  | 92.38% | 92.31% | 92.38% | 92.33% |
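The K-fold entries in these tables did not survive extraction, so the values of k are not recoverable here. For reference, a minimal sketch of generic (unstratified) k-fold splitting in plain Python; it is not the authors' code, and the function name and optional shuffling seed are assumptions.

```python
import random

def k_fold_indices(n_samples, k, seed=None):
    """Yield (train, test) index lists for k-fold cross-validation.

    Fold sizes differ by at most one; indices are shuffled first when a seed is given.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    if seed is not None:
        random.Random(seed).shuffle(indices)
    start = 0
    for i in range(k):
        size = n_samples // k + (1 if i < n_samples % k else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

for train, test in k_fold_indices(10, 3):
    print(len(train), len(test))  # 6 4, then 7 3, then 7 3
```

Every sample lands in exactly one test fold, so averaging a metric over the k folds uses each patient once for evaluation.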
Results of predicting the class of days to discharge from the ICU for Model 1 (RF) with feature selection.
| K-fold | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
|  | 93.30% | 93.30% | 93.30% | 93.30% |
|  | 89.45% | 89.24% | 89.45% | 89.26% |
|  | 87.23% | 87.26% | 87.23% | 87.17% |
|  | 86.30% | 86.36% | 86.30% | 86.28% |
|  | 79.59% | 80.17% | 79.59% | 79.67% |
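The tables do not say how precision, recall, and F1-score are averaged over the LoS classes. The fact that Accuracy and Recall are identical in every row is consistent with support-weighted averaging, because weighted recall reduces to accuracy: sum over classes c of (n_c / n)(TP_c / n_c) = (sum of TP_c) / n. A sketch of that averaging in plain Python, under that assumption and with hypothetical labels:

```python
from collections import Counter

def weighted_scores(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall and F1-score."""
    n = len(y_true)
    support = Counter(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for cls, count in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        predicted = sum(p == cls for p in y_pred)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = count / n  # class share of the true labels
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return accuracy, precision, recall, f1

# Hypothetical labels for three LoS classes.
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]
acc, prec, rec, f1 = weighted_scores(y_true, y_pred)
print(round(acc, 4), round(prec, 4), round(rec, 4), round(f1, 4))  # 0.8333 0.8889 0.8333 0.8333
```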
Results of predicting the class of days to discharge from the ICU for Model 2 (GB).
| K-fold | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
|  | 86.21% | 86.24% | 86.21% | 86.11% |
|  | 85.16% | 85.26% | 85.16% | 85.03% |
|  | 86.33% | 86.37% | 86.33% | 86.30% |
|  | 83.33% | 83.50% | 83.33% | 83.36% |
|  | 88.14% | 88.17% | 88.14% | 88.08% |
Results of predicting the class of days to discharge from the ICU for Model 2 (GB) with feature selection.
| K-fold | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
|  | 87.29% | 87.09% | 87.29% | 87.02% |
|  | 83.98% | 83.80% | 83.98% | 83.80% |
|  | 80.85% | 80.93% | 80.85% | 80.62% |
|  | 76.48% | 76.42% | 76.48% | 76.25% |
|  | 77.26% | 77.99% | 77.26% | 77.48% |
Results of predicting the class of days to discharge from the ICU for Model 3 (XGBoost).
| K-fold | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
|  | 91.41% | 91.49% | 91.41% | 91.42% |
|  | 91.21% | 91.10% | 91.21% | 91.10% |
|  | 87.02% | 87.07% | 87.02% | 86.97% |
|  | 83.56% | 83.67% | 83.56% | 83.49% |
|  | 82.69% | 83.27% | 82.69% | 82.89% |
Results of predicting the class of days to discharge from the ICU for Model 3 (XGBoost) with feature selection.
| K-fold | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
|  | 90.72% | 90.80% | 90.72% | 90.78% |
|  | 89.26% | 89.27% | 89.26% | 89.20% |
|  | 85.32% | 85.04% | 85.32% | 84.98% |
|  | 86.99% | 87.05% | 86.99% | 87.00% |
|  | 82.69% | 83.39% | 82.69% | 82.86% |
Results of predicting the class of days to discharge from the ICU for Model 4 (Ensemble).
| K-fold | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
|  | 93.13% | 93.42% | 93.13% | 93.19% |
|  | 92.58% | 92.56% | 92.58% | 92.54% |
|  | 88.94% | 89.55% | 88.94% | 88.96% |
|  | 91.10% | 91.39% | 91.10% | 91.20% |
|  | 85.01% | 85.44% | 85.01% | 85.12% |
Results of predicting the class of days to discharge from the ICU for Model 4 (Ensemble) with feature selection.
| K-fold | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
|  | 93.81% | 93.79% | 93.81% | 93.78% |
|  | 92.19% | 92.23% | 92.19% | 92.13% |
|  | 87.45% | 87.53% | 87.45% | 87.45% |
|  | 88.81% | 88.85% | 88.81% | 88.72% |
|  | 86.56% | 87.18% | 86.56% | 86.71% |
Results of investigating the effect of feature selection on the dataset.
| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Model 1 (RF) | 87.27% | 87.27% | 87.17% | 87.14% |
| Model 2 (GB) | 81.17% | 81.25% | 81.17% | 81.03% |
| Model 3 (XGBoost) | 87.00% | 87.11% | 87.00% | 86.96% |
| Model 4 (Ensemble) | 93.81% | 93.79% | 93.81% | 93.78% |
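For Models 2 and 3, the accuracies in this table appear to equal the mean of the five rows in the corresponding per-fold tables with feature selection. A short check in plain Python, using only values copied from those tables:

```python
from statistics import mean

# Per-fold accuracies (%) from the "with feature selection" tables for Models 2 and 3.
fold_accuracy = {
    "Model 2 (GB)": [87.29, 83.98, 80.85, 76.48, 77.26],
    "Model 3 (XGBoost)": [90.72, 89.26, 85.32, 86.99, 82.69],
}

for model, scores in fold_accuracy.items():
    # Mean over the five reported rows, rounded to two decimals like the tables.
    print(f"{model}: mean {mean(scores):.2f}%, best {max(scores):.2f}%")
# Model 2 (GB): mean 81.17%, best 87.29%
# Model 3 (XGBoost): mean 87.00%, best 90.72%
```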
Results of using the complete feature set to predict the number of days to discharge from the ICU.
| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Model 1 (RF) | 94.16% | 94.14% | 94.16% | 94.14% |
| Model 2 (GB) | 88.14% | 88.17% | 88.14% | 88.08% |
| Model 3 (XGBoost) | 91.41% | 91.49% | 91.41% | 91.42% |
| Model 4 (Ensemble) | 93.13% | 93.42% | 93.13% | 93.19% |
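Fig. 3 compares accuracy before and after feature selection. Taking the percentage-point differences between the two summary tables makes the comparison explicit: only the ensemble (Model 4) gains accuracy with feature selection, while RF remains the most accurate model on the complete feature set. The helper name below is hypothetical; the numbers are copied from the tables.

```python
# Accuracy (%) per model: (complete feature set, with feature selection),
# copied from the two summary tables.
accuracy = {
    "Model 1 (RF)": (94.16, 87.27),
    "Model 2 (GB)": (88.14, 81.17),
    "Model 3 (XGBoost)": (91.41, 87.00),
    "Model 4 (Ensemble)": (93.13, 93.81),
}

def change_in_points(before, after):
    """Percentage-point change in accuracy caused by feature selection."""
    return round(after - before, 2)

for model, (full, selected) in accuracy.items():
    print(f"{model}: {change_in_points(full, selected):+.2f} points")
# Model 1 (RF): -6.89 points
# Model 2 (GB): -6.97 points
# Model 3 (XGBoost): -4.41 points
# Model 4 (Ensemble): +0.68 points
```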
Fig. 3. Accuracy of the prediction models before and after feature selection.
Comparison of the obtained results with related studies.
| Ref | Method | Measurements | Features | Dataset location | Dataset size |
|---|---|---|---|---|---|
| [ | LR |  | Clinical data | Manisa, Turkey | 1668 |
| [ | LR |  | LoS in hospital, and duration of using the ventilator | Miami, USA | – |
| [ | Nonparametric mixture cure model | – | Age, and gender | Spain | 10,454 |
| [ | Distributional regression model |  | Age, and gender | Switzerland | 2411/557 |
| [ | DPE and CPE | DPE and CPE estimates of ICU-ALoS (95% CI) | Age, and gender | ZHWU | 59 |
| [ | SVM |  | Demographic, and clinical data | Wuhan, China | 733 |
| [ | Statistical methods ( |  | Demographic and clinical data | Zhejiang Tertiary | 75 |
| Ours | RF, GB, XGBoost, Ensemble |  | Demographic and clinical data | Saudi Arabia | 895 |
Note: Area Under the Curve (AUC), Acute Respiratory Distress Syndrome (ARDS), Electronic Healthcare Records (EHR), Discharged Patient Estimation (DPE), Censored Patient Estimation (CPE), Linear Regression (LR), Mean Absolute Error (MAE), Support Vector Machine (SVM), University of Iowa Hospitals and Clinics (UIHC), University of Miami UHealth Tower (UHT), Zhongnan Hospital of Wuhan University (ZHWU).
Top 10 features that are highly correlated with the LoS for COVID-19 patients in Saudi Arabia.
| Feature | Rank |
|---|---|
|  | 1 |
|  | 2 |
|  | 3 |
|  | 4 |
|  | 5 |
|  | 6 |
|  | 7 |
|  | 8 |
|  | 9 |
|  | 10 |
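The feature names in this ranking were lost in extraction; the abstract reports age, CRP, and nasal oxygen support days as the top related factors. This excerpt does not say which correlation coefficient was used, so the sketch below assumes Pearson correlation, ranking features by its absolute value against LoS. The function names and toy data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return cov / math.sqrt(sxx * syy)

def rank_features(features, target, top=10):
    """Rank feature names by absolute Pearson correlation with the target (e.g. LoS)."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top]

# Hypothetical toy data, not from the study.
los_days = [2, 4, 6, 8, 10]
features = {
    "feature_noise": [1, 3, 2, 3, 1],
    "feature_up": [1, 2, 3, 4, 5],
    "feature_down": [5, 4, 3, 2, 1],
}
print(rank_features(features, los_days, top=2))  # ['feature_up', 'feature_down']
```

Ranking by absolute value treats a strong negative relation (`feature_down`) as just as informative as a strong positive one.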