
Predicting hospital readmission risk in patients with COVID-19: A machine learning approach.

Mohammad Reza Afrash1, Hadi Kazemi-Arpanahi2,3, Mostafa Shanbehzadeh4, Raoof Nopour5, Esmat Mirbagheri6.   

Abstract

Introduction: The Coronavirus disease 2019 (COVID-19) epidemic strained health systems with severe shortages of hospital resources. In this critical situation, reducing COVID-19 readmissions could help sustain hospital capacity. This study aimed to select the features most strongly associated with COVID-19 readmission and to compare the capability of Machine Learning (ML) algorithms to predict COVID-19 readmission based on the selected features. Material and methods: The data of 5791 hospitalized patients with COVID-19 were retrospectively collected from a hospital registry system. The LASSO feature selection algorithm was used to select the most important features related to COVID-19 readmission. HistGradientBoosting classifier (HGB), Bagging classifier, Multi-Layered Perceptron (MLP), Support Vector Machine (SVM, kernel = linear), SVM (kernel = RBF), and Extreme Gradient Boosting (XGBoost) classifiers were used for prediction. We evaluated the performance of the ML algorithms with 10-fold cross-validation using six performance evaluation metrics.
Results: Out of the 42 features, 14 were identified as the most relevant predictors. The XGBoost classifier outperformed the other six ML models with an average accuracy of 91.7%, specificity of 91.3%, sensitivity of 91.6%, F-measure of 91.8%, and AUC of 0.91.
Conclusion: The experimental results indicate that ML models can satisfactorily predict COVID-19 readmission. Besides considering the risk factors prioritized in this work, categorizing cases with a high risk of reinfection can make patient triage and hospital resource utilization more effective.
© 2022 Published by Elsevier Ltd.

Keywords:  AUC, Area under the curve; Artificial intelligence; CDSS, Clinical Decision Support Systems; COVID-19; COVID-19, Coronavirus disease 2019; CRISP, Cross-Industry Standard Process; Coronavirus; HGB, Hist Gradient Boosting; LASSO, Least Absolute Shrinkage and Selection Operator; ML, Machine learning; MLP, Multi-Layered Perceptron; Machine learning; Readmission; SVM, Support Vector Machine; XGBoost, Extreme Gradient Boosting

Year:  2022        PMID: 35280933      PMCID: PMC8901230          DOI: 10.1016/j.imu.2022.100908

Source DB:  PubMed          Journal:  Inform Med Unlocked        ISSN: 2352-9148


Introduction

Hospital readmission is a well-accepted metric of hospital care quality [1]. It is defined as a new hospitalization in the same hospital within a specified period, typically 30 to 60 days, after the initial discharge [[2], [3], [4]]. High readmission rates are most likely related to the quality of care delivered by hospitals and other health centers during or after the earlier admission [5,6]. Because of the high costs that readmission imposes on hospitals and patients, it has gained substantial attention as one of the most important criteria for evaluating the quality of care and discharge procedures. Estimates show that 60% of patient readmissions can be prevented [7,8]. As COVID-19 spread, the health care systems of many countries were overwhelmed and could not meet patients' growing needs for diagnosis, treatment, and care services [9,10]. Under such conditions, many patients were discharged with only partial recovery [11]. Meanwhile, due to the unknown and aggressive nature of the disease, the readmission rate of patients increased [12]. Readmission imposes additional costs on care organizations and patients. In addition, it lowers the quality indicators of service delivery and increases the rate of serious complications and deaths during the pandemic [13]. According to formal reports, about 5% of patients with confirmed COVID-19 require hospitalization, and reported readmission rates for this disease vary from 2 to 10% [14,15]. In this situation, strengthening the healthcare system against the pandemic requires attention to technological and intelligent solutions such as Clinical Decision Support Systems (CDSSs) [16,17]. CDSSs have attracted increasing interest because of the growing availability of large amounts of patient-level data [18,19].
CDSSs that use patient data available at the time of admission may provide caregivers with valuable information about the risk of COVID-19 readmission [20,21]. Machine learning (ML) algorithms are complex and flexible classification models that leverage large datasets to reveal new and practical patterns [18,22]. ML algorithms can reduce the uncertainties and ambiguities related to new diseases such as COVID-19 by providing evidence-based diagnostic and predictive models for risk assessment, screening, forecasting, and health planning [23,24]. Recently published works have shown that several ML methods are more accurate than conventional statistical models for predicting clinical outcomes in hospitalized COVID-19 patients, such as Length of Stay (LOS), hospital bed occupancy and turnover, Intensive Care Unit (ICU) admission, and respiratory intubation [[25], [26], [27]]. Given the high prevalence of the disease in our country and the limitations and shortage of healthcare resources [28], the purpose of this study is to develop an effective and efficient model by comparing the performance of ML algorithms for COVID-19 readmission prediction. The present study therefore seeks to answer two questions. Which predictor variables most strongly affect readmission and worsening of patients after the first hospitalization? And which ML model is most effective for predicting readmission?

Material and methods

Study roadmap and experiment environment

The present study was a retrospective, single-center study conducted in 2022 to predict readmission in patients with confirmed COVID-19. It followed the Cross-Industry Standard Process (CRISP), one of the most popular methodologies for data mining and ML projects, and was carried out in five main steps: 1- Data understanding, 2- Data preprocessing, 3- Feature selection, 4- Classification, and 5- Evaluation. Fig. 1 shows the steps and sub-steps of the proposed model based on CRISP. All experiments with the data mining algorithms for predicting readmission were run in the Python programming language. Fig. 2 shows the patient selection flow chart.
Fig. 1

The roadmap of the proposed system for prediction of readmission based on the CRISP method.

Fig. 2

Flow chart describing patient selection.


Data set description

The included cases are described by 42 variables in three main classes: patient demographics (three variables), hospitalization (eight variables), and clinical (31 variables) (see Table 1). After reviewing the demographic, clinical, and hospitalization information of the patients with confirmed COVID-19, statistical analysis was performed to describe the differences between patients who were readmitted and those who were not. For this purpose, the relationship of each feature with readmission was checked with the Chi-square test.
Table 1

Patient characteristics variable data.

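Patient Characteristics | Variables | Level | Total | Readmission (N) | Non-Readmission (N) | P-value
Demographical | Sex | Female | 2720 | 412 | 2308 | <0.002**
 |  | Male | 3071 | 332 | 2739 |
 | Marital status | Single | 1219 | 631 | 588 | <0.004**
 |  | Married | 4572 | 239 | 4333 |
 | Age | 0–30 | 1363 | 152 | 1211 | <0.001**
 |  | 30–60 | 1836 | 146 | 1690 |
 |  | 60–90 | 2952 | 572 | 2380 |
Hospitalization | Number of admissions | 1 | 4921 | 0 | 4921 | <0.002**
 |  | 2–4 | 780 | 780 | 0 |
 |  | >4 | 90 | 90 | 0 |
 | Type of admission | Inpatient care | 2075 | 524 | 1551 | <0.001**
 |  | Outpatient care | 3716 | 346 | 3370 |
 | ICU admission | Yes | 528 | 462 | 66 | <0.002**
 |  | No | 5263 | 408 | 4855 |
 | Oxygen therapy | Yes | 720 | 543 | 177 | <0.161
 |  | No | 5071 | 327 | 4744 |
 | CRP on admission | Yes | 380 | 329 | 51 | <0.039**
 |  | No | 5411 | 541 | 4870 |
 | Duration of hospitalization | <24 h | 3917 | 43 | 3874 | <0.497**
 |  | 1–7 days | 1465 | 519 | 946 |
 |  | >7 days | 409 | 308 | 101 |
 | Patient status on discharge | Partial recovery | 1430 | 774 | 656 | <0.041**
 |  | Complete recovery | 3970 | 62 | 3908 |
 |  | Dead | 391 | 34 | 357 |
 | Time to readmission | <30 days | 1300 | 257 | 1043 | <0.052
 |  | >30 days | 4491 | 613 | 3878 |
Clinical | COVID status | Critical | 520 | 14 | 506 | <0.001**
 |  | Severe | 1034 | 142 | 892 |
 |  | Moderate | 2300 | 540 | 1760 |
 |  | Mild | 1540 | 98 | 1442 |
 |  | Recovered | 397 | 14 | 383 |
 | Severe kidney disease | Yes | 240 | 49 | 191 | <0.630
 |  | No | 5551 | 821 | 4730 |
 | Solid organ transplantation | Yes | 182 | 94 | 88 | <0.951
 |  | No | 5609 | 776 | 4833 |
 | Lymphocytes on discharge | Yes | 746 | 297 | 449 | <0.832
 |  | No | 5045 | 573 | 4472 |
 | Coronary artery disease | Yes | 570 | 381 | 189 | <0.267
 |  | No | 5221 | 489 | 4732 |
 | Cancer | Yes | 168 | 119 | 49 | <0.574
 |  | No | 5623 | 751 | 4872 |
 | History of CT result | Normal | 3321 | 540 | 2781 | <0.059
 |  | Abnormal | 2470 | 330 | 2140 |
 | Pregnancy | Yes | 94 | 23 | 71 | <0.720
 |  | No | 5697 | 847 | 4850 |
 | Congestive heart failure | Yes | 350 | 180 | 170 | <0.968
 |  | No | 5441 | 690 | 4751 |
 | Cerebrovascular disease | Yes | 49 | 8 | 41 | <0.602
 |  | No | 5742 | 862 | 4880 |
 | C reactive protein on admission | Yes | 5308 | 710 | 4598 | <0.057
 |  | No | 753 | 160 | 593 |
 | Congestive heart failure | Yes | 135 | 94 | 41 | <0.619
 |  | No | 5656 | 776 | 4880 |
 | Asthma | Yes | 74 | 41 | 33 | <0.570
 |  | No | 5717 | 829 | 4888 |
 | Metastatic solid tumor | Yes | 14 | 3 | 11 | <0.924
 |  | No | 5776 | 867 | 4909 |
 | Diabetes mellitus | Yes | 364 | 79 | 285 | <0.738
 |  | No | 5427 | 791 | 4636 |
 | D-dimer | Yes | 4680 | 361 | 4319 | <0.042**
 |  | No | 1111 | 509 | 602 |
 | Dyspnea | Yes | 1640 | 490 | 1150 | <0.069
 |  | No | 4151 | 380 | 3771 |
 | Underlying diseases | Yes | 839 | 538 | 301 | <0.073
 |  | No | 4952 | 468 | 4484 |
 | Headache | Yes | 4981 | 681 | 4300 | <0.075
 |  | No | 810 | 189 | 621 |
 | Weakness and lethargy | Yes | 5134 | 526 | 4608 | <0.052
 |  | No | 657 | 344 | 313 |
 | Body pain | Yes | 4391 | 617 | 3774 | <0.061
 |  | No | 1400 | 253 | 1147 |
 | Pain or pressure in the chest | Yes | 2670 | 594 | 2076 | <0.068
 |  | No | 3121 | 276 | 2845 |
 | High fever | Yes | 4621 | 713 | 3908 | <0.072
 |  | No | 1170 | 157 | 1013 |
 | Nausea & Vomiting | Yes | 3910 | 672 | 3238 | <0.067
 |  | No | 1881 | 198 | 1683 |
 | Cough | Yes | 4627 | 593 | 4034 | <0.0512
 |  | No | 1164 | 277 | 887 |
 | Gastrointestinal symptoms | Yes | 234 | 56 | 178 | <0.102
 |  | No | 5557 | 814 | 4743 |
 | Chronic pulmonary | Yes | 261 | 73 | 188 | <0.284
 |  | No | 5530 | 797 | 4733 |
 | Hypertension | Yes | 840 | 142 | 698 | <0.043**
 |  | No | 4951 | 728 | 4223 |
 | Consolidation | Yes | 461 | 59 | 402 | <0.0497**
 |  | No | 5330 | 811 | 4519 |
 | Pleural fluid | Yes | 571 | 137 | 434 | <0.0581
 |  | No | 5220 | 733 | 4487 |
 | Hypersensitive troponin | Yes | 892 | 261 | 568 | <0.042*
 |  | No | 4899 | 609 | 4290 |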
Of 5791 COVID-19 hospitalized patients, 3071 (53.04%) were male and 2720 (46.96%) were female, and the median age of participants was 57.25 (interquartile 00–100). 528 (13.87%) were hospitalized in the ICU, and 2075 (86.13%) were hospitalized in general wards. Out of the 5791 included patients, 870 (15.02%) were readmitted within 30 days after initial discharge.
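The Chi-square association check described for Table 1 can be illustrated with a short sketch. It computes the Pearson statistic for a single 2x2 table, using the ICU admission counts reported in Table 1. The function name and the omission of a continuity correction are assumptions made for illustration; the source does not state which implementation or correction the authors used.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of the row and column variables.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    return statistic


# Rows: ICU admission yes / no; columns: readmitted / not readmitted (Table 1).
icu_statistic = chi_square_2x2(462, 66, 408, 4855)

# 3.841 is the chi-square critical value for 1 degree of freedom at alpha = 0.05.
CRITICAL_VALUE = 3.841
print("ICU admission associated with readmission:", icu_statistic > CRITICAL_VALUE)
```

A statistic above the critical value corresponds to p < 0.05 for a variable with two levels; variables with more levels need the matching degrees of freedom.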

Ethical consideration

The ethics committee of Ilam University of Medical Sciences approved the study (Ethics code: IR.MEDILAM.REC.1399.294). To protect the privacy and confidentiality of patients, we concealed the unique identification information of all patients during data collection and presentation.

Preprocessing step

Preprocessing was applied to the dataset before training the proposed models. Several preprocessing steps were examined, including removing records with missing values (rows with more than 70% missing values were removed), standard scaling, min-max scaling, data validation, and under-sampling, so that the data could be used correctly by the machine learning algorithms. Noisy and abnormal values, duplicates, and meaningless data, which affect the results of ML models, were examined and removed by two authors (M.A. and M.SH.).
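The row filtering and scaling steps named above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' pipeline: the helper names and the toy rows are hypothetical, and missing values are assumed to be encoded as `None`.

```python
import statistics


def drop_sparse_rows(rows, max_missing_fraction=0.7):
    """Keep rows whose share of missing (None) fields is at most the threshold."""
    return [
        row for row in rows
        if sum(value is None for value in row) / len(row) <= max_missing_fraction
    ]


def min_max_scale(values):
    """Rescale values linearly to the [0, 1] range."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(value - low) / (high - low) for value in values]


def standard_scale(values):
    """Center values to mean 0 and scale to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return [0.0 for _ in values]
    return [(value - mean) / spread for value in values]


# Hypothetical records: the first has 75% of its fields missing and is dropped.
records = [[1, None, None, None], [1, 2, None, 4]]
print(drop_sparse_rows(records))
print(min_max_scale([20, 40, 60]))
```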

Patient selection criteria

After applying the exclusion criteria, 6411 hospitalized cases out of 9180 confirmed COVID-19 patients were included in the study. In the preprocessing steps, 818 patient records were removed, which reduced the number of patient records to 5791 cases. Among them, 870 (15.02%) cases were readmitted within 30 days of the first hospitalization.

Feature selection

Feature selection (variable selection) is needed before feeding data into ML algorithms, since irrelevant dimensions degrade classification performance and precision and increase run time [29]. To select the most important features for predicting readmission, we used the Least Absolute Shrinkage and Selection Operator (LASSO) feature selection algorithm. LASSO selects the most important and relevant features for predicting readmission in COVID-19 patients by shrinking the absolute values of the variables' coefficients. If the coefficient of a variable shrinks to zero, that feature is eliminated from the feature subset; if a variable retains a large coefficient, the feature is included in the selected subset.
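The shrink-to-zero behavior described above can be seen in a small sketch. It solves the LASSO objective (1/2n)·||y − Xw||² + λ·||w||₁ by coordinate descent with soft thresholding on a toy, deliberately orthogonal design. Everything here is an assumption for illustration: the feature names, the toy data, the continuous target, and the penalty value are hypothetical, and the source does not say which LASSO implementation or penalty the authors used.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; values inside [-penalty, penalty] become 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iter=100):
    """Minimize (1/2n)*||y - Xw||^2 + penalty*||w||_1 by cyclic coordinate descent."""
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    for _ in range(n_iter):
        for j in range(n_features):
            rho, norm = 0.0, 0.0
            for row, target in zip(X, y):
                # Prediction from all features except j (the partial residual).
                partial = sum(row[k] * weights[k] for k in range(n_features) if k != j)
                rho += row[j] * (target - partial)
                norm += row[j] ** 2
            if norm == 0:
                weights[j] = 0.0
            else:
                weights[j] = soft_threshold(rho / n_samples, penalty) / (norm / n_samples)
    return weights


# Toy orthogonal design with hypothetical feature names.
names = ["strong_predictor", "weak_predictor", "irrelevant"]
x0 = [1, -1, 1, -1, 1, -1, 1, -1]
x1 = [1, 1, -1, -1, 1, 1, -1, -1]
x2 = [1, 1, 1, 1, -1, -1, -1, -1]
X = [list(row) for row in zip(x0, x1, x2)]
y = [3.0 * a + 0.05 * b for a, b in zip(x0, x1)]

weights = lasso_coordinate_descent(X, y, penalty=0.1)
selected = [name for name, weight in zip(names, weights) if weight != 0.0]
print(selected)
```

With the penalty set above the weak feature's contribution, only the strong predictor keeps a non-zero coefficient, which is the selection rule the paragraph describes.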

Machine learning methods

In this study, to predict readmission in patients with confirmed COVID-19, we used seven ML classification algorithms: the Hist Gradient Boosting (HGB) classifier, Bagging classifier, Multi-Layered Perceptron (MLP) classifier, Support Vector Machine (SVM, kernel = linear), SVM (kernel = RBF), Extreme Gradient Boosting (XGBoost) classifier, and K Nearest Neighbor (KNN) classifier.

Performance metrics

To evaluate the performance and verify the quality of the applied algorithms, we used the k-fold cross-validation method. Cross-validation is a resampling method used to assess ML models on unseen data samples. The method has one parameter, k, which is the number of parts into which the dataset is split. In this study, we used 10-fold cross-validation: the algorithms are trained and tested ten times, and the evaluation metrics are then averaged. Accuracy, specificity, sensitivity, Kappa statistic, and Area under the curve (AUC) are measured at the end of the process (Equations (1), (2), (3), (4), (5)).
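The threshold-based metrics named above can be computed from a binary confusion matrix, as in the following sketch. The referenced equations are not reproduced in this text, so the formulas below are the standard textbook definitions, assumed here rather than copied from the paper; the function name and the example counts are hypothetical. AUC is omitted because it needs ranked prediction scores rather than a single confusion matrix.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (accuracy - chance) / (1 - chance)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
        "kappa": kappa,
    }


# Hypothetical counts for one test fold.
metrics = confusion_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```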

Results

Patient characteristics

The mean age of patients who were readmitted to the hospital was 59 ± 9 years, and the mean age of patients who were not readmitted was 51 ± 6 years (p < 0.002). Table 1 shows a significant association between some features and readmission: features marked with the ** symbol in Table 1 (p-value < 0.05) differ significantly between readmitted and non-readmitted patients. For example, ICU admission and COVID status were significantly related to readmission (p-value < 0.002 and p-value < 0.001, respectively). The LASSO feature selection method ranks the relevant variables by the absolute value of their coefficients. After feature selection, 28 of the 42 variables were not selected to predict readmission and were deleted from the dataset. The top 14 variables selected by the LASSO feature selection method and their scores are presented in Table 2.
Table 2

Important variables selected by the LASSO algorithm.

Order | Feature name | Score | P-Value
1 | COVID status | 3.78 | 0.015
2 | ICU admission | 3.50 | 0.035
3 | Oxygen therapy | 3.31 | 0.012
4 | CRP on admission | 3.19 | 0.047
5 | Duration of hospitalization | 3.08 | 0.032
6 | Solid organ transplantation | 2.94 | <0.001
7 | Lymphocytes on discharge | 2.71 | 0.001
8 | Coronary artery disease | 2.64 | 0.023
9 | Cerebrovascular disease | 2.47 | 0.027
10 | C reactive protein on admission | 2.39 | 0.012
11 | Congestive heart failure | 2.15 | 0.017
12 | Asthma | 2.09 | 0.021
13 | Metastatic solid tumor | 2.03 | 0.006
14 | Age | 1.74 | 0.045
Based on Table 2, COVID-19 status, ICU admission, and oxygen therapy obtained the highest scores for predicting readmission in patients with COVID-19. In contrast, age and metastatic solid tumor had the lowest scores, which means that they have a low impact on the prediction of readmission in confirmed COVID-19 patients.

Results of hyper-parameters tuning

The performance of ML algorithms depends strongly on the selection of their hyper-parameters, which are tuned to produce the best model on a given dataset. After the preprocessing step, several ML models were built by adjusting and optimizing hyper-parameters. The hyper-parameters that produced the models with the highest F-score were identified during this step. In the present study, to select the most precise and powerful models, the Randomized Search CV method was used to tune the hyper-parameters of the algorithms, including the HGB classifier, Bagging classifier, MLP classifier, SVM (kernel = linear), SVM (kernel = RBF), and XGBoost classifier. Table 3 presents the best hyper-parameters of each ML algorithm for predicting readmission.
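The idea behind a randomized hyper-parameter search can be sketched in plain Python: sample a fixed number of candidate settings at random and keep the best-scoring one. This is not the library routine the authors ran. The function names, the candidate grid, and the toy scoring function are all hypothetical; in a real run the score would be a cross-validated F-score of the fitted classifier.

```python
import random


def randomized_search(param_distributions, score_fn, n_iter=20, seed=999):
    """Sample n_iter random parameter settings and return the best one with its score."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        # Draw one value per hyper-parameter, independently and uniformly.
        params = {name: rng.choice(values) for name, values in param_distributions.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Hypothetical stand-in for a cross-validated F-score (higher is better)."""
    return -(abs(params["max_depth"] - 7) / 10 + abs(params["learning_rate"] - 0.1))


# Hypothetical candidate values for two boosting hyper-parameters.
grid = {"max_depth": [3, 5, 7, 9, 12], "learning_rate": [0.01, 0.1, 0.3]}
best_params, best_score = randomized_search(grid, toy_score, n_iter=20, seed=999)
print(best_params, best_score)
```

Because only a sample of the combinations is tried, the result is the best of the sampled settings and not necessarily the best setting in the whole grid.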
Table 3

Best hyper-parameters for ML algorithm modeling in prediction of readmission.

Num | Algorithms | Hyper-parameters | f-score
1 | HistGradientBoostingClassifier | 'verbose' = 2, 'random_state' = 999, 'max_leaf_nodes' = 62, 'max_iter' = 150, 'max_depth' = 7, 'learning_rate' = 0.1 | 93.7
2 | BaggingClassifier | 'verbose' = 2, 'random_state' = 999, 'n_estimators' = 12, 'max_samples' = 0.5, 'bootstrap' = 'true' | 91.28
3 | MLP Classifier | 'learning_rate' = 'constant', 'hidden_layer_sizes' = (100,100,100), 'alpha' = 0.05, 'activation' = 'relu' | 91.07
4 | SVM (kernel = linear) | C = 100, G = 0.0001 | 90.09
5 | SVM (kernel = RBF) | C = 10, G = 0.001 | 89.24
6 | XGBoost Classifier | 'min_child_weight' = 1, 'max_depth' = 12, 'learning_rate' = 0.1, 'gamma' = 0.4, 'colsample_bytree' = 0.3 | 89.01
7 | K Nearest Neighbor Classifier | K = 3, 'n_jobs' = −1, 'algorithm' = 'auto' | 87.00

K-fold cross-validation

The features selected by the LASSO method were tested on seven ML algorithms with 10-fold cross-validation. 10-fold cross-validation splits the selected data set into ten subsets and performs the holdout method ten times. In each run, 90% of the data was used to train the ML algorithms and 10% was used to test the models. To assess the performance of the ML algorithms, we computed the mean of each evaluation metric with its 95% confidence interval. Table 4 shows the results of the seven prediction models on the features selected by the LASSO method with 10-fold cross-validation to predict readmission in COVID-19 patients.
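The 90%/10% split per run and the mean with a 95% confidence interval can be sketched as below. This is an illustration under stated assumptions: the splitter uses contiguous, unshuffled folds (real pipelines usually shuffle or stratify), the interval uses a normal approximation (mean ± 1.96·SD/√k), and the per-fold scores are hypothetical. The source does not state how the authors computed their intervals.

```python
import math
import statistics


def k_fold_indices(n_samples, k=10):
    """Return k (train_indices, test_indices) pairs using contiguous folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


def mean_with_ci(scores, z=1.96):
    """Mean of fold scores with a normal-approximation 95% confidence interval."""
    mean = statistics.fmean(scores)
    half_width = z * statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, (mean - half_width, mean + half_width)


folds = k_fold_indices(5791, k=10)
print(len(folds), len(folds[0][0]), len(folds[0][1]))

# Hypothetical per-fold accuracies, not values from the study.
mean_accuracy, interval = mean_with_ci([0.90, 0.92, 0.91, 0.93])
print(mean_accuracy, interval)
```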
Table 4

10-fold CV Classification performance of different classifiers on selected features.

Classifier | Statistic | Mean Accuracy | Mean Specificity (%) | Mean Sensitivity | Mean F-measure | Kappa Statistic (KS) | AUC
HGB Classifier | Mean | 0.8176 | 0.814 | 0.8296 | 0.8201 | 82.4% | 0.8233
 | 95% CI | (0.81, 0.83) | (0.8, 0.82) | (0.81, 0.85) | (0.81, 0.83) | (0.82, 0.86) | (0.81, 0.83)
 | STD | 0.0154 | 0.0127 | 0.0296 | 0.0148 | 0.0257 | 0.0157
Bagging Classifier | Mean | 0.847 | 0.841 | 0.847 | 0.845 | 84.36% | 0.843
 | 95% CI | (0.84, 0.85) | (0.84, 0.85) | (0.84, 0.85) | (0.85, 0.85) | (0.84, 0.85) | (0.84, 0.85)
 | STD | 0.0172 | 0.0116 | 0.00128 | 0.0194 | 0.0127 | 0.0182
MLP Classifier | Mean | 0.886 | 0.889 | 0.884 | 0.881 | 88.6% | 0.882
 | 95% CI | (0.88, 0.89) | (0.88, 0.89) | (0.88, 0.89) | (0.88, 0.89) | (0.88, 0.89) | (0.88, 0.89)
 | STD | 0.0027 | 0.0112 | 0.0134 | 0.00140 | 0.010 | 0.0129
XGBoost Classifier | Mean | 0.917 | 0.913 | 0.916 | 0.918 | 91.37% | 0.9145
 | 95% CI | (0.91, 0.92) | (0.91, 0.92) | (0.91, 0.92) | (0.91, 0.92) | (0.91, 0.92) | (0.91, 0.92)
 | STD | 0.0146 | 0.0138 | 0.0147 | 0.0175 | 0.01924 | 0.0126
SVM (kernel = linear) | Mean | 0.8896 | 0.8733 | 0.912 | 0.892 | 88.7% | 0.892
 | 95% CI | (0.87, 0.90) | (0.66, 0.88) | (0.90, 0.93) | (0.88, 0.90) | (0.88, 0.89) | (0.88, 0.90)
 | STD | 0.0174 | 0.0167 | 0.0129 | 0.0182 | 0.0140 | 0.01864
SVM (kernel = RBF) | Mean | 0.857 | 0.850 | 0.861 | 0.859 | 86.7% | 0.863
 | 95% CI | (0.85, 0.86) | (0.84, 0.86) | (0.85, 0.87) | (0.85, 0.87) | (0.86, 0.87) | (0.86, 0.87)
 | STD | 0.0127 | 0.01734 | 0.0129 | 0.0134 | 0.0118 | 0.01727
K Nearest Neighbor Classifier | Mean | 0.8835 | 0.8785 | 0.892 | 0.8937 | 88.3% | 0.886
 | 95% CI | (0.88, 0.89) | (0.87, 0.89) | (0.89, 0.90) | (0.89, 0.90) | (0.88, 0.89) | (0.88, 0.89)
 | STD | 0.0014 | 0.0174 | 0.018 | 0.0162 | 0.0183 | 0.0163
Table 4 shows the results of the ML models on the features selected by the LASSO method in ten independent runs. The HGB classifier gave a mean accuracy of 81.76%, a mean sensitivity of 82.96%, a mean specificity of 81.4%, a mean F-measure of 82.01%, a mean Kappa statistic of 82.4%, and an AUC of 82.33% when the selected risk factors were used. The Bagging classifier obtained a mean accuracy of 84.7%, a mean sensitivity of 84.7%, a mean specificity of 84.1%, a mean F-measure of 84.5%, a mean Kappa statistic of 84.36%, and an AUC of 84.3%. Based on Table 4, the MLP classifier showed good performance, with a mean accuracy of 88.6%, a mean specificity of 88.9%, a mean sensitivity of 88.4%, a mean F-measure of 88.1%, a mean Kappa statistic of 88.6%, and a mean AUC of 88.2%. The performance of the XGBoost classifier was excellent, as shown in Table 4. It achieved a mean accuracy of 91.7%, a mean specificity of 91.3%, a mean sensitivity of 91.6%, a mean F-measure of 91.8%, a mean Kappa statistic of 91.37%, and a mean AUC of 91.4% over ten independent runs. The SVM (kernel = linear) was the second-best classifier, with a mean accuracy of 88.9%, a mean specificity of 87.3%, a mean sensitivity of 91.2%, a mean F-measure of 89.2%, a mean Kappa statistic of 88.7%, and a mean AUC of 89.2%. The SVM (kernel = RBF) had a mean accuracy of 85.7%, a mean sensitivity of 86.1%, a mean specificity of 85.0%, a mean F-measure of 85.9%, a mean Kappa statistic of 86.7%, and an AUC of 86.3%. The KNN classifier, with a mean accuracy of 88.3%, specificity of 87.8%, sensitivity of 89.2%, F-measure of 89.37%, Kappa statistic of 88.3%, and AUC of 88.6%, achieved nearly acceptable performance.
As shown in Fig. 3, the XGBoost classifier outperformed the other six ML models with 91.7% mean accuracy, 91.3% mean specificity, 91.6% mean sensitivity, 91.8% mean F-measure, and 0.9145 AUC. The second-best model was the SVM with the linear kernel (AUC = 0.892), and the worst performance among the ML algorithms was observed for the HGB classifier (AUC = 0.8233). The classification report and ROC curve of the XGBoost classifier, the best classification algorithm in the present study in terms of the evaluation metrics, are displayed in Fig. 4.
Fig. 3

Comparison of classification models performance on selected features.

Fig. 4

Classification report and AUC curve of the XGBoost classifier.


Discussion

Given the unknown nature of COVID-19, with its wide range of symptoms and complications, it is important to implement intelligent models for estimating the possibility of reinfection and recurrence [30,31]. Predicting readmission and disease recurrence is complex and challenging, especially for new and poorly understood diseases such as COVID-19 [32,33]. To our knowledge, this work is one of the few studies that have applied ML algorithms to predict the readmission risk of patients with COVID-19. So far, most ML-based studies have focused on predicting readmission for chronic conditions such as cardiovascular disease [1,[34], [35], [36], [37], [38], [39]], stroke [[40], [41], [42], [43], [44]], and COPD [5,6,[45], [46], [47]]. Few studies have so far addressed COVID-19 readmission. In Rodriguez's study (2021), a predictive model for readmission in COVID-19 patients was presented based on an ML classifier. The authors concluded that ML and data mining-based approaches appear fruitful for readmission prediction [20]. Koteswari (2020) proposed an intelligent model to predict the readmission probability of various COVID-19 cases using ML techniques. The experimental results demonstrated that ML-based predictive models can help reduce COVID-19 readmission [30]. Raftarai (2021) compared the performance of four ML algorithms for predicting readmission in patients with COVID-19. The AdaBoost ensemble classifier yielded the best performance (accuracy 91.61%) [33]. Similarly, Jia (2021) assessed the performance of several ML algorithms for predicting future deterioration among discharged patients with COVID-19. Finally, the best performance was yielded by XGBoost with a mean accuracy of 91.7%, mean specificity of 91.3%, mean sensitivity of 91.6%, mean F-measure of 91.8%, and AUC of 91.45%.
Ryu (2021) [48] showed that the Gradient Boosting Machine (GBM), and Lo (2021) [49] concluded that Categorical Boosting (CatBoost), had the highest AUC (75.1% and 75.15%, respectively) in predicting readmission. In addition, recent studies (performed in 2021) by Zhao [50], Darabi [51], Chen [52], and Shah [53] showed that boosting algorithms performed better in predicting patient readmission. Boosting algorithms such as Adaptive Boosting (AdaBoost), XGBoost, HGB, CatBoost, and GBM are a set of powerful and widely used ML algorithms. Boosting classifiers improve classification accuracy by combining the outputs of a sequence of weak learners into a robust predictive model [54,55]. The results of previous studies showed that these algorithms performed well in predicting hospital readmission risk in patients with COVID-19. In the present study, the performance of the implemented models improved because the prediction variables were optimized through feature selection and data preprocessing before being used as inputs for modeling. Similarly, in the current work the XGBoost model outperformed the other six techniques (AUC 0.91, 95% CI 0.91–0.92, and STD 0.0146). Since the COVID-19 pandemic began, several studies have identified clinically important predictors of post-discharge outcomes in COVID-19. For example, Rodriguez's study (2021) indicated underlying chronic disease, hypoxia (oxygen saturation ≤94%), and increased LDH, CRP, and ESR as the factors with the greatest effect on hospital readmission [20]. In another study, by Menditto (2021), several clinical features such as age, neutrophil count, sequential organ failure assessment (SOFA), LDH, CRP, and D-dimer were recognized as highly contributing factors to the readmission of COVID-19 patients [31].
In contrast, Duarte's research (2021) identified polypharmacy, living in residential care or nursing homes, general illness, chest pain, psychological symptoms, syncope, and superinfection as the factors most relevant to COVID-19 hospital readmission [56]. In Nematshahi et al.'s (2021) study, the period between discharge and readmission, age, gender, underlying disease, creatinine level, and pulmonary involvement were recognized as influential factors in predicting COVID-19 readmission [57]. Similarly, in Jeon's (2020) research, age, sex, and the presence of underlying disease increased the risk of readmission of COVID-19 patients [58]. In the Verna study, the presence of comorbidities, high BMI, adult age, and laboratory indicators such as CRP, creatinine, and ALT/ASP rate were introduced as some of the most important underlying factors for readmission in COVID-19 patients [59]. In a systematic review, Akbari et al. (2021) concluded that male sex, white ethnicity, comorbid diseases, and old age affect COVID-19 readmission [60]. Fukushima's study (2021) also showed that certain comorbidities such as diabetes, hypertension, and cardiovascular diseases have a higher capability for predicting the readmission risk among COVID-19 patients [61]. In another study, age over 60 years, underlying diseases (especially diabetes), high creatinine level, and lung involvement were the essential predictors of readmission in patients with COVID-19 [32]. The most important variables for readmission prediction in the Green (2021) study were age, LOS, ICU admission, oxygen saturation, D-dimer, and cardiovascular diseases [62]. Similarly, we identified 14 variables highly correlated with the output class.
Major risk factors for readmission in the current study include COVID-19 status, ICU admission, oxygen therapy, CRP on admission, duration of hospitalization, solid organ transplantation, lymphocytes on discharge, coronary artery disease, cerebrovascular disease, C-reactive protein on admission, congestive heart failure, asthma, metastatic solid tumor, and age, most of which are non-modifiable. The variables identified in the present study are consistent with previous research. In the reviewed studies, baseline variables (e.g. age and sex), laboratory indicators, underlying diseases (comorbidities), and resource utilization variables such as LOS, ICU admission, and oxygen therapy play a pivotal role in predicting the readmission of patients with COVID-19. However, these studies neglected the importance of radiological data for readmission risk prediction among COVID-19 patients. Similarly, in the present study, the data set selected after feature selection lacks radiological variables. Therefore, more studies are needed in this regard. In addition, several models for predicting the risk of readmission among COVID-19 patients have been developed, one of which gained reasonable performance in the evaluation phase. Notably, the selected ML algorithm (XGBoost) can predict the 30-day readmission risk of patients with high accuracy. The proposed model can help healthcare providers detect patient deterioration in a timely manner and reduce severe complications and the resulting mortality. This is a retrospective, single-center study including a relatively small number of patient records. Therefore, the findings may not be generalizable to the wider population. In addition, noisy data fields (inconsistent, meaningless, missing, error-prone, and abnormal values) might affect the data mining accuracy.
Moreover, we used only seven ML algorithms for prediction analyses based on some clinical features. Our data set also lacked clinically essential variables such as imaging indicators. To remove noisy data, we first defined the normal range of each variable using the opinion of two infectious disease specialists. Then, we identified all values outside the defined range and corrected them by referring to the responsible doctor. In addition, the records with more than 70% empty fields (=439 as shown in Fig. 1) were removed. The missing fields in records with less than 70% missing were imputed by mean and mode substitution for continuous and discrete variables, respectively. Additional external validation should be used to confirm the results of the present study and further verify their generalizability. Finally, the selected dataset lacks some clinical variables such as radiological indicators. As a practical solution, the accuracy and generalizability of our models will be enhanced if we test more ML techniques on larger, multicenter, and prospective datasets.

Conclusion

We implemented and validated several predictive models that stratify readmission risk for COVID-19 patients. In particular, the XGBoost model achieved better classification accuracy than the other ML algorithms. This method can provide caregivers and hospital administrators with an effective instrument for allocating limited hospital resources. These models may also support better and more customized care delivery, lessen clinician workload, and reduce severe complications and death in COVID-19 patients. In future work, the proposed method is expected to be applied to other hospital resource utilization domains such as ICU bed turnover, LOS, and respiratory ventilator use.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.