Mohammad Reza Afrash, Hadi Kazemi-Arpanahi, Mostafa Shanbehzadeh, Raoof Nopour, Esmat Mirbagheri.
Abstract
Introduction: The coronavirus disease 2019 (COVID-19) epidemic overwhelmed health systems with severe shortages of hospital resources. In this critical situation, reducing COVID-19 readmissions could help sustain hospital capacity. This study aimed to select the features most strongly associated with COVID-19 readmission and to compare the ability of machine learning (ML) algorithms to predict COVID-19 readmission from the selected features. Material and methods: Data on 5791 hospitalized patients with COVID-19 were retrospectively extracted from a hospital registry system. The LASSO feature selection algorithm was used to select the most important features related to COVID-19 readmission. HistGradientBoosting (HGB), Bagging, Multi-Layered Perceptron (MLP), Support Vector Machine (SVM) with a linear kernel, SVM with an RBF kernel, and Extreme Gradient Boosting (XGBoost) classifiers were used for prediction. The performance of the ML algorithms was evaluated with 10-fold cross-validation using six performance evaluation metrics.
Keywords: AUC, Area under the curve; Artificial intelligence; CDSS, Clinical Decision Support Systems; COVID-19; COVID-19, Coronavirus disease 2019; CRISP, Cross-Industry Standard Process; Coronavirus; HGB, Hist Gradient Boosting; LASSO, Least Absolute Shrinkage and Selection Operator; ML, Machine learning; MLP, Multi-Layered Perceptron; Machine learning; Readmission; SVM, Support Vector Machine; XGBoost, Extreme Gradient Boosting
Year: 2022 PMID: 35280933 PMCID: PMC8901230 DOI: 10.1016/j.imu.2022.100908
Source DB: PubMed Journal: Inform Med Unlocked ISSN: 2352-9148
Fig. 1. The roadmap of the proposed system for prediction of readmission based on the CRISP method.
Fig. 2. Flow chart describing patient selection.
Patient characteristics variable data.
| Patient characteristics | Variable | Category | Total (N) | Readmission (N) | Non-readmission (N) | P-value |
|---|---|---|---|---|---|---|
| Demographical | Sex | Female | 2720 | 412 | 2308 | <0.002** |
| | | Male | 3071 | 332 | 2739 | |
| | Marital status | Single | 1219 | 631 | 588 | <0.004** |
| | | Married | 4572 | 239 | 4333 | |
| | Age | 0–30 | 1363 | 152 | 1211 | <0.001** |
| | | 30–60 | 1836 | 146 | 1690 | |
| | | 60–90 | 2952 | 572 | 2380 | |
| Hospitalization | Number of admissions | 1 | 4921 | 0 | 4921 | |
| | | 2–4 | 780 | 780 | 0 | <0.002** |
| | | >4 | 90 | 90 | 0 | |
| | Type of admission | Inpatient care | 2075 | 524 | 1551 | <0.001** |
| | | Outpatient care | 3716 | 346 | 3370 | |
| | ICU admission | Yes | 528 | 462 | 66 | <0.002** |
| | | No | 5263 | 408 | 4855 | |
| | Oxygen therapy | Yes | 720 | 543 | 177 | <0.161 |
| | | No | 5071 | 327 | 4744 | |
| | CRP on admission | Yes | 380 | 329 | 51 | <0.039** |
| | | No | 5411 | 541 | 4870 | |
| | Duration of hospitalization | <24 h | 3917 | 43 | 3874 | <0.497** |
| | | 1–7 days | 1465 | 519 | 946 | |
| | | >7 days | 409 | 308 | 101 | |
| | Patient status on discharge | Partial recovery | 1430 | 774 | 656 | <0.041** |
| | | Complete recovery | 3970 | 62 | 3908 | |
| | | Dead | 391 | 34 | 357 | |
| | Time to readmission | <30 days | 1300 | 257 | 1043 | <0.052 |
| | | >30 days | 4491 | 613 | 3878 | |
| | COVID status | Critical | 520 | 14 | 506 | <0.001** |
| | | Severe | 1034 | 142 | 892 | |
| | | Moderate | 2300 | 540 | 1760 | |
| | | Mild | 1540 | 98 | 1442 | |
| | | Recovered | 397 | 14 | 383 | |
| | Severe kidney disease | Yes | 240 | 49 | 191 | <0.630 |
| | | No | 5551 | 821 | 4730 | |
| | Solid organ transplantation | Yes | 182 | 94 | 88 | <0.951 |
| | | No | 5609 | 776 | 4833 | |
| | Lymphocytes on discharge | Yes | 746 | 297 | 449 | <0.832 |
| | | No | 5045 | 573 | 4472 | |
| | Coronary artery disease | Yes | 570 | 381 | 189 | <0.267 |
| | | No | 5221 | 489 | 4732 | |
| | Cancer | Yes | 168 | 119 | 49 | <0.574 |
| | | No | 5623 | 751 | 4872 | |
| | History of CT result | Normal | 3321 | 540 | 2781 | <0.059 |
| | | Abnormal | 2470 | 330 | 2140 | |
| | Pregnancy | Yes | 94 | 23 | 71 | <0.720 |
| | | No | 5697 | 847 | 4850 | |
| | Congestive heart failure | Yes | 350 | 180 | 170 | <0.968 |
| | | No | 5441 | 690 | 4751 | |
| | Cerebrovascular disease | Yes | 49 | 8 | 41 | <0.602 |
| | | No | 5742 | 862 | 4880 | |
| | C reactive protein on admission | Yes | 5308 | 710 | 4598 | <0.057 |
| | | No | 753 | 160 | 593 | |
| | Congestive heart failure | Yes | 135 | 94 | 41 | <0.619 |
| | | No | 5656 | 776 | 4880 | |
| | Asthma | Yes | 74 | 41 | 33 | <0.570 |
| | | No | 5717 | 829 | 4888 | |
| | Metastatic solid tumor | Yes | 14 | 3 | 11 | <0.924 |
| | | No | 5776 | 867 | 4909 | |
| | Diabetes mellitus | Yes | 364 | 79 | 285 | <0.738 |
| | | No | 5427 | 791 | 4636 | |
| | D-dimer | Yes | 4680 | 361 | 4319 | <0.042** |
| | | No | 1111 | 509 | 602 | |
| | Dyspnea | Yes | 1640 | 490 | 1150 | <0.069 |
| | | No | 4151 | 380 | 3771 | |
| | Underlying diseases | Yes | 839 | 538 | 301 | <0.073 |
| | | No | 4952 | 468 | 4484 | |
| | Headache | Yes | 4981 | 681 | 4300 | <0.075 |
| | | No | 810 | 189 | 621 | |
| | Weakness and lethargy | Yes | 5134 | 526 | 4608 | <0.052 |
| | | No | 657 | 344 | 313 | |
| | Body pain | Yes | 4391 | 617 | 3774 | <0.061 |
| | | No | 1400 | 253 | 1147 | |
| | Pain or pressure in the chest | Yes | 2670 | 594 | 2076 | <0.068 |
| | | No | 3121 | 276 | 2845 | |
| | High fever | Yes | 4621 | 713 | 3908 | <0.072 |
| | | No | 1170 | 157 | 1013 | |
| | Nausea & Vomiting | Yes | 3910 | 672 | 3238 | <0.067 |
| | | No | 1881 | 198 | 1683 | |
| | Cough | Yes | 4627 | 593 | 4034 | <0.0512 |
| | | No | 1164 | 277 | 887 | |
| | Gastrointestinal symptoms | Yes | 234 | 56 | 178 | <0.102 |
| | | No | 5557 | 814 | 4743 | |
| | Chronic pulmonary | Yes | 261 | 73 | 188 | <0.284 |
| | | No | 5530 | 797 | 4733 | |
| | Hypertension | Yes | 840 | 142 | 698 | <0.043** |
| | | No | 4951 | 728 | 4223 | |
| | Consolidation | Yes | 461 | 59 | 402 | <0.0497** |
| | | No | 5330 | 811 | 4519 | |
| | Pleural fluid | Yes | 571 | 137 | 434 | <0.0581 |
| | | No | 5220 | 733 | 4487 | |
| | Hypersensitive troponin | Yes | 892 | 261 | 568 | <0.042* |
| | | No | 4899 | 609 | 4290 | |
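
The record does not say which statistical test produced the P-values in the table above. As a minimal sketch, assuming a Pearson chi-square test of independence (one common choice for categorical variables, not confirmed as the authors' method), the counts from the "Sex" rows can be tested as follows. The function name `chi_square_2x2` is illustrative.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value). No continuity correction is applied.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Counts from the "Sex" rows: (readmitted, not readmitted)
# for female patients, then for male patients.
statistic, p_value = chi_square_2x2(412, 2308, 332, 2739)
print(f"chi2 = {statistic:.2f}, p = {p_value:.2g}")
```

Variables with more than two categories (for example, age group) would need the general r x c form of the test, which this sketch does not cover.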
Important variables selected by the LASSO algorithm.
| Order | Feature name | Score | P-Value |
|---|---|---|---|
| 1 | COVID status | 3.78 | 0.015 |
| 2 | ICU admission | 3.50 | 0.035 |
| 3 | Oxygen therapy | 3.31 | 0.012 |
| 4 | CRP on admission | 3.19 | 0.047 |
| 5 | Duration of hospitalization | 3.08 | 0.032 |
| 6 | Solid organ transplantation | 2.94 | <0.001 |
| 7 | Lymphocytes on discharge | 2.71 | 0.001 |
| 8 | Coronary artery disease | 2.64 | 0.023 |
| 9 | Cerebrovascular disease | 2.47 | 0.027 |
| 10 | C reactive protein on admission | 2.39 | 0.012 |
| 11 | Congestive heart failure | 2.15 | 0.017 |
| 12 | Asthma | 2.09 | 0.021 |
| 13 | Metastatic solid tumor | 2.03 | 0.006 |
| 14 | Age | 1.74 | 0.045 |
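
LASSO selects features by adding an L1 penalty that drives the coefficients of uninformative features to exactly zero. The sketch below illustrates that mechanism with cyclic coordinate descent on synthetic data. It is not the authors' pipeline: the data, the penalty `alpha = 0.5`, and the function names are assumptions made for illustration, and the record does not explain how the "Score" column was derived.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within [-threshold, threshold] becomes 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimise (1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # covariance of feature j with the residual excluding feature j
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                partial = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                norm += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (norm / n)
    return w


# Synthetic data (illustrative only): 5 features, of which only 0 and 2 drive y.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(200)]
y = [3.0 * row[0] - 2.0 * row[2] + rng.gauss(0, 0.1) for row in X]

weights = lasso_coordinate_descent(X, y, alpha=0.5)
selected = [j for j, w_j in enumerate(weights) if w_j != 0.0]
print("selected feature indices:", selected)
```

A larger `alpha` removes more features. In practice the penalty is usually tuned by cross-validation rather than fixed in advance.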
Best hyper-parameters for ML algorithm modeling in prediction of readmission.
| Num | Algorithm | Hyper-parameters | F-score |
|---|---|---|---|
| 1 | HistGradientBoostingClassifier | verbose = 2, random_state = 999, max_leaf_nodes = 62, max_iter = 150, max_depth = 7, learning_rate = 0.1 | 93.7 |
| 2 | BaggingClassifier | verbose = 2, random_state = 999, n_estimators = 12, max_samples = 0.5, bootstrap = True | 91.28 |
| 3 | MLP Classifier | learning_rate = 'constant', hidden_layer_sizes = (100, 100, 100), alpha = 0.05, activation = 'relu' | 91.07 |
| 4 | SVM (kernel = linear) | C = 100, gamma = 0.0001 | 90.09 |
| 5 | SVM (kernel = RBF) | C = 10, gamma = 0.001 | 89.24 |
| 6 | XGBoost Classifier | min_child_weight = 1, max_depth = 12, learning_rate = 0.1, gamma = 0.4, colsample_bytree = 0.3 | 89.01 |
| 7 | K Nearest Neighbor Classifier | K = 3, n_jobs = -1, algorithm = 'auto' | 87.00 |
10-fold CV Classification performance of different classifiers on selected features.
| Classifier | Statistic | Accuracy | Specificity | Sensitivity | F-measure | Kappa statistic (KS) | AUC |
|---|---|---|---|---|---|---|---|
| HGB Classifier | Mean | 0.8176 | 0.814 | 0.8296 | 0.8201 | 82.4% | 0.8233 |
| | 95% CI | (0.81, 0.83) | (0.8, 0.82) | (0.81, 0.85) | (0.81, 0.83) | (0.82, 0.86) | (0.81, 0.83) |
| | STD | 0.0154 | 0.0127 | 0.0296 | 0.0148 | 0.0257 | 0.0157 |
| Bagging Classifier | Mean | 0.847 | 0.841 | 0.847 | 0.845 | 84.36% | 0.843 |
| | 95% CI | (0.84, 0.85) | (0.84, 0.85) | (0.84, 0.85) | (0.85, 0.85) | (0.84, 0.85) | (0.84, 0.85) |
| | STD | 0.0172 | 0.0116 | 0.00128 | 0.0194 | 0.0127 | 0.0182 |
| MLP Classifier | Mean | 0.886 | 0.889 | 0.884 | 0.881 | 88.6% | 0.882 |
| | 95% CI | (0.88, 0.89) | (0.88, 0.89) | (0.88, 0.89) | (0.88, 0.89) | (0.88, 0.89) | (0.88, 0.89) |
| | STD | 0.0027 | 0.0112 | 0.0134 | 0.00140 | 0.010 | 0.0129 |
| XGBoost Classifier | Mean | 0.917 | 0.913 | 0.916 | 0.918 | 91.37% | 0.9145 |
| | 95% CI | (0.91, 0.92) | (0.91, 0.92) | (0.91, 0.92) | (0.91, 0.92) | (0.91, 0.92) | (0.91, 0.92) |
| | STD | 0.0146 | 0.0138 | 0.0147 | 0.0175 | 0.01924 | 0.0126 |
| SVM (kernel = linear) | Mean | 0.8896 | 0.8733 | 0.912 | 0.892 | 88.7% | 0.892 |
| | 95% CI | (0.87, 0.90) | (0.66, 0.88) | (0.90, 0.93) | (0.88, 0.90) | (0.88, 0.89) | (0.88, 0.90) |
| | STD | 0.0174 | 0.0167 | 0.0129 | 0.0182 | 0.0140 | 0.01864 |
| SVM (kernel = RBF) | Mean | 0.857 | 0.850 | 0.861 | 0.859 | 86.7% | 0.863 |
| | 95% CI | (0.85, 0.86) | (0.84, 0.86) | (0.85, 0.87) | (0.85, 0.87) | (0.86, 0.87) | (0.86, 0.87) |
| | STD | 0.0127 | 0.01734 | 0.0129 | 0.0134 | 0.0118 | 0.01727 |
| K Nearest Neighbor Classifier | Mean | 0.8835 | 0.8785 | 0.892 | 0.8937 | 88.3% | 0.886 |
| | 95% CI | (0.88, 0.89) | (0.87, 0.89) | (0.89, 0.90) | (0.89, 0.90) | (0.88, 0.89) | (0.88, 0.89) |
| | STD | 0.0014 | 0.0174 | 0.018 | 0.0162 | 0.0183 | 0.0163 |
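
The threshold-based metrics in the table above can all be derived from a confusion matrix and then summarised across folds. The sketch below shows one way to do this. The confusion counts and fold accuracies are made-up illustrative values, not results from the study. The t-based confidence interval is also an assumption, since the record does not say how the 95% CIs were computed.

```python
import math
import statistics


def fold_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, F-measure and Cohen's kappa for one fold."""
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)  # recall for the positive (readmitted) class
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
        "kappa": kappa,
    }


def summarise(scores, t_critical=2.262):
    """Mean, sample SD and t-based 95% CI of per-fold scores.

    2.262 is the two-sided 95% t value for 9 degrees of freedom (10 folds).
    """
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    half_width = t_critical * sd / math.sqrt(len(scores))
    return mean, sd, (mean - half_width, mean + half_width)


# Hypothetical confusion counts for a single fold.
metrics = fold_metrics(tp=80, fp=20, tn=90, fn=10)
print(metrics)

# Hypothetical accuracies from 10 folds.
fold_accuracies = [0.90, 0.92, 0.91, 0.93, 0.90, 0.92, 0.91, 0.93, 0.90, 0.92]
mean, sd, ci = summarise(fold_accuracies)
print(f"mean = {mean:.4f}, SD = {sd:.4f}, 95% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
```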
Fig. 3. Comparison of classification models performance on selected features.
Fig. 4. Classification report and AUC curve of the XGBoost classifier.
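
The AUC reported for each classifier can be read as the probability that a randomly chosen readmitted patient receives a higher predicted score than a randomly chosen non-readmitted patient. A minimal sketch of that pairwise definition follows. The labels and scores are toy values chosen for illustration, not data from the study, and the function name `roc_auc` is hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = readmitted, 0 = not readmitted.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)  # 0.75: three of the four positive/negative pairs are ordered correctly
```

The pairwise loop is quadratic in the number of samples. For a dataset of several thousand patients, a rank-based computation gives the same value more efficiently.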