Javad Zarei1, Amir Jamshidnezhad1, Maryam Haddadzadeh Shoushtari2, Ali Mohammad Hadianfard1, Maria Cheraghi3, Abbas Sheikhtaheri4.
Abstract
Predicting death among COVID-19 patients can help healthcare providers manage these patients better. We aimed to develop machine learning models to predict in-hospital death among these patients. We built models on different feature sets and on datasets produced with a data balancing method, using demographic and clinical data from a multicenter COVID-19 registry. We extracted 10,657 records of patients confirmed by PCR or CT scan who had been hospitalized for at least 24 hours as of the end of March 2021. The death rate was 16.06%. In general, models with 60 and 40 features performed better. Among the 240 models, the C5 models with 60 and 40 features performed well. The C5 model with 60 features outperformed the rest on all evaluation metrics; however, in external validation, the C5 model with 32 features performed better. This model had high accuracy (91.18%), F-score (0.916), area under the curve (0.96), sensitivity (94.2%), and specificity (88%). The model suggested in this study uses simple and readily available data and can be applied to predict death among COVID-19 patients. Furthermore, we concluded that machine learning models may perform differently in different subpopulations in terms of gender and age groups.
Year: 2022 PMID: 35756093 PMCID: PMC9226971 DOI: 10.1155/2022/1644910
Source DB: PubMed Journal: J Healthc Eng ISSN: 2040-2295 Impact factor: 3.822
Figure 1: Overview of the study steps.
Different feature sets.
| Feature set | Method | Number of features | Features |
|---|---|---|---|
| 1 | Feature selection node (default setting) | 17 | Age, contact with COVID-19 patients, cough, diabetes, diagnosis only by abnormal CT, diagnosis only by positive PCR, diagnosis by positive PCR and abnormal CT, gender, heart diseases, HTN, ICU admission, intubation, muscle ache, number of comorbidities, oxygen therapy, blood oxygen saturation level, and respiratory distress. |
| 2 | Univariate analysis ( | 32 | Age, cancer, chronic kidney disease, chronic liver disease, contact (with a probable or confirmed case in the 14 days before the onset of symptoms), convulsion, cough, diabetes, diagnosis only by abnormal CT, diagnosis only by positive PCR, diagnosis by positive PCR and abnormal CT, dialysis, diarrhea, dizziness, drug abuse, gender, headache, heart diseases, HIV/AIDS, HTN, ICU admission, immune diseases, intubation, nervous system diseases, number of comorbidities, other chronic lung diseases, oxygen therapy, paralysis, blood oxygen saturation level, pregnancy, respiratory distress, and unconsciousness. |
| 3 | Univariate analysis ( | 40 | Feature set 2 + asthma, chronic hematology diseases, mental disorders, muscle ache, other diseases (comorbidities), drowsiness, gustatory dysfunction, and weakness. |
| 4 | All features | 60 | Feature set 3 + abdominal pain, autoimmune disease, chest pain, chills, constipation, ocular manifestations, fever, GI bleeding, hemoptysis, nausea, anorexia, other GI signs, paresis, runny nose, skin manifestations, sore throat, olfactory dysfunction, smoking, sweating, and vomiting. |
Comparison of surviving and nonsurviving patients.
| Variables | Alive | Dead | Total patients | P value |
|---|---|---|---|---|
| Age, mean (±SD), years | 54 ± 18.3 | 65.7 ± 16.2 | 55.88 ± 18.46 | <0.0001 |
| Age, median (Q1, Q3), years | 56 (42, 67) | 67 (57, 77) | 58 (43, 69) | |
| Sex, male | 4611 (51.5) | 1010 (59) | 5621 (52.7) | <0.0001 |
| Contact with infected people (yes) | 3169 (35.4) | 706 (41.3) | 3875 (36.4) | <0.0001 |
| Cough (yes) | 5296 (59.2) | 899 (52.5) | 6195 (58.1) | <0.0001 |
| Respiratory distress (yes) | 5021 (56.1) | 1288 (75.3) | 6309 (59.2) | <0.0001 |
| Fever (yes) | 4225 (47.2) | 802 (46.9) | 5027 (47.2) | 0.788 |
| Muscle aches (yes) | 2417 (27) | 426 (24.9) | 2843 (26.7) | 0.069 |
| Chills (yes) | 70 (0.8) | 9 (0.5) | 79 (0.7) | 0.257 |
| Vomiting (yes) | 452 (5.1) | 79 (4.9) | 531 (5) | 0.448 |
| Headache (yes) | 480 (5.4) | 51 (3) | 531 (5) | <0.0001 |
| Chest pain (yes) | 304 (3.4) | 61 (3.6) | 365 (3.4) | 0.728 |
| Diarrhea (yes) | 315 (3.5) | 40 (2.3) | 355 (3.3) | 0.012∗ |
| Sore throat (yes) | 48 (0.5) | 4 (0.2) | 52 (0.5) | 0.100 |
| Gustatory dysfunction (yes) | 98 (1.1) | 10 (0.6) | 108 (1) | 0.053 |
| Olfactory dysfunction (yes) | 123 (1.4) | 19 (1.1) | 142 (1.3) | 0.382 |
| Abdominal pain (yes) | 203 (2.3) | 31 (1.8) | 234 (2.2) | 0.237 |
| Runny nose (yes) | 8 (0.1) | 0 (0.0) | 8 (0.1) | 0.216 |
| Convulsion (yes) | 42 (0.5) | 19 (1.1) | 61 (0.6) | 0.001 |
| Altered consciousness (yes) | 213 (2.4) | 419 (24.5) | 633 (5.9) | <0.0001 |
| GI bleeding (yes) | 5 (0.1) | 0 (0.0) | 5 (0.0) | 0.417 |
| Skin lesion/rash (yes) | 11 (0.1) | 3 (0.2) | 14 (0.1) | 0.584 |
| Dizziness (yes) | 249 (2.8) | 30 (1.8) | 279 (2.6) | 0.014 |
| Paresis (yes) | 54 (0.6) | 11 (0.6) | 65 (0.6) | 0.848 |
| Paralysis (yes) | 22 (0.2) | 13 (0.8) | 35 (0.3) | 0.001 |
| Weakness (yes) | 350 (3.9) | 80 (4.7) | 430 (4) | 0.142 |
| Sweating (yes) | 11 (0.1) | 2 (0.1) | 13 (0.1) | 0.947 |
| Ocular manifestations (yes) | 3 (0.0) | 0 (0.0) | 3 (0.0) | 0.449 |
| Hemoptysis (yes) | 6 (0.1) | 2 (0.1) | 8 (0.1) | 0.491 |
| Drowsiness (yes) | 3 (0.0) | 2 (0.1) | 5 (0.0) | 0.185 |
| Constipation (yes) | 7 (0.1) | 1 (0.1) | 8 (0.1) | 0.784 |
| Nausea (yes) | 478 (5.3) | 89 (5.2) | 567 (5.3) | 0.811 |
| Anorexia (yes) | 724 (8.1) | 138 (8.1) | 862 (8.1) | 0.969 |
| Other GI symptoms (yes) | 7 (0.1) | 0 (0.0) | 7 (0.1) | 0.247 |
| Blood oxygen saturation level: less than 93 | 2046 (22.9) | 934 (54.6) | 2980 (28) | <0.0001 |
| Blood oxygen saturation level: more than 93 | 6900 (77.1) | 777 (45.4) | 7677 (72) | |
| Any comorbidity (yes) | 3314 (37) | 826 (48.3) | 4140 (38.8) | <0.0001 |
| Number of comorbidities | <0.0001 | |||
| 0 | 5632 (63) | 885 (51.7) | 6517 (61.2) | |
| 1 | 1868 (20.9) | 391 (22.9) | 2259 (21.2) | |
| 2 | 946 (10.6) | 275 (16.1) | 1221 (11.5) | |
| 3 | 396 (4.4) | 112 (6.5) | 508 (4.8) | |
| >3 | 104 (1.1) | 48 (2.8) | 152 (1.5) | |
| Number of comorbidities (mean ± SD) | 0.6 ± 0.9 | 0.87 ± 1.1 | 0.65 ± 0.97 | <0.0001 |
| Hypertension (yes) | 1291 (14.4) | 356 (20.8) | 1647 (15.5) | <0.0001∗ |
| Heart diseases (yes) | 1102 (12.3) | 294 (17.2) | 1396 (13.11) | <0.0001 |
| Diabetes (yes) | 1577 (17.6) | 376 (22) | 1953 (18.3) | <0.0001 |
| Immunodeficiency diseases (yes) | 32 (0.4) | 13 (0.8) | 45 (0.4) | 0.019 |
| Asthma (yes) | 198 (2.2) | 28 (1.6) | 226 (2.1) | 0.129 |
| Neurological diseases (yes) | 140 (1.6) | 49 (2.9) | 189 (1.8) | <0.0001 |
| Chronic kidney diseases (yes) | 289 (3.2) | 114 (6.7) | 403 (3.8) | <0.0001 |
| Dialysis (yes) | 78 (0.9) | 33 (1.9) | 111 (1) | <0.0001 |
| Other chronic lung diseases (yes) | 136 (1.5) | 44 (2.6) | 180 (1.7) | 0.002 |
| Chronic hematologic diseases (yes) | 74 (0.8) | 20 (1.2) | 94 (0.9) | 0.166 |
| Cancer (yes) | 172 (1.9) | 80 (4.7) | 252 (2.4) | <0.0001 |
| Autoimmune diseases (yes) | 2 (0.0) | 0 (0.0) | 2 (0.0) | 0.536 |
| Chronic liver diseases (yes) | 46 (0.5) | 16 (0.9) | 62 (0.6) | 0.036 |
| HIV/AIDS (yes) | 7 (0.1) | 5 (0.3) | 12 (0.1) | 0.016 |
| Mental disorders (yes) | 26 (0.3) | 2 (0.1) | 28 (0.3) | 0.198 |
| Smoking (yes) | 143 (1.6) | 33 (1.9) | 176 (1.7) | 0.326 |
| Drug abuse (yes) | 54 (0.6) | 21 (1.2) | 75 (0.7) | 0.005∗ |
| Other comorbidities (yes) | 286 (3.2) | 69 (4) | 355 (3.3) | 0.078 |
| Pregnancy | 63 (0.7) | 2 (0.1) | 65 (0.6) | 0.004 |
| Intubation (yes) | 308 (3.44) | 962 (56.2) | 1270 (11.9) | <0.0001 |
| ICU care (yes) | 1323 (14.8) | 1088 (63.6) | 2411 (22.6) | <0.0001 |
| Oxygen therapy (yes) | 2921 (32.7) | 682 (39.9) | 3603 (33.8) | <0.0001 |
| Diagnosis: only abnormal CT | 3197 (35.7) | 538 (31.4) | 3735 (35) | <0.0001 |
| Diagnosis: only positive PCR | 1161 (13) | 160 (9.4) | 1321 (12.4) | <0.0001 |
| Diagnosis: positive PCR and abnormal CT | 4588 (51.3) | 1013 (59.2) | 5601 (52.6) | <0.0001 |
∗Significant difference.
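The P values in the comparison table can be approximated from the counts alone. The sketch below is only an illustration: it assumes a Pearson chi-square test without continuity correction (this excerpt does not name the test used), and it takes the group sizes 8946 and 1711 from the sums of the "Less than 93" and "More than 93" rows. The function name `chi_square_2x2` is hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table
    [[a, b], [c, d]]. Returns (statistic, p_value) for 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Assumed group sizes: 2046 + 6900 survivors and 934 + 777 non-survivors.
N_ALIVE, N_DEAD = 8946, 1711

# Fever (yes): 4225 survivors and 802 non-survivors.
fever_stat, fever_p = chi_square_2x2(4225, N_ALIVE - 4225, 802, N_DEAD - 802)
print(f"fever: chi2 = {fever_stat:.3f}, p = {fever_p:.3f}")

# Sex, male: 4611 survivors and 1010 non-survivors.
sex_stat, sex_p = chi_square_2x2(4611, N_ALIVE - 4611, 1010, N_DEAD - 1010)
print(f"sex:   chi2 = {sex_stat:.3f}, p = {sex_p:.2e}")
```

Under these assumptions the fever row comes out close to the reported 0.788, and the sex row falls well below 0.0001, which suggests (but does not prove) that an uncorrected chi-square test is compatible with the table.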
Top 10 models developed on original dataset 1.
| Models | Settings | Feature set | Accuracy | Sensitivity | Specificity | Precision | F-score | AUC |
|---|---|---|---|---|---|---|---|---|
| Bayesian network | Default | 2 | 91.12 | 64.7 | 96.2 | 76.4 | 0.701 | 0.914 |
| CHAID | Default | 2 | 90.76 | 54 | 97.8 | 82.6 | 0.653 | 0.909 |
| MLP | 2.5.5 boosting | 1 | 90.63 | 53.6 | 97.7 | 81.5 | 0.647 | 0.904 |
| MLP | 1.10 boosting | 3 | 90.79 | 54 | 97.8 | 82.3 | 0.652 | 0.903 |
| C5 | Boosting | 2 | 90.7 | 56.4 | 97.3 | 79.9 | 0.662 | 0.901 |
| MLP | 2.10.10 | 2 | 90.55 | 53.4 | 97.7 | 81.5 | 0.646 | 0.901 |
| MLP | 2.5.5 | 1 | 90.31 | 55.4 | 97 | 77.6 | 0.646 | 0.901 |
| RF | Default | 2 | 84.52 | 77.5 | 85.9 | 51.3 | 0.617 | 0.9 |
| MLP | 2.20.20 | 3 | 90.51 | 53.6 | 97.5 | 80.5 | 0.643 | 0.899 |
| Bayesian network | Default | 1 | 90.46 | 55.5 | 97.1 | 78.5 | 0.65 | 0.899 |
For MLPs, the numbers indicate the number of layers, the number of neurons in hidden layer 1, and the number of neurons in hidden layer 2.
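On the original, imbalanced dataset the models combine high accuracy with low sensitivity. The sketch below shows how accuracy and precision follow from sensitivity, specificity, and the proportion of deaths. It assumes that the evaluation data had the same death rate as the whole cohort (16.06%), which this excerpt does not state; the function names are hypothetical.

```python
def accuracy_from_rates(sensitivity, specificity, prevalence):
    """Accuracy is the prevalence-weighted mean of sensitivity and specificity."""
    return prevalence * sensitivity + (1 - prevalence) * specificity

def precision_from_rates(sensitivity, specificity, prevalence):
    """Precision = true positives / all predicted positives."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

PREVALENCE = 0.1606  # death rate from the abstract (assumed for the test data)

# First row of the table above: Bayesian network, feature set 2.
acc = accuracy_from_rates(0.647, 0.962, PREVALENCE)
prec = precision_from_rates(0.647, 0.962, PREVALENCE)
print(f"accuracy  = {acc:.3f}")   # reported: 91.12%
print(f"precision = {prec:.3f}")  # reported: 76.4%
```

Because about 84% of patients survived, specificity dominates the accuracy figure, so a model can miss roughly a third of deaths and still score above 90%.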
Top 10 models developed on dataset 2.
| Models | Settings | Feature set | Accuracy | Sensitivity | Specificity | Precision | F-score | AUC |
|---|---|---|---|---|---|---|---|---|
| SVM | RBF default | 4 | 87.83 | 83.4 | 90.3 | 82.9 | 0.832 | 0.942 |
| C5 | Boosting | 3 | 87.44 | 81.8 | 90.6 | 82.7 | 0.822 | 0.94 |
| SVM | RBF default | 3 | 87.59 | 82.7 | 90.3 | 82.4 | 0.826 | 0.938 |
| C5 | Boosting | 4 | 87.88 | 79.9 | 92.4 | 85.5 | 0.826 | 0.938 |
| RF | Default | 4 | 87.86 | 85.7 | 89.1 | 81.5 | 0.836 | 0.931 |
| C5 | Boosting | 2 | 86.68 | 78.5 | 91.5 | 84.3 | 0.813 | 0.927 |
| C5 | Boosting | 1 | 85.99 | 77.2 | 90.8 | 82.2 | 0.797 | 0.926 |
| SVM | RBF default | 2 | 86.61 | 79 | 91.1 | 83.7 | 0.813 | 0.926 |
| MLP | 1.10 | 3 | 85.38 | 77 | 90 | 80.9 | 0.789 | 0.923 |
| RF | Default | 1 | 85.26 | 85.2 | 85.3 | 76.2 | 0.804 | 0.923 |
For MLPs, the numbers indicate the number of layers, the number of neurons in hidden layer 1, and the number of neurons in hidden layer 2.
Top 10 models developed on dataset 3.
| Models | Settings | Feature set | Accuracy | Sensitivity | Specificity | Precision | F-score | AUC |
|---|---|---|---|---|---|---|---|---|
| C5 | Boosting | 4 | 92.77 | 95.1 | 90.5 | 90.8 | 0.929 | 0.972 |
| C5 | Boosting | 3 | 91.74 | 93.6 | 89.8 | 90.5 | 0.92 | 0.965 |
| C5 | Boosting | 2 | 91.18 | 94.2 | 88 | 89.1 | 0.916 | 0.96 |
| SVM | RBF default | 4 | 90.16 | 92.7 | 87.7 | 88.1 | 0.903 | 0.956 |
| C5 | Boosting | 1 | 89.28 | 91.3 | 87.3 | 87.7 | 0.895 | 0.952 |
| SVM | RBF default | 3 | 88.81 | 90.5 | 87.1 | 87.9 | 0.892 | 0.944 |
| MLP | 2.15.15 boosting | 3 | 88.59 | 90.2 | 86.9 | 87.7 | 0.889 | 0.94 |
| MLP | 2.12.12 boosting | 4 | 87.61 | 88.5 | 86.8 | 86.8 | 0.876 | 0.938 |
| C5 | Default | 3 | 87.4 | 89.8 | 85 | 86.1 | 0.879 | 0.934 |
| SVM | RBF default | 2 | 86.34 | 86.6 | 86.1 | 86.6 | 0.866 | 0.932 |
For MLPs, the numbers indicate the number of layers, the number of neurons in hidden layer 1, and the number of neurons in hidden layer 2.
Ensemble models developed on dataset 3.
| ID | Included models | Feature set | Accuracy | Sensitivity | Specificity | Precision | F-score | AUC |
|---|---|---|---|---|---|---|---|---|
| 1 | | 1 | 86.10 | 0.799 | 0.924 | 0.914 | 0.853 | 0.954 |
| 2 | | 2 | 87.39 | 0.859 | 0.889 | 0.888 | 0.873 | 0.954 |
| 3 | | 3 | 87.26 | 0.831 | 0.915 | 0.908 | 0.867 | 0.954 |
| 4 | | 4 | 89.13 | 0.864 | 0.919 | 0.916 | 0.890 | 0.961 |
External validation on dataset 3.
| Models | Settings | Feature set | Accuracy | Sensitivity | Specificity | Precision | F-score | AUC |
|---|---|---|---|---|---|---|---|---|
| C5 | Boosting | 1 | 92.56 | 0.955 | 0.919 | 0.720 | 0.821 | 0.974 |
| C5 | Boosting | 2 | 91.81 | 0.964 | 0.908 | 0.695 | 0.808 | 0.98 |
| SVM | RBF default | 3 | 91.00 | 0.848 | 0.924 | 0.706 | 0.771 | 0.955 |
| Ensemble 2 | — | 2 | 87.77 | 0.861 | 0.881 | 0.611 | 0.715 | 0.954 |
| SVM | RBF default | 2 | 88.24 | 0.890 | 0.881 | 0.618 | 0.729 | 0.953 |
| Ensemble 1 | — | 1 | 88.75 | 0.819 | 0.902 | 0.645 | 0.722 | 0.949 |
| C5 | Boosting | 3 | 86.51 | 0.935 | 0.850 | 0.575 | 0.712 | 0.948 |
| Ensemble 3 | — | 3 | 88.18 | 0.783 | 0.903 | 0.637 | 0.702 | 0.931 |
| MLP | 2.15.15 boosting | 3 | 87.95 | 0.767 | 0.904 | 0.634 | 0.694 | 0.914 |
| MLP | 2.12.12 boosting | 4 | 87.31 | 0.754 | 0.899 | 0.618 | 0.679 | 0.914 |
| Ensemble 4 | — | 4 | 86.62 | 0.770 | 0.887 | 0.596 | 0.672 | 0.91 |
| C5 | Boosting | 4 | 85.64 | 0.748 | 0.880 | 0.575 | 0.650 | 0.889 |
| C5 | Default | 3 | 85.24 | 0.780 | 0.868 | 0.562 | 0.653 | 0.887 |
| SVM | RBF default | 4 | 83.79 | 0.725 | 0.862 | 0.533 | 0.615 | 0.868 |
For MLPs, the numbers indicate the number of layers, the number of neurons in hidden layer 1, and the number of neurons in hidden layer 2.
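In the external validation table, precision is much lower than on dataset 3 even where sensitivity and specificity are similar. One plausible reading, not stated in this excerpt, is that the external data contain a smaller share of deaths. The sketch below checks the F-score as the harmonic mean of precision and sensitivity, then solves for the death proportion implied by a row's precision; both function names are hypothetical.

```python
def f_score(precision, sensitivity):
    """F-score as the harmonic mean of precision and sensitivity (recall)."""
    return 2 * precision * sensitivity / (precision + sensitivity)

def implied_prevalence(precision, sensitivity, specificity):
    """Solve precision = s*p / (s*p + (1 - spec)*(1 - p)) for the prevalence p."""
    fpr = 1 - specificity
    return precision * fpr / (sensitivity * (1 - precision) + precision * fpr)

# First two rows of the external validation table (C5 with boosting).
f_set1 = f_score(0.720, 0.955)  # reported F-score: 0.821
f_set2 = f_score(0.695, 0.964)  # reported F-score: 0.808
p_set1 = implied_prevalence(0.720, 0.955, 0.919)

print(f"F-score, feature set 1: {f_set1:.3f}")
print(f"F-score, feature set 2: {f_set2:.3f}")
print(f"implied death proportion, feature set 1: {p_set1:.3f}")
```

The recomputed F-scores agree with the table, and the implied death proportion comes out a little under 18%, close to the cohort's 16.06% rather than to a balanced 50%.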
Figure 2: Subgroup false-positive rate (FPR) for different models. (a) C5 model on feature set 1. (b) C5 model on feature set 2. (c) SVM model on feature set 3. (d) Ensemble model on feature set 2.
Figure 3: Subgroup false-negative rate (FNR) for different models. (a) C5 model on feature set 1. (b) C5 model on feature set 2. (c) SVM model on feature set 3. (d) Ensemble model on feature set 2.
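Subgroup FPR and FNR of the kind plotted in Figures 2 and 3 can be computed by splitting the predictions by subgroup before counting errors. The following is a minimal sketch on made-up toy records, not the study's data; the function name and the record layout are assumptions for illustration.

```python
from collections import defaultdict

def subgroup_error_rates(records):
    """records: iterable of (subgroup, y_true, y_pred), with 1 = died, 0 = survived.
    Returns {subgroup: {"fpr": ..., "fnr": ...}}; a rate is None when undefined."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, y_true, y_pred in records:
        if y_true == 1:
            key = "tp" if y_pred == 1 else "fn"
        else:
            key = "fp" if y_pred == 1 else "tn"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]  # patients who actually survived
        positives = c["tp"] + c["fn"]  # patients who actually died
        rates[group] = {
            "fpr": c["fp"] / negatives if negatives else None,
            "fnr": c["fn"] / positives if positives else None,
        }
    return rates

# Toy records (hypothetical): (subgroup, true outcome, predicted outcome).
toy = [
    ("male", 1, 1), ("male", 1, 0), ("male", 0, 0), ("male", 0, 1),
    ("female", 1, 1), ("female", 0, 0), ("female", 0, 0), ("female", 0, 1),
]
rates = subgroup_error_rates(toy)
for group, r in rates.items():
    print(group, r)
```

The same function works for age groups by passing an age band as the subgroup label.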
Figure 4: Variable importance of the selected model.
Some machine learning models suggested in the literature to predict death from COVID-19.
| Author | Number of patients, death rate, number of features | Models | Accuracy | AUC |
|---|---|---|---|---|
| Muhammad et al. [ | 1505, NA, 4 | Decision tree (DT) | 99.85 | NA |
| | | LR | 97.49 | NA |
| | | SVM | 98.85 | NA |
| | | Naive Bayes | 97.52 | NA |
| | | RF | 99.60 | NA |
| | | KNN | 98.06 | NA |
| Pourhomayoun and Shakibi [ | 307382, NA, 57 | RF | 87.93 | 0.94 |
| | | ANN | 89.98 | 0.93 |
| | | SVM | 89.02 | 0.88 |
| | | KNN | 89.83 | 0.90 |
| | | LR | 87.91 | 0.92 |
| | | DT | 86.87 | 0.93 |
| Li et al. [ | 2924, 8.8%, different features (83, 152, 5) | Gradient boosting decision tree, 83 features | 88.9 | 0.939 |
| | | LR, 152 features | 86.8 | 0.928 |
| | | LR, 5 features | 88.7 | 0.915 |
| Goncalves and Rouco [ | 827601, 8.7%, 3 | Adaboost, gradient boosting, and RF | NA | 0.919 |
| | | LR | NA | 0.917 |
| An et al. [ | 8000, 2.2%, 10 | SVM linear | 91.9 | 0.962 |
| | | LASSO | 91.1 | 0.963 |
| | | LASSO (14 days) | 86.8 | 0.944 |
| | | SVM linear (14 days) | 87.7 | 0.941 |
| | | LASSO (30 days) | 89.5 | 0.953 |
| | | SVM linear (30 days) | 87.7 | 0.948 |
| Yadaw et al. [ | 3841, 8.1%, 17 and 3 | XGBoost (17 and 3 features) | NA | 0.91 |
| Yan et al. [ | 375, 35%, 3 | XGBoost | 90 | F1: 0.97∗ |
| Gao et al. [ | 2160, 11%, 14 | SVM | 95.8 | 0.976 |
| | | ANN | 95.6 | 0.976 |
| | | Ensemble | 95.5 | 0.976 |
| | | LR | 95.4 | 0.974 |
| | | GBDT | 94.8 | 0.953 |
| Chen et al. [ | 192 (critically ill patients only), 26%, 47 (17 nonlaboratory, 30 laboratory) | SVM linear | 93 (47 features); 87.8 (17 features); 85.6 (30 features) | NA |
| Booth et al. [ | 398, 10.8%, 5 | SVM-RBF | 93 | |
| Parchure et al. [ | 567, 17.8%, 55 | RF | 65.5 | 85.5 |
| Zhao et al. [ | 641, 12.8%, 47 | LR | NA | 0.82 |
| Das et al. [ | 3524, 2.1%, 4 | LR | 96.5 | 0.83 |
| | | SVM | 97 | 0.825 |
| | | KNN | 92.4 | 0.759 |
| | | RF | 92.4 | 0.787 |
| | | Gradient boosting | 97.1 | 0.787 |
| Chen et al. [ | 1002 severe and critical cases, 16.1%, 7 | LR | NA | 0.903 |
| Khan et al. [ | 103888, 5.7%, 15 | Deep neural network | 0.970 | F1: 0.985 |
| | | RF, XGBoost | 0.946 | 0.972 |
| | | LR, DT | 0.945 | 0.972 |
| | | KNN | 0.944 | 0.971 |
∗These studies did not report the AUC.