Seung-Min Baik, Miae Lee, Kyung-Sook Hong, Dong-Jin Park.
Abstract
This study was designed to develop machine-learning models that predict COVID-19 mortality and to identify the key predictive features among clinical characteristics and laboratory tests. To this end, deep-learning (DL) and machine-learning (ML) models were developed on 87 parameters and optimized for the area under the receiver operating characteristic (ROC) curve (AUC) and for the F1 score. Of the two, the DL model exhibited better performance (AUC 0.8721, accuracy 0.84, and F1 score 0.76). We also blended DL with ML, and the resulting ensemble model performed best (AUC 0.8811, accuracy 0.85, and F1 score 0.77). DL models generally do not provide feature importance directly; however, we obtained it for each model using the SHapley Additive exPlanations (SHAP) method. This study demonstrated that DL and ML models can classify COVID-19 mortality from structured hospital data and that the ensemble model had the best predictive ability.
Keywords: COVID-19; artificial intelligence; ensemble model; mortality
Year: 2022 PMID: 35741274 PMCID: PMC9221552 DOI: 10.3390/diagnostics12061464
Source DB: PubMed Journal: Diagnostics (Basel) ISSN: 2075-4418
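The abstract reports that blending the DL model with ML models gave the best performance, but this record does not describe how the blend was formed. As a minimal sketch only, assuming a weighted average of each model's predicted mortality probability (often called soft voting), the idea could look like the following. The function name `blend_probabilities`, the equal default weights, the 0.5 cut-off, and the probabilities are all hypothetical.

```python
def blend_probabilities(prob_lists, weights=None):
    """Weighted average of per-model predicted probabilities (soft voting).

    prob_lists: one list of probabilities per model, all of equal length.
    weights: optional per-model weights; equal weights are assumed if omitted.
    """
    n_models = len(prob_lists)
    if n_models == 0:
        raise ValueError("at least one model is required")
    if weights is None:
        weights = [1.0] * n_models
    if len(weights) != n_models:
        raise ValueError("one weight per model is required")
    n_samples = len(prob_lists[0])
    if any(len(probs) != n_samples for probs in prob_lists):
        raise ValueError("all models must score the same patients")
    total = float(sum(weights))
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_samples)
    ]


# Hypothetical predicted mortality probabilities for four patients.
dl_probs = [0.90, 0.20, 0.60, 0.10]
xgb_probs = [0.70, 0.40, 0.30, 0.20]

blended = blend_probabilities([dl_probs, xgb_probs])
# Assumed decision rule: predict death when the blended probability is >= 0.5.
predicted = [int(p >= 0.5) for p in blended]
```

Averaging probabilities rather than hard labels keeps each model's confidence in play, which is one reason this form of blending is common; the study may have used a different scheme.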
Figure 1. Schematic flow chart of the study.
Demographics and clinical characteristics of COVID-19 patients.
| Variable | Total | Non-Survival | Survival | p-Value |
|---|---|---|---|---|
| Male sex (%) | 98 (48.3) | 24 (49.0) | 74 (48.1) | 0.910 |
| Age (years) | 67.5 ± 16.16 | 75.7 ± 11.58 | 64.9 ± 16.57 | <0.05 |
| Hospitalization period | 13.4 ± 10.20 | 16.8 ± 12.81 | 12.4 ± 9.00 | <0.05 |
| Comorbidities (%) | | | | |
| Hypertension | 98 (48.3) | 30 (61.2) | 68 (44.2) | 0.055 |
| Diabetes mellitus | 56 (27.6) | 20 (40.8) | 36 (23.4) | <0.05 |
| Heart disease | 6 (3.0) | 6 (12.2) | 10 (6.5) | 0.319 |
| Lung disease | 12 (5.9) | 3 (6.1) | 9 (5.8) | 0.943 |
| Liver disease | 5 (2.5) | 0 (0) | 5 (3.2) | 0.454 |
| Kidney disease | 5 (2.5) | 2 (4.1) | 3 (1.9) | 0.756 |
| Brain disease | 27 (13.3) | 6 (12.2) | 21 (13.6) | 0.993 |
| Malignant disease | 28 (13.8) | 7 (14.3) | 21 (13.6) | 0.909 |
| Vital signs at hospital admission | | | | |
| Systolic blood pressure (mmHg) | 133.4 ± 22.08 | 132.9 ± 25.12 | 133.6 ± 21.10 | 0.866 |
| Diastolic blood pressure (mmHg) | 79.3 ± 14.76 | 75.2 ± 14.22 | 80.6 ± 14.74 | <0.05 |
| Pulse rate (PR, bpm) | 84.4 ± 17.79 | 84.3 ± 23.84 | 84.5 ± 15.53 | 0.962 |
| Respiratory rate (RR, bpm) | 22.4 ± 10.03 | 25.6 ± 17.10 | 21.3 ± 6.17 | 0.095 |
| Body temperature (°C) | 37.1 ± 0.71 | 36.9 ± 0.82 | 37.1 ± 0.65 | <0.05 |
| Pulse pressure (mmHg) | 54.1 ± 17.88 | 57.7 ± 20.23 | 53.0 ± 16.99 | 0.147 |
| PR/RR | 4.1 ± 1.34 | 3.9 ± 1.74 | 4.2 ± 1.18 | 0.316 |
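
The last column of the table compares non-survivors with survivors, but this record does not state which statistical test was used. As an illustrative sketch, assuming a Pearson chi-square test without continuity correction for the categorical rows, the male-sex comparison can be reproduced from the counts. The group sizes 49 and 154 are not given here; they are inferred from the reported percentages (24/0.490 and 74/0.481) and should be treated as assumptions.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Returns the test statistic and the p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Assumed group sizes, inferred from the percentages in the table.
n_non_survival, n_survival = 49, 154
# Male counts taken from the table.
male_non_survival, male_survival = 24, 74

chi2, p = chi_square_2x2(
    male_non_survival, n_non_survival - male_non_survival,
    male_survival, n_survival - male_survival,
)
print(f"chi2 = {chi2:.4f}, p = {p:.3f}")
```

Under these assumptions the p-value lands close to the 0.910 listed for male sex, which is consistent with, though not proof of, an uncorrected chi-square test for that row.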
Figure 2. Area under the receiver operating characteristic curve (AUC) of the machine-learning models, the deep-learning model, and the ensemble model.
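For readers less familiar with the AUC, it can be read as the probability that a randomly chosen non-survivor receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The sketch below computes it directly from that definition; the function name `roc_auc`, the labels, and the scores are hypothetical and are not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for the positive class (death), 0 for the negative class.
    scores: predicted risk for each patient; ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes and predicted risks for six patients.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.4, 0.5, 0.2, 0.8]
auc = roc_auc(labels, scores)
```

This pairwise form is quadratic in the number of patients, which is fine for a cohort of this size; library implementations typically use a sort-based equivalent.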
DL and ML performances by AUC optimization.
| Classifier | AUC | Accuracy | F1-Score | Precision | Recall |
|---|---|---|---|---|---|
| XGBoost | 0.8616 | 0.82 | 0.75 | 0.80 | 0.73 |
| LGBM | 0.8318 | 0.83 | 0.71 | 0.83 | 0.68 |
| RF | 0.8560 | 0.83 | 0.74 | 0.80 | 0.71 |
| KNN | 0.7631 | 0.79 | 0.59 | 0.77 | 0.58 |
| SVM | 0.8158 | 0.81 | 0.72 | 0.74 | 0.71 |
| DL | 0.8721 | 0.84 | 0.76 | 0.79 | 0.74 |
| Ensemble model * | 0.8811 | 0.85 | 0.77 | 0.81 | 0.75 |
DL: deep learning; ML: machine learning; XGBoost: extreme gradient boosting; LGBM: light gradient boosting machine; RF: random forest; KNN: K-nearest neighbors; SVM: support vector machine. * Ensemble model of deep-learning and machine-learning models.
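In several rows of the table above, the listed F1 score is not the harmonic mean of the listed precision and recall (for LGBM, 0.83 and 0.68 would give about 0.75 rather than 0.71). One possible explanation, which this record does not confirm, is that the metrics were averaged over the two classes. The sketch below uses a hypothetical confusion matrix to show how macro-averaged F1 can differ from the harmonic mean of macro-averaged precision and recall; all counts and names are illustrative.

```python
def class_metrics(tp, fp, fn):
    """Precision, recall, and F1 for one class treated as the positive class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical confusion matrix (death is the positive class).
tp, fp, fn, tn = 7, 2, 3, 29

# Metrics with death as the positive class.
pos = class_metrics(tp, fp, fn)
# Metrics with survival as the positive class: the roles of the cells swap.
neg = class_metrics(tn, fn, fp)

accuracy = (tp + tn) / (tp + fp + fn + tn)
macro_precision = (pos[0] + neg[0]) / 2
macro_recall = (pos[1] + neg[1]) / 2
macro_f1 = (pos[2] + neg[2]) / 2

# Harmonic mean of the macro-averaged precision and recall, for comparison.
harmonic_of_macros = (
    2 * macro_precision * macro_recall / (macro_precision + macro_recall)
)
```

Here `macro_f1` and `harmonic_of_macros` are close but not equal, so a table that reports averaged precision, recall, and F1 need not satisfy the single-class F1 identity row by row.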
DL and ML performances by F1 score optimization.
| Classifier | AUC | Accuracy | F1-Score | Precision | Recall |
|---|---|---|---|---|---|
| XGBoost | 0.8331 | 0.83 | 0.77 | 0.76 | 0.77 |
| LGBM | 0.8318 | 0.85 | 0.75 | 0.84 | 0.72 |
| RF | 0.8560 | 0.84 | 0.78 | 0.78 | 0.77 |
| KNN | 0.7631 | 0.79 | 0.72 | 0.72 | 0.74 |
| SVM | 0.8158 | 0.82 | 0.76 | 0.76 | 0.76 |
| DL | 0.8614 | 0.83 | 0.78 | 0.77 | 0.79 |
| Ensemble model * | 0.8631 | 0.85 | 0.80 | 0.80 | 0.80 |
DL: deep learning; ML: machine learning; XGBoost: extreme gradient boosting; LGBM: light gradient boosting machine; RF: random forest; KNN: K-nearest neighbors; SVM: support vector machine. * Ensemble model of deep-learning and machine-learning models.
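This record does not say how the models were optimized for the F1 score. LGBM, RF, KNN, and SVM show the same AUC in both tables but different F1 scores, which would be consistent with a change of decision threshold, although other procedures (such as tuning hyperparameters against F1) are equally possible. Purely as an assumption, a threshold sweep could be sketched as follows; the function names, labels, and scores are hypothetical.

```python
def f1_at_threshold(labels, scores, threshold):
    """F1 of the positive class when predicting 1 for scores >= threshold."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(labels, scores):
    """Return the observed score that maximizes F1 when used as the cut-off."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: f1_at_threshold(labels, scores, t))


# Hypothetical outcomes and predicted risks for six patients.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.4, 0.5, 0.2, 0.8]

best = best_f1_threshold(labels, scores)
best_f1 = f1_at_threshold(labels, scores, best)
default_f1 = f1_at_threshold(labels, scores, 0.5)
```

Moving the threshold changes accuracy, precision, recall, and F1 while leaving the AUC untouched, because the AUC depends only on how the scores rank patients. In practice the threshold would be chosen on validation data, not on the test set.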
Figure 3. Feature importance of XGBoost (a) and RF (b) for COVID-19 mortality.
Figure 4. Analysis of features contributing to COVID-19 mortality by the SHAP method. (a) DL model. (b) XGBoost model. (c) LGBM model. (d) RF model.
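SHAP attributes a single prediction to the input features using Shapley values, which is what allows feature contributions to be compared across the DL and tree-based models. The sketch below computes exact Shapley values by brute force for a toy three-feature risk score, replacing "absent" features with baseline values. The model, its coefficients, the choice of features, and the patient and baseline values are hypothetical; practical SHAP implementations approximate this computation, since exact enumeration is infeasible for 87 parameters.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(features):
    """Hypothetical risk score with one interaction term (not the study's model)."""
    age, diastolic_bp, temperature = features
    return (0.03 * age - 0.02 * diastolic_bp
            + 0.0002 * age * diastolic_bp - 0.1 * temperature)


patient = [80.0, 70.0, 36.5]    # hypothetical patient
baseline = [65.0, 80.0, 37.0]   # hypothetical reference values
phi = shapley_values(toy_model, patient, baseline)
```

The contributions in `phi` sum to the difference between the patient's score and the baseline score, the additivity property that gives the method its name.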