Irfan Ullah Khan, Nida Aslam, Malak Aljabri, Sumayh S Aljameel, Mariam Moataz Aly Kamaleldin, Fatima M Alshamrani, Sara Mhd Bachar Chrouf.
Abstract
The COVID-19 outbreak is one of the biggest challenges facing countries around the world, and millions of people have lost their lives to the disease. Accurate early detection and identification of severe COVID-19 cases can therefore reduce the mortality rate and the likelihood of further complications. Machine Learning (ML) and Deep Learning (DL) models have been shown to be effective in the detection and diagnosis of several diseases, including COVID-19. This study used five ML algorithms, namely Decision Tree (DT), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and K-Nearest Neighbor (KNN), and a DL model (six layers with ReLU activation and an output layer with sigmoid activation) to predict the mortality rate in COVID-19 cases. The models were trained on data from confirmed COVID-19 patients from 146 countries. A comparative analysis of the ML and DL models was performed using a reduced feature set. The proposed DL model achieved the best results, with an accuracy of 0.97. The experimental results show that the proposed model, using the reduced feature set, outperforms the baseline study in the literature.
Keywords: COVID-19; deep learning; machine learning; mortality rate; prediction
Year: 2021 PMID: 34198547 PMCID: PMC8296243 DOI: 10.3390/ijerph18126429
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
Summary of related studies using machine learning models.
| Reference | Technique | Dataset Size | Feature Selection | Results |
|---|---|---|---|---|
| [ | RF | 287 COVID-19 patients | Extra tree classifiers | ACC: 0.95 |
| [ | Ensemble based on RF and DNN | 467 COVID-19 patients | ANOVA, ADR | ACC: 0.92 |
| [ | SVM, RF, LR, XGBoost | 3841 COVID-19 patients | Recursive feature elimination | AUC: 0.91 |
| [ | XGBoost | 3062 COVID-19 patients | - | AUC: 0.901 |
| [ | NN | 370 COVID-19 patients | backward step-down | ACC: 0.965 |
| [ | XGBoost | 4098 COVID-19 patients | SHAP, LASSO | AUC: 0.89 |
| [ | LR | 3524 COVID-19 patients | - | ACC: 0.968 |
| [ | SVM (linear) | 10,237 COVID-19 patients | L1-norm | AUC: 0.963 |
| [ | LR | 2307 COVID-19 patients | - | AUC: 0.89 |
| [ | XGBoost | ~60,000 patients | - | AUC: 0.91 |
| [ | RF | 567 COVID-19 patients | Gini importance criteria | ACC: 0.655 |
| [ | MLP | 302 responses from an online survey | - | ACC: 0.85 |
| [ | RF | 341 COVID-19 patients | - | ROC: 0.84 |
| [ | DT | - | - | SEN: 0.95 |
| [ | LR | 1955 COVID-19 patients | - | AUC: 0.891 |
| [ | ANN | 3,073,82l COVID-19 patients | - | ACC: 0.8998 |
Summary of the related studies using Deep Learning models.
| Reference | Technique | Dataset Size | Feature Selection | Results |
|---|---|---|---|---|
| [ | DL with 5 condensed layers | 1108 COVID-19 patients | Boruta | AUC: 0.844 |
| [ | RBFNN, PNN | - | - | RMSE: 7.89 |
| [ | DL | 181 COVID-19 patients | - | risk score AUC: 0.968 |
| [ | RNN | 3780 and 2307 COVID-19 confirmed cases from 2 datasets | Entropy, information gain, Gini index, chi-square | SEN: 0.84 |
Figure 1. Framework of the proposed study.
Figure 2. Data preprocessing steps.
Figure 3. Correlation of the selected features in the dataset.
Description of the dataset.
| Feature Type | Feature Name | Datatype | Unique Values |
|---|---|---|---|
| Demographic | Age | Numeric | 101 |
| Demographic | Sex | Categorical | 3 |
| Demographic | Country | Categorical | 76 |
| Hospital Attribute | LOS | Numeric | 34 |
| Symptoms | Fatigue | Categorical | 2 |
| Symptoms | Fever | Categorical | 2 |
| Symptoms | Weakness | Categorical | 2 |
| Symptoms | Pneumonia | Categorical | 2 |
| Symptoms | Cough | Categorical | 2 |
| Symptoms | Diarrhoea | Categorical | 2 |
| Symptoms | Sore Throat | Categorical | 2 |
| Symptoms | Headache | Categorical | 2 |
| Chronic Disease | Hypertension | Categorical | 2 |
| Chronic Disease | Diabetes | Categorical | 2 |
| Chronic Disease | Cardiac | Categorical | 2 |
| Target | Outcome | Categorical | 2 |
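The 15 input features and the single target listed in the dataset description can be turned into a numeric matrix in many ways. The following is a minimal sketch of one of them, not the preprocessing pipeline used in the study. The column names come from the table; the helper names (`fit_category_codes`, `encode`), the integer coding of Sex and Country, the True/False coding of the two-valued features, and the two example records are assumptions made for illustration.

```python
# Feature schema copied from the dataset description (15 inputs + 1 target).
NUMERIC = ["Age", "LOS"]
CATEGORICAL = ["Sex", "Country"]
BINARY = [
    "Fatigue", "Fever", "Weakness", "Pneumonia", "Cough", "Diarrhoea",
    "Sore Throat", "Headache", "Hypertension", "Diabetes", "Cardiac",
]
TARGET = "Outcome"


def fit_category_codes(records, column):
    """Map each distinct value seen in a categorical column to an integer code."""
    values = sorted({record[column] for record in records})
    return {value: code for code, value in enumerate(values)}


def encode(record, codes):
    """Turn one patient record (a dict) into a flat list of 15 numbers."""
    row = [float(record[name]) for name in NUMERIC]
    row += [float(codes[name][record[name]]) for name in CATEGORICAL]
    row += [1.0 if record[name] else 0.0 for name in BINARY]
    return row


# Two made-up records for illustration only; they are NOT taken from the dataset.
# The meaning of Outcome = 1 versus 0 is also an assumption here.
no_flags = {name: False for name in BINARY}
records = [
    {"Age": 63, "LOS": 12, "Sex": "male", "Country": "A",
     **no_flags, "Fever": True, "Outcome": 1},
    {"Age": 34, "LOS": 5, "Sex": "female", "Country": "B",
     **no_flags, "Cough": True, "Outcome": 0},
]

codes = {name: fit_category_codes(records, name) for name in CATEGORICAL}
X = [encode(record, codes) for record in records]
y = [record[TARGET] for record in records]
print(len(X[0]))  # number of input features per patient
```

Integer codes are the simplest choice for Sex and Country; one-hot encoding would avoid implying an order among categories, at the cost of a wider input.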
Optimized parameter values for the Logistic Regression model.
| Parameter | Value |
|---|---|
| penalty | l2 |
| random_state | 777 |
| max_iter | 10,000 |
| tol | 10 |
Optimized parameter values for the Random Forest model.
| Parameter | Value |
|---|---|
| n_estimators | 100 |
| max_depth | 15 |
| min_samples_split | 5 |
| min_samples_leaf | 1 |
Optimized parameter values for the Extreme Gradient Boosting model.
| Parameter | Value |
|---|---|
| objective | binary:logistic |
| random_state | 42 |
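The three parameter tables map naturally onto keyword arguments of scikit-learn and XGBoost estimators. The sketch below assumes that this is the intended correspondence; the record does not name the libraries, so the class choices and the `build_models` helper are assumptions. The values are copied as listed, including `tol` = 10, which is reproduced without interpretation. XGBoost spells its objective identifier `binary:logistic`, with no space.

```python
# Values copied from the three parameter tables. Key names follow the
# scikit-learn / XGBoost keyword arguments they appear to correspond to.
LR_PARAMS = {
    "penalty": "l2",
    "random_state": 777,
    "max_iter": 10000,
    "tol": 10,  # as listed in the table; reproduced without interpretation
}
RF_PARAMS = {
    "n_estimators": 100,
    "max_depth": 15,
    "min_samples_split": 5,
    "min_samples_leaf": 1,
}
XGB_PARAMS = {
    "objective": "binary:logistic",
    "random_state": 42,
}


def build_models():
    """Instantiate the three estimators (requires scikit-learn and xgboost).

    Imports are local so the parameter dictionaries above stay usable even
    when those packages are not installed.
    """
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from xgboost import XGBClassifier

    return {
        "Logistic Regression": LogisticRegression(**LR_PARAMS),
        "Random Forest": RandomForestClassifier(**RF_PARAMS),
        "XGBoost": XGBClassifier(**XGB_PARAMS),
    }
```

Calling `build_models()` returns unfitted estimators; each would then be trained with its `fit(X, y)` method on the encoded feature matrix.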
Performance comparison of the proposed models for mortality rate prediction of COVID-19 patients.
| Classifier | Accuracy | Precision | Sensitivity | Specificity | F1-Score |
|---|---|---|---|---|---|
| Decision Tree | 0.945 | 0.998 | 0.947 | 0.799 | 0.972 |
| Logistic Regression | 0.945 | 0.998 | 0.946 | 0.777 | 0.972 |
| Random Forest | 0.946 | 0.998 | 0.947 | 0.807 | 0.972 |
| XGBoost | 0.946 | 0.998 | 0.947 | 0.810 | 0.972 |
| K-Nearest Neighbors | 0.944 | 0.997 | 0.947 | 0.699 | 0.971 |
| DNN | 0.970 | 1.000 | 0.970 | 1.000 | 0.985 |
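The five columns of the performance table are standard functions of the confusion-matrix counts. The sketch below shows those definitions and uses the F1 formula to cross-check two rows of the table against their own precision and sensitivity. The function names and the example counts are illustrative assumptions, and the record does not state which outcome is treated as the positive class.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the table's five metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # also called recall
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


def f1_from(precision, sensitivity):
    """F1-score as the harmonic mean of precision and sensitivity."""
    return 2 * precision * sensitivity / (precision + sensitivity)


# Hypothetical counts, chosen only to exercise the formulas.
example = classification_metrics(tp=8, fp=2, tn=9, fn=1)
print(example["accuracy"])  # 0.85

# Cross-check against the reported precision and sensitivity values.
print(round(f1_from(0.998, 0.947), 3))  # 0.972 (Random Forest and XGBoost rows)
print(round(f1_from(1.000, 0.970), 3))  # 0.985 (DNN row)
```

Because the reported inputs are themselves rounded to three decimals, a recomputed F1 can differ from a tabulated one in the last digit for some rows.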
Figure 4. Proposed Deep Learning model: training accuracy and loss.
Figure 5. Proposed Deep Learning model: validation accuracy and loss.
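The abstract describes the DL model only as six layers with ReLU activation followed by a sigmoid output layer. To make that shape concrete, the following is a dependency-free sketch of the forward pass and of a loss that could underlie curves like those in the figures. It is an untrained illustration, not the authors' implementation: the layer widths, the weight initialization, the reading of "six layers" as six hidden layers, and the use of binary cross-entropy as the loss are all assumptions.

```python
import math
import random


def relu(values):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in values]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dense(inputs, weights, biases):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Random weights scaled by sqrt(2 / n_in) and zero biases (untrained)."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_network(n_features, hidden_widths, rng):
    """Stack the hidden layers plus one single-unit output layer."""
    sizes = [n_features] + list(hidden_widths) + [1]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def predict_proba(network, features):
    """ReLU on every hidden layer, sigmoid on the output unit."""
    activations = features
    for weights, biases in network[:-1]:
        activations = relu(dense(activations, weights, biases))
    weights, biases = network[-1]
    return sigmoid(dense(activations, weights, biases)[0])


def binary_cross_entropy(y_true, p, eps=1e-12):
    """Loss for one example; `p` is clipped away from 0 and 1 before the log."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


rng = random.Random(0)
HIDDEN_WIDTHS = [64, 64, 32, 32, 16, 8]  # hypothetical; the source gives no widths
network = build_network(15, HIDDEN_WIDTHS, rng)  # 15 input features
p = predict_proba(network, [0.5] * 15)
print(p, binary_cross_entropy(1, p))
```

In practice such a model would be defined and trained with a deep learning framework; the plain-Python version is meant only to show how the ReLU layers and the sigmoid output fit together.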
Comparison of the proposed model with the baseline study.
| Reference | Year | Techniques | Features Used | Accuracy |
|---|---|---|---|---|
| [ | 2021 | NN | 57 features | 0.8998 |
| Proposed study | 2021 | DNN | 15 features | 0.970 |