Makram Soui, Nesrine Mansouri, Raed Alhamad, Marouane Kessentini, Khaled Ghedira.
Abstract
Humanity is currently facing one of its most dangerous pandemics, COVID-19. Because it spreads easily from person to person, COVID-19 has rapidly reached every part of the world. Infected patients often present symptoms that range from mild to severe, including cough, fever, sore throat, and body aches. In the most serious cases, patients develop breathing difficulties that can lead to severe organ failure and death. Medical staff around the world are overloaded by the exponentially growing number of infections, and screening is strained by the limited availability of tests. In addition, test results can take a long time to arrive, which increases the risk that patients spread the virus to others in the meantime. To reduce the chances of infection, we propose a prediction model that identifies COVID-19 cases from clinical symptoms and features. The model can help citizens detect their infection without visiting a hospital, and it can help medical staff triage patients when medical resources are scarce. In this paper, we use the non-dominated sorting genetic algorithm (NSGA-II) to select the relevant features by finding the best trade-offs between two conflicting objectives: minimizing the number of features and maximizing the weights of the selected features. A classification phase is then conducted using an AdaBoost classifier. The proposed model is evaluated on two different datasets. To maximize performance, we tune the hyper-parameters of the classifier using a genetic algorithm. The results obtained demonstrate the efficiency of NSGA-II as a feature selection algorithm combined with the AdaBoost classifier: the model achieves higher classification results than existing methods.
Keywords: AdaBoost; COVID-19 prediction; Feature selection; Hyper-parameters optimization; Machine learning; NSGA-II
Year: 2021 PMID: 34025034 PMCID: PMC8129611 DOI: 10.1007/s11071-021-06504-1
Source DB: PubMed Journal: Nonlinear Dyn ISSN: 0924-090X Impact factor: 5.022
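The two conflicting objectives named in the abstract (fewer features, higher total feature weight) can be illustrated with a small Pareto-dominance sketch. This is not NSGA-II itself: it omits the population, crossover, mutation, and crowding-distance steps and simply enumerates every subset of a toy feature set. The feature names and integer weights below are hypothetical and are not taken from the paper.

```python
from itertools import combinations

# Hypothetical integer feature weights (illustrative only, not from the paper).
WEIGHTS = {"fever": 9, "cough": 7, "headache": 2, "chills": 1}

def objectives(subset):
    """Return (number of features, total weight) for a feature subset."""
    return len(subset), sum(WEIGHTS[f] for f in subset)

def dominates(a, b):
    """True if objective vector a Pareto-dominates b.

    Objective 0 (feature count) is minimized; objective 1 (weight) is maximized.
    """
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better

def pareto_front(subsets):
    """Keep the subsets that no other subset dominates."""
    scored = [(s, objectives(s)) for s in subsets]
    return [s for s, obj in scored
            if not any(dominates(other, obj) for _, other in scored)]

# Enumerate every non-empty subset (feasible only for a handful of features).
all_subsets = [c for k in range(1, len(WEIGHTS) + 1)
               for c in combinations(WEIGHTS, k)]
front = pareto_front(all_subsets)
for subset in front:
    print(subset, objectives(subset))
```

Each subset on the front is one trade-off between compactness and total weight; an evolutionary algorithm such as NSGA-II searches for this front without enumerating every subset.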
Fig. 1 Number of COVID-19 confirmed cases and total deaths for the period 30 December 2019–12 April 2021, reported weekly by WHO (https://www.who.int/publications/m/item/weekly-epidemiological-update-on-covid-19---6-april-2021, accessed 12 April 2021)
Fig. 2 Overview of proposed approach
Dataset 1 description
| Indicators | Column | Description | Code description | Type |
|---|---|---|---|---|
| Demographic characteristics | Age | Patient’s age | [1, 98] | Numeric |
| Demographic characteristics | Gender | Patient’s gender | 1: Male, 0: Female | |
| Clinical symptoms | Fever | Patient fever | 1: Yes, 0: No | |
| Clinical symptoms | Cough | Patient dry cough | 1: Yes, 0: No | |
| Clinical symptoms | Fatigue | Patient fatigue | 1: Yes, 0: No | |
| Clinical symptoms | Pains | Patient pains | 1: Yes, 0: No | |
| Clinical symptoms | Nasal congestion | Patient nasal congestion | 1: Yes, 0: No | |
| Clinical symptoms | Shortness of breath | Patient breathing problem | 1: Yes, 0: No | |
| Clinical symptoms | Runny nose | Patient runny nose | 1: Yes, 0: No | |
| Clinical symptoms | Sore throat | Patient sore throat | 1: Yes, 0: No | |
| Clinical symptoms | Diarrhea | Patient diarrhea | 1: Yes, 0: No | |
| Clinical symptoms | Chills | Patient chills | 1: Yes, 0: No | |
| Clinical symptoms | Headache | Patient headache | 1: Yes, 0: No | |
| Clinical symptoms | Vomiting | Patient vomiting | 1: Yes, 0: No | |
| Other information | Lives in affected area | Whether the patient is from a COVID-19 affected area | 1: Yes, 0: No | |
Dataset 2 description
| Indicators | Column | Description | Code description | Type |
|---|---|---|---|---|
| Demographic characteristics | Age | Age 60 years or above | 1: Yes, 0: No | Numeric |
| Demographic characteristics | Sex | Patient’s sex | 1: Male, 0: Female | |
| Clinical symptoms | Cough | Patient cough | 1: Yes, 0: No | |
| Clinical symptoms | Fever | Patient fever | 1: Yes, 0: No | |
| Clinical symptoms | Sore throat | Patient sore throat | 1: Yes, 0: No | |
| Clinical symptoms | Shortness of breath | Patient breathing problem | 1: Yes, 0: No | |
| Clinical symptoms | Headache | Patient headache | 1: Yes, 0: No | |
| Other information | Known with confirmed | Known contact with an individual confirmed to have COVID-19 | 1: Yes, 0: No | |
Confusion matrix
| | Predicted: COVID-19 | Predicted: NON-COVID-19 |
|---|---|---|
| Actual: COVID-19 | True positive (TP) | False negative (FN) |
| Actual: NON-COVID-19 | False positive (FP) | True negative (TN) |
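The metrics reported in the result tables (accuracy, precision, sensitivity, specificity, F1-score) follow from the four confusion-matrix counts through their standard definitions. The sketch below uses those textbook formulas; the function name and the example counts are illustrative assumptions, not values from the paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)            # share of predicted positives that are correct
    sensitivity = tp / (tp + fn)          # recall: share of actual positives detected
    specificity = tn / (tn + fp)          # share of actual negatives detected
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }

# Hypothetical counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=80, fp=20, fn=10, tn=90)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

The function assumes every denominator is non-zero; a production version would need to handle classes that never occur or are never predicted.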
Classification results with full datasets
| Dataset | Classifier | Accuracy (%) | Precision (%) | Sensitivity (%) | Specificity (%) | F1-score (%) | AUC (%) |
|---|---|---|---|---|---|---|---|
| Dataset 1 | MLP | 79.96 | 82.40 | 79.67 | 80.29 | 81.01 | 79.98 |
| Dataset 1 | SVM | 75.72 | 78.70 | 75.10 | 76.44 | 76.86 | 75.77 |
| Dataset 1 | LR | 77.51 | 80.70 | 76.35 | 78.85 | 78.46 | 77.60 |
| Dataset 1 | Decision tree | 79.69 | 79.38 | 84.65 | 74.52 | 81.93 | 79.58 |
| Dataset 1 | Gradient boosting | 80.40 | 81.22 | 82.57 | 77.88 | 81.89 | 80.23 |
| Dataset 1 | XGBoost | 80.40 | 81.22 | 82.57 | 77.88 | 81.89 | 80.23 |
| Dataset 1 | AdaBoost | 79.96 | 81.86 | 80.50 | 79.33 | 81.17 | 79.91 |
| Dataset 2 | MLP | 89.36 | 89.36 | 89.36 | 90.01 | 89.36 | 89.68 |
| Dataset 2 | SVM | 92.62 | 92.62 | 92.62 | 93.90 | 92.62 | 93.26 |
| Dataset 2 | | | | | | | |
| Dataset 2 | Decision tree | 85.96 | 85.96 | 85.96 | 86.27 | 85.96 | 86.12 |
| Dataset 2 | Gradient boosting | 92.41 | 92.41 | 92.41 | 93.95 | 92.41 | 93.18 |
| Dataset 2 | Random forest | 89.36 | 89.36 | 89.36 | 90.01 | 89.36 | 89.68 |
| Dataset 2 | XGBoost | 92.36 | 92.36 | 92.36 | 93.94 | 92.36 | 93.15 |
| Dataset 2 | AdaBoost | 89.35 | 89.35 | 89.35 | 90.01 | 89.35 | 89.68 |
Bold values highlight the best results for the two studied datasets
Experimental results of studied feature selection algorithms for dataset 1
| Feature selection | Accuracy (%) | Precision (%) | Sensitivity (%) | Specificity (%) | F1-score (%) | AUC (%) |
|---|---|---|---|---|---|---|
| | 82.41 | 84.32 | 82.57 | 82.21 | 83.44 | 82.39 |
| | 75.72 | 78.7 | 75.1 | 76.44 | 76.86 | 75.77 |
| | 79.29 | 85.58 | 73.86 | 85.58 | 79.29 | 79.72 |
| | 80.4 | 80.97 | 82.99 | 77.4 | 81.97 | 80.2 |
| | 81.07 | 84.21 | 79.67 | 82.69 | 81.88 | 80.97 |
| | 81.96 | 83.19 | 82.16 | 80.77 | 82.67 | 81.46 |
| | 80.85 | 81.38 | 83.4 | 77.88 | 82.38 | 80.64 |
| | 81.07 | 81.97 | 82.99 | 78.85 | 82.47 | 80.92 |
| | 81.96 | 83.61 | 82.57 | 81.25 | 83.09 | 81.91 |
| | 75.72 | 78.7 | 75.1 | 76.44 | 76.86 | 75.77 |
| | 79.73 | 86.41 | 73.86 | 86.54 | 79.64 | 80.2 |
| | 80.4 | 80.97 | 82.99 | 77.4 | 81.97 | 80.2 |
| | 80.62 | 82.63 | 80.91 | 80.29 | 81.76 | 80.6 |
| | 82.41 | 84.32 | 82.57 | 82.21 | 83.44 | 82.39 |
| | 82.85 | 84.75 | 82.99 | 82.69 | 83.86 | 82.84 |
| | 81.07 | 81.45 | 83.82 | 77.88 | 82.62 | 80.85 |
| | 80.18 | 81.15 | 82.16 | 77.88 | 81.65 | 80.02 |
| | 75.72 | 78.7 | 75.1 | 76.44 | 76.86 | 75.77 |
| | 77.73 | 80.26 | 77.59 | 77.88 | 78.9 | 77.74 |
| | 80.62 | 81.3 | 82.99 | 77.88 | 82.14 | 80.44 |
| | 81.07 | 84.21 | 79.67 | 82.69 | 81.88 | 81.18 |
| | 82.63 | 84.98 | 82.16 | 83.17 | 83.54 | 82.67 |
| | 82.85 | 84.75 | 82.99 | 82.69 | 83.86 | 82.84 |
| | 83.3 | 85.17 | 83.4 | 83.17 | 84.28 | 83.29 |
| | 83.52 | 85.09 | 82.91 | 84.19 | 83.98 | 83.55 |
| | 79.51 | 79.83 | 81.20 | 77.67 | 80.51 | 79.44 |
| | 83.07 | 84.96 | 82.05 | 84.19 | 83.48 | 83.12 |
| | 84.51 | 82.8 | 88.46 | 80 | 85.54 | 84.23 |
| | 84.41 | 82.28 | 89.32 | 79.07 | 85.66 | 84.19 |
| | 83.74 | 85.78 | 82.48 | 85.12 | 84.10 | 83.8 |
| | 83.52 | 85.71 | 82.05 | 85.12 | 83.84 | 83.58 |
Bold value indicates the highest result
Experimental results of studied feature selection algorithms for dataset 2
| Feature selection | Accuracy (%) | Precision (%) | Sensitivity (%) | Specificity (%) | F1-score (%) | AUC (%) |
|---|---|---|---|---|---|---|
| | 92.62 | 92.62 | 92.62 | 93.90 | 92.62 | 93.26 |
| | 92.88 | 92.88 | 92.88 | 94.91 | 92.88 | 93.89 |
| | 93.75 | 93.75 | 93.75 | 94.90 | 93.75 | 94.32 |
| | 86.47 | 86.47 | 86.47 | 87.24 | 86.47 | 86.86 |
| | 92.51 | 92.51 | 92.51 | 93.92 | 92.51 | 93.22 |
| | 92.62 | 92.62 | 92.62 | 93.90 | 92.62 | 93.26 |
| | 92.51 | 92.51 | 92.51 | 93.94 | 92.51 | 93.23 |
| | 92.52 | 92.52 | 92.52 | 93.91 | 92.52 | 93.22 |
| | 92.62 | 92.62 | 92.62 | 93.90 | 92.62 | 93.26 |
| | 94.01 | 94.01 | 94.01 | 99.60 | 94.01 | 96.81 |
| | 95.31 | 95.31 | 95.31 | 98.2 | 95.31 | 96.75 |
| | 92.61 | 92.61 | 92.61 | 93.90 | 92.61 | 93.26 |
| | 92.62 | 92.62 | 92.62 | 93.90 | 92.62 | 93.26 |
| | 91.68 | 91.68 | 91.68 | 92.77 | 91.68 | 92.23 |
| | 95.10 | 95.10 | 95.10 | 98.25 | 95.10 | 96.67 |
| | 95.34 | 95.34 | 95.34 | 98.23 | 95.34 | 96.79 |
| | 91.68 | 91.68 | 91.68 | 92.77 | 91.68 | 92.23 |
| | 92.88 | 92.88 | 92.88 | 94.91 | 92.88 | 93.89 |
| | 95.12 | 95.12 | 95.12 | 98.20 | 95.12 | 96.66 |
| | 86.47 | 86.47 | 86.47 | 87.24 | 86.47 | 86.86 |
| | 92.51 | 92.51 | 92.51 | 93.92 | 92.51 | 93.22 |
| | 92.62 | 92.62 | 92.62 | 93.90 | 92.62 | 93.26 |
| | 95.34 | 95.34 | 95.34 | 98.23 | 95.34 | 96.79 |
| | 91.68 | 91.68 | 91.68 | 92.77 | 91.68 | 92.23 |
| | 91.64 | 91.64 | 91.64 | 95.26 | 91.64 | 93.45 |
| | 95.10 | 95.10 | 95.10 | 98.25 | 95.10 | 96.67 |
| | 94.76 | 94.76 | 94.76 | 98.24 | 94.76 | 96.50 |
| | 92.52 | 92.52 | 92.52 | 93.91 | 92.52 | 93.22 |
| | 95.34 | 95.34 | 95.34 | 98.23 | 95.34 | 96.79 |
| | 92.44 | 92.44 | 92.44 | 93.91 | 92.44 | 93.17 |
| | 92.62 | 92.62 | 92.62 | 93.90 | 92.62 | 93.26 |
Bold value indicates the highest result
Parameters setting
| Algorithms | Parameters |
|---|---|
| NSGA-II | Population size: 100 |
| NSGA-II | Selection: binary tournament selection |
| NSGA-II | Crossover: single point crossover |
| NSGA-II | Mutation: polynomial mutation |
| Genetic algorithm | Population size: 100 |
| Genetic algorithm | Mutation: 0.1 |
| Genetic algorithm | Crossover: 0.9 |
| AdaBoost (dataset 1) | Max_depth=6 |
| AdaBoost (dataset 2) | Max_depth=11 |
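As a rough illustration of how a genetic algorithm with the listed settings (population size 100, crossover 0.9, mutation 0.1) could search a hyper-parameter space, the sketch below evolves pairs of (tree depth, number of estimators). It rests on several assumptions: the two rates are treated as per-child probabilities, the search ranges are invented, and `toy_fitness` is a stand-in for the cross-validated score a real run would compute by training the classifier. None of these details are confirmed by the source.

```python
import random

POP_SIZE, P_CROSSOVER, P_MUTATION = 100, 0.9, 0.1  # settings listed in the table
DEPTH_RANGE = (1, 20)         # assumed search range for the tree depth
ESTIMATOR_RANGE = (10, 200)   # assumed search range for the number of estimators

def toy_fitness(individual):
    """Stand-in for cross-validated accuracy; its peak at (8, 120) is arbitrary."""
    depth, n_estimators = individual
    return -((depth - 8) ** 2) - ((n_estimators - 120) / 10) ** 2

def random_individual(rng):
    return (rng.randint(*DEPTH_RANGE), rng.randint(*ESTIMATOR_RANGE))

def tournament(population, rng):
    """Binary tournament: draw two individuals and keep the fitter one."""
    a, b = rng.sample(population, 2)
    return a if toy_fitness(a) >= toy_fitness(b) else b

def evolve(generations=30, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(POP_SIZE)]
    for _ in range(generations):
        children = []
        while len(children) < POP_SIZE:
            p1, p2 = tournament(population, rng), tournament(population, rng)
            # Crossover: take depth from one parent and n_estimators from the other.
            child = (p1[0], p2[1]) if rng.random() < P_CROSSOVER else p1
            # Mutation: resample one of the two genes.
            if rng.random() < P_MUTATION:
                fresh = random_individual(rng)
                child = (fresh[0], child[1]) if rng.random() < 0.5 else (child[0], fresh[1])
            children.append(child)
        # Elitism: carry the best individual of the old population forward.
        children[0] = max(population, key=toy_fitness)
        population = children
    return max(population, key=toy_fitness)

best = evolve()
print("best (depth, n_estimators):", best, "fitness:", toy_fitness(best))
```

In practice the fitness function dominates the cost, since each evaluation trains and validates a classifier, so caching repeated individuals is usually worthwhile.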
Significance test results of paired t-test (α = 0.05) for Dataset 1
| Model A | Model B | t-value | p-value |
|---|---|---|---|
| | | 9.1444 | 3.74E |
| | | 24.1853 | 8.44E |
| | | 10.4965 | 1.19E |
| | | 13.2775 | 1.61E |
| | | 3.562 | 3.00E |
| | | 7.0801 | 2.89E |
| | | 7.6714 | 1.54E |
| | | 7.9099 | 1.21E |
| | | 8.6199 | 6.06E |
| | | 15.9806 | 3.25E |
| | | 8.6415 | 5.94E |
| | | 13.1475 | 1.76E |
| | | 10.256 | 1.44E |
| | | 8.6415 | 5.94E |
| | | 8.7993 | 5.13E |
| | | 13.9716 | 1.04E |
| | | 7.8576 | 1.27E |
| | | 24.1853 | 8.44E |
| | | 17.8533 | 1.23E |
| | | 10.4523 | 1.23E |
| | | 5.6934 | 1.00E |
| | | 9.0294 | 4.15E |
| | | 7.341 | 2.18E |
| | | 6.0921 | 9.04E |
| | | 6.863 | 3.68E |
| | | 12.397 | 2.31E |
| | | 7.765 | 1.40E |
| | | 6.7806 | 4.03E |
| | | 5.7425 | 1.30E |
| | | 8.448 | 7.14E |
| | | 7.8391 | 1.30E |
Significance test results of paired t-test (α = 0.05) for Dataset 2
| Model A | Model B | t-value | p-value |
|---|---|---|---|
| | | 17.5025 | 1.47E |
| | | 9.4181 | 2.94E |
| | | 8.6483 | 5.91E |
| | | 27.6612 | 2.56E |
| | | 13.5235 | 1.38E |
| | | 14.2987 | 8.54E |
| | | 9.7485 | 2.21E |
| | | 16.324 | 2.70E |
| | | 10.5152 | 1.18E |
| | | 7.0754 | 2.91E |
| | | 6.3859 | 6.37E |
| | | 16.3742 | 2.63E |
| | | 13.6838 | 1.25E |
| | | 21.7618 | 2.15E |
| | | 5.3533 | 2.00E |
| | | 4.3751 | 8.00E |
| | | 24.8923 | 6.53E |
| | | 14.0697 | 9.82E |
| | | 4.6437 | 6.00E |
| | | 31.4675 | 8.11E |
| | | 15.2167 | 4.98E |
| | | 13.8253 | 1.14E |
| | | 5.6214 | 1.00E |
| | | 32.9086 | 5.44E |
| | | 19.0164 | 7.08E |
| | | 8.8527 | 4.88E |
| | | 8.7921 | 5.17E |
| | | 17.191 | 1.72E |
| | | 5.2141 | 2.00E |
| | | 18.3929 | 9.49E |
| | | 16.1819 | 2.91E |
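A paired t-test compares two models scored on the same runs or folds by testing whether the mean of their per-run differences is zero. The sketch below computes the t statistic with the standard formula, t = mean(d) / (sd(d) / sqrt(n)). The function name and the scores are hypothetical, and how the paper paired its runs is not stated in this record.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(scores_a, scores_b):
    """Return the paired t statistic and degrees of freedom for two matched samples."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1

# Hypothetical per-run scores for two models (not results from the paper).
model_a = [5, 7, 6, 9]
model_b = [3, 4, 5, 6]
t_value, dof = paired_t_statistic(model_a, model_b)
print(f"t = {t_value:.4f} with {dof} degrees of freedom")
```

Turning the statistic into a p-value requires the Student t distribution, which the Python standard library does not provide; `scipy.stats.ttest_rel` returns both values if SciPy is available.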
Comparison of our model with similar works for COVID-19 prediction
| Dataset | Reference | Model | Accuracy (%) | Precision (%) | Sensitivity (%) | Specificity (%) | F1-score (%) | AUC (%) |
|---|---|---|---|---|---|---|---|---|
| Dataset 1 | | | | | | | | |
| Dataset 1 | Banik et al. | Logistic Regression | 81.2 | 79.7 | 79.7 | – | 79.7 | – |
| Dataset 1 | Banik et al. | Naive Bayesian | 75.9 | 73.9 | 73.9 | – | 73.9 | – |
| Dataset 1 | Banik et al. | Decision Tree | 71.9 | 70.4 | 67.3 | – | 68.8 | – |
| Dataset 1 | Banik et al. | LinearSVM | 80.2 | 77.6 | 80.4 | – | 85 | – |
| Dataset 1 | Banik et al. | Random Forest | 80.6 | 77.8 | 84 | – | 80.8 | – |
| Dataset 2 | | | | | | | | |
| Dataset 2 | Zoabi et al. | Gradient boosting | – | – | 87.3 | 71.98 | – | 90 |
Bold values highlight the best results for the two studied datasets