Doaa A Altantawy¹, Sherif S Kishk¹.
Abstract
SARS-CoV-2, the virus that causes COVID-19, has led to a pandemic that has severely impacted human society, with a massive death toll worldwide. Hence, there is a persistent need for fast and reliable automatic tools that help health teams make clinical decisions. Predictive models could ease the strain on healthcare systems through early and reliable screening of COVID-19 patients, which also helps to combat the spread of the disease. Recent studies have reported key advantages of using routine blood tests for the initial screening of COVID-19 patients. Thus, in this paper, we propose a novel COVID-19 prediction model based on routine blood tests. The model exploits the true dependencies within the employed feature pool through a sparsification procedure. In this sparse domain, a hybrid feature selection mechanism is proposed. This mechanism fuses the features selected from two perspectives: Pearson correlation and a new Minkowski-based equilibrium optimizer (MEO). The selected features are then fed into a new one-dimensional convolutional neural network (1DCNN) for the final diagnosis decision. The proposed prediction model is tested on a new public dataset from San Raphael Hospital, Milan, Italy, namely the OSR dataset, which has two sub-datasets. According to the experimental results, the proposed model outperforms state-of-the-art techniques with an average testing accuracy of 98.5% while using fewer than half of the features in the pool, i.e., only fewer than half of the blood tests in the employed dataset are needed to reach a final diagnosis decision.
Keywords: 1DCNN; Blood tests; COVID-19; Equilibrium optimization; Feature pool sparsification; Feature selection; Pearson correlation
Year: 2022 PMID: 36210961 PMCID: PMC9527205 DOI: 10.1016/j.eswa.2022.118935
Source DB: PubMed Journal: Expert Syst Appl ISSN: 0957-4174 Impact factor: 8.665
Fig. 1. Monthly total COVID-19 deaths on a logarithmic scale (Worldometer, 2020).
Comparison of COVID-19 detection studies based on routine blood tests.
| Dataset | Adopted methodology | Accuracy | Sensitivity | Specificity | ROC-AUC |
|---|---|---|---|---|---|
| Hospital Israelita Albert Einstein, São Paulo, Brazil | SMOTEBoost, ensemble of 10 SVM models | – | 70.25 % | 85.98 % | 86.78 % |
| | RF, LR, GLMNET, ANN | 81 %–87 % | 43 %–65 % | 81 %–91 % | 80 %–84 % |
| | NN, RF, GBT, LR, SVM | – | 67.7 %–80.6 % | 80 %–85 % | 84.2 %–84.7 % |
| | DTX, RF, ensemble of LR, RF, XGBoost, SVM, MLP | 88 % | 66 % | 91 % | 86 % |
| | Ensemble of ANN, CNN, LSTM, RNN, CNNLSTM, CNNRNN | 86.66 % | – | – | 62.50 % |
| | XMLP, SVM, RT, RF, BN, NB | 95.159 % | 96.8 % | 93.6 % | – |
| | KNNimputer, iForest, SMOTE, ensemble of RF, LR, and ET | 95 % | 95 % | 95 % | 95 % |
| Tongji Hospital of Wuhan, China | LASSO-LR | – | 98 % | 91 % | 0.997 |
| | XGBoost | – | 83 % | – | – |
| San Raphael Hospital, Milan, Italy | RF, NB, LR, SVM, and KNN | 83 %–91 % | 76 %–92 % | 92 %–96 % | 83 %–94 % |
| | DT, ET, KNN, LR, NB, RF, SVM, TWRF | 82 %–86 % | 92 %–95 % | – | – |
| | FI, DNN | 97.658 % | 96.55 % | – | – |
| New York Presbyterian Hospital/Weill Cornell Medicine (NYPH/WCM) | LR, DT, RF, XGBoost | 68.9 %–79.1 % | 61.8 %–76.1 % | 73.2 %–80.8 % | 70.4 %–85.4 % |
| Stanford Health Care, CA, USA | LR | – | 86 %–93 % | 35 %–55 % | – |
| Hospitals in Zhejiang, China | LR, DT, RF, SVM, DNN | 91 % | 87 % | 95 % | 86.4 % |
| Hospital in Milan, Italy | ANN, LR, RF, DT | 91.4 % | 94.1 % | 88.7 % | – |
| University Medical Center, Ljubljana, Slovenia | XGBoost, RF, DNN | – | 81.9 % | 97.9 % | 97 % |
*In the original table, symbols denote the dataset size, the number of COVID-19-positive cases in the employed dataset, the total number of features in the targeted dataset, and the number of features selected for the diagnosis process. A dash (–) means the value was not mentioned in the original study.
List of abbreviations.
| Abbreviation | Explanation | Abbreviation | Explanation |
|---|---|---|---|
| ML | Machine learning | PCR | Polymerase chain reaction |
| mRMR | Maximum relevance minimum redundancy algorithm | SMOTEBoost | An oversampling method based on the SMOTE (Synthetic Minority Oversampling Technique) algorithm |
| SVM | Support vector machine | RF | Random Forest |
| LR | Logistic regression | GLMNET | Lasso and Elastic-Net Regularized Generalized Linear Models |
| ANN/NN | Artificial neural network | DNN | Deep neural network |
| GBT | Gradient boosting trees | XGBoost | Extreme Gradient Boosting, an optimized distributed gradient boosting library |
| MLP | Multi-layer perceptron | CNN | Convolutional neural network |
| LSTM | Long short-term memory, a recurrent neural network (RNN) architecture | NB | Naïve Bayes |
| BN | Bayesian network | iForest | Isolation forest |
| LASSO | Least absolute shrinkage and selection operator | KNN | k-nearest neighbors algorithm |
| TWRF | Trees Weighting Random Forest | FI | Fuzzy inference |
| DT | Decision Tree | GNB | Gaussian Naïve Bayes |
| ET | Extremely Randomized Trees | RSVM | Radial Support Vector Machine |
| LSVM | Linear Support Vector Machine | QDA | Quadratic Discriminant Analysis |
| LDA | Linear Discriminant Analysis | EO | Equilibrium optimizer |
| AdaBoost | Adaptive Boosting trees | MEO | Minkowski-based equilibrium optimizer |
Numerical features in the OSR dataset, with their mean value, standard deviation (SD), and missing rate (MR %). “Exist.” indicates whether the feature is available in each sub-dataset.
| Feature (Abb.) | Description | COVID-specific: Exist. | COVID-specific: MR % | COVID-specific: Mean | COVID-specific: SD | CBC: Exist. | CBC: MR % | CBC: Mean | CBC: SD |
|---|---|---|---|---|---|---|---|---|---|
| Calcium (CA) | A test that checks the level of calcium in the blood, i.e., the calcium in the body that is not stored in the bones | ✓ | 5.35 | 2.21 | 0.48 | ✕ | | | |
| Creatine kinase (CK) | This test measures the amount of an enzyme called creatine kinase (CK) in your blood. CK is a type of protein that the muscle cells in your body need to function | ✓ | 59.44 | 181.64 | 405.71 | ✕ | | | |
| Creatinine (CREA) | A test that measures how well your kidneys are filtering waste from your blood | ✓ | 4.26 | 1.16 | 0.98 | ✕ | | | |
| Alkaline phosphatase (ALP) | ALP is an enzyme found throughout the body, mostly in the liver, bones, kidneys, and digestive system. When the liver is damaged, ALP may leak into the bloodstream | ✓ | 27.3 | 88.54 | 71.44 | ✓ | 53 | 89.89 | 89.09 |
| Gamma-glutamyl transferase (GGT) | A test that measures the level of GGT, an enzyme found mainly in the liver, in the blood; high levels can indicate liver or bile-duct damage | ✓ | 25.11 | 66.22 | 135.39 | ✓ | 51.25 | 82.48 | 132.70 |
| Glucose (GLU) | A test that measures the level of glucose (sugar) in a person's blood | ✓ | 5.65 | 119 | 57.91 | ✕ | | | |
| Aspartate aminotransferase (AST) | AST is an enzyme normally present in the liver, heart, brain, pancreas, kidneys, and many other muscles and tissues in the body. Enzymes like AST help facilitate fundamental biological processes in these organs and tissues | ✓ | 5.65 | 45.85 | 50.67 | ✓ | 0.72 | 54.20 | 57.61 |
| Alanine aminotransferase (ALT) | A test that measures the amount of ALT in the blood. High levels of ALT can indicate a liver problem even before signs of liver disease appear, such as jaundice, a condition that turns the skin and eyes yellow. An ALT blood test may help in the early detection of liver disease | ✓ | 5.53 | 39.17 | 42.55 | ✓ | 4.66 | 44.92 | 45.50 |
| Lactate dehydrogenase (LDH) | A test that looks for signs of damage to the body's tissues. LDH is an enzyme found in almost every cell of your body, including your blood, muscles, brain, kidneys, and pancreas; it helps turn sugar into energy | ✓ | 17.45 | 327.64 | 211.62 | ✓ | 30.47 | 380.45 | 193.98 |
| C-reactive protein (CRP) | A test that measures the amount of CRP in the blood to detect inflammation due to acute conditions or to monitor the severity of disease in chronic conditions | ✓ | 5.59 | 67 | 77.8 | ✓ | 2.15 | 90.88 | 94.4 |
| Potassium (K) | A test that checks how much potassium is in the blood | ✓ | 4.61 | 4.23 | 0.52 | ✕ | | | |
| Sodium (NA) | A test that checks how much sodium is in the blood | ✓ | 4.21 | 138.59 | 4.58 | ✕ | | | |
| UREA | Urea is usually passed out in the urine. A high blood level of urea indicates that the kidneys may not be working properly, or that you have a low body water content (are dehydrated) | ✓ | 38.94 | 48.96 | 42.47 | ✕ | | | |
| White blood cell (WBC) | A test that measures the white blood cell count | ✓ | 3.63 | 8.72 | 4.64 | ✓ | 0.72 | 8.55 | 4.86 |
| Red blood cell (RBC) | A test that measures the red blood cell count | ✓ | 3.63 | 4.52 | 0.73 | ✕ | | | |
| Hemoglobin (HGB) | A protein in your red blood cells that carries oxygen to your body's organs and tissues and transports carbon dioxide from your organs and tissues back to your lungs | ✓ | 3.63 | 13.14 | 2.04 | ✕ | | | |
| Hematocrit (HCT) | A test that measures the proportion of red blood cells in your blood. Red blood cells carry oxygen throughout your body, and having too few or too many of them can be a sign of certain diseases | ✓ | 3.63 | 39.21 | 5.61 | ✕ | | | |
| Mean corpuscular volume (MCV) | There are three main types of corpuscles (blood cells) in your blood: red blood cells, white blood cells, and platelets. An MCV blood test measures the average size of your red blood cells | ✓ | 3.63 | 87.29 | 7.06 | ✕ | | | |
| Mean corpuscular hemoglobin (MCH) | The average amount of hemoglobin, the protein that carries oxygen around your body, in each of your red blood cells | ✓ | 3.63 | 29.21 | 2.72 | ✕ | | | |
| Mean corpuscular hemoglobin concentration (MCHC) | A test that checks the average concentration of hemoglobin in a group of red blood cells | ✓ | 3.63 | 33.45 | 1.34 | ✕ | | | |
| Platelets (PLT) | A normal platelet count ranges from 150,000 to 450,000 platelets per microliter of blood | ✓ | 3.63 | 235.66 | 94.22 | ✓ | 0.72 | 226.53 | 101.17 |
| Neutrophils (NET, NE) | A type of white blood cell that helps heal damaged tissues and resolve infections | (✓, ✓) | (20.85, 20.85) | (6.45, 72.35) | (4.47, 13.26) | (✓, ✕) | (25.1, –) | (6.2, –) | (4.17, –) |
| Lymphocytes (LYT, LY) | A type of white blood cell that plays an important role in the immune system, helping the body fight off infection | (✓, ✓) | (20.85, 20.85) | (1.37, 18.58) | (0.95, 11) | (✓, ✕) | (25.1, –) | (1.18, –) | (0.81, –) |
| Monocytes (MOT, MO) | A type of white blood cell that helps fight infections and diseases | (✓, ✓) | (20.85, 20.85) | (0.62, 7.83) | (0.54, 3.88) | (✓, ✕) | (25.1, –) | (0.61, –) | (0.41, –) |
| Eosinophils (EOT, EO) | A type of disease-fighting white blood cell. A high eosinophil count most often indicates a parasitic infection, an allergic reaction, or cancer | (✓, ✓) | (20.85, 20.85) | (0.07, 0.88) | (0.14, 1.62) | (✓, ✕) | (25.1, –) | (0.06, –) | (0.13, –) |
| Basophils (BAT, BA) | A type of white blood cell. Like most white blood cells, basophils are responsible for fighting fungal or bacterial infections and viruses | (✓, ✓) | (20.85, 20.85) | (0.02, 0.34) | (0.04, 0.27) | (✓, ✕) | (25.45, –) | (0.01, –) | (0.04, –) |
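The per-feature statistics in the table above (missing rate, mean, and standard deviation) can be reproduced with a few lines of code. The following is a minimal sketch. It assumes that missing measurements are stored as `None` and that the standard deviation is the sample standard deviation; the table does not state which estimator was used. `feature_stats` is a hypothetical helper, not part of the paper.

```python
import statistics

def feature_stats(values):
    """Return (missing rate in %, mean, sample SD) for one feature column.

    Missing measurements are assumed to be encoded as None (an assumption;
    the OSR files may use a different encoding).
    """
    present = [v for v in values if v is not None]
    missing_rate = 100.0 * (len(values) - len(present)) / len(values)
    mean = statistics.mean(present)
    sd = statistics.stdev(present)  # sample SD; needs at least two values
    return missing_rate, mean, sd

# Hypothetical calcium readings with two missing entries.
mr, mean, sd = feature_stats([2.1, None, 2.3, 2.2, None])
print(f"MR = {mr:.1f} %, mean = {mean:.2f}, SD = {sd:.2f}")
```

Statistics are computed only over the observed values, so a feature with a high MR (e.g. CK or UREA) has its mean and SD estimated from fewer patients.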
Fig. 3. COVID-19 examination results for the COVID-specific dataset (a) and the CBC dataset (b).
Fig. 4. Distribution of COVID-19 swab results by age and gender for the COVID-specific dataset (a) and the CBC dataset (b).
Fig. 2. An illustration of the proposed COVID-19 prediction model.
Fig. 5. 3D visualization of the predicted outliers/inliers in the COVID-specific dataset using three PCA components.
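Fig. 5 relies on the outlier/inlier labels produced by isolation forest (iForest). As background, iForest scores a sample by how quickly random splits isolate it. Let E(h(x)) be the sample's average path length over the trees and n the subsample size. The score is s(x, n) = 2^(−E(h(x))/c(n)), where c(n) = 2H(n − 1) − 2(n − 1)/n is the expected path length of an unsuccessful binary-search-tree lookup and H(i) ≈ ln(i) + 0.5772. Scores close to 1 flag outliers; scores well below 0.5 indicate inliers.

The sketch below computes only this scoring formula from the original iForest method and does not build the trees. The subsample size of 256 is an assumed value; the paper's own iForest settings are not given in this section.

```python
import math

EULER_GAMMA = 0.5772156649015329

def c_factor(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def anomaly_score(mean_path_length, n):
    """iForest anomaly score s(x, n) = 2 ** (-E[h(x)] / c(n))."""
    return 2.0 ** (-mean_path_length / c_factor(n))

n = 256  # subsample size (a common default, assumed here)
for h in (2.0, c_factor(n), 20.0):
    print(f"E[h] = {h:5.2f} -> score = {anomaly_score(h, n):.3f}")
```

A sample whose average path length equals c(n) scores exactly 0.5, which is why 0.5 acts as the natural reference point between "easy to isolate" and "typical".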
Fig. 6. An illustration of the proposed feature selection technique, which fuses Pearson dropping (PCC) and the introduced Minkowski-based equilibrium optimizer (MEO) in both a serial and a parallel manner, once in the original feature domain and once in the proposed sparse domain. Decisions are combined by OR operations (union) and intersected by AND operations.
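The fusion step in Fig. 6 combines feature-selection decisions with OR (union) and AND (intersection) operations. The minimal sketch below treats each selector's output as a set of feature abbreviations. The selections shown are hypothetical. The exact serial/parallel wiring of Fig. 6, i.e., which decisions are ORed and which are ANDed, is not reproduced. `fuse_or` and `fuse_and` are illustrative names, not the paper's.

```python
def fuse_or(*selections):
    """OR-fusion: keep a feature if any selector kept it (set union)."""
    return set().union(*selections)

def fuse_and(*selections):
    """AND-fusion: keep a feature only if every selector kept it (intersection)."""
    first, *rest = selections
    return set(first).intersection(*rest)

# Hypothetical outputs of PCC dropping and MEO selection (feature abbreviations).
pcc_selected = {"CA", "CK", "GLU", "WBC", "PLT"}
meo_selected = {"CK", "GLU", "LDH", "PLT"}

print(sorted(fuse_or(pcc_selected, meo_selected)))
print(sorted(fuse_and(pcc_selected, meo_selected)))
```

OR-fusion retains any test that either selector considers informative, at the cost of a larger panel. AND-fusion yields a smaller, more conservative subset of blood tests.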
Fig. 7. Pairwise Pearson correlation of features: (a) and (c) for the original feature pool; (b) and (d) for the sparsified feature pool. The first row is for the COVID-specific dataset and the second for the CBC dataset.
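Fig. 7 shows pairwise Pearson correlations, and Fig. 6 refers to Pearson dropping (PCC). One common way to implement such dropping is a greedy filter. It walks through the features in order and keeps a feature only if its absolute Pearson correlation with every already-kept feature is below a threshold. The sketch below follows that approach under assumptions: the 0.9 threshold, the greedy ordering, and the toy values are illustrative, not the paper's settings.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined for a constant column

def pearson_drop(columns, threshold=0.9):
    """Greedy filter: keep a feature only if |r| < threshold against all kept ones."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical toy values: HCT is a scaled copy of HGB, so it is redundant.
columns = {
    "HGB": [12.0, 13.0, 14.0, 15.0],
    "HCT": [36.0, 39.0, 42.0, 45.0],
    "WBC": [8.0, 5.0, 9.0, 4.0],
}
print(pearson_drop(columns))
```

The result depends on column order: whichever of two highly correlated features comes first is the one retained.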
Fig. 8. 2D illustration of how the equilibrium candidates collaborate in updating the particles’ concentrations.
Fig. 9. Flow chart of the proposed MEO algorithm.
Fig. 10. Average fitness over iterations for the traditional EO (first row) and the proposed MEO (second row) on the COVID-specific dataset. The first column shows the results for the original feature pool and the second for the sparsified feature pool.
Fig. 11. An example of a 1DCNN model for a binary classification problem. In this example, the network consists of two convolutional layers (Conv_1 with 32 filters and Conv_2 with 64 filters), a max-pooling layer, a flattening layer, and finally fully connected layers ending in a softmax layer.
Fig. 12. Summary of the proposed 1DCNN for COVID-19 prediction with 9 selected features.
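To make the 1DCNN of Figs. 11 and 12 concrete, the sketch below traces sequence lengths and parameter counts for an input of 9 selected features. It follows two convolutional layers (32 and 64 filters), a max-pooling layer, and a flattening layer. The sketch assumes that each sample is a length-9 sequence with one channel, a kernel size of 3, stride 1, no padding, and a pool size of 2. These hyperparameters are assumptions; the exact values in Fig. 12 are not recoverable here.

```python
def conv1d_out(length, kernel, stride=1, padding=0):
    """Output length of a 1D convolution."""
    return (length + 2 * padding - kernel) // stride + 1

def pool1d_out(length, pool):
    """Output length of non-overlapping max pooling (stride = pool size, no padding)."""
    return (length - pool) // pool + 1

def conv1d_params(kernel, in_channels, filters):
    """Trainable weights plus one bias per filter."""
    return (kernel * in_channels + 1) * filters

n_features, kernel, pool = 9, 3, 2   # assumed hyperparameters
l1 = conv1d_out(n_features, kernel)  # Conv_1, 32 filters
l2 = conv1d_out(l1, kernel)          # Conv_2, 64 filters
l3 = pool1d_out(l2, pool)            # max pooling
flat = l3 * 64                       # flattening: length x channels
p1 = conv1d_params(kernel, 1, 32)
p2 = conv1d_params(kernel, 32, 64)
print(f"lengths: {n_features} -> {l1} -> {l2} -> {l3}; flattened = {flat}")
print(f"conv params: Conv_1 = {p1}, Conv_2 = {p2}")
```

With only 9 inputs, unpadded convolutions shrink the sequence quickly. "Same" padding (padding=1 for a kernel size of 3) would preserve the length of 9 through both convolutions.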
Fig. 13. The employed evaluation metrics.
Validation results showing the effect of the data preparation steps, i.e., SMOTE for data balancing and iForest for outlier detection, on the proposed COVID-19 diagnosis algorithm for both employed datasets.
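Fig. 13 lists the evaluation metrics. The sketch below computes them assuming the standard definitions from a binary confusion matrix (TP, FP, FN, TN), with COVID-19-positive as the positive class. It also computes macro- and micro-averaged PPV, SV, and F1 by treating each class in turn as positive. Micro-averaging pools the per-class counts. In a single-label problem, the pooled false positives then equal the pooled false negatives, so micro PPV, SV, and F1 all collapse to accuracy. This is consistent with the Micro- columns in the comparison tables matching ACC. The counts used are hypothetical.

```python
def binary_metrics(tp, fp, fn, tn):
    acc = (tp + tn) / (tp + fp + fn + tn)
    ppv = tp / (tp + fp)          # precision (positive predictive value)
    sv = tp / (tp + fn)           # sensitivity (recall)
    sp = tn / (tn + fp)           # specificity
    f1 = 2 * ppv * sv / (ppv + sv)
    return {"ACC": acc, "PPV": ppv, "SV": sv, "SP": sp, "F1": f1}

def macro_micro(tp, fp, fn, tn):
    # Per-class (tp, fp, fn): for the negative class, TN plays the role of TP,
    # FN becomes its false positive, and FP becomes its false negative.
    classes = [(tp, fp, fn), (tn, fn, fp)]
    ppvs = [t / (t + f_p) for t, f_p, _ in classes]
    svs = [t / (t + f_n) for t, _, f_n in classes]
    f1s = [2 * p * s / (p + s) for p, s in zip(ppvs, svs)]
    macro = {"PPV": sum(ppvs) / 2, "SV": sum(svs) / 2, "F1": sum(f1s) / 2}
    t_all = sum(c[0] for c in classes)
    fp_all = sum(c[1] for c in classes)
    fn_all = sum(c[2] for c in classes)
    micro_ppv = t_all / (t_all + fp_all)
    micro_sv = t_all / (t_all + fn_all)
    micro = {"PPV": micro_ppv, "SV": micro_sv,
             "F1": 2 * micro_ppv * micro_sv / (micro_ppv + micro_sv)}
    return macro, micro

m = binary_metrics(tp=40, fp=10, fn=5, tn=45)
macro, micro = macro_micro(tp=40, fp=10, fn=5, tn=45)
print(m)
print(macro, micro)
```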
| Dataset | Case | Case name | ACC | PPV | SV | F1 | AUC | SP | Features |
|---|---|---|---|---|---|---|---|---|---|
| COVID-specific dataset | 1 | Imbalanced w/ outliers | 0.83 | 0.834 | 0.816 | 0.824 | 0.890 | 0.773 | 19/33 |
| | 2 | Balanced w/ outliers | 0.866 | 0.876 | 0.918 | 0.896 | 0.929 | 0.921 | 18/33 |
| | 3 | Imbalanced w/o outliers | 0.894 | 0.882 | 0.902 | 0.891 | 0.956 | 0.901 | 19/33 |
| | 4 | Balanced w/o outliers | | | | | | | 13/33 |
| CBC dataset | 1 | Imbalanced w/ outliers | 0.771 | 0.783 | 0.9 | 0.833 | 0.824 | 0.806 | 7/13 |
| | 2 | Balanced w/ outliers | 0.923 | 0.921 | 0.956 | 0.938 | 0.976 | 0.938 | 6/13 |
| | 3 | Imbalanced w/o outliers | 0.906 | 0.902 | 0.956 | 0.92 | 0.95 | 0.931 | 9/13 |
| | 4 | Balanced w/o outliers | | | | | | | 6/13 |
Features: the number of selected features (x) out of the total size of the original feature pool (y), written as x/y.
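The "Balanced" cases in the validation table above use SMOTE to oversample the minority class. SMOTE creates synthetic samples on the line segment between a minority sample and one of its k nearest minority-class neighbours: x_new = x + u·(x_nn − x), with u drawn uniformly from [0, 1). The sketch below implements only this interpolation with Euclidean distance. The value k = 5, the seed, and the toy data are assumptions, and the paper's actual SMOTE configuration is not stated in this section. In practice, a library implementation such as imbalanced-learn's `SMOTE` would normally be used instead.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x (Euclidean), excluding x itself
        others = [minority[j] for j in range(len(minority)) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([xi + u * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic

# Hypothetical minority-class samples (two features each).
minority = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
for point in smote(minority, n_new=3, k=2):
    print([round(v, 3) for v in point])
```

Because every synthetic point lies between two real minority samples, SMOTE can also interpolate between outliers. This is one reason to combine it with outlier removal (iForest), as in case 4 of the table.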
Fig. 14. Confusion matrices from testing the proposed COVID-19 prediction algorithm under the four cases in Table 4, showing the effect of SMOTE and iForest on performance.
Validation results of applying all features, and PCC- and MEO-based feature selection separately, in different cases for the COVID-specific dataset. The best performance is marked in bold. (--) is the number of selected features.
| Feature selection | Original samples: ACC | PPV | SV | F1 | AUC | SP | Sparse samples: ACC | PPV | SV | F1 | AUC | SP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| All features (33) | 0.94 | 0.943 | 0.96 | 0.952 | 0.986 | 0.961 | 0.942 | 0.944 | 0.96 | 0.954 | | |
| PCC-based feature selection on the original features | 0.939 | 0.94 | 0.96 | 0.95 | 0.98 | 0.958 | 0.943 | 0.947 | 0.954 | 0.986 | 0.958 | |
| PCC-based feature selection on the sparse features | 0.932 | 0.935 | 0.958 | 0.946 | 0.977 | 0.958 | 0.939 | 0.942 | 0.96 | 0.95 | 0.983 | 0.955 |
| MEO-based selection on the original features | 0.925 | 0.93 | 0.95 | 0.94 | 0.98 | 0.948 | 0.932 | 0.94 | 0.952 | 0.945 | 0.98 | 0.961 |
| MEO-based selection on the sparse features | 0.934 | 0.94 | 0.96 | 0.947 | 0.983 | 0.958 | 0.96 | 0.984 | 0.958 | | | |
Validation results of applying all features, and PCC- and MEO-based feature selection separately, in different cases for the CBC dataset. The best performance is marked in bold. (--) is the number of selected features.
| Feature selection | Original samples: ACC | PPV | SV | F1 | AUC | SP | Sparse samples: ACC | PPV | SV | F1 | AUC | SP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| All features (13) | 0.989 | 0.985 | 0.997 | 0.991 | 0.999 | 1 | | | | | | |
| PCC-based feature selection on the original features | 0.987 | 0.982 | 0.997 | 0.99 | 0.999 | 1 | 0.986 | 0.983 | 0.995 | 0.989 | 0.999 | 1 |
| PCC-based feature selection on the sparse features | 0.989 | 0.987 | 0.995 | 0.991 | 0.999 | 1 | 0.989 | 0.987 | 0.998 | 0.991 | 0.999 | 1 |
| MEO-based selection on the original features | 0.99 | 1 | 0.992 | 0.999 | 1 | 0.986 | 0.984 | 0.994 | 0.989 | 0.999 | 1 | |
| MEO-based selection on the sparse features | 0.961 | 0.963 | 0.974 | 0.968 | 0.989 | 0.964 | 0.969 | 0.968 | 0.983 | 0.974 | 0.996 | 0.988 |
Fig. 15. AdaBoost feature importance using all features for the COVID-specific dataset (first row) and the CBC dataset (second row): (a) and (c) in the original feature domain; (b) and (d) in the sparse domain.
Fig. 16. AdaBoost feature importance for the COVID-specific dataset under the following settings: 1. PCC-based feature selection in the original feature domain (22 features selected), with training and testing on the original samples in (a) and on the sparse samples in (b). 2. PCC-based feature selection in the sparse domain (14 features selected), with training and testing on the original samples in (c) and on the sparse samples in (d). 3. MEO-based feature selection in the original feature domain (12 features selected), with training and testing on the original samples in (e) and on the sparse samples in (f). 4. MEO-based feature selection in the sparse domain (12 features selected), with training and testing on the original samples in (g) and on the sparse samples in (h).
Fig. 17. Classification reports from testing the proposed COVID-19 diagnosis model, based on the proposed fused selection method and the 1DCNN, in the original domain (a), (c) and the sparse domain (b), (d). The first two rows belong to the COVID-specific dataset and the other rows to the CBC dataset.
Quantitative comparison between several traditional ML techniques and the proposed 1DCNN model, with training and testing performed once in the original feature domain and once in the sparse domain, for the COVID-specific dataset (13 of 33 features selected). The top performer is in bold and the second-best is underlined.
| Training and testing domain | Classifier | ACC | PPV | SV | F1 | AUC | SP | Macro-PPV | Macro-SV | Macro-F1 | Micro-PPV | Micro-SV | Micro-F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Original features domain | LSVM | 0.84 | 0.858 | 0.891 | 0.874 | 0.902 | 0.907 | 0.836 | 0.824 | 0.828 | 0.84 | 0.84 | 0.84 |
| | RSVM | 0.897 | 0.909 | 0.927 | 0.918 | 0.939 | 0.931 | 0.893 | 0.887 | 0.89 | 0.897 | 0.897 | 0.897 |
| | LR | 0.836 | 0.893 | 0.837 | 0.863 | 0.903 | 0.867 | 0.827 | 0.835 | 0.829 | 0.836 | 0.836 | 0.836 |
| | RF | 0.928 | 0.943 | 0.942 | 0.963 | 0.924 | 0.923 | 0.923 | 0.928 | 0.928 | 0.928 | | |
| | AdaBoost | 0.932 | 0.934 | 0.958 | 0.924 | | | | | | | | |
| | DT | 0.883 | 0.902 | 0.911 | 0.906 | 0.874 | 0.941 | 0.877 | 0.874 | 0.875 | 0.883 | 0.883 | 0.883 |
| | KNN | 0.869 | 0.868 | 0.932 | 0.899 | 0.936 | 0.949 | 0.872 | 0.85 | 0.857 | 0.869 | 0.869 | 0.869 |
| | XGBoost | 0.899 | 0.914 | 0.926 | 0.919 | 0.956 | 0.947 | 0.896 | 0.891 | 0.892 | 0.899 | 0.899 | 0.899 |
| | GNB | 0.787 | 0.794 | 0.888 | 0.838 | 0.858 | 0.925 | 0.783 | 0.756 | 0.763 | 0.787 | 0.787 | 0.787 |
| | ET | 0.934 | 0.945 | 0.93 | 0.926 | 0.931 | 0.931 | 0.931 | | | | | |
| | LDA | 0.824 | 0.835 | 0.894 | 0.863 | 0.895 | 0.92 | 0.821 | 0.802 | 0.808 | 0.824 | 0.824 | 0.824 |
| | QDA | 0.783 | 0.789 | 0.89 | 0.836 | 0.869 | 0.917 | 0.78 | 0.749 | 0.757 | 0.783 | 0.783 | 0.783 |
| | OURS | 0.971 | | | | | | | | | | | |
| Sparse domain | LSVM | 0.844 | 0.868 | 0.884 | 0.875 | 0.904 | 0.909 | 0.838 | 0.832 | 0.833 | 0.844 | 0.844 | 0.844 |
| | RSVM | 0.894 | 0.908 | 0.924 | 0.916 | 0.939 | 0.939 | 0.89 | 0.885 | 0.887 | 0.894 | 0.894 | 0.894 |
| | LR | 0.836 | 0.891 | 0.841 | 0.864 | 0.905 | 0.875 | 0.828 | 0.835 | 0.829 | 0.836 | 0.836 | 0.836 |
| | RF | 0.933 | 0.939 | 0.954 | 0.946 | 0.982 | 0.981 | 0.931 | 0.926 | 0.928 | 0.933 | 0.933 | 0.933 |
| | AdaBoost | | | | | | | | | | | | |
| | DT | 0.889 | 0.9 | 0.923 | 0.911 | 0.878 | 0.944 | 0.885 | 0.878 | 0.881 | 0.889 | 0.889 | 0.889 |
| | KNN | 0.892 | 0.886 | 0.948 | 0.916 | 0.947 | 0.965 | 0.897 | 0.874 | 0.882 | 0.892 | 0.892 | 0.892 |
| | XGBoost | 0.896 | 0.905 | 0.93 | 0.917 | 0.951 | 0.952 | 0.893 | 0.885 | 0.888 | 0.896 | 0.896 | 0.896 |
| | GNB | 0.822 | 0.862 | 0.849 | 0.855 | 0.869 | 0.885 | 0.811 | 0.813 | 0.811 | 0.822 | 0.822 | 0.822 |
| | ET | 0.963 | | | | | | | | | | | |
| | LDA | 0.824 | 0.837 | 0.893 | 0.863 | 0.896 | 0.912 | 0.82 | 0.803 | 0.808 | 0.824 | 0.824 | 0.824 |
| | QDA | 0.79 | 0.803 | 0.88 | 0.839 | 0.865 | 0.912 | 0.785 | 0.762 | 0.768 | 0.79 | 0.79 | 0.79 |
| | OURS | 0.984 | | | | | | | | | | | |
Quantitative comparison between several traditional ML techniques and the proposed 1DCNN model, with training and testing performed once in the original feature domain and once in the sparse domain, for the CBC dataset (6 of 13 features selected). The top performer is in bold and the second-best is underlined.
| Training and testing domain | Classifier | ACC | PPV | SV | F1 | AUC | SP | Macro-PPV | Macro-SV | Macro-F1 | Micro-PPV | Micro-SV | Micro-F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Original features’ domain | LSVM | 0.828 | 0.848 | 0.873 | 0.860 | 0.880 | 0.845 | 0.822 | 0.815 | 0.818 | 0.828 | 0.828 | 0.828 |
| | RSVM | 0.893 | 0.897 | 0.930 | 0.913 | 0.937 | 0.908 | 0.892 | 0.882 | 0.886 | 0.893 | 0.893 | 0.893 |
| | LR | 0.810 | 0.844 | 0.844 | 0.843 | 0.881 | 0.828 | 0.802 | 0.801 | 0.801 | 0.810 | 0.810 | 0.810 |
| | RF | 0.956 | 0.954 | 0.974 | 0.964 | 0.957 | 0.951 | 0.953 | 0.956 | 0.956 | 0.956 | | |
| | AdaBoost | 0.976 | 0.980 | 0.983 | 0.976 | 0.976 | 0.976 | 0.976 | | | | | |
| | DT | 0.932 | 0.940 | 0.950 | 0.944 | 0.928 | 0.971 | 0.932 | 0.928 | 0.929 | 0.932 | 0.932 | 0.932 |
| | KNN | 0.955 | 0.958 | 0.968 | 0.963 | 0.988 | 0.960 | 0.955 | 0.951 | 0.952 | 0.955 | 0.955 | 0.955 |
| | XGBoost | 0.946 | 0.940 | 0.974 | 0.957 | 0.981 | 0.977 | 0.949 | 0.939 | 0.943 | 0.946 | 0.946 | 0.946 |
| | GNB | 0.694 | 0.854 | 0.597 | 0.701 | 0.827 | 0.557 | 0.715 | 0.720 | 0.693 | 0.694 | 0.694 | 0.694 |
| | ET | 0.975 | 0.983 | | | | | | | | | | |
| | LDA | 0.802 | 0.838 | 0.837 | 0.837 | 0.871 | 0.810 | 0.794 | 0.792 | 0.792 | 0.802 | 0.802 | 0.802 |
| | QDA | 0.760 | 0.889 | 0.692 | 0.777 | 0.877 | 0.644 | 0.768 | 0.779 | 0.758 | 0.760 | 0.760 | 0.760 |
| | OURS | | | | | | | | | | | | |
| Sparse domain | LSVM | 0.816 | 0.841 | 0.860 | 0.850 | 0.871 | 0.787 | 0.809 | 0.804 | 0.805 | 0.816 | 0.816 | 0.816 |
| | RSVM | 0.876 | 0.870 | 0.936 | 0.902 | 0.936 | 0.931 | 0.879 | 0.860 | 0.867 | 0.876 | 0.876 | 0.876 |
| | LR | 0.801 | 0.836 | 0.837 | 0.836 | 0.874 | 0.805 | 0.793 | 0.791 | 0.791 | 0.801 | 0.801 | 0.801 |
| | RF | 0.969 | 0.976 | 0.974 | 0.975 | 0.966 | 0.968 | 0.968 | 0.968 | 0.969 | 0.969 | 0.969 | |
| | AdaBoost | 0.981 | 0.977 | | | | | | | | | | |
| | DT | 0.938 | 0.951 | 0.947 | 0.949 | 0.936 | 0.948 | 0.935 | 0.936 | 0.935 | 0.938 | 0.938 | 0.938 |
| | KNN | 0.960 | 0.968 | 0.966 | 0.967 | 0.988 | 0.966 | 0.959 | 0.959 | 0.958 | 0.960 | 0.960 | 0.960 |
| | XGBoost | 0.942 | 0.950 | 0.954 | 0.952 | 0.981 | 0.948 | 0.940 | 0.938 | 0.939 | 0.942 | 0.942 | 0.942 |
| | GNB | 0.752 | 0.877 | 0.689 | 0.770 | 0.842 | 0.667 | 0.759 | 0.769 | 0.750 | 0.752 | 0.752 | 0.752 |
| | ET | 0.985 | | | | | | | | | | | |
| | LDA | 0.798 | 0.830 | 0.840 | 0.835 | 0.868 | 0.822 | 0.790 | 0.787 | 0.788 | 0.798 | 0.798 | 0.798 |
| | QDA | 0.777 | 0.864 | 0.751 | 0.803 | 0.872 | 0.695 | 0.773 | 0.784 | 0.772 | 0.777 | 0.777 | 0.777 |
| | OURS | | | | | | | | | | | | |
Fig. 18. Training and validation accuracy of the proposed COVID-19 prediction algorithm. The first row is for the COVID-specific dataset and the second for the CBC dataset. Training in (a) and (c) is performed in the original feature domain, and in (b) and (d) in the sparse domain. Training uses the features chosen by the proposed fusion-based feature selection mechanism, which yields 13 features for the COVID-specific dataset and 6 features for the CBC dataset.
Fig. 19. Classification reports from testing the following studies: (Alakus & Turkoglu, 2020) {18/33–13/13} in (a), (AlJame et al., 2020) {18/33–13/13} in (b), (Cabitza et al., 2021) {33/33–13/13} in (c), (Brinati et al., 2020) {33/33–13/13} in (d), (Shaban et al., 2021) {33/33–13/13} in (e), and ours {13/33–6/13} in (f). {} denotes {selected features/total number of features for the COVID-specific dataset – selected features/total number of features for the CBC dataset}. Panels (?.1) are for the COVID-specific dataset and (?.2) for the CBC dataset.
| Step | Operation | Comment |
|---|---|---|
| 1 | Initialize the solutions’/particles’ population randomly | Eq. … |
| 2 | Assign a small number to the equilibrium candidates’ objective/fitness function | |
| 3 | Select the equilibrium candidates | |
| 4 | Update the states of the candidate solutions using the search equation (Eq. …) | |
| 5 | Assign values to the following free parameters: … | |
| 6 | … | the iteration no. |
| 8 | Calculate the fitness function of the … | follow Eq. … |
| 10 | Replace … | |
| 12 | Replace … | |
| 14 | Replace … | |
| 16 | Replace … | |
| 20 | Construct the equilibrium pool | |
| 21 | Accomplish memory saving if … | |
| 22 | Assign … | |
| 24 | Choose one candidate randomly from the equilibrium pool | |
| 25 | Generate random vectors of … | |
| 26 | Construct … | |
| 27 | Update the concentration | |
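The concentration update at the end of the algorithm above belongs to the equilibrium optimizer family. For reference, the sketch below implements the update rule of the original (traditional) EO (Faramarzi et al., 2020) with its usual defaults a1 = 2, a2 = 1, GP = 0.5, and V = 1. The rule uses the following quantities:

- t = (1 − it/T)^(a2·it/T)
- F = a1·sign(r − 0.5)·(e^(−λt) − 1)
- G0 = GCP·(C_eq − λC), with G = G0·F
- C ← C_eq + (C − C_eq)·F + G/(λV)·(1 − F)

This is an illustration of plain EO under those assumptions. It is not the paper's Minkowski-based modification (MEO), whose candidate-selection and search equations are not recoverable from this section. The helper names are hypothetical.

```python
import math
import random

def equilibrium_pool(best4):
    """Four best candidates plus their arithmetic mean (plain-EO equilibrium pool)."""
    dim = len(best4[0])
    c_ave = [sum(c[d] for c in best4) / len(best4) for d in range(dim)]
    return [list(c) for c in best4] + [c_ave]

def eo_time(it, max_it, a2=1.0):
    """Time term t = (1 - it/max_it) ** (a2 * it / max_it)."""
    return (1 - it / max_it) ** (a2 * it / max_it)

def eo_update(c, pool, it, max_it, rng, a1=2.0, a2=1.0, gp=0.5, v=1.0):
    """One plain-EO concentration update for a single particle c."""
    t = eo_time(it, max_it, a2)
    c_eq = rng.choice(pool)                 # random equilibrium candidate
    r1, r2 = rng.random(), rng.random()
    gcp = 0.5 * r1 if r2 >= gp else 0.0     # generation control parameter
    new_c = []
    for ci, ceq in zip(c, c_eq):
        lam = 1.0 - rng.random()            # turnover rate in (0, 1], avoids /0
        r = rng.random()
        f = a1 * math.copysign(1.0, r - 0.5) * (math.exp(-lam * t) - 1)
        g = gcp * (ceq - lam * ci) * f      # G = G0 * F
        new_c.append(ceq + (ci - ceq) * f + g / (lam * v) * (1 - f))
    return new_c

rng = random.Random(42)
best4 = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.0, 0.4]]  # hypothetical candidates
pool = equilibrium_pool(best4)
print(eo_update([0.9, 0.8], pool, it=10, max_it=100, rng=rng))
```

As t decays from 1 toward 0 over the iterations, F shrinks toward 0, so particles move from broad exploration around the pool toward exploitation near the chosen equilibrium candidate.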