
Equilibrium-based COVID-19 diagnosis from routine blood tests: A sparse deep convolutional model.

Abstract

SARS-CoV-2, the virus that causes COVID-19, has driven a pandemic that has severely impacted human society, with a massive death toll worldwide. Hence, there is a persistent need for fast and reliable automatic tools to help health teams make clinical decisions. Predictive models could ease the strain on healthcare systems through early and reliable screening of COVID-19 patients, which helps to combat the spread of the disease. Recent studies have reported key advantages of employing routine blood tests for the initial screening of COVID-19 patients. Thus, in this paper, we propose a novel COVID-19 prediction model based on routine blood tests. The model exploits the real dependency among the features in the employed feature pool through a sparsification procedure. In this sparse domain, a hybrid feature selection mechanism is proposed. This mechanism fuses the features selected from two perspectives: the first is Pearson correlation and the second is a new Minkowski-based equilibrium optimizer (MEO). The selected features are then fed into a new 1D convolutional neural network (1DCNN) for the final diagnosis decision. The proposed prediction model is tested on a new public dataset from San Raffaele Hospital, Milan, Italy, i.e., the OSR dataset, which has two sub-datasets. According to the experimental results, the proposed model outperforms the state-of-the-art techniques with an average testing accuracy of 98.5 % while employing less than half of the feature pool, i.e., fewer than half of the blood tests in the employed dataset are needed to reach a final diagnosis decision.

Entities: Chemical

Keywords: 1DCNN; Blood tests; COVID-19; Equilibrium optimization; Feature pool sparsification; Feature selection; Pearson correlation

Year: 2022 PMID: 36210961 PMCID: PMC9527205 DOI: 10.1016/j.eswa.2022.118935

Source DB: PubMed Journal: Expert Syst Appl ISSN: 0957-4174 Impact factor: 8.665

Introduction

The COVID-19 pandemic is a contemporary source of worry across the world. It is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which spreads readily and has caused a massive death toll. COVID-19 infection has caused clusters of fatal pneumonia with a clinical presentation greatly resembling that of SARS-CoV. Patients experience flu-like symptoms, such as fever, dry cough, tiredness, and difficulty breathing. In more severe cases, however, pneumonia and renal failure can develop and lead to death (Huang et al., 2020). As of June 2022, more than 500 million confirmed cases have been reported in 222 countries, with more than 6 million deaths due to this pandemic (Worldometer, 2020). Fig. 1 shows the monthly total deaths worldwide on a logarithmic scale. Hence, timely detection and diagnosis of the virus plays a leading role in infection control and, accordingly, in reducing the death rate. Therefore, developing efficient testing methods to identify COVID-19 infection is a must, in order to start treatment early and to isolate infected individuals from the rest.

Fig. 1

A logarithmic scale for COVID-19 Monthly total deaths (Worldometer, 2020).

Polymerase chain reaction (PCR) (Zimmermann and Mannhalter, 1996, Corman et al., 2020) and antibody (serological) testing are the two main testing methods adopted by global healthcare systems for COVID-19 diagnosis; however, both methods have limitations. Despite being the current gold standard for infection diagnosis, PCR is limited in terms of resources and specimen collection (Ai et al., 2020), besides its high cost. In addition, PCR generally has high specificity but low sensitivity, with a false-negative rate of about 20 % (Ferrari et al., 2020, Li et al., 2020). Thus, a negative PCR test does not rule out COVID-19, and patients with false-negative results will not receive appropriate treatment on time. Moreover, there is a global shortage of PCR test kits. On the other hand, tests based on IgM/IgG antibodies have shown a very low sensitivity (18.8 %) and a specificity of 77.8 % in diagnosing COVID-19 during its early phase (Burog et al., 2020, Sethuraman et al., 2020). Accordingly, imaging-based diagnosis methods, such as chest radiographs (CXRs)/X-rays, computerized tomography (CT) scans, MRI, and ultrasound, besides other laboratory methods, such as routine blood tests, can be employed to determine the severity of the illness caused by COVID-19. The COVID-19 pandemic continues to challenge the world with increasing demand for hospital beds and medical equipment, especially given the continually emerging variants of the virus and exhausted healthcare workers. This has prompted researchers to investigate alternative automated methods that are accurate, fast, less expensive, more accessible, and require minimal human interference.
Over the years, the field of machine learning (ML) has gained much popularity for solving numerous real-world problems by producing systems that are capable of learning from examples and improving without being explicitly programmed (Brink et al., 2016). Hence, ML-based approaches have been used in screening patients suspected of being infected by SARS-CoV-2, supporting medical decisions (Alballa & Al-Turaiki, 2021). Lately, several outbreak prediction models for COVID-19 have been developed to support informed decisions and enforce relevant control measures (Albahri et al., 2020, Bullock et al., 2020, Latif et al., 2020, Alafif et al., 2021). However, due to a high level of uncertainty and a lack of essential data, diagnosing COVID-19 with machine learning and soft computing models is still a challenging research area. In this work, we introduce a new COVID-19 detection model based on routine blood tests, see Fig. 2. The main contributions can be summarized as follows:
(1) Seeking an optimum dimensionality reduction while exploiting the real dependency among the features in the adopted feature pool, a sparsification procedure is adopted, so that the introduced feature selection techniques can perform better in the discovered sparse domain. This sparsification procedure is performed by a sparse and low-rank decomposition process. The resultant sparse component of the feature pool is expected to provide features with few pairwise interactions.
(2) For more effective feature selection, the adopted selection mechanism fuses the selection decisions from a statistical perspective, i.e., Pearson correlation, and from a wrapper perspective, i.e., the Equilibrium Optimizer (EO) (Faramarzi et al., 2020).
(3) Instead of applying the traditional EO in the adopted feature selection procedure, the introduced diagnosis algorithm adopts a new Minkowski-based equilibrium optimizer (MEO), which employs a Minkowski-based scheme for local minimum avoidance, besides a recycling strategy for the worst solutions, in order to find the most suitable features, i.e., blood tests.
(4) For the classification phase, the proposed COVID-19 diagnosis algorithm adopts a 1DCNN model, which shows superior performance compared to multiple traditional ML algorithms.
(5) The introduced COVID-19 diagnosis algorithm outperforms the state-of-the-art prediction models based on routine blood tests on all metrics, while employing less than half of the feature pool, which means fewer blood tests and lower cost, which suits the conditions in developing countries.
The rest of the paper is organized as follows: Section 2 reviews related work. Section 3 gives details about the employed routine blood test dataset. Section 4 introduces the proposed methodology and details it in several subsections. Section 5 presents the experimental results with discussion. Section 6 concludes the paper.

Related work

Machine learning (ML) is a key branch of computational algorithms designed to imitate human intelligence by learning automatically from the surrounding environment. Hence, the machine makes decisions and predictions/forecasts based on data. ML is one of today’s most rapidly growing technical topics, lying at the intersection of computer science and statistics, and at the core of artificial intelligence and data science. ML is considered the workhorse of the new era of so-called big data. Different machine learning techniques have been applied successfully in diverse fields, ranging from wireless communications (Tan et al., 2014), computer vision (Khan et al., 2021, Altantawy et al., 2020), finance (Kumbure et al., 2022), entertainment (Porcino et al., 2022), control systems (Hedrea and Petriu, 2021), and computational biology to biomedical and medical applications (Chiang et al., 2014, Albu et al., 2019, Upadhyay and Nagpal, 2020). ML can be used to combat the COVID-19 pandemic by improving diagnosis, prevention, monitoring, administration of treatments, disease surveillance, and antiviral drug discovery to enhance patients’ health outcomes (Bullock et al., 2020, Latif et al., 2020, Alafif et al., 2021). Since the beginning of the COVID-19 outbreak, there has been growing interest in diagnosing COVID-19 with different ML techniques, either through the analysis of medical images (Albahri et al., 2020) or routine blood tests (Cabitza et al., 2021). These alternative diagnostic methods are less expensive and more accessible. In this section, we review some of these ML-based studies. In ML terms, the diagnosis of COVID-19 can be formulated as a binary classification problem: with the trained model, patients are classified as COVID-19 positive or negative, or, in some studies, assessed for the severity of illness (Albahri et al., 2020).
Medical imaging datasets, namely computed tomography (CT) scans and chest X-ray images, are the two main types of datasets that have been employed by different ML techniques, and they have demonstrated promising results in supporting the traditional diagnostic techniques for COVID-19, such as molecular biology (RT-PCR) and immune (IgM/IgG) assays. There have been several recent reviews focusing exclusively on X-rays or CT scans (Albahri et al., 2020, Latif et al., 2020, Alafif et al., 2021). Several studies observed that the sensitivity of CT in diagnosing COVID-19 is significantly higher than that of RT-PCR (Ai et al., 2020, Fang et al., 2020, Ye et al., 2020). However, CT scans have screening limitations because of the radiation doses, the relatively low number of devices available, and the related high costs. In addition, when X-rays or CT scans alone are employed, COVID-19 can be mistakenly diagnosed as pneumonia or lung cancer (Ibrahim et al., 2021, Mohammad-Rahimi et al., 2021). Recently, in (Dong et al., 2020), ultrasound imaging was employed as a radiation-free and non-invasive tool for COVID-19 detection, especially for children and pregnant women. Other research groups have explored speech and sound analysis for ML-based COVID-19 detection (Imran et al., 2020, Schuller et al., 2021). In (Zoabi et al., 2021), the authors proposed an ML-based prediction of COVID-19 diagnosis from symptoms.
Recently, different studies have revealed that a routine blood test can play an important role in the initial screening of COVID-19 (Bao et al., 2020, Gao et al., 2020, Ferrari et al., 2020). Hence, via different ML techniques, a routine blood test can provide a faster and cheaper diagnostic alternative to the PCR test with comparable performance (Brinati et al., 2020, Cabitza et al., 2021). The authors of (Wu, J. et al., 2020) pioneered the use of blood test results in COVID-19 detection. They utilized a three-stage ML algorithm based on random forest classification, with several different validation methods to ensure the reliability and reproducibility of their COVID-19 identification algorithm. They achieved a high accuracy of ∼98 %, but the model considered few features, and the dataset is too small for the model to be applicable in real settings. In (Wu et al., 2020, Yan et al., 2020), the authors employed datasets of different sizes from Tongji Hospital, Wuhan, China. Wu et al. (2020) achieved higher accuracy with a smaller dataset and a larger number of selected features. They built their model on the maximum relevance minimum redundancy (mRMR) algorithm and a least absolute shrinkage and selection operator (LASSO) logistic regression model. On the other hand, Yan et al. (2020) employed a larger dataset with a larger number of features. However, they selected a small group of these features and achieved lower accuracy using a model based on the XGBoost algorithm. In Feng et al. (2021), the authors again employed a small dataset from a single source, i.e., First Medical Center, Beijing, China. However, they developed an innovative predictive model for the early identification of COVID-19 based on candidate features that included clinical symptoms, routine laboratory tests, and other clinical information on admission. They employed LASSO in their prediction model. They also provide their own website for COVID-19 diagnosis (Suspected COVID-19 pneumonia Diagnosis Aid System, 2021). In addition, different studies employed their own datasets from a single source, i.e., a medical centre or hospital, with different ML techniques (Joshi et al., 2020, Kukar et al., 2021, Li et al., 2020, Yang et al., 2020, Langer et al., 2020, Sun et al., 2020).
Hospital Israelita Albert Einstein, São Paulo, Brazil, has provided a common blood test dataset that has been utilized by different studies, such as (Alakus and Turkoglu, 2020, AlJame et al., 2020, Banerjee et al., 2020, de Moraes et al., 2020, Soares, 2020, Alves et al., 2021, de Freitas Barbosa et al., 2021). In (Alakus and Turkoglu, 2020, Banerjee et al., 2020, de Moraes et al., 2020, Soares, 2020, Alves et al., 2021), the authors employed a small version of the original dataset. Applying multiple ML algorithms, they achieved medium accuracy, albeit with a very small number of selected features. de Freitas Barbosa et al. (2021) and AlJame et al. (2020) employed the original full dataset. Both achieved high accuracy via multiple ML algorithms employing a large number of selected features. AlJame et al. (2020) reported better performance with a two-level ensemble: the first level combines the predictions of three well-known classifiers, Extremely Randomized Trees, Random Forest, and Logistic Regression, and an extreme gradient boosting (XGBoost) model is then used as a second-level classifier to achieve better performance. San Raffaele Hospital, Milan, Italy, has provided the most recent dataset (the OSR dataset), which has been adopted by different studies under different sizes, with different feature mechanisms and a variety of ML algorithms (Brinati et al., 2020, Cabitza et al., 2021, Shaban et al., 2021). Table 1 summarizes a comparison between the state-of-the-art techniques, while Table 2 lists the abbreviations employed in this article.

Table 1

Comparison of different COVID-19 detection studies based on routine blood tests.

Authors/ref.	Dataset source	N_D/N_+; N_F/N_SF (*1)	Adopted methodology	Accuracy	Sensitivity	Specificity	ROC-AUC
Soares, 2020	Hospital Israelita Albert Einstein, São Paulo, Brazil	599/81; 108/16	SMOTEBoost, Ensemble of 10 SVM models	–	70.25 %	85.98 %	86.78 %
Banerjee et al., 2020	Hospital Israelita Albert Einstein, São Paulo, Brazil	598/81; 108/14	RF, LR, GLMNET, ANN	81 %–87 %	43 %–65 %	81 %–91 %	80 %–84 %
de Moraes et al., 2020	Hospital Israelita Albert Einstein, São Paulo, Brazil	253/102; 108/15	NN, RF, GBT, LR, SVM	–	67.7 %–80.6 %	80 %–85 %	84.2 %–84.7 %
Alves et al., 2021	Hospital Israelita Albert Einstein, São Paulo, Brazil	524/48; 108/23	DTX, RF, Ensemble of LR, RF, XGBoost, SVM, MLP	88 %	66 %	91 %	86 %
Alakus & Turkoglu, 2020	Hospital Israelita Albert Einstein, São Paulo, Brazil	520/80; 108/18	Ensemble of ANN, CNN, LSTM, RNN, CNNLSTM, CNNRNN	86.66 %	–	–	62.50 %
de Freitas Barbosa et al., 2021	Hospital Israelita Albert Einstein, São Paulo, Brazil	5644/559; 108/24	XMLP, SVM, RT, RF, BN, NB	95.159 %	96.8 %	93.6 %	–
AlJame et al., 2020	Hospital Israelita Albert Einstein, São Paulo, Brazil	5644/559; 108/18	KNNimputer, iForest, SMOTE, Ensemble of RF, LR, and ET	95 %	95 %	95 %	95 %
Wu et al., 2020	Tongji Hospital of Wuhan, China	110; 47/7	LASSO-LR	–	98 %	91 %	0.997
Yan et al., 2020	Tongji Hospital of Wuhan, China	375/201; 300/3	XGBoost	–	83 %	–	–
Cabitza et al., 2021	San Raffaele Hospital, Milan, Italy	1,624/845; 72	RF, NB, LR, SVM, and KNN	83 %–91 %	76 %–92 %	92 %–96 %	83 %–94 %
Brinati et al., 2020	San Raffaele Hospital, Milan, Italy	279/177; 13	DT, ET, KNN, LR, NB, RF, SVM, TWRF	82 %–86 %	92 %–95 %	–	–
Shaban et al., 2021	San Raffaele Hospital, Milan, Italy	279/177; 13	FI, DNN	97.658 %	96.55 %	–	–
Yang et al., 2020	New York Presbyterian Hospital/Weill Cornell Medicine (NYPH/WCM)	1,822/496; 685/33	LR, DT, RF, XGBoost	68.9 %–79.1 %	61.8 %–76.1 %	73.2 %–80.8 %	70.4 %–85.4 %
Joshi et al., 2020	Stanford Health Care, CA, USA	390/334	LR	–	86–93 %	35–55 %	–
Sun et al., 2020	Hospitals in Zhejiang, China	912/36131/10	LR, DT, RF, SVM, DNN	91 %	87 %	95 %	86.4 %
Langer et al., 2020	Hospital in Milan, Italy	199/127; 74/42	ANN, LR, RF, DT	91.4 %	94.1 %	88.7 %	–
Kukar et al., 2021	University Medical Center, Ljubljana, Slovenia	5333/160; 117/35	XGBoost, RF, DNN	–	81.9 %	97.9 %	97 %

*1 N_D is the dataset size, N_+ is the number of COVID-19 positive cases in the employed dataset, N_F is the total number of features in the targeted dataset, and N_SF is the number of selected features in the diagnosis process. “–” means not mentioned in the original study.
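
For readers comparing the Accuracy, Sensitivity, and Specificity columns of Table 1, the following is a minimal sketch of how these quantities are conventionally computed for a binary classifier (1 = COVID-19 positive, 0 = negative). The function name `screening_metrics` and the toy labels are illustrative assumptions and are not taken from any of the reviewed studies.

```python
def screening_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else nan,
        # sensitivity: fraction of truly positive patients that are detected
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        # specificity: fraction of truly negative patients that are cleared
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


# Toy example (hypothetical labels): 4 positive and 4 negative patients.
example = screening_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 0, 1, 0, 0, 1, 0, 1])
```

A low sensitivity with a high specificity, as reported for several rows of Table 1, therefore means that many positive patients are missed even though few negative patients are flagged.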

Table 2

List of abbreviations.

Abbreviation	Explanation	Abbreviation	Explanation
ML	Machine learning	PCR	Polymerase chain reaction
mRMR	maximum relevance minimum redundancy algorithm	SMOTEBoost	an oversampling method based on the SMOTE algorithm (Synthetic Minority Oversampling Technique)
SVM	Support vector machine	RF	Random Forest
LR	Logistic regression	GLMNET	Lasso and Elastic-Net Regularized Generalized Linear Models
ANN/NN	Artificial neural network	DNN	Deep neural network
GBT	Gradient boosting trees	XGBoost	Extreme gradient boosting, an optimized distributed gradient boosting library
MLP	Multi-layer perceptron	CNN	Convolutional neural network
LSTM	Long short-term memory, an artificial recurrent neural network (RNN) architecture	NB	Naïve Bayes
BN	Bayesian network	iForest	Isolation forest
LASSO	least absolute shrinkage and selection operator	KNN	k-nearest neighbors algorithm
TWRF	Trees Weighting Random Forest	FI	Fuzzy inference
DT	Decision Tree	GNB	Gaussian Naïve Bayes
ET	Extremely Randomized Trees	RSVM	Radial Support Vector Machine
LSVM	Linear Support Vector Machine	QDA	Quadratic Discriminant Analysis
LDA	Linear Discriminant Analysis	EO	Equilibrium optimizer
AdaBoost	Adaptive Boosting trees	MEO	Minkowski-based equilibrium optimizer


The employed routine blood test dataset

Here, the dataset employed for COVID-19 prediction consists of routine blood-test results for a group of patients on admission to the emergency department (ED) of the San Raffaele Hospital (Ospedale San Raffaele, OSR) from February 19, 2020, to May 31, 2020. The OSR dataset consists of two sub-datasets (Brinati et al., 2020, Cabitza et al., 2021) with different sample sizes and different blood features: (1) a larger sub-dataset consisting of 1736 samples with 35 features, named the “COVID-specific dataset”, and (2) a smaller one consisting of 279 samples with 15 features, denoted the “CBC dataset”. The feature set of the OSR dataset is detailed in Table 3. These features comprise the numerical ones besides additional ones, namely gender, age, and ID number. We always exclude the ID number before processing. Fig. 3 shows the label distribution, i.e., the swab result, of both sub-datasets. In addition, Fig. 4 shows the distribution of COVID-19 examination results over the age and gender of the samples.

Table 3

The numerical features in the OSR dataset with their mean value (μ), standard deviation (σ), and missing rate (MR).

Feature (Abb.)	Description	COVID-specific dataset				CBC dataset
Feature (Abb.)	Description	Exist.	MR %	μ	σ	Exist.	MR %	μ	σ
Calcium (CA)	A test checks the calcium level in the body that is not stored in the bones	✓	5.35	2.21	0.48	✕
Creatine kinase (CK)	This test measures the amount of an enzyme called creatine kinase (CK) in your blood. CK is a type of protein. The muscle cells in your body need CK to function.	✓	59.44	181.64	405.71	✕
Creatinine (CREA)	A test measures how well your kidneys are performing their job of filtering waste from your blood	✓	4.26	1.16	0.98	✕
Alkaline phosphatase (ALP)	ALP is an enzyme found throughout the body, but it is mostly found in the liver, bones, kidneys, and digestive system. When the liver is damaged, ALP may leak into the bloodstream	✓	27.3	88.54	71.44	✓	53	89.89	89.09
Gamma glutamyl transferase (GGT)	A test measures the level of GGT, an enzyme found mainly in the liver, in the blood	✓	25.11	66.22	135.39	✓	51.25	82.48	132.70
Glucose (GLU)	A test measures the level of glucose (sugar) in a person's blood	✓	5.65	119	57.91	✕
Aspartate aminotransferase (AST)	AST is an enzyme that is normally present in the liver, heart, brain, pancreas, kidneys, and many other muscles and tissues in the body. Enzymes like AST help facilitate fundamental biological processes in these organs and tissues	✓	5.65	45.85	50.67	✓	0.72	54.20	57.61
Alanine aminotransferase (ALT)	A test measures the amount of ALT in the blood. High levels of ALT in the blood can indicate a liver problem, even before you have signs of liver disease, such as jaundice, a condition that causes your skin and eyes to turn yellow. An ALT blood test may be helpful in early detection of liver disease	✓	5.53	39.17	42.55	✓	4.66	44.92	45.50
Lactate dehydrogenase (LDH)	A test looks for signs of damage to the body's tissues. LDH is an enzyme found in almost every cell of your body, including your blood, muscles, brain, kidneys, and pancreas. The enzyme turns sugar into energy	✓	17.45	327.64	211.62	✓	30.47	380.45	193.98
C-reactive protein (CRP)	A test measures the amount of CRP in the blood to detect inflammation due to acute conditions or to monitor the severity of disease in chronic conditions	✓	5.59	67	77.8	✓	2.15	90.88	94.4
Potassium (K)	A test checks how much potassium is in the blood	✓	4.61	4.23	0.52	✕
Sodium (NA)	checks how much sodium is in the blood	✓	4.21	138.59	4.58	✕
UREA	Urea is usually passed out in the urine. A high blood level of urea indicates that the kidneys may not be working properly, or that you have a low body water content (are dehydrated)	✓	38.94	48.96	42.47	✕
White blood cell (WBC)	A test measures the count of White blood cells	✓	3.63	8.72	4.64	✓	0.72	8.55	4.86
Red blood cell (RBC)	A test measures the count of Red blood cells	✓	3.63	4.52	0.73	✕
Hemoglobin (HGB)	a protein in your red blood cells that carries oxygen to your body's organs and tissues and transports carbon dioxide from your organs and tissues back to your lungs	✓	3.63	13.14	2.04	✕
Hematocrit (HCT)	A test measures the proportion of red blood cells in your blood. Red blood cells carry oxygen throughout your body. Having too few or too many red blood cells can be a sign of certain diseases	✓	3.63	39.21	5.61	✕
Mean corpuscular volume (MCV)	There are three main types of corpuscles (blood cells) in your blood: red blood cells, white blood cells, and platelets. An MCV blood test measures the average size of your red blood cells	✓	3.63	87.29	7.06	✕
Mean corpuscular hemoglobin (MCH)	It's the average amount in each of your red blood cells of a protein called hemoglobin, which carries oxygen around your body	✓	3.63	29.21	2.72	✕
Mean corpuscular hemoglobin concentration (MCHC)	A test checks the average amount of hemoglobin in a group of red blood cells	✓	3.63	33.45	1.34	✕
Platelets (PLT)	A normal platelet count ranges from 150,000 to 450,000 platelets per microliter of blood	✓	3.63	235.66	94.22	✓	0.72	226.53	101.17
Neutrophils (NET, NE)	a type of white blood cell that helps heal damaged tissues and resolve infections (10^9/L, %)	(✓,✓)	(20.85, 20.85)	(6.45, 72.35)	(4.47, 13.26)	(✓,✕)	(25.1, –)	(6.2, –)	(4.17, –)
Lymphocytes (LYT, LY)	are a type of white blood cell. They play an important role in your immune system, helping your body fight off infection (10^9/L, %)	(✓,✓)	(20.85, 20.85)	(1.37, 18.58)	(0.95, 11)	(✓,✕)	(25.1, –)	(1.18, –)	(0.81, –)
Monocytes (MOT, MO)	are a measurement of a particular type of white blood cell. Monocytes are helpful at fighting infections and diseases (10^9/L, %)	(✓,✓)	(20.85, 20.85)	(0.62, 7.83)	(0.54, 3.88)	(✓,✕)	(25.1, –)	(0.61, –)	(0.41, –)
Eosinophils (EOT, EO)	are a type of disease-fighting white blood cell. This condition most often indicates a parasitic infection, an allergic reaction or cancer (10^9/L, %)	(✓,✓)	(20.85, 20.85)	(0.07, 0.88)	(0.14, 1.62)	(✓,✕)	(25.1, –)	(0.06, –)	(0.13, –)
Basophils (BAT, BA)	are a type of white blood cell. Like most types of white blood cells, basophils are responsible for fighting fungal or bacterial infections and viruses (10^9/L, %)	(✓,✓)	(20.85, 20.85)	(0.02, 0.34)	(0.04, 0.27)	(✓,✕)	(25.45, –)	(0.01, –)	(0.04, –)

Fig. 3

COVID-19 examination results for COVID-specific dataset in (a) and for CBC dataset in (b).

Fig. 4

COVID-19 swab result distribution according to age and gender for COVID-specific dataset in (a) and for CBC dataset in (b).


The proposed methodology

In this section, the proposed COVID-19 detection algorithm, formulated as a binary classification problem, is detailed in several subsections; see Fig. 2 for an illustration of the proposed algorithm. The first and second subsections describe the dataset preparation and the feature pool sparsification. The third presents the proposed feature selection scheme, and the last presents the deep classification model.

Fig. 2

An illustration of the proposed COVID-19 prediction model.

Dataset preparation

The process of data preparation includes four stages: handling categorical features, handling missing values, outlier detection and elimination, and data balancing.

Handling categorical features: The only categorical features in the OSR datasets are the gender and the COVID-19 exam result. Hence, both are mapped to 0 and 1.

Handling missing values: Firstly, samples with more than 75 % of their features missing are excluded. Secondly, to address data incompleteness, we performed missing data imputation by k-nearest neighbors (KNN), using the mean value of the nearest neighbors. The KNN algorithm matches a data point with its k closest neighbors in a multi-dimensional space.

Outlier detection and elimination: Outlier elimination helps to increase the accuracy of the classification model. Clustering-based approaches (Borlea et al., 2021) can be used for outlier detection (Zhang et al., 2021). However, for detecting anomalies in the adopted OSR dataset, we employed a tree-based approach, i.e., the Isolation Forest algorithm (Liu et al., 2008). Isolation Forests (IF or iForest), like Random Forests, are built from decision trees. iForest uses no pre-defined labels; hence, it is an unsupervised model, like most outlier detection algorithms. iForest is based on the fact that anomalies are “few and different”. In iForest, randomly sub-sampled data is processed in a tree structure based on randomly selected features. Samples that travel deeper into the tree are less likely to be anomalies, as more cuts are required to isolate them. Conversely, samples that end up in shorter branches indicate anomalies, as it was easier for the tree to separate them from the other observations. We chose iForest as the outlier detection method because it employs no distance or density measures to detect anomalies, which eliminates most of the computational cost of distance calculation incurred by distance-based and density-based outlier detection algorithms. In addition, iForest has linear time complexity with a low constant and a low memory requirement; hence, it can handle extremely large datasets. Fig. 5 shows the detected outliers in the COVID-specific dataset, visualized via three PCA components.
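
The "fewer cuts for anomalies" intuition behind iForest can be illustrated with a minimal NumPy sketch. It is not the implementation used in this work: the function names, the sub-sample size of 64, the depth cap of 12, and the synthetic data are all assumptions made for illustration, and the anomaly-score normalization of Liu et al. (2008) is omitted.

```python
import numpy as np


def isolation_depth(x, data, rng, depth=0, max_depth=12):
    """Count the random cuts needed to isolate point x within `data`."""
    if depth >= max_depth or len(data) <= 1:
        return depth
    j = rng.integers(data.shape[1])              # randomly selected feature
    lo, hi = data[:, j].min(), data[:, j].max()
    if lo == hi:                                 # a constant feature cannot be cut
        return depth
    cut = rng.uniform(lo, hi)                    # random split value
    mask = data[:, j] < cut
    side = data[mask] if x[j] < cut else data[~mask]  # follow x down the tree
    return isolation_depth(x, side, rng, depth + 1, max_depth)


def mean_isolation_depth(x, data, n_trees=100, sub_size=64, seed=0):
    """Average isolation depth of x over trees grown on random sub-samples."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    x = np.asarray(x, dtype=float)
    size = min(sub_size, len(data))
    depths = []
    for _ in range(n_trees):
        idx = rng.choice(len(data), size=size, replace=False)
        depths.append(isolation_depth(x, data[idx], rng))
    return float(np.mean(depths))


# Synthetic illustration: a Gaussian cloud plus one distant sample.
demo_rng = np.random.default_rng(1)
cloud = demo_rng.normal(size=(200, 3))
far_point = np.full(3, 8.0)
demo_data = np.vstack([cloud, far_point])
depth_far = mean_isolation_depth(far_point, demo_data)
depth_center = mean_isolation_depth(np.zeros(3), demo_data)
```

In this sketch the distant sample is expected to have a smaller average depth than a point in the middle of the cloud, which is the property iForest thresholds to flag outliers.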

Fig. 5

3D Visualization of the predicted outliers/inliers in COVID-specific dataset via three PCA components.

Data balancing: Having unbalanced data, where the number of samples belonging to one class is significantly lower than the number belonging to the other class, might bias the classification toward the majority class. Hence, we performed synthetic balancing using the Synthetic Minority Oversampling TEchnique (SMOTE) (Chawla et al., 2002). SMOTE is an oversampling technique that generates synthetic samples from the minority class. It constructs new data from linear combinations of two similar samples.
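
The interpolation step of SMOTE can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in this work: the function name `smote_like`, the brute-force neighbor search, and the toy points are hypothetical, and "similar samples" is taken to mean one of the k nearest minority neighbors in Euclidean distance.

```python
import numpy as np


def smote_like(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by linear interpolation."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, len(minority) - 1)
    # Pairwise Euclidean distances; a sample is never its own neighbor.
    dist = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbors = np.argsort(dist, axis=1)[:, :k]
    synthetic = np.empty((n_new, minority.shape[1]))
    for t in range(n_new):
        i = rng.integers(len(minority))          # pick a minority sample
        j = neighbors[i, rng.integers(k)]        # pick one of its k neighbors
        gap = rng.random()                       # interpolation weight in [0, 1)
        synthetic[t] = minority[i] + gap * (minority[j] - minority[i])
    return synthetic


# Toy minority class (hypothetical values) oversampled with 20 new points.
toy_minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
toy_synthetic = smote_like(toy_minority, n_new=20, k=3)
```

Because every synthetic sample lies on a segment between two existing minority samples, the new points stay inside the region already occupied by the minority class.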

Feature pool sparsification

To exploit the real dependency among the features in the preprocessed feature pool, and so better identify the most important features (blood exams), we propose sparsifying the feature pool, i.e., deriving a representation that is sparse or of low redundancy. After sparsification we have two versions of the feature pool, the original preprocessed feature pool $F$ and its corresponding sparse one $F_s$, both of which are needed by the upcoming feature selection algorithm. Feature sparsification decomposes the feature pool into a low-rank feature pool and a sparse feature pool. Consider a feature pool $F = [f_1, \dots, f_n] \in \mathbb{R}^{m \times n}$, where $f_i$ is the $i$th feature vector over $m$ samples. In general, $F$ is neither sparse nor low rank, but its low-rank and sparse structure can be explored by either approximation or decomposition. Robust Principal Component Analysis (RPCA) (Candès et al., 2011) offers a blind separation of low-rank data and sparse noise, i.e., $F = L + F_s$, where $L$ is the low-rank component of the feature pool $F$ and $F_s$ is the sparse one. RPCA therefore treats the sparse component, which is our target, as noise or an unwanted part. We instead seek a trilateral decomposition, i.e., $F = L + F_s + N$, where $N$ is the noise contaminating the feature pool. This problem is intrinsically different from RPCA. Several studies introduce trilateral decompositions of signals for different purposes, such as the work in (Zhou and Tao, 2011, Bouwmans et al., 2017, Altantawy et al., 2020). Seeking sparse features, the feature pool can be decomposed into low-rank and sparse components as

$$F = L + F_s + N, \quad \operatorname{rank}(L) \le r, \quad \operatorname{card}(F_s) \le c, \qquad (1)$$

where $L$ is a tight rank-$r$ approximation to the feature pool $F$, and $F_s$ has a cardinality of no more than $c$. The decomposition problem can be solved by minimizing the decomposition error as

$$\min_{L,\,F_s} \; \| F - L - F_s \|_F^2 \quad \text{s.t.} \quad \operatorname{rank}(L) \le r, \;\; \operatorname{card}(F_s) \le c. \qquad (2)$$

The optimization problem in Eq. (2) can be solved by alternately solving two subproblems until convergence. At iteration $t$, these two subproblems can be expressed as

$$L_t = \arg\min_{\operatorname{rank}(L) \le r} \| F - L - F_{s,t-1} \|_F^2, \qquad F_{s,t} = \arg\min_{\operatorname{card}(F_s) \le c} \| F - L_t - F_s \|_F^2. \qquad (3)$$

The two subproblems in Eq. (3) can be solved by updating $L_t$ via singular value hard thresholding (Candès et al., 2011) of $F - F_{s,t-1}$ and updating $F_{s,t}$ via entry-wise hard thresholding of $F - L_t$, i.e., keeping the $c$ entries of $F - L_t$ that have the largest absolute values, as

$$L_t = \sum_{i=1}^{r} \lambda_i U_i V_i^T, \quad \operatorname{svd}(F - F_{s,t-1}) = U \Lambda V^T; \qquad F_{s,t} = \mathcal{P}_\Omega(F - L_t), \qquad (4)$$

where $\mathcal{P}_\Omega(\cdot)$ represents the entry-wise hard thresholding operation. The main computational cost of solving these subproblems lies in the SVD used to update the low-rank component $L_t$, especially for a large feature pool size $m \times n$. In (Halko et al., 2009), the authors prove that a matrix can be well approximated by its projection onto the column space of its random projections. This rank-revealing method provides a fast approximation of the SVD. Hence, given the bilateral random projections (BRP) of an $m \times n$ dense feature pool matrix $F$ (w.l.o.g., $m \ge n$), i.e., $Y_1 = F A_1$ and $Y_2 = F^T A_2$, where $A_1 \in \mathbb{R}^{n \times r}$ and $A_2 \in \mathbb{R}^{m \times r}$ are independent Gaussian random matrices, the low-rank component can be obtained according to (Fazel et al., 2008) as

$$L = Y_1 (A_2^T Y_1)^{-1} Y_2^T. \qquad (5)$$

Here, $A_1$ and $A_2$ are correlated random matrices updated from $Y_2$ and $Y_1$, respectively, and $L$ can be obtained as a tight rank-$r$ approximation to a full rank matrix $F$. Hence, we replace SVD with BRP, since the BRP-based low-rank approximation is near optimal and significantly reduces the time cost (Zhou and Tao, 2011). However, when the singular values of the feature pool $F$ decay slowly, Eq. (5) may perform poorly, i.e., it does not guarantee a tight rank-$r$ approximation. Accordingly, the power scheme in (Zhou and Tao, 2011) can be employed with BRP to perform the decomposition. According to the power scheme, we instead calculate the BRP of a new version of the feature pool matrix, $\tilde{F} = (F F^T)^q F$, whose singular values decay faster than those of $F$. In particular, $\lambda_i(\tilde{F}) = \lambda_i(F)^{2q+1}$. Both $F$ and $\tilde{F}$ share the same singular vectors. The BRP of $\tilde{F}$ can be expressed as

$$Y_1 = \tilde{F} A_1, \qquad Y_2 = \tilde{F}^T A_2. \qquad (6)$$

As in Eq. (5), the BRP-based rank-$r$ approximation of $\tilde{F}$ is

$$\tilde{L} = Y_1 (A_2^T Y_1)^{-1} Y_2^T. \qquad (7)$$

Hence, to obtain the approximation of the original feature pool $F$ with rank $r$, the QR decompositions of $Y_1$ and $Y_2$ are calculated as

$$Y_1 = Q_1 R_1, \qquad Y_2 = Q_2 R_2. \qquad (8)$$

Accordingly, the low-rank component $L$ and the sparse component $F_s$ of the original feature pool can be written as

$$L = \tilde{L}^{\frac{1}{2q+1}} = Q_1 \left[ R_1 (A_2^T Y_1)^{-1} R_2^T \right]^{\frac{1}{2q+1}} Q_2^T, \qquad F_s = \mathcal{P}_\Omega(F - L). \qquad (9)$$

Algorithm 1 summarizes the main steps of the decomposition process that yields the targeted sparse feature pool $F_s$.
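The alternating scheme described above (a low-rank update followed by entry-wise hard thresholding) can be illustrated with a small, dependency-free sketch. It is not the authors' Algorithm 1: it assumes a rank-1 low-rank part and uses plain power iteration as a stand-in for the truncated SVD or BRP step, and the names `rank1_approx`, `hard_threshold`, and `decompose` are hypothetical.

```python
import math

def rank1_approx(M, iters=50):
    """Rank-1 approximation of M by power iteration (stand-in for truncated SVD/BRP)."""
    m, n = len(M), len(M[0])
    v = [1.0] * n
    for _ in range(iters):
        u = [sum(M[i][j] * v[j] for j in range(n)) for i in range(m)]
        w = [sum(M[i][j] * u[i] for i in range(m)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [[0.0] * n for _ in range(m)]
        v = [x / norm for x in w]
    u = [sum(M[i][j] * v[j] for j in range(n)) for i in range(m)]
    return [[u[i] * v[j] for j in range(n)] for i in range(m)]

def hard_threshold(M, card):
    """Keep the `card` entries of M with the largest absolute values; zero the rest."""
    ranked = sorted(
        ((abs(x), i, j) for i, row in enumerate(M) for j, x in enumerate(row)),
        reverse=True,
    )
    keep = {(i, j) for _, i, j in ranked[:card]}
    return [[x if (i, j) in keep else 0.0 for j, x in enumerate(row)]
            for i, row in enumerate(M)]

def sub(A, B):
    """Entry-wise difference A - B."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def decompose(F, card, outer_iters=20):
    """Alternate the low-rank update and the sparse update for a fixed number of rounds."""
    S = [[0.0] * len(F[0]) for _ in F]
    for _ in range(outer_iters):
        L = rank1_approx(sub(F, S))          # low-rank step on F - S
        S = hard_threshold(sub(F, L), card)  # sparse step on F - L
    return L, S

# Toy pool: a rank-1 background of 10s with one sparse spike of +5 at position (0, 0).
F = [[10.0] * 5 for _ in range(5)]
F[0][0] += 5.0
L, S = decompose(F, card=1)
```

On this toy input the sparse part isolates the single spike while the low-rank part returns to the constant background; real feature pools would need a higher rank and a cardinality chosen for the data.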

The proposed feature selection scheme

The goal of feature selection is to find which blood exams are most relevant to COVID-19 prediction. This brings three benefits. First, the number of exams required for the diagnostic decision is reduced, and consequently so is the total price. Second, the dimensionality is reduced, which means less computation. Third, selecting the appropriate features reduces data redundancy and avoids noisy data, so the performance of the classification model can improve. After the sparsification process, we have two versions of the feature pool, i.e., the original preprocessed feature pool $F$ and its corresponding sparse one $F_s$. We apply two feature selectors. The first is based on the Pearson correlation coefficient (PCC); it provides a quick screen and removal of irrelevant features relying only on the characteristics of the data, without any need for complicated machine learning algorithms, and is therefore computationally inexpensive. However, PCC alone can give lower prediction performance, so a second feature selector is needed. Inspired by the traditional equilibrium optimizer (EO) (Faramarzi et al., 2020), a physics-based meta-heuristic optimization algorithm, we propose a new Minkowski-based equilibrium optimizer (MEO) that can provide better selection performance than the traditional EO. The advantages of such meta-heuristics include their simplicity, independence from the problem, flexibility, and gradient-free nature (Halim et al., 2021). The two feature selectors can be applied serially or in parallel, and to both versions of the feature pool, $F$ and $F_s$; the different selection decisions are then fused to obtain the most important features, as proposed in Fig. 6. Applying the selectors serially is expected to provide better decisions than applying them in parallel. Hence, we combine the decisions from the serial application of the selectors through an OR operation and take the intersection of the decisions from the parallel application through an AND operation. The following subsections describe the two feature selectors used in the fused selection scheme, i.e., PCC and the new Minkowski-based equilibrium optimizer (MEO).
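The two fusion operations can be sketched with plain sets. This is only an illustration of the OR and AND steps: the selector outputs below are made-up feature sets (the abbreviations are borrowed from the blood-exam names used later in the text), and how the two fused sets are finally wired together follows Fig. 6, which this sketch does not attempt to reproduce.

```python
def fuse_or(decisions):
    """Union of feature sets: a feature survives if any decision kept it."""
    return set().union(*decisions)

def fuse_and(decisions):
    """Intersection of feature sets: a feature survives only if every decision kept it."""
    return set.intersection(*(set(d) for d in decisions))

# Hypothetical selector outputs, for illustration only.
serial_F = {"WBC", "LY", "NE"}      # serial selectors applied to the original pool
serial_Fs = {"WBC", "MO"}           # serial selectors applied to the sparse pool
parallel_pcc = {"WBC", "LY", "EO"}  # PCC decision from the parallel branch
parallel_meo = {"WBC", "EO", "NE"}  # MEO decision from the parallel branch

serial_fused = fuse_or([serial_F, serial_Fs])            # OR over serial decisions
parallel_fused = fuse_and([parallel_pcc, parallel_meo])  # AND over parallel decisions
```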

Fig. 6

An illustration of the proposed feature selection technique, which fuses Pearson dropping (PCC) and the introduced Minkowski-based equilibrium optimizer (MEO), applied serially and in parallel, once in the original feature domain and once in the proposed sparse domain. Of the two operator symbols in the figure, one denotes combining decisions by OR operations and the other denotes taking the intersection of decisions by AND operations.

Pearson correlation-based feature selection

Features in their native form are not always visibly correlated with each other. After feature sparsification, a clear and real correlation between features is exposed. Features with an extremely high correlation should be eliminated: removing such redundant features simplifies the learned model and reduces overfitting to a certain extent. The Pearson Correlation Coefficient (PCC) is employed here for the feature dropping task. It evaluates the linear correlation between two feature vectors $f_i$ and $f_j$ as

$$\rho_{f_i, f_j} = \frac{\operatorname{cov}(f_i, f_j)}{\sigma_{f_i}\,\sigma_{f_j}} = \frac{E\left[(f_i - \mu_{f_i})(f_j - \mu_{f_j})\right]}{\sigma_{f_i}\,\sigma_{f_j}}, \qquad (10)$$

where $\operatorname{cov}$ is the covariance, $\sigma_{f_i}, \sigma_{f_j}$ are the standard deviations of $f_i, f_j$, respectively, and $\mu_{f_i}, \mu_{f_j}$ are the respective means. $\rho$ ranges from −1 to 1: "1" indicates full positive correlation, "−1" indicates full negative correlation, and 0 indicates no correlation. Two features are usually considered extremely highly correlated when $|\rho|$ exceeds a threshold of 0.8 and strongly correlated when $|\rho|$ exceeds a threshold of 0.6. Fig. 7 shows the pairwise Pearson correlation of the original feature pool $F$ and the sparsified one $F_s$. As indicated, after feature sparsification the Pearson maps become brighter, revealing more correlation between features. For the COVID-specific dataset, we initially have 34 features. Applying Pearson elimination to $F$ and $F_s$ with a threshold of 0.8 leaves 28 selected features from $F$ and 23 features from $F_s$, which shows that sparsity allowed us to drop 5 more features. With a threshold of 0.6, the feature pool $F$ is reduced from 34 to 24 features, while the sparsified feature pool $F_s$ is reduced from 34 to 15 features, which means that sparsity allowed us to drop 9 more features. For the CBC dataset, we initially have 16 features. The features in both $F$ and $F_s$ of the CBC dataset do not show a correlation higher than 0.8. However, with a Pearson threshold of 0.6, the feature pool $F$ is reduced from 16 to 13 features, while the sparsified one is reduced from 16 to 12 features, which means that sparsity allowed us to drop 1 more feature.
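As a rough illustration of threshold-based Pearson dropping, the sketch below keeps a feature only if its absolute correlation with every already-kept feature does not exceed the threshold. Which member of a highly correlated pair is dropped is not stated in the text; dropping the later one is an assumption made here, and the function names and toy data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def pearson_drop(features, threshold=0.8):
    """features: dict name -> list of values.
    Greedy pass in insertion order; a feature is dropped when |r| with any
    previously kept feature exceeds the threshold (assumed tie-breaking rule)."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Toy data: "b" is almost a scaled copy of "a"; "c" is only weakly related to both.
toy = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10.5],
    "c": [5, 1, 4, 2, 3],
}
selected = pearson_drop(toy, threshold=0.8)
```

Lowering the threshold from 0.8 to 0.6 makes the filter stricter, which is why the text reports fewer surviving features at 0.6.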

Fig. 7

Pairwise Pearson correlation of features: (a), (c) for the original feature pool $F$ and (b), (d) for the sparsified feature pool $F_s$. The first row is for the COVID-specific dataset and the second for the CBC dataset.

Equilibrium-based feature selection

The traditional equilibrium optimizer (EO)

Equilibrium optimization (EO) is inspired by the dynamic mass balance equation, which describes the conservation of the mass that enters, leaves, or is generated in a control volume (Faramarzi et al., 2020). In other words, the dynamic mass balance equation accounts for the amount of mass that enters, leaves, and is generated in the volume over a period of time. The following three steps describe the operation of EO. Similar to other meta-heuristic algorithms, the EO search starts by initializing a population of candidate solutions/features/blood exams, drawn uniformly at random from the search space. Eq. (11) gives the initial solutions distributed in the search space:

$$\vec{C}_i^{\,initial} = \vec{C}_{min} + \vec{rand}_i \left( \vec{C}_{max} - \vec{C}_{min} \right), \quad i = 1, 2, \dots, n, \qquad (11)$$

where $\vec{C}_i^{\,initial}$ is the $i$th candidate solution/feature vector, $\vec{C}_{min}$ and $\vec{C}_{max}$ are the minimum and maximum bounds for the $i$th candidate solution, respectively, $\vec{rand}_i$ is a $d$-dimensional random vector ranging from zero to one, and $n$ specifies the number of particles/solutions in the group. Then, the equilibrium candidates are determined by sorting the particles according to their fitness. The objective function, i.e., the fitness function, is employed within each optimization process to measure the fitness or quality of each solution. The solution with the best fit is assigned as the best-so-far one for solving the targeted optimization problem. The proposed fitness function is a weighted sum of the classification accuracy of a KNN classifier and the proportion of features/particles selected during each iteration. In this function, the accuracy term is the classification accuracy of the currently selected features, the feature term relates the number of currently selected features to $k$, the total number of features in the feature pool, and the weighting coefficient is a random value in [0, 1]. Just as many meta-heuristic algorithms search for a food source, EO searches for the equilibrium state of the system/problem. At the beginning of the optimization process, the equilibrium state is unknown, i.e., the concentrations that achieve equilibrium are unknown. The equilibrium state represents the global optimum of the optimization problem, which is the final convergence state of the algorithm. However, equilibrium candidates are identified to provide a search domain for the particles. According to Faramarzi et al. (2020), assigning five equilibrium candidates generally works effectively. The first four, $\vec{C}_{eq(1)}, \dots, \vec{C}_{eq(4)}$, are the four "best-so-far" particles identified in the population during the whole optimization process, and the last one, $\vec{C}_{eq(ave)}$, is the particle whose concentration equals the arithmetic mean of the previous four, i.e., $\vec{C}_{eq(ave)} = \frac{1}{4}\left(\vec{C}_{eq(1)} + \vec{C}_{eq(2)} + \vec{C}_{eq(3)} + \vec{C}_{eq(4)}\right)$. The first four candidates give EO better diversification capability, while the average one enhances EO exploitation. The most suitable number of candidates naturally depends on the optimization problem. The equilibrium pool is constructed from these candidates as

$$\vec{C}_{eq,pool} = \left\{ \vec{C}_{eq(1)}, \vec{C}_{eq(2)}, \vec{C}_{eq(3)}, \vec{C}_{eq(4)}, \vec{C}_{eq(ave)} \right\}. \qquad (13)$$

Once the candidate solutions/features are initialized using Eq. (11), their positions are updated over the iterations by

$$\vec{C}_{new} = \vec{C}_{eq} + \left( \vec{C} - \vec{C}_{eq} \right) \vec{F}_e + \frac{\vec{G}}{\vec{\lambda} V} \left( 1 - \vec{F}_e \right), \qquad (14)$$

where $\vec{C}$ and $\vec{C}_{new}$ are the original and updated concentrations of solutions/features at the current and the next iteration, respectively, $\vec{C}_{eq}$ is a feature vector selected at random from the equilibrium pool $\vec{C}_{eq,pool}$, and $V$ is the control volume, taken as unity. The exponential term $\vec{F}_e = e^{-\vec{\lambda}(t - t_0)}$ in Eq. (14) plays the main role in the concentration update by keeping a good balance between exploration and exploitation. The exponential term relies on the turnover rate $\vec{\lambda}$ and the time interval $[t_0, t]$. In a real control volume, $\vec{\lambda}$ varies with time; hence, it is taken to be a random vector ranging from zero to one. The time interval boundaries are defined as

$$t = \left( 1 - \frac{Iter}{Max\_iter} \right)^{\left( a_2 \frac{Iter}{Max\_iter} \right)}, \qquad (15)$$

where $Iter$ is the iteration number, $Max\_iter$ is the total number of iterations, and $a_2$ is a constant value for controlling the exploitation process, and

$$\vec{t}_0 = \frac{1}{\vec{\lambda}} \ln\!\left( -a_1 \operatorname{sign}(\vec{r} - 0.5) \left[ 1 - e^{-\vec{\lambda} t} \right] \right) + t, \qquad (16)$$

where $a_1$ denotes a constant value that controls the diversification and intensification of the EO process. Increasing $a_1$ increases the exploration/diversification capability and decreases the exploitation/intensification ability. Conversely, the higher the parameter $a_2$, the higher the intensification capability and the lower the diversification capability. The term $\operatorname{sign}(\vec{r} - 0.5)$ controls the direction of exploitation and exploration based on another random vector, $\vec{r}$, ranging from zero to one. By employing Eq. (15), (16), the exponential term can be rewritten as

$$\vec{F}_e = a_1 \operatorname{sign}(\vec{r} - 0.5) \left[ e^{-\vec{\lambda} t} - 1 \right]. \qquad (17)$$

Another term that enhances the exploitation phase is the generation rate $\vec{G}$, a first-order exponential decay process given by

$$\vec{G} = \vec{G}_0 \, e^{-\vec{\lambda}(t - t_0)} = \vec{G}_0 \vec{F}_e, \qquad (18)$$

where

$$\vec{G}_0 = \overrightarrow{GCP} \left( \vec{C}_{eq} - \vec{\lambda} \vec{C} \right), \qquad \overrightarrow{GCP} = \begin{cases} 0.5\, r_1, & r_2 \ge GP \\ 0, & r_2 < GP, \end{cases} \qquad (19)$$

where $r_1$ and $r_2$ are random numbers in the range from zero to one. $\overrightarrow{GCP}$ is the generation rate control parameter, i.e., it controls the contribution of the generation term to the updating process. $GP$ is the generation probability, which determines how many particles use the generation term to update their states. To keep a good balance between exploitation and exploration, $GP$ is assigned a value of 0.5. Fig. 8 shows a 2D representation of how the equilibrium candidates collaborate to update the concentration of a particle. In this figure, one term represents the second term in Eq. (14), $(\vec{C} - \vec{C}_{eq})\vec{F}_e$, which is responsible for searching the space, i.e., the exploration role, to find an optimum point. A large difference between a sample concentration and the equilibrium makes this term contribute more to the exploration process of EO. The other term represents the third term in Eq. (14), $\frac{\vec{G}}{\vec{\lambda} V}(1 - \vec{F}_e)$. It introduces small variations in the concentration once a point is found by the exploration process. These small variations make the solution more accurate, so this term contributes more to the exploitation process of EO. In addition, the signs of the second and third terms help in both exploration and exploitation. Having the same sign makes the variations larger and thus improves the search of the full space, while opposite signs keep the variations small, which enhances the local search. Algorithm 2 gives pseudo code for the procedure of the traditional EO.
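To make the update rule concrete, here is a minimal continuous EO sketch in pure Python. It follows the steps described above (uniform initialization, a pool of four best-so-far particles plus their mean, exponential and generation terms) but is not the authors' Algorithm 2: it minimizes a simple sphere function instead of the KNN-based fitness, takes the control volume as 1, and uses `a1=2`, `a2=1` as assumed defaults since the text does not state their values. The name `eo_minimize` is hypothetical.

```python
import math
import random

def eo_minimize(fitness, dim, lo, hi, n_particles=10, max_iter=50,
                a1=2.0, a2=1.0, gp=0.5, seed=0):
    rng = random.Random(seed)
    # Uniform random initialization inside [lo, hi].
    C = [[lo + rng.random() * (hi - lo) for _ in range(dim)]
         for _ in range(n_particles)]
    pool = []      # up to four best-so-far (fitness, position) pairs
    history = []   # best fitness seen after each evaluation round
    for it in range(max_iter):
        scored = sorted(pool + [(fitness(c), c[:]) for c in C], key=lambda p: p[0])
        pool = scored[:4]
        history.append(pool[0][0])
        mean = [sum(p[1][d] for p in pool) / len(pool) for d in range(dim)]
        candidates = [p[1] for p in pool] + [mean]   # equilibrium pool
        t = (1 - it / max_iter) ** (a2 * it / max_iter)
        for i in range(n_particles):
            c_eq = rng.choice(candidates)
            # Generation rate control parameter: active with probability 1 - gp.
            gcp = 0.5 * rng.random() if rng.random() >= gp else 0.0
            new = []
            for d in range(dim):
                lam = max(rng.random(), 1e-12)   # turnover rate, guarded against 0
                r = rng.random()
                exp_term = a1 * math.copysign(1.0, r - 0.5) * (math.exp(-lam * t) - 1)
                g = gcp * (c_eq[d] - lam * C[i][d]) * exp_term
                x = c_eq[d] + (C[i][d] - c_eq[d]) * exp_term + (g / lam) * (1 - exp_term)
                new.append(min(max(x, lo), hi))  # clamp to the search bounds
            C[i] = new
    return pool[0], history

def sphere(c):
    """Toy objective: sum of squares, minimized at the origin."""
    return sum(x * x for x in c)

best, history = eo_minimize(sphere, dim=3, lo=-5.0, hi=5.0)
```

Because the pool carries the best-so-far particles forward, the recorded best fitness never gets worse from one iteration to the next. For feature selection the continuous positions would additionally have to be mapped to binary keep/drop decisions, which this sketch leaves out.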

Fig. 8

2D illustration of Equilibrium candidates’ collaboration in updating particles’ concentration.

The proposed Minkowski-based equilibrium optimizer (MEO)

In the proposed MEO, we try to move a set of the worst solutions, i.e., particles with worst fitness, toward the “best-so-far” attempting to find better solution in a smaller number of iterations. However, this recycling idea may cause an entrapment in local minima, and accordingly, the chance of having better global solution is impossible. Hence, in the proposed modified version of EO (MEO), a recycling strategy for the worst solutions is presented with a strategy for local minima suppression. The proposed MEO is indicated in the following subsections. Recycling strategy for the worst solutions: As mentioned before, the main purpose of this strategy is to move the worst solutions toward the best-so-far solutions, hence, the chance to find solutions better than the best-so-far solutions can be enhanced. At the same time, the recycling strategy should guarantee to take the solutions away from the local minimum. Hence, the number of worst solutions to be recycled is controlled by the following equation.where is the size of the initial population, while denotes a fixed number of the solutions to be updated within each iteration. is the current iteration number and is the total number of iterations. As indicated from Eq. (20), as the iteration number increases, the recycling strategy is controlled by decreasing the number of the worst solutions to be updated to decrease the chance of local minima entrapment. After finding the most suitable number of worst particles to be recycled, the recycling mechanism of their concentrations/features are demonstrated aswhere is a weighting random parameter, in range from zero to one, between the mean equilibrium concentration and a randomly selected concentration from the equilibrium pool. This weighting mechanism is proposed to keep a suitable diversity between the worst solutions even after their movements towards the best-so-far ones. is a random number ranging from zero to one. 
Local minimum avoidance: To support MEO in their fighting towards the local minima problem for achieving better solutions within their searches, the technique in Eq. (22), (23) is proposed. In this technique, both local and global exploration can be controlled according to the degree of the diversity in the equilibrium pool. The diversity in the equilibrium pool, , is calculated in Minkowski-based manner between each pair of equilibrium concentrations aswhere is the order of Minkowski distance metric. denotes the number of particles/ solutions in the equilibrium pool. Hence, the avoidance of local minima problem is demonstrated aswhere and are two random vectors in range [0,1]. The updating mechanism in Eq. (23) offers a global exploration property within the search boundaries when the diversity in the equilibrium pool is low, i.e., , where is a specific predefined threshold for the degree of diversity. In the other hand, with large diversity, i.e., , the updating mechanism offers a local exploration between two solutions selected randomly from the population, i.e., and . Fig. 9 indicates a flow chart for the proposed MEO. In addition, in Fig. 10 , a comparison is set between the traditional EO and the proposed modified version MEO, with Minkowski distance of order , employing the original feature pool once and employing the sparsified one another for COVID-specific dataset. In the original feature domain , the traditional EO provides a leader fitness of 0.838 with a leader KNN classification accuracy of 0.864, while the proposed MEO provides a leader fitness of 0.87 with a leader KNN classification accuracy of 0.896. on the other side, in the sparse domain , the traditional EO provides a leader fitness of 0.848 with a leader KNN classification accuracy of 0.882, while the proposed MEO provides a leader fitness of 0.878 with a leader KNN classification accuracy of 0.911. 
Hence, applying the proposed MEO to the sparsified feature pool $F_s$ gives the best performance in terms of leader accuracy. In addition, sparse features help even the traditional EO to perform better than it does with the original features.
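The diversity test that switches between global and local exploration can be sketched as follows. The exact form of the diversity measure is not recoverable from the text here, so this example assumes it is the mean pairwise Minkowski distance over the equilibrium pool; the threshold comparison and the function names are likewise illustrative assumptions.

```python
from itertools import combinations

def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def pool_diversity(pool, p=2):
    """Mean pairwise Minkowski distance between the members of the pool (assumed form)."""
    pairs = list(combinations(pool, 2))
    return sum(minkowski(a, b, p) for a, b in pairs) / len(pairs)

def exploration_mode(diversity, threshold):
    """Low diversity -> global exploration; otherwise local exploration."""
    return "global" if diversity < threshold else "local"

# Toy equilibrium pool with two identical members and one distant member.
toy_pool = [[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]]
div = pool_diversity(toy_pool, p=2)
mode = exploration_mode(div, threshold=1.0)
```

With `p=1` the measure becomes the Manhattan distance and with `p=2` the Euclidean distance, so the order of the metric changes how strongly a few distant coordinates dominate the diversity estimate.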

Fig. 9

Flow chart of the proposed MEO algorithm.

Fig. 10

Comparison of the average fitness over iterations for the traditional EO, in the first row, and the proposed MEO, in the second row, for the COVID-specific dataset. The first column shows the results for the original feature pool $F$ and the second column those for the sparsified feature pool $F_s$.


Classification stage

Following the proposed dataset preprocessing and feature selection, an efficient classifier is needed. Ensembles of different machine learning classifiers are often employed to obtain better classification performance, as in the diagnosis criteria of (AlJame et al., 2020, Alves et al., 2021). In (Brinati et al., 2020, Cabitza et al., 2021, de Freitas Barbosa et al., 2021), the authors compared the performance of different machine learning classifiers in order to choose the best one. On the other hand, instead of traditional machine learning techniques, the authors in (Alakus and Turkoglu, 2020, Shaban et al., 2021) employed deep learning techniques in their diagnosis. Alakus and Turkoglu (2020) introduced a prediction study for COVID-19 with deep learning models, such as Artificial Neural Network (ANN), Convolutional Neural Networks (CNN), Long-Short Term Memory (LSTM), Recurrent Neural Networks (RNN), CNNLSTM, and CNNRNN. Shaban et al. (2021) proposed a hybrid classification model that consists of two classifiers: a fuzzy inference engine and a Deep Neural Network (DNN). Deep Learning (DL) is the latest accomplishment of the machine learning era, providing a multi-level hierarchical architecture with successive stages for more effective information processing. The DL era was started by Hinton and Salakhutdinov (2006), who explained the role of "the depth" of an ANN in machine learning. In other words, they pointed out that increasing the number of hidden layers increases the learning ability of networks. Convolutional Neural Networks (CNN), a common type of deep neural network (DNN), are mostly used with two-dimensional data (2D CNN), such as images (Albawi et al., 2017). A CNN is mainly constructed from convolutional layers with pooling layers, which act as feature extraction stages, and fully connected layers for classification. The advantages of CNNs can be summarized as follows.
1) With a single body, CNNs fuse feature extraction and feature classification. 2) The features can be directly optimized from the raw input during the training process. 3) CNNs can deal with large inputs effectively via sparsely connected neurons, outperforming traditional Multi-Layer Perceptron (MLP) networks. 4) CNNs are robust to small variations in the input data, such as translation, scaling, skewing, and distortion. 1D Convolutional Neural Networks (1DCNNs) have recently been developed (Kiranyaz et al., 2021). They can deal efficiently with 1D signals. 1DCNNs have notable advantages, such as low computational cost, because they employ 1D convolutions instead of the 2D convolutions of 2DCNNs. A 1DCNN usually employs a small number of hidden layers and therefore a small number of learning parameters, which suits CPU implementations and real-time applications. To clarify the operations performed in a 1DCNN, Fig. 11 shows a simplified example that provides an overview. Conv is a 1D convolution layer with some feature detectors (filters). The selected number of filters defines how many sliding windows are used. Each filter has a kernel size (filter length) that matches the size/height of the sliding window. This window slides through the data and produces an output matrix. The first Conv layer learns the basic functions. An additional 1D convolution layer with other filters before pooling allows the model to learn more complex functions. The output of a one-dimensional convolution layer is given in Eq. (24):

$$x_j^{l} = f\!\left( \sum_{i=1}^{M} x_i^{l-1} * k_{ij}^{l} + b_j^{l} \right), \qquad (24)$$

where $k$ denotes the convolution kernels, $j$ indexes the kernels, $M$ indicates the channel number of the input $x^{l-1}$, $x^{l-1}$ represents the output of the previous layer, $b$ denotes the bias corresponding to the kernel, $f$ denotes the adopted activation function, and ∗ represents the convolution operator. The pooling layer reduces variance and computational complexity (e.g., average pooling reduces 75 % of data) and extracts low-level features from the neighborhood. Max pooling moves a window over the data and replaces the values in it with their maximum value. The pooling layer thus removes a certain percentage of the values from the previous layer, creating a new matrix. To further reduce the probability of over-fitting, a dropout layer is added. Flatten is a layer that flattens the multi-dimensional data resulting from the previous Conv layers, which cannot be fed directly into a feed-forward neural network; it is therefore usually placed before the dense layers. A dense layer is a fully connected layer used to ensure better classification results. The final layer is a dense layer that uses a SoftMax activation function to generate a probability distribution across the output classes. This final output layer consists of one neuron for each label/output class, each giving the probability of its class. The output of a fully connected layer is given in Eq. (25):

$$y = f\!\left( w\,x + b \right), \qquad (25)$$

where $w$ and $b$ denote the weights and the bias, respectively.
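The sliding-window behaviour of a convolution layer followed by an activation and max pooling can be traced by hand with a few lines of Python. This is a single-channel, single-kernel sketch for intuition only (the layer in the text sums over several input channels and learns its kernels); as in most deep learning libraries, the "convolution" is computed without flipping the kernel, and the toy signal and kernel are made up.

```python
def conv1d(signal, kernel, bias=0.0):
    """'Valid' 1D convolution as used in CNN layers (no kernel flip, stride 1)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(signal) - k + 1)]

def relu(xs):
    """Rectified linear activation applied element-wise."""
    return [max(0.0, x) for x in xs]

def max_pool1d(xs, size=2):
    """Non-overlapping max pooling: each window is replaced by its maximum."""
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]

signal = [1, 3, 2, 5, 4, 6]   # one sample with six feature values (toy data)
kernel = [-1, 0, 1]           # a difference-like filter of length 3

feature_map = conv1d(signal, kernel, bias=-1.5)   # [-0.5, 0.5, 0.5, -0.5]
activated = relu(feature_map)                     # [0.0, 0.5, 0.5, 0.0]
pooled = max_pool1d(activated, size=2)            # [0.5, 0.5]
```

A kernel of length 3 over 6 inputs yields 4 outputs, and pooling with a window of 2 halves that again, which is the size reduction the pooling paragraph refers to.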

Fig. 11

An example of a 1DCNN model for a binary classification problem. In this example, the network consists of two convolutional layers (Conv_1 with 32 filters and Conv_2 with 64 filters), a max pooling layer, a flattening layer, and finally some fully connected layers with a softmax layer.

Fig. 12 gives a graphical summary of the 1DCNN model adopted in the proposed COVID-19 prediction algorithm. As indicated, the proposed network consists of 4 convolutional layers (see Fig. 12 for the filter sizes); all have the same kernel size of 32 and all employ ReLU as the activation function. We employed 4 dropout layers, the first three with a dropping factor of 0.2 and the last one with a dropping factor of 0.5. Their main function is to deactivate 20 % and 50 % of the neurons, respectively, in order to prevent overfitting. Then, a flatten layer flattens the multi-dimensional data to suit a dense layer of 128 neurons. Finally, a dense layer with 2 neurons and a SoftMax activation function is used to suit the binary classification problem. We employed Stochastic Gradient Descent (SGD) as the optimizer, with a learning rate of 0.001 and a momentum of 0.9. Training stops when the validation accuracy shows no change for 5 epochs.
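The stopping criterion can be expressed as a small patience counter. The sketch below interprets "no change in the validation accuracy for 5 epochs" as "no improvement over the best value so far for 5 consecutive epochs", which is an assumption; the function name and the accuracy trace are hypothetical.

```python
def stopping_epoch(val_acc, patience=5, min_delta=0.0):
    """Return the 1-based epoch at which training would stop.

    Training stops once `patience` consecutive epochs fail to improve the best
    validation accuracy by more than `min_delta`; if that never happens, the
    last epoch is returned.
    """
    best = float("-inf")
    wait = 0
    for epoch, acc in enumerate(val_acc, start=1):
        if acc > best + min_delta:
            best, wait = acc, 0      # improvement: reset the counter
        else:
            wait += 1                # no improvement this epoch
            if wait >= patience:
                return epoch
    return len(val_acc)

# Toy validation-accuracy trace: the plateau after epoch 3 lasts five epochs.
trace = [0.80, 0.85, 0.90, 0.90, 0.89, 0.90, 0.90, 0.90, 0.95]
stop_at = stopping_epoch(trace, patience=5)
```

In this trace the late jump to 0.95 at epoch 9 is never reached, which illustrates the usual trade-off of a short patience window.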

Fig. 12

Summary of the proposed 1DCNN for COVID-19 prediction considering 9 selected features.

Experimental results and discussions

To assess the performance of the proposed algorithm, we use six metrics: accuracy (ACC), precision (PPV), F1-score, AUC, specificity (SP), and sensitivity (SV). These metrics are based on the values of the resulting confusion matrix, i.e., TP, TN, FP, and FN; see Fig. 13 for the metric formulas. AUC is the area under the receiver operating characteristic (ROC) curve, which plots the TP rate on the Y-axis against the FP rate on the X-axis. The higher the AUC, the better the model distinguishes between the problem's two classes.
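For reference, the confusion-matrix-based metrics can be computed as in the sketch below. It uses the standard textbook definitions, which are assumed to match the formulas in Fig. 13; AUC is omitted because it needs the ranked prediction scores rather than the four counts. The counts in the example are invented.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)   # accuracy
    ppv = tp / (tp + fp)                    # precision
    sv = tp / (tp + fn)                     # sensitivity (recall)
    sp = tn / (tn + fp)                     # specificity
    f1 = 2 * ppv * sv / (ppv + sv)          # harmonic mean of precision and sensitivity
    return {"ACC": acc, "PPV": ppv, "SV": sv, "SP": sp, "F1": f1}

# Invented counts for 100 test samples.
scores = confusion_metrics(tp=40, tn=45, fp=5, fn=10)
```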

Fig. 13

The employed evaluation metrics.

During the experiments, to avoid the risk of over-fitting, the dataset is split into a training set (75 % of the instances) and a test set (25 % of the instances) using a stratified procedure; 30 % of the training set is then used for validation. In all experiments, the models were trained and calibrated on the whole training set. The calibrated models were then evaluated on the hold-out test set in terms of the six metrics mentioned above. The upcoming subsections examine the performance of the proposed diagnosis algorithm in detail through an ablation study: while we discuss the impact of a specific step, we keep the other steps of the proposed algorithm the same; see Fig. 2 for an illustration of the whole algorithm.
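The splitting protocol can be sketched without any library as two nested stratified splits. This is a simplified stand-in for whatever tooling the authors used: the helper name `stratified_split`, the seeds, and the toy label vector are assumptions, and whether the validation subset was itself stratified is not stated in the text.

```python
import random
from collections import defaultdict

def stratified_split(labels, second_frac, seed=0):
    """Split indices into two parts while preserving the class proportions.

    Returns (first_idx, second_idx); `second_frac` of each class goes to the second part.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    first, second = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        cut = round(len(idxs) * second_frac)
        second.extend(idxs[:cut])
        first.extend(idxs[cut:])
    return sorted(first), sorted(second)

# Toy labels: 80 negative and 40 positive samples.
labels = [0] * 80 + [1] * 40

# 75 % training / 25 % hold-out test, stratified by class.
train_idx, test_idx = stratified_split(labels, 0.25, seed=0)

# 30 % of the training set is held out for validation.
train_labels = [labels[i] for i in train_idx]
fit_pos, val_pos = stratified_split(train_labels, 0.30, seed=1)
fit_idx = [train_idx[i] for i in fit_pos]
val_idx = [train_idx[i] for i in val_pos]
```

With 120 samples this gives 30 test, 27 validation, and 63 fitting samples, each keeping the 2:1 class ratio of the full set.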

Impact of data preparation

In this step, as discussed in the methodology section, we employed the iForest algorithm for outlier detection and SMOTE for data balancing. Table 4 reports quantitative validation results for the effect of the data preparation steps on the proposed COVID-19 diagnosis algorithm while the other main steps are kept the same. All evaluation metrics improve as we move from case (1), in which no preprocessing is applied to the dataset, to case (4), in which both SMOTE and iForest are applied for data balancing and outlier detection, respectively. For iForest, we kept the default parameters of the Scikit-learn library, except for the number of base estimators, which we set to 150. For SMOTE, we kept the default parameters of the Imbalanced-learn library. In Table 4, for the COVID-specific dataset, the biggest enhancement is in specificity, which increases from 0.773 (case 1) to 0.958 (case 4) with a reduction of 6 features. For the CBC dataset, the biggest enhancement is in accuracy, which increases from 0.771 to 0.983. Fig. 14 shows the confusion matrices obtained when testing the aforementioned cases. These matrices show the effectiveness of the adopted preprocessing steps in enhancing the performance of the proposed COVID-19 prediction model.

Table 4

dataset	Case	Case name	ACC	PPV	SV	F1	AUC	SP	Features*
Covid-specific dataset	1	Imbalanced w/ outliers	0.83	0.834	0.816	0.824	0.890	0.773	19/33
	2	Balanced w/ outliers	0.866	0.876	0.918	0.896	0.929	0.921	18/33
	3	Imbalanced w/o outliers	0.894	0.882	0.902	0.891	0.956	0.901	19/33
	4	Balanced w/o outliers	0.988	0.985	0.975	0.979	0.988	0.985	13/33
CBC dataset	1	Imbalanced w/ outliers	0.771	0.783	0.9	0.833	0.824	0.806	7/13
	2	Balanced w/ outliers	0.923	0.921	0.956	0.938	0.976	0.938	6/13
	3	Imbalanced w/o outliers	0.906	0.902	0.956	0.92	0.95	0.931	9/13
	4	Balanced w/o outliers	0.994	0.985	0.993	0.986	0.998	0.986	6/13

* The selected number of features (x) out of the total size of the original feature pool (y); (x/y)

Fig. 14

Confusion matrices of testing the proposed COVID-19 prediction algorithm adopting the four cases indicated in Table 4 showing the effect of SMOTE and iForest on the performance.

Table 4: The validation results of the effect of employing the data preparation steps, i.e., SMOTE for data balancing and iForest for outlier detection, on the proposed COVID-19 diagnosis algorithm on the employed datasets.

Impact of feature selection

In this subsection, we discuss the effect of the proposed feature selection, which fuses the features selected from a correlation-based (PCC) perspective with those selected from the modified equilibrium-based (MEO) perspective. We first discuss the two adopted feature selection techniques, i.e., PCC and MEO, applied separately, and then compare them with the fusion mechanism. When applied alone, PCC and MEO can operate on the original feature pool $F$ (original feature domain) or on the sparse one $F_s$ (sparse domain); after the selection decisions, the classification step can then be applied to samples from $F$ or $F_s$. Hence, we have ten conditions to study (see Tables 5 and 6): 1) employing all features in $F$; 2) employing all sparse features in the sparse domain $F_s$; 3) correlation-based selection in $F$ with training and testing in the same domain; 4) correlation-based selection in $F$ with training and testing in the sparse domain, i.e., on sparsified samples from $F_s$; 5) correlation-based selection in $F_s$ with training and testing in the original feature domain $F$; 6) correlation-based selection in $F_s$ with training and testing in the sparse domain $F_s$; 7) MEO-based selection in the original feature domain $F$ with training and testing in the same domain; 8) MEO-based selection in $F$ with training and testing in the sparse domain $F_s$; 9) MEO-based selection in the sparse domain $F_s$ with training and testing in the original feature domain $F$; 10) MEO-based selection in $F_s$ with training and testing in the sparse domain $F_s$.
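The count of ten conditions follows from two all-feature baselines plus every combination of selector, selection domain, and training domain. The short enumeration below only illustrates that bookkeeping; the tuple layout and the labels `"F"` and `"Fs"` are conventions chosen for this sketch.

```python
from itertools import product

DOMAINS = ("F", "Fs")            # original and sparse feature domains
SELECTORS = ("PCC", "MEO")

# Conditions 1-2: all features, trained and tested in the same domain.
conditions = [("all features", d, d) for d in DOMAINS]

# Conditions 3-10: (selector, domain used for selection, domain used for training/testing).
conditions += list(product(SELECTORS, DOMAINS, DOMAINS))

for number, (selector, select_in, train_in) in enumerate(conditions, start=1):
    print(f"{number:2d}) {selector}: select in {select_in}, train/test in {train_in}")
```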

Table 5

	Train and testing for the original samples in F						Train and testing for the sparse samples in Fs
	ACC	PPV	SV	F1	AUC	SP	ACC	PPV	SV	F1	AUC	SP
All features (33)	0.94	0.943	0.96	0.952	0.986	0.961	0.942	0.944	0.96	0.954	0.987	0.966
PCC-based feature selection for the original features in F (22 features)	0.939	0.94	0.96	0.95	0.98	0.958	0.943	0.947	0.961	0.954	0.986	0.958
PCC-based feature selection for the sparse features in Fs (14 features)	0.932	0.935	0.958	0.946	0.977	0.958	0.939	0.942	0.96	0.95	0.983	0.955
MEO-based selection for the original features in F (12 features)	0.925	0.93	0.95	0.94	0.98	0.948	0.932	0.94	0.952	0.945	0.98	0.961
MEO-based selection for the sparse features in Fs (12 features)	0.934	0.94	0.96	0.947	0.983	0.958	0.944	0.95	0.96	0.955	0.984	0.958

Table 6

Validation results of applying all features, and PCC and MEO-based feature selection, separately, in different cases for CBC dataset. The best performance is marked by bold font. (--) is the number of selected features.

	Train and testing for the original samples in F						Train and testing for the sparse samples in Fs
	ACC	PPV	SV	F1	AUC	SP	ACC	PPV	SV	F1	AUC	SP
All features (13)	0.989	0.985	0.997	0.991	0.999	1	0.99	0.988	0.995	0.992	1	1
PCC-based feature selection for the original features in F (10 features)	0.987	0.982	0.997	0.99	0.999	1	0.986	0.983	0.995	0.989	0.999	1
PCC-based feature selection for the sparse features in Fs (9 features)	0.989	0.987	0.995	0.991	0.999	1	0.989	0.987	0.998	0.991	0.999	1
MEO-based selection for the original features in F (8 features)	0.99	0.988	1	0.992	0.999	1	0.986	0.984	0.994	0.989	0.999	1
MEO-based selection for the sparse features in Fs (6 features)	0.961	0.963	0.974	0.968	0.989	0.964	0.969	0.968	0.983	0.974	0.996	0.988

Validation results of applying all features, and PCC and MEO-based feature selection, separately, in different cases for COVID-specific dataset. The best performance is marked by bold font. (--) is the number of selected features. Validation results of applying all features, and PCC and MEO-based feature selection, separately, in different cases for CBC dataset. The best performance is marked by bold font. (--) is the number of selected features. To take a look at the selected features in each case and get an indication of its importance in the prementioned ten conditions, Fig. 15, Fig. 16 indicates the AdaBoost feature importance in different conditions. As indicated, we can see how the sparse domain exploit the real importance of features compared to the original domain of features. Sparsity gives the highest importance to white blood cell count (WBC). Recently, WBC is considered as a prognostic indicator of COVID-19 (Li et al., 2021). In addition, in Table 5, Table 6, computitative validation results are indicated as a comparison between the prementioned conditions. As indicated, in COVID-specific dataset, Table 5,employing all features in the features original domain or sparse domain didn’t introduce that high performance compared to the other conditions due to the existence of high correlated features, such as these couple of features, Neutrophils (NET, NE), Lymphocytes (LYT, LY), Monocytes (MOT, MO), Eosinophils (EOT, EO). On the other side, performing a correlation-based or MEO-based selection in sparse domain can show superior performance even with smaller number of features than that of original features domain. Hence, it is demonstrated that sparsifying features can exploit more details from samples, hence better performance can be achieved. 
For the CBC dataset (Table 6), employing all features gives the best performance in the sparse domain, although the original feature domain remains competitive because its features are less correlated than those of the COVID-specific dataset (see Fig. 7). Even with only 6 of the 13 features, chosen by MEO-based feature selection in the sparse domain, we obtain an accuracy of around 97 %. On the CBC dataset, this performance is competitive with the state-of-the-art result of Shaban et al. (2021), who achieve an accuracy of 97.6 % using all features, whereas the proposed prediction algorithm achieves an accuracy of 99 % using all features in the sparse domain, as indicated in Table 6. Based on the results in Table 5 and Table 6, the proposed fused feature selection mechanism is introduced to combine the advantages of MEO-based and PCC-based feature selection, so that the best possible performance is achieved with the fewest possible features. Fig. 17 shows the classification reports of testing the proposed COVID diagnosis model with the proposed fused selection method in both the original domain and the sparse domain. The fusion mechanism selects 13 features for the COVID-specific dataset and 6 features for the CBC dataset. These features are then fed to the proposed 1DCNN model for the final classification stage. As Fig. 17 indicates, the fusion mechanism improves the performance of the proposed prediction algorithm compared with the results in Table 5 and Table 6, especially when training and testing are performed in the sparse domain, where we achieved testing accuracies of 98 % and 99 % for the COVID-specific and CBC datasets, respectively. Hence, the proposed COVID prediction algorithm adopts the fused selection mechanism for feature selection, and training and testing are performed on the sparse samples.
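
The effect of highly correlated pairs such as (NET, NE) can be illustrated with a minimal sketch of correlation-based elimination. It assumes that a feature is dropped when its absolute Pearson correlation with an already kept feature reaches a threshold; the threshold of 0.9, the greedy order, the function names, and the toy values are all assumptions made for illustration and are not the exact PCC rule or data used in this paper.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant column has no linear relationship to report
    return cov / (norm_x * norm_y)

def drop_correlated(columns, threshold=0.9):
    """Greedily keep a feature only if |r| with every kept feature is below threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Toy values (hypothetical, not taken from the OSR dataset).
toy = {
    "WBC": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    "NE": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "NET": [3.0, 5.0, 7.0, 9.0, 11.0, 13.0],  # exactly 2 * NE + 1, so r = 1 with NE
    "LY": [3.0, 1.0, 2.0, 3.0, 1.0, 2.0],
}
selected = drop_correlated(toy)
print(selected)
```

In this toy case NET is removed because it is a linear function of NE, while WBC and LY are kept because their correlation with the kept features stays below the assumed threshold.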

Fig. 15

Fig. 16

AdaBoost feature importance for the COVID-specific dataset in the following cases: 1. PCC-based feature selection in the original feature domain (22 features selected), with training and testing applied once to the original samples in (a) and once to the sparse samples in (b). 2. PCC-based feature selection in the sparse domain (14 features selected), with training and testing applied once to the original samples in (c) and once to the sparse samples in (d). 3. MEO-based feature selection in the original feature domain (12 features selected), with training and testing applied once to the original samples in (e) and once to the sparse samples in (f). 4. MEO-based feature selection in the sparse domain (12 features selected), with training and testing applied once to the original samples in (g) and once to the sparse samples in (h).

Fig. 17

Classification reports of testing the proposed COVID diagnosis model based on the proposed fused selection method and 1DCNN in both original domain (a), (c) and sparse domain (b), (d). The first two rows belong to COVID-specific dataset while the other rows belong to CBC dataset.

Fig. 15: AdaBoost feature importance employing all features for the COVID-specific dataset in the first row and the CBC dataset in the second row. (a) and (c) are in the original feature domain, while (b) and (d) are in the sparse domain.

Impact of classification model:

In this subsection, we evaluate the proposed COVID-19 diagnosis model by comparing its 1DCNN model with other ML models, namely GNB, DT, ET, GBT, KNN, LR, RF, LSVM, RSVM, XGBoost, LDA, QDA, and AdaBoost. A review of these models can be found in (Tang et al., 2014). For the traditional ML techniques, we employed 10-fold stratified cross-validation to limit overfitting, which is standard practice when developing a traditional ML model. A grid search was then used to find the best combination of hyperparameters (e.g., learning rate, interaction depth), with AUC as the reference measure. Table 7 and Table 8 compare the proposed 1DCNN model with these ML algorithms: Table 7 shows the testing results for the COVID-specific dataset, and Table 8 shows those for the CBC dataset. The proposed 1DCNN with the adopted fused feature selection mechanism shows superior performance compared with the other ML models, especially when training and testing are performed on sparse samples. Training and testing in the original feature domain still give good results that are better than those of the traditional ML methods. Among the traditional ML techniques, AdaBoost and ET perform well. Fig. 18 shows the training-validation accuracy of the proposed COVID-19 prediction algorithm, based on the introduced fusion-based feature selection and 1DCNN model, once for the original samples and once for the sparse samples.
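
The evaluation protocol described above can be sketched as follows. This is a minimal, library-free illustration of stratified fold assignment and exhaustive grid search, not the code used in this study; the function names, the seed, the toy labels, and the toy scoring function (a stand-in for a mean cross-validated AUC) are assumptions.

```python
import random
from collections import defaultdict
from itertools import product

def stratified_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds so each fold keeps similar class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of one class round-robin over the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds

def grid_search(param_grid, cv_score):
    """Try every hyperparameter combination and keep the best-scoring one."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cv_score(params)  # e.g. mean validation AUC over the folds
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy labels: 60 positive and 40 negative samples (hypothetical).
labels = [1] * 60 + [0] * 40
folds = stratified_folds(labels, k=10)

# Hypothetical score surface standing in for a cross-validated AUC.
def toy_score(params):
    return -(params["learning_rate"] - 0.1) ** 2 - (params["depth"] - 3) ** 2

best, _ = grid_search({"learning_rate": [0.01, 0.1, 1.0], "depth": [1, 3, 5]}, toy_score)
print(best)
```

With these toy labels every fold receives six positive and four negative samples, which is the property that stratification is meant to preserve.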

Table 7

Training and testing domain	Classifier	ACC	PPV	SV	F1	AUC	SP	Macro-PPV	Macro-SV	Macro-F1	Micro-PPV	Micro-SV	Micro-F1
Original features domain	LSVM	0.84	0.858	0.891	0.874	0.902	0.907	0.836	0.824	0.828	0.84	0.84	0.84
	RSVM	0.897	0.909	0.927	0.918	0.939	0.931	0.893	0.887	0.89	0.897	0.897	0.897
	LR	0.836	0.893	0.837	0.863	0.903	0.867	0.827	0.835	0.829	0.836	0.836	0.836
	RF	0.928	0.941	0.943	0.942	0.98	0.963	0.924	0.923	0.923	0.928	0.928	0.928
	AdaBoost	0.932	0.934	0.958	0.946	0.983	0.984	0.932	0.924	0.927	0.932	0.932	0.932
	DT	0.883	0.902	0.911	0.906	0.874	0.941	0.877	0.874	0.875	0.883	0.883	0.883
	KNN	0.869	0.868	0.932	0.899	0.936	0.949	0.872	0.85	0.857	0.869	0.869	0.869
	XGBoost	0.899	0.914	0.926	0.919	0.956	0.947	0.896	0.891	0.892	0.899	0.899	0.899
	GNB	0.787	0.794	0.888	0.838	0.858	0.925	0.783	0.756	0.763	0.787	0.787	0.787
	ET	0.931	0.934	0.956	0.945	0.983	0.979	0.93	0.923	0.926	0.931	0.931	0.931
	LDA	0.824	0.835	0.894	0.863	0.895	0.92	0.821	0.802	0.808	0.824	0.824	0.824
	QDA	0.783	0.789	0.89	0.836	0.869	0.917	0.78	0.749	0.757	0.783	0.783	0.783
	OURS	0.967	0.974	0.958	0.965	0.971	0.984	0.971	0.956	0.965	0.967	0.955	0.964

Sparse domain	LSVM	0.844	0.868	0.884	0.875	0.904	0.909	0.838	0.832	0.833	0.844	0.844	0.844
	RSVM	0.894	0.908	0.924	0.916	0.939	0.939	0.89	0.885	0.887	0.894	0.894	0.894
	LR	0.836	0.891	0.841	0.864	0.905	0.875	0.828	0.835	0.829	0.836	0.836	0.836
	RF	0.933	0.939	0.954	0.946	0.982	0.981	0.931	0.926	0.928	0.933	0.933	0.933
	AdaBoost	0.94	0.942	0.964	0.952	0.985	0.987	0.94	0.933	0.936	0.94	0.94	0.94
	DT	0.889	0.9	0.923	0.911	0.878	0.944	0.885	0.878	0.881	0.889	0.889	0.889
	KNN	0.892	0.886	0.948	0.916	0.947	0.965	0.897	0.874	0.882	0.892	0.892	0.892
	XGBoost	0.896	0.905	0.93	0.917	0.951	0.952	0.893	0.885	0.888	0.896	0.896	0.896
	GNB	0.822	0.862	0.849	0.855	0.869	0.885	0.811	0.813	0.811	0.822	0.822	0.822
	ET	0.94	0.942	0.963	0.952	0.985	0.987	0.94	0.933	0.936	0.94	0.94	0.94
	LDA	0.824	0.837	0.893	0.863	0.896	0.912	0.82	0.803	0.808	0.824	0.824	0.824
	QDA	0.79	0.803	0.88	0.839	0.865	0.912	0.785	0.762	0.768	0.79	0.79	0.79
	OURS	0.983	0.982	0.975	0.976	0.987	0.984	0.98	0.971	0.973	0.98	0.971	0.972

Table 8

Comparison between some of the traditional ML techniques and the proposed 1DCNN model, with training and testing performed once in the original feature domain and once in the sparse domain, for the CBC dataset (6 selected features out of 13). The top performer is bolded, while the second is underlined.

Training and testing domain	Classifier	ACC	PPV	SV	F1	AUC	SP	Macro-PPV	Macro-SV	Macro-F1	Micro-PPV	Micro-SV	Micro-F1
Original features’ domain	LSVM	0.828	0.848	0.873	0.860	0.880	0.845	0.822	0.815	0.818	0.828	0.828	0.828
	RSVM	0.893	0.897	0.930	0.913	0.937	0.908	0.892	0.882	0.886	0.893	0.893	0.893
	LR	0.810	0.844	0.844	0.843	0.881	0.828	0.802	0.801	0.801	0.810	0.810	0.810
	RF	0.956	0.954	0.974	0.964	0.994	0.989	0.957	0.951	0.953	0.956	0.956	0.956
	AdaBoost	0.976	0.976	0.985	0.980	0.998	0.983	0.976	0.974	0.975	0.976	0.976	0.976
	DT	0.932	0.940	0.950	0.944	0.928	0.971	0.932	0.928	0.929	0.932	0.932	0.932
	KNN	0.955	0.958	0.968	0.963	0.988	0.960	0.955	0.951	0.952	0.955	0.955	0.955
	XGBoost	0.946	0.940	0.974	0.957	0.981	0.977	0.949	0.939	0.943	0.946	0.946	0.946
	GNB	0.694	0.854	0.597	0.701	0.827	0.557	0.715	0.720	0.693	0.694	0.694	0.694
	ET	0.977	0.975	0.988	0.981	0.998	0.983	0.978	0.974	0.976	0.977	0.977	0.977
	LDA	0.802	0.838	0.837	0.837	0.871	0.810	0.794	0.792	0.792	0.802	0.802	0.802
	QDA	0.760	0.889	0.692	0.777	0.877	0.644	0.768	0.779	0.758	0.760	0.760	0.760
	OURS	0.984	0.981	0.988	0.982	0.998	0.984	0.98	0.984	0.981	0.98	0.983	0.98

Sparse domain	LSVM	0.816	0.841	0.860	0.850	0.871	0.787	0.809	0.804	0.805	0.816	0.816	0.816
	RSVM	0.876	0.870	0.936	0.902	0.936	0.931	0.879	0.860	0.867	0.876	0.876	0.876
	LR	0.801	0.836	0.837	0.836	0.874	0.805	0.793	0.791	0.791	0.801	0.801	0.801
	RF	0.969	0.976	0.974	0.975	0.995	0.966	0.968	0.968	0.968	0.969	0.969	0.969
	AdaBoost	0.980	0.981	0.986	0.983	0.998	0.977	0.980	0.978	0.979	0.980	0.980	0.980
	DT	0.938	0.951	0.947	0.949	0.936	0.948	0.935	0.936	0.935	0.938	0.938	0.938
	KNN	0.960	0.968	0.966	0.967	0.988	0.966	0.959	0.959	0.958	0.960	0.960	0.960
	XGBoost	0.942	0.950	0.954	0.952	0.981	0.948	0.940	0.938	0.939	0.942	0.942	0.942
	GNB	0.752	0.877	0.689	0.770	0.842	0.667	0.759	0.769	0.750	0.752	0.752	0.752
	ET	0.980	0.982	0.985	0.983	0.998	0.983	0.980	0.978	0.979	0.980	0.980	0.980
	LDA	0.798	0.830	0.840	0.835	0.868	0.822	0.790	0.787	0.788	0.798	0.798	0.798
	QDA	0.777	0.864	0.751	0.803	0.872	0.695	0.773	0.784	0.772	0.777	0.777	0.777
	OURS	0.991	0.983	0.99	0.985	0.998	0.984	0.981	0.988	0.987	0.981	0.987	0.981
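
The macro- and micro-averaged columns in Table 7 and Table 8 can be read with the help of a small worked example. The sketch below assumes that PPV denotes precision and SV denotes sensitivity (recall), and it uses hypothetical labels; the function names are illustrative. For single-label classification, pooling the counts over all classes makes micro-PPV, micro-SV and micro-F1 equal to the accuracy, which is why those columns repeat the ACC value in most rows.

```python
def class_counts(y_true, y_pred, cls):
    """True positives, false positives and false negatives for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    return tp, fp, fn

def ratio(num, den):
    return num / den if den else 0.0

def f1_score(ppv, sv):
    return ratio(2 * ppv * sv, ppv + sv)

def averaged_scores(y_true, y_pred):
    """Return (PPV, SV, F1) triples under macro and micro averaging."""
    classes = sorted(set(y_true) | set(y_pred))
    counts = [class_counts(y_true, y_pred, c) for c in classes]
    ppvs = [ratio(tp, tp + fp) for tp, fp, fn in counts]
    svs = [ratio(tp, tp + fn) for tp, fp, fn in counts]
    f1s = [f1_score(p, s) for p, s in zip(ppvs, svs)]
    n = len(classes)
    macro = (sum(ppvs) / n, sum(svs) / n, sum(f1s) / n)  # mean of per-class scores
    tp_all = sum(c[0] for c in counts)
    fp_all = sum(c[1] for c in counts)
    fn_all = sum(c[2] for c in counts)
    micro_ppv = ratio(tp_all, tp_all + fp_all)  # counts pooled over classes
    micro_sv = ratio(tp_all, tp_all + fn_all)
    micro = (micro_ppv, micro_sv, f1_score(micro_ppv, micro_sv))
    return {"macro": macro, "micro": micro}

# Hypothetical labels: 1 = COVID-positive, 0 = COVID-negative.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]
scores = averaged_scores(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(scores, accuracy)
```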

Fig. 18

Training-validation performance in terms of accuracy for the proposed COVID prediction algorithm. The first row is for the COVID-specific dataset and the second for the CBC dataset. Training in (a) and (c) is performed in the original feature domain, and in (b) and (d) in the sparse domain. Training is performed on the features selected by the proposed fusion-based feature selection mechanism, which yields 13 features for the COVID-specific dataset and 6 features for the CBC dataset.

Table 7: Comparison between some traditional ML techniques and the proposed 1DCNN model, with training and testing performed once in the original feature domain and once in the sparse domain, for the COVID-specific dataset (13 selected features out of 33). The top performer is bolded, while the second is underlined.

Comparison to the state-of-the-art

In this subsection, we compare the proposed COVID-19 prediction algorithm with prediction methods from previous studies (Alakus and Turkoglu, 2020, AlJame et al., 2020, Brinati et al., 2020, Cabitza et al., 2021, Shaban et al., 2021); see Table 1 for their dependencies. The ERLX method proposed in AlJame et al. (2020) is the most similar algorithm to ours, especially in the data preprocessing steps, but it has no clear feature selection mechanism and employs an ensemble of different ML techniques. The HDS algorithm proposed in Shaban et al. (2021) employs a fuzzy inference engine and a Deep Neural Network for its prediction scheme. The remaining studies (Brinati et al., 2020, Cabitza et al., 2021) did not employ any feature selection algorithm and simply use an ensemble of different ML algorithms for the classification task. Alakus and Turkoglu (2020), in turn, introduce a new ensemble of different deep learning models. For a fair comparison between these studies, we applied the same preprocessing steps and the same dataset to all of them while keeping the other steps adopted by each study. We also used their published code where it was available. Fig. 19 compares the testing results of these studies as classification reports. As shown, the proposed COVID-19 prediction model outperforms the state-of-the-art on both datasets, achieving accuracies of 98 % and 99 % for the COVID-specific and CBC datasets, respectively. Moreover, this performance is achieved with less than half of the available features/blood exams. Ours employs 6 out of 13 features for the CBC dataset and 13 out of 33 for the COVID-specific dataset, while the other studies mostly employ all available features, as in (Brinati et al., 2020, Cabitza et al., 2021, Shaban et al., 2021), or a larger number of features than ours, as in (Alakus and Turkoglu, 2020, AlJame et al., 2020).

Fig. 19

Classification reports of testing the following studies: (Alakus & Turkoglu, 2020) {18/33–13/13} as (a), (AlJame et al., 2020) {18/33–13/13} as (b), (Cabitza et al., 2021) {33/33–13/13} as (c), (Brinati et al., 2020) {33/33–13/13} as (d), (Shaban et al., 2021) {33/33–13/13} as (e), and Ours {13/33–6/13} as (f). {} denotes {selected features/total no. of features for the COVID-specific dataset – selected features/total no. of features for the CBC dataset}. Panels labelled (?.1) are for the COVID-specific dataset and panels labelled (?.2) are for the CBC dataset.

Conclusion

In this paper, we proposed a novel COVID-19 prediction model based on routine blood tests. In this model, we exploited the benefits of sparsifying the feature pool to capture the real dependencies between the employed blood tests. In this sparse domain, we reduced the feature pool to less than half of its size using the adopted hybrid feature selection scheme, which fuses the elimination decisions of Pearson's correlation coefficient and a new Minkowski-based equilibrium optimizer. Then, with a deep convolutional model, we showed that the proposed algorithm can efficiently predict COVID-19 infection from a small number of blood tests. Hence, scarce healthcare resources can be prioritized more effectively, especially in developing and low- and middle-income countries. The major limitation of this study is the training time of the 1DCNN compared with that of the traditional ML techniques. Even so, PCR tests typically take hours to perform, and the aim is to find alternative predictive models whose accuracy remains competitive in order to improve healthcare resource prioritization and inform patient care. In future work, we therefore intend to reduce the computational cost of the whole prediction algorithm, especially the training stage.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

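Algorithm 1: The introduced decomposition process for the feature pool $F$

1. Input: $F$, $\zeta = 1$, $\Psi = n \times k$, $\epsilon = 0.001$, $q = 1$
2. Initialize: $F_L^0 := F$, $F_S^0 := 0$, $i := 0$
3. While $\|F - F_L - F_S\|_F^2 / \|F\|_F^2 > \epsilon$ do // the stopping criterion
4. $i := i + 1$
5. $\tilde{F}_L = \left[(F - F_S^{i-1})(F - F_S^{i-1})^T\right]^q (F - F_S^{i-1})$ // following the formulation $\tilde{F} = (F F^T)^q F$ in the power scheme; at $q = 1$, we got $\tilde{F} = F$, $F \approx F_L + F_S$
6. $X_1 = \tilde{F}_L B_1$, $B_2 = X_1$
7. $X_2 = \tilde{F}_L^T X_1 = Q_2 R_2$, $X_1 = \tilde{F}_L X_2 = Q_1 R_1$
8. If $\operatorname{rank}(B_2^T X_1) < \zeta$ then $\zeta := \operatorname{rank}(B_2^T X_1)$, go to the first step; end
9. $F_L^i = (\tilde{F}_L)^{\frac{1}{2q+1}} = Q_1 \left[R_1 (B_2^T X_1)^{-1} R_2^T\right]^{\frac{1}{2q+1}} Q_2^T$ // see Eq. (9)
10. $F_S^i = H_\gamma(F - F_L^i)$, where $\gamma$ is the nonzero subset of the first $\Psi$ largest entries of $F - F_L^i$ // see Eq. (4), (9)
11. End while
12. Output: $F_L$, $F_S$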
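
Two pieces of Algorithm 1 can be illustrated in isolation: the sparse update of step 10 and the stopping criterion of step 3. The sketch below is not the full decomposition, since the low-rank update of steps 5 to 9 is omitted and a low-rank part is supplied by hand. It assumes that "largest entries" means largest in magnitude, and the function names and the toy matrices are hypothetical.

```python
def hard_threshold(residual, psi):
    """Keep the psi largest-magnitude entries of a matrix (list of rows); zero the rest."""
    entries = [(abs(v), r, c) for r, row in enumerate(residual) for c, v in enumerate(row)]
    entries.sort(key=lambda e: e[0], reverse=True)
    keep = {(r, c) for _, r, c in entries[:psi]}
    return [[v if (r, c) in keep else 0.0 for c, v in enumerate(row)]
            for r, row in enumerate(residual)]

def relative_error(F, FL, FS):
    """Squared Frobenius norm of F - FL - FS divided by that of F (step 3)."""
    num = sum((f - l - s) ** 2
              for row_f, row_l, row_s in zip(F, FL, FS)
              for f, l, s in zip(row_f, row_l, row_s))
    den = sum(f ** 2 for row in F for f in row)
    return num / den

# Toy feature pool: a rank-1 part plus one large outlier and a little noise.
F = [[1.0, 2.1], [2.0, 9.0]]
FL = [[1.0, 2.0], [2.0, 4.0]]  # hand-picked low-rank part (assumption)
residual = [[f - l for f, l in zip(row_f, row_l)] for row_f, row_l in zip(F, FL)]
FS = hard_threshold(residual, psi=1)
print(FS, relative_error(F, FL, FS))
```

Here the single retained entry captures the outlier, and the remaining relative error falls below the tolerance of 0.001 listed in step 1, so the loop would stop.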

Algorithm 2: Pseudo code of the traditional EO optimizer

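1:	Initialize the population of solutions/particles $\vec{f}_i$ randomly, $i = 1, \dots, n$; // Eq. (11), $n$ is the number of search agents
2:	Assign a small number to the equilibrium candidates' objective/fitness function $\xi$; // $\xi = 0.0001$
3:	Select the equilibrium candidates $\vec{f}_{eq1}, \vec{f}_{eq2}, \vec{f}_{eq3}, \vec{f}_{eq4}$ from the population
4:	Update the states of candidate solutions using the search equation (Eq. (14))
5:	Assign values to the following free parameters: $\eta_1 = 2$, $\eta_2 = 1$, $p_\omega = 0.5$
6:	While $\ddot{I} < N_{\ddot{I}}$ // the number of iterations $N_{\ddot{I}} = 100$
7:	For $i = 1 : n$
8:	Calculate the fitness function of the $i$th particle $\xi(\vec{f}_i)$ // follow Eq. (12) to calculate $\xi$
9:	If $\xi(\vec{f}_i) > \xi(\vec{f}_{eq1})$
10:	Replace $\vec{f}_{eq1}$ with $\vec{f}_i$ and $\xi(\vec{f}_{eq1})$ with $\xi(\vec{f}_i)$
11:	Elseif $\xi(\vec{f}_i) < \xi(\vec{f}_{eq1})$ & $\xi(\vec{f}_i) > \xi(\vec{f}_{eq2})$
12:	Replace $\vec{f}_{eq2}$ with $\vec{f}_i$ and $\xi(\vec{f}_{eq2})$ with $\xi(\vec{f}_i)$
13:	Elseif $\xi(\vec{f}_i) < \xi(\vec{f}_{eq1})$ & $\xi(\vec{f}_i) < \xi(\vec{f}_{eq2})$ & $\xi(\vec{f}_i) > \xi(\vec{f}_{eq3})$
14:	Replace $\vec{f}_{eq3}$ with $\vec{f}_i$ and $\xi(\vec{f}_{eq3})$ with $\xi(\vec{f}_i)$
15:	Elseif $\xi(\vec{f}_i) < \xi(\vec{f}_{eq1})$ & $\xi(\vec{f}_i) < \xi(\vec{f}_{eq2})$ & $\xi(\vec{f}_i) < \xi(\vec{f}_{eq3})$ & $\xi(\vec{f}_i) > \xi(\vec{f}_{eq4})$
16:	Replace $\vec{f}_{eq4}$ with $\vec{f}_i$ and $\xi(\vec{f}_{eq4})$ with $\xi(\vec{f}_i)$
17:	End If
18:	End for
19:	$\vec{f}_{eq,avg} = (\vec{f}_{eq1} + \vec{f}_{eq2} + \vec{f}_{eq3} + \vec{f}_{eq4}) / 4$
20:	Construct the equilibrium pool $\vec{F}_{eq} = \{\vec{f}_{eq1}, \vec{f}_{eq2}, \vec{f}_{eq3}, \vec{f}_{eq4}, \vec{f}_{eq,avg}\}$
21:	Accomplish memory saving if $\ddot{I} > 1$
22:	Assign $t$ according to Eq. (15)
23:	For $i = 1 : n$
24:	Choose one candidate, randomly, from the equilibrium pool $\vec{F}_{eq}$
25:	Generate random vectors $\vec{r}$ and $\vec{\alpha}$ from Eq. (17)
26:	Construct $\vec{\omega}, \vec{\Omega}, \vec{G}_0, \vec{G}$ according to Eq. (19)
27:	Update concentration $\vec{f}$ according to Eq. (14)
28:	End for
29:	$\ddot{I} = \ddot{I} + 1$
30:	End While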
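
Steps 7 to 20 of Algorithm 2, the maintenance of the four equilibrium candidates and the construction of the equilibrium pool, can be sketched as below. This is an illustration of the listing as written, with a larger fitness treated as better; the concentration update of Eq. (14) is not included. The function names, the toy particles, and the toy fitness (the sum of the entries) are assumptions and do not reproduce the fitness of Eq. (12).

```python
def update_candidates(particles, fitness, candidates):
    """Steps 7-18: candidates is a list of four (vector, fitness) pairs, eq1..eq4."""
    for particle in particles:
        xi = fitness(particle)
        for rank in range(4):
            if xi > candidates[rank][1]:
                candidates[rank] = (list(particle), xi)
                break
            if xi == candidates[rank][1]:
                break  # the listing uses strict inequalities, so ties change nothing
    return candidates

def equilibrium_pool(candidates):
    """Steps 19-20: the four candidates plus their element-wise average."""
    vectors = [vector for vector, _ in candidates]
    average = [sum(values) / 4 for values in zip(*vectors)]
    return vectors + [average]

# Step 2: start every candidate with a small fitness value (0.0001).
candidates = [([0, 0, 0], 0.0001) for _ in range(4)]

# Hypothetical particles and fitness, for illustration only.
particles = [[1, 0, 0], [1, 1, 0], [1, 1, 1], [0, 0, 0], [0, 1, 0]]
candidates = update_candidates(particles, sum, candidates)
pool = equilibrium_pool(candidates)
print(candidates[0], pool[-1])
```

As in the listing, a particle that beats the first candidate simply overwrites it; the previous first candidate is not shifted down to the second slot.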

37 in total

1. Reducing the dimensionality of data with neural networks.

Authors: G E Hinton; R R Salakhutdinov
Journal: Science Date: 2006-07-28 Impact factor: 47.728

2. A prediction model of outcome of SARS-CoV-2 pneumonia based on laboratory findings.

Authors: Gang Wu; Shuchang Zhou; Yujin Wang; Wenzhi Lv; Shili Wang; Ting Wang; Xiaoming Li
Journal: Sci Rep Date: 2020-08-20 Impact factor: 4.379

3. Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR.

Authors: Victor M Corman; Olfert Landt; Marco Kaiser; Richard Molenkamp; Adam Meijer; Daniel Kw Chu; Tobias Bleicker; Sebastian Brünink; Julia Schneider; Marie Luisa Schmidt; Daphne Gjc Mulders; Bart L Haagmans; Bas van der Veer; Sharon van den Brink; Lisa Wijsman; Gabriel Goderski; Jean-Louis Romette; Joanna Ellis; Maria Zambon; Malik Peiris; Herman Goossens; Chantal Reusken; Marion Pg Koopmans; Christian Drosten
Journal: Euro Surveill Date: 2020-01

4. Comparison of deep learning approaches to predict COVID-19 infection.

Authors: Talha Burak Alakus; Ibrahim Turkoglu
Journal: Chaos Solitons Fractals Date: 2020-07-11 Impact factor: 5.944

5. Development of machine learning models to predict RT-PCR results for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in patients with influenza-like symptoms using only basic clinical data.

Authors: Thomas Langer; Martina Favarato; Riccardo Giudici; Gabriele Bassi; Roberta Garberi; Fabiana Villa; Hedwige Gay; Anna Zeduri; Sara Bragagnolo; Alberto Molteni; Andrea Beretta; Matteo Corradin; Mauro Moreno; Chiara Vismara; Carlo Federico Perno; Massimo Buscema; Enzo Grossi; Roberto Fumagalli
Journal: Scand J Trauma Resusc Emerg Med Date: 2020-12-01 Impact factor: 2.953

6. Application of Machine Learning in Diagnosis of COVID-19 Through X-Ray and CT Images: A Scoping Review.

Authors: Hossein Mohammad-Rahimi; Mohadeseh Nadimi; Azadeh Ghalyanchi-Langeroudi; Mohammad Taheri; Soudeh Ghafouri-Fard
Journal: Front Cardiovasc Med Date: 2021-03-25

7. Machine Learning Approaches in COVID-19 Diagnosis, Mortality, and Severity Risk Prediction: A Review.

Authors: Norah Alballa; Isra Al-Turaiki
Journal: Inform Med Unlocked Date: 2021-04-03

8. Exploration of prognostic factors for critical COVID-19 patients using a nomogram model.

Authors: Juan Li; Lili Wang; Chun Liu; Zhengquan Wang; Yi Lin; Xiaoqi Dong; Rui Fan
Journal: Sci Rep Date: 2021-04-14 Impact factor: 4.379

9. Diagnostic utility of clinical laboratory data determinations for patients with the severe COVID-19.

Authors: Yong Gao; Tuantuan Li; Mingfeng Han; Xiuyong Li; Dong Wu; Yuanhong Xu; Yulin Zhu; Yan Liu; Xiaowu Wang; Linding Wang
Journal: J Med Virol Date: 2020-04-10 Impact factor: 2.327

10. A predictive tool for identification of SARS-CoV-2 PCR-negative emergency department patients using routine test results.

Authors: Rohan P Joshi; Vikas Pejaver; Noah E Hammarlund; Heungsup Sung; Seong Kyu Lee; Al'ona Furmanchuk; Hye-Young Lee; Gregory Scott; Saurabh Gombar; Nigam Shah; Sam Shen; Anna Nassiri; Daniel Schneider; Faraz S Ahmad; David Liebovitz; Abel Kho; Sean Mooney; Benjamin A Pinsky; Niaz Banaei
Journal: J Clin Virol Date: 2020-06-10 Impact factor: 3.168