Literature DB >> 35077935

Detection of COVID-19 severity using blood gas analysis parameters and Harris hawks optimized extreme learning machine.

Jiao Hu1, Zhengyuan Han2, Ali Asghar Heidari3, Yeqi Shou4, Hua Ye5, Liangxing Wang6, Xiaoying Huang7, Huiling Chen8, Yanfan Chen9, Peiliang Wu10.   

Abstract

Coronavirus disease 2019 (COVID-19) has made the world more cautious about widespread viruses, and this tragic pandemic, caused by a novel coronavirus, has harmed human beings in recent years. The novel coronavirus pneumonia outbreak is spreading rapidly worldwide. We collected arterial blood samples from 51 patients with a COVID-19 diagnosis. Blood gas analysis was performed using a Siemens RAPID Point 500 blood gas analyzer. To accurately determine the factors that play a decisive role in the early recognition and discrimination of COVID-19 severity, a prediction framework based on an improved binary Harris hawks optimization (HHO) algorithm in combination with a kernel extreme learning machine is proposed in this paper. The method uses specular reflection learning to improve the original HHO algorithm and is referred to as HHOSRL. The experimental results show that the indicators selected by the proposed feature selection method, such as age, partial pressure of oxygen, oxygen saturation, sodium ion concentration, and lactic acid, are essential for the early, accurate assessment of COVID-19 severity. The simulation results show that the established methodology achieves promising performance. We believe that our proposed model provides an effective strategy for the accurate early assessment of COVID-19 and for distinguishing disease severity. The codes of HHO will be updated at https://aliasgharheidari.com/HHO.html.
Copyright © 2021 Elsevier Ltd. All rights reserved.

Entities:  

Keywords:  Blood; COVID-19; Coronavirus disease; Extreme learning machine; Feature selection; Harris hawk optimization

Mesh:

Year:  2021        PMID: 35077935      PMCID: PMC8701842          DOI: 10.1016/j.compbiomed.2021.105166

Source DB:  PubMed          Journal:  Comput Biol Med        ISSN: 0010-4825            Impact factor:   4.589


Introduction

The International Committee on Taxonomy of Viruses (ICTV) described a new virus, namely, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [1]. SARS-CoV-2 is believed to be the pathogen that causes viral pneumonia, which has caused a worldwide pandemic [2]. On February 11, 2020, the viral pneumonia caused by SARS-CoV-2 was named coronavirus disease 2019 (COVID-19) by the World Health Organization (WHO) [3]. The case fatality ratio (CFR) of COVID-19 is significantly lower than those of severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS) [4,5]. However, COVID-19's long incubation period of approximately two weeks and the presence of asymptomatic infections increase the risk of infection and promote its spread [6]. As of 6:18 pm CET, 4 January 2022, according to WHO data, COVID-19 has spread globally, with more than 290,959,019 diagnosed cases in various countries and more than 5,446,753 related deaths (https://www.who.int/). The most common clinical manifestations of 2019-nCoV infection include fever, dry cough, dyspnea, chest pain, myalgia, and fatigue [7]. According to a previous clinical study, 20.7%–31.4% of COVID-19 patients developed a severe form of this disease, such as adult respiratory distress syndrome (ARDS) [[8], [9]]. Furthermore, 4.9%–11.5% of patients with COVID-19 needed advanced life support techniques in the intensive care unit (ICU) [8]. The main characteristics of severe COVID-19 are rapid progression, ARDS, multiple organ dysfunction (MOF), and a high fatality rate [10]. According to previous reports, elderly patients with comorbidities are more susceptible to COVID-19 infection [11,12]. Moreover, the mortality rate of elderly patients (>65 years old) with comorbidities and ARDS is significantly increased [13]. The rapid progression of COVID-19 places substantial strain on health care systems and hospital critical care resources.
Severe COVID-19 patients may experience rapid deterioration if they are not treated in the ICU in a timely manner [14]. Therefore, it is vital to conduct accurate and frequent clinical assessments [[8], [9], [15], [16], [17]]. However, because resources are stretched thin and experts lack experience and prior knowledge of the disease, accurate and frequent assessment is not easy. Effective novel prognostic model systems that help monitor changes in the condition of COVID-19 patients can guide the effective use of hospital resources. Recently, machine learning methods have been used by many medical workers to help solve medical problems. According to many studies, machine learning (ML) techniques and artificial intelligence (AI) methods have been widely implemented in the diagnosis of COVID-19 [[18], [19], [20], [21]], and many studies have applied swarm intelligence methods to COVID-19 image segmentation [22,23]. Pham et al. [24] trained convolutional neural networks (CNNs) to fine-tune COVID-19 detection in chest slices. Canayaz [25] used deep learning models such as AlexNet, VGG19, GoogleNet, and ResNet to conduct feature extraction and selected optimal potential features with two meta-heuristic algorithms, namely, binary particle swarm optimization and binary gray wolf optimization. Al-Falluji et al. [26] utilized X-ray images for deep learning and retrieved essential biomarkers related to COVID-19 disease detection. Shaban et al. [27] combined a fuzzy inference engine with deep neural networks (DNNs) in a new hybrid diagnostic strategy (HDS) for classifying newly examined individuals as infected or not. Sun et al. [28] proposed an adaptive feature selection guided deep forest (AFS-DF) approach based on chest CT images for COVID-19 classification. Shaban et al.
[29] introduced a novel COVID-19 patient detection strategy (CPDS) based on hybrid feature selection and an enhanced KNN classifier, which significantly improved the diagnostic accuracy. Dey et al. [30] built various machine learning models for predicting protein–protein interactions (PPIs) between COVID-19 and human proteins and further validated them with biological experiments. Abraham et al. [31] combined multiple CNN-extracted features with a correlation-based feature selection (CFS) technique and a Bayesian classifier for COVID-19 prediction. Liu et al. [32] developed and validated a complete machine learning framework for distinguishing COVID-19 from general pneumonia (GP) in chest CT images. Tuncer et al. [33] proposed a novel intelligent computer vision method for automatic detection of the COVID-19 virus and conducted 10-fold cross-validation with an SVM classifier, which showed a classification accuracy of 100.0%. Casiraghi et al. [34] proposed an interpretable machine learning system that provides simple decision criteria for clinicians to assess patient risk. Novitasari et al. [35] used convolutional neural network methods for feature extraction and support vector machines (SVMs) for classification to detect whether an examined patient was healthy, coronavirus-positive, or only had pneumonia. In this work, we propose a new machine learning framework, namely, a swarm intelligence augmented kernel extreme learning machine (KELM) [30], for predicting the severity of COVID-19. Notably, for the first time, binary Harris hawks optimization (HHO), improved using specular reflection learning, is used in combination with the KELM classifier for feature selection. The experimental and simulation results demonstrate the superior performance of the method and show that bHHOSRL_KELM performs well in determining which factors play a decisive role in the diagnosis of COVID-19.
The proposed bHHOSRL_KELM was compared with other classifiers, such as fuzzy k-nearest neighbors (FKNN), k-nearest neighbors (KNN), multilayer perceptron (MLP), and support vector machines (SVM), in terms of four classification metrics, namely, classification accuracy, sensitivity, specificity, and the Matthews correlation coefficient (MCC). In addition, nine other feature selection methods based on swarm intelligence algorithms were used to evaluate the performance of the developed bHHOSRL_KELM based on the fitness values throughout the iterative process. The experimental results show that the developed bHHOSRL_KELM has high predictive performance. The main contributions of this study are as follows: (1) an efficient diagnostic aid for COVID-19 based on blood specimens was developed; (2) a promising hybrid model was proposed, in which the potential of KELM is enhanced by an improved HHO method; and (3) the bHHOSRL_KELM feature selection method was used to effectively identify the most critical features. The remainder of this paper is organized as follows. The first section describes the current state of research on novel coronaviruses and the application of machine learning methods to novel coronavirus research. The second section presents the dataset used in this paper. The third section presents the improved HHO method and its integration with KELM. The fourth section analyzes the proposed method and presents the experimental results on COVID-19. The fifth section discusses the proposed method and analyzes the final medical data results.

Materials and methods

Data collection

Our study was approved by the Ethics Committee of the Affiliated Yueqing Hospital of Wenzhou Medical University (Yueqing, China; protocol number 202000002) and complied with the Declaration of Helsinki. In this single-center retrospective study, our dataset consisted of clinical notes on 51 Chinese COVID-19 patients from a third-level grade-A hospital in eastern China between January 21 and March 10, 2020. The diagnosis was based on the positive detection of SARS-CoV-2 nucleic acid by reverse-transcription polymerase chain reaction (RT–PCR) testing on throat swab samples. We separated the COVID-19 patient dataset into severe COVID-19, which corresponded to 21 samples, and nonsevere COVID-19, which corresponded to 30 samples. For a diagnosis of severe COVID-19, the following requirements should be satisfied: (1) respiratory rate greater than 30/min (respiratory distress); (2) resting oxygen saturation lower than 93%; and (3) oxygenation index (OI) lower than 300 mmHg. We collected arterial blood samples from all 51 patients with a COVID-19 diagnosis. Blood gas analysis was performed using a Siemens RAPID Point 500 blood gas analyzer. The basic clinical information and 22 blood gas analysis parameters (features) are listed in Table 1. All continuous variables are presented as the mean ± standard deviation (SD) and were analyzed with SPSS Statistics 24.0. An independent-samples t-test was used to analyze the continuous variables (age and blood gas analysis parameters), and p < 0.05 was considered statistically significant. The statistical analysis results are presented in Table 2.
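As an illustration of the statistical test used here (an independent-samples t-test at p < 0.05), the following sketch uses SciPy rather than SPSS; the sampled values are synthetic, drawn only to mimic the PaO2 means and SDs reported in Table 2, not the actual patient data:

```python
from scipy import stats
import numpy as np

rng = np.random.default_rng(0)
severe = rng.normal(65.13, 12.45, 21)      # synthetic PaO2, severe group (n = 21)
nonsevere = rng.normal(103.73, 27.87, 30)  # synthetic PaO2, nonsevere group (n = 30)

res = stats.ttest_ind(severe, nonsevere)   # independent-samples t-test
print(f"t = {res.statistic:.2f}, p = {res.pvalue:.4f}")
significant = res.pvalue < 0.05            # the paper's significance threshold
```

With group means this far apart relative to their spread, the test rejects the null hypothesis of equal means, mirroring the p = 0.00 reported for PaO2 in Table 2.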
Table 1

List of the used features and their abbreviations [36].

No. | Feature                            | Abbreviation
F1  | Gender                             | Gender
F2  | Age                                | Age
F3  | Hydrogen ion concentration         | PH
F4  | Partial pressure of carbon dioxide | PaCO2
F5  | Partial pressure of oxygen         | PaO2
F6  | Oxygen saturation                  | SaO2%
F7  | Hemoglobin percentage              | Hb
F8  | Oxyhemoglobin percentage           | HbO2%
F9  | Carboxyhaemoglobin percentage      | COHb%
F10 | Deoxyhemoglobin percentage         | DeOxyHb%
F11 | Methaemoglobin percentage          | MetHb%
F12 | Potassium ion concentration        | K+
F13 | Sodium ion concentration           | Na+
F14 | Chloride ion concentration         | Cl
F15 | Calcium ion concentration          | Ca2+
F16 | Glucose concentration              | GLU
F17 | Lactic acid                        | LAC
F18 | Anion gap                          | AG
F19 | Buffer bases                       | BB
F20 | Bases excess                       | BE
F21 | Standard bicarbonate               | SB
F22 | Actual bicarbonate                 | AB
Table 2

Comparison of age and blood gas analysis indices between severe COVID-19 and nonsevere COVID-19 [36].

Index    | Severe COVID-19 (n = 21) | Nonsevere COVID-19 (n = 30) | p value
Age      | 61.43 ± 17.64            | 42.30 ± 11.53               | 0.00
PH       | 7.46 ± 0.34              | 7.43 ± 0.32                 | 0.01
PaCO2    | 32.10 ± 4.20             | 37.55 ± 4.51                | 0.00
PaO2     | 65.13 ± 12.45            | 103.73 ± 27.87              | 0.00
SaO2%    | 92.73 ± 4.20             | 98.03 ± 1.00                | 0.00
Hb       | 13.50 ± 1.98             | 14.41 ± 2.16                | 0.13
HbO2%    | 91.58 ± 4.28             | 96.51 ± 0.97                | 0.00
COHb%    | 1.01 ± 0.27              | 1.02 ± 0.26                 | 0.84
DeOxyHb% | 6.84 ± 4.29              | 1.95 ± 0.98                 | 0.00
MetHb%   | 0.51 ± 0.28              | 0.52 ± 0.18                 | 0.97
K+       | 3.36 ± 0.42              | 3.32 ± 0.30                 | 0.68
Na+      | 131.00 ± 3.72            | 136.23 ± 2.74               | 0.00
Cl       | 103.33 ± 3.37            | 107.37 ± 3.87               | 0.00
Ca2+     | 1.08 ± 0.06              | 1.12 ± 0.03                 | 0.01
GLU      | 10.29 ± 3.58             | 8.16 ± 2.88                 | 0.02
LAC      | 2.35 ± 1.09              | 1.70 ± 0.62                 | 0.01
AG       | 4.86 ± 2.20              | 4.28 ± 1.41                 | 0.25
BB       | −0.24 ± 2.82             | 0.93 ± 2.48                 | 0.29
BE       | −0.86 ± 3.09             | 0.77 ± 2.79                 | 0.06
SB       | 24.02 ± 2.55             | 25.29 ± 2.15                | 0.06
AB       | 22.56 ± 3.21             | 24.58 ± 2.72                | 0.02

Methods

In this study, the improved HHO with the specular reflection learning (HHOSRL) mechanism is used as the learning algorithm in a wrapper-based feature selection method, and the binary HHOSRL method is used as a feature selection tool to identify the critical features and evaluate the feature subsets using the KELM model. HHO [37] is a swarm algorithm that was proposed by Heidari et al. in 2019 within the class of bio-inspired methods [38], and it has proven competitive with many other recent algorithms [[39], [40], [41], [42]], such as the colony predation algorithm (CPA) [43]. HHO's distinctive feature is that Harris hawks cooperate in groups to chase prey and adjust the chase pattern according to the dynamics of the situation and the escape pattern of the prey. Since its introduction, HHO has been applied to solve many problems, such as parameter identification of photovoltaic cells and modules [[44], [45], [46], [47]], feature selection [48,49], optimization of machine learning models [21,50], engineering design problems [47,[51], [52]], web service composition [52], bankruptcy prediction [53], and multi-objective problems [54].

Exploration phase

Harris hawks perch at random in a specified location and find their prey through two strategies, which are expressed mathematically in Eq. (1):

X(t+1) = X_rand(t) − r1·|X_rand(t) − 2·r2·X(t)|,                 q ≥ 0.5
X(t+1) = (X_prey(t) − X_m(t)) − r3·(LB + r4·(UB − LB)),          q < 0.5        (1)

where X(t) and X(t+1) are the positions of an individual at the current and next iterations, respectively; t is the number of iterations; X_rand(t) is the position of a randomly selected individual; and X_prey(t) is the prey position, namely, the global optimum of the current evaluation. r1, r2, r3, r4, and q are random numbers in [0,1], where q is used to select the strategy randomly; LB and UB are the lower and upper bounds of the search space; and X_m(t) = (1/N)·Σ_{k=1..N} X_k(t) is the average individual position, where X_k(t) is the position of the kth individual in the population and N is the population size.
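A minimal sketch of this exploration update (not the authors' implementation; the population size, dimensionality, and bounds below are illustrative):

```python
# Sketch of the HHO exploration update of Eq. (1): with probability ~0.5 a hawk
# moves relative to a random peer, otherwise relative to the prey (best-so-far)
# and the population mean.
import numpy as np

def hho_explore(X, X_prey, lb, ub, rng):
    """X: (N, d) population; X_prey: (d,) best position; lb/ub: scalar bounds."""
    N, d = X.shape
    X_new = np.empty_like(X)
    X_mean = X.mean(axis=0)                      # average individual position
    for i in range(N):
        q, r1, r2, r3, r4 = rng.random(5)
        X_rand = X[rng.integers(N)]              # randomly selected hawk
        if q >= 0.5:                             # perch w.r.t. a random peer
            X_new[i] = X_rand - r1 * np.abs(X_rand - 2 * r2 * X[i])
        else:                                    # perch w.r.t. prey and mean
            X_new[i] = (X_prey - X_mean) - r3 * (lb + r4 * (ub - lb))
    return np.clip(X_new, lb, ub)                # keep hawks inside the bounds

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (20, 22))                 # e.g. 20 hawks, 22 features
X_next = hho_explore(X, X[0], -1.0, 1.0, rng)
```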

Conversion between exploration and exploitation

The HHO algorithm shifts between exploration and various exploitation behaviors based on the prey's escape energy E, which is defined as follows:

E = 2·E0·(1 − t/T)

where E0 is the initial energy of the prey, which is a random number in [−1,1] that is updated automatically at each iteration; t is the number of iterations; and T is the maximum number of iterations. The exploration phase is entered when |E| ≥ 1, and the exploitation phase is entered when |E| < 1.
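The escape-energy schedule can be sketched as follows; since |E| ≤ 2·(1 − t/T), later iterations increasingly favor exploitation:

```python
# Sketch of the escape-energy schedule E = 2*E0*(1 - t/T), with E0 drawn
# from [-1, 1] at each iteration. |E| >= 1 triggers exploration, |E| < 1
# triggers exploitation.
import random

def escape_energy(t, T, rng=random):
    E0 = 2 * rng.random() - 1          # initial prey energy in [-1, 1]
    return 2 * E0 * (1 - t / T)

random.seed(0)
T = 50                                 # max iterations, as in the experiments
phases = ["explore" if abs(escape_energy(t, T)) >= 1 else "exploit"
          for t in range(T)]
```

At t = T − 1 the envelope shrinks to |E| ≤ 2/T, so the final iterations can only exploit.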

Exploitation phase

Define r as a random number in [0,1] for choosing the exploitation strategy. When |E| ≥ 0.5 and r ≥ 0.5, a soft besiege strategy is used to update the position:

X(t+1) = ΔX(t) − E·|J·X_prey(t) − X(t)|

where ΔX(t) = X_prey(t) − X(t) represents the difference between the prey position and the individual's current position and J = 2·(1 − r5) is a random jump strength in [0,2], with r5 a random number in [0,1]. When |E| < 0.5 and r ≥ 0.5, a hard besiege strategy is used to update the position:

X(t+1) = X_prey(t) − E·|ΔX(t)|

When |E| ≥ 0.5 and r < 0.5, a soft besiege strategy with progressive rapid dives is used to update the position:

Y = X_prey(t) − E·|J·X_prey(t) − X(t)|,    Z = Y + S × LF(D)
X(t+1) = Y if F(Y) < F(X(t)), otherwise Z if F(Z) < F(X(t))

where F is the fitness function; S is a D-dimensional random vector with elements that are random numbers in [0,1]; and LF(D) is the mathematical expression for Levy flight. When |E| < 0.5 and r < 0.5, a hard besiege strategy with progressive rapid dives is used, which follows the same scheme but dives toward Y = X_prey(t) − E·|J·X_prey(t) − X_m(t)|.
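The four besiege branches, keyed on |E| and r as described above, can be sketched as follows (a simplified illustration: the rapid-dive branches here return only the first trial point Y and omit the Levy-flight comparison):

```python
# Sketch of the four HHO besiege strategies. J = 2*(1 - r5) is the random jump
# strength in [0, 2]; a full implementation would also generate the Levy-flight
# point Z and keep whichever of Y, Z improves the fitness.
import numpy as np

def besiege(x, x_prey, x_mean, E, r, rng):
    J = 2 * (1 - rng.random())                       # jump strength in [0, 2]
    dx = x_prey - x                                  # prey-minus-hawk difference
    if r >= 0.5 and abs(E) >= 0.5:                   # soft besiege
        return dx - E * np.abs(J * x_prey - x)
    if r >= 0.5 and abs(E) < 0.5:                    # hard besiege
        return x_prey - E * np.abs(dx)
    if abs(E) >= 0.5:                                # soft besiege, rapid dive
        return x_prey - E * np.abs(J * x_prey - x)   # dive point Y only
    return x_prey - E * np.abs(J * x_prey - x_mean)  # hard besiege, rapid dive

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 22)                           # one hawk, 22 dimensions
y = besiege(x, np.zeros(22), np.full(22, x.mean()), E=0.3, r=0.7, rng=rng)
```

With E = 0.3 and r = 0.7 the hard-besiege branch fires, pulling the hawk toward the prey at the origin.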

Proposed binary HHOSRL

In this paper, we address a binary optimization case, which corresponds to whether a feature is selected or not. In a discrete binary space, a solution is restricted to two values (0 and 1). For our case, an upgraded binary HHO is proposed. In this study, we represent a solution as a d-dimensional vector, where d is the number of attributes of the dataset. A value of 1 indicates that the corresponding attribute in the d-dimensional sample is selected, whereas a value of 0 indicates that the attribute is not selected. Updating equations such as Eq. (1), Eq. (3), Eq. (4), Eq. (5), and Eq. (8) cannot handle binary optimization tasks directly, since their solutions are not restricted to the two values "0" and "1". To overcome this problem, the method discretizes the updated hawks' position vector into a binary vector: a dimension is set to 1 if a random number in [0,1] is less than the value of a sigmoid transfer function applied to the updated continuous position in that dimension at the t-th iteration, and to 0 otherwise. To further explore more combinations of attributes, this paper then mutates the solution using a variation operator based on specular reflection learning (SRL): the binary value of the j-th dimension of the i-th individual in the t-th iteration is reflected into a neighborhood whose extent is controlled by a variable computed via Eq. (14) from two random numbers that are uniformly distributed between 0 and 1 and a neighborhood radius r, which is also called the elasticity factor. From Eq. (14), if the first random number is greater than the second, the mutated value appears in the right neighborhood of the original value; otherwise, it appears in the left neighborhood. Therefore, it is reasonable to restrict the control variable to [0,1] to balance the search space of the left and right neighbors. Additionally, r is also set within [0,1], so that all possible values within a radius-r neighborhood are reachable. Procedure of HHOSRL
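The sigmoid-based discretization described above can be sketched as follows (an illustrative implementation; the SRL mutation of Eq. (14) is omitted because its exact form is specific to the paper):

```python
# Sketch of sigmoid binarization for binary HHO: squash each continuous
# coordinate to (0, 1) with a sigmoid transfer function, then compare against
# a uniform random number to produce a 0/1 feature mask.
import numpy as np

def binarize(x_continuous, rng):
    s = 1.0 / (1.0 + np.exp(-x_continuous))      # transfer function S(x)
    return (rng.random(x_continuous.shape) < s).astype(int)

rng = np.random.default_rng(3)
mask = binarize(rng.normal(size=22), rng)        # 22 candidate features
selected = np.flatnonzero(mask)                  # indices of chosen features
```

Each agent's mask then directly encodes a candidate feature subset: a 1 keeps the feature, a 0 drops it.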

Classification based on KELM

KELM

The ELM is a special kind of single-hidden-layer feedforward neural network with three separate layers: input, hidden, and output. For a dataset of N training samples (x_j, t_j), when the number of hidden-layer nodes of the ELM is L and the excitation function is g(·), the network output is

Σ_{i=1..L} β_i·g(ω_i·x_j + b_i) = o_j,    j = 1, …, N        (15)

where β_i is the vector of weights between the i-th hidden-layer node and the output-layer nodes. ELM differs completely from traditional iterative learning algorithms in that it randomly selects the input weights ω and biases b of the hidden-layer nodes before analytically calculating the least-squares solution of the output weights β. Reducing the training error and improving the generalization capability are the goals of these calculations. Eq. (15) is expressed in a compact form according to ELM theory as Hβ = T. After the excitation function and the number of hidden-layer nodes are established, the following three steps are conducted to train the ELM on a training dataset: (1) randomly generate the input weights ω_i and biases b_i, i = 1, …, L; (2) calculate the output matrix H of the hidden layer; and (3) calculate the output weight matrix β = H†T, where H† is the Moore–Penrose generalized inverse of the hidden-layer output matrix H. When H^T·H is nonsingular, H† = (H^T·H)^{−1}·H^T. To eliminate errors caused by an ill-conditioned matrix, following the strategy of ridge regression, a regularization factor C is introduced, and the least-squares solution of the output weights of the network becomes

β = H^T·(I/C + H·H^T)^{−1}·T        (18)

Therefore, the corresponding ELM output function is f(x) = h(x)·H^T·(I/C + H·H^T)^{−1}·T. If the feature mapping function h(x) is not known, a new kernel-based ELM (KELM) method can be formed by introducing a kernel function into the ELM. In the KELM method, we define the kernel matrix Ω_ELM = H·H^T, which has the following elements:

Ω_ELM(i, j) = h(x_i)·h(x_j) = K(x_i, x_j)        (19)

Then, using Eq. (18), the network output can be expressed as follows:

f(x) = [K(x, x_1), …, K(x, x_N)]·(I/C + Ω_ELM)^{−1}·T        (20)

In Eq. (20), the radial basis function (RBF) kernel is selected as the kernel function K:

K(x_i, x_j) = exp(−γ·‖x_i − x_j‖²)        (21)

where γ is the kernel parameter of the RBF kernel function.
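A minimal KELM sketch along the lines of the equations above (our own toy implementation, not the authors' code; the cluster data, C, and γ values are illustrative):

```python
# Minimal KELM: build the RBF kernel matrix Omega, solve the regularized
# system (I/C + Omega) * alpha = T, and predict via kernel evaluations
# against the stored training set.
import numpy as np

def rbf_kernel(A, B, gamma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    return np.exp(-gamma * d2)

class KELM:
    def __init__(self, C=1.0, gamma=0.17):
        self.C, self.gamma = C, gamma

    def fit(self, X, T):
        self.X = X
        omega = rbf_kernel(X, X, self.gamma)               # Omega_ELM = H H^T
        self.alpha = np.linalg.solve(np.eye(len(X)) / self.C + omega, T)
        return self

    def predict(self, Xq):
        return rbf_kernel(Xq, self.X, self.gamma) @ self.alpha

# toy usage: two well-separated clusters labeled -1 / +1
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(-2, 0.3, (20, 2)), rng.normal(2, 0.3, (20, 2))])
y = np.array([-1.0] * 20 + [1.0] * 20)
clf = KELM(C=10).fit(X, y)
pred = np.sign(clf.predict(X))
```

Note that training reduces to one linear solve of size N×N; there is no iterative weight tuning, which is why KELM pairs well with a wrapper that retrains it many times.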
These two key parameters, C and γ, have been shown to have a large impact on the performance of KELM on many problems, such as second major prediction [55], medical diagnosis [[56], [57], [58], [59], [60], [61]], financial stress prediction [62,63], and recognition of foreign fibers in cotton [64].

Proposed HHOSRL-KELM

To find more representative attributes in the dataset to support medical diagnosis, this paper uses HHOSRL-KELM as a feature selection method to choose the optimal feature subset. First, we use HHOSRL as the optimization algorithm to find the optimal subset; after finding the optimal features, we use KELM as the classifier for the classification task. Fig. 1 shows the HHOSRL-KELM flowchart, which illustrates the process of finding key factors in the novel coronavirus pneumonia blood samples. The main steps of the HHOSRL-KELM method are described below. The fitness of a candidate feature subset is defined as

fitness = α × Err_KELM + β × (|S| / |D|)

where Err_KELM is the classification error rate of KELM, |D| is the number of features in the data sample, |S| is the number of features in the selected feature subset, and α and β are two weights that reflect the importance of the classification error rate and the length of the selected feature subset, respectively. In this paper, we set α = 0.99 and β = 0.01, which are values that are commonly used in many works [48,49].
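The wrapper fitness weighting can be sketched as follows (a minimal illustration; the weights α = 0.99 and β = 0.01 come from the text, the function name is ours):

```python
# Wrapper fitness: fitness = a * error_rate + b * (n_selected / n_total),
# with a = 0.99 and b = 0.01, so classification error dominates and the
# subset-size term breaks ties in favor of smaller subsets.
def fitness(error_rate, n_selected, n_total, a=0.99, b=0.01):
    return a * error_rate + b * (n_selected / n_total)

# a perfect classifier using 5 of 22 features beats one using all 22
f_small = fitness(0.0, 5, 22)
f_all = fitness(0.0, 22, 22)
```

This weighting explains the behavior seen later in the experiments: among equally accurate subsets, the optimizer converges to the smallest one.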
Fig. 1

Flowchart of the HHOSRL-KELM

Step 1. Initialize the parameters of the HHOSRL method, such as the population size, search space boundary, variance probability, maximum number of iterations, and initial escape energy.
Step 2. Randomly initialize the binary population of Harris hawks.
Step 3. Use the agent's binary value in each dimension to represent the subset selection of the dataset (1 indicates that the feature is selected, and 0 indicates that it is not selected).
Step 4. Calculate the fitness value of the selected feature subset for each hawk.
Step 5. Update the population of agents according to the HHOSRL algorithm.
Step 6. Select the individual with the smallest fitness value as the optimal solution.
Step 7. Determine whether the termination condition has been reached, namely, whether the maximum number of iterations has been reached. If yes, proceed to the next step; otherwise, repeat from Step 3 until the termination condition is met.
Step 8. Return the final optimal solution as the selected feature subset.
Step 9. Use the final feature subset as the input of KELM to obtain the final classification result.
Step 10. Using the classification results obtained in Step 9, calculate the classification accuracy, number of selected features, sensitivity, specificity, and other evaluation criteria.

Experiments and analysis

Experimental setup

To verify the effectiveness of the proposed HHOSRL-KELM method, the traditional bHHO, bGWO, bMFO, bWOA, and BPSO were used for comparison. bHHO is a binary HHO algorithm obtained by discretizing the original HHO algorithm with the sigmoid method. bGWO, bMFO, bWOA, and BPSO are feature selection algorithms that were proposed by various scholars and perform well on UCI datasets. Additionally, to further evaluate the performance of HHOSRL, experiments were also conducted on a practical dataset to evaluate the performance of HHOSRL in combination with various classifiers. The blood samples were normalized to the range [−1,1] before the classification model was developed. For comparison purposes, all implementations used the same simulation parameters, following the rules for fair comparisons in machine learning [65]. The maximum number of iterations and the population size were set to 50 and 20, respectively. The other parameters of WOA, MFO, and GWO were set to the values used in the original papers [[66], [67], [68]]. Classification performance was evaluated using 10-fold cross-validation (CV) to obtain objective results. Additionally, to evaluate the performance of HHOSRL-KELM, we considered five commonly used evaluation criteria: classification accuracy (ACC), specificity, sensitivity, number of selected features, and MCC [69].
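The preprocessing described above can be sketched as follows (min-max scaling to [−1,1] plus a simple 10-fold index split; a pure-NumPy stand-in, not the authors' pipeline):

```python
# Per-feature min-max scaling to [-1, 1], then 10-fold cross-validation
# index splits over the 51 samples.
import numpy as np

def scale_to_unit_interval(X):
    lo, hi = X.min(axis=0), X.max(axis=0)
    return 2 * (X - lo) / (hi - lo) - 1      # each column mapped to [-1, 1]

rng = np.random.default_rng(5)
X = rng.uniform(0, 200, (51, 22))            # synthetic: 51 patients, 22 features
Xs = scale_to_unit_interval(X)

idx = rng.permutation(len(Xs))               # shuffled sample indices
folds = np.array_split(idx, 10)              # 10 CV folds
for k, test_idx in enumerate(folds):
    train_idx = np.hstack([f for j, f in enumerate(folds) if j != k])
    assert len(train_idx) + len(test_idx) == len(Xs)   # folds partition the data
```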

Performance metrics

We utilized four common metrics based on the confusion matrix to validate the efficacy of the classifier. Full definitions of these metrics are provided in Refs. [70,71]; here, we present their formulas to avoid discussion outside the scope of this study:

ACC = (TP + TN) / (TP + TN + FP + FN)
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
MCC = (TP × TN − FP × FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN))

where all variables are as defined in Ref. [71]. In addition, the MCC was used to carefully evaluate the classifier's performance because it provides an objective predictive evaluation [72].
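As a concrete illustration, the four confusion-matrix metrics can be computed as follows (standard definitions; the example counts are hypothetical, not the paper's results):

```python
# ACC, sensitivity, specificity, and MCC from confusion-matrix counts.
import math

def metrics(tp, tn, fp, fn):
    acc = (tp + tn) / (tp + tn + fp + fn)
    sens = tp / (tp + fn)                 # true positive rate
    spec = tn / (tn + fp)                 # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, sens, spec, mcc

# hypothetical counts on a 51-sample split (21 severe, 30 nonsevere)
acc, sens, spec, mcc = metrics(tp=20, tn=29, fp=1, fn=1)
```

A perfect classifier (no false positives or negatives) yields an MCC of exactly 1, which is why MCC is a stricter summary than accuracy alone on imbalanced data.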

Analysis of the experimental results

It is well known that classification methods play an important role in the performance of wrapper-based feature selection approaches [20,[73], [74], [75], [76], [77], [78]]. Therefore, to evaluate the performance of the proposed HHOSRL method, it was combined with five classification methods, namely, SVM, KNN, FKNN, MLP, and KELM, and applied to feature selection on the blood samples. To evaluate the reliability of the results, we analyzed these five feature selection methods on the blood data in terms of four aspects: MCC, ACC, sensitivity, and specificity. The parameter settings of the five classification methods are presented in Table 3.
Table 3

Parameter settings for the five methods.

Method        | Parameter values
bHHOSRL_FKNN  | K = 1, m = 2
bHHOSRL_SVM   | C = 850, γ = 0.17
bHHOSRL_KNN   | K = 1
bHHOSRL_MLP   | C = 88, γ = 1024
bHHOSRL_KELM  | M = 1
Fig. 2 shows error bar graphs for the five methods on these four criteria. The error bars represent the standard error and are used to visualize the magnitude of the standard deviation around the mean. It can easily be observed from the graph that all five classification methods achieve a result of 1 on these criteria and that the errors are all 0. This is sufficient to prove that HHOSRL has strong learning performance and can accurately find the globally optimal solution. It also shows that HHOSRL has strong stability.
Fig. 2

Comparison of HHOSRL on five well-known classifiers.

In addition to these four evaluation criteria, to compare the advantages and disadvantages of HHOSRL with these five classifiers, the five feature selection algorithms were also analyzed in terms of time consumption, fitness value, size of the selected feature subset, and classification error rate. Fig. 3 is a box plot, which is used to inspect possible outliers in the data; the blue points in the graph are outliers. The top and bottom edges of each box represent the upper and lower quartiles, and the horizontal line in the middle represents the median, which reflects the distribution of the experimental results. As shown in the graph, all the time consumptions are relatively small except for that of bHHOSRL_MLP. All the algorithms have small fitness values except for bHHOSRL_SVM, whose fitness values are large and not concentrated. All the algorithms have classification error rates of 0. Finally, in terms of the size of the feature subset, bHHOSRL_SVM does not perform very well, and the values for bHHOSRL_FKNN and bHHOSRL_KNN are relatively large. The most stable subset size is obtained by bHHOSRL_KELM. Therefore, we finally selected bHHOSRL_KELM as the wrapper method. The convergence plot of the average fitness value over 50 iterations in Fig. 4 shows that bHHOSRL_KELM has the fastest convergence speed and the smallest convergence value.
Fig. 3

Boxplot of the classification performances of the four methods in terms of time, fitness, error, and size.

Fig. 4

Convergence evolution trends of the five methods.

After determining the classification method, we used bHHOSRL_KELM as the final feature selection method and compared it with eight feature selection methods proposed by other scholars and with the feature selection method obtained by discretizing the original HHO. The performance of bHHOSRL_KELM was evaluated in terms of MCC, ACC, sensitivity, and specificity, which are four commonly agreed-upon classification metrics, as well as in terms of time consumption, classification error rate, and convergence of the fitness values. To evaluate the impact of the feature selection component and the efficiency of the developed HHO-based core in the bHHOSRL_KELM technique, we compared it with the method without feature selection. The method was compared with a set of well-regarded classifiers, such as the classification and regression tree (CART), BP neural network, extreme learning machine (ELM), and ensemble methods, including AdaBoostM1 and random forest (RF). We implemented the BP algorithm, CART, RF, and AdaBoostM1 using the built-in classifiers of the MATLAB toolbox. The ELM implementation is available at http://www.ntu.edu.sg/eee/icis/cv/egbhuang.htm. We present the results of the six classifiers in Fig. 5. As shown in Fig. 5, the proposed bHHOSRL_KELM method with feature selection outperforms the basic classifiers without the HHO-based strategy. In addition, the bHHOSRL_KELM technique is the best performing model on the blood dataset in terms of the four performance metrics.
Fig. 5

Comparison of bHHOSRL_KELM with well-known classifiers.

Fig. 5 shows that among the six classification methods, bHHOSRL_KELM performs the best, and AdaBoost ranks second. Pure ELM does not perform best on the blood dataset. This shows that the HHOSRL algorithm in bHHOSRL_KELM can effectively compensate for the disadvantages of a simple classifier and achieve better results. To further evaluate the effectiveness of the developed bHHOSRL_KELM on the blood dataset, bHHOSRL_KELM was compared with common algorithms, namely, bGWO, bMFO, BGSA, bALO, BPSO, BSSA, bWOA, and BBA. Fig. 6 compares these ten algorithms on the four evaluation criteria. As shown in the figure, bGWO, BGSA, and BPSO perform poorly in terms of ACC and MCC. Four algorithms, namely, bMFO, bALO, BBA, and BSSA, reach the maximum value of 1 in most cases on these four criteria; however, their error bars show considerable variability. bHHO does not reach the optimum on any of the four criteria. Finally, the error bar graph of bHHOSRL_KELM shows that its overall accuracy is the highest and its error is the smallest; hence, bHHOSRL_KELM performs best among these ten feature selection methods.
Fig. 6

Comparison results of 10 algorithms on four classification criteria.

To fully analyze the performance of the bHHOSRL_KELM algorithm, we also compared the algorithms in terms of classification error rate and time consumption. Fig. 7 shows that the classification error rates are larger for bMFO, BBA, BSSA, and bWOA. There are anomalous data points for bGWO, BPSO, bALO, and bHHO in the tenfold cross-validation, which indicates that these algorithms are not very stable. In contrast, bHHOSRL_KELM always attains a value of 0 for both the overall distribution and the anomalies. Therefore, bHHOSRL_KELM has both higher stability and higher classification accuracy. An analysis of an algorithm's classification performance must also consider the time consumption: bHHO is the least time-consuming, and bHHOSRL_KELM is the second most time-consuming. Overall, the classification performance of bHHOSRL_KELM is the best.
Fig. 7

Boxplot of the performances of the ten methods in terms of error and time consumption.

Finally, the results of the feature selection were analyzed. To determine whether the subset of features selected by bHHOSRL_KELM is substantially helpful for medical diagnosis, tenfold cross-validation of the classification results was performed, and the number of times each feature was selected in each experiment was counted. The results are presented as a line graph in Fig. 8. According to the graph, age, PaO2, SaO2%, Na+, and LAC were selected the most times. PaO2 was selected 57 times; hence, PaO2 was the most important factor in assessing the severity of COVID-19. The next most important attribute was SaO2%, which was selected ten times, suggesting that SaO2% also plays a role in the final diagnosis. Age and Na+ were each selected nine times, indicating that age and Na+ can facilitate the final diagnosis. Finally, LAC was selected eight times, showing that LAC is also a factor that cannot be ignored in diagnosis. Fig. 9 shows the convergence graphs of these ten algorithms, which were used to analyze the convergence accuracy and convergence speed of bHHOSRL_KELM. As shown in the graph, bHHOSRL_KELM performs the best in terms of both convergence speed and convergence accuracy, and BGSA and bGWO perform the next best.
Fig. 8

Selected features by the bHHOSRL_KELM during the 10-fold CV procedure.

Fig. 9

Convergence evolution trends of ten methods.

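The selection counts plotted in Fig. 8 can be obtained by tallying, over all cross-validation runs, how often each feature appears in the best subset. A minimal sketch follows; the per-fold lists below are hypothetical illustrations, not the paper's data, though the feature names match the paper's indicators.

```python
from collections import Counter

# Hypothetical per-fold best subsets (illustrative only); feature names
# follow the paper's indicators: age, PaO2, SaO2%, Na+, LAC, ...
fold_selections = [
    ["age", "PaO2", "SaO2%"],
    ["PaO2", "Na+", "LAC"],
    ["age", "PaO2", "SaO2%", "LAC"],
]

counts = Counter()
for selected in fold_selections:
    counts.update(selected)

# Most frequently selected features first, as ranked in Fig. 8
for feature, n in counts.most_common():
    print(feature, n)
```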

Discussion

In this study, a KELM-based feature selection method was used to screen a dataset of COVID-19 patients, and several key features were identified, including age, PaO2, SaO2%, Na+, and LAC. Subsequently, the bHHOSRL_KELM model was constructed to accurately assess the severity of COVID-19 and monitor its early progression. Therefore, we believe that the bHHOSRL_KELM model can support more accurate clinical decision-making. Due to its strong optimization capability, the proposed HHOSRL algorithm can also be applied to other problems, such as video deblurring [79], microgrid planning [80], information retrieval services [[81], [82], [83]], image dehazing [84], image fusion [85], kayak cycle phase segmentation [86], human motion capture [87], fault detection [88], virus detection [89], video coding optimization [90], outlier detection [91], location-based services [92,93], image retrieval [94], multivariate time series analysis [95], and multi-objective problems [96].

Previous studies have shown that age is an important independent indicator for predicting the prognosis of SARS and MERS [97,98]. Smits et al. found that elderly macaques infected with SARS coronavirus showed stronger host innate responses than younger macaques, which may be related to the increased differential expression of inflammation-related genes and reduced expression of type I interferon β [99]. Similar to SARS and MERS, Zhou et al. found that the mortality rate of COVID-19 patients increases with age [100]. Through multivariate analysis, other studies have shown that in COVID-19 patients, advanced age is an independent risk factor for pneumonia; moreover, patients with ARDS are older than those without ARDS [101]. Furthermore, another study found that elderly patients have a higher probability of developing sepsis, which may be related to age-dependent defects in lymphocyte function and excessive production of type 2 cytokines [102].
In summary, age is an essential predictor of the prognosis of COVID-19 patients. SaO2 is the percentage of oxygen-bound hemoglobin (HbO2) relative to the total hemoglobin capable of binding oxygen in the blood, and PO2 is the tension produced by oxygen physically dissolved in the blood [103]. Xie et al. found that hypoxemia in COVID-19 patients is associated with mortality [104]. Hypoxia has been reported to increase angiotensin I-converting enzyme 2 (ACE-2) expression at the transcription and protein levels in human cells [105,106]. ACE2 is reported to be the receptor through which SARS-CoV-2 enters host cells; moreover, the binding affinity of the SARS-CoV-2 spike protein to the ACE2 receptor is 10–20 times higher than that of SARS-CoV [107]. The high affinity between ACE2 and the SARS-CoV-2 spike protein may explain some complications of COVID-19, including acute renal failure and cardiovascular and cerebrovascular diseases. Diffuse endotheliitis and microthrombus formation may be an important pathogenic mechanism of COVID-19 [108,109]. Therefore, hypoxia may aggravate the severity of COVID-19 by upregulating the target receptor for virus entry. Additionally, in the presence of microthrombi, hypoxemia will cause a higher degree of hypoxia and damage to peripheral tissues [110]. Based on the above considerations, SaO2 and PO2 may be powerful predictors of the severity of COVID-19.

Sodium ions (Na+), the most abundant cations in the extracellular fluid, are essential for maintaining the volume of the extracellular fluid, regulating the acid-base balance, maintaining normal osmotic pressure and cell physiological functions, and participating in the normal physiological activities of the nervous and muscular systems [111]. In early COVID-19 studies, the sodium ion concentration in the severe COVID-19 group was significantly lower than that in the nonsevere group [112,113].
In a study of 59 COVID-19 patients, Huang and colleagues found that the sodium ion concentration in the ICU group was lower than that in the non-ICU group [11]. Our study also found that the average sodium ion concentration in severe patients (131.00 mmol/L) was lower than that in nonsevere patients (136.23 mmol/L; p < 0.01). Therefore, the serum sodium ion concentration may also be an essential indicator for predicting severe COVID-19.

Lactic acid (LAC) is an intermediate product of sugar metabolism; increased glucose metabolism or decreased pyruvate metabolism will increase lactic acid production. In sepsis, the inflammatory response is associated with increased glycolysis and impaired pyruvate dehydrogenase. The metabolism of pyruvate is then limited, so the concentration of pyruvate increases, and to maintain a normal ratio of pyruvate to lactic acid, the lactic acid concentration rises [114]. Therefore, LAC can be used as an indicator of the severity of inflammation. A study of 1461 patients showed that the mortality rate of COVID-19 patients with high lactate levels was approximately twice that of patients with low lactate levels [115]. Although cell breakdown under conditions of critical illness may also play a role, the increase in lactic acid in COVID-19 patients may be caused by increased glycolytic activity in multiple cell types [116]. Moreover, interleukin 6, an inflammatory factor that induces lactic acid production, is present at high concentrations in patients with COVID-19. In conclusion, LAC can be used as a predictor of COVID-19 severity. To date, few studies have utilized blood gas analysis parameters or clinical information to distinguish the severity of COVID-19. To the best of our knowledge, this is the first study to predict the prognosis of COVID-19 using machine learning methods based on age, PaO2, SaO2%, Na+, and LAC.
However, this study has several limitations. First, our data originated from a single tertiary grade-A hospital in eastern China, and the sample size is not large, which limits the accuracy of the prediction; we will expand the sample size in future studies. Second, multicenter studies with large independent/external datasets and prospective designs are needed to validate the results.

Conclusions and future work

This article begins with a description of the current status of COVID-19 and the tremendous strain it places on health care systems and hospital critical care resources, and it describes in detail the sources of the data used in this paper. The proposed methodology is then described in detail: first, the basic HHO algorithm; then, the improvement strategy proposed in this paper; and finally, its fusion with the KELM classifier, which is used to filter out essential features from a blood sample dataset. To evaluate the performance of the proposed method, HHOSRL was fused with various classifiers, and it was demonstrated that HHOSRL produces satisfactory results with many of them, achieving an accuracy of almost 100.00%. Fusion with the KELM classifier enables HHOSRL to perform best on blood samples. After determining the best classifier, the proposed method was compared with various swarm intelligence feature selection methods. The proposed bHHOSRL_KELM achieves almost 100.00% specificity, accuracy, sensitivity, and MCC, and its time consumption is much less than that of the other feature selection methods. Finally, the five features that were selected most frequently in the experiments were identified, and the roles of these five features in medical diagnosis were discussed.

For future work, several issues merit further consideration. Additional influencing factors and coefficients can be included in the investigation; in this paper, only the available data are presented. Moreover, parallel computing can be used to decrease the computational load in various applications. Additionally, more data samples can be collected to construct a more efficient and reliable framework.
Finally, bHHOSRL_KELM can be used for the diagnosis of other diseases, and the algorithm's application scope can be expanded, e.g., to clustering and CT image segmentation.
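For reference, the KELM classifier used throughout has a simple closed form: with training kernel matrix Ω and regularization parameter C, the output weights are β = (I/C + Ω)⁻¹t, and a query point is scored against the kernel values to the training set. The sketch below is a minimal illustration for binary labels in {−1, +1}; the RBF kernel choice and the hyperparameter values are illustrative assumptions, not the paper's tuned settings.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between row vectors of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

class KELM:
    """Minimal kernel extreme learning machine for binary labels in {-1, +1}.
    Output weights follow the standard closed form beta = (I/C + Omega)^-1 t,
    where Omega is the training kernel matrix."""
    def __init__(self, C=1.0, gamma=1.0):
        self.C, self.gamma = C, gamma

    def fit(self, X, t):
        self.X = X
        omega = rbf_kernel(X, X, self.gamma)
        # Regularized linear solve instead of an explicit matrix inverse
        self.beta = np.linalg.solve(np.eye(len(X)) / self.C + omega, t)
        return self

    def predict(self, Xq):
        return np.sign(rbf_kernel(Xq, self.X, self.gamma) @ self.beta)
```

Because training reduces to a single linear solve, KELM is fast enough to be called repeatedly inside a wrapper feature selection loop, which is the role it plays in bHHOSRL_KELM.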

Declaration of competing interest

The authors declare that there is no conflict of interest regarding the publication of this article.