Literature DB >> 36034050

Binary Simulated Normal Distribution Optimizer for feature selection: Theory and application in COVID-19 datasets.

Shameem Ahmed¹, Khalid Hassan Sheikh¹, Seyedali Mirjalili^2,3,4, Ram Sarkar¹.

Abstract

Classification accuracy achieved by a machine learning technique depends on the feature set used in the learning process. However, it is often found that all the features extracted by some means for a particular task do not contribute to the classification process. Feature selection (FS) is an imperative and challenging pre-processing technique that helps to discard the unnecessary and irrelevant features while reducing the computational time and space requirement and increasing the classification accuracy. Generalized Normal Distribution Optimizer (GNDO), a recently proposed meta-heuristic algorithm, can be used to solve any optimization problem. In this paper, a hybrid version of GNDO with Simulated Annealing (SA) called Binary Simulated Normal Distribution Optimizer (BSNDO) is proposed which uses SA as a local search to achieve higher classification accuracy. The proposed method is evaluated on 18 well-known UCI datasets and compared with its predecessor as well as some popular FS methods. Moreover, this method is tested on high dimensional microarray datasets to prove its worth in real-life datasets. On top of that, it is also applied to a COVID-19 dataset for classification purposes. The obtained results prove the usefulness of BSNDO as a FS method. The source code of this work is publicly available at https://github.com/ahmed-shameem/Feature_selection.

Entities: Chemical

Keywords: Algorithm; COVID-19; Feature selection; Generalized Normal Distribution Optimizer; Meta-heuristic; Optimization; Simulated annealing

Year: 2022 PMID： 36034050 PMCID： PMC9396289 DOI： 10.1016/j.eswa.2022.116834

Source DB: PubMed Journal: Expert Syst Appl ISSN： 0957-4174 Impact factor: 8.665

Introduction

Data mining and Machine learning are some of the fastest-growing research topics in the information industry due to the availability of ample amounts of data that can be converted to potentially useful information. These fields are essential and integral part of the knowledge discovery (KDD) process which consists of a set of iterative sequences of tasks such as data cleaning, data reduction, data integration, and data transformation etc. Han et al. (2011). These pre-processing steps have a major impact on the performance of data mining and machine learning algorithms. Data can be considered as the ’new currency’ in this decade, which simply states the importance of data. Hence, handling data properly for our needs is a new adventure. With the growing popularity of these fields, we are receiving data in abundance, which is making our job difficult as the dimensions of these data are very high. Now, any data mining and machine learning algorithm take a huge amount of time during training because of this. To solve this problem of the ’curse of dimensionality’, researchers have come up with various techniques. Feature selection (FS) is one such most popular technique, which removes the unnecessary and irrelevant features, thereby reducing the number of attributes that do not help in the classification purpose, rather act as a noise and increase the space requirement as well as the computational cost. Generally speaking, there are two ways to perform FS: filter and wrapper (Liu & Motoda, 2012). Filter methods try to evaluate the feature subset using some designated methods such as Information gain (IG), Chi-square (Zheng et al., 2004), Laplacian score (He et al., 2006) etc. Whereas, wrapper methods use a learning algorithm to evaluate the selected feature subset. Filter methods are usually faster than wrapper methods but generally wrapper methods produce better classification accuracy (Liu & Motoda, 2012). Some of the recent and promising ones like the column-subset selection problem (Boutsidis et al., 2014, Cortinovis and Kressner, 2020, Drineas et al., 2008, Tripathi and Reza, 2020) are known to perform FS with provable theoretical bounds. These methods have been used to perform FS on k-means (Boutsidis et al., 2009), SVM (Paul et al., 2016) which provide a significant performance enhancement. These methods are known to outperform existing methods like mutual information, recursive feature elimination, etc. Finding the most functional feature subset or necessary features is a challenging task. For the last few years, meta-heuristic algorithms have been employed to address the FS problem. These works have widened the way to FS in an efficient manner. If any dataset consists of ‘N’ features/attributes, then there are -1 number of feature combinations. Evaluating all these feature subsets is a hectic task, i.e., it is very time consuming and hence inefficient. To solve this problem random search is another possible solution (Lai et al., 2006). However, meta-heuristic procedures are considered to be more appropriate as they can handle the worst case scenario (Talbi, 2009). There are many such meta-heuristic algorithms in literature like genetic algorithm (GA) (Davis, 1991), particle swarm optimization (PSO) algorithm (Kennedy & Eberhart, 1995), artificial bee colony (ABC) algorithm (Karaboga & Basturk, 2007), harmony search (HS) algorithm (Geem et al., 2001), sine–cosine algorithm (SCA) (Mirjalili, 2016) etc. The search process of any meta-heuristic algorithm depends on the balance between its exploration and exploitation phases. Exploration simply means diversification of solutions, i.e., evaluate the candidate solutions which are not neighbouring solutions. Exploitation, on the other hand, means intensification, i.e., searching the neighbourhood for possible better solutions. These two traits become the deciding factor in finding an optimal solution. Hence, proper tuning between these two is very important. In this paper, we have tried to maintain a fine balance between these two phases of Generalized Normal Distribution Optimizer (GNDO) (Zhang et al., 2020) with the help of Simulated Annealing (SA) (Kirkpatrick et al., 1983) which acts as a local search to enhance the exploitation capabilities of GNDO. The proposed method has been applied over various datasets to prove its worth and effectiveness. The rest of the paper is organized in the following manner: Section 2 discusses some popular and recent meta-heuristic algorithms found in the literature, Section 3 presents the motivation and contribution of this work, Section 4 describes the search process of GNDO and SA, Section 5 discusses the fitness function and transfer function used here, as well as the time complexity of the method, Section 6 reports the detailed experiments that have been performed to prove the effectiveness of the proposed method, Section 7 proves the robustness of the model, Section 8 shows the effectiveness of the proposed method in COVID detection and finally Section 9 concludes the paper along with its future work.

Related work

In recent times, optimization algorithms have attracted a lot of attention from researchers. In particular, meta-heuristic algorithms have seen numerous improvements over the years. Meta-heuristic is a genre of randomized algorithms where the algorithm learns to find the optimal solution through the iteration process. Meta-heuristic algorithms can be divided into different categories: single solution based and population-based (Gendreau & Potvin, 2005), nature-inspired and non-nature inspired (Abdel-Basset et al., 2018, au2 et al., 2013), etc. From the ‘inspiration’ point of view, these algorithms can broadly be divided into four categories (Nematollahi et al., 2019): Evolutionary, Swarm inspired, Physics based, and Human related. Evolutionary Algorithms: These algorithms are basically inspired by the biological process of evolution. In an evolutionary endeavour, the fittest individual is generated through crossover and mutation in each generation, which inspired the pioneer algorithm in this field, Genetic Algorithm (GA) (Davis, 1991). Other evolutionary algorithms are Genetic programming (Koza, 1994), Co-evolving algorithm (Hillis, 1990), Cultural algorithm (Xue et al., 2011), Biogeography-Based Optimization (Simon, 2008), Grammatical evolution (Ryan et al., 1998) etc. Swarm inspired algorithms: This genre of algorithm mimics the individual and social behaviour of swarm, herd, schools, groups, and teams. The key idea behind such algorithms in the optimization field is that in swarms, each individual has a certain behaviour but with the collective effort, the swarm can solve very complex optimization problems. One of the most popular algorithms in this field is PSO (Kennedy & Eberhart, 1995), which is inspired by the behaviour of flock of birds. The other famous swarm-based algorithms are the Shuffled frog-leaping algorithm (Eusuff et al., 2006), Bacterial foraging Passino (2002), ABC (Karaboga & Basturk, 2007), Firefly Algorithm (Yang, 2009), Grey Wolf Optimizer (GWO) (Mirjalili et al., 2014), Crow search algorithm (Askarzadeh, 2016), The Whale Optimization Algorithm (Mirjalili & Lewis, 2016), Grasshopper Optimization Algorithm (Saremi et al., 2017), Squirrel Search Algorithm (Jain et al., 2019). Physics based algorithms: This type of algorithms is inspired by the working principle of the physical world. Music, metallurgy to mathematics, physics, chemistry, and complex dynamic systems, are some of the physical processes which inspire Physics based meta-heuristic algorithms. Some noted algorithms are the Gravitational Search algorithm (GSA) (Rashedi et al., 2009), SA (Kirkpatrick et al., 1983), Self propelled particles (Vicsek et al., 1995), HS algorithm (Geem et al., 2001), Black hole optimization (Hatamlou, 2013), Multi-verse optimizer (Mirjalili et al., 2015), Find-Fix-Finish-Exploit-Analyze (Kashan et al., 2019) etc. Human related algorithms: These are developed based on human behaviour Teaching–Learning-Based optimization (Rao et al., 2011), Society and civilization (Ray & Liew, 2003), Fireworks algorithm (Tan & Zhu, 2010), are some algorithms in this genre. However, one of the issues with meta-heuristic algorithms is their premature convergence that leads to finding less optimal solutions. Therefore, these algorithms are often coupled with other techniques (e.g. local search algorithms). In this case, the local search algorithm tries to find solutions that are locally adjacent to an existing solution, which can outperform the existing solutions. Some of the commonly used local search algorithms are Hill Climbing (HC), SA (Kirkpatrick et al., 1983), Tabu Search (TS) (Glover & Laguna, 1998), Late acceptance hill climbing (LAHC). Some modifications to HC algorithm are HC (Al-Betar, 2016) and Adaptive HC (Al-Betar et al., 2019). Some of the work based on hybridization of local search and meta-heuristic algorithms are Elgamal et al., 2020, Kurtuluş et al., 2020 and Mafarja and Mirjalili (2017).

Motivation and contributions

For the past few decades, meta-heuristic algorithms have proved their utilities in several research fields. Because of its immense usefulness, researchers are investing more time in it to come up with better-performing algorithms. At the end of the day, we want to find the most optimal solution to such NP-hard problems. So there is not really any best result. We can always improve our findings with new or modified algorithms. Moreover, according to the No Free Lunch (NFL) theorem (Wolpert & Macready, 1997), any two algorithms produce equivalent results when they are evaluated on all possible optimization problems. It has been observed that an algorithm may achieve superior results on some problems, but that does not ensure the same on other problems. Hence, we can say that there is no such universal algorithm that is qualified enough to be used in all the optimization problems and produce the best results. These inferences keep the research resilient in this field. As FS is considered as an optimization problem (Ghosh et al., 2020), so researchers are also coming up with new and efficient FS methods using meta-heuristic algorithms. This is the motivation of our proposed work where we have designed a new algorithm by modifying the GNDO (Zhang et al., 2020). The GNDO algorithm is inspired by the generalized normal distribution model, where each individual uses a generalized normal distribution curve to update its current position in the hope of finding a better position. GNDO is employed to increase the accuracy of extracting the unknown parameters of the single diode model, double diode model and photovoltaic module model. There are two ways of hybridizing meta-heuristic algorithms (Talbi, 2009): low-level and high-level. A low-level approach en routes one algorithm in the other, whereas the algorithms are executed in succession in the high-level accession. This work follows the high-level version to hybridize GNDO and SA, maintaining a pipeline model where the output of one meta-heuristic algorithm is considered as the input to the other. To the best of our knowledge, this is the first time, GNDO is hybridized with SA for solving FS problems. The present work proposes an improved version of the binary form of GNDO (BGNDO), known as Binary Simulated Normal Distribution Optimizer (BSNDO), hybridized with another meta-heuristic algorithm called SA (Kirkpatrick et al., 1983). Recently, some hybrid FS methods have been proposed (Ahmed et al., 2021, Ahmed et al., 2020b, Bhattacharyya et al., 2020, Sheikh et al., 2020), which have demonstrated their effectiveness and superiority over other methods. This has also motivated us to come up with hybrid version of GNDO. Also, COVID-19 is a threat to the humanity as many people are suffering from it as well as many have died. Our normal day-to-day life is destroyed because of this uncertainty. Many works have been proposed for detection of COVID, few of them are Das et al., 2021, Garain et al., 2021, Karbhari et al., 2021 etc. We have performed FS on a publicly available COVID-19 dataset for classification purpose of COVID-19. In a nutshell, the main contributions of this work are as follows: A new FS method called BSNDO is introduced using BGNDO and another popular meta-heuristic called SA. The proposed hybrid FS method is assessed on 18 standard UCI datasets (Dua & Graff, 2017) using K-nearest Neighbours (KNN) classifier, Random Forest classifier as well as Naive Bayes classifier. BSNDO is also applied on high-dimensional microarray datasets to prove its effectiveness. It is also applied on a publicly available COVID-19 dataset for classification purposes. The proposed FS method is compared with many state-of-the-art meta-heuristic based FS methods.

Preliminaries

Generalized normal distribution optimizer

GNDO (Zhang et al., 2020) is inspired from normal distribution (Gaussian distribution) theory. This distribution is used to narrate the natural phenomenon. A normal distribution is described as follows: assume a random variable ‘r’, obeys a probability distribution having location parameter and scale parameter , and its probability density function can be written as: Then ‘r’ is a random variable and this distribution is called normal distribution, i.e., r (). Any population-based optimization algorithm starts with the random initialization, and then all solutions converge towards the global optima following the rules of exploration and exploitation. In the end, all individuals assemble around the achieved the best solution. Now, we can visualize this search process as multiple normal distributions. The position of every individual is regarded as random variables which are subject to normal distribution. Exploration of GNDO is dependent on three randomly selected agents. And the exploitation of GNDO is based on the generalized normal distribution model, which is accompanied by the current mean position and the current optimal position. Based on the correspondence between the distribution of the solutions in the population and the normal distribution, a generalized distribution model can be built by: where is the trail vector of the th agent at time t, is generalized mean position of the th agent, is generalized standard variance and is a penalty factor. Also, and can be defined as follows: where a, b, and are random numbers [0, 1], is the current best position and M is the mean position of the current population, which is calculated by: As current best individual contains useful information related to the global optimal solution, the th individual is pulled towards the direction of . It is to be noted that when gets confined into local optima, all agents still move towards the direction of that will lead the algorithm to premature converge. To resolve this concern, the mean position of the current population M is introduced. Although the position of may not change in some generations, however, the mean position M is changed over the generation that becomes useful for finding better solutions. Thus, the mean position M is introduced in the searching process, which increases the probability to avoid the local optima. is employed to amplify the local search ability of GNDO. Further, can be interpreted as a random progression to perform the local search around the generalized mean position . Moreover, the distance between the position of the th individual and the mean position M and the position of the best individual is larger, the oscillation of the generated random sequence is prominent. Hence, the probability to find a better solution around an individual is very minimal when has a very bad fitness value which can help the individual to search better solution. On the contrary, there is a large probability for the individual to find a better solution around it when the individual has good fitness. Thus, a random sequence with weak oscillation may help the individual to achieve a better solution. In the GNDO algorithm, the penalty factor is used to increase the randomness of the generated generalized standard variance. Most penalty factors are located [−1, 1]. As the generated generalized standard variances are all positive, the penalty factor can increase the search directions of GNDO, which can enhance its searchability. Now, the global exploration of GNDO is dependent on three randomly selected individuals, which is given by: where and are randomly generated numbers subject to standard normal distribution, is the adjust parameter which is a random number [0, 1], and are trail vectors which are calculated as follows: where p1, p2 and p3 are three randomly generated integers [1, D], following . The th individual is given information by p2 and p3. The solution p1 shares information with th solution . The adjust parameter is used to balance the two information-sharing procedures. Moreover, and are random numbers with standard normal distribution, which can make GNDO has a larger search space in the process of performing the global search. In order to bring the better solution into the next generation population, a mechanism is designed, which is represented as : A brief description of the parameters used in GNDO are summarized in Table 1.

Table 1

Brief description of parameters used in GNDO.

Parameter	Description	Value
α	Generalized mean position	NA
β	Generalized standard variance (enhances local search ability)	NA
γ	Penalty factor (enhances randomness of generated generalized standard variance)	[−1,1]
a,b, λ1, λ2	Random numbers	[0,1]
λ3, λ4	Random numbers subject to standard normal distribution	[0,1]
δ	Adjust parameter	[0,1]
D	Dimension of search space	NA

The pseudocode of the GNDO algorithm is given in Algorithm 1. Brief description of parameters used in GNDO.

Simulated annealing

SA, proposed by Kirkpatrick et al. (1983), is inspired by an analogy between simulation of annealing of solids and large combinatorial optimization problems. Often meta-heuristic algorithms can fail to find the global optima, rather they can stagnate in local optima. To overcome this issue, SA uses a probabilistic approach to accept a poor solution. By accepting poor solutions with a certain probability, exploration increases. The algorithm starts with a randomly generated initial solution, in each iteration, a solution neighbouring with respect to the current solution is generated on a random basis based on the existing neighbouring structure. Then, the neighbouring solution is evaluated based on the fitness function. There may occur two possibilities: Neighbouring solution is a better performing solution than the existing solution: in this case, a new solution is always accepted. Neighbouring solution is worse performing solution in comparison with the existing solution: in this case, the worse solution can be accepted with a certain probability determined by the Boltzmann probability, . Here is the difference of fitness value of the neighbouring solution and existing best solution, and is the temperature of “simulated annealing process”. The temperature is periodically reduced over the iterations. The temperature is initialized to where represents the feature-length. The temperature reducing scheme can be represented by the following equation: . It is to be noted that the temperature decay and the probability of exploration/exploitation are taken from the work done by Kirkpatrick et al. (1983) and Zhang et al. (2020) respectively. A brief description of the parameters used in SA are summarized in Table 2.

Table 2

Brief description of parameters used in SA.

Parameter	Description	Value
P	Boltzmann probability	[0,1]
T	Temperature	NA
‖N‖	Number of attributes for each dataset	NA

Algorithm 2 shows the pseudo-code of SA. Brief description of parameters used in SA.

Proposed method

This section elaborates the fitness function and transfer function used, and the computational complexity of the proposed algorithm. At every iteration, the agents update their position following the rules of GNDO and at the end, they try to find a better solution in their neighbourhood using SA.

Fitness function

Selecting the relevant features from a dataset that actually helps the classifier to identify the class of a sample is the main challenge. Now, during the process of selecting relevant features, we have to automatically rule out the redundant ones and maximize the classification accuracy of a classification problem when the selected feature subset is used for classification purposes (Pudil et al., 1994). This work applies BSNDO to find the best feature subset and calculate the classification accuracy of this subset using a classifier. Let be the classification accuracy of the model calculated using a classifier, be the dimension of the feature subset and be the total number of attributes present in the original dataset. So, (1 - ) is the classification error and is the fraction of features selected from the original dataset. We define the fitness function as: where denotes weightage given to the classification error.

Transfer function

As FS is a binary optimization problem (Ghosh et al., 2020), its output is {0, 1} where zero represents that the feature is rejected as it is redundant and one represents that the feature is useful and hence it is selected. However, we cannot discard the possibility of the obtained result going out of the desired range. To ensure that the output always stays within the expected range, we have to apply a binarization function on each agent. Here, this task is performed by the sigmoid (S-shaped) transfer function (Mirjalili & Lewis, 2013). The S-shaped transfer function, depicted in Fig. 1, is given by -

Fig. 1

S-shaped transfer function.

The range of this function [0,1]. If the transfer function produces output , where rnd is a random number with uniform distribution in the range (0, 1), we set the value to be 1 i.e., we consider that attribute is useful and if it is rnd, we set the value to be 0 i.e., the attribute is redundant, hence it will not be considered (Mafarja et al., 2019). S-shaped transfer function.

Computational complexity

For any meta-heuristic algorithm, the computation complexity depends on the time taken by each individual to update their positions, the maximum number of iterations and some other operations like comparison/sorting and the time to update variables. The computational complexity of BSNDO is , where represents the maximum number of iterations, represents the number of agents, represents the dimension of the search space, and indicates the required time for calculating the fitness of a particular solution using a classifier. The usage of SA is to find a better solution if available in the neighbourhood of the current solution. SA does not affect the computational cost in terms of -notation.

Experiments

Dataset details

To investigate the performances of BGNDO and BSNDO, 18 standard UCI datasets (Dua & Graff, 2017) are considered here. These datasets are from diverse domains. Some basic information regarding these datasets is provided in Table 3. As the datasets used here are assorted in terms of the number of features and instances, so it helps us to understand the robustness of the proposed FS method.

Table 3

Brief idea of the datasets employed here to assess the proposed FS method.

Sl. No.	Dataset	#Attributes	#Samples	#Classes	Domain
1	Breastcancer	9	699	2	Biology
2	BreastEW	30	569	2	Biology
3	CongressEW	16	435	2	Politics
4	Exactly	13	1000	2	Biology
5	Exactly2	13	1000	2	Biology
6	HeartEW	13	270	2	Biology
7	IonosphereEW	34	351	2	Electromagnetic
8	KrvskpEW	36	3196	2	Game
9	Lymphography	18	148	4	Biology
10	M-of-n	13	1000	2	Biology
11	PenglungEW	325	73	2	Biology
12	SonarEW	60	208	2	Biology
13	SpectEW	22	267	2	Biology
14	Tic-tac-toe	9	958	2	Game
15	Vote	16	300	2	Politics
16	WaveformEW	40	5000	3	Physics
17	WineEW	13	178	3	Chemistry
18	Zoo	16	101	6	Artificial

Brief idea of the datasets employed here to assess the proposed FS method.

Parameter settings

For any multi-agent evolutionary algorithm, the parameters always play an important role in determining the outcome. Specially, the population size and the total number of iterations (number of generations) always affect the outcome of the algorithm heavily. Hence, we have performed some experiments to determine the approximate ideal population size and a total number of iterations. The achieved classification accuracy by BGNDO and BSNDO for different population sizes varying from 10 to 50 are provided in Table 4. Similarly, the numbers of selected features for different population sizes varying from 10 to 50 by BGNDO and BSNDO are depicted in Table 5. To observe the convergence of the solution to the optimal position, convergence graphs have been plotted (which are given in Fig. 2) over 50 iterations. To maintain the fairness of the comparison, we have run each dataset 10 times and taken the average over these runs.

Table 4

Achieved classification accuracy obtained by BGNDO and BSNDO with different population sizes.

Pop_size	10		20		30		40		50
Dataset	BGNDO	BSNDO	BGNDO	BSNDO	BGNDO	BSNDO	BGNDO	BSNDO	BGNDO	BSNDO
Breastcancer	98.57	100	97.14	100	99.28	98.57	99.28	99.28	98.57	100
BreastEW	96.49	97.36	97.37	98.25	96.49	98.24	97.36	97.36	95.61	99.122
CongressEW	96.55	100	97.7	100	98.85	98.85	96.55	97.7	98.85	98.85
Exactly	100	100	100	100	100	100	100	99.5	100	100
Exactly2	77	78.5	80.5	80.5	80	78.5	79	80	80	79.5
HeartEW	85.18	83.33	90.74	94.44	88.88	90.74	85.18	94.44	85.18	90.74
IonosphereEW	91.43	92.86	95.71	95.74	91.43	94.28	94.29	92.86	92.85	94.28
KrvskpEW	97.65	98.12	98.12	98.44	98.43	97.49	98.59	97.81	97.96	97.33
Lymphography	93.33	93.33	96.67	96.67	90	96.67	96.67	90	96.67	93.33
M-of-n	100	100	100	100	100	100	100	100	100	100
PenglungEW	93.33	93.33	100	100	93.33	93.33	86.67	100	93.33	100
SonarEW	95.23	88.09	97.62	95.24	92.85	92.86	97.62	95.23	92.85	97.62
SpectEW	90.56	94.44	92.45	96.22	92.45	94.44	94.33	90.74	92.45	88.89
Tic-tac-toe	83.33	86.46	89.58	87.5	84.89	86.46	83.85	86.46	88.54	84.89
Vote	98.33	100	100	100	98.33	98.33	100	100	98.33	100
WaveformEW	84.3	84.4	84.5	87	83.4	84.6	85.3	85.6	85.7	83.8
WineEW	97.22	100	100	100	100	100	100	100	97.22	100
Zoo	100	100	100	100	100	100	100	100	100	100

Table 5

Number of selected features by BGNDO and BSNDO for different population sizes.

Pop_size	10		20		30		40		50
Dataset	BGNDO	BSNDO	BGNDO	BSNDO	BGNDO	BSNDO	BGNDO	BSNDO	BGNDO	BSNDO
Breastcancer	4	4	7	4	4	6	4	3	4	4
BreastEW	13	8	14	4	15	11	16	5	13	12
CongressEW	6	6	9	7	7	9	8	10	7	8
Exactly	7	6	10	6	7	6	6	7	7	6
Exactly2	6	4	9	8	8	6	9	12	7	6
HeartEW	5	5	6	4	6	4	5	4	4	5
IonosphereEW	17	12	26	16	15	16	20	12	10	8
KrvskpEW	24	24	22	22	21	25	26	24	22	17
Lymphography	10	5	11	5	8	8	6	6	9	6
M-of-n	7	6	8	6	7	6	7	6	7	6
PenglungEW	48	132	209	187	129	132	171	179	177	139
SonarEW	30	24	39	27	33	31	30	36	27	28
SpectEW	9	13	14	6	10	7	11	6	11	12
Tic-tac-toe	6	9	9	9	7	9	9	9	9	9
Vote	10	7	10	3	7	3	8	6	6	7
WaveformEW	34	25	27	33	27	26	31	28	26	4
WineEW	6	4	9	3	7	1	4	4	4	4
Zoo	8	6	11	5	6	1	7	6	5	8

Fig. 2

Convergence graphs depicting the convergence of best individual at every iteration for 18 UCI datasets using BGNDO and BSNDO.

From the initial experiments, we have found that a population size of 20 leads to noteworthy results. Keeping the computational cost in min,d this population size is considered for further experiments. At the same time, from the convergence graphs, approximately after 30 iterations, the best solution is almost at the optimal position. Hence, it has been used for further experiments. Achieved classification accuracy obtained by BGNDO and BSNDO with different population sizes. Number of selected features by BGNDO and BSNDO for different population sizes. Convergence graphs depicting the convergence of best individual at every iteration for 18 UCI datasets using BGNDO and BSNDO.

Result and discussion

This section discusses the outcomes produced by BGNDO and BSNDO evaluated on UCI datasets whose details are given in Table 3 while evaluated using KNN, Random Forest and Naive Bayes classifiers. These results establish the superiority of BSNDO over BGNDO. Table 6, Table 7, Table 8 present the outcomes obtained by the proposed BSNDO algorithm while evaluated by KNN, Random Forest and Naive Bayes classifiers respectively. Compared to the BGNDO algorithm, the obtained results clearly depicts the effect of BSNDO in finding a better solution. Observing these results, we can conclude that BSNDO performs better than BGNDO on UCI datasets. Furthermore, KNN is used widely in the references for FS on UCI datasets (Emary et al., 2016, Mafarja and Mirjalili, 2017, Mafarja et al., 2019), hence, for further experiments and discussion, we have utilized KNN classifier with .

Table 6

Achieved classification accuracy and number of selected features by BGNDO and BSNDO using KNN classifier (highest classification accuracies and lowest no. of selected features are highlighted in bold font).

Sl. No.	Dataset	Original		BGNDO		BSNDO
		Accuracy	Features	Accuracy	Features	Accuracy	Features
1	Breastcancer	96	9	97.14	7	100	4
2	BreastEW	92.63	30	97.37	14	98.25	4
3	CongressEW	92.18	16	97.7	9	100	7
4	Exactly	72.3	13	100	10	100	6
5	Exactly2	73.3	13	80.5	9	80.5	8
6	HeartEW	68.15	13	90.74	6	94.44	4
7	IonosphereEW	83.43	34	95.71	26	95.74	16
8	KrvskpEW	96.1	36	98.12	22	98.44	22
9	Lymphography	81.33	18	96.67	11	96.67	5
10	M-of-n	87.4	13	100	8	100	6
11	PenglungEW	81.33	325	100	209	100	187
12	SonarEW	80.95	60	94.62	39	95.24	27
13	SpectEW	82.22	22	92.45	14	96.22	6
14	Tic-tac-toe	81.1	9	83.854	8	87.5	8
15	Vote	92.33	16	100	10	100	3
16	WaveformEW	81.44	40	84.5	27	87	33
17	WineEW	66.67	13	100	9	100	3
18	Zoo	87	16	100	11	100	5

Table 7

Achieved classification accuracy and number of selected features by BGNDO and BSNDO using Random Forest classifier (highest classification accuracies and lowest no. of selected features are highlighted in bold font).

Sl no.	Dataset	Original		BGNDO		BSNDO
		Accuracy	Features	Accuracy	Features	Accuracy	Features
1	Breastcancer	97.8	9	97.14	7	97.86	2
2	BreastEW	98.2	30	95.61	2	100	4
3	CongressEW	97.7	16	96	1	98.85	5
4	Exactly	78.5	13	100	6	100	6
5	Exactly2	74	13	76	1	76	1
6	HeartEW	81.5	13	88.89	5	94.44	5
7	IonosphereEW	91.4	34	95.71	24	98.57	20
8	KrvskpEW	99.5	36	98.12	28	99.53	17
9	Lymphography	90	18	93.33	8	96.67	4
10	M-of-n	100	13	100	8	100	6
11	PenglungEW	86.7	325	93.33	140	100	193
12	SonarEW	90.7	60	92.86	42	95.24	14
13	SpectEW	88.9	22	90.74	15	96.3	7
14	Tic-tac-toe	95.8	9	82.94	5	94.37	8
15	Vote	95	16	98.33	12	98.33	6
16	WaveformEW	85.8	40	83	33	86.2	29
17	WineEW	100	13	100	4	100	3
18	Zoo	100	16	100	4	100	3

Table 8

Achieved classification accuracy and number of selected features by BGNDO and BSNDO using Naive Bayes classifier (highest classification accuracies and lowest no. of selected features are highlighted in bold font).

Sl. No.	Dataset	Original		BGNDO		BSNDO
		Accuracy	Features	Accuracy	Features	Accuracy	Features
1	Breastcancer	89.28	9	97.87	7	99.24	5
2	BreastEW	96.49	30	97.36	14	98.2	5
3	CongressEW	98.85	16	98.85	8	100	6
4	Exatly	69.5	13	96	8	100	6
5	Exactly2	76	13	76	9	76	6
6	HeartEW	94.44	13	92.3	9	94.44	4
7	IonosphereEW	95.71	34	92.88	18	95.74	15
8	KrvskpEW	65.88	36	95.31	9	97.18	12
9	Lymphography	86.67	18	90	15	100	8
10	M-of-n	96.5	13	98.5	6	100	6
11	PenglungEW	60	325	73.33	120	93.33	157
12	SonarEW	80.95	60	80.95	23	97.61	21
13	SpectEW	72.22	22	90.24	14	92.59	14
14	Tic-tac-toe	75.52	9	72.92	6	82.92	5
15	Vote	98.33	16	100	8	100	3
16	WaveformEW	82.2	40	82.5	25	85.8	18
17	WineEW	100	13	100	3	100	2
18	Zoo	100	16	100	5	100	3

Inspecting the results in these tables, we can observe that BSNDO provides better results than BGNDO in every dataset while evaluated using different classifiers. From Table 6 we can see that BSNDO achieves 90% accuracy in 15 datasets (83.33%), out of which it produces 100% classification accuracy in 8 datasets (44.44%) while evaluated using KNN classifier. It produces better classification accuracy than BGNDO except Exactly, Exactly2, Lymphography, M-of-n, PenglungEW, Vote, WineEW and Zoo, where both produces equivalent accuracy. Considering the number of selected features, BSNDO beats BGNDO in every dataset except WaveformEW. They select the same number of features in the case of KrvskpEW. Similarly, from Table 7 we can see that BSNDO achieves 90% accuracy in 16 datasets (88.89%) while evaluated using Random Forest classifier. It achieves 100% accuracy in 6 datasets (33.33%). In the case of Exactly, Exactly2, M-of-n, Vote, WineEW and Zoo, BSNDO and BGNDO produces the same classification accuracy. BSNDO has the upper hand over BGNDO over the rest cases. Talking about the number of selected features, only in the case of BreastEW, CongressEW, PenglungEW and Tic-tac-toe, BGNDO produces better results. They select the same number of features in cases of Exactly, Exactly2 and HeartEW. BSNDO selects fewer features than BGNDO in the rest cases. While evaluated using Naive Bayes classifier, BSNDO produces 90% classification accuracy in 15 datasets (83.33%) Table 8. It achieves 100% accuracy in 7 datasets (38.89%). BSNDO and BGNDO produce equivalent results in the case of Exactly2, Vote, WineEW and Zoo. BSNDO produces better results in the rest of the cases. It also selects fewer features than BGNDO in almost every dataset except KrvskpEW and PenglungEW. Both of them selects the same number of features in the case of M-of-n and SpectEW. From the above discussion, we can say that BSNDO is superior to BGNDO while evaluated using KNN, Random Forest and Naive Bayes classifiers. The results achieved by BSNDO using these classifiers establish the fact that BSNDO produces noteworthy and impressive results while evaluated using different classifiers. Achieved classification accuracy and number of selected features by BGNDO and BSNDO using KNN classifier (highest classification accuracies and lowest no. of selected features are highlighted in bold font). Achieved classification accuracy and number of selected features by BGNDO and BSNDO using Random Forest classifier (highest classification accuracies and lowest no. of selected features are highlighted in bold font). Achieved classification accuracy and number of selected features by BGNDO and BSNDO using Naive Bayes classifier (highest classification accuracies and lowest no. of selected features are highlighted in bold font).

Comparison

We have established this claim that BSNDO produces better results than BGNDO beforehand. This section provides the performance comparison of BSNDO with eight state-of-the-art FS methods. These state-of-the-art methods consist of few popular methods and some recently proposed hybrid methods. They are: GA, PSO, adaptive switching grey-whale optimizer (ASGW), serial grey-whale optimizer (HSGW), random switching grey-whale optimizer (RSGW), social ski driver algorithm and late acceptance hill-climbing (SSDsLAHC) (Chatterjee et al., 2020), electrical harmony based meta-heuristic (Sheikh et al., 2020) and embedded chaotic whale survival algorithm (ECWSA-4) (Guha et al., 2020). From Table 9 we can say that BSNDO produces the overall best result. In the case of Breastcancer, BSNDO and EHHM produce 100% accuracy. In BreastEW, BSNDO holds the second position along with SSDsLAHC after ASGW and EHHM. BSNDO holds the top position along with SSDsLAHC in CongressEW producing 100% accuracy. In the case of Exactly, BSNDO produces the best result with SSDsLAHC, HSGW, BGA, BPSO and EHHM. HSGW beats BSNDO in Exactly2 with a very narrow margin. It stands at third position in the case of HeartEW. It stands at fifth position in IonosphereEW and SonarEW. In the case of KrvskpEW and Lymphography, it attains the second position after BGA and EHHM respectively. In the case of M-of-n, PenglungEW, Vote, WineEW and Zoo, BSNDO stands at first position along with few other methods. It achieves the highest classification accuracy in the case of SpectEW and Tic-tac-toe. In the case of WaveformEW, it produces the second best result after EHHM.

Table 9

Comparison of BSNDO with state-of-the-art FS methods based on achieved classification accuracy tested on UCI datasets (highest classification accuracies are highlighted).

Dataset	BSNDO	SSDs+LAHC	HSGW	RSGW	ASGW	BGA	BPSO	EHHM	ECWSA-4
Breastcancer	100	98.93	98.6	97.1	98.5	97.43	96.29	100	95.21
BreastEW	98.25	98.25	98.1	98.2	100	97.54	97.19	100	97.38
CongressEW	100	100	97.5	96.1	99.4	96.79	96.33	98.85	96.23
Exactly	100	100	100	99.7	99.9	100	100	100	78.09
Exactly2	80.5	79	81.5	77.9	77.7	77	76.8	79.1	78.9
HeartEW	90.74	91.67	92.3	84.8	83.1	87.41	83.7	90.7	85.63
IonosphereEW	95.74	96.43	94.4	97.8	97.2	94.89	94.89	98.6	86.79
KrvskpEW	98.44	97.81	97.3	97.2	97.1	98.5	97.31	97.81	93.53
Lymphography	96.67	96.67	93.4	89.3	88.4	83.78	89.19	96.9	87.02
M-of-n	100	100	100	100	100	100	100	100	92.47
PenglungEW	100	100	94.2	100	100	91.89	91.89	100	87.63
SonarEW	95.24	97.62	96.4	97.9	94.8	99.04	94.23	92.85	76.84
SpectEW	96.22	95.15	86.2	81.5	87	89.55	88.81	90.74	79.84
Tic-tac-toe	87.5	87.24	82.8	85.9	86.5	79.96	79.96	85	78.75
Vote	100	100	98.3	99.6	98.4	97.33	96	98.4	95.08
WaveformEW	85	84.4	74.8	75.7	74.6	78.36	75.6	86.8	80.18
WineEW	100	100	100	100	100	98.88	97.75	100	98.02
Zoo	100	100	100	100	100	90.2	96.08	100	98.95

Avg rank	1.833	2	3.5	3.944	4	4.22	5.33	2.33	5.944
Ass rank	1	2	4	5	6	7	8	3	9

Table 10 gives the comparison of BSNDO with state-of-the-art FS methods based on the number of selected features. It selects the least number of features in BreastEW, Exactly along with SSDsLAHC, BGA and BPSO, Lymphography along with BGA and BPSO. It also produces the best result in the case of Vote along with BPSO. It stands at second position in the case of Breastcancer along with BGA and BPSO, HeartEW, M-of-n with SSDsLAHC, BGA and BPSO, SpectEW with BPSO, WineEW along with SSDsLAHC. BSNDO stands at third position in the case of Exactly2 with SSDsLAHC and Zoo along with BPSO. It stands at fourth position in the case of IonosphereEW and Tic-tac-toe along with ECWSA-4. It selects the same number of features as EHHM in the case of CongressEW attaining the fifth position. In the case of PenglungEW and WaveformEW, it stands at ninth position.

Table 10

Comparison of BSNDO with state-of-the-art methods based on number of selected features tested on UCI datasets (least number of selected features are highlighted).

Dataset	BSNDO	SSDs+LAHC	HSGW	RSGW	ASGW	BGA	BPSO	EHHM	ECWSA-4
Breastcancer	4	2.5	5	5.933	4.867	4	4	4	7
BreastEW	4	9	16.667	17.5	15.833	8	9	13	15
CongressEW	7	5.5	8.867	9.7	8.833	2	3	7	4
Exactly	6	6	6.7	7.1	6.867	6	6	7	7
Exactly2	8	8	9.033	9.2	7.933	1	1	5	9
HeartEW	4	5	8.767	6.133	6.367	5	3	8	9
IonosphereEW	16	12	18.167	20.5	17.3	7	7	7	10
KrvskpEW	22	20	24.8	24.8	24.5	11	12	15	16
Lymphography	5	6.5	10.567	10.567	11.2	5	5	6	10
M-of-n	6	6	6.8	7.1	6.867	6	6	7	5
PenglungEW	187	140	135.33	181.2	170.3	84	130	74	93
SonarEW	27	23.5	34.3	36.433	35.5	19	22	22	23
SpectEW	6	9	10.233	13.3	10.167	5	6	11	7
Tic-tac-toe	8	9	7	7	7	5	6	6	8
Vote	3	4.5	7.567	8.8	8.967	5	3	5	6
WaveformEW	33	22.5	26.933	27.533	25.833	15	15	20	15
WineEW	3	3	4.533	5.867	5.933	4	5	1	7
Zoo	5	4.5	5.533	5.3	7.6	4	5	1	7

Avg rank	3.5	3.22	5.277	6.277	5.33	1.61	2.055	2.944	4.16
Ass rank	5	4	7	9	8	1	2	3	6

Comparison of BSNDO with state-of-the-art FS methods based on achieved classification accuracy tested on UCI datasets (highest classification accuracies are highlighted). To make a quantitative decision about a process, we perform a statistical test. The goal of this test is to determine whether there is enough clarity to “reject” a conjecture about the process. The conjecture is called the null hypothesis. In our case, the null hypothesis states that the two sets of results have the same distribution, which implies that if the distribution of two results is statistically different, then the generated -value from the test statistics will be 0.05 when the test is performed at 0.05% significance level. This will result in the rejection of the null hypothesis. So, to determine the statistical significance of the BSNDO algorithm, Wilcoxon rank-sum test (Wilcoxon, 1992) has been performed. It is a non-parametric statistical test where a pairwise comparison is performed. Individual meta-heuristic algorithms were run 20 times for each UCI dataset used here to perform the statistical test. From the test results provided in Table 11, we can conclude that the results of the proposed BSNDO algorithm is found to be statistically significant.

Table 11

-values generated via pairwise Wilcoxon test using the results obtained from 20 independent runs of the proposed BSNDO method and state-of-the-art FS methods used for comparison.

Dataset	SSDs+LAHC	HSGW	RSGW	ASGW	BGA	BPSO	EHHM	ECWSA-4
Breastcancer	0.031623	0.000212	0.000292	0.013724	0.000126	0.000392	0.000455	0.000392
BreastEW	0.275234	0.00029	0.000119	0.284088	0.003088	0.001373	1.91E−06	1.91E−06
CongressEW	0.599266	0.000138	0.000297	0.017474	0.000161	0.008211	0.176853	0.000212
Exactly	0.022958	8.83E−05	8.73E−05	0.000131	8.54E−05	8.72E−05	0.000131	0.000618
Exactly2	7.88E−05	8.72E−05	8.66E−05	0.000127	0.000153	7.99E−05	0.00021	0.974353
HeartEW	0.359797	8.81E−05	0.000102	0.00023	0.000153	8.72E−05	0.521673	8.72E−05
IonosphereEW	0.925575	0.000291	0.000127	0.003191	0.000845	0.047031	0.294252	1.91E−06
KrvskpEW	0.000155	8.86E−05	8.83E−05	0.000515	8.81E−05	0.000132	1.91E−06	0.009436
Lymphography	0.510917	0.000127	8.34E−05	0.001458	0.000285	0.000213	3.62E−05	0.000119
M-of-n	0.072789	8.77E−05	8.79E−05	8.77E−05	8.82E−05	0.000131	0.000618	8.72E−05
PenglungEW	0.054977	0.000234	0.014786	0.145537	0.213681	0.474082	0.009436	8.77E−05
SonarEW	0.264962	0.000115	0.00013	0.000527	0.937537	0.011806	1.91E−06	0.000127
SpectEW	0.262646	8.77E−05	0.000212	0.011068	0.000178	0.00013	1.91E−06	0.000127
Tic-tac-toe	0.155787	8.78E−05	8.83E−05	0.021748	0.000187	8.84E−05	0.452375	8.84E−05
Vote	0.166793	0.000128	0.000127	0.029028	0.00019	0.000406	4.77E−05	9.02E−05
WaveformEW	0.000182	8.86E−05	0.000103	8.77E−05	8.86E−05	8.83E−05	1.91E−06	0.000297
WineEW	0.365712	9.95E−05	8.71E−05	0.000859	0.05349	0.000269	0.001341	0.000269
Zoo	0.763025	9.02E−05	0.000104	0.0455	0.001689	0.000147	0.974353	8.77E−05

Comparison of BSNDO with state-of-the-art methods based on number of selected features tested on UCI datasets (least number of selected features are highlighted). -values generated via pairwise Wilcoxon test using the results obtained from 20 independent runs of the proposed BSNDO method and state-of-the-art FS methods used for comparison.

Additional testing on microarray datasets

The reported results, mentioned above, establish the fact that BSNDO performs better than the state-of-the-art methods considered here for comparison. To check the robustness of the proposed method, we have applied it on several high-dimensional microarray datasets (Ahmed et al., 2020a). The description of the datasets is given in Table 12. To confirm the superiority of the proposed method, it is compared with some state-of-the-art methods, namely: GA (Ghosh et al., 2018a), Memetic algorithm (MA) (Ghosh et al., 2019b, Ghosh et al., 2018b), WFACOFS (Ghosh et al., 2019a) and ECWSA (Guha et al., 2020). The comparison table is given in Table 13

Table 12

Description of datasets used to check the robustness of BSNDO.

Dataset	Number of features	Number of samples	Number of classes
AMLGSA2191	12 616	54	2
DLBCL	7070	77	2
Leukaemia	5147	72	2
Prostate	12 533	102	2
MLL	12 533	72	3
SRBCT	2308	83	4

Table 13

Comparison of the results of BSNDO on microarray with state-of-the-art methods. The number of features selected is provided in brackets at the side of the accuracy.

Dataset	GA	MA	WFACOFS	ECWSA-1	ECWSA-2	ECWSA-3	ECWSA-4	BSNDO
AMLGSE2191	100(98)	100(91)	96.3(17)	96.67(17)	100(9)	95.83(16)	95.83(18)	100(9)
DLBCL	100(88)	100(105)	100(3)	100(29)	100(24)	100(26)	100(31)	100(10)
Leukaemia	100(85)	100(65)	100(5)	97.22(7)	100(8)	100(4)	97.22(5)	100(12)
Prostrate	100(99)	100(107)	100(22)	96.3(16)	98.15(16)	96.3(9)	96.3(19)	95.24(20)
MLL	100(94)	100(80)	100(25)	100(16)	100(17)	100(8)	100(15)	100(16)
SRBCT	100(78)	100(50)	100(19)	100(45)	100(32)	100(34)	100(30)	100(11)

As microarray datasets consist of a very high number of attributes, it becomes a challenging task for us to rule out the irrelevant ones. The obtained results again demonstrate the effectiveness of BSNDO. From this Table 13, we can see that BSNDO produces noteworthy results as compared to the other methods considered here for comparison. It produces 100% accuracy in every dataset except Prostrate. It also selects the least features in the case of AMLGSE2191 and SRBCT. Description of datasets used to check the robustness of BSNDO. Comparison of the results of BSNDO on microarray with state-of-the-art methods. The number of features selected is provided in brackets at the side of the accuracy.

Testing on COVID-19 dataset

COVID-19 is a contagious disease, which is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The common symptoms of COVID-19 are fever, cough, fatigue, breathing difficulties, and loss of smell and taste etc. Symptoms begin 1 to 14 days after exposure to the virus. While most people have mild symptoms, some people develop acute respiratory distress syndrome (ARDS). Because of its nature, the accurate result of the COVID-19 test has become a challenging task. Some recent COVID-19 screening techniques are Bandyopadhyay et al., 2021, Barnes et al., 2021, Dey et al., 2021, Ismael and Şengür, 2021, Kundu et al., 2021 and Nigam et al. (2021). More than 190 million people is suffering from COVID-19, and more than 4 million people have died because of it. So, detecting the COVID-19 and keeping those people in quarantine have become one of the topmost priorities of every country. Though vaccination process has started, it will take time to reach everyone, especially in the under-development countries. We have tested our FS method on a COVID-19 dataset, which is publicly available at https://github.com/Atharva-Peshkar/Covid-19-Patient-Health-Analytics in csv format. This dataset contains 1086 instances and 74 attributes. The obtained results are compared with some meta-heuristics based FS methods: SSDsLAHC, ASGW, HSGW, RSGW, GA, PSO and Adaptive -coral reefs optimization (ACRO) (Ahmed et al., 2020a). The comparison table shows the achieved classification accuracy and the number of selected features (in brackets) (see Table 14).

Table 14

Comparison of achieved classification accuracy evaluated on mentioned COVID-19 dataset.

GNDO+SA	SSDs+LAHC	ASGW	HSGW	RSGW	GA	PSO	AβCRO
98.61 (26)	97.69 (23)	97.69 (40)	96.31 (50)	97.75 (54)	94.9 (23)	97.24 (31)	98.2(20)

Comparison of achieved classification accuracy evaluated on mentioned COVID-19 dataset.

Conclusion and future work

In this work, a new FS method, called BSNDO, which is based on GNDO and SA has been proposed. SA has been used as a local search to enhance the exploitation of the GNDO and to create a proper balance between exploration and exploitation of the overall method. The proposed method shows significant improvement in achieved classification accuracy while FS is performed using BSNDO than BGNDO and some state-of-the-art methods. Primarily the method has been tested on various UCI datasets. To prove the robustness of the model, the proposed method is also applied on high dimensional microarray datasets. Furthermore, it is experimented on a COVID-19 dataset for detecting the COVID-19 cases. This is to be noted that all the datasets used here are publicly available. The obtained results show the applicability of BSNDO in varied datasets. One of the limitations of this method may be the computational complexity due to addition of the local search to the GNDO algorithm. As a future scope of this work, a deeper analysis of the gene selections done by BSNDO and their biological impact can be studied. The proposed FS method can also be applied to some other real-world problems like handwritten word or digit recognition, and face recognition etc., where researchers sometimes use very high dimensional feature vectors without knowing the importance of all the features.

CRediT authorship contribution statement

Shameem Ahmed: Conceptualization, Methodology, Writing – original draft, Software, Investigation. Khalid Hassan Sheikh: Writing – review & editing, Software, Investigation, Conceptualization. Seyedali Mirjalili: Writing – review & editing, Supervision, Project administration, Validation. Ram Sarkar: Writing – review & editing, Supervision, Project administration, Validation, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

10 in total

1. Novel type of phase transition in a system of self-driven particles.

Authors:
Journal: Phys Rev Lett Date: 1995-08-07 Impact factor: 9.161

2. Optimization by simulated annealing.

Authors: S Kirkpatrick; C D Gelatt; M P Vecchi
Journal: Science Date: 1983-05-13 Impact factor: 47.728

3. Genetic algorithm based cancerous gene identification from microarray data using ensemble of filter methods.

Authors: Manosij Ghosh; Sukdev Adhikary; Kushal Kanti Ghosh; Aritra Sardar; Shemim Begum; Ram Sarkar
Journal: Med Biol Eng Comput Date: 2018-08-01 Impact factor: 2.602

4. Detection of COVID-19 from CT scan images: A spiking neural network-based approach.

Authors: Avishek Garain; Arpan Basu; Fabio Giampaolo; Juan D Velasquez; Ram Sarkar
Journal: Neural Comput Appl Date: 2021-04-16 Impact factor: 5.606

5. COVID-19: Automatic detection from X-ray images by utilizing deep learning methods.

Authors: Bhawna Nigam; Ayan Nigam; Rahul Jain; Shubham Dodia; Nidhi Arora; B Annappa
Journal: Expert Syst Appl Date: 2021-03-16 Impact factor: 6.954

6. Generation of Synthetic Chest X-ray Images and Detection of COVID-19: A Deep Learning Based Approach.

Authors: Yash Karbhari; Arpan Basu; Zong-Woo Geem; Gi-Tae Han; Ram Sarkar
Journal: Diagnostics (Basel) Date: 2021-05-18

7. Fuzzy rank-based fusion of CNN models using Gompertz function for screening COVID-19 CT-scans.

Authors: Rohit Kundu; Hritam Basak; Pawan Kumar Singh; Ali Ahmadian; Massimiliano Ferrara; Ram Sarkar
Journal: Sci Rep Date: 2021-07-08 Impact factor: 4.379

8. Harris Hawks optimisation with Simulated Annealing as a deep feature selection method for screening of COVID-19 CT-scans.

Authors: Rajarshi Bandyopadhyay; Arpan Basu; Erik Cuevas; Ram Sarkar
Journal: Appl Soft Comput Date: 2021-07-14 Impact factor: 6.725

9. Choquet fuzzy integral-based classifier ensemble technique for COVID-19 detection.

Authors: Subhrajit Dey; Rajdeep Bhattacharya; Samir Malakar; Seyedali Mirjalili; Ram Sarkar
Journal: Comput Biol Med Date: 2021-06-22 Impact factor: 4.589

10 in total

2 in total

1. An adaptive and altruistic PSO-based deep feature selection method for Pneumonia detection from Chest X-rays.

Authors: Rishav Pramanik; Sourodip Sarkar; Ram Sarkar
Journal: Appl Soft Comput Date: 2022-08-10 Impact factor: 8.263

2. A hybrid binary dwarf mongoose optimization algorithm with simulated annealing for feature selection on high dimensional multi-class datasets.

Authors: Olatunji A Akinola; Absalom E Ezugwu; Olaide N Oyelade; Jeffrey O Agushaka
Journal: Sci Rep Date: 2022-09-02 Impact factor: 4.996

2 in total