
Feature Selection Based on Adaptive Particle Swarm Optimization with Leadership Learning.

Zhiwei Ye1, Yi Xu1, Qiyi He1, Mingwei Wang1, Wanfang Bai2, Hongwei Xiao3.   

Abstract

With the rapid development of the Internet of Things (IoT), the curse of dimensionality becomes increasingly common. Feature selection (FS) aims to eliminate irrelevant and redundant features in datasets. Particle swarm optimization (PSO) is an efficient metaheuristic algorithm that has been successfully applied to obtain optimal feature subsets with essential information in an acceptable time; however, it easily falls into local optima on high-dimensional datasets because of its constant parameter values and insufficient population diversity. In this paper, an FS method is proposed that utilizes adaptive PSO with leadership learning (APSOLL). An adaptive updating strategy for parameters replaces the constant parameters, and the leadership learning strategy provides valid population diversity. Experimental results on 10 UCI datasets show that APSOLL has better exploration and exploitation capabilities than PSO, the grey wolf optimizer (GWO), Harris hawks optimization (HHO), the flower pollination algorithm (FPA), the salp swarm algorithm (SSA), linear PSO (LPSO), and hybrid PSO and differential evolution (HPSO-DE). Moreover, less than 8% of the features in the original datasets are selected on average, and the resulting feature subsets are more effective in most cases than those generated by 6 traditional FS methods: analysis of variance (ANOVA), Chi-Squared (CHI2), Pearson, Spearman, Kendall, and mutual information (MI).
Copyright © 2022 Zhiwei Ye et al.


Year:  2022        PMID: 36072739      PMCID: PMC9441366          DOI: 10.1155/2022/1825341

Source DB:  PubMed          Journal:  Comput Intell Neurosci


1. Introduction

Large amounts of data have been generated in various fields such as social media, healthcare, cybersecurity, and education over the past decades, and edge computing provides an effective solution for data storage and transmission. However, as the dimensionality of the data increases, the curse of dimensionality becomes common, which has a negative impact on the stability, security, and computational efficiency of edge computing. Feature selection (FS) is a data preprocessing technique in machine learning and data mining that has been applied to improve the performance of edge computing by eliminating irrelevant and redundant features in the datasets [1-3]. In general, it is a combinatorial optimization problem [4, 5] that tries to find the optimal feature subsets with essential information from the original datasets. Given a dataset with N features, there are 2^N possible feature subsets, so the search space grows exponentially as the number of features increases [6, 7]. Hence, some traditional FS methods have received considerable interest due to their ability to evaluate feature importance and select a certain number of top-ranked features. These methods include statistical tests (e.g., analysis of variance (ANOVA) [8, 9] and Chi-Squared (CHI2) [10, 11]), correlation criteria (e.g., Pearson [12], Spearman [13, 14], and Kendall [15, 16]), and information theory (e.g., symmetrical uncertainty (SU) [17], mutual information (MI) [18, 19], and entropy [20]). However, the statistical test and correlation criteria techniques only consider the correlation between features and labels, and the resulting feature subsets are not appropriate because some highly correlated but redundant features are selected. As a result, information theory techniques are applied to FS problems owing to their additional consideration of redundancy between features.
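The exponential growth of the subset search space noted above can be made concrete by enumerating every subset of a small feature set; this is a toy illustration, not part of the paper's method.

```python
from itertools import chain, combinations

def all_subsets(features):
    """All possible feature subsets (including the empty set): 2**N in total."""
    return list(chain.from_iterable(
        combinations(features, k) for k in range(len(features) + 1)))

# 4 features already yield 2**4 = 16 candidate subsets; at N = 100 the
# count (2**100) makes exhaustive search hopeless.
print(len(all_subsets(["f1", "f2", "f3", "f4"])))   # -> 16
```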
Moreover, the redundancy calculation only focuses on the interaction between two features and fails to identify those of multiple features [21], which may ignore some important features. Therefore, how to find suitable feature subsets efficiently needs to be further investigated. Metaheuristic algorithms such as monarch butterfly optimization (MBO) [22], slime mold algorithm (SMA) [23], moth search algorithm (MSA) [24], hunger games search (HGS) [25], hybrid rice optimization (HRO) [26], colony predation algorithm (CPA) [27], weighted mean of vectors (INFO) [28], grey wolf optimizer (GWO) [29], clonal flower pollination algorithm (FPA) [30], salp swarm algorithm (SSA) [31], Harris hawks optimization (HHO) [32], and particle swarm optimization (PSO), have been used to solve combinatorial optimization problems because of their dynamic exploration and exploitation capabilities in the search space, some of which have shown to be successful in FS problems [33, 34]. For instance, Shen and Zhang [29] proposed a two-stage GWO for processing biomedical datasets, which showed better performance in terms of time consumption and classification accuracy by removing more than 95.7% of the redundant features. Hussain et al. [32] developed an FS method based on HHO, which removed 87% of features and achieved 92% of classification accuracy. Yan et al. [30] presented a binary clonal FPA for some biomedical datasets, which enhanced population diversity and selected fewer features with strong robustness. Balakrishnan et al. [31] designed an FS method based on salp SSA, which increased the ability of particles to explore different regions by randomly updating their position and improved the confidence level by 0.1033% on 6 datasets. However, a series of parameters need to be set by users in these metaheuristic algorithms, and unsuitable parameters may lead to slow convergence and local stagnation. 
A lot of experiments and extensive experience are needed to find appropriate parameter settings. Compared with the above metaheuristic algorithms, PSO is often applied to the FS problem because of its fast convergence and few parameters. However, its exploration and exploitation capabilities are influenced by the parameter settings and population diversity as the number of features increases. Therefore, several improved PSO variants based on parameter updating and population diversity updating strategies have been proposed for FS. For example, Song et al. [35] developed a three-phase hybrid FS algorithm, which reduced the computational cost by using correlation-guided clustering and an improved integer PSO. Tran et al. [36] used a bare-bones PSO for FS, which reduced the search space of the problem and improved the search efficiency. Song et al. [37] also introduced a variable-size cooperative coevolutionary PSO for high-dimensional datasets, which divided a high-dimensional FS problem into multiple low-dimensional subproblems with a low computational cost. Hu et al. [38] presented a multi-objective PSO for FS, which achieved superior performance in approximation, diversity, and feature cost by introducing a tolerance coefficient. Hosseini Bamakan et al. [39] proposed a time-varying PSO-based FS method for the network intrusion detection problem, which obtained a higher detection rate and lower false alarm rate by introducing a chaotic concept and time-varying parameters. Mafarja et al. [40] proposed a binary PSO-based FS method, which adopted a time-varying inertia weighting strategy and showed a superior convergence rate on some datasets. Huang et al. [41] utilized cut-point and feature discretization to expand the search scope of PSO for gene expression datasets, which selected fewer features and maintained similar classification accuracy. Xue et al.
[42] introduced adaptive parameters in PSO for high-dimensional datasets, which allowed particles to automatically adjust parameters during the search process and decreased time consumption. Moradi and Gholampour [43] used a PSO with a local search strategy for high-dimensional datasets, which adjusted the search process by considering the correlation information between distinct features. Chen et al. [44] introduced an FS method based on hybrid PSO and differential evolution (HPSO-DE), which enhanced population diversity by adopting mutation, crossover, and selection operators. Although the optimization ability of PSO is improved to some extent by the above techniques, the randomness of the search process may be increased, and they lack a mechanism for jumping out of local optima. In this paper, an FS method based on adaptive PSO with leadership learning (APSOLL) is proposed, which combines parameter updating and population diversity updating strategies to compensate for the shortcomings of PSO. The adaptive updating strategy for parameters guides particles to search in a more reasonable scope, and the leadership learning strategy enhances population diversity. Overall, the main contributions of our work are as follows:
(1) Based on the population state, an adaptive updating strategy for parameters is proposed to replace the constant parameters, which guides particles to search in a more reasonable scope.
(2) A leadership learning strategy is adopted to provide valid population diversity by learning from the first three leaders in the population, which enhances the exploration and exploitation capabilities of PSO.
(3) The effectiveness of the proposed method is verified by comparing it with six traditional methods (ANOVA, CHI2, Pearson, Spearman, Kendall, and MI) and seven metaheuristic-algorithm-based FS methods (PSO, GWO, HHO, FPA, SSA, LPSO, and HPSO-DE).

2. Background and Related Work

2.1. Overview of PSO

PSO is a population-based metaheuristic algorithm that simulates the predatory activities of bird and fish populations [45, 46]. Each particle in the population has two properties: a velocity vector v_i = (v_i1, v_i2, ..., v_id) and a position vector x_i = (x_i1, x_i2, ..., x_id), where d denotes the dimension. In the search process of PSO, the velocity vectors are dynamically adjusted by the personal best position (pbest) and the global best position (gbest) at the current stage, and the position vectors are the candidate solutions to the optimization problem; both are updated by equations (1)-(2):

v_i(t + 1) = ω·v_i(t) + c1·r1·(pbest_i − x_i(t)) + c2·r2·(gbest − x_i(t)),  (1)
x_i(t + 1) = x_i(t) + v_i(t + 1),  (2)

where v_i and x_i represent the velocity and position vectors of the i-th (i = 1, 2, ..., N) particle, and the upper and lower limits of each dimension are set to 1 and 0, respectively. ω is the inertia parameter, a non-negative number. c1 and c2 are acceleration parameters: the former is the personal learning parameter and the latter is the global learning parameter; they control the search scope of particles and are set by users. r1 and r2 are random numbers in [0, 1].
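The update above can be sketched in a few lines; this follows the standard PSO formulas with the symbols defined in the text, with positions bounded to [0, 1] as stated.

```python
import numpy as np

rng = np.random.default_rng(0)

def pso_step(x, v, pbest, gbest, w=0.9, c1=2.0, c2=2.0):
    """One iteration of the PSO update in equations (1)-(2).

    x, v   : (N, d) positions and velocities
    pbest  : (N, d) personal best positions
    gbest  : (d,)   global best position
    """
    r1 = rng.random(x.shape)          # r1, r2 ~ U[0, 1], per the text
    r2 = rng.random(x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # eq. (1)
    x = np.clip(x + v, 0.0, 1.0)      # eq. (2), bounded to [lb, ub] = [0, 1]
    return x, v
```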

2.2. The Leadership Learning Strategy

Leadership learning strategy is a management concept that describes the dynamic process of feed-forward and feedback in a living system. Hirst et al. [47] suggested that learning activities of individuals will affect the decisions of leaders, and it is called feed-forward learning flow. Moreover, effective leaders may quickly identify key information in group development and have a lasting impact on the individuals and group activities through their decisions in turn, which is regarded as feedback learning flow. In the model of leadership learning strategy, feed-forward and feedback learning flow among individuals, groups, and leaders together determine the scope of the system development, and the framework is shown in Figure 1.
Figure 1

The framework of leadership learning.

Based on the leadership learning strategy, GWO was proposed with effective exploration capability and acceptable time consumption by learning from the first three best solutions (leaders) of each iteration [48-51]. In the search process, the population is divided into four levels, sequentially α, β, δ, and ω, where α, β, and δ are regarded as leaders, the remaining particles ω are considered individuals, and the population as a whole is considered the group. The particles and leaders learning from each other constitute the leadership learning strategy, which is shown in equation (3):

D_α = |C1·X_α − X|, D_β = |C2·X_β − X|, D_δ = |C3·X_δ − X|,
X1 = X_α − A1·D_α, X2 = X_β − A2·D_β, X3 = X_δ − A3·D_δ,
X(t + 1) = (X1 + X2 + X3)/3,  (3)

where X_α, X_β, and X_δ are the position vectors of α, β, and δ; D_α, D_β, and D_δ denote the distances between the particles and the leaders; and C1, C2, and C3 are random numbers from 0 to 2. The search scope of particles is controlled by the convergence factor A, which is computed as equation (4):

A = 2a·r − a,  (4)

where r is a random number in [0, 1] and the variable a = 2(1 − t/T) is the control coefficient (T denotes the maximum number of iterations), which decreases linearly from 2 to 0 during the search process.
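A compact sketch of the leader-guided update, using the standard GWO coefficients (A = 2a·r − a, C = 2r) described above:

```python
import numpy as np

rng = np.random.default_rng(1)

def gwo_step(x, x_alpha, x_beta, x_delta, t, T):
    """Leader-guided GWO position update (equation (3))."""
    a = 2.0 * (1.0 - t / T)                      # control coefficient, 2 -> 0 (eq. (4))
    guides = []
    for leader in (x_alpha, x_beta, x_delta):
        A = 2.0 * a * rng.random(x.shape) - a    # convergence factor in [-a, a]
        C = 2.0 * rng.random(x.shape)            # random coefficient in [0, 2]
        D = np.abs(C * leader - x)               # distance to this leader
        guides.append(leader - A * D)
    return sum(guides) / 3.0                     # average of X1, X2, X3
```

At t = T the convergence factor vanishes, so every particle collapses onto the mean of the three leaders.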

3. The Proposed Method

In this section, an FS method based on APSOLL is presented to conduct classification on 10 UCI datasets. The corresponding techniques for the proposed method are described as follows:

3.1. Adaptive Updating Strategy for Parameters

During the search process of PSO, the search scope of particles is affected by the acceleration parameters c1 and c2. In general, they are usually less than 2 and set to constant values by users [52-54]. However, since the population changes dynamically according to the optimal fitness value, it is appropriate to adaptively adjust c1 and c2 for better exploration and exploitation. Moreover, the change of the fitness value during the iterations reflects the state of the population; thus, an adaptive updating strategy based on this observation is proposed to replace the constant parameters, as shown in equations (5)-(6), where m is a counter initially set to 0: it is increased by 1 whenever the optimal fitness value is not improved in an iteration and reset to 0 once the fitness improves. Thus, c changes dynamically between 1 and 2 during the search process, and it gradually increases if the algorithm falls into a local optimum.
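The counter mechanism can be sketched as follows. The paper's exact equations (5)-(6) are not reproduced in the text, so the particular mapping from m to c below is an assumption; it merely matches the described behavior (c in [1, 2), growing toward 2 as the search stagnates).

```python
def update_stagnation(m, improved):
    """Stagnation counter m: reset to 0 on fitness improvement, else increment."""
    return 0 if improved else m + 1

def adaptive_c(m):
    """Illustrative adaptive acceleration parameter (assumed form, not the
    paper's equations (5)-(6)): stays in [1, 2) and grows toward 2 the longer
    the optimal fitness stagnates."""
    return 2.0 - 1.0 / (m + 1.0)
```

After an improving iteration `adaptive_c(0)` gives 1.0, and the value approaches 2 as the stagnation counter grows.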

3.2. The Search Process of Leadership Learning Strategy

The population diversity of PSO may be inadequate because particles learn only from pbest and gbest. Smith [55] proposed that the more individuals and leaders engage in feed-forward and feedback in a living system, the more likely the group is to change, innovate, and cooperate. However, the time consumption increases with the number of leaders during the process. Therefore, inspired by GWO, the leadership learning strategy with 3 leaders is used to reconstruct the velocity vectors of PSO, which increases population diversity and provides more accurate information for better exploration and exploitation. In addition, the adaptive parameter c is combined to guide the particles to search in a more reasonable scope, and the process is shown in equation (7), where X1, X2, and X3 represent the leadership learning strategy and r4 is a random number between 0 and 1. c is updated by equation (6); it changes dynamically between 1 and 2 during the search process and gradually increases if the algorithm falls into a local optimum. The cooperation of c/2, c/3, and c/4 allows particles to search in a more reasonable scope with higher possibilities. As for the leadership learning strategy, Hu et al. [50] proposed that a convergence factor greater than 1 yields better exploration capability and one less than 1 yields better exploitation capability. However, it can be seen from equation (4) that A decreases linearly and is always less than 1 in the last 50% of iterations, so the exploration capability is insufficient when the algorithm is trapped in a local optimum at this stage. Hence, the possibility that A is greater than 1 at this stage is increased, and A is modified as shown in equation (8), where r5 is a random number in [0, 1] and A is adaptively changed during the search process. A will be greater than 1 with a higher possibility and thus enhance the exploration capability when the algorithm falls into a local optimum.
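Equation (7) is not reproduced verbatim in the text, but its described ingredients (three leader guides X1, X2, X3, one random number r4, and the c/2, c/3, c/4 weights) suggest a velocity update along these lines; the exact combination below is an assumption.

```python
import numpy as np

rng = np.random.default_rng(2)

def leadership_velocity(x, v, X1, X2, X3, c, w=0.9):
    """Three-leader velocity reconstruction in the spirit of equation (7).

    X1, X2, X3 are the guide positions derived from the alpha, beta, and
    delta leaders; the precise form of the paper's equation (7) is not shown
    in the text, so this combination is illustrative only."""
    r4 = rng.random(x.shape)
    return (w * v
            + (c / 2.0) * r4 * (X1 - x)
            + (c / 3.0) * r4 * (X2 - x)
            + (c / 4.0) * r4 * (X3 - x))
```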

3.3. The Encoding Schema

The core objective of the proposed method is to select a suitable expression form for FS and establish a reasonable mapping between the solutions and the feature subsets. The candidate solutions are binarized to represent the features, where "1" denotes that the feature is selected and "0" that the feature is abandoned. For instance, given a dataset with 10 features, the candidate solution coded as 1010000011 means the 1st, 3rd, 9th, and 10th features are selected and the others are abandoned. The position vector of each particle is binarized according to equation (9), where Xb_i = (xb_i1, xb_i2, ..., xb_id), and i and d denote the particle index and the number of features, respectively.
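The mapping can be illustrated directly. The paper's exact binarization rule (equation (9)) is not reproduced in the text, so the 0.5 threshold below is an assumption; the decoding itself follows the worked example.

```python
import numpy as np

def binarize(position, threshold=0.5):
    """Threshold binarization of a particle position in [0, 1]. The paper's
    exact equation (9) is not shown in the text; a 0.5 cut is an assumption."""
    return (np.asarray(position) > threshold).astype(int).tolist()

def decode(solution):
    """Map a binary candidate solution to the 1-based indices of selected features."""
    return [i + 1 for i, bit in enumerate(solution) if bit == 1]

# The example from the text: 1010000011 selects the 1st, 3rd, 9th, and 10th features.
print(decode([1, 0, 1, 0, 0, 0, 0, 0, 1, 1]))   # -> [1, 3, 9, 10]
```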

3.4. The Definition of Objective Function

The feature subsets generated by FS methods for classification have two main goals: maximizing the classification accuracy (minimizing the classification error) and minimizing the number of selected features. As a mainstream classifier, K-nearest neighbor (KNN) [56-58] is utilized for FS due to its simplicity and insensitivity to noisy data. Furthermore, reducing the number of selected features is considered another core issue. The ultimate goal is to obtain the optimal feature subsets with essential information from the original datasets while achieving higher classification accuracy with fewer features. Hence, an objective function that combines the classification accuracy and the number of selected features is adopted, defined as equation (10):

Fit(X) = θ·acc(X) + (1 − θ)·(1 − #X/N),  (10)

where acc(X) denotes the classification accuracy of the feature subset, #X and N represent the number of features in the feature subset and in the original dataset, respectively, and θ is a weighting factor that balances the classification accuracy and the number of selected features; it is set to 0.7.
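The description pins down the usual weighted form of this objective; the function below reproduces the APSOLL entries in the results tables (e.g., on SCADI, accuracy 97.22 with 6.92 of 205 features gives a fitness of 97.04), so this reconstruction appears consistent with the reported numbers.

```python
def fitness(acc, n_selected, n_total, theta=0.7):
    """Objective of equation (10): a weighted sum of classification accuracy
    (in percent) and the fraction of features discarded."""
    return theta * acc + (1.0 - theta) * (1.0 - n_selected / n_total) * 100.0

# Cross-check against the APSOLL rows reported in the results tables:
print(round(fitness(97.22, 6.92, 205), 2))   # SCADI -> 97.04
print(round(fitness(83.92, 7.58, 754), 2))   # PD    -> 88.44
```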

3.5. Implementation of the Proposed Method

The main process of APSOLL is to search for the optimal feature subsets with essential information from the original datasets and to apply them for classification; the pseudocode is shown in Algorithm 1. The particles are binarized to determine the corresponding feature subsets in each iteration, and the leaders are determined by computing the fitness function, which is used to guide the search process. Figure 2 shows the flowchart of APSOLL. When the algorithm starts, it randomly initializes the velocity vector v, the position vector x, pbest, and gbest, and sets m = 0 and t = 0. In each iteration, the fitness value of each particle is calculated in order to find the three best solutions (leaders). Based on the information provided by the leaders, the velocities of the particles and the positions of the population are updated. In this process, if the optimal fitness value does not change, the adaptive counter m is increased by 1. The run ends and the optimal solution is binarized when the maximum number of iterations is reached.
Figure 2

The flowchart of APSOLL.
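Putting the pieces together, the loop in Algorithm 1 and Figure 2 can be sketched as follows. The KNN-based objective is replaced by a toy fitness, and the leader-guidance and adaptive-parameter formulas are simplified assumptions, so this is a structural sketch rather than the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(42)

def apsoll_sketch(n_features=30, n_particles=10, T=50, w=0.9):
    """Structural sketch of the APSOLL main loop (toy objective)."""
    x = rng.random((n_particles, n_features))   # positions in [0, 1]
    v = np.zeros_like(x)
    m, best_fit = 0, -np.inf                    # stagnation counter, best so far

    def fit(row):                               # toy stand-in for the KNN objective
        return n_features - int((row > 0.5).sum())

    for t in range(T):
        fits = np.array([fit(row) for row in x])
        order = np.argsort(fits)[::-1]
        alpha, beta, delta = x[order[0]], x[order[1]], x[order[2]]  # leaders
        if fits[order[0]] > best_fit:
            best_fit, m = fits[order[0]], 0     # improvement: reset counter
        else:
            m += 1                              # stagnation: counter grows
        c = 2.0 - 1.0 / (m + 1.0)               # illustrative adaptive parameter
        r = rng.random(x.shape)
        v = (w * v + (c / 2) * r * (alpha - x)
                   + (c / 3) * r * (beta - x)
                   + (c / 4) * r * (delta - x))
        x = np.clip(x + v, 0.0, 1.0)
    best = max(x, key=fit)
    return (best > 0.5).astype(int)             # binarized optimal solution

mask = apsoll_sketch()
print(mask.shape)   # (30,) binary feature mask
```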

4. Experimental Design

All experimental procedures are implemented in Python 3.8 on a PC with an Intel(R) Core(TM) i5-9400 @ 2.9 GHz CPU and 16 GB of DDR4 RAM under the Windows 10 operating system. 10 public datasets are used to assess the quality of the proposed method. APSOLL is compared with 7 metaheuristic algorithms to evaluate its optimization ability, and 6 traditional FS methods (ANOVA, CHI2, Pearson, Spearman, Kendall, and MI) are used to analyze the effectiveness of the feature subsets selected by the proposed method.

4.1. Datasets Description

10 datasets from the UCI machine learning repository are used to evaluate the performance of the proposed method: myocardial infarction complications (MIC), urban, SCADI, arrhythmia, madelon, isolet5, multiple features (MF), Parkinson's disease (PD), CNAE-9, and QSAR. All of them have more than 100 features, with the number of classes ranging from 2 to 26 and the number of instances from 69 to 2600; the details are shown in Table 1. In the experiments, each dataset is randomly divided into two parts: 70% of the instances are chosen as training data and the remaining 30% are used as testing data. Li et al. [54] described in detail why this dataset-dividing approach was adopted.
Table 1

Details of datasets.

Dataset    | Number of features | Number of instances | Number of classes
MIC        | 124                | 1700                | 7
Urban      | 147                | 507                 | 9
SCADI      | 205                | 69                  | 6
Arrhythmia | 279                | 452                 | 13
Madelon    | 500                | 2600                | 2
Isolet5    | 617                | 1559                | 26
MF         | 649                | 2000                | 10
PD         | 754                | 756                 | 2
CNAE-9     | 857                | 1080                | 9
QSAR       | 1024               | 1687                | 2

4.2. Parameters Setting for Metaheuristic Algorithms

For APSOLL, the search process requires only one parameter, the inertia weight ω, to be set. In addition, some commonly used FS methods based on metaheuristic algorithms are adopted to evaluate the optimization ability: GWO, PSO, HHO, FPA, SSA, LPSO, and HPSO-DE. Among them, LPSO [40] and HPSO-DE [44] are classical benchmark PSO-based FS methods that adopt parameter updating and population diversity updating strategies, respectively. The parameters of each metaheuristic algorithm are set based on the published literature and are shown in Table 2. Furthermore, the binary encoding scheme is utilized for each metaheuristic algorithm, and each algorithm is run independently 30 times, with the average taken as the result, in order to eliminate the influence of randomness.
Table 2

Parameter settings of different metaheuristic algorithms.

Algorithm       | Parameter                        | Value
Common settings | Number of iterations             | T = 100
                | Population size                  | N = 30
                | Upper limit of particle position | ub = 1
                | Lower limit of particle position | lb = 0
GWO             | Control coefficient a            | decreases linearly from 2 to 0
PSO             | Acceleration factors             | c1 = 2, c2 = 2
                | Inertia weight                   | w = 0.9
HHO             | Levy component                   | β = 0.8
FPA             | Acceleration factors             | c1 = 2, c2 = 2
                | Levy component                   | β = 1.5
                | Switch probability               | P = 0.8
SSA             | Convergence factor C             | decreases linearly from 2 to 0
LPSO            | Acceleration factors             | c1 = 2, c2 = 2
                | Upper limit of inertia weight    | wmax = 0.9
                | Lower limit of inertia weight    | wmin = 0.4
HPSO-DE         | Acceleration factors             | c1 = 2, c2 = 2
                | Crossover rate                   | CR = 0.2
                | Scaling factor                   | F = 0.5
                | Predefined generation            | G = 5
                | Inertia weight                   | w = 0.9
APSOLL          | Inertia weight                   | w = 0.9

5. Results and Discussion

5.1. Experimental Results of Different Metaheuristic Algorithms

The optimization ability of APSOLL is evaluated in terms of the fitness value, classification accuracy, number of selected features, and CPU time. The average convergence curves of the fitness value are shown in Figures 3-4, and the number of selected features in the search process is shown in Figures 5-6. In the experiments, the t-test with a significance level of 0.05 is used to determine whether the results obtained by the proposed algorithm are statistically significantly different from those of the other metaheuristic algorithms. The experimental results are presented in Tables 3-4, where Fit, Acc, and #F denote the fitness value, classification accuracy, and number of selected features after 30 independent runs, and Time is the CPU time of the whole process (in seconds). Sfit, Sacc, and Sf display the t-test results, where "+" or "−" means the result is worse or better than that of the proposed method, respectively, and "=" means they are similar under the t-test. In other words, the more "+" entries, the better the proposed method performs.
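The "+/−/=" marks can be generated as below for a metric where larger is better (fitness or accuracy); the paper does not state which t-test variant it uses, so the Welch version here is an assumption.

```python
import numpy as np
from scipy import stats

def significance_mark(proposed, other, alpha=0.05):
    """t-test legend of Tables 3-4: '+' if the other method is significantly
    worse than the proposed one, '−' if significantly better, '=' otherwise."""
    _, p = stats.ttest_ind(proposed, other, equal_var=False)
    if p >= alpha:
        return "="
    return "+" if np.mean(proposed) > np.mean(other) else "−"
```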
Figure 3

The average convergence curves of different metaheuristic algorithms for datasets below 500 dimensions.

Figure 4

The average convergence curves of different metaheuristic algorithms for datasets above 500 dimensions.

Figure 5

The average number of selected features for datasets below 500 dimensions by different FS methods based on metaheuristic algorithms.

Figure 6

The average number of selected features for datasets above 500 dimensions by different FS methods based on metaheuristic algorithms.

Table 3

Comparisons between APSOLL and other metaheuristic algorithms for datasets below 500 dimensions.

Datasets   | Method  | Fit (std.)   | Sfit | Acc (std.)    | Sacc | #F (std.)      | Sf | Time
MIC        | GWO     | 93.28 (0.27) | +    | 91.03 (0.40)  | +    | 1.80 (0.65)    | =  | 125.37
           | PSO     | 87.04 (0.96) | +    | 91.03 (0.48)  | +    | 27.40 (3.49)   | +  | 220.42
           | HHO     | 91.83 (1.76) | +    | 89.08 (2.60)  | +    | 2.13 (2.42)    | =  | 162.52
           | FPA     | 82.66 (0.53) | +    | 90.63 (0.75)  | +    | 44.2 (2.50)    | +  | 217.76
           | SSA     | 82.68 (0.87) | +    | 90.65 (0.78)  | +    | 44.2 (3.23)    | +  | 122.10
           | LPSO    | 86.89 (0.89) | +    | 91.08 (0.54)  | +    | 28.13 (3.48)   | +  | 220.95
           | HPSO-DE | 92.99 (0.38) | +    | 90.83 (0.45)  | +    | 2.43 (1.09)    | +  | 133.59
           | APSOLL  | 93.65 (0.35) |      | 91.40 (0.56)  |      | 1.33 (0.47)    |    | 122.10

Urban      | GWO     | 87.21 (4.12) | =    | 85.03 (16.03) | =    | 11.33 (2.70)   | +  | 96.00
           | PSO     | 65.57 (5.57) | +    | 64.97 (13.90) | +    | 48.53 (5.89)   | +  | 160.65
           | HHO     | 83.78 (3.18) | +    | 79.43 (14.26) | +    | 8.93 (5.41)    | =  | 94.06
           | FPA     | 57.64 (2.87) | +    | 58.26 (10.28) | +    | 64.40 (6.52)   | +  | 163.10
           | SSA     | 58.10 (3.92) | +    | 58.17 (10.39) | +    | 61.83 (5.88)   | +  | 162.84
           | LPSO    | 62.53 (4.90) | +    | 60.41 (11.97) | +    | 47.80 (5.17)   | +  | 163.89
           | HPSO-DE | 86.06 (1.20) | =    | 82.24 (14.30) | =    | 7.40 (2.11)    | =  | 47.19
           | APSOLL  | 86.60 (2.18) |      | 82.84 (20.46) |      | 6.83 (1.91)    |    | 75.32

SCADI      | GWO     | 95.13 (2.04) | =    | 95.40 (2.88)  | =    | 11.23 (7.49)   | =  | 29.19
           | PSO     | 86.43 (3.22) | +    | 93.33 (4.19)  | +    | 60.87 (8.10)   | +  | 124.32
           | HHO     | 91.95 (3.61) | +    | 90.63 (4.51)  | +    | 10.23 (7.14)   | =  | 24.98
           | FPA     | 81.80 (3.48) | +    | 92.38 (4.19)  | +    | 87.90 (7.17)   | +  | 147.73
           | SSA     | 81.05 (3.59) | +    | 90.79 (4.59)  | +    | 85.47 (8.11)   | +  | 152.42
           | LPSO    | 86.01 (3.12) | +    | 92.54 (4.20)  | +    | 59.90 (6.65)   | +  | 100.95
           | HPSO-DE | 94.38 (2.38) | +    | 93.65 (3.33)  | +    | 8.07 (2.89)    | =  | 23.31
           | APSOLL  | 97.04 (1.63) |      | 97.22 (2.35)  |      | 6.92 (2.75)    |    | 33.66

Arrhythmia | GWO     | 78.11 (1.31) | +    | 72.33 (1.70)  | +    | 23.48 (5.65)   | +  | 161.93
           | PSO     | 67.50 (1.28) | +    | 68.97 (1.98)  | +    | 100.23 (8.12)  | +  | 164.50
           | HHO     | 74.48 (1.94) | +    | 65.29 (3.20)  | +    | 11.40 (10.72)  | =  | 127.69
           | FPA     | 62.73 (1.07) | +    | 65.59 (1.97)  | +    | 122.57 (7.68)  | +  | 160.16
           | SSA     | 62.56 (1.30) | +    | 65.39 (1.77)  | +    | 122.93 (6.44)  | +  | 159.65
           | LPSO    | 67.92 (1.39) | +    | 68.77 (1.80)  | +    | 95.03 (6.31)   | +  | 167.47
           | HPSO-DE | 75.49 (0.86) | +    | 66.96 (1.39)  | +    | 12.87 (2.50)   | =  | 80.80
           | APSOLL  | 80.82 (1.45) |      | 74.14 (1.75)  |      | 10.08 (3.95)   |    | 113.36

Madelon    | GWO     | 90.28 (1.00) | +    | 89.71 (1.17)  | +    | 42.00 (7.01)   | +  | 310.38
           | PSO     | 74.72 (1.12) | +    | 82.04 (1.18)  | +    | 211.73 (12.36) | +  | 327.80
           | HHO     | 81.44 (3.82) | +    | 78.95 (3.48)  | +    | 216.47 (10.49) | +  | 399.71
           | FPA     | 75.08 (0.86) | +    | 77.52 (1.16)  | +    | 236.67 (9.62)  | +  | 320.63
           | SSA     | 70.06 (1.21) | +    | 77.53 (1.57)  | +    | 242.17 (8.97)  | +  | 322.84
           | LPSO    | 75.07 (1.18) | +    | 82.94 (1.58)  | +    | 63.70 (37.22)  | +  | 325.15
           | HPSO-DE | 79.98 (1.72) | +    | 73.64 (2.56)  | +    | 26.13 (5.06)   | +  | 301.25
           | APSOLL  | 92.44 (0.44) |      | 90.65 (0.62)  |      | 16.92 (4.75)   |    | 259.51
Table 4

Comparisons between APSOLL and other metaheuristic algorithms for datasets above 500 dimensions.

Datasets | Method  | Fit (std.)   | Sfit | Acc (std.)   | Sacc | #F (std.)      | Sf | Time
Isolet5  | GWO     | 89.66 (1.01) | +    | 91.23 (1.38) | =    | 86.53 (9.14)   | +  | 212.36
         | PSO     | 78.10 (1.04) | +    | 87.31 (1.44) | +    | 268.07 (10.71) | +  | 219.31
         | HHO     | 82.61 (1.71) | +    | 81.60 (1.83) | +    | 92.57 (26.83)  | +  | 283.60
         | FPA     | 74.45 (0.89) | +    | 83.50 (1.34) | +    | 287.73 (10.48) | +  | 211.13
         | SSA     | 74.25 (1.02) | +    | 83.60 (1.45) | +    | 293.53 (8.69)  | +  | 207.08
         | LPSO    | 78.61 (0.98) | +    | 87.79 (1.40) | +    | 264.42 (10.19) | +  | 215.30
         | HPSO-DE | 81.72 (1.08) | +    | 76.42 (1.68) | +    | 36.53 (5.12)   | −  | 215.85
         | APSOLL  | 91.37 (0.49) |      | 91.08 (0.55) |      | 48.92 (2.36)   |    | 219.14

MF       | GWO     | 96.63 (0.54) | +    | 97.77 (0.54) | =    | 39.27 (6.89)   | +  | 225.82
         | PSO     | 86.86 (0.72) | +    | 97.19 (0.54) | +    | 241.73 (13.51) | +  | 274.84
         | HHO     | 94.04 (0.95) | +    | 94.98 (0.98) | +    | 52.93 (13.66)  | +  | 303.36
         | FPA     | 84.31 (0.53) | +    | 96.47 (0.60) | +    | 286.13 (6.77)  | +  | 281.36
         | SSA     | 84.39 (0.53) | +    | 96.67 (0.71) | +    | 287.33 (9.16)  | +  | 275.93
         | LPSO    | 87.18 (0.63) | +    | 97.19 (0.61) | +    | 234.7 (8.29)   | +  | 267.64
         | HPSO-DE | 94.05 (0.53) | +    | 93.84 (0.77) | +    | 35.57 (4.65)   | +  | 224.65
         | APSOLL  | 97.71 (0.33) |      | 98.07 (0.53) |      | 20.25 (1.23)   |    | 228.91

PD       | GWO     | 85.54 (2.25) | +    | 80.88 (3.54) | =    | 27.00 (9.84)   | +  | 187.67
         | PSO     | 71.54 (1.62) | +    | 74.60 (2.11) | +    | 268.4 (11.07)  | +  | 185.17
         | HHO     | 86.78 (1.28) | +    | 81.60 (1.90) | +    | 8.43 (6.53)    | =  | 127.75
         | FPA     | 68.77 (1.31) | +    | 74.60 (2.24) | +    | 338.03 (12.81) | +  | 174.27
         | SSA     | 68.59 (1.44) | +    | 74.23 (1.99) | +    | 336 (13.24)    | +  | 173.38
         | LPSO    | 71.38 (1.96) | +    | 74.48 (2.60) | +    | 270.43 (14.95) | +  | 185.49
         | HPSO-DE | 86.88 (0.87) | +    | 83.26 (1.39) | =    | 35.20 (4.53)   | +  | 197.66
         | APSOLL  | 88.44 (0.85) |      | 83.92 (1.24) |      | 7.58 (2.22)    |    | 152.82

CNAE-9   | GWO     | 86.28 (1.22) | +    | 88.80 (1.53) | −    | 167.83 (20.93) | +  | 203.26
         | PSO     | 77.41 (1.75) | +    | 88.25 (2.47) | −    | 409.80 (14.03) | +  | 197.09
         | HHO     | 74.04 (1.83) | +    | 79.55 (4.12) | +    | 332.23 (80.68) | +  | 269.84
         | FPA     | 73.91 (1.36) | +    | 83.79 (1.99) | +    | 420.70 (15.22) | +  | 185.77
         | SSA     | 73.74 (1.85) | +    | 83.80 (2.51) | +    | 425.57 (12.44) | +  | 183.18
         | LPSO    | 77.69 (1.27) | +    | 88.79 (1.92) | −    | 412.63 (14.06) | +  | 195.26
         | HPSO-DE | 69.52 (1.87) | +    | 77.60 (2.43) | +    | 422.40 (13.83) | +  | 200.46
         | APSOLL  | 87.35 (0.55) |      | 85.03 (0.93) |      | 61.83 (5.38)   |    | 210.71

QSAR     | GWO     | 92.45 (0.54) | +    | 93.21 (0.66) | =    | 95.57 (9.18)   | +  | 236.35
         | PSO     | 82.20 (0.62) | +    | 92.01 (0.68) | +    | 416.70 (17.62) | +  | 327.26
         | HHO     | 92.68 (0.45) | +    | 90.35 (0.83) | +    | 19.10 (12.23)  | −  | 227.42
         | FPA     | 80.13 (0.49) | +    | 91.16 (0.80) | +    | 466.93 (10.49) | +  | 323.93
         | SSA     | 80.06 (0.48) | +    | 91.37 (0.71) | +    | 474.40 (11.67) | +  | 320.06
         | LPSO    | 82.40 (0.55) | +    | 92.02 (0.59) | +    | 410.13 (14.90) | +  | 323.23
         | HPSO-DE | 92.44 (0.27) | +    | 91.14 (0.41) | +    | 46.30 (6.15)   | +  | 207.01
         | APSOLL  | 94.10 (0.55) |      | 93.11 (0.74) |      | 36.83 (7.84)   |    | 231.28
From the convergence curves of the fitness value, APSOLL achieves better fitness values on all datasets, which means its optimization ability is better than that of the other metaheuristic algorithms thanks to the adaptive updating and leadership learning strategies. From Figures 3-4, it can be observed that HHO and HPSO-DE converge prematurely on most datasets, while PSO, SSA, FPA, and LPSO converge more slowly and have poor overall performance. In contrast, APSOLL achieves a balance between convergence speed and performance. In terms of classification accuracy, the APSOLL-based FS method exceeds 80% on average on 9 of the 10 datasets, and notably reaches 98.07% on MF. As can be seen in Figures 5-6, PSO, SSA, FPA, and LPSO have limited ability to reduce the size of the feature subsets, while APSOLL performs better than the other methods on most datasets during the iterative process. In Tables 3-4, the number of features selected by APSOLL is smaller than that of the other metaheuristic algorithms in most cases. A total of 30%-50% of the features in the original datasets are selected by FPA and SSA, while less than 8% are selected by APSOLL; in particular, only 7.58 features are selected on average from the original 754 features of PD. As for CPU time, APSOLL consumes less time on MIC and madelon than the other metaheuristic algorithms. Although it consumes slightly more time on the other datasets, it performs better on the two main aims, classification accuracy and the number of selected features. In summary, the optimization ability of APSOLL is better than that of the other metaheuristic algorithms, and suitable feature subsets are selected with higher classification accuracy and fewer features in an acceptable time.

5.2. Experimental Results of Traditional Methods

To demonstrate the effectiveness of APSOLL-based FS method, the performance is compared with that of 6 traditional methods. Figures 7‒8 show the classification accuracy of 6 traditional FS methods for different numbers of selected features, and the optimal solutions of the proposed and traditional methods are presented in Table 5.
Figure 7

The classification accuracy of 6 traditional FS methods in selecting different numbers of features for datasets below 500 dimensions.

Figure 8

The classification accuracy of 6 traditional FS methods in selecting different numbers of features for datasets above 500 dimensions.

Table 5

The optimal classification accuracy, number of selected features, and CPU time in comparison to traditional methods.

DatasetsANOVACHI2PearsonSpearmanKendallMIAPSOLL
MICAcc (%)90.3994.3190.3990.3990.3990.3992.55
#F316295659192
Time3.383.244.0313.048.15164.07123.69

UrbanAcc (%)83.0183.6663.4063.4063.4082.3585.62
#F2262333314
Time2.123.362.9615.328.61161.5991.62

SCADIAcc (%)95.2495.2485.7190.4785.71100100
#F233494118107467
Time0.792.192.0219.5612.49143.4638.08

ArrhythmiaAcc (%)63.9763.2463.2463.9763.2463.9775.74
#F55128171346
Time2.942.348.0956.2440.11675.94127.73

MadelonAcc (%)89.8789.3671.4171.4171.4179.3691.15
#F17134994994994817
Time38.0340.0149.72223.64137.442833.54169.32

Isolet5Acc (%)86.9785.4784.8385.2685.2687.8292.08
#F24528937835122320446
Time62.0461.6157.49561.71197.988484.77218.44

MFAcc (%)98.3398.8394.8394.0094.3393.8398.50
#F62240248238641162921
Time43.7944.3150.45312.71188.625533.33105.40

PDAcc (%)79.3087.6780.1881.0681.0678.4185.90
#F4140162425709
Time61.7562.9462.16395.49234.742488.38154.63

CNAE-9Acc (%)89.8188.2785.4985.4985.4989.5185.80
#F21314285585585564764
Time66.9184.3263.66462.64237.187087.23221.19

QSARAcc (%)91.5291.7291.7291.7291.7291.7293.88
#F11010598498498446133
Time126.37135.70125.281058.36350.858411.24221.18
It is observed from Figures 7-8 that it is difficult for the traditional methods to improve the classification accuracy by sequentially increasing the number of features once a certain level is reached. In comparison, more suitable feature subsets are obtained by the metaheuristic-algorithm-based FS methods, among which APSOLL performs best. In addition, it is not the case that the more features are selected, the higher the classification accuracy, which indicates that redundancy among features affects the classification performance on most datasets. As can be seen from Table 5, the classification accuracy is improved by at least 1.28% on average by the proposed method on 5 datasets, especially on arrhythmia and isolet5, with gains of 11.77% and 4.26%, respectively. Although the classification accuracy of the proposed method is about 2% lower on average than the traditional methods on MIC, MF, PD, and CNAE-9, the number of selected features is smaller than that of these methods: only 2, 21, 9, and 64 features are selected, respectively. Regarding the number of selected features, fewer features are selected by the proposed method on 6 datasets. Notably, more than 30% of the features are selected by the traditional methods on isolet5 and MF, while only 7.46% and 3.24% of the features are selected by the proposed method, respectively. In terms of time consumption, the traditional methods are affected by the number of features because features are added sequentially to the feature subsets, and their time consumption increases dramatically as the number of features grows, while APSOLL performs more stably on most datasets because of its dynamic exploration and exploitation capabilities, and its CPU time remains acceptable. In brief, the proposed method is dependable and effective for solving FS problems compared with traditional methods.

6. Conclusions and Future Work

In this paper, APSOLL is proposed for FS; it enhances exploration and exploitation capabilities by using an adaptive updating strategy to guide the population search within a more reasonable scope and a leadership learning strategy to increase population diversity. Experimental comparisons with other metaheuristic-based FS methods reveal that APSOLL offers better optimization ability and selects suitable feature subsets within an acceptable time. Moreover, the APSOLL-based FS method achieves better or comparable classification accuracy while selecting less than 8% of the features in the original datasets on average, compared with the traditional methods. In conclusion, the proposed method selects suitable feature subsets while maintaining a proper balance between classification accuracy and the number of selected features. In future work, it would be interesting to reduce the CPU time of APSOLL by combining it with feature ranking and to apply it to ultrahigh-dimensional datasets.
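The balance between classification accuracy and subset size discussed above is typically encoded in the fitness function of a metaheuristic wrapper. The sketch below shows the standard weighted form of such a fitness; the weights `alpha` and `beta` are illustrative assumptions, not values taken from the paper.

```python
def fs_fitness(error_rate, n_selected, n_total, alpha=0.99, beta=0.01):
    """Weighted fitness for wrapper feature selection (lower is better).

    Combines the classification error with the fraction of features kept,
    so subsets that are both accurate and small score best.
    alpha/beta (alpha + beta = 1) are illustrative weights only.
    """
    return alpha * error_rate + beta * (n_selected / n_total)

# A small, slightly more accurate subset beats the full feature set:
f_small = fs_fitness(error_rate=0.06, n_selected=30, n_total=1000)
f_full = fs_fitness(error_rate=0.08, n_selected=1000, n_total=1000)
```

With this formulation, a particle that drops redundant features is rewarded even when its accuracy is unchanged, which is one common way the accuracy/size trade-off described in the conclusion is realized.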