
A Method of Biomedical Information Classification Based on Particle Swarm Optimization with Inertia Weight and Mutation.

Mi Li1,2, Ming Zhang1,2, Huan Chen1,2, Shengfu Lu3,1,2.   

Abstract

With the rapid development of information technology and biomedical engineering, people have access to ever more information, and researchers have begun to study how to apply advanced technologies to biomedical information. The main goal of this paper is to optimize a machine learning method by particle swarm optimization (PSO) and apply it to the classification of biomedical data. In order to improve the performance of the classification model, we compared different inertia weight strategies, mutation strategies and their combinations with PSO, and obtained the best inertia weight strategy without mutation, the best mutation strategy without inertia weight, and the best combination of the two. Then, we used the three PSO algorithms to optimize the parameters of the support vector machine for the classification of biomedical data. We found that the PSO algorithm combining the inertia weight and mutation strategies, together with the inertia weight strategy we proposed, could improve classification accuracy. This study provides an important reference for the prediction of clinical diseases.
© 2018 Mi Li et al., published by De Gruyter.


Keywords:  Biomedical information classification; Inertia weight strategy; Mutation strategy; Particle swarm optimization; Support vector machine

Year:  2018        PMID: 33817104      PMCID: PMC7874695          DOI: 10.1515/biol-2018-0044

Source DB:  PubMed          Journal:  Open Life Sci        ISSN: 2391-5412            Impact factor:   0.938


Introduction

Bioinformatics integrates computer science and biological information technology, and reveals the significance of biological data for research and applications. An important task in bioinformatics is to predict which category a given sample belongs to. The rapid development of computer technology has produced vast amounts of biological information, and it is very difficult for people to make sense of such data intuitively. In recent years, many scholars have applied machine learning algorithms to predict diseases in the field of biomedicine and achieved good results. Common machine learning methods include the support vector machine (SVM) proposed by Joachims [1], the decision tree proposed by Quinlan [2], the k-nearest neighbor algorithm proposed by Fukunaga and Narendra [3], the Bayesian algorithm proposed by Bataineh and Al-Qudah [4] and the deep learning methods proposed by LeCun and Bengio [5], among others. Since SVM has great advantages in solving small-sample, nonlinear and high-dimensional problems, it has been widely used as a core method in biomedical classification and recognition. For example, Chen and Huang used EEG signals to identify epilepsy with SVM [6]. Soares and Paiva used SVM to diagnose breast tumor masses [7]. Zhou and Cui used SVM and Bayesian algorithms to predict protein localization [8]. Qu and Chen used SVM to segment pathological images of breast cancer tissue [9]. Mishra and Lakkadwala used SVM to predict cardiovascular disease [10]. Liu and Zhang used SVM to predict osteosarcoma [11]. Parikh and Shah used SVM to diagnose skin disease [12]. SVM is a supervised learning algorithm; its basic idea is to transform the input space into a higher-dimensional feature space through a nonlinear transformation and then compute the optimal separating hyperplane in that space, so as to classify unknown samples. SVM is widely used in the field of machine learning.
To further improve the performance of SVM, many variants have been proposed. For example, Fung and Mangasarian proposed the proximal support vector machine (PSVM) [13]. The advantage of PSVM over SVM is that it replaces the convex programming problem in SVM with a system of equations; under the maximum-margin condition, PSVM fits the two classes of samples with two parallel hyperplanes. Lin and Wang proposed the fuzzy support vector machine (FSVM) [14]. According to the different contributions of the input samples, FSVM assigns the samples different membership degrees, separating noise from effective samples. Although FSVM improves on the traditional SVM, determining the membership function in FSVM remains difficult. Jayadeva and Khemchandani proposed the twin support vector machine (TWSVM) [15]. TWSVM transforms the problem into two smaller convex programming problems similar to those of the support vector machine. SVM puts all the samples into the constraints of a single convex programming problem, while TWSVM puts only the samples of the opposite class into the constraints of each subproblem, so the training speed of TWSVM is greatly improved. However, TWSVM lacks sparsity and has limited generalization ability, and it also needs improvement. The most important parts of a support vector machine are the kernel function and its parameters, which directly affect the performance of the SVM, so the choice of kernel function is a key problem. Common kernel functions include the linear kernel, the polynomial kernel and the radial basis function (RBF) kernel. The linear kernel is mainly used for linearly separable problems, while the RBF kernel is mainly used for linearly inseparable problems.
Compared with the linear kernel, the RBF kernel can map features into a higher-dimensional space, and the linear kernel can be regarded as a special case of the RBF kernel. In general, the RBF kernel is the most widely used. For example, Kuo and Ho used SVM with the RBF kernel for image classification [16]. Prabin and Veerappan used SVM with a mixed RBF kernel to diagnose MRI images of brain tumors [17]. Bousseta and Tayeb used SVM with the RBF kernel for EEG data classification [18]. When classifying with an RBF-kernel SVM, we need to set the penalty factor (C) and the kernel parameter (g). Keerthi and Lin studied the penalty factor and the RBF kernel parameter in SVM [19] and analyzed the influence of different kernel parameters on classifier performance. Chapelle and Vapnik proposed a method to tune the kernel parameters of SVM automatically [20], but this method needs to compute the gradients of various bounds, which increases the complexity of the algorithm. In recent years, many swarm-intelligence evolutionary algorithms have been proposed. Kennedy and Eberhart proposed the particle swarm optimization (PSO) algorithm [21], which simulates the foraging behavior of birds. Goldberg proposed the genetic algorithm (GA) based on natural genetic processes [22]. Qinghong and Zhang proposed the ant colony algorithm based on the foraging behavior of ants [23]. Selima and Alsultan proposed the simulated annealing algorithm based on the temperature drop during solid annealing [24]. Eskandar and Sadollah proposed the water cycle algorithm (WCA) based on the natural water cycle and the flow of rivers to the sea [25]. WCA shows good performance on constrained problems, but its efficiency is not high.
To further improve the WCA, Pahnekolaie and Alfi proposed a gradient-based water cycle algorithm and applied it to the chaos suppression problem [26]. Gonçalves and Lopez proposed the search group algorithm (SGA) and applied it to truss structure optimization [27]. The SGA has good exploration and exploitation abilities, but it is sensitive to its parameters. Seyedeh and Alireza proposed a fuzzy logic method to control the parameters of the SGA adaptively [28]. Zong and Kim proposed the harmony search (HS) algorithm [29], which simulates musicians relying on their own memory to adjust the tones of the instruments in a band until a harmonious state is reached. Ameli and Alfi proposed a discrete harmony search algorithm (DHS) and applied it to optimize capacitor placement in a distribution system, achieving good results [30]. Among these heuristic search algorithms, PSO is relatively simple and easy to implement, and is therefore widely used in various optimization problems. For example, Zhang and Lv proposed an adaptive inertia weight chaotic PSO algorithm, applied it to train a single-hidden-layer neural network, and achieved good classification results [31]. Wang and Phillips proposed binary particle swarm optimization with mutation (BPSO-M), with time-varying learning factors (BPSO-T), and with the combination of mutation and time-varying learning factors (BPSO-MT), and applied these three algorithms to feature selection [32]. Zhang and Wang analyzed the status of swarm intelligence algorithms in detail and summarized their applications in various industries [33]. Fernandez and Caballero used genetic algorithms to optimize the parameters of SVM [34]. Behravan and Dehghantanha used binary particle swarm optimization (BPSO) to select features [35].
Kuang and Zhang used an improved chaotic PSO algorithm to optimize the parameters of a mixed kernel function [36]. Subasi found that PSO brought a significant improvement to SVM [37]. Ren and Hu used a grid algorithm to optimize kernel function parameters [38]. Zhang found that PSO was more efficient than the genetic algorithm in coding and optimization [39]. Therefore, it is quite common to use the PSO algorithm to optimize the parameters of SVM. Because the original PSO algorithm easily falls into local optima, many scholars have improved it. Tanweer and Auditya proposed a self-adjusting particle swarm algorithm [40]. Meng and Li adopted a crisscross search strategy and proposed a new updating formula using horizontal and vertical crossover operators [41]. Wang and Liu introduced a chaos search method, using the uncertainty of chaos to help the algorithm jump out of local optima [42]. Chen and Zhang adopted a dynamic-topology PSO algorithm in which the structure of the population changes with the information of the particles [43]. Liang and Kang proposed an adaptive mutation strategy and adopted a nonlinear variable inertia weight strategy [44]. The inertia weight is an important parameter of the PSO algorithm; Shi and Eberhart introduced the inertia weight before the velocity term of the basic PSO algorithm [45]. Alireza and Modares proposed an adaptive inertia weight strategy; the improved PSO algorithm was applied to the optimization of PID parameters and obtained good results compared with the genetic algorithm [46]. Eberhart and Shi proposed a randomly changing inertia weight strategy [47]. Malik and Rahman used a sigmoid-changing inertia weight strategy [48], in which the inertia weight is stable at the beginning and end stages and changes faster in the middle stage. Gholamian proposed a chaotically changing inertia weight [49]. Javad and Mousa proposed a nonlinearly changing inertia weight [50].
Alireza and Fateh proposed a mutation strategy to improve global exploration ability and convergence speed, and used it to identify a hydraulic suspension system [51]. At present, improvements to the PSO algorithm mainly involve two aspects: the inertia weight strategy and the mutation strategy. However, most existing strategies address only one of these aspects, and there is almost no literature comparing the combined performance of mutation strategies and inertia weight strategies. In this paper, we compared the combined performance of different inertia weight and mutation strategies, and obtained the best mutation strategy, the best inertia weight strategy and the best combination of the two. Finally, we used these three PSO algorithms to optimize the parameters of SVM for biomedical information classification.

Algorithm and Model

Support Vector Machine

SVM is a machine learning method proposed in the 1990s based on statistical learning theory [52]. It is built on the principle of structural risk minimization. The kernel method of SVM transforms a problem that is inseparable in a low-dimensional space into a linearly separable problem in a high-dimensional space. For a given sample set (x_i, y_i), i = 1, 2, …, n, y_i ∈ {+1, −1}, where x_i denotes the input data, y_i denotes the output label and n denotes the number of samples, the separating hyperplane can be expressed as

w·x + b = 0,

where w denotes the weight vector and b is the threshold. The optimization problem of SVM is described as

min_{w,b,ξ} (1/2)‖w‖² + C·Σ_{i=1}^{n} ξ_i,   subject to y_i(w·x_i + b) ≥ 1 − ξ_i, ξ_i ≥ 0,

where C is the penalty parameter, which controls the degree of punishment for misclassified samples. The above problem can be transformed into its Lagrange dual form, and the final decision function is

f(x) = sgn(Σ_{i=1}^{n} α_i·y_i·K(x_i, x) + b),

where f(x) is the decision function. When a sample is fed into the function, the decision function predicts the corresponding label.
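The RBF kernel and the decision function above can be sketched in a few lines. This is an illustrative NumPy sketch (the paper itself used MATLAB); the support vectors, coefficients α_i and bias b in the usage below are hypothetical placeholders rather than the result of actual training:

```python
import numpy as np

def rbf_kernel(X, Z, g):
    """K(x, z) = exp(-g * ||x - z||^2): the RBF kernel whose width
    parameter g (together with the penalty factor C) must be tuned."""
    sq = (np.sum(X ** 2, axis=1)[:, None]
          + np.sum(Z ** 2, axis=1)[None, :]
          - 2.0 * X @ Z.T)
    return np.exp(-g * np.maximum(sq, 0.0))

def decision(x, sv, alpha, y, b, g):
    """f(x) = sign(sum_i alpha_i * y_i * K(sv_i, x) + b).
    sv, alpha, y, b are assumed to come from an already-trained SVM."""
    k = rbf_kernel(sv, x[None, :], g)[:, 0]
    return np.sign(np.sum(alpha * y * k) + b)

# hypothetical "trained" model: two support vectors with opposite labels
sv = np.array([[0.0, 0.0], [1.0, 1.0]])
alpha = np.array([1.0, 1.0])
y = np.array([1.0, -1.0])
```

A larger g makes the kernel more local (each support vector influences only nearby points), which is why g and C jointly control over- versus under-fitting.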

Particle Swarm Optimization Algorithm

PSO is an intelligent evolutionary algorithm proposed by Kennedy and Eberhart based on the foraging behavior of birds [21]. In the PSO algorithm, the process of optimizing a problem is treated as the process of birds searching for food. Each particle is an abstract solution of the optimization problem, described by two parameters: the position and the velocity of the particle. In each iteration, the velocity and position of a particle are updated by the formulas

v_id(t+1) = w·v_id(t) + c1·r1·[pbest_id(t) − x_id(t)] + c2·r2·[gbest_d(t) − x_id(t)],
x_id(t+1) = x_id(t) + v_id(t+1),

where i denotes the ith particle; d denotes the dimension; t is the iteration number; x_id denotes the position of the ith particle in dimension d, limited to the interval [popmin, popmax]; v_id(t) denotes the velocity of the particle; w is the inertia weight, which controls the influence of the current velocity on the next generation's; pbest_id(t) denotes the individual best position and gbest_d(t) the global best position; c1 and c2 are acceleration factors; and r1 and r2 are random numbers in the interval [0, 1]. Let z = c1·r1·[pbest_id(t) − x_id(t)] + c2·r2·[gbest_d(t) − x_id(t)]. Figure 1 shows the two formulas represented with vectors in two dimensions. Figure 1(a) depicts the velocity formula: the velocity v(t+1) is generated from w·v(t) and z. Figure 1(b) depicts the position formula: the new position x(t+1) is the sum of the position x(t) and the velocity v(t+1).
Figure 1

Updating of a PSO algorithm with vector representation in two dimensions. (a) velocity update schematic. (b) position update schematic

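The velocity and position updates above translate directly into code. Here is a minimal NumPy sketch (the paper used MATLAB); the constant inertia weight w = 0.7, the iteration budget and the sphere test function are illustrative assumptions, while c1 = c2 = 1.49445 and the velocity clamp vmax = 0.2(popmax − popmin) follow the parameter settings reported later in the paper:

```python
import numpy as np

def pso(fitness, dim, n_particles=20, iters=200,
        pop_min=-100.0, pop_max=100.0,
        w=0.7, c1=1.49445, c2=1.49445, seed=0):
    """Minimal PSO following the velocity/position update formulas."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(pop_min, pop_max, (n_particles, dim))
    v_max = 0.2 * (pop_max - pop_min)
    v = rng.uniform(-v_max, v_max, (n_particles, dim))
    pbest = x.copy()
    pbest_fit = np.array([fitness(p) for p in x])
    gbest = pbest[pbest_fit.argmin()].copy()
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # velocity update: inertia + cognitive + social terms
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        v = np.clip(v, -v_max, v_max)
        # position update, clamped to the search range
        x = np.clip(x + v, pop_min, pop_max)
        fit = np.array([fitness(p) for p in x])
        improved = fit < pbest_fit
        pbest[improved] = x[improved]
        pbest_fit[improved] = fit[improved]
        gbest = pbest[pbest_fit.argmin()].copy()
    return gbest, pbest_fit.min()

# minimize the sphere function in 5 dimensions
best, best_fit = pso(lambda p: np.sum(p ** 2), dim=5)
```

On the sphere function the swarm contracts quickly toward the origin, which is the behavior the unimodal benchmarks in this paper are designed to measure.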

Mutation Strategy and Inertia Weight Strategy of PSO

Mutation Strategy of PSO

The mutation strategy of PSO mainly changes the position of a particle. Figure 2 shows the effect of a mutation strategy on the PSO algorithm. The solid line denotes the original position and the dotted line the new position after mutation. Since the mutation strategy does not directly affect the velocity of the particle, the vector v(t+1) in Figure 2(a) does not change. In Figure 2(b), the position of the particle changes from x(t) to x(t)' after mutation, and the position of the next generation changes from x(t+1) to x(t+1)' by adding v(t+1). If we adjust the particle's moving distance reasonably, we can control the particle's movement within a certain range, which helps the particle jump out of local optima.
Figure 2

The effect of mutation strategy on the PSO algorithm. (a) speed update schematic; (b) position updating schematic after introducing the mutation strategy

Alireza proposed an adaptive mutation strategy in which the moving distance of a particle is adjusted automatically according to fitness values [53]. The mutation is calculated as

x_ij' = x_ij + M_t·β_ij,   M_t = popmax·tanh[(1/α)·F(gbest_d(t))],

where x_i' is the position of the ith particle after mutation, β_ij is a random number drawn from a Gaussian distribution with mean 0 and standard deviation 1, and the parameter M_t controls the mutation distance, computed from the fitness of the global best position. popmax is the maximum value of the particle search space, tanh is a monotonically increasing function, and F(gbest_d(t)) is the best fitness of the population. Because the value of tanh lies in the interval [0, 1], the value of M_t can be controlled adaptively within a certain range. Stacey and Jancic proposed a Gaussian mutation algorithm (GPSO) [54]. Its basic idea is to change the position of a particle when it is trapped in a local optimum, and the mutation formula of particle x_i is

x_i' = x_i·(1 + Gaussian(σ)),

where x_i is the current position, x_i' is the position after mutation, and Gaussian(σ) is a random number drawn from a Gaussian distribution with mean 0 and standard deviation σ; the value of σ is 0.1 times the length of the search space. Mutation increases the ability of the particle to escape local optima. In Gaussian PSO, a Gaussian operator is added to the particle's position; by the properties of the Gaussian distribution, the operator value is small in most cases, so the particle moves within a small range. Therefore, Gaussian mutation has strong local search ability but poor global search ability.
In addition, there are many variants of the Gaussian strategy, such as Zhan and Lu's neighbor-heuristic and Gaussian cloud learning particle swarm optimization algorithm [55]. Wang and Li proposed a Cauchy mutation strategy (CPSO) [56]. The mutation formula of the particle is

pg'(i) = pg(i) + W(i)·N,   W(i) = (1/n)·Σ_{j=1}^{n} V[j][i],

where W is the mean vector of all particle velocities, n is the size of the population, and N is a random number drawn from the Cauchy distribution. Because the probability density function of the Cauchy distribution is a relatively flat strip with heavier tails than a Gaussian, the particle's position can change greatly, so Cauchy mutation has stronger jumping ability. There are also many variants based on Cauchy mutation, such as adding a scaling factor to the Cauchy mutation to control the distance the particle moves [57]. Brockmann and Sokolov described the Lévy flight pattern [58], in which movement stays within a small range in most cases and occasionally jumps to a distant position. Hakli and Uguz applied Lévy mutation to particle swarm optimization (LFPSO) [59], with the mutation formula

x_i' = x_i + α ⊕ Levy(β),

where α denotes the step size of the particle's movement and Levy(β) denotes the Lévy distribution with parameter β, a variable in [0, 2]. Due to the occasional long-distance movement of Lévy flight, the particle's position may occasionally move over a wide range during mutation, causing the particle to jump out of local optima. Wang and Wu combined the Gaussian, Cauchy and Lévy mutation strategies in an adaptive approach [60]. Nishio and Kushida proposed an adaptive PSO with a multidimensional mutation strategy [61]. Li and Liu proposed a stochastic mutation method (RPSO) [62]. The particle mutates by the formula

x_i' = x_i + (popmax − popmin)·r,

where popmax and popmin denote the maximum and minimum of the particle search range, respectively, and r is a random number drawn uniformly from the interval (0, 1). Zhang and Lu proposed a feedback mutation particle swarm optimization algorithm (FBPSO) [63]. The mutation formula is

x_i' = x_i + β·x_i,   β ~ N(0, σ²),   σ = |(fit(i) − fit_gbest)/(fit_avg − fit_gbest)| + 0.1,

where β obeys a Gaussian distribution with mean 0 and standard deviation σ, fit(i) is the fitness of the ith particle, fit_gbest is the global best fitness, and fit_avg is the mean fitness of all particles. Since σ changes with fitness, this mutation strategy can adjust the distribution of mutated positions based on the information of the particle.
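The Gaussian, Cauchy and Lévy mutation operators above differ mainly in the distribution from which the random step is drawn. The following NumPy sketch illustrates this; the step sizes, σ, and the Mantegna approximation used for Lévy steps are illustrative assumptions, not the exact formulas of the cited papers:

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def gaussian_mutation(x, sigma=0.1):
    """GPSO-style: x' = x * (1 + Gaussian(sigma)); small, local moves."""
    return x * (1.0 + rng.normal(0.0, sigma, x.shape))

def cauchy_mutation(x, scale=1.0):
    """Cauchy-style: heavy tails give occasional large jumps."""
    return x + scale * rng.standard_cauchy(x.shape)

def levy_step(shape, beta=1.5):
    """Mantegna's approximation of a Levy(beta) step: mostly small
    moves, with rare long flights."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma_u = (num / den) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, shape)
    v = rng.normal(0.0, 1.0, shape)
    return u / np.abs(v) ** (1 / beta)

def levy_mutation(x, alpha=0.01):
    """LFPSO-style: x' = x + alpha * Levy(beta), element-wise."""
    return x + alpha * levy_step(x.shape)
```

Plotting histograms of the three step distributions makes the trade-off visible: the Gaussian stays local, while the Cauchy and Lévy tails produce the occasional long jump that helps a particle escape a local optimum.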

Inertia Weight Strategy of PSO

Kennedy and Eberhart introduced the inertia weight parameter into the original PSO algorithm and proposed the particle swarm algorithm with inertia weight [21]. Figure 3 shows the effect of the inertia weight in a PSO algorithm. In Figure 3(a), the vector changes from w·v(t) to w·v(t)' after changing the inertia weight w, and the velocity then changes from v(t+1) to v(t+1)'. In Figure 3(b), the new position of the particle changes from x(t+1) to x(t+1)' by adding v(t+1)' to the position x(t). Therefore, the inertia weight in the PSO algorithm essentially changes the position of the particles. With a larger inertia weight, the position of a particle also changes greatly, which facilitates global exploration over a large space. With a smaller inertia weight, the position changes less, which facilitates local search in a small region.
Figure 3

The effect of the inertia weight on PSO algorithm. (a) the speed update schematic after changing the inertia weight. (b) the position update schematic after changing the inertia weight
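Several of the inertia-weight schedules surveyed in this section reduce to one-line update rules. The following NumPy sketch shows three of them; the values wmax = 0.9 and wmin = 0.4 are common illustrative choices, not parameters taken from this paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def w_random():
    """Eberhart and Shi's random inertia weight: w in [0.5, 1.0]."""
    return 0.5 + rng.random() / 2.0

def w_exponential(t, t_max, w_max=0.9, w_min=0.4,
                  alpha=1.0 / np.pi ** 2):
    """LHNPSO-style decrease: w = wmax - (wmax - wmin)(t/tmax)^alpha."""
    return w_max - (w_max - w_min) * (t / t_max) ** alpha

def w_success_rate(successes, w_max=0.9, w_min=0.4):
    """AIWPSO-style: w = (wmax - wmin)*Ps + wmin, where Ps is the
    fraction of particles that improved their personal best."""
    ps = np.mean(successes)
    return (w_max - w_min) * ps + w_min
```

The decreasing schedule biases the swarm toward exploration early and exploitation late, while the success-rate schedule adapts to feedback from the population itself, which is the direction the MDAPSO strategy below pushes further.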

The PSO algorithm is nonlinear and highly complex in its search process. To improve the ability to monitor the population, Alireza proposed an inertia weight strategy with fitness feedback [53]:

w(t) = 0.5·{1 + tanh[(1/α)·F(gbest_d(t))]},

where tanh and F(gbest_d(t)) are defined as in formula (7): the hyperbolic tangent and the fitness of the current best solution. In this strategy, the inertia weight is limited to the interval [0.5, 1]. When the fitness does not decrease, w(t) is large and changes slowly, which benefits global exploration; when the fitness decreases, w(t) is small, which benefits local exploitation. Eberhart and Shi proposed a random inertia weight (RANDPSO) [47], calculated as

w = 0.5 + rand/2,

where rand is a random number in [0, 1], so w is a random number in the interval [0.5, 1]. Since the inertia weight is random, the result of the algorithm is somewhat contingent. Yang and Gao proposed an exponential inertia weight strategy (LHNPSO) [64]:

w(t) = wmax − (wmax − wmin)·(t/tmax)^α,

where wmax and wmin are the maximum and minimum inertia weights, respectively, and α = 1/π². The inertia weight decreases with the number of iterations, which benefits both global exploration and local exploitation. Nickabadi and Ebadzadeh proposed an adaptive inertia weight adjusted by the ratio of successful particles (AIWPSO) [65]:

w(t) = (wmax − wmin)·Ps(t) + wmin,   Ps(t) = (1/n)·Σ_{i=1}^{n} S_i(t),

where n is the size of the population, fit(pbest_i(t)) denotes individual fitness, and Ps(t) is the search success rate. If the individual fitness of a particle is smaller than its individual fitness in the last generation, the particle has searched successfully and S_i(t) = 1; otherwise S_i(t) = 0.
The inertia weight helps supervise the information of the particles in the population. Chauhan and Deep used the global best position and the individual best position to adjust the inertia weight (DESIWPSO) [66]. The inertia weight of the ith particle is calculated as

w(t+1) = exp(−exp(−R_i(t))),   R_i(t) = |gbest(t) − pbest_i(t)|·(tmax − t)/tmax.

Taherkhani and Safabakhsh proposed a multi-dimensional changing inertia weight strategy (SAIWPSO) [67]. In this method, the inertia weight of the next generation is automatically adjusted by the position information of the current particles. Moreover, the concept of search success of the ith particle is introduced: if fit(x_i(t+1)) < fit(pbest_i(t)), the particle has searched successfully and δ_i(t) = 1; otherwise δ_i(t) = −1, where fit(x_i(t+1)) is the fitness of the ith particle in generation t+1 and fit(pbest_i(t)) is the individual best fitness. The inertia weight formula is

w_ij(t+1) = min{1, w_ij(t) + (1 − w0)·N + ε},   if δ_i(t) > 0 and δ_i(t−1) > 0;
w_ij(t+1) = max{0.1, w_ij(t) − w0·(1 − N) − ε},   if δ_i(t) < 0 and δ_i(t−1) < 0;
w_ij(t+1) = w_ij(t),   otherwise;
N = exp(−(x_ij(t+1) − pbest_ij(t))²/(2σ²)),

where w_ij(t) is the inertia weight of the ith particle in the jth dimension, w0 is the initial inertia weight, σ is the standard deviation of all particles in the jth dimension, and ε is a small positive number, ε = 0.005. All of the above inertia weight strategies do not make full use of the information of the particle population and cannot supervise the state of the population very well. Therefore, we proposed a multi-information fusion inertia weight (MDAPSO) [68]. To supervise the state of the population, we refer to other adaptive inertia weights and introduce the velocity and position of a particle into the inertia weight, as the terms Y and Z. To control the overall trend of the inertia weight, we refer to the inertia weight decreasing with iterations and introduce λ1 and λ2, and to avoid particles falling into local optima, we introduce a random disturbance. To increase the diversity of the inertia weight, the inertia weight w_id changes at each iteration, for each particle and each dimension.
Finally, the formula of the inertia weight is:

w_ij(t) = min{1, w_ij(t−1) + ((1−α)·Y + (1−β)·Z + γ·rand)·λ1(t−1)},   case1;
w_ij(t) = max{0.1, w_ij(t−1) − (α·Y + β·(1−Z) + (1−γ)·rand)·λ2(t−1)},   case2;
w_ij(t) = w_ij(t−1),   case3.

In the formula, min and max limit the inertia weight w_ij(t) to the interval [0.1, 1]; α = 0.9, β = 0.55, γ = 0.5. case1, case2 and case3 are three conditions, and δ_i(t−1) determines whether the particle has searched successfully: case1 indicates that the particle has searched successfully for two generations, case2 that it has not searched successfully for two generations, and case3 that neither of the two cases holds. In the proposed inertia weight formula (27), we refer to α, β and γ as weight coefficients and limit them to the interval [0, 1]; the weight coefficients denote the influence ratios of Y, Z and rand on the inertia weight. To obtain the weight coefficients easily and quickly, we fix two coefficients and test the remaining one with a step length of 0.1, and the principle is shown in Figure 4. The method fixes β and γ first. Because each weight coefficient lies in the interval [0, 1], the midpoint of the interval is chosen as the fixed value. We first fix β = 0.5 and γ = 0.5, calculate the fitness of each benchmark function for different values of α, and select the value of α for which the fitness of each benchmark function is minimal. In the same manner, we fix α and γ to obtain β, and fix α and β to obtain γ. Figure 5(a), (b) and (c) show the effect of α on the benchmark functions when β and γ are fixed. The horizontal axis is α, in the interval [0, 1], and the vertical axis is the error between the fitness and the theoretical value; the smaller the error, the better the performance of the algorithm. From Figure 5(a), (b) and (c), we note that when α = 0.9 the algorithm has a small error. From Figure 5(d), (e) and (f), we find β = 0.5, and from Figure 5(g), (h) and (i), we observe γ = 0.5.
We obtain the optimal value of each coefficient, but each is optimal only under certain conditions, and their combination is not necessarily the optimal solution for the algorithm. Therefore, we fine-tune α, β and γ near these three values to select the most appropriate combination. Eventually, we find that with α = 0.9, β = 0.55 and γ = 0.5 the algorithm achieves a better result.
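The coefficient-selection procedure described above (fix two weight coefficients, sweep the third over [0, 1] in steps of 0.1, keep the best value) is a one-pass coordinate-wise grid search. Here is a Python sketch, under the assumption that each found value is carried forward into the next sweep (the paper is not fully explicit on this point), with a toy error function standing in for the benchmark-function fitness:

```python
import numpy as np

def coordinate_search(objective, n_coeffs=3, init=0.5, step=0.1):
    """One-pass coordinate-wise grid search: start every coefficient
    at `init`, sweep one coefficient over [0, 1] in `step` increments
    while the others stay fixed, keep the best value, then move on.
    `objective` maps a coefficient vector to an error to minimize."""
    coeffs = np.full(n_coeffs, init)
    grid = np.round(np.arange(0.0, 1.0 + step / 2, step), 10)
    for i in range(n_coeffs):
        trial = coeffs.copy()
        errors = []
        for val in grid:
            trial[i] = val
            errors.append(objective(trial))
        coeffs[i] = grid[int(np.argmin(errors))]
    return coeffs

# toy error surface with its minimum at (0.9, 0.5, 0.5)
obj = lambda c: (c[0] - 0.9) ** 2 + (c[1] - 0.5) ** 2 + (c[2] - 0.5) ** 2
```

Because each coefficient is tuned while the others are held fixed, the combined result is only locally optimal, which is why the fine-tuning step around the three found values is still needed.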
Fig. 4

Method for obtaining parameters α, β and γ in this paper

Fig. 5

Effects of different α, β and γ on benchmark functions

The pseudo code of our improved inertia weight PSO algorithm is shown in Algorithm 1.

Algorithm 1
  initialize values: n, D, c1, c2, tmax, popmin, popmax, vmin, vmax, wid
  for i = 1 to n
      initialize xi = (xi1, xi2, xi3, …, xiD), vi = (vi1, vi2, vi3, …, viD)
  end
  evaluate the fitness values fitness(xi)
  pbesti = xi
  set the particle with the best fitness to gbest
  for t = 1 to tmax
      for i = 1 to n
          for d = 1 to D
              vid(t+1) = wid·vid(t) + c1·r1·[pbestid(t) − xid(t)] + c2·r2·[gbestd(t) − xid(t)]
          end
          xi(t+1) = xi(t) + vi(t+1)
          evaluate the fitness value fitness(xi)
          if fitness(xi) < fitness(pbesti)
              pbesti = xi
          end
          if fitness(xi) < fitness(gbest)
              gbest = xi
          end
      end
      for i = 1 to n
          for d = 1 to D
              update wid
          end
      end
  end

The flow chart of the algorithm is shown in Figure 6.
Fig 6

Flow chart of MDAPSO algorithm


Comparing the Different Inertia Weight Strategies and Mutation Strategies

Benchmark Function and Parameter Setting

To test the performance of the improved strategies, the different inertia weight strategies and mutation strategies were applied to optimize the benchmark functions used by Taherkhani and Safabakhsh [67]. All benchmark functions were minimized. Table 3 shows the formula, dimension, search space and global minimum value of each test function. f1-f8 are unimodal benchmark functions, mainly used to test optimization precision; f9-f15 are multimodal benchmark functions, and f16-f20 are rotated multimodal benchmark functions, mainly used to test the algorithm's ability to jump out of local optima. The parameters of the PSO algorithm have a great influence on its behavior. To ensure fair testing, all parameters of the algorithm were the same except the inertia weight strategy and mutation strategy adopted from the different references: population size n = 20, learning factors c1 = c2 = 1.49445, maximum velocity vmax = 0.2(popmax − popmin), minimum velocity vmin = −vmax, dimension D and particle search range [popmin, popmax] determined by the benchmark function, and iteration number tmax = 10000D; each algorithm was run 30 times independently. The final result for each benchmark function is the mean of the 30 runs. The experimental platform was an HP Z640 workstation with 32 GB of memory and the Windows 7 64-bit operating system. The software was MATLAB R2014a, and parallel computing was used. We provide an example of the parallel computing setup (https://github.com/zhangming8/APSO). This example implements the improved PSO proposed by Alireza [53], so some of the parameters are the same as in the original paper. Table 1 and Table 2 summarize the improvement strategies discussed above.
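For reference, three of the benchmark functions in Table 3 can be written as follows (a NumPy sketch; the paper's experiments used MATLAB):

```python
import numpy as np

def sphere(x):
    """f1, unimodal; global minimum 0 at the origin."""
    return np.sum(x ** 2)

def rastrigin(x):
    """f10, multimodal; global minimum 0 at the origin."""
    return np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x) + 10)

def ackley(x):
    """f12, multimodal; global minimum 0 at the origin."""
    d = x.size
    return (-20 * np.exp(-0.2 * np.sqrt(np.sum(x ** 2) / d))
            - np.exp(np.sum(np.cos(2 * np.pi * x)) / d) + 20 + np.e)
```

The unimodal sphere function measures convergence precision, while the cosine terms of Rastrigin and Ackley create many local minima, testing the ability of a strategy to escape them.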
Table 3

The information of the benchmark functions

Function | Formula | D | Search space | $f(x)_{min}$
Sphere | $f_1(x)=\sum_{i=1}^{D} x_i^2$ | 30 | $[-100,100]^D$ | 0
Rotated hyper-ellipsoid | $f_2(x)=\sum_{i=1}^{D}\left(\sum_{j=1}^{i} x_j\right)^2$ | 30 | $[-100,100]^D$ | 0
Step | $f_3(x)=\sum_{i=1}^{D}\left(\lfloor x_i+0.5\rfloor\right)^2$ | 30 | $[-100,100]^D$ | 0
Branin | $f_4(x)=\left(x_2-\frac{5.1}{4\pi^2}x_1^2+\frac{5}{\pi}x_1-6\right)^2+10\left(1-\frac{1}{8\pi}\right)\cos(x_1)+10-\frac{5}{4\pi}$ | 2 | $-5\le x_1\le 10,\ 0\le x_2\le 15$ | 0
Rosenbrock | $f_5(x)=\sum_{i=1}^{D-1}\left[100\left(x_i^2-x_{i+1}\right)^2+\left(x_i-1\right)^2\right]$ | 30 | $[-5,10]^D$ | 0
McCormick | $f_6(x)=\sin(x_1+x_2)+(x_1-x_2)^2-1.5x_1+2.5x_2+2.9133$ | 2 | $-1.5\le x_1\le 4,\ -3\le x_2\le 4$ | 0
Beale | $f_7(x)=(1.5-x_1+x_1x_2)^2+(2.25-x_1+x_1x_2^2)^2+(2.625-x_1+x_1x_2^3)^2$ | 2 | $[-4.5,4.5]^D$ | 0
Bukin N.6 | $f_8(x)=100\sqrt{\left|x_2-0.01x_1^2\right|}+0.01\left|x_1+10\right|$ | 2 | $-15\le x_1\le -5,\ -3\le x_2\le 3$ | 0
Schwefel | $f_9(x)=-\sum_{i=1}^{D} x_i\sin\left(\sqrt{|x_i|}\right)+418.9829D$ | 30 | $[-500,500]^D$ | 0
Rastrigin | $f_{10}(x)=\sum_{i=1}^{D}\left[x_i^2-10\cos(2\pi x_i)+10\right]$ | 30 | $[-5.12,5.12]^D$ | 0
Noncontinuous Rastrigin | $f_{11}(x)=\sum_{i=1}^{D}\left[y_i^2-10\cos(2\pi y_i)+10\right]$, $y_i=\begin{cases}x_i & |x_i|<0.5\\ \mathrm{round}(2x_i)/2 & \text{else}\end{cases}$ | 30 | $[-5.12,5.12]^D$ | 0
Ackley | $f_{12}(x)=-20\exp\left(-0.2\sqrt{\frac{1}{D}\sum_{i=1}^{D}x_i^2}\right)-\exp\left(\frac{1}{D}\sum_{i=1}^{D}\cos 2\pi x_i\right)+20+e$ | 30 | $[-32,32]^D$ | 0
Griewank | $f_{13}(x)=\frac{1}{4000}\sum_{i=1}^{D}x_i^2-\prod_{i=1}^{D}\cos\left(\frac{x_i}{\sqrt{i}}\right)+1$ | 30 | $[-600,600]^D$ | 0
Levy | $f_{14}(x)=\sin^2(\pi y_1)+\sum_{i=1}^{D-1}(y_i-1)^2\left(1+10\sin^2(\pi y_i+1)\right)+(y_D-1)^2\left(1+\sin^2(2\pi y_D)\right)$, $y_i=1+\frac{x_i-1}{4}$ | 30 | $[-10,10]^D$ | 0
Shubert | $f_{15}(x)=\left(\sum_{i=1}^{5}i\cos((i+1)x_1+i)\right)\left(\sum_{i=1}^{5}i\cos((i+1)x_2+i)\right)+186.7309$ | 2 | $[-10,10]^D$ | 0
Rotated Schwefel | $f_{16}(x)=-\sum_{i=1}^{D} y_i\sin\left(\sqrt{|y_i|}\right)+418.9829D$, $Y=M\times X$ | 30 | $[-500,500]^D$ | 0
Rotated Rastrigin | $f_{17}(x)=\sum_{i=1}^{D}\left[y_i^2-10\cos(2\pi y_i)+10\right]$, $Y=M\times X$ | 30 | $[-5.12,5.12]^D$ | 0
Rotated Noncontinuous Rastrigin | $f_{18}(x)=\sum_{i=1}^{D}\left[z_i^2-10\cos(2\pi z_i)+10\right]$, $z_i=\begin{cases}y_i & |y_i|<0.5\\ \mathrm{round}(2y_i)/2 & \text{else}\end{cases}$, $Y=M\times X$ | 30 | $[-5.12,5.12]^D$ | 0
Rotated Ackley | $f_{19}(x)=-20\exp\left(-0.2\sqrt{\frac{1}{D}\sum_{i=1}^{D}y_i^2}\right)-\exp\left(\frac{1}{D}\sum_{i=1}^{D}\cos 2\pi y_i\right)+20+e$, $Y=M\times X$ | 30 | $[-32,32]^D$ | 0
Rotated Griewank | $f_{20}(x)=\frac{1}{4000}\sum_{i=1}^{D}y_i^2-\prod_{i=1}^{D}\cos\left(\frac{y_i}{\sqrt{i}}\right)+1$, $Y=M\times X$ | 30 | $[-600,600]^D$ | 0
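A few of the benchmark functions in Table 3 can be written directly as code. The Python sketch below (our own translation of the formulas; the paper's experiments used MATLAB) implements f1, f10, f12 and f13; each attains its global minimum 0 at the origin:

```python
import numpy as np

def sphere(x):        # f1, unimodal
    return float(np.sum(x ** 2))

def rastrigin(x):     # f10, multimodal
    return float(np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x) + 10))

def ackley(x):        # f12, multimodal
    d = x.size
    return float(-20 * np.exp(-0.2 * np.sqrt(np.sum(x ** 2) / d))
                 - np.exp(np.sum(np.cos(2 * np.pi * x)) / d) + 20 + np.e)

def griewank(x):      # f13, multimodal
    i = np.arange(1, x.size + 1)
    return float(np.sum(x ** 2) / 4000
                 - np.prod(np.cos(x / np.sqrt(i))) + 1)

# each of these attains its global minimum 0 at the origin
origin = np.zeros(30)
```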
Table 1

Formulas of mutation strategies

Name | Author | Mutation strategy | Reference
m1 | Stacey et al. | $x'=x\left(1+\mathrm{Gaussian}(\sigma)\right)$ | [54]
m2 | Wang et al. | $p_g'(i)=p_g(i)+W(i)\times N(pop_{min},pop_{max})$, $W(i)=\frac{1}{n}\sum_{j=1}^{n}V[j][i]$ | [56]
m3 | Li et al. | $x'=x+\alpha\times(pop_{max}-pop_{min})\times r$ | [62]
m4 | Brockmann et al. | $x'=x+\alpha+\mathrm{Levy}(\beta)$, $\alpha=\mathrm{random}(\mathrm{size}(D))$ | [59]
m5 | Zhang et al. | $x'=x+\beta\times x$, $\beta\sim N(0,\sigma^2)$, $\sigma=\left|\frac{fit(i)-fit_{gbest}}{fit_{avg}-fit_{gbest}}\right|+0.1$ | [63]
m6 | Alireza et al. | $x'=x+M\times\beta$, $M=pop_{max}\times\tanh\left[\frac{1}{\alpha}\times F(gbest_d(t))\right]$ | [53]
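To make the mutation operators concrete, the following Python sketch implements a Gaussian mutation in the style of m1 and a Levy-flight mutation in the style of m4. The Mantegna sampling scheme and the step-size constants are our own assumptions for illustration; the cited papers define the exact operators:

```python
import numpy as np
from math import gamma, sin, pi

rng = np.random.default_rng(42)

def mutate_gaussian(x, sigma=0.1):
    """m1-style mutation: x' = x(1 + Gaussian(sigma))."""
    return x * (1.0 + rng.normal(0.0, sigma, size=x.shape))

def levy_step(shape, beta=1.5):
    """Heavy-tailed step lengths via Mantegna's algorithm (an assumed
    sampling scheme; the cited papers may use another)."""
    num = gamma(1 + beta) * sin(pi * beta / 2)
    den = gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma_u = (num / den) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, shape)
    v = rng.normal(0.0, 1.0, shape)
    return u / np.abs(v) ** (1 / beta)

def mutate_levy(x, alpha=0.01):
    """m4-style mutation: occasional long jumps help the swarm
    escape local optimal solutions."""
    return x + alpha * levy_step(x.shape)
```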
Table 2

Formulas of inertia weight strategies

Name | Author | Inertia weight strategy | Reference
w1 | Eberhart et al. | $w(t)=0.5+\frac{\mathrm{rand}}{2}$ | [47]
w2 | Yang et al. | $w(t)=w_{max}-(w_{max}-w_{min})\left(\frac{t}{t_{max}}\right)^{\alpha}$ | [64]
w3 | Nickabadi et al. | $w(t)=(w_{max}-w_{min})P_s(t)+w_{min}$, $P_s(t)=\frac{1}{n}\sum_{i=1}^{n}S_i(t)$ | [65]
w4 | Chauhan et al. | $w(t+1)=\exp\left(-\exp(-R_i(t))\right)$, $R_i(t)=\left|gbest(t)-pbest_i(t)\right|\times\frac{t_{max}-t}{t_{max}}$ | [66]
w5 | Taherkhani et al. | $w_{ij}(t+1)=\begin{cases}\min\{1,\ w_{ij}(t)+(1-w_0)\times N+\varepsilon\} & \text{if }\delta_i(t)>0\text{ and }\delta_i(t-1)>0\\ \max\{0.1,\ w_{ij}(t)-w_0\times(1-N)-\varepsilon\} & \text{if }\delta_i(t)<0\text{ and }\delta_i(t-1)<0\\ w_{ij}(t) & \text{else}\end{cases}$, $N=\exp\left(-\frac{\left(x_{ij}(t+1)-pbest_{ij}(t)\right)^2}{2\sigma^2}\right)$ | [67]
w6 | Li et al. | $w_{ij}(t)=\begin{cases}\min\{1,\ w_{ij}(t-1)+\left((1-\alpha)Y+(1-\beta)Z+\gamma\,\mathrm{rand}\right)\lambda(t-1)\} & \text{case 1}\\ \max\{0.1,\ w_{ij}(t-1)-\left(\alpha Y+\beta(1-Z)+(1-\gamma)\mathrm{rand}\right)\lambda(t-1)\} & \text{case 2}\\ w_{ij}(t-1) & \text{case 3}\end{cases}$ | [68]
w7 | Alireza et al. | $w(t)=0.5\left\{1+\tanh\left[\frac{1}{\alpha}\times F(gbest_d(t))\right]\right\}$ | [53]
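A few of the simpler inertia weight strategies can likewise be sketched in Python. The w_max/w_min endpoints below are conventional defaults of our own choosing, not values fixed by Table 2:

```python
import numpy as np

rng = np.random.default_rng(0)

def w_random():
    """w1 (Eberhart et al.): w = 0.5 + rand/2, uniform on [0.5, 1.0)."""
    return 0.5 + rng.random() / 2.0

def w_nonlinear(t, t_max, w_max=0.9, w_min=0.4, alpha=1.0):
    """w2 (Yang et al.): decrease from w_max to w_min over t_max
    iterations; the 0.9/0.4 endpoints are conventional defaults."""
    return w_max - (w_max - w_min) * (t / t_max) ** alpha

def w_success_rate(success_flags, w_max=0.9, w_min=0.4):
    """w3 (Nickabadi et al.): scale w by Ps(t), the fraction of
    particles that improved their personal best this iteration."""
    ps = float(np.mean(success_flags))
    return (w_max - w_min) * ps + w_min
```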

Comparing Mutation Strategies without Inertia Weight

The benchmark functions were first optimized by PSO with each mutation strategy alone, with no inertia weight strategy. The results are shown in Table 4. "BestNumber" in the last line of the table denotes the number of benchmark functions on which each PSO variant achieved the best result. As Table 4 shows, when no inertia weight strategy is used, m4 reaches the largest BestNumber (5), so its performance is the best. Its good results are concentrated in the multimodal benchmark functions, mainly because m4 introduces the Levy flight strategy: the particle occasionally makes a large jump, which helps the algorithm escape local optimal solutions. m6 and m3 also perform well. m6 uses an adaptive mutation strategy in which the mutation distance changes with the fitness of the population, so it exploits the information in the population to escape local optimal solutions; m3 uses a uniformly distributed mutation operator whose overall optimization performance is also good.
Table 4

The result of benchmark function (Mean) of PSO mutation strategies without inertia weight (w=1)

Function | m1 | m2 | m3 | m4 | m5 | m6
f1 | 3.14E+02 | 8.20E-03 | 5.43E-04 | 1.43E-02 | 1.87E+02 | 5.50E-03
f2 | 4.82E-03 | 1.19E-01 | 7.59E-02 | 1.64E-01 | 2.31E-02 | 6.35E-04
f3 | 3.25E+02 | 5.56E-03 | 5.47E+01 | 1.74E-02 | 1.60E-03 | 6.83E-03
f4 | 1.59E-05 | 1.45E-05 | 1.87E-06 | 1.42E-05 | 1.56E-05 | 9.18E-05
f5 | 1.52E-04 | 1.01E-03 | 5.45E-04 | 8.55E-03 | 1.21E-04 | 8.51E-05
f6 | 8.45E-06 | 9.26E-06 | 8.21E-04 | 9.19E-06 | 8.30E-05 | 4.57E-02
f7 | 3.45E-06 | 5.10E-03 | 8.06E-06 | 3.69E-05 | 5.54E-06 | 2.81E-02
f8 | 2.58E-02 | 2.33E-02 | 1.96E-02 | 3.85E-03 | 2.29E-02 | 4.26E-01
f9 | 6.12E+01 | 2.30E+02 | 6.92E+00 | 2.33E+02 | 5.93E+00 | 1.46E-01
f10 | 1.61E+01 | 1.75E-03 | 2.08E+01 | 1.89E-05 | 9.21E+00 | 8.72E-05
f11 | 1.13E+01 | 2.15E-03 | 1.65E+01 | 3.25E-05 | 2.89E+00 | 9.42E-04
f12 | 1.11E+00 | 6.72E-04 | 1.32E+00 | 4.06E-03 | 1.02E-02 | 9.82E-03
f13 | 3.56E+00 | 6.53E-03 | 4.90E+00 | 6.73E-02 | 1.44E+00 | 6.91E-02
f14 | 5.90E-01 | 1.42E-05 | 1.51E+00 | 2.04E-06 | 7.12E-01 | 1.50E-05
f15 | 6.73E-03 | 7.91E-03 | 5.98E-03 | 7.13E-03 | 4.98E-03 | 9.42E-02
f16 | 7.00E+02 | 6.73E+01 | 5.07E+00 | 6.83E+01 | 6.19E+02 | 4.10E+02
f17 | 1.71E+01 | 2.47E-03 | 2.09E+01 | 1.13E-04 | 1.80E+01 | 3.68E-03
f18 | 1.61E+01 | 2.65E-02 | 5.16E-03 | 4.23E-02 | 1.36E+01 | 9.91E+02
f19 | 1.08E+00 | 4.72E-03 | 1.32E-03 | 2.41E-03 | 1.02E+00 | 1.25E-03
f20 | 3.62E+00 | 7.84E-03 | 5.03E+00 | 6.46E-02 | 1.69E-03 | 3.57E-03
BestNumber | 2 | 2 | 4 | 5 | 3 | 4

Comparing Inertia Weight Strategies without Mutation

In this section, the benchmark functions were optimized by PSO with each inertia weight strategy alone, with no mutation strategy. The results are shown in Table 5. When no mutation strategy is used, w6 reaches the largest BestNumber (8), so its performance is the best. It performs well not only on unimodal benchmark functions but also on multimodal ones, mainly because w6 improves the diversity of the population, strengthens its adaptive ability, and balances global exploration and local exploitation well. The performance of w5 is also good, especially on unimodal benchmark functions, mainly because w5 adjusts the inertia weight using the position of the particle; moreover, the inertia weight changes with each iteration, each particle and each dimension, which helps to improve the diversity of the population.
Table 5

The result of benchmark functions (Mean) of PSO inertia weight strategies without mutation

Function | w1 | w2 | w3 | w4 | w5 | w6 | w7
f1 | 6.71E-98 | 8.86E+00 | 3.45E-18 | 1.11E+01 | 1.87E-122 | 5.59E-161 | 6.55E-05
f2 | 2.80E-100 | 1.78E+02 | 3.38E-49 | 2.07E+02 | 2.60E-128 | 6.04E-164 | 5.18E-05
f3 | 4.43E-30 | 6.30E+00 | 3.98E-06 | 1.08E+01 | 4.91E-29 | 1.65E-32 | 2.09E-04
f4 | 2.13E-05 | 2.13E-05 | 2.13E-05 | 2.13E-05 | 2.13E-06 | 2.13E-05 | 1.83E-06
f5 | 2.25E+00 | 4.23E+01 | 1.87E+00 | 1.93E+01 | 1.87E+00 | 1.27E+00 | 1.99E-05
f6 | 1.05E-02 | 7.72E-06 | 7.72E-06 | 7.72E-06 | 7.72E-06 | 1.05E-02 | 4.57E-02
f7 | 2.55E-03 | 5.09E-03 | 1.78E-02 | 1.02E-02 | 5.09E-03 | 5.09E-03 | 2.54E-02
f8 | 2.04E-03 | 2.80E-03 | 2.48E-03 | 1.66E-03 | 1.44E-03 | 2.80E-03 | 4.26E-01
f9 | 4.40E+02 | 5.80E+02 | 5.81E+02 | 4.96E+02 | 5.80E+02 | 4.03E+02 | 1.31E-01
f10 | 6.05E-06 | 7.19E-06 | 6.61E+00 | 6.61E+00 | 6.85E+00 | 5.67E-07 | 9.20E-06
f11 | 3.98E+00 | 5.30E+00 | 4.29E+00 | 5.82E+00 | 7.61E-03 | 1.25E-04 | 3.31E-04
f12 | 1.60E-01 | 5.64E-01 | 6.31E-01 | 5.21E-01 | 6.28E-01 | 2.07E-02 | 7.74E-04
f13 | 2.96E-03 | 2.83E-01 | 2.03E-02 | 6.06E-01 | 1.45E-02 | 7.40E-04 | 1.36E-03
f14 | 7.17E-01 | 7.83E-01 | 1.30E+00 | 2.17E-01 | 1.27E+00 | 8.13E-01 | 9.76E-05
f15 | 8.85E-07 | 8.85E-07 | 8.85E-07 | 8.85E-07 | 8.85E-07 | 8.79E-07 | 6.70E-02
f16 | 6.03E+02 | 5.89E+02 | 6.66E+02 | 6.94E+02 | 6.16E+02 | 5.92E+02 | 5.19E+03
f17 | 7.03E-02 | 7.35E-05 | 7.76E-04 | 7.96E-01 | 7.74E-05 | 6.95E-05 | 9.44E-05
f18 | 5.11E+00 | 7.19E+00 | 7.59E+00 | 7.08E+00 | 4.70E+00 | 4.85E+00 | 1.04E+02
f19 | 3.06E-01 | 6.25E-03 | 6.38E-01 | 6.24E-04 | 6.13E-01 | 1.78E-03 | 8.69E-04
f20 | 1.87E-03 | 1.93E-01 | 1.26E-02 | 5.50E-01 | 1.92E-05 | 1.70E-03 | 3.27E-05
BestNumber | 2 | 2 | 2 | 3 | 6 | 8 | 4

Comparing the Best Inertia Weight Strategy and the Best Mutation Strategy

The previous two sections identified the best mutation strategy and the best inertia weight strategy. We also compared the two directly; the results are shown in Table 6. As Table 6 shows, their overall performance is similar. For the unimodal functions f1, f2 and f3, the inertia weight w6 reaches much higher precision than the mutation m4, so a good inertia weight strategy strongly influences the precision of the PSO algorithm on unimodal functions. For the multimodal functions, the mutation strategy gives better results, showing that mutation improves the algorithm's ability to escape local optimal solutions.
Table 6

Results of the best inertia weight and mutation strategy

Function | m4 | w6
f1 | 1.43E-02 | 5.59E-161
f2 | 1.64E-01 | 6.04E-164
f3 | 1.74E-02 | 1.65E-32
f4 | 1.42E-05 | 2.13E-05
f5 | 8.55E-03 | 1.27E+00
f6 | 9.19E-06 | 1.05E-02
f7 | 3.69E-05 | 5.09E-03
f8 | 3.85E-03 | 2.80E-03
f9 | 2.33E+02 | 4.03E+02
f10 | 1.89E-05 | 5.67E-07
f11 | 3.25E-05 | 1.25E-04
f12 | 4.06E-03 | 2.07E-02
f13 | 6.73E-02 | 7.40E-04
f14 | 2.04E-06 | 8.13E-01
f15 | 7.13E-03 | 8.79E-07
f16 | 6.83E+01 | 5.92E+02
f17 | 1.13E-04 | 6.95E-05
f18 | 4.23E-02 | 4.85E+00
f19 | 2.41E-03 | 1.78E-03
f20 | 6.46E-02 | 1.70E-03
BestNumber | 10 | 10

Comparing the Combinations of Inertia Weight Strategies and Mutation Strategies

In the PSO algorithm, combining an inertia weight with a mutation strategy can further improve performance. Figure 7 shows the PSO update process when the two are combined: Figure 7(a) shows the velocity update and Figure 7(b) the position update. In Figure 7(a), after the inertia weight w is changed, the vector wv(t) becomes wv(t)', and the velocity v(t+1) therefore becomes v(t+1)'. In Figure 7(b), the mutation moves the particle from x(t) to x(t)', so the new position, obtained by adding v(t+1)' to x(t)', changes from x(t+1) to x(t+1)'. The particle's position is thus changed by the inertia weight and mutation operations together.
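The combined update just described can be sketched as a single step function. This is our own Python paraphrase of Figure 7, with mutate standing for any of the operators in Table 1:

```python
import numpy as np

def combined_step(x, v, pbest, gbest, w, mutate, vmax,
                  c1=1.49445, c2=1.49445, rng=None):
    """One PSO update combining an inertia weight (which rescales the
    old velocity, as in Figure 7(a)) with a mutation operator (which
    perturbs the position before the velocity is added, Figure 7(b))."""
    if rng is None:
        rng = np.random.default_rng()
    r1 = rng.random(x.shape)
    r2 = rng.random(x.shape)
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    v_new = np.clip(v_new, -vmax, vmax)
    x_new = mutate(x) + v_new      # x(t+1)' = x(t)' + v(t+1)'
    return x_new, v_new
```

Because both operations ultimately move the particle's position, a weight strategy and a mutation strategy tuned in isolation can work against each other when composed, which is the interference effect discussed in this section.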
Fig.7

Updating principle of PSO when inertia weight and mutation are used. (a) velocity updates, (b) position updates

In order to compare the combined performance of inertia weight and mutation strategies, the benchmark functions were optimized with every combination of w1-w7 and m1-m6. The parameters were the same as in Section 4.1. To reflect the performance of each combination clearly, the "BestNumber" of each combination relative to the others was calculated. Figure 8 shows the BestNumber for all combinations of inertia weight and mutation strategies: the horizontal axis denotes the inertia weight strategies, the vertical axis the BestNumber, and each subfigure shows the 7 combinations of one mutation strategy with the inertia weight strategies. Among the 42 combinations, the maximum BestNumber is 6, obtained by the best combination w3+m5 in Figure 8(e). Although w6 is the best inertia weight without mutation and m4 is the best mutation without inertia weight, their combination (w6+m4) is not the best. The main reason is that the mutation strategy changes the position of the particle directly, whereas the
Figure 8

The BestNumber when all the inertia weight strategies and mutation strategies are combined

inertia weight strategy changes the position indirectly by changing the velocity of the particle. The essence of both strategies is to change the position of the particle, so different inertia weight and mutation strategies can interfere with each other and affect the result of the PSO algorithm. The reason for this phenomenon can also be seen from Figure 7.

The Optimization in the Classification of Biomedical Datasets

Dataset and Test Process

In this section, different classification models were evaluated on datasets from the UCI machine learning repository maintained by Bache and Lichman [69]. The datasets used in this study were Breast Cancer, Diabetes, Liver-disorders, Parkinsons, Statlog (heart), and Lung-A (lung cancer) [70]; the details of each dataset are given in Table 7. Informed consent: informed consent has been obtained from all individuals included in this study.
Table 7

The datasets used in this paper

Dataset | Number | Features | Classes
Breast Cancer | 683 | 9 | 2
Diabetes | 768 | 8 | 2
Liver-disorders | 341 | 6 | 2
Parkinsons | 195 | 22 | 2
Lung-A | 197 | 1000 | 4
Statlog (heart) | 270 | 11 | 2
In the test process, SVM with the RBF kernel was used to classify the above datasets, and the PSO algorithm was used to optimize the penalty factor (C) and kernel parameter (g) of the SVM. The process was as follows:

1. Initialize the particle swarm: set the population size and the number of iterations, and initialize the position and velocity of each particle randomly;
2. Train the SVM model and calculate the classification accuracy with the current C and g;
3. Calculate the inertia weight;
4. Update the velocity and position;
5. Update the individual best position pbest and the global best position gbest;
6. Modify the particle position by mutation;
7. Check the stop condition; return to Step 2 if it is not satisfied, otherwise continue;
8. End.
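The steps above can be sketched as follows. To keep the example self-contained, a toy fitness surface stands in for the cross-validated SVM accuracy of Step 2 (in the paper this would be an LIBSVM train-and-score call), and the decreasing inertia weight is likewise only an illustrative choice:

```python
import numpy as np

def pso_tune(fitness, n=20, iters=50, lo=0.01, hi=100.0, seed=1):
    """PSO over the 2-D (C, g) space following the steps above;
    `fitness(C, g)` is maximized (cross-validated accuracy in the
    paper, a toy surface below)."""
    rng = np.random.default_rng(seed)
    vmax = 0.2 * (hi - lo)
    x = rng.uniform(lo, hi, (n, 2))            # columns: C, g
    v = rng.uniform(-vmax, vmax, (n, 2))
    pbest = x.copy()
    pbest_f = np.array([fitness(*p) for p in x])
    gbest = pbest[pbest_f.argmax()].copy()
    for t in range(iters):
        w = 0.9 - 0.5 * t / iters              # illustrative decreasing w
        r1 = rng.random((n, 2))
        r2 = rng.random((n, 2))
        v = np.clip(w * v + 1.49445 * r1 * (pbest - x)
                    + 1.49445 * r2 * (gbest - x), -vmax, vmax)
        x = np.clip(x + v, lo, hi)
        fx = np.array([fitness(*p) for p in x])
        better = fx > pbest_f
        pbest[better] = x[better]
        pbest_f[better] = fx[better]
        gbest = pbest[pbest_f.argmax()].copy()
    return gbest, float(pbest_f.max())

# toy stand-in for "train the SVM and return CV accuracy",
# peaking at C = 10, g = 1:
toy = lambda C, g: 1.0 / (1.0 + (np.log10(C) - 1.0) ** 2 + np.log10(g) ** 2)
(best_C, best_g), best_acc = pso_tune(toy)
```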

Results

From Section 4 and Section 5, it is known that m4 is the best pure mutation strategy, w6 is the best pure inertia weight strategy, and w3+m5 is the best combination of the two. This paper therefore compared the three PSO algorithms based on m4, w6, and w3+m5 for optimizing the penalty factor (C) and RBF kernel parameter (g) of the SVM. The other PSO parameters were set as follows: the population size was 20; the acceleration factors c1 and c2 were 1.49445; the maximum number of iterations was 1000; the penalty factor C and the kernel parameter g ranged from 0.01 to 100; positions were initialized randomly before each run; the maximum velocity of a particle was 0.2 times the range of the search space, and the minimum velocity was its opposite. We used the LIBSVM tool of Chang and Lin [71] during the test. 10-fold cross validation was used, repeated 5 times, and the test results were averaged. Figure 9 shows the classification accuracy curves of the three PSO algorithms during iteration, and Table 8 shows the final accuracies on all the datasets.
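The evaluation protocol (10-fold cross validation repeated 5 times, results averaged) can be sketched as follows; `evaluate` stands for the SVM fit-and-score step, and the helper name is our own:

```python
import numpy as np

def repeated_kfold_mean(evaluate, n_samples, k=10, repeats=5, seed=0):
    """The paper's protocol: k-fold CV repeated `repeats` times and
    the accuracies averaged. `evaluate(train_idx, test_idx)` returns
    one accuracy (an SVM fit-and-score in the paper)."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(repeats):
        idx = rng.permutation(n_samples)       # reshuffle each repeat
        for fold in np.array_split(idx, k):
            train = np.setdiff1d(idx, fold)    # everything not in the fold
            scores.append(evaluate(train, fold))
    return float(np.mean(scores)), float(np.std(scores))
```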
Figure 9

The accuracy curves of the three PSO algorithms during iteration on different datasets

Table 8

The average classification accuracies of all the datasets (Mean ± standard deviation)

Strategy | Breast Cancer | Diabetes | Liver-disorders | Parkinsons | Lung-A | Statlog (heart)
m4 | 98.03±0.08 | 80.32±0.32 | 79.48±1.10 | 98.26±1.07 | 82.51±5.89 | 85.93±2.14
w6 | 98.12±0.07 | 80.58±0.28 | 80.35±0.55 | 98.26±1.07 | 96.43±0.81 | 87.11±0.84
w3+m5 | 98.18±0.13 | 80.81±0.38 | 80.70±0.86 | 97.95±1.35 | 85.51±2.50 | 87.41±0.79
From Figure 9, the accuracy of w6 was highest on the Parkinsons and Lung-A datasets, the accuracy of the combination w3+m5 was highest on the other datasets, and the accuracy of m4 was lowest. During the iterations, w3+m5 improved rapidly in the early stage on the Breast Cancer and Diabetes datasets and eventually converged to the highest classification accuracy. On the Liver-disorders dataset, w3+m5 and w6 were almost the same at the beginning, but w3+m5 finally converged to the higher accuracy. On the Parkinsons dataset, the final accuracies of m4 and w6 were the same, and both were higher than w3+m5. On the Lung-A dataset, all the algorithms converged quickly to a stable value, and the accuracy of w6 was clearly higher than that of m4 and w3+m5. On the Statlog (heart) dataset, w3+m5 not only converged faster but also reached a higher accuracy than m4 and w6. In general, the combination w3+m5 and the inertia weight w6 that we proposed each performed best on some datasets, which shows that no single algorithm gives the best results on all problems; this conforms to the "no free lunch" theorem of Wolpert and Macready [72]. Therefore, in practical classification, one can try the combination w3+m5 first; if the improvement is not obvious, try w6 and choose the better algorithm, and if the two perform equally, use the one that is easier to implement.

Conclusion

This paper compared different inertia weight strategies, mutation strategies and their combinations by optimizing benchmark functions. From the results, we obtained the best mutation strategy without inertia weight, the best inertia weight strategy without mutation, and the best combination of the two. At the same time, we found that combining the best inertia weight with the best mutation strategy did not give the best result, mainly because the inertia weight and the mutation can interfere with each other. Finally, we used PSO with different inertia weight and mutation strategies to optimize the penalty factor and kernel parameter of SVM. The classification results showed that the combination of inertia weight and mutation strategy (w3+m5) and the inertia weight (w6) that we proposed each had their own advantages on the datasets, and both could improve the accuracy of biomedical information classification.