Training of feedforward neural networks for data classification using hybrid particle swarm optimization, Mantegna Lévy flight and neighborhood search.

Omid Tarkhaneh1, Haifeng Shen2.   

Abstract

Artificial Neural Networks (ANNs) are often applied to data classification problems. However, training ANNs remains a challenging task due to the large and high-dimensional nature of the search space, particularly in the process of fine-tuning the best set of control parameters in terms of weight and bias. Evolutionary algorithms have proved to be a reliable optimization method for training these parameters. While a number of conventional training algorithms have been proposed and applied to various applications, most of them share the common disadvantages of local optima stagnation and slow convergence. In this paper, we propose a new evolutionary training algorithm referred to as LPSONS, which combines the velocity operators in Particle Swarm Optimization (PSO) with the Mantegna Lévy distribution to produce more diverse solutions by dividing the population and the generations between different sections of the algorithm. It further combines Neighborhood Search with the Mantegna Lévy distribution to mitigate premature convergence and avoid local minima. The proposed algorithm can find optimal results while avoiding stagnation in locally optimal solutions and preventing premature convergence in training feedforward Multi-Layer Perceptron (MLP) ANNs. Experiments with fourteen standard datasets from the UCI machine learning repository confirm that the LPSONS algorithm significantly outperforms a gradient-based approach as well as some well-known evolutionary algorithms that are also based on enhancing PSO.

Keywords:  Computer science

Year:  2019        PMID: 30993220      PMCID: PMC6449775          DOI: 10.1016/j.heliyon.2019.e01275

Source DB:  PubMed          Journal:  Heliyon        ISSN: 2405-8440


Introduction

Artificial Neural Networks (ANNs) are inspired by the human nervous system and are often used for pattern recognition and data classification [1] in application domains such as manufacturing and medical diagnostics [2], [3], [4], [5], [6]. One type of ANN is the Multi-Layer Perceptron (MLP), which is particularly useful for nonlinear modeling and is trained by algorithms that are either gradient-based or based on meta-heuristics [7]. A well-known gradient-based method is Back Propagation (BP), which, however, can get trapped in local minima and may fail to find appropriate values for the weight and bias parameters, especially when the problem is large in scale [8]. This limitation has inspired researchers to harness meta-heuristic approaches to train ANNs, as their stochastic nature helps them find globally optimal results [9]. A well-known meta-heuristic training algorithm is Particle Swarm Optimization (PSO) [10], a swarm intelligence-based algorithm inspired by the social behavior of animals, such as bird flocking and fish schooling. Each fish or bird is treated as a particle that has a position and a velocity, and particles follow their local best positions until they reach the global best position. PSO and related swarm intelligence algorithms such as Ant Colony Optimization (ACO) and Artificial Bee Colony (ABC) have been employed in various studies to train MLP ANNs and have shown strong performance in optimizing the training process [11], [12], [13]. A weak trade-off between exploration and exploitation and limited population diversity are two major challenges in PSO [14], and work has been done to address them in terms of parameter setting, neighborhood topology, learning approaches, and hybridized methods [15].
For example, some works fine-tuned and regulated the parameters through memory adaptation [16], [17], Gaussian adaptation [18], or fuzzy-based methods [19], while others attempted to avoid premature convergence by using a neighborhood strategy such as fully informed [20], self-adaptive [21], or ring topology [22], or a combination strategy based on the Lévy distribution such as LFPSO [23] and PSOLF [24]. The No Free Lunch (NFL) theorem asserts that no optimization method outperforms all others on all problems [25], [26], which motivated us to further extend PSO in order to better avoid local minima and create a more balanced trade-off between exploration and exploitation in training MLP ANNs. The proposed PSO extension is a hybrid algorithm that combines the PSO velocity operator with the Mantegna Lévy distribution to escape from local minima by finding different search areas, and to promote global search, enhance convergence speed, and balance exploration and exploitation by dividing the population and the generations between different sections of the algorithm. The hybrid algorithm further combines the Mantegna Lévy distribution with a new formulation of global Neighborhood Search [14] to boost local search, mitigate premature convergence, and avoid local minima by searching more undiscovered areas of the search space to produce more diverse solutions. The new hybrid algorithm is referred to as LPSONS (Mantegna Lévy Flight, Particle Swarm Optimization, and Neighborhood Search). It has been implemented to optimize the training of feedforward MLP ANNs with a single hidden layer, for simplicity and without loss of generality, since a single hidden layer with a finite number of neurons can approximate any continuous function [27]. We have also conducted a series of experiments to analyze and test the structure of the proposed algorithm with fourteen datasets from the UCI machine learning repository.
We have further evaluated the performance of LPSONS against those of two well-known PSO extensions – PSOLF and LFPSO – and a well-known Gradient-Based algorithm Back Propagation (BP) [28] based on the metrics of Classification Accuracy, Mean Squared Error (MSE), Specificity and Sensitivity. Statistical results using Friedman test show that the LPSONS algorithm significantly outperforms those benchmark algorithms. The rest of the paper is organized as follows. Section 2 introduces some related work on training of ANNs and the fundamental work on which the proposed approach is based including MLP networks, Particle Swarm Optimization, and Lévy Flight. Section 3 then provides the details of the proposed LPSONS algorithm and after that Section 4 presents the evaluation experiments and discusses the results. Section 5 finally concludes the paper with a summary of major contributions and future work.

Related work

Training of artificial neural networks

In recent years, much work has been done to optimize the training of ANNs using evolutionary algorithms, especially Evolution Strategy, Differential Evolution, and swarm intelligence-based approaches [17], [18], [19], [20]. Green II et al. proposed a Central Force Optimization (CFO) method for training ANNs and found that it performed better than PSO in terms of algorithm design, computational complexity, and natural basis [29]. Bolaji et al. used the fireworks algorithm and compared it against other established algorithms on different benchmark datasets [30]. Faris et al. used the Lightning Search Algorithm (LSA) for finding optimal results and tested it with different measurements [28]. Karaboga et al. contributed Artificial Bee Colony (ABC) for optimizing weights in ANNs [31]. Aljarah et al. used the whale optimization algorithm to find optimal connection weights in MLP ANNs [32], which showed superior performance to that of other benchmark algorithms. Genetic Algorithm (GA) has been applied to different problems, including the training of MLP ANNs. For instance, in Sexton et al.'s work [33], GA was used to optimize an MLP ANN with a single hidden layer. Karegowda et al. proposed a hybrid approach combining GA and Back Propagation Network (BPN) to optimize connection weights in ANNs and applied their work to medical diagnosis [34]. Khan et al. compared two gradient descent algorithms with three population-based algorithms, namely GA, PSO, and the Bat algorithm, and found that the Bat algorithm performed the best [35]. Pawelczyk et al. proposed Genetically-Trained Deep Neural Networks to improve the training process of Deep Neural Networks (DNNs) by combining a genetic algorithm with gradient-based Back Propagation [36].

A number of hybrid algorithms have been proposed to improve the performance of PSO for training ANNs. Chen et al. suggested a hybrid approach to optimizing the training of Feedforward Neural Networks (FNNs) that combines PSO and Cuckoo Search (CS), and the comparison results revealed that it outperformed either PSO or CS alone as well as other FNN training algorithms [37]. Mirjalili et al. proposed a hybrid method using PSO and the Gravitational Search Algorithm (GSA) for training FNNs, and the comparison results showed its superior performance to that of the basic PSO and GSA alone in terms of convergence speed and local minima avoidance [38]. Ozturk and Karaboga introduced a hybrid approach consisting of ABC and the Levenberg-Marquardt (LM) algorithm for training an ANN, in which the network is first trained by the ABC algorithm and the LM algorithm then continues the training from the best weight set found by ABC in order to minimize the training error [39]. The approach was tested on the XOR, Decoder-Encoder, and 3-Bit Parity problems and exhibited remarkable performance.

In summary, evolutionary algorithms have proven useful in training MLP ANNs, and much work has been done to enhance their performance from different perspectives using different measures such as training error and classification accuracy. Our work in this paper extends Particle Swarm Optimization with Mantegna Lévy Flight and Neighborhood Search in order to produce more diverse solutions, mitigate premature convergence, and avoid local minima, using a rich set of measurement metrics including Classification Accuracy (ACC), Mean Squared Error (MSE), Specificity, and Sensitivity.

Multi-layer perceptron networks

A Multi-Layer Perceptron (MLP) network consists of an input layer, one or more hidden layers, and an output layer [40]. The hidden layers give the network the computational and processing capacity to generate its outputs [41]. Figure 1 shows an MLP network with a single hidden layer, with weights connecting adjacent layers. The output values are calculated through the following steps.
Figure 1

An MLP network with a single hidden layer.

First, the weighted sum of the inputs is calculated as follows:

$$s_j = \sum_{i=1}^{n} w_{ij} x_i + b_j \tag{1}$$

where $x_i$ is the $i$-th input variable, $w_{ij}$ is the weight between input variable $x_i$ and hidden neuron $j$, and $b_j$ is the corresponding bias term. Second, the output values of the neurons in the hidden layer are generated from the weighted sums (Equation (1)) by using an activation function. A popular choice of such a function is the sigmoid function:

$$f_j(s_j) = \frac{1}{1 + e^{-s_j}} \tag{2}$$

where $f_j$ is the sigmoid function for neuron $j$ and $s_j$ is the weighted sum. Finally, the network output is calculated as follows:

$$y_k = \sum_{j=1}^{h} w_{jk} f_j(s_j) + b_k \tag{3}$$

where $y_k$ is the $k$-th output, $w_{jk}$ is the weight between hidden neuron $j$ and output $k$, $f_j(s_j)$ is the activation of hidden neuron $j$, $h$ is the number of hidden neurons, and $b_k$ is the output's bias term. After the structure of an MLP ANN is created, a training process is required to fine-tune the control parameters of weight and bias in order to achieve good results, e.g., minimizing the error rate, including both classification and approximation errors. Both gradient-based approaches such as BP and meta-heuristic-based approaches such as PSO can be used for this purpose. Our proposed training algorithm is based on PSO.
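The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation (the paper reports a Matlab implementation); the names `sigmoid` and `mlp_forward` and the list-of-lists weight layout are assumptions made for the example, and the output layer is assumed to be a plain weighted sum of the hidden activations, as the text describes.

```python
import math


def sigmoid(s):
    """Logistic activation applied to a hidden neuron's weighted sum."""
    return 1.0 / (1.0 + math.exp(-s))


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a single-hidden-layer MLP (illustrative layout).

    x        : list of n input values
    w_hidden : h rows of n weights (input -> hidden)
    b_hidden : h hidden bias terms
    w_out    : m rows of h weights (hidden -> output)
    b_out    : m output bias terms
    """
    # Steps 1 and 2: weighted sum per hidden neuron, then sigmoid.
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    # Step 3: weighted sum of hidden activations plus bias per output.
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(w_out, b_out)
    ]
```

With all hidden weights and biases set to zero, every hidden activation is 0.5, which makes the output easy to check by hand.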

Particle Swarm Optimization (PSO)

Particle Swarm Optimization (PSO), a swarm intelligence-based algorithm proposed by Kennedy and Eberhart [10], mimics the social behavior of birds or fish, such as flocking or schooling, regrouping, and changing direction suddenly, by using velocity to model their movements. In PSO, each solution is called a particle, which is characterized by four attributes: the current position $x_i$, the best historical position evaluated by the objective function $pbest_i$, the best historical position discovered among all particles $gbest$, and the current velocity $v_i$. Changes of velocity and position are described by the following equation:

$$v_i^{t+1} = w\, v_i^{t} + c_1 r_1 \left(pbest_i - x_i^{t}\right) + c_2 r_2 \left(gbest - x_i^{t}\right), \qquad x_i^{t+1} = x_i^{t} + v_i^{t+1} \tag{4}$$

where $c_1$ and $c_2$ are acceleration factors, $w$ is the inertia weight, and $r_1$ and $r_2$ are random numbers drawn uniformly from the range [0, 1]. Algorithm 1 lists the pseudo code of the PSO algorithm.
Algorithm 1

Particle Swarm Optimization (PSO).
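As a rough illustration of the update rule in Equation (4), a standard global-best PSO loop might look like the following Python sketch. It is not the authors' code: the function name `pso`, the constant inertia weight, the zero initial velocities, and the absence of velocity or position clamping are all simplifying assumptions.

```python
import random


def pso(objective, dim, n_particles=20, iters=100, w=0.7, c1=2.0, c2=2.0,
        lo=-1.0, hi=1.0, seed=0):
    """Minimise `objective` with a basic global-best PSO (illustrative)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # global best so far

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + cognitive pull + social pull.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]               # position update
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

For example, `pso(lambda p: sum(x * x for x in p), dim=3)` searches for the minimum of a sphere function; the returned value can never be worse than the best initial particle, because the global best is only replaced on improvement.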


Lévy flight

In nature, animals look for food based on a random walk, that is, the next step in a search path is based on the current location and the transition probability to the next location. A Lévy flight is a kind of random walk, and studies have shown that the flight styles of many animals and insects resemble Lévy flights. This behavior has been applied to optimal search and optimization algorithms [42], [43], [44]. In particular, the transition from $x_i^{t}$ to $x_i^{t+1}$ in the $i$-th solution of an optimization algorithm can be described as follows:

$$x_i^{t+1} = x_i^{t} + \partial \oplus \text{Lévy}(\lambda) \tag{5}$$

where ∂ is the step size that is subject to the scale of the problem of interest, ⊕ is the product operator for entry-wise multiplication [42], and $\text{Lévy}(\lambda)$ provides a random walk whose large steps are drawn from a Lévy distribution as follows:

$$\text{Lévy} \sim u = t^{-\lambda}, \qquad 1 < \lambda \le 3 \tag{6}$$

which has an infinite variance with an infinite mean. Generating step-size samples from a Lévy distribution is not trivial; a simple scheme is the following [43]:

$$s = \frac{\mu}{|v|^{1/\beta}} \tag{7}$$

where $\beta$ is the index of the distribution and μ and v are drawn from normal distributions defined as follows:

$$\mu \sim N(0, \sigma_\mu^2), \quad v \sim N(0, \sigma_v^2), \quad \sigma_\mu = \left[ \frac{\Gamma(1+\beta)\,\sin(\pi\beta/2)}{\Gamma\!\left(\frac{1+\beta}{2}\right)\beta\, 2^{(\beta-1)/2}} \right]^{1/\beta}, \quad \sigma_v = 1 \tag{8}$$

where Γ is the standard gamma function.
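The step-size scheme just described is commonly implemented as Mantegna's algorithm. The Python sketch below is a hedged illustration of that standard scheme, not the authors' code; the function names `mantegna_sigma` and `mantegna_step` are hypothetical, and the default β of 1.5 is a value commonly used in the cuckoo search literature, not one taken from this paper.

```python
import math
import random


def mantegna_sigma(beta):
    """Standard deviation of the numerator variable in Mantegna's scheme."""
    num = math.gamma(1.0 + beta) * math.sin(math.pi * beta / 2.0)
    den = math.gamma((1.0 + beta) / 2.0) * beta * 2.0 ** ((beta - 1.0) / 2.0)
    return (num / den) ** (1.0 / beta)


def mantegna_step(beta=1.5, rng=random):
    """Draw one heavy-tailed step: mu / |v|^(1/beta)."""
    mu = rng.gauss(0.0, mantegna_sigma(beta))   # numerator ~ N(0, sigma^2)
    v = rng.gauss(0.0, 1.0)                     # denominator ~ N(0, 1)
    return mu / abs(v) ** (1.0 / beta)
```

Most draws are small, but occasional very large steps occur when `v` is close to zero, which is the property that helps a search escape local minima.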

Methodology

This section provides the details of the proposed LPSONS algorithm, including the division strategy that splits the population and the generations so that partial values of the population and the generations are assigned to different components of the proposed algorithm, the Mantegna Lévy distribution and PSO operators, the Neighborhood Search method, and the encoding strategy.

Division strategy in population and generations

The division strategy aims to divide the population and the generations and assign them to different components of the algorithm. Some evolutionary algorithms benefit from a division strategy in the population as well as in the generations. For instance, Zhang et al. introduced a CS algorithm that utilized subgroups of the population, and experimental results showed that the adopted division strategy not only improved both exploration and exploitation but also created a balance between them [45]. Salgotra et al. proposed an efficient hybrid algorithm based on CS and the Cauchy distribution that divides both the population and the generations in order to improve exploration and exploitation, and tested the algorithm with varying population sizes on benchmark problems [46]. A good balance between exploration and exploitation has been shown to be a key efficiency indicator of an evolutionary algorithm, especially a swarm intelligence-based algorithm: good exploration prevents the search from getting trapped in local minima, while good exploitation yields efficient convergence [47]. Therefore, the proposed LPSONS algorithm also uses a division strategy for both the population and the generations in order to strike a good balance between exploration and exploitation. In the proposed algorithm, both the population size and the total number of generations are divided into two parts: one half is used by the Mantegna Lévy distribution along with the PSO operator, while the other half is used by the Mantegna Lévy flight and PSO operator together with the global neighborhood search strategy in order to obtain the fitness value. In addition, to prevent premature convergence, if the fitness value of the generated solutions does not change for a number of iterations, the algorithm switches to another approach to generate new solutions. This mechanism uses two variables, Limit and Trial. Limit is a constant number of iterations, defined inside the loop over the population, and Trial is a counter. When the Trial value exceeds the Limit value, the algorithm uses another strategy to generate solutions. This strategy is inspired by Haklı et al.'s work, in which a limit value was set to change the solution-generating strategy when solutions were no longer improving sufficiently [23].

Mantegna Lévy distribution and PSO operators

The LPSONS algorithm employs Mantegna Lévy distribution along with PSO velocity operator in order to improve its accuracy. Based on [48], [49], Mantegna Lévy distribution is defined as follows. where is a generated solution by the Mantegna Lévy flight, α indicates the Lévy step size, φ is computed using Equation (8) which equals to , =φ⋅ and ( is a normal distribution of D dimension with ), and is a randomly selected solution, while is the best solution ever found. The step size α is defined based on the following equation. In the proposed LPSONS algorithm, a solution will first be generated using Mantegna Lévy distribution, which will then be combined with the velocity operator in the PSO algorithm as follows. where is the combined solution, is the solution generated by the Mantegna Lévy flight, and is the velocity operator for solution i defined in the original PSO algorithm using Equation (4). Compared to the standard CS algorithm that only uses Mantegna Lévy distribution, the LPSONS algorithm provides both better exploitation and exploration.

The neighborhood search method

Premature convergence is a major issue with the PSO algorithm and its variants. To avoid this issue and at the same time increase the local search, we use Neighborhood Search (NS) so that even with a high step size, the proposed algorithm can still find most of the good solutions. NS has been adopted in various other algorithms to speed up convergence, for instance, Das et al. proposed an efficient algorithm by using both local and global NS methods based on the DE/target-to-best/1 scheme to boost its convergence [50]. In a similar work, Wang et al. developed the DNSPSO algorithm that utilized both local and global NS methods [51]. Zhou et al. introduced an ABC-based algorithm that used the NS operators to produce a trial solution [52]. The general formula for generating a trial solution is based on the following equation: where , , and are mutually exclusive random number generators in the range of that satisfy and will change at the start of each generation, is the best solution for the current generation ever found by the algorithm, and and are two randomly chosen solutions that must be different from . The LPSONS algorithm uses the following equation to generate a trial solution using NS: where refers to the current solution in generation t, while are two co-efficients derived from experiments for the purpose of diversifying the solutions in order to generate better ones, is a step size generated by the Mantegna Lévy Distribution which is shown in equation (9), and F is a scaling factor in the range of [0, 1]. Figure 2 depicts the general NS strategy.
Figure 2

The global NS strategy used in LPSONS.

The trial solution will be evaluated against the best solution in terms of fitness value. If it wins the competition, it will survive and be used for the next generation. Figure 3 illustrates the general steps of the LPSONS algorithm.
Figure 3

The general steps of LPSONS.

Algorithm 2 lists the proposed LPSONS algorithm, which comprises two main phases. The main phases of the proposed algorithm will perform the classification based on the instructions described in Algorithm 3.
Algorithm 2

LPSONS.

Algorithm 3

The first phase consists of two nested loops. The outer loop runs for a number of generations set by the user, while the inner loop runs over the population. If a particle's trial value is less than the limit value, LPSONS generates a new solution with the standard PSO algorithm using Equation (4); otherwise, it uses the Mantegna Lévy distribution enhanced PSO algorithm to generate a new solution using Equation (9) and Equation (11). After that, it compares the fitness value of the generated solution with that of the local best solution and updates the trial value accordingly. The second phase also consists of two nested loops, with the outer loop running over the generations determined by the user and the inner loop running over the population. If the trial value is less than the limit value, LPSONS uses the Mantegna Lévy distribution enhanced PSO algorithm to generate a new solution using Equation (9) and Equation (11); otherwise, it uses NS to generate a new solution using Equation (13). After that, the fitness value of the generated solution is compared against that of the local best solution to update the trial value. Finally, the best solution overall is chosen from those returned by both phases and its fitness value is derived accordingly.
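The control flow of the two phases can be sketched as follows. This is only a skeleton under stated assumptions, not the authors' Algorithm 2: the three solution generators are passed in as hypothetical callables (`pso_move`, `levy_pso_move`, `ns_move`) because their exact equations are not reproduced here, only the generations are split in half, the trial counter is assumed to reset on improvement and to carry over between phases, and the division of the population is not modelled.

```python
def run_phase(population, pbest, pbest_fit, trial, fitness,
              primary_move, fallback_move, generations, limit):
    """One phase: outer loop over generations, inner loop over particles."""
    for _ in range(generations):
        for i in range(len(population)):
            # Below the limit use the primary operator, otherwise the fallback.
            move = primary_move if trial[i] < limit else fallback_move
            candidate = move(i, population, pbest)
            population[i] = candidate
            cand_fit = fitness(candidate)
            if cand_fit < pbest_fit[i]:   # improvement: keep it, reset counter
                pbest[i], pbest_fit[i] = list(candidate), cand_fit
                trial[i] = 0
            else:                         # stagnation: count it
                trial[i] += 1


def lpsons_skeleton(population, fitness, pso_move, levy_pso_move, ns_move,
                    generations, limit=10):
    """Two-phase driver; returns the best position found and its fitness."""
    population = [list(p) for p in population]
    pbest = [list(p) for p in population]
    pbest_fit = [fitness(p) for p in population]
    trial = [0] * len(population)
    half = generations // 2
    # Phase 1: standard PSO, switching to the Levy-enhanced move on stagnation.
    run_phase(population, pbest, pbest_fit, trial, fitness,
              pso_move, levy_pso_move, half, limit)
    # Phase 2: Levy-enhanced move, switching to neighborhood search.
    run_phase(population, pbest, pbest_fit, trial, fitness,
              levy_pso_move, ns_move, generations - half, limit)
    best = min(range(len(pbest)), key=pbest_fit.__getitem__)
    return pbest[best], pbest_fit[best]
```

Passing the operators in as callables keeps the switching logic separate from the update equations, so each generator can be replaced or tested on its own.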

Encoding strategy

The LPSONS adopts a vector encoding strategy in which particles are represented as randomly generated one-dimensional arrays with values in the range of [-1, 1]. Each generated solution contains connection weights and biases linking the input layer to the hidden layer as well as linking the hidden layer to the output layer. Figure 4 shows a sample solution generated by the proposed algorithm.
Figure 4

Sample solution generated by LPSONS.
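A flat particle of this kind has to be unpacked into layer weights and biases before the network can be evaluated. The Python sketch below shows one way to do that; the function names and the ordering of the segments (hidden weights, hidden biases, output weights, output biases) are assumptions for illustration, since the layout used in Figure 4 is not specified in the text.

```python
import random


def particle_size(n_in, n_hidden, n_out):
    """Number of weights and biases in a single-hidden-layer MLP."""
    return n_hidden * (n_in + 1) + n_out * (n_hidden + 1)


def random_particle(n_in, n_hidden, n_out, rng=random):
    """Random one-dimensional solution with values in [-1, 1]."""
    size = particle_size(n_in, n_hidden, n_out)
    return [rng.uniform(-1.0, 1.0) for _ in range(size)]


def decode(particle, n_in, n_hidden, n_out):
    """Split a flat particle into (w_hidden, b_hidden, w_out, b_out)."""
    it = iter(particle)
    w_hidden = [[next(it) for _ in range(n_in)] for _ in range(n_hidden)]
    b_hidden = [next(it) for _ in range(n_hidden)]
    w_out = [[next(it) for _ in range(n_hidden)] for _ in range(n_out)]
    b_out = [next(it) for _ in range(n_out)]
    return w_hidden, b_hidden, w_out, b_out
```

For a network with 4 inputs, 5 hidden neurons, and 3 outputs, the particle has 5 × (4 + 1) + 3 × (5 + 1) = 43 entries.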


Results & discussion

Datasets

Fourteen standard datasets from the UCI machine learning repository are used to evaluate the LPSONS algorithm in terms of accuracy and efficiency against the benchmark algorithms of LFPSO [23], PSOLF [24], and gradient-based Back Propagation algorithm (BP) [28]. These datasets are Wisconsin breast cancer (denoted as Breast Cancer), Liver, Pima, Wine, Australian, Hepatitis, Heart, Blood, Iris, Credit, Seeds, Haberman, Balance, and Diabetes. Table 1 lists these datasets, including the number of their attributes, classes, and data objects.
Table 1

Experimental datasets.

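| Dataset | Number of Attributes | Number of Classes | Number of data objects |
|---|---|---|---|
| Breast Cancer | 10 | 2 | 683 |
| Liver | 7 | 2 | 345 |
| Pima | 8 | 2 | 768 |
| Wine | 13 | 3 | 178 |
| Australian | 14 | 2 | 690 |
| Hepatitis | 19 | 2 | 155 |
| Heart | 13 | 2 | 297 |
| Blood | 5 | 2 | 748 |
| Iris | 4 | 3 | 150 |
| Credit | 15 | 2 | 690 |
| Seeds | 7 | 3 | 210 |
| Haberman | 3 | 2 | 306 |
| Balance | 4 | 3 | 625 |
| Diabetes (Diabetic Debrecen) | 20 | 2 | 1151 |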

Experiment settings

Every algorithm used in the experiments is run 30 times with random initial solutions on every dataset. The population size of all algorithms is 100, the number of neurons in the hidden layer is set to 5, and the constant limit, which appears in line 2 of Algorithm 2, is initialized to 10 for each particle. The datasets are divided into three parts: 70% for training, 10% for validation, and 20% for testing. For benchmarking, we have implemented the LFPSO and PSOLF algorithms based on their original source code and used the BP algorithm in its standard form. As each algorithm takes a long time to classify a given dataset, we have used a budget of Function Evaluations (FEs = 13000) as the termination criterion. Table 2 lists all the parameters and their values used by the algorithms. All algorithms have been implemented in Matlab 2016a and executed on a computer with an Intel Core i3, 2.5 GHz, 4 GB RAM running Windows 7.
Table 2

Parameters and Values.

| Algorithm | Parameters | Values |
|---|---|---|
| MLF | β | 1.9 |
| LFPSO | c1, c2, limit | 2, 2, 10 |
| | w | (Max_Iteration − Current_Iteration) / Max_Iteration |
| PSOLF | c1, c2, limit | 2, 2, 10 |
| | w | (Max_Iteration − Current_Iteration) / Max_Iteration |
| LPSONS | c1, c2, limit | 2, 2, 10 |
| | α1, α2, F | 1.49, 1.49, 0.5 |
| | w | (Max_Iteration − Current_Iteration) / Max_Iteration |
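The 70/10/20 partition described in the experiment settings can be illustrated with a short Python sketch. Whether the original experiments shuffled or stratified the data is not stated, so the random shuffle, the fixed seed, and the function name `split_indices` are assumptions for the example.

```python
import random


def split_indices(n, train_pct=70, val_pct=10, seed=0):
    """Shuffle 0..n-1 and cut it into train / validation / test index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = n * train_pct // 100      # integer arithmetic avoids float rounding
    n_val = n * val_pct // 100
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]        # remainder goes to the test set
    return train, val, test
```

For the 150-instance Iris dataset this gives 105 training, 15 validation, and 30 testing instances. Under the further assumption that every particle is evaluated once per generation, a budget of 13000 function evaluations with a population of 100 corresponds to 130 generations.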

Evaluation measures

We have used Mean Squared Error (MSE) as the fitness function for all the training algorithms under evaluation. The aim of each algorithm is to minimize MSE in order to achieve an optimal network. MSE is defined as follows:

$$MSE = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2$$

where $n$ is the number of samples, while $y_i$ and $\hat{y}_i$ are the actual and predicted outputs, respectively. We have also adopted a confusion matrix as the basis for a number of metrics used to evaluate the performance of each classifier. In a binary classification problem, each instance is mapped to either a positive label P or a negative label N; accordingly, Table 3 lists a confusion matrix for binary classification of instances [28], [53]:
Table 3

Confusion Matrix.

| | Classified positive | Classified negative |
|---|---|---|
| Positive instance | True Positive (TP) | False Negative (FN) |
| Negative instance | False Positive (FP) | True Negative (TN) |

Based on the confusion matrix, the following evaluation metrics are used to measure the performance of each classification algorithm:
Accuracy: the ratio of correctly classified positive and negative instances to all instances, $(TP + TN) / (TP + TN + FP + FN)$.
Sensitivity (also known as Recall): the ratio of true positives to actual positive instances, $TP / (TP + FN)$.
Specificity: the ratio of true negatives to actual negative instances, $TN / (TN + FP)$.
Precision: the ratio of true positives to all instances classified as positive, $TP / (TP + FP)$.
F-Score (also known as F-measure): the harmonic mean of Precision and Recall, $2 \cdot Precision \cdot Recall / (Precision + Recall)$.
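The fitness function and the confusion-matrix metrics can be computed as in the following Python sketch. It covers the binary case only; how the paper aggregated these metrics for the three-class datasets is not stated, and the function names are hypothetical. Division by zero (for example, when no instance is classified as positive) is deliberately left unhandled to keep the sketch short.

```python
def mse(actual, predicted):
    """Mean squared error between actual and predicted outputs."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def confusion_counts(actual, predicted, positive=1):
    """Return (TP, FN, TN, FP) for binary labels."""
    tp = fn = tn = fp = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity, precision and F-score."""
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "precision": precision,
        "f_score": 2 * precision * sensitivity / (precision + sensitivity),
    }
```

A typical use is `classification_metrics(*confusion_counts(y_true, y_pred))`, where `y_true` and `y_pred` are equal-length lists of 0/1 labels.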

Results and discussions

Tables 4 and 5 show the accuracy measures of the four algorithms for training and testing, respectively, on the 14 datasets. The values are arranged in the order of Best, Mean, and Standard Deviation (Std). Best and Mean indicate the best and the average accuracy over the 30 individual runs, while Std indicates the standard deviation of the achieved values.
Table 4

Classification Training Accuracy.

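| Dataset | Measure | BP | PSOLF | LFPSO | LPSONS |
|---|---|---|---|---|---|
| Breast Cancer | Best | 0.9760 | 0.9790 | 0.9806 | 0.9811 |
| | Mean | 0.9721 | 0.9721 | 0.9753 | 0.9759 |
| | Std | 0.0041 | 0.0043 | 0.0046 | 0.0041 |
| Liver | Best | 0.7837 | 0.7824 | 0.7740 | 0.7866 |
| | Mean | 0.7393 | 0.7297 | 0.7535 | 0.7652 |
| | Std | 0.0193 | 0.0248 | 0.0183 | 0.0177 |
| Pima | Best | 0.7778 | 0.7955 | 0.7744 | 0.7993 |
| | Mean | 0.7445 | 0.7825 | 0.7704 | 0.7890 |
| | Std | 0.0137 | 0.0131 | 0.0129 | 0.0076 |
| Wine | Best | 0.9908 | 1.0000 | 1.0000 | 1.0000 |
| | Mean | 0.9872 | 0.9832 | 0.9968 | 1.0000 |
| | Std | 0.0047 | 0.0096 | 0.0056 | 0.0000 |
| Australian | Best | 0.8929 | 0.8902 | 0.8944 | 0.9586 |
| | Mean | 0.8791 | 0.8712 | 0.8819 | 0.8818 |
| | Std | 0.0110 | 0.0090 | 0.0091 | 0.0279 |
| Hepatitis | Best | 0.9853 | 1.0000 | 1.0000 | 1.0000 |
| | Mean | 0.9721 | 0.9619 | 0.9768 | 0.9885 |
| | Std | 0.0162 | 0.0305 | 0.0147 | 0.0128 |
| Heart | Best | 0.9392 | 0.8956 | 0.9086 | 0.9183 |
| | Mean | 0.9116 | 0.8719 | 0.8764 | 0.8894 |
| | Std | 0.0175 | 0.0113 | 0.0170 | 0.0147 |
| Blood | Best | 0.7934 | 0.7881 | 0.7977 | 0.8015 |
| | Mean | 0.7512 | 0.7674 | 0.7807 | 0.7904 |
| | Std | 0.0182 | 0.0134 | 0.0145 | 0.0077 |
| Iris | Best | 0.9921 | 0.9809 | 0.9904 | 1.0000 |
| | Mean | 0.9635 | 0.9519 | 0.9647 | 0.9752 |
| | Std | 0.0323 | 0.0495 | 0.0175 | 0.0163 |
| Credit | Best | 0.9046 | 0.8905 | 0.8993 | 0.9059 |
| | Mean | 0.8825 | 0.8822 | 0.8840 | 0.8877 |
| | Std | 0.0178 | 0.0065 | 0.0084 | 0.0114 |
| Seeds | Best | 0.9775 | 0.9640 | 0.9640 | 0.9784 |
| | Mean | 0.9587 | 0.9489 | 0.9566 | 0.9647 |
| | Std | 0.0123 | 0.0115 | 0.0106 | 0.0110 |
| Haberman | Best | 0.7810 | 0.7803 | 0.7850 | 0.7803 |
| | Mean | 0.7635 | 0.7616 | 0.7612 | 0.7654 |
| | Std | 0.0215 | 0.0136 | 0.0186 | 0.0134 |
| Balance | Best | 0.8921 | 0.8949 | 0.9640 | 0.8995 |
| | Mean | 0.8705 | 0.8741 | 0.8849 | 0.8853 |
| | Std | 0.0189 | 0.0097 | 0.0104 | 0.0090 |
| Diabetes | Best | 0.7163 | 0.7099 | 0.7229 | 0.7345 |
| | Mean | 0.7082 | 0.7093 | 0.7191 | 0.7109 |
| | Std | 0.0147 | 0.0093 | 0.0084 | 0.0109 |
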
Table 5

Classification Testing Accuracy.

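| Dataset | Measure | BP | PSOLF | LFPSO | LPSONS |
|---|---|---|---|---|---|
| Breast Cancer | Best | 0.9748 | 0.9756 | 0.9707 | 0.9756 |
| | Mean | 0.9529 | 0.9590 | 0.9590 | 0.9648 |
| | Std | 0.0920 | 0.0116 | 0.0076 | 0.0072 |
| Liver | Best | 0.8427 | 0.7323 | 0.7058 | 0.7353 |
| | Mean | 0.6558 | 0.6469 | 0.6539 | 0.6638 |
| | Std | 0.0495 | 0.0448 | 0.0647 | 0.0450 |
| Pima | Best | 0.7700 | 0.7823 | 0.7695 | 0.7826 |
| | Mean | 0.7235 | 0.7369 | 0.6733 | 0.7426 |
| | Std | 0.0522 | 0.0412 | 0.0751 | 0.0390 |
| Wine | Best | 0.9259 | 0.9811 | 0.9811 | 1.0000 |
| | Mean | 0.8667 | 0.9377 | 0.9413 | 0.9509 |
| | Std | 0.0531 | 0.0427 | 0.0346 | 0.0328 |
| Australian | Best | 0.8985 | 0.8985 | 0.8888 | 0.8985 |
| | Mean | 0.8629 | 0.8604 | 0.8623 | 0.8635 |
| | Std | 0.0325 | 0.0263 | 0.0184 | 0.0223 |
| Hepatitis | Best | 0.9655 | 0.9117 | 0.8529 | 0.9118 |
| | Mean | 0.9211 | 0.7913 | 0.7911 | 0.7941 |
| | Std | 0.0462 | 0.0801 | 0.0426 | 0.0399 |
| Heart | Best | 0.9870 | 0.8314 | 0.8764 | 0.8539 |
| | Mean | 0.8293 | 0.8024 | 0.8258 | 0.8112 |
| | Std | 0.0265 | 0.0183 | 0.0348 | 0.0264 |
| Blood | Best | 0.8154 | 0.7991 | 0.8214 | 0.8125 |
| | Mean | 0.7739 | 0.7713 | 0.7746 | 0.7799 |
| | Std | 0.0425 | 0.0248 | 0.0339 | 0.0244 |
| Iris | Best | 0.9565 | 0.9777 | 1.0000 | 1.0000 |
| | Mean | 0.9230 | 0.9155 | 0.9711 | 0.9755 |
| | Std | 0.0553 | 0.0701 | 0.0298 | 0.0286 |
| Credit | Best | 0.9529 | 0.8775 | 0.8775 | 0.8826 |
| | Mean | 0.9329 | 0.8418 | 0.8250 | 0.8290 |
| | Std | 0.220 | 0.0259 | 0.0587 | 0.0621 |
| Seeds | Best | 0.9808 | 1.0000 | 0.9833 | 1.0000 |
| | Mean | 0.9215 | 0.9416 | 0.9250 | 0.9250 |
| | Std | 0.0393 | 0.0345 | 0.0326 | 0.0479 |
| Haberman | Best | 0.7872 | 0.7826 | 0.7826 | 0.8043 |
| | Mean | 0.7416 | 0.7347 | 0.7360 | 0.7500 |
| | Std | 0.0361 | 0.0370 | 0.0455 | 0.0251 |
| Balance | Best | 0.8635 | 0.9090 | 0.9037 | 0.9144 |
| | Mean | 0.8239 | 0.8883 | 0.8716 | 0.8883 |
| | Std | 0.0424 | 0.0173 | 0.0232 | 0.0200 |
| Diabetes | Best | 0.6800 | 0.7043 | 0.6795 | 0.7092 |
| | Mean | 0.7315 | 0.6392 | 0.6078 | 0.6383 |
| | Std | 0.0369 | 0.0233 | 0.0402 | 0.0221 |
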
For the training accuracy, the proposed LPSONS algorithm outperforms LFPSO, PSOLF, and BP for most of the datasets. In terms of Best, LPSONS exhibits superiority for 10/14 datasets: Breast Cancer, Liver, Pima, Australian, Blood, Iris, Credit, Seeds, Balance, and Diabetes, while for the Wine and Hepatitis datasets, its performance is exactly the same as that of the benchmark algorithms. In terms of Mean, LPSONS performs better than LFPSO, PSOLF, and BP for 11/14 datasets: Breast Cancer, Liver, Pima, Wine, Hepatitis, Blood, Iris, Credit, Seeds, Haberman, and Balance. LPSONS also displays better Std for 8/14 of the datasets (Breast Cancer, Liver, Pima, Wine, Hepatitis, Blood, Iris, and Haberman), indicating that it is more stable than the benchmark algorithms. For the testing accuracy, LPSONS also outperforms LFPSO, PSOLF, and BP for most of the datasets. For Best accuracy, it performs better for 7/14 datasets: Pima, Wine, Iris, Credit, Haberman, and Balance, and has a similar accuracy for the Breast Cancer and Seeds datasets in comparison to the PSOLF algorithm. For Mean accuracy, it performs better for 8/14 datasets: Breast Cancer, Liver, Pima, Wine, Australian, Blood, Iris, and Haberman. For Std, it shows better stability and robustness for 8/14 datasets: Breast Cancer, Pima, Wine, Hepatitis, Blood, Iris, Haberman, and Diabetes. Tables 6 and 7 show the mean values of Specificity, Sensitivity, and F-Measure of the four algorithms for training and testing, respectively, on the 14 datasets. The results are organized in the order of Specificity, Sensitivity, and F-Measure, all using mean values.
Table 6

Training Specificity, Sensitivity, and F-Measure.

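| Dataset | Measure | BP | PSOLF | LFPSO | LPSONS |
|---|---|---|---|---|---|
| Breast Cancer | Spec | 0.9495 | 0.9587 | 0.9605 | 0.9615 |
| | Sens | 0.9549 | 0.9703 | 0.9735 | 0.9749 |
| | F-Measure | 0.9560 | 0.9696 | 0.9729 | 0.9738 |
| Liver | Spec | 0.7331 | 0.7309 | 0.7549 | 0.769 |
| | Sens | 0.7089 | 0.7029 | 0.7372 | 0.7423 |
| | F-Measure | 0.7327 | 0.7160 | 0.7452 | 0.7526 |
| Pima | Spec | 0.6633 | 0.7495 | 0.7559 | 0.7506 |
| | Sens | 0.6283 | 0.7276 | 0.7373 | 0.7402 |
| | F-Measure | 0.6588 | 0.7495 | 0.7511 | 0.7581 |
| Wine | Spec | 0.9777 | 0.9904 | 0.9962 | 1.0000 |
| | Sens | 0.9713 | 0.9852 | 0.9968 | 0.9980 |
| | F-Measure | 0.9692 | 0.9837 | 0.9967 | 0.9977 |
| Australian | Spec | 0.8222 | 0.8407 | 0.8598 | 0.8629 |
| | Sens | 0.8604 | 0.8719 | 0.8804 | 0.8807 |
| | F-Measure | 0.8691 | 0.8709 | 0.8799 | 0.8805 |
| Hepatitis | Spec | 0.9275 | 0.9694 | 0.9819 | 0.9892 |
| | Sens | 0.7995 | 0.9189 | 0.9465 | 0.9735 |
| | F-Measure | 0.8312 | 0.9322 | 0.9556 | 0.9805 |
| Heart | Spec | 0.8778 | 0.8926 | 0.8848 | 0.8970 |
| | Sens | 0.8655 | 0.8676 | 0.8734 | 0.8870 |
| | F-Measure | 0.8675 | 0.8715 | 0.8755 | 0.8889 |
| Blood | Spec | 0.6120 | 0.6750 | 0.6726 | 0.6898 |
| | Sens | 0.5363 | 0.5625 | 0.5761 | 0.5944 |
| | F-Measure | 0.6205 | 0.6361 | 0.6444 | 0.6609 |
| Iris | Spec | 0.9762 | 0.9392 | 0.9737 | 0.9854 |
| | Sens | 0.9661 | 0.9520 | 0.9665 | 0.9747 |
| | F-Measure | 0.9667 | 0.9528 | 0.9670 | 0.9757 |
| Credit | Spec | 0.9147 | 0.9177 | 0.9215 | 0.9236 |
| | Sens | 0.8850 | 0.8840 | 0.8862 | 0.8561 |
| | F-Measure | 0.8837 | 0.8827 | 0.8848 | 0.8713 |
| Seeds | Spec | 0.8670 | 0.9605 | 0.9737 | 0.9728 |
| | Sens | 0.9360 | 0.9475 | 0.9557 | 0.9641 |
| | F-Measure | 0.9395 | 0.9484 | 0.9567 | 0.9643 |
| Haberman | Spec | 0.6138 | 0.6296 | 0.6483 | 0.6245 |
| | Sens | 0.5717 | 0.5854 | 0.5959 | 0.5845 |
| | F-Measure | 0.6325 | 0.6373 | 0.6491 | 0.6377 |
| Balance | Spec | 0.7922 | 0.8325 | 0.8884 | 0.9018 |
| | Sens | 0.6307 | 0.6372 | 0.6417 | 0.6448 |
| | F-Measure | 0.6769 | 0.6801 | 0.6981 | 0.7013 |
| Diabetes | Spec | 0.6436 | 0.6925 | 0.6711 | 0.6675 |
| | Sens | 0.6835 | 0.7096 | 0.7230 | 0.7137 |
| | F-Measure | 0.6908 | 0.7071 | 0.7217 | 0.7135 |
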
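Table 7

Testing Specificity, Sensitivity, and F-Measure.

| Dataset | Measure | BP | PSOLF | LFPSO | LPSONS |
| --- | --- | --- | --- | --- | --- |
| Breast Cancer | Spec | 0.9469 | 0.9410 | 0.9474 | 0.9478 |
| | Sens | 0.9549 | 0.9532 | 0.9551 | 0.9617 |
| | F-Measure | 0.9565 | 0.9538 | 0.9556 | 0.9613 |
| Liver | Spec | 0.7321 | 0.6492 | 0.7090 | 0.6753 |
| | Sens | 0.7054 | 0.6104 | 0.6330 | 0.6437 |
| | F-Measure | 0.7987 | 0.6500 | 0.6473 | 0.6704 |
| Pima | Spec | 0.6632 | 0.6730 | 0.6562 | 0.6803 |
| | Sens | 0.6271 | 0.7124 | 0.6209 | 0.7164 |
| | F-Measure | 0.6579 | 0.7104 | 0.6487 | 0.7164 |
| Wine | Spec | 0.9680 | 0.9689 | 0.9540 | 0.9690 |
| | Sens | 0.9549 | 0.9435 | 0.9533 | 0.9580 |
| | F-Measure | 0.9548 | 0.9447 | 0.9541 | 0.9553 |
| Australian | Spec | 0.8217 | 0.8151 | 0.8186 | 0.8588 |
| | Sens | 0.8558 | 0.8647 | 0.8648 | 0.8617 |
| | F-Measure | 0.8589 | 0.8621 | 0.8635 | 0.8632 |
| Hepatitis | Spec | 0.9254 | 0.8947 | 0.8782 | 0.8961 |
| | Sens | 0.7805 | 0.6468 | 0.6418 | 0.6509 |
| | F-Measure | 0.8260 | 0.6425 | 0.6314 | 0.6479 |
| Heart | Spec | 0.8772 | 0.8097 | 0.8325 | 0.8002 |
| | Sens | 0.8629 | 0.8012 | 0.8221 | 0.8093 |
| | F-Measure | 0.8651 | 0.8027 | 0.8221 | 0.8101 |
| Blood | Spec | 0.6271 | 0.6578 | 0.6359 | 0.6600 |
| | Sens | 0.5334 | 0.5627 | 0.5646 | 0.5724 |
| | F-Measure | 0.6319 | 0.6316 | 0.6345 | 0.6400 |
| Iris | Spec | 0.9635 | 0.9373 | 0.9638 | 0.9684 |
| | Sens | 0.9602 | 0.9150 | 0.9543 | 0.9609 |
| | F-Measure | 0.9626 | 0.9195 | 0.9571 | 0.9637 |
| Credit | Spec | 0.9148 | 0.8805 | 0.8491 | 0.8792 |
| | Sens | 0.8844 | 0.8436 | 0.8194 | 0.8310 |
| | F-Measure | 0.8830 | 0.8437 | 0.8310 | 0.8359 |
| Seeds | Spec | 0.8625 | 0.9395 | 0.9197 | 0.9562 |
| | Sens | 0.9227 | 0.9302 | 0.9290 | 0.9274 |
| | F-Measure | 0.9162 | 0.9294 | 0.9263 | 0.9274 |
| Haberman | Spec | 0.6535 | 0.5933 | 0.5183 | 0.6792 |
| | Sens | 0.5654 | 0.5813 | 0.6091 | 0.6142 |
| | F-Measure | 0.6348 | 0.6312 | 0.6298 | 0.6651 |
| Balance | Spec | 0.7913 | 0.8523 | 0.8781 | 0.8930 |
| | Sens | 0.6306 | 0.6366 | 0.6309 | 0.6427 |
| | F-Measure | 0.6511 | 0.6625 | 0.6541 | 0.6961 |
| Diabetes | Spec | 0.6232 | 0.6482 | 0.6465 | 0.6532 |
| | Sens | 0.6031 | 0.6377 | 0.6207 | 0.6347 |
| | F-Measure | 0.6204 | 0.6567 | 0.6244 | 0.6502 |
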
On all three measures in training, LPSONS performs better than the three benchmark algorithms on most of the datasets. In terms of Specificity, LPSONS does better on 10/14 datasets: Breast Cancer, Liver, Wine, Australian, Hepatitis, Heart, Blood, Iris, Credit, and Balance. In terms of Sensitivity, it yields better results on 11/14 datasets: Breast Cancer, Liver, Pima, Wine, Australian, Hepatitis, Heart, Blood, Iris, Seeds, and Balance. In terms of F-Measure, it excels on 11/14 datasets: Breast Cancer, Liver, Pima, Wine, Australian, Hepatitis, Heart, Blood, Iris, Seeds, and Balance.

LPSONS also performs better in testing. In terms of Specificity, LPSONS does better on 10/14 datasets: Breast Cancer, Pima, Wine, Australian, Blood, Iris, Seeds, Haberman, Balance, and Diabetes. In terms of Sensitivity, it is superior on 7/14 datasets: Breast Cancer, Pima, Wine, Blood, Iris, Haberman, and Balance. In terms of F-Measure, it produces better results on 7/14 datasets: Breast Cancer, Pima, Australian, Blood, Iris, Haberman, and Balance.

Tables 8 and 9 show the mean MSE and its Std for the four algorithms on the 14 datasets for training and testing, respectively. For training, LPSONS outperforms the benchmark algorithms on 13/14 datasets in terms of MSE (Breast Cancer, Liver, Pima, Wine, Australian, Hepatitis, Blood, Iris, Credit, Seeds, Haberman, Balance, and Diabetes) and on 9/14 datasets in terms of Std (Breast Cancer, Pima, Wine, Australian, Hepatitis, Iris, Credit, Seeds, and Balance). For testing, LPSONS outperforms the benchmark algorithms on 10/14 datasets in terms of MSE (Breast Cancer, Pima, Wine, Australian, Blood, Iris, Seeds, Haberman, Balance, and Diabetes) and on 9/14 datasets in terms of Std (Pima, Australian, Hepatitis, Blood, Iris, Seeds, Haberman, Balance, and Diabetes).
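Table 8

Training MSE.

| Dataset | Measure | BP | PSOLF | LFPSO | LPSONS |
| --- | --- | --- | --- | --- | --- |
| Breast Cancer | MSE | 0.0311 | 0.0242 | 0.0417 | 0.0210 |
| | Std | 0.0041 | 0.0034 | 0.0635 | 0.0033 |
| Liver | MSE | 0.1914 | 0.1977 | 0.1889 | 0.1818 |
| | Std | 0.0145 | 0.0091 | 0.0047 | 0.0070 |
| Pima | MSE | 0.1903 | 0.1536 | 0.1711 | 0.1472 |
| | Std | 0.0051 | 0.0048 | 0.0171 | 0.0041 |
| Wine | MSE | 0.0418 | 0.0306 | 0.0200 | 0.0146 |
| | Std | 0.0063 | 0.0056 | 0.0048 | 0.0029 |
| Australian | MSE | 0.1089 | 0.0994 | 0.0949 | 0.0910 |
| | Std | 0.0070 | 0.0054 | 0.0054 | 0.0044 |
| Hepatitis | MSE | 0.0749 | 0.0452 | 0.0314 | 0.0247 |
| | Std | 0.0185 | 0.0160 | 0.0094 | 0.0091 |
| Heart | MSE | 0.1006 | 0.0945 | 0.0995 | 0.1508 |
| | Std | 0.0127 | 0.0049 | 0.0075 | 0.0055 |
| Blood | MSE | 0.1607 | 0.1518 | 0.1527 | 0.1428 |
| | Std | 0.0070 | 0.0041 | 0.0037 | 0.0050 |
| Iris | MSE | 0.0383 | 0.0427 | 0.0281 | 0.0258 |
| | Std | 0.0265 | 0.0230 | 0.0048 | 0.0044 |
| Credit | MSE | 0.0934 | 0.0943 | 0.0916 | 0.0884 |
| | Std | 0.0074 | 0.0035 | 0.0044 | 0.0034 |
| Seeds | MSE | 0.0680 | 0.0408 | 0.0309 | 0.0281 |
| | Std | 0.0259 | 0.0061 | 0.0037 | 0.0033 |
| Haberman | MSE | 0.1777 | 0.1676 | 0.1665 | 0.1654 |
| | Std | 0.0106 | 0.0087 | 0.0077 | 0.0063 |
| Balance | MSE | 0.1449 | 0.0703 | 0.0676 | 0.0534 |
| | Std | 0.0253 | 0.0081 | 0.0095 | 0.0054 |
| Diabetes | MSE | 0.1748 | 0.1556 | 0.1515 | 0.1477 |
| | Std | 0.0059 | 0.0027 | 0.0035 | 0.0049 |
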
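Table 9

Testing MSE.

| Dataset | Measure | BP | PSOLF | LFPSO | LPSONS |
| --- | --- | --- | --- | --- | --- |
| Breast Cancer | MSE | 0.0310 | 0.0340 | 0.0333 | 0.0299 |
| | Std | 0.0079 | 0.0100 | 0.0052 | 0.0067 |
| Liver | MSE | 0.1928 | 0.2003 | 0.2265 | 0.2577 |
| | Std | 0.0201 | 0.0343 | 0.0199 | 0.0329 |
| Pima | MSE | 0.1907 | 0.1875 | 0.2446 | 0.1863 |
| | Std | 0.0101 | 0.0275 | 0.0536 | 0.0267 |
| Wine | MSE | 0.0417 | 0.0585 | 0.0467 | 0.0396 |
| | Std | 0.0313 | 0.0228 | 0.0131 | 0.0146 |
| Australian | MSE | 0.1087 | 0.1113 | 0.1168 | 0.1087 |
| | Std | 0.0141 | 0.0146 | 0.0128 | 0.0125 |
| Hepatitis | MSE | 0.0784 | 0.1959 | 0.2082 | 0.1927 |
| | Std | 0.0879 | 0.0617 | 0.0577 | 0.0506 |
| Heart | MSE | 0.1024 | 0.1514 | 0.1375 | 0.1570 |
| | Std | 0.0217 | 0.0149 | 0.0196 | 0.0207 |
| Blood | MSE | 0.1606 | 0.1609 | 0.1603 | 0.1579 |
| | Std | 0.0229 | 0.0177 | 0.0162 | 0.0159 |
| Iris | MSE | 0.0399 | 0.0629 | 0.0392 | 0.0363 |
| | Std | 0.0152 | 0.0409 | 0.0157 | 0.0129 |
| Credit | MSE | 0.0938 | 0.1284 | 0.1378 | 0.1587 |
| | Std | 0.0346 | 0.0146 | 0.0271 | 0.0540 |
| Seeds | MSE | 0.0757 | 0.0509 | 0.0511 | 0.0455 |
| | Std | 0.0423 | 0.0148 | 0.0126 | 0.0115 |
| Haberman | MSE | 0.1792 | 0.1901 | 0.1910 | 0.1790 |
| | Std | 0.0213 | 0.0182 | 0.0234 | 0.0137 |
| Balance | MSE | 0.1462 | 0.0660 | 0.0717 | 0.0565 |
| | Std | 0.0337 | 0.0090 | 0.0117 | 0.0081 |
| Diabetes | MSE | 0.1751 | 0.1740 | 0.1806 | 0.1737 |
| | Std | 0.0111 | 0.0109 | 0.0198 | 0.0027 |
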
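The per-dataset counts quoted for testing MSE can be re-derived from Table 9. The sketch below copies the mean testing MSE values from that table and counts the datasets on which LPSONS has the lowest value; the helper name `datasets_where_best` and the choice to treat an exact tie as a win are assumptions made for illustration.

```python
# Mean testing MSE from Table 9, in column order (BP, PSOLF, LFPSO, LPSONS).
TEST_MSE = {
    "Breast Cancer": (0.0310, 0.0340, 0.0333, 0.0299),
    "Liver": (0.1928, 0.2003, 0.2265, 0.2577),
    "Pima": (0.1907, 0.1875, 0.2446, 0.1863),
    "Wine": (0.0417, 0.0585, 0.0467, 0.0396),
    "Australian": (0.1087, 0.1113, 0.1168, 0.1087),
    "Hepatitis": (0.0784, 0.1959, 0.2082, 0.1927),
    "Heart": (0.1024, 0.1514, 0.1375, 0.1570),
    "Blood": (0.1606, 0.1609, 0.1603, 0.1579),
    "Iris": (0.0399, 0.0629, 0.0392, 0.0363),
    "Credit": (0.0938, 0.1284, 0.1378, 0.1587),
    "Seeds": (0.0757, 0.0509, 0.0511, 0.0455),
    "Haberman": (0.1792, 0.1901, 0.1910, 0.1790),
    "Balance": (0.1462, 0.0660, 0.0717, 0.0565),
    "Diabetes": (0.1751, 0.1740, 0.1806, 0.1737),
}
LPSONS = 3  # column index of LPSONS in each row


def datasets_where_best(table, col, allow_ties=True):
    """List datasets where column `col` has the lowest value (lower MSE is better)."""
    wins = []
    for name, row in table.items():
        best_other = min(v for i, v in enumerate(row) if i != col)
        if row[col] < best_other or (allow_ties and row[col] == best_other):
            wins.append(name)
    return wins


with_ties = datasets_where_best(TEST_MSE, LPSONS, allow_ties=True)
strict = datasets_where_best(TEST_MSE, LPSONS, allow_ties=False)
print(len(with_ties), len(strict))
```

Counting ties as wins gives 10 datasets, the same count quoted in the text; a strict comparison gives 9, because on Australian LPSONS and BP share the same mean MSE at four decimal places.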
In terms of average computational time, all algorithms were run for 13,000 function evaluations (FEs) so that they are compared on an equal budget. As shown in Table 10, the BP algorithm takes the longest time, while the LFPSO algorithm is the fastest. The proposed LPSONS algorithm is the second fastest: it is slightly slower than LFPSO but performs significantly better than LFPSO on most evaluation measures.
Table 10

Average Computational Time (Seconds).

| Dataset | BP | PSOLF | LFPSO | LPSONS |
| --- | --- | --- | --- | --- |
| Breast Cancer | 839.3459 | 281.9295 | 196.0042 | 236.9956 |
| Liver | 413.4199 | 249.6352 | 234.4620 | 246.4468 |
| Pima | 413.0324 | 281.5354 | 210.7203 | 236.5657 |
| Wine | 473.9051 | 301.5204 | 211.9306 | 241.1945 |
| Australian | 414.1895 | 219.8676 | 202.8870 | 244.5231 |
| Hepatitis | 445.3621 | 304.756 | 298.6928 | 302.2047 |
| Heart | 401.0953 | 255.58015 | 223.6950 | 226.888 |
| Blood | 509.6808 | 263.2071 | 219.4589 | 249.3311 |
| Iris | 717.8826 | 261.5717 | 239.9011 | 241.0838 |
| Credit | 373.1287 | 259.7465 | 222.1118 | 249.8160 |
| Seeds | 540.9844 | 261.55765 | 232.1413 | 246.2889 |
| Haberman | 499.205 | 252.1027 | 239.4086 | 255.6158 |
| Balance | 411.0724 | 260.3498 | 218.8188 | 247.0611 |
| Diabetes | 410.0023 | 264.14185 | 255.9756 | 257.2963 |

Figures 5 and 6 depict the convergence curves of the best MSE achieved by the three PSO-based algorithms when training the network on the 14 datasets. The proposed LPSONS algorithm exhibits a better convergence rate on 11/14 datasets: Breast Cancer (A), Pima (C), Wine (D), Australian (E), Hepatitis (F), Blood (H), Iris (I), Seeds (K), Haberman (L), Balance (M), and Diabetes (N).
Figure 5

Convergence Curves on Mean MSE for LPSONS, LFPSO, and PSOLF on Breast Cancer (A), Liver (B), Pima (C), and Wine (D) Datasets to train MLP Neural Network.

Figure 6

Convergence Curves on Mean MSE for LPSONS, LFPSO, and PSOLF on Australian (E), Hepatitis (F), Heart (G), Blood (H), Iris (I), Credit (J), Seeds (K), Haberman (L), Balance (M), and Diabetes (N) Datasets to train MLP Neural Network.

One class of optimization algorithms, such as GA and Differential Evolution (DE), comprises evolutionary algorithms that apply abrupt random changes to generated solutions. Another class, such as PSO and ABC, comprises swarm-intelligence based algorithms. Because these algorithms move through the search space without abrupt changes that leap from one side of the space to the other, they generally cannot match evolutionary algorithms in terms of exploration. This class of algorithms is guided by the best solution achieved at each stage, and hence its performance benefits from exploitation and a good convergence rate. However, the quality of a generated solution depends on the initial position; if the best solution lies in a local optimum, there is a danger of stagnation in that local minimum. Therefore, keeping a good tradeoff between exploration and exploitation is a key factor that enables LPSONS to be more efficient and more robust than both PSOLF and LFPSO on most of the datasets.

A sensitivity analysis has been conducted to determine the impact of each component, namely Mantegna Lévy flight (MLF) and Neighborhood Search (NS), in the LPSONS algorithm. The experiments with different components show that combining PSO operators with Mantegna Lévy flight has a great impact on the global search and also contributes to the convergence speed, while the NS strategy contributes to the local search. The original PSO algorithm (PSO) can be trapped in a local minimum in some cases, leading to lower accuracy; running the algorithm with the MLF strategy (MLF) yields better results, indicating that the algorithm can converge toward the global optimum because it searches new areas of the search space and has a good convergence rate. MLF tries different step sizes, which leads it to explore different areas of the search space and avoid local minima.

As an example, Figures 7 and 8 show the results of experiments on the different components of the algorithm from two angles: error rate and convergence speed. The results are mean values over thirty independent runs on the Iris dataset: the error rate on both train and test samples, and the convergence rate on train samples, for PSO (the original PSO algorithm), MLF (the algorithm using only MLF and PSO operators to generate the solutions (Equation (11))), and NS (the algorithm using Neighborhood Search to generate solutions (Equation (13))). NS alone does not achieve a good convergence rate because it only contributes to local search, while MLF has the most significant impact on the algorithm's convergence rate. Using both NS and MLF, however, achieves the best results compared with using either alone. Running the algorithm with only PSO shows that the original PSO can be trapped in a local minimum, while running it with both NS and MLF shows that the algorithm is more capable of avoiding local minima thanks to its good convergence speed and its suitable global and local search. Thus, it can be concluded that MLF and NS together contribute to both good exploration and good exploitation. Good exploration prevents the search from getting trapped in local minima, while good exploitation yields efficient convergence. Another key point is the division strategy used by the algorithm to help create a good balance between exploration and exploitation, a point that has been shown to be important in other studies such as [46].
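To make the remark about varying step sizes concrete, the sketch below draws Lévy-distributed steps with Mantegna's algorithm in its standard textbook form: a step is u / |v|^(1/β), where v is standard normal and u is normal with a β-dependent scale. It is not a reproduction of Equation (11), which is not shown in this section; the default β = 1.5 and the function names are assumptions.

```python
import math
import random


def mantegna_sigma(beta):
    """Standard deviation of the numerator Gaussian in Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def mantegna_step(beta=1.5, rng=random):
    """Draw one heavy-tailed step u / |v|**(1/beta)."""
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # avoid division by zero in the (very unlikely) exact-zero case
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def levy_steps(dim, beta=1.5, seed=None):
    """Draw one step per dimension, e.g. one per weight or bias of the MLP."""
    rng = random.Random(seed)
    return [mantegna_step(beta, rng) for _ in range(dim)]


print(levy_steps(5, seed=0))
```

Most draws are small, which supports local refinement, while the heavy tail occasionally produces a large jump that can move a particle out of a local minimum.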
Figure 7

Error rate of both Train and Test data for original PSO operators (PSO) (a1-a2), Mantegna Lévy Flight (MLF) (b1-b2), and Neighborhood Search (NS) (c1-c2).

Figure 8

Convergence rate for original PSO operators (PSO), Mantegna Lévy Flight (MLF), and Neighborhood Search (NS).

To test whether the differences between the results produced by LPSONS and those produced by LFPSO, PSOLF, and BP are statistically significant, we conducted the Friedman test [54] on the training and testing accuracy as well as on the training and testing MSE at a significance level of 5%. If the p-value of a test is not greater than 0.05, the null hypothesis is rejected; in other words, the difference is significant. Table 11 lists the test results: the null hypothesis is rejected for all four measures, indicating that the observed differences are statistically significant and unlikely to be due to chance.
Table 11

Friedman test.

| Measures | Statistical Value | P-Value | Null Hypothesis |
| --- | --- | --- | --- |
| Training Accuracy | 25.06 | 1.50185e-05 | Rejected |
| Testing Accuracy | 10.25 | 0.0166 | Rejected |
| Training MSE | 26.31 | 8.19622e-06 | Rejected |
| Testing MSE | 9.86 | 0.0198 | Rejected |
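As a worked check of the Training MSE row in Table 11, the sketch below ranks the four algorithms on each dataset using the mean training MSE from Table 8 and computes the Friedman chi-square statistic with k = 4 algorithms and n = 14 datasets. That the test was applied to these per-dataset means is an assumption; the source does not state which values were ranked. The p-value uses the closed-form upper tail of the chi-square distribution with k - 1 = 3 degrees of freedom, and no tie correction is applied because no row of Table 8 contains tied MSE values.

```python
import math

# Mean training MSE from Table 8, in column order (BP, PSOLF, LFPSO, LPSONS).
TRAIN_MSE = {
    "Breast Cancer": (0.0311, 0.0242, 0.0417, 0.0210),
    "Liver": (0.1914, 0.1977, 0.1889, 0.1818),
    "Pima": (0.1903, 0.1536, 0.1711, 0.1472),
    "Wine": (0.0418, 0.0306, 0.0200, 0.0146),
    "Australian": (0.1089, 0.0994, 0.0949, 0.0910),
    "Hepatitis": (0.0749, 0.0452, 0.0314, 0.0247),
    "Heart": (0.1006, 0.0945, 0.0995, 0.1508),
    "Blood": (0.1607, 0.1518, 0.1527, 0.1428),
    "Iris": (0.0383, 0.0427, 0.0281, 0.0258),
    "Credit": (0.0934, 0.0943, 0.0916, 0.0884),
    "Seeds": (0.0680, 0.0408, 0.0309, 0.0281),
    "Haberman": (0.1777, 0.1676, 0.1665, 0.1654),
    "Balance": (0.1449, 0.0703, 0.0676, 0.0534),
    "Diabetes": (0.1748, 0.1556, 0.1515, 0.1477),
}


def average_ranks(row):
    """Rank values ascending (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def friedman_statistic(rows):
    """Friedman chi-square statistic (without tie correction)."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


stat = friedman_statistic(list(TRAIN_MSE.values()))
p_value = chi2_sf_df3(stat)
print(f"statistic={stat:.2f} p-value={p_value:.3e}")
```

Under these assumptions the sketch reproduces the statistic of 26.31 reported for Training MSE and a p-value of about 8.2e-06, well below the 0.05 threshold.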

Conclusions

Motivated by the gaps of premature convergence and local optima stagnation identified in the literature on swarm-intelligence based algorithms, and inspired by the No Free Lunch (NFL) theorem, this paper has presented a robust and efficient hybrid approach to optimizing the training of feedforward MLP neural networks by utilizing Mantegna Lévy Flight, PSO operators, and the global neighborhood search strategy. The proposed LPSONS algorithm has been evaluated against two well-known swarm-intelligence based algorithms, LFPSO and PSOLF, as well as a gradient-based Back Propagation (BP) algorithm. PSOLF and LFPSO both enhance the original PSO algorithm using Lévy flight. On fourteen standard datasets from the UCI machine learning repository, LPSONS has significantly outperformed the benchmark algorithms in terms of Accuracy, Specificity, Sensitivity (Recall), and F-measure for data classification. Furthermore, it achieves a lower error rate and better convergence speed in terms of mean MSE. It can be concluded that the proposed approach is a good trainer for MLP neural networks: it can avoid local minima through a good balance between exploration and exploitation, and at the same time it is fast and flexible enough to handle a diversity of real-world classification problems. In future work, we will apply and extend LPSONS to other ANN structures. We will also explore using the proposed approach to solve function approximation problems and other problems such as text clustering and feature subset selection. Furthermore, it is important to investigate how LPSONS performs on complex classification problems.

Declarations

Author contribution statement

Omid Tarkhaneh: Conceived and designed the experiments; Performed the experiments; Wrote the paper. Haifeng Shen: Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data.

Funding statement

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Competing interest statement

The authors declare no conflict of interest.

Additional information

Data associated with this study has been deposited at:
https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Original%29
https://archive.ics.uci.edu/ml/machine-learning-databases/liver-disorders/
https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv
https://archive.ics.uci.edu/ml/machine-learning-databases/wine/
http://archive.ics.uci.edu/ml/datasets/statlog+(australian+credit+approval)
https://archive.ics.uci.edu/ml/datasets/hepatitis
http://archive.ics.uci.edu/ml/machine-learning-databases/statlog/heart/
https://archive.ics.uci.edu/ml/datasets/Blood+Transfusion+Service+Center
https://archive.ics.uci.edu/ml/datasets/iris
https://archive.ics.uci.edu/ml/datasets/credit+approval
https://archive.ics.uci.edu/ml/datasets/seeds
https://archive.ics.uci.edu/ml/datasets/Haberman's+Survival
http://archive.ics.uci.edu/ml/datasets/balance+scale
https://archive.ics.uci.edu/ml/datasets/Diabetic+Retinopathy+Debrecen+Data+Set