
Hybrid Harris hawks optimization with cuckoo search for drug design and discovery in chemoinformatics.

Essam H Houssein, Mosa E Hosney, Mohamed Elhoseny, Diego Oliva, Waleed M Mohamed, M Hassaballah.

Abstract

One of the major challenges in cheminformatics is the large amount of information present in the datasets. In the majority of cases, this information contains redundant instances that affect the analysis of similarity measurements with respect to drug design and discovery. Therefore, classical methods such as the protein bank database and quantum mechanical calculations are insufficient owing to the dimensionality of the search spaces. In this paper, we introduce a hybrid metaheuristic algorithm called CHHO-CS, which combines the Harris hawks optimizer (HHO) with two operators: cuckoo search (CS) and chaotic maps. The role of CS is to control the main position vectors of the HHO algorithm to maintain the balance between the exploitation and exploration phases, while the chaotic maps are used to update the control energy parameters to avoid falling into local optima and premature convergence. Feature selection (FS) is a tool that reduces the dimensionality of a dataset by removing redundant and undesired information, which makes it very helpful in cheminformatics. FS methods employ a classifier to identify the best subset of features. Support vector machines (SVMs) are used by the proposed CHHO-CS as the objective function for the classification process in FS. The resulting CHHO-CS-SVM is tested on the selection of appropriate chemical descriptors and compound activities. Various datasets are used to validate the efficiency of the proposed CHHO-CS-SVM approach, including ten from the UCI machine learning repository. Additionally, two chemical datasets (i.e., quantitative structure-activity relation biodegradation and monoamine oxidase) were utilized for selecting the most significant chemical descriptors and chemical compound activities. The extensive experimental and statistical analyses show that the proposed CHHO-CS method achieved better trade-off solutions than the competitor algorithms from the literature, including HHO, CS, particle swarm optimization, moth-flame optimization, grey wolf optimizer, salp swarm algorithm, and sine-cosine algorithm. The experimental results indicate that the complexity associated with cheminformatics can be handled by using chaotic maps and hybridizing metaheuristic methods.

Year:  2020        PMID: 32879410      PMCID: PMC7468137          DOI: 10.1038/s41598-020-71502-z

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.379


Introduction

The prediction and analysis of molecules are essential tasks in cheminformatics, which uses methods from mathematics and computer science to enhance their performance. The implementation of these methods depends on databases, and the processes that cause most of the difficulties are the storage and retrieval of molecular structures and properties (e.g., pharmacogenomics data). Typically, the behavior of compounds can be investigated using molecular analysis, which helps to develop and test molecules for decreasing the effects of specific diseases[1]. One drawback associated with cheminformatics is the exponential growth of the search space owing to the number of features in the dataset[2]. Nevertheless, cheminformatics is still widely used in drug design, where protein structures are estimated and the interactions between molecules and biological targets can be determined by considering the basis of the cellular processes[1]. A drug is an organic molecule that can inhibit the effects of a disease. The main points for drug design and discovery are: (1) structure optimization[3], (2) establishment of the quantitative structure-activity relationship (QSAR)[4], and (3) docking of the ligand into a receptor and de novo design of ligands[5]. Thus, drug design and discovery aim to develop new medicines based on the knowledge about a biological target[6]. The features contained in the datasets are essential for cheminformatics, but owing to the large amount of generated information, they are complicated to handle in most cases[7]. Generally speaking, feature selection (FS) is an important preprocessing step for performance enhancement in data mining, and it is especially used for classification and regression problems. FS approaches are widely used to eliminate irrelevant and redundant features from the original dataset; as a result, the dimensionality of the dataset is reduced[8].
As mentioned, cheminformatics datasets are huge, and FS is needed to identify the best subset of information. Typically, FS approaches can be divided into wrapper and filter methods[9]. Wrapper-based approaches often outperform filters because the proposed subset of features is directly assessed using feedback from the learning algorithm in terms of its accuracy[10,11]. In wrapper techniques, the choice of machine learning algorithm is wide open, so it is possible to find implementations with the most popular algorithms, including support vector machines (SVMs) and K-nearest neighbor (KNN), among others. Researchers have put significant effort into finding efficient FS techniques, particularly those working with metaheuristic algorithms (MAs). In this regard, a wide spectrum of MAs is used either alone[12] or combined with others to form hybrid methods[13]; a comprehensive list can be found in the review[14]. Owing to the success of MAs in solving complex problems[15], they can be employed in cheminformatics. Harris hawks optimization (HHO) is a recent method introduced in[16]. Apart from its novelty, HHO is a powerful optimization tool that is robust, exhibits smooth transitions between exploration and exploitation, and provides competitive results for complex problems[17]. However, there is no perfect MA, and HHO has some disadvantages: its exploration and exploitation can become unbalanced, and it converges prematurely when the problems are highly multimodal[18]. In this context, the cuckoo search (CS) algorithm, which is inspired by the breeding behavior of cuckoo birds, was introduced as an alternative method for global optimization[19]. Since its publication, CS has been widely used by the scientific community[20-22]. In addition, CS has been applied to secondary protein structure prediction[23].
Generally, the advantages of CS are that it ensures global convergence and maintains a good balance between exploration and exploitation[24]. The use of Lévy flights in CS permits a successful global search, which is reflected in its ability to move through the search space using sub-optimal solutions. Chaos, in turn, is a property of nonlinear dynamic systems. It is described as a behavior of complex systems in which small differences in the initial conditions lead to seemingly random and unpredictable changes over time. The concepts of chaos are helpful in optimization because they help to generate accurate solutions, and chaos is commonly used instead of random distributions to improve MA performance[25]. The inclusion of chaotic maps in optimization methods increases the diversity of solutions, which helps to avoid local solutions and speeds up convergence. In the basic HHO, the control energy parameter E and the position vectors $X_{rand}$ and $X_{rabbit}$ play the main role in avoiding local optima and balancing exploitation and exploration. Therefore, in this study, we introduce a hybrid method that combines the benefits of HHO with those of CS and chaotic maps (C); this algorithm is referred to as CHHO-CS. The concept of CHHO-CS is to enhance the search process of HHO to obtain near-optimal solutions. To be specific, a new formulation of the initial escape energy $E_0$ and the escaping energy factor E, together with the initialization of solutions using chaotic maps, is presented. The inclusion of chaotic maps may avoid local optima and accelerate convergence. Additionally, in the CHHO-CS method, CS is used to control the position vectors $X_{rand}$ and $X_{rabbit}$ of the basic HHO. The objective (or fitness) function is shared across the entire optimization process; that is, CS works with the same objective function used by HHO.
Finally, CHHO-CS is combined with the support vector machine (SVM) to select the appropriate chemical descriptors (features) and compound activities. In addition, this study investigates the influence of the chaotic map with respect to cheminformatics problems. Several experiments and comparisons have been conducted on different versions to select the version that provides the most accurate solutions. Furthermore, twelve datasets are used to evaluate the efficiency of CHHO-CS compared to seven well-known metaheuristic algorithms: HHO[16], CS[19], particle swarm optimization (PSO)[26], moth-flame optimization (MFO)[27], grey wolf optimizer (GWO)[28], salp swarm algorithm (SSA)[29], and sine–cosine algorithm (SCA)[30]. The CHHO-CS method achieves the best results in terms of classification accuracy and the number of selected features when compared with the competitor algorithms. The major contributions of this work are as follows: (1) a new CHHO-CS method is proposed that combines HHO with the benefits of CS and chaotic maps; (2) CS and chaotic maps (C) are used to overcome the limitations of the original HHO; (3) the SVM classifier is utilized in CHHO-CS to select the chemical descriptors and chemical compound activities; and (4) several experiments are conducted on various datasets to confirm the superiority of the proposed CHHO-CS method in combination with SVM compared with other metaheuristic algorithms.
The rest of this paper is structured as follows. The literature review is presented in the “Related work” section. The “Materials and methods” section introduces the material and methods used in the study, namely QSAR, SVM, HHO, the cuckoo search (CS) algorithm, and chaotic maps. The “The proposed CHHO–CS” section explains the pre-processing and introduces the proposed CHHO-CS method. The experimental results and discussion are presented in the “Results” section. Finally, the conclusion of the paper is provided in the “Conclusion” section.

Related work

A previously conducted study investigated drug design and discovery and reported differences in efficiency[31]. The available tools used to identify chemical compounds, known as computer-aided drug design (CADD), allow the reduction of different risks associated with the subsequent rejection of lead compounds. CADD plays an important role and exhibits high success rates in the identification of hit compounds[32]. The CADD methodology has two related concepts: ligand/hit identification and ligand/hit optimization. Hit identification/optimization methods are based on the efficiency of the virtual screening techniques used to reach the target binding sites. They dock huge libraries of small molecules, such as chemical information databases or the ZINC database, to identify compounds using pharmacophore modeling tools (docking) and to predict the optimal medicines and proteins from the information provided by the ligand. The PyMOL software[33] is useful in selecting the optimal ligand as the optimal drug, and the AutoDock software is employed to calculate the energy[5]. Genetic algorithms (GAs) are applied in the AutoDock software and AutoDock Vina[34]. Also, in[35], fuzzy systems have been introduced to address the optimization of chemical product design. Another important method for drug design, QSAR, is derived from CADD and describes the correlation between the structures of a set of molecules and their response to the target[36]. Drug design and discovery are the main aspects of cheminformatics[37]. Cheminformatics can be divided into two sub-processes. The first process, called encoding, considers three-dimensional information. The second process, called mapping, comprises building a model using machine learning (ML) techniques[38]. In the encoding process, the molecular structure is transformed based on the calculation of the descriptors[36].
Moreover, the mapping process aims to discover the mappings between the feature vectors and their properties. In cheminformatics and drug discovery, the mapping can be performed using various machine learning techniques[2,39]. Chaotic maps are random-like deterministic methods that constitute dynamic systems. They have nonlinear distributions, indicating that chaos is a simple deterministic dynamic system and a source of randomness. Chaotic search uses chaotic variables instead of random variables, and overall searches can be performed at higher speeds when compared with stochastic search methods, which are mainly based on probabilities. In a previous study[40], chaotic maps were considered to improve the performance of the whale optimization algorithm and balance the exploration and exploitation phases. Also, a grey wolf optimizer and a flower pollination algorithm were enhanced using ten chaotic maps to extract the parameters of bio-impedance models[41]. Meanwhile, in[42], the grasshopper optimization algorithm was combined with chaos theory to accelerate its global convergence and avoid local optima. In[43], a scheme of the CS algorithm based on a chaotic map variable value was introduced. In fact, the hybridization of MAs is widely used in optimization domains other than feature selection[44]. In this vein, combinations of different ML techniques and MAs (e.g., search strategies) have been applied in many fields, with modifications and hybridization that let one technique improve the search efficiency of another. For instance, the salp swarm algorithm combined with k-NN based on QSAR is an interesting alternative that provides competitive solutions[45]. Also, Houssein et al.[37] introduced a novel hybridization approach for drug design and discovery based on a hybrid of HHO and SVM. In this study, we apply hybridization to select the chemical descriptors and compound activities in cheminformatics.
Particularly, this study proposes an alternative classification approach for cheminformatics, termed the CHHO-CS-based SVM classifier, for selecting the chemical descriptors and chemical compound activities; the hybrid of HHO and CS is enhanced based on chaos (C) theory.

Materials and methods

In this section, we briefly discuss the QSAR model, the basics of SVM, the original HHO, the original CS, and the chaotic map theory.

Quantitative structure-activity relationship

QSAR builds mathematical models that relate the chemical structure of compounds to their biological activity. QSAR is widely used because it can detect the major characteristics of chemical compounds, so that not every compound has to be synthesized and tested. The inclusion of ML methods in QSAR studies helps to predict whether the activity of a compound is similar to a drug-like activity for a specific disease or chemical test. Compounds possess complex molecular structures, with many attributes needed for their description. Some of these features include characterization and topological indices. Therefore, molecular descriptors are highly important in pharmaceutical sciences and chemistry[4].

Support vector machine

SVM is an important supervised learning algorithm commonly used for classification[46]. SVM maps the data points into a high-dimensional space using a nonlinear kernel function and searches for the optimal separation between classes. The solution maximizes the distance to the nearest points, which are defined as support vectors, and the result of SVM is a hyperplane. To obtain optimal results, SVM has some parameters that have to be tuned. The parameter C controls the trade-off between a smooth decision boundary and the accurate classification of the training points. If C has a large value, more training points will be classified correctly, which means that more complex decision curves will be generated in an attempt to fit all the points. Different values of C can be tried for a dataset to obtain a well-balanced curve and prevent over-fitting. The parameter gamma ($\gamma$) characterizes the influence of a single training point. Low gamma implies that each point has a far reach, whereas high gamma implies that each point has a close reach. The use of SVM has been extended to cheminformatics. In this work, the steps of SVM are presented in Algorithm 1, and its graphical description is presented in Fig. 1.
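The effect of gamma on the reach of a training point can be illustrated with a small sketch. It assumes a radial basis function (RBF) kernel, which the text does not name explicitly; the function name `rbf_kernel` and the sample points are hypothetical and chosen only for illustration.

```python
import math

def rbf_kernel(x, z, gamma):
    """RBF kernel value exp(-gamma * ||x - z||^2) for two points (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

# Two points whose squared distance is 4.
x, z = (0.0, 0.0), (2.0, 0.0)

# Low gamma: the similarity stays high, so each point has a far reach.
low = rbf_kernel(x, z, gamma=0.01)

# High gamma: the similarity collapses, so each point has a close reach.
high = rbf_kernel(x, z, gamma=10.0)

print(low, high)
```

With a low gamma, distant points still influence each other, which tends to give smoother decision boundaries; with a high gamma, only very close points matter.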
Figure 1

General structure of a decision boundary in SVMs classification.


Harris hawks optimization

HHO[16] is a metaheuristic algorithm that provides competitive solutions for complex problems. HHO is inspired by the behavior of Harris hawks, which are intelligent birds. This species possesses a mechanism that allows it to catch prey even when the prey is escaping. This process is modeled in the form of mathematical expressions, allowing its computational implementation. HHO is a stochastic algorithm that can explore complex search spaces to find optimal solutions. The basic steps of HHO are selected with respect to the energy states of the prey. The candidate solutions are the hawks, and the best solution in every step is the prey. The equations below follow the original formulation of HHO[16]. To facilitate the understanding of HHO, the symbols used in this algorithm are defined as follows:
$X(t)$: vector of hawks position (search agents)
$X_{rabbit}(t)$: position of the rabbit (best agent)
$X_{rand}(t)$: position of a random hawk
$X_m(t)$: hawks average position
T, N, t: maximum number of iterations, swarm size, iteration counter
$r_1$, $r_2$, $r_3$, $r_4$, $r_5$, q: random numbers between (0, 1)
D, LB, UB: dimension, lower and upper bounds of variables
$E_0$, E: initial state of energy, escaping energy
The exploration phase simulates the situation in which the hawks cannot accurately track the prey. In such a case, the hawks take a break to track and locate new prey. The hawks randomly perch at different positions and wait for their prey using two operators, which are selected on the basis of probability q as given by Eq. (1), where $q<0.5$ indicates that the hawks perch at the location of other population members and the prey (e.g., rabbit). If $q\ge 0.5$, the hawks are at random positions around the population range. The exploration step is defined as:
$$X(t+1)=\begin{cases} X_{rand}(t)-r_1\left|X_{rand}(t)-2r_2X(t)\right|, & q\ge 0.5 \\ \left(X_{rabbit}(t)-X_m(t)\right)-r_3\left(LB+r_4(UB-LB)\right), & q<0.5 \end{cases} \qquad (1)$$
The average location of the hawks is represented by:
$$X_m(t)=\frac{1}{N}\sum_{i=1}^{N}X_i(t) \qquad (2)$$
where $X_i(t)$ is the position of hawk i in iteration t and N is the total number of hawks. The average position can be obtained by using different methods, but this is the simplest rule.
A good transition from exploration to exploitation is required. Here, a shift between the different simulated exploitative behaviors is expected based on the escaping energy factor E of the prey, which diminishes dramatically during the escaping behavior. The energy of the prey is computed by Eq. (3):
$$E=2E_0\left(1-\frac{t}{T}\right) \qquad (3)$$
where $E_0$, E, and T represent the initial escape energy, the escape energy, and the maximum number of iterations, respectively.
The soft besiege is an important step in HHO. It takes place if $r\ge 0.5$ and $|E|\ge 0.5$, where r is a random number in (0, 1). In this scenario, the rabbit still has sufficient energy and performs random misleading jumps to escape, but in the metaphor it cannot. The besiege step is defined by the following rules:
$$X(t+1)=\Delta X(t)-E\left|JX_{rabbit}(t)-X(t)\right| \qquad (4)$$
$$\Delta X(t)=X_{rabbit}(t)-X(t) \qquad (5)$$
where $\Delta X(t)$ is the difference between the position vector of the rabbit and the current position in iteration t, and $J=2(1-r_5)$ is the rabbit's random jump strength throughout the escaping phase. The J value varies randomly in each iteration to represent the rabbit's behavior.
In the hard besiege stage, when $r\ge 0.5$ and $|E|<0.5$, the prey is exhausted and has no escaping strength. The Harris hawks hardly encircle the intended prey before making the surprise attack. For this case, the current position is changed using:
$$X(t+1)=X_{rabbit}(t)-E\left|\Delta X(t)\right| \qquad (6)$$
In real life, hawks gradually choose the best dive toward the prey when they want to capture specific prey in competitive situations. This is simulated by:
$$Y=X_{rabbit}(t)-E\left|JX_{rabbit}(t)-X(t)\right| \qquad (7)$$
The soft besiege of Eq. (7) is performed with progressive rapid dives only if $|E|\ge 0.5$ but $r<0.5$. In this case, the rabbit has sufficient energy to escape, and a soft besiege is applied before the surprise attack. HHO models the different escape patterns and leapfrog movements of the prey. Lévy flights (LF) are used here to emulate the various movements of the hawk and rabbit dives. Eq. (8) computes such patterns:
$$Z=Y+S\times LF(D) \qquad (8)$$
where S represents a random vector of size $1\times D$ and LF is the Lévy flight function, given by Eq. (9):
$$LF(x)=0.01\times \frac{u\times \sigma }{|v|^{1/\beta }},\quad \sigma =\left(\frac{\Gamma (1+\beta )\times \sin \left(\frac{\pi \beta }{2}\right)}{\Gamma \left(\frac{1+\beta }{2}\right)\times \beta \times 2^{\left(\frac{\beta -1}{2}\right)}}\right)^{1/\beta } \qquad (9)$$
Here, u and v are random values between (0, 1), and $\beta$ is a default constant set to 1.5.
The final step in the process is to update the positions of the hawks using:
$$X(t+1)=\begin{cases} Y, & F(Y)<F(X(t)) \\ Z, & F(Z)<F(X(t)) \end{cases} \qquad (10)$$
where F denotes the fitness function and Y and Z are obtained using Eqs. (7) and (8).
The hard besiege with progressive rapid dives takes place if $|E|<0.5$ and $r<0.5$. Here, the strength of the rabbit is not sufficient to escape, and a hard besiege is built before the surprise attack that catches and kills the prey. In this step, the hawks seek to reduce the distance between their average position and the prey. This operator is defined as follows:
$$X(t+1)=\begin{cases} Y, & F(Y)<F(X(t)) \\ Z, & F(Z)<F(X(t)) \end{cases} \qquad (11)$$
The values of Y and Z are obtained using the new rules in Eqs. (12) and (13):
$$Y=X_{rabbit}(t)-E\left|JX_{rabbit}(t)-X_m(t)\right| \qquad (12)$$
$$Z=Y+S\times LF(D) \qquad (13)$$
where $X_m(t)$ is obtained using Eq. (2).
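The way the escaping energy E and the random number r select among the HHO operators can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the rule that $|E|\ge 1$ triggers exploration is taken from the original HHO description rather than from the text above, and redrawing $E_0$ in (-1, 1) at each iteration is likewise an assumption.

```python
import random

def escaping_energy(e0, t, T):
    """Eq. (3): escaping energy E = 2 * E0 * (1 - t / T)."""
    return 2.0 * e0 * (1.0 - t / T)

def select_operator(E, r):
    """Pick the HHO operator from the escaping energy E and the random number r."""
    if abs(E) >= 1.0:
        # Assumption from the original HHO description: |E| >= 1 means exploration.
        return "exploration"
    if r >= 0.5:
        return "soft besiege" if abs(E) >= 0.5 else "hard besiege"
    if abs(E) >= 0.5:
        return "soft besiege with progressive rapid dives"
    return "hard besiege with progressive rapid dives"

rng = random.Random(1)
T = 10
for t in range(T):
    e0 = 2.0 * rng.random() - 1.0  # assumed: E0 is redrawn in (-1, 1) each iteration
    E = escaping_energy(e0, t, T)
    print(t, round(E, 3), select_operator(E, rng.random()))
```

Because the factor (1 - t/T) shrinks linearly, later iterations can only produce small |E| values and are therefore dominated by the besiege operators, which is the behavior the chaotic energy formulation of CHHO-CS is meant to diversify.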

Cuckoo search

Cuckoo search (CS) is a metaheuristic algorithm often used for solving complex optimization problems[19]. The cuckoo search idea is inspired by the bird known as the cuckoo. Cuckoos are interesting creatures not only because they can make beautiful sounds but also because of their aggressive reproduction strategy: adult cuckoos lay their eggs in the nests of other host birds. Cuckoo search is based on three main rules: (1) each cuckoo lays one egg at a time and dumps the egg in a randomly selected nest; (2) the best nests with high-quality eggs are carried over to the next generation; and (3) the number of available host nests is fixed, and the host bird discovers the egg laid by a cuckoo with a probability $p_a \in [0, 1]$. Under the third rule, the host bird can either throw away the egg or abandon the nest and build a completely new one. This rule may be approximated by replacing a fraction $p_a$ of the n nests with new nests (with new random solutions). The pseudo-code of CS is shown in Algorithm 2.
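The three rules can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions rather than Algorithm 2 itself: the step scale 0.01, the Mantegna-style Lévy step, the parameter defaults, and the `sphere` test function are choices made for this example, and all function names are hypothetical.

```python
import math
import random

def levy_step(rng, beta=1.5):
    """Draw one heavy-tailed step length with Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # avoid division by zero in the rare degenerate draw
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)

def cuckoo_search(f, dim, lb, ub, n=15, pa=0.25, iters=200, seed=0):
    """Minimize f over [lb, ub]^dim; returns (best_nest, best_fitness)."""
    rng = random.Random(seed)
    nests = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n)]
    fit = [f(x) for x in nests]
    for _ in range(iters):
        best = nests[min(range(n), key=fit.__getitem__)]
        for i in range(n):
            # Rule 1: a cuckoo lays one egg (Levy-flight move) in a random nest.
            egg = [min(ub, max(lb, x + 0.01 * levy_step(rng) * (x - b)))
                   for x, b in zip(nests[i], best)]
            j = rng.randrange(n)
            egg_fit = f(egg)
            if egg_fit < fit[j]:
                nests[j], fit[j] = egg, egg_fit
        # Rule 2: the current best nest always survives.
        keep = min(range(n), key=fit.__getitem__)
        for i in range(n):
            # Rule 3: other nests are discovered with probability pa and rebuilt.
            if i != keep and rng.random() < pa:
                nests[i] = [rng.uniform(lb, ub) for _ in range(dim)]
                fit[i] = f(nests[i])
    k = min(range(n), key=fit.__getitem__)
    return nests[k], fit[k]

def sphere(x):
    """Simple convex test function with its minimum at the origin."""
    return sum(v * v for v in x)

best, value = cuckoo_search(sphere, dim=2, lb=-5.0, ub=5.0)
print(best, value)
```

Since a nest is only overwritten by a better egg and the best nest is never abandoned, the best fitness cannot get worse from one generation to the next.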

Chaotic maps

The majority of MAs are based on stochastic rules. These rules primarily rely on randomness obtained from certain probability distributions, which are often uniform or Gaussian. In principle, replacing this randomness with chaotic maps can be beneficial because of the dynamic properties associated with chaotic behavior. This dynamic mixing is important to ensure that the solutions obtained by the algorithm are sufficiently diverse to reach any mode in a multimodal objective landscape. Approaches that use chaotic maps instead of random distributions are called chaotic optimization. Owing to the mixing properties of chaos, the search process can be performed at higher speeds than traditional searches based on standard probability distributions[47]. To achieve this ability, one-dimensional non-invertible maps are used to produce a set of chaotic optimization algorithm variants. Table 1 presents the prominent chaotic maps used in this study. In addition, the outputs of the chaotic maps are normalized to the range [0, 1].
Table 1

Details of chaotic maps applied on CHHO–CS.

No. | Map name | Ref. | Map equation | Notes
M1 | Tent | [48] | $$x_{k+1}=\begin{cases} \frac{x_k}{0.7}, & x_k<0.7 \\ \frac{10}{3}(1-x_k), & x_k\ge 0.7 \end{cases}$$ |
M2 | Logistic | [49] | $$x_{k+1}=ax_k(1-x_k)$$ | $x_0\in (0,1)$ for kth chaotic number
M3 | Sinusoidal | [49] | $$x_{k+1}=ax_k^2\sin (\pi x_k)$$ | $\mu$ is a parameter between 0.9 and 1.08
M4 | Singer | [50] | $$x_{k+1}=\mu (7.86x_k-23.31x_k^2+28.75x_k^3-13.3x_k^4)$$ |
M5 | Sine | [51] | $$x_{k+1}=\frac{a}{4}\sin (\pi x_k)$$ | $0<a<4$
M6 | Chebyshev | [52] | $$x_{k+1}=\cos (k\cos ^{-1}(x_k))$$ |
M7 | Circle | [53] | $$x_{k+1}=\left(x_k+b-\frac{a}{2\pi }\sin (2\pi x_k)\right)\bmod 1$$ | a = 0.5 and b = 0.2; it generates a chaotic sequence in (0, 1)
M8 | Iterative | [54] | $$x_{k+1}=\sin \left(\frac{a\pi }{x_k}\right)$$ | $a\in (0,1)$
M9 | Gauss/Mouse | [55] | $$x_{k+1}=\begin{cases} 0, & x_k=0 \\ \frac{1}{x_k\bmod (1)}, & \text{otherwise} \end{cases}$$ with $$\frac{1}{x_k\bmod (1)}=\frac{1}{x_k}-\left[\frac{1}{x_k}\right]$$ | Generates chaotic sequences in (0, 1)
M10 | Piecewise | [56] | $$x_{k+1}=\begin{cases} \frac{x_k}{P}, & 0\le x_k<P \\ \frac{x_k-P}{0.5-P}, & P\le x_k<0.5 \\ \frac{1-P-x_k}{0.5-P}, & 0.5\le x_k<1-P \\ \frac{1-x_k}{P}, & 1-P\le x_k<1 \end{cases}$$ | The control parameter $P\in (0,0.5)$, $x\in (0,1)$, and $P\ne 0$
The main task of chaotic maps is to avoid local optima and speed up convergence. It is important to mention that the nature of chaotic maps can also increase exploration owing to their intrinsic randomness. It is therefore necessary to select the map that best helps each algorithm for a specific problem. Another important point is that chaotic maps do not decide between exploration and exploitation in the algorithms. However, over the iterations, the chaotic values generated by the maps change the degree of exploration or exploitation of the search space.
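As an illustration of how such maps can stand in for a random number generator, the sketch below iterates the tent map (M1) and the logistic map (M2) from Table 1. The helper names are hypothetical, and the logistic parameter a = 4 is an assumed, commonly used value that Table 1 does not state.

```python
def tent_map(x):
    """M1 in Table 1: tent map with breakpoint 0.7."""
    return x / 0.7 if x < 0.7 else (10.0 / 3.0) * (1.0 - x)

def logistic_map(x, a=4.0):
    """M2 in Table 1; a = 4 is an assumed, commonly used value."""
    return a * x * (1.0 - x)

def chaotic_sequence(step, x0, n):
    """Iterate a one-dimensional map n times starting from x0."""
    seq, x = [], x0
    for _ in range(n):
        x = step(x)
        seq.append(x)
    return seq

# Deterministic sequences that can replace uniform random draws in (0, 1).
print(chaotic_sequence(tent_map, 0.3, 5))
print(chaotic_sequence(logistic_map, 0.3, 5))
```

Unlike a pseudo-random generator, the same starting value $x_0$ always reproduces the same sequence, while nearby starting values diverge quickly.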

The proposed CHHO–CS

In this section, the proposed CHHO-CS, which is used to improve the search efficiency of the basic HHO, is explained in detail. Typically, HHO has the characteristics of acceptable convergence speed and a simple structure. However, for some complex optimization problems, HHO may fail to maintain the balance between exploration and exploitation and fall into a local optimum. The shortcomings of HHO are more obvious for high-dimensional functions and multimodal problems. The optimization power of the basic HHO depends on the optimal solution[57]. In this paper, we introduce two strategies (chaotic maps and CS) to enhance the performance of the basic HHO. The following points are worth noting.
Chaotic maps influence: applying chaos theory to the random search process of MAs significantly enhances the effect of the random search. Based on the randomness of chaotic local search, MAs can avoid falling into local optima and premature convergence. In the basic HHO algorithm, the transition from global exploration to local exploitation is realized according to Eq. (3). As a result, the algorithm can easily fall into a local optimum. Hence, in the CHHO-CS algorithm, a new formulation of the initial escape energy $E_0$ and the escaping energy factor E based on chaotic maps is employed, as demonstrated in Algorithm 3. Figure 2 shows the influence of a chaotic map on the energy parameter E obtained by the proposed method versus the basic HHO. Notably, the curve on the left side decreases linearly, whereas the energy parameter defined by the new formulation of E is non-linear and steers the search toward the middle of the search process to infuse enough diversity into the population during the exploitation phase.
Figure 2

Influence of proper selection of energy parameter E.
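The contrast between the linear and the chaotic energy schedules can be sketched as follows. The linear rule E = 2·E0·(1 − t/T), with E0 drawn from [−1, 1], is the standard escaping-energy rule of the original HHO. The chaotic variant shown here is only an assumption made for illustration: it draws E0 from a logistic map instead of a uniform random number, and it is not the formulation of Algorithm 3, which is not reproduced in this section. The names `linear_energy` and `chaotic_energy_schedule` are hypothetical.

```python
def linear_energy(e0, t, t_max):
    """Escaping energy of the basic HHO: E = 2 * E0 * (1 - t / T)."""
    return 2.0 * e0 * (1.0 - t / t_max)


def chaotic_energy_schedule(t_max, x0=0.7):
    """Energy values over t_max iterations with E0 taken from a logistic map.

    Assumption: the logistic map stands in for whichever chaotic map is used;
    the exact chaotic formulation of the proposed method is not shown here.
    """
    x = x0
    energies = []
    for t in range(t_max):
        x = 4.0 * x * (1.0 - x)   # chaotic value in [0, 1]
        e0 = 2.0 * x - 1.0        # rescale to [-1, 1], like 2 * rand - 1 in HHO
        energies.append(linear_energy(e0, t, t_max))
    return energies


if __name__ == "__main__":
    schedule = chaotic_energy_schedule(100)
    # |E| >= 1 corresponds to exploration and |E| < 1 to exploitation in HHO.
    exploring = sum(1 for e in schedule if abs(e) >= 1.0)
    print("iterations spent exploring:", exploring)
```

In this sketch the envelope of |E| still shrinks linearly, but the individual values jump between the exploration and exploitation regimes according to the chaotic sequence.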

CS method influence: in the basic HHO, two position vectors are responsible for the exploration step defined by Eq. (1), which plays a vital role in balancing exploitation and exploration. Larger values of these position vectors expedite global exploration, while smaller values expedite exploitation. Hence, they should be selected appropriately so that a stable balance between global exploration and local exploitation can be established[58]. Accordingly, in the CHHO–CS algorithm, we borrow the merits of the CS method to control the position vectors of HHO. At the end of each iteration T, CS tries to find a better solution: it computes the fitness value of the new solution and, if this is better than the fitness value obtained by HHO, the position vectors are updated with the new solution; otherwise, the values obtained by HHO are left unchanged.

To be specific, the steps of the CHHO–CS algorithm are executed as follows: chaotic maps are employed to avoid falling into local optima and premature convergence; a balance between exploration and exploitation is maintained by CS; then, SVM is used for classification. The flowchart of the proposed CHHO–CS method is shown in Fig. 3, and its pseudo-code is given in Algorithm 3. It is important to mention that, for SVM and feature selection, each solution of the population in CHHO–CS is encoded as a set of indexes that correspond to the rows of the dataset. For example, if a dataset has 100 rows, a possible candidate solution in the population for five dimensions could be [10, 20, 25, 50, 80]; such values are the rows with the features to be evaluated in the SVM. The location vector in the soft and hard besiege with progressive rapid dives in HHO is updated as follows:
Figure 3

General flowchart of the proposed CHHO–CS method.
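The greedy role that CS plays at the end of an iteration can be sketched as below. This is a minimal illustration under stated assumptions, not the pseudo-code of Algorithm 3: the Lévy step is generated with Mantegna's algorithm, the move is taken relative to the best-known position, positions are clipped to the bounds [0, 1], and the fitness is maximized. The names `levy_step` and `cs_refine` and the toy fitness are hypothetical; the Lévy parameter 1.5 and step length 0.01 are the CS settings listed in Table 2.

```python
import math
import random


def levy_step(beta, rng):
    """One Levy-distributed step via Mantegna's algorithm."""
    sigma_u = (
        math.gamma(1.0 + beta) * math.sin(math.pi * beta / 2.0)
        / (math.gamma((1.0 + beta) / 2.0) * beta * 2.0 ** ((beta - 1.0) / 2.0))
    ) ** (1.0 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    # Guard against a zero denominator in the extremely rare case v == 0.
    return u / max(abs(v), 1e-12) ** (1.0 / beta)


def cs_refine(position, best, fitness, rng, beta=1.5, step_length=0.01):
    """Propose a Levy-flight move and keep it only if the fitness improves."""
    candidate = []
    for x, b in zip(position, best):
        moved = x + step_length * levy_step(beta, rng) * (x - b)
        candidate.append(min(1.0, max(0.0, moved)))   # clip to [0, 1]
    old_fit = fitness(position)
    new_fit = fitness(candidate)
    if new_fit > old_fit:          # maximization with greedy acceptance
        return candidate, new_fit
    return list(position), old_fit  # otherwise keep the HHO result unchanged


if __name__ == "__main__":
    rng = random.Random(1)
    toy_fitness = lambda p: -sum((x - 0.5) ** 2 for x in p)  # toy objective
    pos, fit = cs_refine([0.1, 0.9, 0.3], [0.5, 0.5, 0.5], toy_fitness, rng)
    print(pos, fit)
```

Because a candidate is accepted only when it improves the fitness, the refinement can never make the solution returned by HHO worse.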


Feature selection

FS is a data pre-processing step that is used in combination with ML techniques. FS permits the selection of a subset of the data that is free of redundant and undesired information, and it can effectively increase learning accuracy and classification performance. The prediction accuracy and data understanding of ML techniques can therefore be improved by removing features that are highly correlated with other features: when two features are perfectly correlated, only one of them is needed to describe the data sufficiently. Classification is considered a major task in ML; in classification, data are assigned to groups depending on the information obtained with respect to different features. Large search spaces are a major challenge associated with FS; therefore, different MAs are used to perform this task.
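The redundancy argument can be made concrete with a small sketch that drops any feature column that is (almost) perfectly correlated with a column already kept. This is a simple filter used purely for illustration; it is not the wrapper-based selection performed by the proposed method, and the names `pearson` and `drop_redundant` as well as the threshold are assumptions.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0   # constant column: treated as uncorrelated in this sketch
    return cov / math.sqrt(var_a * var_b)


def drop_redundant(columns, threshold=0.999):
    """Return indexes of columns that are not near-duplicates of a kept column."""
    kept = []
    for idx, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) < threshold for k in kept):
            kept.append(idx)
    return kept


if __name__ == "__main__":
    features = [
        [1.0, 2.0, 3.0, 4.0],
        [2.0, 4.0, 6.0, 8.0],   # exactly twice the first column
        [4.0, 1.0, 3.0, 2.0],
    ]
    print(drop_redundant(features))
```

Here the second column carries no information beyond the first, so only one of the two is retained.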

Fitness function

Each candidate solution is evaluated over the iterations to verify the performance of the proposed algorithm. In classification, the dataset needs to be divided into training and test sets. The fitness function of the proposed CHHO–CS method is defined by Eq. (15), where R refers to the classification error and C is the total number of features for a given dataset D; the remaining terms are the length of the selected subset and the classification performance, which is defined in the range [0, 1]. T is a necessary condition and G is a group column for the specific classifier. Each step of the algorithm is compared with T, and the obtained fitness value must exceed the required bound in order to maximize the solution. It is important to remark that the fitness (or objective) function in Eq. (15) is also used by CS to compute the positions of the vectors that it controls.
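A weighted fitness of the kind commonly used in wrapper-based feature selection can be sketched as follows. This is not a reconstruction of Eq. (15): it is a generic form that rewards classification performance and penalizes large subsets, written as a quantity to be maximized. The function name `subset_fitness` is hypothetical, and the default weights 0.99 and 0.01 are assumed to correspond to the two fitness-function coefficients reported in the experimental setup.

```python
def subset_fitness(accuracy, n_selected, n_total, w_acc=0.99, w_size=0.01):
    """Score in [0, 1]; higher accuracy and smaller subsets score higher.

    accuracy   -- classification performance in [0, 1] on held-out data
    n_selected -- number of selected features (length of the subset)
    n_total    -- total number of features in the dataset
    """
    if not 0 < n_selected <= n_total:
        raise ValueError("subset must be non-empty and no larger than the feature set")
    reduction = 1.0 - n_selected / n_total
    return w_acc * accuracy + w_size * reduction


if __name__ == "__main__":
    # A five-element candidate such as [10, 20, 25, 50, 80] out of 100 entries.
    candidate = [10, 20, 25, 50, 80]
    print(subset_fitness(accuracy=0.92, n_selected=len(candidate), n_total=100))
```

With these weights the accuracy term dominates, so the subset size acts mainly as a tie-breaker between candidates of similar accuracy.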

Results

To perform the experiments and comparisons, the initial values of the problem must be set. The number of search agents is 30, and the problem dimension is 1,665 for the first dataset and 41 for the second. The number of iterations is set to 100 and 1,000, the number of independent runs is 30, the two weights in the fitness function are 0.99 and 0.01, and the lower and upper bounds are 0 and 1, respectively. For comparison, seven metaheuristic algorithms are used, including the standard cuckoo search (CS) and the Harris hawks optimizer (HHO). Ten chaotic maps were also tested to determine which provides the best results; owing to the lack of space, only the results of the best map are reported. The selected metaheuristics and the proposed method use the same population size, and all populations are randomly initialized. The internal parameters of all the algorithms are provided in Table 2.
Table 2

Parameters setting of competitor algorithms used in the comparison and evaluation.

Methods | Parameters
PSO | Agents number = 50; Velocity = 65
MFO | Agents number = 50; B = 1
GWO | Agents number = 50; Number domination = 100
SSA | Agents number = 50; L = 2 and C = rand
SCA | Agents number = 50; A = 2
HHO | Agents number = 50; E0 variable change from −1 to 1 (Default); Beta = 1.5
CS | Agents number = 50; Discovery rate of alien eggs/solutions = 0.25; Levy distribution parameter = 1.5; Step length = 0.01
HHO–CS | Both HHO and CS parameters
CHHO–CS | Both HHO and CS parameters; x0 = rand (default for maps)
SVM, a common machine learning classifier, was used in the experiments and combined with the proposed CHHO–CS method for classification.

Performance analysis using UCI datasets

The description and pre-processing of the datasets, the results, and the comparison of the proposed CHHO–CS are presented in the following subsections.

UCI Data description

The proposed algorithm is examined on ten benchmark datasets obtained from the UCI machine learning repository[59], which are described in Table 3 and available at “https://www.openml.org/search”.

Statistical results

SVM is used for the classification task. Following the previous methodology, in this experiment the number of iterations is set to 1,000 for each of the 30 runs. The experimental results are reported in Tables 4 and 5. In this experiment, the SVM-based CHHO–CS-Piece achieves the best mean and Std.
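The four statistical measures reported for each algorithm can be computed from the per-run scores as in the sketch below. The function name `summarize_runs` is hypothetical, and two assumptions are made: a higher score is better, and Std denotes the sample standard deviation (the text does not state whether the sample or population form is used).

```python
import statistics


def summarize_runs(scores):
    """Return mean, sample standard deviation, best and worst of per-run scores."""
    return {
        "mean": statistics.mean(scores),
        "std": statistics.stdev(scores),   # sample standard deviation (n - 1)
        "best": max(scores),               # assumes higher is better
        "worst": min(scores),
    }


if __name__ == "__main__":
    # Made-up scores of three runs, for illustration only.
    print(summarize_runs([90.0, 92.0, 94.0]))
```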
Table 4

Values of the statistical measures obtained by the competitor algorithms using the SVM classifier with 1,000 iterations over D1, D2, D3, D4 and D5.

Dataset | Method | Mean | Std | Best | Worst
D1 | PSO | 8.79E+01 | 7.80E−01 | 85.587 | 84.972
D1 | MFO | 8.85E+01 | 7.70E−01 | 87.985 | 87.481
D1 | GWO | 8.37E+01 | 7.90E−01 | 87.503 | 87.399
D1 | SSA | 8.55E+01 | 7.85E−01 | 86.301 | 85.930
D1 | SCA | 8.75E+01 | 7.70E−01 | 85.602 | 85.099
D1 | HHO | 8.95E+01 | 7.55E−01 | 87.501 | 86.430
D1 | CS | 8.90E+01 | 7.90E−01 | 82.503 | 82.399
D1 | HHO–CS | 9.80E+01 | 7.66E−01 | 90.102 | 89.890
D1 | CHHO–CS-Piece | 9.89E+01 | 7.20E−01 | 91.202 | 90.591
D2 | PSO | 8.79E+01 | 7.80E−01 | 84.087 | 83.872
D2 | MFO | 8.85E+01 | 7.70E−01 | 88.097 | 87.881
D2 | GWO | 8.37E+01 | 7.90E−01 | 86.103 | 86.099
D2 | SSA | 8.55E+01 | 7.85E−01 | 88.101 | 87.930
D2 | SCA | 8.75E+01 | 7.70E−01 | 87.402 | 86.909
D2 | HHO | 8.95E+01 | 7.55E−01 | 89.501 | 88.430
D2 | CS | 8.90E+01 | 7.95E−01 | 82.000 | 81.469
D2 | HHO–CS | 8.80E+01 | 7.66E−01 | 91.292 | 91.199
D2 | CHHO–CS-Piece | 9.89E+01 | 7.19E−01 | 91.502 | 91.299
D3 | PSO | 8.79E+01 | 7.82E−01 | 85.187 | 85.179
D3 | MFO | 8.85E+01 | 7.75E−01 | 87.197 | 86.980
D3 | GWO | 8.37E+01 | 7.90E−01 | 86.103 | 86.999
D3 | SSA | 8.55E+01 | 7.85E−01 | 87.301 | 87.131
D3 | SCA | 8.75E+01 | 7.74E−01 | 87.112 | 86.909
D3 | HHO | 8.75E+01 | 7.70E−01 | 90.001 | 89.230
D3 | CS | 8.90E+01 | 7.95E−01 | 82.000 | 81.869
D3 | HHO–CS | 8.80E+01 | 7.66E−01 | 90.992 | 91.999
D3 | CHHO–CS-Piece | 8.97E+01 | 7.11E−01 | 91.002 | 90.299
D4 | PSO | 8.70E+01 | 7.82E−01 | 85.187 | 84.970
D4 | MFO | 8.80E+01 | 7.73E−01 | 86.177 | 85.780
D4 | GWO | 8.33E+01 | 7.91E−01 | 87.121 | 86.980
D4 | SSA | 8.50E+01 | 7.85E−01 | 88.103 | 87.930
D4 | SCA | 8.72E+01 | 7.73E−01 | 87.122 | 86.660
D4 | HHO | 8.86E+01 | 7.56E−01 | 90.551 | 89.990
D4 | CS | 8.77E+01 | 7.92E−01 | 82.312 | 81.960
D4 | HHO–CS | 8.89E+01 | 7.66E−01 | 91.991 | 90.980
D4 | CHHO–CS-Piece | 9.09E+01 | 7.76E−01 | 92.113 | 91.950
D5 | PSO | 8.70E+01 | 7.88E−01 | 87.180 | 86.920
D5 | MFO | 8.81E+01 | 7.75E−01 | 87.377 | 86.980
D5 | GWO | 8.30E+01 | 7.93E−01 | 87.121 | 86.980
D5 | SSA | 8.50E+01 | 7.80E−01 | 87.910 | 87.310
D5 | SCA | 8.70E+01 | 7.75E−01 | 92.910 | 91.560
D5 | HHO | 8.90E+01 | 7.85E−01 | 92.510 | 91.410
D5 | CS | 8.99E+01 | 7.80E−01 | 84.01 | 83.900
D5 | HHO–CS | 8.96E+01 | 7.76E−01 | 92.990 | 91.990
D5 | CHHO–CS-Piece | 9.89E+01 | 7.06E−01 | 93.801 | 92.990
Table 5

Values of the statistical measures obtained by the competitor algorithms using the SVM classifier with 1,000 iterations over D6, D7, D8, D9 and D10.

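Dataset | Method | Mean | Std | Best | Worst
D6 | PSO | 8.73E+01 | 7.82E−01 | 87.160 | 86.500
D6 | MFO | 8.80E+01 | 7.72E−01 | 91.100 | 91.120
D6 | GWO | 8.36E+01 | 7.90E−01 | 90.012 | 88.691
D6 | SSA | 8.55E+01 | 7.80E−01 | 89.120 | 88.900
D6 | SCA | 8.70E+01 | 7.70E−01 | 87.530 | 87.091
D6 | HHO | 8.85E+01 | 7.55E−01 | 90.910 | 90.769
D6 | CS | 8.80E+01 | 7.70E−01 | 84.000 | 83.599
D6 | HHO–CS | 8.90E+01 | 7.66E−01 | 91.780 | 90.890
D6 | CHHO–CS-Piece | 9.11E+01 | 7.02E−01 | 91.590 | 90.180
D7 | PSO | 8.29E+01 | 7.53E−01 | 82.120 | 81.920
D7 | MFO | 8.39E+01 | 7.69E−01 | 87.100 | 86.431
D7 | GWO | 8.30E+01 | 7.81E−01 | 84.100 | 83.771
D7 | SSA | 8.29E+01 | 7.89E−01 | 82.991 | 80.190
D7 | SCA | 8.13E+01 | 7.90E−01 | 84.012 | 83.060
D7 | HHO | 8.49E+01 | 7.13E−01 | 85.101 | 82.920
D7 | CS | 8.66E+01 | 7.30E−01 | 82.191 | 81.090
D7 | HHO–CS | 8.65E+01 | 7.17E−01 | 86.021 | 85.431
D7 | CHHO–CS-Piece | 8.79E+01 | 7.02E−01 | 87.709 | 85.310
D8 | PSO | 8.29E+01 | 7.53E−01 | 82.120 | 81.920
D8 | MFO | 8.32E+01 | 7.66E−01 | 87.070 | 86.530
D8 | GWO | 8.33E+01 | 7.82E−01 | 84.010 | 83.570
D8 | SSA | 7.83E−01 | 82.930 | 82.930 | 81.990
D8 | SCA | 8.13E+01 | 7.80E−01 | 84.011 | 83.261
D8 | HHO | 8.42E+01 | 7.19E−01 | 85.011 | 84.901
D8 | CS | 8.52E+01 | 7.29E−01 | 82.090 | 81.199
D8 | HHO–CS | 8.55E+01 | 7.14E−01 | 86.020 | 85.730
D8 | CHHO–CS-Piece | 8.77E+01 | 7.01E−01 | 87.507 | 86.610
D9 | PSO | 8.28E+01 | 7.75E−01 | 87.190 | 87.070
D9 | MFO | 8.23E+01 | 7.70E−01 | 87.020 | 86.980
D9 | GWO | 8.28E+01 | 7.79E−01 | 90.502 | 89.920
D9 | SSA | 8.40E+01 | 7.83E−01 | 91.502 | 90.091
D9 | SCA | 8.44E+01 | 7.92E−01 | 91.990 | 90.861
D9 | HHO | 8.80E+01 | 7.45E−01 | 90.041 | 89.919
D9 | CS | 8.21E+01 | 7.89E−01 | 84.090 | 83.990
D9 | HHO–CS | 8.86E+01 | 7.10E−01 | 90.821 | 89.931
D9 | CHHO–CS-Gauss | 8.82E+01 | 7.02E−01 | 93.639 | 92.470
D10 | PSO | 8.24E+01 | 7.79E−01 | 79.180 | 78.471
D10 | MFO | 8.25E+01 | 7.78E−01 | 80.120 | 79.080
D10 | GWO | 8.26E+01 | 7.79E−01 | 80.001 | 79.022
D10 | SSA | 8.43E+01 | 7.89E−01 | 80.102 | 80.090
D10 | SCA | 8.47E+01 | 7.94E−01 | 80.891 | 79.360
D10 | HHO | 8.82E+01 | 7.35E−01 | 81.090 | 80.910
D10 | CS | 8.24E+01 | 7.80E−01 | 78.091 | 76.091
D10 | HHO–CS | 8.88E+01 | 7.30E−01 | 80.991 | 80.230
D10 | CHHO–CS-Piece | 8.81E+01 | 7.09E−01 | 82.019 | 80.012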

Classification results

Since SVM is one of the most promising classification methods, its performance needs to be analyzed. In this experiment, the number of iterations is set to 1,000, and the obtained results are reported in Tables 6 and 7. Notably, the SVM-based CHHO–CS-Piece obtains the best classification accuracy, sensitivity, specificity, recall, precision, and F-measure.
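For reference, the six classification measures can be derived from the counts of a binary confusion matrix as sketched below, using their standard definitions. The function name `classification_metrics` and the example counts are assumptions; the sketch also assumes that every denominator is non-zero. Sensitivity and recall are the same quantity by definition.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return standard binary-classification measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # true positive rate, identical to recall
    specificity = tn / (tn + fp)            # true negative rate
    precision = tp / (tp + fp)
    f_measure = 2.0 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "recall": sensitivity,
        "precision": precision,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    for name, value in classification_metrics(tp=40, fp=5, tn=45, fn=10).items():
        print(f"{name}: {value:.3f}")
```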
Table 6

Classification values obtained by the competitor algorithms using the SVM classifier with 1,000 iterations over D1, D2, D3, D4 and D5.

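Dataset | Method | Accuracy | Sensitivity | Specificity | Recall | Precision | F-measure
D1 | PSO | 85.587 | 32.800 | 46.100 | 32.800 | 54.430 | 40.950
D1 | MFO | 87.985 | 33.150 | 47.450 | 33.150 | 54.990 | 41.750
D1 | GWO | 87.503 | 33.100 | 47.150 | 33.100 | 55.150 | 41.710
D1 | SSA | 86.301 | 33.150 | 47.120 | 33.150 | 54.190 | 41.540
D1 | SCA | 85.602 | 31.990 | 46.350 | 31.990 | 54.550 | 40.570
D1 | HHO | 88.709 | 33.250 | 47.700 | 33.250 | 54.490 | 41.420
D1 | CS | 84.003 | 31.510 | 45.300 | 31.510 | 54.690 | 40.760
D1 | HHO–CS | 90.102 | 33.950 | 48.930 | 33.950 | 56.570 | 41.910
D1 | CHHO–CS-Piece | 91.202 | 33.590 | 48.950 | 33.590 | 55.330 | 42.590
D2 | PSO | 84.087 | 30.851 | 47.420 | 30.851 | 54.740 | 41.940
D2 | MFO | 88.097 | 32.151 | 48.426 | 32.151 | 55.150 | 40.847
D2 | GWO | 86.103 | 31.551 | 47.906 | 31.551 | 54.945 | 41.940
D2 | SSA | 88.101 | 31.950 | 48.920 | 31.950 | 55.240 | 41.980
D2 | SCA | 87.402 | 31.350 | 48.120 | 31.350 | 54.940 | 40.540
D2 | HHO | 89.501 | 32.150 | 48.920 | 32.150 | 55.750 | 41.240
D2 | CS | 82.000 | 29.950 | 47.420 | 29.950 | 51.955 | 40.640
D2 | HHO–CS | 91.292 | 33.150 | 49.120 | 33.150 | 56.940 | 41.647
D2 | CHHO–CS-Piece | 91.502 | 33.250 | 47.250 | 33.250 | 55.950 | 41.840
D3 | PSO | 85.187 | 30.851 | 47.920 | 30.851 | 54.745 | 40.940
D3 | MFO | 87.197 | 30.961 | 48.420 | 30.961 | 55.145 | 41.347
D3 | GWO | 86.103 | 30.450 | 48.150 | 30.450 | 55.045 | 41.150
D3 | SSA | 87.301 | 30.650 | 47.450 | 30.650 | 55.145 | 41.350
D3 | SCA | 87.102 | 30.750 | 47.410 | 30.750 | 54.950 | 41.370
D3 | HHO | 90.001 | 32.450 | 49.120 | 32.450 | 56.140 | 42.940
D3 | CS | 82.000 | 30.150 | 45.120 | 30.150 | 52.145 | 39.940
D3 | HHO–CS | 90.992 | 33.551 | 49.250 | 33.551 | 54.340 | 40.947
D3 | CHHO–CS-Piece | 91.002 | 33.750 | 49.750 | 33.750 | 54.600 | 41.240
D4 | PSO | 85.187 | 30.950 | 47.936 | 30.950 | 54.640 | 40.247
D4 | MFO | 86.177 | 31.100 | 48.150 | 31.100 | 54.950 | 40.807
D4 | GWO | 87.121 | 31.250 | 48.540 | 31.250 | 55.140 | 41.240
D4 | SSA | 88.103 | 31.300 | 48.860 | 31.300 | 55.250 | 41.740
D4 | SCA | 87.122 | 31.100 | 48.156 | 31.100 | 54.145 | 40.940
D4 | HHO | 90.551 | 32.150 | 49.960 | 32.150 | 55.640 | 42.940
D4 | CS | 82.312 | 29.750 | 46.520 | 29.750 | 53.140 | 39.640
D4 | HHO–CS | 91.991 | 32.350 | 49.120 | 32.350 | 55.740 | 42.870
D4 | CHHO–CS-Piece | 92.113 | 32.890 | 49.996 | 32.890 | 55.995 | 42.970
D5 | PSO | 87.180 | 31.710 | 48.240 | 31.710 | 55.200 | 43.940
D5 | MFO | 87.377 | 30.200 | 48.220 | 30.150 | 54.250 | 41.970
D5 | GWO | 87.121 | 31.650 | 47.160 | 31.650 | 54.950 | 41.250
D5 | SSA | 87.910 | 31.700 | 48.720 | 31.700 | 55.850 | 43.280
D5 | SCA | 92.910 | 32.300 | 48.100 | 31.200 | 55.730 | 42.140
D5 | HHO | 92.510 | 32.350 | 48.710 | 32.350 | 55.350 | 43.990
D5 | CS | 84.010 | 30.100 | 47.220 | 30.100 | 53.451 | 40.150
D5 | HHO–CS | 92.990 | 33.160 | 49.740 | 33.160 | 56.255 | 44.870
D5 | CHHO–CS-Piece | 93.801 | 33.250 | 49.190 | 33.250 | 56.850 | 44.590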
Table 7

Classification values obtained by the competitor algorithms using the SVM classifier with 1,000 iterations over D6, D7, D8, D9 and D10.

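Dataset | Method | Accuracy | Sensitivity | Specificity | Recall | Precision | F-measure
D6 | PSO | 87.160 | 30.280 | 48.490 | 30.280 | 55.560 | 43.890
D6 | MFO | 91.100 | 30.390 | 48.770 | 30.390 | 55.100 | 43.893
D6 | GWO | 90.012 | 30.299 | 47.790 | 30.299 | 54.740 | 43.471
D6 | SSA | 89.120 | 30.650 | 48.550 | 30.120 | 54.999 | 43.595
D6 | SCA | 87.530 | 31.996 | 48.290 | 31.996 | 55.470 | 42.25
D6 | HHO | 90.910 | 32.895 | 48.990 | 32.895 | 55.994 | 44.397
D6 | CS | 82.312 | 29.750 | 46.520 | 29.750 | 53.140 | 39.640
D6 | HHO–CS | 91.780 | 32.766 | 49.990 | 32.766 | 56.492 | 44.992
D6 | CHHO–CS-Piece | 91.590 | 33.252 | 49.660 | 33.252 | 56.991 | 44.899
D7 | PSO | 82.120 | 31.901 | 48.742 | 31.901 | 55.732 | 43.902
D7 | MFO | 87.100 | 30.901 | 48.629 | 30.901 | 54.753 | 43.991
D7 | GWO | 84.100 | 31.989 | 47.979 | 31.989 | 54.933 | 43.962
D7 | SSA | 82.991 | 31.969 | 48.820 | 31.969 | 55.939 | 43.599
D7 | SCA | 84.012 | 31.359 | 48.990 | 31.359 | 55.960 | 42.951
D7 | HHO | 85.101 | 32.298 | 48.980 | 32.298 | 55.599 | 44.992
D7 | CS | 82.191 | 31.849 | 47.359 | 31.540 | 53.859 | 40.932
D7 | HHO–CS | 86.021 | 31.391 | 49.377 | 31.391 | 56.990 | 44.993
D7 | CHHO–CS-Piece | 87.709 | 31.102 | 49.291 | 31.102 | 55.852 | 44.711
D8 | PSO | 82.120 | 31.979 | 48.472 | 31.979 | 55.339 | 43.920
D8 | MFO | 87.070 | 30.192 | 48.732 | 30.192 | 54.852 | 43.909
D8 | GWO | 84.010 | 31.289 | 47.772 | 31.289 | 54.931 | 43.269
D8 | SSA | 82.930 | 31.990 | 48.830 | 31.990 | 55.901 | 43.893
D8 | SCA | 84.011 | 31.952 | 48.929 | 31.952 | 55.968 | 42.952
D8 | HHO | 85.011 | 32.297 | 48.987 | 32.297 | 55.799 | 44.399
D8 | CS | 82.090 | 31.537 | 47.452 | 31.537 | 53.955 | 40.956
D8 | HHO–CS | 86.020 | 31.991 | 49.971 | 31.991 | 56.599 | 44.930
D8 | CHHO–CS-Piece | 87.507 | 31.010 | 49.091 | 31.010 | 55.950 | 44.410
D9 | PSO | 87.190 | 31.909 | 48.970 | 31.909 | 55.910 | 43.919
D9 | MFO | 87.020 | 30.902 | 48.970 | 30.902 | 54.920 | 43.991
D9 | GWO | 90.502 | 31.990 | 47.979 | 31.990 | 54.933 | 43.962
D9 | SSA | 82.991 | 31.969 | 48.820 | 31.969 | 55.939 | 43.492
D9 | SCA | 84.012 | 31.359 | 48.990 | 31.359 | 55.960 | 42.951
D9 | HHO | 85.101 | 32.298 | 48.980 | 32.298 | 55.599 | 44.992
D9 | CS | 82.191 | 31.849 | 47.359 | 31.540 | 53.859 | 40.932
D9 | HHO–CS | 86.021 | 31.391 | 49.377 | 31.391 | 56.990 | 44.993
D9 | CHHO–CS-Piece | 87.709 | 31.102 | 49.291 | 31.102 | 55.852 | 44.711
D10 | PSO | 82.120 | 31.979 | 48.472 | 31.979 | 55.339 | 43.920
D10 | MFO | 87.070 | 30.192 | 48.732 | 30.192 | 54.852 | 43.909
D10 | GWO | 84.010 | 31.289 | 47.772 | 31.289 | 54.931 | 43.269
D10 | SSA | 82.930 | 31.990 | 48.830 | 31.990 | 55.901 | 43.893
D10 | SCA | 84.011 | 31.952 | 48.929 | 31.952 | 55.968 | 42.952
D10 | HHO | 85.011 | 32.297 | 48.987 | 32.297 | 55.799 | 44.399
D10 | CS | 82.090 | 31.537 | 47.452 | 31.537 | 53.955 | 40.956
D10 | HHO–CS | 86.020 | 31.991 | 49.971 | 31.991 | 56.599 | 44.930
D10 | CHHO–CS-Piece | 87.507 | 31.010 | 49.091 | 31.010 | 55.950 | 44.410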

Performance analysis using chemical datasets

Description of chemical datasets

In this study, two different datasets are used to experimentally evaluate the performance of the proposed method. (1) The MAO dataset comprises 68 molecules divided into two classes: 38 molecules that inhibit MAO (antidepressants) and 30 molecules that do not. It is available at http://iapr-tc15.greyc.fr/links.html. The molecules have a mean size of 18.4 atoms, and the mean degree of the atoms is 2.1 edges. The smallest molecule contains 11 atoms, whereas the largest contains 27 atoms; each molecule has 1,665 descriptors. (2) The QSAR biodegradation dataset comprises 1,055 chemical compounds, 41 molecular descriptors, and one class attribute; it is available at http://archive.ics.uci.edu/ml/datasets/QSAR+biodegradation. These chemical compounds were obtained from the National Institute of Technology and Evaluation of Japan (NITE). The MAO dataset is converted to the simplified molecular-input line-entry system (SMILES), a line notation that describes the molecular structure, using the Open Babel software[60]; E-dragon[61] is subsequently applied to obtain the molecular descriptors. The QSAR biodegradation dataset, available at http://www.michem.unimib.it/, was preprocessed by the Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca.

Data preprocessing

Here, the steps required to preprocess the datasets are presented. The information obtained from the molecules is converted into features that represent chemical compounds[36,39]. The data obtained from the proteins are stored in a special chemical format, and software is then used to convert this information into isomeric SMILES. The dataset contains different instances with specific multidimensional attributes (commonly two-dimensional (2D) and three-dimensional (3D)), according to the QSAR model. The E-dragon software is used to compute the descriptors from this dataset. The descriptors contain physicochemical or structural information such as solvation properties, molecular weight, aromaticity, volume, rotatable bonds, molecular walk counts, atom distribution, interatomic distances, electronegativity, and atom types. They are used to determine the values of generations and instances that belong to a class, as shown in Fig. 4.
Figure 4

Mapping from a molecule to a space of features.

Here, the SVM is used for the classification task. Following the previous methodology, in the first experiment the number of iterations is set to 100 for each of the 30 runs. The experimental results are reported in Table 8. In this experiment, the SVM-based CHHO–CS-Piece obtains the best mean and Std, and the same rank is obtained for classification accuracy, sensitivity, specificity, recall, precision, and F-measure. HHO–CS with SVM ranks second in mean, Std, and the same classification measures. In the second experiment, the number of iterations is set to 1,000 with the aim of obtaining the best solutions. The results are presented in Table 9, where CHHO–CS-Piece combined with SVM is again the first-ranked approach for mean and Std, as well as for classification accuracy, sensitivity, specificity, recall, precision, and F-measure. Meanwhile, HHO–CS with SVM ranks second for mean, Std, and classification accuracy.
Table 8

Values of the statistical measures obtained by the competitor algorithms using the SVM classifier with 100 iterations.

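Dataset | Method | Mean | Std | Best | Worst
MAO | PSO | 8.07E+01 | 7.30E−01 | 87.987 | 86.472
MAO | MFO | 8.83E+01 | 7.36E−01 | 85.285 | 84.981
MAO | GWO | 8.20E+01 | 7.40E−01 | 85.003 | 84.999
MAO | SSA | 8.40E+01 | 7.32E−01 | 87.501 | 87.430
MAO | SCA | 8.60E+01 | 7.33E−01 | 86.002 | 85.699
MAO | HHO | 9.50E−01 | 7.45E−02 | 94.247 | 93.011
MAO | CS | 8.50E−01 | 2.60E−01 | 84.232 | 83.178
MAO | HHO–CS | 9.60E−01 | 7.32E−02 | 95.320 | 94.334
MAO | CHHO–CS-Piece | 9.76E−01 | 7.15E−02 | 96.180 | 95.702
QSAR | PSO | 8.70E+01 | 7.30E−01 | 79.987 | 79.472
QSAR | MFO | 8.30E+01 | 7.10E−01 | 80.285 | 80.981
QSAR | GWO | 8.40E+01 | 7.04E−01 | 80.503 | 80.399
QSAR | SSA | 8.60E+01 | 7.35E−01 | 79.501 | 78.430
QSAR | SCA | 8.50E+01 | 7.06E−01 | 80.002 | 79.999
QSAR | HHO | 8.19E−01 | 6.69E−03 | 80.990 | 81.017
QSAR | CS | 8.17E−01 | 6.71E−04 | 78.902 | 79.011
QSAR | HHO–CS | 8.28E−01 | 6.66E−04 | 81.970 | 82.011
QSAR | CHHO–CS-Piece | 8.33E−01 | 6.68E−04 | 82.521 | 82.711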
Table 9

Values of the statistical measures obtained by the competitor algorithms using the SVM classifier with 1,000 iterations.

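Dataset | Method | Mean | Std | Best | Worst
MAO | PSO | 8.15E+01 | 7.22E+00 | 87.981 | 86.981
MAO | MFO | 8.12E+01 | 0.00E+00 | 87.176 | 86.176
MAO | GWO | 9.25E+01 | 7.20E−01 | 90.705 | 89.705
MAO | SSA | 9.12E+01 | 7.17E−01 | 92.647 | 91.235
MAO | SCA | 9.12E+01 | 7.17E−02 | 92.647 | 91.176
MAO | HHO | 9.55E−01 | 7.48E−02 | 95.259 | 94.061
MAO | CS | 8.55E−01 | 2.90E−01 | 84.300 | 83.523
MAO | HHO–CS | 9.60E−01 | 7.40E−02 | 95.530 | 95.440
MAO | CHHO–CS-Piece | 9.85E−01 | 7.23E−02 | 96.190 | 95.950
QSAR | PSO | 8.47E+01 | 7.30E−01 | 79.887 | 79.472
QSAR | MFO | 8.33E+01 | 7.16E−01 | 80.985 | 80.681
QSAR | GWO | 8.40E+01 | 7.94E−01 | 80.603 | 80.499
QSAR | SSA | 7.40E+01 | 7.05E−01 | 78.801 | 78.630
QSAR | SCA | 8.42E+01 | 7.16E−01 | 80.002 | 79.999
QSAR | HHO | 8.39E−01 | 1.41E−03 | 80.971 | 81.210
QSAR | CS | 8.28E−01 | 2.42E−02 | 79.800 | 79.901
QSAR | HHO–CS | 8.40E−01 | 1.40E−03 | 82.301 | 82.511
QSAR | CHHO–CS-Piece | 8.42E−01 | 1.39E−03 | 84.012 | 84.001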
Since SVM is one of the most promising classification methods, its performance needs to be analyzed. In the first experiment, the number of iterations is set to 100; the experimental results are reported in Table 10. In this experiment, the SVM-based CHHO–CS-Piece obtains the best results, and HHO–CS with SVM ranks second in most of the assessment criteria. A final experiment for SVM is performed with 1,000 iterations, and the values reported in Table 11 confirm that CHHO–CS-Piece combined with SVM is the first-ranked approach. Meanwhile, HHO–CS with SVM is the second-ranked algorithm in most of the assessment criteria.
Table 10

Classification values obtained by the competitor algorithms using the SVM classifier with 100 iterations.

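Dataset | Method | Accuracy | Sensitivity | Specificity | Recall | Precision | F-measure
MAO | PSO | 87.987 | 33.890 | 49.950 | 33.890 | 56.740 | 42.901
MAO | MFO | 85.285 | 33.930 | 50.150 | 33.930 | 56.950 | 43.201
MAO | GWO | 85.003 | 34.100 | 50.200 | 34.100 | 57.150 | 43.901
MAO | SSA | 87.501 | 34.250 | 50.250 | 34.250 | 57.400 | 44.101
MAO | SCA | 86.002 | 34.400 | 50.700 | 34.400 | 57.530 | 44.501
MAO | HHO | 94.247 | 49.930 | 64.160 | 49.930 | 66.536 | 55.130
MAO | CS | 84.232 | 33.650 | 49.920 | 33.650 | 56.540 | 42.851
MAO | HHO–CS | 95.320 | 50.120 | 67.816 | 50.120 | 68.392 | 59.646
MAO | CHHO–CS-Piece | 96.180 | 53.941 | 71.660 | 53.941 | 73.625 | 62.540
QSAR | PSO | 79.987 | 49.610 | 66.950 | 49.610 | 68.190 | 58.950
QSAR | MFO | 80.285 | 49.750 | 66.980 | 49.750 | 68.250 | 59.100
QSAR | GWO | 80.503 | 49.800 | 67.130 | 49.800 | 68.300 | 59.150
QSAR | SSA | 79.501 | 49.600 | 67.300 | 49.600 | 68.200 | 59.300
QSAR | SCA | 80.002 | 49.750 | 67.350 | 49.750 | 68.150 | 59.450
QSAR | HHO | 81.070 | 49.720 | 67.710 | 49.720 | 66.536 | 58.950
QSAR | CS | 79.001 | 49.510 | 66.920 | 49.510 | 68.592 | 58.851
QSAR | HHO–CS | 82.170 | 49.820 | 67.816 | 49.820 | 68.690 | 58.640
QSAR | CHHO–CS-Piece | 82.720 | 49.540 | 67.460 | 49.540 | 68.590 | 62.540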
Table 11

Classification values obtained by the competitor algorithms using the SVM classifier with 1,000 iterations.

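Dataset | Method | Accuracy | Sensitivity | Specificity | Recall | Precision | F-measure
MAO | PSO | 87.981 | 40.540 | 50.120 | 40.540 | 56.740 | 45.360
MAO | MFO | 87.176 | 40.750 | 50.520 | 40.750 | 56.950 | 45.470
MAO | GWO | 90.705 | 41.150 | 50.720 | 41.150 | 57.150 | 45.800
MAO | SSA | 92.647 | 41.350 | 50.830 | 41.350 | 57.400 | 45.900
MAO | SCA | 92.647 | 41.450 | 50.850 | 41.450 | 57.530 | 46.100
MAO | HHO | 95.259 | 51.331 | 66.043 | 51.331 | 69.024 | 58.172
MAO | CS | 84.300 | 40.342 | 50.021 | 40.342 | 60.990 | 45.062
MAO | HHO–CS | 95.530 | 53.444 | 69.830 | 53.444 | 71.930 | 62.846
MAO | CHHO–CS-Piece | 96.190 | 55.485 | 73.843 | 55.485 | 75.727 | 66.182
QSAR | PSO | 79.887 | 40.540 | 50.100 | 40.540 | 61.190 | 45.160
QSAR | MFO | 80.985 | 40.650 | 50.150 | 40.650 | 61.200 | 45.190
QSAR | GWO | 80.603 | 40.710 | 50.250 | 40.710 | 61.150 | 45.490
QSAR | SSA | 78.801 | 40.820 | 50.300 | 40.820 | 61.090 | 45.510
QSAR | SCA | 80.002 | 40.930 | 50.530 | 40.930 | 61.100 | 45.550
QSAR | HHO | 81.201 | 51.940 | 69.043 | 51.940 | 70.920 | 64.950
QSAR | CS | 79.901 | 45.940 | 55.021 | 45.940 | 69.990 | 65.162
QSAR | HHO–CS | 82.501 | 52.420 | 69.130 | 52.420 | 71.130 | 65.150
QSAR | CHHO–CS-Piece | 84.001 | 52.540 | 69.340 | 52.540 | 71.870 | 65.880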

The convergence analysis

This section analyzes the convergence of the proposed chaotic-map-based CHHO–CS. Figures 5 and 6 show the convergence curves of the competitor algorithms over the ten UCI machine learning repository datasets for 100 and 1,000 iterations, respectively. Over the ten UCI datasets, the convergence curves plotted in Figs. 5 and 6 provide evidence that the proposed CHHO–CS method using SVM obtained the best results compared with the original HHO and CS algorithms and the other competitor algorithms under both stop criteria (100 and 1,000 iterations).
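A convergence curve of this kind is usually the best fitness found so far, recorded after every iteration. The sketch below shows that bookkeeping for a maximized fitness; the function name `convergence_curve` and the example values are assumptions, and the way the curves in the figures were actually produced is not described in this section.

```python
def convergence_curve(best_per_iteration):
    """Return the running best (maximum) fitness after each iteration."""
    curve = []
    best = float("-inf")
    for value in best_per_iteration:
        best = max(best, value)   # the curve never decreases
        curve.append(best)
    return curve


if __name__ == "__main__":
    # Made-up per-iteration fitness values, for illustration only.
    print(convergence_curve([0.80, 0.78, 0.85, 0.83, 0.90]))
```

Plotting such curves for several algorithms on the same axes makes it possible to compare both the final fitness and how quickly each algorithm reaches it.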
Figure 5

Convergence curves for the best CHHO–CS-based chaotic map and the competitor algorithms using SVM on ten UCI datasets with 100 iterations.

Figure 6

Convergence curves for the best CHHO–CS-based chaotic map and the competitor algorithms using SVM on ten UCI datasets with 1,000 iterations.

On the other hand, the convergence curves plotted in Fig. 7a–d provide evidence that, over the two chemical datasets (MAO and QSAR biodegradation), the proposed CHHO–CS method with the SVM classifier obtained the best results compared with the original HHO and CS algorithms and the other competitor algorithms under both stop criteria (100 and 1,000 iterations).
Figure 7

Convergence curves for the best CHHO–CS-based chaotic map and the competitor algorithms using SVM on the MonoAmine Oxidase (MAO) and QSAR biodegradation datasets. (a,b) MAO dataset with 100 and 1,000 iterations, respectively. (c,d) QSAR biodegradation dataset with 100 and 1,000 iterations, respectively.


Discussion

According to the aforementioned results for both the UCI datasets and the two chemical datasets (MAO and QSAR biodegradation), CHHO–CS maximizes the accuracy and reduces the number of selected features. Also, for the proposed CHHO–CS method with the SVM classifier, the obtained Std values increase as the number of iterations increases. The statistical metrics (mean, Std, best, and worst), as well as the classification assessment, indicate that chaotic maps yield better results than the standard approaches. Evidence of this can be observed in the convergence curves shown in Figs. 5, 6 and 7, where the chaotic-map-based CHHO–CS method with SVM is applied to the UCI datasets and the two chemical datasets (MAO and QSAR). The convergence curve is presented because it is a graphical way to study the relationship between the number of iterations and the fitness function. It identifies the best-performing algorithm by comparing the various approaches, and it shows a direct correlation as the number of iterations increases. The convergence curves plotted in Fig. 5a–j and Fig. 6a–j reveal that the proposed CHHO–CS-Piece method achieved better results than the competitor algorithms. To sum up, experiments were also conducted on the MAO and QSAR biodegradation datasets; owing to the lack of space, only the results of the best map are reported. For the MAO dataset, with the SVM classifier and stop conditions of 100 and 1,000 iterations (Fig. 7a,b), CHHO–CS-Piece with SVM performs better than the other competitor algorithms.
Meanwhile, for the QSAR biodegradation dataset, with stop conditions of 100 and 1,000 iterations (Fig. 7c,d), CHHO–CS-Piece with SVM provides the best solutions in comparison with the other metaheuristic algorithms.

Conclusion

Metaheuristic algorithms and machine learning techniques are important tools that can solve complex tasks in the field of cheminformatics. The capabilities of MAs and ML to optimize and classify information are useful in drug design. However, these techniques should be highly accurate in order to obtain optimal compounds. In this paper, a hybrid metaheuristic method termed CHHO–CS was introduced, which combines the Harris hawks optimizer (HHO) with operators of the cuckoo search (CS) and chaotic maps (C) in order to enhance the performance of the original HHO. Moreover, the proposed CHHO–CS method was combined with the support vector machine (SVM) as the machine learning classifier for selecting chemical descriptors and classifying chemical compound activities. The main tasks of the proposed method are to select the most important features and to classify the information in cheminformatics datasets (e.g., MAO and QSAR biodegradation). The experimental results confirm that the use of chaotic maps enhances the optimization process of the hybrid proposal. It is important to mention that not all chaotic maps are equally useful, and it is necessary to decide when to use one or another. As expected, this depends on the dataset and the objective function. Comparisons of the proposed CHHO–CS method with the standard algorithms revealed that CHHO–CS yields superior results in cheminformatics under different stop criteria. In the future, the proposed CHHO–CS method can be used as a multi-objective global optimization or feature selection paradigm for high-dimensional problems containing many instances, in order to increase the classification rate and decrease the selection ratio of attributes.
Table 3

Description of the UCI machine learning repository datasets.

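No | Dataset | Instances | No. of features | Classes
D1 | Breast cancer | 669 | 9 | 2
D2 | KCL | 2,110 | 21 | 2
D3 | WineEW | 178 | 13 | 3
D4 | WDBC | 569 | 30 | 2
D5 | Lung Cancer | 226 | 23 | 2
D6 | Diabetic | 1,151 | 19 | 2
D7 | Stock | 950 | 9 | 2
D8 | Scene | 2,407 | 299 | 2
D9 | Lymphography | 148 | 18 | 4
D10 | Parkinsons | 195 | 22 | 2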

1.  Cross-Tissue Analysis Using Machine Learning to Identify Novel Biomarkers for Knee Osteoarthritis.

Authors:  Yudong Zhao; Yu Xia; Gaoyan Kuang; Jihui Cao; Fu Shen; Mingshuang Zhu
Journal:  Comput Math Methods Med       Date:  2022-06-23       Impact factor: 2.809

2.  An Optimized Hyperparameter of Convolutional Neural Network Algorithm for Bug Severity Prediction in Alzheimer's-Based IoT System.

Authors:  Iqra Yousaf; Fareeha Anwar; Salma Imtiaz; Ahmad S Almadhor; Farruh Ishmanov; Sung Won Kim
Journal:  Comput Intell Neurosci       Date:  2022-06-28

Review 3.  Harris Hawk Optimization: A Survey onVariants and Applications.

Authors:  B K Tripathy; Praveen Kumar Reddy Maddikunta; Quoc-Viet Pham; Thippa Reddy Gadekallu; Kapal Dev; Sharnil Pandya; Basem M ElHalawany
Journal:  Comput Intell Neurosci       Date:  2022-06-27

4.  Improved barnacles mating optimizer algorithm for feature selection and support vector machine optimization.

Authors:  Heming Jia; Kangjian Sun
Journal:  Pattern Anal Appl       Date:  2021-05-13       Impact factor: 2.307

5.  Optimizing quantum cloning circuit parameters based on adaptive guided differential evolution algorithm.

Authors:  Essam H Houssein; Mohamed A Mahdy; Manal G Eldin; Doaa Shebl; Waleed M Mohamed; Mahmoud Abdel-Aty
Journal:  J Adv Res       Date:  2020-10-17       Impact factor: 10.479

6.  An enhanced version of Harris Hawks Optimization by dimension learning-based hunting for Breast Cancer Detection.

Authors:  Navneet Kaur; Lakhwinder Kaur; Sikander Singh Cheema
Journal:  Sci Rep       Date:  2021-11-09       Impact factor: 4.379

7.  A velocity-guided Harris hawks optimizer for function optimization and fault diagnosis of wind turbine.

Authors:  Wen Long; Jianjun Jiao; Ximing Liang; Ming Xu; Tiebin Wu; Mingzhu Tang; Shaohong Cai
Journal:  Artif Intell Rev       Date:  2022-07-25       Impact factor: 9.588

  7 in total
