COVID-19 cases prediction by using hybrid machine learning and beetle antennae search approach.

Miodrag Zivkovic1, Nebojsa Bacanin1, K Venkatachalam2, Anand Nayyar3,4, Aleksandar Djordjevic1, Ivana Strumberger1, Fadi Al-Turjman5.   

Abstract

The main objective of this paper is to further improve current time-series prediction (forecasting) algorithms based on hybrids between machine learning and nature-inspired algorithms. After the recent COVID-19 outbreak, almost all countries were forced to impose strict measures and regulations in order to control the spread of the virus. Predicting the number of new cases is crucial when evaluating which measures should be implemented, so the improved forecasting approach was used to predict the number of COVID-19 cases. The proposed prediction model is a hybridized approach that combines machine learning, in the form of an adaptive neuro-fuzzy inference system, with an enhanced beetle antennae search swarm intelligence metaheuristic. The enhanced beetle antennae search is used to determine the parameters of the adaptive neuro-fuzzy inference system and to improve the overall performance of the prediction model. First, an enhanced beetle antennae search algorithm was implemented that overcomes deficiencies of the original version. The enhanced algorithm was tested and validated against a wider set of benchmark functions, where it substantially outperformed the original implementation. Afterwards, the proposed hybrid method for COVID-19 case prediction was evaluated using the World Health Organization's official data on the COVID-19 outbreak in China. The proposed method was compared against several existing state-of-the-art approaches that were tested on the same datasets. The proposed CESBAS-ANFIS achieved an R² score of 0.9763, which is higher than the R² value of 0.9645 achieved by FPASSA-ANFIS. To further evaluate the robustness of the proposed method, it was also validated against two different datasets of weekly confirmed influenza cases in China and the USA. 
Simulation results and the comparative analysis show that the proposed hybrid method managed to outscore other sophisticated approaches that were tested on the same datasets and proved to be a useful tool for time-series prediction.
© 2020 Elsevier Ltd. All rights reserved.

Keywords:  ANFIS; Beetle antennae search; COVID-19; Enhanced; Machine learning; Prediction; Swarm intelligence

Year:  2020        PMID: 33520607      PMCID: PMC7836389          DOI: 10.1016/j.scs.2020.102669

Source DB:  PubMed          Journal:  Sustain Cities Soc        ISSN: 2210-6707            Impact factor:   7.587


Introduction

The recently discovered coronavirus SARS-CoV-2, which causes the disease known as COVID-19 (Coronaviridae Study Group of the International et al., 2020), is a novel respiratory virus that was initially detected in humans in December 2019 in Wuhan, China (Chan et al., 2020, Yadav and Saxena, 2020). Since then, the virus has spread worldwide, affecting more than 200 countries, with the number of reported cases rising to over 13 million and the number of deaths rising to 570,000 as of mid-July 2020. Since it is a novel virus, scientists and epidemiologists still have a lot to learn about it. One fact that remains unknown is the number of people with a very mild condition or even an asymptomatic infection, and how infectious those people actually are. Estimates vary from 5 to as many as 50 people without symptoms for every reported case. The precise scale of the COVID-19 outbreak is not known at the moment, and it will take months, maybe even years, to gather all the relevant data needed for a precise estimate. This will include mass testing of the population, which every country will perform to see how many people are actually infected. However, this information is not available now, nor was it available at the moment of the outbreak. The first estimates from the World Health Organization (WHO) were that the novel coronavirus is extremely contagious and dangerous (World Health Organization et al., 2020). Within the first three months of the pandemic, the virus had spread over all continents and reached almost every country in the world. Officials in most countries declared a state of emergency and enforced regulations regarding social distancing, along with many other control measures, in order to control the spread of the virus and minimize the number of deaths (Sohrabi et al., 2020, Spinelli and Pellino, 2020). 
The main goal was to limit the number of infected individuals so that the health system would not be overwhelmed by people with serious respiratory illness who require intensive care in hospitals. As a result, airports, schools, faculties, public transport, and many businesses were shut down across the world, and people were encouraged to practice social distancing and work from home if possible. Nevertheless, some countries were affected by the virus much more severely than others, which unfortunately translated into a greater number of deaths. Officials widely utilized various epidemiological models (Morens et al., 2010, Rypdal and Sugihara, 2019, Scarpino and Petri, 2019) to estimate the outbreak, to identify the peak of the epidemic as early as possible, and to predict the number of potential deaths. Based on these prediction models, the officials decided what measures must be taken in order to control the outbreak, suggested new policies, and assessed the effectiveness of the measures that were already in place. Therefore, the accuracy of the outbreak prediction model being used is critical for obtaining relevant insight into the possible spread of the virus and the death toll of the disease. As a matter of fact, COVID-19 is not the first coronavirus disease that has threatened humanity in the past twenty years. The first outbreak was SARS in 2003, followed by the MERS outbreak in 2012. In the past two decades, there were several other disease outbreaks around the world, including Ebola, swine flu, H1N1 flu, the previously mentioned SARS and MERS, and the more recent Zika virus. These outbreaks led to the development of novel and advanced epidemiological models, which were able to predict the outbreaks with high accuracy. Unfortunately, the COVID-19 pandemic has exhibited a non-linear and very complex nature, as shown in Ivanov (2020). 
The novel coronavirus outbreak has also exhibited many differences compared to previous outbreaks, which has put in doubt the practical ability of the existing models to deliver accurate predictions. The COVID-19 outbreak still has multiple unknown variables that influence the spread of the virus: the complex and varying behavior of the population in different countries, the different approaches of governments and officials when applying measures to contain the virus, and declared states of emergency, to name a few. These unknown variables have drastically decreased the performance of current models (Scarpino & Petri, 2019). Some of the more recent models have included the influence of social distancing, quarantine, and curfew in their outbreak predictions, e.g. Zhan et al. (2019) and Rypdal and Sugihara (2019). An overview of the recent literature on predicting the spread of the virus shows a significant amount of ongoing research on this topic. Most of the recent research focuses on estimating the number of infected people, serious cases (infected individuals who must be treated in intensive care units), and fatalities. This kind of research is extremely important for the current outbreak, although the virus has already shown some signs of slowing down in some countries, especially in Europe and eastern Asia, where the outbreak control measures have already been relaxed to some extent. The virus is currently raging in North and South America, and the numbers of reported infections show that India and Russia in particular have also been heavily affected, so one can safely say that we are still far from global suppression of the pandemic. This research is also important for the future, as no one is certain whether there will be a second wave later this year or next year and, if there is, whether it would be more or less dangerous and lethal than the first wave. 
Secondly, this research can help in predicting the outbreak of some completely new disease in the following years. The majority of the recent papers deal with prediction models. Outbreak prediction with a machine learning approach was discussed in Ardabili et al. (2020). That work investigates a wide range of machine learning models and outlines two models that have shown promising results: the multi-layered perceptron (MLP) and the adaptive network-based fuzzy inference system (ANFIS). The conclusion of that research suggests that machine learning can be used effectively to model the outbreak of the disease. This approach was later exemplified on the case of Hungary (Pinter, Felde, Mosavi, Ghamisi, & Gloaguen, 2020), in order to demonstrate the potential of the machine learning approach and to set a path for future research. The research presented in Suzuki and Suzuki (2020) utilizes a machine learning approach to estimate the number of reported cases in each province of South Korea, employing a combination of XGBoost and MultiOutputRegressor as the machine learning model. An alternative machine learning approach was taken in Liu et al. (2020), which combined disease estimates from mechanistic models with digital traces in order to reliably forecast COVID-19 activity in the Chinese provinces in near real time. More precisely, the proposed method was able to produce stable and accurate forecasts 2 days ahead of the current time. This was done by combining inputs from official health reports from the Chinese Center for Disease Control and Prevention, COVID-19-related internet search activity, news and media activity, and daily forecasts of COVID-19 activity from GLEAM, an agent-based mechanistic model. In the modern world, a large number of cities have developed into smart cities, with significant application of the Internet of Things (IoT) (Silva, Khan, & Han, 2018). 
The rapid growth of the population in the cities, together with the urbanization process, required new ways to handle this process with minimal consequences for the environment and the comfort of the citizens. Smart cities evolved through intensive application of IoT, and today city operations are supported in an intelligent manner while minimizing human interaction. The concept of the smart and sustainable city is founded on a large number of interdisciplinary sciences working together to achieve an ecologically and technologically advanced environment (Bibri & Krogstie, 2017). These cities are almost exclusively found in technologically advanced and rich countries. However, even in the most developed countries, it was obvious that no one was prepared for a global pandemic of COVID-19 proportions. Lockdowns and curfews were necessary in the smart cities. Additionally, schools, restaurants and small businesses were closed for months. As a result, various studies focus on social distancing rules and on the ventilation required in buildings, in order to bring the lifestyle of the citizens as close to normal as possible while preventing virus transmission (Sun & Zhai, 2020). The lessons learned during the COVID-19 pandemic will have a huge impact on architecture and urbanism, which will never be the same after the pandemic due to the fear of infection (Megahed & Ghoneim, 2020). Additionally, smart cities will include new technology for handling threats like COVID-19 in the future. The research presented in this paper aims to show a sophisticated mechanism for predicting the number of new cases of infection on a city level. The simulations conducted in this paper were based on the available dataset from China, and other datasets were used for the purpose of the comparative analysis. 
The proposed method can be easily implemented in smart cities to address any future pandemic situation and help predict the number of confirmed cases, which will help in deciding which measures need to be taken, and when, to save lives. The basic research question, as well as the motivation behind the research proposed in this paper, can be defined as follows: Is it possible to further improve current time-series prediction (forecasting) algorithms based on hybrids between machine learning and nature-inspired algorithms? To accomplish this goal, an enhanced version of the recently developed beetle antennae search (BAS) algorithm has been adopted for updating the parameters of the adaptive neuro-fuzzy inference system (ANFIS) machine learning method. The antecedent and conclusion ANFIS parameters have been taken into consideration, while the type of membership function was not subject to the optimization process. Moreover, during practical simulations with the original BAS algorithm, which belongs to the family of swarm intelligence, some deficiencies can be observed. Therefore, for the purpose of this research, the basic BAS algorithm was first improved, and then both algorithms were tested on a standard set of unconstrained benchmarks to validate the enhancements. Next, a framework based on ANFIS trained by the improved BAS algorithm has been employed to create a prediction model for anticipating the virus outbreak. The main goal of this research is to enhance the prediction accuracy for new cases of COVID-19. The secondary objective is to improve the original BAS algorithm by addressing its deficiencies. Because COVID-19 is the most important and urgent global challenge that humanity currently faces, the proposed method has been tested on the COVID-19 dataset from China. 
Moreover, to show that the proposed method can be successfully applied to predicting time series of other diseases, additional experiments have been performed with datasets of weekly confirmed cases of influenza. One additional experiment has been performed to examine whether climate and environment variables, such as population density, can have an effect on the infection rate. The rest of the paper is organized as follows: Section 2 gives an overview of ANFIS and of the application of swarm intelligence metaheuristics to various NP-hard problems, and Section 3 provides an overview of the ANFIS method. Section 4 gives details of the original and improved BAS algorithms, as well as of the proposed ANFIS framework implementation. Section 5 presents the simulation results and discussion, while Section 6 provides a conclusion and final remarks, along with recommendations for future work.

Background and related work

This section presents a literature overview of the ANFIS model and of swarm intelligence applications in solving various real-life problems. It also discusses how swarm intelligence can be applied to the optimization of the ANFIS model parameters. ANFIS belongs to the group of artificial intelligence techniques, and it merges artificial neural networks with fuzzy inference systems. Its structure allows it to be used for modeling a large number of systems from various application domains. Neuro-fuzzy systems have been applied to many real-world problems. ANFIS was introduced by Jang in 1993 (Jang, 1993), and it is considered to be one of the most popular neuro-fuzzy systems. It combines the characteristics and advantages of both artificial neural networks and fuzzy inference systems, therefore providing a firm basis for problem identification and modeling. As can be seen from the available literature, ANFIS has been widely used in time-series forecasting. Some of the fields where ANFIS has been successfully applied include traffic control, medical systems, economic data, image processing, feature extraction, and forecasting (Karaboga and Kaya, 2019a). A review of the most recent ANFIS publications shows that many successful ANFIS implementations exist. For example, in Naderloo et al. (2012), ANFIS was applied to predict crop yield based on different energy inputs. ANFIS was also used to estimate the relative viscosity of nanofluids (Baghban, Jalali, Shafiee, Ahmadi, & Chau, 2019). In Harandizadeh, Toufigh, and Toufigh (2019), an ANFIS approach was used to estimate the bearing capacity of piles, with promising results. Another recent publication employs a hybrid ANFIS model for forest fire probability prediction (Jaafari, Zenner, Panahi, & Shahabi, 2019). ANFIS has also been used numerous times for disease diagnosis and spread forecasting. 
The application of neuro-fuzzy systems to forecasting measles cases in Ethiopia was discussed in Uyar, Ilhan, Iseri, and Ilhan (2019). Other applications of ANFIS to disease forecasting include the hepatitis C virus epidemic (Khodaei-mehr, Tangestanizadeh, Vatankhah, & Sharifi, 2018), tuberculosis (Mohammed et al., 2018, Uçar et al., 2013), and finally, COVID-19. The COVID-19 related applications include forecasting the confirmed cases of COVID-19 in China (Al-qaness, Ewees, Fan, & Aziz, 2020), outbreak prediction (Ardabili et al., 2020), and a prediction case study for Hungary (Pinter et al., 2020). Since COVID-19 is a relatively new challenge, only a few studies from this domain have been found. One of the greatest challenges with machine learning algorithms is establishing optimal or near-optimal values of their parameters for tackling a specific problem. Unfortunately, there is no universal rule, and a different set of parameter values should be determined for each specific problem. Establishing optimal or near-optimal values of these parameters is an NP-hard task, and metaheuristic approaches can be applied to solve it. No polynomial-time algorithm is known for NP-hard problems, so traditional (deterministic) exact methods become impractical as the problem size grows. NP-hard problems have considerable practical importance and belong to the domain of computational complexity theory, which plays a central role in modern computer science. Practical NP-hard problems can be found in machine learning, cloud computing, wireless sensor networks, software and hardware design, and operations research, to name a few. To solve this kind of problem in a reasonable amount of time, a stochastic approach can be applied. 
Metaheuristics are a form of stochastic approach, and their goal is to find an approximate solution that is good enough (not necessarily the best solution) within a reasonable time (Strumberger, Bacanin et al., 2017, Tuba, Bacanin et al., 2015, Tuba, Strumberger, Bacanin, and Tuba, 2018). Recently, metaheuristic algorithms have been utilized in solving a large number of NP-hard problems (Strumberger, Minovic, Tuba, & Bacanin, 2019). One of the most prominent families of metaheuristics is bio-inspired (nature-inspired) algorithms. In general, bio-inspired metaheuristics can be divided into two large, distinct groups. The first group is known as evolutionary algorithms (EA), while swarm intelligence algorithms represent the second group. EA mimic the process of natural selection, which can be described as the survival of the fittest. Natural selection starts by selecting the fittest individuals from a given population for breeding. These individuals breed and produce offspring that are added to the next generation and inherit the beneficial characteristics of their parents. The offspring are assumed to be better than their parents, resulting in a better chance of survival. As the process iterates, it eventually results in a generation of the fittest individuals. This logic can be directly applied to a search problem: a set of candidate solutions for a given problem is evaluated, and the best solutions are selected. The most important example of evolutionary algorithms is the genetic algorithm (GA) (Goldberg, 1989). 
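The selection, breeding, and inheritance loop described above can be illustrated with a minimal sketch. This is not an implementation taken from any of the cited works: the bit-string encoding, truncation selection, one-point crossover, elitism, and all parameter values are assumptions chosen only to keep the example short.

```python
import random


def genetic_algorithm(fitness, n_bits=20, pop_size=30, generations=40,
                      mutation_rate=0.02, seed=0):
    """Toy GA that maximizes `fitness` over bit strings (all settings hypothetical)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        # Selection: the fitter half of the population becomes the parent pool.
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]
        best_history.append(fitness(parents[0]))
        # Elitism: the best individual is copied unchanged into the next generation.
        children = [parents[0][:]]
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)      # one-point crossover position
            child = a[:cut] + b[cut:]           # offspring inherits from both parents
            # Mutation: flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    best_history.append(fitness(best))
    return best, best_history


# Example: maximize the number of ones in the bit string (the "OneMax" toy problem).
best, history = genetic_algorithm(sum)
```

Because the best individual is always carried over, the best fitness recorded in `history` never decreases from one generation to the next; without elitism that guarantee would not hold.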
The GA has been used to solve numerous NP-hard real-life problems, including scheduling and load balancing in cloud computing (Wang et al., 2014, Zhan et al., 2014), designing convolutional neural networks (Baldominos et al., 2018, Suganuma et al., 2017), feature selection for machine learning (Kim et al., 2000, Xue et al., 2015), image processing (Bochinski et al., 2017, Nickolay et al., 1997), and so on. The second group of bio-inspired algorithms, swarm intelligence, was inspired by the social behavior expressed by groups of otherwise simple and primitive individuals: bees, ants, moths, fireflies, dragonflies, bats, fish, etc. These individuals in swarms exhibit coordinated and highly intelligent actions without any dedicated central unit that organizes and coordinates the other individuals. This characteristic of swarms was used as an inspiration for swarm intelligence algorithms (Yang, 2014). One of the first algorithms introduced in the domain of swarm intelligence is particle swarm optimization (PSO) (Kennedy & Eberhart, 1995). PSO performs the search by simulating the behavior of flocks of birds and schools of fish. This algorithm has been successfully applied to numerous practical problems, including scheduling problems in cloud computing (Kumar & Sharma, 2018). Another important representative of swarm algorithms, with numerous applications to different NP-hard problems, is the artificial bee colony (ABC). The ABC metaheuristic has been tested against benchmark problems (Bacanin & Tuba, 2012) and has also been applied to practical NP-hard problems, as can be seen from Kulkarni et al., 2016, Tuba and Bacanin, 2014, and Cheng, Qu, and Xu (2017). Another well-known swarm algorithm is the bat algorithm (BA) (Yang, 2010), with numerous applications in a wide range of domains, e.g. solving the workflow scheduling problem in cloud computing (Sagnika, Bilgaiyan, & Mishra, 2018). 
Another popular swarm metaheuristic is cuckoo search (CS) (Gandomi, Yang, & Alavi, 2013), which has also been successfully applied to numerous problems, such as cloud computing (Agarwal & Srivastava, 2018) and neural network training (Tuba, Alihodzic, & Bacanin, 2015). Ant colony optimization (ACO) was one of the first swarm algorithms, and it has proven to be one of the most efficient approaches, as stated in Jovanovic and Tuba, 2011, Jovanovic and Tuba, 2013. The firefly algorithm (FA), inspired by the behavior of fireflies and their lighting properties, has been extensively used in solving several NP-hard problems, in modified (Bacanin and Tuba, 2014, Strumberger, Bacanin et al., 2017) and hybridized versions (Tuba & Nebojsa, 2014). There are also numerous other novel swarm intelligence algorithms, with monarch butterfly optimization (MBO) and the moth search (MS) algorithm as important representatives. MBO was initially proposed by Wang and Deb in 2015 (Wang, Deb, & Cui, 2015), and has been applied to numerous practical NP-hard problems with promising results, including the wireless sensor network localization problem (Strumberger, Tuba, Bacanin, Beko, & Tuba, 2018), cloud computing optimization problems (Strumberger, Tuba, Bacanin, & Tuba, 2019), and many others (Strumberger, Tuba et al., 2018, Strumberger et al., 2020). The MS algorithm, on the other hand, was proposed in 2016 by Wang (2016). It was inspired by the behavior of moths, more precisely their phototaxis and Lévy flights. MS has proven to be one of the best algorithms for global optimization benchmark problems, and has also shown promising results on some real-life NP-hard problems, such as the drone placement problem (Strumberger, Sarac, Markovic, & Bacanin, 2018) and the localization problem in wireless sensor networks (Strumberger, Tuba, Bacanin, Beko, & Tuba, 2018). 
Besides the algorithms already mentioned, many other swarm algorithms exist, such as elephant herding optimization (EHO) (Correia et al., 2018, Strumberger, Bacanin, Beko et al., 2017, Strumberger, Beko et al., 2018, Strumberger, Minovic et al., 2019), the tree growth algorithm (TGA) (Strumberger et al., 2019, Strumberger et al., 2019), brain storm optimization (BSO) (Tuba, Strumberger, Bacanin, Zivkovic, & Tuba, 2018), and many others (Strumberger, Bacanin, Tuba, & Tuba, 2019). As already mentioned, ANFIS has shown very promising results for prediction model development in a wide spectrum of domains. In order to achieve good predictions, the training process of ANFIS is crucial, and the quality and precision of the model can be improved considerably by optimizing the model parameters. There are numerous optimization methodologies available; however, one of the most promising approaches is the application of swarm intelligence metaheuristics to tune the parameters and outputs of ANFIS. In Karaboga and Kaya (2019b), the ABC algorithm was applied to ANFIS optimization in order to estimate the number of foreign visitors coming to Turkey. The same authors proposed training ANFIS with an adaptive and hybrid ABC algorithm, as shown in Karaboga and Kaya (2019c). Another recent paper (Mir, Kamyab, Lariche, Bemani, & Baghban, 2018) discusses ANFIS paired with PSO with the goal of estimating gas density based on pressure, temperature, molecular weight, and other important gas parameters. The proposed ANFIS-PSO model was more accurate than other gas prediction models. In Al-qaness et al. (2020), the flower pollination algorithm (FPA), enhanced by the salp swarm algorithm (SSA), was proposed to improve ANFIS. SSA is applied to address the flaws of FPA, such as getting trapped in local optima. Hybridized ANFIS approaches have recently been utilized in numerous applications in the domain of sustainability. 
Work presented in Seifi, Ehteram, Singh, and Mosavi (2020) deals with six metaheuristic approaches used to hybridize the artificial neural network (ANN) and ANFIS with the goal of predicting the monthly groundwater level. The authors concluded that the approach in which ANFIS was hybridized with the grasshopper optimization algorithm (GOA), called ANFIS-GOA, showed superior performance and drastically enhanced the accuracy of ANFIS. The ANFIS-PSO approach was used in Adedeji, Akinlabi, Madushele, and Olatunji (2020) to predict the potential power output of wind turbines. The proposed ANFIS-PSO was compared to standalone ANFIS and provided better forecast accuracy, at the cost of a higher computational time. The research presented in Xu, Huang, Li et al. (2020) proposed ANFIS hybridized with vibration particle swarm optimization (VPSO). ANFIS-VPSO was then utilized to optimize the reasoning system in the milling process and reduce energy consumption while improving the efficiency of the tools. In Yaseen et al. (2019), the authors evaluated three different algorithms, namely PSO, GA, and differential evolution (DE), and integrated them with ANFIS with the goal of predicting rainfall time series. The presented results showed that all three hybridized approaches, ANFIS-PSO, ANFIS-GA, and ANFIS-DE, performed better than conventional ANFIS. Hybrid ANFIS-PSO and ANFIS-DE were analyzed and compared in Dormishi, Ataei, Khaloo Kakaie, Mikaeil, and Shaffiee Haghshenas (2019) with the goal of predicting and optimizing the performance of a gang saw in the process of cutting carbonate rocks. The obtained results showed that ANFIS-PSO performed better than ANFIS-DE and conventional ANFIS. Hybridized ANFIS approaches were also analyzed in Bemani, Baghban, and Mosavi (2020), where the authors compared and evaluated ANFIS coupled with five different optimization algorithms for predicting the diffusivity coefficient of carbon dioxide. 
ANFIS was hybridized with the PSO, GA, ACO, DE, and backpropagation (BP) algorithms. The obtained results showed that the hybrid ANFIS-PSO outperformed all other approaches. Finally, the ANFIS-VOA (virus optimization algorithm) approach was utilized in Behnood, Golafshani, and Hosseini (2020) for predicting the COVID-19 infection rate from various climate-related variables. Different hybridized ANFIS implementations for various research problems are shown in Table 1.
Table 1

Hybridized ANFIS implementations.

Method                          | Problem description                                 | Reference
ANFIS-ABC                       | Estimation of tourists coming to Turkey             | Karaboga and Kaya (2019b)
ANFIS-PSO                       | Estimation of gas density based on gas parameters   | Mir et al. (2018)
ANFIS-FPASSA                    | Forecasting COVID-19 cases in China                 | Al-qaness et al. (2020)
ANFIS-GOA, CSO, WA, GA, KA, PSO | Forecasting the monthly groundwater level           | Seifi et al. (2020)
ANFIS-PSO                       | Forecasting the output of wind turbines             | Adedeji et al. (2020)
ANFIS-VPSO                      | Optimization of the tools for the milling process   | Xu, Huang, Li et al. (2020)
ANFIS-PSO, GA, DE               | Forecasting the rainfall time series                | Yaseen et al. (2019)
ANFIS-PSO, DE                   | Optimizing the gang saw rock cutting                | Dormishi et al. (2019)
ANFIS-PSO, GA, ACO, DE, BP      | Predicting the diffusivity of carbon dioxide        | Bemani et al. (2020)
ANFIS-VOA                       | Predicting the COVID-19 infection rate              | Behnood et al. (2020)

Overview of adaptive neuro-fuzzy inference system

Neuro-fuzzy systems are widely used today to model various real-life problems. They have gained popularity in the scientific community because they efficiently combine the advantages of fuzzy logic and artificial neural networks. The artificial neural network component provides the learning ability, while the fuzzy logic component provides interpretability through its rule-based representation. By using these two approaches together, it is possible to offset the drawbacks of the individual components, and neuro-fuzzy systems have proven to have superior features. ANFIS, which was originally developed by Jang in 1993 (Jang, 1993), belongs to the group of neuro-fuzzy systems. It is based on the Takagi–Sugeno inference model (Angelov and Filev, 2004, Johansen et al., 2000), which generates mappings between the inputs and the outputs by obtaining and applying IF-THEN rules. To achieve this goal, the ANFIS model has to be trained. The error is the difference between the output produced during training and the actual output of the observed system. Based on this error, the parameters of the ANFIS model are repeatedly updated to reach the optimum structure of the model. Fig. 1 shows one example of an ANFIS structure, which has two inputs and one output. The neural network architecture utilized in ANFIS consists of a fixed set of five layers: fuzzification (layer one), fuzzy inference system (layers two and three), defuzzification (layer four), and aggregation (layer five).
Fig. 1

ANFIS model and parameters used for training.

On layer one, every node is adaptive with one parametric activation function. Membership functions use values of the inputs to obtain fuzzy clusters. Different membership functions can be utilized to calculate the membership values, where some of the most commonly used functions include generalized bell function, trapezium, triangle, gaussian, and sigmoid. These calculated membership values are within the range of . Parameters are used to set the form of the utilized membership function, and they are used in ANFIS training. These parameters are often referred to as antecedent parameters. The output is the membership degree of input values which satisfy the membership functions. For example, generalized bell membership function is given with Eqs. (1), (2). ANFIS model and parameters used for training. On layer two, each node is a fixed node, and output is the product of the input signal. Typically, it applies the fuzzy operation AND. Firing strengths are calculated by utilizing the membership values which are the output of level one. Values are obtained as a product of the membership values, represented in Eq. (3): On layer three, each node is fixed, and it computes the normalized firing strengths for each rule by utilizing the firing strengths which are the output from level two. Normalized firing strength for the rule is computed as a ratio of firing strength of the rule relative to the sum of all firing strengths, represented in Eq. (4): On layer four, which is known as a defuzzification layer, each node is adaptive. Here, the output for each rule is calculated by multiplying the normalized firing strength from the previous layer by a first-order polynomial. The set of polynomial’s parameters are known as the conclusion parameters, which are used in the training of the ANFIS model. The output for every rule is computed by using Eq. (5): On level five, every node is fixed and adds all incoming values. 
Therefore, the final output of ANFIS is the sum of the outputs of each rule from layer four, and it can be computed using Eq. (6). The training process of ANFIS in practice refers to the optimization of the parameters used in the model. ANFIS parameters include the number of inputs to the system, the type and number of membership functions utilized in the model, and the total number of rules used in the model. Together with the antecedent and conclusion parameters, these represent the set of parameters that can be optimized. In this paper, an enhanced BAS algorithm has been employed to perform the optimization; however, only the antecedent and conclusion parameters have been taken into account, while a generalized bell membership function has been chosen in advance. Therefore, the type of membership function was not subject to the optimization process.
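
The five layers described above can be illustrated with a minimal sketch of one forward pass. It assumes the standard generalized bell form $\mu(x) = 1 / (1 + |(x - c)/a|^{2b})$ and a first-order Sugeno consequent; the function names and the data layout (one membership function per input per rule) are illustrative assumptions, not the authors' implementation.

```python
import math

def gbell(x, a, b, c):
    """Generalized bell membership function (standard form, assumed here)."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))

def anfis_forward(inputs, antecedent, consequent):
    """One forward pass through the five ANFIS layers.

    inputs     : list of m crisp input values
    antecedent : antecedent[r][j] = (a, b, c) for rule r and input j
    consequent : consequent[r] = [p_1, ..., p_m, bias] for rule r
    """
    # Layers 1 and 2: membership degrees, then firing strength as their product
    firing = [math.prod(gbell(x, *abc) for x, abc in zip(inputs, rule))
              for rule in antecedent]
    # Layer 3: normalized firing strengths
    total = sum(firing)
    normalized = [w / total for w in firing]
    # Layer 4: normalized strength times a first-order polynomial of the inputs
    rule_outputs = [
        w * (sum(p * x for p, x in zip(coef[:-1], inputs)) + coef[-1])
        for w, coef in zip(normalized, consequent)
    ]
    # Layer 5: sum of all rule outputs
    return sum(rule_outputs)
```

In this sketch the `(a, b, c)` triples play the role of the antecedent parameters and the polynomial coefficients play the role of the conclusion parameters, which are the two groups that a metaheuristic would tune.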

Proposed hybrid machine learning method

In this section, the method that was proposed in this research will be described. First, an overview of the original BAS will be outlined. Then, insights into basic BAS’s deficiencies will be discussed and the improved BAS algorithm that overcomes those deficiencies will be explained. Lastly, a developed ANFIS-based framework that utilizes enhanced BAS metaheuristics for the training will be shown.

Original BAS algorithm

The BAS algorithm is a novel bionic algorithm introduced in 2017 by Jiang and Li (2017). The algorithm was inspired by the behavior of longhorn beetles, more precisely by the process of detecting and searching for food. Longhorn beetles have two antennae which they use to detect the food smell concentration. If a higher smell concentration is detected by the left antenna, the longhorn beetle will fly to the left. Similarly, if the smell concentration detected by the right antenna is higher, it will fly to the right. By doing so, the beetle is able to find the food successfully in an unknown environment. The BAS algorithm mimics this process and can achieve efficient optimization without prior knowledge about the particular form of the function and its gradient. It also requires only one individual, which greatly lowers the computational complexity of the algorithm. This algorithm can be utilized to enhance the calculation efficiency of the back propagation (BP) algorithm in neural networks and help it find the global optimal solution with a higher probability, by determining the hyperparameters in an intelligent manner. This relatively novel metaheuristic has already shown some promising results on real-life optimization problems. It was applied in Xu, Huang, and Ma (2020) to improve the BP neural network model for predicting gas explosion pressures. It was also used to solve other optimization problems, such as path planning for mobile robots with collision-free capability (Wu, Lin, Jin, Chen, Li, & Chen, 2020), conditioning optimization of extreme learning machine (Zhang et al., 2018), and intelligent fault diagnosis of wind turbine rolling bearings (Wang, Yao, Cai, & Zhang, 2020). The BAS algorithm considers the position of the beetle as a vector $x^t$ at time instant $t$ and defines the concentration of odor at position $x$ by the fitness function $f(x)$. The maximum value of the fitness function marks the source of the odor.
Next, the BAS algorithm utilizes two rules inspired by the way the beetle uses its antennae to search and explore an unknown environment in a random fashion. First, the searching behavior of the beetle in a random direction can be modeled by Eq. (7), where $rnd(\cdot)$ stands for the random function, while $k$ represents the dimension of the position. Afterwards, the searching behaviors of the right and left antenna can be modeled by Eqs. (8), (9), respectively, where $x_r$ and $x_l$ mark the positions located on the right and left side of the searching area. The sensing range of the antenna is marked with $d$, and it corresponds to the exploitation ability; it must be large enough at the beginning to cover an adequate searching area, so that the search is capable of jumping out of local minimum points, and it then attenuates as time elapses. The detecting behavior is formulated by the iterative model, which associates the detection of odor with the searching behavior, as described by Eq. (10), where $\delta$ represents the step size of each iteration and $sign(\cdot)$ is the sign function. The searching parameters, the antenna length $d$ and the step size $\delta$, are updated according to the rules given by Eqs. (11), (12). The pseudo-code of the original BAS is presented in Algorithm 1.
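
The search rules described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the commonly published form of the method (a normalized random direction, two probe points at distance $d$ on either side of the current position, a signed step of size $\delta$, and the updates $d \leftarrow 0.95\,d + 0.01$, $\delta \leftarrow 0.95\,\delta$). The sign of the step is flipped relative to the odor-maximization description because the sketch minimizes. The function name, the uniform direction sampling and the initial values of `d` and `delta` are assumptions.

```python
import math
import random

def bas_minimize(f, x0, d=2.0, delta=1.0, iters=200, seed=0):
    """Single-beetle BAS sketch for minimization (assumed standard form)."""
    rng = random.Random(seed)
    x = list(x0)
    best_x, best_f = list(x), f(x)
    for _ in range(iters):
        # Random unit direction (uniform sampling is an assumption)
        b = [rng.uniform(-1.0, 1.0) for _ in x]
        norm = math.sqrt(sum(v * v for v in b)) or 1.0
        b = [v / norm for v in b]
        # Probe points of the right and left antenna
        f_right = f([xi + d * bi for xi, bi in zip(x, b)])
        f_left = f([xi - d * bi for xi, bi in zip(x, b)])
        sign = (f_right > f_left) - (f_right < f_left)
        # Move away from the side with the larger objective value
        x = [xi - delta * bi * sign for xi, bi in zip(x, b)]
        fx = f(x)
        if fx < best_f:
            best_x, best_f = list(x), fx
        # Attenuate antenna length and step size
        d = 0.95 * d + 0.01
        delta = 0.95 * delta
    return best_x, best_f
```

Note that the position is always updated, even when the move is worse, so the best position has to be tracked separately; each iteration costs three function evaluations (right probe, left probe, new position).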

Improved BAS algorithm

As in Wu et al. (2020) and Xie, Chu, Zheng, and Liu (2019), empirical simulations lead to the conclusion that some components of the original BAS metaheuristics could be enhanced. The main drawback of the basic BAS is premature convergence, with the implication that the search process may be trapped in local optima. In some runs, the diversity of the population in early iterations is not at a satisfactory level and the whole population, due to the stochastic nature of the algorithm, may converge to sub-optimal solutions. This scenario arises when the initial pseudo-random individuals are deployed in parts of the search space that are far from the optimum regions. In each iteration, the original BAS algorithm performs a search around the current solution (Eqs. (8)–(10)). The exploration and exploitation processes, as well as the balance between them, are controlled by the antenna length ($d$) and the step size ($\delta$). These parameters are updated in each iteration by using Eqs. (11), (12) for $d$ and $\delta$, respectively. However, by adjusting the values of these parameters it is very hard to establish an appropriate balance between exploration and exploitation, which in most cases is tilted in favor of exploitation because the search process is guided by the position of the current solution $x^t$. Simulations with 500 runs have been performed for standard unconstrained benchmark instances, and it was evident that, on average, in about 25% of the runs the original BAS could not converge properly, which led to unacceptable mean values. Therefore, it can be concluded that the original BAS could be improved by establishing stronger exploration in early iterations. At this stage of the algorithm's execution, it is important to have stronger diversification, so that the algorithm can find the part of the search region where the optimum solution resides.
To efficiently address the observed deficiencies of the basic BAS, two mechanisms have been incorporated: (i) inspired by the approach presented in C., G., H., and T. (2019), the Cauchy mutation operator has been added to the original BAS to improve solution diversity, and (ii) to control whether the Cauchy perturbation (mutation) operator will be executed or not, a mechanism similar to the one used in the ABC algorithm has been implemented (Karaboga & Akay, 2011). First, the goal is to improve the exploration ability and solution diversity of the original BAS in the early phases of execution by incorporating the Cauchy mutation operator. The Cauchy distribution is utilized to conduct Cauchy variation on solutions that do not improve in cis consecutive iterations (cis is an additional control parameter of the improved BAS which will be explained later). The basic idea behind this approach is that some solutions may be trapped in a local extreme, hence an external intervention is required (Cao, Iosifidis, Chen, & Gabbouj, 2018), so that the search can be redirected towards exploration (global search) (El-Ela, El-Sehiemy, & Abbas, 2018). For a one-dimensional random variable with Cauchy distribution, the density function is defined as (C. et al., 2019):

$$f(x) = \frac{1}{\pi\,(1 + x^2)}, \quad -\infty < x < \infty \qquad (13)$$

where $f(x)$ represents the standard Cauchy distribution. In many implementations of evolutionary algorithms, Cauchy and Gaussian mutation operators are used. However, since the peak of the Cauchy distribution at the origin is lower than that of the Gaussian distribution and its tails approach the axis more slowly at both ends, the Cauchy distribution is more likely to generate random numbers far from the origin, which can substantially help the algorithm avoid falling into sub-optimal domains (C. et al., 2019).
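
The difference between the two distributions can be checked numerically. The sketch below compares the standard Cauchy and standard Gaussian densities at the origin and their two-sided tail probabilities; the function names are illustrative, and only closed-form expressions from the Python standard library are used.

```python
import math

def cauchy_pdf(x):
    """Density of the standard Cauchy distribution."""
    return 1.0 / (math.pi * (1.0 + x * x))

def gauss_pdf(x):
    """Density of the standard Gaussian distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cauchy_tail(k):
    """P(|X| > k) for a standard Cauchy variable."""
    return 1.0 - 2.0 * math.atan(k) / math.pi

def gauss_tail(k):
    """P(|X| > k) for a standard Gaussian variable."""
    return math.erfc(k / math.sqrt(2.0))

# Lower peak at the origin: about 0.318 (Cauchy) versus about 0.399 (Gaussian)
peak_ratio = cauchy_pdf(0.0) / gauss_pdf(0.0)
# Heavier tails: P(|X| > 3) is about 0.205 (Cauchy) versus about 0.0027 (Gaussian)
tail_ratio = cauchy_tail(3.0) / gauss_tail(3.0)
```

A mutation drawn from the Cauchy distribution therefore produces large jumps far more often than a Gaussian one, which is the property exploited for escaping local optima.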
In the proposed improved BAS implementation, for each solution a step vector whose length equals the number of the solution's parameters is generated according to Eq. (14). The expression uses the maximum and minimum value of each parameter, the Cauchy mutation probability (CMP), and the Cauchy variation expression. The new solution for the next iteration is then generated by using Eq. (15). The global exploration ability of the Cauchy operator is needed in the early iterations of the algorithm's run. However, in the later phases, when the search has converged to the optimal domain, this mechanism is no longer useful. To control this behavior, a new parameter has been added: Cauchy mutation invocation (cmi). While the current iteration has not yet reached cmi, the Cauchy mutation may be triggered; otherwise, the new solution is created as in the original BAS algorithm, according to Eq. (10). Moreover, in the early phases of the algorithm's execution, the Cauchy mutation operator (Eq. (15)) is applied only to solutions that have not improved in a number of consecutive iterations. The consecutive iterations stagnation (cis) is another control parameter of the improved BAS algorithm. To incorporate this behavior, each solution in the population carries an additional attribute, the not-improved counter. In each iteration, if the solution is not improved by the standard BAS equation, this counter is incremented by one. Finally, when both conditions are met (the iteration is still within the cmi limit and the not-improved counter has reached cis), the Cauchy operator is triggered. As can be seen from Eq. (14), the Cauchy variation expression is applied only to those parameters for which the CMP condition is satisfied. This allows for greater control over the global search. If the operator were applied to all parameters, the exploration would be too strong, and the balance would be tilted in favor of a global search that generates solutions of lower quality.
Moreover, in order to establish a better adjustable balance between exploitation and exploration, an attenuation coefficient ($\alpha$) has been employed for the step size $\delta$ and the antenna length (sensing diameter) $d$, along with the minimum antenna length (minimum sensing diameter length) $d_0$. A similar approach was used in Xie et al. (2019) and Wu et al. (2020). The expressions of the original BAS for calculating these two parameters (Eqs. (11), (12)) are replaced with Eqs. (16), (17). Motivated by the nature of the modifications, the proposed approach is named Cauchy exploration strategy BAS (CESBAS). By introducing the Cauchy mutation and three additional control parameters (CMP, cmi and cis), the proposed CESBAS outperforms the basic BAS, as shown in Section 5. It must be noted that the values of the parameters used in this manuscript were determined empirically, by conducting simulations with a trial and error approach. The pseudo-code of the proposed CESBAS metaheuristics is given in Algorithm 2. The pseudo-code refers to the total number of solutions in the population, the fitness function, the $i$th solution in the $t$th iteration, and the best solution together with its fitness.
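
The control flow described in this subsection can be sketched for a single solution as follows. Several details are assumptions made only for illustration and are marked as such in the comments: the exact comparison against cmi and cis, the scaling of the Cauchy step by the parameter range, the clipping to the bounds, and the attenuation updates $d \leftarrow \alpha d + d_0$ and $\delta \leftarrow \alpha \delta$. The names `cesbas_minimize`, `cmp_prob` and `stalled` are hypothetical.

```python
import math
import random

def _clip(v, lo, hi):
    return min(max(v, lo), hi)

def cesbas_minimize(f, x0, lower, upper, d=2.0, delta=1.0, alpha=0.95,
                    d0=0.01, cmp_prob=0.5, cmi=80, cis=5, iters=200, seed=0):
    """Single-solution sketch of BAS with a Cauchy exploration step."""
    rng = random.Random(seed)
    x = [_clip(v, lo, hi) for v, lo, hi in zip(x0, lower, upper)]
    fx = f(x)
    best_x, best_f = list(x), fx
    stalled = 0  # not-improved counter (hypothetical name)
    for t in range(1, iters + 1):
        if t < cmi and stalled >= cis:  # ASSUMED form of the two conditions
            # Cauchy mutation, applied per parameter with probability cmp_prob
            cand = []
            for v, lo, hi in zip(x, lower, upper):
                if rng.random() < cmp_prob:
                    cauchy = math.tan(math.pi * (rng.random() - 0.5))
                    v += 0.1 * (hi - lo) * cauchy  # ASSUMED step scaling
                cand.append(_clip(v, lo, hi))
            stalled = 0
        else:
            # Standard BAS move along a random unit direction (minimization)
            b = [rng.uniform(-1.0, 1.0) for _ in x]
            norm = math.sqrt(sum(v * v for v in b)) or 1.0
            b = [v / norm for v in b]
            f_right = f([v + d * bi for v, bi in zip(x, b)])
            f_left = f([v - d * bi for v, bi in zip(x, b)])
            sign = (f_right > f_left) - (f_right < f_left)
            cand = [_clip(v - delta * bi * sign, lo, hi)
                    for v, bi, lo, hi in zip(x, b, lower, upper)]
        f_cand = f(cand)
        stalled = 0 if f_cand < fx else stalled + 1
        x, fx = cand, f_cand
        if fx < best_f:
            best_x, best_f = list(x), fx
        # ASSUMED attenuation of antenna length and step size
        d = alpha * d + d0
        delta = alpha * delta
    return best_x, best_f
```

The Cauchy sample is drawn with the inverse-CDF formula `tan(pi * (u - 0.5))`. Because the mutation branch is only reachable while `t < cmi`, the sketch behaves like plain BAS with attenuated parameters in the later iterations.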

Proposed hybrid method

Since the ANFIS parameters have a significant influence on the overall ANFIS system performance, and finding an optimum combination of parameter values represents an NP-hard optimization problem, swarm algorithms can be applied to improve ANFIS time series forecasting. The goal of the proposed hybrid method is to enhance ANFIS performance by determining its parameters via the CESBAS metaheuristics. The hybrid approach was named CESBAS-ANFIS. The process of training ANFIS refers to the optimization of its structure and parameters for a specific problem. The number of inputs and rules, along with the type and number of membership functions, determine the total number of parameters in the ANFIS structure. In the proposed approach, the total number of parameters that should be optimized is the sum of the antecedent and conclusion parameters. In the devised hybrid CESBAS-ANFIS method, a strategy similar to the one in Al-qaness et al. (2020) has been used. The proposed method is based on the classic ANFIS model and employs five layers (Fig. 1). Input variables are provided in Layer 1, while Layer 5 generates the forecasted values. The best weights between layers 4 and 5 are determined by the CESBAS approach in the ANFIS training process. At the beginning of execution, CESBAS-ANFIS prepares the input data by formatting it in time series form. As in Al-qaness et al. (2020), the autocorrelation function (ACF) has been used for this purpose, as a means to find patterns in the data. Variables with an ACF value greater than 0.2 have been considered. To train and evaluate the model, the train-test-split approach has been used, with 75% of the data set used for training, while the remaining 25% was used for testing. Moreover, the fuzzy c-means (FCM) method was used for ANFIS model construction. The ANFIS parameters are then trained by the CESBAS metaheuristics.
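
The data preparation step (ACF threshold of 0.2 and a 75%/25% split) can be sketched as below. The sample autocorrelation estimator, the helper names and the choice of a chronological split (no shuffling) are assumptions for illustration; the source states only the threshold and the split ratio.

```python
def autocorr(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag))
    return cov / var

def select_lags(series, max_lag, threshold=0.2):
    """Lags whose autocorrelation exceeds the threshold."""
    return [k for k in range(1, max_lag + 1)
            if autocorr(series, k) > threshold]

def chrono_split(rows, train_fraction=0.75):
    """Chronological split: first part for training, remainder for testing."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]
```

For example, `select_lags(cases, 7)` would return the candidate lagged inputs for a daily series `cases`, and `chrono_split` keeps the most recent observations for testing so that the model is never evaluated on data older than its training set.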
The best solution (ANFIS structure) generated by the CESBAS is then returned to the ANFIS, and the test phase is performed with this solution. Each CESBAS solution represents one ANFIS structure. The length of each solution is the sum of the numbers of antecedent and conclusion parameters. The type of membership function was not considered. To calculate the fitness of each solution (potential ANFIS structure) in the training phase by the CESBAS, the mean square error (MSE) metric is used:

$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2 \qquad (18)$$

where $\hat{y}_i$ and $y_i$ represent the predicted and the actual data for each observation, respectively, and the total number of observations is denoted as $n$. The fitness of each solution in the population is then calculated from this MSE value. It should be noted that only the generalized bell function has been considered as the membership function in the conducted simulations since, as mentioned earlier, the membership function was not a variable of the optimization process. The flow chart diagram of the proposed CESBAS-ANFIS is shown in Fig. 2.
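
How a flat solution vector can encode one ANFIS structure is sketched below. The layout is an assumption for illustration: one generalized bell membership function (three parameters) per input per rule, followed by a first-order polynomial (one coefficient per input plus a bias) per rule. The function names are hypothetical.

```python
def n_parameters(n_inputs, n_rules):
    """Length of one solution: antecedent plus conclusion parameters."""
    antecedent = 3 * n_inputs * n_rules    # (a, b, c) per membership function
    conclusion = (n_inputs + 1) * n_rules  # linear coefficients plus bias
    return antecedent + conclusion

def decode(solution, n_inputs, n_rules):
    """Split a flat vector into antecedent triples and rule polynomials."""
    split = 3 * n_inputs * n_rules
    flat_ante, flat_cons = solution[:split], solution[split:]
    antecedent = [
        [tuple(flat_ante[3 * (r * n_inputs + j): 3 * (r * n_inputs + j) + 3])
         for j in range(n_inputs)]
        for r in range(n_rules)
    ]
    width = n_inputs + 1
    consequent = [flat_cons[r * width:(r + 1) * width] for r in range(n_rules)]
    return antecedent, consequent

def mse(predicted, actual):
    """Mean square error between predicted and actual values."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
```

With four inputs and three rules, for instance, a solution under this layout has `n_parameters(4, 3) == 51` entries; the objective evaluated by the metaheuristic would decode the vector, run the model on the training rows and return `mse(...)`.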
Fig. 2

CESBAS-ANFIS flow chart diagram.


Experimental setup and simulations

The experimental section is divided into two parts. In the first part, results obtained on standard tests for unconstrained (bound-constrained) benchmarks are shown and analyzed with the goal of validating the proposed CESBAS on a wider range of benchmark instances. Since the original BAS was also tested on these benchmarks, a comparative analysis with the original BAS has been performed, as well as with one other enhanced BAS implementation and one improved PSO approach, for which the results were retrieved from the recent literature (Xie et al., 2019). In the second part of the simulation section, results for predicting COVID-19 cases are shown on one practical study, and a comparative analysis is performed with other approaches that were tested on the same datasets and in the same experimental environment (Al-qaness et al., 2020). The BAS and CESBAS approaches have been implemented in the Python environment. For testing the ANFIS-BAS and ANFIS-CESBAS hybrid methods, the anfis 0.3.1 module for Python has been utilized. Moreover, for the purpose of results' visualization, data science Python libraries have been utilized: scipy, pandas, pyplot and seaborn. Since each fitness function evaluation requires training and testing ANFIS with the available dataset, it utilizes a lot of computational resources. Therefore, all simulations were performed on a computer platform with six NVIDIA GTX 1080 GPUs, an Intel® Core™ i7-8700K CPU and 32 GB of RAM, running under the Windows 10 x64 operating system.

Simulation results for unconstrained benchmarks

Before validating CESBAS on the practical problem of COVID-19 outbreak prediction, experiments have been conducted on six well-known unconstrained benchmark instances with 20 dimensions. Formulations of benchmark functions (dataset) that were utilized in the simulations are given in Table 2.
Table 2

Benchmark function details.

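ID | Function name | Function definition | Global minimum | Search domain
f1 | Ackley | $f(x) = -20\exp\left(-0.2\sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}\right) - \exp\left(\frac{1}{n}\sum_{i=1}^{n}\cos(2\pi x_i)\right) + 20 + \exp(1)$ | $f(0, 0) = 0$ | $-5 \le x, y \le 5$
f2 | Rastrigin | $f(x) = \sum_{i=1}^{n}\left[x_i^2 - 10\cos(2\pi x_i) + 10\right]$ | $f(0, \ldots, 0) = 0$ | $-5.12 \le x_i \le 5.12$
f3 | Sum Squares | $f(x) = \sum_{n=1}^{N} n\,x_n^2$ | $f(0) = 0$ | $x_i \in [-10, 10]$
f4 | Sphere | $f(x) = \sum_{n=1}^{N} x_n^2$ | $f(0, \ldots, 0) = 0$ | $-\infty \le x_i \le \infty,\ 1 \le i \le n$
f5 | Griewank | $f(x) = 1 + \sum_{i=1}^{n}\frac{x_i^2}{4000} - \prod_{i=1}^{n}\cos\left(\frac{x_i}{\sqrt{i}}\right)$ | $f(0, \ldots, 0) = 0$ | $x_i \in [-600, 600]$
f6 | Salomon | $f(x) = 1 - \cos\left(2\pi\sqrt{\sum_{i=1}^{D} x_i^2}\right) + 0.1\sqrt{\sum_{i=1}^{D} x_i^2}$ | $f(0, \ldots, 0) = 0$ | $x_i \in [-100, 100]$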
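
For reference, the six benchmark functions of Table 2 can be written as plain Python functions. This is a sketch using the standard textbook definitions of these functions; it works for any dimension, including the 20 dimensions used in the experiments.

```python
import math

def ackley(x):
    n = len(x)
    mean_sq = sum(v * v for v in x) / n
    mean_cos = sum(math.cos(2 * math.pi * v) for v in x) / n
    return -20 * math.exp(-0.2 * math.sqrt(mean_sq)) - math.exp(mean_cos) + 20 + math.e

def rastrigin(x):
    return sum(v * v - 10 * math.cos(2 * math.pi * v) + 10 for v in x)

def sum_squares(x):
    # Index weights start at 1
    return sum((i + 1) * v * v for i, v in enumerate(x))

def sphere(x):
    return sum(v * v for v in x)

def griewank(x):
    total = sum(v * v for v in x) / 4000
    prod = math.prod(math.cos(v / math.sqrt(i + 1)) for i, v in enumerate(x))
    return 1 + total - prod

def salomon(x):
    r = math.sqrt(sum(v * v for v in x))
    return 1 - math.cos(2 * math.pi * r) + 0.1 * r

BENCHMARKS = {"f1": ackley, "f2": rastrigin, "f3": sum_squares,
              "f4": sphere, "f5": griewank, "f6": salomon}
```

All six functions have their global minimum of zero at the origin, so `BENCHMARKS["f2"]([0.0] * 20)` evaluates to zero up to floating-point rounding.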
Since BAS belongs to the group of relatively novel metaheuristics, a survey of the recent computer science literature identified only one paper that provides simulation results of the original BAS on standard unconstrained benchmarks (Xie et al., 2019); therefore, for comparative analysis purposes, the same benchmarks that were utilized in that paper have been used here. Moreover, for the sake of an objective comparative analysis, simulations were performed under the same conditions as in Xie et al. (2019). In that manuscript, an improved BAS was proposed, and both the original and improved versions were tested in 50 independent runs with 200 iterations per run and with only one solution in the population. The improved BAS was compared with the original BAS, as well as with linear decreasing weight PSO (LDWPSO). The LDWPSO algorithm was tested with 3 solutions in the population, since in the original and improved BAS three function evaluations are performed in each iteration for each solution (left antenna, right antenna and centroid). For more details regarding the specific parameter setup of the original BAS, improved BAS, and LDWPSO, please refer to Xie et al. (2019). The specific parameters of the CESBAS were set as follows: $\alpha$ and $d_0$ were set to 0.95 and 0.01, respectively, CMP was set to 0.5, cis to 5, while the value of cmi was set to 80. The values of all parameters were determined empirically by conducting simulations. It must be pointed out that the same values for $\alpha$ and $d_0$ were used as in the original BAS (Eqs. (11), (12)) (Jiang & Li, 2017). The simulation environment parameters of the CESBAS approach for unconstrained benchmarks are summarized in Table 3.
Table 3

CESBAS experiment parameters for unconstrained benchmarks tests.

Parameter | Notation | Value
Number of runs | runs | 50
Number of iterations | $T_{max}$ | 200
Attenuation coefficient | $\alpha$ | 0.95
Min sensing diameter length | $d_0$ | 0.01
Cauchy mutation probability | CMP | 0.5
Cauchy mutation invocation | cmi | 80
Consecutive iterations stagnation | cis | 5
In order to better present the search history and how the CESBAS algorithm performs the search, a 2D Gaussian kernel density estimation (KDE) plot and a surface plot of the 2D Gaussian KDE have been generated for all six unconstrained benchmark functions after 100 iterations. The 2D Gaussian KDE plots are given in Fig. 3, while the surface plots are shown in Fig. 4.
Fig. 3

CESBAS 2D Gaussian Kernel.

Fig. 4

CESBAS 2D Gaussian Kernel Surface plot.

Results quality and convergence speed, expressed as the number of iterations, were taken as the criteria for comparison, with the mean and standard deviation calculated over 50 independent runs. Results for the basic BAS, improved BAS and LDWPSO were retrieved from Xie et al. (2019). Comparative analysis results are summarized in Table 4 (for the sake of better visual representation, the best results for each test category are marked bold).
Table 4

Experimental results and comparative analysis for simulations with unconstrained benchmarks.

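Criterion | Metrics | Metaheuristics | f1 | f2 | f3 | f4 | f5 | f6
Results quality | Mean | CESBAS | 0.31 | 0.03 | 0.00 | 0.00 | 0.11 | 0.11
Results quality | Mean | Imp.BAS | 0.55 | 0.06 | 0.00 | 0.00 | 0.14 | 0.14
Results quality | Mean | BAS | 13.16 | 61.21 | 15.18 | 1.37 | 5.47 | 7.05
Results quality | Mean | LDWPSO | 12.74 | 58.24 | 55.84 | 11.31 | 36.70 | 6.19
Results quality | Std | CESBAS | 0.20 | 0.02 | 0.00 | 0.00 | 0.07 | 0.10
Results quality | Std | Imp.BAS | 0.76 | 0.10 | 0.00 | 0.00 | 0.20 | 0.10
Results quality | Std | BAS | 6.52 | 23.08 | 17.64 | 2.61 | 12.35 | 2.75
Results quality | Std | LDWPSO | 5.99 | 23.13 | 70.52 | 12.91 | 39.92 | 4.7
Convergence | Mean | CESBAS | 47.21 | 15.13 | 4.00 | 4.25 | 3.00 | 7.15
Convergence | Mean | Imp.BAS | 55.2 | 20.3 | 2.9 | 3.5 | 3.2 | 10.5
Convergence | Mean | BAS | 112.4 | 98.80 | 39.00 | 39.90 | 37.20 | 43.60
Convergence | Mean | LDWPSO | 82.20 | 79.20 | 37.20 | 44.20 | 51.40 | 64.60
Convergence | Std | CESBAS | 23.20 | 6.20 | 2.45 | 2.10 | 1.50 | 7.00
Convergence | Std | Imp.BAS | 38.40 | 13.70 | 1.10 | 1.70 | 1.70 | 8.00
Convergence | Std | BAS | 29.90 | 29.60 | 9.70 | 8.50 | 9.90 | 9.60
Convergence | Std | LDWPSO | 41.10 | 47.60 | 24.00 | 27.30 | 26.10 | 26.80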
On average, when all criteria and metrics are taken into account, the proposed CESBAS exhibits better performance than all other approaches, as is evident from the comparative analysis. The second-best algorithm is the improved BAS, which was proposed in Xie et al. (2019). In the results quality comparison, CESBAS outperforms the improved BAS in four out of six tests (f1, f2, f5 and f6). For the relatively easy benchmarks f3 and f4, both algorithms managed to obtain the global optimum in all runs. Moreover, in the f1, f2 and f5 instances CESBAS also established a better standard deviation of results than the improved BAS, while in the f3, f4 and f6 tests both metaheuristics showed the same performance. As noted before, in Table 4 convergence is expressed as the number of iterations that the algorithms took to converge to the best solution in each run. In the case of the simpler functions f3 and f4, the improved BAS exhibits better convergence speed than the proposed CESBAS for both indicators, mean and standard deviation. However, for all remaining functions, CESBAS managed to outperform the improved BAS. The LDWPSO was overall the worst approach considered in the comparative analysis (it has the worst mean result on f3, f4 and f5 and the slowest mean convergence on f4, f5 and f6), while both CESBAS and the improved BAS showed significantly better results than the basic BAS metaheuristics. It can be concluded from the comparative analysis table that, due to the lack of exploration, in some runs BAS could not reach the right part of the search space, which led to worse mean values. Even for the simple f3 and f4 benchmarks, BAS failed to achieve the optimum. The same can be stated for convergence speed. The proposed CESBAS managed to significantly improve the results quality as well as the convergence speed of the original BAS algorithm, and obtained significantly better results in all tests for all criteria and performance metrics.
A visual comparative analysis of results quality between CESBAS and the basic BAS is given in Fig. 5, Fig. 6, Fig. 7. Swarm plot diagrams of the best obtained results for 50 independent runs are shown in Fig. 5. Each point in the diagram represents the result of one run. Similarly, box plot diagrams (box and whiskers) and histograms of the same data set are shown in Fig. 6 and Fig. 7, respectively. From the presented figures it is clear that in some runs the basic BAS did not converge to the optimum region, and these results are distributed far away from the median value.
Fig. 5

Swarmplots - (a) CESBAS vs. (b) BAS.

Fig. 6

Box plot diagrams CESBAS vs. BAS.

Fig. 7

Results’ quality histogram CESBAS vs. BAS.

The visual representation and comparative analysis between the proposed CESBAS and the original BAS was performed while taking into account convergence speed criteria and is presented in Fig. 8. On the given graphs, convergence speed averaged over 50 independent runs is shown. From the given figure it is obvious that CESBAS shows much better convergence than original BAS metaheuristics.
Fig. 8

Convergence speed — CESBAS vs. BAS.


COVID-19 cases prediction simulations

As noted above, in the second part of simulations, CESBAS was validated against an important and current challenge of predicting COVID-19 cases by using the dataset from China. In this subsection, performance metrics that are used for testing the proposed CESBAS-ANFIS method are shown first. Then, the employed dataset and control parameter setup are shown, and finally comparative analysis with other state-of-the-art approaches that were tested on the same dataset and under similar experimental conditions is presented. For more details about the proposed CESBAS-ANFIS hybrid approach, please refer to Section 4.3.

Performance metrics

The quality and performance of the proposed CESBAS-ANFIS approach have been evaluated by utilizing standard metrics for regression: root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), root mean squared relative error (RMSRE) and coefficient of determination ($R^2$). The RMSE is calculated as given in Eq. (20), where $\hat{y}_i$ and $y_i$ stand for the predicted and actual values, respectively, and the data sample size (number of observations) is given by the parameter $n$. The MAE, MAPE, RMSRE and $R^2$ indicators are calculated by Eqs. (21), (22), (23) and (24), respectively, where $\bar{y}$ denotes the average value of $y$. Smaller values of RMSE, RMSRE, MAE, and MAPE indicate better performance of the proposed approach, while a higher value of $R^2$ indicates a better correlation and thus better results' quality.
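
The five metrics can be computed as in the following sketch. It uses the common textbook definitions, which are an assumption here: in particular, the relative errors are taken with respect to the actual values, MAPE is expressed in percent, and the coefficient of determination is computed as one minus the ratio of the residual sum of squares to the total sum of squares.

```python
import math

def rmse(pred, actual):
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))

def mae(pred, actual):
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)

def mape(pred, actual):
    # Percentage error relative to the actual value (assumed convention)
    return 100.0 * sum(abs((p - a) / a) for p, a in zip(pred, actual)) / len(actual)

def rmsre(pred, actual):
    return math.sqrt(sum(((p - a) / a) ** 2 for p, a in zip(pred, actual)) / len(actual))

def r2(pred, actual):
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for p, a in zip(pred, actual))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot
```

As a small check, for predictions `[110, 190]` against actual values `[100, 200]` these definitions give RMSE = 10, MAE = 10, MAPE = 7.5 and $R^2$ = 0.96.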

Datasets and control parameters’ setup

To test the performance of the proposed method, a certain time period of the COVID-19 dataset from China was used. Moreover, the performance of the proposed ANFIS-CESBAS was also compared with the hybrid between the flower pollination algorithm and the salp swarm algorithm (FPASSA), which was originally tested on the same problem (Al-qaness et al., 2020). In that paper, the FPASSA was also used for updating ANFIS parameters (FPASSA-ANFIS). Additionally, a comparative analysis was conducted with other state-of-the-art swarm algorithms used for updating ANFIS parameters, as well as with the original BAS algorithm. For the sake of performing all of the aforementioned comparisons, in this experiment the same dataset was taken and simulated under similar experimental conditions as in Al-qaness et al. (2020). A larger dataset could have been taken; however, in that scenario it would not be possible to compare the performance of the proposed method with other algorithms, because only a few methods whose results are published in state-of-the-art journals were implemented and evaluated for COVID-19 cases prediction. Since an implementation of the original BAS algorithm for this problem does not exist in the literature, BAS-ANFIS has also been implemented during this research in order to evaluate the improvements of the CESBAS over the original BAS for training ANFIS parameters. The dataset that was employed in the experiments was retrieved from the World Health Organization (WHO) by merging reports of daily confirmed cases in China from January 21, 2020, till February 18, 2020. The data was captured from the following URL: www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports/. This dataset is shown in Table 5, while its visual representation is provided in Fig. 9. Also, as in Al-qaness et al. (2020), 25% of the data was utilized for testing and the remaining 75% was used for training.
In order to allow readers to see more clearly how the proposed CESBAS algorithm works, and to see the discrepancies between predicted and actual cases, the results which CESBAS obtained are shown along with the actual confirmed cases in Table 6.
Table 5

Confirmed COVID-19 cases in China from January 21, 2020 till February 18, 2020, retrieved and merged from WHO reports.

Date (D/M/Y) | Confirmed cases | Date (D/M/Y) | Confirmed cases
21/1/2020 | 278 | 5/2/2020 | 24363
22/1/2020 | 309 | 6/2/2020 | 28060
23/1/2020 | 571 | 7/2/2020 | 31211
24/1/2020 | 830 | 8/2/2020 | 34598
25/1/2020 | 1297 | 9/2/2020 | 37251
26/1/2020 | 1985 | 10/2/2020 | 40554
27/1/2020 | 2741 | 11/2/2020 | 42708
28/1/2020 | 4537 | 12/2/2020 | 44730
29/1/2020 | 5997 | 13/2/2020 | 46550
30/1/2020 | 7736 | 14/2/2020 | 48548
31/1/2020 | 9720 | 15/2/2020 | 50054
1/2/2020 | 11821 | 16/2/2020 | 51174
2/2/2020 | 14411 | 17/2/2020 | 70635
3/2/2020 | 17283 | 18/2/2020 | 72528
4/2/2020 | 20471 | |
Fig. 9

Visual representation of confirmed COVID-19 cases in China from January 21, 2020 till February 18, 2020 that were retrieved from WHO reports.

Table 6

Confirmed COVID-19 cases in China from January 21, 2020 till February 18, 2020, retrieved and merged from WHO reports, along with the predicted cases by CESBAS method.

Date (D/M/Y) | Confirmed | Predicted | Date (D/M/Y) | Confirmed | Predicted
21/1/2020 | 278 | 261 | 5/2/2020 | 24363 | 24701
22/1/2020 | 309 | 275 | 6/2/2020 | 28060 | 28798
23/1/2020 | 571 | 523 | 7/2/2020 | 31211 | 31760
24/1/2020 | 830 | 771 | 8/2/2020 | 34598 | 35090
25/1/2020 | 1297 | 1369 | 9/2/2020 | 37251 | 37802
26/1/2020 | 1985 | 2020 | 10/2/2020 | 40554 | 41550
27/1/2020 | 2741 | 2597 | 11/2/2020 | 42708 | 43802
28/1/2020 | 4537 | 4563 | 12/2/2020 | 44730 | 45090
29/1/2020 | 5997 | 6133 | 13/2/2020 | 46550 | 46100
30/1/2020 | 7736 | 7902 | 14/2/2020 | 48548 | 47610
31/1/2020 | 9720 | 9570 | 15/2/2020 | 50054 | 51200
1/2/2020 | 11821 | 11520 | 16/2/2020 | 51174 | 54950
2/2/2020 | 14411 | 14490 | 17/2/2020 | 70635 | 57320
3/2/2020 | 17283 | 17120 | 18/2/2020 | 72528 | 61423
4/2/2020 | 20471 | 20190 | | |
It can be noticed from Table 6 that the proposed algorithm was not able to predict the large surge of new cases between 16.2.2020 and 17.2.2020, which led to slightly worse metrics. This is expected behavior, as in any machine learning algorithm for time series prediction there is an error due to external unpredictable factors. The accuracy of the prediction depends on two factors, the reducible and the irreducible error, as shown in Eq. (25):

$$E(Y - \hat{Y})^2 = \left[f(X) - \hat{f}(X)\right]^2 + \mathrm{Var}(\varepsilon) \qquad (25)$$

where $E(Y - \hat{Y})^2$ denotes the average of the squared distance between the actual and predicted value of $Y$, $[f(X) - \hat{f}(X)]^2$ represents the reducible error, while $\mathrm{Var}(\varepsilon)$ denotes the irreducible error. The irreducible error cannot be reduced, no matter how well the prediction is performed (James, Witten, Hastie, & Tibshirani, 2014). The reason for this is that the prediction depends on some unmeasured variables which are useful, but unknown.

In order to establish a better analysis of CESBAS-ANFIS performance, additional simulations have been conducted by using two datasets of weekly confirmed influenza cases, as in Al-qaness et al. (2020). The data for the first dataset (influenza dataset 1 - IDS1) was retrieved from the Centers for Disease Control and Prevention (CDC), and it refers to the time period between the fourteenth week of 2015 and the sixth week of 2020 (Center for Disease Control and Prevention (CDS), 2020). The second dataset (influenza dataset 2 - IDS2) was captured from the WHO website, and it comprises the data of confirmed influenza cases in China from week one of 2016 until week eight of 2020 (World Health Organization (WHO), 2020).
Global CESBAS-ANFIS and BAS-ANFIS control parameters were adjusted similarly to Al-qaness et al. (2020): the population size was set to 8 and the number of iterations was set to 100. The algorithms that were implemented in Al-qaness et al. (2020) were tested by using 25 solutions in the population. However, as in the case of the unconstrained benchmark tests, since both the BAS and CESBAS algorithms perform three function evaluations per iteration for each solution, experiments were conducted with only 8 solutions in the population. Other BAS and CESBAS control parameters were set as shown in Table 3. The algorithms were executed in 30 independent runs, and the best values are noted in the comparative analysis table. In both experiments, with the COVID-19 and influenza datasets, previous values of the time series have been considered as independent variables, while the predicted numbers of new COVID-19 and influenza cases are considered as dependent variables. The time-series dataset was prepared in the following way: the data is arranged into four inputs, the confirmed cases of the last four consecutive days, which are used for predicting the next day's confirmed cases. There is no fixed prediction horizon, since in every following iteration the algorithm takes the previously predicted values as inputs for calculating the next value, and so on. This methodology is visualized in Fig. 10.
Fig. 10

Input and output visualization of proposed CESBAS-ANFIS method.
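The input and output arrangement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the sample numbers and the `toy_predict` stand-in (a simple linear extrapolation used in place of the trained CESBAS-ANFIS model) are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple

WINDOW = 4  # four previous values are used as inputs, as described in the text


def make_windows(series: Sequence[float],
                 window: int = WINDOW) -> Tuple[List[List[float]], List[float]]:
    """Build (inputs, target) pairs: `window` past values -> the next value."""
    inputs, targets = [], []
    for t in range(window, len(series)):
        inputs.append(list(series[t - window:t]))
        targets.append(series[t])
    return inputs, targets


def recursive_forecast(history: Sequence[float],
                       predict: Callable[[List[float]], float],
                       steps: int,
                       window: int = WINDOW) -> List[float]:
    """Forecast `steps` values, feeding every prediction back in as an input."""
    buffer = list(history[-window:])
    forecasts = []
    for _ in range(steps):
        next_value = predict(buffer)
        forecasts.append(next_value)
        buffer = buffer[1:] + [next_value]  # slide the window forward by one step
    return forecasts


def toy_predict(last_values: List[float]) -> float:
    """Hypothetical stand-in for the trained model: linear extrapolation."""
    return last_values[-1] + (last_values[-1] - last_values[-2])


cases = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]  # illustrative numbers, not real data
X, y = make_windows(cases)
forecast = recursive_forecast(cases, toy_predict, steps=3)
print(X[0], "->", y[0])
print(forecast)
```

Because each forecast is fed back as an input, any error made in an early step propagates into the later ones, which is one reason why a sudden surge in the real data is hard to follow.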

Experimental results, comparative analysis and discussion

In Al-qaness et al. (2020), besides the proposed FPASSA-ANFIS, results of other bio-inspired algorithms that were used for updating ANFIS parameters were also shown: PSO-ANFIS, GA-ANFIS, ABC-ANFIS and FPA-ANFIS. Additionally, standard machine learning methods were included in the analysis: artificial neural network (ANN), k-nearest neighbors (KNN), support vector regression (SVR) and bare-bones ANFIS. All these approaches were included in the comparative analysis along with the original BAS. The comparative analysis is given in Table 7, where the results of the best run are recorded for each method along with the performance metrics. The comparative analysis also includes computation time; however, this metric cannot be objectively compared, since the approaches in this paper were tested on a different computation platform than the algorithms shown in Al-qaness et al. (2020). It can be noted that the authors of Al-qaness et al. (2020) have not provided details of the computation platform that was used in their simulations.
Table 7

Comparative analysis between CESBAS-ANFIS and other methods for COVID-19 outbreak prediction results for China dataset retrieved from the WHO reports.

Method         RMSE    MAE    MAPE    RMSRE   R2       time
ANN            8750    5413   13.09   0.204   0.8991   -
KNN            12100   7671   8.32    0.130   0.7710   -
SVR            7822    5354   8.40    0.080   0.8910   -
ANFIS          7375    5523   5.32    0.09    0.9032   -
PSO-ANFIS      6842    4559   5.12    0.08    0.9492   24.18
GA-ANFIS       7194    4963   5.26    0.08    0.9575   27.02
ABC-ANFIS      8327    6066   6.86    0.10    0.7906   46.80
FPA-ANFIS      6059    4379   5.04    0.07    0.9439   23.41
FPASSA-ANFIS   5779    4271   4.79    0.07    0.9645   23.30
BAS-ANFIS      7069    5125   6.56    0.10    0.7952   16.66
CESBAS-ANFIS   4329    3195   4.08    0.06    0.9763   19.82

In the provided comparative analysis (Table 7), the best obtained results for each performance metric were marked in bold style. The results show that the proposed CESBAS-ANFIS substantially outperforms all other approaches included in the comparative analysis by establishing the best results for all accuracy indicators that were taken into consideration (RMSE, MAE, MAPE, RMSRE and R2). The hybrid FPASSA-ANFIS approach shows relatively good performance as the second-best method included in the analysis, however still significantly lower than the proposed CESBAS-ANFIS. For example, FPASSA-ANFIS obtains an R2 score of 0.9645, while CESBAS-ANFIS achieves an R2 of 0.9763, which is relatively high. Also, from the presented results it can be seen that the basic BAS (BAS-ANFIS) shows relatively modest performance when compared to other metaheuristics and performs similarly to ABC. BAS-ANFIS and ABC-ANFIS obtain similar results for all metrics, with a slight advantage for the BAS metaheuristic. Compared with the "pure" machine learning approaches (ANN, KNN and SVR), BAS-ANFIS obtains lower RMSE, MAE and MAPE values. It is also interesting to notice that the bare-bones ANFIS obtains better MAPE, RMSRE and R2 values than ABC-ANFIS and BAS-ANFIS.
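For readers who want to reproduce indicators of the kind reported in Table 7, the following minimal sketch computes RMSE, MAE, MAPE, RMSRE and R2. It assumes the usual textbook definitions; the exact formulas used in the paper are given outside this section and may differ (for example, R2 is sometimes computed as a squared correlation coefficient instead of the coefficient of determination used here). The function name and the sample numbers are illustrative only.

```python
import math
from typing import Dict, Sequence


def forecast_metrics(actual: Sequence[float],
                     predicted: Sequence[float]) -> Dict[str, float]:
    """Compute common forecasting error indicators (assumed textbook definitions)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    # Relative errors require every actual value to be non-zero.
    relative = [e / a for e, a in zip(errors, actual)]
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total sum of squares
    return {
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100.0 * sum(abs(r) for r in relative) / n,
        "RMSRE": math.sqrt(sum(r * r for r in relative) / n),
        "R2": 1.0 - ss_res / ss_tot,
    }


# Illustrative values only, not data from the paper.
metrics = forecast_metrics([100.0, 200.0, 300.0, 400.0],
                           [110.0, 190.0, 310.0, 390.0])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

RMSE and MAE are expressed in the unit of the series (number of cases), while MAPE and RMSRE are relative measures, so the two groups can rank methods differently.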

CESBAS-ANFIS vs. BAS-ANFIS comparative analysis with World Health Organization dataset.

With the goal of evaluating the performance improvements of the proposed CESBAS-ANFIS over BAS-ANFIS, the authors have further performed a comparative analysis of these two methods. The visual representation of the results obtained by CESBAS-ANFIS and BAS-ANFIS, shown in Fig. 11, clearly indicates that CESBAS-ANFIS manages to predict total cases in China with significantly greater accuracy than BAS-ANFIS.
Fig. 11

Prediction of confirmed COVID-19 cases in China from January 21, 2020 till February 18, 2020 based on the data retrieved from WHO reports — CESBAS (left) and BAS (right).

Since both approaches were executed in 30 independent runs, the results have been ranked for a more detailed comparative analysis, where the run with rank 1 obtains the best result and the run with rank 30 obtains the worst result. Based on this data, a visual comparative analysis between CESBAS-ANFIS and BAS-ANFIS has been generated for the RMSE and MAE indicators by using bar charts. This comparison is given in Fig. 12. Moreover, to show the distribution of results' quality over the 30 runs, swarm plot diagrams of the best results obtained in each run have also been generated for the RMSE and MAE metrics. This analysis is provided in Fig. 13.
Fig. 12

CESBAS-ANFIS vs. BAS-ANFIS visual comparative analysis in form of bar plots for confirmed COVID-19 cases in China from January 21, 2020 till February 18, 2020, based on the data retrieved from WHO reports for RMSE (left) and MAE (right) metrics.

Fig. 13

CESBAS-ANFIS vs. BAS-ANFIS visual swarm plot comparative analysis of confirmed COVID-19 cases in China from January 21, 2020 till February 18, 2020, based on the data retrieved from WHO reports - (a) CESBAS, (b) BAS.

From the presented swarm plot comparative analysis it can be clearly seen that in some runs BAS-ANFIS (labeled with (b) in the figure) misses the right part of the search space; those results can be considered outliers. In this example, BAS-ANFIS clearly underperforms in five runs and generates results of low quality (high RMSE and MAE values). This behavior is a consequence of a poor exploitation–exploration trade-off. CESBAS-ANFIS, however, manages to generate satisfying results in all runs, and there is not a single run in which it misses the right part of the search space. In conclusion, this practical example also shows that CESBAS overcomes the drawbacks of the original BAS algorithm.
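The ranking of runs and the identification of outlying runs can be illustrated with a short sketch. The paper identifies outliers visually from the swarm plots; the interquartile-range rule used below is a common substitute chosen here as an assumption, and the RMSE values are made up for the example.

```python
from statistics import quantiles
from typing import List, Sequence, Tuple


def rank_runs(rmse_per_run: Sequence[float]) -> List[Tuple[int, int, float]]:
    """Return (rank, run_index, rmse) tuples; rank 1 is the run with the lowest RMSE."""
    order = sorted(range(len(rmse_per_run)), key=lambda i: rmse_per_run[i])
    return [(rank, i, rmse_per_run[i]) for rank, i in enumerate(order, start=1)]


def flag_outliers(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Flag values above Q3 + k * IQR (assumed rule, not taken from the paper)."""
    q1, _, q3 = quantiles(values, n=4)
    upper_fence = q3 + k * (q3 - q1)
    return [v for v in values if v > upper_fence]


# Hypothetical RMSE values for eight runs; the last run misses the good region.
rmse_runs = [4400.0, 4500.0, 4450.0, 4600.0, 4550.0, 4480.0, 4520.0, 9000.0]
ranking = rank_runs(rmse_runs)
outliers = flag_outliers(rmse_runs)
print(ranking[0], ranking[-1])
print(outliers)
```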

COVID-19 cases prediction in China with Our World in Data dataset for the periods January 21, 2020 - February 18, 2020 and November 10, 2020–December 10, 2020.

When analyzing the COVID-19 data that is available on the Internet, another reliable source was found on the website Our World in Data (URL: https://ourworldindata.org/coronavirus-source-data). It can be observed that there are slight discrepancies between the data from this website and the data provided by the WHO reports for the observed period of time in China. In the WHO data, there is a substantial increase in the number of cases between February 12, 2020 and February 13, 2020 (from 44k to 59k). However, according to Our World in Data, an even larger increase happened between February 16, 2020 and February 17, 2020 (from 51k to 70k). This could potentially influence the accuracy of prediction, hence the authors decided to test CESBAS-ANFIS and BAS-ANFIS on this dataset as well. This approach was taken to evaluate the robustness of the CESBAS-ANFIS and BAS-ANFIS frameworks, and to reveal whether the performance metric values change significantly. This dataset is visually represented in Fig. 14.
Fig. 14

Visual representation of confirmed COVID-19 cases in China from January 21, 2020 till February 18, 2020 that were retrieved from website Our World in Data.

Experiments have been performed under the same conditions as those used for the WHO data, and the results for CESBAS-ANFIS and BAS-ANFIS are presented in Table 8.
Table 8

COVID-19 outbreak results for dataset retrieved from Our World in Data website.

Method         RMSE    MAE    MAPE    RMSRE   R2       time
BAS-ANFIS      6864    4960   6.53    0.10    0.7954   16.50
CESBAS-ANFIS   4106    2994   4.08    0.06    0.9775   19.43

The results presented in Table 7 (data from the WHO) and Table 8 (data from the website Our World in Data) are only slightly different, which is expected. Therefore, it can be concluded that both methods, CESBAS-ANFIS and BAS-ANFIS, are not susceptible to changes in datasets. The visual representation of the results obtained by CESBAS-ANFIS and BAS-ANFIS for the dataset retrieved from Our World in Data is shown in Fig. 15.
Fig. 15

Prediction of confirmed COVID-19 cases in China from January 21, 2020 till February 18, 2020 based on the data retrieved from website Our World in Data — CESBAS (left) and BAS (right).

Finally, the authors have taken recent data on confirmed cases in China from the Our World in Data source, covering the previous month (from October 10, 2020 till November 9, 2020), and trained the proposed CESBAS-ANFIS model on it. The goal was to predict the number of potential cases in the following thirty-day period (from November 10, 2020 until December 10, 2020). The results of the prediction are shown in Table 9.
Table 9

Expected number of confirmed cases in China for the period November 10, 2020–December 10, 2020.

Date         Expected cases   New     Date         Expected cases   New
10/11/2020   91578            22      26/11/2020   91979            26
11/11/2020   91597            19      27/11/2020   92006            27
12/11/2020   91618            21      28/11/2020   92037            31
13/11/2020   91642            24      29/11/2020   92070            33
14/11/2020   91671            29      30/11/2020   92100            30
15/11/2020   91703            32      01/12/2020   92135            35
16/11/2020   91738            35      02/12/2020   92176            41
17/11/2020   91767            29      03/12/2020   92214            38
18/11/2020   91794            27      04/12/2020   92248            34
19/11/2020   91818            24      05/12/2020   92279            31
20/11/2020   91845            27      06/12/2020   92307            28
21/11/2020   91868            23      07/12/2020   92339            32
22/11/2020   91890            22      08/12/2020   92370            31
23/11/2020   91909            19      09/12/2020   92404            34
24/11/2020   91930            21      10/12/2020   92442            38
25/11/2020   91953            23
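The two value columns of Table 9 are related by simple arithmetic: each "New" entry equals the difference between the expected cumulative count of that day and of the previous day. A minimal check on the first five rows of the table, with variable names chosen only for illustration:

```python
# Expected cumulative cases for 10/11/2020 to 14/11/2020, copied from Table 9.
expected_cases = [91578, 91597, 91618, 91642, 91671]

# Daily new cases are the first difference of the cumulative series.
new_cases = [today - yesterday
             for yesterday, today in zip(expected_cases, expected_cases[1:])]

print(new_cases)  # matches the "New" column for 11/11/2020 to 14/11/2020
```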

The infection rate trends and the climate variables.

In order to further evaluate the proposed CESBAS-ANFIS method, an additional experiment was conducted by including climate-related variables. This experiment used the same experimental setup as Behnood et al. (2020). That study utilized various climate factors to predict the speed of the COVID-19 spread in the USA, with data obtained from various sources. The observed factors included the average temperature, minimum temperature, maximum temperature, precipitation, humidity, wind speed and population density. A hybridized ANFIS-VOA approach was then utilized to predict the infection rate based on the aforementioned inputs. More details can be found in Behnood et al. (2020). The results of the conducted simulations are shown in Table 10. The statistical indicators for linear regression, stand-alone ANFIS, ANFIS-VOA-I and ANFIS-VOA-II were obtained from the referenced paper (Behnood et al., 2020). From the presented results, it can be seen that ANFIS-BAS slightly outperforms the ANFIS-VOA-I approach, while it is still behind the ANFIS-VOA-II method. The proposed ANFIS-CESBAS method, however, manages to slightly outperform the ANFIS-VOA-II method and all other evaluated methods.
Table 10

Predicted infections rate based on the climate variables.

Method              RMSD (Infected people/Days)   MAE (Infected people/Days)   R2       R-value
Linear regression   43.0204                       12.1912                      0.3925   0.6257
ANFIS               30.6515                       9.0127                       0.6911   0.8314
ANFIS-VOA-I         27.6533                       9.0494                       0.7486   0.8653
ANFIS-VOA-II        22.4744                       7.3337                       0.3925   0.6257
ANFIS-BAS           26.5578                       9.0116                       0.7562   0.8741
ANFIS-CESBAS        22.1245                       7.1554                       0.8457   0.9185

The infection rate trends with respect to the most important climate input variables are given in Fig. 16. It can be observed that the infection rate rises drastically with increasing population density, which could justify the need for social distancing. It is also notable that the infection rate drops as the average temperature increases, and drops slightly as the wind speed increases. Finally, the infection rate tends to increase with increasing humidity.
Fig. 16

Infection rate trends by different climate variables.

Simulations with influenza datasets.

Finally, a comparative analysis has been performed with the approaches presented in Al-qaness et al. (2020) for the confirmed influenza datasets. The description of these datasets, as well as the control parameters used in simulations, are given in Section 5.2.2. Based on the results presented in Table 11, it can be stated that, when average results are taken into account, the proposed CESBAS-ANFIS establishes better performance than all other approaches included in the analysis. In simulations with the IDS1 dataset, only FPASSA-ANFIS managed to obtain a better MAPE value than CESBAS-ANFIS, while both approaches perform the same in terms of the R2 metric.
Table 11

Computational results for datasets of confirmed influenza cases.

Dataset   Method         RMSE   MAE   MAPE    RMSRE   R2      time
IDS1      ANFIS          952    570   37.61   0.551   0.969   -
IDS1      PSO-ANFIS      798    494   34.13   0.510   0.978   25.43
IDS1      GA-ANFIS       766    480   35.44   0.530   0.98    28.70
IDS1      ABC-ANFIS      878    564   39.79   0.593   0.972   49.27
IDS1      FPA-ANFIS      618    411   37.69   0.570   0.979   24.58
IDS1      FPASSA-ANFIS   609    391   32.58   0.497   0.986   24.55
IDS1      BAS-ANFIS      872    555   39.43   0.588   0.976   19.05
IDS1      CESBAS-ANFIS   598    362   32.62   0.491   0.986   21.32

IDS2      ANFIS          718    405   64.20   1.198   0.858   -
IDS2      PSO-ANFIS      620    353   52.07   0.870   0.892   31.64
IDS2      GA-ANFIS       622    362   87.91   3.216   0.902   34.83
IDS2      ABC-ANFIS      696    433   53.30   1.101   0.887   60.87
IDS2      FPA-ANFIS      622    371   80.55   3.152   0.898   30.42
IDS2      FPASSA-ANFIS   619    367   45.02   0.887   0.909   30.39
IDS2      BAS-ANFIS      692    435   51.15   1.529   0.879   28.29
IDS2      CESBAS-ANFIS   618    359   44.91   0.872   0.911   37.43

However, the second-best approach in simulations with the IDS2 dataset proved to be PSO-ANFIS, which managed to outperform CESBAS-ANFIS in the MAE and RMSRE performance indicators. For the other metrics, CESBAS-ANFIS showed better results. As in the COVID-19 prediction simulations, the original BAS exhibited performance similar to that of ABC. Also, as in the previous test, the proposed CESBAS managed to completely outperform the original BAS by establishing a better balance between intensification and diversification.

Conclusion

In this manuscript, a novel method has been proposed to predict new COVID-19 cases by employing a hybridized algorithm between machine learning, the adaptive neuro-fuzzy inference system (ANFIS) and enhanced beetle antennae search (BAS) swarm intelligence metaheuristics. Since one of the greatest challenges in any machine learning approach is the optimization and adjustment of parameters for a specific practical problem, the enhanced BAS algorithm was utilized for this task. The proposed Cauchy exploration strategy BAS (CESBAS) was tested on a standard set of unconstrained benchmarks and proved to be a robust metaheuristic that significantly outscored all other approaches, including the original BAS. Additionally, the CESBAS algorithm was incorporated for updating ANFIS parameters and tested on the practical problem of predicting new COVID-19 cases. The proposed method has been tested on the COVID-19 case study because it is currently the most important challenge that humanity is facing. However, the method can be generalized and applied to predict any time series. Simulation results and comparative analysis showed that the proposed hybrid method outperformed other sophisticated approaches that were tested on the same datasets and proved to be a useful tool for time-series prediction. The primary contribution of this paper is the improved prediction accuracy for the number of new confirmed disease cases in the COVID-19 case study. The ongoing COVID-19 outbreak has shown a complex nature, and the promising results of this research can provide an alternative disease outbreak modeling approach, which can be used by the authorities to decide what measures should be taken and when to implement them. The prediction accuracy of the proposed model also suggests that, in case of a disease outbreak, machine learning models can be used together with traditional epidemiological models to predict the number of new confirmed cases.
The proposed method can easily be applied to any time-series prediction task. The secondary contribution of this research is the enhancement of the original BAS algorithm. Moreover, the proposed CESBAS proved to be a very efficient metaheuristic that can be adapted for solving other real-life NP-hard challenges. The main challenge with the proposed approach lies in the testing process, because every change in the control parameters of the utilized metaheuristic requires a new set of simulation runs. Keeping in mind that the training of ANFIS is an extremely resource-intensive and time-consuming process, it would be necessary to work with graphics processing units and the CUDA platform to provide interested researchers with timely results. In this version of the algorithm, everything was done locally and offline. Datasets were saved as CSV files, which were then loaded into the Python environment by utilizing the pandas library. As part of future work, the current solution could be made available online through a RESTful web service that exposes the appropriate endpoints to other researchers. Future research in this domain should also encompass additional modifications and improvements of the original BAS algorithm. Future work can focus on hybridizing BAS with other machine learning methods for classification, as well as for regression. Additionally, it is also possible to adapt the basic and enhanced BAS versions for solving various NP-hard challenges (WSN localization and energy consumption problems, cloud computing scheduling, portfolio optimization, etc.), since this metaheuristic showed promising potential in this domain.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.