
A review on COVID-19 forecasting models.

Iman Rahimi¹, Fang Chen², Amir H Gandomi².

Abstract

The novel coronavirus disease (COVID-19) has spread to more than 200 countries worldwide, leading to more than 36 million confirmed cases as of October 10, 2020. As such, several machine learning models that can forecast the outbreak globally have been released. This work presents a review and brief analysis of the most important machine learning forecasting models against COVID-19. The study has two parts. The first part is a detailed scientometric analysis, an influential tool for bibliometric analyses, performed on COVID-19 data from the Scopus and Web of Science databases; it addresses keywords and subject areas. The second part discusses the classification of machine learning forecasting models, criteria evaluation, and comparison of solution approaches. The conclusion and discussion are provided as the final sections of this study.


Keywords: Analysis; COVID-19; Forecasting; SEIR; SIR; Time series

Year: 2021; PMID: 33564213; PMCID: PMC7861008; DOI: 10.1007/s00521-020-05626-8

Source DB: PubMed; Journal: Neural Comput Appl; ISSN: 0941-0643; Impact factor: 5.102

Introduction

In December 2019, the Chinese government informed the rest of the world that a novel coronavirus, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), the cause of COVID-19, was rapidly spreading throughout China; it quickly reached many other countries. The United States Centers for Disease Control and Prevention (CDC) recognized a seafood market in Wuhan as the center of the outbreak. On January 13, 2020, the World Health Organization (WHO) reported a case in Thailand, the first case to be identified outside China. On January 16, Japan confirmed its first case, and on January 20, South Korea reported its first confirmed case. Since then, most countries in the world have been affected by this virus.

Putra and Khozin Mu’tamar [1] used the Particle Swarm Optimization (PSO) algorithm to estimate parameters in the Susceptible, Infected, Recovered (SIR) model. The results indicate that the suggested method is precise and has a sufficiently low error compared to other analytical methods. Mbuvha and Marwala [2] calibrated the SIR model to South Africa’s reported cases after considering different scenarios of the reproduction number (R0) for reporting infections and healthcare resource estimations. Qi and Xiao [3] proposed that both daily temperature and relative humidity can influence the occurrence of COVID-19 in Hubei and other provinces. Salgotra and Gandomi [4] developed two COVID-19 prediction models based on genetic programming and applied these models in India. Their findings show that genetic evolutionary programming models are highly reliable for COVID-19 cases in India.

The rest of the paper is organized as follows. Sections 2 and 3 present the search method procedure and other reviews, respectively. Section 4 shows the main research fields. Generic illustrations are provided in Sect. 5. Mathematical modeling and criteria evaluation are presented in Sects. 6 and 7. Solution approaches, including autoregressive models, exponential models, deep learning, and regression methods, are described in Sect. 8. Section 9 depicts the strengths and weaknesses of the various forecasting models. Finally, the conclusion, discussion of results, and future directions are presented in Sect. 10.

Search method procedure

The process used to identify the articles for this study’s review is presented in this section.

Search method

Web of Science (WOS) and Scopus were used to find related publications based on the following keywords: forecasting, prediction, COVID-19, and coronavirus. The classification of the chosen published works by subject area is displayed in Fig. 1. Articles published from the beginning of 2020 onward were filtered from Scopus by combining the keywords with the Boolean operator OR, for both topics and titles. We selected 920 publications available as of October 10, 2020, comprising technical research articles with algorithmic descriptions, review articles, conference papers, case studies, and papers providing managerial insights (Fig. 2). In addition, this study focuses more on those papers that were indexed by the Web of Science.

Fig. 1

Classification of scientific papers based on subject area

Fig. 2

Research methodology used in this paper


Other reviews

Mahalle and Kalamkar [5] categorized forecasting models as mathematical models and machine learning techniques, using WHO and social media communications as datasets. Significant parameters, including death count, meteorological parameters, quarantine period, medical resources, and mobility, were also studied [5]. Naudé [6] reviewed the contribution of artificial intelligence (AI) against COVID-19. The areas in which AI has contributed were identified as early warnings and alerts, tracking and prediction, data dashboards, diagnosis and prognosis, treatments and cures, and social control [6].

Main research fields

Keywords are critical in identifying the appropriate literature in a research field [7]. As specified by [8], “keywords represent the core research of a paper.” A keyword network offers a map of a knowledge area that provides insight into the available subjects and how these topics are related and organized [9]. Therefore, the VOSviewer 1.6.11 software was applied to build a keyword co-occurrence network from bibliographic data derived from Scopus. Author keywords were used to generate the network. A total of 1931 keywords were obtained from the dataset using full counting. Table 1 presents the parameter settings for keyword visualization.

Table 1

Parameter settings

Parameter	Value
Minimum number of occurrences	1
Criterion met	1931 keywords

The resulting network contains 500 nodes and 4000 links, as shown in Fig. 3, which also presents the main fields for forecasting coronavirus. Stronger links in the network visualization are indicated by thicker lines [10]. Figure 3 shows that Coronavirus, prediction, epidemic, human, and forecasting are linked to one another. Moreover, Coronavirus, prediction, epidemic, human, statistical analysis, quarantine, hospitalization, mortality, and weather are among the top keywords on which researchers focused. In Fig. 3, clusters are indicated by color, and a bigger circle represents a more frequently used keyword.

Fig. 3

Networks across the links (keywords analysis)

Figure 4 presents a detailed analysis of the sum of works cited and the number of records versus affiliations. The data were filtered to a minimum of 1 record and a minimum of 18 works cited.

Fig. 4

A detailed analysis (sum of works cited and number of records vs. Affiliations)

Generic illustrations

Several epidemic models have been used by researchers to estimate the outbreak in the short and long term [11-14]. The most applied epidemic models are the susceptible, infected, and recovered (SIR) model and susceptible, exposed, infected, and recovered (SEIR). The SIR model [15, 16] is described as shown in Fig. 5:

Fig. 5

Susceptible, infected, and recovered (SIR) model

In terms of mathematical modeling, the SIR model is shown below [17]:

$$\frac{dS}{dt} = -\frac{\beta S I}{N} \qquad (1)$$

$$\frac{dI}{dt} = \frac{\beta S I}{N} - \gamma I \qquad (2)$$

$$\frac{dR}{dt} = \gamma I \qquad (3)$$

where $S$ is the number of individuals susceptible at time $t$; $I$ is the number of infected individuals at time $t$; $R$ is the number of recovered individuals at time $t$; $N = S + I + R$ is the total population; and $\beta$ and $\gamma$ are the transmission rate and rate of recovery (removal), respectively. The SEIR model [18] is similar to the SIR model except that variable E is added for the fraction of individuals that have been infected but are asymptomatic. The SEIR model and the related equations are presented in Fig. 6.

Fig. 6

The susceptible, exposed, infected, and recovered (SEIR) diagram [18]

In the equations of the SEIR model [18], $\alpha$ depicts the protection rate; $\beta$ is the infection rate; $\gamma$ is the inverse of the average latent time; $\delta$ represents the inverse of the average quarantine time; $\lambda_0$ and $\lambda_1$ are coefficients used in the time-dependent cure rate; and $\kappa_0$ and $\kappa_1$ are coefficients used in the time-dependent mortality rate [18].
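One generalized SEIR system that is consistent with these parameter definitions can be sketched as follows, with $\alpha$ the protection rate, $\beta$ the infection rate, $\gamma$ and $\delta$ the inverses of the average latent and quarantine times, and $\lambda(t)$ and $\kappa(t)$ the time-dependent cure and mortality rates. The additional compartments $P$ (protected), $Q$ (quarantined), and $D$ (deceased), the total population $N$, and the functional forms chosen for $\lambda(t)$ and $\kappa(t)$ are assumptions of this sketch and should be checked against [18].

```latex
\begin{aligned}
\frac{dS}{dt} &= -\frac{\beta S I}{N} - \alpha S, \\
\frac{dP}{dt} &= \alpha S, \\
\frac{dE}{dt} &= \frac{\beta S I}{N} - \gamma E, \\
\frac{dI}{dt} &= \gamma E - \delta I, \\
\frac{dQ}{dt} &= \delta I - \lambda(t)\, Q - \kappa(t)\, Q, \\
\frac{dR}{dt} &= \lambda(t)\, Q, \\
\frac{dD}{dt} &= \kappa(t)\, Q,
\end{aligned}
\qquad
\lambda(t) = \lambda_0 \left(1 - e^{-\lambda_1 t}\right), \quad
\kappa(t) = \kappa_0\, e^{-\kappa_1 t}.
```

The right-hand sides sum to zero, so the total population $S + P + E + I + Q + R + D$ stays constant in this formulation.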

Mathematical modeling

Ahmar and del Val [19] used the SutteARIMA method to forecast short-term confirmed cases of COVID-19 and the Spanish market index (IBEX 35). Based on mean absolute percentage error (MAPE) values, the SutteARIMA method was found to be more suitable than the AutoRegressive Integrated Moving Average (ARIMA) model for forecasting daily confirmed cases in Spain. Al-qaness [20] suggested an improved version of the Adaptive Neuro-Fuzzy Inference System (ANFIS) to forecast the number of confirmed cases of COVID-19 in China. The idea is to determine the ANFIS parameters using a hybrid of the Flower Pollination Algorithm (FPA) and the Salp Swarm Algorithm. The performance of the hybrid was validated by comparing it with existing modified ANFIS models based on Particle Swarm Optimization (PSO), the genetic algorithm (GA), artificial bee colony (ABC), and FPA. Anastassopoulou and Russo [21] proposed a method for estimating the reproduction number (R0) and other key parameters of the susceptible, infected, recovered, and deceased (SIRD) model in forecasting the spread of the COVID-19 epidemic in China.

Chakraborty and Ghosh [22] presented a real-time forecast of confirmed COVID-19 cases for multiple countries, as well as a risk assessment of COVID-19 for some profoundly affected countries using the regression tree algorithm. A simple moving average approach was used in [23] to predict COVID-19 confirmed cases in Pakistan. The authors of [24] used a five-parameter logistic growth model to reconstruct and forecast the COVID-19 epidemic in the USA; however, they noted that the accuracy of their model depends on federal- and state-level policy decisions. Cheng and Burcu [12] introduced a platform, icumonitoring.ch, to provide hospital-level projections for intensive care unit (ICU) occupancy based on SEIR models. The platform, which is updated every 3–4 days, could help ICU managers to estimate the need for additional resources.

Chimmula and Zhang [25] applied long short-term memory (LSTM) networks as a deep learning technique for predicting COVID-19 outbreaks in Canada. Their approach identified the key features for estimating the trends of the pandemic in Canada. A simple ARIMA model was proposed in [26] to estimate registered and recovered cases after a lockdown in Italy. Salgotra and Gandomi [4] established two COVID-19 prediction models based on genetic programming in India. Their results indicate that genetic evolutionary programming models are highly reliable for COVID-19 cases in India. Dil and Dil [11] used the SIR model to forecast confirmed COVID-19 cases in the Eastern Mediterranean region, namely Iran, Iraq, Saudi Arabia, United Arab Emirates, Lebanon, Egypt, and Pakistan, with a special focus on Pakistan. A simple SIRD model was proposed in [14] to predict COVID-19 outbreaks in China, Italy, and France and to estimate healthcare facility necessities, such as ventilation units.
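Several of the comparisons above rank models by mean absolute percentage error (MAPE). As a minimal sketch of how such a comparison works, the following computes MAPE for two forecasts of the same series; the function name and the case counts are hypothetical and are not taken from any of the cited studies.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent."""
    if len(actual) != len(predicted):
        raise ValueError("series must have equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical daily confirmed cases and two competing forecasts.
actual = [100, 120, 150, 200]
forecast_a = [110, 114, 150, 190]
forecast_b = [90, 132, 165, 220]

# The forecast with the lower MAPE is considered the better fit.
better = "A" if mape(actual, forecast_a) < mape(actual, forecast_b) else "B"
print(f"MAPE A = {mape(actual, forecast_a):.1f}%, "
      f"MAPE B = {mape(actual, forecast_b):.1f}%, better: {better}")
```

Because MAPE divides by the observed value, it is unsuitable for days with zero reported cases, which is why the sketch rejects such input.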

Criteria evaluation

Forecasts of confirmed cases, risk assessment, the stock market, ICU beds, and registered and recovered cases are the top criteria in which scholars show heightened interest.

Solution approaches

Several approaches have been addressed by researchers to predict the COVID-19 outbreak [27, 28]. Table 2 presents the solution approaches proposed by researchers for forecasting COVID-19, among which SIR, SEIR, SIRD, and Moving Average are the most popular approaches. Also, some researchers [29, 30] preferred to use hybrid algorithms to enhance the power of forecasting algorithms.

Table 2

Proposed solution approaches for forecasting coronavirus 2019 (COVID-19)

Category	Approach	References
Epidemic models	SIR	[11]
Epidemic models	SEIR	[12, 36]
Epidemic models	SIRD	[13, 14]
Epidemic models	Phenomenological model	[46]
Time-series models: autoregressive model	Autoregressive integrated moving average	[19, 22, 26, 30, 31]
Time-series models: autoregressive model	Simple moving average	[23]
Time-series models: autoregressive model	Other models	[35]
Time-series models: exponential models	Logistic growth model	[24, 37, 38]
Time-series models: deep learning	Long short-term memory (LSTM) networks	[25]
Time-series models: deep learning	Neural network	[31, 40]
Time-series models	Regression methods	[42–44]
Time-series models	Prophet algorithm	[45]
Time-series models: other models	Adaptive neuro-fuzzy inference system	[29]
Time-series models: other models	Regression tree algorithm	[22]
Time-series models: other models	Support vector machine	[39, 47]
Time-series models: other models	Iteration method	[48]
Time-series models: other models	Support vector Kuhn-Tucker	[47]
Nature-inspired algorithms	Genetic programming	[30–34]
Nature-inspired algorithms	Flower pollination algorithm	[29]
Nature-inspired algorithms	Polynomial neural network	[39]
Nature-inspired algorithms	Ecological niche models	[41]

Autoregressive model

The autoregressive time-series model is known as a useful tool to model dependent data and has been applied to various real-world problems [49-53].
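To illustrate the idea, a first-order autoregressive model, y[t] = c + phi * y[t-1], can be fitted by ordinary least squares on consecutive pairs of observations. The sketch below is only an assumption about the simplest case (order 1, no noise model); the function names and the synthetic series are hypothetical and do not come from the cited works.

```python
def fit_ar1(series):
    """Estimate c and phi in y[t] = c + phi * y[t-1] by ordinary least squares."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    x, y = series[:-1], series[1:]          # lagged values and their successors
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("series is constant; phi is not identifiable")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx
    return mean_y - phi * mean_x, phi


def forecast_ar1(last_value, c, phi, steps):
    """Iterate the fitted recursion forward from the last observation."""
    forecasts = []
    value = last_value
    for _ in range(steps):
        value = c + phi * value
        forecasts.append(value)
    return forecasts


# Synthetic series generated exactly by y[t] = 2 + 0.5 * y[t-1] (illustrative only).
series = [10.0, 7.0, 5.5, 4.75, 4.375]
c, phi = fit_ar1(series)
next_two = forecast_ar1(series[-1], c, phi, steps=2)
print(c, phi, next_two)
```

Higher-order models regress on several lags at once, but the dependence of each value on its own past is the same.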

Moving average

In statistics and economics, a moving average is a way to calculate and analyze data by providing a series of averages of various subsets of the dataset [54].

Simple moving average

A simple moving average (SMA) is defined as the unweighted mean of the previous data (in finance) or an equal number of data on either side of a central value (in science or engineering) [54]. An example of the application of a simple moving average to COVID-19 can be found in [23].
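The trailing ("previous data") form of the definition can be sketched in a few lines. The function name, the window length, and the daily counts below are hypothetical and serve only to show the calculation.

```python
def simple_moving_average(values, window):
    """Trailing SMA: the unweighted mean of the last `window` observations."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical daily confirmed cases.
daily_cases = [12, 15, 9, 20, 26, 22, 30]
smoothed = simple_moving_average(daily_cases, window=3)

# A naive one-step forecast takes the most recent average as the next value.
next_day_forecast = smoothed[-1]
print(smoothed, next_day_forecast)
```

The output has `len(values) - window + 1` entries because the first average is only available once a full window of data has been observed.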

Autoregressive integrated moving average (ARIMA)

An autoregressive integrated moving average model is a generalized form of the autoregressive moving average model. Because it is well established for forecasting, some researchers have used ARIMA to predict the spread of the new pandemic [31, 55–58].

Two-piece distributions based on the scale

Maleki et al. [35] proposed an autoregressive time-series model based on the two-piece scale mixture normal distribution to predict confirmed and recovered COVID-19 cases. The proposed model outperforms the standard autoregressive time-series model in forecasting confirmed and recovered COVID-19 cases around the world.

Exponential models

Exponential models are suitable in the modeling of several phenomena, such as populations, interest rates, and infectious diseases [59].

Logistic functions

The logistic function is one of the best-known S-shaped curves, with applications in biology, chemistry, linguistics, political science, and statistics. Examples of applications of logistic functions to COVID-19 are provided in [24, 37, 38].
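As a minimal sketch, the standard three-parameter logistic curve, C(t) = K / (1 + exp(-r (t - t0))), can be evaluated as follows. This is the textbook form and not the five-parameter model used in [24]; the parameter names and values are hypothetical.

```python
import math


def logistic_cases(t, capacity, rate, midpoint):
    """Cumulative cases at time t under a three-parameter logistic curve.

    capacity: final epidemic size K; rate: growth rate r;
    midpoint: time t0 at which half of the final size is reached.
    """
    return capacity / (1.0 + math.exp(-rate * (t - midpoint)))


# Hypothetical outbreak: 50,000 final cases, growth rate 0.2 per day, midpoint on day 30.
days = list(range(0, 91, 15))
curve = [logistic_cases(t, capacity=50_000, rate=0.2, midpoint=30) for t in days]

for day, cases in zip(days, curve):
    print(f"day {day:2d}: {cases:8.0f} cumulative cases")
```

Growth is close to exponential well before the midpoint and flattens toward the capacity afterwards, which gives the S shape.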

Deep learning

Deep learning is a well-known branch of machine learning in which the learning process can be supervised, semi-supervised, or unsupervised [60-62]. Applications of deep learning to forecasting COVID-19 cases include long short-term memory (LSTM) networks [25, 63], polynomial neural networks [39], and neural networks [31, 40].

Regression methods

In statistics, regression methods are a set of statistical modeling techniques for estimating the relationship between a dependent variable and one or more independent variables [64, 65]. As a powerful tool for forecasting the pandemic, various regression methods have been applied by researchers to COVID-19 [42–44, 66, 67].
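One simple instance is a straight-line fit of the logarithm of case counts against time, whose slope is an estimate of the early growth rate. The sketch below assumes ordinary least squares with a single predictor; the function name and the synthetic counts are hypothetical and are not drawn from the cited studies.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic counts that start at 100 and double every 3 days (illustrative only).
days = [0, 1, 2, 3, 4, 5, 6]
cases = [100 * 2 ** (d / 3) for d in days]

# Regress log(cases) on time: the slope is the exponential growth rate per day.
intercept, growth_rate = fit_line(days, [math.log(c) for c in cases])
doubling_time = math.log(2) / growth_rate
print(f"growth rate {growth_rate:.3f} per day, doubling time {doubling_time:.1f} days")
```

With real surveillance data the points scatter around the line, and the fit is only meaningful while growth remains approximately exponential.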

Prophet algorithm

The Prophet algorithm is an open-source tool that works well with time-series data that have seasonal effects. The main goal of the algorithm, developed by Facebook’s Data Science team, is business forecasting [68, 69]. The Prophet algorithm has proven to be robust in dealing with missing data [70].

Genetic programming

Genetic programming (GP) is a nature-inspired algorithm, where the keys include program representation (tree structure), selection, crossover, and mutation [71]. Some examples of GP in COVID-19 are available in [32-34].

SIR

One of the most applied epidemic models is the susceptible, infected, and recovered (SIR) model [15, 16]. Variables S, I, and R are defined in Eqs. 1–3.
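A minimal numerical sketch of the SIR dynamics is given below, assuming the standard form in which new infections occur at rate beta * S * I / N and recoveries at rate gamma * I, integrated with forward Euler steps. The function name, step size, and parameter values are hypothetical and are not fitted to any data.

```python
def simulate_sir(s0, i0, r0, beta, gamma, days, dt=0.1):
    """Integrate the SIR equations with forward Euler steps.

    Returns one (S, I, R) tuple per day, starting with the initial state.
    """
    n = s0 + i0 + r0                          # total population, held constant
    s, i, r = float(s0), float(i0), float(r0)
    steps_per_day = max(1, round(1.0 / dt))
    h = 1.0 / steps_per_day                   # actual step length in days
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * h
            new_recoveries = gamma * i * h
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical parameters with beta / gamma = 3.
history = simulate_sir(s0=9990, i0=10, r0=0, beta=0.3, gamma=0.1, days=160)
peak_day = max(range(len(history)), key=lambda d: history[d][1])
print(f"infections peak on day {peak_day} with {history[peak_day][1]:.0f} active cases")
```

Forward Euler is chosen here for brevity; a higher-order solver would be preferable for stiff parameter choices or long horizons.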

SEIR

The SEIR model [18] is an extended version of the SIR model that adds a compartment, E, representing the fraction of individuals that have been infected but are asymptomatic.

SIRD

The SIRD model differentiates between recovered individuals (those who have survived the disease and are now immune) and deceased individuals [13, 14].

Strengths and weaknesses of forecasting models

As discussed earlier, many machine learning algorithms have been used to forecast the new pandemic in different parts of the world. Figure 7 presents the percentage contribution of different solution approaches applied in forecasting COVID-19 confirmed cases (there are 925 indexed articles in Scopus as of October 10, 2020). As Fig. 7 shows, deep learning, compartmental models, and other methods have the largest contributions, while the Prophet algorithm, as a new branch of machine learning, has the smallest.

Fig. 7

% of contribution of different solution approaches applied in the forecasting of COVID-19 confirmed cases

Machine learning algorithms exhibit many pros and cons, which are described in Table 3.

Table 3

Strengths and weaknesses of proposed machine learning algorithms

Algorithm	Strength	Weakness
Artificial neural network	Can draw on several training algorithms [72]	Black-box nature [72]; overtraining [73]
Support vector machine	Can avoid overfitting; defines a convex optimization problem [72]	Choice of the kernel; speed and size of training and testing sets [72]
Compartmental models (SIR, SEIR, SIRD, etc.)	Predict how the disease spreads; present the effects of public health interventions on the outcome of the pandemic [74–78]	The proposed models are mostly deterministic and work with large populations [79]
Nature-inspired algorithms (genetic programming)	Intelligent search [80]; can integrate with certain decomposition algorithms [81]	Several parameters must be set by the decision-makers; the algorithms are approximate and usually nondeterministic [82]
Prophet algorithm	Robust in dealing with missing data [70]	Hard to use for multiplicative models; data must be in a predefined format before using the algorithm
ARIMA	Works for seasonal and nonseasonal models; outliers can be handled well	Changes in observations and changes in model specification make the model unstable [83]
Deep learning	Results comparable to human expert performance [84, 85]	Requires large amounts of data; the training process is expensive


Conclusion and discussion

At the time of writing, COVID-19 had spread to more than 200 countries worldwide with more than 36 million confirmed cases. Several works on predicting the global outbreak have been released. This study reviewed the most important forecasting models for COVID-19 and provided a short analysis of the published literature. It highlighted the most important subject areas through keyword analysis, identified several criteria that could help researchers in future work, and recognized the most useful models that researchers have applied for predicting this pandemic. Furthermore, this paper may help researchers to identify important gaps in the research area and, subsequently, develop new machine learning models for forecasting COVID-19 cases.

A detailed scientometric analysis was performed as an influential tool for bibliometric analyses and reviews. For this aim, keywords and subject areas were discussed in the first part of the work, while the classification of forecasting models, criteria evaluation, and comparison of solution methods were provided in the second part.

This study describes some key arguments that are worthy of further discussion. In terms of subject area, medicine, biochemistry, and mathematics are the areas most discussed by scholars. In terms of keyword analysis, trends suggest that studies on COVID-19 will increase in the next few months. Moreover, Coronavirus, prediction, epidemic, human, statistical analysis, quarantine, hospitalization, mortality, and weather are the keywords of most interest to scholars. Several criteria have been used by researchers in forecasting, including confirmed cases, risk assessment, stock market, ventilation units, ICU beds, and estimated registered and recovered cases. Several countries, including China, Pakistan, France, Italy, USA, UK, Brazil, Nigeria, Iran, Germany, and India, were addressed as case studies. Among the forecasting models, deep learning, SIR, and SEIR are the top models used by researchers. Hybrid algorithms are used to enhance the power of forecasting approaches. The majority of studies use deterministic approaches, while there is an urgent need for robust approaches that can handle uncertain situations.

For future research directions, a comprehensive review in other fields, such as artificial intelligence (AI) and deep learning, is encouraged. Moreover, more studies should address the development of novel and hybrid approaches to forecast the pandemic. Furthermore, at the time of writing this paper, we had access to only a limited number of published articles in Scopus and WOS. However, the most important parts of this paper are the keyword and scientometric analyses, which consider the whole database and from which we chose some examples of published articles for review. Therefore, a more comprehensive review of the research area is suggested.
