Mahdi Kalantari1. 1. Department of Statistics, Payame Noor University, 19395-4697, Tehran, Iran.
Abstract
Coronavirus disease 2019 (COVID-19) is a pandemic that has affected all countries in the world. The aim of this study is to examine the potential advantages of Singular Spectrum Analysis (SSA) for forecasting the number of daily confirmed cases, deaths, and recoveries caused by COVID-19, which are the three main variables of interest. This paper contributes to the literature on forecasting the COVID-19 pandemic in several ways. Firstly, an algorithm is proposed to calculate the optimal parameters of SSA, including the window length and the number of leading components. Secondly, the results of two forecasting approaches in SSA, namely vector and recurrent forecasting, are compared to those from other commonly used time series forecasting techniques. These include Autoregressive Integrated Moving Average (ARIMA), Fractional ARIMA (ARFIMA), Exponential Smoothing, TBATS, and Neural Network Autoregression (NNAR). Thirdly, the best forecasting model is chosen based on the accuracy measure Root Mean Squared Error (RMSE), and it is applied to forecast 40 days ahead. These forecasts can help us to predict the future behaviour of this disease and make better decisions. The dataset of the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University is used to forecast the number of daily confirmed cases, deaths, and recoveries for the top ten affected countries until October 29, 2020. The findings of this investigation show that no single model provides the best forecasts for all of the countries and forecasting horizons considered here. However, the SSA technique is found to be a viable option for forecasting the number of daily confirmed cases, deaths, and recoveries caused by COVID-19, based on the number of times that it outperforms the competing models.
The outbreak of coronavirus disease 2019 (COVID-19) is an important public health concern worldwide. The World Health Organization (WHO) declared COVID-19 a pandemic on 11 March 2020. The rapid spread of this virus has affected over 200 countries. Currently, the number of infected and deceased patients is still increasing, with a very high contagion rate, in almost all the affected countries. This disease seriously threatens human health and has a significant effect on various fields such as economic development, tourism, social relations, lifestyle and international politics.

Recently, several studies have been conducted to model the COVID-19 pandemic using various methods. For example, a Long Short Term Memory for Data Training-SAE (LSTM-SAE) network model has been used as a preliminary study in [1], where it served as a baseline for testing other ANN types. Then, Modified Auto-Encoder (MAE) networks have been applied as final models to forecast COVID-19 dynamics in Brazil. Also, in order to predict the number of positive reported cases for 32 states and union territories of India, deep learning-based models have been used in [2]. In [3], a simple iteration method has been used for forecasting that needs only the daily values of confirmed cases as input. In [4], Generalized Additive Models (GAMs) were first applied to estimate three parameters (the time-dependent transmission rate, the time-dependent recovery rate, and the time-dependent death rate) from the COVID-19 outbreak in China, and then the number of patients in Iran was predicted using the number of COVID-19 infections in Iran. A comparative study of five deep learning methods has been proposed in [5] to forecast the number of new cases and recovered cases. 
Simple Recurrent Neural Network (RNN), LSTM, Bidirectional LSTM, Gated Recurrent Units (GRUs) and Variational AutoEncoder (VAE) algorithms have been applied in this reference for global forecasting of COVID-19 cases based on the data of Italy, Spain, France, China, USA, and Australia. In [6], a hybrid model including two-dimensional (2D) curvelet transformation, the Chaotic Salp Swarm Algorithm (CSSA) and a deep learning technique has been developed to identify patients infected with coronavirus from X-ray images. In the proposed model, 2D curvelet transformation was applied to the images obtained from the patient's chest X-ray radiographs and a feature matrix was formed using the obtained coefficients. The coefficients in the feature matrix were optimized using the CSSA, and COVID-19 disease was diagnosed by the EfficientNet-B0 model, which is one of the deep learning methods. For more details on other new chaotic methods see [7], [8]. Further studies considering the forecast of the pandemic can be found in [9], [10], [11], [12], [13], [14], [15], [16]. While the review of all references concerning COVID-19 is beyond the scope of this paper, an interested reader is referred to [17] for a comprehensive study of several forecasting models available in the literature, their classification, the challenges of these models, and control measures.

Recently, many attempts have been made to forecast the spread of COVID-19 using time series models. For example, the exponential smoothing family has been used in [18] to forecast daily cumulative confirmed, death, and recovered cases from COVID-19. The linear trend model and double exponential smoothing techniques have been tested in [19] in order to forecast COVID-19 spread in Malaysia, Thailand, and Singapore. ARIMA modelling has been utilized in [20] to forecast total infected cases of USA, Brazil, India, Russia, and Spain from February 15 to June 30, 2020. 
A Vector Autoregressive model has been used in [21] to forecast new daily confirmed cases, deaths and recovered cases in Pakistan for ten days. A Bayesian time series analysis has been conducted in [22] using daily data of COVID-19 in Japan until March 31, 2020. A new hybrid model of discrete wavelet decomposition and ARIMA models has been developed in [23] to make one-month-ahead predictions of death cases in Italy, Spain, France, the United Kingdom (UK), and the United States of America (USA). More information about other time series models used for forecasting COVID-19 disease can be found in [24], [25], [26], [27], [28], [29].

Despite many attempts to model the COVID-19 pandemic, few studies, to the best of our knowledge, have utilized the Singular Spectrum Analysis (SSA) technique to forecast COVID-19. We found that a modified SSA approach has been used in [30] to predict the COVID-19 pandemic in Saudi Arabia. Also, the recurrent forecasting method of SSA has been applied in [31] to provide predictive modelling of COVID-19 cases in Malaysia. SSA has been a rapidly developing method of time series analysis. This non-parametric technique is widely used in a variety of fields such as signal processing, finance, economics, image processing, meteorology, engineering, medicine, biology and genetics. The main characteristic of SSA is that neither a parametric model nor a stationarity condition has to be assumed for a time series. Whilst the review of all applications of SSA is beyond the scope of this paper, we refer interested readers to [32], [33], [34], [35], [36], [37], [38], [39], [40], [41], [42], [43], [44], [45], [46], [47], [48], [49]. For comprehensive and detailed information on the theory and applications of SSA, see [50], [51]. A comprehensive review of SSA and a description of its modifications and extensions can be found in [52]. 
Due to the great potential of SSA to forecast future data, we believe that this method can provide a reliable forecast for COVID-19 time series data, and this motivates us to apply SSA.

The number of confirmed cases, deaths, and recoveries caused by COVID-19 are the three main variables of interest that have been reported every day. Accurate forecasts of these variables are crucial: they allow us to better understand the global impact of the coronavirus and to plan correctly for the future, for example by estimating the required number of hospital beds or changing the social distancing and isolation rules. This paper contributes to the literature on forecasting the COVID-19 pandemic in several ways. Firstly, the optimal versions of the recurrent and vector forecasting methods of SSA are used, for the first time, to predict the number of daily confirmed cases, deaths, and recoveries caused by COVID-19. Secondly, in order to evaluate the potential of SSA for forecasting the three main variables, the performance of SSA is compared with other commonly used time series forecasting techniques including Autoregressive Integrated Moving Average (ARIMA), Fractional ARIMA (ARFIMA), Exponential Smoothing, TBATS, and Neural Network Autoregression (NNAR). Thirdly, the best forecasting model is chosen based on the accuracy measure Root Mean Squared Error (RMSE), which is a commonly used criterion in the time series forecasting literature, and it is applied to forecast 40 days ahead. These forecasts may help governments and other agencies to change their strategies and to optimize the available resources according to the forecasted situation. Owing to the broad spread of this virus around the world, analysing the data of all countries is a difficult and time-consuming task. Therefore, we focus only on the first ten countries in terms of the number of cumulative confirmed cases. 
At the time of writing this paper, 29 October 2020, these countries are USA, India, Brazil, Russia, France, Spain, Argentina, Colombia, UK, and Mexico.

The remainder of this paper is organized as follows. Section 2 briefly presents a review of SSA. Recurrent and vector forecasting are outlined in this section, along with the algorithm for calculating the optimal parameters of SSA. In Section 3, the theoretical background and general scheme of the other time series forecasting techniques utilized in this study are briefly discussed. The data sources used in this investigation are explained in Section 4. Section 5 is devoted to comparing the performance of SSA with other forecasting methods. In addition, 40-days-ahead point forecasts for the number of confirmed cases, deaths, and recoveries are presented in this section. The findings of this study are discussed in Section 6. Finally, the conclusion and future work are given in Section 7.
Review of SSA
The SSA technique has various modifications and extensions, some of which are explained in [53]. The most fundamental version of SSA is called Basic SSA. Here, we briefly explain the theory underlying Basic SSA, and in doing so we mainly follow [51], [53]. Also, two types of SSA forecasting methods, namely Recurrent forecasting (R-forecasting) and Vector forecasting (V-forecasting), are briefly reviewed. It is noteworthy that several software applications implement SSA, such as Caterpillar-SSA and SAS/ETS. In this research, we apply the freely available R package Rssa to conduct the SSA stages and to obtain recurrent and vector forecasts. More details on this package can be found in [54], [55], [56].
SSA Stages
The SSA technique consists of two complementary stages: Decomposition and Reconstruction. Each of these stages includes two separate steps. At the decomposition stage, a time series is decomposed into several interpretable components such as trend, seasonal and cyclical components, which enables signal extraction and noise reduction. At the reconstruction stage, the interpretable components are reconstructed and can then be used to forecast new data points.
Stage 1: Decomposition (Embedding & Singular Value Decomposition)
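In the embedding step, the observed time series $Y_N = (y_1, \dots, y_N)$ is transformed into the $L \times K$ matrix $\mathbf{X}$ whose columns comprise $X_1, \dots, X_K$, where $X_i = (y_i, \dots, y_{i+L-1})^T$ and $K = N - L + 1$. The matrix $\mathbf{X}$ is called the trajectory matrix. This matrix is a Hankel matrix in the sense that all the elements on the anti-diagonals $i + j = \text{const}$ are equal. This step has only one parameter, $L$, which is called the window length. The window length is commonly chosen such that $2 \le L \le N/2$, where $N$ is the length of the time series $Y_N$.

In the Singular Value Decomposition (SVD) step, the trajectory matrix $\mathbf{X}$ is decomposed into $\mathbf{X} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^T$, where $\mathbf{U}$ and $\mathbf{V}$ are orthogonal and $\boldsymbol{\Sigma}$ is a diagonal matrix. The diagonal entries of the matrix $\boldsymbol{\Sigma}$ are called the singular values of $\mathbf{X}$ and are denoted by $\sigma_1, \dots, \sigma_L$ in decreasing order of magnitude ($\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_L \ge 0$). The columns of $\mathbf{U}$ are called left singular vectors and those of $\mathbf{V}$ are called right singular vectors. If $d = \operatorname{rank}(\mathbf{X})$, then the SVD of the trajectory matrix can be written as follows:

$$\mathbf{X} = \mathbf{X}_1 + \dots + \mathbf{X}_d, \tag{1}$$

where $\mathbf{X}_i = \sigma_i U_i V_i^T$, $U_i$ is the $i$th left singular vector and $V_i$ is the $i$th right singular vector ($i = 1, \dots, d$). It is also well known that the left singular vectors of $\mathbf{X}$ are the eigenvectors of $\mathbf{X}\mathbf{X}^T$. The collection $(\sigma_i, U_i, V_i)$ is called the $i$th eigentriple of the SVD.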
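The embedding and SVD steps can be illustrated with a minimal Python sketch. NumPy is assumed, and the function names `trajectory_matrix` and `elementary_matrices` are hypothetical helpers introduced only for this illustration; they are not part of Rssa, which is the package actually used in this study.

```python
import numpy as np

def trajectory_matrix(y, L):
    """Return the L x K trajectory (Hankel) matrix of y, with K = N - L + 1."""
    y = np.asarray(y, dtype=float)
    K = len(y) - L + 1
    # Column i holds the lagged vector (y_i, ..., y_{i+L-1})
    return np.column_stack([y[i:i + L] for i in range(K)])

def elementary_matrices(X):
    """Return the non-zero singular values and the rank-one matrices sigma_i U_i V_i^T."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    d = int(np.sum(s > 1e-10 * s[0]))  # numerical rank of X
    return s[:d], [s[i] * np.outer(U[:, i], Vt[i]) for i in range(d)]

y = np.arange(1.0, 11.0)        # toy series 1, 2, ..., 10
X = trajectory_matrix(y, L=4)   # 4 x 7 Hankel matrix
s, parts = elementary_matrices(X)
```

For this toy linear series the trajectory matrix has rank 2, so the expansion (1) contains only two elementary matrices, and their sum recovers the trajectory matrix.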
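Stage 2: Reconstruction (Grouping & Diagonal Averaging)
The grouping step splits the elementary matrices $\mathbf{X}_i$ in (1) into several groups and sums the matrices within each group. Let $I = \{i_1, \dots, i_p\}$ be a subset of the indices $\{1, \dots, d\}$. Then, the resultant matrix $\mathbf{X}_I$ corresponding to the group $I$ is defined as $\mathbf{X}_I = \mathbf{X}_{i_1} + \dots + \mathbf{X}_{i_p}$, that is, the sum of the matrices within the group. With the SVD of $\mathbf{X}$, the split of the set of indices $\{1, \dots, d\}$ into the disjoint subsets $I_1, \dots, I_m$ corresponds to the following decomposition:

$$\mathbf{X} = \mathbf{X}_{I_1} + \dots + \mathbf{X}_{I_m}. \tag{2}$$

The main goal of diagonal averaging is to transform each matrix $\mathbf{X}_{I_j}$ of the grouped matrix decomposition (2) into a Hankel matrix, which can subsequently be converted into a new time series of length $N$. Let $\mathbf{Z}$ be an $L \times K$ matrix with elements $z_{lk}$, $1 \le l \le L$, $1 \le k \le K$. By diagonal averaging, the matrix $\mathbf{Z}$ is transferred into the Hankel matrix $\mathcal{H}\mathbf{Z}$ with the elements $\tilde{z}_s$ over the anti-diagonals ($1 \le s \le N$) using the following formula:

$$\tilde{z}_s = \frac{1}{|A_s|} \sum_{(l,k) \in A_s} z_{lk}, \tag{3}$$

where $A_s = \{(l, k) : l + k = s + 1,\ 1 \le l \le L,\ 1 \le k \le K\}$ and $|A_s|$ denotes the number of elements in the set $A_s$. By applying diagonal averaging (3) to all the matrix components of (2), the following expansion is obtained: $\mathbf{X} = \widetilde{\mathbf{X}}_{I_1} + \dots + \widetilde{\mathbf{X}}_{I_m}$, where $\widetilde{\mathbf{X}}_{I_j} = \mathcal{H}\mathbf{X}_{I_j}$. This is equivalent to the decomposition of the initial series $Y_N = (y_1, \dots, y_N)$ into a sum of $m$ series: $y_t = \sum_{k=1}^{m} \tilde{y}_t^{(k)}$ ($t = 1, \dots, N$), where $\widetilde{Y}_N^{(k)} = (\tilde{y}_1^{(k)}, \dots, \tilde{y}_N^{(k)})$ corresponds to the matrix $\widetilde{\mathbf{X}}_{I_k}$. In this paper, we denote the number of leading eigentriples corresponding to the signal (noise-free time series) by $r$.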
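The reconstruction stage can be sketched in the same way. The following Python code (NumPy assumed; `diagonal_average` and `ssa_reconstruct` are hypothetical names used only here) averages a matrix over its anti-diagonals and reconstructs the signal from the leading eigentriples, under the assumption that the signal group consists of the first few eigentriples.

```python
import numpy as np

def diagonal_average(Z):
    """Average Z over its anti-diagonals; returns a series of length L + K - 1."""
    L, K = Z.shape
    total = np.zeros(L + K - 1)
    count = np.zeros(L + K - 1)
    for l in range(L):
        # Row l contributes to anti-diagonals l, l + 1, ..., l + K - 1 (0-based)
        total[l:l + K] += Z[l]
        count[l:l + K] += 1
    return total / count

def ssa_reconstruct(y, L, r):
    """Reconstruct the series from the r leading eigentriples (grouping I = {1..r})."""
    y = np.asarray(y, dtype=float)
    K = len(y) - L + 1
    X = np.column_stack([y[i:i + L] for i in range(K)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X_signal = (U[:, :r] * s[:r]) @ Vt[:r]   # sum of the first r elementary matrices
    return diagonal_average(X_signal)

averaged = diagonal_average(np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))
# anti-diagonal means: 1, (2 + 4) / 2, (3 + 5) / 2, 6
trend = ssa_reconstruct(2.0 * np.arange(20) + 1.0, L=5, r=2)
```

Because a linear series has a rank-2 trajectory matrix, the reconstruction with two eigentriples reproduces the toy series itself.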
Recurrent forecasting
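Suppose $I = \{1, \dots, r\}$ is the chosen set of eigentriples attained at the grouping step of SSA. Let $U_1, \dots, U_r$ be the corresponding eigenvectors of the chosen eigentriples, $U_j^{\nabla}$ be the vector consisting of the first $L - 1$ components of the vector $U_j$, $\pi_j$ be the last component of the vector $U_j$ ($j = 1, \dots, r$), and $\widetilde{Y}_N = (\tilde{y}_1, \dots, \tilde{y}_N)$ be the time series reconstructed by the set $I$.

The recurrent forecasting algorithm, which we refer to as R-SSA, is summarized as follows:

The time series $Y_{N+h} = (y_1, \dots, y_{N+h})$ is defined by

$$y_t = \begin{cases} \tilde{y}_t, & t = 1, \dots, N, \\ \sum_{j=1}^{L-1} a_j\, y_{t-j}, & t = N+1, \dots, N+h, \end{cases} \tag{4}$$

where the vector of coefficients $\mathcal{R} = (a_{L-1}, \dots, a_1)^T$ is defined as:

$$\mathcal{R} = \frac{1}{1 - \nu^2} \sum_{j=1}^{r} \pi_j U_j^{\nabla}, \qquad \nu^2 = \sum_{j=1}^{r} \pi_j^2. \tag{5}$$

The numbers $y_{N+1}, \dots, y_{N+h}$ are the $h$-step-ahead recurrent forecasts.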
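A minimal Python sketch of recurrent forecasting is given below. It assumes NumPy and the standard Basic SSA recurrence in which the coefficient vector is built from the leading left singular vectors; the function name `ssa_rforecast` is hypothetical and the code is an illustration, not the Rssa implementation used in this study.

```python
import numpy as np

def ssa_rforecast(y, L, r, h):
    """h-step-ahead recurrent SSA forecasts from the r leading eigentriples."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    K = N - L + 1
    X = np.column_stack([y[i:i + L] for i in range(K)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Ur = U[:, :r]
    # Reconstructed series: diagonal averaging of the rank-r approximation
    Xr = (Ur * s[:r]) @ Vt[:r]
    recon = [Xr[::-1].diagonal(k).mean() for k in range(-L + 1, K)]
    pi = Ur[-1]                      # last components of the eigenvectors
    nu2 = float(pi @ pi)             # verticality coefficient
    if nu2 >= 1.0:
        raise ValueError("recurrence is undefined when nu^2 >= 1")
    # Coefficient vector, ordered from the oldest lag to the most recent one
    R = (Ur[:-1] @ pi) / (1.0 - nu2)
    series = list(recon)
    for _ in range(h):
        window = np.array(series[-(L - 1):])   # last L - 1 values, oldest first
        series.append(float(R @ window))
    return np.array(series[N:])

forecast = ssa_rforecast(2.0 * np.arange(20) + 1.0, L=5, r=2, h=3)
```

For the toy linear series 1, 3, ..., 39, which satisfies an exact linear recurrence, the three forecasts continue the line (41, 43, 45) up to rounding error.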
Vector forecasting
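Consider the matrix $\boldsymbol{\Pi} = \mathbf{U}^{\nabla} (\mathbf{U}^{\nabla})^T + (1 - \nu^2)\, \mathcal{R} \mathcal{R}^T$, where the matrix $\mathbf{U}^{\nabla} = [U_1^{\nabla} : \dots : U_r^{\nabla}]$ consists of the column vectors $U_1^{\nabla}, \dots, U_r^{\nabla}$ and $\mathcal{R}$ is defined in (5). The vector forecasting algorithm, which we refer to as V-SSA, is formulated as follows:

Define the vectors $Z_j$ as:

$$Z_j = \begin{cases} \widehat{X}_j, & j = 1, \dots, K, \\ \mathcal{P}^{(v)} Z_{j-1}, & j = K+1, \dots, K+h+L-1, \end{cases}$$

where $\widehat{X}_1, \dots, \widehat{X}_K$ are the columns of the matrix reconstructed from the chosen eigentriples,

$$\mathcal{P}^{(v)} Z = \begin{pmatrix} \boldsymbol{\Pi} Z_{\Delta} \\ \mathcal{R}^T Z_{\Delta} \end{pmatrix},$$

and $Z_{\Delta}$ is the vector consisting of the last $L - 1$ components of the vector $Z$.

By constructing the matrix $\mathbf{Z} = [Z_1 : \dots : Z_{K+h+L-1}]$ and applying diagonal averaging to it, the series $y_1, \dots, y_{N+h+L-1}$ is obtained.

The numbers $y_{N+1}, \dots, y_{N+h}$ are the $h$-step-ahead vector forecasts.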
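Vector forecasting can be sketched along the same lines. The Python code below (NumPy assumed; `ssa_vforecast` is a hypothetical name) follows the standard Basic SSA description of V-forecasting: lagged vectors are continued with a projection operator and the extended matrix is then diagonally averaged. It is an illustrative sketch rather than the Rssa routine used in this study.

```python
import numpy as np

def ssa_vforecast(y, L, r, h):
    """h-step-ahead vector SSA forecasts from the r leading eigentriples."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    K = N - L + 1
    X = np.column_stack([y[i:i + L] for i in range(K)])
    U = np.linalg.svd(X, full_matrices=False)[0]
    Ur = U[:, :r]
    X_hat = Ur @ (Ur.T @ X)          # columns projected onto the signal subspace
    U_nabla = Ur[:-1]                # first L - 1 components of each eigenvector
    pi = Ur[-1]                      # last components of the eigenvectors
    nu2 = float(pi @ pi)
    if nu2 >= 1.0:
        raise ValueError("forecasting is undefined when nu^2 >= 1")
    R = (U_nabla @ pi) / (1.0 - nu2)
    Pi = U_nabla @ U_nabla.T + (1.0 - nu2) * np.outer(R, R)
    Z = [X_hat[:, j] for j in range(K)]
    for _ in range(h + L - 1):
        z_delta = Z[-1][1:]          # last L - 1 components of the previous vector
        Z.append(np.append(Pi @ z_delta, R @ z_delta))
    Zm = np.column_stack(Z)
    # Diagonal averaging of the extended L x (K + h + L - 1) matrix
    series = np.array([Zm[::-1].diagonal(k).mean()
                       for k in range(-L + 1, Zm.shape[1])])
    return series[N:N + h]

v_forecast = ssa_vforecast(2.0 * np.arange(20) + 1.0, L=5, r=2, h=3)
```

On a noise-free series that satisfies an exact linear recurrence, such as this toy line, the vector forecasts coincide with the true continuation; on real data, recurrent and vector forecasts generally differ.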
Choosing $L$ and $r$
The window length ($L$), which is used in the embedding step of SSA, plays a pivotal role in the SSA technique because the whole procedure depends upon this parameter. Another important parameter is the number of leading eigentriples ($r$) that is required to reconstruct and forecast the signal (noise-free time series). In order to find the optimal values of $L$ and $r$, we apply a cross-validation procedure. This method of parameter choice is based on the minimization of the Root Mean Squared Error (RMSE) within the validation (test) period for a given forecasting horizon $h$ (i.e. the number of periods for forecasting). The details of finding the optimal $(L, r)$ are described in Algorithm 1:
Algorithm 1
Calculation of optimal $(L, r)$
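The general idea of choosing $(L, r)$ by minimizing the RMSE over a validation period can be sketched as follows. This is not a transcription of Algorithm 1: the rolling-origin scheme, the grid of candidate values, and the function names `rssa_forecast`, `rmse` and `choose_L_r` are all assumptions made for this illustration (Python with NumPy).

```python
import numpy as np

def rssa_forecast(y, L, r, h):
    """Recurrent SSA forecast of the next h values (compact sketch)."""
    y = np.asarray(y, dtype=float)
    K = len(y) - L + 1
    X = np.column_stack([y[i:i + L] for i in range(K)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Xr = (U[:, :r] * s[:r]) @ Vt[:r]
    recon = [Xr[::-1].diagonal(k).mean() for k in range(-L + 1, K)]
    pi = U[-1, :r]
    nu2 = float(pi @ pi)
    if nu2 >= 1.0:
        raise ValueError("no valid recurrence for this (L, r)")
    R = (U[:-1, :r] @ pi) / (1.0 - nu2)
    for _ in range(h):
        recon.append(float(R @ np.array(recon[-(L - 1):])))
    return np.array(recon[len(y):])

def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    err = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))

def choose_L_r(y, h, n_test, L_grid, r_max):
    """Grid search for (L, r) minimizing the h-step-ahead RMSE on the last n_test points."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    best = (None, None, np.inf)
    for L in L_grid:
        for r in range(1, min(r_max, L - 1) + 1):
            preds = []
            try:
                for t in range(N - n_test, N):   # index of each test point
                    train = y[:t - h + 1]        # data available h steps earlier
                    preds.append(rssa_forecast(train, L, r, h)[-1])
            except ValueError:
                continue                         # skip invalid parameter pairs
            score = rmse(y[N - n_test:], preds)
            if score < best[2]:
                best = (L, r, score)
    return best

toy = 2.0 * np.arange(40) + 1.0
L_best, r_best, best_rmse = choose_L_r(toy, h=3, n_test=5, L_grid=range(3, 8), r_max=3)
```

The caller is responsible for keeping the candidate window lengths well below the length of the shortest training segment; the sketch does not check this.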
Other forecasting methods
In this section, the other commonly used time series forecasting methods applied in this investigation are briefly explained.
Autoregressive integrated moving average (ARIMA)
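The ARIMA technique is one of the most established and widely used time series forecasting methods. A non-seasonal ARIMA$(p, d, q)$ model is given by

$$\phi(B)\left[(1 - B)^d y_t - \mu\right] = \theta(B)\, \varepsilon_t,$$

where $y_t$ is a time series, $B$ is the backshift operator defined as $B y_t = y_{t-1}$, $\phi(z) = 1 - \phi_1 z - \dots - \phi_p z^p$ and $\theta(z) = 1 + \theta_1 z + \dots + \theta_q z^q$ are polynomials of orders $p$ and $q$, $\varepsilon_t$ is a white noise process with mean zero, and $\mu$ is the mean of $(1 - B)^d y_t$ [57]. Also, the seasonal ARIMA$(p, d, q)(P, D, Q)_m$ model is written as

$$\Phi(B^m)\, \phi(B) \left[(1 - B^m)^D (1 - B)^d y_t - \mu\right] = \Theta(B^m)\, \theta(B)\, \varepsilon_t,$$

where $m$ is equal to the number of observations per year, and $\Phi(z)$ and $\Theta(z)$ are polynomials of orders $P$ and $Q$, respectively.

Selecting an appropriate model order, that is, the values $p$, $q$, $P$, $Q$, $d$ and $D$, is a major task in ARIMA modelling. In this paper, we use the auto.arima function from the forecast package of the R software to find the best ARIMA model automatically and estimate its parameters. For more information on how this function works and examples of applications, see [58].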
Fractional ARIMA (ARFIMA)
If the time series exhibits long-range dependence, then the parameter $d$ can be allowed to take non-integer values in an ARIMA model, which is then called an ARFIMA model. We apply the arfima function from the forecast package to find the best ARFIMA model automatically. This function selects $p$ and $q$ and estimates the parameters of the model using an algorithm proposed in [58], whilst the algorithm provided in [59] is applied to estimate the parameters including $d$.
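To make the role of a non-integer differencing parameter concrete, the operator $(1 - B)^d$ can be expanded as a binomial series in the backshift operator. The following Python sketch computes the expansion weights and applies a truncated fractional difference; the function names are hypothetical, and the code only illustrates the operator, not the estimation procedure of the arfima function.

```python
def frac_diff_weights(d, n_terms):
    """First n_terms coefficients w_k of (1 - B)^d = sum_k w_k B^k."""
    weights = [1.0]
    for k in range(1, n_terms):
        # Recursion for the binomial coefficients: w_k = w_{k-1} * (k - 1 - d) / k
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights

def frac_diff(y, d):
    """Apply (1 - B)^d to y, truncating the expansion at the start of the series."""
    w = frac_diff_weights(d, len(y))
    return [sum(w[k] * y[t - k] for k in range(t + 1)) for t in range(len(y))]

w_integer = frac_diff_weights(1, 4)       # ordinary first difference
w_fractional = frac_diff_weights(0.5, 3)  # slowly decaying weights
differenced = frac_diff([1.0, 3.0, 6.0, 10.0], 1)
```

With $d = 1$ the weights reduce to those of the ordinary first difference, whereas for $0 < d < 1$ they decay slowly, which is how the model captures long-range dependence.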
Exponential smoothing (ETS)
Exponential smoothing methods are among the most widely used forecasting procedures in practice. These were originally classified by Pegels' taxonomy [60] and later extended by Gardner [61], modified by Hyndman et al. [62], and extended again by Taylor [63], giving a total of fifteen methods. It has been shown that the exponential smoothing family has good forecast accuracy over several forecasting competitions [64], [65], [66] and is especially suitable for short time series. Some well-known methods, such as simple (or single) exponential smoothing, Holt's linear method, and the additive and multiplicative Holt-Winters methods, are special cases of exponential smoothing techniques. In order to refer to the three components (error, trend, and seasonality) of exponential smoothing methods, the notation ETS is proposed in [58], and we also use this notation. The ETS models can capture a variety of trend and seasonal structures (additive or multiplicative) and combinations of those. A detailed description of ETS can be found in [67] and is therefore not repeated here. We apply the ets function from the forecast package to find the best ETS model automatically. This function implements the innovations state space modelling framework described in [67] for parameter estimation and forecasting.
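As a small illustration of the simplest member of this family, simple exponential smoothing updates a single level as a weighted average of the newest observation and the previous level. The Python sketch below uses a fixed smoothing parameter and initializes the level with the first observation; both choices, and the function name, are assumptions for illustration, whereas the ets function estimates such quantities from the data.

```python
def simple_exp_smoothing(y, alpha, level0=None):
    """Return the one-step-ahead fitted values and the final level (the forecast)."""
    level = y[0] if level0 is None else level0
    fitted = []
    for obs in y:
        fitted.append(level)                          # forecast made before seeing obs
        level = alpha * obs + (1 - alpha) * level     # update the level
    return fitted, level

fitted, next_forecast = simple_exp_smoothing([10.0, 12.0, 14.0], alpha=0.5)
# fitted == [10.0, 10.0, 11.0]; next_forecast == 12.5
```

A larger smoothing parameter makes the level react faster to recent observations, while a smaller one produces smoother forecasts.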
TBATS Model
An innovations state space modelling framework has been introduced in [68] for forecasting complex seasonal time series such as those with multiple seasonal periods, high-frequency seasonality, non-integer seasonality, and dual-calendar effects. This model, which is called BATS, is an exponential smoothing state space model with Box-Cox transformation, ARMA errors, trend and seasonal components. It generalizes the traditional seasonal innovations models to allow for multiple seasonal periods. The notation BATS is an acronym for Box-Cox transform, ARMA errors, Trend, and Seasonal components. In the TBATS model, a trigonometric representation of the seasonal components based on Fourier series is used, and the initial T in the notation TBATS stands for trigonometric. For more information on the theory and applications of TBATS, see [68]. The tbats function of the forecast package fits a TBATS model to a time series.
Neural network autoregression (NNAR)
There has been an increasing interest in using neural networks to model and forecast time series data. A neural network can be considered as a network of neurons which are arranged in layers. The predictors (or inputs) form the bottom layer, and the forecasts (or outputs) form the top layer. There may also be intermediate layers containing hidden neurons [57]. A linear regression is equivalent to a network containing no hidden layers; however, the neural network becomes non-linear when an intermediate layer with hidden neurons is added [57]. This is known as a multilayer feed-forward network, where each layer of nodes receives inputs from the previous layers. Here we briefly present some details of the Neural Network Autoregression (NNAR) model, and in doing so we mainly follow [57]. In the NNAR model, the lagged values of the time series are used as inputs to a neural network. The notation NNAR($p, k$) is used in [57] to indicate feed-forward networks with one hidden layer, $p$ lagged inputs and $k$ nodes in the hidden layer. In addition, a seasonal NNAR model has the notation NNAR$(p, P, k)_m$ to indicate $(y_{t-1}, \dots, y_{t-p}, y_{t-m}, \dots, y_{t-Pm})$ as inputs with $k$ neurons in the hidden layer. The nnetar function in the forecast package fits an NNAR model to time series data. In this function, the values of $p$ and $P$ are selected automatically if they are not specified. More details on the NNAR model and its applications can be found in [57].
Data sources
The accuracy of forecasting largely depends on the quality of data and requires ample historical data. Several freely available R packages provide data related to COVID-19. For example, nCov2019 contains not only Chinese data but also data on other countries and regions. Furthermore, coronavirus provides the dataset of the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University together with a dashboard. Additional R-related resources on COVID-19 can be found in [69].

This paper focuses on the top ten countries affected by COVID-19, namely, USA, India, Brazil, Russia, France, Spain, Argentina, Colombia, UK, and Mexico. In this study, we use the R package tidycovid19 in order to analyse the data of the number of confirmed cases, deaths, and recoveries reported by Johns Hopkins University CSSE [70]. The main advantage of this package is that it provides transparent access to various data sources at the country-day level, including data on governmental interventions and on the behavioural response of the public. This package facilitates the download of COVID-19 related data directly from authoritative sources, including the following [71]:

The CSSE team at Johns Hopkins University: This data has developed into a standard resource for researchers and the general audience interested in assessing the global spread of the virus. The data is provided at country and sub-country levels.
European Centre for Disease Prevention and Control (ECDC): The data is updated daily and contains the latest available public data on the number of new COVID-19 cases reported per day and per country.
Testing data collected by the 'Our World in Data' team: This team systematically collects data on COVID-19 testing from multiple national sources.
Assessment Capacities Project (ACAPS): These data contain the government measures dataset provided by ACAPS and allow researchers to study the effect of non-pharmaceutical interventions on the development of the virus.
Oxford COVID-19 Government Response Tracker: An alternative data source for governmental interventions.
Apple Mobility Trends Reports: The data is provided by Apple at country and sub-country levels.
Google COVID-19 Community Mobility Reports data: This data is available at the country, regional and U.S. county level.
Google Trends: It presents data on the search volume for the term "coronavirus". This data can be used to assess the public attention to COVID-19 across countries and over time within a given country. The data is available at the country, regional and city level, but availability varies across countries.
World Bank: These data contain country-level information provided by the World Bank and allow researchers to calculate per capita measures of the virus spread. Also, these data can help researchers to assess the association of macro-economic variables with the development of the virus.

The data from the above-mentioned sources can be downloaded separately or in one merged data frame using specific download functions in the package. Additionally, a function and a shiny app are provided in this package to visualize the country-level spread of COVID-19. Despite all the advantages of this package, it has at least one drawback. If the cumulative data of confirmed cases, deaths, and recoveries are transformed into daily data, some negative values are obtained, which are clearly implausible. To solve this problem, we first treated the negative values and outliers as missing data. Then, these missing values were imputed by the Kalman smoothing method via the na_kalman function from the imputeTS package. For detailed information on this package, see [72].

Fig. 3 shows a choropleth world map of the country-level COVID-19 spread based on the number of confirmed cases (cumulative) until 29 October 2020.
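The conversion from cumulative to daily counts described above can be sketched in a few lines of Python. Note that the imputation step here is plain linear interpolation, used only as a simple stand-in for the Kalman smoothing of na_kalman, and that outlier detection is omitted; the function names are hypothetical.

```python
def daily_from_cumulative(cumulative):
    """Difference a cumulative series; negative daily values become missing (None)."""
    daily = []
    previous = 0
    for value in cumulative:
        diff = value - previous
        daily.append(diff if diff >= 0 else None)
        previous = value
    return daily

def fill_missing_linear(values):
    """Fill None entries by linear interpolation between the nearest known neighbours."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    if not known:
        return filled
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]        # leading gap: copy the next known value
        elif right is None:
            filled[i] = values[left]         # trailing gap: copy the last known value
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled

daily = daily_from_cumulative([0, 5, 12, 10, 20])   # the drop from 12 to 10 is implausible
cleaned = fill_missing_linear(daily)
```

In this toy example the negative difference is flagged as missing and replaced by the midpoint of its two neighbours.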
Fig. 3
COVID-19 confirmed cases (cumulative) as of October 29, 2020.
The time series plots of daily confirmed cases are presented in Fig. 4 for the ten countries as of October 29, 2020. Similar plots for the number of deaths and recoveries are depicted in Figs. 5 and 6. As can be seen in Fig. 4, the number of confirmed cases has a periodic pattern in some countries such as USA, Brazil, Argentina, and Mexico. In addition, there is an obvious upward trend in the number of confirmed cases in USA, Russia, France, Spain, Argentina, and UK. However, the number of confirmed cases appears to be trending downwards in India.
Fig. 4
The time series plots of daily confirmed cases as of October 29, 2020.
Fig. 5
The time series plots of daily deaths as of October 29, 2020.
Fig. 6
The time series plots of daily recovered cases as of October 29, 2020.
It can be concluded from Fig. 5 that there is a periodic structure in the number of deaths in USA, Brazil, Russia, and Mexico. Also, an evident upward trend is visible in the number of deaths in Russia and Argentina.

It is apparent from Fig. 6 that the number of recovered cases in Russia fluctuates cyclically. In addition, there is an upward trend in the number of recovered cases in USA, France, and Argentina. It is noteworthy that the number of recovered cases in Spain has been reported as zero after 18 May 2020, which seems implausible. Consequently, we ignore this dataset and do not provide point forecasts for the number of recovered cases in Spain.

In order to provide a better understanding of the nature of the confirmed cases data, some descriptive statistics of the number of confirmed cases are reported for the ten countries in Table 1. These are the length of the time series (N), minimum (Min.), mean, median, standard deviation (SD), coefficient of variation (CV) in percent, coefficient of skewness (Skew.), and maximum (Max.). Similar descriptive statistics of the number of deaths and recovered cases are presented in Tables 2 and 3.
Table 1
Descriptive statistics for confirmed cases series.
| Country | N | Min. | Mean | Median | SD | CV (%) | Skew. | Max. | ADF p-value |
|---|---|---|---|---|---|---|---|---|---|
| USA | 282 | 0 | 31719.624 | 30817.5 | 22133.642 | 70 | 0.139 | 88,521 | 0.964* |
| India | 274 | 0 | 29521.354 | 11480.0 | 32343.681 | 110 | 0.665 | 97,894 | >0.99* |
| Brazil | 247 | 0 | 22279.275 | 21704.0 | 17611.646 | 79 | 0.357 | 69,074 | >0.99* |
| Russia | 273 | 0 | 5752.549 | 5741.0 | 4325.621 | 75 | 0.422 | 17,418 | 0.917* |
| France | 280 | 0 | 4992.389 | 1609.0 | 8345.041 | 167 | 2.701 | 47,637 | >0.99* |
| Spain | 272 | 0 | 4595.511 | 2150.0 | 5160.027 | 112 | 1.177 | 23,580 | >0.99* |
| Argentina | 241 | 0 | 4746.058 | 2632.0 | 5114.032 | 108 | 0.798 | 18,326 | 0.549* |
| Colombia | 238 | 0 | 4403.592 | 3837.5 | 3896.218 | 88 | 0.319 | 13,056 | 0.939* |
| UK | 273 | 0 | 3510.769 | 1297.0 | 5286.346 | 151 | 2.476 | 26,707 | >0.99* |
| Mexico | 245 | 0 | 3629.910 | 4147.0 | 2415.718 | 67 | -0.167 | 9556 | 0.985* |
* Indicates a non-stationary time series based on the ADF test at significance level $\alpha$.
Table 2
Descriptive statistics for deaths series.
| Country | N | Min. | Mean | Median | SD | CV (%) | Skew. | Max. | ADF p-value |
|---|---|---|---|---|---|---|---|---|---|
| USA | 282 | 0 | 810.837 | 793.5 | 653.305 | 81 | 0.688 | 2609 | 0.918* |
| India | 274 | 0 | 434.296 | 335.0 | 415.217 | 96 | 0.437 | 1290 | >0.99* |
| Brazil | 247 | 0 | 644.113 | 632.0 | 434.916 | 68 | 0.009 | 1595 | >0.99* |
| Russia | 273 | 0 | 99.308 | 105.0 | 78.554 | 79 | 0.450 | 359 | >0.99* |
| France | 280 | 0 | 131.250 | 32.0 | 210.014 | 160 | 2.364 | 1122 | 0.811* |
| Spain | 272 | 0 | 146.485 | 47.5 | 217.083 | 148 | 1.912 | 961 | 0.551* |
| Argentina | 241 | 0 | 113.842 | 35.0 | 140.192 | 123 | 1.161 | 515 | 0.670* |
| Colombia | 238 | 0 | 130.134 | 141.0 | 114.745 | 88 | 0.421 | 400 | 0.982* |
| UK | 273 | 0 | 168.663 | 34.0 | 272.376 | 161 | 2.059 | 1224 | 0.922* |
| Mexico | 245 | 0 | 362.282 | 342.0 | 277.708 | 77 | 0.299 | 1092 | 0.942* |
* Indicates a non-stationary time series based on the ADF test at significance level $\alpha$.
Table 3
Descriptive statistics for recovered cases series.
| Country | N | Min. | Mean | Median | SD | CV (%) | Skew. | Max. | ADF p-value |
|---|---|---|---|---|---|---|---|---|---|
| USA | 282 | 0 | 12030.206 | 9181.5 | 11372.002 | 95 | 0.790 | 48,872 | 0.272* |
| India | 274 | 0 | 26910.128 | 8584.5 | 31935.463 | 119 | 0.758 | 101,468 | 0.868* |
| Brazil | 247 | 0 | 19381.733 | 18303.0 | 17437.521 | 90 | 0.625 | 76,649 | 0.950* |
| Russia | 273 | 0 | 4320.385 | 4342.0 | 3706.346 | 86 | 0.310 | 14,550 | >0.99* |
| France | 280 | 0 | 515.836 | 361.0 | 501.929 | 97 | 1.212 | 2266 | 0.976* |
| Spain | 272 | 0 | 552.853 | 0.0 | 1203.639 | 218 | 2.145 | 6399 | 0.641* |
| Argentina | 241 | 0 | 3569.515 | 934.0 | 4484.971 | 126 | 0.987 | 14,987 | 0.961* |
| Colombia | 238 | 0 | 3992.454 | 1644.0 | 4390.565 | 110 | 0.744 | 16,594 | 0.093* |
| UK | 273 | 0 | 9.018 | 5.0 | 12.145 | 135 | 1.973 | 58 | 0.091* |
| Mexico | 245 | 0 | 3159.686 | 3193.0 | 2404.058 | 76 | 0.254 | 10,915 | 0.959* |
* Indicates a non-stationary time series based on the ADF test at significance level $\alpha$.
The skewness coefficient indicates that all time series considered in this study are right skewed, except the number of confirmed cases in Mexico. Highly right-skewed time series have a high probability of extreme values. This suggests, firstly, that the median is a more appropriate measure of central tendency than the mean for the majority of the skewed series. Secondly, it is better to apply the coefficient of variation to compare the variability between countries. The last column in Tables 1, 2 and 3 shows the p-value of the Augmented Dickey-Fuller (ADF) unit root test, which is one of the most commonly used unit root tests in the literature. It tests the null hypothesis that an observable time series has a unit root against the alternative of stationarity [73]. The results of the ADF test, which are obtained using the function adf.test from the R package tseries [74], provide sound evidence that all of the time series used here are non-stationary. The skewness and non-stationary structure of the time series may have a detrimental effect on the forecasting results of linear time series models such as ARIMA.

It should be noted that the lengths of the time series differ because the starting time (the time of the first observation) differs across series. The starting time is defined as the first time that confirmed cases were reported by governments. Table 4 lists the starting time of the series for the ten countries.
Table 4
Starting time of series for ten countries.

| Country | Starting time |
|---|---|
| USA | 22-01-2020 |
| India | 30-01-2020 |
| Brazil | 26-02-2020 |
| Russia | 31-01-2020 |
| France | 24-01-2020 |
| Spain | 01-02-2020 |
| Argentina | 03-03-2020 |
| Colombia | 06-03-2020 |
| UK | 31-01-2020 |
| Mexico | 28-02-2020 |
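The link between the starting times in Table 4 and the series lengths can be illustrated with a short sketch. It assumes one observation per day, with no gaps, from the starting date up to and including 29 October 2020; the helper name `series_length` is illustrative only.

```python
from datetime import date

# Last observation date used in the study (29 October 2020).
END_DATE = date(2020, 10, 29)

# Starting dates copied from Table 4 (a subset, for illustration).
start_dates = {
    "USA": date(2020, 1, 22),
    "Brazil": date(2020, 2, 26),
    "Colombia": date(2020, 3, 6),
}


def series_length(start: date, end: date = END_DATE) -> int:
    """Number of daily observations from start to end, both inclusive."""
    return (end - start).days + 1


lengths = {country: series_length(d) for country, d in start_dates.items()}
print(lengths)
```

Under this assumption the lengths come out as 282, 247 and 238, which agree with the N values reported for these countries in the recovered-cases table.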
Empirical results
In this section, the performance of R-SSA, V-SSA and the other time series forecasting methods reviewed in Section 3 is evaluated by applying them to the confirmed cases, deaths, and recoveries series described in Section 4. The accuracy of the forecasting results is measured using RMSE. Steps 1–5 of Algorithm 1 are used to compute the RMSE of each forecasting method at each forecasting horizon h (as defined in (6)). It is noteworthy that the optimal pair (L, r), i.e. the window length and the number of leading components, is applied to produce the R-SSA and V-SSA forecasts. Table 5 reports the optimal (L, r) for the confirmed, deaths, and recovered series of the ten countries.
Table 5
Optimal (L, r) for SSA forecasting.

| Country | Time series | Forecasting method | h = 7 | h = 14 | h = 20 | h = 30 | h = 40 |
|---|---|---|---|---|---|---|---|
| USA | confirmed | R-SSA | (35,5) | (43,6) | (43,6) | (43,6) | (7,2) |
| | | V-SSA | (36,5) | (45,6) | (43,6) | (9,2) | (8,2) |
| | deaths | R-SSA | (73,12) | (73,12) | (70,11) | (72,12) | (73,12) |
| | | V-SSA | (80,12) | (80,12) | (80,12) | (77,14) | (82,13) |
| | recovered | R-SSA | (11,2) | (6,1) | (5,1) | (5,1) | (5,1) |
| | | V-SSA | (43,6) | (7,1) | (7,1) | (7,2) | (6,1) |
| India | confirmed | R-SSA | (15,5) | (14,3) | (14,3) | (7,2) | (7,2) |
| | | V-SSA | (15,5) | (15,5) | (11,2) | (2,1) | (2,1) |
| | deaths | R-SSA | (91,6) | (91,2) | (78,5) | (76,5) | (91,4) |
| | | V-SSA | (89,12) | (77,9) | (91,12) | (78,8) | (74,9) |
| | recovered | R-SSA | (2,1) | (2,1) | (2,1) | (2,1) | (2,1) |
| | | V-SSA | (2,1) | (2,1) | (2,1) | (2,1) | (7,2) |
| Brazil | confirmed | R-SSA | (25,5) | (25,5) | (25,5) | (25,5) | (5,1) |
| | | V-SSA | (26,6) | (32,5) | (32,5) | (32,5) | (11,2) |
| | deaths | R-SSA | (36,9) | (36,9) | (59,13) | (59,13) | (59,13) |
| | | V-SSA | (57,14) | (57,20) | (59,19) | (58,19) | (65,19) |
| | recovered | R-SSA | (7,1) | (4,1) | (3,1) | (3,1) | (3,1) |
| | | V-SSA | (9,2) | (6,2) | (6,2) | (6,2) | (6,2) |
| Russia | confirmed | R-SSA | (13,3) | (7,2) | (9,4) | (75,18) | (74,27) |
| | | V-SSA | (13,3) | (8,4) | (24,7) | (75,18) | (73,30) |
| | deaths | R-SSA | (29,7) | (14,6) | (14,6) | (36,7) | (36,7) |
| | | V-SSA | (34,7) | (33,7) | (37,7) | (37,7) | (37,7) |
| | recovered | R-SSA | (10,7) | (15,7) | (15,7) | (24,4) | (24,4) |
| | | V-SSA | (10,8) | (16,11) | (16,11) | (16,11) | (16,11) |
| France | confirmed | R-SSA | (25,4) | (67,8) | (67,8) | (67,7) | (67,4) |
| | | V-SSA | (25,5) | (60,15) | (89,8) | (93,9) | (67,5) |
| | deaths | R-SSA | (16,4) | (63,25) | (63,27) | (2,1) | (2,1) |
| | | V-SSA | (11,6) | (61,30) | (61,30) | (2,1) | (2,1) |
| | recovered | R-SSA | (2,1) | (75,14) | (9,1) | (8,1) | (2,1) |
| | | V-SSA | (4,1) | (83,14) | (64,26) | (6,1) | (2,1) |
| Spain | confirmed | R-SSA | (24,7) | (3,1) | (3,1) | (5,1) | (5,1) |
| | | V-SSA | (26,7) | (4,1) | (5,1) | (7,1) | (5,1) |
| | deaths | R-SSA | (14,2) | (3,1) | (3,1) | (2,1) | (2,1) |
| | | V-SSA | (18,2) | (4,1) | (4,1) | (2,1) | (3,1) |
| | recovered | R-SSA | (35,1) | (35,1) | (2,1) | (2,1) | (2,1) |
| | | V-SSA | (73,4) | (70,3) | (70,3) | (69,1) | (66,4) |
| Argentina | confirmed | R-SSA | (74,5) | (12,3) | (9,3) | (5,1) | (5,1) |
| | | V-SSA | (68,4) | (11,3) | (11,3) | (11,3) | (11,2) |
| | deaths | R-SSA | (17,4) | (26,2) | (26,2) | (20,1) | (13,1) |
| | | V-SSA | (13,4) | (33,4) | (27,4) | (18,2) | (15,2) |
| | recovered | R-SSA | (3,1) | (4,1) | (6,1) | (2,1) | (6,1) |
| | | V-SSA | (4,1) | (80,1) | (4,1) | (2,1) | (80,1) |
| Colombia | confirmed | R-SSA | (3,1) | (8,1) | (8,2) | (6,2) | (6,2) |
| | | V-SSA | (3,1) | (2,1) | (2,1) | (2,1) | (2,1) |
| | deaths | R-SSA | (3,1) | (2,1) | (6,2) | (6,2) | (3,1) |
| | | V-SSA | (7,2) | (2,1) | (2,1) | (2,1) | (2,1) |
| | recovered | R-SSA | (2,1) | (13,1) | (6,1) | (5,1) | (4,1) |
| | | V-SSA | (2,1) | (15,1) | (2,1) | (2,1) | (3,1) |
| UK | confirmed | R-SSA | (4,1) | (2,1) | (87,22) | (49,7) | (86,15) |
| | | V-SSA | (5,1) | (2,1) | (51,7) | (74,16) | (88,23) |
| | deaths | R-SSA | (21,18) | (13,11) | (13,11) | (2,1) | (2,1) |
| | | V-SSA | (24,20) | (24,20) | (2,1) | (4,1) | (6,1) |
| | recovered | R-SSA | (25,1) | (43,5) | (43,5) | (43,6) | (24,1) |
| | | V-SSA | (25,1) | (34,1) | (53,7) | (52,8) | (53,9) |
| Mexico | confirmed | R-SSA | (15,5) | (15,5) | (6,3) | (5,1) | (4,1) |
| | | V-SSA | (16,4) | (16,4) | (16,4) | (16,4) | (16,4) |
| | deaths | R-SSA | (41,5) | (34,5) | (34,5) | (56,3) | (57,3) |
| | | V-SSA | (62,10) | (61,10) | (42,24) | (36,5) | (36,5) |
| | recovered | R-SSA | (8,2) | (6,1) | (5,1) | (5,1) | (5,1) |
| | | V-SSA | (24,7) | (7,1) | (7,1) | (6,1) | (6,1) |
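Each pair in Table 5 is read here as a window length L and a number of leading components r. The sketch below shows how such a pair enters a recurrent SSA (R-SSA) forecast, using the standard textbook formulation: embedding, SVD, diagonal averaging, then the linear recurrence. It is not the author's implementation and does not reproduce Algorithm 1 or the search for the optimal pair; the function name `rssa_forecast` and the use of NumPy are assumptions made for illustration.

```python
import numpy as np


def rssa_forecast(series, L, r, h):
    """Recurrent SSA forecast, h steps ahead, with window L and r components."""
    y = np.asarray(series, dtype=float)
    n = y.size
    if not (1 <= r < L <= n - 1):
        raise ValueError("need 1 <= r < L <= n - 1")
    K = n - L + 1

    # Embedding: build the L x K trajectory (Hankel) matrix.
    X = np.column_stack([y[i:i + L] for i in range(K)])

    # SVD and rank-r approximation from the r leading components.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Xr = (U[:, :r] * s[:r]) @ Vt[:r, :]

    # Diagonal averaging turns the rank-r matrix back into a series.
    rec = np.zeros(n)
    counts = np.zeros(n)
    for i in range(L):
        for j in range(K):
            rec[i + j] += Xr[i, j]
            counts[i + j] += 1
    rec /= counts

    # Coefficients of the linear recurrence from the leading left vectors.
    pi = U[-1, :r]                      # last coordinate of each vector
    nu2 = float(pi @ pi)
    if nu2 >= 1.0:
        raise ValueError("verticality coefficient must be below 1")
    R = (U[:-1, :r] @ pi) / (1.0 - nu2)  # length L - 1, oldest lag first

    # Apply the recurrence h times to the reconstructed series.
    out = list(rec)
    for _ in range(h):
        out.append(float(np.dot(R, out[-(L - 1):])))
    return np.array(out[n:])


# Usage on a synthetic linear trend (illustrative data, not COVID-19 data).
trend = [2 * t + 1 for t in range(30)]
print(rssa_forecast(trend, L=5, r=2, h=3))
```

A linear trend has a trajectory matrix of rank 2, so the choice r = 2 continues it exactly. On real series, small pairs such as (2,1) correspond to a very short window with a single component, whereas large pairs such as (73,12) retain much more structure from the series.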
Table 6 shows the rounded RMSEs of forecasting the number of confirmed cases for the ten countries, calculated for each forecasting method and forecasting horizon. The RMSEs of forecasting the number of deaths and recovered cases are reported in Tables 7 and 8. The bold font in these tables marks the forecasting method with the lowest RMSE at each horizon for a given country. The last column of these tables gives the average RMSE across all forecasting horizons for a given forecasting method.
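Table 6
The RMSE of forecasting the number of confirmed cases.

| Country | Forecasting method | h = 7 | h = 14 | h = 20 | h = 30 | h = 40 | Avg. |
|---|---|---|---|---|---|---|---|
| USA | R-SSA | 6881 | 8585 | 9920 | 13,184 | 16,090 | 10,932 |
| | V-SSA | **6735** | **8517** | 10,535 | 13,610 | 16,110 | 11,101 |
| | ARIMA | 9231 | 13,020 | 16,195 | 21,521 | 28,557 | 17,705 |
| | ARFIMA | 10,562 | 10,974 | 11,112 | 11,218 | 11,791 | 11,131 |
| | ETS | 9472 | 10,447 | 11,261 | 12,288 | 13,356 | 11,365 |
| | TBATS | 7549 | 9567 | 11,130 | 14,021 | 16,894 | 11,832 |
| | NNAR | 7832 | 8747 | **9672** | **9522** | **9997** | **9154** |
| India | R-SSA | **6478** | 11,786 | 18,024 | 29,677 | 43,564 | 21,906 |
| | V-SSA | 6737 | 12,561 | 19,574 | 30,265 | 44,493 | 22,726 |
| | ARIMA | 8620 | 13,942 | 19,149 | 26,074 | 32,109 | 19,979 |
| | ARFIMA | 15,317 | 22,366 | 27,189 | 34,542 | 41,617 | 28,206 |
| | ETS | 8007 | 12,027 | 16,615 | 25,654 | 33,955 | 19,252 |
| | TBATS | 7282 | **10,760** | 14,748 | 22,170 | 30,457 | 17,083 |
| | NNAR | 9172 | 11,763 | **13,523** | **16,104** | **18,241** | **13,761** |
| Brazil | R-SSA | **8656** | **9932** | **10,732** | 13,301 | 15,336 | 11,591 |
| | V-SSA | 8920 | 10,307 | 11,200 | 13,877 | 16,406 | 12,142 |
| | ARIMA | 11,390 | 15,895 | 21,312 | 35,610 | 56,389 | 28,119 |
| | ARFIMA | 12,150 | 12,706 | 13,164 | 13,673 | 14,195 | 13,178 |
| | ETS | 11,227 | 11,603 | 11,947 | 12,991 | 14,265 | 12,407 |
| | TBATS | 9737 | 10,420 | 10,732 | **11,795** | **13,057** | **11,148** |
| | NNAR | 9242 | 10,243 | 11,052 | 12,641 | 13,620 | 11,360 |
| Russia | R-SSA | 405 | 815 | 1285 | **1481** | **2081** | **1213** |
| | V-SSA | **396** | 812 | 1267 | 1507 | 2186 | 1234 |
| | ARIMA | 427 | **720** | **1065** | 1902 | 2669 | 1357 |
| | ARFIMA | 666 | 1243 | 1827 | 2524 | 2816 | 1815 |
| | ETS | 454 | 858 | 1299 | 2109 | 2705 | 1485 |
| | TBATS | 473 | 854 | 1283 | 2046 | 2553 | 1442 |
| | NNAR | 934 | 1656 | 2182 | 2598 | 2717 | 2017 |
| France | R-SSA | 2104 | **2299** | **2488** | 2837 | 3545 | 2655 |
| | V-SSA | **2094** | 2301 | 2524 | **2685** | **3377** | **2596** |
| | ARIMA | 2563 | 3460 | 4336 | 5192 | 6517 | 4414 |
| | ARFIMA | 4559 | 5650 | 6259 | 7349 | 8380 | 6439 |
| | ETS | 2360 | 2969 | 3667 | 4627 | 5740 | 3873 |
| | TBATS | 2298 | 2873 | 3538 | 4294 | 5313 | 3663 |
| | NNAR | 3062 | 3979 | 4754 | 5921 | 7494 | 5042 |
| Spain | R-SSA | **1273** | 1708 | 1999 | 2495 | **2920** | **2079** |
| | V-SSA | 1291 | **1700** | **1979** | 2485 | 2958 | 2083 |
| | ARIMA | 1474 | 1913 | 2187 | 2826 | 4484 | 2577 |
| | ARFIMA | 2349 | 3412 | 4179 | 5250 | 6186 | 4275 |
| | ETS | 1521 | 1953 | 2207 | **2473** | 2934 | 2218 |
| | TBATS | 1524 | 1996 | 2286 | 2561 | 3195 | 2312 |
| | NNAR | 2538 | 2812 | 3158 | 3938 | 4828 | 3455 |
| Argentina | R-SSA | **1498** | 1904 | 2297 | 2965 | 3889 | 2511 |
| | V-SSA | 1583 | **1878** | 2217 | 3076 | 4152 | 2581 |
| | ARIMA | 1825 | 2208 | 2574 | 2769 | **2792** | 2434 |
| | ARFIMA | 3063 | 3873 | 4546 | 5376 | 6033 | 4578 |
| | ETS | 2422 | 2566 | 2797 | 2978 | 3071 | 2767 |
| | TBATS | 1789 | 1973 | **2204** | **2610** | 2925 | **2300** |
| | NNAR | 2650 | 3247 | 3616 | 4103 | 4640 | 3651 |
| Colombia | R-SSA | 1500 | 2339 | 2451 | 3492 | 4893 | 2935 |
| | V-SSA | 1507 | 2026 | 2529 | 3685 | 5376 | 3025 |
| | ARIMA | 1778 | 2582 | 3409 | 5188 | 7698 | 4131 |
| | ARFIMA | 1529 | 1686 | 1779 | 2014 | 2282 | 1858 |
| | ETS | 1628 | 2247 | 2937 | 4369 | 6355 | 3507 |
| | TBATS | 1371 | 1715 | 2080 | 2778 | 3744 | 2338 |
| | NNAR | **1355** | **1583** | **1727** | **2009** | **2276** | **1790** |
| UK | R-SSA | 1659 | 2296 | 2357 | 2464 | 3192 | 2394 |
| | V-SSA | 1642 | **2296** | **2039** | **2373** | **3031** | **2276** |
| | ARIMA | 1736 | 2522 | 3041 | 3714 | 4612 | 3125 |
| | ARFIMA | 2412 | 3191 | 3674 | 4361 | 5074 | 3742 |
| | ETS | 1622 | 2409 | 3161 | 4134 | 4882 | 3242 |
| | TBATS | **1618** | 2353 | 2881 | 3334 | 3859 | 2809 |
| | NNAR | 2148 | 2820 | 3311 | 4185 | 4687 | 3430 |
| Mexico | R-SSA | 847 | 1015 | 1108 | 1299 | 1502 | 1154 |
| | V-SSA | **830** | **886** | **939** | **1095** | **1293** | **1009** |
| | ARIMA | 1135 | 1438 | 1753 | 2584 | 3828 | 2148 |
| | ARFIMA | 1238 | 1244 | 1250 | 1303 | 1365 | 1280 |
| | ETS | 1050 | 1088 | 1119 | 1234 | 1430 | 1184 |
| | TBATS | 1028 | 1074 | 1111 | 1216 | 1316 | 1149 |
| | NNAR | 1059 | 1091 | 1130 | 1219 | 1310 | 1162 |

Note: The bold font shows the forecasting method with the lowest RMSE at each horizon for a given country.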
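The quantities in Table 6 can be illustrated with a short sketch. The `rmse` function below uses the usual definition of root mean squared error; the paper's own definition is equation (6), which lies outside this section, so treating the two as identical is an assumption. The second part reproduces the Avg. entry for USA, R-SSA from the five horizon values in Table 6.

```python
from math import sqrt


def rmse(actual, forecast):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return sqrt(sum(squared) / len(squared))


# RMSE values for USA, R-SSA, at h = 7, 14, 20, 30, 40 (copied from Table 6).
usa_rssa = [6881, 8585, 9920, 13184, 16090]

# The Avg. column is the mean of the five horizon-specific RMSEs.
avg_rmse = round(sum(usa_rssa) / len(usa_rssa))
print(avg_rmse)
```

Because the tables report rounded RMSEs, an average recomputed from the rounded entries can differ from the published Avg. value in the last digit for some rows.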
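Table 7
The RMSE of forecasting the number of deaths.

| Country | Forecasting method | h = 7 | h = 14 | h = 20 | h = 30 | h = 40 | Avg. |
|---|---|---|---|---|---|---|---|
| USA | R-SSA | 121 | **128** | **134** | **141** | **158** | **136** |
| | V-SSA | **120** | 129 | 139 | 150 | 168 | 141 |
| | ARIMA | 264 | 312 | 361 | 458 | 599 | 399 |
| | ARFIMA | 360 | 367 | 377 | 379 | 386 | 374 |
| | ETS | 380 | 395 | 412 | 438 | 469 | 419 |
| | TBATS | 222 | 250 | 282 | 337 | 386 | 295 |
| | NNAR | 172 | 184 | 191 | 203 | 216 | 193 |
| India | R-SSA | 102 | 128 | 143 | 168 | 191 | 146 |
| | V-SSA | **98** | **119** | **128** | **147** | **149** | **128** |
| | ARIMA | 114 | 162 | 213 | 285 | 343 | 223 |
| | ARFIMA | 194 | 263 | 309 | 376 | 447 | 318 |
| | ETS | 112 | 152 | 202 | 292 | 370 | 226 |
| | TBATS | 107 | 140 | 178 | 231 | 265 | 184 |
| | NNAR | 123 | 146 | 161 | 172 | 180 | 156 |
| Brazil | R-SSA | **143** | **155** | **164** | 179 | 200 | 168 |
| | V-SSA | 144 | 157 | 165 | **176** | **190** | **166** |
| | ARIMA | 268 | 386 | 555 | 1024 | 1928 | 832 |
| | ARFIMA | 298 | 306 | 313 | 314 | 320 | 310 |
| | ETS | 267 | 282 | 297 | 316 | 348 | 302 |
| | TBATS | 209 | 222 | 239 | 255 | 270 | 239 |
| | NNAR | 231 | 253 | 269 | 294 | 317 | 273 |
| Russia | R-SSA | 19 | 24 | 30 | 37 | 42 | 30 |
| | V-SSA | **18** | **23** | **28** | **35** | **40** | **29** |
| | ARIMA | 32 | 38 | 39 | 42 | 42 | 39 |
| | ARFIMA | 44 | 45 | 48 | 54 | 58 | 50 |
| | ETS | 37 | 42 | 46 | 52 | 56 | 47 |
| | TBATS | 33 | 38 | 41 | 46 | 51 | 42 |
| | NNAR | 35 | 39 | 42 | 46 | 47 | 42 |
| France | R-SSA | **37** | 45 | 45 | 50 | 56 | 47 |
| | V-SSA | 37 | 45 | 45 | 50 | 56 | 47 |
| | ARIMA | 42 | 45 | 45 | 49 | 55 | 47 |
| | ARFIMA | 42 | 44 | 44 | 46 | **51** | 45 |
| | ETS | 39 | **42** | **42** | 49 | 55 | 45 |
| | TBATS | 39 | 43 | 43 | **46** | 53 | **45** |
| | NNAR | 47 | 57 | 65 | 68 | 65 | 60 |
| Spain | R-SSA | 52 | 62 | 69 | 79 | 89 | 70 |
| | V-SSA | 52 | 61 | 68 | 79 | 88 | 70 |
| | ARIMA | **50** | 58 | 63 | **74** | 88 | **67** |
| | ARFIMA | 56 | 65 | 72 | 86 | 101 | 76 |
| | ETS | 50 | **57** | **63** | 75 | 90 | 67 |
| | TBATS | 51 | 60 | 67 | 75 | **86** | 68 |
| | NNAR | 87 | 95 | 119 | 138 | 141 | 116 |
| Argentina | R-SSA | 87 | 100 | 110 | 127 | 141 | 113 |
| | V-SSA | **86** | **98** | **108** | 124 | 139 | 111 |
| | ARIMA | 97 | 108 | 113 | **117** | **113** | **110** |
| | ARFIMA | 108 | 134 | 153 | 182 | 200 | 155 |
| | ETS | 97 | 112 | 126 | 135 | 128 | 120 |
| | TBATS | 106 | 128 | 155 | 200 | 137 | 145 |
| | NNAR | 101 | 122 | 128 | 134 | 145 | 126 |
| Colombia | R-SSA | 38 | 52 | 69 | 107 | 162 | 85 |
| | V-SSA | 38 | 52 | 69 | 109 | 165 | 87 |
| | ARIMA | 45 | 66 | 91 | 146 | 225 | 114 |
| | ARFIMA | **34** | **38** | **39** | **40** | **42** | **38** |
| | ETS | 36 | 46 | 57 | 84 | 116 | 68 |
| | TBATS | 35 | 43 | 52 | 73 | 98 | 60 |
| | NNAR | 37 | 47 | 56 | 75 | 97 | 63 |
| UK | R-SSA | 23 | 28 | 27 | 33 | 36 | 29 |
| | V-SSA | 22 | 28 | 31 | 33 | 35 | 30 |
| | ARIMA | 27 | 31 | 33 | 35 | 36 | 32 |
| | ARFIMA | 26 | 29 | 32 | 33 | **35** | 31 |
| | ETS | 20 | **21** | **24** | 34 | 42 | 28 |
| | TBATS | **18** | 22 | 26 | **33** | 40 | **28** |
| | NNAR | 27 | 34 | 37 | 48 | 52 | 40 |
| Mexico | R-SSA | **113** | **124** | **135** | **151** | **166** | **138** |
| | V-SSA | 122 | 132 | 147 | 159 | 174 | 147 |
| | ARIMA | 179 | 226 | 280 | 429 | 679 | 359 |
| | ARFIMA | 185 | 185 | 185 | 189 | 195 | 188 |
| | ETS | 181 | 179 | 180 | 190 | 205 | 187 |
| | TBATS | 180 | 187 | 192 | 214 | 246 | 204 |
| | NNAR | 158 | 170 | 195 | 215 | 238 | 195 |

Note: The bold font shows the forecasting method with the lowest RMSE at each horizon for a given country.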
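Table 8
The RMSE of forecasting the number of recovered cases.

| Country | Forecasting method | h = 7 | h = 14 | h = 20 | h = 30 | h = 40 | Avg. |
|---|---|---|---|---|---|---|---|
| USA | R-SSA | 8442 | 8960 | 9568 | 11,089 | 13,439 | 10,300 |
| | V-SSA | **8174** | 9053 | 9716 | 11,497 | 14,179 | 10,524 |
| | ARIMA | 9014 | 11,079 | 14,105 | 21,463 | 33,801 | 17,892 |
| | ARFIMA | 9584 | 10,071 | 10,627 | 11,743 | 13,231 | 11,051 |
| | ETS | 8320 | **8423** | **8506** | **8870** | **9181** | **8660** |
| | TBATS | 8226 | 8557 | 8984 | 10,033 | 11,631 | 9486 |
| | NNAR | 8805 | 9034 | 9366 | 9713 | 9818 | 9347 |
| India | R-SSA | 7531 | 12,050 | 16,991 | 28,351 | 41,859 | 21,356 |
| | V-SSA | 7531 | 12,050 | 16,991 | 28,351 | 41,540 | 21,293 |
| | ARIMA | 9080 | 15,161 | 21,918 | 37,759 | 58,131 | 28,410 |
| | ARFIMA | 13,458 | 20,923 | 27,035 | 36,878 | 45,806 | 28,820 |
| | ETS | 8466 | 12,619 | 17,074 | 27,410 | 37,001 | 20,514 |
| | TBATS | **6471** | **9381** | **12,652** | **19,206** | **21,120** | **13,766** |
| | NNAR | 8573 | 12,971 | 16,084 | 20,424 | 23,135 | 16,237 |
| Brazil | R-SSA | 10,689 | 12,312 | 13,941 | 16,879 | 21,358 | 15,036 |
| | V-SSA | 10,602 | 12,027 | 13,639 | 16,811 | 21,616 | 14,939 |
| | ARIMA | 12,662 | 17,031 | 23,130 | 37,745 | 60,446 | 30,203 |
| | ARFIMA | 10,648 | 11,103 | 11,764 | 12,296 | 12,732 | 11,709 |
| | ETS | 10,982 | 12,809 | 14,955 | 20,654 | 29,794 | 17,839 |
| | TBATS | **9390** | **10,128** | **10,762** | **11,721** | **12,275** | **10,855** |
| | NNAR | 11,293 | 11,959 | 12,354 | 12,474 | 12,326 | 12,081 |
| Russia | R-SSA | **733** | 1049 | 1256 | 1424 | 1592 | 1211 |
| | V-SSA | 755 | **1030** | **1201** | **1365** | **1513** | **1173** |
| | ARIMA | 1744 | 2549 | 3490 | 5740 | 9530 | 4611 |
| | ARFIMA | 2163 | 2254 | 2298 | 2463 | 2587 | 2353 |
| | ETS | 1789 | 1846 | 1935 | 2055 | 2106 | 1946 |
| | TBATS | 1400 | 1643 | 1761 | 1850 | 1900 | 1711 |
| | NNAR | 1080 | 1382 | 1641 | 1977 | 2193 | 1655 |
| France | R-SSA | 242 | 276 | 293 | 305 | 337 | 291 |
| | V-SSA | **235** | **263** | **279** | **300** | **337** | **283** |
| | ARIMA | 243 | 284 | 301 | 316 | 356 | 300 |
| | ARFIMA | 273 | 304 | 311 | 325 | 363 | 315 |
| | ETS | 238 | 275 | 293 | 310 | 346 | 292 |
| | TBATS | 239 | 282 | 302 | 313 | 338 | 295 |
| | NNAR | 278 | 320 | 355 | 368 | 381 | 340 |
| Argentina | R-SSA | 1395 | 1885 | 2416 | 3978 | 7003 | 3335 |
| | V-SSA | 1397 | 1880 | 2433 | 3978 | 6883 | 3314 |
| | ARIMA | **1299** | **1566** | 1881 | 2788 | 4240 | 2355 |
| | ARFIMA | 2360 | 3336 | 4122 | 5333 | 6164 | 4263 |
| | ETS | 1307 | 1568 | **1860** | 2666 | 3906 | 2262 |
| | TBATS | 1336 | 1634 | 1884 | **2312** | **2886** | **2010** |
| | NNAR | 1536 | 1895 | 2156 | 2695 | 3213 | 2299 |
| Colombia | R-SSA | 3473 | 4889 | 5898 | 8670 | 13,436 | 7273 |
| | V-SSA | 3473 | 4947 | 6040 | 9030 | 14,151 | 7528 |
| | ARIMA | 3700 | 5329 | 6573 | 9632 | 13,955 | 7838 |
| | ARFIMA | 3216 | 4041 | 4206 | 4342 | 4685 | 4098 |
| | ETS | 3502 | 4773 | 5516 | 7557 | 10,932 | 6456 |
| | TBATS | 3253 | 4087 | 4349 | 5105 | 6251 | 4609 |
| | NNAR | **3115** | **3734** | **3760** | **4120** | **4430** | **3832** |
| UK | R-SSA | 12 | 13 | 14 | 15 | 16 | 14 |
| | V-SSA | **12** | 13 | 14 | **15** | **15** | **14** |
| | ARIMA | 13 | 15 | 16 | 18 | 18 | 16 |
| | ARFIMA | 14 | 16 | 17 | 18 | 18 | 17 |
| | ETS | 13 | 15 | 16 | 17 | 17 | 16 |
| | TBATS | 12 | **13** | **14** | 15 | 15 | 14 |
| | NNAR | 14 | 16 | 602,997 | 18 | 4,907,095 | 1,102,028 |
| Mexico | R-SSA | 1442 | 1570 | 1639 | 1749 | 1955 | 1671 |
| | V-SSA | 1441 | 1583 | 1664 | 1824 | 2102 | 1723 |
| | ARIMA | 1658 | 1903 | 2199 | 2873 | 3892 | 2505 |
| | ARFIMA | 1637 | 1637 | 1649 | 1626 | 1630 | 1636 |
| | ETS | 1578 | 1601 | **1629** | **1608** | 1653 | 1614 |
| | TBATS | 1528 | 1571 | 1632 | 1613 | **1624** | **1594** |
| | NNAR | **1402** | **1551** | 1655 | 1864 | 2033 | 1701 |

Note: The bold font shows the forecasting method with the lowest RMSE at each horizon for a given country.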
Using the RMSEs reported in Tables 6, 7 and 8, we can determine the best forecasting technique, i.e. the one with the minimum RMSE. For example, the best model for forecasting the number of deaths in the USA at forecasting horizon 40 is R-SSA. The best model for forecasting the confirmed series of each country is presented in Table 9. Similarly, the best models for forecasting deaths and recoveries are reported in Tables 10 and 11. The last column of Tables 9, 10 and 11 shows the best model on average for a given country, which corresponds to the lowest RMSE in the last column of Tables 6, 7 and 8. The first finding from Tables 9, 10 and 11 is that no single model provides the best forecast of the number of confirmed cases, deaths, and recoveries for all ten countries considered here. Secondly, based on the number of times that the R-SSA and V-SSA techniques outperform the other models across all horizons, we can suggest that the two SSA models are viable options for forecasting the number of daily confirmed cases, deaths, and recoveries caused by COVID-19. Another interesting finding is that ARFIMA is the best model for forecasting the number of deaths in Colombia across all forecasting horizons. This suggests that the time series of deaths in Colombia exhibits long-range dependence.
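The selection rule described above (pick the method with the minimum RMSE) can be sketched as follows, using the USA deaths values at horizon 40 from Table 7. The helper name `best_model` is illustrative only.

```python
# RMSE of each method for USA deaths at h = 40, copied from Table 7.
usa_deaths_h40 = {
    "R-SSA": 158,
    "V-SSA": 168,
    "ARIMA": 599,
    "ARFIMA": 386,
    "ETS": 469,
    "TBATS": 386,
    "NNAR": 216,
}


def best_model(rmse_by_method):
    """Return the name of the method with the smallest RMSE."""
    return min(rmse_by_method, key=rmse_by_method.get)


print(best_model(usa_deaths_h40))
```

With rounded RMSEs, several methods can tie at a given horizon. This sketch then returns whichever tied method appears first in the dictionary, whereas the published tables were presumably built from unrounded values.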
Table 9
The best model for forecasting the COVID-19 confirmed cases.

| Country | h = 7 | h = 14 | h = 20 | h = 30 | h = 40 | Avg. |
|---|---|---|---|---|---|---|
| USA | V-SSA | V-SSA | NNAR | NNAR | NNAR | NNAR |
| India | R-SSA | TBATS | NNAR | NNAR | NNAR | NNAR |
| Brazil | R-SSA | R-SSA | R-SSA | TBATS | TBATS | TBATS |
| Russia | V-SSA | ARIMA | ARIMA | R-SSA | R-SSA | R-SSA |
| France | V-SSA | R-SSA | R-SSA | V-SSA | V-SSA | V-SSA |
| Spain | R-SSA | V-SSA | V-SSA | ETS | R-SSA | R-SSA |
| Argentina | R-SSA | V-SSA | TBATS | TBATS | ARIMA | TBATS |
| Colombia | NNAR | NNAR | NNAR | NNAR | NNAR | NNAR |
| UK | TBATS | V-SSA | V-SSA | V-SSA | V-SSA | V-SSA |
| Mexico | V-SSA | V-SSA | V-SSA | V-SSA | V-SSA | V-SSA |
Table 10
The best model for forecasting the number of deaths caused by COVID-19.

| Country | h = 7 | h = 14 | h = 20 | h = 30 | h = 40 | Avg. |
|---|---|---|---|---|---|---|
| USA | V-SSA | R-SSA | R-SSA | R-SSA | R-SSA | R-SSA |
| India | V-SSA | V-SSA | V-SSA | V-SSA | V-SSA | V-SSA |
| Brazil | R-SSA | R-SSA | R-SSA | V-SSA | V-SSA | V-SSA |
| Russia | V-SSA | V-SSA | V-SSA | V-SSA | V-SSA | V-SSA |
| France | R-SSA | ETS | ETS | TBATS | ARFIMA | TBATS |
| Spain | ARIMA | ETS | ETS | ARIMA | TBATS | ARIMA |
| Argentina | V-SSA | V-SSA | V-SSA | ARIMA | ARIMA | ARIMA |
| Colombia | ARFIMA | ARFIMA | ARFIMA | ARFIMA | ARFIMA | ARFIMA |
| UK | TBATS | ETS | ETS | TBATS | ARFIMA | TBATS |
| Mexico | R-SSA | R-SSA | R-SSA | R-SSA | R-SSA | R-SSA |
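The argument that SSA is a viable option rests on how often it is the best model. As an illustration, the sketch below tallies the per-horizon winners in Table 10 (deaths), excluding the Avg. column. The entries are copied from the table; the tallying itself is only a simple way to express the counting, not a procedure described in the paper.

```python
from collections import Counter

# Best model per horizon (h = 7, 14, 20, 30, 40), copied from Table 10.
best_deaths = {
    "USA": ["V-SSA", "R-SSA", "R-SSA", "R-SSA", "R-SSA"],
    "India": ["V-SSA"] * 5,
    "Brazil": ["R-SSA", "R-SSA", "R-SSA", "V-SSA", "V-SSA"],
    "Russia": ["V-SSA"] * 5,
    "France": ["R-SSA", "ETS", "ETS", "TBATS", "ARFIMA"],
    "Spain": ["ARIMA", "ETS", "ETS", "ARIMA", "TBATS"],
    "Argentina": ["V-SSA", "V-SSA", "V-SSA", "ARIMA", "ARIMA"],
    "Colombia": ["ARFIMA"] * 5,
    "UK": ["TBATS", "ETS", "ETS", "TBATS", "ARFIMA"],
    "Mexico": ["R-SSA"] * 5,
}

# Count how often each method wins across all country-horizon cells.
wins = Counter(model for row in best_deaths.values() for model in row)
ssa_wins = wins["R-SSA"] + wins["V-SSA"]
total_cells = sum(wins.values())
print(wins.most_common())
print(ssa_wins, "of", total_cells)
```

For the deaths series, the two SSA variants together account for 29 of the 50 country-horizon cells. The same tally can be repeated for Tables 9 and 11.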
Table 11
The best model for forecasting the recovered cases.

| Country | h = 7 | h = 14 | h = 20 | h = 30 | h = 40 | Avg. |
|---|---|---|---|---|---|---|
| USA | V-SSA | ETS | ETS | ETS | ETS | ETS |
| India | TBATS | TBATS | TBATS | TBATS | TBATS | TBATS |
| Brazil | TBATS | TBATS | TBATS | TBATS | TBATS | TBATS |
| Russia | R-SSA | V-SSA | V-SSA | V-SSA | V-SSA | V-SSA |
| France | V-SSA | V-SSA | V-SSA | V-SSA | V-SSA | V-SSA |
| Argentina | ARIMA | ARIMA | ETS | TBATS | TBATS | TBATS |
| Colombia | NNAR | NNAR | NNAR | NNAR | NNAR | NNAR |
| UK | V-SSA | TBATS | TBATS | V-SSA | V-SSA | V-SSA |
| Mexico | NNAR | NNAR | ETS | ETS | TBATS | TBATS |
The results of Tables 9, 10 and 11 are useful to practitioners in two ways. First, they show which model is best for forecasting at a particular horizon for a given country. Second, they enable practitioners to select the best model on average for a given country across all forecasting horizons.

Using the results given in Tables 9, 10 and 11, we can provide forecasts of the number of confirmed cases, deaths, and recoveries caused by COVID-19 at different forecasting horizons. Figs. 7, 8 and 9 depict the original time series (black circles) together with the 40-days-ahead point forecasts (red squares) for the number of confirmed cases, deaths, and recoveries. The forecasts shown in Fig. 7 indicate that there will be a dramatic increase in the number of confirmed cases in France, Spain, and the UK. However, the rate of growth will be slower in Russia and Argentina, and the increase will be slow in India, Brazil, Colombia, and Mexico. These results also indicate that the number of confirmed cases will decrease in the USA.
Fig. 7
Plot of 40 days ahead point forecasts of confirmed cases starting from October 30, 2020.
Fig. 8
Plot of 40 days ahead point forecasts of deaths starting from October 30, 2020.
Fig. 9
Plot of 40 days ahead point forecasts of recovered cases starting from October 30, 2020.
It can be concluded from Fig. 8 that there will be a considerable increase in the number of deaths in Russia and Argentina, whereas the number will decrease in India, France, Colombia, and the UK. The number of deaths will fluctuate around an almost constant value in the USA, Brazil, and Mexico.

According to the forecasts depicted in Fig. 9, the number of recovered cases will rise in the USA, Russia, France, and Argentina, whereas there will be a decline in Brazil and especially in India. This quantity will tend to a constant in Colombia and Mexico. In addition, the number of recovered cases in the UK will fluctuate, but the trend will be upward.
Discussion
In this study, we have evaluated the potential advantages of SSA for forecasting the number of daily confirmed cases, deaths, and recoveries caused by COVID-19. An algorithm has been proposed to calculate the optimal parameters of SSA, namely the window length and the number of leading components. The results of R-SSA and V-SSA have been compared to those from other conventional time series forecasting techniques including ARIMA, ARFIMA, ETS, TBATS, and NNAR. The CSSE dataset at Johns Hopkins University has been adopted to forecast the number of daily confirmed cases, deaths, and recoveries for the top ten affected countries until 29 October 2020.

It should be noted that the CSSE dataset has a considerable disadvantage. When the cumulative data of confirmed cases, deaths, and recoveries are transformed into daily data, some negative values are obtained, which are clearly implausible. To deal with this issue, we first treated the negative values and outliers as missing data. These missing values were then imputed by the Kalman smoothing method. It is worth mentioning that the present study is unique in using the optimal versions of V-SSA and R-SSA and comparing the results to those from ARIMA, ARFIMA, ETS, TBATS, and NNAR.

The findings of this study can be summarised as follows:

No single model can provide the best forecast of the number of confirmed cases, deaths, and recoveries for all ten countries considered here.

Based on the number of times that the R-SSA and V-SSA forecasting techniques outperform the other models across all horizons, these methods are viable options for forecasting the number of daily confirmed cases, deaths, and recoveries caused by COVID-19.

There will be a rapid rise in the number of confirmed cases in France, Spain, and the UK. However, the rate of increase will be slower in Russia and Argentina, and the growth will be slow in India, Brazil, Colombia, and Mexico. The point forecasts indicate that the number of confirmed cases will decrease in the USA.

The number of deaths in Russia and Argentina will increase dramatically, whereas it will decrease in India, France, Colombia, and the UK. The number of deaths will fluctuate around an almost constant value in the USA, Brazil, and Mexico.

There will be an increase in the number of recovered cases in the USA, Russia, France, and Argentina, whereas there will be a decline in Brazil and especially in India. This quantity will tend to a constant in Colombia and Mexico. Also, the number of recovered cases in the UK will fluctuate, but the trend will be upward.
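The data-cleaning step mentioned in this discussion (differencing the cumulative counts and treating negative daily values as missing) can be sketched as follows. Two caveats apply: the study imputes missing values by Kalman smoothing and also treats outliers as missing, whereas this sketch substitutes plain linear interpolation as a simple stand-in and does not detect outliers. The function names and the example numbers are hypothetical.

```python
def daily_from_cumulative(cumulative):
    """Difference a cumulative series; negative increments become None."""
    # Assumption: the first daily value equals the first cumulative value.
    daily = [cumulative[0]]
    for prev, curr in zip(cumulative, cumulative[1:]):
        diff = curr - prev
        daily.append(diff if diff >= 0 else None)
    return daily


def interpolate_missing(values):
    """Fill None entries by linear interpolation between known neighbours.

    This is a stand-in for Kalman smoothing, not an implementation of it.
    Missing values at either end are filled with the nearest known value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("at least one observed value is required")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


# Hypothetical cumulative counts with one downward revision (15 -> 14).
cumulative_counts = [10, 15, 14, 20, 26]
daily_counts = daily_from_cumulative(cumulative_counts)
cleaned_counts = interpolate_missing(daily_counts)
print(daily_counts)
print(cleaned_counts)
```

A downward revision in a cumulative count produces exactly one negative daily difference, which is why such values are treated as missing rather than as genuine observations.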
Conclusion
In this paper, we have used the optimal versions of two SSA forecasting techniques, namely R-SSA and V-SSA, for forecasting the number of daily confirmed cases, deaths, and recoveries caused by COVID-19. To evaluate the performance of these approaches based on the RMSE criterion, the forecasting results have been compared to those from other commonly used time series forecasting methods including ARIMA, ARFIMA, ETS, TBATS, and NNAR. We considered only the top ten countries in terms of the number of cumulative confirmed cases: the USA, India, Brazil, Russia, France, Spain, Argentina, Colombia, the UK, and Mexico. The evidence from this investigation shows that no single model provides the best forecast for all of the countries and forecasting horizons considered in this study. However, we have found that the optimal SSA technique can provide a powerful tool for forecasting the number of daily confirmed cases, deaths, and recoveries caused by COVID-19, based on the number of times that it outperforms the competing models.

Our study has an obvious shortcoming. The forecasting methods used in this investigation may produce some negative point forecasts, which are clearly meaningless for the number of confirmed cases, deaths, and recoveries. To obtain non-negative point forecasts, we suggest using count time series models. This work has gone some way towards enhancing our understanding of the capabilities of SSA for forecasting the COVID-19 pandemic. The results of this study enable forecasters to choose the most appropriate model (from those considered here) based on the country and horizon for forecasting the number of confirmed cases, deaths, and recoveries caused by COVID-19. We hope that our forecasts will be a useful tool for governments in making appropriate decisions to control the disease and prevent further damage.

In terms of future research, we will apply the multivariate version of SSA, which exploits the time-dependent correlations between several time series.
CRediT authorship contribution statement
Mahdi Kalantari: Conceptualization, Methodology, Software, Data curation, Visualization, Validation, Writing - original draft, Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.