
Forecasting COVID-19 pandemic using optimal singular spectrum analysis.

Abstract

Coronavirus disease 2019 (COVID-19) is a pandemic that has affected all countries in the world. The aim of this study is to examine the potential advantages of Singular Spectrum Analysis (SSA) for forecasting the numbers of daily confirmed cases, deaths, and recoveries caused by COVID-19, which are the three main variables of interest. This paper contributes to the literature on forecasting the COVID-19 pandemic in several ways. Firstly, an algorithm is proposed to calculate the optimal parameters of SSA, namely the window length and the number of leading components. Secondly, the results of two SSA forecasting approaches, vector and recurrent forecasting, are compared with those of other commonly used time series forecasting techniques. These include Autoregressive Integrated Moving Average (ARIMA), Fractional ARIMA (ARFIMA), Exponential Smoothing, TBATS, and Neural Network Autoregression (NNAR). Thirdly, the best forecasting model is chosen based on the accuracy measure Root Mean Squared Error (RMSE), and it is applied to forecast 40 days ahead. These forecasts can help us anticipate the future behaviour of this disease and make better decisions. The dataset of the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University up to October 29, 2020 is used to forecast the numbers of daily confirmed cases, deaths, and recoveries for the ten most affected countries. The findings of this investigation show that no single model provides the best forecasts for all of the countries and forecasting horizons considered here. However, based on the number of times it outperforms the competing models, the SSA technique is found to be a viable option for forecasting the numbers of daily confirmed cases, deaths, and recoveries caused by COVID-19.


Keywords: 37M10; 62M10; 62M20; ARFIMA; ARIMA; COVID-19; Exponential smoothing; Neural network autoregression; Singular spectrum analysis; TBATS

Year: 2020 PMID: 33311861 PMCID: PMC7719007 DOI: 10.1016/j.chaos.2020.110547

Source DB: PubMed Journal: Chaos Solitons Fractals ISSN: 0960-0779 Impact factor: 5.944

Introduction

The outbreak of coronavirus disease 2019 (COVID-19) is an important public health concern worldwide. The World Health Organization (WHO) declared COVID-19 a pandemic on 11 March 2020. The virus has spread rapidly and has affected over 200 countries. Currently, the numbers of infected and deceased patients are still increasing, with a very high contagion rate, in almost all the affected countries. This disease seriously threatens human health and has significant effects on various fields such as economic development, tourism, social relations, lifestyle and international politics. Recently, several studies have modelled the COVID-19 pandemic using various methods. For example, a Long Short Term Memory for Data Training-SAE (LSTM-SAE) network model has been used as a preliminary study in [1], where it served as a baseline for testing other ANN types; Modified Auto-Encoder (MAE) networks have then been applied as final models to forecast COVID-19 dynamics in Brazil. To predict the number of positive reported cases for 32 states and union territories of India, deep learning-based models have been used in [2]. In [3], a simple iteration method that needs only the daily values of confirmed cases as input has been used for forecasting. In [4], Generalized Additive Models (GAMs) have first been applied to estimate three parameters of the COVID-19 outbreak in China (the time-dependent transmission rate, recovery rate, and death rate); then, using the number of COVID-19 infections in Iran, the number of patients in Iran was predicted. A comparative study of five deep learning methods for forecasting the numbers of new cases and recovered cases has been presented in [5].
Simple Recurrent Neural Network (RNN), LSTM, Bidirectional LSTM, Gated Recurrent Units (GRUs) and Variational AutoEncoder (VAE) algorithms have been applied in that reference for global forecasting of COVID-19 cases based on data from Italy, Spain, France, China, USA, and Australia. In [6], a hybrid model combining the two-dimensional (2D) curvelet transformation, the Chaotic Salp Swarm Algorithm (CSSA) and a deep learning technique has been developed to identify patients infected with coronavirus from X-ray images. In the proposed model, the 2D curvelet transformation was applied to the patients' chest X-ray radiographs and a feature matrix was formed from the obtained coefficients. The coefficients in the feature matrix were optimized using the CSSA, and COVID-19 was diagnosed by the EfficientNet-B0 model, which is one of the deep learning methods. For more details on other new chaotic methods see [7], [8]. Further studies on forecasting the pandemic can be found in [9], [10], [11], [12], [13], [14], [15], [16]. While a review of all references concerning COVID-19 is beyond the scope of this paper, the interested reader is referred to [17] for a comprehensive study that analyses and classifies several forecasting models available in the literature and discusses their challenges and control measures. Recently, many attempts have been made to forecast the spread of COVID-19 using time series models. For example, the exponential smoothing family has been used in [18] to forecast daily cumulative confirmed, death, and recovered cases from COVID-19. The linear trend model and double exponential smoothing techniques have been tested in [19] to forecast the spread of COVID-19 in Malaysia, Thailand, and Singapore. ARIMA modelling has been utilized in [20] to forecast the total infected cases of USA, Brazil, India, Russia, and Spain from 15 February to 30 June 2020.
A Vector Autoregressive model has been used in [21] to forecast new daily confirmed cases, deaths and recovered cases in Pakistan for ten days. A Bayesian time series analysis has been conducted in [22] using daily COVID-19 data for Japan up to 31 March 2020. A new hybrid of discrete wavelet decomposition and ARIMA models has been developed in [23] to make one-month-ahead predictions of deaths in Italy, Spain, France, the United Kingdom (UK), and the United States of America (USA). More information about other time series models used for forecasting COVID-19 can be found in [24], [25], [26], [27], [28], [29]. Despite many attempts to model the COVID-19 pandemic, to the best of our knowledge few studies have utilized the Singular Spectrum Analysis (SSA) technique to forecast COVID-19. A modified SSA approach has been used in [30] to predict the COVID-19 pandemic in Saudi Arabia, and the recurrent forecasting method of SSA has been applied in [31] to provide predictive modelling of COVID-19 cases in Malaysia. SSA is a rapidly developing method of time series analysis. This non-parametric technique is widely used in a variety of fields such as signal processing, finance, economics, image processing, meteorology, engineering, medicine, biology and genetics. The main characteristic of SSA is that neither a parametric model nor stationarity has to be assumed for the time series. Whilst a review of all applications of SSA is beyond the scope of this paper, we refer interested readers to [32], [33], [34], [35], [36], [37], [38], [39], [40], [41], [42], [43], [44], [45], [46], [47], [48], [49]. For detailed information on the theory and applications of SSA, see [50], [51]. A comprehensive review of SSA and a description of its modifications and extensions can be found in [52].
Given the great potential of SSA for forecasting future values, we believe that this method can provide reliable forecasts for COVID-19 time series data, and this motivates us to apply SSA. The numbers of confirmed cases, deaths, and recoveries caused by COVID-19 are the three main variables of interest that are reported every day. Accurate forecasts of these variables are crucial. They can help us better understand the global impact of the coronavirus and plan for the future, for example by estimating the required number of hospital beds or adjusting social distancing and isolation rules. This paper contributes to the literature on forecasting the COVID-19 pandemic in several ways. Firstly, the optimal versions of the recurrent and vector forecasting methods of SSA are used, for the first time, to predict the numbers of daily confirmed cases, deaths, and recoveries caused by COVID-19. Secondly, to evaluate the potential of SSA for forecasting the three main variables, the performance of SSA is compared with other commonly used time series forecasting techniques, including Autoregressive Integrated Moving Average (ARIMA), Fractional ARIMA (ARFIMA), Exponential Smoothing, TBATS, and Neural Network Autoregression (NNAR). Thirdly, the best forecasting model is chosen based on the accuracy measure Root Mean Squared Error (RMSE), which is a commonly used criterion in the time series forecasting literature, and it is applied to forecast 40 days ahead. These forecasts may help governments and other agencies to change their strategies and to optimize the available resources according to the forecasted situation. Because the virus has spread so widely around the world, analysing the data of all countries is a difficult and time-consuming task. Therefore, we focus only on the ten countries with the largest numbers of cumulative confirmed cases.
At the time of writing this paper, 29 October 2020, these countries are USA, India, Brazil, Russia, France, Spain, Argentina, Colombia, UK, and Mexico. The remainder of this paper is organized as follows. Section 2 briefly reviews SSA, outlining recurrent and vector forecasting together with the algorithm for calculating the optimal parameters of SSA. In Section 3, the theoretical background and general scheme of the other time series forecasting techniques used in this study are briefly discussed. The data sources used in this investigation are described in Section 4. Section 5 compares the performance of SSA with the other forecasting methods and presents 40-day-ahead point forecasts for the numbers of confirmed cases, deaths, and recoveries. The findings of this study are discussed in Section 6. Finally, conclusions and future work are given in Section 7.

Review of SSA

The SSA technique has various modifications and extensions, some of which are explained in [53]. The most fundamental version of SSA is called Basic SSA. Here, we briefly explain the theory underlying Basic SSA, mainly following [51], [53]. Two types of SSA forecasting methods, namely Recurrent forecasting (R-forecasting) and Vector forecasting (V-forecasting), are also briefly reviewed. Note that many software packages implement SSA, such as Caterpillar-SSA and SAS/ETS. In this research, we apply the freely available R package Rssa to carry out the SSA stages and to obtain recurrent and vector forecasts. More details on this package can be found in [54], [55], [56].

SSA Stages

The SSA technique consists of two complementary stages, Decomposition and Reconstruction, each of which includes two separate steps. At the decomposition stage, a time series is decomposed into several interpretable components such as trend, seasonal and cyclical components, which enables signal extraction and noise reduction. At the reconstruction stage, the interpretable components are reconstructed, and they can then be used to forecast new data points.

Stage 1: Decomposition (Embedding & Singular Value Decomposition)

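In the embedding step, the observed time series $Y_N = (y_1, \ldots, y_N)$ is transformed into the $L \times K$ matrix $\mathbf{X} = [X_1, \ldots, X_K]$ whose columns are the lagged vectors $X_i = (y_i, \ldots, y_{i+L-1})^{T}$, where $1 \le i \le K$ and $K = N - L + 1$. The matrix $\mathbf{X}$ is called the trajectory matrix. It is a Hankel matrix in the sense that all the elements on each anti-diagonal are equal. This step has only one parameter, called the window length $L$, which is commonly chosen such that $2 \le L \le N/2$, where $N$ is the length of the time series $Y_N$. In the Singular Value Decomposition (SVD) step, the trajectory matrix is decomposed as $\mathbf{X} = U \Sigma V^{T}$, where $U$ ($L \times L$) and $V$ ($K \times K$) are orthogonal and $\Sigma$ is an $L \times K$ diagonal matrix. The diagonal entries of $\Sigma$ are called the singular values of $\mathbf{X}$ and are denoted by $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_L \ge 0$, in decreasing order of magnitude. The columns of $U$ are called left singular vectors and those of $V$ are called right singular vectors. If $d = \operatorname{rank}(\mathbf{X})$, then the SVD of the trajectory matrix can be written as

$$\mathbf{X} = \mathbf{X}_1 + \cdots + \mathbf{X}_d = \sum_{i=1}^{d} \sigma_i U_i V_i^{T}, \qquad (1)$$

where $U_i$ is the $i$th left singular vector and $V_i$ is the $i$th right singular vector ($i = 1, \ldots, d$). It is also well known that the left singular vectors of $\mathbf{X}$ are the eigenvectors of $\mathbf{X}\mathbf{X}^{T}$. The collection $(\sigma_i, U_i, V_i)$ is called the $i$th eigentriple of the SVD.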
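A minimal NumPy sketch of the embedding and SVD steps follows. It is an illustrative re-implementation, not the Rssa code used in this paper, and the function names `trajectory_matrix` and `eigentriples` and the toy series are hypothetical. It builds the $L \times K$ Hankel trajectory matrix, extracts the eigentriples, and checks that the elementary matrices $\sigma_i U_i V_i^{T}$ sum back to the trajectory matrix as in (1).

```python
import numpy as np


def trajectory_matrix(y, L):
    """Embed a series of length N into the L x K Hankel trajectory matrix (K = N - L + 1)."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    if not 2 <= L <= N - 1:
        raise ValueError("window length must satisfy 2 <= L <= N - 1")
    K = N - L + 1
    # Column i (0-based) is the lagged vector (y_{i+1}, ..., y_{i+L}) in 1-based notation.
    return np.column_stack([y[i:i + L] for i in range(K)])


def eigentriples(X):
    """Return singular values (decreasing) and left/right singular vectors as columns."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return s, U, Vt.T


# Toy series: a period-12 oscillation plus a linear trend (hypothetical data).
t = np.arange(1, 61)
y = np.sin(2 * np.pi * t / 12) + 0.05 * t

X = trajectory_matrix(y, L=24)
s, U, V = eigentriples(X)
d = np.linalg.matrix_rank(X)

# Elementary matrices sigma_i U_i V_i^T; their sum over i = 1..d recovers X.
X_sum = sum(s[i] * np.outer(U[:, i], V[:, i]) for i in range(d))
```

Here the window length $L = 24$ respects $L \le N/2$ for $N = 60$.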

Stage 2: Reconstruction (Grouping & Diagonal Averaging)

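The grouping step splits the elementary matrices $\mathbf{X}_i$ in (1) into several groups and sums the matrices within each group. Let $I = \{i_1, \ldots, i_p\}$ be a subset of the indices $\{1, \ldots, d\}$. Then the resultant matrix corresponding to the group $I$ is defined as $\mathbf{X}_I = \mathbf{X}_{i_1} + \cdots + \mathbf{X}_{i_p}$, that is, the sum of the matrices within the group. Given the SVD of $\mathbf{X}$, the split of the set of indices $\{1, \ldots, d\}$ into the disjoint subsets $I_1, \ldots, I_m$ corresponds to the decomposition

$$\mathbf{X} = \mathbf{X}_{I_1} + \cdots + \mathbf{X}_{I_m}. \qquad (2)$$

The main goal of diagonal averaging is to transform each matrix of the grouped decomposition (2) into a Hankel matrix, which can then be converted into a new time series of length $N$. Let $\mathbf{Y}$ be an $L \times K$ matrix with elements $y_{lk}$, $1 \le l \le L$, $1 \le k \le K$. By diagonal averaging, $\mathbf{Y}$ is transformed into the Hankel matrix $\mathcal{H}\mathbf{Y}$ whose elements $\tilde{y}_s$ over the anti-diagonals ($1 \le s \le N$) are given by

$$\tilde{y}_s = \frac{1}{|A_s|} \sum_{(l,k) \in A_s} y_{lk}, \qquad (3)$$

where $A_s = \{(l, k) : l + k = s + 1,\ 1 \le l \le L,\ 1 \le k \le K\}$ and $|A_s|$ denotes the number of elements in the set $A_s$. Applying diagonal averaging (3) to all the matrix components of (2) yields the expansion

$$\mathbf{X} = \widetilde{\mathbf{X}}_{I_1} + \cdots + \widetilde{\mathbf{X}}_{I_m},$$

where $\widetilde{\mathbf{X}}_{I_j} = \mathcal{H}\mathbf{X}_{I_j}$. This is equivalent to decomposing the initial series into a sum of $m$ series, $y_t = \sum_{j=1}^{m} \tilde{y}_t^{(j)}$ ($t = 1, \ldots, N$), where $\widetilde{Y}_N^{(j)} = (\tilde{y}_1^{(j)}, \ldots, \tilde{y}_N^{(j)})$ corresponds to the matrix $\widetilde{\mathbf{X}}_{I_j}$. In this paper, we denote the number of leading eigentriples corresponding to the signal (noise-free time series) by $r$.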
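Grouping and diagonal averaging can be sketched in NumPy as follows. This is an illustrative re-implementation rather than the Rssa code used in the paper; the function names, the toy series, and the choice to group the first $r = 4$ eigentriples as signal (a linear trend plus a period-7 component) are hypothetical. Because diagonal averaging is linear and the trajectory matrix is already Hankel, the reconstructed components add back up to the original series.

```python
import numpy as np


def trajectory_matrix(y, L):
    """L x K Hankel trajectory matrix of the series y."""
    y = np.asarray(y, dtype=float)
    K = len(y) - L + 1
    return np.column_stack([y[i:i + L] for i in range(K)])


def diagonal_average(Y):
    """Average an L x K matrix over its anti-diagonals, giving a series of length L + K - 1."""
    L, K = Y.shape
    sums = np.zeros(L + K - 1)
    counts = np.zeros(L + K - 1)
    for l in range(L):
        # 0-based entry (l, k) lies on anti-diagonal l + k, i.e. l + k = s + 1 in 1-based indices.
        sums[l:l + K] += Y[l]
        counts[l:l + K] += 1
    return sums / counts


def reconstruct(y, L, groups):
    """Grouping + diagonal averaging: one reconstructed series per group of 0-based indices."""
    X = trajectory_matrix(y, L)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    parts = []
    for group in groups:
        X_I = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in group)  # grouped matrix X_I
        parts.append(diagonal_average(X_I))
    return parts


rng = np.random.default_rng(0)
t = np.arange(1, 101)
y = 0.1 * t + np.sin(2 * np.pi * t / 7) + 0.2 * rng.standard_normal(100)

L, r = 28, 4
signal, noise = reconstruct(y, L, groups=[range(r), range(r, L)])
```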

Recurrent forecasting

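Suppose $I = \{1, \ldots, r\}$ is the set of leading eigentriples chosen at the grouping step of SSA. Let $U_1, \ldots, U_r$ be the corresponding eigenvectors, let $U^{\nabla}$ denote the vector consisting of the first $L-1$ components of a vector $U \in \mathbb{R}^{L}$, let $\pi_i$ be the last component of $U_i$, let $\nu^2 = \pi_1^2 + \cdots + \pi_r^2$ (assumed to satisfy $\nu^2 < 1$), and let $\widetilde{Y}_N = (\tilde{y}_1, \ldots, \tilde{y}_N)$ be the time series reconstructed from the set $I$. The recurrent forecasting algorithm, which we refer to as R-SSA, is summarized as follows. The time series $Y_{N+h} = (y_1, \ldots, y_{N+h})$ is defined by

$$y_i = \begin{cases} \tilde{y}_i, & i = 1, \ldots, N, \\ \sum_{j=1}^{L-1} a_j\, y_{i-j}, & i = N+1, \ldots, N+h, \end{cases} \qquad (4)$$

where the vector of coefficients $A = (a_{L-1}, \ldots, a_1)^{T}$ is defined as

$$A = \frac{1}{1 - \nu^2} \sum_{i=1}^{r} \pi_i U_i^{\nabla}. \qquad (5)$$

The numbers $y_{N+1}, \ldots, y_{N+h}$ are the $h$-step-ahead recurrent forecasts.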
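A minimal NumPy sketch of R-SSA following (4) and (5) is given below, assuming the group $I$ consists of the $r$ leading eigentriples. It is not the Rssa implementation used in the paper, and the function names are hypothetical. On a noise-free sinusoid the trajectory matrix has rank 2, so with $r = 2$ the recurrence should continue the series almost exactly.

```python
import numpy as np


def diagonal_average(Y):
    """Average an L x K matrix over its anti-diagonals (series of length L + K - 1)."""
    L, K = Y.shape
    sums, counts = np.zeros(L + K - 1), np.zeros(L + K - 1)
    for l in range(L):
        sums[l:l + K] += Y[l]
        counts[l:l + K] += 1
    return sums / counts


def rssa_forecast(y, L, r, h):
    """h-step recurrent SSA forecast from the r leading eigentriples."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    K = N - L + 1
    X = np.column_stack([y[i:i + L] for i in range(K)])   # trajectory matrix
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Ur = U[:, :r]                                         # eigenvectors U_1, ..., U_r
    y_rec = diagonal_average((Ur * s[:r]) @ Vt[:r])       # series reconstructed from I
    pi = Ur[-1]                                           # last components pi_i
    nu2 = float(pi @ pi)                                  # nu^2 = pi_1^2 + ... + pi_r^2
    if nu2 >= 1.0:
        raise ValueError("nu^2 >= 1: the recurrence is undefined")
    A = (Ur[:-1] @ pi) / (1.0 - nu2)                      # (a_{L-1}, ..., a_1), as in (5)
    out = list(y_rec)
    for _ in range(h):
        # y_i = sum_j a_j y_{i-j}: A is aligned with the last L - 1 values, oldest first.
        out.append(float(A @ np.array(out[-(L - 1):])))
    return np.array(out[N:])


t = np.arange(1, 61)
y = np.sin(2 * np.pi * t / 12)                            # noise-free toy series
forecast = rssa_forecast(y, L=24, r=2, h=12)
```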

Vector forecasting

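Consider the matrix $\Pi = \mathbf{V}^{\nabla} (\mathbf{V}^{\nabla})^{T} + (1 - \nu^2) A A^{T}$, where the $(L-1) \times r$ matrix $\mathbf{V}^{\nabla} = [U_1^{\nabla}, \ldots, U_r^{\nabla}]$ consists of the column vectors $U_i^{\nabla}$ and $A$ is defined in (5). The vector forecasting algorithm, which we refer to as V-SSA, is formulated as follows. Define the vectors $Z_i$ as

$$Z_i = \begin{cases} \widehat{X}_i, & i = 1, \ldots, K, \\ \mathcal{P}^{(v)} Z_{i-1}, & i = K+1, \ldots, K+h+L-1, \end{cases} \qquad (6)$$

where $\widehat{X}_i$ are the columns of the reconstructed trajectory matrix $\mathcal{H}\mathbf{X}_I$, the linear operator $\mathcal{P}^{(v)}$ maps a vector $Y \in \mathbb{R}^{L}$ to

$$\mathcal{P}^{(v)} Y = \begin{pmatrix} \Pi\, Y^{\Delta} \\ A^{T} Y^{\Delta} \end{pmatrix},$$

and $Y^{\Delta}$ is the vector consisting of the last $L-1$ components of $Y$. By constructing the matrix $\mathbf{Z} = [Z_1, \ldots, Z_{K+h+L-1}]$ and applying diagonal averaging to it, the series $y_1, \ldots, y_{N+h+L-1}$ is obtained. The numbers $y_{N+1}, \ldots, y_{N+h}$ are the $h$-step-ahead vector forecasts.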
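A minimal NumPy sketch of V-SSA following (6) is shown below, with the operator built from $\Pi$ and $A$. It is a hypothetical re-implementation rather than the Rssa code, and the function names are illustrative only. On a noise-free sinusoid each new vector $Z_i$ should be the exact next lagged vector, so the forecasts match the true continuation up to rounding error.

```python
import numpy as np


def diagonal_average(Y):
    """Average an L x K matrix over its anti-diagonals (series of length L + K - 1)."""
    L, K = Y.shape
    sums, counts = np.zeros(L + K - 1), np.zeros(L + K - 1)
    for l in range(L):
        sums[l:l + K] += Y[l]
        counts[l:l + K] += 1
    return sums / counts


def vssa_forecast(y, L, r, h):
    """h-step vector SSA forecast from the r leading eigentriples."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    K = N - L + 1
    X = np.column_stack([y[i:i + L] for i in range(K)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Ur = U[:, :r]
    y_rec = diagonal_average((Ur * s[:r]) @ Vt[:r])
    Z = [y_rec[i:i + L] for i in range(K)]          # reconstructed lagged vectors, i = 1..K
    pi = Ur[-1]
    nu2 = float(pi @ pi)
    if nu2 >= 1.0:
        raise ValueError("nu^2 >= 1: vector forecasting is undefined")
    V_nabla = Ur[:-1]                               # (L-1) x r matrix [U_1^nabla, ..., U_r^nabla]
    A = (V_nabla @ pi) / (1.0 - nu2)
    Pi = V_nabla @ V_nabla.T + (1.0 - nu2) * np.outer(A, A)
    for _ in range(h + L - 1):
        tail = Z[-1][1:]                            # Y^Delta: last L - 1 components
        Z.append(np.concatenate([Pi @ tail, [A @ tail]]))
    series = diagonal_average(np.column_stack(Z))   # length N + h + L - 1
    return series[N:N + h]


t = np.arange(1, 61)
y = np.sin(2 * np.pi * t / 12)
forecast = vssa_forecast(y, L=24, r=2, h=12)
```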

Choosing $L$ and $r$

The window length ($L$), which is used in the embedding step of SSA, plays a pivotal role in the SSA technique because the whole SSA procedure depends on this parameter. Another important parameter is the number of leading eigentriples ($r$) required to reconstruct and forecast the signal (noise-free time series). To find the optimal values of $L$ and $r$, we apply a cross-validation procedure. This method of parameter choice is based on minimizing the Root Mean Squared Error (RMSE) within the validation (test) period for a given forecasting horizon (i.e. the number of periods to be forecast). The details of finding the optimal $(L, r)$ are described in Algorithm 1:

Algorithm 1

Calculation of optimal $(L, r)$
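The steps of Algorithm 1 are not reproduced in this text, so the following is only a simplified, hypothetical stand-in for the idea of choosing $(L, r)$ by cross-validation. It uses a single train/test split that holds out the last $h$ points and scores every $(L, r)$ pair on a grid by the test-period RMSE of an R-SSA forecast. The paper's own procedure may use different splits, grids and forecasting methods, and all names here are illustrative.

```python
import numpy as np


def rssa_forecast(y, L, r, h):
    """Compact recurrent SSA forecast from the r leading eigentriples."""
    y = np.asarray(y, dtype=float)
    N, K = len(y), len(y) - L + 1
    X = np.column_stack([y[i:i + L] for i in range(K)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Ur = U[:, :r]
    Xr = (Ur * s[:r]) @ Vt[:r]
    rec, cnt = np.zeros(N), np.zeros(N)
    for l in range(L):                      # diagonal averaging
        rec[l:l + K] += Xr[l]
        cnt[l:l + K] += 1
    rec /= cnt
    pi = Ur[-1]
    nu2 = float(pi @ pi)
    if nu2 >= 1.0:
        raise ValueError("nu^2 >= 1: recurrent forecast undefined")
    A = (Ur[:-1] @ pi) / (1.0 - nu2)
    out = list(rec)
    for _ in range(h):
        out.append(float(A @ np.array(out[-(L - 1):])))
    return np.array(out[N:])


def rmse(actual, predicted):
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def choose_L_r(y, h, L_values, r_max):
    """Grid search: hold out the last h points and minimize the test-period RMSE."""
    y = np.asarray(y, dtype=float)
    train, test = y[:-h], y[-h:]
    best = None                             # (rmse, L, r)
    for L in L_values:
        K = len(train) - L + 1
        for r in range(1, min(r_max, L, K) + 1):
            try:
                err = rmse(test, rssa_forecast(train, L, r, h))
            except ValueError:
                continue                    # skip (L, r) pairs with nu^2 >= 1
            if best is None or err < best[0]:
                best = (err, L, r)
    return best


t = np.arange(1, 73)
y = np.sin(2 * np.pi * t / 12) + 0.02 * t  # hypothetical noise-free toy series
best = choose_L_r(y, h=12, L_values=range(12, 31), r_max=6)
```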

Other forecasting methods

In this section, the other commonly used time series forecasting methods applied in this investigation are briefly explained.

Autoregressive integrated moving average (ARIMA)

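The ARIMA technique is one of the most established and widely used time series forecasting methods. A non-seasonal ARIMA$(p, d, q)$ model is given by

$$\phi(B)\left[(1 - B)^{d} y_t - \mu\right] = \theta(B)\, \varepsilon_t, \qquad \phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^{p}, \qquad \theta(B) = 1 + \theta_1 B + \cdots + \theta_q B^{q},$$

where $y_t$ is a time series, $B$ is the backshift operator defined as $B y_t = y_{t-1}$, $\varepsilon_t$ is a white noise process with mean zero, and $\mu$ is the mean of $(1 - B)^{d} y_t$ [57]. Also, the seasonal ARIMA$(p, d, q)(P, D, Q)_m$ model is written as

$$\Phi(B^{m})\,\phi(B)\left[(1 - B^{m})^{D}(1 - B)^{d} y_t - \mu\right] = \Theta(B^{m})\,\theta(B)\, \varepsilon_t,$$

where $m$ is equal to the number of observations per year, $\Phi(B^{m}) = 1 - \Phi_1 B^{m} - \cdots - \Phi_P B^{Pm}$, $\Theta(B^{m}) = 1 + \Theta_1 B^{m} + \cdots + \Theta_Q B^{Qm}$, and $\mu$ is now the mean of the differenced series. Selecting an appropriate model order, that is, the values of $p$, $q$, $P$, $Q$, $d$ and $D$, is a major task in ARIMA modelling. In this paper, we use the auto.arima function from the forecast package of R to find the best ARIMA model automatically and estimate its parameters. For more information on how this function works and examples of applications, see [58].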
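As a rough illustration of the non-seasonal equation, the sketch below fits only the autoregressive part of an ARIMA$(p, d, 0)$ model by ordinary least squares on the mean-centred differenced series, then integrates the forecasts back to the original scale. It is a hypothetical simplification and not how auto.arima works. It has no MA or seasonal terms and performs no order selection, whereas auto.arima selects the orders automatically and estimates the parameters by maximum likelihood.

```python
import numpy as np


def forecast_arima_p_d_0(y, p, d, h):
    """Sketch of ARIMA(p, d, 0): OLS AR(p) fit on the mean-centred differenced series."""
    y = np.asarray(y, dtype=float)
    levels = [y]                                  # y, (1-B)y, ..., (1-B)^d y
    for _ in range(d):
        levels.append(np.diff(levels[-1]))
    w = levels[-1]
    mu = w.mean()                                 # mean of (1 - B)^d y_t
    z = w - mu
    # Regress z_t on (z_{t-1}, ..., z_{t-p}) by least squares.
    lags = np.column_stack([z[p - j:len(z) - j] for j in range(1, p + 1)])
    phi, *_ = np.linalg.lstsq(lags, z[p:], rcond=None)
    history = list(z)
    w_forecasts = []
    for _ in range(h):
        # Future white-noise terms are replaced by their mean, zero.
        z_next = float(phi @ np.array(history[-1:-p - 1:-1]))
        history.append(z_next)
        w_forecasts.append(z_next + mu)
    # Undo the differencing, anchoring each level at its last observed value.
    out = np.array(w_forecasts)
    for level in reversed(levels[:-1]):
        out = level[-1] + np.cumsum(out)
    return out


y = 3.0 + 2.0 * np.arange(20)                     # straight line: (1 - B) y_t = 2
print(forecast_arima_p_d_0(y, p=1, d=1, h=3))
```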

Fractional ARIMA (ARFIMA)

If the time series exhibits long-range dependence, the differencing parameter $d$ in an ARIMA model can be allowed to take non-integer values, and the resulting model is called an ARFIMA model. We apply the arfima function from the forecast package to find the best ARFIMA model automatically. This function selects $p$ and $q$ and estimates the model parameters using the algorithm proposed in [58], whilst the algorithm provided in [59] is applied to estimate the parameters including $d$.
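In an ARFIMA model, the operator $(1 - B)^{d}$ with non-integer $d$ is defined through its binomial expansion $\sum_{k \ge 0} \pi_k B^{k}$, where $\pi_0 = 1$ and $\pi_k = \pi_{k-1}(k - 1 - d)/k$. The sketch below computes these weights and applies a truncated fractional difference. The function names are hypothetical, and $d$ is supplied by hand rather than estimated as the arfima function does.

```python
def frac_diff_weights(d, n):
    """First n coefficients pi_k of (1 - B)^d = sum_k pi_k B^k."""
    weights = [1.0]
    for k in range(1, n):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


def frac_diff(y, d, n_weights=None):
    """Truncated fractional difference: x_t = sum_{k=0}^{min(t, n-1)} pi_k y_{t-k}."""
    n = len(y) if n_weights is None else n_weights
    w = frac_diff_weights(d, n)
    return [sum(w[k] * y[t - k] for k in range(min(t + 1, n))) for t in range(len(y))]


# d = 1 recovers ordinary differencing (after the first, truncated, value).
print(frac_diff([1, 3, 6, 10], d=1))
# 0 < d < 0.5 gives slowly decaying weights, the stationary long-memory case.
print(frac_diff_weights(0.4, 5))
```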

Exponential smoothing (ETS)

Exponential smoothing methods are among the most widely used forecasting procedures in practice. They were originally classified by Pegels' taxonomy [60], later extended by Gardner [61], modified by Hyndman et al. [62], and extended again by Taylor [63], giving a total of fifteen methods. The exponential smoothing family has been shown to have good forecast accuracy in several forecasting competitions [64], [65], [66] and is especially suitable for short time series. Well-known methods such as simple (or single) exponential smoothing, Holt's linear method, and the additive and multiplicative Holt-Winters' methods are special cases of exponential smoothing. To refer to the three components (error, trend, and seasonality) of exponential smoothing methods, the notation ETS was proposed in [58], and we also use this notation. ETS models can capture a variety of trend and seasonal structures (additive or multiplicative) and their combinations. A detailed description of ETS can be found in [67] and is therefore not repeated here. We apply the ets function from the forecast package to find the best ETS model automatically. This function implements the innovations state space modelling framework described in [67] for parameter estimation and forecasting.
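Holt's linear method, one of the special cases mentioned above, can be sketched as follows. Its point forecasts coincide with those of the ETS(A,A,N) model. The smoothing parameters and the initial level and trend are fixed by hand in this sketch (a hypothetical initialisation), whereas the ets function estimates them and also chooses among the ETS model forms.

```python
def holt_linear(y, alpha, beta, h):
    """Holt's linear trend method; returns forecasts for 1..h steps ahead."""
    level, trend = y[0], y[1] - y[0]         # simple (hypothetical) initial states
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (level + trend)        # level equation
        trend = beta * (level - prev_level) + (1 - beta) * trend    # trend equation
    return [level + k * trend for k in range(1, h + 1)]            # forecast equation


forecasts = holt_linear([5, 7, 9, 11, 13], alpha=0.5, beta=0.3, h=3)
```

On an exactly linear series the level and trend stay on the line, so the forecasts simply extend it.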

TBATS Model

An innovations state space modelling framework has been introduced in [68] for forecasting complex seasonal time series, such as those with multiple seasonal periods, high-frequency seasonality, non-integer seasonality, and dual-calendar effects. This model, called BATS, is an exponential smoothing state space model with a Box-Cox transformation, ARMA errors, and trend and seasonal components. It generalizes the traditional seasonal innovations models to allow for multiple seasonal periods. The name BATS is an acronym for Box-Cox transform, ARMA errors, Trend, and Seasonal components. The TBATS model uses a trigonometric representation of the seasonal components based on Fourier series, and the initial T in TBATS stands for trigonometric. For more information on the theory and applications of TBATS, see [68]. The tbats function in the forecast package fits a TBATS model to a time series.
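The trigonometric seasonal representation can be illustrated with the deterministic part of a single harmonic. In the trigonometric formulation of [68], as we understand it, each harmonic $j$ evolves as a rotation by $\lambda_j = 2\pi j/m$. With the error terms set to zero, this reproduces $a\cos(t\lambda_j) + b\sin(t\lambda_j)$ exactly, even for a non-integer period $m$. The function name and parameters are hypothetical. A full TBATS seasonal component sums $k$ such harmonics and adds smoothing terms driven by the errors.

```python
import math


def trig_harmonic(a, b, m, j, n):
    """Deterministic part of harmonic j with (possibly non-integer) period m, t = 1..n.

    Recursion (error terms set to zero), starting from s_0 = a, s*_0 = b:
        s_t  =  s_{t-1} cos(lam) + s*_{t-1} sin(lam)
        s*_t = -s_{t-1} sin(lam) + s*_{t-1} cos(lam),   lam = 2*pi*j/m
    """
    lam = 2 * math.pi * j / m
    s, s_star = a, b
    values = []
    for _ in range(n):
        # Both right-hand sides use the previous (s, s*) pair.
        s, s_star = (s * math.cos(lam) + s_star * math.sin(lam),
                     -s * math.sin(lam) + s_star * math.cos(lam))
        values.append(s)
    return values


# Example: an annual cycle in weekly data has the non-integer period m = 365.25 / 7.
m = 365.25 / 7
path = trig_harmonic(a=1.0, b=0.5, m=m, j=1, n=200)
```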

Neural network autoregression (NNAR)

There has been increasing interest in using neural networks to model and forecast time series data. A neural network can be considered as a network of neurons arranged in layers. The predictors (or inputs) form the bottom layer, and the forecasts (or outputs) form the top layer. There may also be intermediate layers containing hidden neurons [57]. A network with no hidden layer is equivalent to a linear regression, and the network becomes non-linear once an intermediate layer with hidden neurons is added [57]. This is known as a multilayer feed-forward network, where each layer of nodes receives inputs from the previous layers. Here we briefly present some details of the Neural Network Autoregression (NNAR) model, mainly following [57]. In the NNAR model, lagged values of the time series are used as inputs to a neural network. The notation NNAR$(p, k)$ is used in [57] to indicate a feed-forward network with one hidden layer, $p$ lagged inputs and $k$ nodes in the hidden layer. In addition, a seasonal NNAR model has the notation NNAR$(p, P, k)_m$, indicating $(y_{t-1}, y_{t-2}, \ldots, y_{t-p}, y_{t-m}, y_{t-2m}, \ldots, y_{t-Pm})$ as inputs with $k$ neurons in the hidden layer. The nnetar function in the forecast package fits an NNAR model to time series data. In this function, the values of $p$ and $P$ are selected automatically if they are not specified. More details on the NNAR model and its applications can be found in [57].
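The forecasting step of an NNAR$(p, k)$ model can be sketched as a single forward pass through one sigmoid hidden layer, iterated so that each forecast is fed back as an input for the next step. The weights are passed in by hand (hypothetical values). The sketch does not train the network, scale the inputs, or average several fitted networks, which nnetar takes care of.

```python
import numpy as np


def nnar_forecast(y, p, W1, b1, w2, b2, h):
    """Iterated forecasts from an NNAR(p, k) network with given (untrained) weights.

    W1: (k, p) input-to-hidden weights, b1: (k,) hidden biases,
    w2: (k,) hidden-to-output weights, b2: output bias.
    Inputs are ordered (y_{t-1}, ..., y_{t-p}); assumes len(y) >= p.
    """
    history = [float(v) for v in y]
    forecasts = []
    for _ in range(h):
        x = np.array(history[-1:-p - 1:-1])                 # p most recent lags, newest first
        hidden = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))       # sigmoid hidden layer
        y_next = float(w2 @ hidden + b2)                    # linear output node
        history.append(y_next)                              # feed forecast back as an input
        forecasts.append(y_next)
    return forecasts


rng = np.random.default_rng(1)
p, k = 3, 2
W1, b1 = 0.1 * rng.standard_normal((k, p)), np.zeros(k)    # hypothetical weights
w2, b2 = rng.standard_normal(k), 0.0
print(nnar_forecast([1.0, 2.0, 3.0, 4.0], p, W1, b1, w2, b2, h=5))
```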

Data sources

The accuracy of forecasting largely depends on the quality of the data and requires ample historical data. Several freely available R packages provide data related to COVID-19. For example, nCov2019 contains not only Chinese data but also data on other countries and regions. Furthermore, coronavirus provides the dataset of the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University together with a dashboard. Additional R resources on COVID-19 can be found in [69]. This paper focuses on the ten countries most affected by COVID-19, namely USA, India, Brazil, Russia, France, Spain, Argentina, Colombia, UK, and Mexico. In this study, we use the R package tidycovid19 to analyse the numbers of confirmed cases, deaths, and recoveries reported by Johns Hopkins University CSSE [70]. The main advantage of this package is that it provides transparent access to various data sources at the country-day level, including data on governmental interventions and on the behavioural response of the public. This package facilitates the download of COVID-19 related data directly from authoritative sources, including the following [71]:

The CSSE team at Johns Hopkins University: these data have become a standard resource for researchers and the general audience interested in assessing the global spread of the virus. The data are provided at country and sub-country levels.
European Centre for Disease Prevention and Control (ECDC): the data are updated daily and contain the latest available public data on the number of new COVID-19 cases reported per day and per country.
Testing data collected by the 'Our World in Data' team: this team systematically collects data on COVID-19 testing from multiple national sources.
Assessment Capacities Project (ACAPS): these data contain the government measures dataset provided by ACAPS and allow researchers to study the effect of non-pharmaceutical interventions on the development of the virus.
Oxford COVID-19 Government Response Tracker: an alternative data source for governmental interventions.
Apple Mobility Trends Reports: the data are provided by Apple at country and sub-country levels.
Google COVID-19 Community Mobility Reports: these data are available at the country, regional and U.S. county level.
Google Trends: these data give the search volume for the term "coronavirus" and can be used to assess public attention to COVID-19 across countries and over time within a given country. The data are available at the country, regional and city level, but availability varies across countries.
World Bank: these data contain country-level information provided by the World Bank and allow researchers to calculate per capita measures of the virus spread. They can also help researchers to assess the association of macro-economic variables with the development of the virus.

The data from the above-mentioned sources can be downloaded separately or in one merged data frame using specific download functions in the package. Additionally, the package provides a function and a Shiny app to visualize the country-level spread of COVID-19. Despite these advantages, the package has at least one drawback: when the cumulative numbers of confirmed cases, deaths, and recoveries are transformed into daily numbers, some negative values are obtained, which are clearly implausible. To solve this problem, we first treated the negative values and outliers as missing data. Then, these missing values were imputed by the Kalman smoothing method via the na_kalman function from the imputeTS package. For detailed information on this package, see [72]. Fig. 3 shows a choropleth world map of the country-level COVID-19 spread based on the number of confirmed cases (cumulative) until 29 October 2020.
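The preprocessing described above can be sketched as follows. The sketch differences the cumulative counts, treats negative daily values as missing, and fills the gaps with a Kalman smoother. As a simplifying assumption, it uses a local-level state space model with fixed, hypothetical noise variances `q` and `r`, whereas imputeTS's na_kalman fits its model to the data. The outlier screening mentioned in the text is omitted, and all function names are illustrative.

```python
import math


def cumulative_to_daily(cumulative):
    """Daily counts as first differences of cumulative counts."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]


def kalman_impute_local_level(y, q=1.0, r=1.0, p0=1e7):
    """Fill missing values (None or NaN) with Kalman-smoothed states of a local-level model.

    Model (assumed): y_t = mu_t + eps_t, mu_t = mu_{t-1} + eta_t,
    with Var(eps_t) = r and Var(eta_t) = q fixed by hand (hypothetical values).
    """
    n = len(y)
    observed = [v is not None and not math.isnan(v) for v in y]
    a_pred, p_pred, a_filt, p_filt = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
    a = next(v for v, ok in zip(y, observed) if ok)     # vague prior centred on first observation
    p = p0
    for t in range(n):
        a_pred[t] = a
        p_pred[t] = p + (q if t > 0 else 0.0)           # prediction step
        if observed[t]:                                  # update step
            gain = p_pred[t] / (p_pred[t] + r)
            a = a_pred[t] + gain * (y[t] - a_pred[t])
            p = (1.0 - gain) * p_pred[t]
        else:                                            # missing: carry the prediction forward
            a, p = a_pred[t], p_pred[t]
        a_filt[t], p_filt[t] = a, p
    smoothed = a_filt[:]                                 # Rauch-Tung-Striebel backward pass
    for t in range(n - 2, -1, -1):
        j = p_filt[t] / p_pred[t + 1]
        smoothed[t] = a_filt[t] + j * (smoothed[t + 1] - a_pred[t + 1])
    return [y[t] if observed[t] else smoothed[t] for t in range(n)]


cumulative = [0, 10, 25, 20, 45, 60]                 # toy data with a downward revision
daily = cumulative_to_daily(cumulative)              # contains one negative value
daily = [v if v >= 0 else None for v in daily]       # negative values treated as missing
filled = kalman_impute_local_level(daily)
```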

Fig. 3

COVID-19 confirmed cases (cumulative) as of October 29, 2020.

The time series plots of daily confirmed cases are presented in Fig. 4 for the ten countries as of October 29, 2020. Similar plots for the numbers of deaths and recoveries are depicted in Figs. 5 and 6. As can be seen in Fig. 4, the number of confirmed cases has a periodic pattern in some countries, such as USA, Brazil, Argentina, and Mexico. In addition, there is an obvious upward trend in the number of confirmed cases in USA, Russia, France, Spain, Argentina, and UK. However, the number of confirmed cases appears to be trending downwards in India.

Fig. 4

The time series plots of daily confirmed cases as of October 29, 2020.

Fig. 5

The time series plots of daily deaths as of October 29, 2020.

Fig. 6

The time series plots of daily recovered cases as of October 29, 2020.

Fig. 5 indicates a periodic structure in the number of deaths in USA, Brazil, Russia, and Mexico. An evident upward trend is also visible in the number of deaths in Russia and Argentina. It is apparent from Fig. 6 that the number of recovered cases in Russia shows a cyclical fluctuation. In addition, there is an upward trend in the number of recovered cases in USA, France, and Argentina. Notably, the number of recovered cases in Spain has been reported as zero since 18 May 2020, which seems implausible. Consequently, we ignore this dataset and do not provide point forecasts for the number of recovered cases in Spain. To give a better understanding of the nature of the confirmed cases data, some descriptive statistics of the number of confirmed cases are reported for the ten countries in Table 1. These are the length of the time series (N), mean, median, standard deviation (SD), coefficient of variation (CV) in percent, coefficient of skewness (Skew.), and maximum (Max.). Similar descriptive statistics for the numbers of deaths and recovered cases are presented in Tables 2 and 3.

Table 1

Descriptive statistics for confirmed cases series.

Country	N	Mean	Median	SD	CV (%)	Skew.	Max.	ADF (p-value)
USA	282	31719.624	30817.5	22133.642	70	0.139	88,521	0.964*
India	274	29521.354	11480.0	32343.681	110	0.665	97,894	>0.99*
Brazil	247	22279.275	21704.0	17611.646	79	0.357	69,074	>0.99*
Russia	273	5752.549	5741.0	4325.621	75	0.422	17,418	0.917*
France	280	4992.389	1609.0	8345.041	167	2.701	47,637	>0.99*
Spain	272	4595.511	2150.0	5160.027	112	1.177	23,580	>0.99*
Argentina	241	4746.058	2632.0	5114.032	108	0.798	18,326	0.549*
Colombia	238	4403.592	3837.5	3896.218	88	0.319	13,056	0.939*
UK	273	3510.769	1297.0	5286.346	151	2.476	26,707	>0.99*
Mexico	245	3629.910	4147.0	2415.718	67	-0.167	9556	0.985*

* Indicates a non-stationary time series based on the ADF test at significance level $\alpha$.

Table 2

Descriptive statistics for deaths series.

Country	N	Mean	Median	SD	CV (%)	Skew.	Max.	ADF (p-value)
USA	282	810.837	793.5	653.305	81	0.688	2609	0.918*
India	274	434.296	335.0	415.217	96	0.437	1290	>0.99*
Brazil	247	644.113	632.0	434.916	68	0.009	1595	>0.99*
Russia	273	99.308	105.0	78.554	79	0.450	359	>0.99*
France	280	131.250	32.0	210.014	160	2.364	1122	0.811*
Spain	272	146.485	47.5	217.083	148	1.912	961	0.551*
Argentina	241	113.842	35.0	140.192	123	1.161	515	0.670*
Colombia	238	130.134	141.0	114.745	88	0.421	400	0.982*
UK	273	168.663	34.0	272.376	161	2.059	1224	0.922*
Mexico	245	362.282	342.0	277.708	77	0.299	1092	0.942*

* Indicates a non-stationary time series based on the ADF test at significance level $\alpha$.

Table 3

Descriptive statistics for recovered cases series.

Country	N	Mean	Median	SD	CV (%)	Skew.	Max.	ADF (p-value)
USA	282	12030.206	9181.5	11372.002	95	0.790	48,872	0.272*
India	274	26910.128	8584.5	31935.463	119	0.758	101,468	0.868*
Brazil	247	19381.733	18303.0	17437.521	90	0.625	76,649	0.950*
Russia	273	4320.385	4342.0	3706.346	86	0.310	14,550	>0.99*
France	280	515.836	361.0	501.929	97	1.212	2266	0.976*
Spain	272	552.853	0.0	1203.639	218	2.145	6399	0.641*
Argentina	241	3569.515	934.0	4484.971	126	0.987	14,987	0.961*
Colombia	238	3992.454	1644.0	4390.565	110	0.744	16,594	0.093*
UK	273	9.018	5.0	12.145	135	1.973	58	0.091*
Mexico	245	3159.686	3193.0	2404.058	76	0.254	10,915	0.959*

* Indicates a non-stationary time series based on the ADF test at significance level $\alpha$.

The skewness coefficients indicate that all the time series considered in this study are right-skewed, except the number of confirmed cases in Mexico. Highly right-skewed series are prone to extreme values. This suggests, first, that the median is a more appropriate measure of central tendency than the mean for the majority of these skewed series. Secondly, because the series differ greatly in magnitude between countries, the coefficient of variation, which expresses the standard deviation as a percentage of the mean, is better suited than the standard deviation for comparing variability between countries. The last column of Tables 1, 2 and 3 shows the p-value of the Augmented Dickey-Fuller (ADF) unit root test, which is one of the most commonly used unit root tests in the literature. It tests the null hypothesis that an observed time series has a unit root against the alternative of stationarity [73]. The results of the ADF test, obtained using the function adf.test from the R package tseries [74], do not reject the unit-root null hypothesis for any of the series, which is consistent with all of the time series used here being non-stationary. The skewness and non-stationarity of the time series may adversely affect the forecasting results of linear time series models such as ARIMA. Note that the lengths of the time series differ because each series has a different starting time (the time of its first observation), defined as the first time that confirmed cases were reported by governments. Table 4 lists the starting time of the series for the ten countries.
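The summary statistics in Tables 1, 2 and 3 can be computed along the following lines. The skewness definition used in the paper is not stated, so this sketch assumes the moment coefficient $g_1 = m_3 / m_2^{3/2}$ together with the sample standard deviation, and its values may differ slightly from those in the tables. The function name `describe` and the toy data are hypothetical.

```python
import statistics


def describe(x):
    """Summary statistics of the kind reported in Tables 1, 2 and 3 (sketch)."""
    n = len(x)
    mean = statistics.fmean(x)
    sd = statistics.stdev(x)                        # sample SD, n - 1 denominator
    m2 = sum((v - mean) ** 2 for v in x) / n        # second central moment
    m3 = sum((v - mean) ** 3 for v in x) / n        # third central moment
    return {
        "N": n,
        "Mean": mean,
        "Median": statistics.median(x),
        "SD": sd,
        "CV": 100.0 * sd / mean,                    # coefficient of variation in percent
        "Skew.": m3 / m2 ** 1.5,                    # assumed definition g1 = m3 / m2^(3/2)
        "Max.": max(x),
    }


stats = describe([2, 4, 4, 4, 5, 5, 7, 9])
```

For this right-skewed toy sample the mean (5) exceeds the median (4.5), the same pattern seen for most series in the tables.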

Table 4

Starting time of series for ten countries.

Country	Starting time
USA	22-01-2020
India	30-01-2020
Brazil	26-02-2020
Russia	31-01-2020
France	24-01-2020
Spain	01-02-2020
Argentina	03-03-2020
Colombia	06-03-2020
UK	31-01-2020
Mexico	28-02-2020

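Assuming each series runs daily from its starting time in Table 4 up to and including 29 October 2020, the series lengths N reported in Tables 1, 2 and 3 can be checked with simple date arithmetic. The dictionary and function names below are illustrative.

```python
from datetime import date

START = {  # starting times from Table 4
    "USA": date(2020, 1, 22), "India": date(2020, 1, 30), "Brazil": date(2020, 2, 26),
    "Russia": date(2020, 1, 31), "France": date(2020, 1, 24), "Spain": date(2020, 2, 1),
    "Argentina": date(2020, 3, 3), "Colombia": date(2020, 3, 6), "UK": date(2020, 1, 31),
    "Mexico": date(2020, 2, 28),
}
END = date(2020, 10, 29)


def series_length(start, end=END):
    """Number of daily observations from start to end, both inclusive."""
    return (end - start).days + 1


lengths = {country: series_length(d) for country, d in START.items()}
```

Under this assumption all ten lengths agree with the N column, for example 282 days for the USA and 238 for Colombia.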

Empirical results

In this section, the performance of R-SSA, V-SSA and the other time series forecasting methods reviewed in Section 3 is evaluated by applying them to the confirmed cases, deaths, and recoveries described in Section 4. Forecast accuracy is measured using the RMSE. Steps 1–5 of Algorithm 1 are used to compute the RMSE of each forecasting method at forecasting horizon h, as defined in (6). The R-SSA and V-SSA forecasts are produced with the optimal window length and number of leading components, which are reported in Table 5 for the confirmed, deaths, and recovered series of the ten countries.
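
Algorithm 1 is defined in Section 3 and is not reproduced here. The following NumPy code is a sketch of basic recurrent SSA (R-SSA) forecasting together with a grid search over the window length L and the number of leading components r. It assumes, which may differ from Algorithm 1, that the RMSE at horizon h is computed on a single hold-out of the last h observations. The steps are embedding into an L-by-K trajectory matrix, SVD, reconstruction from the r leading eigentriples by diagonal averaging, and continuation of the reconstructed series with the linear recurrence built from the leading left singular vectors. V-SSA is not shown, and the names `rssa_forecast`, `rmse` and `optimal_rssa_params` are hypothetical.

```python
import numpy as np


def rssa_forecast(y, L, r, h):
    """Recurrent SSA forecast of h steps with window length L and r leading components."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    K = N - L + 1
    if not (2 <= L <= N - 1 and 1 <= r <= min(L - 1, K)):
        raise ValueError("need 2 <= L <= N-1 and 1 <= r <= min(L-1, N-L+1)")
    # Embedding: L x K trajectory (Hankel) matrix whose j-th column is y[j:j+L]
    X = np.column_stack([y[j:j + L] for j in range(K)])
    # SVD; keep the r leading eigentriples
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Ur = U[:, :r]
    Xr = (Ur * s[:r]) @ Vt[:r, :]
    # Diagonal averaging: average each anti-diagonal of Xr to get the reconstructed series
    rec = np.zeros(N)
    counts = np.zeros(N)
    for j in range(K):
        rec[j:j + L] += Xr[:, j]
        counts[j:j + L] += 1
    rec /= counts
    # Linear recurrence coefficients from the last row of Ur (verticality coefficient nu2)
    pi = Ur[-1, :]
    nu2 = float(pi @ pi)
    if nu2 >= 1.0:
        raise ValueError("verticality coefficient >= 1, recurrence undefined")
    R = (Ur[:-1, :] @ pi) / (1.0 - nu2)  # R[0] weights the oldest of the last L-1 values
    # Recurrent forecasting: extend the reconstructed series one step at a time
    ext = list(rec)
    for _ in range(h):
        ext.append(float(R @ np.asarray(ext[-(L - 1):])))
    return np.asarray(ext[N:])


def rmse(actual, forecast):
    """Root mean squared error between two equal-length sequences."""
    err = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


def optimal_rssa_params(y, h, L_values, r_max):
    """Return (RMSE, L, r) minimising the h-step hold-out RMSE over a grid of (L, r)."""
    y = np.asarray(y, dtype=float)
    train, test = y[:-h], y[-h:]
    best = None
    for L in L_values:
        for r in range(1, min(r_max, L - 1) + 1):
            try:
                err = rmse(test, rssa_forecast(train, L, r, h))
            except ValueError:
                continue  # (L, r) not admissible for this training length
            if np.isfinite(err) and (best is None or err < best[0]):
                best = (err, L, r)
    return best
```

For a noise-free sinusoid, which is a series of rank 2, `rssa_forecast(y, L=24, r=2, h=12)` reproduces the continuation up to rounding error. On real COVID-19 counts the selected pairs vary by country, series and horizon, as Table 5 shows.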

Table 5

Optimal (window length, number of leading components) pairs for SSA forecasting.

Country	Time series	Forecasting method	h = 7	h = 14	h = 20	h = 30	h = 40
USA	confirmed	R-SSA	(35,5)	(43,6)	(43,6)	(43,6)	(7,2)
	confirmed	V-SSA	(36,5)	(45,6)	(43,6)	(9,2)	(8,2)
	deaths	R-SSA	(73,12)	(73,12)	(70,11)	(72,12)	(73,12)
	deaths	V-SSA	(80,12)	(80,12)	(80,12)	(77,14)	(82,13)
	recovered	R-SSA	(11,2)	(6,1)	(5,1)	(5,1)	(5,1)
	recovered	V-SSA	(43,6)	(7,1)	(7,1)	(7,2)	(6,1)
India	confirmed	R-SSA	(15,5)	(14,3)	(14,3)	(7,2)	(7,2)
	confirmed	V-SSA	(15,5)	(15,5)	(11,2)	(2,1)	(2,1)
	deaths	R-SSA	(91,6)	(91,2)	(78,5)	(76,5)	(91,4)
	deaths	V-SSA	(89,12)	(77,9)	(91,12)	(78,8)	(74,9)
	recovered	R-SSA	(2,1)	(2,1)	(2,1)	(2,1)	(2,1)
	recovered	V-SSA	(2,1)	(2,1)	(2,1)	(2,1)	(7,2)
Brazil	confirmed	R-SSA	(25,5)	(25,5)	(25,5)	(25,5)	(5,1)
	confirmed	V-SSA	(26,6)	(32,5)	(32,5)	(32,5)	(11,2)
	deaths	R-SSA	(36,9)	(36,9)	(59,13)	(59,13)	(59,13)
	deaths	V-SSA	(57,14)	(57,20)	(59,19)	(58,19)	(65,19)
	recovered	R-SSA	(7,1)	(4,1)	(3,1)	(3,1)	(3,1)
	recovered	V-SSA	(9,2)	(6,2)	(6,2)	(6,2)	(6,2)
Russia	confirmed	R-SSA	(13,3)	(7,2)	(9,4)	(75,18)	(74,27)
	confirmed	V-SSA	(13,3)	(8,4)	(24,7)	(75,18)	(73,30)
	deaths	R-SSA	(29,7)	(14,6)	(14,6)	(36,7)	(36,7)
	deaths	V-SSA	(34,7)	(33,7)	(37,7)	(37,7)	(37,7)
	recovered	R-SSA	(10,7)	(15,7)	(15,7)	(24,4)	(24,4)
	recovered	V-SSA	(10,8)	(16,11)	(16,11)	(16,11)	(16,11)
France	confirmed	R-SSA	(25,4)	(67,8)	(67,8)	(67,7)	(67,4)
	confirmed	V-SSA	(25,5)	(60,15)	(89,8)	(93,9)	(67,5)
	deaths	R-SSA	(16,4)	(63,25)	(63,27)	(2,1)	(2,1)
	deaths	V-SSA	(11,6)	(61,30)	(61,30)	(2,1)	(2,1)
	recovered	R-SSA	(2,1)	(75,14)	(9,1)	(8,1)	(2,1)
	recovered	V-SSA	(4,1)	(83,14)	(64,26)	(6,1)	(2,1)
Spain	confirmed	R-SSA	(24,7)	(3,1)	(3,1)	(5,1)	(5,1)
	confirmed	V-SSA	(26,7)	(4,1)	(5,1)	(7,1)	(5,1)
	deaths	R-SSA	(14,2)	(3,1)	(3,1)	(2,1)	(2,1)
	deaths	V-SSA	(18,2)	(4,1)	(4,1)	(2,1)	(3,1)
	recovered	R-SSA	(35,1)	(35,1)	(2,1)	(2,1)	(2,1)
	recovered	V-SSA	(73,4)	(70,3)	(70,3)	(69,1)	(66,4)
Argentina	confirmed	R-SSA	(74,5)	(12,3)	(9,3)	(5,1)	(5,1)
	confirmed	V-SSA	(68,4)	(11,3)	(11,3)	(11,3)	(11,2)
	deaths	R-SSA	(17,4)	(26,2)	(26,2)	(20,1)	(13,1)
	deaths	V-SSA	(13,4)	(33,4)	(27,4)	(18,2)	(15,2)
	recovered	R-SSA	(3,1)	(4,1)	(6,1)	(2,1)	(6,1)
	recovered	V-SSA	(4,1)	(80,1)	(4,1)	(2,1)	(80,1)
Colombia	confirmed	R-SSA	(3,1)	(8,1)	(8,2)	(6,2)	(6,2)
	confirmed	V-SSA	(3,1)	(2,1)	(2,1)	(2,1)	(2,1)
	deaths	R-SSA	(3,1)	(2,1)	(6,2)	(6,2)	(3,1)
	deaths	V-SSA	(7,2)	(2,1)	(2,1)	(2,1)	(2,1)
	recovered	R-SSA	(2,1)	(13,1)	(6,1)	(5,1)	(4,1)
	recovered	V-SSA	(2,1)	(15,1)	(2,1)	(2,1)	(3,1)
UK	confirmed	R-SSA	(4,1)	(2,1)	(87,22)	(49,7)	(86,15)
	confirmed	V-SSA	(5,1)	(2,1)	(51,7)	(74,16)	(88,23)
	deaths	R-SSA	(21,18)	(13,11)	(13,11)	(2,1)	(2,1)
	deaths	V-SSA	(24,20)	(24,20)	(2,1)	(4,1)	(6,1)
	recovered	R-SSA	(25,1)	(43,5)	(43,5)	(43,6)	(24,1)
	recovered	V-SSA	(25,1)	(34,1)	(53,7)	(52,8)	(53,9)
Mexico	confirmed	R-SSA	(15,5)	(15,5)	(6,3)	(5,1)	(4,1)
	confirmed	V-SSA	(16,4)	(16,4)	(16,4)	(16,4)	(16,4)
	deaths	R-SSA	(41,5)	(34,5)	(34,5)	(56,3)	(57,3)
	deaths	V-SSA	(62,10)	(61,10)	(42,24)	(36,5)	(36,5)
	recovered	R-SSA	(8,2)	(6,1)	(5,1)	(5,1)	(5,1)
	recovered	V-SSA	(24,7)	(7,1)	(7,1)	(6,1)	(6,1)

Table 6 shows the rounded RMSEs of forecasting the number of confirmed cases for the ten countries, calculated for each forecasting method and forecasting horizon. The RMSEs of forecasting the numbers of deaths and recovered cases are reported in Tables 7 and 8 (Spain does not appear in Table 8). The bold font in these tables marks the forecasting method with the lowest RMSE at each horizon for a given country, and the last column gives the average RMSE across all forecasting horizons for each forecasting method.

Table 6

The RMSE of forecasting the number of confirmed cases.

Country	Forecasting method	h = 7	h = 14	h = 20	h = 30	h = 40	Avg.
USA	R-SSA	6881	8585	9920	13,184	16,090	10,932
	V-SSA	6735	8517	10,535	13,610	16,110	11,101
	ARIMA	9231	13,020	16,195	21,521	28,557	17,705
	ARFIMA	10,562	10,974	11,112	11,218	11,791	11,131
	ETS	9472	10,447	11,261	12,288	13,356	11,365
	TBATS	7549	9567	11,130	14,021	16,894	11,832
	NNAR	7832	8747	9672	9522	9997	9154
India	R-SSA	6478	11,786	18,024	29,677	43,564	21,906
	V-SSA	6737	12,561	19,574	30,265	44,493	22,726
	ARIMA	8620	13,942	19,149	26,074	32,109	19,979
	ARFIMA	15,317	22,366	27,189	34,542	41,617	28,206
	ETS	8007	12,027	16,615	25,654	33,955	19,252
	TBATS	7282	10,760	14,748	22,170	30,457	17,083
	NNAR	9172	11,763	13,523	16,104	18,241	13,761
Brazil	R-SSA	8656	9932	10,732	13,301	15,336	11,591
	V-SSA	8920	10,307	11,200	13,877	16,406	12,142
	ARIMA	11,390	15,895	21,312	35,610	56,389	28,119
	ARFIMA	12,150	12,706	13,164	13,673	14,195	13,178
	ETS	11,227	11,603	11,947	12,991	14,265	12,407
	TBATS	9737	10,420	10,732	11,795	13,057	11,148
	NNAR	9242	10,243	11,052	12,641	13,620	11,360
Russia	R-SSA	405	815	1285	1481	2081	1213
	V-SSA	396	812	1267	1507	2186	1234
	ARIMA	427	720	1065	1902	2669	1357
	ARFIMA	666	1243	1827	2524	2816	1815
	ETS	454	858	1299	2109	2705	1485
	TBATS	473	854	1283	2046	2553	1442
	NNAR	934	1656	2182	2598	2717	2017
France	R-SSA	2104	2299	2488	2837	3545	2655
	V-SSA	2094	2301	2524	2685	3377	2596
	ARIMA	2563	3460	4336	5192	6517	4414
	ARFIMA	4559	5650	6259	7349	8380	6439
	ETS	2360	2969	3667	4627	5740	3873
	TBATS	2298	2873	3538	4294	5313	3663
	NNAR	3062	3979	4754	5921	7494	5042
Spain	R-SSA	1273	1708	1999	2495	2920	2079
	V-SSA	1291	1700	1979	2485	2958	2083
	ARIMA	1474	1913	2187	2826	4484	2577
	ARFIMA	2349	3412	4179	5250	6186	4275
	ETS	1521	1953	2207	2473	2934	2218
	TBATS	1524	1996	2286	2561	3195	2312
	NNAR	2538	2812	3158	3938	4828	3455
Argentina	R-SSA	1498	1904	2297	2965	3889	2511
	V-SSA	1583	1878	2217	3076	4152	2581
	ARIMA	1825	2208	2574	2769	2792	2434
	ARFIMA	3063	3873	4546	5376	6033	4578
	ETS	2422	2566	2797	2978	3071	2767
	TBATS	1789	1973	2204	2610	2925	2300
	NNAR	2650	3247	3616	4103	4640	3651
Colombia	R-SSA	1500	2339	2451	3492	4893	2935
	V-SSA	1507	2026	2529	3685	5376	3025
	ARIMA	1778	2582	3409	5188	7698	4131
	ARFIMA	1529	1686	1779	2014	2282	1858
	ETS	1628	2247	2937	4369	6355	3507
	TBATS	1371	1715	2080	2778	3744	2338
	NNAR	1355	1583	1727	2009	2276	1790
UK	R-SSA	1659	2296	2357	2464	3192	2394
	V-SSA	1642	2296	2039	2373	3031	2276
	ARIMA	1736	2522	3041	3714	4612	3125
	ARFIMA	2412	3191	3674	4361	5074	3742
	ETS	1622	2409	3161	4134	4882	3242
	TBATS	1618	2353	2881	3334	3859	2809
	NNAR	2148	2820	3311	4185	4687	3430
Mexico	R-SSA	847	1015	1108	1299	1502	1154
	V-SSA	830	886	939	1095	1293	1009
	ARIMA	1135	1438	1753	2584	3828	2148
	ARFIMA	1238	1244	1250	1303	1365	1280
	ETS	1050	1088	1119	1234	1430	1184
	TBATS	1028	1074	1111	1216	1316	1149
	NNAR	1059	1091	1130	1219	1310	1162

Note: The bold font shows the forecasting method with the lowest RMSE at each horizon for a given country.

Table 7

The RMSE of forecasting the number of deaths.

Country	Forecasting method	h = 7	h = 14	h = 20	h = 30	h = 40	Avg.
USA	R-SSA	121	128	134	141	158	136
	V-SSA	120	129	139	150	168	141
	ARIMA	264	312	361	458	599	399
	ARFIMA	360	367	377	379	386	374
	ETS	380	395	412	438	469	419
	TBATS	222	250	282	337	386	295
	NNAR	172	184	191	203	216	193
India	R-SSA	102	128	143	168	191	146
	V-SSA	98	119	128	147	149	128
	ARIMA	114	162	213	285	343	223
	ARFIMA	194	263	309	376	447	318
	ETS	112	152	202	292	370	226
	TBATS	107	140	178	231	265	184
	NNAR	123	146	161	172	180	156
Brazil	R-SSA	143	155	164	179	200	168
	V-SSA	144	157	165	176	190	166
	ARIMA	268	386	555	1024	1928	832
	ARFIMA	298	306	313	314	320	310
	ETS	267	282	297	316	348	302
	TBATS	209	222	239	255	270	239
	NNAR	231	253	269	294	317	273
Russia	R-SSA	19	24	30	37	42	30
	V-SSA	18	23	28	35	40	29
	ARIMA	32	38	39	42	42	39
	ARFIMA	44	45	48	54	58	50
	ETS	37	42	46	52	56	47
	TBATS	33	38	41	46	51	42
	NNAR	35	39	42	46	47	42
France	R-SSA	37	45	45	50	56	47
	V-SSA	37	45	45	50	56	47
	ARIMA	42	45	45	49	55	47
	ARFIMA	42	44	44	46	51	45
	ETS	39	42	42	49	55	45
	TBATS	39	43	43	46	53	45
	NNAR	47	57	65	68	65	60
Spain	R-SSA	52	62	69	79	89	70
	V-SSA	52	61	68	79	88	70
	ARIMA	50	58	63	74	88	67
	ARFIMA	56	65	72	86	101	76
	ETS	50	57	63	75	90	67
	TBATS	51	60	67	75	86	68
	NNAR	87	95	119	138	141	116
Argentina	R-SSA	87	100	110	127	141	113
	V-SSA	86	98	108	124	139	111
	ARIMA	97	108	113	117	113	110
	ARFIMA	108	134	153	182	200	155
	ETS	97	112	126	135	128	120
	TBATS	106	128	155	200	137	145
	NNAR	101	122	128	134	145	126
Colombia	R-SSA	38	52	69	107	162	85
	V-SSA	38	52	69	109	165	87
	ARIMA	45	66	91	146	225	114
	ARFIMA	34	38	39	40	42	38
	ETS	36	46	57	84	116	68
	TBATS	35	43	52	73	98	60
	NNAR	37	47	56	75	97	63
UK	R-SSA	23	28	27	33	36	29
	V-SSA	22	28	31	33	35	30
	ARIMA	27	31	33	35	36	32
	ARFIMA	26	29	32	33	35	31
	ETS	20	21	24	34	42	28
	TBATS	18	22	26	33	40	28
	NNAR	27	34	37	48	52	40
Mexico	R-SSA	113	124	135	151	166	138
	V-SSA	122	132	147	159	174	147
	ARIMA	179	226	280	429	679	359
	ARFIMA	185	185	185	189	195	188
	ETS	181	179	180	190	205	187
	TBATS	180	187	192	214	246	204
	NNAR	158	170	195	215	238	195

Note: The bold font shows the forecasting method with the lowest RMSE at each horizon for a given country.

Table 8

The RMSE of forecasting the number of recovered cases.

Country	Forecasting method	h = 7	h = 14	h = 20	h = 30	h = 40	Avg.
USA	R-SSA	8442	8960	9568	11,089	13,439	10,300
	V-SSA	8174	9053	9716	11,497	14,179	10,524
	ARIMA	9014	11,079	14,105	21,463	33,801	17,892
	ARFIMA	9584	10,071	10,627	11,743	13,231	11,051
	ETS	8320	8423	8506	8870	9181	8660
	TBATS	8226	8557	8984	10,033	11,631	9486
	NNAR	8805	9034	9366	9713	9818	9347
India	R-SSA	7531	12,050	16,991	28,351	41,859	21,356
	V-SSA	7531	12,050	16,991	28,351	41,540	21,293
	ARIMA	9080	15,161	21,918	37,759	58,131	28,410
	ARFIMA	13,458	20,923	27,035	36,878	45,806	28,820
	ETS	8466	12,619	17,074	27,410	37,001	20,514
	TBATS	6471	9381	12,652	19,206	21,120	13,766
	NNAR	8573	12,971	16,084	20,424	23,135	16,237
Brazil	R-SSA	10,689	12,312	13,941	16,879	21,358	15,036
	V-SSA	10,602	12,027	13,639	16,811	21,616	14,939
	ARIMA	12,662	17,031	23,130	37,745	60,446	30,203
	ARFIMA	10,648	11,103	11,764	12,296	12,732	11,709
	ETS	10,982	12,809	14,955	20,654	29,794	17,839
	TBATS	9390	10,128	10,762	11,721	12,275	10,855
	NNAR	11,293	11,959	12,354	12,474	12,326	12,081
Russia	R-SSA	733	1049	1256	1424	1592	1211
	V-SSA	755	1030	1201	1365	1513	1173
	ARIMA	1744	2549	3490	5740	9530	4611
	ARFIMA	2163	2254	2298	2463	2587	2353
	ETS	1789	1846	1935	2055	2106	1946
	TBATS	1400	1643	1761	1850	1900	1711
	NNAR	1080	1382	1641	1977	2193	1655
France	R-SSA	242	276	293	305	337	291
	V-SSA	235	263	279	300	337	283
	ARIMA	243	284	301	316	356	300
	ARFIMA	273	304	311	325	363	315
	ETS	238	275	293	310	346	292
	TBATS	239	282	302	313	338	295
	NNAR	278	320	355	368	381	340
Argentina	R-SSA	1395	1885	2416	3978	7003	3335
	V-SSA	1397	1880	2433	3978	6883	3314
	ARIMA	1299	1566	1881	2788	4240	2355
	ARFIMA	2360	3336	4122	5333	6164	4263
	ETS	1307	1568	1860	2666	3906	2262
	TBATS	1336	1634	1884	2312	2886	2010
	NNAR	1536	1895	2156	2695	3213	2299
Colombia	R-SSA	3473	4889	5898	8670	13,436	7273
	V-SSA	3473	4947	6040	9030	14,151	7528
	ARIMA	3700	5329	6573	9632	13,955	7838
	ARFIMA	3216	4041	4206	4342	4685	4098
	ETS	3502	4773	5516	7557	10,932	6456
	TBATS	3253	4087	4349	5105	6251	4609
	NNAR	3115	3734	3760	4120	4430	3832
UK	R-SSA	12	13	14	15	16	14
	V-SSA	12	13	14	15	15	14
	ARIMA	13	15	16	18	18	16
	ARFIMA	14	16	17	18	18	17
	ETS	13	15	16	17	17	16
	TBATS	12	13	14	15	15	14
	NNAR	14	16	602,997	18	4,907,095	1,102,028
Mexico	R-SSA	1442	1570	1639	1749	1955	1671
	V-SSA	1441	1583	1664	1824	2102	1723
	ARIMA	1658	1903	2199	2873	3892	2505
	ARFIMA	1637	1637	1649	1626	1630	1636
	ETS	1578	1601	1629	1608	1653	1614
	TBATS	1528	1571	1632	1613	1624	1594
	NNAR	1402	1551	1655	1864	2033	1701

Note: The bold font shows the forecasting method with the lowest RMSE at each horizon for a given country.

From the RMSEs reported in Tables 6, 7 and 8, the best forecasting technique is the one with the minimum RMSE. For example, the best model for forecasting the number of deaths in the USA at forecasting horizon 40 is R-SSA. The best model for forecasting the confirmed series of each country is presented in Table 9; similarly, the best models for forecasting deaths and recoveries are reported in Tables 10 and 11. The last column of Tables 9, 10 and 11 shows the best model on average for a given country, which corresponds to the lowest RMSE in the last column of Tables 6, 7 and 8. The first finding from Tables 9, 10 and 11 is that no single model provides the best forecast of the number of confirmed cases, deaths, and recoveries for all ten countries considered here. Secondly, based on the number of times that the R-SSA and V-SSA techniques outperform the other models across all horizons, the two SSA models can be regarded as viable options for forecasting the number of daily confirmed cases, deaths, and recoveries caused by COVID-19. Another interesting finding is that ARFIMA is the best model for forecasting the number of deaths in Colombia at every forecasting horizon, which suggests long-range dependence in the Colombian deaths series.
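
The selection rule behind Tables 9, 10 and 11 (lowest RMSE at each horizon, and lowest average RMSE for the Avg. column) can be sketched as follows, using the USA and Brazil rows of Table 6. Because the inputs are the rounded RMSEs printed in the table, recomputed averages may differ slightly from the Avg. column. Ties are broken in favour of the method listed first; this tie rule is an assumption, chosen because it reproduces the Brazil h = 20 entry of Table 9, where R-SSA and TBATS both show 10,732. The function names are hypothetical.

```python
from statistics import mean

# Rounded RMSEs at h = 7, 14, 20, 30, 40, transcribed from Table 6 (confirmed cases).
USA = {
    "R-SSA": [6881, 8585, 9920, 13184, 16090],
    "V-SSA": [6735, 8517, 10535, 13610, 16110],
    "ARIMA": [9231, 13020, 16195, 21521, 28557],
    "ARFIMA": [10562, 10974, 11112, 11218, 11791],
    "ETS": [9472, 10447, 11261, 12288, 13356],
    "TBATS": [7549, 9567, 11130, 14021, 16894],
    "NNAR": [7832, 8747, 9672, 9522, 9997],
}
BRAZIL = {
    "R-SSA": [8656, 9932, 10732, 13301, 15336],
    "V-SSA": [8920, 10307, 11200, 13877, 16406],
    "ARIMA": [11390, 15895, 21312, 35610, 56389],
    "ARFIMA": [12150, 12706, 13164, 13673, 14195],
    "ETS": [11227, 11603, 11947, 12991, 14265],
    "TBATS": [9737, 10420, 10732, 11795, 13057],
    "NNAR": [9242, 10243, 11052, 12641, 13620],
}


def best_by_horizon(rmse_table):
    """Method with the lowest RMSE at each horizon; min() keeps the first method on ties."""
    n_horizons = len(next(iter(rmse_table.values())))
    return [min(rmse_table, key=lambda m: rmse_table[m][k]) for k in range(n_horizons)]


def best_on_average(rmse_table):
    """Method with the lowest RMSE averaged over all horizons (the Avg. column)."""
    return min(rmse_table, key=lambda m: mean(rmse_table[m]))
```

Both functions reproduce the Table 9 entries for these two countries: V-SSA, V-SSA, NNAR, NNAR, NNAR with NNAR on average for the USA, and R-SSA, R-SSA, R-SSA, TBATS, TBATS with TBATS on average for Brazil.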

Table 9

The best model for forecasting the COVID-19 confirmed cases.

Country	h = 7	h = 14	h = 20	h = 30	h = 40	Avg.
USA	V-SSA	V-SSA	NNAR	NNAR	NNAR	NNAR
India	R-SSA	TBATS	NNAR	NNAR	NNAR	NNAR
Brazil	R-SSA	R-SSA	R-SSA	TBATS	TBATS	TBATS
Russia	V-SSA	ARIMA	ARIMA	R-SSA	R-SSA	R-SSA
France	V-SSA	R-SSA	R-SSA	V-SSA	V-SSA	V-SSA
Spain	R-SSA	V-SSA	V-SSA	ETS	R-SSA	R-SSA
Argentina	R-SSA	V-SSA	TBATS	TBATS	ARIMA	TBATS
Colombia	NNAR	NNAR	NNAR	NNAR	NNAR	NNAR
UK	TBATS	V-SSA	V-SSA	V-SSA	V-SSA	V-SSA
Mexico	V-SSA	V-SSA	V-SSA	V-SSA	V-SSA	V-SSA

Table 10

The best model for forecasting the number of deaths caused by COVID-19.

Country	h = 7	h = 14	h = 20	h = 30	h = 40	Avg.
USA	V-SSA	R-SSA	R-SSA	R-SSA	R-SSA	R-SSA
India	V-SSA	V-SSA	V-SSA	V-SSA	V-SSA	V-SSA
Brazil	R-SSA	R-SSA	R-SSA	V-SSA	V-SSA	V-SSA
Russia	V-SSA	V-SSA	V-SSA	V-SSA	V-SSA	V-SSA
France	R-SSA	ETS	ETS	TBATS	ARFIMA	TBATS
Spain	ARIMA	ETS	ETS	ARIMA	TBATS	ARIMA
Argentina	V-SSA	V-SSA	V-SSA	ARIMA	ARIMA	ARIMA
Colombia	ARFIMA	ARFIMA	ARFIMA	ARFIMA	ARFIMA	ARFIMA
UK	TBATS	ETS	ETS	TBATS	ARFIMA	TBATS
Mexico	R-SSA	R-SSA	R-SSA	R-SSA	R-SSA	R-SSA

Table 11

The best model for forecasting the recovered cases.

Country	h = 7	h = 14	h = 20	h = 30	h = 40	Avg.
USA	V-SSA	ETS	ETS	ETS	ETS	ETS
India	TBATS	TBATS	TBATS	TBATS	TBATS	TBATS
Brazil	TBATS	TBATS	TBATS	TBATS	TBATS	TBATS
Russia	R-SSA	V-SSA	V-SSA	V-SSA	V-SSA	V-SSA
France	V-SSA	V-SSA	V-SSA	V-SSA	V-SSA	V-SSA
Argentina	ARIMA	ARIMA	ETS	TBATS	TBATS	TBATS
Colombia	NNAR	NNAR	NNAR	NNAR	NNAR	NNAR
UK	V-SSA	TBATS	TBATS	V-SSA	V-SSA	V-SSA
Mexico	NNAR	NNAR	ETS	ETS	TBATS	TBATS
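
The statement that the SSA methods are viable options rests on how often R-SSA or V-SSA appears as the best model in Tables 9, 10 and 11. A small tally over the horizon columns of those tables (the Avg. column is excluded) makes the count explicit. The data below are transcribed from the tables, and the helper names are hypothetical.

```python
from collections import Counter

# Best model at h = 7, 14, 20, 30, 40, transcribed from Tables 9, 10 and 11
# (the Avg. column is left out). Spain is absent from Table 11.
BEST = {
    "confirmed": {
        "USA": "V-SSA V-SSA NNAR NNAR NNAR",
        "India": "R-SSA TBATS NNAR NNAR NNAR",
        "Brazil": "R-SSA R-SSA R-SSA TBATS TBATS",
        "Russia": "V-SSA ARIMA ARIMA R-SSA R-SSA",
        "France": "V-SSA R-SSA R-SSA V-SSA V-SSA",
        "Spain": "R-SSA V-SSA V-SSA ETS R-SSA",
        "Argentina": "R-SSA V-SSA TBATS TBATS ARIMA",
        "Colombia": "NNAR NNAR NNAR NNAR NNAR",
        "UK": "TBATS V-SSA V-SSA V-SSA V-SSA",
        "Mexico": "V-SSA V-SSA V-SSA V-SSA V-SSA",
    },
    "deaths": {
        "USA": "V-SSA R-SSA R-SSA R-SSA R-SSA",
        "India": "V-SSA V-SSA V-SSA V-SSA V-SSA",
        "Brazil": "R-SSA R-SSA R-SSA V-SSA V-SSA",
        "Russia": "V-SSA V-SSA V-SSA V-SSA V-SSA",
        "France": "R-SSA ETS ETS TBATS ARFIMA",
        "Spain": "ARIMA ETS ETS ARIMA TBATS",
        "Argentina": "V-SSA V-SSA V-SSA ARIMA ARIMA",
        "Colombia": "ARFIMA ARFIMA ARFIMA ARFIMA ARFIMA",
        "UK": "TBATS ETS ETS TBATS ARFIMA",
        "Mexico": "R-SSA R-SSA R-SSA R-SSA R-SSA",
    },
    "recovered": {
        "USA": "V-SSA ETS ETS ETS ETS",
        "India": "TBATS TBATS TBATS TBATS TBATS",
        "Brazil": "TBATS TBATS TBATS TBATS TBATS",
        "Russia": "R-SSA V-SSA V-SSA V-SSA V-SSA",
        "France": "V-SSA V-SSA V-SSA V-SSA V-SSA",
        "Argentina": "ARIMA ARIMA ETS TBATS TBATS",
        "Colombia": "NNAR NNAR NNAR NNAR NNAR",
        "UK": "V-SSA TBATS TBATS V-SSA V-SSA",
        "Mexico": "NNAR NNAR ETS ETS TBATS",
    },
}

SSA_METHODS = {"R-SSA", "V-SSA"}


def wins_by_method(table):
    """Number of country-horizon cells in which each method is listed as best."""
    return Counter(m for row in table.values() for m in row.split())


def ssa_win_count(table):
    """(cells won by R-SSA or V-SSA, total cells) for one of Tables 9, 10 and 11."""
    wins = wins_by_method(table)
    return sum(wins[m] for m in SSA_METHODS), sum(wins.values())
```

With these transcriptions, `ssa_win_count` gives 29 of 50 cells for confirmed cases, 29 of 50 for deaths, and 14 of 45 for recovered cases. Ties in the rounded RMSEs, such as UK deaths at h = 30, are credited to whichever model the table lists.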

The results in Tables 9, 10 and 11 are useful to practitioners in two ways. First, they identify the best model for forecasting at a particular horizon for a given country. Second, they enable practitioners to select, for a given country, the best model on average across all forecasting horizons. Using these results, we provide forecasts of the number of confirmed cases, deaths, and recoveries caused by COVID-19 at different forecasting horizons. Figs. 7, 8 and 9 depict the original time series (black circles) together with the 40-day-ahead point forecasts (red squares) for the numbers of confirmed cases, deaths, and recoveries, respectively. The forecasts in Fig. 7 indicate a dramatic increase in the number of confirmed cases in France, Spain, and the UK. Growth is forecast to be slower in Russia and Argentina, and the increase is expected to be gradual in India, Brazil, Colombia, and Mexico. These results also indicate that the number of confirmed cases in the USA will decrease.

Fig. 7

Plot of 40-day-ahead point forecasts of confirmed cases starting from October 30, 2020.

Fig. 8

Plot of 40-day-ahead point forecasts of deaths starting from October 30, 2020.

Fig. 9

Plot of 40-day-ahead point forecasts of recovered cases starting from October 30, 2020.

Fig. 8 indicates a considerable increase in the number of deaths in Russia and Argentina and a decrease in India, France, Colombia, and the UK. The number of deaths is forecast to fluctuate around an almost constant level in the USA, Brazil, and Mexico. According to the forecasts depicted in Fig. 9, the number of recovered cases will rise in the USA, Russia, France, and Argentina, and decline in Brazil and especially in India. Recoveries are expected to level off in Colombia and Mexico. In addition, the number of recovered cases in the UK will fluctuate, but with an upward trend.

Discussion

In this study, we have evaluated the potential advantages of SSA for forecasting the number of daily confirmed cases, deaths, and recoveries caused by COVID-19. An algorithm has been proposed to calculate the optimal parameters of SSA, namely the window length and the number of leading components. The results of R-SSA and V-SSA have been compared with those from other conventional time series forecasting techniques, including ARIMA, ARFIMA, ETS, TBATS, and NNAR. The CSSE dataset at Johns Hopkins University, with data up to 29 October 2020, has been used to forecast the number of daily confirmed cases, deaths, and recoveries for the ten most affected countries. It should be noted that the CSSE dataset has a considerable drawback: when the cumulative counts of confirmed cases, deaths, and recoveries are transformed into daily counts, some of the daily values are negative, which is clearly implausible. To deal with this issue, we first treated the negative values and outliers as missing data and then imputed these missing values by Kalman smoothing. The present study is unique in using the optimal versions of V-SSA and R-SSA and comparing their results with those from ARIMA, ARFIMA, ETS, TBATS, and NNAR.

The findings of this study can be summarised as follows. No single model provides the best forecast of the number of confirmed cases, deaths, and recoveries for all ten countries considered here. Based on the number of times that the R-SSA and V-SSA forecasting techniques outperform the other models across all horizons, these methods are viable options for forecasting the number of daily confirmed cases, deaths, and recoveries caused by COVID-19. The point forecasts indicate a rapid rise in the number of confirmed cases in France, Spain, and the UK, slower growth in Russia and Argentina, a gradual increase in India, Brazil, Colombia, and Mexico, and a decrease in the USA. The number of deaths is forecast to increase sharply in Russia and Argentina, to decrease in India, France, Colombia, and the UK, and to fluctuate around an almost constant level in the USA, Brazil, and Mexico. The number of recovered cases is forecast to increase in the USA, Russia, France, and Argentina, to decline in Brazil and especially in India, and to level off in Colombia and Mexico; in the UK it will fluctuate, but with an upward trend.
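
The data preparation described above can be sketched as follows. Cumulative counts are differenced into daily counts from the starting time (the first day with a non-zero cumulative count, matching the definition used for Table 4), and negative daily values are marked as missing. The paper imputes missing values by Kalman smoothing; to keep the example self-contained, this sketch swaps in plain linear interpolation between the nearest observed neighbours, and it does not detect outliers. The function names are hypothetical.

```python
from typing import List, Optional, Sequence


def daily_counts(cumulative: Sequence[float]) -> List[Optional[float]]:
    """Difference a cumulative series from its starting time.

    The starting time is the first day with a non-zero cumulative count. Negative
    daily values (downward revisions) are returned as None, i.e. treated as missing."""
    start = next((i for i, c in enumerate(cumulative) if c > 0), None)
    if start is None:
        return []
    series = [float(c) for c in cumulative[start:]]
    daily: List[Optional[float]] = [series[0]]  # the count before the start is zero
    for prev, cur in zip(series, series[1:]):
        diff = cur - prev
        daily.append(diff if diff >= 0 else None)
    return daily


def fill_missing_linear(values: Sequence[Optional[float]]) -> List[float]:
    """Replace None by linear interpolation between the nearest observed neighbours.

    A simple stand-in for Kalman smoothing; gaps at either end copy the nearest value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no observed values to interpolate from")
    filled: List[float] = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(float(v))
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled.append(float(values[right]))
        elif right is None:
            filled.append(float(values[left]))
        else:
            w = (i - left) / (right - left)
            filled.append(values[left] + w * (values[right] - values[left]))
    return filled
```

For example, the cumulative series 0, 0, 3, 5, 4, 10, 12 gives daily counts 3, 2, missing, 6, 2, and the gap is filled with 4. Imputation of this kind no longer reproduces the reported cumulative total.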

Conclusion

In this paper, we have used the optimal versions of two SSA forecasting techniques, namely R-SSA and V-SSA, to forecast the number of daily confirmed cases, deaths, and recoveries caused by COVID-19. To evaluate the performance of these approaches on the RMSE criterion, the forecasting results have been compared with those from other commonly used time series forecasting methods, including ARIMA, ARFIMA, ETS, TBATS, and NNAR. We considered only the ten countries with the largest numbers of cumulative confirmed cases: the USA, India, Brazil, Russia, France, Spain, Argentina, Colombia, the UK, and Mexico. The evidence from this investigation shows that no single model provides the best forecasts for all of the countries and forecasting horizons considered in this study. However, we have found that the optimal SSA technique can be a powerful tool for forecasting the number of daily confirmed cases, deaths, and recoveries caused by COVID-19, based on the number of times that it outperforms the competing models. Our study has an obvious shortcoming: the forecasting methods used here may produce negative point forecasts, which are meaningless for counts of confirmed cases, deaths, and recoveries. To guarantee non-negative point forecasts, we suggest using count time series models. This work has gone some way towards enhancing our understanding of the capabilities of SSA for forecasting the COVID-19 pandemic. The results of this study enable forecasters to choose the most appropriate model (from those considered here) for a given country and horizon when forecasting the number of confirmed cases, deaths, and recoveries caused by COVID-19. We hope that our forecasts will help governments make appropriate decisions to control the disease and prevent further damage.
In terms of future research, we will apply the multivariate version of SSA that employs the time-dependent correlations between several time series.

CRediT authorship contribution statement

Mahdi Kalantari: Conceptualization, Methodology, Software, Data curation, Visualization, Validation, Writing - original draft, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
