Literature DB >> 35002103

Prediction intervals of the COVID-19 cases by HAR models with growth rates and vaccination rates in top eight affected countries: Bootstrap improvement.

Abstract

This paper is devoted to modeling and predicting COVID-19 confirmed cases through a multiple linear regression. Especially, prediction intervals of the COVID-19 cases are extensively studied. Due to long-memory feature of the COVID-19 data, a heterogeneous autoregression (HAR) is adopted with Growth rates and Vaccination rates; it is called HAR-G-V model. Top eight affected countries are taken with their daily confirmed cases and vaccination rates. Model criteria results such as root mean square error (RMSE), mean absolute error (MAE), R 2 , AIC and BIC are reported in the HAR models with/without the two rates. The HAR-G-V model performs better than other HAR models. Out-of-sample forecasting by the HAR-G-V model is conducted. Forecast accuracy measures such as RMSE, MAE, mean absolute percentage error and root relative square error are computed. Furthermore, three types of prediction intervals are constructed by approximating residuals to normal and Laplace distributions, as well as by employing bootstrap procedure. Empirical coverage probability, average length and mean interval score are evaluated for the three prediction intervals. This work contributes three folds: a novel trial to combine both growth rates and vaccination rates in modeling COVID-19; construction and comparison of three types of prediction intervals; and an attempt to improve coverage probability and mean interval score of prediction intervals via bootstrap technique.

Entities: Chemical

Keywords: Bootstrap procedure; COVID-19; HAR model; Mean interval score; Prediction interval

Year: 2022 PMID： 35002103 PMCID： PMC8720534 DOI： 10.1016/j.chaos.2021.111789

Source DB: PubMed Journal: Chaos Solitons Fractals ISSN： 0960-0779 Impact factor: 5.944

Introduction

The COVID-19 pandemic has lasted for more than a year although vaccinations against the COVID-19 has been recently distributed around the world. The severe virus is still threatening human health, social activity and global economy, despite struggles and efforts of experts to overcome the crisis. Many researchers have attempted to help with health systems of COVID-19. Statisticians and econometricians contribute to preventing the spread of COVID-19 through a variety of time series models, by modeling and forecasting the COVID-19 confirmed and death cases. Notable time series studies have emerged to model the COVID-19 data. From the beginning of Benvenuto et al. [1], who used an ARIMA (autoregressive integrated moving average) model, there have been so many ARIMA time series analyses for the COVID-19 datasets [2], [3], [4], [5], [6], [7], [8], [9], [10]. Adding sophisticated theories of probability distributions and differential equations, mathematical models have been developed by [12], [13], [14], [15], [16], [17], [18], [19], [11] for efficient predictions of the COVID-19 pandemic. In the area of artificial intelligence and machine learning, stochastic modelings for the COVID-19 epidemic disease have been conducted to support the forecast analyses [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], [30], [31]. Among recent remarkable works of the COVID-19 modeling, we refer to Chowell and Luo [32], Bracher et al. [33], Ekinci [34], Hwang and Yu [35]: Chowell and Luo [32] introduced ensemble methodology for forecasting growth process of epidemic outbreaks using differential equations. They achieved the improved coverage rate and mean interval score of prediction intervals. Bracher et al. [33] combined multiple models to improve reliability of forecast outputs. Ekinci [34] adopted an ARMA-GARCH (generalized autoregressive conditional heteroscedasticity) process for modeling the growth rate of daily COVID-19 confirmed cases, considering conditional variance and asymmetric effect, while [35] employed a HAR (heterogeneous AR) model to capture the long memory embedded in the COVID-19 data. The GARCH and HAR models in Ekinci [34], Hwang and Yu [35], which are popular and robust time series models for financial volatility, still show good predictive performance on the COVID-19 time series data. Vaccinations against the COVID-19 have been widely distributed in the world. According to Tregoning et al. [36] it is reported that the development and deployment of vaccines for the COVID-19 are a remarkable science success story: within 16 months of the first vaccine trials, 2.8 billion vaccine doses have been administered. Thanks to Mathieu et al. [37] a global database of COVID-19 vaccinations are available. Shim [38] evaluated the impact of COVID-19 vaccination program through a mathematical model based on age-structured classification, while [39] studied forecasting of the number of fully vaccinated people and examined future vaccination rate by the ARIMA model. Tregoning et al. [36] and Higdon [40] conducted systematic reviews of COVID-19 vaccines efficacy and effectiveness against the SARS-CoV-2 infection and disease. Vaccines and booster shots affect the new COVID-19 spread in the local region. Vaccination rates are one of the primary components that influence strongly the COVID-19 confirmed cases. This work concerns itself with a time series analysis for the COVID-19 in the world. Vaccination rates are considered as a regressor variable in a multiple time series regression for the purpose of the COVID-19 forecast analysis. In addition, the growth rates of daily COVID-19 confirmed cases are adopted as another regression factor because the growth rates directly determine new numbers in the near future. The COVID-19 confirmed cases have long-memory property with algebraically decaying autocorrelation coefficients. As in Hwang and Yu [35], we use the HAR model that is an additive cascade regression model with a main feature of the long memory. Overall, to forecast the COVID-19 confirmed cases, this paper suggests a HAR model with Growth rates and Vaccination rates, called the HAR-G-V model, which is a form of multiple linear regression. The multiple linear regression and its statistical theories are well-known (for example, see Montgomery and Peck [41]). Their coefficient parameter estimation and related analysis are frequently used, as a standard-level study, to achieve statistically significant results. The HAR model is a powerful time series model especially for realized volatility of financial asset, (see Corsi [42], Hong et al. [43], Hwang and Hong [44], Wilms et al. [45]). Its background is deferred to a recent work [35] of author, which is also a HAR model-application to the COVID-19 data. The HAR model has moving-averages as regressor variables: for example, in a HAR(3) model with lags (1,7,14), the previous value, 7-day moving average and 14-day moving average are used as explanatory variables. The conventional HAR model is a kind of a linear AR model: for example, the HAR(3) model with lags (1,7,14) is theoretically a linear AR(14) model. However, adding the growth rates to the HAR model results in a nonlinear AR model. Not only the averages but also the growth rates are important factors that determine the COVID-19 spread. In the notable work [34], the growth rate itself of the daily COVID-19 confirmed cases is used for time series modeling. In contrast, the current work uses both averages and growth rates, in a form of nonlinear AR model, and at the same time, in a multiple linear regression model. The modified (nonlinear) HAR() model is employed with one-day, one-week, two-week moving growth rates as well as one-day, one-week, two-week moving averages, with lags (1,7,14), if . In addition, two vaccination rates are imposed in our time series model: 1st and 2nd vaccination rates, which are different from country to country. The following three main issues: (i) reflecting long memory of COVID-19 data, (ii) adopting a nonlinear autoregressive effect with both moving averages and moving rates, and (iii) taking vaccination rates that play a decisive role in the COVID-19 situation, motivate us to choose the multiple regression of the HAR-G-V model in this study. For the empirical analysis of the COVID-19 time series data in the world, top eight affected countries are taken: Brazil, France, Germany, India, Italy, Russia, UK and USA. Their daily COVID-19 confirmed cases and vaccination rates are analyzed through HAR-G-V models of order 2, 3 with lags (1,7), (1,7,14), respectively. In the HAR models with/without growth rates and vaccination rates, model criteria measures such as root mean square error (RMSE), mean absolute error (MAE), , AIC and BIC are evaluated for the comparison. In addition, these are compared to ARIMA models, which are popular in time series literature including COVID-19. The proposed HAR-G-V model outperforms other HAR models and ARIMA models with respect to RMSE and MAE. It is because the both rates have strong influences on the spread of COVID-19. The fitted HAR-G-V models are well-matched to real daily COVID-19 confirmed cases. To investigate forecast performance of the HAR-G-V model, out-of-sample forecasting is conducted to evaluate RMSE, MAE, mean absolute percentage error (MAPE) and root relative square error (RRSE) of forecasts. As a main goal of this work, prediction intervals are established. Their empirical coverage probability, average length and mean interval score are reported. The mean interval score was assessed in Chowell and Luo [32], to account the forecasting performances across a diversity of epidemic datasets. The interval score was suggested by Gneiting and Raftery [46]. It is a scoring rule for prediction interval, which rewards for a narrow prediction interval and incurs a penalty if the observation misses the interval. See [46] for its details, who dealt with the interval score and other scoring rules of prediction intervals. Based on the residuals of the HAR-G-V model, prediction intervals, (), are constructed by approximating the residual distribution. The following three approaches are attempted: (i) Approximation by normal distribution; it does not provide the best performance because residuals do not follow the normal distribution. (ii) Approximation by Laplace distribution; it seems to fit well the residual distribution and to improve the normal approximation by enhancing coverage probability in some countries. However, the Laplace distribution makes longer prediction interval and thus it has the worst mean interval score in the comparison of 95% prediction intervals. (iii) In order to improve the previous two approximations, the bootstrap technique is applied to the residuals to construct the bootstrap prediction interval. The validity of the bootstrap technique in a regression model was initially established by Freedman [47]. Also, the bootstrap procedure was used by [48] to estimate the mean squared error of forecast for the AR() model. As a useful application of the bootstrap, Thombs and Schucany [49] dealt with the bootstrap prediction intervals for the AR models. For weak dependence structure in a network vector autoregression, Hong and Hwang [50] implemented the bootstrap prediction intervals using the stationary bootstrap and the classical residual bootstrap. In the current work, in order to construct a better prediction interval, the classical residual-bootstrap method is used. This residual resampling-technique not only improves empirical coverage but also shortens average length of prediction interval, and thus the proposed bootstrap prediction interval yields the best mean interval score. So as to find how much the bootstrap method can improve the normal and Laplace approaches, bootstrap improvement ratio (BIR) of the mean interval score is suggested. In cases of India, Italy and Russia, which have so long prediction intervals by the normal and Laplace approximations, their BIRs are evaluated as more than 0.7, which means that mean interval scores of the bootstrap prediction intervals can be enhanced by more than 70% from those of other two prediction intervals. This research has contributions in three aspects: (i) This present work combines both growth rates and vaccination rates, for modeling and forecasting the COVID-19 pandemic, through a nonlinear type of HAR models. It provides results of well-fitting to actual daily COVID-19 confirmed cases and good criteria measures such as small values of RMSE and MAE. (ii) Forecasting accuracy performances of RMSE, MAE, MAPE and RRSE are evaluated with small values and, moreover, in behalf of more informative interval estimation, three types of prediction intervals are constructed and compared. (iii) Bootstrap method can improve prediction intervals not only by yielding the empirical coverage, close to the nominal coverage, but also by shortening average length and thus resulting in better mean interval score. Consequently, the combination of the HAR-G-V model and the bootstrap technique achieves an accurate prediction interval with the best mean interval score. An outline of this study is drawn in the flow chart of Fig. 1 .

Fig. 1

Flow chart for this research of HAR-G-V model.

Flow chart for this research of HAR-G-V model. The rest of the paper is organized as follows: In Section 2, model and data are presented. In Section 3 empirical results of estimation and prediction are explored. Conclusion and discussion are stated in Section 4.

Model and data

HAR-G-V model for a triangular array

We first introduce the HAR-G-V model, which is used for a triangular array of observations. Suppose that a sample of size is observed, and the dataset is a triangular array with mean zero and finite variance, but it is not necessarily stationary. In the expression , we omit subscript and use in the model just for notational simplicity. The HAR-G-V model of order , , is defined as follows:where the HAR components are given bywith positive integers , and the growth rate components are given by log difference of shifted values:with some constant , called a shift threshold, such that for all . The last two regressors and are used as the 1st and 2nd vaccination rates at time , respectively. Coefficients are parameters to be estimated for model-fitting and forecasting, and is an i.i.d. sequence of random variables with mean zero and finite variance. The HAR-G-V model of order in (1) can be regarded as a nonlinear AR model of order with additional regression variables, which make the model a multiple regression form. As mentioned above, as explanatory variables, the model has moving averages and moving rates, both of which are functions of the observations . In other words, we may writefor some real-valued function defined on -dimensional real space , i.e., is a composite of algebraic additions, divisions and logarithmic function, along with parameters and . The shift threshold is estimated as any positive number only with condition for all in the logarithm function. Instead of a general function , we consider a specific semiparametric AR model with the form of (1). In the HAR-G-V model (1), if , then model (1) is denoted by HAR-V model, with only vaccination rates, and if , then it is HAR-G model, with only growth rates. If all , then model (1) is the HAR() model, which is used in the analysis of [35]. In the next section of an empirical study, the four models; HAR, HAR-V, HAR-G and HAR-G-V models of order , are compared together, by applying eight datasets of COVID-19 cases. Order is chosen as , and lags ( are (1,7) for and (1,7,14) for . According to Linton et al. [51] the incubation period of the COVID-19 is 14 to 21 days and so the recent work [35] of author dealt with (1,7,14) for and (1,7,14,21) for . However, since this current work handles additionally growth rates and vaccination rates, (if we use , then there are 10 coefficients in the multiple regression), and also since the results of order are expected to be similar to those of order , we omit an empirical study for .

COVID-19 data

This work considers the new daily COVID-19 confirmed cases in top eight affected countries with their growth rates and vaccination rates. The daily confirmed cases (7-day smoothed) and vaccination rates are taken from Our World in Data in https://ourworldindata.org/covid-vaccinations, (see Mathieu et al. [37]). The confirmed cases are applied by 7-day moving-averaging as in Ekinci [34] so as to smoothen the daily data. Eight countries: Brazil, France, Germany, India, Italy, Russia, UK and USA, are chosen to apply the HAR models and to construct the prediction intervals as a main goal. Statistics summaries of daily confirmed cases (7-day smoothed) are displayed in Table 1 along with those of their one-day, one-week and two-week growth rates.

Table 1

Statistics of daily confirmed cases (7-day smoothed) and their growth rates; size is the period from the starting date in 2021 to 08/24/2021; SD = standard deviation.

		Brazil	France	Germany	India	Italy	Russia	UK	USA
	Starting	02/09	01/26	01/16	02/15	01/18	03/02	01/10	01/19
	Size N	190	204	214	184	212	169	220	211
Daily confirmed cases	Mean	56917.9	18765.2	8429.5	116687.6	9691.6	14028.4	14912.9	60591.1
	SD	13699.6	10707.5	6303.9	117743.3	6890.7	6435.6	13467.5	40198.2
	Min	28337.6	1840.6	538.0	12751.6	639.3	8016.4	1517.7	11299.3
	Median	60438.8	20287.4	7864.8	50359.8	9430.5	9488.6	8662.2	57301.1
	Max	77327.6	45641.6	21595.7	391232.0	23211.9	24490.6	48675.4	171103.9
	Skewness	−0.5867	0.0849	0.4389	1.2000	0.3819	0.5397	0.8334	0.8423
	Kurtosis	−0.7192	−0.6760	−1.0002	−0.0274	−1.0363	−1.5331	−0.5727	0.0264
One-day growth rate	Mean	−0.0027	0.0003	−0.0033	0.0050	−0.0031	0.0038	−0.0018	−0.0006
	SD	0.0386	0.0815	0.0683	0.0428	0.0451	0.0163	0.0737	0.0460
	Min	−0.2419	−0.3246	−0.1898	−0.1225	−0.1636	−0.0295	−0.5146	−0.1538
	Median	−0.0002	0.0017	0.0032	−0.0013	−0.0047	−0.0010	−0.0030	−0.0086
	Max	0.2165	0.2546	0.2958	0.1337	0.1284	0.0706	0.5197	0.1960
	Skewness	−0.5975	−0.3307	0.02783	0.2650	0.4591	1.5086	0.1375	1.0606
	Kurtosis	13.4556	2.4453	1.6424	−0.4308	1.1372	2.6062	−0.5964	2.9285
One-week growth rate	Mean	−0.0173	0.0057	−0.0265	0.0353	−0.0232	0.0309	−0.0109	−0.0035
	SD	0.1298	0.3277	0.3176	0.2838	0.2847	0.1006	0.3320	0.2420
	Min	−0.3572	−0.7531	−0.7623	−0.4066	−0.5087	−0.1171	−0.9072	−0.4321
	Median	0.0070	−0.0234	0.0027	−0.0531	−0.0147	−0.0072	−0.0481	−0.0379
	Max	0.2275	1.1725	0.5079	0.5356	0.7688	0.3528	0.6126	0.6937
	Skewness	−0.6097	0.8542	−0.2229	0.2691	0.6975	1.5848	0.3378	0.7216
	Kurtosis	0.2087	1.2571	−0.7937	−1.0847	0.3750	1.6817	−0.9006	−0.2771
Two-week growth rate	Mean	−0.0391	0.0141	−0.0652	0.0722	−0.0471	0.0693	−0.0154	−0.0005
	SD	0.1889	0.6056	0.5886	0.5626	0.5492	0.1876	0.5574	0.4625
	Min	−0.4629	−1.1577	−1.2154	−0.7770	−0.9424	−0.1296	−1.2253	−0.6608
	Median	−0.0260	0.0170	0.0161	−0.0896	−0.0907	−0.0020	−0.1433	−0.1470
	Max	0.3476	1.6405	0.8989	1.0268	1.3463	0.5938	1.0605	1.0772
	Skewness	−0.2535	0.7179	−0.2469	0.2241	0.6688	1.4191	0.4948	0.7940
	Kurtosis	−0.4413	0.4999	−1.0432	−1.1951	0.0393	0.7863	−1.0508	−0.4554

Statistics of daily confirmed cases (7-day smoothed) and their growth rates; size is the period from the starting date in 2021 to 08/24/2021; SD = standard deviation. Because the vaccine supply differs from country to country, the sample size is different for each country. More specifically, the start date in the sample period is set as the first day of the 2nd vaccination, while the end date is all 08.24.2021. The size in each country, shown in Table 1, is the length between the start date and 08.24.2021. As introduced above, model (1) is assumed to have mean zero, and so we adopt as standardized values of COVID-19 confirmed cases for each country. In other words, , for , where is daily COVID-19 confirmed case at time , and are its sample mean and sample standard deviation, respectively. Note that generates a triangular array with mean zero and variance one, (but not stationary). With lags (1,7,14) for , the growth rates in (2) are computed as one-day, one-week and two-week growth rates of daily numbers shifted by . The shift threshold may be chosen as some positive number so that for all . Either , for any , or are possible. Here we select threshold because it implies thatwhich is log difference of original COVID-19 confirmed cases . These growth rates are depicted in Fig. 2 , where we see one-day, one-week, two-week growth rates of the eight countries. Positive values during a time period imply that the numbers of COVID-19 confirmed cases increase during the period; for example, in India, during the period before 60 on time axis, growth rates are positive, which means new daily COVID-19 confirmed numbers increase, while during the period between 75 and 140, growth rates are negative, which means new daily COVID-19 numbers decrease. These facts are also shown on the plot of India in Fig. 5. In Russia, between 80 and 110 on time axis, growth rates are positive, which indicates increasing COVID-19 during the period, as seen on the plot of Russia in Fig. 5. In Fig. 3 , the 1st and 2nd vaccination rates of the eight countries are shown, until date 08.24.2021. India and Russia have low vaccination rates, whereas Italy, UK and USA have high rates. Data in Figs. 2 and 3 are used as regressor variables in the HAR-G-V model of a triangular array.

Fig. 2

Growth rates of daily COVID-19 confirmed cases (7-day smoothed).

Fig. 5

Daily confirmed cases (7-day smoothed) and HAR-G-V model fitting with its residuals.

Fig. 3

Vaccination rates.

Growth rates of daily COVID-19 confirmed cases (7-day smoothed). Vaccination rates. A main characteristic of the HAR model is the long memory, which means that historical observation has a persistent impact on future value. The long memory can be identified by autocorrelation coefficient functions (ACFs) with algebraically decay rates. In order to see whether the COVID-19 data are suitable for the HAR model, their sample ACFs are illustrated in Fig. 4 . The COVID-19 confirmed cases have the long-memory effects as seen in Fig. 4, where the sample ACFs of the confirmed cases are depicted for the eight countries. The sample ACFs are decreasing very slowly. This feature leads us to adopt the HAR models for modeling the COVID-19 pandemic time series. In the next section, we discuss empirical results of the COVID-19 confirmed cases with their estimation and forecasting through the HAR models with/without the growth rates and vaccination rates. Because the proposed models are a type of multiple linear regression model, which is popular and standard in statistical analysis, and also because earlier works [43], [44] dealt with a simulation study in the HAR model, in this work, the simulation is omitted and real data analysis only is discussed.

Fig. 4

Sample autocorrelation coefficient function (ACF) of daily COVID-19 confirmed cases (7-day smoothed).

Sample autocorrelation coefficient function (ACF) of daily COVID-19 confirmed cases (7-day smoothed). Daily confirmed cases (7-day smoothed) and HAR-G-V model fitting with its residuals.

Empirical study

Parameter estimation

This section presents the empirical results of the proposed models with daily confirmed cases and vaccination rates. Four models; HAR, HAR-V, HAR-G and HAR-G-V models, are compared by reporting some criteria measures of the parameter estimation. Additionally, these results are compared with those by popular ARIMA models. Suppose the observations of size are relabelled by , where is chosen so that , with , 14, for , 3, respectively. We use the least squares method that minimizes the sum of squared errors (SSE): The regression model in (1) is written as where and . The ordinary least squared (OLS) estimator is obtained by minimizing the SSE and given asTo see the performance of the OLS regression, we compute criteria measures such as root mean square error (RMSE), mean absolute error (MAE), , AIC and BIC. Let . Using the residuals , RMSE and MAE are given byAlso, , AIC and BIC are given bywhere is the log-likelihood function and is the number of parameters. For the HAR(-G-V) models with order , these performance measures in the eight countries are reported in Tables 2 and 3 . In order to compare the results with existing model, Table 4 gives the performances by ARIMA models. In the columns of RMSE and MAE of Table 2, Table 3, Table 4, the values in the parentheses are the errors multiplied by the standard deviations of the daily confirmed cases in each country, where the standard deviations are given in Table 1. Because HAR and HAR-V models have fewer parameters, AIC and BIC values in these models are somewhat smaller than those in the HAR-G and HAR-G-V models.

Table 2

Criteria of HAR() models with/without growth rates and vaccination rates; Brazil, France, Germany and India.

			RMSE	MAE	R2	AIC	BIC
Brazil	p=2	HAR	0.14876(2038.06)	0.09671(1324.91)	0.979	−167.2	−160.9
		HAR-V	0.14634(2004.85)	0.09508(1302.64)	0.980	−169.0	−156.3
		HAR-G	0.14782(2025.06)	0.09731(1333.16)	0.979	−165.5	−152.8
		HAR-G-V	0.14493(1985.55)	0.09363(1282.75)*	0.980	−168.4	−149.4
	p=3	HAR	0.14864(2036.32)	0.09726(1332.36)	0.979	−165.5	−156.0
		HAR-V	0.14582(1997.74)	0.09575(1311.75)	0.980	−168.3	−152.4
		HAR-G	0.14698(2013.58)	0.09684(1326.60)	0.979	−163.5	−144.5
		HAR-G-V	0.14473(1982.75)*	0.09437(1292.96)	0.980	−164.9	−139.5
France	p=2	HAR	0.18192(1948.01)	0.09762(1045.33)	0.969	−104.4	−97.88
		HAR-V	0.17893(1915.97)	0.09195(984.65)	0.970	−106.7	−93.69
		HAR-G	0.17065(1827.26)	0.09503(1017.59)	0.973	−124.7	−111.7
		HAR-G-V	0.17042(1824.78)	0.09471(1014.13)	0.973	−121.2	−101.7
	p=3	HAR	0.17907(1917.40)	0.09266(992.16)	0.970	−108.4	−98.65
		HAR-V	0.17779(1903.79)	0.08936(956.84)*	0.971	−107.1	−90.87
		HAR-G	0.17061(1826.83)	0.09438(1010.62)	0.973	−120.8	−101.3
		HAR-G-V	0.17040(1824.58)*	0.09428(1009.51)	0.973	−117.3	−91.27
Germany	p=2	HAR	0.10697(674.34)	0.05891(371.62)	0.989	−322.5	−315.9
		HAR-V	0.10508(662.44)	0.05684(358.35)	0.989	−325.6	−312.4
		HAR-G	0.10205(643.37)	0.05991(377.65)	0.990	−337.3	−324.1
		HAR-G-V	0.10146(639.62)	0.05940(374.49)	0.990	−335.6	−315.9
	p=3	HAR	0.10563(665.93)	0.05684(358.29)	0.989	−325.5	−315.6
		HAR-V	0.10475(660.37)	0.05563(350.72)*	0.989	−324.9	−308.4
		HAR-G	0.10194(642.59)	0.05952(375.18)	0.990	−333.8	−314.0
		HAR-G-V	0.10124(638.23)*	0.05919(373.15)	0.990	−332.5	−306.1
India	p=2	HAR	0.01323(1558.30)	0.00919(1082.32)	1.000	−984.0	−977.8
		HAR-V	0.01303(1533.87)	0.00896(1055.56)	1.000	−985.4	−972.9
		HAR-G	0.01318(1552.68)	0.00922(1085.87)	1.000	−981.3	−968.7
		HAR-G-V	0.01299(1530.08)	0.00903(1062.89)	1.000	−982.2	−963.4
	p=3	HAR	0.01186(1396.15)	0.00783(921.56)	1.000	−1019	−1010
		HAR-V	0.01181(1390.10)	0.00776(913.84)	1.000	−1017	−1001
		HAR-G	0.01139(1340.87)	0.00744(875.97)	1.000	−1027	−1008
		HAR-G-V	0.01125(1324.41)*	0.00731(860.22)*	1.000	−1027	−1002

Table 3

Criteria of HAR() models with/without growth rates and vaccination rates; Italy, Russia, UK and USA.

			RMSE	MAE	R2	AIC	BIC
Italy	p=2	HAR	0.03619(249.42)	0.01897(130.77)	0.999	−748.3	−741.8
		HAR-V	0.03608(248.61)	0.01882(129.71)	0.999	−745.6	−732.5
		HAR-G	0.03565(245.71)	0.01891(130.32)	0.999	−750.3	−737.1
		HAR-G-V	0.03553(244.85)	0.01881(129.64)	0.999	−747.7	−727.9
	p=3	HAR	0.03613(248.97)	0.01888(130.15)	0.999	−747.0	−737.2
		HAR-V	0.03603(248.27)	0.01866(128.57)*	0.999	−744.2	−727.7
		HAR-G	0.03541(244.01)	0.01887(130.06)	0.999	−749.0	−729.3
		HAR-G-V	0.03530(243.28)*	0.01871(128.96)	0.999	−746.2	−719.9
Russia	p=2	HAR	0.01644(105.83)	0.01193(76.79)	1.000	−829.5	−823.4
		HAR-V	0.01628(104.77)	0.01203(77.43)	1.000	−828.6	−816.5
		HAR-G	0.01585(101.98)	0.01218(78.38)	1.000	−827.0	−824.8
		HAR-G-V	0.01548(101.35)	0.01224(78.79)	1.000	−834.9	−816.7
	p=3	HAR	0.01629(104.84)	0.01198(77.07)	1.000	−803.5	−821.3
		HAR-V	0.01615(103.98)	0.01207(77.68)	1.000	−829.0	−813.8
		HAR-G	0.01534(98.74)	0.01184(76.21)*	1.000	−843.0	−824.7
		HAR-G-V	0.01530(98.50)*	0.01188(76.49)	1.000	−839.8	−815.4
UK	p=2	HAR	0.03639(490.06)	0.02126(286.43)	0.998	−776.6	−769.9
		HAR-V	0.03526(474.89)	0.02144(288.81)	0.999	−785.5	−772.2
		HAR-G	0.03612(486.43)	0.02111(284.39)	0.998	−775.6	−762.3
		HAR-G-V	0.03435(462.64)	0.02109(284.11)	0.999	−792.3	−772.3
	p=3	HAR	0.03288(442.83)	0.01987(267.61)	0.999	−816.3	−806.3
		HAR-V	0.03144(423.47)	0.01976(266.07)	0.999	−830.7	−814.1
		HAR-G	0.03165(426.26)	0.01932(260.16)	0.999	−826.0	−806.1
		HAR-G-V	0.03081(414.95)*	0.01927(259.54)*	0.999	−833.1	−806.5
USA	p=2	HAR	0.03952(1588.63)	0.02708(1088.51)	0.998	−709.9	−703.4
		HAR-V	0.03914(1573.53)	0.02707(1088.02)	0.998	−709.7	−696.6
		HAR-G	0.03897(1566.44)	0.02712(1090.36)	0.998	−711.5	−698.3
		HAR-G-V	0.03862(1552.69)	0.02700(1085.47)	0.998	−710.9	−691.2
	p=3	HAR	0.03912(1572.87)	0.02690(1081.42)	0.998	−711.9	−702.0
		HAR-V	0.03886(1562.19)	0.02692(1082.19)	0.998	−710.5	−694.1
		HAR-G	0.03843(1544.88)	0.02624(1054.95)*	0.998	−712.9	−693.2
		HAR-G-V	0.03821(1536.29)*	0.02633(1058.66)	0.998	−711.1	−684.9

Table 4

Comparison with ARIMA models.

		RMSE	MAE	R2	AIC	BIC
Brazil	ARIMA(1,1,1)	0.14527(1990.21)	0.09655(1322.72)	0.979	−186.9	−177.23
France	ARIMA(1,1,1)	0.17485(1872.25)	0.10083(1079.63)	0.969	−125.8	−115.9
Germany	ARIMA(1,2,1)	0.10476(660.43)	0.05947(374.92)	0.989	−347.0	−336.9
India	ARIMA(1,1,1)	0.011297(1330.12)	0.00671(789.64)*	1.000	−1112.7	−1103.1
Italy	ARIMA(1,2,1)	0.03474(239.36)*	0.01829(126.06)*	0.999	−808.6	−798.6
Russia	ARIMA(1,2,1)	0.01615(103.95)	0.01252(80.57)	1.000	−897.9	−888.6
UK	ARIMA(1,2,0)	0.02763(370.39)*	0.01674(224.51)*	0.999	−942.2	−935.4
USA	ARIMA(1,2,1)	0.03854(1549.26)	0.02661(1069.80)	0.998	−760.9	750.9

Criteria of HAR() models with/without growth rates and vaccination rates; Brazil, France, Germany and India. Criteria of HAR() models with/without growth rates and vaccination rates; Italy, Russia, UK and USA. Comparison with ARIMA models. As for RMSE in Tables 2 and 3, HAR-V models have smaller errors than HAR models, and also HAR-G-V models have smaller errors than HAR-G models. It implies that vaccination rates affect significantly the model identification. In Tables 2 and 3, bold numbers are the smallest errors in the given order in each country, whereas mark indicates the best one among the two smallest errors in both orders in each country. All eight countries have the smallest values of RMSE in HAR-G-V model of order 3. Consequently, HAR-G-V model of order is the most suitable model in the sense of RMSE. As for MAE in Tables 2 and 3, Brazil, India and UK have the smallest errors in HAR-G-V model of order 3. In other countries, we see that the models with vaccination rates, HAR-V or HAR-G-V, are better than the models without them, HAR or HAR-G, respectively, with respect to MAE. Table 4 presents empirical results of ARIMA() models for the COVID-19 data. RMSE, MAE, , AIC and BIC by our proposed HAR(-G-V) models are compared with those in Table 4. Orders of ARIMA() models are selected through a unit-root test, i.e., ADF test, as well as ACF and PACF plots, and through consideration of smallest RMSE and MAE. Bold numbers in Table 4 of ARIMA models indicate smaller errors than HAR(-G-V) model errors in each country. Most cases demonstrate that HAR(-G-V) models fit with the COVID-19 data better than the ARIMA models. To see the model-fitting of HAR-G-V models, Fig. 5 depicts the actual daily COVID-19 confirmed cases and HAR-G-V models along with their residuals. For all the eight countries, HAR-G-V models are well-matched with the actual data. It is concluded that HAR-G-V model of order 3 is the best one with reduced errors, compared to other models. Therefore, in the next subsection of prediction, we adopt the HAR-G-V model of order 3 and discuss prediction intervals by approximating the residual distribution.

Prediction analysis

Now we investigate the forecasting performance by the HAR-G-V model of order 3, applied to the daily COVID-19 confirmed cases. For the first prediction result, one-, two-, three-step ahead forecasts are computed and their forecasting performance accuracy is reported. In order to provide the forecasting analysis, rolling windows are used. Out of the given observations of size , we take two subsamples; in-sample and out-of-sample. To assess the -step ahead forecast, (), let , and let with relabelled observations, (). is the in-sample used to estimate parameters. The -step ahead predicted value is computed from , (), and accuracy measures of RMSE, MAE, MAPE and RRSE are evaluated along with observation in the out-of-sample as follows: Root Mean Square Error (RMSE): Mean Absolute Error (MAE): Mean Absolute Percentage Error (MAPE): Root Relative Square Error (RRSE):where . In Table 5 , RMSE, MAE, MAPE and RRSE for the -step ahead forecasts, (), are evaluated with size . The values in the parentheses on the columns of RMSE and MAE indicate the errors multiplied by the standard deviations of in-sample observations. As expected, it is shown in Table 5 that all the errors are reasonably small, and the farther step ahead, the larger errors.

Table 5

Forecasting performance of one-, two-, three-step ahead ( forecasts in HAR-G-V model with size 30.

	k	RMSE	MAE	MAPE	RRSE
Brazil	1	0.20620(2824.94)	0.11714(1604.88)	4.35666	0.45181
	2	0.31809(4357.75)	0.22947(3143.68)	8.57290	0.68883
	3	0.37590(5149.75)	0.29034(3977.58)	11.25078	0.85273
France	1	0.05194(556.22)	0.04302(460.71)	2.19886	0.20366
	2	0.07343(786.35)	0.06043(647.13)	2.99611	0.33470
	3	0.09759(1045.04)	0.07635(817.59)	3.71851	0.52219
Germany	1	0.02556(161.13)	0.02103(132.58)	5.51255	0.12122
	2	0.02581(162.73)	0.02124(133.93)	4.82855	0.11271
	3	0.05738(361.72)	0.04367(275.31)	11.58729	0.23259
India	1	0.01886(2220.95)	0.01117(1315.23)	3.22558	0.71244
	2	0.03623(4265.88)	0.02432(2864.33)	7.00961	1.28780
	3	0.05698(6709.39)	0.03789(4462.14)	10.89532	1.88765
Italy	1	0.01446(99.64)	0.01134(78.14)	1.5562	0.12081
	2	0.03627(249.95)	0.02876(198.19)	3.92301	0.33632
	3	0.06402(441.16)	0.05005(344.91)	6.74034	0.65573
Russia	1	0.01622(104.42)	0.01343(86.45)	0.39060	0.11079
	2	0.02954(190.16)	0.02418(155.64)	0.71247	0.20343
	3	0.04626(297.73)	0.03795(244.27)	1.13091	0.31719
UK	1	0.06500(871.42)	0.04350(583.18)	1.78073	0.16298
	2	0.15677(2101.66)	0.10299(1380.77)	4.32968	0.47361
	3	0.27381(3670.67)	0.18412(2468.32)	8.04029	1.02424
USA	1	0.08251(3316.91)	0.06854(2755.29)	3.23027	0.10518
	2	0.11920(4791.86)	0.09111(3662.80)	4.45301	0.15395
	3	0.14519(5836.68)	0.11731(4715.98)	5.42386	0.19048

Forecasting performance of one-, two-, three-step ahead ( forecasts in HAR-G-V model with size 30. For the second prediction result, we construct % prediction intervals using the residual distributions. In the HAR-G-V model with estimated parameter vector , the residuals are approximated by three approaches. The distributions of the residuals are plotted with histograms in Fig. 6 , where the normal and Laplace distributions by estimated mean and standard deviation of the residuals are shown as well. The % Box–Jenkins prediction interval by the normal approximation is constructed as , with for , respectively, where and are the estimated mean and standard deviation of the residuals. However, as shown in Fig. 6, the normal density is not well fitted with distribution of the residuals. Hence we adopt the Laplace distribution for better fitting of the density. The Laplace distribution, which is well-known as two-sided exponential distribution, has the following probability density function with location parameter and scale parameter : . Its mean and variance are and , respectively. In Fig. 6, red curves indicate Laplace density functions with estimated parameters from the residuals. Using the Laplace distribution, % prediction intervals are constructed as with the estimated mean and scale parameter, and . By rescaling these intervals, the one-step prediction intervals of the daily COVID-19 confirmed cases are constructed.

Fig. 6

Distribution of residuals by HAR-G-V models.

Distribution of residuals by HAR-G-V models. In Table 6 , for the last days, , empirical coverage probability, average length and mean interval score of 80% prediction intervals are displayed. Moreover, those of 95% prediction intervals are given in Table 7 . The empirical coverage probability is the ratio of observations that lie in the prediction interval. The interval score is a scoring rule for prediction interval, which gives a penalty if observation misses the interval and rewards for a narrow interval, (see Chowell and Luo [32], Gneiting and Raftery [46]). The mean interval score (MIS) is defined bywhere and are lower and upper bounds of the % prediction interval for level , respectively, is the indicator function and is observation. If coverage probability is one, then the mean interval score is the same as the average length. Lower coverage probability yields more penalty as much as the last two terms in (3). For the case of Brazil in Tables 6 and 7, the normal and Laplace approximations address small coverage probabilities, not close to, but lower than the nominal coverage, and thus they make much larger (or worse) mean interval scores with high penalty, compared to the average lengths.

Table 6

		Normal			Laplace			Bootstrap
	ℓ	CP	AL	MIS	CP	AL	MIS	CP	AL	MIS
Brazil	20	0.7	2447	3790	0.65	2176	3924	0.85	3526	3749
	50	0.6	2176	8631	0.54	1935	8887	0.76	3509	7780
	100	0.53	2096	9145	0.46	1863	9475	0.72	3422	7926
France	20	1.0	2089	2089	1.0	1858	1858	1.0	2636	2636
	50	0.9	2250	2588	0.88	2000	2472	0.9	2806	2985
	100	0.66	2231	4910	0.62	1984	5105	0.81	3429	4244
Germany	20	1.0	1192	1192	1.0	1060	1060	1.0	1218	1218
	50	1.0	1234	1234	0.94	1097	1126	1.0	1251	1251
	100	0.77	1203	2022	0.66	1070	2073	0.84	1413	1861
India	20	1.0	24234	24234	1.0	21547	21547	0.95	2703	2706
	50	1.0	26141	26141	1.0	23242	23242	0.78	2792	6798
	100	1.0	31015	31015	1.0	27575	27575	0.76	3063	6000
Italy	20	1.0	1313	1313	1.0	1167	1167	1.0	359	359
	50	1.0	1389	1389	1.0	1235	1235	0.82	368	486
	100	1.0	1361	1361	1.0	1210	1210	0.82	390	599
Russia	20	1.0	1357	1357	1.0	1206	1206	0.7	253	335
	50	1.0	1272	1272	1.0	1131	1131	0.74	251	366
	100	0.78	787	1006	0.75	699	948	0.6	213	573
UK	20	1.0	2434	2434	1.0	2164	2164	0.8	688	979
	50	0.86	2373	3235	0.82	2110	3190	0.56	569	4030
	100	0.93	2413	2844	0.91	2146	2686	0.62	441	2495
USA	20	0.75	6952	9895	0.65	6181	10339	0.52	3245	14417
	50	0.82	7186	9892	0.78	6389	9894	0.56	2871	13218
	100	0.91	7597	8950	0.89	6755	8507	0.7	2663	8044

Table 7

Coverage probability (CP), average length (AL) and mean interval score (MIS) of 95% prediction intervals by normal, laplace distributions and Bootstrap procedure in HAR-G-V residuals for the last days.

		Normal			Laplace			Bootstrap
	ℓ	CP	AL	MIS	CP	AL	MIS	CP	AL	MIS
Brazil	20	0.9	3747	4584	0.95	4050	4465	1.0	8011	8011
	50	0.74	3332	21773	0.78	3602	20856	0.96	8095	17968
	100	0.69	3209	208	0.74	3469	22065	0.95	8137	16182
France	20	1.0	3200	3200	1.0	3458	3458	1.0	9449	9449
	50	1.0	3445	3445	1.0	3723	3723	1.0	034	10034
	100	0.82	3416	7901	0.87	3692	7342	1.0	10361	10361
Germany	20	1.0	1825	1825	1.0	1973	1973	1.0	3227	3227
	50	1.0	1890	1890	1.0	2042	2042	1.0	3439	3439
	100	0.89	1842	3190	0.9	1991	3054	0.99	3563	3602
India	20	1.0	37109	37109	1.0	40107	40107	1.0	6340	6340
	50	1.0	40028	40028	1.0	43261	43261	0.94	5867	13861
	100	1.0	47492	47492	1.0	51328	51328	0.94	5490	11140
Italy	20	1.0	2011	2011	1.0	2173	2173	1.0	869	869
	50	1.0	2127	2127	1.0	2299	2299	1.0	981	981
	100	1.0	2084	2084	1.0	2253	2253	0.98	1093	1156
Russia	20	1.0	2078	2078	1.0	2254	2254	0.95	437	478
	50	1.0	1948	1948	1.0	2105	2105	0.92	433	525
	100	0.88	1205	1716	0.91	1302	1771	0.80	355	1154
UK	20	1.0	3745	3745	1.0	4047	4047	0.95	1996	2007
	50	0.94	3653	4344	0.96	3948	4393	0.78	1703	9220
	100	0.97	3716	4061	0.98	4016	4239	0.83	1366	5315
USA	20	0.9	10645	11671	0.95	11505	11596	0.7	6284	25698
	50	0.94	11004	14384	0.96	11892	14556	0.72	5264	24559
	100	0.97	11634	13324	0.98	12573	13905	0.85	4835	14012

Coverage probability (CP), average length (AL) and mean interval score (MIS) of 80% prediction intervals by normal, laplace distributions and Bootstrap procedure in HAR-G-V residuals for the last days. Coverage probability (CP), average length (AL) and mean interval score (MIS) of 95% prediction intervals by normal, laplace distributions and Bootstrap procedure in HAR-G-V residuals for the last days. The Laplace distribution seems to fit the residual distributions better and to improve coverage probabilities of the 95% prediction intervals in Brazil, France, Germany and Russia (Table 7). Also, the Laplace distribution makes the mean interval scores better in most countries for the 80% prediction intervals (Table 6), whereas this is not true for the 95% prediction intervals (Table 7). However, average lengths of prediction intervals by both normal and Laplace distributions are large in some countries and so coverage probability is always equal to one. Too long intervals make always coverage one. Thus we need to improve prediction intervals with proper coverage probability close to the nominal coverage, as well as with shorter length. For this purpose, we consider the bootstrap procedure in the next subsection.

Bootstrap prediction interval

In order to assess the bootstrap forecast, we first generate a bootstrap sample of residuals. Using the residuals , we define the empirical distribution byWe draw an i.i.d. sequence from the empirical distribution , which is a bootstrap sample to be used in constructing a bootstrap prediction interval. One-step ahead bootstrap forecast is given by An % bootstrap prediction interval is constructed by the - and -bootstrap quantiles, and , respectively, of the bootstrap sample , where is the number of bootstrap replications. The % bootstrap prediction interval is given by In Figs. 7 and 8 , the right columns depict the 95% bootstrap prediction intervals for the COVID-19 confirmed cases, where is used to get the bootstrap quantiles, while the left columns illustrate the Box–Jenkins prediction intervals by the normal approximation. Because all the lengths by the Laplace approximation in Table 7 are larger than those by the normal approximation, plots of Laplace prediction intervals are omitted. The interval plots in Figs. 7 and 8 describe the numerical results in Table 7 with 95% one-step prediction intervals for the last 15 days. The left and right columns use the same scales on the vertical axis in each country so that the Box–Jenkins and bootstrap prediction intervals are compared with each other at sight. In most countries the prediction intervals include the actual values of the COVID-19 confirmed cases. On the left columns of Figs. 7 and 8, the prediction intervals by the Box–Jenkins are so large with coverage probability one in some countries. The lengths of the bootstrap prediction intervals on the right columns are not too large or too small because bootstrap distribution mimics the distribution of the residuals.

Fig. 7

Fig. 8

One-day ahead predicted values and 95% prediction intervals by HAR-G-V models: Box–Jenkins prediction intervals (left column) and Bootstrap prediction intervals (right column); Brazil, France, Germany and India. One-day ahead predicted values and 95% prediction intervals by HAR-G-V models: Box–Jenkins prediction intervals (left column) and Bootstrap prediction intervals (right column); Italy, Russia, UK and USA. To add some discussion of the bootstrap prediction intervals, we go back to Tables 6 and 7, whose last columns report results of the 80%, 95% bootstrap prediction intervals. In Tables 6 and 7 we see that the bootstrap procedure not only improves coverage probability but also makes better mean interval score of prediction intervals. In Brazil, France and Germany, the coverage probabilities are improved to the values close to the norminal coverage by enlarging the average length of the intervals. In India, Italy and Russia, the bootstrap prediction intervals are shortened with the improved coverage probability as well, which are in cases of so large average length by the normal and Laplace distributions. Consequently, the bootstrap technique produces the best mean interval scores: bold numbers in columns of MIS in Tables 6 and 7 indicate the best one among three scores of the normal, Laplace and bootstrap approaches. Finally, in order to show how much the bootstrap method improves the normal and Laplace approximations, in the countries with the best mean interval scores in all three cases of Table 7, i.e., in Brazil, India, Italy and Russia, the coverage probability and mean interval score of 80%, 95% prediction intervals are depicted in Fig. 9, Fig. 10, Fig. 11, Fig. 12 . Instead of in Tables 6 and 7, are set on the horizontal axis in Fig. 9, Fig. 10, Fig. 11, Fig. 12. Using the last days, the empirical coverage probability and the mean interval score are computed and then the two measures are illustrated on the -axis of the horizon. Red graphs in the figures indicate the results by the bootstrap while green and blue ones are by the normal and Laplace, respectively. Red graphs in the coverage probability (left columns of Fig. 9, Fig. 10, Fig. 11, Fig. 12) are closer to the nominal coverage in both 80% and 95% intervals. In the plots of coverage probability, the grey horizontal lines in the middle lie on the nominal coverage. Coverage probabilities of the 80% bootstrap prediction intervals average to 0.8470, 0.8708, 0.9191 and 0.7717 in Brazil, India, Italy and Russia, respectively, whereas those of the normal and Laplace approximations are all one except for Brazil, which has the averages 0.7073 and 0.6564. Also, the averages of coverage probabilities of the 95% bootstrap prediction intervals are 0.9761, 0.9624, 1.0 and 0.9691 in the four countries, respectively, whereas those of others are all one, but for Brazil with 0.8543 and 8894.

Fig. 9

Coverage probability (left column) and Mean interval score (right column) of 80%, 95% prediction interval (PI) for Brazil.

Fig. 10

Coverage probability (left column) and Mean interval score (right column) of 80%, 95% prediction interval (PI) for India.

Fig. 11

Coverage probability (left column) and Mean interval score (right column) of 80%, 95% prediction interval (PI) for Italy.

Fig. 12

Coverage probability (left column) and Mean interval score (right column) of 80%, 95% prediction interval (PI) for Russia.

Coverage probability (left column) and Mean interval score (right column) of 80%, 95% prediction interval (PI) for Brazil. Coverage probability (left column) and Mean interval score (right column) of 80%, 95% prediction interval (PI) for India. Coverage probability (left column) and Mean interval score (right column) of 80%, 95% prediction interval (PI) for Italy. Coverage probability (left column) and Mean interval score (right column) of 80%, 95% prediction interval (PI) for Russia. As for the mean interval score (right columns of Fig. 9, Fig. 10, Fig. 11, Fig. 12), red graphs of the bootstrap are much smaller values than others except for a few values of Brazil. As shown in the figures, the Laplace approximation does not significantly improve the normal one. However, the bootstrap method considerably lowers and enhances the mean interval scores from both normal and Laplace approximations. To understand the bootstrap improvement with an evaluated measure, we suggest a bootstrap improvement ratio (BIR) of the mean interval score. The BIR is defined aswhere is average of the mean interval scores by the underlying approach for :with in (3), and stands for the bootstrap. The BIR values are displayed in Table 8 , whose results are computed from the mean interval scores (right columns) of Fig. 9, Fig. 10, Fig. 11, Fig. 12. In the case of Brazil, it is reported that the BIR values of the 95% prediction intervals are not as good as those in others because the mean interval scores of the bootstrap are higher (or worse) for , as seen in Fig. 9, than other two approaches. In India and Russia, BIR values are between 0.7 and 0.8, which means that even up to 70–80% of the scores are improved either from the normal to the bootstrap or from the Laplace to the bootstrap. In conclusion, we claim that the bootstrap technique can improve prediction intervals with having more proper empirical coverages (closer to nominal one) as well as shorter average lengths and thus better mean interval scores.

Table 8

Bootstrap improvement ratio (BIR) of prediction interval (PI) on the mean interval scores in Fig. 9, Fig. 10, Fig. 11, Fig. 12.

	Normal to Bootstrap		Laplace to Bootstrap
	80% PI	95% PI	80% PI	95% PI
Brazil	0.0246	0.0101	0.0434	−0.0172
India	0.8031	0.7132	0.7786	0.7346
Italy	0.7103	0.5754	0.6741	0.6071
Russia	0.7683	0.7781	0.7394	0.7947

Bootstrap improvement ratio (BIR) of prediction interval (PI) on the mean interval scores in Fig. 9, Fig. 10, Fig. 11, Fig. 12.

Conclusion and discussion

The COVID-19 pandemic might continue over the next few years in spite of the vaccination supply all around the world. Health scientists and researchers will make more efforts to reverse the virus-contaminated Earth to the uncontaminated one again. For this purpose, more efficient models as well as more accurate predictions are needed to be reflected in social policies such as lockdown and social distancing. In the statistical aspect, rather than point-estimation of predicted values, interval-estimation such as prediction intervals can give more information with high probability. To this end, this study suggests a multiple regression with a form of nonlinear HAR along with growth rates and vaccination rates as its additional regressors, and aims to provide more accurate prediction intervals for the daily COVID-19 confirmed cases. Model selections of the HAR models with/without the growth rates and vaccination rates are given by the criteria measures such as RMSE, MAE, R, AIC and BIC. Our proposed HAR-G-V model addresses the smallest values of RMSE and MAE in top eight affected countries. We conclude that the HAR-G-V model is a suitable and useful tool for obtaining accurate prediction. For the forecasting, point-estimation of predicted values as well as interval-estimation through three kinds of prediction intervals are extensively provided. RMSE, MAE, MAPE and RRSE are evaluated for the forecast performance accuracy. In order to assess accurate prediction intervals, three approaches are carried out with the residuals of the HAR-G-V model. Normal and Laplace approximations, and bootstrap method are tried to construct prediction intervals. The last one improves prediction intervals with maximum 70–80% of mean interval score, because bootstrap procedure mimics the residual distribution. This work has the following contributions: (i) A time series model collaborated with both growth rates and vaccination rates is proposed in order to forecast the daily COVID-19 confirmed cases. As a result, the proposed model yields smaller errors in RMSE and MAE. (ii) This work mainly focuses on the prediction intervals to be more informative, by establishing the 80%, 95% prediction intervals through the normal, Laplace and bootstrap approaches. All three prediction intervals produce good empirical coverage probabilities. (iii) The bootstrap technique can improve the mean interval score of prediction intervals. Thus we conclude that the HAR-G-V model combined with the bootstrap method provides accurate prediction intervals of the COVID-19 pandemic. As a next step of the forecast analysis of the COVID-19, author plans to propose and study a HAR(-G-V) model with partially periodicity as mentioned as in the last section of Hwang and Yu [35]. This current work has handled the data of 7-day smoothed COVID-19 cases as in Ekinci [34]. Empirical results in Ekinci [34] and in this work can provide useful predictive information in deciding the COVID-19 social policies such as social distancing or pandemic relief money. It is because such a social policy is executed for a rather long period, specifically, much longer than 7 days. Due to irregular PCR tests from day to day, for example, rare on Sunday or frequent on Wednesday, the 7-day smoothed data analysis is reasonable and sufficient to understand the overall COVID-19 situation. However, as an interesting and challenging topic, author is interested in modeling the partially periodic COVID-19 cases. As seen in the patterns of COVID-19 data (unsmoothed) of USA, Germany, Brazil, India etc., there is a phenomenon such as strong oscillation with 7-day periodicity, which makes 7-day periodic ACFs. The oscillation seems to appear as the magnitude of the COVID-19 data is larger. In other words, if the magnitude becomes larger, then the oscillation becomes stronger; together with no oscillation on small magnitude. In order to reflect such a feature, partial oscillations not only having some truncation but also proportional to the magnitude might be more desirable than full oscillations. Such a time series modeling for the partially periodic COVID-19 cases will be a promising and challenging study. It will need a mathematical refinement or probability distributional complement (like [18]) as well as an experimental design along with the basis of the statistical inference.

CRediT authorship contribution statement

Eunju Hwang: Conceptualization, Methodology, Software, Data curation, Writing – original draft, Visualization, Investigation, Supervision, Validation, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

35 in total

1. Forecasting and planning during a pandemic: COVID-19 growth rates, supply chain disruptions, and governmental decisions.

Authors: Konstantinos Nikolopoulos; Sushil Punia; Andreas Schäfers; Christos Tsinopoulos; Chrysovalantis Vasilakis
Journal: Eur J Oper Res Date: 2020-08-08 Impact factor: 5.334

2. Short-term forecasting COVID-19 cumulative confirmed cases: Perspectives for Brazil.

Authors: Matheus Henrique Dal Molin Ribeiro; Ramon Gomes da Silva; Viviana Cocco Mariani; Leandro Dos Santos Coelho
Journal: Chaos Solitons Fractals Date: 2020-05-01 Impact factor: 5.944

3. Optimization Method for Forecasting Confirmed Cases of COVID-19 in China.

Authors: Mohammed A A Al-Qaness; Ahmed A Ewees; Hong Fan; Mohamed Abd El Aziz
Journal: J Clin Med Date: 2020-03-02 Impact factor: 4.241

4. A novel grey model based on traditional Richards model and its application in COVID-19.

Authors: Xilin Luo; Huiming Duan; Kai Xu
Journal: Chaos Solitons Fractals Date: 2020-11-17 Impact factor: 5.944

5. COVID-19: Forecasting confirmed cases and deaths with a simple time series model.

Authors: Fotios Petropoulos; Spyros Makridakis; Neophytos Stylianou
Journal: Int J Forecast Date: 2020-12-04

6. Forecasting COVID-19 pandemic using optimal singular spectrum analysis.

Authors: Mahdi Kalantari
Journal: Chaos Solitons Fractals Date: 2020-12-05 Impact factor: 5.944

7. Characterization of the COVID-19 pandemic and the impact of uncertainties, mitigation strategies, and underreporting of cases in South Korea, Italy, and Brazil.

Authors: Ruy Freitas Reis; Bárbara de Melo Quintela; Joventino de Oliveira Campos; Johnny Moreira Gomes; Bernardo Martins Rocha; Marcelo Lobosco; Rodrigo Weber Dos Santos
Journal: Chaos Solitons Fractals Date: 2020-05-14 Impact factor: 5.944

8. Forecasting spread of COVID-19 using google trends: A hybrid GWO-deep learning approach.

Authors: Sikakollu Prasanth; Uttam Singh; Arun Kumar; Vinay Anand Tikkiwal; Peter H J Chong
Journal: Chaos Solitons Fractals Date: 2020-10-22 Impact factor: 5.944

3 in total

1. An ensemble n -sub-epidemic modeling framework for short-term forecasting epidemic trajectories: Application to the COVID-19 pandemic in the USA.

Authors: Gerardo Chowell; Sushma Dahal; Amna Tariq; Kimberlyn Roosa; James M Hyman; Ruiyan Luo
Journal: medRxiv Date: 2022-06-21

2. An ensemble n-sub-epidemic modeling framework for short-term forecasting epidemic trajectories: Application to the COVID-19 pandemic in the USA.

Authors: Gerardo Chowell; Sushma Dahal; Amna Tariq; Kimberlyn Roosa; James M Hyman; Ruiyan Luo
Journal: PLoS Comput Biol Date: 2022-10-06 Impact factor: 4.779

3. Mathematical Modeling to Predict COVID-19 Infection and Vaccination Trends.

Authors: Bogdan Doroftei; Ovidiu-Dumitru Ilie; Nicoleta Anton; Sergiu-Ioan Timofte; Ciprian Ilea
Journal: J Clin Med Date: 2022-03-21 Impact factor: 4.241

3 in total