
Development of new hybrid model of discrete wavelet decomposition and autoregressive integrated moving average (ARIMA) models in application to one month forecast the casualties cases of COVID-19.

Sarbjit Singh1,2, Kulwinder Singh Parmar3, Jatinder Kumar2, Sidhu Jitendra Singh Makkhan4,5.   

Abstract

Around the globe, the dominant topic of discussion today is the ongoing and fast-spreading coronavirus disease (COVID-19), which is caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). First detected in Wuhan, Hubei province, China, in December 2019, the deadly virus engulfed China and some neighboring countries, claiming thousands of lives by February 2020. The proposed hybrid methodology applies discrete wavelet decomposition to the dataset of deaths due to COVID-19, which splits the input data into component series, and then applies an appropriate econometric model to each component series to predict future death cases. ARIMA models are well-known econometric forecasting models capable of generating accurate forecasts when applied to wavelet-decomposed time series. The input dataset consists of daily death cases from the five countries most affected by COVID-19, which is given to the hybrid model for validation and to make a one-month-ahead prediction of death cases. These predictions are compared with those obtained from an ARIMA model to assess prediction performance. The predictions indicate a sharp rise in death cases despite various precautionary measures taken by the governments of these countries.
© 2020 Elsevier Ltd. All rights reserved.


Keywords:  ARIMA model; COVID-19 casualties cases; Discrete wavelet decomposition; Hybrid model; Prediction

Year:  2020        PMID: 32395038      PMCID: PMC7211653          DOI: 10.1016/j.chaos.2020.109866

Source DB:  PubMed          Journal:  Chaos Solitons Fractals        ISSN: 0960-0779            Impact factor:   5.944


Introduction

In December 2019, Wuhan, China, witnessed the start of an epidemic that, within just two months, spread across the entire world and took the form of a pandemic named COVID-19 [20,46,68]. The novel coronavirus disease (COVID-19) pandemic, caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has engulfed the entire world within a short period of time [4], [12], [20], [30], [32], [56], [60], [61], [65], [67]. Being highly contagious, it poses a massive threat to people's health: as of 10:00 CET on 30 March 2020, a total of 693,282 confirmed cases and 33,106 deaths had been reported globally by the World Health Organization (WHO) [26,55,56,61]. The outbreak has created an emergency that raises many important questions about transmission dynamics, mitigation, and control measures. Researchers are using mathematical modeling to answer such urgent questions [10]. For instance, containment strategies such as social distancing, quarantine and contact tracing of infected or suspected people, complete lockdown of affected areas or countries, and screening of international travelers are the results of model predictions ([19,43]; Mandal et al. 2019; [11,63]). Early modeling results by Kucharski et al., based on a stochastic transmission model, described the variation of COVID-19 transmission over time and the probability of an outbreak in areas outside Wuhan, and showed a decline in the reproduction number from 2.35 to 1.05 after the introduction of travel restrictions [24], [37]. In another study, Chen et al. [10] developed a Bats-Hosts-Reservoir-People transmission network model to simulate the probable transmission from bats to human beings. They also simplified this model and found that transmission occurred mainly from person to person, with the reproduction number estimated as 3.58 [43]. Li et al. also provided evidence of person-to-person transmission in Wuhan, China, and calculated that the number of infections doubled in 7.4 days [27]. Similarly, several other models have been used to assess the outbreak characteristics [6,28,59]. Moreover, once a vaccine is made available, its effective distribution could be planned with mathematical modeling, as suggested in the literature for such infections [1].

However, the severity of the pandemic and the rapidly changing numbers of infected people demand constant data analysis. Time series analysis and forecasting deal with understanding past relationships among variables by using various modeling techniques, with the ultimate goal of obtaining accurate predictions of future values. The Box-Jenkins ARIMA (Autoregressive Integrated Moving Average) model is a widely used statistical model in time series analysis, which covers a wide variety of patterns, ranging from stationary to non-stationary and seasonal (periodic) time series [33], [35], [54]. However, in non-linear situations where the data is not a linear function of time, the Box-Jenkins methodology is inappropriate [3,7,23,33]. For accurate forecasting of non-linear data, wavelet analysis is a powerful tool capable of diagnosing high-frequency components in time series data [14], [15], [17], [29], [34], [36], [42]. Discrete wavelet transformation decomposes a time series at different scales, and each component series can be treated separately for forecasting [25,38,40,44,49,50,51,53,59]. Wavelets offer forecasting a degree of refinement and flexibility that traditional methods cannot [16,21,22,62,64].

The present study develops a hybrid model for predicting death cases due to COVID-19 by understanding the dynamic nature of the transmission of the virus. Hybrid modeling can prove to be a vital tool in such a situation for studying the potential for transmission and growth of the virus in the long run [31,59,66]. For this, we have considered the dataset of daily deaths due to COVID-19 in the five most affected countries of the world, namely Italy, Spain, France, the United Kingdom (UK), and the United States of America (USA) (Data Source: World Health Organization).

Methodology

Wavelet analysis

Wavelets are localized functions with zero mean and compact support, which are capable of analyzing non-periodic and transient signals [13], [39], [45], [47]. A function ψ(x) ∈ L²(R) is a wavelet if it satisfies the admissibility condition

$$C_\psi = \int_{-\infty}^{\infty} \frac{|\hat{\psi}(\omega)|^{2}}{|\omega|}\, d\omega < \infty, \tag{1}$$

where ψ̂(ω) denotes the Fourier transform of ψ(x). The single function ψ(t) from which a family of functions is generated by translation and dilation is known as the 'mother wavelet'. A mother wavelet constitutes a family of functions of the form

$$\psi_{a,b}(t) = \frac{1}{\sqrt{|a|}}\, \psi\!\left(\frac{t-b}{a}\right), \quad a, b \in \mathbb{R},\ a \neq 0, \tag{2}$$

where 'a' is a scaling parameter which determines the expansion or compactness of a signal, and 'b' is a translation or shifting parameter which determines the location of the wavelet. For discrete wavelet decomposition of the time series f(t), the mother wavelet function ψ and the father wavelet function φ are defined respectively by Eqs. (3) and (4). The approximation coefficients α are obtained by convolving the scaling function φ with f(t), and convolving the wavelet function ψ with f(t) gives the detail coefficients; these are given by Eqs. (5) and (6). Using integrals (5) and (6), the decomposed series for a continuous time series f(t) is given by Eq. (7). Since the time series under study is discrete and of finite length, the discretized time series y(t) is given by Eq. (8). The decomposition of f(t) into approximation and detail components is illustrated in Fig. 1 [52].
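For orientation, the discrete definitions referred to as Eqs. (3) to (7) can be sketched in one common textbook convention. This is an assumption, not a transcription of the authors' equations: the normalization 2^{j/2}, the direction of the scale index j, and the coarsest-scale index j_0 are illustrative choices and may differ from the published forms.

```latex
% Dyadic dilations and integer translations of the mother and father wavelets
\psi_{j,k}(t) = 2^{j/2}\,\psi\!\left(2^{j}t - k\right), \qquad
\varphi_{j,k}(t) = 2^{j/2}\,\varphi\!\left(2^{j}t - k\right), \qquad j,k \in \mathbb{Z}

% Approximation and detail coefficients (real-valued wavelets assumed)
\alpha_{j_0,k} = \int_{-\infty}^{\infty} f(t)\,\varphi_{j_0,k}(t)\,dt, \qquad
\beta_{j,k} = \int_{-\infty}^{\infty} f(t)\,\psi_{j,k}(t)\,dt

% Decomposition of f(t) into one approximation and a sum of details
f(t) = \sum_{k} \alpha_{j_0,k}\,\varphi_{j_0,k}(t)
     + \sum_{j \ge j_0} \sum_{k} \beta_{j,k}\,\psi_{j,k}(t)
```

In this convention the first sum is the smooth approximation component and each inner sum over k is one detail component, which is the split shown schematically in Fig. 1.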
Fig. 1. Wavelet decomposition of signal f(t).

Econometric forecasting models

The Autoregressive Integrated Moving Average (ARIMA) model is the most general among econometric models such as ARMA (Autoregressive Moving Average), MA (Moving Average), and AR (Autoregressive) models. The ARIMA model is based on the Box-Jenkins methodology (1970), which makes use of past values to predict future values of a time series. The ARIMA modeling approach has three phases: model identification, parameter estimation, and diagnostic checking of the model. The model identification stage checks the time series for stationarity and seasonality, which need to be handled before parameter estimation. The stationarity of a time series can be judged from an autocorrelation function (ACF) plot, and a non-stationary time series can be differenced to obtain stationary data. Seasonality can be modeled by taking seasonal differences and regenerating the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots. These plots are also helpful in identifying the values of the parameters p and q [8,9]. The parameters of the selected model are estimated by maximum likelihood, which is a commonly used estimation method. Finally, the overall adequacy of the model is checked with the Ljung-Box test, so that no further modeling of the time series is required (McNeil et al. 2006; [41]). An ARIMA (p, d, q) model using the lag operator L is expressed as

$$\left(1 - \sum_{i=1}^{p} \phi_i L^{i}\right)(1 - L)^{d} f(t) = \left(1 + \sum_{j=1}^{q} \theta_j L^{j}\right)\varepsilon_t, \tag{9}$$

where the non-negative integers p and q are the orders of the autoregressive and moving average polynomials respectively; d is the order of non-seasonal differencing required to make the data stationary; f(t) is the observed value and ɛ is a random error at time t; φ and θ are the coefficients.
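As a minimal sketch of how such a model produces forecasts, consider the special case ARIMA(1,1,0) without a constant, for which the lag-operator form reduces to f(t) - f(t-1) = φ (f(t-1) - f(t-2)) + ɛ. The code below estimates φ by conditional least squares, which is a simplification of the maximum likelihood estimation described above. The function names and the toy series are hypothetical and are not taken from the study.

```python
def fit_arima_110(series):
    """Estimate phi of an ARIMA(1,1,0) model (no constant) by least squares."""
    # d = 1: difference once to remove the trend.
    diff = [series[i] - series[i - 1] for i in range(1, len(series))]
    # p = 1: regress each difference on the previous difference.
    num = sum(diff[t] * diff[t - 1] for t in range(1, len(diff)))
    den = sum(diff[t - 1] ** 2 for t in range(1, len(diff)))
    if den == 0:
        raise ValueError("differenced series carries no information")
    return num / den


def forecast_arima_110(series, phi, horizon):
    """Iterate f(t) = f(t-1) + phi * (f(t-1) - f(t-2)), future errors set to zero."""
    history = list(series)
    for _ in range(horizon):
        history.append(history[-1] + phi * (history[-1] - history[-2]))
    return history[len(series):]


# Hypothetical series whose increments halve at each step (not the study data).
toy = [0.0, 8.0, 12.0, 14.0, 15.0]
phi = fit_arima_110(toy)
future = forecast_arima_110(toy, phi, horizon=2)
print(phi, future)
```

In practice the orders p, d, and q would be chosen from the ACF and PACF plots and a statistical package would handle estimation and diagnostic checking.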

Hybrid prediction model

The ARIMA model and wavelet decomposition handle the linear and non-linear features of data differently, so the coupled model proposed in this study applies ARIMA models to time series data refined by wavelet decomposition. The coupled model can thus improve forecasting performance by modeling both linear and non-linear components of the data [48]. In the wavelet decomposition method, the time series f(t) is first decomposed into approximation (A) and detail (D) coefficients (Section 2.1), which can be used as separate series for prediction; each of these series is then modeled and forecasted with an appropriate ARIMA model. The predicted approximation (Â) and detail (D̂) coefficients so obtained are summed to obtain the forecasted data (f̂), expressed as

$$\hat{f}(t) = \hat{A}(t) + \hat{D}(t). \tag{10}$$
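The combination step can be sketched as follows: each component series is forecast on its own and the forecasts are added pointwise. The forecaster used here is a deliberately simple drift extrapolation that stands in for a fitted ARIMA model per component; it, the function names, and the component values are assumptions made for illustration only.

```python
def drift_forecast(series, horizon):
    """Placeholder forecaster: extend the series along its average step."""
    step = (series[-1] - series[0]) / (len(series) - 1)
    return [series[-1] + step * h for h in range(1, horizon + 1)]


def hybrid_forecast(components, horizon, forecaster=drift_forecast):
    """Forecast every component separately, then sum the forecasts pointwise."""
    forecasts = [forecaster(series, horizon) for series in components.values()]
    return [sum(values) for values in zip(*forecasts)]


# Hypothetical approximation and detail series (not the study data).
components = {
    "A": [10.0, 12.0, 14.0, 16.0],
    "D": [0.5, -0.5, 0.5, -1.0],
}
combined = hybrid_forecast(components, horizon=2)
print(combined)
```

Passing the forecaster as an argument keeps the summation logic independent of the model chosen for each component.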

Application and results

In this paper, the dataset of death cases due to COVID-19 in five countries, namely Italy, Spain, France, the United Kingdom, and the United States of America, is used as input to a hybrid model, and the prediction results so obtained are compared with those of the ARIMA model. The dataset consists of 82 daily observations ranging from 21 January 2020 to 11 April 2020, of which 66 data points (80% of the data) are used for modeling and the remaining 16 (20% of the data) are kept for testing to validate the model.
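The observation count and the split sizes follow from the stated dates. The short sketch below reproduces that arithmetic and shows a chronological split without shuffling; the helper name is hypothetical, and rounding 80% of 82 to the nearest integer is an assumption that matches the 66/16 split reported above.

```python
from datetime import date

start, end = date(2020, 1, 21), date(2020, 4, 11)
n_obs = (end - start).days + 1      # both end dates are included
n_train = round(0.8 * n_obs)        # 80% of the data for model fitting
n_test = n_obs - n_train            # remaining 20% for validation


def chronological_split(series, n_train):
    """Split in time order so the test block follows the training block."""
    return series[:n_train], series[n_train:]


train, test = chronological_split(list(range(n_obs)), n_train)
print(n_obs, len(train), len(test))
```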

Time series analysis

The first step in many time series methods is to check the stationarity of the data. Rapid changes in the level of a time series indicate non-stationarity, which can be checked with autocorrelation function (ACF) and partial autocorrelation function (PACF) plots. A slowly decaying ACF plot indicates that the time series is non-stationary, and the non-stationarity is removed by differencing to obtain stationary data [9]. After checking stationarity, the next step is to determine the orders of the ARIMA model, which can be read from the ACF plot of the differenced time series. An appropriate ARIMA model is then fitted to the data to generate future values of the time series.
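The stationarity check described above can be sketched with a hand-written sample ACF and a differencing helper. The series is simulated (a trending random walk with a fixed seed) and is not the study data; the function names are hypothetical.

```python
import random


def difference(series, d=1):
    """Apply first differencing d times: y[t] = x[t] - x[t-1]."""
    out = list(series)
    for _ in range(d):
        out = [out[i] - out[i - 1] for i in range(1, len(out))]
    return out


def acf(series, max_lag):
    """Sample autocorrelation at lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(v * v for v in dev)
    if c0 == 0:
        raise ValueError("constant series has undefined autocorrelation")
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


# Simulated trending series: cumulative sum of noisy positive increments.
rng = random.Random(0)
level, trending = 0.0, []
for _ in range(200):
    level += 1.0 + rng.gauss(0.0, 1.0)
    trending.append(level)

acf_raw = acf(trending, 10)               # decays slowly: non-stationary
acf_diff = acf(difference(trending), 10)  # drops quickly after differencing
print([round(v, 2) for v in acf_raw[:4]])
print([round(v, 2) for v in acf_diff[:4]])
```

A slowly decaying first list next to a quickly vanishing second list is the pattern that suggests d = 1 is sufficient for such a series.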

Wavelet decomposition

Wavelet decomposition is an effective method for extracting the different frequency components of a signal and exploring its important features. When applying wavelet decomposition to a time series, the choice of mother wavelet, its order, and the level of decomposition is very important. There are several families of wavelets, of which the Daubechies family is one of the most widely used. An accurate forecasting system depends on an appropriate order and level of decomposition of the input signal. The COVID-19 death cases data is decomposed using the Daubechies wavelet of order 8 at level 3, and the resulting components are shown in Fig. 2. The approximation part is the low-frequency part showing the trend, and the detail parts represent the high-frequency parts. The approximation A3 and the details D1, D2, D3 are separately modeled with an appropriate ARIMA model to obtain predicted components. The predicted outputs are finally summed to obtain the forecast of death cases,

$$\hat{f}(t) = \hat{A}_3(t) + \hat{D}_1(t) + \hat{D}_2(t) + \hat{D}_3(t), \tag{11}$$

where the hat (^) symbol denotes predicted values.
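A level-3 split into one approximation and three detail series can be sketched in a few lines if the Haar wavelet is substituted for the Daubechies wavelet of order 8 used in the study. This substitution is made only to keep the example dependency-free: for Haar, the reconstructed approximation at level j equals the mean over consecutive blocks of 2^j samples, and each detail is the difference between successive approximations. The function names and the input values are hypothetical.

```python
def block_means(x, size):
    """Replace every consecutive block of `size` samples by its mean."""
    out = []
    for start in range(0, len(x), size):
        block = x[start:start + size]
        out.extend([sum(block) / size] * size)
    return out


def haar_components(x, level=3):
    """Return additive full-length series A_level, D_1, ..., D_level."""
    if len(x) % (2 ** level) != 0:
        raise ValueError("length must be a multiple of 2**level")
    # approx[j] is the level-j approximation; approx[0] is the signal itself.
    approx = [[float(v) for v in x]]
    for j in range(1, level + 1):
        approx.append(block_means(x, 2 ** j))
    comps = {"A%d" % level: approx[level]}
    for j in range(1, level + 1):
        # D_j holds what is lost when moving from A_(j-1) to A_j.
        comps["D%d" % j] = [a - b for a, b in zip(approx[j - 1], approx[j])]
    return comps


signal = [2, 4, 6, 8, 10, 12, 14, 16]   # hypothetical values
comps = haar_components(signal, level=3)
rebuilt = [sum(vals) for vals in zip(*comps.values())]
print(comps["A3"], rebuilt)
```

Because the components add back to the original signal exactly, forecasts of the individual components can be summed in the same way as in Eq. (11). A wavelet library would be needed to reproduce the Db8 decomposition itself.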
Fig. 2. Time series plot and wavelet decomposition of the dataset for five countries, in the order France, Italy, Spain, UK, and USA.

Hybrid model

In this section, wavelet decomposition together with an ARIMA model is applied to the COVID-19 death cases dataset to obtain accurate prediction results. The Autoregressive Integrated Moving Average (ARIMA) model is an econometric model that can generate future values independently as well as jointly with wavelet decomposition [2, 5, 18]. In the hybrid model, the data is first decomposed into constituent series using the Daubechies wavelet of order 8 (Db8) at level 3, and the ARIMA model is then applied to each constituent series to generate a forecast. Finally, the predicted values of the constituent series are summed to obtain the output of the hybrid model. The predictive performance of the hybrid and ARIMA models is then compared to find the model with the least forecasting error. Model outputs are compared with the testing data using statistical error measures: mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and the coefficient of determination (R²). MAE, RMSE, and MAPE are defined by

$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n} \left| f(t) - \hat{f}(t) \right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n} \left( f(t) - \hat{f}(t) \right)^{2}}, \qquad \mathrm{MAPE} = \frac{100}{n}\sum_{t=1}^{n} \left| \frac{f(t) - \hat{f}(t)}{f(t)} \right|,$$

where f̂(t) denotes the predicted value of f(t) and n is the number of test observations.
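The three error measures can be written directly from their standard definitions. This is a minimal sketch: the function names and the sample values are hypothetical, and MAPE is treated as undefined when an observed value is zero.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an observed value is zero")
    return 100.0 * sum(
        abs((a - p) / a) for a, p in zip(actual, predicted)
    ) / len(actual)


observed = [100.0, 200.0, 400.0]   # hypothetical test observations
forecast = [110.0, 190.0, 400.0]   # hypothetical model output
print(mae(observed, forecast), rmse(observed, forecast), mape(observed, forecast))
```

MAE and RMSE carry the units of the data, while MAPE is scale-free, which is why it is the convenient measure for comparing countries with very different death counts.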

Discussion of results

The hybrid model of discrete wavelet decomposition combined with an ARIMA model is developed and applied to the COVID-19 death cases dataset to predict future death cases. The performance of the hybrid model is compared with that of the econometric ARIMA model to find the prediction with the least error. Table 1 compares the accuracy of the hybrid and ARIMA models on the 20% of observed data held out for testing. The computational results show that the hybrid model yields smaller forecasting errors than the ARIMA model. The MAPE of the ARIMA model is 7.726% for Italy and 5.653% for Spain, and the hybrid Wavelet-ARIMA model reduces the forecasting error for these two countries by approximately 85%. For the other countries, France, UK, and USA, the decrease in error for the hybrid model is approximately 50%. A comparison of model forecasts is shown in Fig. 3, which reveals that the outputs generated by the hybrid model are fairly close to the observed data values. One-month-ahead predictions of death cases by the ARIMA and hybrid Wavelet-ARIMA models are shown in Figs. 4 and 5.
Table 1. Predictive performance of ARIMA and Wavelet-ARIMA models.

Country   MAE (×10^3)   MSE (×10^6)   RMSE (×10^3)   MAPE (%)   R-square

ARIMA model
Italy     1.243         2.565         1.601          7.726      0.9944
Spain     0.693         0.782         0.884          5.653      0.9989
France    24.640        11.006        3.317          29.696     0.9826
USA       2.822         16.540        4.103          29.768     0.9806
UK        1.316         29.742        1.724          28.285     0.9980

Hybrid Wavelet-ARIMA model
Italy     0.464         0.398         0.630          2.804      0.9985
Spain     0.136         0.028         0.170          1.248      0.9996
France    1.627         5.245         2.290          18.533     0.9861
USA       1.341         3.900         1.974          15.625     0.9888
UK        0.193         0.064         0.253          5.623      0.9974
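The relative improvement of the hybrid model can be recomputed from Table 1, and the figure obtained depends on which error measure is used. The sketch below does this for the MAPE and RMSE columns; the values are copied from the table, while the dictionary layout and function name are hypothetical.

```python
# (ARIMA, hybrid) pairs copied from Table 1: MAPE in %, RMSE in units of 10^3.
table1 = {
    "Italy":  {"MAPE": (7.726, 2.804),   "RMSE": (1.601, 0.630)},
    "Spain":  {"MAPE": (5.653, 1.248),   "RMSE": (0.884, 0.170)},
    "France": {"MAPE": (29.696, 18.533), "RMSE": (3.317, 2.290)},
    "USA":    {"MAPE": (29.768, 15.625), "RMSE": (4.103, 1.974)},
    "UK":     {"MAPE": (28.285, 5.623),  "RMSE": (1.724, 0.253)},
}


def pct_reduction(baseline, improved):
    """Relative reduction of an error measure, in percent of the baseline."""
    return 100.0 * (baseline - improved) / baseline


reductions = {
    country: {name: pct_reduction(*pair) for name, pair in measures.items()}
    for country, measures in table1.items()
}
for country, values in reductions.items():
    print(country, {name: round(v, 1) for name, v in values.items()})
```

Squared-error measures such as MSE amplify the difference between the two models, so stating the measure alongside any percentage makes the comparison easier to interpret.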
Fig. 3. Comparison of forecasts of the ARIMA and hybrid Wavelet-ARIMA models for five countries, in the order France, Italy, Spain, UK, and USA.

Fig. 4. One-month-ahead ARIMA model forecast of death cases.

Fig. 5. One-month-ahead Wavelet-ARIMA model forecast of death cases.

Conclusions

In this paper, a hybrid Wavelet-ARIMA model is developed and its accuracy is investigated using 66 days of past data on death cases due to COVID-19. A within-sample prediction of death cases 16 days ahead was made to validate the model, which was then used to predict out-of-sample death cases one month ahead in the five most affected countries of the world, namely France, Italy, Spain, UK, and USA. Discrete wavelet decomposition of the dataset was combined with an econometric model in order to develop a better hybrid model for forecasting future death cases accurately. The forecast obtained by the hybrid Wavelet-ARIMA model reduced errors by approximately 50% compared with the ARIMA model. The performance of the hybrid model is nearly 80% better than that of the ARIMA model for Italy, Spain, and UK, whereas it is approximately 50% better for France and USA. Thus, the results show that the hybrid model performs better than the ARIMA model and can be used as a forecasting technique. Predictions of death cases by this technique can help governments take preventive measures before a disastrous situation develops.

Availability of data and materials

All data are publicly available with WHO [57], [58].

Declaration of Competing Interest

There is no conflict of interest between the authors.
  22 in total

1.  Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia.

Authors:  Qun Li; Xuhua Guan; Peng Wu; Xiaoye Wang; Lei Zhou; Yeqing Tong; Ruiqi Ren; Kathy S M Leung; Eric H Y Lau; Jessica Y Wong; Xuesen Xing; Nijuan Xiang; Yang Wu; Chao Li; Qi Chen; Dan Li; Tian Liu; Jing Zhao; Man Liu; Wenxiao Tu; Chuding Chen; Lianmei Jin; Rui Yang; Qi Wang; Suhua Zhou; Rui Wang; Hui Liu; Yinbo Luo; Yuan Liu; Ge Shao; Huan Li; Zhongfa Tao; Yang Yang; Zhiqiang Deng; Boxi Liu; Zhitao Ma; Yanping Zhang; Guoqing Shi; Tommy T Y Lam; Joseph T Wu; George F Gao; Benjamin J Cowling; Bo Yang; Gabriel M Leung; Zijian Feng
Journal:  N Engl J Med       Date:  2020-01-29       Impact factor: 176.079

2.  Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study.

Authors:  Joseph T Wu; Kathy Leung; Gabriel M Leung
Journal:  Lancet       Date:  2020-01-31       Impact factor: 79.321

3.  A Novel Coronavirus from Patients with Pneumonia in China, 2019.

Authors:  Na Zhu; Dingyu Zhang; Wenling Wang; Xingwang Li; Bo Yang; Jingdong Song; Xiang Zhao; Baoying Huang; Weifeng Shi; Roujian Lu; Peihua Niu; Faxian Zhan; Xuejun Ma; Dayan Wang; Wenbo Xu; Guizhen Wu; George F Gao; Wenjie Tan
Journal:  N Engl J Med       Date:  2020-01-24       Impact factor: 91.245

4.  Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and coronavirus disease-2019 (COVID-19): The epidemic and the challenges.

Authors:  Chih-Cheng Lai; Tzu-Ping Shih; Wen-Chien Ko; Hung-Jen Tang; Po-Ren Hsueh
Journal:  Int J Antimicrob Agents       Date:  2020-02-17       Impact factor: 5.283

5.  Preliminary estimation of the basic reproduction number of novel coronavirus (2019-nCoV) in China, from 2019 to 2020: A data-driven analysis in the early phase of the outbreak.

Authors:  Shi Zhao; Qianyin Lin; Jinjun Ran; Salihu S Musa; Guangpu Yang; Weiming Wang; Yijun Lou; Daozhou Gao; Lin Yang; Daihai He; Maggie H Wang
Journal:  Int J Infect Dis       Date:  2020-01-30       Impact factor: 3.623

6.  Venezuelan migrants "struggling to survive" amid COVID-19.

Authors:  Joe Parkin Daniels
Journal:  Lancet       Date:  2020-03-28       Impact factor: 79.321

7.  A conceptual model for the coronavirus disease 2019 (COVID-19) outbreak in Wuhan, China with individual reaction and governmental action.

Authors:  Qianying Lin; Shi Zhao; Daozhou Gao; Yijun Lou; Shu Yang; Salihu S Musa; Maggie H Wang; Yongli Cai; Weiming Wang; Lin Yang; Daihai He
Journal:  Int J Infect Dis       Date:  2020-03-04       Impact factor: 3.623

8.  Early dynamics of transmission and control of COVID-19: a mathematical modelling study.

Authors:  Adam J Kucharski; Timothy W Russell; Charlie Diamond; Yang Liu; John Edmunds; Sebastian Funk; Rosalind M Eggo
Journal:  Lancet Infect Dis       Date:  2020-03-11       Impact factor: 25.071

9.  Successful containment of COVID-19: the WHO-Report on the COVID-19 outbreak in China.

Authors:  Bernd Salzberger; Thomas Glück; Boris Ehrenstein
Journal:  Infection       Date:  2020-04       Impact factor: 3.553

10.  Feasibility of controlling COVID-19 outbreaks by isolation of cases and contacts.

Authors:  Joel Hellewell; Sam Abbott; Amy Gimma; Nikos I Bosse; Christopher I Jarvis; Timothy W Russell; James D Munday; Adam J Kucharski; W John Edmunds; Sebastian Funk; Rosalind M Eggo
Journal:  Lancet Glob Health       Date:  2020-02-28       Impact factor: 26.763
