Time series modelling to forecast the confirmed and recovered cases of COVID-19.

Mohsen Maleki, Mohammad Reza Mahmoudi, Darren Wraith, Kim-Hung Pho.

Abstract

Coronaviruses are enveloped RNA viruses of the Coronaviridae family that affect the neurological, gastrointestinal, hepatic and respiratory systems. In late 2019 a new member of this family belonging to the Betacoronavirus genus (referred to as COVID-19) emerged and spread quickly across the world, calling for strict containment plans and policies. In most countries the outbreak has been serious and the number of confirmed COVID-19 cases has increased daily, while, fortunately, the number of recovered COVID-19 cases has also increased. Forecasting the "confirmed" and "recovered" COVID-19 cases helps in planning to control the disease and in allocating health care resources. Time series models based on statistical methodology are useful for modelling and forecasting time-indexed data. Autoregressive time series models based on two-piece scale mixture normal distributions, called TP-SMN-AR models, form a flexible family that includes many classical symmetric/asymmetric and light/heavy-tailed autoregressive models. In this paper, we use this family of models to analyze the real-world time series data of confirmed and recovered COVID-19 cases.
© 2020 Elsevier Ltd. All rights reserved.

Keywords:  Autoregressive model; COVID-19; Coronaviruses; Prediction; Two-piece distributions based on the scale mixtures of normal distributions

Year:  2020        PMID: 32405266      PMCID: PMC7219401          DOI: 10.1016/j.tmaid.2020.101742

Source DB:  PubMed          Journal:  Travel Med Infect Dis        ISSN: 1477-8939            Impact factor:   6.211


Introduction

The Coronaviridae family includes two main subfamilies, Coronavirinae and Torovirinae. The member genera include Alphacoronavirus, Betacoronavirus, Gammacoronavirus, Torovirus, and Bafinivirus. They form a large family of viruses that affect the neurological, gastrointestinal, hepatic and respiratory systems and can spread among humans, bats, mice, livestock, birds, and other animals [[1], [2], [3]]. Within the Coronaviridae family, a well-known virus called SARS coronavirus (SARS-CoV) spread from animal to animal and to humans [4]. Another coronavirus, called MERS coronavirus (MERS-CoV), spread significantly from human to human in 2012 [4]. In 2019 many cases of respiratory disease in China were reported by the World Health Organization (WHO), with evidence that these cases originated from a seafood market in Wuhan [5]. A new virus called COVID-19 (novel coronavirus, 2019-nCoV), belonging to the Betacoronavirus genus of the Coronaviridae family, then spread from Wuhan in China [6]. The Centers for Disease Control and Prevention (CDC) has verified that COVID-19 is transmitted from human to human, and has also reported that it spreads through close contact, through the air, and by touching surfaces or objects that contain viral particles. The incubation period of COVID-19 can last up to 14 days [7], and the virus can spread to others during the incubation period. The median incubation period and the median age of confirmed cases are 3 days and 47.0 years, respectively [8]. Preparing for and controlling the outbreak of COVID-19 requires thorough planning and policies, and some researchers have used statistical and mathematical modelling to support this. The number of unreported COVID-19 cases in China has been estimated mathematically in Ref. [9]. Based on information about some Japanese passengers in Wuhan, Ref. [10] estimated the infection rate of COVID-19 in Wuhan. The results indicated an infection rate of 9.5% and a death rate of 0.3% to 0.6%.
Based on mathematical modelling, Ref. [11] estimated the transmission risk of COVID-19 to be about 6.47 persons on average and predicted the time at which the peak of COVID-19 would be reached. Ref. [12] estimated sustained human-to-human transmission of COVID-19 at 0.4 using information on 47 patients. Ref. [13], based on two scenarios, found the risk of death to be 5.1% and 8.4%. Modelling, estimating and predicting the prevalence of viruses and their epidemiological characteristics are important for providing the equipment needed to cope with their consequences. Forecasts of the cases and transmission risk of West Nile virus (WNV) have been provided by Ref. [14]. For further modelling and forecasting of the spread of several viruses, such as the hepatitis A virus, Ebola, SARS, influenza A and MERS, refer to Refs. [15], [16], [17], [18], [19], [20], [21]. To have a suitable plan for COVID-19, forecasting the future confirmed cases is critical. An optimization method named FPASSA-ANFIS has been proposed by Ref. [22] to model the number of confirmed cases of COVID-19 and to predict its future values using data collected in China. Forecasting various data about COVID-19 with mathematical and statistical models is very important for any program that aims to cut the transmission chain of the disease; see, e.g., Refs. [23], [24], [25], [26]. According to credible daily reports from the World Health Organization and other world-renowned public health institutions, the total number of confirmed COVID-19 cases has increased in different countries, especially in the U.S.A., Italy, Spain and Iran. Although the spread of COVID-19 has many dangers, fortunately reports show that the total number of recovered COVID-19 cases has also increased.
Increasing the number of recovered cases, along with reducing or stabilizing the number of confirmed cases, is important for controlling the spread of COVID-19 and leads to a stable rate of infection in the world. Modelling and forecasting the numbers of confirmed and recovered COVID-19 cases therefore plays an important role in planning the control of the spread of COVID-19 in the world. The cumulative numbers of confirmed and recovered COVID-19 cases, which are reported daily by the aforementioned organizations, depend on each day on their values on the preceding days. An autoregressive time series model can therefore be a useful tool to model, analyze and forecast the confirmed and recovered cases of COVID-19. SIR epidemic modelling can be done at the local (country) level, whereas the autoregressive model is well suited to looking at overall patterns. The autoregressive time series model is a flexible tool for modelling dependent data and has been used for estimation and forecasting in many practical problems; see Refs. [[27], [28], [29], [30], [31], [32], [33], [34]]. In fact, the autoregressive model determines the probabilistic behavior of the current value $Y_t$ based on a linear combination of the past values $Y_{t-1}, \dots, Y_{t-p}$, in the form

$$Y_t = \varphi_1 Y_{t-1} + \cdots + \varphi_p Y_{t-p} + \varepsilon_t, \qquad (1)$$

where the error terms $\varepsilon_t$ are generally assumed to be uncorrelated and identically distributed random variables from a given distribution; the model is denoted by AR($p$) (see, e.g., Refs. [39,40]). Because classical modelling based on symmetric/light-tailed distributions is not satisfactory for many real-world time series, our methodology uses the autoregressive time series model (1) based on asymmetric/heavy-tailed TP–SMN distributions. We therefore assume that the error terms in (1) follow a TP–SMN distribution, and we denote the resulting model by TP–SMN–AR($p$). See Refs. [[34], [35], [36], [37], [38]] for details of these distributions and models.
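To make the idea of an asymmetric, heavy-tailed error distribution concrete, the following is a minimal sketch of a two-piece Student-t density, one member of the TP–SMN family. The parameterization is an assumption made for illustration (location `mu`, scale `sigma`, skewness weight `gamma` in (0, 1), degrees of freedom `nu`): the left half uses scale `sigma * (1 - gamma)` and the right half uses scale `sigma * gamma`. Readers should consult Refs. [34]–[38] for the authors' exact definition. The function names are hypothetical.

```python
import math


def student_t_pdf(y, mu, scale, nu):
    """Density of a location-scale Student-t with nu degrees of freedom."""
    z = (y - mu) / scale
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(z * z / nu)) / scale


def two_piece_t_pdf(y, mu=0.0, sigma=1.0, gamma=0.5, nu=4.0):
    """Two-piece t density under the ASSUMED parameterization in the lead-in."""
    if y <= mu:
        # Left piece: total mass (1 - gamma), scale sigma * (1 - gamma).
        return 2 * (1 - gamma) * student_t_pdf(y, mu, sigma * (1 - gamma), nu)
    # Right piece: total mass gamma, scale sigma * gamma.
    return 2 * gamma * student_t_pdf(y, mu, sigma * gamma, nu)


def integrate(f, a, b, n=200_000):
    """Composite trapezoid rule on [a, b] with n sub-intervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def skewed_pdf(y):
    """Example density with gamma = 0.3, i.e. more mass to the left of mu."""
    return two_piece_t_pdf(y, gamma=0.3)


left_mass = integrate(skewed_pdf, -500.0, 0.0)
right_mass = integrate(skewed_pdf, 0.0, 500.0)
print(round(left_mass, 3), round(right_mass, 3))  # 0.7 0.3
```

With `gamma = 0.5` the two pieces coincide and the density reduces to the ordinary symmetric Student-t, which is how a two-piece family can contain the symmetric models as special cases.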
In this paper we model the total numbers of confirmed and recovered COVID-19 cases in the world with the proposed autoregressive time series models, called TP–SMN–AR models, which include the symmetric Gaussian and the asymmetric heavy-tailed non-Gaussian autoregressive time series models. Various members of this family are first fitted to the historical numbers of confirmed and recovered COVID-19 cases in the world. Then, for each dataset, the autoregressive model with the best fit is selected. Finally, the selected models are used to predict the numbers of confirmed and recovered COVID-19 cases in the world from 21-Apr-2020 to 30-Apr-2020, and we measure the differences between the real and predicted values to assess the performance of the models. The main contributions of the current study are therefore an improved autoregressive time series model based on the TP–SMN distributions, and a new efficient predictive model for the confirmed and recovered COVID-19 cases in the world using past and current data. A sample copy of the code is available from the authors upon request.

Modelling the confirmed and recovered cases of COVID-19 in the world

COVID-19 is affecting about 212 countries and territories around the world and two international conveyances. The daily data for COVID-19 in the world are reported by the China National Health Commission (NHC) and the World Health Organization (WHO). In this part we fit the TP–SMN–AR time series model to the total confirmed COVID-19 cases in the world from 22-Jan-2020 to 30-Apr-2020, and to the total recovered COVID-19 cases in the world from 02-Feb-2020 to 30-Apr-2020. Time series plots of the total confirmed and recovered cases are shown in Fig. 1 and Fig. 2, respectively. These series are not stationary because they are increasing and show a clear trend. After suitable transformations described in Ref. [40], we obtain stationary data. Using model selection criteria [34,39,40], the best TP–SMN–AR models, namely autoregressive models based on two-piece t distributions, were then fitted to the stationary series of the confirmed and recovered cases.
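The transformations actually applied are specified only by reference to Ref. [40]. As an illustration of the general idea, the sketch below assumes first-order differencing, a common way to remove a trend from a cumulative series; this choice is an assumption, not necessarily the authors' transformation. The input values are the first five real confirmed totals listed in Table 1, and the helper names are hypothetical.

```python
def difference(series):
    """First-order differencing: turns cumulative totals into daily increments."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def undifference(first_value, diffs):
    """Inverse of differencing: rebuilds the cumulative series from increments."""
    rebuilt = [first_value]
    for d in diffs:
        rebuilt.append(rebuilt[-1] + d)
    return rebuilt


# Real cumulative confirmed cases, 2020-Apr-21 to 2020-Apr-25 (Table 1).
confirmed = [2556720, 2637439, 2722857, 2828682, 2919404]

daily_new = difference(confirmed)
print(daily_new)  # [80719, 85418, 105825, 90722]

# Forecasts made on the differenced scale are mapped back by cumulative summation.
assert undifference(confirmed[0], daily_new) == confirmed
```

If the increments still show a trend, the same operation can be applied a second time; the inverse transformation is then applied twice in reverse order.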
Fig. 1

Time series plot of the total confirmed COVID-19 cases in the world from 22-Jan to 30-Apr of 2020.

Fig. 2

Time series plot of the total recovered COVID-19 cases in the world from 02-Feb to 30-Apr of 2020.

The histograms of the estimated errors (residuals), with the estimated heavy-tailed TP–SMN densities superimposed, are shown in Fig. 3 and indicate that the estimated models fit the stationary series of the total confirmed and recovered COVID-19 cases well. The autocorrelation function (ACF) plots of the residuals presented in Fig. 4 also show the suitability of the fitted models.
Fig. 3

Histograms of the residuals of the fitted models on the confirmed COVID-19 cases (a), and the recovered COVID-19 cases (b) datasets in the world, with their superimposed estimated densities.

Fig. 4

ACF of the residuals of the fitted models on the confirmed COVID-19 cases (a), and the recovered COVID-19 cases (b).
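The residual check behind an ACF plot can be sketched as follows: compute the sample autocorrelations of the residuals, compare them with the approximate 95% band of ±1.96/√n expected for uncorrelated noise, and summarize them with the Ljung–Box Q statistic. The residual values below are made up purely for illustration (the paper's residuals are not available here), and the function names are hypothetical.

```python
import math


def acf(x, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]


def ljung_box_q(x, max_lag):
    """Ljung-Box statistic Q = n (n + 2) * sum_k r_k^2 / (n - k)."""
    n = len(x)
    r = acf(x, max_lag)
    return n * (n + 2) * sum(rk * rk / (n - k) for k, rk in enumerate(r, start=1))


# Made-up residuals, for illustration only.
residuals = [0.3, -0.1, 0.4, -0.5, 0.2, 0.1, -0.3, 0.0, 0.25, -0.35,
             0.15, -0.2, 0.05, 0.3, -0.25, 0.1, -0.05, 0.2, -0.4, 0.1]

band = 1.96 / math.sqrt(len(residuals))
for lag, r in enumerate(acf(residuals, 5), start=1):
    flag = "outside band" if abs(r) > band else "inside band"
    print(f"lag {lag}: r = {r:+.3f} ({flag})")
print(f"Ljung-Box Q(5) = {ljung_box_q(residuals, 5):.3f}")
```

Autocorrelations that stay inside the band, and a Q statistic that is small relative to the corresponding chi-square quantile, are consistent with residuals that carry no remaining serial dependence.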

To further demonstrate the goodness of fit of the models, we held out the last 10 days of the confirmed and recovered cases (2020-Apr-21 to 2020-Apr-30), refitted the TP–SMN–AR models to the remaining data, and forecast the held-out days. Table 1 contains the predictions and 98% confidence intervals for this analysis. Fig. 5, Fig. 6 and Fig. 7 show the forecasted values superimposed on the plots of the real values of the confirmed and recovered COVID-19 cases in the world.
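The hold-out procedure described above can be sketched with a deliberately simplified stand-in model. The example below fits an AR(1) model with intercept by ordinary least squares and forecasts recursively; this is not the authors' TP–SMN–AR estimation procedure, and the simulated series, the coefficient values and the function names are all assumptions chosen for illustration.

```python
import random


def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1] + error."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi


def forecast(last_value, c, phi, steps):
    """Recursive multi-step forecast: each prediction feeds the next one."""
    out = []
    y = last_value
    for _ in range(steps):
        y = c + phi * y
        out.append(y)
    return out


# Simulated stationary series (hypothetical data, Gaussian noise for simplicity).
rng = random.Random(0)
series = [4.0]
for _ in range(199):
    series.append(2.0 + 0.5 * series[-1] + rng.gauss(0.0, 1.0))

# Hold out the last 10 observations, fit on the rest, forecast the held-out part.
train, held_out = series[:-10], series[-10:]
c, phi = fit_ar1(train)
predictions = forecast(train[-1], c, phi, steps=len(held_out))

for actual, predicted in zip(held_out, predictions):
    print(f"actual {actual:7.3f}   predicted {predicted:7.3f}")
```

Because the held-out values never enter the fit, comparing them with the forecasts gives an out-of-sample check of the model rather than a measure of in-sample fit.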
Table 1

Real values of the total confirmed and recovered COVID-19 cases in the world from 2020-Apr-21 to 2020-Apr-30, with predictions and 98% confidence intervals.

| COVID-19 Data | Date | Real value | Prediction | Lower C.I. | Upper C.I. |
|---|---|---|---|---|---|
| Confirmed Cases | 2020-Apr-21 | 2556720 | 2556806 | 2545942 | 2568200 |
| | 2020-Apr-22 | 2637439 | 2637409 | 2626722 | 2648536 |
| | 2020-Apr-23 | 2722857 | 2721410 | 2710914 | 2732294 |
| | 2020-Apr-24 | 2828682 | 2808860 | 2798439 | 2819720 |
| | 2020-Apr-25 | 2919404 | 2937529 | 2925690 | 2948566 |
| | 2020-Apr-26 | 2993292 | 3004212 | 2992353 | 3015772 |
| | 2020-Apr-27 | 3059944 | 3064943 | 3052694 | 3077488 |
| | 2020-Apr-28 | 3136505 | 3129841 | 3117773 | 3143146 |
| | 2020-Apr-29 | 3218183 | 3218199 | 3204951 | 3231956 |
| | 2020-Apr-30 | 3304220 | 3302211 | 3289979 | 3315769 |
| Recovered Cases | 2020-Apr-21 | 691650 | 670555 | 638016 | 707815 |
| | 2020-Apr-22 | 718761 | 722622 | 689342 | 761654 |
| | 2020-Apr-23 | 746924 | 753685 | 716873 | 795415 |
| | 2020-Apr-24 | 815145 | 775550 | 737065 | 858978 |
| | 2020-Apr-25 | 854466 | 864671 | 818679 | 914842 |
| | 2020-Apr-26 | 877411 | 904511 | 854466 | 957290 |
| | 2020-Apr-27 | 921320 | 914058 | 865114 | 968022 |
| | 2020-Apr-28 | 953309 | 954201 | 903560 | 1010487 |
| | 2020-Apr-29 | 1000033 | 985264 | 932903 | 1043419 |
| | 2020-Apr-30 | 1039028 | 1038689 | 984279 | 1099247 |
Fig. 5

Time series plot of the confirmed COVID-19 cases data and predicted data from 2020-Apr-21 to 2020-Apr-30.

Fig. 6

Time series plot of the recovered COVID-19 cases data and predicted data from 2020-Apr-21 to 2020-Apr-30.

Fig. 7

Time series plots of the real values and predicted confirmed COVID-19 cases (a) and recovered COVID-19 cases (b) datasets from 2020-Apr-21 up to 2020-Apr-30 with 98% confidence intervals.

To evaluate the accuracy of the predictions, we use the mean absolute percentage error (MAPE), which is 0.22% for the confirmed COVID-19 cases and 1.6% for the recovered COVID-19 cases. These reasonably low values demonstrate the suitability of the proposed models for prediction. Finally, note that the proposed TP–SMN–AR models include, as special or limiting cases, the more standard autoregressive time series models used in the literature. In particular, model selection criteria such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), together with the Box–Pierce and Ljung–Box tests on the residuals, indicate that the fitted TP–SMN–AR models are more reasonable than other well-known counterparts.
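The reported error figures can be reproduced from the real and predicted values listed in Table 1. The short sketch below assumes the usual definition of MAPE, the average of |real − predicted| / real expressed as a percentage; the function name is hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


# Real and predicted values for 2020-Apr-21 to 2020-Apr-30, copied from Table 1.
confirmed_real = [2556720, 2637439, 2722857, 2828682, 2919404,
                  2993292, 3059944, 3136505, 3218183, 3304220]
confirmed_pred = [2556806, 2637409, 2721410, 2808860, 2937529,
                  3004212, 3064943, 3129841, 3218199, 3302211]
recovered_real = [691650, 718761, 746924, 815145, 854466,
                  877411, 921320, 953309, 1000033, 1039028]
recovered_pred = [670555, 722622, 753685, 775550, 864671,
                  904511, 914058, 954201, 985264, 1038689]

mape_confirmed = mape(confirmed_real, confirmed_pred)
mape_recovered = mape(recovered_real, recovered_pred)
print(f"confirmed: {mape_confirmed:.2f}%")  # confirmed: 0.22%
print(f"recovered: {mape_recovered:.1f}%")  # recovered: 1.6%
```

The larger error for the recovered series is driven mainly by a few days (for example 2020-Apr-24 and 2020-Apr-26) on which the forecast misses the real value by several percent.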

Conclusion

Coronaviruses are a large family of viruses that affect the neurological, gastrointestinal, hepatic, and respiratory systems. The number of confirmed cases has increased daily in different countries, especially in the U.S.A., Italy, Spain, Iran, China and others. The spread of COVID-19 has many dangers and calls for strict plans and policies. To design such plans and policies, forecasting the future confirmed and recovered cases is critical. Autoregressive time series models are a useful tool for modelling data over time. However, some of the standard time series models are based on the assumption that the error terms are symmetric (Gaussian), and there are many real-world situations in which this assumption is not satisfactory. In our methodology, we considered autoregressive time series models based on the two-piece scale mixture normal (TP–SMN) distributions. The results indicated that the proposed method performed well in forecasting the confirmed and recovered COVID-19 cases in the world. According to model selection criteria, the proposed models were also more reasonable than the standard Gaussian autoregressive time series model, which is the simplest member of the proposed family. For future work, we suggest that researchers apply cyclostationary, almost cyclostationary and simple processes [[41], [42], [43], [44], [45], [46], [47]] based on the TP–SMN distributions, instead of stationary processes.

Funding

No funding

CRediT authorship contribution statement

Mohsen Maleki: Data curation, Validation, Writing - original draft. Mohammad Reza Mahmoudi: Conceptualization, Methodology, Software, Supervision. Darren Wraith: Visualization, Investigation, Writing - review & editing. Kim-Hung Pho: Visualization, Investigation, Writing - review & editing.

Declaration of competing interest

The authors declare no conflict of interest.