
ARIMA modelling and forecasting of irregularly patterned COVID-19 outbreaks using Japanese and South Korean data.

Xingde Duan, Xiaolei Zhang.

Abstract

The World Health Organization (WHO) upgraded the status of the coronavirus disease 2019 (COVID-19) outbreak from epidemic to global pandemic on March 11, 2020. Various mathematical and statistical models have been proposed to predict the spread of COVID-19 [1]. We collated data on daily new confirmed cases of the COVID-19 outbreaks in Japan and South Korea from January 20, 2020 to April 26, 2020. Autoregressive Integrated Moving Average (ARIMA) models were used to analyze the two data sets and to predict the daily new confirmed cases for the 7-day period from April 27, 2020 to May 3, 2020. The forecasting results and both data sets are provided.
© 2020 The Authors.

Keywords:  Daily new cases; Dynamic prediction; Statistical analysis; Stationarity

Year:  2020        PMID: 32537480      PMCID: PMC7248635          DOI: 10.1016/j.dib.2020.105779

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Value of the Data

These data are easy to collect, and countries are beginning to collect, collate, and release them publicly for study and analysis. The data can be updated from news reports and websites, which facilitates tracking and analysis as the epidemic develops. In particular, data on daily new cases are useful because they can be used to predict COVID-19 outbreaks. The data from these two representative Asian countries have practical implications for the analysis of, and intervention in, COVID-19. Analysing new data with an ARIMA model allows changes in COVID-19 to be analysed and predicted in a timely manner and provides dynamic information to the relevant departments. Other research institutions and management departments can likewise use these data to track and study the development of the epidemic in a timely manner.

Data Description

The daily new confirmed cases of the COVID-19 outbreaks in Japan and South Korea from January 20, 2020 to April 26, 2020 are available from the Wind Database [2]. There are no missing values, and the Excel file of the daily data is presented in the Supplementary data. The data were analysed using the statistical software R. To visualize the data, time series plots of the daily new confirmed cases in Japan and South Korea for the 98-day period from January 20, 2020 to April 26, 2020 are displayed in Figure 1 and Figure 2, respectively. Figures 1 and 2 show that both original time series appear nonstationary and present an irregular pattern; therefore, differencing was applied to stabilize them. The first-difference time series, also shown in Figures 1 and 2, look stationary when compared with the original time series.
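The first-difference transformation described above can be sketched in a few lines. The article's analysis was done in R, where base `diff()` plays this role; the Python below is only an illustrative stand-in, the function name `difference` is hypothetical, and the toy counts are invented for the example rather than taken from the Wind Database.

```python
def difference(series, lag=1):
    """Return the lagged difference series[t] - series[t - lag]."""
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical daily counts with an upward trend (NOT the article's data).
toy_counts = [0, 1, 3, 6, 10, 15, 21]

first_diff = difference(toy_counts)    # removes the level
second_diff = difference(first_diff)   # removes the remaining linear trend

print(first_diff)   # [1, 2, 3, 4, 5, 6]
print(second_diff)  # [1, 1, 1, 1, 1]
```

Each round of differencing shortens the series by one observation, so a 98-day series yields 97 first differences.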
Figure 1

Daily new confirmed cases in Japan, first difference of the original data, and ACF and PACF plots.

Figure 2

Daily new confirmed cases in South Korea, first difference of the original data, and ACF and PACF plots.

Experimental Design, Materials, and Methods

The Autoregressive Integrated Moving Average model, referred to as the ARIMA model, was employed to analyse the daily new confirmed cases in Japan and South Korea. Under the framework of the Box-Jenkins method, model identification, estimation, diagnostic checking, and forecasting for the ARIMA model were applied to the two original time series [3,4]. Differencing was used to achieve stationarity for the nonstationary time series, and the Augmented Dickey-Fuller (ADF) unit-root test was used to assess whether each time series is stationary [3]. The R packages “tseries” and “forecast” were used to produce the numerical output for ARIMA [5].

The first differences of the two original series and their ADF unit-root tests appear to support a stationary ARMA model; therefore, we consider the class of stationary ARMA models appropriate for the differenced series. Combining a preference for parsimonious models, automatic selection of the model order based on the R package “tseries”, and the correlograms of the sample autocorrelation function (ACF) and partial autocorrelation function (PACF) shown in Figures 1 and 2, we chose ARIMA(6,1,7) for Japan and ARIMA(2,1,3) for South Korea. We used the method of moments and unconditional least squares to estimate the parameters of the stationary ARMA models. To save space, the estimates are not reported.

To check the independence of the noise terms in the above ARMA models, we used the following diagnostic tools: a sequence plot of the residuals, the sample ACF of the residuals, and p-values of the Ljung-Box test statistic over a range of lags. These indicate that the residuals from the ARIMA models follow a white noise process. Therefore, the estimated ARIMA models capture the dependence structure of the daily new confirmed cases time series well.
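The residual diagnostics mentioned above rest on the sample ACF and the Ljung-Box statistic, Q = n(n+2) Σ r_k² / (n − k) summed over lags k = 1, ..., h. The following is a minimal sketch of both quantities, not the authors' R workflow (in R, `Box.test(..., type = "Ljung-Box")` serves this purpose). The function names and the toy residual series are hypothetical, and no p-value is computed here.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    if c0 == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


def ljung_box_q(x, max_lag):
    """Ljung-Box statistic Q = n(n+2) * sum_k r_k^2 / (n - k)."""
    n = len(x)
    r = sample_acf(x, max_lag)
    return n * (n + 2) * sum(rk * rk / (n - k) for k, rk in enumerate(r, start=1))


# Hypothetical "residuals" that alternate in sign, i.e. clearly not white noise.
alternating = [1.0, -1.0] * 10

q = ljung_box_q(alternating, max_lag=1)
print(round(q, 2))  # 20.9
```

With one lag, Q is compared with a chi-square distribution on 1 degree of freedom, whose 5% critical value is about 3.84, so this toy series would be flagged as autocorrelated. For residuals of a fitted ARMA(p, q) model, the degrees of freedom are usually reduced by p + q.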
Finally, based on the above ARIMA models, the predicted values and their lower and upper limits at the 95% confidence level for the daily new confirmed cases over the 7-day period from April 27, 2020 to May 3, 2020 are reported in Table 1 and displayed in Figure 3.
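For orientation, the ARIMA(p,d,q) model and its forecast limits can be written in standard textbook notation. The symbols below (backshift operator B, coefficients φ_i and θ_j, noise variance σ², and ψ-weights) are introduced here for illustration and do not appear in the article. The Gaussian form of the limits is an assumption about how the reported 95% limits were obtained.

```latex
\[
\phi(B)\,(1 - B)^{d}\, y_t = \theta(B)\,\varepsilon_t, \qquad
\phi(B) = 1 - \sum_{i=1}^{p} \phi_i B^{i}, \qquad
\theta(B) = 1 + \sum_{j=1}^{q} \theta_j B^{j}
\]
\[
\hat{y}_{n+h \mid n} \;\pm\; 1.96\,\sigma \sqrt{\sum_{j=0}^{h-1} \psi_j^{2}}, \qquad
\psi(B) = \sum_{j=0}^{\infty} \psi_j B^{j} = \frac{\theta(B)}{\phi(B)\,(1 - B)^{d}}, \quad \psi_0 = 1
\]
```

Here (p,d,q) = (6,1,7) for Japan and (2,1,3) for South Korea, and h = 1, ..., 7 indexes the forecast days. With d = 1 the sum of squared ψ-weights grows with h, so the limits widen as the horizon increases.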
Table 1

Predicted values, with lower and upper limits at the 95% confidence level, of the daily new confirmed cases for the 7-day period

Date        Japan lower  Japan mean  Japan upper  Korea lower  Korea mean  Korea upper
2020-04-27  122.68342    207.5012    292.319      -161.9685    6.36643     174.7014
2020-04-28  194.68068    303.4768    412.2729     -210.9135    2.035784    214.9851
2020-04-29  211.76786    333.6616    455.5554     -272.6349    4.635792    281.9065
2020-04-30  170.06375    304.8661    439.6684     -308.2444    7.649637    323.5437
2020-05-01  164.28963    308.5206    452.7516     -330.465     7.153191    344.7714
2020-05-02  93.39579     244.7979    396.1999     -354.1099    5.432586    364.975
2020-05-03  -22.66019    143.4524    309.565      -381.2896    5.198718    391.687
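The limits in Table 1 can be checked for internal consistency with a short script. This sketch assumes the limits are symmetric normal-theory limits of the form mean ± 1.96 × standard error, which is an assumption, since the article does not state how they were computed. The function names are hypothetical, and the numbers are transcribed from Table 1.

```python
# Rows transcribed from Table 1: (date, lower, mean, upper).
JAPAN = [
    ("2020-04-27", 122.68342, 207.5012, 292.319),
    ("2020-04-28", 194.68068, 303.4768, 412.2729),
    ("2020-04-29", 211.76786, 333.6616, 455.5554),
    ("2020-04-30", 170.06375, 304.8661, 439.6684),
    ("2020-05-01", 164.28963, 308.5206, 452.7516),
    ("2020-05-02", 93.39579, 244.7979, 396.1999),
    ("2020-05-03", -22.66019, 143.4524, 309.565),
]
KOREA = [
    ("2020-04-27", -161.9685, 6.36643, 174.7014),
    ("2020-04-28", -210.9135, 2.035784, 214.9851),
    ("2020-04-29", -272.6349, 4.635792, 281.9065),
    ("2020-04-30", -308.2444, 7.649637, 323.5437),
    ("2020-05-01", -330.465, 7.153191, 344.7714),
    ("2020-05-02", -354.1099, 5.432586, 364.975),
    ("2020-05-03", -381.2896, 5.198718, 391.687),
]

Z_975 = 1.96  # assumed standard normal quantile behind the 95% limits


def implied_se(rows):
    """Interval half-width divided by the assumed quantile, per forecast day."""
    return [(upper - lower) / (2 * Z_975) for _, lower, _, upper in rows]


def max_asymmetry(rows):
    """Largest |(upper - mean) - (mean - lower)| over all rows."""
    return max(abs((u - m) - (m - l)) for _, l, m, u in rows)


for name, rows in (("Japan", JAPAN), ("Korea", KOREA)):
    print(name, "symmetric:", max_asymmetry(rows) < 1e-3)
    print(name, "implied SE:", [round(s, 1) for s in implied_se(rows)])
```

The limits are symmetric about the mean up to rounding, and the implied standard errors increase with the forecast horizon in both series, as expected for forecasts from a model with d = 1.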
Figure 3

Plot of the 7-day prediction of the daily new confirmed cases for Japan and Korea.

Appendix A. Supplementary data

Appendix A.xlsx

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.

Specifications Table

Subject:  Infectious Diseases
Specific subject area:  ARIMA model applied to predict COVID-19 outbreaks
Type of data:  Table; Image
How data were acquired:  The data on daily new confirmed cases of COVID-19 were taken from the Wind Database. The data were built into a time-series database in Excel 2017, and an ARIMA model was established for analysis using R software.
Data format:  Raw
Parameters for data collection:  Under the framework of the Box-Jenkins method, model identification, estimation, diagnostic checking, and forecasting for the ARIMA model were applied to the daily new confirmed cases data in Japan and South Korea.
Description of data collection:  The daily new confirmed cases data of the COVID-19 outbreaks in Japan and South Korea from January 20, 2020 to April 26, 2020 are available from the Wind Database (https://www.wind.com.cn/newsite/edb.html). There are no missing values, and the Excel file of the daily data is presented in the Supplementary data.
Data source location:  Japan and Korea
Data accessibility:  With the article. The raw data are in Appendix A.

References

1.  Predicting turning point, duration and attack rate of COVID-19 outbreaks in major Western countries.

Authors:  Xiaolei Zhang; Renjun Ma; Lin Wang
Journal:  Chaos Solitons Fractals       Date:  2020-04-20       Impact factor: 5.944

