
Application of the ARIMA model on the COVID-2019 epidemic dataset.

Domenico Benvenuto, Marta Giovanetti, Lazzaro Vassallo, Silvia Angeletti, Massimo Ciccozzi.

Abstract

Coronavirus disease 2019 (COVID-2019) has been recognized as a global threat, and several studies are being conducted using various mathematical models to predict the probable evolution of this epidemic. These mathematical models, based on various factors and analyses, are subject to potential bias. Here, we propose a simple econometric model that could be useful to predict the spread of COVID-2019. We applied an Autoregressive Integrated Moving Average (ARIMA) model to the Johns Hopkins epidemiological data to predict the epidemiological trend of the prevalence and incidence of COVID-2019. For further comparison or for future perspective, case definition and data collection have to be maintained in real time.
© 2020 The Authors.


Keywords:  ARIMA model; COVID-2019 epidemic; Forecast; Infection control

Year:  2020        PMID: 32181302      PMCID: PMC7063124          DOI: 10.1016/j.dib.2020.105340

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409



Data description

The daily prevalence data of COVID-2019 from January 20, 2020 to February 10, 2020 were collected from the official website of Johns Hopkins University (https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.html), and Excel 2019 was used to build a time-series database [1]. The ARIMA model was applied to a dataset of 22 daily observations. Fig. 1 shows that the overall prevalence of COVID-2019 presented an increasing trend that is approaching the epidemic plateau. The difference between the cases of one day and the cases of the previous day, Δ(X_n − X_{n−1}), showed a nonconstant increase in the number of confirmed cases. Descriptive analysis of the data was performed to evaluate the incidence of new confirmed cases of COVID-2019 and to prevent possible bias.

Fig. 1

Correlogram and ARIMA forecast graph for the 2019-nCoV prevalence.
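
The incidence series described above is the first difference of the cumulative (prevalence) series. A minimal Python sketch of that step follows; the case counts are hypothetical placeholders rather than the Johns Hopkins figures, and the function name `daily_incidence` is an illustrative assumption (the study itself used Excel 2019 and Gretl).

```python
from datetime import date, timedelta


def daily_incidence(cumulative):
    """Return the first differences X_n - X_(n-1) of a cumulative case series."""
    return [curr - prev for prev, curr in zip(cumulative, cumulative[1:])]


# Hypothetical cumulative confirmed-case counts (NOT the study data).
start = date(2020, 1, 20)
prevalence = [280, 310, 450, 640, 920, 1400]
incidence = daily_incidence(prevalence)

# The first day has no predecessor, so the incidence series is one value shorter.
for offset, new_cases in enumerate(incidence, start=1):
    day = start + timedelta(days=offset)
    print(day.isoformat(), new_cases)
```

Differencing shortens the series by one observation, so 22 cumulative values would yield 21 daily differences.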

Experimental design, materials, and methods

The ARIMA family includes the autoregressive (AR) model, the moving average (MA) model, and the seasonal autoregressive integrated moving average (SARIMA) model [2]. The Augmented Dickey-Fuller (ADF) [3] unit-root test helps in estimating whether the time series is stationary. Log transformation and differencing are the preferred approaches to stabilize the time series [4]. Seasonal and nonseasonal differences were used to stabilize the trend and periodicity. Parameters of the ARIMA model were estimated from the autocorrelation function (ACF) graph and the partial autocorrelation function (PACF) correlogram. To determine the prevalence of COVID-2019, ARIMA (1,0,4) was selected as the best ARIMA model, while ARIMA (1,0,3) was selected as the best ARIMA model for determining the incidence of COVID-2019. Gretl 2019d statistical software [5] was used to perform statistical analysis on the prevalence and incidence datasets, and the statistical significance level was set at 0.05. A previous study was considered as reference for the methodology of the analysis [6]. Logarithmic transformation was performed to evaluate the influence of seasonality on the forecast. The correlogram reporting the ACF and PACF showed that neither the prevalence nor the incidence of COVID-2019 is influenced by seasonality. The forecasts of prevalence and incidence, with their 95% confidence intervals, are reported in Table 1.
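
To illustrate what a correlogram reports, the following Python sketch computes the sample autocorrelation function and the approximate 95% band of ±1.96/√n that is commonly drawn on such plots (matching a 0.05 significance level). It is not the authors' Gretl procedure: the input series is hypothetical, and the helper names `sample_acf` and `white_noise_bound` are assumptions made for this example.

```python
import math


def sample_acf(series, max_lag):
    """Sample autocorrelations r_0..r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    if denom == 0:
        raise ValueError("series has zero variance; ACF is undefined")
    acf = []
    for k in range(max_lag + 1):
        # Lag-k cross products, normalized by the lag-0 sum of squares.
        num = sum(centered[t] * centered[t - k] for t in range(k, n))
        acf.append(num / denom)
    return acf


def white_noise_bound(n, z=1.96):
    """Approximate 95% band (+/- z / sqrt(n)) for a white-noise series."""
    return z / math.sqrt(n)


# Hypothetical daily new-case counts (illustrative only, not the study data).
incidence = [30, 140, 190, 280, 480, 620, 700, 1750, 1480, 1980, 2100, 2590]
acf = sample_acf(incidence, max_lag=4)
bound = white_noise_bound(len(incidence))

for lag, r in enumerate(acf):
    flag = "outside band" if lag > 0 and abs(r) > bound else ""
    print(f"lag {lag}: r = {r:+.3f} {flag}")
```

Lags whose autocorrelation falls outside the band suggest structure that an AR or MA term could capture; a spike recurring at a fixed lag would instead point to seasonality.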

Table 1

Forecast values for the 2 days after the analysis for the prevalence and incidence of COVID-2019.

            Date        Forecast   95% Confidence Interval
Prevalence  11/02/2020  43599.71   42347.53–44851.9
            12/02/2020  45151.45   42084.88–48218.02
Incidence   11/02/2020  2070.66    1305.23–2836.09
            12/02/2020  2418.47    1534.43–3302.51

Although more data are needed for a more detailed forecast, the spread of the virus seems to be slightly decreasing. Moreover, although the number of confirmed cases is still increasing, the incidence is slightly decreasing. If the virus does not develop new mutations, the number of cases should reach a plateau (Fig. 1, Fig. 2). The forecast and the estimate obtained are influenced by the “case” definition and the method of data collection. For further comparison or for future perspective, case definition and data collection must be maintained in real time.
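
The intervals in Table 1 are symmetric around each point forecast, so their width can be read as a measure of forecast uncertainty. The sketch below, in Python, recovers the standard error implied by each interval under the assumption that the intervals are normal-based (forecast ± 1.96 × standard error); that assumption, and the helper names, are not stated in the source.

```python
Z_95 = 1.96  # assumed two-sided 95% quantile of the standard normal

# (series, date, forecast, lower bound, upper bound) copied from Table 1.
TABLE_1 = [
    ("Prevalence", "11/02/2020", 43599.71, 42347.53, 44851.90),
    ("Prevalence", "12/02/2020", 45151.45, 42084.88, 48218.02),
    ("Incidence", "11/02/2020", 2070.66, 1305.23, 2836.09),
    ("Incidence", "12/02/2020", 2418.47, 1534.43, 3302.51),
]


def implied_standard_error(lower, upper, z=Z_95):
    """Standard error implied by a symmetric interval of the form f +/- z * se."""
    return (upper - lower) / (2 * z)


def interval(forecast, std_error, z=Z_95):
    """Rebuild the interval f +/- z * se from a forecast and a standard error."""
    return forecast - z * std_error, forecast + z * std_error


for series, day, forecast, lower, upper in TABLE_1:
    se = implied_standard_error(lower, upper)
    low, high = interval(forecast, se)
    print(f"{series:<10} {day}  se = {se:8.2f}  interval = {low:.2f} to {high:.2f}")
```

Under this reading, the implied standard error grows from the first to the second forecast day in both series, as expected when forecasting further ahead.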

Fig. 2

Correlogram and ARIMA forecast graph for the 2019-nCoV incidence.

Specifications Table

Subject:  Infectious Diseases
Specific subject area:  Econometric models applied to infectious disease epidemiological data to forecast the prevalence and incidence of COVID-2019
Type of data:  Chart, Graph, Figure
How data were acquired:  Gretl 2019d (http://gretl.sourceforge.net/win32/index_it.html)
Data format:  Data are in raw format and have been analyzed. An Excel file with data has been uploaded.
Parameters for data collection:  The ARIMA models used were ARIMA (1,2,0) and ARIMA (1,0,4)
Description of data collection:  The daily prevalence data of COVID-2019 from January 20, 2020 to February 10, 2020 were collected from the official website of Johns Hopkins University (https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.html), and Excel 2019 was used to build a time-series database. Descriptive analysis of the data was performed. To evaluate the incidence of new confirmed cases of COVID-2019 and to prevent possible bias, the difference between the cases confirmed on a given day and the cases confirmed on the previous day, Δ(X_n − X_{n−1}), was calculated.
Data source location:  University Campus Bio-Medico of Rome
Data accessibility:  Raw data can be retrieved from the GitHub repository https://github.com/CSSEGISandData/COVID-19

Value of the Data

These data are useful because they provide a forecast for COVID-2019 epidemic, thus representing a valid and objective tool for monitoring infection control.

All institutions involved in public health and infection control can benefit from these data because by using this model, they can daily construct a reliable forecast for COVID-2019 epidemic.

The additional value of these data lies in their easy collection and in the possibility to provide valid forecast for COVID-2019 daily monitoring after the application of the ARIMA model.

These data represent an easy way to evaluate the transmission dynamics of COVID-2019 to verify whether the strategy plan for infection control or quarantine is efficient.


1.  Comparison of ARIMA and GM(1,1) models for prediction of hepatitis B in China.

Authors:  Ya-Wen Wang; Zhong-Zhou Shen; Yu Jiang
Journal:  PLoS One       Date:  2018-09-04       Impact factor: 3.240
