Data-driven modeling and forecasting of COVID-19 outbreak for public policy making.

A Hasan¹, E R M Putri², H Susanto³, N Nuraini⁴.

Abstract

This paper presents a data-driven approach for COVID-19 modeling and forecasting, which can be used by public policy and decision makers to control the outbreak through Non-Pharmaceutical Interventions (NPI). First, we apply an extended Kalman filter (EKF) to a discrete-time stochastic augmented compartmental model to estimate the time-varying effective reproduction number (Rt). We use daily confirmed cases, active cases, recovered cases, deceased cases, Case-Fatality-Rate (CFR), and infectious time as inputs for the model. Furthermore, we define a Transmission Index (TI) as a ratio between the instantaneous and the maximum value of the effective reproduction number. The value of TI indicates the "effectiveness" of the disease transmission from a contact between a susceptible and an infectious individual in the presence of current measures, such as physical distancing and lock-down, relative to a normal condition. Based on the value of TI, we forecast different scenarios to see the effect of relaxing and tightening public measures. Case studies in three countries are provided to show the practicability of our approach.

Keywords: COVID-19; Forecasting; Modeling; Public policy

Year: 2021; PMID: 33487397; PMCID: PMC7816594; DOI: 10.1016/j.isatra.2021.01.028

Source DB: PubMed; Journal: ISA Trans; ISSN: 0019-0578; Impact factor: 5.911

Introduction

The spread of the new coronavirus disease 2019 (COVID-19), originating from Wuhan, China, has been worldwide and has caused a severe outbreak. The virus had infected more than 60M people, with more than 1.4M confirmed deaths, by the end of November 2020. The outbreak has triggered not only a health crisis but also economic and social ones. It has become a multidimensional problem that needs to be minimized through measurable public policies.

Motivation

Intervention measures, such as physical distancing and lock-down, have been introduced to contain the outbreak and to prevent it from growing and spreading further [1], [2]. A thorough evaluation of the available options is therefore urgently needed. A quantitative as well as a qualitative evaluation involving key characteristics of the COVID-19 outbreak can be conducted based on epidemiological parameters [3]. As the incidence has grown, quantitative studies on minimum physical distancing policies in Australia [4], China [5], and Italy [6] have been reported. A measure of disease transmission, known as the time-varying effective reproduction number ($R_t$), reflects the extent of transmission in the presence of interventions. Therefore, estimates of $R_t$ can be used to evaluate the implementation of a public policy [7]. The estimation of $R_t$ based on an epidemiological model plays an important role in evidence-based policy making, as is also recognized by the World Health Organization (WHO) [8].

Literature review

A deterministic Susceptible–Infected–Recovered (SIR) model for $R_t$ estimation, under the assumption that the data used adequately represent the actual outbreak, was presented in [9]. Different sets of data representing levels of quarantine measures were used in [5] to describe the case growth as well as the effective reproduction number for each level of measures. To accommodate uncertainties in incidence data, noise was added to the model in [7]. In that paper, the authors used daily new cases, active cases, recovered cases, and deceased cases as inputs to estimate the spread of the disease. Based on the stochastic model, the study was then extended to $R_t$ estimation.

Using $R_t$, several authors have proposed methods to forecast the evolution of the outbreak. Data-based analysis, modeling, and forecasting based on a Susceptible-Infectious-Recovered-Deceased (SIRD) model was presented in [10]. The authors fit the reported data with the SIRD model to estimate the epidemiological parameters. The main drawback of fitting the model to the data with the proposed method is that the estimated parameters can be unrealistic. The work of [11] used phenomenological models that had been validated during previous outbreaks to generate and assess short-term forecasts of the cumulative number of confirmed reported cases. However, since COVID-19 is a new disease, such a model is not reliable and the forecast can only be used for a very short term. Other authors used simple day-lag maps to investigate universality in the epidemic spreading [12]. Their results suggested that simple mean-field models can be used to gather a quantitative picture of the epidemic spreading. The main drawback is that the reproduction number was assumed to follow a heuristic continuous model, which may not describe the actual transmission.

Contribution of this paper

In this paper, we propose a data-driven approach for COVID-19 modeling and forecasting, which can be used by public policy and decision makers to control the outbreak through Non-Pharmaceutical Interventions (NPI). Considering the drawbacks of existing methods, we present two contributions: (i) estimation of the time-varying effective reproduction number $R_t$ based on real-time data fitting using an extended Kalman filter (EKF), and (ii) short- to medium-term forecasting based on different public policies. As the effective reproduction number simply shows the extent of transmission due to population immunity or intervention in the form of public policy making [7], we propose a new measure called a Transmission Index (TI) (see Section 2.5), which describes the disease transmission relative to a normal condition. Like $R_t$, the value of TI can be used to measure the effectiveness of public health measures. Furthermore, TI can be used to forecast different public policy scenarios in which a current measure is loosened or tightened.

Organization of this paper

Briefly, this paper is organized as follows. In Section 2, we discuss the methods used in evaluating the spread of the disease, including data availability and reliability, the data-driven framework, modeling, estimation, and forecasting. Then, in Section 3, we apply the method to estimate and forecast COVID-19 cases in the United Arab Emirates (UAE), Australia, and Denmark. Lastly, we present the conclusion in Section 4.

Methods

In this section, we describe the proposed data-driven modeling and forecasting approach that can be used by public policy and decision makers to control the COVID-19 pandemic through NPI. We acknowledge that no country knows the exact total number of people infected with COVID-19, partly due to the lack of testing and undetected asymptomatic cases. Thus, the presented approach can only be used for countries, regions, or areas that have performed mass testing with laboratory confirmation. For these, we assume that the difference between the actual and reported cases is minimal.

Data availability and reliability

A large amount of data on COVID-19 has been generated. Typically, government officials report daily confirmed cases, active cases, recovered cases, and deceased cases (see Table 1). These data are available for almost all countries and regions and can be accessed online. Several websites, such as https://www.worldometers.info and https://ourworldindata.org/, also provide information regarding the number of tests per capita. We use these data in our analysis herein.

Table 1

Typical time-series data reported during the pandemic.

Date	Month	Daily Confirmed (C)	Active Case (I)	Recovered (R)	Deceased (D)
⋮	⋮	⋮	⋮	⋮	⋮
28	4	25498	830745	148926	59418
29	4	28525	851065	154737	61812
30	4	30912	874215	160293	64018
1	5	36090	898527	170171	65918
2	5	29816	914246	182570	67616
3	5	27394	934901	188155	68770
⋮	⋮	⋮	⋮	⋮	⋮

The testing positivity rate of a country, i.e., the ratio between the number of tests returning positive for COVID-19 and the total number of tests conducted, is a good metric for judging whether the country has performed adequate mass testing of its citizens to properly monitor and control the spread of the virus. WHO advised that the positivity rate should remain at 5% or lower for at least 14 days. Only a few countries have successfully achieved this target, including UAE, Australia, and Denmark.

Data-driven framework

The reported cases from the pandemic will be used for two purposes: (i) to estimate the time-varying effective reproduction number $R_t$, and (ii) to forecast the number of active, recovered, deceased, and total cases, which are important for preparing the healthcare system. Fig. 1 sketches the data-driven framework proposed in our current study. $S$ denotes the susceptible case data. Assuming a constant population, $S$ can be obtained by subtracting the numbers of active, recovered, and deceased cases from the population size. The active case data $I$ is the number of people who are currently infected. $R$ and $D$ denote the cumulative numbers of recovered and deceased, respectively, while $C$ denotes the number of daily confirmed cases. To model uncertainty in the reported cases, we add white Gaussian noise. To obtain an estimate of $R_t$, the data are assimilated into the compartmental epidemic model (see Section 2.3 below) using EKF. Based on this estimation, we perform forecasting for the next 90 days with different scenarios.
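As a small illustration of the constant-population bookkeeping described above, the following sketch derives the susceptible series from the first three rows of Table 1. The population size `N_ASSUMED` is a hypothetical placeholder (Table 1 does not state which country it describes), and the function name is illustrative only.

```python
# Rows copied from Table 1:
# (day, month, daily confirmed C, active I, recovered R, deceased D)
ROWS = [
    (28, 4, 25498, 830745, 148926, 59418),
    (29, 4, 28525, 851065, 154737, 61812),
    (30, 4, 30912, 874215, 160293, 64018),
]

# Hypothetical population size, used only to make the example concrete.
N_ASSUMED = 330_000_000


def susceptible(n, active, recovered, deceased):
    """S = N - I - R - D under the constant-population assumption."""
    return n - active - recovered - deceased


s_series = [susceptible(N_ASSUMED, i, r, d) for (_, _, _, i, r, d) in ROWS]
print(s_series)
```

Because $R$ and $D$ are cumulative, the derived susceptible series can only shrink as long as reported infections keep accumulating.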

Fig. 1

Data-driven framework of COVID-19 pandemic.

Mathematical modeling

To model the transmission of the coronavirus, we use the following discrete-time stochastic augmented compartmental model [7]:

$$
\begin{aligned}
S(k+1) &= \left(1 - \frac{\beta(k)}{N} I(k)\right) S(k) + w_1(k), && (1)\\
I(k+1) &= \left(1 - (\gamma + \kappa)\right) I(k) + \frac{\beta(k)}{N} I(k) S(k) + w_2(k), && (2)\\
R(k+1) &= R(k) + \gamma I(k) + w_3(k), && (3)\\
D(k+1) &= D(k) + \kappa I(k) + w_4(k), && (4)\\
C(k+1) &= \frac{\beta(k)}{N} I(k) S(k) + w_5(k), && (5)\\
\beta(k+1) &= \beta(k) + w_6(k). && (6)
\end{aligned}
$$

Eqs. (1)–(4) can be obtained from the standard SIRD model. We chose a discrete-time model instead of a continuous-time model since the data are available in discrete time, which makes the model and the estimation algorithm easier to implement. The time-varying effective reproduction number is then given by

$$
R_t(k) = \frac{S(k)}{N}\,\frac{\beta(k)}{\gamma + \kappa}. \qquad (7)
$$

Here, the term $S(k)/N$ compensates for the decline in the susceptible population, while the term $\beta(k)/(\gamma + \kappa)$ is the expression of the basic reproduction number from the SIRD model [13]. We augment the standard SIRD model with Eqs. (5), (6), which take into account the number of daily confirmed cases $C$ and the infectious rate $\beta$. The infectious rate is assumed to be a piece-wise constant function with a jump at every one-day time interval. The noise terms $w_1$, $w_2$, $w_3$, $w_4$, $w_5$, and $w_6$ are added to model the uncertainty.

The system (1)–(6) has three constant parameters: the population size $N$, the recovery rate $\gamma$, and the death rate $\kappa$. The recovery and death rates depend on the infectious time $T_i$ and the Case-Fatality-Rate (CFR) and are given by

$$
\gamma = \frac{1 - \mathrm{CFR}}{T_i}, \qquad \kappa = \frac{\mathrm{CFR}}{T_i}. \qquad (8)
$$

The infectious time is obtained from clinical data; for COVID-19 it lasts on average 9 days with a standard deviation of 3 days [14]. The CFR is unknown and needs to be estimated. However, to simplify the calculation, in this paper we assume that it is equal to the latest number of deceased cases divided by the latest total number of infected cases. To account for under-reported cases, this estimate can be divided by a correction factor $c$, e.g.,

$$
\mathrm{CFR} = \frac{D(n)}{c\,\bigl(I(n) + R(n) + D(n)\bigr)},
$$

where $n$ denotes the index of the latest data. In our example in Section 3, we assume that the number of under-reported cases is 3 times larger than the number of reported cases. Thus, we take $c = 4$.
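The model can be made concrete with a minimal Python sketch of one noise-free daily update. It assumes the standard SIRD bookkeeping with a one-day step (new infections $\beta I S / N$, recoveries $\gamma I$, deaths $\kappa I$) and rates $\gamma = (1 - \mathrm{CFR})/T_i$, $\kappa = \mathrm{CFR}/T_i$. The function names and the numbers in the example are illustrative and are not taken from the authors' code.

```python
def rates(cfr, t_inf):
    """Recovery rate gamma and death rate kappa from CFR and infectious time."""
    gamma = (1.0 - cfr) / t_inf
    kappa = cfr / t_inf
    return gamma, kappa


def step(state, n, gamma, kappa):
    """One noise-free daily update of the state (S, I, R, D, C, beta)."""
    s, i, r, d, _c, beta = state
    new_inf = beta * i * s / n          # people leaving S and entering I today
    return (
        s - new_inf,
        (1.0 - (gamma + kappa)) * i + new_inf,
        r + gamma * i,
        d + kappa * i,
        new_inf,                        # daily confirmed cases
        beta,                           # beta is held constant without noise
    )


def effective_r(state, n, gamma, kappa):
    """R_t = (S / N) * beta / (gamma + kappa)."""
    s, _i, _r, _d, _c, beta = state
    return (s / n) * beta / (gamma + kappa)


# Illustrative values only: 1M people, 100 active cases, beta = 0.3 per day.
gamma, kappa = rates(cfr=0.01, t_inf=9.0)
state = (999_900.0, 100.0, 0.0, 0.0, 0.0, 0.3)
print(effective_r(state, 1_000_000.0, gamma, kappa))
print(step(state, 1_000_000.0, gamma, kappa))
```

Since every person leaving $S$ enters $I$, and every person leaving $I$ enters $R$ or $D$, the sum $S + I + R + D$ is preserved by the noise-free update, which is a quick sanity check on any implementation.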

Estimation of the time-varying effective reproduction number

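The time-varying effective reproduction number $R_t$ is estimated by applying EKF to the discrete-time stochastic augmented compartmental model (1)–(6). Let us define the state vector

$$
x(k) = \bigl[\,S(k)\;\; I(k)\;\; R(k)\;\; D(k)\;\; C(k)\;\; \beta(k)\,\bigr]^{\top}.
$$

The discrete-time augmented SIRD model (1)–(6) can then be written as

$$
x(k+1) = f\bigl(x(k)\bigr) + w(k),
$$

where $f$ is the right-hand side of (1)–(6) without the noise and $w = [\,w_1\;\cdots\;w_6\,]^{\top}$. Let $\hat{x}(k)$ denote the estimate of $x(k)$ from the EKF. The EKF algorithm requires the Jacobian of $f$ at the estimate $\hat{x}(k)$, which is given as

$$
J_f\bigl(\hat{x}(k)\bigr) =
\begin{pmatrix}
J_{11} & J_{12} & 0 & 0 & 0 & J_{16}\\
J_{21} & J_{22} & 0 & 0 & 0 & J_{26}\\
0 & \gamma & 1 & 0 & 0 & 0\\
0 & \kappa & 0 & 1 & 0 & 0\\
J_{51} & J_{52} & 0 & 0 & 0 & J_{56}\\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix},
$$

where

$$
\begin{aligned}
J_{11} &= 1 - \frac{\hat{\beta}\hat{I}}{N}, & J_{12} &= -\frac{\hat{\beta}\hat{S}}{N}, & J_{16} &= -\frac{\hat{I}\hat{S}}{N},\\
J_{21} &= \frac{\hat{\beta}\hat{I}}{N}, & J_{22} &= 1 - (\gamma + \kappa) + \frac{\hat{\beta}\hat{S}}{N}, & J_{26} &= \frac{\hat{I}\hat{S}}{N},\\
J_{51} &= \frac{\hat{\beta}\hat{I}}{N}, & J_{52} &= \frac{\hat{\beta}\hat{S}}{N}, & J_{56} &= \frac{\hat{I}\hat{S}}{N}.
\end{aligned}
$$

Detailed numerical implementation of the model can be found in [7]. Our algorithm has two tuning parameters: the covariance of the process noise $Q_F$ and the covariance of the observation noise $R_F$, which can be chosen such that the Relative Root Mean Square Error (RRMSE) between the reported and estimated data is minimized. The RRMSE is computed for each variable $X \in \{I, R, D, C\}$ over the observed days, where $X$ and $\hat{X}$ denote the reported and estimated data, respectively. The EKF serves as a real-time data-fitting tool and updates its estimate with every newly reported data point. Once this estimation process works, the EKF also produces an estimate of $\beta(k)$ from (6), which in turn gives $R_t$.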
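The predict and update cycle can be sketched in plain Python as follows. This is a textbook EKF step, not the authors' implementation: it assumes a one-day SIRD update for `f`, a measurement matrix `H` that observes $S$, $I$, $R$, $D$, $C$ directly while $\beta$ stays hidden, and hand-written matrix helpers so that no third-party library is needed. All names are illustrative.

```python
def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def add(a, b, sign=1.0):
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(a)
    m = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                fac = m[r][col]
                m[r] = [v - fac * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def f(x, n, gamma, kappa):
    """Noise-free one-day update of (S, I, R, D, C, beta)."""
    s, i, r, d, _c, beta = x
    new_inf = beta * i * s / n
    return [s - new_inf, (1 - gamma - kappa) * i + new_inf,
            r + gamma * i, d + kappa * i, new_inf, beta]


def jacobian(x, n, gamma, kappa):
    """Partial derivatives of f with respect to (S, I, R, D, C, beta)."""
    s, i, _r, _d, _c, beta = x
    a, b, g = beta * i / n, beta * s / n, i * s / n
    return [
        [1 - a, -b, 0, 0, 0, -g],
        [a, 1 - gamma - kappa + b, 0, 0, 0, g],
        [0, gamma, 1, 0, 0, 0],
        [0, kappa, 0, 1, 0, 0],
        [a, b, 0, 0, 0, g],
        [0, 0, 0, 0, 0, 1],
    ]


# Assumed measurement model: the five reported series are observed, beta is not.
H = [[float(i == j) for j in range(6)] for i in range(5)]


def ekf_step(x, p, y, q, r_cov, n, gamma, kappa):
    """One EKF cycle: predict with f, then correct with the measurement y."""
    jf = jacobian(x, n, gamma, kappa)
    x_pred = f(x, n, gamma, kappa)
    p_pred = add(matmul(matmul(jf, p), transpose(jf)), q)
    s_cov = add(matmul(matmul(H, p_pred), transpose(H)), r_cov)
    gain = matmul(matmul(p_pred, transpose(H)), inverse(s_cov))
    innov = [[yi - xi] for yi, xi in zip(y, x_pred[:5])]
    corr = matmul(gain, innov)
    x_new = [xi + ci[0] for xi, ci in zip(x_pred, corr)]
    eye = [[float(i == j) for j in range(6)] for i in range(6)]
    p_new = matmul(add(eye, matmul(gain, H), -1.0), p_pred)
    return x_new, p_new
```

In use, `ekf_step` would be called once per reporting day with `y` set to that day's reported $(S, I, R, D, C)$. The process and observation covariances `q` and `r_cov` are the tuning knobs mentioned in the text. Although $\beta$ is never measured, the filter can correct it because the Jacobian couples it to the observed compartments through the predicted covariance.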

Forecasting

Forecasting is done for different public measure scenarios. To this end, we define TI as

$$
\mathrm{TI}(k) = \frac{R_t(k)}{\max_{j} R_t(j)}.
$$

Here $\max_{j} R_t(j)$ can be interpreted as the basic reproduction number ($R_0$). The outcomes of different public measure scenarios are obtained by assigning different designated values of TI at the end of the prediction horizon, as illustrated in Fig. 2.

Fig. 2

Forecasting different scenarios through different designated values of TI.

Here, the prediction horizon is 90 days. In this case, we draw a straight line perturbed by white Gaussian noise between the current TI and the designated TI. The noise is added to simulate fluctuation in the reported cases. In our case, the designated TIs range from 20% to 100%. Table 2 shows a hypothetical but rational relationship between different public measure scenarios and designated TIs. Relaxing public measures corresponds to a larger designated TI, and tightening them to a smaller one. Based on these different values, we use the discrete-time compartmental model (1)–(5) to forecast the outcome of different public measure scenarios.
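The scenario construction can be sketched as follows. The mapping from a designated TI back to an infectious rate is not spelled out in the text, so this sketch assumes $\beta(k) = \mathrm{TI}(k)\, R_{\max}\,(\gamma + \kappa)$, i.e. it treats $S/N \approx 1$ when inverting the reproduction number. That mapping, the function names, and all numbers below are illustrative assumptions.

```python
import random


def transmission_index(rt_series):
    """Latest R_t divided by the largest R_t observed so far."""
    return rt_series[-1] / max(rt_series)


def ti_path(ti_now, ti_target, horizon=90, noise_sd=0.0, seed=0):
    """Straight line from ti_now to ti_target, perturbed by Gaussian noise."""
    rng = random.Random(seed)
    return [
        ti_now + (ti_target - ti_now) * (k + 1) / horizon + rng.gauss(0.0, noise_sd)
        for k in range(horizon)
    ]


def forecast(state, n, gamma, kappa, rt_max, ti_values):
    """Propagate (S, I, R, D) one day per TI value; returns the daily states."""
    s, i, r, d = state
    out = []
    for ti in ti_values:
        beta = ti * rt_max * (gamma + kappa)   # assumed TI -> beta mapping
        new_inf = beta * i * s / n
        s, i, r, d = (s - new_inf,
                      (1 - gamma - kappa) * i + new_inf,
                      r + gamma * i,
                      d + kappa * i)
        out.append((s, i, r, d, new_inf))
    return out


# Illustrative comparison of two designated TIs from the same starting point.
gamma, kappa = 0.99 / 9.0, 0.01 / 9.0
start = (9_999_000.0, 1_000.0, 0.0, 0.0)
lockdown = forecast(start, 1e7, gamma, kappa, 3.0, ti_path(0.31, 0.20))
no_measures = forecast(start, 1e7, gamma, kappa, 3.0, ti_path(0.31, 1.00))
print(lockdown[-1][1], no_measures[-1][1])  # active cases after 90 days
```

Setting `noise_sd` above zero reproduces the perturbed straight line described in the text, and a fixed `seed` keeps the scenarios comparable across runs.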

Table 2

Interpretation of different values of designated TIs. The public measures are taken from the New Zealand COVID-19 alert system.

Level	TI	Public measures
1 (Do nothing)	100%	No public measures.
2 (Prevent)	80%	Border entry measures.
		Intensive testing for COVID-19.
		Rapid contact tracing of any positive case.
		Self-isolation and quarantine required.
		Schools and workplaces open.
3 (Reduce)	60%	People can connect with friends and family.
		No more than 100 people at gatherings.
		Keep physical distancing of 2 meters.
		Businesses can open to the public.
		Sport and recreation activities are allowed.
4 (Restrict)	40%	People must work from home.
		Children should learn at home if possible.
		Public venues are closed.
		No more than 10 people at gatherings.
		Healthcare services use virtual consultations.
5 (Lock-down)	20%	People instructed to stay at home.
		Travel is severely limited.
		All gatherings canceled.
		Businesses closed except for essential services.
		Educational facilities closed.

Results and discussion

In this section, we run simulations for three countries: UAE, Australia, and Denmark. All data sets and code are available on GitHub at https://github.com/agusisma/coviddatadriven. Parameters for the simulations are presented in Table 3. The recovery rate $\gamma$ and the death rate $\kappa$ are calculated using (8).

Table 3

Parameters for the simulations.

Country	Parameters
	N	CFR	Ti
UAE	9,890,402	0.15%	9 ± 3
Australia	25,499,884	0.27%	9 ± 3
Denmark	5,792,202	1.13%	9 ± 3
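As a worked example of how the rates follow from Table 3, assume that the total removal rate $1/T_i$ is split between recovery and death according to the CFR, i.e. $\gamma = (1 - \mathrm{CFR})/T_i$ and $\kappa = \mathrm{CFR}/T_i$, and take the mean infectious time $T_i = 9$ days. For UAE and Denmark this gives approximately:

$$
\begin{aligned}
\text{UAE:}\quad & \gamma = \frac{1 - 0.0015}{9} \approx 0.1109\ \text{day}^{-1}, & \kappa &= \frac{0.0015}{9} \approx 1.67 \times 10^{-4}\ \text{day}^{-1},\\
\text{Denmark:}\quad & \gamma = \frac{1 - 0.0113}{9} \approx 0.1099\ \text{day}^{-1}, & \kappa &= \frac{0.0113}{9} \approx 1.26 \times 10^{-3}\ \text{day}^{-1}.
\end{aligned}
$$

In both cases $\gamma + \kappa = 1/9 \approx 0.111\ \text{day}^{-1}$, so the CFR changes only how removals are divided between the recovered and deceased compartments, not how quickly active cases are removed.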

Figs. 3–8 show the data fitting results using EKF and the errors between reported and estimated cases in UAE, Australia, and Denmark. These figures show that our EKF algorithm estimates the actual reported cases accurately (see also Table 4). When estimating these cases, the EKF also estimates the time-varying effective reproduction number $R_t$.

Fig. 3

Data fitting using EKF for UAE.

Fig. 4

Error between reported and estimated cases in UAE.

Fig. 5

Data fitting using EKF for Australia.

Fig. 6

Error between reported and estimated cases in Australia.

Fig. 7

Data fitting using EKF for Denmark.

Fig. 8

Error between reported and estimated cases in Denmark.

Table 4

RRMSE between reported and estimated cases using EKF.

RRMSE_X
Country	I	R	D	C	Total
UAE	2.5e−04	1.8e−06	1.9e−06	2.4e−01	2.5e−01
Australia	2.1e−06	5.1e−06	2.1e−06	6.4e−03	6.5e−03
Denmark	4.1e−06	5.2e−02	2.2e−06	9.9e−03	6.2e−02
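For readers who want to reproduce an error measure of this kind, the sketch below implements one common definition of a relative root mean square error: the RMSE divided by the root mean square of the reported series. The exact normalization behind Table 4 may differ, so the function is an assumption for illustration. The estimated values in the example are made up, while the reported values are the first three active-case entries of Table 1.

```python
import math


def rrmse(reported, estimated):
    """RMSE of the estimate divided by the RMS of the reported data."""
    if len(reported) != len(estimated) or not reported:
        raise ValueError("series must be non-empty and of equal length")
    num = sum((x - xh) ** 2 for x, xh in zip(reported, estimated))
    den = sum(x ** 2 for x in reported)
    return math.sqrt(num / den)


reported_active = [830745.0, 851065.0, 874215.0]
estimated_active = [830700.0, 851100.0, 874200.0]   # hypothetical filter output
print(rrmse(reported_active, estimated_active))
```

A value near zero means the filtered series tracks the reports closely. Because the measure is scale-free, it can be compared across compartments of very different magnitude, such as $I$ and $D$.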

United Arab Emirates

The UAE has conducted more than 4.9 million tests since the outbreak, or 502.14 total tests per thousand population. This makes the UAE one of the countries with the highest number of tests. Using CFR estimates, the study of [15] estimated that 98% (95% credible interval: 86%–100%) of symptomatic COVID-19 cases in the UAE were reported. The first confirmed cases were reported on 29 January, from an infected family of four who came to the country on holiday from Wuhan. As the number of positive cases steadily increased, the government took immediate public measures, such as the closure of schools and universities across the country until the end of the academic year in June (announced on 3 March), the suspension of prayers at mosques and all other places of worship from 16 March, including the whole month of Ramadan, as well as night curfews for disinfection from 26 March for an extended period of time, which limited movements within the country. These extreme measures, together with the government's wider National Screening Program, which seeks to test as many people as possible in order to identify, isolate, and treat patients as quickly as possible, yielded positive results in the form of a decrease in the reproduction number almost immediately afterwards, see Fig. 9. On 18 May the government announced the first day on which the number of recoveries surpassed the number of new cases found. In the third week of May, the daily new cases reached their peak and the reproduction number was already below the threshold value $R_t = 1$.

Fig. 9

The time-varying effective reproduction number in UAE.

With the current TI at 31%, our forecast shows that the daily cases will be steadily decreasing, see Fig. 10. As restrictions in the country ease, it is important to maintain safety and preventive measures. Public negligence can increase the TI, which can result in a second wave of infection in the country.

Fig. 10

Forecasting for the next 90 days in UAE.

Australia

The first confirmed case of COVID-19 in Australia was found in January 2020, when a traveler returned to Victoria from Wuhan, China. The number of cases passed 1000 in March 2020 and doubled after three days. The growth in incidence during March and April is considered the first wave of the pandemic, with $R_t > 1$ (see Fig. 11). The effects of the pandemic on the health sector started to spill over into other sectors such as trade, travel, the economy, and finance, and intensive interventions have been carried out to prevent the pandemic from growing [4].

Fig. 11

The time-varying effective reproduction number in Australia.

The Australian Government closed the borders to all non-residents and non-citizens on 20 March 2020 and applied a 14-day self-isolation period for all arrivals. Quarantine and lock-down related policies, such as physical distancing and self-isolation, were applied in the form of school and workplace closures, cancellation of mass gatherings, contact tracing, etc. All non-essential services were also stopped to maximize physical distancing. The policy was extended for the next three months [16]. The effect of the interventions was indicated by $R_t < 1$ in April, when the number of new cases dropped from 350 to 20 cases per day, as shown in Fig. 11. After the quarantine related policy was lifted at the beginning of June, following three months of a slow rate of infection ($R_t < 1$), the number of positive cases escalated. This prompted the Australian Government to apply the policy again in order to prevent not only a higher number of cases in the second wave but also a long-term impact on all sectors. Australia's TI by 27 July was 62%, and its short-term projection showed that the active cases will increase sharply if there is no further intervention. The number of recovered and deceased cases will increase in the next 90 days. The short-term projection is shown in Fig. 12.

Fig. 12

Forecasting for the next 90 days in Australia.

Denmark

Denmark confirmed its first case on 27 February, when a man who had been skiing in Lombardy, Italy, returned to Denmark. The country introduced lock-down on 13 March by ordering people working in non-essential functions in the public sector to stay at home for two weeks. Furthermore, kindergartens, primary and secondary schools, universities, libraries, indoor cultural institutions, and similar places were closed. Assemblies of more than ten people in public were made illegal. The effect of the lock-down can be observed after three weeks (see Fig. 13), when $R_t < 1$. A very slow and gradual reopening was initiated on 15 April by opening nurseries, kindergartens, and primary schools. However, the government will re-enforce lock-down if there are indications that the number of infections is increasing quickly. Universities are open only for employees, while all courses will be given online. At the end of July, there was an increase in the number of infected people, possibly due to the summer holiday. However, the government did not enforce any restriction.

Fig. 13

The time-varying effective reproduction number in Denmark.

Denmark’s TI by 27 July was 20%. A short-term projection showed that the number of active cases will be steady under the current measures. The number of recovered individuals will increase, while the number of deaths will decrease significantly, as can be seen from Fig. 14.

Fig. 14

Forecasting for the next 90 days in Denmark.

Conclusion

We have presented a data-driven approach for modeling and forecasting of the COVID-19 outbreak for public policy making. The proposed method relies on the quality of the data. Thus, the estimation and forecast results need to be interpreted carefully when the amount of testing is not sufficient. By defining the TI of a country as the ratio between its current reproduction number and the highest one ever reached by the country, and by estimating this index, our approach can be used to produce a short- to medium-term forecast that may predict the course of COVID-19 in the near future, including the possibility of an upcoming second wave. Simulation results using data from three countries showed that our approach gives reasonable forecasts and insights.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

12 in total

1. Critical Care Utilization for the COVID-19 Outbreak in Lombardy, Italy: Early Experience and Forecast During an Emergency Response.

Authors: Giacomo Grasselli; Antonio Pesenti; Maurizio Cecconi
Journal: JAMA Date: 2020-04-28 Impact factor: 56.272

2. A new estimation method for COVID-19 time-varying reproduction number using active cases.

Authors: Agus Hasan; Hadi Susanto; Venansius Tjahjono; Rudy Kusdiantara; Endah Putri; Nuning Nuraini; Panji Hadisoemarto
Journal: Sci Rep Date: 2022-04-23 Impact factor: 4.996

3. A quantitative and qualitative analysis of the COVID-19 pandemic model.

Authors: Sarbaz H A Khoshnaw; Muhammad Shahzad; Mehboob Ali; Faisal Sultan
Journal: Chaos Solitons Fractals Date: 2020-05-25 Impact factor: 5.944

4. Reconstructing the early global dynamics of under-ascertained COVID-19 cases and infections.

Authors: Timothy W Russell; Nick Golding; Joel Hellewell; Sam Abbott; Lawrence Wright; Carl A B Pearson; Kevin van Zandvoort; Christopher I Jarvis; Hamish Gibbs; Yang Liu; Rosalind M Eggo; W John Edmunds; Adam J Kucharski
Journal: BMC Med Date: 2020-10-22 Impact factor: 8.775

5. Developing WHO guidelines: Time to formally include evidence from mathematical modelling studies.

Authors: Matthias Egger; Leigh Johnson; Christian Althaus; Anna Schöni; Georgia Salanti; Nicola Low; Susan L Norris
Journal: F1000Res Date: 2017-08-29

6. Data-based analysis, modelling and forecasting of the COVID-19 outbreak.

Authors: Cleo Anastassopoulou; Lucia Russo; Athanasios Tsakris; Constantinos Siettos
Journal: PLoS One Date: 2020-03-31 Impact factor: 3.240

7. Scientific and ethical basis for social-distancing interventions against COVID-19.

Authors: Joseph A Lewnard; Nathan C Lo
Journal: Lancet Infect Dis Date: 2020-03-23 Impact factor: 25.071

8. Real-time forecasts of the COVID-19 epidemic in China from February 5th to February 24th, 2020.

Authors: K Roosa; Y Lee; R Luo; A Kirpich; R Rothenberg; J M Hyman; P Yan; G Chowell
Journal: Infect Dis Model Date: 2020-02-14

9. Modelling transmission and control of the COVID-19 pandemic in Australia.

Authors: Sheryl L Chang; Nathan Harding; Cameron Zachreson; Oliver M Cliff; Mikhail Prokopenko
Journal: Nat Commun Date: 2020-11-11 Impact factor: 14.919

10. Analysis and forecast of COVID-19 spreading in China, Italy and France.

Authors: Duccio Fanelli; Francesco Piazza
Journal: Chaos Solitons Fractals Date: 2020-03-21 Impact factor: 5.944
