
Maximum likelihood-based extended Kalman filter for COVID-19 prediction.

Jialu Song¹, Hujin Xie¹, Bingbing Gao², Yongmin Zhong¹, Chengfan Gu³, Kup-Sze Choi³.

Abstract

Prediction of COVID-19 spread plays a significant role in epidemiological studies and in governments' battles against the epidemic. However, existing studies on COVID-19 prediction mostly use constant model parameters, which cannot reflect how COVID-19 actually spreads. This paper presents a new method for dynamic prediction of COVID-19 spread that uses time-dependent model parameters. The method discretises the susceptible-exposed-infected-recovered-dead (SEIRD) epidemiological model in the time domain to construct a nonlinear state-space equation for dynamic estimation of COVID-19 spread. A maximum likelihood estimation method is established to estimate the time-dependent model parameters online. An extended Kalman filter is then developed to estimate the dynamic COVID-19 spread from the online estimated model parameters. The proposed method is applied to simulate and analyse the COVID-19 pandemic in China and the United States based on daily reported cases, which demonstrates its efficacy in modelling and predicting COVID-19 spread.


Keywords: COVID-19 modelling; Extended Kalman filter; Maximum likelihood estimation; SEIRD model; Time-dependent model parameters

Year: 2021; PMID: 33824550; PMCID: PMC8017556; DOI: 10.1016/j.chaos.2021.110922

Source DB: PubMed; Journal: Chaos Solitons Fractals; ISSN: 0960-0779; Impact factor: 5.944

Introduction

COVID-19 broke out in Wuhan, China, at the end of 2019. Despite enormous control efforts by the Chinese government, the disease became an epidemic within a few months and affected almost every country in the world [1], [2], [3]. On 31st January 2020, the World Health Organization declared the COVID-19 outbreak a Public Health Emergency of International Concern. As of 27th January 2021, the virus had infected more than 100 million people, caused more than 2 million deaths [4], and altered the lives of billions [5]. Enormous efforts have gone into fighting COVID-19, especially in the affected countries. However, this fight is far from over, and new infections are detected every day. Forecasting plays a significant role in controlling the COVID-19 epidemic. It provides early warnings, monitors the spread of the virus, evaluates the effects of containment measures, predicts the future trend, and guides the allocation of resources against the pandemic. It is therefore essential to develop epidemiological models that can predict and analyse the evolving trend of the COVID-19 pandemic. Research on predicting COVID-19 spread has produced various epidemiological models of the COVID-19 transmission process. The Susceptible-Infected-Removed (SIR) model is commonly used for epidemiological modelling. It assumes that susceptible people may become infected, infected people either recover or die, the susceptible population gradually decreases over time, and the infectious disease eventually disappears [6]. The removed (R) compartment of the SIR model contains both the individuals who have died of the disease and those who have recovered. The Susceptible-Infected-Recovered-Dead (SIRD) model improves the SIR model by splitting the removed (R) compartment into the recovered (R) and the dead (D) [7].
Most such diseases have an incubation period of 5 to 14 days, during which the infected population grows without showing symptoms. However, the SIR model does not consider the population exposed to an epidemic. The Susceptible-Exposed-Infected-Recovered (SEIR) model adds the exposed population to the SIR model to account for the incubation period of an infectious disease [8]. Analogously to the SIRD model, the susceptible-exposed-infected-recovered-dead (SEIRD) model improves the SEIR model by also considering the dead population [9]. Unlike other infectious diseases, COVID-19 propagates in a strongly dynamic way: its epidemiological process changes substantially over time. To capture this and reflect the actual dynamics of COVID-19 spread, COVID-19 models must treat parameters such as the infection, death and recovery rates, as well as the basic reproduction number, as time-dependent. However, most existing studies on COVID-19 modelling use fixed model parameters [1,2,10]. Studies with time-dependent model parameters are still limited and focus mainly on the SIR and SEIR models [11]. Besides an epidemiological model, predicting and analysing the dynamic transmission of COVID-19 also requires a real-time estimation method. Recursive least squares (RLS) is commonly used for online parameter estimation in epidemic modelling [12]. It estimates parameters online by minimising a linear least-squares cost function of the system observations. Chen et al. used ridge regression, an improved RLS method, to track the infection and recovery rates based on the SIR model [11]. As an extension of RLS, the Kalman filter (KF) uses linear Gaussian state-space equations to optimally estimate the system state from the system observations.
Like RLS, KF updates the estimated system state with observations. Unlike RLS, it also has a prediction step that propagates the state forward in time. As a result, KF yields minimum mean-square-error estimates and can still estimate the state through its prediction step when no observations are used. Vaid et al. studied the COVID-19 pandemics in the United States (US), Canada and Sweden using a KF based on Markov chain Monte Carlo (MCMC) sampling of the SIR model [13]. Zeng and Ghanem developed a KF that switches between linear Gaussian models according to the observation data to model the COVID-19 pandemic in the US [14]. Like RLS, however, KF applies only to linear systems, whereas the epidemiological models used for COVID-19 prediction are nonlinear. Nonlinear RLS extends RLS to nonlinear systems. Piccolomini et al. used nonlinear RLS to predetermine the SEIRD model parameters for COVID-19 [15]. However, nonlinear RLS is computationally expensive and can only run offline, so it cannot determine model parameters online for COVID-19 forecasting. The ensemble KF (EnKF) extends the traditional KF to nonlinear systems. It approximates the distribution of the system state with a random ensemble of states and computes the state error covariance from the ensemble members. Nkwayep et al. developed an EnKF based on the SIR model to predict the COVID-19 pandemic in Cameroon [16]. However, the ensemble size is critical in epidemic modelling. A small ensemble produces long-range spurious correlations in the error covariance, which can cause the filter to diverge, while a large ensemble is computationally expensive [17]. In addition, EnKF cannot handle sharp coherent features such as the travelling waves found in epidemics [10].
The ensemble adjustment KF (EAKF) improves EnKF by adjusting each ensemble member towards the ensemble mean to prevent filter divergence [18]. However, this improvement reduces computational efficiency, and EAKF is also expensive when the ensemble is large [19]. The particle filter (PF) performs nonlinear state estimation by approximating the system state with a sequence of independent random samples drawn from given conditional probability distributions. Calvetti et al. developed a PF based on the SEIR model to estimate the reproduction number of the COVID-19 epidemic in the US dynamically [20]. However, PF suffers from particle degeneracy, and its accuracy depends heavily on the choice of the importance density function and the resampling scheme. Resampling has been improved in various ways, for example with MCMC sampling [21] and Metropolis-Hastings (M-H) rules [22], but PF remains computationally expensive when the sample size is large. The extended Kalman filter (EKF) extends the traditional KF to nonlinear systems by linearising the nonlinear system model (that is, using its slope) about the current mean estimate. EKF has been used to predict and analyse various epidemic diseases [23,24], but these studies mainly use fixed model parameters. Research on EKF-based COVID-19 modelling with time-dependent epidemiological parameters is still limited. Recently, Younes and Hasan developed an EKF based on the Lotka–Volterra model to estimate the dynamics of COVID-19 spread online [25]. However, because its state prediction relies on historical data, this method cannot reflect the impact of external interventions on COVID-19 spread. It also uses fixed rather than time-dependent model parameters.
This paper presents a novel method for predicting and analysing dynamic COVID-19 spread based on the SEIRD epidemiological model with time-dependent model parameters. The SEIRD model is discretised in the time domain to construct a nonlinear state-space equation for COVID-19 dynamics. This equation is used to estimate the time-dependent model parameters and the transmission state simultaneously. The time-varying model parameters are estimated by the maximum likelihood principle to account for the dynamic effects of the infection, death and recovery rates on COVID-19 spread. An EKF is then developed to estimate the transmission state dynamically from the online estimated model parameters for COVID-19 forecasting. The performance of the proposed method is evaluated through simulations and analyses of the COVID-19 pandemics in China and the US based on daily reported cases.

SEIRD model

The SIR model is commonly used to model emerging infectious diseases such as COVID-19. It is defined as

$$\frac{dS}{dt} = -\frac{\beta S I}{N}, \qquad \frac{dI}{dt} = \frac{\beta S I}{N} - \gamma I, \qquad \frac{dR}{dt} = \gamma I,$$

where $S$, $I$ and $R$ are the numbers of susceptible, infected and recovered individuals. $N = S + I + R$ is the total population, which stays constant under the assumption that the birth rate equals the death rate. $dS/dt$ is the rate of decrease in susceptible individuals, and $dI/dt$ and $dR/dt$ are the growth rates of infected and recovered individuals. $\beta$ is the infection rate and $\gamma$ is the recovery rate. COVID-19 has an asymptomatic infection phase, which the SIR model cannot describe. It also has an incubation period of up to 14 days. During this period, healthy people who have been in contact with infected individuals, but do not fall ill immediately, become virus carriers. These people form the exposed cohort (E), which moves to the infected cohort at the incubation rate $\sigma$. To account for the exposed cohort, the SEIR model adds the following equation to the SIR model and changes the inflow to the infected cohort accordingly:

$$\frac{dE}{dt} = \frac{\beta S I}{N} - \sigma E, \qquad \frac{dI}{dt} = \sigma E - \gamma I.$$

The SEIRD model then adds the dead cohort (D), with death rate $\mu$, to the SEIR model to describe the disease spread dynamics:

$$\frac{dS}{dt} = -\frac{\beta S I}{N}, \quad \frac{dE}{dt} = \frac{\beta S I}{N} - \sigma E, \quad \frac{dI}{dt} = \sigma E - \gamma I - \mu I, \quad \frac{dR}{dt} = \gamma I, \quad \frac{dD}{dt} = \mu I,$$

where $N = S + E + I + R + D$. Fig. 1 shows the structure of the SEIRD model. Susceptible people (S) become exposed (E) at the infection rate $\beta$. Exposed people then become infected (I) at the incubation rate $\sigma$. Finally, infected people either recover at the recovery rate $\gamma$ or die at the death rate $\mu$.
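To make the compartment flows concrete, here is a minimal Python sketch of the right-hand side of the continuous SEIRD model, assuming NumPy is available. The symbol names are our own labels and may differ from the paper's: `beta` is the infection rate, `sigma` the incubation rate, `gamma` the recovery rate and `mu` the death rate. The population figures in the example are hypothetical.

```python
import numpy as np


def seird_rhs(state, beta, sigma, gamma, mu):
    """Time derivatives [dS, dE, dI, dR, dD] of the continuous SEIRD model.

    state = [S, E, I, R, D]; beta, sigma, gamma and mu are the infection,
    incubation, recovery and death rates (symbol names are our own).
    """
    S, E, I, R, D = state
    N = S + E + I + R + D              # closed population: no births or migration
    new_exposed = beta * S * I / N     # flow S -> E
    return np.array([
        -new_exposed,                  # dS/dt
        new_exposed - sigma * E,       # dE/dt: S -> E in, E -> I out
        sigma * E - (gamma + mu) * I,  # dI/dt: E -> I in, I -> R and I -> D out
        gamma * I,                     # dR/dt
        mu * I,                        # dD/dt
    ])


# Hypothetical state: 9.9 million susceptible, 1000 exposed, 500 infected.
x0 = np.array([9.9e6, 1000.0, 500.0, 0.0, 0.0])
dx = seird_rhs(x0, beta=0.5, sigma=0.2, gamma=0.1, mu=0.01)
print(dx, dx.sum())  # the five flows cancel, so the sum is (numerically) zero
```

Each flow leaves one compartment and enters another, so the derivatives sum to zero and $N$ stays constant. Setting `mu = 0` reduces the sketch to the SEIR model, with D staying at its initial value.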

Fig. 1

The SEIRD model.

Among the parameters of the SEIRD model, the incubation rate $\sigma$ is the inverse of the average latent time from being infected to showing symptoms. Because this average latent time is constant, the incubation rate is commonly treated as a constant [7,12]. In real-world situations, however, the infection rate $\beta$, recovery rate $\gamma$ and death rate $\mu$ are not constant. Contact with symptomatic infected people is the main factor affecting the infection rate. Other factors, such as government control measures at different stages of the epidemic and aerosol transmission, also affect it. The recovery and death rates depend on a country's medical and population health levels and on how the government handles and controls the COVID-19 epidemic. Because these parameters vary as COVID-19 spreads, the model must treat them as time-dependent. Discretising the SEIRD model in the time domain with a time step of one day gives the discrete system equations for COVID-19 spread:

$$S_{t+1} = S_t - \frac{\beta_t S_t I_t}{N} \tag{4}$$

$$E_{t+1} = E_t + \frac{\beta_t S_t I_t}{N} - \sigma E_t \tag{5}$$

$$I_{t+1} = I_t + \sigma E_t - \gamma_t I_t - \mu_t I_t \tag{6}$$

$$R_{t+1} = R_t + \gamma_t I_t \tag{7}$$

$$D_{t+1} = D_t + \mu_t I_t \tag{8}$$

where $\beta_t$, $\gamma_t$ and $\mu_t$ are the infection rate, recovery rate and death rate at time point $t$. The basic reproduction number $R_0$ is an important index in epidemiology and is calculated as

$$R_0 = \frac{\beta_t}{\gamma_t + \mu_t}.$$

If $R_0 < 1$, the infectious disease gradually declines over time. If $R_0 > 1$, the disease spreads exponentially and becomes an epidemic. The epidemic will not last forever, however, because the population that can be infected slowly shrinks: some people die from the disease and some develop immunity after recovering. If $R_0 = 1$, the disease becomes endemic in the population. The larger $R_0$ is, the harder the disease is to control.
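The discrete model (4)-(8) and the reproduction number can be sketched in a few lines of Python. This is a minimal illustration under several assumptions: a one-day time step, a fixed total population `N`, and our own symbol names (`beta`, `gamma` and `mu` are time-varying, `sigma` is constant). It uses $R_0 = \beta_t/(\gamma_t + \mu_t)$, the standard expression for an SEIRD structure in which every exposed person eventually becomes infectious. That is, $R_0$ is the infection rate multiplied by the mean infectious period $1/(\gamma_t + \mu_t)$.

```python
import numpy as np


def seird_step(x, theta, sigma, N):
    """Advance [S, E, I, R, D] by one day with rates theta = [beta_t, gamma_t, mu_t]."""
    S, E, I, R, D = x
    beta, gamma, mu = theta
    infections = beta * S * I / N            # new exposures on day t
    return np.array([
        S - infections,                      # S_{t+1}
        E + infections - sigma * E,          # E_{t+1}
        I + sigma * E - gamma * I - mu * I,  # I_{t+1}
        R + gamma * I,                       # R_{t+1}
        D + mu * I,                          # D_{t+1}
    ])


def reproduction_number(theta):
    """R0 = beta / (gamma + mu): infection rate times mean infectious period."""
    beta, gamma, mu = theta
    return beta / (gamma + mu)


# Hypothetical values, for illustration only.
x0 = np.array([1e6, 100.0, 50.0, 0.0, 0.0])
theta = [0.3, 0.1, 0.01]
x1 = seird_step(x0, theta, sigma=0.2, N=x0.sum())
print(x1, reproduction_number(theta))  # R0 = 0.3 / 0.11, about 2.73 (> 1: growth)
```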

Estimation algorithm

EKF

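Consider the COVID-19 transmission process as a dynamic system. The system state vector is defined as

$$\mathbf{x}_k = [S_k, E_k, I_k, R_k, D_k]^T.$$

The system state equation is

$$\mathbf{x}_{k+1} = f(\mathbf{x}_k, \boldsymbol{\theta}_k) + \mathbf{w}_{k+1},$$

where $\boldsymbol{\theta}_k = [\beta_k, \gamma_k, \mu_k]^T$ collects the time-dependent model parameters. $\mathbf{w}_{k+1}$ is the process noise at time $k+1$, assumed to be zero-mean Gaussian white noise with covariance $\mathbf{Q}_{k+1}$, i.e., $\mathbf{w}_{k+1} \sim N(\mathbf{0}, \mathbf{Q}_{k+1})$. $f(\cdot)$ is the nonlinear system function formed from (4), (5), (6), (7), (8). Linearising it with a first-order Taylor expansion about $\hat{\mathbf{x}}_k$ gives the Jacobian matrix

$$\mathbf{F}_k = \left.\frac{\partial f}{\partial \mathbf{x}}\right|_{\hat{\mathbf{x}}_k} = \begin{bmatrix} 1 - \frac{\beta_k I_k}{N} & 0 & -\frac{\beta_k S_k}{N} & 0 & 0 \\ \frac{\beta_k I_k}{N} & 1 - \sigma & \frac{\beta_k S_k}{N} & 0 & 0 \\ 0 & \sigma & 1 - \gamma_k - \mu_k & 0 & 0 \\ 0 & 0 & \gamma_k & 1 & 0 \\ 0 & 0 & \mu_k & 0 & 1 \end{bmatrix},$$

where $\sigma$ is the constant incubation rate, $N$ is the total population, and $S_k$ and $I_k$ are evaluated at $\hat{\mathbf{x}}_k$. The observation equation is

$$\mathbf{z}_k = h(\mathbf{x}_k) + \mathbf{v}_k,$$

where the observation function $h(\cdot)$ relates the state vector $\mathbf{x}_k$ to the measurement vector $\mathbf{z}_k$. $\mathbf{v}_k$ is the observation noise, assumed to be zero-mean Gaussian white noise with covariance $\mathbf{R}_k$, i.e., $\mathbf{v}_k \sim N(\mathbf{0}, \mathbf{R}_k)$. The classic EKF procedure consists of the following steps:

(i) Set the initial state estimate $\hat{\mathbf{x}}_0$ and error covariance $\mathbf{P}_0$.
(ii) State prediction: $\hat{\mathbf{x}}_{k+1|k} = f(\hat{\mathbf{x}}_k, \hat{\boldsymbol{\theta}}_k)$ and $\mathbf{P}_{k+1|k} = \mathbf{F}_k \mathbf{P}_k \mathbf{F}_k^T + \mathbf{Q}_{k+1}$.
(iii) Measurement update: $\mathbf{K}_{k+1} = \mathbf{P}_{k+1|k} \mathbf{H}_{k+1}^T (\mathbf{H}_{k+1} \mathbf{P}_{k+1|k} \mathbf{H}_{k+1}^T + \mathbf{R}_{k+1})^{-1}$, $\hat{\mathbf{x}}_{k+1} = \hat{\mathbf{x}}_{k+1|k} + \mathbf{K}_{k+1}(\mathbf{z}_{k+1} - h(\hat{\mathbf{x}}_{k+1|k}))$ and $\mathbf{P}_{k+1} = (\mathbf{I} - \mathbf{K}_{k+1} \mathbf{H}_{k+1}) \mathbf{P}_{k+1|k}$. Here $\mathbf{H}_{k+1}$ is the Jacobian $\partial h / \partial \mathbf{x}$ evaluated at $\hat{\mathbf{x}}_{k+1|k}$, and $\mathbf{I}$ is the identity matrix.
(iv) Repeat (ii) and (iii) for the next time step until all samples are processed.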
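Below is a minimal Python sketch of one EKF cycle for the SEIRD state. It assumes NumPy, a fixed total population `N` and given parameters `theta = [beta, gamma, mu]`. As a hypothetical choice not stated in the text, it also assumes a linear observation function that reports the active (I), recovered (R) and dead (D) counts, so that $h(\mathbf{x}) = \mathbf{H}\mathbf{x}$ with a constant selection matrix `H`. The names `ekf_step` and `Rn` (the observation-noise covariance, renamed to avoid clashing with the recovered compartment) are our own.

```python
import numpy as np


def seird_f(x, theta, sigma, N):
    """Discrete SEIRD transition f(x, theta) with a one-day step."""
    S, E, I, R, D = x
    beta, gamma, mu = theta
    inf = beta * S * I / N
    return np.array([S - inf, E + inf - sigma * E,
                     I + sigma * E - (gamma + mu) * I, R + gamma * I, D + mu * I])


def seird_jacobian(x, theta, sigma, N):
    """Jacobian F = df/dx evaluated at x (N held fixed)."""
    S, E, I, R, D = x
    beta, gamma, mu = theta
    return np.array([
        [1 - beta * I / N, 0.0, -beta * S / N, 0.0, 0.0],
        [beta * I / N, 1 - sigma, beta * S / N, 0.0, 0.0],
        [0.0, sigma, 1 - gamma - mu, 0.0, 0.0],
        [0.0, 0.0, gamma, 1.0, 0.0],
        [0.0, 0.0, mu, 0.0, 1.0],
    ])


# Hypothetical observation model: z = [I, R, D] (reported active, recovered, dead).
H = np.zeros((3, 5))
H[0, 2] = H[1, 3] = H[2, 4] = 1.0


def ekf_step(x, P, z, theta, sigma, N, Q, Rn):
    """One prediction + update cycle; also returns the innovation and its covariance."""
    # (ii) state prediction
    F = seird_jacobian(x, theta, sigma, N)
    x_pred = seird_f(x, theta, sigma, N)
    P_pred = F @ P @ F.T + Q
    # (iii) measurement update
    nu = z - H @ x_pred                        # innovation
    S_cov = H @ P_pred @ H.T + Rn              # innovation covariance
    K = np.linalg.solve(S_cov, H @ P_pred).T   # P_pred H^T S^-1 (both matrices symmetric)
    x_new = x_pred + K @ nu
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new, nu, S_cov


# Hypothetical day-0 state, covariances and day-1 report.
x0 = np.array([1e6, 200.0, 100.0, 10.0, 1.0])
N = x0.sum()
P0, Q, Rn = np.eye(5) * 100.0, np.eye(5) * 10.0, np.eye(3) * 50.0
z1 = np.array([120.0, 25.0, 2.0])
x1, P1, nu, S_cov = ekf_step(x0, P0, z1, [0.3, 0.1, 0.01], 0.2, N, Q, Rn)
```

The gain is computed with `np.linalg.solve` instead of an explicit matrix inverse, which is the usual numerically safer choice. The innovation `nu` and its covariance `S_cov` are returned because the parameter estimation step needs them.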

Model parameter estimation

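The infection rate $\beta$, death rate $\mu$ and recovery rate $\gamma$ are time-varying model parameters that must be estimated during the modelling process. Maximum likelihood estimation is a statistical method that estimates parameters by maximising the likelihood of the observed data. It coincides with the Bayesian maximum a posteriori estimate under a uniform prior [26,27]. It turns parameter estimation into the maximisation of a log-likelihood function. In this paper, we use the maximum likelihood method to estimate the model parameters $\boldsymbol{\theta} = [\beta, \gamma, \mu]^T$ from the available measurement data $Z_k = \{\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_k\}$, where $k$ is an integer. Define the innovation vector as

$$\boldsymbol{\nu}_j = \mathbf{z}_j - h(\hat{\mathbf{x}}_{j|j-1}).$$

For the nonlinear Gaussian system described by the EKF equations (21)-(24), the Gaussian approximation of the EKF gives

$$p(\mathbf{z}_j \mid Z_{j-1}, \boldsymbol{\theta}) = \frac{1}{(2\pi)^{m/2} |\mathbf{S}_j|^{1/2}} \exp\left(-\frac{1}{2} \boldsymbol{\nu}_j^T \mathbf{S}_j^{-1} \boldsymbol{\nu}_j\right),$$

where $\mathbf{S}_j = \mathbf{H}_j \mathbf{P}_{j|j-1} \mathbf{H}_j^T + \mathbf{R}_j$ is the innovation covariance and $m$ is the dimension of the measurement vector. Assuming the available measurements are independent of each other, the multiplication theorem of conditional probability gives

$$p(Z_k \mid \boldsymbol{\theta}) = \prod_{j=1}^{k} p(\mathbf{z}_j \mid Z_{j-1}, \boldsymbol{\theta}) = \prod_{j=1}^{k} \frac{1}{(2\pi)^{m/2} |\mathbf{S}_j|^{1/2}} \exp\left(-\frac{1}{2} \boldsymbol{\nu}_j^T \mathbf{S}_j^{-1} \boldsymbol{\nu}_j\right). \tag{29}$$

Taking the logarithm of (29) gives

$$\ln p(Z_k \mid \boldsymbol{\theta}) = -\frac{mk}{2} \ln(2\pi) - \frac{1}{2} \sum_{j=1}^{k} \left( \ln |\mathbf{S}_j| + \boldsymbol{\nu}_j^T \mathbf{S}_j^{-1} \boldsymbol{\nu}_j \right).$$

Ignoring the constant term, we define the cost function as

$$J(\boldsymbol{\theta}) = \sum_{j=1}^{k} \left( \ln |\mathbf{S}_j| + \boldsymbol{\nu}_j^T \mathbf{S}_j^{-1} \boldsymbol{\nu}_j \right),$$

so that the maximum likelihood estimate $\hat{\boldsymbol{\theta}}$ satisfies

$$\hat{\boldsymbol{\theta}} = \arg\min_{\boldsymbol{\theta}} J(\boldsymbol{\theta}). \tag{32}$$

Equation (32) shows that estimating $\boldsymbol{\theta}$ becomes a minimisation problem. Solving (32) is equivalent to solving

$$\frac{\partial J(\boldsymbol{\theta})}{\partial \theta_l} = 0, \quad l = 1, 2, 3,$$

where the index $l$ corresponds to $\beta$, $\gamma$ and $\mu$, respectively. Let $\mathbf{g}(\boldsymbol{\theta})$ denote the vector of partial derivatives:

$$\mathbf{g}(\boldsymbol{\theta}) = \left[ \frac{\partial J}{\partial \theta_1}, \frac{\partial J}{\partial \theta_2}, \frac{\partial J}{\partial \theta_3} \right]^T.$$

To solve $\mathbf{g}(\boldsymbol{\theta}) = \mathbf{0}$ for $\boldsymbol{\theta}$, let $\mathbf{G}(\boldsymbol{\theta})$ be the matrix of second-order partial derivatives, where

$$G_{lq}(\boldsymbol{\theta}) = \frac{\partial^2 J(\boldsymbol{\theta})}{\partial \theta_l \, \partial \theta_q}, \quad l, q = 1, 2, 3.$$

Assume that $\mathbf{G}$ is nonsingular, and let $\hat{\boldsymbol{\theta}}^{(i)}$ be the estimate of $\boldsymbol{\theta}$ at the $i$th iteration. Then $\mathbf{g}(\boldsymbol{\theta})$ can be approximated by the Taylor expansion

$$\mathbf{g}(\boldsymbol{\theta}) \approx \mathbf{g}(\hat{\boldsymbol{\theta}}^{(i)}) + \mathbf{G}(\hat{\boldsymbol{\theta}}^{(i)}) (\boldsymbol{\theta} - \hat{\boldsymbol{\theta}}^{(i)}). \tag{36}$$

Setting the right-hand side of (36) to zero and solving for $\boldsymbol{\theta}$ gives

$$\hat{\boldsymbol{\theta}}^{(i+1)} = \hat{\boldsymbol{\theta}}^{(i)} - \mathbf{G}^{-1}(\hat{\boldsymbol{\theta}}^{(i)}) \, \mathbf{g}(\hat{\boldsymbol{\theta}}^{(i)}). \tag{37}$$

Iterating (37) until convergence gives the maximum likelihood estimate $\hat{\boldsymbol{\theta}}$. This estimate is then fed back into the EKF filtering process to calculate the COVID-19 transmission state. Fig. 2 shows the framework of the maximum likelihood-based EKF.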
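The Newton iteration (37) can be sketched in Python as follows. This is a minimal illustration under our own assumptions. The gradient $\mathbf{g}$ and Hessian $\mathbf{G}$ are approximated by central finite differences, because the text does not say how they are computed. The filter is abstracted as a caller-supplied function `innovations(theta)` that returns the pairs $(\boldsymbol{\nu}_j, \mathbf{S}_j)$ from a run with parameters `theta`. The demonstration uses a toy Gaussian model with hypothetical data, not the COVID-19 filter. For this toy model the maximum likelihood estimate is known to be the sample mean, which makes the result easy to check.

```python
import numpy as np


def cost(theta, innovations):
    """J(theta) = sum_j ln|S_j| + nu_j^T S_j^-1 nu_j (negative log-likelihood, constants dropped)."""
    total = 0.0
    for nu, S in innovations(theta):
        _, logdet = np.linalg.slogdet(S)
        total += logdet + nu @ np.linalg.solve(S, nu)
    return total


def grad_hess(J, theta, eps=1e-4):
    """Central finite-difference gradient g and Hessian G of J at theta."""
    n = theta.size
    steps = np.eye(n) * eps
    g = np.zeros(n)
    G = np.zeros((n, n))
    for l in range(n):
        g[l] = (J(theta + steps[l]) - J(theta - steps[l])) / (2 * eps)
        for q in range(n):
            G[l, q] = (J(theta + steps[l] + steps[q]) - J(theta + steps[l] - steps[q])
                       - J(theta - steps[l] + steps[q]) + J(theta - steps[l] - steps[q])) / (4 * eps ** 2)
    return g, G


def newton_mle(innovations, theta0, tol=1e-8, max_iter=50):
    """Iterate theta <- theta - G^-1 g until the Newton step is negligible."""
    theta = np.asarray(theta0, dtype=float)

    def J(th):
        return cost(th, innovations)

    for _ in range(max_iter):
        g, G = grad_hess(J, theta)
        step = np.linalg.solve(G, g)   # assumes G is nonsingular, as in the text
        theta = theta - step
        if np.linalg.norm(step) < tol:
            break
    return theta


# Toy check with hypothetical data: z_j ~ N(theta, I), so nu_j = z_j - theta and S_j = I.
rng = np.random.default_rng(0)
Z = rng.normal([0.3, 0.1, 0.01], 0.05, size=(200, 3))


def toy_innovations(theta):
    return [(z - theta, np.eye(3)) for z in Z]


theta_hat = newton_mle(toy_innovations, [0.5, 0.5, 0.5])
print(theta_hat, Z.mean(axis=0))  # the two should agree closely
```

In the actual method, `innovations(theta)` would rerun the EKF over the measurement window with the candidate parameters. With finite differences, each Newton iteration then needs many filter runs, so analytic derivatives may be preferable in practice.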

Fig. 2

The proposed EKF algorithm based on model parameter estimation.

Performance evaluation

Simulation analyses of two real-world cases were conducted to evaluate the performance of the proposed EKF for COVID-19 modelling. The first is the COVID-19 pandemic in Wuhan, China, which was the first outbreak in the world. The second is the COVID-19 pandemic in the US, which had not yet peaked at the time of the study and had the largest number of confirmed cases. Because the true values for these cases are unknown, the reported data were used as the reference for calculating the estimation errors. The initial state, error covariance and model parameters were set from the observation data on the first day of each simulation analysis. The transmission states estimated by the EKF were also compared with numerical solutions of the discrete SEIRD model (4), (5), (6), (7), (8), whose parameters were identified by the offline constrained least-squares algorithm [15,34]. Estimation accuracy is measured by the mean error and the root mean square error (RMSE), defined as

$$\text{Mean error} = \frac{1}{n} \sum_{k=1}^{n} \left| \hat{x}_k - x_k \right|, \qquad \text{RMSE} = \sqrt{\frac{1}{n} \sum_{k=1}^{n} \left( \hat{x}_k - x_k \right)^2},$$

where $\hat{x}_k$ and $x_k$ are the estimated and reported values of a state variable on day $k$, and $n$ is the number of days.
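Here is a small Python helper for the two accuracy measures. It assumes that the mean error is the mean absolute daily deviation from the reported data: the text names the measure, but this formula is our assumption. The arrays in the example are hypothetical.

```python
import numpy as np


def mean_error(estimated, reported):
    """Mean absolute deviation between estimated and reported daily values (assumed definition)."""
    e = np.asarray(estimated, dtype=float) - np.asarray(reported, dtype=float)
    return float(np.mean(np.abs(e)))


def rmse(estimated, reported):
    """Root mean square error between estimated and reported daily values."""
    e = np.asarray(estimated, dtype=float) - np.asarray(reported, dtype=float)
    return float(np.sqrt(np.mean(e ** 2)))


est = [1.0, 2.0, 3.0]   # hypothetical daily estimates
rep = [1.0, 1.0, 1.0]   # hypothetical reported values
print(mean_error(est, rep), rmse(est, rep))  # 1.0 and sqrt(5/3), about 1.29
```

By the power-mean inequality, the RMSE is never smaller than the mean absolute error, which gives a quick sanity check on tabulated results.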

COVID-19 spread in Wuhan, China

COVID-19 first broke out in Wuhan, China. No special intervention measures against COVID-19 were taken before 23rd January 2020. On that day, the Chinese government locked down Wuhan, suspended all public transportation and services in the city, and imposed quarantine on households across the entire city [28]. The Chinese government also adopted a series of strict nationwide measures, including travel bans, suspension of schools and extension of the Spring Festival holiday, to control the spread of the virus. These strong controls curbed the spread of the virus and saved many lives at a huge economic cost. The number of daily new cases in Wuhan fell to zero on 18th March 2020, and by the end of April 2020 the epidemic in Wuhan had almost vanished. The simulation trials covered the COVID-19 spread during the Wuhan lockdown, from 23rd January (Day 0) to 15th April 2020 (Day 83). During this period, the model parameters (the infection, recovery and death rates) were mainly affected by government control measures. The constant incubation rate was set to 0.33 [29,30]. The period from 10th to 22nd January 2020 immediately preceded the Chinese Spring Festival, and large-scale population movements took place across the country. Before the lockdown on 23rd January 2020, about 5 million people moved out of Wuhan. This large-scale migration may have spread the epidemic widely in Wuhan and to other places in the country [31]. The total population used in the simulation was therefore 5 million less than the 2019 population of Wuhan. The daily reported data on COVID-19 spread in Wuhan [32] were used as the observation data for the simulation analysis. Fig. 4 shows the transmission state estimated by the proposed EKF. The estimated numbers of active, recovery and death cases are very close to the reported data.
As shown in Fig. 4(a), the reported number of active cases jumps by 13,436 on the 20th day of the lockdown, which is the largest single-day increase. This jump occurred because medical diagnosis became more reliable, so more symptomatic cases were classified as confirmed [33]. The EKF estimate captures this largest single-day increase. As shown in Figs. 3(a) and 4(a), on that day (13th February 2020) the estimated infection rate reaches its maximum and the estimated number of active cases also rises sharply. After the peak, the reported number of active cases drops sharply because of the lockdown. The estimated infection rate and the estimated active cases show similarly dramatic drops.

Fig. 4

The transmission states estimated by the proposed EKF and calculated from the discrete SEIRD model based on parameter identification via the constrained least-squares algorithm for the COVID-19 pandemic in Wuhan: (a) the number of active cases; (b) the number of recovery cases; and (c) the number of death cases.

Fig. 3

The model parameters estimated by the proposed EKF for the COVID-19 pandemic in Wuhan: (a) the infection rate; (b) the recovery rate; and (c) the death rate.

Figs. 3(b) and 4(b) show that the estimated recovery rate and the estimated recovery cases follow a similar trend, and the estimated recovery cases closely track the reported ones. Both the estimated recovery rate and the recovery cases are small in the early stage of the pandemic, because effective medical treatment is always lacking at the start of a new epidemic. After the 30th day of the lockdown, however, medical treatment improves, and the estimated recovery rate and the estimated and reported recovery cases all rise sharply before eventually levelling off. As shown in Fig. 3(c), the estimated death rate follows the opposite trend to the estimated recovery rate. It is highest in the early stage and then falls sharply as medical treatment improves. After the 30th day of the lockdown, it varies only slightly and eventually levels off. Correspondingly, Fig. 4(c) shows that the estimated and reported numbers of death cases rise sharply before the 30th day of the lockdown, then increase gradually and eventually level off. On the 30th day of the lockdown, as infected people either recover or die, the infection rate falls to its lowest value within the first 30 days of the lockdown, and both the estimated and reported active cases begin to decline.
This indicates that the government control measures had started to take effect. After the 70th day of the lockdown, the estimated number of active cases gradually approaches zero, while the estimated numbers of death and recovery cases gradually level off. This indicates that the COVID-19 epidemic in Wuhan is almost over. The transmission state of the COVID-19 spread in Wuhan was also calculated from the discrete SEIRD model (4)-(8), with parameters identified by the offline constrained least-squares algorithm [15,34], and compared with the EKF estimate. As shown in Fig. 4, the transmission state estimated by the EKF is closer to the reported data than the numerical solution. Table 1 lists the statistical errors of the EKF estimate and the numerical solution. For the COVID-19 spread in Wuhan, the EKF estimate has a lower mean error and a lower RMSE than the numerical solution for all three state variables.

Table 1

Statistical estimation errors of the proposed EKF and numerical solution for the COVID-19 pandemic in Wuhan.

State variable	Mean error (EKF)	Mean error (numerical solution)	RMSE (EKF)	RMSE (numerical solution)
I	581.25	1062	242.43	1670
R	561.76	1489	813.97	2512
D	23.79	31.88	41.24	44.37

The performance of the proposed EKF was also evaluated in terms of the basic reproduction number $R_0$. Fig. 5 shows the trend of $R_0$ for the COVID-19 spread in Wuhan, China during the lockdown. The estimated $R_0$ rises from an initial value of 0.75 to its peak on the 8th day of the lockdown. After the peak, it drops rapidly and falls below 1 on the 30th day of the lockdown, indicating that the epidemic has started to diminish. Fig. 4(a) reflects the same trend: both the estimated and reported numbers of active cases fall rapidly after the 30th day of the lockdown. The estimated $R_0$ keeps falling until the 55th day of the lockdown and then stays close to zero. This indicates that the government control measures contained the COVID-19 epidemic and that it is almost over. Notably, the 55th day of the lockdown, when $R_0$ first drops to zero, is also the first day on which Wuhan reported zero daily new infected cases.
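The reading of the $R_0$ curve described above (find its peak, then the first day it falls below 1) can be made reproducible with a small helper. This Python sketch assumes daily arrays of estimated rates and uses $R_0 = \beta/(\gamma + \mu)$ for the SEIRD structure. The function names and the toy series are hypothetical and are not the paper's data.

```python
import numpy as np


def r0_series(beta, gamma, mu):
    """Daily R0 = beta_t / (gamma_t + mu_t) from arrays of estimated rates."""
    beta, gamma, mu = (np.asarray(a, dtype=float) for a in (beta, gamma, mu))
    return beta / (gamma + mu)


def first_day_below_after_peak(r0, threshold=1.0):
    """Day index at which R0 first drops below threshold after its peak, or None."""
    r0 = np.asarray(r0, dtype=float)
    peak = int(np.argmax(r0))
    below = np.flatnonzero(r0[peak:] < threshold)
    return peak + int(below[0]) if below.size else None


# Toy series: starts below 1 (like an initial 0.75), peaks, then declines.
toy_r0 = [0.75, 1.5, 3.0, 2.0, 1.2, 0.9, 0.5]
print(first_day_below_after_peak(toy_r0))  # 5, not 0: the search starts at the peak
```

Starting the search at the peak matters in a case like Wuhan's, where the estimate starts below 1 before rising.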

Fig. 5

The basic reproduction number estimated by the proposed EKF for the COVID-19 pandemic in Wuhan.

These results demonstrate that the proposed EKF can effectively predict the evolving trend of COVID-19 spread. They also indicate that the lockdown measures the Chinese government implemented after the outbreak effectively controlled the spread of the epidemic. In other words, the government's early interventions played a decisive role in controlling the COVID-19 epidemic.

COVID-19 spread in the US

The COVID-19 spread in the US started in New York and California and then became severe in New York. On 10th March 2020, the US had only about 700 confirmed cases. By 26th March 2020, confirmed cases had risen rapidly to over 80,000, and deaths had risen from 30 to about 1,600. This shows that the epidemic was already in a state of community spread in the US. As of 24th October 2020, the virus had infected more than 8 million people and caused more than 223,000 deaths in the US [4]. The simulation trials covered the COVID-19 spread in the US from 15th February to 30th July 2020, based on the daily reported data [4]. The constant incubation rate was set to 0.2 [35]. At the time of the study, the number of confirmed cases in the US was still increasing with no sign of a peak, and the future trend of the epidemic was unclear. Fig. 6 shows the estimated model parameters. Fig. 7 shows the estimated transmission state, and Table 2 lists the associated estimation errors.

Fig. 6

The model parameters estimated by the proposed EKF for the COVID-19 pandemic in US: (a) the infection rate; (b) the recovery rate; and (c) the death rate.

Table 2

Statistical estimation errors of the proposed EKF and numerical solution for the COVID-19 pandemic in US.

State variable	Mean error (EKF)	Mean error (numerical solution)	RMSE (EKF)	RMSE (numerical solution)
I	33,200	41,857	38,010	45,273
R	13,715	32,110	20,907	36,407
D	1014	4451	1245	7110

Fig. 6 shows that the estimated infection, recovery and death rates fluctuate greatly in the early stage of the epidemic. This is because US testing was unreliable in the early stage. Not every suspected case with symptoms was tested; only people who had travelled abroad or had contact with people returning to the US were tested. On 13th March 2020, the US government declared a state of emergency in response to COVID-19. It then began to implement control measures such as city lockdowns and expanded testing, which also led to a sharp rise in reported active cases. As shown in Figs. 6(a) and 7(a), the estimated infection rate begins to increase from the 40th day of the simulation analysis. From the same day, the estimated number of active cases, which closely follows the reported number, also begins to increase, showing that the epidemic entered a period of outbreak growth.

Fig. 7

The transmission states estimated by the proposed EKF and calculated from the discrete SEIRD model based on constrained least-squares parameter identification for the COVID-19 pandemic in US: (a) the number of active cases; (b) the number of recovery cases; and (c) the number of death cases.

As shown in Fig. 7(a), around the 100th day of the simulation analysis, the estimated and reported numbers of active cases vary only slightly and even begin to decline. This indicates that the government control measures had started to take effect, although active cases were still increasing. However, the US government began to relax its control measures on 1st May 2020 to support economic recovery. As a result, the reported number of active cases began to rise sharply after 10th June 2020 (i.e., the 110th day of the simulation analysis). As shown in Fig. 7(a), both the reported and estimated numbers of active cases begin to increase after the 110th day. The estimated infection rate reflects the same trend: as shown in Fig. 6(a), it varies only slightly around the 100th day and begins to increase after the 110th day. As shown in Fig. 6(b), the estimated recovery rate begins to increase after the 60th day of the simulation analysis because medical treatment improved. The estimated and reported numbers of recovery cases reflect the same trend: Fig. 7(b) shows that both begin to increase after the 60th day. As shown in Fig. 6(c), the estimated death rate is highest in the early stage and decreases after the 40th day as medical treatment improves. Figs. 7(a) and 7(c) show that the numbers of active and death cases are still increasing and have not peaked, so the future development of the COVID-19 epidemic in the US remains uncertain.
For the COVID-19 spread in the US, the transmission state estimated by the EKF was likewise compared with that calculated from the discrete SEIRD model (4)–(8), with parameters identified by the offline constrained least-squares algorithm [15,34]. As shown in Fig. 7, the transmission state estimated by the EKF is closer to the reported data than the numerical solution. Table 2 lists the statistical errors of the EKF estimate and the numerical solution. For the US, the EKF estimate has a lower mean error and a lower RMSE than the numerical solution for all three state variables. These results indicate that, without an effective vaccine, the number of confirmed cases will keep increasing unless the US government adopts stricter measures to control the spread of the epidemic. Fig. 8 shows the basic reproduction number $R_0$ estimated by the proposed EKF for the COVID-19 pandemic in the US. The estimated $R_0$ peaks at about the 40th day of the simulation analysis. Fig. 7(a) reflects this: both the estimated and reported numbers of active cases begin to increase rapidly on that day. After the peak, the estimated $R_0$ falls sharply and drops below 1 on the 80th day of the simulation analysis, implying that the epidemic has started to diminish. After the 80th day, the estimated $R_0$ stays close to 1. This indicates that the spread of COVID-19 has been contained to some extent, but its course remains uncertain and the disease may become endemic. The government therefore needs to strengthen its control measures to curb the spread of COVID-19.

Fig. 8

The basic reproduction number estimated by the proposed EKF for the COVID-19 pandemic in the US.

Based on the control measures and reported data from 15th February to 30th July 2020 in the US, simulation trials were also conducted to predict the COVID-19 spread over the 30 days following 30th July 2020. As shown in Fig. 9, the active, recovery and death cases predicted by the proposed EKF are very close to the reported cases. Fig. 7(a) also shows that, although the active cases were still increasing without any sign of reaching a peak by the last simulation day (30th July 2020), their growth was slowing down.
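A 30-day prediction of this kind can be sketched as follows. When no new reports are assimilated, the EKF mean prediction reduces to applying the state-transition model repeatedly, so the sketch propagates an assumed forward-Euler SEIRD model from the last filtered state. Several details are assumptions rather than statements from the paper: the rates are frozen at their final estimates, the state is ordered (S, E, I, R, D), the rate symbols are hypothetical, and mean absolute percentage error (MAPE) is used as the accuracy measure because this section does not name the error statistics in Table 2.

```python
from typing import List, Sequence, Tuple

State = Tuple[float, float, float, float, float]  # (S, E, I, R, D)


def seird_step(x: State, beta: float, sigma: float, gamma: float, delta: float) -> State:
    """One-day forward-Euler step of an assumed SEIRD model."""
    s, e, i, r, d = x
    n = s + e + i + r + d                      # total population (conserved)
    new_exposed = beta * s * i / n
    return (s - new_exposed,
            e + new_exposed - sigma * e,
            i + sigma * e - (gamma + delta) * i,
            r + gamma * i,
            d + delta * i)


def forecast(x_last: State, rates: Tuple[float, float, float, float],
             horizon: int = 30) -> List[State]:
    """Open-loop forecast with the rates frozen at their last estimates."""
    trajectory = []
    x = x_last
    for _ in range(horizon):
        x = seird_step(x, *rates)
        trajectory.append(x)
    return trajectory


def mape(predicted: Sequence[float], reported: Sequence[float]) -> float:
    """Mean absolute percentage error, skipping days with zero reported cases."""
    pairs = [(p, r) for p, r in zip(predicted, reported) if r != 0]
    if not pairs:
        raise ValueError("no non-zero reported values to compare against")
    return 100.0 * sum(abs(p - r) / abs(r) for p, r in pairs) / len(pairs)
```

For example, `active = [x[2] for x in forecast(x_last, rates)]` gives the predicted active cases, and `mape(active, reported_active)` compares them with the reported series over the same 30 days; `x_last`, `rates` and `reported_active` are placeholders for the filtered state, the final rate estimates and the reported data.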

Fig. 9

The 30-day prediction for the COVID-19 pandemic in the US: (a) the number of active cases; (b) the number of recovery cases; and (c) the number of death cases.

Conclusions

This paper presents a novel method for predicting and analysing the dynamic spread of COVID-19. It converts the epidemiological modelling of COVID-19 into a filtering-based identification problem to estimate the dynamic behaviour of COVID-19 spread. The nonlinear state-space equation is established by discretising the SEIRD epidemiological model in the time domain. A maximum likelihood estimation theory is established to estimate the time-varying model parameters. Building on this, an EKF is developed for online prediction and analysis of dynamic COVID-19 behaviour based on the estimated model parameters. Simulation analyses demonstrate that the proposed method can track and predict the evolving trend of COVID-19 spread based on daily reported cases. The results are consistent with the success in controlling the COVID-19 epidemic in Wuhan, which was attributed to early public health interventions, and with the difficulties experienced in the initial stage of the outbreak in the US, which highlight the need for effective control measures. Future research will focus on improving the proposed method by accounting for the errors involved in epidemiological modelling, and adaptive filtering algorithms are expected to be developed to enable the method to accommodate such modelling errors.
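The conclusions refer to estimating time-varying parameters by maximum likelihood without restating the estimator. As a much simpler illustration, and explicitly not the authors' formulation: if the daily increase in cumulative recoveries is modelled as gamma times the active cases plus i.i.d. zero-mean Gaussian noise, with gamma held constant within a short sliding window, then maximising the likelihood within each window reduces to least squares, gamma = sum(dR * I) / sum(I^2). Sliding the window makes the estimate time-dependent, in contrast to a single offline fit over the whole series. The function name, window length and noise model are hypothetical; the same function applies to the death rate if cumulative deaths are passed instead.

```python
from typing import List, Sequence


def windowed_rate_mle(active: Sequence[float], cumulative: Sequence[float],
                      window: int = 7) -> List[float]:
    """Sliding-window ML estimate of an outflow rate (e.g. the recovery rate).

    Assumed model: cumulative[k+1] - cumulative[k] = rate * active[k] + w_k,
    with w_k i.i.d. zero-mean Gaussian and the rate constant inside each window.
    Under that model the maximum likelihood estimate is the least-squares one:
        rate = sum(dC_k * I_k) / sum(I_k ** 2)
    Element j of the result uses increments j .. j + window - 1, so `active`
    must be at least as long as the list of increments.
    """
    increments = [c1 - c0 for c0, c1 in zip(cumulative, cumulative[1:])]
    estimates = []
    for end in range(window, len(increments) + 1):
        days = range(end - window, end)
        num = sum(increments[k] * active[k] for k in days)
        den = sum(active[k] ** 2 for k in days)
        estimates.append(num / den if den > 0 else float("nan"))
    return estimates
```

A shorter window follows changes in the rate faster but is noisier; that trade-off is one reason a filter-based estimator, such as the one the paper proposes, can be preferable to fixed windows.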

CRediT authorship contribution statement

Jialu Song: Conceptualization, Methodology, Software, Writing - original draft. Hujin Xie: Conceptualization, Methodology, Writing - original draft. Bingbing Gao: Methodology, Writing - review & editing. Yongmin Zhong: Investigation, Methodology, Writing - review & editing. Chengfan Gu: Writing - review & editing. Kup-Sze Choi: Writing - review & editing.

Declaration of Competing Interest

The authors declare no conflict of interest for this paper.

20 in total

1. Data driven computing by the morphing fast Fourier transform ensemble Kalman filter in epidemic spread simulations.

Authors: Jan Mandel; Jonathan D Beezley; Loren Cobb; Ashok Krishnamurthy
Journal: Procedia Comput Sci Date: 2010-05-01

2. Dynamics identification and forecasting of COVID-19 by switching Kalman filters.

Authors: Xiaoshu Zeng; Roger Ghanem
Journal: Comput Mech Date: 2020-08-29

3. SEIR modeling of the COVID-19 and its dynamics.

Authors: Shaobo He; Yuexi Peng; Kehui Sun
Journal: Nonlinear Dyn Date: 2020-06-18

4. Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study.

Authors: Joseph T Wu; Kathy Leung; Gabriel M Leung
Journal: Lancet Date: 2020-01-31

5. Short-term forecasts of the COVID-19 pandemic: a study case of Cameroon.

Authors: C Hameni Nkwayep; S Bowong; J J Tewa; J Kurths
Journal: Chaos Solitons Fractals Date: 2020-07-11

6. The effect of control strategies to reduce social mixing on outcomes of the COVID-19 epidemic in Wuhan, China: a modelling study.

Authors: Kiesha Prem; Yang Liu; Timothy W Russell; Adam J Kucharski; Rosalind M Eggo; Nicholas Davies; Mark Jit; Petra Klepac
Journal: Lancet Public Health Date: 2020-03-25

7. Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions.

Authors: Zifeng Yang; Zhiqi Zeng; Ke Wang; Sook-San Wong; Wenhua Liang; Mark Zanin; Peng Liu; Xudong Cao; Zhongqiang Gao; Zhitong Mai; Jingyi Liang; Xiaoqing Liu; Shiyue Li; Yimin Li; Feng Ye; Weijie Guan; Yifan Yang; Fei Li; Shengmei Luo; Yuqi Xie; Bin Liu; Zhoulang Wang; Shaobo Zhang; Yaonan Wang; Nanshan Zhong; Jianxing He
Journal: J Thorac Dis Date: 2020-03