Predicting and analyzing the COVID-19 epidemic in China: Based on SEIRD, LSTM and GWR models.

Fenglin Liu¹, Jie Wang¹, Jiawen Liu¹, Yue Li¹, Dagong Liu¹, Junliang Tong¹, Zhuoqun Li¹, Dan Yu¹, Yifan Fan¹, Xiaohui Bi¹, Xueting Zhang¹, Steven Mo¹.

Abstract

In December 2019, the novel coronavirus pneumonia (COVID-19) emerged in Wuhan, Hubei Province, China. The epidemic quickly spread throughout the country and has since become a pandemic affecting the whole world. In this study, three models were used to fit and predict the epidemic situation in China: a modified SEIRD (Susceptible-Exposed-Infected-Recovered-Dead) dynamic model, an LSTM (Long Short-Term Memory) neural network, and a GWR (Geographically Weighted Regression) model reflecting spatial heterogeneity. Overall, all three models predicted accurately. The APE (absolute percent error) of the dynamic SEIRD prediction for China had been ≤ 1.0% since mid-February. The LSTM model showed comparable accuracy. The GWR model took into account the influence of geographical differences, with R² = 99.98% in fitting and 97.95% in prediction. A Wilcoxon test showed that none of the three models outperformed the other two at the significance level of 0.05. The parametric analysis of the infectious rate and recovery rate demonstrated that China's national policies had effectively slowed down the spread of the epidemic. Furthermore, the models in this study have implications for other countries that need to predict the short-term and long-term trend of COVID-19 and to evaluate the intensity and effect of their interventions.

Year: 2020 PMID: 32853285 PMCID: PMC7451659 DOI: 10.1371/journal.pone.0238280

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

Novel coronavirus pneumonia (coronavirus disease 2019, COVID-19) first broke out in Wuhan, Hubei Province, China in December 2019, and the epidemic then became prevalent in the rest of the world. By comparing the gene sequence of the virus with those of mammalian coronaviruses, some studies found that its source may be related to bats, snakes, minks, Malayan pangolins, turtles and other wild animals [1-4]. COVID-19 can cause severe respiratory illness with symptoms such as fever and cough [5], and there is a possibility of transmission after symptoms of lower respiratory disease appear [6]. Unlike SARS-CoV and MERS-CoV, the virus was isolated from airway epithelial cells of patients [6], and its mechanism of receptor recognition is not consistent with that of SARS [7]. The pathogenicity of COVID-19 is lower than that of SARS [8], while its transmissibility is higher [9]. In addition, this new coronavirus presents human-to-human transmission [10], and close contact can lead to cluster outbreaks [11]. As of July 7th, 2020, 85,359 confirmed cases and 4,648 deaths had been reported in China [12]. Worldwide, over 200 countries and regions had reported a total of 11,630,898 confirmed cases and 538,512 deaths [12]. The outbreak of COVID-19 happened right before the Lunar New Year, during the peak travel period of the Chinese Spring Festival. With a population of over 11 million, Wuhan is one of the major transportation hubs in China as well as a core city of the Yangtze River Economic Belt. The time and location of the outbreak further led to the rapid spread of the epidemic in China [13]. Since there is still no vaccine or antiviral drug specifically for COVID-19, the government's policies and actions play an important role in flattening the epidemic curve [14].
From the perspective of public health, the interventions of the Wuhan government achieved the purpose of reducing the flow of people and the risk of exposure to diagnosed patients, and effectively slowed down the spread of the epidemic [15]. Nevertheless, COVID-19 can be transmitted by asymptomatic carriers [16], and some recovered patients may still be virus carriers [17]. In order to implement non-pharmaceutical interventions more effectively, we used a combination of epidemiological methods and mathematical or statistical modeling tools to provide insights and predictions as benchmarks.

For infectious diseases like COVID-19, SARS, and Ebola, most of the literature used descriptive research or modeling methods to assess indicators and analyze the effect of interventions. Examples include combining migration data to evaluate the potential infection rate [18, 19], examining the impact of factors such as environmental temperature and vaccines that might be linked to the diseases [20, 21], using the basic and time-varying reproduction numbers (R0 and Rt) to estimate the changing transmission dynamics of an epidemic [22-27], calculating and predicting the fatality risk at any stage of an outbreak [28-30], and providing suggestions and interventions from risk management and other related aspects based on the results of modeling tools or historical lessons [31-39].

Some studies used only one kind of model to simulate and predict the course of diseases. For instance, they used relatively common epidemiological dynamics models such as SEIR or SIRD to forecast epidemic trends and peaks in certain provinces or even the world [9, 40–44], applied other types of statistical models such as logistic growth models or time series approaches to analyze the epidemic situation [45, 46], or developed new models to support more complex epidemic trajectories or to predict the number of confirmed cases and the spatial progression of outbreaks [47-49]. Several studies extended the basic epidemic dynamic models. Examples include joining a border protection mechanism with the SEIR model to better identify high-risk groups and infected cases [50], adding the effect of media or awareness to basic models to assess whether these outside influences could change the transmission mode of infectious diseases [51, 52], and using a multiplex network model or transmission network topology, based on the transmission routes contained in dynamic models, to analyze the outbreak scale and epidemic spread more accurately [53, 54]. A small number of studies combined the analysis capabilities of two types of models, such as the SEIR model and a recurrent neural network (RNN) model, to determine whether certain interventions could affect the results of outbreak control [55].

However, our literature search did not find any COVID-19 study that used geographically weighted regression (GWR). There is also a lack of understanding of how well different algorithms predict the epidemic curve relative to one another. In this study, SEIRD, an extension of the SEIR model, was used to simulate the epidemic situation in China and to predict the number of confirmed and cured cases in each province and several major Chinese cities. An LSTM model combined with traffic data and a GWR model were used to predict the number of confirmed patients. Specifically, the GWR model, which captures geographical differences, was used to predict the development of the epidemic and analyze the impact of geographical factors. This paper also compares the characteristics and prediction ability of these models. In the absence of vaccines and drugs for COVID-19, it makes sense to use multiple models to simulate the epidemic and to show the intensity of non-pharmaceutical interventions needed to guide the control of outbreaks.

Materials and methods

Data sources

Daily updated COVID-19 epidemiological data used in this study were retrieved from the National Health Commission of China [12] and accessed via https://github.com/wybert/open-wuhan-ncov-illness-data. The daily number of outbound travelers from Wuhan city and the relevant migration indices from January to March were collected from the online platform Baidu Qianxi [56]. The demographic data and medical resources data were taken from the China Urban Statistical Yearbook published by the National Bureau of Statistics, as shown in S1 Table.

Modified SEIRD model

This study used an SEIRD model. The changes in the status of the susceptible (S), exposed (E), infected (I), recovered (R) and dead (D) populations within the total population (N) are shown in Fig 1.

Fig 1

The changes of different status in the modified SEIRD model in this study.

According to the medical characteristics and clinical trials of COVID-19, both confirmed patients and asymptomatic carriers can transmit the virus. Therefore, susceptible people have a certain chance of becoming infected after they come into contact with exposed or infected individuals [43]. Carriers in the exposed status may develop obvious symptoms after the incubation period and be diagnosed, or they may recover. The final status of individuals falls into two categories: recovery, through the combined effects of hospital treatment and the patient's own immunity, and death without effective treatment. In the model formulas, the infectious rate β needs to be adjusted in real time to follow the trend of disease development. In the middle and late stages of the epidemic, the number of daily new cases decreased significantly due to the positive influence of government policies. Thus, to better fit the model, we added an attenuation factor desc to β. Based on the basic SEIRD model formulas [57, 58], our modified model is given in Eqs (1–6). Here, the parameter t denotes the time; β is the infectious rate; α is the rate at which the exposed become infected; γ₁ is the recovery rate for the exposed; γ₂ is the recovery rate for the infected; k is the mortality rate; “desc” is the attenuation factor for β, so that β decays exponentially when 0
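Eqs (1–6) are not reproduced in this text, so the following Python sketch only illustrates the compartment flow described above and should not be read as the authors' exact formulation. It assumes a daily Euler step, transmission from both exposed and infected individuals, and a multiplicative decay of the form β(t) = β0 · desc^t. The function name simulate_seird, the parameter names gamma_e and gamma_i (recovery rates of the exposed and the infected), and all numeric values are hypothetical.

```python
def simulate_seird(days, n, e0, i0, beta0, desc, alpha, gamma_e, gamma_i, k):
    """Discrete-time SEIRD sketch with a decaying infectious rate.

    Assumed (not taken from the paper's equations): daily Euler steps,
    contacts with both E and I can infect S, and beta(t) = beta0 * desc**t.
    Returns a list of (S, E, I, R, D) tuples, one per day.
    """
    s, e, i, r, d = n - e0 - i0, float(e0), float(i0), 0.0, 0.0
    history = [(s, e, i, r, d)]
    for t in range(days):
        beta_t = beta0 * desc ** t            # attenuated infectious rate
        new_exposed = beta_t * s * (e + i) / n
        new_infected = alpha * e              # exposed who become infected
        recovered_e = gamma_e * e             # exposed who recover directly
        recovered_i = gamma_i * i             # infected who recover
        deaths = k * i                        # infected who die
        s -= new_exposed
        e += new_exposed - new_infected - recovered_e
        i += new_infected - recovered_i - deaths
        r += recovered_e + recovered_i
        d += deaths
        history.append((s, e, i, r, d))
    return history


# Hypothetical parameters, chosen only to show the effect of desc.
with_decay = simulate_seird(120, 1e6, 10, 10, 0.4, 0.95, 0.2, 0.05, 0.1, 0.01)
no_decay = simulate_seird(120, 1e6, 10, 10, 0.4, 1.0, 0.2, 0.05, 0.1, 0.01)
print("Susceptible left with decay:", round(with_decay[-1][0]))
print("Susceptible left without decay:", round(no_decay[-1][0]))
```

Under these assumptions, a value of desc below 1 shrinks β every day, so far more people remain susceptible at the end of the run than when β is held constant.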

LSTM model

The LSTM (Long Short-Term Memory) architecture for recurrent neural networks was first proposed in 1997 [59]. An LSTM block is illustrated in Fig 2. It features three gates (input, forget, and output), a block input and an output. The output of the block is recurrently connected to the input of the block.

Fig 2

The structure of an LSTM block in this study.

The vector formulas for an LSTM layer forward pass are given in Eqs (7–12). Here, z, i, f, c, o and h denote the block input, input gate, forget gate, cell state, output gate and block output, respectively; x represents the input vector at time t; ⨀ is the point-wise multiplication operator of two vectors; W_z, W_i, W_f and W_o are the input weight matrices; and b_z, b_i, b_f and b_o are the bias vectors. The logistic sigmoid is used as the activation function of the gates, and ReLU is used as the activation function of the block input and output.
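Because Eqs (7–12) are not reproduced in this text, the following pure-Python sketch shows one forward step of a generic LSTM block using the activations named above (sigmoid gates, ReLU block input and output). It is an illustration under assumptions rather than the authors' implementation: the recurrent weight matrices R, the dictionary layout keyed by "z", "i", "f" and "o", and the omission of peephole connections are choices made here for brevity.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def relu(a):
    return max(a, 0.0)


def affine(w, x, r, h, b):
    """Return W.x + R.h + b for list-based matrices and vectors."""
    return [
        sum(wj * xj for wj, xj in zip(w_row, x))
        + sum(rj * hj for rj, hj in zip(r_row, h))
        + b_j
        for w_row, r_row, b_j in zip(w, r, b)
    ]


def lstm_step(x_t, h_prev, c_prev, W, R, b):
    """One forward step of an LSTM block (assumed form, no peepholes)."""
    z = [relu(a) for a in affine(W["z"], x_t, R["z"], h_prev, b["z"])]     # block input
    i = [sigmoid(a) for a in affine(W["i"], x_t, R["i"], h_prev, b["i"])]  # input gate
    f = [sigmoid(a) for a in affine(W["f"], x_t, R["f"], h_prev, b["f"])]  # forget gate
    c = [zj * ij + cj * fj for zj, ij, cj, fj in zip(z, i, c_prev, f)]     # cell state
    o = [sigmoid(a) for a in affine(W["o"], x_t, R["o"], h_prev, b["o"])]  # output gate
    h = [relu(cj) * oj for cj, oj in zip(c, o)]                            # block output
    return h, c


# Tiny example: 1 input feature, 2 hidden units, all weights set to zero.
gates = ("z", "i", "f", "o")
W = {g: [[0.0], [0.0]] for g in gates}
R = {g: [[0.0, 0.0], [0.0, 0.0]] for g in gates}
b = {g: [0.0, 0.0] for g in gates}
h, c = lstm_step([1.0], [0.0, 0.0], [2.0, -2.0], W, R, b)
print(h, c)
```

With all weights at zero, every gate evaluates to 0.5, so the example only shows how the forget gate halves the previous cell state and how ReLU clips a negative cell value in the block output.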

GWR model

Epidemic situations and medical resources in different geographic locations may influence the development of the epidemic to different extents, so ordinary least squares regression may not be applicable in this case. The geographically weighted regression (GWR) model was proposed in 1996 [60]. It extends the ordinary linear regression model by embedding the geographic location data into the regression parameters, as shown below:

$$ y_i = \beta_{i0} + \sum_{k=1}^{p} \beta_{ik} x_{ik} + \varepsilon_i $$

where y_i is the ith dependent variable, x_ik is the kth independent variable in location i, p is the total number of independent variables, β_i0 is the intercept parameter in location i, β_ik is the regression coefficient for the kth independent variable in location i, which varies with the geographical location, and ε_i is the error term in location i. The spatial weight matrix in this study uses the bi-square kernel function shown below: if d
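The kernel formula is cut off in this text, so the sketch below assumes the commonly used bi-square form, w = (1 - (d/b)²)² for d < b and 0 otherwise, where d is the distance to the regression location and b is the bandwidth. It then fits a local weighted least squares line with a single predictor to show how the weights enter the regression. The function names, the one-predictor simplification and the sample numbers are hypothetical; the study itself used several independent variables.

```python
def bisquare_weight(d, bandwidth):
    """Assumed standard bi-square kernel: zero weight at or beyond the bandwidth."""
    if d >= bandwidth:
        return 0.0
    return (1.0 - (d / bandwidth) ** 2) ** 2


def local_fit(x, y, dists, bandwidth):
    """Weighted least squares for y = b0 + b1 * x at one regression location.

    dists[j] is the distance from observation j to the regression location.
    Returns (intercept, slope) for that location.
    """
    w = [bisquare_weight(d, bandwidth) for d in dists]
    sw = sum(w)
    x_bar = sum(wj * xj for wj, xj in zip(w, x)) / sw
    y_bar = sum(wj * yj for wj, yj in zip(w, y)) / sw
    sxx = sum(wj * (xj - x_bar) ** 2 for wj, xj in zip(w, x))
    sxy = sum(wj * (xj - x_bar) * (yj - y_bar) for wj, xj, yj in zip(w, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical observations; the last one lies outside the bandwidth and is ignored.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
dists = [0.0, 2.0, 4.0, 20.0]
print(local_fit(x, y, dists, bandwidth=10.0))
```

Repeating the local fit at every location with its own distance vector yields one coefficient set per location, which is how GWR produces spatially varying coefficients.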

Results

SEIRD model

In this study, we used the modified SEIRD model to predict the next-day number of cumulative confirmed cases for all provinces, province-level municipalities and autonomous regions in China as well as Wuhan City. The parameters of our dynamic SEIRD model were adjusted daily based on the daily updated epidemic data. The comparison of the actual data on February 14th and February 25th with the forecast results of our model is shown in Table 1. The percent error was calculated using the formula: (predicted number - actual number) / actual number × 100%. On February 14th, the absolute percent errors of all provinces were ≤ 5.1%. The percent errors for Wuhan City, Hubei Province and China were -3.00%, -1.60% and -1.00%, respectively. On February 25th, the absolute percent error of the prediction of cumulative confirmed cases in China was < 0.10%. The absolute percent errors of most provinces were < 0.10%; the absolute percent error was 0.12% for Wuhan City and 0.03% for Hubei Province. Regarding the number of recovered cases, Wuhan City and Hubei Province had percent errors of -6.03% and -3.12%, respectively. The overall prediction of recovered cases for the whole country was consistent with the actual situation, with a percent error of -2.46%. The predicted number of deaths in Hubei Province was off by 1.40% (forecast 2,599 vs. actual 2,563).
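The percent error formula above can be checked with a short Python sketch. The function name percent_error is hypothetical; the inputs are the Hubei death forecast quoted in the text and the national totals for February 25th from Table 1.

```python
def percent_error(predicted, actual):
    """(predicted - actual) / actual x 100%, as defined in the text."""
    return (predicted - actual) / actual * 100.0


# Hubei deaths: forecast 2,599 vs. actual 2,563
print(round(percent_error(2599, 2563), 2))    # 1.4
# China, cumulative confirmed cases on February 25th (Table 1)
print(round(percent_error(77757, 77779), 2))  # -0.03
```

A negative value means the model underestimated the actual count.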

Table 1

The comparison of predicted cumulative confirmed cases with actual data on February 14th and 25th in China using SEIRD model.

	February 14th			February 25th
Regions	Predicted	Actual	Percent errors	Predicted	Actual	Percent errors
Wuhan	34902	35991	-3.00%	47014	47071	-0.12%
Anhui	952	934	1.90%	989	989	0.00%
Beijing	378	372	1.60%	400	400	0.00%
Chongqing	527	529	-0.40%	577	576	0.17%
Fujian	288	281	2.50%	293	294	-0.34%
Gansu	89	90	-1.10%	91	91	0.00%
Guangdong	1288	1261	2.10%	1348	1347	0.07%
Guangxi	237	226	4.90%	253	252	0.40%
Guizhou	142	140	1.40%	146	146	0.00%
Hainan	162	158	2.50%	168	168	0.00%
Hebei	282	283	-0.40%	311	311	0.00%
Heilongjiang	418	418	0.00%	480	480	0.00%
Henan	1213	1184	2.40%	1273	1271	0.16%
Hong Kong	53	53	0.00%	75	81	-7.41%
Hubei	51179	51986	-1.60%	64765	64786	-0.03%
Hunan	995	988	0.70%	1017	1016	0.10%
Inner Mongolia	66	65	1.50%	75	75	0.00%
Jiangsu	588	593	-0.80%	631	631	0.00%
Jiangxi	920	900	2.20%	934	934	0.00%
Jilin	88	86	2.30%	93	93	0.00%
Liaoning	123	117	5.10%	121	121	0.00%
Macau	10	10	0.00%	10	10	0.00%
Ningxia	68	67	1.50%	71	71	0.00%
Qinghai	18	18	0.00%	18	18	0.00%
Shaanxi	239	230	3.90%	245	245	0.00%
Shandong	534	523	2.10%	757	755	0.26%
Shanghai	131	126	4.00%	335	335	0.00%
Shanxi	324	318	1.90%	132	133	-0.75%
Sichuan	471	463	1.70%	528	529	-0.19%
Taiwan	18	18	0.00%	28	30	-6.67%
Tianjin	122	120	1.70%	136	135	0.74%
Tibet	1	1	0.00%	1	1	0.00%
Xinjiang	68	65	4.60%	76	76	0.00%
Yunnan	162	162	0.00%	174	174	0.00%
Zhejiang	1167	1155	1.00%	1206	1205	0.08%
China	63321	63940	-1.00%	77757	77779	-0.03%

Fig 3 shows a summary of the prediction results of the cumulative number of COVID-19 cases across the country, Hubei province, Wuhan city and Beijing city by the modified SEIRD dynamics model. With the increase of the total number of cases, the percent errors in all four regions tended to decrease and the general absolute percent error in late February was ≤ 0.5%.

Fig 3

Summary of the prediction for cumulative number of COVID-19 cases and percent errors by modified SEIRD model for China, Hubei province, Wuhan city and Beijing city.

The actual and predicted numbers of confirmed cases from the modified SEIRD model for China, Hubei province and Wuhan city are shown in Fig 4. Hubei province and Wuhan city adjusted the criteria for diagnosis on February 13th, and the number of confirmed cases increased by about 10,000 on that day [63]. In order to smooth this sudden change, the number of cumulative cases before February 12th in Hubei province and Wuhan city was proportionally enlarged according to the new criteria; the same adjustment applies to Fig 5. The actual and calculated values for these three regions produced satisfactory fitting curves, indicating that the situation simulated by the model was basically in line with the actual development of the epidemic. In this study, the inflection point was defined as the date when the number of existing confirmed cases has the largest slope. According to the SEIRD dynamic model, the inflection points of all provinces generally appeared in February, while the specific time varied from region to region. The model simulation showed that the inflection point in Wuhan city and Hubei province appeared in early February, and that of the whole country roughly in the first half of February, which basically conformed to the spread of COVID-19 in China.
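The inflection-point definition above (the date on which the number of existing confirmed cases has the largest slope) can be sketched as follows. This is an illustration only: it approximates the slope by the day-over-day difference, which is an assumption, and the function name and the sample series are hypothetical rather than data from the study.

```python
def inflection_index(existing_cases):
    """Return the index of the day with the largest day-over-day increase.

    The slope on day t is approximated by cases[t] - cases[t - 1].
    """
    slopes = [b - a for a, b in zip(existing_cases, existing_cases[1:])]
    steepest = max(range(len(slopes)), key=slopes.__getitem__)
    return steepest + 1  # slopes[j] belongs to day j + 1


# Hypothetical series of existing confirmed cases on consecutive days.
series = [10, 30, 80, 200, 290, 340, 360]
print(inflection_index(series))  # 3
```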

Fig 4

Number of actual and predicted data of existing confirmed cases by the modified SEIRD model for China, Hubei province and Wuhan city.

Fig 5

Long-term prediction of confirmed cases by the modified SEIRD model for China, Hubei province, Wuhan city and Beijing city.

Using data up to March 5th, the model predicted the long-term trends in the numbers of confirmed cases, cured cases and deaths for China, Hubei province and Wuhan city (Fig 5). Again, the model used the adjusted historical data discussed above. Under the various non-pharmaceutical interventions, and not allowing for cases imported from foreign countries, the cumulative number of confirmed cases nationwide was expected to reach about 83,000 at the end of the epidemic. Hubei Province was expected to have a total of about 70,000 confirmed cases and Wuhan City about 50,000.

LSTM model

Data from four regions, Zhejiang, Guangdong, Beijing, and Shanghai, were selected to train the LSTM neural network to predict the number of cumulative confirmed cases on the next day. Since the LSTM model has a memory function, the first feature included in the model was the number of cumulative confirmed cases on the previous day. Because the number of migrants from Wuhan also affected the studied regions, it was included in the analysis as well. There was a certain probability that some migrants from Wuhan were patients because of the virus's incubation period, and this probability was inferred from the number of confirmed cases in Wuhan. Therefore, the second feature combined the number of migrants from Wuhan on the previous day with the number of confirmed patients in Wuhan on the previous day. The feature was calculated as the cumulative number of migrants from Wuhan multiplied by the incidence of COVID-19 in Wuhan on the previous day. The LSTM architecture was designed with 4 layers: an input layer, an LSTM layer (hidden layer), a fully-connected layer and an output layer. Each LSTM neuron had 10 hidden features, and the activation function was ReLU. The loss function was MSE, and the optimizer was “Adam”. The model structure is shown in Fig 6. This study used the grid search method to set different hyperparameters for data in different regions.
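The second LSTM feature described above can be sketched in Python as follows. The text does not state how incidence was computed, so this sketch assumes incidence = cumulative confirmed cases in Wuhan divided by Wuhan's population; the function name, argument names and all numbers are hypothetical.

```python
def imported_risk_feature(daily_migrants, wuhan_confirmed, wuhan_population):
    """Cumulative migrants from Wuhan x incidence in Wuhan, one value per day.

    Assumption: incidence on a given day = cumulative confirmed cases in
    Wuhan on that day / Wuhan population.
    """
    features = []
    cumulative_migrants = 0
    for migrants, confirmed in zip(daily_migrants, wuhan_confirmed):
        cumulative_migrants += migrants
        incidence = confirmed / wuhan_population
        features.append(cumulative_migrants * incidence)
    return features


# Hypothetical inputs: two days of arrivals and Wuhan case counts.
feature = imported_risk_feature([1000, 2000], [100, 200], 10_000_000)
print(feature)
```

To predict day t + 1, the value for day t would be paired with the cumulative confirmed cases of the studied region on day t, giving the two-feature input described in the text.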

Fig 6

LSTM network structure of predicting COVID-19.

The model was trained, and the predicted results for the latest 8 consecutive days are shown in Figs 7 and 8. Finally, we forecast the number of cumulative confirmed cases on the next day. The results of the forecasts made on February 2nd (predicting the number of confirmed cases on February 3rd) and on February 13th (predicting the number of confirmed cases on February 14th) are shown in Figs 7 and 8, respectively.

Fig 7

The results of prediction of cumulative confirmed cases in different regions for February 3rd.

Fig 8

The results of prediction of cumulative confirmed cases in different regions for February 14th.

The percent error is calculated as: (predicted number - actual number) / actual number × 100%. The results are shown in Tables 2 and 3. The absolute percent errors are ≤ 5.1% in all four regions on February 3rd, and ≤ 0.63% in all four regions on February 14th.

Table 2

Results of the prediction of number of confirmed cases on February 3rd.

Area	Date	Actual number of confirmed cases	Predicted number of confirmed cases	Percent error
Zhejiang	2020/2/3	724	723	-0.14%
Guangdong	2020/2/3	725	762	5.10%
Beijing	2020/2/3	212	221	4.25%
Shanghai	2020/2/3	203	213	4.93%

Table 3

Results of the prediction of number of confirmed cases on February 14th.

Area	Date	Actual number of confirmed cases	Predicted number of confirmed cases	Percent error
Zhejiang	2020/2/14	1155	1151	-0.35%
Guangdong	2020/2/14	1261	1255	-0.48%
Beijing	2020/2/14	372	372	0.00%
Shanghai	2020/2/14	318	316	-0.63%

GWR model

In this study, the data of 220 cities that had confirmed cases on February 2nd were selected to predict the number of confirmed cases on February 3rd. The number of confirmed cases, the number of deaths and the number of cured cases are the main indicators for the epidemic. Among them, the number of confirmed cases was the most commonly used and reflected the severity of the COVID-19 epidemic. Therefore, this study used the cumulative number of confirmed cases in different places released by the National Health Commission as the dependent variable. As independent variables, we selected the population of each city, the number of hospitals per 10,000 people, the number of doctors per 10,000 people, the number of inpatient beds per 10,000 people, and the numbers of confirmed cases, cured cases and deaths one day and two days earlier. The GWR model was fitted using the data of February 2nd, and we then forecast the number of confirmed cases on February 3rd. The R² of the GWR regression on February 2nd was 99.98%, and the R² of the prediction for February 3rd was 97.95%. The percent errors of fitting and prediction varied across cities: 11.67% and 3.95% for Beijing, 2.24% and -5.88% for Shanghai, -1.27% and 1.70% for Xiaogan in Hubei Province, and 0.00% and 14.57% for Wuhan, respectively. The intercept and the coefficients of the independent variables are summarized in Table 4. The coefficients of the demographic data and the medical resources data have larger variations than those of the epidemic data. The coefficients of population, number of hospitals per 10,000 people, number of doctors per 10,000 people, dead_lag1, confirmed_lag2 and cured_lag2 were negative, showing that these factors have a negative influence on the dependent variable, while the other independent variables (number of inpatient beds per 10,000 people, confirmed_lag1, cured_lag1 and dead_lag2) have positive coefficients, indicating a positive influence on the dependent variable.

Table 4

Summary of the coefficients of GWR model.

Variable	Min	Lower Quartile	Median	Upper Quartile	Max	Overall
Intercept	1.339	1.419	1.484	1.533	1.970	1.457
Population/10,000	-0.450	-0.419	-0.406	-0.394	-0.331	-0.400
Number of hospitals per 10,000 people	-7.512	-6.875	-6.720	-6.532	-6.124	-6.926
Number of doctors per 10,000 people	-0.193	-0.169	-0.163	-0.157	-0.145	-0.158
Number of inpatient beds per 10,000 people	0.122	0.127	0.128	0.130	0.136	0.125
Confirmed_lag1 ^a	1.535	1.541	1.544	1.545	1.556	1.547
Cured_lag1 ^a	6.989	7.130	7.177	7.220	7.312	7.087
Dead_lag1 ^a	-10.902	-10.664	-10.524	-10.429	-9.787	-10.494
Confirmed_lag2 ^a	-0.417	-0.404	-0.401	-0.398	-0.390	-0.405
Cured_lag2 ^a	-9.417	-9.358	-9.308	-9.271	-8.994	-9.231
Dead_lag2 ^a	14.431	15.138	15.245	15.395	15.631	15.206

a Confirmed, Cured, and Dead denote the number of confirmed, cured, dead cases, respectively, and lag1 and lag2 denote one day and 2 days ago, respectively

Discussion

Sensitivity analysis of parameters

As of mid-March 2020, more than 60,000 people had been cured in 31 provinces, province-level municipalities, and autonomous regions in China, and new cases of infection were mainly led by overseas imports. Although the COVID-19 epidemic was not over, the traffic in the low- and medium-risk areas in Hubei province had been gradually resuming, indicating that the government's non-pharmaceutical interventions had significantly positive effects. In this study, the modified SEIRD model was used to conduct parameter sensitivity analysis of β, desc, and γ based on data before March 5th, so as to simulate the impact of prevention and control measures on real-time infections for China, Hubei Province, Wuhan city, and Beijing city (Fig 9).

Fig 9

Number of infections predicted by modified SEIRD model for China, Hubei province, Wuhan city and Beijing city under different scenarios.

(A) β, (B) desc, and (C) γ.

A decrease of the infectious rate β would reduce the number of infections during the entire epidemic, other conditions being equal (Fig 9A). The shape of the epidemic curve was basically unchanged, but the duration of the epidemic increased as the infectious rate increased. The number of cases increased markedly, and the peak of real-time infections was postponed as the infectious rate increased. When the infectious rate increased to 125%, the epidemic size doubled and the peak of real-time infections was delayed by about 10 days (Fig 9A). Moreover, increasing the attenuation factor of the infectious rate could lead to a significant slowdown in the spread of the epidemic, and the shape of the epidemic curve changed (Fig 9B). At the beginning, a larger attenuation factor changed the number of confirmed cases only slightly, but the difference grew dramatically over time, and the peak of the epidemic moved forward as the attenuation factor increased (Fig 9B). The end of the epidemic also advanced correspondingly. A combination of the changes in the infectious rate β itself and the changes in the attenuation factor of β could reflect the effects of measures such as timely isolation of confirmed or suspected patients and reduction of population mobility. Coupled with the community containment measures, the numbers of exposed, infected and susceptible individuals circulating outside were greatly reduced, so that the epidemic in China was brought under control. The metropolitan-wide quarantine of Wuhan city could also alter the infectious rate. The decrease in the number of daily new confirmed cases since late February showed that the corresponding policies had effectively blocked the spread of the epidemic.

The change in the recovery rate of the infected, γ, had little effect in the early stage of the epidemic. As time went by, a higher recovery rate could significantly raise the number of recovered cases, thus advancing the peak time of the real-time confirmed cases (Fig 9C). When the recovery rate rose from 75% to 125%, the whole country, Hubei province, Wuhan city and Beijing city could reach the time of maximum real-time infections about 6–15 days earlier, and the scale of the epidemic could be reduced as well (Fig 9C). In fact, China sent high-quality medical resources, including more than 20,000 people, to Hubei province [5] in order to achieve the goal of early detection, early reporting, early diagnosis, and early isolation. In addition, the measure of “one province helping one city” paired provinces with cities in Hubei province other than Wuhan to support the rescue work [5], so as to allocate advanced resources rationally. These interventions could improve the level of treatment and medical care in key provinces and cities, thereby increasing the recovery rate of the infected and reducing the mortality rate. By March 13th, 2020, more than a thousand people each day had been cured and discharged for 29 consecutive days [6], indicating the effectiveness of the related policies.

Although COVID-19 has been effectively controlled in China, it has spread rapidly in other countries. Italy, the United States and Spain have become focal areas of the outbreak. By May 2nd, 2020, the United States, the country with the largest number of confirmed cases, had over 1.1 million cases, Spain had 216,582 cases, and Italy ranked third with 207,428 confirmed patients [12]. In order to control the spread of the coronavirus, the United States took measures to reduce the mobility of the population, built hospitals and facilitated treatment [64-67]. Similarly, Italy and Spain tried to limit the movement and gathering of crowds, improve the protection level and provide more medical resources [64, 68–70]. In summary, all three countries implemented various interventions to slow down the spread of COVID-19. The measures fall into two basic categories: reducing the infection rate and increasing the recovery rate. However, the recent large-scale outbreaks in the United States and Spain suggest that part of the population in these two countries might have insufficient awareness of epidemic prevention and control [64], and that the supervision of prevention and control measures needs further improvement. Thanks to the joint efforts of people across Italy, although the number of confirmed cases there is still large, this country, which was called "the second Hubei province" in the early stage of the epidemic, shows a declining trend in new infections and deaths [12]. In order to test the capability of the SEIRD model in foreign countries, data for Italy before June 29th, 2020 were used to calculate the epidemic curve. The results of the model fitted the actual data well, as shown in Fig 10. Although some other countries successfully controlled the epidemic using measures similar to China's [71], such measures may not always work elsewhere, because the effect depends on public attitudes towards the measures and commitment to the intervention, as debated in [72]. Therefore, in the face of the same epidemic situation and similar crises, our SEIRD dynamics model can potentially be applied to other countries to evaluate the intensity and effect of implemented policies by simulating and forecasting the epidemic, but the effect of the policies themselves may be limited by the attitudes and actions of the public.

Fig 10

Number of actual and predicted data of existing confirmed cases by the modified SEIRD model for Italy.

Spatial distribution of coefficients in GWR model

To better understand the spatial distribution of the coefficients of the independent variables in the GWR model, four parameters and their correlations in the model of February 2nd were studied to evaluate the spatial heterogeneity of their coefficients. There was a strong negative correlation between the number of hospitals per 10,000 people and the number of confirmed cases (Fig 11A). A possible explanation is that isolating confirmed cases in hospitals can prevent contagion. The regression coefficients show a trend of gradual decline from the northeast to the southwest and northwest of China (Fig 11A). The most influenced areas are located in the northeast of China, while the least influenced areas are in the southwest and northwest of China.

Fig 11

Spatial distribution of the regression coefficients in the GWR model on February 2nd (the source of the maps: USGS National Map Viewer (public domain): http://viewer.nationalmap.gov/viewer/).

(A) Coefficients of number of hospitals per 10,000 people. (B) Coefficients of number of doctors per 10,000 people. (C) Coefficients of number of confirmed patients one day ago. (D) Coefficients of number of recovered patients one day ago. (This figure is similar but not identical to the original image of Fig 10 in last version and is therefore for illustrative purposes only).

There was a negative correlation between the number of doctors per 10,000 people and the number of confirmed cases (Fig 11B). The regression coefficients show a gradually decreasing trend from the northeast and northwest of China to the south (Fig 11B). The regions influenced the most are concentrated in the northeast and northwest of China, while the least influenced regions are in the south. There was a positive correlation between the number of confirmed cases and the number of confirmed cases one day ago (Fig 11C). This suggests that the more cases were confirmed the day before, the more confirmed cases would emerge the next day, and that effective local quarantine measures can be used to prevent a pandemic. The regression coefficients show a trend of gradual decline from the northeast to the southwest and northwest of China (Fig 11C). This trend is not significant, which indicates a universal pattern across the country. There was a positive correlation between the number of confirmed cases and the number of cured cases one day ago (Fig 11D). The regression coefficients show a gradually decreasing trend from the northeast and northwest of China to the south, with the most influenced areas in the northeast and northwest and the least influenced areas in the south (Fig 11D).

Comparison of SEIRD, LSTM and GWR models

A comparison of the prediction capabilities of the three models shows that the modified SEIRD, LSTM and GWR models could all generally predict the epidemic data for the next day effectively. The percent errors of the SEIRD model in predicting confirmed cases were within ±5.0% in all four selected regions (Beijing, Wuhan, Hubei and China), as shown in Table 5. The LSTM model, which incorporated traffic big data, also fit the real curve well, indicating good simulation and prediction performance. The average percent error of the LSTM model predictions for the four selected provinces and cities was within ±1.0% on February 14th (Table 5). The GWR model could reflect spatial heterogeneity but showed larger percent errors than the other two models in some cases (Table 5). The MAPE (Mean Absolute Percentage Error) values for the SEIRD, LSTM and GWR models in the selected areas were 1.70%, 1.51% and 3.44%, respectively. To compare the APE (Absolute Percent Error) of the three models, we ran the Wilcoxon signed-rank test on the paired observations in Table 5. The p-values for the hypotheses that the APE of GWR > that of LSTM, the APE of GWR > that of SEIRD and the APE of SEIRD > that of LSTM were 0.173, 0.187 and 0.459, respectively, none of which is significant at the 0.05 level. Overall, the GWR model had a larger MAPE than the SEIRD and LSTM models, but the differences in APE among the three models were not statistically significant.
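As an illustration of the paired comparison described above, the following is a minimal sketch of a one-sided Wilcoxon signed-rank test applied to the six rows of Table 5 in which both GWR and LSTM report an APE (Beijing and Shanghai on the three dates). The function name `signed_rank_greater` is hypothetical, and the handling of zeros and ties and the use of a normal approximation are assumptions of this sketch; the text does not state which variant of the test was run.

```python
import math

# Paired APE values (%) from the rows of Table 5 where both GWR and LSTM
# report a value: Beijing and Shanghai on 2020/2/3, 2020/2/14 and 2020/2/25.
gwr_ape = [3.95, 5.88, 3.62, 1.17, 0.04, 0.58]
lstm_ape = [4.25, 4.93, 0.00, 0.63, 0.25, 0.60]


def signed_rank_greater(x, y):
    """One-sided Wilcoxon signed-rank test of H1: x tends to exceed y.

    Assumptions of this sketch: zero differences are dropped, tied absolute
    differences share their average rank, and the p-value comes from the
    normal approximation without continuity or tie correction.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    # Indices of the differences, ordered by absolute size.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    # Sum of the ranks belonging to positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    return w_plus, p_value


w_plus, p_value = signed_rank_greater(gwr_ape, lstm_ape)
print(f"W+ = {w_plus}, one-sided p = {p_value:.3f}")
```

Under these assumptions the sketch gives a positive rank sum of 15 and a one-sided p-value of about 0.173 for the GWR versus LSTM comparison. With only six pairs, an exact test would give a somewhat different p-value, so the choice of approximation matters at this sample size.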

Table 5

Comparison of the APE (Absolute percent error) of different models.

Province/City	Date	SEIRD	LSTM	GWR
Wuhan	2020/2/3	3.01%	-	14.57%
Beijing	2020/2/3	4.25%	4.25%	3.95%
Shanghai	2020/2/3	1.48%	4.93%	5.88%
Guangdong	2020/2/3	2.76%	5.10%	-
Zhejiang	2020/2/3	2.07%	0.14%	-
Wuhan	2020/2/14	3.00%	-	1.00%
Beijing	2020/2/14	3.03%	0.00%	3.62%
Shanghai	2020/2/14	1.61%	0.63%	1.17%
Guangdong	2020/2/14	1.89%	0.48%	-
Zhejiang	2020/2/14	2.14%	0.35%	-
Wuhan	2020/2/25	0.12%	-	0.14%
Beijing	2020/2/25	0.00%	0.25%	0.04%
Shanghai	2020/2/25	0.00%	0.60%	0.58%
Guangdong	2020/2/25	0.07%	0.07%	-
Zhejiang	2020/2/25	0.08%	1.33%	-
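
The MAPE figures quoted in the text can be checked directly from Table 5. The sketch below averages each model's APE column over the rows where a value is reported; treating "-" entries as missing (rather than zero) is an assumption, and the names `table5`, `mape` and `mape_by_model` are illustrative only.

```python
# APE values (%) transcribed from Table 5; None marks a "-" entry
# (no value reported for that model, region and date).
table5 = [
    # (region, date, SEIRD, LSTM, GWR)
    ("Wuhan", "2020/2/3", 3.01, None, 14.57),
    ("Beijing", "2020/2/3", 4.25, 4.25, 3.95),
    ("Shanghai", "2020/2/3", 1.48, 4.93, 5.88),
    ("Guangdong", "2020/2/3", 2.76, 5.10, None),
    ("Zhejiang", "2020/2/3", 2.07, 0.14, None),
    ("Wuhan", "2020/2/14", 3.00, None, 1.00),
    ("Beijing", "2020/2/14", 3.03, 0.00, 3.62),
    ("Shanghai", "2020/2/14", 1.61, 0.63, 1.17),
    ("Guangdong", "2020/2/14", 1.89, 0.48, None),
    ("Zhejiang", "2020/2/14", 2.14, 0.35, None),
    ("Wuhan", "2020/2/25", 0.12, None, 0.14),
    ("Beijing", "2020/2/25", 0.00, 0.25, 0.04),
    ("Shanghai", "2020/2/25", 0.00, 0.60, 0.58),
    ("Guangdong", "2020/2/25", 0.07, 0.07, None),
    ("Zhejiang", "2020/2/25", 0.08, 1.33, None),
]


def mape(values):
    """Mean of the reported APE values, skipping missing entries."""
    reported = [v for v in values if v is not None]
    return sum(reported) / len(reported)


# Column index of each model in the table5 tuples.
columns = {"SEIRD": 2, "LSTM": 3, "GWR": 4}
mape_by_model = {
    name: mape([row[col] for row in table5]) for name, col in columns.items()
}

for name, value in mape_by_model.items():
    print(f"{name}: {value:.2f}%")
```

Averaged this way, the columns give 1.70% for SEIRD, 1.51% for LSTM and 3.44% for GWR, matching the values stated in the text. Note that the three averages are taken over different numbers of rows (15, 12 and 9), so they are not computed on identical sets of regions.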

Conclusions

In this study, the modified SEIRD model, the LSTM model with traffic data and the GWR model reflecting the geographical environment were used to forecast the development of COVID-19 in China. All three types of models showed remarkable prediction capabilities. The parameter sensitivity analysis reflected the effectiveness of non-pharmaceutical interventions. The epidemic has since spread quickly abroad. In the absence of targeted pharmaceutical measures such as vaccines, the interventions implemented in various countries were basically similar to those in China and rested on two aspects: reducing the infectious rate and improving the recovery rate. As the number of daily new cases continues to increase globally, the models in this study show potential for use in epidemic curve prediction and in the prevention of COVID-19 in other countries.

Geographic, demographic and medical resources data for different cities.

(DOCX)
