
Analysis on the spatio-temporal characteristics of COVID-19 in mainland China.

Biao Jin1,2, Jianwan Ji3, Wuheng Yang4, Zhiqiang Yao1, Dandan Huang1, Chao Xu1,2.   

Abstract

COVID-19 has had many adverse effects on humankind and has claimed many lives. It can be soundly defeated only if it is understood more thoroughly. This paper studies the spatial-temporal characteristics of the epidemic's development at the provincial-level in mainland China and the civic-level in Hubei Province. It also presents a correlation analysis of the factors that may cause spatial differences in the epidemic's severity. Three methods are then used to fit the daily-change tendencies of the number of confirmed cases in mainland China and Hubei Province: the Logistic Growth Model (LGM), polynomial fitting, and a Fully Connected Neural Network (FCNN). The analysis of the spatial-temporal differences and their influencing factors shows that: (1) The Chinese government had contained the domestic epidemic by early March 2020; the number of newly diagnosed cases has increased by almost zero since then. (2) Throughout mainland China, effective manual intervention measures such as community isolation and urban isolation have significantly weakened the influence of the factors that people subconsciously expect to affect the spatial differences of the epidemic. (3) The classification results based on the number of confirmed cases offer further evidence of the effectiveness of the isolation measures adopted by governments at all levels in China: the monthly grades of the provinces of mainland China and the cities in Hubei Province changed little, or not at all, during the study period. Based on the curve-fitting experiments, and weighing time cost against goodness of fit, the polynomial model of degree 18 is recommended in this paper for fitting the daily-change tendency of the number of confirmed cases.
© 2021 Institution of Chemical Engineers. Published by Elsevier B.V. All rights reserved.

Keywords:  COVID-19; Correlation analysis; Curve-fitting; Impact indicators; Spatial-temporal characteristics

Year:  2021        PMID: 34121818      PMCID: PMC8183012          DOI: 10.1016/j.psep.2021.06.004

Source DB:  PubMed          Journal:  Process Saf Environ Prot        ISSN: 0957-5820            Impact factor:   6.158


Introduction

The coronavirus disease 2019 (COVID-19) has spread worldwide, with confirmed cases appearing in more than 200 countries. COVID-19 disrupts people's daily lives and the operation of the social economy, and it has cost many lives. It is the common enemy of all humankind. As the first country to report COVID-19 to the United Nations and society, China, through its government and people, has made significant contributions to the fight against the disease. The Chinese government has been publishing, in near real time, the numbers of confirmed cases, new cases, deaths, cured cases, and suspected cases, as well as the response measures it has taken (N. H. C. of the People's Republic of China, 2021). These releases let people follow the development of COVID-19 in China and provide decision support and reference experience for other countries coping with COVID-19. The openness of the data also allows many researchers to carry out related research. In this paper, the number of confirmed cases in China is obtained from the website of the National Health Commission of the People's Republic of China (N. H. C. of the People's Republic of China, 2021) to analyze the spatial-temporal characteristics of the epidemic in China from January 16, 2020, to July 31, 2020. The possible impact indicators that cause regional differences in the number of confirmed cases are then explored, and curve-fitting is carried out on the daily-change tendency of the number of confirmed cases. This paper aims to understand the spatio-temporal differences of the epidemic at both the provincial-level in mainland China and the civic-level in Hubei Province. It also shows, to a certain extent, that the epidemic prevention measures adopted by governments at all levels in mainland China are effective.

Related work

Since the outbreak of COVID-19, researchers worldwide have carried out a great deal of research on it. This research falls mainly into six categories: (1) studies of the impact of COVID-19 on human physical and mental health from a biomedical perspective (Tsamakis et al., 2020, Xiong et al., 2020, Pascoal et al., 2021); (2) studies of the impact of COVID-19 on human production, life, and social and economic development from a sociological perspective (Takyi and Bentum-Ennin, 2020, Qian et al., 2021, Shang et al., 2021, Beiderbeck et al., 2021, Jiang et al., 2021); (3) proposals of new mathematical models, or revisions of existing ones, based on relevant data for predicting and analyzing the development of the epidemic in a specific area (Vianello et al., 2021, Willis et al., 2021, Mun and Geng, 2021, Al-qaness et al., 2021, Manenti et al., 2020, Hu et al., 2020, Cao et al., 2020, Mojjada et al., 2020, Yang et al., 2020); (4) analyses of the spatial-temporal characteristics of the epidemic in a specific area (Lv and Cheng, 2020, Feng et al., 2020); (5) explorations of factors that may affect the development of the epidemic (Hu et al., 2021); and (6) evaluations of the effects of different epidemic prevention measures (Leung et al., 2020, Hasnain et al., 2020). In terms of research purpose and content, the third, fourth, and fifth categories are the most relevant to the work carried out in this paper. To complement medical actions against the spread of infections such as COVID-19, Vianello et al. (2021) pointed out that tracing confirmed cases and predicting the local contagion dynamics through early indicators are crucial to a successful fight against emerging infectious diseases (EID).
Then, based on publicly available raw data on the spread of SARS-CoV-2 from the database of the Italian Civil Protection Department, they proposed a model-free framework and applied Early Warning Detection Systems (EWDS) techniques to detect changes in the territorial spread of infections in the very early stages of onset. They further adapted two distinct EWDS approaches and applied them to the current SARS-CoV-2 outbreak. Their experimental results show that the approaches can promptly generate warning signals and detect the onset of an epidemic at early surveillance stages, even when working with the limited open-source data available daily. Willis et al. (2021) aimed to demonstrate the effectiveness of parameter regression methods for calibrating a SIRD model for COVID-19. The response of the effective reproduction number to non-pharmaceutical interventions (NPIs) is non-linear and variable in rate, magnitude, and direction. In their experiments, they exploited the sophisticated parameter regression functionality of a commercial chemical engineering simulator with piecewise continuous integration and event and discontinuity management. Their main contribution is a strategy for calibrating and validating a model, rather than a fully optimized model or an attempt to predict the future course of the COVID-19 pandemic. Considering that the classic rate law central to SIR compartmental models does not always hold, Mun and Geng (2021) designed a modified mathematical model for non-first-order kinetics. In particular, they discuss two coefficients associated with the modified epidemic model: the transmission rate constant k and the transmission reaction order n.
Experiments on observed data from 127 countries during the initial phase of the COVID-19 pandemic validated their model's superiority: by removing the implicit assumption on reaction order in the classic SIR compartmental models, it becomes more general, flexible, and accurate. Al-qaness et al. (2021) propose a new short-term forecasting model using an enhanced version of the adaptive neuro-fuzzy inference system (ANFIS). An improved marine predators algorithm (MPA), called chaotic MPA (CMPA), is applied to enhance the ANFIS and avoid its shortcomings. Manenti et al. (2020) pointed out analogies between the pandemic infection of SARS-CoV-2 and the behavior of chemical reactors. On this basis, they modeled the spread of the virus as a batch reactor (i.e., an intrinsically dynamic chemical reactor), providing a phenomenological interpretation of the data to monitor and predict the time evolution of the spreading process. In reaction engineering terms, their study distinguishes four infection stages of epidemics/pandemics: the starting stage (infection outbreak), the early stage (infection transmission), the mature stage (infection mitigation), and the final stage (infection extinction). When their paper was published, Hubei Province was in the final stage, while South Korea had just entered the mature stage. They claimed that each phase's kinetic parameters could be properly estimated once all the data and the related convergence paths were collected. In particular, the model's predictions were being progressively improved every day to help all the countries affected by the SARS-CoV-2 pandemic make decisions and organize supplies and human resources. Hu et al. (2020) propose a dynamic growth rate model to analyze the characteristics and trends of the global outbreak of COVID-19.
The model is derived from the ordinary differential equation for infectious diseases, and its generality was tested using the epidemic data of COVID-19 in China. They use the model to predict the inflection points of countries facing serious outbreaks and to forecast their future trends. Cao et al. (2020) established a COVID-19 SEIR transmission dynamics model that takes transmission ability in the latent period into consideration. Based on the epidemic data of Hubei Province from January 23, 2020, to February 24, 2020, they fitted the parameters of the modified SEIR model. Mojjada et al. (2020) set out to demonstrate that Machine Learning (ML) modeling can predict the number of individuals affected by COVID-19, a potential threat to human beings. Their work shows that Linear Regression (LR) effectively predicts new cases, deaths, and recoveries. Yang et al. (2020) use a modified susceptible-exposed-infected-removed (SEIR) epidemiological model that incorporates the domestic migration data before and after January 23 and the most recent COVID-19 epidemiological data to predict the epidemic progression. They further corroborate their model prediction using a machine-learning artificial intelligence (AI) approach trained on the 2003 SARS coronavirus outbreak data. Lv and Cheng (2020) use Crystal Ball and GIS software to explore the spatial and temporal characteristics of COVID-19 from January 25 to April 8 in Hubei Province, China, employing spatial autocorrelation. Feng et al. (2020) compare the transmission paths, outbreak timelines, and coping strategies of COVID-19 in China and the US based on the cumulative number of confirmed cases, the number of confirmed cases per day, and the cumulative number of deaths. To clarify the correlation between temperature and the COVID-19 pandemic in Hubei, Hu et al. (2021) collected daily newly confirmed COVID-19 cases and daily temperature for six cities in Hubei Province, assessed their correlations, and established regression models. They find that government departments in areas where temperatures range between −3.9 and 16.5 °C and rise gradually must take more active measures to address the COVID-19 pandemic.

In summary, researchers have studied COVID-19 extensively from different perspectives. The significance and contributions of this research must be fully acknowledged. It provides a basis for better understanding COVID-19 and its impact, and thus for formulating more effective prevention and even cure measures. The Chinese government has successfully controlled COVID-19 in mainland China, and its people have resumed normal production, living, learning, and work. Therefore, this paper only fits curves to the number of confirmed cases in its experimental data and does not use the fitted functions to predict the number of future infections. Unlike existing work that analyzes the spatio-temporal characteristics of COVID-19 in specific areas, this paper explores the spatio-temporal characteristics of the epidemic at two levels (provincial-level and civic-level) and tries to find the correlation between the characteristics obtained at the two levels. In addition, rather than analyzing only the correlation between the epidemic's changes and one specific factor (such as temperature), this paper explores the correlations between the spatio-temporal differences of the epidemic and the factors that people subconsciously think are related.

Data and methods

The fundamental experimental data in this paper are the numbers of confirmed cases in China. The data for each province can be obtained from the website of the National Health Commission of the People's Republic of China (N. H. C. of the People's Republic of China, 2021). Similarly, the data for each city in Hubei Province can be obtained from the website of the Health Commission of Hubei Province (H. C. of Hubei Province, 2021). The data on the possible indicators (Table 1) that may affect the number of confirmed cases in different regions are collected from the statistical yearbooks of the corresponding provinces, cities, and the whole country. These statistical yearbooks were released in late November 2020. All statistical data in these yearbooks are cumulative values rather than real-time values.
Table 1

Eighteen indicators selected for correlation analysis.

| Population related indicators | Economy related indicators | Gathering places related indicators |
|---|---|---|
| Total population size | Gross Domestic Product | Number of legal entities |
| Number of permanent residents | Production value of primary industry | Number of medical and health institutions |
| Number of employees at the end of the period | Production value of secondary industry | Number of industrial enterprises |
| Number of students at the end of 2019 | Production value of the tertiary industry | Number of schools |
| Passenger traffic volume | Per capita consumption expenditure of urban residents | Total number of medical institutions, enterprises, and schools |
| Passenger traffic turnover | Per capita consumption expenditure of rural residents | |
| Permanent population density | | |

The usage of the experimental data and the research contents of this paper are shown in Fig. 1.
Fig. 1

Usage of the data and research contents.

As shown in Fig. 1, the spatial-temporal differences in the number of confirmed cases at the provincial-level and civic-level are analyzed. In addition, curve-fitting is performed on the daily-change tendencies of the number of confirmed cases in mainland China and Hubei Province.

Classification and evaluation

The Natural Breaks method is adopted to conduct the classification work to discover and compare the distribution differences of the number of confirmed cases in different regions more intuitively. The Coefficient of Variation (CV) is used for evaluating the changes in the level of different regions in different months.

Natural Breaks method

The Natural Breaks method (JGF, 1967) is a statistical classification method based on the numerical statistical distribution. It maximizes the differences among classes. Any statistical series contains some natural turning points and characteristic points, which can be used to divide the research objects into groups with similar properties. The breakpoints themselves are therefore good boundaries for classification. To find the breakpoints, the value of GVF (Goodness of Variance Fit) is calculated according to Eq. (1).

$$\mathrm{GVF} = 1 - \frac{\mathrm{SDCM}}{\mathrm{SDAM}}, \qquad \mathrm{SDCM} = \sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(z_{ij}-\bar{z}_j\right)^2, \qquad \mathrm{SDAM} = \sum_{i=1}^{N}\left(z_i-\bar{z}\right)^2 \tag{1}$$

In Eq. (1), $k$ stands for the number of categories, $z_{ij}$ denotes the $i$th element in the $j$th group, $n_j$ is the size of the $j$th group, and $\bar{z}_j$ represents the mean value of all elements in the $j$th group; $N$ is the number of samples, $z_i$ is the $i$th element in the sample, and $\bar{z}$ is the mean value of all samples. SDAM and SDCM stand for the Sum of squared Deviations from the Array Mean and the Sum of squared Deviations about Class Mean, respectively. SDAM is a constant, while SDCM depends on the classification number $k$, and $\mathrm{GVF} \in [0, 1]$. GVF can be used to compare the classification effects of different methods under the same number of classes, and to compare the effects of the same method under different numbers of classes. Usually, the classification result corresponding to the maximum GVF value is selected. Suppose that the statistical series is then divided into $k$ classes, where $n_i$ is the size of the $i$th class and $\sum_{i=1}^{k} n_i = N$. The elements at the class boundaries can then be viewed as the natural breakpoints of the original series. It should be noted that the index of each element in the classification result is exactly the same as that in the original series.
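The GVF criterion can be illustrated with a minimal Python sketch. It assumes the usual definition GVF = 1 − SDCM/SDAM and finds the best split by exhaustive search, which is practical only for short series; it is not the algorithm used in this paper. The function names and the sample counts are hypothetical.

```python
from itertools import combinations


def sum_sq_dev(values):
    """Sum of squared deviations of `values` from their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def gvf(sorted_values, breaks):
    """Goodness of Variance Fit for classes split at the given indices.

    `breaks` holds the start index of every class except the first.
    """
    sdam = sum_sq_dev(sorted_values)
    edges = [0, *breaks, len(sorted_values)]
    sdcm = sum(
        sum_sq_dev(sorted_values[lo:hi]) for lo, hi in zip(edges, edges[1:])
    )
    return 1.0 - sdcm / sdam


def natural_breaks(values, k):
    """Exhaustively search for the k-class split with maximal GVF."""
    data = sorted(values)
    best = max(
        combinations(range(1, len(data)), k - 1),
        key=lambda b: gvf(data, b),
    )
    edges = [0, *best, len(data)]
    classes = [data[lo:hi] for lo, hi in zip(edges, edges[1:])]
    return classes, gvf(data, best)


# Hypothetical case counts for nine regions (not data from the paper).
counts = [3, 5, 4, 40, 45, 42, 900, 905, 910]
classes, score = natural_breaks(counts, k=3)
print(classes, round(score, 6))
```

For longer series, a dynamic-programming formulation of the same objective avoids enumerating every combination of breakpoints.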

Coefficient of Variation (CV)

The CV is a statistic that measures the degree of variation of the observations in a data set. It is dimensionless, which makes it possible to compare the dispersion of two data sets objectively. Unlike range, standard deviation, and variance, which are absolute measures of dispersion, CV is a relative measure: its value is affected by both the dispersion and the average level of the variable. Eq. (2) can be used to calculate the value of CV.

$$\mathrm{CV} = \frac{1}{\bar{x}}\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2} \tag{2}$$

In Eq. (2), $x_i \in X$, where $X$ is the set of values of a specific property of an object in different situations, $\bar{x}$ is the mean of all the elements in $X$, and $n$ is the number of elements in $X$. In general, the higher the average level of the variable value, the larger the measurement value of its dispersion. In statistical analysis, if the CV value of a group of data is greater than 15%, the data may be considered abnormal.
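As a small illustration, the CV of a region's monthly levels could be computed as below. This sketch assumes the population standard deviation (dividing by n); whether the paper divides by n or n − 1 is not stated here. The monthly level sequences are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    avg = mean(values)
    if avg == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return pstdev(values) / avg


# Hypothetical monthly levels (1-6) of two regions over seven months.
stable = [3, 3, 3, 3, 3, 3, 3]        # level never changes
borderline = [2, 3, 2, 3, 3, 2, 3]    # flips between two adjacent levels

cv_stable = coefficient_of_variation(stable)
cv_borderline = coefficient_of_variation(borderline)
print(cv_stable, round(cv_borderline, 4))
```

A region whose level never changes has a CV of 0, while a region that alternates between two adjacent levels has a clearly positive CV.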

Curve-fitting method

In this paper, three methods are adopted for curve-fitting: the Logistic Growth Model (LGM), polynomial fitting, and a Fully Connected Neural Network (FCNN). The goodness of fit ($R^2$) is used to evaluate the fitting effect quantitatively.

Logistic Growth Model (LGM)

LGM is often used to model data on population, biological population growth, economic indicators, and other fields. Unlike the exponential model, the growth rate of LGM declines once growth reaches a particular stage, until the curve reaches a specific maximum value. It is also widely used in complex system dynamics, such as growth limits, social competition, and macroeconomic forecasting. During SARS in 2003, some scholars used LGM to make predictions (Huang et al., 2003, Ang, 2004). The mathematical expression of LGM is shown as Eq. (3).

$$P(t) = \frac{k}{1 + e^{-a(t-b)}} \tag{3}$$

In Eq. (3), $P(t)$ is the fitted value at time $t$, $k$ is the upper limit of population size, and $a$ reflects the growth rate. $b$ is the inflection point, where the rate of increase reaches its maximum and then slows down.
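The roles of the three parameters can be checked numerically. The sketch below assumes the common three-parameter logistic form k / (1 + exp(−a(t − b))); the parameter values are hypothetical and chosen only for illustration.

```python
import math


def logistic(t, k, a, b):
    """Logistic growth curve: k / (1 + exp(-a * (t - b)))."""
    return k / (1.0 + math.exp(-a * (t - b)))


# Hypothetical parameters (not fitted values from the paper).
k, a, b = 1000.0, 0.25, 30.0

curve = [logistic(t, k, a, b) for t in range(0, 101)]
daily_increase = [curve[t + 1] - curve[t] for t in range(100)]

# The largest one-day increase occurs next to the inflection point b.
peak_day = max(range(100), key=lambda t: daily_increase[t])

print(logistic(b, k, a, b))   # half of the upper limit k
print(peak_day)               # a day adjacent to b
print(round(curve[-1], 3))    # approaches the upper limit k
```

At t = b the curve reaches half of k, the daily increase peaks around b, and the curve flattens toward k afterwards.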

Polynomial fitting

Suppose the polynomial obtained by fitting is $f(x) = p_0 x^{n} + p_1 x^{n-1} + p_2 x^{n-2} + p_3 x^{n-3} + \cdots + p_n$. The difference between the fitting function and the actual result can be defined as $loss = \sum_{i=1}^{m}\left(f(x_i) - y_i\right)^2$, where $(x_i, y_i)$ is the $i$th of the $m$ samples. The purpose of polynomial fitting is to find a set of coefficients $\{p_0, p_1, \ldots, p_n\}$ that makes the fitting result as consistent with the actual sample data as possible, which means minimizing the value of $loss$. The $\{p_0, p_1, \ldots, p_n\}$ are the coefficients of the terms of the polynomial $f$.
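A minimal sketch of least-squares polynomial fitting follows. It assumes a sum-of-squares loss and solves the normal equations with Gaussian elimination in pure Python; the helper names and the sample data are hypothetical, and this is not necessarily the solver used in this paper.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, highest power first.

    Solves the normal equations (A^T A) p = A^T y with Gaussian
    elimination and partial pivoting.
    """
    n = degree + 1
    # Vandermonde rows: [x^degree, ..., x, 1]
    rows = [[x ** (degree - j) for j in range(n)] for x in xs]
    # Augmented normal-equation matrix [A^T A | A^T y]
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(n)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (m[i][n] - tail) / m[i][i]
    return coeffs


def polyval(coeffs, x):
    """Evaluate the polynomial with Horner's rule."""
    result = 0.0
    for p in coeffs:
        result = result * x + p
    return result


def loss(coeffs, xs, ys):
    """Sum of squared differences between fitted and actual values."""
    return sum((polyval(coeffs, x) - y) ** 2 for x, y in zip(xs, ys))


# Hypothetical samples generated from y = 2x^2 - 3x + 1 (not paper data).
xs = [0, 1, 2, 3, 4, 5]
ys = [2 * x**2 - 3 * x + 1 for x in xs]
coeffs = polyfit(xs, ys, degree=2)
print([round(p, 6) for p in coeffs], loss(coeffs, xs, ys))
```

For high degrees the normal equations become ill-conditioned, so in practice the inputs are usually rescaled or a library routine such as `numpy.polyfit` is used instead of this hand-written solver.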

Fully Connected Neural Network (FCNN)

FCNN is a kind of neural network with one input layer, one output layer, and m (m  ≥ 1) hidden layers. The neurons in the same layer are not connected with each other, while each neuron in the previous layer is connected with all neurons in the next layer. The structure of FCNN is shown in Fig. 2 .
Fig. 2

Fully Connected Neural Network.

In FCNN, all input information received by a neuron in a hidden layer is processed by a linear integration and an activation function. The result is used as the input of the neurons connected to it in the next hidden layer. In the same way, the information received by a neuron in the last hidden layer is processed and passed as input to the neurons connected to it in the output layer. Some commonly used activation functions are shown in Table 2.
Table 2

Activation functions.

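| Function | Mathematical expression |
|---|---|
| Sigmoid | $f(z) = \dfrac{1}{1 + e^{-z}}$ |
| tanh | $\tanh(x) = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$ |
| ReLU | $\mathrm{ReLU}(x) = \max(0, x)$ |
| Leaky ReLU (PReLU) | $f(x) = \begin{cases} x, & x \ge 0 \\ \alpha x, & x < 0 \end{cases}$ |
| ELU | $f(x) = \begin{cases} x, & x \ge 0 \\ \alpha(e^{x} - 1), & x < 0 \end{cases}$ |
| Softsign | $f(x) = \dfrac{x}{1 + \lvert x \rvert}$ |
| SoftPlus | $f(x) = \ln(1 + e^{x})$ |
| Maxout | $f(x) = \max(w_1^{T}x + b_1, w_2^{T}x + b_2, \ldots, w_n^{T}x + b_n)$ |

If $m > 2$, the FCNN can be considered a DNN (Deep Neural Network). The nonlinear fitting capability of a DNN is powerful, and it can fit almost any function.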
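The activation functions in Table 2, together with the "linear integration plus activation" step of a single neuron, can be sketched in plain Python. The default values of `alpha` and the `neuron` helper are assumptions made for illustration, not settings taken from this paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def tanh(x):
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


def relu(x):
    return max(0.0, x)


def leaky_relu(x, alpha=0.01):  # alpha is an assumed default
    return x if x >= 0 else alpha * x


def elu(x, alpha=1.0):  # alpha is an assumed default
    return x if x >= 0 else alpha * (math.exp(x) - 1.0)


def softsign(x):
    return x / (1.0 + abs(x))


def softplus(x):
    return math.log(1.0 + math.exp(x))


def maxout(x, weights, biases):
    """Maximum over several affine maps w_i^T x + b_i."""
    return max(
        sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
        for w, b in zip(weights, biases)
    )


def neuron(inputs, weights, bias, activation):
    """One neuron: linear integration followed by an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


out = neuron([1.0, 2.0], [0.5, -0.25], 0.0, sigmoid)
print(out)
```

Stacking such neurons layer by layer, with each layer's outputs feeding every neuron of the next layer, gives the fully connected structure described above.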

Fitting capacity estimate (R2)

The goodness of fit refers to how well the regression line fits the observations. The statistic that measures the goodness of fit is the coefficient of determination ($R^2 \in [0, 1]$), calculated according to Eq. (4).

$$R^2 = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum_{i=1}^{m}\left(y_i-\hat{y}_i\right)^2}{\sum_{i=1}^{m}\left(y_i-\bar{y}\right)^2} \tag{4}$$

Here RSS is the abbreviation of 'Residual Sum of Squares' and TSS is that of 'Total Sum of Squares'; $m$ is the number of samples, $y_i$ and $\hat{y}_i$ are the true output and predicted output of the $i$th sample, respectively, and $\bar{y}$ is the mean value of all $y_i$ ($i = 1, 2, \ldots, m$). The larger the value of $R^2$, the better the fitting effect.
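The coefficient of determination can be computed directly from its definition, 1 − RSS/TSS. The sketch below uses hypothetical observations and two extreme predictors to show the range of values.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - RSS / TSS."""
    mean_y = sum(y_true) / len(y_true)
    rss = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    tss = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - rss / tss


# Hypothetical observations (not data from the paper).
observed = [1.0, 2.0, 3.0, 4.0, 5.0]

perfect = r_squared(observed, observed)        # predictions equal the data
mean_only = r_squared(observed, [3.0] * 5)     # always predict the mean
print(perfect, mean_only)
```

A perfect fit gives 1, and a predictor that always outputs the mean gives 0. A fit that is worse than predicting the mean would give a negative value under this formula, so the [0, 1] range holds for fits at least as good as the mean.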

Experiments and analysis

This section first analyzes the spatial-temporal characteristics of COVID-19 in China from January 16, 2020, to July 31, 2020. Then, the possible impact indicators that may cause these spatial-temporal differences are explored. Finally, the fitting effects of the three methods on the daily-change tendency of the number of confirmed cases are compared and evaluated.

Temporal differences analysis

The actual change curves of the number of confirmed cases in mainland China and Hubei Province over time are shown in Fig. 3 (a) and (b), respectively.
Fig. 3

The daily-change tendency of the number of confirmed cases.

As the first city in China to report confirmed cases, Wuhan took many effective measures to control the spread of the epidemic, such as sealing off the city from all outside contact. Because of these effective isolation and prevention measures, the epidemic's development in Wuhan directly determined that of Hubei Province and of the entire country. The most direct evidence for this conclusion is that the correlation coefficient of the two curves in Fig. 3(a) and (b) is approximately 99.78%. The epidemic variations during the study period of this paper can be divided into three stages.

Early stage of the epidemic (before January 22, 2020): During this period, the local government did not intervene, and people lived normally. The number of infected people was small, so the infection rate was much lower than in the outbreak period. In addition, because people knew very little about the virus, both the confirmed rate and the hospital admission rate were lower at this stage.

Outbreak period (from January 23, 2020, to February 12, 2020): People had a certain understanding of the virus, but the infection rate rose to its highest level because of the increase in the number of infected people. At this stage, the local government stepped up intervention to control population movement. In particular, Wuhan sealed itself off from all outside contact on January 23, 2020, to limit the spread of the epidemic. In addition, the Huoshenshan, Leishenshan, and Fangcang shelter hospitals were established successively to treat patients, increasing the confirmed rate and admission rate. Huoshenshan hospital and Leishenshan hospital were put into operation on February 3, 2020, and February 6, 2020, respectively. Under the unified command and dispatch of the Chinese government, lower-level governments nationwide actively supported Hubei Province. They sent daily necessities for residents to Hubei Province and, more importantly, provided numerous medical workers and medical supplies.

Stable period (after February 13, 2020): During this period, the number of confirmed cases first rose sharply, then the growth slowed and gradually stagnated. The sharp increase was caused not by a loss of control over the epidemic but by the revision of the confirmation rule on February 13, 2020. Under the new rule, clinically diagnosed cases were included.

Spatial differences analysis

This section analyzes the spatial differences among all the provinces in mainland China and all the cities of Hubei Province.

Spatial differences at the provincial-level

The Natural Breaks method is used to classify the provinces of mainland China by their number of confirmed cases at the end of each month. The results are shown in Fig. 4.
Fig. 4

Classification results (provinces in Mainland China).

As depicted in Fig. 4, all the provinces are divided into six levels. With Wuhan as the only exception, the core areas of the epidemic were at first located mainly in the provinces directly neighboring Hubei Province (Henan, Anhui, Zhejiang, Jiangxi, and Hunan Province) and in one of its indirect neighbors (Guangdong Province). Later, owing to imported cases, the number of confirmed cases in Heilongjiang Province and Beijing increased significantly, and these became high-risk areas. To quantitatively evaluate and compare the changes in each province's level across months, the variation coefficient of each province is calculated. The results are shown in Table 3. A smaller coefficient means lower volatility.
Table 3

Levels and variation coefficients.

The numbers (1–6) correspond to the six levels in Fig. 4.

A smaller level number means fewer confirmed cases.

According to the classification results, the grades of most provinces show little volatility, which is reflected in their small variation coefficients, some of which are even 0. Some provinces, such as Shanxi, Ningxia, Gansu, and Inner Mongolia, have relatively higher variation coefficients, mainly because their confirmed numbers happen to lie on the dividing line between the $n$th level and the $(n+1)$th level.

Spatial differences at the civic-level

The classification results based on the number of confirmed cases of each city in Hubei Province at the end of each month are shown in Fig. 5 (a)–(g).
Fig. 5

Classification results (cities in Hubei Province).

Geographically, the high-risk areas of the epidemic in Hubei Province are mainly located in some northern cities centered on Wuhan, such as Huanggang, Xiaogan, Ezhou, Suizhou, and Xiangyang, and in Jingzhou in the south. Since the number of cities in Hubei Province is small, the changes in their classification results can be displayed intuitively and clearly in a single figure, and they are presented in Fig. 6. The ordinate values in Fig. 6 correspond to the six levels in Fig. 5. A smaller level number means fewer confirmed cases.
Fig. 6

Monthly classification results of cities in Hubei Province.

As seen in Fig. 6, the classification result of each city remained essentially unchanged during the study period of this paper. This supports, to a certain extent, the rationality and effectiveness of the centralized isolation, community isolation, and home isolation measures adopted by local governments at all levels. These measures have effectively curbed the spread of the epidemic across regions.

Possible impact indicators analysis

The eighteen possible impact indicators in Table 1 are selected to analyze their correlation with the number of confirmed cases in each region as of July 31, 2020. First, the correlation between the normalized number of confirmed cases in each region and the normalized raw data of each indicator is analyzed. In addition, the correlation between the ranking based on the number of confirmed cases and the ranking based on the raw data of each indicator is analyzed. The raw data are normalized with Min-Max Normalization, and the correlation analysis result is shown in Table 4.
Table 4

Correlation analysis result.

| Indicator | Raw data (provinces in mainland China) | Ranking of raw data (provinces in mainland China) | Raw data (cities in Hubei Province) | Ranking of raw data (cities in Hubei Province) |
|---|---|---|---|---|
| Total population size | 0.123 | 0.631** | 0.611** | 0.667** |
| Number of permanent residents | 0.113 | 0.655** | 0.766** | 0.684** |
| Number of employees at the end of 2019 | 0.113 | 0.652** | 0.826** | 0.588* |
| Number of students at the end of 2019 | 0.060 | 0.594** | 0.884** | 0.640** |
| Passenger traffic volume | 0.186 | 0.639** | 0.108 | 0.561* |
| Passenger traffic turnover | 0.175 | 0.654** | 0.375 | 0.566* |
| Permanent population density | −0.029 | 0.591** | 0.822** | 0.515* |
| Gross Domestic Product | 0.126 | 0.794** | 0.948** | 0.740** |
| Production value of primary industry | 0.203 | 0.549** | 0.314 | 0.664** |
| Production value of secondary industry | 0.133 | 0.728** | 0.912** | 0.716** |
| Production value of tertiary industry | 0.103 | 0.779** | 0.974** | 0.716** |
| Per capita consumption expenditure of urban residents | −0.010 | 0.335 | 0.751** | 0.328 |
| Per capita consumption expenditure of rural residents | 0.108 | 0.655** | 0.516* | 0.306 |
| Number of legal entities | 0.071 | 0.735** | 0.934** | 0.613** |
| Number of medical and health institutions | 0.039 | 0.469** | 0.726** | 0.561* |
| Number of industrial enterprises | 0.058 | 0.717** | 0.775** | 0.789** |
| Number of schools | 0.044 | 0.454* | 0.723** | 0.556* |
| Total number of medical institutions, enterprises and schools | 0.058 | 0.721** | 0.754** | 0.605* |

\* Correlation is significant at the 0.05 level (two-tailed).

\*\* Correlation is significant at the 0.01 level (two-tailed).

From Table 4 it can be argued that: (1) At the provincial-level, the correlations between the normalized number of confirmed cases in each province and the normalized data of the eighteen indicators are very low, and some are even negative. This result seems somewhat contrary to intuition, because people subconsciously believe that the epidemic should be more severe in areas with a larger population base, higher population density, more frequent economic activities, and more numerous public places. (2) At the provincial-level, there is a high correlation between the ranking of the number of confirmed cases and that of most indicator data, especially the indicators related to economic activities. (3) For Hubei Province, there is a high correlation between the normalized number of confirmed cases and the normalized data of most of the eighteen indicators, and between the ranking based on the number of confirmed cases and that based on most indicator data. (4) The analysis results for the cities in Hubei Province are more consistent with people's intuitive understanding. In general, the objective factors that people subconsciously think may affect the severity of the epidemic may apply only to specific regions rather than universally.
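The two kinds of correlation used in this section (on Min-Max normalized raw data and on rankings) can be sketched as follows. The sketch assumes a Pearson coefficient in both cases, which the text does not state explicitly, and the five-region data set is hypothetical, constructed so that one region dominates the case counts. It shows only how the two measures can diverge; it does not reproduce or explain the values in Table 4.

```python
import math


def min_max(values):
    """Min-Max Normalization to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """Rank from 1 (smallest); ties are not handled in this sketch."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


# Hypothetical data for five regions (not taken from the paper).
cases = [50, 80, 120, 200, 68000]
indicator = [900, 1500, 2100, 4000, 3000]

raw_r = pearson(min_max(cases), min_max(indicator))
rank_r = pearson(ranks(cases), ranks(indicator))
print(round(raw_r, 3), round(rank_r, 3))
```

Because Min-Max Normalization is a linear rescaling, it leaves the Pearson coefficient unchanged, so the normalized and unnormalized raw data give the same value. The rank-based coefficient depends only on the ordering and is therefore much less sensitive to a single extreme region.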

Curve-fitting on the daily-change tendency of the number of confirmed cases

The Logistic Growth Model (LGM), polynomial fitting, and a Fully Connected Neural Network (FCNN) are used for the curve-fitting. To evaluate the fitting quality quantitatively, the goodness of fit (R²) is used as the evaluation indicator.
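For reference, the usual definition of the coefficient of determination is given below. It is assumed here that the paper uses this standard form, where y_i are the observed numbers of confirmed cases, ŷ_i the fitted values, ȳ the mean of the observations, and n the number of records.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

A value close to 1 means that the fitted curve accounts for almost all of the variation in the data.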

Curve-fitting with LGM

The initial values of the parameters a and b are set to 0.8 and 20, respectively. As long as a < 1 and b ≤ n, where n is the total number of records, the model eventually converges. Given the effective quarantine measures adopted in various places after the outbreak of COVID-19, the upper limit of the number of confirmed cases is set to the total population of the local area at the end of 2019. Thus, the initial values of k are set to 59,170,000 for Hubei Province and 1,393,444,300 for mainland China. The least squares method is then used to estimate the parameters (k, a, and b). The results are as follows:

Hubei Province: [k, a, b] = [6.80112920e+04, 2.39021824e−01, 2.53400625e+01], R² = 0.998014247506507
Mainland China: [k, a, b] = [8.40794961e+04, 2.05637106e−01, 2.48323666e+01], R² = 0.9945165287399441

Both the fitted curves (Fig. 7) and the R² values show that the LGM fits the daily-change tendency of the number of confirmed cases in Hubei Province better than that in mainland China.
Fig. 7

Curve-fitting with LGM.
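To make the fitted parameters easier to interpret, the sketch below evaluates a logistic curve with the parameter values reported for the LGM. The functional form k / (1 + exp(−a(t − b))), with t counted in days from the first record, is an assumption chosen to be consistent with the constraints a < 1 and b ≤ n; the model is not restated in this section, and the function name is illustrative.

```python
import math


def logistic(t, k, a, b):
    """Assumed LGM form: k / (1 + exp(-a * (t - b)))."""
    return k / (1.0 + math.exp(-a * (t - b)))


# Fitted parameters (k, a, b) as reported in the text.
hubei = (6.80112920e+04, 2.39021824e-01, 2.53400625e+01)
china = (8.40794961e+04, 2.05637106e-01, 2.48323666e+01)

for name, (k, a, b) in (("Hubei Province", hubei), ("Mainland China", china)):
    at_midpoint = logistic(b, k, a, b)   # equals k / 2 under the assumed form
    at_day_100 = logistic(100.0, k, a, b)
    print(f"{name}: plateau k = {k:.0f}, "
          f"value at t = b: {at_midpoint:.0f}, "
          f"value at t = 100: {at_day_100:.0f}")
```

Under this assumed form, the fitted k is the plateau of the curve, which is far below the population-sized initial value, and b is the day on which half of that plateau is reached.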

Curve-fitting with polynomial

To make this method comparable with the LGM, the experiment in this section seeks the polynomial whose R² is close to that of the LGM. Polynomials of different degrees are fitted, and the R² value is calculated in each case. The results are shown in Table 5.
Table 5

R² at different degrees.

Degree | R² (Hubei Province) | R² (Mainland China)
1 | 0.400366599788 | 0.452853587813
2 | 0.749253285945 | 0.772317836082
3 | 0.911246372103 | 0.927012116056
4 | 0.939331443350 | 0.952318498027
5 | 0.940160055628 | 0.952334275190
6 | 0.958652593897 | 0.964814578369
7 | 0.979929919807 | 0.982065975157
8 | 0.987664329504 | 0.991004480261
9 | 0.987777548338 | 0.991527853218
10 | 0.989759781030 | 0.992636504842
11 | 0.993418679382 | 0.995346399411
12 | 0.995350466059 | 0.997015114885
13 | 0.995516142766 | 0.997197513532
14 | 0.995747335406 | 0.997295947490
15 | 0.996567222552 | 0.997746582029
16 | 0.997154468451 | 0.998111106709
17 | 0.997216150758 | 0.998166782229
18 | 0.997301524355 | 0.998211877454
19 | 0.997266019958* | 0.998194104144*

* The value decreases relative to the previous degree.

Table 5 shows that R² increases with the degree up to Degree = 18 and then decreases slightly. Although the R² of Polynomial(Degree = 11) for mainland China is close to that obtained by the LGM, the corresponding values for Hubei Province are quite different. After considering the R² values for both Hubei Province and mainland China, Degree is finally set to 18. In this case, the R² values of Polynomial(Degree = 18) are close to those obtained by the LGM. The coefficient vectors (coff) of the two fitted polynomials, ordered from the highest-degree term to the constant term, are as follows:

coff = [−4.42545158e−29, 7.77937554e−26, −6.23444032e−23, 3.01223934e−20, −9.77294380e−18, 2.24263459e−15, −3.72907818e−13, 4.51451769e−11, −3.91520035e−09, 2.30452026e−07, −7.68334612e−06, −5.42396144e−06, 1.44692006e−02, −7.33550444e−01, 1.84152116e+01, −2.47610756e+02, 1.72385463e+03, −5.23546704e+03, 4.72510098e+03]

coff = [−3.89744755e−29, 6.80077486e−26, −5.40274975e−23, 2.58289771e−20, −8.26982005e−18, 1.86530921e−15, −3.02904962e−13, 3.53982227e−11, −2.89191283e−09, 1.49845886e−07, −2.99393226e−06, −2.00608200e−04, 1.99285238e−02, −8.21990448e−01, 1.88207932e+01, −2.38007353e+02, 1.60968786e+03, −4.84194940e+03, 4.36718838e+03]

The fitting results are shown in Fig. 8.
Fig. 8

Curve-fitting with polynomial (Degree = 18).

Both the fitted curves (Fig. 8) and the R² values at the same degree (Table 5) show that the polynomial fits the daily-change tendency of the number of confirmed cases in mainland China better than that in Hubei Province.
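The degree sweep behind Table 5 can be sketched as follows. The data here are a synthetic logistic-shaped curve, a hypothetical stand-in for the case counts, so the printed values will not match the table. The sketch uses numpy.polynomial.Polynomial.fit, which rescales the day index to [−1, 1] before solving; this is a choice made for the sketch, since the source does not say which routine was used. Rescaling matters at high degree because raw powers of a day index near 100 become very large, which is consistent with the very small leading coefficients listed above. Note also that a polynomial of degree d + 1 contains every polynomial of degree d, so in exact arithmetic a least-squares fit cannot lose R² when the degree grows; the slight decrease reported at Degree = 19 may therefore reflect numerical conditioning rather than the model itself.

```python
import numpy as np
from numpy.polynomial import Polynomial


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic cumulative curve over 100 days (illustrative parameters only).
t = np.arange(100, dtype=float)
y = 68000.0 / (1.0 + np.exp(-0.24 * (t - 25.0)))

scores = {}
for degree in (1, 3, 5, 9, 13):
    # Polynomial.fit maps t onto [-1, 1] internally for better conditioning.
    poly = Polynomial.fit(t, y, deg=degree)
    scores[degree] = r_squared(y, poly(t))
    print(f"degree {degree:2d}: R^2 = {scores[degree]:.6f}")
```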

Curve-fitting with FCNN

Three fully connected neural networks with one, two, and three hidden layers, respectively, are constructed. Each hidden layer consists of ten neurons. The Sigmoid function is used as the activation function between consecutive hidden layers and between the last hidden layer and the output layer. As before, to make the methods comparable, the R² value obtained by polynomial fitting is used as the benchmark that determines the number of training iterations of the neural networks. The fitting results are shown in Fig. 9.
Fig. 9

Curve-fitting with FCNN.

The fitting results are consistent, to a certain extent, with the theoretical result that a neural network can approximate any continuous function.
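A minimal sketch of such a network is given below for the single-hidden-layer case, with training stopped once a target R² is reached. Several details are assumptions of this sketch and not statements about the authors' implementation: inputs and targets are rescaled to [0, 1], the output layer is linear, training uses plain full-batch gradient descent, and the data, learning rate, target R², and iteration cap are illustrative. Plain gradient descent may need far more iterations than the counts reported in Tables 7 and 9, which suggests a different optimizer was used there.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def r_squared(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)


def init_layers(sizes, rng):
    """One (weights, biases) pair per layer transition."""
    return [(rng.normal(0.0, 1.0, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(layers, x):
    """Return the activations of every layer (input included)."""
    acts = [x]
    for i, (w, b) in enumerate(layers):
        z = acts[-1] @ w + b
        is_output = i == len(layers) - 1
        acts.append(z if is_output else sigmoid(z))  # linear output layer
    return acts


def train_step(layers, x, y, lr):
    """One full-batch gradient-descent step on half the mean squared error."""
    acts = forward(layers, x)
    delta = (acts[-1] - y) / len(x)          # gradient w.r.t. the output
    for i in reversed(range(len(layers))):
        w, b = layers[i]
        grad_w = acts[i].T @ delta
        grad_b = delta.sum(axis=0)
        if i > 0:                            # back-propagate through sigmoid
            delta = (delta @ w.T) * acts[i] * (1.0 - acts[i])
        layers[i] = (w - lr * grad_w, b - lr * grad_b)


# Synthetic, rescaled stand-in for the cumulative case curve.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 100).reshape(-1, 1)
y = 1.0 / (1.0 + np.exp(-24.0 * (x - 0.25)))

layers = init_layers([1, 10, 1], rng)        # ten neurons in the hidden layer
initial_r2 = r_squared(y, forward(layers, x)[-1])
target_r2 = 0.95                             # hypothetical benchmark
max_iterations = 5000
for iteration in range(1, max_iterations + 1):
    train_step(layers, x, y, lr=0.05)
    final_r2 = r_squared(y, forward(layers, x)[-1])
    if final_r2 >= target_r2:
        break
print(f"R^2 before training: {initial_r2:.4f}")
print(f"R^2 after {iteration} iterations: {final_r2:.4f}")
```

Adding hidden layers only requires a longer size list, for example [1, 10, 10, 1] for two hidden layers.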

Comparison of the fitting effects of three fitting methods

Each of the three methods is run ten times on the data for mainland China and Hubei Province. For each run, the running time and the R² value are recorded. Finally, the average running time and the average R² value of each method on the experimental data of this paper are calculated. The results are shown in Tables 6, 7, 8, and 9.
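The measurement protocol described above can be sketched as follows. The snippet times ten repeated polynomial fits on a synthetic logistic-shaped curve (a hypothetical stand-in, since the case data are not reproduced here) and averages the durations. It uses time.perf_counter, which is a choice made for this sketch; the source does not say which timer was used, and the helper name time_fit is illustrative.

```python
import time

import numpy as np
from numpy.polynomial import Polynomial


def time_fit(fit, repeats=10):
    """Run `fit` several times; return the mean duration and all durations."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fit()
        durations.append(time.perf_counter() - start)
    return sum(durations) / repeats, durations


# Synthetic logistic-shaped curve; a hypothetical stand-in for the case counts.
t = np.arange(100, dtype=float)
y = 68000.0 / (1.0 + np.exp(-0.24 * (t - 25.0)))

average, durations = time_fit(lambda: Polynomial.fit(t, y, deg=18))
print(f"average over {len(durations)} runs: {average * 1000:.3f} ms")
```

The times recorded in Tables 6 and 8 lie close to whole multiples of 1 ms, which may indicate a coarse timer; a high-resolution clock is one way to reduce that effect when the measured durations are this short.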
Table 6

Time costs and R² values of the LGM and Polynomial(Degree = 18) (Mainland China).

Round | LGM: time cost (s) | LGM: R² | Polynomial(Degree = 18): time cost (s) | Polynomial(Degree = 18): R²
1 | 0.007000446320 | 0.994516528740 | 0.002000093460 | 0.998211877454
2 | 0.005000114441 | 0.994516528740 | 0.001000404358 | 0.998211877454
3 | 0.005000352859 | 0.994516528740 | 0.00099992752 | 0.998211877454
4 | 0.005000114441 | 0.994516528740 | 0.000999927521 | 0.998211877454
5 | 0.005000352859 | 0.994516528740 | 0.000999927521 | 0.998211877454
6 | 0.004000186920 | 0.994516528740 | 0.002000331879 | 0.998211877454
7 | 0.003999948502 | 0.994516528740 | 0.002000093460 | 0.998211877454
8 | 0.004000186920 | 0.994516528740 | 0.000999927521 | 0.998211877454
9 | 0.004000186920 | 0.994516528740 | 0.001000165939 | 0.998211877454
10 | 0.005000352859 | 0.994516528740 | 0.000999927521 | 0.998211877454
Average | 0.004800224304 | 0.994516528740 | 0.001300072670 | 0.998211877454

Table 7

Time costs and R² values of the FCNN with different hidden layers (Mainland China).

Round | One hidden layer: time cost (s) | One hidden layer: R² | One hidden layer: iterations | Two hidden layers: time cost (s) | Two hidden layers: R² | Two hidden layers: iterations | Three hidden layers: time cost (s) | Three hidden layers: R² | Three hidden layers: iterations
1 | 0.873049736023 | 0.998115977847 | 46 | 2.730156183243 | 0.998123776566 | 42 | 5.782330989838 | 0.998176211076 | 35
2 | 0.520029783249 | 0.998261336223 | 31 | 5.827333211899 | 0.998109312727 | 93 | 7.618435859680 | 0.998167982277 | 55
3 | 1.316075325012 | 0.998123336764 | 70 | 3.434196233749 | 0.998159477010 | 54 | 10.238585710526 | 0.998151921756 | 74
4 | 1.064060926437 | 0.998132438293 | 59 | 2.165123939514 | 0.998143810067 | 33 | 4.346248388290 | 0.998566628118 | 30
5 | 2.139122247696 | 0.998258420245 | 89 | 4.045231342316 | 0.998155315391 | 61 | 8.343477249146 | 0.998142517572 | 55
6 | 1.111063480377 | 0.998138773732 | 61 | 4.061232328415 | 0.998392913280 | 56 | 4.971284627914 | 0.998194418996 | 36
7 | 0.607034921646 | 0.998246457218 | 34 | 2.532144784927 | 0.998315240227 | 37 | 5.767330169678 | 0.998131084304 | 40
8 | 0.796045780182 | 0.998340484412 | 49 | 5.296302795410 | 0.998196478370 | 76 | 5.898337364197 | 0.998178387774 | 42
9 | 0.195010900497 | 0.998828843522 | 12 | 3.125178575516 | 0.999021452499 | 50 | 7.244414329529 | 0.998173984094 | 51
10 | 1.143065214157 | 0.998292203336 | 60 | 1.753100156784 | 0.997977014455 | 28 | 8.095463037491 | 0.998139852421 | 55
Average | 0.976455831528 | 0.998273827159 | 51 | 3.496999955177 | 0.998259479059 | 53 | 6.830590772629 | 0.998202298839 | 47

Table 8

Time costs and R² values of the LGM and Polynomial(Degree = 18) (Hubei Province).

Round | LGM: time cost (s) | LGM: R² | Polynomial(Degree = 18): time cost (s) | Polynomial(Degree = 18): R²
1 | 0.005000114441 | 0.998014247507 | 0.004000186920 | 0.997301524355
2 | 0.004000186920 | 0.998014247507 | 0.001000165939 | 0.997301524355
3 | 0.006000280380 | 0.998014247507 | 0.001000165939 | 0.997301524355
4 | 0.006000280380 | 0.998014247507 | 0.001000165939 | 0.997301524355
5 | 0.006000518799 | 0.998014247507 | 0.001000165939 | 0.997301524355
6 | 0.005000114441 | 0.998014247507 | 0.000999927521 | 0.997301524355
7 | 0.005000114441 | 0.998014247507 | 0.000999927521 | 0.997301524355
8 | 0.003000259399 | 0.998014247507 | 0.001000165939 | 0.997301524355
9 | 0.004000425339 | 0.998014247507 | 0.000999927521 | 0.997301524355
10 | 0.004000186920 | 0.998014247507 | 0.001000165939 | 0.997301524355
Average | 0.004800248146 | 0.998014247507 | 0.001300096512 | 0.997301524355

Table 9

Time costs and R² values of the FCNN with different hidden layers (Hubei Province).

Round | One hidden layer: time cost (s) | One hidden layer: R² | One hidden layer: iterations | Two hidden layers: time cost (s) | Two hidden layers: R² | Two hidden layers: iterations | Three hidden layers: time cost (s) | Three hidden layers: R² | Three hidden layers: iterations
1 | 1.015058279037 | 0.997261077554 | 54 | 4.885279417038 | 0.997264302890 | 62 | 6.404366493225 | 0.997843871621 | 45
2 | 1.593091249466 | 0.997329635866 | 83 | 2.044116973877 | 0.997247814956 | 32 | 5.221298694611 | 0.997389485505 | 32
3 | 1.648094415665 | 0.997728075733 | 88 | 4.303246021271 | 0.997494009135 | 69 | 4.407252073288 | 0.997243617456 | 29
4 | 0.780044555664 | 0.997264757155 | 42 | 2.623149871826 | 0.997599630230 | 42 | 7.518430233002 | 0.997296620534 | 48
5 | 0.866049528122 | 0.997286638882 | 47 | 1.478084325790 | 0.997298756493 | 18 | 2.729156017303 | 0.997505670594 | 18
6 | 0.888050556183 | 0.997315886077 | 46 | 3.947225570679 | 0.997601446457 | 48 | 8.178467512131 | 0.997255979564 | 59
7 | 0.764043807983 | 0.997259507986 | 41 | 4.955283641815 | 0.997289058655 | 76 | 6.990399599075 | 0.997251829888 | 48
8 | 0.539030790329 | 0.997326885294 | 32 | 2.292131185532 | 0.997254575125 | 38 | 14.435825586319 | 0.997385497122 | 104
9 | 1.111063480377 | 0.997489166385 | 62 | 3.241185426712 | 0.997222436457 | 46 | 6.752386331558 | 0.997250216439 | 45
10 | 0.898051261902 | 0.997503101089 | 49 | 2.978170394897 | 0.997381075836 | 41 | 7.431424856186 | 0.997251588475 | 53
Average | 1.010257792473 | 0.997376473202 | 54 | 3.274787282944 | 0.997365310623 | 47 | 7.006900739670 | 0.997367437720 | 48

Comparing the three methods based on the time costs and R² values in Tables 5, 6, 7, 8, and 9, the following conclusions can be drawn:

(i) A comparison of Tables 5, 6, and 8 shows that the LGM is more accurate than the polynomial models with Degree < 11.

(ii) Tables 6 and 8 show that the LGM and the Polynomial(Degree = 18) exhibit comparable computational performance. The two methods have time costs of the same order of magnitude and very close accuracy. More specifically, the average time cost of the LGM (approximately 4.8 ms) is higher than that of the Polynomial(Degree = 18) (approximately 1.3 ms). In terms of accuracy, the LGM is somewhat better than the Polynomial(Degree = 18) for Hubei Province, whereas the Polynomial(Degree = 18) is marginally better for mainland China. If one of the two methods has to be chosen for the fitting work, the Polynomial(Degree = 18) is recommended in this paper, considering time cost and accuracy together.

(iii) A comparison of Tables 6, 7, 8, and 9 shows that, to achieve an accuracy similar to that of the Polynomial(Degree = 18), the FCNN needs at least 750 to 780 times the time cost of the Polynomial(Degree = 18), even with a single hidden layer (about 750 times for mainland China and about 780 times for Hubei Province, based on the average time costs). Furthermore, the time cost of the FCNN increases with the number of hidden layers.

(iv) As mentioned in (ii), the Polynomial(Degree = 18) is recommended for the fitting work in this paper. When using polynomial fitting, it is necessary to pay attention to under-fitting and over-fitting.
Under-fitting is usually caused by too few feature dimensions or an overly simple model. It can usually be addressed by adding feature terms and increasing the complexity of the model. In contrast, over-fitting is usually caused by too many feature dimensions, overly complex model assumptions, too many parameters, too little training data, or too much noise. It leads to instability and oscillation in the fitted profile. To mitigate over-fitting in polynomial fitting, several remedies can be considered: (1) adding training samples; (2) introducing regularization; (3) using cross-validation; (4) performing a more robust regression using a sigmoidal function and assigning different weights to different steady-state points; (5) evaluating the impact of polynomial fitting as a function of the polynomial order, since oscillations are not realistic once a stable condition is reached; (6) referring to other model calibration methods, such as that of Willis et al. (2021). Given the fitting method and the amount of experimental data in this paper, introducing regularization is preferred. Regularization adds the L1-norm or L2-norm of the parameter vector to the original loss function. The L1-norm and L2-norm are denoted as $\|w\|_1$ and $\|w\|_2$, respectively. Compared with the L1-norm, the L2-norm is more widely used. The new loss function with the L2-norm can be written as $L(w) = L_0(w) + \lambda \|w\|_2^2$, where $L_0$ is the original loss function. The vector $w$ contains the coefficients of each term in the polynomial f. The over-fitting issue can then be alleviated by adjusting the value of λ. Lukas (2008) provides an effective way to choose an appropriate value for λ.
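The effect of an L2 penalty on a polynomial fit can be sketched as follows. The sketch minimizes the sum of squared residuals plus λ times the squared L2-norm of the coefficient vector by solving the regularized normal equations. The data are synthetic and noisy, the inputs are assumed to be rescaled to [−1, 1], the penalty is applied to all coefficients including the constant term, and the λ values are arbitrary; none of this is taken from the paper, and it is not the selection method of Lukas (2008).

```python
import numpy as np


def ridge_polyfit(x, y, degree, lam):
    """Minimise sum((X w - y)^2) + lam * ||w||_2^2 via the normal equations."""
    X = np.vander(x, degree + 1, increasing=True)   # columns 1, x, x^2, ...
    A = X.T @ X + lam * np.eye(degree + 1)
    return np.linalg.solve(A, X.T @ y)


# Synthetic noisy S-shaped data on a rescaled axis (illustrative only).
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 60)
y = 1.0 / (1.0 + np.exp(-6.0 * x)) + rng.normal(0.0, 0.02, x.size)

norms = []
for lam in (1e-6, 1e-3, 1e-1):
    w = ridge_polyfit(x, y, degree=12, lam=lam)
    norms.append(np.linalg.norm(w))
    print(f"lambda = {lam:g}: ||w||_2 = {norms[-1]:.4f}")
```

Larger values of λ shrink the coefficient vector, which damps the oscillations typical of an over-fitted high-degree polynomial at the price of a somewhat larger residual.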

Conclusions

COVID-19 has had many adverse effects on human production, life, and health, and has even threatened human life. It is challenging to predict the trend of the COVID-19 epidemic accurately, for three reasons: (1) understanding of the virus is not yet comprehensive, and its variants continue to appear; (2) although many prevention measures have proven effective, it is difficult to evaluate the effectiveness of specific measures quantitatively; (3) it is hard to achieve absolute isolation among individuals and among regions. In the battle against COVID-19, human beings are still in the passive defense stage. However, it should be firmly believed that COVID-19 will be soundly defeated, since many researchers have been carrying out a great deal of work on it from different perspectives. Their hard work and significant research achievements provide more and more professional knowledge, effective prevention measures (e.g., Leung et al., 2020), and excellent mathematical analysis or prevention models (e.g., Vianello et al., 2021). The results of this paper demonstrate, to a certain extent, the effectiveness of the epidemic prevention measures adopted by governments at all levels in mainland China. These measures are worth learning from. It should be pointed out that it would be more scientific and accurate to collect the data for the relevant indicators over the same time interval as the data on the number of confirmed cases. However, the indicator data are not released in real time on the official websites of the corresponding departments in mainland China. Although many of these data are recorded in real time or regularly, only their owners or public security organizations have the right to access them. As an alternative, this paper could only obtain them from the statistical yearbooks.

Declaration of Competing Interest

The authors report no declarations of interest.
  15 in total

1.  COVID-19 pandemic and its impact on mental health of healthcare professionals.

Authors:  Konstantinos Tsamakis; Emmanouil Rizos; Athanasios J Manolis; Sofia Chaidou; Stylianos Kympouropoulos; Eleftherios Spartalis; Demetrios A Spandidos; Dimitrios Tsiptsios; Andreas S Triantafyllis
Journal:  Exp Ther Med       Date:  2020-04-07       Impact factor: 2.447

2.  First-wave COVID-19 transmissibility and severity in China outside Hubei after control measures, and second-wave scenario planning: a modelling impact assessment.

Authors:  Kathy Leung; Joseph T Wu; Di Liu; Gabriel M Leung
Journal:  Lancet       Date:  2020-04-08       Impact factor: 79.321

3.  COVID-19: Mechanistic model calibration subject to active and varying non-pharmaceutical interventions.

Authors:  Mark J Willis; Allen Wright; Victoria Bramfitt; Victor Hugo Grisales Díaz
Journal:  Chem Eng Sci       Date:  2020-11-26       Impact factor: 4.311

4.  Impacts of COVID-19 pandemic on user behaviors and environmental benefits of bike sharing: A big-data analysis.

Authors:  Wen-Long Shang; Jinyu Chen; Huibo Bi; Yi Sui; Yanyan Chen; Haitao Yu
Journal:  Appl Energy       Date:  2021-01-17       Impact factor: 9.746

Review 5.  Combined measures to control the COVID-19 pandemic in Wuhan, Hubei, China: A narrative review.

Authors:  Muhammad Hasnain; Muhammad Fermi Pasha; Imran Ghani
Journal:  J Biosaf Biosecur       Date:  2020-11-02

6.  Correlation Between Local Air Temperature and the COVID-19 Pandemic in Hubei, China.

Authors:  Cheng-Yi Hu; Lu-Shan Xiao; Hong-Bo Zhu; Hong Zhu; Li Liu
Journal:  Front Public Health       Date:  2021-01-18

7.  An epidemic model for non-first-order transmission kinetics.

Authors:  Eun-Young Mun; Feng Geng
Journal:  PLoS One       Date:  2021-03-11       Impact factor: 3.240

8.  The impact of COVID-19 on stock market performance in Africa: A Bayesian structural time series approach.

Authors:  Paul Owusu Takyi; Isaac Bentum-Ennin
Journal:  J Econ Bus       Date:  2020-12-08

9.  Efficient artificial intelligence forecasting models for COVID-19 outbreak in Russia and Brazil.

Authors:  Mohammed A A Al-Qaness; Amal I Saba; Ammar H Elsheikh; Mohamed Abd Elaziz; Rehab Ali Ibrahim; Songfeng Lu; Ahmed Abdelmonem Hemedan; S Shanmugan; Ahmed A Ewees
Journal:  Process Saf Environ Prot       Date:  2020-11-13       Impact factor: 6.158
