
Logistic growth modelling of COVID-19 proliferation in China and its international implications.

Christopher Y Shen

Abstract

OBJECTIVE: As the coronavirus disease 2019 (COVID-19) pandemic continues to proliferate globally, this paper shares the findings of modelling the outbreak in China at both provincial and national levels. This paper examines the applicability of the logistic growth model, with implications for the study of the COVID-19 pandemic and other infectious diseases.
METHODS: An NLS (Non-Linear Least Squares) method was employed to estimate the parameters of a differentiated logistic growth function using new daily COVID-19 cases in multiple regions in China and in other selected countries. The estimation was based upon training data from January 20, 2020 to March 13, 2020. A restriction test was subsequently implemented to examine whether a designated parameter was identical among regions or countries, and the diagnosis of residuals was also conducted. The model's goodness of fit was checked using testing data from March 14, 2020 to April 18, 2020.
RESULTS: The model presented in this paper fitted time-series data exceedingly well for the whole of China, its eleven selected provinces and municipalities, and two other countries - South Korea and Iran - and provided estimates of key parameters. This study rejected the null hypothesis that the growth rates of outbreaks were the same among ten selected non-Hubei provinces in China, as well as between South Korea and Iran. The study found that the model did not provide reliable estimates for countries that were in the early stages of outbreaks. Furthermore, this study concurred that R2 values might vary and mislead when compared between different portions of the same non-linear curve. In addition, the study identified the existence of heteroskedasticity and positive serial correlation within residuals in some provinces and countries.
CONCLUSIONS: The findings suggest that there is potential for this model to contribute to better public health policy in combatting COVID-19. The model does so by providing a simple logistic framework for retrospectively analyzing outbreaks in regions that have already experienced a maximal proliferation in cases. Based upon statistical findings, this study also outlines certain challenges in modelling and their implications for the results.
Copyright © 2020 The Author(s). Published by Elsevier Ltd. All rights reserved.

Keywords:  COVID-19; China; Logistic growth model; Non-linear least squares

Year:  2020        PMID: 32376306      PMCID: PMC7196547          DOI: 10.1016/j.ijid.2020.04.085

Source DB:  PubMed          Journal:  Int J Infect Dis        ISSN: 1201-9712            Impact factor:   3.623


Introduction

An outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a zoonotic coronavirus similar to severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome-related coronavirus (MERS-CoV), has rapidly spread across China and various regions of the world. As of April 17, 2020, the cumulative numbers of confirmed cases had reached 82 719 in China (NHCPRC, 2020) and 2 074 529 in 210 countries and territories worldwide (World Health Organization, 2020). In light of these recent developments, the scientific community has sought understanding of coronavirus disease 2019 (COVID-19), the disease caused by SARS-CoV-2, and many have undertaken statistical and modelling approaches. The R0 value for virus transmissibility has been evaluated through stochastic Markov chain Monte Carlo (MCMC) methods (Wu et al., 2020a), a mathematical incidence decay and exponential adjustment (IDEA) model (Majumder and Mandl, 2020), and a statistical exponential growth model adopting the serial interval from severe acute respiratory syndrome (SARS) (Zhao et al., 2020). Researchers have also utilized several models to generate short-term forecasts for cumulative case counts (Roosa et al., 2020), and have developed a ‘susceptible, un-quarantined infected, quarantined infected, confirmed infected’ (SUQC) model to characterize the dynamics of outbreaks (Zhao and Chen, 2020). This study applied a logistic growth function with parameters estimated by a non-linear least squares (NLS) method to model and analyze time-series data from eleven provinces and municipalities in China (Anhui, Beijing, Chongqing, Guangdong, Henan, Hubei, Hunan, Jiangsu, Jiangxi, Shanghai, and Zhejiang) and nine other countries (Iran, South Korea, France, Germany, the U.S.A., Italy, Spain, Singapore and Japan). The implications of the results for the study of infectious diseases are discussed.

Methods

Devised by Belgian mathematician Pierre-François Verhulst (1804–1849) and corroborated by others in later years, the logistic function has become one of the essential tools for bio-assays and has been increasingly applied in a variety of fields, including statistics, economics, and epidemiology (Cramer, 2004). Specifically, it has been used to model population growth in a region and bacterial growth in a broth, and has been implemented in binary decision-making processes in economics and finance. The equation of the logistic function, following a common sigmoid curve, takes the mathematical form

$$P(t) = \frac{K}{1 + \frac{K - P_0}{P_0}\, e^{-rt}} \tag{1}$$

where $P(t)$, or the number of cumulative cases of COVID-19, is expressed as a function of time, $t$, with parameters $\beta = (K, r, P_0)$. To be exact, $K$ represents carrying capacity, $P_0$ represents the initial value of the function at $t = 0$, and $r$ represents the growth rate, or the speed of proliferation. This study did not, however, use a logistic function to directly estimate a model for cumulative cases. Previous studies have suggested that fitting deterministic models to cumulative cases, due to serial correlation in the error terms (measurement errors), creates biased parameters and overfitting of the model to data, and underestimates the uncertainty associated with parameters (King et al., 2015). Therefore, the derivative of the logistic growth function was adopted to model the daily new cases. The general differential logistic equation takes the following form:

$$\frac{dP}{dt} = rP\left(1 - \frac{P}{K}\right) \tag{2}$$

Substituting Equation (1) into Equation (2) gives the logistic differential equation

$$\frac{dP}{dt} = f(t, \beta) = \frac{rKP_0(K - P_0)\, e^{-rt}}{\left[P_0 + (K - P_0)\, e^{-rt}\right]^2} \tag{3}$$

Specifically, the number of observed daily new cases, $I(t)$, is equal to $f(t, \beta)$ plus an error term $\varepsilon(t)$, as shown in the statistical model below,

$$I(t) = f(t, \beta) + \varepsilon(t) \tag{4}$$

where $t = 1, 2, \ldots, T$, and $T$ is the number of observations. Equation (4) is the key equation for modelling time-series data. The study assumed that the error terms were independent and identically distributed (i.i.d.), and used the NLS method to estimate $\beta$ by minimizing the residual sum of squares, $\sum_{t=1}^{T} \left[I(t) - f(t, \beta)\right]^2$.
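To make the relationship between the cumulative curve and the daily-new-cases curve concrete, the following is a minimal Python sketch of the logistic function and its derivative as described above. The function names and the parameter values are hypothetical and chosen only for illustration; they are not estimates from this study. The helper `peak_time` uses the fact that the derivative is largest where the cumulative curve reaches half of the carrying capacity.

```python
import math

def logistic_cumulative(t, K, r, P0):
    """Cumulative logistic curve P(t) with P(0) = P0 and upper limit K."""
    return K / (1.0 + ((K - P0) / P0) * math.exp(-r * t))

def logistic_daily(t, K, r, P0):
    """Derivative dP/dt = r * P * (1 - P / K), the curve fitted to daily new cases."""
    P = logistic_cumulative(t, K, r, P0)
    return r * P * (1.0 - P / K)

def peak_time(K, r, P0):
    """Time at which dP/dt is largest, i.e. where P(t) = K / 2."""
    return math.log((K - P0) / P0) / r

# Hypothetical parameter values, for illustration only.
K, r, P0 = 1000.0, 0.25, 10.0
t_star = peak_time(K, r, P0)
peak_height = logistic_daily(t_star, K, r, P0)  # equals r * K / 4 at the peak
```

Under these definitions the estimated date of maximal increase corresponds to `t_star` counted from the time origin of the fitted series.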
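After estimation, this study set $\hat{I}(t) = f(t, \hat{\beta})$ as the predicted value at a given $t$ using the estimated parameters $\hat{\beta}$. The residual is defined as $w(t) = I(t) - \hat{I}(t)$. The study defined $\mathrm{TSS} = \sum_{t=1}^{T} [I(t) - \bar{I}]^2$, $\mathrm{RSS} = \sum_{t=1}^{T} [\hat{I}(t) - \bar{I}]^2$, and $\mathrm{ESS} = \sum_{t=1}^{T} [I(t) - \hat{I}(t)]^2$, representing the total sum of squares, regression sum of squares, and the residual (error) sum of squares, respectively, where $\bar{I}$ is the mean of the observed values. According to the statistical identity TSS = RSS + ESS (Pindyck and Rubinfeld, 1991), the coefficient of determination is expressed as

$$R^2 = \frac{\mathrm{RSS}}{\mathrm{TSS}} = 1 - \frac{\mathrm{ESS}}{\mathrm{TSS}} \tag{5}$$

The study calculated the F-statistic value through $F = \frac{\mathrm{RSS}/n}{\mathrm{ESS}/(T - n)}$, given that $n$ is the number of parameters, or three, $T$ is the number of observations, and that $F$ asymptotically follows an F-distribution with $n$ and $(T - n)$ degrees of freedom. This study established matrix $X$ as the matrix of partial derivatives of $f(t, \beta)$ with respect to $\beta$, evaluated at $\hat{\beta}$, such that $X_{ti} = \partial f(t, \beta) / \partial \beta_i$. The estimate of the asymptotic covariance matrix of $\hat{\beta}$ is $\mathrm{MSE}\,(X'X)^{-1}$ (Greene, 1997), wherein MSE is the estimate of the residual variance, equal to $\mathrm{ESS}/(T - n)$. The confidence interval of $\hat{\beta}_i$ is determined by

$$\hat{\beta}_i \pm t_{\alpha/2,\,T-n} \sqrt{\mathrm{MSE}\,\left[(X'X)^{-1}\right]_{ii}} \tag{6}$$

wherein $\alpha$ is the significance level, substituted with 5% when calculating the 95% confidence interval (95% CI). In order to test certain restrictions upon parameters $\beta$, this study compared the ESS of a free model ($\mathrm{ESS}_f$) to the ESS of a restricted model ($\mathrm{ESS}_r$) and calculated a test statistic $F$ with $q$ constraints on $\beta$, where

$$F = \frac{(\mathrm{ESS}_r - \mathrm{ESS}_f)/q}{\mathrm{ESS}_f/(T - n)} \tag{7}$$

The test asymptotically follows an F-distribution with $q$ and $(T - n)$ degrees of freedom (Schabenberger and Pierce, 2002). Estimation was completed using the SAS software package. The method of optimization utilized a Gauss–Newton algorithm, which is advantageous as it does not require a second derivative and converged quickly with this study's estimations. The Gauss–Newton algorithm iteratively finds the values of parameters $\beta$ that minimize the sum of squares of the residuals. It starts from an initial estimate $\beta^{(0)}$ and proceeds by iterations

$$\beta^{(s+1)} = \beta^{(s)} + (X'X)^{-1} X' w \tag{8}$$

expressed in terms of $w$ and $\beta$, which are column matrices, with $X$ and $w$ evaluated at $\beta^{(s)}$, and in terms of $s$, which is the iteration step during the optimization process.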
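The Gauss–Newton iteration described above can be sketched in plain Python as follows. This is not the study's SAS code: the finite-difference Jacobian, the step-halving safeguard, the function names, and the synthetic noise-free data are all assumptions made for illustration.

```python
import math

def f(t, beta):
    """Daily-new-cases curve (derivative of the logistic function)."""
    K, r, P0 = beta
    e = math.exp(-r * t)
    d = P0 + (K - P0) * e
    return r * K * P0 * (K - P0) * e / (d * d)

def sse(beta, ts, ys):
    """Residual sum of squares; infinite if the curve cannot be evaluated."""
    try:
        return sum((y - f(t, beta)) ** 2 for t, y in zip(ts, ys))
    except (OverflowError, ZeroDivisionError):
        return math.inf

def jacobian_row(t, beta, rel=1e-6):
    """Central-difference approximation of df/dbeta at one time point (an assumption)."""
    row = []
    for j in range(len(beta)):
        h = rel * (abs(beta[j]) + 1e-3)
        up, down = list(beta), list(beta)
        up[j] += h
        down[j] -= h
        row.append((f(t, up) - f(t, down)) / (2.0 * h))
    return row

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small linear system."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, n):
            m = M[i][c] / M[c][c]
            for k in range(c, n + 1):
                M[i][k] -= m * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def gauss_newton(ts, ys, beta0, max_iter=100):
    """Iterate beta <- beta + lam * (X'X)^{-1} X'w, halving lam until the SSE drops."""
    beta = list(beta0)
    n = len(beta)
    for _ in range(max_iter):
        X = [jacobian_row(t, beta) for t in ts]
        w = [y - f(t, beta) for t, y in zip(ts, ys)]
        XtX = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
        Xtw = [sum(row[i] * wi for row, wi in zip(X, w)) for i in range(n)]
        step = solve(XtX, Xtw)
        current, lam, improved = sse(beta, ts, ys), 1.0, False
        while lam > 1e-10:
            trial = [b + lam * s for b, s in zip(beta, step)]
            if sse(trial, ts, ys) < current:
                beta, improved = trial, True
                break
            lam /= 2.0
        if not improved:
            break  # no step reduces the SSE further
    return beta

# Synthetic, noise-free data from hypothetical parameters (K, r, P0).
true_beta = (1000.0, 0.25, 10.0)
ts = list(range(1, 51))
ys = [f(t, true_beta) for t in ts]
beta0 = (800.0, 0.30, 20.0)
beta_hat = gauss_newton(ts, ys, beta0)
```

Because a step is accepted only when it lowers the residual sum of squares, the fit can never end up worse than the starting values; how close it gets to the generating parameters depends on the data and the initial estimate.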
The daily time-series data of cumulative COVID-19 cases from January 20, 2020 to April 18, 2020 were retrieved from the National Health Commission of the People's Republic of China and its respective health commissions in the selected eleven provinces and municipalities. Time-series data for the nine other countries up until April 18, 2020 were obtained through WIND DATA, a leading financial data services provider in China. This study took measures to adjust the time-series data for cumulative cases in China. It should be noted that on February 12, 2020, clinical evidence and radiographic confirmation were introduced into the diagnosis guidelines for new cases, causing a jump of nearly 15 000 new cases in Hubei, China. On April 16, 2020, health officials announced a one-time re-adjustment of the number of cumulative cases in Wuhan, Hubei (an increase of 325 cases), which were originally omitted from the public amidst the epidemic. For data consistency, this study removed all cases that were added on those two days. To further ensure consistency, this study removed all confirmed cases that were imported from abroad (1575 cases nationally, including 741 cases in many of the ten non-Hubei provinces of China). In addition, 305 cases in the prison system reported on February 20, 2020 (271 in Hubei and 34 in Zhejiang), where the spread was relatively independent and not within the coverage of the provisional health authorities, were also removed (Wu et al., 2020b). The time-series data for daily new cases in China, derived from cumulative cases, were split into two time periods for training and testing. The study fit a logistic growth model for time-series data from January 20, 2020 up until March 13, 2020 (defined as training data), when the manuscript was originally written. It estimated all parameters and related statistics of the model based upon the training data only.
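As a rough illustration of the data preparation described above, the sketch below derives daily new cases from a cumulative series by first differences and splits them at a cutoff date. The counts, dates, and function names are made up for illustration, and the removal of adjusted or imported cases is not shown.

```python
from datetime import date, timedelta

def daily_new_cases(start, cumulative):
    """Return (date, new cases) pairs from first differences of a cumulative series."""
    series = []
    for i in range(1, len(cumulative)):
        day = start + timedelta(days=i)
        series.append((day, cumulative[i] - cumulative[i - 1]))
    return series

def split_at(series, cutoff):
    """Training data run up to and including `cutoff`; testing data come after it."""
    train = [(d, v) for d, v in series if d <= cutoff]
    test = [(d, v) for d, v in series if d > cutoff]
    return train, test

# Made-up cumulative counts starting on a hypothetical first day.
cumulative = [5, 9, 20, 45, 80, 130, 170, 190, 200, 203]
series = daily_new_cases(date(2020, 1, 20), cumulative)
train, test = split_at(series, date(2020, 1, 26))
```

Differencing yields one fewer observation than the cumulative series, so the first dated value here belongs to the second day.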
The estimated model was then fitted to the time-series data from March 14, 2020 to April 18, 2020 (defined as testing data). The regression model took the assumption of i.i.d. error terms. This study performed several residual diagnosis tests to check whether such an assumption was appropriate for the fitted errors or not. Previous papers have found that using the raw residuals, w(t), of non-linear models for diagnosis may be misleading, as they may have non-zero means and different variances. These papers have, consequently, suggested that the use of alternative residuals, referred to as projected residuals, may overcome many of the shortcomings of the raw residuals (Cook and Tsai, 1985). To test whether the error terms in the regression model were independent as per the suggestion by Cook and Tsai, a Durbin–Watson (DW) statistic (Durbin and Watson, 1950) was employed for both the raw residuals and projected residuals. Similarly, to test for homoskedasticity, this study carried out the White test (White, 1980) for both types of residuals.
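For readers unfamiliar with the Durbin–Watson statistic mentioned above, a minimal sketch of its standard definition follows. It operates on any residual sequence; the construction of projected residuals is not shown, and the example residuals are made up.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

# Made-up residuals: the first series changes sign once, the second alternates.
dw_positive = durbin_watson([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
dw_negative = durbin_watson([1.0, -1.0, 1.0, -1.0])
```

Values well below 2 point towards positive serial correlation, values near 2 towards none, and values well above 2 towards negative serial correlation.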

Results and discussion

Equation (4) was first fitted for the training data for the whole of China, as well as for Hubei Province and non-Hubei provinces; the results are displayed in Table 1 and Fig. 1. The same model was then fitted for training data from ten selected non-Hubei provinces and municipalities, with the results shown in Table 2 and Fig. 2. The model was subsequently fitted for the training data from the nine other countries as well, for which the results are shown in Table 3 and Fig. 3. After the parameters of the model had been estimated, they were fitted in the corresponding testing data, and the results are illustrated in Tables 1, 2 and 3 and Figs. 1, 2 and 3.
Table 1

Modelling results for national, Hubei, and non-Hubei time-series data

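| Regional classification | T | K̂ | 95% CI of K̂, lower | 95% CI of K̂, upper | r̂ | 95% CI of r̂, lower | 95% CI of r̂, upper | P̂0 | F statistic | Approx. Pr > F | Estimated date of maximal increase | R2, train | R2, test | R2, total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| National | 53 | 71 954.6 | 64 640.2 | 79 268.9 | 0.1927 | 0.1683 | 0.2170 | 0.2170 | 213.77 | <0.0001 | 2020/2/7 | 0.929 | 0.851 | 0.929 |
| Hubei | 52 | 58 221.3 | 51 319.0 | 65 123.5 | 0.1984 | 0.1691 | 0.2277 | 1156 | 151.5 | <0.0001 | 2020/2/9 | 0.903 | 13.271 | 0.903 |
| Non-Hubei | 53 | 13 426.1 | 12 810.6 | 14 041.6 | 0.2386 | 0.2248 | 0.2524 | 530.7 | 1116.33 | <0.0001 | 2020/2/2 | 0.985 | 0.000 | 0.986 |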

CI, confidence interval.

Table 1 shows the modelling results of Equation (4) estimated for time-series data of new COVID-19 cases in China, Hubei, and non-Hubei provinces. While the study includes the calculations for the 95% confidence interval for P0, it is not provided within Tables 1, 2 and 3 due to space limitations. The training period lasted from January 20, 2020 to March 13, 2020, whereas the testing period lasted from March 14, 2020 to April 18, 2020. The ‘Train’ column includes the training data, the ‘Test’ column includes the testing data, and the ‘Total’ column includes the training and testing data combined.

Fig. 1

Graphical representation of the modelling results in Table 1. The estimated logistic growth function and actual values for new reported cases nationally (a), in Hubei Province (b), and in non-Hubei provinces (c), and the estimated and actual values for cumulative cases nationally (d), in Hubei Province (e), and in non-Hubei provinces (f) are represented. I is the observed value and I_hat is the predicted value. Graphing Equation (1) using estimated parameters from Equation (3) involved more than plugging parameters in. When integration occurs, one needs to add some constant C, and therefore this study added a constant that was the difference between the mean of all observed data points and that of all predicted data points. The cutoff day, namely March 13, 2020, distinguished the training period from the testing period. The dotted lines of l95mean and u95mean show the 95% confidence interval of the mean predicted value at a given time (t).
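The constant of integration mentioned in this caption can be illustrated with a short sketch: the fitted cumulative curve is shifted by the difference between the mean of the observed and the mean of the predicted cumulative values. The observed counts, parameter values, and function names below are made up for illustration.

```python
import math

def logistic_cumulative(t, K, r, P0):
    """Cumulative logistic curve P(t)."""
    return K / (1.0 + ((K - P0) / P0) * math.exp(-r * t))

def mean(xs):
    return sum(xs) / len(xs)

def aligned_cumulative(observed, K, r, P0):
    """Shift the fitted curve by C = mean(observed) - mean(predicted)."""
    predicted = [logistic_cumulative(t, K, r, P0) for t in range(1, len(observed) + 1)]
    C = mean(observed) - mean(predicted)
    return [p + C for p in predicted], C

# Made-up cumulative counts and hypothetical parameter values.
observed = [12, 20, 33, 52, 80, 118, 165, 215, 262, 300]
shifted, C = aligned_cumulative(observed, K=400.0, r=0.45, P0=8.0)
```

After the shift, the plotted curve and the observed series share the same mean by construction.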

Table 2

Modelling results for ten selected non-Hubei provinces and municipalities in China

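| Province/municipality | T | Date of first observation | K̂ | 95% CI of K̂, lower | 95% CI of K̂, upper | r̂ | 95% CI of r̂, lower | 95% CI of r̂, upper | P̂0 | F statistic | Approx. Pr > F | Estimated date of maximal increase | R2, train | R2, test | R2, total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Anhui | 51 | 22-Jan | 1052.8 | 973.7 | 1131.9 | 0.2518 | 0.2279 | 0.2757 | 41.49 | 416.86 | <0.0001 | 2020/2/3 | 0.963 | 0.000 | 0.963 |
| Beijing | 50 | 23-Jan | 415.5 | 369.3 | 461.8 | 0.2304 | 0.1974 | 0.2634 | 38.869 | 232.06 | <0.0001 | 2020/2/2 | 0.937 | 0.000 | 0.935 |
| Chongqing | 51 | 23-Jan | 695.4 | 601.1 | 789.6 | 0.1764 | 0.1463 | 0.2064 | 109.90 | 212.24 | <0.0001 | 2020/1/31 | 0.930 | 0.000 | 0.930 |
| Guangdong | 54 | 19-Jan | 1347.2 | 1243.2 | 1451.3 | 0.2846 | 0.2572 | 0.3122 | 4.89 | 372.66 | <0.0001 | 2020/2/2 | 0.956 | 0.000 | 0.955 |
| Hunan | 51 | 22-Jan | 1116.7 | 1027.2 | 1206.2 | 0.2438 | 0.2188 | 0.2689 | 76.54 | 404.48 | <0.0001 | 2020/2/1 | 0.962 | 0.000 | 0.962 |
| Henan | 50 | 23-Jan | 1366.1 | 1269.9 | 1462.3 | 0.2507 | 0.2281 | 0.2733 | 89.91 | 519.54 | <0.0001 | 2020/2/2 | 0.971 | 0.000 | 0.971 |
| Jiangsu | 51 | 22-Jan | 707 | 657.9 | 756.1 | 0.2124 | 0.1935 | 0.2313 | 47.69 | 534.04 | <0.0001 | 2020/2/2 | 0.971 | 0.000 | 0.971 |
| Jiangxi | 52 | 21-Jan | 983.7 | 890.7 | 1076.7 | 0.2725 | 0.2402 | 0.3048 | 26.59 | 254.12 | <0.0001 | 2020/2/3 | 0.940 | 0.000 | 0.940 |
| Shanghai | 53 | 20-Jan | 345.7 | 313.9 | 377.6 | 0.2625 | 0.2319 | 0.2932 | 14.33 | 278.07 | <0.0001 | 2020/2/1 | 0.943 | 0.000 | 0.943 |
| Zhejiang | 50 | 23-Jan | 1241.7 | 1054.7 | 1428.7 | 0.3053 | 0.2462 | 0.3644 | 98.836 | 120.92 | <0.0001 | 2020/1/31 | 0.885 | 0.000 | 0.885 |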

CI, confidence interval.

Table 2 shows the modelling results of Equation (4) estimated for time-series data of new COVID-19 cases in ten non-Hubei provinces. The ten provinces chosen were those that had the highest numbers of cumulative cases or those that were significant to China's economy. The ‘Train’ column includes the training data, the ‘Test’ column includes the testing data, and the ‘Total’ column includes the training and testing data combined.

Fig. 2

Graphical representation of the modelling results in Table 2 for time-series data of ten provinces in China. Due to space limitations, this figure does not present the scatter plot data of observed values for each province/municipality. The scatter plot and fitted curves for each province/municipality have been drawn, and they are available upon request.

Table 3

Modelling results for nine selected countries

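| Country | T | Date of first observation | K̂ | 95% CI of K̂, lower | 95% CI of K̂, upper | r̂ | 95% CI of r̂, lower | 95% CI of r̂, upper | P̂0 | F statistic | Estimated date of maximal increase | R2, train | R2, test | R2, total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Iran | 23 | 19-Feb | 19 604.5 | 10 378.3 | 28 830.8 | 0.2151 | 0.1237 | 0.3066 | 197.20 | 79.87 | 11-Mar | 0.923 | 0.030 | 0.085 |
| South Korea | 53 | 20-Jan | 8080 | 7126.2 | 9033.8 | 0.3610 | 0.3082 | 0.4138 | 0.0026 | 157.34 | 1-Mar | 0.904 | 0.007 | 0.858 |
| France | 49 | 24-Jan | 57 021.5 | −1.26E7 | 12718836 | 5.3E–4 | −1.133 | 1.134 | 351 631 | 0.35 | Convergence criterion unmet | | | |
| Germany | 45 | 28-Jan | 70 247.7 | −5.84E6 | 5.98E6 | 9.4E–4 | −0.979 | 0.981 | 464 905 | 0.36 | Convergence criterion unmet | | | |
| U.S.A. | 51 | 22-Jan | 5530.3 | −4.728E9 | 4.7282E9 | 5.6E–3 | −132.6 | 132.6 | 5444.5 | 0.04 | Convergence criterion unmet | | | |
| Italy | 41 | 1-Feb | 42 896.4 | −1.449E8 | 1.4498E8 | 5.4E–3 | −14.10 | 14.11 | 12 572 | 0.85 | Convergence criterion unmet | | | |
| Spain | 41 | 1-Feb | 49 407.6 | −2.498E9 | 2.4981E9 | 6.2E–3 | −22.53 | 22.55 | 47 464 | 0.31 | Convergence criterion unmet | | | |
| Singapore | 50 | 23-Jan | 15 679.8 | −5.834E7 | 58373740 | 0.0104 | −0.779 | 0.800 | 7118.3 | 10.29 | Convergence criterion unmet | | | |
| Japan | 50 | 20-Jan | 2978.6 | −5022426 | 5028383 | 0.0070 | −1.602 | 1.616 | 157.0 | 7.13 | Convergence criterion unmet | | | |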

CI, confidence interval.

Table 3 shows the modeling results of Equation (4) estimated for time-series data of new COVID-19 cases in nine nations. Apart from Iran and South Korea, the model failed to reach optimal results for the other seven nations. Six of the listed nations (France, Germany, Iran, Italy, Spain, and the USA) represented those with the most cumulative cases as of March 13, 2020. The three others (Japan, South Korea, and Singapore) are nations that are situated close to China and underwent early outbreaks. The ‘Train’ column includes the training data, the ‘Test’ column includes the testing data, and the ‘Total’ column includes the training and testing data combined.

Fig. 3

Graphical representation of the modelling results in Table 3 for time-series data of South Korea and Iran. The estimated logistic growth function and actual values for new cases in South Korea (a) and Iran (b), and the estimated growth function and actual values for cumulative cases in South Korea (c) and Iran (d) are represented. I is the observed value and I_hat is the predicted value. As in Fig. 1, this study added a constant that was the difference between the mean of all observed data points and that of all predicted data points in the integration process. The cutoff day in (a) and (c), namely March 13, 2020, distinguished the training period from the testing period. This is also the case for cutoff 1 in graphs (c) and (d). However, for Iran, this study developed two other models, shown with dotted lines, based on different cutoff dates at t = 30 (cutoff 2, March 21, 2020) and t = 50 (cutoff 3, April 9, 2020).

Based on Figures 1, 2 and 3, the regression model fits the testing data well. However, its R2 values in the testing period, as indicated in Tables 1 and 2, do not show consistent results. This is due to the fact that the testing and training time-series data inherently occupy two distinct portions of the non-linear curve, wherein the training period covers the peak of the outbreak with most of the cases, and the testing period covers only the flat, right-end tail with few cases. This study identified that when the predicted value reached close to 0 in the testing period and new cases became sporadic, the R2 value ended up unstable (sometimes almost 0 and other times surpassing 1), a finding also consistent with Greene (1997). As a result, the R2 may be different between distinct sections of the non-linear curve, and may be misleading when it is used to compare goodness-of-fit between these portions. The model may not carry the strikingly high R2 values that appear when fitted to cumulative data. However, the results avoid the problem of bias and the underestimation of uncertainty, yielding more realistic estimates. Notably, as shown in Fig. 1, national COVID-19 cases will be around 71 954.6 (95% CI 64 640.2–79 268.9), the Hubei, China cases will be around 58 221.3 (95% CI 51 319–65 123.5), and the non-Hubei, China cases will be around 13 426.1 (95% CI 12 810.6–14 041.6). It is important to note that the confidence intervals exclude the cases removed during the data adjustment process, as mentioned in the Methods section.
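The instability of R2 on the tail of the curve can be reproduced with a small sketch. It assumes, for illustration only, that R2 is computed as RSS/TSS on the evaluation window; the source does not state exactly how the testing-period values were obtained. The tail data and function names are made up.

```python
def sums_of_squares(observed, predicted):
    """Return (TSS, RSS, ESS) using the mean of the observed values."""
    ybar = sum(observed) / len(observed)
    tss = sum((y - ybar) ** 2 for y in observed)
    rss = sum((p - ybar) ** 2 for p in predicted)
    ess = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return tss, rss, ess

def r2_regression(observed, predicted):
    """R2 as RSS / TSS (assumed definition for this illustration)."""
    tss, rss, _ = sums_of_squares(observed, predicted)
    return rss / tss

def r2_residual(observed, predicted):
    """R2 as 1 - ESS / TSS; equals RSS / TSS only when TSS = RSS + ESS holds."""
    tss, _, ess = sums_of_squares(observed, predicted)
    return 1.0 - ess / tss

# Made-up tail: sporadic observed cases against two hypothetical fitted tails.
tail_obs = [0, 0, 1, 0]
tail_flat = [0.3, 0.2, 0.2, 0.1]   # predictions already close to zero
tail_high = [3.0, 2.0, 1.5, 1.0]   # predictions still decaying

r2_flat = r2_regression(tail_obs, tail_flat)
r2_high = r2_regression(tail_obs, tail_high)
```

Outside the estimation sample the identity TSS = RSS + ESS no longer holds, so a ratio of this kind is not bounded by 0 and 1, and the two definitions can disagree.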
While parameters K and P0 reflect innate regional differences in population and size, this study demonstrated that the growth rate r might differ between provinces in China resulting from variations in local control measures and policies. Through the restriction test, the study showed that the calculated test statistic was 3.84863 with a corresponding p-value of 0.00010, and rejected the null hypothesis that r was the same among provinces and municipalities (Table 4). These findings corroborate previous studies showing that the degree of success in the control of proliferation has been influenced by a host of factors, ranging from individual patient response (Lau et al., 2016) to control and precautionary measures taken. More specifically, these may include differences in quarantine protocols, city-wide lockdowns, and travel restrictions, as well as distinctions in local cultures and behavior. Therefore, there is potential for others to study COVID-19 proliferation in various regions in China with the aim of strengthening viral prevention measures as cases surge internationally.
Table 4

Parameter restriction test results

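| Restriction test | Restricted parameter | ESSr | ESSf | q | T − n | Test statistic | p-Value | Test results |
|---|---|---|---|---|---|---|---|---|
| 10 Provinces/Municipalities | r | 26 777.3 | 24 985.5 | 9 | 483 | 3.84863 | 0.00010 | Reject null hypothesis |
| South Korea – Iran | r | 1 370 869 | 1 185 307 | 1 | 70 | 10.9586 | 0.00083 | Reject null hypothesis |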

Table 4 shows the restriction test results on growth rate r. The null hypothesis was that the growth rates were the same among the ten provinces or the same between South Korea and Iran. ESSr is the residual sum of squares with restrictions, ESSf is the residual sum of squares without restrictions, and q represents the number of restrictions imposed upon the growth rates. The test statistic is calculated based upon Equation (7).
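The restriction test statistic can be checked by hand from the entries of Table 4. The sketch below assumes the flattened table reads as ESSr = 26 777.3, ESSf = 24 985.5, q = 9 with 483 residual degrees of freedom for the ten provinces, and ESSr = 1 370 869, ESSf = 1 185 307, q = 1 with 70 residual degrees of freedom for South Korea and Iran; that reading, and the function name, are assumptions.

```python
def restriction_f(ess_restricted, ess_free, q, df_resid):
    """F = ((ESS_r - ESS_f) / q) / (ESS_f / df_resid)."""
    return ((ess_restricted - ess_free) / q) / (ess_free / df_resid)

# Values as read from Table 4 (an interpretation of the flattened table).
f_provinces = restriction_f(26777.3, 24985.5, 9, 483)
f_countries = restriction_f(1370869.0, 1185307.0, 1, 70)
```

Under this reading both results agree with the reported test statistics to the printed precision, which supports interpreting the degrees-of-freedom column as T − n.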

With regard to the modelling process for the nine countries listed in Table 3, the model failed to generate reliable estimates for seven out of the nine countries when no clear, discernible date of maximal increase in COVID-19 cases existed according to the data as of March 13, 2020. Consequently, when a model based on Equation (4) failed to reach optimal results, the corresponding region was likely in the early stages of an outbreak as of March 13, 2020. In the two nations that did yield reliable estimates, South Korea and Iran, this study predicted COVID-19 cases to reach 8080 (95% CI 7126.2–9033.8) and 19 604.5 (95% CI 10 378.3–28 830.8), respectively. A restriction test was also conducted for the growth rate r between these two countries, the results of which are displayed in Table 4. The test statistic was 10.9586 and the p-value was 0.00083, and this study, again, rejected the null hypothesis that r was the same between South Korea and Iran. In the case of South Korea, the regression model provided a good fit in the training data, but the fit in the testing period was less satisfactory because the model failed to capture the additional new cases in the testing period. Modelling the data for Iran was a significantly more complicated process. Fig. 3 indicates a multitude of peaks in the time-series data of confirmed new cases of COVID-19 in Iran. As a result, the initial logistic model built using data up to March 13, 2020 failed to capture new peaks after that date.
When estimating the regression model using more data and cutoff dates set at t = 30 and t = 50 (March 21, 2020 and April 9, 2020, respectively), this study demonstrated how the logistic regression model had evolved and fitted the data better as new information became available.
The results of the residual diagnosis tests from Table 5 and Fig. 4 show both the raw and projected residuals have their distributions centered around 0, and their predicted values closely follow a 45-degree line with their corresponding actual values. This study identifies the existence of positive serial correlation in two of the ten selected non-Hubei provinces (Zhejiang and Jiangsu) and in Iran based on either the raw or projected residuals. In the rest of the non-Hubei provinces of China as well as in South Korea, the DW test failed to reject the null hypothesis of no serial correlation in residuals. Furthermore, with the exception of Beijing, Guangdong and Shanghai, China, this study identified the existence of heteroskedasticity in all other selected non-Hubei provinces and cities, suggesting the variance in error terms largely existed and that it may vary depending on different stages of the proliferation. Accordingly, whereas this study's estimates are still consistent, they are not efficient (Pindyck and Rubinfeld, 1991). The findings suggest that further research may be needed to develop more efficient estimators of the model.
Table 5

Durbin–Watson and White test results

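| Country/province | Type of residual | DW statistic | Durbin–Watson test result | Chi-square value | White test result |
|---|---|---|---|---|---|
| National | Raw | 1.796 | Fail to reject | 3.198 | Fail to reject |
| National | Projected | 2.123 | Fail to reject | 2.558 | Fail to reject |
| Hubei | Raw | 1.911 | Fail to reject | 3.198 | Fail to reject |
| Hubei | Projected | 2.196 | Fail to reject | 2.558 | Fail to reject |
| Non-Hubei | Raw | 1.043 | Reject | 19.244 | Reject |
| Non-Hubei | Projected | 1.629 | Fail to reject | 5.120 | Fail to reject |
| Anhui | Raw | 2.280 | Fail to reject | 14.810 | Reject |
| Anhui | Projected | 2.461 | Fail to reject | 16.203 | Reject |
| Beijing | Raw | 2.041 | Fail to reject | 3.855 | Fail to reject |
| Beijing | Projected | 2.232 | Fail to reject | 3.295 | Fail to reject |
| Chongqing | Raw | 1.941 | Fail to reject | 21.915 | Reject |
| Chongqing | Projected | 2.090 | Fail to reject | 21.986 | Reject |
| Guangdong | Raw | 2.120 | Fail to reject | 4.0863 | Fail to reject |
| Guangdong | Projected | 2.482 | Fail to reject | 5.406 | Fail to reject |
| Hunan | Raw | 1.993 | Fail to reject | 6.115 | Reject |
| Hunan | Projected | 2.190 | Fail to reject | 7.609 | Reject |
| Henan | Raw | 1.937 | Fail to reject | 13.720 | Reject |
| Henan | Projected | 2.259 | Fail to reject | 20.010 | Reject |
| Jiangsu | Raw | 0.767 | Reject | 6.980 | Reject |
| Jiangsu | Projected | 1.531 | Indeterminate | 9.280 | Reject |
| Jiangxi | Raw | 1.549 | Indeterminate | 8.200 | Reject |
| Jiangxi | Projected | 1.826 | Fail to reject | 8.112 | Reject |
| Shanghai | Raw | 1.850 | Fail to reject | 1.850 | Fail to reject |
| Shanghai | Projected | 1.901 | Fail to reject | 1.901 | Fail to reject |
| Zhejiang | Raw | 1.301 | Reject | 20.455 | Reject |
| Zhejiang | Projected | 1.650 | Fail to reject | 23.465 | Reject |
| Iran | Raw | 1.193 | Reject | 3.712 | Reject |
| Iran | Projected | 2.088 | Fail to reject | 5.329 | Reject |
| South Korea | Raw | 1.776 | Fail to reject | 2.253 | Fail to reject |
| South Korea | Projected | 2.100 | Fail to reject | 2.798 | Fail to reject |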

Table 5 shows the results of the Durbin–Watson test and White test conducted on both raw and projected residuals. The critical value for the DW test with the number of explanatory variables being 1 was 1.51 to 1.59, and ‘Reject’ indicated serial correlation issues. The critical value for the White test was 5.99, and ‘Reject’ indicated heteroskedasticity. Both the DW test and White test were conducted at the 5% significance level. The residual diagnosis was not performed for the seven other nations not listed in this table due to the lack of optimal model estimates.

Fig. 4

Residual diagnosis for the non-Hubei regression model. The results in this figure are based on the non-Hubei time-series data; the results for other regions are available upon request.

In conclusion, this simple, three-parameter logistic growth function for reported COVID-19 cases, estimated by an NLS method, offers certain insights for current and future studies of outbreaks. The study's findings demonstrate that the model fitted the data in China exceedingly well, providing estimates for COVID-19 cases and allowing the speed of proliferation to be compared among regions. However, the model failed to provide estimates for outbreaks in their early stages and only yielded results once there was a definitive day of maximal increase in cases. A restriction test on the r parameter showed that the growth rates of COVID-19 differed both between provinces in China and between the other countries examined; the study conjectured that this was due to disparities in local public health policies, societal behavior, patient response, and similar factors. Accordingly, there is potential for this model to contribute to formulating better policies for combatting COVID-19 by retrospectively analyzing outbreaks in regions such as provinces in China.
This study also observed that in non-linear regressions, the R² value varied between different sections of the non-linear curve, and that the existence of heteroskedasticity and serial correlation in some provinces and countries warrants further research. In summary, the study's findings show that in a relatively isolated environment such as China, where control measures are consistently strict and regulated, the logistic growth model fits very well. However, when other factors become prevalent, such as diverging public health control practices and imported cases from abroad, the proliferation of infectious diseases may complicate research methods, and a single logistic growth model may not suffice.
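A short derivation may help explain why the model yields estimates only after the day of maximal increase has been observed. It is a sketch under an assumed parameterization, not necessarily the one used in the study: K (final cumulative cases) and t_0 (day of maximal daily increase) are illustrative symbols, while r is the growth rate tested in the restriction test.

```latex
% Assumed three-parameter logistic form; K and t_0 are illustrative symbols.
\begin{align}
C(t) &= \frac{K}{1 + e^{-r(t - t_0)}}, \\
\frac{dC}{dt} &= \frac{r K e^{-r(t - t_0)}}{\left(1 + e^{-r(t - t_0)}\right)^{2}}, \\
\left.\frac{dC}{dt}\right|_{t = t_0} &= \frac{rK}{4}, \\
\frac{dC}{dt} &\approx r K e^{r(t - t_0)} = r \left(K e^{-r t_0}\right) e^{r t}
  \quad \text{for } t \ll t_0 .
\end{align}
```

Under this form, daily new cases well before the peak are approximately exponential, and K and t_0 enter only through the product K e^{-r t_0}. Early-stage data therefore cannot separate the two parameters. This is one plausible reading of why the fit becomes reliable only once the peak of rK/4 at t = t_0 appears in the data.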

Declarations

Funding: Not applicable. Ethical approval and consent to participate: The need for ethical approval or individual consent was not applicable. Availability of data and materials: All data and materials used in this work are publicly available. Consent for publication: Not applicable. Conflict of interest: No conflict of interest to declare.