Literature DB >> 32376306

Logistic growth modelling of COVID-19 proliferation in China and its international implications.

Abstract

OBJECTIVE: As the coronavirus disease 2019 (COVID-19) pandemic continues to proliferate globally, this paper shares the findings of modelling the outbreak in China at both provincial and national levels. This paper examines the applicability of the logistic growth model, with implications for the study of the COVID-19 pandemic and other infectious diseases.
METHODS: An NLS (Non-Linear Least Squares) method was employed to estimate the parameters of a differentiated logistic growth function using new daily COVID-19 cases in multiple regions in China and in other selected countries. The estimation was based upon training data from January 20, 2020 to March 13, 2020. A restriction test was subsequently implemented to examine whether a designated parameter was identical among regions or countries, and the diagnosis of residuals was also conducted. The model's goodness of fit was checked using testing data from March 14, 2020 to April 18, 2020.
RESULTS: The model presented in this paper fitted time-series data exceedingly well for the whole of China, its eleven selected provinces and municipalities, and two other countries - South Korea and Iran - and provided estimates of key parameters. This study rejected the null hypothesis that the growth rates of outbreaks were the same among ten selected non-Hubei provinces in China, as well as between South Korea and Iran. The study found that the model did not provide reliable estimates for countries that were in the early stages of outbreaks. Furthermore, this study concured that the R2 values might vary and mislead when compared between different portions of the same non-linear curve. In addition, the study identified the existence of heteroskedasticity and positive serial correlation within residuals in some provinces and countries.
CONCLUSIONS: The findings suggest that there is potential for this model to contribute to better public health policy in combatting COVID-19. The model does so by providing a simple logistic framework for retrospectively analyzing outbreaks in regions that have already experienced a maximal proliferation in cases. Based upon statistical findings, this study also outlines certain challenges in modelling and their implications for the results.

Entities: CellLine Disease Gene Species

Keywords: COVID-19; China; Logistic growth model; Non-linear least squares

Mesh：

Year: 2020 PMID： 32376306 PMCID： PMC7196547 DOI： 10.1016/j.ijid.2020.04.085

Source DB: PubMed Journal: Int J Infect Dis ISSN： 1201-9712 Impact factor: 3.623

Introduction

An outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a zoonotic coronavirus similar to severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome-related coronavirus (MERS-CoV), has rapidly spread across China and various regions of the world. As of April 17, 2020, the cumulative numbers of confirmed cases had reached 82 719 in China (NHCPRC, 2020) and 2 074 529 in 210 countries and territories worldwide (World Health Organization, 2020). In light of these recent developments, the scientific community has sought understanding of coronavirus disease 2019 (COVID-19), the disease caused by SARS-CoV-2, and many have undertaken statistical and modelling approaches. The R 0 value for virus transmissibility has been evaluated through stochastic Markov chain Monte Carlo (MCMC) methods (Wu et al., 2020a), a mathematical incidence decay and exponential adjustment (IDEA) model (Majumder and Mandl, 2020), and a statistical exponential growth model adopting the serial interval from severe acute respiratory syndrome (SARS) (Zhao et al., 2020). Researchers have also utilized several models to generate short-term forecasts for cumulative case counts (Roosa et al., 2020), and have developed a ‘susceptible, un-quarantined infected, quarantined infected, confirmed infected’ (SUQC) model to characterize the dynamics of outbreaks (Zhao and Chen, 2020). This study applied a logistic growth function with parameters estimated by a non-linear least squares (NLS) method to model and analyze time-series data from eleven provinces and municipalities in China (Anhui, Beijing, Chongqing, Guangdong, Henan, Hubei, Hunan, Jiangsu, Jiangxi, Shanghai, and Zhejiang) and nine other countries (Iran, South Korea, France, Germany, the U.S.A., Italy, Spain, Singapore and Japan). The implications of the results for the study of infectious diseases are discussed.

Methods

Devised by Belgian mathematician Pierre-François Verhulst (1804–1849) and corroborated by others in later years, the logistic function has become one of the essential tools for bio-assays and has been increasingly applied in a variety of fields, including statistics, economics, and epidemiology (Cramer, 2004). Specifically, it has been used to model population growth in a region and bacterial growth in a broth, and has been implemented in binary decision-making processes in economics and finance. The equation of the logistic function, following a common sigmoid curve, takes the mathematical formwhere P(t), or the number of cumulative cases of COVID-19, is expressed as a function of time, t, with parameters . To be exact, K represents carrying capacity, P 0 represents the initial value of the function at t = 0, and r represents the growth rate, or the speed of proliferation. This study did not, however, use a logistic function to directly estimate a model for cumulative cases. Previous studies have suggested that fitting deterministic models to cumulative cases, due to serial correlation in the error terms (measurement errors), creates biased parameters and overfitting of the model to data, and underestimates the uncertainty associated with parameters (King et al., 2015). Therefore, the derivative of the logistic growth function was adopted to model the daily new cases. The general differential logistic equation takes the following form: Substituting Equation (1) into Equation (2) gives the logistic differential equation Specifically, the number of observed daily new cases, I(t), is equal to f(t,β) plus an error term ε, as shown in the statistical model below, where t = 1, 2, …T, and T is the number of observations. Equation (4) is the key equation for modelling time-series data. The study assumed that the error terms were independent and identically distributed (i.i.d.), and used the NLS method to estimate β by minimizing the residual sum of squares, . After estimation, this study set , or, as the predicted value at a given t using estimated parameters . The residual is defined as . The study defined , and , representing the total sum of squares, regression sum of squares, and the residual (error) sum of squares, respectively. According to the statistical identity TSS = RSS + ESS (Pindyck and Rubinfeld, 1991), the coefficient of determination is expressed as The study calculated the F-statistic value through , given that n is the number of parameters, or three, T is the number of observations, and that F asymptotically follows an F-distribution with n and (T–n) degrees of freedom. This study established matrix X as equivalent to the partial derivative of f(t·β) or such that . The estimate of the asymptotic covariance matrix of β is (Greene, 1997), wherein MSE is the estimate of the residual variance, equal to . The confidence interval of is determined by equatingwherein α is the significance level, substituted with 5% when calculating the 95% confidence interval (95% CI).1 In order to test certain restrictions upon parameters β, this study compared the ESS of a free model (ESS ) to the ESS of a restricted model (ESS ) and calculated a test static F with q constraints on β, where The test asymptotically follows an F-distribution with q and (T–n) degrees of freedom (Schabenberger and Pierce, 2002). Estimation was completed using the SAS software package. The method of optimization utilized a Gauss–Newton algorithm, which is advantageous as it does not require a second derivative and converged quickly with this study's estimations. The Gauss–Newton algorithm iteratively finds the values of parameters β that minimize the sum of squares of the residuals. It starts from an initial estimate β (0) and proceeds by iterations expressed in terms of and β, which are column matrices, and in terms of s, which is the iteration step during the optimization process. The daily times-series data of cumulative COVID-19 cases from January 20, 2020 to April 18, 2020 were retrieved from the National Health Commission of the People's Republic of China and its respective health commissions in the selected eleven provinces and municipalities. Time-series data for the nine other countries up until April 18, 2020 were obtained through WIND DATA, a leading financial data services provider in China. This study took measures to adjust the time-series data for cumulative cases in China. It should be noted that on February 12, 2020, clinical evidence and radiographic confirmation were introduced into the diagnosis guidelines for new cases, causing a jump of nearly 15 000 new cases in Hubei, China.2 On April 16, 2020, health officials announced a one-time re-adjustment of the number of cumulative cases in Wuhan, Hubei — an increase by 325 cases — which were originally omitted from the public amidst the epidemic. For data consistency, this study removed all cases that were added on those two days. To further ensure consistency, this study removed all confirmed cases that were imported from abroad (1575 cases nationally, including 741 cases in many of the ten non-Hubei provinces of China). In addition, 305 cases in the prison system reported on February 20, 2020 (271 in Hubei and 34 in Zhejiang) where the spread was relatively independent and not within the coverage of the provisional health authorities were also removed (Wu et al., 2020b). The time-series data for daily new cases in China, derived from cumulative cases, were split into two time periods for training and testing. The study fit a logistic growth model for time-series data from January 20, 2020 up until March 13, 2020 (defined as training data), when the manuscript was originally written. It estimated all parameters and related statistics of the model based upon the training data only. The estimated model was then fitted to the time-series data from March 14, 2020 to April 18, 2020 (defined as testing data). The regression model took the assumption of i.i.d. error terms. This study performed several residual diagnosis tests to check whether such an assumption was appropriate for the fitted errors or not. Previous papers have found that using the raw residuals, w(t), of non-linear models for diagnosis may be misleading, as they may have non-zero means and different variances. These papers have, consequently, suggested that the use of alternative residuals, referred to as projected residuals, may overcome many of the shortcomings of the raw residuals (Cook and Tsai, 1985). To test whether the error terms in the regression model were independent as per the suggestion by Cook and Tsai, a Durbin–Watson (DW) statistic (Durbin and Watson, 1950) was employed for both the raw residuals and projected residuals.3 Similarly, to test for homoskedasticity, this study carried out the White test (White, 1980) for both types of residuals.4

Results and discussion

Equation (4) was first fitted for the training data for the whole of China, as well as for Hubei Province and non-Hubei provinces; the results are displayed in Table 1 and Fig. 1 . The same model was then fitted for training data from ten selected non-Hubei provinces and municipalities, with the results shown in Table 2 and Fig. 2 . The model was subsequently fitted for the training data from the nine other countries as well, for which the results are shown in Table 3 and Fig. 3 . After the parameters of the model had been estimated, they were fitted in the corresponding testing data, and the results are illustrated in Tables 1, 2 and 3 and Figs. 1, 2 and 3.5

Table 1

Modelling results for national, Hubei, and non-Hubei time-series data

Regional classification	T	Kˆ	95% CI of Kˆ		rˆ	95% CI of rˆ		P0ˆ	F statistic	Approx. Pr > F	Estimated date of maximal increase	R²
			Lower	Upper		Lower	Upper					Train	Test	Total
National	53	71 954.6	64 640.2	79 268.9	0.1927	0.1683	0.2170	0.2170	213.77	<0.0001	2020/2/7	0.929	0.851	0.929
Hubei	52	58 221.3	51 319.0	65 123.5	0.1984	0.1691	0.2277	1156	151.5	<0.0001	2020/2/9	0.903	13.271	0.903
Non-Hubei	53	13 426.1	12 810.6	14 041.6	0.2386	0.2248	0.2524	530.7	1116.33	<0.0001	2020/2/2	0.985	0.000	0.986

CI, confidence interval.

Table 1 shows the modelling results of Equation (4) estimated for time-series data of new COVID-19 cases in China, Hubei, and non-Hubei provinces. While the study includes the calculations for the 95% confidence interval for P0, it is not provided within Tables 1, 2 and 3 due to space limitations. The training period lasted from January 20, 2020 to March 13, 2020, whereas the testing period lasted from March 14, 2020 to April 18, 2020. The ‘Train’ column includes the training data, the ‘Test’ column includes the testing data, and the ‘Total’ column includes the training and testing data combined.

Fig. 1

Graphical representation of the modelling results in Table 1. The estimated logistic growth function and actual values for new reported cases nationally (a), in Hubei Province (b), and in non-Hubei provinces (c), and the estimated and actual values for cumulative cases nationally (d), in Hubei Province (e), and in non-Hubei provinces (f) are represented. I is the observed value and I_hat is . Graphing Equation (1) using estimated parameters from Equation (3) involved more than plugging parameters in. When integration occurs, one needs to add some constant C, and therefore this study added a constant that was the difference between the mean of all observed data points and that of all predicted data points. The cutoff day, namely March 13, 2020, distinguished the training period from the testing period. The dotted lines of l95mean and u95mean show the 95% confidence interval of the mean predicted value at a given time (t).

Table 2

Modelling results for ten selected non-Hubei provinces and municipalities in China

Province/municipality	T	Date of first observation	Kˆ	95% CI of Kˆ		rˆ	95% CI of rˆ		P0ˆ	F statistic	Approx. Pr > F	Estimated date of maximal increase	R²
				Lower	Upper		Lower	Upper					Train	Test	Total
Anhui	51	22-Jan	1052.8	973.7	1131.9	0.2518	0.2279	0.2757	41.49	416.86	<0.0001	2020/2/3	0.963	0.000	0.963
Beijing	50	23-Jan	415.5	369.3	461.8	0.2304	0.1974	0.2634	38.869	232.06	<0.0001	2020/2/2	0.937	0.000	0.935
Chongqing	51	23-Jan	695.4	601.1	789.6	0.1764	0.1463	0.2064	109.90	212.24	<0.0001	2020/1/31	0.930	0.000	0.930
Guangdong	54	19-Jan	1347.2	1243.2	1451.3	0.2846	0.2572	0.312	24.89	372.66	<0.0001	2020/2/2	0.956	0.000	0.955
Hunan	51	22-Jan	1116.7	1027.2	1206.2	0.2438	0.2188	0.2689	76.54	404.48	<0.0001	2020/2/1	0.962	0.000	0.962
Henan	50	23-Jan	1366.1	1269.9	1462.3	0.2507	0.2281	0.2733	89.91	519.54	<0.0001	2020/2/2	0.971	0.000	0.971
Jiangsu	51	22-Jan	707	657.9	756.1	0.2124	0.1935	0.2313	47.69	534.04	<0.0001	2020/2/2	0.971	0.000	0.971
Jiangxi	52	21-Jan	983.7	890.7	1076.7	0.2725	0.2402	0.3048	26.59	254.12	<0.0001	2020/2/3	0.940	0.000	0.940
Shanghai	53	20-Jan	345.7	313.9	377.6	0.2625	0.2319	0.2932	14.33	278.07	<0.0001	2020/2/1	0.943	0.000	0.943
Zhejiang	50	23-Jan	1241.7	1054.7	1428.7	0.3053	0.2462	0.3644	98.836	120.92	<0.0001	2020/1/31	0.885	0.000	0.885

CI, confidence interval.

Table 2 shows the modelling results of Equation (4) estimated for time-series data of new COVID-19 cases in ten non-Hubei provinces. The ten provinces chosen were those that had the highest numbers of cumulative cases or those that were significant to China's economy. The ‘Train’ column includes the training data, the ‘Test’ column includes the testing data, and the ‘Total’ column includes the training and testing data combined.

Fig. 2

Graphical representation of the modelling results in Table 2 for time-series data of ten provinces in China. Due to space limitations, this figure does not present the scatter plot data of observed values for each province/municipality. The scatter plot and fitted curves for each province/municipality have been drawn, and they are available upon request.

Table 3

Modelling results for nine selected countries

Country	T	Date of first observation	Kˆ	95% CI of Kˆ		rˆ	95% CI of rˆ		P0ˆ	F statistic	Estimated date of maximal increase	R²
				Lower	Upper		Lower	Upper				Train	Test	Total
Iran	23	19-Feb	19 604.5	10378.3	28830.8	0.2151	0.1237	0.3066	197.20	79.87	11-Mar	0.923	0.030	0.085
South Korea	53	20-Jan	8080	7126.2	9033.8	0.3610	0.3082	0.4138	0.0026	157.34	1-Mar	0.904	0.007	0.858
France	49	24-Jan	57 021.5	−1.26E7	12718836	5.3E–4	−1.133	1.1343	51 631	0.35	Convergence criterion unmet
Germany	45	28-Jan	70 247.7	−5.84E6	5.98E6	9.4E–4	−0.979	0.9814	64 905	0.36	Convergence criterion unmet
U.S.A	51	22-Jan	5530.3	−4.728E9	4.7282E9	5.6E–3	−132.6	132.6	5444.5	0.04	Convergence criterion unmet
Italy	41	1-Feb	42 896.4	−1.449E8	1.4498E8	5.4E–3	−14.10	14.11	12 572	0.85	Convergence criterion unmet
Spain	41	1-Feb	49 407.6	−2.498E9	2.4981E9	6.2E–3	−22.53	22.55	47 464	0.31	Convergence criterion unmet
Singapore	50	23-Jan	15 679.8	−5.834E7	58373740	0.0104	−0.779	0.8007	118.3	10.29	Convergence criterion unmet
Japan	50	20-Jan	29 78.6	−5022426	5028383	0.0070	−1.602	1.616	157.0	7.13	Convergence criterion unmet

CI, confidence interval.

Table 3 shows the modeling results of Equation (4) estimated for time-series data of new COVID-19 cases in nine nations. Apart from Iran and South Korea, the model failed to reach optimal results for the other seven nations. Six of the listed nations (France, Germany, Iran, Italy, Spain, and the USA) represented those with the most cumulative cases as of March 13, 2020. The three others (Japan, South Korea, and Singapore) are nations that are situated close to China and underwent early outbreaks. The ‘Train’ column includes the training data, the ‘Test’ column includes the testing data, and the ‘Total’ column includes the training and testing data combined.

Fig. 3

Graphical representation of the modelling results in Table 3 for time-series data of South Korea and Iran. The estimated logistic growth function and actual values for new cases in South Korea (a) and Iran (b), and the estimated growth function and actual values for cumulative cases in South Korea (c) and Iran (d) are represented. I is the observed value and I_hat is . As in Fig. 1, this study added a constant that was the difference between the mean of all observed data points and that of all predicted data points in the integration process. The cutoff day in (a) and (c), namely March 13, 2020, distinguished the training period from the testing period. This is also the case for cutoff 1 in graphs (c) and (d). However, for Iran, this study developed two other models, shown with dotted lines, based on different cutoff dates at t = 30 (cutoff 2, March 21, 2020) and t = 50 (cutoff 3, April 9, 2020).

Modelling results for national, Hubei, and non-Hubei time-series data CI, confidence interval. Table 1 shows the modelling results of Equation (4) estimated for time-series data of new COVID-19 cases in China, Hubei, and non-Hubei provinces. While the study includes the calculations for the 95% confidence interval for P0, it is not provided within Tables 1, 2 and 3 due to space limitations. The training period lasted from January 20, 2020 to March 13, 2020, whereas the testing period lasted from March 14, 2020 to April 18, 2020. The ‘Train’ column includes the training data, the ‘Test’ column includes the testing data, and the ‘Total’ column includes the training and testing data combined. Graphical representation of the modelling results in Table 1. The estimated logistic growth function and actual values for new reported cases nationally (a), in Hubei Province (b), and in non-Hubei provinces (c), and the estimated and actual values for cumulative cases nationally (d), in Hubei Province (e), and in non-Hubei provinces (f) are represented. I is the observed value and I_hat is . Graphing Equation (1) using estimated parameters from Equation (3) involved more than plugging parameters in. When integration occurs, one needs to add some constant C, and therefore this study added a constant that was the difference between the mean of all observed data points and that of all predicted data points. The cutoff day, namely March 13, 2020, distinguished the training period from the testing period. The dotted lines of l95mean and u95mean show the 95% confidence interval of the mean predicted value at a given time (t). Modelling results for ten selected non-Hubei provinces and municipalities in China CI, confidence interval. Table 2 shows the modelling results of Equation (4) estimated for time-series data of new COVID-19 cases in ten non-Hubei provinces. The ten provinces chosen were those that had the highest numbers of cumulative cases or those that were significant to China's economy. The ‘Train’ column includes the training data, the ‘Test’ column includes the testing data, and the ‘Total’ column includes the training and testing data combined. Graphical representation of the modelling results in Table 2 for time-series data of ten provinces in China. Due to space limitations, this figure does not present the scatter plot data of observed values for each province/municipality. The scatter plot and fitted curves for each province/municipality have been drawn, and they are available upon request. Modelling results for nine selected countries CI, confidence interval. Table 3 shows the modeling results of Equation (4) estimated for time-series data of new COVID-19 cases in nine nations. Apart from Iran and South Korea, the model failed to reach optimal results for the other seven nations. Six of the listed nations (France, Germany, Iran, Italy, Spain, and the USA) represented those with the most cumulative cases as of March 13, 2020. The three others (Japan, South Korea, and Singapore) are nations that are situated close to China and underwent early outbreaks. The ‘Train’ column includes the training data, the ‘Test’ column includes the testing data, and the ‘Total’ column includes the training and testing data combined. Graphical representation of the modelling results in Table 3 for time-series data of South Korea and Iran. The estimated logistic growth function and actual values for new cases in South Korea (a) and Iran (b), and the estimated growth function and actual values for cumulative cases in South Korea (c) and Iran (d) are represented. I is the observed value and I_hat is . As in Fig. 1, this study added a constant that was the difference between the mean of all observed data points and that of all predicted data points in the integration process. The cutoff day in (a) and (c), namely March 13, 2020, distinguished the training period from the testing period. This is also the case for cutoff 1 in graphs (c) and (d). However, for Iran, this study developed two other models, shown with dotted lines, based on different cutoff dates at t = 30 (cutoff 2, March 21, 2020) and t = 50 (cutoff 3, April 9, 2020). Based on Figures 1, 2 and 3, the regression model fits the testing data well. However, its R 2 values in the testing period, as indicated in Table 1, Table 2, do not show consistent results. This is due to the fact that the testing and training time-series data inherently occupy two distinct portions of the non-linear curve, wherein the training period covers the peak of the outbreak with most of the cases, and the testing period covers only the flat, right-end tail with few cases. This study identified that when the predicted value reached close to 0 in the testing period and new cases became sporadic, the R 2 value ended up unstable - sometimes almost 0 and other times surpassing 1 - a finding also consistent with Greene (Greene, 1997). As a result, the R 2 may be different between distinct sections of the non-linear curve, and may be misleading when it is used to compare goodness-of-fit between these portions.6 The model may not carry the strikingly high R 2 values that appear when fitted to cumulative data. However, the results avoid the problem of bias and the underestimation of uncertainty, yielding more realistic estimates. Notably, as shown in Fig. 1, national COVID-19 cases will be around 71 954.6 (95% CI 64 640.2–79 268.9), the Hubei, China cases will be around 58 221.3 (95% CI 51 319–65 123.5), and the non-Hubei, China cases will be around 13 426.1 (95% CI 12 810.6–14 041.6). It is important to note that the confidence intervals exclude the cases removed during the data adjustment process, as mentioned in the Methods section. While parameters K and P 0 reflect innate regional differences in population and size, this study demonstrated that the growth rate r might differ between provinces in China resulting from variations in local control measures and policies. Through the restriction test, the study showed that the calculated test statistic was 3.84863 with a corresponding p-value of 0.00010, and rejected the null hypothesis that r was the same among provinces and municipalities (Table 4 ). These findings corroborate previous studies showing that the degree of success in the control of proliferation has been influenced by a host of factors, ranging from individual patient response (Lau et al., 2016) to control and precautionary measures taken. More specifically, these may include differences in quarantine protocols, city-wide lockdowns, and travel restrictions, as well as distinctions in local cultures and behavior. Therefore, there is potential for others to study COVID-19 proliferation in various regions in China with the aim of strengthening viral prevention measures as cases surge internationally.

Table 4

Parameter restriction test results

Restriction test	Restricted parameter	ESS_r	ESS_f	q	T–n	Test statistic	p-Value	Test results
10 Provinces/Municipalities	r	26 777.3	24 985.5	9	483	3.84863	0.00010	Reject null hypothesis
South Korea – Iran	r	1 370 869	1 185 307	1	70	10.9586	0.00083	Reject null hypothesis

Table 4 shows the restriction test results on growth rate r. The null hypothesis was that the growth rates were the same among the ten provinces or the same between South Korea and Iran. ESSr is the residual sum of squares with restrictions, ESSf is the residual sum of squares without restrictions, and qrepresents the number of restrictions imposed upon the growth rates. The Test statistic is calculated based upon Equation (7).

Parameter restriction test results Table 4 shows the restriction test results on growth rate r. The null hypothesis was that the growth rates were the same among the ten provinces or the same between South Korea and Iran. ESSr is the residual sum of squares with restrictions, ESSf is the residual sum of squares without restrictions, and qrepresents the number of restrictions imposed upon the growth rates. The Test statistic is calculated based upon Equation (7). In regards to the modelling process for the nine countries listed in Table 3, the model failed to generate reliable estimates for seven out of the nine countries when no clear, discernable date of maximal increase in COVID-19 cases existed according to the data as of March 13, 2020. Consequently, when a model based on Equation (4) failed to reach optimal results, the corresponding region was likely in the early stages of an outbreak as of March 13, 2020.7 In the two nations that did yield reliable estimates, South Korea and Iran, this study predicted COVID-19 cases to reach 8080 (95% CI 7126.2–9033.8) and 19 604.5 (95% CI 10 378.3–28 830.8), respectively. A restriction test was also conducted for the growth rate r between these two countries, the results of which are displayed in Table 4. The test statistic was 10.9586 and the p-value was 0.00083, and this study, again, rejected the null hypothesis that r was the same between South Korea and Iran. In the case of South Korea, the regression model provided a good fit in the training data, but the fitness in the testing period was less satisfactory because the model failed to capture the additional new cases in the testing period. Modelling the data for Iran was a significantly more complicated process. Fig. 3 indicates a multitude of peaks in the time-series data of confirmed new cases of COVID-19 in Iran. As a result, the initial logistic model built using data up to March 13, 2020 failed to capture new peaks after that date. When estimating the regression model using more data and a cutoff date set at t = 30 and t = 50 (March 21, 2020 and April 9, 2020, respectively), this study demonstrated how the logistic regression model had evolved and fitted the data better as new information became available. The results of the residual diagnosis tests from Table 5 and Fig. 4 show both the raw and projected residuals have their distributions centered around 0, and their predicted values closely follow a 45-degree line with their corresponding actual values. This study identifies the existence of positive serial correlation in two of the ten selected non-Hubei provinces (Zhejiang and Jiangsu) and in Iran based on either the raw or projected residuals. In the rest of the non-Hubei provinces of China as well as in South Korea, the DW test failed to reject the null hypothesis of no serial correlation in residuals. Furthermore, with the exception of Beijing, Guangdong and Shanghai, China, this study identified the existence of heteroscedasticity in all other selected non-Hubei provinces and cities, suggesting the variance in error terms largely existed and that it may vary depending on different stages of the proliferation. Accordingly, whereas this study's estimates are still consistent, they are not efficient (Pindyck and Rubinfeld, 1991). The findings suggest that further research may be needed to develop more efficient estimators of the model.

Table 5

Durbin–Watson and White test results

Country/province	Type of residual	Durbin–Watson test		White test
		DW statistic	Result	Chi-square value	Result
National	Raw	1.796	Fail to reject	3.198	Fail to reject
	Projected	2.123	Fail to reject	2.558	Fail to reject
Hubei	Raw	1.911	Fail to reject	3.198	Fail to reject
	Projected	2.196	Fail to reject	2.558	Fail to reject
Non-Hubei	Raw	1.043	Reject	19.244	Reject
	Projected	1.629	Fail to reject	5.120	Fail to reject
Anhui	Raw	2.280	Fail to reject	14.810	Reject
	Projected	2.461	Fail to reject	16.203	Reject
Beijing	Raw	2.041	Fail to reject	3.855	Fail to reject
	Projected	2.232	Fail to reject	3.295	Fail to reject
Chongqing	Raw	1.941	Fail to reject	21.915	Reject
	Projected	2.090	Fail to reject	21.986	Reject
Guangdong	Raw	2.120	Fail to reject	4.0863	Fail to reject
	Projected	2.482	Fail to reject	5.406	Fail to reject
Hunan	Raw	1.993	Fail to reject	6.115	Reject
	Projected	2.190	Fail to reject	7.609	Reject
Henan	Raw	1.937	Fail to reject	13.720	Reject
	Projected	2.259	Fail to reject	20.010	Reject
Jiangsu	Raw	0.767	Reject	6.980	Reject
	Projected	1.531	Indeterminate	9.280	Reject
Jiangxi	Raw	1.549	Indeterminate	8.200	Reject
	Projected	1.826	Fail to reject	8.112	Reject
Shanghai	Raw	1.850	Fail to reject	1.850	Fail to reject
	Projected	1.901	Fail to reject	1.901	Fail to reject
Zhejiang	Raw	1.301	Reject	20.455	Reject
	Projected	1.650	Fail to reject	23.465	Reject
Iran	Raw	1.193	Reject	3.712	Reject
	Projected	2.088	Fail to reject	5.329	Reject
South Korea	Raw	1.776	Fail to reject	2.253	Fail to reject
	Projected	2.100	Fail to reject	2.798	Fail to reject

Table 5 shows the results of the Durbin–Watson test and White test conducted on both raw and projected residuals. The critical value for the DW test with the number of explanatory variables being 1 was 1.51 to 1.59, and ‘Reject’ indicated serial correlation issues. The critical value for the White test was 5.99, and ‘Reject’ indicated heteroskedasticity. Both the DW test and White test were conducted at the 5% significance level. The residual diagnosis was not performed for the seven other nations not listed in this table due to the lack of optimal model estimates.

Fig. 4

Residual diagnosis for the non-Hubei regression model. The results in this figure are based on the non-Hubei time-series data; the results for other regions are available upon request.

Durbin–Watson and White test results Table 5 shows the results of the Durbin–Watson test and White test conducted on both raw and projected residuals. The critical value for the DW test with the number of explanatory variables being 1 was 1.51 to 1.59, and ‘Reject’ indicated serial correlation issues. The critical value for the White test was 5.99, and ‘Reject’ indicated heteroskedasticity. Both the DW test and White test were conducted at the 5% significance level. The residual diagnosis was not performed for the seven other nations not listed in this table due to the lack of optimal model estimates. Residual diagnosis for the non-Hubei regression model. The results in this figure are based on the non-Hubei time-series data; the results for other regions are available upon request. In conclusion, this simple, three-parameter logistic growth function for reported COVID-19 cases estimated by an NLS method presents certain insights for current and future studies of outbreaks. This study's findings demonstrate that the model fitted the data in China exceedingly well, and the study was able to provide estimates for COVID-19 cases and compare the speed of proliferation among regions. However, the model failed to provide estimates for outbreaks in their early stages, and only yielded results after there was a definitive day of maximal increase in cases. Conducting a restriction test on the r parameter, the study found that between provinces in China and between other countries alike, the growth rates of COVID-19 differed and the study conjectured that this was due to disparities in local public health policies, societal behavior, patient response, etc. Accordingly, there is potential for this model to contribute to formulating better policies towards combatting COVID-19 by retrospectively analyzing the outbreaks in regions such as provinces in China. This study also observed that in non-linear regressions, the R 2 value varied between different sections of the non-linear curve, and that the existence of heteroscedasticity and serial correlation in some provinces and countries warrant further research. In summary, the study's findings show that in a relatively isolated environment such as China, where control measures are consistently strict and regulatory, the logistic regression model fits very well. However, when other factors become prevalent — such as diverging public health control practices and imported cases from abroad — the proliferation of infectious diseases may complicate research methods and a single logistic growth model may not suffice.

Declarations

Funding: Not applicable. Ethical approval and consent to participate: The need for ethical approval or individual consent was not applicable. Availability of data and materials: All data and materials used in this work are publicly available. Consent for publication: Not applicable. Conflict of interest: No conflict of interest to declare.

8 in total

1. Testing for serial correlation in least squares regression. I.

Authors: J DURBIN; G S WATSON
Journal: Biometrika Date: 1950-12 Impact factor: 2.445

2. Efficacy and safety of 3-week response-guided triple direct-acting antiviral therapy for chronic hepatitis C infection: a phase 2, open-label, proof-of-concept study.

Authors: George Lau; Yves Benhamou; Guofeng Chen; Jin Li; Qing Shao; Dong Ji; Fan Li; Bing Li; Jialiang Liu; Jinlin Hou; Jian Sun; Cheng Wang; Jing Chen; Vanessa Wu; April Wong; Chris L P Wong; Stella T Y Tsang; Yudong Wang; Leda Bassit; Sijia Tao; Yong Jiang; Hui-Mien Hsiao; Ruian Ke; Alan S Perelson; Raymond F Schinazi
Journal: Lancet Gastroenterol Hepatol Date: 2016-07-25

3. Generalized logistic growth modeling of the COVID-19 outbreak: comparing the dynamics in the 29 provinces in China and in the rest of the world.

Authors: Ke Wu; Didier Darcet; Qian Wang; Didier Sornette
Journal: Nonlinear Dyn Date: 2020-08-19 Impact factor: 5.022

4. Avoidable errors in the modelling of outbreaks of emerging pathogens, with special reference to Ebola.

Authors: Aaron A King; Matthieu Domenech de Cellès; Felicia M G Magpantay; Pejman Rohani
Journal: Proc Biol Sci Date: 2015-05-07 Impact factor: 5.349

5. Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study.

Authors: Joseph T Wu; Kathy Leung; Gabriel M Leung
Journal: Lancet Date: 2020-01-31 Impact factor: 79.321

6. Preliminary estimation of the basic reproduction number of novel coronavirus (2019-nCoV) in China, from 2019 to 2020: A data-driven analysis in the early phase of the outbreak.

Authors: Shi Zhao; Qianyin Lin; Jinjun Ran; Salihu S Musa; Guangpu Yang; Weiming Wang; Yijun Lou; Daozhou Gao; Lin Yang; Daihai He; Maggie H Wang
Journal: Int J Infect Dis Date: 2020-01-30 Impact factor: 3.623

7. Real-time forecasts of the COVID-19 epidemic in China from February 5th to February 24th, 2020.

Authors: K Roosa; Y Lee; R Luo; A Kirpich; R Rothenberg; J M Hyman; P Yan; G Chowell
Journal: Infect Dis Model Date: 2020-02-14

8. Modeling the epidemic dynamics and control of COVID-19 outbreak in China.

Authors: Shilei Zhao; Hua Chen
Journal: Quant Biol Date: 2020-03-11

8 in total

16 in total

1. Convex-Concave fitting to successively updated data and its application to covid-19 analysis.

Authors: Demetrius E Davos; Ioannis C Demetriou
Journal: J Comb Optim Date: 2022-06-25 Impact factor: 1.262

2. On computational analysis of nonlinear regression models addressing heteroscedasticity and autocorrelation issues: An application to COVID-19 data.

Authors: Mintodê Nicodème Atchadé; Paul Tchanati P
Journal: Heliyon Date: 2022-10-12

3. Predict Mortality in Patients Infected with COVID-19 Virus Based on Observed Characteristics of the Patient using Logistic Regression.

Authors: Bernhard O Josephus; Ardianto H Nawir; Evelyn Wijaya; Jurike V Moniaga; Margaretha Ohyver
Journal: Procedia Comput Sci Date: 2021-02-19

4. Mechanisms of recurrent outbreak of COVID-19: a model-based study.

Authors: Chuanliang Han; Meijia Li; Naem Haihambo; Pius Babuna; Qingfang Liu; Xixi Zhao; Carlo Jaeger; Ying Li; Saini Yang
Journal: Nonlinear Dyn Date: 2021-03-18 Impact factor: 5.022

5. Short-term real-time prediction of total number of reported COVID-19 cases and deaths in South Africa: a data driven approach.

Authors: Tarylee Reddy; Ziv Shkedy; Charl Janse van Rensburg; Henry Mwambi; Pravesh Debba; Khangelani Zuma; Samuel Manda
Journal: BMC Med Res Methodol Date: 2021-01-11 Impact factor: 4.615

6. Modeling Performance of Microservices Systems with Growth Theory.

Authors: Matteo Camilli; Barbara Russo
Journal: Empir Softw Eng Date: 2022-01-11 Impact factor: 3.762

7. Towards predicting COVID-19 infection waves: A random-walk Monte Carlo simulation approach.

Authors: D P Mahapatra; S Triambak
Journal: Chaos Solitons Fractals Date: 2022-01-10 Impact factor: 5.944

8. Predicting of the Coronavirus Disease 2019 (COVID-19) Epidemic Using Estimation of Parameters in the Logistic Growth Model.

Authors: Agus Kartono; Setyanto Tri Wahyudi; Ardian Arif Setiawan; Irmansyah Sofian
Journal: Infect Dis Rep Date: 2021-05-24

9. COVID-19 in India: Statewise Analysis and Prediction.

Authors: Palash Ghosh; Rik Ghosh; Bibhas Chakraborty
Journal: JMIR Public Health Surveill Date: 2020-08-12

10. Mid-Epidemic Forecasts of COVID-19 Cases and Deaths: A Bivariate Model Applied to the UK.

Authors: Peter Congdon
Journal: Interdiscip Perspect Infect Dis Date: 2021-02-12