
Reconstructing and forecasting the COVID-19 epidemic in the United States using a 5-parameter logistic growth model.

Ding-Geng Chen, Xinguang Chen, Jenny K Chen.

Abstract

Background: Many studies have modeled and predicted the spread of COVID-19 (coronavirus disease 2019) in the U.S. using data that begin with the first reported cases. However, the shortage of testing services to detect infected persons makes this approach subject to error, because early cases in the U.S. were underdetected. Our new approach overcomes this limitation and provides data supporting the public policy decisions intended to combat the spread of the COVID-19 epidemic.
Methods: We used Centers for Disease Control and Prevention data documenting the daily new and cumulative cases of confirmed COVID-19 in the U.S. from January 22 to April 6, 2020, and reconstructed the epidemic using a 5-parameter logistic growth model. We fitted our model to data from a 2-week window (i.e., from March 21 to April 4, approximately one incubation period) during which large-scale testing was being conducted. With parameters obtained from this modeling, we reconstructed and predicted the growth of the epidemic and evaluated the extent and potential effects of underdetection.
Results: The data fit the model satisfactorily. The estimated daily growth rate was 16.85% overall (95% CI: [15.95%, 17.76%]), suggesting a doubling period of about 4 days. Based on the modeling result, the tipping point at which new cases begin to decline will be April 7, 2020, with a peak of 32,860 new cases on that day. By the end of the epidemic, at least 792,548 people (95% CI: [789,162, 795,934]) will be infected in the U.S. Based on our model, a total of 12,029 cases were not detected between January 22 (when the first case was detected in the U.S.) and April 4.
Conclusions: Our findings demonstrate the utility of a 5-parameter logistic growth model with reliable data that come from a specified period during which governmental interventions were appropriately implemented. Beyond informing public health decision-making, our model adds a tool for more faithfully capturing the spread of the COVID-19 epidemic.


Keywords: COVID-19; Disease dynamics; Epidemics; Logistic growth model; Population-based model; Prediction; Reconstruction; Tipping point; USA; Under-detection

Year: 2020 PMID: 32435695 PMCID: PMC7225094 DOI: 10.1186/s41256-020-00152-5

Source DB: PubMed Journal: Glob Health Res Policy ISSN: 2397-0642

Introduction

Coronavirus disease 2019 (COVID-19) is an infection caused by a novel pathogen named SARS-CoV-2. Spreading worldwide in less than five months, the COVID-19 pandemic is a typical example of a global health issue [1]. In the months since the first COVID-19 case was reported in the United States on January 22, 2020, many studies have employed different models to reconstruct the epidemic (i.e., the spread of COVID-19 within the United States only) and forecast its future trends, from simple growth models to classic susceptible-infected-recovered models [2]. Yet because information about the early period of the COVID-19 epidemic is scarce, researchers lack sufficient data to construct complex and classic epidemiological models. In this context, the population-based ecological growth model is the preferable option for predicting the epidemic’s future trajectory. Researchers have developed various population-based models for modeling population dynamics and disease epidemics. One such model is the 1-parameter exponential growth model. In this model, population growth has no upper limit and is determined by a single growth-rate parameter. To account for the upper limit of population growth, the 2-parameter logistic growth model was developed. In this model, population growth is exponential in the beginning, but the growth rate becomes progressively smaller as population size approaches a maximum carrying capacity, as described in detail by Richards [3], McIntosh [4], Renshaw [5], Kingsland [6], and Vandermeer [7]. To account for additional key characteristics of population growth, the 2-parameter logistic growth model has since been extended to 3-parameter, 4-parameter, and 5-parameter logistic growth models. These models have been widely used in other fields of research, including demography and analytical chemistry [8, 9].
Despite the many analytical advantages of these models, to our knowledge, no study has employed the 5-parameter logistic growth model to examine the COVID-19 epidemic in the United States or in other countries. Thus, one purpose of this study is to assess the utility of the 5-parameter growth model in studying the dynamics of the spread of COVID-19. Unlike typical population growth models (in which the initial population is a known quantity), only a small number of COVID-19 cases were detected during the early phase of the epidemic in the United States. In all contexts, more extensive testing services detect more cases; when the initial time of an epidemic’s outbreak is known, extensive testing can yield data that more accurately reflect the true growth of the epidemic. Data indicate that the incubation period of COVID-19 is about 14 days [10], and COVID-19 testing services in the U.S. became available in mid-March and were sustained thereafter following CDC guidelines. Therefore, the 14-day interval following the widespread implementation of testing should show the highest detection rates, unaffected by the removal of infected individuals from the growth curve, presenting ideal data for model building. In principle, a model built with these data would more accurately capture and predict the growth of COVID-19 than models constructed from infection data ranging from the first detected case to the present.

Methods

Data

Data for this study were the daily cumulative cases of COVID-19 in the U.S. from January 22 to April 6, 2020. These real-time data were compiled by the Centers for Disease Control and Prevention (CDC) and made available on their website at the time we conducted our study [11].

Models

We modeled the data using the 5-parameter logistic growth model as below:

$$C(t) = C_{min} + \frac{C_{max} - C_{min}}{\left[1 + e^{-r(t - t_{mid})}\right]^{\alpha}} \qquad \text{(Model 1)}$$

where C(t) is the number of cumulative cases of COVID-19 over time, t (t = 1/22/2020, 1/23/2020, …, 4/6/2020); C_min is the minimum number of cases at the beginning of the epidemic on January 22, 2020, when the first case was reported in the U.S.; C_max is the maximum number of people infected by the time the epidemic ends (i.e., the model-predicted total number of Americans who will be infected with COVID-19); r is the daily exponential growth rate; t_mid is the estimated tipping point when the number of new daily cases begins to level off and then to decrease; and α is an asymmetric parameter quantifying the skewness of the distribution of daily new cases. α = 1 indicates a symmetric distribution centered at t_mid; α > 1 indicates faster increases in new cases before t_mid and slower after t_mid; and the reverse if α < 1. With Model 1 defined above, daily new cases D(t) can be obtained by taking the first derivative of the model:

$$D(t) = \frac{dC(t)}{dt} + \epsilon(t) = \frac{\alpha \, r \, (C_{max} - C_{min}) \, e^{-r(t - t_{mid})}}{\left[1 + e^{-r(t - t_{mid})}\right]^{\alpha + 1}} + \epsilon(t) \qquad \text{(Model 2)}$$

where the error term ε(t) is assumed to be normally distributed with mean 0 and standard deviation σ.
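As an illustration of the two models described above, the following minimal Python sketch evaluates cumulative and daily new cases from the five parameters. It assumes the generalized-logistic form C(t) = C_min + (C_max − C_min) / [1 + exp(−r(t − t_mid))]^α, which is an assumption of this sketch; the function names are hypothetical, the parameter values are the rounded estimates from Table 1, and days are counted with day 1 = January 22, 2020. The authors worked in R, so this is not their code.

```python
import math

# Point estimates copied from Table 1 (t_mid is rounded to one decimal there,
# so results agree with Table 2 only to within roughly 1%).
C_MIN, C_MAX = 29.999, 792_548.0
T_MID, R, ALPHA = 76.9, 0.16854, 0.95364


def cumulative_cases(t, c_min=C_MIN, c_max=C_MAX, t_mid=T_MID, r=R, alpha=ALPHA):
    """Model 1 (assumed functional form): cumulative cases on day t."""
    return c_min + (c_max - c_min) / (1.0 + math.exp(-r * (t - t_mid))) ** alpha


def daily_new_cases(t, c_min=C_MIN, c_max=C_MAX, t_mid=T_MID, r=R, alpha=ALPHA):
    """Model 2 without the error term: the first derivative of Model 1."""
    z = math.exp(-r * (t - t_mid))
    return alpha * r * (c_max - c_min) * z / (1.0 + z) ** (alpha + 1.0)


if __name__ == "__main__":
    # Days 60, 74 and 77 correspond to March 21, April 4 and April 7.
    for day in (60, 74, 77):
        print(day, round(cumulative_cases(day)), round(daily_new_cases(day)))
```

With these rounded inputs the values for day 77 land within about 1% of the Table 2 entries (414,302 cumulative and 32,860 daily); closer agreement would need the unrounded estimates.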

Implementation of modeling analysis

We conducted our data analysis using the software R. A 5-parameter logistic growth model was fitted to the data for new daily infections from March 21, 2020 to April 4, 2020, as shown in Model 2. Using the R function “optim,” we implemented modeling analysis using a nonlinear optimization algorithm to minimize the sum of squared errors between the observed and model-estimated data. The optimization process yielded estimates for the five parameters C_min, C_max, t_mid, r, and α, with a significance level set at p < 0.05 (two-sided). With these five estimated model parameters, we estimated model-based cumulative cases (using Model 1) and new cases (using Model 2) for each day from March 21 to April 4 and made predictions about cumulative and new daily cases after April 4. We calculated the underdetection of cases in this 2-week window by measuring the differences between the reported number and the model-predicted number of cases.
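The least-squares criterion described above can be sketched in Python as an objective function over the five parameters. This is a hedged illustration, not the authors' R code: the functional form of the daily-case curve is assumed to be the derivative of a generalized logistic, the names `daily_new_cases` and `sum_squared_errors` are hypothetical, and the observed counts are the reported daily cases for days 60 to 74 copied from Table 2.

```python
import math

# Reported daily new cases for days 60-74 (March 21 to April 4), from Table 2.
DAYS = list(range(60, 75))
OBSERVED_DAILY = [5836, 8821, 10779, 10270, 13987, 16916, 17965, 19332,
                  18251, 22635, 22562, 27043, 26135, 37926, 27621]


def daily_new_cases(t, c_min, c_max, t_mid, r, alpha):
    """Assumed form of Model 2 without the error term."""
    z = math.exp(-r * (t - t_mid))
    return alpha * r * (c_max - c_min) * z / (1.0 + z) ** (alpha + 1.0)


def sum_squared_errors(params, days=DAYS, observed=OBSERVED_DAILY):
    """Objective to minimize; params = (c_min, c_max, t_mid, r, alpha)."""
    return sum((obs - daily_new_cases(t, *params)) ** 2
               for t, obs in zip(days, observed))


# Rounded point estimates from Table 1, and a deliberately poor alternative
# (C_max doubled) to show that the objective penalizes a worse fit.
TABLE1_ESTIMATES = (29.999, 792_548.0, 76.9, 0.16854, 0.95364)
DOUBLED_C_MAX = (29.999, 2 * 792_548.0, 76.9, 0.16854, 0.95364)

if __name__ == "__main__":
    print(sum_squared_errors(TABLE1_ESTIMATES))
    print(sum_squared_errors(DOUBLED_C_MAX))
```

An objective of this shape could be handed to any general-purpose minimizer (for example `scipy.optimize.minimize` in Python, as an analogue of R's `optim`); the choice of starting values and algorithm is left open here because the text does not state them.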

Results

Model 2 fitted the observed daily new cases from March 21 to April 4 satisfactorily, and the model fit converged well. Table 1 summarizes the estimated parameters, their standard errors (SE), and their 95% confidence intervals (CI). Except for C_min, all model parameters were statistically significant at the p < 0.001 level. The lack of significance for C_min appears to be reasonable given the small scale of this number relative to the other parameters and the practical difficulties of determining the number of actual cases at the beginning of the epidemic when the first few COVID-19 cases were detected and reported.

Table 1

Summary of parameter estimation

Parameter	Estimate	SE	p-value	Lower 95% CI	Upper 95% CI
C_min	29.999	2059.86	0.988	−4007.33	4067.32
C_max	792,548	1727.56	< 0.0001	789,162	795,934
t_mid	76.9	0.456	< 0.0001	75.952	77.739
r	0.16854	0.00463	< 0.0001	0.15947	0.17761
α	0.95364	0.06194	< 0.0001	0.83224	1.07504

Note: Parameters were estimated based on daily cases of COVID-19 in the U.S. between March 21, 2020 and April 4, 2020

Based on our model estimates, at least 792,548 (95% CI: [789,162, 795,934]) Americans will have been infected with COVID-19 by the time the epidemic ends. This number is slightly more than twice the number of infections that had occurred in the U.S. by April 6. For reasons we discuss later, this estimate may be conservative, as the total number of reported cases exceeded 800,000 on April 21, as we completed our revisions of this paper. Our estimated tipping point for new daily cases was on about April 7, 77 days (95% CI: [76, 78]) from the beginning of the epidemic on January 22. In other words, our model predicted that the epidemic curve in the U.S. would begin to flatten around April 6–8, 2020. This estimate is consistent with recent reporting that new daily cases in the U.S. have remained roughly constant since early April [12]. This tipping point suggests that it will take three to four more COVID-19 incubation periods (i.e., 6 to 8 weeks) for the U.S. to bring the epidemic under control, given our documentation and analysis of this process in China [10] (Chen X, Yu B, Chen D: Three month of COVID-19 in China: surveillance, evaluation, and forecast from outbreak to control with a second derivation model, submitted). The estimated exponential daily growth rate of COVID-19 in the U.S. population is 16.85% (95% CI: [15.95%, 17.76%]), close to the rate observed in China (17.12%) [10]. This U.S. rate suggests that the number of total COVID-19 cases in the U.S. will double about every four days if no anti-epidemic actions are in place. The estimated asymmetric parameter α was 0.954 (95% CI: [0.832, 1.075]), which does not differ significantly from α = 1.0. This result indicates that changes in COVID-19 cases before and after the predicted tipping point of April 7 will follow a similar pattern.
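The doubling time and the calendar date of the tipping point quoted above follow from simple arithmetic, sketched here in Python. The sketch assumes continuous exponential growth, so that the doubling time is ln(2)/r, and assumes the day count starts with day 1 = January 22, 2020 (consistent with Table 2, where day 54 is March 15); the helper names are hypothetical.

```python
import math
from datetime import date, timedelta

R = 0.16854                          # daily exponential growth rate (Table 1)
R_LOWER, R_UPPER = 0.15947, 0.17761  # 95% CI bounds for r (Table 1)
EPIDEMIC_START = date(2020, 1, 22)   # treated as day 1 of the epidemic


def doubling_time(rate):
    """Days needed for cases to double under continuous growth at `rate`."""
    return math.log(2.0) / rate


def day_to_date(day):
    """Convert an integer epidemic day (day 1 = January 22) to a calendar date."""
    return EPIDEMIC_START + timedelta(days=day - 1)


if __name__ == "__main__":
    print(round(doubling_time(R), 2))        # about 4.11 days
    print(round(doubling_time(R_UPPER), 1),  # faster growth, shorter doubling
          round(doubling_time(R_LOWER), 1))
    print(day_to_date(77))                   # 2020-04-07
```

Under these assumptions the point estimate gives a doubling time of roughly 4.1 days, and the bounds of the confidence interval for r give roughly 3.9 to 4.3 days.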
For further illustration, Table 2 summarizes three sets of information ordered by days from the beginning of the epidemic: the data used for the model fitting section, a smaller reconstruction section, and a prediction section. Our fitted model revealed a substantial number of undetected COVID-19 cases. By April 7, when this study was completed, the CDC reported a total of 395,011 detected cases; our model predicted that the CDC data underreported about 19,291 cases up to April 7.

Table 2

Illustration of data usage with reported, predicted, and underreported counts

Data Usage	Days	Date	Reported Total	Reported Daily	Predicted Daily	Predicted Total	Under-reported
Reconstruction	54	3/15/2020	3487	1253	3108	19,781	16,294
	55	3/16/2020	4226	739	3623	23,141	18,915
	56	3/17/2020	7038	2812	4218	27,054	20,016
	57	3/18/2020	10,442	3404	4902	31,606	21,164
	58	3/19/2020	15,219	4777	5687	36,892	21,673
	59	3/20/2020	18,747	3528	6584	43,019	24,272
Fitting	60	3/21/2020	24,583	5836	7603	50,102	25,519
	61	3/22/2020	33,404	8821	8755	58,269	24,865
	62	3/23/2020	44,183	10,779	10,047	67,658	23,475
	63	3/24/2020	54,453	10,270	11,485	78,411	23,958
	64	3/25/2020	68,440	13,987	13,070	90,676	22,236
	65	3/26/2020	85,356	16,916	14,797	104,598	19,242
	66	3/27/2020	103,321	17,965	16,656	120,315	16,994
	67	3/28/2020	122,653	19,332	18,624	137,947	15,294
	68	3/29/2020	140,904	18,251	20,670	157,589	16,685
	69	3/30/2020	163,539	22,635	22,750	179,298	15,759
	70	3/31/2020	186,101	22,562	24,810	203,082	16,981
	71	4/1/2020	213,144	27,043	26,784	228,889	15,745
	72	4/2/2020	239,279	26,135	28,600	256,597	17,318
	73	4/3/2020	277,205	37,926	30,180	286,010	8805
	74	4/4/2020	304,826	27,621	31,453	316,855	12,029
Forecast	75	4/5/2020	330,891	26,065	32,352	348,791	17,900
	76	4/6/2020	374,329	43,438	32,830	381,419	7090
	77	4/7/2020	395,011	20,682	32,860	414,302	19,291
	78	4/8/2020	427,460	32,449	32,436	446,987	19,527
	79	4/9/2020	459,165	31,705	31,582	479,030	19,865
	80	4/10/2020	492,416	33,251	30,340	510,021	17,605
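The under-reported column in Table 2 is the model-predicted cumulative total minus the reported cumulative total. The short Python sketch below reproduces that subtraction for three rows copied from the table; the variable and function names are hypothetical.

```python
# (day, reported cumulative cases, model-predicted cumulative cases),
# copied from Table 2 for March 21, April 4 and April 7.
TABLE2_TOTALS = [
    (60, 24_583, 50_102),
    (74, 304_826, 316_855),
    (77, 395_011, 414_302),
]


def underreported(reported_total, predicted_total):
    """Estimated undetected cases: predicted minus reported cumulative total."""
    return predicted_total - reported_total


gaps = {day: underreported(rep, pred) for day, rep, pred in TABLE2_TOTALS}
print(gaps)  # {60: 25519, 74: 12029, 77: 19291}
```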

Using a 2-week interval (i.e., March 21 to April 4) of data, our model’s prediction of the number of new daily cases from April 5 to April 11 matched quite well with the observed data. For example, the model-predicted number on April 9 was 31,582, very close to the observed number of 31,705. These results should be interpreted with caution. The estimated root-mean-square error of 2638.434 cases per day is quite large, meaning that although our model fitted the 2-week interval of data very well, a large amount of variation in the data is not explained by this model. Below we provide two figures comparing the observed and model-predicted dynamics of new daily cases (Fig. 1) and of cumulative cases (Fig. 2). Overall, the model we constructed from only two weeks of data very closely predicted the reported numbers of both new and cumulative cases. Correspondingly, our model predicts that the cumulative cases will continue to increase rapidly after the tipping point until early May, as illustrated in Fig. 2.
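As a cross-check on the error value of 2638.434 quoted above, the residuals of the 15 fitting days can be recomputed directly from the reported and predicted daily columns of Table 2. In the Python sketch below, the root-mean-square residual comes out near 2638.6, which suggests that the quoted value is on the root-mean-square scale; this reading is an inference from the table rather than something the text states, and the small gap presumably reflects rounding of the tabulated predictions.

```python
import math

# Daily new cases for days 60-74 (March 21 to April 4), copied from Table 2.
REPORTED_DAILY = [5836, 8821, 10779, 10270, 13987, 16916, 17965, 19332,
                  18251, 22635, 22562, 27043, 26135, 37926, 27621]
PREDICTED_DAILY = [7603, 8755, 10047, 11485, 13070, 14797, 16656, 18624,
                   20670, 22750, 24810, 26784, 28600, 30180, 31453]

residuals = [obs - pred for obs, pred in zip(REPORTED_DAILY, PREDICTED_DAILY)]
sse = sum(res ** 2 for res in residuals)   # sum of squared errors
rmse = math.sqrt(sse / len(residuals))     # root-mean-square error

print(sse, round(rmse, 1))
```

The largest single residual is on April 3 (37,926 reported against 30,180 predicted), which alone accounts for more than half of the sum of squared errors.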

Fig. 1

Observed vs. model-estimated and forecasted daily new COVID-19 cases, January 22–May 30, U.S.A

Fig. 2

Observed vs. model-estimated and forecasted daily cumulative COVID-19 cases, January 22–May 30, U.S.A


Discussion

This study details our efforts to model, reconstruct, and forecast the COVID-19 epidemic using a 5-parameter logistic growth model – a method widely used in demography, biology, and other hard sciences. To our knowledge, we are the first to use this model to analyze the COVID-19 epidemic in the U.S. We also developed and used our model through an innovative approach. Namely, to fit the model we intentionally used data from a 2-week period when new cases could be more completely detected, and we then used this fitted model to reconstruct the growth of cases before and after the 2-week period as well as to forecast the future development of the epidemic beyond the study period. Based on findings from our modeling analysis, there is not a high likelihood that the number of daily new cases will increase continuously after the tipping point (i.e., April 7, 2020). However, our model’s estimation that at least 800,000 Americans will be infected over the course of the epidemic may be conservative, given that the total number of reported cases exceeded 800,000 on April 21, as we completed our revisions of this paper, while the new cases fluctuated between 26,000 and 35,000 per day due to the increased appearance of cases in other cities and states outside of New York. This conservative estimation is potentially attributable to three factors. First, the exponential growth of our logistic model is very sensitive to differences in growth rate, and a small difference in the number of early cases can lead to a sizeable difference in predictions of subsequent cases. Second, although we strategically selected a 2-week interval of data that we believed would yield the best model for predicting the epidemic’s growth, this data likely still underreported the actual number of COVID-19 infections, making our estimated growth rate smaller than the true growth rate. 
For example, the estimated exponential growth rate of COVID-19 is 17.12% for China [10], higher than 16.85%, the rate we estimated for the U.S. A small difference in the exponential growth rate can result in substantial differences in the maximum number of infections. And third, the data used for this analysis are from March 21 to April 4, 2020, when most of the reported cases were from the states of New York and New Jersey. According to CDC reports, case counts in these two states have since flattened. Still, more cases are being reported from other states, especially Michigan, Florida, and Louisiana, which, added to the cases from New York and New Jersey, would push the total beyond the 800,000 predicted. The accuracy of our model is also contingent on the federal- and state-level policy decisions that emerge in coming months. Although many states have implemented strict shelter-in-place policies to slow down the epidemic’s spread, several states still have no such policies in place. In the absence of further policy action, we expect that more cases will be reported, which may greatly surpass the estimated 800,000, and that the actual infection tipping point may occur later in April. Indeed, significant variations still persist in the estimated total infections in the U.S. even in light of available data: Ferguson et al. [13] predicted 2.2 million cases whereas the CDC’s worst-case scenario model predicted a shocking 214 million cases [14]. At this moment, it remains unclear which estimates are more reliable. The accuracy of our estimation will be tested in light of emerging data on the progression of the epidemic in the United States. The daily exponential growth rate of COVID-19 is 16.85% for the U.S. population, close to the rate observed in China (17.12%) [10].
Daily exponential growth rates can be obtained with limited data in the early period of an epidemic, and they provide a dynamic measure of instantaneous change, making doubling times calculated based on growth rate highly useful for directing and evaluating anti-epidemic measures. The U.S.’s daily exponential growth rate suggests that the number of COVID-19 infections will double every four days. For example, if the total cases are 500,000 today, there will be 1,000,000 in four days (with 40,000 anticipated deaths) if no timely anti-epidemic measures are implemented. No one – including policymakers, medical and health professionals, and the general public – should ignore this evidence of the pressing need to control the pandemic.
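The four-day doubling example above can be written out as a short worked calculation. It assumes unchecked continuous exponential growth at the estimated rate r = 0.16854 per day; the symbols N_0 and N(t) are introduced here for illustration only, and the 4% ratio is simply what the quoted figures of 40,000 deaths per 1,000,000 cases imply, not a rate stated in the text.

```latex
\begin{align*}
N(t) &= N_0 \, e^{r t}, \qquad N_0 = 500{,}000, \quad r = 0.16854 \\
N(4) &= 500{,}000 \times e^{0.16854 \times 4}
      = 500{,}000 \times e^{0.674}
      \approx 500{,}000 \times 1.962
      \approx 981{,}000 \\
\frac{40{,}000}{1{,}000{,}000} &= 4\% \quad \text{(ratio of anticipated deaths to cases implied by the example)}
\end{align*}
```

So the round figure of 1,000,000 cases after four days is a slight overstatement of the model-implied 981,000, in line with a doubling time just over four days.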

Conclusion

Understanding and curbing the COVID-19 epidemic in the U.S. is an essential part of fighting the pandemic globally [1]. This study provides data important for informing public health decision-making designed to end the epidemic in the U.S. Our study also demonstrates the utility and efficiency of the 5-parameter logistic growth model for examining the dynamics of an epidemic in its early period when few data are available. Additionally, our selection of the 5-parameter logistic growth model was based on intensive testing of other models, including 2-parameter, 3-parameter, and 4-parameter models. Of all models tested, the 5-parameter model produced the most accurate results and generated key information, including the exponential growth rate, the doubling time for the epidemic, and the tipping point when daily new cases will level off. Our study’s findings should be considered in light of their limitations. First, our strategic selection of data from a specific timeframe is more subjective than objective, and not applicable in all contexts. Researchers applying this method in different countries/regions with different anti-epidemic strategies implemented in different ways should make their own determinations regarding the optimal timeframe to select for their modeling. We selected the 2-week interval from March 21 to April 4 because this interval spans approximately one COVID-19 incubation period and because the U.S. government began implementing widespread testing services by the beginning of this period, meaning that data from this interval potentially captured a more representative set of new cases. Interested readers can conduct their own analyses using this model while expanding on this time window to further assess the utility of this method. So far, the model’s short-term predicted daily cases are quite close to the observed daily cases, as shown in Table 2.
However, our model’s long-term predictions of future new daily cases may not be accurate (which is true of any model-based long-term prediction), so these long-term predictions should be considered with caution. Second, additional work is needed to improve confidence in the accuracy of C_min, the minimum number of cases at the beginning of an epidemic. It is challenging to improve this estimation given the large range of different measures in the model. For example, the range between C_min and C_max in our analysis is from about 30 to about 800,000. Furthermore, the number of reported cases at the beginning of the epidemic is highly unreliable due to a lack of testing protocols and perhaps a lack of awareness of the incipient epidemic itself, which in turn leads to an unreliable estimation of C_min. Despite these limitations, findings from this study provide timely data that can inform public health decision-making and policies designed to end the epidemic. We will continue to update our model as more data become available and the COVID-19 epidemic in the United States continues to evolve.


1. The five-parameter logistic: a characterization and comparison with the four-parameter logistic.

Authors: Paul G Gottschalk; John R Dunn
Journal: Anal Biochem Date: 2005-08-01 Impact factor: 3.365

2. First two months of the 2019 Coronavirus Disease (COVID-19) epidemic in China: real-time surveillance and evaluation with a second derivative model.

Authors: Xinguang Chen; Bin Yu
Journal: Glob Health Res Policy Date: 2020-03-02

3. What is global health? Key concepts and clarification of misperceptions: Report of the 2019 GHRP editorial meeting.

Authors: Xinguang Chen; Hao Li; Don Eliseo Lucero-Prisno; Abu S Abdullah; Jiayan Huang; Charlotte Laurence; Xiaohui Liang; Zhenyu Ma; Zongfu Mao; Ran Ren; Shaolong Wu; Nan Wang; Peigang Wang; Tingting Wang; Hong Yan; Yuliang Zou
Journal: Glob Health Res Policy Date: 2020-04-07
