Literature DB >> 35734092

Empirical Model of Spring 2020 Decrease in Daily Confirmed COVID-19 Cases in King County, Washington.

Abstract

Projections of the near future of daily case incidence of COVID-19 are valuable for informing public policy. Near-future estimates are also useful for outbreaks of other diseases. Short-term predictions are unlikely to be affected by changes in herd immunity. In the absence of major net changes in factors that affect reproduction number (R), the two-parameter exponential model should be a standard model â€" indeed, it has been standard for epidemiological analysis of pandemics for a century but in recent decades has lost popularity to more complex compartmental models. Exponential models should be routinely included in reports describing epidemiological models as a reference, or null hypothesis. Exponential models should be fitted separately for each epidemiologically distinct jurisdiction. They should also be fitted separately to time intervals that differ by any major changes in factors that affect R. Using an exponential model, incidence-count half-life ( t 1/2 ) is a better statistic than R. Here an example of the exponential model is applied to King County, Washington during Spring 2020. During the pandemic, the parameters and predictions of this model have remained stable for intervals of one to four months, and the accuracy of model predictions has outperformed models with more parameters. The COVID pandemic can be modeled as a series of exponential curves, each spanning an interval ranging from one to four months. The length of these intervals is hard to predict, other than to extrapolate that future intervals will last about as long as past intervals.

Entities: Chemical

Year: 2022 PMID： 35734092 PMCID： PMC9216716 DOI： 10.1101/2020.05.11.20098798

Source DB: PubMed Journal: medRxiv

INTRODUCTION

Projections of the near future of daily case incidence during epidemics are valuable for informing both individuals and public policy. Simple epidemiologic theory predicts an exponential change in viral incidence in a largely susceptible population, in which relatively few individuals are immune (Ross, 1915). In 1928, Merritte Ireland reported for the 1918 influenza pandemic, “If this was really the beginning of the great epidemic wave one should expect that if these series of data were plotted out on a logarithmic scale the increase from week to week would plot out as a straight line following the usual logarithmic rise of an epidemic curve” (Ireland, 1928). The basic insight that outbreaks of infectious disease should fit an exponential model remains valid a century later. An exponential model is simultaneously simple and mechanistic. There are a mere two parameters, yet these parameters enable modeling a biologically and epidemiologically plausible mechanism. In short, a fixed rate of transmission between individuals results in an exponential change in case counts. Furthermore, the parameters of the model are interpretable. In particular, the incidence-count half-life (t1/2) has a specific, useful, and understandable meaning. Together, these five factors – historical acceptance, simplicity, plausibility, interpretability, and utility – make the exponential model best suited as the default model by which all other models should be compared.

RESULTS

A model for King County during Spring 2020 is presented here as an example of exponential model fitting. The fit interval is bounded by a start date representing widespread adoption of physical distancing public policy on March 26, 2020, and an end date of June 17, 2020 (near the end of the first wave). It spans a time interval of 82 days, or several months. The March 26 bound was selected for two reasons. First, the reproduction number (R) of SARS-CoV-2 will have changed dramatically after physical distancing (aka, “social distancing”) policies were implemented in King County earlier in March. For example, Public Health — Seattle & King County (PHSKC) began encouraging physical distancing on March 10 and Washington State issued a “stay at home” directive on March 23. The inflection point of daily case counts of March 26 chosen as a bound was not caused by a rapid change in herd immunity. It was caused by a dramatic change in R secondary to intentional changes in human behavior. This cause is markedly different in character than the cause of the inflection point in most classical models (e.g., SIR), which is due to saturation of susceptible individuals. A May 2020 estimate of cumulative infections and therefore likely fraction of immune individuals in King County is only 2.1% (Thakkar, Burstein, Klein, and Famulare, 2020), and this number has been slow to change. Therefore, an unmodified Susceptible, Infected and Recovered (SIR) model is not appropriate for this situation. Second, inspection of the raw data suggests an inflection point at or around this date. The highest confirmed case count to date occurred on April 1, reported as 219 case counts, but there is sufficient variability in the daily reported case counts that one cannot rule out a true peak of the curve occurring either several days earlier or later. The analysis is robust to slightly different choices of threshold for date, including any date from March 25 to April 3, representing the peak of the Spring 2020 COVID-19 outbreak in King County (also known as “the first wave”). Visual inspection of the raw data after March 26 also suggests that an exponential fit would be excellent. Nonlinear regression was performed on the daily confirmed case counts (Figure 1), with excellent fit (R2 = 0.87). The equation for the fitted curve is, where N = predicted daily confirmed cases and t = days after March 26:

Figure 1.

Exponential Fit to Model Decrease of Confirmed Daily Cases of COVID-19 in King County during Spring 2020. Each point is the number of confirmed cases of COVID-19 for each day as reported on June 17, 2020 by PHSKC. The date range is March 26 to June 16, 2020.

The two parameters of the model are the initial case count (N0 = 204) and a rate constant (λ = 0.025). Consequently, the half-life (t1/2) is If the case count were increasing, the half-life would be instead described as a doubling time. The King County model fit has a half-life (t1/2) of 27.2 days (95% CI: 24.6—30.1). That is, the number of daily confirmed cases is expected to drop by 50% every 27 days. With approximately 80 cases observed on May 2, it predicted about 40 cases per day on May 29, 20 cases per day on June 25, and 10 cases per day on July 22, 2020. The Institute for Disease Modeling (IDM) model for COVID-19 assumes 4-day latency after infection followed by a period of 8-days of uniform infectiousness (Thakkar, Burstein, Klein, Schripsema, and Famulare, 2020b). With these parameters, the model predicts R = 0.82 (95% CI: 0.80—0.83). The combination of simple model, large number of data points, and low residuals lead to narrow confidence intervals for these analyses. One test of a model’s utility is its ability to predict the future. Predictions from our model were first made on May 3 and ensconced in MedRxiv on May 11, 2020. The model parameters have remained stable and the predictions have largely been borne out. However, sub-analyses suggest that R has been increasing since March 26, slowly at first and more rapidly in weeks closer to the end of May (Table 1). Alternatively, or in conjunction, increases in testing have enabled a higher ratio of confirmed to true case counts. It is difficult to establish from these data alone that these changes in R are significant (see Figure S1); they could result from a combination of progressive improvement in PHSKC’s estimates of previous days’ case counts, increased number of training data points, and noise. Indeed, a previous version of this paper included a correction for testing rate that at least partially accounts for this apparent increase in R (Roach, 2020). However, an increase in R has been expected due relaxation of public health policies and increases in mass gatherings, so it is likely that these observations reflect a true increase in R.

Table 1.

Model stability.

The model’s prediction of R is relatively stable when trained over all data starting with March 26. The model was first constructed on May 2, so predictions made on dates in April are hypothetical historical predictions; they were not prospective. If the model is trained only on recent data, starting with dates in May, longer half-lives are predicted, with R approaching 1. Since data is reported by PHSKC with a one-day lag, a model with a last date of 6/16, for example, would be created on 6/17. These data suggest that R is increasing and/or that testing has enabled a higher ratio of confirmed to true case counts.

Last Date Used to Train Model	Start of Data used for Model	Prospective Prediction	Number of Training Data Points	λ	Confirmed Cases Predicted on June 30	Half-life (days)	R
6/16/20	3/26/20	Yes	83	0.0255	17.6	27.2	0.815
6/10/20	3/26/20	Yes	77	0.0268	15.7	25.9	0.807
6/1/20	3/26/20	Yes	68	0.0271	15.3	25.6	0.805
5/20/20	3/26/20	Yes	56	0.0280	14.1	24.7	0.799
5/10/20	3/26/20	Yes	46	0.0283	13.7	24.5	0.798
5/2/20	3/26/20	Yes	38	0.0306	11.0	22.6	0.783
4/20/20	3/26/20	No	25	0.0349	7.4	19.9	0.756
4/10/20	3/26/20	No	15	0.0312	10.1	22.2	0.779
6/16/20	5/10/20	Yes	38	0.0146	26.6	47.4	0.890
6/16/20	5/25/20	Yes	23	0.0022	36.5	317.1	0.983

Comparison with Prior Work

By design, this exponential model is smoother than other models. This design can create some local inconsistency with other models in the context of global agreement. The model presented here has consistencies with other data and models. In particular, reports by the Seattle Flu Study (Chu et al., 2020) and the Seattle Coronavirus Assessment Network (Greater Seattle Coronavirus Assessment Network, 2020) also show data that is consistent with an exponential decay in case rate in King County over the same period. An excellent model for the COVID-19 outbreak in King County has been developed by the Institute for Disease Modeling (IDM) (Thakkar, Burstein, Klein, and Famulare, 2020). This model is relied upon by both King County and Washington for public policy. Ideally an outbreak model should account for many hidden variables. These variables change over time. R is changing over time due to differences in population behaviors such as physical distancing. Likewise, the population is heterogeneous, and includes subpopulations such as those in dense living situations. Each of these subpopulations may contribute uniquely to a real-world model; there is no guarantee that an encompassing systems model should fit an exponential. The IDM model attempts to capture many of these variables. Furthermore, it is parameterized with the premise that these variables can change rapidly, and produce rapid and large changes in the point estimate for R. Such rapid and large changes are not typically possible to model with a two-parameter model. An additional advantage of the IDM model is that it does not require piecewise fitting. Therefore, one would expect the IDM model to better capture very short-term fluctuations in R, but to be susceptible to overfitting. Conversely, a two-parameter model limits the possibility of overfitting and should be more robust to noisy data. Consistent with this expectation, the confidence intervals for the IDM and the exponential overlap across the entire trajectory, but the point estimates for R differ considerably. The IDM reported R=0.73 (95% CI: 0.3 to 1.2) for March 25; R=0.94 (95% CI: 0.55 to 1.33) for April 4; R=0.64 (95% CI: 0.28 to 1.0) for April 15; and R=0.89 (95% CI: 0.47 to 1.31) for April 22 (Thakkar, Burstein, Klein, Schripsema, and Famulare, 2020a; Thakkar, Burstein, Klein, Schripsema, and Famulare, 2020b; Thakkar, Burstein, Klein, and Famulare, 2020; Thakkar and Famulare, 2020). The IDM models more volatility in their point estimate for R, varying from 0.94 to 0.64 and back to 0.89 over the course of three weeks. One can replicate this volatility by adding more parameters to the empirical modeling approach, such as fitting with a local smoothing algorithm (e.g., as shown in Figure 2). One possible interpretation is that the IDM model has some overfitting. Since the exponential model falls within the confidence interval of the IDM model, this inconsistency may not be relevant for informing policy, as long as confidence intervals are used for reporting data, and not point estimates.

Figure 2.

Local curve fitting may either capture daily dynamics but more likely reveals overfitting to noise. Data points are identical to those in Figure 1.Curve is fit with the R geom_smooth function (span=0.25), which produces a fit similar to some multiparameter models such as that of IDM. The increasing fit line at the end of June is consistent with a recently reported R=1.2 (King County, 2020a).

DISCUSSION

Occam’s razor philosophizes that simple models are to be favored over more complex models unless there is compelling reason. Use of a simple historically accepted model limits temptation to shop for a model that fits a preconceived notion or policy. Simple models avoid overfitting to noise. Universal use of simple models increases comparability of results across studies; more complex models such as the “Susceptible, Infected and Recovered” (SIR) model are not as readily comparable as model specifications are not universal. For example, SIR models may differ in their inclusion of vital dynamics. The number of parameters can vary highly between models. For example, Wong et al. (2020) use 21 parameters; Morozova et al. (2020) use 25 parameters. To aid in comparing predictive models, the number of parameters should be prominently published with model results. In Spring 2020 in King County, there are no significant differences between the results of more complex models and a simple exponential model; the hypothesis that complex models are superior to the exponential model should be rejected. This empirical model is sufficiently valuable that it should be used to inform public policy. It is important for public health that R be less than 1. Public policy should be crafted to keep R less than 1. If R is close to or greater than 1, more aggressive public policy measures are warranted. This model provides guidance on how close R is currently to 1. Likewise, the length of the half-life allows appropriate expectations and allocation of resources. For example, if the number of cases declines by half only every 25 days or longer, it may make sense to put less emphasis on waiting under “stay-at-home” measures for the case count to reach a much lower level and more emphasis on increasing contact tracing and other public health measures. Use of raw reported confirmed case counts to model the real incidence of COVID19 is subject to many caveats. They undercount true case incidence, perhaps by a factor of ten. However, if this factor is relatively constant, then the estimate of half-life and R is invariant. Confirmed case counts may not uniformly sample the population. The dynamics of the outbreak(s) in King County may be substantially different in different subpopulations within the county (e.g., herd immunity may be reached in a subset of long-term care facilities). Although they have in the past produced a good fit to an exponential model (Figure 3), the concurrence of parameters that sums to fit an exponential model may not persist (e.g., arrow highlights in Figure 3). The characteristics of the population that receives tests vary over time. The number of tests performed may vary over time. If tests increase, then the true incidence may decline without a corresponding decline in confirmed incidence. The model does not currently include testing rate as a parameter. Indeed, over the March to June time frame of the model, testing rate per day has not substantially altered in King County (UW Virology COVID-19 Dashboard, 2020; PHSKC website, 2020; Washington State Department of Health website, 2020). Positive tests may be delayed by 1–2 weeks after infection. If an additional ten days embargo on the data is added, such as for the IDM model, there could be close to a month lag between a policy implementation and a statistically observable effect on the model. With extreme psychological, social, medical, and economic consequences of policy decisions, there is likely to be value in policy informed on the latest data, even if those data are subject to revision. However, policy based on just-in-time data such as the model presented here must be nimble enough to alter in the event of major model errors (like would have occurred during the week prior to 5/3/20). The data and curve fit in Figure 1 best reflect a relatively smooth and continuous decline in cases coupled with a slowly increasing R. This is consistent with a society that is gradually decreasing effective social distancing, gradually increasing testing throughput, and gradually achieving pockets of herd immunity. These data and analyses are slightly inconsistent with an interpretation of rapid and substantial changes in R that might drive conflicting back-to-back reports from King County Public Health such as those of May 4, “COVID-19 transmission has slowed,” and May 8, “COVID-19 transmission rate could be rising in King County […] after previous indications the transmission rate had fallen below a critical threshold” (King County, 2020b).

Figure 3.

Trajectory of case counts in King County through early February of 2022. On this log-linear visualization, exponential curves appear as straight line segments. This echoes Ireland’s claim that, “If this was really the beginning of the great epidemic wave one should expect that if these series of data were plotted out on a logarithmic scale the increase from week to week would plot out as a straight line following the usual logarithmic rise of an epidemic curve” (Ireland, 1928). The entire course of the pandemic can be seen as a series of line segments, with sudden changes in slope at mostly unpredictable break points. This demonstrates that the COVID pandemic is excellently modeled as a series of exponential curves, but after varying intervals (ranging from one to four months long), the exponent of those curves will change, often dramatically. The length of these intervals is hard to predict, other than to assume future intervals will be about as long as past intervals. The majority of those changes result in switch in sign of the slope (e.g., from increasing to decreasing, or vice versa); the result is a sawtooth pattern. Two exceptions to this pattern are noted by arrows. The purple arrow points to a line segment with particularly high variability caused by snowstorms and holidays (circa New Year 2021). During these events, people shifted the dates they otherwise would have sought testing. The red arrow points to a period of high vaccination rate (circa April 2021), causing a prolonged period of increasing herd immunity, and thus substantial deviation from exponential behavior (resulting in a downward concave shape to the visualization of this interval).

Public policy should be to make as much data available to inform these just-in-time models as can reasonably be balanced against civil liberties and privacy considerations. In particular an understanding of what subpopulations are reporting case counts would considerably improve the value of these models for informing policy. It would be valuable for government agencies to produce and make available data including occupation and living environment to improve these models in a manner that appropriately protects privacy. The “incidence count half-life (t1/2)” metric for post-peak outbreak modeling may be a more useful metric for communicating with the popular press than Re or R0. To date, R has largely been an academic statistic, and may be more useful for communicating approximations about the biology and mechanisms of transmission of a pathogen than for formulating public health policy. It is particularly awkward and counterintuitive that a parameter with a subscript of 0 (naught) is not a constant but is highly variable and dependent on many observed and latent parameters. R is a mathematical paralog to the statistic ‘average lifetime’ (τ) used by physicists to describe radioactive decay. In that field, τ is relatively deprecated compared to t1/2; t1/2 has many times greater usage in general and public communication than τ. Even if the relationship between R and half-life is modeled using a single parameter, the number of infectious days (d), a high uncertainty in R remains unavoidable because there is considerable uncertainty in d. This parameter d may vary between individuals and may vary considerably over time for each individual. The length of time an individual is infectious and how individual infectiousness varies over time and circumstances is not known. Therefore, there is more uncertainty for estimates of R than for half-life. Half-life depends only on the observed data; therefore, a good estimate of the uncertainty in half-life can be determined from the observed case-count data. This is not true for R. Furthermore, reproduction number parameters (e.g., Re and R0) depend on many other factors that are typically incomparable between publications without expert interpretation and are neither well observed nor well known for SARS-CoV-2 (Delamater et al., 2019; Viceconte & Petrosillo, 2020). Since these factors must be estimated, a large uncertainty for R must be reported. This uncertainty in R overstates the amount of uncertainty relevant to at least some public health policy decisions. Furthermore, extra care should be used in reporting confidence intervals for R that have values near 1 or that include 1. Crossing this threshold has a large impact on public policy. A philosophy of this approach – the use of an exponential model to describe a disease outbreak – is to keep the model as simple as possible to avoid overfitting and to avoid assuming too much knowledge about the underlying processes and parameters of the outbreak. Major changes in factors that alter the biological plausibility of the model will invalidate its predictions. These changes that have overwhelming impact on model parameters may include (1) rapid changes in herd immunity, such as occur near the peak incidence of an epidemic, (2) public policy shifts, such as “shelter in place”, that are widely adopted, (3) advent of a novel variant of concern or change, and (4) changes in human behavior. Therefore, the exponential model is best used piecewise, with separate parameter fits to intervals punctuated by such events (Figure 3), as recommended by Faranda et al. (2020), “Dynamical and statistical modeling should focus on limited stages of the epidemics and restrict the analysis to specific regions, thus accounting for large uncertainties.” The model will work equally well in piecewise intervals with rising or falling daily case counts. Similarly, Read et al. (2020) suggest that dynamical and statistical modeling should focus on limited stages of the epidemics and restrict the analysis to specific regions. Extrapolation based on a simple epidemiological model cannot account for rare unpredictable events. Thus, a limited focus avoids the appearance of overconfidence.

5 in total

1. Complexity of the Basic Reproduction Number (R₀).

Authors: Paul L Delamater; Erica J Street; Timothy F Leslie; Y Tony Yang; Kathryn H Jacobsen
Journal: Emerg Infect Dis Date: 2019-01 Impact factor: 6.883

2. Early Detection of Covid-19 through a Citywide Pandemic Surveillance Platform.

Authors: Helen Y Chu; Janet A Englund; Lea M Starita; Michael Famulare; Elisabeth Brandstetter; Deborah A Nickerson; Mark J Rieder; Amanda Adler; Kirsten Lacombe; Ashley E Kim; Chelsey Graham; Jennifer Logue; Caitlin R Wolf; Jessica Heimonen; Denise J McCulloch; Peter D Han; Thomas R Sibley; Jover Lee; Misja Ilcisin; Kairsten Fay; Roy Burstein; Beth Martin; Christina M Lockwood; Matthew Thompson; Barry Lutz; Michael Jackson; James P Hughes; Michael Boeckh; Jay Shendure; Trevor Bedford
Journal: N Engl J Med Date: 2020-05-01 Impact factor: 91.245

Empirical Model of Spring 2020 Decrease in Daily Confirmed COVID-19 Cases in King County, Washington.

INTRODUCTION

RESULTS

Comparison with Prior Work

DISCUSSION

1. Complexity of the Basic Reproduction Number (R₀).

2. Early Detection of Covid-19 through a Citywide Pandemic Surveillance Platform.

3. Asymptotic estimates of SARS-CoV-2 infection counts and their sensitivity to stochastic perturbation.

4. Novel coronavirus 2019-nCoV (COVID-19): early estimation of epidemiological parameters and epidemic size estimates.

5. COVID-19 R0: Magic number or conundrum?

Empirical Model of Spring 2020 Decrease in Daily Confirmed COVID-19 Cases in King County, Washington.

INTRODUCTION

RESULTS

Comparison with Prior Work

DISCUSSION

1. Complexity of the Basic Reproduction Number (R0).

2. Early Detection of Covid-19 through a Citywide Pandemic Surveillance Platform.

3. Asymptotic estimates of SARS-CoV-2 infection counts and their sensitivity to stochastic perturbation.

4. Novel coronavirus 2019-nCoV (COVID-19): early estimation of epidemiological parameters and epidemic size estimates.

5. COVID-19 R0: Magic number or conundrum?

1. Complexity of the Basic Reproduction Number (R₀).