We propose a generalized non-linear state-space model for count-valued time series of COVID-19 fatalities. To capture the dynamic changes in daily COVID-19 death counts, we specify a latent state process that involves second-order differencing and an AR(1)-ARCH(1) model. These modelling choices are motivated by the application and validated by model assessment. We consider and fit a progression of Bayesian hierarchical models under this general framework. Using COVID-19 daily death counts from New York City's five boroughs, we evaluate and compare the considered models through predictive model assessment. Our findings justify the elements included in the proposed model. The proposed model is further applied to time series of COVID-19 deaths from the four most populous counties in Texas. These model fits illuminate dynamics associated with multiple dynamic phases and show the applicability of the framework to localities beyond New York City.
Since early 2020, the COVID-19 pandemic has posed a sustained threat to public health across the globe. With more than 100 million confirmed cases and 2.15 million confirmed deaths across 192 territories (Center for Systems Science and Engineering (CSSE) at Johns Hopkins University, 2020), the infectious respiratory disease has spread, sickened, and killed on an unprecedented scale. The staggering global totals arise from local, transient outbreaks, in which disease-spreading contacts, new cases, and subsequent deaths are concentrated in space and time. Long-range human mobility creates contacts across the globe, and a few such links can carry an epidemic between distant cities. Local contacts, however, drive the outbreaks of COVID-19, creating waves of cases and then, later on, waves of hospitalizations and deaths.
In late February and early March of 2020, following the original outbreak in China and amid the escalating situation in Europe, COVID-19 quietly took hold in New York City (NYC). At the time of the city's first confirmed case on March 1, it is estimated that there were already over 10,000 infections (Carey & Glanz, 2020). Control measures were slow to be put in place, with bars, restaurants, and schools staying open for the next 2 weeks, and no public face-covering requirement until April 15. NYC quickly became the epicentre of the pandemic, with hospitals stretched, intensive care units (ICUs) filling up, and sickened New Yorkers dying at an alarming rate.
In this paper we assess how a dynamic model can track the trends in observed COVID-19 mortality data. We propose a non-linear state-space time series model that is fitted to daily COVID-19 deaths in NYC, and we evaluate forecasts of held-out, look-ahead counts. Moreover, we fit and assess a succession of models within the same framework, where the progression from simpler to more complex is driven by the behaviour we wish to capture: volatility in the region of high curvature and the subsequent settling following the observed peak. We use second-order differencing to model these dynamics, de-trending the observed series and capturing the smoothness in the curvature of the daily deaths.
Mathematical epidemiological models can generally be characterized as either mechanistic or 'phenomenological' (Chowell et al., 2016). Existing models of COVID-19 deaths fall in both categories (Altieri et al., 2020; Cooper et al., 2020; Harvey & Kattuman, 2020; He et al., 2020). Our model falls into the second category, using observed historical data to make predictions without explicitly accounting for any epidemiological mechanisms.
Good data alongside good modelling is essential for successful COVID-19 forecasting (Jewell et al., 2020; Chin et al., 2020). We use deaths reported by the New York City Department of Health and Mental Hygiene (2020). These deaths are confirmed by the City's Office of the Chief Medical Examiner and the Health Department's Bureau of Vital Statistics, a measure that can improve uniformity and reliability (Pappas, 2020). The counts include decedents who were NYC residents and had a positive laboratory test for the coronavirus. Figure 1 shows these confirmed counts in the five NYC boroughs in the spring of 2020.
FIGURE 1
One hundred days of New York City COVID-19 mortality data. The observed peak is indicated in red.

This paper describes the statistical methods we employ and presents the results of applying these methods to observed COVID-19 death counts. Section 2.1 introduces the modelling framework, and Section 2.2 specifies the general form and progression of our considered models. Section 2.3 details the tools for predictive model assessment. In Section 3.1 we compare the progression of model fits to NYC data. Sections 3.2 and 3.3 further examine fits from the proposed model. Section 3.2 shows posterior estimates around the observed April peaks for all five NYC boroughs, whereas Section 3.3 shows fits to longer series of observed deaths from the four most populous Texas counties.
METHODS
We take a non-linear state-space model approach to modelling daily death counts. This approach fits nicely into a hierarchical modelling framework for which Bayesian estimation methods are accessible. Specifically, we use Markov chain Monte Carlo methods implemented in Stan (Carpenter et al., 2017; Stan Development Team, 2020) to fit our models, obtaining posterior samples of the model parameters. Out-of-sample look-ahead predictions are then found by drawing from posterior predictive distributions.
Model framework
Let y_t be the observed daily count on day t. We assume that y_t given λ_t follows a parametric distribution for count data with mean λ_t. The dynamics of the time-varying mean λ_t are specified through a latent state process s_t, which updates according to a density that depends on the previous state s_{t−1}.

Let p(y_t | λ_t, β) and p(s_t | s_{t−1}, β) denote the conditional observational density and the state update density, respectively, where β includes model parameters that do not vary with time. To complete the Bayesian specification, we assign prior distributions p(β) and p(s_{1:t_0}) to the conditioning parameters β and the initial states s_{1:t_0}, respectively. For second-order differencing we must initialize the first t_0 states. The model specification is
$$
y_t \mid \lambda_t, \beta \sim p(y_t \mid \lambda_t, \beta), \qquad
s_t \mid s_{t-1}, \beta \sim p(s_t \mid s_{t-1}, \beta), \quad t > t_0, \qquad
\lambda_t \mapsto s_t, \tag{1}
$$
where λ_t ↦ s_t means that λ_t is encoded in the latent state s_t. Given T observations y_{1:T}, we can fit this generative model using Monte Carlo methods and obtain estimates of the posterior distribution p(s_{1:T}, β | y_{1:T}).

We make k-day ahead forecasts based on the posterior predictive distribution
$$
p(y_{T+k} \mid y_{1:T}) = \int p(y_{T+k} \mid s_{T+k}, \beta)\, p(s_{T+1:T+k} \mid s_T, \beta)\, p(s_{1:T}, \beta \mid y_{1:T})\, \mathrm{d}s_{1:T+k}\, \mathrm{d}\beta. \tag{2}
$$
It is straightforward to sample from this distribution using the posterior samples from p(s_{1:T}, β | y_{1:T}). In particular, for b = 1, …, B we can draw
$$
(s^{(b)}_{1:T}, \beta^{(b)}) \sim p(s_{1:T}, \beta \mid y_{1:T}), \quad
s^{(b)}_{T+1:T+k} \sim p(s_{T+1:T+k} \mid s^{(b)}_T, \beta^{(b)}), \quad
y^{(b)}_{T+k} \sim p(y_{T+k} \mid s^{(b)}_{T+k}, \beta^{(b)}). \tag{3}
$$
A set of draws {y^{(b)}_{T+k}}_{b=1}^{B} from (3) approximates (2). We make point estimates and construct intervals from these draws and evaluate predictive performance by considering an associated loss. We assess model fit by considering held-out k-day ahead predictions relative to these posterior predictive draws.
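For concreteness, the following is a minimal Python sketch of the sampling scheme (3). The helper functions `step_fn` and `obs_fn` are hypothetical stand-ins for draws from the state update and observation densities; this is an illustration, not the code from the paper's appendix.

```python
import numpy as np

def forecast_k_ahead(state_draws, beta_draws, k, step_fn, obs_fn, rng):
    """Approximate the posterior predictive (2) via the scheme (3):
    for each posterior draw, advance the latent state k steps, then
    sample a count from the observation distribution at the horizon.

    state_draws : (B, dim_s) posterior draws of the final state s_T
    beta_draws  : (B, dim_b) posterior draws of the static parameters
    step_fn(s, beta, rng) -> a draw from p(s_t | s_{t-1}, beta)
    obs_fn(s, beta, rng)  -> a draw from p(y_t | lambda_t(s), beta)
    """
    draws = np.empty(len(state_draws))
    for b, (s, beta) in enumerate(zip(state_draws, beta_draws)):
        for _ in range(k):
            s = step_fn(s, beta, rng)    # propagate the latent state
        draws[b] = obs_fn(s, beta, rng)  # sample y_{T+k} given s_{T+k}
    return draws
```

Any model of the form (1) can be plugged in through the two sampler arguments.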
Models considered
Within this generalized state-space, parameter-driven modelling framework (Davis et al., 1999), we consider and fit a sequence of models. These models vary in their observation equation p(y_t | λ_t, β) and state update density p(s_t | s_{t−1}, β). For the former, we begin with the most natural and simple choice, the Poisson distribution, and later compare with the negative binomial distribution. The state update density is specified according to an underlying process for the second-order difference of η_t = log λ_t, which we denote by δ_t = η_t − 2η_{t−1} + η_{t−2}, for t > t_0. The model is
$$
\delta_t = \rho\,\delta_{t-1} + \sqrt{\alpha_0 + \alpha_1 \delta_{t-1}^2}\;\epsilon_t, \qquad \epsilon_t \overset{iid}{\sim} \mathrm{N}(0, 1). \tag{4}
$$
Under the model, δ_t is assumed to follow an AR(1)-ARCH(1) process. This parametric form of the state equation accommodates our beliefs about how second-order differences drive the dynamics of the observed daily counts. Note that because λ_t must be positive, it is natural to work with the logarithm of the underlying mean, η_t = log λ_t. In a wave of COVID-19 death counts, we expect the δ_t's to have high lag-1 autocorrelation and to settle towards 0 following the region of high curvature around the peak. Our state equation can capture this behaviour: if ρ ∈ (0, 1) and α_1 > 0, we have δ_t positively autocorrelated, with conditional expectation ρδ_{t−1} shrinking towards 0. Moreover, δ_t is more concentrated around 0 when |δ_{t−1}| is small.

The δ_t update depends on δ_{t−1} whenever at least one of ρ or α_1 is nonzero. In this case t_0 = 3, and for t ≥ 3 we explicitly define s_t as the vector of length three containing η_t, the first-order difference η_t − η_{t−1}, and the second-order difference δ_t. Because of the twice-differencing, three dimensions are necessary to satisfy the second and third lines of (1). If ρ and α_1 are both 0, then δ_t is white noise, t_0 = 2, and s_t need only contain η_t and η_t − η_{t−1}.

The update step (4) for δ_t has the general form of a first-order autoregressive process with ARCH(1) errors. Fixing the AR(1) and ARCH(1) terms (ρ and α_1, respectively) gives rise to boundary cases and simpler models: ρ = α_1 = 0 gives iid noise; ρ = 1, α_1 = 0 gives a random walk. We consider a sequence of models that vary in their treatment of ρ and α_1, and grow in complexity. Table 1 shows the progression of models. Note that when α_1 is set to 0, we reparameterize with σ = √α_0 for notational ease.
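To build intuition for how (4) produces a rising-then-settling wave, one can simulate the δ_t process and twice-cumulate it to recover η_t = log λ_t. A minimal sketch follows; the parameter values and initial conditions are invented for illustration, not estimates from our fits.

```python
import numpy as np

def simulate_lambda_path(T, rho, alpha0, alpha1, eta0, d0, delta0, rng):
    """Simulate the AR(1)-ARCH(1) second differences in (4) and
    twice-cumulate them to recover eta_t = log(lambda_t)."""
    delta = np.empty(T)
    delta[0] = delta0
    for t in range(1, T):
        scale = np.sqrt(alpha0 + alpha1 * delta[t - 1] ** 2)  # ARCH(1) sd
        delta[t] = rho * delta[t - 1] + scale * rng.normal()  # AR(1) pull to 0
    d = d0 + np.cumsum(delta)    # first differences of eta
    eta = eta0 + np.cumsum(d)    # log of the underlying mean
    return np.exp(eta)           # lambda_t

rng = np.random.default_rng(1)
# Positive initial slope with negative, decaying curvature: a single wave.
lam = simulate_lambda_path(T=100, rho=0.8, alpha0=1e-5, alpha1=0.3,
                           eta0=0.0, d0=0.3, delta0=-0.1, rng=rng)
```

With ρ ∈ (0, 1), the simulated δ_t decays geometrically towards 0, so the growth rate of λ_t stabilizes, which is the settling behaviour described above.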
TABLE 1
Progression of models fit to New York City daily COVID-19 deaths

Model      | Observation distribution | δ_t update                            | Parameters β to fit | t_0
-----------|--------------------------|---------------------------------------|---------------------|----
P(0,0)     | Poisson(λ_t)             | σ ε_t                                 | σ                   | 2
P(1,0)     | Poisson(λ_t)             | δ_{t−1} + σ ε_t                       | σ                   | 3
P(ρ,0)     | Poisson(λ_t)             | ρ δ_{t−1} + σ ε_t                     | ρ, σ                | 3
P(ρ,α_1)   | Poisson(λ_t)             | ρ δ_{t−1} + √(α_0 + α_1 δ²_{t−1}) ε_t | ρ, α_0, α_1         | 3
NB(ρ,α_1)  | NegBinom(λ_t, ϕ)         | ρ δ_{t−1} + √(α_0 + α_1 δ²_{t−1}) ε_t | ϕ, ρ, α_0, α_1      | 3

Note: t_0 is the number of initialized states.
In our simplest model, P(0,0), we fix ρ and α_1 equal to 0, giving the white noise process for δ_t discussed above. Keeping α_1 = 0, we next consider a random walk for δ_t, setting ρ = 1 in P(1,0). In P(ρ,0) we introduce a prior on ρ with support [0, 1]; conditionally, δ_t now follows a stationary AR(1) process. In P(ρ, α_1), we set priors for both ρ and α_1, so that conditionally δ_t follows a stationary AR(1)-ARCH(1) process. This is the most complicated state equation we consider. We take this one step further in the NB(ρ, α_1) case by introducing a negative binomial observation distribution with scale parameter ϕ, parameterized so that it has mean λ_t and variance λ_t + λ_t²/ϕ.

Each model in Table 1 has associated priors for the β parameters and the initial states s_{1:t_0}. In Appendix A1 we provide details of the full joint distribution for NB(ρ, α_1). With the full joint specified, we can use MCMC methods to fit each model. In Appendix B1 we give the Stan model code used to fit NB(ρ, α_1).
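As a check on the parameterization, a negative binomial with mean λ and variance λ + λ²/ϕ can be sampled as a gamma-Poisson mixture. The sketch below verifies the moments by simulation; it assumes this matches Stan's neg_binomial_2 parameterization, and the numeric values are illustrative.

```python
import numpy as np

def sample_nb2(lam, phi, size, rng):
    """Draw counts with mean lam and variance lam + lam**2 / phi via a
    gamma-Poisson mixture: rate ~ Gamma(shape=phi, scale=lam/phi) has
    mean lam and variance lam**2/phi, so the mixed Poisson has the
    stated mean and variance."""
    rate = rng.gamma(shape=phi, scale=lam / phi, size=size)
    return rng.poisson(rate)

rng = np.random.default_rng(2)
y = sample_nb2(lam=40.0, phi=5.0, size=200_000, rng=rng)
print(y.mean(), y.var())  # approx. 40 and 40 + 40**2/5 = 360
```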
Model assessment
We perform predictive model assessment in several ways based on posterior predictive draws y^{(b)}_{T+k} ∼ p(y_{T+k} | y_{1:T}) sampled according to (3). These evaluations allow us to meaningfully compare the models in Table 1.

We make point and interval forecasts by taking quantiles of posterior predictive draws, and we compare the resulting forecasts to observed data. As a summary measure of predictive performance, we consider the absolute relative error of 7-day ahead cumulative death predictions,
$$
\mathrm{ARE} = \frac{\left| \hat{c}_{T+7} - c_{T+7} \right|}{c_{T+7}}, \qquad c_{T+7} = \sum_{t=T+1}^{T+7} y_t, \tag{5}
$$
where c_{T+7} is the observed cumulative count over the held-out week and ĉ_{T+7} is its forecast. The other predictive model assessments we perform evaluate model fit using the full set of posterior predictive draws.
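A sketch of computing (5) from a matrix of posterior predictive draws follows. The text does not pin down the point forecast here, so using the predictive median of the cumulative count is an assumption made for illustration.

```python
import numpy as np

def abs_rel_error_7day(pred_draws, y_heldout):
    """Absolute relative error (5) of the 7-day-ahead cumulative prediction.

    pred_draws : (B, 7) array of posterior predictive daily counts
                 for days T+1, ..., T+7
    y_heldout  : length-7 array of observed daily deaths
    """
    c_hat = np.median(pred_draws.sum(axis=1))  # point forecast of cumulative
    c_obs = y_heldout.sum()                    # observed cumulative deaths
    return abs(c_hat - c_obs) / c_obs
```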
Probability integral transform
The probability integral transform (PIT) is a useful tool generally used for assessing models for continuous data; however, it can be extended to count‐valued data. We use the nonrandomized PIT histogram for count data proposed in (Czado et al., 2009) for assessing the fits of our considered models.We fit our model to observed counts y
1 : and make a k‐day ahead prediction by sampling posterior predictive draws,
as in (3). We seek a probability integral transform to assess this empirical predictive distribution against future observations y
∼ F
. The empirical predictive distribution function F˜
is given byIf we successfully target the true predictive distribution, so that
, then
, and it follows that the PIT using F˜
(y
) will be very close to Uniform on the unit interval for B large. For each model fit, we make
post‐warmup posterior draws.
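A sketch of the nonrandomized PIT computation for count data, following Czado et al. (2009), using empirical predictive CDFs of the form (6). This is illustrative code under those assumptions, not the paper's implementation.

```python
import numpy as np

def nonrandomized_pit_hist(draws_list, y_list, n_bins=10):
    """Nonrandomized PIT histogram for count data (Czado et al., 2009).
    For each held-out count y with predictive CDF P, the PIT CDF is
    F(u) = clip((u - P(y-1)) / (P(y) - P(y-1)), 0, 1); averaging F over
    all prediction cases and differencing on a grid gives bin heights
    that are ~1/n_bins each under a well-calibrated model.

    draws_list : list of arrays of posterior predictive draws, one per fit
    y_list     : corresponding held-out observed counts
    """
    grid = np.linspace(0.0, 1.0, n_bins + 1)
    F_bar = np.zeros(n_bins + 1)
    for draws, y in zip(draws_list, y_list):
        P_y = np.mean(draws <= y)        # empirical predictive CDF at y
        P_ym1 = np.mean(draws <= y - 1)  # ... and at y - 1
        if P_y == P_ym1:                 # no draws equal y: step function
            F_u = (grid >= P_y).astype(float)
        else:
            F_u = np.clip((grid - P_ym1) / (P_y - P_ym1), 0.0, 1.0)
        F_bar += F_u
    F_bar /= len(y_list)
    return np.diff(F_bar)                # histogram heights over the bins
```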
Scoring rules
Scoring rules are another tool for predictive model assessment, providing a numerical score s(P, y) based on a predictive distribution P and an observation y. We consider two scoring rules, the logarithm score and the ranked probability score. These scoring rules are strictly proper (Czado et al., 2009). We average the scores over sets of look‐ahead predictive distributions to compare the considered models.The logarithm score is defined as
, where p(x) is the probability of observing y under the predictive distribution P. We adapt this score to our framework for a k‐day ahead forecast as
, where
is an approximation of the true posterior predictive probability mass function. This is similar to (6) above. We add the 1/B
2 term to avoid
when
for all b ∈ {1, … , B}.The ranked probability score is defined as
, where P(·) is the cumulative distribution function under P. We take
, from (6). For both scoring rules, smaller values indicate a better fit.
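Both scores can be computed directly from the posterior predictive draws. A short illustrative sketch (again not the paper's implementation); the RPS sum is truncated where the empirical and indicator CDFs both reach 1, since later terms are exactly zero.

```python
import numpy as np

def log_score(draws, y):
    """-log p~(y), with the 1/B^2 regularization described above."""
    B = len(draws)
    p_tilde = np.mean(draws == y) + 1.0 / B**2  # empirical pmf at y
    return -np.log(p_tilde)

def ranked_probability_score(draws, y):
    """RPS(F~, y) = sum_x (F~(x) - 1{y <= x})^2 over x = 0, 1, 2, ..."""
    xs = np.arange(int(max(draws.max(), y)) + 1)
    F = np.array([np.mean(draws <= x) for x in xs])  # empirical CDF (6)
    return float(np.sum((F - (y <= xs)) ** 2))
```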
RESULTS
We fit the models in Table 1 to multiple subsets of the NYC data. Each daily count time series corresponds to a single borough and begins with the day of the first COVID-19 death observed in that borough. Model assessment is based on a collection of these fits and their forecasts. Comparing results across the considered models, we determine NB(ρ, α_1) to be generally best at capturing the observed dynamics in NYC's daily COVID-19 death counts over various ranges of the data. We further fit the NB(ρ, α_1) model to daily COVID-19 deaths data from the four most populous counties in Texas.
Model comparison
Figures 2 and 3 illustrate how the progression of state equation specifications (the δ_t updates) captures the dynamics of observed daily deaths around their peak. Here the models from Table 1 are fit to six ranges of Queens data. Figure 2 shows posterior estimates of the underlying time-varying mean λ_t, along with observed deaths, whereas Figure 3 shows posterior estimates of δ_t.
FIGURE 2
Daily COVID-19 deaths in Queens along with estimates of the underlying means λ_t. Red shaded regions correspond to out-of-sample forecasts. Estimates vary by model and the length of the training series (T).

FIGURE 3
Estimates of δ_t, the second-order difference of the logarithm of the underlying mean. Red shaded regions correspond to out-of-sample forecasts. Estimates vary by model and the length of the training series (T).

Each subfigure corresponds to a different model fit. The rows align with Table 1 and each column corresponds to a different training series, where T is the number of training points. We indicate where the final training day falls relative to the observed peak. For the top left subfigure, T = 25 days of deaths are included in the training series, starting from the first confirmed COVID-19 death in Queens on March 11 and ending with the count from April 4, 2 days before the observed peak. We plot approximate 90% posterior intervals and approximate posterior medians based on Monte Carlo sampling. The red shaded regions show 3 weeks of out-of-sample, look-ahead forecasts.

The first row of Figures 2 and 3 shows estimates based on P(0,0). Under P(0,0), δ_t is uncorrelated with δ_{t−1} and centred at 0. These model features lead to poor forecasting around the observed peak, where second-order differences are negative with lag-1 autocorrelation. Although P(0,0) fails to capture the dynamics close to the peak, this model forecasts well when the training series extends at least 18 days past the peak.

Under P(1,0), we have that E(δ_{T+k} | δ_T) = δ_T for k ≥ 1. This feature is seen clearly in Figure 3, where the P(1,0) posterior predictive distributions for δ_{T+k} remain centred around the posterior mean of δ_T. In the first four columns, forecasts for δ_t stay negative, leading to a too-quick decrease in the corresponding λ_t forecasts.

Under P(ρ,0), forecasts of δ_t increase gradually towards 0. This captures the dynamics of the observed counts much more closely than the previous two models. However, although predictive distributions for δ_t become centred around 0, there is still much uncertainty, and this seems to exaggerate the possibility of δ_t moving away from 0. Inclusion of the ARCH(1) term corrects for this. In the P(ρ, α_1) and NB(ρ, α_1) models, there is more uncertainty in the second and third columns, where δ_t is negative, and less in the fifth and sixth, where δ_t is close to 0. The final two rows in Figures 2 and 3, corresponding to models P(ρ, α_1) and NB(ρ, α_1), are very similar. This state update model is not able to properly predict the peak in the far left column, but performs well in the period after the peak.

Table 2 shows mean absolute relative errors (5) along with their standard errors. We average over the five NYC boroughs as well as the week-long range in which the final training day falls relative to the observed peak. Each cell summarizes 35 values.
TABLE 2
Mean absolute relative error of 7-day ahead cumulative predictions

Absolute relative error (SE)
Model      | peak−3 to peak+3 | peak+4 to peak+10 | peak+11 to peak+17 | peak+18 to peak+24
-----------|------------------|-------------------|--------------------|-------------------
P(0,0)     | .915 (.083)      | .440 (.086)       | .165 (.041)        | .113 (.015)
P(1,0)     | .146 (.018)      | .243 (.018)       | .202 (.020)        | .189 (.031)
P(ρ,0)     | .208 (.042)      | .177 (.015)       | .145 (.013)        | .159 (.027)
P(ρ,α_1)   | .320 (.053)      | .175 (.019)       | .099 (.010)        | .139 (.023)
NB(ρ,α_1)  | .357 (.058)      | .168 (.022)       | .099 (.010)        | .138 (.023)

Note: column headings give the peak-relative week in which the last training day falls.
The P(1,0) model has the lowest absolute relative error when the final training day falls within 3 days of the observed peak. This follows because relative errors of underestimates are bounded above by 1, and the forecasts from P(1,0) decrease quickly to 0. The NB(ρ, α_1) model has the lowest error when the final training day falls 4 to 17 days after the observed peak, whereas the P(0,0) model performs best later on.

Figure 4 compares the nonrandomized PIT histograms (Section 2.3.1) for 1-day ahead predictions. Each histogram is based on 140 model fits, with the 28 training series from each borough covering the same four peak-relative weeks as in Table 2. The histograms corresponding to P(0,0), P(1,0), and P(ρ,0) appear skewed. Although we see some overdispersion for NB(ρ, α_1), the PITs for P(ρ, α_1) and NB(ρ, α_1) do not provide strong evidence that these models are incorrect.
FIGURE 4
Nonrandomized PIT histograms for 1-day ahead predictive distributions.

Table 3 shows two proper scoring rules (Section 2.3.2) for the progression of models considered. These scores are averaged over the 21 days of forecasting for each model fit, the peak-relative week of the final training day, and the five boroughs. Each entry is an average of 735 values.
TABLE 3
Scoring rules evaluated on the predictive distributions of considered models

Ranked probability score
Model      | peak−3 to peak+3 | peak+4 to peak+10 | peak+11 to peak+17 | peak+18 to peak+24
-----------|------------------|-------------------|--------------------|-------------------
P(0,0)     | 350.88           | 26.41             | 5.16               | 3.59
P(1,0)     | 24.73            | 21.11             | 12.36              | 8.23
P(ρ,0)     | 24.23            | 14.73             | 7.55               | 4.58
P(ρ,α_1)   | 46.60            | 11.79             | 5.30               | 3.62
NB(ρ,α_1)  | 52.22            | 10.98             | 5.34               | 3.58

Logarithm score
Model      | peak−3 to peak+3 | peak+4 to peak+10 | peak+11 to peak+17 | peak+18 to peak+24
-----------|------------------|-------------------|--------------------|-------------------
P(0,0)     | 13.36            | 6.61              | 3.56               | 3.17
P(1,0)     | 6.04             | 6.14              | 4.47               | 3.82
P(ρ,0)     | 4.98             | 4.90              | 4.00               | 3.45
P(ρ,α_1)   | 5.42             | 4.41              | 3.59               | 3.24
NB(ρ,α_1)  | 5.46             | 4.39              | 3.62               | 3.25

Note: column headings give the peak-relative week in which the last training day falls.
According to these measures of predictive model assessment, the models with both the AR(1) and ARCH(1) terms dominate when the final training day is 4 to 10 days after the peak, with NB(ρ, α_1) performing the better of the two. The P(ρ, α_1) and NB(ρ, α_1) models have logarithm scores close to the lowest for the other peak-relative weeks, and NB(ρ, α_1) has the lowest ranked probability score when the final training day is 18 to 24 days after the peak. We again see how P(0,0) performs well relative to the others in the later weeks, but very poorly when the final training day is close to the peak.
Fits around peak for all five boroughs
Figures 5 and 6 show further posterior estimates from NB(ρ, α_1). We fit the model to counts from all five NYC boroughs, with final training days concentrated around the 2 weeks following the observed peaks. Again we plot approximate 90% posterior intervals and approximate posterior medians for λ_t and δ_t, based on posterior samples.
FIGURE 5
Estimates of λ_t from the collection of NB(ρ, α_1) model fits. The red shaded regions show out-of-sample posterior predictive forecasting. The points are observed deaths.

FIGURE 6
Estimates of δ_t from the collection of NB(ρ, α_1) model fits. The red shaded regions show out-of-sample posterior predictive forecasting.

The trajectories of the posterior estimates of δ_t show a similar pattern across the five boroughs: an increase towards 0, with less posterior predictive uncertainty as more training points are included. For Brooklyn, Queens, and The Bronx there is a temporary flattening out, or even a decrease, in δ_t around 2 weeks before the peak. For Brooklyn and The Bronx, the peak of the posterior λ_t estimates falls before the observed peak, whereas for Queens there appears to be a double observed peak, with the maximum posterior λ_t closer to the first.

In the far left column of Figure 5, we see a failure to recognize that the peak has been reached for Queens, Manhattan, and Staten Island. Here the model incorrectly forecasts dramatic continued growth in the death rates for these boroughs. As more training points are added, the posterior predictive forecasts show decreasing mean processes that are better aligned with the actual deaths and have less posterior uncertainty.

Across all the boroughs, the NB(ρ, α_1) model appears to capture the dynamics of the observed daily counts reasonably well once it has been fit with data extending a few days past the observed peak. The fits improve as more data are included.

Figure 7 shows nonrandomized PIT histograms for k-day ahead predictions from the NB(ρ, α_1) model. The fits appear to break down as the forecast horizon grows, with the predictive distributions becoming more and more overdispersed with increasing k.
FIGURE 7
PIT histograms for k-day ahead predictive distributions from the NB(ρ, α_1) model.
Fits to four Texas counties
In further exploration, we fit the NB(ρ, α_1) model to COVID-19 mortality data from the four most populous counties in Texas (Texas Department of State Health Services, 2020). These data come from the state, are subject to ongoing quality checks and updates, and include fatalities for which COVID-19 is listed as a direct cause of death on the death certificate. The four largest Texas counties are Harris County, which includes the city of Houston; Dallas and Tarrant Counties, which include most of the Dallas-Fort Worth metroplex population; and Bexar County, which includes the city of San Antonio. We fit the NB(ρ, α_1) model to one long series for each county, beginning with its first recorded COVID-19 death and ending with the death count on December 20, 2020. The training data consist of the provisional counts available on January 12, 2021.

Figure 8 shows the posterior estimates from these four fits. Over these longer training series, we observe multiple dynamic phases, including pronounced peaks in mid-to-late July and waves building at the end of the year. We can see how the dynamics of the estimated δ_t's on the right are reflected in the observed training counts and estimated λ_t's on the left. The geographically adjacent Dallas and Tarrant Counties show similar patterns of a more gradual, less devastating first wave, whereas Harris and Bexar Counties have dramatic, pronounced peaks. According to the model forecasts, the late fall/winter peaks may not yet have been reached.
FIGURE 8
Posterior estimates of λ_t and δ_t from NB(ρ, α_1) fits to fatality data from the four largest Texas counties.
CONCLUSION
We have presented a generalized state-space model for count-valued time series that seeks to capture the dynamics of local daily COVID-19 deaths. The model components, in particular the latent state process involving second-order differencing and an AR(1)-ARCH(1) model, are motivated by the anticipated behaviour of waves of epidemic counts. The results of applying our proposed model justify these components.