Literature DB >> 34660186

Estimation of COVID-19 mortality in the United States using Spatio-temporal Conway Maxwell Poisson model.

Abstract

Spatio-temporal Poisson models are commonly used for disease mapping. However, after incorporating the spatial and temporal variation, the data do not necessarily have equal mean and variance, suggesting either over- or under-dispersion. In this paper, we propose the Spatio-temporal Conway Maxwell Poisson model. The advantage of Conway Maxwell Poisson distribution is its ability to handle both under- and over-dispersion through controlling one special parameter in the distribution, which makes it more flexible than Poisson distribution. We consider data from the pandemic caused by the SARS-CoV-2 virus in 2019 (COVID-19) that has threatened people all over the world. Understanding the spatio-temporal pattern of the disease is of great importance. We apply a spatio-temporal Conway Maxwell Poisson model to data on the COVID-19 deaths and find that this model achieves better performance than commonly used spatio-temporal Poisson model.

Entities: Chemical

Keywords: Bayesian inference; COM-Poisson model; Model comparison; Spatial temporal effect

Year: 2021 PMID： 34660186 PMCID： PMC8505020 DOI： 10.1016/j.spasta.2021.100542

Source DB: PubMed Journal: Spat Stat

Introduction

The coronavirus disease 2019 (COVID-19), first discovered in the end of 2019 was officially declared a pandemic by WHO on March 11th, 2020. The first confirmed case in the United States was in the Washington state, on January 19th, 2020. After this, the disease spreads rapidly to the rest of the country. By the end of March 2020, there were confirmed cases in all the fifty states. By the end of February 2021, there were over 110 million confirmed cases of COVID-19 over the world and more than 49 million in the United states. The disease has been dramatically influencing our lives. The mortality risk of the disease is also relatively high. There have been over 2.4 million deaths of COVID-19 over the world and over 1 million in the United States. The prevention of the spread of the disease as well as the treatment of the disease is of great importance. By estimating the mortality risk of the COVID-19, we can provide useful information for resources allocation. States with higher mortality risk need more attention and might require more medical resources. There are many studies on modeling spatial variation of the COVID-19 infection and deaths in the United States (Mollalo et al., 2020, Desmet and Wacziarg, 2020). Thus spatial variation should be considered when we model the COVID-19 deaths. Zhu et al. (2020) used a spatio-temporal model estimating the confirmed cases and deaths of COVID-19 in the United States. They assumed that both the number of confirmed cases and the number of deaths follow normal distributions and used a vector auto-regressive model to capture spatial and temporal effects. In this paper, we focus on modeling the death rate and the mortality risk. Commonly used disease mapping models assume a Poisson distribution for the number of infected cases or deaths, and link the rate parameter with possible predictors and spatial and temporal random effects. In this paper, we use Conway Maxwell Poisson distribution instead of Poisson distribution due to its flexibility. Poisson distribution is commonly used for count data. However, it has the constraint of the equal mean variance assumption, which is not always true. Including spatial and temporal random effects can account for the over dispersion. However, after incorporating the spatial and temporal effects, there might still be extra variation which causes violation of the assumption of equal mean and variance, i.e., over- or under-dispersion. To accommodate the equal mean variance assumption of Poisson distribution, Negative Binomial regression is often adopted for over-dispersed data. As for under-dispersion, there are few models that can account for this. Generalized Poisson regression (Famoye, 1993) is one option. It can model both over- and under-dispersion. However, it belongs to the exponential family only for a constant dispersion parameter (Cui et al., 2006). The dispersion parameter of the generalized Poisson model cannot be linked with predictors. A more general distribution for count data is the Conway Maxwell Poisson distribution (Shmueli et al., 2005), which can also handle both over- and under-dispersed data by introducing a dispersion parameter. We can estimate this parameter, and let the data tell us the value of the dispersion. A dual-link generalized linear model based on Conway Maxwell Poisson is well developed (Guikema and Goffelt, 2008, Sellers and Shmueli, 2010). Both the mean and the dispersion parameter can be modeled by predictors. The Poisson distribution is also a special case of Conway Maxwell Poisson distribution, with the Conway Maxwell Poisson distribution providing more flexibility. Thus, in this paper, we use the Conway Maxwell Poisson distribution to model the number of death of COVID-19. Wu et al. (2013) conducted the Spatio-Temporal Conway Maxwell Poisson model that captured spatial effect by mapping from physical space using the basis function matrix. They applied the model to predicting waterfowl migratory patterns. MacDonald and Bhamani (2018) conducted a hidden Markov model with Conway Maxwell Poisson state-dependent distributions for time series of counts. The Conway Maxwell Poisson model has not been applied in disease mapping and we use the scaled Besag–York–Mollié (BYM) model (Riebler et al., 2016) to capture the spatial effect, which is different from Wu et al. (2013). The scaled BYM model includes both structured and unstructured spatial variations and have them at the same scale. This paper is organized as follows. In Section 2, we introduce the data and factors that we selected to include in the model. In Section 3, we review the Conway Maxwell Poisson distribution and introduce the Conway Maxwell Poisson spatio-temporal model that we use to fit to the COVID-19 data. We also talk about parameter estimation and model comparison. The results are then shown in Section 4. We also compare the results of our proposed model with Poisson spatio-temporal model to show the superiority of the proposed model. Finally we put discussion and conclusion in Section 5.

Data

In this paper, we assessed the data of confirmed cases and deaths of COVID-19 and demographic census data to explore the spatio-temporal dynamics of COVID-19 transmission in the United States at state level. We focus on the mainland United States, and do not include Hawaii and Alaska. Daily confirmed cases and deaths of COVID-19 were extracted from John Hopkins coronavirus resource center (Dong et al., 2020). We converted the daily confirmed cases and number of deaths to weekly confirmed cases and number of deaths to avoid the possible testing bias. Demographic data was obtained from American Community Survey (ACS) provided by the U.S. Census Bureau (The U.S. Census Bureau, 2019). The latest data are for year 2019. It has been shown in some studies that the infection and death of COVID-19 are related to age and gender (Bhopal, 2020, Bhopal et al., 2020). Thus the demographic data we collect are the total population, the population of elderly with an age of 65 or older and the population of males and females of each state. We then calculate the proportion of elderly with an age of 65 or older and the ratio of males and females of each state as our covariates. The overview of the demographic variables is shown in Fig. 1

Fig. 1

Overview of the demographic variables.

We consider the confirmed cases as the exposure of the number of deaths. We plot the death rate of each state at three time points to show the change in death rate in Fig. 2. We also show the time series plot of death rate of each state in appendix. From this time series plot, we can see that the death rate was increasing until March, and then started decreasing in late April. After July, the death rate remained relatively low. Thus we focus on the time periods from late April to July, and the three time points that we pick for Fig. 2 are April 6th, May 11th and June 15th.

Fig. 2

Death rate.

Overview of the demographic variables. The death rate is higher in northeast region, and also shows spatial trend. For states with high death rates, the neighbor states also tend to have high death rate, indicating that we should consider spatial variation when we model COVID-19 mortality. Death rate.

Methodology

In this section, we first review the Conway Maxwell Poisson (COM-Poisson) distribution. Then we introduce the COM-Poisson spatio-temporal model that we use to model the number of deaths of COVID-19. Finally we show the computation and model comparison criteria.

COM-Poisson distribution

The Conway Maxwell Poisson (COM-Poisson) distribution, which was first introduced in 1962 for modeling queues and service rates by Conway and Maxwell (Conway and Maxwell, 1962), has recently been re-introduced by statisticians to model count data. In this paper, we use the re-parameterization of the COM-Poisson distribution proposed by Guikema and Goffelt (2008), which provides a good basis for developing a generalized linear model. Suppose is a random variable following the COM-Poisson distribution. The probability mass function of is shown in Eq. (1). where is the normalization constant, which is intractable. Shmueli et al. (2005) showed the asymptotic expression of Z and derived the approximation of the mean and the variance of based on the asymptotic expression of Z for the original parameterization of the COM-Poisson distribution. By modifying that, the asymptotic expression of Z for the re-parameterization of the COM-Poisson distribution is shown in Eq. (2) which is accurate when and are not both small. When and are small, we use the truncation method to approximate the normalization constant. The approximated mean and variance for the re-parameterized COM-Poisson distribution are shown in Eq. (3) and the mode of the COM-Poisson distribution is . The dispersion parameter makes COM-Poisson distribution more flexible than the standard Poisson distribution. When , COM-Poisson can model the over-dispersed count data. While , it can capture the under-dispersion of the count data. In this set up, closely approximates the mean, unless or is small. The mean parameter can be linked with predictors with a log link as shown in Eq. (4). For , Then and . Both the mean and variance are allowed to vary for different values of predictors.

Conway Maxwell Poisson Spatio-temporal model

The weekly number of deaths of State at week is represented by , which is modeled as where and . n is the total number of states and T is the total number of weeks. is the rate of death and is the confirmed cases, which we consider as exposure. Huang (2017) shows that exposures can be taken into account by including an offset in the Conway Maxwell model. The death rate of State at time , is modeled by additive predictors, spatial random effect and temporal random effect, as shown in Eq. (6) where are the State level population, proportion of population that above 65 years old and the male female ratio. are the corresponding coefficients. is the spatial random variable and is the temporal random variable. We assume follows the scaled BYM model, which is proposed by Riebler et al. (2016) following Simpson et al. (2015), as shown in Eq. (7) where is the overall standard deviation for the spatial effect, is the scaled unstructured spatial random effect, is the scaled structured spatial random effect and measures how much of the spatial variation is from the structured effect. We assume , and , where is the geometric mean of the variance of the structured spatial components, as suggested in Riebler et al. (2016), that is . follows ICAR model, that is represents the set of neighbors of region , and is the size of . By applying Brook’s Lemma (Brook, 1964), it is proved that the joint distribution of is multivariate normal. where is a diagonal matrix with diagonal being , the number of neighbors of each state, and is the adjacency matrix, with entry being 1 if state and are neighbors and 0 otherwise. All the diagonal elements of are zero. controls the amount of spatial correlation. We use an intrinsic CAR model, that is we set . When , is singular, thus the distribution of is improper, which is not an issue since we use the model as a prior of the structured spatial effect. In addition, the CAR prior does not deliver enough spatial similarity unless is very close to 1, thus getting us to the same problem as using the intrinsic CAR model (Banerjee et al., 2004). Thus we use intrinsic CAR model for the structured spatial effect. When we examine the autocorrelation plot of the death rate of each state, we find that the autocorrelation decays to zero very slowly for most states, so we assume is a random walk, as in Eq. (10)

Computation

The model is fit under a Bayesian framework. The marginal posterior distributions of parameters cannot be obtained analytically. We, therefore, use Hamiltonian Monte Carlo (Neal, 2011) through RSTAN to sample the parameters from their posterior distributions. The implementations of the scaled BYM model for the spatial effect and the random walk for the temporal effect in STAN follow Morris et al. (2019). Another reason that we use ICAR is that it can reduce the complexity of computation. The log probability density of is proportional to Eq. (11) Computing the determinant requires operations. If we use ICAR model, the determinant of Q is 0, and the number of operations drops to . Then we can rewrite the probability density for with variance 1 as pairwise difference formula (12) where means States and are neighbors to each other. A constraint that is added because the pairwise difference is non-identifiable. A natural way to compute the pairwise differences is to encode the non-zero entries of the adjacency matrix as an edge set. It also saves more storage than the full adjacency matrix. The priors for hyperparameters are set as suggested in Riebler et al. (2016). We assume a beta(0.5,0.5) prior on , half normal prior on and standard deviations and normal prior on s. Model convergences are diagnosed by trace plots, the potential scale reduction (Gelman and Rubin, 1992) and effective sample size (Kass et al., 1998). We check that the effect sizes are not small and the trace plots of the chains mix well. We also check that the potential scale reductions are close to 1, indicating the chains converged to the target posterior distribution.

Model comparison criteria

We compare the proposed model with the commonly used Poisson Spatio-temporal model. We use mean absolute error (MAE), leave-one-out cross-validation (LOO) and Watanabe–Akaike information criterion (WAIC) (Watanabe, 2010) to compare model performances. MAE is the mean of the absolute value of the difference between fitted values and true observations. It measures the in-sample model fit. LOO and WAIC are measures of out-of-sample predictive fit. Suppose are the in sample data, and are new observations. The expected log pointwise predictive density (elpd) for the new observations is defined as in Eq. (13) where is the true distribution of , which is unknown. Thus we can only approximate the value of elpd. LOO and WAIC are two ways to approximate it. As introduced in Vehtari et al. (2016), the Bayesian LOO estimate of out-of-sample predictive fit is defined as in Eq. (14) where is the predictive density given the data without . Here we use to represent all the parameters. WAIC is defined as where is the computed log pointwise predictive density as shown in Eq. (16) where for ,2, ,S are draws from posterior distribution of . is estimated effective number of parameters, as shown in Eq. (17) is the estimation of using , where is the sample variance. Thus

Results

We fit both the Conway Maxwell Poisson spatio-temporal model and the commonly used Poisson spatio-temporal model to see if the model performance is improved if we use a Conway Maxwell Poisson model. The MAE, LOO, WAIC and the estimated coefficients of both models are shown in Table 1.

Table 1

Coefficient estimates.

	COM-Poisson		Poisson
	βˆ	95% CI	βˆ	95% CI
population	0.83	(0.07, 1.59)	0.06	(−0.07, 0.21)
prop(65)	0.25	(0.12, 0.38)	0.11	(−1.85, 1.94)
prop(gender)	−0.81	(−1.44, −0.18)	−0.86	(−2.20, 0.44)

MAE	31.5		35.0
WAIC	−50837.2		−2075063.8
LOO	−10884.9		−151162.0

As introduced in the data section, we include three predictors in our model, the total population, the proportion of elderly that is 65 years old or above and the male female ratio of each state. None of the credible intervals of the coefficients estimated using COM-Poisson model include zero, but all of the credible intervals including zero using Poisson model. The posterior variance of model using Poisson distribution is greater than the model using Conway Maxwell Poisson distribution. In addition, all of the three model comparison criteria suggest that the COM-Poisson model performs better than the Poisson model. The posterior mean of is 0.08 with 95% credible interval (0.06,0.1), indicating a high over-dispersion of the data even after considering the spatial and temporal effects. This might be the reason that the performance of Poisson model, that assumes no over-dispersion, is not good. From the estimated coefficient of COM-Poisson model, states with higher population, proportion of elderly and lower male female ratio are likely to have higher number of deaths of COVID-19. Previous study has shown that older people and males are more vulnerable to COVID-19 than younger people and females (Bhopal, 2020, Bhopal et al., 2020). States with higher proportion of elderly show higher COVID-19 mortality risk which is consistent with these previous findings. However, we find that states with a higher proportion of males have lower mortality risk, which is different from the previous finding that male is more vulnerable to COVID-19. This might be happening because we have examined the data at state level. The male female ratio of each state might be related to some other factors of the state which also influence the COVID-19 mortality risk. This also indicates that it could be beneficial to explore additional predictors, as well as the current predictors at finer spatial scales. Coefficient estimates. Correlation of structured spatial effect. The estimated is 0.48, showing that the proportion of structured and unstructured spatial effects are about equal to each other. We show the correlation plot of , the structured spatial effect in Fig. 3. We find that the correlation of structured spatial effects is positive between neighboring states, and zero or negative between states that are not close to each other (Fig. 4). The result is consistent with our model setting, indicating the adequacy of our model.

Fig. 3

Correlation of structured spatial effect.

Fig. 4

Map of spatial effects.

Map of spatial effects. From Fig. 4, the spatial trend is also consistent with what we show in Fig. 2. The mortality risk of COVID-19 is higher in the north east part of the United States, and lower in the north west part. There is some difference between the overall spatial effect and the structured spatial effect, indicating that there exists unstructured variation among all the states. We show the fitted value versus true observations in Fig. 5 for 9 exemplar states. We can see that the fitted values almost match the true observations, also indicating our model fit the data well.

Fig. 5

Fitted value versus true observations. The light blue shaded areas are the true observations, the darker blue line represents the fitted value and the two green dotted lines represent the 95% credible interval of fitted values. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Discussion

In this paper, we propose a Conway Maxwell Poisson spatio-temporal model fitting the number of deaths of COVID-19. In this model, the spatial random effect is assumed a scaled BYM model, and the temporal random effect is assumed to be a random walk. The superiority of Conway Maxwell Poisson model comes from its flexibility to capture both over- and under-dispersion of the count data. If the data is not equi-dispersed after incorporating the spatial and temporal effect, using Conway Maxwell Poisson is more appropriate and can achieve higher accuracy. We show this by comparing the model performance of the proposed model with the commonly used Poisson spatio-temporal model. For the mortality rate of COVID-19, the Conway Maxwell Poisson model performs better, and indicates the presence of over-dispersion in the data even after incorporating the spatial and temporal variation. The proposed model accurately estimated the mortality risk of each state at different time points, which provides information for medical resources allocation. We also find that the population and proportion of elderly of 65 years old or above are positively related to the mortality rate of COVID-19, and the male female ratio is negatively related to the mortality rate of COVID-19. More possible predictors at finer spatial scales can be explored, from which we may detect other factors that might influence the mortality rate associated with COVID-19.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

1 in total

1. Bayesian disease mapping: Past, present, and future.

Authors: Ying C MacNab
Journal: Spat Stat Date: 2022-01-19

1 in total