Clustering spatio-temporal series of confirmed COVID-19 deaths in Europe.

A Bucci¹, L Ippoliti¹, P Valentini¹, S Fontanella².

Abstract

The impact of the COVID-19 pandemic varied significantly across different countries, with important consequences for the definition of control and response strategies. In this work, to investigate the heterogeneity of this crisis, we analyse the spatial patterns of deaths attributed to COVID-19 in several European countries. To this end, we propose a Bayesian nonparametric approach, based on a mixture of Gaussian processes coupled with a Dirichlet process, to group the COVID-19 mortality curves. The model provides a flexible framework for the analysis of time series data, allowing the clustering procedure to include different features of the series, such as spatial correlations, time-varying parameters and measurement errors. We evaluate the proposed methodology on the death counts recorded at the NUTS-2 regional level for several European countries in the period from March 2020 to February 2021.

Keywords: Bayes nonparametrics; COVID-19; Dynamic linear models; Model-based clustering; Spatio-temporal analysis

Year: 2021 PMID: 34631400 PMCID: PMC8493647 DOI: 10.1016/j.spasta.2021.100543

Source DB: PubMed Journal: Spat Stat

Introduction

Since the officially reported outbreak in China of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), best known as coronavirus disease 2019 (COVID-19), the world has been facing an international public health emergency. The rapid and global spread of the disease prompted an extensive range of government interventions across the whole world in an attempt to contain the spread of further infection and prevent health system strain. These measures, such as the detection and isolation of infected individuals, contact-tracing, quarantine measures, social distancing, and closure of non-essential businesses, have since become significant components of public health policies. While Hsiang et al. (2020) have shown that these interventions helped prevent or delay more than 500 million infections across several countries, data on their efficacy are conflicting and many questions remain open. In fact, despite all the efforts, the human cost of coronavirus continues to mount, with about 4.7 million people known to have died at the time of this publication. Like other pandemics, COVID-19 is characterized by a critical spatial dimension such that the impact of this health crisis differs significantly not only across countries, but also across regions and municipalities within countries, both in terms of infections and related deaths. Hence, understanding the differences in the evolution of mortality by COVID-19 between countries constitutes a key factor to discover the dynamics of the disease and the relative effectiveness of preventive interventions on its spread. To help decision-makers to monitor the efficacy of control and response strategies, appropriate statistical models that allow for the representation and the understanding of spatial patterns and trajectories of both infections and deaths attributed to COVID-19 are required. Recently, several studies have been conducted to analyse the evolution of the COVID-19 pandemic from different perspectives. 
However, despite the growing literature, there seems to be a gap in the investigation of the different ways in which the pandemic evolves over time and spreads across countries. Apart from those merely providing descriptive statistics, only a few studies rely on proper temporal and spatial analyses of COVID-19 to demonstrate the impact of morbidity, mortality and global geographical dissemination of the disease across countries. For example, Shang and Xu (2020) and Kumar et al. (2022) analysed the evolving patterns of the COVID-19 pandemic in time and space, modelling the trajectory of confirmed cases and deaths from a change-point perspective. Regional short-term predictions of COVID-19 intensive care unit beds in Italy were provided by Farcomeni et al. (2020), and a spatial panel data model was proposed by Guliyev (2020) to determine the spatial effects of COVID-19. Some Bayesian spatio-temporal models were also developed by Chen et al. (2020), Feng (2022), Sahu and Böhning (2022), Lee et al. (2022) and Bartolucci and Farcomeni (2022) to assess and predict the distribution of the pandemic at its different stages. From a different perspective, Zhang et al. (2020) proposed a space–time scan statistical analysis (Kulldorff et al., 2005) to detect spatio-temporal clusters associated with changes in local containment strategies in China. Spatial clustering techniques were also proposed by Shariati et al. (2020) to study the cumulative incidence and cumulative mortality rates over two time periods of interest, and by D’Urso et al. (2022) to group the 20 Italian regions by taking into account the spatial and temporal features of several variables related to the pandemic. In this paper, to understand the dynamics of the pandemic and how they differ across countries and regions, we also propose a clustering approach to group 145 European NUTS-2 regions according to the temporal similarity of their time series of COVID-19 confirmed deaths.
In particular, we seek to evaluate: (a) the way the dynamics of the time series differ across the areas, (b) the existence of possible timing effects resulting in an earlier or later evolution of the phenomenon and, finally, (c) the shape effects resulting in more or less persistent patterns in some regions and specific intervals of time. While the first question aims at quantifying the spatial heterogeneity in terms of temporal trends, the other two refer to time shift and shape effects which arise when the time location of a specific phenomenon and its temporal profile change across the areas. Answers to these questions can offer key information about the impact of the implemented control measures, how these have elicited similar effects across regions and on the necessity of further local interventions in case of persistent patterns. In particular, the identification of clusters of high incidence areas allows future health resources to be targeted appropriately at regions greatly affected in terms of deaths. There is an extensive literature for clustering time series. Popular approaches include hierarchical and non-hierarchical methods, fuzzy clustering methods, machine learning methods and model-based methods — see, for example, Maharaj et al. (2019) and references therein. Here, we adopt a Bayesian nonparametric approach where time series patterns are modelled as a mixture of Gaussian processes (GP), with a Dirichlet process (DP) prior (Ferguson, 1973) over mixture components. This procedure is appealing on several grounds as, for example, the number of clusters does not need to be determined in advance, but is automatically selected during the clustering procedure within a suitable Markov chain Monte Carlo (MCMC) algorithm. 
Furthermore, the Gaussian mixture is defined through the observation equation of a state-space model (West and Harrison, 1997), which not only allows for time-varying parameters but also accommodates temporal trends, spatial correlations and measurement noise. The remainder of the paper is organized as follows. In Section 2, we offer a simple exploratory analysis of the data. The methodological contribution is outlined in Section 3 and the results of the study are presented in Section 4. Finally, Section 5 concludes the paper with a brief discussion.

The data

To investigate the dynamics of the pandemic, we consider weekly data for the period 18 March 2020–18 February 2021 on COVID-19 confirmed deaths recorded for NUTS-2 regions in France, Germany, Italy, Spain, Switzerland and the United Kingdom. Data were collected from national repositories as reported in Table 1.

Table 1

Data sources of confirmed deaths by COVID-19.

Countries	Source	Link
France	Santé publique France	https://geodes.santepubliquefrance.fr
Germany	NPGEO Corona Hub 2020	https://npgeo-corona-npgeo-de.hub.arcgis.com
Italy	Civil Protection Department	https://github.com/pcm-dpc/COVID-19
Spain	Escovid19data	https://github.com/montera34/escovid19data
Switzerland	Open Government Data Canton of Zurich	https://github.com/openZH/covid_19
United Kingdom	UK Government	https://coronavirus.data.gov.uk/

Weekly counts of deaths are a key indicator of overall epidemic impact and trajectory, and may improve data quality with respect to daily data. Therefore, in our analysis, daily data have been aggregated at a weekly frequency to account for differences in the recording mechanism among countries and to filter out weekend effects. The raw data, stratified by country, are depicted in Fig. 1. The temporal evolution of the pandemic highlights a clear trend across countries, characterized by two or three peaks in the periods March–May 2020 and October 2020–March 2021 and by low death counts for all regions during the summer.
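The weekly aggregation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `weekly_deaths_per_million`, the use of consecutive 7-day blocks starting at the first study date, and the toy numbers are all assumptions.

```python
from datetime import date, timedelta

def weekly_deaths_per_million(daily, population, start):
    """Sum daily counts into consecutive 7-day blocks from `start`
    and express each weekly total per million inhabitants."""
    weeks = {}
    for day, count in daily.items():
        offset = (day - start).days
        if offset < 0:
            continue  # ignore records before the study window
        weeks[offset // 7] = weeks.get(offset // 7, 0) + count
    n_weeks = max(weeks) + 1 if weeks else 0
    # Weeks with no records are reported as zero deaths.
    return [weeks.get(w, 0) * 1e6 / population for w in range(n_weeks)]

# Toy example (hypothetical data): one death per day for 14 days
# in a region of 2 million inhabitants.
start = date(2020, 3, 18)
daily = {start + timedelta(days=d): 1 for d in range(14)}
weekly = weekly_deaths_per_million(daily, population=2_000_000, start=start)
```

Summing over whole weeks removes the day-of-week reporting pattern, which is the weekend effect mentioned in the text.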

Fig. 1

Weekly time series of deaths (per million inhabitants) by COVID-19 for each country.

As expected, the time series within the same country present similar patterns; however, high variability can be observed in all the country-level plots, especially for the second and third waves. This suggests that the impact of the pandemic has been highly asymmetric within countries and that there might exist a supranational pattern across regions. This seems to be confirmed to some extent by Fig. 2, which shows the time series correlations as a function of the spatial distance between region centroids. As can be seen, although the correlations tend to decay with the spatial distance, high correlation values can still be observed at larger distances. This feature suggests that discriminative methods based on a pairwise spatial similarity metric are not suitable in this context. For this reason, in the following, we adopt a modelling strategy that, while explicitly accounting for the spatial correlation in the model specification, does not directly consider the spatial effect in the definition of the clustering structure. Furthermore, to enhance comparability among the regions, we examine standardized time series (i.e., for each time series we subtract its mean and divide by its standard deviation), which maintains the shape of their patterns. This yields comparable ranges and focuses the analysis on the structural similarities of the regions rather than just on the amplitude levels of their time series.
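A minimal sketch of the standardization step is given below. The text does not say whether the population or the sample standard deviation is used; the population version (`pstdev`) is assumed here, and the function name `standardize` is hypothetical.

```python
from statistics import mean, pstdev

def standardize(series):
    """Subtract the mean and divide by the standard deviation.
    Assumption: population standard deviation (divisor n)."""
    m = mean(series)
    s = pstdev(series)
    if s == 0:
        raise ValueError("a constant series cannot be standardized")
    return [(x - m) / s for x in series]

# Two toy series with the same shape but different amplitude.
a = [0.0, 2.0, 4.0, 2.0]
b = [10.0 * x for x in a]
za, zb = standardize(a), standardize(b)
```

After standardization the two toy series coincide, which is why the clustering can focus on shape rather than amplitude.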

Fig. 2

Time series correlations represented as a function of the spatial distance (in kilometres). The red line represents the empirical LOESS fit.
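The quantities plotted in Fig. 2 can be approximated with the sketch below, which pairs the great-circle distance between region centroids with the Pearson correlation of the corresponding series. The haversine formula, the Earth radius of 6371 km, and the toy centroids and series are assumptions; the paper does not state how distances were computed.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumption

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def pearson(x, y):
    """Pearson correlation of two equally long, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_by_distance(centroids, series):
    """Return (distance_km, correlation) for every pair of regions,
    sorted by increasing distance."""
    names = sorted(centroids)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = haversine_km(*centroids[a], *centroids[b])
            pairs.append((d, pearson(series[a], series[b])))
    return sorted(pairs)

# Hypothetical centroids (latitude, longitude) and toy series.
centroids = {"A": (45.0, 9.0), "B": (45.0, 10.0), "C": (52.0, 13.0)}
series = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1]}
pairs = correlation_by_distance(centroids, series)
```

A smoother such as LOESS could then be fitted to the resulting pairs to obtain a curve like the red line in Fig. 2.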

The statistical model

Following Section 2, we consider population-level summaries of confirmed COVID-19 deaths collected over time over a fixed study area, $\mathcal{D}$. The complete set of information is denoted as $y_t(s_i)$, $i = 1, \ldots, n$, $t = 1, \ldots, T$, where $n$ is the number of areal units $s_i$ and $T$ is the length of the time series. We also assume that, at the first level of the hierarchy, the model for the (standardized) data is given by

$$y_t(s_i) = z_t(s_i) + \varepsilon_t(s_i),$$

where $z_t(s_i)$ is the true (latent) process of interest and $\varepsilon_t(s_i)$ is a spatially and temporally uncorrelated Gaussian error term, assumed to have zero mean and time-varying variance, i.e. $\varepsilon_t(s_i) \sim N(0, \sigma^2_{\varepsilon,t})$. This error term corresponds to measurement error and/or representativeness error associated with the true process $z_t(s_i)$. In particular, since some deaths may be misattributed to COVID-19 while, conversely, others may not be attributed to it, some uncertainty is expected to surround the true number of COVID-19 deaths. Because of this, we aim to predict the smooth process $z_t(s_i)$ rather than the observed noisy process $y_t(s_i)$. With the aim of recognizing similarities and differences in the shape of the patterns of the weekly time series, a finite mixture is used as a flexible model. The central assumption here is that the time series arise from $K$ hidden classes and, within each class, all time series can be characterized by a common data generating mechanism which is defined in terms of a probability distribution for the entire time series, depending on unknown class-specific parameters $\theta_k$. Accordingly, denoting with $\mathbf{z}(s_i) = (z_1(s_i), \ldots, z_T(s_i))'$ the vector of time series of the latent process at areal unit $s_i$, the density of the mixture is written as

$$f(\mathbf{z}(s_i)) = \sum_{k=1}^{K} \pi_k \, f(\mathbf{z}(s_i) \mid \theta_k),$$

where we adopt a standard truncated Dirichlet process model to define the prior over the mixing probabilities, $\pi_1, \ldots, \pi_K$, based on some (large) upper bound $K$; see Ishwaran and James (2002).
A class assignment index taking a value in the set is then introduced for each time series, , to indicate which class the time series belongs to: The class assignment indices are random and independently distributed apriori, with prior class assignment distribution being the same for all sites, such that, , . The weights in , with and , are assumed to be unknown model parameters estimated along with the data. An important aspect of the model is that we do not assume to know a priori which time series belongs to which group. For each time series, the group indicator variable is estimated along with the group-specific parameters from the data. Finally, summarizes unknown parameters in the probabilistic distribution that are not related with the group indicators. Assume that the are independently drawn from an uncertain prior distribution , where the uncertainty about is expressed through a Dirichlet process model. Then, the mixture formulation gives the following standard hierarchical model (Escobar and West, 1995) where the means and variances determine the parameters , and The Dirichlet process is thus defined by a distribution function which is the prior expectation of , so that , and , a precision parameter determining the concentration of the prior for about . From the Pólya Urn Scheme it also follows that where denotes a unit point mass distribution at see, for example, Escobar and West (1995) and Manolopoulou et al. (2010). The Dirichlet process prior allows to construct the weights for model classes following a stick-breaking representation (Sethuraman, 1994), where the are obtained iteratively as follows The variables represent a sequence of independent random variables for which we have and to ensure that . The coefficient tunes the number of clusters in a direct manner, with larger values implying a larger number of clusters a priori. 
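The truncated stick-breaking construction mentioned above can be illustrated with a short sketch. The names `alpha` (precision parameter), `K` (truncation level) and `stick_breaking_weights` are assumptions, and the code follows the standard construction (independent Beta(1, alpha) variables, with the last one set to 1 so that the weights sum to one) rather than any implementation given in the paper.

```python
import random

def stick_breaking_weights(alpha, K, rng):
    """Draw K mixture weights from a truncated stick-breaking prior.
    v_k ~ Beta(1, alpha) for k < K and v_K = 1, so the weights sum to one."""
    weights, remaining = [], 1.0
    for k in range(K):
        v = 1.0 if k == K - 1 else rng.betavariate(1.0, alpha)
        weights.append(v * remaining)   # break off a fraction v of the stick
        remaining *= 1.0 - v            # length of stick still unassigned
    return weights

rng = random.Random(0)
small_alpha = stick_breaking_weights(alpha=0.5, K=20, rng=rng)
large_alpha = stick_breaking_weights(alpha=10.0, K=20, rng=rng)
```

With a small `alpha` most of the mass tends to fall on the first few components, whereas a large `alpha` tends to spread it over many, which matches the role of the precision parameter described in the text.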
In particular, as tends to zero, most of the samples share the same value, whereas when tends to infinity, the are almost i.i.d. samples from . Placing a prior on thus allows us to draw inferences about the number of mixture components; a typical choice (Ishwaran and James, 2002) is to assume a Gamma prior, i.e. , where . Choosing a large value for is particularly relevant, because it encourages clustering; Escobar and West (1995) suggest , with . The prior specification for each component , completes with an appropriate choice for . In the following, since we aim to group areas together that exhibit similar temporal trends, we are only interested in modelling the mean with a mixture distribution. This is easily accommodated within the framework here by setting to equal some parameter . This parametrization has been studied widely in literature and references can be found, for example, in West, 1992, West and Cao, 1992 and Ishwaran and Zarepour (2000). To specify the form of the mean we consider the following state space model where is a state vector, is a zero mean temporally uncorrelated error with covariance matrix and is an incidence vector where its th element is equal to if and zero otherwise. Hence, conditionally on the class assignment variables , the structure of implies that where is the th element of and the error terms are uncorrelated in time but correlated in space. In particular, the vector , denotes a zero mean spatial Conditional Autoregressive (CAR) process (Cressie, 1993) with precision matrix , where , are conditional variances and is a matrix which captures the spatial dependence at time , with elements satisfying the condition for a discussion on the symmetry conditions see, for example, Ippoliti et al. (2018). 
A possible choice for requires first the specification of , a symmetric regional neighbour-incidence matrix (with elements equal to if regions and are adjacent and otherwise) and then the definition of as , where is a diagonal matrix whose element is the th row sum of , i.e. . This specification implies that and that where is a spatial dependence parameter. For simplicity, it is assumed here that each region has at least one neighbour and that, for isolated regions or islands, the set of neighbours is determined by the regions within a distance of kilometres. The use of disconnected graphs (i.e., a graph containing a singleton node/region with no neighbours or a graph split in different sub-graphs) for the CAR specification is not straightforward (Freni-Sterrantino et al., 2018) and it is not considered here. Also, we note that, although we consider the time series to be spatially correlated, using our approach it is possible to cluster into a single group multiple time series that have similar temporal patterns, but that are located far away from each other. This is in contrast with other approaches that, by imposing Markov random field (MRF) priors on the mixture parameters (Blekas et al., 2007) or on the cluster assignment variables (Jiang and Serban, 2012), force the spatial clustering of the series. The state space model described above represents a local level model where the mean of is group specific and varies over time. If is considered as a vector of parameters then, Eq. (6) can be interpreted as defining a hierarchical prior for . In particular, if is a square first difference matrix, and , we can write such that with and for the initial conditions. In words, the stochastic trend term has a variance which is increasing with time (and thus can wander over an increasing wide range), but changes only gradually over time being consistent with the assumption that and tend to be close to one another. 
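As an illustration of the neighbour-based specification, the sketch below builds a CAR precision matrix from a binary adjacency matrix. It assumes the common proper-CAR parameterization in which the precision equals (D - rho W) / sigma2, with D the diagonal matrix of row sums of W; the names `car_precision`, `rho` and `sigma2` are hypothetical, and the exact form used in the paper may differ.

```python
def car_precision(W, rho, sigma2):
    """Precision matrix (D - rho * W) / sigma2 of a proper CAR model.
    W is a symmetric 0/1 adjacency matrix given as a list of lists."""
    n = len(W)
    row_sums = [sum(row) for row in W]
    if any(s == 0 for s in row_sums):
        # Mirrors the requirement that every region has a neighbour.
        raise ValueError("every region needs at least one neighbour")
    return [
        [((row_sums[i] if i == j else 0.0) - rho * W[i][j]) / sigma2
         for j in range(n)]
        for i in range(n)
    ]

# Toy map: three regions on a line, 1-2-3.
W = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
Q = car_precision(W, rho=0.9, sigma2=1.0)
```

Under this assumed parameterization, each region's conditional mean is rho times the average of its neighbours and its conditional variance is sigma2 divided by its number of neighbours.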
It is thus worth noting that the state equation can be interpreted as already providing us with a prior for implying that with . This is an example of a hierarchical prior, since the prior for depends on which, in turn, requires its own prior. Finally, for estimation purposes, it is also worth considering that, by isolating each variable in turn, and considering the individual conditional distributions , , at each site, where denotes all other values than , the general CAR specification described above for implies that for spatial coefficients and . A special case is the Intrinsic Conditional Autoregressive (ICAR) model which uses and (Cressie, 1993). Incidentally we note that Eq. (8), by Brook’s Lemma (Brook, 1964), gives rise to a pseudo-likelihood approximation of the joint distribution of , which is efficient for simple Gaussian fields see, Besag, 1975, Besag, 1977.

Posterior inference

Assume that is a vector collecting all model parameters. Estimation of is based on the posterior distribution of given the data . The posterior distribution is defined through Bayes’ theorem as the product of the observed-data likelihood function and the prior . Under this framework, we use Markov chain Monte Carlo (MCMC) methods and posterior inference is obtained by implementing a Gibbs sampler. Computationally, we only need to calculate the full conditionals of each parameter given all other parameters, which is usually not hard. In the following, we provide details for the relevant conditional distributions. First, it is important to recall (see, for example, Sahu and Mardia, 2005) that and that where is a matrix with th row equal to . It thus follows that the full conditional distribution of is the multivariate Normal distribution where Then, using Eq. (8), we can thus write Conditional on , the state equation of model (5)–(6) can be sampled by means of the well-known forward filtering–backward sampling algorithm of Carter and Kohn (1994) or Durbin and Koopman (2002). A further important part of the estimation refers to the sampling of whose posterior only depends on the data through the variables . Accordingly, has the usual posterior distribution (Ishwaran and James, 2002) The posterior for the class assignment indices is multinomial with probabilities Furthermore, since the precision matrix of the CAR process can be time dependent, we specify an additional state-space model. In particular, for the conditional variance , we introduce the following representation where the th element of is , denotes the th element of , , is a () vector of independent errors, is and is independent of and . The constant is an offset that we set at 0.001. The system in this form has a linear, but non-Gaussian state space form, because the innovations in the measurement equations are distributed as a . As described in Kim et al. 
(1998), in order to further transform the system in a Gaussian one, a mixture of seven Normals is used as an approximation of the . Let denote the indicator variable of the th Normal from which is drawn from, and let and be component indicators for all the elements of . Conditional on (and the other parameters), Eqs. (9), (10) define a Gaussian state-space model and, hence, the algorithm by Durbin and Koopman (2002) can be used to draw . In particular, let , be the component probability of the th Normal of the mixture with mean, , and variance , for - see, Kim et al., 1998. Then for , , and . Regarding the evolution of the spatial correlation parameter, , we specify the following state equation where , is independent of , and . Eq. (12) is combined with the measurement equation defined in (8), so that the algorithm of Durbin and Koopman (2002) can be used to draw the states. Finally, posterior simulation of follows closely that of . Let , and . Then, the state space model with measurement and state equations is specified by As for , the procedures described by Kim et al. (1998) and Durbin and Koopman (2002) are used to draw .
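To make the state-sampling step more concrete, the following sketch implements forward filtering and backward sampling in the style of Carter and Kohn (1994) for a scalar local level model. It is a simplification under stated assumptions: the model in the paper has a vector state and spatially correlated errors, whereas here the observation variance `V`, the state variance `W` and the prior moments `m0`, `C0` are hypothetical scalar constants.

```python
import random
from math import sqrt

def ffbs_local_level(y, V, W, m0=0.0, C0=10.0, rng=None):
    """One posterior draw of the states of y_t = b_t + e_t, b_t = b_{t-1} + w_t,
    with e_t ~ N(0, V) and w_t ~ N(0, W). Returns (draw, filtered_means)."""
    rng = rng or random.Random()
    m, C, R = [], [], []
    m_prev, C_prev = m0, C0
    for obs in y:                       # forward Kalman filter
        r = C_prev + W                  # one-step-ahead state variance
        k = r / (r + V)                 # Kalman gain
        m_prev = m_prev + k * (obs - m_prev)
        C_prev = r * (1.0 - k)
        R.append(r)
        m.append(m_prev)
        C.append(C_prev)
    T = len(y)
    beta = [0.0] * T
    beta[-1] = rng.gauss(m[-1], sqrt(C[-1]))
    for t in range(T - 2, -1, -1):      # backward sampling
        b = C[t] / R[t + 1]
        cond_mean = m[t] + b * (beta[t + 1] - m[t])
        cond_var = C[t] * (1.0 - b)
        beta[t] = rng.gauss(cond_mean, sqrt(cond_var))
    return beta, m

# Toy series (hypothetical values).
y = [0.1, 0.4, 0.3, 0.9, 1.2, 1.0]
states, filtered = ffbs_local_level(y, V=0.5, W=0.1, rng=random.Random(42))
```

Within a Gibbs sampler, a draw like `states` would be generated at each iteration conditional on the current values of the variances.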

Clustering time series of confirmed COVID-19 deaths

Here, we complete the statistical analysis of the data introduced in Section 2 by applying the methodology discussed in Section 3 to group the 145 regions into distinct communities based on their temporal patterns. In particular, we fit several competing models by considering $\sigma^2_{\nu}$, $\rho$ and $\sigma^2_{\varepsilon}$ either as time-varying parameters or as constant through time. Overall, we evaluate eight different model parametrizations. For all the fitted models, the MCMC algorithm is run for 60,000 iterations and posterior inference is based on the last 10,000 draws, using every 5th member of the chain to reduce autocorrelation within the sampled values. Convergence of the chains is monitored visually through trace plots as well as using the R-statistic of Gelman (1996). When convergence is attained, to avoid label switching (Stephens, 2000), we follow Nieto-Barajas and Contreras-Cristan (2014) and Dahl (2006) and choose the representative clustering structure which minimizes the deviation from a pairwise clustering matrix which, taking into account all MCMC iterations, provides an estimate of the probability that any two time series belong to the same cluster. Then, in order to compare cluster structures resulting from different model specifications, we use a heterogeneity measure (HM) defined on the sets of indices and the sizes of the groups of a clustering structure. This measure of cluster validity is expected to be as small as possible. Additionally, for model comparison, we also evaluate a measure of goodness of fit. In particular, we consider the logarithm of the pseudo marginal likelihood (LPML), which requires the computation of the Conditional Predictive Ordinate (CPO) statistics (Geisser and Eddy, 1979; Mukhopadhyay and Gelfand, 1997), i.e. the predictive density of each time series given all the others. Given posterior samples of the model parameters, the LPML is a predictive measure of model performance, with larger values indicating a better model fit.
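The selection of a representative clustering can be sketched as follows. The code estimates the pairwise co-clustering probabilities from a list of sampled label vectors and returns the sampled partition closest to them in squared error, in the spirit of the least-squares criterion attributed to Dahl (2006); the function names and the toy draws are assumptions.

```python
def coclustering_matrix(draws):
    """Estimate P[i][j] = probability that series i and j share a cluster,
    averaging over MCMC draws of the label vector."""
    n = len(draws[0])
    P = [[0.0] * n for _ in range(n)]
    for labels in draws:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    P[i][j] += 1.0
    M = len(draws)
    return [[v / M for v in row] for row in P]

def least_squares_clustering(draws):
    """Return the sampled partition whose 0/1 association matrix is
    closest (in squared error) to the co-clustering matrix."""
    P = coclustering_matrix(draws)
    n = len(P)

    def loss(labels):
        return sum(
            ((1.0 if labels[i] == labels[j] else 0.0) - P[i][j]) ** 2
            for i in range(n) for j in range(n)
        )

    return min(draws, key=loss)

# Toy draws for four series; labels differ across draws but three of the
# four draws encode the same partition {0, 1}, {2, 3}.
draws = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1], [2, 2, 5, 5]]
best = least_squares_clustering(draws)
```

Because the criterion depends only on whether two series share a label, it is unaffected by relabelling of the clusters across iterations.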
By considering the model selection criteria described above, Table 2 suggests that models M5 and M7 do not provide the best fit of the data, but produce the most homogeneous clusters. On the other hand, a better fit can be obtained by using models M2 and M6, which also give the lowest number of clusters. Accordingly, a good balance between heterogeneity and goodness of fit seems to be offered by model M2, which is characterized by the highest LPML and a good value of the heterogeneity measure HM. For this model, the regions are grouped in 12 clusters, both the conditional variance and the spatial parameter of the CAR are time-varying parameters, and the variance of the measurement error is constant. The dynamics of the time-varying parameters, with 95% credible intervals, are shown in Figs. 3 and 4. The pattern of the conditional variance $\sigma^2_{\nu,t}$, consistent with the data, shows three peaks over the study period. There are also nearly zero values between the end of May and the end of September, denoting small individual (regional) effects in this period. This might be due to the implementation of strict national and subnational measures during the first phase and to favourable climate conditions, though this is difficult to prove. The dynamics of $\rho_t$ show that, on average, the spatial dependence parameter varies between 0.85 and 0.92, reaching its minimum value in the second week of May. Although the range of values appears limited, its plot suggests that the spatial correlation increases with the resumption of social and economic activities, reaching two peaks of different amplitude around the second week of November and the third week of January. As can be seen from Figs. 1 and 3, these peaks are aligned both with the last two peaks of $\sigma^2_{\nu,t}$ and with those of several time series.

Table 2

LPML and HM statistics for different model parametrizations. The last column reports the number of estimated groups.

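Model	LPML	HM	Number of groups
M1 ($\sigma^2_{\nu,t}$, $\rho_t$, $\sigma^2_{\varepsilon,t}$)	1864.60	2181.20	19
M2 ($\sigma^2_{\nu,t}$, $\rho_t$, $\sigma^2_{\varepsilon}$)	9085.03	2289.01	12
M3 ($\sigma^2_{\nu,t}$, $\rho$, $\sigma^2_{\varepsilon,t}$)	1329.00	2193.60	16
M4 ($\sigma^2_{\nu}$, $\rho_t$, $\sigma^2_{\varepsilon,t}$)	−115.92	2363.10	23
M5 ($\sigma^2_{\nu}$, $\rho$, $\sigma^2_{\varepsilon}$)	544.45	1790.10	19
M6 ($\sigma^2_{\nu,t}$, $\rho$, $\sigma^2_{\varepsilon}$)	8557.31	2588.42	12
M7 ($\sigma^2_{\nu}$, $\rho_t$, $\sigma^2_{\varepsilon}$)	862.44	1809.09	19
M8 ($\sigma^2_{\nu}$, $\rho$, $\sigma^2_{\varepsilon,t}$)	−124.84	2922.09	24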
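An LPML value such as those in Table 2 is commonly estimated from MCMC output with the harmonic-mean form of the CPO. The sketch below assumes that a matrix of log-likelihood evaluations, one row per series and one column per retained draw, is already available; the function names and the toy numbers are hypothetical, and the paper does not state which estimator was used.

```python
from math import exp, log

def log_cpo(loglik_draws):
    """log CPO_i = log M - logsumexp(-loglik), i.e. the log of the
    harmonic mean of the likelihood of series i over M posterior draws."""
    neg = [-l for l in loglik_draws]
    top = max(neg)                               # stabilises the exponentials
    lse = top + log(sum(exp(v - top) for v in neg))
    return log(len(loglik_draws)) - lse

def lpml(loglik):
    """Sum of log CPO over series; larger values indicate better fit."""
    return sum(log_cpo(row) for row in loglik)

# Toy log-likelihoods: 2 series, 3 posterior draws each.
loglik = [[-1.0, -1.2, -0.9],
          [-2.0, -2.5, -1.8]]
value = lpml(loglik)
```

Working on the log scale avoids numerical underflow when the likelihood of a whole time series is very small.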

Fig. 3

The temporal dynamics of the posterior mean conditional variance $\sigma^2_{\nu,t}$. 95% credible intervals are represented by shaded areas.

Fig. 4

The temporal dynamics of the posterior mean spatial dependence parameter $\rho_t$. 95% credible intervals are represented by shaded areas.

A spatial map of the regions grouped by using model M2 is shown in Fig. 5, where each cluster is represented by a specific colour. Fig. 6 also offers a graphic visualization of the clusters, where the posterior mean group functions are overlaid with the time series of each region. As expected, the curves in the same cluster appear similar in shape, whereas they present different patterns or trends across the clusters. Regions that are geographically close are likely to be in the same cluster, and this applies in particular to several regions in Italy, France, and Germany. However, as expected from the exploratory data analysis (Section 2), the clusters are not always dominated by the spatial proximity of the regions. An example is provided by the second cluster, which mainly includes regions from the South of Germany and the South-Western part of Italy.

Fig. 5

Spatial map representation of the estimated clusters. Each labelled cluster is shown by its own specific colour.

Fig. 6

Weekly confirmed death counts (after data standardization) of the time series in each cluster; the mean curves are represented by thick lines and their colour reflects the label of the clusters shown in Fig. 5.

The temporal patterns of the mean groups represented in Fig. 6 show that, after peaking in early April, the pandemic appeared to be well controlled until the beginning of August. As a result of the introduction of several containment measures and with increasing temperatures, all the regions experienced relatively low counts of deaths, with the mean curves remaining flat for most of the summer period. Focusing on the first phase of the pandemic, the subplots also suggest that the rate of change of the early counts of deaths in a few clusters was higher than in all of the other clusters. This is confirmed in Fig. 7 by the dynamics of the first derivative of the group mean functions, which help to understand their rate of change as well as the persistence of the phenomenon. As can be seen, the regions in clusters and show the highest rate of change (i.e., a higher velocity) during March and April.
From the exploratory analysis of the data, it also results that cluster includes some of the regions with the highest mortality rate as, for example, the Lombardy region (Italy) with the provinces of Bergamo and Brescia, which have been at the centre of Italy’s coronavirus outbreak started in February 2020.

Fig. 7

First derivative functions of the group mean curves.
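A simple way to approximate such derivative curves from a weekly mean function is by finite differences, as in the sketch below. The paper does not state how the derivatives were obtained, so the central-difference scheme, the function name `first_derivative` and the toy curve are assumptions.

```python
def first_derivative(curve, dt=1.0):
    """Approximate the derivative of an equally spaced curve:
    central differences inside, one-sided differences at the ends."""
    n = len(curve)
    if n < 2:
        raise ValueError("need at least two points")
    d = [0.0] * n
    d[0] = (curve[1] - curve[0]) / dt
    d[-1] = (curve[-1] - curve[-2]) / dt
    for t in range(1, n - 1):
        d[t] = (curve[t + 1] - curve[t - 1]) / (2 * dt)
    return d

# Toy mean curve (hypothetical values), one point per week.
mean_curve = [0.0, 1.0, 4.0, 9.0, 16.0]
velocity = first_derivative(mean_curve)
```

Large positive values of `velocity` correspond to weeks in which the mean curve of a cluster rises quickly, while values near zero indicate a plateau.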

Cluster includes a few regions from Spain and one from England. This cluster appears highly heterogeneous in the period October 2020–February 2021. Furthermore, its mean function highlights a slightly delayed onset of the pandemic compared to cluster 5, which peaked earlier than cluster 7 in the first phase. Cluster is similar in shape to cluster ; however, the latter appears more heterogeneous, with a few time series showing different timing effects, especially in January and February 2021. In contrast with these patterns, cluster 1, mainly composed of regions from the Central and North-Eastern part of Germany and most of the Swiss Confederation, shows the smallest rate of change and slow dynamics of the phenomenon in the first phase. All other clusters do not seem to differ substantially in terms of variations during the first COVID-19 wave. Further differences among clusters can be appreciated by studying the dynamics of the series in the period August 2020–February 2021. In general, all the series show that the number of deaths surged after growth slowed in summer. Almost all the series, in fact, trended upward during the autumn. An exception to these trends is the pattern in cluster , where the regions do not seem to show any important tendency of rebound after the first phase. Understanding the potential reasons behind this trend may be crucial to identify key factors in the definition of effective preventive policies. Changes in the healthcare system and in the exposure of the vulnerable population, as well as restrictive measures such as local lockdowns, might all have played an important role in flattening the mortality curves.
For example, many regions have seen a marked improvement in the healthcare system, both in terms of intensive care unit (ICU) capacity and access to novel effective treatments, which has guaranteed timely and effective assistance to the most severe cases, with a consequent reduction of the mortality rate. A further explanation for the reduced mortality in cluster , during the second wave, could be found in the harvesting effect. According to this theory, the most vulnerable population, made up of the elderly and those with health conditions, might have died during the first COVID-19 wave, especially in regions with a high infection rate. By contrast, the regions that were spared in the first phase then showed an increase in mortality in the second phase. Clusters , , and show similar bimodal mean shapes in the last five months, with some differences appearing in the rate of change and in the persistence of the phenomenon. In particular, the number of deaths in clusters and rises more steeply than in other clusters from late September, while in clusters and the resurgence appears slower and does not reach a plateau until the beginning of February. These temporal patterns also differ from the dynamics of clusters and , for which only one peak is observed near the end of the third week of January, denoting a significant reappearance of the phenomenon in the UK. For the regions in cluster , this resurgence could be linked to the English coronavirus variant, which rapidly outcompeted pre-existing variants in the Southeast region of the country (Davies et al., 2021). Finally, the dynamics of cluster , which includes most of the regions with the lowest population density, highlight a significant rate of change (see Fig. 7) of mortality between October and November 2020, with regions reaching their peak around the second week of November.
In contrast with many other clusters, for this group of regions, mortality steadily decreased from the third week of November to reach low, stable levels in 2021.
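The rate-of-change comparisons above rest on the first derivatives of the group mean curves shown in Fig. 7. As a minimal sketch of that idea, and not the authors' procedure, the following Python code averages the series assigned to one cluster and approximates the derivative of the resulting mean curve by finite differences. The function names and the toy data are hypothetical, and a regular time grid (for example weekly observations) is assumed.

```python
def group_mean(series):
    """Pointwise mean of equally long time series belonging to one cluster."""
    n_times = len(series[0])
    if any(len(s) != n_times for s in series):
        raise ValueError("all series must have the same length")
    return [sum(s[t] for s in series) / len(series) for t in range(n_times)]


def first_derivative(curve, step=1.0):
    """Finite-difference derivative on a regular grid with spacing `step`.

    Central differences are used at interior points and one-sided
    differences at the two end points.
    """
    n = len(curve)
    if n < 2:
        raise ValueError("need at least two time points")
    deriv = []
    for t in range(n):
        if t == 0:
            d = (curve[1] - curve[0]) / step
        elif t == n - 1:
            d = (curve[-1] - curve[-2]) / step
        else:
            d = (curve[t + 1] - curve[t - 1]) / (2.0 * step)
        deriv.append(d)
    return deriv


# Hypothetical toy data: two standardized series assigned to the same cluster.
cluster_series = [
    [0.0, 1.0, 4.0, 9.0, 16.0],
    [0.0, 3.0, 6.0, 11.0, 16.0],
]
mean_curve = group_mean(cluster_series)
rate_of_change = first_derivative(mean_curve)
```

Large positive values of `rate_of_change` correspond to a steep rise in the mean mortality curve, and values near zero to a plateau. In a model-based analysis the derivative would more naturally be taken from the fitted smooth mean function rather than from raw differences.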

Conclusions

In this paper, we have analysed the regional patterns of spatio-temporal data on confirmed deaths by COVID-19 in NUTS-2 regions from six European countries between March 2020 and February 2021. To uncover homogeneous patterns of COVID-19 progression, we have implemented a Bayesian nonparametric approach that models (standardized) time series patterns as a mixture of Gaussian processes, specifying a Dirichlet process prior over mixture components. Although the statistical analysis was carried out under Gaussianity assumptions, our model could be easily cast within the more general framework of generalized linear models (GLM), where the data are assumed to be from an exponential family with distribution . For example, considering the death counts, one obvious choice would be to assume a Poisson regression model with a log link function. In order to keep computations for the posterior distributions simple, one may then apply the data augmentation technique proposed by Tanner and Wong (1987). The proposed methodology has allowed grouping the 145 regions into 12 clusters, providing evidence that the effects of the pandemic varied considerably both over time and across countries and regions. Some of the areas that were hit the hardest in the first phase (for example, the Northern region of Lombardy in Italy) saw a steep decline in the number of deaths soon after implementing control strategies, with contagion and deaths far less widespread in autumn and winter 2020. In contrast, other areas, which were not affected in the first wave, saw a sharp rise in deaths during the second and third waves. An example is given by the southern regions of Italy, which became a new hotspot. One advantage of our analysis concerns the possibility of defining community mitigation strategies not only at the country level, but also at the level of clusters, whose members (administrative areas) appear homogeneous in terms of temporal patterns.
This clustering effect calls for strong inter-governmental coordination and for a territorial approach in order to define joint solutions and enhance acceptance of measures at all levels. The aim is thus to facilitate regional cooperation to support recovery strategies by ensuring coherent mitigation guidelines, pooling resources, and strengthening investment opportunities. In particular, promoting sub-national governments could help to evaluate the degree of asymmetry and differentiate aid schemes to align with the differentiated impact of COVID-19. In policy-making, trying to understand the underlying causes of this heterogeneity represents a crucial step to evaluate the efficacy of the different mitigation strategies adopted by central and local governments. However, assessing the effect of the individual intervention policies is a very challenging task due to the inherent multidimensional nature of the phenomenon under investigation. From the statistical analysis, in fact, there is a clear hint that differences in population density, age structure, urbanization and socio-economic factors contribute to differentiating the impact of COVID-19. Indeed, even if coronavirus infects exposed subjects indiscriminately, the severity of disease and the social and economic effects are not being felt equally throughout the European territory, partly due to existing differences in health between and within countries, mostly related to socio-economic inequities. To fully understand the impact of containment policies on the pandemic dynamics, it would be necessary to explicitly relate these factors to the death counts within the proposed methodological framework.
Although the proposed model could easily accommodate the introduction of these factors, in practice we have found it infeasible, as most of the covariates arising from multiple sources are often spatially misaligned or aggregated over different geographical units and, more importantly, they are only spatially referenced, making it impossible to evaluate their temporal effect on disease spreading. Accordingly, here, we have not attempted to establish the causality of the observed differences between clusters, but we have simply provided a tool to evaluate the evolution of the pandemic across European regions. The inclusion of these covariates and other risk factors will be the subject of future studies.
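As a small illustration of the Dirichlet process prior over mixture components mentioned in these conclusions, the sketch below draws mixture weights from the standard truncated stick-breaking representation of a Dirichlet process. It is only a generic illustration under stated assumptions, not the sampler used in this paper: the function name, the concentration value `alpha=1.0` and the truncation level of 20 components are hypothetical choices.

```python
import random


def stick_breaking_weights(alpha, n_components, rng):
    """Draw truncated stick-breaking weights of a Dirichlet process.

    Each fraction v_k ~ Beta(1, alpha) is broken off the remaining stick,
    so that w_k = v_k * prod_{j<k} (1 - v_j). Returns the weights and the
    stick mass left over after truncation.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    weights = []
    remaining = 1.0
    for _ in range(n_components):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights, remaining


# Hypothetical settings: concentration 1.0, truncation at 20 components.
rng = random.Random(0)
weights, leftover = stick_breaking_weights(alpha=1.0, n_components=20, rng=rng)
```

Smaller values of `alpha` concentrate the mass on a few components, which favours a small number of clusters, while larger values spread the mass over many components. This is why the number of clusters does not need to be fixed in advance under such a prior.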

17 in total

1. Selection Sampling from Large Data Sets for Targeted Inference in Mixture Modeling.

Authors: Ioanna Manolopoulou; Cliburn Chan; Mike West
Journal: Bayesian Anal Date: 2010 Impact factor: 3.728

2. Multiple change point estimation of trends in Covid-19 infections and deaths in India as compared with WHO regions.

Authors: Pavan Kumar S T; Biswajit Lahiri; Rafael Alvarado
Journal: Spat Stat Date: 2021-09-03

3. The effect of large-scale anti-contagion policies on the COVID-19 pandemic.

Authors: Solomon Hsiang; Daniel Allen; Sébastien Annan-Phan; Kendon Bell; Ian Bolliger; Trinetta Chong; Hannah Druckenmiller; Luna Yue Huang; Andrew Hultgren; Emma Krasovich; Peiley Lau; Jaecheol Lee; Esther Rolf; Jeanette Tseng; Tiffany Wu
Journal: Nature Date: 2020-06-08 Impact factor: 49.962

4. Discussion of the paper "Clustering Random Curves Under Spatial Interdependence with Application to Service Accessibility" by Jiang and Serban.

Authors: Jiaping Wang; Haipeng Shen; Hongtu Zhu
Journal: Technometrics Date: 2012-05-01

5. A spatio-temporal model based on discrete latent variables for the analysis of COVID-19 incidence.

Authors: Francesco Bartolucci; Alessio Farcomeni
Journal: Spat Stat Date: 2021-03-27

6. Estimated transmissibility and impact of SARS-CoV-2 lineage B.1.1.7 in England.

Authors: Sam Abbott; Rosanna C Barnard; Christopher I Jarvis; Adam J Kucharski; James D Munday; Carl A B Pearson; Timothy W Russell; Damien C Tully; Alex D Washburne; Tom Wenseleers; Nicholas G Davies; Amy Gimma; William Waites; Kerry L M Wong; Kevin van Zandvoort; Justin D Silverman; Karla Diaz-Ordaz; Ruth Keogh; Rosalind M Eggo; Sebastian Funk; Mark Jit; Katherine E Atkins; W John Edmunds
Journal: Science Date: 2021-03-03 Impact factor: 63.714

7. Determining the spatial effects of COVID-19 using the spatial panel data model.

Authors: Hasraddin Guliyev
Journal: Spat Stat Date: 2020-04-07

8. Spatiotemporal analysis and hotspots detection of COVID-19 using geographic information system (March and April, 2020).

Authors: Mohsen Shariati; Tahoora Mesgari; Mahboobeh Kasraee; Mahsa Jahangiri-Rad
Journal: J Environ Health Sci Eng Date: 2020-10-12

9. An ensemble approach to short-term forecast of COVID-19 intensive care occupancy in Italian regions.

Authors: Alessio Farcomeni; Antonello Maruotti; Fabio Divino; Giovanna Jona-Lasinio; Gianfranco Lovison
Journal: Biom J Date: 2020-11-30 Impact factor: 1.715

10. Spatial-temporal generalized additive model for modeling COVID-19 mortality risk in Toronto, Canada.

Authors: Cindy Feng
Journal: Spat Stat Date: 2021-07-06