Literature DB >> 33816095

A spatio-temporal model based on discrete latent variables for the analysis of COVID-19 incidence.

Francesco Bartolucci1, Alessio Farcomeni2.   

Abstract

We propose a model based on discrete latent variables, which are spatially associated and time specific, for the analysis of incident cases of SARS-CoV-2 infections. We assume that for each area the sequence of latent variables across time follows a Markov chain with initial and transition probabilities that also depend on latent variables in neighboring areas. The model is estimated by a Markov chain Monte Carlo algorithm based on a data augmentation scheme, in which the latent states are drawn together with the model parameters for each area and time. As an illustration we analyze incident cases of SARS-CoV-2 collected in Italy at regional level for the period from February 24, 2020, to January 17, 2021, corresponding to 48 weeks, where we use number of swabs as an offset. Our model identifies a common trend and, for every week, assigns each region to one among five distinct risk groups.
© 2021 Elsevier B.V. All rights reserved.

Entities:  

Keywords:  Data augmentation; Hidden Markov models; MCMC; SARS-CoV-2; Swabs

Year:  2021        PMID: 33816095      PMCID: PMC7997863          DOI: 10.1016/j.spasta.2021.100504

Source DB:  PubMed          Journal:  Spat Stat


Introduction

COVID-19 syndrome is due to newly discovered SARS-CoV-2 coronavirus, which acquired the ability to infect humans, and for human-to-human transmission, during 2019. See Hu et al. (2020) and references therein for a review. Due to very limited pre-existing immunity, SARS-CoV-2 has the potential to be highly infectious. Simultaneously, pathogenicity is high enough to generate a proportion of severe cases (Buss et al., 2021, Del Sole et al., 2020) that can overwhelm health systems (Grasselli et al., 2020, Farcomeni et al., 2021a) when prevalence is high. Monitoring the epidemics is therefore a priority for policy, planning, and resource allocation. Public data for Italy are available at regional level. These include incident cases (i.e., new positives), prevalent cases (i.e., currently infected), deaths, number of infected currently in hospital wards, number of infected specifically hospitalized in Intensive Care Units, number of swabs, and number of tested cases. Data are timely updated, every day at around 6p.m., by the Italian Civil Protection Department. However, counts are subjected to measurement errors and biases of various nature, among which we mention two important sources of error that we take into account in our model. First of all, data collection is not standardized and, in many areas, not in electronic form. This results in frequent late notifications and communication errors that are corrected afterwards. Adjustments are made to the daily count, in order to obtain the correct cumulative number of cases; this may make daily counts inconsistent, and occasionally negative. This is the main reason why we will work with weekly incidence, even if counts are available on a daily basis. This is not ensured to resolve measurement issues but, as can be seen from our descriptive analyses, it mitigates them sufficiently. Secondly, it can be easily argued that the most prominent source of bias arises from undercount. Several infections from SARS-CoV-2 are asymptomatic or paucisymptomatic, and will easily go undetected (Li et al., 2020). This is linked to the difficulties in testing and tracing (Contreras et al., 2021). Indeed, SARS-CoV-2 infection at first could be detected only through polymerase chain reaction of nasopharingeal swabs, whose availability was very limited, and yet has not scaled sufficiently. Notably, in the first months of 2020, swabs were mostly used in Italy for confirmation of severe symptomatic cases, with the exception of Veneto and some provinces of other regions. Diagnostic testing was later extended also to asymptomatic subjects, through contact tracing and screening strategies. Contact tracing efforts have varied wildly over time and space due to unobserved reasons (e.g., availability of swabs, tracers, underlying incidence, policies), making the proportion of identified cases highly and unpredictably variable over time and space. This is another important reason why we will model weekly counts using swabs as an offset, essentially studying positive rate rather than incidence. We prefer to offset with respect to the number of swabs rather than that of tested cases, as the latter has been added to the data set only starting from April 19, 2020. More importantly, a positive rate defined with respect to the number of swabs has been identified by WHO as an official indicator of a nation’s ability to flatten the curve. Being focused on weekly positive rate, and on a flexible model that can take into account spatio-temporal unobserved heterogeneity, we claim that our analysis is somehow more reliable than more common analyses simply focused on fixed-effects models for daily incident cases. On the other hand, our results should be interpreted with some care: a high positive rate is an indication of inability to perform contact tracing, and only an indirect indication of high incidence. In order to analyze data of the type described above, we propose a model based on discrete latent variables, so that regions may be clustered in groups corresponding to different levels of severity. More precisely, we adopt a model with a structural component formulated in the spirit of Bartolucci and Farcomeni (2021). Discrete latent variables are assumed to depend on the current status of their neighbors according to an auto-logistic model (Besag, 1974). More precisely, each area-time-varying latent distribution depends, via a logistic parameterization, on the latent states of the neighbors for the same time occasion, and on the previous latent state for the same site (as in a standard hidden Markov model). For each geographical region, the sequence of latent variables is assumed to follow a first-order Markov chain with initial and transition probabilities depending on the latent states of the neighboring areas. The model may be seen as an extension of a hidden Markov model for longitudinal data (Bartolucci et al., 2013, Bartolucci et al., 2014), accounting for spatial interaction. From another perspective, we are adopting a temporal extension of a hidden Markov field model (Qian and Titterington, 1991, Green and Richardson, 2002, Spezia et al., 2018) for spatial data, based on discrete latent variables. Unlike Bartolucci and Farcomeni (2021), which is restricted to binary outcomes, in this work we outline a class of models based on general parametric assumptions on the outcome. The proposed inferential strategy seems to be stable under different model specifications. Another distinctive feature with respect to Bartolucci and Farcomeni (2021), that we find very useful to modeling COVID-19 data, is that a common trend is estimated through splines, therefore without strict parametric assumptions on its shape. This allows us to automatically fit more pandemic waves within the same model. On the other hand, more common parametric models, typically based on logistic growth curves (e.g., Cabras, 2020, Girardi et al., 2020, Alaimo Di Loro et al., 2020), are restricted to modeling only a single wave at a time, and often involve arbitrarily setting an initial and final date for the specific wave. For model estimation we adopt a Markov chain Monte Carlo (MCMC) algorithm that extends the proposal of Bartolucci and Farcomeni (2021) to the model class here adopted. The algorithm is based on data augmentation (Tanner and Wong, 1987), and alternates different steps at which parameter values and latent states are drawn. For selecting the number of latent states we adopt a simple strategy based on post-processing the MCMC output, thus avoiding computational overheads and issues with respect to estimating the marginal likelihood or ratios of marginal likelihoods. The particular model we use for COVID-19 data assumes that the number of incident cases for a certain area and time occasion follows a Poisson regression model, with offset equal to the logarithm of the number of swabs, conditionally on a linear predictor that depends on a specific discrete latent variable. In particular, the log-linear predictor is based on splines of time of suitable order common to all areas, and a shift that depends on the value of the underlying latent variable. This allows us to separate a common time trend from unobserved heterogeneity, which is completely captured by the latent variable. Additionally, our specification allows us to cluster areas with respect to shifts from the common trend, an operation that can be used to identify risk profiles in an unsupervised manner. Italian authorities have worked for some time with three main risk profiles (areas can be yellow, orange, or red) and have recently increased the number of risk profiles to five (adding white and “dark orange” groups). Risk profiles are currently identified according to an algorithm which is mainly based on estimates of effective reproductive number, restricted to the symptomatic cases. The rest of the paper is organized as follows. In the following section we illustrate the assumptions of the proposed model. Bayesian inference, and in particular the MCMC algorithm for parameter estimation, is described in Section 3 together with model selection. The application to Italian regional data is described, together with the data structure, in Section 4. Section 5 provides some conclusions, and lists routes for possible extensions.

Model assumptions

Let be the count variable of interest for area at time , where and . In our context this variable corresponds to the daily number of positives in a certain region. Let denote the observed value of , with all values collected in the matrix . Finally, we represent the spatial structure by the indicator variables , where if area is in the neighborhood of (and viceversa) and otherwise, for . The first assumption of our model is that a discrete latent variable is associated to each area and time so that, given the set of all these latent variables, the response variables are conditionally independent. These latent variables have support points, referred to as latent states, labeled from 1 to . We assume that , conditionally on , has a distribution belonging to the natural exponential family (see also McCullagh and Nelder, 1989, Farcomeni, 2015): where functions , , and are known, while is a parameter of interest and a nuisance parameter. We then adopt a specific link function and assume that where is an intercept specific to latent state and is the base vector at time of splines of a suitable order , with prespecified knots (e.g., Wood, 2017). The parameters , together with , will collected in the vector . In our application we assume that , conditional on , follows a Poisson regression model with offset . The offset is set equal to the logarithm of the number of swabs in the same area and for the same period. The log-linear predictor depends additionally on the value of the underlying latent variable, plus a common trend that is independent of the latent state. More formally, we assume that with Regarding the assumptions on the distribution latent variables, we slightly generalize [5] by allowing general dependence structures among each latent state and its neighbors, as we summarize in the following. Let denote the vector of latent variables of the neighbors of area at time , that is, variables such that . For the initial probabilities we assume that where is a known function. In our implementation we specify so to obtain a vector with a leading unity (for the intercept) and the remaining elements corresponding to the proportion of neighbors currently dwelling in each latent state apart from the first. For ease of notation we collect vectors in the matrix . Regarding the transition probabilities, we assume a similar parameterization, but using the starting state as reference category, that is, for . Here is a possibly time-dependent known function. In our implementation we specify for all . Parameters for the transition distribution are collected in matrix , with vectors for , with . For all parameters we assume prior independence and that they a priori follow a zero-centered Gaussian distribution. Formally, for and we assume that

Bayesian inference

In this section we describe an algorithm for approximately sampling from the posterior distribution, which is closely related to that in Bartolucci and Farcomeni (2021). We follow an augmentation scheme according to which latent states are sampled from their full conditional at each iteration. Let denote a matrix with element for and . Formally, Bayes theorem can be used to show that the posterior distribution is proportional to where the normalizing constant cannot be obtained in closed form. A particularly challenging factor to be computed among those in (7) is the full conditional distribution of , considering the assumed spatio-temporal dependence structure. We approximate this distribution as in Bartolucci and Farcomeni (2021) through the following pseudo-probability It must be clarified that the above expression coincides with the true probability under spatial independence, whereas in general the quality of the approximation decreases as the spatial dependence becomes stronger. However, adopting this approximation is rather common in spatial statistics even for simpler models, and leads to the definition of a pseudo-likelihood (Besag, 1975); see also Spezia et al. (2018). A brief additional discussion on this choice is given in the concluding section. Regarding the conditional distribution of the response variables , with realizations collected in , under the conditional independence assumption it is straightforward to see that where is the density or probability mass function corresponding to the assumed distribution, that belongs to the natural exponential family, defined in (1). Dependence on is formulated according to (2). For our specific application, each response variable has a conditional Poisson distribution defined as in (3) and link function defined in (4).

Markov chain Monte Carlo algorithm

In order to approximate the posterior distribution of model parameters, we rely on an MCMC algorithm, with an augmented parameter space. The algorithm is based on repeating a sequence of iterations for a large number of times , where initial iterations are finally discarded (burn-in). Each of these steps can be summarized as follows: Update of theparameters: A candidate update is sampled from a Gaussian distribution centered at the current value , with variance . The proposed parameter vector is accepted with probability Update of the latent variable values: here, differently from Bartolucci and Farcomeni (2021), for and we use a pseudo-Gibbs sampling step, based on the -dimensional vector of probabilities with elements equal to where is the current matrix of latent states , with elements substituted by . The new value of is drawn from a Multinomial distribution with parameters 1 and . Update of theparameters: for we propose a new parameter vector which is accepted with probability where is the current matrix with vector substituted by . Update of theparameters: for , with , we propose a new parameter vector , denoted by , which is accepted with probability where is the matrix with parameter vector substituted by . The products in the previous expression may be restricted to the only cases in which . The MCMC output may be elaborated in the usual way to obtain point estimates and credible intervals for the model parameters. Similarly, latent states are predicted according to a maximum a posteriori rule, that is, the predicted latent state for each area at each time point corresponds to the latent state most frequently sampled during the MCMC iterations, after ignoring burn-in. It is straightforward to see that the model is label independent and that consequently the posterior distribution will have modes. There are several ways of dealing with this label switching issue. In our implementation we use an on-line approach that, at every iteration of the MCMC algorithm, maps the sampled parameters so that latent intercepts are increasingly ordered. This is simpler and implies a reduced computational load with respect to the post-processing approach of Bartolucci and Farcomeni (2021). For selecting the number of support points of the latent variables we use a similar approach to Bartolucci and Farcomeni (2021). In our Bayesian framework it would be natural to set up a reversible jump sampling scheme, after specification of a prior distribution for the number of latent states. This would anyway be cumbersome both from a computational and formal perspective, as acceptance probabilities for transdimensional moves are not easily derived under the current general model formulation. A simple solution is to repeatedly fit the model for different values of , and compare the results through an appropriate tool like an information criterion. This is particularly advantageous from a computational perspective, as parallel computing can be set up in order to simultaneously fit the model for different number of latent states. In particular, we suggest to rely on the WAIC (Watanabe, 2010, Vehtari et al., 2017), which is a measure of the predictive accuracy and is a direct by-product of our MCMC sampling scheme. For each model specification, the WAIC is computed first by evaluating the log-pointwise predictive density and where is the variance of across the iterations producing parameter vectors . The expected log-pointwise predictive density for a new data set is then computed as Due to a certain instability in the estimation of this variance that we noted in the application, the above quantities are computed only using the final part of the MCMC iterations. An advantage of the WAIC is that it does not involve the marginal likelihood, and it is simpler to compute than other Bayesian criteria. Additionally, unlike information criteria adopted in the frequentist approach, it does not require computation of the maximum of the observed likelihood.

Application

We applied the approach illustrated in the previous sections to the Italian regional data for the period from February 24, 2020, to January 17, 2021. There are wide daily oscillations and a clear weekly seasonality, with substantially fewer swabs on Sunday with respect to the rest of the week. In order to overcome problems with data reporting we simply aggregate data at weekly level, as already mentioned. In the end, we record weeks of observation for each of areas (19 corresponding to Italian regions and 2 corresponding to the two provinces of the Trentino Alto Adige region: Bolzano and Trento). The analysis is based on the number of new positives reported in each week, and the corresponding number of swabs. The first is our outcome of interest , while, on the basis of the second, we obtain the offset . Our aims with this analysis are to: () identify a common trend for Italy, after having taken into account variation at regional level that is due to time-varying unobserved heterogeneity and spatial dependence; () compare regional trends with respect to the common trend; () identify latent trajectories in order to identify different risk profiles; and () identify latent clusters in order to dynamically assign (i.e., specifically for each week) a region to a risk profile.

Data description

In Table 1 we report descriptive statistics about the number of swabs, number of positives, and rate of positives at regional and Italian levels. We observe heterogeneity among regions, with an average rate of positives that goes from 4.23% for Calabria to 9.58% for Lombardia, while the Italian average rate is 6.87%. Regarding the temporal pattern, in Fig. 1, Fig. 2 we represent the weekly positive rate region by region, together with the Italian tendency obtained by fitting our model with , that is, a fixed-effect Poisson spline regression with offset. This regression model is based on splines of cubic order, and knots at each fourth week starting from the fifth.
Table 1

Descriptive statistics for the number of swabs, number of positives, and rate of positives at regional and Italian levels. P. A. stands for ”Provincia Autonoma”.

RegionAreaN. swabs
N. positives
Rate
minmeanmaxminmeanmaxminmeanmax
AbruzzoSouth521210431584583846660.00090.05380.2301
BasilicataSouth39427412729026715910.00000.04260.1584
CalabriaSouth351024423170161938030.00010.04230.1641
CampaniaSouth3804761416493684411273190.00030.06150.2209
Emilia-RomagnaNorth1795598631316031184302172180.00320.07410.3449
Friuli Venezia GiuliaNorth24322081550903129757210.00020.04230.1297
LazioCenter7246304518807154035184600.00300.04360.1350
LiguriaNorth135164824117632140070860.00360.08910.3660
LombardiaNorth742211133929384839110940600260.00720.09580.4609
MarcheCenter10313161417758106645270.00120.08370.4482
MoliseSouth62707792801619140.00000.04500.1701
P. A. BolzanoNorth21861626080171442100.00070.06480.3810
P. A. TrentoNorth1161036322639052517400.00000.05840.3737
PiemonteNorth36243306145671384582276860.00210.09320.3974
PugliaSouth262251436364632328111230.00030.06120.1875
SardegnaSouth291122528203076233100.00000.05110.1465
SiciliaSouth2953058813434952577126740.00030.05490.1764
ToscanaCenter57243394125867132720164570.00150.04770.1840
UmbriaCenter351171830907168739920.00010.04480.1805
Valle d’AostaNorth1015675019016311110.00000.08570.4264
VenetoNorth686277527177645266344262820.00040.06600.2263

North191663511447877331034302671445720.00500.07750.2931
Center1434131318360265458509433810.00190.04970.1737
South10981438993889143511963598370.00060.05570.1568

Italy2169862636115101901306507382434440.00400.06870.2543
Fig. 1

Positive rate for twelve regions, together with the Italian tendency (in red). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 2

Positive rate for nine regions, together with the Italian tendency (in red). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

In terms of partition between areas, the North presents an average rate of 7.75%, which is distant from that of the Center and South that have a rate around 5%. This is due to the fact that northern regions, and especially Lombardia, were hit very hard during the first outbreak. The limited amount of swabs available made it very difficult to perform contact tracing or screening, and for some time swabs were reserved to symptomatic cases with a high index of suspicion, resulting in high positive rates. This is apparent not only for Lombardia but also for Liguria, that experienced a comparatively large outbreak in the late Summer of 2020. A high heterogeneity is also observed in the Center, where the Marche region has an average positive rate of 8.37%, much higher than the other central regions, whereas less heterogeneity is observed in the South. All regions present two or three peaks in the temporal distribution of positive rates, with Lombardia, Marche, and Piemonte clearly being hit harder than other regions in the first few weeks. An interesting example is that of Veneto, which managed well the first wave through relatively massive testing, but then had troubles in Autumn 2020. It is speculated that this is due to the somehow restricted use of molecular swabs to confirm results of rapid antigen tests, which are less accurate. Descriptive statistics for the number of swabs, number of positives, and rate of positives at regional and Italian levels. P. A. stands for ”Provincia Autonoma”. Positive rate for twelve regions, together with the Italian tendency (in red). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.) Positive rate for nine regions, together with the Italian tendency (in red). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.) The adjacency matrix has been defined by considering regions sharing land borders as neighbors. As a consequence, seven regions have three neighbors, which is the mode. Two regions have six neighbors. Sardinia, which is an island, does not have any neighbor according to our definition. For the other island, Sicily, we selected Calabria as the only neighbor. The two regions are separated by the few kilometers wide Strict of Messina, with ferries going back and forth more than 150 times a day. In Table 2 we report the average positive rate for each region in comparison with that of the neighborhood, in order to better appraise the spatial pattern. The correlation between these two quantities is equal to 0.589, which is in agreement with the heterogeneity of the regional situations already noted in commenting Table 1.
Table 2

Number of neighbors, average positive rate, and positive rate of the neighborhood for each region.

RegionN. neighborsMean rate
RegionNeighborhood
Abruzzo30.05380.0519
Basilicata30.04260.0579
Calabria20.04230.0533
Campania40.06150.0477
Emilia-Romagna60.07410.0780
Friuli Venezia Giulia10.04230.0660
Lazio60.04360.0579
Liguria30.08910.0726
Lombardia50.09580.0690
Marche50.08370.0569
Molise40.04500.0518
P. A. Bolzano30.06480.0803
P. A. Trento30.05840.0805
Piemonte40.09320.0884
Puglia30.06120.0590
Sardegna00.0511
Sicilia10.05490.0423
Toscana50.04770.0645
Umbria30.04480.0503
Valle d’Aosta10.08570.0932
Veneto50.06600.0819
Number of neighbors, average positive rate, and positive rate of the neighborhood for each region.

Data analysis

In applying our approach we considered the model based on the splines and defined by Eqs. (3), (4). In particular, as already mentioned, we considered splines of cubic order with knots every four weeks starting from the fifth, the same adopted to describe the national Italian pattern in the previous section. For this model we considered a number of latent states () from 1 to 6. Each model was estimated by an MCMC algorithm based on iterations after a burnin of iterations. In order to reduce the autocorrelation between consecutive draws, we fixed a thinning of 100. The convergence diagnostics are good for all values of , with low autocorrelation functions for the final chains. The acceptance rates for the Metropolis-within-Gibbs steps are sensible for all parameters. The results from the MCMC algorithm in terms of , as defined in (8), and its standard deviation, are reported in Table 3. On the basis of theese results, taking into account also the standard deviation, we selected the model based on latent states. This result is particularly interesting since authorities have recently switched from a three-level to a five-level regime (white, yellow, orange, dark orange, red). Our results suggest a mild evidence that, over time, there might actually have been already five different risk levels to differentiate regions with respect to a common trend.
Table 3

Results in terms of and corresponding standard deviation for the model based on splines for different values of the number of latent states .

uelpd^waicse(elpd^waic)
1−169866.0315747.37
2−66676.264219.82
3−35660.022096.13
4−23159.701355.10
5−19690.561201.52
6−17079.821041.95
Results in terms of and corresponding standard deviation for the model based on splines for different values of the number of latent states . In order to interpret our risk stratification, we report in Table 4 posterior summaries for the latent intercepts. We note that latent intercepts are roughly equally spaced, and that posterior distributions seem to be well separated. Jointly with the estimate of the regression coefficients in , which are not reported here because their interpretation is not straightforward, we obtain five trajectories that are represented in Fig. 3.
Table 4

Estimated intercepts of each latent state.

uξˆuse(ξˆu)
1−3.4250.0100
2−3.0570.0114
3−2.8420.0130
4−2.6130.0126
5−2.1300.0112
Fig. 3

Trajectories corresponding to the 5 latent states suitably ordered together with the Italian trend (dashed curve).

The five latent states correspond to increasing degrees of severity in terms of number of positives, conditionally on the number of swabs. The corresponding trajectories are represented in Fig. 3 in comparison with the Italian trend directly obtained from the observed data, the same used in Fig. 1, Fig. 2. We observe that there are two trajectories that are uniformly below the national trend (for states 1–2) and two that are uniformly above the national trend (for states 4–5). The third state corresponds to a trend very close to the national one, but not uniformly above or below the latter. Estimated intercepts of each latent state. It is important to consider that each region may be in one of the five latent states at each time occasion, and many regions move among latent states during the observation period. Each region moves between these trajectories according to our hidden Markov formulation, depending on the parameters and in (5), (6), respectively. Rather than reporting the estimates of these parameters, to favor interpretability we directly illustrate the estimated latent structure by reporting the distribution of the predicted latent states. In fact, the MCMC algorithm also allows us to assign each region to a latent state in a dynamic fashion, on the basis of the number of visits to these states. The posterior distribution of the latent states for each region, across weeks, is reported in Table 5. The time variation of assigned latent states, in each week, is shown in a heatmap in Fig. 4. A clear pattern emerges for most regions, which we better comment below. It shall be noted that Valle D’Aosta is one of the few regions without a clear emerging temporal pattern. We speculate this is due to the high variability of positive rate over time related to the reduced number of tests and positives in this small region. As a result, identification of small clusters could have an observable effect on the positive rate in this region. We additionally report in Table 6 the predicted latent states for each region across three periods, which define as the first wave (first fifteen weeks, until beginning of June, 2020), the transition period (the following eleven weeks, until mid-August), and the second wave (the remaining observation period).
Table 5

Distribution of the predicted latent states across weeks at regional level.

RegionLatent state
12345
Abruzzo0.3830.4040.1700.0210.021
Basilicata0.6170.1910.0850.0640.043
Calabria0.7020.0850.1060.0850.021
Campania0.3620.1490.1700.1060.213
Emilia-Romagna0.2340.1280.1910.2550.191
Friuli Venezia Giulia0.7450.1490.0850.0210.000
Lazio0.4890.1060.1700.1700.064
Liguria0.0430.2130.0640.3190.362
Lombardia0.0430.1060.2340.1700.447
Marche0.2980.2130.2770.0430.170
Molise0.5530.2980.0430.0430.064
P. A. Bolzano0.4260.1280.2340.1910.021
P. A. Trento0.5110.1700.0210.2130.085
Piemonte0.0210.1060.2340.3830.255
Puglia0.3830.2130.1490.1910.064
Sardegna0.4680.2980.0430.0640.128
Sicilia0.4260.1280.2770.1490.021
Toscana0.5530.2770.1490.0210.000
Umbria0.5960.2770.0850.0210.021
Valle d’Aosta0.1910.3190.1910.1060.191
Veneto0.5320.2130.0850.0640.106

Italy0.4080.1990.1460.1290.119
Fig. 4

Heatmap of the predicted latent state for each region at each week. Darker is associated with worse predicted risk.

Table 6

Distribution of the predicted latent states across epidemic phases, at regional level.

RegionFirst waveTransition timeSecond wave
Latent state
Latent state
Latent state
123451234512345
Abruzzo0.470.200.130.130.070.360.090.360.090.090.000.180.500.320.00
Basilicata0.800.200.000.000.000.820.000.000.090.090.140.140.500.180.05
Calabria0.930.070.000.000.000.820.180.000.000.000.140.090.410.320.05
Campania0.470.330.130.070.000.270.270.090.270.090.000.090.180.320.41
Emilia-Romagna0.070.070.270.400.200.180.360.000.360.090.000.180.320.500.00
Friuli Venezia Giulia0.530.470.000.000.001.000.000.000.000.000.050.230.500.230.00
Lazio0.200.730.070.000.000.000.450.000.550.000.000.090.820.090.00
Liguria0.000.000.130.130.730.090.090.550.180.090.000.000.450.180.36
Lombardia0.000.000.000.001.000.000.090.360.090.450.000.140.410.230.23
Marche0.270.270.000.000.470.640.180.090.090.000.000.090.320.450.14
Molise0.530.000.000.400.070.730.000.090.000.180.090.180.360.360.00
P. A. Bolzano0.470.270.200.000.070.640.180.000.180.000.000.140.270.450.14
P. A. Trento0.270.130.070.530.000.910.000.000.000.090.050.270.500.090.09
Piemonte0.000.000.130.270.600.000.180.550.270.000.000.000.320.450.23
Puglia0.200.600.200.000.000.820.000.180.000.000.000.000.140.640.23
Sardegna0.530.330.130.000.000.640.180.090.000.090.000.000.360.450.18
Sicilia0.670.330.000.000.000.450.360.090.000.090.000.000.180.730.09
Toscana0.330.470.200.000.000.730.180.090.000.000.000.410.320.270.00
Umbria0.670.130.130.000.070.910.000.090.000.000.000.320.450.230.00
Valle d’Aosta0.330.070.270.000.330.450.090.270.000.180.000.050.320.320.32
Veneto0.530.470.000.000.000.450.550.000.000.000.000.270.180.090.45
Trajectories corresponding to the 5 latent states suitably ordered together with the Italian trend (dashed curve). Pooling information across regions, it can be argued that the latent state corresponding to the lowest risk is the most visited (40.8% of time-area occasions), whereas that corresponding to the highest risk is visited 11.9% of time-area occasions. The distribution of the latent states at regional level is in agreement with the data description in Section 4.1. Many regions are never or very rarely assigned to the last latent state whereas other regions, such as Liguria and Lombardia, are frequently assigned to this state and at the same time are rarely assigned to the latent state corresponding to the lowest risk. Regions that are assigned to the last state at least 10% of times (about equal to the national average) are in the North of Italy, with the exception of Marche and Campania. On the contrary, regions that are assigned at least 40% of times (corresponding about to the national average) to the first state are in the Center and South, with the exception of Friuli Venezia Giulia, Veneto, and Trentino provinces. Distribution of the predicted latent states across weeks at regional level. Distribution of the predicted latent states across epidemic phases, at regional level. Heatmap of the predicted latent state for each region at each week. Darker is associated with worse predicted risk. The temporal pattern in Fig. 4 is also of interest, and can be compared with the observed patterns in Fig. 1, Fig. 2. There are regions that started in an unfavorable situation, namely in the last latent state, and then improved much towards the end of the period of observation, such as Lombardia and Liguria. Other regions have an opposite trend, with a clear worsening of the situation across time, such as Veneto and Friuli Venezia Giulia. More mixed trends are typically observed for regions in the Center and the South of Italy. For instance, Toscana and Umbria moved towards states of greater severity in an intermediate period, but then saw a decreased incidence. This is especially true for Toscana, after the end of a localized lockdown that lasted few weeks. Finally, in order to illustrate the spatial patterns in terms of predicted latent states, in Table 7 we report a measure of agreement between the state predicted for each region and the states predicted for its neighbors. This index is obtained by counting, for each week, the number of neighbors having a predicted state equal to that of the region. Once collapsed over the time occasions, this count is divided by the number of weeks and that of neighbors. We also include the same agreement index computed across macroregions. The average of the agreement measure at Italian level is 0.361, which compares with a theoretical value of 0.2 in case of absence of spatial dependence, being the number of latent states. In this regard there is a certain heterogeneity, with some regions showing a strong spatial dependence, such as Calabria, and others showing a weaker spatial dependence, such as Emilia-Romagna. Even when we aggregate the index at a macroregional level it can be noted that northern regions show a larger spatial dependence than Center and South. We speculate that strong spatial dependence could be due to ongoing pendolarism across regions even during lockdown periods, where people were indeed allowed to move across regions for work. Note, however, that strength of spatial dependence might be influenced by the size of the neighborhood even if we divide this simple index by the number of neighbors. Overall, spatial dependence is far from irrelevant and it should then be appropriately modeled when analyzing the data at hand. This conclusion is also in agreement with what already noted in Section 4.1 on the basis of the results in Table 2.
Table 7

Agreement between the state assigned to a certain region and to its neighbors.

RegionN. neighborsAgreement
Abruzzo30.326
Basilicata30.482
Calabria20.617
Campania40.362
Emilia-Romagna60.206
Friuli Venezia Giulia10.574
Lazio60.305
Liguria30.213
Lombardia50.255
Marche50.306
Molise40.372
P. A. Bolzano30.355
P. A. Trento30.234
Piemonte40.351
Puglia30.340
Sardegna0
Sicilia10.553
Toscana50.311
Umbria30.433
Valle d’Aosta10.298
Veneto50.332

North30.134
Center50.085
South20.108

Italy30.361
Agreement between the state assigned to a certain region and to its neighbors.

Discussion

We propose a spatio-temporal approach for the analysis of weekly COVID-19 data, in which the number of swabs is used as an offset. Conditionally on a discrete latent variable, incident cases are assumed to follow a Poisson distribution modulated by a common trend, which is flexibly estimated using a spline model on the log-scale, and a multiplicative shock that depends on the latent state. Latent states evolve over time according to an inhomogeneous first-order Markov chain, so that areas can move from one level of risk to another. Levels of risk are therefore not absolute, but defined as proportional variations with respect to the common trend. Finally, spatial dependence is taken into account through the latent state, which depends on the state of neighboring areas at the same time occasion. In contrast to other works using splines to approximate the emission densities (e.g., Langrock et al. (2015)), in this work we have specified them to describe a non-linear relationship between expected counts and time, in the spirit of Langrock et al. (2017). For ease of computation we have approximated the conditional distribution of the latent states through a pseudo-probability. This is not uncommon in spatial statistics, and an excellent fit is obtained, even if some care might have to be used in interpreting parameters linked to the latent distribution. Readers are pointed to Friel and Pettitt, 2004, Friel et al., 2009 and Everitt (2012) for further discussion on this point, and to Spezia et al. (2017) and references therein for other examples in spatial statistics. An alternative to this approximation would have been to use a nested sampling algorithm, according to which is approximated at each MCMC iteration. This more rigorous approach would have been problematic from a computational point of view, as approximating the correctly specified full conditional for would have implied a full-length MCMC algorithm within each outer MCMC iteration. We have used the logarithm of the number of swabs as an offset in order to partially overcome bias due to the unknown and spatio-temporal heterogeneous sampling ratio. Our analysis, combined with weekly aggregation, gives a somehow robust assessment of five risk profiles and a common trend. In our implementation we have defined a spatial structure that depends on sharing a land border. This is reasonable, but results should be interpreted considering this choice. Other choices are possible, such as defining a spatial structure on the basis of direct train or flight connections as in Della Rossa et al. (2020). While adopting a different spatial structure would probably lead to slightly different results regarding the distribution of latent states, we are confident that as long as enough and appropriate connections are specified, the goodness-of-fit and qualitative conclusions would be similar. The advantage of using our spatial structure is that it promotes similar latent states in adjacent regions, allowing authorities to act homogeneously in neighboring areas (e.g., specifying a policy for all regions in the north-east of Italy rather than for regions that are well connected from an economic and social point of view, but far apart from each other). We leave to further work the derivation of a formal method for obtaining a posterior distribution on the number of latent states. One could also extend the model in order to allow for a time-varying number of latent states, as in Anderson et al. (2019), thus obtaining a different number of risk profiles at different times. Other possible extensions of our model involve use of more flexible parametric assumptions for the counts, for instance a negative Binomial, which would take into account residual overdispersion, and extension to multivariate outcomes (e.g., a joint model for incident cases, hospital admissions, and deaths). While this would be straightforward using conditional independence assumptions (e.g., Bartolucci and Farcomeni, 2009, Bartolucci and Farcomeni, 2015, Farcomeni et al., 2021b), in our case the outcomes would have constraints that it is not straightforward to take into account. For instance, the (cumulative) number of deaths clearly cannot exceed the cumulative number of incident cases, and similarly for hospital admissions. Current assumptions, also, allow us to identify additive unobserved effects with respect to a common trend, essentially resulting in a multiplicative shift of the same. Much more care would be needed to identify a different trend for each latent state. Another possible extension would be the use of covariates at site level, which would allow us for instance to catch cyclic weekly effects (if modeling daily counts). Covariates at site level might also include indicators of interventions (like lockdowns, curfews, school closures), but interpretation of effects would require a lot of care due to endogeneity. Finally, we believe the methodological device we put forward in this work is not only useful for analysis of COVID-19 data, but it can be applied with minor changes also in other areas of disease mapping. In applications of the latent Markov framework, indeed, spatial dependence might often be present and is generally ignored (e.g., Dotto et al., 2019).
  12 in total

1.  Nonparametric inference in hidden Markov models using P-splines.

Authors:  Roland Langrock; Thomas Kneib; Alexander Sohn; Stacy L DeRuiter
Journal:  Biometrics       Date:  2015-01-13       Impact factor: 2.571

2.  Critical Care Utilization for the COVID-19 Outbreak in Lombardy, Italy: Early Experience and Forecast During an Emergency Response.

Authors:  Giacomo Grasselli; Antonio Pesenti; Maurizio Cecconi
Journal:  JAMA       Date:  2020-04-28       Impact factor: 56.272

3.  Robust inference for non-linear regression models from the Tsallis score: Application to coronavirus disease 2019 contagion in Italy.

Authors:  Paolo Girardi; Luca Greco; Valentina Mameli; Monica Musio; Walter Racugno; Erlis Ruli; Laura Ventura
Journal:  Stat (Int Stat Inst)       Date:  2020-10-23

4.  A network model of Italy shows that intermittent regional strategies can alleviate the COVID-19 epidemic.

Authors:  Fabio Della Rossa; Davide Salzano; Anna Di Meglio; Francesco De Lellis; Marco Coraggio; Carmela Calabrese; Agostino Guarino; Ricardo Cardona-Rivera; Pietro De Lellis; Davide Liuzza; Francesco Lo Iudice; Giovanni Russo; Mario di Bernardo
Journal:  Nat Commun       Date:  2020-10-09       Impact factor: 14.919

5.  Three-quarters attack rate of SARS-CoV-2 in the Brazilian Amazon during a largely unmitigated epidemic.

Authors:  Lewis F Buss; Carlos A Prete; Claudia M M Abrahim; Alfredo Mendrone; Tassila Salomon; Cesar de Almeida-Neto; Rafael F O França; Maria C Belotti; Maria P S S Carvalho; Allyson G Costa; Myuki A E Crispim; Suzete C Ferreira; Nelson A Fraiji; Susie Gurzenda; Charles Whittaker; Leonardo T Kamaura; Pedro L Takecian; Pedro da Silva Peixoto; Marcio K Oikawa; Anna S Nishiya; Vanderson Rocha; Nanci A Salles; Andreza Aruska de Souza Santos; Martirene A da Silva; Brian Custer; Kris V Parag; Manoel Barral-Netto; Moritz U G Kraemer; Rafael H M Pereira; Oliver G Pybus; Michael P Busch; Márcia C Castro; Christopher Dye; Vítor H Nascimento; Nuno R Faria; Ester C Sabino
Journal:  Science       Date:  2020-12-08       Impact factor: 47.728

6.  Nowcasting COVID-19 incidence indicators during the Italian first outbreak.

Authors:  Pierfrancesco Alaimo Di Loro; Fabio Divino; Alessio Farcomeni; Giovanna Jona Lasinio; Gianfranco Lovison; Antonello Maruotti; Marco Mingione
Journal:  Stat Med       Date:  2021-05-06       Impact factor: 2.373

7.  Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2).

Authors:  Ruiyun Li; Sen Pei; Bin Chen; Yimeng Song; Tao Zhang; Wan Yang; Jeffrey Shaman
Journal:  Science       Date:  2020-03-16       Impact factor: 47.728

8.  An ensemble approach to short-term forecast of COVID-19 intensive care occupancy in Italian regions.

Authors:  Alessio Farcomeni; Antonello Maruotti; Fabio Divino; Giovanna Jona-Lasinio; Gianfranco Lovison
Journal:  Biom J       Date:  2020-11-30       Impact factor: 1.715

9.  Features of severe COVID-19: A systematic review and meta-analysis.

Authors:  Francesco Del Sole; Alessio Farcomeni; Lorenzo Loffredo; Roberto Carnevale; Danilo Menichelli; Tommasa Vicario; Pasquale Pignatelli; Daniele Pastori
Journal:  Eur J Clin Invest       Date:  2020-08-29       Impact factor: 5.722

Review 10.  Characteristics of SARS-CoV-2 and COVID-19.

Authors:  Ben Hu; Hua Guo; Peng Zhou; Zheng-Li Shi
Journal:  Nat Rev Microbiol       Date:  2020-10-06       Impact factor: 78.297

View more
  6 in total

1.  A D-vine copula-based quantile regression model with spatial dependence for COVID-19 infection rate in Italy.

Authors:  Pierpaolo D'Urso; Livia De Giovanni; Vincenzina Vitale
Journal:  Spat Stat       Date:  2022-01-10

2.  Clustering spatio-temporal series of confirmed COVID-19 deaths in Europe.

Authors:  A Bucci; L Ippoliti; P Valentini; S Fontanella
Journal:  Spat Stat       Date:  2021-10-06

3.  Combining rank-size and k-means for clustering countries over the COVID-19 new deaths per million.

Authors:  Roy Cerqueti; Valerio Ficcadenti
Journal:  Chaos Solitons Fractals       Date:  2022-03-11       Impact factor: 9.922

4.  Spatial clustering behaviour of Covid-19 conditioned by the development level: Case study for the administrative units in Romania.

Authors:  Stefana Cioban; Codruta Mare
Journal:  Spat Stat       Date:  2021-12-02

5.  The role of the socio-economic context in the spread of the first wave of COVID-19 in the Marche Region (central Italy).

Authors:  Eleonora Gioia; Alessandra Colocci; Cristina Casareale; Noemi Marchetti; Fausto Marincioni
Journal:  Int J Disaster Risk Reduct       Date:  2022-10-04       Impact factor: 4.842

6.  Endemic-epidemic models to understand COVID-19 spatio-temporal evolution.

Authors:  Alessandro Celani; Paolo Giudici
Journal:  Spat Stat       Date:  2021-07-12
  6 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.