Literature DB >> 34908652

On the interplay of regional mobility, social connectedness and the spread of COVID-19 in Germany.

Abstract

Since the primary mode of respiratory virus transmission is person-to-person interaction, we are required to reconsider physical interaction patterns to mitigate the number of people infected with COVID-19. While research has shown that non-pharmaceutical interventions (NPI) had an evident impact on national mobility patterns, we investigate the relative regional mobility behaviour to assess the effect of human movement on the spread of COVID-19. In particular, we explore the impact of human mobility and social connectivity derived from Facebook activities on the weekly rate of new infections in Germany between 3 March and 22 June 2020. Our results confirm that reduced social activity lowers the infection rate, accounting for regional and temporal patterns. The extent of social distancing, quantified by the percentage of people staying put within a federal administrative district, has an overall negative effect on the incidence of infections. Additionally, our results show spatial infection patterns based on geographical as well as social distances.

Entities: Chemical

Keywords: COVID‐19; infectious disease modelling; semiparametric regression; social connectedness; social networks; spatial network data

Year: 2021 PMID： 34908652 PMCID： PMC8662283 DOI： 10.1111/rssa.12753

Source DB: PubMed Journal: J R Stat Soc Ser A Stat Soc ISSN： 0964-1998 Impact factor: 2.175

INTRODUCTION

The COVID‐19 virus outbreak originating in mainland China leapt over to Europe and quickly evolved to a global pandemic in March 2020. Only through strict non‐pharmaceutical interventions (NPI) could most national health systems rapidly react to this new threat. In numerous scientific efforts, physical distancing measures were discovered to be the most effective interventions (Prem et al., 2020) and found to be necessary maybe until 2022 (Kissler et al., 2020). The measures’ effectiveness emanates from researchers confirming that the main form of virus transmission is person‐to‐person interaction (Chan et al., 2020). The virus can be spread by inhaling microscopic aerosol particles that contain COVID‐19 and remain viable in the air with a half‐life of about 1 h (Asadi et al., 2020) or direct contact through the exchange of virus‐containing droplets with infected individuals (Guan et al., 2020). Since also a high proportion of cases is asymptomatic (Lavezzo et al., 2020) and gets infected by cases in the presymptomatic stage (Li et al., 2020b), human mobility can explain the spread of COVID‐19 to a considerable extent (Kraemer et al., 2020). Stemming from the consequential need to account for contact patterns when investigating the spread of COVID‐19, Oliver et al. (2020) list multiple possibilities of how one may utilise mobile phone data to do so. To enable this type of research, Facebook extended the Data for Good program to a broader audience of researchers and provided so‐called Disease Prevention Maps for multiple countries (Maas et al., 2019). This database includes measurements on quantities like co‐location, user counts and movement ranges on a regional level derived from information of more than 26 million Facebook users. Additionally, a measure for the social connectedness between geographical regions is supplied (Bailey et al., 2018). In various studies, this data source was employed to demonstrate how the impact of lockdown measures in Italy was more severe for municipalities with higher fiscal capacities (Bonaccorsi et al., 2020), quantify social and geographical spillover effects from relaxations of shelter‐in‐place orders (Holtz et al., 2020) and predict the number of infections on a granular spatiotemporal resolution using contact tracing data (Lorch et al., 2020). This article uses the same data source to analyse how regional differences in mobility patterns and friendship proximity affect the spread of COVID‐19 in Germany. While NPIs, for example, the nationwide shutdown in Germany that started 22 March, had an evident impact on national human mobility and ceased the exponential spread of the virus (Flaxman et al., 2020), the effect of the relative movement between regional districts was not yet fully assessed. So far, studies concerning human movements during the current pandemic are focused mainly on how the lockdown affected national human mobility (Galeazzi et al., 2020) or specific regions regarding their economic status (Bonaccorsi et al., 2020). To fill this gap, we derive covariates from the mobility data to quantify the overall dispersion of meeting patterns and compliance with social distancing. Through weekly standardisation of the covariates, we control for the dynamics therein, which are, in turn, driven by NPIs. As a result, our research enables a quantitative assessment of different mobility strategies relative to the national average. Also, we infer positions of the federal administrative regions in a social space from the information on the relative friendship links among them using multidimensional scaling (Cox & Cox, 2000). Subsequently, we relate the processed data to Germany’s weekly rate of local COVID‐19 infections between 3 March and 22 June 2020. This time frame permits the analysis of the dynamic spread starting with the WHO declaring COVID‐19 a pandemic (WHO, 2020). We employ a spatiotemporal regression model for the ratio of local COVID‐19 infections that takes autoregressive structures, age‐ and gender‐specific effects, contagion by nearby districts in the geographical and social space, as well as latent heterogeneities between the districts into account. Our method is closely related to the surveillance model introduced by Held et al. (2005). They extend generalised linear models to analyse surveillance data from epidemic outbreaks. This approach was expanded to handle multivariate surveillance data (Paul et al., 2008), control for seasonality and spatial heterogeneity (Held & Paul, 2012) and include neighbourhood information from social contact data (Meyer & Held, 2017). In contrast to this type of model, our model’s objective is to investigate the connection between mobility patterns, social connectivity and the spread of COVID‐19 in an interpretable manner. While forecasting infections is undoubtedly a central objective in epidemic surveillance, this is not the main focus of our work (see also Held et al., 2017). The rest of the article will be structured as follows: We discuss the data sources, its measures on social interaction as well as mobility in Section 2. In Section 3, we detail our proposed modelling approach. We propose an imputation model for missing onset dates and use a semiparametric spatiotemporal model to analyse the ratio of local COVID‐19 infections with a specific disease onset date. The results of the analysis are presented in Section 4. Section 5 concludes the article.

DATA DESCRIPTION

Data on infections

Our application’s outcome of interest is the ratio of COVID‐19 infections in a federal administrative district (NUTS‐3 level), which we define as the quotient of the number of COVID‐19 infections over the corresponding population size. In Germany, there are n = 401 federal administrative districts (a complete list is given by the German Federal Statistical Office). At a higher hierarchical level, each federal district also belongs to a federal state (NUTS‐1 level). In most figures, for example Figure 2, we colour‐code the district‐specific time series according to this allocation. If we refer to a specific district in the text, we generally specify the corresponding federal state in brackets.

FIGURE 2

(a) Gini indices for each district over time. (b) Standardised Gini indices for each district over time. (c) Percentages of people staying put for each district over time. (d) Standardised percentages of the people staying put. The colour of the lines indicate the state in which each district is located and the dates (mm:dd) are the first day of the corresponding week [Colour figure can be viewed at wileyonlinelibrary.com]

Infection count: The Robert‐Koch‐Institute provides timely data on the daily number of COVID‐19 infections in Germany for each federal district. We limit the present analysis to individuals between 15 and 59,years old due to the age structure in the Facebook population. Besides, the given surveillance counts are stratified by age group (15–35 and 36–59) and gender. For each entry, dates of symptom onset and reporting are given, although the onset date is partially missing. Our principal analysis is based on the disease onset date since it ensures more valid information on the infection incidence (Günther et al., 2020). Imputation of the missing values is required (we present our method in Section 3.1). By we denote the observed (and partially imputed) counts of new onsets within district i, age/gender‐group g and week t. For completeness, we define with the corresponding indicator for the age/gender group. Population: We obtained district‐, age‐ and gender‐specific population data from the German Federal Statistical Office. To guarantee a consistent definition of age‐groups, we categorised the data according to the two primary age groups according to which the infection data are reported, namely people between 15–35 and 36–59 years old. The corresponding time‐constant covariate is denoted for age/gender‐group g in district i by . The observed rates per 10.000 inhabitants are visualised in Figure 1 colour‐coded according to the different states. For each week, we plot the rate of disease onsets that we partially impute in case of missingness, as described in detail in the next section. Once the first peak of infections could be overcome, the cases in the aftermath are increasingly attributed to local outbreaks. Two districts, namely Guetersloh and Warendorf (North Rhine‐Westphalia), experience a local outbreak in a meat factory during the last weeks of the observational period (Kottasová, 2020). This local outbreak encompasses 48% of all infections with disease onset in the week starting on 16 June.

FIGURE 1

Observed Rate of Weekly Infections for each federal district. The colour of the lines indicate the federal state in which each district is located and the dates (mm:dd) are the first day of the corresponding week [Colour figure can be viewed at wileyonlinelibrary.com]

Data on social activity during COVID‐19

All data related to social activities during the COVID‐19 pandemic are generated from approximately 10 million Facebook users in Germany, who enabled geolocation features in the Facebook app on their mobile devices. To abide by the privacy policies, the observations are anonymised through aggregation onto tile bing polygons, censoring if we observed not enough users in the spatial region, as well as randomisation using additional noise and spatial smoothing (Maas et al., 2019). We aggregate the polygons to the same spatial units for our application on which we have the infection data. We propose the following measures describing social interaction and mobility. All measurements are taken weekly, where we use simple averaging for quantities available at a more granular temporal resolution. Co‐location: Co‐location in week t is measured by the probability of a random person from district i to be located in the same 0.6 km × 0.6 km square as another random person from district j (Iyer et al., 2020). These probabilities are then used to construct a district‐wise quantity for the concentration of meeting patterns using the Gini index, which is given by: If we were to observe the maximal value of 1 in , all people within federal district i would only meet people (i.e. Facebook users) from only one further district. This behaviour is exemplary of extremely restricted mobility. Conversely, a lower value heuristically indicates dispersed meeting patterns. Due to this intuitive interpretation, we opt for the Gini index as a measure of concentration. The Gini indices’ temporal paths for the 401 districts in Germany are depicted in Figure 2a. Overall, the meeting patterns become more concentrated on a few other districts as the crisis evolves. This behaviour contrasts rather dispersed practices before the pandemic. An upward trend is visible until the nationwide lockdown on 22nd of March, 2020.1 1 Thereupon, meeting patterns continue to be overall condensed, although the indices slowly decline. To enable a meaningful comparison between the respective estimates in the regression setting of Section 3, we standardise the Gini indices per week. The standardised covariate is shown in Figure 2b and given by where and . (a) Gini indices for each district over time. (b) Standardised Gini indices for each district over time. (c) Percentages of people staying put for each district over time. (d) Standardised percentages of the people staying put. The colour of the lines indicate the state in which each district is located and the dates (mm:dd) are the first day of the corresponding week [Colour figure can be viewed at wileyonlinelibrary.com] Percentage staying put: Besides the relative attribution of co‐location probabilities to other districts, we investigate a measure that expresses how people (Facebook users) comply with social distancing. We quantify this concept by the covariate , which is defined as the average percentage of people in district i staying put during week t. Respective data were collected using geolocation traces of mobile devices and users are defined to be staying put, if they are only observed in one 0.6 km × 0.6 km square throughout a day (Facebook, 2020). In Figure 2c, clear break‐points are visible, giving evidence of the temporary lockdown that started between 17 and 24 March. During the following weeks, the observed values gradually level off around pre‐lockdown values. We also observe some peaks in the weeks starting on 7 and 28 April, which could be traced back to the different mobility behaviour during national holidays, namely Good Friday on 10 April and Labour Day on 1 May 2020. Similarly to the treatment of the Gini index, we standardise the percentages in the regression setting. While the visual impression from Figure 2c insinuates that the dynamics of people staying put are similar between districts, the standardised paths, given in Figure 2d, reveal local differences between them. For instance, the early look‐down in Bavaria resulted in a substantial relative increase of the respective districts between the 10th and 17th of March, see the yellow‐green lines. Friendship distance: Spatial distance is found to be strongly associated with the spread between regions (Kang et al., 2020). Beyond the geographical proximity, Cho et al. (2011) argued that friendship ties explain specifically long‐distance mobility, which is fundamental for understanding the early spread of the pandemic (Chinazzi et al., 2020). To accommodate this possible line of infection, we include a measure for the strength of friendship ties between the districts of Germany. More precisely, we employ the social connectedness index proposed by Bailey et al. (2018), which is based on an anonymised snapshot of all active Facebook users and their friendship networks from April 2020. For the administrative district i and j, the time‐invariant measure is given by: In a note, Kuchler et al. (2021) uncover high correlations between the social connectedness indices and the spread of COVID‐19. This index is further processed to provide a spatial allocation based on social instead of Euclidean distances. To do so, we first transform social connectedness to social distance by taking the reciprocal of connectedness, that is, . Consecutively, we process these distances to coordinates using multidimensional scaling (Cox & Cox, 2000). In our application, this procedure’s result is a two‐dimensional representation of each district’s location in the network defined through Equation (2) that is only identifiable up to the scale and rotation. Using Procrustes analysis, we map the rotation of the inferred coordinates in the friendship space to be most similar to the geographical coordinates (Cox & Cox, 2008). Technical details on both procedures are given in Annex A. The outcome of the algorithm for each district i is denoted by and gives the geo‐coordinates in the friendship space as shown in Figure 3. Robust connectivity within federal states and neighbouring districts are visible in the friendship coordinates. We also observe that the capital, Berlin, is situated in the very centre, reflecting its unique and highly connected position. One can also detect a persisting corridor between districts located in former East‐ and West Germany. Next to the social coordinates, we incorporated each district’s geographical coordinates , that is, the longitude and latitude of each districts centroid, in our application.

FIGURE 3

(a) Coordinates representing the friendship distances. The colour of the points indicates the state in which each district is located. (b) Map representing the colour legend. The thick black lines represent borders between federal states, while the thinner grey borders separate federal districts [Colour figure can be viewed at wileyonlinelibrary.com]

MODELLING

We start by proposing a model to impute missing dates of the disease onset. Subsequently, these partially imputed infection data are modelled with a negative binomial regression.

Imputation model

We can see in Figure 4 that approximately 30% of the onset dates are missing. To still make use of all available information, we propose to impute missing disease onset dates under the assumption of missingness at random. This allows for unbiased findings which are not guaranteed when using a complete case analysis (Little & Rubin, 2002). In particular, we leverage the fact that the chronologically later reporting date is available for all cases. Thereby, the problem of imputing the date of disease onset for a single infection is reduced to imputing the time between onset of disease and its reporting through a positive test, which we call test delay. Following Günther et al. (2020), we use the subset of all data without any missing disease onset dates to fit a distributional regression model for this test delay. In the next step, we predict all distributional parameters under this model for all cases with a missing disease onset date and sample the missing onset date.

FIGURE 4

Count of missing and observed disease onset dates per reporting week [Colour figure can be viewed at wileyonlinelibrary.com]

Count of missing and observed disease onset dates per reporting week [Colour figure can be viewed at wileyonlinelibrary.com] To fit the imputation model, we first disaggregate the given surveillance counts to the patient level. For each complete case l, the data include the age/gender group indicator (), an indicator whether the reporting date was during a weekend (), and the state () and district () where it was observed. Regarding the temporal information of each infection, we are given the date of disease onset () and its reporting (). For complete‐case data, the test delay is then given by . As regressors in the imputation model, we include dummy covariates and the metric covariate itself to account for changing testing strategies, for example, during the early spread the test capacities were limited and patients needed to wait longer for a test to be conducted. We assume that is a realisation of random variable , which follows a negative binomial model: where and holds. A discrete‐valued distribution appears most suitable since the patient‐level data are available daily, making the test delay inherently discrete. As indicated in Equation (3), we model the location and scale parameters of the distribution by separate linear predictors. Note that the linear predictors are defined by and for the corresponding distributional parameters and that the linearity only refers to linearity in the coefficients not in the covariates. Therefore, the model lies within the family of generalised additive models for location, scale and shape (Rigby & Stasinopoulos, 2005). While all components of have a log‐linear effect, we parameterise the trend effect of the reporting date by nonlinear penalised splines (see Eilers & Marx, 1996 for details). The district‐specific effects are assumed to be Gaussian. After having obtained the estimates, we calculate and for all observations with missing disease onset. We can now simulate from Equation (3) to acquire a full data set by setting . Through aggregation from the daily patient‐level data to the infection counts per district i and age/gender group g with disease onset in week t, denoted by , we build a single partially imputed data set. This procedure is repeated K times to represent the uncertainty associated with the missing information of all disease onsets.

Infection model

To model the rate of infections with partially imputed data, we apply a negative binomial ‘observation‐driven’ model for count data including the population as an offset term (Cox, 1981). By doing so, we assume where are the covariates at arbitrary week u specified in Section 2 and denotes the set of age/gender groups used from the data. Furthermore, let T be the final week of data we use in the analysis. We assume in Equation (4) that the random variable follows a negative binomial distribution conditional on , and to compensate overdispersion in the observed counts. Aligned with models for the spread of infectious diseases (Held et al., 2005), we decompose into an endemic and epidemic component: where each part is parameterised as follows: We include a first‐order autoregressive term of this rate, since path dependencies and self‐exciting behaviour are common with infectious diseases and should therefore be accounted for Held et al. (2005). In addition, we transform the respective term by h(x) = log(x + c) to bypass problems with absorbing states of the implied counting process when . The value c ∈ (0, 1] is estimated from the data. More general types of these autoregressive models are proposed by Zeger and Qaqish (1988). As is evident from Equation (5), we constitute that both the epidemic and endemic components have a multiplicative effect on the observed infection rates. As an alternative, Held et al. (2005) replace the log link by an identity link, although Fokianos et al. (2020) argue for the logarithmic link implied in Equation (5) if additional covariates are available. They further derive theoretical properties, such as ergodicity, in the case of Poisson‐distributed target variables under the condition . Time‐varying effects: For the endemic part (7), the temporal trend is reflected by piecewise constant fixed effects separately for each week, . By means of group‐specific covariates we control for gender‐ and age‐related effects and their interaction, , and (Walter & Mcgregor, 2020). The principal covariates, Gini Index and Percentage Staying Put, are modelled by piecewise constants in each week for maximal flexibility. To account for the stylised fact, that the incubation period, that is, the time between being infected and symptom onset, for COVID‐19 is around 5 days (Li et al., 2020a), we lag the information on Gini Index and Percentage Staying Put by one week as indicated in Equation (7). Isotropic smooth effects: The bivariate functions and display the effects of geographical coordinates and social coordinates on the incidence rate. To properly incorporate and in our regression framework, we propose the usage of isotropic splines. These kind of flexible functions were proposed by Duchon (1977) to model multiple covariates by a multivariate term as an alternative to anisotropic tensor products. Isotropic smoothers have the property of giving the identical predictions of the response under arbitrary rotation and reflection of the respective covariates (Wood, 2017). This characteristic is commonly reasonable when working with geographical coordinates and in accordance with the uniqueness of the multidimensional scaling results, thus also for . With respect to the form of the smooth terms, we follow Wood (2003) and use a low‐rank approximation of the thin‐plate splines introduced in Duchon (1977). To obtain a smooth fit, we impose a penalty that is controlled by and for the respective isotropic splines. Random effects: Because super spreader events such as carnival sessions (Streeck et al., 2020) or local outbreaks in major slaughterhouses (Dyal et al., 2020) lead to unobserved heterogeneities, our model comprises two district‐specific Gaussian random effects. The random effect governs long‐term heterogeneities, while short‐term dependencies, that is, sudden locally confined outbreaks as visible in the last week of Figure 1, are captured by . We assume and . Relying on the duality between semiparametric regression and random effects (Ruppert et al., 2003), we can equivalently write the random effects as semiparametric terms. Hence we may replace and by and , respectively, and introduce a ridge penalty for each coefficient vector. In this context, the design matrices and each consist of n dummy variables indicating to which district a specific observation refers. As a result of this reformulation, we can estimate the additional parameters and as tuning parameters in semiparametric regression (see Annex B for further information). Modelling rates via count regression: Effectively, we model the rate of infections by including the term as an offset in Equation (7) since the infections rates relate to the counts through via (note the slight abuse of notation as we here do not regard the infection rate among 10.000 inhabitants but the percentage of people infected with a disease onset in a specific week). As a byproduct, we implicitly assume that the entire population is susceptible, which is reasonable when considering the low prevalence of COVID‐19 in Germany during the first wave. However, the model is still applicable in the later stages of the pandemic by replacing this offset with the number of susceptible inhabitants in each region.

Estimation

At first, we propose an estimation procedure for the imputation model from Section 3.1. Given a partially imputed data set, we specify how to get estimates for the infection model from Section 3.2. Finally, the multiple imputation scheme combining both approaches is presented. Generally, we carry out all computations conditional on the observations in t = 1, that is, the week between the 3rd and 9th of March. Imputation model: We get estimates for the imputation model through maximising the likelihood function resulting from Equation (3). As mentioned in Section 3.2, we can rewrite all random effects as smooth terms and penalise this likelihood to obtain smooth functions. By repeatedly updating the estimators through a backfitting algorithm, we optimise this objective (see Rigby & Stasinopoulos, 2005 for details). This procedure is readily implemented in the software package gamlss (Stasinopoulos et al., 2020). Infection model: The infection model is characterised by the parameters c and θ, relating to the log‐transformation of the autoregressive component and all other parameters. Given a partially imputed data set, we first consider θ to be a nuisance parameter and find c via a profile likelihood approach. Here the profile likelihood is given by where is the joint likelihood resulting from Equation (4) and is the maximum likelihood estimator of θ for a fixed value of c. For any c, we can find by carrying out the estimation as explained in Annex B, hence it is straightforward to evaluate . Building on this result, we use standard optimisation software, that is, the optimise routine within the software environment R (R Core Team, 2020), to obtain . In the consecutive step, we fix c at to get again by following Annex B. Multiple imputation: Since information on the onset of symptoms is missing for approximately 30% of the cases, we proposed an imputation model in Section 3.1 to generate K partially imputed data sets. To correct the uncertainty quantification of the infection model for this multiple imputation procedure, we use the Rubin’s rule. At first, we sample K imputed data sets according to Section 3.1. Let be the resulting estimator from the two‐stage maximum profile likelihood procedure explained in the previous paragraph given the partially imputed data set from the kth imputation step. By we denote the corresponding variance estimate that results from Bayesian large sample properties (Wood, 2013). We then average the coefficients over all K iterations to obtain and estimate its variance through: where its components are given by In our application, setting K = 20 proved to be sufficient since the estimates of different imputed data sets varied only marginally.

RESULTS

We only report the findings of the infection model detailed in Section 3.2. A detailed analysis of the imputation model as well as a robustness check for the infection model can be found in the Supplementary Material.

Temporal effect

To start, the estimate of is shown in Figure 5. The progression of the weekly estimates confirms generally decreasing infection rates over time. Due to the standardisation employed for the principal covariates in the analysis, the temporal trend can be interpreted as the log‐transformed expected infection rate of female individuals aged between 15 and 35 in a district where the standardised Gini Index and Percentage Staying Put are zero. Since observing a zero in the standardised covariates translates to the mean observed values where we observed most information, the standard errors are also extremely narrow.

FIGURE 5

Estimate of temporal effect . The 95% confidence interval accompanies the estimates, and the shown dates (mm:dd) on the x‐axis are the first days of the corresponding weeks

Sociodemographic and epidemic effects

The linear time‐constant estimates are given in Table 1 and exhibit in general a negative effect on male patients compared to female patients, 3% in the younger and 9.6% in the older age cohort. 2 According to its partial effect, we also predict that the older age group has a lower infection rate than the younger group encompassing individuals aged between 15 and 35, for men 9.7% and women 3.1%. The autocorrelation coefficient expresses that one more infection among 10.000 inhabitants in a district during the past week almost doubles the predicted infections for the present week. This dominant finding confirms strong path dependencies in the data. In this context, we need to remark that the coefficients are partial effects that condition on all other covariates used in the model. Therefore, a positive coefficient of a dummy variable does not necessarily translate to the same finding in the raw numbers.

TABLE 1

Estimates of linear time‐constant effects

Covariable	Estimate (standard error)	exp{Estimate} (standard error)
Male	−0.03	0.97
Male	(0.015)	(0.014)
A35–A59	−0.031	0.969
A35–A59	(0.014)	(0.013)
Male: A35–A59	−0.071	0.931
Male: A35–A59	(0.02)	( 0.017)
log(y~i,g,t‐1+c)	0.623	1.865
	(0.009)	(0.031)

Notes The reference group are female individuals aged between 15 and 35. By use of the delta rule, we approximated the standard errors of the transformed coefficients in the third row. The value c is estimated at 0.499 with a standard error of 0.027.

Estimates of linear time‐constant effects Notes The reference group are female individuals aged between 15 and 35. By use of the delta rule, we approximated the standard errors of the transformed coefficients in the third row. The value c is estimated at 0.499 with a standard error of 0.027.

Mobility effects

The time‐varying estimates regarding the relative mobility pattern are displayed in Figure 6. Overall, the estimated effects of the measures proposed in Section 2.2 on the rate of local COVID‐19 infections are negative. In regards to relative importance, both variables rank similarly during the lockdown period that persists until early May. Subsequently, the Gini Index in a region gains weight, while the effect of People Staying Put becomes more volatile. The temporal changes of the respective estimates illustrate nonlinearities, which would not have been sufficiently captured by linear effects.

FIGURE 6

(a) Time‐varying effects of the Gini index . (b) Time‐varying effects of the Percentage of People Staying Put . The 95% confidence interval accompanies the estimates, and the shown dates (mm:dd) on the x‐axis are the first days of the corresponding weeks Gini index of co‐location: Given all other covariates, Figure 6a suggests that inhabitants with meeting patterns that are centred around a few other districts entail reduced infection rates for a specific district. This tendency is only suspended in the week starting on March 17th during the early lockdown in Bavaria. The corresponding estimate is positive and significant. Right after the national lockdown on 22 March 2020, is ordered, the effect is not significantly different from zero for one week (03–24). The estimated effects remain low but negative until the German government introduces compulsory masks in public areas on 22 April (Mitze et al., 2020). Thereupon, the effect has a clear downwards tendency. Once policymakers slowly lift the lockdown measures, the estimate declines further until its maximum in the penultimate week of our observational period. This development may be viewed as evidence that a more focused attribution of co‐location probabilities in a district becomes more crucial over time. Percentage staying put: Suppose the percentage of inhabitants in a district staying put is large relative to the national tendency. In that case, we expect the incidence of infections throughout the lockdown period to be lower. We deduce this result from the largely negative estimates in Figure 6b for the weeks between 10 March and 12 May. Once the orders are relaxed, on the other hand, the standard errors of the respective covariate become relatively large, and the effect vanishes in the final week of the study. A possible explanation for this phenomenon is that when daily infections decline, most diseases are related to local outbreaks (as already mentioned in Section 2). These breakouts, in turn, cannot be associated with the percentage of people staying put. One exception to this finding is the estimate in the week starting on 26 May, where we encounter a significant positive effect.

Spatial and social connectedness effects

In our model specification, we incorporate the friendship coordinates and geographical coordinates as two spatial effects. In combination with the two unstructured latent variables, we can disentangle separate influences on the local infection rates of spatial and friendship proximity as well as short‐ and long‐term district‐specific deviations from it. Spatial effects: Let us start with the smooth spatial effect in Figure 7. Overall, the geographical effects within federal states, indicated by the black borders in Figure 7, are mostly heterogeneous. To give some examples, an almost uniformly augmented risk of infections is estimated in Baden‐Württemberg and Thuringia. At the same time, we remark a negative spatial effect in Germany’s northern districts, that is, Schleswig Holstein and Mecklenburg Western Pomerania. On the other hand, the fit for districts in North Rhine‐Westphalia varies between positive, negative and no effect.

FIGURE 7

Estimated smooth spatial effect . The thick black lines represent borders between federal states, while the thinner grey borders separate federal districts. Through arrows, we highlight selected states mentioned in the text [Colour figure can be viewed at wileyonlinelibrary.com] We visualise the result of the friendship coordinates in two manners. One may plot the smooth bivariate function in the friendship space, Figure 8a, or map the smooth fit on the geographical space, Figure 8b. The re‐mapping allows for sharp edges in the geographical coordinates. Broadly, the fit differentiates between districts allocated in former East Germany (corresponding in Figure 8a to MDS coordinates located in the first quadrant) and former West Germany. We observe that the predicted infections are ceteris paribus lower if a district is situated in former East Germany. Districts allocated in the second and fourth quadrant of Figure 8a (mainly including districts from the states Bavaria, North Rhine‐Westphalia and parts of Lower Saxony) are negatively affected by social proximity. Figure 8b demonstrates how the partial effects sometimes change abruptly between large cities and neighbouring districts. For instance, Berlin’s central position is unrelated to the infection rates compared to the negative effect evaluated in Brandenburg. We observe a similar phenomenon for Hamburg when contrasting its partial effect with surrounding districts in Schleswig Holstein and Lower Saxony.

FIGURE 8

(a) Coordinates of the districts in the friendship space with the smooth partial effect of in the background. We only show the predictions in the range of observed values. (b) Coordinates of the districts in the geometric space with the smooth partial effect of again shown in the background for each district. The thick black lines represent borders between federal states, while the thinner grey borders separate federal districts. Through arrows, we highlight selected states mentioned in the text [Colour figure can be viewed at wileyonlinelibrary.com] Unobserved heterogeneity effect: In Figure 9, the posterior modes of both random effects evince strong heterogeneities between districts and underpin local differences in the spread of COVID‐19. Noticeable estimates of the long‐term random effects, Figure 9a, reflect early outbreaks in the districts Greiz (Thuringia) and Coesfeld (North Rhine‐Westphalia). Some estimates may also be related to heterogeneous testing practices between the districts.

FIGURE 9

(a) Maximum posterior modes of the long‐term random effects . (b) Maximum posterior modes of the short‐term random effects . The thick black lines represent borders between federal states, while the thinner grey borders separate federal districts. Through arrows, we highlight selected districts mentioned in the text [Colour figure can be viewed at wileyonlinelibrary.com] We can trace back most high estimates of the short‐term random effect to locally confined outbreaks, for instance, Guethersloh and Warendorf (North Rhine‐Westphalia). As already stated in Section 2 the proportion of infections attributed to these local events rises once the general level of new cases declines. This result is supported by the different scales of the two types of random effects and apparent in the estimates . Therefore, the posterior modes of the short‐term effects exhibit higher variances and are larger in absolute terms than the long‐term effects.

Model assessment

We compare various alternative model specifications to check the robustness of our conclusions. In particular, we estimate separate models, adding dummy covariates for each state and leaving out one of the spatial terms, the Gini index, the Percentage of People Staying Put, all Facebook‐related covariates and random effects. For this endeavour, we utilise the corrected Akaike information criterion (cAIC) introduced by Wood et al. (2016) since the effective degrees of freedom need to adjusted for the additionally estimated variance components if random effects are included (we average the respective values over the results of all imputed data sets). The results in Table 2 support the appropriateness of our final model since the corresponding cAIV value is the lowest. Besides, the change in the cAIC value to the model (4), denoted by ΔcAIC, permits an evaluation of the variable importance of each eliminated covariate. We can conclude from Table 2 that the exclusion of the Gini Index induces the highest loss in cAIC value. Concerning the different types of distances, the friendship distance is more important than the geographical distance.

TABLE 2

Alternative model specifications with resulting corrected Akaike information criterion (cAIC) value and change in corrected AIC value when compared to our model from Section 3

Model description	cAIC (Model)	ΔcAIC (Model)
Our model	86694	–
With state effect	86694.79	0.790
Without geographical distance	86699.42	5.422
Without friendship distance	86701.26	7.262
Without Age:Gender interaction	86707.34	13.336
Without percentage staying put	86732.46	38.461
Without Gini index	86974.32	280.319
Without Facebook covariates	87033.87	339.867
Without long‐term effect	87452.62	758.620
Without short‐term effect	87900.03	1206.034
Without long‐ and short‐term effect	88624.38	1930.382

Alternative model specifications with resulting corrected Akaike information criterion (cAIC) value and change in corrected AIC value when compared to our model from Section 3 For further validation, we plot one draw of the randomised quantile residuals in Figure 10a. Dunn and Smyth (1996) proposed this type of residual based on the result that evaluating the cumulative distribution function at all observed values of under the estimated parameters should yield uniformly distributed random variables. Transforming these uniform values by the quantile function of the standard normal gives the quantile residuals. To obtain continuous residuals, the values are randomised since the negative binomial distribution in Equation (4) has discrete support. On average, the empirical quantiles are close to the theoretical expectations and do not indicate problems regarding the statistical fit. At the right tail of the distribution, 38 (out of 24.060) observations exhibit higher deviations from the normal quantiles, which we coloured in red. The underlying counts are mainly credited to local outbreaks that could not be completely captured by the random effects, namely Coesfeld (Thuringia), Cuxhaven (Lower Saxony), Aichach‐Friedberg (Bavaria), Guetersloh and Warendorf (North Rhine‐Westphalia). Additionally, we assess the predictions of the final model through plotting the predicted infections against the observed infections, Figure 10b, and a rootogram proposed by Kleiber and Zeileis (2016), Figure 10c. Both visualisations confirm a strong fit of the presented model and proof that the model can sufficiently capture the observed counts of infected individuals. Due to the multiple imputation scheme specified in Sections 3.1 and 3.3, we carry the model assessment out for each imputation separately and report the averaged results.

FIGURE 10

(a) QQ Plot of randomised quantile residuals, observations with a distance larger than 1 to the theoretically expected values are drawn in red. (b) Scatter plot of the observed and predicted infection count, for the x and y‐axis, we used a log(· + 1) scale. The dotted grey line is the best‐case scenario of the prediction and has intercept 0 and slope 1. (c) Rootogram comparing the observed and expected counts. The grey barplot specifies the observed counts, while the red line gives the expected values under Equation (4) [Colour figure can be viewed at wileyonlinelibrary.com]

CONCLUSION

In this writing, our contributions are twofold. First, we used state‐of‐the‐art regression models to quantify the importance of human mobility for understanding the spread of COVID‐19 on a local level accounting for their temporal dynamic, latent effects and other covariates. Concerning the relative importance, the Gini index of meeting probability attribution proved to be a primary driver of the infection rates. Second, we used methods from multivariate statistics to derive friendship coordinates for the federal districts in Germany. Consecutively, we coupled the result with a regression model via isotropic splines and, thereby, revealed a perpetual clustering of communities in former East‐ and West Germany that remains existent for COVID‐19 infections because the social geographical system proves to be an essential regressor in our application. Moreover, our findings enable an evaluation of the district‐wise policies undertaken between March and June 2020. The results corroborate the usefulness of interventions limiting trans‐district movements and concentrating meeting patterns. Especially during the last weeks of this study, local lockdowns could mitigate further national outbreaks. Still, we need to address some limitations of our work, which require additional investigation. The data sources for the infection data include all individuals in Germany that tested positive on COVID‐19. During the peak phase in March, these tests were mainly carried out with patients who showed symptoms or had contact with an infected individual. Due to an unknown dark figure of infected persons missing in the public records (Lavezzo et al., 2020), the observed data are a proxy for the current epidemiological situation. To control for this possible bias, further research on the prevalence of COVID‐19 in Germany and the representability of the official statistics of the real infection occurrence akin to the REACT Study in England (Riley et al., 2021) would be necessary. Even with these caveats, the combination of infection, mobility and connectivity data can serve for a fruitful application of other methods as well. Contrasting our approach, one may tackle the regression task in Section 3 by incorporating the spatial dependencies directly in the correlation structure, as is done in the literature on spatial econometric models (LeSage & Pace, 2009). We could also employ novel clustering algorithms that naturally exploit different proximity dimensions, such as the geographical and social space, to identify similar districts while taking into account spatial dependencies (D’Urso & Vitale, 2020; D’Urso et al., 2019). Furthermore, the research questions posed in this article would greatly benefit from an examination through the lens of analytical sociology (Hedström & Bearman, 2011). Nevertheless, this type of analysis usually necessitates individual‐level data, which are not readily available. Therefore, we can only verify some of the theoretical results of Block et al. (2020) on the macro scale, which does not necessarily translate to the micro scale (Stadtfeld, 2018). Therefore, additional empirical work on the implications of individual behaviour on the spread of COVID‐19 is still needed. Nevertheless, our work can give valuable pointers in that regard contingent on the assumption that the corresponding district average adequately represents the mobility patterns of an individual.

DATA AND CODE AVAILABILITY

Facebook collected the anonymised mobility and connectivity data. We cannot share the raw data due to a data agreement. Still, we are allowed to provide all data aggregated onto the level of federal districts. To guarantee the replicability of our results, we make the complete code to obtain the results from this article available online. We also supply a visualisation of the entire pipeline of our analysis in the Supplementary Material for transparency.

ACKNOWLEDGEMENTS

We thank the anonymous reviewers for their careful reading and constructive comments. Further, we thank Alex Pompe from Facebook for his useful comments and explanations of the data from Facebook. The project was supported by the European Cooperation in Science and Technology [COST Action CA15109 (COSTNET)]. This work has been funded by the German Federal Ministry of Education and Research (BMBF) under Grant No. 01IS18036A. The authors of this work take full responsibility for its content. Click here for additional data file. Click here for additional data file.

31 in total

1. Social Connectedness: Measurement, Determinants, and Effects.

Authors: Michael Bailey; Rachel Cao; Theresa Kuchler; Johannes Stroebel; Arlene Wong
Journal: J Econ Perspect Date: 2018

2. Markov regression models for time series: a quasi-likelihood approach.

Authors: S L Zeger; B Qaqish
Journal: Biometrics Date: 1988-12 Impact factor: 2.571

3. Incorporating social contact data in spatio-temporal models for infectious disease spread.

Authors: Sebastian Meyer; Leonhard Held
Journal: Biostatistics Date: 2017-04-01 Impact factor: 5.899

4. Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe.

Authors: Seth Flaxman; Swapnil Mishra; Axel Gandy; H Juliette T Unwin; Thomas A Mellan; Helen Coupland; Charles Whittaker; Harrison Zhu; Tresnia Berah; Jeffrey W Eaton; Mélodie Monod; Azra C Ghani; Christl A Donnelly; Steven Riley; Michaela A C Vollmer; Neil M Ferguson; Lucy C Okell; Samir Bhatt
Journal: Nature Date: 2020-06-08 Impact factor: 49.962

5. Resurgence of SARS-CoV-2: Detection by community viral surveillance.

Authors: Steven Riley; Kylie E C Ainslie; Oliver Eales; Caroline E Walters; Haowei Wang; Christina Atchison; Claudio Fronterre; Peter J Diggle; Deborah Ashby; Christl A Donnelly; Graham Cooke; Wendy Barclay; Helen Ward; Ara Darzi; Paul Elliott
Journal: Science Date: 2021-04-23 Impact factor: 47.728

6. Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia.

Authors: Qun Li; Xuhua Guan; Peng Wu; Xiaoye Wang; Lei Zhou; Yeqing Tong; Ruiqi Ren; Kathy S M Leung; Eric H Y Lau; Jessica Y Wong; Xuesen Xing; Nijuan Xiang; Yang Wu; Chao Li; Qi Chen; Dan Li; Tian Liu; Jing Zhao; Man Liu; Wenxiao Tu; Chuding Chen; Lianmei Jin; Rui Yang; Qi Wang; Suhua Zhou; Rui Wang; Hui Liu; Yinbo Luo; Yuan Liu; Ge Shao; Huan Li; Zhongfa Tao; Yang Yang; Zhiqiang Deng; Boxi Liu; Zhitao Ma; Yanping Zhang; Guoqing Shi; Tommy T Y Lam; Joseph T Wu; George F Gao; Benjamin J Cowling; Bo Yang; Gabriel M Leung; Zijian Feng
Journal: N Engl J Med Date: 2020-01-29 Impact factor: 176.079

7. Sex- and Gender-specific Observations and Implications for COVID-19.

Authors: Lauren A Walter; Alyson J McGregor
Journal: West J Emerg Med Date: 2020-04-10

8. Clinical Characteristics of Coronavirus Disease 2019 in China.

Authors: Wei-Jie Guan; Zheng-Yi Ni; Yu Hu; Wen-Hua Liang; Chun-Quan Ou; Jian-Xing He; Lei Liu; Hong Shan; Chun-Liang Lei; David S C Hui; Bin Du; Lan-Juan Li; Guang Zeng; Kwok-Yung Yuen; Ru-Chong Chen; Chun-Li Tang; Tao Wang; Ping-Yan Chen; Jie Xiang; Shi-Yue Li; Jin-Lin Wang; Zi-Jing Liang; Yi-Xiang Peng; Li Wei; Yong Liu; Ya-Hua Hu; Peng Peng; Jian-Ming Wang; Ji-Yang Liu; Zhong Chen; Gang Li; Zhi-Jian Zheng; Shao-Qin Qiu; Jie Luo; Chang-Jiang Ye; Shao-Yong Zhu; Nan-Shan Zhong
Journal: N Engl J Med Date: 2020-02-28 Impact factor: 91.245

9. Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2).

Authors: Ruiyun Li; Sen Pei; Bin Chen; Yimeng Song; Tao Zhang; Wan Yang; Jeffrey Shaman
Journal: Science Date: 2020-03-16 Impact factor: 47.728

10. Nowcasting the COVID-19 pandemic in Bavaria.

Authors: Felix Günther; Andreas Bender; Katharina Katz; Helmut Küchenhoff; Michael Höhle
Journal: Biom J Date: 2020-12-01 Impact factor: 1.715

5 in total

1. Combining graph neural networks and spatio-temporal disease models to improve the prediction of weekly COVID-19 cases in Germany.

Authors: Cornelius Fritz; Emilio Dorigatti; David Rügamer
Journal: Sci Rep Date: 2022-03-10 Impact factor: 4.379

2. On the interplay of regional mobility, social connectedness and the spread of COVID-19 in Germany.

Authors: Cornelius Fritz; Göran Kauermann
Journal: J R Stat Soc Ser A Stat Soc Date: 2021-11-16 Impact factor: 2.175

3. Modelling the effect of a border closure between Switzerland and Italy on the spatiotemporal spread of COVID-19 in Switzerland.

Authors: Mathilde Grimée; Maria Bekker-Nielsen Dunbar; Felix Hofmann; Leonhard Held
Journal: Spat Stat Date: 2021-11-10

4. Social media mining under the COVID-19 context: Progress, challenges, and opportunities.

Authors: Xiao Huang; Siqin Wang; Mengxi Zhang; Tao Hu; Alexander Hohl; Bing She; Xi Gong; Jianxin Li; Xiao Liu; Oliver Gruebner; Regina Liu; Xiao Li; Zhewei Liu; Xinyue Ye; Zhenlong Li
Journal: Int J Appl Earth Obs Geoinf Date: 2022-08-19

5. The Covid-19 containment effects of public health measures: A spatial difference-in-differences approach.

Authors: Reinhold Kosfeld; Timo Mitze; Johannes Rode; Klaus Wälde
Journal: J Reg Sci Date: 2021-06-20

5 in total