
A unified Gaussian copula methodology for spatial regression analysis.

John Hughes

Abstract

Spatially referenced data arise in many fields, including imaging, ecology, public health, and marketing. Although principled smoothing or interpolation is paramount for many practitioners, regression, too, can be an important (or even the only or most important) goal of a spatial analysis. When doing spatial regression it is crucial to accommodate spatial variation in the response variable that cannot be explained by the spatially patterned explanatory variables included in the model. Failure to model both sources of spatial dependence (regression and extra-regression, if you will) can lead to erroneous inference for the regression coefficients. In this article I highlight an under-appreciated spatial regression model, namely, the spatial Gaussian copula regression model (SGCRM), and describe said model's advantages. Then I develop an intuitive, unified, and computationally efficient approach to inference for the SGCRM. I demonstrate the efficacy of the proposed methodology by way of an extensive simulation study along with analyses of a well-known dataset from disease mapping.
© 2022. The Author(s).

Year: 2022    PMID: 36151389    PMCID: PMC9508247    DOI: 10.1038/s41598-022-20171-1

Source DB: PubMed    Journal: Sci Rep    ISSN: 2045-2322    Impact factor: 4.996


Introduction

The aim of a spatial regression analysis is to explain a substantial proportion of the spatial pattern exhibited by some dependent variable by appealing to the spatial structure exhibited by one or more independent variables. Consider the data shown in Fig. 1, for example. The response (left panel) is stomach cancer incidence for each of Slovenia’s municipalities for the period 1995–2001[1]. The explanatory variable (right panel) is socioeconomic status. An inverse relationship between the two variables is clearly evident, i.e., higher socioeconomic status (tending toward black) is associated with lower stomach cancer incidence (tending toward white), and vice versa.
Figure 1

Stomach cancer incidence and socioeconomic status for the municipalities of Slovenia. Figure created using R version 4.1.2 (https://www.r-project.org).

Since spatially structured data like the Slovenia data are common in many fields, and explanation, as opposed to interpolation or smoothing, is often desired, spatial regression methods are important data-analytic tools for many practitioners. Unfortunately, the most popular spatial regression model, although intuitive as a posited data-generating mechanism, is problematic as a data-analytic tool. In this article I will describe a little-used alternative spatial regression model that has an equally satisfying motivation but avoids the challenges faced by the more popular model.

Spatial regression models: two roads diverge

A linear spatial regression model for Gaussian outcomes

Both of the spatial regression models treated herein have their genesis in the spatial linear mixed-effects regression model (SLMM), which can be specified as follows. We have

y = Xβ + w + ε,    (1)

where y = (y_1, ..., y_n)' is the response; X is the design matrix, the p columns of which are spatially structured covariates; β are regression coefficients; w = (w_1, ..., w_n)' are spatially dependent random effects, the purpose of which is to accommodate spatial structure in the response that cannot be explained by X; and ε = (ε_1, ..., ε_n)' are iid Gaussian errors, which are independent of w. Note that each element of y, of X, of w, and of ε is spatially referenced, i.e., y_i = y(s_i), x_i = x(s_i), w_i = w(s_i), and ε_i = ε(s_i), where s_i is the ith spatial location at which the response and covariates were observed. The locations may be points in a continuous domain (e.g., latitude and longitude); or the locations may represent spatial aggregates (e.g., Census tracts, states, pixels), which are referred to as areal units. To increase readability I will generally omit the locations s_1, ..., s_n. Two characteristics make (1) a spatial regression model. First, the explanatory variables in X exhibit spatially patterned variation, as I mentioned above. Indeed, the chief aim of a spatial regression analysis is to explain spatially patterned variation in the response as having arisen due to an association between y and one or more columns of X. Second, the spatial random effects w accommodate/induce spatial variation in the response beyond that which can be attributed to X. Indeed, one might say w is a stand-in for covariates that are missing from X. It is the supposed random nature of w that makes model (1) a mixed-effects model: β are fixed effects and w are random effects. Typically, w is assumed to follow a multinormal distribution having mean 0 and spatial covariance matrix Σ. To say that Σ is a spatial covariance matrix is to say that the (i, j)th element σ_ij (i ≠ j) of Σ accounts for the spatial relationship between locations s_i and s_j.
For example, when the spatial domain of interest is continuous, it is common to let σ_ij = K(s_i, s_j; θ), where K is a spatial kernel function having parameters θ. For some applications, an appealing choice of K is the powered exponential kernel, which is given by

K(d_ij) = exp{−(d_ij/φ)^ν},

where d_ij = ||s_i − s_j|| is the distance between locations s_i and s_j, ν ∈ (0, 2] is a smoothness parameter, and φ > 0 is a range parameter. Note that models of the sort just described are usually referred to as Gaussian process (GP) models. GP methods are immensely popular not only in spatial statistics but more generally. The above described spatial model is referred to as a point-level model since the spatial locations are points in a continuous domain. Point-level models are often (less precisely) termed geostatistical models. Should the spatial domain comprise aggregates (as for the Slovenia stomach cancer data presented above), Σ is typically constructed from the undirected graph G = (V, E) that represents the adjacency structure among the areal units (for Slovenia, municipalities; see Fig. 2). Here each areal unit corresponds to exactly one vertex in V, and edge (i, j) ∈ E if and only if areal units i and j are adjacent. A popular form of Σ in this case is the proper conditional autoregressive (CAR) model[2], for which

Σ^{−1} = τ(D − ρA),

where τ > 0 is a smoothing parameter, D is diagonal with the degrees of the vertices of G as its diagonal elements, ρ is a range parameter, and A is the adjacency matrix for G, i.e., A_ij = 1 if (i, j) ∈ E and A_ij = 0 otherwise.
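The powered exponential kernel above is straightforward to compute. The paper's analyses were done in R; the following is an illustrative Python sketch (the function name and the toy grid of locations are my own):

```python
import numpy as np

def powered_exponential(coords, phi, nu):
    """Powered exponential kernel: K(d) = exp(-(d / phi)^nu).

    coords : (n, 2) array of spatial locations
    phi    : range parameter (phi > 0)
    nu     : smoothness parameter (0 < nu <= 2)
    """
    # Pairwise Euclidean distances d_ij = ||s_i - s_j||.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return np.exp(-(d / phi) ** nu)

# Nine point-level locations on a 3 x 3 grid in the unit square.
xy = np.array([(i / 2.0, j / 2.0) for i in range(3) for j in range(3)])
Sigma = powered_exponential(xy, phi=0.5, nu=1.5)
```

The resulting matrix has unit diagonal and entries that decay toward 0 as inter-site distance grows, as the text describes.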
Figure 2

The adjacency structure for the municipalities of Slovenia. Figure created using R version 4.1.2 (https://www.r-project.org).

Note that the proper CAR and similar models are Gaussian Markov random field (GMRF) models, which is to say that, conditional on its neighbors, each outcome is independent of the remaining outcomes (spatial Markov property). The dependence structure for a GMRF model is specified in terms of a precision matrix rather than a covariance matrix. It is the form of the precision matrix that implies the conditional independency structure. Moreover, the precision matrix is sparse, and so fast sparse-matrix routines can often be used to speed computation. The curious reader may desire more information on the methods I just touched on. An excellent book-length presentation of GP methods can be found in Williams and Rasmussen[3]. A seminal book in the field of spatial models, which also discusses GP methods but is more wide ranging, is Banerjee et al.[4]. See Rue and Held[5] for a seminal book-length treatment of GMRF models. For a concise treatment of Gaussian random field models for spatial data, see Haran[6]. For a survey of Markov random field models and their applications, see Kindermann and Snell[7]. The seminal paper on geostatistical models is Diggle et al.[8]. And the pathbreaking paper on spatial Markov random field models is Besag[9]. Of course many other excellent sources exist. In any case, there are many possibilities for Σ, and each of the many spatial linear mixed-effects regression models is distinguished by the manner in which said model constructs Σ from the spatial locations. For each model Σ is structured so that w_i and w_j (and hence y_i and y_j) exhibit stronger positive correlation the closer s_i and s_j are to one another, and σ_ij decays towards 0 as the distance between s_i and s_j increases. Finally, properties of the multinormal distribution imply that the response, too, is multinormal:

y ~ N(Xβ, Σ + σ²I),    (2)

where σ² is the common variance of the ε_i and I denotes the identity matrix.
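To make the sparse-precision point concrete, here is a hedged Python sketch that builds the proper CAR precision matrix τ(D − ρA) for a small square lattice (the helper function and parameter values are illustrative, not taken from the paper):

```python
import numpy as np

def grid_adjacency(nrow, ncol):
    """Adjacency matrix A of the nrow x ncol square lattice (rook neighbors)."""
    n = nrow * ncol
    A = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            if c + 1 < ncol:                      # east neighbor
                A[i, i + 1] = A[i + 1, i] = 1.0
            if r + 1 < nrow:                      # south neighbor
                A[i, i + ncol] = A[i + ncol, i] = 1.0
    return A

A = grid_adjacency(4, 4)
D = np.diag(A.sum(axis=1))          # vertex degrees on the diagonal
rho, tau = 0.9, 1.0
Q = tau * (D - rho * A)             # proper CAR precision: sparse
Sigma = np.linalg.inv(Q)            # implied (dense) covariance
```

Q has a nonzero entry only on the diagonal and for each pair of adjacent lattice cells, which is exactly the sparsity that fast GMRF computation exploits; the inverse, by contrast, is dense.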
Thus the SLMM can be viewed as a linear regression model having spatially correlated multinormal errors as well as the iid Gaussian errors of the ordinary linear model, the latter of which represent extra-spatial dispersion, i.e., variation that cannot be attributed to the spatial process.

Extending the linear spatial regression model

Should we desire to accommodate non-Gaussian response variables, the two formulations of the SLMM, (1) and (2), suggest two different approaches. The first form of the SLMM suggests that we model a non-Gaussian response as

g{E(y_i | w_i)} = x_i'β + w_i,    (3)

where g is a suitable link function and the linear predictor x_i'β is the same as above. We pair (3) with an appropriate distribution for the response. The resulting model is called the spatial generalized linear mixed model (SGLMM) since it can be viewed as a generalized linear model[10,11] that induces spatial dependence in the response variable by augmenting the ordinary linear predictor with spatially dependent random effects w. (Conditional on w, the outcomes are assumed to be independent.) The SGLMM is immensely popular; so popular, in fact, that alternatives receive little attention, despite the fact that the SGLMM poses a number of formidable challenges (to be described shortly). The second form of the SLMM suggests an alternative to the SGLMM, namely, the spatial direct Gaussian copula regression model (SGCRM). The SGCRM can be specified as

z ~ N(0, Ω),    u_i = Φ(z_i),    y_i = F_i^{−1}(u_i),    (4)

where Ω is a spatial correlation matrix, Φ denotes the standard Gaussian cdf, F_i and x_i are the cdf and covariates, respectively, for the response variable y_i, and F_i^{−1} is the quantile function for y_i. Note that the u_i are marginally standard uniform owing to the probability integral transform, and y_i follows distribution F_i owing to the inverse probability integral transform[12]. In this scheme we have

E(y_i) = g^{−1}(x_i'β)

for an appropriate choice of F_i. Thus we see that the SGCRM induces "extra" spatial dependence "from below," if you will, as opposed to augmenting the ordinary linear predictor. This extra spatial structure is produced by a Gaussian spatial copula, of which u = (u_1, ..., u_n)' is a realization: the u_i are marginally standard uniform and jointly carry the dependence structure encoded in Ω. (For book-length treatments of copula methods see Nelsen[13] and/or Joe[14]. And excellent papers on the subject are Kolev and Paiva[15] and Xue-Kun Song[16]).
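The simulation scheme in (4) — Gaussian draw, probability integral transform, inverse probability integral transform — is easy to sketch. The following Python fragment is illustrative only: the line-transect correlation matrix and Poisson means are my own stand-ins for whatever copula and marginals an analysis would use.

```python
import numpy as np
from scipy.stats import norm, poisson

rng = np.random.default_rng(42)

# Illustrative correlation matrix Omega for four locations on a line;
# any spatial Gaussian copula (CAR, Matern, ...) could stand in here.
coords = np.arange(4.0)
Omega = np.exp(-np.abs(coords[:, None] - coords[None, :]) / 2.0)

mu = np.array([2.0, 3.0, 4.0, 5.0])  # marginal Poisson means, e.g. exp(x_i' beta)

z = rng.multivariate_normal(np.zeros(4), Omega)  # z ~ N(0, Omega)
u = norm.cdf(z)                                  # probability integral transform
y = poisson.ppf(u, mu).astype(int)               # inverse PIT: y_i ~ Poisson(mu_i)
```

Each y_i is marginally Poisson with its own mean, while the joint dependence among the y_i is inherited entirely from Ω.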
Note that the SGLMM is equivalent to the SLMM if g is the identity function and the response distributions are Gaussian and have covariance matrix Σ + σ²I. Likewise, the SGCRM is equivalent to the SLMM if Ω is the correlation matrix corresponding to Σ + σ²I and the response distributions are Gaussian with means x_i'β and variances σ_ii + σ². Thus the SLMM and the SGCRM are the same model in this case. When the response is non-Gaussian, the SGLMM and the SGCRM are usually not equivalent[17], and so for a non-Gaussian SGLMM we must consider the joint distribution of (y, w), while the SGCRM is a marginal model irrespective of the response distribution. This implies that β has a conditional interpretation for non-Gaussian SGLMMs, i.e., β has its usual interpretation only conditional on the spatial random effects w, since the random vector w is essentially an additional, but unobserved, covariate. By contrast, the SGCRM and the ordinary generalized linear model share the same intuitive marginal interpretation of β because both models employ the "bare" linear predictor Xβ instead of Xβ + w. And the problem of interpretation does not end there, for the SGLMM exhibits spatial confounding, a type of perfect collinearity between X and w that can lead to a rather different value for β̂ and inflate β̂'s standard errors so dramatically as to conceal important associations between y and X[18,19]. To see why the SGLMM is spatially confounded, first let P be the orthogonal projection onto C(X) (the column space of X), so that I − P is the orthogonal projection onto C(X)'s orthogonal complement. Now eigendecompose P and I − P to obtain orthogonal bases (K and L, say) for C(X) and its complement. Then the augmented linear predictor can be rewritten as

Xβ + w = Xβ + Kγ + Lδ,

where γ and δ are random coefficients. This form shows that Kγ is the source of the confounding, for K and X have the same column space. So troubling is spatial confounding that a considerable literature devoted to the problem now exists.
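The projection argument above can be verified numerically. A minimal sketch with a synthetic design matrix (all names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 2
X = rng.standard_normal((n, p))

# Orthogonal projections onto C(X) and its orthogonal complement.
P = X @ np.linalg.solve(X.T @ X, X.T)
Pc = np.eye(n) - P

# Eigenvectors with eigenvalue 1 furnish orthogonal bases K and L.
vals, vecs = np.linalg.eigh(P)
K = vecs[:, vals > 0.5]              # p columns spanning C(X)
vals_c, vecs_c = np.linalg.eigh(Pc)
L = vecs_c[:, vals_c > 0.5]          # n - p columns spanning the complement

# K and X share a column space, so a term K @ gamma in the linear
# predictor is perfectly collinear with X @ beta.
```

Since P has eigenvalues exactly 0 and 1, thresholding at 0.5 cleanly separates the two bases; K lies in C(X) and L is orthogonal to it.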
Yet no proposed remedy appears to be satisfactory, and it may be safe to conclude that no proper remedy will ever be found within the standard mixed-effects paradigm[20]. It is fortunate that the SGCRM does not suffer from spatial confounding (because the SGCRM's linear predictor does not contain spatial random effects). Clayton et al.[18] discovered spatial confounding. Reich et al.[19] and Paciorek[21] then sparked renewed (and sustained) interest in spatial confounding, and offered means of alleviating or remedying the problem. Since then many articles have appeared on the subject. Many of those articles propose some remedy or other within the mixed-effects paradigm. Instead, I recommend avoiding the problem altogether by setting aside the mixed-effects paradigm (for regression). Some spatial modelers might contend that we simply must work within the mixed-effects paradigm if we aim to do both spatial regression and spatial smoothing. But the literature on spatial confounding suggests that regression and smoothing are at cross purposes. Perhaps we should consider spatial regression and spatial smoothing to be distinct tasks, and treat them as such. In the remainder of this article I will put forth the SGCRM as a compelling solution to the spatial regression problem. I will address spatial smoothing in future work. Finally, I would be remiss if I did not mention an additional, and important, potential pathology of the SGLMM, namely, that the conclusions implied by an SGLMM fit can be implausible not only for the regression but also for the dependence model. In a later section I will demonstrate this by applying an SGLMM to the Slovenia data. For those data, results for both the regression part of the model and the dependence part of the model are implausible.

Approaches to inference for the SGCRM

If one aims to estimate simultaneously the marginal parameters and the copula parameters, doing inference for the SGCRM does not permit a unified approach. For continuous outcomes the SGCRM likelihood is meta-Gaussian, and so likelihood-based inference (i.e., maximum likelihood or Bayesian inference) is straightforward, but potentially burdensome computationally owing to repeated computation of Ω^{−1} and |Ω| along with simultaneous estimation of the marginal parameters. For discrete outcomes the SGCRM likelihood comprises on the order of 2^n terms and is thus intractable for realistic sample sizes. Consequently, a number of tractable alternative objective functions have been proposed for discrete data[22-28]. Although these alternatives are compelling and well-studied, I recommend a simple, flexible, unified (i.e., suitable for both continuous and discrete outcomes) two-stage approach to inference, as follows. Recall that the standardized residuals for an ordinary GLM fit are given by

r_i = e_i / √(1 − ĥ_i)   or   r_i = d_i / √(1 − ĥ_i),

where e_i is the Pearson residual for the ith outcome y_i, d_i is the corresponding deviance residual, and ĥ_i is the estimated leverage of y_i. Since these residuals are asymptotically standard normal for most forms of the GLM[11,29], it is quite natural in a multivariate setting to regard r = (r_1, ..., r_n)' as a realization of z from (4). (In fact, it is advantageous to regard r as a realization of z even when the residuals appear to depart markedly from standard normality; more on this later.) This suggests that we can fit an SGCRM by first fitting an ordinary GLM having linear predictor Xβ, and then using the standardized residuals to estimate the parameters θ of Ω. Specifically, we can estimate θ by optimizing the Gaussian log-likelihood of the residuals:

θ̂ = argmax_θ {−½ log |Ω(θ)| − ½ r'Ω(θ)^{−1} r}.

Finally, armed with β̂ and θ̂, we can do parametric bootstrap[30,31] inference for β, as follows. The resulting bootstrap sample reflects the more dispersed (that is, more variable due to reduced effective sample size) sampling distribution of β̂ under dependence and so can be used to do much improved inference for β.
1. Fit an ordinary GLM to y to estimate the marginal parameters β (and perhaps additional, nuisance, parameters).
2. Use the standardized residuals from Step 1 to estimate the parameters θ of Ω.
3. Produce a bootstrap sample β̂^(1), ..., β̂^(B) for β, where B is the bootstrap sample size, as follows. For j = 1, ..., B:
   (a) simulate z^(j) ~ N(0, Ω(θ̂));
   (b) compute u^(j) by applying the probability integral transform, as in (4) above, to z^(j);
   (c) produce y^(j) by applying the inverse probability integral transform, as in (4) above, to u^(j), where the ith quantile function has parameters x_i'β̂ (and perhaps estimates of nuisance marginal parameters);
   (d) obtain β̂^(j) by fitting the above mentioned ordinary regression model to y^(j).
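The three steps above can be sketched end to end. The Python fragment below is a simplified illustration, not the paper's R implementation: it uses Poisson marginals with an exponential copula on a line transect, approximates standardized residuals by Pearson residuals (the leverage correction is omitted for brevity), and uses a small bootstrap sample so the sketch runs quickly (the paper used B = 1,000).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, poisson

rng = np.random.default_rng(1)

# --- Toy data: Poisson outcomes at n locations on a line (illustrative) ---
n = 60
s = np.linspace(0.0, 1.0, n)
X = np.column_stack([np.ones(n), s])
beta_true = np.array([1.0, 1.5])
d = np.abs(s[:, None] - s[None, :])
z = rng.multivariate_normal(np.zeros(n), np.exp(-d / 0.2))
y = poisson.ppf(norm.cdf(z), np.exp(X @ beta_true))

def fit_poisson_glm(X, y):
    """Step 1: ordinary Poisson GLM (log link) by maximum likelihood."""
    nll = lambda b: np.sum(np.exp(X @ b) - y * (X @ b))
    return minimize(nll, np.zeros(X.shape[1]), method="BFGS").x

def estimate_range(r, d):
    """Step 2: fit an exp(-d/phi) copula to residuals r by maximizing
    the N(0, Omega(phi)) log-likelihood (optimized on the log scale)."""
    def nll(log_phi):
        Om = np.exp(-d / np.exp(log_phi))
        _, logdet = np.linalg.slogdet(Om)
        return 0.5 * (logdet + r @ np.linalg.solve(Om, r))
    return np.exp(minimize(nll, np.log(0.1), method="Nelder-Mead").x[0])

beta_hat = fit_poisson_glm(X, y)
mu_hat = np.exp(X @ beta_hat)
r = (y - mu_hat) / np.sqrt(mu_hat)   # Pearson residuals (leverage ignored here)
phi_hat = estimate_range(r, d)

# Step 3: parametric bootstrap for beta (B kept small for illustration).
B = 50
boot = np.empty((B, 2))
Om_hat = np.exp(-d / phi_hat)
for b in range(B):
    zb = rng.multivariate_normal(np.zeros(n), Om_hat)   # (a)
    yb = poisson.ppf(norm.cdf(zb), mu_hat)              # (b), (c)
    boot[b] = fit_poisson_glm(X, yb)                    # (d)
se = boot.std(axis=0)   # dependence-aware standard errors for beta
```

The bootstrap standard errors are wider than the naive GLM ones would be, reflecting the reduced effective sample size under spatial dependence.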

Application of the SGCRM to simulated data

For any topic as broad as spatial regression, many satisfactory simulation study designs exist. I feel that the study design I used for this article more than accomplishes my goal, namely, to exercise my proposed methodology broadly by applying it in a variety of realistic scenarios. To that end, I considered both areal and continuous-domain data; the most common response types; various realistic sample sizes; common forms of the spatial Gaussian copula; both strong and weaker spatial dependence; and intuitive spatially patterned covariates (which permit easy visualization). I considered six scenarios (see Table 1).
Table 1

Scenarios for the simulation study.

Scenario  Response            Copula       Sample size  Copula parameter(s)       β
1         Poisson             Proper CAR   30 × 30      ρ = 0.99 (strong)         β1 = 3, β2 = 1
2         Poisson             Proper CAR   30 × 30      ρ = 0.8 (weaker)          β1 = 3, β2 = 1
3         Binomial (N = 20)   Leroux       40 × 40      λ = 0.95 (strong)         β1 = 2, β2 = 0.5
4         Bernoulli           Exponential  30 × 30      φ = 0.3 (strong)          β1 = 2, β2 = 0.5
5         Negative binomial   Matérn       20 × 20      φ = 0.1, ν = 1 (strong)   β1 = 3, β2 = 1
6         Negative binomial   Matérn       20 × 20      φ = 0.03, ν = 1 (weaker)  β1 = 3, β2 = 1
The first three scenarios were for areal (i.e., spatially aggregated) versions of the SGCRM. For Scenarios 1 and 2 I used a proper CAR copula and Poisson marginal distributions. For both scenarios the underlying graph was the 30 × 30 square lattice (sample size n = 900). Recall that the proper CAR model has precision matrix proportional to D − ρA, where ρ is a range parameter. I used two values for ρ, namely, 0.99 in Scenario 1 and 0.8 in Scenario 2. These values represent strong spatial dependence and somewhat weaker (but still consequential) dependence, respectively. I assigned to the lattice points locations in the unit square centered at the origin, so that both the x and y coordinates range from −0.5 to 0.5. I used these coordinates as the spatial explanatory variables by using Poisson rates λ_i = exp(β1 x_i + β2 y_i), where β1 is an east–west effect and β2 is a north–south effect. For both scenarios I used β1 = 3 (strong effect) and β2 = 1 (weaker effect). The resulting mean structure is shown in the left panel of Fig. 3.
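The rate surface just described is simple to construct; a short Python sketch (variable names illustrative):

```python
import numpy as np

# 30 x 30 lattice of locations covering the unit square centered at the origin.
g = np.linspace(-0.5, 0.5, 30)
xx, yy = np.meshgrid(g, g)                 # x: east-west, y: north-south

beta1, beta2 = 3.0, 1.0                    # strong east-west, weaker north-south effect
lam = np.exp(beta1 * xx + beta2 * yy)      # Poisson rate at each lattice point (log link)
```

The rates range from exp(−2) at the southwest corner to exp(2) at the northeast corner, producing the smooth gradient visible in the left panel of Fig. 3.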
Figure 3

Left panel: the mean structure for the Poisson (Scenarios 1 and 2) and negative binomial (Scenarios 5 and 6) simulation studies. Right panel: the mean structure for the binomial (Scenario 3) and ungrouped binary (Scenario 4) simulation studies. Figure created using R version 4.1.2 (https://www.r-project.org).

Scenario 3 increased the sample size to 1,600 by using the 40 × 40 lattice. And I used the Leroux GMRF specification[32] for the copula in Scenario 3. The Leroux precision matrix is a mixture of the identity matrix and the intrinsic CAR precision matrix, which is proportional to D − A. (The intrinsic CAR is obtained by setting ρ equal to 1 in the proper CAR model. The intrinsic CAR is widely considered to be an appealing prior distribution in Bayesian spatial modeling, but we cannot use the intrinsic CAR in the SGCRM because D − A is singular.) Specifically, the Leroux precision matrix is proportional to

(1 − λ)I + λ(D − A),

where I is the identity matrix and λ ∈ [0, 1) is a spatial dependence parameter. This specification yields the independence case if λ = 0, and approaches the intrinsic autoregression as λ → 1. I used λ = 0.95, which implies strong dependence. (See Waller and Carlin[33], Lee[34], and/or LeSage and Pace[35] for more information regarding the proper CAR, Leroux, and similar models). As for response distribution and mean structure, I used binomial marginals for Scenario 3, where the subsample size was N = 20 and the "success" probabilities were

p_i = exp(β1 x_i + β2 y_i) / {1 + exp(β1 x_i + β2 y_i)}.

I let β1 = 2 and β2 = 0.5, which produced p_i over the range 0.2 to 0.8, approximately. This mean structure is shown in the right panel of Fig. 3. The final three scenarios employed Gaussian process specifications for their copulas. For Scenarios 5 and 6 I used a two-parameter Matérn spatial correlation function[36,37]. This correlation function is given by

ρ(d) = {2^(1−ν) / Γ(ν)} (d/φ)^ν K_ν(d/φ),

where φ > 0 is a range parameter, ν > 0 is a smoothness parameter, Γ denotes the gamma function, and K_ν denotes the modified Bessel function of the third kind of order ν. This kernel is immensely popular not only in spatial statistics but also in machine learning, computer experiments[38], and elsewhere. I set ν equal to 1 for both of Scenarios 5 and 6, and I let φ = 0.1 for Scenario 5 and φ = 0.03 for Scenario 6. These parameter settings correspond to strong dependence and weaker (yet consequential) dependence, respectively.
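The Matérn function is awkward at d = 0 (the Bessel term diverges while the product tends to 1), so implementations handle the diagonal specially. A hedged Python sketch of one common parameterization (the paper's exact scaling of d may differ, e.g. some versions rescale d by √(2ν)):

```python
import numpy as np
from scipy.special import gamma, kv

def matern(d, phi, nu):
    """Matern correlation: rho(d) = 2^(1-nu)/Gamma(nu) * (d/phi)^nu * K_nu(d/phi).

    kv is the modified Bessel function of the third kind (SciPy's name for it)."""
    d = np.asarray(d, dtype=float)
    out = np.ones_like(d)                  # rho(0) = 1 by continuity
    pos = d > 0
    t = d[pos] / phi
    out[pos] = (2.0 ** (1.0 - nu) / gamma(nu)) * t ** nu * kv(nu, t)
    return out

# Scenario-5-style settings: nu = 1 (fixed), phi = 0.1.
rho = matern([0.0, 0.05, 0.1, 0.5], phi=0.1, nu=1.0)
```

With ν = 1 the correlation decays smoothly and monotonically from 1, falling off faster for the smaller range φ = 0.03 of Scenario 6.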
For simulation Scenarios 5 and 6 I once again used the mean structure shown in the left panel of Fig. 3, but I used a less dense lattice (20 × 20, for a sample size of 400), and I chose negative binomial marginal distributions. The negative binomial is an important model for spatial counts because the negative binomial distribution accommodates overdispersion, i.e., variance larger than the mean, which is commonly exhibited by count data in many contexts but cannot be handled by the equi-dispersed Poisson distribution. I used the parameterization of the negative binomial distribution that provides variance function μ + μ²/θ, where μ is the mean and θ > 0 is the dispersion parameter. For θ close to 0 the variance is much larger than the mean, and as θ → ∞ the negative binomial distribution converges to the Poisson distribution. I used the same value of θ for both scenarios, a value that produces moderate overdispersion. Finally, for simulation Scenario 4 I employed a special case of the Matérn correlation function, namely, the exponential correlation function:

ρ(d) = exp(−3d/φ).

This kernel is suitable for ungrouped binary data since it fixes the smoothness (at ν = 0.5), leaving only the range parameter to be estimated. Learning the smoothness is hopeless for binary data, and so it makes sense to assume a rough (merely continuous in the mean-squared sense) process. I used φ = 0.3, which implies strong dependence. Note that I included 3 in the exponential kernel because doing so lends an intuitive interpretation to the range parameter: φ is the so-called effective range of the process, i.e., the distance between locations s_i and s_j at which ρ has fallen to 0.05. Because my simulation study design melds areal methodology and point-level methodology, it may seem as if my design employs the same spatial domain for both areal data and point-level data. This is not the case, however. For areal data, the underlying graph (in this case a square grid graph) is the spatial domain.
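The effective-range interpretation is easy to verify numerically (a tiny illustrative Python check):

```python
import numpy as np

def exponential_kernel(d, phi):
    """Exponential correlation with effective-range parameterization:
    rho(d) = exp(-3 d / phi), so rho(phi) = exp(-3), roughly 0.05."""
    return np.exp(-3.0 * np.asarray(d, dtype=float) / phi)

phi = 0.3                                    # effective range used in Scenario 4
rho_at_range = exponential_kernel(phi, phi)  # exp(-3), about 0.0498
```

At distance d = φ the correlation has indeed decayed to exp(−3) ≈ 0.05, which is what makes φ directly interpretable as the effective range.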
For point-level data, the outcomes are observed on a square grid of locations covering the unit square centered at the origin, and so said square is the spatial domain. Because these two notions of spatial domain (discrete domain versus continuous domain) are rather different, areal copulas and point-level copulas are specified in rather different ways. Note that I did not consider continuous response distributions in the simulation. Results for continuous outcomes are predictable because residuals are very well behaved for continuous-response GLMs. And so I chose to focus on discrete spatial data, which are common and pose more of a challenge for any spatial method. Also note that I used a bootstrap sample size of 1,000 and a simulation sample size (i.e., number of simulated datasets) of 1,000 for all simulations described in this article. The regression results from the simulation study are given in Table 2. For Scenario 1 (Poisson marginals, proper CAR copula, strong dependence) and 95% intervals we see that the coverage rates for ordinary GLM intervals were very poor (27%, 32%) while the coverage rates for the spatial methodology were much better (84%, 88%).
Table 2

Regression results for the simulation scenarios described in Table 1.

Scenario  Interval (%)  Parameter  Ordinary coverage rate (%)  Spatial coverage rate (%)  Ordinary type II rate (%)  Spatial type II rate (%)  Running time (s)
1         95            β1         27                          84                         0                          0                         11
                        β2         32                          88                         3                          29
          99            β1                                     95                                                    0
                        β2                                     95                                                    50
2         95            β1         70                          94                         0                          0                         10
                        β2         69                          94                         0                          0
3         95            β1         29                          91                         0                          0                         48
                        β2         31                          90                         2                          26
          99            β1                                     98                                                    0
                        β2                                     98                                                    48
4         95            β1         31                          59                         6                          13                        16
                        β2         36                          63                         30                         58
5         95            β1         31                          89                         0                          7                         101
                        β2         31                          88                         16                         68
          99            β1                                     95                                                    12
                        β2                                     94                                                    80
6         95            β1         72                          97                         0                          0                         90
                        β2         70                          95                         2                          19
It is unfortunate but not surprising that the spatial method did not provide the desired 95% coverage; this scenario is challenging because the dependence is quite strong. This can be remedied, however, by using 99% intervals instead (second row of Table 2): the 99% intervals achieved 95% coverage for both regression coefficients. This could be seen as a weakness of the proposed methodology, but the deficiency is easily remedied, and similar remedies are commonplace in other statistical contexts. For example, when applying local polynomial methods it is good practice to intentionally undersmooth, since the bandwidth chosen by cross-validation tends to be a bit too large; using a slightly smaller bandwidth yields more accurate pointwise confidence bands. For Scenario 2 (Poisson marginals, proper CAR copula, moderate dependence) the spatial methodology produced 95% intervals having the desired coverage rate, which shows just how challenging Scenario 1 was. The results for Scenario 3 (binomial marginals, Leroux copula, strong dependence) were similar to those for Scenario 1: the ordinary GLM intervals had very poor coverage rates while the spatial intervals performed much better. Once again the 95% spatial intervals fell short of the desired coverage rate (91%, 90%), but 99% intervals offered better than 95% coverage (98%, 98%). For Scenario 4 (ungrouped binary outcomes, exponential copula, strong dependence), certainly the most challenging scenario, the spatial methodology performed rather poorly. This is what I expected; I included this scenario in the interest of completeness and also to show just how bleak our prospects are for revealing the data-generating mechanism behind dependent ungrouped binary data. I plan to make dependent ungrouped binary data the focus of a future study.
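The simulation scenarios above all share one data-generating mechanism: draw correlated standard-normal variates from the Gaussian copula, push them through the standard normal CDF, and apply the marginal quantile function. A minimal sketch follows; the 1-D transect, the exponential correlation with range 5, and the regression coefficients are my own illustrative assumptions, not the paper's settings.

```python
# Sketch of generating spatially dependent Poisson outcomes through a
# Gaussian copula. All settings (transect, range, coefficients) assumed.
import numpy as np
from scipy.stats import norm, poisson

rng = np.random.default_rng(42)
n = 100
coords = np.arange(n, dtype=float)          # assumed 1-D transect
phi = 5.0                                   # assumed range parameter
Omega = np.exp(-np.abs(coords[:, None] - coords[None, :]) / phi)

# Step 1: draw correlated standard-normal variates z ~ N(0, Omega).
L = np.linalg.cholesky(Omega)
z = L @ rng.standard_normal(n)

# Step 2: map to uniforms with the standard normal CDF.
u = norm.cdf(z)

# Step 3: apply the marginal Poisson quantile function, with means from
# an assumed log-linear model (beta0 = 1, beta1 = 0.5).
x = rng.uniform(-1, 1, n)
mu = np.exp(1.0 + 0.5 * x)
y = poisson.ppf(u, mu).astype(int)

print(y[:10])
```

The resulting counts have the desired Poisson marginals while inheriting the copula's spatial dependence.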
For the final two simulation scenarios, Scenarios 5 and 6 (negative binomial marginals, Matérn copula, strong and moderate dependence, respectively), we see that once again the spatial methodology performed much better than the ordinary GLM methodology. And once again 99% confidence intervals were required in the case of strong dependence (Scenario 5). For Scenario 6, 95% intervals performed well (97% and 95% coverage for the two regression coefficients) owing to weaker, but still consequential, spatial dependence. For all scenarios the findings for type II error rates (a type II error being an interval that contains 0) were predictable: type II error rates are higher for the spatial method than for the ordinary GLM because the proposed method yields wider confidence intervals, reflecting the reduced effective sample size caused by positive spatial dependence. Average running times were short, considerably shorter than the running times for competing Bayesian procedures or for frequentist approaches that estimate the marginal parameters and the copula parameters simultaneously. Note that the reported average running times were for the full approach as outlined above (estimation of marginal parameters, followed by estimation of copula parameters, followed by bootstrapping). I did not parallelize the bootstrap sampling, however; an embarrassingly parallel[39] bootstrap procedure would of course lead to even shorter running times. Simulation results for the nuisance parameters are shown in Table 3. For Scenarios 1 and 2, the proper CAR scenarios, we see that the copula dependence parameter ρ can be recovered with little bias despite the fact that the outcomes for those scenarios were discrete (Poisson). For Scenario 2 the estimator ρ̂ is approximately normally distributed because the value of the parameter, ρ = 0.8, is sufficiently far from 1 and the sample size was on the large side for areal data.
For Scenario 1 the estimator ρ̂ is left-skewed because the value of the parameter, ρ = 0.99, is rather close to 1. If desired, an approximately Gaussian estimator can be obtained as Φ⁻¹(ρ̂), where Φ⁻¹ is the standard normal quantile function. This is useful for obtaining a symmetric confidence interval for Φ⁻¹(ρ), the endpoints of which can be transformed back to the scale of ρ to obtain an interval for ρ.
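A minimal sketch of the quantile transform just described, assuming the transform is ρ → Φ⁻¹(ρ) and assuming illustrative values for the estimate and its standard error (neither is from the paper):

```python
# Symmetric interval on the transformed scale, mapped back to (0, 1).
# rho_hat and se_g are assumed illustrative values.
import numpy as np
from scipy.stats import norm

rho_hat = 0.976                 # e.g., the median estimate for Scenario 1
g = norm.ppf(rho_hat)           # approximately Gaussian on this scale

se_g = 0.25                     # assumed standard error on the g scale
lo_g, hi_g = g - 1.96 * se_g, g + 1.96 * se_g

# Transform the endpoints back to the scale of rho.
lo, hi = norm.cdf(lo_g), norm.cdf(hi_g)
print(round(lo, 3), round(hi, 3))
```

The back-transformed interval is asymmetric about ρ̂, as it should be for a parameter near the boundary of (0, 1).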
Table 3

Results for nuisance parameters for the simulation scenarios described in Table 1.

Scenario | Parameter | Median estimate | Remarks
1 | ρ = 0.99 | 0.976 | Small negative bias; ρ̂ has a left-skewed distribution
2 | ρ = 0.8 | 0.763 | Small negative bias; ρ̂ is approximately normally distributed
3 | λ = 0.95 | 0.931 | Small negative bias; λ̂ has a slightly left-skewed distribution
4 | ϕ = 0.3 | 0.109 | Large negative bias (63%); ϕ̂ has a slightly right-skewed distribution
5 | ϕ = 0.1 | 0.114 | Small bias; ϕ̂ has a strongly right-skewed distribution
5 | ν = 1 | 0.654 | Substantial negative bias; ν̂ has a right-skewed distribution
5 | θ = 3 | 5.533 | Positive bias; distribution of θ̂ has a heavy right tail; many values larger than 10; some extreme values
6 | ϕ = 0.03 | 0.047 | Positive bias; ϕ̂ has a right-skewed distribution; a few extreme values
6 | ν = 1 | 0.509 | Large negative bias; ν̂ has a strongly right-skewed distribution (range 0–6)
6 | θ = 3 | 3.286 | Small positive bias; θ̂ has a right-skewed distribution
For the Leroux copula in Scenario 3, where the response distribution was binomial, once again the dependence parameter λ can be estimated with only a small bias. And λ̂'s left-skewed distribution can be transformed to approximate Gaussianity by applying the standard normal quantile function, if necessary. The first Matérn scenario, Scenario 4, yielded a poor result, as expected. For ungrouped binary data, neither marginal parameters nor dependence parameters can be recovered reliably. Specifically, in Scenario 4 the median estimate of the range parameter ϕ was biased downward by 63%; that is, the underlying spatial dependence appears to be much weaker than it is. This implies that the effective sample size is overestimated, which leads to overly optimistic inference for the regression coefficients. In a future project I will further explore dependent ungrouped binary data. For the negative binomial outcomes in simulation Scenario 5, the range parameter ϕ can be recovered with only a small positive bias. But the smoothness estimator ν̂ is substantially biased downward because information about spatial smoothness is lost to discretization. Also note that the dispersion estimator θ̂ exhibited a large upward bias for this scenario. This can be attributed to the dependence structure: strong positive dependence reduces the variance of the outcomes, working against the overdispersion induced by the marginal distributions. In Scenario 6 the smoothness parameter ν was once again substantially underestimated, on average. And ϕ̂ again exhibited positive bias, which was small in magnitude but large as a percentage of the true parameter value. Finally, because the dependence was considerably weaker for Scenario 6, the dispersion estimator θ̂ was only slightly biased upward.

Model assessment and choice

Although standardized residuals are quite useful for estimating copula parameters, they are often not terribly useful for assessing model appropriateness when the response variable is discrete: standardized residuals for discrete outcomes may be far from Gaussian and may exhibit banding. Consequently, I recommend that standardized residuals be used only for estimating copula parameters. For assessing model fit I strongly recommend randomized quantile residuals (RQRs)[40], since Feng et al.[41] recently showed that RQRs are powerful for detecting many forms of misspecification (e.g., nonlinear effects of covariates, overdispersion, zero inflation) in discrete regression models. Note, however, that RQRs should not be used in place of standardized residuals when estimating copula parameters, for the production of RQRs somewhat obscures the dependence structure, leading to poor estimation of the copula parameters. I discovered this fact through simulation. Akaike's information criterion (AIC)[42] for ordinary GLMs, which is given by AIC = 2p - 2ℓ̂, where p is the number of marginal parameters and ℓ̂ denotes the maximum value of the ordinary GLM log-likelihood, is very useful for choosing the right response family, despite the fact that ordinary AIC neglects dependence. For example, in a large simulation study where the true response distributions were negative binomial and dependence was strong (Scenario 1 from Table 1, but with negative binomial outcomes according to Scenarios 5 and 6), AIC selected the true model over a Poisson model for every simulated dataset. In fact, the minimum difference in AIC values for said study was 23, and the average difference was 104. These are huge differences and so left little doubt that the negative binomial model was superior to the Poisson model. Selecting the best copula model from a collection of candidates is more challenging.
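The AIC comparison just described can be illustrated with a toy computation. The data, the negative binomial shape, and the plug-in fitted values below are assumptions for illustration, not the study's settings (in practice the log-likelihoods would be fully maximized over the marginal parameters).

```python
# Toy illustration of choosing the response family with ordinary AIC
# (2p - 2*loglik), using simulated overdispersed counts.
import numpy as np
from scipy.stats import poisson, nbinom

rng = np.random.default_rng(1)
n = 500
mu, k = 5.0, 2.0                         # assumed NB mean and shape
p = k / (k + mu)
y = nbinom.rvs(k, p, size=n, random_state=rng)

# Plug-in log-likelihoods (moment-style fits for simplicity).
ll_pois = poisson.logpmf(y, y.mean()).sum()
ll_nb = nbinom.logpmf(y, k, k / (k + y.mean())).sum()

aic_pois = 2 * 1 - 2 * ll_pois           # p = 1 marginal parameter
aic_nb = 2 * 2 - 2 * ll_nb               # p = 2 marginal parameters
print(aic_pois > aic_nb)                 # NB should win for overdispersed y
```

Because the simulated counts are strongly overdispersed (variance μ + μ²/k far exceeds μ), the Poisson fit pays a large likelihood penalty that the extra parameter cannot offset.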
In an approach analogous to using AIC to select a response model, one can use a version of AIC (or some other suitable information criterion[43]) to select a copula model. Specifically, one can compute AICc = 2q - 2ℓc(Ω̂; ẑ), where q is the number of copula parameters, Ω̂ is the estimated copula correlation matrix, ℓc is the Gaussian copula log-likelihood, and ẑ are the standardized residuals for the first-stage fit selected using AIC. In the simulation study mentioned above, I used the CAR copula as the true copula and used AIC to choose between the CAR copula and the Leroux copula. Although a paired t test with unequal variances found a statistically significant difference between AIC for the true copula and AIC for the Leroux copula, the difference in means was very small and AIC selected the true copula for only 59% of the simulated datasets. To determine whether these poor results should be taken as an indictment of AIC or of the CAR and Leroux copulas, I carried out a follow-up simulation study (for Scenarios 1 and 2) in which I computed AIC using simulated realizations of the latent Gaussian variates (according to (4)) instead of using standardized residuals from GLM fits to the simulated outcomes. The follow-up study yielded comparable results, and so we must conclude not that AIC loses information to discretization for areal models but that the CAR and Leroux models are practically indistinguishable. To my knowledge, this is a new finding. Selecting a Gaussian process copula is more complicated. In a simulation study that employed negative binomial marginals and a Matérn copula, AIC chose the true copula over a powered exponential copula for only 30% of the simulated datasets. The percentage rose to 65% (considerably better but still far from outstanding) when I used the simulated latent variates instead of the simulated outcomes. This shows that we do lose to discretization a substantial amount of information about the smoothness of a Matérn copula, which is not surprising and agrees with the results for Scenarios 5 and 6 from Table 3.
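One plausible reading of the copula AIC above, assuming the Gaussian-copula log-density is evaluated at the standardized residuals ẑ with estimated correlation matrix Ω̂ (the candidate correlation matrices below are my own illustrative choices):

```python
# Sketch of a Gaussian-copula AIC: 2q minus twice the copula log-density
# -0.5*log|Omega| - 0.5*z'(Omega^{-1} - I)z at standardized residuals z.
import numpy as np

def copula_aic(z, Omega, q):
    """AIC for a Gaussian copula with q parameters and correlation Omega."""
    _, logdet = np.linalg.slogdet(Omega)
    Oinv = np.linalg.inv(Omega)
    loglik = -0.5 * logdet - 0.5 * z @ (Oinv - np.eye(len(z))) @ z
    return 2 * q - 2 * loglik

rng = np.random.default_rng(0)
n = 100
# Assumed exponential-correlation candidate on a 1-D transect, range 3.
d = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Omega = np.exp(-d / 3.0)
z = np.linalg.cholesky(Omega) @ rng.standard_normal(n)

aic_true = copula_aic(z, Omega, q=1)
aic_indep = copula_aic(z, np.eye(n), q=0)   # independence copula
print(aic_true < aic_indep)
```

For the independence copula the log-density is exactly zero, so its AIC is 0; a dependent candidate that matches the data should come in well below that.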
I obtained comparable results when I simulated data according to a powered exponential copula and compared AIC for the true copula and a Matérn copula, and so it seems that the methodology described in this paper is fairly insensitive to the choice of GP copula, at least for rougher spatial processes. A thoughtful reviewer suggested that I also investigate the effect of copula misspecification on regression inference. I did so by simulating data according to Scenarios 1 and 2 (CAR copula) and fitting the Leroux copula. Both coverage rates and type II rates were little affected: for one of the two scenarios the coverage rates were 95% and 93% and the type II rates were 0%, while for the other the coverage rates were 86% and 88% and the type II rates were 0% and 31%. I obtained very similar results for data simulated according to Scenarios 5 and 6. That is, regression inference was little affected by GP copula misspecification. For smoother GP processes the Matérn model is preferred to the powered exponential model because the former is more flexible. (Note that the powered exponential and the Matérn coincide when the powered exponential has smoothness 1 and the Matérn has smoothness 0.5 (the exponential correlation function), and when the powered exponential has smoothness 2 and the Matérn has smoothness ν → ∞ (the Gaussian, or squared exponential, correlation function).) Specifically, a process with the powered exponential kernel is not mean-squared differentiable except for smoothness 2, in which case the process is infinitely mean-squared differentiable. A Matérn process, on the other hand, is m times mean-squared differentiable if and only if ν > m. In any case, Williams and Rasmussen[3] pointed out that it is very difficult or even infeasible to distinguish between Matérn smoothness values larger than, say, 7/2. It is even difficult to distinguish between large finite values of ν and ν = ∞ for the Matérn! And of course these difficulties are exacerbated when the response is discrete.

Additional computing concerns

Naive computation of the log-determinant and quadratic form appearing in the copula log-likelihood is burdensome and does not scale well. Thus it may be advantageous, or even necessary, to consider approaches for optimizing more efficiently. I will discuss efficient computing for areal models first, and then turn to continuous-domain variants of the SGCRM.

Efficient computing for areal copulas

Recall that Gaussian copulas for areal data are typically Gaussian Markov random fields, which are parameterized in terms of their precision matrices. I considered two such models above, namely, the proper CAR model and the Leroux model, whose precision matrices are proportional to Q = D - ρA and Q = λ(D - A) + (1 - λ)I, respectively, where A is the adjacency matrix of the underlying graph and D is the diagonal matrix of vertex degrees. Efficient computing for these and similar models can be done as follows; I will use the proper CAR model as an example. Note that Q = D - ρA is not an inverse correlation matrix because the variances are not equal to 1. But of course we can rescale Q so that its inverse is a correlation matrix, i.e., we can construct a Gaussian copula using Q̃ = (v vᵀ) ⊙ Q, where v is the vector of marginal standard deviations, with vᵢ = √(Q⁻¹)ᵢᵢ, and ⊙ denotes the Hadamard product. This leads to the objective function ℓ(ρ) = ½ log|Q̃| - ½ ẑᵀ Q̃ ẑ (up to an additive constant), where ẑ denotes the vector of standardized residuals. Now, numerical methods for sparse matrices can be used to compute ℓ(ρ) quickly. Let L be the lower Cholesky triangle of Q̃, so that Q̃ = LLᵀ. Then log|Q̃| = 2 Σᵢ log Lᵢᵢ and ẑᵀ Q̃ ẑ = ‖Lᵀẑ‖², which implies that

ℓ(ρ) = Σᵢ log Lᵢᵢ - ½ ‖Lᵀẑ‖².     (5)

The righthand side of (5) can be computed efficiently because L can be computed efficiently after Q̃ has been reordered to reduce its bandwidth. Also note that the reordering needs to be computed just once: the sparsity structure of L depends on the sparsity structure of Q̃ and not on ρ, and so L can simply be updated to reflect a change in ρ[44]. It remains to handle the proper CAR variances efficiently. Since computing the variances obviously requires inversion of Q, it would seem that using the proper CAR copula must leave us unable to fully exploit the sparsity of Q. This is not the case, however, because the variances can be replaced with approximations so that Q need not be inverted[25]. The running time of the approximation algorithm scales linearly with sample size, and the difference between the approximate CAR copula and the true copula can be made negligible, so that the two differ materially only in their computational cost. I use the spam package[45] for R to do sparse matrix computations: the chol and update functions of spam perform the fast Cholesky decomposition and updating described above, and for smaller datasets spam::chol2inv can be used to invert Q.
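A dense toy version of the computations above may help fix ideas: the proper CAR precision is rescaled so that its inverse is a correlation matrix, and the objective is evaluated through a Cholesky factor. In practice one would use a sparse Cholesky (e.g., the spam package in R); the 4-vertex path graph and residual vector here are purely illustrative.

```python
# Evaluate the rescaled proper-CAR objective via a Cholesky factor.
# The graph and residuals are illustrative assumptions.
import numpy as np

A = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])            # adjacency matrix of a path graph
D = np.diag(A.sum(axis=1))                  # diagonal degree matrix

def car_objective(rho, z):
    Q = D - rho * A                         # proper CAR precision matrix
    v = np.sqrt(np.diag(np.linalg.inv(Q)))  # marginal standard deviations
    Q_tilde = np.outer(v, v) * Q            # Hadamard rescaling
    L = np.linalg.cholesky(Q_tilde)         # Q_tilde = L L^T
    logdet = 2.0 * np.log(np.diag(L)).sum() # log|Q_tilde| from the factor
    quad = np.sum((L.T @ z) ** 2)           # z' Q_tilde z = ||L' z||^2
    return 0.5 * logdet - 0.5 * quad

z = np.array([0.3, -0.1, 0.4, 0.2])         # stand-in standardized residuals
print(car_objective(0.5, z))
```

With a sparse factorization, the diagonal of L and the triangular solve in the quadratic form are exactly the quantities that can be updated cheaply as ρ changes.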

Efficient computing for Gaussian processes

There are many approaches to speeding up computing for Gaussian process models. I will briefly describe covariance tapering[46], an approach that is appealing because it induces sparsity in an intuitive fashion and is effective for copula parameter estimation (and hence for the bootstrapping procedure for SGCRMs). When kriging (i.e., spatial interpolation), rather than regression, is of interest, other GP methods, such as the nearest-neighbor Gaussian process (NNGP)[47], may perform better than covariance tapering. (The NNGP approach is, interestingly, reminiscent of the areal methods described above, since the NNGP induces sparsity by defining a spatial process in terms of a well-chosen directed acyclic graph having vertices at the observed spatial locations.) In spatial covariance tapering, a spatial covariance (correlation) function is multiplied by a suitable positive definite function with compact support. The result is a spatial covariance (correlation) function that equals zero beyond a certain range. For example, if the spatial correlation function is the exponential correlation function, which I used in Scenario 4 of my simulation study, Furrer et al.[46] recommend the spherical tapering function

C_t(h) = (1 - 3h/(2t) + h³/(2t³)) 1{h < t},

where t is the taper length and h is the distance between two locations. Furrer et al.[46] also considered tapers from the Wendland family, which prove useful for smoother Matérn processes. If T is the tapering matrix with (i, j)th entry C_t(‖sᵢ - sⱼ‖), the tapered correlation matrix is Ω ⊙ T, which remains positive definite by the Schur product theorem. For a suitable taper length t, this matrix is sparse, and so fast sparse matrix algorithms can be used to compute the log-determinant and quadratic form. The resulting gain in computational efficiency can be dramatic.
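A numerical sketch of the tapering recipe, with assumed locations, range, and taper length (none are from the paper):

```python
# Taper an exponential correlation matrix with the spherical taper.
# Locations, range (2.0), and taper length (1.5) are assumptions.
import numpy as np

def spherical_taper(h, t):
    """Spherical taper: positive definite, zero beyond taper length t."""
    r = np.clip(h / t, 0.0, 1.0)
    return np.where(h < t, 1.0 - 1.5 * r + 0.5 * r ** 3, 0.0)

rng = np.random.default_rng(7)
s = rng.uniform(0, 10, size=(200, 2))                # random 2-D locations
h = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)

Omega = np.exp(-h / 2.0)                             # exponential correlation
T = spherical_taper(h, t=1.5)                        # taper matrix
Omega_tapered = Omega * T                            # Hadamard product

sparsity = np.mean(Omega_tapered == 0.0)
print(f"zero entries: {sparsity:.0%}")
# Positive definiteness is preserved (Schur product theorem).
eigmin = np.linalg.eigvalsh(Omega_tapered).min()
```

Most entries of the tapered matrix are exactly zero, which is what lets sparse Cholesky routines take over from dense linear algebra.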

Analyses of Slovenia stomach cancer data

In this section I revisit the Slovenian stomach cancer data. I applied four methods to the data: (1) ordinary Poisson regression with offset, (2) the Poisson SGCRM with proper CAR copula as described in this paper, (3) the Poisson SGCRM with Leroux copula, and (4) the traditional Poisson SGLMM with proper CAR random effects. I used a bootstrap sample size of 4,000 for the SGCRM analyses, and I applied the SGLMM using Markov chain Monte Carlo for Bayesian inference, the customary approach for that model. I drew 1,000,000 posterior samples, which resulted in small Monte Carlo standard errors[48]. For the ordinary GLM and the SGCRM, the transformed conditional mean is

log E(Yᵢ) = log Eᵢ + β₀ + β₁ xᵢ,

where Yᵢ is the observed count for the ith municipality, Eᵢ is the expected number of cases for the ith municipality, β₀ is an intercept term, and β₁ is the fixed effect for socioeconomic status xᵢ. For the SGLMM this linear predictor was of course augmented by the addition of a spatial random effect. The standardized residuals from the ordinary GLM fit are shown in Fig. 4. These residuals were used in the second stage of the SGCRM procedures to estimate the CAR copula parameter ρ and the Leroux copula parameter λ, and subsequently to incorporate spatial dependence in the bootstraps. The standardized residuals clearly exhibit appreciable but fairly short-range spatial dependence: we see regions of similarity, and said clusters are small relative to the extent of the map. To put it another way, we see small-scale spatial structure in these residuals. This visual assessment was corroborated by the estimate of ρ (ρ̂ = 0.282), which is shown in Table 4 along with the other results. (Recall that ρ near 0 implies short-range dependence while ρ near 1 implies long-range dependence.) The mild dependence in these residuals led to a confidence interval for β₁ that does not contain 0 and is just over 5% wider than the interval produced by the ordinary GLM. The Leroux model gave comparable results.
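The stage-one fit above (Poisson regression with an offset log Eᵢ) can be sketched with iteratively reweighted least squares in plain numpy. The data and coefficient magnitudes below are simulated assumptions for illustration; they are not the Slovenia data.

```python
# Stage one of the two-stage procedure: fit a Poisson GLM with offset by
# IRLS (Fisher scoring) and compute standardized (Pearson) residuals.
import numpy as np

rng = np.random.default_rng(3)
n = 200                                    # assumed sample size
x = rng.normal(size=n)                     # stand-in covariate (e.g., SES)
E = rng.uniform(5, 50, size=n)             # stand-in expected counts E_i
beta_true = np.array([0.15, -0.14])        # assumed intercept and effect
y = rng.poisson(E * np.exp(beta_true[0] + beta_true[1] * x))

X = np.column_stack([np.ones(n), x])
offset = np.log(E)

beta = np.zeros(2)
for _ in range(25):                        # IRLS iterations
    eta = offset + X @ beta                # linear predictor with offset
    m = np.exp(eta)                        # fitted means
    z_work = (eta - offset) + (y - m) / m  # working response, offset removed
    XtW = X.T * m                          # Poisson IRLS weights W = diag(m)
    beta = np.linalg.solve(XtW @ X, XtW @ z_work)

mu_hat = np.exp(offset + X @ beta)
resid = (y - mu_hat) / np.sqrt(mu_hat)     # residuals for the copula stage
print(beta.round(3))
```

The Pearson residuals computed at the end are the standardized residuals that feed the second-stage copula parameter estimation.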
Figure 4

Standardized residuals for an ordinary GLM fit (Poisson regression with offset) to the Slovenia stomach cancer data. Darker gray means larger value. The residuals exhibit substantial, but short-range, positive dependence. Figure created using R version 4.1.2 (https://www.r-project.org).

Table 4

Results for analyses of the Slovenia stomach cancer data.

Approach | Intercept | Effect | Copula parameter | Running time
Ordinary GLM | β̂₀ = 0.156 | β̂₁ = -0.137; β₁ ∈ (-0.175, -0.098) | (none) | < 1 s
SGCRM (CAR) | β̂₀ = 0.156 | β̂₁ = -0.137; β₁ ∈ (-0.177, -0.096) | ρ̂ = 0.282 | 18 s
SGCRM (Leroux) | β̂₀ = 0.156 | β̂₁ = -0.137; β₁ ∈ (-0.178, -0.094) | λ̂ = 0.067 | 11 s
SGLMM (CAR) | β̂₀ = 0.124 | β̂₁ = -0.061; β₁ ∈ (-0.142, 0.022) | ρ̂ = 0.979 | 1 h

The first row shows results for an ordinary Poisson regression with offset. The second row shows results for the two-stage SGCRM procedure, where the first stage employed an ordinary Poisson regression with offset. The third row shows results for the SGCRM procedure with Leroux copula. The fourth row shows results for an SGLMM fit such that the linear predictor for the Poisson regression with offset was augmented with proper CAR spatial random effects.

An astute reviewer pointed out that the dependence parameter ρ plays a complicated role in the joint distribution, as revealed by Wall[49]. Yet subsequent work by Assunção and Krainski[50] established ρ as a spatial range parameter: values of ρ close to 0 generally produce small-scale clustering in the data, while values close to 1 produce large spatial clusters.

The SGLMM with CAR random effects produced a rather different regression estimate and found no association between socioeconomic status and stomach cancer incidence. This is implausible given that a simple correlation analysis found a statistically significant association (Kendall's τ; p value < 0.001) between the two variables. The non-significance of the SGLMM regression is due to the well-known phenomenon termed spatial confounding, i.e., collinearity between the spatial random effects and the fixed-effects predictors. Additionally, the CAR SGLMM found quite strong spatial dependence in the Slovenia data, yielding a large estimate of ρ (estimated posterior mode). This is puzzling and implausible given the agreement between the appearance of the residuals and the SGCRM's small dependence estimates. The SGLMM fit is difficult to defend, while the SGCRM fit is plausible and satisfying.

Figure: Standardized residuals for an ordinary GLM fit (Poisson regression with offset) to the Slovenia stomach cancer data. Darker gray means larger value. The residuals exhibit substantial, but short-range, positive dependence. Figure created using R version 4.1.2 (https://www.r-project.org).
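The range-parameter reading of ρ is easy to check numerically. The following Python sketch (my illustration, not code from the paper) builds a proper CAR precision matrix Q = D − ρW on a square lattice and computes the implied correlation between two adjacent interior sites; the function names and the lattice size are arbitrary choices.

```python
import numpy as np

def grid_adjacency(n):
    """Rook (4-neighbor) adjacency matrix for an n x n lattice."""
    N = n * n
    W = np.zeros((N, N))
    for i in range(n):
        for j in range(n):
            k = i * n + j
            if i + 1 < n:                       # neighbor below
                W[k, k + n] = W[k + n, k] = 1.0
            if j + 1 < n:                       # neighbor to the right
                W[k, k + 1] = W[k + 1, k] = 1.0
    return W

def car_neighbor_correlation(rho, n=10):
    """Correlation between two adjacent interior sites under a proper
    CAR model with precision Q = D - rho * W (valid for |rho| < 1)."""
    W = grid_adjacency(n)
    D = np.diag(W.sum(axis=1))
    Q = D - rho * W
    S = np.linalg.inv(Q)                        # implied covariance matrix
    k = n * (n // 2) + n // 2                   # an interior site
    return S[k, k + 1] / np.sqrt(S[k, k] * S[k + 1, k + 1])

for rho in (0.1, 0.5, 0.9, 0.99):
    print(rho, round(car_neighbor_correlation(rho), 3))
```

For ρ = 0.1 the neighbor correlation is nearly zero, and it grows steadily as ρ approaches 1, consistent with reading ρ as a range parameter rather than a simple correlation.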

Discussion

I believe the SGCRM, along with the two-stage inferential procedure developed in this paper, should be appealing to spatial modelers whose chief aim is to identify important explanatory variables when the response variable is spatially referenced. Because the SGCRM is a marginal model and thus cannot suffer from spatial confounding or other mixed-model pathologies, regression results from an SGCRM analysis have a clear and intuitive interpretation, namely, the same interpretation as for an ordinary regression model. And the bootstrap in the procedure's second stage permits the standard errors for the estimated regression coefficients to be appropriately and efficiently adjusted in light of the extra-regression spatial dependence accommodated by the spatial Gaussian copula, the parameters of which can be estimated using standardized residuals from the first-stage fit.

I was inspired to write this paper by the excellent paper of Valle et al.[51]. They argue that one can avoid having to select the correct response family for count data (e.g., Poisson, negative binomial, zero-inflated Poisson, etc.) by simply applying ordinal regression models to count data. Valle et al.[51] show that an ordinal model fits count data better than the true model of the counts, owing to the ordinal model's greater flexibility. Although Valle et al.[51] did not consider spatially referenced counts, or, indeed, dependent counts more generally, it stands to reason that ordinal regression models hold the same promise for dependent counts as for independent counts. This suggests that the methodology I explored in this paper can be further unified for spatial count data.

Ungrouped binary spatial data present a unique challenge since the residuals from a Bernoulli GLM are typically worthless. In a future study I will carefully explore methods for regression analyses of spatially referenced ungrouped binary data, focusing especially on methods for continuous-domain spatial processes.
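The first stage of the procedure can be sketched in a few lines. The Python below is a minimal illustration under my own assumptions, not the paper's implementation: it fits a log-link Poisson regression with offset by iteratively reweighted least squares, then forms Dunn–Smyth-style randomized PIT residuals on the Gaussian scale. Those Gaussian-scale residuals are the raw material from which the copula's dependence parameters would be estimated.

```python
import numpy as np
from scipy.stats import poisson, norm

rng = np.random.default_rng(42)

def poisson_irls(X, y, offset=None, n_iter=25):
    """Fit a log-link Poisson regression by iteratively reweighted
    least squares; returns the coefficient vector."""
    if offset is None:
        offset = np.zeros(len(y))
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta + offset
        mu = np.exp(eta)
        z = eta - offset + (y - mu) / mu        # working response
        WX = X * mu[:, None]                    # weights W = diag(mu)
        beta = np.linalg.solve(X.T @ WX, X.T @ (mu * z))
    return beta

def gaussian_pit_residuals(y, mu):
    """Randomized probability-integral-transform residuals mapped to
    the standard-normal scale."""
    lo = poisson.cdf(y - 1, mu)
    hi = poisson.cdf(y, mu)
    u = rng.uniform(lo, hi)                     # break ties in the discrete CDF
    return norm.ppf(u)

# toy data: one covariate and a known offset (e.g., log expected counts)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
offset = np.log(rng.uniform(50, 150, size=n))
beta_true = np.array([-0.2, 0.3])
y = rng.poisson(np.exp(X @ beta_true + offset))

beta_hat = poisson_irls(X, y, offset)
z = gaussian_pit_residuals(y, np.exp(X @ beta_hat + offset))
# z should be approximately N(0, 1); any leftover spatial structure in z
# is what the spatial Gaussian copula's correlation model would capture
```

In a real analysis the second stage would fit a spatial correlation model (e.g., a Leroux-type copula correlation) to z and bootstrap to adjust the standard errors of beta_hat.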
Another topic for future study is spatial smoothing, which I believe should more often be considered as distinct from spatial regression.
References (11 in total)

1.  A comparison of conditional autoregressive models used in Bayesian disease mapping.

Authors:  Duncan Lee
Journal:  Spat Spatiotemporal Epidemiol       Date:  2011-03-12

2.  Analysis of the relationship between socioeconomic factors and stomach cancer incidence in Slovenia.

Authors:  V Zadnik; B J Reich
Journal:  Neoplasma       Date:  2006       Impact factor: 2.575

3.  Effects of residual smoothing on the posterior of the fixed effects in disease-mapping models.

Authors:  Brian J Reich; James S Hodges; Vesna Zadnik
Journal:  Biometrics       Date:  2006-12       Impact factor: 2.571

4.  Neighborhood dependence in Bayesian spatial models.

Authors:  Renato Assunção; Elias Krainski
Journal:  Biom J       Date:  2009-10       Impact factor: 2.207

5.  Spatial correlation in ecological analysis.

Authors:  D G Clayton; L Bernardinelli; C Montomoli
Journal:  Int J Epidemiol       Date:  1993-12       Impact factor: 7.196

6.  The importance of scale for spatial-confounding bias and precision of spatial regression estimators.

Authors:  Christopher J Paciorek
Journal:  Stat Sci       Date:  2010-02       Impact factor: 2.901

7.  Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets.

Authors:  Abhirup Datta; Sudipto Banerjee; Andrew O Finley; Alan E Gelfand
Journal:  J Am Stat Assoc       Date:  2016-08-18       Impact factor: 5.033

8.  Disease mapping.

Authors:  Lance A Waller; Bradley P Carlin
Journal:  Chapman Hall CRC Handb Mod Stat Methods       Date:  2010

9.  Ordinal regression models for zero-inflated and/or over-dispersed count data.

Authors:  Denis Valle; Kok Ben Toh; Gabriel Zorello Laporta; Qing Zhao
Journal:  Sci Rep       Date:  2019-02-28       Impact factor: 4.379

10.  A comparison of residual diagnosis tools for diagnosing regression models for count data.

Authors:  Cindy Feng; Longhai Li; Alireza Sadeghpour
Journal:  BMC Med Res Methodol       Date:  2020-07-01       Impact factor: 4.615

