
A new mixture copula model for spatially correlated multiple variables with an environmental application.

Mohomed Abraj^1,2, You-Gan Wang^3,4, M Helen Thompson^3,4.

Abstract

In environmental monitoring, multiple spatial variables are often sampled at a geographical location, and these variables can depend on each other in complex ways, such as non-linear and non-Gaussian spatial dependence. We propose a new mixture copula model that captures these complex relationships among spatially correlated multiple variables and predicts individual variables while accounting for the multivariate spatial relationship. The proposed method is demonstrated using an environmental application and compared with three existing methods. Firstly, the improvement in the prediction of individual variables gained by utilising a multivariate spatial copula is compared with the existing univariate pair copula method. Secondly, the prediction performance of the mixture copula in the multivariate spatial copula framework is compared with an existing multivariate spatial copula model that uses non-linear principal component analysis. Lastly, the improvement in the prediction of individual variables gained by the non-linear non-Gaussian multivariate spatial copula model is compared with the linear Gaussian multivariate cokriging model. The results show that the proposed spatial mixture copula model outperforms the existing methods in the cross-validation of actual and predicted values at the sampled locations.

Year: 2022 PMID: 35974067 PMCID: PMC9381801 DOI: 10.1038/s41598-022-18007-z

Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996

Introduction

In environmental sampling, multiple spatially correlated variables are often observed at a given geographical location. For instance, multiple topsoil heavy metal concentrations, such as cadmium, zinc, and copper, are measured from the soil sample taken at a field location. In forestry, multiple biomass variables, such as bole, foliage, stump, branch, and root biomass, are measured for a tree. The spatial distribution of forest biomass variables may also be used to understand wildfire behaviour. These variables can depend on each other in complex ways, such as non-linear and non-Gaussian spatial dependence. Spatial modelling that accounts for this complex multivariate spatial dependence may increase the prediction accuracy of individual variables, which may help forest managers to minimise risk and save lives. This article focusses on copula-based spatial modelling of spatially correlated multiple variables and on predicting the individual variables while utilising their multivariate spatial dependence. The Gaussian-based linear kriging method is widely used to model spatial variables and provides a weighted average measure of linear spatial dependence. The kriging weights do not depend on the sample values, and the method assumes linear Gaussian spatial dependence over the spatial domain[1-6]. However, spatial interpolation (prediction or simulation) based on a spatial model is expected to behave differently for different sample values. That is, the spatial correlation between samples varies across the quantiles of the samples. Thus, Bárdossy[7] introduced the spatial copula method, which captures the spatial dependence of a spatial variable while taking the sample values into account. In the non-spatial setting, the copula method is used to model the dependence between two or more variables and has been widely applied in many fields, such as environmental science, finance, economics, medicine and engineering[8-14].
Bárdossy’s[7] spatial copula method divides the distance over which spatial dependence exists into equally spaced intervals, also referred to as distance classes or spatial bins, and requires the same family of copulas to be fitted across all of the spatial bins. Gräler and Pebesma[15] proposed a more flexible spatial copula model that permits copulas from different families to be fitted across the distance classes. This added flexibility over Bárdossy’s model permits increased accuracy in modelling and prediction. The spatial copula concept proposed by Bárdossy and by Gräler and Pebesma has been used in mining, forestry, soil sampling, hydrology, and other environmental applications[8,16-24]. However, these spatial copula methods model and predict a univariate spatial variable without considering the multivariate dependence of spatially correlated multiple variables. Recently, Gnann et al.[25] extended Bárdossy’s[7] method to interpolate a primary spatial variable while considering a secondary correlated spatial variable. However, Gnann et al.[25] assumed that the joint distribution of the primary and secondary variables follows a Gaussian copula. To model non-Gaussian multivariate spatial dependence in the spatial copula framework, Musafer et al.[26] proposed a multivariate spatial copula model in which the correlated spatial variables are transformed into spatially uncorrelated factors using non-linear principal component analysis (NLPCA). Gräler and Pebesma’s univariate spatial copula model is then used to model and predict the spatially uncorrelated factors. A subsequent back transformation is required to return the predicted values to the scale of the original variables and to re-inject correlation. However, Musafer et al.’s[26] method models the joint dependence between spatial variables only indirectly, through a black-box transformation.
We directly extend Gräler and Pebesma’s univariate spatial copula to the multivariate setting, jointly modelling spatially correlated multiple variables via a white-box mixture copula[24]. The mixture copula is a joint distribution function built from multiple copulas, which offers a more flexible framework for parametric statistical modelling and analysis. In addition, a single copula family may not be able to capture tail dependencies, whereas the mixture copula can capture them[24]. The mixture copula has been used in the non-spatial setting for modelling multivariate genomic data[27] and for modelling wave height and period[28,29]. We adapt the mixture copula to the spatial setting, which offers a more flexible multivariate spatial copula framework for spatially correlated multiple variables.

Methods

The methodology for modelling spatially correlated multiple variables consists of two essential components: modelling each spatial variable separately using Gräler and Pebesma’s[15] univariate spatial copula, and then joining the univariate spatial copulas using the idea of the mixture copula[24]. We also use the proposed spatial mixture model to predict individual variables using an inverse conditional approach in a bivariate context[30]. However, the method can be used to predict more than two variables with a straightforward generalisation of the bivariate setting.

Modelling

Let $Z(x) = \{Z_1(x), Z_2(x), \ldots, Z_m(x)\}$ be the second-order stationary multivariate spatial random field with $m$ spatial variables that are sampled at the same two-dimensional location $x$, and let $x_i$, $i = 1, \ldots, n$, be the set of existing locations in the given spatial domain $S$. A spatial copula[15] describes the joint spatial dependence of a univariate spatial variable $Z_j$ at any two spatial locations $x$ and $x + h$, where $h$ is the separation distance between the two locations. Hence, spatial copulas model the dependence of one spatial location relative to another spatial location, rather than modelling dependence using absolute locations. The methodology for modelling spatially correlated multiple variables is summarised in Fig. 1, and a detailed procedure for the model development is provided in steps 1–4.

Figure 1

A diagram for spatial mixture copula construction.

Step 1: For each spatial variable $Z_j$, $j = 1, \ldots, m$, fit candidate marginal cumulative distribution functions (CDFs), such as Gamma, Weibull, Normal, and Log-normal, and select the best-fitting CDF. Let $F_j$ denote the best-fit CDF of $Z_j$, which is assumed to be the same at every location $x$. The proposed method is based on the concept of the distance-dependent spatial copula[15,31]. Hence, the distances between every pair of locations are calculated. Suppose $x_1$, $x_2$, $x_3$, and $x_4$ are four sampled locations of $Z_j$. As shown in Fig. 2, the distance is calculated for each location pair $(x_a, x_b)$ as $h_{ab} = \sqrt{(s_a - s_b)^2 + (t_a - t_b)^2}$, where $(s_a, t_a)$ and $(s_b, t_b)$ are the coordinates of $x_a$ and $x_b$, for $a, b = 1, \ldots, n$ with $a < b$. With $n$ sampled locations, $n(n-1)/2$ pairs are obtained. The next important step of the methodology is spatial binning.

Figure 2

Example plot to show the possible pairs with four locations.
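Step 1 and the pairwise distance calculation can be sketched in a few lines. The following is a minimal illustration on synthetic data, assuming SciPy's parameterisations of the four candidate families and a location parameter fixed at zero for the positive-support families; the helper `best_marginal` and the variable names are hypothetical and are not taken from the article.

```python
import numpy as np
from scipy import stats
from scipy.spatial.distance import pdist

# Candidate marginal families named in Step 1 (SciPy parameterisations).
CANDIDATES = {
    "gamma": stats.gamma,
    "weibull": stats.weibull_min,
    "normal": stats.norm,
    "lognormal": stats.lognorm,
}

def best_marginal(z):
    """Fit each candidate by maximum likelihood and keep the largest log-likelihood."""
    best = None
    for name, family in CANDIDATES.items():
        # Assumption: location fixed at 0 for the positive-support families.
        fixed = {} if name == "normal" else {"floc": 0}
        params = family.fit(z, **fixed)
        loglik = float(np.sum(family.logpdf(z, *params)))
        if best is None or loglik > best[2]:
            best = (name, family(*params), loglik)
    return best

rng = np.random.default_rng(0)
coords = rng.uniform(0, 1000, size=(50, 2))    # synthetic coordinates
z = rng.gamma(shape=2.0, scale=0.1, size=50)   # synthetic positive variable

name, fitted, loglik = best_marginal(z)
u = fitted.cdf(z)   # CDF values passed on to the copula steps
h = pdist(coords)   # Euclidean distance for each of the n(n-1)/2 pairs
```

With 50 locations, `h` holds 50 × 49 / 2 = 1225 distances, matching the pair count stated in Step 1.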

The spatial dependence of copula-based spatial models depends on the distance between locations. As the distance between two points increases, the spatial dependence between them decreases until the points are independent, or the dependence is negligible enough for them to be considered independent. The distance at which independence occurs is referred to as the cut-off distance and is determined empirically using a correlogram. The correlogram plots the Kendall’s tau correlation coefficient for each spatial bin, and a curve is fitted through the plotted points. Similar to a variogram in kriging, the cut-off distance is determined visually as the distance at which the curve plateaus.

Step 2: Based on the distance between pairs, place each sample pair into $K$ equally spaced spatial bins as follows: $[0, l_1), [l_1, l_2), \ldots, [l_{K-1}, l_K]$, where $l_k = k\,l_K/K$ and $l_K$ is the cut-off distance. A correlogram is used to determine the cut-off distance as a plot of Kendall’s tau $\tau_k$ against the mean distance $\bar{h}_k$ of each bin, which is calculated using the pairs belonging to the relevant spatial bin. Figure 3 depicts an example correlogram.
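The binning and empirical correlogram of Step 2 can be sketched as follows. This is an illustrative sketch on a synthetic smooth field, not the authors' code; the function name `correlogram` and its arguments are hypothetical, and the curve fitting used to read off the cut-off distance is omitted.

```python
import numpy as np
from scipy.stats import kendalltau
from scipy.spatial.distance import pdist

def correlogram(coords, z, cutoff, n_bins):
    """Upper limit, mean pair distance and Kendall's tau for each equally spaced bin."""
    i, j = np.triu_indices(len(z), k=1)   # pair indices in the same order as pdist
    h = pdist(coords)
    edges = np.linspace(0.0, cutoff, n_bins + 1)
    rows = []
    for k in range(n_bins):
        in_bin = (h >= edges[k]) & (h < edges[k + 1])
        if in_bin.sum() < 2:              # too few pairs to estimate tau
            rows.append((edges[k + 1], np.nan, np.nan))
            continue
        tau, _ = kendalltau(z[i[in_bin]], z[j[in_bin]])
        rows.append((edges[k + 1], h[in_bin].mean(), tau))
    return np.array(rows)

rng = np.random.default_rng(1)
coords = rng.uniform(0, 1000, size=(200, 2))
# Synthetic spatially smooth field plus a little noise.
z = np.sin(coords[:, 0] / 300) + np.cos(coords[:, 1] / 300) + rng.normal(0, 0.05, 200)

table = correlogram(coords, z, cutoff=800.0, n_bins=10)
```

Plotting the third column of `table` against the second gives the empirical points of a correlogram such as the one in Fig. 3.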

Figure 3

An example correlogram. The blue dashed line indicates the upper limit of the cut-off distance, at which pairs of points are no longer considered to be spatially dependent. Empirical values (black dots) are overlaid with a theoretical cubic smooth line.

Given the pairs of points for each spatial bin, the spatial copula that describes the dependence of the spatial variable $Z_j$ at any two locations can be calculated as

$$C_{j,k,h}(u, v) = P\big(F_j(Z_j(x)) \le u,\; F_j(Z_j(x + h)) \le v\big), \qquad (1)$$

where $k = 1, \ldots, K$ is the index of the spatial bin, and $u$ and $v$ are any selected quantiles of the corresponding univariate CDF $F_j$ of $Z_j$ at locations $x$ and $x + h$. The copulas for each bin are selected using the maximum log-likelihood values of competing copulas, such as Gaussian, Student’s t, Clayton, Frank, Gumbel, and Joe[30], which represent a variety of dependence structures. Then, a mixture copula is used to describe the multivariate spatial dependence within each bin as a weighted linear combination of copulas.

Step 3: For each spatial bin $k$, use the spatial copulas in Eq. (1) to construct the mixture copula as

$$C^{m}_{k,h}(u, v) = \sum_{j=1}^{m} w_j\, C_{j,k,h}(u, v), \qquad (2)$$

where $w_j$ is the mixture weight, $0 \le w_j \le 1$, and $\sum_{j=1}^{m} w_j = 1$. Equal weights can be used in Eq. (2) if the correlograms of the variables are not significantly different. Otherwise, compare different weight combinations across bins and obtain the optimal weight combination. Moreover, for small distances, pairs of points become extremely strongly dependent and are modelled using the comonotonic copula $M(u, v)$. For large distances, pairs become independent and are modelled using the product copula $\Pi(u, v)$[32], where

$$M(u, v) = \min(u, v), \qquad \Pi(u, v) = uv.$$

The mixture copula in Eq. (2) only describes the multivariate spatial dependence within individual bins. However, a spatial model should also be able to capture spatial autocorrelation between bins[15]. For instance, points near the upper bound of the first bin and the lower bound of the second bin may have similar features, as may points near the upper bound of the second bin and the lower bound of the third bin, and so on.
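Thus, the spatial dependence is incorporated using the distance-dependent parameter $\lambda$, which determines spatial dependence while incorporating spatial autocorrelation. In practical situations, the first bin is modelled using the best-fit copula for that bin, and subsequent bins are modelled using the convex linear combination of copulas with parameter $\lambda$[15]. Further, pairs that fall above the cut-off distance are often omitted from the convex combination and are modelled with an independent copula.

Step 4: Using Eq. (2), construct the distance-dependent spatial mixture copula of $Z(x)$ as the convex linear combination of the mixture copulas of each spatial bin as follows,

$$C_h(u, v) = \begin{cases} C^{m}_{1,h}(u, v), & 0 \le h < l_1, \\ (1 - \lambda_2)\, C^{m}_{1,h}(u, v) + \lambda_2\, C^{m}_{2,h}(u, v), & l_1 \le h < l_2, \\ \quad\vdots & \\ (1 - \lambda_K)\, C^{m}_{K-1,h}(u, v) + \lambda_K\, C^{m}_{K,h}(u, v), & l_{K-1} \le h < l_K, \\ \Pi(u, v), & h \ge l_K, \end{cases} \qquad (3)$$

where $\lambda_k = (\bar{h}_k - l_{k-1})/(l_k - l_{k-1})$ for $k = 2, \ldots, K$, $\bar{h}_k$ is the mean distance of bin $k$, and $l_1, \ldots, l_K$ denote the upper limits of the chosen distances for the spatial bins.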
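Steps 3 and 4 can be illustrated with closed-form Archimedean copula CDFs. This is a sketch under stated assumptions rather than the authors' implementation. The copula families and parameters are illustrative placeholders. The weight of bin k is taken as (mean distance of bin k minus upper limit of bin k-1) divided by the bin width, which is an assumed reading of the distance-dependent parameter. The helper names `mixture` and `spatial_mixture` are hypothetical.

```python
import numpy as np

def gumbel(theta):
    return lambda u, v: np.exp(-((-np.log(u)) ** theta + (-np.log(v)) ** theta) ** (1 / theta))

def clayton(theta):
    return lambda u, v: (u ** -theta + v ** -theta - 1) ** (-1 / theta)

def joe(theta):
    def cdf(u, v):
        a, b = (1 - u) ** theta, (1 - v) ** theta
        return 1 - (a + b - a * b) ** (1 / theta)
    return cdf

def product(u, v):
    return u * v

def mixture(copulas, weights):
    """Weighted sum of the per-variable copulas of one bin (weights sum to 1)."""
    weights = np.asarray(weights, dtype=float)
    assert np.all(weights >= 0) and np.isclose(weights.sum(), 1.0)
    return lambda u, v: sum(w * c(u, v) for w, c in zip(weights, copulas))

def spatial_mixture(bin_mixtures, upper_limits, mean_dists):
    """Distance-dependent convex combination of the bin-wise mixture copulas."""
    l = np.asarray(upper_limits, dtype=float)
    hbar = np.asarray(mean_dists, dtype=float)
    def cdf(u, v, h):
        if h >= l[-1]:                       # beyond the cut-off: independence
            return product(u, v)
        k = int(np.searchsorted(l, h, side="right"))   # 0-based bin index
        if k == 0:                           # first bin uses its own mixture
            return bin_mixtures[0](u, v)
        lam = (hbar[k] - l[k - 1]) / (l[k] - l[k - 1])  # assumed weight formula
        return (1 - lam) * bin_mixtures[k - 1](u, v) + lam * bin_mixtures[k](u, v)
    return cdf

# Three illustrative bins with equal weights for two variables.
m1 = mixture([joe(1.71), joe(1.48)], [0.5, 0.5])
m2 = mixture([gumbel(1.31), clayton(0.19)], [0.5, 0.5])
m3 = mixture([gumbel(1.16), clayton(0.13)], [0.5, 0.5])
C_h = spatial_mixture([m1, m2, m3], upper_limits=[80, 160, 240], mean_dists=[68, 105, 209])

value = C_h(0.3, 0.7, h=100.0)   # pair separated by 100 distance units
```

Because each component is a copula and the weights are convex, the result satisfies the usual copula boundary condition, for example `C_h(u, 1, h) = u`.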

Prediction

Prediction of individual spatial variables at sampled locations based on the spatial mixture copula is described in a bivariate context. That is, $m = 2$, so that $C_h(u, v)$ is the spatial mixture copula of $Z_1$ and $Z_2$. The prediction method demonstrates the advantage of using a secondary correlated spatial variable in the prediction of a primary spatial variable[25]. Thus, an inverse conditional prediction approach is proposed[30, pp. 40–42], where a secondary correlated spatial variable $Z_2$ is known when predicting the primary spatial variable $Z_1$. Suppose $Z_1$ is the primary variable of interest; then $Z_2$ is the correlated secondary variable. The prediction of $Z_1$ at location $x$, conditional on the known value of $Z_2$ at the same location $x$, can be generated using the copula CDF $C_h(u, v)$. The procedure of the inverse conditional approach is given in steps 5–8.

Step 5: Obtain the joint CDF values of $Z_1$ and $Z_2$ using $C_h(u, v)$, and let $T$ be the vector of joint CDF values.

Step 6: Obtain the marginal CDF values of $Z_2$ using $F_2$, and let $R$ be the vector of marginal CDF values.

Step 7: Derive the conditional distribution of $T$, given $R$, using the partial derivative of $C_h(u, v)$ as follows,

$$C_{1|2}(u \mid v) = \frac{\partial C_h(u, v)}{\partial v},$$

and let $\hat{u}$ be the conditional predicted value of $Z_1$ at location $x$ on the CDF scale.

Step 8: Take $\hat{Z}_1(x) = F_1^{-1}(\hat{u})$.

The prediction of $Z_2$, given $Z_1$, can be described by simply switching the subscripts 1 and 2 in steps 5–8.

The proposed method can be validated against actual values at sampled locations by cross-validation, and three scenarios are considered with the existing methods. (i) Can any improvement in the prediction of individual variables be gained by utilising the multivariate spatial dependence using the mixture copula over the univariate pair copula[15]? (ii) Can any improvement in the prediction of individual variables be gained by utilising the mixture copula over the NLPCA transformation based spatial copula[26]? (iii) Can any improvement in the prediction of individual variables be gained by utilising the non-linear non-Gaussian multivariate spatial dependence (spatial mixture copula) over the linear Gaussian multivariate spatial dependence (cokriging)[33]?

The cross-validation study is assessed using the mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE). The MAE, RMSE and MAPE are calculated from the actual and predicted values at the sampled locations[26]. In addition, the accuracy in reproducing the bivariate relationship of $Z_1$ and $Z_2$ is evaluated using the mean square error from the kernel density estimation (KDE MSE). The KDE MSE is calculated by taking the mean of the squared differences between the bivariate KDEs of the actual and predicted data[26].
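The inverse conditional idea in steps 5–8 can be sketched numerically. The example below is only an illustration under several assumptions. A single Clayton copula stands in for the fitted spatial mixture copula. The conditional CDF is obtained by a central finite difference. The point prediction is taken as the conditional median, a choice the text does not specify. The marginals `F1` and `F2` are hypothetical.

```python
import numpy as np
from scipy import stats
from scipy.optimize import brentq

def clayton(theta):
    return lambda u, v: (u ** -theta + v ** -theta - 1) ** (-1 / theta)

def cond_cdf(copula, u, v, eps=1e-5):
    """P(U <= u | V = v): partial derivative of C(u, v) with respect to v."""
    lo, hi = max(v - eps, 1e-9), min(v + eps, 1 - 1e-9)
    return (copula(u, hi) - copula(u, lo)) / (hi - lo)

def cond_quantile(copula, v, p=0.5):
    """Invert the conditional CDF: find u with P(U <= u | V = v) = p."""
    return brentq(lambda u: cond_cdf(copula, u, v) - p, 1e-6, 1 - 1e-6)

theta = 2.0
C = clayton(theta)                         # stand-in for the spatial mixture copula
F1 = stats.weibull_min(c=1.5, scale=0.4)   # hypothetical marginal of the primary variable
F2 = stats.gamma(a=1.2, scale=0.1)         # hypothetical marginal of the secondary variable

z2_known = 0.09                  # known secondary value at location x
v = F2.cdf(z2_known)             # marginal CDF value of the secondary variable
u_hat = cond_quantile(C, v)      # conditional median on the CDF scale
z1_hat = F1.ppf(u_hat)           # back-transform to the scale of the primary variable
```

For the Clayton family the conditional quantile also has a closed form, which makes it a convenient check on the numerical inversion before a more complex copula is substituted.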

Application

The proposed method was applied to real forest data taken from georeferenced forest inventory plots in the US Department of Agriculture Forest Service Bartlett Experimental Forest (BEF) in Bartlett, New Hampshire[34]. The variables of interest were forest-wide biomass estimates within an area of 1053 hectares (measured in mg/ha). In this study, only foliage biomass ($Z_1$) and bole biomass ($Z_2$), sampled at 335 two-dimensional locations, were used. The prediction of bole biomass can be used for carbon accounting purposes, and the prediction of foliage biomass can be used to identify regions with high values of foliage biomass. In addition, the behaviour of wildfires depends on pools of biomass variables[26,35]. Table 1 gives the summary statistics of the data. Figure 4a,b show the spatial distributions of $Z_1$ and $Z_2$ at the observed locations. Figure 4c shows a strong bivariate non-linear relationship between $Z_1$ and $Z_2$. The best marginal distributions were selected based on the maximum log-likelihood (ML) values. The Weibull distribution was selected as the best distribution for $Z_1$ based on the ML values 65.03, 69.43, 33.26, and 44.03, and the Gamma distribution was selected as the best distribution for $Z_2$ based on the ML values 378.86, 378.28, 250.20, and 377.51, for the Gamma, Weibull, Normal, and Log-normal distributions, respectively. Then, the CDF values of $Z_1$ and $Z_2$ were calculated using the corresponding CDFs. The subsequent modelling steps use only these CDF values of $Z_1$ and $Z_2$ (Step 1).

Table 1

Summary statistics of $Z_1$ and $Z_2$.

Statistics	$Z_1$	$Z_2$
n	335	335
Mean	0.334	0.119
Standard deviation	0.219	0.115
Minimum	0.200	0.010
First quartile ($Q_1$)	0.140	0.030
Median	0.300	0.090
Third quartile ($Q_3$)	0.510	0.160
Maximum	0.820	0.560

Figure 4

BEF data. Spatial distributions of (a) $Z_1$, (b) $Z_2$, and (c) scatter plot between $Z_1$ and $Z_2$.

The cut-off distance was selected as 800 m using the correlograms of the variables, and ten equally spaced (80 m) spatial bins were created (see Table 2). Table 3 shows the best-fit copulas and the estimated copula parameters, where $C_{1,k,h}$ and $C_{2,k,h}$ are the fitted univariate spatial copulas of $Z_1$ and $Z_2$, respectively (Step 2). The correlations across bins are similar for the two variables (see Table 2), so equal weights were used. Table 4 shows the mixture copulas of each bin (Step 3).

Table 2

BEF data: spatial binning.

Bins (m)	Mean distance (m)	Kendall’s tau ($Z_1$)	Kendall’s tau ($Z_2$)
0–80	68	0.31	0.23
80–160	105	0.20	0.18
160–240	209	0.11	0.10
⋮	⋮	⋮	⋮
720–800	758	0.03	0.03

Table 3

The univariate spatial copulas for each bin.

Bins	$C_{1,k,h}$ ($Z_1$)	$C_{2,k,h}$ ($Z_2$)
0–80	$C_{1,1,h}$ = Joe (1.71)	$C_{2,1,h}$ = Joe (1.48)
80–160	$C_{1,2,h}$ = Gumbel (1.31)	$C_{2,2,h}$ = Gaussian (0.29)
160–240	$C_{1,3,h}$ = Gumbel (1.16)	$C_{2,3,h}$ = Frank (1.12)
240–320	$C_{1,4,h}$ = Gumbel (1.10)	$C_{2,4,h}$ = Clayton (0.19)
320–400	$C_{1,5,h}$ = Gumbel (1.06)	$C_{2,5,h}$ = Clayton (0.13)
400–480	$C_{1,6,h}$ = Joe (1.09)	$C_{2,6,h}$ = Joe (1.08)
480–560	$C_{1,7,h}$ = Joe (1.07)	$C_{2,7,h}$ = Gumbel (1.03)
560–640	$C_{1,8,h}$ = Clayton (0.09)	$C_{2,8,h}$ = Clayton (0.06)
640–720	$C_{1,9,h}$ = Clayton (0.08)	$C_{2,9,h}$ = Clayton (0.05)
720–800	$C_{1,10,h}$ = Joe (1.05)	$C_{2,10,h}$ = Gumbel (1.03)

Table 4

The mixture copulas of each bin with $w_1 = w_2 = 0.5$.

Bins	Mean distance ($\bar{h}_k$)	Mixture copula ($C^{m}_{k,h}$)
0–80	68	$C^{m}_{1,h} = 0.5C_{1,1,h} + 0.5C_{2,1,h}$
80–160	105	$C^{m}_{2,h} = 0.5C_{1,2,h} + 0.5C_{2,2,h}$
160–240	209	$C^{m}_{3,h} = 0.5C_{1,3,h} + 0.5C_{2,3,h}$
240–320	284	$C^{m}_{4,h} = 0.5C_{1,4,h} + 0.5C_{2,4,h}$
320–400	368	$C^{m}_{5,h} = 0.5C_{1,5,h} + 0.5C_{2,5,h}$
400–480	436	$C^{m}_{6,h} = 0.5C_{1,6,h} + 0.5C_{2,6,h}$
480–560	518	$C^{m}_{7,h} = 0.5C_{1,7,h} + 0.5C_{2,7,h}$
560–640	606	$C^{m}_{8,h} = 0.5C_{1,8,h} + 0.5C_{2,8,h}$
640–720	678	$C^{m}_{9,h} = 0.5C_{1,9,h} + 0.5C_{2,9,h}$
720–800	758	$C^{m}_{10,h} = 0.5C_{1,10,h} + 0.5C_{2,10,h}$

The mixture copulas in Table 4 were used to develop the distance-dependent convex combination of mixture copulas given in Eq. (3), which is the proposed spatial mixture copula of the spatially correlated $Z_1$ and $Z_2$ (Step 4), with distance-dependent weights $\lambda_2 = (105 - 80)/(160 - 80) = 0.31$, $\lambda_3 = 0.61$, …, $\lambda_{10} = 0.48$. The proposed spatial mixture copula method was used to predict $Z_1$ and $Z_2$ using the inverse conditional approach described in steps 5–8. Figure 5 shows the bivariate relationship of $Z_1$ and $Z_2$. Table 5 shows the model validation results with the existing methods.
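The quoted weights can be checked against the bin limits and mean distances in Table 4. The short calculation below assumes that the weight of bin k is the mean distance of bin k minus the upper limit of bin k-1, divided by the bin width. This form is an assumption, adopted here because it reproduces the values 0.31, 0.61 and 0.48 given above after rounding.

```python
import numpy as np

# Upper limits of the ten 80 m bins and the mean pair distance in each (Table 4).
upper = np.arange(80, 801, 80)
hbar = np.array([68, 105, 209, 284, 368, 436, 518, 606, 678, 758])

# Assumed weight for bins 2..10: (mean distance - previous upper limit) / bin width.
lam = (hbar[1:] - upper[:-1]) / (upper[1:] - upper[:-1])

for k, value in enumerate(lam, start=2):
    print(f"lambda_{k} = {value:.4f}")
```

The first two weights come out as 0.3125 and 0.6125, and the last as 0.475, in line with the rounded values quoted in the text.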

Figure 5

Reproduction of bivariate relationship using various methods. Actual (red), predicted (black), $Z_1$ given $Z_2$ (green), and $Z_2$ given $Z_1$ (blue).

Table 5

Model validation in prediction of $Z_1$ and $Z_2$.

Method	$Z_1$ RMSE	$Z_1$ MAE	$Z_1$ MAPE	$Z_2$ RMSE	$Z_2$ MAE	$Z_2$ MAPE	KDE MSE
Pair copula	0.20	0.17	1.12	0.11	0.08	1.98	3.61
Cokriging	0.19	0.16	0.54	0.11	0.08	0.75	3.71
NLPCA	0.29	0.24	1.64	0.14	0.10	2.21	12.40
Mixture copula	0.14	0.13	0.56	0.06	0.05	0.63	1.34

Significant values are in [bold].
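The error measures reported in Table 5 can be computed along the following lines. This is a minimal sketch on synthetic data, assuming MAPE is reported as a fraction rather than a percentage and that the KDE MSE is evaluated on a regular grid spanning both data sets; the grid size and the use of SciPy's default Gaussian KDE bandwidth are assumptions, since the text does not state them.

```python
import numpy as np
from scipy.stats import gaussian_kde

def mae(actual, pred):
    return np.mean(np.abs(actual - pred))

def rmse(actual, pred):
    return np.sqrt(np.mean((actual - pred) ** 2))

def mape(actual, pred):
    # Reported as a fraction; multiply by 100 for a percentage.
    return np.mean(np.abs((actual - pred) / actual))

def kde_mse(actual_xy, pred_xy, grid_size=50):
    """Mean squared difference between the bivariate KDEs of two (n, 2) samples."""
    lo = np.minimum(actual_xy.min(axis=0), pred_xy.min(axis=0))
    hi = np.maximum(actual_xy.max(axis=0), pred_xy.max(axis=0))
    gx, gy = np.meshgrid(np.linspace(lo[0], hi[0], grid_size),
                         np.linspace(lo[1], hi[1], grid_size))
    grid = np.vstack([gx.ravel(), gy.ravel()])
    f_actual = gaussian_kde(actual_xy.T)(grid)   # gaussian_kde expects shape (d, n)
    f_pred = gaussian_kde(pred_xy.T)(grid)
    return np.mean((f_actual - f_pred) ** 2)

actual = np.array([1.0, 2.0, 4.0])
pred = np.array([1.5, 2.0, 3.0])
scores = (mae(actual, pred), rmse(actual, pred), mape(actual, pred))

rng = np.random.default_rng(2)
actual_xy = rng.gamma(2.0, 0.1, size=(100, 2))                 # synthetic "actual" pairs
pred_xy = actual_xy + rng.normal(0, 0.02, size=(100, 2))       # synthetic "predicted" pairs
score_kde = kde_mse(actual_xy, pred_xy)
```

Lower values indicate better agreement for all four measures, which is how the rows of Table 5 are compared.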

According to Table 5, almost all the RMSE, MAE, and MAPE values are lowest for the $Z_1$ and $Z_2$ predictions based on the spatial mixture copula. The exception is the MAPE for the $Z_1$ prediction, where cokriging gives the smallest value (0.54), very close to that of the spatial mixture copula (0.56). Thus, the proposed method outperformed the existing methods in the prediction of $Z_1$ and $Z_2$ across the observed locations. The proposed method also reproduces the bivariate relationship most accurately, as it attains the minimum KDE MSE. In Fig. 5, the univariate pair copula method does not reproduce the tail values of either variable. Cokriging is unable to predict tail values and follows a strictly linear relationship. Although the NLPCA spatial copula method reproduces the non-linear relationship, it cannot reproduce the upper tails. The novel spatial mixture copula method accurately predicts both upper and lower tail values, and the conditional values of the variables reproduce the non-linear relationships between them. Thus, using the mixture copula in the multivariate spatial copula framework improved the accuracy of the univariate prediction.

Conclusions

This article proposed a new mixture copula method for modelling spatially correlated multiple variables. The proposed method models multiple spatial variables without any normalisation of the original variables, such as an NLPCA transformation. The method was applied to model two non-linearly related spatial variables, foliage biomass and bole biomass. The model performance was assessed by cross-validation of actual versus predicted values at the sampled locations. Three aspects were compared: the use of multivariate spatial dependence in the univariate prediction, against the existing univariate pair copula method; the strength of the mixture copula in the univariate prediction, against the NLPCA spatial copula method; and the use of non-linear non-Gaussian multivariate spatial dependence in the univariate prediction, against the cokriging method. The results showed that the proposed spatial mixture copula model outperformed the existing methods in terms of the minimum values of RMSE, MAE, MAPE, and KDE MSE. The method was also applied to non-linear simulated bivariate correlated variables (see the online Supplementary Information), where the spatial mixture copula outperformed the existing methods in predicting the individual simulated variables and their bivariate relationship. The proposed method used equal weights for each variable in both the BEF application and the simulation study. A further improvement to the spatial mixture model would be the selection of optimal weights for each variable in the mixture copula modelling. For instance, one spatial variable may have a stronger spatial dependence across locations than the other, and optimal weight selection may increase the prediction accuracy of each variable across locations. The prediction method is explained for the bivariate case (m = 2); however, it can be extended to the multivariate setting.
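The idea of replacing equal weights with selected weights can be illustrated with a small sketch. It assumes a two-component mixture density of the form w * c1 + (1 - w) * c2 and picks w by a grid search on the log-likelihood. The selection criterion, the function names, and the density values are all hypothetical; the article leaves the weight selection method open.

```python
import math


def mixture_loglik(w, dens_1, dens_2):
    """Log-likelihood of a two-component mixture with weight w on component 1.

    dens_1 and dens_2 are the (strictly positive) density values of each
    component evaluated at the same observations.
    """
    return sum(math.log(w * d1 + (1.0 - w) * d2) for d1, d2 in zip(dens_1, dens_2))


def select_weight(dens_1, dens_2, steps=20):
    """Grid search over w in [0, 1] for the weight with the highest log-likelihood."""
    candidates = [i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda w: mixture_loglik(w, dens_1, dens_2))


# Hypothetical component densities at three observations.
dens_1 = [2.0, 1.5, 0.2]
dens_2 = [0.5, 0.6, 1.0]

w_equal = 0.5                             # equal weights, as in the article
w_best = select_weight(dens_1, dens_2)    # data-driven alternative
```

By construction the selected weight never has a lower log-likelihood than the equal-weight choice on the same grid, so equal weights remain available as a special case.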
Also, the proposed spatial mixture copula can be extended to a multivariate spatial sampling design methodology that optimally selects additional samples to reduce prediction uncertainty by leveraging spatially correlated multiple variables. Moreover, the proposed method assumes isotropic spatial dependence of the spatial variables, but it can be extended to model spatially correlated anisotropic variables, which can be present in mining[36] and soil variables[37], for example. The proposed method also assumes that the spatial random field is stationary. However, the method can be extended to non-stationary spatial processes. For example, a non-stationary spatial process can be divided into several locally stationary processes. A univariate spatial copula can be fitted to each locally stationary process, and a global non-stationary spatial copula can then be constructed as a mixture of the locally stationary spatial copulas. Furthermore, the proposed method assumes that all data points are known and collected at the same set of locations. However, measurements could be unavailable or difficult to sample at some locations, which is quite common in functional spatial data analysis. Thus, the proposed method can be extended to model and predict complex functional spatial data.
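One possible way to write the mixture of locally stationary copulas described above is as a convex combination. The notation below is an assumption made for illustration (K local regions, local copulas C_k, weights w_k) and is not taken from the article.

```latex
% Assumed notation: K locally stationary regions, local spatial copulas C_k,
% and mixture weights w_k (hypothetical symbols, not from the article).
C(u_1, \dots, u_n) \;=\; \sum_{k=1}^{K} w_k \, C_k(u_1, \dots, u_n),
\qquad w_k \ge 0, \qquad \sum_{k=1}^{K} w_k = 1 .
```

A convex combination of copulas is itself a copula, so this form is valid whichever local copula families are chosen. The weights could also be allowed to vary with location, which this sketch does not cover.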
