Literature DB >> 26928878

A Variational Bayes Approach to the Analysis of Occupancy Models.

Allan E Clark^1,2, Res Altwegg^1,2, John T Ormerod^3,4.

Abstract

Detection-nondetection data are often used to investigate species range dynamics using Bayesian occupancy models which rely on the use of Markov chain Monte Carlo (MCMC) methods to sample from the posterior distribution of the parameters of the model. In this article we develop two Variational Bayes (VB) approximations to the posterior distribution of the parameters of a single-season site occupancy model which uses logistic link functions to model the probability of species occurrence at sites and of species detection probabilities. This task is accomplished through the development of iterative algorithms that do not use MCMC methods. Simulations and small practical examples demonstrate the effectiveness of the proposed technique. We specifically show that (under certain circumstances) the variational distributions can provide accurate approximations to the true posterior distributions of the parameters of the model when the number of visits per site (K) are as low as three and that the accuracy of the approximations improves as K increases. We also show that the methodology can be used to obtain the posterior distribution of the predictive distribution of the proportion of sites occupied (PAO).

Entities: Chemical Disease Species

Mesh：
Algorithms
Bayes Theorem

Year: 2016 PMID： 26928878 PMCID： PMC4771718 DOI： 10.1371/journal.pone.0148966

Source DB: PubMed Journal: PLoS One ISSN： 1932-6203 Impact factor: 3.240

Introduction

Bayesian analysis is a coherent statistical paradigm whereby prior information regarding the research area is blended with that of information obtained from the observed data [1]. Subjective prior information is elicited either from expertise in the field or based on prior research (meta analyses). Informative priors are increasingly being used in ecology ([2, 3]) and even in the absence of prior information many ecologists are using Bayesian methods [4]. One class of model that is often analysed in a Bayesian way is the occupancy model [5]. The single season occupancy model was formulated by using ideas borrowed from closed population mark-recapture models. In this model n sites are visited a number of times (K) in order to estimate the occupancy () and detection probability (denoted throughout as ) of a species associated with each site. (The term detection probability should be read as conditional detection probability throughout the text.) These methods are particularly useful when studying the range dynamics of various animal species and have extensively been applied in the ecological literature (see [6, 7] and [8] for some examples). The model has been formulated as a hierarchical Bayesian model which has lead to numerous extensions of the single season occupancy model ([9, 10] and [11]). Many papers have investigated the statistical properties of the estimators of the single season occupancy model. The first of these developed a maximum likelihood formulation of the model and investigated the properties of the estimators for the occupancy and detection probabilities using simulations. They assume that the parameters of the model are constant for all sites although also consider incorporating covariates in the model. They found that when d ≥ 0.3, the parameter estimates of the occupancy probability were reasonably unbiased when K ≥ 5 while when K = 2, a detection probability of at least 0.5 is required to provide a reasonable estimate of ψ. They also found that when the true detection probability is low that tends to 1 [5]. Numerous authors have found similar results regarding boundary problems ([12, 13] and [14]) although it has been argued that boundary parameter estimates are rare but could be observed in small data sets [13]. Moreno and Lele investigated the small sample properties of the maximum likelihood estimators [15]. They note that ‘When detection or occupancy probability is small or when the number of sites and number of visits per site is small, maximum likelihood estimators (MLE) of site occupancy parameters have large biases, are numerically unstable, and the corresponding confidence intervals have smaller than nominal coverage.’ They proposed a penalized maximum likelihood method which performed adequately for small sample sizes. Recently, their study has been extended by considering three different penalized likelihood type models [14]. They found that the penalized methods performed well and suggested that ‘fully Bayesian methods would be competitive’. Here, we develop Variational Bayes (VB) approximations to the posterior distribution of the parameters of a single-season site occupancy model. One big advantage of the methods developed here is the fact that they could be applied to cases where the researcher has informative priors and might not want to rely on the use of the MLE method. In that situation, Markov Chain Monte Carlo (MCMC) methods were so far the only methods available for fitting occupancy models in a Bayesian analysis. However, for big data sets, MCMC methods can be too slow to be useful. Admittedly the potential computational efficiencies accrued from using a VB algorithm compared to the MLE method would possibly only apply when fitting more complicated occupancy type models. We view our contribution as a first step towards developing similar methods for more complicated occupancy models (e.g., the inclusion of site-specific random effects, spatial occupancy models and dynamic occupancy models). The proportion of occupied sample locations ( where z is the occupancy state for site i) is a derived parameter of interest in many studies ([9, 16]). Although frequentist methods can be used to estimate , the calculation of a valid confidence interval for is problematic for the frequentist. The same holds true for prediction of occupancy status in species distribution models [17]. We show (via simulations as well as using practical examples) that the VB approximations can be used to accurately obtain prediction intervals for latent state variables (e.g., occupancy states ) or for functions of these state variables by simulating from the VB posterior distributions. This paper commences with a brief discussion of Variational Bayes (VB). Thereafter, a VB implementation of a particular occupancy model is developed in Section 1.2. In Section 2.1 the results of a short simulation study are presented while in Section 2.2 we analyse site occupancy data of five bird species to illustrate the usefulness of the VB technique developed. A list of some of the notations and distribution theory used in the text can be found in S1 Text.

1 Material and Methods

1.1 A brief introduction to Variational Bayes (VB)

Variational Bayes is used to approximate posterior distributions obtained when undertaking Bayesian analysis and could be useful in many ecological applications. In what follows let be a vector of parameters of a statistical model, π() be a prior distribution for these parameters and be a random variable. In the context of this article, are the parameters of a single-season occupancy model while represents detection-nondetection data used to fit an occupancy model. Further, suppose that a posterior distribution π(|) is not analytically tractable and that analytical expressions for its posterior moments do not exist. In probability theory the Kullback-Leibler (KL) divergence provides a measure of a difference between two probability distributions [18]. When the two distributions being compared are exactly the same the divergence measure is equal to zero while when they are different the divergence measure is positive. The VB method approximates a posterior distribution by using a distribution q() which is obtained by minimizing the Kullback-Leibler (KL) divergence between q() and π(|) [18]. The KL divergence is where p() is the marginal likelihood, p(, ) is the joint likelihood of the data and the parameter vector with Since KL (q()||p(|)) ≥ 0, ln p() ≥ L(q()) for every q() and minimising KL (q()||p(|)) is equivalent to maximising L(q()). Often it is assumed that q() can be factorized as a product of simple probability distributions as q() = ∏ q(θ) where each of the q(θ) are iteratively estimated as . Here denotes an expectation with respect to the density ∏ q(θ). An alternate method of obtaining q() involves making an assumption regarding its parametric form. The parameters of this distribution are obtained my maximising L(q()) [19]. VB is often used as an alternative to Markov chain Monte Carlo (MCMC) methods since the method can be much faster to implement since in most applications q(θ) will be of a known simple form ([20-22]). Variational approximations to posterior distributions can accurately estimate the posterior mean of the parameters, although the posterior variances of some of the parameters might be underestimated ([23, 24]). Although this problem is context specific the estimate of the posterior variance is asymptotically valid for linear models [25]. As a solution the variational covariance matrix is often replaced by the inverse of the Fishers’ information matrix [23]. Alternately the non-parametric bootstrap could be used to provide interval estimates of the parameters [26].

1.2 VB applied to single season occupancy models

In a single season occupancy model n sites are visited times in order to estimate the occupancy () and detection () probability of a species associated with each site. Each site could be surveyed a different amount of times such that = (K1, K2, … , K) where K represents the number of surveys undertaken to site i and is a ragged matrix with dimensions determined by . The total number of site visits undertaken is defined as N = ∑ K. The data collected at each site are represented as an N dimensional vector , where each of the denotes the vector of detections and nondetections for site i. A 0 in the vector indicates that the species was not observed at the i site during a particular visit while a 1 indicates that the species was observed at the particular site during a particular visit. Let the vector represent the true species occupancy at the sites considered. Since we are using a single season model, is assumed to be constant across the season. is partially observed, i.e. z = 1 if the species occupies site i and z = 0 if it does not occupy site i. We know z = 1 if the species is observed at site i during any of the visits since we assume that there are no false identifications of individuals. If the species is however not observed at site i, z could equal 0 or 1 since we are uncertain about whether the species actually occurs at that site. We treat as a row vector and is of dimension 1 × K while is of dimension n × 1. The single season occupancy models can be represented using the following hierarchical model [9] for all sites i = 1, … , n; for all visits j = 1, … , K. ψ = Pr(z = 1) denotes the probability that the species occurs at site i while p = Pr(y = 1|z = 1) denotes the conditional probability of detecting the species during the j visit of site i given that the species is present at site i. The occupancy probabilities and the detection probabilities can be estimated using either maximum likelihood [5], penalized maximum likelihood [15] or Bayesian methods [9]. In what follows we develop a VB approach to estimating these quantities. Additional covariate data collected at each of the sites are used to estimate the site occupancy and detection probabilities. Specifically we assume that we have r occupancy and s detection covariates. We further assume that we have no missing values in these covariates. Formally we let and be the design matrices for the detection and occupancy effects respectively, with dimensions N × s and n × r. Correspondingly, let and be the detection and occupancy effects with dimensions s × 1 and r × 1 respectively. The matrix is constructed by row-binding the detection covariates at the different locations and for different visits one below each other such that where each of the matrices are of dimension K × s with . The occupancy and detection probabilities at the various sites for all visits are modelled using the following logistic link functions It can be shown that the conditional likelihood of the data and the true occupancy variables is We now assume that the prior distribution for and are multivariate Gaussian distributions (denoted as π(, )) with parameters , and , respectively. We further assume that the variational approximate distribution of π(, , ) is of the form q(, , ) = q(, )∏ q(z) where each of the q(z) are Bernoulli distributed with success probability (sp). Under this restriction q(, ) can be factorized into two separate factors q() and q() with where , , and b(x) = ln(1 + exp(x)). Here . Refer to S1 Appendix for a derivation of the above results. The normalization constant of q(, ) is not known analytically and thus q() and q() are not of a known type. We attempt to approximate the posterior distribution of (, ) using two different methods. In the first method we approximate the variational distribution by using a Laplace approximation to Eqs (3) and (4) and thus assume that the variational distributions are multivariate Gaussian with parameters , Σ and , Σ respectively; while in the second method we employ a tangent based approximation to b() and b() to obtain approximations to q() and q() respectively. Once we have obtained approximations to q(, ) it then follows that the q-densities, q(z| = 0)∀i, is Bernoulli distributed with success probability (1 + exp(−c))−1. The approximate conditional occupancy probabilities for all sites can then be calculated for the two methods (denoted as ‘L’ and ‘T’ respectively) using Here is a vector of length K such that The parameters of the variational distributions are all dependent on one another and can be computed using an iterative scheme such as that given in Algorithm 1 and Algorithm 2. A detailed description of aspects of the above derivations can be found in the supplemental information to this paper. In particular, the quantities used to calculate can be found in S2 Appendix while an explanation regarding the stopping rule for both algorithms is described in S3 Appendix.

1.3 SIMULATION STUDY

In the following simulation study we investigate some of the properties of the VB method and investigate whether it could be used to produce statistically valid inference. We specifically focus on the frequentist properties of the posterior mean parameters of the VB distribution of and . This task is undertaken by empirically comparing the coverage probability and credibility/confidence intervals of and associated with the two VB methods developed and comparing these to the same statistics obtained using MCMC and maximum likelihood. We calculate credibility intervals for the Bayesian methods and confidence intervals for the MLE method and focus particularly on the 95% credibility or confidence intervals. Algorithm 1 Iterative scheme for obtaining the parameters of the optimal density of 1. Initialize , Σ, , Σ 2. Cycle: 2.1 Cycle: ← + Σ11 1 and ← + Σ22 2 until the Newton-Raphson algorithm converges. 2.2 Calculate conditional occupancy probabilities for all sites where = 0 using Eq (5). Note that (sp) = 1 for all sites where . until the change in becomes negligible. (≤ 10−6) Algorithm 2 Iterative scheme for obtaining the parameters of the optimal density of 1. Initialize , Σ, , Σ, > 0 and b > 0. 2. Cycle: 2.1 Calculate the conditional occupancy probabilities for all sites where = 0 using Eq (6). Note that (sp) = 1 for all sites where . 2.2 Set , , and with 2.3 Calculate the ‘variational parameters’. Refer to S2 Appendix. until the change in becomes negligible. (≤ 10−6) The accuracy of the VB approximations to the posterior distribution obtained through MCMC is also assessed. This is undertaken by calculating . The acc(x) measure lies between 0 and 1 with a value of 1 indicating a perfect approximation and a value close to 0 indicating a poor approximation by the variational distribution to the true posterior distribution. Occupancy models are often used to assess the predictive distribution of the proportion of occupied sites defined as . We thus investigate the posterior approximation of the PAO using the Laplace VB posterior approximation method. These can easily be obtained by sampling from the VB posterior distribution for each z in turn to construct the PAO statistic. To assess the VB approximations the acc(x) statistic was used. We consider 32 simulation settings. The number of sites (n) are set to 50 and 100 while the number of visits to each site (K) are set to 2, 3, 4 and 5 respectively. The following combinations of the regression coefficients were used: 1. = [0, 1.75], = [−1.85, 2.5]; 2. = [1.35, 1.75], = [−1.85, 2.5]; 3. = [0, 1.75] = [−0.1, 2.5] and 4. = [1.35, 1.75] = [−0.1, 2.5]. These parameter values ensure an approximate average detection and occupancy probability among the sites of (0.5, 0.3), (0.7, 0.3), (0.5, 0.5) and (0.5, 0.7) respectively. We have not considered any cases where the detection and occupancy probabilities are lower than 0.3 since in these cases data sets are expected to be very sparse which requires many site visits in order to undertake useful statistical inference [5]. The occupancy regression covariate was obtained by standardizing a Uniform(−2, 2) random variable while the detection covariate was obtained by standardizing a Uniform(−5, 5) random variable. Each of these variables were transformed to have a zero mean and a standard deviation of one. The following parameter vectors were used to specify the prior distribution of the parameters: , for = , . Each simulation setting was replicated 350 times. All calculations were undertaken using R 3.3.1 [27]. Numerical optimizations were performed using the the BFGS method of the R function optim; MCMC sampling was undertaken using the R package R2jags [28] in combination with JAGS 3.4.0 [29] while all variational approximations were performed using the authors’ code. 100000 posterior samples were obtained for each MCMC simulation. The first 25000 samples were discarded as burn-in samples while the remaining 75000 samples were retained. Prior experimentation using the MCMC algorithm indicated that 25000 iterations are enough to ensure that the Markov chains would converge to the stationary distributions. The posterior samples were not thinned [30].

2 Results

2.1 SIMULATION RESULTS

Table 1 contains a summary of some of the results of the simulation study. For each value of K we tabulate the median coverage probability and credibility/confidence interval width of and associated with the four estimation procedures considered here. The medians are calculated across the different occupancy and detection probability combinations for fixed values of n and K.

Table 1

The median coverage probability and credibility/confidence interval widths of the covariate effects for the single season occupancy model.

			K
			2		3		4		5
n	Parameters	Method	Coverage	Width	Coverage	Width	Coverage	Width	Coverage	Width
50	α₀	Laplace	0.929	2.006	0.944	1.619	0.956	1.393	0.962	1.221
		Tangent	0.793	1.51	0.838	1.22	0.841	1.05	0.854	0.93
		MLE	0.96	2.675	0.964	1.869	0.962	1.509	0.968	1.28
		MCMC	0.95	3.219	0.95	2.203	0.95	1.769	0.95	1.284
	α₁	Laplace	0.944	2.452	0.959	1.986	0.96	1.704	0.966	1.493
		Tangent	0.76	1.642	0.787	1.31	0.812	1.127	0.803	0.996
		MLE	0.962	2.926	0.97	2.215	0.968	1.81	0.963	1.55
		MCMC	0.95	3.295	0.95	2.407	0.95	2.02	0.95	1.635
	β₀	Laplace	0.936	1.994	0.952	2.019	0.963	2.032	0.964	2.022
		Tangent	0.75	1.345	0.821	1.348	0.845	1.349	0.841	1.349
		MLE	0.973	2.776	0.974	2.505	0.974	2.416	0.97	2.322
		MCMC	0.95	4.716	0.95	3.452	0.95	2.95	0.95	2.553
	β₁	Laplace	0.92	2.656	0.952	2.679	0.952	2.705	0.952	2.71
		Tangent	0.666	1.493	0.744	1.494	0.733	1.498	0.723	1.498
		MLE	0.954	4.11	0.963	3.578	0.964	3.419	0.969	3.28
		MCMC	0.95	6.684	0.95	5.41	0.95	4.319	0.95	3.822
100	α₀	Laplace	0.907	1.402	0.932	1.103	0.946	0.944	0.942	0.852
		Tangent	0.764	1.048	0.823	0.842	0.858	0.726	0.826	0.651
		MLE	0.96	1.784	0.96	1.24	0.954	1.008	0.95	0.88
		MCMC	0.95	2.01	0.95	1.445	0.95	0.986	0.95	0.924
	α₁	Laplace	0.949	1.728	0.942	1.351	0.949	1.143	0.953	1.033
		Tangent	0.76	1.126	0.768	0.905	0.806	0.778	0.79	0.696
		MLE	0.968	2.025	0.949	1.444	0.947	1.182	0.953	1.05
		MCMC	0.95	2.235	0.95	1.57	0.95	1.222	0.95	1.11
	β₀	Laplace	0.928	1.459	0.957	1.478	0.95	1.475	0.95	1.484
		Tangent	0.723	0.954	0.766	0.957	0.792	0.956	0.791	0.958
		MLE	0.96	1.931	0.968	1.729	0.968	1.638	0.96	1.607
		MCMC	0.95	2.528	0.95	2.038	0.95	1.81	0.95	1.792
	β₁	Laplace	0.919	1.945	0.944	1.979	0.947	1.969	0.95	2.01
		Tangent	0.623	1.052	0.654	1.058	0.71	1.055	0.67	1.06
		MLE	0.96	2.828	0.963	2.482	0.958	2.292	0.954	2.225
		MCMC	0.95	3.489	0.95	3.032	0.95	2.71	0.95	2.584

The medians are calculated across the different occupancy and detection probability combinations for fixed values of n and K. n represents the total number of sites visited while K represents the number of visits to each site. We highlight the method with the smallest credibility/confidence interval width as well the method (not considering the MCMC method) with the closest coverage probability to 0.95. We found that as the number of sites increased, the credibility (and confidence) interval widths of the true regression parameters decreased (for all methods). For a fixed number of sites, the credibility (and confidence) interval widths of the true parameter values decreased as K increased while the associated widths for the parameters did not appear to decrease noticeably with an increase in K. The simulation results suggests that the coverage probabilities associated with the Laplace method and the MLE methods (for all regression parameters) are very close to that of the nominal coverage value of 0.95 for K ≥ 3. It is evident from these results that the Tangent based method does not perform well under any of the scenarios considered and consistently produced the smallest credibility interval widths. Based on the accuracy calculations across the replicate data sets, the Tangent based method generally appears to be worst at approximating the marginal posterior distributions of both and when comparisons are made based on the median accuracy measure for these parameters. As an example of these simulation results, consider the scenario where the estimated mean detection and occupancy probability across all sites and revisits are both 0.5 (see Fig 1). In general, the posterior approximations for the detection regression parameters were quite good with median accuracy statistics greater than 0.8 even when the number of revisits are small. The accuracy statistics for the detection regression parameters dramatically increase as n and K increases with median accuracy statistics in excess of 0.95 when K = 5. Similar comments can be made regarding the accuracy of the posterior approximations for the occupancy regression parameters. In general, the accuracies increase with an increase in n and K however the rate of increase in the accuracy statistics for the occupancy regression parameters appears slower than those observed for the detection regression parameters. The box plots of the accuracy statistics associated with the different methods for the remaining cases can be found in the Supporting Information (see S1, S2 and S3 Figs).

Fig 1

Box plots of the accuracy measurements for the model parameters associated with the Laplace (L-dark boxes) and Tangent (T-light boxes) based method for number of sites n = 50, 100 and number of visits to each site K = 2, 5.

Box plots of the accuracy measurements for the model parameters associated with the Laplace (L-dark boxes) and Tangent (T-light boxes) based method for number of sites n = 50, 100 and number of visits to each site K = 2, 5.

The detection and occupancy probabilities are approximately 0:5. The accuracy of the VB approximations is measured by calculating The measure lies between 0 and 1 with a value of 1 indicating a perfect approximation and a value close to 0 indicating a poor approximation by the variational distribution to the true posterior distribution. We found that the accuracy of the approximate predictive distributions for the proportion of occupied sites improves as K increases (see Fig 2). This observation is consistent across all of the scenarios considered. From an examination of the posterior predictive distributions (not shown here) it is evident that the VB predictive distributions are lighter tailed than the MCMC predictive distributions however this effect is reduced for K ≥ 3. This observation can clearly be seen when examining the results displayed in Tables 2 and 3. It is noticeable that the summary statistics of the predictive distributions using the two methods are very similar although the VB predictive distributions display a slightly reduced posterior variance under certain conditions.

Fig 2

Box plots of the accuracy measurements for the predictive distribution of the proportions of occupied sites associated with the Laplace method for number of sites n = 50, 100 and number of visits to each site K = 2; 3; 4 and 5.

Table 2

Summary statistics of the posterior predictive distributions of the proportion of sites occupied using the MCMC method and the VB method for different scenarios. (n = 50.).

		MCMC Method					Laplace Method
Case	K	Mean	Std	Median	2.5%	97.5%	Mean	Std	Median	2.5%	97.5%
d ≈ 0.5ψ ≈ 0.3	2	16.8	5.2	16	8	29	14.6	3.3	15	8	21
	3	15.6	3.6	15	10	24	14.6	2.9	14	9	20
	4	15.0	2.8	15	10	21	14.6	2.6	15	10	20
	5	14.9	2.6	15	10	20	14.7	2.5	15	10	20
d ≈ 0.7ψ ≈ 0.3	2	25.7	4.7	25	18	36	23.9	3.5	24	17	31
	3	25.0	3.6	25	18	33	24.1	3.1	24	18	30
	4	25.0	3.0	25	19	31	24.6	2.8	25	19	30
	5	24.7	2.8	25	20	30	24.5	2.7	24	19	30
d ≈ 0.5ψ ≈ 0.5	2	15.8	3.5	15	10	24	14.9	2.7	15	10	20
	3	14.8	2.5	15	10	20	14.5	2.4	14	10	20
	4	14.5	2.4	14	10	19	14.4	2.4	14	10	19
	5	14.6	2.4	15	10	19	14.6	2.4	15	10	19
d ≈ 0.7ψ ≈ 0.5	2	25.2	3.3	25	19	32	24.6	2.9	25	19	30
	3	24.6	2.8	25	20	30	24.4	2.7	24	19	30
	4	24.6	2.5	25	20	30	24.6	2.5	25	20	29
	5	24.7	2.5	25	20	29	24.6	2.5	25	20	29

Table 3

Summary statistics of the posterior predictive distributions of the proportion of sites occupied using the MCMC method and the VB method for different scenarios. (n = 100.).

		MCMC Method					Laplace Method
Case	K	Mean	Std	Median	2.5%	97.5%	Mean	Std	Median	2.5%	97.5%
d ≈ 0.5ψ ≈ 0.3	2	31.0	6.3	30	21	45	29.4	4.6	29	21	39
	3	30.4	4.6	30	22	40	29.7	4.1	30	22	37
	4	29.7	3.8	30	22	37	29.4	3.6	29	22	36
	5	29.7	3.6	30	22	37	29.5	3.5	29	22	36
d ≈ 0.7ψ ≈ 0.3	2	49.7	6.3	49	38	63	48.3	5.2	48	38	58
	3	49.6	4.9	50	40	59	48.9	4.6	49	40	58
	4	49.3	4.2	49	41	58	48.9	4.1	49	41	57
	5	49.1	3.9	49	41	57	48.9	3.9	49	41	56
d ≈ 0.5ψ ≈ 0.5	2	30.2	4.4	30	22	40	29.7	4.0	30	22	38
	3	29.6	3.5	30	23	36	29.5	3.4	30	23	36
	4	29.5	3.3	30	23	36	29.4	3.3	29	23	36
	5	29.5	3.3	29	23	36	29.5	3.2	29	23	36
d ≈ 0.7ψ ≈ 0.5	2	49.4	4.5	49	41	58	48.9	4.2	49	41	57
	3	49.2	3.8	49	42	57	49.0	3.8	49	42	56
	4	49.1	3.7	49	42	56	49.0	3.7	49	42	56
	5	49.0	3.7	49	41	56	49.0	3.7	49	41	56

Box plots of the accuracy measurements for the predictive distribution of the proportions of occupied sites associated with the Laplace method for number of sites n = 50, 100 and number of visits to each site K = 2; 3; 4 and 5.

The accuracy of the VB approximations is measured by calculating The measure lies between 0 and 1 with a value of 1 indicating a perfect approximation and a value close to 0 indicating a poor approximation by the variational distribution to the true posterior distribution. Box plots of the accuracy measurements for the predictive distribution of the proportions of occupied sites associated with the Laplace method for number of sites n = 50, 100 and number of visits to each site K = 2, 3, 4 and 5. The accuracy of the VB approximations is measured by calculating . The measure lies between 0 and 1 with a value of 1 indicating a perfect approximation and a value close to 0 indicating a poor approximation by the variational distribution to the true posterior distribution.

2.2 Application to real data sets

As examples of the proposed technique, we use detection-nondetection data extracted from the second Southern African Bird Atlas Project [31] database (see http://sabap2.adu.org.za/) for 2012 to compare the performance of different methods for fitting a single season occupancy model. The data were collected by citizen scientists using 5-minute latitude × 5-minute longitude rectangular grids across South Africa [31]. Each site is approximately 8 km × 7.6 km [8]. The citizen scientists were asked to make a list of all the species that they encountered during at least two hours of intense birding. They were allowed to add additional species to the list for up to five days. By providing information on the species that they encountered, the citizen scientists implicitly also provided information about the species they did not encounter. Hence, we extracted detection-nondetection data for five bird species (1. Black-headed heron (Ardea melanocephala), 2. Egyptian goose (Alopochen aegyptiaca), 3. orange-throated longclaw (Macronyx capensis), 4. white-browed sparrow-weaver (Plocepasser mahali) and 5. Long-tailed widowbird (Euplectes progne)) from this database, treating each check-list as an independent observation. We included all grid cells in and around Gauteng, South Africa, that contained a minimum of three site visits. Many of the sites were visited a large number of times but we limited the maximum number of site visits to five (since the focus of the analysis was to assess whether the VB techniques could be used to analyse studies which have relatively small sample sizes and low number of revisits per site). This restriction reduced the data sets to 123 sites; 50 of which had three surveys; 52 had four surveys and the remaining 21 sites had 5 surveys. In our analysis we specifically compare the MLE, MCMC and the VB methods where uninformative priors (as in the simulation study) were used for all parameters. We fitted a model with one detection covariate and one occupancy covariate. The detection covariate used was the number of species observed by the birder (denoted as nspp) while the occupancy probability was modelled as a function of the ratio of potential to realized evapotranspiration (AETdivPETs). AETdivPETs is a measure of vegetation cover and hydric stress and is an important predictor for bird species occurrence in South Africa [32]. Both covariates were standardized to have zero mean and unit variance. Maximum likelihood estimation was undertaken using the R package unmarked[33]; MCMC sampling was undertaken using the R package jagsUI [34] while all variational approximations were performed using the authors’ code. The R code used to perform the analysis (S1 Code), the data (S1 Data) as well as documentation regarding the VB code (S2 Code, S2 Data) can be found in the Supporting information. The MCMC estimation was undertaken as per the simulation study discussed previously. The approximate posterior means and standard deviations of the VB distributions were all close to the posterior means and standard deviations obtained using MCMC (see Table 4 and Fig 3). The regression coefficients are all positive and statistically significantly different from zero. From an examination of the predictive distributions of the PAO for the different species it is evident that the VB distributions can be used to obtain accurate approximations to the true predictive distributions of the PAO (see Fig 4 and Table 5). Notice that the accuracy statistics for all of the species considered were above 0.9.

Table 4

Parameter estimates of the single season occupancy models fitted using MLE, VB and MCMC.

		MLE		VB		MCMC
Species	Covariate	Est	SE	Mean	Std	Mean	Std	MCSE
Black headed Heron	Int (detection) nspp	0.412	0.123	0.413	0.113	0.408	0.124	<0.001
	Int (detection) nspp	0.381	0.128	0.381	0.123	0.386	0.128	<0.001
	Int (Occupancy) AETdivPETs	1.136	0.255	1.131	0.231	1.182	0.269	0.002
	Int (Occupancy) AETdivPETs	0.892	0.236	0.890	0.221	0.926	0.246	0.002
Egyptian Goose	Int (detection) nspp	0.738	0.124	0.739	0.117	0.737	0.125	<0.001
	Int (detection) nspp	0.745	0.135	0.745	0.131	0.751	0.135	<0.001
	Int (Occupancy) AETdivPETs	1.690	0.300	1.682	0.273	1.764	0.323	0.003
	Int (Occupancy) AETdivPETs	0.866	0.258	0.861	0.242	0.906	0.273	0.002
Orange throated longclaw	Int (detection) nspp	0.946	0.141	0.946	0.135	0.950	0.143	0.001
	Int (detection) nspp	0.607	0.159	0.607	0.154	0.618	0.160	0.001
	Int (Occupancy) AETdivPETs	0.668	0.238	0.665	0.229	0.690	0.243	0.001
	Int (Occupancy) AETdivPETs	1.410	0.268	1.406	0.258	1.462	0.279	0.002
White browed sparrow weaver	Int (detection) nspp	0.372	0.132	0.374	0.120	0.370	0.133	0.001
	Int (detection) nspp	0.280	0.132	0.280	0.127	0.284	0.132	<0.001
	Int (Occupancy) AETdivPETs	0.607	0.210	0.604	0.197	0.626	0.213	0.001
	Int (Occupancy) AETdivPETs	0.556	0.205	0.556	0.196	0.569	0.210	0.001
Long tailed widow bird	Int (detection) nspp	1.257	0.153	1.256	0.149	1.268	0.154	0.001
	Int (detection) nspp	0.727	0.168	0.726	0.166	0.738	0.169	0.001
	Int (Occupancy) AETdivPETs	0.746	0.245	0.743	0.241	0.763	0.251	0.002
	Int (Occupancy) AETdivPETs	1.633	0.293	1.628	0.288	1.693	0.302	0.002

The estimation results indicate that the VB method accurately estimates the posterior means of all of the parameters while the posterior variances are marginally underestimated.

Fig 3

A comparison between the VB distributions (solid line) and the posterior distributions obtained using MCMC (the histogram) for the regression parameters of the detection and occupancy process for the different bird species (denoted as (1) = Black-headed heron, (2) = Egyptian goose, (3) = Orange-throated longclaw, (4) = White-browed sparrow-weaver and (5) = Long-tailed widowbird).

Fig 4

Predictive distribution of the proportions of occupied sites using the VB Laplace method and the MCMC method for the different bird species.

The accuracy statistics (acc(x)) are displayed in brackets. The acc(x) measure lies between 0 and 1 with a value of 1 indicating a perfect approximation and a value close to 0 indicating a poor approximation by the variational distribution to the true posterior distribution.

Table 5

Summary statistics of the posterior predictive distributions of the proportion of sites occupied using the MCMC method and the VB method for the five bird species considered.

Species	Method	Mean	Std	2.5%	97.5%
Black-headed	MCMC	89.7	2.05	86	94
Heron	VB	89.3	1.64	87	93
Egyptian	MCMC	100.2	1.78	97	104
goose	VB	99.9	1.47	97	103
Orange-throated	MCMC	77.4	1.23	76	80
longclaw	VB	77.2	1.05	76	80
White-browed	MCMC	78.2	1.99	75	83
sparrow-weaver	VB	77.9	1.63	75	81
Long-tailed	MCMC	78.5	0.74	78	80
widowbird	VB	78.5	0.67	78	80

The estimation results indicate that the VB method accurately estimates the posterior means of all of the parameters while the posterior variances are marginally underestimated. A comparison between the VB distributions (solid line) and the posterior distributions obtained using MCMC (the histogram) for the regression parameters of the detection and occupancy process for the different bird species (denoted as (1) = Black-headed heron, (2) = Egyptian goose, (3) = Orange-throated longclaw, (4) = White-browed sparrow-weaver and (5) = Long-tailed widowbird).

Predictive distribution of the proportions of occupied sites using the VB Laplace method and the MCMC method for the different bird species.

3 Discussion

We developed two new methods of approximating the posterior distribution of the parameters of a Bayesian single season occupancy model that use logistic link functions. The first method uses a Laplace approximation of the VB optimal distributions while the second method utilizes the tangent based method of [35]. Based on the simulation studies it was found that the Laplace approximation method performed well under most conditions considered. We believe that the approximation results obtained using the probit link function would be similar to those obtained using the tangent based method and thus did not explicitly consider this link function here. The methods have laid the groundwork that would enable VB methods to be applied to more complicated occupancy models and are currently the focus of ongoing research. One big advantage of the methods developed here is the fact that they could be applied to cases where the researcher has informative prior information and might not want to rely on the use of the MLE method. In that situation, Markov Chain Monte Carlo (MCMC) methods were so far the only methods available for fitting occupancy models in a Bayesian analysis. However, for big data sets, MCMC methods can be too slow to be useful. The code used to implement the methods is available in R and was at least 100 times faster than running MCMC using jagsUI in our example. Simulations showed that when uninformative prior distributions were used, in general, the Laplace method attains very similar frequentist coverage probabilities to those obtained by the MLE method when the number of sampling occasions is at least three. We advise that the approximate methods could be used when the detection probability is at least 0.5 and there are at least three sampling occasions. A further advantage of the methods developed here is the ease with which one can approximate the predictive distribution of the proportion of area occupied. Our simulation results showed that the Laplace approximate method can be used to obtain approximate distributions of the PAO. For scenarios where the detection probabilities are relatively low and the number of sites visits are small (K = 2) we found that the approximate methods slightly under estimate the upper bound of the PAO. The differences between the true predictive distribution and the approximate one is however very small for K ≥ 3. In both of the methods considered the approximate distributions derived are both multivariate Gaussian. When the sample size is particularly small, the number of sampling occasions is low (possibly one or two) or when the detection probability is low (less than 0.3) we have found that the posterior distributions of the parameters of the model are often skewed (particularly the occupancy covariate parameters). In these cases the approximate methods do not work well. Future work could entail the use of skew distributions similar to that proposed by [36].

Some notation and distribution theory used in the main part of the text.

(PDF) Click here for additional data file.

Derivation of the lower bound to the joint likelihood and the VB distributions.

(PDF) Click here for additional data file.

Derivation of the tangent based method.

(PDF) Click here for additional data file.

Explanation regarding the convergence calculations.

(PDF) Click here for additional data file. The detection probability is approximately 0.5 while the occupancy probability is approximately 0.3. (TIF) Click here for additional data file. The detection probability is approximately 0.7 while the occupancy probability is approximately 0.3. (TIF) Click here for additional data file. The detection probability is approximately 0.7 while the occupancy probability is approximately 0.5. (TIF) Click here for additional data file.

The R code used to undertake the analysis.

(R) Click here for additional data file.

How to use the VB Laplace approximation code.

(PDF) Click here for additional data file.

The R data file that contains the data used to undertake the analysis.

(RDATA) Click here for additional data file.

The R data file that contains the data used to explain how to use the VB Laplace approximation code.

(RDA) Click here for additional data file.

9 in total

9. Ignoring imperfect detection in biological surveys is dangerous: a response to 'fitting and interpreting occupancy models'.

Authors: Gurutzeta Guillera-Arroita; José J Lahoz-Monfort; Darryl I MacKenzie; Brendan A Wintle; Michael A McCarthy
Journal: PLoS One Date: 2014-07-30 Impact factor: 3.240

9 in total

A Variational Bayes Approach to the Analysis of Occupancy Models.

Introduction

1 Material and Methods

1.1 A brief introduction to Variational Bayes (VB)

1.2 VB applied to single season occupancy models

1.3 SIMULATION STUDY

2 Results

2.1 SIMULATION RESULTS

Box plots of the accuracy measurements for the model parameters associated with the Laplace (L-dark boxes) and Tangent (T-light boxes) based method for number of sites n = 50, 100 and number of visits to each site K = 2, 5.

Box plots of the accuracy measurements for the predictive distribution of the proportions of occupied sites associated with the Laplace method for number of sites n = 50, 100 and number of visits to each site K = 2; 3; 4 and 5.

2.2 Application to real data sets

Predictive distribution of the proportions of occupied sites using the VB Laplace method and the MCMC method for the different bird species.

3 Discussion

Some notation and distribution theory used in the main part of the text.

Derivation of the lower bound to the joint likelihood and the VB distributions.

Derivation of the tangent based method.

Explanation regarding the convergence calculations.

The R code used to undertake the analysis.

How to use the VB Laplace approximation code.

The R data file that contains the data used to undertake the analysis.

The R data file that contains the data used to explain how to use the VB Laplace approximation code.

1. Improved estimation of site occupancy using penalized likelihood.

2. A guide to eliciting and using expert knowledge in Bayesian ecological models.

3. Spatial occupancy models applied to atlas data show Southern Ground Hornbills strongly depend on protected areas.

4. A Bayesian state-space formulation of dynamic occupancy models.

5. Elicitation by design in ecology: using expert opinion to inform priors for Bayesian statistical models.

6. A variational Bayes spatiotemporal model for electromagnetic brain mapping.

7. Dynamic occupancy models for analyzing species' range dynamics across large geographic scales.

8. Fitting and interpreting occupancy models.

9. Ignoring imperfect detection in biological surveys is dangerous: a response to 'fitting and interpreting occupancy models'.