Gleb Tikhonov, Li Duan, Nerea Abrego, Graeme Newell, Matt White, David Dunson, Otso Ovaskainen.
Abstract
The ongoing global change and the increased interest in macroecological processes call for the analysis of spatially extensive data on species communities to understand and forecast distributional changes of biodiversity. Recently developed joint species distribution models can deal with numerous species efficiently, while explicitly accounting for spatial structure in the data. However, their applicability is generally limited to relatively small spatial data sets because of their severe computational scaling as the number of spatial locations increases. In this work, we propose a practical alleviation of this scalability constraint for joint species modeling by exploiting two spatial-statistics techniques that facilitate the analysis of large spatial data sets: Gaussian predictive process and nearest-neighbor Gaussian process. We devised an efficient Gibbs posterior sampling algorithm for Bayesian model fitting that allows us to analyze community data sets consisting of hundreds of species sampled from up to hundreds of thousands of spatial units. The performance of these methods is demonstrated using an extensive plant data set of 30,955 spatial units as a case study. We provide an implementation of the presented methods as an extension to the hierarchical modeling of species communities framework.
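As a rough illustration of why the nearest-neighbor Gaussian process scales to many sites, the sketch below builds the conditioning sets that the standard NNGP construction relies on: each site, taken in a fixed order, conditions only on its nearest sites among those earlier in the order. This is a minimal sketch and not the authors' implementation; the function name `nngp_neighbor_sets`, the toy coordinates, and the brute-force neighbor search are assumptions made for illustration.

```python
import math

def nngp_neighbor_sets(coords, m):
    """For each site i (in the given order), return the indices of its
    up-to-m nearest sites among those that come earlier in the ordering."""
    neighbors = []
    for i, p in enumerate(coords):
        # Brute-force search, quadratic in the number of sites;
        # acceptable for a toy example only.
        earlier = sorted(range(i), key=lambda j: math.dist(p, coords[j]))
        neighbors.append(earlier[:m])
    return neighbors

# Hypothetical toy layout: six sites on a line, ordered by x coordinate.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0),
          (3.0, 0.0), (4.0, 0.0), (5.0, 0.0)]
sets = nngp_neighbor_sets(coords, m=2)

# Total number of conditioning links is bounded by n * m.
total_links = sum(len(s) for s in sets)
```

Because every site has at most `m` neighbors, the number of conditioning links grows linearly with the number of sites rather than quadratically, which is the property that makes the approach attractive for large spatial data sets.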
Keywords: Gaussian process; community modeling; ecological communities; hierarchical modeling of species communities; joint species distribution model; latent factors; spatial statistics
Year: 2019 PMID: 31725922 PMCID: PMC7027487 DOI: 10.1002/ecy.2929
Source DB: PubMed Journal: Ecology ISSN: 0012-9658 Impact factor: 5.499
Figure 2. Ecological inference with Gaussian predictive process (GPP) and nearest‐neighbor Gaussian process (NNGP) models fitted to the full training data set. Panel (A) shows the spatial locations of the observed sites (black) and the 1,024 knots used in the largest GPP model (magenta). Panels (B) and (C) show species association patterns based on the latent factor part of the hierarchical model of species communities (HMSC) model: red marks species pairs that co‐occur more often, blue marks pairs that co‐occur less often, and white marks pairs for which the sign of the association was not credibly estimated at the 95% threshold. Species ordering is the same in both panels and was chosen to make the association structure easier to see. Panels (D) and (E) show the predicted spatial distribution of species richness; panels (F) and (G) show the predicted occurrence probability of Acaena novae‐zelandiae; and panels (H) and (I) show the predicted regions of common profile, with the nodes of a 5 × 5 self‐organizing map mapped to YUV color space.
Figure 1. Comparison of nonspatial, full Gaussian process (GP), Gaussian predictive process (GPP), and nearest‐neighbor Gaussian process (NNGP) models. Panels (A)–(C) show the time elapsed for model fitting to small (), medium (), and large () species communities with using a hierarchical model of species communities (HMSC) Gibbs sampler with 10,000 Markov chain Monte Carlo (MCMC) iterations. Panels (D)–(F) depict the same results adjusted for the autocorrelation in the samples, showing the time required to obtain 1,000 effectively independent samples from the posterior. Panels (G)–(I) show the predictive performance of the fitted models measured in terms of Tjur R², and panels (J)–(L) in terms of deviance. The colors indicate nonspatial models (gray), GP models (black), GPP models with 16, 64, 256, and 1,024 knots (gradation of blue from light to deep), and NNGP models with 10 and 20 neighbors (light and dark red). Note that because the results are very similar, the red and black lines often overlap. Panels (M) and (N) depict the predictive performance with respect to the number of factors. Dashed lines depict cases with species and solid lines cases with species; blue lines correspond to GPP with 64 knots and red lines correspond to NNGP with 10 neighbors.
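The two summary quantities named in the Figure 1 caption can be sketched as follows. Tjur R² is the mean predicted occurrence probability over sites where the species was observed minus the mean over sites where it was absent, and the autocorrelation adjustment rescales the elapsed fitting time by the ratio of the target number of effective samples to the effective sample size actually obtained. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean

def tjur_r2(observed, predicted):
    """Tjur R^2: mean predicted probability at presences minus
    mean predicted probability at absences."""
    pres = [p for y, p in zip(observed, predicted) if y == 1]
    absn = [p for y, p in zip(observed, predicted) if y == 0]
    if not pres or not absn:
        raise ValueError("need at least one presence and one absence")
    return mean(pres) - mean(absn)

def time_for_effective_samples(elapsed, ess, target=1000):
    """Time needed to obtain `target` effectively independent samples,
    given the elapsed time and the effective sample size it produced."""
    return elapsed * target / ess

# Hypothetical toy data: five sites, one species.
observed = [1, 1, 0, 0, 0]
predicted = [0.8, 0.6, 0.3, 0.1, 0.2]
r2 = tjur_r2(observed, predicted)  # 0.7 - 0.2, approximately 0.5

# Hypothetical run: 120 time units produced 400 effective samples.
adjusted = time_for_effective_samples(elapsed=120.0, ess=400.0)
```

A Tjur R² near 0 means the model barely separates presences from absences, while a value near 1 means near-perfect discrimination.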