Lélis A Carlos-Júnior, Joel C Creed, Rob Marrs, Rob J Lewis, Timothy P Moulton, Rafael Feijó-Lima, Matthew Spencer.
Abstract
BACKGROUND: Ecological communities tend to be spatially structured due to environmental gradients and/or spatially contagious processes such as growth, dispersion and species interactions. Studies searching for spatial structure in ecological communities commonly transform the data and then apply methods such as Redundancy Analysis (RDA), despite recent suggestions advocating the use of Generalized Linear Models (GLMs). Here, we compared the performance of GLMs and RDA in describing spatial structure in ecological community composition data. We simulated realistic presence/absence data typical of many β-diversity studies. For model selection we used the standard methods found in most studies involving RDA and GLMs.
Keywords: Beta diversity; Moran’s Eigenvector Maps (MEMs); Redundancy Analysis (RDA); Spatial analysis; Spatial ecology; Statistical modelling
Year: 2020 PMID: 32953266 PMCID: PMC7474884 DOI: 10.7717/peerj.9777
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. Schematic diagram of the main steps used in this study to simulate community presence/absence data with pre-defined spatial structure.
Data acquisition (I): We used real data from marine, terrestrial and freshwater communities and their respective sampling site coordinates as our baseline datasets. Obtaining response and predictor matrices (II): Those datasets were used to construct a response matrix of presence/absence data Y (1) and a matrix X of spatial explanatory variables called MEMs. The spatial variables were obtained from a pairwise site-by-site distance matrix A (2) and a connectivity matrix B (3) describing the spatial relationship among sites (see main text for specific decisions for each dataset). The Hadamard product of these two matrices generates the spatial weighting matrix W (4), which is then doubly centred and diagonalised, yielding eigenvectors to be used as spatial variables, represented below by matrix X. Obtaining realistic coefficients for spatial variables (III): From a Generalized Linear Model (GLM) for the relationship between Y and X (5) we obtained a matrix C of realistic regression coefficients (6). Using non-zero coefficients to model new presence/absence data with pre-defined spatial structure (IV): We sampled different numbers of non-zero coefficients from C under 14 distinct scenarios (see main text) to build a new matrix C* and then left-multiplied C* by X (7) to obtain matrix Y*. Depending on the model, this matrix held either the logit-scale predicted probabilities of presence or the log abundances. The two models differed in how they treated absences: as real (simulated presence model, SPM) or as artifacts derived from poor sampling (SAM). From Y* we estimated (8) new presence/absence data Y* containing the spatial structure defined by C*. Using GLM/AIC and RDA/FW to select spatial models using the simulated presence/absence data (V): Finally, we regressed Y* against X using the GLM/AIC and RDA/FW frameworks (9) to assess which MEMs would be correctly selected by those two methods.
The performance of each method was mainly assessed by the proportion of MEM variables that were correctly included or excluded from final models by each method (10).
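Steps (2) to (4) and (7) to (8) of the diagram can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the distance threshold used to build B, the linear conversion of distances into weights, the toy coordinates and the toy coefficients are all hypothetical choices. Only the logit (SPM-style) route from Y* to presence/absence is shown.

```python
import numpy as np

def mem_variables(coords, threshold):
    """Return (eigenvalues, eigenvectors) of a doubly centred spatial weighting matrix."""
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[0]
    # A: pairwise Euclidean distances between sites
    diff = coords[:, None, :] - coords[None, :, :]
    A = np.sqrt((diff ** 2).sum(axis=-1))
    # B: connectivity matrix (assumption: distinct sites within `threshold` are linked)
    B = ((A > 0) & (A <= threshold)).astype(float)
    # Assumption: distances are turned into weights that decay linearly with distance
    weights = 1.0 - A / A.max()
    # W: Hadamard (element-wise) product of connectivity and weights
    W = B * weights
    # Double centring: subtract row and column means, add back the grand mean
    H = np.eye(n) - np.ones((n, n)) / n
    W_centred = H @ W @ H
    # Diagonalise the symmetric matrix and sort from largest to smallest eigenvalue
    eigvals, eigvecs = np.linalg.eigh(W_centred)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Drop the (near) zero eigenvalue created by the centring
    keep = np.abs(eigvals) > 1e-10
    return eigvals[keep], eigvecs[:, keep]

def simulate_presence_absence(X, C_star, rng):
    """Draw presence/absence data from logit-scale predictions X @ C_star."""
    eta = X @ C_star                  # sites x species, logit scale
    p = 1.0 / (1.0 + np.exp(-eta))    # inverse logit
    return rng.binomial(1, p)         # one Bernoulli draw per site and species

rng = np.random.default_rng(1)
coords = rng.uniform(0, 10, size=(20, 2))      # 20 hypothetical sites
eigvals, X = mem_variables(coords, threshold=4.0)
C_star = np.zeros((X.shape[1], 5))             # 5 hypothetical species
C_star[:2, :] = rng.normal(0, 3, size=(2, 5))  # non-zero coefficients on two broad-scale MEMs
Y_sim = simulate_presence_absence(X, C_star, rng)
```

Sorting the eigenvectors by decreasing eigenvalue puts the broad-scale MEMs first, which matches the indexing convention used in the scenario table.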
Simulation scenarios for the three datasets as described in main text.
Distribution of MEM variables with non-zero coefficient under each simulation scenario in all three datasets (A = marine algae from Ilha Grande Bay, m = 16; B = Scotland grasslands, m = 30; C = freshwater insects, m = 12). Rows and columns define all simulation scenarios regarding the number of variables to be used and their position. Rows represent the number of non-zero variables to be included based on set K (see main text), whereas columns define the scaling of these non-zero variables, i.e. position to which those non-zero variables would be assigned. Scaling 1 assigned non-zero coefficients only to MEMs associated with larger eigenvalues representing broader spatial scales. Scaling 2 assigned non-zero coefficients only to MEMs associated with smaller eigenvalues, representing finer spatial scales. Scaling 3 assigned non-zero coefficients to MEMs representing a range of spatial scales. Cells contain sets of indices of explanatory variables. When nVar=0, none of the variables had non-zero coefficients.
| Dataset | nVar | Scaling 1 (only broad) | Scaling 2 (only fine) | Scaling 3 (mixed) |
|---|---|---|---|---|
| (A) | 0 | None | – | – |
| | ⌊ | | | {1, 16} |
| | ⌊ | | | |
| | ⌊ | | | |
| | ⌊3 | | | |
| | | | – | – |
| (B) | 0 | None | – | – |
| | ⌊ | | | {1, 2, 3, 29, 30} |
| | ⌊ | | | |
| | ⌊ | | | |
| | ⌊3 | | | |
| | | | – | – |
| (C) | 0 | None | – | – |
| | ⌊ | | | {1, 12} |
| | ⌊ | | {9, 10, 11, 12} | {1, 2, 11, 12} |
| | ⌊ | {1, 2, …, 6} | {7, 8, …, 12} | {1, 2, 3, 10, 11, 12} |
| | ⌊3 | {1, 2, …, 9} | {4, 5, …, 12} | {1, 2, 3, 4, 5, 9, 10, 11, 12} |
| | | {1, 2, …, 12} | – | – |
Figure 2. Overall performance comparison between GLM/AIC (blue) and RDA/FW (red) methods on simulated presence/absence data.
Scores were measured as the percentage of MEMs correctly included in or excluded from the final model, out of the total number of variables in each dataset (1 = 16, 2 = 30, 3 = 12). This comparison was made across varying numbers of MEMs with non-zero coefficients (x axis). (A, B) simulated data based on subtidal macroalgae in Ilha Grande Bay; (C, D) data based on plant species from Scottish grassland and (E, F) data based on aquatic macroinvertebrate insect species from a river in Brazil. A, C and E depict results where community presence/absence data was simulated directly from real coefficients (SPM, see main text) whereas B, D and F show simulation results where presence/absence data was estimated from expected abundances (SAM).
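The score described above can be computed as in the following sketch. The function name and the example index sets are hypothetical, and the MEMs selected in the example are made up for illustration rather than taken from any simulation result.

```python
def selection_score(true_nonzero, selected, m):
    """Percentage of the m MEMs that were correctly included or excluded."""
    true_set, selected_set = set(true_nonzero), set(selected)
    # A MEM counts as correct when its selection status matches its true status
    correct = sum((j in true_set) == (j in selected_set) for j in range(1, m + 1))
    return 100.0 * correct / m

# Hypothetical example with m = 12: MEMs 1, 2, 11 and 12 truly have non-zero
# coefficients, while the selection procedure retained MEMs 1, 2, 3 and 12.
score = selection_score({1, 2, 11, 12}, {1, 2, 3, 12}, m=12)
```

In this example MEM 3 is wrongly included and MEM 11 is wrongly excluded, so 10 of the 12 variables are handled correctly.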
Figure 3. Differences in performance between GLM/AIC and RDA/FW frameworks regarding the proportion of incorrect inclusions/exclusions of explanatory variables across 1,000 simulations for each method.
Panels A, C and E depict results where community presence/absence data was simulated directly from real coefficients (SPM, see main text) whereas B, D and F show simulation results where presence/absence data was estimated from expected abundances (SAM). Panels A and B depict results for simulated data based on subtidal macroalgae in Ilha Grande Bay; C and D represent data based on plant species from Scottish grassland; and E and F represent data based on aquatic macroinvertebrate insect species from a river in Brazil. Darker lines represent mean values.
Figure 4. Performance of GLM/AIC (blue) and RDA/FW (red) modelling approaches under variation in spatial scales of MEMs with non-zero coefficients.
Spatial scale was defined as broad (1), fine (2) or mixed (3) (where applicable). (A, B) simulated data based on macroalgae in Ilha Grande Bay; (C, D) data based on plant species from Scottish grassland and (E, F) data based on aquatic macroinvertebrate insect species from a river in Brazil. (A, C and E) depict results where community presence/absence data was simulated directly from real coefficients (SPM) whereas (B, D and F) show simulation results where presence/absence data was estimated from expected abundances (SAM, see main text).