Cajo J F Ter Braak, Pedro Peres-Neto, Stéphane Dray.
Abstract
Statistical testing of trait-environment association from data is a challenge as there is no common unit of observation: the trait is observed on species, the environment on sites and the mediating abundance on species-site combinations. A number of correlation-based methods, such as the community weighted trait means method (CWM), the fourth-corner correlation method and the multivariate method RLQ, have been proposed to estimate such trait-environment associations. In these methods, valid statistical testing proceeds by performing two separate resampling tests, one site-based and the other species-based and by assessing significance by the largest of the two p-values (the pmax test). Recently, regression-based methods using generalized linear models (GLM) have been proposed as a promising alternative with statistical inference via site-based resampling. We investigated the performance of this new approach along with approaches that mimicked the pmax test using GLM instead of fourth-corner. By simulation using models with additional random variation in the species response to the environment, the site-based resampling tests using GLM are shown to have severely inflated type I error, of up to 90%, when the nominal level is set as 5%. In addition, predictive modelling of such data using site-based cross-validation very often identified trait-environment interactions that had no predictive value. The problem that we identify is not an "omitted variable bias" problem as it occurs even when the additional random variation is independent of the observed trait and environment data. Instead, it is a problem of ignoring a random effect. In the same simulations, the GLM-based pmax test controlled the type I error in all models proposed so far in this context, but still gave slightly inflated error in more complex models that included both missing (but important) traits and missing (but important) environmental variables. 
For screening the importance of single trait-environment combinations, the fourth-corner test is shown to give almost the same results as the GLM-based tests in far less computing time.
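The pmax procedure described in the abstract can be illustrated with a minimal sketch for a single trait and a single environmental variable. It uses the fourth-corner correlation (an abundance-weighted correlation) as the test statistic, runs one site-based and one species-based permutation test, and reports the larger of the two p-values. The function names `fourth_corner_r` and `pmax_test`, the number of permutations and the toy data are assumptions made for illustration; this is not the authors' code, and it does not cover the GLM-based variants.

```python
import math
import random


def fourth_corner_r(L, e, t):
    """Abundance-weighted correlation between site variable e and species trait t.

    L is a sites-by-species abundance table given as a list of rows.
    """
    total = sum(map(sum, L))
    site_w = [sum(row) / total for row in L]        # site weights
    spec_w = [sum(col) / total for col in zip(*L)]  # species weights
    e_mean = sum(w * x for w, x in zip(site_w, e))
    t_mean = sum(w * x for w, x in zip(spec_w, t))
    e_c = [x - e_mean for x in e]                   # weighted centring
    t_c = [x - t_mean for x in t]
    cov = sum(L[i][j] / total * e_c[i] * t_c[j]
              for i in range(len(e)) for j in range(len(t)))
    var_e = sum(w * x * x for w, x in zip(site_w, e_c))
    var_t = sum(w * x * x for w, x in zip(spec_w, t_c))
    return cov / math.sqrt(var_e * var_t)


def pmax_test(L, e, t, n_perm=999, seed=0):
    """Return (p_sites, p_species, p_max) for the t-e association."""
    rng = random.Random(seed)
    obs = abs(fourth_corner_r(L, e, t))
    tol = 1e-12                                     # guard against float ties
    ge_sites = ge_species = 0
    for _ in range(n_perm):
        e_perm = rng.sample(e, len(e))              # site-based: shuffle sites
        t_perm = rng.sample(t, len(t))              # species-based: shuffle species
        ge_sites += abs(fourth_corner_r(L, e_perm, t)) >= obs - tol
        ge_species += abs(fourth_corner_r(L, e, t_perm)) >= obs - tol
    p_sites = (ge_sites + 1) / (n_perm + 1)
    p_species = (ge_species + 1) / (n_perm + 1)
    return p_sites, p_species, max(p_sites, p_species)


# Toy example (made-up data): 4 sites, 4 species, abundances on the diagonal.
L_toy = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
e_toy = [0.0, 1.0, 2.0, 3.0]
t_toy = [0.0, 1.0, 2.0, 3.0]
print(pmax_test(L_toy, e_toy, t_toy, n_perm=99, seed=1))
```

Taking the maximum means that the association is declared significant only when both the site-based and the species-based null hypotheses are rejected, which is why a test that resamples sites alone can be too liberal.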
Keywords: Community composition; Compositional count data; Fourth-corner problem; Generalized linear models; Log-linear model; Negative-binomial response; Poisson regression; Trait-environment association
Year: 2017 PMID: 28097076 PMCID: PMC5237366 DOI: 10.7717/peerj.2885
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Type I error rates of the anova.traitglm test (site-based bootstrap approach based on the negative-binomial likelihood) and the max r/c test (pmax test based on the Poisson likelihood) for 1,000 simulated negative-binomial data sets using the Gaussian response model and n = m = 30.
Value in bold represents the inflated error compared to the nominal level of 0.05.
| Test procedure | Scenario: trait random | Scenario: environment random | Scenario: both random |
|---|---|---|---|
| anova.traitglm | 0.055 | 0.045 | |
| max r/c | 0.047 | 0.055 | 0.019 |
Figure 1. Type I error rate of the four tests of ‘Statistical tests on trait-environment interaction’ on the trait-environment (t–e) interaction in the log-linear simulation model of Eq. (5) in relation to the size of the z–e nuisance interaction (of e with a latent trait z): anova.traitglm (site-based bootstrap with negative-binomial likelihood), sites (site-based permutation with Poisson likelihood), species (species-based permutation with Poisson likelihood), and max r/c (GLM-based pmax test that combines the sites and species tests).
The t–x nuisance interaction (of t with a nuisance environmental variable x) is either absent (solid lines) or equal to the size of the z–e nuisance interaction (dashed lines). The vertical scale is logarithmic. The data were generated using a negative-binomial distribution with variance function . The horizontal line at 0.05 indicates the nominal significance threshold; error rates above the dotted line (at 0.064) are significantly greater than 0.05.
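The 0.064 limit quoted in the caption is consistent with a normal approximation to the binomial sampling error of an estimated rejection rate. The sketch below reproduces the figure under two assumptions that the text does not confirm: that each setting was simulated 1,000 times (the number stated for the table) and that the limit is 1.96 standard errors above the nominal level. The function name `upper_limit` is hypothetical.

```python
import math


def upper_limit(alpha=0.05, n_sim=1000, z=1.96):
    """Upper limit for an estimated error rate when the true rate equals alpha.

    Uses the normal approximation to the binomial; n_sim and z are assumptions.
    """
    return alpha + z * math.sqrt(alpha * (1 - alpha) / n_sim)


print(round(upper_limit(), 3))  # 0.064
```

With fewer simulated data sets the limit would be wider; for example, `upper_limit(n_sim=500)` is larger than the value for 1,000 data sets.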
Figure 2. Power of the max r/c test (GLM-based pmax test) on the trait-environment (t–e) interaction, added to the log-linear simulation model of Eq. (5), in relation to the size of the interaction of interest, with colours for various sizes of the z–e nuisance interaction (of e with a latent trait z) and line types as in Fig. 1.
The vertical scale is logarithmic. The data were generated using a negative-binomial distribution with variance function , whereas the test statistic was based on the Poisson likelihood. The horizontal line at 0.05 indicates the nominal significance threshold; rates above the dotted line (at 0.064) are significantly greater than 0.05.