Xavier A Harrison, Lynda Donaldson, Maria Eugenia Correa-Cano, Julian Evans, David N Fisher, Cecily E D Goodwin, Beth S Robinson, David J Hodgson, Richard Inger
Abstract
The use of linear mixed effects models (LMMs) is increasingly common in the analysis of biological data. Whilst LMMs offer a flexible approach to modelling a broad range of data types, ecological data are often complex and require complex model structures, and the fitting and interpretation of such models is not always straightforward. The ability to achieve robust biological inference requires that practitioners know how and when to apply these tools. Here, we provide a general overview of current methods for the application of LMMs to biological data, and highlight the typical pitfalls that can be encountered in the statistical modelling process. We tackle several issues regarding methods of model selection, with particular reference to the use of information theory and multi-model inference in ecology. We offer practical solutions and direct the reader to key references that provide further technical detail for those seeking a deeper understanding. This overview should serve as a widely accessible code of best practice for applying LMMs to complex biological problems and model structures, and in doing so improve the robustness of conclusions drawn from studies investigating ecological and evolutionary questions.
Keywords: AIC; Collinearity; GLMM; Mixed effects models; Model averaging; Model selection; Multi-model inference; Overdispersion; Random effects; Type I error
Year: 2018 PMID: 29844961 PMCID: PMC5970551 DOI: 10.7717/peerj.4794
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. Differences between Random Intercept vs Random Slope Models.
(A) A random-intercepts model where the outcome variable y is a function of predictor x, with a random intercept for group ID (coloured lines). Because all groups have been constrained to have a common slope, their regression lines are parallel. Solid lines are the regression lines fitted to the data. Dashed lines trace the regression lines back to the y intercept. Point colour corresponds to the group ID of the data point. The black line represents the global mean value of the distribution of random effects. (B) A random-intercepts and random-slopes model, where both intercepts and slopes are permitted to vary by group. Random-slope models give far more flexibility to fit the data, but require considerably more data to estimate a separate slope for each group accurately.
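The two structures in Figure 1 can be made concrete by simulation. The sketch below generates grouped data as y = (β0 + u_j) + (β1 + v_j)·x + ε, where u_j and v_j are group-level deviations in intercept and slope; setting the slope SD to zero gives the random-intercepts case of panel A. The function names, all parameter values, and the use of separate per-group least-squares fits (rather than a fitted mixed model) are illustrative assumptions, not the authors' code.

```python
import numpy as np

def simulate_groups(n_groups=8, n_per_group=200, sd_intercept=2.0,
                    sd_slope=0.0, beta0=1.0, beta1=0.5, sd_resid=0.5, seed=1):
    """Simulate y_ij = (beta0 + u_j) + (beta1 + v_j) * x_ij + e_ij.

    sd_slope = 0 gives a random-intercepts structure (Figure 1A);
    sd_slope > 0 adds random slopes (Figure 1B).
    All defaults are hypothetical, chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, sd_intercept, n_groups)  # group deviations in intercept
    v = rng.normal(0.0, sd_slope, n_groups)      # group deviations in slope (all 0 if sd_slope = 0)
    group = np.repeat(np.arange(n_groups), n_per_group)
    x = rng.normal(0.0, 1.0, group.size)
    noise = rng.normal(0.0, sd_resid, group.size)
    y = (beta0 + u[group]) + (beta1 + v[group]) * x + noise
    return group, x, y

def per_group_slopes(group, x, y):
    """Least-squares slope fitted separately within each group (no pooling)."""
    return np.array([np.polyfit(x[group == g], y[group == g], 1)[0]
                     for g in np.unique(group)])

# Per-group slopes are nearly identical without random slopes and spread out with them.
for sd in (0.0, 1.0):
    g, x, y = simulate_groups(sd_slope=sd)
    spread = np.std(per_group_slopes(g, x, y))
    print(f"slope SD = {sd}: spread of per-group slopes = {spread:.3f}")
```

The spread of the per-group slopes in the second case reflects the between-group slope variance that a random-slopes model has to estimate, which is why that structure needs more data per group.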
Figure 2. The effect of collinearity on model parameter estimates.
We simulated 10,000 iterations of a model y ∼ x1 + x2, where x1 had a positive effect on y (β_x1 = 1, vertical dashed line). x2 is collinear with x1, with either a moderate (r = 0.5; A) or a strong (r = 0.9; B) correlation. With moderate collinearity, estimation of β_x1 is precise, but certainty of the sign of β_x2 is low. When collinearity is strong, estimation of β_x1 is far less precise, with 14% of simulations estimating a negative coefficient for the effect of x1. For more elaborate versions of these simulations, see Freckleton (2011).
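A stripped-down version of the Figure 2 simulation fits in a few lines. This sketch assumes ordinary least squares, 30 observations per dataset, a residual SD of 2 and no true effect of x2; these settings are hypothetical and are not expected to reproduce the 14% reported above. For two predictors, the sampling variance of the x1 coefficient is inflated by a factor of 1/(1 − r²) relative to uncorrelated predictors, roughly 1.33 for r = 0.5 and 5.26 for r = 0.9, which is the widening the simulation shows.

```python
import numpy as np

def simulate_collinearity(r, n_iter=2000, n_obs=30, beta1=1.0, beta2=0.0,
                          sd_resid=2.0, seed=42):
    """Return OLS estimates of the x1 coefficient from repeated fits of y ~ x1 + x2.

    x1 and x2 are drawn from a bivariate normal with correlation r.
    Sample size, residual SD and beta2 = 0 are illustrative assumptions,
    not the settings used for Figure 2.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, r], [r, 1.0]])
    estimates = np.empty(n_iter)
    for i in range(n_iter):
        X = rng.multivariate_normal([0.0, 0.0], cov, size=n_obs)  # columns: x1, x2
        y = beta1 * X[:, 0] + beta2 * X[:, 1] + rng.normal(0.0, sd_resid, n_obs)
        design = np.column_stack([np.ones(n_obs), X])             # intercept, x1, x2
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        estimates[i] = coef[1]                                     # coefficient for x1
    return estimates

for r in (0.5, 0.9):
    est = simulate_collinearity(r)
    print(f"r = {r}: SD of estimates = {est.std():.2f}, "
          f"proportion negative = {np.mean(est < 0):.3f}")
```

Both runs share the same true β_x1 = 1; only the correlation between predictors changes, so any difference in spread and in the share of negative estimates is attributable to collinearity.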
Figure 3. Using Simulation to Assess Model Fit for GLMMs.
(A) Histogram of the proportion of zeroes in 10,000 datasets simulated from a Poisson GLMM. The vertical red line shows the proportion of zeroes in our real dataset. There is no strong evidence of zero-inflation for these data. (B) Histogram of the sum of squared Pearson residuals for 1,000 parametric bootstraps in which the Poisson GLMM has been re-fitted to the data at each step. The vertical red line shows the test statistic for the original model, which lies well outside the simulated frequency distribution. The ratio of the observed statistic to the simulated statistics can be used to calculate a mean dispersion statistic with 95% confidence intervals; for these data the mean is 3.16 (95% CI [2.77–3.59]). Simulating from models provides a simple yet powerful set of tools for assessing model fit and robustness.
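Both checks in Figure 3 rely on simulating new data from a fitted model and comparing a summary statistic with its observed value. The sketch below assumes the fitted Poisson means (`mu_hat`) are already available from some fitted model; here they, and the "observed" counts, are hypothetical. It also simplifies panel B: the model is not re-fitted to each simulated dataset, so this is a plain parametric simulation conditional on `mu_hat` rather than the full parametric bootstrap described in the caption.

```python
import numpy as np

def zero_inflation_check(y_obs, mu_hat, n_sim=10_000, seed=0):
    """Observed proportion of zeros vs. proportions in datasets simulated
    from Poisson(mu_hat). Returns (observed, array of simulated proportions)."""
    rng = np.random.default_rng(seed)
    sims = rng.poisson(mu_hat, size=(n_sim, len(mu_hat)))  # one simulated dataset per row
    return np.mean(y_obs == 0), np.mean(sims == 0, axis=1)

def pearson_chi2(y, mu):
    """Sum of squared Pearson residuals for a Poisson model: sum((y - mu)^2 / mu)."""
    return np.sum((y - mu) ** 2 / mu)

def dispersion_check(y_obs, mu_hat, n_sim=1_000, seed=0):
    """Mean and 95% interval of observed / simulated Pearson statistics.

    Simplification: mu_hat is NOT re-estimated for each simulated dataset,
    unlike a parametric bootstrap that refits the GLMM at every step."""
    rng = np.random.default_rng(seed)
    observed = pearson_chi2(y_obs, mu_hat)
    simulated = np.array([pearson_chi2(rng.poisson(mu_hat), mu_hat)
                          for _ in range(n_sim)])
    ratios = observed / simulated
    return ratios.mean(), np.percentile(ratios, [2.5, 97.5])

# Hypothetical example: a "fitted" Poisson model with constant mean 4,
# confronted with negative binomial counts (mean 4, variance 12).
rng = np.random.default_rng(123)
mu_hat = np.full(500, 4.0)
y_obs = rng.negative_binomial(2, 2 / 6, size=500)

obs_zero, sim_zero = zero_inflation_check(y_obs, mu_hat)
print(f"observed zeros: {obs_zero:.3f}, "
      f"simulated 95th percentile: {np.percentile(sim_zero, 95):.3f}")
mean_ratio, ci = dispersion_check(y_obs, mu_hat)
print(f"dispersion ratio: {mean_ratio:.2f}, 95% interval {ci.round(2)}")
```

Because the hypothetical counts have a variance three times their mean, the dispersion ratio should land near 3. The observed proportion of zeros also exceeds what the Poisson model predicts here, even though the data contain no separate zero-generating process: overdispersion alone can produce an excess of zeros, so a zero check is best read alongside a dispersion check.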