Shinichi Nakagawa, Paul C D Johnson, Holger Schielzeth.
Abstract
The coefficient of determination R² quantifies the proportion of variance explained by a statistical model and is an important summary statistic of biological interest. However, estimating R² for generalized linear mixed models (GLMMs) remains challenging. We have previously introduced a version of R² that we called [Formula: see text] for Poisson and binomial GLMMs, but not for other distributional families. Similarly, we earlier discussed how to estimate intra-class correlation coefficients (ICCs) using Poisson and binomial GLMMs. In this paper, we generalize our methods to all other non-Gaussian distributions, in particular to negative binomial and gamma distributions that are commonly used for modelling biological data. While expanding our approach, we highlight two useful concepts for biologists, Jensen's inequality and the delta method, both of which help us in understanding the properties of GLMMs. Jensen's inequality has important implications for biologically meaningful interpretation of GLMMs, whereas the delta method allows a general derivation of variance associated with non-Gaussian distributions. We also discuss some special considerations for binomial GLMMs with binary or proportion data. We illustrate the implementation of our extension by worked examples from the field of ecology and evolution in the R environment. However, our method can be used across disciplines and regardless of statistical environments.
Keywords: goodness of fit; heritability; model fit; reliability analysis; repeatability; variance decomposition
Year: 2017 PMID: 28904005 PMCID: PMC5636267 DOI: 10.1098/rsif.2017.0213
Source DB: PubMed Journal: J R Soc Interface ISSN: 1742-5662 Impact factor: 4.118
The observation-level variance for three distributional families (quasi-Poisson, negative binomial and gamma), derived by three different methods: the delta method, the lognormal approximation and the trigamma function, ψ1. Var[ln(x)] = ψ1(ν) when x follows a gamma distribution. In the R environment, the function trigamma can be used to obtain ψ1; also note that ν is known as the shape parameter and κ as the rate parameter of the gamma distribution.
| family | distributional parameters | mean ( | link function | delta method | lognormal approximation | trigamma function |
|---|---|---|---|---|---|---|
| quasi-Poisson (QP) | | | log | | | |
| Poisson | | | square-root | 0.25 | — | |
| negative binomial (NB) | | | log | | | |
| | | | square-root | | — | |
| gamma | | | log | | | |
| | | | inverse | | — | |
| gamma (alternative parameterization) | | | log | | | |
| | | | inverse | | — | |
The distribution-specific (theoretical) variance and the observation-level variance obtained by the delta method for binomial (and Bernoulli) distributions; note that only one of them should be used for obtaining R² and ICC. ‘erf⁻¹’ is the inverse of the Gauss error function, which is often denoted as ‘erf’.
| family | distributional parameters, mean and variance | link name | link function | theoretical (distribution-specific) variance | observation-level variance (min. values and corresponding |
|---|---|---|---|---|---|
| binomial | binomial( | logit | |||
| | | probit | | 1 (standard normal distribution) | |
| | | cloglog | | | |
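The two kinds of variance contrasted in this table can be sketched numerically. The constants below are standard results for the latent distribution implied by each link, stated here as assumptions and not recovered from the table cells; the delta-method expression 1/(p(1 − p)) is the usual first-order approximation for a Bernoulli response under the logit link. All names are hypothetical.

```python
import math

# Variance of the latent-scale distribution implied by each link
# (standard results, assumed here rather than read from the table).
THEORETICAL_VARIANCE = {
    "logit": math.pi ** 2 / 3,    # standard logistic distribution
    "probit": 1.0,                # standard normal distribution
    "cloglog": math.pi ** 2 / 6,  # Gumbel (extreme value) distribution
}


def delta_variance_logit(p: float) -> float:
    """Delta-method observation-level variance for a Bernoulli response, logit link.

    Var[logit(y)] is approximated by p(1 - p) * (1 / (p(1 - p)))**2 = 1 / (p(1 - p)).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return 1.0 / (p * (1.0 - p))


if __name__ == "__main__":
    print(round(THEORETICAL_VARIANCE["logit"], 3))
    print(delta_variance_logit(0.5))
```

Under these assumptions the delta-method value is smallest at p = 0.5, where it equals 4, so for the logit link it always exceeds the theoretical variance π²/3.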
Figure 1. A schematic of how hypothetical datasets are obtained (see the main text for details).
Parameter settings of regression coefficients (b) and variance components (σ²) for five datasets: (1) fecundity, (2) endoparasite, (3) size, (4) exploration and (5) morph; all parameters are set on the latent scale apart from the size data (see below).
| response | intercept ( | sex ( | treatment ( | habitat ( | population ( | container ( | overdispersion ( |
|---|---|---|---|---|---|---|---|
| fecundity: the number of eggs per female | 1.1 | — | 0.5 | 0.1 | 0.4 | 0.05 | 0.1 |
| parasite: the number of endoparasites per individual | 1.8 | −2 | −0.8 | 0.7 | 0.5 | 0.8 | — |
| size: the body length of an individualᵃ | 15 | −3 | 0.4 | 0.15 | 1.3 | 0.3 | 1.2 |
| exploration: the time taken visiting five sectors for an individual | 4 | −1 | 2 | −0.5 | 0.2 | 0.2 | — |
| morph: the colour morph of a male | −0.8 | — | 0.8 | 0.5 | 1.2 | 0.2 | — |
ᵃData for the six sets of models were simulated on the normal (Gaussian) scale but analysed assuming a gamma error structure with the log link, so that estimates of these parameters will be on the log scale; note that the overdispersion variance for these data is the residual variance.
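A dataset of the fecundity type can be simulated from the latent-scale parameters in the fecundity row. The sketch below is a loose illustration, not the authors' simulation: it assumes a Poisson response with a log link, normally distributed population, container and overdispersion effects on the latent scale, 12 populations with 10 containers each and 4 females per container, and an arbitrary alternating assignment of treatment and habitat. The sampling design is described only in the main text, so every design choice here is an assumption, and all function names are hypothetical.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Draw one Poisson variate with Knuth's multiplication method (small lam only)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_fecundity(seed=1, n_pop=12, n_cont_per_pop=10, n_female_per_cont=4):
    """Return rows of (population, container, treatment, habitat, egg count)."""
    rng = random.Random(seed)
    b0, b_treat, b_habitat = 1.1, 0.5, 0.1        # latent-scale coefficients
    var_pop, var_cont, var_over = 0.4, 0.05, 0.1  # latent-scale variances
    rows = []
    for pop in range(n_pop):
        pop_effect = rng.gauss(0.0, math.sqrt(var_pop))
        for cont in range(n_cont_per_pop):
            container_id = pop * n_cont_per_pop + cont
            cont_effect = rng.gauss(0.0, math.sqrt(var_cont))
            treatment = cont % 2      # ASSUMPTION: alternate containers are treated
            for female in range(n_female_per_cont):
                habitat = female % 2  # ASSUMPTION: alternate females come from wet habitat
                overdispersion = rng.gauss(0.0, math.sqrt(var_over))
                eta = (b0 + b_treat * treatment + b_habitat * habitat
                       + pop_effect + cont_effect + overdispersion)
                # Inverse log link gives the Poisson mean.
                eggs = poisson(rng, math.exp(eta))
                rows.append((pop, container_id, treatment, habitat, eggs))
    return rows


if __name__ == "__main__":
    data = simulate_fecundity()
    print(len(data), data[:3])
```

With the default arguments this yields 480 females in 120 containers, which matches the sample sizes quoted for the analysis if fecundity is recorded for females only.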
Mixed-effects model analysis of a simulated dataset estimating variance components and regression slopes for nutrient manipulations on fecundity, endoparasite loads, body length, exploration levels and male morph types; N[population] = 12, N[container] = 120 and N[animal] = 960 (N[male] = N[female] = 480). 95% confidence intervals (CI) were calculated by the confint function in lme4. The observation-level variance was obtained by using the trigamma function. In the morph models, both the observation-level variance and the (theoretical) distribution-specific variance were used; note that the values in brackets use the distribution-specific variance for R² and ICC. ICC[Container] is not a typical ‘repeatability’ but the proportion of variance due to the container effect beyond the population variance.
| | fecundity models (log-link) | | parasite models (log-link) | | size models (log-link) | | exploration models (log-link) | | morph models (logit-link) | |
|---|---|---|---|---|---|---|---|---|---|---|
| model name | null model | full model | null model | full model | null model | full model | null model | full model | null model | full model |
| intercept | 1.630 [1.379, 1.882] | 1.261 [0.989, 1.532] | 0.766 [0.330, 1.202] | 1.752 [1.282, 2.223] | 2.682 [2.616, 2.689] | 2.737 [2.699, 2.775] | 4.752 [4.555, 4.949] | 4.056 [3.842, 4.269] | −0.108 [−0.718, 0.501] | −0.740 [−1.450, −0.030] |
| treatment (experiment) | — | 0.491 [0.391, 0.591] | — | −0.768 [−0.870, −0.667] | — | 0.033 [0.023, 0.044] | — | 2.007 [1.965, 2.050] | — | 0.840 [0.422, 1.258] |
| habitat (wet) | — | 0.152 [0.055, 0.249] | — | 0.700 [0.599, 0.801] | — | 0.009 [−0.001, 0.019] | — | −0.560 [−0.603, −0.518] | — | 0.414 [0.002, 0.826] |
| sex (male) | — | — | — | −2.198 [−2.511, −1.884] | — | −0.213 [−0.230, −0.196] | — | −1.105 [−1.256, −0.955] | — | — |
| population | 0.178 | 0.187 | 0.375 | 0.541 | 0.0026 | 0.0039 | 0.071 | 0.104 | 1.002 | 1.111 |
| container | 0.042 | 0.059 | 1.976 | 0.613 | 0.0140 | 0.0014 | 0.364 | 0.163 | 0.136 | 0.186 |
| observation-level (distribution-specific) | 0.477 | 0.349 | 0.873 | 0.397 | 0.0069 | 0.0064 | 1.664 | 0.118 | 4.010 (3.290) | 4.010 (3.290) |
| fixed factors | — | 0.066 | — | 1.479 | — | 0.0116 | — | 1.393 | — | 0.220 |
| marginal R² (%) | — | 9.96 | — | 48.50 | — | 49.54 | — | 78.34 | — | 3.98 (4.57) |
| conditional R² (%) | — | 46.95 | — | 86.33 | — | 72.52 | — | 93.34 | — | 27.46 (31.55) |
| ICC[Population] (%) | 25.33 | 31.30 | 11.53 | 34.44 | 11.38 | 33.17 | 3.40 | 26.94 | 19.48 (22.64) | 20.95 (24.23) |
| ICC[Container] (%) | 5.94 | 9.79 | 60.80 | 39.02 | 59.57 | 12.37 | 17.34 | 42.34 | 2.64 (3.07) | 3.50 (4.05) |
| AIC | 2498.8 | 2412.3 | 4342.6 | 3920.5 | 3379.9 | 3139.5 | 11223.8 | 9004.3 | 605.5 | 589.6 |
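The R² and ICC percentages in this table can be recomputed from the variance components listed above them. The sketch below assumes the usual variance-partitioning definitions: marginal R² is the fixed-effect variance over the total, conditional R² adds the random-effect variances to the numerator, and each ICC divides one random-effect variance by the sum of the random-effect and observation-level variances, leaving the fixed-effect variance out of the denominator. The function name is hypothetical, and the inputs are the rounded values for the morph full model.

```python
def r2_and_icc(fixed, random_effects, observation):
    """Return marginal R2, conditional R2 and per-component ICCs, all in percent."""
    random_total = sum(random_effects.values())
    total = fixed + random_total + observation
    r2_marginal = 100 * fixed / total
    r2_conditional = 100 * (fixed + random_total) / total
    # The ICC denominator excludes the variance explained by fixed effects.
    icc_denominator = random_total + observation
    icc = {name: 100 * var / icc_denominator for name, var in random_effects.items()}
    return r2_marginal, r2_conditional, icc


# Morph full model, using the rounded variance components from the table.
morph_random = {"population": 1.111, "container": 0.186}

# Observation-level variance (4.010).
r2_m, r2_c, icc = r2_and_icc(0.220, morph_random, 4.010)

# Distribution-specific variance (3.290), the values shown in brackets.
r2_m_ds, r2_c_ds, icc_ds = r2_and_icc(0.220, morph_random, 3.290)

if __name__ == "__main__":
    print(f"{r2_m:.2f} {r2_c:.2f} {icc['population']:.2f} {icc['container']:.2f}")
    print(f"{r2_m_ds:.2f} {r2_c_ds:.2f} {icc_ds['population']:.2f} {icc_ds['container']:.2f}")
```

Because the inputs are rounded to three decimals, the results agree with the tabulated percentages only to within rounding error.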