Jenni Niku, Wesley Brooks, Riki Herliansyah, Francis K C Hui, Sara Taskinen, David I Warton.
Abstract
Generalized linear latent variable models (GLLVMs) are popular tools for modeling multivariate, correlated responses. Such data are often encountered, for instance, in ecological studies, where presence-absences, counts, or biomass of interacting species are collected from a set of sites. Until very recently, the main challenge in fitting GLLVMs has been the lack of computationally efficient estimation methods. For likelihood-based estimation, several closed-form approximations for the marginal likelihood of GLLVMs have been proposed, but efficient implementations of them have been lacking in the literature. To fill this gap, we show in this paper how to obtain computationally convenient estimation algorithms based on a combination of either the Laplace approximation method or the variational approximation method, and automatic optimization techniques implemented in R software. An extensive set of simulation studies is used to assess the performance of the different methods, from which it is shown that the variational approximation method used in conjunction with automatic optimization offers a powerful tool for estimation.
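For orientation, the model class and the quantity being approximated can be written out as below. This is a sketch in commonly used GLLVM notation, not a formula quoted from this record: the symbols $u_i$ (latent variables), $\theta_j$ (loadings), $d$ (number of latent variables), $\Psi$ (all model parameters) and $q$ (variational density) are assumptions introduced here, while $\beta_{0j}$ and $\beta_j$ correspond to the species-specific intercepts and covariate coefficients named in the table captions.

```latex
% GLLVM for response y_{ij} of species j = 1,...,m at site i = 1,...,n
g(\mu_{ij}) = \beta_{0j} + x_i^\top \beta_j + u_i^\top \theta_j,
\qquad u_i \sim N(0, I_d)

% Marginal log-likelihood: the latent variables are integrated out
\ell(\Psi) = \sum_{i=1}^{n} \log \int \prod_{j=1}^{m}
  f(y_{ij} \mid u_i, \Psi)\, \phi(u_i)\, \mathrm{d}u_i

% Variational approximation: maximize a lower bound obtained from Jensen's inequality
\ell(\Psi) \ge \sum_{i=1}^{n} \mathrm{E}_{q}\!\left[
  \sum_{j=1}^{m} \log f(y_{ij} \mid u_i, \Psi) + \log \phi(u_i) - \log q(u_i) \right]
```

The integral has no closed form for non-Gaussian responses. The Laplace approximation replaces the integrand with a Gaussian approximation around its mode, whereas the variational approximation maximizes the lower bound in the last line.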
Year: 2019 PMID: 31042745 PMCID: PMC6493759 DOI: 10.1371/journal.pone.0216129
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Median computation times for negative binomial GLLVMs.
Times for the plain R (gray) and the TMB implementations (black) for the variational approximation (VA, solid line) method and the Laplace approximation (LA, dashed line) method for a negative binomial GLLVM with two covariates and two latent variables. The simulation setup was based on testate amoebae data.
Average biases, root mean squared errors (RMSE), coverage probabilities of 95% confidence intervals and mean confidence interval widths (CI) for negative binomial GLLVM estimates based on the plain R and the TMB implementations for the variational approximation and the Laplace approximation methods.
The true model parameters were obtained by fitting a negative binomial GLLVM with two environmental covariates for the testate amoebae data with counts of m = 48 species recorded at n = 50, 120, 190 and 260 sites. Parameter β0 refers to the species-specific intercepts, β1 and β2 to the coefficients of water pH and water temperature, and log ϕ to the log-transformed dispersion parameters.
| n | Parameter | VA-TMB | | | | LA-TMB | | | |
|---|---|---|---|---|---|---|---|---|---|
| | | Bias | RMSE | Cover | CI | Bias | RMSE | Cover | CI |
| 50 | β0 | -0.32 | 0.85 | 0.94 | 3.09 | -0.92 | 2.24 | 0.93 | 5.14 |
| | β1 | -0.03 | 0.63 | 0.95 | 2.44 | 0.01 | 0.90 | 0.95 | 2.94 |
| | β2 | 0.02 | 0.73 | 0.93 | 2.76 | -0.05 | 0.97 | 0.93 | 3.31 |
| | log ϕ | -0.38 | 0.67 | 0.92 | 2.35 | -2.80 | 5.12 | 0.95 | 76.72 |
| 120 | β0 | -0.05 | 0.49 | 0.94 | 1.78 | -0.33 | 0.99 | 0.95 | 2.53 |
| | β1 | -0.04 | 0.40 | 0.95 | 1.55 | -0.01 | 0.46 | 0.95 | 1.67 |
| | β2 | 0.02 | 0.37 | 0.96 | 1.48 | 0.00 | 0.46 | 0.96 | 1.65 |
| | log ϕ | -0.06 | 0.36 | 0.94 | 1.48 | -0.59 | 1.57 | 0.95 | 5.13 |
| 190 | β0 | 0.03 | 0.40 | 0.92 | 1.36 | -0.19 | 0.62 | 0.96 | 1.80 |
| | β1 | -0.04 | 0.32 | 0.95 | 1.20 | -0.01 | 0.34 | 0.95 | 1.27 |
| | β2 | 0.01 | 0.30 | 0.97 | 1.24 | 0.00 | 0.36 | 0.96 | 1.34 |
| | log ϕ | 0.02 | 0.30 | 0.93 | 1.16 | -0.24 | 0.62 | 0.95 | 1.81 |
| 260 | β0 | 0.07 | 0.36 | 0.91 | 1.15 | -0.13 | 0.46 | 0.96 | 1.46 |
| | β1 | -0.04 | 0.27 | 0.96 | 1.05 | -0.02 | 0.29 | 0.96 | 1.10 |
| | β2 | 0.01 | 0.25 | 0.97 | 1.05 | 0.01 | 0.29 | 0.97 | 1.11 |
| | log ϕ | 0.06 | 0.28 | 0.91 | 0.99 | -0.15 | 0.36 | 0.95 | 1.24 |
| | | VA-R | | | | LA-R | | | |
| 50 | β0 | -0.31 | 0.85 | 0.95 | 3.15 | -0.94 | 2.34 | 0.84 | 4.60 |
| | β1 | -0.03 | 0.63 | 0.95 | 2.48 | -0.00 | 0.86 | 0.72 | 2.18 |
| | β2 | 0.02 | 0.73 | 0.94 | 2.80 | -0.05 | 0.98 | 0.67 | 2.19 |
| | log ϕ | -0.38 | 0.67 | 0.93 | 2.42 | -1.44 | 2.39 | 0.51 | 3.27 |
| 120 | β0 | -0.05 | 0.49 | 0.95 | 1.79 | -0.31 | 0.97 | 0.89 | 2.17 |
| | β1 | -0.04 | 0.40 | 0.95 | 1.56 | -0.02 | 0.48 | 0.79 | 1.54 |
| | β2 | 0.02 | 0.37 | 0.96 | 1.49 | 0.00 | 0.46 | 0.81 | 1.61 |
| | log ϕ | -0.06 | 0.36 | 0.95 | 1.49 | -0.40 | 0.86 | 0.56 | 0.85 |
| 190 | β0 | 0.03 | 0.40 | 0.92 | 1.37 | -0.18 | 0.60 | 0.91 | 1.55 |
| | β1 | -0.04 | 0.32 | 0.95 | 1.20 | -0.02 | 0.39 | 0.77 | 1.22 |
| | β2 | 0.01 | 0.30 | 0.97 | 1.24 | -0.00 | 0.39 | 0.79 | 1.30 |
| | log ϕ | 0.02 | 0.30 | 0.93 | 1.17 | -0.21 | 0.48 | 0.58 | 0.63 |
| 260 | β0 | 0.07 | 0.36 | 0.91 | 1.15 | -0.12 | 0.45 | 0.89 | 1.26 |
| | β1 | -0.04 | 0.27 | 0.96 | 1.05 | -0.03 | 0.39 | 0.71 | 1.04 |
| | β2 | 0.01 | 0.25 | 0.97 | 1.05 | 0.01 | 0.34 | 0.77 | 1.11 |
| | log ϕ | 0.06 | 0.28 | 0.91 | 0.99 | -0.13 | 0.35 | 0.59 | 0.53 |
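The four summary columns in the table above (Bias, RMSE, Cover, CI) can be computed from simulation output roughly as in the following minimal sketch. It assumes Wald-type 95% intervals of the form estimate ± 1.96 × standard error; the interval construction actually used in the study is not stated in this record, and the function name and toy numbers are hypothetical.

```python
import math


def summarize_estimates(true_value, estimates, std_errors, z=1.96):
    """Summarize repeated estimates of one parameter.

    Assumes Wald intervals (estimate +/- z * standard error); this is an
    illustrative assumption, not a detail taken from the source.
    """
    n = len(estimates)
    errors = [est - true_value for est in estimates]
    bias = sum(errors) / n                            # average signed error
    rmse = math.sqrt(sum(e * e for e in errors) / n)  # root mean squared error

    covered = 0
    total_width = 0.0
    for est, se in zip(estimates, std_errors):
        lower, upper = est - z * se, est + z * se
        if lower <= true_value <= upper:
            covered += 1                              # interval contains the truth
        total_width += upper - lower

    return {
        "bias": bias,
        "rmse": rmse,
        "cover": covered / n,                         # empirical coverage
        "ci_width": total_width / n,                  # mean interval width
    }


# Toy data (hypothetical): four estimates of a parameter whose true value is 1.0.
summary = summarize_estimates(1.0, [0.9, 1.1, 1.4, 0.6], [0.1, 0.1, 0.1, 0.1])
print(summary)
```

In a study like the one summarized here, such summaries would additionally be averaged over species and over simulated datasets. A coverage value well below 0.95 together with narrow intervals suggests that the standard errors are too small.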
Scaled mean Procrustes errors of predicted latent variables and estimated latent variable loadings for negative binomial GLLVM estimates based on the plain R and the TMB implementations for the variational approximation and the Laplace approximation methods.
The true model parameters were obtained by fitting a negative binomial GLLVM for the testate amoebae data with counts of m = 48 species recorded at n = 50, 120, 190 and 260 sites.
| n | VA-TMB | | LA-TMB | | VA-R | | LA-R | |
|---|---|---|---|---|---|---|---|---|
| | LVs | Loadings | LVs | Loadings | LVs | Loadings | LVs | Loadings |
| 50 | 0.256 | 0.346 | 0.296 | 0.497 | 0.256 | 0.347 | 0.328 | 0.489 |
| 120 | 0.198 | 0.198 | 0.208 | 0.296 | 0.198 | 0.198 | 0.219 | 0.276 |
| 190 | 0.185 | 0.147 | 0.189 | 0.213 | 0.185 | 0.148 | 0.213 | 0.195 |
| 260 | 0.177 | 0.118 | 0.179 | 0.150 | 0.177 | 0.119 | 0.216 | 0.135 |
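Latent variables and loadings are identified only up to rotation and sign, so estimates are aligned with the true values before an error is measured. The sketch below illustrates the idea for the two-latent-variable case by computing the unscaled orthogonal Procrustes error in closed form. The function name and toy matrices are hypothetical, and the scaling applied to the tabulated values is not specified in this record, so none is applied here.

```python
import math


def procrustes_error_2d(estimated, target):
    """Minimum squared Frobenius distance between `estimated` times Q and
    `target`, over all 2x2 orthogonal matrices Q (rotations and reflections).

    Both arguments are lists of [x, y] rows of equal length.
    """
    # Entries of the 2x2 cross-product matrix M = estimated^T target.
    a = sum(e[0] * t[0] for e, t in zip(estimated, target))
    b = sum(e[0] * t[1] for e, t in zip(estimated, target))
    c = sum(e[1] * t[0] for e, t in zip(estimated, target))
    d = sum(e[1] * t[1] for e, t in zip(estimated, target))

    # The largest achievable trace(Q^T M) equals the sum of the singular
    # values of M; for a 2x2 matrix it is the better of these two cases.
    best_rotation = math.hypot(a + d, b - c)
    best_reflection = math.hypot(a - d, b + c)
    best_trace = max(best_rotation, best_reflection)

    sum_squares = sum(x * x for row in estimated for x in row)
    sum_squares += sum(x * x for row in target for x in row)

    # Clamp tiny negative values caused by floating-point rounding.
    return max(sum_squares - 2.0 * best_trace, 0.0)


# Toy check (hypothetical data): `rotated` is `points` turned by 90 degrees,
# so the error after alignment should be zero.
points = [[1.0, 0.0], [0.0, 2.0], [-1.0, 1.0]]
rotated = [[0.0, -1.0], [2.0, 0.0], [1.0, 1.0]]
perturbed = [[0.0, -1.0], [2.0, 0.0], [1.0, 2.0]]

error_rotated = procrustes_error_2d(points, rotated)
error_perturbed = procrustes_error_2d(points, perturbed)
print(error_rotated, error_perturbed)
```

With more than two latent variables the same quantity is usually obtained from a singular value decomposition of the cross-product matrix instead of the closed form used here.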
Median VE values of negative binomial GLLVMs for 500 simulated datasets using the plain R and the TMB implementations for the variational approximation and the Laplace approximation methods.
The datasets were based on a negative binomial GLLVM fitted for the testate amoebae data with counts of m = 48 species recorded at n = 50, 120, 190 and 260 sites.
| n | VA-TMB | LA-TMB | VA-R | LA-R |
|---|---|---|---|---|
| 50 | 0.27 | 0.19 | 0.27 | 0.21 |
| 120 | 0.48 | 0.43 | 0.42 | 0.29 |
| 190 | 0.53 | 0.50 | 0.53 | 0.35 |
| 260 | 0.56 | 0.54 | 0.56 | 0.39 |
Fig 2. Median computation times for Bernoulli GLLVMs.
Times for the plain R (gray) and the TMB implementations (black) for the variational approximation (VA, solid line) method and the Laplace approximation (LA, dashed line) method for a Bernoulli GLLVM with two latent variables. The left plot is for the model without row effects and the right plot is for the model with random row effects. The simulation setup was based on the Indonesian birds data.
Average biases, root mean squared errors (RMSE), coverage probabilities of 95% confidence intervals and mean confidence interval widths (CI) for Bernoulli GLLVM estimates based on the plain R and the TMB implementations for the variational approximation and the Laplace approximation methods.
The true model parameters were obtained by fitting a Bernoulli GLLVM with probit link function for the Indonesian birds data with presence-absences of m = 30, 60, 100 and 140 species recorded at n = 37 sites.
| m | VA-TMB | | | | LA-TMB | | | |
|---|---|---|---|---|---|---|---|---|
| | Bias | RMSE | Cover | CI | Bias | RMSE | Cover | CI |
| 30 | 0.05 | 0.29 | 0.93 | 1.27 | -4.43 | 18.24 | 0.73 | 5.22 |
| 60 | -0.03 | 0.30 | 0.98 | 1.55 | -0.22 | 7.77 | 0.89 | 5.23 |
| 100 | -0.03 | 0.35 | 0.96 | 1.55 | -0.05 | 5.37 | 0.92 | 3.19 |
| 140 | -0.03 | 0.39 | 0.96 | 1.57 | -0.04 | 1.04 | 0.92 | 2.07 |
| | VA-R | | | | LA-R | | | |
| 30 | 0.05 | 0.29 | 0.93 | 1.27 | -0.01 | 0.46 | 0.81 | 1.31 |
| 60 | -0.03 | 0.30 | 0.98 | 1.54 | -0.14 | 0.67 | 0.83 | 1.57 |
| 100 | -0.03 | 0.35 | 0.96 | 1.55 | -0.12 | 0.95 | 0.84 | 1.69 |
| 140 | -0.03 | 0.39 | 0.96 | 1.56 | -0.10 | 0.94 | 0.83 | 1.49 |
Scaled mean Procrustes errors of predicted latent variables and estimated latent variable loadings for GLLVM estimates based on the plain R and the TMB implementations for the variational approximation and the Laplace approximation methods.
Values are scaled with the number of sites and number of species for comparisons. The true model parameters were obtained by fitting a Bernoulli GLLVM with probit link function for the Indonesian birds data with presence-absences of m = 30, 60, 100 and 140 species recorded at n = 37 sites.
| m | VA-TMB | | LA-TMB | | VA-R | | LA-R | |
|---|---|---|---|---|---|---|---|---|
| | LVs | Loadings | LVs | Loadings | LVs | Loadings | LVs | Loadings |
| 30 | 0.556 | 0.122 | 0.615 | 0.140 | 0.556 | 0.122 | 0.615 | 0.173 |
| 60 | 0.185 | 0.098 | 0.204 | 0.160 | 0.185 | 0.098 | 0.204 | 0.141 |
| 100 | 0.129 | 0.095 | 0.144 | 0.130 | 0.129 | 0.095 | 0.144 | 0.139 |
| 140 | 0.098 | 0.091 | 0.109 | 0.121 | 0.098 | 0.091 | 0.109 | 0.126 |
Median VE values of Bernoulli GLLVMs for 500 simulated datasets using the plain R and the TMB implementations for the variational approximation and the Laplace approximation methods.
The datasets were based on a Bernoulli GLLVM with probit link function fitted for the Indonesian birds data with presence-absences of m = 30, 60, 100 and 140 species recorded at n = 37 sites.
| m | VA-TMB | LA-TMB | VA-R | LA-R |
|---|---|---|---|---|
| 30 | 0.23 | 0.08 | 0.24 | 0.08 |
| 60 | 0.30 | 0.28 | 0.30 | 0.26 |
| 100 | 0.34 | 0.30 | 0.31 | 0.31 |
| 140 | 0.36 | 0.35 | 0.36 | 0.36 |
Fig 3. Differences in log-likelihood values when the starting value strategies res, zero and random are compared to res3.
The true models were based on a negative binomial GLLVM fitted for the testate amoebae data with n = 260 sites and a Bernoulli GLLVM fitted for the Indonesian birds data with m = 140 species. A negative value means that the corresponding starting value strategy performs worse than res3. Note that the columns have different scales.