Literature DB >> 28638204

Modern methods for longitudinal data analysis, capabilities, caveats and cautions.

Lin Ge¹, Justin X Tu², Hui Zhang³, Hongyue Wang¹, Hua He⁴, Douglas Gunzler⁵.

Abstract

Longitudinal studies are used in mental health research and services studies. The dominant approaches for longitudinal data analysis are the generalized linear mixed-effects models (GLMM) and the weighted generalized estimating equations (WGEE). Although both classes of models have been extensively published and widely applied, differences between and limitations about these methods are not clearly delineated and well documented. Unfortunately, some of the differences and limitations carry significant implications for reporting, comparing and interpreting research findings. In this report, we review both major approaches for longitudinal data analysis and highlight their similarities and major differences. We focus on comparison of the two classes of models in terms of model assumptions, model parameter interpretation, applicability and limitations, using both real and simulated data. We discuss caveats and cautions when applying the two different approaches to real study data.

Entities: Chemical Disease Gene Species

Keywords: R; SAS; binary variables; correlated outcomes; generalized linear mixed-effects models; latent variable models; weighted generalized estimating equations; 二分类变量; 加权广义估计方程; 广义线性混合效应模型; 潜变量模型; 相关结果

Year: 2016 PMID： 28638204 PMCID： PMC5434286 DOI： 10.11919/j.issn.1002-0829.216081

Source DB: PubMed Journal: Shanghai Arch Psychiatry ISSN： 1002-0829

1. Introduction

Longitudinal study designs have become increasingly popular in research and practice across all disciplines. Such designs capture both between-individual differences and within-subject dynamics, providing opportunities to study complex biological, psychological and behavioral changes over time such as causal treatment effects and mechanisms of change [. Since longitudinal study designs create serial correlations over repeated assessments from same subjects, traditional statistical methods for cross-sectional data analysis such as linear and logistic regression do not apply. In addition, since longitudinal studies are typically of long duration, missing data is common. Specialized models and methods must be used to address the two major issues. The two dominant approaches for longitudinal data analysis are the generalized linear mixed-effects model (GLMM) and weighted generalized estimating equations (WGEE)[. Both methods are derived from the same class of models for cross-sectional data, the generalized linear models (GLM). Because different techniques are used to extend the GLM to longitudinal data, the GLMM and WGEE are quite different and some of the differences carry significant implications for their applicability and interpretation of study findings. Despite the existence of an extremely large body of literature discussing the development and application of the two approaches, practitioners still often face many difficult questions when choosing and applying such models to real study data. For example, which approach is applied given data from a study? Do the two approaches yield identical estimates and/or inferences? If not, how should one approach and interpret such differences? What are the pros and cons associated with each approach? Although some questions have well-documented answers in the literatures, others have only been approached recently and still await answers. In this report, we first give an overview of the approaches and then discuss major differences between the two classes of models. Unlike the literature on the discussion of the two methods, we focus on their practical implications, which we think provide useful guidance for practitioners for selecting right approaches for their studies and effectively addressing their study questions.

2. Models for Longitudinal Data

Since both the GLMM and WGEE are extensions of the GLM, we start with a brief overview of the latter.

2.1 Generalized Linear Models (GLM)

Consider a sample of n subjects and let Yi (Xi) denote a continuous response (a vector of explanatory variables). The classic linear model is given by: where N(μ, σ2) denotes a normal distribution with mean μ and σ2 variance. The linear model is widely used in research and practice. One major limitation is that it only applies to continuous response Yi. The generalized linear models (GLM) extend the classic linear model to non-continuous response such as binary. To express the GLM, we first rewrite the linear regression in (1) as where y | X denotes the conditional distribution of Yi given Xi and E(y |x) denotes the conditional mean of Yi given Xi. By replacing the normal in (2) with other distributions appropriate for the type of response, we obtain the class of GLM: where f(μ) denotes some distribution with mean μ and g(μ) is a function of μ. Since g(μ) links the mean to the explanatory variables, g(μ) is called the link function. The specification of f(μ) and g(μ) depend on the type of response Yi. For a binary Yi, f(μ) is the Bernoulli distribution and g(μ) is often set as the function logit . The resulting GLM is the logistic regression. For a count Yi, a popular choice for f(μ) is the Poisson distribution and g(μ) is the log function, g(μ) =log(μ). Another choice for f(μ) is the negative binomial. A major limitation of Poisson is that its variance is the same as its mean. Count responses arising in many real studies often have variances larger than means, a phenomenon known as “overdispersion” [. The negative binomial is similar to the Poisson, but unlike the Poisson, allows for overdispersed count responses [. Inference for GLM can be based on maximum likelihood (ML) or estimating equations (EE). The classic ML provides most efficient estimates, if the response Yi follows the specified distribution such as the normal in the linear regression in (1). In many studies, it may be difficult to specify the right distribution, in which case the ML will yield biased estimates if the specified distribution does not match the data distribution. The modern alternative EE uses an approach for inference that does not require specification of a mathematical distribution for Yi, thereby providing valid inference for a wider class of data distribution. Since no distribution is required under EE, we may also express the GLM in this case as: or simply or equivalently where h = g–1 is the inverse of g(μ). When specified without the distribution component, (4), (5) or (6) are also called the semi-parametric GLM. In comparison, (3) is called the parametric GLM. Example 1. In a suicide study, we may model number of suicide attempts, Yi, a count response, using a parametric GLM: where Xi denotes a set of explanatory variables such as age, physical problems and history of depression. If concerned about overdispersion, such as indicated empirically by a much larger variance as compared to the mean, the semi-parametric GLM below may be used Unlike the parametric model in (7), the semi-parametric GLM above provides valid inference regardless of presence of over-dispersion.

2.2 Extension of GLM to longitudinal data

1.1.1 Weighted Generalized Estimating Equations (WGEE)

Consider a longitudinal study with T time points and let Yit and Xit denote the same response and predictors/covariates as in the cross-sectional setting, but with t indicating their dependence on the time of assessment (1 We can then get estimates of βt for each time point t. However, it is difficult to interpret different βt across the different time points. Moreover, it is technically challenging to combine estimates of βt to test hypotheses concerning temporal trends because of interdependence between such estimates. The WGEE addresses the aforementioned difficulties by using a single estimate β to model changes over time based on multiple assessment times[. Since the WGEE estimates β using a set of equations that do not reply on assumed distribution f(μ) in (8), the first part of the GLM can be removed and the resulting model becomes By comparing the WGEE above with the model in (4), it is seen that the WGEE is an extension of the semi-parametric GLM to longitudinal data. The key difference between (4) and (8) is that (8) is not simply an application of GLM to each of the time points, but rather an extension of the model in (4) to provide a single parameter vector β for easy interpretation and estimate this parameter vector by using data from all time points and accounting for correlations between the repeated assessments. Like the semi-parametric GLM, the WGEE provides valid inference for a wider class of data distributions.

2.3 Generalized Linear Mixed-effects Models (GLMM)

The GLMM extends the GLM to longitudinal data analysis using a completely different approach. Consider again a longitudinal study with T time points and let Yit and Xit denote the same response and predictors/covariates as in the WGEE above. The GLMM is specified by: where N(μ, N (μ, Σ) denotes a multivariate normal with mean μ and variance Σ, Zit is a vector of predictors/covariates (often set equal to Xit), and g(μ) is the appropriate link function for the type of response Yit. The vector of latent variables, bi, is called the random effects, denoting individual differences from the population mean bi, which is known as the fixed effects. Although β̃ is typically assumed to follow a multivariate normal as in (3), other types of distributions may also be considered. Unlike the WGEE, the GLMM accommodates correlated responses Yit by directly modeling their joint distribution. Since multivariate distributions are extremely complex except for the multivariate normal, latent variables bi are generally employed to model the correlated responses. Thus, although Yit is still modeled for each time point t, by including the random effect b, in the specification of the conditional distribution of Yit given bi (Xit and Zit), the GLMM in (9) allows the resulting Yit ‘s to be correlated (conditional on Xit and Zit only). This approach allows one to specify multivariate distributions using familiar univariate distributions such as the Bernoulli (for binary responses) and the Poisson (for count responses).

2.4 Key differences between GLMM and WGEE

Although both WGEE and GLMM are extensions of the GLM, the two approaches are quite different because of the way the extensions are accomplished. In this section, we discuss such key differences.

2.4.1 Interpretation of Model Parameters

A fundamental difference between the WGEE and GLMM is in interpretation of model parameters. As noted earlier, the WGEE is an extension of the semi-parametric GLM, while the GLMM is an extension of the parametric GLM. Although the parameter vector β has the same interpretation between the parametric (1) and semi-parametric (4) GLM, the parameter vectors β in the WGEE (8) and β in the GLMM (9) are generally different except for the linear regression. Example 2. Consider the model in Example 1, but now assume a longitudinal study with T assessment times. Let Yit and Xit denote the longitudinal versions of Yi and Xi. Again, if Yit at each point is modeled to follow a Poisson, the GLMM has the form: On the other hand, the WGEE for a count response has the form The two models look quite the same, except for the additional random effect in (10). So, to compare the two models, we integrate out bi in (10) and calculate the conditional expectation E (y | x) to obtain (see Zhang et al., 2012 for derivationμ] This conditional expectation is different from the expression in (11) for the WGEE unless In some special cases, 3 and [3 may be identical except for the intercept term. For example, if z is independent of xit, then Thus (12) reduces to Except for the intercept term, β and i have the same interpretation. The next Example illustrates this special case with data from a real study. Example 3. The COMBINE study was a multi-site randomized clinical trial with longitudinal follow-up to compare intervention effects between nine pharmacological and/or psychosocial treatments. For illustration purposes, we combined all the 9 treatment conditions and focused on the two drinking outcomes, days of any drinking and days of heavy drinking over the past month, at three visits during treatment at weeks 8, 16 and 26. Denote the three visits by 1 Since Zi= 1 and Xi1t=1 are both constants, they are independent. We fit a WGEE with the same predictors Xit=(Xi1t, Xi2t, Xi3t)T as in the fixed effect of GLMM above. As indicated in Example 2, β̃0 differs from its WGEE counterpart βo by a constant , whereas β̃1 and β̃2 retain the same scales as the corresponding β1 and β2 in the WGEE model. Shown in Table 1 are the estimates of β and β̃ and associated standard errors, and test statistics and associated p-values for both models. As expected, estimates of β̃1 and β̃2 were quite close to the respective WGEE estimates β1i and β2, but those of were much smaller (in magnitude) than their WGEE counterparts (0.694 vs. 1.908 for Days of Any Drinking and -0.30 vs. 1.307 for Days of Heavy Drinking).

Table 1.

Estimates of reference level at Visit 1 (β̃0 for WGEE and β0 for GLMM) and change from reference level at Visit 2 (β1 and β̃1) and at Visit 3 (β2 and β̃2), along with associated standard errors, for Days of Any Drinking and Days of Heavy Drinking for the real COMBINE Study in Example 3.

Comparison of Estimates (Standard Errors) between GEE and GLMM
	Visit 1(β₀ or β̃₀)	Visit 2(β₁ or β̃₁)	Visit 3(β₂ or β̃₂)
Model Fit	Days of Any Drinking
GEE(β)	1.908 (3.6)	0.144 (2.2)	0.144 (2.8)
GLMM(β̃)	0.694 (6.2)	0.144 (1.5)	0.144 (1.5)
	Days of Heavy Drinking
GEE(β)	1.307 (4.7)	0.224 (3.3)	0.216 (4.1)
GLMM(β̃)	-0.30 (7.3)	0.224 (1.9)	0.216 (1.9)

Since β̃1 and β̃2 (β1 and β2) represent changes relative to the reference level β̃0(β0) from Visit 1 to Visit 2 and from Visit 1 to Visit 3 under GLMM (WGEE), the difference between the estimated β̃0 and β0 implies not only different means at visit 1, but also at visits 2 and 3 for each drinking outcome. For example, for the outcome of Days of Any Drinking, the WGEE estimates indicate a mean of 6. 7, 7.7 and 8.9 days of any drink over the three visits, much lower than 1.9, 2. 3 and 2.7 days of any drink at the corresponding visits estimated by the GLMM. The WGEE estimates are actually identical to sample means of this drinking outcome at each visit, but the GLMM estimates are not, making the latter difficult to interpret. Example 4. In Example 2, now suppose that yMit is binary and is modeled by a GLMM for binary response as follows The corresponding WGEE has the form As in Example 2, the two models look the same, except for the additional random effect in (14). To compare the two, we again need to integrate out bi in (14). In general, it is difficult to obtain a closed-form expression for the resulting integral. But, in the special case when Zit is a subset of Xit and Zit =Zit, the integral can be expressed as (see Zhang, Xia et al., 2011 for derivation)[ Thus β̃0, is a rescaled β. Even in this special case, it may not easy to interpret β̃, as demonstrated by a real study example next. Example 5. In a recent study on smoking cessation, 276 subjects participated in multi-component program adapted to seriously mentally ill patients within an outpatient mental health clinic. Out of these subjects, 99 also participated in a formal evaluation, in which interviews were conducted at the point of enrollment (baseline) and again at 3, 6 and 12 months. A primary outcome of the study is the 7-day point prevalence abstinence (defined as no smoking at all in the previous 7 days). We modeled changes of this longitudinal binary abstinence outcome using data from the 99 subjects. We applied both longitudinal approaches and used three dummy variables to model the rate of 7-day point prevalence abstinence at each of the follow-up times for the WGEE and also included a random intercept for the GLMM model. Thus, the GLMM has the form while the WGEE is given by: Shown in Table 2 are the estimates of β̃ and i and associated standard errors, and test statistics and associated p-values for both models. The estimates were quite different between the two models, although the ratio , a constant for all 1

Table 2.

Estimates of parameters, standard errors, p-values and rates of 7-day point prevalence abstinence at baseline and 3 follow-up visits from the WGEE and GLMM models for the real Smoking Cessation Study in Example 5.

Parameters	Estimates (Standard Errors)		p-values		Rates of 7-day point prevalence abstinence
	WGEE	GLMM	GEE	GLMM	GEE	GLMM
Baseline (3 or 3)	4.05 (0.92)	5.47 (1.31)	<.001	<.001	0.076	0.021
Month 3 (3₀ or 3 ₀)	0.87 (0.42)	1.27 (0.56)	0.036	0.026	0.164	0.070
Month 6 (3, or 3 J	1.01 (0.42)	1.49 (0.56)	0.016	0.009	0.184	0.086
Month 12 (3₂ or 3 ₂)	0.55 (0.45)	0.79 (0.57)	0.217	0.173	0.124	0.044

Also, since H : β =0 under one model implies that H holds true for the corresponding coefficient under the other model, same conclusions were obtained regarding statistical significance of the parameters, although the p-values under the two models were slightly different. The scale difference in the parameters between the two models did have serious implications for the interpretation of estimates from the GLMM. Shown in the last two columns of Table 2 are the rates of 7-day point prevalence abstinence over time estimated by each of the two models. Since the WGEE becomes the logistic regression at each assessment time, it yields estimates identical to the observed rates of 7-day point prevalence abstinence. Estimates from the GLMM were quite different, underestimating the observed rates by over 50%. Even in this simplest case where estimates of GLMM are a scale shift of their GEE counterparts, GLMM estimates are difficult to interpret.

2.4.2 Computational Issues for GLMM

Inference for the WGEE is based on a set of equations. Although it is generally not possible to express solutions (estimates) in closed-form, the equations are readily solved numerically [. Inference for the GLMM, however, is much more challenging. Since the likelihood function arising from the GLMM is generally quite complex, involving multidimensional integrals, it is difficult to directly maximize this function except for the special linear models case for continuous outcomes. Different approaches have been proposed to address the computational issues when modeling non-continuous responses using the GLMM. The approach implemented in most major software such as SAS and R is integral approximation. This approach first approximates the log-likelihood function and then maximizes the approximated function using the Newton and Gauss-Hermite quadratures. However, studies have shown that the approach does not work well as one would expect [, especially for modeling binary responses. Below, we highlight some key findings from these studies. Example 6. Zhang, Lu and Feng et al. (2011) [ examined the computational issues by fitting GLMM for simulated binary responses using both SAS and R. They simulated an explanatory variable Xi and the response Yit from the GLMM: where β0=β1=1 and τ=0.001 and 2. For τ=0.001, the within-subject correlation was very small and thus negligible, making theYit’s almost independent. For τ=2, the within-subject correlations was about 0.5. They simulated data from the GLMM in (17) with a sample size n=500, fit the same model to the simulated data using R and SAS, and repeated the process 1,000 times. Shown in Table 3 are the estimates of parameter vector β (under “β0=1” and “β1=1” based on averaging over 1,000 estimates from fitting the model to each simulated data) and associated standard errors (under S.E. × 10 based on sample standard deviations from the Monte Carlo 1,000 estimates of β) from fitting SAS NLMIXED procedure and R lme4 function. For t=0.001, the estimates were quite similar between the two, but for t=2, the SAS NLMIXED procedure provided more accurate estimates. There were more pronounced differences in the standard errors between the two, with the R lme4 consistently yielding lower standard errors than the SAS NILMIXED. Such differences play a significant role in hypothesis testing.

Table 3.

Estimates of parameters, standard errors, Type I error rates (for testing null: H0: Pμ1 from SAS NLMIXED and R lme4 procedures for two within-subject correlation cases t=0.001 and t=0.0001) for the simulation study in Example 6.

τ	Software	Type I error for testing null H₀: β₁=1	β₀=1	S.E.	β₁=1	S.E.
0.001	SAS NLMIXED	0.046	1.025	0.075	1.029	0.085
	R lme4	0.098	1.022	0.067	1.029	0.075
2	SAS NLMIXED	0.066	0.992	0.125	0.976	0.166
	R lme4	0.256	0.983	0.123	0.912	0.141

Shown in Table 3 under “Type I error” are the type I error rates for testing the hypothesis, H0 : β1 = 1 vs. Ha : β1 ≠ 1 based on the Wald statistic from the SAS and R procedures. The Wald statistic is where is the standard error for the estimate. The type I error rate was calculated as: where q1,0.95 is the 95th percentile of χ21 and T denotes the Wald statistic for testing the hypothesis based on th simulated data (1 ≤ m ≤1,000). Since the null hypothesis H : β1 = 1 is true, the type I error rate α̂ should be close to the nominal value a=0.05, if the SAS and R procedures provided correct inference. Although the SAS NLMIXED did yield type I error rates close to the nominal value, R lme4’s estimates were inflated, especially for the case with higher within- subject correlation t=2. In addition to the SAS and R procedures, Zhang, Lu and Feng et al. (2011) [ also considered other procedures in SAS and R and found that none provided correct estimates. Their conclusion was that the SAS NLMIXED procedure provided more accurate estimates and type I error rates. However, more recent studies by Chen, Knox, Arora et al. (2016) [ and Chen, Lu, Arora et al. (2016)[ show that this SAS procedure did not provide correct estimates either. Example 7. Chen, Knox, Arora et al. (2016) [ considered clustered binary responses arising from multi-center studies. They modeled and simulated clustered binary responses from the following GLMM where K denotes the number of study sites, n number of subjects within each site (cluster size), Xkj is a binary variable indicating treatment condition assigned to the th subject within the k th site, and is the random effect accounting for correlations between responses y from subjects with the k th site. They considered testing the hypothesis: where p = E(y | x = j) is the mean response y (or proportion of y = 1) for the treatment group. The parameters β0 and β1 in (18) are related to p0 and p in (19) as follows: Thus given (19) and σ2λ, we can solve the above for β0 and β1 (numerically). Chen, Knox, Arora et al. (2016) [ considered σ2λ = 0, σ2λ = 0.1 and σ2λ = 1, with K = 20 clusters and n = 25 within each cluster. For each σ2λ, they obtained β0 and β1 and simulated data from the GLMM in (18) under H : p 0.05. They fit the model (18) to the data generated using SAS NLMIXED, tested the null H 0.10 and rejected the null if the p-value is larger than α = 0.05. The process was repeated 1,000 times and power was estimated by the percent of times the null was rejected. Shown in Table 4 are power estimates under “NLMIXED” for the three cases of σ2λ, along with true power values under (“True”) obtained using a different approach developed in Chen, Knox, Arora et al. (2016)[. As seen, power estimates from obtained from the SAS NLMIXED were quite different from the true power for each case of σ2λ, all underestimating the true power. Note that when σ2λ = 0, there is no data cluster and power only depends on the total sample size Kn. In this case, we can also obtain power estimates by using instead the SAS LOGISTIC NLMIXED for non-clustered data. This is indeed the case, since Chen, Knox, Arora et al. (2016) [ reported that they obtained power estimates similar to 0.561 when running the Monte Carlo substitution to estimate power using the SAS LOGISTIC.

Table 4.

Power estimates from SAS NLMIXED along with true power values for two data clustering cases (σ2λ=0.1 and σ2λ=1) for the simulation study in Example 7.

K	n	σ²_λ=0		σ²_λ=0.1		σ²_λ=1
K	n	True	T NL-MIXED	True	NL-MIXED	True	T NL-MIXED
50	10		0.561	0.565	0.546	0.598	0.473
20	25	0.561 0.275	0.561	0.336	0.590	0.455

3. Discussion

In this report, we discussed the two most popular regression models for longitudinal data. We focused on interpretation and computation of model parameters. For parameter interpretation, we discussed differences between the GLMM and WGEE when applied to model binary and count responses. Since parameters from the two longitudinal models are generally quite different, we should not expect similar estimates when applying the two models to real study data. Moreover, except for some special cases, it is generally not possible to find a relationship between estimates from the two models. Our analysis indicates that GLMM estimates can be quite difficult to interpret, while WGEE estimates afford straightforward interpretation. Another major issue with the GLMM is to obtain reliable estimates using existing software. It seems that even software giants like SAS cannot provide correct estimates when applying GLMM to model binary responses. Until the computational problem is resolved, one may want to consider applying WGEE when modeling longitudinal binary responses. We focused on the binary and count response when discussing interpretation of model parameters in this report. When modeling continuous responses, the two longitudinal models have the same interpretation for their model parameters and thus the interpretational issue does not arise. The computational problem seems only relevant to binary responses. For continuous responses, the log-likelihood function can be solved accurately. For count responses, major software such as R and SAS seem to provide quite reliable estimates.

2 in total

1. On fitting generalized linear mixed-effects models for binary responses using different statistical packages.

Authors: Hui Zhang; Naiji Lu; Changyong Feng; Sally W Thurston; Yinglin Xia; Liang Zhu; Xin M Tu
Journal: Stat Med Date: 2011-06-10 Impact factor: 2.373

2. A class of distribution-free models for longitudinal mediation analysis.

Authors: D Gunzler; W Tang; N Lu; P Wu; X M Tu
Journal: Psychometrika Date: 2013-11-22 Impact factor: 2.500

2 in total