A marginalized conditional linear model for longitudinal binary data when informative dropout occurs in continuous time.

Abstract

Within the pattern-mixture modeling framework for informative dropout, conditional linear models (CLMs) are a useful approach to deal with dropout that can occur at any point in continuous time (not just at observation times). However, in contrast with selection models, inferences about marginal covariate effects in CLMs are not readily available if nonidentity links are used in the mean structures. In this article, we propose a CLM for long series of longitudinal binary data with marginal covariate effects directly specified. The association between the binary responses and the dropout time is taken into account by modeling the conditional mean of the binary response as well as the dependence between the binary responses given the dropout time. Specifically, parameters in both the conditional mean and dependence models are assumed to be linear or quadratic functions of the dropout time; and the continuous dropout time distribution is left completely unspecified. Inference is fully Bayesian. We illustrate the proposed model using data from a longitudinal study of depression in HIV-infected women, where the strategy of sensitivity analysis based on the extrapolation method is also demonstrated.

Year: 2011 PMID: 22133756 PMCID: PMC3297830 DOI: 10.1093/biostatistics/kxr041

Source DB: PubMed Journal: Biostatistics ISSN: 1465-4644 Impact factor: 5.899

INTRODUCTION

Dropout occurs commonly in longitudinal studies. For example, in the HIV Epidemiology Research Study (HERS), an HIV cohort study of 1310 women conducted from 1993 to 2000, it was of interest to examine the time course of depression (defined as a Center for Epidemiologic Studies Depression Scale score of 16 or greater) in HIV-infected women and other associated factors (Smith ; Ickovics ; Su and Hogan, 2010). At baseline, the HERS women were scheduled to be followed up every 6 months for 12 visits. However, the dropout rate in the HERS was appreciable: among the 753 women who were HIV-infected at baseline and did not die of HIV-related causes during the study period, only 173 had a depression observation at the 12th visit. Moreover, previous studies have suggested that the dropout could be related to disease progression and associated depressive symptoms (Ickovics ; Roy and Daniels, 2008; Su and Hogan, 2010). Because the actual measurement times correspond to assessment dates and vary across women (see Figure 1 of Su and Hogan, 2010), we follow Su and Hogan (2010) and treat dropout in the HERS as occurring in continuous time.

Fig. 1.

Posterior mean estimates of depression prevalence by race and baseline CD4 groups from the mTLV and MCLM fits of the HERS depression data.

When dropout depends on the unobserved response at the time of dropout, or at future times, even after conditioning on the observed data, it is called “informative” or “nonignorable.” To deal with informative dropout, a variety of model-based approaches, including “selection” models (SMs), “pattern mixture” models (PMMs), and “shared parameter” models have been proposed for the joint modeling of the response and dropout processes (Wu and Carroll, 1988, Diggle and Kenward, 1994, Follman and Wu, 1995, Ten Have and others, 1998, Wu and Bailey, 1989, Little, 1993, Little, 1994, Hogan and Laird, 1997, Wulfsohn and Tsiatis, 1997, Henderson and others, 2000, Tsiatis and Davidian, 2004, Ibrahim and Molenberghs, 2009). Semiparametric approaches were also proposed to adjust for the dependence of the dropout time on the unobserved responses (Rotnitzky , Scharfstein , Lin and Ying, 2003, Wilkins and Fitzmaurice, 2007). Within the PMMs framework, conditional linear models (CLMs) by Wu and Bailey (1989) are a useful approach to deal with dropout that can occur at any point in continuous time (not just at observation times). However, one disadvantage of CLMs and PMMs compared with SMs is that their parameters usually lack a direct interpretation in terms of marginal covariate effects if nonidentity link functions are used in the mean structures (Wilkins and Fitzmaurice, 2007, Roy and Daniels, 2008, Su and Hogan, 2010). For some scenarios with only treatment groups and measurement times as the covariates, we can obtain the marginal summaries for covariate strata by averaging the response distributions over the dropout patterns (Fitzmaurice and Laird, 2000, Su and Hogan, 2010). 
When a number of confounders or quantitative covariates are present, a simple summary of the marginal covariate effects might not be immediately available in a CLM or PMM. To overcome this limitation, several PMMs have been proposed. Building upon log-linear models, Wilkins and Fitzmaurice (2006) developed a marginalized PMM for short sequences of binary data, where the conditional dependencies among the responses and between the responses and dropout patterns are specified separately in addition to the marginal mean model. To avoid the proliferation of nuisance parameters in full likelihood approaches, Wilkins and Fitzmaurice (2007) proposed a PMM using the semiparametric moment-based approach. Focusing on the scenarios with many unique dropout patterns, Roy and Daniels (2008) developed a PMM where the marginal mean follows a generalized linear model and the mean conditional on the latent class and random effects is specified separately. However, mainly because of the concerns about sample size per dropout pattern and model parsimony, these models may not be directly applicable to the situation where measurement times are irregular across individuals and dropout can occur at any point in continuous time. In this article, within the Bayesian paradigm, we propose a marginalized conditional linear model (MCLM) to deal with continuous-time informative dropout for long sequences of binary data when the target of inference is the marginal covariate effects. Given the dropout time, models for the mean and dependence (including serial dependence and nondiminishing dependence) structures of the binary responses are specified separately (Heagerty, 2002, Schildcrout and Heagerty, 2007, Roy and Daniels, 2008), while parameters in both models are allowed to depend on the dropout time through linear or quadratic formulations similarly as in the original CLMs. 
One advantage of PMMs and CLMs over others is that the unidentifiable part of the model for extrapolating missing data can be distinguished from those identifiable from the observed data, which facilitates substantive critique and empirical sensitivity analysis (Little and Wang, 1996, Daniels and Hogan, 2000, Daniels and Hogan, 2008, Rotnitzky ). In this article, we will illustrate the unverifiable assumptions in the proposed MCLM and demonstrate sensitivity analysis strategies based on the extrapolation method (Rizopoulos ) using the HERS depression data. The remainder of this article is organized as follows. In Section 2, we introduce the model. Computational details are provided in Section 3. In Section 4, we apply our methods to the HERS depression data and conduct a sensitivity analysis to assess the impact of unverifiable assumptions on the scientific conclusions. Conclusions and discussion follow in Section 5.

MODEL

Let $D_i$ denote the dropout time for the $i$th individual ($i = 1,\ldots,N$). At continuous-time points $t_{i1},\ldots,t_{in_i}$ ($t_{in_i} \le D_i$), we observe the binary responses $Y_i = (Y_{i1},\ldots,Y_{in_i})^T$ and the $n_i \times p$ exogenous covariate matrix $X_i = (x_{i1},\ldots,x_{in_i})^T$ (e.g. external or fixed by study design). When the dropout is informative in the sense that it is related to the unobserved responses given the observed data, we need to jointly model $(Y_i, X_i, D_i)$. Specifically, building on the marginalized transition and latent variable model (mTLV) by Schildcrout and Heagerty (2007) for long series of binary data, we develop an MCLM by allowing the conditional mean and dependence given the dropout time as well as the marginal mean to be separately specified. Our model formulation involves 4 components:

(a) Marginal model for the mean of the $j$th response, $\mu^M_{ij} = E(Y_{ij} \mid x_{ij})$.
(b) Conditional model for the mean of the $j$th response given the dropout time (pattern) $D_i$, $\mu^C_{ij} = E(Y_{ij} \mid x_{ij}, D_i)$.
(c) Dependence model for the responses given the dropout time $D_i$, $E(Y_{ij} \mid Y_{i1},\ldots,Y_{i,j-1}, b_i, x_{ij}, D_i)$, where $b_i$ is an individual-level random intercept.
(d) Marginal model for the dropout time distribution, $f(D_i \mid X_i)$.

To specify (a), we assume that

$$ g(\mu^M_{ij}) = x_{ij}^T \beta, \qquad (2.1) $$

where $g(\cdot)$ is a link function, $j = 1,\ldots,n_i$, and $\beta$ is a $p \times 1$ vector of marginal regression coefficients. Both (b) and (c) capture the association between binary responses and the dropout time. In particular, we assume that

$$ g(\mu^C_{ij}) = \delta_{ij} + z_{ij}^T \alpha(D_i), \qquad (2.2) $$

where $z_{ij}$ is a subset of $x_{ij}$ and $\alpha(\cdot)$ is a $q \times 1$ vector of linear or quadratic functions of the dropout time $D_i$. For identifiability, we use a constraint on $\alpha(\cdot)$ such that $\alpha(T) = 0$, where $T$ indicates the time for study end or the maximum follow-up in the study. Because of the following relationship between (2.1) and (2.2),

$$ \mu^M_{ij} = \int \mu^C_{ij} \, dF(D_i \mid X_i), $$

the $\delta_{ij}$ term is implicitly a function of $\beta$, $\alpha(\cdot)$, the parameters for (d) and the covariates $x_{ij}$. The model in (2.1) is chosen to obtain the desired target of inference: marginal covariate effects. 
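The implicit definition of the intercept in (2.2) can be illustrated numerically. The sketch below is not the authors' MATLAB implementation: it assumes a logit link, a single covariate whose coefficient is linear in the dropout time, and a discrete dropout distribution on a handful of observed times. The function names (`alpha_linear`, `solve_delta`) and all numbers are hypothetical. It solves for the intercept by bisection so that the dropout-averaged conditional mean matches a target marginal mean.

```python
import math

def expit(u):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-u))

def alpha_linear(d, theta, t_max=2093.0):
    """Hypothetical linear coefficient function with alpha(T) = 0."""
    return theta * (d - t_max) / t_max

def solve_delta(marginal_mean, offsets, weights, lo=-30.0, hi=30.0, tol=1e-10):
    """Find delta such that sum_k w_k * expit(delta + offset_k) = marginal_mean.

    offsets: z * alpha(D_k) evaluated at each support point D_k (assumed inputs)
    weights: probabilities of the support points, summing to 1
    """
    def gap(delta):
        avg = sum(w * expit(delta + a) for w, a in zip(weights, offsets))
        return avg - marginal_mean

    # The averaged mean is strictly increasing in delta, so bisection applies.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Illustrative inputs only: three dropout times with made-up weights.
dropout_times = [200.0, 900.0, 2093.0]
weights = [0.3, 0.3, 0.4]
offsets = [alpha_linear(d, theta=-0.5) for d in dropout_times]
target = expit(-0.4)  # marginal mean implied by some x^T beta = -0.4
delta = solve_delta(target, offsets, weights)
```

Because the averaged mean is monotone in the intercept, any bracketing root finder would do; bisection is used here only for transparency.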
The conditional mean model in (2.2) specifies how the response means for individuals differ by their dropout times $D_i$, which is consistent with the specification in the original CLM by Wu and Bailey (1989). In other words, we allow the response mean to depend on the dropout process using a parametric formulation (e.g. linear or quadratic functions) as in a CLM. It must be recognized that unverifiable assumptions in (b) influence the inferences about the parameters in (a). For example, in the HERS example, if $z_{ij}$ includes the time variable $t_{ij}$ and its corresponding coefficient is $\alpha(D_i) = \theta_0 + \theta_1 D_i$, then early dropouts are allowed to have different time slopes of depression from later dropouts. However, here we assume that the time slope before dropout at $D_i$ can be extrapolated to characterize the time slope after dropout, although no data after dropout are available to assess the validity of this assumption. Therefore, sensitivity analysis is required, and we will demonstrate the corresponding strategies using the HERS example in Section 4. The purpose of (c) is to account for the dependence between binary responses within individuals and allow full likelihood-based inference for long series of binary data. Following Schildcrout and Heagerty (2007), we consider both serial dependence with a Markov component and nondiminishing dependence with a random intercept. Specifically, the mean of $Y_{ij}$, conditional on its history $Y_{i1},\ldots,Y_{i,j-1}$, the random intercept $b_i$, the covariates $x_{ij}$ and the dropout time $D_i$, is $\mu^S_{ij} = E(Y_{ij} \mid Y_{i1},\ldots,Y_{i,j-1}, b_i, x_{ij}, D_i) = E(Y_{ij} \mid Y_{i,j-1}, b_i, x_{ij}, D_i)$ and

$$ \mathrm{logit}(\mu^S_{ij}) = \Delta_{ij} + \gamma(D_i)\, Y_{i,j-1} + b_i, \qquad b_i \mid D_i \sim N(0, \sigma^2(D_i)). \qquad (2.3) $$

Although a logit link function is used here, any valid link function can be adopted (Heagerty, 2002). For simplicity, the dependence of $\Delta_{ij}$, $\gamma(D_i)$, and $\sigma^2(D_i)$ on $x_{ij}$ is suppressed for now. Given $b_i$, the log odds ratio $\gamma(D_i)$ measures the serial dependence between $Y_{ij}$ and the immediately preceding response $Y_{i,j-1}$ among those who drop out at $D_i$; $b_i$ introduces the nondiminishing (long-range) dependence between responses within individuals. 
The intercept $\Delta_{ij}$ is determined such that the conditional mean model in (2.2) and the dependence model in (2.3) are simultaneously satisfied (Schildcrout and Heagerty, 2007). In other words, $\Delta_{ij}$ is the solution to

$$ \mu^C_{ij} = E(\mu^S_{ij} \mid x_{ij}, D_i), $$

where the expectation is taken over the distribution of $(Y_{i,j-1}, b_i)$ given $(x_{ij}, D_i)$. Further, the serial dependence measure $\gamma(D_i)$ and random intercept variance $\sigma^2(D_i)$ can be modeled via

$$ \gamma_{ij}(D_i) = w_{ij}^T \phi(D_i), \qquad (2.4) $$

$$ \log \sigma^2_{ij}(D_i) = v_{ij}^T \psi(D_i), \qquad (2.5) $$

where $w_{ij}$ and $v_{ij}$ are subsets of $x_{ij}$, and $\phi(\cdot)$ and $\psi(\cdot)$ are vectors of linear or quadratic functions of the dropout time $D_i$. For example, $w_{ij}$ can include the gap time between 2 consecutive visits, which accommodates irregular spacing of measurement times. $v_{ij}$ can include treatment group membership such that the random intercept variance differs by treatment group, with this treatment effect varying by the dropout time. By allowing the dependence parameters to vary by $D_i$ in (2.3), our MCLM has a different within-individual dependence structure from a CLM that only allows the mean parameters, e.g. in (2.2), to vary by $D_i$. It is well known that with complete data and likelihood-based approaches, properly modeling the within-individual dependence structure can affect the variability estimates more than the point estimates of the mean parameters (Diggle ). However, with missing data, even point estimates can be biased if the dependence structure is not carefully modeled (Kurland and Heagerty, 2004, Daniels and Hogan, 2008). By including covariates and allowing the dependence on the dropout time in the dependence model, we aim to minimize these biases in our approach. Finally, component (d) needs to be specified to complete the joint distribution for $(Y_i, X_i, D_i)$. This can be modeled using any event time distribution, where the dependence on $X_i$ can be checked by standard event time regression analysis methods. Here, we adopt a nonparametric approach and allow $f(D_i \mid X_i)$ to be completely unspecified within the strata of $X_i$. 
Following Su and Hogan (2010), we use Rubin's Bayesian bootstrap (Rubin, 1981) to obtain the posterior of f(D|X) for the observed dropout times (see details in the Supplementary material available at Biostatistics online).
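To make the Bayesian bootstrap step concrete, here is a minimal sketch of one posterior draw of the dropout-time distribution within covariate strata. It follows the standard construction of Rubin's Bayesian bootstrap (Dirichlet(1, ..., 1) weights on the observed values, generated as normalized unit exponentials) and is not taken from the authors' supplementary code. The stratum labels and dropout times are invented for illustration.

```python
import random

def bayesian_bootstrap_weights(n, rng):
    """One Dirichlet(1, ..., 1) draw, obtained by normalizing n Exp(1) variables."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [g / total for g in draws]

def draw_dropout_distribution(times_by_stratum, rng):
    """One posterior draw of f(D | X): weights on the observed dropout
    times, drawn independently within each covariate stratum."""
    return {
        stratum: list(zip(times, bayesian_bootstrap_weights(len(times), rng)))
        for stratum, times in times_by_stratum.items()
    }

# Hypothetical strata and dropout times (days), for illustration only.
observed = {
    ("White", "CD4>200"): [410.0, 1250.0, 2093.0, 2093.0],
    ("Latina/other", "CD4<=200"): [180.0, 365.0, 700.0],
}
rng = random.Random(2011)
one_draw = draw_dropout_distribution(observed, rng)
```

Repeating the draw at each MCMC iteration propagates uncertainty about the dropout-time distribution into any quantity that averages over it, such as the intercept in (2.2).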

COMPUTATIONAL DETAILS

We let $\theta$ denote the set of parameters that characterize the functions $\alpha(\cdot)$ in the conditional mean model in (2.2), let $\lambda$ denote the set of parameters that characterize the dependence model in (2.3–2.5), and let $\pi$ index the dropout time distribution $f(D_i \mid X_i; \pi)$. The likelihood contribution from the response data of the $i$th individual is

$$ L_i(\beta, \theta, \lambda) = \int \prod_{j=1}^{n_i} (\mu^S_{ij})^{Y_{ij}} (1 - \mu^S_{ij})^{1 - Y_{ij}} \, dF(b_i). $$

The posterior distribution for the parameters in an MCLM is proportional to

$$ \prod_{i=1}^{N} L_i(\beta, \theta, \lambda) \, f(D_i \mid X_i; \pi) \; p(\beta, \theta, \lambda, \pi), $$

where $p(\cdot)$ is a prior density function. We follow the specification of the original PMMs in the Bayesian paradigm (Daniels and Hogan, 2008) and assume that the priors for $\pi$ are independent of the priors for $(\beta, \theta, \lambda)$. It follows that $\pi$ is not a part of the posterior for $(\beta, \theta, \lambda)$ and the inference for $\pi$ can be based on the marginal likelihood $\prod_i f(D_i \mid X_i; \pi)$. We standardize the continuous covariates to have mean 0 and standard deviation 0.5 as recommended by Gelman (2008) and assign independent t priors with 7 degrees of freedom and scale 2.5 (Gelman ) to the elements of $\beta$, $\theta$ as well as those serial dependence parameters within $\lambda$ in (2.4). Independent N(0,7) priors are used for random intercept variance parameters (on the log scale) within $\lambda$ in (2.5). The Markov Chain Monte Carlo (MCMC) algorithm for posterior sampling is implemented in MATLAB (version 7.1) and more details can be found in the Supplementary material available at Biostatistics online.
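The covariate scaling and the t prior described above are easy to state in code. The following sketch is an illustration rather than the authors' implementation: it centres a covariate and divides by two standard deviations (giving standard deviation 0.5), and evaluates the log density of a Student t prior with 7 degrees of freedom and scale 2.5. The use of the sample (n - 1) standard deviation and the example visit times are assumptions.

```python
import math
import statistics

def standardize_half_sd(values):
    """Centre and divide by two sample standard deviations (mean 0, SD 0.5)."""
    centre = statistics.mean(values)
    spread = statistics.stdev(values)  # assumption: sample (n - 1) SD
    return [(v - centre) / (2.0 * spread) for v in values]

def log_t_prior(x, df=7.0, scale=2.5):
    """Log density of a zero-centred Student t prior with given df and scale."""
    z = x / scale
    return (
        math.lgamma((df + 1.0) / 2.0)
        - math.lgamma(df / 2.0)
        - 0.5 * math.log(df * math.pi)
        - math.log(scale)
        - (df + 1.0) / 2.0 * math.log1p(z * z / df)
    )

# Hypothetical visit times in days since enrolment.
days = [0.0, 182.0, 371.0, 560.0, 742.0, 1100.0]
days_std = standardize_half_sd(days)
```

Summing `log_t_prior` over the elements of the regression and serial dependence parameters gives their joint log prior under the independence assumption stated above.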

EXAMPLE

As briefly described in Section 1, our goal is to characterize the depression time course for the 753 HERS women. We exclude those women who died due to HIV-related reasons during the study period because we consider that response-related death mixed with dropout (Kurland and Heagerty, 2005) is another problem that needs further research and is beyond the scope of this article. Depression was measured using the Center for Epidemiologic Studies Depression Scale (CES-D), which ranges from 0 to 60 with larger scores indicating the presence of more symptoms. Following Su and Hogan (2010), we focus on the dichotomized CES-D data that commonly define clinically significant depression in HIV research (Radloff, 1977, Ickovics , Cook and others, 2004, Leserman, 2008). The analysis of the continuous and binary HERS CES-D data using the original PMM approach (i.e. the marginal covariate effects are not directly specified) can be found in Sections 4.1 and 4.2 of Su and Hogan (2010). The covariates of interest include baseline characteristics, such as race (Black/White/Latina and others) and initial disease stage (defined as whether the baseline CD4 count is > 200), as well as the time variable (in the unit of days). Following Gelman (2008), the time variable is standardized to have mean 0 and standard deviation 0.5.

Models under comparison

We fit an mTLV (Schildcrout and Heagerty, 2007) and an MCLM to the HERS depression data. Assuming “missingness at random” (MAR) and the prior independence of the parameters in the response model and the dropout time distribution, the missingness is ignorable in the mTLV (Little and Rubin, 2002). In both models, the marginal mean of depression follows the same model (4.1), with marginal regression coefficients β0, …, β7, where I(·) is the indicator function. The quadratic term of the time variable is included to allow more flexibility in characterizing the depression time course. In the mTLV, no conditional mean model given the dropout time is needed, while the dependence structure includes constant first-order serial dependence (parameter γ) and a random intercept for nondiminishing dependence (log-scale variance parameter ψ). The conditional mean model in the MCLM is specified in (4.2), with coefficients θ1, θ2, and θ3, where the standardized dropout time D* = (D − T)/T lies within [−1, 0], and T = 2093 corresponds to the maximum follow-up days in the HERS. The choice of covariates here is based on the analysis reported in Su and Hogan (2010), where regression coefficients for race were found to be relatively constant over the dropout time. We allow the regression coefficients in (4.2) to vary as linear functions of the dropout time; for women who reached maximum follow-up in the HERS, these coefficients are set to 0 for identifiability because we have specified a separate model (4.1) for the marginal mean of depression. Further, both the first-order serial dependence and the nondiminishing dependence are assumed to be linearly related to the dropout time, with intercepts λ0 and λ2 and slopes λ1 and λ3, respectively. Note that if θ1 = θ2 = θ3 = λ1 = λ3 = 0, the MCLM reduces to the mTLV under MAR. To calculate the intercept δ, we need to obtain posterior samples of f(D|X). Initially, we use Cox regression analysis methods to check the relationship between the discrete covariates (race, baseline CD4 count) and the dropout time distribution. 
The Whites and Blacks were less likely to drop out than the Latinas and other races; the patients with baseline CD4 count > 200 were also less likely to drop out. Therefore, we have f(D|X)≠f(D) in the HERS data and the Bayesian bootstrapping for the observed dropout times is conducted within the race and baseline CD4 groups. The priors assigned for β0,β1,β2,β3,β4,β5,β6,β7, θ1,θ2,θ3 and γ, λ0,λ1 are t priors with 7 degrees of freedom and scale 2.5. The N(0,7) priors are used for ψ, λ2, and λ3. For both models, we run 2 MCMC chains and check the convergence after 5000-iteration burn-in period using history plots. The computing time for the mTLV and MCLM fits of the HERS example (6505 observations) is approximately 2 and 8 h per 1000 iterations, respectively, on our machine (2.59 GHz CPU, 32 GB RAM). Pooled posterior samples of size 10000 are used for inference.
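As a reading aid, one specification consistent with the parameter descriptions in this subsection, with the interpretation of θ1, θ2, θ3 given in the Results, and with the rows of Table 1 is sketched below. It is an assumption-laden reconstruction, not the authors' displayed equations. In particular, the attachment of θ1 to the indicator of baseline CD4 > 200, the covariate coding, and the use of log σ² (rather than log σ) on the left-hand side are inferred. The marginal model (4.1) is not sketched because the covariates attached to β0, …, β7 cannot be determined from the text.

```latex
% Hedged sketch only. CD4_i is the baseline CD4 count of woman i,
% t_{ij} the standardized time, and D_i^* = (D_i - T)/T with T = 2093.
\begin{align*}
\operatorname{logit}(\mu^C_{ij}) &= \delta_{ij}
   + \theta_1 D_i^{*}\, I(\mathrm{CD4}_i > 200)
   + \theta_2 D_i^{*}\, t_{ij}\, I(\mathrm{CD4}_i \le 200)
   + \theta_3 D_i^{*}\, t_{ij}\, I(\mathrm{CD4}_i > 200), \\
\gamma(D_i) &= \lambda_0 + \lambda_1 D_i^{*}, \\
\log \sigma^2(D_i) &= \lambda_2 + \lambda_3 D_i^{*}.
\end{align*}
```

Under this reading, every dropout-dependent term vanishes at D* = 0 (women followed to T), and setting θ1 = θ2 = θ3 = λ1 = λ3 = 0 leaves a constant serial dependence λ0 and a constant log variance λ2, which matches the mTLV dependence structure with parameters γ and ψ.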

Results

Table 1 presents the results from both the mTLV and the MCLM. In the MCLM, both the conditional mean regression coefficients and the dependence parameters indicate some associations with the dropout time. Specifically, earlier dropouts are shown to have a larger main effect of baseline CD4 count (θ1 [posterior mean] = −0.22, 95% credible interval (CI) = [−0.77; 0.34]). If their baseline CD4 counts are ≤ 200, earlier dropouts had larger time slopes than later dropouts (θ2 = −0.46, 95% CI = [−1.67; 0.95]), while if their baseline CD4 counts are > 200, later dropouts had larger time slopes than earlier dropouts (θ3 = 0.20, 95% CI = [−0.64; 0.93]). In other words, early dropouts who had severe immunosuppression at baseline (CD4 ≤ 200) tended to have higher rates of change in depression than later dropouts, but for patients who had baseline CD4 counts over 200, this pattern was reversed. However, because women with baseline CD4 > 200 were less likely to drop out, the influence of dropout on the binary responses is relatively small for them. Finally, the first-order serial dependence and nondiminishing dependence are also shown to vary positively with the dropout time (λ1 = 0.67, 95% CI = [−0.36; 1.70]; λ3 = 0.36, 95% CI = [−0.10; 0.87]). Overall, compared with the mTLV fit, the MCLM adjusted the marginal depression prevalence profiles upward in the later period of follow-up for the group with baseline CD4 ≤ 200, and the largest adjustment occurred for the Latina/others group (left panel of Figure 1). On the other hand, the marginal depression prevalence profiles for both the White and Latina/others groups were shifted slightly if their baseline CD4 counts are > 200, but the general time trends remain stable (right panel of Figure 1).

Table 1.

	MCLM				mTLV
Parameter	Mean	SD	2.5%	97.5%	Mean	SD	2.5%	97.5%
β₀	0.28	0.22	−0.15	0.77	0.32	0.18	−0.06	0.63
β₁	−0.19	0.13	−0.45	0.05	−0.26	0.11	−0.47	−0.04
β₂	0.37	0.16	0.05	0.71	0.24	0.14	−0.03	0.53
β₃	0.00	0.21	−0.37	0.39	0.02	0.18	−0.29	0.40
β₄	−0.17	0.21	−0.62	0.18	−0.25	0.18	−0.57	0.09
β₅	−0.59	0.28	−1.12	0.01	−0.66	0.29	−1.18	−0.05
β₆	−0.29	0.08	−0.45	−0.12	−0.28	0.04	−0.37	−0.20
β₇	0.19	0.10	0.00	0.39	0.24	0.10	0.02	0.40
θ₁	−0.22	0.28	−0.77	0.34				
θ₂	−0.46	0.68	−1.67	0.95				
θ₃	0.20	0.39	−0.64	0.93				
λ₀	0.63	0.45	−0.26	1.53				
λ₁	0.67	0.52	−0.36	1.70				
γ					1.19	0.09	1.02	1.36
λ₂	0.26	0.21	−0.17	0.64				
λ₃	0.36	0.25	−0.10	0.87				
ψ					0.55	0.05	0.46	0.66
σ²					1.74	0.09	1.58	1.93

Results from the HERS analysis. The posterior means, standard deviations (SD), and 95% credible intervals (CI) are reported for the marginal regression coefficients, conditional mean parameters, and dependence parameters from the fitted MCLM and mTLV.

Recall that when θ1 = θ2 = θ3 = λ1 = λ3 = 0, the MCLM is reduced to the mTLV under MAR. Therefore, if we assume that MAR is violated, the parameters θ1, θ2, θ3, λ1, and λ3 quantify the degree to which MAR fails to hold. Since the estimated 95% CIs for all these parameters cover zero, there is no strong evidence from the HERS data that the MCLM fit is preferred to the mTLV fit under MAR. The goodness of fit of the MCLM was further assessed by posterior predictive checks based on completed-data plots obtained by multiple imputation of the missing responses (Gelman ; see details in the Supplementary material available at Biostatistics online). In summary, we observed that, regardless of their baseline CD4 counts, Latinas and other race groups had higher depression prevalence over time than Blacks and Whites. Within each race group, women in both baseline CD4 groups had downward trends in depression prevalence over time. There is insufficient evidence from the data to show that these trends differ (see Figure 3).

Fig. 3.

Sensitivity analysis for the MCLM of the HERS depression data: posterior mean estimates of the prevalence difference of depression between baseline CD4 groups (CD4 > 200 vs. CD4 ≤ 200) for White women with fixed values for sensitivity parameters a0 and a1 compared with the results from the mTLV and MCLM (the results for Latinas and Blacks are similar); gray shades represent corresponding pointwise 95% credible bands from the MCLM fit.

Fig. 2. Illustration of the unverifiable assumption made in the MCLM: the horizontal axis represents time since enrollment, the vertical axis represents the conditional mean of depression at the logit scale, and T represents the study end or maximum follow-up. At time d, some participants dropped out of the HERS. Therefore, the depression time slope after d is not estimable from the observed data. In the MCLM, the depression time slope before dropout is extrapolated to the time slope after dropout (the solid line). In the corresponding sensitivity analysis, we allow the time slope after dropout to follow a piecewise linear model (the dashed line). That is, the time slope before dropout is not necessarily equal to the time slope after dropout.

Sensitivity analysis

In the previous section, the mTLV and MCLM appeared to have similar fits to the observed HERS CES-D data. However, the assumptions for extrapolating the missing responses given the observed data are different in these models. In the mTLV, MAR is assumed such that the conditional distribution of missing depression responses given the observed data for those who remained in the study at d is the same as the corresponding conditional distribution for those who left the study at d (Molenberghs and others, 1998), i.e. $f(y_{mis} \mid y_{obs}, x, D > d) = f(y_{mis} \mid y_{obs}, x, D = d)$, where $y_{obs}$ and $y_{mis}$ denote the responses before and after d. In the MCLM, we assume that given the dropout time d and the covariates, missing data after dropout share the same parameters as observed data before dropout. For example, in the HERS example, it is assumed that given their baseline CD4 counts, women with observed dropout at d had the same time slope for t > d as for t ≤ d. This is clear from the illustration in Figure 2. The time slope after dropout cannot be obtained from the observed data and has to be extrapolated in the MCLM. Neither the assumption in the mTLV nor that in the MCLM can be verified from the observed data, so sensitivity analysis is required (Little and Wang, 1996, Daniels and Hogan, 2000, Rotnitzky , Daniels and Hogan, 2008).

Fig. 2.

We demonstrate an example of sensitivity analysis regarding the abovementioned assumption in the MCLM. The strategy of sensitivity analysis for the MCLM can be based on the extrapolation method (Rizopoulos ). We assume a different time slope for t > d, i.e. a continuous piecewise linear model with a change point at d (see Figure 2). For the group with baseline CD4 ≤ 200, the conditional mean model is augmented with a term of the form $\omega_0(D^*)(t - \tilde{d})_+$, where $(x)_+ = x$ if $x > 0$ and 0 otherwise, $\tilde{d}$ is the observed dropout time standardized to the same scale as t, and $\omega_0(D^*)$ is the change in the slope after dropout, which differs across dropout times. The model for baseline CD4 > 200 is similar but with $\omega_1(D^*)$ representing the slope change after dropout. In principle, sensitivity analysis should be based on the parameters that cannot be identified from the observed data, such as $\omega_0(D^*)$ and $\omega_1(D^*)$. We assume a simple functional form for $\omega_0(\cdot)$ and $\omega_1(\cdot)$, indexed by sensitivity parameters a0 and a1. Thus, when D = T is the maximum follow-up, no adjustment is made to the slope after dropout (i.e. for study completers), while the slope is adjusted upward by a0 or a1 when D = 0, that is, when the participants dropped out after the enrolment visit. For example, when a0 = 2 and some HERS women with baseline CD4 ≤ 200 dropped out of the study at 1 year (d = 365), their time slope after dropout is assumed to equal their time slope before dropout plus $\omega_0(D^*)$. In Figure 3, we fix the nonidentifiable parameters a0 and a1 at various combinations of their values and compare the estimated prevalence differences of depression between baseline CD4 groups for White women to check their sensitivity to a0 and a1. The results for Latinas and Blacks are similar. Estimates for the early time period after enrollment are close across all model fits, including the original mTLV and MCLM fits. 
Depending on specific combination of a0 and a1, the baseline CD4 group difference in depression prevalence is adjusted downward or upward at the later follow-up period. However, the pointwise 95% credible bands from the MCLM fit cover all these estimated depression prevalence profiles even when we choose a0 and a1 at relatively large values (i.e. large changes in time slopes after dropout are assumed). In practice, caution needs to be taken about how to choose values or assign priors for sensitivity parameters. In this particular example, we only showed a simple case by setting them as constants (i.e. assign 1–0 point mass prior). Informative priors on sensitivity parameters can also be used based on expert opinions and prior elicitation from previous studies (Daniels and Hogan, 2008).
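The extrapolation-based sensitivity analysis can be illustrated with a small numerical sketch. It assumes, as one simple form consistent with the boundary behaviour described above (no adjustment at D = T, adjustment of a at D = 0), that the slope change is linear in the standardized dropout time, ω(D*) = −a·D*. This functional form, the function names, and the intercept and slope values are assumptions for illustration and are not taken from the paper.

```python
T_MAX = 2093.0  # maximum follow-up in days, as reported for the HERS

def standardized_dropout(d, t_max=T_MAX):
    """D* = (D - T) / T, which lies in [-1, 0]."""
    return (d - t_max) / t_max

def slope_change(a, d_star):
    """Assumed linear form: 0 for completers (D* = 0), a when D* = -1."""
    return -a * d_star

def piecewise_logit(t, intercept, slope, omega, change_point):
    """Continuous piecewise linear predictor with a slope change after dropout."""
    return intercept + slope * t + omega * max(t - change_point, 0.0)

# Dropout at one year with sensitivity parameter a0 = 2.
d_star = standardized_dropout(365.0)
omega0 = slope_change(2.0, d_star)

# Hypothetical pre-dropout intercept and slope on an arbitrary time scale.
before = piecewise_logit(0.5, intercept=-0.3, slope=-0.4, omega=omega0, change_point=1.0)
after = piecewise_logit(2.0, intercept=-0.3, slope=-0.4, omega=omega0, change_point=1.0)
```

Setting `a` to 0 recovers the original MCLM extrapolation, in which the pre-dropout slope is carried forward unchanged; larger values of `a` bend the post-dropout trajectory upward more strongly for earlier dropouts.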

DISCUSSION

We have proposed a new model for dealing with informative dropout that occurs in continuous time. The marginal covariate effects of interest are directly modeled and the relationship between the binary responses and the dropout process is specified using linear or quadratic formulations in both conditional mean and dependence models. In our Bayesian approach, the continuous dropout time distribution is not modeled and its uncertainty is properly taken into account by Bayesian bootstrapping when obtaining marginal covariate effects. In this article, we focused on the scenario with dropouts only. There were 173 HERS women who actually finished 12 scheduled visits. Su and Hogan (2010) distinguished these administratively censored patients from dropouts and allowed them to form a separate pattern in their varying coefficient modeling approach to these data. They found that the parameter estimates for responses from these patients were similar to those from later dropouts (e.g. those who finished 11 visits). Therefore, for simplicity, in the analysis reported in Section 4, we treated the follow-up times of administratively censored patients (ranged from 1952 to 2093 days) the same as the dropout times. In practice, distinguishing administrative censoring from dropouts might be more important when patients have staggered entry and informative dropout is present (Li and Schluchter, 2004). The proposed MCLM can be extended by allowing the parameters to depend on administrative censoring times through linear or quadratic functions, but these functions are distinct from those for dropout times. We have assumed that the relationship between the dropout time and binary responses follows the linear or quadratic formulations. Unspecified smooth functions modeled by penalized splines (Ruppert and others, 2003) can be used to allow more flexibility for this relationship (Hogan and others, 2004, Su and Hogan, 2010). 
However, we found that the estimation of the dependence parameters is usually less stable than that of the mean parameters because of the sparse nature of the binary data. Therefore, incorporating unspecified smooth functions in the mean structure of the MCLM is a more practical extension, and the same penalized spline approach described in Su and Hogan (2010) can be applied straightforwardly.

SUPPLEMENTARY MATERIAL

Supplementary material is available at http://biostatistics.oxfordjournals.org.

FUNDING

The Medical Research Council (UK) (unit programme number U105261167).

28 in total

1. Semiparametric regression analysis of longitudinal data with informative drop-outs.

Authors: D Y Lin; Zhiliang Ying
Journal: Biostatistics Date: 2003-07 Impact factor: 5.899

2. Joint modelling of longitudinal measurements and event time data.

Authors: R Henderson; P Diggle; A Dobson
Journal: Biostatistics Date: 2000-12 Impact factor: 5.899

3. Directly parameterized regression conditioning on being alive: analysis of longitudinal data truncated by deaths.

Authors: Brenda F Kurland; Patrick J Heagerty
Journal: Biostatistics Date: 2005-04 Impact factor: 5.899

4. Scaling regression inputs by dividing by two standard deviations.

Authors: Andrew Gelman
Journal: Stat Med Date: 2008-07-10 Impact factor: 2.373

5. A hybrid model for nonignorable dropout in longitudinal binary responses.

Authors: Kenneth J Wilkins; Garrett M Fitzmaurice
Journal: Biometrics Date: 2006-03 Impact factor: 2.571

6. Mixed effects logistic regression models for longitudinal binary response data with informative drop-out.

Authors: T R Ten Have; A R Kunselman; E P Pulkstenis; J R Landis
Journal: Biometrics Date: 1998-03 Impact factor: 2.571

7. Pattern-mixture models for multivariate incomplete data with covariates.

Authors: R J Little; Y Wang
Journal: Biometrics Date: 1996-03 Impact factor: 2.571

8. A joint model for survival and longitudinal data measured with error.

Authors: M S Wulfsohn; A A Tsiatis
Journal: Biometrics Date: 1997-03 Impact factor: 2.571

9. Mortality, CD4 cell count decline, and depressive symptoms among HIV-seropositive women: longitudinal analysis from the HIV Epidemiology Research Study.

Authors: J R Ickovics; M E Hamburger; D Vlahov; E E Schoenbaum; P Schuman; R J Boland; J Moore
Journal: JAMA Date: 2001-03-21 Impact factor: 56.272

10. Varying-coefficient models for longitudinal processes with continuous-time informative dropout.

Authors: Li Su; Joseph W Hogan
Journal: Biostatistics Date: 2009-10-15 Impact factor: 5.899
