A corrected formulation for marginal inference derived from two-part mixed models for longitudinal semi-continuous data.

Brian Dm Tom¹, Li Su², Vernon T Farewell².

Abstract

For semi-continuous data which are a mixture of true zeros and continuously distributed positive values, the use of two-part mixed models provides a convenient modelling framework. However, deriving population-averaged (marginal) effects from such models is not always straightforward. Su et al. presented a model that provided convenient estimation of marginal effects for the logistic component of the two-part model but the specification of marginal effects for the continuous part of the model presented in that paper was based on an incorrect formulation. We present a corrected formulation and additionally explore the use of the two-part model for inferences on the overall marginal mean, which may be of more practical relevance in our application and more generally.

Keywords: bridge distribution; excess zeros; longitudinal data; random effects

Substances: HLA-B27 Antigen

Year: 2013; PMID: 24201470; PMCID: PMC5051603; DOI: 10.1177/0962280213509798

Source DB: PubMed; Journal: Stat Methods Med Res; ISSN: 0962-2802; Impact factor: 3.021

1 Introduction

In Su et al.,[1] we described a two-part marginal model for longitudinal semi-continuous data that are a mixture of true zeros and continuously distributed positive values. Our likelihood-based model had an underlying two-part mixed model, where, in the random effects logistic regression for the first part (i.e. the binary part), the random intercept was assumed to follow the bridge distribution of Wang and Louis.[2] A zero-mean normal random intercept was included in the linear mixed modelling structure of the second part (i.e. the continuous part). Our primary focus was to ensure that the regression parameters in the binary part of the two-part marginal model were interpretable after integration over the random effects distribution. Marginal covariate effects on the expected value of the response for the population of observed non-zero responses may, however, also be of interest. These arise directly from the approaches in Moulton et al.,[3] Lu et al.,[4] Hall and Zhang[5] and Yang and Simpson[6] and involve well-defined integrations (see refs[7,8]). However, when discussing the continuous part of our model, we assumed, as did Tooze et al.,[9] that integrating out the random effects was straightforward, and that the form of the relationship between covariates and the marginal mean of the response, given that it is positive, was the same as the conditional mean given a positive response and random effects. Unfortunately, this is not the case. This paper rectifies this error and explores the use of the proposed model when the target of inference is the overall marginal mean, which may be of most practical relevance.

2 Marginal inference from two-part models

2.1 Model

Our two-part marginal model[1] is based on the original two-part mixed modelling framework introduced in Olsen and Schafer[10] and Tooze et al.[9] and the random effects specifications in Lin et al.[11] Briefly, let $Y_{it}$ be a semi-continuous variable for the $i$th ($i = 1, \ldots, N$) subject at time $t$ ($t = 1, \ldots, n_i$). Let $X_{it}$ and $X_{it}^{*}$ be the covariate vectors (possibly overlapping) associated with the $i$th subject at time $t$ in the two parts of the two-part mixed model. Let $B_i$ and $V_i$ be correlated subject-level random intercepts, which are independent of the covariates. Define also $X_i = (X_{i1}, \ldots, X_{in_i})$ and $X_i^{*} = (X_{i1}^{*}, \ldots, X_{in_i}^{*})$. $Y_{it}$ can be represented by two variables, the occurrence variable $Z_{it} = I(Y_{it} > 0)$ and the intensity variable $g(Y_{it})$ given that $Y_{it} > 0$, where $g(\cdot)$ is a (monotonic) transformation making $g(Y_{it})$, given $Y_{it} > 0$, normally distributed with a subject–time-specific mean.

The distribution of $Y_{it}$ is formulated by assuming, firstly, that $Z_{it}$ is specified by a random effects logistic regression with

$$\operatorname{logit}\{\Pr(Z_{it} = 1 \mid X_{it}, B_i)\} = X_{it}\theta^{c} + B_i,$$

where $X_{it}$ is a covariate vector, $\theta^{c}$ is a (conditional) regression coefficient vector and $B_i$ is the subject-level random intercept in this first part (i.e. the binary part) assumed to follow the (symmetric) mean zero bridge distribution of Wang and Louis[2] with unknown parameter ϕ ($0 < \phi < 1$). Next, the intensity variable $g(Y_{it})$ given $Y_{it} > 0$ is assumed to have the linear mixed modelling structure described by

$$g(Y_{it}) = X_{it}^{*}\beta + V_i + \epsilon_{it},$$

where $X_{it}^{*}$ is a covariate vector, β is a regression coefficient vector and $V_i$ is the subject-level random intercept for the second part (i.e. the continuous part) assumed $N(0, \sigma_v^2)$. The error term $\epsilon_{it}$ is assumed to be $N(0, \sigma_e^2)$ and independent of the random effects. The random effects, $B_i$ and $V_i$, are assumed to be correlated with their joint distribution specified through a Gaussian copula transformation model, where the correlation of the underlying Gaussian random variables is ρ (see Supplementary Material). The covariate vectors $X_{it}$ and $X_{it}^{*}$ may coincide, but this is not required.
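The data-generating mechanism described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code (their R code is in the Supplementary Material). It assumes a logarithmic transformation for the continuous part, a single binary subject-level covariate used in both parts, and the closed-form quantile function of the bridge distribution for the logit link; all function names and parameter values are hypothetical.

```python
import math
import random


def expit(x):
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def norm_cdf(z):
    """Standard normal CDF, accurate in both tails."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def bridge_from_normal(z, phi):
    """Map a standard normal value z to the bridge scale (Gaussian copula).

    Uses the bridge quantile function for the logit link,
    Q(u) = log(sin(phi*pi*u) / sin(phi*pi*(1 - u))) / phi, with u = Phi(z).
    """
    u, one_minus_u = norm_cdf(z), norm_cdf(-z)
    return math.log(math.sin(phi * math.pi * u)
                    / math.sin(phi * math.pi * one_minus_u)) / phi


def simulate(n_subjects, n_times, theta_c, beta, phi, rho, sigma_v, sigma_e, rng):
    """Return rows (subject, time, x, y); y = 0 marks a true zero.

    theta_c: conditional (subject-specific) coefficients of the binary part.
    beta: coefficients of the continuous part on the log scale (assumption).
    """
    rows = []
    for i in range(n_subjects):
        x = float(rng.randrange(2))  # hypothetical binary covariate
        # Underlying correlated standard normals with correlation rho.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        b = bridge_from_normal(z1, phi)  # random intercept, binary part
        v = sigma_v * z2                 # random intercept, continuous part
        for t in range(n_times):
            if rng.random() < expit(theta_c[0] + theta_c[1] * x + b):
                y = math.exp(beta[0] + beta[1] * x + v + rng.gauss(0.0, sigma_e))
            else:
                y = 0.0
            rows.append((i, t, x, y))
    return rows


if __name__ == "__main__":
    data = simulate(2000, 4, (-0.5, 1.0), (0.2, 0.3), 0.6, 0.5, 0.7, 0.5,
                    random.Random(1))
    share_zero = sum(1 for r in data if r[3] == 0.0) / len(data)
    print(f"{len(data)} rows, share of zeros = {share_zero:.3f}")
```

Because the random intercept of the binary part follows the bridge distribution, the observed proportion of positive responses at a given covariate value should be close to the inverse logit of ϕ times the conditional linear predictor.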

2.2 Marginal covariate effects

The main benefit of the bridge density, stressed in Su et al.,[1] is that after integration over the random intercepts, $B_i$, the marginal probability $\Pr(Z_{it} = 1 \mid X_{it})$ relates to the linear predictors through the same logit link function as for the corresponding conditional probability, $\Pr(Z_{it} = 1 \mid X_{it}, B_i)$. Furthermore, if we specify the marginal regression structure of the binary part as $\operatorname{logit}\{\Pr(Z_{it} = 1 \mid X_{it})\} = X_{it}\theta$, then the marginal covariate effects θ are proportional to the subject-specific conditional covariate effects $\theta^{c}$, with $\theta = \phi\theta^{c}$. However, although in Su et al.[1] we claimed that the marginal mean of $g(Y_{it})$ given $Y_{it} > 0$, found after integrating over $V_i$, is $X_{it}^{*}\beta$, this is not the case generally. The correct form of the marginal mean of $g(Y_{it})$ given $Y_{it} > 0$ is

$$E\{g(Y_{it}) \mid Y_{it} > 0, X_{it}, X_{it}^{*}\} = X_{it}^{*}\beta + \rho\,\sigma_v\,\frac{E_{B_i}\left[\Phi^{-1}\{F_B(B_i)\}\,\Pr(Z_{it} = 1 \mid X_{it}, B_i)\right]}{\Pr(Z_{it} = 1 \mid X_{it})}, \qquad (1)$$

where $\Phi$ is the standard normal cumulative distribution function, $F_B$ is the cumulative distribution function of the bridge distribution and $\sigma_v$ is the standard deviation of $V_i$. This expression will be dependent on the impact of covariates, $X_{it}$, on the marginal and conditional probabilities of occurrence (see Supplementary Material). As the expectation over $B_i$ in the numerator of (1) has no closed-form solution, an exact analytical expression for (1) is not available. However, bounds on (1) are available and can be derived after some algebraic manipulations (see Supplementary Material). Although an exact analytical expression is not available, numerically solving (1) at the maximum-likelihood estimates is straightforward as only a single integral is involved. This integral can be evaluated using adaptive Gaussian quadrature techniques. The estimation of the model parameters, including ρ, is based on maximizing the likelihood presented in Su et al.[1]
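As a hedged numerical illustration of this subsection, the sketch below checks the bridge property (the marginal probability keeps the logit link, with coefficients scaled by ϕ) and evaluates the marginal mean given a positive response as the fixed effects predictor plus ρσ_v times the ratio of E[Φ⁻¹{F_B(B)} Pr(Z = 1 | B)] to Pr(Z = 1). It assumes the Gaussian copula construction described in Section 2.1 and replaces adaptive Gaussian quadrature by a plain trapezoid rule on the underlying standard normal scale. All function names and inputs are hypothetical.

```python
import math


def expit(x):
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def bridge_at(z, phi):
    """Bridge random intercept B = Q(Phi(z)) for an underlying normal value z."""
    return math.log(math.sin(phi * math.pi * norm_cdf(z))
                    / math.sin(phi * math.pi * norm_cdf(-z))) / phi


def expect(h, phi, half_width=8.0, step=0.01):
    """Trapezoid approximation of E[h(z, B)], z ~ N(0, 1), B = Q(Phi(z))."""
    n = int(round(2.0 * half_width / step))
    total = 0.0
    for k in range(n + 1):
        z = -half_width + k * step
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * h(z, bridge_at(z, phi)) * norm_pdf(z)
    return total * step


def marginal_prob(eta_c, phi):
    """Pr(Z = 1) after integrating the bridge intercept out numerically."""
    return expect(lambda z, b: expit(eta_c + b), phi)


def mean_given_positive(xb_star, eta_c, phi, rho, sigma_v):
    """Marginal mean of g(Y) given Y > 0.

    xb_star: fixed effects predictor of the continuous part.
    eta_c: conditional linear predictor of the binary part.
    Under the copula, Phi^{-1}(F_B(B)) is the underlying normal value z.
    """
    numerator = expect(lambda z, b: z * expit(eta_c + b), phi)
    return xb_star + rho * sigma_v * numerator / expit(phi * eta_c)


if __name__ == "__main__":
    print(marginal_prob(1.0, 0.6), expit(0.6 * 1.0))
    print(mean_given_positive(1.5, 1.0, 0.6, 0.5, 0.7))
```

With ρ = 0 the function returns the fixed effects predictor unchanged, and with ρ > 0 the marginal mean exceeds it, because subjects with larger random intercepts in the binary part contribute more positive responses.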

2.3 Interpretation of the marginal effects in the continuous part

As noted earlier and in Su et al.,[1] the interpretation of θ is straightforward as these parameters are simply (population-averaged) log-odds ratios. In contrast, assessment of the impact of a covariate on the marginal mean (given $Y_{it}$ being positive), $E\{g(Y_{it}) \mid Y_{it} > 0\}$, depends on whether or not that covariate is also involved in the binary part of the two-part model. If the covariate is not included in the binary part or $B_i$ and $V_i$ are uncorrelated (i.e. $\rho = 0$), then its effect on $E\{g(Y_{it}) \mid Y_{it} > 0\}$ can be quantified through just the appropriate element of β. However, when $B_i$ and $V_i$ are correlated and, in addition, the covariate of interest is in both regression components of the model, then a simple interpretation is not readily obtainable because of the non-linearity of (1) in this covariate. In such a case, the impact of a covariate could be assessed through plotting the relationship between this covariate and $E\{g(Y_{it}) \mid Y_{it} > 0\}$, with other covariates held fixed, or alternatively by describing the local changes (i.e. through the derivative or the difference) in $E\{g(Y_{it}) \mid Y_{it} > 0\}$ with respect to the covariate.[7] However, the clinical relevance of the mean given a positive response has been questioned, as discussed by Albert[12] in light of work by Lu et al.[4] and Williamson et al.[13] On the other hand, the overall marginal mean of $Y_{it}$ as the target of inference is more easily justified clinically. The calculation of this overall mean is addressed in Section 2.4, and Section 3 illustrates its use.

2.4 Overall marginal mean

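When $g(\cdot)$ is the identity function, the overall marginal mean, $E(Y_{it})$, is given by

$$E(Y_{it}) = \Pr(Z_{it} = 1)\,E(Y_{it} \mid Y_{it} > 0) = \Pr(Z_{it} = 1)\,X_{it}^{*}\beta + \rho\,\sigma_v\,E_{B_i}\left[\Phi^{-1}\{F_B(B_i)\}\,\Pr(Z_{it} = 1 \mid B_i)\right],$$

where $\Phi$ and $F_B$ are the standard normal and bridge cumulative distribution functions, and where we have suppressed the dependence on the covariate vectors, $X_{it}$ and $X_{it}^{*}$, for convenience. Although a closed form for the overall marginal mean is not available, the analyst can easily numerically evaluate it (as is done in the subsequent Health Assessment Questionnaire (HAQ) analysis). From Section 2.2, corresponding bounds on the overall marginal mean can be obtained. Similar bounds can be derived for other common monotonic transformation functions for $g(\cdot)$. For example, bounds on $E(Y_{it})$ when $g(\cdot)$ is logarithmic are shown in the Supplementary Material.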
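The following sketch evaluates the overall marginal mean with an identity transformation in two ways: by one-dimensional numerical integration and by direct Monte Carlo simulation from the two-part mixed model, so that the two can be compared. It is an illustration under stated assumptions (Gaussian copula on the underlying normal scale, trapezoid rule instead of adaptive quadrature, hypothetical function names and parameter values), not the analysis code used for the HAQ data.

```python
import math
import random


def expit(x):
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def bridge_at(z, phi):
    """Bridge random intercept obtained from an underlying normal value z."""
    return math.log(math.sin(phi * math.pi * norm_cdf(z))
                    / math.sin(phi * math.pi * norm_cdf(-z))) / phi


def overall_mean(eta_marg, xb_star, phi, rho, sigma_v, half_width=8.0, step=0.01):
    """E(Y) = Pr(Z=1) * xb_star + rho * sigma_v * E[z * Pr(Z=1 | B)].

    eta_marg is the marginal linear predictor of the binary part, so the
    conditional predictor is eta_marg / phi.
    """
    eta_c = eta_marg / phi
    n = int(round(2.0 * half_width / step))
    acc = 0.0
    for k in range(n + 1):
        z = -half_width + k * step
        weight = 0.5 if k in (0, n) else 1.0
        acc += weight * z * expit(eta_c + bridge_at(z, phi)) * norm_pdf(z)
    return expit(eta_marg) * xb_star + rho * sigma_v * acc * step


def overall_mean_mc(eta_marg, xb_star, phi, rho, sigma_v, sigma_e, n_draws, rng):
    """Monte Carlo estimate of E(Y) by simulating the model directly."""
    eta_c = eta_marg / phi
    total = 0.0
    for _ in range(n_draws):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        if rng.random() < expit(eta_c + bridge_at(z1, phi)):
            total += xb_star + sigma_v * z2 + rng.gauss(0.0, sigma_e)
    return total / n_draws


if __name__ == "__main__":
    exact = overall_mean(0.5, 1.5, 0.6, 0.5, 0.7)
    approx = overall_mean_mc(0.5, 1.5, 0.6, 0.5, 0.7, 0.5, 20000, random.Random(3))
    print(f"quadrature = {exact:.4f}, Monte Carlo = {approx:.4f}")
```

Setting ρ = 0 reduces the overall mean to the marginal probability of a positive response times the fixed effects predictor, which gives a quick sanity check on any implementation.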

3 The HAQ data revisited

In this section, we revisit the HAQ data described in Su et al.[1] The objective is to examine the association between alleles that code for human leukocyte antigen (HLA) proteins and disability level in a psoriatic arthritis (PsA) patient cohort. R code for this new analysis is located in the Supplementary Material. Table 2 of Su et al.[1] presented results from fitting the two-part mixed model to the data, where the third column showed the conditional covariate effects in the continuous part. As noted earlier, the corresponding marginal covariate effects are generally not equal to these conditional effects. In this particular application, it is perhaps more natural to examine the association between the HLA alleles and the overall expected disability level of the patients over the study period, instead of the association when some disability is present. This is because disability, as measured by HAQ, can vary over time for a patient: for example, at one visit a patient can have mild disability, but at the next visit the condition may have improved, resulting in a zero value of HAQ. We conjecture that it will often be felt to be clinically more informative to present the marginal covariate effects on the overall expected disability level together with the marginal covariate effects on the probability of having any level of disability. For the HAQ example, we sample from the asymptotic distribution of the parameters based on the estimates in Table 2 of Su et al.[1] and calculate the contrasts of overall expected HAQ with and without specific HLA alleles, controlling for other covariates. For presentation purposes, we fix the age at PsA diagnosis at 35 years and disease duration at 15 years, which correspond to zero values in standardized versions of the two variables. These contrasts represent the effects of HLA alleles on the overall expected disability level (controlling for other covariates) in the PsA patient cohort.
The top panels of Figure 1 show the HLA-B27 effects given other alleles, sex, age at diagnosis and disease duration. Because the overall mean of HAQ is not directly parametrized in the fitted model, the corresponding covariate effects are not the same for all values of the other variables. However, the HLA-B27 effects are approximately the same across different combinations of other covariates, and the 95% confidence intervals do not include zero. This demonstrates a significant association between HLA-B27 and overall expected HAQ.
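The idea of sampling from the asymptotic distribution of the parameter estimates to obtain confidence intervals for contrasts can be sketched as follows. The estimates and covariance matrix below are hypothetical placeholders, not the values from Table 2 of Su et al., and for brevity the overall mean uses the ρ = 0 special case with an identity transformation, where it equals the marginal probability of a positive response times the fixed effects predictor. In a real analysis the contrast function would instead evaluate the overall marginal mean numerically.

```python
import math
import random


def expit(x):
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def cholesky(a):
    """Lower-triangular L with L L^T = a for a positive definite matrix a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def draw_mvn(mean, chol, rng):
    """One draw from N(mean, chol chol^T)."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(chol[i][k] * z[k] for k in range(i + 1))
            for i, m in enumerate(mean)]


def contrast(params):
    """Overall mean with the allele present minus absent (rho = 0 case).

    params = (theta0, theta1, beta0, beta1): marginal logit coefficients and
    continuous-part coefficients for intercept and a hypothetical allele.
    """
    theta0, theta1, beta0, beta1 = params
    present = expit(theta0 + theta1) * (beta0 + beta1)
    absent = expit(theta0) * beta0
    return present - absent


def percentile_interval(values, level=0.95):
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[int(math.floor((1.0 - level) / 2.0 * (n - 1)))]
    upper = ordered[int(math.ceil((1.0 + level) / 2.0 * (n - 1)))]
    return lower, upper


# Hypothetical estimates and asymptotic covariance (illustration only).
ESTIMATE = [-0.4, 0.8, 1.0, 0.3]
COVARIANCE = [[0.04, -0.02, 0.0, 0.0],
              [-0.02, 0.06, 0.0, 0.0],
              [0.0, 0.0, 0.01, -0.005],
              [0.0, 0.0, -0.005, 0.02]]

if __name__ == "__main__":
    rng = random.Random(2013)
    chol = cholesky(COVARIANCE)
    draws = [contrast(draw_mvn(ESTIMATE, chol, rng)) for _ in range(2000)]
    print(contrast(ESTIMATE), percentile_interval(draws))
```

Because the contrast is a non-linear function of the parameters, its value depends on the levels at which the other covariates are held, which is why the contrasts in Figure 1 are reported for specific covariate combinations.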

Figure 1. Contrasts (with 95% confidence intervals) of overall mean of HAQ for different combinations of the covariates (controlling for being 35 years old at PsA diagnosis and having a disease duration of 15 years). HAQ: Health Assessment Questionnaire.

In Su et al.,[1] we found a significant interaction between the effects of HLA-DQW3 and HLA-DR7 in the binary part of the two-part mixed model, while the same interaction was non-significant in the continuous part. The estimated marginal (log-odds ratio) effect of this interaction in the binary part was 0.8089 with 95% confidence interval [0.0565, 1.5613]. The middle and bottom panels of Figure 1 reflect the possible interaction between HLA-DQW3 and HLA-DR7 on the overall marginal mean of HAQ stratified by gender and absence/presence of the HLA-B27 allele. Age at PsA diagnosis is fixed at 35 years and disease duration at 15 years. For illustrative purposes, considering the left middle (or bottom) panel of Figure 1 for females with the presence of HLA-B27, we estimate that the difference in the HLA-DQW3 (or, alternatively, HLA-DR7) effects on the overall marginal mean of HAQ between those with the presence of HLA-DR7 (or HLA-DQW3) allele and those with it absent (i.e. contrast D–B in figure) is 0.0564 with 95% confidence interval [-0.2062, 0.3232]. For females with HLA-B27 absent, the estimate of this difference in the HLA-DQW3 (or, alternatively, HLA-DR7) effects on the overall marginal mean of HAQ between those with and without the HLA-DR7 (or HLA-DQW3) allele (i.e. contrast C–A) is 0.0648 with 95% confidence interval [-0.1971, 0.3158]. These estimates of the HLA-DQW3 and HLA-DR7 interaction for females, with and without HLA-B27 present, are similar and both statistically non-significant. Conclusions based on these results are similar to those found for the continuous part in the two-part marginal model (data not shown).

4 Discussion

In this article, we have corrected the formulation for the continuous part of the two-part marginal model presented in Su et al.[1] We show that the (marginal) mean of $g(Y_{it})$ given $Y_{it} > 0$ is not the fixed effects predictor, $X_{it}^{*}\beta$, as originally reported, but is non-linear in the covariates included in the binary part of the model. Thus, interpretation of the impact of a covariate on the marginal mean given $Y_{it}$ being positive cannot be made from only considering the relevant component of β when the random effects are correlated and that covariate is also included in the binary part.

In some contexts, the logit may not be the preferred link function in the binary part. For example, in dilution and serological studies the cloglog link may be more appropriate. In psychometrics, the probit may be more convenient. For either of these alternatives, a two-part marginal formulation can be derived. For instance, if the logit link is replaced with the probit and $B_i$ is assumed $N(0, \sigma_b^2)$ instead of following the bridge distribution when formulating the binary part of the two-part mixed model, then the link function of the marginal regression structure for the binary part, after integrating out $B_i$, remains probit.[2] Furthermore, in the binary part, the marginal covariate effects are proportional to their subject-specific conditional covariate effects, with constant of proportionality $1/\sqrt{1 + \sigma_b^2}$. Under the same linear mixed effects structure considered earlier for the continuous part (i.e. $V_i$ normal and $g(\cdot)$ the identity function), a closed-form solution for the marginal mean of $Y_{it}$, given $Y_{it} > 0$, (and therefore for the overall marginal mean) can be derived in terms of the standard normal density and cumulative distribution function (see Supplementary Material). Unfortunately, this closed-form solution is again non-linear in the covariates associated with the binary part, and therefore interpretation of marginal covariate effects on the continuous part (and on the overall marginal mean) will not generally be straightforward.
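For the probit alternative, a closed form can be sketched under the assumption that the two random intercepts are bivariate normal with correlation ρ (a Gaussian copula with normal margins), with standard deviations written here as sigma_b and sigma_v. By Stein's lemma, the marginal mean given a positive response is then the fixed effects predictor plus ρ·sigma_v·sigma_b/√(1 + sigma_b²) times the ratio of the standard normal density to the standard normal distribution function, both evaluated at the marginal probit predictor. This is a derivation under those stated assumptions, not a transcription of the Supplementary Material; the code below compares it with direct numerical integration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def closed_form_mean_given_positive(xb_star, eta_c, sigma_b, rho, sigma_v):
    """Closed-form marginal mean of Y given Y > 0 (probit link, normal B).

    eta_c is the conditional probit predictor; dividing by
    sqrt(1 + sigma_b^2) gives the marginal probit predictor kappa.
    """
    scale = math.sqrt(1.0 + sigma_b ** 2)
    kappa = eta_c / scale
    mills = STD_NORMAL.pdf(kappa) / STD_NORMAL.cdf(kappa)
    return xb_star + rho * sigma_v * sigma_b / scale * mills


def numeric_mean_given_positive(xb_star, eta_c, sigma_b, rho, sigma_v,
                                half_width=8.0, step=0.01):
    """Same quantity by trapezoid integration over z, with B = sigma_b * z."""
    n = int(round(2.0 * half_width / step))
    numerator = 0.0
    denominator = 0.0
    for k in range(n + 1):
        z = -half_width + k * step
        weight = (0.5 if k in (0, n) else 1.0) * STD_NORMAL.pdf(z)
        prob = STD_NORMAL.cdf(eta_c + sigma_b * z)  # Pr(Z = 1 | B)
        numerator += weight * z * prob              # E[V | B] = rho*sigma_v*z
        denominator += weight * prob
    return xb_star + rho * sigma_v * numerator / denominator


def closed_form_overall_mean(xb_star, eta_c, sigma_b, rho, sigma_v):
    """Overall marginal mean: Pr(Z = 1) times the mean given Y > 0."""
    kappa = eta_c / math.sqrt(1.0 + sigma_b ** 2)
    return STD_NORMAL.cdf(kappa) * closed_form_mean_given_positive(
        xb_star, eta_c, sigma_b, rho, sigma_v)


if __name__ == "__main__":
    args = (1.5, 0.8, 1.2, 0.5, 0.7)
    print(closed_form_mean_given_positive(*args))
    print(numeric_mean_given_positive(*args))
```

The density-to-distribution ratio depends on the binary-part covariates through the marginal probit predictor, which makes the non-linearity noted in the text explicit.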
In conclusion, care should be taken when using two-part models for semi-continuous data that are longitudinal or otherwise clustered. Both the specification of random effects structures, as discussed in Su et al.,[14] and the interpretation and calculation of marginal effects, as discussed in this paper, require careful attention to assumptions.

10 in total

1. Mixture models for quantitative HIV RNA data.

Authors: Lawrence H Moulton; Frank C Curriero; Paulo F Barroso
Journal: Stat Methods Med Res Date: 2002-08 Impact factor: 3.021

2. Analysis of repeated measures data with clumping at zero.

Authors: Janet A Tooze; Gary K Grunwald; Richard H Jones
Journal: Stat Methods Med Res Date: 2002-08 Impact factor: 3.021

3. Marginal analyses of clustered data when cluster size is informative.

Authors: John M Williamson; Somnath Datta; Glen A Satten
Journal: Biometrics Date: 2003-03 Impact factor: 2.571

4. Analyzing excessive no changes in clinical trials with clustered data.

Authors: Shou-En Lu; Yong Lin; Wei-Chung Joe Shih
Journal: Biometrics Date: 2004-03 Impact factor: 2.571

5. Unified Computational Methods for Regression Analysis of Zero-Inflated and Bound-Inflated Data.

Authors: Yan Yang; Douglas Simpson
Journal: Comput Stat Data Anal Date: 2010-06-01 Impact factor: 1.681

6. Likelihood methods for binary responses of present components in a cluster.

Authors: Xiaoyun Li; Dipankar Bandyopadhyay; Stuart Lipsitz; Debajyoti Sinha
Journal: Biometrics Date: 2010-09-03 Impact factor: 2.571

7. A flexible two-part random effects model for correlated medical costs.

Authors: Lei Liu; Robert L Strawderman; Mark E Cowen; Ya-Chen T Shih
Journal: J Health Econ Date: 2009-11-22 Impact factor: 3.883

8. Association models for clustered data with binary and continuous responses.

Authors: Lanjia Lin; Dipankar Bandyopadhyay; Stuart R Lipsitz; Debajyoti Sinha
Journal: Biometrics Date: 2009-05-07 Impact factor: 2.571

9. A likelihood-based two-part marginal model for longitudinal semicontinuous data.

Authors: Li Su; Brian Dm Tom; Vernon T Farewell
Journal: Stat Methods Med Res Date: 2011-08-25 Impact factor: 3.021

10. Bias in 2-part mixed models for longitudinal semicontinuous data.

Authors: Li Su; Brian D M Tom; Vernon T Farewell
Journal: Biostatistics Date: 2009-01-08 Impact factor: 5.899
