Literature DB >> 30754731

Repeated Measures Regression in Laboratory, Clinical and Environmental Research: Common Misconceptions in the Matter of Different Within- and between-Subject Slopes.

Donald R Hoover1, Qiuhu Shi2, Igor Burstyn3, Kathryn Anastos4.   

Abstract

When using repeated measures linear regression models to make causal inference in laboratory, clinical and environmental research, it is typically assumed that the within-subject association of differences (or changes) in predictor variable values across replicates is the same as the between-subject association of differences in those predictor variable values. However, this is often false. For example, with body weight as the predictor variable and blood cholesterol (which increases with higher body fat) as the outcome: (i) a 10-lb weight increase in the same adult affects more greatly an increase in cholesterol in that adult than does (ii) one adult weighing 10 lbs more than a second indicate higher cholesterol in the heavier adult. A 10-lb weight gain in the first adult more likely reflects a build-up of body fat in that person, while a second person being 10 lbs heavier than the first could be influenced by other factors, such as the second person being taller. Hence, to make causal inferences, different within- and between-subject slopes should be separately modeled. A related misconception commonly made using generalized estimation equations (GEE) and mixed models on repeated measures (i.e., for fitting cross-sectional regression) is that the working correlation structure only influences variance of the parameter estimates. However, only independence working correlation guarantees that the modeled parameters have interpretability. We illustrate this with an example where changing working correlation from independence to equicorrelation qualitatively biases parameters of GEE models and show that this happens because within- and between-subject slopes for the outcomes regressed on the predictor variables differ. We then systematically describe several common mechanisms that cause within- and between-subject slopes to differ: change effects, lag/reverse-lag and spillover causality, shared within-subject measurement bias or confounding, and predictor variable measurement error. The misconceptions we describe should be better publicized. Repeated measures analyses should compare within- and between-subject slopes of predictors and when they do differ, investigate the causal reasons for this.

Entities:  

Keywords:  cross-sectional regression; generalized estimating equations; mixed models; repeated measures; within-/between-subject associations; working correlation structure

Mesh:

Year:  2019        PMID: 30754731      PMCID: PMC6388388          DOI: 10.3390/ijerph16030504

Source DB:  PubMed          Journal:  Int J Environ Res Public Health        ISSN: 1660-4601            Impact factor:   3.390


1. Introduction

We focus on two common misconceptions that are made in research while fitting repeated measures regression with generalized estimating equations (GEE) and mixed models (MM). Misconception-A: The association between the predictor variable and the outcome across different measures from the same subject (within-subject) is the same as the association of that variable with the outcome between measures from different subjects (between-subject). In fact, these associations often differ, which should be considered when making causal inference. For example, consider weight as the predictor and cholesterol the outcome given the well-known association of higher serum cholesterol and with greater body fat: (i) a 10 lb. increase in the same adult more likely indicates greater difference in serum cholesterol than does (ii) one adult being 10 lbs heavier than a second adult. A 10-pound weight gain in the same adult more likely reflects a build-up of body fat in that person, while the first adult being 10 pounds heavier than the second could be influenced by other factors such as the first adult being taller than the second. Misconception-B: The working correlation structure used in GEE and MM models is only a nuisance factor that impacts precision of model parameter estimates. As illustrated and explained in the next Section (and Table 1), the wrong choice for working correlation structure biases parameter estimates.
Table 1

Cross-sectional regression parameter estimates using GEE 1 for EGFR = HIV infection, serum albumin and BUN in the Bronx WIHS.

VariableWorking Correlation Structure
Independence Equicorrelation 2
Point Estimate95% CIZ-Value (p)Point Estimate95% CIZ-Value (p)
HIV Infection HIV,CS)−2.04(−5.07, 0.98)−1.32 (<0.19)−3.96(−6.90, −1.03)−2.65 (0.0081)
Albumin Per g/dL ALB,CS)−6.21(−8.95, −3.47)−4.44 (<0.0001)−9.84(−12.01, −7.68)−8.93 (<0.0001)
BUN Per mg/dL BUN, CS)−1.87(−2.12, −1.62)−14.45 (<0.0001)−1.22(−1.46, −0.99)−10.30 (<0.0001)
Quasi-Likelihood Information Criteria (QIC)10,847.1410,836.27

1 Mixed models gave essentially similar point estimates; see Appendix A. 2 Interclass correlation of residuals from GEE-E was 0.45 indicating non-independence correlation was structurally correct.

Both of these misconceptions are related, but the analytical details are complicated. To explore this further, Section 2 begins with an illustration of Misconception-B in real data. Section 2 also explains how this relates to Misconception-A and why independence working correlation must be used for creation of predictive models using “cross-sectional regression” on repeated measures. Then, Section 3 details how separation into within- and between-subject associations is needed for using repeated measures regression to makes causal inference. Section 4 describes epidemiological mechanisms that can cause within- and between-subject slopes to differ. Section 6 summarizes and explores further implications for statistical practice in applied research.

2. Cross-Sectional and Between/Within-Subject Linear Models with Repeated Measures

We begin here with some notation. Consider repeated measures on n subjects denoted by i = 1,2, …, n. The “subjects” can either be persons with longitudinal repeated measures, or, as is common in environmental epidemiology, can be cities, schools, neighborhoods, census tracks, hospitals, etc. Each subject has J different observations enumerated by j = 1, …, J. For example, these J different observations could be taken at times t < t <… < t, on the same person when the “subject” is a person or from J different persons living in the same neighborhood when the “subject” is a neighborhood. For J constant across i, (i.e., always the same number of repeat measures for a subject), we drop the “i” subscript and denote J. Let us consider that the observations have continuous outcomes Y and K predictor (or exposure) variables . When K = 1, we drop the “K” enumeration, using X for the only predictor. Linear regression models for E[Y ] or E[Y] are fit in the analyses described here. However, the overall conclusions we obtain on these linear regression models can be generalized to discrete outcomes (i.e., logistic regression) and survival analyses. 2A Cross-Sectional (CS) Regression. The most commonly fitted linear regression model on repeated measures does not separate within- and between-subject associations and is usually written out as Y + ε. This is denoted as “cross-sectional (CS) regression” particularly for longitudinal repeated measures. We add a subscripted “CS” to the β’s to distinguish these slopes from between-subject (BS) and within-subject (WS) slopes defined in Section 2B. The CS regression model here is thus denoted as where are parameters (fixed effects), while ε is error with E[εij] = 0 that is independent between different subjects i and i’, but may be correlated for j ≠ j’ within the same subject. It should be noted that the intercept is fixed at the same for each subject. Should the actual intercepts differ between subjects (i.e., be ) as random intercepts, then for both MM and GEE, the difference is incorporated into the error term ε of (1) and the within-subject correlation of that error [1]. Using (vs. not using) random intercepts does not influence the point estimates of or the variance of these estimates for mixed models [1]. However, for GEE using a different intercept on each subject (with each intercept now adding a new parameter) creates too many parameters for the asymptotic properties of GEE model to hold in our examples (and in general) which destabilizes parameter estimates [2]. Again for K = 1, the subscript for K is dropped and the model is + ε. The main goal of CS regression is to first obtain estimates for and then input into (1) in order to estimate future unobserved Y’s from observed ’s as . Cross-sectional regression is also used to make adjusted (causal) inference on the covariate associations in , but, as we show later, doing this may be problematic. Table 1 presents parameter estimates from repeated measure cross-sectional regression (1) to a clinical measure of glomerular filtration rate (EGFR) from the Modification of Diet in Renal Disease Study (MDRD) [3]. Formula (1) with EGFR as the outcome Y and three predictor variables (X1, X2, X3) = (HIV infection, serum albumin, blood urea nitrogen (BUN)) was fit to 10,782 semi-annual measures of 584 women at the Bronx-site of the Women’s Interagency HIV Study (WIHS) [4]. Higher EGFR values indicate better renal function. The models assume that the within- and between-subject associations of the predictor variables are the same. We later show this assumption is incorrect. The parameter estimates of Table 1 were calculated using GEE [1] with both independence (GEE-IND) in columns 2–4 and equicorrelation (GEE-E) columns 5–7 for the working correlation structure of model residuals from repeated measures in the same person. We again note that this model (1) has a fixed intercept across all subjects with the error term being independent between different subjects. However, otherwise in Table 1 (and elsewhere in the paper) the within subject correlation structure of the error is allowed to be either (i) independent within the same subject (GEE-IND) or (ii) to have the same correlation for all outcomes within the same subject (GEE-E). The second condition (i.e., equicorrelation) is equivalent to fitting a random subject intercept model [1]. Most of today’s literature providing guidance on fitting repeated measures linear regression (i.e., [5,6,7,8,9,10,11,12,13]) qualitatively describes working correlation as a “nuisance factor” that does not alter model parameters and states that “the working correlation that minimizes variance of parameter estimates should be chosen”. However, in Table 1, the parameter estimates for BUN (per g/dL), from GEE-E, of −1.22; 95% confidence interval (CI) (−1.46, −0.99) is both qualitatively and statistically higher than the corresponding GEE-IND estimate of −1.87; 95% CI (−2.12, −1.62). For HIV, the parameter estimates of −3.86, p = 0.0081 from GEE-E is qualitatively lower than that from GEE-IND −2.04 and p = 0.19. Clearly, changing the working correlation from independence to equicorrelation qualitatively and statistically changes the parameter estimates. Thus, this correlation structure is not a nuisance factor. When faced with such a dilemma of qualitatively and statistically different parameter estimates from the same model fit to the same data with only the working correlation structure changed (as is shown in Table 1), investigators typically go to published guidance on which correlation structure to use. To that end, based on the within-subject correlation of residuals being 0.45 in GEE-E (and in MM-E), and the quasi-likelihood independence criteria goodness of fit statistic (QIC) = 10,836.27 for GEE-E being smaller than the QIC = 10,847.14 for GEE-IND (or the Akaike information criteria goodness of fit statistic (AIC) from a mixed model using equicorrelation (MM-E) of (AIC = 94,934.5) being smaller than AIC = 99,374.5 from a mixed model using independence (MM-IND) as shown in Table A1 in Appendix A), almost all articles providing model fitting guidance [5,6,7,8,9,10,11,12,13] point towards using equicorrelation as the working correlation structure. However, as the rest of Section 2 describes in detail, this guidance is problematic as only the parameter estimates obtained by using independence working correlation can have any meaning for cross-sectional regression.
Table A1

Parameter estimates for the cross-sectional regression of Table 1 EGFR = HIV infection, serum albumin and BUN in the Bronx WIHS using mixed models.

VariableWorking Correlation Structure
Independence Equicorrelation
Point Estimate95% CI 1Z-Value (p) 1Point Estimate95% CI 1Z-Value (p) 1
HIV Infection HIV,CS)−2.04(−5.04, −1.04)−4.02 (<0.0001)−3.99(−7.04 −0.93)−2.55 (0.01)
Albumin Per g/dL ALB,CS)−6.21(−7.30 −5.11)−11.04 (<0.0001)−9.89(−11.03, −8.73)−16.90 (<0.0001)
BUN Per mg/dL BUN,CS)−1.87(−1.95, −1.79)−44.68 (<0.0001)−1.22(−1.30, −1.13)−29.08 (<0.0001)
Akaike Information Criteria (AIC)99,374.594,934.5

1 The confidence interval and p-values for independence working correlation structure in particular but also arguably for equicorrelation as well overestimate the precision of the parameter estimates. Unlike GEE, mixed models are not robust to misspecification of the correlation structure.

But first we make two brief asides. First, we note that if MM, rather than GEE are used for Table 1, the corresponding parameter point estimates in Table 1 using independence correlation (MM-IND) and equicorrelation (MM-E) are essentially unchanged [1]. (See Appendix A for details on parameter estimates from MM fit to this data with independence and equicorrelation correlations structures). However, due to non-robustness of MM, GEE is preferable for this specific example. Second, we note that the differences observed in Table 1 occur not only between independence and equicorrelation. Any different choice of correlation structure, such as AR(1), Toeplitz, unstructured, etc. will result in different parameter estimates (results not shown). For simplicity, we focus this article on only two structures: independence and equicorrelation. 2B Between-/Within-Subject Slope (BS/WS) Regression. While investigators almost never consider this in practice, it has long been noted that slopes on changes of X within the same subject i differ from cross-sectional slopes on between subject-measure differences in X [14,15,16,17]. To illustrate this, consider the cross-sectional model of a laboratory measure cholesterol (Y), which is well known to be higher in people with more body fat. To that end, the predictor is body weight (X) with E[Y] = . As described in the Introduction, the cross-sectional slope β for association of a 10 lbs. weight difference between two different adults for cholesterol is less than the slope for association of a 10 lbs. within-subject weight change for the same adult on cholesterol, which we denote as β. Again, the reason β is less than β is that: (i) a 10 lb. cross-sectional weight difference between two adults often reflects greater height in one of the persons, but (ii) a 10 lbs. weight increase in the same adult is not influenced by height difference and thus is more likely due to more body fat after the 10-lbs weight gain. Thus, since greater body fat is what is directly associated with more cholesterol, the within-person association of a 10-lb. weight increase with cholesterol is greater than the cross-sectional repeated measures association with a 10-lb. weight difference between two persons. Common within-person height creates a shared within-subject measurement bias from this extraneous factor for subject i (denoted E) on weight as a predictor of cholesterol. To that end, many investigators adjust weight for height using body mass index = wt/ht to remove this effect of height on weight. As Figure 1a illustrates, if TX = body mass index (wt/ht) were the true predictor of Y, and H = height (which does not change with j in the same i), then X contains this shared within-subject measurement bias from common H which again we denote as E in Table 1a to confer it is an extraneous within-subject bias. Section 4 describes more settings where .
Figure 1

Illustration of common within-subject measurement bias and confounding for K = 1.

While for weight it is possible to remove the common shared within-subject bias from height by dividing by ht, this is not the case for less well-understood causal relationships. Therefore, to model and account for a bias such as this, linear regression models fit for making causal inference can decompose the associations into “within-subject” slopes (), described above, and “between-subject” slopes (), described below, which capture associations of subjects’ central tendencies of the exposure. To do this, subject means of the predictor variables are calculated, where . Then Y is modeled as a combination of “between-subject” slopes from (that could be influenced by the common person measurement bias in Figure 1) and “within-subject” slopes from deviations of X about which will be free of such a bias, since the comparison is within person. As described for (1), this is a fixed intercept model that is functionally equivalent to a random intercept model for MM. When K = 1, we have . To illustrate this for our earlier example with Yij = cholesterol and Xij = weight, let , β = 0.9 and β = 3, such that . If person i had an average value of = 210 across all J measures with the j measure being X = 200, then for the person-visit at time t, E[Yij] = 30 + 0.9(210) + 3(200-210) = 189. Now we make some technical asides. First, the choice of the observed as the “central tendency” of X for subject i is necessary as a person’s “true average weight” over the entire time period is unknown, but for J large enough, should be close to . Thus, while β only captures association with within-subject change in X inherently contains some β from deviation of (); especially for small J. This situation is described for occupational epidemiology research, where often an average of personal exposure measurements is computed as estimate of true exposure of a “subject”, defined as either an individual, or group of individuals that share a job [18]. Second, the implicit assumption that β is well defined may also not always be true. For example, “β” could differ by time separation tij – tij’. Perhaps for k = weight, a weight gain of 10 lbs. in one month creates a shock that hyper-elevates cholesterol, but a 10 lbs. weight gain over 12 months does not, in which case . Third, if the investigator is only interested in the within-subject slopes he/she can substitute as a a different subject intercept for the between-subject slopes in (2) with the model reducing to . Despite these technical caveats, the within- vs. between-subject decomposition in (2) is used to test whether so that, as shown in Section 2C, they also equal and thus the separated WS vs. BS decomposition can be collapsed to (1). Due to the orthogonal decomposition of X about this previous test for collapsing the within- vs. between-subject decomposition is a two-sample z-test of parameter estimates from fitted models comparing to Z [17]. The within- vs. between-subject decomposition is mostly used for inference on adjusted (causal) associations of the X on Y. It is typically not used to produce models to estimate future unknown Y from known as such estimation often only happens in settings where just one observation per subject is available, hence . We refit the analyses of Table 1 to illustrate that the impact of choice of correlation structure (i.e., GEE-IND vs. GEE-E working correlation structure) is eliminated in our example after making a within- vs. between-subject decomposition. Please note that there were no new HIV infections after study entry; so meaning that the within-subject association of change of HIV infection status cannot be modeled. For within-subject associations of BUN and albumin, GEE-IND and GEE-E gave identical point estimates, because centering about makes comparisons entirely within-subject and invariant to these correlation structure choices (although within-subject estimates could differ slightly if autoregressive (AR (1)) or other formulations for intra-subject correlation of residuals had been used). There were small GEE-IND vs. GEE-E differences on the between-subject slopes as was observed elsewhere [19]. For example, the point estimate for between-subject HIV status is −1.16; 95% CI (−4.21, 1.88) in the GEE-IND of Table 2 versus −1.57; 95% CI (−4.47, 1.33) with GEE-E.
Table 2

Within- and between-subject decomposition regression parameter estimates using GEE 1 for EGFR = HIV infection, serum albumin and BUN in the Bronx WIHS.

Variable Working Correlation Structure
CompartmentIndependence Equicorrelation
Point Estimate95% CIZ-Value (p)Point Estimate95% CIZ-Value (p)
HIV InfectionBetween-subjectHIV, BS)−1.16(−4.21, 1.88)−0.75(0.45)−1.57(−4.47, 1.33)−1.06(0.29)
NA 2---------NA 2------
AlbuminPer g/dL Between-subjectALB, BS)−3.27(−7.88, 1.33) −1.39(0.16)−2.71(−7.00, 1.57)−1.24 (0.21)
Within-subjectALB, WS)−10.70(−12.99, −8.40)−9.16 (<0.0001)−10.70(−12.99, −8.40)−9.16 (<0.0001)
BUN Per mg/dLBetween-subjectBUN, BS)−2.72(−3.10,−2.33)−13.89(<0.0001)−2.65(−3.01, −2.08)−14.21(<0.0001)
Within-subjectBUN, WS)−1.11(−1.34, −0.88)−9.31 (<0.0001)−1.11(−1.34, −0.88)−9.31 (<0.0001)
Quasi-Likelihood InformationCriteria (QIC)10,866.6410,857.62

1 Mixed models gave essentially similar point estimates. See Appendix A 2 There is no within-subject variation for HIV infection status.

From now on, we only examine GEE-IND results for within- between-subject decomposition models, as GEE-E results are similar. For BUN and GEE-IND, the within-subject = −1.11, 95% CI (−1.34, −0.88) is qualitatively and statistically closer to 0 than is the corresponding between-subject slope = −2.72, 95% CI (−3.10, −2.33). However, serum albumin goes the other way: the within-subject slope = −10.70, 95% CI (−12.99, −8.40) is statistically further from 0 than is the corresponding between-subject GEE-IND = −3.27 with a 95% CI (−7.88, 1.33) that overlaps 0. The QIC is lower (10,857.62) for equicorrelation than for independence (10,866.64) which perhaps now indicates an advantage to the former correlation structure in this setting where the slopes have been correctly decomposed. One might wonder how to interpret differences in the within- and between-subject slopes for causal inference, including the reasons that these slopes were different? This in part will depend on the hypotheses of interest (and we did not have any for this illustrative example). However, general rules also apply, although we are unaware of any systematic exploration of reasons why the between-subject slopes (or for K = 1) could differ from within-subject slopes (or for K = 1). and the resultant implications for causal inference. Before outlining these rules, it is important to note an important relationship among cross-sectional, within-subject and between-subject slopes. 2C Relationship between , and . Now averages and according to relative variances of the subject means (i.e., the ) vs. the variance of the repeated measures about those sample means (i.e., the ) [17]. For example, with K = 1, if is the population variance of the within-person mean and is the population variance of the deviations of differences of the repeat measures X from their , then In the previous example of weight and cholesterol with = 0.9, = 3 and , if and , then from (3) = 0.9*400/(100+400)+3*100/(100+400) = 1.32. If the between-person sample means are more homogeneous in weight with but the within- person is still 100, then again using (3) moves closer to ; = 0.9*200/(100+200)+3*100/(100+200) = 1.60. 2D Working Correlation Structures for Model Residuals Other than Independence Can Lead to Unusable Results for Cross-Sectional Regression. As noted earlier, fitting both MM and GEE repeated measure regression models involves specification of correlation (or working correlation) structure of ε within the same subject i. We denote the working correlation structure by matrix V. Typical choices for V are the ones we used in the illustrative examples of Table 1 and Table 2; equicorrelation (E), with correlation of ε and ε’ for j ≠ j’ always the same value ρ (this common value of ρ is estimated in the model fitting process based on the residuals in the model fitting process), and independence (IND), with correlation of εij and εij’ ≡ 0. However, other structures are used such as AR(1) where correlation of ε and ε is ρ with the value of ρ being estimated from the residuals [1]. Again, current guidance [5,6,7,8,9,10,11,12,13] emphasizes choosing the V that most closely fits the true covariance structure of the residuals within i and/or by model fit criteria such as having lowest QIC for GEE and AIC for MM, because doing so often improves precision of the model parameter estimates. However, we just observed that this approach may be wrong for CS regression, because using any correlation structure other than IND can introduce structural bias into [20,21] and, unfortunately, AIC and QIC do not account for this bias. To that end, Pepe and Anderson (1994) [20], developed a general rule for when IND is (and is not) the only correlation structure that should be used for CS regression that we now present. Specifically, they show that if a predictor varies (i.e., takes on different values) within the same subject i and, then, no matter what true correlation structure of εij among repeated measures within a subject is, GEE-IND gives unbiased estimates for , but any MM or GEE model not using V = IND, gives biased estimates of . Thus, the only working correlation structure that should be used to estimate is V = IND. However, if (4) does not hold, then any working correlation structure obtains unbiased estimates for in which case, choosing the V that most accurately fits the correlation structure of εij minimizes the variance of . Our paper only focuses on equicorrelation as the alternate to independence in order to keep the presentation from becoming too cumbersome, given the large number of possible correlation structures. However, the previous paragraph and (4) apply to any non-independence correlation structure. As one (of many) examples of where (4) holds, let k = 1 and Y and X be the degree of airway obstruction and inhalation of tobacco smoke of subject i at time j, respectively. One would expect that, because smoking effect on the lung is cumulative, historical smoking in a current smoker or non-smoker would lead to poorer lung function. Thus, E[Y for a smoker at time j’ < j would poorer irrespective of X. We now present an easier way to visualize (4). If repeated measures j and j’ are thought of as “siblings” and the predictors as “exposures” then (4) means that even after considering the “self-exposure” of the current measure j through the outcome Y has “nditional ependence n ibling xposures” (Co-DOSE) (i.e., on ). Thus, the sibling exposure could be thought of as a Co-DOSE beyond the “dose” from the “self-exposure”. Hence, from now on we use the term Co-DOSE to denote that (4) occurs. Also, while this point has not been very well made, for CS regression, Co-DOSE in (4) largely occurs if and only if within- and between-subject slopes differ. If within- and between-subject slopes differ for any predictor (i.e., ) then Co-DOSE (4) happens. However, if the within- and between-subject slopes are equal for all predictors (i.e., ) then Co-DOSE (4) does not occur. More details on this and an illustration are given in Appendix B, but one trivial case arises if the predictors are invariant within the same subject (i.e., ) such that the within-subject slopes are not defined (since ) and for the same reason Co-DOSE in (4) cannot occur. While the mathematical details are beyond this paper, if and = IND, then non-zero covariance ρ > 0 besides adjusting for within-i collinearity of ε over-weights the relative to in (3), thereby pushing CS regression parameter estimates away from towards [17]. Since robust covariance methods exist to adjust for impact of misspecification of = IND from collinearity of the residuals ε on variance estimates, in particular for GEE [1], = IND can eliminate bias in estimating while providing conservative variances for the parameter estimates. 2E Implications for Applied Research and Statistical Practice. Much of what has been presented above is not commonly understood and implemented in applied research and statistical practice. CS models are typically fit, with interpreted to also be and , without checking if these slopes are equal. Non-independence is often used for CS regression without checking if Co-DOSE (in (4)) exists. Perhaps in part this occurs because systematic epidemiological descriptions of causal mechanisms for why between- and within-subject slopes can differ are lacking, which hinders awareness of this possibility. We endeavor to fill this gap in Section 3.

3. Epidemiological Reasons for Between- and Within-Subject Slopes to Differ

To make it easier for investigators to identify what could cause β ≠ β (or equivalently Co-DOSE) in a given setting, we classify major reasons why this can happen. For simplicity, let K = 1 unless otherwise noted, as the following principles extend to multivariate settings. 3A. Change Effects. We propose that the effect of a longitudinal within-subject change in the predictor X could have a greater (or less) direct impact on Y than a long-term standing difference in X between two different subjects (hence β ≠ β) and define this as a (c.f. short term) “change effect”. Returning to the example of weight and cholesterol, consider two identical twins, “A” has lived his adult life at = 190 lbs. and “B” at = 180 lbs. If “B” undergoes a short-term weight gain of 10 lbs. to 190 ( = 10), assuming not impacted by the rapid change, while A remains at 190 lbs. ( = 0), the shock or corollaries of this rapid change in B may raise his cholesterol level above that of A’s even though they both now weigh 190 lbs., meaning that β > β and Co-DOSE in (4) occurs. However, it should be noted that as was mentioned in Section 2B, in this setting, β would be somewhat undefined if, e.g., a 10 lbs. gain in a shorter time period (i.e., 1 month) increases β more than does a 10 lbs. gain over a longer time period (i.e., 12 months). 3B Lag Causality of The effect of historical levels of X on Y may independently project into the future (i.e., beyond that effect of the current level of X). For example, consider an HIV-infected person and two time points t < t; let X be HIV viral load and Y be CD4 count. High HIV levels destroy CD4 blood cells into the future. Therefore, as illustrated in Figure 2a, high HIV viral load at t may affect CD4 loss from t1 to t2 so that even if the person’s HIV viral load is low at t2, the high viral load at t1 is predictive of lower CD4 at t2 through higher viral load at t1 having created more CD4 destruction between t1 and t2 (i.e., lag causality of X at t1 on Y at t2). Thus, Y at t is not independent of X at t1; Co-DOSE in (4) occurs and the within- and between- subject slopes differ (β ≠ β). In Figure 2a,b, E denotes that X differs from X due to an extraneous process that is causing X to change over time. Lag causality is often considered when serial measures of X represent long-term environmental exposures (such as air pollution and cigarette smoke) that effect chronic conditions Y (such as lung function) are obtained [1,18,22].
Figure 2

Illustration of lag causality and reverse-lag causality for K = 1.

3C Reverse-Lag Causality of The setting in Section 3B also manifests in the opposite direction if X is being used as to estimate Y that is causal for future X. Reversing the previous example with X now being CD4 used to predict HIV viral load as Y, as Figure 2b illustrates, high viral load (Y) at t1 may have degraded the CD4 count from t1 to t2. Thus, Y at t is not independent of X at t: Co-DOSE in (4) occurs and within- and between-subject slopes differ (β ≠ β). 3D Spillover Causality of X on Adjacent Y. An analogous setting to those of 3B and 3C can also manifest in repeated measure cross-sectional settings based on geographical proximities. Let the subjects i now be cities and j enumerate different neighborhoods in these cities. The repeated measures are average air pollution (X) of neighborhood j in city i and average lung function of all residents living within neighborhood j of city i (Y). A resident living in neighborhood j may work in a different neighborhood j’ of the same city and thus have “spillover exposure” to air in the neighborhood they work in, for a given city i, thus Y is not independent of X and hence Co-DOSE in (4) occurs. 3E Common Within-Subject Measurement Bias. Shared within-subject measurement bias occurs if all repeat measures from the same subject have the same correlated measurement bias. This was the setting described in Section 2B and Figure 1a with weight as exposure for cholesterol. Here with weight as a surrogate for body fat, the measurement bias was mediated by height with taller adults being heavier independently of body fat than shorter adults, which leads to β > β and Co-DOSE in (4) when weight was a predictor of cholesterol. In this setting, height is a measurement bias not a confounder as height itself is not associated with cholesterol. We now present a similar setting where the un-modeled variable is a confounder. 3F Common Within-Subject Confounding. Figure 1b shows common within-subject confounding, that causes β ≠ β and Co-DOSE in (4). This phenomenon is diagrammatically similar to common measurement bias that was described in Section 2B. However, rather than a common measurement bias, the extraneous factor, shared by the repeated measures of the same subject, is a confounder that is associated with both X and Y. For example, let the confounder variable C be sex of subject i (which does not change with j) not be in the model and the outcome Y be a linear score for male pattern baldness at time j with again X being weight at time j. Adult men are both on average heavier and, independently of weight, have greater male pattern baldness than do adult women. So C is associated with both the exposure and the outcome. Here a 10 lbs. weight difference in two adults, but not a within-adult increase of 10 lbs., could be informative of the heavier adult more likely being male. Hence for this example, β = 0 (assuming within adult weight does not influence baldness), but β > 0 (and thus β > 0) as males are more likely to be both heavier and bald compared to women. Hence also β > 0, reflecting unaddressed between-subject confounding from heavier adults more likely being men. Similarly, Mancl, Leroux and DeRouen proposed that in a study with repeated dental predictor and outcome pairs as (X) measured on teeth (i.e., enumerated by j) on the same persons (i.e., enumerated by i) that better compliance with dental treatment by some persons was a confounder that could lead to differences in slopes within and between subjects [19]. In a non-longitudinal setting where i denotes clusters (for example schools) and j denotes repeated subjects within that cluster (for example students), common within-subject confounding is referred to as “contextual effects” [23,24]. For example, as Robinson (1950) [14] observed, when X was race of the student and Y was achievement-score, a higher (here: portion of a school’s students that were non-White) indicated weaker financial support for that school (weaker financial support being the confounder) and thus worse achievement-scores overall for that school: β was negative. However, within the same school, race had no impact on the achievement score (β = 0). Begg and Parides [25] identify a similar setting in birthweight and intelligence quotient in families. 3G Measurement Error in In many settings, the predictor we observe is X = TX + M where TX is the true value of the predictor and M is measurement error that is independent of TX (i.e., classical measurement error). It has been shown that, measurement error in X that is either independent of [26], or correlated with Y [27], biases estimates for the slope that relates TX with Y. Measurement error can arise either from imprecision in an analysis instrument, such as in a machine quantifying components of serum, or in data collection process, such as the chemical composition of blood samples being non-informatively influenced by diurnal and other nuisance processes. If X is incorrectly quantified due to such measurement error, then Co-DOSE in (4) occurs and the observed within- and between-subject slopes differ, because, as illustrated in Appendix C, the biases being created from the measurement error distribute differentially to different slopes. As Figure 3 shows and the paragraph below it describes using an illustrative example, if X incompletely measures the true state TX (i.e., true BUN) due to classical measurement error as the extraneous influence then X is informative for TX even after considering X. Please note that in Figure 3 there are two times subscripts on the extraneous influence, because E and E are two independent measure errors.
Figure 3

Illustration of residual association with independent measure error in X for K = 1.

For example, going back to the analysis of Table 1, let X be BUN and Y be EGFR. Consider two persons who have BUN of X = 10 mg/dL measured with error today. Also assume that the true BUN state changes slowly. If so, and after 6 months one of these persons measures X = 20 mg/dL while the other measures X = 5 mg/dL, we can then surmise that since BUN changes slowly, it is more likely that the true BUN today (TX) of the former person is > 10 mg/dL and that of the latter is < 10 mg/dL. Thus, since (i) EGFR (Y) directly depends on TX not X, and (ii) X is informative on TX after considering X, then (iii) Y is not independent of X and similarly Y not independent of X meaning Co-DOSE in (4) occurs and the observed within- between-subject slopes differ. Appendix C shows that measurement error in the exposure that is independent of the outcome pushes both β and β towards 0, but more so for βWS. Such tempering from averaged measurement error has been proposed as a reason |β| < |β| was observed in dental research [19] and occupational epidemiology [28,29]. However, if M is correlated with Y (most likely being correlated with measurement error on Y [27]) the tempering of β’s from M will not be to 0. For example, consider TX = CD8 and TY = CD4 cells which together are the almost exclusive components of serum lymphocytes (TZ) (i.e., . Physiologically, TZ is constrained to create a negative β, β and β for TY on TX: subjects with a higher CD8 component of serum lymphocytes by converse must a have lower CD4 components. However, the measured lymphocyte count (Z) is subject to a correlated measurement error that equally spreads onto X and Y. For example, if a person is dehydrated, the entire measured lymphocyte (meaning both CD8 = X and CD4 = Y) portion of blood becomes artificially higher due to reduction of the percentage of water in the blood. If a person has a high (or low) measured lymphocyte count Z = TZ + M due to such measurement error, then M contributes to both CD4 (X) and CD8 (Y), making both simultaneously artificially higher (or lower). Consequently, within person, a higher measured CD4 count due to positive M is associated with higher measured CD8. Because in this case the measurement error is shared, naïve regression analysis tends to draw β towards being positive. On the other hand, β which tempers down M on both X and Y through averaging as shown in Appendix C, is less affected by the shared bias due to measurement error. We have only considered classical measurement error so far. The other common type of measurement error is known as Berkson error [30]. It is approximated by some exposure assessment procedures commonly used in environmental and occupational epidemiology (see semi-ecological design and group-based exposure assessment) [18]. While this is an aside to the main points of this paper, when Berkson measurement error exists, only the between-subject slope, β, is estimable. More details are in Appendix D.

4. Predictors Having Co-DOSE Will Bias Adjusted Parameter Estimates of Other Predictors Not Having Co-DOSE When Included Together in Cross-Sectional Regression When V ≠ IND Is Used

Going back to Table 1, it was shown earlier that the point estimate from GEE-IND for the adjusted cross-sectional association of HIV with EGFR is still consistent for . However, HIV infection status was constant over all replicates within the same subject, and therefore cannot have Co-DOSE in (4) as the entire effect of HIV is mediated between-subject, not within-subject. Consequently, the question arises whether the adjusted estimate from a non-independence correlation structure (say for example ) can be biased for . Please note that for this section, we use and to denote estimates for adjusted cross-sectional association for variable XXX from models using independence and equicorrelation structures, respectively. The added designation of “E” (CS-E) in the subscript for equicorrelation, but none for independence correlation, is made because the equicorrelation estimate (but not the independence estimate) can be asymptotically biased. The specific question addressed here is: could including BUN and albumin that each have Co-DOSE in the model bias the corresponding estimate for cross-sectional adjusted HIV association from using equicorrelation () so that it no longer is consistent for βHIV,CS in the multivariate model, even though HIV itself is not Co-DOSE? This is important, because in Table 1, of −2.04 95% CI (−5.07 0.98) qualitatively differs from of −3.96 (−6.90, −1.03) with only statistically (p < 0.01) differing from 0. We believe that for HIV in Table 1 is biased away from β. To help make this point, Table 3 presents normative data broken down by HIV status of the subjects. First we note from Table 1 that is biased higher for (with GEE-E = −1.22 > = −1.87, p < 0.0001 from GEE-IND), while from Table 3, those who are HIV+ have higher mean BUN (12.94 vs. 12.10, p < 0.0001 from GEE-IND). Thus, the full apparent “negative effect” of the higher BUN in HIV+ subjects from is underestimated by and this pushes down to compensate. Second, similarly, also from Table 1, is biased lower for β (with GEE-E = −9.84 < = −6.21), while from Table 3, HIV+ individuals have lower mean albumin (3.97 vs. 4.14, p < 0.0001 from GEE-IND). Thus, the apparent “positive effect” of the lower albumin in HIV+ subjects from is overestimated by , which pushes further down to compensate. Now we consider these two biases together as illustrated in Figure 4. These two deficits act jointly to push downwards from the true adjusted β. Therefore, non-independence can bias multivariate cross-sectional parameter estimates of variables that do not carry Co-DOSE in (4) when other variables in the model carry Co-DOSE.
Table 3

Means ± standard deviation of EGFR serum albumin and BUN broken down by HIV status across all repeated measures used in Table 1 and Table 2.

Variable For HIV + Subjects (496 persons 7326 Replicates)For HIV - Subjects (178 persons 3456 Replicates)
EGFR90.3 ± 27.292.4 ± 25.0
BUN12.94 ± 5.7112.10 ± 5.30
Albumin3.97 ± 0.444.14 ± 0.36
Figure 4

Compensating bias pathways on time invariant HIV estimate from failure to use an independent working correlation structure in repeated measures GEE.

5. Discussion

Numerous published papers fit GEE and MM cross-sectional regression models with repeated measures having time varying predictors that either use non-independence working correlations structures or do not state the correlation structure. These papers, which continue to be published, do not show awareness of the points presented in Section 1, Section 2, Section 3 and Section 4, above. Specifically, they: Neither specify whether the coefficients of interest are , or, nor check whether ; Make potentially invalid interpretations of from MM and GEE using non-independence correlation ; and/or; Do not justify the choice of non-independence working correlation structures in light of potential differences between , and . We have identified almost 45 such papers including some authored by us prior to becoming aware of these issues. This is almost certainly only a fraction of the total number of such papers. Yet papers published up to 65 years ago either warn against using non-independence working correlation structure in cross-sectional regression with repeated measures [19,20], or instruct to decompose the associations into within-subject () and between-subject () slopes to make causal inference [14,15,16,17]. Numerous examples where have been presented [14,15,16,17,18,19,20,22,23,24,25]. While it was not covered in our paper, this includes fitting GEE models of binary outcomes where the issues discussed here also apply [19,31]. However, these points are still not well known or emphasized in statistical software documentation and papers providing guidance on GEE and MM analyses (i.e., [5,6,7,8,9,10,11,12,13]). One problem that impedes acceptance of within- and between-subject decomposition is that it necessitates much more complicated models that are difficult to explain. Still, some air pollution epidemiologic studies have attempted within- and between-subject decompositions using cities as the subject and neighborhoods as the repeated measures within the city [32,33,34]. Most often in these studies, the magnitude was greater for within-subject slope |βWS| > |βBS| but sometimes |βBS| > |βWS| was observed meaning that possibly multiple causes for slope differences are involved. Those papers that did attempt to explain the reasons for the differences described only “common within-subject confounding” (Section 3E) as a potential reason; such as un-modeled pollutants that were correlated between (but not within) cities with the modeled pollutants of interest. Other studies in environmental research have considered the mechanism described in Section 3B, namely, lag causality in longitudinal analyses of association of air pollution on health measures [1]. Nevertheless, having to explain complicated and unknown mechanisms for biases such as these can appear to detract from the main purpose of the research and cast doubt on the overall findings, making the paper harder to publish. In other words, there appears to be neither incentive, nor guidance on how to engage with these issues for applied researchers. We concur with others [19,20], that cross-sectional regression with repeated measures should use independence as the default working correlation unless justification is given to use other . While non-independence can improve precision and thus be desirable [21], they can considerably bias estimates for cross-sectional parameters, , including perhaps towards what the investigator wants to see. For example, in Table 1, p < 0.01 was observed for association with HIV with worse EGFR in GEE-E compared to the more appropriate p = 0.19 from GEE-IND. An investigator who was expecting HIV to be associated with worse EGFR might thus be tempted to use the results from GEE-E for this reason. While showing this is beyond the scope of our paper, when is not independence, factors such as the values of J and magnitude/structure of ε strongly influence parameter estimate values for from the miss-fitted cross-sectional models, allowing the miss-fitted estimate to arbitrarily range from to [17]. Standardization is important and, as such factors will arbitrarily vary between studies, parameter estimates of become harder to compare across studies when differs at discretion of investigators. Therefore, the working correlation structure used in cross-sectional regressions using repeated measures should always be justified and reported. We also concur with others [14,15,16,17,18,19,23,24,25] that despite the difficulties in identifying why within- and between-subject slopes differ, causal inference analyses with repeated measures should initially make such decompositions. Investigators should then be wary if there are qualitative differences between and . For example, Table 2 with 584 subjects and 10,782 measurements demonstrated need for , decomposition to make causal inference (as well as for using GEE-IND in cross-sectional regression). However, a smaller study could have been less clear-cut. If the same point estimates for and seen in Table 2 were observed but did not statistically differ, one would be tempted to merge and into a combined at least for some variables, because standard model-fitting practice promotes parsimony when statistical significance is not observed. This would be particularly true if for a given variable, k, neither nor statistically differed from 0, but did. If such collapsing is done, it may still be important to report and for comparison to future studies and target potential mechanisms for within- between-subject slope differences as described in Section 3. Unfortunately, the within- and between-subject slope decomposition expands required analyses and presentation. Statistical software mostly does not have standard subroutines to do this. Decomposition can be tedious if is recalculated to maintain orthogonal decomposition of X as new models are fit if observations are excluded from the J due to missing values of newly included variables. The fact that the are ill-defined by averaging the X rather than being true means for subject i creates confusion about interpretation of that can also be influenced by within-subject slopes as was noted in Section 2B. When and differ, the causal mechanisms as to why this happens should be explored. For example, in our analysis presented in Table 2 with EGFR as the outcome, for BUN the between-subject slope (from GEE-IND) was statistically further from 0 in the expected direction of association than was the within-subject slope . However, the albumin went the other way: between-subject slope was statistically closer to 0 than was within-subject slope with again both slopes being in the expected direction from zero. So what are the potential reasons for this? While lag/reverse-lag causality (Section 3B,C) between BUN and creatinine (the main component of calculated EGFR) could reduce magnitude of βBUN,WS vs. βBUN,BS, this was unlikely given the separation of visits was 6 months and internal biochemistry operates over shorter time periods. However, independent measurement error on BUN (Section 3G) would temper |βBUN,WS| towards 0 relative to |βBUN,BS|. To that end, several articles find greater coefficient of variation [35,36], within-person change [35,36], assay error [36], and sample degradation for BUN vs. albumin measures [37], all of which could reflect BUN having larger independent measurement error than does albumin that would selectively attenuate towards 0 (i.e., more than it did to ). Conversely, serum creatinine and albumin are both constrained into the intravascular fluid compartment and will non-informatively increase together with greater hydration and decrease with less hydration of this compartment, inducing positively correlated measurement error, as in the case for measured CD4 and CD8 cells in the last paragraph of Section 3G. As creatinine factors inversely into the EGFR calculation, this would constitute negative correlation of measurement error between albumin and EGFR and selectively bias to be more negative than . However, BUN, which crosses across all body compartments, is less subject to such correlation in measurement error with creatinine and thus with EGFR. As is illustrated in the previous paragraph, we believe that the systematic epidemiological description of reasons for within- and between subject slopes to differ in Section 3 will provide some basis for future studies to explore this. That may lead to greater recognition and understanding of this phenomenon. However, our list of reasons for these slopes to differ may not be exhaustive. Furthermore, these mechanisms are quite complicated including that limited resources may be available to investigate them in given studies given the other tasks that need to be done and limited funding/personnel. When between- and within-subject slopes differ, , it is unclear which is the “least confounded or biased”, including the possibility that by “averaging” the different biases in each would make be the least biased. There may be a heuristic perception that by “matching within the same subject”, is superior to and , but this is not necessarily true as measurement error in X (Section 3G) and lag/reverse-lag and spillover causality (Section 3B–D) can in fact bias to a larger degree than they do for and .

6. Conclusions

It has been known for decades by some that when exposures vary within subjects in repeated measures regression then, (i) cross-sectional regression using = independence working correlation should be the default for building a model to estimate a future unknown Y as the goal, and (ii) within- and between-subject decompositions of slopes should at least initially be fit when building models for causal inference. Yet this advice rarely makes it into published guidelines and hence is not heeded, perhaps in part due to complexity of the settings where within- and between-subject slopes differ and limited substantive study of the mechanisms that cause such differences. In general, analysts should explore and quantify reasons for biases that can occur in such study designs. To that end, analyses using repeated measures regression should investigate if within- and between-subject slopes differ and when they do, try to identify the reasons for this.
Table A2

Parameter estimates for the within- between-subject decomposition regression of Table 2 EGFR = HIV infection, albumin and BUN in the Bronx WIHS using mixed models.

VariableCompartmentWorking Correlation Structure
IndependenceEquicorrelation
Point Estimate95% CI 1Z-Value (p) 1Point Estimate95% CI 1Z-Value (p) 1
HIV Infection Between-subjectHIV,BS)−1.16(−4.21, 1.88)−2.30(0.02)−1.57(−4.47, 1.32)−1.06(0.29)
NA 2---------NA 2------
AlbuminPer g/dL Between-subjectALB,BS) −3.28 (−4.78, −1.77) −4.26(0.16) −2.72 (−7.00, 1.57) −1.32 (0.19)
Within-subjectALB,WS)−10.70(−12.24, −9.15)13.60 (<0.0001)−10.67(−11.86, −9.48)−17.57 (<0.0001)
BUN Per mg/dL Between-subjectBUN,BS)−2.72(−2.84, −2.60)−44.87(<0.0001)−2.65(−2.95, −2.35)−17.10(<0.0001)
Within-subjectBUN,WS)−1.11(−1.22, −0.99)−18.59 (<0.0001)−1.11(−1.20, −1.03)−25.72 (<0.0001)
Akaike Information Criteria (AIC)98,451.894,824.7

1 The confidence interval and p-values for independence working correlation structure in particular but also arguably for equicorrelation overestimate the precision of the parameter estimates. Unlike GEE, mixed models are not robust to misspecification of the correlation structure. 2 There is no Within-subject Variation for HIV Infection Status.

  21 in total

Review 1.  Modelling covariance structure in the analysis of repeated measures data.

Authors:  R C Littell; J Pendergast; R Natarajan
Journal:  Stat Med       Date:  2000-07-15       Impact factor: 2.373

2.  Separation of individual-level and cluster-level covariate effects in regression analysis of correlated data.

Authors:  Melissa D Begg; Michael K Parides
Journal:  Stat Med       Date:  2003-08-30       Impact factor: 2.373

3.  A comparison of several approaches for choosing between working correlation structures in generalized estimating equation analysis of longitudinal binary data.

Authors:  Justine Shults; Wenguang Sun; Xin Tu; Hanjoo Kim; Jay Amsterdam; Joseph M Hilbe; Thomas Ten-Have
Journal:  Stat Med       Date:  2009-08-15       Impact factor: 2.373

4.  Effects of correlated and uncorrelated measurement error on linear regression and correlation in medical method comparison studies.

Authors:  R D Rifkin
Journal:  Stat Med       Date:  1995-04-30       Impact factor: 2.373

5.  Modeling long-term average exposure in occupational exposure-response analysis.

Authors:  L Preller; H Kromhout; D Heederik; M J Tielen
Journal:  Scand J Work Environ Health       Date:  1995-12       Impact factor: 5.024

6.  Individual-based and group-based occupational exposure assessment: some equations to evaluate different strategies.

Authors:  E Tielemans; L L Kupper; H Kromhout; D Heederik; R Houba
Journal:  Ann Occup Hyg       Date:  1998-02

7.  Bias in the estimation of exposure effects with individual- or group-based exposure assessment.

Authors:  Hyang-Mi Kim; David Richardson; Dana Loomis; Martie Van Tongeren; Igor Burstyn
Journal:  J Expo Sci Environ Epidemiol       Date:  2010-02-24       Impact factor: 5.563

8.  Long-term exposure to air pollution and incidence of cardiovascular events in women.

Authors:  Kristin A Miller; David S Siscovick; Lianne Sheppard; Kristen Shepherd; Jeffrey H Sullivan; Garnet L Anderson; Joel D Kaufman
Journal:  N Engl J Med       Date:  2007-02-01       Impact factor: 91.245

9.  Real longitudinal data analysis for real people: building a good enough mixed model.

Authors:  Jing Cheng; Lloyd J Edwards; Mildred M Maldonado-Molina; Kelli A Komro; Keith E Muller
Journal:  Stat Med       Date:  2010-02-20       Impact factor: 2.373

10.  Stability studies of common biochemical analytes in serum separator tubes with or without gel barrier subjected to various storage conditions.

Authors:  Serap Cuhadar; Ayşenur Atay; Mehmet Koseoglu; Ahmet Dirican; Aysel Hur
Journal:  Biochem Med (Zagreb)       Date:  2012       Impact factor: 2.313

View more
  2 in total

1.  Mortgage Discrimination and Racial/Ethnic Concentration Are Associated with Same-Race/Ethnicity Partnering among People Who Inject Drugs in 19 US Cities.

Authors:  Sabriya L Linton; Hannah L F Cooper; Yen-Tyng Chen; Mohammed A Khan; Mary E Wolfe; Zev Ross; Don C Des Jarlais; Samuel R Friedman; Barbara Tempalski; Dita Broz; Salaam Semaan; Cyprian Wejnert; Gabriela Paz-Bailey
Journal:  J Urban Health       Date:  2020-02       Impact factor: 3.671

2.  Change in Circulating Undercarboxylated Osteocalcin (ucOCN) Is Associated With Fat Accumulation in HIV-Seropositive Women.

Authors:  Arnold Z Olali; Anjali Sharma; Qiuhu Shi; Donald R Hoover; Kathleen M Weber; Audrey L French; Heather S McKay; Phyllis C Tien; Lena Al-Harthi; Michael T Yin; Ryan D Ross
Journal:  J Acquir Immune Defic Syndr       Date:  2021-04-15       Impact factor: 3.771

  2 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.