Donald R Hoover, Qiuhu Shi, Igor Burstyn, Kathryn Anastos.
Abstract
When repeated measures linear regression models are used to make causal inferences in laboratory, clinical and environmental research, it is typically assumed that the within-subject association of differences (or changes) in predictor variable values across replicates is the same as the between-subject association of differences in those predictor variable values. However, this is often false. For example, with body weight as the predictor variable and blood cholesterol (which increases with higher body fat) as the outcome: (i) a 10-lb weight increase in the same adult indicates a greater increase in cholesterol in that adult than (ii) one adult weighing 10 lbs more than a second adult indicates higher cholesterol in the heavier adult. A 10-lb weight gain in the first adult more likely reflects a build-up of body fat in that person, while a second person being 10 lbs heavier than the first could be influenced by other factors, such as the second person being taller. Hence, to make causal inferences, different within- and between-subject slopes should be separately modeled. A related misconception, commonly made when using generalized estimating equations (GEE) and mixed models on repeated measures (i.e., for fitting cross-sectional regression), is that the working correlation structure only influences the variance of the parameter estimates. However, only an independence working correlation guarantees that the modeled parameters are interpretable. We illustrate this with an example where changing the working correlation from independence to equicorrelation qualitatively biases the parameters of GEE models, and we show that this happens because the within- and between-subject slopes for the outcomes regressed on the predictor variables differ.
We then systematically describe several common mechanisms that cause within- and between-subject slopes to differ: change effects, lag/reverse-lag and spillover causality, shared within-subject measurement bias or confounding, and predictor variable measurement error. The misconceptions we describe should be better publicized. Repeated measures analyses should compare the within- and between-subject slopes of predictors and, when they differ, investigate the causal reasons for this.
Keywords: cross-sectional regression; generalized estimating equations; mixed models; repeated measures; within-/between-subject associations; working correlation structure
Year: 2019 PMID: 30754731 PMCID: PMC6388388 DOI: 10.3390/ijerph16030504
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
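As a worked form of the abstract's recommendation, the cross-sectional model and the within-/between-subject decomposition can be written as below. The notation is an assumption introduced here for illustration and is not taken from the source: subject $i$, replicate $j$, outcome $Y_{ij}$, predictor $X_{ij}$, and subject mean $\bar{X}_{i\cdot}$ over $n_i$ replicates.

```latex
% Cross-sectional model: a single slope for the predictor
Y_{ij} = \beta_0 + \beta X_{ij} + \varepsilon_{ij}

% Decomposition: separate between-subject and within-subject slopes
Y_{ij} = \beta_0 + \beta_B \bar{X}_{i\cdot}
       + \beta_W \left( X_{ij} - \bar{X}_{i\cdot} \right) + \varepsilon_{ij},
\qquad
\bar{X}_{i\cdot} = \frac{1}{n_i} \sum_{j=1}^{n_i} X_{ij}
```

The first model is the special case $\beta_B = \beta_W = \beta$. When the two slopes differ, the single $\beta$ is a blend of them, and the blend depends on the working correlation structure. A time-invariant predictor such as HIV infection status has no within-subject term.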
Cross-sectional regression parameter estimates using GEE 1 for EGFR = HIV infection, serum albumin and BUN in the Bronx WIHS.
| Variable (columns grouped by working correlation structure) | Independence: Point Estimate | Independence: 95% CI | Independence: Z-Value (p-Value) | Equicorrelation ²: Point Estimate | Equicorrelation ²: 95% CI | Equicorrelation ²: Z-Value (p-Value) |
|---|---|---|---|---|---|---|
| HIV Infection | −2.04 | (−5.07, 0.98) | −1.32 (<0.19) | −3.96 | (−6.90, −1.03) | −2.65 (0.0081) |
| Albumin Per g/dL | −6.21 | (−8.95, −3.47) | −4.44 (<0.0001) | −9.84 | (−12.01, −7.68) | −8.93 (<0.0001) |
| BUN Per mg/dL | −1.87 | (−2.12, −1.62) | −14.45 (<0.0001) | −1.22 | (−1.46, −0.99) | −10.30 (<0.0001) |
| Quasi-Likelihood Information Criteria (QIC) | 10,847.14 | | | 10,836.27 | | |
¹ Mixed models gave essentially similar point estimates; see Appendix A.
² The intraclass correlation of residuals from GEE-E was 0.45, indicating that a non-independence correlation structure was structurally correct.
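The point estimates, confidence intervals and Z-values in the table above can be cross-checked against one another. The sketch below assumes the intervals are symmetric Wald intervals (estimate ± 1.96 × SE). That assumption and the helper name `wald_from_ci` are illustrative and are not stated in the source.

```python
def wald_from_ci(estimate, lower, upper, z_crit=1.96):
    """Recover the standard error and Z statistic implied by a symmetric Wald CI.

    Assumes the interval was built as estimate +/- z_crit * SE.
    """
    se = (upper - lower) / (2.0 * z_crit)
    return se, estimate / se


# HIV infection under the independence working correlation (values from the table)
se, z = wald_from_ci(-2.04, -5.07, 0.98)
print(round(se, 2), round(z, 2))  # prints 1.54 -1.32
```

Under that assumption the implied Z-value agrees with the tabulated −1.32. The same check can be applied to any row that reports a complete interval.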
Parameter estimates for the cross-sectional regression of Table 1 (EGFR = HIV infection, serum albumin and BUN) in the Bronx WIHS using mixed models.
| Variable (columns grouped by working correlation structure) | Independence: Point Estimate | Independence: 95% CI ¹ | Independence: Z-Value (p-Value) | Equicorrelation: Point Estimate | Equicorrelation: 95% CI ¹ | Equicorrelation: Z-Value (p-Value) |
|---|---|---|---|---|---|---|
| HIV Infection | −2.04 | (−3.04, −1.04) | −4.02 (<0.0001) | −3.99 | (−7.04, −0.93) | −2.55 (0.01) |
| Albumin Per g/dL | −6.21 | (−7.30, −5.11) | −11.04 (<0.0001) | −9.89 | (−11.03, −8.73) | −16.90 (<0.0001) |
| BUN Per mg/dL | −1.87 | (−1.95, −1.79) | −44.68 (<0.0001) | −1.22 | (−1.30, −1.13) | −29.08 (<0.0001) |
| Akaike Information Criteria (AIC) | 99,374.5 | | | 94,934.5 | | |
¹ The confidence intervals and p-values overestimate the precision of the parameter estimates, particularly for the independence working correlation structure but arguably also for equicorrelation. Unlike GEE, mixed models are not robust to misspecification of the correlation structure.
Figure 1. Illustration of common within-subject measurement bias and confounding for K = 1.
Within- and between-subject decomposition regression parameter estimates using GEE 1 for EGFR = HIV infection, serum albumin and BUN in the Bronx WIHS.
| Variable | Compartment | Independence: Point Estimate | Independence: 95% CI | Independence: Z-Value (p-Value) | Equicorrelation: Point Estimate | Equicorrelation: 95% CI | Equicorrelation: Z-Value (p-Value) |
|---|---|---|---|---|---|---|---|
| HIV Infection | Between-subject | −1.16 | (−4.21, | −0.75 | −1.57 | (−4.47, | −1.06 |
| | NA ² | --- | --- | --- | NA ² | --- | --- |
| Albumin | Between-subject | −3.27 | (−7.88, | −1.39 | −2.71 | (−7.00, | −1.24 (0.21) |
| | Within-subject | −10.70 | (−12.99, | −9.16 (<0.0001) | −10.70 | (−12.99, | −9.16 (<0.0001) |
| BUN Per mg/dL | Between-subject | −2.72 | (−3.10, | −13.89 | −2.65 | (−3.01, | −14.21 |
| | Within-subject | −1.11 | (−1.34, | −9.31 (<0.0001) | −1.11 | (−1.34, | −9.31 (<0.0001) |
| Quasi-Likelihood Information Criteria (QIC) | | 10,866.64 | | | 10,857.62 | | |
¹ Mixed models gave essentially similar point estimates; see Appendix A.
² There is no within-subject variation for HIV infection status.
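A minimal sketch of the within-/between-subject decomposition behind the table above, run on simulated data rather than the WIHS data. The function names (`ols_slope`, `simulate`, `decompose`), the simulated slopes (−3 between, −1 within) and the sample sizes are hypothetical. Ordinary least squares is swapped in for GEE to keep the example free of dependencies.

```python
import random
from statistics import mean


def ols_slope(x, y):
    """Least-squares slope of y on one predictor x (intercept included)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def simulate(n_subjects=300, n_reps=6, beta_between=-3.0, beta_within=-1.0, seed=0):
    """Hypothetical data: a subject-level predictor mean plus replicate-level
    deviations, each acting on the outcome with its own slope."""
    rng = random.Random(seed)
    rows = []  # (subject id, x, y)
    for i in range(n_subjects):
        level = rng.gauss(0.0, 1.0)
        devs = [rng.gauss(0.0, 1.0) for _ in range(n_reps)]
        d_bar = mean(devs)
        for d in devs:
            dev = d - d_bar  # centered, so the subject mean of x equals level
            y = beta_between * level + beta_within * dev + rng.gauss(0.0, 0.5)
            rows.append((i, level + dev, y))
    return rows


def decompose(rows):
    """Return (between-subject slope, within-subject slope, pooled slope)."""
    by_subject = {}
    for i, x, y in rows:
        by_subject.setdefault(i, []).append((x, y))
    xbar = {i: mean(x for x, _ in v) for i, v in by_subject.items()}
    ybar = {i: mean(y for _, y in v) for i, v in by_subject.items()}
    ids = sorted(by_subject)
    # Between: subject means of y regressed on subject means of x
    between = ols_slope([xbar[i] for i in ids], [ybar[i] for i in ids])
    # Within: deviations from subject means regressed on each other
    within = ols_slope([x - xbar[i] for i, x, _ in rows],
                       [y - ybar[i] for i, _, y in rows])
    # Pooled: one cross-sectional slope that ignores the subject structure
    pooled = ols_slope([x for _, x, _ in rows], [y for _, _, y in rows])
    return between, within, pooled


between, within, pooled = decompose(simulate())
print(between, within, pooled)
```

With these settings the pooled slope falls between the two component slopes, which is why a single cross-sectional slope is hard to interpret when the within- and between-subject slopes differ.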
Figure 2. Illustration of lag causality and reverse-lag causality for K = 1.
Figure 3. Illustration of residual association with independent measurement error in X for K = 1.
Means ± standard deviations of EGFR, serum albumin and BUN, broken down by HIV status, across all repeated measures used in Table 1 and Table 2.
| Variable | HIV+ Subjects (496 persons, 7326 replicates) | HIV− Subjects (178 persons, 3456 replicates) |
|---|---|---|
| EGFR | 90.3 ± 27.2 | 92.4 ± 25.0 |
| BUN (mg/dL) | 12.94 ± 5.71 | 12.10 ± 5.30 |
| Serum albumin (g/dL) | 3.97 ± 0.44 | 4.14 ± 0.36 |
Figure 4. Compensating bias pathways on the time-invariant HIV estimate from failure to use an independent working correlation structure in repeated measures GEE.
Parameter estimates for the within-/between-subject decomposition regression of Table 2 (EGFR = HIV infection, albumin and BUN) in the Bronx WIHS using mixed models.
| Variable | Compartment | Independence: Point Estimate | Independence: 95% CI ¹ | Independence: Z-Value (p-Value) | Equicorrelation: Point Estimate | Equicorrelation: 95% CI ¹ | Equicorrelation: Z-Value (p-Value) |
|---|---|---|---|---|---|---|---|
| HIV Infection | Between-subject | −1.16 | (−4.21, 1.88) | −2.30 | −1.57 | (−4.47, 1.32) | −1.06 |
| | NA ² | --- | --- | --- | NA ² | --- | --- |
| Albumin | Between-subject | −3.28 | (−4.78, | −4.26 | −2.72 | (−7.00, | −1.32 (0.19) |
| | Within-subject | −10.70 | (−12.24, | −13.60 (<0.0001) | −10.67 | (−11.86, | −17.57 (<0.0001) |
| BUN Per mg/dL | Between-subject | −2.72 | (−2.84, | −44.87 | −2.65 | (−2.95, | −17.10 |
| | Within-subject | −1.11 | (−1.22, | −18.59 (<0.0001) | −1.11 | (−1.20, | −25.72 (<0.0001) |
| Akaike Information Criteria (AIC) | | 98,451.8 | | | 94,824.7 | | |
¹ The confidence intervals and p-values overestimate the precision of the parameter estimates, particularly for the independence working correlation structure but arguably also for equicorrelation. Unlike GEE, mixed models are not robust to misspecification of the correlation structure.
² There is no within-subject variation for HIV infection status.