Shaun Seaman, Menelaos Pavlou, Andrew Copas.
Abstract
Clustered data are common in medical research. Typically, one is interested in a regression model for the association between an outcome and covariates. Two complications that can arise when analysing clustered data are informative cluster size (ICS) and confounding by cluster (CBC). ICS and CBC mean that the outcome of a member given its covariates is associated with, respectively, the number of members in the cluster and the covariate values of other members in the cluster. Standard generalised linear mixed models for cluster-specific inference and standard generalised estimating equations for population-average inference assume, in general, the absence of ICS and CBC. Modifications of these approaches have been proposed to account for CBC or ICS. This article is a review of these methods. We express their assumptions in a common format, thus providing greater clarity about the assumptions that methods proposed for handling CBC make about ICS and vice versa, and about when different methods can be used in practice. We report relative efficiencies of methods where available, describe how methods are related, identify a previously unreported equivalence between two key methods, and propose some simple additional methods. Unnecessarily using a method that allows for ICS/CBC has an efficiency cost when ICS and CBC are absent. We review tools for identifying ICS/CBC. A strategy for analysis when CBC and ICS are suspected is demonstrated by examining the association between socio-economic deprivation and preterm neonatal death in Scotland.
Keywords: conditional maximum likelihood; confounding by cluster; contextual effect; informative cluster size; poor man's method; within-cluster effect
Year: 2014 PMID: 25087978 PMCID: PMC4320764 DOI: 10.1002/sim.6277
Source DB: PubMed Journal: Stat Med ISSN: 0277-6715 Impact factor: 2.373
Methods for cluster-specific inference: estimands and assumptions needed to estimate them consistently. See main text for more details.
| Method | Assumptions | Estimands | Notes |
|---|---|---|---|
| GLMM | A1, A2 | | Alternative assumptions for LMM: A3 and A4 for … of … |
| Conditional ML | A1 | | A1 not needed for LMM |
| Poor man's method | A1 | | Same as conditional ML for LMM; approximate otherwise |
| Brumback | A1 | | Generalisation of poor man's method |
| Conditional GEE | None | | Only for identity/log link |
| Conditional ML | A1 | | |
| Model expectation of random intercept | A1 | | Generalisation of Brumback's method |
| Treat random intercept as fixed effect | A1. Also: A6 if … and … if not LMM | | Same as conditional ML if Var(…) and identity/log link; same as conditional LMM if … |
| Joint model | A1, A7 | | See 3.3.8 for alternative to A7 when … |
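The poor man's method listed above splits a covariate into its cluster mean and the deviation from that mean, then enters the two components as separate regressors, which yields a between-cluster and a within-cluster coefficient. The following is a minimal sketch for a linear outcome fitted by ordinary least squares, not the mixed model discussed in the paper; the function names and the toy data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def split_within_between(cluster_ids, x):
    """Return (cluster_means, deviations) for covariate x, one entry per member."""
    sums, counts = defaultdict(float), defaultdict(int)
    for c, xi in zip(cluster_ids, x):
        sums[c] += xi
        counts[c] += 1
    means = [sums[c] / counts[c] for c in cluster_ids]
    devs = [xi - m for xi, m in zip(x, means)]
    return means, devs


def poor_mans_linear(cluster_ids, x, y):
    """OLS of y on intercept, cluster mean of x, and within-cluster deviation of x.

    The deviations sum to zero within every cluster, so they are orthogonal
    to both the intercept and the cluster means. The least-squares problem
    therefore separates into two simple regressions.
    """
    means, devs = split_within_between(cluster_ids, x)
    n = len(y)
    # Within-cluster slope: regression of y on the deviations (no intercept needed).
    beta_within = sum(d * yi for d, yi in zip(devs, y)) / sum(d * d for d in devs)
    # Between-cluster slope: simple regression of y on the cluster means.
    m_bar, y_bar = sum(means) / n, sum(y) / n
    beta_between = (
        sum((m - m_bar) * (yi - y_bar) for m, yi in zip(means, y))
        / sum((m - m_bar) ** 2 for m in means)
    )
    return beta_within, beta_between


# Toy data (hypothetical): y = 1 + 2 * deviation + 5 * cluster_mean, no noise.
cluster_ids = ["A", "A", "B", "B", "B"]
x = [1.0, 3.0, 4.0, 6.0, 8.0]
y = [9.0, 13.0, 27.0, 31.0, 35.0]
beta_within, beta_between = poor_mans_linear(cluster_ids, x, y)
```

In this toy example the two slopes differ (2 within clusters, 5 between clusters), which is the pattern one would expect under confounding by cluster; a single pooled slope would mix the two.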
Methods for population-average inference: estimands and assumptions needed to estimate them consistently. See main text for further detail.
| Method | Assumptions | Estimand | Notes |
|---|---|---|---|
| GEE | | | |
| Marginalise over GLMM | GLMM correct (so A8 and …) | | |
| Marginalise over joint model | Joint model correct | | |
| IEE | | | |
| Poor man's method only on … | | | See text |
| Include … | Model for … | | See text |
| Weighted IEE | | | Useful when cluster-size balance or … |
| Type 1 doubly weighted IEE | Each cluster has all possible … | | See text |
| Type 3 doubly weighted IEE | Propensity score model | Marginal treatment effect in typical member | Method of Joffe … extra weighting by … |
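To illustrate how weighting changes the estimand under informative cluster size, the sketch below compares unweighted independence estimating equations (IEE) with a weighted version for an intercept-only model with identity link. It assumes the weights are the inverse of the cluster size, a common choice in the ICS literature; the table above does not state the weights, so this is an assumption, and the function names and toy data are hypothetical.

```python
from collections import Counter


def iee_mean(y):
    """Solve sum_ij (y_ij - mu) = 0: every member counts equally."""
    return sum(y) / len(y)


def weighted_iee_mean(cluster_ids, y):
    """Solve sum_ij w_ij * (y_ij - mu) = 0 with w_ij = 1 / (size of cluster i).

    Assumed weighting: each cluster contributes total weight 1, so the
    solution equals the mean of the cluster means.
    """
    sizes = Counter(cluster_ids)
    weights = [1.0 / sizes[c] for c in cluster_ids]
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)


# Toy data (hypothetical): the small cluster has a higher outcome rate.
cluster_ids = ["A", "B", "B", "B", "B"]
y = [1, 0, 0, 0, 1]

mu_all_members = iee_mean(y)                          # 2 / 5
mu_typical_cluster = weighted_iee_mean(cluster_ids, y)  # (1 + 0.25) / 2
```

When the outcome is unrelated to cluster size the two quantities target the same value, so the choice matters only when ICS is present.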
Log odds ratios of preterm neonatal death for six-point increase in deprivation score.
| Method | log OR | SE | 95% CI lower | 95% CI upper | p-value |
|---|---|---|---|---|---|
| IEE | 0.600 | 0.178 | 0.251 | 0.949 | 0.001 |
| ML estimation of GLMM | 0.546 | 0.189 | 0.175 | 0.917 | 0.004 |
| Conditional ML | 0.460 | 0.195 | 0.076 | 0.843 | 0.019 |
| Poor man's method | | | | | |
| Within-cluster | 0.470 | 0.198 | 0.083 | 0.858 | 0.017 |
| Between-cluster | 1.262 | 0.641 | 0.005 | 2.519 | 0.049 |
| Model random intercept | | | | | |
| Within-cluster | 0.464 | 0.197 | 0.079 | 0.850 | 0.018 |
| Between-cluster | 0.987 | 0.630 | −0.248 | 2.222 | 0.117 |
| Cluster size (×1000) | 0.007 | 0.003 | 0.000 | 0.013 | 0.035 |
Methods are independence estimating equations (IEE), maximum likelihood (ML) estimation of a random-intercept logistic regression model, conditional ML estimation of the same model, the poor man's method, and modelling the expectation of the random intercept as a linear function of the mean deprivation in the cluster and the cluster size.
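The interval and the final column of the table can be checked from the log OR and SE alone. The sketch below assumes the reported intervals are Wald intervals and the last column holds two-sided Wald p-values, which the figures are consistent with but which this record does not state; `wald_summary` is a hypothetical helper name.

```python
from math import exp
from statistics import NormalDist


def wald_summary(log_or, se, level=0.95):
    """Wald interval and two-sided p-value for a log odds ratio (assumed method)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    z = log_or / se
    return {
        "lower": log_or - z_crit * se,
        "upper": log_or + z_crit * se,
        "p_value": 2 * (1 - std_normal.cdf(abs(z))),
        "odds_ratio": exp(log_or),  # back-transform to the odds-ratio scale
    }


# IEE row of the table: log OR 0.600, SE 0.178.
iee = wald_summary(0.600, 0.178)
```

For the IEE row this gives limits that round to 0.251 and 0.949 and a p-value that rounds to 0.001, matching the table. Other rows may differ in the third decimal place because the inputs are themselves rounded.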