Rachael A Hughes, Jon Heron, Jonathan A C Sterne, Kate Tilling.
Abstract
BACKGROUND: Missing data are unavoidable in epidemiological research, potentially leading to bias and loss of precision. Multiple imputation (MI) is widely advocated as an improvement over complete case analysis (CCA). However, contrary to widespread belief, CCA is preferable to MI in some situations.
Keywords: Complete case analysis; inverse probability weighting; missing data; missing data mechanisms; missing data patterns; multiple imputation
Year: 2019 PMID: 30879056 PMCID: PMC6693809 DOI: 10.1093/ije/dyz032
Source DB: PubMed Journal: Int J Epidemiol ISSN: 0300-5771 Impact factor: 7.196
Figure 1. Diagrams showing causal relationships between the completely observed outcomes of the linear and logistic regression (depression symptom score and self-harm, respectively), completely observed covariates maternal substance use and sex, incompletely observed exposure cannabis use, and MissCU, a binary variable that indicates whether cannabis use is observed or missing. Note: for clarity, we have not included all arrows between the covariates.
Potential bias of the exposure regression coefficient in complete case analysis based on linear or logistic regression, according to the reasons for missing data. Unless otherwise stated, the entries apply to both Missing At Random and Missing Not At Random missingness mechanisms.
| Variables missingness is dependent upon | Exposure regression coefficient: linear | Exposure regression coefficient: logistic |
|---|---|---|
| None (i.e. Missing Completely At Random) | Unbiased | Unbiased |
| Outcome | Biased | Unbiased |
| Exposure (and possibly confounders) | Unbiased | Unbiased |
| Outcome and confounders | Biased | Unbiased |
| Outcome and exposure (and possibly confounders) | Biased | Biased |
Biased in general, except when in truth there is no association between the outcome and the exposure (i.e. the true value of the exposure regression coefficient is zero).
Biased in general, except when missingness depends on the outcome and exposure independently.
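The linear-regression column of the table above can be illustrated with a small simulation. This is a minimal sketch, not the authors' analysis: the data-generating model (a single standard-normal exposure, true slope 2, no confounders), the logistic missingness probabilities, the sample size and the seed are all assumptions chosen for illustration.

```python
import math
import random


def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def expit(z):
    """Inverse logit; maps a real number to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def complete_case_slope(data, prob_observed, rng):
    """Drop records at random according to prob_observed(x, y), then fit OLS."""
    kept = [(x, y) for x, y in data if rng.random() < prob_observed(x, y)]
    xs, ys = zip(*kept)
    return ols_slope(xs, ys)


rng = random.Random(2019)  # fixed seed so the run is repeatable
TRUE_SLOPE = 2.0           # assumed true exposure coefficient

# Simulated full data: y = 1 + 2x + noise (hypothetical model).
data = []
for _ in range(20000):
    x = rng.gauss(0.0, 1.0)
    y = 1.0 + TRUE_SLOPE * x + rng.gauss(0.0, 1.0)
    data.append((x, y))

slope_full = ols_slope(*zip(*data))
# Missingness depends on nothing (Missing Completely At Random).
slope_mcar = complete_case_slope(data, lambda x, y: 0.5, rng)
# Missingness depends on the exposure only.
slope_exposure = complete_case_slope(data, lambda x, y: expit(-x), rng)
# Missingness depends on the outcome only.
slope_outcome = complete_case_slope(data, lambda x, y: expit(-(y - 1.0)), rng)

for label, value in [
    ("full data", slope_full),
    ("MCAR", slope_mcar),
    ("depends on exposure", slope_exposure),
    ("depends on outcome", slope_outcome),
]:
    print(f"{label:>20}: slope = {value:.3f}")
```

Under these assumptions, the complete-case slope should stay close to 2 when missingness is completely at random or depends only on the exposure, and should be pulled towards zero when missingness depends on the outcome, in line with the corresponding rows of the table.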
Results of the missingness model applied to 679 participants with observed values for adult body mass index (BMI), weight at 5 years and maternal weight
| | Odds ratio | 95% CI |
|---|---|---|
| Weight at 5 years (kg) (exposure variable) | 0.913 | 0.827, 1.01 |
| Birth weight (kg) | 1.19 | 0.775, 1.83 |
| Sex | 0.721 | 0.479, 1.09 |
| Maternal weight (kg) | 0.950 | 0.924, 0.976 |
| Adult BMI (kg/m2) (outcome variable) | 1.06 | 1.01, 1.11 |
CI, confidence interval.
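One way to read the missingness model is to check which 95% confidence intervals exclude the null odds ratio of 1. The sketch below simply re-enters the reported values; the dictionary layout and the helper name `excludes_null` are illustrative choices, not part of the original analysis.

```python
# Reported odds ratios and 95% CIs: (odds ratio, lower limit, upper limit).
missingness_model = {
    "Weight at 5 years (kg)": (0.913, 0.827, 1.01),
    "Birth weight (kg)": (1.19, 0.775, 1.83),
    "Sex": (0.721, 0.479, 1.09),
    "Maternal weight (kg)": (0.950, 0.924, 0.976),
    "Adult BMI (kg/m2)": (1.06, 1.01, 1.11),
}


def excludes_null(lower, upper, null=1.0):
    """True when the interval lies entirely above or entirely below the null."""
    return lower > null or upper < null


flagged = [
    name
    for name, (_, lower, upper) in missingness_model.items()
    if excludes_null(lower, upper)
]
print(flagged)
```

With the reported intervals, only maternal weight and adult BMI (the outcome) are flagged, which suggests that missingness is associated with the outcome in these data.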
Figure 2. Diagram showing the causal relationship between the outcome [adult body mass index (BMI)], exposure (weight at age 5), confounders [birth weight, sex, gestational age, maternal weight, paternal weight and parental socioeconomic status (SES)], and complete case, a binary variable that indicates whether a participant is a complete case (observed values for the outcome, exposure and all confounders) or an incomplete case (missing values for at least one of these variables). Note: we have not included all arrows between the covariates.
Missing data patterns of the main analysis variables: outcome (adult BMI), exposure (weight at 5 years), confounders (maternal weight, paternal weight and parental socioeconomic status) for 951 participants of the Barry Caerphilly Growth Study. ✓ denotes observed, × denotes missing, and ✓/× denotes some observed and some missing. Omitted variables sex and birth weight were completely observed.
| Pattern | Outcome (follow-up study) | Exposure (original childhood study) | Confounders (original childhood study) | Number of participants (%) |
|---|---|---|---|---|
| 1 | ✓ | ✓ | ✓ | 547 (57.5%) |
| 2 | ✓ | ✓ | ✓/× | 125 (13.1%) |
| 3 | ✓ | × | ✓ | 7 (0.7%) |
| 4 | × | ✓ | ✓ | 210 (22.1%) |
| 5 | × | ✓ | ✓/× | 61 (6.4%) |
| 6 | × | × | ✓ | 1 (0.1%) |
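The percentages in the table can be recomputed from the pattern counts as a quick consistency check. This sketch only re-enters the reported counts; the variable names are illustrative.

```python
# Number of participants in each missing data pattern (patterns 1 to 6).
pattern_counts = [547, 125, 7, 210, 61, 1]

total = sum(pattern_counts)
percentages = [round(100 * count / total, 1) for count in pattern_counts]

# Pattern 1 is the only pattern with every analysis variable observed.
complete_case_fraction = pattern_counts[0] / total

print(total)
print(percentages)
print(f"complete cases: {complete_case_fraction:.1%}")
```

The counts sum to the 951 participants stated in the caption, and a complete case analysis would use only the 547 participants in pattern 1.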
Results of complete case and multiple imputation analyses of the association of weight at 5 years with adult BMI, using data from the Barry Caerphilly Growth Study
| | Complete case analysis: log OR | Complete case analysis: SE | Complete case analysis: 95% CI | Multiple imputation: log OR | Multiple imputation: SE | Multiple imputation: 95% CI |
|---|---|---|---|---|---|---|
| Weight at 5 years (kg) (exposure variable) | 0.467 | 0.0876 | 0.295, 0.639 | 0.458 | 0.0735 | 0.314, 0.602 |
| Birth weight (kg) | –0.176 | 0.438 | –1.04, 0.684 | –0.788 | 0.410 | –1.60, 0.0200 |
| Sex | 0.209 | 0.372 | –0.521, 0.940 | 0.165 | 0.334 | –0.492, 0.822 |
| Gestational age: < 39 weeks | 0.635 | 0.610 | –0.564, 1.83 | 0.150 | 0.565 | –0.963, 1.26 |
| Gestational age: 39–40 weeks | 0 (reference) | | | 0 (reference) | | |
| Gestational age: > 41 weeks | –0.00779 | 0.561 | –1.11, 1.09 | 0.321 | 0.476 | –0.615, 1.26 |
| Maternal weight (kg) | 0.0810 | 0.0198 | 0.0421, 0.120 | 0.0835 | 0.0183 | 0.0475, 0.120 |
| Paternal weight (kg) | 0.0463 | 0.0180 | 0.0110, 0.0816 | 0.0477 | 0.0170 | 0.0143, 0.0812 |
| Parental socioeconomic status: I/II | –0.633 | 0.493 | –1.60, 0.334 | –0.791 | 0.453 | –1.68, 0.101 |
| Parental socioeconomic status: III | 0 (reference) | | | 0 (reference) | | |
| Parental socioeconomic status: IV/V | 1.07 | 0.465 | 0.158, 1.99 | 1.20 | 0.449 | 0.317, 2.09 |
n, number of observations; m, number of imputations; log OR, odds ratio on the natural logarithm scale; SE, standard error; CI, confidence interval.
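The relationship between the estimates, standard errors and confidence intervals in the table can be sketched with a normal-approximation (Wald) interval, estimate ± 1.96 × SE. This is an assumption for illustration: the source does not state how the intervals were computed, and multiple imputation intervals are often based on a t reference distribution instead.

```python
def wald_ci(estimate, se, z=1.96):
    """Normal-approximation 95% interval: estimate plus or minus z * SE."""
    return (estimate - z * se, estimate + z * se)


# Exposure coefficient (weight at 5 years), as reported in the table.
cca_estimate, cca_se = 0.467, 0.0876
mi_estimate, mi_se = 0.458, 0.0735

cca_ci = tuple(round(limit, 3) for limit in wald_ci(cca_estimate, cca_se))
mi_ci = tuple(round(limit, 3) for limit in wald_ci(mi_estimate, mi_se))

# Ratio of standard errors: values below 1 mean MI is more precise here.
se_ratio = mi_se / cca_se

print(cca_ci)
print(mi_ci)
print(f"SE ratio (MI / CCA): {se_ratio:.3f}")
```

For the exposure row, this reproduces the reported intervals to three decimal places. The two point estimates are similar, and the multiple imputation standard error is roughly 16% smaller than the complete case one.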