Jonathan W Bartlett, Ofer Harel, James R Carpenter.
Abstract
Missing data are a commonly occurring threat to the validity and efficiency of epidemiologic studies. Perhaps the most common approach to handling missing data is to simply drop those records with 1 or more missing values, in so-called "complete records" or "complete case" analysis. In this paper, we bring together earlier-derived yet perhaps now somewhat neglected results which show that a logistic regression complete records analysis can provide asymptotically unbiased estimates of the association of an exposure of interest with an outcome, adjusted for a number of confounders, under a surprisingly wide range of missing-data assumptions. We give detailed guidance describing how the observed data can be used to judge the plausibility of these assumptions. The results mean that in large epidemiologic studies which are affected by missing data and analyzed by logistic regression, exposure associations may be estimated without bias in a number of settings where researchers might otherwise assume that bias would occur.
Keywords: complete case analysis; logistic regression; missing data; odds ratio
Year: 2015 PMID: 26429998 PMCID: PMC4597800 DOI: 10.1093/aje/kwv114
Source DB: PubMed Journal: Am J Epidemiol ISSN: 0002-9262 Impact factor: 4.897
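The abstract's central claim can be illustrated with a small simulation. The Python sketch below is not the authors' code: the function name `complete_records_log_or`, the parameter values, and the two missingness rules are assumptions chosen for illustration. It uses a single binary exposure and no confounders, in which case the logistic regression slope estimate equals the log odds ratio of the 2×2 table, so no fitting routine is needed.

```python
import math
import random

def expit(t):
    """Inverse logit: exp(t) / (1 + exp(t))."""
    return 1.0 / (1.0 + math.exp(-t))

def complete_records_log_or(n, beta0, beta_x, p_obs, seed):
    """Simulate n records with binary exposure X and binary outcome Y,
    keep each record with probability p_obs(x, y), and return the log
    odds ratio from the 2x2 table of the retained (complete) records."""
    rng = random.Random(seed)
    counts = {(x, y): 0 for x in (0, 1) for y in (0, 1)}
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        y = 1 if rng.random() < expit(beta0 + beta_x * x) else 0
        if rng.random() < p_obs(x, y):  # record is complete
            counts[(x, y)] += 1
    return math.log(counts[(1, 1)] * counts[(0, 0)]
                    / (counts[(1, 0)] * counts[(0, 1)]))

TRUE_LOG_OR = 0.7  # assumed true exposure log odds ratio (illustrative)

# Missingness depends on the outcome only.
est_y_only = complete_records_log_or(
    200_000, -2.0, TRUE_LOG_OR, lambda x, y: 1.0 if y == 1 else 0.5, seed=1)

# Missingness depends jointly on outcome and exposure (no factorization).
est_y_and_x = complete_records_log_or(
    200_000, -2.0, TRUE_LOG_OR, lambda x, y: 0.9 if x == y else 0.3, seed=1)

print(f"true {TRUE_LOG_OR:.2f}, Y-only {est_y_only:.2f}, Y-and-X {est_y_and_x:.2f}")
```

Under these assumed settings the outcome-only mechanism should leave the estimate close to the true value, while the joint outcome-and-exposure mechanism should shift it substantially.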
Bias of Estimates Derived From Complete Records Analysis Logistic Regression Under Different Missingness Assumptions
| Quantity on Which Missingness Is Dependent | Parameter β0 | Parameter βX | Parameter βC |
|---|---|---|---|
| Neither | Asymptotically unbiased | Asymptotically unbiased | Asymptotically unbiased |
| Outcome (Y) | Biased | Asymptotically unbiased | Asymptotically unbiased |
| Covariates (X and C) | Asymptotically unbiased | Asymptotically unbiased | Asymptotically unbiased |
| Outcome (Y) and confounders (C) | Biased | Asymptotically unbiased | Biased |
| Outcome (Y) and exposure (X) | Biased | Biased^a | Biased |
a Biased in general. However, if P(R = 1 | X, Y, C) = s(X, C) t(Y, C) for some functions s(X, C) and t(Y, C), where R is the complete record indicator, then the exposure association is again estimated without bias (asymptotically).
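The factorization condition in footnote a can be checked numerically. By Bayes' rule, restricting to complete records multiplies the odds of the outcome given (X, C) by P(R = 1 | X, Y = 1, C) / P(R = 1 | X, Y = 0, C), so the log odds ratio comparing two exposure levels is shifted by the log cross-product ratio of the selection probabilities. The Python sketch below computes that shift; the function names and the example probabilities are assumptions made for illustration and do not come from the paper.

```python
import math

def log_or_shift(p_obs, x1, x0, c):
    """Shift in the conditional log odds ratio (x1 vs. x0, at confounder
    value c) induced by restricting to complete records.

    p_obs(x, y, c) is a hypothetical function returning P(R = 1 | X=x, Y=y, C=c).
    """
    return (math.log(p_obs(x1, 1, c)) - math.log(p_obs(x1, 0, c))
            - math.log(p_obs(x0, 1, c)) + math.log(p_obs(x0, 0, c)))

def factorized(x, y, c):
    """Mechanism of the form s(X, C) * t(Y, C), with arbitrary values."""
    s = 0.9 if x == 1 else 0.6
    t = 1.0 if y == 1 else 0.5
    return s * t

def interacting(x, y, c):
    """Mechanism in which the effect of X on missingness differs by Y."""
    return 0.9 if x == y else 0.3

shift_factorized = log_or_shift(factorized, 1, 0, c=0)
shift_interacting = log_or_shift(interacting, 1, 0, c=0)
print(shift_factorized, shift_interacting)
```

When the selection probability factorizes, the s(X, C) and t(Y, C) terms cancel in the cross-product ratio and the shift is zero up to rounding; with an X-by-Y interaction in the selection probability, they do not cancel.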
Guidance for Investigation and Implications of Missingness Mechanisms in Complete Records Analysis Logistic Regression
| Quantity With Which Missingness Is Found to Be Associated | Plausible Missingness Mechanism(s) | Bias in CRA Estimate of β |
|---|---|---|
| | | Asymptotically unbiased |
| | | Asymptotically unbiased |
| | | Asymptotically unbiased |
| | | Generally biased |
| | | Asymptotically unbiased |
| | | Asymptotically unbiased |
| | | Asymptotically unbiased |
| | | Asymptotically unbiased |
| | | Asymptotically unbiased |
| | | Generally biased |
| | | Asymptotically unbiased |
| | | Asymptotically unbiased |
| | | Asymptotically unbiased |
| | | Asymptotically unbiased |
| | | Asymptotically unbiased |
| | | Generally biased |
Abbreviation: CRA, complete records analysis.
Log Odds Ratios for the Adjusted Association Between Number of Flying Hours (Categorized) and Mortality Among United Kingdom Flight Crew Members, 1989–1999^a,b
| Missingness Mechanism | Quantity on Which Missingness Is Dependent | Probability of a Complete Record^c | Log OR (SE), 400–5,499 vs. <400 Flying Hours | % Bias, 400–5,499 vs. <400 Flying Hours | Log OR (SE), ≥5,500 vs. <400 Flying Hours | % Bias, ≥5,500 vs. <400 Flying Hours |
|---|---|---|---|---|---|---|
| N/A (full data) | N/A | | 0.64 (0.22) | N/A | 0.70 (0.23) | N/A |
| 1 | Nothing (MCAR) | expit(0) | 0.65 (0.32) | 1.3 | 0.72 (0.32) | 2.4 |
| 2 | Death indicator ( | 1 if | 0.65 (0.23) | 1.4 | 0.72 (0.23) | 2.5 |
| 0.485 if | ||||||
| 3 | Age ( | expit((age − 37.32)/10.79) | 0.58 (0.29) | −9.0 | 0.63 (0.27) | −9.9 |
| 4 | Flying hours^d ( | expit(−(flyhrscat − 1)) | 0.65 (0.28) | 0.9 | 0.72 (0.30) | 2.4 |
| 5 | Age and flying hours ( | expit(−(flyhrscat − 1) + (age − 37.32)/10.79) | 0.60 (0.27) | −6.4 | 0.64 (0.26) | −9.1 |
| 6 | Death indicator and age ( | expit((age − 37.32)/10.79) if | 0.77 (0.36) | 19.1 | 0.90 (0.42) | 28.0 |
| expit(−(age − 37.32)/10.79) if | ||||||
| 7 | Death indicator and flying hours ( | expit(−(flyhrscat − 1)) if | 1.67 (0.40) | 160.6 | 2.76 (0.36) | 292.5 |
| expit(flyhrscat − 1) if | ||||||
| 8 | Death indicator and flying hours ( | expit(−(flyhrscat − 1)) if | 0.66 (0.29) | 3.5 | 0.74 (0.31) | 5.9 |
| expit(−(flyhrscat − 1)) × 0.485 if | ||||||
Abbreviations: MCAR, missing completely at random; N/A, not applicable; OR, odds ratio; SE, standard error.
a Simulations based on imposing artificial missingness on data from the Medical Records System of the United Kingdom Civil Aviation Authority. Data were obtained from a cohort study of professional pilots, flight engineers, and navigators who held a professional flight crew license in the United Kingdom at some point between 1989 and 1999 (18, 19).
b Estimates (SEs) from the full data and averages obtained across 10,000 replications under various missingness mechanisms.
c expit(t) = exp(t)/(1 + exp(t)).
d flyhrscat = 0 if flying hours <400, 1 if 400 ≤ flying hours <5,500, and 2 if flying hours ≥5,500.
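Footnotes c and d translate directly into code. The Python sketch below implements `expit` and `flyhrscat` as defined there, and encodes the table expressions for mechanisms 1, 3, 4, and 5, which are the ones given in full in the table. Reading these expressions as the probability that a record is retained, and the `MECHANISMS` dictionary with its `(age, hours)` call signature, are assumptions made for illustration.

```python
import math

def expit(t):
    """expit(t) = exp(t) / (1 + exp(t)), as in footnote c."""
    return math.exp(t) / (1.0 + math.exp(t))

def flyhrscat(hours):
    """Flying-hours category as in footnote d."""
    if hours < 400:
        return 0
    if hours < 5500:
        return 1
    return 2

# Table expressions for mechanisms 1, 3, 4, and 5, keyed by mechanism number.
# Each takes age (years) and flying hours; the signature is hypothetical.
MECHANISMS = {
    1: lambda age, hours: expit(0),
    3: lambda age, hours: expit((age - 37.32) / 10.79),
    4: lambda age, hours: expit(-(flyhrscat(hours) - 1)),
    5: lambda age, hours: expit(-(flyhrscat(hours) - 1) + (age - 37.32) / 10.79),
}

# Evaluate each mechanism for a hypothetical crew member.
for number, mechanism in MECHANISMS.items():
    print(number, round(mechanism(age=45, hours=100), 3))
```

For example, `MECHANISMS[4](age=45, hours=100)` evaluates expit(1), about 0.73, because fewer than 400 flying hours corresponds to category 0.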