Katherine J Lee, Kate M Tilling, Rosie P Cornish, Roderick J A Little, Melanie L Bell, Els Goetghebeur, Joseph W Hogan, James R Carpenter.
Abstract
Missing data are ubiquitous in medical research. Although there is increasing guidance on how to handle missing data, practice is changing slowly and misapprehensions abound, particularly in observational research. Importantly, the lack of transparency around methodological decisions is threatening the validity and reproducibility of modern research. We present a practical framework for handling and reporting the analysis of incomplete data in observational studies, which we illustrate using a case study from the Avon Longitudinal Study of Parents and Children. The framework consists of three steps: 1) Develop an analysis plan specifying the analysis model and how missing data are going to be addressed. Important considerations are whether a complete records analysis is likely to be valid, whether multiple imputation or an alternative approach is likely to offer benefits, and whether a sensitivity analysis regarding the missingness mechanism is required; 2) Examine the data, checking that the methods outlined in the analysis plan are appropriate, and conduct the preplanned analysis; and 3) Report the results, including a description of the missing data, details on how the missing data were addressed, and the results from all analyses, interpreted in light of the missing data and the clinical relevance. This framework seeks to support researchers in thinking systematically about missing data and transparently reporting the potential effect on the study results, thereby increasing confidence in, and the reproducibility of, research findings.
Keywords: ALSPAC; Missing data; Multiple imputation; Observational studies; Reporting; STRATOS initiative
Year: 2021 PMID: 33539930 PMCID: PMC8168830 DOI: 10.1016/j.jclinepi.2021.01.008
Source DB: PubMed Journal: J Clin Epidemiol ISSN: 0895-4356 Impact factor: 7.407
Summary of variables for analysis
| Variable type | Definition | Relevant variable(s) in the ALSPAC case study |
|---|---|---|
| Outcome | Outcome of interest in the analysis model. | Educational attainment score at 16 years |
| Exposure | Main exposure of interest in the analysis model. | Smoking status at 14 years |
| Confounders | Variables required for adjustment in the analysis model. | Child sex |
| Auxiliary | Variables that are not in the analysis model but can be used to recover some of the missing information in the incomplete variables. | Smoking at age 10 years |
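The roles in the table above can be recorded explicitly so that the missing-data description and the complete records subset are derived from one place. The following is a minimal sketch only: the column names (`attainment_16`, `smoking_14`, `sex`, `smoking_10`) and the toy records are hypothetical and are not ALSPAC data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    name: str  # hypothetical column name
    role: str  # "outcome", "exposure", "confounder", or "auxiliary"


# Roles follow the table above; the names are made up for illustration.
VARIABLES = [
    Variable("attainment_16", "outcome"),
    Variable("smoking_14", "exposure"),
    Variable("sex", "confounder"),
    Variable("smoking_10", "auxiliary"),
]


def analysis_model_columns(variables):
    """Columns that enter the analysis model (auxiliary variables do not)."""
    return [v.name for v in variables if v.role != "auxiliary"]


def missing_summary(records, variables):
    """Map each variable to (role, number missing, percentage missing)."""
    n = len(records)
    summary = {}
    for v in variables:
        n_missing = sum(1 for r in records if r.get(v.name) is None)
        summary[v.name] = (v.role, n_missing, 100.0 * n_missing / n)
    return summary


def complete_records(records, variables):
    """Records with every analysis-model variable observed."""
    cols = analysis_model_columns(variables)
    return [r for r in records if all(r.get(c) is not None for c in cols)]


# Toy records (None marks a missing value); not real study data.
toy = [
    {"attainment_16": 310.0, "smoking_14": 0, "sex": "F", "smoking_10": 0},
    {"attainment_16": 250.0, "smoking_14": None, "sex": "M", "smoking_10": 1},
    {"attainment_16": None, "smoking_14": 1, "sex": "F", "smoking_10": None},
    {"attainment_16": 295.0, "smoking_14": 0, "sex": "M", "smoking_10": None},
]

for name, (role, n_miss, pct) in missing_summary(toy, VARIABLES).items():
    print(f"{name:15s} {role:11s} missing: {n_miss} ({pct:.0f}%)")
print("complete records:", len(complete_records(toy, VARIABLES)))
```

Note that a record with a missing auxiliary variable still counts as a complete record here, because auxiliary variables are not part of the analysis model.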
Fig. 1. The framework.
Fig. 2. Causal diagram for the Avon Longitudinal Study of Parents and Children (ALSPAC) case study. Note: the diagram reflects our expectation that missingness depends on the outcome of interest (educational attainment) as well as on smoking itself, and that there will be potential auxiliary variables associated with both missingness and the incomplete exposure variable (smoking at age 14 years).
Fig. 3. Flowchart for selecting an appropriate method to handle the missing data. ∗ The exception is if there is missingness in the exposure or covariates that is unrelated to the outcome but is missing not at random; in this context, although inference from a complete records analysis would be unbiased, inference following multiple imputation would be biased.
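The exception noted in the Fig. 3 footnote can be explored with a small simulation. The sketch below is illustrative only and rests on several assumptions that are not part of the case study: a binary exposure, a normally distributed outcome, a hypothetical true effect of −2, and missingness in the exposure that depends on the exposure itself but, given the exposure, not on the outcome. The imputation step is deliberately simplified ("improper"): the imputation model parameters are estimated from the complete records and plugged in rather than drawn, and only the point estimates are averaged.

```python
import math
import random
import statistics

TRUE_EFFECT = -2.0  # hypothetical mean difference in outcome, exposed vs unexposed


def simulate(n=30000, seed=1):
    """Binary exposure x, continuous outcome y; x is missing not at random."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.3 else 0
        y = TRUE_EFFECT * x + rng.gauss(0.0, 1.0)
        # Missingness depends on x itself but, given x, not on y.
        p_missing = 0.8 if x == 1 else 0.2
        rows.append((None if rng.random() < p_missing else x, y))
    return rows


def mean_difference(pairs):
    """Mean of y among x == 1 minus mean of y among x == 0."""
    y1 = [y for x, y in pairs if x == 1]
    y0 = [y for x, y in pairs if x == 0]
    return statistics.fmean(y1) - statistics.fmean(y0)


def normal_pdf(y, mu, sd):
    z = (y - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def impute_assuming_mar(rows, m=10, seed=2):
    """Impute x from P(x | y) as estimated in the complete records, m times."""
    rng = random.Random(seed)
    complete = [(x, y) for x, y in rows if x is not None]
    y1 = [y for x, y in complete if x == 1]
    y0 = [y for x, y in complete if x == 0]
    prior1 = len(y1) / len(complete)
    mu1, mu0 = statistics.fmean(y1), statistics.fmean(y0)
    ss = sum((y - mu1) ** 2 for y in y1) + sum((y - mu0) ** 2 for y in y0)
    sd = math.sqrt(ss / (len(complete) - 2))  # pooled within-group SD
    estimates = []
    for _ in range(m):
        imputed = []
        for x, y in rows:
            if x is None:
                a = prior1 * normal_pdf(y, mu1, sd)
                b = (1.0 - prior1) * normal_pdf(y, mu0, sd)
                x = 1 if rng.random() < a / (a + b) else 0
            imputed.append((x, y))
        estimates.append(mean_difference(imputed))
    return statistics.fmean(estimates)  # pooled point estimate


rows = simulate()
cc_estimate = mean_difference([(x, y) for x, y in rows if x is not None])
mi_estimate = impute_assuming_mar(rows)
print(f"true effect:               {TRUE_EFFECT:.2f}")
print(f"complete records estimate: {cc_estimate:.2f}")
print(f"imputation assuming MAR:   {mi_estimate:.2f}")
```

In this setup the complete records estimate should sit close to the true effect, because the outcome distribution within each exposure group is unaffected by the selection. The imputation step borrows the exposure prevalence from the complete records, which under-represents exposed participants, so its estimate is expected to drift away from the true value.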
Analysis of the relationship between smoking at 14 years and educational attainment at 16 years
| Method of analysis | Number of observations in the analysis | Regression coefficient (95% CI) | P value | % of missing smoking values imputed as “smokers” |
|---|---|---|---|---|
| Primary analysis: multiple imputation | 14,684 | −10.8 (−12.2, −9.4) | <0.001 | 13.3 |
| Complete records analysis | 3,153 | −7.9 (−9.1, −6.7) | <0.001 | N/A |
| Sensitivity analysis: sensitivity parameter = 0.1 | 14,684 | −10.9 (−12.4, −9.4) | <0.001 | 14.2 |
| Sensitivity analysis: sensitivity parameter = 0.25 | 14,684 | −11.0 (−12.3, −9.6) | <0.001 | 15.5 |
| Sensitivity analysis: sensitivity parameter = 0.5 | 14,684 | −11.0 (−12.3, −9.6) | <0.001 | 18.1 |
| Sensitivity analysis: sensitivity parameter = 1 | 14,684 | −10.7 (−11.8, −9.6) | <0.001 | 24.2 |
| Sensitivity analysis: sensitivity parameter = 10 | 14,684 | −4.3 (−4.7, −3.8) | <0.001 | 99.8 |
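The rising share of missing smoking values imputed as "smokers" is the pattern one would expect if the sensitivity parameter shifts the log-odds of smoking in the imputation model for participants with missing smoking data. The record above does not spell out that construction, so the sketch below is an assumption rather than the authors' procedure. The starting probabilities are simulated from an arbitrary Beta(1, 6) distribution, and every name is hypothetical.

```python
import math
import random

DELTAS = (0.0, 0.1, 0.25, 0.5, 1.0, 10.0)  # values of the sensitivity parameter


def expit(v):
    """Numerically stable inverse logit."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def shift_probability(p_mar, delta):
    """Add delta to the log-odds implied by a missing-at-random probability."""
    logit = math.log(p_mar / (1.0 - p_mar))
    return expit(logit + delta)


def percent_imputed_smokers(p_mar_values, delta, rng):
    """Draw one imputation per participant and return the % imputed as smokers."""
    draws = [rng.random() < shift_probability(p, delta) for p in p_mar_values]
    return 100.0 * sum(draws) / len(draws)


# Hypothetical smoking probabilities under missing at random for participants
# with missing smoking status (toy values, clipped away from 0 and 1).
_rng = random.Random(0)
p_mar = [min(max(_rng.betavariate(1.0, 6.0), 1e-6), 1.0 - 1e-6) for _ in range(5000)]

# Re-seeding for each delta reuses the same uniform draws, so the percentage
# can only stay the same or increase as delta grows.
results = {d: percent_imputed_smokers(p_mar, d, random.Random(1)) for d in DELTAS}

for d, pct in results.items():
    print(f"sensitivity parameter = {d:>5}: {pct:.1f}% imputed as smokers")
```

A parameter of 0 reproduces the missing-at-random imputation, while a very large value pushes nearly every missing value to "smoker". That extreme is useful as a bound rather than as a plausible scenario.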