Amalia Karahalios, Laura Baglietto, John B Carlin, Dallas R English, Julie A Simpson.
Abstract
BACKGROUND: Retaining participants in cohort studies with multiple follow-up waves is difficult. Researchers commonly face missing data, which may bias results and reduce statistical power and precision. The STROBE guidelines (von Elm et al., Lancet, 370:1453-1457, 2007; Vandenbroucke et al., PLoS Med, 4:e297, 2007) and the guidelines proposed by Sterne et al. (BMJ, 338:b2393, 2009) recommend that cohort studies report the amount of missing data, the reasons for non-participation and non-response, and the method used to handle missing data in the analyses. We reviewed publications from cohort studies to document the reporting of missing data for exposure measures and to describe the statistical methods used to account for the missing data.
Year: 2012 PMID: 22784200 PMCID: PMC3464662 DOI: 10.1186/1471-2288-12-96
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Summary of cohort studies included in the review
| Characteristics | Number of studies (N = 82) | (%) |
|---|---|---|
| 2000 | 2 | (2) |
| 2001 | 4 | (5) |
| 2002 | 4 | (5) |
| 2003 | 5 | (6) |
| 2004 | 6 | (7) |
| 2005 | 8 | (10) |
| 2006 | 11 | (13) |
| 2007 | 13 | (16) |
| 2008 | 13 | (16) |
| 2009 | 16 | (20) |
| Not stated in paper | 4 | (5) |
| 1,000–2,000 | 15 | (18) |
| 2,000–3,000 | 13 | (16) |
| 3,000–5,000 | 14 | (17) |
| 5,000–10,000 | 13 | (16) |
| 10,000–20,000 | 14 | (17) |
| 20,000 + | 9 | (11) |
| Not stated in paper | 5 | (6) |
| Before 1970 | 9 | (11) |
| 1970–1979 | 9 | (11) |
| 1980–1989 | 25 | (30) |
| 1990–1999 | 30 | (37) |
| 2000–2009 | 4 | (5) |
| Number not stated | 12 | (15) |
| Mean or range given | 7 | (9) |
| 1 | 10 | (12) |
| 2 | 9 | (11) |
| 3 | 18 | (22) |
| 4 | 12 | (15) |
| ≥5 | 14 | (17) |
| Cox proportional hazards regression^ | 37‡ | |
| - Time-varying covariates | 35 | |
| - Time-invariant covariates | 3 | |
| Generalised Estimating Equations | 12* | |
| - Linear regression | 3 | |
| - Logistic regression | 10 | |
| Generalised linear mixed-effects modelling | 16 | |
| - Linear regression | 13 | |
| - Logistic regression | 3 | |
| Standard linear regression | 3 | |
| Standard logistic regression | 9 | |
| Other methods | 6 | |
† The total number of papers is 83 because one paper used two analysis models: a Cox proportional hazards model for a time to event outcome and a linear mixed effects model for a numerical outcome.
^ Note: one paper used a parametric survival model.
‡ One paper incorporated its repeated measures of exposure using both a time-varying covariate and time-invariant covariates.
* One paper had both a numerical and a binary outcome.
Missing data features reported by the studies
| Feature | Number of studies (%) |
|---|---|
| Yes | 66 (80%) |
| Missing data reported for each follow-up wave used in the analysis | 35 |
| A general statement was made about the amount of missing data or the amount of completed follow-up (how many participants attended at least one wave or only the final follow-up wave) | 22 |
| Indicated number that completed all waves of follow-up (i.e. number included in final sample) | 6 |
| Indicated amount missing for certain (key) variables | 3 |
| No | 16 (20%) |
| Yes | 26 (32%) |
| Provided a table comparing distributions of key exposures and outcome variables for those with missing and non-missing information | 6 |
| Table not provided but some summary statistics included in text | 4 |
| General comment provided (did not include a table or summary statistics or included p-values only) | 16 |
| No | 56 (69%) |
| 13 (16%) | |
| Method not stated | 14 (16%) |
| Complete-case analysis assumed | 9 (11%) |
| Complete-case analysis | 54 (66%) |
| Weighted | 1 |
| Unweighted | 53 |
| Exclude participants with missing data at any repeated waves of exposure | 38 |
| Exclude participant data record for waves of data collection with missing exposure data†† | 15 |
| Missing Indicator Method | 1 (1%) |
| Mean value substitution | 3 (4%) |
| Last Observation Carried Forward | 7 (9%) |
| Multiple Imputation | 5 (6%) |
| Details provided for the multiple imputation: | |
| Indicated how many imputations were performed | 4 |
| Indicated which variables were included in the imputation model | 2 |
| Compared results from multiple imputation with complete case analysis | 3 |
| Performed a sensitivity analysis under different assumptions for missing data | 4 |
| Fully Bayesian Model | 1 (1%) |
† Three papers used more than one method to handle their missing exposure data.
†† These studies assessed both exposure and outcome measures repeatedly over the waves of data collection. The data were analysed using either Generalised Estimating Equations (GEEs) or mixed-effects models to account for the correlated outcome data within an individual and excluded participant data records for waves where the exposure data were missing.
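The difference between the two unweighted complete-case approaches in the table, and Last Observation Carried Forward, can be illustrated with a minimal Python sketch. The data, field names and function names below are hypothetical and chosen only for illustration; they are not taken from any of the reviewed studies, and the sketch shows only the data-handling step, not the subsequent regression analysis.

```python
# Hypothetical long-format data: one record per participant per wave.
# None marks a missing exposure measurement.
records = [
    {"id": 1, "wave": 1, "exposure": 24.0},
    {"id": 1, "wave": 2, "exposure": 25.5},
    {"id": 2, "wave": 1, "exposure": 30.1},
    {"id": 2, "wave": 2, "exposure": None},
    {"id": 3, "wave": 1, "exposure": None},
    {"id": 3, "wave": 2, "exposure": 22.3},
]


def drop_participants_with_any_missing(rows):
    """Keep only participants whose exposure is observed at every wave."""
    incomplete = {r["id"] for r in rows if r["exposure"] is None}
    return [r for r in rows if r["id"] not in incomplete]


def drop_records_with_missing(rows):
    """Drop only the wave records whose exposure is missing."""
    return [r for r in rows if r["exposure"] is not None]


def last_observation_carried_forward(rows):
    """Fill a missing exposure with the participant's most recent observed value.

    A record with no earlier observed value stays missing.
    """
    filled = []
    last_seen = {}
    for r in sorted(rows, key=lambda r: (r["id"], r["wave"])):
        value = r["exposure"]
        if value is None:
            value = last_seen.get(r["id"])
        else:
            last_seen[r["id"]] = value
        filled.append({**r, "exposure": value})
    return filled


participant_level = drop_participants_with_any_missing(records)
record_level = drop_records_with_missing(records)
locf = last_observation_carried_forward(records)
```

In this toy example, participant-level exclusion retains only participant 1, whereas record-level exclusion retains every observed record from all three participants, which is why the latter pairs naturally with GEEs or mixed-effects models that accept unbalanced data.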
Figure 1. Search results.