Orlagh U Carroll, Tim P Morris, Ruth H Keogh.
Abstract
BACKGROUND: Missing data in covariates can result in biased estimates and loss of power to detect associations. It can also lead to other challenges in time-to-event analyses including the handling of time-varying effects of covariates, selection of covariates and their flexible modelling. This review aims to describe how researchers approach time-to-event analyses with missing data.
Keywords: Epidemiology; Missing data; Multiple imputation; Observational studies; Oncology; Survival; Time-to-event
Year: 2020 PMID: 32471366 PMCID: PMC7260743 DOI: 10.1186/s12874-020-01018-7
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Summary of recommendations or considerations from STROBE, ROBINS-I and Sterne et al. guidelines
| Recommendation | Explanation | STROBE | ROBINS-I | Sterne et al. |
|---|---|---|---|---|
| State eligibility criteria | State inclusion and exclusion criteria of study participants, including criteria concerning missing data | ✓ | ✓ | |
| Report the number of individuals at each stage of the study | Give reasons for exclusion at each stage | ✓ | | |
| | Indicate the number of individuals discarded due to missingness at each stage of the study | ✓ | ✓ | |
| | Give consideration to selection bias introduced by exclusion criteria | ✓ | | |
| | May use a flowchart to summarise | ✓ | | |
| Covariates | Detail whether included as continuous or categorical and, if relevant, detail how the quantitative covariate was categorised | ✓ | ✓ | |
| | Consider departures from linearity for continuous covariates and state which transformation, if any, was used | ✓ | ✓ | |
| State analysis model | Make it clear which method will be used to model the data | ✓ | ✓ | |
| Covariate selection | Describe the procedure used to reach the final model | ✓ | ✓ | |
| | This includes, but is not restricted to, missing data imputation, transformation of covariates, interactions between covariates or inclusion of covariates for a priori reasons | ✓ | ✓ | |
| Results | Provide unadjusted estimates and the final adjusted model | ✓ | ✓ | |
| | State the number of participants included in unadjusted and adjusted analyses | ✓ | | |
| Report the number of participants with missing data | Report this for each covariate of interest or the number of complete data for the important covariates | ✓ | ✓ | |
| | Give reasons for missing values | ✓ | ✓ | ✓ |
| | Investigate if there are key differences between those observed and those with missing data; this may be compared across exposure/intervention groups | ✓ | ✓ | |
| Which method was used to handle missing data? | State clearly the method used | ✓ | ✓ | ✓ |
| State any missing data assumptions that were made | Such as whether the data are MCAR, MAR or MNAR | ✓ | ✓ | ✓ |
| Sensitivity analysis | Should investigate robustness of findings | ✓ | ✓ | |
| | Compare method with a complete-case analysis | ✓ | | |
| | If necessary, assess validity of methods if there are differences | ✓ | ✓ | |
| | Assess plausibility of missing data assumptions | ✓ | | |
| Give details of the imputation model | State the software used and key settings for imputation model | ✓ | | |
| | State the number of imputations used | ✓ | | |
| | State variables included in imputation model | ✓ | | |
| | State how non-normal or binary covariates were handled | ✓ | | |
| | Were interactions in analysis model included in imputation model? | ✓ | | |
| | If a large fraction of data are imputed, compare observed and imputed values | ✓ | | |
| Missing data assumptions | Discuss if variables included in the imputation model make MAR assumption plausible | ✓ | | |
| Sensitivity analyses | Compare MI results with CC results | ✓ | | |
| | Investigate departures from MAR assumption | ✓ | | |
| | If necessary, suggest explanations for why there are differences in results across sensitivity analyses | ✓ | | |
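Several of the imputation items above (number of imputations, comparing MI with CC results) concern how per-imputation results are combined into one estimate. The sketch below illustrates the standard Rubin's rules pooling step, which the source does not spell out. The function name `pool_rubin` and the input numbers are hypothetical and chosen only for illustration; they are not results from the review.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates and their squared standard errors
    using Rubin's rules. Returns (pooled estimate, total variance)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical log hazard ratios and squared standard errors
# from m = 5 imputed datasets (illustrative values only).
log_hr = [0.42, 0.38, 0.45, 0.40, 0.35]
se2 = [0.010, 0.011, 0.009, 0.010, 0.012]

pooled, total_var = pool_rubin(log_hr, se2)
print(f"pooled log HR = {pooled:.3f}, total variance = {total_var:.5f}")
```

The total variance is never smaller than the average within-imputation variance, which is one reason the number of imputations is worth reporting: the `(1 + 1/m)` factor shrinks towards 1 as `m` grows.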
Fig. 1 Flowchart of the inclusion process for studies into the review [10]
Breakdown of the number of individuals with missing data
| Description | Number | (%) |
|---|---|---|
| Excluded individuals with missing data in any covariate¹ | 44 | (42) |
| Excluded individuals with missing data in a subset of covariates | 62 | (58) |
| Reported the number of individuals excluded | 66 | (62) |
| Mean (SD) | 14.14 | (12.40) |
| Median (IQR) | 10.22 | (4.73, 18.34) |
| Min, Max | 0.11 | 47.38 |
| Reported missing data in baseline table for incomplete covariates | 82 | (80) |
| Used a complete-case analysis² | 35 | (34) |
| Used other missing data methods | 36 | (35) |
| Quantified the complete-case sample size | 25 | (25) |
| Mean (SD) | 31.65 | (21.90) |
| Median (IQR) | 31.34 | (13.67, 37.76) |
| Min, Max | 1.77 | 94.16 |
The initial phase is the stage at which the study population is defined using inclusion and exclusion criteria.
¹ Potentially used a complete-case analysis in the initial phase but did not clearly state their methods
² A further 31 were not clear on whether they used a complete-case analysis during the analysis
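The quantities this table asks studies to report (missing values per covariate, complete-case sample size, percentage excluded) can be computed in a few lines. The following is a minimal sketch under stated assumptions: the helper `missing_summary`, the covariate names and the four-person toy cohort are all hypothetical, and `None` is assumed to mark a missing value.

```python
def missing_summary(rows, covariates):
    """Return (missing count per covariate, complete-case n, % excluded)."""
    n = len(rows)
    # Number of individuals with a missing value, per covariate
    missing = {c: sum(1 for r in rows if r.get(c) is None) for c in covariates}
    # Complete cases: individuals observed on every listed covariate
    complete = [r for r in rows if all(r.get(c) is not None for c in covariates)]
    pct_excluded = 100 * (n - len(complete)) / n if n else 0.0
    return missing, len(complete), pct_excluded


# Hypothetical toy cohort; None marks a missing value.
cohort = [
    {"age": 61, "stage": "II", "bmi": 27.1},
    {"age": 54, "stage": None, "bmi": 31.0},
    {"age": None, "stage": "III", "bmi": None},
    {"age": 70, "stage": "I", "bmi": 24.3},
]

missing, n_complete, pct_excluded = missing_summary(cohort, ["age", "stage", "bmi"])
print(missing, n_complete, pct_excluded)
```

Passing only a subset of covariates to `missing_summary` corresponds to the "subset of covariates" exclusion described in the table, which generally retains more individuals than excluding on any covariate.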
Methods used in studies for the handling of missing data
| Missing data methods | Count | (%)* |
|---|---|---|
| Complete-case | 79 | (53) |
| Removed individuals with incomplete data for a subset of covariates | 67 | (45) |
| Multiple Imputation | 33 | (22) |
| Missing indicator | 10 | (7) |
| Worst or best case scenario¹ | 2 | (1) |
| Stochastic imputation | 1 | (1) |
| Mean value imputation | 1 | (1) |
| Mode value imputation | 1 | (1) |
| Growth models | 1 | (1) |
| Bayesian model incorporating handling of missing data | 1 | (1) |
| Full-information maximum likelihood estimation² | 1 | (1) |
| Selection procedure³ | 1 | (1) |
| Unclear | 33 | (22) |
*Percentages do not sum to 100 as there is overlap with some studies using more than one method.
¹ [11,12]
² [11]
³ A selection model to account for missing data and time-varying covariates [13]
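As a quick arithmetic check of the footnote on overlapping methods, the reported percentages in the table can be summed directly. This sketch uses only the rounded percentages printed in the table; the variable names are illustrative.

```python
# Rounded percentages as printed in the table of missing data methods
reported_pct = {
    "Complete-case": 53,
    "Removed individuals (subset of covariates)": 45,
    "Multiple imputation": 22,
    "Missing indicator": 7,
    "Worst or best case scenario": 1,
    "Stochastic imputation": 1,
    "Mean value imputation": 1,
    "Mode value imputation": 1,
    "Growth models": 1,
    "Bayesian model": 1,
    "Full-information maximum likelihood": 1,
    "Selection procedure": 1,
    "Unclear": 22,
}

total_pct = sum(reported_pct.values())
# A total above 100 is consistent with some studies using more than one method.
print(f"sum of reported percentages = {total_pct}")
```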
Fig. 2 Breakdown of complete-case (CC) usage. The initial phase refers to those who used complete-case analysis when determining inclusion/exclusion of individuals to the study population
Fig. 3 Breakdown of multiple imputation (MI) usage. ¹ Two studies did not specify the type of multivariate MI model used; similarly, one study did not for univariate MI. ² One study ensured the sample size stayed the same for different models. ³ Three studies did not clearly state that they were using complete-case analysis
Selected papers describing methods for addressing common issues arising in the analysis of time-to-event data when there is missing covariate data
| Consideration | Some recommended references | |
|---|---|---|
| General recommendations | [ | |
| Simple imputation | [ | |
| Complete-case bias considerations | [ | |
| | [ | |
| Number of imputations to use | [ | |
| | [ | |
| Covariate selection procedures | [ | |
| | [ | |
| Non-linear effects | [ | |
| | [ | |
| Using a Cox model | [ | |
| | [ | |
| Testing the Proportional hazards assumption and modelling time-varying effects of covariates | [ | |
| Time-dependent covariates | [ | |
| | [ | |
| Functional form | [ | |
| | [ | |
| | [ | |
| | [ | |
| | [ | |
| Covariate selection procedures | [ | See above |
| | [ | |
| Testing the Proportional hazards assumption | [ | |
| | [ | |
| | [ | |
| | [ | |
| Time-varying effects | [ | See above |
| | [ | See above |
| | [ | See above |
| | [ | See above |
| | [ | See above |
| | [ | See above |
| | [ | See above |
| | [ | See above |
| Categorising of covariates | [ | |
| Non-linear effects | [ | |
| | [ | |
| Covariate selection procedures | [ | See above |
| | [ | |
MFP: Multivariable fractional polynomials