Anurika Priyanjali De Silva<sup>1</sup>, Margarita Moreno-Betancur<sup>2,3</sup>, Alysha Madhu De Livera<sup>1</sup>, Katherine Jane Lee<sup>2,4</sup>, Julie Anne Simpson<sup>5</sup>.
Abstract
BACKGROUND: Missing data is a common problem in epidemiological studies, and is particularly prominent in longitudinal data, which involve multiple waves of data collection. Traditional multiple imputation (MI) methods (fully conditional specification (FCS) and multivariate normal imputation (MVNI)) treat repeated measurements of the same time-dependent variable as just another 'distinct' variable for imputation and therefore do not make the most of the longitudinal structure of the data. Only a few studies have explored extensions to the standard approaches to account for the temporal structure of longitudinal data. One suggestion is the two-fold fully conditional specification (two-fold FCS) algorithm, which restricts the imputation of a time-dependent variable to time blocks where the imputation model includes measurements taken at the specified and adjacent times. To date, no study has investigated the performance of two-fold FCS and standard MI methods for handling missing data in a time-varying covariate with a non-linear trajectory over time - a commonly encountered scenario in epidemiological studies.
Keywords: Fully conditional specification; Longitudinal data; Missing data; Multiple imputation; Multivariate normal imputation; Non-linear trajectory; Time-dependent covariate
Year: 2017; PMID: 28743256; PMCID: PMC5526258; DOI: 10.1186/s12874-017-0372-y
Source DB: PubMed; Journal: BMC Med Res Methodol; ISSN: 1471-2288; Impact factor: 4.615
Description of variables used in the simulation study, for the ith child at wave j
| Variable | Type | Grouping/Units | Label |
|---|---|---|---|
| Study child’s BMI for age z-score | Continuous | | bmiz<sub>ij</sub> |
| Study child’s sleep problems | Categorical | 0 = No sleep problems | sleep_prob<sub>ij</sub> |
| Study child’s age | Continuous | Months | scage<sub>ij</sub><sup>a</sup> |
| Maternal education | Categorical | 0 = Not completed | m_education<sub>i</sub> |
| Maternal smoking | Categorical | 0 = No | m_smoking<sub>i</sub> |
| Study child’s sex | Categorical | 0 = Male | sex<sub>i</sub> |
| Study child’s birth weight | Continuous | Kilograms | birthweight<sub>i</sub> |
| Maternal age at child birth | Continuous | Years | m_age<sub>i</sub> |

<sup>a</sup> A new variable scage_sq<sub>ij</sub> was derived as the squared term of scage<sub>ij</sub> to be used in the data generation models
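The footnote describes deriving a squared age term so that the data generation models can represent a non-linear trajectory. A minimal sketch of that derivation on long-format records follows; the function name `add_squared_age`, the record layout, and the age values are illustrative assumptions, not taken from the study.

```python
# Long-format records: one row per child i and wave j.
# The ages (in months) are made-up values for illustration only.
records = [
    {"child": 1, "wave": 1, "scage": 57.0},
    {"child": 1, "wave": 2, "scage": 82.0},
    {"child": 2, "wave": 1, "scage": 55.0},
]


def add_squared_age(rows):
    """Return copies of the rows with scage_sq = scage ** 2 added."""
    return [{**row, "scage_sq": row["scage"] ** 2} for row in rows]


for row in add_squared_age(records):
    print(row["child"], row["wave"], row["scage"], row["scage_sq"])
```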
Fig. 1 a) Variable dependencies of simulated time-independent variables; m_age, maternal age at child birth; m_education, maternal education; m_smoking, maternal smoking; sex, study child’s sex; birthweight, study child’s birth weight. b) Causal diagram for the association between sleep problems and BMI for age z-scores. For ease of presentation, all time-independent variables except maternal smoking are represented by a single node; scage1-scage5, study child’s age at waves 1 to 5; bmiz1-bmiz5, study child’s BMI for age z-scores at waves 1 to 5; sleep_prob1-sleep_prob5, study child’s sleep problems at waves 1 to 5. c) Causal diagram for MAR missingness. R is a missingness indicator: BMI for age z-scores were set to missing where R = 1. Only variables required to model the MAR missingness are shown in the figure
Specifications of the logistic regression models used to impose missing data under the MAR scenarios
| Variable | Odds ratio, MAR (weak): Model A/Eq. 5<sup>b</sup> | Odds ratio, MAR (weak): Model B/Eq. 6<sup>b</sup> | Odds ratio, MAR (strong)<sup>a</sup>: Model A/Eq. 5<sup>b</sup> | Odds ratio, MAR (strong)<sup>a</sup>: Model B/Eq. 6<sup>b</sup> |
|---|---|---|---|---|
| 1. Sleep problem at wave 1: Yes | 1.67 | 1.61 | 2.80 | 2.60 |
| 2. Sleep problem at wave 5: Yes | 1.64 | 1.58 | 2.70 | 2.50 |
| 3. Maternal smoking: Yes | 1.61 | 1.58 | 2.60 | 2.50 |

exp: exponential; MAR: missing at random
<sup>a</sup> Odds ratio for MAR (strong) = square of the odds ratio for MAR (weak)
<sup>b</sup> Model A/Eq. 5 and Model B/Eq. 6 represent the logistic regression models used to generate missingness in BMI for age z-scores from waves 2–5 under MAR: Model A sets bmiz to missing for all subsequent waves, and Model B generates intermittent missingness
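Footnote a states that each strong-scenario odds ratio is the square of the weak-scenario one, which is the same as doubling the coefficient on the log-odds scale. The sketch below checks that relationship against the tabulated values; the dictionary layout and the helper name `strong_from_weak` are assumptions made for illustration.

```python
import math

# Odds ratios copied from the table above, as (Model A, Model B) pairs.
WEAK_OR = {
    "sleep problem at wave 1": (1.67, 1.61),
    "sleep problem at wave 5": (1.64, 1.58),
    "maternal smoking": (1.61, 1.58),
}
STRONG_OR = {
    "sleep problem at wave 1": (2.80, 2.60),
    "sleep problem at wave 5": (2.70, 2.50),
    "maternal smoking": (2.60, 2.50),
}


def strong_from_weak(odds_ratio):
    """Square an odds ratio by doubling its log odds ratio."""
    return math.exp(2.0 * math.log(odds_ratio))


for name, weak_pair in WEAK_OR.items():
    for weak, reported in zip(weak_pair, STRONG_OR[name]):
        squared = strong_from_weak(weak)
        print(f"{name}: {weak:.2f} squared = {squared:.3f}, reported {reported:.2f}")
```

The squared values agree with the reported strong-scenario odds ratios to within about 0.01; the small differences are presumably due to rounding of the tabulated values.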
Performance of various methods for handling 50% missingness in BMI for age z-scores; true OR<sup>a</sup> = 1.1 (log(OR) = 0.1)
| Performance measure | Complete case analysis | FCS | MVNI | Two-fold FCS (width = 1)<sup>c</sup> | Two-fold FCS (width = 2)<sup>d</sup> |
|---|---|---|---|---|---|
| **MCAR** | | | | | |
| Absolute Bias<sup>b</sup> | 0.001 | 0.000 | 0.000 | 0.002 | 0.001 |
| Relative Bias (%) | 0.65 | 0.28 | 0.34 | 1.63 | 0.77 |
| Empirical SE | 0.017 | 0.017 | 0.017 | 0.018 | 0.017 |
| Model-based SE | 0.018 | 0.017 | 0.017 | 0.017 | 0.017 |
| Coverage (%) | 95.6 | 95.8 | 95.9 | 95.3 | 95.9 |
| RMSE | 0.017 | 0.017 | 0.017 | 0.018 | 0.017 |
| **MAR (weak)** | | | | | |
| Absolute Bias<sup>b</sup> | 0.015 | 0.000 | 0.000 | 0.000 | 0.000 |
| Relative Bias (%) | 15.03 | 0.16 | 0.22 | 0.03 | 0.28 |
| Empirical SE | 0.018 | 0.017 | 0.017 | 0.017 | 0.017 |
| Model-based SE | 0.018 | 0.017 | 0.017 | 0.017 | 0.017 |
| Coverage (%) | 86.3 | 94.4 | 94.5 | 94.3 | 94.6 |
| RMSE | 0.023 | 0.017 | 0.017 | 0.017 | 0.017 |
| **MAR (strong)** | | | | | |
| Absolute Bias<sup>b</sup> | 0.020 | 0.000 | 0.000 | 0.003 | 0.002 |
| Relative Bias (%) | 20.36 | 0.23 | 0.21 | 3.19 | 2.16 |
| Empirical SE | 0.018 | 0.017 | 0.017 | 0.018 | 0.017 |
| Model-based SE | 0.017 | 0.017 | 0.017 | 0.017 | 0.017 |
| Coverage (%) | 77.8 | 95.0 | 94.9 | 93.6 | 93.9 |
| RMSE | 0.027 | 0.017 | 0.017 | 0.018 | 0.018 |

Empirical SE: empirical standard error; FCS: fully conditional specification; MAR: missing at random; MCAR: missing completely at random; Model-based SE: model-based standard error; MVNI: multivariate normal imputation; RMSE: root mean square error; two-fold FCS: two-fold fully conditional specification algorithm
<sup>a</sup> True OR represents the true odds ratio between sleep problems and BMI for age z-scores
<sup>b</sup> Monte Carlo standard error did not exceed 0.0006
<sup>c</sup> Results for the two-fold FCS with a time window width of 1, that is, including immediately adjacent time points
<sup>d</sup> Results for the two-fold FCS with a time window width of 2, that is, including two adjacent time points
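The window widths in footnotes c and d can be made concrete with a small sketch that lists which waves would contribute measurements to the imputation model for a given wave. It assumes five waves (as in Fig. 1) and a window that is simply truncated at the first and last wave; the function name `window_waves` is hypothetical, and the code illustrates the time-window idea only, not the imputation itself.

```python
def window_waves(wave, width, n_waves=5):
    """Waves whose measurements fall inside the time window around `wave`.

    Assumption: the window covers `width` waves on each side and is
    truncated at wave 1 and at the last wave.
    """
    first = max(1, wave - width)
    last = min(n_waves, wave + width)
    return list(range(first, last + 1))


for width in (1, 2):
    for wave in range(1, 6):
        print(f"width={width}, wave={wave}: uses waves {window_waves(wave, width)}")
```

With a width of 1, wave 3 draws on waves 2 to 4; with a width of 2 it draws on all five waves, which is why a wider window approaches the standard FCS model that uses every repeated measurement.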
Fig. 2 Absolute bias and relative bias (%)<sup>a</sup> for complete case analysis (CC), fully conditional specification (FCS), multivariate normal imputation (MVNI), and two-fold fully conditional specification (two-fold FCS) for increasing proportions of missing data (0.25, 0.5) under three missing data scenarios and two simulation scenarios; true OR represents the true odds ratio between sleep problems and BMI for age z-scores. <sup>a</sup> Relative bias is calculated as absolute bias relative to the value of the true parameter. As the value of the true parameter (log(OR)) increases from 0.1 to 0.4 in the second simulation scenario, the magnitude of the relative bias drops even though the absolute bias shows a slight increase
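The caption's point about relative bias shrinking as the true log(OR) grows is simple arithmetic, sketched below. The function name is an assumption, and the absolute bias of 0.020 used for the second scenario is a hypothetical value chosen only to show the effect; it is not a reported result.

```python
def relative_bias_percent(absolute_bias, true_log_or):
    """Relative bias (%) = 100 * absolute bias / true parameter value."""
    return 100.0 * absolute_bias / true_log_or


# Complete case analysis under MAR (weak): absolute bias 0.015, log(OR) = 0.1.
print(relative_bias_percent(0.015, 0.1))

# Hypothetical: a slightly larger absolute bias with log(OR) = 0.4.
print(relative_bias_percent(0.020, 0.4))
```

The first call gives about 15%, in line with the 15.03% reported in the table (which is based on the unrounded bias). The second gives about 5%, even though the absolute bias is larger.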
Fig. 3 Empirical standard error and coverage (%) for complete case analysis (CC), fully conditional specification (FCS), multivariate normal imputation (MVNI), and two-fold fully conditional specification (two-fold FCS) for increasing proportions of missing data (0.25, 0.5) under three missing data scenarios and two simulation scenarios; true OR represents the true odds ratio between sleep problems and BMI for age z-scores
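For readers less familiar with the performance measures reported above, the following sketch shows one common way to compute them from a set of simulation replicates. It assumes a 95% Wald interval for coverage and takes the model-based SE as the mean of the estimated standard errors; the source does not state which definitions were used, and the function names and toy inputs are hypothetical.

```python
import math
import statistics


def performance(estimates, std_errors, true_value, z=1.96):
    """Summarise simulation replicates of an estimated log(OR).

    Assumptions: coverage uses a Wald interval (estimate +/- z * SE) and
    the model-based SE is the mean of the per-replicate standard errors.
    """
    bias = statistics.fmean(estimates) - true_value
    covered = sum(
        1
        for est, se in zip(estimates, std_errors)
        if est - z * se <= true_value <= est + z * se
    )
    squared_errors = [(est - true_value) ** 2 for est in estimates]
    return {
        "absolute_bias": abs(bias),
        "relative_bias_pct": 100.0 * abs(bias) / abs(true_value),
        "empirical_se": statistics.stdev(estimates),
        "model_based_se": statistics.fmean(std_errors),
        "coverage_pct": 100.0 * covered / len(estimates),
        "rmse": math.sqrt(statistics.fmean(squared_errors)),
    }


def rmse_from_parts(absolute_bias, empirical_se):
    """Approximate RMSE as sqrt(bias^2 + empirical SE^2)."""
    return math.sqrt(absolute_bias ** 2 + empirical_se ** 2)


# Toy replicates (made-up numbers) around a true log(OR) of 0.1.
summary = performance([0.08, 0.10, 0.12, 0.11, 0.09], [0.017] * 5, 0.1)
print(summary)
```

As a consistency check on the table, `rmse_from_parts(0.015, 0.018)` rounds to 0.023 and `rmse_from_parts(0.020, 0.018)` rounds to 0.027, matching the RMSE reported for complete case analysis under MAR (weak) and MAR (strong).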