Rushani Wijesuriya, Margarita Moreno-Betancur, John B Carlin, Katherine J Lee.
Abstract
BACKGROUND: Three-level data arising from repeated measures on individuals who are clustered within larger units are common in health research studies. Missing data are prominent in such longitudinal studies and multiple imputation (MI) is a popular approach for handling missing data. Extensions of joint modelling and fully conditional specification MI approaches based on multilevel models have been developed for imputing three-level data. Alternatively, it is possible to extend single- and two-level MI methods to impute three-level data using dummy indicators and/or by analysing repeated measures in wide format. However, most implementations, evaluations and applications of these approaches focus on the context of incomplete two-level data. It is currently unclear which approach is preferable for imputing three-level data.
Keywords: FCS; Incomplete multilevel data; Joint modelling; Linear mixed model; Multilevel multiple imputation; Multiple imputation; Three-level data
Year: 2020 PMID: 32787781 PMCID: PMC7422505 DOI: 10.1186/s12874-020-01079-8
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Description of the variables measured for the j-th individual belonging to the i-th school at wave k in the analysis model
| Variable | Type | Grouping/Range | Label |
|---|---|---|---|
| | Categorical | 0 = Female; 1 = Male | |
| | Continuous | Range [ | |
| | Categorical | 0 = 1st quintile (most disadvantaged); 1 = 2nd quintile; 2 = 3rd quintile; 3 = 4th quintile; 4 = 5th quintile (most advantaged) | |
| | Continuous | z-score | |
| | Continuous | z-score | |
| a | Continuous | Range [0, 8] | |
| b | Continuous | Range [0, 40] | |
IRSAD Index of Relative Socio-Economic Advantage and Disadvantage, NAPLAN National Assessment Program - Literacy and Numeracy, SDQ Strengths and Difficulties Questionnaire, SEIFA Socioeconomic Index for Areas, SES Socio-Economic Status
a A subset of 4 items (each scored from 0 to 2) from the Short Mood and Feelings Questionnaire (SMFQ) was used to measure depressive symptoms at each wave in the CATS study [3, 38]. The depressive symptoms score at each wave in our study is the sum of these four items
b To measure overall child behaviour, a total difficulties score is derived from the first 4 subscales of the Strengths and Difficulties Questionnaire (SDQ): emotional symptoms, conduct problems, hyperactivity/inattention and peer relationship problems (each ranging from 0 to 10) [39]. This variable is not included in the analysis model but is included in the imputation model as an auxiliary variable to improve its performance
Summary of the imputation approaches for handling incomplete three-level data
| MI approach | Paradigm | Model | Software^a | Handling of clustering due to higher-level clusters | Handling of clustering due to repeated measures |
|---|---|---|---|---|---|
| | JM | Standard (single-level) | SAS [ | DI | Repeated measures arranged in wide format |
| | FCS | Standard (single-level) | SAS, SPSS, Stata, Mplus, R, Blimp [ | DI | Repeated measures arranged in wide format |
| | JM | Two-level MLMM | SAS [ | RE | Repeated measures arranged in wide format |
| | | | | DI | RE |
| | FCS | Two-level LMM | Mplus, R, Blimp | RE | Repeated measures arranged in wide format |
| | | | | DI | RE |
| | JM | Three-level MLMM | Stat-JR, Mplus | RE | RE |
| | FCS | Three-level LMM | R, Blimp | RE | RE |
DI dummy indicators, FCS fully conditional specification, JM joint modelling, LMM linear mixed model, MLMM multivariate linear mixed model, RE random effects
a R and Blimp are the only freely available, open-source software implementations
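The two data-preparation devices named in the table (repeated measures arranged in wide format, and dummy indicators for the higher-level clusters) can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the records, the column names (`school`, `child`, `wave`, `score`) and the helper functions `to_wide` and `add_school_dummies` are hypothetical and do not come from the study or from any of the listed software packages.

```python
from collections import defaultdict

# Hypothetical long-format records: (school, child, wave, score).
# None marks a missing measurement that an imputation model would fill in.
long_rows = [
    (1, 1, 1, 0.3), (1, 1, 2, None),
    (1, 2, 1, -0.1), (1, 2, 2, 0.4),
    (2, 3, 1, 1.2), (2, 3, 2, 0.9),
    (2, 4, 1, None), (2, 4, 2, 0.2),
]


def to_wide(rows):
    """Return one record per child with one score column per wave."""
    wide = defaultdict(dict)
    for school, child, wave, score in rows:
        wide[(school, child)][f"score_w{wave}"] = score
    return [
        {"school": school, "child": child, **scores}
        for (school, child), scores in sorted(wide.items())
    ]


def add_school_dummies(records):
    """Append 0/1 indicators for every school except the first (reference) one."""
    schools = sorted({r["school"] for r in records})
    return [
        {**r, **{f"school_{s}": int(r["school"] == s) for s in schools[1:]}}
        for r in records
    ]


wide_records = to_wide(long_rows)                     # repeated measures in wide format
imputation_input = add_school_dummies(wide_records)   # clusters as dummy indicators
```

After this step each child occupies a single row, so a single-level imputation model can treat the waves as separate variables and the school indicators as ordinary covariates.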
Fig. 1 Distribution of the bias in the estimated regression coefficient of interest (β1, true value = −0.025) across the 1000 simulated datasets for available case analysis (ACA) and the 8 multiple imputation (MI) approaches under two scenarios for missing data proportions at waves 2, 4 and 6 (10%, 15%, 20% and 20%, 30%, 40%, respectively) and four ICC combinations when data are missing at random (MAR-CATS). The lower and upper margins of the boxes represent the 25th (Q1) and 75th (Q3) percentiles of the distribution, respectively. The whiskers extend to Q1 − 1.5 × (Q3 − Q1) at the bottom and Q3 + 1.5 × (Q3 − Q1) at the top. Abbreviations used to denote the MI methods: DI, dummy indicators; FCS, fully conditional specification; JM, joint modelling
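The box and whisker limits described in the caption can be computed as in the minimal sketch below. The function name `box_limits` is hypothetical, and the quantile rule (`method="inclusive"` in Python's `statistics.quantiles`) is an assumption: the caption does not say which percentile definition the plotting software used, and different rules give slightly different quartiles.

```python
import statistics


def box_limits(values):
    """Quartiles and whisker limits as described in the caption.

    Assumes the 'inclusive' quantile rule; other rules shift Q1 and Q3 slightly.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "q1": q1,
        "q3": q3,
        "lower_whisker": q1 - 1.5 * iqr,
        "upper_whisker": q3 + 1.5 * iqr,
    }


# Hypothetical values standing in for per-dataset bias estimates.
limits = box_limits([1, 2, 3, 4, 5])
```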
Fig. 2 Empirical standard errors (filled circles with error bars showing ±1.96 × Monte Carlo standard errors) and average model-based standard errors (hollow circles) from 1000 simulated datasets, for available case analysis (ACA) and the 8 multiple imputation (MI) approaches under two scenarios for missing data proportions at waves 2, 4 and 6 (10%, 15%, 20% and 20%, 30%, 40%, respectively) and four ICC combinations when data are missing at random (MAR-CATS). Abbreviations used to denote the MI methods: DI, dummy indicators; FCS, fully conditional specification; JM, joint modelling
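The quantities plotted in this figure can be sketched as follows. The function `se_summary` and its inputs are hypothetical. Two points are assumptions rather than statements from the source: the Monte Carlo standard error of the empirical SE is taken as the usual normal-theory approximation EmpSE / sqrt(2(n_sim − 1)), and the average model-based SE is taken as a simple mean of the per-dataset standard errors.

```python
import math
import statistics


def se_summary(estimates, model_ses):
    """Empirical SE with a +/-1.96 x MCSE interval, plus the average model-based SE."""
    n_sim = len(estimates)
    # Empirical SE: sample standard deviation of the estimates across simulations.
    emp_se = statistics.stdev(estimates)
    # Assumed normal-theory approximation to the Monte Carlo SE of the empirical SE.
    mcse = emp_se / math.sqrt(2 * (n_sim - 1))
    return {
        "emp_se": emp_se,
        "emp_se_interval": (emp_se - 1.96 * mcse, emp_se + 1.96 * mcse),
        # Assumed simple mean; some authors average the squared SEs instead.
        "avg_model_se": statistics.fmean(model_ses),
    }


# Hypothetical estimates and model-based SEs from three simulated datasets.
summary = se_summary([1.0, 2.0, 3.0], [0.5, 1.5])
```

A model-based SE that falls well outside the interval around the empirical SE suggests that the variance estimator is biased for that method.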
Fig. 3 Estimated bias in the variance components at levels 1, 2 and 3 across the 1000 simulated datasets for available case analysis (ACA) and the 8 multiple imputation (MI) approaches under two scenarios for missing data proportions at waves 2, 4 and 6 (10%, 15%, 20% and 20%, 30%, 40%, respectively) and four ICC combinations when data are missing at random (MAR-CATS). Abbreviations used to denote the MI methods: DI, dummy indicators; FCS, fully conditional specification; JM, joint modelling
Point estimate (and standard error) for the effect of early depressive symptoms on subsequent standardized NAPLAN numeracy scores, and point estimates for the variance components at levels 3, 2 and 1, from available case analysis (ACA) and 8 MI approaches applied to the CATS data analysis
| Method | Regression coefficient estimate (SE) | Level 3 variance component | Level 2 variance component | Level 1 variance component |
|---|---|---|---|---|
| | −0.022 (0.007) | 0.042 | 0.239 | 0.232 |
| | −0.019 (0.007) | 0.043 | 0.243 | 0.231 |
| | −0.019 (0.008) | 0.043 | 0.246 | 0.230 |
| | −0.020 (0.007) | 0.041 | 0.246 | 0.228 |
| | −0.022 (0.007) | 0.042 | 0.245 | 0.229 |
| | −0.020 (0.008) | 0.042 | 0.237 | 0.228 |
| | – | – | – | – |
| | −0.021 (0.007) | 0.033 | 0.238 | 0.232 |
| | −0.021 (0.007) | 0.040 | 0.238 | 0.228 |
ACA available case analysis, DI dummy indicators, FCS fully conditional specification, JM joint modelling, NAPLAN National Assessment Program - Literacy and Numeracy, RE random effects, SE standard error
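Variance components such as those in the table above are often summarised as intraclass correlation coefficients (ICCs). The sketch below assumes one common convention for a three-level model: the level-3 ICC is the share of total variance at level 3, and the level-2 ICC is the share at levels 2 and 3 combined. The source does not state which ICC definition it uses, so the function `three_level_iccs` should be read as an illustration only; the inputs are the variance components from the first row of the table.

```python
def three_level_iccs(level1_var, level2_var, level3_var):
    """ICCs from three variance components under an assumed common convention."""
    total = level1_var + level2_var + level3_var
    return {
        # Correlation between measurements on different individuals in the same cluster.
        "icc_level3": level3_var / total,
        # Correlation between repeated measurements on the same individual.
        "icc_level2": (level2_var + level3_var) / total,
    }


# Variance components from the first row of the table (levels 1, 2 and 3).
iccs = three_level_iccs(level1_var=0.232, level2_var=0.239, level3_var=0.042)
```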