Amalia Karahalios, Laura Baglietto, Katherine J Lee, Dallas R English, John B Carlin, Julie A Simpson.
Abstract
BACKGROUND: Missing data often cause problems in longitudinal cohort studies with repeated follow-up waves. Research in this area has focussed on analyses with missing data in repeated measures of the outcome, from which participants with missing exposure data are typically excluded. We performed a simulation study to compare complete-case analysis with multiple imputation (MI) for dealing with missing data in an analysis of the association between waist circumference, measured at two waves, and the risk of colorectal cancer (a completely observed outcome).
Year: 2013 PMID: 23947681 PMCID: PMC3751092 DOI: 10.1186/1742-7622-10-6
Source DB: PubMed Journal: Emerg Themes Epidemiol ISSN: 1742-7622
Data structure for the Melbourne Collaborative Cohort Study dataset
| Variable | Type | Units / categories | Label | Correlation with WC1* |
| Waist circumference wave 1 | Continuous | Centimetres | WC1 | 1.00 |
| Waist circumference wave 2 | Continuous | Centimetres | WC2 | 0.81 |
| Age | Continuous | Years | Age | 0.18 |
| Sex | Categorical | 0 = Males | - | -0.52 |
| | | 1 = Females | Female | - |
| Education | Categorical | 0 = None or primary school | - | -0.18 |
| | | 1 = Secondary or trade school | Education secondary | - |
| | | 2 = Tertiary education | Education tertiary | - |
| Country of birth | Categorical | 0 = Australia/New Zealand | - | 0.19 |
| | | 1 = United Kingdom | COB UK | - |
| | | 2 = Mediterranean | COB Mediterranean | - |
| Smoking status | Categorical | 0 = Never smoked | - | 0.17 |
| | | 1 = Former smoker | Smoke former | - |
| | | 2 = Current smoker | Smoke current | - |
| Physical activity score | Categorical | 0 = 0 | - | -0.13 |
| | | 1 = (0 to 4) | Physical low | - |
| | | 2 = [4 to 6) | Physical moderate | - |
| | | 3 = 6+ | Physical high | - |
| Alcohol consumption | Categorical | 0 grams (Male & Female) | - | 0.05 |
| | | 1-39 grams (Male) / 1-19 grams (Female) | Alcohol low | - |
| | | 40-59 grams (Male) / 20-39 grams (Female) | Alcohol moderate | - |
| | | 60+ grams (Male) / 40+ grams (Female) | Alcohol high | - |
*Correlation with waist circumference at wave 1 was obtained from the observed data of the Melbourne Collaborative Cohort Study participants who attended both waves (i.e. 26,846 participants).
Notwithstanding the limitations of correlations for categorical variables.
Specification of the logistic regression models used to impose missing data under the two covariate-dependent MAR scenarios
| Coefficient (variable) | Odds ratio, Scenario 1 | Odds ratio, Scenario 2$ |
| 1 (WC1, 10 cm) | 1.10 | 1.21 |
| 2 (Age, years) | 1.06 | 1.12 |
| 3 (Female) | 1.10 | 1.21 |
| 4 (Education secondary) | 0.72 | 0.52 |
| 5 (Education tertiary) | 0.44 | 0.19 |
| 6 (COB UK) | 1.15 | 1.32 |
| 7 (COB Mediterranean) | 1.71 | 2.92 |
| 8 (Alcohol low) | 0.77 | 0.59 |
| 9 (Alcohol moderate) | 0.66 | 0.44 |
| 10 (Alcohol high) | 0.85 | 0.72 |
| 11 (Smoking former) | 1.16 | 1.35 |
| 12 (Smoking current) | 1.80 | 3.24 |
| 13 (Physical low) | 0.93 | 0.86 |
| 14 (Physical moderate) | 0.99 | 0.98 |
| 15 (Physical high) | 0.91 | 0.83 |
$Odds ratio for Scenario 2 = (odds ratio for Scenario 1)², i.e. each Scenario 1 odds ratio squared.
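The missingness models specified in the table above can be illustrated with a minimal Python sketch. It is not the authors' code. The odds ratios are copied from the Scenario 1 column; the intercept, the coding of the covariates (for example WC1 expressed per 10 cm) and the function names are assumptions introduced here for illustration, since this record does not report them.

```python
import math
import random

# Scenario 1 odds ratios, copied from the table above.
SCENARIO1_OR = {
    "WC1 (10 cm)": 1.10,
    "Age (years)": 1.06,
    "Female": 1.10,
    "Education secondary": 0.72,
    "Education tertiary": 0.44,
    "COB UK": 1.15,
    "COB Mediterranean": 1.71,
    "Alcohol low": 0.77,
    "Alcohol moderate": 0.66,
    "Alcohol high": 0.85,
    "Smoking former": 1.16,
    "Smoking current": 1.80,
    "Physical low": 0.93,
    "Physical moderate": 0.99,
    "Physical high": 0.91,
}


def scenario2_odds_ratios(scenario1):
    """Square each Scenario 1 odds ratio (same as doubling each log-odds)."""
    return {name: value ** 2 for name, value in scenario1.items()}


def missing_probability(covariates, odds_ratios, intercept):
    """Logistic model: logit(p) = intercept + sum(log(OR_k) * x_k).

    `intercept` is a hypothetical tuning value; in a simulation it would be
    chosen to give the target proportion of missing data.
    """
    linear = intercept + sum(
        math.log(odds_ratios[name]) * x for name, x in covariates.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))


def impose_missing(value, covariates, odds_ratios, intercept, rng):
    """Return None (missing) with the model probability, else the value."""
    p = missing_probability(covariates, odds_ratios, intercept)
    return None if rng.random() < p else value


# Hypothetical participant: covariate coding and centring are assumptions.
participant = {"WC1 (10 cm)": 0.5, "Female": 1, "Smoking current": 1}
rng = random.Random(2013)
observed = impose_missing(
    92.0, participant, SCENARIO1_OR, intercept=math.log(0.15 / 0.85), rng=rng
)
```

Because missingness here depends only on fully observed covariates, the mechanism is covariate-dependent MAR; squaring the odds ratios for Scenario 2 strengthens that dependence without changing its direction.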
Figure 1. Absolute bias for complete-case analysis and MI for increasing proportions of missing data (0.15, 0.3, 0.5) under three missing data scenarios. (a) corresponds to the epidemiological analysis of absolute change in waist circumference adjusting for waist circumference at wave 1; (b) corresponds to the epidemiological analysis of waist circumference at wave 2 (without adjusting for waist circumference at wave 1). CD-MAR refers to the covariate-dependent MAR scenario.
Figure 2. Empirical standard error for complete-case analysis and MI for increasing proportions of missing data (0.15, 0.3, 0.5) under three missing data scenarios. (a) corresponds to the epidemiological analysis of absolute change in waist circumference adjusting for waist circumference at wave 1; (b) corresponds to the epidemiological analysis of waist circumference at wave 2 (without adjusting for waist circumference at wave 1). CD-MAR refers to the covariate-dependent MAR scenario.
Figure 3. Coverage of the regression coefficient estimates for complete-case analysis and MI for increasing proportions of missing data (0.15, 0.3, 0.5) under three missing data scenarios. (a) corresponds to the epidemiological analysis of change in waist circumference adjusting for waist circumference at wave 1; (b) corresponds to the epidemiological analysis of waist circumference at wave 2 (without adjusting for waist circumference at wave 1). CD-MAR refers to the covariate-dependent MAR scenario.
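The three performance measures named in the figure captions can be computed from a set of simulated estimates as in the sketch below. This record does not give the definitions, so the sketch assumes the usual simulation-study conventions: bias is the mean estimate minus the true coefficient (on the absolute rather than the relative scale), the empirical standard error is the standard deviation of the estimates across simulated datasets, and coverage is the proportion of nominal 95% confidence intervals that contain the true coefficient. The function name and the example numbers are hypothetical.

```python
import statistics


def performance(estimates, std_errors, true_value, z=1.96):
    """Summarise one method (e.g. complete-case or MI) over simulated datasets."""
    # Bias on the absolute scale: mean estimate minus the true coefficient.
    bias = statistics.fmean(estimates) - true_value
    # Empirical SE: sample standard deviation of the estimates.
    empirical_se = statistics.stdev(estimates)
    # Coverage: share of Wald intervals (estimate +/- z * SE) containing the truth.
    covered = sum(
        1
        for est, se in zip(estimates, std_errors)
        if est - z * se <= true_value <= est + z * se
    )
    coverage = covered / len(estimates)
    return {"bias": bias, "empirical_se": empirical_se, "coverage": coverage}


# Made-up numbers for illustration only, not results from the study.
summary = performance(
    estimates=[0.9, 1.0, 1.1, 1.2],
    std_errors=[0.05, 0.05, 0.05, 0.05],
    true_value=1.0,
)
```

Running the same function separately on the complete-case and MI estimates, for each missing data proportion and scenario, would give the quantities that such figures plot.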