Ruth H Keogh, Shaun R Seaman, Jonathan W Bartlett, Angela M Wood.
Abstract
The nested case-control and case-cohort designs are two main approaches for carrying out a substudy within a prospective cohort. This article adapts multiple imputation (MI) methods for handling missing covariates in full-cohort studies for nested case-control and case-cohort studies. We consider data missing by design and data missing by chance. MI analyses that make use of full-cohort data and MI analyses based on substudy data only are described, alongside an intermediate approach in which the imputation uses full-cohort data but the analysis uses only the substudy. We describe adaptations to two imputation methods: the approximate method (MI-approx) of White and Royston (2009) and the "substantive model compatible" (MI-SMC) method of Bartlett et al. (2015). We also apply the "MI matched set" approach of Seaman and Keogh (2015) to nested case-control studies, which does not require any full-cohort information. The methods are investigated using simulation studies and all perform well when their assumptions hold. Substantial gains in efficiency can be made by imputing data missing by design using the full-cohort approach or by imputing data missing by chance in analyses using the substudy only. The intermediate approach brings greater gains in efficiency relative to the substudy approach and is more robust to imputation model misspecification than the full-cohort approach. The methods are illustrated using the ARIC Study cohort. Supplementary Materials provide R and Stata code.
Keywords: Case-cohort study; Cohort study; Cox proportional hazards; Missing data; Multiple imputation; Nested case-control study
Year: 2018; PMID: 29870056; PMCID: PMC6481559; DOI: 10.1111/biom.12910
Source DB: PubMed; Journal: Biometrics; ISSN: 0006-341X; Impact factor: 2.571
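The MI analyses described in the abstract each end by combining the estimates obtained from the separate imputed data sets. The following is a minimal Python sketch of that pooling step, assuming the standard Rubin's rules combination. It is not the authors' supplementary R or Stata code; the function name `pool_rubin` and the input values are hypothetical and chosen only for illustration.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool M point estimates and standard errors with Rubin's rules.

    estimates  : one point estimate per imputed data set (e.g. a log hazard ratio)
    std_errors : the matching standard errors
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two imputations and matching lengths")

    pooled = mean(estimates)                        # average of the M estimates
    within = mean([se ** 2 for se in std_errors])   # within-imputation variance
    between = variance(estimates)                   # between-imputation variance (divisor M - 1)
    total = within + (1 + 1 / m) * between          # Rubin's total variance
    return pooled, math.sqrt(total)


# Hypothetical log hazard ratios and standard errors from M = 3 imputed data sets.
est, se = pool_rubin([0.60, 0.62, 0.64], [0.07, 0.07, 0.07])
print(f"pooled estimate = {est:.3f}, pooled SE = {se:.4f}")
```

The pooled standard error exceeds the average within-imputation standard error whenever the estimates differ across imputations, which is how the uncertainty due to the missing data enters the final analysis.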
Analyses performed in each simulated data set. NCC indicates “nested case-control.”
| Analysis | Variables with missing data | Data used for imputation | Data used for analysis |
|---|---|---|---|
| Complete-data case-cohort | NA | NA | Case-cohort |
| Complete-data full cohort | NA | NA | Full cohort |
| Complete-case case-cohort | | NA | Case-cohort |
| MI-approx: full-cohort approach | | Full cohort | Full cohort |
| MI-SMC: full-cohort approach | | Full cohort | Full cohort |
| MI-approx: intermediate approach | | Full cohort | Case-cohort |
| MI-SMC: intermediate approach | | Full cohort | Case-cohort |
| MI-approx: substudy approach | | Case-cohort | Case-cohort |
| MI-SMC: substudy approach | | Case-cohort | Case-cohort |
| Complete-data NCC | NA | NA | NCC |
| Complete-data full cohort | NA | NA | Full cohort |
| Complete-case NCC | | NA | NCC |
| MI-approx: full-cohort approach | | Full cohort | Full cohort |
| MI-SMC: full-cohort approach | | Full cohort | Full cohort |
| MI-approx: intermediate approach | | Full cohort | NCC |
| MI-SMC: intermediate approach | | Full cohort | NCC |
| MI-approx: substudy approach | | NCC | NCC |
| MI-SMC: substudy approach | | NCC | NCC |
| MI matched set | | NCC | NCC |
Figure 1. Simulation study results: case-cohort study within a cohort with 50% missing X2. The points are the means of the point estimates from 1000 simulated data sets. Horizontal lines around each point are the 95% confidence intervals obtained based on Monte Carlo errors. The relative efficiency is relative to the complete-data substudy analysis.
Figure 2. Simulation study results: nested case-control (NCC) study with one control per case within a cohort with 50% missing X2. The points are the means of the point estimates from 1000 simulated data sets. Horizontal lines around each point are the 95% confidence intervals obtained based on Monte Carlo errors. The relative efficiency is relative to the complete-data substudy analysis.
Figure 3. Simulation study results from the additional scenario in which the imputation model is misspecified: case-cohort study within a cohort with β1 = β2 = β = 0.7 and 50% missing X2. In “MI-SMC full-cohort approach (1)” and “MI-SMC intermediate approach (1)” the model used for the proposal distribution p(X1|X2, Z) was a normal distribution (with main effects of X2 and Z), and in “MI-SMC full-cohort approach (2)” and “MI-SMC intermediate approach (2)” the model used for the proposal distribution p(log X1|X2, Z) was a normal distribution (with main effects of X2 and Z). In both (1) and (2), and in “MI-SMC substudy approach,” the model used for the proposal distribution p(X2|X1, Z) was a logistic regression with main effects of X1 and Z. The horizontal lines around each point are the 95% confidence intervals obtained based on Monte Carlo errors. Coverage for X1 in “MI-approx full-cohort approach” and “MI-SMC full-cohort approach (1)” is 0% and is not shown on the plot.
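The figure captions summarise each method by the mean of its point estimates over the simulated data sets, a 95% confidence interval based on Monte Carlo error, and a relative efficiency. The sketch below shows one way such summaries could be computed. It assumes that the Monte Carlo standard error is the empirical standard deviation divided by the square root of the number of simulations, and that relative efficiency is the ratio of empirical variances; the captions do not state either formula. The function names and toy inputs are hypothetical.

```python
import math
from statistics import mean, stdev, variance


def monte_carlo_summary(estimates, z=1.96):
    """Mean of simulated point estimates with a Monte Carlo 95% interval."""
    n = len(estimates)
    centre = mean(estimates)
    mc_se = stdev(estimates) / math.sqrt(n)   # Monte Carlo standard error of the mean
    return centre, (centre - z * mc_se, centre + z * mc_se)


def relative_efficiency(reference_estimates, method_estimates):
    """Empirical variance of the reference analysis divided by that of the method.

    Values above 1 mean the method is more efficient than the reference
    (assumed here to be the complete-data substudy analysis).
    """
    return variance(reference_estimates) / variance(method_estimates)


# Toy inputs standing in for estimates from repeated simulations (hypothetical values).
reference = [0.5, 0.7, 0.9, 0.7]
method = [0.6, 0.7, 0.8, 0.7]

centre, (low, high) = monte_carlo_summary(method)
print(f"mean = {centre:.3f}, Monte Carlo 95% CI = ({low:.3f}, {high:.3f})")
print(f"relative efficiency = {relative_efficiency(reference, method):.2f}")
```

With 1000 simulated data sets, as in the figures, the Monte Carlo interval is much narrower than in this four-value toy example.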
Results from applying MI methods to the ARIC cohort with case-cohort and nested case-control substudies, to investigate the association between the variables listed and the hazard for death due to cardiovascular disease.
The estimates are log hazard ratios. *Education levels: 1 = primary, 2 = secondary, 3 = vocational/university.
(a) Case-cohort study within the full cohort. Entries are Est (SE).

| Variable | Complete-data full cohort | Complete-case | MI full-cohort: MI-approx | MI full-cohort: MI-SMC | MI substudy: MI-approx | MI substudy: MI-SMC | MI intermediate: MI-approx | MI intermediate: MI-SMC |
|---|---|---|---|---|---|---|---|---|
| Sex (baseline=male) | −0.481 (0.073) | −0.796 (0.188) | −0.436 (0.072) | −0.453 (0.071) | −0.508 (0.139) | −0.481 (0.140) | −0.482 (0.140) | −0.502 (0.140) |
| Age (years) | 0.097 (0.006) | 0.091 (0.017) | 0.095 (0.006) | 0.095 (0.006) | 0.082 (0.012) | 0.082 (0.012) | 0.083 (0.012) | 0.083 (0.012) |
| Education*: 1 | ref | | | | | | | |
| Education: 2 | −0.149 (0.092) | 0.323 (0.274) | −0.119 (0.096) | −0.090 (0.101) | 0.066 (0.208) | 0.026 (0.214) | 0.062 (0.214) | 0.052 (0.218) |
| Education: 3 | −0.426 (0.097) | 0.136 (0.288) | −0.373 (0.101) | −0.369 (0.103) | −0.093 (0.211) | −0.097 (0.215) | −0.080 (0.221) | −0.114 (0.233) |
| Non-smoker | ref | | | | | | | |
| Current smoker | 0.641 (0.067) | 0.735 (0.178) | 0.609 (0.071) | 0.622 (0.070) | 0.584 (0.140) | 0.609 (0.141) | 0.595 (0.143) | 0.610 (0.141) |
| White race | ref | | | | | | | |
| Non-white race | 0.434 (0.072) | 0.456 (0.201) | 0.508 (0.077) | 0.498 (0.080) | 0.363 (0.159) | 0.319 (0.159) | 0.365 (0.161) | 0.355 (0.166) |
| SBP (mmHg) | 0.015 (0.002) | 0.008 (0.004) | 0.015 (0.002) | 0.015 (0.002) | 0.012 (0.004) | 0.013 (0.003) | 0.013 (0.004) | 0.013 (0.004) |
| BMI (kg/m²) | 0.040 (0.006) | 0.060 (0.019) | 0.037 (0.006) | 0.038 (0.006) | 0.042 (0.014) | 0.042 (0.014) | 0.040 (0.014) | 0.042 (0.014) |
| Total chol. (mmol/l) | 0.055 (0.030) | 0.055 (0.066) | 0.037 (0.043) | 0.043 (0.050) | 0.026 (0.052) | 0.028 (0.052) | 0.028 (0.052) | 0.028 (0.052) |
| HDL chol. (mmol/l) | −0.453 (0.102) | −0.427 (0.275) | −0.637 (0.134) | −0.600 (0.110) | −0.473 (0.187) | −0.458 (0.187) | −0.477 (0.187) | −0.464 (0.187) |

(b) Nested case-control study within the full cohort. Entries are Est (SE).

| Variable | Complete-case | MI full-cohort: MI-approx | MI full-cohort: MI-SMC | MI substudy: MI-approx | MI substudy: MI-SMC | MI matched set | MI intermediate: MI-approx | MI intermediate: MI-SMC |
|---|---|---|---|---|---|---|---|---|
| Sex (baseline=male) | 0.522 (0.195) | −0.435 (0.077) | −0.411 (0.076) | −0.523 (0.123) | −0.501 (0.118) | −0.508 (0.120) | −0.509 (0.122) | −0.510 (0.122) |
| Age (years) | 0.114 (0.018) | 0.096 (0.006) | 0.097 (0.006) | 0.106 (0.011) | 0.111 (0.011) | 0.111 (0.011) | 0.105 (0.011) | 0.104 (0.011) |
| Education*: 1 | ref | | | | | | | |
| Education: 2 | −0.068 (0.280) | −0.087 (0.098) | −0.084 (0.106) | −0.082 (0.197) | −0.129 (0.183) | −0.114 (0.188) | −0.107 (0.210) | −0.086 (0.201) |
| Education: 3 | −0.269 (0.275) | −0.354 (0.103) | −0.356 (0.109) | −0.402 (0.195) | −0.461 (0.185) | −0.454 (0.186) | −0.418 (0.205) | −0.399 (0.203) |
| Non-smoker | ref | | | | | | | |
| Current smoker | 0.806 (0.198) | 0.595 (0.072) | 0.600 (0.078) | 0.829 (0.135) | 0.806 (0.130) | 0.795 (0.131) | 0.811 (0.138) | 0.803 (0.137) |
| White race | ref | | | | | | | |
| Non-white race | 0.415 (0.209) | 0.525 (0.077) | 0.529 (0.078) | 0.390 (0.146) | 0.424 (0.135) | 0.421 (0.136) | 0.376 (0.140) | 0.375 (0.145) |
| SBP (mmHg) | 0.010 (0.005) | 0.014 (0.002) | 0.015 (0.002) | 0.015 (0.003) | 0.014 (0.003) | 0.015 (0.003) | 0.015 (0.003) | 0.015 (0.003) |
| BMI (kg/m²) | 0.063 (0.017) | 0.038 (0.007) | 0.037 (0.006) | 0.057 (0.012) | 0.055 (0.012) | 0.055 (0.011) | 0.056 (0.012) | 0.056 (0.012) |
| Total chol. (mmol/l) | 0.039 (0.081) | 0.032 (0.043) | 0.031 (0.037) | −0.034 (0.051) | 0.001 (0.049) | 0.006 (0.049) | −0.034 (0.051) | −0.034 (0.051) |
| HDL chol. (mmol/l) | −0.302 (0.212) | −0.570 (0.132) | −0.664 (0.135) | −0.493 (0.146) | −0.458 (0.140) | −0.471 (0.139) | −0.502 (0.145) | −0.502 (0.145) |
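
Because the estimates in the table above are log hazard ratios, readers may want them on the hazard ratio scale. The sketch below exponentiates an estimate and its Wald-type 95% interval. The normal approximation (estimate ± 1.96 × SE) is an assumption made here for illustration and is not stated in the source; the function name `hazard_ratio_ci` is hypothetical. The example uses the "Current smoker" entry from the complete-data full-cohort column of part (a), 0.641 (0.067).

```python
import math


def hazard_ratio_ci(log_hr, se, z=1.96):
    """Convert a log hazard ratio and its SE to a hazard ratio with a Wald interval."""
    hr = math.exp(log_hr)
    lower = math.exp(log_hr - z * se)   # interval is built on the log scale,
    upper = math.exp(log_hr + z * se)   # then transformed, so it is asymmetric
    return hr, lower, upper


# Current smoker vs non-smoker, complete-data full cohort, table part (a).
hr, lower, upper = hazard_ratio_ci(0.641, 0.067)
print(f"HR = {hr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
# prints: HR = 1.90 (95% CI 1.66 to 2.16)
```

Under these assumptions, the full-cohort estimate corresponds to a hazard of cardiovascular death roughly 1.9 times higher for current smokers than for non-smokers, adjusted for the other variables in the table.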