Tim P Morris, Ian R White, Patrick Royston, Shaun R Seaman, Angela M Wood.
Abstract
We are concerned with multiple imputation of the ratio of two variables, which is to be used as a covariate in a regression analysis. If the numerator and denominator are not missing simultaneously, it seems sensible to make use of the observed variable in the imputation model. One such strategy is to impute missing values for the numerator and denominator, or the log-transformed numerator and denominator, and then calculate the ratio of interest; we call this 'passive' imputation. Alternatively, missing ratio values might be imputed directly, with or without the numerator and/or the denominator in the imputation model; we call this 'active' imputation. In two motivating datasets, one involving body mass index as a covariate and the other involving the ratio of total to high-density lipoprotein cholesterol, we assess the sensitivity of results to the choice of imputation model and, as an alternative, explore fully Bayesian joint models for the outcome and incomplete ratio. Fully Bayesian approaches using WinBUGS were unusable in both datasets because of computational problems. In our first dataset, multiple imputation results are similar regardless of the imputation model; in the second, results are sensitive to the choice of imputation model. Sensitivity depends strongly on the coefficient of variation of the ratio's denominator. A simulation study demonstrates that passive imputation without transformation is risky because it can lead to downward bias when the coefficient of variation of the ratio's denominator is larger than about 0.1. Active imputation or passive imputation after log-transformation is preferable.
Keywords: compatibility; missing data; multiple imputation; ratios
Year: 2013 PMID: 23922236 PMCID: PMC3920636 DOI: 10.1002/sim.5935
Source DB: PubMed Journal: Stat Med ISSN: 0277-6715 Impact factor: 2.373
Aurum summary of covariates of the analysis model and of components of body mass index (BMI); n = 1348.
| Covariate | Frequency missing (%) | Mean (SD) or frequency (%) |
|---|---|---|
| Age (years) | 0 (0%) | 37 (9) |
| Sex: male | 0 (0%) | 542 (40%) |
| Haemoglobin (g/dL) | 143 (11%) | 11.4 (2.3) |
| | 162 (12%) | 4.8 (0.8) |
| | 94 (7%) | 8.9 (4.5) |
| BMI (kg/m²) | 381 (28%) | 21.9 (4.9) |
| | 376 (28%) | 58 (12) |
| | 275 (20%) | 2.7 (0.3) |
Transformation used for viral load is log10(x4); transformation used for CD4 count is . These are standard transformations in HIV research, and we use them in both the imputation models and the analysis models.
†Summarised on transformed scale.
‡Only enters into the analysis model via BMI.
EPIC-Norfolk summary of covariates of the analysis model and of components of cholesterol ratio; n = 22 754.
| Covariate | Frequency missing (%) | Mean (SD) or frequency (%) |
|---|---|---|
| Age (years) | 0 (0%) | 59 (9) |
| Sex: male | 0 (0%) | 10145 (45%) |
| Smoking status: ever smoked | 0 (0%) | 11971 (53%) |
| Systolic blood pressure (mm Hg) | 52 (<1%) | 135 (18) |
| Diastolic blood pressure (mm Hg) | 52 (<1%) | 82 (11) |
| Cholesterol ratio | 2155 (9%) | 4.7 (1.6) |
| | 1514 (7%) | 6.2 (1.2) |
| | 2155 (9%) | 1.4 (0.4) |
†Only enters into the analysis model via cholesterol ratio.
Candidate imputation models for x.
| Imputation model | Label | Relationship to analysis model |
|---|---|---|
| ( | M1 | Compatible |
| ( | M2 | Semi-compatible |
| ( | M3 | Semi-compatible |
| ( | M4 | Semi-compatible |
| | M5 | Incompatible |
| | M6 | Incompatible |
†Passive imputation of x = a1/a2 is required.
‡Passive imputation of x = exp[ln(a1) − ln(a2)] is required.
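The contrast between passive imputation on the raw scale and after log-transformation can be illustrated with a small sketch. It is not the authors' imputation procedure: it only draws hypothetical imputed values for a positive denominator a2 from a normal model, either on the raw scale or on the log scale, holds the numerator a1 at 1, and forms the ratio passively. The function name `passive_ratio_draws`, the mean of 1.0, the number of draws and the seed are assumptions made for illustration.

```python
import math
import random


def passive_ratio_draws(cv, n=100_000, mean=1.0, seed=0):
    """Form ratios passively from hypothetical imputed denominators.

    Illustrative assumption: the numerator a1 is fixed at 1, so x = 1 / a2.
    """
    rng = random.Random(seed)
    sd = cv * mean

    # Log-normal parameters matched to the same mean and CV as the raw scale.
    sigma = math.sqrt(math.log1p(cv ** 2))
    mu = math.log(mean) - 0.5 * sigma ** 2

    # Raw scale: a2 ~ Normal(mean, sd); draws may fall near or below zero.
    x_raw = [1.0 / rng.gauss(mean, sd) for _ in range(n)]

    # Log scale: x = exp[ln(a1) - ln(a2)] with ln(a1) = 0; always positive.
    x_log = [math.exp(0.0 - rng.gauss(mu, sigma)) for _ in range(n)]
    return x_raw, x_log


for cv in (0.1, 0.3):
    x_raw, x_log = passive_ratio_draws(cv)
    print(f"CV = {cv}: raw-scale ratios in [{min(x_raw):.2f}, {max(x_raw):.2f}], "
          f"log-scale ratios in [{min(x_log):.2f}, {max(x_log):.2f}]")
```

Under these assumptions, a denominator CV of 0.3 lets some raw-scale draws fall close to or below zero, so the passively computed ratio can become extreme or negative, whereas the log-scale version stays positive by construction.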
Figure 1. Results from analyses of Aurum data under different models for imputing body mass index (BMI). The estimated fraction of missing information (FMI) is given next to multiple imputation analyses.
Figure 2. Results from analyses of EPIC-Norfolk data under different models for cholesterol ratio. The estimated fraction of missing information (FMI) is given next to multiple imputation analyses.
Figure 3. Dotplot of imputed cholesterol ratio for single (typical) imputed datasets in EPIC-Norfolk under models M1–M6. Imputed values of x < 3 or x > 20 are not plotted but represented according to rank; imputed values of (x,a1) are listed.
Simulation results: bias, coverage and efficiency of different imputation models.
| CV (a1) | CV (a2) | Imputation model | Bias, MCAR | Bias, MAR | Empirical SE, MCAR | Empirical SE, MAR | Coverage, MCAR | Coverage, MAR |
|---|---|---|---|---|---|---|---|---|
| 0.1 | 0.1 | Complete data | 0.000 | | 0.273 | | 95.2 | |
| | | Complete cases | 0.003 | −0.172 | 0.366 | 0.352 | 95.1 | 92.6 |
| | | M1 | −0.005 | −0.004 | 0.368 | 0.386 | 93.8 | 94.9 |
| | | M2 | −0.001 | 0.002 | 0.333 | 0.345 | 94.6 | 94.7 |
| | | M3 | −0.009 | −0.003 | 0.363 | 0.383 | 94.6 | 94.9 |
| | | M4 | −0.005 | 0.005 | 0.330 | 0.342 | 94.7 | 95.0 |
| | | M5 | −0.017 | −0.016 | 0.328 | 0.337 | 94.8 | 95.0 |
| | | M6 | −0.016 | −0.034 | 0.329 | 0.332 | 94.9 | 95.1 |
| 0.1 | 0.3 | Complete data | 0.006 | | 0.267 | | 95.3 | |
| | | Complete cases | 0.001 | −0.168 | 0.359 | 0.351 | 95.3 | 92.9 |
| | | M1 | −0.009 | 0.005 | 0.358 | 0.385 | 94.7 | 94.9 |
| | | M2 | −0.007 | 0.014 | 0.348 | 0.372 | 94.9 | 94.9 |
| | | M3 | −0.001 | 0.031 | 0.334 | 0.362 | 95.4 | 95.0 |
| | | M4 | −0.001 | 0.038 | 0.325 | 0.346 | 95.0 | 94.7 |
| | | M5 | −0.562 | −0.665 | 0.350 | 0.334 | 94.3 | 92.6 |
| | | M6 | −0.038 | −0.064 | 0.313 | 0.318 | 95.8 | 95.4 |
| 0.3 | 0.1 | Complete data | 0.003 | | 0.137 | | 95.2 | |
| | | Complete cases | 0.001 | −0.139 | 0.183 | 0.188 | 95.5 | 88.5 |
| | | M1 | −0.005 | 0.031 | 0.171 | 0.187 | 95.3 | 94.0 |
| | | M2 | −0.003 | 0.026 | 0.159 | 0.171 | 95.8 | 95.0 |
| | | M3 | −0.007 | 0.029 | 0.170 | 0.188 | 95.2 | 93.8 |
| | | M4 | −0.003 | 0.026 | 0.159 | 0.171 | 95.9 | 94.6 |
| | | M5 | −0.016 | 0.000 | 0.158 | 0.168 | 96.1 | 95.3 |
| | | M6 | −0.016 | −0.031 | 0.158 | 0.163 | 96.2 | 95.6 |
| 0.3 | 0.3 | Complete data | −0.002 | | 0.137 | | 95.0 | |
| | | Complete cases | −0.006 | −0.143 | 0.184 | 0.192 | 94.9 | 88.5 |
| | | M1 | −0.009 | 0.054 | 0.174 | 0.196 | 94.2 | 93.0 |
| | | M2 | −0.012 | 0.057 | 0.172 | 0.193 | 94.8 | 93.3 |
| | | M3 | −0.010 | 0.076 | 0.170 | 0.191 | 94.3 | 91.5 |
| | | M4 | −0.009 | 0.080 | 0.167 | 0.187 | 94.2 | 91.8 |
| | | M5 | −0.580 | −0.814 | 0.287 | 0.300 | 94.3 | 93.3 |
| | | M6 | −0.051 | −0.070 | 0.162 | 0.164 | 95.1 | 94.6 |
SE, standard error; CV, coefficient of variation; MCAR, missing completely at random; MAR, missing at random.
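The three performance measures reported in the simulation table can be computed from a set of simulation replicates as sketched below. This is a generic sketch rather than the authors' code: the function name `performance`, the toy inputs and the use of normal-approximation 95% intervals (estimate ± 1.96 × SE) are assumptions, and a multiple imputation analysis would more usually build intervals from a t reference distribution.

```python
import statistics


def performance(estimates, std_errors, true_value, z=1.96):
    """Return (bias, empirical SE, coverage in %) over simulation replicates."""
    # Bias: mean of the estimates minus the true parameter value.
    bias = statistics.fmean(estimates) - true_value

    # Empirical SE: sample standard deviation of the estimates themselves.
    empirical_se = statistics.stdev(estimates)

    # Coverage: percentage of nominal 95% intervals that contain the truth.
    covered = sum(
        1
        for est, se in zip(estimates, std_errors)
        if est - z * se <= true_value <= est + z * se
    )
    coverage = 100.0 * covered / len(estimates)
    return bias, empirical_se, coverage


# Hypothetical replicates, not taken from the paper.
bias, emp_se, cover = performance(
    estimates=[0.9, 1.1, 1.0, 1.3],
    std_errors=[0.1, 0.1, 0.1, 0.1],
    true_value=1.0,
)
print(f"bias={bias:.3f}, empirical SE={emp_se:.3f}, coverage={cover:.1f}%")
```

Read this way, a row such as M5 with a denominator CV of 0.3 combines a large negative bias with coverage that is still fairly close to nominal, which is why bias and coverage are best inspected together.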
Candidate fully Bayesian models for x.
| Model for covariates | Label |
|---|---|
| ( | B1 |
| ( | B2 |
| ( | B3 |
| ( | B4 |
| ( | B5 |
| ( | B6 |
Figure B.1. Results from analyses of Aurum data under different Bayesian models for body mass index (BMI).
Figure C.1. Results from analyses of EPIC-Norfolk data under different models for cholesterol ratio using predictive mean matching. The estimated fraction of missing information (FMI) is given next to multiple imputation analyses.