Iris Eekhout, Mark A van de Wiel, Martijn W Heymans.
Abstract
BACKGROUND: Multiple imputation is a recommended method for handling missing data. For significance testing after multiple imputation, Rubin's Rules (RR) are easily applied to pool parameter estimates. In a logistic regression model, several methods are available to test whether a categorical covariate with more than two levels contributes significantly to the model: pooling chi-square tests with multiple degrees of freedom, pooling likelihood ratio test statistics, and pooling based on the covariance matrix of the regression model. These methods are more complex than RR and are not available in all mainstream statistical software packages. In addition, they do not always achieve optimal power. We argue that the median of the p-values from the overall significance tests on the imputed datasets can be used as an alternative pooling rule for categorical variables. The aim of the current study is to compare different methods for testing a categorical variable for significance after multiple imputation in terms of applicability and power.
Keywords: Categorical covariates; Logistic regression; Multiple imputation; Pooling; Significance test; Simulation study
Year: 2017 PMID: 28830466 PMCID: PMC5568368 DOI: 10.1186/s12874-017-0404-7
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
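The Median P Rule proposed in the abstract can be sketched in a few lines: fit the model on each imputed dataset, take the overall p-value for the categorical covariate from each fit, and report the median of those p-values. The following is a minimal Python sketch under stated assumptions, not the authors' implementation; the function name `median_p_rule`, the `alpha` default and the example p-values are hypothetical.

```python
from statistics import median


def median_p_rule(p_values, alpha=0.05):
    """Pool per-imputation overall p-values by taking their median.

    p_values: one overall p-value per imputed dataset, each obtained from
    the overall significance test of the same covariate.
    Returns the pooled p-value and whether it falls below alpha.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not 0.0 <= p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in [0, 1]")
    pooled = median(p_values)
    return pooled, pooled < alpha


# Hypothetical overall p-values for one categorical covariate (m = 5 imputations).
example_p = [0.031, 0.048, 0.062, 0.040, 0.055]
pooled_p, significant = median_p_rule(example_p)
print(pooled_p, significant)  # prints: 0.048 True
```

The rule needs only the per-imputation p-values, which is why it can be applied in any software that reports an overall test for a categorical covariate.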
Type I error of each pooling method for simulated data with beta equal to zero and 25% missing data, at varying correlations between the variables; Factor1 is categorical and Covar1-Covar4 are continuous
| Cor | Variable | Full data | RR | MR | CHI | VAR | MPRin | MPRout |
|---|---|---|---|---|---|---|---|---|
| 0.2 | Factor | 0.057 | a | 0.019 | 0.024 | 0.018 | 0.065 | 0.038 |
| | covar1 | 0.056 | 0.048 | 0.052 | 0.057 | 0.048 | 0.104 | 0.028 |
| | covar2 | 0.056 | 0.056 | 0.057 | 0.057 | 0.056 | 0.061 | 0.059 |
| | covar3 | 0.043 | 0.051 | 0.057 | 0.055 | 0.051 | 0.063 | 0.052 |
| | covar4 | 0.070 | 0.058 | 0.061 | 0.063 | 0.058 | 0.070 | 0.052 |
| 0.4 | Factor | 0.057 | a | 0.020 | 0.026 | 0.025 | 0.065 | 0.035 |
| | covar1 | 0.056 | 0.046 | 0.048 | 0.052 | 0.046 | 0.094 | 0.030 |
| | covar2 | 0.056 | 0.051 | 0.054 | 0.054 | 0.051 | 0.060 | 0.056 |
| | covar3 | 0.043 | 0.059 | 0.063 | 0.061 | 0.059 | 0.070 | 0.052 |
| | covar4 | 0.070 | 0.056 | 0.057 | 0.056 | 0.056 | 0.057 | 0.055 |
| 0.6 | Factor | 0.061 | a | 0.026 | 0.026 | 0.026 | 0.068 | 0.023 |
| | covar1 | 0.058 | 0.048 | 0.049 | 0.051 | 0.048 | 0.088 | 0.031 |
| | covar2 | 0.065 | 0.055 | 0.057 | 0.060 | 0.055 | 0.066 | 0.056 |
| | covar3 | 0.059 | 0.051 | 0.054 | 0.053 | 0.051 | 0.058 | 0.051 |
| | covar4 | 0.063 | 0.063 | 0.066 | 0.066 | 0.063 | 0.075 | 0.064 |
| 0.8 | Factor | 0.057 | a | 0.026 | 0.026 | 0.025 | 0.077 | 0.033 |
| | covar1 | 0.056 | 0.057 | 0.058 | 0.058 | 0.057 | 0.098 | 0.019 |
| | covar2 | 0.056 | 0.058 | 0.060 | 0.061 | 0.058 | 0.063 | 0.055 |
| | covar3 | 0.043 | 0.060 | 0.061 | 0.061 | 0.060 | 0.070 | 0.043 |
| | covar4 | 0.070 | 0.053 | 0.052 | 0.054 | 0.053 | 0.062 | 0.059 |
a For the categorical variable the p-value could not be obtained by RR. Cor: correlation between variables; Full data: complete data; RR: Rubin’s Rules; MR: Meng and Rubin pooling; CHI: chi-square test with multiple degrees of freedom; VAR: pooled sampling variance method; MPRin: Median P Rule with the outcome included in the imputation model; MPRout: Median P Rule with the outcome excluded from the imputation model
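The footnote notes that RR yields no overall p-value for the categorical variable. RR pools one coefficient at a time, so a factor coded with several dummy variables produces several pooled coefficients rather than a single test. As an illustration, here is a minimal Python sketch of RR for a single coefficient, using the standard formulas: the pooled estimate is the mean of the per-imputation estimates, the total variance is T = W + (1 + 1/m)B, and the degrees of freedom follow the classic large-sample expression. The function name `pool_rubin` and the input numbers are hypothetical and are not results from the study.

```python
from math import inf, sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed datasets with Rubin's Rules.

    Returns (pooled estimate, pooled standard error, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching estimates and SEs from >= 2 imputations")
    q_bar = mean(estimates)                      # pooled point estimate
    w = mean([se ** 2 for se in std_errors])     # within-imputation variance
    b = variance(estimates)                      # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b                      # total variance
    # Classic degrees of freedom; infinite when the estimates do not vary.
    df = inf if b == 0 else (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2
    return q_bar, sqrt(t), df


# Hypothetical estimates and standard errors for one dummy coefficient (m = 3).
est, se, df = pool_rubin([0.70, 0.80, 0.75], [0.30, 0.30, 0.30])
print(est, se, df)
```

The pooled estimate divided by its standard error would then be compared with a t distribution on `df` degrees of freedom, which gives one test per dummy variable and not the overall test that the other pooling methods target.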
Fig. 1 Power for the condition where the percentage of missing data was 25% and the correlation between the variables was 0.4 and the outcome was included in the imputation model. Note that for the continuous variable the lines for RR and VAR overlap. Full data = complete data; MPR = median P rule; CHI = chi-square test with multiple degrees of freedom; MR = Meng and Rubin pooling; VAR = pooled sampling variance method; RR = Rubin’s Rules
Fig. 2 Power for the condition where the percentage of missing data was 25% and the correlation between the variables was 0.4 and the outcome was excluded from the imputation model. Note that for the continuous variable the lines for RR and VAR overlap. Full data = complete data; MPR = median P rule; CHI = chi-square test with multiple degrees of freedom; MR = Meng and Rubin pooling; VAR = pooled sampling variance method; RR = Rubin’s Rules
Fig. 3 Power for the condition where the percentage of missing data was 25% and the correlation between the variables was 0.4. Note that for the continuous variable the lines for RR and VAR overlap. Full data = complete data; MPRout = median P rule with outcome excluded from imputation model; CHI = chi-square test with multiple degrees of freedom; MR = Meng and Rubin pooling; VAR = pooled sampling variance method; RR = Rubin’s Rules
Model estimates of complete data analysis
| Variable | Estimate | SE | Z | P-value |
|---|---|---|---|---|
| Intercept | −8.0215 | 2.4064 | −3.3333 | 0.0008 |
| Group | | | | 0.0516 |
| Group (1)a | 0.7535 | 0.3287 | 2.2927 | 0.0219 |
| Group (2)a | 0.5986 | 0.3338 | 1.7936 | 0.0729 |
| Age | −0.0007 | 0.0141 | −0.0498 | 0.9602 |
| Gender | 0.4247 | 0.3525 | 1.2049 | 0.2282 |
| BMI | 0.0619 | 0.0352 | 1.7587 | 0.0786 |
| Education | | | | 0.5108 |
| Education (1)a | −0.1501 | 0.4058 | −0.3699 | 0.7114 |
| Education (2)a | −0.3208 | 0.4397 | −0.7297 | 0.4656 |
| Education (3)a | −0.5997 | 0.6829 | −0.8782 | 0.3798 |
| Education (4)a | −1.2694 | 0.8117 | −1.5639 | 0.1179 |
| Sitting | | | | 0.0195 |
| Sitting (1)a | 0.6305 | 0.3295 | 1.9134 | 0.0557 |
| Sitting (2)a | −0.3515 | 0.4795 | −0.7329 | 0.4636 |
| Sitting (3)a | 1.0407 | 0.5286 | 1.9690 | 0.0489 |
| Lifting | | | | 0.9830 |
| Lifting (1)a | 0.1441 | 0.3692 | 0.3903 | 0.6963 |
| Lifting (2)a | 0.0574 | 0.4127 | 0.1389 | 0.8894 |
| Lifting (3)a | 0.0990 | 0.4424 | 0.2238 | 0.8229 |
| Vibration tools | | | | 0.0090 |
| Vibration tools (1)a | −0.5406 | 0.3717 | −1.4543 | 0.1459 |
| Vibration tools (2)a | 0.0554 | 0.4165 | 0.1329 | 0.8942 |
| Vibration tools (3)a | −1.6335 | 0.5573 | −2.9313 | 0.0034 |
| Pain at baseline | 0.3232 | 0.0836 | 3.8656 | 0.0001 |
| Physical functioning | 0.3220 | 0.1919 | 1.6778 | 0.0934 |
| Disability | −0.9110 | 0.3283 | −2.7747 | 0.0055 |
| Kinesiophobia | 0.0299 | 0.0222 | 1.3524 | 0.1762 |
a The numbers in brackets indicate the dummy variables; SE: standard error
P-values from complete data analysis, pooling methods and listwise deletion
| Variable | Full data | RR | MR | CHI | VAR | MPRout | Listwise |
|---|---|---|---|---|---|---|---|
| Group | 0.0515 | a | 0.0498 | 0.0583 | 0.0643 | 0.0549 | 0.3234 |
| Age | 0.9602 | 0.9245 | 0.9283 | 0.8780 | 0.9244 | 0.8898 | 0.8245 |
| Gender | 0.2250 | 0.3040 | 0.2854 | 0.3017 | 0.3041 | 0.2862 | 0.8836 |
| BMI | 0.0764 | 0.0172 | 0.0222 | 0.0137 | 0.0173 | 0.0049 | 0.0103 |
| Education | 0.5108 | a | 0.7546 | 0.7235 | 0.7468 | 0.4579 | 0.6141 |
| Sitting | 0.0195 | a | 0.0396 | 0.0355 | 0.0498 | 0.0306 | 0.1196 |
| Lifting | 0.9830 | a | 0.9485 | 0.8755 | 0.9484 | 0.7605 | 0.9289 |
| Vibration Tools | 0.0090 | a | 0.0115 | 0.0130 | 0.0236 | 0.0109 | 0.0833 |
| Pain baseline | 0.0000 | 0.0001 | 0.0000 | 0.0000 | 0.0001 | 0.0000 | 0.0008 |
| Physical Functioning | 0.0913 | 0.0970 | 0.1095 | 0.0943 | 0.0970 | 0.0532 | 0.0608 |
| Disability | 0.0049 | 0.0032 | 0.0009 | 0.0027 | 0.0032 | 0.0022 | 0.0595 |
| Kinesiophobia | 0.1730 | 0.2115 | 0.2312 | 0.2084 | 0.2115 | 0.2018 | 0.0438 |
a For the categorical variables the overall p-value could not be obtained by RR. Full data: complete data; RR: Rubin’s Rules; MR: Meng and Rubin pooling; CHI: chi-square test with multiple degrees of freedom; VAR: pooled sampling variance method; MPRout: Median P Rule with the outcome excluded from the imputation model; Listwise: analysis after excluding cases with missing values
Probability (P) and standard deviation (SD) for rejection of the null-hypothesis, i.e. power for variables above dashed line and type I error for variables below dashed line, from data based simulation
a For the categorical variables the overall p-value could not be obtained by RR. RR: Rubin’s Rules; MR: Meng and Rubin pooling; CHI: chi-square test with multiple degrees of freedom; VAR: pooled sampling variance method; MPR: Median P Rule with the outcome excluded from the imputation model