Muhan Zhou, Yulei He, Mandi Yu, Chiu-Hsieh Hsu.
Abstract
BACKGROUND: Incomplete categorical variables with more than two categories are common in public health data. However, most of the existing missing-data methods do not use the information from nonresponse (missingness) probabilities.
Keywords: Categorical data; Double robustness; Missing at Random; Multiple imputation; Nearest neighbour
Year: 2017 PMID: 28587662 PMCID: PMC5461637 DOI: 10.1186/s12874-017-0360-2
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Simulation results from probability estimation for Y, where Y is generated using a logit link function with five covariates and δ is generated using a logit link function with non-extreme missingness probabilities (M1) based on five covariates; N = 400

| Method | Est | SD | SE | CR | Est | SD | SE | CR |
|---|---|---|---|---|---|---|---|---|
| FO | 0.386 | 0.023 | 0.024 | 0.960 | 0.286 | 0.023 | 0.023 | 0.934 |
| CC | 0.439 | 0.034 | 0.035 | 0.674 | 0.340 | 0.034 | 0.033 | 0.670 |
| Working models for Y | Five covariates with logit link | | | | | | | |
| Working models for δ | Five covariates with logit link | | | | | | | |
| CE | 0.388 | 0.036 | 0.036 | 0.948 | 0.286 | 0.038 | 0.036 | 0.924 |
| PMI | 0.387 | 0.030 | 0.032 | 0.954 | 0.287 | 0.034 | 0.032 | 0.930 |
| | 0.387 | 0.032 | 0.033 | 0.952 | 0.288 | 0.036 | 0.033 | 0.936 |
| | 0.389 | 0.033 | 0.034 | 0.956 | 0.288 | 0.035 | 0.033 | 0.930 |
| | 0.386 | 0.032 | 0.033 | 0.956 | 0.290 | 0.036 | 0.034 | 0.926 |
| | 0.385 | 0.032 | 0.033 | 0.948 | 0.294 | 0.036 | 0.034 | 0.916 |
| | 0.381 | 0.032 | 0.033 | 0.944 | 0.295 | 0.037 | 0.034 | 0.928 |
| | 0.390 | 0.032 | 0.033 | 0.950 | 0.294 | 0.037 | 0.034 | 0.936 |
| Working models for Y | Three covariates with logit link (misspecified scenario 1) | | | | | | | |
| Working models for δ | Five covariates with logit link | | | | | | | |
| CE | 0.311 | 0.057 | 0.057 | 0.760 | 0.288 | 0.041 | 0.041 | 0.932 |
| PMI | 0.464 | 0.037 | 0.038 | 0.454 | 0.285 | 0.032 | 0.031 | 0.922 |
| | 0.410 | 0.036 | 0.039 | 0.932 | 0.290 | 0.035 | 0.033 | 0.926 |
| | 0.407 | 0.036 | 0.039 | 0.940 | 0.290 | 0.035 | 0.033 | 0.932 |
| | 0.408 | 0.035 | 0.039 | 0.930 | 0.291 | 0.035 | 0.033 | 0.928 |
| | 0.415 | 0.036 | 0.038 | 0.896 | 0.292 | 0.034 | 0.033 | 0.940 |
| | 0.412 | 0.036 | 0.039 | 0.916 | 0.292 | 0.035 | 0.033 | 0.934 |
| | 0.413 | 0.035 | 0.039 | 0.926 | 0.291 | 0.035 | 0.034 | 0.954 |
| Working models for Y | Five covariates with logit link | | | | | | | |
| Working models for δ | Three covariates with logit link (misspecified scenario 2) | | | | | | | |
| CE | 0.389 | 0.032 | 0.033 | 0.954 | 0.285 | 0.033 | 0.032 | 0.942 |
| PMI | 0.387 | 0.030 | 0.032 | 0.954 | 0.287 | 0.034 | 0.032 | 0.930 |
| | 0.393 | 0.032 | 0.033 | 0.962 | 0.292 | 0.035 | 0.033 | 0.936 |
| | 0.402 | 0.034 | 0.035 | 0.936 | 0.289 | 0.035 | 0.033 | 0.926 |
| | 0.389 | 0.031 | 0.033 | 0.960 | 0.297 | 0.036 | 0.034 | 0.936 |
| | 0.387 | 0.031 | 0.032 | 0.958 | 0.298 | 0.035 | 0.033 | 0.936 |
| | 0.382 | 0.031 | 0.033 | 0.956 | 0.298 | 0.035 | 0.034 | 0.940 |
| | 0.392 | 0.031 | 0.033 | 0.954 | 0.302 | 0.035 | 0.034 | 0.920 |

Est: Estimates of probabilities; SD: Empirical standard deviation; SE: Estimate of standard error; CR: Coverage rate of 95% confidence intervals; FO: fully observed; CC: Complete Cases; CE: Calibration estimator; PMI: Parametric Multiple Imputation; NNMI (NN, ω₁, ω₂; ω₃): the NNMI method using Multinomial Logistic Regressions, where NN is the number of nearest neighbors and the weights are ω₁, ω₂, and ω₃; NNMI : the NNMI method using Cumulative Logistic Regressions; K = 10 imputed datasets are used for PMI and NNMI methods
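The four summary columns defined in the legend above can be computed from replicate-level output roughly as follows. This is a minimal sketch, not the authors' code: the function name `summarize_replicates`, the use of the sample (n − 1) standard deviation, and the normal-approximation interval with z = 1.96 are assumptions, and the input values are hypothetical.

```python
import statistics


def summarize_replicates(estimates, std_errors, true_value, z=1.96):
    """Return Est, SD, SE, and CR for one method across simulation replicates.

    estimates  : point estimate of the probability from each replicate
    std_errors : estimated standard error from each replicate
    true_value : the true probability the estimates are targeting
    """
    if len(estimates) != len(std_errors):
        raise ValueError("estimates and std_errors must have the same length")
    if len(estimates) < 2:
        raise ValueError("need at least two replicates to compute SD")

    # A replicate "covers" when its interval est +/- z * se contains the truth.
    covered = sum(
        1
        for est, se in zip(estimates, std_errors)
        if est - z * se <= true_value <= est + z * se
    )
    return {
        "Est": statistics.fmean(estimates),   # average point estimate
        "SD": statistics.stdev(estimates),    # empirical SD (assumed n - 1 divisor)
        "SE": statistics.fmean(std_errors),   # average estimated standard error
        "CR": covered / len(estimates),       # share of intervals covering the truth
    }


if __name__ == "__main__":
    # Hypothetical replicate-level output, not values from the paper.
    summary = summarize_replicates(
        estimates=[0.38, 0.40, 0.36, 0.50],
        std_errors=[0.02, 0.02, 0.02, 0.02],
        true_value=0.386,
    )
    print(summary)
```

Read this way, a method is well calibrated when SE is close to SD and CR is close to the nominal 0.95, which is how the rows above can be compared.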
Simulation results from probability estimation for Y, where Y is generated using a logit link function with five covariates and δ is generated using a logit link function with extreme missingness probabilities (M2) based on five covariates; N = 400

| Method | Est | SD | SE | CR | Est | SD | SE | CR |
|---|---|---|---|---|---|---|---|---|
| FO | 0.386 | 0.023 | 0.024 | 0.960 | 0.286 | 0.023 | 0.023 | 0.934 |
| CC | 0.425 | 0.031 | 0.033 | 0.802 | 0.374 | 0.033 | 0.033 | 0.250 |
| Working models for Y | Five covariates with logit link | | | | | | | |
| Working models for δ | Five covariates with logit link | | | | | | | |
| CE | 0.378 | 0.102 | 0.080 | 0.946 | 0.288 | 0.108 | 0.076 | 0.902 |
| PMI | 0.385 | 0.034 | 0.036 | 0.950 | 0.288 | 0.036 | 0.033 | 0.922 |
| | 0.389 | 0.039 | 0.040 | 0.946 | 0.297 | 0.043 | 0.040 | 0.906 |
| | 0.399 | 0.042 | 0.045 | 0.942 | 0.292 | 0.041 | 0.039 | 0.918 |
| | 0.385 | 0.038 | 0.039 | 0.936 | 0.302 | 0.044 | 0.042 | 0.916 |
| | 0.384 | 0.037 | 0.038 | 0.938 | 0.304 | 0.042 | 0.040 | 0.918 |
| | 0.372 | 0.038 | 0.039 | 0.926 | 0.307 | 0.043 | 0.042 | 0.918 |
| | 0.395 | 0.038 | 0.039 | 0.944 | 0.305 | 0.043 | 0.041 | 0.908 |
| Working models for Y | Three covariates with logit link (misspecified scenario 1) | | | | | | | |
| Working models for δ | Five covariates with logit link | | | | | | | |
| CE | 0.302 | 0.234 | 0.184 | 0.946 | 0.287 | 0.117 | 0.084 | 0.910 |
| PMI | 0.495 | 0.039 | 0.042 | 0.258 | 0.288 | 0.032 | 0.031 | 0.932 |
| | 0.436 | 0.051 | 0.053 | 0.852 | 0.295 | 0.042 | 0.040 | 0.914 |
| | 0.431 | 0.052 | 0.054 | 0.878 | 0.293 | 0.041 | 0.040 | 0.932 |
| | 0.433 | 0.050 | 0.053 | 0.858 | 0.296 | 0.042 | 0.041 | 0.924 |
| | 0.440 | 0.047 | 0.049 | 0.806 | 0.297 | 0.040 | 0.039 | 0.924 |
| | 0.429 | 0.046 | 0.048 | 0.852 | 0.299 | 0.042 | 0.039 | 0.926 |
| | 0.441 | 0.050 | 0.051 | 0.806 | 0.297 | 0.041 | 0.040 | 0.920 |
| Working models for Y | Five covariates with logit link | | | | | | | |
| Working models for δ | Three covariates with logit link (misspecified scenario 2) | | | | | | | |
| CE | 0.386 | 0.050 | 0.048 | 0.960 | 0.286 | 0.043 | 0.040 | 0.894 |
| PMI | 0.385 | 0.034 | 0.036 | 0.950 | 0.288 | 0.036 | 0.033 | 0.922 |
| | 0.398 | 0.038 | 0.040 | 0.952 | 0.301 | 0.041 | 0.039 | 0.906 |
| | 0.426 | 0.042 | 0.045 | 0.858 | 0.294 | 0.039 | 0.037 | 0.922 |
| | 0.392 | 0.037 | 0.038 | 0.942 | 0.312 | 0.043 | 0.040 | 0.882 |
| | 0.390 | 0.035 | 0.038 | 0.954 | 0.307 | 0.040 | 0.039 | 0.912 |
| | 0.377 | 0.037 | 0.039 | 0.940 | 0.307 | 0.041 | 0.040 | 0.924 |
| | 0.401 | 0.036 | 0.037 | 0.938 | 0.313 | 0.041 | 0.039 | 0.912 |

Est: Estimates of probabilities; SD: Empirical standard deviation; SE: Estimate of standard error; CR: Coverage rate of 95% confidence intervals; FO: fully observed; CC: Complete Cases; CE: Calibration estimator; PMI: Parametric Multiple Imputation; NNMI (NN, ω₁, ω₂; ω₃): the NNMI method using Multinomial Logistic Regressions, where NN is the number of nearest neighbors and the weights are ω₁, ω₂, and ω₃; NNMI : the NNMI method using Cumulative Logistic Regressions; K = 10 imputed datasets are used for PMI and NNMI methods
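The table captions describe δ as generated from a logit link on five covariates, with non-extreme (M1) or extreme (M2) missingness probabilities. The sketch below illustrates that kind of mechanism only. Everything specific in it is an assumption made for illustration: standard-normal covariates, the intercept and slope values, the convention that δ = 1 means "observed", and the helper names `simulate_missingness` and `extreme_share`. None of these come from the source.

```python
import math
import random


def expit(v):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-v))


def simulate_missingness(n=400, slopes=(0.3, 0.3, 0.3, 0.3, 0.3), intercept=0.5, seed=0):
    """Draw covariates and a response indicator delta from a logistic model.

    Returns a list of (x, p_obs, delta) tuples. All coefficients are
    hypothetical; delta = 1 is assumed to mean the outcome is observed.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in slopes]          # assumed N(0, 1) covariates
        p_obs = expit(intercept + sum(b * xi for b, xi in zip(slopes, x)))
        delta = 1 if rng.random() < p_obs else 0            # Bernoulli(p_obs) draw
        rows.append((x, p_obs, delta))
    return rows


def extreme_share(rows, low=0.1, high=0.9):
    """Fraction of units whose response probability is below low or above high."""
    return sum(1 for _, p, _ in rows if p < low or p > high) / len(rows)


if __name__ == "__main__":
    mild = simulate_missingness()                            # small slopes
    steep = simulate_missingness(slopes=(1.5,) * 5)          # larger slopes
    print("share of extreme probabilities:", extreme_share(mild), extreme_share(steep))
```

Larger slopes push more response probabilities toward 0 or 1, which is the situation in which weighting-based estimators tend to become unstable.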
Simulation results from probability estimation for Y, where Y is generated using a probit link function with five covariates and δ is generated using a logit link function with non-extreme missingness probabilities (M1) based on five covariates; N = 400

| Method | Est | SD | SE | CR | Est | SD | SE | CR |
|---|---|---|---|---|---|---|---|---|
| FO | 0.298 | 0.023 | 0.023 | 0.952 | 0.249 | 0.021 | 0.022 | 0.974 |
| CC | 0.322 | 0.032 | 0.033 | 0.910 | 0.303 | 0.033 | 0.032 | 0.606 |
| Working models for Y | Five covariates with logit link (misspecified scenario 3) | | | | | | | |
| Working models for δ | Five covariates with logit link | | | | | | | |
| CE | 0.291 | 0.036 | 0.037 | 0.954 | 0.230 | 0.031 | 0.032 | 0.900 |
| PMI | 0.307 | 0.033 | 0.033 | 0.942 | 0.271 | 0.034 | 0.033 | 0.902 |
| | 0.301 | 0.033 | 0.033 | 0.940 | 0.260 | 0.031 | 0.032 | 0.942 |
| | 0.302 | 0.034 | 0.034 | 0.946 | 0.259 | 0.032 | 0.032 | 0.936 |
| | 0.301 | 0.032 | 0.033 | 0.944 | 0.260 | 0.033 | 0.032 | 0.930 |
| | 0.299 | 0.033 | 0.033 | 0.936 | 0.263 | 0.032 | 0.033 | 0.936 |
| | 0.297 | 0.033 | 0.033 | 0.930 | 0.263 | 0.033 | 0.033 | 0.926 |
| | 0.302 | 0.032 | 0.034 | 0.948 | 0.261 | 0.032 | 0.032 | 0.942 |

Est: Estimates of probabilities; SD: Empirical standard deviation; SE: Estimate of standard error; CR: Coverage rate of 95% confidence intervals; FO: fully observed; CC: Complete Cases; CE: Calibration estimator; PMI: Parametric Multiple Imputation; NNMI (NN, ω₁, ω₂; ω₃): the NNMI method using Multinomial Logistic Regressions, where NN is the number of nearest neighbors and the weights are ω₁, ω₂, and ω₃; NNMI : the NNMI method using Cumulative Logistic Regressions; K = 10 imputed datasets are used for PMI and NNMI methods
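The legend notes that K = 10 imputed datasets are used for the PMI and NNMI methods. A common way to combine K per-dataset results into one estimate and standard error is Rubin's combining rules; the sketch below assumes those rules apply, which the text here does not confirm. The function name `pool_rubin` and the input values are hypothetical.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Combine K per-imputation estimates with Rubin's rules.

    estimates : point estimate from each imputed dataset
    variances : squared standard error from each imputed dataset
    Returns (pooled_estimate, pooled_standard_error).
    """
    k = len(estimates)
    if k != len(variances):
        raise ValueError("estimates and variances must have the same length")
    if k < 2:
        raise ValueError("need at least two imputed datasets")

    q_bar = statistics.fmean(estimates)          # pooled point estimate
    within = statistics.fmean(variances)         # average within-imputation variance
    between = statistics.variance(estimates)     # between-imputation variance (n - 1)
    total = within + (1.0 + 1.0 / k) * between   # total variance
    return q_bar, math.sqrt(total)


if __name__ == "__main__":
    # Hypothetical per-imputation results, not values from the paper.
    est, se = pool_rubin([0.55, 0.56, 0.57], [0.0004, 0.0004, 0.0004])
    print(round(est, 3), round(se, 4))
```

The between-imputation term is what lets the pooled standard error reflect uncertainty about the missing values, so it is typically larger than the standard error from any single completed dataset.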
Simulation results from probability estimation for Y, where Y is generated using a logit link function with five covariates and δ is generated using a probit link function with non-extreme missingness probabilities (M1) based on five covariates; N = 400

| Method | Est | SD | SE | CR | Est | SD | SE | CR |
|---|---|---|---|---|---|---|---|---|
| FO | 0.386 | 0.023 | 0.024 | 0.960 | 0.286 | 0.023 | 0.023 | 0.934 |
| CC | 0.456 | 0.033 | 0.035 | 0.512 | 0.357 | 0.033 | 0.034 | 0.472 |
| Working models for Y | Five covariates with logit link | | | | | | | |
| Working models for δ | Five covariates with logit link (misspecified scenario 4) | | | | | | | |
| CE | 0.386 | 0.056 | 0.051 | 0.944 | 0.287 | 0.060 | 0.051 | 0.910 |
| PMI | 0.388 | 0.033 | 0.034 | 0.950 | 0.288 | 0.035 | 0.034 | 0.926 |
| | 0.391 | 0.036 | 0.038 | 0.954 | 0.294 | 0.040 | 0.039 | 0.942 |
| | 0.397 | 0.038 | 0.041 | 0.948 | 0.291 | 0.039 | 0.038 | 0.928 |
| | 0.388 | 0.035 | 0.037 | 0.966 | 0.299 | 0.042 | 0.041 | 0.928 |
| | 0.387 | 0.035 | 0.036 | 0.948 | 0.303 | 0.040 | 0.041 | 0.928 |
| | 0.379 | 0.035 | 0.036 | 0.938 | 0.304 | 0.040 | 0.041 | 0.930 |
| | 0.395 | 0.036 | 0.037 | 0.956 | 0.302 | 0.041 | 0.040 | 0.924 |

Est: Estimates of probabilities; SD: Empirical standard deviation; SE: Estimate of standard error; CR: Coverage rate of 95% confidence intervals; FO: fully observed; CC: Complete Cases; CE: Calibration estimator; PMI: Parametric Multiple Imputation; NNMI (NN, ω₁, ω₂; ω₃): the NNMI method using Multinomial Logistic Regressions, where NN is the number of nearest neighbors and the weights are ω₁, ω₂, and ω₃; NNMI : the NNMI method using Cumulative Logistic Regressions; K = 10 imputed datasets are used for PMI and NNMI methods
Simulation results from probability estimation for Y, where Y is generated using a probit link function with five covariates and δ is generated using a probit link function with non-extreme missingness probabilities (M1) based on five covariates; N = 400

| Method | Est | SD | SE | CR | Est | SD | SE | CR |
|---|---|---|---|---|---|---|---|---|
| FO | 0.298 | 0.023 | 0.023 | 0.952 | 0.249 | 0.021 | 0.022 | 0.974 |
| CC | 0.328 | 0.032 | 0.033 | 0.862 | 0.323 | 0.033 | 0.033 | 0.406 |
| Working models for Y | Five covariates with logit link (misspecified scenario 5) | | | | | | | |
| Working models for δ | Five covariates with logit link (misspecified scenario 5) | | | | | | | |
| CE | 0.295 | 0.068 | 0.058 | 0.956 | 0.218 | 0.049 | 0.051 | 0.926 |
| PMI | 0.316 | 0.038 | 0.038 | 0.912 | 0.294 | 0.038 | 0.038 | 0.800 |
| | 0.310 | 0.039 | 0.040 | 0.940 | 0.275 | 0.036 | 0.039 | 0.930 |
| | 0.314 | 0.041 | 0.041 | 0.934 | 0.274 | 0.037 | 0.038 | 0.924 |
| | 0.309 | 0.040 | 0.040 | 0.924 | 0.276 | 0.038 | 0.038 | 0.914 |
| | 0.308 | 0.040 | 0.040 | 0.936 | 0.279 | 0.037 | 0.038 | 0.924 |
| | 0.305 | 0.039 | 0.040 | 0.930 | 0.279 | 0.037 | 0.039 | 0.914 |
| | 0.310 | 0.040 | 0.040 | 0.920 | 0.276 | 0.037 | 0.038 | 0.924 |

Est: Estimates of probabilities; SD: Empirical standard deviation; SE: Estimate of standard error; CR: Coverage rate of 95% confidence intervals; FO: fully observed; CC: Complete Cases; CE: Calibration estimator; PMI: Parametric Multiple Imputation; NNMI (NN, ω₁, ω₂; ω₃): the NNMI method using Multinomial Logistic Regressions, where NN is the number of nearest neighbors and the weights are ω₁, ω₂, and ω₃; NNMI : the NNMI method using Cumulative Logistic Regressions; K = 10 imputed datasets are used for PMI and NNMI methods
2013 BRFSS Survey Data: Estimated probabilities of satisfaction with health care received among Hispanic participants who were unable to work and had an annual household income below $15,000; N = 1430 (overall missing rate = 25.4%)

| Method | Est (SE) | 95% CI | Est (SE) | 95% CI |
|---|---|---|---|---|
| CC | 0.585 (0.015) | (0.555, 0.614) | 0.335 (0.014) | (0.306, 0.363) |
| CE | 0.553 (0.016) | (0.521, 0.584) | 0.349 (0.016) | (0.319, 0.380) |
| PMI | 0.552 (0.014) | (0.524, 0.581) | 0.345 (0.014) | (0.318, 0.372) |
| | 0.560 (0.019) | (0.522, 0.598) | 0.353 (0.020) | (0.314, 0.392) |
| | 0.556 (0.019) | (0.519, 0.592) | 0.351 (0.021) | (0.310, 0.391) |
| | 0.550 (0.022) | (0.507, 0.594) | 0.359 (0.019) | (0.322, 0.396) |
| | 0.547 (0.021) | (0.506, 0.588) | 0.358 (0.017) | (0.324, 0.392) |
| | 0.559 (0.016) | (0.528, 0.590) | 0.352 (0.016) | (0.320, 0.383) |
| | 0.555 (0.018) | (0.520, 0.590) | 0.350 (0.019) | (0.314, 0.387) |

Est: Estimates of probabilities; SE: Estimate of standard error; 95% CI: 95% confidence interval
X: covariates used in the working models, namely gender, general health, education level, having health care coverage, and having delayed getting medical care
CC: Complete Cases; CE: Calibration estimator; PMI: Parametric Multiple Imputation; NNMI (NN, ω₁, ω₂; ω₃): the NNMI method using Multinomial Logistic Regressions, where NN is the number of nearest neighbors and the weights are ω₁, ω₂, and ω₃; NNMI : the NNMI method using Cumulative Logistic Regressions; K = 10 imputed datasets are used for PMI and NNMI methods
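The Est (SE) and 95% CI columns of the BRFSS table are related in the usual way. As a rough check, a normal-approximation (Wald) interval can be computed from an estimate and its standard error as sketched below. The source does not state which interval formula was used, so the z = 1.96 multiplier and the function name `wald_ci` are assumptions; small differences from the tabulated limits are expected because the table reports rounded values.

```python
def wald_ci(estimate, std_error, z=1.96):
    """Normal-approximation confidence interval: estimate +/- z * std_error."""
    if std_error < 0:
        raise ValueError("std_error must be non-negative")
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


if __name__ == "__main__":
    # Complete-case (CC) row, first outcome column: Est = 0.585, SE = 0.015.
    low, high = wald_ci(0.585, 0.015)
    print(f"({low:.4f}, {high:.4f})")
```

For the CC row this gives limits within about 0.001 of the reported (0.555, 0.614), which is consistent with rounding of Est and SE.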