Qiyuan Pan, Rong Wei.
Abstract
Multiple imputation (MI) has become the most popular approach in handling missing data. Closely associated with MI, the fraction of missing information (FMI) is an important parameter for diagnosing the impact of missing data. Currently γm, the sample value of FMI estimated from MI of a limited m, is used as the estimate of γ0, the population value of FMI, where m is the number of imputations of the MI. This FMI estimation method, however, has never been adequately justified and evaluated. In this paper, we quantitatively demonstrated that E(γm) decreases with the increase of m so that E(γm) > γ0 for any finite m. As a result, γm would inevitably overestimate γ0. Three improved FMI estimation methods were proposed. The major conclusions were substantiated by the results of the MI trials using the data of the 2012 Physician Workflow Mail Survey of the National Ambulatory Medical Care Survey, USA.
Keywords: Applied Mathematics; Mathematics & Statistics; Mathematics for Biology & Medicine; National Ambulatory Medical Care Survey; Science; fraction of missing information; missing data; multiple imputation; number of imputations
Year: 2018 PMID: 30899916 PMCID: PMC6423960 DOI: 10.1080/25742558.2018.1551504
Source DB: PubMed Journal: Cogent Math Stat
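The abstract describes γm as the sample FMI obtained from m imputations. As a minimal sketch of how such a value is commonly computed, the function below pools per-imputation results with the standard Rubin-style formulas. The paper's own equations are not reproduced in this record, so the formula, the function name `fmi_from_imputations`, and the input values are assumptions made for illustration only.

```python
from statistics import mean, variance


def fmi_from_imputations(estimates, within_variances):
    """Return (B, U, gamma_m) pooled over m imputed data sets.

    Assumption: standard Rubin-style pooling, which may differ in
    detail from the equations used in the paper.
    """
    m = len(estimates)
    if m < 2 or m != len(within_variances):
        raise ValueError("need m >= 2 estimates with matching variances")
    b = variance(estimates)          # between-imputation variance (m - 1 denominator)
    u = mean(within_variances)       # average within-imputation variance
    if b == 0:
        return b, u, 0.0             # no between-imputation spread: FMI is zero
    r = (1 + 1 / m) * b / u          # relative increase in variance
    nu = (m - 1) * (1 + 1 / r) ** 2  # degrees of freedom
    gamma_m = (r + 2 / (nu + 3)) / (r + 1)
    return b, u, gamma_m


# Hypothetical point estimates and variances from m = 5 imputed data sets.
b, u, gamma_m = fmi_from_imputations(
    [10.0, 10.4, 9.8, 10.2, 10.1],
    [0.50, 0.52, 0.49, 0.51, 0.48],
)
print(f"B = {b:.4f}, U = {u:.4f}, gamma_m = {gamma_m:.4f}")
```

The term 2/(ν + 3) is always positive and the factor (1 + 1/m) exceeds 1 for finite m, which is one way to see why a γm computed like this tends to sit above its large-m limit.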
Figure 1. The m–E(γm) relationship curve at γ0 = 0.2 and 0.15 as determined by Equation (14).
Changes of E(γm), Dγ, and R with the increase of m at different γ0 levels, where Dγ = 100(E(γm) − γ0)/γ0 and R = 100(γm − γm+1)/γm+1, with γm+1 denoting the value at m + 1 imputations
| E(γm), γ0 = 0.2 | E(γm), γ0 = 0.01 | R, γ0 = 0.2 | R, γ0 = 0.01 | Dγ, γ0 = 0.2 | Dγ, γ0 = 0.01 | |
|---|---|---|---|---|---|---|
| 0.361 | 0.01536 | 23.33 | 14.12 | 80.59 | 53.64 | |
| 0.250 | 0.01205 | 3.872 | 2.957 | 25.23 | 20.47 | |
| 0.224 | 0.01102 | 1.0240 | 0.8502 | 11.84 | 10.16 | |
| 0.212 | 0.01051 | 0.2664 | 0.2306 | 5.750 | 5.062 | |
| 0.206 | 0.01025 | 0.0682 | 0.0602 | 2.837 | 2.528 | |
| 0.204 | 0.01017 | 0.0306 | 0.0272 | 1.883 | 1.684 | |
| 0.202 | 0.01010 | 0.0111 | 0.0099 | 1.126 | 1.010 | |
| 0.201 | 0.01005 | 0.0028 | 0.0025 | 0.561 | 0.505 | |
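The quantities in this table can be explored numerically. The sketch below is not the paper's Equation (14), which is not shown in this record. It assumes the standard Rubin-style FMI formula evaluated at the population ratio B0/U0, with γ0 = B0/(B0 + U0), and it reads R as the percent drop in E(γm) when m grows by one. The m values are missing from the extracted table; m = 2 and m = 5 are used here as assumptions because, under this formula, they give about 0.361 and 0.250 at γ0 = 0.2, in line with the first two rows.

```python
def expected_fmi(gamma0, m):
    """Approximate E(gamma_m) for a given gamma0 and m (assumed formula)."""
    ratio = gamma0 / (1 - gamma0)    # B0/U0 implied by gamma0 = B0 / (B0 + U0)
    r = (1 + 1 / m) * ratio          # relative increase in variance at finite m
    nu = (m - 1) * (1 + 1 / r) ** 2  # degrees of freedom
    return (r + 2 / (nu + 3)) / (r + 1)


def d_gamma(gamma0, m):
    """Percent overestimation: 100 * (E(gamma_m) - gamma0) / gamma0."""
    return 100 * (expected_fmi(gamma0, m) - gamma0) / gamma0


def r_change(gamma0, m):
    """Percent drop in E(gamma_m) from m to m + 1 (assumed reading of R)."""
    nxt = expected_fmi(gamma0, m + 1)
    return 100 * (expected_fmi(gamma0, m) - nxt) / nxt


# Assumed m values; compare the printed numbers with the first two table rows.
for m in (2, 5):
    for gamma0 in (0.2, 0.01):
        print(m, gamma0,
              round(expected_fmi(gamma0, m), 5),
              round(d_gamma(gamma0, m), 2),
              round(r_change(gamma0, m), 3))
```

Because both the inflation factor (1 + 1/m) and the correction 2/(ν + 3) shrink as m grows, the computed value approaches γ0 from above, which is the pattern the table reports.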
Figure 2. Effects of γ0 levels on Dγ as defined by Equation (15) and RDγ as defined by Equation (16): a. Dγ at m = 5; b. Dγ at m = 2.
Figure 3. Effects of m on γm at δ = 29% for analytic model = Anal-2: a. MI model = MI-1; b. MI model = MI-2.
Coefficients of variation (%) of B, U, and γ for SIZE100
| m | B (MI-1, Anal-1) | U (MI-1, Anal-1) | γ (MI-1, Anal-1) | B (MI-2, Anal-2) | U (MI-2, Anal-2) | γ (MI-2, Anal-2) |
|---|---|---|---|---|---|---|
| 3 | 20.24 | 0.185 | 20.19 | 1.52 | 0.0553 | 1.50 |
| 5 | 11.12 | 0.179 | 11.10 | 1.00 | 0.0200 | 1.01 |
| 10 | 7.75 | 0.135 | 7.75 | 0.88 | 0.0353 | 0.91 |
| 20 | 5.91 | 0.098 | 5.89 | 0.49 | 0.0282 | 0.48 |
| 30 | 6.24 | 0.077 | 6.26 | 0.24 | 0.0150 | 0.24 |
| 40 | 3.60 | 0.048 | 3.59 | 0.61 | 0.0180 | 0.59 |
| 60 | 2.74 | 0.058 | 2.73 | 0.39 | 0.0091 | 0.39 |
| 80 | 2.48 | 0.041 | 2.47 | 0.17 | 0.0062 | 0.17 |
| 100 | 2.45 | 0.037 | 2.45 | 0.15 | 0.0081 | 0.16 |
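For readers who want to compute a coefficient of variation of this kind from their own replicated MI runs, here is a minimal sketch. It assumes the usual definition, 100 × sample standard deviation / mean, taken across replicates at a fixed m. The record does not state which standard deviation the authors used, and the replicate values below are hypothetical.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation in percent: 100 * sample SD / mean."""
    return 100 * stdev(values) / mean(values)


# Hypothetical gamma_m values from four replicated MI runs at the same m.
replicates = [0.0027, 0.0029, 0.0026, 0.0028]
print(f"CV = {cv_percent(replicates):.2f}%")
```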
Comparison of different γ0 estimation methods for SIZE20 with imputation model = MI-2 and analytic model = Anal-2 in the PWS12 MI trials. The "Best" value was calculated by Equation (19), using the ratio of the means of the 30 replicates of B100 and U100 as the estimate of B0/U0
| m | Control | Improved | Improved | |
|---|---|---|---|---|
| 3 | 0.00379 | 0.00283 | 0.00284 | |
| 5 | 0.00318 | 0.00265 | 0.00265 | |
| 10 | 0.00286 | 0.00260 | 0.00260 | |
| 20 | 0.00293 | 0.00279 | 0.00279 | |
| 30 | 0.00278 | 0.00269 | 0.00269 | |
| 40 | 0.00273 | 0.00267 | 0.00267 | |
| 60 | 0.00277 | 0.00273 | 0.00273 | |
| 80 | 0.00272 | 0.00269 | 0.00269 | |
| 100 | 0.00270 | 0.00270 | 0.00267 | 0.00267 |
| (∞) | Best | |||
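The "Best" reference value in this table is described as coming from the means of replicated B100 and U100 values. The sketch below shows one way such a value could be formed. Equation (19) is not reproduced in this record, so the code assumes the large-m limit of the standard FMI formula, γ0 = (B0/U0)/(1 + B0/U0). The function name `best_gamma0` and the replicate values are hypothetical.

```python
from statistics import mean


def best_gamma0(b_replicates, u_replicates):
    """Estimate gamma0 from replicated B and U values.

    Assumption: gamma0 = ratio / (1 + ratio), where ratio is the mean of
    the B replicates divided by the mean of the U replicates.
    """
    ratio = mean(b_replicates) / mean(u_replicates)  # estimate of B0/U0
    return ratio / (1 + ratio)


# Hypothetical replicate values, standing in for the B100 and U100 replicates.
b_reps = [0.0021, 0.0019, 0.0020]
u_reps = [0.7400, 0.7420, 0.7380]
print(f"gamma0 estimate = {best_gamma0(b_reps, u_reps):.5f}")
```

Averaging B and U over replicates before taking the ratio reduces the sampling noise in B, which the coefficient-of-variation results suggest is the dominant source of variability in γm.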