Faisal Maqbool Zahid, Shahla Faisal, Christian Heumann.
Abstract
Multiple imputation (MI) is challenging in high-dimensional settings. An imputation model built on a selected subset of predictors can be incompatible with the analysis model, leading to inconsistent and biased estimates. Although full compatibility may not be achievable in such cases, consistent and unbiased estimates can still be obtained using a semi-compatible imputation model. We propose relaxing the lasso penalty to select a large set of variables (at most n). The substantive model, which also uses some formal variable selection procedure in high-dimensional structures, is then expected to be nested in this imputation model, so the resulting imputation model will be semi-compatible with high probability. The likelihood estimates can be unstable and can face convergence issues as the number of variables approaches the sample size. To address these issues, we further propose using a ridge penalty to obtain the posterior distribution of the parameters based on the observed data. The proposed technique is compared with standard MI software and with MI techniques available for high-dimensional data in simulation studies and on a real-life dataset. Our results show the superiority of the proposed approach over existing MI approaches while addressing the compatibility issue.
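The two-step idea in the abstract (a weakly penalized lasso to select a large predictor set, then a ridge-penalized fit from which parameters are drawn) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single incomplete continuous variable, centered data without an intercept, a plain coordinate-descent lasso, and a normal approximation to the ridge posterior. The names `lasso_cd` and `impute_once` and the penalty values `lam` and `ridge` are hypothetical.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent lasso for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    resid = y.astype(float)
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]      # remove predictor j from the fit
            rho = X[:, j] @ resid
            # soft-thresholding update for coordinate j
            beta[j] = np.sign(rho) * max(abs(rho) - n * lam, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]      # add it back with the new value
    return beta

def impute_once(X, y, lam=0.01, ridge=1.0, rng=None):
    """One stochastic imputation of the NaN entries of y (illustrative only)."""
    rng = np.random.default_rng() if rng is None else rng
    obs = ~np.isnan(y)
    Xo, yo = X[obs], y[obs]
    n_obs = int(obs.sum())

    # Step 1: a small (relaxed) lasso penalty keeps a large active set,
    # capped at the number of observed cases ("at most n").
    beta_l = lasso_cd(Xo, yo, lam)
    active = np.flatnonzero(beta_l)
    active = active[np.argsort(-np.abs(beta_l[active]))[:n_obs]]
    Xs = Xo[:, active]
    k = active.size

    # Step 2: ridge fit on the selected predictors (assumed ridge penalty).
    A = Xs.T @ Xs + ridge * np.eye(k)
    beta_hat = np.linalg.solve(A, Xs.T @ yo)
    resid = yo - Xs @ beta_hat
    sigma2 = resid @ resid / max(n_obs - k, 1)

    # Draw beta* ~ N(beta_hat, sigma2 * A^{-1}) via the Cholesky factor of A.
    L = np.linalg.cholesky(A)
    z = rng.standard_normal(k)
    beta_star = beta_hat + np.sqrt(sigma2) * np.linalg.solve(L.T, z)

    # Fill missing values with the linear prediction plus residual noise.
    y_imp = y.astype(float)
    n_mis = int((~obs).sum())
    y_imp[~obs] = X[~obs][:, active] @ beta_star + rng.normal(0.0, np.sqrt(sigma2), n_mis)
    return y_imp, active
```

Calling `impute_once` several times with different random draws would yield the multiple completed datasets that MI requires. The ridge term keeps the linear system well conditioned when the number of selected predictors is close to the number of observed cases, which is the instability the abstract describes for the unpenalized likelihood fit.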
Year: 2021 PMID: 34237092 PMCID: PMC8266107 DOI: 10.1371/journal.pone.0254112
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Simulation study: Mean Squared Imputation Error (MSIE).
| p | miss% | mice | VIM | durr | MI1 (MLE) | MI4 (Ridge) | MI2 (MLE) | MI5 (Ridge) | MI3 (MLE) | MI6 (Ridge) |
|---|---|---|---|---|---|---|---|---|---|---|
| 50 | 10 | 73.72 | 69.93 | 123.95 | 45.87 | 45.89 | 65.52 | 52.38 | 69.14 | 53.89 |
| | 20 | 163.31 | 139.75 | 248.27 | 94.8 | 94.02 | 131.2 | 105.59 | 154.23 | 112.87 |
| | 30 | 283.89 | 211.09 | 364.86 | 145.43 | 143.98 | 194.2 | 158.82 | 265.55 | 181.44 |
| 200 | 10 | 127.45 | 129.83 | 115.71 | 44.28 | 43.91 | 55.48 | 48.93 | - | 59.65 |
| | 20 | 229.27 | 274.68 | 235.88 | 91.53 | 90.06 | 112.06 | 100.48 | - | 121.18 |
| | 30 | 311.01 | 429.53 | 354.47 | 139.23 | 137.67 | 168.09 | 152.47 | - | 183.51 |
| 500 | 10 | 86.8 | 131.26 | 113.9 | 45.08 | 44.58 | 52.38 | 49.36 | - | 56.24 |
| | 20 | 169.91 | 274.23 | 231.59 | 93.3 | 92.45 | 106.78 | 100.98 | - | 114.96 |
| | 30 | 249 | 424.35 | 343.41 | 142.68 | 140.73 | 160.48 | 153.04 | - | 173.99 |
| Results for | | | | | | | | | | |
| 30 | 10 | 59.74 | 73.86 | 139.53 | 50.86 | 50.98 | 59.24 | 53.88 | 59.24 | 53.88 |
| | 20 | 129.21 | 159.5 | 268.24 | 104.35 | 104.28 | 126.9 | 112.45 | 126.9 | 112.45 |
| | 30 | 207.76 | 258.92 | 400.2 | 163.34 | 162.61 | 203.48 | 178.65 | 203.48 | 178.65 |
| 60 | 10 | 88.83 | 80.23 | 135.95 | 47.33 | 46.99 | 64.47 | 53.15 | 81.78 | 59.46 |
| | 20 | 210.88 | 161.02 | 259.43 | 96.71 | 96.3 | 130.04 | 107.02 | 194.23 | 132.38 |
| | 30 | 402.2 | 248.81 | 382.58 | 149.07 | 147.1 | 194.41 | 163.6 | - | 179.47 |
| Results for | | | | | | | | | | |
| 30 | 10 | 100.77 | 95.88 | 121.26 | 91.86 | 91.41 | 99.69 | 101.23 | 100.37 | 101.7 |
| | 20 | 210.22 | 192.5 | 242.44 | 187.13 | 187.95 | 210.75 | 210.92 | 210.33 | 211 |
| | 30 | 330.6 | 290.03 | 362.39 | 289.66 | 286.57 | 333.75 | 336.99 | 334.01 | 335.24 |
| 60 | 10 | 136.3 | 111.71 | 115.46 | 92.12 | 91.36 | 117.23 | 110.2 | 137.23 | 131.75 |
| | 20 | 302.19 | 213.12 | 231.57 | 186.56 | 184.87 | 232.53 | 216.71 | 299.13 | 279.77 |
| | 30 | 523.83 | 298.49 | 348.13 | 282.69 | 280.09 | 342.4 | 322.41 | 438.96 | 415.76 |
Fig 1. Simulation study: Box plots of MSIE.
White boxes represent the threshold MI methods, i.e., mice, VIM, and durr. Blue and green boxes represent the MLE fit and the ridge fit to the selected imputation model, respectively.
Simulation study: Results of MSE().
| p | miss% | mice | VIM | durr | MI1 (MLE) | MI4 (Ridge) | MI2 (MLE) | MI5 (Ridge) | MI3 (MLE) | MI6 (Ridge) |
|---|---|---|---|---|---|---|---|---|---|---|
| 50 | 10 | 1.87 | 2.2 | 2.3 | 1.7 | 1.7 | 1.82 | 1.71 | 1.84 | 1.73 |
| | 20 | 2.35 | 2.96 | 2.82 | 1.81 | 1.82 | 2.12 | 1.84 | 2.32 | 1.86 |
| | 30 | 3.79 | 5.96 | 3.1 | 1.97 | 1.96 | 2.41 | 2.01 | 3.86 | 2.06 |
| 200 | 10 | 6.2 | 6.6 | 6.43 | 5.85 | 5.89 | 5.95 | 5.88 | - | 5.87 |
| | 20 | 6.45 | 7.26 | 6.87 | 5.98 | 5.96 | 6.08 | 5.94 | - | 5.97 |
| | 30 | 6.78 | 7.87 | 7.19 | 6.27 | 6.26 | 6.29 | 6.22 | - | 6.24 |
| 500 | 10 | 6.56 | 6.95 | 6.83 | 6.22 | 6.23 | 6.29 | 6.28 | - | 6.36 |
| | 20 | 6.99 | 7.33 | 7.21 | 6.45 | 6.45 | 6.5 | 6.44 | - | 6.46 |
| | 30 | 7.19 | 7.88 | 7.46 | 6.64 | 6.59 | 6.64 | 6.59 | - | 6.65 |
| Results for | | | | | | | | | | |
| 30 | 10 | 181.42 | 210.11 | 184.48 | 164.99 | 163.33 | 176.19 | 159.73 | 176.19 | 159.73 |
| | 20 | 192.73 | 238.74 | 182.46 | 163.12 | 159.34 | 181.66 | 156.93 | 181.66 | 156.93 |
| | 30 | 208.73 | 275.4 | 181.47 | 162.24 | 160.66 | 189.74 | 151.49 | 189.74 | 151.49 |
| 60 | 10 | 835.56 | 859.77 | 726.65 | 732.99 | 724.37 | 804.63 | 717.01 | 841.87 | 704.99 |
| | 20 | 1020.01 | 1042.21 | 711.25 | 717.04 | 704.13 | 817.36 | 679.49 | 1024.78 | 633.56 |
| | 30 | 1221.6 | 1327.28 | 705.15 | 690.36 | 679.86 | 819.15 | 652.56 | - | 626.07 |
| Results for | | | | | | | | | | |
| 30 | 10 | 14.99 | 15.81 | 14.49 | 13.02 | 12.75 | 14.49 | 12.68 | 14.49 | 12.68 |
| | 20 | 18.04 | 20.51 | 16.53 | 14.09 | 13.64 | 16.29 | 13.25 | 16.29 | 13.25 |
| | 30 | 22.08 | 29.8 | 18.23 | 15.68 | 14.7 | 18.87 | 14.03 | 18.87 | 14.03 |
| 60 | 10 | 95.24 | 100.65 | 87.25 | 84.04 | 83.74 | 87.07 | 80.95 | 93.89 | 79.59 |
| | 20 | 120.72 | 133.98 | 91.95 | 86.95 | 86.35 | 94.87 | 79.7 | 118.29 | 75.24 |
| | 30 | 150.9 | 200.57 | 95.2 | 88 | 85.47 | 97.94 | 81.28 | 127.56 | 74.54 |
Fig 2. Simulation study: Box plots of MSE() for 30% missing values.
White boxes represent the threshold MI methods, i.e., mice, VIM, and durr. Blue and green boxes represent the results based on data imputed using the MLE fit and the ridge fit to the selected imputation model, respectively.
Average processing time (in seconds, averaged over S = 200 elapsed-time values) required by different algorithms to impute one dataset with 10% missing values.
| p | mice | VIM | durr | MI1 (MLE) | MI2 (MLE) | MI3 (MLE) | MI4 (Ridge) | MI5 (Ridge) | MI6 (Ridge) |
|---|---|---|---|---|---|---|---|---|---|
| 50 | 12.02 | 6.61 | 255.25 | 72.23 | 128.94 | 130.28 | 118.76 | 149.47 | 150.34 |
| 200 | 822.99 | 28.99 | 213.02 | 99.06 | 179.90 | - | 138.76 | 198.21 | 187.06 |
| 500 | 24016.82 | 123.24 | 140.40 | 144.27 | 221.05 | - | 168.06 | 209.66 | 243.42 |
Heart data: MSE and its decomposition into variance and squared-bias components.
The results are obtained when a ridge regression was fitted to the data imputed with all proposed and existing imputation methods.
| Method | MSE (10% missing) | var (10% missing) | bias² (10% missing) | MSE (20% missing) | var (20% missing) | bias² (20% missing) | MSE (30% missing) | var (30% missing) | bias² (30% missing) |
|---|---|---|---|---|---|---|---|---|---|
| mice | 0.68 | 0.56 | 0.12 | 0.77 | 0.59 | 0.18 | 0.87 | 0.61 | 0.26 |
| VIM | 0.81 | 0.69 | 0.12 | 0.91 | 0.75 | 0.15 | 1.08 | 0.88 | 0.21 |
| durr | 0.68 | 0.50 | 0.19 | 0.74 | 0.43 | 0.31 | 0.84 | 0.35 | 0.49 |
| MI1 | 0.65 | 0.53 | 0.11 | 0.71 | 0.54 | 0.17 | 0.81 | 0.58 | 0.23 |
| MI2 | 0.68 | 0.56 | 0.12 | 0.81 | 0.65 | 0.16 | 1.05 | 0.83 | 0.23 |
| MI3 | 0.65 | 0.52 | 0.13 | 0.74 | 0.56 | 0.18 | 0.93 | 0.68 | 0.25 |
| MI4 | 0.63 | 0.50 | 0.13 | 0.67 | 0.48 | 0.19 | 0.73 | 0.48 | 0.25 |
| MI5 | 0.58 | 0.42 | 0.16 | 0.62 | 0.38 | 0.24 | 0.67 | 0.36 | 0.31 |
| MI6 | 0.56 | 0.39 | 0.17 | 0.59 | 0.30 | 0.29 | 0.66 | 0.26 | 0.40 |
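The rows of the heart data table satisfy MSE ≈ variance + squared bias up to rounding (for mice at 10% missing, 0.56 + 0.12 = 0.68). The following is a minimal sketch of that decomposition for a set of replicate coefficient estimates. How the reference ("true") coefficients were defined for the heart data is not stated in this record, so `truth` and the function name `mse_decomposition` are assumptions made for illustration.

```python
import numpy as np

def mse_decomposition(estimates, truth):
    """Split the MSE of replicate estimates into variance and squared bias.

    estimates: array of shape (replicates, coefficients)
    truth:     array of shape (coefficients,), assumed reference values
    Returns (mse, var, bias2), each summed over coefficients.
    """
    estimates = np.asarray(estimates, dtype=float)
    truth = np.asarray(truth, dtype=float)
    mean_est = estimates.mean(axis=0)
    # Variance of the estimates around their own mean (population form).
    var = ((estimates - mean_est) ** 2).mean(axis=0).sum()
    # Squared distance of the mean estimate from the reference values.
    bias2 = ((mean_est - truth) ** 2).sum()
    # Mean squared distance of each estimate from the reference values.
    mse = ((estimates - truth) ** 2).mean(axis=0).sum()
    return mse, var, bias2
```

With the population form of the variance, `mse` equals `var + bias2` exactly, which is why the three columns in each missingness block add up apart from rounding.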