Nicole Solomon1, Yuliya Lokhnygina1,2, Susan Halabi1,3.
Abstract
INTRODUCTION: Missing data are inevitable in medical research, and appropriate handling of missing data is critical for statistical estimation and inference. Imputation is often employed to maximize the amount of data available for statistical analysis and is preferred over complete case analysis, which typically yields biased results. This article examines several types of regression imputation of missing covariates in the prediction of time-to-event outcomes subject to right censoring.
Keywords: Missing data; proportional hazards model; regression imputation
Year: 2020 PMID: 33948262 PMCID: PMC8057424 DOI: 10.1017/cts.2020.533
Source DB: PubMed Journal: J Clin Transl Sci ISSN: 2059-8661
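As a rough illustration of what regression imputation of a missing covariate means, the sketch below fits a least-squares line on the complete cases and fills each missing value with its prediction. It is a minimal stand-in under stated assumptions, not the article's procedure: the function names, the single-predictor setup, and the toy values are all hypothetical, and the methods compared in the article (GLM, LASSO, MARS, SVM, RF) would replace the simple line fit.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def regression_impute(predictor, target):
    """Fill None entries of `target` with predictions from `predictor`.

    The model is fit on complete cases only; observed values are kept as-is.
    """
    observed = [(x, y) for x, y in zip(predictor, target) if y is not None]
    intercept, slope = fit_simple_linear(
        [x for x, _ in observed], [y for _, y in observed]
    )
    return [
        y if y is not None else intercept + slope * x
        for x, y in zip(predictor, target)
    ]


# Hypothetical toy data: a fully observed covariate and one with a gap.
fully_observed = [50, 60, 70, 80, 65]
partly_missing = [10.0, 12.0, 14.0, 16.0, None]

completed = regression_impute(fully_observed, partly_missing)
print(completed)
```

The completed covariate could then be passed to a proportional hazards model in place of the partly missing one.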
Distribution of chosen baseline covariates from the TROPIC trial
| Discrete variables | % No (0) | % Yes (1) | % (2)* | % Missing |
|---|---|---|---|---|
| ARM | 0.5 | 0.5 | | 0.0 |
| ECOG | 0.34 | 0.58 | 0.08 | 0.0 |
| PROG | 0.11 | 0.89 | | 0.0 |
| CHEMO | 0.68 | 0.32 | | 0.0 |
| PAIN | 0.49 | 0.51 | | 16.2 |
| WHITE | 0.16 | 0.84 | | 0.0 |
| MEAS_DIS | 0.47 | 0.53 | | 0.0 |
Parameter settings for simulation studies
| Parameter | Levels | ||
|---|---|---|---|
| N | 200 | 500 | 1000 |
| C | 10% | 30% | |
| M | 5% | 10% | 15% |
N, sample size; C, censoring percentage; M, missing percentage.
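The parameter levels above can be crossed into a full grid of scenarios. The sketch below enumerates that grid and derives the expected number of events per scenario as N × (1 − C), which reproduces the event counts used to label the later tables. The variable and function names are illustrative assumptions, and a full factorial design is assumed.

```python
from itertools import product

SAMPLE_SIZES = [200, 500, 1000]   # N
CENSORING = [0.10, 0.30]          # C, proportion censored
MISSING = [0.05, 0.10, 0.15]      # M, proportion missing


def expected_events(n, censoring):
    """Expected number of observed events when a fraction is censored."""
    return round(n * (1 - censoring))


# One dictionary per combination of N, C and M (assumed full factorial).
scenarios = [
    {"N": n, "C": c, "M": m, "events": expected_events(n, c)}
    for n, c, m in product(SAMPLE_SIZES, CENSORING, MISSING)
]

event_counts = sorted({s["events"] for s in scenarios})
print(len(scenarios))   # 18
print(event_counts)     # [140, 180, 350, 450, 700, 900]
```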
Relative rank of regression imputation methods by simulation scenario (number of events)
| Dataset | Statistic | GLM | LASSO | MARS | SVM | RF |
|---|---|---|---|---|---|---|
| 140 events | Bias | 5 | 2 | 3 | 1 | 4 |
| | MSE | 4 | 1 | 5 | 2 | 3 |
| | MSPE | 4 | 1 | 3 | 2 | 5 |
| | MAD | 4 | 1 | 3 | 2 | 5 |
| | mPCOV | 3 | 5 | 4 | 1 | 2 |
| 180 events | Bias | 4 | 1 | 5 | 2 | 3 |
| | MSE | 4 | 1 | 5 | 3 | 2 |
| | MSPE | 3 | 1 | 4 | 2 | 5 |
| | MAD | 4 | 1 | 3 | 2 | 5 |
| | mPCOV | 3 | 2 | 4 | 1 | 5 |
| 350 events | Bias | 4 | 1 | 5 | 3 | 2 |
| | MSE | 4 | 1 | 5 | 2 | 3 |
| | MSPE | 4 | 1 | 3 | 2 | 5 |
| | MAD | 4 | 1 | 3 | 2 | 5 |
| | mPCOV | 1 | 5 | 4 | 3 | 2 |
| 450 events | Bias | 5 | 1 | 3 | 2 | 4 |
| | MSE | 4 | 3 | 5 | 1 | 2 |
| | MSPE | 3 | 1 | 4 | 2 | 5 |
| | MAD | 5 | 1 | 3 | 2 | 4 |
| | mPCOV | 3 | 5 | 4 | 2 | 1 |
| 700 events | Bias | 3 | 1 | 5 | 4 | 2 |
| | MSE | 4 | 2 | 5 | 1 | 3 |
| | MSPE | 4 | 1 | 3 | 2 | 5 |
| | MAD | 4 | 1 | 3 | 2 | 5 |
| | mPCOV | 1 | 5 | 4 | 3 | 2 |
| 900 events | Bias | 4 | 1 | 5 | 3 | 2 |
| | MSE | 4 | 1 | 5 | 2 | 3 |
| | MSPE | 4 | 1 | 3 | 2 | 5 |
| | MAD | 4 | 1 | 3 | 2 | 5 |
| | mPCOV | 2 | 5 | 4 | 3 | 1 |
| (Freq. in Top 2)/30 | | 3 | 24 | 0 | 23 | 10 |
| (Freq. in Bottom 2)/30 | | 21 | 5 | 18 | 1 | 15 |
| (Freq. in Bottom 3)/30 | | 27 | 6 | 30 | 7 | 20 |
MSE, mean squared error; MSPE, mean squared prediction error; MAD, median absolute deviation; mPCOV, minimum 95% probability coverage.
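The frequency rows at the foot of the rank table can be recomputed directly from the individual ranks. The sketch below transcribes the 30 rank rows (six scenarios by five statistics) and tallies how often each method lands in the top two or bottom two positions. The data structure and helper name are illustrative assumptions; the rank values are copied from the table.

```python
METHODS = ["GLM", "LASSO", "MARS", "SVM", "RF"]

# Ranks per scenario (number of events), in the order
# Bias, MSE, MSPE, MAD, mPCOV; columns follow METHODS; 1 = best, 5 = worst.
RANKS = {
    140: [[5, 2, 3, 1, 4], [4, 1, 5, 2, 3], [4, 1, 3, 2, 5],
          [4, 1, 3, 2, 5], [3, 5, 4, 1, 2]],
    180: [[4, 1, 5, 2, 3], [4, 1, 5, 3, 2], [3, 1, 4, 2, 5],
          [4, 1, 3, 2, 5], [3, 2, 4, 1, 5]],
    350: [[4, 1, 5, 3, 2], [4, 1, 5, 2, 3], [4, 1, 3, 2, 5],
          [4, 1, 3, 2, 5], [1, 5, 4, 3, 2]],
    450: [[5, 1, 3, 2, 4], [4, 3, 5, 1, 2], [3, 1, 4, 2, 5],
          [5, 1, 3, 2, 4], [3, 5, 4, 2, 1]],
    700: [[3, 1, 5, 4, 2], [4, 2, 5, 1, 3], [4, 1, 3, 2, 5],
          [4, 1, 3, 2, 5], [1, 5, 4, 3, 2]],
    900: [[4, 1, 5, 3, 2], [4, 1, 5, 2, 3], [4, 1, 3, 2, 5],
          [4, 1, 3, 2, 5], [2, 5, 4, 3, 1]],
}


def tally(predicate):
    """Count, per method, the rank rows whose rank satisfies `predicate`."""
    counts = dict.fromkeys(METHODS, 0)
    for rows in RANKS.values():
        for row in rows:
            for method, rank in zip(METHODS, row):
                if predicate(rank):
                    counts[method] += 1
    return counts


top2 = tally(lambda r: r <= 2)
bottom2 = tally(lambda r: r >= 4)
print(top2)     # {'GLM': 3, 'LASSO': 24, 'MARS': 0, 'SVM': 23, 'RF': 10}
print(bottom2)  # {'GLM': 21, 'LASSO': 5, 'MARS': 18, 'SVM': 1, 'RF': 15}
```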
Summary statistics of simulations by regression imputation method and percent missing
| % Missing | 5% | | | | | 10% | | | | | 15% | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Statistic | GLM | LASSO | MARS | SVM | RF | GLM | LASSO | MARS | SVM | RF | GLM | LASSO | MARS | SVM | RF |
| (a) 140 events ( | |||||||||||||||
| P.Bias | 2.125 | 2.056 | 1.963 | 1.896 | 2.061 | 2.274 | 2.093 | 2.357 | 2.063 | 2.191 | 2.471 | 2.099 | 2.159 | 2.051 | 2.230 |
| P.MSE | 2.171 | 2.148 | 2.191 | 2.144 | 2.150 | 2.244 | 2.122 | 2.297 | 2.178 | 2.184 | 2.283 | 2.131 | 2.453 | 2.210 | 2.216 |
| MSPE | 2.122 | 2.051 | 2.113 | 2.083 | 2.159 | 2.104 | 1.963 | 2.046 | 2.029 | 2.173 | 2.073 | 1.914 | 2.019 | 1.981 | 2.187 |
| MAD | 0.470 | 0.442 | 0.460 | 0.458 | 0.472 | 0.470 | 0.424 | 0.447 | 0.445 | 0.477 | 0.473 | 0.409 | 0.437 | 0.431 | 0.479 |
| mPCOV | 0.768 | 0.772 | 0.765 | 0.764 | 0.769 | 0.763 | 0.752 | 0.762 | 0.771 | 0.768 | 0.758 | 0.721 | 0.743 | 0.769 | 0.767 |
| (b) 180 events ( | |||||||||||||||
| P.Bias | 1.819 | 1.670 | 1.699 | 1.804 | 1.741 | 1.738 | 1.672 | 1.779 | 1.734 | 1.771 | 1.850 | 1.540 | 2.074 | 1.739 | 1.789 |
| P.MSE | 2.188 | 2.144 | 2.209 | 2.147 | 2.150 | 2.203 | 2.143 | 2.334 | 2.156 | 2.154 | 2.239 | 2.133 | 2.379 | 2.186 | 2.180 |
| MSPE | 2.008 | 1.983 | 2.039 | 1.995 | 2.081 | 1.974 | 1.893 | 1.980 | 1.925 | 2.087 | 1.930 | 1.808 | 1.950 | 1.882 | 2.094 |
| MAD | 0.480 | 0.456 | 0.473 | 0.469 | 0.486 | 0.473 | 0.435 | 0.455 | 0.452 | 0.484 | 0.472 | 0.419 | 0.448 | 0.439 | 0.482 |
| mPCOV | 0.761 | 0.763 | 0.756 | 0.763 | 0.758 | 0.761 | 0.764 | 0.761 | 0.768 | 0.760 | 0.760 | 0.729 | 0.759 | 0.760 | 0.761 |
| (c) 350 events ( | |||||||||||||||
| P.Bias | 1.175 | 1.096 | 1.116 | 1.122 | 1.071 | 1.124 | 1.038 | 1.389 | 1.109 | 1.081 | 1.185 | 1.063 | 1.264 | 1.079 | 1.068 |
| P.MSE | 2.127 | 2.143 | 2.180 | 2.124 | 2.120 | 2.172 | 2.108 | 2.527 | 2.143 | 2.141 | 2.194 | 2.136 | 2.770 | 2.157 | 2.192 |
| MSPE | 1.783 | 1.758 | 1.787 | 1.775 | 1.834 | 1.751 | 1.688 | 1.735 | 1.726 | 1.849 | 1.720 | 1.632 | 1.701 | 1.666 | 1.868 |
| MAD | 0.454 | 0.434 | 0.444 | 0.439 | 0.455 | 0.448 | 0.410 | 0.429 | 0.424 | 0.455 | 0.446 | 0.392 | 0.418 | 0.410 | 0.456 |
| mPCOV | 0.784 | 0.783 | 0.780 | 0.789 | 0.785 | 0.782 | 0.734 | 0.752 | 0.778 | 0.788 | 0.789 | 0.654 | 0.722 | 0.745 | 0.776 |
| (d) 450 events ( | |||||||||||||||
| P.Bias | 7.268 | 5.901 | 6.515 | 5.946 | 6.653 | 4.722 | 5.013 | 2.435 | 2.494 | 2.678 | 7.694 | 3.671 | 4.311 | 4.976 | 4.140 |
| P.MSE | 2.189 | 2.152 | 2.248 | 2.146 | 2.149 | 2.190 | 2.155 | 2.599 | 2.181 | 2.179 | 2.229 | 2.172 | 3.052 | 2.168 | 2.190 |
| MSPE | 1.719 | 1.694 | 1.728 | 1.711 | 1.770 | 1.672 | 1.630 | 1.680 | 1.661 | 1.782 | 1.642 | 1.574 | 1.628 | 1.604 | 1.794 |
| MAD | 0.459 | 0.437 | 0.450 | 0.444 | 0.459 | 0.452 | 0.414 | 0.438 | 0.432 | 0.460 | 0.447 | 0.398 | 0.424 | 0.416 | 0.459 |
| mPCOV | 0.754 | 0.767 | 0.759 | 0.759 | 0.756 | 0.757 | 0.733 | 0.742 | 0.758 | 0.761 | 0.748 | 0.662 | 0.684 | 0.734 | 0.757 |
| (e) 700 events ( | |||||||||||||||
| P.Bias | 0.964 | 0.941 | 0.983 | 0.872 | 0.948 | 0.948 | 0.908 | 1.166 | 0.987 | 0.939 | 1.003 | 0.947 | 1.313 | 1.033 | 0.955 |
| P.MSE | 2.082 | 2.070 | 2.131 | 2.081 | 2.065 | 2.131 | 2.095 | 3.012 | 2.092 | 2.115 | 2.178 | 2.100 | 3.722 | 2.120 | 2.156 |
| MSPE | 1.701 | 1.680 | 1.705 | 1.694 | 1.754 | 1.671 | 1.628 | 1.647 | 1.641 | 1.769 | 1.646 | 1.569 | 1.614 | 1.603 | 1.787 |
| MAD | 0.451 | 0.428 | 0.440 | 0.436 | 0.453 | 0.446 | 0.406 | 0.427 | 0.422 | 0.454 | 0.445 | 0.390 | 0.417 | 0.408 | 0.454 |
| mPCOV | 0.786 | 0.780 | 0.786 | 0.783 | 0.782 | 0.782 | 0.664 | 0.735 | 0.758 | 0.781 | 0.755 | 0.560 | 0.712 | 0.690 | 0.744 |
| (f) 900 events ( | |||||||||||||||
| P.Bias | 0.915 | 0.975 | 1.027 | 0.987 | 0.979 | 1.118 | 0.843 | 1.365 | 1.010 | 0.942 | 1.138 | 0.821 | 1.100 | 1.067 | 0.849 |
| P.MSE | 2.221 | 2.169 | 2.268 | 2.188 | 2.173 | 2.208 | 2.160 | 3.570 | 2.184 | 2.221 | 2.272 | 2.176 | 4.770 | 2.208 | 2.250 |
| MSPE | 1.646 | 1.629 | 1.659 | 1.640 | 1.709 | 1.606 | 1.573 | 1.601 | 1.588 | 1.725 | 1.569 | 1.521 | 1.566 | 1.533 | 1.741 |
| MAD | 0.443 | 0.425 | 0.438 | 0.433 | 0.448 | 0.439 | 0.404 | 0.423 | 0.417 | 0.450 | 0.436 | 0.386 | 0.411 | 0.403 | 0.448 |
| mPCOV | 0.750 | 0.755 | 0.753 | 0.754 | 0.754 | 0.753 | 0.665 | 0.723 | 0.744 | 0.755 | 0.726 | 0.507 | 0.690 | 0.665 | 0.730 |
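To show how summary statistics like these translate into relative ranks, the sketch below orders the five methods on each statistic for panel (a), 140 events, at 15% missing. Two assumptions are made here and are not stated in the source: lower is treated as better for P.Bias, P.MSE, MSPE and MAD, and higher is treated as better for mPCOV (all coverage values shown fall below the nominal 95%). Under those assumptions the output agrees with the 140-event rows of the relative-rank table, although this is an observation from the numbers shown, not a description of the authors' ranking procedure.

```python
METHODS = ["GLM", "LASSO", "MARS", "SVM", "RF"]

# Panel (a), 140 events, 15% missing columns, in METHODS order.
STATS_15PCT = {
    "P.Bias": [2.471, 2.099, 2.159, 2.051, 2.230],
    "P.MSE":  [2.283, 2.131, 2.453, 2.210, 2.216],
    "MSPE":   [2.073, 1.914, 2.019, 1.981, 2.187],
    "MAD":    [0.473, 0.409, 0.437, 0.431, 0.479],
    "mPCOV":  [0.758, 0.721, 0.743, 0.769, 0.767],
}

# Assumption: coverage is better when higher; everything else when lower.
HIGHER_IS_BETTER = {"mPCOV"}


def rank(values, higher_is_better=False):
    """Return 1-based ranks (1 = best) in the original order of `values`."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=higher_is_better)
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


ranks_140 = {
    stat: rank(values, higher_is_better=stat in HIGHER_IS_BETTER)
    for stat, values in STATS_15PCT.items()
}
for stat, ranks in ranks_140.items():
    print(stat, dict(zip(METHODS, ranks)))
```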
Fig. 1. Performance of regression imputation methods for each summary statistic in simulations where C = 30% and M = 15%.
Fig. 2. Performance of regression imputation methods for each summary statistic in simulations where C = 30% and M = 10%.