Chao-Yu Guo1,2, Ying-Chen Yang1,2, Yi-Hau Chen3.
Abstract
Adequate imputation of missing data preserves statistical power and helps avoid erroneous conclusions. In the era of big data, machine learning is a powerful tool for inferring missing values. The root mean square error (RMSE) and the proportion of falsely classified entries (PFC) are two standard statistics for evaluating imputation accuracy. However, the behavior of the Cox proportional hazards model under various machine-learning-based imputations requires deliberate study, and its validity under different missing mechanisms is unknown. In this research, we propose supervised and unsupervised imputations and examine four machine-learning-based imputation strategies. We conducted a simulation study under various scenarios with several parameters, such as sample size, missing rate, and missing mechanism. The results revealed the type-I errors of the different imputation techniques in survival data. The simulations show that the non-parametric "missForest", based on unsupervised imputation, is the only robust method without inflated type-I errors under all missing mechanisms. In contrast, the other methods do not yield valid tests when the missing pattern is informative. Improperly conducted statistical analysis with missing data may lead to erroneous conclusions. This research provides a clear guideline for valid survival analysis using the Cox proportional hazards model with machine-learning-based imputations.
Keywords: cox proportional hazard model; k-nearest neighbors imputation; machine learning; random forest imputation; survival data simulation
Year: 2021 PMID: 34291028 PMCID: PMC8289437 DOI: 10.3389/fpubh.2021.680054
Source DB: PubMed Journal: Front Public Health ISSN: 2296-2565
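The abstract evaluates imputation accuracy with RMSE (for continuous variables) and PFC (for categorical variables). A minimal stdlib-only Python sketch of these two metrics, comparing imputed entries against their true values (the function names are ours; the paper's own computations were presumably done in R):

```python
import math

def rmse(true_vals, imputed_vals):
    """Root mean square error between true and imputed continuous values."""
    n = len(true_vals)
    return math.sqrt(sum((t - i) ** 2 for t, i in zip(true_vals, imputed_vals)) / n)

def pfc(true_cats, imputed_cats):
    """Proportion of falsely classified entries for categorical values."""
    wrong = sum(1 for t, i in zip(true_cats, imputed_cats) if t != i)
    return wrong / len(true_cats)
```

Lower is better for both: a perfect imputation scores 0 on each metric, and PFC is bounded by 1 while RMSE is unbounded (hence table entries above 1).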
Figure 1. The proportion of falsely classified (PFC) using 500 subjects.
Figure 2. The root mean square error (RMSE) using 500 subjects.
The best performer for the proportion of falsely classified (PFC) and the root mean square error (RMSE).
| Sample size | Missing rate | Mechanism | Best PFC: method (value) | Best RMSE: method (value) |
| 100 | 0.1 | MCAR | RFotf (0.3062) | KNN (0.8781) |
| 100 | 0.1 | MAR | RFotf (0.3147) | KNN (0.8744) |
| 100 | 0.1 | MNAR | RFprxe (0.4983) | KNN (1.0584) |
| 100 | 0.2 | MCAR | RFotf (0.3071) | KNN (0.9522) |
| 100 | 0.2 | MAR | RFotf (0.3184) | KNN (0.9519) |
| 100 | 0.2 | MNAR | RFprxe (0.4862) | KNN (1.1101) |
| 100 | 0.3 | MCAR | RFotf (0.3049) | KNN (0.9871) |
| 100 | 0.3 | MAR | RFotf (0.3058) | KNN (0.9942) |
| 100 | 0.3 | MNAR | RFprxe (0.4998) | KNN (1.0973) |
| 250 | 0.1 | MCAR | RFotf (0.396) | KNN (0.9603) |
| 250 | 0.1 | MAR | RFotf (0.3069) | KNN (0.9669) |
| 250 | 0.1 | MNAR | RFprxe (0.4987) | KNN (1.1751) |
| 250 | 0.2 | MCAR | RFotf (0.3022) | KNN (0.9977) |
| 250 | 0.2 | MAR | RFotf (0.3079) | KNN (0.9938) |
| 250 | 0.2 | MNAR | RFprxe (0.4999) | KNN (1.1489) |
| 250 | 0.3 | MCAR | RFotf (0.3107) | KNN (1.0083) |
| 250 | 0.3 | MAR | RFotf (0.3143) | KNN (1.0099) |
| 250 | 0.3 | MNAR | RFprxe (0.4971) | KNN (1.1278) |
| 500 | 0.1 | MCAR | RFotf (0.3114) | KNN (0.98) |
| 500 | 0.1 | MAR | RFotf (0.3057) | KNN (0.9852) |
| 500 | 0.1 | MNAR | RFprxe (0.5072) | KNN (1.2091) |
| 500 | 0.2 | MCAR | RFotf (0.3069) | KNN (1.0045) |
| 500 | 0.2 | MAR | RFotf (0.307) | KNN (1.0044) |
| 500 | 0.2 | MNAR | RFprxe (0.5046) | KNN (1.1735) |
| 500 | 0.3 | MCAR | RFotf (0.3073) | KNN (1.01) |
| 500 | 0.3 | MAR | RFotf (0.3087) | KNN (1.0098) |
| 500 | 0.3 | MNAR | RFprxe (0.502) | KNN (1.1367) |
| 1,000 | 0.1 | MCAR | RFotf (0.3093) | KNN (0.9958) |
| 1,000 | 0.1 | MAR | RFotf (0.3067) | KNN (0.9963) |
| 1,000 | 0.1 | MNAR | RFprxe (0.5272) | KNN (1.2208) |
| 1,000 | 0.2 | MCAR | RFotf (0.3074) | KNN (1.0034) |
| 1,000 | 0.2 | MAR | RFotf (0.3102) | KNN (1.0056) |
| 1,000 | 0.2 | MNAR | RFprxe (0.5274) | KNN (1.1766) |
| 1,000 | 0.3 | MCAR | RFotf (0.3089) | KNN (1.0081) |
| 1,000 | 0.3 | MAR | RFotf (0.31) | KNN (1.0102) |
| 1,000 | 0.3 | MNAR | RFprxe (0.5142) | KNN (1.1385) |
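The table above shows KNN as the best RMSE performer throughout. As a concrete illustration of the idea behind k-nearest-neighbors imputation, here is a stdlib-only Python sketch that fills each missing cell with the mean of that column over the k complete rows nearest in Euclidean distance on the observed columns (our simplified illustration, not the authors' implementation, which would typically rely on an established KNN-imputation package):

```python
def knn_impute(rows, k=2):
    """Fill None entries using the k nearest fully observed rows.

    rows: list of equal-length numeric lists; None marks a missing entry.
    Distance is computed only over the columns observed in the target row.
    """
    complete = [r for r in rows if None not in r]
    out = []
    for r in rows:
        if None not in r:
            out.append(list(r))
            continue
        obs = [j for j, v in enumerate(r) if v is not None]
        # Sort complete rows by squared distance on the observed columns.
        ordered = sorted(complete,
                         key=lambda c: sum((r[j] - c[j]) ** 2 for j in obs))
        neigh = ordered[:k]
        filled = [v if v is not None else sum(c[j] for c in neigh) / k
                  for j, v in enumerate(r)]
        out.append(filled)
    return out
```

For example, a row `[1.05, None]` surrounded by complete rows `[1.0, 2.0]` and `[1.1, 2.1]` is completed with the neighbors' average, 2.05.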
The type-I error of the Cox model. Columns: sample size, missing rate, missing mechanism, followed by the empirical type-I error of each of the seven analysis strategies compared (method labels not preserved in this record).
| 100 | 0.1 | MCAR | 0.079 | 0.088 | 0.085 | 0.092 | 0.082 | 0.088 | 0.08 |
| 100 | 0.1 | MAR | 0.084 | 0.095 | 0.094 | 0.098 | 0.09 | 0.094 | 0.085 |
| 100 | 0.1 | MNAR | 0.087 | 0.093 | 0.095 | 0.093 | 0.09 | 0.091 | 0.089 |
| 100 | 0.2 | MCAR | 0.091 | 0.092 | 0.102 | 0.113 | 0.081 | 0.095 | 0.079 |
| 100 | 0.2 | MAR | 0.076 | 0.083 | 0.089 | 0.105 | 0.073 | 0.085 | 0.073 |
| 100 | 0.2 | MNAR | 0.075 | 0.08 | 0.09 | 0.106 | 0.075 | 0.082 | 0.074 |
| 100 | 0.3 | MCAR | 0.085 | 0.101 | 0.12 | 0.134 | 0.094 | 0.114 | 0.086 |
| 100 | 0.3 | MAR | 0.072 | 0.092 | 0.114 | 0.13 | 0.087 | 0.107 | 0.078 |
| 100 | 0.3 | MNAR | 0.085 | 0.105 | 0.114 | 0.136 | 0.092 | 0.104 | 0.086 |
| 250 | 0.1 | MCAR | 0.055 | 0.057 | 0.063 | 0.069 | 0.051 | 0.059 | 0.051 |
| 250 | 0.1 | MAR | 0.06 | 0.066 | 0.079 | 0.089 | 0.064 | 0.072 | 0.064 |
| 250 | 0.1 | MNAR | 0.054 | 0.059 | 0.072 | 0.083 | 0.054 | 0.058 | 0.053 |
| 250 | 0.2 | MCAR | 0.052 | 0.065 | 0.082 | 0.095 | 0.055 | 0.068 | 0.055 |
| 250 | 0.2 | MAR | 0.062 | 0.078 | 0.093 | 0.117 | 0.069 | 0.087 | 0.067 |
| 250 | 0.2 | MNAR | 0.054 | 0.072 | 0.087 | 0.115 | 0.059 | 0.068 | 0.056 |
| 250 | 0.3 | MCAR | 0.069 | 0.094 | 0.125 | 0.169 | 0.075 | 0.097 | 0.07 |
| 250 | 0.3 | MAR | 0.059 | 0.09 | 0.126 | 0.153 | 0.073 | 0.09 | 0.064 |
| 250 | 0.3 | MNAR | 0.07 | 0.083 | 0.126 | 0.165 | 0.069 | 0.089 | 0.059 |
| 500 | 0.1 | MCAR | 0.061 | 0.057 | 0.073 | 0.082 | 0.058 | 0.062 | 0.057 |
| 500 | 0.1 | MAR | 0.051 | 0.059 | 0.062 | 0.077 | 0.055 | 0.061 | 0.055 |
| 500 | 0.1 | MNAR | 0.068 | 0.066 | 0.08 | 0.096 | 0.069 | 0.071 | 0.066 |
| 500 | 0.2 | MCAR | 0.05 | 0.065 | 0.105 | 0.141 | 0.055 | 0.062 | 0.053 |
| 500 | 0.2 | MAR | 0.056 | 0.069 | 0.113 | 0.141 | 0.055 | 0.067 | 0.055 |
| 500 | 0.2 | MNAR | 0.063 | 0.072 | 0.102 | 0.147 | 0.064 | 0.067 | 0.061 |
| 500 | 0.3 | MCAR | 0.046 | 0.065 | 0.137 | 0.201 | 0.053 | 0.074 | 0.046 |
| 500 | 0.3 | MAR | 0.047 | 0.078 | 0.154 | 0.224 | 0.058 | 0.079 | 0.054 |
| 500 | 0.3 | MNAR | 0.057 | 0.068 | 0.131 | 0.204 | 0.061 | 0.071 | 0.056 |
| 1,000 | 0.1 | MCAR | 0.053 | 0.059 | 0.074 | 0.094 | 0.055 | 0.057 | 0.053 |
| 1,000 | 0.1 | MAR | 0.042 | 0.049 | 0.077 | 0.1 | 0.044 | 0.048 | 0.044 |
| 1,000 | 0.1 | MNAR | 0.047 | 0.054 | 0.07 | 0.081 | 0.052 | 0.056 | 0.051 |
| 1,000 | 0.2 | MCAR | 0.043 | 0.054 | 0.121 | 0.173 | 0.048 | 0.057 | 0.043 |
| 1,000 | 0.2 | MAR | 0.048 | 0.057 | 0.127 | 0.191 | 0.052 | 0.06 | 0.051 |
| 1,000 | 0.2 | MNAR | 0.053 | 0.065 | 0.151 | 0.217 | 0.06 | 0.069 | 0.055 |
| 1,000 | 0.3 | MCAR | 0.047 | 0.061 | 0.21 | 0.32 | 0.047 | 0.07 | 0.046 |
| 1,000 | 0.3 | MAR | 0.043 | 0.066 | 0.237 | 0.362 | 0.049 | 0.069 | 0.049 |
| 1,000 | 0.3 | MNAR | 0.045 | 0.07 | 0.217 | 0.351 | 0.061 | 0.071 | 0.055 |
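To read the table above: a valid test at the 0.05 level should reject the true null in about 5% of simulated datasets, so entries far above 0.05 (e.g. 0.32 to 0.36 at N = 1,000 with a 0.3 missing rate) indicate an invalid, anti-conservative procedure. A minimal conceptual sketch of how such an empirical type-I error is estimated by Monte Carlo (this stands in for the paper's full pipeline, which fits a Cox model to each simulated survival dataset; here we exploit the fact that null p-values from a valid test are Uniform(0, 1)):

```python
import random

def empirical_type1(n_sim=10_000, alpha=0.05, seed=1):
    """Monte Carlo estimate of a test's type-I error.

    Under a true null, a valid test yields Uniform(0, 1) p-values, so
    rejecting when p < alpha should occur in roughly alpha of the
    simulations. Rejection rates well above alpha signal an invalid test.
    """
    rng = random.Random(seed)
    rejections = sum(rng.random() < alpha for _ in range(n_sim))
    return rejections / n_sim
```

With 1,000 replications per scenario (a common choice), the Monte Carlo standard error of an estimate near 0.05 is about 0.007, so values such as 0.13 or 0.32 cannot be simulation noise.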