Huimin Wang, Jianxiang Tang, Mengyao Wu, Xiaoyu Wang, Tao Zhang.
Abstract
BACKGROUND: Medical data often contain many missing values, which directly affect the accuracy of clinical decision making. Discharge assessment is an important part of clinical decision making. Taking the discharge assessment of patients with spontaneous supratentorial intracerebral hemorrhage as an example, this study adopted evaluation criteria for missing data processing that are better suited to clinical decision making. It aimed to systematically explore the performance and applicability of single machine learning algorithms and ensemble learning (EL) under different missing data scenarios, and to determine whether they offer advantages over traditional methods, so as to provide a basis and reference for selecting a suitable missing data processing method in practical clinical decision making.
Keywords: Clinical decision making; Discharge assessment; Ensemble learning; Imputation; Machine learning; Missing data; Spontaneous supratentorial intracerebral hemorrhage
Year: 2022 PMID: 35027065 PMCID: PMC8756624 DOI: 10.1186/s12911-022-01752-6
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 Experimental design. MAR (1:2): MAR (the ratio of missing proportion 1:2); MAR (2:1): MAR (the ratio of missing proportion 2:1)
Fig. 2 Stacking ensemble learning algorithm framework
The optimal hyperparameter configuration of machine learning imputation techniques (under the MAR (the ratio of missing proportion 1:2) mechanism scenario with a missing proportion of 5%)
| Methods | Packages | Hyperparameters to be tuned | Hyperparameters ranges | Optimal configuration |
|---|---|---|---|---|
| LR | – | – | – | – |
| RF | randomForest | mtry: number of randomly selected predictors | mtry = {1:8} | mtry = 4 |
| NN | nnet | size: number of hidden units, decay: weight decay | size = {1:24}, decay = {0, 0.1, 0.01, 5e-4} | size = 4, decay = 0.1 |
| SVM | kernlab | sigma: Sigma*, C: cost | Kernel = Radial Basis Function Kernel, C = {0.25, 0.50, 1, 2, 4, 8, 16, 32} | Kernel = Radial Basis Function Kernel, C = 0.25 |
| EL | kernlab, caret, caretEnsemble | sigma: Sigma*, C: cost | Kernel = Radial Basis Function Kernel, C = {0.25, 0.50, 1, 2, 4, 8, 16, 32} | Kernel = Radial Basis Function Kernel, C = 0.25 |
–: the parameter tuning is not required; *: optimal configuration is automatically tuned
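The hyperparameter ranges in the table define small search grids. As a minimal sketch, the Python snippet below enumerates the candidate configurations implied by those ranges and checks that each reported optimum lies inside its grid. The names `grids` and `expand` are illustrative only, and the snippet does not reproduce the tuning procedure itself, which the table attributes to R packages.

```python
from itertools import product

# Search ranges copied from the hyperparameter table (sigma is tuned
# automatically according to the footnote, so it is not listed here).
grids = {
    "RF": {"mtry": list(range(1, 9))},
    "NN": {"size": list(range(1, 25)), "decay": [0, 0.1, 0.01, 5e-4]},
    "SVM": {"C": [0.25, 0.50, 1, 2, 4, 8, 16, 32]},
}

def expand(grid):
    """Return every combination of the listed hyperparameter values."""
    names = list(grid)
    return [dict(zip(names, combo))
            for combo in product(*(grid[name] for name in names))]

candidates = {method: expand(grid) for method, grid in grids.items()}

for method, configs in candidates.items():
    print(method, len(configs), "candidate configurations")
```

Under these ranges the NN grid is the largest (24 sizes times 4 decay values), while RF and SVM each have eight candidates.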
Description of the data set
| Variables | Categories | Discharge success (n = 1207) | Discharge failure (n = 261) |
|---|---|---|---|
| Age | < 55 | 249 (20.6%) | 37 (14.2%) |
| | 55–64 | 265 (22.0%) | 51 (19.5%) |
| | 65–74 | 391 (32.4%) | 86 (33.0%) |
| | 75–84 | 246 (20.4%) | 60 (23.0%) |
| | > 84 | 56 (4.6%) | 27 (10.3%) |
| Gender | Male | 688 (57.0%) | 163 (62.5%) |
| | Female | 519 (43.0%) | 98 (37.5%) |
| More than two times of in-hospital | No | 1194 (98.9%) | 251 (96.2%) |
| | Yes | 13 (1.1%) | 10 (3.8%) |
| Deep coma | No | 1190 (98.6%) | 130 (49.8%) |
| | Yes | 17 (1.4%) | 131 (50.2%) |
| Diagnostic location | Deep | 1081 (89.6%) | 220 (84.3%) |
| | Superficial | 126 (10.4%) | 41 (15.7%) |
| Supratentorial hemorrhage volume | < 30 ml | 1032 (85.5%) | 128 (49.0%) |
| | ≥ 30 ml | 175 (14.5%) | 133 (51.0%) |
| Operation | No | 1045 (86.6%) | 203 (77.8%) |
| | Yes | 162 (13.4%) | 58 (22.2%) |
| Co-infection | No | 802 (66.4%) | 138 (52.9%) |
| | Yes | 405 (33.6%) | 123 (47.1%) |
Evaluation of logistic regression model fitting with original complete data set
| | Sensitivity | AUC | Kappa |
|---|---|---|---|
| Original complete data set | 0.874 | 0.914 | 0.558 |
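The three metrics reported here (sensitivity, AUC and Kappa) can be computed from a confusion matrix and a set of predicted scores. The sketch below is a minimal pure-Python illustration. All counts and scores in it are hypothetical and are not the study's data, and which discharge outcome counts as the positive class is an assumption left to the caller.

```python
def sensitivity(tp, fn):
    """Proportion of actual positives that are predicted positive."""
    return tp / (tp + fn)

def cohen_kappa(tp, fp, fn, tn):
    """Agreement between prediction and truth, corrected for chance."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement from the row and column totals of the 2x2 table.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return (observed - expected) / (1 - expected)

def auc(scores_pos, scores_neg):
    """Probability that a positive case outscores a negative one (ties count 0.5)."""
    wins = sum((p > q) + 0.5 * (p == q)
               for p in scores_pos for q in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical confusion matrix and scores, for illustration only.
sens_value = sensitivity(tp=40, fn=10)
kappa_value = cohen_kappa(tp=40, fp=20, fn=10, tn=130)
auc_value = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.4])
print(sens_value, kappa_value, auc_value)
```

Sensitivity looks only at the positive class, while Kappa discounts agreement that the class totals would produce by chance, so the two can diverge when one outcome is much more common than the other.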
Evaluation results of different processing methods in different scenarios of MCAR mechanism
| Evaluation metric | Missing proportion | LR | RF | NN | SVM | EL | Mode | KNN | MICE |
|---|---|---|---|---|---|---|---|---|---|
| Sensitivity | 0.05 | 0.874 | 0.874 | 0.877 | 0.874 | 0.877 | 0.854 | 0.874 | 0.870 |
| | 0.10 | 0.889 | 0.881 | 0.881 | 0.877 | 0.893 | 0.847 | 0.877 | 0.877 |
| | 0.15 | 0.866 | 0.866 | 0.885 | 0.866 | 0.889 | 0.835 | 0.866 | 0.862 |
| | 0.20 | 0.877 | 0.874 | 0.893 | 0.866 | 0.893 | 0.851 | 0.866 | 0.872 |
| | 0.30 | 0.877 | 0.870 | 0.885 | 0.866 | 0.900 | 0.839 | 0.866 | 0.868 |
| | 0.50 | 0.847 | 0.904 | 0.893 | 0.862 | 0.893 | 0.793 | 0.851 | 0.849 |
| | Average | 0.872 | 0.878 | 0.886 | 0.869 | 0.891 | 0.837 | 0.867 | 0.866 |
| AUC | 0.05 | 0.912 | 0.913 | 0.914 | 0.913 | 0.915 | 0.911 | 0.913 | 0.912 |
| | 0.10 | 0.921 | 0.917 | 0.918 | 0.915 | 0.922 | 0.908 | 0.916 | 0.915 |
| | 0.15 | 0.908 | 0.914 | 0.918 | 0.914 | 0.915 | 0.895 | 0.915 | 0.907 |
| | 0.20 | 0.908 | 0.916 | 0.918 | 0.913 | 0.918 | 0.901 | 0.913 | 0.915 |
| | 0.30 | 0.909 | 0.915 | 0.916 | 0.913 | 0.926 | 0.893 | 0.914 | 0.913 |
| | 0.50 | 0.892 | 0.923 | 0.922 | 0.910 | 0.923 | 0.877 | 0.901 | 0.894 |
| | Average | 0.908 | 0.916 | 0.918 | 0.913 | 0.920 | 0.898 | 0.912 | 0.909 |
| Kappa | 0.05 | 0.553 | 0.553 | 0.555 | 0.553 | 0.555 | 0.555 | 0.553 | 0.551 |
| | 0.10 | 0.566 | 0.561 | 0.561 | 0.559 | 0.568 | 0.557 | 0.559 | 0.558 |
| | 0.15 | 0.552 | 0.552 | 0.564 | 0.552 | 0.566 | 0.497 | 0.553 | 0.545 |
| | 0.20 | 0.568 | 0.566 | 0.578 | 0.561 | 0.578 | 0.532 | 0.563 | 0.566 |
| | 0.30 | 0.562 | 0.557 | 0.566 | 0.555 | 0.576 | 0.512 | 0.560 | 0.574 |
| | 0.50 | 0.533 | 0.569 | 0.562 | 0.543 | 0.596 | 0.493 | 0.540 | 0.524 |
| | Average | 0.556 | 0.560 | 0.564 | 0.554 | 0.573 | 0.524 | 0.555 | 0.553 |
LR, RF, NN, SVM and EL: machine learning methods; Mode, KNN and MICE: traditional methods
Fig. 3 Simulation evaluation results of the sensitivity (a), AUC (b) and Kappa (c) values of each method under MCAR mechanism
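To make the MCAR scenario and the Mode baseline concrete, the sketch below masks a fixed proportion of one categorical variable completely at random and then fills the gaps with the most frequent observed category. It is an illustration under stated assumptions, not the study's simulation code: the function names `mask_mcar` and `impute_mode` are hypothetical, masking an exact count of cells (rather than masking each cell independently) is an assumption, and the example column is rebuilt from the Deep coma totals in the data set description (1320 "No", 148 "Yes").

```python
import random
from collections import Counter

def mask_mcar(values, proportion, seed=0):
    """Set a random subset of entries to None, independent of any data values."""
    rng = random.Random(seed)
    n_missing = round(len(values) * proportion)
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]

def impute_mode(values):
    """Replace None with the most frequent observed category."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]

# Deep coma column rebuilt from the table totals: 1190 + 130 "No", 17 + 131 "Yes".
deep_coma = ["No"] * 1320 + ["Yes"] * 148

masked = mask_mcar(deep_coma, proportion=0.30)
imputed = impute_mode(masked)

print("missing after masking:", masked.count(None))
print("'Yes' before:", deep_coma.count("Yes"), "after:", imputed.count("Yes"))
```

Because every masked cell is filled with the majority category, mode imputation can only shrink the minority category, never restore it.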
Evaluation results of different processing methods under different scenarios of MAR (the ratio of missing proportion 1:2) mechanism
| Evaluation metric | Missing proportion | LR | RF | NN | SVM | EL | Mode | KNN | MICE |
|---|---|---|---|---|---|---|---|---|---|
| Sensitivity | 0.05 | 0.870 | 0.874 | 0.874 | 0.870 | 0.874 | 0.866 | 0.870 | 0.871 |
| | 0.10 | 0.877 | 0.877 | 0.877 | 0.877 | 0.877 | 0.866 | 0.877 | 0.873 |
| | 0.15 | 0.885 | 0.881 | 0.889 | 0.881 | 0.889 | 0.866 | 0.877 | 0.882 |
| | 0.20 | 0.874 | 0.874 | 0.874 | 0.874 | 0.877 | 0.870 | 0.870 | 0.875 |
| | 0.30 | 0.897 | 0.885 | 0.877 | 0.874 | 0.900 | 0.805 | 0.870 | 0.878 |
| | 0.50 | 0.885 | 0.866 | 0.866 | 0.866 | 0.893 | 0.766 | 0.866 | 0.852 |
| | Average | 0.881 | 0.876 | 0.876 | 0.874 | 0.885 | 0.840 | 0.872 | 0.872 |
| AUC | 0.05 | 0.912 | 0.914 | 0.914 | 0.912 | 0.915 | 0.911 | 0.912 | 0.913 |
| | 0.10 | 0.917 | 0.916 | 0.916 | 0.916 | 0.917 | 0.917 | 0.917 | 0.914 |
| | 0.15 | 0.910 | 0.916 | 0.917 | 0.915 | 0.917 | 0.915 | 0.914 | 0.915 |
| | 0.20 | 0.912 | 0.914 | 0.914 | 0.914 | 0.916 | 0.909 | 0.911 | 0.913 |
| | 0.30 | 0.921 | 0.919 | 0.913 | 0.913 | 0.922 | 0.911 | 0.908 | 0.914 |
| | 0.50 | 0.924 | 0.914 | 0.913 | 0.915 | 0.925 | 0.912 | 0.916 | 0.909 |
| | Average | 0.916 | 0.916 | 0.915 | 0.914 | 0.919 | 0.913 | 0.913 | 0.913 |
| Kappa | 0.05 | 0.556 | 0.558 | 0.558 | 0.556 | 0.558 | 0.564 | 0.556 | 0.556 |
| | 0.10 | 0.564 | 0.564 | 0.564 | 0.564 | 0.564 | 0.580 | 0.566 | 0.562 |
| | 0.15 | 0.560 | 0.557 | 0.562 | 0.557 | 0.562 | 0.580 | 0.564 | 0.557 |
| | 0.20 | 0.554 | 0.554 | 0.555 | 0.554 | 0.556 | 0.539 | 0.552 | 0.548 |
| | 0.30 | 0.572 | 0.565 | 0.571 | 0.559 | 0.574 | 0.629 | 0.564 | 0.560 |
| | 0.50 | 0.560 | 0.548 | 0.549 | 0.548 | 0.564 | 0.632 | 0.552 | 0.540 |
| | Average | 0.561 | 0.558 | 0.560 | 0.556 | 0.563 | 0.587 | 0.559 | 0.554 |
LR, RF, NN, SVM and EL: machine learning methods; Mode, KNN and MICE: traditional methods
Fig. 4 Simulation evaluation results of sensitivity (a), AUC (b) and Kappa (c) values of each method under MAR (the ratio of missing proportion 1:2) mechanism
Evaluation results of different processing methods under different scenarios of MAR (the ratio of missing proportion 2:1) mechanism
| Evaluation metric | Missing proportion | LR | RF | NN | SVM | EL | Mode | KNN | MICE |
|---|---|---|---|---|---|---|---|---|---|
| Sensitivity | 0.05 | 0.877 | 0.866 | 0.866 | 0.866 | 0.889 | 0.851 | 0.866 | 0.869 |
| | 0.10 | 0.858 | 0.874 | 0.874 | 0.874 | 0.881 | 0.862 | 0.874 | 0.868 |
| | 0.15 | 0.889 | 0.877 | 0.904 | 0.874 | 0.897 | 0.858 | 0.866 | 0.870 |
| | 0.20 | 0.885 | 0.874 | 0.877 | 0.866 | 0.889 | 0.843 | 0.866 | 0.877 |
| | 0.30 | 0.866 | 0.897 | 0.920 | 0.881 | 0.923 | 0.739 | 0.866 | 0.876 |
| | 0.50 | 0.862 | 0.943 | 0.973 | 0.900 | 0.969 | 0.693 | 0.739 | 0.789 |
| | Average | 0.873 | 0.889 | 0.902 | 0.877 | 0.908 | 0.808 | 0.846 | 0.858 |
| AUC | 0.05 | 0.913 | 0.912 | 0.913 | 0.912 | 0.913 | 0.901 | 0.912 | 0.911 |
| | 0.10 | 0.909 | 0.914 | 0.915 | 0.914 | 0.911 | 0.904 | 0.915 | 0.912 |
| | 0.15 | 0.919 | 0.917 | 0.924 | 0.916 | 0.919 | 0.893 | 0.911 | 0.913 |
| | 0.20 | 0.913 | 0.918 | 0.916 | 0.914 | 0.916 | 0.891 | 0.915 | 0.912 |
| | 0.30 | 0.902 | 0.921 | 0.933 | 0.921 | 0.934 | 0.860 | 0.910 | 0.907 |
| | 0.50 | 0.887 | 0.952 | 0.947 | 0.942 | 0.950 | 0.855 | 0.860 | 0.875 |
| | Average | 0.907 | 0.922 | 0.925 | 0.920 | 0.924 | 0.884 | 0.904 | 0.905 |
| Kappa | 0.05 | 0.562 | 0.555 | 0.555 | 0.555 | 0.569 | 0.519 | 0.555 | 0.555 |
| | 0.10 | 0.547 | 0.557 | 0.557 | 0.557 | 0.561 | 0.526 | 0.557 | 0.554 |
| | 0.15 | 0.565 | 0.558 | 0.574 | 0.555 | 0.591 | 0.507 | 0.551 | 0.561 |
| | 0.20 | 0.568 | 0.561 | 0.563 | 0.556 | 0.592 | 0.506 | 0.557 | 0.564 |
| | 0.30 | 0.547 | 0.566 | 0.579 | 0.556 | 0.619 | 0.491 | 0.547 | 0.569 |
| | 0.50 | 0.507 | 0.627 | 0.630 | 0.622 | 0.645 | 0.556 | 0.491 | 0.514 |
| | Average | 0.549 | 0.571 | 0.576 | 0.567 | 0.596 | 0.518 | 0.543 | 0.553 |
LR, RF, NN, SVM and EL: machine learning methods; Mode, KNN and MICE: traditional methods
Fig. 5 Simulation evaluation results of sensitivity (a), AUC (b) and Kappa (c) values of each method under MAR (the ratio of missing proportion 2:1) mechanism
Fig. 6 Comparison of processing effects of each method in different missing scenarios. a to h: the sensitivity comparison of different missing scenarios of EL, LR, RF, NN, SVM, Mode, KNN and MICE; i to p: AUC comparison; q to x: Kappa comparison; MAR (1:2): the MAR (the ratio of missing proportion 1:2) mechanism; MAR (2:1): the MAR (the ratio of missing proportion 2:1) mechanism
The P values of statistical test between EL and other methods
| Missing mechanism | Evaluation metric | P value | LR | RF | NN | SVM | Mode | KNN | MICE |
|---|---|---|---|---|---|---|---|---|---|
| MCAR | Sensitivity | p.raw | 0.018 | 0.047 | 0.091 | 0.016 | 0.016 | 0.016 | 0.016 |
| | | p.adj | 0.025 | 0.055 | 0.091 | 0.025 | 0.025 | 0.025 | 0.025 |
| | AUC | p.raw | 0.016 | 0.029 | 0.139 | 0.018 | 0.016 | 0.030 | 0.018 |
| | | p.adj | 0.031 | 0.034 | 0.139 | 0.031 | 0.031 | 0.034 | 0.031 |
| | Kappa | p.raw | 0.018 | 0.016 | 0.050 | 0.016 | 0.030 | 0.016 | 0.016 |
| | | p.adj | 0.025 | 0.025 | 0.050 | 0.025 | 0.034 | 0.025 | 0.025 |
| MAR (the ratio of missing proportion 1:2) | Sensitivity | p.raw | 0.028 | 0.050 | 0.091 | 0.030 | 0.016 | 0.030 | 0.016 |
| | | p.adj | 0.041 | 0.059 | 0.091 | 0.041 | 0.041 | 0.041 | 0.041 |
| | AUC | p.raw | 0.029 | 0.017 | 0.029 | 0.018 | 0.030 | 0.029 | 0.018 |
| | | p.adj | 0.030 | 0.030 | 0.030 | 0.030 | 0.030 | 0.030 | 0.030 |
| | Kappa | p.raw | 0.024 | 0.050 | 0.091 | 0.029 | 0.953 | 0.086 | 0.016 |
| | | p.adj | 0.068 | 0.088 | 0.106 | 0.068 | 0.953 | 0.106 | 0.068 |
| MAR (the ratio of missing proportion 2:1) | Sensitivity | p.raw | 0.016 | 0.018 | 0.172 | 0.017 | 0.016 | 0.018 | 0.016 |
| | | p.adj | 0.021 | 0.021 | 0.172 | 0.021 | 0.021 | 0.021 | 0.021 |
| | AUC | p.raw | 0.050 | 0.584 | 0.819 | 0.086 | 0.016 | 0.071 | 0.031 |
| | | p.adj | 0.117 | 0.681 | 0.819 | 0.120 | 0.109 | 0.120 | 0.109 |
| | Kappa | p.raw | 0.016 | 0.016 | 0.016 | 0.018 | 0.016 | 0.016 | 0.016 |
| | | p.adj | 0.018 | 0.018 | 0.018 | 0.018 | 0.018 | 0.018 | 0.018 |
LR, RF, NN and SVM: machine learning methods; Mode, KNN and MICE: traditional methods
p.raw: the p value of Wilcoxon signed rank test;
p.adj: the p value adjusted by the FDR method based on p.raw
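The sketch below illustrates the two steps named in the footnotes, using values from the MCAR tables. It rests on assumptions the source does not state: that each comparison pairs EL with another method across the six missing proportions, that the test is an exact one-sided signed rank test, and that "FDR" means the Benjamini-Hochberg procedure (which is what the `fdr` method of R's `p.adjust` computes). The function names are illustrative, and the exact test here does not handle zero or tied differences.

```python
from itertools import product

def wilcoxon_exact_greater(x, y):
    """Exact one-sided signed rank p value for H1: x tends to exceed y.
    Assumes no zero differences and no tied absolute differences."""
    diffs = [round(a - b, 6) for a, b in zip(x, y)]
    if 0 in diffs or len({abs(d) for d in diffs}) != len(diffs):
        raise ValueError("zero or tied differences are not handled")
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0] * len(diffs)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Under H0 every sign pattern is equally likely, so enumerate all 2^n.
    hits = sum(
        1 for signs in product((0, 1), repeat=len(ranks))
        if sum(r for r, s in zip(ranks, signs) if s) >= w_plus
    )
    return hits / 2 ** len(ranks)

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # position of this p value in ascending order
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# MCAR sensitivity of EL and Mode at missing proportions 0.05 to 0.50.
el = [0.877, 0.893, 0.889, 0.893, 0.900, 0.893]
mode = [0.854, 0.847, 0.835, 0.851, 0.839, 0.793]
p_el_vs_mode = wilcoxon_exact_greater(el, mode)

# MCAR sensitivity p.raw row: LR, RF, NN, SVM, Mode, KNN, MICE.
p_raw = [0.018, 0.047, 0.091, 0.016, 0.016, 0.016, 0.016]
p_adj = [round(p, 3) for p in benjamini_hochberg(p_raw)]
print(p_el_vs_mode, p_adj)
```

With six pairs that all favor EL, the exact one-sided p value is 1/64 = 0.015625, which is consistent with the 0.016 reported for Mode, and the adjusted values reproduce the MCAR sensitivity p.adj row. Rows computed from the rounded p.raw values may differ from the published p.adj in the last digit.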