Ruoyu Liu, Xin Lai, Jiayin Wang, Xuanping Zhang, Xiaoyan Zhu, Paul B S Lai, Ci-Ren Guo.
Abstract
BACKGROUND: Misestimation of surgical risk by a surgical risk calculator is a serious threat to patients' lives. Improving the accuracy of postoperative risk prediction has received much attention, and many methods have been proposed over the past decades. However, linear approaches are unable to capture the non-linear interactions between risk factors, which have been shown to play an important role in the complex physiology of the human body, and this limitation may degrade the performance of surgical risk calculators.
Keywords: Clinical decision support system; Gradient boosting decision tree; Machine learning; Surgical risk calculator
Year: 2021 PMID: 34330254 PMCID: PMC8323237 DOI: 10.1186/s12911-021-01450-9
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 The pseudocode of the GBDT algorithm
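The figure itself is not reproduced here, so the following is only a minimal sketch of generic gradient boosting for binary outcomes, not the authors' pseudocode or the NL-SRC implementation. It assumes log-loss, depth-1 trees (stumps), and leaf values set to the mean residual (a simplification of the usual Newton-step leaf update). All function names and default parameters are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree: the threshold split with the lowest squared error."""
    best = None
    for j in range(len(xs[0])):
        values = sorted(set(row[j] for row in xs))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(xs, residuals) if row[j] <= thr]
            right = [r for row, r in zip(xs, residuals) if row[j] > thr]
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    if best is None:
        # Every feature is constant: fall back to a single-leaf "tree".
        mean = sum(residuals) / len(residuals)
        return 0, math.inf, mean, mean
    return best[1:]

def stump_value(stump, row):
    j, thr, lm, rm = stump
    return lm if row[j] <= thr else rm

def fit_gbdt(xs, ys, n_trees=50, lr=0.3):
    """Boost stumps on the negative gradient of log-loss (y - p).

    Assumes ys contains both classes, so the initial log-odds is finite.
    """
    p = sum(ys) / len(ys)
    f0 = math.log(p / (1.0 - p))          # initial score: log-odds of the base rate
    scores = [f0] * len(xs)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - sigmoid(s) for y, s in zip(ys, scores)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        scores = [s + lr * stump_value(stump, row) for row, s in zip(xs, scores)]
    return f0, lr, stumps

def predict_proba(model, row):
    f0, lr, stumps = model
    return sigmoid(f0 + lr * sum(stump_value(s, row) for s in stumps))

# Toy data (invented for illustration): risk jumps once the feature exceeds 3.5.
toy_x = [[1], [2], [3], [4], [5], [6]]
toy_y = [0, 0, 0, 1, 1, 1]
toy_model = fit_gbdt(toy_x, toy_y)
```

Each round fits a small tree to what the current ensemble still gets wrong, which is how boosted trees can pick up threshold effects that a single linear term would miss.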
Results of baseline models with all risk factors
| Training mode | Model | AUC | BS | Test statistic | P-value |
|---|---|---|---|---|---|
| Training mode 1 (10-fold cross validation) | NL-SRC | 0.899 | 0.062 | | |
| | logit | 0.884 | 0.065 | 14.196 | 0.077 |
| | CART | 0.842 | 0.071 | 24.850 | 0.002 |
| | SVM | 0.866 | 0.065 | 27.490 | <0.001 |
| Training mode 1 (6-fold cross validation) | NL-SRC | 0.897 | 0.062 | 8.798 | 0.360 |
| | logit | 0.883 | 0.065 | 13.879 | 0.085 |
| | CART | 0.826 | 0.070 | 16.848 | 0.031 |
| | SVM | 0.865 | 0.064 | 26.793 | <0.001 |
| Training mode 2 | NL-SRC | | 0.058 | 8.391 | 0.396 |
| | logit | 0.890 | 0.059 | 12.427 | 0.133 |
| | CART | 0.853 | 0.065 | 11.321 | 0.184 |
| | SVM | 0.873 | 0.059 | 10.363 | 0.240 |
| Training mode 3 | NL-SRC | 0.872 | | 15.232 | 0.055 |
| | logit | 0.875 | 0.066 | 90.989 | <0.001 |
| | CART | 0.781 | 0.077 | 12.146 | 0.145 |
| | SVM | 0.851 | 0.068 | 95.817 | <0.001 |
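For readers unfamiliar with the two headline metrics, here is a minimal sketch of how they can be computed from predicted probabilities. It assumes that BS denotes the Brier score and that AUC is the area under the ROC curve, computed here via pairwise ranking; the function names and the toy data are hypothetical and not taken from the study.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def auc(y_true, y_prob):
    """Probability that a random positive case is ranked above a random negative one.

    Ties count as half a win. Assumes both classes are present.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Invented outcomes and predictions, for illustration only.
example_y = [0, 0, 1, 1]
example_p = [0.1, 0.4, 0.35, 0.8]
example_auc = auc(example_y, example_p)
example_bs = brier_score(example_y, example_p)
```

A higher AUC means better discrimination, while a lower Brier score means the predicted probabilities sit closer to the observed outcomes.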
Results of similar models with all risk factors
| Training mode | Model | AUC | BS | Test statistic | P-value |
|---|---|---|---|---|---|
| Training mode 1 (10-fold cross validation) | NL-SRC | 0.899 | 0.062 | | |
| | logit | 0.884 | 0.065 | 14.196 | 0.077 |
| | RF | 0.885 | 0.064 | 10.426 | 0.236 |
| | XGBoost | 0.895 | 0.062 | 13.026 | 0.111 |
| Training mode 1 (6-fold cross validation) | NL-SRC | 0.897 | 0.062 | 8.798 | 0.360 |
| | logit | 0.883 | 0.065 | 13.879 | 0.085 |
| | RF | 0.885 | 0.064 | 12.818 | 0.118 |
| | XGBoost | 0.896 | 0.063 | 10.968 | 0.204 |
| Training mode 2 | NL-SRC | | 0.058 | 8.391 | 0.396 |
| | logit | 0.890 | 0.059 | 12.427 | 0.133 |
| | RF | 0.892 | 0.058 | 14.226 | 0.058 |
| | XGBoost | 0.900 | 0.056 | 9.439 | 0.306 |
| Training mode 3 | NL-SRC | 0.872 | | 15.232 | 0.055 |
| | logit | 0.875 | 0.066 | 90.989 | <0.001 |
| | RF | 0.879 | 0.067 | 11.522 | 0.174 |
| | XGBoost | 0.886 | 0.067 | 27.285 | <0.001 |
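The extracted tables do not name the test behind the two rightmost numeric columns (the statistic and its P-value). Several reported pairs are numerically consistent with a chi-square test on 8 degrees of freedom, as would arise from a Hosmer-Lemeshow calibration test with ten groups. This is an inference from the numbers, not something the text above states. The sketch below uses the closed-form chi-square tail for even degrees of freedom; the function name is hypothetical.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    For df = 2m the tail has the closed form exp(-x/2) * sum_{k<m} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k      # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total

# Statistics taken from the logit and NL-SRC rows of training mode 1;
# df = 8 is an assumption.
p_logit_10fold = chi2_sf_even_df(14.196, 8)
p_nlsrc_6fold = chi2_sf_even_df(8.798, 8)
```

Under that assumption the computed tail probabilities round to the tabulated P-values of 0.077 and 0.360. A larger P-value then indicates no detected departure from good calibration.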
Fig. 2 Top-15 most important risk factors and their feature importance
Results of baseline models with top-15 risk factors
| Training mode | Model | AUC | BS | Test statistic | P-value |
|---|---|---|---|---|---|
| Training mode 1 (10-fold cross validation) | NL-SRC | 0.892 | 0.063 | 8.082 | 0.426 |
| | logit | 0.864 | 0.068 | 13.561 | 0.094 |
| | CART | 0.841 | 0.069 | 25.614 | 0.001 |
| | SVM | 0.818 | 0.069 | 64.247 | <0.001 |
| Training mode 1 (6-fold cross validation) | NL-SRC | 0.890 | 0.064 | 9.753 | 0.283 |
| | logit | 0.867 | 0.068 | 14.603 | 0.067 |
| | CART | 0.825 | 0.071 | 20.643 | 0.008 |
| | SVM | 0.820 | 0.071 | 55.271 | <0.001 |
| Training mode 2 | NL-SRC | | | | |
| | logit | 0.861 | 0.065 | 31.460 | <0.001 |
| | CART | 0.856 | 0.064 | 16.088 | 0.041 |
| | SVM | 0.827 | 0.072 | 97.202 | <0.001 |
| Training mode 3 | NL-SRC | 0.869 | 0.066 | 17.062 | 0.030 |
| | logit | 0.863 | 0.067 | 75.033 | <0.001 |
| | CART | 0.745 | 0.079 | 14.266 | 0.075 |
| | SVM | 0.856 | 0.071 | 72.822 | <0.001 |
Results of similar models with top-15 risk factors
| Training mode | Model | AUC | BS | Test statistic | P-value |
|---|---|---|---|---|---|
| Training mode 1 (10-fold cross validation) | NL-SRC | 0.892 | 0.063 | 8.082 | 0.426 |
| | logit | 0.864 | 0.068 | 13.561 | 0.094 |
| | RF | 0.881 | 0.066 | 12.281 | 0.139 |
| | XGBoost | 0.887 | 0.064 | 12.358 | 0.136 |
| Training mode 1 (6-fold cross validation) | NL-SRC | 0.890 | 0.064 | 9.753 | 0.283 |
| | logit | 0.867 | 0.068 | 14.603 | 0.067 |
| | RF | 0.882 | 0.066 | 13.372 | 0.100 |
| | XGBoost | 0.885 | 0.065 | 11.726 | 0.164 |
| Training mode 2 | NL-SRC | | | | |
| | logit | 0.861 | 0.065 | 31.460 | <0.001 |
| | RF | 0.886 | 0.061 | 14.022 | 0.081 |
| | XGBoost | 0.882 | 0.061 | 9.740 | 0.284 |
| Training mode 3 | NL-SRC | 0.869 | 0.066 | 17.062 | 0.030 |
| | logit | 0.863 | 0.067 | 75.033 | <0.001 |
| | RF | 0.864 | 0.069 | 41.323 | <0.001 |
| | XGBoost | 0.874 | 0.069 | 67.251 | <0.001 |
Fig. 3 The visualization of the 169th tree with top-15 risk factors under mode 2
Information of some numerical risk factors in SOMIP
| Name | Min. | Max. | Median | Mean |
|---|---|---|---|---|
| Age | 1 | 106 | 66 | 62.9 |
| Alb_num | 2 | 56 | 35 | 34.41 |
| Alk_num | 7 | 2173 | 75 | 93.37 |
| Urea_num | 0.1 | 69.9 | 5.6 | 7.479 |
| Base_num | -32 | 23.9 | -1.1 | -1.788 |
| WBC_num | 0.2 | 91.2 | 11.13 | 12.17 |
| Pulse_num | 10 | 985 | 88 | 89.21 |
| PCO2_num | 0.89 | 13.99 | 4.63 | 4.717 |
| Sodium_num | 104 | 167 | 137.1 | 137.1 |
| Max complexity score | 0 | 79 | 23 | 23.93 |
Information of some categorical risk factors in SOMIP
| Name | Categories | Number of samples |
|---|---|---|
| ASA Status | 1 | 2704 |
| | 2 | 5828 |
| | 3 | 5405 |
| | 4 | 1278 |
| | 5 | 84 |
| Bloodloss | 0 | 8548 |
| | 1 | 5772 |
| | 2 | 520 |
| | 3 | 194 |
| | 4 | 413 |
| | 5 | 122 |
| WBC_cat | L | 256 |
| | N | 8220 |
| | H | 6570 |
| | M | 253 |
| Alb_cat | VL | 1910 |
| | L | 5618 |
| | N | 3964 |
| | H | 3419 |
| | M | 388 |
| Sepsis | Yes | 5323 |
| | No | 9976 |
| Disseminated cancer | Yes | 1236 |
| | No | 14063 |
| Sex | Male | 9337 |
| | Female | 5962 |
| Current smoker | Smoker | 3154 |
| | Ex-smoker | 2514 |
| | Non-smoker | 9352 |
| Functional health status | Totally dependent | 424 |
| | Partially dependent | 2033 |
| | Independent | 12842 |
| Dyspnea | Dysponea At Re | 2077 |
| | Moderate dyspnea | 576 |
| | Mild dyspnoea | 3248 |
| | No dyspnoea | 9398 |
| Magnitude revised | Ultramajor III | 1084 |
| | Ultramajor II | 1680 |
| | Ultramajor I | 2538 |
| | Ultramajor | 4 |
| | Major III | 2636 |
| | Major II | 2995 |
| | Major I | 4067 |
| | Major | 295 |