Xiao Zhang¹, Ningbo Fei², Xinxin Zhang¹, Qun Wang¹, Zongping Fang¹
Abstract
Objective: With the aging of populations and the high prevalence of stroke, postoperative stroke has become a growing concern. This study aimed to establish a prediction model and assess the risk factors for stroke in elderly patients during the postoperative period.
Keywords: MIMIC database; machine learning; post-operative; prediction model; stroke
Year: 2022 PMID: 35923545 PMCID: PMC9341133 DOI: 10.3389/fnagi.2022.897611
Source DB: PubMed Journal: Front Aging Neurosci ISSN: 1663-4365 Impact factor: 5.702
FIGURE 1 Flow diagram of the patient selection process in MIMIC-III (A) and MIMIC-IV (B).
FIGURE 2 Schematic illustration of the stroke prediction modeling workflow. Because non-stroke patients far outnumbered stroke patients, the SMOTENC balancing technique was applied to the training dataset before the model was built. After the data were normalized, seven machine learning methods were trained and tested using the training dataset, the test dataset, and the independent test dataset. Finally, model-based feature importances were derived.
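The SMOTENC balancing step can be illustrated with a minimal plain-Python sketch of the SMOTE-NC rule. This is not the authors' code: the function name `smote_nc`, its parameters, and the toy rows are hypothetical, and the study may well have used a library implementation such as imbalanced-learn's `SMOTENC` class, which the source does not confirm. Each synthetic stroke row is interpolated between a stroke patient and one of its k nearest stroke neighbours; every categorical mismatch adds the median standard deviation of the continuous features to the distance, and each categorical feature takes the most frequent value among those neighbours.

```python
import random
import statistics
from collections import Counter


def smote_nc(minority, cont_idx, cat_idx, n_new, k=5, seed=0):
    """Hypothetical helper: generate n_new synthetic minority-class rows (SMOTE-NC style).

    minority: list of rows (lists) from the minority (stroke) class only.
    cont_idx / cat_idx: column positions of continuous / categorical features.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    # Categorical-mismatch penalty: median of the continuous-feature
    # standard deviations within the minority class.
    stds = [statistics.pstdev([row[j] for row in minority]) for j in cont_idx]
    penalty = statistics.median(stds) if stds else 1.0

    def dist2(a, b):
        # Squared Euclidean distance on continuous features,
        # plus penalty**2 for every categorical feature that differs.
        d = sum((a[j] - b[j]) ** 2 for j in cont_idx)
        d += sum(penalty ** 2 for j in cat_idx if a[j] != b[j])
        return d

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [r for r in minority if r is not base]
        neigh = sorted(others, key=lambda r: dist2(base, r))[:k]  # k nearest minority rows
        nb = rng.choice(neigh)
        gap = rng.random()  # interpolation factor in [0, 1)
        new = list(base)
        for j in cont_idx:  # continuous: point on the segment base -> neighbour
            new[j] = base[j] + gap * (nb[j] - base[j])
        for j in cat_idx:   # categorical: most frequent value among the k neighbours
            new[j] = Counter(r[j] for r in neigh).most_common(1)[0][0]
        synthetic.append(new)
    return synthetic
```

As the caption states, only the training split is oversampled; the validation and independent test sets keep their natural stroke prevalence, so their metrics reflect the real class imbalance.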
Characteristics of stroke and non-stroke patients in the training and validation sets of the MIMIC-IV database.
| Variables | Total | Non-stroke | Stroke | P-value |
|---|---|---|---|---|
| Age, mean ± SD | 72.3 ± 10.4 | 72.1 ± 10.4 | 74.0 ± 10.5 | <0.001 |
| Gender, female, n (%) | 3,426 (48.1) | 3,082 (48.2) | 344 (47.1) | 0.584 |
| Ethnicity, n (%) | | | | 0.039 |
| Asian | 222 (3.1) | 193 (3) | 29 (4) | 0.198 |
| Black | 592 (8.3) | 538 (8.4) | 54 (7.4) | 0.38 |
| White | 5,083 (71.3) | 4,584 (71.7) | 499 (68.3) | 0.06 |
| Other | 1,231 (17.3) | 1,082 (16.9) | 149 (20.4) | 0.022 |
| BMI, mean ± SD | 1300.8 ± 85503.1 | 1404.9 ± 90219.6 | 389.4 ± 7628.6 | 0.761 |
| CHF, n (%) | 1,483 (20.8) | 1,399 (21.9) | 84 (11.5) | <0.001 |
| PVD, n (%) | 927 (13.0) | 858 (13.4) | 69 (9.4) | 0.003 |
| Hypertension, n (%) | 2,794 (39.2) | 2,278 (35.6) | 516 (70.6) | <0.001 |
| CPD, n (%) | 1,712 (24.0) | 1,595 (24.9) | 117 (16) | <0.001 |
| Diabetes, n (%) | 2,070 (29.0) | 1,879 (29.4) | 191 (26.1) | 0.074 |
| Renal_disease, n (%) | 1,349 (18.9) | 1,267 (19.8) | 82 (11.2) | <0.001 |
| Liver_disease, n (%) | 844 (11.8) | 817 (12.8) | 27 (3.7) | <0.001 |
| PUD, n (%) | 204 (2.9) | 201 (3.1) | 3 (0.4) | <0.001 |
| Cancer, n (%) | 1,405 (19.7) | 1,322 (20.7) | 83 (11.4) | <0.001 |
| Rheumatic_disease, n (%) | 259 (3.6) | 245 (3.8) | 14 (1.9) | 0.012 |
| Sepsis, n (%) | 3,033 (42.6) | 2,773 (43.3) | 260 (35.6) | <0.001 |
| Spo2_min, mean ± SD | 91.8 ± 7.1 | 91.8 ± 6.7 | 92.1 ± 9.6 | 0.383 |
| Spo2_mean, mean ± SD | 96.8 ± 2.6 | 96.8 ± 2.5 | 97.0 ± 3.2 | 0.14 |
| Aniongap_min, mean ± SD | 13.5 ± 3.3 | 13.4 ± 3.4 | 13.8 ± 2.6 | 0.004 |
| Aniongap_max, mean ± SD | 16.2 ± 4.3 | 16.3 ± 4.4 | 16.0 ± 3.1 | 0.182 |
| Albumin_min, mean ± SD | 3.3 ± 0.5 | 3.3 ± 0.5 | 3.6 ± 0.4 | <0.001 |
| Albumin_max, mean ± SD | 3.4 ± 0.5 | 3.4 ± 0.5 | 3.7 ± 0.4 | <0.001 |
| Glucose_mean, mean ± SD | 140.5 ± 49.0 | 140.6 ± 48.5 | 139.8 ± 53.5 | 0.679 |
| Potassium_min, mean ± SD | 3.8 ± 0.4 | 3.8 ± 0.4 | 3.7 ± 0.3 | <0.001 |
| Potassium_max, mean ± SD | 4.1 ± 0.5 | 4.1 ± 0.5 | 3.9 ± 0.4 | <0.001 |
| Bilirubin_total_min, median (IQR) | 0.8 (0.4, 1.6) | 0.9 (0.4, 1.6) | 0.8 (0.4, 1.2) | <0.001 |
| Bilirubin_total_max, median (IQR) | 1.0 (0.4, 1.9) | 1.0 (0.4, 2.0) | 0.8 (0.4, 1.4) | <0.001 |
| Creatinine_min, median (IQR) | 0.9 (0.7, 1.2) | 0.9 (0.7, 1.2) | 0.8 (0.7, 1.1) | 0.002 |
| Creatinine_max, median (IQR) | 1.0 (0.8, 1.4) | 1.0 (0.8, 1.5) | 0.9 (0.8, 1.2) | <0.001 |
| Lactate_min, median (IQR) | 1.5 (1.2, 1.9) | 1.5 (1.2, 1.9) | 1.4 (1.2, 1.7) | <0.001 |
| Lactate_max, median (IQR) | 2.0 (1.4, 2.9) | 2.1 (1.4, 2.9) | 1.9 (1.4, 2.4) | <0.001 |
| Platelets_min, median (IQR) | 193.0 (141.0, 253.0) | 192.0 (139.0, 254.0) | 196.0 (157.0, 246.5) | 0.12 |
| Platelets_max, median (IQR) | 218.0 (165.0, 285.0) | 218.0 (163.0, 287.0) | 216.0 (171.0, 273.0) | 0.816 |
| Ptt_min, median (IQR) | 28.3 (25.0, 32.6) | 28.4 (25.2, 33.1) | 26.8 (23.6, 30.0) | <0.001 |
| Ptt_max, median (IQR) | 30.5 (26.5, 40.2) | 30.8 (26.7, 42.0) | 28.6 (24.9, 33.3) | <0.001 |
| Inr_min, median (IQR) | 1.2 (1.1, 1.4) | 1.2 (1.1, 1.4) | 1.1 (1.0, 1.2) | <0.001 |
| Inr_max, median (IQR) | 1.2 (1.1, 1.6) | 1.2 (1.1, 1.7) | 1.1 (1.1, 1.4) | <0.001 |
| Pt_min, median (IQR) | 13.0 (11.7, 15.1) | 13.1 (11.7, 15.3) | 12.3 (11.4, 13.6) | <0.001 |
| Pt_max, median (IQR) | 13.8 (12.1, 18.8) | 13.9 (12.1, 19.3) | 13.0 (11.8, 15.5) | <0.001 |
| Bun_min, median (IQR) | 18.0 (12.0, 26.2) | 18.0 (13.0, 27.0) | 16.0 (12.0, 21.0) | <0.001 |
| Bun_max, median (IQR) | 20.0 (15.0, 31.0) | 21.0 (15.0, 32.0) | 19.0 (15.0, 24.0) | <0.001 |
| Wbc_min, median (IQR) | 9.3 (6.8, 12.6) | 9.3 (6.8, 12.7) | 9.2 (7.1, 11.9) | 0.951 |
| Wbc_max, median (IQR) | 11.7 (8.6, 15.9) | 11.8 (8.6, 16.1) | 11.1 (8.7, 14.5) | <0.001 |
| TP_mean, mean ± SD | 100.8 ± 20.3 | 101.3 ± 20.5 | 96.9 ± 18.4 | <0.001 |
| HR_max, mean ± SD | 82.4 ± 15.0 | 82.8 ± 15.2 | 78.5 ± 12.8 | <0.001 |
| HR_mean, mean ± SD | 153.9 ± 23.3 | 152.8 ± 23.3 | 164.2 ± 21.2 | <0.001 |
| Sbp_max, mean ± SD | 123.5 ± 17.4 | 122.5 ± 17.4 | 132.3 ± 14.4 | <0.001 |
| Sbp_mean, mean ± SD | 88.0 ± 20.2 | 87.7 ± 20.2 | 90.9 ± 19.4 | <0.001 |
| Dbp_max, mean ± SD | 62.6 ± 10.7 | 62.3 ± 10.8 | 64.8 ± 10.1 | <0.001 |
| Dbp_mean, mean ± SD | 106.6 ± 24.0 | 106.1 ± 24.1 | 110.6 ± 22.5 | <0.001 |
| Mbp_max, mean ± SD | 79.5 ± 11.0 | 79.1 ± 11.1 | 83.0 ± 9.8 | <0.001 |
| Mbp_mean, mean ± SD | 36.9 ± 0.6 | 36.8 ± 0.6 | 36.9 ± 0.6 | <0.001 |
Continuous variables are presented as mean ± SD or median (interquartile range, IQR); count data are presented as numbers (percentages). Severe respiratory, coagulation, liver, cardiovascular, central nervous system, and renal failure refer to a score of 4 for the corresponding organ or system in the SOFA scheme. Medical conditions were defined by ICD-9 codes. A mean, minimum, or maximum parameter refers to the mean, lowest, or highest value of that parameter on the first day of ICU admission. CHF, congestive heart failure; PVD, peripheral vascular disease; CPD, chronic pulmonary disease; PUD, peptic ulcer disease; Spo2, finger pulse oxygen saturation; ptt, partial thromboplastin time; INR, international normalized ratio; pt, prothrombin time; BUN, blood urea nitrogen; wbc, white blood cells; TP, temperature; HR, heart rate; sbp, systolic blood pressure; dbp, diastolic blood pressure; mbp, mean blood pressure.
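The first-day summaries described in the footnote, and the SOFA-based "severe failure" flag, can be sketched as follows. The 24-hour window starting at ICU admission, the `(charttime, value)` input format, and the function names `first_day_features` and `severe_organ_failure` are assumptions for illustration; the exact extraction window and queries the authors ran against MIMIC are not given here.

```python
from datetime import datetime, timedelta
from statistics import mean


def first_day_features(measurements, icu_intime, name):
    """Hypothetical helper: min/mean/max of one parameter in the first 24 h after ICU admission.

    measurements: iterable of (charttime: datetime, value: float) tuples.
    Returns None for all three summaries when nothing was recorded in the window.
    """
    window_end = icu_intime + timedelta(hours=24)  # assumed window: [intime, intime + 24 h)
    values = [v for t, v in measurements if icu_intime <= t < window_end]
    if not values:
        return {f"{name}_min": None, f"{name}_mean": None, f"{name}_max": None}
    return {
        f"{name}_min": min(values),
        f"{name}_mean": mean(values),
        f"{name}_max": max(values),
    }


def severe_organ_failure(sofa_subscores):
    """Hypothetical helper: flag organ systems whose SOFA sub-score equals 4."""
    return {organ: score == 4 for organ, score in sofa_subscores.items()}
```

Measurements taken after the first day (for example, a reading at hour 30) are simply ignored, so each patient contributes one fixed-length feature row regardless of ICU length of stay.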
FIGURE 3 Performance of the seven machine learning algorithms evaluated with ROC curves. (A) ROC curves for the validation set from MIMIC-IV, for which 20% of the data were held out for testing and the remaining 80% were used for training. (B) ROC curves for the independent testing set from MIMIC-III. The mean ROC curve of XGB is shown in pink, and its 95% confidence interval in deep blue.
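Each ROC curve is traced by sweeping a decision threshold over the model's predicted stroke probabilities and recording the false-positive and true-positive rates at every step; the AUC is the area under that curve. The sketch below shows that computation with stroke as the positive class. `roc_points` and `auc_trapezoid` are hypothetical helpers; libraries such as scikit-learn provide equivalent `roc_curve` and `roc_auc_score` functions.

```python
def roc_points(y_true, scores):
    """Hypothetical helper: ROC points (FPR, TPR), sweeping the threshold from the highest score down.

    y_true: 1 for stroke, 0 for non-stroke. Tied scores form a single threshold.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("ROC needs both classes")
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        # Consume every sample sharing this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

Grouping tied scores into one threshold makes ties contribute a diagonal segment, which is equivalent to counting a tied stroke/non-stroke pair as half-correctly ranked.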
Performance of machine learning methods in different data sets.
| Data set | Model | Accuracy | Sensitivity | Specificity | AUC |
|---|---|---|---|---|---|
| Validation set | KNN | 0.59 (0.47–0.65) | 0.75 (0.65–0.9) | 0.57 (0.43–0.64) | 0.69 (0.66–0.73) |
| | LR | 0.68 (0.55–0.79) | 0.71 (0.55–0.86) | 0.67 (0.51–0.82) | 0.75 (0.71–0.78) |
| | RF | 0.69 (0.56–0.79) | 0.74 (0.6–0.88) | 0.69 (0.53–0.81) | 0.78 (0.74–0.81) |
| | DT | 0.79 (0.77–0.81) | 0.34 (0.26–0.41) | 0.84 (0.82–0.87) | 0.59 (0.55–0.63) |
| | SVM | 0.69 (0.59–0.78) | 0.75 (0.62–0.86) | 0.68 (0.56–0.8) | 0.76 (0.73–0.8) |
| | MLP | 0.64 (0.52–0.76) | 0.75 (0.58–0.89) | 0.63 (0.47–0.78) | 0.74 (0.7–0.77) |
| | XGB | 0.68 (0.57–0.78) | 0.77 (0.63–0.9) | 0.67 (0.53–0.8) | 0.78 (0.75–0.81) |
| Independent testing set | KNN | 0.82 (0.72–0.87) | 0.98 (0.97–0.99) | 0.25 (0.16–0.31) | 0.84 (0.81–0.88) |
| | LR | 0.81 (0.68–0.9) | 0.95 (0.94–0.96) | 0.13 (0.1–0.2) | 0.67 (0.65–0.69) |
| | RF | 0.88 (0.79–0.93) | 0.97 (0.96–0.98) | 0.33 (0.2–0.49) | 0.84 (0.8–0.87) |
| | DT | 0.87 (0.84–0.89) | 0.94 (0.94–0.95) | 0.15 (0.09–0.24) | 0.57 (0.51–0.63) |
| | SVM | 0.87 (0.78–0.92) | 0.96 (0.95–0.97) | 0.26 (0.17–0.38) | 0.77 (0.74–0.81) |
| | MLP | 0.84 (0.75–0.91) | 0.97 (0.96–0.98) | 0.24 (0.16–0.35) | 0.8 (0.76–0.84) |
| | XGB | 0.87 (0.78–0.93) | 0.97 (0.96–0.98) | 0.3 (0.19–0.45) | 0.83 (0.79–0.87) |
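The source does not state how the 95% intervals in this table were obtained. One common approach, shown here only as an assumed illustration, is the percentile bootstrap: resample patients with replacement, recompute the metric on each resample, and read off the 2.5th and 97.5th percentiles. The names `bootstrap_ci` and `accuracy` are hypothetical.

```python
import random


def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Hypothetical helper: percentile-bootstrap (1 - alpha) confidence interval for metric."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    k = int(round(alpha / 2 * n_boot))  # number of resamples trimmed from each tail
    return stats[k], stats[n_boot - 1 - k]


def accuracy(y_true, y_pred):
    """Fraction of patients whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

A metric such as sensitivity would additionally have to handle resamples that contain no stroke cases (a zero denominator); `accuracy` avoids that issue, which is why it is used in this sketch.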
FIGURE 4 Importance of the predictors in the XGB model. The 20 variables with the highest relative importance are shown; importance is measured by the information gain contributed by splits on each variable. CHF, congestive heart failure; CPD, chronic pulmonary disease; PVD, peripheral vascular disease; inr, international normalized ratio; spo2, finger pulse oxygen saturation; sbp, systolic blood pressure; PUD, peptic ulcer disease; bun, blood urea nitrogen.
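In XGBoost, the gain of a split is the reduction in the regularized training loss, computed from the summed first-order gradients G and second-order hessians H of the samples sent to the left and right children (λ is the L2 penalty on leaf weights, γ the cost of adding a leaf). The sketch below computes that quantity and ranks features by their average gain across all splits, normalized to sum to 1 so the values read as relative importance. The helper names and the normalization are assumptions; in the xgboost library, `Booster.get_score(importance_type="gain")` reports the average gain per feature.

```python
from collections import defaultdict


def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """XGBoost split gain: 0.5 * [GL^2/(HL+lam) + GR^2/(HR+lam) - (GL+GR)^2/(HL+HR+lam)] - gamma."""
    def score(g, h):
        return g * g / (h + lam)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right)
                  - score(g_left + g_right, h_left + h_right)) - gamma


def gain_importance(splits, top_k=20):
    """Hypothetical helper: rank features by average split gain, normalized to sum to 1.

    splits: iterable of (feature_name, gain) pairs collected from every tree.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for feature, gain in splits:
        total[feature] += gain
        count[feature] += 1
    avg = {f: total[f] / count[f] for f in total}
    norm = sum(avg.values())
    ranked = sorted(avg.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return [(f, v / norm) for f, v in ranked]
```

Normalizing before truncating to the top 20 keeps each displayed value a share of the total importance across all features, not only among the ones shown.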
Confusion matrix of the XGBoost model.
| Data set | Actual class | Predicted: non-stroke | Predicted: stroke |
|---|---|---|---|
| Validation set | Non-stroke | 988 | 292 |
| | Stroke | 52 | 94 |
| Independent testing set | Non-stroke | 572 | 46 |
| | Stroke | 19 | 24 |
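Point estimates of accuracy, sensitivity, and specificity follow directly from the confusion-matrix counts. A minimal sketch, taking stroke as the positive class; `confusion_metrics` is a hypothetical helper.

```python
def confusion_metrics(tn, fp, fn, tp):
    """Hypothetical helper: point estimates from a 2x2 confusion matrix, stroke = positive class."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of stroke patients flagged as stroke
        "specificity": tn / (tn + fp),  # share of non-stroke patients flagged as non-stroke
        "ppv": tp / (tp + fp),          # precision of a stroke prediction
        "npv": tn / (tn + fn),          # reliability of a non-stroke prediction
    }


if __name__ == "__main__":
    for name, counts in {
        "validation": dict(tn=988, fp=292, fn=52, tp=94),
        "independent": dict(tn=572, fp=46, fn=19, tp=24),
    }.items():
        m = confusion_metrics(**counts)
        print(name, {k: round(v, 3) for k, v in m.items()})
```

For the validation-set counts this gives sensitivity 94/146 ≈ 0.644 and specificity 988/1280 ≈ 0.772; for the independent testing set, sensitivity 24/43 ≈ 0.558 and specificity 572/618 ≈ 0.926.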