Yan Zheng1,2, Yuan-Xiang Lin1,2,3,4, Qiu He1,2, Ling-Yun Zhuo1,2, Wei Huang1,2, Zhu-Yu Gao1,2, Ren-Long Chen1,2, Ming-Pei Zhao1,2, Ze-Feng Xie5, Ke Ma6, Wen-Hua Fang1,2,3,4, Deng-Liang Wang1,2,3,4, Jian-Cai Chen5, De-Zhi Kang1,2,3,4,6, Fu-Xin Lin1,2,3,4,6.
Abstract
Background: Stroke-associated pneumonia (SAP) contributes to high mortality in patients with spontaneous intracerebral hemorrhage (sICH). Accurate prediction of SAP and early intervention are associated with patient prognosis. None of the previously developed predictive scoring systems is widely accepted. We aimed to derive and validate novel supervised machine learning (ML) models to predict SAP events in patients with supratentorial sICH.
Keywords: ensemble model; intracerebral hemorrhage; machine learning; pneumonia; predict; stroke
Year: 2022 PMID: 36090880 PMCID: PMC9452786 DOI: 10.3389/fneur.2022.955271
Source DB: PubMed Journal: Front Neurol ISSN: 1664-2295 Impact factor: 4.086
Figure 1. Flowchart of the study. (A) Participant enrollment in the retrospective cohort of the Risa-MIS-ICH study; (B) data flow from the FAHFMU subcohort; (C) derivation of the prediction models and their internal, cross-, and external validation for SAP events. sICH, supratentorial intracerebral hemorrhage; ML, machine learning; LASSO, least absolute shrinkage and selection operator; SAP, stroke-associated pneumonia.
Baseline characteristics.
| Variable | Non-SAP, FAHFMU (n = 227) | SAP, FAHFMU (n = 97) | P | Non-SAP, external (n = 106) | SAP, external (n = 38) | P |
|---|---|---|---|---|---|---|
| Age (years) | 58.6 (±11.8) | 60.0 (±12.6) | 0.370 | 62.7 (±12.7) | 66.0 (±13.5) | 0.182 |
| Sex, n (%) | | | | | | |
| Male | 155 (68.3%) | 69 (71.1%) | 0.694 | 59 (55.7%) | 25 (65.8%) | 0.339 |
| Female | 72 (31.7%) | 28 (28.9%) | | 47 (44.3%) | 13 (34.2%) | |
| Medical history, n (%) | | | | | | |
| Hypertension | 163 (71.8%) | 74 (76.3%) | 0.416 | 65 (61.3%) | 27 (71.7%) | 0.329 |
| Diabetes mellitus | 29 (12.8%) | 13 (13.4%) | 1.000 | 4 (3.8%) | 3 (7.9%) | 0.381 |
| Heart disease | 8 (3.5%) | 4 (4.1%) | 1.000 | 2 (1.9%) | 2 (5.3%) | 0.284 |
| Smoking | 59 (26.0%) | 24 (24.7%) | 0.890 | - | - | - |
| Alcohol abuse | 59 (26.0%) | 23 (23.7%) | 0.679 | - | - | - |
| Previous surgery | 48 (21.1%) | 19 (19.6%) | 0.768 | 2 (1.9%) | 4 (10.5%) | 0.042 |
| Symptoms at onset, n (%) | | | | | | |
| Neurological dysfunction | 201 (88.5%) | 72 (74.2%) | 0.002 | 91 (85.8%) | 31 (81.6%) | 0.600 |
| Unconsciousness | 54 (23.8%) | 71 (73.2%) | <0.001 | 27 (25.5%) | 27 (71.1%) | <0.001 |
| Epileptic attack | 4 (1.8%) | 4 (4.1%) | 0.246 | 2 (1.9%) | 0 | 1.000 |
| Headache | 71 (31.3%) | 24 (24.7%) | 0.287 | 91 (85.8%) | 21 (55.3%) | <0.001 |
| Others | 93 (41.0%) | 39 (40.2%) | 0.903 | 94 (88.7%) | 29 (76.3%) | 0.105 |
| Time from onset to admission (h) | 12.0 (7.0, 24.0) | 10.0 (6.5, 16.0) | 0.022 | 3.0 (2.0, 8.3) | 3.0 (2.0, 4.5) | 0.103 |
| Vital signs | | | | | | |
| Temperature (°C) | 36.5 (36.5, 36.8) | 36.7 (36.5, 36.9) | 0.115 | 36.6 (36.5, 36.8) | 36.6 (36.5, 36.7) | 0.667 |
| Heart rate (min⁻¹) | 77 (±14) | 83 (±17) | 0.002 | 81 (±12) | 84 (±14) | 0.237 |
| Respiratory rate (min⁻¹) | 20 (19, 20) | 20 (19, 21) | 0.008 | 20 (20, 20) | 20 (20, 20) | 0.998 |
| Systolic BP (mmHg) | 158 (±24) | 162 (±25) | 0.145 | 170 (±24) | 174 (±27) | 0.473 |
| Diastolic BP (mmHg) | 93 (±15) | 92 (±14) | 0.610 | 100 (±15) | 101.8 (±16) | 0.453 |
| GCS score, n (%) | | | | | | |
| 15 | 106 (46.7%) | 12 (12.4%) | <0.001 | 80 (75.5%) | 10 (26.3%) | <0.001 |
| 13–14 | 77 (33.9%) | 33 (34.0%) | | 8 (7.5%) | 5 (13.2%) | |
| 9–12 | 31 (13.7%) | 19 (19.6%) | | 14 (13.2%) | 15 (39.5%) | |
| 5–8 | 13 (5.7%) | 33 (34.0%) | | 4 (3.8%) | 8 (21.1%) | |
| Hospital costs (thousand CNY) | 17.0 (12.5, 25.8) | 49.7 (34.4, 91.0) | <0.001 | 7.7 (6.5, 10.8) | 25.1 (14.6, 35.7) | <0.001 |
| Hospital stay (d) | 15 (11, 20) | 17 (13, 24) | 0.003 | 14 (12, 15) | 23 (15, 29) | <0.001 |
| Discharge outcome, n (%) | | | | | | |
| Home/nursing or rehabilitation | 96 (42.3%) | 46 (47.6%) | 0.463 | 97 (91.5%) | 29 (76.3%) | 0.022 |
| Care withdrawal or hospital death | 131 (57.7%) | 51 (52.6%) | | 9 (8.5%) | 9 (23.7%) | |
| Follow-up survival, n (%) | | | | | | |
| Survival ≥ 1 year | 168 (74.0%) | 63 (64.9%) | 0.009 | 77 (72.6%) | 20 (52.6%) | 0.013 |
| 3 months–1 year | 4 (1.8%) | 6 (6.2%) | | 2 (1.9%) | 2 (5.3%) | |
| <3 months | 7 (3.1%) | 10 (10.3%) | | 1 (0.9%) | 4 (10.5%) | |
| Loss of follow-up | 48 (21.1%) | 18 (18.6%) | | 26 (24.5%) | 12 (31.6%) | |
These prognostic variables were not included in further multivariate analysis and model derivations/validations.
SAP, stroke-associated pneumonia; BP, blood pressure; GCS, Glasgow Coma Scale; CNY, Chinese yuan.
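Table 1 does not name the tests behind its P values. As a hedged illustration only, a two-sided Fisher's exact test (an assumed choice, common for sparse 2 × 2 counts) can be sketched with the Python standard library; the function name `fisher_exact_two_sided` is hypothetical. Applied to the "Previous surgery" counts of the external subcohort (2 of 106 non-SAP vs 4 of 38 SAP patients), it yields P ≈ 0.042, which agrees with the tabulated value.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups and columns are (event, no event). With all
    margins fixed, the count in cell a follows a hypergeometric law; the P
    value sums the probabilities of every table no more likely than the
    observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # P(cell a == x) given the fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # The small relative tolerance keeps floating-point ties in the tail.
    tail = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, tail)


# "Previous surgery", external subcohort: 2 of 106 non-SAP vs 4 of 38 SAP.
p_prev_surgery = fisher_exact_two_sided(2, 104, 4, 34)
print(f"{p_prev_surgery:.3f}")  # prints 0.042
```

Swapping the row order gives the same P value, since the two-sided test does not depend on which group is listed first.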
Variables of laboratory results, imaging features, and early clinical interventions.
| Variable | Non-SAP, FAHFMU (n = 227) | SAP, FAHFMU (n = 97) | P | Non-SAP, external (n = 106) | SAP, external (n = 38) | P |
|---|---|---|---|---|---|---|
| RBC (10¹² L⁻¹) | 4.66 (4.30, 4.94) | 4.59 (4.15, 4.87) | 0.097 | 4.66 (4.29, 5.08) | 4.64 (4.30, 5.11) | 0.928 |
| Hemoglobin (g·L⁻¹) | 142.2 (±14.2) | 140.2 (±15.3) | 0.278 | 139.9 (±17.2) | 137.9 (±20.2) | 0.553 |
| Hematocrit | 0.41 (±0.04) | 0.41 (±0.04) | 0.681 | 0.42 (±0.05) | 0.41 (±0.05) | 0.335 |
| WBC (10⁹ L⁻¹) | 8.52 (6.61, 10.64) | 10.17 (7.54, 13.01) | <0.001 | 8.27 (6.62, 10.83) | 9.95 (7.77, 12.28) | 0.014 |
| Neutrophil (10⁹ L⁻¹) | 6.46 (4.42, 8.72) | 8.46 (5.49, 11.61) | <0.001 | 5.86 (4.41, 8.45) | 7.49 (5.03, 10.38) | 0.016 |
| Lymphocyte (10⁹ L⁻¹) | 1.29 (0.86, 1.66) | 1.04 (0.70, 1.39) | 0.001 | 1.37 (0.99, 1.82) | 1.46 (0.99, 1.90) | 0.724 |
| Platelet (10⁹ L⁻¹) | 217.4 (±62.3) | 214.8 (±63.3) | 0.897 | 235.1 (±62.4) | 221.2 (±55.7) | 0.227 |
| PT (s) | 11.1 (10.8, 11.7) | 11.1 (10.6, 11.9) | 0.925 | 11.3 (10.9, 11.8) | 11.4 (10.9, 12.2) | 0.307 |
| PT-INR | 0.97 (0.94, 1.02) | 0.97 (0.93, 1.04) | 0.554 | 0.98 (0.94, 1.03) | 0.99 (0.94, 1.07) | 0.294 |
| APTT (s) | 25.0 (22.2, 27.9) | 24.1 (21.8, 27.2) | 0.200 | 25.3 (23.9, 27.1) | 24.8 (23.2, 26.8) | 0.385 |
| Fibrinogen (g·L⁻¹) | 2.64 (2.23, 3.04) | 2.69 (2.30, 3.13) | 0.677 | 2.62 (2.20, 3.12) | 2.68 (2.35, 3.18) | 0.607 |
| Serum creatinine (μmol·L⁻¹) | 67.0 (54.0, 78.3) | 66.0 (54.7, 78.2) | 0.769 | 66.0 (57.0, 82.0) | 71.5 (58.8, 95.0) | 0.098 |
| Serum urea nitrogen (mmol·L⁻¹) | 5.02 (4.13, 5.94) | 5.15 (4.27, 6.59) | 0.259 | 4.85 (4.00, 5.83) | 5.10 (4.28, 6.85) | 0.276 |
| Serum sodium (mmol·L⁻¹) | 139.5 (±3.9) | 139.9 (±4.6) | 0.486 | 138.7 (±3.5) | 138.1 (±3.1) | 0.386 |
| Serum potassium (mmol·L⁻¹) | 3.80 (±0.42) | 3.84 (±0.47) | 0.474 | 3.88 (±0.53) | 3.92 (±0.61) | 0.723 |
| Serum calcium (mmol·L⁻¹) | 2.28 (±0.54) | 2.20 (±0.13) | 0.158 | 2.36 (±0.12) | 2.36 (±0.15) | 0.802 |
| Serum chloride (mmol·L⁻¹) | 102.0 (99.0, 105.0) | 102.6 (99.0, 105.0) | 0.743 | 100.6 (97.8, 102.5) | 99.4 (96.3, 101.4) | 0.058 |
| sICH volume (cc) | 8.7 (3.9, 17.2) | 22.5 (9.4, 37.9) | <0.001 | 6.8 (3.5, 13.4) | 21.7 (6.3, 40.4) | <0.001 |
| Lobar involvement, n (%) | 38 (16.7%) | 25 (25.8%) | 0.067 | 23 (21.7%) | 12 (31.6%) | 0.271 |
| Frontal lobe | 17 (7.5%) | 14 (14.4%) | 0.063 | 8 (7.5%) | 5 (13.2%) | 0.328 |
| Parietal lobe | 15 (6.6%) | 13 (13.4%) | 0.054 | 10 (9.4%) | 4 (10.5%) | 1.000 |
| Temporal lobe | 17 (7.5%) | 14 (14.4%) | 0.063 | 10 (9.4%) | 9 (23.47%) | 0.047 |
| Occipital lobe | 7 (3.1%) | 3 (3.1%) | 1.000 | 5 (4.7%) | 2 (5.3%) | 1.000 |
| Deep involvement, n (%) | 204 (89.9%) | 87 (89.7%) | 1.000 | 87 (82.1%) | 31 (81.6%) | 1.000 |
| Basal ganglia | 174 (76.7%) | 74 (76.3%) | 1.000 | 66 (62.3%) | 29 (76.3%) | 0.162 |
| Thalamus | 56 (24.7%) | 33 (34.0%) | 0.103 | 33 (31.1%) | 11 (28.9%) | 0.841 |
| Corona radiata | 5 (2.2%) | 4 (4.1%) | 0.552 | 6 (5.7%) | 6 (15.8%) | 0.082 |
| Insular lobe | 4 (1.8%) | 1 (1.0%) | 1.000 | 9 (8.5%) | 6 (15.8%) | 0.223 |
| Intraventricular involvement, n (%) | 60 (26.4%) | 47 (48.5%) | <0.001 | 37 (34.9%) | 15 (39.5%) | 0.695 |
| Unilateral ventricle | 26 (11.5%) | 13 (13.4%) | <0.001 | 21 (19.8%) | 7 (18.4%) | 0.227 |
| Bilateral ventricles | 33 (14.5%) | 33 (34.0%) | | 15 (14.2%) | 8 (21.1%) | |
| Third ventricle | 29 (12.8%) | 26 (26.8%) | 0.003 | 17 (16.0%) | 10 (26.3%) | 0.224 |
| Fourth ventricle | 24 (10.6%) | 22 (22.7%) | 0.006 | 14 (13.2%) | 7 (18.4%) | 0.593 |
| Subarachnoid involvement, n (%) | 7 (3.1%) | 8 (8.2%) | 0.050 | 3 (2.8%) | 1 (2.6%) | 1.000 |
| ICU stay, n (%) | 14 (6.2%) | 39 (40.2%) | <0.001 | 0 | 8 (21.1%) | <0.001 |
| Nasogastric feeding, n (%) | 59 (26.0%) | 84 (86.6%) | <0.001 | 11 (10.4%) | 24 (63.2%) | <0.001 |
| Airway support, n (%) | | | | | | |
| None | 215 (94.7%) | 48 (49.5%) | <0.001 | 105 (99.1%) | 30 (78.9%) | <0.001 |
| Endotracheal intubation ≤ 24 h or naso-/oropharyngeal airway | 2 (0.9%) | 13 (13.4%) | | 0 | 4 (10.5%) | |
| Endotracheal intubation > 24 h or tracheotomy | 10 (4.4%) | 36 (37.1%) | | 1 (0.9%) | 4 (10.5%) | |
| Surgery, n (%) | 18 (7.9%) | 50 (51.5%) | <0.001 | 14 (13.2%) | 22 (57.9%) | <0.001 |
| Only sICH evacuation | 11 (4.8%) | 20 (20.6%) | <0.001 | 0 | 4 (10.5%) | 0.004 |
| Only endoscopic sICH evacuation | 1 (0.4%) | 1 (1.0%) | 0.510 | 0 | 0 | - |
| Only sICH catheter evacuation | 0 | 2 (2.1%) | 0.089 | 9 (8.5%) | 7 (18.4%) | 0.089 |
| Only EVD approach | 4 (1.8%) | 15 (15.5%) | <0.001 | 3 (2.8%) | 9 (23.7%) | <0.001 |
| Ensemble approaches | 2 (0.9%) | 12 (12.4%) | <0.001 | 2 (1.9%) | 2 (5.3%) | 0.573 |
These prognostic variables were not included in further multivariate analysis and model derivations/validations.
RBC, red blood cell; WBC, white blood cell; PT, prothrombin time; INR, international normalized ratio; APTT, activated partial thromboplastin time; sICH, supratentorial intracerebral hemorrhage; ICU, intensive care unit; EVD, external ventricular drainage.
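Table 2 likewise leaves its tests unnamed. For a 2 × 2 comparison such as intraventricular involvement, Pearson's chi-square test with Yates's continuity correction is one conventional option; the sketch below assumes that choice, and `chi_square_2x2` is a hypothetical helper. For the FAHFMU counts (60 of 227 non-SAP vs 47 of 97 SAP patients) it gives χ² ≈ 13.9 with P < 0.001, in line with the reported P value, although agreement alone does not establish which test the authors ran.

```python
from math import erfc, sqrt


def chi_square_2x2(a: int, b: int, c: int, d: int, yates: bool = True) -> tuple[float, float]:
    """Pearson chi-square test (1 df) for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, P value). With yates=True, n/2 is subtracted from
    |ad - bc| (floored at zero) as Yates's continuity correction.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    stat = n * diff ** 2 / denom
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))


# "Intraventricular involvement", FAHFMU subcohort: 60 of 227 non-SAP vs 47 of 97 SAP.
stat, p = chi_square_2x2(60, 227 - 60, 47, 97 - 47)
print(f"chi2 = {stat:.2f}, P = {p:.1e}")
```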
Figure 2. Importance ranking of the six independent variables selected by LASSO regression: (1) nasogastric feeding, (2) airway support, (3) unconsciousness at onset, (4) surgery for EVD, (5) larger sICH volume, and (6) ICU stay. EVD, external ventricular drainage; ICU, intensive care unit.
Figure 3. Multivariate analysis and variable selection with LASSO regression. The tuning parameter (λ) was chosen to minimize the MSE under 10-fold cross-validation, and features with nonzero coefficients at that λ were selected. (A) MSE plotted against log λ; the optimal λ of 0.02477, chosen by the minimum-MSE criterion, is marked by a black vertical dashed line. (B) LASSO coefficient profiles of the features. Each colored line traces the coefficient of one feature; six features have nonzero coefficients at λ = 0.02477 and were selected as independent variables. MSE, mean-square error.
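The selection rule in Figure 3 (fit LASSO over a grid of λ values, pick the λ with the lowest 10-fold cross-validated MSE, keep the features whose coefficients are nonzero at that λ) can be sketched in pure Python. This is a minimal illustration under stated assumptions: it uses the squared-error LASSO objective (matching the MSE criterion in panel A) solved by coordinate descent, a hand-picked λ grid, and synthetic data. The excerpt does not state the software, loss function, or grid behind λ = 0.02477, and every function name here is hypothetical.

```python
import random


def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0), the LASSO shrinkage operator."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_fit(X, y, lam, max_sweeps=200, tol=1e-8):
    """Coordinate descent for (1/2n) * ||y - b0 - X b||^2 + lam * ||b||_1.

    Features are assumed to be on comparable scales (e.g. standardized),
    because the penalty treats every coefficient alike.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]  # residual while every b_j is zero
    sq = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(max_sweeps):
        max_step = 0.0
        for j in range(p):
            if sq[j] == 0.0:
                continue  # constant column: coefficient stays at zero
            # Correlation of feature j with the partial residual (own term added back).
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + sq[j] * beta[j]
            step = soft_threshold(rho, lam) / sq[j] - beta[j]
            if step != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * step
                beta[j] += step
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    b0 = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return b0, beta


def predict(b0, beta, row):
    return b0 + sum(b * x for b, x in zip(beta, row))


def cv_lasso(X, y, lambdas, k=10, seed=0):
    """Return (lambda with minimum k-fold CV mean-squared error, list of CV MSEs)."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    cv_mse = []
    for lam in lambdas:
        sse = 0.0
        for fold in folds:
            held = set(fold)
            train = [i for i in idx if i not in held]
            b0, beta = lasso_fit([X[i] for i in train], [y[i] for i in train], lam)
            sse += sum((y[i] - predict(b0, beta, X[i])) ** 2 for i in fold)
        cv_mse.append(sse / len(y))
    best = min(range(len(lambdas)), key=cv_mse.__getitem__)
    return lambdas[best], cv_mse


# Synthetic data: 120 cases, 5 standard-normal features, only features 0 and 1 matter.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(120)]
y = [2 * row[0] - 1.5 * row[1] + rng.gauss(0, 0.5) for row in X]
lambdas = [1.0, 0.5, 0.2, 0.1, 0.05, 0.02]
lam, mse = cv_lasso(X, y, lambdas, k=10)
_, beta = lasso_fit(X, y, lam)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(lam, selected)
```

On this synthetic data the minimum-MSE λ falls at the small end of the grid and both informative features are retained; large λ values shrink every coefficient to exactly zero, which is what drives the coefficient paths in panel B toward the axis.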
Figure 4. ROC curves for SAP in the (A) internal and (B) external validation datasets. A higher AUC indicates better discriminative ability. ROC, receiver operating characteristic; AUC, area under the curve.
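The AUC compared in Figure 4 equals the probability that a randomly chosen patient with SAP receives a higher predicted risk than a randomly chosen patient without SAP, with ties counted as one half. The sketch below computes it that way and, for comparison, as the trapezoidal area under the empirical ROC curve; the scores are toy values rather than study data, and the helper names are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score of a random positive > score of a random negative).

    Ties count one half, which is the Mann-Whitney form of the ROC area.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def roc_points(labels, scores):
    """(FPR, TPR) pairs from sweeping the threshold down through each distinct score."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy example: two SAP (1) and two non-SAP (0) patients with hypothetical risk scores.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, y_score))                     # 0.75
print(trapezoid_area(roc_points(y_true, y_score)))  # 0.75
```

Both routes agree because the trapezoidal ROC area and the pairwise-ranking probability are the same quantity; the pairwise form is O(n²) and is shown only for clarity.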
Performance metrics of the ML models in the FAHFMU validation dataset and external subcohort.
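| Model | AUC (95% CI) | Accuracy (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) |
|---|---|---|---|---|
| FAHFMU validation dataset | | | | |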
| LR | 0.838 (0.765, 0.911) | 0.827 (0.752, 0.901) | 0.615 (0.519, 0.712) | 0.903 (0.844, 0.961) |
| GNB | 0.861 (0.793, 0.930) | 0.816 (0.740, 0.893) | 0.615 (0.519, 0.712) | 0.889 (0.827, 0.951) |
| RF | 0.837 (0.763, 0.910) | 0.816 (0.740, 0.893) | 0.462 (0.363, 0.560) | 0.944 (0.899, 0.990) |
| KNN | 0.807 (0.729, 0.885) | 0.786 (0.704, 0.867) | 0.500 (0.401, 0.599) | 0.889 (0.827, 0.951) |
| SVM | 0.770 (0.687, 0.854) | 0.786 (0.704, 0.867) | 0.500 (0.401, 0.599) | 0.889 (0.827, 0.951) |
| XGB | 0.839 (0.766, 0.912) | 0.827 (0.752, 0.901) | 0.692 (0.601, 0.784) | 0.875 (0.810, 0.940) |
| ESVM | 0.830 (0.756, 0.904) | 0.837 (0.764, 0.910) | 0.615 (0.519, 0.712) | 0.917 (0.862, 0.971) |
| External subcohort | | | | |
| LR | 0.867 (0.812, 0.923) | 0.812 (0.749, 0.876) | 0.447 (0.366, 0.529) | 0.943 (0.906, 0.981) |
| GNB | 0.856 (0.798, 0.913) | 0.833 (0.772, 0.894) | 0.553 (0.471, 0.634) | 0.934 (0.893, 0.975) |
| RF | 0.844 (0.784, 0.903) | 0.806 (0.741, 0.870) | 0.368 (0.290, 0.447) | 0.962 (0.931, 0.993) |
| KNN | 0.734 (0.662, 0.806) | 0.778 (0.710, 0.846) | 0.395 (0.315, 0.475) | 0.915 (0.870, 0.961) |
| SVM | 0.730 (0.658, 0.803) | 0.778 (0.710, 0.846) | 0.395 (0.315, 0.475) | 0.915 (0.870, 0.961) |
| XGB | 0.856 (0.799, 0.913) | 0.792 (0.725, 0.858) | 0.421 (0.340, 0.502) | 0.925 (0.881, 0.968) |
| ESVM | 0.843 (0.784, 0.902) | 0.812 (0.749, 0.876) | 0.447 (0.366, 0.529) | 0.943 (0.906, 0.981) |
AUC, area under the curve; LR, logistic regression; GNB, Gaussian naïve Bayes; RF, random forest; KNN, K-nearest neighbor; SVM, support vector machine; XGB, extreme gradient boosting; ESVM, ensemble soft voting model.
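ESVM is described only as an ensemble soft voting model; the excerpt does not say which base models or weights it combines. A minimal sketch of soft voting (average each model's predicted SAP probability, then threshold, assumed here at 0.5) is paired below with the confusion-matrix definitions of accuracy, sensitivity, and specificity used in the table. The final lines are a consistency check, not study data: counts of 17 TP, 21 FN, 100 TN, and 6 FP, back-calculated from the LR row for the external subcohort and its 38 SAP / 106 non-SAP split, reproduce the reported 0.812, 0.447, and 0.943, and a normal-approximation (Wald) interval with n = 144 reproduces the reported accuracy interval (0.749, 0.876). All function names are hypothetical.

```python
def soft_vote(prob_lists, weights=None):
    """Weighted mean of the positive-class probabilities from several models.

    prob_lists[m][i] is model m's predicted probability of SAP for patient i.
    """
    weights = weights or [1.0] * len(prob_lists)
    total = sum(weights)
    n = len(prob_lists[0])
    return [sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total for i in range(n)]


def confusion_metrics(labels, preds):
    """Accuracy, sensitivity, and specificity for 0/1 labels (1 = SAP)."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def wald_ci(p, n, z=1.96):
    """Normal-approximation (Wald) interval for a proportion p estimated from n cases."""
    half = z * (p * (1 - p) / n) ** 0.5
    return max(0.0, p - half), min(1.0, p + half)


# Toy example: three hypothetical base models scoring four patients,
# thresholded at 0.5 after soft voting.
avg = soft_vote([[0.9, 0.2, 0.6, 0.4], [0.7, 0.1, 0.3, 0.6], [0.8, 0.3, 0.5, 0.2]])
toy_preds = [1 if q >= 0.5 else 0 for q in avg]  # -> [1, 0, 0, 0]

# Consistency check against the LR row for the external subcohort. The counts
# (17 TP, 21 FN, 100 TN, 6 FP) are back-calculated from the reported rates and
# the 38 SAP / 106 non-SAP split; they are not taken from the article.
labels = [1] * 38 + [0] * 106
preds = [1] * 17 + [0] * 21 + [0] * 100 + [1] * 6
m = confusion_metrics(labels, preds)
print({k: round(v, 3) for k, v in m.items()})
print(tuple(round(x, 3) for x in wald_ci(m["accuracy"], len(labels))))
```

Soft voting averages probabilities rather than counting class votes, so a confident base model can outweigh two hesitant ones.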
Figure 5. Kaplan–Meier curves for participants with and without SAP over 1-year follow-up. Shaded areas represent the 95% confidence intervals of the survival rates.
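Figure 5 does not state how its confidence bands were computed. A minimal Kaplan–Meier sketch with plain Greenwood intervals (one common option, assumed here) shows how the survival estimate steps down at each death and how censored patients, such as those lost to follow-up, leave the risk set without producing a step. The six patients are hypothetical and `kaplan_meier` is an illustrative name.

```python
from math import sqrt


def kaplan_meier(times, events, z=1.96):
    """Kaplan-Meier estimate with plain Greenwood confidence intervals.

    times: follow-up time per patient; events: 1 = death observed, 0 = censored.
    Returns (time, survival, lower, upper) at each time with at least one death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, greenwood_sum = 1.0, 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Group all patients who share this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            censored += 1 - data[i][1]
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            if at_risk > deaths:
                # Greenwood variance term: d / (n * (n - d)).
                greenwood_sum += deaths / (at_risk * (at_risk - deaths))
            half = z * surv * sqrt(greenwood_sum)
            curve.append((t, surv, max(0.0, surv - half), min(1.0, surv + half)))
        # Deaths and censored patients both leave the risk set after time t.
        at_risk -= deaths + censored
    return curve


# Toy data (days): six hypothetical patients, event 0 = censored.
km = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
for t, s, lo, hi in km:
    print(t, round(s, 3), round(lo, 3), round(hi, 3))
```

The censored patient at day 4 produces no step, but the death at day 5 then removes the only remaining patient and drives the estimate to zero.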