Hui-Chu Tsai, Cheng-Yang Hsieh, Sheng-Feng Sung.
Abstract
Background: Identifying patients at high risk of stroke-associated pneumonia (SAP) may allow potential interventions to be targeted to reduce its incidence. We aimed to explore how well machine learning (ML) and natural language processing (NLP) techniques, applied to structured data and unstructured clinical text, could predict SAP compared with conventional risk scores.
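This excerpt does not say which NLP method produced the text features. The sketch below shows one common option and should be read as an illustrative assumption only. It turns free-text notes into TF-IDF weights using a smoothed inverse document frequency, then appends those weights to each patient's structured variables. The helper names `tokenize`, `tfidf_features`, and `combine` are hypothetical, and so are the two structured columns in the usage line.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the note and keep alphabetic tokens only (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_features(notes):
    """Return (vocabulary, rows) where each row holds TF-IDF weights for one note.

    TF is the term's relative frequency in the note; IDF uses the smoothed form
    ln((1 + n_docs) / (1 + df)) + 1, so a term found in every note still gets weight 1.
    """
    docs = [Counter(tokenize(note)) for note in notes]
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    doc_freq = {t: sum(1 for doc in docs if t in doc) for t in vocab}
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}
    rows = []
    for doc in docs:
        total = sum(doc.values()) or 1  # avoid dividing by zero for an empty note
        rows.append([doc[t] / total * idf[t] for t in vocab])
    return vocab, rows


def combine(structured_rows, text_rows):
    """Concatenate structured variables and text features patient by patient."""
    return [list(s) + list(t) for s, t in zip(structured_rows, text_rows)]


notes = ["Patient has dysphagia and dysarthria", "No dysphagia noted"]
vocab, text_rows = tfidf_features(notes)
# hypothetical structured columns: age and male sex (1 = male)
features = combine([[70, 1], [65, 0]], text_rows)
```

A plain bag-of-words representation ignores negation. In this example, "No dysphagia noted" still gives the term "dysphagia" a nonzero weight, which is why clinical NLP pipelines usually need some form of negation handling.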
Keywords: machine learning; natural language processing; pneumonia; prediction; risk score; stroke
Year: 2022 PMID: 36249261 PMCID: PMC9556866 DOI: 10.3389/fpubh.2022.1009164
Source DB: PubMed Journal: Front Public Health ISSN: 2296-2565
Risk scores for predicting stroke-associated pneumonia.
| Variable | A2DS2 | ISAN | PNA | ACDD4 |
|---|---|---|---|---|
| Age (years) | | | | |
| ≥70 | | | +1 | |
| ≥75 | +1 | | | +1 |
| 60–69 | | +3 | | |
| 70–79 | | +4 | | |
| 80–89 | | +6 | | |
| ≥90 | | +8 | | |
| Male | +1 | +1 | | |
| Diabetes | | | +1 | |
| AF | +1 | | | |
| CHF | | | | +1 |
| Pre-stroke dependency | | +2 | | |
| NIHSS | | | | |
| 5–15 | +3 | +5 | +3 | |
| ≥16 | +5 | | +5 | |
| 16–20 | | +8 | | |
| ≥21 | | +10 | | |
| Dysphagia | +2 | | | +4 |
| Dysarthria | | | | +1 |
AF, atrial fibrillation; CHF, congestive heart failure; NIHSS, National Institutes of Health Stroke Scale.
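Two of these scores, A2DS2 and ACDD4, are simple point sums. The sketch below follows their published point values, which also appear in the table above. It assumes boolean inputs for the clinical findings and integer values for age and NIHSS. The function names are hypothetical, and ISAN and PNA could be written in the same way.

```python
def a2ds2(age, male, af, dysphagia, nihss):
    """A2DS2: Age >= 75 (+1), Atrial fibrillation (+1), Dysphagia (+2),
    male Sex (+1), Stroke severity by NIHSS (5-15: +3, >= 16: +5)."""
    score = 0
    if age >= 75:
        score += 1
    if af:
        score += 1
    if dysphagia:
        score += 2
    if male:
        score += 1
    if nihss >= 16:
        score += 5
    elif nihss >= 5:
        score += 3
    return score


def acdd4(age, chf, dysarthria, dysphagia):
    """ACDD4: Age >= 75 (+1), Congestive heart failure (+1),
    Dysarthria (+1), Dysphagia (+4)."""
    score = 0
    if age >= 75:
        score += 1
    if chf:
        score += 1
    if dysarthria:
        score += 1
    if dysphagia:
        score += 4
    return score


# A hypothetical 78-year-old man with AF, dysphagia, dysarthria, no CHF, and NIHSS 12
print(a2ds2(age=78, male=True, af=True, dysphagia=True, nihss=12))  # 1 + 1 + 2 + 1 + 3 = 8
print(acdd4(age=78, chf=False, dysarthria=True, dysphagia=True))  # 1 + 1 + 4 = 6
```

Because dysphagia carries the largest weight in ACDD4, that score is driven mostly by swallowing status. This fits the large difference in dysphagia prevalence between patients with and without SAP.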
Baseline characteristics of the study population.
| Variable | All patients | SAP | Non-SAP | P value |
|---|---|---|---|---|
| Age (years) | 70 (59–78) | 72 (61–80) | 69 (59–78) | <0.001 |
| Male | 3,643 (61.6) | 308 (68.4) | 3,335 (61.0) | 0.002 |
| Hypertension | 4,739 (80.2) | 361 (80.2) | 4,378 (80.1) | 0.966 |
| Diabetes | 2,422 (41.0) | 188 (41.8) | 2,234 (40.9) | 0.714 |
| Hyperlipidemia | 3,167 (53.6) | 187 (41.6) | 2,980 (54.6) | <0.001 |
| AF | 822 (13.9) | 106 (23.6) | 716 (13.1) | <0.001 |
| CHF | 226 (3.8) | 30 (6.7) | 196 (3.6) | 0.001 |
| COPD | 397 (6.7) | 34 (7.6) | 363 (6.6) | 0.458 |
| Smoking | 2,431 (41.1) | 202 (44.9) | 2,229 (40.8) | 0.090 |
| Pre-stroke dependency | 562 (9.5) | 80 (17.8) | 482 (8.8) | <0.001 |
| Pre-stroke mRS | 0 (0–0) | 0 (0–1) | 0 (0–0) | <0.001 |
| NIHSS | 5 (3–11) | 17 (9–27) | 5 (3–10) | <0.001 |
| GCS | 15 (14–15) | 13 (8–15) | 15 (15–15) | <0.001 |
| Dysphagia | 1,195 (20.2) | 282 (62.7) | 913 (16.7) | <0.001 |
| Dysarthria | 3,039 (51.4) | 338 (75.1) | 2,701 (49.4) | <0.001 |
| Glucose (mmol/L) | 7.38 (6.11–9.99) | 7.77 (6.27–10.43) | 7.33 (6.11–9.96) | 0.030 |
| WBC (×10⁹/L) | 7.68 (6.19–9.61) | 8.49 (6.63–10.96) | 7.63 (6.16–9.47) | <0.001 |
| A2DS2 | 4 (1–5) | 6 (4–6) | 3 (1–5) | <0.001 |
| ISAN | 7 (4–10) | 11 (8–14) | 7 (4–9) | <0.001 |
| PNA | 4 (1–5) | 5 (4–6) | 4 (1–5) | <0.001 |
| ACDD4 | 1 (0–2) | 5 (2–5) | 1 (0–2) | <0.001 |
P values are comparisons between patients with SAP and those without SAP for each variable.
Data are given as n (%) or median (interquartile range).
AF, atrial fibrillation; CHF, congestive heart failure; COPD, chronic obstructive pulmonary disease; GCS, Glasgow coma scale; mRS, modified Rankin Scale; NIHSS, National Institutes of Health Stroke Scale; SAP, stroke-associated pneumonia; WBC, white blood cells.
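This excerpt does not name the statistical tests behind these P values. Pearson's chi-square test is a common choice for categorical rows, and the sketch below implements it for a 2×2 table using only the standard library. The group sizes in the usage line (450 patients with SAP and 5,464 without) are an assumption: they were back-calculated from the n (%) entries and are not stated in this excerpt. Rows reported as medians, such as NIHSS, would instead need a rank-based test. The function name `chi2_2x2` is hypothetical.

```python
import math


def chi2_2x2(a, n1, b, n2):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    a of n1 patients in group 1 and b of n2 patients in group 2 have the
    characteristic. Returns the statistic and its P value (1 degree of freedom).
    """
    total = n1 + n2
    with_trait = a + b
    without_trait = total - with_trait
    observed = [a, n1 - a, b, n2 - b]
    expected = [n1 * with_trait / total, n1 * without_trait / total,
                n2 * with_trait / total, n2 * without_trait / total]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # a 1-df chi-square variable is a squared standard normal, so P = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Male sex: 308 of 450 SAP patients vs 3,335 of 5,464 non-SAP patients (assumed group sizes)
stat, p = chi2_2x2(308, 450, 3335, 5464)
print(f"chi-square = {stat:.2f}, P = {p:.3f}")
```

With these assumed group sizes, the male-sex row gives a P value of about 0.002, which agrees with the table.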
Performance of prediction models for predicting SAP.
| Model | AUC (95% CI) | Accuracy | Precision | Recall | F1 score |
|---|---|---|---|---|---|
| ML model A | 0.840 (0.806–0.875) | 83.2% | 0.254 | 0.634 | 0.363 |
| ML model B | 0.828 (0.793–0.863) | 76.3% | 0.212 | 0.786 | 0.334 |
| A2DS2 | 0.803 (0.762–0.845) | 75.1% | 0.197 | 0.741 | 0.311 |
| ISAN | 0.795 (0.752–0.837) | 76.9% | 0.202 | 0.696 | 0.313 |
| PNA | 0.778 (0.735–0.822) | 75.9% | 0.189 | 0.661 | 0.294 |
| ACDD4 | 0.807 (0.766–0.849) | 73.5% | 0.193 | 0.786 | 0.310 |
AUC, area under the receiver operating characteristic curve; CI, confidence interval; ML, machine learning; SAP, stroke-associated pneumonia.
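Suppose the last three numeric columns of this table are precision, recall, and F1 score. Then each F1 value should equal the harmonic mean of the two values before it, and the check below confirms this for every row. The helper that derives all four threshold-based metrics from confusion-matrix counts is a hypothetical illustration; this excerpt does not report the underlying counts.

```python
def f1_from_precision_recall(precision, recall):
    """F1 score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def metrics_from_counts(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1_from_precision_recall(precision, recall)}


# last three columns of the table above, assumed to be (precision, recall, F1)
reported = {
    "ML model A": (0.254, 0.634, 0.363),
    "ML model B": (0.212, 0.786, 0.334),
    "A2DS2": (0.197, 0.741, 0.311),
    "ISAN": (0.202, 0.696, 0.313),
    "PNA": (0.189, 0.661, 0.294),
    "ACDD4": (0.193, 0.786, 0.310),
}
for name, (precision, recall, f1) in reported.items():
    # each reported F1 agrees with the harmonic mean to three decimal places
    assert abs(f1_from_precision_recall(precision, recall) - f1) < 0.0005, name
```

F1 is high only when a model both finds SAP cases (recall) and avoids flagging many patients who do not develop it (precision). This matters when, as here, SAP occurs in a small minority of patients.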
Figure 1. Receiver operating characteristic curves for predicting stroke-associated pneumonia in the holdout test set by existing pneumonia risk scores and two ML models. ML model A was built using both structured variables and features extracted from the text. ML model B was built using structured variables alone. The AUC (95% CI) is shown for each model. AUC, area under the receiver operating characteristic curve; CI, confidence interval; ML, machine learning.
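The AUC summarizes a receiver operating characteristic curve as a single number. It equals the probability that a randomly chosen patient with SAP receives a higher predicted risk than a randomly chosen patient without SAP. The sketch below computes it directly from that definition (the Mann-Whitney form) by comparing every positive-negative pair. It runs in O(n²) time, so it is meant for illustration rather than large datasets. The function name and the six-patient data are hypothetical, and this excerpt does not state which software the authors used.

```python
def auc_mann_whitney(y_true, scores):
    """AUC as the probability that a randomly chosen positive case scores higher
    than a randomly chosen negative case; tied scores count as one half."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical integer risk scores and outcomes (1 = SAP) for six patients
y = [1, 1, 0, 0, 0, 1]
score = [8, 6, 3, 6, 1, 5]
print(auc_mann_whitney(y, score))
```

Giving ties half credit matters for integer risk scores such as A2DS2, which has a narrow range (0 to 10) and therefore many patients sharing the same value.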
Figure 2. Calibration plots for predicting stroke-associated pneumonia in the holdout test set by existing pneumonia risk scores and two ML models. The P value for the Hosmer-Lemeshow test is shown for each model. ML, machine learning.
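This excerpt does not state how patients were grouped for the Hosmer-Lemeshow test. A common variant sorts patients by predicted risk and splits them into 10 roughly equal-sized groups. Within each group g it compares observed events O_g with expected events E_g using the statistic Σ (O_g − E_g)² / (E_g (1 − E_g / n_g)). That statistic is then referred to a chi-square distribution with (groups − 2) degrees of freedom. The sketch below assumes that variant. The names `hosmer_lemeshow` and `chi2_sf_even_df` are hypothetical, and the closed-form chi-square tail only covers even degrees of freedom (8 when there are 10 groups).

```python
import math


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom,
    using the closed form exp(-x/2) * sum over i < df/2 of (x/2)^i / i!."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow statistic and P value using equal-sized groups of predicted risk."""
    if any(not 0.0 < p < 1.0 for p in y_prob):
        raise ValueError("predicted probabilities must lie strictly between 0 and 1")
    # a stable sort on risk alone keeps the original order among tied predictions
    ranked = sorted(zip(y_prob, y_true), key=lambda pair: pair[0])
    n = len(ranked)
    stat = 0.0
    for g in range(groups):
        chunk = ranked[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        size = len(chunk)
        stat += (observed - expected) ** 2 / (expected * (1 - expected / size))
    return stat, chi2_sf_even_df(stat, groups - 2)


# Hypothetical toy data: 20 patients, every prediction 0.5, alternating outcomes
stat, p = hosmer_lemeshow([0, 1] * 10, [0.5] * 20)
```

A small P value signals miscalibration. A large P value means only that the test found no evidence of miscalibration, and the test's power depends strongly on sample size.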
Figure 3. The top 20 most influential features identified by the model built on both structured variables and features extracted from the text. (A) The average impact of each feature on the model output, quantified as the mean absolute Shapley value. (B) A beeswarm plot of each feature's individual Shapley values, with one dot per patient: a dot's position on the x-axis denotes that feature's contribution to the model prediction for the corresponding patient, and its color indicates the relative value of the feature. AST, aspartate aminotransferase; BMI, body mass index; GCS, Glasgow coma scale; HR, heart rate; INR, international normalized ratio; NIHSS, National Institutes of Health Stroke Scale; WBC, white blood cells.
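The caption does not say how the Shapley values were computed. The sketch below is for intuition only. It computes exact Shapley values by enumerating every feature coalition, and it uses a single hypothetical baseline patient to stand in for features outside a coalition. The cost grows exponentially with the number of features, so practical tools rely on model-specific algorithms or approximations. Averaging the absolute values over patients gives a global ranking like the one in panel A. `shapley_values`, `mean_abs_shap`, and the toy linear model are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    A coalition's value is the model output when features in the coalition take
    the patient's values and all other features take the baseline values.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size that excludes feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mean_abs_shap(model, patients, baseline):
    """Global importance: mean absolute Shapley value of each feature over patients."""
    per_patient = [shapley_values(model, x, baseline) for x in patients]
    return [sum(abs(row[j]) for row in per_patient) / len(per_patient)
            for j in range(len(baseline))]


# Hypothetical linear risk model over three features
WEIGHTS = [2.0, -1.0, 0.5]


def model(z):
    return sum(w * v for w, v in zip(WEIGHTS, z))


phi = shapley_values(model, x=[1, 3, 4], baseline=[0, 1, 2])
```

For a linear model, each Shapley value reduces to weight × (feature value − baseline value). For example, the first feature gets 2.0 × (1 − 0) = 2.0. The values always sum to model(x) − model(baseline).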