Sheng-Feng Sung, Cheng-Yang Hsieh, Ya-Han Hu.
Abstract
BACKGROUND: Several prognostic scores have been proposed to predict functional outcomes after an acute ischemic stroke (AIS). Most of these scores are based on structured information and have been used to develop prediction models via logistic regression. With the increased use of electronic health records and the progress in computational power, data-driven predictive modeling using machine learning techniques is gaining popularity in clinical decision-making.
Keywords: MetaMap; acute ischemic stroke; bag-of-words; extreme gradient boosting; machine learning; natural language processing; outcome prediction; text classification; unstructured clinical text
Year: 2022; PMID: 35175201; PMCID: PMC8895286; DOI: 10.2196/29806
Source DB: PubMed; Journal: JMIR Med Inform
Figure 1. Model development and validation. ASTRAL: Acute Stroke Registry and Analysis of Lausanne; NIHSS: National Institutes of Health Stroke Scale; PLAN: preadmission comorbidities, level of consciousness, age, and neurological deficit.
Figure 2. Keyness plots showing the top 20 concepts that frequently appear in the (A) HPIs and (B) CT reports of patients with good or poor functional outcomes. The prefix before the concept is the concept unique identifier. A negated concept is suffixed with “_Neg”. CT: computed tomography; HPI: history of present illness.
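As a rough illustration of how a keyness score can single out concepts that are characteristic of one outcome group, the sketch below computes a signed chi-squared statistic per concept from a 2×2 contingency table. The caption does not say which keyness statistic was used, so the chi-squared choice (without continuity correction), the function name `keyness_chi2`, and the concept tokens are all assumptions made for illustration.

```python
from collections import Counter

def keyness_chi2(target_docs, reference_docs):
    """Signed chi-squared keyness per concept (assumed statistic, no continuity correction).

    Each document is a list of concept tokens. A positive score means the
    concept is relatively more frequent in the target group.
    """
    t = Counter(c for doc in target_docs for c in doc)
    r = Counter(c for doc in reference_docs for c in doc)
    n_t, n_r = sum(t.values()), sum(r.values())
    scores = {}
    for concept in set(t) | set(r):
        a, b = t[concept], r[concept]   # occurrences of this concept per group
        c, d = n_t - a, n_r - b         # occurrences of all other concepts
        n = a + b + c + d
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
        sign = 1 if a * d >= b * c else -1
        scores[concept] = sign * chi2
    return scores

# Hypothetical placeholder tokens, not real concept unique identifiers.
poor = [["CUI1_Hemiplegia", "CUI2_AtrialFibrillation"], ["CUI1_Hemiplegia"]]
good = [["CUI3_Dizziness"], ["CUI3_Dizziness", "CUI2_AtrialFibrillation"]]
scores = keyness_chi2(poor, good)
for concept, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{concept}: {score:+.2f}")
```

Sorting by score and taking the largest positive and negative values would give the two ends of a keyness plot like the one described in the caption.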
Baseline characteristics of the study population.
| Characteristics | All (N=3847) | Good functional outcome (n=2173) | Poor functional outcome (n=1674) | P value |
|---|---|---|---|---|
| Age (years), mean (SD) | 69.5 (12.3) | 66.1 (11.9) | 74.0 (11.4) | <.001 |
| Female, n (%) | 1583 (41.1) | 771 (35.5) | 812 (48.5) | <.001 |
| Hypertension, n (%) | 3098 (80.5) | 1694 (78) | 1404 (83.9) | <.001 |
| Diabetes mellitus, n (%) | 1602 (41.6) | 846 (38.9) | 756 (45.2) | <.001 |
| Hyperlipidemia, n (%) | 2195 (57.1) | 1323 (60.9) | 872 (52.1) | <.001 |
| Atrial fibrillation, n (%) | 684 (17.8) | 246 (11.3) | 438 (26.2) | <.001 |
| Congestive heart failure, n (%) | 196 (5.1) | 68 (3.1) | 128 (7.6) | <.001 |
| Cancer, n (%) | 249 (6.5) | 106 (4.9) | 143 (8.5) | <.001 |
| Preadmission dependence (mRS[a] score of >2), n (%) | 419 (10.9) | 29 (1.3) | 390 (23.3) | <.001 |
| Onset-to-admission delay (>3 hours), n (%) | 2763 (71.8) | 1574 (72.4) | 1189 (71) | .34 |
| NIHSS[b] score, median (IQR) | 5 (3-10) | 4 (2-6) | 10 (5-19) | <.001 |
| Glucose (mg/dL), mean (SD) | 163 (83) | 161 (82) | 166 (84) | .06 |
| PLAN[c] score, median (IQR) | 8 (6-12) | 7 (6-8) | 12 (9-17) | <.001 |
| ASTRAL[d] score, median (IQR) | 21 (18-27) | 19 (16-22) | 27 (22-39) | <.001 |
[a] mRS: modified Rankin Scale.
[b] NIHSS: National Institutes of Health Stroke Scale.
[c] PLAN: preadmission comorbidities, level of consciousness, age, and neurological deficit.
[d] ASTRAL: Acute Stroke Registry and Analysis of Lausanne.
Figure 3. (A) A bar chart showing the top 20 most important features of simple model 2 according to the average absolute SHAP values, which indicate the average impact on model output. (B) A bee swarm plot for the top 20 features in which each dot represents an individual patient. A dot’s position on the x-axis indicates the impact that a feature has on the model’s prediction for that patient. The color of the dot specifies the relative value of the corresponding feature (concept). A higher feature value means that the concept appears more times in the clinical text. The prefix before the concept is the concept unique identifier. A negated concept is suffixed with “_Neg”. CT: computed tomography; HPI: history of present illness; SHAP: Shapley additive explanations.
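The ranking in panel A rests on a simple aggregation: for each feature, average the absolute SHAP value over all patients, then sort. A minimal sketch of that step follows, assuming the per-patient SHAP values are already available as a matrix with one row per patient and one column per feature. The function name `rank_by_mean_abs_shap`, the feature names, and the toy numbers are hypothetical and are not taken from the study.

```python
def rank_by_mean_abs_shap(shap_values, feature_names, top_k=20):
    """Rank features by mean absolute SHAP value across patients.

    shap_values: list of rows, one per patient, one SHAP value per feature.
    Returns up to top_k (feature_name, mean_abs_shap) pairs, largest first.
    """
    n_patients = len(shap_values)
    mean_abs = [
        sum(abs(row[j]) for row in shap_values) / n_patients
        for j in range(len(feature_names))
    ]
    ranked = sorted(zip(feature_names, mean_abs), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]

# Toy values for two patients and three hypothetical concept features.
toy_shap = [
    [0.2, -0.5, 0.0],
    [-0.4, 0.3, 0.1],
]
toy_names = ["CUI1_Hemiplegia", "CUI2_Drowsiness", "CUI3_Dizziness_Neg"]
for name, value in rank_by_mean_abs_shap(toy_shap, toy_names):
    print(f"{name}: {value:.3f}")
```

Taking absolute values before averaging is what makes the bar chart a measure of impact regardless of direction; the bee swarm plot in panel B keeps the sign, so it also shows whether a concept pushes the prediction toward a poor or a good outcome.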
Figure 4. Receiver operating characteristic curves for predicting a poor functional outcome for (A) models without age and (B) models with age. ASTRAL: Acute Stroke Registry and Analysis of Lausanne; AUC: area under the receiver operating characteristic curve; CT: computed tomography; HPI: history of present illness; NIHSS: National Institutes of Health Stroke Scale; PLAN: preadmission comorbidities, level of consciousness, age, and neurological deficit.
Comparison of the performance of baseline models with or without added information from clinical text.
| Model | AUC[a] (95% CI) | P value | NRI[b], % (95% CI) | P value | IDI[c], % (95% CI) | P value |
|---|---|---|---|---|---|---|
| Age and NIHSS[d] score | 0.841 (0.815-0.867) | N/A[e] | N/A | N/A | N/A | N/A |
| Age and NIHSS score plus text | 0.861 (0.837-0.885) | .002 | 0.427 (0.302-0.551) | <.001 | 0.042 (0.029-0.054) | <.001 |
| PLAN[f] score | 0.837 (0.811-0.863) | N/A | N/A | N/A | N/A | N/A |
| PLAN score plus text | 0.856 (0.835-0.882) | <.001 | 0.543 (0.420-0.665) | <.001 | 0.038 (0.026-0.051) | <.001 |
| ASTRAL[g] score | 0.840 (0.814-0.866) | N/A | N/A | N/A | N/A | N/A |
| ASTRAL score plus text | 0.860 (0.837-0.884) | .004 | 0.443 (0.318-0.567) | <.001 | 0.044 (0.031-0.057) | <.001 |
[a] AUC: area under the receiver operating characteristic curve.
[b] NRI: net reclassification improvement.
[c] IDI: integrated discrimination improvement.
[d] NIHSS: National Institutes of Health Stroke Scale.
[e] N/A: not applicable.
[f] PLAN: preadmission comorbidities, level of consciousness, age, and neurological deficit.
[g] ASTRAL: Acute Stroke Registry and Analysis of Lausanne.
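To make the three comparison metrics concrete, the sketch below computes AUC, NRI, and IDI for a baseline model and an extended model on a toy data set. It assumes the category-free (continuous) form of NRI and the standard difference-in-discrimination-slopes form of IDI; the record above does not state which variants were used. The function names and the toy probabilities are illustrative, not values from the study.

```python
def auc(scores, labels):
    """AUC as the probability that a random event outranks a random nonevent (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def continuous_nri(old, new, labels):
    """Category-free NRI: net upward movement in events minus net upward movement in nonevents."""
    def net_up(indices):
        up = sum(new[i] > old[i] for i in indices)
        down = sum(new[i] < old[i] for i in indices)
        return (up - down) / len(indices)
    events = [i for i, y in enumerate(labels) if y == 1]
    nonevents = [i for i, y in enumerate(labels) if y == 0]
    return net_up(events) - net_up(nonevents)

def idi(old, new, labels):
    """IDI: change in discrimination slope (mean risk in events minus mean risk in nonevents)."""
    def slope(probs):
        ev = [p for p, y in zip(probs, labels) if y == 1]
        ne = [p for p, y in zip(probs, labels) if y == 0]
        return sum(ev) / len(ev) - sum(ne) / len(ne)
    return slope(new) - slope(old)

# Toy predicted probabilities of a poor outcome (1 = poor, 0 = good).
labels = [1, 1, 0, 0]
baseline = [0.6, 0.4, 0.5, 0.3]      # e.g. a structured-data-only model
with_text = [0.8, 0.5, 0.4, 0.3]     # e.g. the same model plus text features
print("AUC baseline:", auc(baseline, labels))
print("AUC with text:", auc(with_text, labels))
print("NRI:", continuous_nri(baseline, with_text, labels))
print("IDI:", round(idi(baseline, with_text, labels), 3))
```

AUC summarizes ranking quality for each model separately, whereas NRI and IDI are defined only for a pair of models, which is why the baseline rows in the table show N/A for those columns.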