Anna K Bonkhoff1,2, Nicole Rübsamen2, Christian Grefkes3,4, Natalia S Rost1, Klaus Berger2, André Karch2.
Abstract
Background: The treatment of stroke has been undergoing rapid changes. As treatment options progress, prediction of those at risk for complications becomes more important. Available models have, however, frequently been built on data no longer representative of today's care, in particular with respect to acute stroke management. Our aim was to build and validate prediction models for 4 clinically important, severe outcomes after stroke.

Methods and Results: We used German registry data from 152 710 patients with acute ischemic stroke obtained in 2016 (development) and 2017 (validation). We took into account potential predictors that were available at admission and focused on in-hospital mortality, intracranial mass effect, secondary intracerebral hemorrhage, and deep vein thrombosis as outcomes. Prediction and calibration performance in the validation cohort was assessed using the following 4 statistical approaches: logistic regression with backward selection, l1-regularized logistic regression, k-nearest neighbor, and gradient boosting classifier. In-hospital mortality and intracranial mass effect could be predicted with high accuracy (both areas under the curve, 0.90 [95% CI, 0.90-0.90]), whereas the areas under the curve for intracerebral hemorrhage (0.80 [95% CI, 0.80-0.80]) and deep vein thrombosis (0.73 [95% CI, 0.73-0.73]) were considerably lower. Stroke severity was the overall most important predictor. Models based on gradient boosting achieved better performance than those based on logistic regression for all outcomes. However, area under the curve estimates differed by a maximum of 0.02.

Conclusions: We validated prediction models for 4 severe outcomes after acute ischemic stroke based on routinely collected, recent clinical data. Model performance was superior to previously proposed approaches. These predictions may help to identify patients at risk early after stroke and thus facilitate an individualized level of care.
Keywords: ischemic stroke; machine learning; mortality; prediction; severe outcomes
Year: 2022 PMID: 35253466 PMCID: PMC9075320 DOI: 10.1161/JAHA.121.023175
Source DB: PubMed Journal: J Am Heart Assoc ISSN: 2047-9980 Impact factor: 6.106
Stroke Sample Characteristics
| Characteristic | 2016 and 2017, N=146 062 |
|---|---|
| Age, y | 72.7 (13.1) |
| Female sex | 69 234 (47.4) |
| Situation of living, before stroke | |
| Independently in own home | 117 055 (80.1) |
| Care at home | 15 847 (10.9) |
| Nursing home | 13 160 (9.0) |
| Comorbidities | |
| Diabetes | 42 944 (29.4) |
| Hypertension | 124 754 (85.4) |
| Previous myocardial infarct | 14 246 (9.8) |
| Previous stroke | 38 089 (26.1) |
| Hypercholesterinaemia | 84 644 (58.0) |
| Atrial fibrillation | |
| Yes, known before stroke | 28 962 (19.8) |
| Yes, previously unknown | 13 455 (9.2) |
| Stroke severity and symptoms at admission | |
| Stroke severity (NIHSS) | 5.9 (6.2) |
| Median (interquartile range) | 4 (6) |
| Motor impairments | 95 636 (65.5) |
| Language impairments | 44 684 (30.6) |
| Speech impairments | 63 962 (43.8) |
| Swallowing impairments | 32 168 (22.0) |
| Consciousness | |
| Awake | 134 357 (92.0) |
| Soporific‐stuporous | 10 034 (6.9) |
| Comatose | 1671 (1.1) |
| Rankin scale | |
| 0 | 7749 (5.3) |
| 1 | 20 541 (14.1) |
| 2 | 35 672 (24.4) |
| 3 | 34 920 (23.9) |
| 4 | 23 839 (16.3) |
| 5 | 23 341 (16.0) |
| Median (interquartile range) | 3 (2) |
| Barthel index: bladder function | |
| 0 | 29 459 (20.2) |
| 5 | 18 893 (12.9) |
| 10 | 97 710 (66.9) |
| Median (interquartile range) | 10 (5) |
| Barthel index: transfer | |
| 0 | 27 596 (18.9) |
| 5 | 24 527 (16.8) |
| 10 | 35 666 (24.4) |
| 15 | 58 273 (39.9) |
| Median (interquartile range) | 10 (10) |
| Barthel index: mobility | |
| 0 | 34 310 (23.5) |
| 5 | 27 645 (18.9) |
| 10 | 36 117 (24.7) |
| 15 | 47 990 (32.9) |
| Median (interquartile range) | 10 (10) |
| Admission, times, and therapies | |
| Intravenous thrombolysis | 24 989 (17.1) |
| Intraarterial thrombectomy and thrombolysis | 10 706 (7.3) |
| Time from symptom onset until admission | |
| <1 h | 11 825 (8.1) |
| 1–2 h | 23 177 (15.9) |
| 2–3 h | 13 889 (9.5) |
| 3–3.5 h | 4697 (3.2) |
| 3.5–4 h | 4229 (2.9) |
| 4–6 h | 13 366 (9.2) |
| 6–24 h | 29 721 (20.4) |
| 24–48 h | 10 923 (7.5) |
| >48 h | 17 539 (12.0) |
| Imaging before admission | 15 270 (10.5) |
| Intensive care admission | 7752 (5.3) |
| Stroke characteristics | |
| TOAST classification | |
| Atherothrombotic | 33 314 (22.8) |
| Embolic | 46 097 (31.6) |
| Microangiopathic | 30 003 (20.5) |
| Competing | 5754 (3.9) |
| Other | 5153 (3.5) |
| Uncertain | 25 741 (17.6) |
| Large vessel stenosis | |
| Stenosis | 137 411 (94.1) |
| No stenosis | 5335 (3.7) |
| Unknown, no diagnostic tests | 3316 (2.3) |
| Complications | |
| In‐hospital mortality | 7683 (5.3) |
| Intracranial mass effect | 2411 (1.7) |
| Secondary intracerebral hemorrhage | 2580 (1.8) |
| Deep vein thrombosis | 606 (0.4) |
Please note that the variable “Admission to intensive care” was only included in the prediction models of early mortality. Although admission to intensive care necessarily occurs before a fatal outcome, the temporal order was not known for any of the other 3 complications (ie, we could not exclude that admission to intensive care was a consequence of a complication). Continuous variables are presented as mean (SD) and categorical variables as absolute count (percentage). NIHSS indicates National Institutes of Health Stroke Scale. TOAST stands for the Trial of Org 10172 in Acute Stroke Treatment.
Prediction Results for All 4 Outcomes and Prediction Models in the Temporal Validation Cohort
| Classifier | In‐hospital mortality | Intracranial mass effect | Secondary intracerebral hemorrhage | Deep vein thrombosis |
|---|---|---|---|---|
| Logistic regression with backward selection | 0.90 (0.90–0.90) | 0.89 (0.89–0.89) | 0.79 (0.79–0.79) | 0.71 (0.71–0.71) |
| l1‐regularized logistic regression | 0.90 (0.90–0.90) | 0.90 (0.89–0.90) | 0.80 (0.79–0.80) | 0.73 (0.72–0.73) |
| k‐nearest neighbor classifier | 0.89 (0.89–0.89) | 0.88 (0.88–0.88) | 0.78 (0.78–0.78) | 0.71 (0.71–0.72) |
| Gradient boosting classifier | 0.90 (0.90–0.90) | 0.90 (0.90–0.90) | 0.80 (0.80–0.80) | 0.73 (0.72–0.73) |
Data are shown as area under the curve (95% CI).
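As a reading aid for the table above, the following minimal sketch shows one common way to compute an area under the curve: as the fraction of case/non-case pairs in which the case receives the higher predicted score, with ties counted as one half. The function name `roc_auc` and the toy inputs are illustrative assumptions and are not taken from the study's code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 outcomes (1 = complication occurred).
    scores: predicted risks, higher meaning more likely to be a case.
    Hypothetical helper for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # case ranked above non-case
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy data: 3 of the 4 case/non-case pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic in the number of patients, so for registry-scale data a rank-based implementation from an established library would be the practical choice; the version here only makes the definition explicit.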
Figure 1. The 10 most frequently selected variables in the backward stepwise logistic regression models.
After each initial downsampling step, we performed backward stepwise variable selection; that is, we kept only those input variables in the model that were significantly associated with the outcome. Because we repeated the downsampling step 100 times, an input variable could have been selected at most 100 times, that is, in 100% of the cases (x axis). The more often a variable is selected, the more important it may be considered for the prediction of a specific outcome. For secondary intracerebral hemorrhage, for example, thrombolysis and microangiopathic stroke etiology were selected in all 100 downsampling scenarios and may thus possess the highest predictive capacity. Atherothrombotic stroke etiology and imaging before admission were selected in ≈80% of the downsampling scenarios and hence did not contribute to prediction models in ≈20% of the cases, indicating a less consistent predictive capacity. Of note, we measured here only the overall relevance, not the direction of the association; each variable could thus have had a positive or negative effect on the outcome. In a second step, we retrained logistic models with the 10 most stable input variables in 100 further downsampled scenarios to compute odds ratios that indicate the direction of the effects (Table S11). Tables S7 through S10 furthermore present the group averages for patients with and without a specific outcome. Because the outcome deep vein thrombosis could not be predicted as well as the other outcomes, the relevance of input variables was also less certain, which may explain the lower overall percentages for selected variables. ICU indicates intensive care unit; NIHSS, National Institutes of Health Stroke Scale.
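The repeated downsampling and tallying described in this legend can be sketched roughly as follows. This is a hedged illustration, not the authors' implementation: the 1:1 case-to-non-case ratio is an assumption (the legend does not state the ratio), and `downsample`, `selection_frequency`, and the `select_variables` callback (standing in for backward stepwise selection) are hypothetical names.

```python
import random
from collections import Counter


def downsample(rows, outcome_key, rng):
    """Keep all cases and draw an equal number of non-cases at random.

    The 1:1 ratio is an assumption made for this sketch.
    """
    cases = [r for r in rows if r[outcome_key] == 1]
    controls = [r for r in rows if r[outcome_key] == 0]
    n_draw = min(len(cases), len(controls))
    return cases + rng.sample(controls, n_draw)


def selection_frequency(rows, outcome_key, select_variables,
                        n_repeats=100, seed=0):
    """Percentage of downsampled data sets in which each variable is kept.

    select_variables(data, outcome_key) is a placeholder for any variable
    selection routine (for example, backward stepwise selection) and must
    return the names of the variables it retained.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_repeats):
        balanced = downsample(rows, outcome_key, rng)
        # Count each variable at most once per repetition.
        counts.update(set(select_variables(balanced, outcome_key)))
    return {var: 100.0 * c / n_repeats for var, c in counts.items()}
```

A variable retained in every repetition ends up at 100%, matching the upper limit of the x axis described above; a variable retained in 80 of 100 repetitions would appear at 80%.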
Figure 2. Feature importance for the 10 most important input variables for each of the 4 outcomes based on the gradient boosting classifier models.
Feature importance, a measure inherent to tree‐based algorithms, is higher the more a variable contributes to the prediction of a specific outcome. Accordingly, the NIHSS score on admission was the most important variable in the prediction of in‐hospital mortality, increased intracranial pressure, and deep vein thrombosis, whereas administration of thrombolytic therapy was the most telling variable in the prediction of an intracerebral hemorrhage. Individual feature importances have been normalized by that of the top‐ranked input variable and therefore range from 0 to 1. NIHSS indicates National Institutes of Health Stroke Scale.
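The normalization mentioned in this legend amounts to dividing every importance value by the largest one, so that the top‐ranked variable is assigned 1. A minimal sketch follows; the function name `normalize_importance` and the raw values are made up for illustration and are not results from the study.

```python
def normalize_importance(importances):
    """Scale raw feature importances so the top-ranked variable equals 1.

    importances: mapping of variable name to a non-negative raw importance.
    Hypothetical helper for illustration only.
    """
    top = max(importances.values())
    if top <= 0:
        raise ValueError("at least one importance must be positive")
    return {name: value / top for name, value in importances.items()}


# Made-up raw importances, not values reported in the paper.
raw = {"nihss": 0.5, "age": 0.25, "thrombolysis": 0.125}
print(normalize_importance(raw))
# {'nihss': 1.0, 'age': 0.5, 'thrombolysis': 0.25}
```

Because each outcome is normalized by its own top variable, the resulting values are comparable within one outcome's panel but not across outcomes.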