Domenico Scrutinio1, Carlo Ricciardi2,3, Leandro Donisi1,4, Ernesto Losavio1, Petronilla Battista1, Pietro Guida1, Mario Cesarelli1,5, Gaetano Pagano1, Giovanni D'Addio1.
Abstract
Stroke is among the leading causes of death and disability worldwide. Approximately 20–25% of stroke survivors present severe disability, which is associated with increased mortality risk. Prognostication is inherent in the process of clinical decision-making. Machine learning (ML) methods have gained increasing popularity in biomedical research. The aim of this study was twofold: to assess the performance of tree-based ML algorithms in predicting three-year mortality in 1207 stroke patients with severe disability who completed rehabilitation, and to compare the performance of the ML algorithms with that of a standard logistic regression model. The logistic regression model achieved an area under the receiver operating characteristic curve (AUC) of 0.745 and was well calibrated. At the optimal risk threshold, the model had an accuracy of 75.7%, a positive predictive value (PPV) of 33.9%, and a negative predictive value (NPV) of 91.0%. The best ML model, a Random Forest trained after applying the synthetic minority oversampling technique (SMOTE), outperformed the logistic regression model, achieving an AUC of 0.928 and an accuracy of 86.3%, with a PPV of 84.6% and an NPV of 87.5%. This study is a step forward in the creation of standardisable tools for predicting health outcomes in individuals affected by stroke.
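The abstract names SMOTE without describing it. The following is a minimal, standard-library sketch of the core idea only: each synthetic minority-class sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. It is not the authors' implementation; the function name `smote`, the parameter names, and the toy (age, cognitive FIM) values are all hypothetical.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base (Euclidean), excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

# Hypothetical toy minority class: (age in years, cognitive FIM score)
deceased = [(70.0, 15.0), (78.0, 12.0), (82.0, 10.0), (75.0, 18.0)]
new_points = smote(deceased, n_new=6, k=2)
print(len(new_points), "synthetic samples generated")
```

In practice the oversampling would be applied to the training data only, so that synthetic points do not leak into the evaluation; whether the study did so is not stated in this record.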
Year: 2020 PMID: 33208913 PMCID: PMC7674405 DOI: 10.1038/s41598-020-77243-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Study workflow: data from 1207 patients at three facilities of the Maugeri Institute in southern and northern Italy were collected and used to build models, using multivariate logistic regression and tree-based ML algorithms, to predict three-year mortality in stroke patients after rehabilitation.
Baseline characteristics.
| Characteristic | Value | |
|---|---|---|
| Age (years), mean (SD) | 71 (12) | |
| < 65 years, n (%) | 289 (23.9) | |
| 65 to 74 years, n (%) | 348 (28.8) | |
| ≥ 75 years, n (%) | 570 (47.2) | |
| Male sex, n (%) | 667 (55.3) | |
| Marital status—married, n (%) | 863 (71.5) | |
| Retired, n (%) | 793 (65.7) | |
| Hypertension, n (%) | 874 (72.4) | |
| Diabetes, n (%) | 352 (29.2) | |
| COPD, n (%) | 170 (14.1) | |
| CAD, n (%) | 149 (12.3) | |
| Atrial fibrillation, n (%) | 295 (24.4) | |
| Anemia (haemoglobin < 13 g/dL in men, < 12 g/dL in women), n (%) | 406 (33.6) | |
| Renal dysfunction (eGFR < 60 mL/min/1.73 m2), n (%) | 206 (17.1) | |
| CMG 108, n (%) | 136 (11.3) | |
| CMG 109, n (%) | 121 (10.0) | |
| CMG 110, n (%) | 950 (78.7) | |
| Time from stroke onset to rehabilitation admission ≤ 30 days, n (%) | 933 (77.3) | |
| Ischemic stroke, n (%) | 971 (80.4) | |
| Haemorrhagic stroke, n (%) | 236 (19.6) | |
| Dysphagia, n (%) | 226 (18.7) | |
| Neglect, n (%) | 170 (14.1) | |
| Aphasia, n (%) | 525 (43.4) | |
| Right body, n (%) | 602 (49.9) | |
| Left body, n (%) | 605 (50.1) | |
| Motor-FIM score at admission, mean (SD) | 18.6 (5.6) | |
| Cognitive-FIM score at admission, mean (SD) | 17.1 (9.2) | |
| Total FIM score, mean (SD) | 35.7 (13.0) | |
| Blood urea nitrogen (mg/dl), mean (SD) | 20.9 (10.1) | |
| Serum creatinine (mg/dl), mean (SD) | 0.89 (0.35) | |
| Estimated glomerular filtration rate (mL/min/1.73 m2), mean (SD) | 83 (24) | |
| Serum sodium (mmol/l), mean (SD) | 140.1 (5.5) | |
| Serum sodium < 135 mmol/l, n (%) | 51 (4.2) | |
| Haemoglobin (g/dl), mean (SD) | 13.2 (1.8) | |
| Total cholesterol (mg/dl), mean (SD) | ||
* Measured at admission to rehabilitation.
Results of the multivariate logistic regression analysis: beta (β) coefficients with standard errors (SE), odds ratios with 95% confidence intervals (CI), and p-values.
| Variable | β coefficients (SE) | Odds Ratio (95% CIs) | P-value |
|---|---|---|---|
| Age (per 5-year increase) | 0.269 (0.048) | 1.31 (1.19–1.44) | < 0.001 |
| Diabetes | 0.352 (0.179) | 1.42 (1.00–2.02) | 0.050 |
| History of CAD | 0.762 (0.224) | 2.14 (1.38–3.32) | 0.001 |
| Atrial fibrillation | 0.408 (0.184) | 1.50 (1.05–2.16) | 0.027 |
| Anemia | 0.339 (0.175) | 1.40 (1.00–1.98) | 0.053 |
| Renal dysfunction (eGFR < 60 mL/min/1.73 m2) | 0.439 (0.203) | 1.55 (1.04–2.31) | 0.031 |
| Neglect | 0.609 (0.234) | 1.84 (1.16–2.91) | 0.009 |
| Cognitive FIM score (per 1-point increase) | −0.053 (0.011) | 0.95 (0.93–0.97) | < 0.001 |
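The odds ratios in the table follow from the coefficients as OR = exp(β). The sketch below recomputes them for three rows, together with a Wald-type 95% interval exp(β ± 1.96·SE). The Wald form and the multiplier 1.96 are assumptions, since the record does not state how the intervals were obtained; the function name `odds_ratio_with_ci` is hypothetical.

```python
import math

def odds_ratio_with_ci(beta, se, z=1.96):
    """Return (OR, lower, upper) for a logistic regression coefficient,
    using a Wald interval exp(beta +/- z * se) (assumed, not stated in the source)."""
    return (math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se))

# Coefficients and standard errors copied from the table above.
rows = {
    "Age (per 5-year increase)": (0.269, 0.048),
    "History of CAD": (0.762, 0.224),
    "Cognitive FIM score (per 1-point increase)": (-0.053, 0.011),
}

results = {name: odds_ratio_with_ci(b, se) for name, (b, se) in rows.items()}
for name, (or_, lower, upper) in results.items():
    print(f"{name}: OR {or_:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

For these three rows the recomputed values agree with the tabulated odds ratios and intervals to two decimals, which suggests the reported intervals are consistent with a Wald calculation.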
Top-ranked variables in the logistic regression.
| Variable | χ2 | Likelihood ratio test p value |
|---|---|---|
| Age | 56.83 | < 0.0001 |
| Cognitive FIM score | 80.86 | < 0.0001 |
| History of CAD | 95.54 | 0.0001 |
| Neglect | 103.07 | 0.0061 |
| Renal dysfunction (eGFR < 60 mL/min/1.73 m2) | 108.90 | 0.0158 |
| Time from stroke occurrence to rehabilitation admission | 113.47 | 0.0325 |
| Diabetes | 117.46 | 0.0457 |
Measures of performance with 95% confidence intervals for the machine learning-based algorithms before and after the implementation of SMOTE on the test data.
| Algorithm | SMOTE | Sensitivity | Specificity | Accuracy | F-measure | AUC |
|---|---|---|---|---|---|---|
| RF | Not applied | 0.422 (0.395–0.451) | 0.904 (0.886–0.913) | 0.763 (0.738–0.786) | 0.510 | 0.844 (0.806–0.882) |
| GB | Not applied | 0.465 (0.437–0.493) | 0.888 (0.869–0.905) | 0.764 (0.739–0.787) | 0.535 | 0.810 (0.768–0.852) |
| ADA-B of RF | Not applied | 0.516 (0.488–0.544) | 0.879 (0.859–0.896) | 0.773 (0.748–0.796) | 0.571 | 0.870 (0.835–0.905) |
| RF | Applied | 0.879 (0.854–0.900) | 0.842 (0.815–0.865) | 0.861 (0.844–0.876) | 0.863 | 0.928 (0.902–0.954) |
| GB | Applied | 0.841 (0.814–0.864) | 0.863 (0.837–0.885) | 0.852 (0.834–0.867) | 0.850 | 0.927 (0.900–0.953) |
| ADA-B of RF | Applied | 0.891 (0.866–0.911) | 0.822 (0.794–0.846) | 0.857 (0.839–0.872) | 0.861 | 0.910 (0.880–0.939) |
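The columns of this table are linked by simple identities once the class ratio is known. The sketch below derives accuracy, PPV, NPV and F-measure from sensitivity and specificity under the assumption that the two classes are equally frequent (1:1) in the SMOTE rows. That ratio is an assumption, not something stated in this record, and the function name `balanced_metrics` is hypothetical.

```python
def balanced_metrics(sensitivity, specificity):
    """Derive accuracy, PPV, NPV and F-measure from sensitivity and specificity,
    assuming a 1:1 ratio of positive to negative cases (an assumption)."""
    tp, fn = sensitivity, 1.0 - sensitivity  # rates per unit of positive cases
    tn, fp = specificity, 1.0 - specificity  # rates per unit of negative cases
    return {
        "accuracy": (tp + tn) / 2.0,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "f_measure": 2.0 * tp / (2.0 * tp + fp + fn),
    }

# Sensitivity and specificity of the SMOTE rows, copied from the table above.
smote_rows = {
    "RF": (0.879, 0.842),
    "GB": (0.841, 0.863),
    "ADA-B of RF": (0.891, 0.822),
}

derived = {name: balanced_metrics(*vals) for name, vals in smote_rows.items()}
for name, m in derived.items():
    print(name, {key: round(value, 3) for key, value in m.items()})
```

Under this assumption the derived accuracy and F-measure agree with the three SMOTE rows to within rounding, and the derived PPV and NPV for RF come out at roughly 85% and 87%. The same identities do not reproduce the rows without SMOTE, where the classes are imbalanced.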
Figure 2. Receiver operating characteristic curves for the SMOTE RF algorithm and the logistic model.
Figure 3. Top 10 features according to the SMOTE RF model.
Univariate statistical analysis of the most important features identified by the SMOTE RF model.
| Variables | Survivors | Deceased | p-value |
|---|---|---|---|
| Age, mean (SD) | 69.15 (11.88) | 77.92 (8.84) | < 0.001’ |
| Length of the follow-up (days), mean (SD) | 1762 (1192) | 1258 (1117) | < 0.001’ |
| CMG 108, (%) | 39.7 | 60.3 | |
| CMG 109, (%) | 79.3 | 20.7 | < 0.001 |
| CMG 110, (%) | 74.1 | 25.9 | |
| Time from stroke onset to rehabilitation admission (days), mean (SD) | 21.9 (15.52) | 27.9 (18.39) | < 0.001’ |
| Total FIM score, mean (SD) | 36.9 (13.24) | 32.8 (12.14) | < 0.001’ |
| eGFR (mL/min/1.73 m2), mean (SD) | 85 (24) | 79 (26) | 0.002’ |
| Cognitive FIM score, mean (SD) | 17.9 (9.5) | 15.2 (8.5) | < 0.001’ |
| Right side of motor deficit, (%) | 50.5 | 48.4 | 0.522^ |
| Atrial fibrillation, (%) | 21.0 | 32.9 | < 0.001^ |
| Diabetes, (%) | 26.2 | 36.3 | < 0.001^ |
’ = Mann–Whitney test. ^ = Chi-square test.