Xiang Li1,2, XiDing Pan3, ChunLian Jiang4, MingRu Wu1, YuKai Liu5, FuSang Wang1,2, XiaoHan Zheng1,2, Jie Yang6, Chao Sun1,2, YuBing Zhu3, JunShan Zhou5, ShiHao Wang7, Zheng Zhao3, JianJun Zou2.
Abstract
Background and Purpose: Accurate prediction of functional outcome after stroke would provide evidence for reasonable post-stroke management. This study aimed to develop a machine learning-based prediction model for 6-month unfavorable functional outcome in Chinese acute ischemic stroke (AIS) patients.
Keywords: cerebral ischemia; machine learning; prediction; stroke; unfavorable outcome
Year: 2020 PMID: 33329298 PMCID: PMC7710984 DOI: 10.3389/fneur.2020.539509
Source DB: PubMed Journal: Front Neurol ISSN: 1664-2295 Impact factor: 4.003
Figure 1. Flow chart illustrating patient selection. mRS, modified Rankin Scale; NIHSS, National Institute of Health stroke scale.
Clinical, demographic and laboratory data of the patients in the training set stratified according to 6-month favorable or unfavorable outcome after acute ischemic stroke.
| Characteristic | Favorable outcome | Unfavorable outcome | P-value |
| Patients, n | 955 | 433 | |
| Age, years, median (IQR) | 66 (58–74) | 77 (68–83) | <0.0001 |
| Sex, n (%) | | | <0.0001 |
| Male | 682 (71.4) | 254 (58.7) | |
| Female | 273 (28.6) | 179 (41.3) | |
| Onset-to-admission delay <4.5 h, n (%) | 236 (24.7) | 127 (29.3) | 0.070 |
| Medical history, n (%) | | | |
| Hypertension | 650 (68.1) | 317 (73.2) | 0.053 |
| Diabetes mellitus | 248 (26.0) | 143 (33.0) | 0.007 |
| Hyperlipidemia | 29 (3.0) | 6 (1.4) | 0.069 |
| Coronary artery disease | 100 (10.5) | 77 (17.8) | <0.0001 |
| Atrial fibrillation | 75 (7.9) | 95 (21.9) | <0.0001 |
| Previous cerebral infarction | 123 (12.9) | 102 (23.6) | <0.0001 |
| Valvular heart disease | 13 (1.4) | 8 (1.8) | 0.492 |
| Smoking, n (%) | | | <0.0001 |
| Never smoker | 394 (41.3) | 255 (58.9) | |
| Former smoker | 129 (13.5) | 66 (15.2) | |
| Current smoker | 432 (45.2) | 112 (25.9) | |
| Drinking, n (%) | | | <0.0001 |
| Never drinker | 525 (55.0) | 307 (70.9) | |
| Former drinker | 84 (8.8) | 47 (10.9) | |
| Current drinker | 346 (36.2) | 79 (18.2) | |
| Baseline data | | | |
| Premorbid mRS, median (IQR) | 0 (0–0) | 0 (0–2) | <0.0001 |
| NIHSS at admission, median (IQR) | 3 (2–5) | 10 (5–16) | <0.0001 |
| BMI, kg/m², median (IQR) | 24.38 (22.38–26.64) | 24.03 (21.60–26.37) | 0.046 |
| Pulse, times/min, median (IQR) | 76 (70–80) | 76 (70–84) | 0.005 |
| Systolic BP, mmHg, median (IQR) | 140 (130–160) | 142 (130–160) | 0.350 |
| Diastolic BP, mmHg, median (IQR) | 84 (80–94) | 83 (78–95) | 0.533 |
| Platelet count, 10⁹/L, median (IQR) | 195 (159–234) | 188 (150–238) | 0.159 |
| Urea nitrogen, mmol/L, median (IQR) | 5.23 (4.4–6.34) | 6.12 (4.71–7.75) | <0.0001 |
| Creatinine, μmol/L, median (IQR) | 71 (59–83) | 76 (62–97) | <0.0001 |
| FBG, mmol/L, median (IQR) | 5.08 (4.50–6.21) | 6.40 (5.05–7.99) | <0.0001 |
| TC, mmol/L, median (IQR) | 4.41 (3.76–5.16) | 4.41 (3.64–5.18) | 0.574 |
| TG, mmol/L, median (IQR) | 1.31 (0.96–1.86) | 1.18 (0.84–1.59) | <0.0001 |
| LDL, mmol/L, median (IQR) | 2.71 (2.13–3.31) | 2.76 (1.96–3.27) | 0.470 |
| HDL, mmol/L, median (IQR) | 1.05 (0.9–1.23) | 1.08 (0.9–1.26) | 0.165 |
| Endovascular therapy, n (%) | 72 (7.5) | 55 (12.7) | 0.002 |
| IV thrombolysis, n (%) | 208 (21.8) | 119 (27.5) | 0.020 |
mRS, modified Rankin Scale; IQR, interquartile range; NIHSS, National Institute of Health stroke scale; BMI, body mass index; BP, blood pressure; FBG, fasting blood glucose; TC, total cholesterol; TG, triglyceride; LDL, low-density lipoprotein; HDL, high-density lipoprotein; IV, intravenous.
P-values for continuous variables were calculated using the Mann-Whitney U-test.
Variables with P < 0.1 were included in the multiple logistic regression models.
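The P < 0.1 screening rule in the footnote can be illustrated with a short sketch. The P-values below are copied from the table above; the dictionary layout, the `screen` helper, and the choice to store "<0.0001" as 0.0001 are assumptions made for illustration, not the authors' code.

```python
# P-values copied from the univariate comparison table above.
# "<0.0001" is stored as 0.0001 (an upper bound), which is enough for thresholding.
P_VALUES = {
    "Age": 0.0001,
    "Sex": 0.0001,
    "Onset-to-admission delay <4.5 h": 0.070,
    "Hypertension": 0.053,
    "Diabetes mellitus": 0.007,
    "Hyperlipidemia": 0.069,
    "Coronary artery disease": 0.0001,
    "Atrial fibrillation": 0.0001,
    "Previous cerebral infarction": 0.0001,
    "Valvular heart disease": 0.492,
    "Smoking": 0.0001,
    "Drinking": 0.0001,
    "Premorbid mRS": 0.0001,
    "NIHSS at admission": 0.0001,
    "BMI": 0.046,
    "Pulse": 0.005,
    "Systolic BP": 0.350,
    "Diastolic BP": 0.533,
    "Platelet count": 0.159,
    "Urea nitrogen": 0.0001,
    "Creatinine": 0.0001,
    "FBG": 0.0001,
    "TC": 0.574,
    "TG": 0.0001,
    "LDL": 0.470,
    "HDL": 0.165,
    "Endovascular therapy": 0.002,
    "IV thrombolysis": 0.020,
}

THRESHOLD = 0.1  # screening cut-off stated in the table footnote


def screen(p_values, threshold=THRESHOLD):
    """Return the variable names whose P-value is below the threshold."""
    return [name for name, p in p_values.items() if p < threshold]


selected = screen(P_VALUES)
excluded = [name for name in P_VALUES if name not in selected]

print(len(selected), "variables pass:", ", ".join(selected))
print(len(excluded), "variables dropped:", ", ".join(excluded))
```

Applied to the tabulated values, the rule keeps 21 of the 28 variables. That count agrees with the 21-variable models reported in the later tables, although this record does not state explicitly that those models were built from exactly this list.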
The area under the curve (AUC) of training set and inner validation set.
| Model | Training set | Inner validation set |
| LR | 0.867 (0.847–0.888) | 0.862 (0.812–0.911) |
| SVM | 0.874 (0.855–0.894) | 0.871 (0.840–0.901) |
| RFC | 0.897 (0.880–0.915) | 0.866 (0.831–0.902) |
| XGBoost | 0.890 (0.872–0.908) | 0.867 (0.833–0.901) |
| DNN | 0.877 (0.858–0.897) | 0.860 (0.825–0.896) |
| LR* | 0.874 (0.853–0.894) | 0.865 (0.833–0.897) |
| RFC* | 0.899 (0.881–0.917) | 0.865 (0.835–0.894) |
LR, logistic regression; SVM, support vector machine; RFC, random forest classifier; XGBoost, extreme gradient boosting; DNN, fully-connected deep neural network.
* indicates model developed with 21 variables.
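For readers unfamiliar with the AUC values in this table, the sketch below shows one standard way to compute the statistic: the fraction of (unfavorable, favorable) patient pairs in which the model assigns the higher score to the unfavorable case, with ties counted as half. The function name `auc` and the toy labels and scores are hypothetical. This record does not describe how the authors computed the AUC or the intervals in parentheses, and the sketch does not attempt the intervals.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class (here, unfavorable outcome), 0 otherwise.
    scores: predicted risk, where a higher value means more likely positive.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one case of each class")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy data, not taken from the study.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auc(toy_labels, toy_scores))  # 0.75
```

The double loop is quadratic in the number of patients, which is fine for a cohort of this size; a rank-based formulation gives the same value more efficiently on larger data.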
Scores for each model in the testing set.
| LR | 0.857 (0.814–0.900) | 0.912 | 0.620 | 0.761 | 0.821 |
| SVM | 0.865 (0.823–0.907) | 0.912 | 0.602 | 0.756 | 0.816 |
| RFC | 0.862 (0.820–0.904) | 0.883 | 0.657 | 0.717 | 0.813 |
| XGBoost | 0.858 (0.815–0.901) | 0.895 | 0.630 | 0.731 | 0.813 |
| DNN | 0.867 (0.827–0.908) | 0.891 | 0.556 | 0.811 | 0.821 |
| LR* | 0.866 (0.825–0.907) | 0.921 | 0.593 | 0.780 | 0.821 |
| RFC* | 0.874 (0.835–0.912) | 0.950 | 0.500 | 0.818 | 0.810 |
AUC, the area under the curve; LR, logistic regression; SVM, support vector machine; RFC, random forest classifier; XGBoost, extreme gradient boosting; DNN, fully-connected deep neural network.
* indicates model developed with 21 variables.
Figure 2. The receiver operating characteristic (ROC) curves of the machine learning (ML) models on the testing set. LR, logistic regression; SVC, support vector machine; RFC, random forest classifier; XGBoost, extreme gradient boosting; DNN, fully-connected deep neural network.
Figure 3. The receiver operating characteristic (ROC) curves of the random forest classifier (RFC) and previous models on the testing set. HIAT, Houston Intra-Arterial Therapy score; THRIVE, Totaled Health Risks in Vascular Events score; NADE, NIHSS score on admission, age, previous diabetes mellitus and creatinine.
Figure 4. The receiver operating characteristic (ROC) curves of the random forest classifier (RFC) and logistic regression (LR) on the testing set. * indicates model developed with 21 variables.
Top 6 important features in the models with 21 variables.
| Rank | LR* | RFC* |
| 1 | NIHSS at admission | NIHSS at admission |
| 2 | Premorbid mRS | Age |
| 3 | Age | Premorbid mRS |
| 4 | Fasting blood glucose | Fasting blood glucose |
| 5 | Creatinine | Urea nitrogen |
| 6 | Sex | Creatinine |
LR, logistic regression; RFC, random forest classifier; NIHSS, National Institute of Health stroke scale; mRS, modified Rankin Scale.
* indicates model developed with 21 variables.