Jianan Wang, Xiaoxian Gong, Hongfang Chen, Wansi Zhong, Yi Chen, Ying Zhou, Wenhua Zhang, Yaode He, Min Lou.
Abstract
Background: Prognosis, recurrence rate, and secondary prevention strategies in acute ischemic stroke differ by etiology. However, identifying the cause of a stroke is challenging. Objective: This study aimed to develop a model that identifies the cause of stroke using machine learning (ML) methods and to test its accuracy.
Keywords: cardioembolism; large-artery atherosclerosis; machine learning; small-artery occlusion; stroke
Year: 2022 PMID: 35493925 PMCID: PMC9051333 DOI: 10.3389/fnagi.2022.788637
Source DB: PubMed Journal: Front Aging Neurosci ISSN: 1663-4365 Impact factor: 5.702
Comparison of clinical characteristics between the algorithm development cohort and the algorithm test cohort.
| Characteristic | Cohort of algorithm development | Cohort of algorithm test | P value |
| Female, n (%) | 5688 (41.8) | 1214 (39.5) | 0.019 |
| Age, year, median (IQR) | 72 (63–80) | 70 (61–80) | <0.001 |
| Baseline mRS score, median (IQR) | 2 (1–4) | 2 (1–4) | <0.001 |
| Baseline NIHSS score, median (IQR) | 3 (1–8) | 3 (1–6) | <0.001 |
| GCS score, median (IQR) | 15 (13–15) | 15 (14–15) | <0.001 |
| SBP at admission, mmHg, median (IQR) | 151 (135–167) | 151 (134–166) | 0.523 |
| DBP at admission, mmHg, median (IQR) | 84 (75–94) | 84 (75–94) | 0.859 |
| Glucose at admission, mmol/L, median (IQR) | 5.3 (4.7–6.5) | 5.3 (4.7–6.4) | 0.018 |
| BMI, kg/m², median (IQR) | 23.2 (20.9–25.6) | 23.5 (21.5–25.7) | 0.001 |
| Hypertension, n (%) | 8841 (65.1) | 1955 (63.7) | 0.135 |
| Diabetes mellitus, n (%) | 2673 (19.7) | 559 (18.2) | 0.065 |
| Atrial fibrillation, n (%) | 2941 (21.6) | 507 (16.5) | <0.001 |
| Hyperlipemia, n (%) | 218 (1.6) | 42 (1.4) | 0.341 |
| Smoking, n (%) | 4379 (32.2) | 1071 (34.9) | 0.004 |
| Alcohol drinking, n (%) | 4766 (26.2) | 966 (26.2) | 0.682 |
| Coronary heart disease, n (%) | 1070 (7.9) | 178 (5.8) | <0.001 |
| Myocardial infarction, n (%) | 113 (0.8) | 22 (0.7) | 0.521 |
| Valvular heart disease, n (%) | 296 (2.2) | 49 (1.6) | 0.041 |
| Mitral stenosis, n (%) | 99 (0.7) | 23 (0.7) | 0.903 |
| Hyperhomocysteinemia, n (%) | 20 (0.1) | 5 (0.2) | 0.839 |
| Previous transient ischemic attack, n (%) | 57 (0.4) | 8 (0.3) | 0.202 |
| History of stroke, n (%) | 2962 (21.8) | 635 (20.7) | 0.176 |
| Renal insufficiency, n (%) | 206 (1.5) | 47 (1.5) | 0.951 |
| CE, n (%) | 6089 (44.8) | 1103 (35.9) | <0.001 |
| LAA, n (%) | 4539 (33.4) | 1269 (41.3) | <0.001 |
| SAO, n (%) | 2962 (21.8) | 698 (22.7) | 0.256 |
BMI, body mass index; CE, cardioembolism; DBP, diastolic blood pressure; GCS, Glasgow Coma Scale; IQR, interquartile range; LAA, large-artery atherosclerosis; mRS, modified Rankin Scale; NIHSS, National Institutes of Health Stroke Scale; SAO, small-artery occlusion; SBP, systolic blood pressure.
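The P values for the categorical rows can be approximated from the counts alone. The sketch below is illustrative only: the extracted text does not say which statistical test the authors used, so a Pearson chi-square test without continuity correction is assumed, and the cohort sizes are assumed to equal the sum of the CE, LAA, and SAO counts because the sizes themselves are missing from the table header. The function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a, n1, b, n2):
    """Pearson chi-square test (no continuity correction) for two proportions.

    a of n1 subjects have the trait in group 1, b of n2 in group 2.
    Returns (chi2, p) for 1 degree of freedom.
    """
    pooled = (a + b) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (a / n1 - b / n2) / se
    chi2 = z * z
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# ASSUMPTION: cohort sizes inferred as the sum of the three etiology rows.
N_DEV = 6089 + 4539 + 2962   # development cohort
N_TEST = 1103 + 1269 + 698   # test cohort

# "Female" row: 5688 in the development cohort, 1214 in the test cohort.
chi2, p = chi_square_2x2(5688, N_DEV, 1214, N_TEST)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```

Under these assumptions the result for the "Female" row lands close to the reported 0.019, which suggests the inferred cohort sizes are plausible, but it does not confirm the authors' exact procedure.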
FIGURE 1. Features contributing to the identification of CE, ranked by Gini importance. CE, cardioembolism; NIHSS, National Institutes of Health Stroke Scale. Gini importance measures how much a feature contributes to the model; the higher the value, the more important the feature.
FIGURE 2. Features contributing to the identification of LAA, ranked by Gini importance. LAA, large-artery atherosclerosis; LVO, large vessel occlusion; NIHSS, National Institutes of Health Stroke Scale. Gini importance measures how much a feature contributes to the model; the higher the value, the more important the feature.
FIGURE 3. Features contributing to the identification of SAO, ranked by Gini importance. LVO, large vessel occlusion; NIHSS, National Institutes of Health Stroke Scale; SAO, small-artery occlusion. Gini importance measures how much a feature contributes to the model; the higher the value, the more important the feature.
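In tree-based models, Gini importance is commonly computed by summing, over every split that uses a feature, the decrease in Gini impurity that the split achieves (weighted by the share of samples reaching that node). The source does not say which implementation produced the figures, so the following is only a minimal sketch of the per-split quantity for a single root node. The toy labels and the helper names `gini_impurity` and `impurity_decrease` are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Decrease in Gini impurity when `parent` is split into `left` and `right`."""
    n = len(parent)
    children = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - children


# HYPOTHETICAL toy node: 1 = CE, 0 = not CE, split on atrial fibrillation.
with_af = [1, 1, 1, 0]
without_af = [0, 0, 0, 1]
parent = with_af + without_af

decrease = impurity_decrease(parent, with_af, without_af)
print(decrease)
```

A feature whose splits separate the classes more cleanly yields a larger decrease, and therefore a larger Gini importance once the decreases are accumulated across nodes and trees.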
Comparison of six models to predict etiology.
| Etiology / model | AUC (95% CI) | Precision | Recall | F1 score | Accuracy |
| CE | | | | | |
| RF | 0.981 (0.978–0.986) | 0.955 | 0.955 | 0.955 | 0.958 |
| LR | 0.976 (0.971–0.981) | 0.937 | 0.934 | 0.933 | 0.934 |
| XGBoost | 0.982 (0.978–0.986) | 0.959 | 0.959 | 0.959 | 0.959 |
| KNN | 0.974 (0.970–0.980) | 0.955 | 0.955 | 0.955 | 0.955 |
| Ada Boosting | 0.976 (0.971–0.981) | 0.940 | 0.937 | 0.937 | 0.937 |
| GBM | 0.982 (0.979–0.987) | 0.958 | 0.958 | 0.958 | 0.958 |
| LAA | | | | | |
| RF | 0.919 (0.911–0.928) | 0.847 | 0.849 | 0.848 | 0.849 |
| LR | 0.866 (0.857–0.877) | 0.785 | 0.771 | 0.775 | 0.771 |
| XGBoost | 0.920 (0.912–0.929) | 0.846 | 0.848 | 0.846 | 0.848 |
| KNN | 0.902 (0.893–0.912) | 0.833 | 0.836 | 0.833 | 0.836 |
| Ada Boosting | 0.916 (0.908–0.925) | 0.845 | 0.847 | 0.845 | 0.847 |
| GBM | 0.920 (0.978–0.986) | 0.846 | 0.848 | 0.846 | 0.848 |
| SAO | | | | | |
| RF | 0.918 (0.908–0.927) | 0.864 | 0.864 | 0.864 | 0.864 |
| LR | 0.855 (0.843–0.868) | 0.761 | 0.791 | 0.758 | 0.791 |
| XGBoost | 0.919 (0.910–0.928) | 0.868 | 0.867 | 0.867 | 0.867 |
| KNN | 0.837 (0.824–0.851) | 0.765 | 0.781 | 0.771 | 0.781 |
| Ada Boosting | 0.918 (0.909–0.927) | 0.857 | 0.860 | 0.858 | 0.861 |
| GBM | 0.919 (0.910–0.928) | 0.863 | 0.863 | 0.863 | 0.863 |
AUC, area under the curve; CE, cardioembolism; CI, confidence interval; GBM, gradient boosting machine; KNN, K-nearest neighbor; LAA, large-artery atherosclerosis; LR, logistic regression; RF, random forests; SAO, small-artery occlusion; XGBoost, extreme gradient boosting. A higher F1 score indicates a more effective model.
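In most rows of the table, recall and accuracy coincide (for example 0.849 and 0.849), which is the pattern that support-weighted averaging over classes would produce. The source does not state how the metrics were averaged, so this is an inference rather than the authors' documented method. The sketch below shows how support-weighted precision, recall, and F1 can be computed from labels. The labels are hypothetical and the function name `weighted_metrics` is an assumption.

```python
def weighted_metrics(y_true, y_pred):
    """Support-weighted precision, recall, and F1, plus overall accuracy."""
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    precision = recall = f1 = 0.0
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec_c = tp / (tp + fp) if tp + fp else 0.0
        rec_c = tp / (tp + fn) if tp + fn else 0.0
        f1_c = 2 * prec_c * rec_c / (prec_c + rec_c) if prec_c + rec_c else 0.0
        weight = (tp + fn) / n  # class support as a fraction of all samples
        precision += weight * prec_c
        recall += weight * rec_c
        f1 += weight * f1_c
    accuracy = sum(t == p for t, p in pairs) / n
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# HYPOTHETICAL one-vs-rest labels: 1 = target etiology, 0 = any other etiology.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

scores = weighted_metrics(y_true, y_pred)
print(scores)
```

With support weighting, the weighted recall reduces to the fraction of correctly classified samples, so it equals accuracy by construction.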
FIGURE 4. Confusion matrix of the model in identifying CE, LAA, and SAO on the test set. CE, cardioembolism; LAA, large-artery atherosclerosis; SAO, small-artery occlusion. A confusion matrix is built by comparing the predicted class of each sample with its actual class. Each column represents a predicted class, and each row represents a true class.
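The row and column convention described in the caption (rows are true classes, columns are predicted classes) can be illustrated with a small sketch. The labels below are hypothetical and are not the study's data. The function name `confusion_matrix` is chosen here for illustration and does not refer to any particular library.

```python
LABELS = ["CE", "LAA", "SAO"]


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Count samples by (true class, predicted class).

    matrix[i][j] is the number of samples whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# HYPOTHETICAL labels for six patients.
y_true = ["CE", "CE", "LAA", "LAA", "SAO", "SAO"]
y_pred = ["CE", "LAA", "LAA", "LAA", "SAO", "LAA"]

matrix = confusion_matrix(y_true, y_pred)
for label, row in zip(LABELS, matrix):
    print(label, row)
```

Diagonal entries count correct predictions, and off-diagonal entries in a row show which classes the samples of that true class were mistaken for.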