Satoru Tanioka1, Tetsushi Yago2, Katsuhiro Tanaka2, Fujimaro Ishida2, Tomoyuki Kishimoto3, Kazuhiko Tsuda3, Munenari Ikezawa4, Tomohiro Araki4, Yoichi Miura5, Hidenori Suzuki5.
Abstract
To examine whether a machine learning (ML) approach can predict hematoma expansion in acute intracerebral hemorrhage (ICH) with accuracy and widespread applicability, we applied ML algorithms to multicenter clinical data and CT findings on admission. Patients with acute ICH from three hospitals (n = 351) and those from another hospital (n = 71) were retrospectively assigned to the development and validation cohorts, respectively. To develop ML predictive models, the k-nearest neighbors (k-NN) algorithm, logistic regression, support vector machines (SVMs), random forests, and XGBoost were applied to the patient data in the development cohort. The models were evaluated on the patient data in the validation cohort, and their performance was compared with that of previous scoring methods: the BAT, BRAIN, and 9-point scores. The k-NN algorithm achieved the highest area under the receiver operating characteristic curve (AUC) among all ML models, 0.790, with a sensitivity, specificity, and accuracy of 0.846, 0.733, and 0.775, respectively. The BRAIN score achieved the highest AUC among the previous scoring methods, 0.676, which was lower than that of the k-NN algorithm (p = 0.016). We developed and validated ML predictive models of hematoma expansion in acute ICH. The models demonstrated good predictive ability, showing better performance than the previous scoring methods.
Year: 2022 PMID: 35864139 PMCID: PMC9304401 DOI: 10.1038/s41598-022-15400-6
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Manually tuned hyperparameters and their values in each machine learning algorithm.
| Algorithm | Hyperparameter | Value |
|---|---|---|
| k-nearest neighbors algorithm | n_neighbors | 3, 5, 7, 9, 11, 13 |
| Logistic regression | C | 0.001, 0.01, 0.1, 1, 10, 100 |
| Support vector machines | C | 0.001, 0.01, 0.1, 1, 10, 100 |
| | Gamma | 0.001, 0.01, 0.1, 1, 10, 100 |
| Random forests | n_estimators | 75, 125, 175, 225 |
| | max_depth | 2, 3, 4, 5, 6, 7 |
| XGBoost | num_round | 10, 20, 30 |
| | eta | 0.01, 0.1 |
| | max_depth | 3, 4, 5, 6, 7 |
| | min_child_weight | 1, 2, 3, 4, 5 |
| | Colsample_bytree | 0.7, 0.8, 0.9 |
| | Subsample | 0.7, 0.8, 0.9 |
| | Gamma | 0, 0.1, 0.2 |
| | Alpha | 0, 0.01, 0.1 |
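To give a sense of how large each search space is, the sketch below transcribes the candidate values from the table and counts the combinations per algorithm. It assumes the values would be combined exhaustively, which the table does not state (it only says the hyperparameters were tuned manually). The names `GRIDS`, `grid_size`, and `iter_grid` are illustrative, and the keys are written in the lowercase spelling used by common Python libraries.

```python
from itertools import product
from math import prod

# Candidate values transcribed from the hyperparameter table.
GRIDS = {
    "k-nearest neighbors": {"n_neighbors": [3, 5, 7, 9, 11, 13]},
    "logistic regression": {"C": [0.001, 0.01, 0.1, 1, 10, 100]},
    "support vector machines": {
        "C": [0.001, 0.01, 0.1, 1, 10, 100],
        "gamma": [0.001, 0.01, 0.1, 1, 10, 100],
    },
    "random forests": {
        "n_estimators": [75, 125, 175, 225],
        "max_depth": [2, 3, 4, 5, 6, 7],
    },
    "XGBoost": {
        "num_round": [10, 20, 30],
        "eta": [0.01, 0.1],
        "max_depth": [3, 4, 5, 6, 7],
        "min_child_weight": [1, 2, 3, 4, 5],
        "colsample_bytree": [0.7, 0.8, 0.9],
        "subsample": [0.7, 0.8, 0.9],
        "gamma": [0, 0.1, 0.2],
        "alpha": [0, 0.01, 0.1],
    },
}


def grid_size(grid):
    """Number of combinations if every candidate value is crossed with every other."""
    return prod(len(values) for values in grid.values())


def iter_grid(grid):
    """Yield each combination as a dict mapping hyperparameter name to value."""
    names = list(grid)
    for combo in product(*(grid[name] for name in names)):
        yield dict(zip(names, combo))


# Assumption: exhaustive crossing of the listed values.
sizes = {name: grid_size(grid) for name, grid in GRIDS.items()}
for name, size in sizes.items():
    print(f"{name}: {size} combinations")
```

Under that assumption the XGBoost grid is by far the largest, which is one reason a full cross-validated search over it would be much more expensive than for the other four algorithms.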
Figure 1. A flow chart indicating the included and excluded patients in the development cohort (a) and the validation cohort (b). ICH indicates intracerebral hemorrhage.
Characteristics of the development and validation cohorts.
| | Development cohort (n = 351) | Validation cohort (n = 71) | P Value |
|---|---|---|---|
| Age, y | 73 (63–83) | 70 (56–78) | 0.017* |
| Sex (male) | 215 (61.3) | 42 (59.2) | 0.790† |
| History of intracerebral hemorrhage | 21 (6.0) | 3 (4.2) | 0.780† |
| History of cerebral infarction | 53 (15.1) | 3 (4.2) | 0.012† |
| History of ischemic heart disease | 16 (4.6) | 3 (4.2) | 1.000† |
| History of hypertension | 207 (59.0) | 43 (60.6) | 0.895† |
| History of diabetes mellitus | 77 (21.9) | 12 (16.9) | 0.426† |
| History of dyslipidemia | 133 (37.9) | 14 (19.7) | 0.004† |
| Anticoagulant use | 26 (7.4) | 11 (15.5) | 0.037† |
| Antiplatelet use | 72 (20.5) | 10 (14.1) | 0.251† |
| Glasgow Coma Scale | 15 (12–15) | 14 (11–15) | 0.373* |
| Systolic blood pressure, mmHg | 182.5 ± 32.7 | 191.1 ± 32.6 | 0.042‡ |
| Diastolic blood pressure, mmHg | 100.4 ± 22.3 | 104.1 ± 22.1 | 0.205‡ |
| PT-INR | 1.00 (0.95–1.05) | 0.94 (0.91–1.04) | 0.002* |
| White blood cell count, 10^6/mL | 7.50 (5.80–9.94) | 8.11 (5.76–10.33) | 0.625* |
| Hemoglobin, mg/dL | 13.6 ± 2.0 | 13.9 ± 2.1 | 0.243‡ |
| Platelet count, 10^6/mL | 211.4 ± 62.3 | 217.9 ± 55.7 | 0.415‡ |
| Serum creatinine, mg/dL | 0.73 (0.60–0.90) | 0.70 (0.55–0.86) | 0.177* |
| Serum total bilirubin, mg/dL | 0.7 (0.5–0.9) | 0.7 (0.6–0.8) | 0.702* |
| Time from onset to baseline CT scan, h | 2 (1–4) | 1 (1–2) | < 0.001* |
| Basal ganglia | 120 (34.2) | 31 (43.7) | 0.137† |
| Thalamus | 115 (32.8) | 23 (32.4) | 1.000† |
| Lobe | 73 (20.8) | 7 (9.9) | 0.031† |
| Brain stem | 15 (4.3) | 6 (8.5) | 0.141† |
| Cerebellum | 28 (8.0) | 4 (5.6) | 0.628† |
| Intraventricular hematoma extension | 144 (41.0) | 30 (42.3) | 0.895† |
| Baseline hematoma volume, mL | 11.9 (4.9–29.1) | 16.8 (6.2–27.9) | 0.190* |
| Intrahematoma hypodensities | 123 (35.0) | 37 (52.1) | 0.010† |
| Irregular hematoma shape | 211 (60.1) | 50 (70.4) | 0.110† |
| Blend sign | 29 (8.3) | 5 (7.0) | 1.000† |
| CT angiography spot sign** | 6 (7.1) | 13 (38.2) | < 0.001† |
| Target systolic blood pressure | | | < 0.001† |
| Less than 140 mmHg | 183 (52.1) | 60 (84.5) | |
| Less than 180 mmHg | 168 (47.9) | 11 (15.5) | |
| Hematoma expansion | 71 (20.2) | 26 (36.6) | 0.005† |
Data are presented as n (%), mean ± standard deviation, or median (interquartile range).
CT = computed tomography; PT-INR = prothrombin time-international normalized ratio.
*Mann–Whitney U test between the development and validation cohorts.
†Fisher’s exact test between the development and validation cohorts.
‡Student’s t test between the development and validation cohorts.
**CT angiography is not performed in all patients.
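As an illustration of the † comparisons, the following is a minimal sketch of a two-sided Fisher's exact test on a 2×2 table, using only the Python standard library. It assumes the common "sum of all tables no more probable than the observed one" definition of the two-sided p-value. The source does not say which software or convention the authors used, so small differences from the reported values are possible. The function name `fisher_exact_two_sided` is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    The p-value sums the hypergeometric probabilities of every table with
    the same margins that is no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_value = 0.0
    for x in range(lowest, highest + 1):
        p = prob(x)
        # Small relative tolerance guards against floating-point ties.
        if p <= p_observed * (1 + 1e-9):
            p_value += p
    return min(1.0, p_value)


# Hematoma expansion row: 71 of 351 (development) vs 26 of 71 (validation).
p_expansion = fisher_exact_two_sided(71, 351 - 71, 26, 71 - 26)
print(f"Hematoma expansion, development vs validation: p = {p_expansion:.3f}")
```

The same function can be applied to any other n (%) row of the table by passing the event and non-event counts for each cohort, and the result compared with the tabulated P value.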
Univariate analyses between expansion and no expansion groups in the development cohort.
| | Expansion (n = 71) | No Expansion (n = 280) | P Value |
|---|---|---|---|
| Age, y | 76 (67–84) | 71 (62–82) | 0.086* |
| Sex (male) | 54 (76.1) | 161 (57.5) | 0.004† |
| History of intracerebral hemorrhage | 8 (11.3) | 13 (4.6) | 0.048† |
| History of cerebral infarction | 10 (14.1) | 43 (15.4) | 0.855† |
| History of ischemic heart disease | 7 (9.9) | 9 (3.2) | 0.025† |
| History of hypertension | 37 (52.1) | 170 (60.7) | 0.224† |
| History of diabetes mellitus | 13 (18.3) | 64 (22.9) | 0.521† |
| History of dyslipidemia | 20 (28.2) | 113 (40.4) | 0.075† |
| Anticoagulant use | 13 (18.3) | 13 (4.6) | < 0.001† |
| Antiplatelet use | 20 (28.2) | 52 (18.6) | 0.099† |
| Glasgow Coma Scale | 14 (11–15) | 15 (12–15) | 0.108* |
| Systolic blood pressure, mmHg | 176.9 ± 29.3 | 183.9 ± 33.4 | 0.110‡ |
| Diastolic blood pressure, mmHg | 96.3 ± 22.1 | 101.4 ± 22.3 | 0.085‡ |
| PT-INR | 1.02 (0.98–1.11) | 0.99 (0.94–1.04) | < 0.001* |
| White blood cell count, 10^6/mL | 6.80 (5.57–8.41) | 7.77 (5.94–10.10) | 0.033* |
| Hemoglobin, mg/dL | 13.1 ± 1.8 | 13.7 ± 2.0 | 0.031‡ |
| Platelet count, 10^6/mL | 190.8 ± 58.1 | 216.7 ± 62.4 | 0.002‡ |
| Serum creatinine, mg/dL | 0.73 (0.63–0.91) | 0.73 (0.59–0.90) | 0.634* |
| Serum total bilirubin, mg/dL | 0.7 (0.5–1.0) | 0.7 (0.5–0.9) | 0.405* |
| Time from onset to baseline CT scan, h | 1 (1–3) | 2 (1–4) | 0.006* |
| Basal ganglia | 27 (38.0) | 93 (33.2) | 0.484† |
| Thalamus | 16 (22.5) | 99 (35.4) | 0.047† |
| Lobe | 26 (36.6) | 47 (16.8) | < 0.001† |
| Brain stem | 1 (1.4) | 14 (5.0) | 0.322† |
| Cerebellum | 1 (1.4) | 27 (9.6) | 0.024† |
| Intraventricular hematoma extension | 26 (36.6) | 118 (42.1) | 0.421† |
| Baseline hematoma volume, mL | 30.7 (17.1–57.2) | 9.5 (4.0–19.7) | < 0.001* |
| Intrahematoma hypodensities | 43 (60.6) | 80 (28.6) | < 0.001† |
| Irregular hematoma shape | 55 (77.5) | 156 (55.7) | 0.001† |
| Blend sign | 9 (12.7) | 20 (7.1) | 0.148† |
| Target systolic blood pressure | | | 0.894† |
| Less than 140 mmHg | 38 (53.5) | 145 (51.8) | |
| Less than 180 mmHg | 33 (46.5) | 135 (48.2) | |
Data are presented as n (%), mean ± standard deviation, or median (interquartile range).
CT = computed tomography; PT-INR = prothrombin time-international normalized ratio.
*Mann–Whitney U test between expansion and no expansion groups.
†Fisher’s exact test between expansion and no expansion groups.
‡Student’s t test between expansion and no expansion groups.
Test characteristics of previously reported scoring methods and machine learning models in the validation cohort.
| | Accuracy | Sensitivity | Specificity | AUC |
|---|---|---|---|---|
| BAT score | 0.606 (0.483–0.720) | 0.654 (0.443–0.828) | 0.578 (0.422–0.723) | 0.616 (0.497–0.734) |
| BRAIN score | 0.620 (0.497–0.732) | 0.885 (0.698–0.976) | 0.467 (0.317–0.621) | 0.676 (0.579–0.772) |
| 9-point score | 0.690 (0.569–0.795) | 0.538 (0.334–0.734) | 0.778 (0.629–0.888) | 0.658 (0.543–0.774) |
| k-nearest neighbors algorithm | 0.775 (0.660–0.865) | 0.846 (0.651–0.956) | 0.733 (0.581–0.854) | 0.790 (0.693–0.886) |
| Logistic regression | 0.648 (0.525–0.758) | 0.769 (0.564–0.910) | 0.578 (0.422–0.723) | 0.674 (0.563–0.784) |
| Support vector machines | 0.732 (0.614–0.831) | 0.769 (0.564–0.910) | 0.711 (0.557–0.836) | 0.740 (0.634–0.846) |
| Random forests | 0.775 (0.660–0.865) | 0.615 (0.406–0.798) | 0.867 (0.732–0.949) | 0.741 (0.633–0.849) |
| XGBoost | 0.732 (0.614–0.831) | 0.731 (0.522–0.884) | 0.733 (0.581–0.854) | 0.732 (0.623–0.841) |
Data are presented as value (95% confidence interval).
AUC = area under the receiver operating characteristic curve.
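The accuracy, sensitivity, and specificity columns are linked through the validation cohort's class counts (26 patients with hematoma expansion among 71, hence 45 without). The sketch below recovers approximate confusion-matrix counts from the rounded sensitivity and specificity and recomputes accuracy as a consistency check. The counts are reconstructed by rounding, not taken from the source, and the helper names `reconstruct_counts` and `accuracy` are illustrative.

```python
# Class counts in the validation cohort, from the cohort characteristics table.
N_EXPANSION = 26
N_NO_EXPANSION = 71 - N_EXPANSION  # 45


def reconstruct_counts(sensitivity, specificity,
                       n_pos=N_EXPANSION, n_neg=N_NO_EXPANSION):
    """Recover integer confusion-matrix counts from rounded rates.

    Assumption: the reported rates are simple proportions rounded to three
    decimals, so rounding rate * n gives back the original count.
    """
    tp = round(sensitivity * n_pos)
    tn = round(specificity * n_neg)
    return {"tp": tp, "fn": n_pos - tp, "tn": tn, "fp": n_neg - tn}


def accuracy(counts):
    """Proportion of all patients classified correctly."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total


# Sensitivity and specificity as reported in the table.
knn_counts = reconstruct_counts(0.846, 0.733)
brain_counts = reconstruct_counts(0.885, 0.467)
rf_counts = reconstruct_counts(0.615, 0.867)

for label, counts in [("k-NN", knn_counts),
                      ("BRAIN score", brain_counts),
                      ("Random forests", rf_counts)]:
    print(label, counts, f"accuracy = {accuracy(counts):.3f}")
```

For these three rows the recomputed accuracies match the tabulated values to three decimals, which also shows why k-NN and random forests share the same accuracy despite very different sensitivity and specificity: both classify 55 of 71 patients correctly.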