Eunjeong Park, Hyuk-Jae Chang, Hyo Suk Nam.
Abstract
The Bayesian network is an increasingly popular method for modeling uncertain and complex problems, because its interpretability is often more useful than plain prediction. To satisfy a core requirement of medical research, interpretable prediction with high accuracy, we constructed an inference engine for post-stroke outcomes based on Bayesian network classifiers. The prediction system, trained on data from 3,605 patients with acute stroke, forecasts functional independence at 3 months and mortality 1 year after stroke. Feature selection methods were applied to eliminate less relevant and redundant features from 76 risk variables. The Bayesian network classifiers were trained with a hill-climbing search over network structures and parameters, scored by minimum description length. We evaluated and optimized the proposed system to increase the area under the receiver operating characteristic curve (AUC) while ensuring acceptable sensitivity on the class-imbalanced data. The performance evaluation showed that the Bayesian network with features chosen by wrapper-type feature selection can predict 3-month functional independence with an AUC of 0.889 using only 19 risk variables, and 1-year mortality with an AUC of 0.893 using 24 variables. The Bayesian network with 50 features filtered by information gain can predict 3-month functional independence with an AUC of 0.875 and 1-year mortality with an AUC of 0.895. We also built an online prediction service, the Yonsei Stroke Outcome Inference System, to put the proposed solution into practice for patients with stroke.
Keywords: bayesian network; decision support techniques; imbalanced data; machine learning classification; prognostic model; stroke
Year: 2018 PMID: 30245663 PMCID: PMC6137617 DOI: 10.3389/fneur.2018.00699
Source DB: PubMed Journal: Front Neurol ISSN: 1664-2295 Impact factor: 4.003
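For reference, the description-length score named in the abstract (commonly abbreviated MDL) is usually written in the textbook form below. The abstract does not give the formula, so the notation is an assumption: $N$ is the number of training cases, $\lvert B \rvert$ the number of free parameters in network $B$, $N_{ijk}$ the number of cases in which variable $i$ takes its $k$-th value while its parents take their $j$-th configuration, and $N_{ij} = \sum_k N_{ijk}$.

```latex
\mathrm{MDL}(B \mid D) \;=\; \frac{\log N}{2}\,\lvert B \rvert
\;-\; \sum_{i=1}^{n} \sum_{j=1}^{q_i} \sum_{k=1}^{r_i} N_{ijk} \log \frac{N_{ijk}}{N_{ij}}
```

The first term penalizes model complexity and the second is the negative log-likelihood of the data, so a lower score is better. Under this convention, a hill-climbing search repeatedly adds, removes, or reverses a single arc and keeps the change only when the score decreases.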
Figure 1. Process of a prediction system for post-stroke outcomes.
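Demographic characteristics and comparison of outcome at 3 months and death within 1 year.
| Variable | Total (n = 3,605) | Good outcome at 3 months (n = 2,653) | Poor outcome at 3 months (n = 952) | p-value | Alive at 1 year (n = 3,171) | Died within 1 year (n = 434) | p-value |
|---|---|---|---|---|---|---|---|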
| Age | 65.9 ± 12.6 | 64.0 ± 12.3 | 71.2 ± 11.9 | <0.001 | 64.8 ± 12.4 | 73.9 ± 11.2 | <0.001 |
| Sex | | | | <0.001 | | | 0.016 |
| F | 1,416 (39.3%) | 969 (36.5%) | 447 (47.0%) | | 1,222 (38.5%) | 194 (44.7%) | |
| M | 2,189 (60.7%) | 1,684 (63.5%) | 505 (53.0%) | | 1,949 (61.5%) | 240 (55.3%) | |
| Hypertension | 2,675 (74.2%) | 1,940 (73.1%) | 735 (77.2%) | 0.015 | 2,675 (74.2%) | 1,940 (73.1%) | 0.023 |
| Diabetes | 1,144 (31.7%) | 827 (31.2%) | 317 (33.3%) | 0.243 | 1,144 (31.7%) | 827 (31.2%) | 0.282 |
| Hypercholesterolemia | 747 (20.7%) | 554 (20.9%) | 193 (20.3%) | 0.726 | 685 (21.6%) | 62 (14.3%) | 0.001 |
| Current smoking | 856 (23.7%) | 704 (26.5%) | 152 (16.0%) | <0.001 | 856 (23.7%) | 704 (26.5%) | <0.001 |
| Old stroke | 472 (13.1%) | 301 (11.3%) | 171 (18.0%) | <0.001 | 401 (12.6%) | 71 (16.4%) | 0.038 |
| Atrial fibrillation | 813 (22.6%) | 482 (18.2%) | 331 (34.8%) | <0.001 | 623 (19.6%) | 190 (43.8%) | <0.001 |
| Coronary artery disease | 811 (22.5%) | 603 (22.7%) | 208 (21.8%) | 0.608 | 717 (22.6%) | 94 (21.7%) | 0.701 |
| Congestive heart failure | 184 (5.1%) | 110 (4.1%) | 74 (7.8%) | <0.001 | 134 (4.2%) | 50 (11.5%) | <0.001 |
| Peripheral artery obstructive disease | 110 (3.1%) | 60 (2.3%) | 50 (5.3%) | <0.001 | 85 (2.7%) | 25 (5.8%) | 0.001 |
| Initial NIHSS score | 5.6 ± 6.3 | 3.4 ± 4.0 | 11.5 ± 7.5 | <0.001 | 4.8 ± 5.4 | 11.5 ± 8.4 | <0.001 |
| TOAST | | | | <0.001 | | | <0.001 |
| LAC | 321 (8.9%) | 285 (10.7%) | 36 (3.8%) | | 312 (9.8%) | 9 (2.1%) | |
| LAA | 741 (20.6%) | 504 (19.0%) | 237 (24.9%) | | 661 (20.8%) | 80 (18.4%) | |
| CE | 991 (27.5%) | 688 (25.9%) | 303 (31.8%) | | 823 (26.0%) | 168 (38.7%) | |
| SOD | 89 (2.5%) | 68 (2.6%) | 21 (2.2%) | | 80 (2.5%) | 9 (2.1%) | |
| UT | 668 (18.5%) | 498 (18.8%) | 170 (17.9%) | | 587 (18.5%) | 81 (18.7%) | |
| UN | 785 (21.8%) | 607 (22.9%) | 178 (18.7%) | | 703 (22.2%) | 82 (18.9%) | |
| UI | 10 (0.3%) | 3 (0.1%) | 7 (0.7%) | | 5 (0.2%) | 5 (1.2%) | |
| Anemia | 617 (17.1%) | 361 (13.6%) | 256 (26.9%) | <0.001 | 450 (14.2%) | 167 (38.5%) | <0.001 |
| Thrombolysis | 485 (13.5%) | 272 (10.3%) | 213 (22.4%) | <0.001 | 377 (11.9%) | 108 (24.9%) | <0.001 |
| Symptomatic ICH | 92 (2.6%) | 10 (0.4%) | 82 (8.6%) | <0.001 | 43 (1.4%) | 49 (11.3%) | <0.001 |
| Herniation | 105 (2.9%) | 3 (0.1%) | 102 (10.7%) | <0.001 | 38 (1.2%) | 67 (15.4%) | <0.001 |
| Body weight | 62.9 ± 11.1 | 64.0 ± 10.9 | 60.0 ± 11.2 | <0.001 | 63.6 ± 11.0 | 57.8 ± 10.8 | <0.001 |
| Hgb | 13.8 ± 2.0 | 14.0 ± 1.9 | 13.3 ± 2.2 | <0.001 | 14.0 ± 1.9 | 12.7 ± 2.3 | <0.001 |
| Hct | 40.6 ± 5.6 | 41.1 ± 5.3 | 39.3 ± 6.1 | <0.001 | 41.0 ± 5.3 | 37.9 ± 6.5 | <0.001 |
| ESR | 23.9 ± 22.2 | 21.2 ± 20.1 | 31.3 ± 25.8 | <0.001 | 22.1 ± 20.6 | 36.5 ± 28.8 | <0.001 |
| PT | 1.0 ± 0.5 | 1.0 ± 0.3 | 1.0 ± 0.7 | 0.123 | 1.0 ± 0.5 | 1.0 ± 0.2 | 0.002 |
| Albumin | 4.2 ± 0.5 | 4.3 ± 0.4 | 4.0 ± 0.5 | <0.001 | 4.3 ± 0.4 | 3.9 ± 0.6 | <0.001 |
| Prealbumin | 223.7 ± 72.6 | 239.0 ± 69.9 | 205.6 ± 71.6 | <0.001 | 233.3 ± 69.8 | 186.8 ± 71.4 | <0.001 |
| Fibrinogen | 322.8 ± 94.3 | 316.1 ± 83.9 | 341.5 ± 116.8 | <0.001 | 320.1 ± 88.5 | 342.5 ± 128.5 | 0.001 |
| hsCRP | 11.3 ± 48.4 | 7.5 ± 49.7 | 22.2 ± 42.7 | <0.001 | 9.2 ± 48.4 | 27.3 ± 45.5 | <0.001 |
| D-dimer | 779.0 ± 3846.1 | 418.4 ± 1704.4 | 1788.2 ± 6834.6 | <0.001 | 464.5 ± 1759.3 | 3079.8 ± 9723.3 | <0.001 |
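The p-values for categorical rows can be sanity-checked from the counts alone. The sketch below is an illustration, not the authors' procedure: the record does not say which test was used, so a 2x2 chi-square test with Yates continuity correction is an assumption, and `chi_square_2x2` is a hypothetical helper name. It is applied to the F and M rows of the table, once for the two 3-month outcome groups and once for the two 1-year groups.

```python
import math


def chi_square_2x2(a, b, c, d, yates=True):
    """Return (statistic, p_value) for the 2x2 table [[a, b], [c, d]].

    Assumption: Pearson chi-square with optional Yates correction;
    the source does not state which test produced its p-values.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    stat = 0.0
    for o, e in zip(observed, expected):
        diff = abs(o - e)
        if yates:
            diff = max(0.0, diff - 0.5)  # continuity correction
        stat += diff * diff / e
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Counts copied from the F and M rows of the table.
stat_3m, p_3m = chi_square_2x2(969, 447, 1684, 505)    # 3-month outcome groups
stat_1y, p_1y = chi_square_2x2(1222, 194, 1949, 240)   # 1-year groups

print(f"sex vs 3-month outcome:  chi2={stat_3m:.2f}, p={p_3m:.2g}")
print(f"sex vs 1-year mortality: chi2={stat_1y:.2f}, p={p_1y:.3f}")
```

With the continuity correction, the second p-value comes out close to the 0.016 reported in the Sex row, which is consistent with, but does not confirm, a corrected chi-square test having been used.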
Figure 2. Top 15 variables in dimension reduction for post-stroke outcome prediction: (A) variables filtered by ranks of information gain for predicting functional independence at 3 months, (B) variables selected by the wrapper of the Bayesian network classifier with greedy subset selection for predicting functional independence at 3 months, (C) variables filtered by ranks of information gain for predicting 1-year mortality, and (D) variables selected by the wrapper of the Bayesian network classifier with greedy subset selection for predicting 1-year mortality.
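The information-gain filter behind panels (A) and (C) can be sketched in a few lines. This is a minimal illustration for discrete features, assuming continuous variables have already been binned; the function names and the toy columns (`afib`, `sex`) are hypothetical and the values are made up for the example, not taken from the study data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """H(labels) - H(labels | feature) for one discrete feature column."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_by_information_gain(columns, labels):
    """Return (name, gain) pairs sorted from most to least informative."""
    scores = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data: 'afib' mirrors the outcome exactly, 'sex' carries no information.
outcome = [1, 1, 0, 0]
toy_columns = {"sex": [0, 1, 0, 1], "afib": [1, 1, 0, 0]}
ranking = rank_by_information_gain(toy_columns, outcome)
print(ranking)
```

A filter of this kind scores each variable on its own, which is why it is cheap but can keep redundant variables; the wrapper approach in panels (B) and (D) instead scores candidate subsets by the classifier's own performance.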
Figure 3. Performance evaluation of Bayesian network-based classifiers: (A) performance of classifiers forecasting 90-day functional independence and (B) performance of classifiers for 1-year mortality prediction.
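The two metrics used in the evaluation, AUC and sensitivity, can be computed directly from predicted scores. The sketch below uses the rank-based definition of AUC (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as half). The function names, scores, and threshold are hypothetical and do not come from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def sensitivity(labels, scores, threshold):
    """True-positive rate when scores >= threshold are called positive."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    if not positives:
        raise ValueError("sensitivity needs at least one positive case")
    return sum(1 for s in positives if s >= threshold) / len(positives)


# Hypothetical predictions for four patients (1 = event, 0 = no event).
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, y_score)
sens_at_half = sensitivity(y_true, y_score, threshold=0.5)
print(auc, sens_at_half)
```

AUC does not depend on the decision threshold, whereas sensitivity does. On class-imbalanced data a model can reach a high AUC and still miss most minority-class cases at the default threshold, which is why the two are reported together.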
Figure 4. Bayesian network for predicting functional independence at 3 months. The tree-augmented Bayesian network used 19 variables selected by the wrapper of the Bayesian network for prediction.
Figure 5. Bayesian network for predicting 1-year mortality. The tree-augmented Bayesian network used 24 variables selected by the wrapper of the Bayesian network for prediction.
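To make the term "tree-augmented" concrete, the sketch below builds a tree-augmented naive Bayes (TAN) structure with the classic Chow-Liu-style construction: weight every pair of features by their conditional mutual information given the class, take a maximum-weight spanning tree, and direct its edges away from a root. This is a swapped-in textbook technique for illustration only, since the abstract describes the authors' structures as found by hill-climbing search. All names and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def conditional_mutual_information(xi, xj, c):
    """I(Xi; Xj | C) in nats, estimated from discrete samples."""
    n = len(c)
    n_c = Counter(c)
    n_ic = Counter(zip(xi, c))
    n_jc = Counter(zip(xj, c))
    n_ijc = Counter(zip(xi, xj, c))
    total = 0.0
    for (a, b, k), count in n_ijc.items():
        # p(a, b, k) * log[ p(a, b | k) / (p(a | k) * p(b | k)) ]
        ratio = count * n_c[k] / (n_ic[(a, k)] * n_jc[(b, k)])
        total += (count / n) * math.log(ratio)
    return total


def tan_structure(columns, labels, root=None):
    """Return {feature: feature_parent}; the root's parent is None.

    The class node is implicitly an additional parent of every feature.
    """
    names = list(columns)
    root = names[0] if root is None else root
    weight = {
        frozenset((a, b)): conditional_mutual_information(columns[a], columns[b], labels)
        for a, b in combinations(names, 2)
    }
    parent = {root: None}
    in_tree = [root]
    # Prim's algorithm, always adding the heaviest edge that leaves the tree.
    while len(in_tree) < len(names):
        u, v = max(
            ((u, v) for u in in_tree for v in names if v not in parent),
            key=lambda edge: weight[frozenset(edge)],
        )
        parent[v] = u
        in_tree.append(v)
    return parent


# Toy data: x2 copies x1 within each class, x3 is independent of both.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy = {
    "x1": [0, 1, 0, 1, 0, 1, 0, 1],
    "x2": [0, 1, 0, 1, 0, 1, 0, 1],
    "x3": [0, 0, 1, 1, 0, 0, 1, 1],
}
tree = tan_structure(toy, labels)
print(tree)
```

In the resulting structure each feature has the class plus at most one other feature as parents, which keeps the network readable while relaxing the independence assumption of naive Bayes.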
Figure 6. Screenshots of the online prediction system, Y-SOIS (Yonsei Stroke Outcome Inference System): (A) Y-SOIS forecasting functional independence at 3 months and (B) Y-SOIS forecasting 1-year mortality.