Po-Yuan Su, Yi-Chia Wei, Hung-Yu Wei, Tsong-Hai Lee, Hao Luo, Chi-Hung Liu, Wen-Yi Huang, Kuan-Fu Chen, Ching-Po Lin.
Abstract
BACKGROUND: Timely and accurate outcome prediction plays a vital role in guiding clinical decisions on acute ischemic stroke. Early condition deterioration and severity after the acute stage are determinants for long-term outcomes. Therefore, predicting early outcomes is crucial in acute stroke management. However, interpreting the predictions and transforming them into clinically explainable concepts are as important as the predictions themselves.
Keywords: SHapley Additive exPlanations; acute ischemic stroke; cerebrovascular disease; early outcome; explanation; machine learning; prediction; random forest
Year: 2022 PMID: 35072631 PMCID: PMC8994144 DOI: 10.2196/32508
Source DB: PubMed Journal: JMIR Med Inform
Figure 1. Data enrollment. After the initial enrollment of 3589 patients, data cleaning excluded 809 patients and left 2780 eligible patients. The enrolled data set underwent 5-fold cross-validation: in each fold, the data set was randomly divided such that 80% was used for training and 20% for testing. The cross-validation results were compared with the ground truth, and performance is expressed as the area under the receiver operating characteristic curve. mRS: modified Rankin Scale; NIHSS: National Institutes of Health Stroke Scale.
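The split and the metric described in this caption can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: the function names `k_fold_indices` and `roc_auc` are hypothetical, the shuffle is unstratified (the caption does not say whether stratification was used), and the AUC is computed from its rank-based definition rather than with any particular library.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each fold holds out about 1/k of the data."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k disjoint folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

def roc_auc(y_true, y_score):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

With the 2780 eligible patients, `k_fold_indices(2780, k=5)` gives 5 folds of 2224 training and 556 test indices each, matching the 80%/20% division.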
Figure 2. Prediction of modified Rankin Scale (mRS) at hospital discharge. The outcome variable mRS at discharge was transformed from 6 ordinal classes to a binary class. The good outcome was defined by mRS {0,1,2}, whereas the bad outcome was indicated by mRS {3,4,5,6}. (A) The t-SNE graph shows the distribution of the data. Orange indicates discharge mRS 3-6 and blue represents mRS 0-2. (B) ROC curves for 4 machine learning models. (C) Comparisons of AUC between the data with and without normalization of numerical features. (D) AUC for different amounts of data. AUC: area under the curve; DNN: deep neural network; LGBM: light gradient boosting machine; mRS: modified Rankin Scale; RF: random forest; ROC: receiver operating characteristic; SVM: support vector machine; t-SNE: t-distributed stochastic neighbor embedding.
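The binary outcome defined in this caption amounts to a threshold on the discharge mRS. A minimal sketch follows; the function name `binarize_mrs` and the 0/1 label encoding are assumptions made for illustration, since the caption specifies only the two groups.

```python
def binarize_mrs(mrs):
    """Map an mRS score (0-6) to 0 = good outcome (0-2) or 1 = bad outcome (3-6)."""
    if mrs not in range(7):
        raise ValueError(f"mRS must be an integer from 0 to 6, got {mrs!r}")
    return 0 if mrs <= 2 else 1
```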
Figure 3. Feature importance for predicting modified Rankin Scale at hospital discharge. (A) Top 5 important features of random forest and light gradient boosting machine. SHapley Additive exPlanations of (B) random forest and (C) light gradient boosting machine. Red indicates higher feature sample values, and blue indicates lower feature sample values. For example, the higher the total National Institutes of Health Stroke Scale scores at emergency room and at ward admission, the more severe would be the stroke outcome. ALT: alanine transaminase; APTT: activated partial thromboplastin time; DM: diabetes mellitus; ER: emergency room; LGBM: light gradient boosting machine; LOC: level of consciousness; NIHSS: National Institutes of Health Stroke Scale; RF: random forest; SHAP: SHapley Additive exPlanations; Wd: ward.
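The SHAP plots in this figure rest on Shapley values, which attribute a single prediction to individual features. The sketch below computes exact Shapley values by enumerating feature subsets, with absent features replaced by a baseline value. It is an illustration of the underlying idea only: it is not the algorithm the SHAP library uses for tree models, it is practical only for a handful of features, and the names `shapley_values`, `predict`, and `baseline` are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, with absent features set to baseline."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their real value; the rest take the baseline.
        mixed = [x[j] if j in subset else baseline[j] for j in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {i}) - value(members))
        phi.append(total)
    return phi
```

For a toy linear model such as `lambda v: 2 * v[0] + 3 * v[1]` evaluated at `[1, 1]` with baseline `[0, 0]`, the attributions equal the coefficients, and in general they sum to the difference between the prediction at `x` and at the baseline.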
Figure 4. Prediction of in-hospital deterioration. (A) Visualization by t-distributed stochastic neighbor embedding of the original sample shows an imbalanced outcome. The 3 resampling methods processed the imbalanced data with (B) random under sampling decreasing the majority class, (C) random over sampling increasing the minority class, and (D) synthetic minority over-sampling technique with nominal continuous data synthesis from the minority class. (E) Receiver operating characteristic curves for predicting in-hospital deterioration from the data without resampling. (F) Comparison of the area under the curve in the different resampling methods. Random under sampling was a reasonable choice for resampling. It improved the performance of the random forest, light gradient boosting machine, and support vector machine models, but not the deep neural network. The deep neural network performed better on the original data set than on the resampled data set. DNN: deep neural network; LGBM: light gradient boosting machine; RF: random forest; ROC: receiver operating characteristic; ROS: random over sampling; RUS: random under sampling; SMOTE-NC: synthetic minority over-sampling technique-nominal continuous; SVM: support vector machine.
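The two simpler resampling methods in panels (B) and (C) can be sketched as index selection over a binary label list. This is an assumption-laden illustration rather than the authors' code: the function name `random_resample` is hypothetical, both methods balance the classes to a 1:1 ratio (the caption does not state the target ratio), and SMOTE-NC is omitted because it synthesizes new samples instead of reselecting existing ones.

```python
import random

def random_resample(labels, method="under", seed=0):
    """Return indices that balance two classes by RUS ("under") or ROS ("over")."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    if len(by_class) != 2:
        raise ValueError("expected exactly two classes")
    minority, majority = sorted(by_class.values(), key=len)
    if method == "under":
        # RUS: keep every minority sample, draw an equal-sized subset of the majority.
        majority = rng.sample(majority, len(minority))
    elif method == "over":
        # ROS: keep every majority sample, redraw minority samples with replacement.
        extra = rng.choices(minority, k=len(majority) - len(minority))
        minority = minority + extra
    else:
        raise ValueError("method must be 'under' or 'over'")
    return sorted(minority + majority)
```

In a cross-validation setting, resampling of this kind is usually applied to the training portion of each fold only, so that the test portion keeps the original class balance.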
Figure 5. Feature importance for predicting in-hospital deterioration (without resampling). (A) Top 5 important features include initial systolic blood pressure at hospital admission in random forest and light gradient boosting machine. National Institutes of Health Stroke Scale total score at ward admission is also an important feature in both models. SHapley Additive exPlanations of (B) random forest and of (C) light gradient boosting machine. ALT: alanine transaminase; APTT: activated partial thromboplastin time; BUN: blood urea nitrogen; DBP: diastolic blood pressure; DM: diabetes mellitus; ER: emergency room; HDL: high-density lipoprotein; LDL: low-density lipoprotein; NIHSS: National Institutes of Health Stroke Scale; PT: prothrombin time; RBC: red blood cell; SBP: systolic blood pressure; WBC: white blood cell; Wd: ward.