Bin Zhu¹, Jianlei Zhao², Mingnan Cao¹, Wanliang Du³, Liuqing Yang⁴, Mingliang Su⁴, Yue Tian¹, Mingfen Wu¹, Tingxi Wu¹, Manxia Wang², Xingquan Zhao³, Zhigang Zhao¹.
Abstract
Background: Thrombolysis with r-tPA is recommended for patients with acute ischemic stroke (AIS) within 4.5 h of symptom onset. However, only a few patients benefit from this therapeutic regimen. We therefore aimed to develop an interpretable machine learning (ML)-based model to predict the thrombolytic effect of r-tPA at the super-early stage.
Keywords: acute ischemic stroke; machine learning algorithms; models; r-tPA; thrombolysis
Year: 2022 PMID: 35046804 PMCID: PMC8762247 DOI: 10.3389/fphar.2021.759782
Source DB: PubMed Journal: Front Pharmacol ISSN: 1663-9812 Impact factor: 5.810
Demographic and laboratory data of the AIS patients, stratified by prognosis according to the NIHSS score.
| Variable | Total (n = 353) | Favorable prognosis (n = 156) | Unfavorable prognosis (n = 197) | P value |
|---|---|---|---|---|
| Age, y | 63.0 (56.0, 71.0) | 63.0 (55.0, 70.25) | 63.0 (56.0, 71.0) | 0.438 |
| BMI, kg/m² | 25.24 (23.05, 27.36) | 25.33 (23.03, 27.46) | 25.15 (23.12, 27.34) | 0.373 |
| Gender, n (%) | | | | 0.095 |
| Male | 261 (73.94%) | 108 (69.23%) | 153 (77.66%) | |
| Female | 92 (26.06%) | 48 (30.77%) | 44 (22.34%) | |
| TIA, n (%) | | | | 0.045 |
| No | 261 (73.94%) | 108 (69.23%) | 153 (77.66%) | |
| Yes | 70 (19.83%) | 39 (25.0%) | 31 (15.74%) | |
| Missing | 22 | |||
| NIHSS score at admission | 5.0 (3.0, 9.0) | 4.0 (2.0, 9.0) | 5.0 (3.0, 8.5) | 0.003 |
| NIHSS score 1 h after r-tPA | 3.0 (1.0, 6.0) | 1.0 (0.0, 2.0) | 4.0 (2.0, 8.0) | <0.001 |
| BNP | 35.6 (16.9, 97.28) | 28.0 (13.65, 81.75) | 42.4 (21.6, 103.4) | 0.003 |
| APTT, s | 29.6 (27.2, 31.5) | 29.8 (28.18, 31.58) | 29.0 (26.7, 31.5) | 0.011 |
| RDW-CV, % | 12.7 (12.4, 13.2) | 12.7 (12.4, 13.1) | 12.8 (12.4, 13.2) | 0.01 |
| MO, % | 5.2 (4.2, 6.5) | 5.65 (4.65, 6.73) | 5.0 (4.0, 5.9) | 0.002 |
| FDP, ng/ml | 1.33 (0.96, 2.0) | 1.205 (0.88, 1.8) | 1.5 (1.03, 2.13) | 0.001 |
| MYO, ng/ml | 42.0 (30.35, 63.65) | 38.95 (27.8, 58.58) | 44.4 (33.45, 68.0) | 0.007 |
| EO, % | 1.2 (0.5, 2.2) | 1.3 (0.6, 2.4) | 1.1 (0.4, 2.1) | 0.04 |
| GR, % | 69.68 ± 12.038 | 67.04 ± 11.98 | 71.741 ± 11.98 | 0.001 |
| Glu, mmol/L | 6.84 (5.92, 8.80) | 6.44 (5.68, 8.33) | 7.27 (6.15, 9.11) | 0.003 |
| cTnI, ng/ml | 0.003 (0.001, 0.006) | 0.002 (0.001, 0.004) | 0.003 (0.002, 0.007) | 0.001 |
| CK-MB, ng/ml | 1.2 (0.8, 1.6) | 1.1 (0.8, 1.5) | 1.2 (0.8, 1.7) | 0.036 |
| D-D, μg/ml | 0.6 (0.4, 0.9) | 0.6 (0.4, 0.8) | 0.61 (0.45, 0.99) | 0.012 |
| NLR | 2.87 (2.08, 4.63) | 2.59 (1.90, 4.16) | 3.04 (2.28, 4.79) | 0.002 |
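Continuous variables are presented as median (IQR), except GR, which is presented as mean ± SD; categorical variables are presented as n (%).
TIA: transient ischemic attack; NIHSS: National Institutes of Health Stroke Scale; BMI: body mass index; BNP: brain natriuretic peptide; APTT: activated partial thromboplastin time; RDW-CV: red cell distribution width, coefficient of variation; MO: monocytes; FDP: fibrin(ogen) degradation products; MYO: myoglobin; EO: eosinophils; GR: granulocytes; Glu: glucose; cTnI: cardiac troponin I; CK-MB: creatine kinase-MB; D-D: D-dimer; NLR: neutrophil-to-lymphocyte ratio.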
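The summary formats used in these tables (median (IQR), mean ± SD, and n (%)) can be reproduced with the Python standard library. This is a minimal sketch: the quartile convention (`method="inclusive"`) and the helper names `median_iqr`, `mean_sd`, and `count_pct` are assumptions for illustration, since the source does not say which software or quantile definition produced the tables.

```python
import statistics

def median_iqr(values):
    """Format as 'median (Q1, Q3)'.

    Assumption: quartiles use the 'inclusive' definition; the source does
    not say which quantile convention produced the tables.
    """
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return f"{med:g} ({q1:g}, {q3:g})"

def mean_sd(values, digits=2):
    """Format as 'mean ± SD' using the sample standard deviation."""
    return f"{statistics.mean(values):.{digits}f} ± {statistics.stdev(values):.{digits}f}"

def count_pct(k, n):
    """Format a categorical count as 'n (%)' with two decimals."""
    return f"{k} ({100 * k / n:.2f}%)"

print(median_iqr([1, 2, 3, 4, 5]))   # 3 (2, 4)
print(mean_sd([1, 2, 3]))            # 2.00 ± 1.00
print(count_pct(261, 353))           # 261 (73.94%)
```

For example, `count_pct(261, 353)` reproduces the Total entry of the Male row. Different quantile conventions can shift the IQR bounds slightly on small samples.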
Patient characteristics in the training and testing data sets.
| Characteristic | Total (n = 353) | Training data set (n = 282) | Testing data set (n = 71) |
|---|---|---|---|
| Age, y | 63.0 (56.0, 71.0) | 63.0 (56.0, 70.0) | 63.0 (56.0, 72.5) |
| BMI, kg/m² | 25.24 (23.39, 27.06) | 25.24 (23.38, 27.04) | 25.249 (23.41, 27.14) |
| Gender | |||
| Male | 261 (73.94%) | 208 (73.76%) | 53 (74.65%) |
| Female | 92 (26.06%) | 74 (26.24%) | 18 (25.35%) |
| TIA, n (%) | |||
| No | 283 (80.17%) | 225 (79.79%) | 58 (81.69%) |
| Yes | 70 (19.83%) | 57 (20.21%) | 13 (18.31%) |
| BNP | 35.6 (17.8, 96.7) | 35.6 (17.0, 91.78) | 39.3 (19.15, 102.25) |
| APTT, s | 29.6 (27.2, 31.5) | 29.6 (27.33, 31.2) | 29.2 (26.95, 31.95) |
| RDW-CV, % | 12.7 (12.4, 13.2) | 12.7 (12.4, 13.2) | 12.8 (12.4, 13.15) |
| MO, % | 5.2 (4.6, 5.9) | 5.2 (4.6, 5.8) | 5.2 (4.8, 6.55) |
| FDP, ng/ml | 1.33 (0.97, 2.0) | 1.3 (0.94, 1.96) | 1.52 (1.025, 2.02) |
| MYO, ng/ml | 42.0 (31.0, 63.0) | 42.15 (31.83, 62.53) | 38.8 (26.25, 64.0) |
| EO, % | 1.2 (0.5, 2.2) | 1.1 (0.43, 2.2) | 1.5 (0.6, 2.3) |
| GR, % | 69.36 ± 10.44 | 69.78 ± 10.27 | 67.69 ± 10.27 |
| Glu, mmol/L | 6.84 (6.23, 7.96) | 6.84 (6.19, 8.01) | 6.84 (6.23, 7.03) |
| cTnI, ng/ml | 0.003 (0.001, 0.006) | 0.003 (0.001, 0.005) | 0.003 (0.002, 0.007) |
| CK-MB, ng/ml | 1.2 (0.8, 1.6) | 1.2 (0.8, 1.6) | 1.1 (0.8, 1.49) |
| D-D, μg/ml | 0.6 (0.41, 0.9) | 0.6 (0.43, 0.87) | 0.6 (0.4, 0.92) |
| NLR | 2.87 (2.08, 4.63) | 2.9 (2.13, 4.67) | 2.709 (1.94, 4.34) |
Continuous variables are presented as median (IQR), except GR, which is presented as mean ± SD; categorical variables are presented as n (%).
TIA: transient ischemic attack.
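The gender rows above sum to 282 training and 71 testing patients, i.e. roughly an 80:20 split of the 353 patients. The sketch below shows one way to obtain such a partition; it assumes an unstratified random split with a fixed seed, and `split_indices` is a hypothetical helper, because the source reports only the resulting group sizes. A split stratified by prognosis would be a natural alternative.

```python
import random

def split_indices(n, test_fraction=0.2, seed=42):
    """Randomly partition patient indices 0..n-1 into training and testing sets.

    Assumption: an unstratified random split with a fixed seed; the source
    reports only the resulting group sizes, not how the split was made.
    """
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    n_test = round(n * test_fraction)  # round(353 * 0.2) == 71
    return sorted(order[n_test:]), sorted(order[:n_test])

train_idx, test_idx = split_indices(353)
print(len(train_idx), len(test_idx))  # prints: 282 71
```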
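Summary of prediction results of six ML algorithms on the training and testing data sets. LR: logistic regression; RF: random forest; XGBoost: extreme gradient boosting; AdaBoost: adaptive boosting; GBDT: gradient boosting decision tree; LGBM: light gradient boosting machine; NPV and PPV: negative and positive predictive values; PLR and NLR: positive and negative likelihood ratios.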
| Model | LR | RF | XGBoost | AdaBoost | GBDT | LGBM |
|---|---|---|---|---|---|---|
| Training data set AUC (95% CI) | 0.58 (0.52, 0.65) | 0.80 (0.75, 0.85) | 0.88 (0.84, 0.92) | 0.842 (0.80, 0.89) | 0.97 (0.96, 0.99) | 0.96 (0.94, 0.98) |
| Testing data set AUC (95% CI) | 0.70 (0.57, 0.82) | 0.79 (0.69, 0.90) | 0.80 (0.69, 0.90) | 0.77 (0.66, 0.88) | 0.82 (0.72, 0.92) | 0.81 (0.71, 0.91) |
| Specificity | 0.54 | 0.69 | 0.65 | 0.70 | 0.68 | 0.64 |
| Sensitivity | 0.78 | 0.85 | 0.88 | 0.80 | 0.87 | 0.91 |
| F1 | 0.75 | 0.77 | 0.77 | 0.78 | 0.80 | 0.79 |
| Youden index | 0.32 | 0.53 | 0.53 | 0.50 | 0.54 | 0.55 |
| NPV | 0.75 | 0.79 | 0.86 | 0.68 | 0.82 | 0.89 |
| PPV | 0.58 | 0.77 | 0.70 | 0.81 | 0.74 | 0.67 |
| PLR | 1.69 | 2.71 | 2.51 | 2.69 | 2.67 | 2.53 |
| NLR | 0.41 | 0.22 | 0.18 | 0.29 | 0.20 | 0.15 |
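Every row below the AUCs is a function of one confusion matrix at the chosen decision threshold. The sketch below gives the standard definitions; the counts in the usage line are hypothetical, and which outcome (favorable or unfavorable prognosis) the authors treated as the positive class is not stated in the table. The table is consistent with these definitions: for LR, 0.78 + 0.54 − 1 = 0.32, the reported Youden index.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Threshold-dependent metrics from confusion-matrix counts."""
    sens = tp / (tp + fn)            # sensitivity (recall, true positive rate)
    spec = tn / (tn + fp)            # specificity (true negative rate)
    ppv = tp / (tp + fp)             # positive predictive value (precision)
    npv = tn / (tn + fn)             # negative predictive value
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": ppv,
        "npv": npv,
        "f1": 2 * ppv * sens / (ppv + sens),   # harmonic mean of PPV and sensitivity
        "youden": sens + spec - 1,
        "plr": sens / (1 - spec),    # positive likelihood ratio
        "nlr": (1 - sens) / spec,    # negative likelihood ratio
    }

# Hypothetical counts, for illustration only.
m = diagnostic_metrics(tp=40, fp=20, tn=30, fn=10)
print({k: round(v, 3) for k, v in m.items()})
```

Note that PLR and NLR here are likelihood ratios, unrelated to the neutrophil-to-lymphocyte ratio (NLR) among the laboratory variables.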
FIGURE 1 ROC curves of the six ML algorithms based on variables in the testing data set.
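The AUC summarizing each ROC curve equals the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case (ties count one half), which is the Mann-Whitney formulation. The source does not state how the 95% CIs were obtained; the percentile bootstrap below is one common choice (DeLong's method is another), and `auc` and `bootstrap_auc_ci` are illustrative names, not the authors' code.

```python
import random

def auc(scores, labels):
    """ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5          # ties count one half
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(scores, labels, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (an assumed CI method)."""
    if len(set(labels)) < 2:
        raise ValueError("need both classes to compute an AUC")
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]   # resample with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:                          # skip single-class resamples
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]

scores = [0.9, 0.4, 0.6, 0.2, 0.7, 0.3]   # hypothetical predicted probabilities
labels = [1, 1, 0, 0, 1, 0]
print(auc(scores, labels), bootstrap_auc_ci(scores, labels))
```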
Results of LIME with the GBDT model using the seven most important variables. Four randomly selected patients, representing a true negative, a true positive, a false negative, and a false positive prediction, were used to interpret individual prediction results.
| Model | GBDT |
|---|---|
| Training data set AUC (95% CI) | 0.83 (0.78, 0.87) |
| Testing data set AUC (95% CI) | 0.81 (0.71, 0.91) |
| Specificity | 0.61 |
| Sensitivity | 0.90 |
| F1 | 0.79 |
| Youden index | 0.51 |
| NPV | 0.89 |
| PPV | 0.63 |
| PLR | 2.31 |
| NLR | 0.16 |
FIGURE 2 Screening of the optimal variables by recursive feature elimination. (A) Seven variables were selected for the optimal GBDT algorithm. (B) ROC curves of the GBDT model based on the selected variables.
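Recursive feature elimination repeatedly refits a model on the surviving features and discards the one the model ranks least important, until the target number remains. With scikit-learn, a GBDT version would be `RFE(GradientBoostingClassifier(), n_features_to_select=7)`. To stay dependency-free, the sketch below swaps in a small logistic regression whose absolute coefficients serve as importances (meaningful only when features share a scale), so it illustrates the elimination loop rather than the authors' GBDT setup. The synthetic data and all function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, epochs=200, lr=0.5):
    """Batch gradient descent for logistic regression; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def logistic_importance(X, y):
    # Hypothetical importance: |coefficient|, assuming features share a scale.
    w, _ = fit_logistic(X, y)
    return [abs(wj) for wj in w]

def rfe(X, y, n_select, importance_fn):
    """Refit on the remaining features and drop the least important until n_select remain."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_select:
        subset = [[row[j] for j in remaining] for row in X]
        scores = importance_fn(subset, y)
        worst = min(range(len(remaining)), key=lambda k: scores[k])
        del remaining[worst]
    return remaining

# Synthetic data: only features 0 and 1 drive the label; 2 and 3 are noise.
rng = random.Random(0)
X = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(200)]
y = [1 if 2 * r[0] - 1.5 * r[1] > 0 else 0 for r in X]
selected = rfe(X, y, n_select=2, importance_fn=logistic_importance)
print(selected)
```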
FIGURE 3 The seven most important variables and their impact on the GBDT model output by SHAP analysis. (A) Summary of the SHAP analysis on the data set. Each dot represents a case in the data set, and its color indicates the value of the feature: blue indicates the lowest range and red the highest. (B) Importance ranking of the seven variables by SHAP analysis.
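SHAP values are Shapley values: each feature's contribution is its average marginal effect on the model output over all coalitions of the other features, and the contributions sum to the difference between the prediction and a reference output. The brute-force sketch below follows that definition directly, under the simplifying assumption that "absent" features take the values of a single baseline vector; it enumerates all 2^n coalitions, so it is only practical for a handful of features. The shap library's tree explainers compute Shapley values for tree ensembles such as GBDT far more efficiently and treat absent features differently. `shapley_values` and the toy model are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to baseline."""
    n = len(x)

    def value(coalition):
        # Features in the coalition keep their value from x; others use the baseline.
        return f([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                coalition = set(combo)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi

# Hypothetical linear risk score: each Shapley value equals w_i * (x_i - baseline_i).
model = lambda z: 3 * z[0] - 2 * z[1] + 0.5 * z[2]
x, base = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phi = shapley_values(model, x, base)
print(phi, sum(phi), model(x) - model(base))   # contributions sum to f(x) - f(baseline)
```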
FIGURE 4 Results of LIME with the GBDT model applied to four randomly selected patients.
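LIME explains a single prediction by sampling perturbed inputs around the patient, querying the model, and fitting a proximity-weighted linear surrogate whose coefficients act as local feature effects. The sketch below is a simplified version under stated assumptions: Gaussian perturbations in raw feature units, an exponential kernel whose default width (0.75 × the square root of the number of features) mirrors the lime package's tabular default, and an exact weighted least-squares fit. The lime package's `LimeTabularExplainer` additionally scales perturbations with training-data statistics and by default discretizes continuous features. All names here are hypothetical.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def lime_explain(f, x, n_samples=500, scale=1.0, kernel_width=None, seed=0):
    """Fit a locally weighted linear surrogate to f around x; return (intercept, coefs)."""
    rng = random.Random(seed)
    d = len(x)
    if kernel_width is None:
        kernel_width = 0.75 * math.sqrt(d)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        z = [xi + rng.gauss(0.0, scale) for xi in x]        # perturbed neighbor
        dist2 = sum((zi - xi) ** 2 for zi, xi in zip(z, x))
        rows.append([1.0] + z)                               # 1.0 = intercept column
        targets.append(f(z))
        weights.append(math.exp(-dist2 / kernel_width ** 2))  # closer samples count more
    # Weighted least squares via normal equations: (X^T W X) beta = X^T W y.
    p = d + 1
    A = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows)) for j in range(p)] for i in range(p)]
    b = [sum(w * r[i] * t for w, r, t in zip(weights, rows, targets)) for i in range(p)]
    beta = solve(A, b)
    return beta[0], beta[1:]

# Hypothetical nonlinear risk model explained around one hypothetical patient.
risk = lambda z: 1.0 / (1.0 + math.exp(-(z[0] - 2.0 * z[1] + 0.3 * z[2])))
print(lime_explain(risk, [0.5, -0.2, 1.0]))
```

Because the surrogate is fitted only near the chosen patient, its coefficients can differ from patient to patient, which is why individual cases (true and false positives and negatives) are examined separately.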