Shuo Zhang1,2, Jing Wang1,2, Lulu Pei3, Kai Liu3, Yuan Gao3, Hui Fang3, Rui Zhang3, Lu Zhao3, Shilei Sun3, Jun Wu3, Bo Song3, Honghua Dai2,4, Runzhi Li5, Yuming Xu6.
Abstract
BACKGROUND: TOAST subtype classification is important for the diagnosis and study of ischemic stroke. Because it depends on the experience of neurologists and on time-consuming manual adjudication, completing TOAST classification effectively is a major challenge. We propose a novel active deep learning architecture to classify TOAST subtypes.
Keywords: Active learning; Classification algorithm; Interpretability; Ischemic Stroke; Loss function
Year: 2022 PMID: 34986813 PMCID: PMC8729146 DOI: 10.1186/s12911-021-01721-5
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 The basic schematic of our work
Fig. 2 The detailed schematic of our proposed network with active model adaptation
Fig. 3 Causal CNN architecture
Distribution of TOAST subtypes in the patient cohort
| Etiologic subtypes of ischemic stroke | Number of patients | Proportion of subtypes (%) |
|---|---|---|
| Large artery atherosclerosis (LAA) | 1290 | 56 |
| Cardioembolism (CE) | 107 | 5 |
| Small artery occlusion (SAO) | 550 | 24 |
| Other determined cause (OC) | 81 | 3 |
| Undetermined cause (UND) | 282 | 12 |
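The class imbalance in this table can be checked with a few lines of arithmetic. The sketch below only recomputes each subtype's share from the listed counts and adds a largest-to-smallest class ratio; the variable names are illustrative, and the one-decimal percentages may differ slightly from the rounded whole numbers printed in the table.

```python
# Patient counts per TOAST subtype, copied from the table above.
counts = {"LAA": 1290, "CE": 107, "SAO": 550, "OC": 81, "UND": 282}

total = sum(counts.values())

# Share of each subtype as a percentage of the whole cohort.
proportions = {name: 100.0 * n / total for name, n in counts.items()}

# Ratio between the largest and smallest class, a rough imbalance measure.
imbalance_ratio = max(counts.values()) / min(counts.values())

for name, pct in proportions.items():
    print(f"{name}: {counts[name]} patients, {pct:.1f}%")
print(f"total: {total}, largest/smallest class ratio: {imbalance_ratio:.1f}")
```

The total matches the gender breakdown reported for the cohort (1557 male plus 753 female patients).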
Inclusion and exclusion criteria
| Inclusion criteria | Exclusion criteria |
|---|---|
| Age of patient over 18 years | Hemorrhagic stroke |
| Cerebral infarction and TIA | |
| Time of onset and admission over 7 days | Non-cerebrovascular disease event |
| Sign informed consent | |
Features of the analyzed cohort
| Feature | Description |
|---|---|
| Gender | Male: 1557, Female: 753 |
| Age | Mean age: 59.2 |
| Demography | Nationality, Marital status, Living condition, Education level |
| Personal situation | Smoking, Drinking |
| Past medication | Antiplatelet, Antihypertensive, Antidiabetic, Antilipemic |
| Family history | Hypertension, Diabetes, Stroke, Cardiovascular disease |
| Past history | Hypertension, Stroke, TIA, Coronary atherosclerotic cardiopathy, Atrial fibrillation, Diabetes, Dyslipidemia, Renal disease, Surgery, mRS score |
| Treatment during hospitalization | Medication, Surgery, Rehabilitation training |
| Admission examination | Initial symptoms, Thrombolytic status, Basic information, NIHSS score |
| In-hospital adverse events | Adverse cardiac events, Adverse vascular events |
Comparison of different versions of the causal CNN architecture
| Model | Accuracy | AUC | Recall | Precision | F1-score | Number of parameters |
|---|---|---|---|---|---|---|
| CNN-V1 | 0.5578 | 0.6557 | 0.5578 | 0.6012 | 0.4948 | 11077 |
| CNN-V2 | 0.5682 | 0.6505 | 0.5682 | 0.5912 | 0.4683 | 8997 |
| CNN-V3 | 0.5652 | 0.6474 | 0.5652 | 0.6081 | 0.4973 | 11717 |
The bold values are to highlight our results
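In this table the Accuracy and Recall columns are identical in every row. That pattern is what support-weighted averaging of per-class recall produces, because weighted recall reduces algebraically to overall accuracy. The record does not state which averaging was used, so the sketch below is an assumption-based illustration on made-up labels, not the authors' evaluation code.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_recall(y_true, y_pred):
    """Per-class recall averaged with weights equal to class frequency."""
    support = Counter(y_true)
    n = len(y_true)
    score = 0.0
    for label, actual in support.items():
        hits = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        score += (actual / n) * (hits / actual)
    return score


# Hypothetical labels, for illustration only.
y_true = ["LAA", "LAA", "LAA", "SAO", "SAO", "CE"]
y_pred = ["LAA", "LAA", "SAO", "SAO", "LAA", "LAA"]

acc = accuracy(y_true, y_pred)
rec = weighted_recall(y_true, y_pred)
print(acc, rec)
```

Each class contributes (support / n) * (hits / support) = hits / n, so the weighted sum is the total number of correct predictions divided by n.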
Validation of data preprocessing operations
| Dataset | Accuracy | AUC | Recall | Precision | F1-score |
|---|---|---|---|---|---|
| 122 raw features | 0.5704 | 0.6484 | 0.5704 | 0.5942 | 0.4926 |
| 93 unranked features | 0.5682 | 0.6479 | 0.5682 | 0.6018 | 0.4948 |
The bold values are to highlight our results
Comparison of different preprocessing methods
| Preprocessing method | Accuracy | AUC | Recall | Precision | F1-score |
|---|---|---|---|---|---|
| Scale | 0.4621 | 0.5927 | 0.4621 | 0.6040 | 0.4614 |
| Standard Scaler | 0.4534 | 0.5878 | 0.4534 | 0.6007 | 0.4541 |
| Min–Max | 0.4903 | 0.6052 | 0.4903 | 0.5980 | 0.4760 |
| Max Abs Scaler | 0.5110 | 0.6165 | 0.5110 | 0.6087 | 0.4894 |
| L1 | 0.5539 | 0.6192 | 0.5539 | 0.5720 | 0.4398 |
| L2 | 0.5535 | 0.6372 | 0.5535 | 0.5829 | 0.4702 |
The bold values are to highlight our results
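For readers unfamiliar with the rescaling schemes named in this table, the following is a minimal pure-Python sketch of their usual definitions. The function names are illustrative, and it is an assumption that the table's labels correspond to these standard transforms: column-wise standardization, min-max and max-abs scaling, and row-wise L1 or L2 normalization.

```python
import math


def standardize(column):
    """Zero mean, unit variance (population standard deviation)."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / len(column))
    return [(x - mean) / std if std else 0.0 for x in column]


def min_max(column):
    """Map the column linearly onto [0, 1]."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [(x - lo) / span if span else 0.0 for x in column]


def max_abs(column):
    """Divide by the largest absolute value, giving values in [-1, 1]."""
    peak = max(abs(x) for x in column)
    return [x / peak if peak else 0.0 for x in column]


def normalize_row(row, norm="l2"):
    """Scale one sample (row) to unit L1 or L2 norm."""
    if norm == "l1":
        size = sum(abs(x) for x in row)
    else:
        size = math.sqrt(sum(x * x for x in row))
    return [x / size if size else 0.0 for x in row]


print(min_max([2, 4, 6]))
print(max_abs([-4, 2]))
print(normalize_row([3, 4], norm="l2"))
print(normalize_row([1, -3], norm="l1"))
```

The first three functions operate on one feature across all patients, while `normalize_row` operates on one patient across all features, which is the main practical difference between the scaler rows and the L1/L2 rows.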
Fig. 4 Comparison of different subsets of features with Ours and ET
Comparison of different baseline models
| Method | Accuracy | AUC | Recall | Precision | F1-score |
|---|---|---|---|---|---|
| NB | 0.5023 | 0.6054 | 0.5023 | 0.4493 | 0.4231 |
| Multinomial NB | 0.1728 | 0.5402 | 0.1728 | 0.4471 | 0.2070 |
| DT | 0.5421 | 0.6138 | 0.5421 | 0.4538 | 0.4594 |
| RF | 0.5671 | 0.6532 | 0.5671 | 0.4865 | 0.4755 |
| ET | 0.5786 | 0.6504 | 0.5786 | 0.5022 | 0.5016 |
| CART | 0.4431 | 0.5476 | 0.4431 | 0.4527 | 0.4557 |
| GDBT | 0.5639 | 0.5956 | 0.5639 | 0.4321 | 0.4544 |
| XGBoost | 0.5605 | 0.6453 | 0.5605 | 0.4734 | 0.4702 |
| AdaBoost | 0.5409 | 0.5812 | 0.5409 | 0.4639 | 0.4716 |
| LDA | 0.5647 | 0.6302 | 0.5647 | 0.4577 | 0.4653 |
| QDA | 0.2616 | 0.5667 | 0.2616 | 0.4144 | 0.2039 |
| LR | 0.5565 | 0.6309 | 0.5565 | 0.4452 | 0.4290 |
| KNN | 0.5366 | 0.6031 | 0.5366 | 0.4513 | 0.4564 |
| SVM | 0.5646 | 0.6228 | 0.5646 | 0.4461 | 0.4570 |
| NN | 0.5539 | 0.5192 | 0.5539 | 0.3649 | 0.4083 |
| MLP | 0.5353 | 0.5015 | 0.5353 | 0.3140 | 0.3956 |
| LSTM | 0.1295 | 0.5544 | 0.1295 | 0.4978 | 0.1252 |
| LSTM+Att | 0.0879 | 0.5781 | 0.0879 | 0.2701 | 0.0634 |
| Bi-LSTM | 0.1923 | 0.6032 | 0.1923 | 0.7009 | 0.1924 |
| Bi-LSTM+Att | 0.1515 | 0.6020 | 0.1515 | 0.6986 | 0.1446 |
The bold values are to highlight our results
Comparison of different loss functions for our model
| Loss function | Accuracy | AUC | Recall | Precision | F1-score |
|---|---|---|---|---|---|
| Mean absolute error | 0.4647 | 0.5100 | 0.4647 | 0.3886 | 0.3115 |
| Mean absolute percentage error | 0.4933 | 0.5082 | 0.4933 | 0.4457 | 0.3383 |
| Mean squared error | 0.5189 | 0.5464 | 0.5189 | 0.5085 | 0.3908 |
| Mean squared logarithmic error | 0.5643 | 0.5928 | 0.5643 | 0.5895 | 0.4693 |
| Categorical cross-entropy | 0.5665 | 0.6515 | 0.5665 | 0.5940 | 0.4863 |
| Kullback-Leibler divergence | 0.5660 | 0.6532 | 0.5660 | 0.5939 | 0.4815 |
| Focal loss | 0.2287 | 0.6104 | 0.2287 | 0.5704 | 0.2379 |
The bold values are to highlight our results
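The categorical cross-entropy and Kullback-Leibler rows in this table are nearly identical, which is expected when targets are one-hot: the target entropy is zero, so the two losses coincide. The sketch below gives the textbook single-sample definitions of these two losses and of focal loss (without class weighting). The function names, the `gamma=2.0` default, and the example probabilities are assumptions for illustration; the combined KL-focal loss evaluated in this work is not defined in this record and is not reproduced here.

```python
import math

EPS = 1e-12  # guards against log(0)


def cross_entropy(target, predicted):
    """Categorical cross-entropy: -sum_i t_i * log(q_i)."""
    return -sum(t * math.log(max(q, EPS)) for t, q in zip(target, predicted))


def kl_divergence(target, predicted):
    """KL(t || q) = sum_i t_i * log(t_i / q_i); terms with t_i = 0 vanish."""
    return sum(
        t * math.log(t / max(q, EPS))
        for t, q in zip(target, predicted)
        if t > 0
    )


def focal_loss(target, predicted, gamma=2.0):
    """Cross-entropy down-weighted by (1 - q_i)^gamma for confident predictions."""
    return -sum(
        t * (1.0 - q) ** gamma * math.log(max(q, EPS))
        for t, q in zip(target, predicted)
    )


# Hypothetical one-hot target and softmax output over the five subtypes.
target = [1.0, 0.0, 0.0, 0.0, 0.0]
predicted = [0.7, 0.1, 0.1, 0.05, 0.05]

ce = cross_entropy(target, predicted)
kl = kl_divergence(target, predicted)
fl = focal_loss(target, predicted)
print(ce, kl, fl)
```

With `gamma=0` the focal term disappears and `focal_loss` reduces to `cross_entropy`; larger values shrink the contribution of samples the model already classifies confidently.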
Fig. 5 Comparison of different models for KL-focal loss. Orange marks results obtained with focal loss; blue marks results obtained with the KL-focal loss function
Fig. 6 Comparison of different strategies for active selection criterion
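The specific selection strategies compared in Fig. 6 are not listed in this record. As a hedged illustration of what an active selection criterion typically looks like, the sketch below scores unlabeled samples with three commonly used uncertainty measures (least confidence, margin, entropy) and picks the most uncertain ones for annotation. All names and probabilities are hypothetical and may not match the strategies the authors evaluated.

```python
import math


def least_confidence(probs):
    """Higher when the top predicted probability is low."""
    return 1.0 - max(probs)


def margin_uncertainty(probs):
    """Higher when the two most likely classes are close together."""
    top = sorted(probs, reverse=True)
    return 1.0 - (top[0] - top[1])


def entropy(probs):
    """Shannon entropy of the predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(pool, k, score=entropy):
    """Return indices of the k pool samples with the highest uncertainty score."""
    ranked = sorted(range(len(pool)), key=lambda i: score(pool[i]), reverse=True)
    return ranked[:k]


# Hypothetical predicted probabilities for three unlabeled patients.
pool = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.60, 0.30, 0.10],
]

picked = select_most_uncertain(pool, k=2)
print(picked)
```

In an active learning loop, the selected samples would be sent to an expert for labeling and then added to the training set before the model is updated.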
Fig. 7 Confusion matrix of the best model
Distribution of TOAST subtypes in the additional patients
| Etiologic subtypes of ischemic stroke | Number of patients | Proportion of initial data (%) |
|---|---|---|
| Large artery atherosclerosis (LAA) | 545 | + 42 |
| Cardioembolism (CE) | 47 | + 44 |
| Small artery occlusion (SAO) | 400 | + 72 |
| Other determined cause (OC) | 49 | + 60 |
| Undetermined cause (UND) | 159 | + 56 |
Comparison of classification performance in individual classes
| Subtype | Precision* | Recall* | F1-score* | Precision | Recall | F1-score |
|---|---|---|---|---|---|---|
| LAA | 0.5960 | 0.8774 | 0.7042 | 0.6559 | 0.7552 | 0.6994 |
| CE | 0.6220 | 0.3857 | 0.4705 | 0.7208 | 0.5343 | 0.5923 |
| SAO | 0.3966 | 0.2821 | 0.2910 | 0.5690 | 0.5392 | 0.5446 |
| OC | 0.2917 | 0.0682 | 0.1020 | 0.4785 | 0.2747 | 0.3277 |
| UND | 0.3067 | 0.0280 | 0.0507 | 0.3943 | 0.2391 | 0.2825 |
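Per-class precision, recall and F1-score such as those in this table are normally derived from a confusion matrix like the one in Fig. 7. The sketch below shows that derivation on a small made-up three-class matrix; the counts are hypothetical and do not reproduce the reported values.

```python
def per_class_metrics(matrix, labels):
    """Precision, recall and F1 per class.

    matrix[i][j] is the number of samples whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    results = {}
    for k, label in enumerate(labels):
        true_pos = matrix[k][k]
        predicted = sum(row[k] for row in matrix)  # column total
        actual = sum(matrix[k])                    # row total
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results[label] = {"precision": precision, "recall": recall, "f1": f1}
    return results


# Hypothetical confusion matrix, rows = true class, columns = predicted class.
labels = ["LAA", "CE", "SAO"]
matrix = [
    [8, 1, 1],
    [2, 3, 0],
    [4, 0, 6],
]

metrics = per_class_metrics(matrix, labels)
for label in labels:
    print(label, metrics[label])
```

A class that is rarely predicted, as the low recall values for OC and UND suggest, shows up as a small column total and a diagonal entry far below its row total.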