Ayoub Bagheri, T Katrien J Groenhof, Folkert W Asselbergs, Saskia Haitjema, Michiel L Bots, Wouter B Veldhuis, Pim A de Jong, Daniel L Oberski.
Abstract
Methods: We used EHR data of patients included in the Second Manifestations of ARTerial disease (SMART) study. We propose a deep learning-based multimodal architecture for our text mining pipeline that integrates neural text representation with preprocessed clinical predictors to predict the recurrence of major cardiovascular events in cardiovascular patients. Text preprocessing, including cleaning and stemming, was first applied to remove unwanted text from X-ray radiology reports. Text representation methods were then used to represent the unstructured radiology reports numerically as vectors. Subsequently, these text representations were added to prediction models to assess their clinical relevance. In this step, we applied logistic regression, support vector machine (SVM), multilayer perceptron neural network, convolutional neural network, long short-term memory (LSTM), and bidirectional LSTM (BiLSTM) deep neural network.
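
The idea of adding a text representation to clinical predictors can be illustrated with a minimal sketch. It uses a plain bag-of-words count vector rather than the neural representation of the proposed architecture, and it is not the authors' code: the function names, the two report snippets, and the two predictor columns (age, systolic blood pressure) are all invented for illustration.

```python
import re
from collections import Counter


def tokenize(report: str) -> list[str]:
    """Lowercase a report and split it into runs of letters."""
    # [^\W\d_]+ matches Unicode letters only (word characters minus digits and "_").
    return re.findall(r"[^\W\d_]+", report.lower())


def build_vocabulary(reports: list[str]) -> list[str]:
    """Collect the sorted set of tokens seen across all reports."""
    return sorted({tok for report in reports for tok in tokenize(report)})


def bow_vector(report: str, vocabulary: list[str]) -> list[int]:
    """Count how often each vocabulary term occurs in one report."""
    counts = Counter(tokenize(report))
    return [counts[term] for term in vocabulary]


def fuse(clinical: list[float], text_vector: list[int]) -> list[float]:
    """Concatenate clinical predictors with the text representation."""
    return list(clinical) + [float(v) for v in text_vector]


# Hypothetical toy data, not taken from the SMART study.
reports = ["hart vergroot geen infiltraat", "geen afwijkingen hart normaal"]
clinical = [[61.0, 145.0], [48.0, 128.0]]  # assumed columns: age, SBP

vocabulary = build_vocabulary(reports)
features = [fuse(c, bow_vector(r, vocabulary)) for c, r in zip(clinical, reports)]

print(vocabulary)
print(features[0])
```

Each row of `features` could then be passed to any of the listed classifiers; the deep learning variants would replace the count vector with learned embeddings.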
Year: 2021 PMID: 34306597 PMCID: PMC8285182 DOI: 10.1155/2021/6663884
Source DB: PubMed Journal: J Healthc Eng ISSN: 2040-2295 Impact factor: 2.682
Characteristics of the patients.
| Characteristics | Total |
|---|---|
| Age, years, mean (SD) | 56.2 (12.5) |
| Female sex, n (%) | 1926 (34.4) |
| Current smoker, n (%) | 1549 (27.6) |
| CHD, n (%) | 2166 (38.7) |
| Stroke, n (%) | 1076 (19.2) |
| PAD, n (%) | 631 (11.3) |
| AAA, n (%) | 306 (5.5) |
| Years since first diagnosis of CVD, median (IQR) | 0 (0–4) |
| Diabetes mellitus, n (%) | 1047 (18.7) |
| Hypertension, n (%) | 2353 (42.0) |
| Dyslipidemia, n (%) | 432 (7.7) |
| BMI, kg/m2 (mean (SD)) | 26.8 (4.3) |
| SBP, mmHg (mean (SD)) | 140 (21) |
| DBP, mmHg (mean (SD)) | 83 (13) |
| Total cholesterol, mmol/L (mean (SD)) | 5.14 (1.38) |
| LDL-cholesterol, mmol/L (mean (SD)) | 3.1 (1.16) |
| HDL-cholesterol, mmol/L (mean (SD)) | 1.27 (0.38) |
| Triglycerides, mmol/L (median (IQR)) | 1.7 (1.2–2.5) |
| MDRD, ml/min/1.73 m2 (median (IQR)) | 80 (68–91) |
| HbA1c | 5.7 (5.4–6.1) |
| Glucose, mmol/L (median (IQR)) | 5.7 (2.6–6.4) |
| Hemoglobin, mmol/L (mean (SD)) | 6.0 (2.04) |
| Creatinine | 84 (73–97) |
| CRP, mg/L (median (IQR)) | 1.95 (0.90–4.20) |
| TSH, mU/l (mean (SD)) | 0.9 (0.09) |
| MACE during follow-up, n (%) | 1385 (24.7) |
CVD: cardiovascular disease; CHD: coronary heart disease; PAD: peripheral arterial disease; AAA: abdominal aortic aneurysm; BMI: body mass index; SBP: systolic blood pressure; DBP: diastolic blood pressure; LDL: low-density lipoprotein; HDL: high-density lipoprotein; MDRD: modification of diet in renal disease; HbA1c: hemoglobin A1c; CRP: C-reactive protein; TSH: thyroid-stimulating hormone; MACE: major cardiovascular events.
Figure 1: Overview of the text mining pipeline methodology.
Dutch stop words used in this case study.
| de | informatie | je | al | na | worden | tegen |
| en | eerdere | mij | waren | reeds | zelf | gegevens |
| van | klinisch | uit | doen | wil | ons | klinische |
| ik | er | der | toen | kon | kunnen | tot |
| te | maar | daar | moet | uw | ook | omdat |
| dat | om | haar | ben | iemand | bij | ge |
| die | hem | naar | kan | geweest | zich | nu |
| in | dan | heb | hun | andere | gegevens | had |
| aan | zou | hoe | dus | klinisch | voor | als |
| een | of | heeft | onder | informatie | hier | thorax |
| hij | wat | hebben | ja | gegeven | men | u |
| het | mijn | deze | eens | xthorax | zijn | doch |
| is | dit | want | wie | conclusie | met | me |
| was | zo | nog | werd | onderzoek | ze | zij |
| op | door | zal | altijd | opname | wordt | eerder |
| over | ter | x/x |
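
As a rough illustration of how such a stop-word list is applied during preprocessing, the sketch below filters tokens against a small subset of the words in the table. The function name, the punctuation handling, and the example sentence are assumptions made for this sketch, not details reported by the study.

```python
# A small subset of the Dutch stop words listed in the table; a real
# pipeline would load the complete list.
DUTCH_STOP_WORDS = {
    "de", "en", "van", "het", "een", "is", "op", "in",
    "thorax", "xthorax", "conclusie", "onderzoek", "klinische", "gegevens",
}


def remove_stop_words(text, stop_words=DUTCH_STOP_WORDS):
    """Lowercase the text, strip edge punctuation, and drop stop words."""
    tokens = [tok.strip(".,;:()") for tok in text.lower().split()]
    return [tok for tok in tokens if tok and tok not in stop_words]


# Hypothetical report sentence, invented for this example.
example = "Conclusie: de thorax is normaal."
filtered = remove_stop_words(example)
print(filtered)  # ['normaal']
```

Domain words such as "thorax" and "conclusie" appear in nearly every report, which is presumably why the list extends beyond general Dutch stop words.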
Figure 2: Most frequent words in the X-ray radiology reports in the SMART study. (a) Initial top frequent words. (b) Top frequent words after preprocessing.
Figure 3: LDA clustering. The y-axis shows the top words in the selected cluster (topic). The x-axis shows the probability of the word in the topic. (a) Possible cardiac decompensation. (b) Possible pneumonia.
Figure 4: Proposed multimodal learning architecture with a deep learning model.
Hyperparameter setting.
| Hyperparameter | Value |
|---|---|
| Embedding size | 500 |
| Window size | 5 |
| #filters | 128 |
| Filter size | 5 |
| #hidden units | 64 |
| Hidden activation function | ReLU |
| Output activation function | Sigmoid |
| #LSTM units | 100 |
| Dropout | 0.2 |
| Recurrent dropout | 0.2 |
| Batch size | 64 |
| #epochs | 20 |
Performance comparison of different experimental scenarios using AUC and misclassification rate.
| Classifier | AUC | Misclassification rate |
|---|---|---|
| V-LR | 0.799 | 0.195 |
| V-SVM | 0.648 | 0.196 |
| V-NN | 0.651 | 0.201 |
| T-LR | 0.512 | 0.247 |
| T-SVM | 0.625 | 0.186 |
| T-BiLSTM | 0.570 | 0.300 |
| VB-LR | 0.808 | 0.193 |
| VB-SVM | 0.784 | 0.122 |
| VC-LR | 0.809 | 0.194 |
| VC-SVM | 0.655 | 0.197 |
| MI-LR | 0.811 | 0.203 |
| MI-SVM | 0.694 | 0.237 |
| MI-CNN | 0.730 | 0.214 |
| MI-LSTM | 0.794 | 0.176 |
| MI-BiLSTM | 0.847 | 0.143 |
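
For readers unfamiliar with the two metrics in this table, the following sketch shows one common way to compute them. The labels and scores are invented, and the 0.5 decision threshold is an assumption; the source does not state which threshold was used.

```python
def misclassification_rate(y_true, y_pred):
    """Fraction of predictions that differ from the true label."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative case (ties count half).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = MACE during follow-up) and predicted probabilities.
y_true = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.2, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(round(auc(y_true, scores), 3))           # 0.667
print(misclassification_rate(y_true, y_pred))  # 0.4
```

AUC depends only on the ranking of the scores, while the misclassification rate depends on the chosen threshold, so a model can lead on one metric and trail on the other.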
Figure 5: Comparison of precision, recall, and F1 score for experimental scenarios.
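
Precision, recall, and F1 score follow from the counts of true positives, false positives, and false negatives. The sketch below uses the standard definitions with invented labels; the function name and the convention of returning 0.0 for an undefined ratio are assumptions for illustration.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Assumed convention: an undefined ratio is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels and predictions: 2 true positives, 1 false positive,
# 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```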
Performance comparison of different experimental scenarios using AUC and misclassification rate when clinical predictors are not available.
| Classifier | AUC | Misclassification rate |
|---|---|---|
| D-LR | 0.685 | 0.242 |
| D-SVM | 0.572 | 0.246 |
| D-NN | 0.567 | 0.214 |
| DB-LR | 0.703 | 0.247 |
| DB-SVM | 0.674 | 0.163 |
| DC-LR | 0.705 | 0.239 |
| DC-SVM | 0.534 | 0.247 |
| D-MI-LR | 0.708 | 0.235 |
| D-MI-SVM | 0.568 | 0.247 |
| D-MI-CNN | 0.667 | 0.228 |
| D-MI-LSTM | 0.708 | 0.209 |
| D-MI-BiLSTM | 0.745 | 0.204 |
D-LR: LR trained on demographic variables. DB-LR: LR trained on demographic variables and BOW representation. DC-LR: LR trained on demographic variables and clustering-based representation. D-MI-LR: multimodal learning LR trained on demographic variables and word embeddings.
Figure 6: Comparison of precision, recall, and F1 score for experimental scenarios when clinical predictors are not available.