Sheng-Feng Sung, Kuan-Lin Sung, Ru-Chiou Pan, Pei-Ju Lee, Ya-Han Hu.
Abstract
Background: Timely detection of atrial fibrillation (AF) after stroke is clinically important because it guides the choice of secondary stroke prevention strategy. When medical resources are limited, extended heart rhythm monitoring should be prioritized by stratifying patients according to their risk of newly detected AF (NDAF). This study aimed to develop an electronic health record (EHR)-based machine learning model to assess the risk of NDAF at an early stage after stroke.
Keywords: atrial fibrillation; electronic health records; ischemic stroke; natural language processing; prediction
Year: 2022; PMID: 35966534; PMCID: PMC9372298; DOI: 10.3389/fcvm.2022.941237
Source DB: PubMed; Journal: Front Cardiovasc Med; ISSN: 2297-055X
Figure 1. Definition of AF categories according to the time sequence between AF detection and the index stroke. AF, atrial fibrillation.
Figure 2. The process of machine learning model construction. BOW, bag-of-words; BR, binary representation; CV, cross validation; TF, term frequency; TF-IDF, term frequency with inverse document frequency.
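The three bag-of-words weightings named in the Figure 2 caption (BR, TF, TF-IDF) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `vectorize`, the whitespace tokenization, the plain `log(N / df)` form of the inverse document frequency, and the two example notes are all assumptions made for illustration.

```python
import math
from collections import Counter


def vectorize(docs, scheme="tf"):
    """Turn tokenized documents into bag-of-words vectors.

    scheme: "br" (binary representation), "tf" (term frequency),
    or "tfidf" (term frequency times inverse document frequency).
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Assumption: unsmoothed idf = log(N / df); libraries often add smoothing.
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        if scheme == "br":
            row = [1 if counts[t] else 0 for t in vocab]
        elif scheme == "tf":
            row = [counts[t] for t in vocab]
        elif scheme == "tfidf":
            row = [counts[t] * idf[t] for t in vocab]
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        vectors.append(row)
    return vocab, vectors


# Hypothetical, made-up note fragments; not data from the study.
notes = [
    "irregular heart rhythm noted".split(),
    "heart rhythm regular".split(),
]
vocab, br = vectorize(notes, "br")
```

With the unsmoothed form used here, a term that appears in every document receives a TF-IDF weight of zero, which is why smoothed variants are common in practice.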
Characteristics of the study population.
| Characteristic | | | P value |
|---|---|---|---|
| Age, mean (SD) | 69.2 (12.3) | 68.0 (13.5) | 0.002 |
| Female | 1,896 (41.2) | 531 (35.6) | <0.001 |
| Hypertension | 3,705 (80.5) | 1,119 (75.0) | <0.001 |
| Diabetes mellitus | 1,958 (42.5) | 683 (45.8) | 0.028 |
| Hyperlipidemia | 2,670 (58.0) | 852 (57.1) | 0.546 |
| Coronary artery disease | 560 (12.2) | 103 (6.9) | <0.001 |
| Congestive heart failure | 228 (5.0) | 25 (1.7) | <0.001 |
| Prior stroke or TIA | 1,143 (24.8) | 274 (18.4) | <0.001 |
| NIHSS, median (IQR) | 5 (3–10) | 5 (2–8) | <0.001 |
| AS5F, median (IQR) | 67.4 (59.2–76.5) | 65.8 (56.9–74.2) | <0.001 |
| CHASE-LESS, median (IQR) | 6 (5–8) | 6 (4–7) | <0.001 |
Data are numbers (percentage) unless specified otherwise.
IQR, interquartile range; NIHSS, National Institutes of Health Stroke Scale; SD, standard deviation; TIA, transient ischemic attack.
Figure 3. Heat map showing AUC values across machine learning models with different combinations of text vectorization techniques and resampling methods. AUC, area under the receiver operating characteristic curve; BR, binary representation; TF, term frequency; TF-IDF, term frequency with inverse document frequency.
Performance of prediction models for predicting newly detected atrial fibrillation.
| Model | HR (95% CI) | P value | | |
|---|---|---|---|---|
| AS5F | 1.10 (1.08–1.13) | <0.001 | 0.062 | 0.779 (0.734–0.825) |
| CHASE-LESS | 1.49 (1.38–1.60) | <0.001 | 0.296 | 0.768 (0.721–0.816) |
| Model A (structured) | 1.05 (1.04–1.06) | <0.001 | 0.764 | 0.791 (0.745–0.836) |
| Model B (unstructured) | 1.04 (1.03–1.05) | <0.001 | 0.060 | 0.738 (0.688–0.788) |
| Model C (combined) | 1.05 (1.04–1.06) | <0.001 | 0.600 | 0.840 (0.803–0.876) |
CI, confidence interval; HR, hazard ratio.
Figure 4. The top 20 most important features identified by the model based on both structured data and unstructured textual data. The mean absolute Shapley values that indicate the average impact on model output are shown in a bar chart (A). The individual Shapley values for these features for each patient are depicted in a beeswarm plot (B), where a dot's position on the x-axis denotes each feature's contribution to the model prediction for that patient. The color of the dot specifies the relative value of the corresponding feature.
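The ranking shown in panel (A) of Figure 4 rests on a simple aggregation: for each feature, average the absolute per-patient Shapley values, then sort. The sketch below shows only that aggregation step and assumes the per-patient Shapley values have already been computed elsewhere. The function name `mean_abs_shap`, the feature names, and the numbers are hypothetical and are not taken from the study.

```python
def mean_abs_shap(shap_rows, feature_names, top_k=20):
    """Rank features by mean absolute Shapley value.

    shap_rows: one list per patient, one Shapley value per feature.
    Returns up to top_k (feature, mean |value|) pairs, largest first.
    """
    n = len(shap_rows)
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        for j, value in enumerate(row):
            totals[j] += abs(value)
    means = [total / n for total in totals]
    ranked = sorted(zip(feature_names, means), key=lambda p: p[1], reverse=True)
    return ranked[:top_k]


# Hypothetical values for two patients and three placeholder features.
rows = [
    [0.2, -0.5, 0.0],
    [-0.4, 0.3, 0.1],
]
ranking = mean_abs_shap(rows, ["feature_a", "feature_b", "feature_c"])
```

Taking the absolute value before averaging matters: a feature that pushes some predictions up and others down would otherwise average toward zero and look unimportant.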