Daniel M Bean, James Teo, Honghan Wu, Ricardo Oliveira, Raj Patel, Rebecca Bendayan, Ajay M Shah, Richard J B Dobson, Paul A Scott.
Abstract
Atrial fibrillation (AF) is the most common arrhythmia and significantly increases stroke risk. This risk is effectively managed by oral anticoagulation. Recent studies using national registry data indicate increased use of anticoagulation resulting from changes in guidelines and the availability of newer drugs. The aim of this study is to develop and validate an open-source risk scoring pipeline for free-text electronic health record (EHR) data using natural language processing. AF patients discharged from 1st January 2011 to 1st October 2017 were identified from discharge summaries (N = 10,030, 64.6% male, average age 75.3 ± 12.3 years). A natural language processing pipeline was developed to identify risk factors in clinical text and calculate risk for ischaemic stroke (CHA2DS2-VASc) and bleeding (HAS-BLED). Scores were validated against two independent experts for 40 patients. Automatic risk scores were in strong agreement with the two independent experts for CHA2DS2-VASc (average kappa 0.78 vs experts, compared to 0.85 between experts). Agreement was lower for HAS-BLED (average kappa 0.54 vs experts, compared to 0.74 between experts). In high-risk patients (CHA2DS2-VASc ≥ 2), oral anticoagulant (OAC) use has increased significantly over the last 7 years, driven by the availability of direct oral anticoagulants (DOACs) and the transitioning of patients from antiplatelet (AP) medication alone to OAC. Factors independently associated with OAC use included components of the CHA2DS2-VASc and HAS-BLED scores as well as discharging specialty and frailty. OAC use was highest in patients discharged under cardiology (69%). Electronic health record text can be used for automatic calculation of clinical risk scores at scale. Open-source tools are available today for this task but require further validation. Analysis of routinely collected EHR data can replicate findings from large-scale curated registries.
Year: 2019 PMID: 31765395 PMCID: PMC6876873 DOI: 10.1371/journal.pone.0225625
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Derivation of the study cohort.
AF = Atrial fibrillation, NLP = natural language processing.
Baseline characteristics of study cohort.
| 75.3 ± 12.3 | 75.1 ± 11.5 | 74.4 ± 11.1 | 76.4 ± 12.1 | 77.5 ± 12.5 | 74.5 ± 14.3 | <0.001 | ||
| 2.5 (0.0–28.1) | 2.0 (0.0–28.1) | 1.8 (0.0–23.0) | 3.2 (0.0–28.1) | 3.2 (0.0–20.5) | 3.2 (0.0–28.1) | <0.001 | ||
| 6.5 (0.0–390.0) | 6.2 (0.0–360.4) | 6.2 (0.0–326.2) | 6.2 (0.0–360.4) | 6.4 (0.0–253.7) | 5.8 (0.0–390.0) | 0.019 | ||
| 7.0 (1.0–242.0) | 8.0 (1.0–242.0) | 7.0 (1.0–178.0) | 9.0 (1.0–242.0) | 6.0 (1.0–215.0) | 8.0 (1.0–189.0) | <0.001 | ||
| 3238 (32.3%) | 1992 (37.7%) | 1254 (37.7%) | 711 (38.0%) | 529 (27.8%) | 511 (25.6%) | <0.001 | ||
| 5722 (57.0%) | 3222 (60.9%) | 2044 (61.4%) | 1125 (60.1%) | 984 (51.7%) | 976 (48.9%) | <0.001 | ||
| 4351 (43.4%) | 2277 (43.1%) | 1371 (41.2%) | 866 (46.2%) | 911 (47.9%) | 886 (44.3%) | <0.001 | ||
| 6828 (68.1%) | 3664 (69.3%) | 2226 (66.9%) | 1376 (73.5%) | 1323 (69.6%) | 1256 (62.9%) | <0.001 | ||
| 4824 (48.1%) | 2607 (49.3%) | 1528 (45.9%) | 1028 (54.9%) | 967 (50.8%) | 952 (47.6%) | <0.001 | ||
| 3132 (31.2%) | 1710 (32.3%) | 1082 (32.5%) | 600 (32.0%) | 562 (29.6%) | 429 (21.5%) | <0.001 | ||
| 156 (1.6%) | 58 (1.1%) | 29 (0.9%) | 29 (1.6%) | 22 (1.2%) | 72 (3.6%) | |||
| 392 (3.9%) | 168 (3.2%) | 118 (3.5%) | 46 (2.5%) | 78 (4.1%) | 112 (5.6%) | |||
| 932 (9.3%) | 451 (8.5%) | 306 (9.2%) | 143 (7.6%) | 171 (9.0%) | 207 (10.4%) | |||
| 1405 (14.0%) | 707 (13.4%) | 482 (14.5%) | 214 (11.4%) | 227 (11.9%) | 312 (15.6%) | |||
| 1700 (16.9%) | 891 (16.9%) | 608 (18.3%) | 268 (14.3%) | 303 (15.9%) | 345 (17.3%) | |||
| 1853 (18.5%) | 1001 (18.9%) | 625 (18.8%) | 364 (19.4%) | 370 (19.4%) | 338 (16.9%) | |||
| 1651 (16.5%) | 899 (17.0%) | 540 (16.2%) | 337 (18.0%) | 338 (17.8%) | 310 (15.5%) | |||
| 1138 (11.3%) | 628 (11.9%) | 350 (10.5%) | 269 (14.4%) | 249 (13.1%) | 180 (9.0%) | |||
| 613 (6.1%) | 371 (7.0%) | 211 (6.3%) | 153 (8.2%) | 115 (6.0%) | 92 (4.6%) | |||
| 190 (1.9%) | 113 (2.1%) | 59 (1.8%) | 50 (2.7%) | 29 (1.5%) | 30 (1.5%) | |||
| 4.7 ± 2.0 | 4.8 ± 2.0 | 4.7 ± 1.9 | 5.0 ± 2.0 | 4.8 ± 2.0 | 4.3 ± 2.1 | <0.001 | ||
| 532 (5.3%) | 240 (4.5%) | 150 (4.5%) | 89 (4.8%) | 97 (5.1%) | 176 (8.8%) | <0.001 | ||
| 1706 (17.0%) | 937 (17.7%) | 539 (16.2%) | 380 (20.3%) | 307 (16.1%) | 355 (17.8%) | <0.001 | ||
| 75 (0.8%) | 75 (1.4%) | 26 (0.8%) | 47 (2.5%) | 0 (0.0%) | 0 (0.0%) | <0.001 | ||
| 1429 (14.2%) | 604 (11.4%) | 348 (10.5%) | 241 (12.9%) | 269 (14.1%) | 483 (24.2%) | <0.001 | ||
| 3504 (34.9%) | 3504 (66.3%) | 2130 (64.0%) | 1317 (70.3%) | - | - | - | ||
| 681 (6.8%) | 204 (3.9%) | 141 (4.2%) | 62 (3.3%) | 148 (7.8%) | 194 (9.7%) | |||
| 2716 (27.1%) | 1053 (19.9%) | 723 (21.7%) | 314 (16.8%) | 650 (34.2%) | 638 (31.9%) | |||
| 3528 (35.2%) | 1780 (33.7%) | 1186 (35.6%) | 568 (30.3%) | 783 (41.2%) | 721 (36.1%) | |||
| 2190 (21.8%) | 1488 (28.1%) | 866 (26.0%) | 596 (31.8%) | 267 (14.0%) | 359 (18.0%) | |||
| 763 (7.6%) | 618 (11.7%) | 338 (10.2%) | 267 (14.3%) | 53 (2.8%) | 79 (4.0%) | |||
| 135 (1.4%) | 127 (2.4%) | 65 (1.9%) | 59 (3.1%) | 1 (0.1%) | 7 (0.3%) | |||
| 17 (0.2%) | 17 (0.3%) | 9 (0.3%) | 7 (0.4%) | 0 (0.0%) | 0 (0.0%) | |||
| 2.0 ± 1.1 | 2.3 ± 1.1 | 2.2 ± 1.1 | 2.5 ± 1.1 | 1.7 ± 0.9 | 1.8 ± 1.0 | <0.001 | ||
Continuous variables are represented as mean ± standard deviation or median (min-max), categorical variables are represented as n (%). Hospital Frailty Risk Score (HFRS) is calculated according to Gilbert et al.[20]. P-value calculated comparing the mutually-exclusive groups Warfarin, DOAC, AP-only, No Antithrombotic medication. Continuous variables tested using a Kruskal-Wallis H-test, categorical variables tested using a Chi-squared test.
*Uncontrolled hypertension is not shown for HAS-BLED because it was not detected for any patient. Stroke is shown only under CHA2DS2-VASc but is a factor for both CHA2DS2-VASc and HAS-BLED.
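The two risk scores are simple sums over risk-factor flags, so the scoring step can be illustrated compactly. The sketch below uses the standard published point values for CHA2DS2-VASc and HAS-BLED; it is not the authors' pipeline. The `Patient` class and its field names are hypothetical, and how the NLP step turns clinical text into these flags is not described in this record.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical container for risk-factor flags extracted from text."""
    age: int
    female: bool
    heart_failure: bool = False
    hypertension: bool = False
    diabetes: bool = False
    stroke: bool = False              # shared by both scores
    vascular_disease: bool = False
    # Factors used only by HAS-BLED
    uncontrolled_hypertension: bool = False
    abnormal_renal: bool = False
    abnormal_liver: bool = False
    bleeding: bool = False
    labile_inr: bool = False
    drugs: bool = False
    alcohol: bool = False


def cha2ds2_vasc(p: Patient) -> int:
    """CHA2DS2-VASc: age >= 75 and prior stroke score 2 points, the rest 1."""
    score = 0
    score += 1 if p.heart_failure else 0
    score += 1 if p.hypertension else 0
    score += 2 if p.age >= 75 else (1 if p.age >= 65 else 0)
    score += 1 if p.diabetes else 0
    score += 2 if p.stroke else 0
    score += 1 if p.vascular_disease else 0
    score += 1 if p.female else 0
    return score


def has_bled(p: Patient) -> int:
    """HAS-BLED: one point per factor present (age counts if > 65)."""
    factors = [
        p.uncontrolled_hypertension,
        p.abnormal_renal,
        p.abnormal_liver,
        p.stroke,
        p.bleeding,
        p.labile_inr,
        p.age > 65,
        p.drugs,
        p.alcohol,
    ]
    return sum(1 for f in factors if f)


example = Patient(age=78, female=True, hypertension=True, diabetes=True)
print(cha2ds2_vasc(example), has_bled(example))
```

A single `stroke` flag feeds both scores, which matches the note above that stroke is a factor for both CHA2DS2-VASc and HAS-BLED.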
Performance of the drug NLP pipeline in manual validation.
| Drug | Accuracy | Precision | Recall | F1 | P | FN | FP | TN | TP |
|---|---|---|---|---|---|---|---|---|---|
| | 0.94 | 0.87 | 0.97 | 0.92 | 69 | 2 | 10 | 121 | 67 |
| | 0.96 | 0.90 | 0.98 | 0.94 | 62 | 1 | 7 | 131 | 61 |
| | 1.00 | 1.00 | 0.95 | 0.98 | 22 | 1 | 0 | 178 | 21 |
| | 1.00 | 1.00 | 0.94 | 0.97 | 17 | 1 | 0 | 183 | 16 |
| | 1.00 | 1.00 | 1.00 | 1.00 | 13 | 0 | 0 | 187 | 13 |
| | 0.98 | 0.95 | 0.97 | 0.96 | | | | | |
Discharge summaries were selected at random (n = 200) and manually annotated for the prescription of the 10 drugs detected by the pipeline. Performance for the 5 drugs with > 10 positive examples in manual annotation is shown. P = total positive examples in manual annotation, FN = false negative, FP = false positive, TN = true negative, TP = true positive.
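The accuracy, precision, recall and F1 columns follow directly from the confusion counts in each row. As a minimal sketch (the function name `metrics` is an illustrative choice, not part of the published pipeline), the standard definitions can be applied to the counts reported in the first row of the table:

```python
def metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = float("nan")
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Counts from the first row of the table: FN = 2, FP = 10, TN = 121, TP = 67
first_row = metrics(tp=67, fp=10, tn=121, fn=2)
print({name: round(value, 2) for name, value in first_row.items()})
# → {'accuracy': 0.94, 'precision': 0.87, 'recall': 0.97, 'f1': 0.92}
```

The rounded values reproduce the reported 0.94, 0.87, 0.97 and 0.92. The four values in the final row of the table equal the unweighted mean of the five per-drug rows.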
Inter-rater agreement statistics for CHA2DS2-VASc and HAS-BLED risk scores.
| Score | Rater 1 | Rater 2 | Kappa (95% CI) |
|---|---|---|---|
| CHA2DS2-VASc | Algorithm | Expert A | 0.76 (0.65–0.86) |
| CHA2DS2-VASc | Algorithm | Expert B | 0.80 (0.68–0.92) |
| CHA2DS2-VASc | Expert A | Expert B | 0.85 (0.73–0.97) |
| HAS-BLED | Algorithm | Expert A | 0.54 (0.36–0.72) |
| HAS-BLED | Algorithm | Expert B | 0.53 (0.34–0.72) |
| HAS-BLED | Expert A | Expert B | 0.74 (0.51–0.97) |
Expert A and Expert B are two independent clinician raters; Algorithm is the automatic scoring pipeline developed in this paper.
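Kappa measures agreement between two raters after correcting for the agreement expected by chance. The sketch below computes unweighted Cohen's kappa for two lists of scores. This record does not state whether the authors used a weighted variant or how the confidence intervals were derived, so treat the choice of the unweighted form as an assumption. The example scores are invented for illustration and are not study data.

```python
from collections import Counter


def cohen_kappa(rater_a: list, rater_b: list) -> float:
    """Unweighted Cohen's kappa for two equal-length lists of labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given identical labels
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical scores for eight patients (not taken from the study)
algorithm_scores = [2, 3, 3, 4, 5, 2, 4, 4]
expert_scores = [2, 3, 4, 4, 5, 2, 4, 3]
print(round(cohen_kappa(algorithm_scores, expert_scores), 2))
```

Because risk scores are ordinal, a weighted kappa that penalises near misses less than large disagreements is a common alternative.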
Fig 2. Antithrombotic drug prescribing patterns in the AF cohort patients with CHA2DS2-VASc ≥ 2.
A, B) Prescribing rates for all admissions during the study period. A) OAC choice vs. no OAC. B) Prescribing of OAC and/or AP vs. neither. C) Prescribing rates stratified by CHA2DS2-VASc for all patients. D) Prescribing rates grouped by HFRS as defined by Gilbert et al. Because few patients had a score > 20, the final (highest) bin is wider than the others. E) Prescribing rate vs. age at discharge. Points are the mean prescribing rate per year of age for all ages with ≥ 10 patients; a 10-year moving median (trend) is shown as a dashed red line. F) Prescribing rates in patients grouped by discharging specialty. In C, D and F the number above each bar indicates the number of patients. AP = antiplatelet, HFRS = hospital frailty risk score, OAC = oral anticoagulant.
Fig 3. Prescribing trends for new AF cases over the study period.
The solid blue line represents warfarin, the solid pink line represents DOAC, the dashed black line represents AP prescription without any OAC, the solid green line represents the no drug group. Total N = 4986. AP = antiplatelet, DOAC = direct oral anticoagulant, OAC = oral anticoagulant.
Univariate and multivariate logistic regression for factors associated with antithrombotic drug prescribing at most recent discharge for patients with CHA2DS2-VASc ≥ 2.
| Univariate | Multivariate | Univariate | Multivariate | Univariate | Multivariate | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Factor | OR (95%CI) | P-value | OR (95%CI) | P-value | OR (95%CI) | P-value | OR (95%CI) | P-value | OR (95%CI) | P-value | OR (95%CI) | P-value | |
| 0.9 (0.9–1.0) | 0.039 | 0.9 (0.8–0.9) | <0.001 | 1.3 (1.2–1.4) | <0.001 | 0.8 (0.8–0.9) | <0.001 | 1.4 (1.2–1.5) | <0.001 | 1.0 (1.0–1.1) | 0.080 | ||
| 0.9 (0.9–1.0) | <0.001 | 1.0 (0.9–1.0) | 0.016 | 1.1 (1.1–1.2) | <0.001 | 1.1 (1.0–1.1) | 0.025 | 1.1 (1.0–1.1) | 0.006 | 1.0 (1.0–1.1) | 0.073 | ||
| 1.1 (1.0–1.1) | <0.001 | 1.1 (1.1–1.1) | <0.001 | 1.1 (1.1–1.1) | <0.001 | 1.0 (1.0–1.1) | 0.446 | 0.9 (0.9–1.0) | <0.001 | 0.9 (0.9–1.0) | <0.001 | ||
| 1.7 (1.6–1.8) | <0.001 | 1.7 (1.5–1.8) | <0.001 | 1.0 (0.9–1.1) | 0.899 | 0.7 (0.6–0.8) | <0.001 | 0.7 (0.6–0.8) | <0.001 | ||||
| 1.4 (1.3–1.5) | <0.001 | 1.2 (1.1–1.3) | <0.001 | 1.0 (0.9–1.1) | 0.973 | 0.8 (0.7–0.9) | <0.001 | 0.9 (0.8–1.0) | 0.033 | ||||
| 1.0 (0.9–1.0) | 0.327 | 1.2 (1.1–1.4) | 0.002 | 1.1 (1.0–1.2) | 0.169 | 1.1 (1.0–1.2) | 0.221 | ||||||
| 1.1 (1.0–1.2) | 0.042 | 1.1 (1.0–1.2) | 0.137 | 1.4 (1.2–1.5) | <0.001 | 1.1 (0.9–1.2) | 0.254 | 1.1 (1.0–1.2) | 0.089 | ||||
| 1.1 (1.0–1.2) | 0.020 | 1.3 (1.1–1.4) | <0.001 | 1.4 (1.3–1.6) | <0.001 | 1.0 (0.9–1.2) | 0.669 | 1.0 (0.9–1.1) | 0.551 | ||||
| 1.1 (1.0–1.2) | 0.018 | 0.9 (0.8–0.9) | 0.003 | 1.0 (0.9–1.1) | 0.685 | 1.3 (1.1–1.5) | <0.001 | 1.6 (1.4–1.9) | <0.001 | ||||
| 0.7 (0.6–0.9) | <0.001 | 0.7 (0.5–0.8) | <0.001 | 1.0 (0.8–1.3) | 0.952 | 1.1 (0.8–1.4) | 0.559 | ||||||
| 1.1 (1.0–1.2) | 0.136 | 1.3 (1.1–1.5) | 0.002 | 1.0 (0.8–1.1) | 0.594 | 0.9 (0.8–1.0) | 0.117 | ||||||
| 0.6 (0.5–0.7) | <0.001 | 0.6 (0.5–0.6) | <0.001 | 1.3 (1.1–1.5) | 0.014 | 0.9 (0.8–1.2) | 0.620 | 1.2 (1.0–1.4) | 0.081 | ||||
| 0.8 (0.7–0.9) | <0.001 | 0.7 (0.6–0.8) | <0.001 | 2.6 (2.2–3.0) | <0.001 | 2.1 (1.8–2.6) | <0.001 | 1.2 (1.0–1.4) | 0.015 | 1.2 (1.0–1.4) | 0.041 | ||
| 0.6 (0.6–0.7) | <0.001 | (reference) | 1.4 (1.1–1.6) | <0.001 | (reference) | 2.2 (1.9–2.5) | <0.001 | (reference) | |||||
| 2.2 (2.0–2.5) | <0.001 | 2.6 (2.2–3.0) | <0.001 | 0.7 (0.6–0.8) | <0.001 | 0.5 (0.4–0.7) | <0.001 | 0.3 (0.3–0.4) | <0.001 | 0.2 (0.2–0.3) | <0.001 | ||
| 0.8 (0.7–0.9) | <0.001 | 1.2 (1.0–1.4) | 0.036 | 1.9 (1.6–2.2) | <0.001 | 0.8 (0.7–1.1) | 0.234 | 1.2 (1.0–1.4) | 0.013 | 0.6 (0.5–0.7) | <0.001 | ||
| 0.8 (0.8–0.9) | <0.001 | 1.2 (1.0–1.4) | 0.013 | 1.2 (1.0–1.3) | 0.023 | 0.7 (0.5–0.8) | <0.001 | 1.0 (0.9–1.1) | 0.905 | 0.6 (0.5–0.7) | <0.001 | ||
| 1.2 (1.1–1.3) | <0.001 | 1.6 (1.4–1.8) | <0.001 | 0.7 (0.6–0.8) | <0.001 | 0.5 (0.4–0.6) | <0.001 | 0.8 (0.7–1.0) | 0.013 | 0.5 (0.4–0.5) | <0.001 | ||
All factors significant at the p < 0.05 level in univariate analysis were included in the multivariate model. HFRS = hospital frailty risk score, LOS = length of stay.
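For a single binary factor, the odds ratio from a univariate logistic regression equals the cross-product odds ratio of the corresponding 2×2 table, so the univariate columns can be illustrated without fitting a model. The sketch below computes that odds ratio with a Wald 95% confidence interval. The function name and the counts are hypothetical and are not taken from the study. The multivariate columns require fitting a full regression model and are not covered here.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: factor present, drug prescribed     b: factor present, not prescribed
    c: factor absent, drug prescribed      d: factor absent, not prescribed
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only
estimate, lower, upper = odds_ratio_ci(a=120, b=80, c=90, d=110)
print(f"OR {estimate:.1f} ({lower:.1f}-{upper:.1f})")
```

An interval that excludes 1.0 corresponds to a factor that would pass the p < 0.05 univariate screen described above.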
Fig 4. Medication switching in patients with CHA2DS2-VASc ≥ 2 at last visit.
a) all visits at least 12 months apart and b) last visit before vs last visit after the 2014 NICE guideline update (b is a subset of a). Line width indicates overall proportion.