Rashmee U Shah, Rebeka Mukherjee, Yue Zhang, Aubrey E Jones, Jennifer Springer, Ian Hackett, Benjamin A Steinberg, Donald M Lloyd-Jones, Wendy W Chapman.
Abstract
Background: Electronic medical records (EMRs) allow identification of disease-specific patient populations, but varying electronic cohort definitions could result in different populations. We compared the characteristics of an EMR-derived atrial fibrillation (AF) patient population using 5 different electronic cohort definitions.

Methods and Results: Adult patients with at least 1 AF billing code from January 1, 2010, to December 31, 2017, were included. Based on different electronic cohort definitions, we trained 5 different logistic regression models using a labeled training data set (n=786). Each model yielded a predicted probability; patients were classified as having AF if the probability was higher than a specified cut point. Test characteristics were calculated for each model. These models were then applied to the full cohort and resulting characteristics were compared. In the training set, the comprehensive model (including demographics, billing codes, and natural language processing results) performed best, with an area under the curve of 0.89, sensitivity of 0.90, and specificity of 0.87. Among a candidate population (n=22 000), the proportion of patients identified as having AF varied from 61% in the model using diagnosis or procedure International Classification of Diseases (ICD) billing codes to 83% in the model using natural language processing of clinical notes. Among identified AF patients, the proportion of patients with a CHA2DS2-VASc score ≥2 varied from 69% to 85%; oral anticoagulant treatment rates varied from 50% to 66% depending on the model.

Conclusions: Different electronic cohort definitions result in substantially different AF study samples. This difference threatens the quality and reproducibility of EMR-based research and quality initiatives.
Keywords: atrial fibrillation; electronic health records; health services research; informatics; quality of care
Year: 2020 PMID: 32098599 PMCID: PMC7335556 DOI: 10.1161/JAHA.119.014527
Source DB: PubMed Journal: J Am Heart Assoc ISSN: 2047-9980 Impact factor: 5.501
Table 1. Characteristics of the Model Training Population (n=786), According to the Presence or Absence of AF
| Characteristic | AF Present (n=632) | AF Absent (n=154) | P Value |
|---|---|---|---|
| Age, y, mean (SD) | 69.0 (14.2) | 61.3 (17.9) | <0.01 |
| Female sex | 249 (39.4) | 81 (52.6) | <0.01 |
| White race | 563 (89.1) | 132 (85.7) | 0.24 |
| Medicare insured | 411 (65.0) | 81 (52.6) | <0.01 |
| No. of outpatient AF diagnoses, mean (SD) | 10.1 (21.6) | 1.2 (1.7) | <0.01 |
| Primary AF diagnosis | 404 (63.9) | 99 (64.3) | 0.93 |
| Comorbid conditions | | | |
| Acute myocardial infarction | 80 (12.7) | 13 (8.4) | 0.15 |
| Coronary artery disease | 302 (47.8) | 53 (34.4) | <0.01 |
| Valvular heart disease | 216 (34.3) | 47 (30.5) | 0.39 |
| Congestive heart failure | 222 (35.1) | 32 (20.8) | <0.01 |
| Cerebrovascular disease | 156 (24.7) | 60 (39.0) | <0.01 |
| Dementia | 183 (29.0) | 39 (25.3) | 0.37 |
| Liver disease | 132 (20.9) | 36 (23.4) | 0.50 |
| Diabetes mellitus | 289 (45.7) | 62 (40.3) | 0.22 |
| Acute renal failure | 161 (25.5) | 25 (16.2) | 0.02 |
| Chronic kidney disease | 165 (26.1) | 19 (12.3) | <0.01 |
| Pulmonary heart disease | 158 (25.0) | 25 (16.2) | 0.02 |
| Hypertension | 465 (73.6) | 103 (66.9) | 0.10 |
| Thyroid disease | 190 (30.1) | 29 (18.8) | <0.01 |
| Anemia | 219 (34.7) | 36 (23.4) | <0.01 |
| Cancer | 235 (37.2) | 33 (21.4) | <0.01 |
| Procedures, ICD codes | | | |
| Heart valve surgery | 27 (4.3) | 5 (3.3) | 0.56 |
| Coronary artery bypass grafting | 21 (3.3) | 4 (2.6) | 0.65 |
| Percutaneous coronary intervention | 17 (2.7) | 3 (1.8) | 0.60 |
| Angioplasty | 56 (8.9) | 8 (5.2) | 0.14 |
| Pacemaker/defibrillator | 23 (3.6) | 5 (3.3) | 0.81 |
| Cardioversion | 65 (10.3) | 2 (1.3) | <0.01 |
| Procedures, CPT codes | | | |
| Ablation | 19 (3.0) | 2 (1.3) | 0.24 |
| Cardioversion | 234 (37.0) | 7 (4.6) | <0.01 |
| Natural language processing | | | |
| At least 1 nonnegated mention | 614 (97.2) | 57 (37.0) | <0.01 |
| No. of AF mentions | | | |
| None | 16 (2.5) | 76 (49.3) | <0.01 |
| First quartile | 123 (19.5) | 57 (37.0) | |
| Second quartile | 159 (25.2) | 15 (9.7) | |
| Third quartile | 162 (25.6) | 6 (3.9) | |
| Fourth quartile | 172 (27.2) | 0 (0) | |
| ECG with reference to AF | 234 (37.0) | 7 (4.6) | <0.01 |
Values are shown as n (%), unless otherwise specified. AF indicates atrial fibrillation; CPT, Current Procedural Terminology; ICD, International Classification of Diseases.
Primary diagnosis refers to position 1 in the order of the billed codes.
Comorbid conditions were identified from ICD billing codes present in the patient medical record.
No. of AF mentions refers to the number of times a target term for AF was present in the clinical notes. The ranges are as follows: none, no mentions; first quartile, 1–6; second, 7–19; third, 20–46; fourth, 48–670.
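The binning of AF mention counts described in the footnote above can be sketched as a small lookup. This is an illustrative sketch, not the authors' code: the names `MENTION_BINS` and `mention_bin` are hypothetical, and the ranges are copied exactly as printed. Because the printed ranges skip 47, the sketch returns `None` for that count rather than guessing which bin it belongs to.

```python
# Ranges copied verbatim from the Table 1 footnote (inclusive bounds).
# Hypothetical names; the source does not publish code for this step.
MENTION_BINS = [
    ("None", 0, 0),
    ("First quartile", 1, 6),
    ("Second quartile", 7, 19),
    ("Third quartile", 20, 46),
    ("Fourth quartile", 48, 670),
]


def mention_bin(count):
    """Return the bin label for a count of AF mentions, or None if the
    count falls outside every printed range (for example, 47)."""
    for name, low, high in MENTION_BINS:
        if low <= count <= high:
            return name
    return None


labels = [mention_bin(c) for c in (0, 6, 7, 46, 47, 670)]
```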
Figure 1. Receiver operating characteristic curves for different models to identify atrial fibrillation patients using the electronic medical record. In the training set (n=786), the AUC was highest for the comprehensive model and lowest for the Medicare model. AUC indicates area under the receiver operating characteristic curve; ICD, International Classification of Diseases; NLP, natural language processing; Sens, sensitivity; Spec, specificity.
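As a reminder of what the AUC in Figure 1 measures, the sketch below computes it as the probability that a randomly chosen AF-present patient receives a higher predicted probability than a randomly chosen AF-absent patient, with ties counted as one half. The function name `roc_auc` and the toy data are assumptions for illustration only; the source does not state which software the authors used.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for AF present, 0 for AF absent.
    scores: predicted probabilities from a model.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data, not from the study.
toy_labels = [1, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)  # 5 of 6 pairs are ordered correctly
```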
Figure 2. Proportion of correct, false‐positive, and false‐negative classifications for each model in the training set. In the training set (n=786), the NLP model resulted in the highest number of correctly classified patients, at the expense of a high false‐positive rate. The outpatient billing codes and ECG method had the lowest number of correctly classified patients and the highest number of false negatives. AF indicates atrial fibrillation; ICD, International Classification of Diseases; NLP, natural language processing.
Table 2. Population Characteristics Based on the Patient‐Selection Model
| Selected Characteristics | Medicare | Outpatient AF Codes, ECG | Demographics, ICD Codes | NLP | Comprehensive | P Value |
|---|---|---|---|---|---|---|
| Proportion identified as AF, % | 18 030 (82.0) | 11 512 (52.3) | 13 427 (61.0) | 18 202 (82.7) | 15 962 (72.6) | <0.01 |
| Age, y, mean (SD) | 67.8 (14.3) | 68.1 (14.1) | 70.8 (12.4) | 68.7 (13.8) | 69.8 (13.1) | <0.01 |
| Female sex | 7434 (41.2) | 4846 (42.1) | 5113 (38.1) | 7538 (41.4) | 6528 (40.9) | <0.01 |
| White race | 15 707 (87.2) | 10 143 (88.1) | 11 980 (89.2) | 15 957 (87.7) | 14 110 (88.4) | <0.01 |
| Medicare | 11 092 (61.5) | 7116 (61.8) | 8874 (66.1) | 11 481 (63.1) | 10 389 (65.1) | <0.01 |
| CHA2DS2‐VASc ≥2 | 14 920 (82.8) | 9156 (79.5) | 11 450 (85.3) | 15 110 (83.0) | 13 286 (83.2) | <0.01 |
| OAC prescribed | 7838 (52.5) | 6074 (66.3) | 6572 (57.4) | 7502 (49.6) | 8127 (61.2) | <0.01 |
| Comorbid conditions | | | | | | |
| Acute myocardial infarction | 2690 (14.9) | 1493 (13.0) | 2198 (16.4) | 2567 (14.2) | 2356 (14.8) | <0.01 |
| Coronary artery disease | 8463 (46.9) | 5365 (46.6) | 6809 (50.7) | 8431 (46.3) | 7496 (47.0) | <0.01 |
| Valvular heart disease | 6801 (37.7) | 4001 (34.7) | 5024 (37.4) | 6604 (36.3) | 5665 (35.5) | <0.01 |
| Congestive heart failure | 6859 (38.0) | 4352 (37.8) | 5766 (42.9) | 3828 (37.5) | 6173 (38.7) | <0.01 |
| Cerebrovascular disease | 5914 (32.8) | 3077 (26.7) | 3132 (23.3) | 5506 (30.3) | 4265 (27.7) | <0.01 |
| Dementia | 2488 (13.8) | 1340 (11.6) | 1776 (13.2) | 2386 (13.1) | 2092 (13.1) | <0.01 |
| Diabetes mellitus | 8283 (45.9) | 4779 (41.5) | 6219 (46.3) | 8106 (44.5) | 7080 (44.4) | <0.01 |
| Chronic kidney disease | 4487 (24.9) | 2610 (26.7) | 4082 (30.4) | 4504 (24.7) | 4306 (27.0) | <0.01 |
| Hypertension | 14 109 (78.3) | 8729 (75.8) | 10 797 (80.4) | 14 068 (77.3) | 12 261 (76.8) | <0.01 |
| Cancer | 6116 (33.9) | 3886 (33.8) | 5387 (40.1) | 6257 (34.4) | 5631 (35.3) | <0.01 |
| Procedures | | | | | | |
| Heart valve surgery | 867 (4.8) | 502 (4.4) | 627 (4.7) | 844 (4.6) | 672 (4.2) | <0.01 |
| Coronary artery bypass grafting | 644 (3.6) | 325 (2.8) | 457 (3.4) | 615 (3.4) | 583 (3.7) | <0.01 |
| Percutaneous coronary intervention | 608 (3.4) | 333 (2.9) | 553 (4.1) | 558 (3.1) | 481 (3.0) | <0.01 |
| Pacemaker/defibrillator | 812 (4.5) | 563 (4.9) | 582 (4.3) | 783 (4.3) | 706 (4.4) | <0.01 |
Values shown as n (%), unless otherwise specified. AF indicates atrial fibrillation; ICD, International Classification of Diseases; NLP, natural language processing; OAC, oral anticoagulant.
OAC prescribed: including only patients with CHA2DS2‐VASc ≥2.
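The footnote above implies that the OAC percentages use the CHA2DS2‐VASc ≥2 count as the denominator, not the full number of patients identified as AF. A quick arithmetic check, using the counts as printed in the table, is consistent with that reading. The dictionary keys are shorthand labels for the five columns, and the variable names are hypothetical.

```python
# (OAC prescribed, CHA2DS2-VASc >= 2, printed OAC %) per model, copied from the table.
table_counts = {
    "Medicare": (7838, 14920, 52.5),
    "Outpatient AF codes, ECG": (6074, 9156, 66.3),
    "Demographics": (6572, 11450, 57.4),
    "NLP": (7502, 15110, 49.6),
    "Comprehensive": (8127, 13286, 61.2),
}

# Recompute each rate with the CHA2DS2-VASc >= 2 count as the denominator.
recomputed = {
    model: round(100 * oac / high_risk, 1)
    for model, (oac, high_risk, _printed) in table_counts.items()
}

# True when every recomputed rate matches the printed percentage.
all_match = all(
    recomputed[model] == printed
    for model, (_oac, _high_risk, printed) in table_counts.items()
)
```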
Figure 3. Proportion of patients included with CHA2DS2‐VASc score ≥2 and treated with an OAC for each model. When applied to the candidate population, different patient‐selection models resulted in populations with different sizes, stroke risks, and OAC treatment rates. The corresponding values are found in Table 2. “Outpatient AF codes, ECG” refers to the method used in prior publications from Kaiser Permanente. AF indicates atrial fibrillation; ICD, International Classification of Diseases; NLP, natural language processing; OAC, oral anticoagulant.
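For readers unfamiliar with the stroke-risk threshold used here, the sketch below implements the standard CHA2DS2‐VASc point assignment (congestive heart failure 1, hypertension 1, age ≥75 years 2, diabetes mellitus 1, prior stroke/TIA/thromboembolism 2, vascular disease 1, age 65 to 74 years 1, female sex 1). The function and parameter names are hypothetical, and the source does not describe how each component was derived from the medical record, so this should be read as the general scoring rule and not as the authors' implementation.

```python
def cha2ds2_vasc(age, female, chf, hypertension, diabetes,
                 stroke_tia, vascular_disease):
    """Standard CHA2DS2-VASc score (0 to 9) from patient attributes."""
    score = 0
    score += 1 if chf else 0               # C: congestive heart failure
    score += 1 if hypertension else 0      # H: hypertension
    if age >= 75:                          # A2: age 75 or older
        score += 2
    elif age >= 65:                        # A: age 65 to 74
        score += 1
    score += 1 if diabetes else 0          # D: diabetes mellitus
    score += 2 if stroke_tia else 0        # S2: stroke/TIA/thromboembolism
    score += 1 if vascular_disease else 0  # VA: vascular disease
    score += 1 if female else 0            # Sc: sex category (female)
    return score


# Hypothetical patient: 76-year-old woman with hypertension only.
example_score = cha2ds2_vasc(
    age=76, female=True, chf=False, hypertension=True,
    diabetes=False, stroke_tia=False, vascular_disease=False,
)
meets_threshold = example_score >= 2  # threshold used in Table 2 and Figure 3
```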