Anoop D Shah, Emily Bailey, Tim Williams, Spiros Denaxas, Richard Dobson, Harry Hemingway.
Abstract
BACKGROUND: Free text in electronic health records (EHR) may contain additional phenotypic information beyond structured (coded) information. For major health events (heart attack and death), there is a lack of studies evaluating the extent to which free text in the primary care record might add information. Our objectives were to describe the contribution of free text in primary care to the recording of information about myocardial infarction (MI), including subtype, left ventricular function, laboratory results and symptoms, and to the recording of cause of death. We used the CALIBER EHR research platform, which contains primary care data from the Clinical Practice Research Datalink (CPRD) linked to hospital admission data, the MINAP registry of acute coronary syndromes and the death registry. In CALIBER we randomly selected 2000 patients with MI and 1800 deaths. We implemented a rule-based natural language engine, the Freetext Matching Algorithm, on site at CPRD to analyse free text in the primary care record without raw data being released to researchers. We analysed text recorded within 90 days before or 90 days after the MI, and on or after the date of death.
Keywords: Chest pain; Free text; Myocardial infarction; Natural language processing; Primary care
Year: 2019 PMID: 31711543 PMCID: PMC6849160 DOI: 10.1186/s13326-019-0214-4
Source DB: PubMed Journal: J Biomed Semantics
Fig. 1 Illustration of patient’s experience, information entered in the structured part of the primary care record (Read codes), and additional information that might be available in free text. In this hypothetical example, the subtype of myocardial infarction and preceding symptoms are present only in the free text
Fig. 2 Data items extracted from primary care free text for 2000 patients within 90 days before or after myocardial infarction
Most common Read codes extracted from free text for 2000 patients within 90 days of MI in CALIBER (top five codes in each category)
| Number of records (%) | Read code | Read term |
|---|---|---|
| Current or previous condition (diagnosis Read codes) | ||
| 887 (8.1%) | G30z.00 | Acute myocardial infarction NOS |
| 443 (4.1%) | R065.00 | [D] Chest pain |
| 304 (2.8%) | C10..00 | Diabetes mellitus |
| 272 (2.5%) | G307100 | Acute non-ST segment elevation myocardial infarction |
| 256 (2.3%) | G33..00 | Angina pectoris |
| Current or previous condition (non-diagnosis Read codes) | ||
| 991 (9.5%) | 8H3Z.00 | Other hospital admission NOS |
| 913 (8.8%) | 8H...00 | Referral for further care |
| 795 (7.6%) | 1M...00 | Pain |
| 469 (4.5%) | 173..00 | Breathlessness |
| 445 (4.3%) | 8HA..11 | Discharged from follow up |
| Quantitative test result | ||
| 577 (15.8%) | 42Z7.00 | Red blood cell distribution width |
| 142 (3.9%) | 42M..00 | Lymphocyte count |
| 141 (3.9%) | 42N..00 | Monocyte count |
| 139 (3.8%) | 42K..00 | Eosinophil count |
| 138 (3.8%) | 42J..00 | Neutrophil count |
| Absence of condition (diagnosis Read codes) | ||
| 121 (7.8%) | R023.00 | [D] Oedema |
| 105 (6.8%) | G33..00 | Angina pectoris |
| 90 (5.8%) | R065.00 | [D] Chest pain |
| 57 (3.7%) | A....00 | Infectious and parasitic diseases |
| 38 (2.5%) | R006200 | [D] Fever NOS |
| Absence of condition (non-diagnosis Read codes) | ||
| 281 (15.6%) | 182..00 | Chest pain |
| 274 (15.2%) | 173..00 | Breathlessness |
| 208 (11.5%) | 1M...00 | Pain |
| 89 (4.9%) | 2I18.12 | O/E - tenderness |
| 48 (2.7%) | 199..00 | Vomiting |
| Suspected condition (diagnosis Read codes) | ||
| 70 (8.2%) | G30z.00 | Acute myocardial infarction NOS |
| 48 (5.6%) | K190.00 | Urinary tract infection, site not specified |
| 40 (4.7%) | A....00 | Infectious and parasitic diseases |
| 32 (3.8%) | G33..00 | Angina pectoris |
| 23 (2.7%) | G581.13 | Impaired left ventricular function |
| Suspected condition (non-diagnosis Read codes) | ||
| 44 (16.5%) | 8H...00 | Referral for further care |
| 18 (6.7%) | 8H3Z.00 | Other hospital admission NOS |
| 16 (6.0%) | 1M...00 | Pain |
| 9 (3.4%) | 173..00 | Breathlessness |
| 7 (2.6%) | 2C2..11 | O/E - anaemic |
Information available in CPRD primary care data (coded data and free text) for a random sample of 2000 patients with myocardial infarction in the linked CALIBER dataset
| Data element | Structured data only, n (%) | Structured or free text, n (%) | % increase by using free text |
|---|---|---|---|
| Within 90 days before or after MI: | | | |
| Pulse rate | 323 (16.2%) | 634 (31.7%) | 96% |
| Blood pressure | 1557 (77.9%) | 1609 (80.5%) | 3% |
| Left ventricular function result | 115 (5.8%) | 309 (15.5%) | 169% |
| Coronary angiogram results | 26 (1.3%) | 198 (9.9%) | 662% |
| Irregular pulse | 2 (0.1%) | 6 (0.3%) | 200% |
| Atrial fibrillation or flutter | 121 (6.0%) | 153 (7.6%) | 26% |
| Chest pain ≤7 days before MI | 378 (18.9%) | 543 (27.2%) | 44% |
| Chest pain ≤90 days before MI | 455 (22.8%) | 642 (32.1%) | 41% |
| Shortness of breath ≤7 days before MI | 62 (3.1%) | 102 (5.1%) | 65% |
| Shortness of breath ≤90 days before MI | 125 (6.3%) | 196 (9.8%) | 57% |
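The "% increase by using free text" column appears to be the relative increase in the number of patients with the data element once free text is included. A minimal sketch of that arithmetic follows, using a few rows of the table; the function name `percent_increase` and the rounding to a whole percent are assumptions made for illustration, not taken from the source.

```python
def percent_increase(structured_only: int, structured_or_free_text: int) -> int:
    """Relative increase in patient count, rounded to a whole percent (assumed rounding)."""
    gain = structured_or_free_text - structured_only
    return round(100 * gain / structured_only)


# (structured only, structured or free text) patient counts copied from the table
counts = {
    "Pulse rate": (323, 634),
    "Left ventricular function result": (115, 309),
    "Coronary angiogram results": (26, 198),
    "Chest pain <=7 days before MI": (378, 543),
}

increases = {name: percent_increase(a, b) for name, (a, b) in counts.items()}

for name, pct in increases.items():
    print(f"{name}: {pct}%")
```

Under these assumptions the sketch gives 96%, 169%, 662% and 44% for the four rows, matching the table.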
Type of MI as recorded in CPRD primary care data, for patients with a ‘gold standard’ MI subtype record in MINAP
| Primary care source of type of MI | | Subtype of MI: STEMI | Subtype of MI: NSTEMI |
|---|---|---|---|
| Structured (Read codes) (number of patients) | STEMI | 41 | 6 |
| | NSTEMI | 6 | 96 |
| Free text (number of patients) | STEMI | 13 | 5 |
| | NSTEMI | 5 | 23 |
| Patients with no information on type of MI in primary care | | 250 | 163 |
| Accuracy of MI classification using structured data | Sensitivity, % | 13.0 (9.5, 17.2) | 32.8 (27.4, 38.5) |
| | Specificity, % | 98.0 (95.6, 99.2) | 98.1 (95.9, 99.3) |
| | Positive predictive value, % | 87.2 (74.3, 95.2) | 94.1 (87.6, 97.8) |
| Accuracy of MI classification using structured and free text data | Sensitivity, % | 17.1 (13.1, 21.8) | 40.6 (34.9, 46.5) |
| | Specificity, % | 96.2 (93.4, 98.1) | 96.5 (93.8, 98.2) |
| | Positive predictive value, % | 83.1 (71.7, 91.2) | 91.5 (85.4, 95.7) |
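The point estimates in the accuracy rows can be reproduced from the patient counts in the same table. The sketch below is one way to do so, under three assumptions that are not stated in the source: the total for each MINAP subtype is the sum of its column, patients with no information count as not classified into either subtype, and the free text counts are additional to the structured counts. The names `accuracy`, `minap_total` and `combined` are hypothetical, and confidence intervals are not computed.

```python
# Keys are (type recorded in primary care, MINAP gold standard subtype).
structured = {("STEMI", "STEMI"): 41, ("STEMI", "NSTEMI"): 6,
              ("NSTEMI", "STEMI"): 6, ("NSTEMI", "NSTEMI"): 96}
free_text = {("STEMI", "STEMI"): 13, ("STEMI", "NSTEMI"): 5,
             ("NSTEMI", "STEMI"): 5, ("NSTEMI", "NSTEMI"): 23}
no_information = {"STEMI": 250, "NSTEMI": 163}

# Assumption: free text counts add to structured counts (no patient in both).
combined = {key: structured[key] + free_text[key] for key in structured}

# Assumption: each MINAP column total is the sum of all rows in that column.
minap_total = {
    subtype: no_information[subtype]
    + sum(n for (_, minap), n in combined.items() if minap == subtype)
    for subtype in ("STEMI", "NSTEMI")
}


def accuracy(recorded: dict, subtype: str) -> tuple:
    """Return (sensitivity, specificity, PPV) in percent for one subtype."""
    other = "NSTEMI" if subtype == "STEMI" else "STEMI"
    true_pos = recorded[(subtype, subtype)]
    false_pos = recorded[(subtype, other)]
    sensitivity = true_pos / minap_total[subtype]
    specificity = (minap_total[other] - false_pos) / minap_total[other]
    ppv = true_pos / (true_pos + false_pos)
    return tuple(round(100 * x, 1) for x in (sensitivity, specificity, ppv))


for label, recorded in (("structured", structured), ("structured + free text", combined)):
    for subtype in ("STEMI", "NSTEMI"):
        print(label, subtype, accuracy(recorded, subtype))
```

With these assumptions the column totals come to 315 (STEMI) and 293 (NSTEMI), and the computed percentages agree with the point estimates reported in the table, which suggests the assumptions are consistent with how the table was built.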
Proportion of deaths with a cause recorded in CPRD primary care data (N = 600 for each 3-year band)
| How cause of death is recorded in primary care | Years 2001–2003 | Years 2004–2006 | Years 2007–2009 | Accuracy (95% CI) |
|---|---|---|---|---|
| Transcribed death certificate entry (e.g. 1a Heart failure, 1b Acute myocardial infarction) | ||||
| Read codes | 46 (7.7%) | 103 (17.2%) | 112 (18.7%) | 59% (52%, 65%) |
| Free text | 32 (5.3%) | 41 (6.8%) | 47 (7.8%) | 53% (44%, 63%) |
| Explicit cause of death (e.g. Cause of death: myocardial infarction) | ||||
| Read codes | 26 (4.3%) | 47 (7.8%) | 36 (6.0%) | 30% (22%, 40%) |
| Free text | 16 (2.7%) | 17 (2.8%) | 16 (2.7%) | 55% (40%, 69%) |
| Cause of death implied by diagnosis dated on or after date of death | ||||
| Read codes | 140 (23.3%) | 79 (13.2%) | 52 (8.7%) | 44% (37%, 51%) |
| Free text | 69 (11.5%) | 67 (11.2%) | 76 (12.7%) | 40% (34%, 46%) |
| No cause of death in CPRD | 271 (45.2%) | 246 (41.0%) | 261 (43.5%) | – |
Accuracy of underlying cause of death in CPRD primary care data compared to the death registry gold standard, for the 1022 individuals with cause of death recorded in both sources. For coronary deaths not recorded as coronary in CPRD, the most common causes in CPRD were I469 ‘Cardiac arrest’, I500 ‘Congestive heart failure’ and I501 ‘Left ventricular failure’. For stroke deaths not recorded as stroke in CPRD, the most common causes in CPRD were J180 ‘Bronchopneumonia, unspecified’, J189 ‘Pneumonia, unspecified’ and F03X ‘Unspecified dementia’
| Source of cause of death record in CPRD | Free text | Coded |
|---|---|---|
| Number of deaths | 381 | 641 |
| Same underlying cause | 184 (48.3%) | 293 (45.7%) |
| Same 2-character ICD-10 code for underlying cause | 222 (58.3%) | 371 (57.9%) |
| Same ICD-10 chapter for underlying cause | 278 (73.0%) | 463 (72.2%) |
| Coronary deaths (ICD-10 I20–I25, | ||
| Sensitivity, % | 65.3 (50.4, 78.3) | 68.4 (59.1, 76.8) |
| Specificity, % | 97.9 (95.7, 99.1) | 98.3 (96.8, 99.2) |
| Cerebrovascular deaths (ICD-10 F01, I60–I69, | ||
| Sensitivity, % | 66.7 (51.6, 79.6) | 58.5 (44.1, 71.9) |
| Specificity, % | 98.5 (96.5, 99.5) | 97.8 (96.2, 98.8) |
| Cancer deaths (ICD-10 C00–C97, | ||
| Sensitivity, % | 93.0 (86.1, 97.1) | 80.4 (73.5, 86.1) |
| Specificity, % | 95.7 (92.7, 97.8) | 98.5 (97.0, 99.4) |