Tianrun Cai, Luwan Zhang, Nicole Yang, Kanako K Kumamaru, Frank J Rybicki, Tianxi Cai, Katherine P Liao.
Abstract
BACKGROUND: Electronic medical records (EMR) contain numerical data important for clinical outcomes research, such as vital signs and cardiac ejection fractions (EF), which tend to be embedded in narrative clinical notes. In current practice, these data are often manually extracted for use in research studies. However, because of the large volume of notes in datasets, manual extraction of numerical data often becomes infeasible. The objective of this study is to develop and validate a natural language processing (NLP) tool that can efficiently extract numerical clinical data from narrative notes.
Keywords: Big data; Data extraction; Data mining; EMR; Natural language processing; Numerical data
Year: 2019 PMID: 31730484 PMCID: PMC6858776 DOI: 10.1186/s12911-019-0970-1
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 Overview of the EXTEND workflow (NLTK: the Natural Language Toolkit)
Fig. 2 Report normalization and tokenization using the Natural Language Toolkit
Fig. 3 Extraction of a group of vital sign values
Dictionary terms used in EXTEND to search for relevant variables. The terms were developed manually to capture the common ways these variables are reported at our institution
| Variables | Terms |
|---|---|
| Temperature | t, fever, t-max, fevers, ta, te, tmax, tear, temp, tm, tmp, tmt, tp, tpr, tr, tre, tt, temperature, afebrile |
| Blood Pressure | b/p, bps, bp, blood pressure, hypertensive, hypotensive |
| Respiratory Rate | rr, rp, r, resp., respiratory, respiration, respirations, tachypea, breathing |
| Heart Rate | hr, hrt, p, afib, af, tach, nsr, tachy, pulse, pulses, tachycardia, tachycardic, bradycardic, sinus |
| Oxygen Saturation | sat, sats, sating, satting, desat, o2sat, o2sats, pox, spo2, sa, sao2, s, oximetry, o2, saturation, saturating, saturations, desaturation, desaturations, desaturates, desaturate, desaturated |
| Ejection Fraction | ef, ejection fraction, lvef |
| Glycated Haemoglobin | glycated haemoglobin, glycated hemoglobins, glycated hemoglobin, glycohemoglobin a, glycosylate haemoglobin, glycosylate hemoglobin, glycosylated haemoglobin a, glycosylated haemoglobin, glycosylated hb, glycosylated hemoglobin a, glycosylated hemoglobins, glycosylated hemoglobin, haemoglobin a1c, hb a1a + b, hb a1c, hb a1, hba1c, hba1, hemoglobin a1c, hemoglobin glycated, a1c, a1cs, hgba1c, hb1c, hga1c |
| Creatinine | creat, crn, cr, creatinine, scr, cri, creatinin, ctn, cre, crea |
| Height | h, hgt, hh, ht., height |
| Weight | wt, w, wgt, wi, bw, weight |
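A dictionary like the one above can drive a simple term-then-number search. The following is a minimal sketch of that idea using Python regular expressions, not the EXTEND implementation itself: the function names, the 15-character window between a term and its value, and the number pattern are all assumptions made for illustration, and only a small subset of the dictionary is included.

```python
import re

# Illustrative subset of the dictionary terms listed in the table above.
TERMS = {
    "Heart Rate": ["hr", "pulse"],
    "Blood Pressure": ["bp", "b/p", "blood pressure"],
    "Ejection Fraction": ["ef", "lvef", "ejection fraction"],
}

# Assumed number format: integer or decimal, optionally "a/b" as in 120/80.
NUMBER = r"\d+(?:\.\d+)?(?:/\d+(?:\.\d+)?)?"


def build_pattern(terms):
    """Compile a regex that finds a dictionary term followed by a number."""
    # Try longer terms first so "ejection fraction" is preferred over "ef".
    alternation = "|".join(
        re.escape(t) for t in sorted(terms, key=len, reverse=True)
    )
    # The term must not be glued to other letters or digits; up to 15
    # non-digit characters (an assumed window) may separate term and value.
    return re.compile(
        rf"(?<![a-z0-9])(?:{alternation})(?![a-z0-9])\D{{0,15}}?({NUMBER})"
    )


PATTERNS = {variable: build_pattern(terms) for variable, terms in TERMS.items()}


def extract_values(note):
    """Return, per variable, the numeric strings found after its terms."""
    text = note.lower()  # dictionary terms are lowercase
    return {
        variable: [m.group(1) for m in pattern.finditer(text)]
        for variable, pattern in PATTERNS.items()
    }


result = extract_values("Vitals: HR 88, BP 120/80. Echo showed LVEF of 55%.")
print(result)
```

Short, ambiguous terms such as `t`, `p`, or `s` would need extra context rules (for example, plausible value ranges) to avoid false matches; this sketch does not attempt that.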
Performance of EXTEND in identifying vital signs, EF, HbA1C, and creatinine at the note level
| Clinical parameters | Sensitivity | Specificity | PPV | NPV | F1-score |
|---|---|---|---|---|---|
| HR | 0.95 (0.92, 0.99) | 0.95 (0.92, 0.99) | 0.97 (0.94, 1.0) | 0.93 (0.88, 0.98) | 0.96 (0.94, 0.98) |
| BP | 0.91 (0.87, 0.96) | 1.0 | 1.0 | 0.89 (0.83, 0.94) | 0.95 (0.93, 0.98) |
| T | 0.94 (0.90, 0.98) | 0.99 (0.98, 1.0) | 0.99 (0.97, 1.0) | 0.96 (0.93, 0.99) | 0.96 (0.94, 0.99) |
| RR | 0.94 (0.91, 0.98) | 0.99 (0.98, 1.0) | 0.99 (0.97, 1.0) | 0.96 (0.93, 0.98) | 0.96 (0.94, 0.99) |
| O2Sat | 0.94 (0.90, 0.98) | 0.98 (0.95, 1.0) | 0.99 (0.97, 1.0) | 0.9 (0.85, 0.96) | 0.96 (0.94, 0.98) |
| EF | 0.88 (0.77, 0.99) | 1.0 | 1.0 | 0.99 (0.97, 1.0) | 0.94 (0.87, 1.0) |
| HbA1C | 0.91 (0.88, 0.94) | 0.98 (0.97, 0.99) | 0.96 (0.94, 0.99) | 0.96 (0.94, 0.97) | 0.94 (0.92, 0.96) |
| Creat | 0.88 (0.85, 0.92) | 0.98 (0.97, 0.99) | 0.95 (0.92, 0.97) | 0.94 (0.93, 0.96) | 0.92 (0.89, 0.94) |
Abbreviations: HR Heart rate, BP Blood pressure, T Temperature, RR Respiratory rate, O2Sat Oxygen saturation, EF Ejection fraction, HbA1C Hemoglobin A1C, Creat Creatinine, PPV Positive predictive value, NPV Negative predictive value. Parentheses indicate 95% confidence intervals
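For readers less familiar with these metrics, the sketch below shows the standard definitions in terms of confusion-matrix counts. The function name and the example counts are hypothetical and are not taken from the study; the code also assumes every denominator is non-zero.

```python
def note_level_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from confusion-matrix counts.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives, and false negatives. Assumes no denominator is zero.
    """
    sensitivity = tp / (tp + fn)  # share of notes with a value that were found
    specificity = tn / (tn + fp)  # share of notes without a value left alone
    ppv = tp / (tp + fp)          # share of positive calls that were correct
    npv = tn / (tn + fn)          # share of negative calls that were correct
    # F1 is the harmonic mean of PPV (precision) and sensitivity (recall).
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


# Hypothetical counts, for illustration only.
metrics = note_level_metrics(tp=90, fp=5, tn=95, fn=10)
print({name: round(value, 2) for name, value in metrics.items()})
```

As a consistency check on the table, the same F1 formula applied to the reported HR sensitivity (0.95) and PPV (0.97) gives 2 × 0.95 × 0.97 / (0.95 + 0.97) ≈ 0.96, which matches the reported F1-score.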
Performance of EXTEND in identifying vital signs, EF, HbA1C, and creatinine at the value level
| Clinical parameters | Sensitivity | PPV | F1-score |
|---|---|---|---|
| HR | 0.95 (0.92, 0.99) | 0.97 (0.95, 1.0) | 0.96 (0.94, 0.99) |
| BP | 0.91 (0.87, 0.96) | 1.0 | 0.95 (0.93, 0.98) |
| T | 0.94 (0.9, 0.98) | 0.99 (0.97, 1.0) | 0.96 (0.94, 0.99) |
| RR | 0.94 (0.9, 0.98) | 0.99 (0.97, 1.0) | 0.96 (0.94, 0.99) |
| O2Sat | 0.94 (0.9, 0.98) | 0.99 (0.97, 1.0) | 0.96 (0.94, 0.98) |
| EF | 0.92 (0.81, 1.0) | 1.0 | 0.96 (0.89, 1.0) |
| HbA1C | 0.95 (0.92, 0.98) | 0.95 (0.92, 0.98) | 0.95 (0.93, 0.97) |
| Creat | 0.95 (0.92, 0.97) | 0.97 (0.96, 0.99) | 0.95 (0.93, 0.97) |
Numbers in parentheses denote 95% confidence intervals (CI). HR Heart rate, BP Blood pressure, T Temperature, RR Respiratory rate, O2Sat Oxygen saturation, EF Ejection fraction, HbA1C Hemoglobin A1C, Creat Creatinine, PPV Positive predictive value