Matthew Deady, Hussein Ezzeldin, Kerry Cook, Douglas Billings, Jeno Pizarro, Amalia A Plotogea, Patrick Saunders-Hastings, Artur Belov, Barbee I Whitaker, Steven A Anderson.
Abstract
Introduction: The Food and Drug Administration Center for Biologics Evaluation and Research conducts post-market surveillance of biologic products to ensure their safety and effectiveness. Studies have found that common vaccine exposures may be missing from structured data elements of electronic health records (EHRs), instead being captured in clinical notes. This impacts monitoring of adverse events following immunizations (AEFIs). For example, COVID-19 vaccines have been regularly administered outside of traditional medical settings. We developed a natural language processing (NLP) algorithm to mine unstructured clinical notes for vaccinations not captured in structured EHR data.
Keywords: clinical notes; electronic health records; natural language processing; vaccine adverse events; vaccine safety
Year: 2021 PMID: 35005697 PMCID: PMC8727347 DOI: 10.3389/fdgth.2021.777905
Source DB: PubMed Journal: Front Digit Health ISSN: 2673-253X
Vaccine administration evidence algorithm steps.
| Step | Description |
|---|---|
| 1. Filter unstructured notes by type | Medical notes are often characterized by their type (e.g., Discharge Summary, Surgical History). Based on manual review of a sample of our training set, the study team judged certain note types to have a reduced probability of containing free-text documentation of vaccinations, and these note types were filtered out. Examples include notes populated with semi-structured interview questions like “Received Flu Vaccine: No” or patient education notes that discuss vaccinations in the hypothetical and might read “...after this procedure, do not receive a flu shot for at least a month...”. |
| 2. Tokenize filtered set of notes | To process the filtered set of notes, we used a simple tokenization algorithm (spaCy) to segment the text of the notes into single words. |
| 3. Apply simple part-of-speech tagging to identify presence of vaccine administration | Using the list of words produced by the tokenizer, we tagged verbs that indicated vaccine administration. The identification of a past-tense verb (“got,” “received,” “given,” or “had”) helped distinguish true instances of vaccination from vaccine education materials (Full list can be found in |
| 4. Use NLP rule-based matching to search for vaccine derivative in vicinity of verb | If a desired verb was found, the algorithm searched for evidence of a vaccination (e.g., vaccine, shot, vaccination) within five tokens, where a token is a continuous string of characters between spaces or punctuation marks. |
| 5. Use NLP rule-based matching to search for and identify vaccine type | The algorithm used the four tokens preceding the vaccination term to search for the vaccine type (e.g., influenza, flu, hep b, hepatitis b). It then selected the mapped term that is the most complete match of those four tokens (e.g., “pneumococcal 13” maps to “pneumococcal 13-valent” rather than the more generic “pneumococcal”). The table of mappings to vaccine types was developed from an initial list provided by clinician SMEs, augmented by potential alternative names found in a manual review of a sample of training cases. The final table can be reviewed in |
| 6. Find or derive date of vaccine administration | The algorithm searched for an absolute date (e.g., 1/12/19, 1/19) or a relative date (e.g., yesterday, last week, today) within five tokens of the vaccination term. A table of the different date formats and relative date tokens used can be found in |
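
Steps 2 through 5 of the table above can be illustrated with a minimal Python sketch. This is not the authors' implementation: a regular-expression tokenizer stands in for spaCy so the example has no dependencies, the term lists are short hypothetical excerpts of the kinds of terms named in the table, and the five-token search is assumed to run on both sides of the verb. The function names `tokenize`, `match_type`, and `find_vaccinations` are illustrative.

```python
import re

# Hypothetical, abbreviated term lists built from the examples in the table;
# the study's full lists are not reproduced here.
ADMIN_VERBS = {"got", "received", "given", "had"}
VACCINE_TERMS = {"vaccine", "shot", "vaccination"}
# Keys are token tuples so that the longest (most complete) match can win.
VACCINE_TYPES = {
    ("flu",): "influenza",
    ("influenza",): "influenza",
    ("hep", "b"): "hepatitis B",
    ("hepatitis", "b"): "hepatitis B",
    ("pneumococcal",): "pneumococcal",
    ("pneumococcal", "13"): "pneumococcal 13-valent",
}
WINDOW = 5       # tokens searched on either side of the verb (assumption)
TYPE_WINDOW = 4  # tokens searched before the vaccination term


def tokenize(text):
    """Regex stand-in for the spaCy tokenizer: lowercase word/number tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_type(preceding):
    """Return the mapped vaccine type for the longest key found in `preceding`."""
    best_key = None
    for key in VACCINE_TYPES:
        n = len(key)
        for start in range(len(preceding) - n + 1):
            if tuple(preceding[start:start + n]) == key:
                if best_key is None or n > len(best_key):
                    best_key = key
    return VACCINE_TYPES[best_key] if best_key else None


def find_vaccinations(text):
    """Return (verb, vaccination term, vaccine type) triples found in a note."""
    tokens = tokenize(text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok not in ADMIN_VERBS:
            continue
        lo, hi = max(0, i - WINDOW), min(len(tokens), i + WINDOW + 1)
        for j in range(lo, hi):
            if tokens[j] in VACCINE_TERMS:
                preceding = tokens[max(0, j - TYPE_WINDOW):j]
                hits.append((tok, tokens[j], match_type(preceding)))
    return hits


print(find_vaccinations("Patient received pneumococcal 13 vaccine last week."))
# → [('received', 'vaccine', 'pneumococcal 13-valent')]
```

Because only past-tense verbs are in the verb list, a hypothetical education note such as “do not receive a flu shot for at least a month” yields no match in this sketch.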
Figure 1. NLP algorithm example, steps 2 through 5.
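
The figure covers steps 2 through 5; step 6 (finding or deriving the administration date) can be sketched in the same spirit. The following is an illustrative Python sketch, not the authors' code. It assumes a whitespace tokenizer that keeps slashes inside dates, a small hypothetical table of relative-date tokens, month/day/year ordering, and two-digit years in the 2000s. Partial dates such as “1/19” are left unresolved, and the first date found in the window is returned rather than the nearest one. The names `tokenize_note` and `find_date` are hypothetical.

```python
import re
from datetime import date, timedelta

# Hypothetical relative-date table; the study's actual token list is not shown here.
RELATIVE_DAYS = {"today": 0, "yesterday": 1}
# Matches M/D/YY or M/D/YYYY; partial dates such as "1/19" are not handled.
ABSOLUTE = re.compile(r"^(\d{1,2})/(\d{1,2})/(\d{2}|\d{4})$")


def tokenize_note(text):
    """Split on whitespace and strip surrounding punctuation, keeping '1/12/19' whole."""
    stripped = (t.strip(".,;:()").lower() for t in text.split())
    return [t for t in stripped if t]


def find_date(tokens, term_index, note_date, window=5):
    """Return a date found within `window` tokens of the vaccination term, else None."""
    lo = max(0, term_index - window)
    hi = min(len(tokens), term_index + window + 1)
    for k in range(lo, hi):
        tok = tokens[k]
        m = ABSOLUTE.match(tok)
        if m:
            month, day, year = (int(g) for g in m.groups())
            if year < 100:
                year += 2000  # assumption: two-digit years fall in the 2000s
            try:
                return date(year, month, day)
            except ValueError:
                continue  # not a valid calendar date; keep scanning
        if tok in RELATIVE_DAYS:
            # Relative dates are resolved against the date of the note itself.
            return note_date - timedelta(days=RELATIVE_DAYS[tok])
        if tok == "last" and k + 1 < hi and tokens[k + 1] == "week":
            return note_date - timedelta(days=7)
    return None


tokens = tokenize_note("Pt received flu shot on 1/12/19.")
print(find_date(tokens, tokens.index("shot"), note_date=date(2019, 1, 15)))
# → 2019-01-12
```

Resolving relative dates requires a reference date for each note; here it is passed in explicitly as `note_date`, which is an assumption about how note metadata would be supplied.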
Characteristics of the sample population and broader EHR population, 2014–2019.
| Patient characteristic | Sample population | Broader EHR population |
|---|---|---|
| Age (median, IQR) | 51.37 (26–63) | NA |
| Age group (%) | | |
| <18 | 19.70 | 10.14 |
| 18–65 | 54.27 | 64.63 |
| 65+ | 22.61 | 25.21 |
| Unknown | 3.42 | 0.03 |
| Sex (%) | | |
| Male | 45.63 | 41.51 |
| Female | 54.27 | 58.42 |
| Other/unknown | 0.10 | 0.07 |
| Race (%) | | |
| Asian/Pacific | 2.11 | 1.77 |
| Black/African American | 41.61 | 34.48 |
| Caucasian/White | 45.93 | 42.14 |
| Other | 10.25 | 21.61 |
| | | |
| Vaccine administrations | 2,740 | NA |
| Influenza vaccine administrations | 1,706 | NA |
| Medical encounters | 2,068 | NA |

Age was calculated from the start of the study period (January 1, 2014). Patients born within the study period were automatically added to the <18 group.
NA, Not available.
Figure 2. Vaccine administrations identified in structured vaccination data and unstructured notes.