Craig H Ganoe¹, Weiyi Wu¹, Paul J Barr², William Haslett¹, Michelle D Dannenberg², Kyra L Bonasia², James C Finora², Jesse A Schoonmaker², Wambui M Onsando², James Ryan³, Glyn Elwyn², Martha L Bruce², Amar K Das², Saeed Hassanpour¹.
Abstract
OBJECTIVES: The objective of this study is to build and evaluate a natural language processing approach to identify medication mentions in primary care visit conversations between patients and physicians.
Keywords: clinic visit recording; medication information extraction; natural language processing
Year: 2021 PMID: 34423262 PMCID: PMC8374372 DOI: 10.1093/jamiaopen/ooab071
Source DB: PubMed Journal: JAMIA Open ISSN: 2574-2531
Participant demographics for transcribed visit recordings (SD: standard deviation)
| | Development data set (%) | Validation data set (%) | Evaluation data set (%) | Total (%) |
|---|---|---|---|---|
| Number of recordings in data set | 10 | 10 | 65 | 85 |
| Participants with demographic data | 9 | 10 (100.0) | 54 | 73 |
| Gender | | | | |
| Female | 4 (40.0) | 6 (60.0) | 34 (52.3) | 44 (51.8) |
| Male | 5 (50.0) | 4 (40.0) | 20 (30.8) | 29 (34.1) |
| Mean age, years (SD) [range] | 50.00 (18.57) [23–87] | 58.60 (18.95) [20–77] | 54.65 (15.61) [25–92] | 54.62 (16.35) [20–92] |
| Race | | | | |
| White | 9 (90.0) | 10 (100.0) | 54 (83.1) | 73 (85.9) |
| Ethnicity | | | | |
| Not Hispanic or Latino | 9 (90.0) | 10 (100.0) | 52 (80.0) | 71 (83.5) |
| Declines to list | – | – | 2 (3.1) | 2 (2.4) |
| Language spoken | | | | |
| English | 9 (90.0) | 10 (100.0) | 54 (83.1) | 73 (85.9) |
| Recording length (SD) [range] | 36.46 (17.37) [17.55–70.39] | 37.07 (10.00) [20.16–49.41] | 28.36 (11.95) [5.42–55.33] | 30.55 (12.85) [5.42–70.39] |
| Visit type | | | | |
| Annual physical established patient | 3 (30.0) | 6 (60.0) | 8 (12.3) | 17 (20.0) |
| Established patient follow-up | 2 (20.0) | 3 (30.0) | 29 (44.6) | 34 (40.0) |
| Same day add-on | 2 (20.0) | 1 (10.0) | 11 (16.9) | 14 (16.5) |
| New patient workup | 2 (20.0) | – | 1 (1.5) | 3 (3.5) |
| History and physical | – | – | 2 (3.1) | 2 (2.4) |
| Other | – | – | 3 (4.6) | 3 (3.5) |
Demographic data was not captured for 12 of the 85 transcripts.
“Other” includes “Res-visit 20” and diabetic follow-up.
Figure 1. Overview of our approach to annotate medication mentions in clinic visit transcripts.
UMLS semantic types in cTAKES annotations that are filtered out in our approach
| TUI | Semantic type |
|---|---|
| T114 | Nucleic acid, nucleoside, or nucleotide |
| T122 | Biomedical or dental material |
| T123 | Biologically active substance |
| T125 | Hormone |
| T130 | Indicator, reagent, or diagnostic aid |
| T197 | Inorganic chemical |
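The filtering step amounts to a set-membership test on each annotation's semantic type. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the `Annotation` class and `filter_annotations` function are hypothetical, each annotation is assumed to carry a single TUI (real cTAKES concepts can carry several), and the example mentions are placeholders rather than real UMLS lookups.

```python
from dataclasses import dataclass

# TUIs from the table above; annotations with these semantic types are discarded.
FILTERED_TUIS = frozenset({"T114", "T122", "T123", "T125", "T130", "T197"})


@dataclass(frozen=True)
class Annotation:
    """Hypothetical, simplified stand-in for a cTAKES concept annotation."""

    text: str  # covered span in the transcript
    tui: str   # assumed: a single UMLS semantic type identifier


def filter_annotations(annotations):
    """Return only the annotations whose semantic type is not filtered out."""
    return [a for a in annotations if a.tui not in FILTERED_TUIS]


# Illustrative input: mention strings and TUIs are placeholders.
candidates = [
    Annotation("mention A", "T121"),  # kept
    Annotation("mention B", "T125"),  # dropped (Hormone)
    Annotation("mention C", "T200"),  # kept
]
print([a.text for a in filter_annotations(candidates)])
```

If annotations carried several TUIs, one would have to choose whether a single filtered type or all of them trigger removal; that rule is not specified here.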
The performance of our approach on the validation and evaluation sets
| Data set | No. of true positives | No. of false positives | No. of false negatives | Precision (%) | Recall (%) | F-score (%) |
|---|---|---|---|---|---|---|
| Validation set | 291 | 46 | 40 | 86.4 | 87.9 | 87.1 |
| Evaluation set | 1062 | 168 | 206 | 86.3 | 83.8 | 85.0 |
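The precision, recall, and F-score columns follow from the raw counts: precision = TP / (TP + FP), recall = TP / (TP + FN), and the F-score is their harmonic mean. The hypothetical helper `prf` below recomputes the reported percentages from the counts in this table as a quick consistency check; it assumes nonzero denominators.

```python
def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F-score) as percentages from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall (equivalently 2TP / (2TP + FP + FN)).
    f_score = 2 * precision * recall / (precision + recall)
    return 100 * precision, 100 * recall, 100 * f_score


counts = {
    "Validation set": (291, 46, 40),
    "Evaluation set": (1062, 168, 206),
}
for name, (tp, fp, fn) in counts.items():
    p, r, f = prf(tp, fp, fn)
    print(f"{name}: precision={p:.1f} recall={r:.1f} F-score={f:.1f}")
```

Rounded to one decimal place, the output matches the table: 86.4/87.9/87.1 for the validation set and 86.3/83.8/85.0 for the evaluation set.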
The comparison of our proposed approach to existing baseline models for identification of gold standard medication mentions in our evaluation set
| Model | No. of true positives | No. of false positives | No. of false negatives | Precision (%) | Recall (%) | F-score (%) |
|---|---|---|---|---|---|---|
| cTAKES | 1119 | 2814 | 163 | 28.5 | 87.3 | 42.9 |
| MedEx-UIMA | 830 | 1215 | 292 | 40.6 | 74.0 | 52.4 |
| MedXN | 832 | 318 | 432 | 72.3 | 65.8 | 68.9 |
| Our approach | 1062 | 168 | 206 | 86.3 | 83.8 | 85.0 |