Sapna Trivedi¹, Roger Gildersleeve², Sandra Franco², Andrew S Kanter², Afzal Chaudhry¹.
Abstract
In this pilot study, we explore the feasibility and accuracy of using a query in a commercial natural language processing engine in a named entity recognition and normalization task to extract a wide spectrum of clinical concepts from free-text clinical letters. Editorial guidance developed by two independent clinicians was used to annotate sixty anonymized clinic letters to create the gold standard. Concepts were categorized by semantic type, and labels were applied to indicate contextual attributes such as negation. The natural language processing (NLP) engine was Linguamatics I2E version 5.3.1, equipped with an algorithm for contextualizing words and phrases and an ontology of terms from Intelligent Medical Objects to which those tokens were mapped. Performance of the engine was assessed on a training set of the documents using precision, recall, and the F1 score, with subset analysis for semantic type, accurate negation, exact versus partial conceptual matching, and discontinuous text. The engine underwent tuning, and the final performance was determined for a test set. On the test set, the F1 score was 0.81 (strict criteria) and 0.84 (relaxed criteria) when appropriate negation was not required, and 0.75 and 0.77, respectively, when it was. F1 scores were higher when concepts were derived from continuous text only. This pilot study showed that a commercially available NLP engine delivered good overall results for identifying a wide spectrum of structured clinical concepts. Such a system holds promise for extracting concepts from free text to populate problem lists or for data mining projects.
Keywords: Annotation; Clinical letters; Gold standard; Named entity recognition; Natural language processing; Text mining
Year: 2020 PMID: 35415451 PMCID: PMC8982815 DOI: 10.1007/s41666-020-00079-z
Source DB: PubMed Journal: J Healthc Inform Res ISSN: 2509-498X
Fig. 1 Screenshot of MAE annotation tool. An example of the unstructured text is shown with the text spans in which concepts are embedded identified in red and an example of a discontinuous concept highlighted in yellow. The table shows the text excerpt, the matching IMO term, and attributes including semantic type and assessment of negation
Fig. 2 Study design showing allocation of notes for IAA, training set, and validation set and process for evaluation of training set
Inter-annotator agreement. Precision, recall, and F1 score for concepts of different semantic types, with a strict expectation of matching synonymy and a relaxed expectation
| | Precision | Recall | F1 score |
|---|---|---|---|
| Strict agreement | | | |
| Overall (all semantic types) | 0.82 | 0.77 | 0.79 |
| Disorder | 0.94 | 0.96 | 0.95 |
| Finding | 0.95 | 0.68 | 0.79 |
| Situation affecting health | 0.92 | 0.67 | 0.77 |
| Family history | 0.00 | N/A (a) | N/A (a) |
| History of procedure | 1.00 | 0.50 | 0.67 |
| Relaxed agreement | | | |
| Overall | 0.88 | 0.78 | 0.83 |
| Disorder | 1.00 | 0.96 | 0.98 |
| Finding | 1.00 | 0.70 | 0.82 |
| Situation affecting health | 1.00 | 0.68 | 0.81 |
| Family history | 1.00 | 1.00 | 1.00 |
| History of procedure | 1.00 | 0.50 | 0.67 |

(a) The recall and F1 score for family history were not calculable because there were no true positives with the strict requirement
Training data (scored items = 721). True positives (TP), false positives (FP), false negatives (FN), precision, recall, and F1 scores when assessing for accuracy of negation, with strict and relaxed matching expectations
| | Matching standard | TP | FP | FN | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|
| Overall without negation | Strict | 506 | 20 | 199 | 0.95 | 0.72 | 0.82 |
| | Relaxed | 545 | 17 | 166 | 0.97 | 0.77 | 0.86 |
| Overall with negation | Strict | 470 | 52 | 207 | 0.90 | 0.69 | 0.78 |
| | Relaxed | 508 | 45 | 175 | 0.92 | 0.74 | 0.82 |
Test data (scored items = 243). True positives (TP), false positives (FP), false negatives (FN), precision, recall, and F1 scores when assessing for accuracy of negation, with strict and relaxed matching expectations
| | Matching standard | TP | FP | FN | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|
| Overall without negation | Strict | 165 | 17 | 61 | 0.91 | 0.73 | 0.81 |
| | Relaxed | 174 | 17 | 52 | 0.92 | 0.77 | 0.84 |
| Overall with negation | Strict | 145 | 36 | 62 | 0.81 | 0.70 | 0.75 |
| | Relaxed | 152 | 38 | 53 | 0.80 | 0.75 | 0.77 |
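
The precision, recall, and F1 values in these tables follow from the TP, FP, and FN counts. The sketch below assumes the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = harmonic mean of the two); the source does not show its own calculation, and the function name `prf` is hypothetical. It reproduces the strict, negation-not-required row of the test data and returns `None` where a metric has a zero denominator, which is one way a score can end up reported as N/A.

```python
from typing import Optional, Tuple

Metric = Optional[float]


def prf(tp: int, fp: int, fn: int) -> Tuple[Metric, Metric, Metric]:
    """Return (precision, recall, F1) from raw counts.

    A metric is None when its denominator is zero, i.e. not calculable.
    Standard definitions are assumed; this is not the study's own code.
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else None
    recall = tp / (tp + fn) if (tp + fn) > 0 else None
    if precision is None or recall is None or (precision + recall) == 0:
        f1 = None
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Test data, strict matching, negation not required (counts from the table).
p, r, f1 = prf(tp=165, fp=17, fn=61)
print(round(p, 2), round(r, 2), round(f1, 2))  # prints: 0.91 0.73 0.81

# Hypothetical counts (not from the study): no true positives and no
# false negatives, so precision is 0.0 while recall and F1 are undefined.
print(prf(tp=0, fp=3, fn=0))  # prints: (0.0, None, None)
```

Rounded to two decimals, the same function can be used to cross-check other rows against their counts.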
Test data using continuous data only (scored items = 221). True positives (TP), false positives (FP), false negatives (FN), precision, recall, and F1 scores when assessing for accuracy of negation, with strict and relaxed matching expectations
| | Matching standard | TP | FP | FN | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|
| Continuous text without negation | Strict | 162 | 17 | 42 | 0.91 | 0.80 | 0.85 |
| | Relaxed | 167 | 17 | 37 | 0.91 | 0.82 | 0.87 |
| Continuous text with negation | Strict | 143 | 35 | 43 | 0.81 | 0.77 | 0.79 |
| | Relaxed | 146 | 37 | 38 | 0.80 | 0.80 | 0.80 |
Test data results broken down by semantic type with requirement of accurate negation, including spans of discontinuous text, with strict and relaxed matching expectations
| Semantic type | Scored items | Matching standard | TP | FP | FN | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|---|
| Disorder | 112 | Strict | 80 | 23 | 20 | 0.78 | 0.80 | 0.79 |
| | | Relaxed | 87 | 24 | 12 | 0.78 | 0.88 | 0.83 |
| Family history | 5 | Strict | 5 | 0 | 0 | 1.00 | 1.00 | 1.00 |
| | | Relaxed | 5 | 0 | 0 | 1.00 | 1.00 | 1.00 |
| Finding | 70 | Strict | 47 | 7 | 26 | 0.89 | 0.65 | 0.75 |
| | | Relaxed | 47 | 8 | 25 | 0.87 | 0.66 | 0.75 |
| History of procedure | 15 | Strict | 5 | 3 | 11 | 0.63 | 0.31 | 0.42 |
| | | Relaxed | 5 | 3 | 11 | 0.63 | 0.31 | 0.42 |
| Situation affecting health | 12 | Strict | 8 | 3 | 5 | 0.73 | 0.62 | 0.67 |
| | | Relaxed | 8 | 3 | 5 | 0.73 | 0.62 | 0.67 |
| Disorder + finding | 182 | Strict | 127 | 30 | 46 | 0.81 | 0.74 | 0.77 |
| | | Relaxed | 134 | 32 | 37 | 0.81 | 0.79 | 0.80 |
Test data results broken down by semantic type with requirement of accurate negation, excluding spans of discontinuous text, with strict and relaxed matching expectations
| Semantic type | Scored items | Matching standard | TP | FP | FN | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|---|
| Disorder | 104 | Strict | 80 | 22 | 13 | 0.78 | 0.86 | 0.82 |
| | | Relaxed | 83 | 23 | 9 | 0.78 | 0.90 | 0.84 |
| Family history | 5 | Strict | 5 | 0 | 0 | 1.00 | 1.00 | 1.00 |
| | | Relaxed | 5 | 0 | 0 | 1.00 | 1.00 | 1.00 |
| Finding | 61 | Strict | 45 | 7 | 17 | 0.88 | 0.74 | 0.80 |
| | | Relaxed | 45 | 8 | 16 | 0.87 | 0.75 | 0.80 |
| History of procedure | 14 | Strict | 5 | 3 | 10 | 0.63 | 0.33 | 0.43 |
| | | Relaxed | 5 | 3 | 10 | 0.63 | 0.33 | 0.43 |
| Situation affecting health | 10 | Strict | 8 | 3 | 3 | 0.73 | 0.73 | 0.73 |
| | | Relaxed | 8 | 3 | 3 | 0.73 | 0.73 | 0.73 |
| Disorder + finding | 165 | Strict | 125 | 29 | 30 | 0.82 | 0.81 | 0.81 |
| | | Relaxed | 128 | 31 | 25 | 0.81 | 0.84 | 0.83 |