| Literature DB >> 25393544 |
Negation's Not Solved: Generalizability Versus Optimizability in Clinical Natural Language Processing
Stephen Wu, Timothy Miller, James Masanz, Matt Coarr, Scott Halgrim, David Carrell, Cheryl Clark.
Abstract
A review of published work in clinical natural language processing (NLP) may suggest that the negation detection task has been "solved." This work proposes that an optimizable solution does not equal a generalizable solution. We introduce a new machine learning-based Polarity Module for detecting negation in clinical text, and extensively compare its performance across domains. Using four manually annotated corpora of clinical text, we show that negation detection performance suffers when there is no in-domain development (for manual methods) or training data (for machine learning-based methods). Various factors (e.g., annotation guidelines, named entity characteristics, the amount of data, and lexical and syntactic context) play a role in making generalizability difficult, but none completely explains the phenomenon. Furthermore, generalizability remains challenging because it is unclear whether to use a single source for accurate data, combine all sources into a single model, or apply domain adaptation methods. The most reliable means to improve negation detection is to manually annotate in-domain training data (or, perhaps, manually modify rules); this is a strategy for optimizing performance, rather than generalizing it. These results suggest a direction for future work in domain-adaptive and task-adaptive methods for clinical NLP.
Year: 2014 PMID: 25393544 PMCID: PMC4231086 DOI: 10.1371/journal.pone.0112774
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Extensive successful previous work on negation detection in clinical text.
| Algorithm | Data source | Entities | Method | Prec. | Rec. | F1 |
| – | 10 surgery notes & discharge summaries | UMLS concepts | Lexical/syntax rules | 91.84 | 95.74 | 92.96 |
| – | UPMC ICU discharge summaries | clinical conditions | Trigger/scope rules | 84.49 | 77.84 | 80.35 |
| – | Hopkins HNP notes | SNOMED concepts | Negation ontology | 91.17 | 93.90 | – |
| – | Stanford radiology reports | unmapped text phrases | Regex/syntax rules | – | 92.58 | – |
| – | UPMC 6 note types | clinical conditions | Trigger/scope rules | 92 | 94 | 93 |
| – | 2010 i2b2/VA | “problem” phrases | Cue words, CRFs | 92 | 95 | 94 |
| – | Mayo clinical notes | symptoms & diseases | Dependency path rules | 96.65 | 73.93 | 83.78 |
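Several of the systems above share the trigger/scope design: a lexical trigger opens a negation scope that extends rightward until a terminating cue or a token limit. A minimal sketch of that idea (the trigger list, terminators, and window size below are illustrative, not any published system's actual resources):

```python
import re

# Illustrative resources; real systems ship much larger curated lists.
NEGATION_TRIGGERS = {"no", "denies", "without", "absent"}
SCOPE_TERMINATORS = {"but", "however", "although"}
SCOPE_WINDOW = 5  # how many tokens to the right a trigger may negate

def negated_indices(tokens):
    """Return indices of tokens falling inside any negation trigger's scope."""
    negated = set()
    for i, tok in enumerate(tokens):
        if tok.lower() in NEGATION_TRIGGERS:
            for j in range(i + 1, min(i + 1 + SCOPE_WINDOW, len(tokens))):
                if tokens[j].lower() in SCOPE_TERMINATORS:
                    break  # a conjunction ends the negation scope
                negated.add(j)
    return negated

tokens = re.findall(r"\w+", "Patient denies chest pain but reports nausea")
print([tokens[i] for i in negated_indices(tokens)])  # ['chest', 'pain']
```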
Characteristics of four corpora with negation annotations.
| | sharp || i2b2 || mipacq || negex |
| | Train | Test | Train | Test | Train | Test | Test |
| Documents | 140 | 22 | 349 | 477 | 2,443 | 324 | 120 |
| Sentences | 5,014 | 569 | 33,022° | 48,482° | 19,672 | 2,236 | 2,376 |
| NEs | 10,575 | 1,154 | 11,968 | 18,550 | 23,249 | 1,721 | 2,371 |
| Negated NEs | 918 | 48 | 2,535 | 3,609 | 1,681 | 158 | 491 |
| % NEs negated | 8.7% | 4.2% | 21.2% | 19.5% | 7.2% | 9.2% | 20.7% |
| Sources | Mayo, Group Health || Partners, BIDMC, UPMC || Medpedia, NLM ClinQ, Mayo || UPMC |
*subset selected manually; °automatic sentence detection on pre-whitespace-tokenized text.
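For example, in the sharp training set, 918 of 10,575 NEs (8.7%) are negated; in the i2b2 training set, 2,535 of 11,968 (21.2%) are.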
In the MiPACQ and SHARP corpora, the named entities (NEs) are annotated with different semantic groups, which occur with different frequencies (left columns).
| Semantic group | NEs by semantic group, % (n) |||| Negated NEs, % (n) ||||
| | SHARPn Train | SHARPn Test | MiPACQ Train | MiPACQ Test | SHARPn Train | SHARPn Test | MiPACQ Train | MiPACQ Test |
| – | 20.36% (4591) | 25.24% (428) | 39.87% (4216) | 50.69% (585) | 4.01% (184) | 7.48% (32) | 0.43% (18) | – |
| – | 26.53% (5981) | 23.29% (395) | 27.54% (2912) | 29.29% (338) | 7.82% (468) | 11.65% (46) | 17.07% (497) | 13.31% (45) |
| – | – | – | 1.91% (202) | 0.69% (8) | – | – | 2.97% (6) | 25.00% (2) |
| – | 14.74% (3324) | 13.50% (229) | 2.98% (315) | – | 4.60% (153) | 8.30% (19) | 6.35% (20) | – |
| – | 19.62% (4424) | 22.52% (382) | 16.64% (1759) | 11.01% (127) | 3.28% (145) | 1.05% (4) | 3.01% (53) | – |
| – | 16.28% (3671) | 12.68% (215) | 5.70% (603) | 2.17% (25) | 19.83% (728) | 26.51% (57) | 52.57% (317) | 4.00% (1) |
| – | 0.35% (79) | 0.06% (1) | 2.96% (313) | 3.73% (43) | 1.27% (1) | – | 0.64% (2) | – |
| – | 2.10% (474) | 2.71% (46) | 2.40% (254) | 2.43% (28) | 0.42% (2) | – | 4.42% (5) | – |
The prevalence of Negated NEs also differs by corpus and semantic group (right columns).
Figure 1. The cTAKES Pipeline.
The SHARPn Polarity Module is an Attribute Discovery algorithm. Training and evaluation use gold-standard NEs (NER is skipped).
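The Polarity Module itself is implemented in Java within cTAKES; purely as a hedged illustration of the classification setup (label each gold-standard NE as negated or asserted from its lexical context), here is a minimal scikit-learn sketch on toy data. The window features follow the paper's feature type (A); the -1/1 polarity convention, feature names, and examples are assumptions:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def context_features(tokens, ne_start, ne_end, window=5):
    """Bag-of-words features from a window around the NE (cf. feature type A)."""
    feats = {}
    for tok in tokens[max(0, ne_start - window):ne_start]:
        feats["prev5:" + tok.lower()] = 1
    for tok in tokens[ne_end:ne_end + window]:
        feats["next5:" + tok.lower()] = 1
    return feats

# Toy examples: (sentence tokens, NE span, polarity); -1 = negated, 1 = asserted.
train = [
    ("no evidence of pneumonia".split(), (3, 4), -1),
    ("patient denies chest pain".split(), (2, 4), -1),
    ("history of diabetes".split(), (2, 3), 1),
    ("reports mild nausea".split(), (1, 3), 1),
]
X = [context_features(toks, s, e) for toks, (s, e), _ in train]
y = [label for _, _, label in train]

model = make_pipeline(DictVectorizer(), LinearSVC())
model.fit(X, y)
print(model.predict([context_features("no acute distress".split(), 1, 3)]))  # expect [-1]
```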
Performance (F1 score) in practical negation detection situations.
| Test | sharp | i2b2 | mipacq | negexts |
| – | – | 94 | – | 94.6 |
| – | 62.3c | 82.1d | 71.3a,b | 95.3a |
| – | – | 80.7e | 61.2b | 87.3b |
| – | 74.7b,c | – | – | – |
| – | 72.9b,c | 82.6d | 59.3d | – |
| – | 58.6c | 81.1e | 70.6a,b | – |
| – | – | – | 69.1a,b | 69.9c |
| – | 93.5a | 93.6a | 73.6a,b | (99.9) |
| – | 89.7a | 92.6b | (69.9c) | – |
| – | – | – | 73.9a | (58.0d) |
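The structure behind this table is a train-on-one-source, test-on-another grid. A schematic of that experimental loop, assuming hypothetical load_corpus, train_model, and f1 helpers rather than any actual API from the paper's code:

```python
from itertools import product

CORPORA = ["sharp", "i2b2", "mipacq", "negex"]

def cross_corpus_grid(load_corpus, train_model, f1):
    """F1 for every (train, test) corpus pair; diagonal cells are in-domain."""
    results = {}
    for train_name, test_name in product(CORPORA, repeat=2):
        train_data = load_corpus(train_name, split="train")
        test_data = load_corpus(test_name, split="test")
        if not train_data:  # e.g., the negex corpus has no training portion
            continue
        model = train_model(train_data)
        results[(train_name, test_name)] = f1(model, test_data)
    return results
```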
Figure 2. Significance bands of model performance for each test corpus.
The bands are labeled with successive letters from right to left in Table 4.
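One simple way to derive such bands: sort models by score and start a new letter whenever a model differs significantly from the current band's best member. The paper's exact statistical procedure is not reproduced here; the sketch assumes a pairwise p_value test is supplied:

```python
from string import ascii_lowercase

def significance_bands(scores, p_value, alpha=0.05):
    """Letter per model; models sharing a letter are not significantly different.

    scores: {model_name: F1}; p_value: pairwise significance test between models.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    bands, current = [], [ranked[0]]
    for model in ranked[1:]:
        # Open a new band when the model differs from the band's top member.
        if p_value(current[0], model) < alpha:
            bands.append(current)
            current = [model]
        else:
            current.append(model)
    bands.append(current)
    return {m: ascii_lowercase[i] for i, band in enumerate(bands) for m in band}
```

Overlapping labels such as "a,b" in Table 4 mark a model that cannot be separated from either adjacent band; this single-letter sketch does not produce those.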
Figure 3. Learning curve for i2b2 training data on various corpora.
For each proportion of the i2b2 corpus (x-axis), the reported F-score (y-axis) is an average of 5 randomly sampled runs.
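That protocol can be sketched as follows, again with hypothetical train_model and f1 helpers: for each proportion, draw 5 random subsets of the training data, train on each, and average F-scores per test corpus.

```python
import random

def learning_curve(train_data, test_sets, train_model, f1,
                   proportions=(0.1, 0.2, 0.4, 0.6, 0.8, 1.0), runs=5, seed=13):
    """Mean F1 over `runs` random subsamples at each training-set proportion."""
    rng = random.Random(seed)
    scores = {}
    for p in proportions:
        k = max(1, int(p * len(train_data)))
        for _ in range(runs):
            model = train_model(rng.sample(train_data, k))  # random subset
            for name, test in test_sets.items():
                scores.setdefault((p, name), []).append(f1(model, test))
    return {key: sum(vals) / len(vals) for key, vals in scores.items()}
```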
Average F-score with and without frustratingly easy domain adaptation (FEDA).
| Test corpus | All | + FEDA |
| sharp | 89.66 | 97.87 |
| i2b2 | 92.57 | 93.93* |
| mipacq | 75.29 | 73.93 |
| negex | – | – |
| Average | 85.84 | 88.58 |
| – | 91.91 | 93.28* |
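FEDA (Daume III's frustratingly easy domain adaptation) is a pure feature transformation: every feature is duplicated into a shared copy and a domain-specific copy, so a linear learner can separate weights that transfer across corpora from weights that are corpus-specific. A minimal sketch over dict-valued features (the namespace strings are illustrative):

```python
def feda_augment(features, domain):
    """Frustratingly easy domain adaptation: one shared copy plus one
    domain-specific copy of each feature; cross-corpus signal lands on
    'shared:*' weights, corpus-specific signal on '<domain>:*' weights."""
    augmented = {}
    for name, value in features.items():
        augmented["shared:" + name] = value
        augmented[domain + ":" + name] = value
    return augmented

print(feda_augment({"prev5:no": 1}, "i2b2"))
# {'shared:prev5:no': 1, 'i2b2:prev5:no': 1}
```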
Figure 4. The effect of named entity length (in number of words) on performance for each of 6 training configurations.
The SHARP, MiPACQ, and i2b2 test sets are used for evaluation.
Figure 5. The effect of named entity semantic group on the F-score of 6 models.
The SHARP, MiPACQ, and i2b2 test sets are used for evaluation.
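Figures 4 and 5 are both bucketed evaluations: test NEs are grouped by a property (word count, or semantic group) and F1 on the negated class is computed per bucket. A sketch, assuming each example carries the NE text plus gold and predicted polarity (-1 = negated):

```python
from collections import defaultdict

def f1_by_bucket(examples, bucket_fn):
    """Per-bucket F1 on the negated class.

    examples: iterable of (ne_text, gold, pred) with -1 = negated.
    bucket_fn: maps ne_text to a bucket key (word count, semantic group, ...).
    """
    counts = defaultdict(lambda: [0, 0, 0])  # [tp, fp, fn] per bucket
    for ne_text, gold, pred in examples:
        cell = counts[bucket_fn(ne_text)]
        if pred == -1 and gold == -1:
            cell[0] += 1
        elif pred == -1:
            cell[1] += 1
        elif gold == -1:
            cell[2] += 1
    out = {}
    for bucket, (tp, fp, fn) in counts.items():
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        out[bucket] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return out

# Bucket by NE length in words (Figure 4); swap in a semantic-group
# lookup for the Figure 5 breakdown.
length_bucket = lambda ne_text: len(ne_text.split())
```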
Top negation context features in a multi-corpus model, ranked by chi-square value, with each feature's rank in the domain-specific models.
| | | Feature Rank in Training Data |||||
| Feature Description | χ² | all | i2b2 | mipacq | negexts | sharp |
| (D) DepNeg path: dt_nmod_mod | 16713.1 | 1 | 5 | 1 | 2 | 1 |
| (A) Bag of 5 preceding words: no | 15601.1 | 2 | 1 | 3 | 3 | 2 |
| (E) Tree Fragment Context Above-Left: (DT no) | 15263.2 | 3 | 3 | 2 | 4 | 3 |
| (A) Bag of 10 preceding words: no | 14928.9 | 4 | 2 | 4 | 1 | 5 |
| (A) Bag of 3 preceding words: no | 14207.4 | 5 | 4 | 5 | 5 | 4 |
| (B) Preceding word #0: no | 10683.5 | 6 | 6 | 9 | 6 | 10 |
| (C) Cue category: no | 9848.3 | 7 | 7 | 6 | 12 | 6 |
| (C) Cue word: no | 8866.9 | 8 | 9 | 7 | 7 | 15 |
| (C) Cue phrase (any negation) | 8110.7 | 9 | 10 | 8 | 8 | 9 |
| (E) Tree Fragment Context Above-Left: (NP (DT no) (CONCEPT )) | 8038.3 | 10 | 8 | 18 | 13 | 23 |
| (D) DepNeg path: negverb->dobj_mod | 3817.3 | 11 | 12 | 13 | 16 | 285 |
| (E) Tree Fragment Context Above-Left: (VBZ semclass_deny) | 3809.4 | 12 | 13 | 10 | 15 | 851 |
| (A) Bag of 10 preceding words: denies | 3081.2 | 13 | 22 | 12 | 21 | 1195 |
| (E) Tree Fragment Context Above-Left: (DT any) | 2721.2 | 14 | 43 | 11 | 24 | 486 |
| (B) Preceding word #2: no | 2672.9 | 15 | 15 | 28 | 22 | 53 |
| (C) Cue category: deny | 2479.0 | 16 | 16 | 19 | 38 | 327 |
| (A) Bag of 5 preceding words: denies | 2380.3 | 17 | 28 | 16 | 26 | 2070 |
| (A) Bag of 5 following words: or | 2350.9 | 18 | 25 | 30 | 9 | 46 |
| (E) Tree Fragment Context Above-Left: (NP (DT no) (NML )) | 2247.9 | 19 | 27 | 44 | 34 | 19 |
| (A) Bag of 10 following words: or | 2242.1 | 20 | 26 | 29 | 10 | 39 |
Feature types are classified as in Section 3.4.
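A ranking like the one above can be reproduced by scoring each binary context feature against the negation label with a chi-square test. A scikit-learn sketch with toy feature dicts (the feature names and instances are illustrative):

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_selection import chi2

# Toy instances: context features per NE, plus a negation label (1 = negated).
feature_dicts = [
    {"prev5:no": 1, "cue:no": 1},
    {"prev5:denies": 1},
    {"prev5:history": 1},
    {"prev5:no": 1},
]
labels = [1, 1, 0, 1]

vec = DictVectorizer()
X = vec.fit_transform(feature_dicts)
scores, _ = chi2(X, labels)  # chi-square statistic per feature column
ranking = sorted(zip(vec.get_feature_names_out(), scores),
                 key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{score:8.2f}  {name}")
```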