Sunghwan Sohn, Stephen Wu, Christopher G Chute.
Abstract
Negation of clinical named entities is common in clinical documents, and detecting it correctly is crucial for accurately compiling patients' clinical conditions and for supporting complex phenotype detection. In 2009, Mayo Clinic released the clinical Text Analysis and Knowledge Extraction System (cTAKES), which includes a negation annotator that determines the negation status of a named entity by searching for negation words within a fixed word distance. However, this strategy is not sophisticated enough to correctly identify complicated negation patterns. This paper investigates whether the dependency structure produced by the cTAKES dependency parser can improve negation detection performance. We tested manually compiled negation rules derived from dependency paths. These dependency negation rules do not limit the negation scope to a word distance; instead, they rely on syntactic context. We found that dependency-based negation detection is a superior alternative to the current cTAKES negation annotator.
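The abstract describes the cTAKES annotator as searching for negation words within a fixed word distance. The minimal sketch below illustrates that general idea and shows how a window can overreach. It assumes a small hypothetical cue list and a hypothetical five-token window that looks only to the left of the entity. Neither is taken from cTAKES, whose actual keywords are listed in Figure 5.

```python
"""Toy fixed-window negation check (illustrative; not the cTAKES implementation)."""

# Hypothetical subset of negation cue words; the real cTAKES list is in Figure 5.
NEGATION_CUES = {"no", "not", "denies", "without", "negative"}


def window_negated(tokens: list[str], entity_index: int, window: int = 5) -> bool:
    """Return True if a negation cue occurs within `window` tokens before the entity."""
    start = max(0, entity_index - window)
    return any(tok.lower() in NEGATION_CUES for tok in tokens[start:entity_index])


tokens = "Patient reports no fever but has cough".split()
print(window_negated(tokens, tokens.index("fever")))  # True: correct
print(window_negated(tokens, tokens.index("cough")))  # True: wrong, cough is present
```

Because "no" falls within five tokens of "cough", the window wrongly marks it as negated. This is the kind of scope error that syntactic context is meant to avoid.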
Year: 2012 PMID: 22779038 PMCID: PMC3392064
Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc
Figure 1. An example dependency parse.
Inter-annotator agreement for negation on the 160-note Mayo corpus.

| Annotators compared | Agreement |
|---|---|
| A1, A2, A3, A4 | 0.790 |
| A12, A34 | 0.848 |

A12 is the gold standard created by A1 and A2; A34 is the gold standard created by A3 and A4.
Figure 2. A block diagram of modules in the cTAKES pipeline.
Figure 3. An example of a dependency path.
Figure 4. DepNeg negation patterns based on dependency paths, with examples.
Figure 5. cTAKES negation module keywords, reused in DepNeg.
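To make the contrast with word-distance scoping concrete, the following sketch decides negation from the dependency structure rather than from token distance. It is a minimal illustration under stated assumptions. The three rules, the cue list, the Stanford-style labels (`det`, `neg`, `dobj`, `conj`) and the hand-written parses are all hypothetical. None of them is a reproduction of the DepNeg patterns in Figure 4.

```python
"""Toy dependency-based negation rules (hypothetical; not the DepNeg rule set)."""

# Hypothetical cue list, reused from a simple keyword approach.
NEGATION_CUES = {"no", "not", "denies", "without", "negative"}

# A parse is a token list plus (head_index, dependent_index, label) edges.
Edge = tuple[int, int, str]


def dep_negated(tokens: list[str], edges: list[Edge], entity: int) -> bool:
    """Return True if a hypothetical dependency rule marks token `entity` as negated."""
    heads = {dep: (head, label) for head, dep, label in edges}
    # Rule 1: a negation cue modifies the entity directly (e.g. det "no" in "no fever").
    for head, dep, label in edges:
        if head == entity and label in {"det", "neg"} and tokens[dep].lower() in NEGATION_CUES:
            return True
    if entity not in heads:
        return False  # the entity is the root; there is no governing word to inspect
    head, label = heads[entity]
    # Rule 2: the entity is the direct object of a negating verb ("denies fever").
    if label == "dobj" and tokens[head].lower() in NEGATION_CUES:
        return True
    # Rule 3: negation propagates through coordination ("denies fever or chills").
    if label == "conj":
        return dep_negated(tokens, edges, head)
    return False


# Hand-written parse of "Patient reports no fever but has cough" (illustrative).
tokens = "Patient reports no fever but has cough".split()
edges = [(1, 0, "nsubj"), (1, 3, "dobj"), (3, 2, "det"),
         (1, 4, "cc"), (1, 5, "conj"), (5, 6, "dobj")]
print(dep_negated(tokens, edges, 3))  # True: "no" modifies "fever"
print(dep_negated(tokens, edges, 6))  # False: "cough" is governed by "has"
```

Here "cough" lies only four tokens from "no", but it hangs off a different clause in the parse. Path-based rules therefore leave it un-negated, whatever the word distance.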
Negation statistics and evaluation on the test set
(a) cTAKES Negation

| Entity type | TP | TN | FP | FN | Precision | Recall | F-score | Accuracy |
|---|---|---|---|---|---|---|---|---|
| Signs/symptoms | 85 | 258 | 12 | 26 | 0.876 | 0.766 | 0.817 | 0.900 |
| Diseases/disorders | 102 | 709 | 22 | 21 | 0.823 | 0.829 | 0.826 | 0.950 |
| All | 187 | 967 | 34 | 47 | 0.846 | 0.799 | | |
All row: micro average. The metrics are computed from the global (summed) TP, TN, FP, and FN counts over both entity types, rather than by averaging the per-type scores.
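The sketch below shows how the micro average works: it pools the counts first and computes each metric once. It recomputes the metrics from the TP/TN/FP/FN counts in the cTAKES table, using the standard definitions of precision, recall, F-score (harmonic mean of precision and recall) and accuracy. The class and function names are illustrative only.

```python
"""Recompute per-type and micro-averaged negation metrics from the table's counts."""
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int
    tn: int
    fp: int
    fn: int

    def __add__(self, other: "Counts") -> "Counts":
        # Pooling counts across entity types is what makes the average "micro".
        return Counts(self.tp + other.tp, self.tn + other.tn,
                      self.fp + other.fp, self.fn + other.fn)


def metrics(c: Counts) -> dict[str, float]:
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)
    f_score = 2 * precision * recall / (precision + recall)
    accuracy = (c.tp + c.tn) / (c.tp + c.tn + c.fp + c.fn)
    return {"precision": precision, "recall": recall,
            "F-score": f_score, "accuracy": accuracy}


signs = Counts(tp=85, tn=258, fp=12, fn=26)
disorders = Counts(tp=102, tn=709, fp=22, fn=21)

for name, c in [("Signs/symptoms", signs), ("Diseases/disorders", disorders),
                ("All (micro)", signs + disorders)]:
    print(name, {k: round(v, 3) for k, v in metrics(c).items()})
```

The pooled counts reproduce the reported All-row precision (0.846) and recall (0.799). They also imply a micro-averaged F-score of about 0.822 and an accuracy of about 0.934. A macro average of the two per-type precisions would instead give about 0.850, which confirms that the table pools counts.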