Raheel Nawaz, Paul Thompson, Sophia Ananiadou.
Abstract
BACKGROUND: Negation occurs frequently in scientific literature, especially in biomedical literature. It has previously been reported that around 13% of sentences found in biomedical research articles contain negation. Historically, the main motivation for identifying negated events has been to ensure their exclusion from lists of extracted interactions. However, recently, there has been a growing interest in negative results, which has resulted in negation detection being identified as a key challenge in biomedical relation extraction. In this article, we focus on the problem of identifying negated bio-events, given gold standard event annotations.
Year: 2013 PMID: 23323936 PMCID: PMC3561152 DOI: 10.1186/1471-2105-14-14
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Typical structured representation of the bio-event contained in the above sentence.
Figure 2. A simple hypothetical sentence with complex event structure.
Statistics of bio-event corpora containing polarity information
| Corpus | Event types | Total events | Negated events | Negated (%) |
|---|---|---|---|---|
| GENIA Event | 36 | 36,858 | 2,351 | 6.4% |
| BioInfer | 60 | 2,662 | 163 | 6.1% |
| BioNLP’09 Shared Task | 9 | 11,480 | 722 | 6.3% |
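The percentages in the statistics table can be re-derived from the counts. The sketch below assumes that the third and fourth columns are the total number of events and the number of negated events respectively; the function name `negated_share` is illustrative and not taken from the article.

```python
# (total events, negated events) per corpus, copied from the table above.
# Assumption: column 3 is the total event count, column 4 the negated count.
corpora = {
    "GENIA Event": (36858, 2351),
    "BioInfer": (2662, 163),
    "BioNLP'09 Shared Task": (11480, 722),
}


def negated_share(total, negated):
    """Return the share of negated events as a percentage of all events."""
    return 100.0 * negated / total


for name, (total, negated) in corpora.items():
    print(f"{name}: {negated_share(total, negated):.1f}%")
# GENIA Event: 6.4%
# BioInfer: 6.1%
# BioNLP'09 Shared Task: 6.3%
```

Under that reading of the columns, all three corpora land close to 6%, matching the last column of the table.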
Figure 3. Inherently negative bio-event example (Source: GENIA Event Corpus; PMID: 9427533).
Figure 4. Bio-event example with negated event-trigger (Source: GENIA Event Corpus; PMID: 10022882).
Figure 5. Bio-event example with negated participant (Source: GENIA Event Corpus; PMID: 10358173).
Figure 6. Bio-event example with negated attribute (Source: BioNLP ST Corpus; PMID: 10022882).
Figure 7. Bio-event example with comparison and contrast (Source: GENIA Event Corpus; PMID: 10079106).
Corpus-wise class distribution of negated bio-events
| Inherently Negative | 13% | 11% | 9% | 11% | 12% |
| Negated Trigger | 61% | 62% | 67% | 63% | 63% |
| Negated Participant | 10% | 17% | 12% | 14% | 11% |
| Negated Attribute | 7% | 2% | 6% | 4% | 6% |
| Comparison and Contrast | 9% | 8% | 6% | 8% | 8% |
Figure 8. An instance of the word with positive contextual (biological) polarity; Source = PMID: 10202937.
Figure 9. An instance of the low manner indicator being treated as a negation cue; Source = PMID: 20562282.
Figure 10. An instance of negation triggered by the construction ; Source = PMID: 10221643.
Negation cue lists
| Cue list | Size | Cues |
|---|---|---|
| c40 | 40 | absence, absent, barely, cannot, deficiency, deficient, except, exception, fail, failure, impair, inability, inactive, independent, independently, insensitive, instead, insufficient, lack (noun), lack (verb), limited, little, loss, lose, lost, low, negative, neither, never, no, none, nor, not, prevent, resistance, resistant, unable, unaffected, unchanged, without |
| cBioScope | 28 | absence, absent, cannot, could not, either, except, exclude, fail, failure, favor over, impossible, instead of, lack (noun), lack (verb), loss, miss, negative, neither, never, no, no longer, none, not, rather than, rule out, unable, with the exception of, without |
| cBioInfer | 25 | abolished, absence, cannot, defective, deficient, despite, differ, different, differential, distinct, failure, independent, independently, lack, negligible, neither, no, nor, not, protected, separately, simultaneously, unable, unlike, without |
| cCore | 20 | absence, fail, inability, independent, independently, insensitive, insufficient, lack (noun), lack (verb), little, neither, no, nor, not, resistant, unable, unaffected, unchanged, without |
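To illustrate how a cue list of this kind might be applied, the sketch below does a plain lexical lookup of the cCore cues in a sentence. It is not the authors' feature extraction: the tokeniser, the function name `find_cues`, and the example sentence are all assumptions made for illustration, and the part-of-speech distinction between "lack (noun)" and "lack (verb)" is collapsed into one surface form.

```python
import re

# cCore cues as printed in the table above. "lack (noun)" and "lack (verb)"
# are merged into the single surface form "lack" (a simplification).
C_CORE = {
    "absence", "fail", "inability", "independent", "independently",
    "insensitive", "insufficient", "lack", "little", "neither", "no",
    "nor", "not", "resistant", "unable", "unaffected", "unchanged",
    "without",
}


def find_cues(sentence, cues=C_CORE):
    """Return (token_index, token) pairs for tokens that exactly match a cue.

    Tokenisation is a naive split into lower-cased alphabetic runs, so
    inflected forms such as "fails" are not matched.
    """
    tokens = re.findall(r"[A-Za-z]+", sentence.lower())
    return [(i, tok) for i, tok in enumerate(tokens) if tok in cues]


# Hypothetical sentence, not taken from any of the corpora.
sentence = "IL-2 did not induce expression in the absence of TNF."
print(find_cues(sentence))
# [(2, 'not'), (7, 'absence')]
```

A lookup like this only locates candidate cues. Deciding whether a cue actually negates a given event, as in the trigger, participant, and attribute cases illustrated in the figures, needs information about the event structure that this sketch does not model.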
Best results for each dataset
| Dataset | Precision | Recall | F-score | Algorithm | Cue list |
|---|---|---|---|---|---|
| GENIA Event | 83.1% | 67.1% | 74.2% | Random Forest | c40 |
| BioInfer | 86.1% | 84.5% | 85.3% | Random Forest | cBioInfer |
| BioNLP’09 ST | 77.6% | 63.9% | 70.1% | Random Forest | c40 |
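As a consistency check on the best-results table, the third percentage in each row can be recomputed from the first two. The sketch below assumes the three percentage columns are precision, recall, and balanced F-score, in that order; the source table carries no header row, so that reading is an assumption, although the arithmetic supports it.

```python
def f_score(precision, recall):
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-score) in percent, copied from the table above.
best_results = {
    "GENIA Event": (83.1, 67.1, 74.2),
    "BioInfer": (86.1, 84.5, 85.3),
    "BioNLP'09 ST": (77.6, 63.9, 70.1),
}

for dataset, (p, r, reported) in best_results.items():
    print(f"{dataset}: computed {f_score(p, r):.1f}%, reported {reported}%")
# GENIA Event: computed 74.2%, reported 74.2%
# BioInfer: computed 85.3%, reported 85.3%
# BioNLP'09 ST: computed 70.1%, reported 70.1%
```

Because the harmonic mean is symmetric in its two arguments, this check confirms that the third column is an F-score but cannot by itself tell which of the first two columns is precision and which is recall.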
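Comparison of results using different cue lists
| Cue list | GENIA Event P | GENIA Event R | GENIA Event F | BioInfer P | BioInfer R | BioInfer F | BioNLP’09 ST P | BioNLP’09 ST R | BioNLP’09 ST F |
|---|---|---|---|---|---|---|---|---|---|
| c40 | 83.1% | 67.1% | 74.2% | 84.4% | 70.8% | 77.0% | 77.6% | 63.9% | 70.1% |
| cCore | 82.6% | 66.7% | 73.8% | 87.0% | 70.8% | 78.1% | 76.3% | 61.6% | 68.2% |
| cBioInfer | 81.4% | 60.4% | 69.3% | 86.1% | 84.5% | 85.3% | 75.3% | 53.2% | 62.3% |
| cBioScope | 80.7% | 59.9% | 68.8% | | 67.7% | 77.0% | 75.4% | 52.9% | 62.2% |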
Figure 11. Cue list comparison: Micro-averaged results for the three datasets.
Comparison of results using different learning algorithms
| C4.5 | 62.4% | 71.8% | 82.1% | 68.3% | 74.6% | 56.5% | |||
| Random Forest | 82.6% | 66.7% | 70.8% | 76.3% | 58.4% | 66.2% | |||
| Logistic Regression | 82.8% | 58.7% | 68.7% | 79.3% | 71.4% | 75.1% | 80.5% | 53.1% | 64.0% |
| Naïve Bayes | 31.6% | 45.8% | 42.2% | 56.2% | 32.9% | 47.0% | |||
| SVM | 79.3% | 53.7% | 64.0% | 79.0% | 67.7% | 72.9% | 78.6% | 46.7% | 58.6% |
| IB1 | 66.1% | 66.7% | 66.4% | 85.8% | 71.4% | 77.9% | 70.8% | 59.5% | 64.7% |
Figure 12. Algorithm comparison: Micro-averaged results for the three datasets.