Veronika Vincze, György Szarvas, Richárd Farkas, György Móra, János Csirik.
Abstract
BACKGROUND: Detecting uncertain and negative assertions is essential in most BioMedical Text Mining tasks where, in general, the aim is to derive factual knowledge from textual data. This article reports on a corpus annotation project that has produced a freely available resource for research on handling negation and uncertainty in biomedical texts (we call this corpus the BioScope corpus).
Year: 2008 PMID: 19025695 PMCID: PMC2586758 DOI: 10.1186/1471-2105-9-S11-S9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Statistics of the three subcorpora
| | Clinical | Full Paper | Abstract |
| #Documents | 1954 | 9 | 1273 |
| #Sentences | 6383 | 2670 | 11871 |
| Negation sentences | 13.55% | 12.70% | 13.45% |
| #Negation cues | 877 | 389 | 1848 |
| Hedge sentences | 13.39% | 19.44% | 17.70% |
| #Hedge cues | 1189 | 714 | 2769 |
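The percentages in the statistics table can be turned back into approximate sentence counts. The following is a minimal sketch, not part of the original article: the helper names `approx_count` and `estimated_counts` are hypothetical, and because the published percentages are rounded, the results are estimates only.

```python
# Figures copied from the statistics table above.
SUBCORPORA = {
    "Clinical":   {"sentences": 6383,  "negation_pct": 13.55, "hedge_pct": 13.39},
    "Full Paper": {"sentences": 2670,  "negation_pct": 12.70, "hedge_pct": 19.44},
    "Abstract":   {"sentences": 11871, "negation_pct": 13.45, "hedge_pct": 17.70},
}


def approx_count(total_sentences: int, percent: float) -> int:
    """Estimate a sentence count from a rounded percentage (hypothetical helper)."""
    return round(total_sentences * percent / 100)


def estimated_counts() -> dict:
    """Return estimated negation and hedge sentence counts per subcorpus."""
    return {
        name: {
            "negation": approx_count(row["sentences"], row["negation_pct"]),
            "hedge": approx_count(row["sentences"], row["hedge_pct"]),
        }
        for name, row in SUBCORPORA.items()
    }


if __name__ == "__main__":
    for name, counts in estimated_counts().items():
        print(name, counts)
```

For the clinical subcorpus, for example, 13.55% of 6383 sentences corresponds to roughly 865 sentences containing negation.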
Agreement rates for the three subcorpora. The chief annotator resolved only the cases where the two student annotators disagreed; cases of agreement were accepted without further checking. In each cell, the first number is the agreement between the two student annotators, and the second and third numbers are the agreement between each student and the chief annotator.
| type | clinical records | abstracts | full articles |
| NEGATION | | | |
| keyword | 90.70/94.56/95.81 | 91.46/91.71/98.05 | 79.42/86.77/91.71 |
| left scope | 86.27/86.86/97.95 | 97.78/97.90/100 | 83.44/82.42/95.87 |
| right scope | 88.88/91.26/97.39 | 94.56/95.17/99.42 | 84.36/88.19/95.09 |
| full scope | 76.29/79.32/95.35 | 92.46/93.07/99.42 | 70.86/73.35/91.21 |
| SPECULATION | | | |
| keyword | 84.01/89.86/92.37 | 79.12/83.92/92.05 | 77.60/81.49/90.81 |
| left scope | 89.36/88.90/97.60 | 87.52/88.37/97.58 | 75.49/80.13/92.15 |
| right scope | 91.28/92.64/97.90 | 87.13/89.92/96.16 | 82.40/83.28/96.97 |
| full scope | 81.90/82.88/95.54 | 76.72/80.07/94.04 | 62.50/66.72/89.67 |
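Each cell of the agreement table above packs three rates into one slash-separated string. As a reading aid, here is a small sketch that splits such a cell into named fields. The `Agreement` type and `parse_agreement` function are hypothetical names introduced for illustration, and the labelling of the second and third values as "student 1" and "student 2" is an assumption: the caption states only that these are the agreements between each student and the chief annotator.

```python
from typing import NamedTuple


class Agreement(NamedTuple):
    """One table cell: three agreement rates in percent (hypothetical type)."""
    student_vs_student: float   # first number in the cell
    student1_vs_chief: float    # second number (student order is an assumption)
    student2_vs_chief: float    # third number (student order is an assumption)


def parse_agreement(cell: str) -> Agreement:
    """Split a cell such as '76.29/79.32/95.35' into its three rates."""
    parts = [float(p) for p in cell.strip().split("/")]
    if len(parts) != 3:
        raise ValueError(f"expected three slash-separated values, got: {cell!r}")
    return Agreement(*parts)


if __name__ == "__main__":
    # Full-scope negation agreement for clinical records, copied from the table.
    cell = parse_agreement("76.29/79.32/95.35")
    print(cell.student_vs_student, cell.student1_vs_chief, cell.student2_vs_chief)
```

Named fields make it harder to confuse the student-versus-student rate with the student-versus-chief rates when comparing rows.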
Estimation of consistency in cases of initial agreement. To assess the level of consistency in cases where the two students provided identical annotation for a sentence (identical meaning that all cues and scope boundaries were exactly the same), we collected 200 randomly chosen examples from each type of corpus text and compared them to the annotation provided by the chief annotator. The agreement rates are given here.
| NEGATION | |
| keyword: | 98.65% |
| left scope: | 97.27% |
| right scope: | 98.64% |
| full scope: | 95.91% |
| SPECULATION | |
| keyword: | 99.63% |
| left scope: | 99.25% |
| right scope: | 99.63% |
| full scope: | 98.88% |