Mike Conway, Ai Kawazoe, Hutchatai Chanlekha, Nigel Collier.
Abstract
BACKGROUND: In recent years, there has been a growth in work on the use of information extraction technologies for tracking disease outbreaks from online news texts, yet publicly available evaluation standards (and associated resources) for this new area of research have been noticeably lacking.
Year: 2010 PMID: 20876049 PMCID: PMC2956322 DOI: 10.2196/jmir.1323
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Figure 1. Worked example of event frame construction from raw text. Note that this paper focuses on the construction of event frames from documents already tagged for named entities. The named entity tagging process is described by Kawazoe et al [16].
Agreement on fixed slot properties for the 42 documents with precisely one event per annotator (only Boolean fixed slot properties are shown)
| Property | Annotator 1 (true) | Annotator 1 (false) | Annotator 2 (true) | Annotator 2 (false) | Agreement (%) |
| DRUG_RESISTANCE | 0 | 42 | 0 | 42 | 100.0 |
| FARM_WORKER | 0 | 42 | 0 | 42 | 100.0 |
| FOOD_CONTAMINATION | 5 | 37 | 13 | 29 | 71.4 |
| HOSPITAL_WORKER | 0 | 42 | 1 | 41 | 97.6 |
| INTERNATIONAL_TRAVEL | 0 | 42 | 0 | 42 | 100.0 |
| PRODUCT_MALFORMATION | 0 | 42 | 0 | 42 | 100.0 |
| ZOONOSIS | 7 | 35 | 12 | 30 | 83.0 |
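The Agreement (%) column is consistent with raw percent agreement: the share of the 42 documents on which both annotators assigned the same Boolean value. The sketch below illustrates that calculation for FOOD_CONTAMINATION. The per-document labels are hypothetical: the table reports only each annotator's totals (5 and 13 true values), so the overlap used here (3 documents marked true by both annotators) is an assumption chosen to match the reported figure, and the function name `percent_agreement` is illustrative rather than taken from the paper.

```python
from typing import Sequence


def percent_agreement(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Return the percentage of items on which two annotators agree."""
    if len(a) != len(b):
        raise ValueError("Both annotators must label the same documents")
    if not a:
        raise ValueError("At least one document is required")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Hypothetical per-document labels for FOOD_CONTAMINATION (assumed overlap):
#   3 documents: both annotators say true
#   2 documents: only annotator 1 says true
#  10 documents: only annotator 2 says true
#  27 documents: both annotators say false
annotator_1 = [True] * 3 + [True] * 2 + [False] * 10 + [False] * 27
annotator_2 = [True] * 3 + [False] * 2 + [True] * 10 + [False] * 27

# Marginal totals match the table: 5 true for annotator 1, 13 for annotator 2.
print(sum(annotator_1), sum(annotator_2))
print(round(percent_agreement(annotator_1, annotator_2), 1))
```

Under this assumed overlap the two annotators agree on 30 of 42 documents. The marginal totals alone do not determine agreement, so the same calculation on the real per-document annotations is needed to reproduce any row of the table.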
Corpus document sources (200 documents)
| Document Source | Number of Documents | % of 200 |
| ProMed-Mail | 43 | 21.5 |
| Reuters | 16 | 8.0 |
| BBC | 16 | 8.0 |
| WHO | 41 | 20.5 |
| CBS | 13 | 6.5 |
| CBC | 17 | 8.5 |
| Vietnam-net | 12 | 6.0 |
| Hindustan Times | 18 | 9.0 |
| The Nation (Thailand) | 9 | 4.5 |
| All Africa | 5 | 2.5 |
| Xinhua (China) | 5 | 2.5 |
| Antara (Indonesia) | 5 | 2.5 |
Event statistics (total number of events is 394)
| Type of Event | Number of Events | % of 394 |
| Events involving humans | 297 | 75.4 |
| Events involving food contamination | 35 | 8.9 |
| Events involving hospital workers | 3 | 0.8 |
| Events involving malformed products | 2 | 0.5 |
| Events classified as present | 321 | 81.5 |
| Events classified as historical | 49 | 12.4 |
| Events classified as recent_past | 11 | 2.8 |
| Events classified as hypothetical | 13 | 3.3 |
Figure 2. Distribution of disease events in our corpus by country (only countries with 2 or more events are shown). Map produced by GPS Visualizer.
Figure 3. Linux BioCaster corpus event frame browsing tool [9]