Makoto Miwa, Sampo Pyysalo, Tomoko Ohta, Sophia Ananiadou.
Abstract
BACKGROUND: Biomedical events are key to understanding physiological processes and disease, and wide-coverage extraction is required for comprehensive automatic analysis of statements describing biomedical systems in the literature. In turn, the training and evaluation of extraction methods require manually annotated corpora. However, as manual annotation is time-consuming and expensive, any single event-annotated corpus can only cover a limited number of semantic types. Although combined use of several such corpora could potentially allow an extraction system to achieve broad semantic coverage, there has been little research into learning from multiple corpora with partially overlapping semantic annotation scopes.
Year: 2013 PMID: 23731785 PMCID: PMC3680179 DOI: 10.1186/1471-2105-14-175
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Example event annotations. Entities and event triggers are shown with their types above the corresponding text, and event participants are shown as arcs labelled with their roles. Each type is shown in a different colour. Negation is visualised as a “crossed-out” event.
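The structures in Figure 1 (typed entities and triggers, events with role-labelled participants, and negation) can be stored in the BioNLP Shared Task-style standoff format, in which text-bound annotations (`T`), events (`E`) and modifications such as negation (`M`) are listed one per line. The minimal parser below is a sketch under that assumption; the class and function names and the example sentence are hypothetical, only contiguous spans are handled, and other line types are skipped.

```python
from dataclasses import dataclass, field


@dataclass
class TextBound:
    """An entity or event trigger: a typed, contiguous text span (T line)."""
    id: str
    type: str
    start: int
    end: int
    text: str


@dataclass
class Event:
    """An event (E line): a type, a trigger and role-labelled participants."""
    id: str
    type: str
    trigger: str
    args: list = field(default_factory=list)  # (role, participant id) pairs
    negated: bool = False


def parse_standoff(lines):
    spans, events = {}, {}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        ann_id = cols[0]
        if ann_id.startswith("T"):
            # "Type start end" followed by the covered text (contiguous spans only)
            type_, start, end = cols[1].split()
            spans[ann_id] = TextBound(ann_id, type_, int(start), int(end), cols[2])
        elif ann_id.startswith("E"):
            # "Type:TriggerId Role:ArgId ..."; arguments may be entities or events
            head, *rest = cols[1].split()
            ev_type, trigger = head.split(":")
            args = [tuple(arg.split(":")) for arg in rest]
            events[ann_id] = Event(ann_id, ev_type, trigger, args)
        elif ann_id.startswith("M"):
            # "Negation E1" marks the event as negated (drawn crossed-out);
            # other modification types are ignored in this sketch
            mod_type, target = cols[1].split()
            if mod_type == "Negation":
                events[target].negated = True
        # any other line type (e.g. "*" equivalences) is skipped
    return spans, events


# Hypothetical example sentence: "p53 is not expressed"
lines = [
    "T1\tProtein 0 3\tp53",
    "T2\tGene_expression 11 20\texpressed",
    "E1\tGene_expression:T2 Theme:T1",
    "M1\tNegation E1",
]
spans, events = parse_standoff(lines)
print(events["E1"])
# Event(id='E1', type='Gene_expression', trigger='T2', args=[('Theme', 'T1')], negated=True)
```

Nested events are represented the same way: an argument id such as `Theme:E1` simply points at another event instead of an entity.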
Figure 2. Example sentences annotated with different scopes.
Statistics for training and development portions of applied corpora
| GE | 16,315 | 13,560 | 10,761 | 269,861 |
| ID | 8,501 | 2,779 | 3,412 | 83,063 |
| EPI | 10,094 | 2,453 | 7,827 | 170,809 |
| DNAm | 1,964 | 1,034 | 1,305 | 32,510 |
| EPTM | 4,698 | 1,142 | 3,692 | 82,994 |
| mTOR | 1,773 | 1,286 | 520 | 11,960 |
| MLEE | 3,553 | 4,491 | 1,931 | 37,483 |
Named entity types in the applied corpora
| Corpus | Named entity types |
| GE | Protein |
| ID | Protein, chemical, organism, Regulon-operon, two-component-system |
| EPI | Protein |
| DNAm | Protein |
| EPTM | Protein |
| mTOR | Protein, Drug, ion, simple molecule, tag |
| MLEE | Protein, drug or compound, cellular component, cell, tissue, organ, anatomical system, organism, [ …] |
Figure 3. Event types annotated in the event extraction corpora.
Statistics for transferable events between the training and development portions of the applied corpora
| | GE | ID | EPI | DNAm | EPTM | mTOR | MLEE |
| GE | - | 13,560 | 303 | 4,688 | 303 | 13,560 | 13,560 |
| ID | 1,878 | - | 69 | 524 | 69 | 1,878 | 1,878 |
| EPI | 130 | 130 | - | 1,668 | 1,988 | 473 | 1,226 |
| DNAm | 5 | 5 | 1,033 | - | 30 | 12 | 1,000 |
| EPTM | 85 | 85 | 315 | 176 | - | 138 | 155 |
| mTOR | 1,212 | 1,212 | 271 | 579 | 271 | - | 1,286 |
| MLEE | 2,843 | 2,843 | 49 | 958 | 38 | 2,852 | - |
| SUM | 6,153 | 17,835 | 2,040 | 8,593 | 2,699 | 18,913 | 19,105 |
| RATIO | 0.453 | 6.42 | 0.832 | 8.31 | 2.36 | 14.7 | 4.25 |
This table shows how many event instances in each row corpus can be transferred to each column corpus. SUM gives the total number of event instances from the other corpora that are transferable to each column corpus, and RATIO gives the ratio of this total to the number of event instances in the column corpus. For the number of event instances in each row corpus, please refer to Table 1.
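The SUM and RATIO rows follow directly from the table body. The Python sketch below copies the counts above and recomputes both rows; it assumes that the second numeric column of the corpus statistics table (Table 1) holds each corpus's event count, which the RATIO values are consistent with. The helper name `sum_and_ratio` is illustrative.

```python
corpora = ["GE", "ID", "EPI", "DNAm", "EPTM", "mTOR", "MLEE"]

# transferable[row][j]: event instances in the row corpus that can be
# transferred to corpora[j]; None marks the diagonal ("-" in the table)
transferable = {
    "GE":   [None, 13560, 303, 4688, 303, 13560, 13560],
    "ID":   [1878, None, 69, 524, 69, 1878, 1878],
    "EPI":  [130, 130, None, 1668, 1988, 473, 1226],
    "DNAm": [5, 5, 1033, None, 30, 12, 1000],
    "EPTM": [85, 85, 315, 176, None, 138, 155],
    "mTOR": [1212, 1212, 271, 579, 271, None, 1286],
    "MLEE": [2843, 2843, 49, 958, 38, 2852, None],
}

# Event instances per corpus (assumed: second numeric column of Table 1)
event_counts = {"GE": 13560, "ID": 2779, "EPI": 2453, "DNAm": 1034,
                "EPTM": 1142, "mTOR": 1286, "MLEE": 4491}


def sum_and_ratio(col):
    """SUM: events from all other corpora transferable to `col`;
    RATIO: SUM divided by the number of events already in `col`."""
    j = corpora.index(col)
    total = sum(transferable[row][j] for row in corpora if row != col)
    return total, total / event_counts[col]


for c in corpora:
    s, r = sum_and_ratio(c)
    print(f"{c}: SUM={s:,} RATIO={r:.3g}")
```

Every recomputed SUM matches the table exactly, and the ratios agree with the RATIO row to about three significant figures, which shows, for example, that the mTOR training data could be enlarged almost fifteenfold by events transferred from the other corpora.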
Figure 4. Example EventMine pipeline.
Figure 5. Example annotations with partially overlapping scopes.
Figure 6. Restriction of negative instance generation. Different events annotated in the GE and EPI corpora are shown, along with the differences between the training instances produced by a simple merge (Merge) and by our newly proposed method (Multiple) for the trigger/entity and argument detectors, assuming that all triggers and entities are detected by the trigger/entity detectors. NONE marks negative instances, and spuriously created examples are shown in bold.
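The idea behind the restriction can be sketched for trigger detection alone. The Python example below assumes one-vs-rest detectors, one per event type, and toy annotation scopes (`SCOPES` lists illustrative subsets of the GE and EPI event types, not the full inventories); it is a simplified illustration, not EventMine's implementation. In the merged setting an unannotated candidate becomes a negative for every type, whereas in the restricted setting it is a negative only for the types its own corpus annotates.

```python
from collections import defaultdict

# Illustrative subsets of the event types each corpus annotates (its scope)
SCOPES = {
    "GE": {"Gene_expression", "Phosphorylation", "Positive_regulation"},
    "EPI": {"Phosphorylation", "Methylation", "Acetylation"},
}


def trigger_instances(candidates, scopes, merge=False):
    """Build one-vs-rest training instances for trigger detection.

    candidates: (corpus, candidate_id, gold_type or None) triples.
    Returns {event_type: [(candidate_id, is_positive), ...]}.
    """
    all_types = set().union(*scopes.values())
    data = defaultdict(list)
    for corpus, cand, gold in candidates:
        # Merge: an unannotated candidate is a negative for every type.
        # Restricted: it is a negative only for types its own corpus annotates.
        types = all_types if merge else scopes[corpus]
        for event_type in sorted(types):
            data[event_type].append((cand, gold == event_type))
    return dict(data)


candidates = [
    ("EPI", "expression@doc2", None),  # Gene_expression is outside EPI's scope
    ("GE", "expression@doc1", "Gene_expression"),
]
merged = trigger_instances(candidates, SCOPES, merge=True)
restricted = trigger_instances(candidates, SCOPES)
print(merged["Gene_expression"])      # [('expression@doc2', False), ('expression@doc1', True)]
print(restricted["Gene_expression"])  # [('expression@doc1', True)]
```

The negative `('expression@doc2', False)` in the merged data is the kind of spuriously created example the caption refers to: the EPI corpus never annotates gene expression, so its silence says nothing about that type.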
Characteristics of compared methods
| | Single | Merge | Stacking | EasyAdapt | Multiple |
| Learning from multiple corpora | | x | x | x | x |
| Instance addition | | x | | x | x |
| Single corpus-independent model | | x | | | x |
| Filtering falsely created instances | | | | | x |
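EasyAdapt is not described further in this section. Assuming it denotes the standard feature-augmentation scheme for domain adaptation (Daumé III's "frustratingly easy" method), with each corpus treated as a domain, every feature is duplicated into a shared copy and a corpus-specific copy before training. The function name and feature strings below are hypothetical.

```python
def easy_adapt(features, corpus):
    """Feature augmentation: each feature gets a shared copy and a corpus-specific copy."""
    augmented = {}
    for name, value in features.items():
        augmented["shared:" + name] = value    # weight learned from all corpora
        augmented[corpus + ":" + name] = value  # weight learned from this corpus only
    return augmented


x = {"word=expression": 1.0, "pos=NN": 1.0}
print(easy_adapt(x, "GE"))
# {'shared:word=expression': 1.0, 'GE:word=expression': 1.0, 'shared:pos=NN': 1.0, 'GE:pos=NN': 1.0}
```

A linear model trained on the augmented features learns shared weights plus per-corpus corrections. Because the corpus-specific copies require knowing which corpus an input belongs to, such a model is tied to corpus labels rather than being a single corpus-independent model.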
Recall / precision / F-scores on the development portions of all the corpora
| Method | Event | Hedge | Total |
| Single | 49.2/ | | 46.5/ |
| Merge | | 24.6/ | |
| Stacking | 50.2/56.7/53.3 | 24.3/ | 47.5/55.7/51.3 |
| EasyAdapt | 50.7/58.4/54.3 | 25.4/48.7/33.4 | 48.1/57.8/52.5 |
| Multiple | | | |
The FULL task evaluation criteria are employed. Event, hedge and their total results are shown. The highest results are shown in bold, and the lowest results are underlined.
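Each R/P/F triple is linked by the balanced F-score, the harmonic mean of recall and precision. The Python sketch below recomputes F for the complete triples in the development-set table, assuming (as the table note states) that the three columns hold event, hedge and total results; `f_score` is an illustrative helper.

```python
def f_score(recall, precision):
    """Balanced F-score: the harmonic mean of recall and precision (both in %)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# Complete (R, P, F) triples copied from the development-set table
reported = {
    ("Stacking", "Event"): (50.2, 56.7, 53.3),
    ("Stacking", "Total"): (47.5, 55.7, 51.3),
    ("EasyAdapt", "Event"): (50.7, 58.4, 54.3),
    ("EasyAdapt", "Hedge"): (25.4, 48.7, 33.4),
    ("EasyAdapt", "Total"): (48.1, 57.8, 52.5),
}
for (method, column), (r, p, f) in reported.items():
    print(f"{method} {column}: computed F = {f_score(r, p):.1f}, reported F = {f}")
```

All five computed values round to the reported F-scores. Because R and P are themselves rounded to one decimal, an F recomputed this way can occasionally differ from a reported value in the last digit.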
F-scores on the development portions of the corpora
| Method | GE | ID | EPI | DNAm | EPTM | mTOR | MLEE |
| Single | 50.9 | 47.1 | 46.2 | ||||
| Merge | 49.8 | 59.3 | 75.9 | 51.6 | 48.5 | ||
| Stacking | 51.3 | 50.3 | 56.6 | 72.4 | 44.6 | 48.6 | 47.1 |
| EasyAdapt | 51.3 | 58.4 | 75.6 | 49.7 | 47.6 | ||
| Multiple | 50.1 | 50.0 | 47.2 |
The FULL task evaluation criteria are employed. The highest results are shown in bold, and the lowest results are underlined.
F-scores on isolated and overlapping types on the development portions of all the corpora
| Method | Isolated types (excluding PTM) | All isolated types | Overlapping types |
| Single | 59.1 | 55.5 | 51.8 |
| Merge | 55.7 | ||
| Stacking | 58.7 | 52.9 | |
| EasyAdapt | 58.6 | 53.9 | |
| Multiple |
The FULL task evaluation criteria are employed. Isolated types excluding PTM, all isolated types and overlapping types are shown. The highest results are shown in bold, and the lowest results are underlined.
Recall / precision / F-scores on the test portions of the corpora
| Single | 50.24/63.97/56.28 | |||
| Merge | 61.59/ | 48.62/60.62/53.96 | 47.94/56.19/ | |
| Stacking | 54.60/61.89/58.02 | 40.13/66.19/49.97 | ||
| EasyAdapt | 49.72/ | 58.96/ | 44.70/ | 51.11/55.73/ |
| Multiple | 51.25/ | 50.51/55.22/52.76 |
The FULL task evaluation criteria are employed. For GE, we report the result of the Task 1 subtask (core argument detection). The highest results are shown in bold, and the lowest results are underlined.
Recall / precision / F-scores on the test portions of BioNLP ST 2011 corpora
| System | GE (Task 1) | ID | EPI |
| Multiple | 51.25/ | ||
| EasyAdapt | 49.72/63.19/55.65 | 58.96/61.33/ | 44.70/ |
| EM-CR | 60.55/54.97/57.63 | 49.06/55.39/52.03 | |
| FAUST | 49.41/64.75/56.04 | 48.03/ | 28.88/44.51/35.03 |
| TEES | 49.56/57.65/53.30 | 37.85/48.62/42.57 | 52.69/53.98/53.33 |
The official scores of the two top-performing systems, FAUST and the Turku Event Extraction System (TEES), as well as EventMine with coreference resolution and domain adaptation (EM-CR), are shown for reference. For GE, we report the result of the Task 1 subtask (core argument detection). The highest scores are shown in bold.
Manual evaluation results for 261 event instances outside the annotation scope
| Correct (Strict match) | 65 | 23 | 63 | 151 |
| Acceptable (Loose match) | 8 | 1 | 16 | 25 |
| Incorrect | 27 | 37 | 21 | 85 |
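The manual evaluation counts can be summarised as strict and loose accuracy. The Python sketch below assumes that "strict" counts only Correct judgements and "loose" counts Correct plus Acceptable, following the row labels; the three unlabeled columns are referred to by position, and the last column is the total.

```python
# Counts from the manual evaluation table: three column groups, then the total
table = {
    "correct": [65, 23, 63, 151],     # Correct (strict match)
    "acceptable": [8, 1, 16, 25],     # Acceptable (loose match)
    "incorrect": [27, 37, 21, 85],
}


def rates(col):
    """Return (instances, strict accuracy, loose accuracy) for one column."""
    correct = table["correct"][col]
    acceptable = table["acceptable"][col]
    n = correct + acceptable + table["incorrect"][col]
    return n, correct / n, (correct + acceptable) / n


n_cols = len(table["correct"])
for col in range(n_cols):
    n, strict, loose = rates(col)
    label = "total" if col == n_cols - 1 else f"column {col + 1}"
    print(f"{label}: n={n}, strict={strict:.1%}, loose={loose:.1%}")
```

For the total column this gives 151/261 ≈ 57.9% strict and 176/261 ≈ 67.4% loose, and the three column sizes (100, 61 and 100) add up to the 261 instances in the caption.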