Makoto Miwa, Paul Thompson, Sophia Ananiadou.
Abstract
MOTIVATION: In recent years, several biomedical event extraction (EE) systems have been developed. However, the nature of the annotated training corpora, as well as the training process itself, can limit the performance of the trained EE systems. In particular, most event-annotated corpora do not deal adequately with coreference. This impairs the trained systems' ability to recognize biomedical entities, and thus their accuracy in extracting events. Additionally, the fact that most EE systems are trained on a single annotated corpus further restricts their coverage.
Year: 2012 PMID: 22539668 PMCID: PMC3381963 DOI: 10.1093/bioinformatics/bts237
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Coreference example. Two coreferential links are illustrated.
Fig. 2. ST event representation. Events are represented as dotted rectangles with event types. Within events, solid rectangles represent participants, whereas trigger expressions are shown with a white background, argument roles with a grey background and arguments with a black background. The oval denotes an event modification, in this case speculation.
Fig. 3. EventMine EE pipeline. Documents used as input to the system must be pre-annotated with entities. In this case, TRAF2 and CD40 have already been identified as Protein.
Features for trigger/entity, argument, multi-argument event and modification detectors
| Detector | Type | Function |
|---|---|---|
| Trigger/entity | Target candidate | word |
| | Words around candidate | word |
| | Path between candidate and all NEs | shortest path |
| Arg. | Terminal nodes of candidate pair | word |
| | Words around candidate pair | pair |
| | Path between candidate pair | shortest path |
| | Path between argument trigger and its closest NE | shortest path |
| | Confidences assigned to terminal nodes found by trigger/entity detector | – |
| Multi-arg. event | Included trigger-argument pairs | arg. detector |
| | All pairs among arguments | arg. detector |
| | All pairs sharing trigger outside of candidate event | arg. detector |
| | Confidences assigned to included pairs found by arg. detector | – |
| Mod. | Trigger | neighbouring |
| | Included trigger-argument pairs | pair |
Fig. 4. Parse result modification (PR) using the CR output. In addition to the dependencies identified in the original parser output (upper solid arrows), shared dependencies are generated between the mention "its" and its antecedent "SLP-76" (lower dotted arrows).
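The idea behind the parse result modification in Fig. 4 can be illustrated with a minimal sketch. This is not EventMine's implementation. The edge representation as `(head, label, dependent)` tuples, the function name `share_dependencies`, the choice to copy the mention's dependencies onto the antecedent, and the toy edges are all assumptions made for illustration.

```python
def share_dependencies(edges, coref_links):
    """Return the original edges plus copies that point at antecedents.

    edges:       set of (head, label, dependent) tuples (hypothetical format)
    coref_links: dict mapping a mention token to its antecedent token
    """
    shared = set()
    for head, label, dep in edges:
        # Replace a mention by its antecedent wherever it occurs in the edge.
        new_head = coref_links.get(head, head)
        new_dep = coref_links.get(dep, dep)
        changed = (new_head, new_dep) != (head, dep)
        if changed and new_head != new_dep:  # skip self-loops
            shared.add((new_head, label, new_dep))
    # Original dependencies are kept; shared ones are added on top.
    return set(edges) | shared


# Toy parse (hypothetical; not the sentence shown in the figure).
edges = {
    ("phosphorylation", "poss", "its"),
    ("requires", "nsubj", "SLP-76"),
}
links = {"its": "SLP-76"}  # CR output: mention -> antecedent
augmented = share_dependencies(edges, links)
```

In this toy case the mention "its" passes its dependency on to "SLP-76", so a path-based feature between "phosphorylation" and the protein name becomes available. A real parse would use token positions rather than surface strings to keep repeated words apart.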
Statistics for training and development sets
| Corpus | Abstracts | Full texts | Sentences | Events/coref. links |
|---|---|---|---|---|
| COREF | 950 | 0 | 7982 | 2786 |
| GE09 | 950 | 0 | 7982 | 10 410 |
| GE11 | 950 | 10 | 10 761 | 13 560 |
| EPI | 800 | 0 | 7827 | 2452 |
| ID | 0 | 20 | 3412 | 2679 |
In the last column, the number of coreferential links is shown for COREF, and the number of events is shown for the other corpora.
Performance of rule-based CR systems on the development and test sets of the COREF task
| | Recall | Precision | F-Score |
|---|---|---|---|
| Development | 53.5 | 69.8 | 60.5 |
| Test | 50.4 | 62.7 | 55.9 |
| Test ( | 22.2 | 73.3 | 34.1 |
Recall, Precision and F-Score were evaluated according to the protein evaluation criteria of the COREF task. The best performing system participating in the original COREF evaluation is shown for reference.
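As a quick consistency check on the table above, the F-Score can be recomputed from recall and precision. The sketch below assumes the balanced F-Score (harmonic mean of recall and precision), which the record does not state explicitly. The function name `f_score` is illustrative.

```python
def f_score(recall, precision):
    """Balanced F-Score (harmonic mean) of recall and precision, in percent."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# (recall, precision, reported F-Score) taken from the table above.
rows = {
    "Development": (53.5, 69.8, 60.5),
    "Test": (50.4, 62.7, 55.9),
}

for name, (r, p, reported) in rows.items():
    # Reported recall and precision are already rounded, so allow a tolerance.
    assert abs(f_score(r, p) - reported) < 0.1, name
```

Because the published recall and precision are rounded to one decimal place, the recomputed value can differ from the reported one in the last digit.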
EE performance on the development (dev) and test sets of GE09, incorporating the CR results
| Set | System | SVT F | BIND F | REG F | TOT R | TOT P | TOT F |
|---|---|---|---|---|---|---|---|
| Dev | Base | 79.41 | 49.18 | 46.78 | 54.28 | 62.62 | 58.15 |
| | +PR | 78.60 | 50.92 | 47.32 | 55.00 | 62.10 | 58.34 |
| | +FE | 80.31 | 48.69 | 47.31 | 54.11 | 63.70 | 58.51 |
| | +PR+FE | 80.16 | 50.52 | 47.48 | 55.00 | 63.17 | 58.81 |
| Test | +PR+FE | 73.55 | 59.91 | 45.99 | 52.67 | 65.19 | 58.27 |
| | UMass | 72.6 | 52.6 | 46.9 | – | – | 57.4 |
F-Scores are shown for Simple (SVT), Binding (BIND) and Regulation (REG) events, together with overall recall, precision and F-Scores for all events (TOT). The results of the best reported system for GE09, UMass (Riedel and McCallum, 2011), are shown for reference.
F-Scores achieved through application of EventMine to the development sets of all corpora
| Corpus | Base | +DA | +PR +FE +DA |
|---|---|---|---|
| GE09 | 58.15 | – | 58.81 (+0.66) |
| GE11 | 55.67 | – | 56.73 (+1.06) |
| EPI (+ID (+GE11)) | 50.96 | 52.26 (+1.30) | 52.39 (+0.13) |
| ID (+GE11) | 47.88 | 49.64 (+1.76) | 51.24 (+1.60) |
The performance of the base system is compared with versions of the system incorporating DA and additionally CR (+PR+FE).
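The parenthesised values in the table above appear to be gains over the nearest available column to the left, not over the base system. The sketch below recomputes them under that reading, which is an interpretation of the table and is not stated in the record. The helper name `gains` is illustrative, and `None` stands for a "–" cell.

```python
def gains(scores):
    """Gain of each score over the previous available score (None = no result)."""
    out, prev = [], None
    for s in scores:
        if s is None:
            out.append(None)
            continue
        out.append(None if prev is None else round(s - prev, 2))
        prev = s
    return out


# Columns: Base, +DA, +PR+FE+DA (values copied from the table above).
table = {
    "GE09": [58.15, None, 58.81],
    "GE11": [55.67, None, 56.73],
    "EPI (+ID (+GE11))": [50.96, 52.26, 52.39],
    "ID (+GE11)": [47.88, 49.64, 51.24],
}

for corpus, scores in table.items():
    print(corpus, gains(scores))
```

Under this reading, the total improvement over the base system for ID is the sum of the two gains, 1.76 + 1.60 = 3.36 F-Score points.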
Overall recall/precision/F-Scores achieved for EE on the ST11 test sets
| System | GE11 Task 1 | EPI | ID |
|---|---|---|---|
| EventMine | 49.06/ | ||
| FAUST | 49.41/ | 28.88/44.51/35.03 | 48.03/ |
| UTurku | 49.56/57.65/53.30 | | 37.85/48.62/42.57 |
Primary evaluation criteria are employed. Results for two top systems participating in the original evaluation, FAUST (Riedel et al.) and UTurku (Björne and Salakoski, 2011), are shown for reference. The highest scores are shown in bold.
Detailed EE F-Scores achieved on the ST11 test sets
| EventMine | FAUST | UTurku | |
|---|---|---|---|
| GE11 simple | 73.90 | 72.11 | |
| GE11 binding | 48.49 | 43.28 | |
| GE11 regulation | 44.94 | 42.72 | |
| GE11 full texts | 52.67 | 50.72 | |
| GE11 abstracts | 57.46 | 54.37 | |
| GE11 Task 1 (core arguments) | 56.04 | 53.30 | |
| GE11 site | 44.92 | 49.72 | |
| GE11 location | 47.42 | – | |
| GE11 Task 2 (secondary arguments) | 45.86 | 37.96 | |
| GE11 Task 3 (modification) | 26.24 | – | |
| EPI catalysis | 6.58 | 7.06 | |
| EPI full task | 52.03 | 35.03 | |
| EPI core task | 67.52 | 68.59 | |
| EPI modification | – | 28.07 | |
| ID simple | 61.12 | 62.67 | |
| ID binding | 31.30 | 22.22 | |
| ID process | 65.69 | 41.57 | |
| ID regulation | 47.07 | 39.49 | |
| ID full task | 55.59 | 42.57 | |
| ID core task | 57.57 | 43.93 | |
| ID modification | 17.48 | – |
The highest scores are shown in bold.