Katsumasa Yoshikawa, Sebastian Riedel, Tsutomu Hirao, Masayuki Asahara, Yuji Matsumoto.
Abstract
This paper presents a new approach to exploit coreference information for extracting event-argument (E-A) relations from biomedical documents. This approach has two advantages: (1) it can extract a large number of valuable E-A relations based on the concept of salience in discourse; (2) it enables us to identify E-A relations over sentence boundaries (cross-links) using transitivity of coreference relations. We propose two coreference-based models: a pipeline based on Support Vector Machine (SVM) classifiers, and a joint Markov Logic Network (MLN). We show the effectiveness of these models on a biomedical event corpus. Both models outperform systems that do not use coreference information. When the two proposed models are compared to each other, the joint MLN outperforms the pipeline SVM with gold coreference information.
Year: 2011 PMID: 22166257 PMCID: PMC3239306 DOI: 10.1186/2041-1480-2-S5-S6
Source DB: PubMed Journal: J Biomed Semantics
Figure 1. Cross-sentence event-argument relation. An example of an event-argument relation crossing sentence boundaries. In this figure, the event “inducible” has “The region” as a Theme. But “The region” is coreferent with “The IRF-2 promoter region” in the preceding sentence, so “The IRF-2 promoter region” is also a Theme of “inducible”.
Figure 2. Biomedical event extraction. A simple example of biomedical event extraction. Events: induction, increases, binding. Arguments: AP-1 factors, this element, induction, binding. Roles: increases - induction (Cause), increases - binding (Theme), binding - AP-1 factors (Theme), binding - this element (Theme).
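The Figure 2 example can be written down as a small set of triples. The snippet below is an illustrative sketch (a hypothetical encoding, not the paper's data format); it also shows how structures nest, since an argument may itself be an event ("increases" takes the "induction" and "binding" events as arguments):

```python
# Illustrative encoding of the Figure 2 example (hypothetical format,
# not the paper's actual data structures).
# Events are trigger words; roles are (event, argument, role) triples.

events = {"induction", "increases", "binding"}

roles = [
    ("increases", "induction", "Cause"),
    ("increases", "binding", "Theme"),
    ("binding", "AP-1 factors", "Theme"),
    ("binding", "this element", "Theme"),
]

# Collect the arguments of the "binding" event:
binding_args = [(arg, r) for ev, arg, r in roles if ev == "binding"]
print(binding_args)  # [('AP-1 factors', 'Theme'), ('this element', 'Theme')]
```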
Local features used for the SVM pipeline and the joint MLN
| Description | SVM 1st phase | SVM 2nd phase | MLN predicate |
|---|---|---|---|
| Word Form | X | X | |
| Part-of-Speech | X | X | |
| Word Stem | X | X | |
| Named Entity Tag | X | X | |
| Chunk Tag | X | X | |
| In Event Dictionary | X | X | |
| Has Capital Letter | X | X | |
| Has Numeric Characters | X | X | |
| Has Punctuation Characters | X | X | |
| Character Bigram | X | | |
| Character Trigram | X | | |
| Dependency label | X | X | |
| Labeled dependency path between tokens | | X | |
| Unlabeled dependency path between tokens | | X | |
| Least common ancestor of dependency path | | X | |
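As a rough illustration of the surface features in the table, here is a sketch of a feature extractor. `local_features` is a hypothetical helper: the Part-of-Speech, Named Entity, chunk, event-dictionary, and dependency features would additionally require the taggers and parser described later.

```python
import re

def local_features(tokens, i):
    """Sketch of the surface features listed above for token i
    (hypothetical helper, for illustration only)."""
    w = tokens[i]
    return {
        "word_form": w,
        "has_capital": any(c.isupper() for c in w),
        "has_numeric": any(c.isdigit() for c in w),
        "has_punct": bool(re.search(r"[^\w\s]", w)),
        # character n-grams, as used by the first-phase SVM
        "char_bigrams": [w[j:j + 2] for j in range(len(w) - 1)],
        "char_trigrams": [w[j:j + 3] for j in range(len(w) - 2)],
    }

feats = local_features(["IRF-2", "is", "inducible"], 0)
print(feats["has_capital"], feats["has_numeric"], feats["char_bigrams"])
# True True ['IR', 'RF', 'F-', '-2']
```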
The three hidden predicates
| Predicate | Description |
|---|---|
| event(i) | token i is an event clue |
| eventType(i, t) | token i is an event of type t |
| role(i, j, r) | token j plays argument role r for the event at token i |
Basic global formulae
| Formula | Description |
|---|---|
| event(i) ⇒ ∃t. eventType(i, t) | If there is an event there should be an event type |
| eventType(i, t) ⇒ event(i) | If there is an event type there should be an event |
| role(i, j, r) ⇒ event(i) | If a token j plays role r for token i, then i must be an event |
| event(i) ⇒ ∃j, r. role(i, j, r) | Every event needs at least one argument |
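The global formulae tie the three hidden predicates together during joint inference. Purely for illustration (in the MLN these are formulae scored during inference, not a post-hoc filter), a sketch that flags assignments violating such constraints; the function name and representations are assumptions:

```python
def violates_global_constraints(events, event_types, roles):
    """Illustrative check of global consistency constraints.
    events: set of trigger tokens; event_types: dict token -> type;
    roles: set of (event_token, arg_token, role) triples."""
    violations = []
    for i in events:
        if i not in event_types:
            violations.append(f"event {i} has no event type")
        if not any(ev == i for ev, _, _ in roles):
            violations.append(f"event {i} has no argument")
    for i in event_types:
        if i not in events:
            violations.append(f"typed token {i} is not marked as an event")
    return violations

# An event with neither a type nor an argument violates two constraints:
print(violates_global_constraints({"binding"}, {}, set()))
```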
Coreference formulae
| Symbol | Name | Formula | Description |
|---|---|---|---|
| (SiD) | Salience in Discourse | corefer(j, k) ⇒ role(i, k, +r) | If a token k has an anaphor, k is salient in discourse and therefore more likely to be an argument of some event |
| (T) | Transitivity | role(i, j, r) ∧ corefer(j, k) ⇒ role(i, k, r) | If j plays role r for event i and j corefers with k, then k also plays role r for i (this produces cross-links) |
| (FC) | Feature Copy | corefer(j, k) ∧ word(k, +w) ⇒ role(i, j, +r) | If a token j has an antecedent k, the lexical features of k are also used for j |
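The transitivity idea can be illustrated as a deterministic closure over predicted roles and coreference pairs. In the model it is a (soft) MLN formula, not a closure; the sketch below is only an illustration, with coreference pairs written as (antecedent, anaphor):

```python
def expand_roles_by_coreference(roles, corefer):
    """Sketch of transitivity (T): if token j fills role r for event i,
    and j is coreferent with antecedent k, infer that k fills role r
    for i as well. Illustration only; corefer holds
    (antecedent, anaphor) pairs."""
    expanded = set(roles)
    changed = True
    while changed:
        changed = False
        for (i, j, r) in list(expanded):
            for (antecedent, anaphor) in corefer:
                if anaphor == j and (i, antecedent, r) not in expanded:
                    expanded.add((i, antecedent, r))
                    changed = True
    return expanded

# Figure 1 example: "inducible" has Theme "The region", whose antecedent
# in the previous sentence is "The IRF-2 promoter region".
roles = {("inducible", "The region", "Theme")}
corefer = {("The IRF-2 promoter region", "The region")}
expanded = expand_roles_by_coreference(roles, corefer)
print(expanded)
```

The inferred cross-sentence link ("inducible", "The IRF-2 promoter region", Theme) is exactly the cross-link of Figure 1.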
Figure 3. Experimental setup. An illustration of the experimental setup. Data for learning and evaluation: GENIA Event Corpus (GEC). POS and NE tagger: GENIA Tagger. Dependency parser: Charniak-Johnson reranking parser with a self-trained parsing model. Coreference resolver: pairwise model. Event extractor: SVM-struct (SVM) and Markov TheBeast (MLN).
Results of event extraction (F1)
| System | Coreference | event | eventType | role |
|---|---|---|---|---|
| (a) SVM | NONE | 77.0 | 67.8 | 52.3 (±0.0) |
| (b) SVM | SYS | 77.0 | 67.8 | 53.6 (+1.3) |
| (b′) SVM | GOLD | 77.0 | 67.8 | 55.4 (+3.1) |
| (c) MLN | NONE | 80.5 | 70.6 | 51.7 (±0.0) |
| (g) MLN | SYS | 80.8 | 70.8 | 53.8 (+2.1) |
| (g′) MLN | GOLD | 81.2 | 70.8 | 56.7 (+5.0) |
“Coreference” has three options: without coreference information (NONE), with a coreference resolver (SYS), and with gold coreference annotations (GOLD).
Three types of event-argument relations
| Type | Description | Edge in Figure |
|---|---|---|
| Cross | E-A relations crossing sentence boundaries (cross-link) | Arrow (C) |
| W-ANT | Intra-sentence E-As (intra-links) with antecedents | Arrow (A) |
| Normal | Neither Cross nor W-ANT | Arrow (D) |
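The three link categories can be stated operationally. A minimal sketch (hypothetical helper), assuming we know each token's sentence index and whether the argument has a coreference antecedent:

```python
def link_type(event_sent, arg_sent, arg_has_antecedent):
    """Classify an E-A link per the table:
    Cross:  event and argument in different sentences (cross-link);
    W-ANT:  same sentence, but the argument has an antecedent;
    Normal: neither."""
    if event_sent != arg_sent:
        return "Cross"
    if arg_has_antecedent:
        return "W-ANT"
    return "Normal"

print(link_type(2, 1, False))  # different sentences -> Cross
print(link_type(2, 2, True))   # same sentence, has antecedent -> W-ANT
print(link_type(2, 2, False))  # -> Normal
```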
Results of E-A relation extraction (F1)
| System | Corefer | Cross | W-ANT | Normal |
|---|---|---|---|---|
| (a) SVM | NONE | 0.0 | 56.0 | 53.6 |
| (b) SVM | SYS | | 57.0 | 54.3 |
| (b′) SVM | GOLD | | 57.3 | 55.4 |
| (c) MLN | NONE | 0.0 | 49.8 (±0.0) | 53.2 |
| (d) MLN | FC | 0.0 | 51.5 (+1.7) | 53.7 |
| (e) MLN | SiD | 0.0 | 54.6 (+4.8) | 53.3 |
| (f) MLN | T | 36.7 | 51.7 (+1.9) | 53.7 |
| (g) MLN | SYS | | 56.5 (+6.7) | 54.3 |
| (g′) MLN | GOLD | | 66.7 (+16.9) | 55.3 |
“Corefer” options include: without coreference information (NONE), with a coreference resolver (SYS), with gold coreference annotations (GOLD), with Feature Copy (FC), with Salience in Discourse (SiD), and with Transitivity (T).