Ngan Nguyen, Jin-Dong Kim, Makoto Miwa, Takuya Matsuzaki, Junichi Tsujii.
Abstract
BACKGROUND: Current research has shown that major difficulties in event extraction for the biomedical domain are traceable to coreference. Therefore, coreference resolution is believed to be useful for improving event extraction. To address coreference resolution in molecular biology literature, the Protein Coreference (COREF) task was arranged in the BioNLP Shared Task (BioNLP-ST, hereafter) 2011 as a supporting task. However, the shared task results indicated that transferring coreference resolution methods developed for other domains to the biological domain was not straightforward, owing to domain differences in the coreference phenomena.
Year: 2012 PMID: 23157272 PMCID: PMC3582588 DOI: 10.1186/1471-2105-13-304
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. An excerpt of protein coreference annotated data. Given protein names are highlighted in purple. Pronouns and definite noun phrases (T27, T29, T30, T32) are highlighted in red, and their antecedents are indicated by arrows.
Performance evaluation on the test data set which contains 284 protein coreference links
| UU | 12.0 79.0 20.8 | 5.5 66.7 | 56.0 71.2 62.7 | 22.2 73.3 |
| UZ | 17.6 62.9 | 4.1 12.5 6.2 | 46.7 71.4 56.5 | 21.5 55.5 31.0 |
| CU | - - - | - - - | 64.6 68.0 | 19.4 63.2 29.7 |
| UT | 12.8 72.7 21.8 | 1.4 14.3 2.5 | 29.3 73.3 41.9 | 14.4 67.2 23.8 |
| RB-MIN | 28.0 40.2 33.0 | 9.6 31.8 14.7 | 76.0 62.0 | 35.6 49.0 41.2 |
| RB-MIN+1, 3 | 43.2 53.5 47.8 | 19.2 38.9 25.7 | 76.0 60.6 67.5 | 44.7 54.3 49.0 |
| RB-MIN+1, 2, 3 | 48.0 50.4 | 41.1 37.0 | 76.0 60.6 67.5 | 52.5 50.2 |
The first four rows show the top-performing systems in BioNLP-ST 2011; rows 5, 6, and 7 show our system's results with different combinations of rules.
Figure 2. Protein coreference resolution workflow.
Illustration example for the system workflow of protein coreference resolution
| Example text (PMID-7964516) | T cell hybridomas respond to activation signals by undergoing apoptotic cell death, and this is likely to represent comparable events related to tolerance induction in immature and mature T cells in vivo. Previous studies using antisense oligonucleotides implicated the c-Myc protein in the phenomenon of activation-induced apoptosis. This role for c-Myc in apoptosis is now confirmed in studies using a dominant negative form of its heterodimeric binding partner, Max, which we show here inhibits activation-induced apoptosis. |
| Preprocessing results: sentences and chunks (partially) (Step 0) | S1: T cell hybridomas respond to activation signals by undergoing apoptotic cell death, and this is likely to represent comparable events related to tolerance induction in immature and mature T cells in vivo. S2: Previous studies using antisense oligonucleotides implicated the c-Myc protein in the phenomenon of activation-induced apoptosis. S3: [This [[role] for [c-Myc] in [apoptosis]]] is now confirmed in [[studies] using a dominant negative form of its heterodimeric binding partner, Max, which we show here inhibits activation-induced apoptosis]. |
| Markables (Step 1) | S1: [T cell hybridomas] respond to [activation signals] by undergoing [apoptotic cell death], and this is likely to represent [comparable events related to tolerance induction] in [immature and mature T cells [in vivo]]. S2: [Previous studies using [antisense oligonucleotides]] implicated [the c-Myc protein] in [the phenomenon of [activation-induced apoptosis]]. S3: [This role for [c-Myc] in [apoptosis]] is now confirmed in [studies] using a [dominant negative form of [[its] heterodimeric binding partner,[Max]]], [which] [we] show here inhibits [activation-induced apoptosis]. |
| Anaphors (Step 2) | S1: T cell hybridomas respond to activation signals by undergoing apoptotic cell death, and [this] is likely to represent comparable events related to tolerance induction in immature and mature T cells in vivo. S2: Previous studies using antisense oligonucleotides implicated the c-Myc protein in the phenomenon of activation-induced apoptosis. S3: This role for c-Myc in apoptosis is now confirmed in studies using a dominant negative form of [its] heterodimeric binding partner, Max, [which] we show here inhibits activation-induced apoptosis. |
| Antecedent candidates of | S1: [T cell hybridomas] respond to [activation signals] by undergoing [apoptotic cell death], and this is likely to represent [comparable events related to tolerance induction] in [immature and mature T cells [in vivo]]. S2: [Previous studies using [antisense oligonucleotides]] implicated [the c-Myc protein] in [the phenomenon of [activation-induced apoptosis]]. S3: [This role for [c-Myc] in [apoptosis]] is now confirmed in [studies] using [a dominant negative form of |
| Predicted antecedent of | S1: T cell hybridomas respond to activation signals by undergoing apoptotic cell death, and this is likely to represent comparable events related to tolerance induction in immature and mature T cells in vivo. S2: Previous studies using antisense oligonucleotides implicated the c-Myc protein in the phenomenon of activation-induced apoptosis. S3: This role for [ |
The resulting items of each step are shown in square brackets; the full syntactic parse tree of sentence S3 is shown in a separate figure.
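The anaphor selection step (Step 2) in the table above can be illustrated with a minimal sketch. It is not the authors' implementation: the pronoun list `PRONOUN_ANAPHORS` and the function `select_pronoun_anaphors` are hypothetical, and the sketch assumes that selection is a simple lexical lookup over the Step 1 markables, with first-person pronouns such as "we" left out because they cannot refer to a protein.

```python
# Hypothetical closed-class list of pronouns treated as anaphor candidates.
# The source does not enumerate the pronouns its system uses.
PRONOUN_ANAPHORS = {
    "it", "its", "itself", "they", "their", "them", "themselves",
    "this", "that", "these", "those", "which", "who", "whose",
}


def select_pronoun_anaphors(markables):
    """Return the markables whose whole text is a listed pronoun."""
    return [m for m in markables if m.lower() in PRONOUN_ANAPHORS]


# Markables of sentence S3, copied from the Step 1 row of the table.
s3_markables = [
    "This role for c-Myc in apoptosis",
    "c-Myc",
    "apoptosis",
    "studies",
    "dominant negative form of its heterodimeric binding partner, Max",
    "its",
    "Max",
    "which",
    "we",
    "activation-induced apoptosis",
]

print(select_pronoun_anaphors(s3_markables))  # ['its', 'which']
```

For S3 this lookup returns the same two anaphors that the Step 2 row marks, although a plain lookup like this would not cover definite noun phrase anaphors.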
Figure 3. Illustration of Enju parse output. Enju parse output for the sentence “This role for c-Myc in apoptosis is now confirmed in studies using a dominant negative form of its heterodimeric binding partner, Max, which we show here inhibits activation-induced apoptosis.” The red boxes show the syntactic relations between “of” and its two arguments, arg1 and arg2.
Figure 4. Illustration of the decision list used in antecedent prediction.
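A decision list of this kind can be sketched as an ordered sequence of pairwise preference rules, where the first rule that separates two candidates decides between them. The sketch below is a generic illustration under stated assumptions, not the system's actual rules: the `Mention` fields, the three rule functions, the token offsets, and the reading of a number-agreement and a semantic rule are all hypothetical.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Mention:
    text: str
    offset: int       # illustrative token offset in the document
    plural: bool      # hypothetical number feature
    is_protein: bool  # hypothetical semantic feature


def num_agree(anaphor, a, b):
    """Positive if only `a` agrees in number with the anaphor, negative if only `b`."""
    return int(a.plural == anaphor.plural) - int(b.plural == anaphor.plural)


def sem_cons(anaphor, a, b):
    """Prefer a candidate that is a protein mention."""
    return int(a.is_protein) - int(b.is_protein)


def nearer(anaphor, a, b):
    """Prefer the candidate closer to the anaphor."""
    return abs(anaphor.offset - b.offset) - abs(anaphor.offset - a.offset)


def prefer(anaphor, a, b, rules):
    """Apply rules in order; the first rule with a non-zero result decides."""
    for rule in rules:
        score = rule(anaphor, a, b)
        if score != 0:
            return a if score > 0 else b
    return a  # no rule separates them: keep the current best


def predict_antecedent(anaphor, candidates, rules):
    """Fold the pairwise preference over all candidates."""
    return reduce(lambda best, c: prefer(anaphor, best, c, rules), candidates)


anaphor = Mention("its", offset=20, plural=False, is_protein=False)
candidates = [
    Mention("studies", offset=15, plural=True, is_protein=False),
    Mention("c-Myc", offset=3, plural=False, is_protein=True),
    Mention("apoptosis", offset=5, plural=False, is_protein=False),
]

print(predict_antecedent(anaphor, candidates, [num_agree, sem_cons, nearer]).text)  # c-Myc
print(predict_antecedent(anaphor, candidates, [nearer]).text)                       # studies
```

With only the distance rule the nearest candidate wins; adding the higher-priority rules changes the choice, which is the behaviour an ordered decision list is meant to provide.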
Anaphor types in descending order of percentage measured on the training data set
| Anaphor type | Percentage (%) | Count |
| Possessive pronoun | 25.3 | 222 |
| Relative pronoun | 18.6 | 163 |
| Demonstrative noun phrase | 15.2 | 133 |
| Demonstrative pronoun | 13.6 | 119 |
| The- definite noun phrase | 10.9 | 96 |
| Personal pronoun | 9.6 | 84 |
| Other definite noun phrase | 2.2 | 19 |
| Indefinite noun phrase | 1.6 | 14 |
| Proper name | 1.5 | 13 |
| Other pronoun | 0.8 | 7 |
| Reflexive pronoun | 0.8 | 7 |
| Total | | 877 |
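The percentages in this table follow directly from the counts, which can be checked with a short sketch. The dictionary below copies the counts from the table; the function name `percentages` is an illustrative choice, not something from the source.

```python
# Counts per anaphor type, copied from the table above.
counts = {
    "Possessive pronoun": 222,
    "Relative pronoun": 163,
    "Demonstrative noun phrase": 133,
    "Demonstrative pronoun": 119,
    "The- definite noun phrase": 96,
    "Personal pronoun": 84,
    "Other definite noun phrase": 19,
    "Indefinite noun phrase": 14,
    "Proper name": 13,
    "Other pronoun": 7,
    "Reflexive pronoun": 7,
}


def percentages(counts):
    """Return each count as a percentage of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


print(sum(counts.values()))                      # 877
print(percentages(counts)["Possessive pronoun"])  # 25.3
```

Pronouns of all kinds account for most of the 877 anaphors, while proper names and indefinite noun phrases are rare.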
Contribution of different antecedent prediction rules in coreference system
| RB-MIN | 37.7 | 61.6 | 46.8 |
| RB-MIN + 1 (NUM-AGREE) | 39.7 | 64.8 | 49.2 |
| RB-MIN + 2 (SEM-CONS) | 47.1 | 58.2 | 52.0 |
| RB-MIN + 3 (DISC-PREF) | 46.6 | 65.1 | 54.3 |
| RB-MIN + 1, 2, 3 (RB-FULL) | 57.8 | 67.8 | 62.4 |
| RB-MIN + 1, 3 | 50.5 | 69.1 | 58.4 |
Performance was measured on the development data set which contains 204 protein coreference links.
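The third value in each row is consistent with the balanced F-score, the harmonic mean of the first two values. A minimal sketch follows, assuming the first two columns are recall and precision; the formula is symmetric, so their order does not affect the result. The function name `f_score` is illustrative.

```python
def f_score(recall, precision):
    """Balanced F-score: harmonic mean of recall and precision."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# RB-MIN row: 37.7 and 61.6
print(f"{f_score(37.7, 61.6):.1f}")  # 46.8

# RB-FULL row: 57.8 and 67.8
print(f"{f_score(57.8, 67.8):.1f}")  # 62.4
```

Because the tabulated inputs are already rounded, a recomputed score can differ from the reported one in the last digit: the RB-MIN + 2 row (47.1, 58.2) recomputes to 52.1, while the table reports 52.0.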
Influence of semantic information used in anaphor selection step on coreference resolution system
| RB-FULL | 57.8 | 67.8 | 62.4 |
| RB-FULL w/o PRO-ANA-SEM | 55.4 | 66.9 | 60.6 |
| RB-FULL w/o DEFNP-ANA-SEM | 55.9 | 14.5 | 23.0 |
| RB-FULL w/o DEFNP-ANA-SEM + PRO-ANA-SEM | 53.4 | 13.9 | 22.1 |
Performance was measured on the development data set which contains 204 protein coreference links.
Evaluation of RB-FULL system with different coreference evaluation scores
| BioNLP-ST | 57.8 | 67.8 | 62.4 |
| MUC | 28.9 | 32.4 | 30.5 |
| B³ | 72.9 | 77.2 | 75.0 |
| BLANC | 61.3 | 63.4 | 62.2 |
| CEAF-M | 68.2 | 68.2 | 68.2 |
| CEAF-E | 66.6 | 62.5 | 64.5 |
Performance was measured on the development data set which contains 204 protein coreference links.
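The scores differ so widely because each metric counts agreement differently. As one example, B³ scores every mention by the overlap between its predicted cluster and its gold cluster. The sketch below implements the standard B³ definition under a simplifying assumption that both clusterings cover the same mentions (mentions found in only one clustering are ignored); it is not the scorer used to produce the table, and the toy clusters are invented for illustration.

```python
def b_cubed(key, response):
    """Return (precision, recall, F) of `response` clusters against `key` clusters.

    Both arguments are lists of sets of mention identifiers.
    """
    key_of = {m: cluster for cluster in key for m in cluster}
    resp_of = {m: cluster for cluster in response for m in cluster}
    mentions = key_of.keys() & resp_of.keys()  # assumption: score shared mentions only
    if not mentions:
        return 0.0, 0.0, 0.0
    # Per-mention precision and recall, averaged over all mentions.
    precision = sum(
        len(key_of[m] & resp_of[m]) / len(resp_of[m]) for m in mentions
    ) / len(mentions)
    recall = sum(
        len(key_of[m] & resp_of[m]) / len(key_of[m]) for m in mentions
    ) / len(mentions)
    total = precision + recall
    f = 2 * precision * recall / total if total else 0.0
    return precision, recall, f


# Toy clusters (invented): gold groups a, b, c together; the system splits off c.
key = [{"a", "b", "c"}, {"d"}]
response = [{"a", "b"}, {"c", "d"}]
p, r, f = b_cubed(key, response)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.750 R=0.667 F=0.706
```

Since B³ gives credit for every correctly grouped mention, including singletons, it tends to produce higher numbers than link-based metrics such as MUC on the same output.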