Sampo Pyysalo, Tomoko Ohta, Makoto Miwa, Han-Cheol Cho, Jun'ichi Tsujii, Sophia Ananiadou.
Abstract
MOTIVATION: Event extraction using expressive structured representations has been a significant focus of recent efforts in biomedical information extraction. However, event extraction resources and methods have so far focused almost exclusively on molecular-level entities and processes, limiting their applicability.
Year: 2012 PMID: 22962484 PMCID: PMC3436834 DOI: 10.1093/bioinformatics/bts407
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Example sentence with event annotation. Prot, -Reg and Cell comp. abbreviate Protein, Negative regulation and Cell component, respectively
Fig. 2. Span versus structure. A representation using nested, typed spans (left) can capture the fact that specific entities participate in a process, but it lacks mechanisms to express, e.g., the direction of causality. The structured event representation (right) differentiates Themes from Causes
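The contrast drawn in Fig. 2 can be sketched in code. The following is a minimal illustration, not the corpus format: the `Span` and `Event` classes, the sentence and the type labels are all hypothetical, chosen only to show that nesting records participation while role labels record direction.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Span:
    """A typed text span over character offsets [start, end)."""
    type: str
    start: int
    end: int

    def contains(self, other: "Span") -> bool:
        # True when `other` lies inside this span and is not the same span.
        return self.start <= other.start and other.end <= self.end and self != other


@dataclass
class Event:
    """A typed event: a trigger span plus role-labelled arguments."""
    type: str
    trigger: Span
    args: List[Tuple[str, Union[Span, "Event"]]] = field(default_factory=list)

    def role(self, name: str):
        """Return all arguments filling the given role."""
        return [arg for role, arg in self.args if role == name]


text = "GeneX inhibits angiogenesis"  # hypothetical sentence, not from the corpus

gene = Span("Gene", 0, 5)
inhibits = Span("Negative regulation", 6, 14)
angio = Span("Blood vessel development", 15, 27)

# Span-only view: an outer span nests both participants but assigns no roles,
# so it cannot say which participant acts on which.
outer = Span("Negative regulation", 0, 27)
participants = [s for s in (gene, angio) if outer.contains(s)]

# Structured view: the same participants, with roles making the direction explicit.
development = Event("Blood vessel development", trigger=angio)
regulation = Event("Negative regulation", trigger=inhibits,
                   args=[("Cause", gene), ("Theme", development)])
```

In the span-only view, swapping the two participants leaves `participants` with the same members; in the structured view, swapping the Cause and Theme arguments yields a different event.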
Table 1. Primary entity types, related ontology terms and annotation counts
| Type | Term(s) | Examples | Count |
|---|---|---|---|
| Organism | | | |
| O | Single cell org. | | 722 |
| Anatomy | | | |
| O | Organism subdivision | | 49 |
| A | Anatomical system | | 18 |
| O | Compound organ | | 176 |
| M | Multi-tissue structure | | 514 |
| T | Portion of tissue | | 426 |
| C | Cell | | 1198 |
| C | Cellular component | | 145 |
| D | Developing anatomical structure | | 6 |
| O | Portion of organism substance | | 142 |
| I | Immaterial anatomical entity | | 15 |
| P | Cancer | | 910 |
| Molecule | | | |
| D | Inorganic molecular entity | | 944 |
| G | Gene | | 2962 |
Labels in gray identify informal categories used in evaluation.
Annotated also in previously introduced event extraction resources.
t_o identifies a term t in an ontology o; ontology identifiers are OBO Foundry prefixes (namespaces).
Fig. 3. Annotation with detailed GO terms (top; hypothetical) and event annotation with general types (bottom; applied)
Table 2. Primary event types, argument roles, related ontology terms and annotation counts
| Type | Arguments | Term(s) | Examples | Count |
|---|---|---|---|---|
| Anatomical | | | | |
| C | | Cell proliferation | | 133 |
| D | | Developmental process | | 316 |
| B | | Blood vessel development | | 855 |
| G | | Growth | | 169 |
| D | | Death | | 97 |
| B | | - | | 69 |
| R | | Tissue remodeling | | 33 |
| Molecular | | | | |
| S | | Biosynthetic process | | 17 |
| G | | Gene expression | | 435 |
| T | | Transcription, DNA-dependent | | 37 |
| C | | Catabolic process | | 26 |
| P | | Phosphorylation | | 33 |
| D | | Dephosphorylation | | 6 |
| General | | | | |
| L | | Localization | | 450 |
| B | | Binding | | 184 |
| R | | Biological regulation | | 773 |
| P | | Pos. regulation of biol. proc. | | 1327 |
| N | | Neg. regulation of biol. proc. | | 921 |
| Planned | | | | |
| P | | Planned process | | 643 |
Labels in gray identify categories used in evaluation: events of the Anatomical category involve Organism or Anatomy entities (Table 1); Molecular involve Molecule entities; others can involve any entity type.
Annotated also in previously introduced event extraction resources.
Overall corpus statistics
| Item | Train | Devel | Test | Total |
|---|---|---|---|---|
| Document | 131 | 44 | 87 | 262 |
| Sentence | 1271 | 457 | 880 | 2608 |
| Word | 27 875 | 9610 | 19 103 | 56 588 |
| Entity | 4147 | 1431 | 2713 | 8291 |
| Organism | 359 | 126 | 237 | 722 |
| Anatomy | 1844 | 589 | 1166 | 3599 |
| Molecule | 1944 | 716 | 1310 | 3970 |
| Event | 3296 | 1175 | 2206 | 6677 |
| Anatomical | 810 | 269 | 596 | 1675 |
| Molecular | 340 | 125 | 240 | 705 |
| General | 1851 | 627 | 1176 | 3654 |
| Planned | 295 | 154 | 194 | 643 |
See Tables 1 and 2 for entity and event categories.
Fig. 4. Example Negative regulation (-Reg) event connecting entities at different levels of biological organization
Comparison of corpus statistics with BioNLP Shared Task 2011 corpora annotated using the same representation
| Item | MLEE | EPI | GE | ID |
|---|---|---|---|---|
| Document | 262 | 1200 | 1224 | 30 |
| Word | 56 588 | 253 628 | 348 908 | 153 153 |
| Entity | 8291 | 15 190 | 21 616 | 12 740 |
| Event | 6677 | 3714 | 24 967 | 4150 |
The ID document count is low as the corpus consists of full-text documents, not abstracts.
Overall entity mention detection results by matching criterion (prec/rec/F score)
| Model | Exact | Left boundary | Right boundary |
|---|---|---|---|
| Base | 77.03 / 69.18 / 72.89 | 79.85 / 71.72 / 75.57 | 82.47 / 74.07 / 78.04 |
| Dictionary | 79.49 / 73.77 / 76.52 | 82.59 / 76.64 / 79.50 | 84.68 / 78.58 / 81.52 |
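The three matching criteria can be illustrated with a short scoring sketch. The source does not define the criteria in detail, so this assumes a common reading: an exact match requires both boundaries and the type to agree, while a left (or right) boundary match requires only the start (or end) offset and the type to agree. The function names, mentions and type labels are hypothetical.

```python
from typing import Callable, List, Tuple

Mention = Tuple[int, int, str]  # (start offset, end offset, entity type)


def exact(gold: Mention, pred: Mention) -> bool:
    return gold == pred


def left_boundary(gold: Mention, pred: Mention) -> bool:
    return gold[0] == pred[0] and gold[2] == pred[2]


def right_boundary(gold: Mention, pred: Mention) -> bool:
    return gold[1] == pred[1] and gold[2] == pred[2]


def prf(gold: List[Mention], pred: List[Mention],
        match: Callable[[Mention, Mention], bool]) -> Tuple[float, float, float]:
    """Precision, recall and F score under the given matching criterion."""
    matched_pred = sum(1 for p in pred if any(match(g, p) for g in gold))
    matched_gold = sum(1 for g in gold if any(match(g, p) for p in pred))
    prec = matched_pred / len(pred) if pred else 0.0
    rec = matched_gold / len(gold) if gold else 0.0
    f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f


# Hypothetical gold and predicted mentions.
gold = [(0, 5, "Cell"), (10, 22, "Tissue"), (30, 36, "Gene")]
pred = [(0, 5, "Cell"), (10, 18, "Tissue"), (28, 36, "Gene"), (40, 44, "Gene")]

scores = {name: prf(gold, pred, fn)
          for name, fn in [("exact", exact),
                           ("left", left_boundary),
                           ("right", right_boundary)]}
```

On this toy data only one prediction matches exactly, while each relaxed criterion accepts one additional prediction with a single boundary error, which is the pattern of higher scores under relaxed matching seen in the table.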
Entity mention detection results by category and matching criterion for dictionary model (prec/rec/F score)
| Category | Exact | Left boundary | Right boundary |
|---|---|---|---|
| Organism | 90.82 / 82.10 / 86.24 | 91.79 / 82.97 / 87.16 | 91.79 / 82.97 / 87.16 |
| Anatomy | 77.47 / 72.70 / 75.01 | 78.67 / 73.83 / 76.17 | 84.58 / 79.38 / 81.90 |
| Molecule | 79.37 / 73.25 / 76.18 | 84.54 / 78.03 / 81.15 | 83.54 / 77.10 / 80.19 |
Overall event extraction results
| Model | Prec | Rec | F score |
|---|---|---|---|
| Base | 56.53 | 48.72 | 52.34 |
| Stacking (GE) | 56.38 | 50.77 | 53.43 |
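The final value in each row is consistent with the balanced F score, i.e. the harmonic mean of precision and recall. As a check under that assumption, with P and R taken from the table:

```latex
F = \frac{2PR}{P + R}, \qquad
F_{\text{Base}} = \frac{2 \cdot 56.53 \cdot 48.72}{56.53 + 48.72} \approx 52.34, \qquad
F_{\text{Stacking}} = \frac{2 \cdot 56.38 \cdot 50.77}{56.38 + 50.77} \approx 53.43
```

Stacking leaves precision almost unchanged (56.53 to 56.38) and raises recall by about two points (48.72 to 50.77), which accounts for the gain of roughly one point in F score.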
Event extraction results by category for stacked model
| Category | Prec | Rec | F score |
|---|---|---|---|
| Anatomical | 80.91 | 72.05 | 76.22 |
| Molecular | 68.44 | 75.63 | 71.86 |
| General | 43.87 | 38.99 | 41.29 |
| Planned | 56.68 | 51.96 | 54.22 |
| Modification | 47.95 | 29.92 | 36.85 |
Event categories as defined in Table 2; Modification gives performance for Negation and Speculation detection.