Jin-Woo Chung1, Wonsuk Yang1, Jong C Park2.
Abstract
BACKGROUND: Event extraction from the biomedical literature is one of the most actively researched areas in biomedical text mining and natural language processing. However, most approaches have focused on events within single sentence boundaries and have paid much less attention to events spanning multiple sentences. The Bacteria-Biotope event (BB-event) subtask of the BioNLP Shared Task 2016 is one such example: a significant number of relations between bacteria and biotopes span more than one sentence, but existing systems have treated them as false negatives because the labeled data is not large enough to model a complex reasoning process using supervised learning frameworks.
Keywords: Bacteria; Biomedical event extraction; Biotope; Cross-sentence relations; Natural language processing; Text mining; Unsupervised inference
Year: 2020 PMID: 31992184 PMCID: PMC6988352 DOI: 10.1186/s12859-020-3341-0
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Comparison of context triggers compiled from the BB-event training data and the large-scale unlabeled data
| No. | Context trigger (training data) | Frequency (training data) | Context trigger (unlabeled data) | Frequency (unlabeled data) |
|---|---|---|---|---|
| 1 | isolate | 8 | strain | 21256 |
| 2 | infection | 5 | infection | 12856 |
| 3 | strain | 5 | isolate | 9555 |
| 4 | attachment | 3 | prevalence | 3300 |
| 5 | adhesion | 3 | growth | 2868 |
| 6 | bacteremia | 2 | detection | 2621 |
| 7 | enrichment | 2 | resistance | 2479 |
| 8 | growth | 2 | bacteria | 1812 |
| 9 | carriage | 2 | pathogen | 1711 |
| 10 | resistance | 2 | culture | 1412 |
| 11 | detection | 1 | response | 1358 |
| 12 | bacteriophage | 1 | survival | 1275 |
| 13 | surveillance | 1 | abundance | 1196 |
| 14 | isolation | 1 | susceptibility | 1161 |
| 15 | elimination | 1 | colonization | 1015 |
| 16 | | | isolation | 935 |
| 17 | | | transmission | 853 |
| 18 | | | exposure | 846 |
| 19 | | | disease | 818 |
| 20 | | | adhesion | 791 |
Triggers are sorted by their occurrence frequency in each data set
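The overlap between the two trigger lists can be read off the table with a few set operations. The sketch below copies the trigger words from the table; the variable names are illustrative and not part of the original work.

```python
# Trigger words copied from the table above, in table order.
train_triggers = [
    "isolate", "infection", "strain", "attachment", "adhesion",
    "bacteremia", "enrichment", "growth", "carriage", "resistance",
    "detection", "bacteriophage", "surveillance", "isolation", "elimination",
]
unlabeled_triggers = [
    "strain", "infection", "isolate", "prevalence", "growth",
    "detection", "resistance", "bacteria", "pathogen", "culture",
    "response", "survival", "abundance", "susceptibility", "colonization",
    "isolation", "transmission", "exposure", "disease", "adhesion",
]

unlabeled_set = set(unlabeled_triggers)
train_set = set(train_triggers)

# Triggers found in both lists, and triggers unique to each list.
shared = [t for t in train_triggers if t in unlabeled_set]
train_only = [t for t in train_triggers if t not in unlabeled_set]
unlabeled_only = [t for t in unlabeled_triggers if t not in train_set]

print("shared:", shared)
print("training only:", train_only)
print("unlabeled only:", unlabeled_only)
```

Only the listed rows are compared here, so a trigger marked as "training only" may still occur further down the unlabeled ranking, which the table does not show.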
Comparison of intra-sentence and cross-sentence extraction on the development data
| Setting | Scope | Gold | Predicted | Correct | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|
| Gold intra-sentence events only | Intra | 165 | 165 | 165 | 100.0 | 100.0 | 100.0 |
| | Cross | 58 | 0 | 0 | 0.0 | 0.0 | 0.0 |
| | All | 223 | 165 | 165 | 100.0 | 74.0 | 85.1 |
| Gold intra-sentence events + our cross-sentence extraction | Intra | 165 | 165 | 165 | 100.0 | 100.0 | 100.0 |
| | Cross | 58 | 40 | 17 | 42.5 | 29.3 | 34.7 |
| | All | 223 | 205 | 182 | 88.8 | 81.6 | 85.0 |
| Our intra-sentence extraction only | Intra | 165 | 325 | 153 | 47.1 | 92.7 | 62.4 |
| | Cross | 58 | 0 | 0 | 0.0 | 0.0 | 0.0 |
| | All | 223 | 325 | 153 | 47.1 | 68.6 | 55.8 |
| Our intra-sentence & cross-sentence extraction | Intra | 165 | 325 | 153 | 47.1 | 92.7 | 62.4 |
| | Cross | 58 | 47 | 20 | 42.6 | 34.5 | 38.1 |
| | All | 223 | 372 | 173 | 46.5 | 77.6 | 58.2 |
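The precision, recall, and F1 columns follow from the Gold, Predicted, and Correct counts using the standard definitions. The sketch below recomputes a few rows; the function name `prf` and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration.

```python
def prf(gold, predicted, correct):
    """Return (precision, recall, F1) as percentages rounded to one decimal."""
    precision = correct / predicted if predicted else 0.0
    recall = correct / gold if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return tuple(round(100 * value, 1) for value in (precision, recall, f1))


# Counts taken from the table above: (gold, predicted, correct).
rows = {
    "gold intra only, all": (223, 165, 165),
    "gold intra + our cross, all": (223, 205, 182),
    "our intra & cross, cross": (58, 47, 20),
    "our intra & cross, all": (223, 372, 173),
}

for name, counts in rows.items():
    print(name, prf(*counts))
```

The recomputed values match the table. Note that adding cross-sentence predictions to gold intra-sentence events raises recall (74.0 to 81.6) while F1 stays almost flat (85.1 to 85.0), because the lower-precision cross-sentence predictions pull overall precision down to 88.8.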
Comparison of event extraction performance
| Group | Model | #intra | #cross | F1 | Recall | Precision |
|---|---|---|---|---|---|---|
| BB-event task participants (2016) | LIMSI | | | 48.5 | 64.6 | 38.8 |
| | TurkuNLP | | | 52.1 | 44.8 | |
| | VERSE | | | 55.8 | 61.5 | 51.0 |
| State-of-the-art systems | Li et al. [ | | | 58.1 | 58.0 | 56.3 |
| | Li et al. [ | | | 57.4 | 56.8 | 59.4 |
| | Gupta et al. [ | | | 58.7 | 65.7 | 53.0 |
| Proposed models | M1: Intra-clause syntactic patterns | 213 | 0 | 37.7 | 30.7 | 48.8 |
| | M2: Intra-clause syntactic patterns + trigger-based inference (train) | 246 | 0 | 40.4 | 34.8 | 48.1 |
| | M3: Intra-clause syntactic patterns + trigger-based inference (unlabeled) | 417 | 64 | 56.7 | | 48.2 |
| | M4: VERSE (2016) + trigger-based inference (unlabeled) | 339 | 54 | | 63.8 | 54.9 |
| | M5: VERSE (2016) + trigger-based inference (unlabeled) (without linguistic modality detection) | 339 | 63 | 58.3 | 63.8 | 53.6 |
The numbers in boldface indicate the highest scores for each metric
Syntactic patterns for propagating event labels to other candidate bacteria-location (or trigger-location) pairs, with examples where B, B1 and B2 are bacteria annotations, and L1 and L2 are location annotations
| No. | Propagation pattern | Description and example |
|---|---|---|
| 1 | Nesting | Given the phrase “ |
| 2 | Coordination | Given the two coordinated location mentions “ |
| 3 | Apposition | In the example “[ |
| 4 | Location hierarchy | Two geographical location mentions are sometimes connected via a comma when they have a clear hierarchical relationship, such as “ |
| 5 | Participle-preposition | In the example “[ |
Note that these patterns are applied to entity pairs of the same type, i.e., bacteria-bacteria or location-location pairs. B, B1, and B2 in boldface refer to bacteria entities. L, L1, L2, L3, and L4 in boldface refer to location entities
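The propagation idea can be sketched as follows: once a bacteria-location event is known, and a pattern links one of its arguments to another entity of the same type, the event label is copied to the new pair. This is a minimal illustration only. The function name, the data layout, and the choice to treat every link as symmetric and transitive are assumptions, not details taken from the original system.

```python
def propagate_events(events, same_type_links):
    """Copy event labels across same-type entity links until nothing changes.

    events: iterable of (bacteria_id, location_id) pairs already labeled.
    same_type_links: iterable of (entity_a, entity_b) pairs matched by a
        propagation pattern (bacteria-bacteria or location-location).
    """
    linked = {}
    for a, b in same_type_links:
        # Assumption: links are treated as symmetric.
        linked.setdefault(a, set()).add(b)
        linked.setdefault(b, set()).add(a)

    result = set(events)
    frontier = list(result)
    while frontier:
        bacteria, location = frontier.pop()
        candidates = [(other, location) for other in linked.get(bacteria, ())]
        candidates += [(bacteria, other) for other in linked.get(location, ())]
        for pair in candidates:
            if pair not in result:
                result.add(pair)
                frontier.append(pair)
    return result


# Hypothetical identifiers: one known event, one coordinated location pair.
known = {("B1", "L1")}
links = {("L1", "L2")}
print(sorted(propagate_events(known, links)))
```

In this toy input the event (B1, L1) is propagated to (B1, L2) because L1 and L2 are linked, for example by coordination.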
Fig. 1 Screenshot of the PubMed original abstract text for the BB-event document (PMID: 10738994). The BB-event datasets do not contain headers such as OBJECTIVES, DESIGN, and METHODS
Syntactic patterns for collecting context triggers
| No. | Trigger pattern | Examples |
|---|---|---|
| 1 | { | ∙ |
| | | ∙ |
| 2 | [ | ∙ |
| | | ∙ [ |
| 3 | [ | ∙ [ |
| | | ∙ |
Underlined expressions are context triggers to be collected. [bacteria] and [location] are bacteria and location mentions annotated in BB-event training data, respectively. prep and loc_prep are a general preposition and a locational preposition, respectively. {n/v/p} is either a noun, a verb, or a participle. Examples on the right show actual snippets from the BB-event training data matched by syntactic patterns on the left. B and L in boldface refer to bacteria and location entities, respectively
Fig. 2 Overview of the proposed model
Fig. 3 Using a trigger pattern to collect a context trigger from training data
Syntactic patterns for extracting additional intra-clause events
| No. | Intra-clause syntactic pattern | Examples |
|---|---|---|
| 1 | {[bacteria] [location]} NP or {[location] [bacteria]} NP i.e., bacteria and location mentions are nested in a longer noun phrase | ∙ |
| | | ∙ |
| 2 | [bacteria [location]] or [location [bacteria]] i.e., a bacteria mention is nested within a location mention or vice versa. | ∙ [[ |
| | | ∙ |
| 3 | [bacteria] prep [location] or [location] prep [bacteria] | ∙ |
B and L in boldface refer to bacteria and location entities, respectively
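Pattern 3, [bacteria] prep [location] or [location] prep [bacteria], is simple enough to sketch over a tagged token sequence. The code below is an illustration under stated assumptions: each entity mention is collapsed to a single token, the tag names (`BACTERIA`, `LOCATION`, `PREP`, `O`) are invented for this example, and the sample phrase is made up rather than taken from the BB-event data.

```python
def match_prep_pattern(tagged_tokens):
    """Return (bacteria, location) pairs joined directly by a preposition.

    tagged_tokens: list of (text, tag) tuples, where tag is one of
    "BACTERIA", "LOCATION", "PREP", or "O" (hypothetical tag set).
    """
    events = []
    # Slide over consecutive token triples: left entity, preposition, right entity.
    triples = zip(tagged_tokens, tagged_tokens[1:], tagged_tokens[2:])
    for (left, left_tag), (_, mid_tag), (right, right_tag) in triples:
        if mid_tag != "PREP":
            continue
        if (left_tag, right_tag) == ("BACTERIA", "LOCATION"):
            events.append((left, right))
        elif (left_tag, right_tag) == ("LOCATION", "BACTERIA"):
            events.append((right, left))
    return events


# Made-up example phrase, not a snippet from the training data.
sentence = [
    ("Detection", "O"),
    ("of", "PREP"),
    ("Salmonella", "BACTERIA"),
    ("in", "PREP"),
    ("poultry", "LOCATION"),
]
print(match_prep_pattern(sentence))
```

A real implementation would match over parsed phrases rather than adjacent tokens, so this sketch covers only the most direct surface form of the pattern.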
Fig. 4 Example of extracting long-range intra-sentence events
Fig. 5 Example of propagating event labels to other pairs of bacteria and locations
Fig. 6 Example of the mapping between context triggers and bacteria mentions, and of linking each trigger to location mentions. Blue-shaded words such as cultures and prevalence are context triggers. Dashed curved arrows are the mappings between context triggers and bacteria mentions within the context window of each bacteria mention. Solid curved arrows connecting context triggers to location mentions are intra-sentence relations between them. Solid vertical arrows on the right indicate the sliding context window (i.e., sentence range) of each of the three bacteria mentions (B1, B2, and B3)
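The mapping described in the Fig. 6 caption can be sketched in two steps: map each bacteria mention to the context triggers inside its sentence window, then follow each trigger's intra-sentence links to location mentions. The sketch below is hypothetical in its details. The function name, the symmetric window, the default window size, and the sentence indices in the example are all assumptions made for illustration.

```python
def cross_sentence_events(bacteria, triggers, trigger_locations, window=1):
    """Infer (bacteria, location) pairs through shared context triggers.

    bacteria: list of (bacteria_id, sentence_index).
    triggers: list of (trigger_word, sentence_index).
    trigger_locations: dict mapping a trigger word to the location mentions
        it is linked to within its own sentence.
    window: number of sentences on each side of the bacteria mention that
        counts as its context (assumed symmetric here).
    """
    events = set()
    for bacteria_id, bacteria_sentence in bacteria:
        for trigger, trigger_sentence in triggers:
            # Step 1: keep triggers inside the bacteria mention's window.
            if abs(trigger_sentence - bacteria_sentence) > window:
                continue
            # Step 2: follow the trigger's intra-sentence links to locations.
            for location in trigger_locations.get(trigger, ()):
                events.add((bacteria_id, location))
    return events


# Toy document: sentence indices and links are invented for illustration.
bacteria = [("B1", 0), ("B2", 2)]
triggers = [("cultures", 1), ("prevalence", 4)]
trigger_locations = {"cultures": ["L1"], "prevalence": ["L2"]}

print(sorted(cross_sentence_events(bacteria, triggers, trigger_locations, window=1)))
print(sorted(cross_sentence_events(bacteria, triggers, trigger_locations, window=2)))
```

Widening the window trades precision for recall: with `window=2` the trigger prevalence also falls inside the context of B2, which adds the pair (B2, L2).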