Yan Wang, Jian Wang, Huiyi Lu, Bing Xu, Yijia Zhang, Santosh Kumar Banbhrani, Hongfei Lin.
Abstract
BACKGROUND: Event extraction is essential for natural language processing. In the biomedical field, the nested event phenomenon (event A acting as a participating role of event B) makes such events more difficult to extract than single events, so performance on nested biomedical events has remained underwhelming. In addition, previous works relied on a pipeline to build the event extraction model, which ignored the dependence between the trigger recognition and event argument detection tasks and produced significant cascading errors.
Keywords: Dice loss; GCN; graph convolutional network; joint extraction; nested biomedical event; syntactic structure
Year: 2022 PMID: 35671070 PMCID: PMC9214613 DOI: 10.2196/37804
Source DB: PubMed Journal: JMIR Med Inform
Figure 1. Basic process of biomedical event extraction, where yellow boxes represent the entity type and blue boxes represent the trigger type. Theme and Cause represent the relationship between a participant and an event, namely argument detection. IL-8: interleukin 8; TNF-alpha: tumor necrosis factor alpha.
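A nested event arises when an argument slot (Theme or Cause) of one event is filled by another event rather than by an entity. A minimal sketch of that structure, using hypothetical dataclasses and an invented trigger word ("induces") around the IL-8 and TNF-alpha entities named in the caption:

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Entity:
    text: str
    etype: str


@dataclass
class Event:
    trigger: str
    etype: str
    theme: Optional[Union["Entity", "Event"]] = None
    cause: Optional[Union["Entity", "Event"]] = None


# Simple event: gene expression of TNF-alpha.
tnf = Entity("TNF-alpha", "Protein")
expr = Event(trigger="expression", etype="Gene_expression", theme=tnf)

# Nested event: the Gene_expression event itself fills the Theme role of a
# Positive_regulation event whose Cause is IL-8 (event A as a participating
# role of event B).
il8 = Entity("IL-8", "Protein")
reg = Event(trigger="induces", etype="Positive_regulation",
            theme=expr, cause=il8)

assert isinstance(reg.theme, Event)  # an event, not an entity, is the argument
```

A pipeline system would first recognize the triggers ("expression", "induces") and only then attach arguments, which is where cascading errors enter; a joint model predicts both together.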
Primary event types and argument roles in the multilevel event extraction corpus (N=6827).
| Event and subevent types | Core arguments | Values, n (%) |
| --- | --- | --- |
| Anatomical | | |
| Cell proliferation | Theme (entity) | 133 (2.42) |
| Development | Theme (entity) | 316 (4.81) |
| Blood vessel development | Theme (entity) | 855 (12.91) |
| Growth | Theme (entity) | 469 (2.65) |
| Death | Theme (entity) | 97 (1.53) |
| Breakdown | Theme (entity) | 69 (1.1) |
| Remodeling | Theme (entity) | 33 (0.45) |
| Molecular | | |
| Synthesis | Theme (entity) | 17 (0.3) |
| Gene expression | Theme (entity) | 435 (6.66) |
| Transcription | Theme (entity) | 37 (0.61) |
| Catabolism | Theme (entity) | 26 (0.39) |
| Phosphorylation | Theme (entity) | 33 (0.5) |
| Dephosphorylation | Theme (entity) | 6 (0.09) |
| General | | |
| Localization | Theme (entity) | 450 (6.87) |
| Binding | Theme (entity) | 187 (2.92) |
| Regulation | Theme (entity or event) and cause (entity or event) | 773 (11.81) |
| Positive regulation | Theme (entity or event) and cause (entity or event) | 1327 (20.33) |
| Negative regulation | Theme (entity or event) and cause (entity or event) | 921 (14.08) |
| Planned | | |
| Planned process | Theme (entity or event) | 643 (9.9) |
Figure 2. The architecture of the conditional probability joint extraction framework, where numbers 0 to 9 represent each word in the sentence, the blue bar represents the BioBERT embedding, the yellow bar represents the POS-tagging embedding, and the green bar represents the entity embedding. BERT: Bidirectional Encoder Representations from Transformers; BioBERT: Biomedical Bidirectional Encoder Representations from Transformers; B-BVD: B-blood vessel development; LSTM: long short-term memory; POS: part of speech.
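The caption describes fusing three per-token representations before the sequence encoder. A minimal sketch of that concatenation with hypothetical dimensions (the caption fixes the three embedding types, not their sizes):

```python
import numpy as np

# Hypothetical sizes: 10 tokens (numbers 0 to 9 in the figure), 768-dim
# BioBERT vectors, and 50-dim POS-tagging and entity vectors.
N_TOKENS, D_BERT, D_POS, D_ENT = 10, 768, 50, 50

rng = np.random.default_rng(0)
biobert_emb = rng.normal(size=(N_TOKENS, D_BERT))  # blue bar
pos_emb = rng.normal(size=(N_TOKENS, D_POS))       # yellow bar
entity_emb = rng.normal(size=(N_TOKENS, D_ENT))    # green bar

# Per-token concatenation; the fused sequence would then feed the
# LSTM encoder shown in the figure.
x = np.concatenate([biobert_emb, pos_emb, entity_emb], axis=-1)
assert x.shape == (N_TOKENS, D_BERT + D_POS + D_ENT)
```

The design point is that syntactic (POS) and entity-type signals are injected at the input layer rather than learned from scratch by the encoder.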
Statistics of the multilevel event extraction corpus.
| Item | Training, n (%) | Development, n (%) | Test, n (%) | Total, N |
| --- | --- | --- | --- | --- |
| Document | 131 (50) | 44 (16.8) | 87 (33.2) | 262 |
| Sentence | 1271 (48.73) | 457 (17.52) | 880 (33.74) | 2608 |
| Word | 27,875 (49.26) | 9610 (16.98) | 19,103 (33.76) | 56,588 |
| Entity | 4147 (50.02) | 1431 (17.26) | 2713 (32.72) | 8291 |
| Event | 3296 (49.36) | 1175 (17.6) | 2206 (33.04) | 6677 |
| Anatomical | 810 (48.36) | 269 (16.06) | 596 (35.58) | 1675 |
| Molecular | 340 (48.2) | 125 (17.7) | 240 (34.0) | 705 |
| General | 1851 (50.66) | 627 (17.16) | 1176 (32.18) | 3654 |
| Planned | 295 (45.9) | 154 (24.0) | 194 (30.2) | 643 |
Primary event types and core argument roles in the BioNLP-STa 2011 GEb corpus, together with key statistics of the GE corpus.
| Event types and BioNLP-ST 2011 GE items | Core arguments | Values, N |
| --- | --- | --- |
| Event types | | |
| Gene expression | Theme (protein) | N/Ac |
| Transcription | Theme (protein) | N/A |
| Protein catabolism | Theme (protein) | N/A |
| Phosphorylation | Theme (protein) | N/A |
| Localization | Theme (protein) | N/A |
| Binding | Theme (protein)d | N/A |
| Regulation | Theme (protein or event) and cause (protein or event) | N/A |
| Positive regulation | Theme (protein or event) and cause (protein or event) | N/A |
| Negative regulation | Theme (protein or event) and cause (protein or event) | N/A |
| BioNLP-ST 2011 GE items | | |
| Document | N/A | 1224 |
| Word | N/A | 348,908 |
| Entity | N/A | 21,616 |
| Event | N/A | 24,967 |
aBioNLP-ST: BioNLP shared task.
bGE: Genia event.
cN/A: not applicable.
dThe number of arguments can be >1.
Overall performance on the multilevel event extraction corpus compared with state-of-the-art methods, using gold-standard entities.
| Method | Trigger precision (%) | Trigger recall (%) | Trigger F1 score (%) | Event precision (%) | Event recall (%) | Event F1 score (%) |
| --- | --- | --- | --- | --- | --- | --- |
| EventMinea | 70.79 | 81.69 | 75.84 | 62.28 | 49.56 | 55.20 |
| SSLa,b | 72.17 | 82.26 | 76.89 | 55.76 | 59.16 | 57.41 |
| CNNa,c | 80.92 | 75.23 | 77.97 | 60.56 | 56.23 | 58.31 |
| mdBLSTMa,d | 82.79 | 76.56 | 79.55 | 90.24 | 44.50 | 59.61 |
| RLe+KBsa,f | N/Ag | N/A | N/A | 63.78 | 56.81 | 60.09 |
| DeepEventMineh | N/A | N/A | N/A | 69.91 | 55.49 | 61.87 |
| HANNh,i | N/A | N/A | N/A | 63.91 | 56.08 | 59.74 |
| Our modelh | 82.20 | 78.25 | 80.18 | 72.26 | 55.23 | 62.80j |
aPipeline model.
bSSL: semisupervised learning.
cCNN: convolutional neural network.
dmdBLSTM: bidirectional long short-term memory with a multilevel attention mechanism and dependency-based word embeddings.
eRL: reinforcement learning.
fKB: knowledge base.
gN/A: not applicable.
hJoint model.
iHANN: hierarchical artificial neural network.
jThe best value compared with baselines.
The F1 score performance on simple events, nested events, and all events on the multilevel event extraction corpus.
| Subtask and model | Simple (%) | Nested (%) | All (%) |
| --- | --- | --- | --- |
| Trigger recognition | | | |
| CNNa | 79.52 | 78.80 | 78.52 |
| RLb+KBsc | N/Ad | N/A | N/A |
| DeepEventMine | N/A | 79.12 | N/A |
| HANNe | N/A | N/A | N/A |
| Our model | 79.96f | 80.05f | 80.18f |
| Event extraction | | | |
| CNN | 61.33 | 54.29 | 58.87 |
| RL+KBs | N/A | 58.69 | 60.09 |
| DeepEventMine | N/A | 51.73 | 61.87 |
| HANN | 77.08f | 45.46 | 59.74 |
| Our model | 64.85 | 61.26f | 62.80f |
aCNN: convolutional neural network.
bRL: reinforcement learning.
cKB: knowledge base.
dN/A: not applicable.
eHANN: hierarchical artificial neural network.
fThe best value compared with other models.
The extraction performance for each event type on the multilevel event extraction corpus.
| Events | Precision (%) | Recall (%) | F1 score (%) |
| --- | --- | --- | --- |
| Cell proliferation | 62.50 | 58.57 | 60.47 |
| Development | 51.82 | 66.43 | 58.22 |
| Blood vessel development | 90.42 | 72.66 | 80.57 |
| Growth | 78.02 | 50.58 | 61.37 |
| Death | 79.12 | 44.32 | 56.81 |
| Breakdown | 71.30 | 48.30 | 57.59 |
| Remodeling | 85.71 | 58.32 | 69.41 |
| Synthesis | 48.00 | 20.30 | 28.53 |
| Gene expression | 74.72 | 82.42 | 78.38 |
| Transcription | 16.67 | 33.33 | 22.22 |
| Catabolism | 100.00 | 50.00 | 66.67 |
| Phosphorylation | 90.00 | 100.00 | 94.74 |
| Dephosphorylation | 100.00 | 100.00 | 100.00 |
| Localization | 76.86 | 49.98 | 60.57 |
| Binding | 74.52 | 51.23 | 60.71 |
| Regulation | 63.82 | 51.49 | 56.99 |
| Positive regulation | 78.28 | 50.66 | 61.51 |
| Negative regulation | 64.35 | 54.69 | 59.13 |
| Planned process | 69.57 | 51.86 | 59.42 |
| All | 64.85 | 61.26 | 62.80 |
The performance of biomedical event extraction on the BioNLP shared task 2011 Genia event corpus.
| Method and event type | Precision (%) | Recall (%) | F1 score (%) |
| --- | --- | --- | --- |
| | | | |
| Event totalc | 57.65 | 49.56 | 53.30 |
| | | | |
| Event total | 63.48 | 53.35 | 57.98 |
| | | | |
| Event total | 66.46 | 48.96 | 56.38 |
| | | | |
| Event total | 69.45 | 49.94 | 58.07 |
| | | | |
| Event total | 71.73 | 53.21 | 61.10 |
| | | | |
| Simple totali | 85.95 | 72.62 | 78.73 |
| Binding | 53.16 | 37.68 | 44.10 |
| Regulation totalj | 55.73 | 41.73 | 47.72 |
| Event total | 67.10 | 52.14 | 58.65 |
| | | | |
| Regulation total | 55.21 | 47.23 | 50.91 |
| Event total | 64.61 | 56.11 | 60.06 |
| | | | |
| Regulation total | 62.36 | 51.88 | 56.64l |
| Event total | 76.28 | 55.06 | 63.96l |
| | | | |
| Simple total | 82.23 | 78.88 | 80.52 |
| Binding | 55.12 | 37.48 | 44.62 |
| Regulation total | 57.82 | 46.39 | 51.48 |
| Event total | 72.62 | 53.33 | 61.50 |
aPipeline model.
bTEES: Turku Event Extraction System.
cRepresents the overall performance on the test set.
dCNN: convolutional neural network.
eJoint model.
fHANN: hierarchical artificial neural network.
gKB: knowledge base.
hLSTM: long short-term memory.
iRepresents the overall performance for simple events on the test set.
jRepresents the overall performance for nested events on the test set (including regulation, positive regulation, and negative regulation subevents).
kGEANet-SciBERT: Graph Edge-conditioned Attention Networks with Science BERT.
lThe best value compared with other models.
Figure 3. An example of the attention-based gate graph neural network's effectiveness. (A) Row-wise heat map, where each row is an array of the average scores of the 2 heads obtained from the multi-head attention mechanism; the darker the color, the higher the score and the stronger the interaction. (B) Dependency parsing result produced by Stanford CoreNLP and the gold-standard relationships between event triggers and arguments, where yellow boxes represent the entity type and blue boxes represent the event type.
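The heat map in panel A is built by averaging the per-head attention scores row-wise. A sketch of producing such a matrix from scaled dot-product attention, with a hypothetical token count and head dimension (the caption fixes only the 2 heads):

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


# Hypothetical sizes: 7 tokens, 2 attention heads of dimension 16.
rng = np.random.default_rng(1)
n_tokens, n_heads, d_head = 7, 2, 16
Q = rng.normal(size=(n_heads, n_tokens, d_head))
K = rng.normal(size=(n_heads, n_tokens, d_head))

# Scaled dot-product attention per head, then a head-wise average:
# each row of `avg` is one row of the heat map in panel A.
scores = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(d_head), axis=-1)
avg = scores.mean(axis=0)

assert avg.shape == (n_tokens, n_tokens)
assert np.allclose(avg.sum(axis=-1), 1.0)  # each row stays a distribution
```

Because each head's rows already sum to 1, their average does too, so darker cells directly indicate where a token concentrates its attention mass.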
Figure 4. Case study of a simple nested event on the multilevel event extraction corpus. CNN: convolutional neural network.
Figure 5. Case study of a common nested event on the multilevel event extraction corpus. CNN: convolutional neural network.
Figure 6. Case study of a cross-sentence nested event on the multilevel event extraction corpus. CNN: convolutional neural network.