Yohei Seki, Kangkang Zhao, Masaki Oguni, Kazunari Sugiyama.
Abstract
Temporal-relation classification plays an important role in natural language processing. Various deep learning-based classifiers, which can build better models using sentence embeddings, have been proposed for this challenging task. These approaches, however, do not work well because they lack task-related information. To overcome this problem, we propose a novel framework that incorporates prior information by employing awareness of events and time expressions (time-event entities), using various window sizes to focus on the context words around these entities as a filter. We refer to this module as the "question encoder." In our approach, this prior information extracts task-related information from simple sentence embeddings. Our experimental results on the publicly available Timebank-Dense corpus demonstrate that our approach outperforms several state-of-the-art techniques, including CNN-, LSTM-, and BERT-based temporal-relation classifiers.
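The abstract describes filtering context words around time-event entities with a window of configurable size. A minimal sketch of such a window filter, assuming token-indexed entity positions (the function name and example sentence are our own illustration, not code from the paper):

```python
def entity_context(tokens, entity_idx, window):
    """Return the context words within `window` tokens of an entity mention."""
    lo = max(0, entity_idx - window)
    hi = min(len(tokens), entity_idx + window + 1)
    return tokens[lo:hi]

tokens = "the company said it expects profits to rise next year".split()
# hypothetical positions: the event "said" (index 2), time expression "year" (index 9)
print(entity_context(tokens, 2, 2))  # ['the', 'company', 'said', 'it', 'expects']
print(entity_context(tokens, 9, 2))  # ['rise', 'next', 'year']
```

In the paper's framework these windowed contexts feed the "question encoder," which conditions the sentence representation on the queried entity pair.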
Keywords: Event and time expressions; Neural networks; Question encoder; Temporal-relation classification; Timebank
Year: 2021 PMID: 34776775 PMCID: PMC8513567 DOI: 10.1007/s00799-021-00310-1
Source DB: PubMed Journal: Int J Digit Libr ISSN: 1432-1300
Fig. 1 Architecture of our proposed framework
Example of the context for questioned time–event entities in the Timebank-Dense corpus
Fig. 2 Architecture of our question encoder
F1 score on the validation dataset by hyperparameter value

| | 8 | 16 | 32 | 64 | 128 | 256 | 512 |
|---|---|---|---|---|---|---|---|
| F1 | 0.670 | 0.662 | 0.675 | 0.680 | 0.675 | 0.673 | |
F1 score on the validation dataset by hyperparameter value

| | 8 | 16 | 32 | 64 | 128 | 256 | 512 |
|---|---|---|---|---|---|---|---|
| F1 | 0.664 | 0.648 | 0.661 | 0.681 | 0.648 | 0.455 | |
F1 score on the validation dataset by context window size

| | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|
| F1 | 0.661 | 0.669 | 0.661 | 0.680 | 0.672 | 0.675 | 0.678 | |
Overall comparison between our proposed framework and the three state-of-the-art models, DLIGACH [4], CHENG [2], and MIRZA [22], which are a CNN model, a Bi-LSTM-based model, and a feature-based model, respectively
| Systems | Proposed | DLIGACH [4] | CHENG [2] | MIRZA [22] |
|---|---|---|---|---|
| Micro F1 | 0.667 | 0.497 | 0.520 | 0.512 |
Comparison by relation types for all pairs
| Methods | Approach | AFTER | BEFORE | INCLUDES | IS_INCLUDED | VAGUE | Macro F1 | Micro F1 |
|---|---|---|---|---|---|---|---|---|
| Proposed | CNN+QE | 0.738 | 0.730 | 0.409 | 0.371 | 0.710 | **0.592** | 0.667 |
| Proposed | LSTM+QE | 0.688 | 0.670 | 0.385 | 0.372 | 0.669 | 0.557 | 0.622 |
| DLIGACH [4] | CNN | 0.429 | 0.423 | 0.073 | 0.178 | 0.710 | 0.363 | 0.497 |
| CHENG [2] | LSTM | 0.454 | 0.391 | 0.216 | 0.309 | 0.623 | 0.399 | 0.520 |
| MIRZA [22] | LR | 0.44 | 0.51 | 0.11 | 0.47 | 0.58 | 0.422 | 0.518 |
The difference between our proposed CNN-based approach (bold score) and each of the four other models in macro F1 is statistically significant.
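As a sanity check, the macro F1 values in these tables are consistent with the unweighted mean of the five per-relation F1 scores. For example, for the MIRZA [22] row above:

```python
# Per-relation F1 scores for MIRZA [22] on all pairs, in table order:
# AFTER, BEFORE, INCLUDES, IS_INCLUDED, VAGUE
per_class = [0.44, 0.51, 0.11, 0.47, 0.58]

# Macro F1 is the unweighted mean over the five relation classes.
macro = sum(per_class) / len(per_class)
print(round(macro, 3))  # 0.422
```

The same arithmetic reproduces the macro F1 of every other row, e.g. 0.363 for DLIGACH [4] and 0.557 for LSTM+QE.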
Comparison by relation types for E–E pairs
| Methods | Approach | AFTER | BEFORE | INCLUDES | IS_INCLUDED | VAGUE | Macro F1 | Micro F1 |
|---|---|---|---|---|---|---|---|---|
| Proposed | CNN+QE | 0.747 | 0.732 | 0.456 | 0.419 | 0.741 | **0.619** | |
| Proposed | LSTM+QE | 0.665 | 0.672 | 0.427 | 0.419 | 0.711 | 0.579 | 0.661 |
| DLIGACH [4] | CNN | 0.440 | 0.444 | 0.096 | 0.143 | 0.737 | 0.372 | 0.546 |
| CHENG [2] | LSTM | 0.440 | 0.460 | 0.025 | 0.170 | 0.624 | 0.344 | 0.529 |
| ZHANG [ | | 0.526 | 0.503 | 0.106 | 0.325 | 0.626 | 0.417 | 0.548 |
| MIRZA [22] | LR | 0.430 | 0.471 | 0.049 | 0.250 | 0.613 | 0.363 | 0.519 |
| WU [ | BERT | 0.536 | 0.613 | 0.202 | 0.234 | 0.656 | 0.448 | 0.601 |
| SOARES [ | | 0.297 | 0.308 | 0.067 | 0.102 | 0.311 | 0.217 | 0.444 |
The difference between our proposed CNN-based approach (bold score) and each of the seven other models in macro F1 is statistically significant.
Fig. 3 Comparison between the CNN-based and the Bi-LSTM-based models by varying the ratio of the training data
Comparison of relation types in the CNN-based and the Bi-LSTM-based models by varying the ratio of training data
| Relation | CNN 30% | CNN 50% | CNN 70% | CNN 100% | Bi-LSTM 30% | Bi-LSTM 50% | Bi-LSTM 70% | Bi-LSTM 100% |
|---|---|---|---|---|---|---|---|---|
| AFTER | 0.471 | 0.574 | 0.632 | 0.738 | 0.413 | 0.566 | 0.637 | 0.688 |
| BEFORE | 0.503 | 0.584 | 0.582 | 0.730 | 0.386 | 0.562 | 0.647 | 0.670 |
| INCLUDES | 0.157 | 0.288 | 0.386 | 0.409 | 0.150 | 0.222 | 0.275 | 0.385 |
| IS_INCLUDED | 0.218 | 0.302 | 0.418 | 0.371 | 0.160 | 0.298 | 0.400 | 0.372 |
| VAGUE | 0.610 | 0.614 | 0.676 | 0.710 | 0.591 | 0.597 | 0.624 | 0.669 |
| Macro F1 | 0.392 | 0.472 | 0.539 | 0.592 | 0.340 | 0.449 | 0.517 | 0.557 |
| Micro F1 | 0.492 | 0.547 | 0.601 | 0.667 | 0.440 | 0.529 | 0.586 | 0.622 |
Example of original and expanded data instances
| | Text | Relation label |
|---|---|---|
| Original instance | He said he | Before |
| Expanded instance | He said he | After |
Comparison between the expanded and original training data sizes
| Relation | # Expanded training data | # Original training data |
|---|---|---|
| AFTER | 4316 | 1889 |
| BEFORE | 4316 | 2427 |
| INCLUDES | 1733 | 695 |
| IS_INCLUDED | 1733 | 1038 |
| VAGUE | 442 | 442 |
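The counts above are consistent with a label-flipping expansion: each AFTER/BEFORE and INCLUDES/IS_INCLUDED instance also contributes its reversed entity pair with the inverse label, while VAGUE instances are left untouched (4316 = 1889 + 2427, 1733 = 695 + 1038). A minimal sketch of this expansion, assuming instances stored as (entity1, entity2, label) triples (the function and data layout are our assumption, not the paper's code):

```python
# Inverse relation for each expandable label; VAGUE has no inverse and is skipped.
INVERSE = {"BEFORE": "AFTER", "AFTER": "BEFORE",
           "INCLUDES": "IS_INCLUDED", "IS_INCLUDED": "INCLUDES"}

def expand(instances):
    """For every non-VAGUE (e1, e2, label) triple, also emit (e2, e1, inverse label)."""
    out = list(instances)
    for e1, e2, label in instances:
        if label in INVERSE:
            out.append((e2, e1, INVERSE[label]))
    return out

pairs = [("e1", "t1", "BEFORE"), ("e2", "t2", "AFTER"), ("e3", "t3", "VAGUE")]
print(len(expand(pairs)))  # 5
```

Under this scheme the expanded AFTER count equals the original AFTER plus original BEFORE counts, matching the table.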
Experimental results (F1 scores) on the expanded training data
| Relation | Expanded training data | Original training data |
|---|---|---|
| AFTER | 0.795 | 0.738 |
| BEFORE | 0.807 | 0.730 |
| INCLUDES | 0.617 | 0.409 |
| IS_INCLUDED | 0.558 | 0.371 |
| VAGUE | 0.767 | 0.718 |
| Macro F1 | | 0.592 |
| Micro F1 | | 0.667 |