Yu Zhao, Wanli Zuo, Shining Liang, Xiaosong Yuan, Yijia Zhang, Xianglin Zuo.
Abstract
As a data augmentation method, word masking is commonly used in many natural language processing tasks. However, most masking methods are rule-based and are not related to downstream tasks. In this paper, we propose a novel masking word generator, named the Actor-Critic Mask Model (ACMM), which can adaptively adjust the masking strategy according to the performance of downstream tasks. To demonstrate the effectiveness of the method, we conducted experiments on two causal event extraction datasets. Experiment results show that, compared with various rule-based masking methods, the masked sentences generated by our proposed method significantly enhance the generalization of the model and improve model performance.
Keywords: adversarial attack; causal event extraction; information extraction; reinforcement learning
Year: 2022 PMID: 35205464 PMCID: PMC8870841 DOI: 10.3390/e24020169
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
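The adaptive strategy the abstract describes, an actor that proposes which tokens to mask and a critic that tracks the downstream reward, can be illustrated with a minimal REINFORCE-with-baseline loop. Everything below (the per-token Bernoulli policy, the toy reward standing in for downstream F1) is an illustrative assumption, not the authors' implementation:

```python
import random

def sample_mask(probs, rng):
    """Sample a binary mask; 1 means the token is replaced by [MASK]."""
    return [1 if rng.random() < p else 0 for p in probs]

def toy_reward(mask, informative):
    """Stand-in for downstream F1: masking uninformative tokens helps
    generalization, masking informative (e.g. causal-trigger) tokens hurts."""
    return sum(0.1 if (m and not info) else -0.2 if (m and info) else 0.0
               for m, info in zip(mask, informative))

def train_actor_critic(informative, steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    probs = [0.5] * len(informative)  # actor: independent Bernoulli mask policy
    baseline = 0.0                    # critic: running average of the reward
    for _ in range(steps):
        mask = sample_mask(probs, rng)
        r = toy_reward(mask, informative)
        adv = r - baseline            # advantage = reward minus critic estimate
        baseline += 0.05 * (r - baseline)
        for i, m in enumerate(mask):
            # Signed REINFORCE-style update for a Bernoulli policy:
            # raise p_i when masking token i coincided with above-baseline reward
            grad = (1.0 if m else -1.0) * adv
            probs[i] = min(0.95, max(0.05, probs[i] + lr * grad))
    return probs
```

On a toy sentence where positions 0 and 3 are marked informative, the learned policy drives the masking probability of informative tokens down and of uninformative tokens up, which is the qualitative behavior ACMM targets.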
Figure 1 Examples of sentences with pairs of causal events.
Figure 2 Examples of reprocessed data. The red and yellow words in the table are the newly added reason and result labels, respectively.
Experiment dataset statistics; a marked row indicates sentences containing causal-effect pairs.

| | SemEval | | Causal TB | |
|---|---|---|---|---|
| | Train | Test | Train | Test |
| | 904 | 421 | 220 | 181 |
| | 6574 | 2775 | 202 | 78 |
| | 7478 | 3196 | 422 | 103 |
Figure 3 An overview of the proposed ACMM framework.
Figure 4 Experimental results for hyperparameter selection. The top two plots show model results under different batch sizes; the remaining two show the model's loss curves.
Results on two causal extraction datasets.
| Model | SemEval | | | Causal TB | | |
|---|---|---|---|---|---|---|
| | P | R | F1 | P | R | F1 |
| BiLSTM-CRF | 65.90 | 45.37 | 53.72 | 32.07 | 31.41 | 31.64 |
| TARGER | 61.38 | 70.20 | 65.49 | 40.69 | 49.55 | 44.69 |
| BERT | 77.61 | 77.85 | 77.71 | 44.01 | 53.85 | 47.60 |
| BERT-CRF | 78.63 | 77.91 | 78.26 | 55.24 | 46.80 | 50.66 |
| BERT-BiLSTM-CRF | 80.40 | 77.20 | 78.74 | 54.60 | 43.60 | 48.14 |
| BERT-ACMM | 79.19 | 77.41 | 78.28 | 48.37 | | |
| BERT-CRF-ACMM | 80.19 | | | 54.33 | 47.18 | 50.50 |
| BERT-BiLSTM-CRF-ACMM | | 77.08 | 79.25 | | 45.78 | 50.12 |
Comparison of our method with rule-based strategies.
| Mask Function | SemEval | | | Causal TB | | |
|---|---|---|---|---|---|---|
| | P | R | F1 | P | R | F1 |
| No Mask | 77.61 | 77.85 | 77.71 | 44.01 | 53.85 | 47.60 |
| Whole Word | 76.19 | | 77.99 | 48.38 | 53.85 | 50.94 |
| Word Piece | 76.54 | 78.67 | 77.58 | | 49.36 | 50.78 |
| Span | 77.04 | 76.13 | 76.58 | 43.13 | 53.85 | 47.90 |
| RL Mask | 79.19 | 77.41 | 78.28 | 48.37 | | |
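The rule-based baselines compared above (Whole Word, Word Piece, Span) can be sketched as simple token-level procedures. The helpers below are illustrative reconstructions, not the paper's code; they assume BERT-style tokenization where continuation word pieces are prefixed with "##":

```python
import random

MASK = "[MASK]"

def wordpiece_mask(pieces, rate, rng):
    """Mask each word piece independently (BERT-style token masking)."""
    return [MASK if rng.random() < rate else p for p in pieces]

def whole_word_mask(pieces, rate, rng):
    """Mask all pieces of a word together; a masking decision made at a
    word-initial piece carries over to its '##' continuation pieces."""
    out, mask_word = [], False
    for p in pieces:
        if not p.startswith("##"):   # a new word begins here
            mask_word = rng.random() < rate
        out.append(MASK if mask_word else p)
    return out

def span_mask(pieces, span_len, rng):
    """Mask one contiguous span of pieces (SpanBERT-style)."""
    start = rng.randrange(max(1, len(pieces) - span_len + 1))
    return [MASK if start <= i < start + span_len else p
            for i, p in enumerate(pieces)]
```

Unlike these fixed rules, the RL Mask row corresponds to learning the masking decisions from downstream-task feedback, which is what ACMM does.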
Figure 5 Box plot of F1 for the 5 different strategies. The upper and lower whiskers represent the maximum and minimum values, and the middle line represents the mean.
Figure 6 Examples of three masked sentences, where "cas" marks the cause entity and "eff" marks the effect entity. Two of the sentences have explicit causality, and one has implicit causality.