Extracting Chinese events with a joint label space model.

Wenzhi Huang^1,2, Junchi Zhang², Donghong Ji¹.

Abstract

The task of event extraction consists of three subtasks, namely entity recognition, trigger identification and argument role classification. Recent work tackles these subtasks jointly with multi-task learning for better extraction performance. Despite being effective, existing attempts typically treat the labels of event subtasks as uninformative and independent one-hot vectors, which discards potentially useful label information and makes it difficult for these models to incorporate interactive features at the label level. In this paper, we propose a joint label space framework to improve Chinese event extraction. Specifically, the model converts the labels of all subtasks into a dense matrix, giving each Chinese character a shared label distribution via an incrementally refined attention mechanism. The learned label embeddings are also used as the weights of the output layer for each subtask, and are hence adjusted along with model training. In addition, we incorporate the word lexicon into the character representation in a soft probabilistic manner, alleviating the impact of word segmentation errors. Extensive experiments on Chinese and English benchmarks demonstrate that our model outperforms state-of-the-art methods.

Year: 2022 PMID: 36166421 PMCID: PMC9514653 DOI: 10.1371/journal.pone.0272353

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752

Introduction

Event extraction aims to generate structured knowledge about particular events of interest from plain text [1, 2]. End-to-end event extraction contains three fundamental tasks, namely entity recognition, event trigger identification and argument role classification. Entities refer to a set of real-world objects (e.g. Steve Jobs, Bill Gates); each consists of several consecutive tokens in the sentence and is associated with a particular type (e.g. Persons, Organizations and Locations). Event triggers, generally verbs or nominalizations, are the keywords that most clearly evoke the corresponding events. For example, given the Chinese text “军警两名士兵丧生。” (Two soldiers of the military police were killed.), an event detection system should identify that the word “丧生” (were killed) is an event trigger of type Die. Finally, event arguments are entities connected to triggers with specific roles in the event; for instance, “士兵” (soldiers) plays a Victim role in the Die event triggered by “丧生” (were killed).

Traditional pipelined extraction systems treat entity, trigger and argument extraction as three separate tasks, following the procedure entity recognition → trigger word identification → argument role classification [3-8]. Although these methods are flexible, they raise two issues: 1) errors from earlier steps propagate to later steps, so incorrect entity and trigger results degrade argument role classification; 2) they are typically insufficient for modeling the mutual dependence among subtasks. Therefore, later approaches focus on building joint models that extract entities, triggers and argument roles simultaneously. Early joint learning methods depend heavily on human-designed indicator features and pre-built syntax tools to capture the most useful information for event extraction [9-11].

With the rise of deep learning, recent studies concentrate on representation-based neural networks that automatically compose low-dimensional features, and multi-task approaches based on hard parameter sharing are applied to jointly solve information extraction tasks [12-15]. As shown in Fig 1(a), these approaches can be divided into three main components: (1) an embedding look-up layer that takes pre-tokenized words as inputs, whose embedding table is usually initialized with pre-trained word vectors [16]; (2) a shared multi-layer bi-directional recurrent neural network (BiRNN) that encodes deep contextual representations, for which the long short-term memory (LSTM) variant [17] is typically adopted to handle the vanishing gradient problem; (3) independent output networks with the softmax function, added on top for classifying specific labels.

Fig 1

Comparison of existing methods and our proposed method for an input Chinese sentence “军警两名士兵丧生” (Two military soldiers were killed).

Although these neural joint learning methods perform better than earlier ones, the complicated event structure still poses two challenges when they are applied to Chinese event extraction. First, unlike languages with explicit word boundaries (e.g. English), Chinese texts do not mark words naturally, which makes Chinese event extraction more difficult. Hence, a word segmentation procedure is often required before subsequent processing [18-20]. However, segmentation errors are unavoidable, and they lead to inherent errors in detecting entity and trigger boundaries and in predicting their categories. Therefore, some approaches resort to performing Chinese event detection directly at the character level [21, 22]. This results in a dilemma between performing Chinese event extraction with a fully character-level model and first segmenting the text into words. Second, traditional multi-task models based on hard parameter sharing rely on implicit network weights to capture correlations among tasks [13, 23–26], treating event labels as meaningless and independent one-hot vectors, which causes a loss of potential label information. This is inconsistent with how humans annotate an event mention. For instance, for a trigger with the event type Divorce, a human will only connect PERSON entities to the trigger as arguments, because it is impossible for non-human beings to divorce.

Previously, we presented a transition-based method [27] that approaches joint learning in a left-to-right decoding order and that has been shown to outperform simple shared-private models. However, it suffers from two limitations: 1) the elaborate modification to the standard LSTM hinders batched computation over multiple sentences, and not all lexicon words related to a character are used; 2) the interactive semantics of all task labels have not been fully explored, in the sense that the event label information has not been introduced into the shared representations.

In this work, we introduce a novel Multi-layer Label Attentive framework to improve Chinese Event Extraction (MLAEE). For the first issue above, studies have shown that integrating lexicon features into character-based networks can lead to better entity recognition performance [28, 29]. Inspired by these methods, we perform event extraction based on characters and enhance the character representations by introducing the word lexicon, as presented in Fig 1(b). In contrast to modifying the LSTM internals to incorporate word embeddings in hidden layers [28], we make use of a simple and effective method [30] that converts the lexicon matching results into the BMES encoding scheme, which bypasses the need for a complicated model architecture. For the second issue, we propose a joint label space for all the entity and event types, thereby allowing correlated type information to be incorporated into the network representations. In particular, we map the labels of each subtask into low-dimensional semantic vectors, similar to word embeddings [16]. By stacking all label types as a matrix, we let each character hidden state attend over it to derive a label importance distribution, and we share the label parameters with the output layers. In this way, label embeddings can be viewed as a semantic bridge that enables interactions between the encoding and decoding stages, leading to a novel joint learning approach.

We conduct experiments on a standard benchmark dataset for event extraction. In comprehensive comparisons with existing advanced methods, our model achieves state-of-the-art results on the Chinese ACE2005 dataset. To demonstrate that our joint label space model is applicable across different languages and tasks, we conduct two additional experiments: 1) event extraction on the English ACE2005 dataset; 2) using entity relation extraction as an auxiliary task to boost event extraction performance. Results show that our approach is also effective on the English dataset and when incorporating relation labels. Furthermore, an ablation analysis shows the contribution of each proposed module, and visualization results indicate that label embeddings can indeed capture semantic correlations among entities, triggers and argument roles.

Task definition

Formally, given an input sentence represented as a sequence of characters $C = \{c_1, c_2, \ldots, c_n\}$, we extract a set of entities E, event triggers T and event arguments A. In particular, each token $c_i$ is first classified as to whether it belongs to an entity span e. Then $c_i$ is classified as part of a positive or negative trigger word, and is further assigned an event subtype label t if it belongs to a positive trigger word. Finally, for each pair of a trigger t and an entity e in the same sentence, an argument role a is predicted. Following [9, 13, 25], we prepare argument candidates using predicted entity mentions.
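As a concrete illustration of the inputs and outputs defined above, the following minimal Python sketch encodes the running example from the Introduction. The class names `Span` and `Argument`, the half-open character offsets and the `PER` entity type are assumptions made for illustration; they are not part of the ACE2005 format or of the model itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # entity type or event subtype


@dataclass(frozen=True)
class Argument:
    trigger: Span
    entity: Span
    role: str


def text_of(span: Span, chars: str) -> str:
    """Return the characters covered by a span."""
    return chars[span.start:span.end]


sentence = "军警两名士兵丧生"
entity = Span(4, 6, "PER")   # "士兵"; the PER type is assumed for illustration
trigger = Span(6, 8, "Die")  # "丧生"
argument = Argument(trigger, entity, "Victim")
```

In this representation, the sets E, T and A are simply collections of `Span` and `Argument` objects over the same character sequence.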

Methodology

In this section, we detail our proposed MLAEE model. As Fig 2 illustrates, MLAEE extracts events from an input Chinese text with three components: an input representation layer, a label attentive encoding layer and an event identification layer.

Fig 2

Illustration of our multi-task framework for Chinese event extraction.

During event decoding, we use two separate sequence taggers to obtain entity and event trigger results, respectively. Then, each entity-trigger pair is assigned an event relationship from the argument role types defined by ACE2005 (https://catalog.ldc.upenn.edu/ldc2006t06). Note that the weights of the entity, trigger and argument role output networks are stacked as one label embedding matrix, which is used at the encoding layer.

Input embedding

At the input embedding layer, we use a hybrid approach in which both character-level and word-level features form the input representations. In particular, for an input Chinese sequence of n characters $C = \{c_1, c_2, \ldots, c_n\}$, we transform the one-hot vectors into distributed representations with a deep transformer encoder, whose weights are pre-trained on a large amount of raw text with the masked bi-directional language modeling objective [31]. To be consistent with BERT pre-training, we add two special tokens, [CLS] and [SEP], at the front and end of C, respectively, before obtaining the embedding output E. These contextualized embeddings have been shown to outperform static word embeddings (e.g. Word2Vec, GloVe) in many natural language tasks [31], because dynamic embeddings better reflect the nature of human language, in which the meaning of a word changes with its surrounding words.

Soft lexicon features

A flaw of a purely character-based event extraction approach is that word information cannot be utilized. In this work, we adopt the SoftLexicon approach [30], which augments the current character representation c with all matching word embeddings in a soft probabilistic manner. In particular, as presented in Fig 2 (bottom right), SoftLexicon first extracts all lexicon words that contain the character c and keeps only the words that can be found in the input sequence. Then, based on the location of c in a matched word w, which can be the Begin, Middle or End of w, or w can be a Single-character word, w is categorized into one of the four segmentation labels K = {B, M, E, S}. For example, the character c7 (“丧”) in Fig 2 appears at the start of the word “丧生” and c8 (“生”) appears at the end of the word. Accordingly, their corresponding segmentation categories are {B} and {E}, respectively. Note that if a segmentation label set is empty for c, we add “NULL” to the set to maintain a consistent input vector size. After categorization, the words that belong to the same segmentation label are condensed into a fixed-dimensional vector by a frequency-weighted combination of their word embeddings, where v denotes the word embedding lookup table and z(w) denotes the frequency with which a matched word w occurs in the in-domain statistical data. Z is the weight normalization term over the four segmentation sets. In addition, we do not increase the frequency of w if it has been counted by another sub-sequence that matches the word, which prevents longer words from always having a higher frequency than shorter words. With Eq 2, the representations of the four word sets are combined into one distributed vector by concatenation (denoted ⊕). Finally, a character c is represented by concatenating its contextualized character embedding and its BMES-style soft word vector.
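The lexicon matching and pooling steps can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the toy lexicon, the frequencies and the 2-dimensional embeddings are hypothetical; an empty set yields a zero vector here instead of a “NULL” embedding; the weights are simply normalized by the total frequency Z over the four sets, without reproducing any constant factors or the nested-match counting rule of [30].

```python
from collections import defaultdict


def bmes_word_sets(chars, lexicon):
    """For each character, collect matched lexicon words into B/M/E/S sets."""
    sets = [defaultdict(set) for _ in chars]
    n = len(chars)
    for i in range(n):
        for j in range(i + 1, n + 1):
            word = chars[i:j]
            if word not in lexicon:
                continue
            if j - i == 1:
                sets[i]["S"].add(word)          # single-character word
            else:
                sets[i]["B"].add(word)          # character begins the word
                sets[j - 1]["E"].add(word)      # character ends the word
                for k in range(i + 1, j - 1):
                    sets[k]["M"].add(word)      # character is inside the word
    return sets


def soft_lexicon_vector(char_sets, freq, emb, dim):
    """Concatenate frequency-weighted word vectors of the B, M, E, S sets."""
    # Z: total frequency over all four sets (1 avoids division by zero).
    total = sum(freq[w] for k in "BMES" for w in char_sets.get(k, ())) or 1
    out = []
    for k in "BMES":
        vec = [0.0] * dim
        for w in char_sets.get(k, ()):
            for d in range(dim):
                vec[d] += freq[w] * emb[w][d] / total
        out.extend(vec)
    return out


# Hypothetical toy lexicon, frequencies and embeddings.
lexicon = {"军警", "士兵", "丧生", "兵"}
freq = {"军警": 2, "士兵": 3, "丧生": 5, "兵": 1}
emb = {"军警": [1.0, 1.0], "士兵": [1.0, 0.0], "丧生": [0.5, 0.5], "兵": [0.0, 1.0]}

sets = bmes_word_sets("军警两名士兵丧生", lexicon)
vec = soft_lexicon_vector(sets[5], freq, emb, dim=2)  # character "兵"
```

For the character “兵”, the word “士兵” falls into the E set and “兵” into the S set, so the resulting vector has non-zero entries only in those two segments. In the full model this vector would be concatenated with the contextualized character embedding.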

Encoder layer

After the word semantic information is incorporated, we feed the character representations into the sequence encoding layer to capture context-sensitive features, which is implemented by stacking multi-layer bidirectional long short-term memory networks (Bi-LSTMs) [17]. This preserves historical and future information in the forward and reverse directions. The forward and backward representations are concatenated to obtain the one-layer bi-directional representation of character i as $h_i = [\overrightarrow{h_i}; \overleftarrow{h_i}]$. We use the matrix $H = [h_1; h_2; \ldots; h_n]$ to denote the stacked hidden states for the input sequence C.

Multi-head label attention layer

To incorporate joint label information into the Bi-LSTMs, we let each character's hidden representation $h_i$ interact with all subtask labels through the multi-head attention mechanism [32]. Formally, given a set of candidate output labels $L = \{l_1, l_2, \ldots, l_{|L|}\}$, we represent each label l with a low-dimensional vector looked up from a label embedding table v (Eq 5). By feeding the one-hot categorical labels to Eq 5, we obtain label embedding matrices for entities, triggers and argument roles, respectively. The concatenation of all three label matrices is then used to calculate the label importance distribution that updates the input character representations. Label embeddings can be randomly initialized and adjusted along with model training, or initialized more informatively from the descriptive words of the label type. For example, a BE-BORN event is defined with coarse type “Life” and fine-grained type “Be born”. Hence, for a label l, we collect all of its descriptive words S and average the pre-trained word embeddings of the words in S as the initial vector of the label type.

To jointly encode features from the character subspace and the concatenated label subspace, we apply scaled dot-product attention, in which the hidden state provides the query and the label matrix provides the keys and values. Here d is the dimension of the Bi-LSTM outputs H and is used to scale the attention scores, and the three projection matrices W are model parameters. For m-head attention, we concatenate the m subspaces to form the final representation of c. The output of the label attention layer is the concatenation of the i-th step Bi-LSTM hidden state $h_i$ and the normalized label vector $a_i$. As illustrated in Fig 2, we then apply the second Bi-LSTM layer on top of the 0-th layer's label-informed hidden states to obtain higher-level representations, leading to K layers of hierarchically refined representations. Note that the output of the K-th layer is fed to the subsequent event decoding layer.
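To make the attention step concrete, here is a minimal single-head sketch in plain Python of one character hidden state attending over the stacked label embeddings, followed by concatenation of the hidden state with the attended label vector. It assumes the standard scaled dot-product formulation of [32]; the function names, the identity projection matrices and the toy vectors are hypothetical, and the multi-head concatenation and any normalization are omitted.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project(x, W):
    """Multiply row vector x (length d_in) by matrix W (d_in x d_out)."""
    return [dot(x, col) for col in zip(*W)]


def label_attention(h, labels, Wq, Wk, Wv):
    """Attend from one hidden state h over all label embeddings."""
    q = project(h, Wq)                         # query from the character
    keys = [project(l, Wk) for l in labels]    # keys from the labels
    values = [project(l, Wv) for l in labels]  # values from the labels
    scale = math.sqrt(len(q))                  # sqrt of the projected dimension
    weights = softmax([dot(q, k) / scale for k in keys])
    attended = [sum(w * v[d] for w, v in zip(weights, values))
                for d in range(len(values[0]))]
    return weights, h + attended               # list concatenation: [h; a]


identity = [[1.0, 0.0], [0.0, 1.0]]
h = [1.0, 0.0]
labels = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # three toy label embeddings
weights, out = label_attention(h, labels, identity, identity, identity)
```

The `weights` list plays the role of the label importance distribution for this character: the label whose embedding is most aligned with the hidden state receives the largest weight.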

Decoder layer

Entity and trigger identification

Given an input sequence C, we predict its entity tags $S^{e}$ and trigger tags $S^{t}$ by applying two feed-forward networks (FFN) with ReLU activation, whose output weights are the entity and trigger label embeddings from the multi-head label attention layer, respectively. After that, two softmax output layers are applied to obtain the entity and trigger label probabilities (Eq 14). The training objective is to minimize the negative log-probabilities $-\log P(S^{e} \mid C)$ and $-\log P(S^{t} \mid C)$.
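One plausible reading of this weight sharing is that the logits are dot products between the FFN output and the rows of the label embedding matrix, so the same parameters serve both the attention layer and the output layer. The sketch below illustrates that reading for a single character; the function names, the toy weights and the omission of bias terms are assumptions, not details confirmed by the text.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def tag_probabilities(h, W_ffn, label_emb):
    """ReLU feed-forward layer, then scoring against shared label embeddings."""
    hidden = [max(0.0, sum(x * w for x, w in zip(h, col)))
              for col in zip(*W_ffn)]
    # Each logit is the dot product with one label embedding (tied weights).
    logits = [sum(a * b for a, b in zip(hidden, e)) for e in label_emb]
    return softmax(logits)


h = [0.5, -1.0]                                    # toy encoder output
W_ffn = [[1.0, 0.0], [0.0, 1.0]]                   # toy FFN weights
label_emb = [[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0]]  # toy label embeddings
probs = tag_probabilities(h, W_ffn, label_emb)
gold = 0
loss = -math.log(probs[gold])                      # negative log-probability
```

Because `label_emb` is the same matrix that the attention layer reads, gradients from the tagging loss and from the attention both update the label embeddings.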

Argument classification

To obtain the argument probabilities for the entity e with regard to the event trigger t, we combine their hidden representations from the final encoder layer and feed the concatenated vector to a softmax feed-forward layer for argument role decoding (Eq 15). The two inputs are the hidden states selected for the predicted entity and the predicted trigger given by Eq 14, and the training target is the gold argument role annotation. When the entity or the trigger is a span that contains multiple consecutive tokens, we summarize its token embeddings into a single vector via average pooling. For model training, we minimize the negative log-probability of the gold argument role, analogous to entity and trigger classification.
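The span pooling and pairing described above can be sketched as follows. The helper names `mean_pool` and `pair_features`, the half-open span offsets and the toy hidden states are assumptions for illustration; the resulting vector would then be scored by a softmax layer over argument roles.

```python
def mean_pool(states, start, end):
    """Average the hidden states of tokens in the half-open span [start, end)."""
    span = states[start:end]
    return [sum(col) / len(span) for col in zip(*span)]


def pair_features(states, entity_span, trigger_span):
    """Concatenate the pooled entity vector and the pooled trigger vector."""
    return mean_pool(states, *entity_span) + mean_pool(states, *trigger_span)


# Toy hidden states for a four-token sequence (2 dimensions each).
states = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
features = pair_features(states, entity_span=(0, 2), trigger_span=(2, 4))
```

Average pooling gives every entity-trigger pair a fixed-size input regardless of span length, which is what allows a single feed-forward layer to classify all pairs.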

Joint training strategy

Following [33], entity identification, trigger extraction and argument role classification are treated as subtasks of end-to-end event extraction and are optimized jointly in a multi-task learning setting. A cross-entropy loss is used as the objective function, and the negative log-likelihoods of all the tasks in a sentence are summed. During the testing stage, where the gold-standard entities and triggers are not available, we predict their sequence labels by choosing the output with the highest score by Eq 14 and then convert the BILOU tags to the corresponding spans and types. We next pair every entity span with every trigger span to extract argument roles by Eq 15.
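The test-time decoding step, converting BILOU tags into typed spans and then pairing entities with triggers, might look like the sketch below. The tag strings such as `B-PER` and the policy of silently dropping ill-formed sequences are assumptions; the text does not specify how invalid tag sequences are handled.

```python
def bilou_to_spans(tags):
    """Convert BILOU tags into (start, end, type) spans with exclusive end."""
    spans, start, current = [], None, None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        if prefix == "U":                          # unit-length span
            spans.append((i, i + 1, label))
            start, current = None, None
        elif prefix == "B":                        # span begins
            start, current = i, label
        elif prefix == "I" and current == label:   # span continues
            continue
        elif prefix == "L" and current == label:   # span ends
            spans.append((start, i + 1, label))
            start, current = None, None
        else:                                      # 'O' or ill-formed: drop open span
            start, current = None, None
    return spans


entity_tags = ["O", "O", "O", "O", "B-PER", "L-PER", "O", "O"]
trigger_tags = ["O", "O", "O", "O", "O", "O", "B-Die", "L-Die"]

entities = bilou_to_spans(entity_tags)
triggers = bilou_to_spans(trigger_tags)
# Every entity span is paired with every trigger span for role classification.
pairs = [(e, t) for e in entities for t in triggers]
```

For the running example sentence, this yields one entity span, one trigger span and therefore a single candidate pair for argument role classification.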

Experiments

Experimental settings

Dataset

To examine the effectiveness of various models on the three subtasks of event extraction, we conduct experiments on a multilingual corpus, the Automatic Content Extraction (ACE) 2005 dataset [1]. The dataset contains documents mainly collected from Newswire (NW), Broadcast News (BN) and Weblog (WL) sources. Following the experimental setup of [24], we conduct tests on the Chinese dataset (ACE-CN) and the English dataset (ACE-EN). In total, there are 7914 sentences in ACE-CN and 17172 in ACE-EN. We divide the training/development/test sets accordingly. Note that we use 7 entity types, 33 event subtypes and 22 argument roles to be consistent with the pre-processing step of [24]. We follow [34] and use the automatically segmented Chinese Gigaword as the matching dictionary. Our dataset and data will be released at https://gitee.com/zjcerwin/cn_labelattn upon acceptance of the paper.

Evaluation metrics

We use Precision (P), Recall (R) and F-Measure (F1) scores to evaluate the performance of different approaches on entity recognition, event trigger detection and argument role classification, following [14, 23, 24]:

Entity: An entity is considered correct if its start and end locations as well as its entity type are identified correctly.
Trigger: An event trigger is treated as correct if its start and end offsets as well as its event subtype are all matched (Trig-C).
Argument: An argument role is determined as correct when its entity offset, relation role type and the connected trigger are all identified correctly (Arg-C).
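Under these exact-match criteria, scoring reduces to set comparison between predicted and gold tuples. The sketch below shows this for triggers; the tuple layout `(start, end, subtype)` and the function name `prf1` are assumptions, and a full evaluation would also key each tuple by sentence or document identifier.

```python
def prf1(predicted, gold):
    """Exact-match precision, recall and F1 over sets of tuples."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    p = correct / len(predicted) if predicted else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Trig-C: a trigger counts only if offsets and event subtype all match.
gold_triggers = [(6, 8, "Die")]
pred_triggers = [(6, 8, "Die"), (0, 2, "Attack")]
p, r, f1 = prf1(pred_triggers, gold_triggers)
```

The same function applies to Entity and Arg-C by changing the tuple contents, for example `(entity_start, entity_end, role, trigger)` for arguments.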

Pre-processing

To represent input Chinese sentences, we use the bert-base-multilingual-cased model (https://huggingface.co/transformers/pretrained_models.html) for characters, and the word embeddings of [35], which are pre-trained on the Chinese Gigaword corpus using the skip-gram model [36]. For English, we use an improved roberta-base-cased model for word-piece encoding in addition to the traditional 100-dimensional GloVe (http://nlp.stanford.edu/projects/glove/) embeddings. Note that we fine-tune all the static embeddings during training and keep the contextualized models fixed to keep GPU memory usage relatively low.

Hyper-parameter settings

All model hyper-parameters are selected according to evaluation results on the development set with early stopping. Specifically, dropout is adopted to prevent overfitting, with a rate of 0.33 on input embeddings and hidden states. The Adam optimizer is applied to adjust the network weights, with an initial learning rate of 0.01 and a decay rate of 0.85 every five epochs. The hidden state sizes of the stacked BiLSTM and the label attention layer are both set to 200, and the number of layers is set to 2. We test batch sizes of 16, 32 and 64 and set the maximum number of epochs to 150. To verify that the superiority of the proposed method is not caused by noise in the data or other random factors, we use the pairwise t-test to measure significance. For a fair comparison, we conduct all experiments on a machine with an Intel quad-core CPU (i7-6700K, 4.0 GHz) and a GeForce GTX 1080 GPU with 8 GB of graphics memory.
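The learning-rate schedule can be read as a step decay: the rate is multiplied by 0.85 once every five epochs. The short sketch below encodes that reading; the step-wise (rather than continuous) interpretation and the function name are assumptions.

```python
def learning_rate(epoch, base_lr=0.01, decay=0.85, step=5):
    """Step decay: multiply the base rate by `decay` once every `step` epochs."""
    return base_lr * decay ** (epoch // step)


# Learning rates at the start of selected epochs (0-indexed).
schedule = {epoch: learning_rate(epoch) for epoch in (0, 4, 5, 10)}
```

Under this reading, the rate stays at 0.01 for epochs 0 to 4 and drops to 0.0085 at epoch 5.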

Results

Baselines

With regard to prior work that considers the three subtasks, we construct baselines at the word level and the character level. Word-based approaches use the Jieba tokenizer (https://github.com/fxsjy/jieba) for segmentation and include:

Word-Tree-Joint [12]: a typical shared-private model, which recognizes entities on top of shared Bi-LSTM representations and then extracts relations between entity pairs using a tree-LSTM over dependency parses.
Word-NP-pipeline [8]: a two-stage word-based method that first picks NP nodes from a constituency parser as candidate entities, then enables interactions between triggers and arguments with attention.

Character-based methods include:

Char-GRU-Joint [37]: a multi-task neural method that considers the three subtasks by sharing Bi-GRU hidden representations.
Char-BERT-pipeline [31]: a pipelined method that shares low-level BERT embeddings. To predict event mentions, we simply add a softmax transformation layer on top of the BERT encoder.

Some methods not only consider event subtasks but also involve semantic relation extraction:

Char-Span-Joint [23]: a top-performing end-to-end information extraction model, in which all possible spans in a sentence are considered to construct information graphs.
Char-Global-Joint [24]: the state-of-the-art information extraction framework, which introduces indicative global features at the decoding stage to capture cross-subtask and cross-instance interactions.
Transition-Joint [27]: a recent state-of-the-art joint decoding method based on a transition system, which uses a hybrid approach to incorporate character and word features [28]. For the English dataset, only word inputs are used.

To verify the effectiveness of the proposed methods, we also construct the following variants:

Lattice: a multi-task event extraction model with the soft lexicon features replaced by lattice LSTM [28]. Note that the proposed joint label attention mechanism is not used.
SoftLexicon: a model that uses the soft lexicon features, also without the label attention.
MLAEE + REL: a model that uses all techniques introduced in this work and additionally learns relation extraction with an extra FFN output layer. We use “+ REL” to indicate models involving entity relation annotations.

Main result

The comparison results for entity, trigger and argument extraction are shown in Table 1. We can observe that: 1) Character-based methods perform better than their word-based counterparts. One possible reason is that incorrect word segmentation severely hurts event results. 2) Purely character-based approaches underperform the word-lexicon-enhanced ones (Lattice and SoftLexicon), demonstrating that the semantic units of Chinese words are helpful for event extraction. Moreover, instead of modifying the LSTM extensively to introduce word features (Lattice), a simplified encoding scheme (SoftLexicon) is sufficient and effective. 3) Compared to Transition-Joint, our Lattice gives a 3.5-point higher F-score on argument classification (50.8 vs. 47.3). This result indicates that the joint label information is more effective than left-to-right decoding in introducing interactive knowledge at the decoding stage, particularly for argument roles. 4) When equipped with label attention, MLAEE is 2.4 points higher than the current SOTA [24] on argument F-score (54.4 vs. 52.0), verifying the effectiveness of the joint label information coupled with character representations.

In addition, we evaluate the proposed framework on ACE05-EN (Table 2). The results show that MLAEE also performs well on English data, supporting the usefulness of label embeddings across languages. On the other hand, there are frameworks that jointly perform relation and event extraction [23, 24, 27]. For a fair comparison with these models, we also integrate relation annotations into our model, denoted as “MLAEE+REL”. As can be seen from Tables 1 and 2, MLAEE+REL still outperforms the current state-of-the-art method Char-Global-Joint in event trigger and argument role classification, without any particular design for communication between relations and events. This result further demonstrates that our label attentive model is effective across different structured prediction tasks.

Table 1

Comparison results on ACE05-CN.

Model	Entity	Trig-C	Arg-C
Word-Tree-Joint	81.2	58.4	39.5
Word-NP-pipeline	78.5	59.1	42.4
Char-GRU-Joint	83.4	59.6	45.6
Char-BERT-pipeline	87.2	61.6	45.6
Char-Span-Joint*	87.8	62.7	46.7
Char-Global-Joint*	88.5	65.6	52.0
Transition-Joint*	88.0	63.4	47.3
Lattice	87.7	62.4	50.8
SoftLexicon	88.5	63.3	51.2
MLAEE	88.6^‡	65.8^‡	54.4^‡
MLAEE+REL*	88.9^†	66.4^‡	55.0^†

* indicates relation annotations are used.

† and ‡ indicate statistical significance compared to Char-Global-Joint with p < 0.01 and p < 0.05, respectively.

Table 2

Comparison results on ACE05-EN.

Model	Entity	Trig-C	Arg-C
Word-Tree-Joint	-	69.6	50.1
Char-GRU-Joint	81.2	69.8	52.1
Char-Span-Joint*	89.7	69.7	48.8
Char-Global-Joint*	90.2	74.7	56.8
Word-Transition-Joint*	88.1	73.8	55.3
MLAEE	89.3	74.2	55.9
MLAEE+REL*	90.0	75.1^†	56.9^‡

* indicates relation annotations are used.

† and ‡ indicate statistical significance compared to Char-Global-Joint with p < 0.01 and p < 0.05, respectively.

Ablation study

To examine the influence of several key model components, we conduct ablation tests on ACE-CN. As the results in Table 3 show, MLAEE presents a moderate drop in performance without the BiLSTM. Removing SoftLexicon or the label embedding degrades both trigger and argument classification significantly, indicating their importance in the network. Not surprisingly, the BERT embedding brings the largest performance improvement, which is consistent with the experiments in [23, 25].

Table 3

Ablation tests on ACE-CN.

Settings	Trig-C	Arg-C	Entity
MLAEE	65.8	54.4	88.6
-SoftLexicon	62.5^†	50.8^†	86.7^‡
-Label embedding	63.3^‡	51.2^†	87.3^‡
-BiLSTM	64.3^‡	52.6^‡	87.8^†
-BERT embedding	61.2^†	48.6^†	85.2^‡

† and ‡ indicate statistical significance compared to MLAEE with p < 0.01 and p < 0.05, respectively.

Visualization

To understand the information learned in the label embeddings, we visualize the label types of entities, triggers and arguments by reducing the 200-dimensional embedding matrix to a 2D map with t-SNE after 3, 10 and 40 training epochs, respectively. As shown in Fig 3, the locations of label types become increasingly informative as training proceeds. At the initial epochs, the vectors are located randomly in the reduced space. After 10 epochs, small clusters emerge, such as the “Attack” event and the “Attacker” argument. As training continues, the groups start to absorb more semantically related labels; for example, the “weapon” and “vehicle” entity types and the “victim” and “target” argument roles closely surround the “Attack” event. This confirms that the refined joint learning mechanism can indeed capture the label interactions among event subtasks.

Fig 3

t-SNE plot of joint label embeddings of entities, triggers and argument roles with varying numbers of training epochs.

(a) 3 epochs, (b) 10 epochs, and (c) 40 epochs.

Case study

We conduct a case study comparing our MLAEE model with the previous best model, Char-Global-Joint, on two representative Chinese event instances. As shown in Table 4, there are two Attack triggers, “开” and “丢”, in the first case; “枪” is the Instrument argument of “开” and “汽油弹” is the Instrument argument of “丢”. Char-Global-Joint fails to identify that “丢” triggers an Attack event and falsely connects “汽油弹” to “开”, while our MLAEE model recognizes both event mentions correctly. The reason is that the joint entity and event label space can incorporate correlated type information into the network representations, which improves the recall of event recognition. In the second case, there is much ambiguity around the phrase “向前来”. Char-Global-Joint yields the event trigger “向前”, given that “向前” occurs more frequently than “前来” in the training set. Due to the lack of word-level semantics, it is challenging for a character-level model to infer the correct trigger and argument boundaries in this case. In contrast, with the help of the soft lexicon knowledge, the MLAEE model detects the Transport trigger “前来” and the Destination argument “日本” correctly.

Table 4

Event prediction made by different models.

Gold C indicates the standard annotation. Words in bold and italics are correct triggers and arguments, respectively, while the underlined ones are incorrect.

Gold C₁:	信中明白的指出, 因为被警方通缉需要钱, 否则就要[开]_Attack[枪]_Instrument[丢]_Attack[汽油弹]_Instrument, 让牙医师们人人自危。(The letter clearly pointed out that because being wanted by the police requires money, or else they will shoot and throw petrol bombs, putting the dentists in danger.)
Char-Global-Joint:	信中明白的指出, 因为被警方通缉需要钱, 否则就要[开]_Attack枪丢[汽油弹]_Instrument, 让牙医师们人人自危。
MLAEE:	信中明白的指出, 因为被警方通缉需要钱, 否则就要[开]_Attack[枪]_Instrument[丢]_Attack[汽油弹]_Instrument, 让牙医师们人人自危。
Gold C₂:	但是会向[前来]_Transport[日本]_Destination的秘鲁国会调查[委员会]_Agent成员进行汇报。 (However, it will report to the members of the investigation committee of the Peruvian Congress who come to Japan.)
Char-Global-Joint:	但是会[向前]_Transport来日本的秘鲁国会调查[委员会]_Agent成员进行汇报。
MLAEE:	但是会向[前来]_Transport[日本]_Destination的秘鲁国会调查[委员会]_Agent成员进行汇报。

Related work

Our work mainly follows two lines of research: event extraction and label embedding.

English event extraction. Identifying events in English texts is an active topic in the information extraction field [3, 12–15]. Feature-based methods [4–6, 11] and more recent neural models [26, 37–39] have steadily improved extraction performance. To improve performance further, some studies incorporate syntactic information into neural networks, either by adding shortcut arcs to an LSTM [8] or by using Graph Convolutional Networks (GCNs) [26, 33, 37, 40, 41]. Other work has found that incorrect entity results significantly hurt argument role classification [9, 11, 37]. Subsequently, a transition-based framework [25] was devised to consider entities and events jointly. Further studies also learn relations together with events [15, 23, 24]. However, all of these approaches treat task labels as uninformative categorical numbers. In contrast, our model maps the labels of the event extraction tasks into semantic vectors and realizes joint learning through interaction between the encoding and decoding stages.

Chinese event extraction. For Chinese, a word segmentation step is usually required before applying an event system, regardless of whether kernel-based methods [18, 42], feature-based methods [18–20] or neural network methods [21, 24, 27, 43] are used. Instead of relying on existing segmentors, which are prone to error propagation, we take characters as the basic units and integrate a word lexicon into the input encoding scheme [28, 30].

Label embedding. In computer vision, research has demonstrated the importance of label embeddings for tasks including text recognition [44] and image classification [45]. Later work [46] shows that label embeddings can also benefit text classification, where text descriptions are used to generate the initial label vectors. Inspired by these studies, [47] proposes to denoise relation instances with the help of Knowledge Graphs and entity-related label embeddings. Inspired by the recent label attention network [48], we propose a joint label space across event subtasks, which enhances the network's hidden representations with global task label information. To our knowledge, we are the first to apply this idea to joint entity and event extraction.
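To make the joint label space idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a single attention head with a residual addition, whereas the paper describes a multi-head, incrementally refined mechanism. The label inventory, the dimension, the random inputs and the function names (`build_joint_space`, `label_attention`, `subtask_logits`) are all hypothetical and serve only for illustration. The sketch shows the two ingredients discussed above: each character attends over one label matrix shared by all subtasks, and the same label rows are reused as output-layer weights.

```python
import math
import random

# Hypothetical label inventory; the real ACE2005 label sets are much larger.
SUBTASK_LABELS = {
    "entity": ["O", "PER", "ORG"],
    "trigger": ["O", "Die", "Attack"],
    "role": ["None", "Victim", "Agent"],
}

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def build_joint_space(subtask_labels, dim, rng):
    """Stack the labels of all subtasks into one embedding matrix (one row per label)."""
    index, matrix = {}, []
    for task, labels in subtask_labels.items():
        start = len(matrix)
        index[task] = list(range(start, start + len(labels)))
        matrix.extend([[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in labels])
    return index, matrix

def label_attention(hidden, label_emb):
    """Single-head scaled dot-product attention from characters to all labels.

    Returns the per-character label distributions and the hidden states
    enhanced by the attention-weighted sum of label embeddings
    (the residual addition is an assumption of this sketch).
    """
    dim = len(label_emb[0])
    attn, enhanced = [], []
    for h in hidden:
        weights = softmax([dot(h, e) / math.sqrt(dim) for e in label_emb])
        context = [sum(w * e[j] for w, e in zip(weights, label_emb)) for j in range(dim)]
        attn.append(weights)
        enhanced.append([hj + cj for hj, cj in zip(h, context)])
    return attn, enhanced

def subtask_logits(vec, label_emb, label_ids):
    """Reuse the rows of the joint label matrix as output weights for one subtask."""
    return [dot(vec, label_emb[k]) for k in label_ids]

rng = random.Random(0)
DIM = 8
index, label_emb = build_joint_space(SUBTASK_LABELS, DIM, rng)
# Random stand-ins for the encoder outputs of a four-character sentence.
hidden = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(4)]
attn, enhanced = label_attention(hidden, label_emb)
# Trigger distribution for the third character, scored against shared label rows.
trigger_probs = softmax(subtask_logits(enhanced[2], label_emb, index["trigger"]))
```

Because `subtask_logits` reads its weights from `label_emb`, any training update to a label row would affect both the attention step and the output layer, which is the kind of encoder-decoder interaction the section describes.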

Conclusion

We present a multi-level label-attentive multitask network for Chinese end-to-end event extraction. With a hierarchically refined attention mechanism, the model incorporates the label importance distribution into each character’s hidden state and shares the label embeddings with the output layers, so that joint learning takes place in both the encoding and decoding stages. Results on a multilingual benchmark show that our model outperforms a range of advanced baselines.

2 Nov 2021

PONE-D-21-29056

Extracting Chinese Events with A Joint Label Space Model

PLOS ONE

Dear Dr. Zhang,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Dec 17 2021 11:59PM.

We look forward to receiving your revised manuscript.

Kind regards,
Fu Lee Wang
Academic Editor
PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?
Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Partly

2. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes

3. Have the authors made all data underlying the findings in their manuscript fully available?
Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes

4. Is the manuscript presented in an intelligible fashion and written in standard English?
Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: No

5. Review Comments to the Author

Reviewer #1: This paper focus on end-to-end event extraction by jointly modeling entity typing, trigger classification and argument classification. To solve the error propagation problem in Chinese event extraction and the ignorance of event type labels, this paper involves soft lexicon information, represents type labels using low-dimensional vectors and proposes label-aware attentions. This paper is clear and sufficient, and I don't have many questions. Here are my concerns:
1) Please provide some significant test results when comparing with other methods.
2) Which equation the line 128 refers to is not clear.
3) I think a verb is missing in the sentence of line 239 before ``3.5% better F-scores’’.

Reviewer #2: This paper proposes a joint label space framework to improve chinese event extraction, which conducts sets of experiments on a multilingual benchmark dataset. On the whole, the article is somewhat innovative and the experimental results seems to be authentic. However, I still have some concerns listed as follows:
1) for Soft Lexicon Features (equation 1-4), the matching type are easy to bring redundant errors, how to consider and solve this issue?
2) for Multi-head Label Attention Layer, adding label features have been conducted into several work, please explain the differences and innovative points.
3) for Joint Training Strategy (equation 16), do you consider the influence for entity identification, trigger extraction and argument role classification, especially for the value of the likelihood function for event triggers.
4) please explain the dividing ratio of the training/developing/testing set, how to construct the matching words？
5) why do the ablation test lack of entity performance?
6) Some grammar typos need to be corrected, especially for subject-predicate singular and plural status.

Reviewer #3: In this paper, the authors propose a Chinese event extraction system that utilizes label space information. Most traditional methods establish the event extraction pipeline without considering the underlying relationships and constraints among labels, e.g., only PERSON can DIVORCE. The authors use vectors to represent the labels and use multi-head attention to jointly train the label embedding, word embedding, sentence embedding as well as the other hidden layers. The authors also use soft lexicon features to provide additional information to mitigate the shortcoming of a character-based system. The authors demonstrate an extensive set of experiments and comparisons to prove the merit of the proposed method. To the best of my knowledge, I think the paper needs major revision, with the following concerns:

1. The authors need to comprehensively show that their proposed method is able to solve both problems raised in the introduction. I think the current version does not completely and convincingly cover one of the motivations of the paper. In the introduction, the authors mention that Chinese texts need segmentation while the segmenter may cause errors and propagate these errors, and this argument leads the authors to use a character-based approach. However, in the experiments, although there are other baseline models with word segmentation, the authors still need an ablative setting in their proposed framework with groundtruth segmentation and system segmentation (of course these settings should remove soft lexicon features).

2. I also would like to see the examples of the proposed method, especially those examples which failed in the baselines and ablations but succeed in the proposed method. The title of the paper indicates that the proposed framework is working on the Chinese dataset, however, I am confused that the authors show some results from the English dataset. In fact, this confusion already appears in the introductory section.

3. Another confusion is the count of ablation and modification (Line 224), the authors mention two but I see three.

4. Again in Line 224 and the following lines, I think the authors completely miss the description in REL, I guess it is relation extraction, another task. I understand any further features and tasks which are included in the joint training will boost the performance, but this definitely deviates the motivation of the proposed framework and hurts the fairness in the comparison.

5. Readers may find difficulty in reading the paper due to some informal or irregular writing and wording in the paper. Here I list a few points which the authors may consider revision:
- Introduction: Line 19~21, it seems to me that the sentence is incomplete after “thus”, and the logic of the sentence is circular.
- Line 52~57, as long as the paper is also a previous one, I suggest the authors merge this paragraph with the other potential baselines, and describe the paper in a manner that is same with the other traditional methods.
- Line 59, if the authors merge the last paragraph, logically they do not need to “contrast” the work they proposed before. Again in Line 59, this work “proposes/introduces” etc.
- Line 123: we propose to let each character’s hidden representation hi ~~to~~ interact … (remove the “to”)
- Line 130: usually we say “updated” or “trained” when mentioning the change in the entries in the embeddings.
- Line 173: usually we name “start/end location” “offset”
- Overall, I also suggest the authors check through the whole paper, use the present tense and future tense (avoid present perfect tense and past tense), and use the active voice (e.g., “we place the embedding layer on top of”) instead of the passive voice (e.g., “the embedding layer is placed on top of”)

6. Do you want your identity to be public for this peer review?
Reviewer #1: No
Reviewer #2: No
Reviewer #3: No
26 May 2022

Reviewer #1: This paper focus on end-to-end event extraction by jointly modeling entity typing, trigger classification and argument classification. To solve the error propagation problem in Chinese event extraction and the ignorance of event type labels, this paper involves soft lexicon information, represents type labels using low-dimensional vectors and proposes label-aware attentions. This paper is clear and sufficient, and I don't have many questions. Here are my concerns:

1) Please provide some significant test results when comparing with other methods.

Response: Pairwise t-tests have been added to Tables 1-3.

2) Which equation the line 128 refers to is not clear.

Response: The notation E_a^l refers to the concatenation of all label matrices. The corresponding description has been added.

3) I think a verb is missing in the sentence of line 239 before ``3.5% better F-scores’’.

Response: Addressed.

Reviewer #2: This paper proposes a joint label space framework to improve chinese event extraction, which conducts sets of experiments on a multilingual benchmark dataset. On the whole, the article is somewhat innovative and the experimental results seems to be authentic. However, I still have some concerns listed as follows:

1) for Soft Lexicon Features (equation 1-4), the matching type are easy to bring redundant errors, how to consider and solve this issue?

Response: Thank you for your comment. We are sorry for the misleading description. In fact, for a character c_i, we first find all words in the lexicon that contain this character and keep only the words that can be found in the input sequence. Thus, redundant word information can be eliminated. The corresponding description has been revised in the Soft Lexicon Features section.

2) for Multi-head Label Attention Layer, adding label features have been conducted into several work, please explain the differences and innovative points.

Response: Thank you for your suggestion. Compared to previous work [1-3] that introduces label features, our work 1) is the first to propose a unified label embedding space for entity, event trigger and argument role extraction; 2) not only enhances the network's hidden representations with the global label embeddings but also shares the embedding weights with the subtask output layers, thereby making full use of label knowledge. This description has been added in the Related Work section.

1. Wang G, Li C, Wang W, Zhang Y, Shen D, Zhang X, et al. Joint Embedding of Words and Labels for Text Classification. In: Proceedings of the 56th ACL; 2018. p. 2321–2331.
2. Hu L, Zhang L, Shi C, Nie L, Guan W, Yang C. Improving Distantly-Supervised Relation Extraction with Joint Label Embedding. In: Proceedings of EMNLP; 2019. p. 3812–3820.
3. Cui L, Li Y, Zhang Y. Label Attention Network for Structured Prediction. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2022;30:1235–1248.

3) for Joint Training Strategy (equation 16), do you consider the influence for entity identification, trigger extraction and argument role classification, especially for the value of the likelihood function for event triggers.

Response: Thank you for your suggestion. We have tried setting weighting values (\\alpha \\beta \\gamma) to control the contributions of entity identification, trigger extraction and argument role classification in the joint training. We find that setting all the weighting values to one leads to the best development performance, so we simply add all losses together.

4) please explain the dividing ratio of the training/developing/testing set, how to construct the matching words？

Response: The ACE2005 Chinese dataset contains 7914 sentences in total. Following [1], we divide the data into training/development/test sets with 6841, 526 and 547 sentences, respectively. We use the automatically segmented Chinese Giga-Word as the matching dictionary. This description has been added in the Dataset section.
[1] Lin Y, et al. A Joint Neural Model for Information Extraction with Global Features. ACL 2020.

5) why do the ablation test lack of entity performance?

Response: An ablation test for entity recognition has been added.

6) Some grammar typos need to be corrected, especially for subject-predicate singular and plural status.

Response: Addressed.

Reviewer #3: In this paper, the authors propose a Chinese event extraction system that utilizes label space information. Most traditional methods establish the event extraction pipeline without considering the underlying relationships and constraints among labels, e.g., only PERSON can DIVORCE. The authors use vectors to represent the labels and use multi-head attention to jointly train the label embedding, word embedding, sentence embedding as well as the other hidden layers. The authors also use soft lexicon features to provide additional information to mitigate the shortcoming of a character-based system. The authors demonstrate an extensive set of experiments and comparisons to prove the merit of the proposed method. To the best of my knowledge, I think the paper needs major revision, with the following concerns:

1. The authors need to comprehensively show that their proposed method is able to solve both problems raised in the introduction. I think the current version does not completely and convincingly cover one of the motivations of the paper. In the introduction, the authors mention that Chinese texts need segmentation while the segmenter may cause errors and propagate these errors, and this argument leads the authors to use a character-based approach. However, in the experiments, although there are other baseline models with word segmentation, the authors still need an ablative setting in their proposed framework with groundtruth segmentation and system segmentation (of course these settings should remove soft lexicon features).

Response: Thank you for your valuable suggestion. Because automatic word segmentation can introduce word boundary errors, we make event predictions at the character level and integrate Chinese lexicon information through soft BMES embeddings. In this manner, our network can learn to select the most salient word features during model training, thereby avoiding potential segmentation errors. Unfortunately, the ACE2005 dataset contains no ground-truth word segmentation, and we lack the labor resources to label the word boundaries manually. Hence, it is very difficult to compare the framework under ground-truth segmentation and system segmentation.

2. I also would like to see the examples of the proposed method, especially those examples which failed in the baselines and ablations but succeed in the proposed method. The title of the paper indicates that the proposed framework is working on the Chinese dataset, however, I am confused that the authors show some results from the English dataset. In fact, this confusion already appears in the introductory section.

Response: Thank you for your valuable suggestion. We have added two representative cases in the Case Study section. Compared to the previous best model, Char-Global-Joint, our MLAEE model outputs fully correct event mention results in these cases, demonstrating the effectiveness of the proposed joint label space and the soft lexicon module. In addition, our experiments aim to show that the proposed joint label space model works not only for Chinese but can also be applied to other languages such as English. Moreover, a strand of previous event models typically reports results on the English dataset. To have a fair comparison with recent advanced methods, we also report the performance of our approach on the English ACE2005 dataset. We have modified the introductory section and made the description clearer accordingly.

3. Another confusion is the count of ablation and modification (Line 224), the authors mention two but I see three.
Response: Thank you for your suggestion. In line 224, we introduce the baseline method "Lattice-Transition-Joint" and compare against it on both the Chinese and English datasets. To avoid confusion with the baseline "Lattice", we have renamed "Lattice-Transition-Joint" to "Transition-Joint".

4. Again in Line 224 and the following lines, I think the authors completely miss the description in REL, I guess it is relation extraction, another task. I understand any further features and tasks which are included in the joint training will boost the performance, but this definitely deviates the motivation of the proposed framework and hurts the fairness in the comparison.

Response: Thank you for the valuable suggestion. A description of REL has been added. The task of entity relation extraction is similar to argument role extraction. We introduce relation extraction based on two considerations: 1) A number of studies jointly extract relations and events [1-3]. To have a fair comparison with these models, we have constructed a modification "MLAEE + REL" (both relation and event data are used) of our "MLAEE" (only event data is used); 2) We aim to demonstrate that our model can be applied not only to event extraction but also to entity relations. This shows its effectiveness across different structural prediction problems. In addition, in Tables 1 and 2, models trained with relation features are clearly marked with the tag "*". In the Main Result section, we analyse results between models that use the same labeled data (either only events or events plus relations).

[1] Lin, Ying, et al. "A joint neural model for information extraction with global features." ACL. 2020.
[2] Wadden D, et al. Entity, Relation, and Event Extraction with Contextualized Span Representations. EMNLP; 2019.
[3] Huang W, et al. A transition-based neural framework for Chinese information extraction. Plos one. 2020.

5. Readers may find difficulty in reading the paper due to some informal or irregular writing and wording in the paper. Here I list a few points which the authors may consider revision:

Response: Thank you for such detailed suggestions. We are sorry for these informal expressions. We have proofread the paper carefully and revised the irregular writing.

- Introduction: Line 19~21, it seems to me that the sentence is incomplete after “thus”, and the logic of the sentence is circular.
Response: Revised.

- Line 52~57, as long as the paper is also a previous one, I suggest the authors merge this paragraph with the other potential baselines, and describe the paper in a manner that is same with the other traditional methods.
Response: The last review suggested explicitly presenting the differences between this work and our previous one. Hence, we have kept the paragraph in this revision.

- Line 59, if the authors merge the last paragraph, logically they do not need to “contrast” the work they proposed before. Again in Line 59, this work “proposes/introduces” etc.
Response: Revised.

- Line 123: we propose to let each character’s hidden representation hi ~~to~~ interact … (remove the “to”)
Response: Removed.

- Line 130: usually we say “updated” or “trained” when mentioning the change in the entries in the embeddings.
Response: Revised.

- Line 173: usually we name “start/end location” “offset”
Response: Revised.

- Overall, I also suggest the authors check through the whole paper, use the present tense and future tense (avoid present perfect tense and past tense), and use the active voice (e.g., “we place the embedding layer on top of”) instead of the passive voice (e.g., “the embedding layer is placed on top of”)
Response: The passive voice has been changed to the active voice.

Submitted filename: revision_response.txt

19 Jul 2022

Extracting Chinese Events with A Joint Label Space Model

PONE-D-21-29056R1

Dear Dr.
Zhang,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Kind regards,
Fu Lee Wang
Academic Editor
PLOS ONE

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here.
Reviewer #1: All comments have been addressed
Reviewer #2: All comments have been addressed
Reviewer #3: All comments have been addressed

2. Is the manuscript technically sound, and do the data support the conclusions?
Reviewer #1: Yes
Reviewer #2: Partly
Reviewer #3: Yes

3. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes

4. Have the authors made all data underlying the findings in their manuscript fully available?
Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes

5. Is the manuscript presented in an intelligible fashion and written in standard English?
Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes

6. Review Comments to the Author

Reviewer #1: This paper focuses on end-to-end event extraction by jointly modeling entity typing, trigger classification, and argument classification. This paper is clear and sufficient. The authors have addressed all my previous concerns.

Reviewer #2: This paper proposes a joint label space framework to improve chinese event extraction, which conducts sets of experiments on a multilingual benchmark dataset. In summary, my comments are well considered and it can be accepted now.

Reviewer #3: I have read the manuscript again as well as the responses from the authors. I think this paper qualifies publication to the best of my knowledge.

7. Do you want your identity to be public for this peer review?
Reviewer #1: No
Reviewer #2: No
Reviewer #3: No

16 Sep 2022

PONE-D-21-29056R1

Extracting Chinese Events with A Joint Label Space Model

Dear Dr. Zhang:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Professor Fu Lee Wang
Academic Editor
PLOS ONE
