Bo Chen, Weiming Peng, Jihua Song.
Abstract
In the process of semantic capture, traditional sentence representation methods tend to lose much of the global and contextual semantics and ignore the internal structural information of the words in a sentence. To address these limitations, we propose a character-assisted construction-BERT sentence representation method (CharAs-CBert) to improve the accuracy of sentiment text classification. First, based on the construction, a more effective construction vector is generated to distinguish the basic morphology of the sentence and reduce the ambiguity of the same word across different sentences, while strengthening the representation of salient words and effectively capturing contextual semantics. Second, character feature vectors are introduced to explore the internal structural information of sentences and improve the representation of both local and global semantics. Then, to give the sentence representation better stability and robustness, the character information, word information, and construction vectors are combined into a single sentence representation. Finally, evaluation on open-source baseline datasets such as ACL-14 and SemEval 2014 demonstrates the validity and reliability of the proposed representation: the F1 score and accuracy reach 87.54% and 92.88% on ACL-14, respectively.
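The fusion step described above (combining character, word, and construction vectors into one sentence representation) can be sketched as follows. The dimensions, random vectors, and the concatenate-then-pool fusion are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumptions, not from the paper): per-token character,
# word, and construction embeddings for a 6-token sentence.
seq_len, d_char, d_word, d_cons = 6, 16, 32, 8
char_vecs = rng.normal(size=(seq_len, d_char))
word_vecs = rng.normal(size=(seq_len, d_word))
cons_vecs = rng.normal(size=(seq_len, d_cons))

def fuse(char_v, word_v, cons_v):
    """Concatenate the three views per token, then mean-pool over tokens
    into one fixed-size sentence vector (a simple stand-in for fusion)."""
    token_repr = np.concatenate([char_v, word_v, cons_v], axis=-1)
    return token_repr.mean(axis=0)

sentence_vec = fuse(char_vecs, word_vecs, cons_vecs)
print(sentence_vec.shape)  # (56,)
```

The pooled vector can then be fed to any downstream sentiment classifier.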
Keywords: character vector; construction vector; internal structure information; sentence representation; sentiment classification
Year: 2022 PMID: 35808519 PMCID: PMC9269684 DOI: 10.3390/s22135024
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Figure 1. (a) The overall network structure of CharAs-CBert; (b) the slice attention module (SAM); (c) the bidirectional independent recurrent neural network module (BIRM). The architecture combines a character graph convolution module, independent recurrent neural networks with forward and backward layers, multilayer-perceptron features of the a priori word vectors of sentences A and B, GroupNorm, channel shuffle, a pre-trained language model, an element-wise product operation (⊕), and a softmax classifier.
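The BIRM is built from independent recurrent neural networks run in both time directions. A minimal sketch of an IndRNN-style recurrence (h_t = act(W x_t + u ⊙ h_{t−1}), where the recurrent weight u is a vector so each hidden unit recurs only with itself) and its bidirectional combination, with toy dimensions; this is an assumption-laden illustration, not the paper's code.

```python
import numpy as np

def indrnn(x, W, u, b, act=np.tanh):
    """One IndRNN layer: h_t = act(W @ x_t + u * h_{t-1} + b).
    Unlike a vanilla RNN, u is a vector, so units are independent."""
    T = x.shape[0]
    H = W.shape[0]
    h = np.zeros(H)
    out = np.empty((T, H))
    for t in range(T):
        h = act(W @ x[t] + u * h + b)
        out[t] = h
    return out

def bidirectional_indrnn(x, params_f, params_b):
    """Concatenate a forward pass and a time-reversed backward pass,
    mirroring the BIRM's forward and backward layers."""
    fwd = indrnn(x, *params_f)
    bwd = indrnn(x[::-1], *params_b)[::-1]
    return np.concatenate([fwd, bwd], axis=-1)

rng = np.random.default_rng(1)
T, D, H = 5, 8, 4
x = rng.normal(size=(T, D))
make = lambda: (rng.normal(size=(H, D)), rng.uniform(-1, 1, H), np.zeros(H))
y = bidirectional_indrnn(x, make(), make())
print(y.shape)  # (5, 8)
```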
Statistical results of the SemEval2014 and ACL-14 baseline datasets. "Consnum" indicates the number of sentence constructions; the character nodes are used to build a topological graph.
| Datasets | Restaurant Training | Restaurant Testing | Laptop Training | Laptop Testing | ACL14 Training | ACL14 Testing |
|---|---|---|---|---|---|---|
| Positive | 2164 | 728 | 994 | 341 | 3142 | 346 |
| Negative | 807 | 196 | 870 | 128 | 1562 | 173 |
| Neutral | 637 | 196 | 464 | 169 | 1562 | 173 |
| Consnum | 100,043 | 1,105,665 | 241,546 | 992,438 | 819,242 | 286,552 |
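The character nodes counted above feed a topological graph for graph convolution (CharGCM). A hypothetical sketch, assuming a simple within-word character co-occurrence graph and one symmetric-normalized graph-convolution step; the paper's actual graph construction may differ.

```python
import numpy as np

def char_graph(words, window=1):
    """Build a character co-occurrence adjacency matrix: characters are
    nodes; characters within `window` positions of each other inside a
    word are connected (an assumed form of the topological graph)."""
    chars = sorted({c for w in words for c in w})
    idx = {c: i for i, c in enumerate(chars)}
    A = np.eye(len(chars))  # self-loops
    for w in words:
        for i, c in enumerate(w):
            for j in range(max(0, i - window), min(len(w), i + window + 1)):
                A[idx[c], idx[w[j]]] = 1.0
    return chars, A

def gcn_layer(A, H, W):
    """One propagation step: relu(D^{-1/2} A D^{-1/2} H W)."""
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ A @ d_inv_sqrt @ H @ W, 0.0)

words = ["great", "food", "slow", "service"]
chars, A = char_graph(words)
rng = np.random.default_rng(2)
H0 = rng.normal(size=(len(chars), 8))   # initial character features
H1 = gcn_layer(A, H0, rng.normal(size=(8, 8)))
print(len(chars), H1.shape)  # 14 (14, 8)
```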
Experimental results of different sentence representation methods; SemEval2014 includes the Laptop and Restaurant subsets.
| Model | Laptop ACC | Laptop F1 | Restaurant ACC | Restaurant F1 | ACL14 ACC | ACL14 F1 |
|---|---|---|---|---|---|---|
| LSTM | 75.38 | 72.24 | 73.98 | 70.07 | 77.42 | 73.19 |
| CNN-LSTM | 76.51 | 73.02 | 74.21 | 70.56 | 78.51 | 74.23 |
| Tree-LSTM | 78.08 | 74.88 | 76.64 | 72.89 | 80.50 | 77.06 |
| BERT-LSTM | 80.92 | 76.73 | 80.48 | 74.90 | 81.54 | 77.96 |
| TG-HRecNN | 82.08 | 79.52 | 80.93 | 75.92 | 82.46 | 80.63 |
| TG-HTreeLSTM | 83.03 | 81.41 | 80.96 | 76.42 | 85.83 | 82.17 |
| TE-DCNN | 87.55 | 83.25 | 83.93 | 78.99 | 87.49 | 83.84 |
| Capsule-B | 88.32 | 84.23 | 85.09 | 80.41 | 91.38 | 85.85 |
| Self-Att | 86.51 | 82.42 | 83.79 | 78.64 | 86.92 | 82.74 |
| SBERT-att | 90.59 | 85.93 | 85.31 | 81.93 | 91.53 | 86.37 |
| CharAs-CBert | 92.19 | 87.03 | 86.22 | 82.96 | 92.88 | 87.54 |
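The ACC and F1 columns above follow standard definitions and can be computed from predictions directly; a minimal sketch with macro-averaged F1 over the three sentiment classes, using toy labels purely for illustration:

```python
import numpy as np

def accuracy(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float((y_true == y_pred).mean())

def macro_f1(y_true, y_pred, labels=(0, 1, 2)):
    """Macro-averaged F1 over the sentiment classes
    (e.g., 0 = positive, 1 = negative, 2 = neutral)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    f1s = []
    for c in labels:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return float(np.mean(f1s))

y_true = [0, 0, 1, 1, 2, 2, 0, 1]   # toy gold labels
y_pred = [0, 0, 1, 2, 2, 2, 0, 1]   # toy predictions
print(accuracy(y_true, y_pred))  # 0.875
```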
Figure 2. The performance of different sentence representation models, comparing the number of model parameters and floating-point operations (FLOPs).
Experimental results of different components; SemEval2014 includes the Laptop and Restaurant subsets. The ablation variants use the SAM, BIRM, and CharGCM components individually or in combination, or substitute alternatives for them: CharCNN in place of the proposed CharGCM (with SAM and BIRM retained), a plain RNN in place of BIRM, standard attention (Att) in place of SAM, and a bidirectional long short-term memory network (BiLSTM) in place of BIRM.
| Model | Laptop ACC | Laptop F1 | Restaurant ACC | Restaurant F1 | ACL14 ACC | ACL14 F1 |
|---|---|---|---|---|---|---|
| CharAs-CBert (…) | 86.35 | 82.40 | 81.34 | 77.65 | 86.01 | 83.48 |
| CharAs-CBert (…) | 87.12 | 82.33 | 82.05 | 77.79 | 87.54 | 83.65 |
| CharAs-CBert (…) | 88.04 | 82.82 | 82.59 | 78.73 | 88.02 | 83.72 |
| CharAs-CBert (…) | 88.62 | 83.50 | 83.07 | 79.53 | 88.08 | 84.13 |
| CharAs-CBert (…) | 88.84 | 83.62 | 83.29 | 80.06 | 88.40 | 85.50 |
| CharAs-CBert (…) | 89.85 | 84.17 | 83.43 | 80.65 | 88.64 | 85.54 |
| CharAs-CBert (…) | 90.02 | 84.75 | 83.97 | 80.67 | 88.72 | 85.92 |
| CharAs-CBert (…) | 90.03 | 85.48 | 84.24 | 81.04 | 89.62 | 86.32 |
| CharAs-CBert (…) | 91.30 | 85.65 | 84.67 | 82.03 | 90.62 | 86.42 |
| CharAs-CBert (…) | 91.47 | 85.69 | 84.92 | 82.19 | 91.14 | 86.71 |
| CharAs-CBert (…) | 91.96 | 85.75 | 85.44 | 82.56 | 92.27 | 86.92 |
| CharAs-CBert | 92.19 | 87.03 | 86.22 | 82.96 | 92.88 | 87.54 |
Figure 3. The performance of different loss functions on the Laptop dataset: multiclass cross-entropy loss, focal loss, and multiclass cross-entropy loss with class weights.
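The losses compared in Figure 3 can be written compactly. A sketch of class-weighted multiclass cross-entropy and focal loss over raw logits; the toy logits, class weights, and γ = 2 (a common default) are assumptions, as the exact settings are not specified here.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def weighted_ce(logits, y, class_w):
    """Multiclass cross-entropy with per-class weights, averaged over
    the batch; up-weighting rare classes counters class imbalance."""
    p = softmax(logits)
    w = class_w[y]
    return float(np.mean(-w * np.log(p[np.arange(len(y)), y] + 1e-12)))

def focal_loss(logits, y, gamma=2.0):
    """Focal loss: down-weights well-classified examples by (1 - p_t)^gamma."""
    p = softmax(logits)
    pt = p[np.arange(len(y)), y]
    return float(np.mean(-((1 - pt) ** gamma) * np.log(pt + 1e-12)))

logits = np.array([[2.0, 0.1, -1.0], [0.2, 1.5, 0.3], [0.0, 0.0, 3.0]])
y = np.array([0, 1, 2])
class_w = np.array([1.0, 1.5, 2.0])  # up-weight the rarer classes
print(weighted_ce(logits, y, class_w) > focal_loss(logits, y))  # True
```

Focal loss shrinks the contribution of confidently correct examples, which is why it reads lower than weighted cross-entropy on these easy toy examples.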
Figure 4. The performance of different numbers of BIRM and CharGCM layers on the Restaurant dataset. (a) Loss values for different numbers of layers; (b) performance for different numbers of layers.