Baohua Zhang, Huaping Zhang, Jianyun Shang, Jiahao Cai.
Abstract
Understanding human sentiment from expressions is very important in human-robot interaction. However, deep learning models struggle to represent grammatical changes in natural language processing (NLP), especially in sentiment analysis, and this affects the robot's judgment of sentiment. This paper proposes a novel sentiment analysis model named MoLeSy, which augments neural networks with morphological, lexical, and syntactic knowledge. The model is built from three classical neural networks that run concurrently; their output vectors are concatenated and reduced by a single dense neural network layer. The models used in the three grammatical channels are convolutional neural networks (CNNs), long short-term memory (LSTM) networks, and fully connected dense neural networks, and their outputs are the morphological, lexical, and syntactic results, respectively. Experiments are conducted on four sentiment analysis corpora: Hotel, NLPCC2014, the Douban movie reviews dataset, and Weibo. MoLeSy achieves the best performance over previous state-of-the-art models, which indicates that morphological, lexical, and syntactic grammar can augment neural networks for sentiment analysis.
Keywords: augmentation; grammar; morphological; multi-channel CNN; sentiment analysis
Year: 2022 PMID: 35845762 PMCID: PMC9284271 DOI: 10.3389/fnbot.2022.897402
Source DB: PubMed Journal: Front Neurorobot ISSN: 1662-5218 Impact factor: 3.493
Figure 1. The relation of the three levels.
Figure 2. The architecture of MoLeSy.
Figure 3. The architecture of multichannel convolutional neural network (CNN).
Figure 4. The architecture of long short-term memory (LSTM).
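The fusion step described in the abstract, where the outputs of the three channels are concatenated and reduced by a single dense layer, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`dense`, `fuse_channels`), the single sigmoid output unit for a binary sentiment score, and the toy vector sizes are hypothetical, and the three input vectors merely stand in for the outputs of the CNN, LSTM, and dense sub-networks.

```python
import math


def dense(vector, weights, bias):
    """One fully connected layer; `weights` holds one row per output unit."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fuse_channels(morph_vec, lex_vec, syn_vec, weights, bias):
    """Concatenate the three channel outputs and reduce them with one dense layer.

    morph_vec: output of the CNN channel (morphological result)
    lex_vec:   output of the LSTM channel (lexical result)
    syn_vec:   output of the dense-network channel (syntactic result)
    The single sigmoid unit is an assumption made for this sketch.
    """
    fused = list(morph_vec) + list(lex_vec) + list(syn_vec)
    (logit,) = dense(fused, weights, bias)  # exactly one output unit expected
    return sigmoid(logit)


if __name__ == "__main__":
    # Toy channel outputs and hand-picked weights, for illustration only.
    score = fuse_channels(
        morph_vec=[0.2, 0.7],
        lex_vec=[0.5],
        syn_vec=[0.1, 0.9],
        weights=[[0.4, -0.3, 0.8, 0.1, 0.6]],
        bias=[0.0],
    )
    print(f"positive-sentiment score: {score:.3f}")
```

Because the fusion layer sees all three vectors at once, it can weight morphological, lexical, and syntactic evidence against each other, which is the motivation the abstract gives for combining the channels.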
Framework of ensemble learning for our system.
| Initialize |
| Initialize |
| Rwp = find relation word position (relation word) |
| Initialize |
| According to the type of punctuation in the simple sentence, set the corresponding position of |
The statistics of the datasets.
| Dataset | | | | | |
|---|---|---|---|---|---|
| Douban | 100,000 | 99 | 11 | 4.32 | 90,334 |
| Hotel | 10,000 | 1,985 | 4 | 8.74 | 9,861 |
| NLPCC2014 | 10,000 | 1,004 | 3 | 5.48 | 9,258 |
| Weibo | 119,988 | 260 | 1 | 4.23 | 100,146 |
The parameter setting.
| Parameter | Value |
|---|---|
| Max length of CNN and NN | 300 |
| Max length of LSTM | 200 |
| Embedding dim | 300 |
| Embedding method | Word2vec |
| CNN kernel size | 2, 3, 4 |
| The number of convolutional filters | 150 |
| CNN activation | ReLU |
| LSTM units | 128 |
| Optimizer | Adadelta |
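To make the CNN settings above concrete, the following plain-Python sketch shows how one convolution branch per kernel size (2, 3, 4) with ReLU activation could produce a single feature vector. It is an illustration under assumptions rather than the paper's code: max-over-time pooling is assumed because the table does not name a pooling method, the helper names (`conv_max_pool`, `multichannel_cnn`, `random_filter_banks`) are hypothetical, and the demo uses tiny dimensions instead of the listed 300-dimensional embeddings and 150 filters so that it runs quickly.

```python
import random

# Values taken from the parameter table.
KERNEL_SIZES = (2, 3, 4)
NUM_FILTERS = 150
EMBEDDING_DIM = 300


def relu(x):
    return x if x > 0.0 else 0.0


def conv_max_pool(sequence, filters, kernel_size):
    """Slide each filter over the sequence, apply ReLU, keep the largest response.

    sequence: list of token vectors, each of length dim
    filters:  list of flat weight lists, each of length kernel_size * dim
    Assumes len(sequence) >= kernel_size (inputs are padded to a fixed length).
    """
    windows = [
        [value for token in sequence[start:start + kernel_size] for value in token]
        for start in range(len(sequence) - kernel_size + 1)
    ]
    return [
        max(relu(sum(w * x for w, x in zip(flt, window))) for window in windows)
        for flt in filters
    ]


def multichannel_cnn(sequence, filter_banks):
    """Concatenate the pooled features of one convolution branch per kernel size."""
    features = []
    for kernel_size, filters in filter_banks.items():
        features.extend(conv_max_pool(sequence, filters, kernel_size))
    return features


def random_filter_banks(dim, num_filters, kernel_sizes=KERNEL_SIZES, seed=0):
    """Randomly initialised filters; a trained model would learn these weights."""
    rng = random.Random(seed)
    return {
        k: [[rng.uniform(-0.1, 0.1) for _ in range(k * dim)] for _ in range(num_filters)]
        for k in kernel_sizes
    }


if __name__ == "__main__":
    # Tiny demo: 6 tokens, 4-dimensional embeddings, 3 filters per kernel size.
    rng = random.Random(1)
    tokens = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(6)]
    banks = random_filter_banks(dim=4, num_filters=3)
    features = multichannel_cnn(tokens, banks)
    print(len(features))  # 3 kernel sizes x 3 filters = 9 features
```

With the table's values, the same construction would yield 3 x 150 = 450 features per input, since each kernel size contributes one pooled value per filter.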
The results of ablation experiments.
| Model | | | |
|---|---|---|---|
| MoLeSy | | | |
| MCNN_LSTM_NN | 91.60 | 84.55 | 75.49 |
| MCNN_S_LSTM | 91.45 | 84.61 | 75.20 |
| MCNN_LSTM | 91.00 | 84.30 | 74.20 |
| MCNN_NN | 90.50 | 84.50 | 72.50 |
| S_LSTM_NN | 90.50 | 83.94 | 72.90 |
| MCNN | 90.35 | 83.55 | 71.25 |
| S_LSTM | 89.90 | 83.20 | 73.50 |
| LSTM | 88.04 | 83.02 | 71.55 |
Bold values are the maximum number in each column.
The accuracy of different models on the Hotel and Douban.
| Model | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| CNN | 88.18 | 89.40 | 88.78 | 88.61 | 82.61 | 86.17 | 84.36 | 83.94 |
| LSTM | 87.79 | 89.70 | 88.74 | 88.40 | 86.01 | 80.54 | 83.23 | 83.64 |
| MCNNALSTM | 90.61 | 90.79 | 90.70 | 90.52 | 85.38 | 84.16 | 84.77 | 84.16 |
| SLCABG | 91.35 | 88.91 | 90.12 | 90.15 | 86.13 | 83.89 | 85.00 | 85.11 |
| ATTConv-RNN-rand | 92.45 | 88.42 | 90.43 | 90.55 | 86.19 | 84.85 | 85.51 | 85.55 |
| ABCDM | 89.48 | 92.67 | 91.05 | 90.80 | 84.20 | | 85.53 | 85.22 |
| SAMF_BiLSTM | | 87.52 | 90.30 | 90.50 | 84.72 | 85.49 | 85.10 | 84.96 |
| CNN-BiLSTM(SP) | 92.32 | 88.12 | 90.17 | 90.30 | 86.13 | 83.89 | 84.77 | 84.79 |
| MC-AttCNN-ArrBiGRU | 91.35 | 90.99 | 91.17 | 91.10 | 85.53 | 86.80 | 86.16 | 85.99 |
| MoLeSy | 90.74 | | | | 84.17 | | | |
Bold values are the maximum number in each column.
The accuracy of different models on NLPCC2014 and Weibo.
| Model | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| CNN | 78.32 | 67.67 | 72.57 | 73.56 | 91.26 | 96.85 | 93.97 | 93.77 |
| LSTM | 75.92 | 68.45 | 72.00 | 72.65 | 92.90 | 95.99 | 94.42 | 94.31 |
| MCNNALSTM | 74.36 | 75.50 | 74.93 | 73.80 | 94.03 | 95.66 | 94.84 | 94.78 |
| SLCABG | 79.08 | 71.46 | 75.08 | 75.40 | 96.19 | 96.96 | 96.57 | 96.57 |
| ATTConv-RNN-rand | | 67.60 | 73.98 | 75.35 | 95.10 | 97.11 | 96.10 | 96.06 |
| ABCDM | 78.59 | 73.96 | 76.20 | 76.05 | 97.42 | 96.82 | 97.12 | 97.13 |
| SAMF_BiLSTM | 74.82 | | 76.47 | 75.05 | 94.84 | 97.15 | 95.98 | 95.94 |
| CNN-BiLSTM(SP) | 78.32 | 71.07 | 74.52 | 74.80 | 94.04 | | 95.65 | 95.58 |
| MC-AttCNN-ArrBiGRU | 75.99 | 78.11 | 77.03 | 75.85 | 98.30 | 96.79 | 97.54 | 97.56 |
| MoLeSy | 77.56 | 77.34 | | | 96.73 | | | |
Bold values are the maximum number in each column.
Figure 5. The influence of different kernel sizes on the Hotel data set.
Figure 6. The accuracy at different epochs on the Hotel data set.
Figure 7. The time consumption based on the LSTM model.