Xiaodong Liu, Huating Xu, Miao Wang.
Abstract
To account for the emotional differences between regions of a video frame and to exploit the interrelationships between those regions, a region dual attention-based video emotion recognition method (RDAM) is proposed. RDAM takes video frame sequences as input and learns a discriminative video emotion representation that makes full use of both the emotional differences between regions and the interrelationships among them. Specifically, we construct two parallel attention modules. The first, the regional location attention module, generates a weight for each feature region that reflects its relative importance; based on these weights, an emotion feature sensitive to emotionally salient regions is produced. The second, the regional relationship attention module, generates a region relation matrix that encodes the interrelationships between the regions of a video frame; based on this matrix, an emotion feature that perceives the relationships between regions is produced. The outputs of the two attention modules are fused to yield the emotion feature of each video frame. The per-frame features are then fused by an attention-based fusion network to produce the final emotion feature of the video. Experimental results on video emotion recognition data sets show that the proposed method outperforms related works.
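The abstract describes two parallel region-attention branches whose outputs are fused per frame, followed by attention-based fusion over frames. The record does not give the exact formulations, so the following is only a minimal NumPy sketch under stated assumptions: an additive-attention score for the location branch, a scaled dot-product relation matrix for the relationship branch, summation as the branch fusion, and the same scoring reused for frame-level fusion. All dimensions and parameter names (R = 49 regions, D = 512, H = 128) are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def location_attention(F, W, v):
    """Regional location attention (assumed additive form):
    one scalar importance weight per region, then a weighted sum."""
    scores = np.tanh(F @ W) @ v              # (R,) one score per region
    w = softmax(scores)                      # (R,) region importance weights
    return w @ F                             # (D,) region-weighted feature

def relationship_attention(F, Wq, Wk):
    """Region relationship attention (assumed scaled dot-product form):
    an (R, R) relation matrix over regions, then relation-aware pooling."""
    Q, K = F @ Wq, F @ Wk                    # (R, H) each
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]), axis=-1)  # (R, R) relation matrix
    return (A @ F).mean(axis=0)              # (D,) relation-aware feature

# Illustrative sizes: 49 regions (e.g. a 7x7 ResNet-50 conv map), 512-d features.
R, D, H = 49, 512, 128
W  = rng.normal(scale=0.02, size=(D, H))
v  = rng.normal(scale=0.02, size=H)
Wq = rng.normal(scale=0.02, size=(D, H))
Wk = rng.normal(scale=0.02, size=(D, H))

# One frame: fuse the two branch outputs (summation assumed) into a frame feature.
frame_regions = rng.normal(size=(R, D))
frame_feat = (location_attention(frame_regions, W, v)
              + relationship_attention(frame_regions, Wq, Wk))

# Frame-level attention fusion across T frames, reusing the same scoring idea.
T = 16
frame_feats = rng.normal(size=(T, D))        # stand-in for T per-frame features
video_feat = location_attention(frame_feats, W, v)
```

The final `video_feat` would then feed a classifier over the emotion categories; the choice of additive versus dot-product scoring inside each module is an assumption of this sketch.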
Year: 2022 PMID: 35755752 PMCID: PMC9217593 DOI: 10.1155/2022/6096325
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1. Motivation of the region dual attention-based video emotion recognition model.
Figure 2. Framework of the region dual attention-based video emotion recognition model.
Figure 3. Region relationship attention module.
Evaluation of region dual attention mechanism on the MHED-I data set.
| Method | Accuracy (%) |
|---|---|
| ResNet-50 + average aggregation | 44.16 |
| ResNet-50 + regional location attention mechanism | 46.73 |
| ResNet-50 + regional relationship attention mechanism | 46.26 |
| ResNet-50 + regional dual attention mechanism | 47.43 |
Evaluation of region dual attention mechanism on the MHED-F data set.
| Method | Accuracy (%) |
|---|---|
| ResNet-50 + average aggregation | 52.34 |
| ResNet-50 + regional location attention mechanism | 58.88 |
| ResNet-50 + regional relationship attention mechanism | 58.41 |
| ResNet-50 + regional dual attention mechanism | 59.81 |
Evaluation of region dual attention mechanism on the HEIV-I data set.
| Method | Accuracy (%) |
|---|---|
| ResNet-50 + average aggregation | 42.22 |
| ResNet-50 + regional location attention mechanism | 44.69 |
| ResNet-50 + regional relationship attention mechanism | 44.19 |
| ResNet-50 + regional dual attention mechanism | 45.68 |
Evaluation of region dual attention mechanism on the HEIV-F data set.
| Method | Accuracy (%) |
|---|---|
| ResNet-50 + average aggregation | 44.94 |
| ResNet-50 + regional location attention mechanism | 49.38 |
| ResNet-50 + regional relationship attention mechanism | 49.14 |
| ResNet-50 + regional dual attention mechanism | 50.12 |
Top-1 accuracy (%) compared with related works on the MHED and HEIV data sets.
| Method | Accuracy on MHED (%) | Accuracy on HEIV (%) |
|---|---|---|
| Quality-aware network | 46.03 | 43.95 |
| Vielzeuf et al. | 53.73 | 45.93 |
| Chen et al. | 55.60 | 46.17 |
| Attention clusters | 59.81 | 49.63 |
| Our method | 65.89 | 53.09 |
Top-1 accuracy (%) compared with state-of-the-art methods on Ekman-6 and VideoEmotion-8.
| Method | Ekman-6 (%) | VideoEmotion-8 (%) |
|---|---|---|
| Emotion in context | 51.8 | 50.6 |
| Xu et al. | 50.4 | 46.7 |
| Kernelized feature | 54.4 | 49.7 |
| Concept selection | 54.40 | 50.82 |
| Graph-based network | 55.01 | 51.77 |
| Ours | 58.07 | 54.22 |
We also evaluate the performance of our method on the SFEW data set for cross-dataset validation.
Top-1 accuracy (%) compared with state-of-the-art methods on SFEW.
| Method | SFEW (%) |
|---|---|
| RAN | 56.40 |
| DDL | 59.86 |
| FDRL | 62.16 |
| Ours | 63.41 |