

AR3D: Attention Residual 3D Network for Human Action Recognition.

Min Dong^1,2, Zhenglin Fang^1, Yongfa Li^1, Sheng Bi^1,2,3, Jiangcheng Chen^3.

Abstract

In video-based human action recognition, deep neural networks currently fall into two main branches: 2D convolutional neural networks (CNNs) and 3D CNNs. A 2D CNN extracts temporal and spatial features independently of each other, so it tends to ignore the connection between them, which hurts recognition performance. A 3D CNN can extract the temporal and spatial features of a video sequence at the same time, but the number of parameters in the 3D model increases exponentially, making the model difficult to train and transfer. To address this problem, this article combines a 3D CNN with a residual structure and an attention mechanism to improve existing 3D CNN models, and proposes two human action recognition models: the Residual 3D Network (R3D) and the Attention Residual 3D Network (AR3D). First, we propose a shallow feature extraction module and improve the ordinary 3D residual structure, which reduces the number of parameters and strengthens the extraction of temporal features. Second, we explore the application of the attention mechanism to human action recognition and design a 3D spatio-temporal attention module to strengthen the extraction of global features of human action. Finally, to make full use of both the residual structure and the attention mechanism, we propose AR3D and describe in detail its two fusion strategies and the corresponding model structures (AR3D_V1 and AR3D_V2). Experiments show that the fused structures improve performance to varying degrees compared with either structure alone.
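To make the parameter-cost concern concrete, the following is a minimal sketch that counts the learnable parameters of a single 2D convolution layer and a single 3D convolution layer using the standard weight-plus-bias formula. The channel sizes and kernel shapes are illustrative assumptions, not values from the paper, and the helper `conv_params` is a hypothetical name introduced here. It only compares one layer; it does not reproduce the R3D or AR3D architectures.

```python
def conv_params(kernel, in_ch, out_ch, bias=True):
    """Count learnable parameters of one convolution layer.

    kernel: tuple of kernel sizes, e.g. (3, 3) for 2D or (3, 3, 3) for 3D.
    """
    weights = in_ch * out_ch
    for k in kernel:
        weights *= k  # each kernel dimension multiplies the weight count
    return weights + (out_ch if bias else 0)


# Illustrative channel sizes (assumed for this sketch, not from the paper).
in_ch, out_ch = 64, 64

params_2d = conv_params((3, 3), in_ch, out_ch)     # spatial kernel only
params_3d = conv_params((3, 3, 3), in_ch, out_ch)  # adds a temporal dimension

print(f"2D 3x3 conv:   {params_2d} parameters")
print(f"3D 3x3x3 conv: {params_3d} parameters")
print(f"3D / 2D ratio: {params_3d / params_2d:.2f}")
```

Under these assumptions, the extra temporal kernel dimension multiplies the weight count of each layer by the temporal kernel size, and the cost compounds across a deep network, which is one reason parameter-reducing designs such as the improved residual structure described above are attractive.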


Keywords: 3D; action recognition; attention mechanism; convolutional neural network; residual

Year: 2021 PMID: 33670835 DOI: 10.3390/s21051656

Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576

1 in total

1. MEST: An Action Recognition Network with Motion Encoder and Spatio-Temporal Module.

Authors: Yi Zhang
Journal: Sensors (Basel) Date: 2022-09-01 Impact factor: 3.847
