Recurrent Spatial-Temporal Attention Network for Action Recognition in Videos.

Abstract

Recent years have witnessed the growing popularity of recurrent neural networks (RNNs) for action recognition in videos. However, videos are high-dimensional and contain rich human dynamics at various motion scales, which makes it difficult for traditional RNNs to capture complex action information. In this paper, we propose a novel recurrent spatial-temporal attention network (RSTAN) to address this challenge, introducing a spatial-temporal attention mechanism that adaptively identifies key features from the global video context for every time-step prediction of the RNN. More specifically, we make three main contributions. First, we augment the classical long short-term memory (LSTM) with a novel spatial-temporal attention module. At each time step, this module automatically learns a spatial-temporal action representation from all sampled video frames that is compact and highly relevant to the prediction at the current step. Second, we design an attention-driven appearance-motion fusion strategy to integrate appearance and motion LSTMs into a unified framework, where the LSTMs and their spatial-temporal attention modules in the two streams can be jointly trained in an end-to-end fashion. Third, we develop an actor-attention regularization for RSTAN, which guides the attention mechanism to focus on the important action regions around actors. We evaluate the proposed RSTAN on the UCF101, HMDB51, and JHMDB benchmark data sets. The experimental results show that RSTAN outperforms other recent RNN-based approaches on UCF101 and HMDB51 and achieves state-of-the-art results on JHMDB.
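The abstract describes attention computed over features from all sampled frames at each recurrent step. A minimal sketch of that general idea follows. It is not the authors' implementation: the additive (tanh) scoring function, the tensor shapes, and every name (`attend`, `W_f`, `W_h`, `v`) are assumptions made for illustration, and the random values stand in for CNN features and an LSTM hidden state.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a flat list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, hidden, W_f, W_h, v):
    """Soft attention over every (frame, region) position.

    features: T frames, each with K regions, each a D-vector (assumed layout).
    hidden:   H-vector standing in for the recurrent state at this step.
    W_f (D x A), W_h (H x A), v (A): hypothetical learned parameters.
    Returns a D-vector context and T*K attention weights.
    """
    A = len(v)
    # Project the hidden state once; it is shared by all positions.
    h_proj = [sum(hidden[i] * W_h[i][a] for i in range(len(hidden)))
              for a in range(A)]
    # Flatten time and space so attention spans the whole clip.
    flat = [region for frame in features for region in frame]
    scores = []
    for x in flat:
        f_proj = [sum(x[d] * W_f[d][a] for d in range(len(x)))
                  for a in range(A)]
        scores.append(sum(v[a] * math.tanh(f_proj[a] + h_proj[a])
                          for a in range(A)))
    weights = softmax(scores)
    D = len(flat[0])
    # Context vector: attention-weighted sum of all region features.
    context = [sum(w * x[d] for w, x in zip(weights, flat)) for d in range(D)]
    return context, weights


# Toy sizes (assumed): 4 frames, 9 regions, 8-dim features.
rng = random.Random(0)
T, K, D, H, A = 4, 9, 8, 6, 5


def rand(n):
    return [rng.uniform(-1, 1) for _ in range(n)]


features = [[rand(D) for _ in range(K)] for _ in range(T)]
hidden = rand(H)
W_f = [rand(A) for _ in range(D)]
W_h = [rand(A) for _ in range(H)]
v = rand(A)

context, weights = attend(features, hidden, W_f, W_h, v)
```

In a recurrent model, a context vector like this would typically be fed to the LSTM cell as its input for the current step, with the weights recomputed from the updated hidden state at the next step.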

Year: 2017 PMID: 29990061 DOI: 10.1109/TIP.2017.2778563

Source DB: PubMed Journal: IEEE Trans Image Process ISSN: 1057-7149 Impact factor: 10.856

Cited

7 in total

1. Multi-Label Activity Recognition using Activity-specific Features and Activity Correlations.

2. Pathological-Gait Recognition Using Spatiotemporal Graph Convolutional Networks and Attention Model.

3. Action Recognition Using Action Sequences Optimization and Two-Stream 3D Dilated Neural Network.

4. Complex networks and deep learning for EEG signal analysis. (Review)

5. Dynamics of facial actions for assessing smile genuineness.

6. STA-TSN: Spatial-Temporal Attention Temporal Segment Network for action recognition in video.

7. Classification of Tetanus Severity in Intensive-Care Settings for Low-Income Countries Using Wearable Sensing.