
Recurrent Spatial-Temporal Attention Network for Action Recognition in Videos.


Abstract

Recent years have witnessed the popularity of recurrent neural networks (RNNs) for action recognition in videos. However, videos are high-dimensional and contain rich human dynamics at various motion scales, which makes it difficult for traditional RNNs to capture complex action information. In this paper, we propose a novel recurrent spatial-temporal attention network (RSTAN) to address this challenge, in which a spatial-temporal attention mechanism adaptively identifies key features from the global video context for the RNN's prediction at every time step. More specifically, we make three main contributions. First, we reinforce the classical long short-term memory (LSTM) with a novel spatial-temporal attention module. At each time step, our module automatically learns a spatial-temporal action representation from all sampled video frames, which is compact and highly relevant to the prediction at the current step. Second, we design an attention-driven appearance-motion fusion strategy to integrate appearance and motion LSTMs into a unified framework, where the LSTMs and their spatial-temporal attention modules in the two streams can be jointly trained in an end-to-end fashion. Third, we develop actor-attention regularization for RSTAN, which guides our attention mechanism to focus on the important action regions around actors. We evaluate the proposed RSTAN on the benchmark UCF101, HMDB51 and JHMDB data sets. The experimental results show that RSTAN outperforms other recent RNN-based approaches on UCF101 and HMDB51 and achieves state-of-the-art performance on JHMDB.
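
The attention step described in the abstract (weighting features from all sampled frames according to the current recurrent state) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the additive (tanh) scoring function, the parameter names `W_f`, `W_h` and `v`, and all tensor sizes are assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()

def spatial_temporal_attention(features, h_prev, W_f, W_h, v):
    """Soft attention over every (frame, region) feature.

    features: (T, K, D) array, T sampled frames with K spatial regions each.
    h_prev:   (H,) previous hidden state of the recurrent unit.
    W_f, W_h, v: hypothetical scoring parameters (additive attention is an
                 assumption, not taken from the paper).
    """
    T, K, D = features.shape
    flat = features.reshape(T * K, D)
    # One relevance score per spatial-temporal location, conditioned on h_prev.
    scores = np.tanh(flat @ W_f + h_prev @ W_h) @ v
    weights = softmax(scores)
    # Compact representation: attention-weighted sum of all location features.
    context = weights @ flat
    return context, weights.reshape(T, K)

rng = np.random.default_rng(0)
T, K, D, H, A = 4, 9, 16, 8, 12  # illustrative sizes only
features = rng.standard_normal((T, K, D))
h_prev = rng.standard_normal(H)
W_f = 0.1 * rng.standard_normal((D, A))
W_h = 0.1 * rng.standard_normal((H, A))
v = rng.standard_normal(A)

context, weights = spatial_temporal_attention(features, h_prev, W_f, W_h, v)
```

In a recurrent setting, `context` would be fed to the LSTM cell as its input at the current step, and the resulting hidden state would condition the attention weights at the next step.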


Year:  2017        PMID: 29990061     DOI: 10.1109/TIP.2017.2778563

Source DB:  PubMed          Journal:  IEEE Trans Image Process        ISSN: 1057-7149            Impact factor:   10.856


  7 in total

1.  Multi-Label Activity Recognition using Activity-specific Features and Activity Correlations.

Authors:  Yanyi Zhang; Xinyu Li; Ivan Marsic
Journal:  Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit       Date:  2021-11-13

2.  Pathological-Gait Recognition Using Spatiotemporal Graph Convolutional Networks and Attention Model.

Authors:  Jungi Kim; Haneol Seo; Muhammad Tahir Naseem; Chan-Su Lee
Journal:  Sensors (Basel)       Date:  2022-06-27       Impact factor: 3.847

3.  Action Recognition Using Action Sequences Optimization and Two-Stream 3D Dilated Neural Network.

Authors:  Xin Xiong; Weidong Min; Qing Han; Qi Wang; Cheng Zha
Journal:  Comput Intell Neurosci       Date:  2022-06-13

4.  Complex networks and deep learning for EEG signal analysis (Review).

Authors:  Zhongke Gao; Weidong Dang; Xinmin Wang; Xiaolin Hong; Linhua Hou; Kai Ma; Matjaž Perc
Journal:  Cogn Neurodyn       Date:  2020-08-29       Impact factor: 3.473

5.  Dynamics of facial actions for assessing smile genuineness.

Authors:  Michal Kawulok; Jakub Nalepa; Jolanta Kawulok; Bogdan Smolka
Journal:  PLoS One       Date:  2021-01-05       Impact factor: 3.240

6.  STA-TSN: Spatial-Temporal Attention Temporal Segment Network for action recognition in video.

Authors:  Guoan Yang; Yong Yang; Zhengzhi Lu; Junjie Yang; Deyang Liu; Chuanbo Zhou; Zien Fan
Journal:  PLoS One       Date:  2022-03-17       Impact factor: 3.240

7.  Classification of Tetanus Severity in Intensive-Care Settings for Low-Income Countries Using Wearable Sensing.

Authors:  Ping Lu; Shadi Ghiasi; Jannis Hagenah; Ho Bich Hai; Nguyen Van Hao; Phan Nguyen Quoc Khanh; Le Dinh Van Khoa; Louise Thwaites; David A Clifton; Tingting Zhu
Journal:  Sensors (Basel)       Date:  2022-08-30       Impact factor: 3.847

