
Action-Stage Emphasized Spatio-Temporal VLAD for Video Action Recognition.

Zhigang Tu, Hongyan Li, Dejun Zhang, Justin Dauwels, Baoxin Li, Junsong Yuan.   

Abstract

Despite outstanding performance in image recognition, convolutional neural networks (CNNs) do not yet achieve comparably impressive results on action recognition in videos. This is partially due to the inability of CNNs to model long-range temporal structures, especially those involving the individual action stages that are critical to human action recognition. In this paper, we propose a novel action-stage (ActionS) emphasized spatio-temporal Vector of Locally Aggregated Descriptors (ActionS-ST-VLAD) method to aggregate informative deep features across the entire video according to adaptive video feature segmentation and adaptive segment feature sampling (AVFS-ASFS). In our ActionS-ST-VLAD encoding approach, AVFS-ASFS selects the key-frame features and automatically splits the corresponding deep features into segments, with the features in each segment belonging to a temporally coherent ActionS. Then, based on the extracted key-frame feature in each segment, a flow-guided warping technique is introduced to detect and discard redundant feature maps, while the informative ones are aggregated using our proposed similarity weight. Furthermore, we exploit an RGBF modality to capture motion-salient regions in the RGB images corresponding to action activity. Extensive experiments are conducted on four public benchmarks - HMDB51, UCF101, Kinetics and ActivityNet - for evaluation. Results show that our method is able to effectively pool useful deep features spatio-temporally, leading to state-of-the-art performance for video-based action recognition.
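The core of any VLAD-style encoder like the one described above is summing (optionally weighted) residuals of local descriptors against a learned codebook. A minimal sketch in NumPy, assuming the paper's similarity weights act as per-feature importance weights (the weight computation and flow-guided warping themselves are not reproduced here):

```python
import numpy as np

def vlad_aggregate(features, centers, weights=None):
    """Weighted VLAD encoding: sum weighted residuals of each descriptor
    to its nearest codebook center, then normalize.

    features: (N, D) local deep descriptors
    centers:  (K, D) codebook (e.g. k-means cluster centers)
    weights:  optional (N,) per-feature weights; assumed stand-in for the
              paper's similarity weights (hypothetical simplification)
    returns:  (K*D,) L2-normalized VLAD vector
    """
    n, d = features.shape
    k = centers.shape[0]
    if weights is None:
        weights = np.ones(n)
    # hard-assign each descriptor to its nearest center
    dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
    assign = np.argmin(dists, axis=1)
    vlad = np.zeros((k, d))
    for i in range(n):
        c = assign[i]
        vlad[c] += weights[i] * (features[i] - centers[c])
    # intra-normalization per center, then global L2 normalization
    norms = np.linalg.norm(vlad, axis=1, keepdims=True)
    vlad = vlad / np.maximum(norms, 1e-12)
    v = vlad.ravel()
    return v / max(np.linalg.norm(v), 1e-12)
```

In practice the codebook would be learned over sampled deep features, and the weights would come from the key-frame similarity measure rather than being supplied directly.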


Year:  2019        PMID: 30605101     DOI: 10.1109/TIP.2018.2890749

Source DB:  PubMed          Journal:  IEEE Trans Image Process        ISSN: 1057-7149            Impact factor:   10.856


Similar articles: 5 in total

1.  Action Recognition Using Action Sequences Optimization and Two-Stream 3D Dilated Neural Network.

Authors:  Xin Xiong; Weidong Min; Qing Han; Qi Wang; Cheng Zha
Journal:  Comput Intell Neurosci       Date:  2022-06-13

2.  An Efficient Human Instance-Guided Framework for Video Action Recognition.

Authors:  Inwoong Lee; Doyoung Kim; Dongyoon Wee; Sanghoon Lee
Journal:  Sensors (Basel)       Date:  2021-12-12       Impact factor: 3.576

3.  Motion-compensated online object tracking for activity detection and crowd behavior analysis.

Authors:  Ashish Singh Patel; Ranjana Vyas; O P Vyas; Muneendra Ojha; Vivek Tiwari
Journal:  Vis Comput       Date:  2022-04-13       Impact factor: 2.601

4.  A Video-Based DT-SVM School Violence Detecting Algorithm.

Authors:  Liang Ye; Le Wang; Hany Ferdinando; Tapio Seppänen; Esko Alasaarela
Journal:  Sensors (Basel)       Date:  2020-04-03       Impact factor: 3.576

5.  Development and Validation of a 3-Dimensional Convolutional Neural Network for Automatic Surgical Skill Assessment Based on Spatiotemporal Video Analysis.

Authors:  Daichi Kitaguchi; Nobuyoshi Takeshita; Hiroki Matsuzaki; Takahiro Igaki; Hiro Hasegawa; Masaaki Ito
Journal:  JAMA Netw Open       Date:  2021-08-02
