STAC: Spatial-Temporal Attention on Compensation Information for Activity Recognition in FPV.

Yue Zhang^1,2,3, Shengli Sun^1,3, Linjian Lei^1,2,3,4, Huikai Liu^1,2,3, Hui Xie^1,2,3.

Abstract

Egocentric activity recognition in first-person video (FPV) requires fine-grained matching of the camera wearer's actions with the objects being manipulated. Traditional methods designed for third-person action recognition do not suffice because of (1) the background ego-noise introduced by the unstructured movement of wearable devices caused by body movement and (2) the small, fine-grained, single-scale objects in FPV. We perform size compensation to augment the data: it generates a multi-scale set of regions that includes objects of multiple sizes, leading to superior performance. We also compensate the optical flow to eliminate camera noise from the motion. We developed a novel two-stream convolutional neural network-recurrent attention neural network (CNN-RAN) architecture, spatial-temporal attention on compensation information (STAC), which generates generic descriptors under weak supervision and focuses on the locations of activated objects and on capturing effective motion. We encode the RGB features using a spatial location-aware attention mechanism to guide the representation of visual features. A similar location-aware channel attention is applied to the temporal stream, in the form of stacked optical flow, to implicitly select the relevant frames and attend to where the action occurs. The two streams are complementary, since one is object-centric and the other focuses on motion. We conducted extensive ablation analysis to validate the complementarity and effectiveness of our STAC model qualitatively and quantitatively. It achieved state-of-the-art performance on two egocentric datasets.
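
The two ideas named in the abstract, attention over spatial locations and compensation of camera motion in optical flow, can be illustrated with a minimal sketch. This is not the STAC implementation, since the abstract does not give its details. The function names `spatial_attention_pool` and `compensate_flow` are hypothetical, the attention scores are assumed to be given rather than learned, and subtracting the median flow is a simple stand-in for whatever compensation the authors use.

```python
import math
from statistics import median


def spatial_attention_pool(features, scores):
    """Pool per-location feature vectors with softmax attention weights.

    features: list of N feature vectors (one per spatial location), each of length C.
    scores:   list of N unnormalized attention scores (assumed given here;
              in a real network they would be predicted by a learned layer).
    Returns (weights, pooled), where weights sum to 1 and pooled has length C.
    """
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    channels = len(features[0])
    pooled = [
        sum(w * f[c] for w, f in zip(weights, features))
        for c in range(channels)
    ]
    return weights, pooled


def compensate_flow(flow):
    """Remove a crude global camera-motion estimate from optical flow.

    flow: list of (dx, dy) displacement pairs, one per pixel.
    Assumption: the per-component median approximates the camera motion,
    so subtracting it leaves mostly object motion.
    """
    median_dx = median(dx for dx, _ in flow)
    median_dy = median(dy for _, dy in flow)
    return [(dx - median_dx, dy - median_dy) for dx, dy in flow]


if __name__ == "__main__":
    weights, pooled = spatial_attention_pool(
        [[1.0, 0.0], [0.0, 1.0]], [2.0, 0.0]
    )
    print(weights, pooled)
    print(compensate_flow([(1.0, 0.0), (1.0, 0.0), (3.0, 2.0)]))
```

In this sketch, a location with a higher score contributes more to the pooled descriptor, and a pixel whose flow matches the dominant (camera) motion ends up with near-zero residual flow.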

Keywords: compensation information; egocentric video analysis; fine-grained activity recognition; location-aware attention

Year: 2021 PMID: 33562612 PMCID: PMC7914484 DOI: 10.3390/s21041106

Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576

2 in total

1. Long-Term Temporal Convolutions for Action Recognition.

Authors: Gul Varol; Ivan Laptev; Cordelia Schmid
Journal: IEEE Trans Pattern Anal Mach Intell Date: 2017-06-06 Impact factor: 6.226

2. Delving into Egocentric Actions.

Authors: Yin Li; Zhefan Ye; James M Rehg
Journal: Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit Date: 2015-06

1 in total

1. MEST: An Action Recognition Network with Motion Encoder and Spatio-Temporal Module.

Authors: Yi Zhang
Journal: Sensors (Basel) Date: 2022-09-01 Impact factor: 3.847
