
Exploiting Images for Video Recognition: Heterogeneous Feature Augmentation via Symmetric Adversarial Learning.

Feiwu Yu, Xinxiao Wu, Jialu Chen, Lixin Duan.   

Abstract

Training deep models for video recognition usually requires sufficient labeled videos to achieve good performance without over-fitting. However, it is quite labor-intensive and time-consuming to collect and annotate a large number of videos. Moreover, training deep neural networks on large-scale video datasets always demands huge computational resources, which further holds back many researchers and practitioners. In contrast, collecting and training on annotated images is much easier. However, naively applying images to help recognize videos may result in noticeable performance degradation due to the well-known domain shift and feature heterogeneity. This paper proposes a novel symmetric adversarial learning approach for heterogeneous image-to-video adaptation, which augments deep image and video features by learning domain-invariant representations of source images and target videos. Primarily focusing on an unsupervised scenario where the labeled source images are accompanied by unlabeled target videos in the training phase, we present a data-driven approach to respectively learn augmented features of images and videos with superior transferability and distinguishability. Starting by learning a common feature space (called the image-frame feature space) between images and video frames, we then build new symmetric generative adversarial networks (Sym-GANs) in which one GAN maps image-frame features to video features and the other maps video features to image-frame features. Using the Sym-GANs, each source image feature is augmented with a generated video-specific representation to capture motion dynamics, while each target video feature is augmented with a generated image-specific representation to capture static appearance information. Finally, the augmented features from the source domain are fed into a network with fully connected layers for classification.
Thanks to an end-to-end training procedure for the Sym-GANs and the classification network, our approach achieves better results than state-of-the-art methods, as validated by experiments on two video datasets, i.e., the UCF101 and HMDB51 datasets.
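The augmentation scheme described in the abstract can be sketched minimally as follows. This is a hypothetical illustration, not the authors' implementation: the two Sym-GAN generators are stand-in linear maps (in the paper they are adversarially trained networks), and the feature dimensions are made up. The point is only how the generated cross-domain representations are concatenated so that source image features and target video features end up in one shared augmented space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature dimensions (the paper's actual sizes differ).
D_IMG = 64   # image-frame feature dimension
D_VID = 48   # video feature dimension

# Stand-ins for the two Sym-GAN generators: one maps image-frame features
# to video-specific representations, the other maps video features to
# image-specific representations. Here each is just a random linear map.
G_i2v = rng.standard_normal((D_IMG, D_VID))
G_v2i = rng.standard_normal((D_VID, D_IMG))

def augment_image_feature(f_img):
    """Augment a source image feature with its generated video-specific
    representation (intended to capture motion dynamics)."""
    # Order: [image-appearance part, video-dynamics part]
    return np.concatenate([f_img, f_img @ G_i2v])

def augment_video_feature(f_vid):
    """Augment a target video feature with its generated image-specific
    representation (intended to capture static appearance)."""
    # Same ordering, so both domains share one augmented feature space.
    return np.concatenate([f_vid @ G_v2i, f_vid])

f_img = rng.standard_normal(D_IMG)   # one source image feature
f_vid = rng.standard_normal(D_VID)   # one target video feature

aug_img = augment_image_feature(f_img)
aug_vid = augment_video_feature(f_vid)
# Both augmented features have dimension D_IMG + D_VID, so a classifier
# trained on augmented source image features can be applied directly to
# augmented target video features.
```

Because both augmented vectors live in the same (D_IMG + D_VID)-dimensional space with aligned semantics, the fully connected classifier trained on labeled source features transfers to the unlabeled target videos, which is the crux of the heterogeneous adaptation described above.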

Year:  2019        PMID: 31144637     DOI: 10.1109/TIP.2019.2917867

Source DB:  PubMed          Journal:  IEEE Trans Image Process        ISSN: 1057-7149            Impact factor:   10.856


Related articles: 3 in total

1.  Construction of Sports Training Performance Prediction Model Based on a Generative Adversarial Deep Neural Network Algorithm.

Authors:  Gang Li
Journal:  Comput Intell Neurosci       Date:  2022-05-21

2.  Feature Recognition of English Based on Deep Belief Neural Network and Big Data Analysis.

Authors:  Xiaoling Liu
Journal:  Comput Intell Neurosci       Date:  2021-07-13

3.  Analysis of Volleyball Video Intelligent Description Technology Based on Computer Memory Network and Attention Mechanism.

Authors:  Zhongzi Zhang
Journal:  Comput Intell Neurosci       Date:  2021-12-28
