
Deep Image-to-Video Adaptation and Fusion Networks for Action Recognition.

Yang Liu, Zhaoyang Lu, Jing Li, Tao Yang, Chao Yao.   

Abstract

Existing deep learning methods for action recognition in videos require a large number of labeled videos for training, and annotating them is labor-intensive and time-consuming. For the same action, the knowledge learned from different media types, e.g., videos and images, may be related and complementary. However, due to the domain shifts and heterogeneous feature representations between videos and images, the performance of classifiers trained on images may degrade dramatically when they are deployed directly on videos. In this paper, we propose a novel method, named Deep Image-to-Video Adaptation and Fusion Networks (DIVAFN), to enhance action recognition in videos by transferring knowledge from images using video keyframes as a bridge. DIVAFN is a unified deep learning model that integrates domain-invariant representation learning and cross-modal feature fusion into a single optimization framework. Specifically, we design an efficient cross-modal similarity metric to reduce the modality shift among images, keyframes and videos. Then, we adopt an autoencoder architecture whose hidden layer is constrained to be the semantic representations of the action class names. In this way, when the autoencoder is used to project the learned features from different domains into the same space, more compact, informative and discriminative representations can be obtained. Finally, the concatenation of the learned semantic feature representations from these three autoencoders is used to train the classifier for action recognition in videos. Comprehensive experiments on four real-world datasets show that our method outperforms some state-of-the-art domain adaptation and action recognition methods.
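The last two steps of the abstract (autoencoders whose hidden layer is tied to class-name semantics, followed by concatenation of the three codes) can be illustrated with a small toy sketch. This is not the authors' implementation. It assumes linear encoders and decoders fitted by ridge regression, random vectors standing in for word embeddings of the class names, synthetic features for the three domains, and a nearest-prototype rule in place of the trained classifier. All names (`SemanticAutoencoder`, `fit_ridge`, `fuse`) are hypothetical, and the cross-modal similarity metric is omitted.

```python
import numpy as np

def fit_ridge(A, B, ridge=1e-2):
    """Closed-form solution of min_W ||A W - B||^2 + ridge * ||W||^2."""
    d = A.shape[1]
    return np.linalg.solve(A.T @ A + ridge * np.eye(d), A.T @ B)

class SemanticAutoencoder:
    """Linear autoencoder whose hidden code is pinned to class-name embeddings.

    Assumption: with the hidden layer fixed to S, encoder and decoder
    decouple into two ridge regressions. The paper's networks are deep.
    """
    def fit(self, X, S):
        self.enc = fit_ridge(X, S)  # features -> semantic space
        self.dec = fit_ridge(S, X)  # semantic space -> features
        return self

    def encode(self, X):
        return X @ self.enc

    def reconstruct(self, X):
        return self.encode(X) @ self.dec

def fuse(codes):
    """Concatenate per-domain semantic codes along the feature axis."""
    return np.concatenate(codes, axis=1)

rng = np.random.default_rng(0)
n_classes, k, n = 3, 4, 60
class_emb = rng.normal(size=(n_classes, k))  # stand-in for class-name word vectors
labels = rng.integers(0, n_classes, size=n)
S = class_emb[labels]                        # target hidden code per sample

# Synthetic features for the three domains (dimensions are arbitrary).
dims = {"image": 8, "keyframe": 8, "video": 12}
views = {
    name: S @ rng.normal(size=(k, d)) + 0.1 * rng.normal(size=(n, d))
    for name, d in dims.items()
}

encoders = {name: SemanticAutoencoder().fit(X, S) for name, X in views.items()}
fused = fuse([encoders[name].encode(views[name]) for name in dims])

# Stand-in classifier: nearest class prototype in the fused space.
prototypes = np.tile(class_emb, (1, len(dims)))
dists = ((fused[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
pred = dists.argmin(axis=1)
accuracy = (pred == labels).mean()
```

Because every domain is projected into the same semantic space, the codes share one coordinate system and can be concatenated directly, which is the property the abstract relies on for fusion.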

Year:  2019        PMID: 31831421     DOI: 10.1109/TIP.2019.2957930

Source DB:  PubMed          Journal:  IEEE Trans Image Process        ISSN: 1057-7149            Impact factor:   10.856


  2 in total

1.  Action Recognition Using Action Sequences Optimization and Two-Stream 3D Dilated Neural Network.

Authors:  Xin Xiong; Weidong Min; Qing Han; Qi Wang; Cheng Zha
Journal:  Comput Intell Neurosci       Date:  2022-06-13

2.  Development and Validation of a 3-Dimensional Convolutional Neural Network for Automatic Surgical Skill Assessment Based on Spatiotemporal Video Analysis.

Authors:  Daichi Kitaguchi; Nobuyoshi Takeshita; Hiroki Matsuzaki; Takahiro Igaki; Hiro Hasegawa; Masaaki Ito
Journal:  JAMA Netw Open       Date:  2021-08-02
