| Literature DB >> 30475711 |
Yu Kong, Zhiqiang Tao, Yun Fu.
Abstract
Different from after-the-fact action recognition, action prediction task requires action labels to be predicted from partially observed videos containing incomplete action executions. It is challenging because these partial videos have insufficient discriminative information, and their temporal structure is damaged. We study this problem in this paper, and propose an efficient and powerful deep network for learning representative and discriminative features for action prediction. Our approach exploits abundant sequential context information in full videos to enrich the feature representations of partial videos. This information is encoded in latent representations using a variational autoencoder (VAE), which are encouraged to be progress-invariant. Decoding such latent representations using another VAE, we can reconstruct missing information in the features extracted from partial videos. An adversarial learning scheme is adopted to differentiate the reconstructed features from the features directly extracted from full videos in order to well align their distributions. A multi-class classifier is also used to encourage the features to be discriminative. Our network jointly learns features and classifiers, and generates the features particularly optimized for action prediction. Extensive experimental results on UCF101, Sports-1M and BIT datasets demonstrate that our approach remarkably outperforms state-of-the-art methods, and shows significant speedup over these methods. Results also show that actions differ in their prediction characteristics; some actions can be correctly predicted even though only the beginning 10% portion of videos is observed.Entities:
Year: 2018 PMID: 30475711 DOI: 10.1109/TPAMI.2018.2882805
Source DB: PubMed Journal: IEEE Trans Pattern Anal Mach Intell ISSN: 0098-5589 Impact factor: 6.226