Hao Liu, Jiwen Lu, Jianjiang Feng, Jie Zhou.
Abstract
In this paper, we propose a two-stream transformer networks (TSTN) approach for video-based face alignment. Conventional image-based face alignment approaches cannot explicitly model the temporal dependency in videos. Motivated by the fact that consistent movements of facial landmarks usually occur across consecutive frames, our TSTN aims to capture the complementary information of both the spatial appearance in still frames and the temporal consistency across frames. To achieve this, we develop a two-stream architecture that decomposes video-based face alignment into spatial and temporal streams. Specifically, the spatial stream transforms the facial image into landmark positions while preserving the holistic facial shape structure. The temporal stream encodes the video input as active appearance codes, which capture the temporal consistency across frames to help refine the shape. Experimental results on benchmark video-based face alignment datasets show that our method is very competitive with the state of the art.
Year: 2017 PMID: 28783622 DOI: 10.1109/TPAMI.2017.2734779
Source DB: PubMed Journal: IEEE Trans Pattern Anal Mach Intell ISSN: 0098-5589 Impact factor: 6.226