
A compact representation of visual speech data using latent variables.

Ziheng Zhou, Xiaopeng Hong, Guoying Zhao, Matti Pietikäinen.

Abstract

The problem of visual speech recognition involves decoding the video dynamics of a talking mouth in a high-dimensional visual space. In this paper, we propose a generative latent variable model that provides a compact representation of visual speech data. The model uses latent variables to separately represent the inter-speaker variations of visual appearance and those caused by uttering within images, and it incorporates the structural information of the visual data by placing priors on the latent variables along a curve embedded within a path graph.
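The abstract's key structural idea is a prior placed along a curve derived from a path graph. A minimal sketch of that ingredient (not the authors' code; all names here are illustrative assumptions): the Laplacian eigenvectors of a path graph form a discrete-cosine-like basis, and its low-frequency components trace a smooth curve along which prior means for ordered latent variables could be placed.

```python
import numpy as np

def path_graph_laplacian(n):
    """Combinatorial Laplacian L = D - A of the path graph P_n,
    where node i is connected only to its neighbors i-1 and i+1."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0  # edge (i, i+1)
    A[idx + 1, idx] = 1.0  # symmetric edge (i+1, i)
    D = np.diag(A.sum(axis=1))
    return D - A

n = 8
L = path_graph_laplacian(n)

# Eigen-decomposition of the path-graph Laplacian; the eigenvectors
# form a cosine-like basis over the n ordered nodes.
vals, vecs = np.linalg.eigh(L)

# The two lowest non-constant eigenvectors give a smooth 2-D curve
# embedding of the n nodes, preserving their path ordering -- one
# plausible way to lay out prior means for the latent variables.
curve = vecs[:, 1:3]
```

This only illustrates the path-graph embedding; the paper's full model additionally specifies the generative mapping from latent variables to images, which is not reproduced here.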

MeSH:

Year:  2014        PMID: 24231875     DOI: 10.1109/TPAMI.2013.173

Source DB:  PubMed          Journal:  IEEE Trans Pattern Anal Mach Intell        ISSN: 0162-8828            Impact factor:   6.226


  2 in total

1.  End-to-End Sentence-Level Multi-View Lipreading Architecture with Spatial Attention Module Integrated Multiple CNNs and Cascaded Local Self-Attention-CTC.

Authors:  Sanghun Jeon; Mun Sang Kim
Journal:  Sensors (Basel)       Date:  2022-05-09       Impact factor: 3.847

2.  Unsupervised random forest for affinity estimation.

Authors:  Yunai Yi; Diya Sun; Peixin Li; Tae-Kyun Kim; Tianmin Xu; Yuru Pei
Journal:  Comput Vis Media (Beijing)       Date:  2021-12-06
