Recognizing articulatory gestures from speech for robust speech recognition.

Vikramjit Mitra, Hosung Nam, Carol Espy-Wilson, Elliot Saltzman, Louis Goldstein.

Abstract

Studies have shown that supplementary articulatory information can help to improve the recognition rate of automatic speech recognition systems. Unfortunately, articulatory information is not directly observable, necessitating its estimation from the speech signal. This study describes a system that recognizes articulatory gestures from speech and uses the recognized gestures in a speech recognition system. Recognizing gestures for a given utterance involves recovering the set of underlying gestural activations and their associated dynamic parameters. This paper proposes a neural network architecture for recognizing articulatory gestures from speech and presents ways to incorporate articulatory gestures for a digit recognition task. The lack of a natural speech database containing gestural information prompted us to use three stages of evaluation. First, the proposed gestural annotation architecture was tested on a synthetic speech dataset, which showed that the use of estimated tract-variable time functions improved gesture recognition performance. In the second stage, gesture-recognition models were applied to natural speech waveforms, and word recognition experiments revealed that the recognized gestures can improve the noise-robustness of a word recognition system. In the final stage, a gesture-based Dynamic Bayesian Network was trained, and the results indicate that incorporating gestural information can improve word recognition performance compared to acoustic-only systems.
© 2012 Acoustical Society of America

Year: 2012    PMID: 22423722    DOI: 10.1121/1.3682038

Source DB: PubMed    Journal: J Acoust Soc Am    ISSN: 0001-4966    Impact factor: 1.840


  6 in total

1.  Spatio-temporal articulatory movement primitives during speech production: extraction, interpretation, and validation.

Authors:  Vikram Ramanarayanan; Louis Goldstein; Shrikanth S Narayanan
Journal:  J Acoust Soc Am       Date:  2013-08       Impact factor: 1.840

2.  Methods for eliciting, annotating, and analyzing databases for child speech development.

Authors:  Mary E Beckman; Andrew R Plummer; Benjamin Munson; Patrick F Reidy
Journal:  Comput Speech Lang       Date:  2017-09       Impact factor: 1.899

3.  Directly data-derived articulatory gesture-like representations retain discriminatory information about phone categories.

Authors:  Vikram Ramanarayanan; Maarten Van Segbroeck; Shrikanth S Narayanan
Journal:  Comput Speech Lang       Date:  2015-03-21       Impact factor: 1.899

4.  The Type of Noise Influences Quality Ratings for Noisy Speech in Hearing Aid Users.

Authors:  Emily M H Lundberg; Song Hui Chon; James M Kates; Melinda C Anderson; Kathryn H Arehart
Journal:  J Speech Lang Hear Res       Date:  2020-11-30       Impact factor: 2.297

5.  The Role of Temporal Modulation in Sensorimotor Interaction.

Authors:  Louis Goldstein
Journal:  Front Psychol       Date:  2019-12-06

6.  Modeling speech imitation and ecological learning of auditory-motor maps.

Authors:  Claudia Canevari; Leonardo Badino; Alessandro D'Ausilio; Luciano Fadiga; Giorgio Metta
Journal:  Front Psychol       Date:  2013-06-27