An audio-visual corpus for speech perception and automatic speech recognition.

Martin Cooke, Jon Barker, Stuart Cunningham, Xu Shao.

Abstract

An audio-visual corpus has been collected to support the use of common material in speech perception and automatic speech recognition studies. The corpus consists of high-quality audio and video recordings of 1000 sentences spoken by each of 34 talkers. Sentences are simple, syntactically identical phrases such as "place green at B 4 now". Intelligibility tests using the audio signals suggest that the material is easily identifiable in quiet and low levels of stationary noise. The annotated corpus is available on the web for research use.

Year: 2006 PMID: 17139705 DOI: 10.1121/1.2229005

Source DB: PubMed Journal: J Acoust Soc Am ISSN: 0001-4966 Impact factor: 1.840

Cited

16 in total

1. Speaker-dependent multipitch tracking using deep neural networks.

Authors: Yuzhou Liu; DeLiang Wang
Journal: J Acoust Soc Am Date: 2017-02 Impact factor: 1.840

2. Modulation transfer functions for audiovisual speech.

Authors: Nicolai F Pedersen; Torsten Dau; Lars Kai Hansen; Jens Hjortkjær
Journal: PLoS Comput Biol Date: 2022-07-19 Impact factor: 4.779

3. The effects of Lombard perturbation on speech intelligibility in noise for normal hearing and cochlear implant listeners.

Authors: Juliana N Saba; John H L Hansen
Journal: J Acoust Soc Am Date: 2022-02 Impact factor: 2.482

4. The natural statistics of audiovisual speech.

5. The Bluegrass corpus: Audio-visual stimuli to investigate foreign accents.

6. Explaining face-voice matching decisions: The contribution of mouth movements, stimulus effects and response biases.

7.

8. Matching novel face and voice identity using static and dynamic facial images.

9. Temporal Fine-Structure Coding and Lateralized Speech Perception in Normal-Hearing and Hearing-Impaired Listeners.

10. The contribution of visual information to the perception of speech in noise with and without informative temporal fine structure.

Authors: Paula C Stacey; Pádraig T Kitterick; Saffron D Morris; Christian J Sumner
Journal: Hear Res Date: 2016-04-13 Impact factor: 3.208