Spectral and Temporal Envelope Cues for Human and Automatic Speech Recognition in Noise.

Guangxin Hu¹, Sarah C Determan¹, Yue Dong¹, Alec T Beeve¹, Joshua E Collins¹, Yan Gai².

Abstract

Acoustic features of speech include various spectral and temporal cues. The temporal envelope is known to play a critical role in speech recognition by human listeners, whereas automated speech recognition (ASR) relies heavily on spectral analysis. This study compared the sentence-recognition scores of human listeners and an ASR program, Dragon, when spectral and temporal-envelope cues were manipulated in background noise. The temporal fine structure of meaningful sentences was reduced by noise or tone vocoders. Three types of background noise were introduced: white noise, time-reversed multi-talker noise, and fake-formant noise. Spectral information was manipulated by changing the number of frequency channels. For human listeners, with a 20-dB signal-to-noise ratio (SNR) and four vocoding channels, white noise had a stronger disruptive effect than fake-formant noise. The same pattern was observed with 22 channels when the SNR was lowered to 0 dB. In contrast, the ASR was unable to function with four vocoding channels even at a 20-dB SNR. Its performance was least affected by white noise and most affected by fake-formant noise. Increasing the number of channels, which improved spectral resolution, produced non-monotonic behavior for the ASR with white noise but not with colored noise. The ASR also showed markedly improved performance with tone vocoders. It is possible that fake-formant noise affected the software's performance by disrupting spectral cues, whereas white noise affected performance by compromising speech segmentation. Overall, these results suggest that human listeners and ASR use different listening strategies in noise.
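The 20-dB and 0-dB SNR conditions named in the abstract can be illustrated with a small sketch of how a noise signal is scaled before being added to speech. This is not the authors' procedure: the function names `mean_power`, `mix_at_snr`, and `measured_snr_db` are hypothetical, the 440-Hz sine is only a stand-in for a recorded sentence, and the 16-kHz sampling rate is an assumption.

```python
import math
import random


def mean_power(x):
    """Average power of a signal (mean of squared samples)."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`,
    then add it to `speech`. Returns (mixture, scaled_noise)."""
    target_noise_power = mean_power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    scaled = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled)]
    return mixture, scaled


def measured_snr_db(speech, noise):
    """SNR in dB computed from the two separate signals."""
    return 10 * math.log10(mean_power(speech) / mean_power(noise))


# Stand-in signals (assumptions, not the study's stimuli).
rng = random.Random(0)
fs = 16000  # assumed sampling rate in Hz
speech = [math.sin(2 * math.pi * 440 * t / fs) for t in range(fs)]
white = [rng.gauss(0, 1) for _ in range(fs)]  # white Gaussian noise

mix20, noise20 = mix_at_snr(speech, white, 20.0)  # 20-dB SNR condition
mix0, noise0 = mix_at_snr(speech, white, 0.0)     # 0-dB SNR condition
```

At 0 dB the scaled noise carries the same average power as the speech; at 20 dB its power is one hundredth of the speech power.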

Keywords: automated speech recognition; formants; noise vocoding; spectral; speech recognition; speech segmentation; temporal; tone vocoding

Year: 2019 PMID: 31758279 PMCID: PMC7062952 DOI: 10.1007/s10162-019-00737-z

Source DB: PubMed Journal: J Assoc Res Otolaryngol ISSN: 1438-7573
