
Spectral and Temporal Envelope Cues for Human and Automatic Speech Recognition in Noise.

Guangxin Hu, Sarah C Determan, Yue Dong, Alec T Beeve, Joshua E Collins, Yan Gai.

Abstract

Acoustic features of speech include various spectral and temporal cues. It is known that the temporal envelope plays a critical role in speech recognition by human listeners, while automated speech recognition (ASR) relies heavily on spectral analysis. This study compared sentence-recognition scores of humans and an ASR software package, Dragon, when spectral and temporal-envelope cues were manipulated in background noise. Temporal fine structure of meaningful sentences was reduced by noise or tone vocoders. Three types of background noise were introduced: white noise, time-reversed multi-talker noise, and fake-formant noise. Spectral information was manipulated by changing the number of frequency channels. For human listeners, with a 20-dB signal-to-noise ratio (SNR) and four vocoding channels, white noise had a stronger disruptive effect than the fake-formant noise. The same observation was made with 22 channels when the SNR was lowered to 0 dB. In contrast, the ASR was unable to function with four vocoding channels even at a 20-dB SNR. Its performance was least affected by white noise and most affected by the fake-formant noise. Increasing the number of channels, which improved the spectral resolution, produced non-monotonic behavior for the ASR with white noise but not with colored noise. The ASR also showed greatly improved performance with tone vocoders. It is possible that fake-formant noise affected the software's performance by disrupting spectral cues, whereas white noise affected performance by compromising speech segmentation. Overall, these results suggest that human listeners and ASR use different listening strategies in noise.
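
As an illustration of the SNR conditions mentioned above (20 dB and 0 dB), the sketch below scales a noise signal so that the speech-to-noise level ratio matches a target value before the two are added. The abstract does not say how the stimuli were mixed, so this is only an assumption: SNR is taken as the ratio of RMS levels over the whole signal, a sine tone and Gaussian noise stand in for the speech and the white noise, and the names `rms`, `mix_at_snr` and `measured_snr_db` are hypothetical.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 20*log10(rms(speech)/rms(noise)) == snr_db.

    Returns (mixture, scaled_noise). Assumes an RMS-based SNR definition,
    which is an illustrative choice, not taken from the paper.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_rms / rms(noise)
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise


def measured_snr_db(speech, noise):
    """SNR in dB computed from the RMS levels of the two signals."""
    return 20 * math.log10(rms(speech) / rms(noise))


# Stand-in signals: a 440-Hz tone instead of a sentence, Gaussian white noise.
rng = random.Random(0)
fs = 16000
speech = [math.sin(2 * math.pi * 440 * t / fs) for t in range(fs)]
noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]

for snr in (20, 0):
    mixture, scaled = mix_at_snr(speech, noise, snr)
    print(snr, round(measured_snr_db(speech, scaled), 6))
```

At 0 dB the scaled noise has the same RMS level as the speech; at 20 dB its RMS level is one tenth of the speech level.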

Keywords:  automated speech recognition; formants; noise vocoding; spectral; speech recognition; speech segmentation; temporal; tone vocoding

Year:  2019        PMID: 31758279      PMCID: PMC7062952          DOI: 10.1007/s10162-019-00737-z

Source DB:  PubMed          Journal:  J Assoc Res Otolaryngol        ISSN: 1438-7573


