
The role of binary mask patterns in automatic speech recognition in background noise.

Arun Narayanan, DeLiang Wang.

Abstract

Processing noisy signals using the ideal binary mask improves automatic speech recognition (ASR) performance. This paper presents the first study that investigates the role of binary mask patterns in ASR under various noises, signal-to-noise ratios (SNRs), and vocabulary sizes. Binary masks are computed either by comparing the SNR within a time-frequency unit of a mixture signal with a local criterion (LC), or by comparing the local target energy with the long-term average spectral energy of speech. ASR results show that (1) akin to human speech recognition, binary masking significantly improves ASR performance even when the SNR is as low as -60 dB; (2) the ASR performance profiles are qualitatively similar to those obtained in human intelligibility experiments; (3) the difference between the LC and the mixture SNR is more strongly correlated with recognition accuracy than the LC itself; (4) the LC at which performance peaks is lower than 0 dB, which is the threshold that maximizes the SNR gain of processed signals. This broad agreement with human performance is rather surprising. The results also indicate that maximizing the SNR gain is probably not an appropriate goal for improving either human or machine recognition of noisy speech.
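The two mask definitions in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that energies are stored as `energy[channel][frame]`, that the criterion is expressed in dB, and that the long-term average spectral energy is supplied per frequency channel. The function names `ideal_binary_mask` and `target_binary_mask` and the parameter `ltas` are hypothetical and chosen only for illustration.

```python
import math


def _exceeds(numerator, denominator, lc_db):
    """Return 1 if 10*log10(numerator/denominator) > lc_db, else 0."""
    if numerator <= 0.0:
        return 0  # no target energy: discard the unit
    if denominator <= 0.0:
        return 1  # no competing energy: keep the unit
    return 1 if 10.0 * math.log10(numerator / denominator) > lc_db else 0


def ideal_binary_mask(target_energy, noise_energy, lc_db=0.0):
    """Keep a time-frequency unit when its local SNR (dB) exceeds LC.

    Assumption: both inputs are lists of per-channel lists of frame energies.
    """
    return [
        [_exceeds(t, n, lc_db) for t, n in zip(t_row, n_row)]
        for t_row, n_row in zip(target_energy, noise_energy)
    ]


def target_binary_mask(target_energy, ltas, lc_db=0.0):
    """Keep a unit when the local target energy exceeds the long-term
    average speech energy of its channel by more than LC (dB).

    Assumption: ltas[c] is the long-term average energy for channel c.
    """
    return [
        [_exceeds(t, ref, lc_db) for t in t_row]
        for t_row, ref in zip(target_energy, ltas)
    ]


# Toy example: 2 channels x 2 frames, unit noise energy everywhere.
target = [[4.0, 1.0], [0.5, 10.0]]
noise = [[1.0, 1.0], [1.0, 1.0]]
ibm_0db = ideal_binary_mask(target, noise, lc_db=0.0)
ibm_m6db = ideal_binary_mask(target, noise, lc_db=-6.0)
tbm = target_binary_mask(target, ltas=[2.0, 5.0], lc_db=0.0)
```

In this toy example, lowering the LC from 0 dB to -6 dB retains more units. The first mask depends on the noise through the local SNR, while the second depends only on the target and the per-channel speech reference.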


Year:  2013        PMID: 23654411      PMCID: PMC4109294          DOI: 10.1121/1.4798661

Source DB:  PubMed          Journal:  J Acoust Soc Am        ISSN: 0001-4966            Impact factor:   1.840


