The role of binary mask patterns in automatic speech recognition in background noise.

Abstract

Processing noisy signals using the ideal binary mask improves automatic speech recognition (ASR) performance. This paper presents the first study that investigates the role of binary mask patterns in ASR under various noises, signal-to-noise ratios (SNRs), and vocabulary sizes. Binary masks are computed either by comparing the SNR within a time-frequency unit of a mixture signal with a local criterion (LC), or by comparing the local target energy with the long-term average spectral energy of speech. ASR results show that (1) akin to human speech recognition, binary masking significantly improves ASR performance even when the SNR is as low as -60 dB; (2) the ASR performance profiles are qualitatively similar to those obtained in human intelligibility experiments; (3) the difference between the LC and the mixture SNR is more strongly correlated with recognition accuracy than the LC itself; (4) the LC at which performance peaks is lower than 0 dB, which is the threshold that maximizes the SNR gain of processed signals. This broad agreement with human performance is rather surprising. The results also indicate that maximizing the SNR gain is probably not an appropriate goal for improving either human or machine recognition of noisy speech.
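The two mask definitions named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `energy[channel][frame]` layout, the `ltas` argument (an assumed per-channel long-term average speech energy), and the strict "greater than" comparison are all assumptions made for illustration.

```python
import math


def _exceeds_db(numerator, denominator, lc_db):
    """True when 10*log10(numerator/denominator) is strictly above lc_db."""
    if numerator <= 0.0:
        return False          # no target energy: never keep the unit
    if denominator <= 0.0:
        return True           # target present, reference absent: always keep
    return 10.0 * math.log10(numerator / denominator) > lc_db


def ideal_binary_mask(target_energy, noise_energy, lc_db=0.0):
    """Mask from the local SNR of each time-frequency unit (assumed layout:
    energy[channel][frame]). A unit is 1 when its SNR in dB exceeds lc_db."""
    return [
        [1 if _exceeds_db(t, n, lc_db) else 0 for t, n in zip(t_row, n_row)]
        for t_row, n_row in zip(target_energy, noise_energy)
    ]


def target_binary_mask(target_energy, ltas, lc_db=0.0):
    """Mask from the target alone: each unit's energy is compared with the
    long-term average speech energy of its channel (ltas[channel], assumed
    to be supplied by the caller)."""
    return [
        [1 if _exceeds_db(t, ref, lc_db) else 0 for t in t_row]
        for t_row, ref in zip(target_energy, ltas)
    ]


# Toy 2-channel, 2-frame example with made-up energies.
target = [[4.0, 0.5], [1.0, 9.0]]
noise = [[1.0, 1.0], [1.0, 1.0]]

print(ideal_binary_mask(target, noise, lc_db=0.0))
print(ideal_binary_mask(target, noise, lc_db=-6.0))
print(target_binary_mask(target, ltas=[1.0, 3.0], lc_db=0.0))
```

Lowering `lc_db` retains more units in the mask, which is the knob the abstract refers to when it reports that recognition performance peaks at an LC below 0 dB.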

Year: 2013; PMID: 23654411; PMCID: PMC4109294; DOI: 10.1121/1.4798661

Source DB: PubMed; Journal: J Acoust Soc Am; ISSN: 0001-4966; Impact factor: 1.840

12 in total

1. Design, optimization and evaluation of a Danish sentence test in noise.

2. Intelligibility of reverberant noisy speech with ideal binary masking.

3. Robust speech recognition from binary masks.

4. A glimpsing model of speech perception in noise.

5. Isolating the energetic component of speech-on-speech masking with ideal time-frequency segregation.

6. Determination of the potential benefit of time-frequency gain manipulation.

7. Factors influencing intelligibility of ideal binary-masked speech: implications for noise reduction.

8. Speech perception of noise with binary gains.

9. Role of mask pattern in intelligibility of ideal binary-masked noisy speech.

10. Speech intelligibility in background noise with ideal binary time-frequency masking.

1. On Training Targets for Supervised Speech Separation.

2. Improving Robustness of Deep Neural Network Acoustic Models via Speech Separation and Joint Adaptive Training.