
Time-Frequency Masking in the Complex Domain for Speech Dereverberation and Denoising.

Donald S Williamson, DeLiang Wang.

Abstract

In real-world situations, speech is masked by both background noise and reverberation, which negatively affect perceptual quality and intelligibility. In this paper, we address monaural speech separation in reverberant and noisy environments. We perform dereverberation and denoising using supervised learning with a deep neural network. Specifically, we enhance the magnitude and phase by performing separation with an estimate of the complex ideal ratio mask. We define the complex ideal ratio mask so that direct speech results after the mask is applied to reverberant and noisy speech. Our approach is evaluated using simulated and real room impulse responses, and with background noises. The proposed approach improves objective speech quality and intelligibility significantly. Evaluations and comparisons show that it outperforms related methods in many reverberant and noisy environments.
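To make the definition above concrete, the following is a minimal sketch of a complex ideal ratio mask computed by complex division in the time-frequency domain, so that multiplying the mask with the reverberant and noisy spectrogram returns the direct-speech spectrogram. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `eps` stabiliser, and the random arrays standing in for STFT frames are all hypothetical, and the abstract does not describe the network, features, or any mask compression used in the paper.

```python
import numpy as np


def complex_ideal_ratio_mask(direct_stft, mixture_stft, eps=1e-8):
    """Return a complex mask M with M * mixture_stft ~= direct_stft.

    direct_stft  : complex STFT of the direct (target) speech
    mixture_stft : complex STFT of the reverberant and noisy speech
    eps          : small constant (an assumption) to avoid division by zero
    """
    y_r, y_i = mixture_stft.real, mixture_stft.imag
    s_r, s_i = direct_stft.real, direct_stft.imag
    denom = y_r ** 2 + y_i ** 2 + eps
    # Real and imaginary parts of the complex quotient S / Y.
    m_r = (y_r * s_r + y_i * s_i) / denom
    m_i = (y_r * s_i - y_i * s_r) / denom
    return m_r + 1j * m_i


def apply_mask(mask, mixture_stft):
    """Apply the mask by element-wise complex multiplication."""
    return mask * mixture_stft


# Hypothetical stand-ins for STFT matrices (frequency bins x frames).
rng = np.random.default_rng(0)
shape = (4, 5)
direct = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
interference = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
mixture = direct + interference  # stands in for reverberation plus noise

mask = complex_ideal_ratio_mask(direct, mixture)
recovered = apply_mask(mask, mixture)
```

Because the mask is complex, it corrects both magnitude and phase. In the supervised setting described above, a network would estimate the mask from the mixture rather than compute it from the unknown direct speech.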


Keywords:  Complex ideal ratio mask; deep neural networks; dereverberation; speech quality; speech separation

Year:  2017        PMID: 30112422      PMCID: PMC6089240          DOI: 10.1109/TASLP.2017.2696307

Source DB:  PubMed          Journal:  IEEE/ACM Trans Audio Speech Lang Process


  9 in total

1.  On the importance of early reflections for speech in rooms.

Authors:  J S Bradley; H Sato; M Picard
Journal:  J Acoust Soc Am       Date:  2003-06       Impact factor: 1.840

2.  Pitch-based monaural segregation of reverberant speech.

Authors:  Nicoleta Roman; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2006-07       Impact factor: 1.840

3.  Objective measures for predicting speech intelligibility in noisy conditions based on new band-importance functions.

Authors:  Jianfen Ma; Yi Hu; Philipos C Loizou
Journal:  J Acoust Soc Am       Date:  2009-05       Impact factor: 1.840

4.  Speech intelligibility in reverberation with ideal binary masking: effects of early reflections and signal-to-noise ratio threshold.

Authors:  Nicoleta Roman; John Woodruff
Journal:  J Acoust Soc Am       Date:  2013-03       Impact factor: 1.840

5.  Perceptual linear predictive (PLP) analysis of speech.

Authors:  H Hermansky
Journal:  J Acoust Soc Am       Date:  1990-04       Impact factor: 1.840

6.  Monaural and binaural speech perception through hearing aids under noise and reverberation with normal and hearing-impaired listeners.

Authors:  A K Nabelek; J M Pickett
Journal:  J Speech Hear Res       Date:  1974-12

7.  Speech enhancement based on physiological and psychoacoustical models of modulation perception and binaural interaction.

Authors:  B Kollmeier; R Koch
Journal:  J Acoust Soc Am       Date:  1994-03       Impact factor: 1.840

8.  On Training Targets for Supervised Speech Separation.

Authors:  Yuxuan Wang; Arun Narayanan; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2014-12

9.  Complex Ratio Masking for Monaural Speech Separation.

Authors:  Donald S Williamson; Yuxuan Wang; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2015-12-23
  2 in total

1.  Deep Learning Based Target Cancellation for Speech Dereverberation.

Authors:  Zhong-Qiu Wang; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2020-02-28

2.  Speech Enhancement by Multiple Propagation through the Same Neural Network.

Authors:  Tomasz Grzywalski; Szymon Drgas
Journal:  Sensors (Basel)       Date:  2022-03-22       Impact factor: 3.576

