Two-stage Deep Learning for Noisy-reverberant Speech Enhancement.

Yan Zhao, Zhong-Qiu Wang, DeLiang Wang.

Abstract

In real-world situations, speech reaching our ears is commonly corrupted by both room reverberation and background noise. These distortions are detrimental to speech intelligibility and quality, and also pose a serious problem to many speech-related applications, including automatic speech and speaker recognition. In order to deal with the combined effects of noise and reverberation, we propose a two-stage strategy to enhance corrupted speech, where denoising and dereverberation are conducted sequentially using deep neural networks. In addition, we design a new objective function that incorporates clean phase during model training to better estimate spectral magnitudes, which would in turn yield better phase estimates when combined with iterative phase reconstruction. The two-stage model is then jointly trained to optimize the proposed objective function. Systematic evaluations and comparisons show that the proposed algorithm improves objective metrics of speech intelligibility and quality substantially, and significantly outperforms previous one-stage enhancement systems.
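The sequential structure described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the use of a ratio mask for the denoising stage, the single linear layer standing in for the dereverberation network, and the plain mean-squared-error loss are all assumptions made for illustration. In particular, the paper's clean-phase objective is not reproduced here.

```python
import numpy as np


def denoise_stage(mixture_mag, mask):
    """Stage 1 (assumed form): suppress noise by applying a ratio mask in [0, 1]."""
    return mixture_mag * np.clip(mask, 0.0, 1.0)


def dereverb_stage(reverb_mag, weights):
    """Stage 2 (assumed form): map reverberant magnitudes to anechoic ones.

    A single linear layer with a ReLU is a placeholder for a deep network.
    """
    return np.maximum(reverb_mag @ weights, 0.0)


def two_stage_enhance(mixture_mag, mask, weights):
    """Run denoising first, then dereverberation, on a (frames, bins) array."""
    return dereverb_stage(denoise_stage(mixture_mag, mask), weights)


def joint_loss(estimated_mag, clean_mag):
    """Placeholder joint objective: MSE between estimated and clean magnitudes."""
    return float(np.mean((estimated_mag - clean_mag) ** 2))


# Toy usage: 4 frames, 3 frequency bins of hypothetical magnitude data.
rng = np.random.default_rng(0)
mixture = np.abs(rng.normal(size=(4, 3)))
mask = np.full((4, 3), 0.5)   # hypothetical stage-1 mask estimate
weights = np.eye(3)           # hypothetical stage-2 parameters
enhanced = two_stage_enhance(mixture, mask, weights)
loss = joint_loss(enhanced, mixture)
```

In joint training, the loss would be computed on the output of the second stage and its gradient propagated back through both stages, so that the denoising stage is tuned for what the dereverberation stage needs rather than in isolation.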

Keywords:  Deep neural networks; denoising; dereverberation; ideal ratio mask; phase

Year:  2018        PMID: 31106230      PMCID: PMC6519714          DOI: 10.1109/TASLP.2018.2870725

Source DB:  PubMed          Journal:  IEEE/ACM Trans Audio Speech Lang Process


  5 in total

1.  A two-stage deep learning algorithm for talker-independent speaker separation in reverberant conditions.

Authors:  Masood Delfarah; Yuzhou Liu; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2020-09       Impact factor: 1.840

2.  Deep Learning for Talker-dependent Reverberant Speaker Separation: An Empirical Study.

Authors:  Masood Delfarah; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2019-08-12

3.  On Cross-Corpus Generalization of Deep Learning Based Speech Enhancement.

Authors:  Ashutosh Pandey; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2020-08-14

4.  Neural Cascade Architecture with Triple-domain Loss for Speech Enhancement.

Authors:  Heming Wang; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2021-12-28

5.  Evaluation of Mixed Deep Neural Networks for Reverberant Speech Enhancement.

Authors:  Michelle Gutiérrez-Muñoz; Astryd González-Salazar; Marvin Coto-Jiménez
Journal:  Biomimetics (Basel)       Date:  2019-12-20