Learning Complex Spectral Mapping with Gated Convolutional Recurrent Networks for Monaural Speech Enhancement.

Ke Tan, DeLiang Wang.

Abstract

Phase is important for the perceptual quality of speech. However, it seems intractable to directly estimate phase spectra through supervised learning because they lack spectrotemporal structure. Complex spectral mapping aims to estimate the real and imaginary spectrograms of clean speech from those of noisy speech, which simultaneously enhances the magnitude and phase responses of speech. Inspired by multi-task learning, we propose a gated convolutional recurrent network (GCRN) for complex spectral mapping, which amounts to a causal system for monaural speech enhancement. Our experimental results suggest that the proposed GCRN substantially outperforms an existing convolutional neural network (CNN) for complex spectral mapping in terms of both objective speech intelligibility and quality. Moreover, the proposed approach yields significantly higher STOI and PESQ than magnitude spectral mapping and complex ratio masking. We also find that complex spectral mapping with the proposed GCRN provides an effective phase estimate.
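
To make the idea of a complex spectral mapping target concrete, the following is a minimal NumPy sketch, not the authors' implementation. It only shows how noisy and clean signals could be turned into stacked real and imaginary spectrograms, and why an estimate of those two components determines both magnitude and phase. The function names, the synthetic 440 Hz signal, the noise level, and the 320-sample frame with a 160-sample hop at 16 kHz are all assumptions chosen for illustration; the GCRN itself is not modeled.

```python
import numpy as np

def stft(signal, frame_len=320, hop=160):
    """Hamming-windowed STFT; returns a complex array of shape (frames, bins).

    frame_len and hop are illustrative assumptions (20 ms / 10 ms at 16 kHz).
    """
    window = np.hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def to_real_imag(spec):
    """Stack real and imaginary spectrograms into shape (2, frames, bins)."""
    return np.stack([spec.real, spec.imag])

def magnitude_and_phase(real_imag):
    """Recover magnitude and phase from stacked real/imaginary parts."""
    real, imag = real_imag
    return np.hypot(real, imag), np.arctan2(imag, real)

# Synthetic stand-ins for clean and noisy speech (hypothetical data).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 440.0 * t)
noisy = clean + 0.3 * rng.standard_normal(t.shape)

# A mapping model would take the noisy pair as input and the clean pair as target.
network_input = to_real_imag(stft(noisy))
training_target = to_real_imag(stft(clean))

# Real and imaginary parts jointly fix magnitude and phase, so estimating
# them enhances both at once.
mag, phase = magnitude_and_phase(training_target)
recovered = mag * np.exp(1j * phase)
```

In this sketch, `recovered` matches the clean STFT up to floating-point error, which is the sense in which a single real/imaginary estimate carries a phase estimate along with the magnitude.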

Keywords:  Complex spectral mapping; gated convolutional recurrent network; monaural speech enhancement; phase estimation

Year:  2019        PMID: 33748323      PMCID: PMC7970735          DOI: 10.1109/taslp.2019.2955276

Source DB:  PubMed          Journal:  IEEE/ACM Trans Audio Speech Lang Process


  5 in total

1.  Long short-term memory.

Authors:  S Hochreiter; J Schmidhuber
Journal:  Neural Comput       Date:  1997-11-15       Impact factor: 2.026

2.  Long short-term memory for speaker generalization in supervised speech separation.

Authors:  Jitong Chen; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2017-06       Impact factor: 1.840

3.  Gated Residual Networks with Dilated Convolutions for Monaural Speech Enhancement.

Authors:  Ke Tan; Jitong Chen; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2018-10-15

4.  On Training Targets for Supervised Speech Separation.

Authors:  Yuxuan Wang; Arun Narayanan; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2014-12

5.  Complex Ratio Masking for Monaural Speech Separation.

Authors:  Donald S Williamson; Yuxuan Wang; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2015-12-23
  7 in total

1.  Self-attending RNN for Speech Enhancement to Improve Cross-corpus Generalization.

Authors:  Ashutosh Pandey; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2022-03-22

2.  Neural Cascade Architecture with Triple-domain Loss for Speech Enhancement.

Authors:  Heming Wang; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2021-12-28

3.  Dense CNN with Self-Attention for Time-Domain Speech Enhancement.

Authors:  Ashutosh Pandey; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2021-03-08

4.  Deep ANC: A deep learning approach to active noise control.

Authors:  Hao Zhang; DeLiang Wang
Journal:  Neural Netw       Date:  2021-04-01

5.  Towards Model Compression for Deep Learning Based Speech Enhancement.

Authors:  Ke Tan; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2021-05-21

6.  Deep Learning Based Real-time Speech Enhancement for Dual-microphone Mobile Phones.

Authors:  Ke Tan; Xueliang Zhang; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2021-05-21

7.  An effectively causal deep learning algorithm to increase intelligibility in untrained noises for hearing-impaired listeners.

Authors:  Eric W Healy; Ke Tan; Eric M Johnson; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2021-06       Impact factor: 2.482

