Gated Residual Networks with Dilated Convolutions for Monaural Speech Enhancement.

Ke Tan, Jitong Chen, DeLiang Wang.

Abstract

For supervised speech enhancement, contextual information is important for accurate mask estimation or spectral mapping. However, commonly used deep neural networks (DNNs) are limited in capturing temporal contexts. To leverage long-term contexts for tracking a target speaker, we treat speech enhancement as a sequence-to-sequence mapping, and present a novel convolutional neural network (CNN) architecture for monaural speech enhancement. The key idea is to systematically aggregate contexts through dilated convolutions, which significantly expand receptive fields. The CNN model additionally incorporates gating mechanisms and residual learning. Our experimental results suggest that the proposed model generalizes well to untrained noises and untrained speakers. It consistently outperforms a DNN, a unidirectional long short-term memory (LSTM) model and a bidirectional LSTM model in terms of objective speech intelligibility and quality metrics. Moreover, the proposed model has far fewer parameters than DNN and LSTM models.
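To make the abstract's three ingredients concrete (dilated convolutions that widen the receptive field, a gating mechanism, and residual learning), here is a minimal pure-Python sketch. It is not the authors' architecture: the kernel size of 3, the dilation schedule 1, 2, 4, ..., 32, the single-channel signal, and the helper names `receptive_field`, `dilated_conv1d`, and `gated_residual_block` are all assumptions chosen for illustration.

```python
import math


def receptive_field(kernel_size, dilations):
    """Receptive field, in frames, of stacked stride-1 1-D convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def dilated_conv1d(x, weights, dilation):
    """Zero-padded, same-length 1-D dilated convolution over a list of floats."""
    center = (len(weights) - 1) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            idx = t + (i - center) * dilation  # taps are spaced `dilation` frames apart
            if 0 <= idx < len(x):              # positions outside the signal count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_residual_block(x, w_lin, w_gate, dilation):
    """y = x + conv(x) * sigmoid(conv_gate(x)): a GLU-style gate plus an identity skip."""
    lin = dilated_conv1d(x, w_lin, dilation)
    gate = dilated_conv1d(x, w_gate, dilation)
    return [xi + a * sigmoid(g) for xi, a, g in zip(x, lin, gate)]


# Hypothetical schedule: six layers, kernel size 3, dilation doubling per layer.
print(receptive_field(3, [1, 2, 4, 8, 16, 32]))  # 127 frames
print(receptive_field(3, [1] * 6))               # 13 frames without dilation
```

Under these assumptions, six dilated layers cover 127 frames of context, against 13 for the same stack without dilation, at an identical parameter count. This illustrates how dilation can expand temporal context without adding weights.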

Keywords:  dilated convolutions; gated linear units; residual learning; sequence-to-sequence mapping; speech enhancement

Year:  2018        PMID: 31355300      PMCID: PMC6660163          DOI: 10.1109/TASLP.2018.2876171

Source DB:  PubMed          Journal:  IEEE/ACM Trans Audio Speech Lang Process


  5 in total

1.  Learning Complex Spectral Mapping with Gated Convolutional Recurrent Networks for Monaural Speech Enhancement.

Authors:  Ke Tan; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2019-11-22

2.  On Cross-Corpus Generalization of Deep Learning Based Speech Enhancement.

Authors:  Ashutosh Pandey; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2020-08-14

3.  Dense CNN with Self-Attention for Time-Domain Speech Enhancement.

Authors:  Ashutosh Pandey; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2021-03-08

4.  Speech Enhancement by Multiple Propagation through the Same Neural Network.

Authors:  Tomasz Grzywalski; Szymon Drgas
Journal:  Sensors (Basel)       Date:  2022-03-22       Impact factor: 3.576

5.  Deep Learning Based Real-time Speech Enhancement for Dual-microphone Mobile Phones.

Authors:  Ke Tan; Xueliang Zhang; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2021-05-21
