Dense CNN with Self-Attention for Time-Domain Speech Enhancement.

Ashutosh Pandey, DeLiang Wang.

Abstract

Speech enhancement in the time domain has become increasingly popular in recent years because it can jointly enhance both the magnitude and the phase of speech. In this work, we propose a dense convolutional network (DCN) with self-attention for speech enhancement in the time domain. DCN is an encoder-decoder architecture with skip connections. Each layer in the encoder and the decoder comprises a dense block and an attention module. Dense blocks and attention modules aid feature extraction through a combination of feature reuse, increased network depth, and maximum context aggregation. Furthermore, we reveal previously unknown problems with a loss based on the spectral magnitude of enhanced speech. To alleviate these problems, we propose a novel loss based on the magnitudes of the enhanced speech and a predicted noise. Even though the proposed loss is based on magnitudes only, a constraint imposed by noise prediction ensures that the loss enhances both magnitude and phase. Experimental results demonstrate that DCN trained with the proposed loss substantially outperforms other state-of-the-art approaches to causal and non-causal speech enhancement.
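Why a noise constraint can make a magnitude-only loss sensitive to phase can be illustrated with a minimal NumPy sketch. It assumes that the predicted noise is the noisy mixture minus the enhanced speech (n_hat = y - s_hat), that both loss terms are L1 distances between ordinary STFT magnitudes, and that the speech and noise terms are weighted equally. The function names (`stft_mag`, `magnitude_loss`, `noise_constrained_loss`), frame length, and hop size are hypothetical choices for illustration, not the authors' implementation; the paper's exact magnitude definition, STFT settings, and weighting may differ.

```python
import numpy as np


def stft_mag(x, frame_len=512, hop=256):
    """STFT magnitudes of a 1-D signal (Hann window), shape (frames, bins)."""
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))


def magnitude_loss(est, ref, **stft_kw):
    """Mean absolute difference between STFT magnitudes; phase is ignored."""
    return np.mean(np.abs(stft_mag(est, **stft_kw) - stft_mag(ref, **stft_kw)))


def noise_constrained_loss(s_hat, s, y, **stft_kw):
    """Hypothetical loss: speech magnitude term plus implied-noise magnitude term.

    Assumption: the noise prediction is tied to the speech estimate as
    y - s_hat, so a wrong speech phase also corrupts the noise estimate.
    """
    n_hat = y - s_hat  # assumed noise prediction
    n = y - s          # true noise
    return 0.5 * (magnitude_loss(s_hat, s, **stft_kw)
                  + magnitude_loss(n_hat, n, **stft_kw))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s = rng.standard_normal(16000)        # stand-in for clean speech
    n = 0.5 * rng.standard_normal(16000)  # stand-in for noise
    y = s + n                             # noisy mixture
    s_flip = -s                           # same magnitudes, every phase shifted by pi
    # The magnitude-only loss cannot see the sign flip.
    print("magnitude-only:", magnitude_loss(s_flip, s))
    # The implied noise becomes n + 2s, so the constrained loss penalizes it.
    print("noise-constrained:", noise_constrained_loss(s_flip, s, y))
```

The sign-flipped estimate has exactly the same STFT magnitudes as the clean signal, so a magnitude-only loss scores it as perfect. Under the assumed constraint, however, the implied noise estimate becomes y + s = n + 2s, whose magnitudes differ from those of the true noise, so the combined loss is no longer zero.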

Keywords: Speech enhancement; dense convolutional network; frequency-domain loss; self-attention network; time-domain enhancement

Year: 2021; PMID: 33997107; PMCID: PMC8118093; DOI: 10.1109/taslp.2021.3064421

Source DB: PubMed; Journal: IEEE/ACM Trans Audio Speech Lang Process


