Dense CNN with Self-Attention for Time-Domain Speech Enhancement.

Abstract

Speech enhancement in the time domain has become increasingly popular in recent years, due to its capability to jointly enhance both the magnitude and the phase of speech. In this work, we propose a dense convolutional network (DCN) with self-attention for speech enhancement in the time domain. DCN is an encoder-decoder architecture with skip connections. Each layer in the encoder and the decoder comprises a dense block and an attention module. Dense blocks and attention modules help in feature extraction using a combination of feature reuse, increased network depth, and maximum context aggregation. Furthermore, we reveal previously unknown problems with a loss based on the spectral magnitude of enhanced speech. To alleviate these problems, we propose a novel loss based on the magnitudes of the enhanced speech and the predicted noise. Even though the proposed loss is based on magnitudes only, a constraint imposed by noise prediction ensures that the loss enhances both magnitude and phase. Experimental results demonstrate that DCN trained with the proposed loss substantially outperforms other state-of-the-art approaches to causal and non-causal speech enhancement.
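The idea of a magnitude loss constrained by noise prediction can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the formula, so the sketch assumes that the predicted noise is the mixture minus the enhanced speech, that the distance is a mean absolute difference of STFT magnitudes, and that the two terms are weighted equally. The function names, the frame length, and the hop size are hypothetical. The demo uses a sign-flipped estimate, which has exactly the same magnitudes as the clean speech and is therefore invisible to a speech-only magnitude loss, while the noise term penalizes it.

```python
import cmath
import math


def stft_magnitudes(signal, frame_len=16, hop=8):
    """Return per-frame DFT magnitudes of a Hann-windowed signal (naive DFT)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n, x in enumerate(frame))
            mags.append(abs(coeff))
        frames.append(mags)
    return frames


def magnitude_l1(reference, estimate):
    """Mean absolute difference between the STFT magnitudes of two signals."""
    ref_mags = stft_magnitudes(reference)
    est_mags = stft_magnitudes(estimate)
    total = sum(abs(r - e)
                for ref_frame, est_frame in zip(ref_mags, est_mags)
                for r, e in zip(ref_frame, est_frame))
    count = sum(len(frame) for frame in ref_mags)
    return total / count


def speech_and_noise_magnitude_loss(mixture, clean, enhanced):
    """Assumed form: equal-weight magnitude losses on speech and on noise.

    Assumption (not stated in the abstract): the predicted noise is obtained
    by subtracting the enhanced speech from the noisy mixture.
    """
    true_noise = [m - c for m, c in zip(mixture, clean)]
    pred_noise = [m - e for m, e in zip(mixture, enhanced)]
    return (0.5 * magnitude_l1(clean, enhanced)
            + 0.5 * magnitude_l1(true_noise, pred_noise))


# Toy signals: a sinusoid as "speech" plus a weaker sinusoid as "noise".
clean = [math.sin(0.3 * n) for n in range(64)]
noise = [0.1 * math.cos(1.7 * n) for n in range(64)]
mixture = [c + v for c, v in zip(clean, noise)]

# A sign-flipped estimate has the same magnitudes as the clean speech
# but the opposite phase.
flipped = [-c for c in clean]

print("speech-only magnitude loss:", magnitude_l1(clean, flipped))
print("speech + noise magnitude loss:",
      speech_and_noise_magnitude_loss(mixture, clean, flipped))
```

In this toy case the speech-only term cannot distinguish the flipped estimate from the clean signal, whereas the combined loss is clearly positive because the implied noise estimate now contains twice the speech. This is one way to see how a loss built from magnitudes alone can still constrain phase.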

Keywords: Speech enhancement; dense convolutional network; frequency-domain loss; self-attention network; time-domain enhancement

Year: 2021 PMID: 33997107 PMCID: PMC8118093 DOI: 10.1109/taslp.2021.3064421

Source DB: PubMed Journal: IEEE/ACM Trans Audio Speech Lang Process

10 in total

1. Supervised Speech Separation Based on Deep Learning: An Overview.

Authors: DeLiang Wang; Jitong Chen
Journal: IEEE/ACM Trans Audio Speech Lang Process Date: 2018-05-30

2. Large-scale training to increase speech intelligibility for hearing-impaired listeners in novel noises.

Authors: Jitong Chen; Yuxuan Wang; Sarah E Yoho; DeLiang Wang; Eric W Healy
Journal: J Acoust Soc Am Date: 2016-05 Impact factor: 1.840

3. Long short-term memory for speaker generalization in supervised speech separation.

Authors: Jitong Chen; DeLiang Wang
Journal: J Acoust Soc Am Date: 2017-06 Impact factor: 1.840

4. Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation.

Authors: Yi Luo; Nima Mesgarani
Journal: IEEE/ACM Trans Audio Speech Lang Process Date: 2019-05-06

5. Gated Residual Networks with Dilated Convolutions for Monaural Speech Enhancement.

Authors: Ke Tan; Jitong Chen; DeLiang Wang
Journal: IEEE/ACM Trans Audio Speech Lang Process Date: 2018-10-15

6. Monaural Speech Dereverberation Using Temporal Convolutional Networks with Self Attention.

Authors: Yan Zhao; DeLiang Wang; Buye Xu; Tao Zhang
Journal: IEEE/ACM Trans Audio Speech Lang Process Date: 2020-05-18

7. Learning Complex Spectral Mapping with Gated Convolutional Recurrent Networks for Monaural Speech Enhancement.

Authors: Ke Tan; DeLiang Wang
Journal: IEEE/ACM Trans Audio Speech Lang Process Date: 2019-11-22

8. On Cross-Corpus Generalization of Deep Learning Based Speech Enhancement.

Authors: Ashutosh Pandey; DeLiang Wang
Journal: IEEE/ACM Trans Audio Speech Lang Process Date: 2020-08-14

9. On Training Targets for Supervised Speech Separation.

Authors: Yuxuan Wang; Arun Narayanan; DeLiang Wang
Journal: IEEE/ACM Trans Audio Speech Lang Process Date: 2014-12

10. Complex Ratio Masking for Monaural Speech Separation.

Authors: Donald S Williamson; Yuxuan Wang; DeLiang Wang
Journal: IEEE/ACM Trans Audio Speech Lang Process Date: 2015-12-23