
Self-attending RNN for Speech Enhancement to Improve Cross-corpus Generalization.

Ashutosh Pandey, DeLiang Wang.

Abstract

Deep neural networks (DNNs) represent the mainstream methodology for supervised speech enhancement, primarily due to their capability to model complex functions using hierarchical representations. However, a recent study revealed that DNNs trained on a single corpus fail to generalize to untrained corpora, especially in low signal-to-noise ratio (SNR) conditions. Developing a noise-, speaker-, and corpus-independent speech enhancement algorithm is essential for real-world applications. In this study, we propose a self-attending recurrent neural network (SARNN) for time-domain speech enhancement to improve cross-corpus generalization. SARNN comprises recurrent neural networks (RNNs) augmented with self-attention blocks and feedforward blocks. We evaluate SARNN on different corpora with nonstationary noises in low SNR conditions. Experimental results demonstrate that SARNN substantially outperforms competitive approaches to time-domain speech enhancement, such as RNNs and dual-path SARNNs. Additionally, we report an important finding: the two popular approaches to speech enhancement, complex spectral mapping and time-domain enhancement, obtain similar results for RNN and SARNN with large-scale training. We also provide a challenging subset of the test set used in this study for evaluating future algorithms and facilitating direct comparisons.
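The abstract describes SARNN only at a high level, as RNNs augmented with self-attention and feedforward blocks. The following is a minimal pure-Python sketch of one such block, intended only to illustrate how the three components could be composed over a sequence of frames. It is not the authors' implementation: the plain tanh RNN (rather than an LSTM or other gated unit), the ordering of the sub-blocks, the residual connections, the non-causal attention, the random untrained weights, and all names (`sarnn_like_block`, `make_params`, and the helper functions) are assumptions made for illustration.

```python
import math
import random


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random weight matrix (a stand-in for trained weights)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rnn_layer(frames, W_in, W_rec):
    """Plain tanh RNN over frames (assumption: the paper's RNN type may differ)."""
    h = [0.0] * len(W_rec)
    outputs = []
    for x in frames:
        pre = [a + b for a, b in zip(matvec(W_in, x), matvec(W_rec, h))]
        h = [math.tanh(p) for p in pre]
        outputs.append(h)
    return outputs


def self_attention(frames, W_q, W_k, W_v):
    """Scaled dot-product self-attention; every frame attends to all frames."""
    q = [matvec(W_q, f) for f in frames]
    k = [matvec(W_k, f) for f in frames]
    v = [matvec(W_v, f) for f in frames]
    scale = math.sqrt(len(q[0]))
    outputs = []
    for qi in q:
        weights = softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
        outputs.append(
            [sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))]
        )
    return outputs


def feedforward(frames, W1, W2):
    """Frame-wise two-layer feedforward network with a ReLU in between."""
    return [matvec(W2, [max(0.0, u) for u in matvec(W1, f)]) for f in frames]


def add(a_frames, b_frames):
    """Element-wise sum of two frame sequences (residual connection)."""
    return [[x + y for x, y in zip(a, b)] for a, b in zip(a_frames, b_frames)]


def make_params(in_dim, hidden, seed=0):
    """Hypothetical parameter set with random weights, for illustration only."""
    rng = random.Random(seed)
    return {
        "W_in": rand_matrix(hidden, in_dim, rng),
        "W_rec": rand_matrix(hidden, hidden, rng),
        "W_q": rand_matrix(hidden, hidden, rng),
        "W_k": rand_matrix(hidden, hidden, rng),
        "W_v": rand_matrix(hidden, hidden, rng),
        "W1": rand_matrix(2 * hidden, hidden, rng),
        "W2": rand_matrix(hidden, 2 * hidden, rng),
    }


def sarnn_like_block(frames, p):
    """RNN, then self-attention, then feedforward (assumed order and residuals)."""
    h = rnn_layer(frames, p["W_in"], p["W_rec"])
    h = add(h, self_attention(h, p["W_q"], p["W_k"], p["W_v"]))
    return add(h, feedforward(h, p["W1"], p["W2"]))
```

Given a list of `T` frames of dimension `in_dim`, `sarnn_like_block(frames, make_params(in_dim, hidden))` returns `T` frames of dimension `hidden`. In a time-domain enhancement setting the input frames would be short segments of the noisy waveform, but framing, overlap-add, and training are outside the scope of this sketch.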


Keywords:  Speech enhancement; cross-corpus generalization; recurrent neural network; self-attention; time-domain enhancement

Year:  2022        PMID: 36245814      PMCID: PMC9560045          DOI: 10.1109/taslp.2022.3161143

Source DB:  PubMed          Journal:  IEEE/ACM Trans Audio Speech Lang Process


