
On Cross-Corpus Generalization of Deep Learning Based Speech Enhancement.

Ashutosh Pandey, DeLiang Wang.

Abstract

In recent years, supervised approaches using deep neural networks (DNNs) have become the mainstream for speech enhancement. It has been established that DNNs generalize well to untrained noises and speakers if trained using a large number of noises and speakers. However, we find that DNNs fail to generalize to new speech corpora in low signal-to-noise ratio (SNR) conditions. In this work, we establish that the lack of generalization is mainly due to channel mismatch, i.e., different recording conditions between the trained and untrained corpora. Additionally, we observe that traditional channel normalization techniques are not effective in improving cross-corpus generalization. Further, we evaluate publicly available datasets that are promising for generalization and find one particular corpus to be significantly better than the others. Finally, we find that using a smaller frame shift in short-time processing of speech can significantly improve cross-corpus generalization. The proposed techniques to address cross-corpus generalization include channel normalization, a better training corpus, and a smaller frame shift in the short-time Fourier transform (STFT). Together, these techniques significantly improve the objective intelligibility and quality scores on untrained corpora.
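To make the frame-shift idea concrete, the following minimal sketch shows how a smaller shift in short-time processing changes the number of analysis frames and the overlap between neighboring frames. The sampling rate, frame length, and the two shift values are illustrative assumptions, not settings reported in the abstract, and the helper names are hypothetical.

```python
def frame_count(num_samples: int, frame_len: int, frame_shift: int) -> int:
    """Number of full frames obtained from a signal without padding."""
    if num_samples < frame_len:
        return 0
    return 1 + (num_samples - frame_len) // frame_shift


def overlap_ratio(frame_len: int, frame_shift: int) -> float:
    """Fraction of each frame that is shared with the next frame."""
    return 1.0 - frame_shift / frame_len


def split_frames(signal, frame_len, frame_shift):
    """Slice a sequence of samples into overlapping fixed-length frames."""
    n_frames = frame_count(len(signal), frame_len, frame_shift)
    return [
        signal[i * frame_shift : i * frame_shift + frame_len]
        for i in range(n_frames)
    ]


# Assumed setup: 1 s of audio at 16 kHz, 512-sample (32 ms) frames.
SAMPLE_RATE = 16000
FRAME_LEN = 512
signal = [0.0] * SAMPLE_RATE

# Compare an assumed larger shift (256 samples) with a smaller one (64 samples).
for shift in (256, 64):
    frames = split_frames(signal, FRAME_LEN, shift)
    print(
        f"shift={shift:4d} samples  frames={len(frames):4d}  "
        f"overlap={overlap_ratio(FRAME_LEN, shift):.3f}"
    )
```

With a fixed frame length, reducing the shift increases the overlap between frames, so the network sees more (and more redundant) frames per second of speech. The sketch only illustrates that bookkeeping; it does not reproduce the enhancement models or the results described in the abstract.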


Keywords:  Speech enhancement; channel generalization; cross-corpus generalization; deep learning; robust enhancement

Year:  2020        PMID: 33748327      PMCID: PMC7971413          DOI: 10.1109/taslp.2020.3016487

Source DB:  PubMed          Journal:  IEEE/ACM Trans Audio Speech Lang Process


  7 in total

1.  Supervised Speech Separation Based on Deep Learning: An Overview.

Authors:  DeLiang Wang; Jitong Chen
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2018-05-30

2.  Two-stage Deep Learning for Noisy-reverberant Speech Enhancement.

Authors:  Yan Zhao; Zhong-Qiu Wang; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2018-09-17

3.  Large-scale training to increase speech intelligibility for hearing-impaired listeners in novel noises.

Authors:  Jitong Chen; Yuxuan Wang; Sarah E Yoho; DeLiang Wang; Eric W Healy
Journal:  J Acoust Soc Am       Date:  2016-05       Impact factor: 1.840

4.  Long short-term memory for speaker generalization in supervised speech separation.

Authors:  Jitong Chen; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2017-06       Impact factor: 1.840

5.  Gated Residual Networks with Dilated Convolutions for Monaural Speech Enhancement.

Authors:  Ke Tan; Jitong Chen; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2018-10-15

6.  On Training Targets for Supervised Speech Separation.

Authors:  Yuxuan Wang; Arun Narayanan; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2014-12

7.  Complex Ratio Masking for Monaural Speech Separation.

Authors:  Donald S Williamson; Yuxuan Wang; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2015-12-23
  5 in total

1.  A causal and talker-independent speaker separation/dereverberation deep learning algorithm: Cost associated with conversion to real-time capable operation.

Authors:  Eric W Healy; Hassan Taherian; Eric M Johnson; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2021-11       Impact factor: 1.840

2.  Towards Robust Speech Super-resolution.

Authors:  Heming Wang; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2021-01-25

3.  Self-attending RNN for Speech Enhancement to Improve Cross-corpus Generalization.

Authors:  Ashutosh Pandey; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2022-03-22

4.  Deep learning based speaker separation and dereverberation can generalize across different languages to improve intelligibility.

Authors:  Eric W Healy; Eric M Johnson; Masood Delfarah; Divya S Krishnagiri; Victoria A Sevich; Hassan Taherian; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2021-10       Impact factor: 2.482

5.  Dense CNN with Self-Attention for Time-Domain Speech Enhancement.

Authors:  Ashutosh Pandey; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2021-03-08
