On Cross-Corpus Generalization of Deep Learning Based Speech Enhancement.

Abstract

In recent years, supervised approaches using deep neural networks (DNNs) have become the mainstream for speech enhancement. It has been established that DNNs generalize well to untrained noises and speakers if trained using a large number of noises and speakers. However, we find that DNNs fail to generalize to new speech corpora in low signal-to-noise ratio (SNR) conditions. In this work, we establish that the lack of generalization is mainly due to channel mismatch, i.e., different recording conditions between the trained and untrained corpora. Additionally, we observe that traditional channel normalization techniques are not effective in improving cross-corpus generalization. Further, we evaluate publicly available datasets that are promising for generalization and find one particular corpus to be significantly better than the others. Finally, we find that using a smaller frame shift in short-time processing of speech can significantly improve cross-corpus generalization. The proposed techniques to address cross-corpus generalization include channel normalization, a better training corpus, and a smaller frame shift in the short-time Fourier transform (STFT). Together, these techniques significantly improve objective intelligibility and quality scores on untrained corpora.
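The two signal-processing ideas mentioned above, the STFT frame shift and channel normalization, can be illustrated with a short NumPy sketch. It is not the authors' implementation: the 16 kHz sampling rate, the 20 ms (320-sample) frame, and the 10 ms vs. 2 ms shifts are hypothetical values chosen only to show how the shift changes the number of frames. The normalization shown is log-spectral mean normalization, a classic "traditional" channel normalization (closely related to cepstral mean normalization). It is the kind of technique the abstract reports as not effective on its own, so it should not be read as the paper's proposed method. The helper names `stft` and `log_spectral_mean_normalization` are illustrative.

```python
import numpy as np


def stft(signal, frame_len, frame_shift):
    """Minimal STFT sketch: Hann-windowed frames, one-sided FFT per frame.

    A smaller frame_shift yields more (more heavily overlapping) frames
    for the same signal length.
    """
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // frame_shift
    frames = np.stack([
        signal[i * frame_shift : i * frame_shift + frame_len] * window
        for i in range(n_frames)
    ])
    # Shape: (n_frames, frame_len // 2 + 1)
    return np.fft.rfft(frames, axis=1)


def log_spectral_mean_normalization(spec, eps=1e-8):
    """Subtract the per-frequency mean of the log magnitude over time.

    A stationary recording channel scales each frequency bin by a fixed gain,
    which becomes an additive offset in the log domain; subtracting the
    time average of each bin removes that offset (up to the small eps term).
    """
    log_mag = np.log(np.abs(spec) + eps)
    return log_mag - log_mag.mean(axis=0, keepdims=True)


# Hypothetical setup: 1 s of noise at 16 kHz, 20 ms frames.
rng = np.random.default_rng(0)
x = rng.standard_normal(16000)
coarse = stft(x, frame_len=320, frame_shift=160)  # 10 ms shift
fine = stft(x, frame_len=320, frame_shift=32)     # 2 ms shift
print(coarse.shape, fine.shape)
```

With these values the 10 ms shift gives 99 frames and the 2 ms shift gives 491 frames, each with 161 frequency bins. Multiplying every frame by the same per-frequency gain (a simple model of channel mismatch) leaves the normalized log spectrum essentially unchanged, which is exactly why such normalization only addresses stationary, linear channel differences.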

Keywords: Speech enhancement; channel generalization; cross-corpus generalization; deep learning; robust enhancement

Year: 2020 PMID: 33748327 PMCID: PMC7971413 DOI: 10.1109/taslp.2020.3016487

Source DB: PubMed Journal: IEEE/ACM Trans Audio Speech Lang Process