
A New Framework for CNN-Based Speech Enhancement in the Time Domain.

Ashutosh Pandey, DeLiang Wang.

Abstract

This paper proposes a new learning mechanism for a fully convolutional neural network (CNN) to address speech enhancement in the time domain. The CNN takes as input the time frames of a noisy utterance and outputs the time frames of the enhanced utterance. At training time, we add an extra operation that converts the time-domain output to the frequency domain. This conversion corresponds to a simple matrix multiplication and is hence differentiable, implying that a frequency-domain loss can be used for training in the time domain. We use the mean absolute error loss between the enhanced short-time Fourier transform (STFT) magnitude and the clean STFT magnitude to train the CNN. This way, the model can exploit the domain knowledge of converting a signal to the frequency domain for analysis. Moreover, this approach avoids the well-known invalid STFT problem, since the proposed CNN operates in the time domain. Experimental results demonstrate that the proposed method substantially outperforms other speech enhancement methods. The proposed method is easy to implement and applicable to related speech processing tasks that require time-frequency masking or spectral mapping.
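The loss described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the frame length, hop size, rectangular window, Euclidean magnitude, and all function names below are assumptions chosen for illustration, and plain Python lists stand in for the tensors of an automatic-differentiation framework. The sketch shows only that the STFT of each time frame is a matrix multiplication with fixed cosine and sine matrices, so a magnitude-domain mean absolute error can be computed directly from time-domain signals.

```python
import math


def dft_matrices(n):
    """Real and imaginary parts of the n-point DFT matrix, bins 0..n//2."""
    bins = n // 2 + 1
    real = [[math.cos(2 * math.pi * k * t / n) for t in range(n)]
            for k in range(bins)]
    imag = [[-math.sin(2 * math.pi * k * t / n) for t in range(n)]
            for k in range(bins)]
    return real, imag


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def frame_signal(x, frame_len, hop):
    """Split a signal into overlapping frames (assumed rectangular window)."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def stft_magnitude(x, frame_len=8, hop=4):
    """STFT magnitude computed frame by frame via matrix multiplication."""
    real, imag = dft_matrices(frame_len)
    magnitudes = []
    for frame in frame_signal(x, frame_len, hop):
        re = matvec(real, frame)
        im = matvec(imag, frame)
        # Euclidean magnitude of each complex bin (an assumption here).
        magnitudes.append([math.hypot(r, i) for r, i in zip(re, im)])
    return magnitudes


def stft_mae_loss(enhanced, clean, frame_len=8, hop=4):
    """Mean absolute error between enhanced and clean STFT magnitudes."""
    e = stft_magnitude(enhanced, frame_len, hop)
    c = stft_magnitude(clean, frame_len, hop)
    diffs = [abs(a - b) for fe, fc in zip(e, c) for a, b in zip(fe, fc)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    # Hypothetical signals: a sinusoid as "clean", an attenuated copy as "enhanced".
    clean = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
    enhanced = [0.9 * s for s in clean]
    print(stft_mae_loss(enhanced, clean))
```

In a training setup, `enhanced` would be the time-domain output of the network, and the same matrix products would be expressed as tensor operations so that gradients of the loss flow back through them to the network weights.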


Keywords:  Speech enhancement; deep learning; fully convolutional neural network; mean absolute error; time domain enhancement

Year:  2019        PMID: 34262993      PMCID: PMC8276831          DOI: 10.1109/taslp.2019.2913512

Source DB:  PubMed          Journal:  IEEE/ACM Trans Audio Speech Lang Process


  4 in total

1.  Towards Robust Speech Super-resolution.

Authors:  Heming Wang; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2021-01-25

2.  Self-attending RNN for Speech Enhancement to Improve Cross-corpus Generalization.

Authors:  Ashutosh Pandey; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2022-03-22

3.  Neural Cascade Architecture with Triple-domain Loss for Speech Enhancement.

Authors:  Heming Wang; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2021-12-28

4.  Time-Domain Joint Training Strategies of Speech Enhancement and Intent Classification Neural Models.

Authors:  Mohamed Nabih Ali; Daniele Falavigna; Alessio Brutti
Journal:  Sensors (Basel)       Date:  2022-01-04       Impact factor: 3.576

