
A New Framework for CNN-Based Speech Enhancement in the Time Domain.

Abstract

This paper proposes a new learning mechanism for a fully convolutional neural network (CNN) to address speech enhancement in the time domain. The CNN takes as input the time frames of a noisy utterance and outputs the time frames of the enhanced utterance. At training time, we add an extra operation that converts the time-domain output to the frequency domain. This conversion corresponds to a simple matrix multiplication and is hence differentiable, implying that a frequency-domain loss can be used for training in the time domain. We use a mean absolute error loss between the enhanced short-time Fourier transform (STFT) magnitude and the clean STFT magnitude to train the CNN. This way, the model can exploit the domain knowledge of converting a signal to the frequency domain for analysis. Moreover, this approach avoids the well-known invalid STFT problem, since the proposed CNN operates in the time domain. Experimental results demonstrate that the proposed method substantially outperforms other speech enhancement methods. The proposed method is easy to implement and applicable to related speech processing tasks that require time-frequency masking or spectral mapping.
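The loss described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: it assumes the utterances are already split into equal-length frames, applies no analysis window, uses a full-length DFT per frame, and all function names (`dft_matrices`, `matvec`, `frame_magnitudes`, `stft_magnitude_mae`) are hypothetical. It only shows that the time-to-frequency conversion is a pair of real matrix multiplications followed by a magnitude, and that the mean absolute error is then taken between the two magnitude spectra.

```python
import math


def dft_matrices(n):
    """Real and imaginary parts of the n-point DFT matrix, as n x n nested lists."""
    real = [[math.cos(2 * math.pi * k * t / n) for t in range(n)] for k in range(n)]
    imag = [[-math.sin(2 * math.pi * k * t / n) for t in range(n)] for k in range(n)]
    return real, imag


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def frame_magnitudes(frame, real, imag):
    """Magnitude spectrum of one time frame via two real matrix multiplications."""
    re = matvec(real, frame)
    im = matvec(imag, frame)
    return [math.hypot(a, b) for a, b in zip(re, im)]


def stft_magnitude_mae(enhanced_frames, clean_frames):
    """Mean absolute error between the magnitude spectra of paired frames.

    Assumption: both inputs are lists of equal-length frames (no window applied).
    """
    n = len(clean_frames[0])
    real, imag = dft_matrices(n)
    total, count = 0.0, 0
    for enhanced, clean in zip(enhanced_frames, clean_frames):
        mag_enhanced = frame_magnitudes(enhanced, real, imag)
        mag_clean = frame_magnitudes(clean, real, imag)
        total += sum(abs(a - b) for a, b in zip(mag_enhanced, mag_clean))
        count += n
    return total / count


# Toy data: one 8-sample frame holding a single sine cycle, and a copy at half amplitude.
clean_frames = [[math.sin(2 * math.pi * t / 8) for t in range(8)]]
enhanced_frames = [[0.5 * x for x in clean_frames[0]]]
loss = stft_magnitude_mae(enhanced_frames, clean_frames)
```

In an actual training setup the same matrix products would be written in an automatic-differentiation framework, so that the gradient of this frequency-domain loss flows back to the time-domain output of the network.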

Keywords: Speech enhancement; deep learning; fully convolutional neural network; mean absolute error; time domain enhancement

Year: 2019 PMID: 34262993 PMCID: PMC8276831 DOI: 10.1109/taslp.2019.2913512

Source DB: PubMed Journal: IEEE/ACM Trans Audio Speech Lang Process

Cited

4 in total

1. Towards Robust Speech Super-resolution.

2. Self-attending RNN for Speech Enhancement to Improve Cross-corpus Generalization.

3. Neural Cascade Architecture with Triple-domain Loss for Speech Enhancement.

4. Time-Domain Joint Training Strategies of Speech Enhancement and Intent Classification Neural Models.