Donald S Williamson, DeLiang Wang.
Abstract
In real-world situations, speech is masked by both background noise and reverberation, which negatively affect perceptual quality and intelligibility. In this paper, we address monaural speech separation in reverberant and noisy environments. We perform dereverberation and denoising using supervised learning with a deep neural network. Specifically, we enhance the magnitude and phase by performing separation with an estimate of the complex ideal ratio mask. We define the complex ideal ratio mask so that direct speech results after the mask is applied to reverberant and noisy speech. Our approach is evaluated using simulated and real room impulse responses, and with background noises. The proposed approach improves objective speech quality and intelligibility significantly. Evaluations and comparisons show that it outperforms related methods in many reverberant and noisy environments.
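The mask definition in the abstract can be illustrated with a minimal sketch. It assumes the mask is a per-unit complex ratio M = S / Y, where S is a time-frequency coefficient of the direct speech and Y is the corresponding coefficient of the reverberant and noisy speech, so that M * Y returns S. The function names, the `eps` guard, and the toy coefficient values are hypothetical and chosen only for illustration. The sketch computes the ideal mask from known signals; it does not reproduce the deep neural network that the paper trains to estimate the mask.

```python
def complex_ideal_ratio_mask(direct, mixture, eps=1e-12):
    """Return one complex mask value per time-frequency unit.

    Each mask value M satisfies M * Y ~= S, where S is the direct-speech
    coefficient and Y is the reverberant-noisy coefficient.
    `eps` is an illustrative guard against division by zero.
    """
    mask = []
    for s, y in zip(direct, mixture):
        denom = y.real ** 2 + y.imag ** 2 + eps
        # Real and imaginary parts of S * conj(Y) / |Y|^2
        m_real = (y.real * s.real + y.imag * s.imag) / denom
        m_imag = (y.real * s.imag - y.imag * s.real) / denom
        mask.append(complex(m_real, m_imag))
    return mask


def apply_mask(mask, mixture):
    """Complex multiplication adjusts both magnitude and phase."""
    return [m * y for m, y in zip(mask, mixture)]


# Toy coefficients (hypothetical values, not taken from the paper).
direct = [1 + 2j, -0.5 + 0.25j, 3 - 1j]
mixture = [2 - 1j, 1 + 1j, -1.5 + 0.5j]

mask = complex_ideal_ratio_mask(direct, mixture)
recovered = apply_mask(mask, mixture)
```

Because the mask is complex-valued, applying it changes the phase of the mixture as well as its magnitude, which is the property the abstract highlights.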
Keywords: Complex ideal ratio mask; deep neural networks; dereverberation; speech quality; speech separation
Year: 2017 PMID: 30112422 PMCID: PMC6089240 DOI: 10.1109/TASLP.2017.2696307
Source DB: PubMed Journal: IEEE/ACM Trans Audio Speech Lang Process