Masood Delfarah, DeLiang Wang.
Abstract
Speaker separation refers to the problem of separating speech signals from a mixture of simultaneous speakers. Previous studies are limited to addressing the speaker separation problem in anechoic conditions. This paper addresses the problem of talker-dependent speaker separation in reverberant conditions, which are characteristic of real-world environments. We employ recurrent neural networks with bidirectional long short-term memory (BLSTM) to separate and dereverberate the target speech signal. We propose two-stage networks to effectively deal with both speaker separation and speech dereverberation. In the two-stage model, the first stage separates and dereverberates two-talker mixtures and the second stage further enhances the separated target signal. We have extensively evaluated the two-stage architecture, and our empirical results demonstrate large improvements over unprocessed mixtures and clear performance gain over single-stage networks in a wide range of target-to-interferer ratios and reverberation times in simulated as well as recorded rooms. Moreover, we show that time-frequency masking yields better performance than spectral mapping for reverberant speaker separation.
Keywords: Cochannel speech separation; deep neural networks; speech dereverberation; two-stage network
Year: 2019 PMID: 33748321 PMCID: PMC7970708 DOI: 10.1109/taslp.2019.2934319
Source DB: PubMed Journal: IEEE/ACM Trans Audio Speech Lang Process
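The abstract contrasts time-frequency masking with spectral mapping as separation strategies. A minimal NumPy sketch of the two training targets on toy magnitude spectrograms, assuming an ideal-ratio-mask formulation (a common choice, not necessarily the exact target used in the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy magnitude spectrograms (time x frequency); stand-ins for STFT features.
target = rng.random((100, 161))        # direct-path target speech
interference = rng.random((100, 161))  # interfering talker plus reverberation

# Observed mixture magnitude (toy additivity assumption).
mixture = target + interference

# Time-frequency masking target: an ideal ratio mask in [0, 1].
# (Assumed formulation for illustration only.)
irm = target / (target + interference + 1e-8)

# Applying the estimated mask to the mixture yields the target estimate.
masked_estimate = irm * mixture

# Spectral-mapping target: the network would instead regress the clean
# magnitude spectrogram directly from mixture features.
mapping_target = target
```

With additive toy magnitudes, `masked_estimate` recovers `target` almost exactly; in practice a BLSTM would predict the mask (or the clean spectrum) from mixture features, and the abstract reports that the masking formulation performs better under reverberation.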