Multi-microphone Complex Spectral Mapping for Utterance-wise and Continuous Speech Separation.

Zhong-Qiu Wang, Peidong Wang, DeLiang Wang.

Abstract

We propose multi-microphone complex spectral mapping, a simple way of applying deep learning for time-varying non-linear beamforming, for speaker separation in reverberant conditions. We aim at both speaker separation and dereverberation. Our study first investigates offline utterance-wise speaker separation and then extends to block-online continuous speech separation (CSS). Assuming a fixed array geometry between training and testing, we train deep neural networks (DNNs) to predict the real and imaginary (RI) components of target speech at a reference microphone from the RI components of multiple microphones. We then integrate multi-microphone complex spectral mapping with minimum variance distortionless response (MVDR) beamforming and post-filtering to further improve separation, and combine it with frame-level speaker counting for block-online CSS. Although our system is trained on simulated room impulse responses (RIRs) based on a fixed number of microphones arranged in a given geometry, it generalizes well to a real array with the same geometry. State-of-the-art separation performance is obtained on the simulated two-talker SMS-WSJ corpus and the real-recorded LibriCSS dataset.
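The data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the DNN is omitted, and the function names (stack_ri, ri_to_complex, mvdr_weights, apply_beamformer), the array layout (mics, frames, freqs), and the diagonal loading constant eps are all assumptions made for illustration. The MVDR weights use one common formulation, w = (Φn⁻¹ Φs) u / trace(Φn⁻¹ Φs), with covariance matrices computed from target estimates assumed to be available at every microphone; the paper's exact formulation may differ.

```python
import numpy as np

def stack_ri(stft_multi):
    """Stack real and imaginary parts of a multi-microphone STFT.

    stft_multi: complex array, shape (mics, frames, freqs).
    Returns a real array, shape (2 * mics, frames, freqs), usable as
    network input features (layout is an assumption of this sketch).
    """
    return np.concatenate([stft_multi.real, stft_multi.imag], axis=0)

def ri_to_complex(ri_pair):
    """Turn a predicted (real, imag) pair, shape (2, frames, freqs),
    into a complex spectrogram of shape (frames, freqs)."""
    return ri_pair[0] + 1j * ri_pair[1]

def mvdr_weights(target_est, mixture, ref_mic=0, eps=1e-6):
    """Per-frequency MVDR weights from estimated target images.

    target_est, mixture: complex arrays, shape (mics, frames, freqs).
    Returns complex weights, shape (freqs, mics).
    """
    n_mics = mixture.shape[0]
    noise_est = mixture - target_est  # everything that is not the target
    # Spatial covariance matrices summed over frames: (freqs, mics, mics).
    phi_s = np.einsum('ptf,qtf->fpq', target_est, target_est.conj())
    phi_n = np.einsum('ptf,qtf->fpq', noise_est, noise_est.conj())
    phi_n = phi_n + eps * np.eye(n_mics)  # diagonal loading (assumed value)
    numer = np.linalg.solve(phi_n, phi_s)  # phi_n^{-1} phi_s per frequency
    trace = np.trace(numer, axis1=1, axis2=2)
    # Select the reference-microphone column and normalize by the trace.
    return numer[:, :, ref_mic] / (trace[:, None] + eps)

def apply_beamformer(weights, mixture):
    """Compute y[t, f] = w[f]^H x[t, f] for every frame and frequency."""
    return np.einsum('fp,ptf->tf', weights.conj(), mixture)
```

In this sketch, a network would take stack_ri(mixture) as input and emit an RI pair that ri_to_complex converts to a complex estimate; the beamformer output could then be fed back to a second network for post-filtering, in the spirit of the pipeline the abstract outlines.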

Keywords: Complex spectral mapping; deep learning; microphone array processing; speaker separation

Year: 2021; PMID: 34212067; PMCID: PMC8240467; DOI: 10.1109/taslp.2021.3083405

Source DB: PubMed; Journal: IEEE/ACM Trans Audio Speech Lang Process
