
Divide and Conquer: A Deep CASA Approach to Talker-independent Monaural Speaker Separation.

Yuzhou Liu, DeLiang Wang.

Abstract

We address talker-independent monaural speaker separation from the perspectives of deep learning and computational auditory scene analysis (CASA). Specifically, we decompose the multi-speaker separation task into the stages of simultaneous grouping and sequential grouping. Simultaneous grouping is first performed in each time frame by separating the spectra of different speakers with a permutation-invariantly trained neural network. In the second stage, the frame-level separated spectra are sequentially grouped to different speakers by a clustering network. The proposed deep CASA approach optimizes frame-level separation and speaker tracking in turn, and produces excellent results for both objectives. Experimental results on the benchmark WSJ0-2mix database show that the new approach achieves state-of-the-art results with a modest model size.
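The two stages described in the abstract can be illustrated with a minimal NumPy sketch for the two-speaker case. This is not the authors' implementation: the function names `frame_level_pit` and `organize`, the squared-error criterion, and the `(2, T, F)` array layout are assumptions made for illustration. The first function scores both output-to-speaker assignments in every frame and keeps the cheaper one, which is the idea behind frame-level permutation-invariant training. The second reorders the frame-level outputs into two consistent streams. Here the per-frame assignment comes from the reference spectra (an oracle); at test time no references exist, which is the role the abstract gives to the clustering network.

```python
import numpy as np

def frame_level_pit(est, ref):
    """Frame-level permutation-invariant loss for two speakers (illustrative).

    est, ref: arrays of shape (2, T, F) holding estimated and reference
    spectra for 2 speakers, T frames, F frequency bins (assumed layout).
    Returns (loss, swapped), where swapped[t] is True when the two outputs
    match the references better in swapped order in frame t.
    """
    # Squared error per frame for the two possible assignments.
    keep_cost = ((est[0] - ref[0]) ** 2 + (est[1] - ref[1]) ** 2).sum(axis=-1)
    swap_cost = ((est[0] - ref[1]) ** 2 + (est[1] - ref[0]) ** 2).sum(axis=-1)
    swapped = swap_cost < keep_cost
    # The loss uses the cheaper assignment independently in each frame.
    loss = np.minimum(keep_cost, swap_cost).mean()
    return loss, swapped

def organize(est, swapped):
    """Exchange the two outputs in the frames flagged by `swapped`."""
    out = est.copy()
    out[0, swapped] = est[1, swapped]
    out[1, swapped] = est[0, swapped]
    return out
```

With an oracle assignment, `organize(est, swapped)` yields two streams that each follow one speaker across frames. A sequential-grouping model would have to predict `swapped` (or an equivalent frame labeling) from the mixture and the separated frames alone.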


Keywords:  Monaural speech separation; computational auditory scene analysis; deep CASA; speaker separation

Year:  2019        PMID: 33748322      PMCID: PMC7976856          DOI: 10.1109/taslp.2019.2941148

Source DB:  PubMed          Journal:  IEEE/ACM Trans Audio Speech Lang Process


  6 in total

1.  Informational and energetic masking effects in the perception of two simultaneous talkers.

Authors:  D S Brungart
Journal:  J Acoust Soc Am       Date:  2001-03       Impact factor: 1.840

2.  An algorithm to increase intelligibility for hearing-impaired listeners in the presence of a competing talker.

Authors:  Eric W Healy; Masood Delfarah; Jordan L Vasko; Brittney L Carter; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2017-06       Impact factor: 1.840

3.  Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation.

Authors:  Yi Luo; Nima Mesgarani
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2019-05-06

4.  On Training Targets for Supervised Speech Separation.

Authors:  Yuxuan Wang; Arun Narayanan; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2014-12

5.  A Deep Ensemble Learning Method for Monaural Speech Separation.

Authors:  Xiao-Lei Zhang; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2016-03-01

6.  Complex Ratio Masking for Monaural Speech Separation.

Authors:  Donald S Williamson; Yuxuan Wang; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2015-12-23
  5 in total

1.  A causal and talker-independent speaker separation/dereverberation deep learning algorithm: Cost associated with conversion to real-time capable operation.

Authors:  Eric W Healy; Hassan Taherian; Eric M Johnson; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2021-11       Impact factor: 1.840

2.  Deep learning based speaker separation and dereverberation can generalize across different languages to improve intelligibility.

Authors:  Eric W Healy; Eric M Johnson; Masood Delfarah; Divya S Krishnagiri; Victoria A Sevich; Hassan Taherian; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2021-10       Impact factor: 2.482

3.  Multi-microphone Complex Spectral Mapping for Utterance-wise and Continuous Speech Separation.

Authors:  Zhong-Qiu Wang; Peidong Wang; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2021-05-26

4.  Brain-informed speech separation (BISS) for enhancement of target speaker in multitalker speech perception.

Authors:  Enea Ceolini; Jens Hjortkjær; Daniel D E Wong; James O'Sullivan; Vinay S Raghavan; Jose Herrero; Ashesh D Mehta; Shih-Chii Liu; Nima Mesgarani
Journal:  Neuroimage       Date:  2020-08-20       Impact factor: 6.556

5.  Towards Model Compression for Deep Learning Based Speech Enhancement.

Authors:  Ke Tan; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2021-05-21
