A Deep Ensemble Learning Method for Monaural Speech Separation.

Xiao-Lei Zhang, DeLiang Wang.

Abstract

Monaural speech separation is a fundamental problem in robust speech processing. Recently, deep neural network (DNN)-based speech separation methods, which predict either clean speech or an ideal time-frequency mask, have demonstrated remarkable performance improvement. However, a single DNN with a given window length does not leverage contextual information sufficiently, and the differences between the two optimization objectives are not well understood. In this paper, we propose a deep ensemble method, named multicontext networks, to address monaural speech separation. The first multicontext network averages the outputs of multiple DNNs whose inputs employ different window lengths. The second multicontext network is a stack of multiple DNNs. Each DNN in a module of the stack takes the concatenation of original acoustic features and expansion of the soft output of the lower module as its input, and predicts the ratio mask of the target speaker; the DNNs in the same module employ different contexts. We have conducted extensive experiments with three speech corpora. The results demonstrate the effectiveness of the proposed method. We have also compared the two optimization objectives systematically and found that predicting the ideal time-frequency mask is more efficient in utilizing clean training speech, while predicting clean speech is less sensitive to SNR variations.
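The two multicontext ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The helper names (`expand_context`, `toy_mask_estimator`, `multicontext_average`, `stacked_module_input`), the edge-padding scheme, the window sizes, and the random linear "DNN" stand-in are all assumptions made for illustration. The abstract does not say whether the original acoustic features are themselves context-expanded in the stacked variant, so the sketch leaves them unexpanded.

```python
import numpy as np


def expand_context(features, half_window):
    """Stack each frame with its neighbours: (T, D) -> (T, (2*half_window + 1) * D)."""
    num_frames = features.shape[0]
    # Assumption: repeat the edge frames so every frame has a full window.
    padded = np.pad(features, ((half_window, half_window), (0, 0)), mode="edge")
    windows = [padded[i:i + num_frames] for i in range(2 * half_window + 1)]
    return np.concatenate(windows, axis=1)


def toy_mask_estimator(inputs, num_bins, seed):
    """Hypothetical stand-in for a trained DNN: random linear layer plus sigmoid."""
    rng = np.random.default_rng(seed)
    weights = rng.standard_normal((inputs.shape[1], num_bins)) * 0.1
    return 1.0 / (1.0 + np.exp(-(inputs @ weights)))  # soft mask values in (0, 1)


def multicontext_average(features, half_windows, num_bins):
    """First variant: average the masks of estimators that see different contexts."""
    masks = [
        toy_mask_estimator(expand_context(features, w), num_bins, seed=w)
        for w in half_windows
    ]
    return np.mean(masks, axis=0)


def stacked_module_input(features, lower_mask, half_window):
    """Second variant: original features concatenated with the context-expanded
    soft output of the lower module."""
    return np.concatenate([features, expand_context(lower_mask, half_window)], axis=1)


# Toy data: 50 frames, 20-dimensional features, 16 frequency bins (all arbitrary).
features = np.random.default_rng(0).standard_normal((50, 20))
lower_mask = multicontext_average(features, half_windows=[0, 1, 2], num_bins=16)
upper_input = stacked_module_input(features, lower_mask, half_window=1)
```

In a real system each `toy_mask_estimator` call would be replaced by a separately trained network, and `upper_input` would be fed to the DNNs of the next module, each again using its own context length.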

Keywords:  Deep neural networks; ensemble learning; mapping-based separation; masking-based separation; monaural speech separation; multicontext networks

Year: 2016
PMID: 27917394
PMCID: PMC5131883
DOI: 10.1109/TASLP.2016.2536478

Source DB: PubMed
Journal: IEEE/ACM Trans Audio Speech Lang Process


