On Training Targets for Supervised Speech Separation.

Yuxuan Wang, Arun Narayanan, DeLiang Wang

Abstract

Formulation of speech separation as a supervised learning problem has shown considerable promise. In its simplest form, a supervised learning algorithm, typically a deep neural network, is trained to learn a mapping from noisy features to a time-frequency representation of the target of interest. Traditionally, the ideal binary mask (IBM) is used as the target because of its simplicity and large speech intelligibility gains. The supervised learning framework, however, is not restricted to the use of binary targets. In this study, we evaluate and compare separation results using different training targets, including the IBM, the target binary mask, the ideal ratio mask (IRM), the short-time Fourier transform spectral magnitude and its corresponding mask (FFT-MASK), and the Gammatone frequency power spectrum. Our results in various test conditions reveal that the two ratio mask targets, the IRM and the FFT-MASK, outperform the other targets in terms of objective intelligibility and quality metrics. In addition, we find that masking-based targets are, in general, significantly better than spectral-envelope-based targets. We also present comparisons with recent methods in non-negative matrix factorization and speech enhancement, which show clear performance advantages of supervised speech separation.
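The two oracle masks named in the abstract have standard definitions: the IBM assigns 1 to a time-frequency unit whose local SNR exceeds a criterion and 0 otherwise, while the IRM is a soft ratio of speech energy to total (speech plus noise) energy, usually with a square-root compression. As a minimal sketch (the function name, the default local criterion of 0 dB, and the exponent 0.5 are illustrative assumptions, not values taken from this record), the two targets can be computed from premixed clean-speech and noise magnitudes as:

```python
import numpy as np

def ideal_masks(speech_mag, noise_mag, lc_db=0.0, beta=0.5):
    """Compute IBM and IRM targets from time-frequency magnitudes.

    speech_mag, noise_mag: clean-speech and noise magnitudes on the
    same T-F grid (e.g. STFT or gammatone filterbank outputs).
    lc_db: local SNR criterion for the IBM, in dB (assumed 0 dB here).
    beta: compression exponent for the IRM (0.5 is a common choice).
    """
    s_pow = speech_mag ** 2
    n_pow = noise_mag ** 2
    eps = 1e-12  # guard against division by zero in silent units

    # IBM: 1 where the local SNR exceeds the criterion, 0 elsewhere
    snr_db = 10.0 * np.log10((s_pow + eps) / (n_pow + eps))
    ibm = (snr_db > lc_db).astype(np.float64)

    # IRM: ratio of speech energy to total energy, compressed by beta
    irm = (s_pow / (s_pow + n_pow + eps)) ** beta

    return ibm, irm
```

In a supervised setup, masks like these are computed from the premixed signals at training time and serve as regression or classification targets for the network, which at test time sees only the noisy mixture.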

Keywords:  Deep neural networks; speech separation; supervised learning; training targets

Year:  2014        PMID: 25599083      PMCID: PMC4293540          DOI: 10.1109/TASLP.2014.2352935

Source DB:  PubMed          Journal:  IEEE/ACM Trans Audio Speech Lang Process


References: 9 in total

1.  Isolating the energetic component of speech-on-speech masking with ideal time-frequency segregation.

Authors:  Douglas S Brungart; Peter S Chang; Brian D Simpson; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2006-12       Impact factor: 1.840

2.  Determination of the potential benefit of time-frequency gain manipulation.

Authors:  Michael C Anzalone; Lauren Calandruccio; Karen A Doherty; Laurel H Carney
Journal:  Ear Hear       Date:  2006-10       Impact factor: 3.570

3.  Factors influencing intelligibility of ideal binary-masked speech: implications for noise reduction.

Authors:  Ning Li; Philipos C Loizou
Journal:  J Acoust Soc Am       Date:  2008-03       Impact factor: 1.840

4.  An algorithm to improve speech recognition in noise for hearing-impaired listeners.

Authors:  Eric W Healy; Sarah E Yoho; Yuxuan Wang; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2013-10       Impact factor: 1.840

5.  Role of mask pattern in intelligibility of ideal binary-masked noisy speech.

Authors:  Ulrik Kjems; Jesper B Boldt; Michael S Pedersen; Thomas Lunner; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2009-09       Impact factor: 1.840

6.  An algorithm that improves speech intelligibility in noise for normal-hearing listeners.

Authors:  Gibak Kim; Yang Lu; Yi Hu; Philipos C Loizou
Journal:  J Acoust Soc Am       Date:  2009-09       Impact factor: 1.840

7.  A classification based approach to speech segregation.

Authors:  Kun Han; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2012-11       Impact factor: 1.840

8.  The role of binary mask patterns in automatic speech recognition in background noise.

Authors:  Arun Narayanan; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2013-05       Impact factor: 1.840

9.  Speech intelligibility in background noise with ideal binary time-frequency masking.

Authors:  DeLiang Wang; Ulrik Kjems; Michael S Pedersen; Jesper B Boldt; Thomas Lunner
Journal:  J Acoust Soc Am       Date:  2009-04       Impact factor: 1.840

Cited by: 34 in total

1.  Estimating nonnegative matrix model activations with deep neural networks to increase perceptual speech quality.

Authors:  Donald S Williamson; Yuxuan Wang; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2015-09       Impact factor: 1.840

2.  An algorithm to increase intelligibility for hearing-impaired listeners in the presence of a competing talker.

Authors:  Eric W Healy; Masood Delfarah; Jordan L Vasko; Brittney L Carter; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2017-06       Impact factor: 1.840

3.  Noise Perturbation for Supervised Speech Separation.

Authors:  Jitong Chen; Yuxuan Wang; DeLiang Wang
Journal:  Speech Commun       Date:  2016-04-01       Impact factor: 2.017

4.  A deep learning algorithm to increase intelligibility for hearing-impaired listeners in the presence of a competing talker and reverberation.

Authors:  Eric W Healy; Masood Delfarah; Eric M Johnson; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2019-03       Impact factor: 1.840

5.  An ideal quantized mask to increase intelligibility and quality of speech in noise.

Authors:  Eric W Healy; Jordan L Vasko
Journal:  J Acoust Soc Am       Date:  2018-09       Impact factor: 1.840

6.  A deep learning based segregation algorithm to increase speech intelligibility for hearing-impaired listeners in reverberant-noisy conditions.

Authors:  Yan Zhao; DeLiang Wang; Eric M Johnson; Eric W Healy
Journal:  J Acoust Soc Am       Date:  2018-09       Impact factor: 1.840

7.  Large-scale training to increase speech intelligibility for hearing-impaired listeners in novel noises.

Authors:  Jitong Chen; Yuxuan Wang; Sarah E Yoho; DeLiang Wang; Eric W Healy
Journal:  J Acoust Soc Am       Date:  2016-05       Impact factor: 1.840

8.  A two-stage deep learning algorithm for talker-independent speaker separation in reverberant conditions.

Authors:  Masood Delfarah; Yuzhou Liu; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2020-09       Impact factor: 1.840

9.  Long short-term memory for speaker generalization in supervised speech separation.

Authors:  Jitong Chen; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2017-06       Impact factor: 1.840

10.  Improving Robustness of Deep Neural Network Acoustic Models via Speech Separation and Joint Adaptive Training.

Authors:  Arun Narayanan; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2015-01-14
