On Training Targets for Supervised Speech Separation.

Yuxuan Wang, Arun Narayanan, DeLiang Wang

Abstract

Formulation of speech separation as a supervised learning problem has shown considerable promise. In its simplest form, a supervised learning algorithm, typically a deep neural network, is trained to learn a mapping from noisy features to a time-frequency representation of the target of interest. Traditionally, the ideal binary mask (IBM) is used as the target because of its simplicity and large speech intelligibility gains. The supervised learning framework, however, is not restricted to the use of binary targets. In this study, we evaluate and compare separation results using different training targets, including the IBM, the target binary mask, the ideal ratio mask (IRM), the short-time Fourier transform spectral magnitude and its corresponding mask (FFT-MASK), and the Gammatone frequency power spectrum. Our results in various test conditions reveal that the two ratio mask targets, the IRM and the FFT-MASK, outperform the other targets in terms of objective intelligibility and quality metrics. In addition, we find that masking-based targets are, in general, significantly better than spectral-envelope-based targets. We also present comparisons with recent methods in non-negative matrix factorization and speech enhancement, which show clear performance advantages of supervised speech separation.
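The two oracle masks named in the abstract have standard definitions: the IBM assigns 1 to a time-frequency unit whose local SNR exceeds a criterion and 0 otherwise, while the IRM is a soft ratio of speech energy to total (speech plus noise) energy, usually with a square-root compression. As a minimal sketch (the function name, the default local criterion of 0 dB, and the exponent 0.5 are illustrative assumptions, not values taken from this record), the two targets can be computed from premixed clean-speech and noise magnitudes as:

```python
import numpy as np

def ideal_masks(speech_mag, noise_mag, lc_db=0.0, beta=0.5):
    """Compute IBM and IRM targets from time-frequency magnitudes.

    speech_mag, noise_mag: clean-speech and noise magnitudes on the
    same T-F grid (e.g. STFT or gammatone filterbank outputs).
    lc_db: local SNR criterion for the IBM, in dB (assumed 0 dB here).
    beta: compression exponent for the IRM (0.5 is a common choice).
    """
    s_pow = speech_mag ** 2
    n_pow = noise_mag ** 2
    eps = 1e-12  # guard against division by zero in silent units

    # IBM: 1 where the local SNR exceeds the criterion, 0 elsewhere
    snr_db = 10.0 * np.log10((s_pow + eps) / (n_pow + eps))
    ibm = (snr_db > lc_db).astype(np.float64)

    # IRM: ratio of speech energy to total energy, compressed by beta
    irm = (s_pow / (s_pow + n_pow + eps)) ** beta

    return ibm, irm
```

In a supervised setup, masks like these are computed from the premixed signals at training time and serve as regression or classification targets for the network, which at test time sees only the noisy mixture.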

Keywords:  Deep neural networks; speech separation; supervised learning; training targets

Year:  2014        PMID: 25599083      PMCID: PMC4293540          DOI: 10.1109/TASLP.2014.2352935

Source DB:  PubMed          Journal:  IEEE/ACM Trans Audio Speech Lang Process


References: 9 in total

1.  Isolating the energetic component of speech-on-speech masking with ideal time-frequency segregation.

Authors:  Douglas S Brungart; Peter S Chang; Brian D Simpson; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2006-12       Impact factor: 1.840

2.  Determination of the potential benefit of time-frequency gain manipulation.

Authors:  Michael C Anzalone; Lauren Calandruccio; Karen A Doherty; Laurel H Carney
Journal:  Ear Hear       Date:  2006-10       Impact factor: 3.570

3.  Factors influencing intelligibility of ideal binary-masked speech: implications for noise reduction.

Authors:  Ning Li; Philipos C Loizou
Journal:  J Acoust Soc Am       Date:  2008-03       Impact factor: 1.840

4.  An algorithm to improve speech recognition in noise for hearing-impaired listeners.

Authors:  Eric W Healy; Sarah E Yoho; Yuxuan Wang; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2013-10       Impact factor: 1.840

5.  Role of mask pattern in intelligibility of ideal binary-masked noisy speech.

Authors:  Ulrik Kjems; Jesper B Boldt; Michael S Pedersen; Thomas Lunner; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2009-09       Impact factor: 1.840

6.  An algorithm that improves speech intelligibility in noise for normal-hearing listeners.

Authors:  Gibak Kim; Yang Lu; Yi Hu; Philipos C Loizou
Journal:  J Acoust Soc Am       Date:  2009-09       Impact factor: 1.840

7.  A classification based approach to speech segregation.

Authors:  Kun Han; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2012-11       Impact factor: 1.840

8.  The role of binary mask patterns in automatic speech recognition in background noise.

Authors:  Arun Narayanan; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2013-05       Impact factor: 1.840

9.  Speech intelligibility in background noise with ideal binary time-frequency masking.

Authors:  DeLiang Wang; Ulrik Kjems; Michael S Pedersen; Jesper B Boldt; Thomas Lunner
Journal:  J Acoust Soc Am       Date:  2009-04       Impact factor: 1.840

Cited by: 34 in total

1.  Estimating nonnegative matrix model activations with deep neural networks to increase perceptual speech quality.

Authors:  Donald S Williamson; Yuxuan Wang; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2015-09       Impact factor: 1.840

2.  An algorithm to increase intelligibility for hearing-impaired listeners in the presence of a competing talker.

Authors:  Eric W Healy; Masood Delfarah; Jordan L Vasko; Brittney L Carter; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2017-06       Impact factor: 1.840

3.  Noise Perturbation for Supervised Speech Separation.

Authors:  Jitong Chen; Yuxuan Wang; DeLiang Wang
Journal:  Speech Commun       Date:  2016-04-01       Impact factor: 2.017

4.  A deep learning algorithm to increase intelligibility for hearing-impaired listeners in the presence of a competing talker and reverberation.

Authors:  Eric W Healy; Masood Delfarah; Eric M Johnson; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2019-03       Impact factor: 1.840

5.  An ideal quantized mask to increase intelligibility and quality of speech in noise.

Authors:  Eric W Healy; Jordan L Vasko
Journal:  J Acoust Soc Am       Date:  2018-09       Impact factor: 1.840

6.  A deep learning based segregation algorithm to increase speech intelligibility for hearing-impaired listeners in reverberant-noisy conditions.

Authors:  Yan Zhao; DeLiang Wang; Eric M Johnson; Eric W Healy
Journal:  J Acoust Soc Am       Date:  2018-09       Impact factor: 1.840

7.  Large-scale training to increase speech intelligibility for hearing-impaired listeners in novel noises.

Authors:  Jitong Chen; Yuxuan Wang; Sarah E Yoho; DeLiang Wang; Eric W Healy
Journal:  J Acoust Soc Am       Date:  2016-05       Impact factor: 1.840

8.  A two-stage deep learning algorithm for talker-independent speaker separation in reverberant conditions.

Authors:  Masood Delfarah; Yuzhou Liu; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2020-09       Impact factor: 1.840

9.  Long short-term memory for speaker generalization in supervised speech separation.

Authors:  Jitong Chen; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2017-06       Impact factor: 1.840

10.  Improving Robustness of Deep Neural Network Acoustic Models via Speech Separation and Joint Adaptive Training.

Authors:  Arun Narayanan; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2015-01-14
