
Noise Perturbation for Supervised Speech Separation.

Jitong Chen¹, Yuxuan Wang¹, DeLiang Wang²

Abstract

Speech separation can be treated as a mask estimation problem, where interference-dominant portions are masked in a time-frequency representation of noisy speech. In supervised speech separation, a classifier is typically trained on a set of speech-noise mixtures. To make the classifier generalize well, it is important to use limited training data efficiently. When target speech is severely interfered with by nonstationary noise, a classifier tends to mistake noise patterns for speech patterns. Expanding a noise through proper perturbation during training helps expose the classifier to a broader variety of noisy conditions, and hence may lead to better separation performance. This study examines three noise perturbations for supervised speech separation at low signal-to-noise ratios (SNRs): noise rate perturbation, vocal tract length perturbation, and frequency perturbation. Separation performance is evaluated in terms of classification accuracy, hit minus false-alarm rate (HIT−FA), and short-time objective intelligibility (STOI). The experimental results show that frequency perturbation performs best among the three perturbations. In particular, frequency perturbation is effective in reducing the error of misclassifying a noise pattern as a speech pattern.
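The abstract names the training target and the evaluation measures without spelling them out. The following is a minimal sketch in Python (standard library only), assuming a T-F representation stored as nested lists of per-unit energies. It labels each unit with an ideal binary mask (IBM), as the classification framing implies; scores an estimated mask with classification accuracy and HIT−FA; and applies a deliberately simplified frequency perturbation to a noise spectrogram. The function names, the local criterion default `lc_db=0.0`, the 3-channel smoothing window, and `max_shift` are hypothetical illustrations, not the formulation or parameters used in the paper.

```python
import math
import random

# Assumption: a T-F representation is a list of frames, each a list of
# per-channel energies (linear power). This layout is illustrative only.


def ideal_binary_mask(speech, noise, lc_db=0.0):
    """Label a unit 1 (speech-dominant) if its local SNR exceeds the local
    criterion lc_db (a hypothetical default), else 0 (noise-dominant)."""
    eps = 1e-12  # avoids log of zero for silent units
    mask = []
    for s_frame, n_frame in zip(speech, noise):
        mask.append([
            1 if 10.0 * math.log10(max(s, eps) / max(n, eps)) > lc_db else 0
            for s, n in zip(s_frame, n_frame)
        ])
    return mask


def hit_minus_fa(ibm, est):
    """Return (HIT, FA, HIT-FA). HIT: fraction of speech-dominant units
    labeled 1. FA: fraction of noise-dominant units wrongly labeled 1."""
    hits = false_alarms = n_speech = n_noise = 0
    for ibm_row, est_row in zip(ibm, est):
        for target, guess in zip(ibm_row, est_row):
            if target == 1:
                n_speech += 1
                hits += guess == 1
            else:
                n_noise += 1
                false_alarms += guess == 1
    hit = hits / n_speech if n_speech else 0.0
    fa = false_alarms / n_noise if n_noise else 0.0
    return hit, fa, hit - fa


def classification_accuracy(ibm, est):
    """Fraction of T-F units whose estimated label matches the IBM."""
    pairs = [(t, g) for ir, er in zip(ibm, est) for t, g in zip(ir, er)]
    return sum(t == g for t, g in pairs) / len(pairs) if pairs else 0.0


def frequency_perturb(noise, max_shift=1.0, seed=0):
    """Hypothetical, simplified frequency perturbation: each unit in a frame is
    re-read from a nearby fractional channel f + r, where r is a uniform random
    offset averaged over neighboring channels and bounded by max_shift."""
    rng = random.Random(seed)
    perturbed = []
    for frame in noise:
        n_ch = len(frame)
        raw = [rng.uniform(-1.0, 1.0) for _ in range(n_ch)]
        offsets = []
        for f in range(n_ch):
            # 3-channel moving average so adjacent channels shift coherently
            window = raw[max(0, f - 1):f + 2]
            offsets.append(max_shift * sum(window) / len(window))
        new_frame = []
        for f in range(n_ch):
            pos = min(max(f + offsets[f], 0.0), n_ch - 1.0)  # stay inside the band
            lo = int(math.floor(pos))
            hi = min(lo + 1, n_ch - 1)
            w = pos - lo
            # linear interpolation between the two nearest channels
            new_frame.append((1.0 - w) * frame[lo] + w * frame[hi])
        perturbed.append(new_frame)
    return perturbed


if __name__ == "__main__":
    speech = [[4.0, 1.0, 0.2], [3.0, 0.5, 0.1]]
    noise = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
    ibm = ideal_binary_mask(speech, noise)   # [[1, 0, 0], [1, 0, 0]]
    est = [[1, 1, 0], [0, 0, 0]]             # one miss, one false alarm
    print(hit_minus_fa(ibm, est))            # (0.5, 0.25, 0.25)
    print(classification_accuracy(ibm, est))
    print(frequency_perturb([[0.0, 1.0, 2.0, 3.0]], max_shift=0.5, seed=1))
```

FA counts exactly the error the abstract highlights, noise-dominant units labeled as speech, and HIT−FA penalizes it directly. Plain accuracy weights each class by how often it occurs, so false alarms in a sparse class can be diluted. In this sketch the perturbation is applied only to the noise before mixing, matching the abstract's idea of expanding the noise; the paper's exact perturbation procedure may differ.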


Keywords:  Speech separation; noise perturbation; supervised learning

Year:  2016        PMID: 26900194      PMCID: PMC4754974          DOI: 10.1016/j.specom.2015.12.006

Source DB:  PubMed          Journal:  Speech Commun        ISSN: 0167-6393            Impact factor:   2.017

Cited by: 4 in total

1.  Large-scale training to increase speech intelligibility for hearing-impaired listeners in novel noises.

Authors:  Jitong Chen; Yuxuan Wang; Sarah E Yoho; DeLiang Wang; Eric W Healy
Journal:  J Acoust Soc Am       Date:  2016-05       Impact factor: 1.840

2.  Long short-term memory for speaker generalization in supervised speech separation.

Authors:  Jitong Chen; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2017-06       Impact factor: 1.840

3.  A Deep Ensemble Learning Method for Monaural Speech Separation.

Authors:  Xiao-Lei Zhang; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2016-03-01

4.  The benefit of combining a deep neural network architecture with ideal ratio mask estimation in computational speech segregation to improve speech intelligibility.

Authors:  Thomas Bentsen; Tobias May; Abigail A Kressner; Torsten Dau
Journal:  PLoS One       Date:  2018-05-15       Impact factor: 3.240
