
Reconstruction techniques for improving the perceptual quality of binary masked speech.

Donald S Williamson, Yuxuan Wang, DeLiang Wang.

Abstract

This study proposes an approach to improve the perceptual quality of speech separated by binary masking through the use of reconstruction in the time-frequency domain. Non-negative matrix factorization and sparse reconstruction approaches are investigated, both using a linear combination of basis vectors to represent a signal. In this approach, the short-time Fourier transform (STFT) of separated speech is represented as a linear combination of STFTs from a clean speech dictionary. Binary masking for separation is performed using deep neural networks or Bayesian classifiers. The perceptual evaluation of speech quality, which is a standard objective speech quality measure, is used to evaluate the performance of the proposed approach. The results show that the proposed techniques improve the perceptual quality of binary masked speech, and outperform traditional time-frequency reconstruction approaches.
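The reconstruction idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes magnitude spectrograms only (phase is ignored), a fixed non-negative dictionary W_clean whose columns stand in for clean-speech spectra, and the standard Lee-Seung multiplicative update for the Euclidean cost. The function names, matrix sizes, and synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np


def nmf_activations(V, W, n_iter=200, eps=1e-12, seed=0):
    """Estimate non-negative activations H with V ~= W @ H, keeping W fixed.

    V: (n_freq, n_frames) non-negative magnitudes of the masked speech.
    W: (n_freq, n_atoms) non-negative dictionary (assumed: clean-speech spectra).
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    WtV = W.T @ V
    WtW = W.T @ W
    for _ in range(n_iter):
        # Lee-Seung multiplicative update for the Euclidean cost ||V - W H||^2.
        H *= WtV / (WtW @ H + eps)
    return H


def reconstruct(V_masked, W_clean, n_iter=200):
    """Represent the masked magnitudes as a linear combination of dictionary atoms."""
    H = nmf_activations(V_masked, W_clean, n_iter=n_iter)
    return W_clean @ H


# Synthetic stand-in data (hypothetical sizes, not taken from the paper).
rng = np.random.default_rng(1)
W_clean = rng.random((64, 20))                  # dictionary of "clean" spectra
H_true = rng.random((20, 30))                   # ground-truth activations
V_clean = W_clean @ H_true                      # clean magnitude spectrogram
mask = (rng.random(V_clean.shape) > 0.3)        # binary mask zeroing ~30% of units
V_masked = V_clean * mask                       # binary-masked magnitudes
V_hat = reconstruct(V_masked, W_clean)          # reconstruction from the dictionary
```

Because every column of V_hat is a non-negative combination of dictionary atoms, time-frequency units zeroed by the mask can be partially filled in from the dictionary. A sparse-reconstruction variant would replace the update with a sparsity-constrained solver.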


Year:  2014        PMID: 25096123      PMCID: PMC5392053          DOI: 10.1121/1.4884759

Source DB:  PubMed          Journal:  J Acoust Soc Am        ISSN: 0001-4966            Impact factor:   1.840


  11 in total

1.  Learning the parts of objects by non-negative matrix factorization.

Authors:  D D Lee; H S Seung
Journal:  Nature       Date:  1999-10-21       Impact factor: 49.962

2.  Image denoising via sparse and redundant representations over learned dictionaries.

Authors:  Michael Elad; Michal Aharon
Journal:  IEEE Trans Image Process       Date:  2006-12       Impact factor: 10.856

3.  Determination of the potential benefit of time-frequency gain manipulation.

Authors:  Michael C Anzalone; Lauren Calandruccio; Karen A Doherty; Laurel H Carney
Journal:  Ear Hear       Date:  2006-10       Impact factor: 3.570

4.  Factors influencing intelligibility of ideal binary-masked speech: implications for noise reduction.

Authors:  Ning Li; Philipos C Loizou
Journal:  J Acoust Soc Am       Date:  2008-03       Impact factor: 1.840

5.  Sparse representation for color image restoration.

Authors:  Julien Mairal; Michael Elad; Guillermo Sapiro
Journal:  IEEE Trans Image Process       Date:  2008-01       Impact factor: 10.856

6.  Time-frequency masking for speech separation and its potential for hearing aid design. (Review)

Authors: 
Journal:  Trends Amplif       Date:  2008-10-30

7.  An algorithm to improve speech recognition in noise for hearing-impaired listeners.

Authors:  Eric W Healy; Sarah E Yoho; Yuxuan Wang; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2013-10       Impact factor: 1.840

8.  An algorithm that improves speech intelligibility in noise for normal-hearing listeners.

Authors:  Gibak Kim; Yang Lu; Yi Hu; Philipos C Loizou
Journal:  J Acoust Soc Am       Date:  2009-09       Impact factor: 1.840

9.  Improvement of intelligibility of ideal binary-masked noisy speech by adding background noise.

Authors:  Shuyang Cao; Liang Li; Xihong Wu
Journal:  J Acoust Soc Am       Date:  2011-04       Impact factor: 1.840

10.  Speech intelligibility in background noise with ideal binary time-frequency masking.

Authors:  DeLiang Wang; Ulrik Kjems; Michael S Pedersen; Jesper B Boldt; Thomas Lunner
Journal:  J Acoust Soc Am       Date:  2009-04       Impact factor: 1.840

  3 in total

1.  Estimating nonnegative matrix model activations with deep neural networks to increase perceptual speech quality.

Authors:  Donald S Williamson; Yuxuan Wang; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2015-09       Impact factor: 1.840

2.  Impact of phase estimation on single-channel speech separation based on time-frequency masking.

Authors:  Florian Mayer; Donald S Williamson; Pejman Mowlaee; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2017-06       Impact factor: 1.840

3.  Complex Ratio Masking for Monaural Speech Separation.

Authors:  Donald S Williamson; Yuxuan Wang; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2015-12-23
