Estimating nonnegative matrix model activations with deep neural networks to increase perceptual speech quality.

Donald S Williamson, Yuxuan Wang, DeLiang Wang.

Abstract

As a means of speech separation, time-frequency masking applies a gain function to the time-frequency representation of noisy speech. On the other hand, nonnegative matrix factorization (NMF) addresses separation by linearly combining basis vectors from speech and noise models to approximate noisy speech. This paper presents an approach for improving the perceptual quality of speech separated from background noise at low signal-to-noise ratios. An ideal ratio mask is estimated, which separates speech from noise with reasonable sound quality. A deep neural network then approximates clean speech by estimating activation weights from the ratio-masked speech, where the weights linearly combine elements from an NMF speech model. Systematic comparisons using objective metrics, including the perceptual evaluation of speech quality, show that the proposed algorithm achieves higher speech quality than related masking and NMF methods. In addition, a listening test was performed and its results show that the output of the proposed algorithm is preferred over the comparison systems in terms of speech quality.
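
The two building blocks named in the abstract, a ratio mask applied as a time-frequency gain and a clean-speech approximation formed by linearly combining NMF basis vectors, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The mask uses a commonly cited ideal-ratio-mask form with exponent 0.5, which is an assumption here; the deep neural network that the paper uses to estimate activations is replaced by standard Euclidean multiplicative updates with a fixed basis; and all data, sizes, and function names (`ideal_ratio_mask`, `nmf_activations`) are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def ideal_ratio_mask(speech_power, noise_power, beta=0.5):
    """Gain per time-frequency unit: (S / (S + N)) ** beta (assumed form)."""
    return (speech_power / (speech_power + noise_power + EPS)) ** beta


def nmf_activations(V, W, n_iter=200):
    """Estimate nonnegative H such that V ~= W @ H, with the basis W held fixed.

    Uses Euclidean multiplicative updates as a stand-in for the
    DNN-based activation estimator described in the abstract.
    """
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
    return H


# Synthetic magnitude spectrograms (hypothetical sizes and data).
rng = np.random.default_rng(1)
n_freq, n_frames, n_basis = 64, 20, 8
W_speech = rng.random((n_freq, n_basis))   # stand-in for a trained speech basis
H_true = rng.random((n_basis, n_frames))
speech_mag = W_speech @ H_true
noise_mag = 0.5 * rng.random((n_freq, n_frames))
noisy_mag = speech_mag + noise_mag         # simplification: magnitudes add

# Step 1: time-frequency masking applies a gain to the noisy representation.
mask = ideal_ratio_mask(speech_mag ** 2, noise_mag ** 2)
masked_mag = mask * noisy_mag

# Step 2: approximate clean speech as a linear combination of speech basis vectors.
H_est = nmf_activations(masked_mag, W_speech)
reconstructed = W_speech @ H_est
```

In this sketch the mask is computed from the known speech and noise powers, which is only possible with synthetic data; in the paper's setting the mask is itself estimated from the noisy signal.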

Year: 2015 PMID: 26428778 PMCID: PMC5392055 DOI: 10.1121/1.4928612

Source DB: PubMed Journal: J Acoust Soc Am ISSN: 0001-4966 Impact factor: 1.840

8 in total

1. Learning the parts of objects by non-negative matrix factorization.

Authors: D D Lee; H S Seung
Journal: Nature Date: 1999-10-21 Impact factor: 49.962

2. Effects of noise and distortion on speech quality judgments in normal-hearing and hearing-impaired listeners.

Authors: Kathryn H Arehart; James M Kates; Melinda C Anderson; Lewis O Harvey
Journal: J Acoust Soc Am Date: 2007-08 Impact factor: 1.840

3. An algorithm to improve speech recognition in noise for hearing-impaired listeners.

Authors: Eric W Healy; Sarah E Yoho; Yuxuan Wang; DeLiang Wang
Journal: J Acoust Soc Am Date: 2013-10 Impact factor: 1.840

4. Nonnegative matrix factorization with the Itakura-Saito divergence: with application to music analysis.

Authors: Cédric Févotte; Nancy Bertin; Jean-Louis Durrieu
Journal: Neural Comput Date: 2009-03 Impact factor: 2.026

5. An algorithm that improves speech intelligibility in noise for normal-hearing listeners.

Authors: Gibak Kim; Yang Lu; Yi Hu; Philipos C Loizou
Journal: J Acoust Soc Am Date: 2009-09 Impact factor: 1.840

6. Ideal time-frequency masking algorithms lead to different speech intelligibility and quality in normal-hearing and cochlear implant listeners.

Authors: Raphael Koning; Nilesh Madhu; Jan Wouters
Journal: IEEE Trans Biomed Eng Date: 2014-08-26 Impact factor: 4.538

7. Reconstruction techniques for improving the perceptual quality of binary masked speech.

Authors: Donald S Williamson; Yuxuan Wang; DeLiang Wang
Journal: J Acoust Soc Am Date: 2014-08 Impact factor: 1.840

8. On Training Targets for Supervised Speech Separation.

Authors: Yuxuan Wang; Arun Narayanan; DeLiang Wang
Journal: IEEE/ACM Trans Audio Speech Lang Process Date: 2014-12