
Complex Ratio Masking for Monaural Speech Separation.

Donald S. Williamson, Yuxuan Wang, DeLiang Wang

Abstract

Speech separation systems usually operate on the short-time Fourier transform (STFT) of noisy speech, enhancing only the magnitude spectrum while leaving the phase spectrum unchanged, on the long-held assumption that phase is unimportant for speech enhancement. Recent studies, however, suggest that phase is important for perceptual quality, leading some researchers to enhance the magnitude and phase spectra jointly. We present a supervised monaural speech separation approach that simultaneously enhances both by operating in the complex domain: a deep neural network estimates the real and imaginary components of the ideal ratio mask defined in the complex domain. We report separation results for the proposed method and compare them to related systems. The proposed approach outperforms these methods on several objective metrics, including the perceptual evaluation of speech quality (PESQ), and in a listening test subjects preferred it at least 69% of the time.
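The complex ideal ratio mask described in the abstract can be sketched as follows. The idea is that a complex-valued mask M applied element-wise to the noisy STFT Y recovers the clean STFT S, i.e. M = S / Y, so estimating M's real and imaginary parts enhances magnitude and phase together. This is a minimal NumPy sketch under that definition; the function names, the variable names S and Y, and the regularizing `eps` term are illustrative assumptions, not the paper's notation or implementation.

```python
import numpy as np

def cirm(S, Y, eps=1e-8):
    """Real and imaginary parts of the complex ideal ratio mask M = S / Y.

    S, Y: complex STFT arrays of clean and noisy speech (same shape).
    eps regularizes division in near-silent time-frequency units
    (an assumption for numerical stability, not from the paper).
    """
    denom = Y.real**2 + Y.imag**2 + eps
    M_r = (Y.real * S.real + Y.imag * S.imag) / denom
    M_i = (Y.real * S.imag - Y.imag * S.real) / denom
    return M_r, M_i

def apply_mask(M_r, M_i, Y):
    """Apply a complex mask to the noisy STFT: S_hat = (M_r + j*M_i) * Y."""
    return (M_r + 1j * M_i) * Y
```

In a supervised setting, a network would be trained to predict M_r and M_i from features of the noisy input; at test time the predicted mask is applied to Y and the enhanced waveform is obtained by the inverse STFT.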

Keywords:  Deep neural networks; complex ideal ratio mask; speech quality; speech separation

Year:  2015        PMID: 27069955      PMCID: PMC4826046          DOI: 10.1109/TASLP.2015.2512042

Source DB:  PubMed          Journal:  IEEE/ACM Trans Audio Speech Lang Process


  8 in total

1.  An evaluation of objective measures for intelligibility prediction of time-frequency weighted noisy speech.

Authors:  Cees H Taal; Richard C Hendriks; Richard Heusdens; Jesper Jensen
Journal:  J Acoust Soc Am       Date:  2011-11       Impact factor: 1.840

2.  Estimating nonnegative matrix model activations with deep neural networks to increase perceptual speech quality.

Authors:  Donald S Williamson; Yuxuan Wang; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2015-09       Impact factor: 1.840

3.  Effects of noise and distortion on speech quality judgments in normal-hearing and hearing-impaired listeners.

Authors:  Kathryn H Arehart; James M Kates; Melinda C Anderson; Lewis O Harvey
Journal:  J Acoust Soc Am       Date:  2007-08       Impact factor: 1.840

4.  An algorithm to improve speech recognition in noise for hearing-impaired listeners.

Authors:  Eric W Healy; Sarah E Yoho; Yuxuan Wang; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2013-10       Impact factor: 1.840

5.  An algorithm that improves speech intelligibility in noise for normal-hearing listeners.

Authors:  Gibak Kim; Yang Lu; Yi Hu; Philipos C Loizou
Journal:  J Acoust Soc Am       Date:  2009-09       Impact factor: 1.840

6.  Ideal time-frequency masking algorithms lead to different speech intelligibility and quality in normal-hearing and cochlear implant listeners.

Authors:  Raphael Koning; Nilesh Madhu; Jan Wouters
Journal:  IEEE Trans Biomed Eng       Date:  2014-08-26       Impact factor: 4.538

7.  Reconstruction techniques for improving the perceptual quality of binary masked speech.

Authors:  Donald S Williamson; Yuxuan Wang; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2014-08       Impact factor: 1.840

8.  On Training Targets for Supervised Speech Separation.

Authors:  Yuxuan Wang; Arun Narayanan; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2014-12
  19 in total

1.  A talker-independent deep learning algorithm to increase intelligibility for hearing-impaired listeners in reverberant competing talker conditions.

Authors:  Eric W Healy; Eric M Johnson; Masood Delfarah; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2020-06       Impact factor: 1.840

2.  A two-stage deep learning algorithm for talker-independent speaker separation in reverberant conditions.

Authors:  Masood Delfarah; Yuzhou Liu; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2020-09       Impact factor: 1.840

3.  Complex Spectral Mapping for Single- and Multi-Channel Speech Enhancement and Robust ASR.

Authors:  Zhong-Qiu Wang; Peidong Wang; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2020-05-28

4.  Deep Learning Based Target Cancellation for Speech Dereverberation.

Authors:  Zhong-Qiu Wang; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2020-02-28

5.  Learning Complex Spectral Mapping with Gated Convolutional Recurrent Networks for Monaural Speech Enhancement.

Authors:  Ke Tan; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2019-11-22

6.  On Cross-Corpus Generalization of Deep Learning Based Speech Enhancement.

Authors:  Ashutosh Pandey; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2020-08-14

7.  Divide and Conquer: A Deep CASA Approach to Talker-independent Monaural Speaker Separation.

Authors:  Yuzhou Liu; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2019-09-12

8.  Time-Frequency Masking in the Complex Domain for Speech Dereverberation and Denoising.

Authors:  Donald S Williamson; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2017-04-20

9.  A Deep Ensemble Learning Method for Monaural Speech Separation.

Authors:  Xiao-Lei Zhang; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2016-03-01

10.  Deep Learning Based Real-time Speech Enhancement for Dual-microphone Mobile Phones.

Authors:  Ke Tan; Xueliang Zhang; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2021-05-21
