
Complex Ratio Masking for Monaural Speech Separation.

Donald S. Williamson, Yuxuan Wang, DeLiang Wang

Abstract

Speech separation systems usually operate on the short-time Fourier transform (STFT) of noisy speech, enhancing only the magnitude spectrum while leaving the phase spectrum unchanged, on the long-held assumption that phase is unimportant for speech enhancement. Recent studies, however, suggest that phase is important for perceptual quality, leading some researchers to enhance the magnitude and phase spectra jointly. We present a supervised monaural speech separation approach that simultaneously enhances both by operating in the complex domain: a deep neural network estimates the real and imaginary components of the ideal ratio mask defined in the complex domain. We report separation results for the proposed method and compare them to related systems. The proposed approach outperforms these methods on several objective metrics, including the perceptual evaluation of speech quality (PESQ), and in a listening test subjects preferred it at least 69% of the time.
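The complex ideal ratio mask described in the abstract can be sketched as follows. The idea is that a complex-valued mask M applied element-wise to the noisy STFT Y recovers the clean STFT S, i.e. M = S / Y, so estimating M's real and imaginary parts enhances magnitude and phase together. This is a minimal NumPy sketch under that definition; the function names, the variable names S and Y, and the regularizing `eps` term are illustrative assumptions, not the paper's notation or implementation.

```python
import numpy as np

def cirm(S, Y, eps=1e-8):
    """Real and imaginary parts of the complex ideal ratio mask M = S / Y.

    S, Y: complex STFT arrays of clean and noisy speech (same shape).
    eps regularizes division in near-silent time-frequency units
    (an assumption for numerical stability, not from the paper).
    """
    denom = Y.real**2 + Y.imag**2 + eps
    M_r = (Y.real * S.real + Y.imag * S.imag) / denom
    M_i = (Y.real * S.imag - Y.imag * S.real) / denom
    return M_r, M_i

def apply_mask(M_r, M_i, Y):
    """Apply a complex mask to the noisy STFT: S_hat = (M_r + j*M_i) * Y."""
    return (M_r + 1j * M_i) * Y
```

In a supervised setting, a network would be trained to predict M_r and M_i from features of the noisy input; at test time the predicted mask is applied to Y and the enhanced waveform is obtained by the inverse STFT.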

Keywords:  Deep neural networks; complex ideal ratio mask; speech quality; speech separation

Year:  2015        PMID: 27069955      PMCID: PMC4826046          DOI: 10.1109/TASLP.2015.2512042

Source DB:  PubMed          Journal:  IEEE/ACM Trans Audio Speech Lang Process


  8 in total

1.  An evaluation of objective measures for intelligibility prediction of time-frequency weighted noisy speech.

Authors:  Cees H Taal; Richard C Hendriks; Richard Heusdens; Jesper Jensen
Journal:  J Acoust Soc Am       Date:  2011-11       Impact factor: 1.840

2.  Estimating nonnegative matrix model activations with deep neural networks to increase perceptual speech quality.

Authors:  Donald S Williamson; Yuxuan Wang; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2015-09       Impact factor: 1.840

3.  Effects of noise and distortion on speech quality judgments in normal-hearing and hearing-impaired listeners.

Authors:  Kathryn H Arehart; James M Kates; Melinda C Anderson; Lewis O Harvey
Journal:  J Acoust Soc Am       Date:  2007-08       Impact factor: 1.840

4.  An algorithm to improve speech recognition in noise for hearing-impaired listeners.

Authors:  Eric W Healy; Sarah E Yoho; Yuxuan Wang; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2013-10       Impact factor: 1.840

5.  An algorithm that improves speech intelligibility in noise for normal-hearing listeners.

Authors:  Gibak Kim; Yang Lu; Yi Hu; Philipos C Loizou
Journal:  J Acoust Soc Am       Date:  2009-09       Impact factor: 1.840

6.  Ideal time-frequency masking algorithms lead to different speech intelligibility and quality in normal-hearing and cochlear implant listeners.

Authors:  Raphael Koning; Nilesh Madhu; Jan Wouters
Journal:  IEEE Trans Biomed Eng       Date:  2014-08-26       Impact factor: 4.538

7.  Reconstruction techniques for improving the perceptual quality of binary masked speech.

Authors:  Donald S Williamson; Yuxuan Wang; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2014-08       Impact factor: 1.840

8.  On Training Targets for Supervised Speech Separation.

Authors:  Yuxuan Wang; Arun Narayanan; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2014-12
  19 in total

1.  A talker-independent deep learning algorithm to increase intelligibility for hearing-impaired listeners in reverberant competing talker conditions.

Authors:  Eric W Healy; Eric M Johnson; Masood Delfarah; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2020-06       Impact factor: 1.840

2.  A two-stage deep learning algorithm for talker-independent speaker separation in reverberant conditions.

Authors:  Masood Delfarah; Yuzhou Liu; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2020-09       Impact factor: 1.840

3.  Complex Spectral Mapping for Single- and Multi-Channel Speech Enhancement and Robust ASR.

Authors:  Zhong-Qiu Wang; Peidong Wang; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2020-05-28

4.  Deep Learning Based Target Cancellation for Speech Dereverberation.

Authors:  Zhong-Qiu Wang; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2020-02-28

5.  Learning Complex Spectral Mapping with Gated Convolutional Recurrent Networks for Monaural Speech Enhancement.

Authors:  Ke Tan; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2019-11-22

6.  On Cross-Corpus Generalization of Deep Learning Based Speech Enhancement.

Authors:  Ashutosh Pandey; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2020-08-14

7.  Divide and Conquer: A Deep CASA Approach to Talker-independent Monaural Speaker Separation.

Authors:  Yuzhou Liu; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2019-09-12

8.  Time-Frequency Masking in the Complex Domain for Speech Dereverberation and Denoising.

Authors:  Donald S Williamson; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2017-04-20

9.  A Deep Ensemble Learning Method for Monaural Speech Separation.

Authors:  Xiao-Lei Zhang; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2016-03-01

10.  Deep Learning Based Real-time Speech Enhancement for Dual-microphone Mobile Phones.

Authors:  Ke Tan; Xueliang Zhang; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2021-05-21
