
Impact of phase estimation on single-channel speech separation based on time-frequency masking.

Florian Mayer, Donald S Williamson, Pejman Mowlaee, DeLiang Wang.

Abstract

Time-frequency masking is a common solution to the single-channel source separation (SCSS) problem, where the goal is to find a time-frequency mask that separates the underlying sources from an observed mixture. The estimated mask is then applied to the mixed signal to extract the desired signal. During signal reconstruction, the time-frequency-masked spectral amplitude is combined with the mixture phase. This article considers the impact of replacing the mixture spectral phase with an estimated clean spectral phase, which is combined with the magnitude spectrum estimated by a conventional model-based approach. Because the proposed phase estimator requires an estimate of the fundamental frequency of the underlying signal from the mixture, a robust pitch estimator is also proposed. The upper-bound results obtained with the clean phase show the potential of phase-aware processing in single-channel source separation. The experiments further demonstrate that replacing the mixture phase with the estimated clean spectral phase consistently improves perceptual speech quality, predicted speech intelligibility, and source separation performance across all signal-to-noise ratios and noise scenarios.
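
The reconstruction step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' method: the signals are synthetic (a harmonic tone plus white noise at 0 dB), the mask is an oracle ratio mask rather than a model-based estimate, and the "clean phase" is taken directly from the known target, so it corresponds only to the upper-bound case. The frame length, hop size, and all function names are assumptions chosen for illustration.

```python
import numpy as np

FRAME, HOP = 256, 128                      # assumed analysis parameters
WINDOW = np.hanning(FRAME + 1)[:-1]        # periodic Hann window

def stft(x):
    """Windowed short-time Fourier transform, one row per frame."""
    n_frames = 1 + (len(x) - FRAME) // HOP
    frames = np.stack([x[i * HOP:i * HOP + FRAME] * WINDOW
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(X):
    """Weighted overlap-add inverse of stft()."""
    frames = np.fft.irfft(X, n=FRAME, axis=1)
    out = np.zeros(HOP * (X.shape[0] - 1) + FRAME)
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + FRAME] += frame * WINDOW
        norm[i * HOP:i * HOP + FRAME] += WINDOW ** 2
    return out / np.maximum(norm, 1e-8)

def snr_db(reference, estimate):
    """Signal-to-error ratio in dB between a reference and an estimate."""
    n = min(len(reference), len(estimate))
    err = reference[:n] - estimate[:n]
    return 10 * np.log10(np.sum(reference[:n] ** 2) / np.sum(err ** 2))

# Synthetic stand-ins for speech and noise (assumption: not real data).
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(2 * fs) / fs
target = sum(np.sin(2 * np.pi * 220 * k * t) / k for k in (1, 2, 3))
interference = rng.standard_normal(len(t))
interference *= np.sqrt(np.mean(target ** 2) / np.mean(interference ** 2))
mixture = target + interference            # 0 dB mixture

S, N, Y = stft(target), stft(interference), stft(mixture)

# Oracle ratio mask applied to the mixture magnitude.
mask = np.abs(S) / (np.abs(S) + np.abs(N) + 1e-12)
masked_mag = mask * np.abs(Y)

# Same masked magnitude, two different phases.
est_mix_phase = masked_mag * np.exp(1j * np.angle(Y))    # mixture phase
est_clean_phase = masked_mag * np.exp(1j * np.angle(S))  # oracle clean phase

# Squared error against the target spectrum in the STFT domain.
err_mix = np.sum(np.abs(est_mix_phase - S) ** 2)
err_clean = np.sum(np.abs(est_clean_phase - S) ** 2)

print("STFT-domain error, mixture phase:", err_mix)
print("STFT-domain error, clean phase:  ", err_clean)
print("SNR (dB), mixture phase:", snr_db(target, istft(est_mix_phase)))
print("SNR (dB), clean phase:  ", snr_db(target, istft(est_clean_phase)))
```

For a fixed magnitude estimate, the error in each time-frequency bin is smallest when the phase equals the target phase, so the STFT-domain error with the oracle clean phase can never exceed the error with the mixture phase. How much of that gap an estimated phase recovers in practice depends on the phase and pitch estimators, which this sketch does not attempt to reproduce.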

Year:  2017        PMID: 28679243      PMCID: PMC6909979          DOI: 10.1121/1.4986647

Source DB:  PubMed          Journal:  J Acoust Soc Am        ISSN: 0001-4966            Impact factor:   1.840


  5 in total

1.  Estimating nonnegative matrix model activations with deep neural networks to increase perceptual speech quality.

Authors:  Donald S Williamson; Yuxuan Wang; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2015-09       Impact factor: 1.840

2.  An algorithm to improve speech recognition in noise for hearing-impaired listeners.

Authors:  Eric W Healy; Sarah E Yoho; Yuxuan Wang; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2013-10       Impact factor: 1.840

3.  An algorithm that improves speech intelligibility in noise for normal-hearing listeners.

Authors:  Gibak Kim; Yang Lu; Yi Hu; Philipos C Loizou
Journal:  J Acoust Soc Am       Date:  2009-09       Impact factor: 1.840

4.  Reconstruction techniques for improving the perceptual quality of binary masked speech.

Authors:  Donald S Williamson; Yuxuan Wang; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2014-08       Impact factor: 1.840

5.  On Training Targets for Supervised Speech Separation.

Authors:  Yuxuan Wang; Arun Narayanan; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2014-12
