Supervised Speech Separation Based on Deep Learning: An Overview.

DeLiang Wang, Jitong Chen.

Abstract

Speech separation is the task of separating target speech from background interference. Traditionally, speech separation is studied as a signal processing problem. A more recent approach formulates speech separation as a supervised learning problem, where the discriminative patterns of speech, speakers, and background noise are learned from training data. Over the past decade, many supervised separation algorithms have been put forward. In particular, the recent introduction of deep learning to supervised speech separation has dramatically accelerated progress and boosted separation performance. This paper provides a comprehensive overview of the research on deep learning based supervised speech separation in the last several years. We first introduce the background of speech separation and the formulation of supervised separation. Then, we discuss three main components of supervised separation: learning machines, training targets, and acoustic features. Much of the overview is on separation algorithms where we review monaural methods, including speech enhancement (speech-nonspeech separation), speaker separation (multitalker separation), and speech dereverberation, as well as multimicrophone techniques. The important issue of generalization, unique to supervised learning, is discussed. This overview provides a historical perspective on how advances are made. In addition, we discuss a number of conceptual issues, including what constitutes the target source.

Keywords:  Speech separation; array separation; beamforming; deep learning; deep neural networks; speaker separation; speech dereverberation; speech enhancement; supervised speech separation; time-frequency masking

Year:  2018        PMID: 31223631      PMCID: PMC6586438          DOI: 10.1109/TASLP.2018.2842159

Source DB:  PubMed          Journal:  IEEE/ACM Trans Audio Speech Lang Process


  25 in total

1.  A two-stage deep learning algorithm for talker-independent speaker separation in reverberant conditions.

Authors:  Masood Delfarah; Yuzhou Liu; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2020-09       Impact factor: 1.840

2.  Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation.

Authors:  Yi Luo; Nima Mesgarani
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2019-05-06

3.  Deep Learning for Talker-dependent Reverberant Speaker Separation: An Empirical Study.

Authors:  Masood Delfarah; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2019-08-12

4.  Complex Spectral Mapping for Single- and Multi-Channel Speech Enhancement and Robust ASR.

Authors:  Zhong-Qiu Wang; Peidong Wang; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2020-05-28

5.  Monaural Speech Dereverberation Using Temporal Convolutional Networks with Self Attention.

Authors:  Yan Zhao; DeLiang Wang; Buye Xu; Tao Zhang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2020-05-18

6.  Deep Learning Based Target Cancellation for Speech Dereverberation.

Authors:  Zhong-Qiu Wang; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2020-02-28

7.  On Cross-Corpus Generalization of Deep Learning Based Speech Enhancement.

Authors:  Ashutosh Pandey; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2020-08-14

8.  SSGD: Sparsity-Promoting Stochastic Gradient Descent Algorithm for Unbiased DNN Pruning.

Authors:  Ching-Hua Lee; Igor Fedorov; Bhaskar D Rao; Harinath Garudadri
Journal:  Proc IEEE Int Conf Acoust Speech Signal Process       Date:  2020-05-14

9.  Towards Model Compression for Deep Learning Based Speech Enhancement.

Authors:  Ke Tan; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2021-05-21

10.  Deep Learning Based Real-time Speech Enhancement for Dual-microphone Mobile Phones.

Authors:  Ke Tan; Xueliang Zhang; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2021-05-21