
Long short-term memory for speaker generalization in supervised speech separation.

Jitong Chen, DeLiang Wang

Abstract

Speech separation can be formulated as learning to estimate a time-frequency mask from acoustic features extracted from noisy speech. For supervised speech separation, generalization to unseen noises and unseen speakers is a critical issue. Although deep neural networks (DNNs) have been successful in noise-independent speech separation, DNNs are limited in modeling a large number of speakers. To improve speaker generalization, a separation model based on long short-term memory (LSTM) is proposed, which naturally accounts for the temporal dynamics of speech. Systematic evaluation shows that the proposed model substantially outperforms a DNN-based model on unseen speakers and unseen noises in terms of objective speech intelligibility. Analyzing LSTM internal representations reveals that LSTM captures long-term speech contexts. The LSTM model is also found to be more advantageous for low-latency speech separation: even without future frames, it performs better than the DNN model that uses future frames. The proposed model represents an effective approach for speaker- and noise-independent speech separation.
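The formulation in the abstract, estimating a time-frequency mask from per-frame acoustic features with a recurrent model, can be illustrated with a minimal sketch. This is not the authors' code; the feature, hidden, and frequency dimensions, the random weights, and the single-layer LSTM cell are all illustrative assumptions. It shows a causal LSTM (past frames only, as in the low-latency setting discussed above) whose sigmoid output layer yields a mask value in [0, 1] per time-frequency unit.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative sizes, not the paper's configuration.
n_feat, n_hidden, n_freq, n_frames = 64, 32, 161, 5

# Stacked gate weights for the input, forget, cell, and output gates,
# applied to the concatenation of the current features and previous state.
W = rng.standard_normal((4 * n_hidden, n_feat + n_hidden)) * 0.1
b = np.zeros(4 * n_hidden)
# Output layer maps the hidden state to a per-frequency mask estimate.
W_out = rng.standard_normal((n_freq, n_hidden)) * 0.1
b_out = np.zeros(n_freq)

def lstm_mask(features):
    """features: (n_frames, n_feat) -> mask: (n_frames, n_freq) in [0, 1]."""
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    masks = []
    for x in features:                   # causal: each frame sees only the past
        z = W @ np.concatenate([x, h]) + b
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)   # cell-state update
        h = sigmoid(o) * np.tanh(c)                    # hidden state
        masks.append(sigmoid(W_out @ h + b_out))       # mask in [0, 1]
    return np.stack(masks)

mask = lstm_mask(rng.standard_normal((n_frames, n_feat)))
```

In training, such a model would be fit to an ideal-mask target computed from premixed clean speech and noise; the cell state is what lets the network carry the long-term speech context that the abstract's analysis highlights.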

Year:  2017        PMID: 28679261      PMCID: PMC5482750          DOI: 10.1121/1.4986931

Source DB:  PubMed          Journal:  J Acoust Soc Am        ISSN: 0001-4966            Impact factor:   1.840


References: 10 in total

1.  Learning to forget: continual prediction with LSTM.

Authors:  F A Gers; J Schmidhuber; F Cummins
Journal:  Neural Comput       Date:  2000-10       Impact factor: 2.026

2.  Learning long-term dependencies with gradient descent is difficult.

Authors:  Y Bengio; P Simard; P Frasconi
Journal:  IEEE Trans Neural Netw       Date:  1994

3.  An algorithm to improve speech recognition in noise for hearing-impaired listeners.

Authors:  Eric W Healy; Sarah E Yoho; Yuxuan Wang; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2013-10       Impact factor: 1.840

4.  An algorithm that improves speech intelligibility in noise for normal-hearing listeners.

Authors:  Gibak Kim; Yang Lu; Yi Hu; Philipos C Loizou
Journal:  J Acoust Soc Am       Date:  2009-09       Impact factor: 1.840

5.  Noise Perturbation for Supervised Speech Separation.

Authors:  Jitong Chen; Yuxuan Wang; DeLiang Wang
Journal:  Speech Commun       Date:  2016-04-01       Impact factor: 2.017

6.  Long short-term memory.

Authors:  S Hochreiter; J Schmidhuber
Journal:  Neural Comput       Date:  1997-11-15       Impact factor: 2.026

7.  Large-scale training to increase speech intelligibility for hearing-impaired listeners in novel noises.

Authors:  Jitong Chen; Yuxuan Wang; Sarah E Yoho; DeLiang Wang; Eric W Healy
Journal:  J Acoust Soc Am       Date:  2016-05       Impact factor: 1.840

8.  Long short-term memory for speaker generalization in supervised speech separation.

Authors:  Jitong Chen; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2017-06       Impact factor: 1.840

9.  An algorithm to increase speech intelligibility for hearing-impaired listeners in novel segments of the same noise type.

Authors:  Eric W Healy; Sarah E Yoho; Jitong Chen; Yuxuan Wang; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2015-09       Impact factor: 1.840

10.  On Training Targets for Supervised Speech Separation.

Authors:  Yuxuan Wang; Arun Narayanan; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2014-12
Cited by: 17 in total

1.  An algorithm to increase intelligibility for hearing-impaired listeners in the presence of a competing talker.

Authors:  Eric W Healy; Masood Delfarah; Jordan L Vasko; Brittney L Carter; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2017-06       Impact factor: 1.840

2.  A talker-independent deep learning algorithm to increase intelligibility for hearing-impaired listeners in reverberant competing talker conditions.

Authors:  Eric W Healy; Eric M Johnson; Masood Delfarah; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2020-06       Impact factor: 1.840

3.  A deep learning based segregation algorithm to increase speech intelligibility for hearing-impaired listeners in reverberant-noisy conditions.

Authors:  Yan Zhao; DeLiang Wang; Eric M Johnson; Eric W Healy
Journal:  J Acoust Soc Am       Date:  2018-09       Impact factor: 1.840

4.  Long short-term memory for speaker generalization in supervised speech separation.

Authors:  Jitong Chen; DeLiang Wang
Journal:  J Acoust Soc Am       Date:  2017-06       Impact factor: 1.840

5.  Learning Complex Spectral Mapping with Gated Convolutional Recurrent Networks for Monaural Speech Enhancement.

Authors:  Ke Tan; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2019-11-22

6.  On Cross-Corpus Generalization of Deep Learning Based Speech Enhancement.

Authors:  Ashutosh Pandey; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2020-08-14

7.  Self-attending RNN for Speech Enhancement to Improve Cross-corpus Generalization.

Authors:  Ashutosh Pandey; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2022-03-22

8.  Neural Cascade Architecture with Triple-domain Loss for Speech Enhancement.

Authors:  Heming Wang; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2021-12-28

9.  Dense CNN with Self-Attention for Time-Domain Speech Enhancement.

Authors:  Ashutosh Pandey; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2021-03-08

10.  Using recurrent neural networks to improve the perception of speech in non-stationary noise by people with cochlear implants.

Authors:  Tobias Goehring; Mahmoud Keshavarzi; Robert P Carlyon; Brian C J Moore
Journal:  J Acoust Soc Am       Date:  2019-07       Impact factor: 1.840
