DEEP ATTRACTOR NETWORK FOR SINGLE-MICROPHONE SPEAKER SEPARATION.

Zhuo Chen, Yi Luo, Nima Mesgarani.

Abstract

Despite the overwhelming success of deep learning in various speech processing tasks, the problem of separating simultaneous speakers in a mixture remains challenging. Two major difficulties in such systems are the arbitrary source permutation and the unknown number of sources in the mixture. We propose a novel deep learning framework for single-channel speech separation that creates attractor points in a high-dimensional embedding space of the acoustic signals; these attractors pull together the time-frequency bins corresponding to each source. Attractor points in this study are created by finding the centroids of the sources in the embedding space, which are subsequently used to determine the similarity of each bin in the mixture to each source. The network is then trained to minimize the reconstruction error of each source by optimizing the embeddings. The proposed model differs from prior work in that it is trained end-to-end and does not depend on the number of sources in the mixture. Two strategies are explored at test time, K-means and fixed attractor points, where the latter requires no post-processing and can be implemented in real time. We evaluated our system on the Wall Street Journal dataset and show a 5.49% improvement over the previous state-of-the-art methods.
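
The attractor step described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the embeddings here are synthetic rather than produced by a trained network, and the dot-product similarity, the softmax over sources, and the mean-squared reconstruction error are assumptions chosen for illustration, since the abstract does not specify them. All function and variable names are hypothetical.

```python
import numpy as np

def attractors_from_assignments(V, Y):
    """Centroid of each source's bins in embedding space.
    V: (N, K) embedding per time-frequency bin; Y: (N, C) one-hot source membership."""
    counts = Y.sum(axis=0)[:, None]                # (C, 1) bins per source
    return (Y.T @ V) / np.maximum(counts, 1e-8)    # (C, K) attractors

def masks_from_attractors(V, A):
    """Similarity of each bin to each attractor, normalised across sources.
    Assumption: dot-product similarity followed by a softmax."""
    logits = V @ A.T                               # (N, C)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def reconstruction_error(mix_mag, masks, src_mag):
    """Assumption: mean squared error between masked mixture and each source."""
    est = masks * mix_mag[:, None]                 # (N, C) estimated sources
    return float(np.mean((src_mag - est) ** 2))

# Toy data: N bins, K-dimensional embeddings, C sources (all values synthetic).
rng = np.random.default_rng(0)
N, K, C = 200, 20, 2
Y = np.eye(C)[rng.integers(0, C, size=N)]          # which source dominates each bin
centers = 3.0 * np.eye(C, K)                       # well-separated cluster centres
V = Y @ centers + 0.1 * rng.standard_normal((N, K))

A = attractors_from_assignments(V, Y)              # training-time attractors
M = masks_from_attractors(V, A)                    # soft masks, one column per source

mix_mag = rng.random(N) + 0.1                      # mixture magnitude per bin
src_mag = Y * mix_mag[:, None]                     # idealised: one source per bin
loss = reconstruction_error(mix_mag, M, src_mag)
```

In training, `loss` would be back-propagated through the network that produces `V`. At test time the assignments `Y` are unavailable, which is why the abstract mentions K-means or fixed attractor points as ways to obtain `A`.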

Keywords:  Source separation; attractor network; deep clustering; multi-talker

Year:  2017        PMID: 29430212      PMCID: PMC5805382          DOI: 10.1109/ICASSP.2017.7952155

Source DB:  PubMed          Journal:  Proc IEEE Int Conf Acoust Speech Signal Process        ISSN: 1520-6149


  7 in total

1.  Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation.

Authors:  Yi Luo; Nima Mesgarani
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2019-05-06

2.  Active inference, selective attention, and the cocktail party problem. (Review)

Authors:  Emma Holmes; Thomas Parr; Timothy D Griffiths; Karl J Friston
Journal:  Neurosci Biobehav Rev       Date:  2021-10-21       Impact factor: 8.989

3.  Towards reconstructing intelligible speech from the human auditory cortex.

Authors:  Hassan Akbari; Bahar Khalighinejad; Jose L Herrero; Ashesh D Mehta; Nima Mesgarani
Journal:  Sci Rep       Date:  2019-01-29       Impact factor: 4.379

4.  Speaker-independent auditory attention decoding without access to clean speech sources.

Authors:  Cong Han; James O'Sullivan; Yi Luo; Jose Herrero; Ashesh D Mehta; Nima Mesgarani
Journal:  Sci Adv       Date:  2019-05-15       Impact factor: 14.136

5.  Brain-informed speech separation (BISS) for enhancement of target speaker in multitalker speech perception.

Authors:  Enea Ceolini; Jens Hjortkjær; Daniel D E Wong; James O'Sullivan; Vinay S Raghavan; Jose Herrero; Ashesh D Mehta; Shih-Chii Liu; Nima Mesgarani
Journal:  Neuroimage       Date:  2020-08-20       Impact factor: 6.556

6.  Enhancement of speech-in-noise comprehension through vibrotactile stimulation at the syllabic rate.

Authors:  Pierre Guilleminot; Tobias Reichenbach
Journal:  Proc Natl Acad Sci U S A       Date:  2022-03-21       Impact factor: 12.779

7.  Transcranial Alternating Current Stimulation With the Theta-Band Portion of the Temporally-Aligned Speech Envelope Improves Speech-in-Noise Comprehension.

Authors:  Mahmoud Keshavarzi; Tobias Reichenbach
Journal:  Front Hum Neurosci       Date:  2020-05-29       Impact factor: 3.169

