
Deep Clustering and Conventional Networks for Music Separation: Stronger Together

Yi Luo, Zhuo Chen, John R. Hershey, Jonathan Le Roux, Nima Mesgarani

Abstract

Deep clustering is the first method to handle general audio separation scenarios with multiple sources of the same type and an arbitrary number of sources, performing impressively in speaker-independent speech separation tasks. However, little is known about its effectiveness in other challenging situations such as music source separation. Contrary to conventional networks that directly estimate the source signals, deep clustering generates an embedding for each time-frequency bin, and separates sources by clustering the bins in the embedding space. We show that deep clustering outperforms conventional networks on a singing voice separation task, in both matched and mismatched conditions, even though conventional networks have the advantage of end-to-end training for best signal approximation, presumably because its more flexible objective engenders better regularization. Since the strengths of deep clustering and conventional network architectures appear complementary, we explore combining them in a single hybrid network trained via an approach akin to multi-task learning. Remarkably, the combination significantly outperforms either of its components.
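The abstract contrasts conventional mask-estimation networks with deep clustering, which maps each time-frequency bin to an embedding and recovers sources by clustering those embeddings. A minimal NumPy sketch of that idea is below; it is an illustration under stated assumptions, not the authors' implementation. It assumes embeddings `V` (one row per time-frequency bin) and one-hot ideal source assignments `Y`, computes the deep clustering loss ||VVᵀ − YYᵀ||²_F in the low-rank form that avoids building the bins-by-bins affinity matrix, and recovers binary masks by plain k-means over the embedding rows:

```python
import numpy as np

def deep_clustering_loss(V, Y):
    """Deep clustering loss ||V V^T - Y Y^T||_F^2, expanded as
    ||V^T V||_F^2 - 2 ||V^T Y||_F^2 + ||Y^T Y||_F^2 so that the
    (bins x bins) affinity matrix is never materialized.
    V: (bins, emb_dim) embeddings; Y: (bins, n_sources) one-hot labels."""
    return (np.sum((V.T @ V) ** 2)
            - 2 * np.sum((V.T @ Y) ** 2)
            + np.sum((Y.T @ Y) ** 2))

def kmeans_masks(V, n_sources, n_iter=20, seed=0):
    """Cluster embedding rows with plain k-means; each cluster yields a
    binary time-frequency mask for one source.  Centers are initialized
    from distinct embedding rows to avoid a degenerate start (a toy
    choice for this sketch, not the paper's procedure)."""
    rng = np.random.default_rng(seed)
    uniq = np.unique(V, axis=0)
    centers = uniq[rng.choice(len(uniq), n_sources, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each bin to its nearest center, then recompute centers.
        d = ((V[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        for k in range(n_sources):
            if np.any(assign == k):
                centers[k] = V[assign == k].mean(0)
    return np.eye(n_sources)[assign]  # (bins, n_sources) binary masks
```

When the embeddings of bins belonging to the same source coincide and embeddings of different sources differ, the loss is zero and k-means recovers the ideal binary masks; a conventional network would instead regress the masks or source spectra directly, which is what the hybrid network in the paper combines with this objective.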

Keywords:  Deep clustering; Deep learning; Music separation; Singing voice separation

Year:  2017        PMID: 29398973      PMCID: PMC5791533          DOI: 10.1109/ICASSP.2017.7952118

Source DB:  PubMed          Journal:  Proc IEEE Int Conf Acoust Speech Signal Process        ISSN: 1520-6149


Cited by:  1 in total

1.  Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation.

Authors:  Yi Luo; Nima Mesgarani
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2019-05-06
