Learning speaker-specific characteristics with a deep neural architecture.

Abstract

Speech signals convey mixed information, ranging from linguistic to speaker-specific. However, most acoustic representations characterize all of these kinds of information as a whole, which can hinder both speech recognition and speaker recognition (SR) systems from achieving better performance. In this paper, we propose a novel deep neural architecture (DNA) designed specifically for learning speaker-specific characteristics from mel-frequency cepstral coefficients, an acoustic representation commonly used in both speech recognition and SR; the result is a speaker-specific overcomplete representation. To learn intrinsic speaker-specific characteristics, we formulate an objective function consisting of contrastive losses, defined in terms of speaker similarity/dissimilarity, and data reconstruction losses, used as regularization to normalize the interference of non-speaker-related information. Moreover, we employ a hybrid strategy for learning the parameters of the deep neural networks: local, greedy layerwise unsupervised pretraining for initialization, followed by global supervised learning for the ultimate discriminative goal. Using four Linguistic Data Consortium (LDC) benchmarks and two non-English corpora, we demonstrate that our overcomplete representation is robust in characterizing various speakers, regardless of whether their utterances were used in training our DNA, and is highly insensitive to the text and language spoken. Extensive comparative studies suggest that our approach yields favorable results in speaker verification and segmentation. Finally, we discuss several issues concerning our proposed approach.
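
The abstract does not give the exact form of the objective, so the following is only an illustrative sketch of how a contrastive term and a reconstruction term might be combined. It assumes a generic margin-based contrastive loss on pairs of learned codes and a mean-squared reconstruction error; the function names, the `margin` parameter, and the weighting factor `alpha` are hypothetical and are not taken from the paper.

```python
import math


def squared_distance(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def contrastive_loss(code_a, code_b, same_speaker, margin=1.0):
    """Generic margin-based contrastive loss (an assumption, not the paper's form).

    Same-speaker pairs are penalized by their squared distance; different-speaker
    pairs are penalized only when they are closer than `margin`.
    """
    d = math.sqrt(squared_distance(code_a, code_b))
    if same_speaker:
        return d ** 2
    return max(0.0, margin - d) ** 2


def reconstruction_loss(frame, reconstruction):
    """Mean squared error between an input frame and its reconstruction."""
    return squared_distance(frame, reconstruction) / len(frame)


def total_loss(code_a, code_b, same_speaker,
               frame_a, recon_a, frame_b, recon_b,
               alpha=0.5, margin=1.0):
    """Weighted sum of the contrastive term and the reconstruction regularizer.

    `alpha` is a hypothetical trade-off weight between the two terms.
    """
    contrast = contrastive_loss(code_a, code_b, same_speaker, margin)
    recon = (reconstruction_loss(frame_a, recon_a)
             + reconstruction_loss(frame_b, recon_b))
    return alpha * contrast + (1.0 - alpha) * recon
```

In this sketch the reconstruction term acts as the regularizer: it is computed on the input frames and their reconstructions, independently of the speaker labels, while the contrastive term is the only part that uses the same/different-speaker information.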

Year: 2011 PMID: 21954206 DOI: 10.1109/TNN.2011.2167240

Source DB: PubMed Journal: IEEE Trans Neural Netw ISSN: 1045-9227

Cited

2 in total

1. Meta-learning with Latent Space Clustering in Generative Adversarial Network for Speaker Diarization.

Authors: Monisankha Pal; Manoj Kumar; Raghuveer Peri; Tae Jin Park; So Hyun Kim; Catherine Lord; Somer Bishop; Shrikanth Narayanan
Journal: IEEE/ACM Trans Audio Speech Lang Process Date: 2021-02-26

2. Neural networks within multi-core optic fibers.

Authors: Eyal Cohen; Dror Malka; Amir Shemer; Asaf Shahmoon; Zeev Zalevsky; Michael London
Journal: Sci Rep Date: 2016-07-07 Impact factor: 4.379
