Speaker-dependent multipitch tracking using deep neural networks.

Yuzhou Liu, DeLiang Wang.

Abstract

Multipitch tracking is important for speech and signal processing. However, it is challenging to design an algorithm that achieves accurate pitch estimation and correct speaker assignment at the same time. In this paper, deep neural networks (DNNs) are used to model the probabilistic pitch states of two simultaneous speakers. To capture speaker-dependent information, two types of DNN with different training strategies are proposed. The first is trained for each speaker enrolled in the system (speaker-dependent DNN), and the second is trained for each speaker pair (speaker-pair-dependent DNN). Several extensions, including gender-pair-dependent DNNs, speaker adaptation of gender-pair-dependent DNNs, and training with multiple energy ratios, are introduced later to relax constraints. A factorial hidden Markov model (FHMM) then integrates pitch probabilities and generates the most likely pitch tracks with a junction tree algorithm. Experiments show that the proposed methods substantially outperform other speaker-independent and speaker-dependent multipitch trackers on two-speaker mixtures. With multi-ratio training, the proposed methods achieve consistent performance at various energy ratios of the two speakers in a mixture.
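
The decoding step summarized above, turning per-frame pitch-state probabilities for two speakers into two smooth pitch tracks, can be illustrated with a minimal sketch. This is not the authors' implementation: it replaces the junction tree algorithm with a brute-force Viterbi search over pairs of states, which is only practical for small state spaces, and it assumes the per-frame score is simply the product of the two speakers' probabilities. The function name `viterbi_two_chains`, the shared transition matrix, and the toy numbers are all hypothetical.

```python
import math
from itertools import product


def viterbi_two_chains(post_a, post_b, trans, prior):
    """Jointly decode two state sequences (one per speaker).

    post_a[t][i], post_b[t][j]: per-frame pitch-state probabilities
        (stand-ins for DNN outputs; assumed, not taken from the paper).
    trans[i][j]: probability of moving from state i to state j,
        shared by both chains in this sketch.
    prior[i]: initial state probability.
    Returns (track_a, track_b) as lists of state indices.
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    n = len(prior)
    states = list(product(range(n), repeat=2))  # all (state_a, state_b) pairs

    # Score of each joint state at the first frame.
    score = {
        (i, j): log(prior[i]) + log(prior[j])
        + log(post_a[0][i]) + log(post_b[0][j])
        for i, j in states
    }
    backpointers = []

    for t in range(1, len(post_a)):
        new_score, ptr = {}, {}
        for i, j in states:
            # Best predecessor pair for the joint state (i, j).
            def arrive(s):
                return score[s] + log(trans[s[0]][i]) + log(trans[s[1]][j])

            best = max(states, key=arrive)
            new_score[(i, j)] = (
                arrive(best) + log(post_a[t][i]) + log(post_b[t][j])
            )
            ptr[(i, j)] = best
        score = new_score
        backpointers.append(ptr)

    # Trace back from the best final joint state.
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return [s[0] for s in path], [s[1] for s in path]


# Toy example with two pitch states per speaker and three frames.
prior = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
post_a = [[0.9, 0.1], [0.6, 0.4], [0.8, 0.2]]
post_b = [[0.2, 0.8], [0.55, 0.45], [0.1, 0.9]]
track_a, track_b = viterbi_two_chains(post_a, post_b, trans, prior)
print(track_a, track_b)  # [0, 0, 0] [1, 1, 1]
```

In the toy example, the second speaker's frame-wise most likely state in the middle frame is 0, but the transition model keeps the decoded track in state 1. That temporal smoothing is the role the abstract assigns to the FHMM.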

Year: 2017; PMID: 28253703; PMCID: PMC6909980; DOI: 10.1121/1.4973687

Source DB: PubMed; Journal: J Acoust Soc Am; ISSN: 0001-4966; Impact factor: 1.840


