
Formant estimation and tracking: A deep learning approach.

Yehoshua Dissen, Jacob Goldberger, Joseph Keshet.

Abstract

Formant frequency estimation and tracking are among the most fundamental problems in speech processing. In the estimation task, the input is a stationary speech segment, such as the middle part of a vowel, and the goal is to estimate the formant frequencies. In the tracking task, the input is a series of speech frames, and the goal is to track the trajectory of the formant frequencies throughout the signal. The use of supervised machine learning techniques, trained on an annotated corpus of read speech, is proposed for these tasks. Two deep network architectures were evaluated for estimation: feed-forward multilayer perceptrons and convolutional neural networks. Correspondingly, two architectures were evaluated for tracking: recurrent networks and convolutional recurrent networks. The inputs to the former in each pair are composed of linear predictive coding-based cepstral coefficients with a range of model orders and pitch-synchronous cepstral coefficients, whereas the inputs to the latter are raw spectrograms. The performance of the methods compares favorably with alternative methods for formant estimation and tracking. A network architecture is further proposed that allows model adaptation to formant frequency ranges that were not seen at training time. The adapted networks were evaluated on three datasets, and their performance was further improved.
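
As a rough illustration of the "linear predictive coding-based cepstral coefficients with a range of model orders" mentioned in the abstract, the following is a minimal sketch using the textbook autocorrelation method, the Levinson-Durbin recursion, and the standard LPC-to-cepstrum recursion. It is not the authors' feature pipeline: the function names, the Hamming window, the model orders (8, 10, 12), and the number of cepstral coefficients (12) are all assumptions chosen for illustration, and the pitch-synchronous features are not covered.

```python
import math

def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one speech frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations.

    Returns (a, err), where a[k-1] is the predictor coefficient a_k in
    s[n] ~ sum_k a_k * s[n-k], and err is the final prediction error.
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err

def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c_1..c_n_ceps of the all-pole model 1 / (1 - sum_k a_k z^-k)."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] (gain term) is not computed here
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]

def lpc_cepstral_features(frame, orders=(8, 10, 12), n_ceps=12):
    """Concatenate LPC cepstra over several model orders (orders are hypothetical)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    features = []
    for p in orders:
        r = autocorrelation(windowed, p)
        a, _ = levinson_durbin(r, p)
        features.extend(lpc_to_cepstrum(a, n_ceps))
    return features
```

In a setup like the one the abstract describes, a vector of this kind would be one part of the input to the feed-forward or recurrent network; the network itself is not sketched here.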


Year:  2019        PMID: 30823790     DOI: 10.1121/1.5088048

Source DB:  PubMed          Journal:  J Acoust Soc Am        ISSN: 0001-4966            Impact factor:   1.840


  1 in total

1.  Formants are easy to measure; resonances, not so much: Lessons from Klatt (1986).

Authors:  D H Whalen; Wei-Rong Chen; Christine H Shadle; Sean A Fulop
Journal:  J Acoust Soc Am       Date:  2022-08       Impact factor: 2.482

