

Formant estimation and tracking: A deep learning approach.

Yehoshua Dissen¹, Jacob Goldberger², Joseph Keshet¹.

Abstract

Formant frequency estimation and tracking are among the most fundamental problems in speech processing. In the estimation task, the input is a stationary speech segment, such as the middle part of a vowel, and the goal is to estimate the formant frequencies. In the tracking task, the input is a series of speech frames, and the goal is to track the trajectory of the formant frequencies throughout the signal. The use of supervised machine learning techniques, trained on an annotated corpus of read speech, is proposed for both tasks. Two deep network architectures were evaluated for estimation, feed-forward multilayer perceptrons and convolutional neural networks, and, correspondingly, two for tracking, recurrent and convolutional recurrent networks. In each pair, the inputs to the former are linear predictive coding (LPC)-based cepstral coefficients computed over a range of model orders, together with pitch-synchronous cepstral coefficients, whereas the inputs to the latter are raw spectrograms. The performance of the methods compares favorably with alternative methods for formant estimation and tracking. A network architecture is further proposed that allows model adaptation to formant frequency ranges not seen at training time. The adapted networks were evaluated on three datasets, and adaptation further improved their performance.
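As an illustration of the LPC-based cepstral input described in the abstract, the following is a minimal sketch, not the authors' implementation. It computes LPC coefficients by the autocorrelation method (Levinson-Durbin recursion), converts them to cepstral coefficients with the standard LPC-to-cepstrum recursion, and concatenates the results over several model orders. The function names, the Hamming window, the model orders (8, 10, 12), and the number of cepstral coefficients (12) are assumptions chosen for the example; the abstract does not specify them.

```python
import math


def autocorrelation(x, max_lag):
    """Unnormalized autocorrelation r[0..max_lag] of a sequence."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.

    Returns the coefficient list a (with a[0] == 1) and the final
    prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c[1..n_ceps] of the all-pole model 1/A(z).

    The gain-dependent term c[0] is omitted.
    """
    p = len(a) - 1
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = sum(m * c[m] * a[n - m] for m in range(max(1, n - p), n))
        a_n = a[n] if n <= p else 0.0
        c[n] = -a_n - acc / n
    return c[1:]


def lpc_cepstral_features(frame, orders=(8, 10, 12), n_ceps=12):
    """Concatenate LPC cepstra for several model orders (assumed values)."""
    n = len(frame)
    # Hamming window (an assumption; any tapered window would do here).
    windowed = [s * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    r = autocorrelation(windowed, max(orders))
    features = []
    for p in orders:
        a, _ = levinson_durbin(r, p)
        features.extend(lpc_to_cepstrum(a, n_ceps))
    return features
```

In a setup of this kind, a feature vector like the one returned by `lpc_cepstral_features` would be computed per frame and passed to the network, which is trained to regress the annotated formant frequencies. Using several model orders avoids committing to a single LPC order, since the best order varies with speaker and vowel.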


Year: 2019 PMID: 30823790 DOI: 10.1121/1.5088048

Source DB: PubMed Journal: J Acoust Soc Am ISSN: 0001-4966 Impact factor: 1.840

Cited

1 in total

1. Formants are easy to measure; resonances, not so much: Lessons from Klatt (1986).

Authors: D H Whalen; Wei-Rong Chen; Christine H Shadle; Sean A Fulop
Journal: J Acoust Soc Am Date: 2022-08 Impact factor: 2.482