Robust Speech Rate Estimation for Spontaneous Speech.

Dagen Wang, Shrikanth S Narayanan.

Abstract

In this paper, we propose a direct method for estimating speech rate from acoustic features, without requiring any automatic speech transcription. We compare various spectral and temporal signal analysis and smoothing strategies to better characterize the underlying syllable structure from which speech rate is derived. The proposed algorithm extends spectral subband correlation methods by including temporal correlation and by using prominent spectral subbands to improve the signal correlation essential for syllable detection. Furthermore, to address some of the practical robustness issues in previously proposed methods, we introduce several novel components into the algorithm: pitch confidence for filtering spurious syllable envelope peaks, a magnifying window for tackling smearing between neighboring syllables, and relative peak measure thresholds for rejecting pseudo peaks. We also describe an automated approach for learning the algorithm parameters from data, and find the optimal settings through Monte Carlo simulations and parameter sensitivity analysis. Final experimental evaluations are conducted on a portion of the Switchboard corpus for which manual phonetic segmentation information and published results for direct comparison are available. The results show a correlation coefficient of 0.745 with respect to the ground truth based on manual segmentation. This is about a 17% improvement over the current best single estimator and an 11% improvement over the multiestimator evaluated on the same Switchboard database.
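
As a rough illustration of the general idea summarized above (correlate subband energy trajectories, pick envelope peaks as syllable candidates, then score rate estimates against a reference by correlation coefficient), the following Python sketch may help. It is not the authors' algorithm: the function names, the 0.3 relative threshold, the 100 Hz frame rate, and the synthetic input are all assumptions made for illustration, and the temporal correlation, pitch confidence filtering, and magnifying window steps described in the abstract are omitted.

```python
import numpy as np


def subband_correlation(energies):
    """Per-frame sum of pairwise products between distinct subbands.

    energies: array of shape (n_bands, n_frames) with non-negative
    subband energy trajectories (hypothetical input format).
    """
    energies = np.asarray(energies, dtype=float)
    total = energies.sum(axis=0)
    # Identity: sum_{i<j} x_i * x_j = ((sum x)^2 - sum x^2) / 2
    return (total ** 2 - (energies ** 2).sum(axis=0)) / 2.0


def count_peaks(envelope, rel_threshold=0.3):
    """Count strict local maxima at least rel_threshold * max(envelope) high.

    The relative threshold is a simplified stand-in for pseudo-peak
    rejection; 0.3 is an illustrative value, not one from the paper.
    """
    env = np.asarray(envelope, dtype=float)
    if env.size < 3 or env.max() <= 0:
        return 0
    interior = env[1:-1]
    is_peak = (interior > env[:-2]) & (interior > env[2:])
    tall_enough = interior >= rel_threshold * env.max()
    return int(np.sum(is_peak & tall_enough))


def syllables_per_second(energies, frame_rate_hz, rel_threshold=0.3):
    """Estimate speech rate as detected envelope peaks per second."""
    envelope = subband_correlation(energies)
    duration_s = envelope.size / frame_rate_hz
    return count_peaks(envelope, rel_threshold) / duration_s


def pearson(estimated, reference):
    """Correlation coefficient between estimated and reference rates."""
    return float(np.corrcoef(estimated, reference)[0, 1])


# Synthetic one-second example: four syllable-like bumps plus one weak
# spurious bump near the start, shared across three subbands.
frames = np.arange(100)
centers_and_gains = [(5, 0.1), (20, 1.0), (45, 1.0), (70, 1.0), (90, 1.0)]
base = sum(g * np.exp(-0.5 * ((frames - c) / 3.0) ** 2)
           for c, g in centers_and_gains)
demo_energies = np.vstack([base, 0.5 * base, 2.0 * base])
demo_rate = syllables_per_second(demo_energies, frame_rate_hz=100)
```

In this sketch the weak bump falls below the relative threshold after the subband product, so only the four larger bumps count toward `demo_rate`. A real evaluation would collect one such estimate per utterance and pass the estimates, together with rates derived from manual segmentation, to `pearson`.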

Year:  2007        PMID: 20428476      PMCID: PMC2860302          DOI: 10.1109/TASL.2007.905178

Source DB:  PubMed          Journal:  IEEE Trans Audio Speech Lang Process        ISSN: 1558-7916

