
Speaker-Independent Phoneme Alignment Using Transition-Dependent States.

John-Paul Hosom

Abstract

Determining the location of phonemes is important to a number of speech applications, including training of automatic speech recognition systems, building text-to-speech systems, and research on human speech processing. Agreement of humans on the location of phonemes is, on average, 93.78% within 20 msec on a variety of corpora, and 93.49% within 20 msec on the TIMIT corpus. We describe a baseline forced-alignment system and a proposed system with several modifications to this baseline. Modifications include the addition of energy-based features to the standard cepstral feature set, the use of probabilities of a state transition given an observation, and the computation of probabilities of distinctive phonetic features instead of phoneme-level probabilities. Performance of the baseline system on the test partition of the TIMIT corpus is 91.48% within 20 msec, and performance of the proposed system on this corpus is 93.36% within 20 msec. The results of the proposed system are a 22% relative reduction in error over the baseline system, and a 14% reduction in error over results from a non-HMM alignment system. This result of 93.36% agreement is the best known reported result on the TIMIT corpus.
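The two figures of merit quoted in the abstract, agreement within 20 msec and relative reduction in error, can be illustrated with a minimal Python sketch. The function names, the inclusive treatment of the 20 msec tolerance, and the toy boundary times are assumptions made for illustration and are not taken from the paper; only the 91.48% and 93.36% agreement figures come from the abstract.

```python
def agreement_within(reference_ms, hypothesis_ms, tolerance_ms=20.0):
    """Percentage of boundaries whose absolute deviation from the
    reference is at most tolerance_ms (inclusive bound is an assumption)."""
    if len(reference_ms) != len(hypothesis_ms):
        raise ValueError("boundary lists must have the same length")
    if not reference_ms:
        raise ValueError("at least one boundary is required")
    hits = sum(
        1 for ref, hyp in zip(reference_ms, hypothesis_ms)
        if abs(ref - hyp) <= tolerance_ms
    )
    return 100.0 * hits / len(reference_ms)


def relative_error_reduction(baseline_agreement, proposed_agreement):
    """Relative reduction in error (percent), where error = 100 - agreement."""
    baseline_error = 100.0 - baseline_agreement
    proposed_error = 100.0 - proposed_agreement
    return 100.0 * (baseline_error - proposed_error) / baseline_error


if __name__ == "__main__":
    # Toy boundary times in msec (hypothetical, not TIMIT data).
    reference = [100.0, 200.0, 300.0, 400.0]
    hypothesis = [105.0, 225.0, 290.0, 400.0]
    print(agreement_within(reference, hypothesis))

    # Agreement figures reported in the abstract for the TIMIT test partition.
    print(round(relative_error_reduction(91.48, 93.36), 1))
```

With the abstract's figures, the baseline error is 100 - 91.48 = 8.52% and the proposed system's error is 100 - 93.36 = 6.64%, so the relative reduction is (8.52 - 6.64) / 8.52, or about 22%, matching the reported value.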


Year:  2009        PMID: 20161342      PMCID: PMC2682710          DOI: 10.1016/j.specom.2008.11.003

Source DB:  PubMed          Journal:  Speech Commun        ISSN: 0167-6393            Impact factor:   2.017



