
Performance of Forced-Alignment Algorithms on Children's Speech.

Tristan J Mahr, Visar Berisha, Kan Kawabata, Julie Liss, Katherine C Hustad.

Abstract

Purpose: Acoustic measurement of speech sounds requires first segmenting the speech signal into relevant units (words, phones, etc.). Manual segmentation is cumbersome and time consuming. Forced-alignment algorithms automate this process by aligning a transcript and a speech sample. We compared the phoneme-level alignment performance of five available forced-alignment algorithms on a corpus of child speech. Our goal was to document aligner performance for child speech researchers.

Method: The child speech sample included 42 children between 3 and 6 years of age. The corpus was force-aligned using the Montreal Forced Aligner with and without speaker adaptive training, triphone alignment from the Kaldi speech recognition engine, the Prosodylab-Aligner, and the Penn Phonetics Lab Forced Aligner. The sample was also manually aligned to create gold-standard alignments. We evaluated the alignment algorithms on accuracy (whether the automatic interval covers the midpoint of the manual alignment) and on the difference in phone-onset times between the automatic and manual intervals.

Results: The Montreal Forced Aligner with speaker adaptive training showed the highest accuracy and the smallest timing differences. Vowels were consistently the most accurately aligned class of sounds across all the aligners, and alignment accuracy for fricatives increased with age across all the aligners.

Conclusion: The best-performing aligner fell just short of human-level reliability for forced alignment. Researchers can use forced alignment with child speech for certain classes of sounds (vowels, and fricatives for older children), especially as part of a semi-automated workflow where alignments are later inspected for gross errors.

Supplemental Material: https://doi.org/10.23641/asha.14167058
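The two evaluation criteria described in the abstract can be sketched in a few lines of Python. This is a minimal illustration only, assuming each phone interval is an (onset, offset) pair in seconds; the function names and example values are illustrative and are not taken from the study's actual code.

```python
# Sketch of the two evaluation metrics: midpoint-coverage accuracy and
# phone-onset timing difference. Intervals are (onset, offset) in seconds.

def covers_midpoint(auto, manual):
    """Accuracy criterion: does the automatic interval contain the
    midpoint of the gold-standard manual interval?"""
    midpoint = (manual[0] + manual[1]) / 2
    return auto[0] <= midpoint <= auto[1]

def onset_difference(auto, manual):
    """Timing criterion: absolute difference in phone-onset times,
    rounded to milliseconds for readability."""
    return round(abs(auto[0] - manual[0]), 3)

# Hypothetical example: a manually aligned phone at 0.50-0.62 s
# versus an automatic alignment at 0.48-0.60 s.
manual = (0.50, 0.62)
auto = (0.48, 0.60)
print(covers_midpoint(auto, manual))   # True: midpoint 0.56 lies in [0.48, 0.60]
print(onset_difference(auto, manual))  # 0.02
```

Under these criteria, an alignment can count as "accurate" (it captures the phone's temporal center) while still differing in onset time, which is why the study reports both measures.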


Year:  2021        PMID: 33705675      PMCID: PMC8740721          DOI: 10.1044/2020_JSLHR-20-00268

Source DB:  PubMed          Journal:  J Speech Lang Hear Res        ISSN: 1092-4388            Impact factor:   2.297


  4 in total

1.  Automatic speech recognition: A primer for speech-language pathology researchers.

Authors:  Joseph Keshet
Journal:  Int J Speech Lang Pathol       Date:  2018-11       Impact factor: 2.484

2.  Methods for eliciting, annotating, and analyzing databases for child speech development.

Authors:  Mary E Beckman; Andrew R Plummer; Benjamin Munson; Patrick F Reidy
Journal:  Comput Speech Lang       Date:  2017-09       Impact factor: 1.899

3.  Children's Consonant Acquisition in 27 Languages: A Cross-Linguistic Review.

Authors:  Sharynne McLeod; Kathryn Crowe
Journal:  Am J Speech Lang Pathol       Date:  2018-11-21       Impact factor: 2.408

4.  Examining Factors Influencing the Viability of Automatic Acoustic Analysis of Child Speech.

Authors:  Thea Knowles; Meghan Clayards; Morgan Sonderegger
Journal:  J Speech Lang Hear Res       Date:  2018-10-26       Impact factor: 2.297

  1 in total

1.  Speech Development Between 30 and 119 Months in Typical Children II: Articulation Rate Growth Curves.

Authors:  Tristan J Mahr; Jennifer U Soriano; Paul J Rathouz; Katherine C Hustad
Journal:  J Speech Lang Hear Res       Date:  2021-09-29       Impact factor: 2.674

