PURPOSE: The authors sought to determine an optimal set of flesh points on the tongue and lips for classifying speech movements.
METHOD: The authors used electromagnetic articulographs (Carstens AG500 and NDI Wave) to record tongue and lip movements from 13 healthy talkers who articulated 8 vowels, 11 consonants, a phonetically balanced set of words, and a set of short phrases. A machine-learning classifier (support-vector machine) was used to classify the speech stimuli on the basis of the articulatory movements, and the classification accuracies obtained with different flesh-point (sensor) combinations were compared to determine an optimal set of sensors.
RESULTS: Combining data from 4 sensors (T1: the vicinity between the tongue tip and tongue blade; T4: the tongue-body back; UL: the upper lip; and LL: the lower lip) yielded the most accurate phoneme and word classifications, comparable to those obtained with the full set of 6 sensors, which additionally includes T2 and T3, both located on the front of the tongue body.
CONCLUSION: The authors identified a 4-sensor set (T1, T4, UL, and LL) that yielded a classification accuracy (91%-95%) equivalent to that obtained with all 6 sensors. These findings provide an empirical basis for selecting sensors and their locations for scientific and emerging clinical applications that incorporate articulatory movements.
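The comparison of sensor combinations described in the METHOD can be illustrated with a small, self-contained sketch. Everything in it is an assumption for illustration, not the study's procedure. The features are synthetic (two hypothetical values per sensor, such as mean x and y position). The classifier is a hand-rolled one-vs-rest linear SVM trained with Pegasos subgradient steps rather than whatever SVM implementation and kernel the authors used. Validation is 3-fold cross-validation. All function names are hypothetical. The sketch scores every non-empty combination of the 6 sensors and ranks them, breaking accuracy ties in favor of fewer sensors. Accuracies it prints come from random data and say nothing about the study's findings.

```python
import itertools
import random

# Sensor labels follow the abstract's naming; the feature layout below is hypothetical.
SENSORS = ["T1", "T2", "T3", "T4", "UL", "LL"]

def make_synthetic_data(n_per_class=15, n_classes=3, seed=1):
    """Synthetic stand-in for articulatory data: 2 features per sensor (assumed)."""
    rng = random.Random(seed)
    means = {c: [rng.uniform(-3, 3) for _ in range(2 * len(SENSORS))]
             for c in range(n_classes)}
    X, y = [], []
    for c in range(n_classes):
        for _ in range(n_per_class):
            X.append([m + rng.gauss(0, 1) for m in means[c]])
            y.append(c)
    return X, y

def select(X, sensors):
    """Keep only the chosen sensors' columns and append a constant 1.0 bias feature."""
    cols = [2 * SENSORS.index(s) + k for s in sensors for k in (0, 1)]
    return [[row[j] for j in cols] + [1.0] for row in X]

def train_ovr_svm(X, y, classes, lam=0.01, epochs=10, seed=0):
    """One-vs-rest linear SVM fitted with Pegasos stochastic subgradient steps."""
    rng = random.Random(seed)
    models = {}
    for c in classes:
        w = [0.0] * len(X[0])
        t = 0
        order = list(range(len(X)))
        for _ in range(epochs):
            rng.shuffle(order)
            for i in order:
                t += 1
                eta = 1.0 / (lam * t)
                yi = 1.0 if y[i] == c else -1.0
                margin = yi * sum(wj * xj for wj, xj in zip(w, X[i]))
                w = [(1.0 - eta * lam) * wj for wj in w]  # L2 shrinkage (bias included)
                if margin < 1.0:  # hinge loss is active: step toward this example
                    w = [wj + eta * yi * xj for wj, xj in zip(w, X[i])]
        models[c] = w
    return models

def predict(models, x):
    """Return the class whose one-vs-rest score is highest."""
    return max(models, key=lambda c: sum(wj * xj for wj, xj in zip(models[c], x)))

def cv_accuracy(X, y, k=3):
    """k-fold cross-validated accuracy; folds are assigned by index modulo k."""
    classes = sorted(set(y))
    correct = 0
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        test = [i for i in range(len(X)) if i % k == fold]
        models = train_ovr_svm([X[i] for i in train], [y[i] for i in train], classes)
        correct += sum(predict(models, X[i]) == y[i] for i in test)
    return correct / len(X)

def rank_sensor_sets(X, y):
    """Score every non-empty sensor combination; best first, ties favor fewer sensors."""
    results = []
    for r in range(1, len(SENSORS) + 1):
        for subset in itertools.combinations(SENSORS, r):
            results.append((cv_accuracy(select(X, subset), y), subset))
    results.sort(key=lambda pair: (-pair[0], len(pair[1])))
    return results

if __name__ == "__main__":
    X, y = make_synthetic_data()
    for acc, subset in rank_sensor_sets(X, y)[:5]:
        print(f"{acc:.2f}", "+".join(subset))
```

Ranking by accuracy and then by subset size mirrors the idea of looking for the smallest sensor set whose accuracy matches the full set. With real recordings, the features, kernel, and validation scheme would need to follow the study's own design.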