Maryam Naghibolhosseini1, Dimitar D Deliyski2, Stephanie R C Zacharias3, Alessandro de Alarcon4, Robert F Orlikoff5. 1. Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan. Electronic address: naghib@msu.edu. 2. Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan; Division of Pediatric Otolaryngology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio. 3. Division of Pediatric Otolaryngology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio; Division of Speech-Language Pathology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio; Department of Otolaryngology Head and Neck Surgery, University of Cincinnati, Cincinnati, Ohio; Department of Communication Sciences and Disorders, University of Cincinnati, Cincinnati, Ohio. 4. Division of Pediatric Otolaryngology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio; Department of Otolaryngology Head and Neck Surgery, University of Cincinnati, Cincinnati, Ohio. 5. College of Allied Health Sciences, East Carolina University, Greenville, North Carolina.
Abstract
OBJECTIVE: This study proposes a gradient-based method for temporal segmentation of laryngeal high-speed videoendoscopy (HSV) data obtained during connected speech.

METHODS: A custom-developed HSV system coupled with a flexible fiberoptic nasolaryngoscope was used to record one vocally normal female participant reading the "Rainbow Passage." A gradient-based algorithm was developed to generate a motion window. When applied to the HSV data, the motion window acted as a filter tracking the location of the vibrating vocal folds. The glottal area waveform was estimated using a statistics-based image-processing approach. The vocal fold vibratory frequency was computed by autocorrelation-based extraction of the fundamental frequency (f0) from the glottal area waveform. Temporal segmentation was then performed based on the f0 contour and automatic detection of epiglottic obstructions. Additionally, visual temporal segmentation was performed by viewing the HSV images frame by frame to determine the time points of the vocalization onsets and offsets and of the epiglottic obstructions of the glottis.

RESULTS: The time points resulting from the automatic and visual temporal segmentation methods were cross-validated. The rise and fall patterns of the f0 contour produced by the automatic algorithm agreed with the visual inspection of the vibratory frequency changes in the HSV data.

CONCLUSIONS: This study demonstrated the feasibility of automatic temporal segmentation of HSV recordings of connected speech, which allows the video content to be mapped into onsets, offsets, and epiglottic obstructions for each vocalization. Automated analysis of HSV recordings of connected speech has significant clinical potential for advancing instrumental voice assessment protocols.
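As a rough illustration of the autocorrelation-based f0 extraction step described above, the sketch below estimates the fundamental frequency of a glottal area waveform by locating the first strong autocorrelation peak within a plausible lag range. This is a minimal, generic implementation for intuition only, not the authors' algorithm; the function name, the f0 search bounds (75–500 Hz), and the hypothetical 4,000 frames/s sampling rate are all assumptions, and the input here is a synthetic sinusoidal waveform rather than real HSV-derived data.

```python
import numpy as np

def estimate_f0_autocorr(gaw, fs, f0_min=75.0, f0_max=500.0):
    """Estimate the fundamental frequency (Hz) of a glottal area
    waveform via the autocorrelation method.

    gaw: 1-D array of glottal area samples (one value per video frame)
    fs:  sampling rate in Hz (for HSV, the camera frame rate)
    f0_min, f0_max: assumed bounds of the f0 search range
    """
    x = np.asarray(gaw, dtype=float)
    x = x - x.mean()                      # remove the DC component
    ac = np.correlate(x, x, mode="full")  # full autocorrelation sequence
    ac = ac[ac.size // 2:]                # keep non-negative lags only
    # Restrict the peak search to lags corresponding to f0_min..f0_max
    lag_min = int(fs / f0_max)
    lag_max = int(fs / f0_min)
    lag = lag_min + np.argmax(ac[lag_min:lag_max + 1])
    return fs / lag

# Synthetic check: a 200 Hz periodic "glottal area" waveform sampled
# at a hypothetical HSV frame rate of 4,000 frames per second.
fs = 4000.0
t = np.arange(0, 0.1, 1.0 / fs)
gaw = 1.0 + 0.5 * np.sin(2 * np.pi * 200.0 * t)
print(round(estimate_f0_autocorr(gaw, fs)))  # 200
```

In a real pipeline this estimate would be computed over short sliding windows of the glottal area waveform to produce the f0 contour used for temporal segmentation; frame-level voicing decisions (e.g., thresholding the normalized autocorrelation peak) would also be needed, which this sketch omits.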