Anne Schützenberger, Melda Kunduk, Michael Döllinger, Christoph Alexiou, Denis Dubrovskiy, Marion Semmler, Anja Seger, Christopher Bohr
Abstract
The current use of laryngeal high-speed videoendoscopy in clinical settings involves subjective visual assessment of vocal fold vibratory characteristics. However, objective quantification of vocal fold vibrations is desirable for evidence-based diagnosis and therapy, and objective parameters assessing laryngeal dynamics have therefore been proposed. This study investigated the sensitivity of these objective parameters and their dependence on the recording frame rate. A total of 300 endoscopic high-speed videos, recorded at frame rates between 1000 and 15 000 fps, were analyzed for one vocally healthy female subject during sustained phonation. Twenty parameters representing laryngeal dynamics were computed. Four distinct parameter behaviors were found: parameters that did not change with increasing frame rate; parameters that changed up to a certain frame rate and then remained constant; parameters that remained constant only within a particular range of frame rates; and parameters that changed with nearly every frame rate. The results suggest that (1) parameter values are influenced by the recording frame rate, and different parameters differ in their sensitivity to it; (2) normative values should be determined for specific recording frame rates; and (3) the commonly used recording frame rate of 4000 fps appears too low to accurately resolve certain characteristics of the human phonation process.
Year: 2016 PMID: 27990428 PMCID: PMC5136634 DOI: 10.1155/2016/4575437
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
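The abstract's conclusion about 4000 fps can be made concrete with simple arithmetic: the number of frames that sample one glottal cycle is the frame rate divided by the fundamental frequency. The sketch below assumes a hypothetical f0 of 176 Hz; the function names are illustrative only and are not taken from the study.

```python
# Temporal sampling of one glottal cycle at different HSV frame rates.
# ASSUMPTION: f0 = 176 Hz is a hypothetical example value.

def frames_per_cycle(frame_rate_fps: float, f0_hz: float) -> float:
    """Number of video frames that fall within one glottal cycle."""
    return frame_rate_fps / f0_hz


def frame_interval_ms(frame_rate_fps: float) -> float:
    """Time between two consecutive frames, in milliseconds."""
    return 1000.0 / frame_rate_fps


if __name__ == "__main__":
    f0 = 176.0
    for fps in (1000, 4000, 15000):
        print(f"{fps:>6} fps: {frame_interval_ms(fps):.3f} ms/frame, "
              f"{frames_per_cycle(fps, f0):.1f} frames/cycle")
```

Under these assumptions one cycle spans roughly 6 frames at 1000 fps, about 23 frames at 4000 fps, and about 85 frames at 15 000 fps, so short events such as the closing phase are sampled far more coarsely at the lower rates.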
Figure 2: HSV (high-speed videoendoscopy) recording setup and the view of the vocal folds through the camera. In (b), the dark glottis between the two vocal folds is visible.
Figure 3: (a) Glottal area waveform (GAW) and its subdivision into the different oscillation states used to compute the GAW parameters. (b) Definitions of the GAW conditions used for parameter computation.
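Figure 3 subdivides each GAW cycle into oscillation states, but the exact rules exist only in the figure. The following is therefore a simplified, hypothetical labeling of a single cycle: frames at or below a closure threshold count as closed, open frames up to and including the area maximum count as opening, and open frames after the maximum count as closing. The function name and threshold are assumptions.

```python
from typing import List, Sequence


def label_phases(gaw: Sequence[float], closed_threshold: float = 0.0) -> List[str]:
    """Label each frame of ONE glottal cycle as 'closed', 'opening' or 'closing'.

    Hypothetical rule, not the study's exact definition:
      closed  : glottal area <= closed_threshold
      opening : open frames up to and including the frame of maximum area
      closing : open frames after the maximum
    """
    peak = max(range(len(gaw)), key=lambda i: gaw[i])
    labels = []
    for i, area in enumerate(gaw):
        if area <= closed_threshold:
            labels.append("closed")
        elif i <= peak:
            labels.append("opening")
        else:
            labels.append("closing")
    return labels


# Made-up 10-frame cycle (glottal area in pixels)
print(label_phases([0, 0, 2, 5, 9, 10, 7, 3, 0, 0]))
```

Counting the labels per cycle gives the phase durations (in frames) from which the quotient parameters are formed.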
The parameters analyzed, with units, value ranges, and descriptions. Definitions of the GAW conditions are illustrated in Figure 3.
| Parameter (abbreviation) & reference | Units | Range | Description/formula |
|---|---|---|---|
| (A) Glottal dynamic characteristics | | | |
| Amplitude Quotient (AMQ) [ | Frames | <0 | Glottal area (GA) dynamic range/MADR |
| Asymmetry Quotient (ASQ) [ | — | [0; 1) | SQ/(SQ + 1) |
| Closing Quotient (CQ) [ | — | [0; 1) | Closing/ |
| Glottis Gap Index (GGI) [ | — | [0; 1] | Min (GA)/max (GA) |
| Maximum Area Declination Rate (MADR) [ | Pixels/frame | <0 | Value of the negative peak in the first derivative of the GAW, that is, the maximum GAW closing velocity |
| Open Quotient (OQ) [ | — | [0; 1] | Open/ |
| Phase Asymmetry (PA) [ | — | (−1; 1) | ( |
| Rate Quotient (RQ) [ | — | >0 | (Closed + opening)/closing |
| Speed Index (SI) [ | — | (−1; 1) | (SQ − 1)/(SQ + 1) |
| Speed Quotient (SQ) [ | — | ≥0 | Opening/closing |
| (B) Glottal perturbation characteristics | | | |
| (B1) Amplitude stability: these parameters reflect the amplitude periodicity of the GAW | | | |
| Amplitude Periodicity (AP) [ | — | [0; 1] | Ratio (min/max) of the GA dynamic ranges of 2 consecutive GAW cycles |
| Amplitude Variability Index (AVI) [ | dB | (− | dB-scaled coefficient of variation applied to the GA dynamic ranges of all GAW cycles |
| Mean Shimmer (mSH) [ | dB | ≥0 | Mean dB-scaled difference between the GA dynamic ranges of 2 consecutive GAW cycles |
| Shimmer (SH) [ | % | [0; 100] | Ratio between mSH and the mean dB-scaled GA dynamic range over all GAW cycles |
| (B2) Period length stability: these parameters reflect the time periodicity of the GAW | | | |
| Jitter (JT) [ | % | [0; 100] | Ratio between mJT and the mean duration over all GAW cycles |
| Mean Jitter (mJT) [ | ms | ≥0 | Mean difference between the durations of 2 consecutive GAW cycles |
| (C) Noise components. HI, NNE, and SPF relate the harmonic components to the noise components and are computed in the spectral space | | | |
| Harmonic Intensity (HI) [ | % | [0; 100] | Spectrum-based ratio between the energy of the harmonic components and the total energy of the GAW |
| Harmonic to Noise Ratio (HNR) [ | dB | (− | Ratio between the energies of the harmonics-based signal and the noise contained in the GAW |
| Normalized Noise Error (NNE) [ | dB | (− | dB-scaled ratio between the estimated noise energy and the total energy of the GAW |
| Spectral Flatness (SPF) [ | dB | (− | dB-scaled difference between the arithmetic and geometric means of the energy spectrum coefficients of the GAW |
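
Given the phase durations of one cycle, most group (A) parameters in the table reduce to a few ratios. The sketch below assumes phase durations in frames and, because the table's OQ and CQ denominators are truncated, assumes they are the cycle duration; PA is omitted because its formula is not recoverable. `PhaseDurations` and `dynamic_parameters` are hypothetical names, not the study's software.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class PhaseDurations:
    """Phase durations of ONE glottal cycle, in frames (hypothetical input type)."""
    opening: float
    closing: float
    closed: float

    @property
    def open_phase(self) -> float:
        return self.opening + self.closing

    @property
    def period(self) -> float:
        return self.open_phase + self.closed


def dynamic_parameters(gaw: Sequence[float], ph: PhaseDurations) -> Dict[str, float]:
    """Group (A) parameters for one GAW cycle, PA omitted.

    ASSUMPTION: the truncated OQ and CQ denominators are the cycle duration.
    """
    sq = ph.opening / ph.closing
    # First derivative by forward differences, in pixels/frame
    derivative = [b - a for a, b in zip(gaw, gaw[1:])]
    madr = min(derivative)  # negative peak = maximum closing velocity
    return {
        "OQ": ph.open_phase / ph.period,
        "CQ": ph.closing / ph.period,
        "SQ": sq,
        "ASQ": sq / (sq + 1),
        "SI": (sq - 1) / (sq + 1),
        "RQ": (ph.closed + ph.opening) / ph.closing,
        "GGI": min(gaw) / max(gaw),
        "MADR": madr,
        "AMQ": (max(gaw) - min(gaw)) / madr,  # unit: frames
    }


# Example with a made-up 10-frame cycle
cycle = [0, 0, 2, 5, 9, 10, 7, 3, 0, 0]
print(dynamic_parameters(cycle, PhaseDurations(opening=4, closing=2, closed=4)))
```

Because MADR is measured in pixels per frame, AMQ carries the unit of frames, so its numerical value depends directly on the recording frame rate.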
Figure 1: Computed mean parameter values for the different recording frame rates. Where values at different frame rates do not differ significantly (p > 0.0036), they are merged and the parameter value range is given. Gray-shaded areas indicate no statistically significant difference from the neighboring recording frame rate intervals on either side.
Figure 4: The first six parameters representing glottal dynamic characteristics. Means and standard deviations are shown. The subjective rating of value stability across HSV recording frame rates is indicated.
Figure 5: The remaining four parameters representing glottal dynamic characteristics. Means and standard deviations are shown. The subjective rating of value stability across HSV recording frame rates is indicated.
Figure 6: The six parameters representing glottal perturbation. Means and standard deviations are shown. The subjective rating of value stability across HSV recording frame rates is indicated.
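The group (B) perturbation parameters compare consecutive cycles. The following is a minimal sketch of mJT, JT, mSH, SH, and AP that follows the verbal definitions in the parameter table, assuming per-cycle durations and GA dynamic ranges have already been extracted. Two choices are assumptions not stated in the source: dB scaling uses 20·log10, and AP is averaged over all consecutive cycle pairs. AVI is omitted because its formula is truncated, and the function name is hypothetical.

```python
import math
from typing import Dict, Sequence


def perturbation_parameters(periods_ms: Sequence[float],
                            dynamic_ranges: Sequence[float]) -> Dict[str, float]:
    """Cycle-to-cycle perturbation measures of a GAW (hypothetical sketch).

    periods_ms     : duration of each GAW cycle in ms
    dynamic_ranges : GA dynamic range (max - min) of each cycle, in pixels (> 0)
    """
    n = len(periods_ms)
    if n < 2 or len(dynamic_ranges) != n:
        raise ValueError("need at least 2 cycles and matching input lengths")
    pairs = range(n - 1)

    # (B2) period length stability
    mjt = sum(abs(periods_ms[i + 1] - periods_ms[i]) for i in pairs) / (n - 1)
    jt = 100.0 * mjt / (sum(periods_ms) / n)

    # (B1) amplitude stability; ASSUMPTION: dB scaling via 20*log10
    db = [20.0 * math.log10(a) for a in dynamic_ranges]
    msh = sum(abs(db[i + 1] - db[i]) for i in pairs) / (n - 1)
    sh = 100.0 * msh / (sum(db) / n)

    # ASSUMPTION: AP averaged over all consecutive cycle pairs
    ap = sum(min(dynamic_ranges[i], dynamic_ranges[i + 1])
             / max(dynamic_ranges[i], dynamic_ranges[i + 1]) for i in pairs) / (n - 1)

    return {"mJT": mjt, "JT": jt, "mSH": msh, "SH": sh, "AP": ap}


print(perturbation_parameters([5.0, 5.5, 5.0], [100.0, 100.0, 1000.0]))
```

Since cycle durations are quantized to whole frames, mJT cannot resolve differences smaller than one frame interval, which is one way the recording rate enters these measures.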
Figure 7: Parameters representing glottal harmonic and noise components. The parameters HI, NNE, and SPF are computed in the spectral space of the GAW. Means and standard deviations are shown. Their evident instability, determined both subjectively and objectively, and their dependence on the HSV recording rate suggest that HI, NNE, and SPF are not suitable for evaluating the GAW signal.
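To make the spectral-domain definition of SPF concrete, here is a hypothetical sketch that expresses the flatness of a GAW power spectrum as the dB difference between its geometric and arithmetic means. The sign convention (geometric minus arithmetic, so values are at most 0 dB), the removal of the DC component, and the small `eps` guard against log(0) are assumptions; a naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math
from typing import List, Sequence


def power_spectrum(x: Sequence[float]) -> List[float]:
    """|DFT|^2 for bins 1..N/2 (naive O(N^2) DFT, adequate for short examples)."""
    n = len(x)
    mean = sum(x) / n
    centred = [v - mean for v in x]  # remove DC: a GAW is non-negative
    spectrum = []
    for k in range(1, n // 2 + 1):
        s = sum(centred[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(s) ** 2)
    return spectrum


def spectral_flatness_db(x: Sequence[float], eps: float = 1e-12) -> float:
    """10*log10(geometric mean / arithmetic mean) of the power spectrum (<= 0 dB)."""
    p = [v + eps for v in power_spectrum(x)]
    log_gm = sum(math.log(v) for v in p) / len(p)
    am = sum(p) / len(p)
    return 10.0 * (log_gm - math.log(am)) / math.log(10.0)
```

A strongly periodic signal concentrates its energy in a few bins and yields a large negative value, whereas broadband noise gives a value much closer to 0 dB.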
AMQ values reported in similar studies compared with the value obtained at a recording rate of 15 kfps in this study.
Amplitude Quotient (AMQ), healthy females
| Mean ± SD | Age/number of subjects | Phonation | | Spatial resolution (pixels) | Recording rate (fps) | Sequence length (cycles or ms) | Study |
|---|---|---|---|---|---|---|---|
| (–3.33) ± 0.73 | 21–45 yr/19 | /i/ | 251 ± 31 | 512 × 256 | 4000 | 50 cycles | Patel et al. (2014) [ |
| (–3.86) ± 1.17 | 22 ± 4 yr/77 | Vowel | — | 256 × 256 | 4000 | 250 ms | Bohr et al. (2013) [ |
| (–12.09) ± 3.13 | 45 yr/1 | /i/ | 176 ± 11 | 768 × 512 | 15000 | 106 cycles | This work |
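
Because AMQ is expressed in frames, values obtained at different recording rates are not directly comparable. Dividing by the frame rate converts them to milliseconds. The helper below is hypothetical and performs only this unit conversion; it does not correct for differences in subjects, phonation, or spatial resolution.

```python
def amq_frames_to_ms(amq_frames: float, frame_rate_fps: float) -> float:
    """Convert an AMQ given in frames into milliseconds."""
    return 1000.0 * amq_frames / frame_rate_fps


# Mean AMQ values and recording rates from the table above
rows = [("Patel et al. (2014)", -3.33, 4000),
        ("Bohr et al. (2013)", -3.86, 4000),
        ("This work", -12.09, 15000)]
for label, amq, fps in rows:
    print(f"{label}: {amq_frames_to_ms(amq, fps):.4f} ms")
```

For the three rows this gives −0.8325, −0.965, and −0.806 ms, so on a common time axis the values lie much closer together than the frame-based numbers suggest.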
Computed Closing Quotients and settings in similar studies.
Closing Quotient (CQ), healthy
| Mean ± SD or range | Age/number of subjects/gender | Phonation | | Spatial resolution (pixels) | Recording rate (fps) | Sequence length (cycles or ms) | Study |
|---|---|---|---|---|---|---|---|
| 0.34 | 18–45 yr/18/f | / | 207 ± 16 | 256 × 256 | 4000 | 5 cycles | Baravieira et al. (2014) [ |
| 0.26 ± 0.08 | 20–52 yr/7/m + f | /i/ | — | 320 × 352 | 4000/6250 | 320–400 ms | Mehta et al. (2011) [ |
| 0.21–0.48 | 18–36 yr/20/f | /ae/ | 162–252 | Inverse filtered airflow | 8192 | 4 cycles | Holmberg et al. (1988) [ |
GGI values in a similar study.
Glottis gap index (GGI), healthy females
| Mean ± SD | Age/number of subjects | Phonation | | Spatial resolution (pixels) | Recording rate (fps) | Sequence length (cycles) | Study |
|---|---|---|---|---|---|---|---|
| 0.054 ± 0.072 | 28 ± 7 yr/19 | /i/ | 251 ± 31 | 512 × 256 | 4000 | 50 | Patel et al. (2014) [ |
Open quotients and settings in similar studies.
Open quotient (OQ), healthy
| Mean ± SD or range | Age/number of subjects/gender | Phonation | | Spatial resolution (pixels) | Recording rate (fps) | Sequence length (cycles or ms) | Study |
|---|---|---|---|---|---|---|---|
| 0.64 ± 0.10 | 20–28 yr/14/f | /i/ – healthy | — | 120 × 256 | 2000 | 250 ms | Kunduk et al. (2010) [ |
| 0.91 ± 0.11 | 21–58 yr/8/m + f | /i/ – post-surgery | 165 ± 58 | 120 × 256 | 2000 | 1000 ms | Ikuma et al. (2014) [ |
| 0.68–0.78 | 18–44 yr/26/f | /i/ – healthy | 264 ± 43 | 160 × 140 | 2000 | 300 ms | Ahmad et al. (2012) [ |
| 0.79 ± 0.17 | 24–51 yr/16/m | /i/ – healthy | 194 ± 65 | 256 × 256 | 4000 | 250 ms | Warhurst et al. (2014) [ |
| 0.44 ± 0.10 | 5–11 yr/20/m + f | /i/ – healthy | 287 ± 40 | 512 × 256 | 4000 | 30 cycles | Patel et al. (2016) [ |
Shimmer values and settings in similar studies.
Shimmer SH (%), healthy
| Mean | Age/number of subjects/gender | Phonation | | Spatial resolution (pixels) | Recording rate (fps) | Sequence length (cycles or ms) | Study |
|---|---|---|---|---|---|---|---|
| 2.1 | –/1/– | /i/ | 200 | 160 × 140 | 2000 | 40 cycles | Yan et al. (2005) [ |
| 6.2 | 63–82 yr/20/f | /i/ | 275 ± 47 | 160 × 140 | 2000 | 300 ms | Ahmad et al. (2012) [ |