Emiro J Ibarra, Jesús A Parra, Gabriel A Alzamendi, Juan P Cortés, Víctor M Espinoza, Daryush D Mehta, Robert E Hillman, Matías Zañartu.
Abstract
The ambulatory assessment of vocal function can be significantly enhanced by having access to physiologically based features that describe underlying pathophysiological mechanisms in individuals with voice disorders. This type of enhancement can improve methods for the prevention, diagnosis, and treatment of behaviorally based voice disorders. Unfortunately, the direct measurement of important vocal features such as subglottal pressure, vocal fold collision pressure, and laryngeal muscle activation is impractical in laboratory and ambulatory settings. In this study, we introduce a method to estimate these features during phonation from a neck-surface vibration signal through a framework that integrates a physiologically relevant model of voice production and machine learning tools. The signal from a neck-surface accelerometer is first processed using subglottal impedance-based inverse filtering to yield an estimate of the unsteady glottal airflow. Seven aerodynamic and acoustic features are extracted from the neck-surface accelerometer signal and an optional microphone signal. A neural network architecture is selected to provide a mapping between the seven input features and subglottal pressure, vocal fold collision pressure, and cricothyroid and thyroarytenoid muscle activation. This non-linear mapping is trained solely with 13,000 Monte Carlo simulations of a voice production model that utilizes a symmetric triangular body-cover model of the vocal folds. The performance of the method was compared against laboratory data from synchronous recordings of oral airflow, intraoral pressure, microphone, and neck-surface vibration in 79 vocally healthy female participants uttering consecutive /pæ/ syllable strings at comfortable, loud, and soft levels.
The mean absolute error and root-mean-square error for estimating the mean subglottal pressure were 191 Pa (1.95 cm H2O) and 243 Pa (2.48 cm H2O), respectively, which are comparable with previous studies but with the key advantage of not requiring subject-specific training and yielding more output measures. The validation of vocal fold collision pressure and laryngeal muscle activation was performed with synthetic values as reference. These initial results provide valuable insight for further vocal fold model refinement and constitute a proof of concept that the proposed machine learning method is a feasible option for providing physiologically relevant measures for laboratory and ambulatory assessment of vocal function.
Keywords: ambulatory monitoring; clinical voice assessment; neck-surface accelerometer; neural networks; subglottal pressure estimation; voice production model
Year: 2021 PMID: 34539451 PMCID: PMC8440844 DOI: 10.3389/fphys.2021.732244
Source DB: PubMed Journal: Front Physiol ISSN: 1664-042X Impact factor: 4.566
Figure 1. A schematic of the proposed method for ambulatory vocal assessment based on processing the neck-skin acceleration signal and a regression neural network.
Description of aerodynamic features extracted from the glottal airflow signal and acoustic sound pressure level extracted from the microphone or accelerometer signal.
| Feature | Description | Unit |
|---|---|---|
| ACFL | Difference between the maximum and minimum amplitude of the AC glottal airflow (peak-to-peak) within each glottal cycle | |
| MFDR | Maximum flow declination rate: negative peak of the first derivative of the glottal airflow waveform | |
| OQ | Open quotient: ratio of the open time of the glottal vibratory cycle to the corresponding cycle period. Computed as in Cortés et al. | % |
| SQ | Speed quotient: ratio of the opening time of the glottis to the closing time. Computed as in Cortés et al. | – |
| H1–H2 | Difference between the magnitude of the first two harmonics | |
| f0 | Fundamental frequency | |
| SPL | Sound pressure level: dB from the RMS envelope of the acoustic signal | dB SPL |
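Several of these features can be computed directly from a sampled airflow signal. Below is a toy illustration of the ACFL, MFDR, and SPL definitions, assuming a 1-D airflow array; it is not the paper's pipeline (which operates cycle by cycle after inverse filtering), and the SPL line applies the microphone convention (dB re 20 µPa) to the flow signal purely for illustration:

```python
import numpy as np

def extract_features(airflow, fs):
    """Toy feature extraction from a glottal airflow signal.

    `airflow` is a 1-D array (L/s) sampled at `fs` Hz. Illustrates the
    definitions in the table only; not the paper's cycle-based method."""
    x = airflow - np.mean(airflow)               # AC component of the flow
    acfl = np.max(x) - np.min(x)                 # peak-to-peak AC flow
    d = np.diff(airflow) * fs                    # first derivative of the flow
    mfdr = -np.min(d)                            # magnitude of the negative peak
    rms = np.sqrt(np.mean(x ** 2))
    spl = 20 * np.log10(rms / 20e-6)             # dB re 20 uPa, illustrative only
    return {"ACFL": acfl, "MFDR": mfdr, "SPL": spl}

# Stand-in "glottal flow": a 200 Hz sine with a DC offset
fs = 10_000
t = np.arange(0, 0.05, 1 / fs)
airflow = 0.2 + 0.1 * np.sin(2 * np.pi * 200 * t)
feats = extract_features(airflow, fs)
print(feats["ACFL"])   # ~0.2 L/s peak-to-peak
```

For the sine input, MFDR evaluates to roughly the analytic derivative amplitude 0.1 · 2π · 200 ≈ 126.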
Figure 2. A schematic of the proposed training procedure. A regression neural network is built to map accelerometer-based vocal features into clinically relevant estimates of subglottal pressure, vocal fold collision pressure, and laryngeal muscle activation levels of the thyroarytenoid (TA) and cricothyroid (CT) muscles. Training data are produced from a numerical voice production model.
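The mapping in Figure 2 is a small feed-forward regression network; Figure 5, for example, uses 2 hidden layers of 4 neurons each. A minimal sketch of such a network's forward pass with randomly initialized weights (the tanh hidden activations and linear output layer are common choices assumed here, not taken from the paper):

```python
import numpy as np

def mlp_forward(x, weights, biases):
    """Forward pass of a small feed-forward regression network:
    tanh hidden layers followed by a linear output layer."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.tanh(h @ W + b)                  # hidden layers
    return h @ weights[-1] + biases[-1]         # linear regression output

rng = np.random.default_rng(0)
sizes = [7, 4, 4, 4]   # 7 features -> two hidden layers of 4 -> 4 outputs
weights = [rng.normal(scale=0.5, size=(m, n)) for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

x = rng.uniform(size=(5, 7))   # five stand-in feature vectors
y = mlp_forward(x, weights, biases)
print(y.shape)   # one row per sample, one column per output measure
```

In the paper's setup the four outputs would correspond to subglottal pressure, collision pressure, and the CT and TA activation levels, and the weights would come from training on the synthetic simulations rather than random initialization.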
Range and increment step for the control parameters of the numerical voice production model used to build the synthetic dataset.
| Parameter | Range | Step | Unit |
|---|---|---|---|
| | 0–1 | 0.1 | – |
| | 0–1 | 0.1 | – |
| | 0.2–0.8 | 0.1 | – |
| | 0–0.1 | 0.1 | – |
| | 0.2–0.8 | 0.1 | – |
| Subglottal pressure | 500–2000 | 150 | Pa |
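The 13,000 Monte Carlo training simulations draw the model's control parameters from discrete grids like those above. A minimal sketch of the sampling step; the dictionary keys are placeholders, since the table's parameter labels did not survive extraction and only the 500–2000 Pa row is clearly subglottal pressure:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 13_000   # number of Monte Carlo simulations reported in the abstract

def sample_grid(lo, hi, step, n):
    """Draw n values uniformly from the discrete grid lo, lo+step, ..., hi."""
    grid = np.arange(lo, hi + step / 2, step)
    return rng.choice(grid, size=n)

params = {
    "activation_a": sample_grid(0.0, 1.0, 0.1, N),   # placeholder name, 0-1 grid
    "activation_b": sample_grid(0.0, 1.0, 0.1, N),   # placeholder name, 0-1 grid
    "Ps": sample_grid(500.0, 2000.0, 150.0, N),      # subglottal pressure, Pa
}
print(params["Ps"].min(), params["Ps"].max())   # stays within [500, 2000]
```

Each of the N parameter vectors would then drive one run of the voice production model to produce a synthetic training example.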
Figure 3. Normalized histograms of the vocal features obtained for all measured quantities in the clinical dataset (blue). Histograms for the synthetic dataset (red) are superimposed to illustrate model matching. Bias-corrected synthetic SPL and Ps are shown as additional histograms (light red).
MAE and RMSE between Ps estimated with the proposed NN regression model and the reference measures from synthetic and laboratory test data.
| Neurons | Hidden layers | MAE, synthetic (cm H2O) | RMSE, synthetic (cm H2O) | MAE, laboratory (cm H2O) | RMSE, laboratory (cm H2O) |
|---|---|---|---|---|---|
| Case I | | | | | |
| 4 | 2 | 1.98 | 2.51 | 2.23 | 2.82 |
| 8 | 2 | 1.81 | 2.34 | 2.28 | 2.86 |
| 16 | 2 | 1.35 | 1.83 | 2.56 | 3.13 |
| 32 | 2 | 1.18 | 1.64 | 2.82 | 3.43 |
| 64 | 2 | 1.02 | 1.48 | 2.89 | 3.50 |
| 128 | 2 | 0.99 | 1.68 | 2.94 | 3.58 |
| 128 | 4 | 0.93 | 1.33 | 3.17 | 3.87 |
| 128 | 6 | 0.97 | 1.38 | 3.14 | 3.85 |
| 128 | 8 | 1.01 | 1.45 | 3.12 | 3.76 |
| Case II | | | | | |
| 4 | 2 | 1.84 | 2.42 | 1.95 | 2.48 |
| 8 | 2 | 1.87 | 2.43 | 1.97 | 2.52 |
| 16 | 2 | 1.27 | 1.74 | 2.42 | 2.98 |
| 32 | 2 | 1.13 | 1.58 | 2.55 | 3.17 |
| 64 | 2 | 0.99 | 1.42 | 2.88 | 3.45 |
| 128 | 2 | 0.90 | 1.30 | 2.98 | 3.58 |
| 128 | 4 | 0.78 | 1.12 | 3.23 | 3.87 |
| 128 | 6 | 0.87 | 1.21 | 3.04 | 3.71 |
| 128 | 8 | 1.00 | 1.38 | 3.08 | 3.70 |
Errors are reported for different NN architectures (varying numbers of neurons and hidden layers). Case I: input aerodynamic features ACFL, MFDR, OQ, SQ, and f0.
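The MAE and RMSE reported here follow their standard definitions, and the unit conversion behind the abstract's headline figures (191 Pa and 243 Pa) can be checked directly:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

PA_PER_CM_H2O = 98.0665   # 1 cm H2O in pascals
print(round(191 / PA_PER_CM_H2O, 2))   # 1.95 cm H2O (abstract's MAE)
print(round(243 / PA_PER_CM_H2O, 2))   # 2.48 cm H2O (abstract's RMSE)
```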
Figure 4. Mean squared error (MSE) vs. epoch for training (blue) and validation (red) for two neural network architectures. (Left) 2 hidden layers with 4 neurons each. (Right) 8 hidden layers with 128 neurons each.
Figure 5. Comparison between the laboratory-estimated subglottal pressure and the corresponding estimates from the trained neural network (2 hidden layers, 4 neurons per layer, and 7 voice features). R² = 0.65. The dashed line represents the theoretical 1:1 perfect match.
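The R² = 0.65 in Figure 5 is a coefficient of determination between laboratory references and network estimates. A minimal sketch using the standard mean-baseline definition (the paper's exact computation is not specified here):

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination (standard mean-baseline definition)."""
    ss_res = np.sum((y_true - y_pred) ** 2)              # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)     # total sum of squares
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])
print(r_squared(y, y))          # 1.0 for perfect agreement
print(r_squared(y, y + 0.5))    # ~0.8 for a constant 0.5 offset
```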
Assessment of the estimated vocal measures Ps (subglottal pressure), Pc (vocal fold collision pressure), aCT, and aTA using the proposed NN regression method.
| Measure | Unit | R² | MAE | Error (%) |
|---|---|---|---|---|
| Ps | cm H2O | 0.64 | 1.84 | 11.4 |
| Pc | cm H2O | 0.70 | 3.33 | 8.2 |
| aCT | – | 0.07 | 0.21 | 21.1 |
| aTA | – | 0.53 | 0.15 | 14.6 |
| | | | | |
| Ps | cm H2O | 0.93 | 0.74 | 4.7 |
| Pc | cm H2O | 0.92 | 1.70 | 4.2 |
| aCT | – | 0.52 | 0.13 | 13.3 |
| aTA | – | 0.84 | 0.07 | 7.1 |
Reported values for R².