Zhengdong Lei1, Lisa Martignetti1, Chelsea Ridgway2, Simon Peacock3, Jon T Sakata4,5, Nicole Y K Li-Jessen1,5,6,7,8.
Abstract
BACKGROUND: Neck surface accelerometer (NSA) wearable devices have been developed for voice and upper airway health monitoring. Rather than acoustic sound, an NSA senses mechanical vibrations propagated from the vocal tract to the neck skin, which are indicative of a person's voice and airway conditions. Because NSA signals do not carry identifiable speech information, the speaker's privacy is protected, which is essential for continuous wearable monitoring. The durability of our device and its signal processing algorithms have already been tested under controlled laboratory conditions.
Keywords: mechano-acoustic sensing; neck surface accelerometer; voice monitoring; wearable device
Year: 2022 PMID: 35930317 PMCID: PMC9391979 DOI: 10.2196/39789
Source DB: PubMed Journal: JMIR Form Res ISSN: 2561-326X
Figure 1. The NSA wearable device. (A) Hardware instrument and (B) schematic design. Adapted from “Figure 1. The physical prototype and schematic of the NSA”, by Lei et al, 2019 [4] and licensed under CC BY 4.0. PCB: printed circuit board.
Figure 2. Human protocol of voice assessments and voice acting. Voice assessments included the Self-Administrated Voice Rating questionnaire (SAVRa) and neck surface accelerometer (NSA)-derived acoustic voice evaluation.
Phonation tasks.
| Task number | Phonation task | Acoustic metrics |
| 1 | 1-minute reading of the Rainbow Passage | Cepstral peak prominence; fundamental frequency; H1 – H2a; harmonic richness factor; spectral entropy; spectral tilt; surface/skin acceleration level |
| 2 | Vowel phonation /a/ for 5 seconds | Cepstral peak prominence; fundamental frequency; H1 – H2; harmonic richness factor; spectral entropy; spectral tilt; surface/skin acceleration level; jitter; shimmer |
| 3 | Deep breath and vowel phonation /a/ | Maximum phonation time |
| 4 | Glide on vowel /a/ from low to high pitch | |
aH1 – H2: difference between the first and second harmonic magnitudes.
Mathematical formulas and definitions of acoustic metrics.
| Acoustic metrics | Mathematical formula | Units | Definition |
| CPPa | Peak_max – (b0+b1*|q|) | Decibels | The difference in amplitude between the cepstral peak and the corresponding value on the trend line through the overall spectrum, which represents how far the cepstral peak emerges from the cepstrum background. |
| f0b | 1/(period) | Hertz | Frequency of vocal fold vibration that is the lowest of all the frequencies in the voice spectrum and is obtained by the reciprocal of the smallest period. |
| H1 – H2c | 20log(A1/A2) | Decibels | The log-magnitude difference between the amplitudes of the first and second harmonics in the spectrum. |
| HRFd | | Decibels | Ratio of the sum of the amplitudes at the harmonics above the fundamental frequency to the amplitude of the component at the fundamental frequency. |
| SEe | | Relative value | Estimates the uniformity of signal energy distribution in the frequency domain. |
| Tiltf | | Decibels/Hertz | Tilt of the trend line of the long-term average spectrum, which represents the degree to which intensity drops off as frequency increases. |
| SALg | 20log(max[data_frame]/A_noise) | Decibels | The calculation is based on the maximum of each voiced segment amplitude for every 45-ms segment window. |
| Jitter (relative) | | Percent | Average absolute difference between consecutive periods divided by average period, indicating the cycle-to-cycle variation of the fundamental frequency. |
| Shimmer (relative) | | Percent | Average absolute difference between the amplitudes of consecutive periods divided by average amplitude, indicating the cycle-to-cycle variation of vocal amplitude. |
| MPTh | T2 – T1 | Seconds | Measure of a maximally sustained vowel following a maximal inspiration, which provides an indication of the efficiency of the respiratory mechanism. |
aCPP: cepstral peak prominence.
bf0: fundamental frequency.
cH1 – H2: difference between the first and second harmonic magnitudes.
dHRF: harmonic richness factor.
eSE: spectral entropy.
fTilt: spectral tilt.
gSAL: skin acceleration level.
hMPT: maximum phonation time.
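The formulas and verbal definitions in the table above can be illustrated with a minimal Python sketch. It is not the study's processing pipeline. All function names are hypothetical, and the inputs (harmonic amplitudes, per-cycle periods and amplitudes, a precomputed cepstrum in decibels) are assumed to have been extracted already. The CPP trend line is assumed to be an ordinary least-squares fit, with b0 as its intercept and b1 as its slope; the table does not specify the fitting method.

```python
import math


def h1_h2_db(a1, a2):
    """H1 - H2 = 20*log10(A1/A2), in decibels."""
    return 20.0 * math.log10(a1 / a2)


def sal_db(frame_amplitudes, a_noise):
    """SAL = 20*log10(max[data_frame]/A_noise) for one voiced frame.

    frame_amplitudes is assumed to hold non-negative amplitudes
    from a single analysis window (45 ms in the table).
    """
    return 20.0 * math.log10(max(frame_amplitudes) / a_noise)


def _relative_perturbation_percent(values):
    """Mean absolute difference of consecutive values over their mean, in %."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def jitter_percent(periods):
    """Relative jitter from consecutive glottal periods (seconds)."""
    return _relative_perturbation_percent(periods)


def shimmer_percent(amplitudes):
    """Relative shimmer from consecutive per-cycle amplitudes."""
    return _relative_perturbation_percent(amplitudes)


def cpp_db(quefrencies, cepstrum_db):
    """CPP = Peak_max - (b0 + b1*|q|).

    Assumption: b0 and b1 come from a least-squares line through the
    cepstrum, and the peak is the global maximum of the supplied points.
    Real pipelines usually restrict the peak search to quefrencies that
    correspond to plausible pitch values.
    """
    xs = [abs(q) for q in quefrencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cepstrum_db) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cepstrum_db))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    peak = max(range(n), key=lambda i: cepstrum_db[i])
    return cepstrum_db[peak] - (b0 + b1 * xs[peak])


def mpt_seconds(t1, t2):
    """MPT = T2 - T1: phonation offset minus onset, in seconds."""
    return t2 - t1


if __name__ == "__main__":
    # Toy values chosen for illustration only; they are not study data.
    print(h1_h2_db(2.0, 1.0))
    print(jitter_percent([0.010, 0.011, 0.010, 0.011]))
    print(cpp_db([1, 2, 3, 4, 5], [10.0, 8.0, 12.0, 4.0, 2.0]))
```

Jitter and shimmer share one helper because the table defines both as the same ratio, applied to periods in one case and to amplitudes in the other.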
Participant descriptive statistics.
| Group | Age (years), mean (SD) | Voice acting experience (years), mean (SD) |
| No warm-up | 32 (5.1) | 4 (2.9) |
| Warm-up | 32 (5.5) | 8 (5.7) |
| Female | 32 (5.5) | 7 (5.4) |
| Male | 33 (4.7) | 4 (3.2) |
Acoustic metrics comparison.a
| Sources | f0b, mode | f0b, mean (SD) | CPPc, mean (SD) | H1 – H2d, mean (SD) | Tilte, mean (SD) | Tilt Absf, mean (SD) |
| T1 | 104 | 156.1 (49.0) | 20.5 (3.7) | 5.4 (17.1) | –0.048 (0.009) | –6.0 (5.3) |
| T2b | 108 | 150.2 (88.0) | 26.3 (8.9) | 3.8 (17.4) | –0.044 (0.008) | –4.6 (4.0) |
| T3 | 99 | 151 (61.4) | 29.5 (7.8) | 5.2 (18) | –0.041 (0.007) | –6.0 (4.7) |
| T4 | 81 | 140.6 (48.8) | 26.3 (7.7) | 3.8 (16) | –0.045 (0.009) | –4.7 (4.0) |
| T6 | 85 | 146.4 (49.3) | 21 (3.8) | 4.6 (15.6) | –0.049 (0.009) | –4.9 (4.5) |
| T7 | 82 | 143 (53.8) | 20.7 (3.7) | 10.3 (16.2) | –0.049 (0.010) | –5.6 (4.4) |
| | | | | | | |
| T1 | 176 | 183.3 (75.8) | 21.1 (3.8) | 8.2 (23.5) | –0.048 (0.011) | –1.2 (5.0) |
| T2a | 173 | 184.9 (113.3) | 23.6 (7.6) | 7.4 (21.8) | –0.05 (0.011) | –.07 (5.2) |
| T2b | 173 | 182.5 (76.5) | 23.1 (7.6) | 8.5 (23.5) | –0.048 (0.011) | –0.8 (4.8) |
| T3 | 186 | 182.2 (69.5) | 26.7 (8.6) | 10.1 (24.8) | –0.045 (0.012) | –1.2 (5.2) |
| T4 | 138 | 171.3 (75.0) | 24.4 (6.4) | 13 (23.3) | –0.045 (0.009) | –2.3 (5.8) |
| T6 | 151 | 176.5 (63.9) | 21.4 (3.9) | 8.7 (23.7) | –0.05 (0.011) | –1.3 (5.2) |
| T7 | 181 | 182.4 (70.2) | 20.8 (3.9) | 4.9 (22) | –0.052 (0.011) | –0.9 (5.7) |
| | | | | | | |
| Patients with PVFLg | 198.1 | —h (76.1) | — | — | — | — |
| Matched controls | 202.9 | — (88.0) | — | — | — | — |
| | | | | | | |
| Patients with PVHi | 197.2 | — (75.3) | 23.2 (4.4) | — | — | –14.4 (2.4) |
| PVH controls | 201.4 | — (89.6) | 22.9 (4.5) | — | — | –14.1 (2.4) |
| Patients with NPVHj | 193.8 | — (73.5) | 21.4 (4.2) | — | — | –13.6 (2.5) |
| NPVH controls | 192.9 | — (70.1) | 22.8 (4.4) | — | — | –14.1 (2.4) |
| | | | | | | |
| Patients with PVH | 196.1 | — (73.5) | 23.1 (4.4) | 4.4 (6.1) | — | — |
| Matched controls | 199.4 | — (86.7) | 22.7 (4.4) | 5.1 (7.0) | — | — |
| | | | | | | |
| Combined phonation (healthy) | 205.7 | — (91.6) | 22.7 (4.5) | 5.5 (7.2) | — | — |
| Singing (healthy) | 325.4 | — (94.6) | 21.5 (4) | 9.7 (7.3) | — | — |
| Speech (healthy) | 203.5 | — (62.4) | 23.1 (4.5) | 4.2 (6.6) | — | — |
| | | | | | | |
| Patients with NPVH | 202.4 | — (68.1) | 20.6 (3.9) | 2.6 (6.7) | — | — |
| Matched controls | 182.8 | — (68.6) | 22.1 (4.3) | 2.5 (6.5) | — | — |
aMode and mean (SD) data for the acoustic metrics f0, CPP, H1 – H2, Tilt, and Tilt Abs are presented for our Rainbow Passage task as well as for conversational speech from related research studies.
bf0: fundamental frequency.
cCPP: cepstral peak prominence.
dH1 – H2: difference between the first and second harmonic magnitudes.
eTilt: spectral tilt.
fTilt Abs: tilt absolute.
gPVFL: phonotraumatic vocal fold lesions.
h—: data not available.
iPVH: phonotraumatic vocal hyperfunction.
jNPVH: nonphonotraumatic vocal hyperfunction.
Figure 3. Means and standard errors (error bars) of the Self-Administrated Voice Rating questionnaire (SAVRa) as functions of Time and Gender Group. The voice acting session is highlighted in the pink region. Asterisks denote statistically significant differences between a specific time point and Day 1 (**P≤.01, ***P≤.001). DISC: laryngeal discomfort level; EFFT: current speaking effort level; IPSV: inability to produce soft voice; n.s.=no significant differences.
Figure 4. Means and standard errors (error bars) of accumulated distance dose (Dd) as functions of Study Group and Gender Group. (A) Total 4-hour sessions. (B) First and second parts of session. n.s.=no significant differences, ie, P>.01.
Figure 5. Means and standard errors (error bars) of neck surface accelerometer-derived acoustic metrics in the Rainbow Passage Task as functions of Time and Gender Group. Asterisks denote statistically significant differences (1) between the female (F) and the male (M) participant groups and (2) between a specific time point and Day 1 (***P≤.001). CPP: cepstral peak prominence; f0: fundamental frequency; H1 – H2: difference between the first and second harmonic magnitudes; HRF: harmonic richness factor; SAL: skin acceleration level; SE: spectral entropy.
Figure 6. Means and standard errors (error bars) of neck surface accelerometer-derived acoustic metrics in the Sustained Vowel Task as functions of Time and Gender Group. Asterisks denote statistically significant differences (1) between the female (F) and the male (M) participant groups and (2) between a specific time point and Day 1 (***P≤.001). CPP: cepstral peak prominence; f0: fundamental frequency; H1 – H2: difference between the first and second harmonic magnitudes; HRF: harmonic richness factor; SAL: skin acceleration level; SE: spectral entropy.
Group-based means (SD) for the maximum phonation time and pitch glide tasks.a
| Acoustic metrics and gender groups | Day 1, mean (SD) | Day 2 presession, mean (SD) | Day 2 midsession, mean (SD) | Day 2 postsession, mean (SD) | Day 3, mean (SD) | Day 4, mean (SD) | ANOVA: time | ANOVA: gender | ANOVA: time × gender |
| MPTb | | | | | | | | | |
| Female | 25.29 (7.53) | 22.60 (6.42) | 25.68 (6.84) | 24.12 (7.38) | 22.22 (6.86) | 24.90 (7.90) | | | |
| Male | 30.08 (8.51) | 26.74 (11.35) | 30.41 (15.36) | 30.56 (10.65) | 30.87 (12.11) | 28.59 (10.50) | | | |
| f0 minc | | | | | | | | | |
| Female | 13.98 (4.88) | 12.85 (0.41) | 12.83 (0.62) | 12.87 (0.42) | 15.03 (7.62) | 12.63 (0.61) | | | |
| Male | 12.77 (0.68) | 16.80 (8.80) | 14.06 (3.57) | 13.08 (0.56) | 18.81 (8.43) | 25.18 (16.20) | | | |
| f0 maxd | | | | | | | | | |
| Female | 929.14 (335.34) | 870.47 (269.30) | 911.29 (211.76) | 934.05 (301.39) | 842.46 (242.24) | 864.20 (231.92) | | | |
| Male | 694.61 (259.44) | 689.70 (281.84) | 661.19 (188.34) | 626.75 (191.97) | 823.09 (614.29) | 574.22 (249.39) | | | |
aThere are no statistically significant effects (P<.01).
bMPT: maximum phonation time.
cf0 min: f0 minimum.
df0 max: f0 maximum.