David Ireland, Christina Knuepffer, Simon J McBride.
Abstract
Signal processing of digitally sampled vowel sounds is firmly established as a means of detecting pathological voices. This work examines the compression artifacts in vowel speech samples that have been compressed using the adaptive multi-rate (AMR) codec at various bit-rates. Whereas previous work has used the sensitivity of machine learning algorithms to test for accuracy, this work examines the changes in the extracted speech features themselves and thus reports new findings on the usefulness of a particular feature. We believe this work may influence future research on remote monitoring, as identifying and excluding an ill-defined speech feature that has hitherto been used will ultimately increase the robustness of the system.
Keywords: adaptive multi-rate codec; speech compression; speech processing; vowel sounds
Year: 2015 PMID: 26347863 PMCID: PMC4542648 DOI: 10.3389/fbioe.2015.00118
Source DB: PubMed Journal: Front Bioeng Biotechnol ISSN: 2296-4185
Figure 1. Error for each speech feature when the audio signal is compressed using AMR-NB codec at 4.75 kbps.
Average fundamental frequencies and formant frequencies of the vowel data-set produced by 45 men, 48 women, and 46 children.
| | /i/ | /I/ | /e/ | /ε/ | /æ/ | / | /ↄ/ | /O/ | /℧/ | /u/ | /Λ/ | /з/ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| M | 243 | 192 | 267 | 189 | 278 | 267 | 283 | 265 | 192 | 237 | 188 | 263 |
| F | 306 | 237 | 320 | 254 | 332 | 323 | 353 | 326 | 249 | 303 | 226 | 321 |
| C | 297 | 248 | 314 | 235 | 322 | 311 | 319 | 310 | 247 | 278 | 234 | 307 |
| M | 342 | 427 | 476 | 580 | 588 | 768 | 652 | 497 | 469 | 378 | 623 | 474 |
| F | 437 | 483 | 536 | 731 | 669 | 936 | 781 | 555 | 519 | 459 | 753 | 523 |
| C | 452 | 511 | 564 | 749 | 717 | 1002 | 803 | 597 | 568 | 494 | 749 | 586 |
| M | 2322 | 2034 | 2089 | 1799 | 1952 | 1333 | 997 | 910 | 1122 | 997 | 1200 | 1379 |
| F | 2761 | 2365 | 2530 | 2058 | 2349 | 1551 | 1136 | 1035 | 1225 | 1105 | 1426 | 1588 |
| C | 3081 | 2552 | 2656 | 2267 | 2501 | 1688 | 1210 | 1137 | 149 | 1345 | 1546 | 1719 |
All measurements in Hz.
M, males; F, females; C, children.
Mean and SD (in brackets) of error for AMR-NB compression at various bit-rates.
| Feature | Gender | 4.75 kbps | 5.15 kbps | 5.90 kbps | 6.70 kbps | 7.40 kbps | 7.95 kbps | 10.2 kbps | 12.2 kbps |
|---|---|---|---|---|---|---|---|---|---|
| | M | 0 (4) | 0 (4) | 0 (2) | 0 (4) | 0 (0) | 0 (5) | 0 (4) | 0 (4) |
| | F | 0 (4) | 0 (5) | 0 (6) | 0 (6) | 0 (4) | 0 (6) | 0 (5) | |
| | C | 0 (6) | 0 (4) | 0 (0) | | | | | |
| Jitter | M | | | | | | | | |
| | F | | | | | | | | |
| | C | | | | | | | | |
| Shimmer | M | | | | | | | | |
| | F | | | | | | | | |
| | C | | | | | | | | |
| HNR | M | | | | | | | | |
| | F | 5 (5) | 5 (5) | 4 (4) | 4 (4) | 4 (4) | 4 (4) | 4 (4) | 4 (4) |
| | C | 9 (8) | 8 (8) | 7 (7) | 7 (7) | 7 (7) | 7 (7) | 6 (7) | 6 (6) |
| | M | 16 (14) | 15 (14) | 13 (12) | 13 (13) | 13 (13) | 13 (12) | 11 (11) | 11 (11) |
| | F | | | | | | | | |
| | C | | | | | | | | |
| | M | | | | | | | | |
| | F | | | | | | | | |
| | C | | | | | | | | |
| MFCC1 | M | | | | | | | | |
| | F | | | | | | | | |
| | C | | | | | | | | |
| MFCC2 | M | | | | | | | | |
| | F | | | | | | | | |
| | C | | | | | | | | |
| MFCC3 | M | | | | | | | | |
| | F | | | | | | | | |
| | C | | | | | | | | |
Table elements in boldface represent metrics that showed a significant difference compared to metrics based on uncompressed audio files.
Gender: M, males; F, females; C, children.
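The mean and SD figures reported in the table can be outlined in code. This is a minimal sketch, not the authors' procedure: the source does not state how the error is defined, so a signed percentage error relative to the uncompressed feature value is assumed here. The function names `percent_error` and `summarize_errors` and the sample values are hypothetical.

```python
from statistics import mean, stdev


def percent_error(reference, compressed):
    """Signed percentage error of a feature extracted from compressed audio,
    relative to the same feature extracted from the uncompressed audio.
    ASSUMPTION: the source does not define its error metric."""
    if reference == 0:
        raise ValueError("reference feature value must be non-zero")
    return 100.0 * (compressed - reference) / reference


def summarize_errors(reference_values, compressed_values):
    """Return (mean, sample SD) of the per-recording percentage errors."""
    if len(reference_values) != len(compressed_values):
        raise ValueError("both sequences must have the same length")
    if len(reference_values) < 2:
        raise ValueError("at least two recordings are needed for an SD")
    errors = [
        percent_error(ref, comp)
        for ref, comp in zip(reference_values, compressed_values)
    ]
    return mean(errors), stdev(errors)


# Hypothetical feature values (for example, a frequency in Hz) for three
# recordings, before and after compression.
uncompressed = [100.0, 200.0, 400.0]
compressed = [101.0, 198.0, 400.0]
error_mean, error_sd = summarize_errors(uncompressed, compressed)
print(f"{error_mean:.0f} ({error_sd:.0f})")
```

A signed error lets positive and negative deviations cancel in the mean while still appearing in the SD, which would be consistent with table entries such as "0 (4)"; an absolute error would behave differently, so the choice should be checked against the full paper.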
Figure 2. Error for each speech feature when the audio signal is compressed using AMR-NB codec at 4.75 kbps.
Figure 4. Error for each speech feature when the audio signal is compressed using AMR-NB codec at 12.20 kbps.
Mean and SD (in brackets) of error for AMR-WB compression at various bit-rates.
| Feature | Gender | 6.60 kbps | 8.85 kbps | 12.65 kbps | 14.25 kbps | 15.85 kbps | 18.25 kbps | 19.85 kbps | 23.05 kbps | 23.85 kbps |
|---|---|---|---|---|---|---|---|---|---|---|
| | M | 0 (5) | 0 (4) | 0 (4) | 0 (4) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) |
| | F | 0 (4) | 0 (6) | 0 (6) | 0 (4) | 0 (6) | 0 (6) | | | |
| | C | 0 (2) | 0 (4) | 0 (4) | 0 (1) | 0 (4) | 0 (4) | 0 (0) | | |
| Jitter | M | | | | | | | | | |
| | F | | | | | | | | | |
| | C | | | | | | | | | |
| Shimmer | M | | | | | | | | | |
| | F | | | | | | | | | |
| | C | | | | | | | | | |
| HNR | M | 0 (2) | 0 (3) | | | | | | | |
| | F | 0 (1) | 0 (1) | 0 (1) | 0 (1) | 0 (1) | 0 (1) | 0 (1) | | |
| | C | 0 (1) | 0 (1) | 0 (1) | 0 (1) | 0 (1) | 0 (1) | 0 (1) | | |
| | M | 1 (4) | 1 (3) | 0 (3) | 0 (3) | 0 (3) | 0 (2) | 0 (3) | 0 (2) | 0 (2) |
| | F | 3 (5) | 3 (4) | 1 (2) | 1 (2) | 1 (2) | 0 (2) | 0 (2) | 0 (1) | 0 (2) |
| | C | 1 (4) | 1 (4) | 1 (4) | 0 (3) | 0 (3) | 0 (3) | 0 (2) | | |
| | M | 0 (4) | 0 (3) | | | | | | | |
| | F | 0 (4) | 1 (3) | 0 (2) | 0 (2) | 0 (1) | 0 (1) | 0 (1) | 0 (1) | 0 (1) |
| | C | 0 (2) | 0 (3) | 0 (2) | 0 (2) | 0 (2) | 0 (2) | 0 (2) | | |
| MFCC1 | M | 3 (7) | 2 (7) | 2 (7) | 2 (6) | 2 (6) | 1 (6) | 1 (6) | | |
| | F | | | | | | | | | |
| | C | | | | | | | | | |
| MFCC2 | M | 12 (21) | 10 (17) | 7 (14) | 6 (13) | 6 (14) | 5 (12) | 5 (12) | 5 (11) | 5 (11) |
| | F | | | | | | | | | |
| | C | 26 | | | | | | | | |
| MFCC3 | M | 12 (17) | 9 (15) | 8 (12) | 8 (12) | 7 (11) | 7 (11) | 6 (11) | 5 (10) | 5 (10) |
| | F | | | | | | | | | |
| | C | | | | | | | | | |
Table elements in boldface represent metrics that showed a significant difference compared to metrics based on uncompressed audio files.
Gender: M, males; F, females; C, children.
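Jitter and shimmer, two of the features listed in the tables, are commonly computed as cycle-to-cycle perturbation measures. The sketch below assumes the widely used "local" definitions (mean absolute difference between consecutive cycles, divided by the mean over all cycles). The source does not say which variant the authors used, and the function names and sample values are hypothetical.

```python
def _local_perturbation(values):
    """Mean absolute difference between consecutive values, divided by the
    mean of all values. Shared by the jitter and shimmer sketches below."""
    if len(values) < 2:
        raise ValueError("at least two cycles are required")
    mean_value = sum(values) / len(values)
    if mean_value == 0:
        raise ValueError("mean of the values must be non-zero")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / mean_value


def local_jitter(periods):
    """Local jitter from a sequence of glottal cycle durations (seconds).
    ASSUMPTION: the 'local' variant; the source does not specify one."""
    return _local_perturbation(periods)


def local_shimmer(peak_amplitudes):
    """Local shimmer from a sequence of per-cycle peak amplitudes.
    ASSUMPTION: the 'local' variant; the source does not specify one."""
    return _local_perturbation(peak_amplitudes)


# Hypothetical cycle durations and peak amplitudes for a short vowel segment.
periods = [0.0100, 0.0102, 0.0099, 0.0101]
amplitudes = [0.80, 0.82, 0.79, 0.81]
print(f"jitter:  {100 * local_jitter(periods):.2f} %")
print(f"shimmer: {100 * local_shimmer(amplitudes):.2f} %")
```

Both measures depend on where the cycle boundaries are detected, so a codec that perturbs the waveform shape can change them even when the average pitch is preserved.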
Figure 5. Error for each speech feature when the audio signal is compressed using AMR-WB codec at 12.65 kbps.
Figure 7. Error for each speech feature when the audio signal is compressed using AMR-WB codec at 23.85 kbps.
Figure 8. Jitter, shimmer, and HNR errors for each spoken vowel when the audio signal is compressed using AMR-WB codec at 23.85 kbps.
Figure 10. MFCC errors for each spoken vowel when the audio signal is compressed using AMR-WB codec at 23.85 kbps.
Vowel error in ascending order (left to right) sorted by mean and SD (in brackets).
| Feature | Vowel error in ascending order |
|---|---|
| | aw(aw) |
| Jitter | aw( |
| Shimmer | aw( |
| HNR | aw( |
| | aw( |
| MFCC1 | aw( |
| MFCC2 | aw( |
| MFCC3 | |