Venkata Subbaiah Putta1, A Selwin Mich Priyadharson1, Venkatesa Prabhu Sundramurthy2.
Abstract
A bone-conducted microphone (BCM) converts vibrations of the skull bones during speech into an electrical audio signal. Because BCMs capture speech from the vibrations of the speaker's skull, they are more resistant to noise than standard air-conduction microphones (ACMs). BCMs have a different frequency response than ACMs because they capture only the low-frequency portion of speech signals. Replacing an ACM with a BCM may therefore give satisfactory noise suppression, but speech quality and intelligibility may suffer due to the nature of solid vibration. The mismatch between BCM and ACM characteristics can also affect automatic speech recognition (ASR) performance, and it is impossible to recreate a new ASR system using voice data from BCMs. The intelligibility of a BCM-conducted speech signal depends on the bone location used to acquire the signal and on how accurately the phonemes of words are modeled. Deep learning techniques such as neural networks have traditionally been used for speech recognition. However, neural networks have a high computational cost and are unable to model phonemes in signals. In this paper, the intelligibility of BCM speech signals was evaluated for different bone locations, namely the right ramus, larynx, and right mastoid. BCM signals were acquired for Tamil words, and speech intelligibility was evaluated by listeners and by deep learning architectures, namely CapsuleNet, UNet, and S-Net. As validated by both the listeners and the deep learning architectures, the larynx location improves speech intelligibility.
Year: 2022 PMID: 36059405 PMCID: PMC9436543 DOI: 10.1155/2022/4473952
Source DB: PubMed Journal: Comput Intell Neurosci
Related works.
| Reference | Sensor | Problem | Method |
|---|---|---|---|
| [ | Stethoscope and acoustic sensor | Lombard reflex on nonaudible murmur recognition in the presence of noise | Evaluation of nonaudible murmur microphone robustness with real and simulated noisy data |
| [ | Softband bone-conducted hearing device | Analyze auditory and speech development of children affected by bilateral microtia | Speech development of the children is assessed with a meaningful auditory integration scale and a speech intelligibility rating |
| [ | Throat and acoustic microphones | Improve throat and acoustic microphone speech recognition | Throat and acoustic microphone signals are correlated to extract an acoustic feature vector for speech recognition |
| [ | Baha Attract bone hearing system | Speech recognition with a wireless Bluetooth device in patients using a Baha Attract bone hearing system and a traditional hearing aid | Speech perception and recognition of Korean sentences were tested in quiet and noisy conditions |
| [ | Bonebridge™ MED-EL | Speech recognition performance comparison of the semi-implanted Bonebridge MED-EL and an adhesive bone-conduction device | Free-field audiometry was conducted with speech and noise presented through a loudspeaker |
| [ | Air- and bone-conduction microphones | Evaluate the quality of the enhanced speech signal | Equalised bone-conducted speech is produced by a maximum likelihood estimator and a bone-conducted estimator for high and low SNR conditions, respectively. The quality of the equalised bone-conducted speech is evaluated with Wiener gain and an a priori SNR estimator |
| [ | Bone-conducted microphone | Nonstationary noise suppression in speech signals | Suppress noise in the speech signal by selecting a speech codebook based on the noise-free bone-conducted microphone reference signal |
| [ | Bone-conducted microphone | Low-frequency noise suppression | Suppress low-frequency noise, namely coloured noise, multi-talker babble, and car noise, in the speech signal using bone-conducted speech. The noisy low-frequency band of the air-conducted speech is replaced with bone-conducted speech |
| Proposed | MEMS acoustic vibration transducer | Tamil word recognition | One-, two-, and three-syllable Tamil words are recognized with CapsuleNet, UNet, and S-Net |
Figure 1. Overview of speech signal processing.
Signal parameters of different Tamil words.
| Word | Sigma | Mu | Crest factor | Dynamic range (dB) | Autocorrelation time (sec) |
|---|---|---|---|---|---|
| Amam | 0.055079 | −0.16242 | 15.3147 | 79.7327 | 3.0891 |
| Vena | 0.047551 | −0.20297 | 13.6193 | 72.0075 | 3.0784 |
| Iruku | 0.043221 | −0.24713 | 12.0106 | 66.6918 | 3.0838 |
| Illa | 0.042194 | −0.14086 | 16.6509 | 75.1304 | 3.0865 |
| Enna | 0.050101 | −0.10114 | 18.9488 | 83.7131 | 3.0449 |
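The parameters in the table above can be estimated from a sampled waveform. The following is a minimal sketch only, not the authors' procedure: it assumes that sigma and mu denote the standard deviation and mean of the samples, that the crest factor is peak amplitude over RMS, that the dynamic range is the ratio of the largest to the smallest nonzero sample magnitude in dB, and that the autocorrelation time is the first lag at which the normalized autocorrelation falls below 1/e. The function name `signal_parameters` and all of these definitions are illustrative assumptions, since the source does not state how the values were computed.

```python
import math


def signal_parameters(x, fs, threshold=1.0 / math.e):
    """Estimate basic statistics of a sampled signal.

    x: list of float samples; fs: sampling rate in Hz.
    All definitions below are assumptions made for illustration.
    """
    n = len(x)
    if n == 0 or all(v == 0.0 for v in x):
        raise ValueError("signal must contain at least one nonzero sample")

    mu = sum(x) / n                                        # mean
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / n)   # population std
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs(v) for v in x)
    crest_factor = peak / rms                              # peak over RMS

    # Assumed definition: largest over smallest nonzero magnitude, in dB.
    nonzero = [abs(v) for v in x if v != 0.0]
    dynamic_range_db = 20.0 * math.log10(max(nonzero) / min(nonzero))

    # Assumed definition: first lag where the normalized autocorrelation
    # of the mean-removed signal drops below the threshold (default 1/e).
    centered = [v - mu for v in x]
    energy = sum(c * c for c in centered)
    autocorr_time_s = None
    if energy > 0.0:
        for lag in range(1, n):
            r = sum(centered[k] * centered[k + lag] for k in range(n - lag)) / energy
            if r < threshold:
                autocorr_time_s = lag / fs
                break

    return {
        "mu": mu,
        "sigma": sigma,
        "crest_factor": crest_factor,
        "dynamic_range_db": dynamic_range_db,
        "autocorr_time_s": autocorr_time_s,
    }
```

The autocorrelation loop is quadratic in the number of samples, so for full-length recordings an FFT-based autocorrelation would be the more practical choice.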
Figure 2. Bone location in skull and throat.
Figure 3. “Amam” speech signal waveform, spectrogram, magnitude response, amplitude spectrum, autocorrelation, and probability distribution. (a) “Amam” speech signal. (b) “Amam” signal spectrogram. (c) “Amam” signal magnitude response. (d) “Amam” signal amplitude spectrum. (e) “Amam” signal autocorrelation time. (f) “Amam” signal probability distribution.
Figure 4. “Vena” speech signal waveform, spectrogram, magnitude response, amplitude spectrum, autocorrelation, and probability distribution in Table 2. (a) “Vena” speech signal. (b) “Vena” signal spectrogram. (c) “Vena” signal magnitude response. (d) “Vena” signal amplitude spectrum. (e) “Vena” signal autocorrelation time. (f) “Vena” signal probability distribution.
Figure 5. CapsuleNet architecture.
Figure 6. CapsuleNet speech recognition. (a) “Amam” query speech signal. (b) Recognised speech signal.
Figure 7. UNet architecture.
Figure 8. UNet speech recognition. (a) “Vena” query speech signal. (b) Recognised speech signal.
Figure 9. S-Net architecture.
Figure 10. S-Net speech recognition. (a) Query speech signal. (b) Recognised speech signal.
Voice and BCM signal correlation with LSSVM, SVM, and SVR.
| Tamil word (syllables) | S-Net (%) | UNet (%) | CapsuleNet (%) | Correlation algorithm |
|---|---|---|---|---|
| Amam (2 syllables) | 83.29 | 85.62 | 87.52 | LSSVM |
| Vena (2 syllables) | 81.56 | 84.58 | 88.15 | LSSVM |
| Iruku (2 syllables) | 82.69 | 87.56 | 92.15 | LSSVM |
| Illa (2 syllables) | 83.59 | 87.91 | 93.45 | LSSVM |
| Enna (2 syllables) | 83.15 | 88.15 | 91.25 | LSSVM |
| Engae (2 syllables) | 82.18 | 88.94 | 93.58 | LSSVM |
| Naan (2 syllables) | 83.29 | 89.18 | 94.59 | LSSVM |
| Va (1 syllable) | 92.6 | 93.25 | 96.12 | SVM |
| Engae va (3 syllables) | 91.2 | 92.89 | 97.25 | SVR |
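The table reports correlation percentages obtained with LSSVM, SVM, and SVR, but the record does not describe how those scores are computed. As a much simpler stand-in, and explicitly not the method behind the table, the sketch below expresses the Pearson correlation between two equally sampled signals as a percentage. The function name `correlation_percent` and the truncation to the shorter signal are assumptions made for illustration.

```python
import math


def correlation_percent(a, b):
    """Pearson correlation between two sample sequences, as a percentage.

    Illustrative stand-in only; the source uses LSSVM, SVM, and SVR.
    """
    n = min(len(a), len(b))          # assumption: compare the overlapping part
    if n < 2:
        raise ValueError("need at least two samples in each signal")
    a, b = a[:n], b[:n]

    mean_a = sum(a) / n
    mean_b = sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    if den == 0.0:
        raise ValueError("correlation is undefined for a constant signal")
    return 100.0 * num / den
```

In practice the voice and BCM recordings would first need to be time-aligned and resampled to a common rate, which this sketch does not attempt.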