Fuming Chen, Sheng Li, Chuantao Li, Miao Liu, Zhao Li, Huijun Xue, Xijing Jing, Jianqi Wang.
Abstract
To improve non-contact speech acquisition, a 94 GHz millimeter wave (MMW) radar sensor was employed to detect speech signals. This novel non-contact speech acquisition method was shown to have high directional sensitivity and to be immune to strong acoustic disturbance. However, MMW radar speech is often degraded by combined sources of noise, mainly harmonic, electrical circuit and channel noise. In this paper, an algorithm combining empirical mode decomposition (EMD) and mutual information entropy (MIE) is proposed for enhancing the perceptibility and intelligibility of radar speech. First, the radar speech signal is adaptively decomposed by EMD into oscillatory components called intrinsic mode functions (IMFs). Second, MIE is used to determine the number of components used for reconstruction, and an adaptive threshold is then applied to remove the noise from the radar speech. The experimental results show that human speech can be effectively acquired by a 94 GHz MMW radar sensor at a detection distance of 20 m. Moreover, after enhancement by the proposed algorithm, the noise in the radar speech is greatly suppressed and the speech sounds more pleasant to human listeners, suggesting that this novel speech acquisition and enhancement method is a promising alternative for various applications associated with speech detection.
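The abstract does not say how the mutual information between components is estimated, nor which rule turns it into a component count. As a minimal sketch, assuming a simple equal-width histogram estimator (the function name `mutual_information` and the `bins` parameter are hypothetical, not taken from the paper), the quantity for two equal-length sequences such as adjacent IMFs could be computed as follows:

```python
import math
from collections import Counter


def mutual_information(x, y, bins=16):
    """Histogram estimate of I(X;Y) in bits for two equal-length sequences.

    Illustrative only: the estimator and the number of bins are assumptions,
    not the procedure used by the authors.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")

    def quantize(values):
        # Map each sample to one of `bins` equal-width bins over its range.
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # constant input: everything in bin 0
        return [min(int((v - lo) / width), bins - 1) for v in values]

    qx, qy = quantize(x), quantize(y)
    n = len(x)
    px, py, pxy = Counter(qx), Counter(qy), Counter(zip(qx, qy))

    # I(X;Y) = sum_ij p(i,j) * log2( p(i,j) / (p(i) * p(j)) )
    return sum(
        (c / n) * math.log2(c * n / (px[i] * py[j]))
        for (i, j), c in pxy.items()
    )
```

Under this sketch, two components that share little structure give a value near zero, while a component compared with itself gives its own (binned) entropy. How such values are used to choose the number of IMFs is left to the full paper.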
Keywords: 94 GHz MMW; empirical mode decomposition; mutual information entropy; radar speech; speech enhancement
Year: 2015 PMID: 26729126 PMCID: PMC4732083 DOI: 10.3390/s16010050
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Schematic diagram of the 94 GHz millimeter wave radar sensor.
Figure 2. The flow chart of the empirical mode decomposition algorithm.
Figure 3. (a) The original radar speech signal contaminated by white noise; (b) the decomposition of the original radar speech corrupted by white noise using EMD.
Figure 4. The time-domain waveforms and the spectrograms of the radar speech “1-2-3-4-5-6”. (a,e) are the original radar speech; (b,f) are enhanced speech obtained by the spectral subtraction; (c,g) are enhanced speech obtained by the wavelet shrinkage; (d,h) are enhanced speech obtained by the proposed algorithm.
Comparison of the results of averaged MOS with three types of noise at a SNR of 5 dB. The numbers in the brackets represent standard deviation for these mean opinion scores.
| Enhancement Algorithms | White | Pink | Babble |
|---|---|---|---|
| Spectral subtraction | 2.78 (0.30) | 2.98 (0.38) | 2.64 (0.35) |
| Wavelet shrinkage | 3.25 (0.46) | 3.37 (0.32) | 3.21 (0.27) |
| Proposed method | 3.59 (0.37) | 3.71 (0.35) | 3.56 (0.42) |
Comparison of the SNRs obtained by using three enhancement algorithms.
| Enhancement Algorithms | White, −5 | White, 0 | White, 5 | White, 10 | Pink, −5 | Pink, 0 | Pink, 5 | Pink, 10 | Babble, −5 | Babble, 0 | Babble, 5 | Babble, 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Spectral subtraction | 4.1 | 7.1 | 8.9 | 9.7 | 3.7 | 6.8 | 7.4 | 9.2 | 2.3 | 3.7 | 7.1 | 8.7 |
| Wavelet shrinkage | 4.6 | 7.6 | 10.2 | 12.3 | 4.1 | 7.2 | 8.6 | 12.1 | 2.7 | 5.6 | 7.3 | 11.9 |
| Proposed method | 5.2 | 7.5 | 10.9 | 14.9 | 4.8 | 7.3 | 10.2 | 13.7 | 3.9 | 6.7 | 10.1 | 12.3 |
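This record does not state how the SNR values in the table were computed. As a minimal sketch, assuming the common definition of output SNR as the ratio of reference signal energy to residual error energy in decibels (the function name `snr_db` and its arguments are hypothetical), the measure could be evaluated like this:

```python
import math


def snr_db(reference, processed):
    """Signal-to-noise ratio in dB: 10*log10(sum(s^2) / sum((s - p)^2)).

    `reference` is the clean signal and `processed` the noisy or enhanced one.
    This definition is an assumption; the paper may use a different variant.
    """
    if len(reference) != len(processed) or len(reference) == 0:
        raise ValueError("signals must be non-empty and of equal length")

    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - p) ** 2 for s, p in zip(reference, processed))

    if signal_energy == 0:
        raise ValueError("reference signal has zero energy")
    if error_energy == 0:
        return math.inf  # processed signal matches the reference exactly
    return 10.0 * math.log10(signal_energy / error_energy)
```

With this definition, a residual error whose energy is one hundredth of the signal energy corresponds to 20 dB, so larger table entries mean less residual noise after enhancement.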