Tao Tao, Hong Zheng, Jianfeng Yang, Zhongyuan Guo, Yiyang Zhang, Jiahui Ao, Yuao Chen, Weiting Lin, Xiao Tan.
Abstract
To reduce the complexity and cost of the microphone array, this paper proposes a dual-microphone sound localization and speech enhancement algorithm. Starting from time delay estimation on the signals received by the two microphones, the paper combines energy difference estimation and controllable beam response power to compute the 3D coordinates of the acoustic source and thereby achieve dual-microphone sound localization. The speaker signal is then separated using the azimuth angle of the acoustic source and an analysis of the independent quantity of the speech signal. On this basis, post-Wiener filtering is used to amplify and suppress the voice signal of the speaker, which helps to achieve speech enhancement. Experimental results show that the proposed dual-microphone sound localization algorithm can accurately identify the sound location, and that the speech enhancement algorithm is more robust and adaptable than the original algorithm.
Keywords: dual-microphone array; post-filtering; sound localization; speech enhancement; time delay estimation
Year: 2022 PMID: 35161469 PMCID: PMC8840739 DOI: 10.3390/s22030715
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
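The abstract builds localization on time delay estimation between the two microphones. As a rough illustration only, and not the authors' implementation, the sketch below estimates that delay with GCC-PHAT (a common estimator, assumed here) and converts it to an azimuth under a far-field model. The function names, the 343 m/s speed of sound, the microphone spacing, and the sample rate are all assumptions made for the example.

```python
import numpy as np


def gcc_phat_delay(x1, x2, fs, max_tau=None):
    """Estimate the delay of x2 relative to x1 in seconds (positive if x2 lags).

    Uses generalized cross-correlation with PHAT weighting (assumed technique).
    """
    n = len(x1) + len(x2)                  # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)               # cross-power spectrum
    cross /= np.abs(cross) + 1e-12         # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)          # correlation; index k is a lag of k samples

    max_shift = n // 2
    if max_tau is not None:
        # Restrict the search to physically possible lags.
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so the array covers lags -max_shift .. +max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(cc)) - max_shift
    return shift / fs


def azimuth_from_delay(tau, mic_distance, c=343.0):
    """Far-field azimuth in degrees from broadside: sin(theta) = c * tau / d."""
    s = np.clip(c * tau / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))


if __name__ == "__main__":
    fs = 16000             # sample rate in Hz (assumed)
    d = 0.10               # microphone spacing in metres (assumed)
    rng = np.random.default_rng(0)
    x1 = rng.standard_normal(4096)
    x2 = np.concatenate((np.zeros(2), x1[:-2]))   # x2 lags x1 by 2 samples
    tau = gcc_phat_delay(x1, x2, fs, max_tau=d / 343.0)
    print(tau, azimuth_from_delay(tau, d))
```

A single delay from two microphones constrains only the direction relative to the microphone axis, which is consistent with the abstract adding energy difference estimation and beam response power to obtain full 3D coordinates.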
Figure 1. Principle of speaker positioning.
Figure 2. Dual-microphone positioning model.
Figure 3. Multi-microphone positioning model.
Figure 4. Sound localization algorithm based on dual-microphone.
Figure 5. Adaptive filter flow.
Figure 6. Adaptive speech enhancement flow.
Figure 7. Allwinner R328 microphone array physical picture: (a) the front of Allwinner R328 microphone array; (b) the back of Allwinner R328 microphone array.
Figure 8. Results of the localization experiment: (a) comparison of azimuth error under no noise; (b) comparison of azimuth error under I-noise; (c) comparison of azimuth error under II-noise; (d) comparison of azimuth error under III-noise; (e) distance error under noise for the algorithm in this paper.
The comparison of the PESQ value.
| Noise Type | SNR | Noisy Speech | GCC-Enhanced Speech | AGSC-Enhanced Speech | Speech Enhanced by the Algorithm in This Paper |
|---|---|---|---|---|---|
| babble | −5 dB | 1.38 | 1.35 | 1.53 | 1.74 |
| babble | 0 dB | 1.69 | 1.55 | 1.59 | 1.85 |
| babble | 5 dB | 2.12 | 1.97 | 1.98 | 2.26 |
| street | −5 dB | 1.23 | 1.15 | 1.28 | 1.55 |
| street | 0 dB | 1.72 | 1.61 | 1.80 | 1.96 |
| street | 5 dB | 2.21 | 2.16 | 2.27 | 2.29 |
| car | −5 dB | 1.48 | 1.33 | 1.46 | 1.77 |
| car | 0 dB | 1.89 | 1.67 | 1.92 | 1.91 |
| car | 5 dB | 2.45 | 2.44 | 2.43 | 2.63 |
| train | −5 dB | 1.27 | 1.30 | 1.28 | 1.46 |
| train | 0 dB | 1.56 | 1.67 | 1.66 | 1.91 |
| train | 5 dB | 2.17 | 2.18 | 2.15 | 2.52 |
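The per-condition PESQ scores above can be summarized with a few lines of arithmetic. The sketch below hard-codes the values from the table, then computes each column's mean and lists the conditions where the algorithm in this paper does not score highest. The dictionary layout, column labels, and function names are illustrative assumptions, not part of the paper.

```python
from statistics import mean

# Column order follows the table: noisy input, GCC, AGSC, algorithm in this paper.
COLUMNS = ("noisy", "gcc", "agsc", "proposed")

# Keys are (noise type, SNR in dB); values are PESQ scores copied from the table.
PESQ = {
    ("babble", -5): (1.38, 1.35, 1.53, 1.74),
    ("babble", 0): (1.69, 1.55, 1.59, 1.85),
    ("babble", 5): (2.12, 1.97, 1.98, 2.26),
    ("street", -5): (1.23, 1.15, 1.28, 1.55),
    ("street", 0): (1.72, 1.61, 1.80, 1.96),
    ("street", 5): (2.21, 2.16, 2.27, 2.29),
    ("car", -5): (1.48, 1.33, 1.46, 1.77),
    ("car", 0): (1.89, 1.67, 1.92, 1.91),
    ("car", 5): (2.45, 2.44, 2.43, 2.63),
    ("train", -5): (1.27, 1.30, 1.28, 1.46),
    ("train", 0): (1.56, 1.67, 1.66, 1.91),
    ("train", 5): (2.17, 2.18, 2.15, 2.52),
}


def column_means(table):
    """Mean PESQ of each column across all noise conditions."""
    return {
        name: mean(row[i] for row in table.values())
        for i, name in enumerate(COLUMNS)
    }


def conditions_not_won(table, column="proposed"):
    """Conditions where `column` is not the highest score in its row."""
    idx = COLUMNS.index(column)
    return [key for key, row in table.items() if row[idx] < max(row)]


if __name__ == "__main__":
    for name, value in column_means(PESQ).items():
        print(f"{name}: {value:.3f}")
    print(conditions_not_won(PESQ))
```

On these values the algorithm in this paper averages about 1.99, against about 1.76 for the unprocessed noisy speech. The only condition where it is not the top score is car noise at 0 dB (1.91 versus 1.92 for AGSC).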
Figure 9. The results of the speech enhancement comparison experiment in high noise: (a) Original speech wave; (b) Original speech spectrogram; (c) GCC-enhanced speech wave; (d) GCC-enhanced speech spectrogram; (e) AGSC-enhanced speech wave; (f) AGSC-enhanced speech spectrogram; (g) the speech wave enhanced by the algorithm in this paper; (h) the speech spectrogram enhanced by the algorithm in this paper.
Figure 10. The results of the speech enhancement comparison experiment in low noise: (a) Original speech wave; (b) Original speech spectrogram; (c) GCC-enhanced speech wave; (d) GCC-enhanced speech spectrogram; (e) AGSC-enhanced speech wave; (f) AGSC-enhanced speech spectrogram; (g) the speech wave enhanced by the algorithm in this paper; (h) the speech spectrogram enhanced by the algorithm in this paper.
The correct rate of speech recognition.
| Type | Test Environment | Correct Rate (%) |
|---|---|---|
| Original speech file | low-noise | 67.31 |
| Original speech file | high-noise | 57.69 |
| GCC-enhanced speech file | low-noise | 80.77 |
| GCC-enhanced speech file | high-noise | 73.77 |
| AGSC-enhanced speech file | low-noise | 90.38 |
| AGSC-enhanced speech file | high-noise | 78.85 |
| Speech file enhanced by the algorithm in this paper | low-noise | 100 |
| Speech file enhanced by the algorithm in this paper | high-noise | 98.77 |