Yan Zhang, Zhen-min Tang, Yan-ping Li, Yang Luo.
Abstract
Accurate and effective voice activity detection (VAD) is a fundamental step for robust speech or speaker recognition. In this study, we propose a hierarchical framework for VAD and speech enhancement. In the speech enhancement block, a modified Wiener filter (MWF) is used for noise reduction. In the feature selection and voting block, several discriminating features are combined in a voting paradigm, chosen for their reliability and discriminative power. The proposed approach is evaluated against other VAD techniques on two well-known databases, TIMIT and NOISEX-92. Experimental results show that the proposed method performs well under a variety of noisy conditions.
Year: 2014 PMID: 24959621 PMCID: PMC4052886 DOI: 10.1155/2014/723643
Source DB: PubMed Journal: ScientificWorldJournal ISSN: 1537-744X
Figure 1. Hierarchical framework of the proposed system.
Figure 2. Speech spectrograms for buccaneer noise at 10 dB: (a) clean signal, (b) noisy signal, (c) signal enhanced by the Wiener filter, and (d) signal enhanced by the modified Wiener filter.
Figure 3. Segmental SNR measures of the noisy signal, the Wiener filter, and the modified Wiener filter.
Figure 4. Clean/noisy signal and its corresponding short-term energy and spectral entropy.
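The two frame-level features named in the Figure 4 caption, short-term energy and spectral entropy, can be illustrated with a minimal sketch. This record does not give the paper's exact definitions, frame length, window, or thresholds, so everything below is an assumption: the helper names (`frame_signal`, `short_term_energy`, `spectral_entropy`), the 256-sample frames with 50% overlap, the Hann window, and the synthetic test signal are all hypothetical and use common textbook definitions rather than the authors' implementation.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (trailing samples are dropped)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def short_term_energy(frames):
    """Mean squared amplitude of each frame."""
    return np.mean(frames ** 2, axis=1)

def spectral_entropy(frames, eps=1e-12):
    """Shannon entropy (in bits) of each frame's normalized power spectrum."""
    window = np.hanning(frames.shape[1])
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2
    prob = power / (power.sum(axis=1, keepdims=True) + eps)
    return -np.sum(prob * np.log2(prob + eps), axis=1)

# Synthetic stand-in for speech (assumption): 0.5 s of white noise alone,
# then 0.5 s of the same noise plus a 440 Hz tone.
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs
noise = 0.1 * rng.standard_normal(fs)
tone = np.where(t >= 0.5, np.sin(2 * np.pi * 440 * t), 0.0)

frames = frame_signal(noise + tone, frame_len=256, hop=128)
energy = short_term_energy(frames)
entropy = spectral_entropy(frames)
```

In this toy signal the tonal half has higher energy and lower spectral entropy than the noise-only half, which is the kind of contrast a threshold or voting rule could exploit. Real speech in low-SNR noise is far less separable, so the values here say nothing about the hit rates reported for the proposed method.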
Average speech/nonspeech hit rates under different noise conditions at 5 dB.
| VAD | White noise | Buccaneer noise | Babble noise | Factory noise | Average |
|---|---|---|---|---|---|
| G.729 | 80.4% | 49.4% | 51.6% | 57.1% | 59.6% |
| Amr2 | 75.5% | 63.3% | 64.2% | 64.5% | 66.9% |
| Proposed | 87.9% | 69.7% | 70.3% | 73.2% | 75.3% |
Accuracy of the proposed VAD algorithm at different SNR levels.
| Noise | 0 dB | 5 dB | 10 dB | 15 dB | 20 dB |
|---|---|---|---|---|---|
| White noise | 53.3% | 61.7% | 67.1% | 94.5% | 97.8% |
| Buccaneer noise | 67.6% | 75.4% | 84.1% | 90.8% | 96.5% |
| Babble noise | 84.2% | 88.3% | 93.2% | 94.7% | 99.1% |
| Factory noise | 61.9% | 69.5% | 82.9% | 97.5% | 98.0% |
| Average | 66.8% | 73.7% | 81.8% | 94.4% | 97.8% |