Abstract
In this paper, a feature extraction method for robust speech recognition in noisy environments is proposed. The method is motivated by a biologically inspired auditory model that simulates outer/middle ear filtering with a low-pass filter and the spectral behaviour of the cochlea with the Gammachirp auditory filterbank (GcFB). Its speech recognition performance is tested on speech signals corrupted by real-world noises. The evaluation results show that the proposed method yields higher recognition rates than classic techniques such as Perceptual Linear Prediction (PLP), Linear Predictive Coding (LPC), Linear Prediction Cepstral Coefficients (LPCC) and Mel Frequency Cepstral Coefficients (MFCC). The recognition system used is based on Hidden Markov Models with continuous Gaussian Mixture densities (HMM-GM).
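The abstract says only that the outer/middle ear is simulated by a low-pass filter; its order and cutoff are not given in this record. As a minimal sketch, assuming a first-order (one-pole) IIR low-pass with a hypothetical 4 kHz cutoff at 16 kHz sampling, the stage might look like this (the function name `one_pole_lowpass` is illustrative, not the paper's):

```python
import math

def one_pole_lowpass(x, fs, fc):
    """First-order IIR low-pass: y[n] = (1 - p) * x[n] + p * y[n - 1].

    The pole p = exp(-2*pi*fc/fs) is the impulse-invariant mapping of an analog
    RC low-pass with cutoff fc. This is an assumed stand-in for the paper's
    outer/middle ear filter, whose design is not given here.
    """
    p = math.exp(-2.0 * math.pi * fc / fs)
    y, prev = [], 0.0
    for sample in x:
        prev = (1.0 - p) * sample + p * prev
        y.append(prev)
    return y

# Hypothetical settings: 16 kHz sampling and a 4 kHz cutoff.
fs, fc = 16000, 4000.0
step = one_pole_lowpass([1.0] * 200, fs, fc)                      # DC passes with unit gain
alt = one_pole_lowpass([(-1.0) ** n for n in range(200)], fs, fc)  # tone at Nyquist (8 kHz)
print(round(step[-1], 3), round(max(abs(v) for v in alt[100:]), 3))
```

With these settings the Nyquist-frequency tone settles at an amplitude of (1 - p)/(1 + p), roughly 0.66, while DC passes unchanged; a steeper or band-shaped response would need a higher-order design.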
Keywords: Auditory filter model; Feature extraction; Hidden Markov Models; Noisy speech recognition
Year: 2014 PMID: 25485194 PMCID: PMC4230714 DOI: 10.1186/2193-1801-3-651
Source DB: PubMed Journal: Springerplus ISSN: 2193-1801
Figure 1. Automatic speech recognition system.
Figure 2. A simple five-state Markov model (Young et al. 2009).
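If the five-state model follows the HTK convention, the first and last states are non-emitting, so only the three middle states emit observations, each through a Gaussian mixture density in an HMM-GM system. The sketch below scores a feature sequence with the forward algorithm in the log domain. The transition values, the 1-D features and the two mixtures per state are toy assumptions (the tables' HMM-4-GM label presumably denotes four mixtures per state), and all function names are hypothetical.

```python
import math

def logsumexp(vals):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    top = max(vals)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in vals))

def log_gmm(x, weights, means, variances):
    """Log density of feature vector x under a diagonal-covariance Gaussian mixture."""
    comps = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, m, v in zip(x, mu, var):
            ll -= 0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        comps.append(ll)
    return logsumexp(comps)

def safe_log(p):
    return math.log(p) if p > 0.0 else -math.inf

def log_forward(obs, trans, gmms):
    """Forward-algorithm log-likelihood of obs (a list of feature vectors).

    trans: N x N matrix; state 0 is the non-emitting entry state and state N-1
    the non-emitting exit state. gmms[j] = (weights, means, variances) for
    each emitting state j in 1..N-2.
    """
    n = len(trans)
    emitting = range(1, n - 1)
    # First frame: jump out of the entry state and emit.
    alpha = {j: safe_log(trans[0][j]) + log_gmm(obs[0], *gmms[j]) for j in emitting}
    for x in obs[1:]:
        alpha = {
            j: logsumexp([alpha[i] + safe_log(trans[i][j]) for i in emitting])
            + log_gmm(x, *gmms[j])
            for j in emitting
        }
    # Final step: move from an emitting state into the exit state.
    return logsumexp([alpha[i] + safe_log(trans[i][n - 1]) for i in emitting])

# Toy left-to-right five-state model with 1-D features and 2 mixtures per state.
trans = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.6, 0.4, 0.0, 0.0],
    [0.0, 0.0, 0.6, 0.4, 0.0],
    [0.0, 0.0, 0.0, 0.7, 0.3],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
gmms = {j: ([0.5, 0.5], [[float(j)], [j + 0.5]], [[1.0], [1.0]]) for j in (1, 2, 3)}
ll = log_forward([[1.0], [1.2], [2.1], [3.0]], trans, gmms)
print(ll)
```

Because this topology is strictly left-to-right, a sequence shorter than three frames cannot reach the exit state and scores minus infinity.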
Figure 3. Block diagram of the PLP technique (Hermansky 1990).
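Hermansky's PLP applies three perceptual steps before all-pole modelling: critical-band (Bark) warping, equal-loudness pre-emphasis and cube-root intensity-to-loudness compression. The sketch below transcribes those three formulas from Hermansky (1990) as functions of frequency in Hz; the function names are illustrative, and the critical-band integration and the all-pole fit are omitted.

```python
import math

def hz_to_bark(f_hz):
    """Bark warping: 6 ln(w/1200pi + sqrt((w/1200pi)^2 + 1)), with w = 2 pi f."""
    r = f_hz / 600.0  # equals w / (1200 pi) for w = 2 pi f
    return 6.0 * math.log(r + math.sqrt(r * r + 1.0))

def equal_loudness(f_hz):
    """Equal-loudness weight E(w) = (w^2 + 56.8e6) w^4 / ((w^2 + 6.3e6)^2 (w^2 + 0.38e9))."""
    w2 = (2.0 * math.pi * f_hz) ** 2
    return (w2 + 56.8e6) * w2 ** 2 / ((w2 + 6.3e6) ** 2 * (w2 + 0.38e9))

def intensity_to_loudness(power):
    """Cubic-root compression approximating the intensity-loudness power law."""
    return power ** 0.33

for f in (500.0, 1000.0, 4000.0):
    print(f, round(hz_to_bark(f), 2), round(equal_loudness(f), 3))
```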
Figure 4. The top panel shows a 25 ms waveform segment of the word “Water” (sampling frequency = 16 kHz). The bottom panel shows the simulated basilar membrane motion (BMM) for this segment.
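A 25 ms segment at 16 kHz contains 0.025 × 16000 = 400 samples. A minimal framing sketch, assuming a Hamming window and a 10 ms hop (neither is specified in this record), shows how such segments are cut from a signal; `frame_signal` is a hypothetical helper.

```python
import math

def frame_signal(x, fs, frame_ms=25.0, hop_ms=10.0):
    """Split x into overlapping Hamming-windowed frames.

    frame_ms matches the 25 ms segment in Figure 4; hop_ms = 10 ms is an
    assumed, commonly used value.
    """
    flen = int(round(fs * frame_ms / 1000.0))  # 16 kHz * 25 ms = 400 samples
    hop = int(round(fs * hop_ms / 1000.0))     # 16 kHz * 10 ms = 160 samples
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (flen - 1)) for n in range(flen)]
    frames = []
    for start in range(0, len(x) - flen + 1, hop):
        frames.append([s * w for s, w in zip(x[start:start + flen], window)])
    return frames

fs = 16000
frames = frame_signal([1.0] * fs, fs)  # one second of a constant signal
print(len(frames), len(frames[0]))     # → 98 400
```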
Figure 5. Block diagram of the proposed Perceptual Linear Predictive auditory Gammachirp (PLPaGc) method.
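Like PLP, a linear predictive front end ends by fitting an all-pole model to an autocorrelation sequence and, for cepstral features such as LPCC, converting the predictor coefficients to cepstra. This record does not spell out PLPaGc's final stage, so the sketch below shows only the textbook Levinson-Durbin recursion and the standard LPC-to-cepstrum recursion, under the predictor convention x[n] ≈ Σ a_k x[n-k]; function names are illustrative.

```python
def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations for predictor coefficients.

    Returns (a, err) where x[n] is predicted as sum(a[k-1] * x[n-k], k=1..order)
    and err is the final prediction-error power.
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coefficient
        new = a[:]
        new[i] = k
        for j in range(1, i):
            new[j] = a[j] - k * a[i - j]
        a = new
        err *= 1.0 - k * k
    return a[1:], err

def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum of the all-pole model 1 / (1 - sum a_k z^-k), coefficients c_1..c_n_ceps."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]

# Autocorrelation of a unit-variance AR(1) process with coefficient 0.5.
r = [0.5 ** k for k in range(3)]
a, err = levinson_durbin(r, 2)
ceps = lpc_to_cepstrum(a, 4)
print(a, err, ceps)
```

For this AR(1) input the recursion recovers a = [0.5, 0.0] with error power 0.75, and the cepstrum follows the closed form c_n = 0.5^n / n.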
Figure 6. Time-domain waveforms and spectrograms of the noises used.
Gammachirp parameters used

| Parameter | Value |
|---|---|
| n | 4 |
| a | 1 |
| b | 1.019 |
| c | 2 |
| φ | 0 |
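The table's values can be plugged into the commonly used gammachirp impulse response g(t) = a t^(n-1) exp(-2πb ERB(f_c) t) cos(2πf_c t + c ln t + φ). The sketch assumes the final row (value 0) is the phase φ, uses the Glasberg and Moore (1990) formula ERB(f) = 24.7 + 0.108 f, and spaces centre frequencies uniformly on the ERB-rate scale. The channel count, frequency range and function names are hypothetical choices, not the paper's settings.

```python
import math

def erb(f_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore 1990)."""
    return 24.7 + 0.108 * f_hz

def erb_space(f_lo, f_hi, num):
    """num centre frequencies equally spaced on the ERB-rate scale."""
    rate = lambda f: 21.4 * math.log10(0.00437 * f + 1.0)
    inv = lambda e: (10.0 ** (e / 21.4) - 1.0) / 0.00437
    lo, hi = rate(f_lo), rate(f_hi)
    return [inv(lo + k * (hi - lo) / (num - 1)) for k in range(num)]

def gammachirp(fc, fs, duration=0.025, n=4, a=1.0, b=1.019, c=2.0, phi=0.0):
    """Sampled gammachirp impulse response centred on fc (Hz), using the table's parameters."""
    out = []
    for i in range(int(round(duration * fs))):
        t = i / fs
        if t == 0.0:
            out.append(0.0)  # t^(n-1) -> 0 for n > 1; ln t is undefined at t = 0
            continue
        env = a * t ** (n - 1) * math.exp(-2.0 * math.pi * b * erb(fc) * t)
        out.append(env * math.cos(2.0 * math.pi * fc * t + c * math.log(t) + phi))
    return out

centres = erb_space(100.0, 7000.0, 32)  # hypothetical 32-channel bank
ir = gammachirp(1000.0, 16000)          # one 25 ms impulse response at 16 kHz
print([round(f) for f in centres[:4]], max(abs(v) for v in ir))
```

Filtering a signal with each channel's impulse response (for example by direct convolution) would then give a GcFB output of the kind pictured as BMM in Figure 4.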
Recognition rate (%) obtained by proposed and standard methods with suburban train noise (HMM-4-GM)

| SNR level | PLPaGc | PLP | LPCC | LPC | MFCC |
|---|---|---|---|---|---|
| 0 dB | 38.55 | 27.77 | 21.79 | 11.86 | 26.95 |
| 5 dB | 65.59 | 50.16 | 40.48 | 13.62 | 49.42 |
| 10 dB | 84.71 | 72.74 | 60.96 | 18.47 | 71.66 |
| 15 dB | 92.74 | 85.82 | 77.90 | 28.96 | 86.30 |
| 20 dB | 95.77 | 91.72 | 87.06 | 41.96 | 92.60 |
| Average | 75.47 | 65.64 | 57.64 | 22.97 | 65.39 |
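The tables report recognition rates at SNRs from 0 to 20 dB. The mixing procedure is not described in this record; a common approach, assumed here, scales the noise so that the ratio of average speech power to average noise power equals the target SNR. A synthetic tone and Gaussian noise stand in for real speech and the recorded noises, and the helper names are hypothetical.

```python
import math
import random

def add_noise_at_snr(speech, noise, snr_db):
    """Scale noise so that 10*log10(P_speech / P_noise) equals snr_db, then add it."""
    n = len(speech)
    if len(noise) < n:
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:n]
    p_s = sum(s * s for s in speech) / n
    p_n = sum(v * v for v in noise) / n
    gain = math.sqrt(p_s / (p_n * 10.0 ** (snr_db / 10.0)))
    return [s + gain * v for s, v in zip(speech, noise)]

def snr_db(speech, noisy):
    """Measured SNR in dB, treating noisy - speech as the noise."""
    resid = [y - s for s, y in zip(speech, noisy)]
    return 10.0 * math.log10(sum(s * s for s in speech) / sum(r * r for r in resid))

random.seed(0)
speech = [math.sin(2.0 * math.pi * 440.0 * i / 16000) for i in range(16000)]
noise = [random.gauss(0.0, 1.0) for _ in range(16000)]
for level in (0, 5, 10, 15, 20):
    print(level, round(snr_db(speech, add_noise_at_snr(speech, noise, level)), 2))
```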
Recognition rate (%) obtained by proposed and standard methods with exhibition hall noise (HMM-4-GM)

| SNR level | PLPaGc | PLP | LPCC | LPC | MFCC |
|---|---|---|---|---|---|
| 0 dB | 37.53 | 26.67 | 18.33 | 8.31 | 26.04 |
| 5 dB | 61.36 | 48.31 | 39.06 | 14.67 | 47.18 |
| 10 dB | 81.73 | 69.30 | 60.54 | 20.65 | 68.74 |
| 15 dB | 90.58 | 84.17 | 77.99 | 29.87 | 84.09 |
| 20 dB | 95.74 | 91.40 | 86.92 | 40.00 | 92.14 |
| Average | 73.39 | 63.97 | 56.57 | 22.70 | 63.64 |
Recognition rate (%) obtained by proposed and standard methods with street noise (HMM-4-GM)

| SNR level | PLPaGc | PLP | LPCC | LPC | MFCC |
|---|---|---|---|---|---|
| 0 dB | 39.86 | 32.03 | 25.13 | 10.52 | 30.64 |
| 5 dB | 65.90 | 51.60 | 41.73 | 12.65 | 50.52 |
| 10 dB | 84.26 | 72.99 | 60.51 | 16.88 | 73.13 |
| 15 dB | 92.84 | 85.93 | 76.79 | 26.35 | 86.33 |
| 20 dB | 96.00 | 91.63 | 87.09 | 38.04 | 92.31 |
| Average | 75.70 | 66.84 | 58.25 | 20.89 | 66.59 |
Recognition rate (%) obtained by proposed and standard methods with car noise (HMM-4-GM)

| SNR level | PLPaGc | PLP | LPCC | LPC | MFCC |
|---|---|---|---|---|---|
| 0 dB | 45.96 | 28.51 | 23.15 | 10.13 | 29.19 |
| 5 dB | 70.81 | 56.37 | 46.55 | 13.50 | 56.14 |
| 10 dB | 88.94 | 80.57 | 70.87 | 20.65 | 81.08 |
| 15 dB | 94.84 | 91.55 | 86.07 | 31.74 | 92.23 |
| 20 dB | 96.74 | 94.89 | 91.60 | 43.21 | 95.63 |
| Average | 79.46 | 70.38 | 63.65 | 23.85 | 70.85 |
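For the car-noise table, the Average row equals the arithmetic mean of the five SNR conditions. The snippet below recomputes those averages from the table and the margin of PLPaGc over the best of the four baselines at each SNR; the variable names are illustrative.

```python
# Car-noise recognition rates (%) at 0, 5, 10, 15 and 20 dB SNR, copied from the table above.
car = {
    "PLPaGc": [45.96, 70.81, 88.94, 94.84, 96.74],
    "PLP":    [28.51, 56.37, 80.57, 91.55, 94.89],
    "LPCC":   [23.15, 46.55, 70.87, 86.07, 91.60],
    "LPC":    [10.13, 13.50, 20.65, 31.74, 43.21],
    "MFCC":   [29.19, 56.14, 81.08, 92.23, 95.63],
}
snr_levels = [0, 5, 10, 15, 20]

# Average row: arithmetic mean over the five SNR conditions, rounded to 2 decimals.
averages = {name: round(sum(rates) / len(rates), 2) for name, rates in car.items()}

# Margin of PLPaGc over the best baseline at each SNR, in percentage points.
baselines = [name for name in car if name != "PLPaGc"]
gains = [car["PLPaGc"][i] - max(car[b][i] for b in baselines) for i in range(len(snr_levels))]

print(averages)
for snr, g in zip(snr_levels, gains):
    print(f"{snr:>2} dB: +{g:.2f} points")
```

On the car-noise data the margin narrows from 16.77 points at 0 dB to 1.11 points at 20 dB.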