Separable spectro-temporal Gabor filter bank features: Reducing the complexity of robust features for automatic speech recognition.

Marc René Schädler, Birger Kollmeier.

Abstract

To test whether simultaneous spectral and temporal processing is required to extract robust features for automatic speech recognition (ASR), the robust spectro-temporal two-dimensional Gabor filter bank (GBFB) front-end from Schädler, Meyer, and Kollmeier [J. Acoust. Soc. Am. 131, 4134-4151 (2012)] was decomposed into a spectral one-dimensional Gabor filter bank and a temporal one-dimensional Gabor filter bank. A feature set extracted with these separate spectral and temporal modulation filter banks, the separate Gabor filter bank (SGBFB) features, was introduced and evaluated on the CHiME (Computational Hearing in Multisource Environments) keywords-in-noise recognition task. From the perspective of robust ASR, the results showed that spectral and temporal processing can be performed independently and are not required to interact with each other. Using SGBFB features permitted the signal-to-noise ratio (SNR) to be lowered by 1.2 dB while still performing as well as the GBFB-based reference system, which corresponds to a relative improvement of the word error rate by 12.8%. Additionally, the real-time factor of the spectro-temporal processing could be reduced by more than an order of magnitude. Compared to human listeners, the SNR needed to be 13 dB higher when using Mel-frequency cepstral coefficient features, 11 dB higher when using GBFB features, and 9 dB higher when using SGBFB features to achieve the same recognition performance.
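
The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the filter shape (a Hann envelope times a complex sinusoid), the modulation frequencies, the filter lengths, and the function names gabor_1d, filter_separable, and filter_2d are all assumptions chosen for illustration. The sketch only shows the general property that a two-dimensional filter built as the outer product of a spectral and a temporal one-dimensional filter gives the same output as applying the two one-dimensional filters one after the other, at a lower cost per output sample.

```python
import numpy as np

def gabor_1d(omega, half_width):
    """Illustrative complex 1D Gabor-like filter: Hann envelope times a complex sinusoid."""
    n = np.arange(-half_width, half_width + 1)
    envelope = 0.5 + 0.5 * np.cos(np.pi * n / (half_width + 1))
    return envelope * np.exp(1j * omega * n)

def filter_separable(spec, g_spectral, g_temporal):
    """Filter along the spectral axis (0), then along the temporal axis (1)."""
    tmp = np.apply_along_axis(np.convolve, 0, spec, g_spectral, mode="same")
    return np.apply_along_axis(np.convolve, 1, tmp, g_temporal, mode="same")

def filter_2d(spec, kernel):
    """Direct zero-padded 2D convolution, written out naively for comparison."""
    kf, kt = kernel.shape
    pf, pt = kf // 2, kt // 2
    padded = np.pad(spec, ((pf, pf), (pt, pt)))  # zero padding
    flipped = kernel[::-1, ::-1]                 # convolution flips the kernel
    out = np.zeros(spec.shape, dtype=complex)
    for i in range(spec.shape[0]):
        for j in range(spec.shape[1]):
            out[i, j] = np.sum(padded[i:i + kf, j:j + kt] * flipped)
    return out

# Hypothetical input: 23 spectral channels by 100 time frames of random data.
rng = np.random.default_rng(0)
spec = rng.standard_normal((23, 100))

g_spectral = gabor_1d(omega=0.5, half_width=4)    # 9 taps (assumed)
g_temporal = gabor_1d(omega=0.25, half_width=10)  # 21 taps (assumed)
kernel_2d = np.outer(g_spectral, g_temporal)      # separable 2D kernel

out_sep = filter_separable(spec, g_spectral, g_temporal)
out_2d = filter_2d(spec, kernel_2d)

# Multiplications per output sample for each approach.
cost_2d = g_spectral.size * g_temporal.size
cost_sep = g_spectral.size + g_temporal.size
print(np.allclose(out_sep, out_2d), cost_2d, cost_sep)
```

With these assumed filter lengths the direct two-dimensional filter needs 189 multiplications per output sample and the two one-dimensional passes need 30. This count is only illustrative and is not the real-time factor reduction reported in the abstract, which depends on the actual filter bank configuration.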

Year:  2015        PMID: 25920855     DOI: 10.1121/1.4916618

Source DB:  PubMed          Journal:  J Acoust Soc Am        ISSN: 0001-4966            Impact factor:   1.840


  5 in total

1.  Speech Intelligibility Prediction using Spectro-Temporal Modulation Analysis.

Authors:  Amin Edraki; Wai-Yip Chan; Jesper Jensen; Daniel Fogerty
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2020-11-24

2.  Matching Pursuit Analysis of Auditory Receptive Fields' Spectro-Temporal Properties.

Authors:  Jörg-Hendrik Bach; Birger Kollmeier; Jörn Anemüller
Journal:  Front Syst Neurosci       Date:  2017-02-09

3.  Time-frequency scattering accurately models auditory similarities between instrumental playing techniques.

Authors:  Vincent Lostanlen; Christian El-Hajj; Mathias Rossignol; Grégoire Lafay; Joakim Andén; Mathieu Lagrange
Journal:  EURASIP J Audio Speech Music Process       Date:  2021-01-11

4.  Attention Differentially Affects Acoustic and Phonetic Feature Encoding in a Multispeaker Environment.

Authors:  Emily S Teoh; Farhin Ahmed; Edmund C Lalor
Journal:  J Neurosci       Date:  2021-12-10       Impact factor: 6.167

5.  Objective Prediction of Hearing Aid Benefit Across Listener Groups Using Machine Learning: Speech Recognition Performance With Binaural Noise-Reduction Algorithms.

Authors:  Marc R Schädler; Anna Warzybok; Birger Kollmeier
Journal:  Trends Hear       Date:  2018 Jan-Dec       Impact factor: 3.293

