
Spectro-temporal modulation subspace-spanning filter bank features for robust automatic speech recognition.

Marc René Schädler, Bernd T Meyer, Birger Kollmeier.

Abstract

In an attempt to increase the robustness of automatic speech recognition (ASR) systems, a feature extraction scheme is proposed that takes spectro-temporal modulation frequencies (MF) into account. This physiologically inspired approach uses a two-dimensional filter bank based on Gabor filters, which limits the redundant information between feature components, and also results in physically interpretable features. Robustness against extrinsic variation (different types of additive noise) and intrinsic variability (arising from changes in speaking rate, effort, and style) is quantified in a series of recognition experiments. The results are compared to reference ASR systems using Mel-frequency cepstral coefficients (MFCCs), MFCCs with cepstral mean subtraction (CMS) and RASTA-PLP features, respectively. Gabor features are shown to be more robust against extrinsic variation than the baseline systems without CMS, with relative improvements of 28% and 16% for two training conditions (using only clean training samples or a mixture of noisy and clean utterances, respectively). When used in a state-of-the-art system, improvements of 14% are observed when spectro-temporal features are concatenated with MFCCs, indicating the complementarity of those feature types. An analysis of the importance of specific MF shows that temporal MF up to 25 Hz and spectral MF up to 0.25 cycles/channel are beneficial for ASR.
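The core operation described in the abstract, filtering a time-frequency representation with two-dimensional Gabor filters tuned to specific modulation frequencies, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the Hann envelope, the filter sizes, the removal of the DC component, and the 100 frames/s rate used to convert Hz to radians per frame are all assumptions made for illustration. Only the two limits, 25 Hz temporal and 0.25 cycles/channel spectral, come from the abstract.

```python
import numpy as np


def gabor_filter_2d(omega_k, omega_n, size_k=23, size_n=41):
    """Complex 2D Gabor filter: plane-wave carrier times a Hann envelope.

    omega_k: spectral modulation frequency in radians per channel
    omega_n: temporal modulation frequency in radians per frame
    size_k, size_n: filter extent in channels and frames
                    (hypothetical defaults, not taken from the paper)
    """
    k = np.arange(size_k) - size_k // 2
    n = np.arange(size_n) - size_n // 2
    envelope = np.outer(np.hanning(size_k), np.hanning(size_n))
    carrier = np.exp(1j * (omega_k * k[:, None] + omega_n * n[None, :]))
    g = envelope * carrier
    # Assumption: remove the DC component so that a constant offset in the
    # spectrogram produces no response.
    g = g - envelope * (g.sum() / envelope.sum())
    return g


def apply_filter(spec, g):
    """Zero-padded, same-size 2D correlation of a (channels x frames)
    spectrogram with the filter g (odd filter sizes assumed)."""
    size_k, size_n = g.shape
    padded = np.pad(spec, ((size_k // 2, size_k // 2),
                           (size_n // 2, size_n // 2)))
    out = np.zeros(spec.shape, dtype=complex)
    for i in range(size_k):
        for j in range(size_n):
            out += g[i, j] * padded[i:i + spec.shape[0], j:j + spec.shape[1]]
    return out


# Convert the modulation-frequency limits quoted in the abstract to radians.
FRAME_RATE_HZ = 100.0                                # assumed 10 ms frame shift
omega_n_max = 2 * np.pi * 25.0 / FRAME_RATE_HZ       # 25 Hz temporal MF
omega_k_max = 2 * np.pi * 0.25                       # 0.25 cycles/channel

# Toy log-spectrogram: 31 channels x 101 frames with a 10 Hz temporal modulation.
frames = np.arange(101)
spec = np.tile(np.cos(2 * np.pi * 10.0 / FRAME_RATE_HZ * frames), (31, 1))

# One feature map: magnitude of the response to a purely temporal 10 Hz filter.
g_10hz = gabor_filter_2d(0.0, 2 * np.pi * 10.0 / FRAME_RATE_HZ)
feature_map = np.abs(apply_filter(spec, g_10hz))
```

A filter bank in this spirit would repeat the filtering step over a grid of (`omega_k`, `omega_n`) pairs and stack the resulting maps. How that grid is spaced and how the maps are subsampled to limit redundancy is specified in the paper itself and is not reproduced here.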

Year:  2012        PMID: 22559385     DOI: 10.1121/1.3699200

Source DB:  PubMed          Journal:  J Acoust Soc Am        ISSN: 0001-4966            Impact factor:   1.840


  8 in total

1.  Idealized computational models for auditory receptive fields.

Authors:  Tony Lindeberg; Anders Friberg
Journal:  PLoS One       Date:  2015-03-30       Impact factor: 3.240

2.  General auditory and speech-specific contributions to cortical envelope tracking revealed using auditory chimeras.

Authors:  Kevin D Prinsloo; Edmund C Lalor
Journal:  J Neurosci       Date:  2022-08-30       Impact factor: 6.709

3.  Speech Intelligibility Prediction using Spectro-Temporal Modulation Analysis.

Authors:  Amin Edraki; Wai-Yip Chan; Jesper Jensen; Daniel Fogerty
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2020-11-24

4.  Matching Pursuit Analysis of Auditory Receptive Fields' Spectro-Temporal Properties.

Authors:  Jörg-Hendrik Bach; Birger Kollmeier; Jörn Anemüller
Journal:  Front Syst Neurosci       Date:  2017-02-09

5.  Simple Acoustic Features Can Explain Phoneme-Based Predictions of Cortical Responses to Speech.

Authors:  Christoph Daube; Robin A A Ince; Joachim Gross
Journal:  Curr Biol       Date:  2019-05-23       Impact factor: 10.834

6.  Speech signal analysis of alzheimer's diseases in farsi using auditory model system.

Authors:  Maryam Momeni; Mahdiyeh Rahmani
Journal:  Cogn Neurodyn       Date:  2020-10-13       Impact factor: 3.473

7.  Auditory Cortical Plasticity Dependent on Environmental Noise Statistics.

Authors:  Natsumi Y Homma; Patrick W Hullett; Craig A Atencio; Christoph E Schreiner
Journal:  Cell Rep       Date:  2020-03-31       Impact factor: 9.423

8.  Mechanisms of Spectrotemporal Modulation Detection for Normal- and Hearing-Impaired Listeners.

Authors:  Emmanuel Ponsot; Léo Varnet; Nicolas Wallaert; Elza Daoud; Shihab A Shamma; Christian Lorenzi; Peter Neri
Journal:  Trends Hear       Date:  2021 Jan-Dec       Impact factor: 3.293
