Feedback-Driven Sensory Mapping Adaptation for Robust Speech Activity Detection.

Ashwin Bellur, Mounya Elhilali.

Abstract

Parsing natural acoustic scenes using computational methodologies poses many challenges. Given the rich and complex nature of the acoustic environment, data mismatch between train and test conditions is a major hurdle in data-driven audio processing systems. In contrast, the brain exhibits a remarkable ability to segment acoustic scenes with relative ease. When tackling the challenging listening conditions that are often faced in everyday life, the biological system relies on a number of principles that allow it to effortlessly parse its rich soundscape. In the current study, we leverage a key principle employed by the auditory system: its ability to adapt the neural representation of its sensory input in a high-dimensional space. We propose a framework that mimics this process in a computational model for robust speech activity detection. The system employs a 2-D Gabor filter bank whose parameters are retuned offline to improve the separability between the feature representations of speech and nonspeech sounds. This retuning process, driven by feedback from statistical models of the speech and nonspeech classes, attempts to minimize the misclassification risk of mismatched data with respect to the original statistical models. We hypothesize that this risk minimization procedure results in an emphasis on unique speech and nonspeech modulations in the high-dimensional space. We show that such an adapted system is indeed robust to other novel conditions, with a marked reduction in equal error rates for a variety of databases with additive and convolutive noise distortions. We discuss the lessons learned from biology with regard to adapting to an ever-changing acoustic environment and the impact on building truly intelligent audio processing systems.
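As a rough illustration of two ingredients named in the abstract, the sketch below builds a single 2-D (spectrotemporal) Gabor filter and computes an equal error rate from detection scores. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, patch sizes, Hann envelope, and modulation-frequency parameters are all hypothetical, and the feedback-driven retuning of the filter parameters is not shown.

```python
import numpy as np


def gabor_2d(n_freq=23, n_time=41, omega_f=0.25, omega_t=0.1):
    """Real 2-D Gabor filter over a (frequency, time) patch.

    omega_f and omega_t are spectral and temporal modulation frequencies
    in radians per frequency bin / time frame (hypothetical defaults).
    """
    f = np.arange(n_freq) - n_freq // 2
    t = np.arange(n_time) - n_time // 2
    F, T = np.meshgrid(f, t, indexing="ij")
    # Separable Hann envelope (an assumption; a Gaussian is also common).
    envelope = np.outer(np.hanning(n_freq), np.hanning(n_time))
    # Sinusoidal carrier that selects one spectrotemporal modulation.
    carrier = np.cos(omega_f * F + omega_t * T)
    g = envelope * carrier
    # Subtract the mean so the filter has no DC response.
    return g - g.mean()


def equal_error_rate(speech_scores, nonspeech_scores):
    """Approximate EER: operating point where miss and false-alarm rates meet."""
    speech_scores = np.asarray(speech_scores, dtype=float)
    nonspeech_scores = np.asarray(nonspeech_scores, dtype=float)
    thresholds = np.sort(np.concatenate([speech_scores, nonspeech_scores]))
    # A frame is labelled speech when its score is >= the threshold.
    miss = np.array([(speech_scores < th).mean() for th in thresholds])
    false_alarm = np.array([(nonspeech_scores >= th).mean() for th in thresholds])
    i = np.argmin(np.abs(miss - false_alarm))
    return (miss[i] + false_alarm[i]) / 2.0


if __name__ == "__main__":
    g = gabor_2d()
    print("filter shape:", g.shape)
    rng = np.random.default_rng(0)
    # Synthetic scores standing in for a detector's output.
    eer = equal_error_rate(rng.normal(1.0, 1.0, 500), rng.normal(-1.0, 1.0, 500))
    print("EER on synthetic scores:", eer)
```

In a filter bank, several such filters with different `omega_f` and `omega_t` values would be applied to a time-frequency representation of the audio; the retuning described in the abstract would then adjust those parameters, which this sketch leaves fixed.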

Keywords: adaptation; Gabor filters; genetic algorithm; spectrotemporal filters; speech activity detection

Year: 2016; PMID: 28736736; PMCID: PMC5516649; DOI: 10.1109/TASLP.2016.2639322

Source DB: PubMed; Journal: IEEE/ACM Trans Audio Speech Lang Process

