Emine Merve Kaya, Mounya Elhilali.
Abstract
Sounds in everyday life seldom appear in isolation. Both humans and machines are constantly flooded with a cacophony of sounds that need to be sorted through and scoured for relevant information, a phenomenon referred to as the 'cocktail party problem'. A key component in parsing acoustic scenes is the role of attention, which mediates perception and behaviour by focusing both sensory and cognitive resources on pertinent information in the stimulus space. The current article provides a review of modelling studies of auditory attention. The review highlights how the term attention refers to a multitude of behavioural and cognitive processes that can shape sensory processing. Attention can be modulated by 'bottom-up' sensory-driven factors, as well as 'top-down' task-specific goals, expectations and learned schemas. Essentially, it acts as a selection process or processes that focus both sensory and cognitive resources on the most relevant events in the soundscape, with relevance being dictated by the stimulus itself (e.g. a loud explosion) or by a task at hand (e.g. listening to announcements in a busy airport). Recent computational models of auditory attention provide key insights into its role in facilitating perception in cluttered auditory scenes. This article is part of the themed issue 'Auditory and visual scene analysis'.
Keywords: auditory attention; auditory scene; bottom-up; computational model; salience; top-down
Year: 2017 PMID: 28044012 PMCID: PMC5206269 DOI: 10.1098/rstb.2016.0101
Source DB: PubMed Journal: Philos Trans R Soc Lond B Biol Sci ISSN: 0962-8436 Impact factor: 6.237
Figure 1. A broad classification of models described in this review. Reconstruction techniques are not computational models in the traditional forward architecture of sound to ‘perception’; however, this methodology provides valuable insight in understanding task-directed attention.
Figure 2. The spectrogram (time–frequency ‘image’) of an excerpt from Haydn's Surprise Symphony. Marked times correspond to the approximate location in the second movement. The surprising section is a loud chord played by the entire orchestra following a long passage of quiet string instruments. We consider the scenario of an orchestral passage immediately following the surprise chord. If the passage were reversed in time, the surprise chord would no longer be surprising, and the switch to a quiet passage is not as surprising as the switch to a sudden loud passage. This figure demonstrates the dependence of auditory salience on time and context.
Figure 3. Attending to a particular sound characteristic tunes the neural spectro-temporal receptive fields (STRFs) and boosts the neural signal at the times of the attended event. Violin notes are overlaid with frequency modulations (FMs), illustrated with the spectrogram S(t). When instructed to attend to the FM segments, the STRF adapts to the orientation of the modulations, resulting in an enhancement in the neural response R(t).
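The relationship between S(t), the STRF and R(t) described in the Figure 3 caption can be illustrated with a toy calculation. The sketch below is not taken from the reviewed models. It assumes the commonly used linear formulation, in which the response is the spectrogram filtered by the STRF along time and summed over frequency, and it represents attentional tuning simply by swapping in a filter whose orientation matches the modulation. The function name `strf_response` and all array sizes are hypothetical choices made for illustration.

```python
import numpy as np

def strf_response(spectrogram, strf):
    """Linear STRF response (an assumed, simplified model):
    R(t) = sum_f sum_tau STRF(f, tau) * S(f, t - tau)."""
    n_freq, n_time = spectrogram.shape
    if strf.shape[0] != n_freq:
        raise ValueError("spectrogram and STRF must share the frequency axis")
    response = np.zeros(n_time)
    for f in range(n_freq):
        # Causal convolution along time within each frequency channel,
        # truncated to the length of the stimulus.
        response += np.convolve(spectrogram[f], strf[f], mode="full")[:n_time]
    return response

# Toy stimulus: an upward frequency sweep starting at time bin 10.
n_freq, n_time, n_lag = 8, 40, 8
onset = 10
S = np.zeros((n_freq, n_time))
for f in range(n_freq):
    S[f, onset + f] = 1.0

# Two hypothetical STRFs with equal energy but opposite orientation.
matched = np.zeros((n_freq, n_lag))      # aligned with the upward sweep
mismatched = np.zeros((n_freq, n_lag))   # oriented the opposite way
for f in range(n_freq):
    matched[f, n_lag - 1 - f] = 1.0
    mismatched[f, f] = 1.0

r_matched = strf_response(S, matched)
r_mismatched = strf_response(S, mismatched)

print("peak response, matched STRF:   ", r_matched.max())
print("peak response, mismatched STRF:", r_mismatched.max())
```

In this toy setting the filter aligned with the sweep sums all frequency channels at a single time bin, whereas the oppositely oriented filter spreads the same energy across time, so the matched response peaks more strongly. This is only meant to convey why an STRF that adapts to the orientation of the modulations would enhance R(t).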