Sebastian McBride, Martin Huelse, Mark Lee.
Abstract
Computational visual attention systems have been constructed so that robots and other devices can detect and locate regions of interest in their visual world. Such systems often attempt to take account of what is known of the human visual system and employ concepts, such as 'active vision', to gain various perceived advantages. However, despite the potential for gaining insights from such experiments, the computational requirements for visual attention processing are often not clearly presented from a biological perspective. Addressing this gap was the primary objective of this study, attained through two specific phases of investigation: 1) conceptual modeling of a top-down-bottom-up framework through critical analysis of the psychophysical and neurophysiological literature, 2) implementation and validation of the model on robotic hardware (as a representative of an active vision system). Seven computational requirements were identified: 1) transformation of retinotopic to egocentric mappings, 2) spatial memory for the purposes of medium-term inhibition of return, 3) synchronization of 'where' and 'what' information from the two visual streams, 4) convergence of top-down and bottom-up information to a centralized point of information processing, 5) a threshold function to elicit saccade action, 6) a function to represent task relevance as a ratio of excitation and inhibition, and 7) derivation of excitation and inhibition values from object-associated feature classes. The model provides further insight into the nature of data representation and transfer between brain regions associated with the vertebrate 'active' visual attention system. In particular, the model lends strong support to the functional role of the lateral intraparietal region of the brain as a primary area of information consolidation that directs putative action through the use of a 'priority map'.
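The seven requirements above can be sketched as a minimal selection loop. This is an illustrative assumption of how the pieces fit together, not the paper's implementation: the function names, the additive pan/tilt transform, and the set-based spatial memory are all hypothetical, corresponding loosely to ICRs 1, 2, 5, 6 and 7.

```python
def retinotopic_to_egocentric(rx, ry, pan, tilt):
    """ICR 1 (sketch): map a retinotopic (camera-frame) coordinate into
    egocentric gaze space by offsetting with the camera's pan/tilt
    position. A simple additive model is assumed here."""
    return (rx + pan, ry + tilt)

def activation(saliency, excitation, inhibition):
    """ICRs 6-7 (sketch): modulate a bottom-up saliency value by task
    relevance expressed as a ratio of excitation to inhibition."""
    return saliency * (excitation / inhibition)

def select_saccade_target(candidates, spatial_memory, threshold):
    """ICRs 2 and 5 (sketch): suppress recently fixated locations
    (inhibition of return) and elicit a saccade only if the winning
    activation exceeds a threshold.

    candidates: dict mapping egocentric coordinate -> activation value
    spatial_memory: set of recently fixated coordinates
    """
    best = None
    for coord, act in candidates.items():
        if coord in spatial_memory:  # medium-term IOR: skip recent fixations
            continue
        if best is None or act > candidates[best]:
            best = coord
    if best is not None and candidates[best] >= threshold:
        return best  # saccade to this coordinate
    return None      # no candidate crossed the threshold
```

For example, a highly salient but recently fixated location loses to a weaker, unvisited one: `select_saccade_target({(10, 5): 0.9, (3, 2): 0.4}, {(10, 5)}, 0.3)` returns `(3, 2)`.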
Year: 2013 PMID: 23437044 PMCID: PMC3577816 DOI: 10.1371/journal.pone.0054585
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Glossary of terms and mathematical variables.
| Abbreviation | Explanation |
| --- | --- |
| FEF | Frontal eye field. |
| Gaze space | The egocentric space around the whole of the vision system. |
| ICR | Identified computational requirement. |
| IOR | Inhibition of return: the inhibition of a saccade to a previously fixated object within a defined time frame. |
| Linear Ballistic Accumulator | A model describing the linear accumulation of information to a threshold at which an action (e.g. a saccade) is taken. Starting points and rates of accumulation can vary. |
| LIP | Lateral intraparietal region of the brain, predominantly associated with the initiation of saccades. |
| MIP | Medial intraparietal region of the brain, predominantly associated with the initiation of motor action. |
| Retinotopic space | The space as currently observed within the camera's visual scene. |
| 'Where' pathway | Dorsal visual stream that passes through the V1, V2 and V5 layers of the visual cortex before arriving at the posterior parietal cortex (particularly LIP); considered to process and hold spatial information about objects. |
| 'What' pathway | Ventral visual stream that passes through the V1, V2 and V4 layers of the visual cortex before arriving at the inferior temporal cortex; concerned predominantly with object identification. |

| Variable | Explanation |
| --- | --- |
| | Excitatory aspect of task relevance modulation. |
| | Activation value derived from modulation of the saliency value. |
| | Set of coordinates associated with egocentric space as a summation of |
| | Set of coordinates associated with the current retinotopic space. |
| | Set of coordinates associated with the spatial memory. |
| | Inhibitory aspect of task relevance modulation. |
| | Egocentric coordinates derived from retinotopic coordinates modulated by relative and absolute pan and tilt camera positions. |
| | The maximum number of attributes that determine the saliency value. |
| | Time since entry of an activation value f into the gaze space mapping. |
| | The maximum time that a stimulus can be stored in the spatial memory. |
| | Saliency value derived after visual RGB filters and movement algorithm. |
| | Bottom-up weightings conferred at the point of initial filtering of visual information. |
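The Linear Ballistic Accumulator defined in the glossary can be illustrated with a small deterministic sketch. In the full model, start points and drift rates are drawn from distributions on each trial; here they are passed in directly, and the function name and signature are assumptions for illustration only.

```python
def lba_response(start_points, drift_rates, threshold):
    """Linear Ballistic Accumulator (deterministic sketch).

    Each accumulator i starts at start_points[i] and accumulates evidence
    linearly at drift_rates[i]; the first accumulator to reach the
    threshold determines the action (e.g. which saccade is made) and the
    time of the threshold crossing.

    Returns (winner_index, crossing_time); (None, inf) if no accumulator
    has a positive drift rate.
    """
    finish_times = []
    for i, (start, rate) in enumerate(zip(start_points, drift_rates)):
        if rate <= 0:
            continue  # this accumulator never reaches the threshold
        finish_times.append(((threshold - start) / rate, i))
    if not finish_times:
        return None, float("inf")
    t, winner = min(finish_times)  # earliest threshold crossing wins
    return winner, t
```

With equal drift rates, the accumulator with the higher start point wins: `lba_response([0.2, 0.5], [1.0, 1.0], 1.0)` returns `(1, 0.5)`.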
Figure 1 Primary brain regions associated with visual attention, with first-stage identified computational requirements (ICRs) 1–4 (AIP, anterior intraparietal region; VIP, ventral intraparietal region; MIP, medial intraparietal region; LIP, lateral intraparietal region).
Figure 2 Computational domains of the robotic architecture.
Figure 3 Computational architecture for visual attention integrating bottom-up and top-down modulation.
Figure 4 Activation values over time undergoing spatial modulation.
Figure 5 Activation values over time undergoing spatial and feature modulation for different levels of excitatory modulation, inhibitory modulation, and the maximal time a co-ordinate is stored in the spatial memory.
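The behaviour summarised in Figures 4 and 5 — activation suppressed while a coordinate sits in spatial memory, then restored under feature modulation — can be expressed as a small hedged sketch. The function name, the hard zeroing during the memory window, and the excitation/inhibition ratio applied afterwards are illustrative assumptions about the model's behaviour, not its exact equations.

```python
def ior_modulated_activation(base_activation, t_entry, t_now, t_max,
                             excitation=1.0, inhibition=1.0):
    """Sketch of spatial plus feature modulation of an activation value.

    While a coordinate remains in the spatial memory
    (t_now - t_entry < t_max), its activation is suppressed, giving
    medium-term inhibition of return; once it expires from memory, the
    feature-modulated activation (base * excitation / inhibition)
    is restored.
    """
    if t_now - t_entry < t_max:
        return 0.0  # suppressed: this location was recently fixated
    return base_activation * excitation / inhibition
```

For example, a location fixated at t = 0 with t_max = 10 stays suppressed at t = 5 and recovers its modulated activation at t = 12.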
Figure 6 Bottom-up (columns A–G) versus top-down (columns H–M) modulation of visual attention; filled circles refer to bottom-up saliency weightings.