| Literature DB >> 27601991 |
Abstract
Parsing the visual scene into objects is paramount to survival. Yet how this is accomplished by the nervous system remains largely unknown, even in the comparatively well-understood visual system. It is especially unclear how detailed peripheral signal representations are transformed into the object-oriented representations that are independent of object position and are provided by the final stages of visual processing. This perspective discusses advances in computational algorithms for fitting large-scale models that make it possible to reconstruct the intermediate steps of visual processing from neural responses to natural stimuli. In particular, it is now possible to characterize how different types of position invariance, such as local (also known as phase invariance) and more global, are interleaved with nonlinear operations to allow for the coding of curved contours. Neurons in the mid-level visual area V4 exhibit selectivity for pairs of even- and odd-symmetric profiles along curved contours. Such pairing is reminiscent of the response properties of complex cells in the primary visual cortex (V1) and suggests specific ways in which V1 signals are transformed within subsequent visual cortical areas. These examples illustrate that large-scale models fitted to neural responses to natural stimuli can provide generative models of successive stages of sensory processing.
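To make the phase-invariance concept concrete, below is a minimal numerical sketch (not code from the paper) of the classic quadrature, or energy, model of a V1 complex cell: squared projections onto an even- and an odd-symmetric Gabor pair are summed, yielding a response that is largely insensitive to the local position (phase) of the pattern. The 1D simplification and all Gabor parameters here are illustrative assumptions.

```python
import numpy as np

def gabor(n, freq, phase, sigma):
    """1D Gabor profile: a sinusoid windowed by a Gaussian envelope."""
    x = np.arange(n) - n / 2
    return np.exp(-x**2 / (2 * sigma**2)) * np.cos(2 * np.pi * freq * x + phase)

N = 64
even = gabor(N, freq=0.1, phase=0.0, sigma=8.0)       # even-symmetric feature
odd = gabor(N, freq=0.1, phase=np.pi / 2, sigma=8.0)  # odd-symmetric quadrature partner

def energy_response(stimulus):
    # Sum of squared projections onto the quadrature pair: the output tracks
    # local contrast energy rather than the exact phase (local position)
    # of the pattern within the receptive field.
    return np.dot(stimulus, even)**2 + np.dot(stimulus, odd)**2

# A grating elicits a nearly constant response across spatial phases:
for shift in (0.0, np.pi / 3, np.pi / 2):
    grating = np.cos(2 * np.pi * 0.1 * (np.arange(N) - N / 2) + shift)
    print(f"phase shift {shift:.2f} rad -> response {energy_response(grating):.3f}")
```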
Keywords: Convolutional Neural Networks (CNN); area V4; auditory system; curvature; object recognition; phase invariance; quadrature model; visual system
Year: 2016 PMID: 27601991 PMCID: PMC4993779 DOI: 10.3389/fnsyn.2016.00026
Source DB: PubMed Journal: Front Synaptic Neurosci ISSN: 1663-3563
Figure 1. Estimating feature selectivity and invariance properties from neural responses to natural stimuli. (A) Schematic of the model that combines position invariance with selectivity to conjunctions of two features at one position in visual space. At each position, the stimulus is compared to the two features, yielding two projection values, one per feature. The two projection values are combined according to a nonlinear function (determined by fitting and not shown here), and the results of this nonlinear computation at each position are combined with a MAX (logical OR) operation to predict the spike rate elicited by the stimulus. (B) Examples of the two most relevant image features for three V4 neurons from two different animals. Columns correspond to different neurons, from left to right: m26a_3, j15c_1, j46a_1. The first and second rows show the first and second maximally informative feature of each neuron, respectively. Each feature is shown after fitting a curved-Gabor model to the templates estimated from the responses of these neurons to natural stimuli (Sharpee et al., 2013). (C) The two most relevant temporal profiles for an auditory neuron. Data are from field L, a region analogous to the mammalian primary auditory cortex (Sharpee et al., 2011a). The sum of the two relevant features (magenta) produces a time-dilated version of the first feature (blue). Neuron "udon2120."
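As a hedged sketch of the computation described in panel (A): image patches at every position are projected onto the two features, the projections are combined nonlinearly, and a MAX over positions gives the predicted rate. The quadratic combination rule below is only a placeholder for the fitted nonlinearity, and the random feature templates stand in for the estimated curved-Gabor features.

```python
import numpy as np

def predict_rate(image, feat1, feat2, nonlinearity):
    """Predicted spike rate: MAX over positions of a nonlinear function
    of the two feature projections computed at each position."""
    fh, fw = feat1.shape
    H, W = image.shape
    responses = []
    for i in range(H - fh + 1):
        for j in range(W - fw + 1):
            patch = image[i:i + fh, j:j + fw]
            p1 = np.sum(patch * feat1)   # projection onto feature 1
            p2 = np.sum(patch * feat2)   # projection onto feature 2
            responses.append(nonlinearity(p1, p2))
    return max(responses)                # MAX (logical OR) across positions

# Placeholder quadrature-style combination; in the paper, the nonlinear
# function is estimated by fitting and can differ across neurons.
energy = lambda p1, p2: p1**2 + p2**2

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))
feat1 = rng.standard_normal((8, 8))      # stand-in for a fitted curved-Gabor feature
feat2 = rng.standard_normal((8, 8))
print(predict_rate(image, feat1, feat2, energy))
```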