Antonio J Rodríguez-Sánchez1, John K Tsotsos. 1. Intelligent and Interactive Systems, University of Innsbruck, Innsbruck, Austria. Antonio.Rodriguez-Sanchez@uibk.ac.at
Abstract
That shape is important for perception has been known for almost a thousand years (thanks to Alhazen in 1083) and has been a subject of study ever since by scientists and philosophers (such as Descartes, Helmholtz or the Gestalt psychologists). Shapes are important object descriptors. If there was any remote doubt regarding the importance of shape, recent experiments have shown that intermediate areas of primate visual cortex such as V2, V4 and TEO are involved in analyzing shape features such as corners and curvatures. The primate brain appears to perform a wide variety of complex tasks by means of simple operations. These operations are applied across several layers of neurons, representing increasingly complex, abstract intermediate processing stages. Recently, new models have attempted to emulate the human visual system. However, the role of intermediate representations in the visual cortex and their importance have not been adequately studied in computational modeling. This paper proposes a model of shape-selective neurons whose shape-selectivity is achieved through intermediate layers of visual representation not previously fully explored. We hypothesize that hypercomplex--also known as endstopped--neurons play a critical role in achieving shape selectivity and show how shape-selective neurons may be modeled by integrating endstopping and curvature computations. This model--a representational and computational system for the detection of 2-dimensional object silhouettes that we term 2DSIL--provides a highly accurate fit with neural data and replicates responses from neurons in area V4 with an average of 83% accuracy. We successfully test a biologically plausible hypothesis on how to connect early representations based on Gabor or Difference of Gaussian filters and later representations closer to object categories without the need for a learning phase, unlike most recent models.
Since the foundation of modern neuroanatomy by Ramón y Cajal, who gave a detailed description of the nerve cell organization in the central and peripheral nervous system [1]–[4], great progress has been achieved in understanding the human brain. At the same time, computing power and technology have provided more sophisticated tools to study the brain and its great complexity. Computational neuroscience has appeared as an important methodology for formalizing and testing new hypotheses on how that complex system may perform certain operations.
Over the last decades, many models inspired by advances in the anatomy of the visual cortex have been presented, the earliest from the late 1960s and early 1970s [5]–[8]. A subsequent and very influential model is Fukushima's Neocognitron [9]. The Neocognitron is a self-organizing neural network model that achieves position invariance and was later shown to perform well on digit recognition [10]. The network contains an input layer followed by a cascade of S-cells (for simple cells) and C-cells (complex cells). After unsupervised training through a self-organization process, one of the C-cells in the last layer will respond selectively to the input pattern used in training. Later models, based on Fukushima's foundation, that included backpropagation [11] were also successful at the task of handwritten digit recognition [12], [13].
Since then, there have been several relevant works. Visnet [14] consists of a four-layer network that achieves invariant object recognition. The most crucial part of this method is a Hebbian-based trace learning rule. To achieve translation invariance, the network is trained with inputs at different positions.
Riesenhuber and Poggio's [15]–[19] model consists of five hierarchical levels of S and C neurons (following Fukushima's Neocognitron [9]) that are connected through linear operations in one layer and non-linear (MAX) operations in the next (the strongest units determine the response of the system). The first level receives input from the retina and is composed of simple neuron receptive fields that analyze orientations. The next levels account for more complex features (e.g. junctions). The last level is composed of view-tuned neurons that achieve position and scale invariance.
Amit [20], [21] presents a parallel neural network for visual selection. This network is trained to detect candidate locations for object recognition. Objects are represented as composed of features localized at different locations with respect to an object centre. Simple features (edges and conjunctions) are detected in lower levels, while higher levels carry out disjunctions over regions. Suzuki and colleagues [22] construct a model of the form pathway based on predictive coding [23], [24]. Predictive coding hypothesizes that feedback connections from high to lower-order cortical areas carry predictions of lower-level neural activities. Feedforward connections carry residual errors between predictions and the actual lower-level activities. In the model, a fast coarse processing precedes and constrains more detailed processing.
None of the models presented until now fully explores the possible contributions of intermediate representations as they are known in the brain. Common to most models is a first step that performs some sort of edge detection in a similar way to some V1 neurons in the brain. Even though some of the proposals may include hierarchies with intermediate representations (e.g. [19], [25]), these representations do not include much of the complexity now known to exist in the intermediate layers of the visual cortex.
The usual modeling of intermediate layers to date is a simple composition of earlier features to approximate shape without computing curvature or shape directly. Here, we propose a more direct approach, one that provides models of units that compute shape properties directly using several novel neurally-based computations. Distinct from the best of the previous approaches, we do not use simple hierarchical composition of a common neural type but rather define new neural selectivities for each of several intermediate visual computation layers.
Models up to now have been stagnant on the representation of contours following Marr's [26], [27] primal sketch, that is, edge combinations are used to represent shapes and objects. Models have added layers of S and C cells following early systems [9] into higher levels of the hierarchy, not considering that cells in those higher levels perform quite different, more complex, operations. There has been some progress on how hypercomplex cells, also known as endstopped cells, may be defined [28]–[30], but except for the work of [31]–[33] on figure-ground segregation, the role of endstopping has been neglected. Here, following this past work, we hypothesize that endstopped neurons play an important role in encoding curvature and shape.
We present a biologically plausible model for shape representation, 2DSIL, where the focus is on 2D silhouettes. In the following section we describe in detail each layer in the model. Next we show the strongly positive results of testing the model with stimuli used in previous single-cell recording studies, followed by a discussion regarding the characteristics of 2DSIL. In a previous paper [34] we showed that even when this representation is used within a recognition system, it outperforms the leading competing models. Material and methods are presented at the end.
Results
In this section we explain how shape selectivity may be achieved with a model that incorporates intermediate layers inspired by the primate visual system. We demonstrate the performance of our model by comparing computed responses with neurons from area V4.
Incorporating endstopping and curvature in a model of shape representation
Figure 1 presents a depiction of the proposed architecture, which comprises simple, complex, endstopped, local curvature and shape-selective cells that are described next in detail. In what follows whenever a neuron is referred to as model neuron/cell it is one developed for our theory. A neuron or cell referred to without the model adjective is a biological one.
Figure 1
Architecture of the representational and computational system for the detection of 2-dimensional object silhouettes (2DSIL).
Model simple cells
Simple neurons of visual area V1 are sensitive to bar and edge orientations, as previous models also stipulate. Common spatial response profiles used to model simple neurons in area V1 include Gabor filters [35] and Difference of Gaussians. The latter provides a better fit to neuronal responses [36] and accordingly gave better results in our case than the Gabor filter formulation:
$$G(x, y) = \frac{1}{2\pi\sigma_{x_1}\sigma_y}\exp\left(-\frac{x'^2}{2\sigma_{x_1}^2} - \frac{y'^2}{2\sigma_y^2}\right) - \frac{1}{2\pi\sigma_{x_2}\sigma_y}\exp\left(-\frac{x'^2}{2\sigma_{x_2}^2} - \frac{y'^2}{2\sigma_y^2}\right)$$
where $\sigma_y$ is the height and $\sigma_{x_1}$ and $\sigma_{x_2}$ are the widths of each Gaussian function, $\theta$ is their orientation, and $(x', y')$ are the coordinates $(x, y)$ rotated by $\theta$. The relation between these parameters may be referred to as the aspect ratio $AR = \sigma_y / \sigma_{x_1}$ and the width ratio $WR = \sigma_{x_2} / \sigma_{x_1}$. The size of the filters was $4\sigma_y$. As with all the model neurons within 2DSIL, these are defined at multiple scales, each scale being band-pass for a range of receptive field sizes, with the number of scales chosen to suit the modelling task. Values assigned to these parameters are given in the methods section.
Cells in area V1 are heterogeneous, i.e. they are not all uniform. In the realization of the model, four different groups of simple cells were designed, varying in size and in the values of width and length. Model simple cells are organized into hypercolumns. Within a hypercolumn, cells share the same orientation but are spatially displaced, and they are combined into model complex cells as described next (Figure 1). There is no input from left and right eye, since binocular responses are not considered in this study. Model simple cells are at different orientations and scales.
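The Difference-of-Gaussians profile described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `dog_kernel`, the normalization of each Gaussian, the definitions `aspect_ratio = sigma_y / sigma_x1` and `width_ratio = sigma_x2 / sigma_x1`, and the kernel side of `4 * sigma_y` pixels are all assumptions made for the example. The parameter values are drawn from the ranges listed in the Figure 2 caption.

```python
import math


def dog_kernel(sigma_y, aspect_ratio, width_ratio, theta_deg):
    """Return an oriented Difference-of-Gaussians kernel as a list of rows.

    Assumptions of this sketch: the kernel side is 4 * sigma_y pixels,
    aspect_ratio = sigma_y / sigma_x1 and width_ratio = sigma_x2 / sigma_x1.
    """
    sigma_x1 = sigma_y / aspect_ratio   # width of the narrow Gaussian
    sigma_x2 = width_ratio * sigma_x1   # width of the broad Gaussian
    size = int(round(4 * sigma_y))
    half = (size - 1) / 2.0
    theta = math.radians(theta_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    def gaussian(across, along, sigma_x):
        norm = 1.0 / (2.0 * math.pi * sigma_x * sigma_y)
        return norm * math.exp(-across ** 2 / (2.0 * sigma_x ** 2)
                               - along ** 2 / (2.0 * sigma_y ** 2))

    kernel = []
    for row in range(size):
        y = row - half
        values = []
        for col in range(size):
            x = col - half
            along = x * cos_t + y * sin_t     # coordinate along the preferred orientation
            across = -x * sin_t + y * cos_t   # coordinate across the preferred orientation
            values.append(gaussian(across, along, sigma_x1)
                          - gaussian(across, along, sigma_x2))
        kernel.append(values)
    return kernel


# A vertical (90 degree) model simple cell with sigma_y = 20, AR = 2, WR = 2.5.
kernel = dog_kernel(sigma_y=20, aspect_ratio=2, width_ratio=2.5, theta_deg=90)
```

With these values the kernel is 80 pixels on a side, positive near its center, and elongated along the vertical axis, so a vertical bar drives it more strongly than a horizontal one.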
Model complex cells
Complex cells are also sensitive to bars and orientations, but their receptive fields are larger than those of simple neurons. Hubel and Wiesel [37]–[39] suggested that complex cells may integrate the responses of simple cells. In addition, [40] showed that complex cells may be the result of the addition of simple cells along the axis perpendicular to their orientation. Following these studies, in our model a complex cell is the weighted sum of 5 laterally displaced model simple cells within a column. The model complex cell response is given by [30]:
$$R_{CX} = \sum_{i=1}^{5} c_i\,\phi(R_i)$$
where $R_i$ is the response of the ith cell and $c_i$ is its weight. Model cells are Gaussian weighted by position, with the weight decreasing with distance to the center. $\phi$ is a rectification function, where any value less than 0 is set to 0. Model simple cells combining into a model complex cell are laterally displaced, their displacement being proportional to the cell's size as well as the height ($\sigma_y$) and width ($\sigma_x$) of the Gaussian function. Displacement is along the orientation perpendicular to the preferred one (using the modulo function to keep values within the range of orientations), with components $d_x$ (displacement in the x axis) and $d_y$ (displacement in the y axis).
The construction of a model complex neuron is depicted in Figure 2A. The orientation of its model simple neuronal components in this case is 90° (vertical), while the 5 model simple cells are organized perpendicularly (spatially displaced but overlapping) to this preferred orientation, that is, along 0°. This results in slightly less sensitivity to orientation, since each model complex cell integrates five model simple cells. A model complex neuron yields a positive response for stimuli at more locations inside its receptive field, and its receptive field is larger as well.
These characteristics follow [37]–[39] and up to this point our model simple and complex cells follow [9] and share some similarities with its followers as well [15], [21], [41].
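The pooling step described for model complex cells can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the names `rectify`, `lateral_offset` and `complex_response` are hypothetical, the weights are normalized to sum to 1, and the step of 10 pixels and `weight_sigma` are placeholder values, since the source only states that the displacement is proportional to cell size and Gaussian dimensions.

```python
import math


def rectify(value):
    """Half-wave rectification: any value less than 0 is set to 0."""
    return max(0.0, value)


def lateral_offset(step, theta_deg):
    """(dx, dy) of one lateral step, perpendicular to the preferred orientation."""
    perpendicular = math.radians((theta_deg + 90.0) % 180.0)
    return step * math.cos(perpendicular), step * math.sin(perpendicular)


def complex_response(simple_responses, weight_sigma=1.0):
    """Gaussian-weighted sum of rectified, laterally displaced simple-cell responses.

    simple_responses is ordered by position; the middle entry is the center cell.
    Normalizing the weights to sum to 1 is an assumption of this sketch.
    """
    center = (len(simple_responses) - 1) / 2.0
    weights = [math.exp(-((i - center) ** 2) / (2.0 * weight_sigma ** 2))
               for i in range(len(simple_responses))]
    total = sum(weights)
    return sum(w / total * rectify(r) for w, r in zip(weights, simple_responses))


# Five simple cells for a vertical (90 degree) complex cell, displaced horizontally.
offsets = [lateral_offset(k * 10.0, 90.0) for k in (-2, -1, 0, 1, 2)]
response = complex_response([0.1, 0.6, 1.0, 0.6, -0.2])
```

Because each input is rectified before pooling, the negative response of the last simple cell contributes nothing to the sum.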
Figure 2
Endstopping.
(A) Model complex cell. (B) Structure of model endstopped cell. (C) Response of the model endstopped cells to different radii of curvature. Simple cell sizes were 40 (blue color), 80 (red color), 100 (green color) and 120 pixels (black color). $\sigma_y$ = (10, 20, 25, 30). AR (aspect ratio) = (1.15, 2, 3, 4). WR (width ratio) = 2.5 for all cells. Gain c = (0.7, 0.8, 1, 2). Responses were normalized to the range [0, 1].
Model endstopped cells
Endstopped - also known as hypercomplex - neurons respond to contours, both real and illusory [42]. A more recent study [43] found that although V2 neurons are mainly selective for angles and corners, these neurons also show submaximal responses to bars. Model endstopped cells result from the difference between a simple cell and two displaced complex cells [44]. At this point, our model diverges strongly from the formulations in the previous works cited above. When simple and complex cells are combined at the same orientation, we can distinguish between degrees of curvature. Through the use of model complex cells at different orientations with respect to the simple cell, we can obtain the sign of the curvature. These two model neuron types are explained next.
Model cells discriminant to the degree of curvature
This model endstopped cell is the neural convergence of a model simple neuron and two displaced model complex neurons selective for the same orientation, as follows (Figure 2B):
$$R_{ES} = \Phi\left[c_c\,\phi(R_c) - \left(c_{d_1}\,\phi(R_{d_1}) + c_{d_2}\,\phi(R_{d_2})\right)\right]$$
where $c_c$, $c_{d_1}$ and $c_{d_2}$ are the gains for the center and displaced cells, and $R_c$, $R_{d_1}$ and $R_{d_2}$ are the responses of the center and the two displaced cells. $\phi$ is a rectification function, where any value less than 0 is set to 0. $\Phi$ is a sigmoidal function, whose parameter values are given in the methods section, that scales responses to highly intense stimuli. Displaced cells are shifted 1/2 of their receptive field size in the direction of their preferred orientation. The center simple cell has an excitatory effect, while the two complex cells (at the top and bottom in Figure 2B), which are wider than the center cell, have an inhibitory effect, following [45], [46]. This design follows the work of [28], [30], [47] and [44], [45], [48], [49].
Thanks to this configuration of simple and complex cells, we obtain a coarse estimate of curvature such that different curvatures can be discriminated into classes. Figure 2C shows how this type of cell can discriminate among different degrees of curvature. The plot shows how arcs of different radii elicit different responses from this type of cell depending on the size of the component simple and complex cells. The scales of the simple and complex neurons that are combined in the configuration of endstopped cells play an important role in this curvature discrimination, as shown in Figure 2C. Different neuronal sizes provide a different response to different degrees of curvature. The smallest model endstopped neuron (Figure 2C blue plot, simple cell size 40 pixels) is selective for very high curvatures, while the largest model endstopped neuron (Figure 2C black plot, simple cell size 120 pixels) is selective for very broad curvatures. The in-between scales (sizes of 80 and 100 pixels) respond preferentially to intermediate curvatures (red and green plots).
Note that this configuration also responds maximally to bars of a specified length (that of the simple cell at the center), as is the case for real endstopped cells. Also note that the choice of these sizes, and even the number of sizes or scales in the model overall, is at the discretion of the modeler, so that the space of visual contours addressed by the model is best fit by the scales represented.
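The endstopped combination described above, an excitatory center minus two inhibitory displaced cells, can be sketched as below. This is a hedged illustration: `endstopped_response` and its default gains are hypothetical, and `math.tanh` is used only as a stand-in saturating function, since the source defers the sigmoid's parameters to its methods section.

```python
import math


def rectify(value):
    """Half-wave rectification: any value less than 0 is set to 0."""
    return max(0.0, value)


def endstopped_response(center, displaced_1, displaced_2,
                        gain_c=1.0, gain_d1=0.5, gain_d2=0.5, scale=1.0):
    """Excitatory center simple cell minus two inhibitory displaced complex cells."""
    drive = (gain_c * rectify(center)
             - (gain_d1 * rectify(displaced_1) + gain_d2 * rectify(displaced_2)))
    # Stand-in saturating nonlinearity (an assumption, not the source's sigmoid).
    return math.tanh(drive / scale)


# A contour confined to the center region versus a long bar covering all three cells.
short_contour = endstopped_response(center=1.0, displaced_1=0.0, displaced_2=0.0)
long_bar = endstopped_response(center=1.0, displaced_1=1.0, displaced_2=1.0)
```

With these placeholder gains the long bar is fully suppressed while the short contour is not, which is the qualitative endstopping behaviour the text describes.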
Model cells selective to the sign of curvature
Apart from the degree of curvature, an additional contour characteristic that V2 cells seem to encode is the sign of curvature [28], [50]. Through the local information available to endstopping we may compute the sign of curvature. Here, in contrast to the curvature model cells, each displaced complex cell has a different orientation from the simple cell, and the two model complex cells are oriented at opposite signs (e.g. 45° and 135° for the 0° model endstopped neurons) (Figure 3). A hint regarding this concept was first proposed by [30], which is extended here to all orientations and used on curvatures.
Figure 3
Model endstopped cell selective for curvature sign.
For one sign of curvature, a curve excites the model simple excitatory cell at the center, but the parts of the curve falling into the region of the model complex inhibitory cells reduce the response of the model endstopped cell. A similar curve of the opposite sign passes only through the excitatory region (model simple cell), so the curve has no inhibitory effect (or a very low one) on the overall response of the model endstopped cell, since it does not fall, or barely falls, on the model complex cell receptive fields (Figure 3).
Two types of model sign cells are used. These different signs are obtained by changing the order of the displaced subtracted neurons:
$$R_{+} = \Phi\left[c_c\,\phi(R_c) - \left(c_{d_1}\,\phi(R_{d_1}^{45°}) + c_{d_2}\,\phi(R_{d_2}^{135°})\right)\right]$$
$$R_{-} = \Phi\left[c_c\,\phi(R_c) - \left(c_{d_1}\,\phi(R_{d_1}^{135°}) + c_{d_2}\,\phi(R_{d_2}^{45°})\right)\right]$$
where $c_c$, $c_{d_1}$ and $c_{d_2}$ are the gains for the center and displaced cells as before, and $R_c$, $R_{d_1}$ and $R_{d_2}$ are the responses of the center and displaced cells. The difference here is that the displaced cells are at orientations different from that of the preferred center simple cell: for the positive sign model endstopped neuron, the displaced model complex neuron d1 is at 45°, while the model complex component d2 is at 135°. For the negative sign model endstopped cell, the order is the opposite. For best results, these model cells required larger receptive field overlap than their degree-of-curvature endstopped counterparts (see methods).
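The two sign-selective cells differ only in which orientation is assigned to which displaced complex cell, which the following sketch makes explicit. It is an illustration under assumptions: the function `sign_responses`, the dictionary layout of the inputs, the default gains and the final rectification are all hypothetical, and the saturating nonlinearity is left out for brevity.

```python
def rectify(value):
    """Half-wave rectification: any value less than 0 is set to 0."""
    return max(0.0, value)


def sign_responses(center, d1, d2, gain_c=1.0, gain_d1=0.5, gain_d2=0.5):
    """Return (positive, negative) sign-selective endstopped responses.

    d1 and d2 map an orientation in degrees (45 or 135, relative to a
    0 degree center cell) to the complex-cell response at each displaced position.
    """
    positive = (gain_c * rectify(center)
                - (gain_d1 * rectify(d1[45]) + gain_d2 * rectify(d2[135])))
    negative = (gain_c * rectify(center)
                - (gain_d1 * rectify(d1[135]) + gain_d2 * rectify(d2[45])))
    # Rectifying the outputs is an assumption of this sketch.
    return rectify(positive), rectify(negative)


# A curve whose flanks drive the 135 degree cell at d1 and the 45 degree cell at d2.
positive, negative = sign_responses(
    center=1.0,
    d1={45: 0.1, 135: 0.9},
    d2={45: 0.9, 135: 0.1},
)
```

Here the curve mostly misses the inhibitory cells of the positive-sign neuron, so `positive` is larger than `negative`; mirroring the curve swaps the two.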
Model local curvature cells
This type of cell is the result of the combination of the responses from the two types of model endstopped cells (degree and sign of curvature), e.g. a model curvature cell that is selective for broad curvatures whose sign is positive, as opposed to a model cell also selective for broad curvatures whose sign is negative. Through this neural convergence of model endstopped cells discriminative to the degree of curvature and those discriminative to the sign of curvature, we obtain twice the number of curvature classes. For example, if we have four types of model endstopped cells, through the use of the sign of curvature of those cells we obtain eight curvature classes:
$$R_{\theta, r_i, s} = \begin{cases} R_{ES_i} & \text{if } s = + \text{ and } R_{+} > R_{-} \\ R_{ES_i} & \text{if } s = - \text{ and } R_{+} \le R_{-} \\ 0 & \text{otherwise} \end{cases} \qquad i = 1, \ldots, n$$
where $R_{\theta, r, s}$ denotes the response of a neuron tuned to angle $\theta$, curvature r and sign s, n is the number of model endstopped cell types, $R_{ES_i}$ is the response of the model endstopped cell i, and $R_{+}$ and $R_{-}$ are the responses of the model sign selective endstopped neurons. In the realization of our model i = {1, 2, 3, 4} and n = 4 (see Material and Methods). This equation reads as follows: if the value of $R_{+}$ is greater than $R_{-}$, curvature class i (positive sign) has the same value as the model curvature endstopped cell; otherwise, class i + n (negative sign) contains that value and class i is 0. For the case where the response from endstopped cells is small, a high response from a model orientation simple cell means the contour is a straight line, so its curvature is set to 0. $R_{\theta, r, s}$ is computed at each location.
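The rule just described, in which the sign cells route each degree-of-curvature response into one of two classes, can be sketched as follows. The function name `local_curvature_classes` and the layout of the output list (positive-sign classes first, negative-sign classes in the second half) are assumptions for illustration, and the straight-line case mentioned in the text is not handled here.

```python
def local_curvature_classes(endstopped, positive, negative):
    """Split n degree-of-curvature responses into 2n signed curvature classes.

    endstopped: responses of the n model endstopped cells (e.g. sharp to broad).
    positive, negative: responses of the two sign-selective endstopped cells.
    Entries 0..n-1 hold the positive-sign classes, entries n..2n-1 the negative ones.
    """
    n = len(endstopped)
    classes = [0.0] * (2 * n)
    for i, response in enumerate(endstopped):
        if positive > negative:
            classes[i] = response        # class i takes the endstopped value
        else:
            classes[i + n] = response    # class i + n takes it; class i stays 0
    return classes


# Four endstopped cell types give eight curvature classes.
classes = local_curvature_classes([0.9, 0.3, 0.1, 0.0], positive=0.8, negative=0.2)
```

With n = 4 as in the text, a sharply curved contour of positive sign activates only the first half of the list.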
Model shape cells
V4 cells are quite sensitive to shape and less sensitive to spatial position [51]. Experiments in area V4 [52] and TEO [53], [54] of the macaque monkey seem to point to a strategy of recognition of objects by parts. In the case of V4 and TEO, those parts would be local curvatures [52], [54]–[56]. The response to a shape could correspond to the response of the local curvatures of the object. In TEO, some components of local curvatures excite the neuron, and others inhibit its response [54].
Neurons in areas V4 and TEO share similar characteristics regarding shape analysis [54], [56] and selectivity [57]. Although similar, TEO neurons show a higher degree of complexity than V4 neurons [54]. Our model shape neurons mimic that curvature-by-parts representation of shapes and silhouettes but are slightly more complex than just the curvature × angular position coding proposed by [56] for V4 neurons, since they are selective not only to curvatures at angular positions but also to the distance of the curvature element from the center of the shape. This conveys more information regarding the contour element. A shape would be different if the curvature is far away from the shape center or near the shape center, even though its angular position is the same. We thus make use of both components to describe the position of the curvature element better than with just one of them (angular position), as proposed in [56].
Our model shape cells integrate the responses from a population of model local curvature neurons to encode a shape. The proposed response of a model's shape neuron at location x is:
$$R_{Shape}(x) = G(x; x_0)\,\max_{\theta}\left[R_{\theta, r, s}(x)\right]$$
where $R_{\theta, r, s}(x)$ denotes the response of a model local curvature cell tuned to angle $\theta$, curvature r and sign s at location x, and $G(x; x_0)$ is a Gaussian weight centered at $x_0$ (x and $x_0$ are in polar coordinates). max selects the maximum response from the local curvature cells over all angles, since at this level of the architecture the importance lies in the responses of curvature neurons to curvatures, not in their orientation.
A model shape neuron will respond to a shape, and depending on how close the stimulus is to its selectivity (controlled through a parameter given in Material and Methods), its response will be stronger or weaker. The total response of a shape neuron is the summation over all p locations:
$$R_{Shape} = \sum_{j=1}^{p} R_{Shape}(x_j)$$
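A shape neuron's response, as described above, weights the strongest local curvature response at each location by a Gaussian in polar coordinates and sums over locations. The sketch below illustrates that idea under several assumptions: the names `polar_weight` and `shape_response`, the separable Gaussian over radius and angle, and the two width parameters `sigma_radius` and `sigma_angle_deg` are all hypothetical, and the neuron is given a single preferred curvature part for simplicity.

```python
import math


def polar_weight(radius, angle_deg, pref_radius, pref_angle_deg,
                 sigma_radius, sigma_angle_deg):
    """Gaussian weight around a preferred position given in polar coordinates."""
    d_radius = radius - pref_radius
    # Wrap the angular difference to [-180, 180) degrees.
    d_angle = (angle_deg - pref_angle_deg + 180.0) % 360.0 - 180.0
    return math.exp(-d_radius ** 2 / (2.0 * sigma_radius ** 2)
                    - d_angle ** 2 / (2.0 * sigma_angle_deg ** 2))


def shape_response(parts, pref_radius, pref_angle_deg,
                   sigma_radius=20.0, sigma_angle_deg=20.0):
    """Sum over locations of (position weight) x (max over orientations).

    parts: list of (radius, angle_deg, responses), where responses holds the
    local curvature responses of the preferred curvature class over orientations.
    """
    total = 0.0
    for radius, angle_deg, responses in parts:
        weight = polar_weight(radius, angle_deg, pref_radius, pref_angle_deg,
                              sigma_radius, sigma_angle_deg)
        total += weight * max(responses)
    return total


# A neuron preferring a sharp curvature at the top left (radius 100, angle 135 degrees).
at_preferred = shape_response([(100.0, 135.0, [0.2, 0.9, 0.4])], 100.0, 135.0)
nearby = shape_response([(100.0, 115.0, [0.2, 0.9, 0.4])], 100.0, 135.0)
far_away = shape_response([(100.0, 270.0, [0.2, 0.9, 0.4])], 100.0, 135.0)
```

The same curvature part produces a full response at the preferred position, a reduced one nearby, and practically none at the bottom of the shape.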
Response of a model shape neuron in curvature space
The response of the model shape neuron of Figure 4A depends on how close the stimulus is to its curvature-by-parts selectivity. In the figure, the model neuron is selective to a sharp curvature at the top left. This neuron would respond maximally when that feature is present at that specific location, but it would also respond, with a lower value, to a broader curvature at that location, and it would have a small response to a very broad curvature or a straight line.
Figure 4
Shape-selective neuron.
(A) Shape-selective neurons respond to different curvatures at different positions. The response is maximal when those curvatures are present at their selective positions (red). If they are in nearby positions the neuron provides some response as well (orange and yellow). (B) Shape-selective neuron tuning profile for location and curvature. (C) Shape neuron response to different stimuli, maximum response is to the stimulus at the top (value 1).
Model shape neurons exhibit band-pass tuning for curvature information. Their responses peak at a specific curvature and then decay, providing a decreasing response for curvature values at increasing distance. No response is provided for curvatures very far from the optimal one. The model shape neuron in this example is thus selective for those model endstopped neurons that respond strongly to sharp curvatures at that position. Since a model endstopped neuron with a high response to a sharp curvature also has some response to a slightly broader type of curvature, model shape neurons will not provide a binary response but a range of responses depending on the distance between curvatures in curvature space (Figure 4B,C).
Response of a model shape neuron based on curvature locations
Features (curvatures) comprising the model shape neuron are weighted by a factor (Equation 8) that depends on how close the desired curvature is to the desired position (Figure 4A). Continuing with the example of a neuron selective for a sharp curvature at the top left, this model neuron will have a high response to any stimulus that contains such a sharp curvature at that position. Some response will still be elicited by a nearby position, e.g. a sharp curvature at the top mid-left, but no response will be obtained for a sharp curvature at far away positions (e.g. a sharp curvature at the bottom) (Figure 4B).
The curvatures that fall into the cell's preferred positions are considered at their full value (red in Figure 4A), but if they fall close by, they are weighted in a Gaussian manner depending on how far from the preferred position they are (orange and yellow in Figure 4A).
This is encoded using polar coordinates [52], that is, the radial distance to the center of the model shape neuron and its angular position.
Representational adequacy
In the words of Pasupathy and Connor [52]: "The population code for shape has to accommodate the virtual infinity of possible objects as well as the variability of a given object's retinal image." Our model shape neuron has the capability of representing that virtual infinity of objects. If we consider that our stimuli are within 400 × 400 pixel images, for the bin size selection used in the experiments below (see Material and Methods) this gives a total of 1,800 possible curvature parts inside a model shape neuron receptive field. In the case of only 8 curvature classes, when we consider any possible combination of curvature/location, our model can represent a maximum of 14,400 (approximately 10 to the power of 86400) possible configurations of stimuli. In practice, one might take into account Gestalt properties such as continuity, proximity and others, and that number can be reduced to reflect only realizable configurations. The point here is that this representation is sufficiently rich to enable coding of a wide variety of shapes, and task knowledge or learning through developmental experience will help determine the relevant subset for a given task domain.
Comparison with biological neurons from area V4
Here we compare the performance of the model shape neurons with neurons in area V4 of the macaque's visual cortex from the same study on which our shape cells are based. For most cells in area V4 of the macaque, shapes evoking the strongest responses are characterized by a consistent type of boundary configuration at a specific position within the stimulus [56]. We show that this behavior is compatible with the model shape-selective neurons constructed as explained previously.
Pasupathy and Connor [56] recorded the responses of 109 neurons to 366 different shapes. Each cell in the sample responded to a variety of very different shapes. No cell displayed a response pattern that could be characterized in terms of a single type of global shape. However, for most cells the effective stimuli showed some degree of shape consistency at one position. In other words, these cells were tuned for boundary configuration in one part of the shape.
In order to demonstrate the plausibility of our shape neurons and the hypothesis that curvature and shape may be encoded through endstopping, we study the behavior of the model shape neurons by comparing their responses against real neuron responses. We compared the responses from 75 of the 109 neurons recorded by Pasupathy and Connor's group, namely those cells where the shape consistency was clearest (see Material and Methods). The data from real neurons used in this set of experiments were kindly provided by Dr. Anitha Pasupathy.
We first compared the responses from our shape-selective neurons with the four examples from [56]. We start with Figure 2 from [56] (our Figure 5). Real V4 neuron responses are on the left (stimuli within circles), and our model shape neuron equivalent responses are on the right (stimuli within squares). In both cases, each row contains stimuli consisting of 2 shapes (one after the other) rotated in steps of 45°. This is the stimulus set used by [56]. Each stimulus is represented by a white icon drawn within a circle (Pasupathy and Connor's results) or within a square (model shape neuron responses) representing the unit receptive field. The darker the background behind the icon, the higher the response of the neuron to that shape. This applies both to Pasupathy and Connor's neuron recordings and to our model shape neuron.
Difference between the model's Shape-selective neurons and 75 real cells responses from area V4.
For both cases we can see that there are only a few model shape neurons with over 20% error, most of the differences between the model and that of real cells fall in the range 10–20%. Average error for all model shape neurons was 16.95% for the center of the model neuron (stdev = 12.61) and almost the same when using the centroid of the shape (error = 16.98%, stdev = 12.25). This shows that even for such a large number of neurons the model performs well and the difference between the response of the model shape-selective neurons and that of real cells is small. In direct comparison with the only other work to compare performance to this dataset of neural responses, our method significantly outperforms [25].
Discussion
We have presented a model of 2D shape representation - 2DSIL - that follows the structure and behavior of the visual cortex. Building on past conjectures that one of the functional roles of endstopped cells may be to aid in shape analysis [28], [47], [55], we set out to define a biologically plausible computational model of shape representation. Here, we tested this hypothesis and have shown how a hierarchy that starts from basic simple neurons, which combine into complex neurons and then endstopped neurons, provides local curvature neurons that are selective for shape stimuli.
The main element in this architecture is the model shape-selective neuron, which represents curvature parts in a curvature × position (radial and angular) domain. The possible number of shapes that may be represented by our model shape neurons is very large, considering the limited types of neurons at each level of the architecture. Even though the primate visual system and our model have the capability to represent a virtual infinity of shapes, the way to handle the large but finite number of shapes in our world may be achieved through learning, selecting those configurations of curvatures relevant to recognize the shapes around us based on our visual experiences. Since the representation has the capability to represent any shape, a new shape can be easily incorporated into the system. The model supports a recognition-by-parts strategy, in which the parts are curvature values at different positions, as suggested also by Connor's group [54]. We have compared the response of our model shape neurons with 75 real neurons from [56]. The results obtained by the model are very similar to those of the neurons, and were accomplished without any learning or classifier method.
Our model local curvature neurons do not provide an exact value of curvature but can discriminate among degrees of curvature (e.g. 4 in Figure 2C).
This was done using a starting point where V1 is composed of neurons of different sizes. Through the use of different neuronal sizes and the integration of model simple neurons into model complex neurons, we obtained model endstopped neurons able to discriminate between degrees of curvature, from very sharp to very broad (Figure 2C). It is also important to note that these neurons do not provide a binary response for a given curve; model local curvature neurons provide band-pass curvature filtering, with the highest response to the selective curvature and a response that decays as the distance from it in curvature space increases. The responses of model endstopped and curvature neurons over a range of curvatures have a Gaussian shape (Figure 2C), as does the response of a model shape neuron (Figure 4B). There is no maximum selection over the responses from early areas, so no information is lost when ascending the hierarchy in a feedforward direction. However, there is a max selection computation at the last stage of the hierarchy, the shape cells, where it no longer affects further decisions, in keeping with Marr's Principle of Least Commitment [27]. We consider that any attentive selection, filtering or bias [58]–[61] in such a hierarchy would occur top-down, and we leave that for future work. Interestingly, our model of sign endstopped neurons could provide a foundation for dealing with the border-ownership problem. Sign endstopped neurons could represent opponent channels [62], and this, combined with feedback modulation through a model of attention (e.g. [58]), would further support a model such as the one presented by [33] on border ownership.
Our model may be considered a major extension of the works [9] and [28], [30]. In a similar work, Serre, Cadieu and colleagues construct a hierarchical representation with a first layer computing oriented edge responses. 
This is followed by a maximum response selection layer that feeds a pooling stage that groups spatial piece-wise linear elements. This strategy - borrowed from Fukushima's NeoCognitron [9] - is repeated for each layer of the hierarchy. Curved lines are thus approximated by linear pieces and there is no direct computation of curvature of any form. Another related model, based on excitatory connections, is the one proposed by Amit [20]. One important difference (among others) between our model and these types of models is that we use inhibition for curvature representation through endstopping instead of purely excitatory components. Inhibitory flankers as proposed in our model have been strongly supported by neurophysiological studies [39], [44]–[46], [48], [49], and since our goal is to test the computational embodiment of these neurophysiological results, inhibition necessarily figures prominently in our model. It is an aspect that is considered of great importance by neuroscientists [46] and, surprisingly, has been neglected in models to date.
Given that it seems accepted that the visual system computes increasingly abstract quantities as a signal ascends the visual processing hierarchy, are those quantities computed by applying the same computation at every layer, so that neural convergence alone suffices to achieve abstraction, or is it truly necessary to include more sophisticated computations layer by layer? This is not easy to answer in the general case. However, we can point to one important instance that supports the latter position. In our previous work on motion processing [63], we found that simple neural convergence did not suffice. We needed to include a layer of neurons selective to the spatial derivative of velocity, a much more complex construct. This is supported by neurophysiology in monkey [64], [65] and by our own human fMRI studies [66]. 
Similarly, for shape representation, although our approach is also based on a hierarchical set of computations, we deploy different processes at each layer, not simply repetitions of the same process. Those different processes are intended to reflect the reality of the different neural computations in the visual cortex. Our approach is distinct in that we perform a direct computation of curvature and of the sign of curvature. We develop that computation using well-documented neural computation types that include not only oriented simple cells and complex cells (as the pooling layer of others is intended to capture) but also endstopped cells, curvature cells, and curvature sign cells. These naturally provide a sufficient basis for the definition of shape cells, a basis that not only better mirrors the neurophysiological reality of the visual cortex, but also provides a richer substrate for shape definition than piecewise linear components. This is the first model of shape representation (to the best of our knowledge) to include the aforementioned cells in intermediate layers, departing from the near-universal previous use of Fukushima's S and C types of cells.
The role of learning from examples also differs between our work and those mentioned. Although a statistical learning approach such as that employed by Serre, Cadieu and colleagues for all of the layers of their processing hierarchy except the first is valuable when there is no other option, we show that in the case of the successive representations, namely those computed by endstopped and curvature cells, there is now sufficient knowledge to model these cells directly and to do so with a high degree of fidelity. Learning is not required if the appropriate representations are selected in the first place.
Although this paper does not address object recognition directly, it may provide important contributions to elements that may advance the state of the art. 
In a previous paper [34], we connected the 2DSIL representation to a recognition system and compared its performance in object recognition tasks with that of several other systems, including benchmark systems. Our system performed well, beating other systems in several categories while maintaining comparable performance in others. Following previous authors such as Zucker and Marr, we advocate that a deeper understanding of visual processes in humans and non-human primates can lead to important advancements in perceptual theories and computational systems.
With the model introduced in this paper we follow in the steps of early theories of vision [9], [26], [67] and propose how, following the philosophy of those influential works, to take modeling to a next stage by incorporating new intermediate-layer computations, in the hope that future work will continue building on these hierarchies aimed at modeling the visual cortex.
Materials and Methods
We used the same stimuli created for [52], [56]. To construct the stimuli, a Matlab program was provided by Dr. Anitha Pasupathy. The stimuli were constructed by combining convex and concave boundary elements to form closed shapes. Boundary elements include sharp convex angles, and medium and high convex and concave curvatures. The combination of these boundary elements gave rise to 49 different stimuli. Stimuli were composed of white edges against a black background; the inside was black as well, but it is shown in our figures (Figures 5, 6, 7, and 8) as white-filled for illustration purposes. For the experiment, the stimuli were those 49 shapes rotated to 8 orientations (some only 2 or 4 due to redundancies) in 45° increments, giving a total of 366 different shapes. Stimuli are shown in Figures 5, 6, 7, and 8.
Experiments were run in Matlab on a Mac G5 PowerPC. The input to the model is a gray-value image. The images used are 400×400 pixels; a shape spans 300×300 pixels and corresponds to the stimuli used in the aforementioned study. For our experiments, we used 12 orientations (0°, 15°, 30°, 45°, 60°, 75°, 90°, 105°, 120°, 135°, 150°, 165°) and 4 different sizes for model simple cells, which gives a total of 48 types. The sizes of the V1 model simple neurons are 40, 60, 88 and 120 pixels, and their corresponding values for AR are 0.7, 1.4, 2.15 and 3, respectively; WR is 2.5 for all model neurons.
For the integration into model endstopped neurons, the values of the gain c (Equation 4) for the displaced neurons were, from the smaller to the larger cell, {1.5, 1.25, 1, 3}; the gain was 1 for all center cells. For the chosen parameters, cells respond (at 90% of their maximum value) to the following ranges of curvature radius: 6 to 11, 25 to 52, 48 to 77 and 140 to 301 pixels. Refer also to Figure 2C for an example of how the selection of these parameters (size, AR, WR and gain) affects neuronal curvature selectivity. 
The two parameters of the rectification function (Equation 5) were set, respectively, to 0.01 and to the maximum response of the set of neurons for a given scale divided by 8.5, a factor that provided a good normalization approximation for this sigmoidal saturation function. The displacement for model endstopped neurons selective to degrees of curvature was 1/2 the size of the simple neuron component along its preferred orientation. Displacements for the model sign endstopped neurons were, from smaller to larger, 1/5 the size, 1/4 the size, 1/4 the size and 2/5 the size along the orientations stated in Equation 6. The 4 types of model endstopped neurons and the curvature direction selective neurons lead to eight curvatures. To obtain the aforementioned parameter values, we created a program that evaluates different parameter values. The target of this program was to obtain values that would provide neurons able to separate different degrees of curvature, producing a graph such as the one shown in Figure 2C.
Neuron responses were provided by Dr. Anitha Pasupathy for the comparison with model shape neuron responses. In their influential study [56], the results from 109 neurons are reported for 366 different stimuli. We compared with 75 of those 109 neurons; the reason for this, as well as the detailed process, is explained next. Due to the enormous range of shape representation of the model, we needed to select (or isolate, in neurophysiological terms) a subset of model shape neurons that would correspond to the 109 V4 biological counterparts recorded in [56]. To do this, we created new stimulus images and stored their model shape representation. These stimuli were created by superimposing the stimuli for which the biological neuronal responses were on the 70% maximum percentile (e.g. Figure 10A). 
This simple process gives us insight into the selectivity of the 109 biological neurons and is similar to the way [56] analyzes the selectivity of 4 neurons (Figures 2, 4, 5 and 8 in that work). That is, we consider the stimuli that maximize the neuron responses to reach the conclusion that a neuron is selective to some type of curvature at a specified position; e.g. in Figure 10A it is clear that this biological neuron is selective for a sharp curvature at the top-right, flanked by a broad concavity that ends in a medium convexity on the left side of the stimulus. This image is then modified so as to keep only the relevant curvatures. The result is the stimulus used to isolate our model shape neurons. It is also the stimulus for which the model shape neuron response is maximum.
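The selection-and-superposition step described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the procedure's actual implementation: we assume that the selected stimuli are those whose response reaches at least 70% of the neuron's peak response, and that superimposing means taking the pixel-wise maximum of the stimulus images. Both function names are hypothetical. The subsequent manual step of keeping only the relevant curvatures is not covered.

```python
def select_top_stimuli(responses, fraction=0.7):
    """Indices of stimuli whose response is at least `fraction` of the peak.

    Assumption: the selection is read as a threshold relative to the
    neuron's maximum response.
    """
    if not responses:
        raise ValueError("responses must not be empty")
    peak = max(responses)
    return [i for i, r in enumerate(responses) if r >= fraction * peak]


def superimpose(images):
    """Pixel-wise maximum of equally sized gray-value images (lists of rows).

    Assumption: overlaying white edges on a black background is modeled
    as a maximum; a sum or average would be another possible choice.
    """
    if not images:
        raise ValueError("at least one image is required")
    return [[max(pixels) for pixels in zip(*rows)] for rows in zip(*images)]


# Toy example: four 2x2 "stimuli" and the responses of one neuron.
stimuli = [
    [[1, 0], [0, 0]],
    [[0, 1], [0, 0]],
    [[0, 0], [1, 0]],
    [[0, 0], [0, 1]],
]
responses = [0.2, 0.8, 1.0, 0.69]
chosen = select_top_stimuli(responses)
overlay = superimpose([stimuli[i] for i in chosen])
print(chosen, overlay)
```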
Figure 10
How the features for isolating a Shape neuron are obtained.
See text.
We repeated this process for the 109 biological neurons, but 34 of them failed to provide any clear insight into their selectivity using the present process (e.g. Figure 10D). On the other hand, the other 75 provided a very clear picture of their selectivities (Figure 10A–C). We then stored the representation (Figure 4A) of each model shape neuron for the stimuli created in the way explained above. The weights (Equation 8) are derived from the responses of the eight curvature-class model neurons at their different positions. Model shape neurons' receptive fields were organized into angular-radial bins (Figure 4A) of 10 pixels for radial values and /45 for angular values. A smaller bin size did not provide significantly better results while incurring a much higher computational load.
For each of the model shape neurons isolated this way, we recorded responses to each of the 366 stimuli in [56]. Each response is normalized to the 0–1 range, using as the normalization factor the maximum response for the created stimulus explained before. These normalized responses were compared to their biological counterparts (responses already normalized), and the absolute value of the difference was computed for each of the 366 stimuli. Figure 9 shows the averages of these values with their corresponding standard deviations for each neuron.
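As an illustration of the angular-radial organization of the receptive field, the following Python sketch assigns an image position to a (radial, angular) bin relative to a reference point such as the neuron center or the shape centroid. It is a sketch only: the function name is hypothetical, the radial step of 10 pixels is taken from the text, and the angular step is left as a required argument. The value used in the example (π/4) is purely illustrative and is not the value used in our experiments.

```python
import math


def polar_bin(x, y, cx, cy, angular_step, radial_step=10.0):
    """Return (radial_bin, angular_bin) of point (x, y) relative to (cx, cy).

    radial_step is in pixels and angular_step in radians; angles are
    measured counter-clockwise from the positive x axis and wrapped
    into the interval [0, 2*pi).
    """
    if angular_step <= 0 or radial_step <= 0:
        raise ValueError("bin sizes must be positive")
    dx, dy = x - cx, y - cy
    radius = math.hypot(dx, dy)
    angle = math.atan2(dy, dx) % (2.0 * math.pi)
    return int(radius // radial_step), int(angle // angular_step)


# Illustrative angular step only (not the value used in the experiments).
step = math.pi / 4
for point in [(20, 5), (-5, 20), (3, -4)]:
    print(point, polar_bin(point[0], point[1], 0, 0, step))
```

Curvature responses falling in the same bin could then be accumulated to form the kind of representation shown in Figure 4A.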