Literature DB >> 19412529

Predictive feedback can account for biphasic responses in the lateral geniculate nucleus.

Abstract

Biphasic neural response properties, where the optimal stimulus for driving a neural response changes from one stimulus pattern to the opposite stimulus pattern over short periods of time, have been described in several visual areas, including lateral geniculate nucleus (LGN), primary visual cortex (V1), and middle temporal area (MT). We describe a hierarchical model of predictive coding and simulations that capture these temporal variations in neuronal response properties. We focus on the LGN-V1 circuit and find that after training on natural images the model exhibits the brain's LGN-V1 connectivity structure, in which the structure of V1 receptive fields is linked to the spatial alignment and properties of center-surround cells in the LGN. In addition, the spatio-temporal response profile of LGN model neurons is biphasic in structure, resembling the biphasic response structure of neurons in cat LGN. Moreover, the model displays a specific pattern of influence of feedback, where LGN receptive fields that are aligned over a simple cell receptive field zone of the same polarity decrease their responses while neurons of opposite polarity increase their responses with feedback. This phase-reversed pattern of influence was recently observed in neurophysiology. These results corroborate the idea that predictive feedback is a general coding strategy in the brain.

Entities: Disease Gene Species

Mesh：

Year: 2009 PMID： 19412529 PMCID： PMC2670540 DOI： 10.1371/journal.pcbi.1000373

Source DB: PubMed Journal: PLoS Comput Biol ISSN： 1553-734X Impact factor: 4.475

Introduction

Cells in the LGN exhibit striking receptive field dynamics. Besides their well-known center-surround organization, LGN receptive fields are characterized by bright-excitatory (dark-excitatory) regions that become dark-excitatory (bright-excitatory) over time intervals that may be as short as 20 milliseconds [1]–[3]. Biphasic responses have been described not only in the LGN, but seem to be characteristic of neurons in many visual areas. For example, biphasic responses have been observed in primary visual cortex [3],[4], and also in MT, where the optimal stimulation changes from one direction of motion to a 180° reversal in motion preference with time [5],[6]. What computational reason would neurons have to change their preferred stimulus over such short periods of time? Here, we argue that biphasic dynamics naturally follow from neural mechanisms of predictive coding. A longstanding approach to understanding early-level processing has been to consider it in terms of efficient coding of natural images [7]–[10]. Natural images are typically highly correlated in both space and time, and a neural code that ignores these correlations would be very inefficient. It has therefore been postulated that early-level visual processing removes correlations in the input, resulting in a more sparse and statistically independent output. Building along these lines, it has been suggested that early visual areas remove correlations by removing the predictable, and hence redundant, components in their input. For example, the center-surround structure of LGN receptive fields can be explained using predictive coding mechanisms [11],[12]. Because a center pixel intensity value in natural images can often be predicted from its surrounding values, its value can be replaced with the difference between the center value and a prediction from a linear weighted sum of its surrounding values. This decorrelates the neuronal input and removes redundancy in the outputs [7],[13]. Predictive coding may have further value as a general principle that works through interactions between all lower-order and higher-order visual areas [14]–[16]. Low-order and high-order visual areas are reciprocally connected [17], and responses of neurons in these areas are often correlated due to their overlapping receptive fields. To reduce redundancy and decorrelate the visual responses, low-level visual input could therefore be replaced by the difference between the input and a prediction from higher-level structures. Put another way, higher-level receptive fields could represent the predictions of the visual world, while lower-level areas could signal the error between predictions and the actual visual input [14]–[16],[18],[19]. An advantage of feedback interactions over local, within-area computations is that higher-level cortical receptive fields are larger and encode more complex stimuli, therefore allowing for complex predictions about large portions of the visual field. This hypothesis has been shown to account for steady state extra-classical receptive field effects such as end-stopping [14]. Here, we show by simulation that biphasic responses may result from similar interactions with higher-order areas, which remove redundancy by removing the predictable components in their input. We focus on the LGN and V1, for which the feedforward-feedback connectivity structure and bottom-up inputs are fairly well-known. Although responses of LGN cells tend to follow many of the characteristics of their retinal input [20], biphasic responses are stronger in geniculate neurons than in the retinal neurons driving their response [21]. We show that these stronger rebound effects in LGN may result from predictive feedback interactions with area V1. Moreover, after training on natural images, the model exhibits the brain's LGN-V1 connectivity structure, and it displays a phase-reversed pattern of influence of feedback on LGN cells. This phase-reversed pattern of influence was recently observed in neurophysiology [22].

Results

Hierarchical model of predictive coding

The model consists of two layers (Fig. 1). The first layer, which corresponds to part of the lateral geniculate nucleus, consists of on-center and off-center type units, with on-center type units coding for brighter stimulus regions and off-center type units coding for darker regions. The model's next higher level, which corresponds to an orientation column in primary visual cortex, receives input from the model LGN through feedforward connections. After receiving its LGN input, the feedforward V1 receptive field that best matches the input (i.e. the one that makes the most likely prediction) is selected with high probability, and the selected neuron feeds its prediction back to model LGN. The layout of feedback connections follows the structure of feedforward connections, as has been found experimentally [22],[23]. LGN neurons then compute the error between the higher-level prediction and the actual input. This error is sent forward to correct the higher-level prediction, and the entire process is repeated in the next feedforward-feedback cycle. Thus, a feedforward-feedback cycle comprises lower-level error detectors correcting higher-level predictions, and higher-level responses updating the lower-level error signals, similar to some previous models [14],[16],[18],[19],[24]. We assume that a single feedforward-feedback cycle takes around 20 milliseconds, but our results do not critically depend on the value of this parameter (see Methods).

Figure 1

Hierarchical model for predictive coding.

Hierarchical model for predictive coding.

Higher-level coding units attempt to predict the responses of units in the next lower level via feedback connections, while lower-level error detectors signal the difference between the prediction and the actual input. Feedforward connections encode the synaptic weights represented by a matrix . Higher-level units maintain the current estimate of the input signal and convey the top-down prediction to the lower level via feedback connections. Difference detectors compute the difference between current activity and the top-down prediction . Connection weights of the model are adapted to the input by minimizing the description length or entropy of the joint distribution of inputs and neural responses (Methods, see also [25]). This minimizes the model's prediction errors and improves the sparseness of the neural code. Thus, for any given input, the model converges to a set of connection weights that is optimal for predicting that input. The model is trained on image patches extracted from natural scenes, as receptive field properties might be largely determined by the statistics of their natural input [7],[11],[14],[26].

LGN-V1 connectivity structure after training

To characterize V1 model receptive fields, feedforward connection weights from on-center type and off-center type LGN cells coding for the same spatial location are summed for each of the model's 128 V1 cells. These summed weights are shown in Figure 2. This gives an indication of the V1 receptive fields, as V1 responses in the model are linear across their on and off inputs (Methods, equation 7). After training, the receptive fields show orientation tuning as found for simple cells in V1.

Figure 2

Receptive fields of model V1 units after training on natural images.

Receptive fields of model V1 units after training on natural images.

(A) Examples of natural images used for training. The square denotes model V1 receptive field size. (B) V1 receptive fields after training. Plots are scaled in magnitude so that each fills the gray scale, but with zero always represented by the same gray level. Black depicts off-regions in the model V1 receptive field, white depicts on-regions. In Figure 3, the relation between the learned receptive fields in model V1 and the properties of LGN units is further investigated. The figure depicts the connection weights from on-center type cells to a given V1 model neuron, as well as those from off-center type cells to the same V1 neuron. The on- and off-center units are spatially aligned with the on- and off-zones of the model V1 receptive field, as first proposed by Hubel and Wiesel [27] and later confirmed experimentally [22], [28]–[30]. Similar results are found for the connection structure of other V1 model neurons (results not shown). Note that this alignment is not predetermined in the model. The connections are initially random and are adjusted as a consequence of the model's learning rule together with exposure to natural images.

Figure 3

Connection weights after training.

The figure depicts learned connection weights from 64 LGN off-center type units and from 64 LGN on-center type units to one representative V1 unit. Red: connection weights from on-center type cells, blue: connection weights from off-center type cells. Brighter values indicate higher connection weights. The value zero is represented by the color black. The on- and off-center units are spatially aligned with the on- and off-zones of the model V1 receptive field.

Connection weights after training.

Reversal of polarity due to predictive feedback

To ascertain whether biphasic responses can be interpreted as the result of predictive feedback, we first consider a model with non-biphasic inputs (Fig. 4A and 4C). The spatio-temporal response of model on-center type geniculate cells is calculated using a reverse correlation algorithm (Methods, see also [1],[2],[31]). The time course of the model response is shown by a series of receptive field maps calculated for different delays between stimulus and response in Figure 4A. For comparison Figure 4B shows results obtained from on-center type cells in cat LGN [1]. As in cat LGN, model on-center type receptive fields are arranged in center and surround, and the bright-excitatory phase is followed by a dark-excitatory phase. Removing feedback in the model causes the previously biphasic responses to disappear (Fig. 4C), supporting the idea that predictive feedback may be important for rebound effects in neural response profiles. To determine whether predictive feedback can result in geniculate biphasic responses stronger than those in the retina, the model is modified to simulate biphasic retinal inputs (Methods). The temporal response profile of model on-center type cells is obtained using reverse correlation and illustrated in Figure 4D. Predictive feedback interactions cause reversals of polarity in LGN to be more pronounced than the retinal input, as has been observed in physiology [21].

Figure 4

Spatio-temporal response of LGN on-center type cell.

Spatio-temporal response of LGN on-center type cell.

The response was mapped with the reverse correlation algorithm, using either non-biphasic retinal inputs (A–C,E) or biphasic retinal inputs (D). (A) Spatio-temporal response of an on-center type cell in model LGN. Responses were obtained by cross-correlating stimulus and response at the time intervals given below the figures. Red: response to bright stimulus at that location, blue: response to dark stimulus at that location. Note the change in sign after 50 milliseconds. Similar results were obtained for other LGN on-center type units. (B) Spatio-temporal response profile of on-center type cells in cat LGN obtained with the reverse correlation algorithm [1]. (C) The removal of feedback in the model causes the previously biphasic responses to disappear. (D) Temporal response profile of on-center type cell in a model with biphasic retinal inputs. Model activity is normalized by the initial response magnitudes. The biphasic response in LGN is more pronounced in the presence of predictive feedback compared to a situation in which the LGN response is fully determined by biphasic retinal input (for comparison, see e.g. [21]). (E) Average model LGN and V1 representations after reference stimuli consisting of bright stimulus regions have been presented. Black depicts off regions, white depicts on regions. When V1 predictions of the bright reference stimulus arrive in model LGN, they are compared against a new and unexpected stimulus representation. The difference between the predicted bright region and the second stimulus is negative, exciting LGN off-center type cells. Why do biphasic responses appear in the mapped model LGN receptive fields? Recall that reverse correlation uses a large number of white noise stimuli presented in rapid succession, resulting in visual changes much faster than most natural input the system would encode. Consider when a stimulus consisting primarily of bright regions is presented to the model (Fig. 4E). On-center type LGN cells will respond to the onset of this stimulus. On zones in the LGN are linked to on zones of receptive fields in V1, which soon start to increase activation and make predictions. However, by the time that predictions of the first stimulus arrive in lower-level areas, the initial representation of the bright stimulus has been replaced by a second white noise stimulus, and the prediction is compared against a new and unexpected stimulus representation. Any given second white-noise stimulus region can be of either high or low luminance; however, the running average luminance will lie in between. In reverse correlation, predictive processing shows up as a comparison against this running average white-noise stimulation. The predicted bright region is of higher luminance than the average second stimulus, causing off-center type cells to respond to the offset of the bright reference stimulus. Reversals in polarity of model LGN cells are most profound in a small time window after presentation of the reference stimulus but disappear gradually later on. This happens because the initial prediction is dynamically updated to include predictions of stimuli presented after the reference stimulus, bringing new predictions closer to the average white-noise stimulation. Note that reversals in polarity will appear as long as predictions deviate from the average white-noise stimulation; the precise amount of overlap between prediction and stimulus is not critical. These findings suggest a specific pattern of influence of feedback on LGN cells, in which the simple cell off-zones mediate inhibitory influences to off-center LGN cells and excitatory influences to on-center LGN cells. This effect is further investigated and quantified in Figure 5. For all model on- or off-center LGN receptive fields that are aligned over a V1 receptive field region of the same polarity, firing rates decrease due to feedback (Fig. 5, top). Where the overlapping fields are of reversed polarity, there is an increase in firing rate (Fig. 5, bottom). This effect is consistent with recent results from neurophysiology showing that the influence of V1 simple cells on LGN on- and off-cells is phase-reversed [22], and further corroborates the hypothesis that predictive feedback is important in mediating responses of LGN cells.

Figure 5

Effects of feedback on LGN on-center and off-center type cells.

Dashed: probability that the LGN cell coding for this location is active (i.e. response>0) in the first feedforward sweep of the model when a V1 region will subsequently be selected that codes for the same or opposite polarity, blank: probability that the LGN cell is active after the first feedforward-feedback pass (i.e. when feedback exerts its effect) when a V1 region is selected that codes for the same or opposite polarity. Red: on-center type cell, blue: off-center type cell. The results were obtained after presenting the model with a white-noise stimulus every 100 milliseconds for a total of 10,000 images. Comparing initial feedforward activity with subsequent LGN activity shows that feedback has a negative influence on cells of similar sign, and a positive influence on cells of opposite sign. Thus, the probability that an LGN off-cell is active increases after feedback from a V1 on-region (lower right, blue), and the probability that an LGN on-type cell is active decreases after feedback from an on-region in V1 (upper left, red).

Effects of feedback on LGN on-center and off-center type cells.

Discussion

We have shown that a model that encodes an image using predictive feedforward-feedback cycles can learn the brain's LGN-V1 connectivity structure, in which the structure of V1 receptive fields is linked to the spatial alignment and properties of center-surround cells in the LGN [27],[28],[30]. In addition, the model captures reversals in polarity of neuronal responses in LGN [1]–[3] and a phase-reversed pattern of influence from V1 simple cells on LGN cells [22]. These results corroborate the idea that the visual system uses predictive feedforward-feedback interactions to efficiently encode natural input. The natural visual world is dominated by low temporal frequencies [32], causing the retinal image to be relatively stable over the periods of time considered in the model. However, under certain conditions visual inputs do change rapidly—more rapidly than most natural inputs the system would encode. One such situation is brought about by reverse correlation mapping, in which a white-noise stimulus is presented at a temporal frequency much higher than the temporal dynamics of natural visual input. In such a case, higher-level predictions of the reference stimulus are compared against a new and unexpected white-noise stimulus, which emerges in the responses of on-center type model cells as a bright-excitatory phase followed by a dark-excitatory phase. We hypothesize that similar predictive coding mechanisms are at play in geniculate and cortical cells whose spatiotemporal response profiles also display reversals in polarity over short periods of time [3],[5],[6],[33]. Geniculate cells receive many more feedback connections (around 30%) than feedforward connections (around 10%) [34]. In addition, it is known from both cat and monkey neurophysiology that feedback signals from primary visual cortex affect the response properties of LGN cells ([35],[36] see for review [34],[37],[38]). For example, feedback from V1 seems to affect the strength of center-surround interactions in LGN ([34],[37],[38] but see [39]). Geniculate cells respond strongly to bars that are roughly the same size as the center of their receptive field, but responses are attenuated or eliminated when the bar extends beyond the receptive field center [35],[40]. Neurons that respond in this way are also known as end-stopped neurons, and this property has been found to depend on feedback signals from primary visual cortex [35],[40]. A previously published model has successfully captured end-stopping and some other modulations due to surround inhibition in terms of predictive feedback [14]. Like here, the predictive feedback model was trained on natural images, in which lines are usually longer rather than shorter, resulting in higher-level receptive fields optimized for representing longer bars. Thus, when presented with shorter bars, the model's higher-level units could not predict their lower-level input, and error responses in the lower-level neurons could not be suppressed. This resulted in more vigorous responses for shorter bars than for longer bars, similar to end-stopping in geniculate neurons [35]. Here, we have extended the predictive feedback framework to also include rebound effects in LGN. Although responses of LGN cells tend to follow many of the characteristics of their strongest retinal inputs [20], biphasic responses are stronger in geniculate neurons than in the retinal neurons driving their response [21], suggesting that the cells may receive further sources of input. Our simulations indicate that these stronger components in the biphasic geniculate response may result from predictive feedback interactions, similar to end-stopping and some other inhibitory effects [34],[37],[38]. Reversals in polarity have also been described for several cortical areas that do not receive direct input from biphasic retinal cells [3]–[6] and that are too complex to result from retinal responses (e.g., for orientation or motion [4],[6]). We hypothesize that these response profiles result from similar mechanisms of predictive feedback. Indeed, neurophysiological studies have ascribed some cortical rebound effects to network interactions [4],[6], and computational work similarly suggests the involvement of cortical projections [41]. Our work extends these studies by providing a computational explanation for these effects. Previous authors have suggested mechanisms that could account for the stronger biphasic responses in the LGN, such as higher LGN thresholds [42], inhibitory feedback from the perigeniculate nucleus, or feedforward inhibition [20]. In addition, a variety of models has been proposed to account for orientation selectivity in early visual cortex [9], [43]–[45]. However, our model differs from earlier work in that it offers a computational, not a mechanistic, explanation of these early visual response properties [8]. Furthermore, the framework provides a parsimonious explanation for a number of neurophysiological effects. For example, the model not only captures biphasic responses and orientation selectivity, but also a phase-reversed influence of cortical feedback to LGN, as well as end-stopping and some other modulations due to surround inhibition in V1 and LGN [14]. Reversals in polarity have also been described for many areas in cortex [3]–[6]. Consistent with our interpretation, neurophysiological studies have ascribed some of these biphasic responses to network interactions [4],[6]. While a number of mechanisms can be proposed to account for many of these effects individually, the computational explanation proposed here offers a simple, unifying framework in which to understand all of these effects. While predictive coding could work through local computations between neighboring neurons (providing a possible explanation for biphasic responses in the retina [13],[46]), we argue that it would be computationally advantageous to (also) implement predictive operations through feedback projections. Feedback mechanisms allow the system to remove redundancy and decorrelate visual responses between areas. Moreover, higher-level cortical receptive fields are larger and encode more complex stimuli, allowing for predictions of higher complexity and larger regions in the visual field. A strong prediction of the model would therefore be that biphasic responses are attenuated in the LGN, or absent in cortex, without cortical feedback. The model uses subtractive feedback to compare higher-level predictions with actual lower-level input. In physiology, this process could be mediated by, for example, local inhibitory neurons in the same-level area together with long-range excitatory connections from the next higher-level area (for a similar connectivity scheme, see e.g. [34]). Here we have shown that these comparisons can result in reduced as well as enhanced lower-level responses. Support for a dependence of some inhibitory and excitatory effects on top-down feedback has been found in neurophysiology [47]–[50]. We have considered only two hierarchical levels but the model could easily be extended to include more cortical areas. In an extended model, each level would have both coding units and difference detecting units (for a concrete example, see Figure 2 in [15]). Coding units would not only predict their lower-level input but also convey the current estimate to the error detectors of the same-level area. Error detectors then signal the difference between their input and its prediction to the next higher level, until finally one prediction becomes dominant in the entire system. The model suggests that more accurate higher-level predictions, or equivalently greater overlap between the visual input and higher-level receptive fields, results in reduced activity of lower-level difference detectors. In contrast, when top-down predictions in the model are off, lower-level difference detectors enhance their responses. Consistent with this, recent fMRI studies have shown that increased activity in higher-level areas accompanies decreased responses in lower-level areas, presumably due to feedback processing [51]–[53]. Other imaging studies have found supporting evidence for predictive feedback as well [54],[55]. The predictive feedback framework suggests that higher-level coding neurons enhance their activity when stimuli are presented that match their receptive field properties (rather than decrease [56]), in accordance with neurophysiology [27], [57]–[59]. Subsequent feedforward-feedback passes refine the initial predictions, until finally the entire system settles on the mostly likely interpretation. Coding an image using recurrent cycles of processing incurs a cost in time, but has the advantage of resolving error signals in even the earliest sensory areas. Moreover, recurrent cycles of processing are less costly in time when the system forms a hierarchy. The most likely predictions are computed first and sent on to higher-level processing areas, which do not have to wait to begin their own computations, enabling initial rapid gist-of-the-scene processing and subsequent feedforward-feedback cycles to fill in the missing details. In accordance with this, psychophysical studies have shown that some global aspects of a stimulus can be detected very rapidly while detailed aspects are reported later in time [60]–[62], and neurophysiological studies have found dynamic changes in tuning properties of both lower-level and higher-level neurons consistent with these ideas ([33],[63],[64] see also [65],[66]). It is likely that top-down signals serve many computational functions, of which the sparsifying mechanism suggested here is but one. Also, the effect of top-down signals in general is not best described as either inhibitory or excitatory. The effect can be of many different kinds, depending on the specific computational goals the top-down interaction fulfills. For example, it has been proposed that higher-level areas feed anticipatory signals back to earlier areas, enhancing neural responses to a stimulus that would otherwise fall below threshold [67]. This is probably best implemented as an excitatory interaction between higher-level anticipation and the incoming lower-level signal. Feedback could also act as a bayesian style prior [68]–[70], and adapt early level signals according to different sensory or behavioral conditions [71]. In view of these previously suggested roles of feedback, the mechanism presented here should be regarded as a relatively low-level mechanism that automatically creates sparser solutions, rather than the more flexible, higher-level mechanism that sets specific behavioral demands. In conclusion, rebound effects are a common feature in reverse correlation mapping and have been described in several visual areas. For example, biphasic responses have been found for neurons in LGN [1]–[3], as well as in primary visual cortex [3],[4], and reversals in selectivity in the motion domain have also been found for neurons in MT [5],[6]. Here, we have explained these biphasic sensory responses in terms of predictive feedback. Moreover, we have shown that a model that processes its inputs using predictive feedback can learn the brain's LGN-V1 connectivity structure [27],[28],[30] and captures a phase-reversed pattern of influence of feedback [22]. These results corroborate the idea that predictive feedback is a general principle used by the visual system to efficiently encode its natural inputs.

Methods

Model dynamics

Here we briefly discuss model equations and parameters. The interested reader is referred to [25]. The input is obtained from 768 by 768 pixel black-and-white images of natural surroundings (Fig. 2A), filtered with a zero-phase whitening/lowpass filter [7],[9]:where the tilde represents the fourier transform in 2D, and = 300 cycles/image. The initial activation values of on-center type cells are obtained from the filtered images by subtracting the mean and taking positive values: The initial activation values of off-center type cells are obtained from negative pixel values that are rectified: We limit the LGN input into the model's second layer to 8 by 8 (64) on-center type cells and 8 by 8 (64) off-center type cells. This LGN ‘patch’ is randomly selected from the filtered image and represented as a single vector (128 values), unless described otherwise. At any given time step in the model, either the on-center cell or its off-center counterpart coding for the same spatial location is active. The second layer, which would correspond to an orientation column in cortical area V1, is represented by 128 units. In the language of the model, the synaptic weights between LGN and V1 units form basis vectors that represent the preferred stimulus of the model V1 neurons (Fig. 6A). Model V1 predicts its LGN input as a linear combination of active basis vectors, where the weighting coefficient of each basis vector is given by the response of its corresponding V1 neuron:in which is a stochastic noise process. The 64 on-type and 64 off-type connections are combined to form a single basis vector of 128 values, unless described otherwise. To choose the V1 neurons that best predict a given input (i.e., neurons that are active), we use a modified version of the matching pursuit algorithm [72]. Matching-pursuit uses the least number of basis vectors or equivalently the least number of active V1 neurons to accurately predict its input [72]. In a deterministic version of the algorithm, the first vector is chosen as the vector that minimizes

Figure 6

Predictive feedback using the matching pursuit algorithm.

Predictive feedback using the matching pursuit algorithm.

(A) Model receptive fields (RFs) are represented as basis vectors. When the input (blue vector) arrives in model V1, a basis vector that has high overlap with the input is selected (red vector ). The V1 basis vector weighted by its response is then subtracted from the input and the selection-subtraction process is repeated on the residual LGN representation (green vector). (B,C) Model V1 prediction and residual LGN representation over time. (B) The blue vector represents the actual input, its prediction is represented by the red dot. (C) Black depicts off-regions, white depicts on-regions. A prediction is obtained by summing the selected V1 basis vectors weighted by their response. LGN difference detectors represent the error between V1 prediction and actual input. (B,C) Subsequent feedforward-feedback cycles refine the higher-level prediction of the input. Without predictive feedback, the model would represent just the initial, less accurate prediction. At the next time step, an additional vector is chosen that minimizesand so on, where the response of the vectors is given byand the vectors are subdivided into 64 on-values and 64 off-values. This deterministic version was modified for the learning algorithm to be optimal in terms of sparseness [25]. Thus, after learning not only do the V1 units make more accurate predictions, but also few of them participate in any given prediction. The modification is as follows: at each time step, a V1 unit is selected randomly from the distributionwhere is the index of the unit in the iteration, is given by equation (7), is the Heaviside function, is a temperature parameter and is a normalizing term given by: Thus, the probability with which a unit is selected increases when its receptive field structure better predicts the lower level input and its response is higher. To guarantee optimality, the response of a selected unit should be drawn from a normal distribution with small variance [25], but the effect of this process is negligible so that in practice the response of a neuron in the modified model is given by equation (7). The selected basis vector weighted by its neuronal response is then subtracted from the input. This subtraction represents the predictive feedback process assumed to take place between V1 and LGN, and is essential to the predictive coding theory: it reduces output redundancy by allowing LGN difference detectors to represent only the error signal, and no longer the predicted components now represented in V1. Furthermore, it optimizes the higher-level predictions (or equivalently, minimizes the prediction error). For a further discussion of the effectiveness of this approach, see [25]. The LGN on-center type cell and LGN off-center type cell code for two opposite sides of the same dimension. Thus, wherever the subtraction process results in negative LGN values, or equivalently when the value crosses the dividing edge between the two dimensional sides, the value is rectified and added to the activation value of the unit's counterpart (i.e. negative value of on-center unit is rectified and added to activation value of off-center unit coding for the same location, and vice versa). The feedforward-feedback cycle is then repeated on the residual input so that after iterations the residual LGN input is given by In words, the number of active V1 neurons increases at each time step in the model, and their combined prediction is subtracted from the actual LGN input. To see how combining active V1 units results in better predictions of the input, consider when a bar of a certain orientation is presented to the model, together with a small dot next to the bar. A likely first prediction could capture the bar, but not necessarily also the dot. However, this initial prediction will be updated in later feedforward-feedback cycles when other V1 neurons become active, so that the neurons will together represent both the bar and the dot. This is what combining V1 neurons represents, a prediction that is updated in each cycle to better capture the V1 input (see also Fig. 6B and 6C). We assume model V1 responses to be stable and non-decaying over the time scales considered. Note that feedback connections in the model follow the alignment of feedforward connections, which has been observed experimentally [22],[23]. We assume delays of 20 milliseconds for predictive feedback effects to set in, which complies with the usual response lag of about 10 milliseconds in the next higher-order visual area [73],[74] together with similar conduction times for feedforward and feedback connections [75]. Although this estimate is likely at the longer end of the range [76], the model does not critically depend on the value of this parameter and similar results would have been obtained using shorter time delays. As the model does not incorporate neural structures earlier than the LGN, we added 30 milliseconds to the data points in the figures to account for the delays before the LGN [73].

Learning rule

To enhance the sparseness of the neural code and better capture the input statistics, basis vectors are updated in each feedforward-feedback cycle. This is done by minimizing the description length (MDL) or entropy of the joint distribution of inputs and neural responses [25]. MDL chooses as the best model the one that enables the shortest code length for both prediction error and model parameters in bits (in base ) [77],[78]. As a consequence, it favors accurate, yet sparse, neural representations. However, the same learning rule can also be obtained from the gradient of the error function for the feedforward-feedback cycle [25]:where is the learning rate, in which is initially equal to 1 and increases by 1 every 1000 image patches. V1 basis vectors are normalized, and initialized using 64 random values with zero mean: positive values are taken as the initial values of the entries coding for on-type inputs, negative values are rectified and taken as the initial values of the off-type entries of the basis vector, the remaining 64 entries are initialized with value zero. Initializing all 128 entries of the basis vector with random values gives similar results. Because neuronal receptive field properties might be largely determined by the statistics of their natural input [7],[11],[14],[16], the basis vectors are trained on 10,000 image patches extracted from 16 natural scenes. The model receptive fields are trained using static natural images (for receptive fields obtained from time-varying natural inputs see [45], and see [15] for a description within the predictive coding framework). The model is allowed to process each image for four feedforward-feedback cycles, which corresponds to around 100 milliseconds (see above). Parameter values are kept constant throughout all simulations. While natural input is not expected to be completely static, even over the short time scales considered here, we argue that this is not highly relevant to the general results of the simulation. The critical factor for our results is that reverse correlation stimuli are presented on a time scale much faster than the system's typical inputs. The temporal dynamics used in reverse correlation mapping are, indeed, much faster than most natural inputs, as the natural visual world is dominated by low temporal frequencies [32]. Including time-varying V1 receptive fields and/or training the system on natural dynamical images presented for seconds would therefore result in similar biphasic responses, given natural dynamics that are slower than the temporal dynamics of reverse correlation. For example, Kiebel et al. [16] use a predictive feedback framework to model temporal receptive fields, resulting in similar lower-level error signals when the actual input deviates from the expected temporal dynamics.

Interpretation of model activation values

To model the spatiotemporal response of LGN neurons, we have to relate model scalar activation values directly to biology. Several possibilities exist; we emphasize, however, that the model does not explicitly implement any of these interpretations and is, in fact, very general. One possibility is that activation values of model units stand for the mean firing rate of a group of functionally similar neurons in physiology [79]–[82]. This interpretation is corroborated by most neurophysiological studies that show a correlation between increased firing rates and behavioral measures. The model is also compatible with the idea that neurons code information in the precise timing of their spikes. This view has received increasing attention over the past years [83]–[85] as more data is becoming available suggesting that spike timing may be important for neural communication [86]–[89]. Specifically, scalar activation values in the model can be interpreted as indicating the time from spike arrival to a reference signal, taking this small delay in time between a single spike and the reference as the information carrier [85],[90]. Direct neurophysiological evidence for this signaling strategy has been obtained in hippocampus [91],[92] and in human somatosensory system [93]. Here, we interpret the model's activation values along the spike timing lines and take scalar activation values of model units as information transmitted using one spike. We do not implement the reference signal explicitly but argue that the model could easily be amplified to take this into account. We emphasize, however, that we obtain similar results if we interpret the model's activation values as firing rate (i.e. indicating a number of spikes).

Reverse correlation

The spatio-temporal response of on-center type units in model LGN (Fig. 4) was calculated using a reverse correlation algorithm ([1],[2],[31] see also C.-I. Yeh et al., Soc. Neurosci. Abstr. 163.21, 2008). This algorithm presents neurons with an image sequence of white-noise stimuli (where each pixel is either maximally black or white with equal probability), records the stimuli that precede a response (i.e. the activity value of the recorded model neuron is above zero) by a fixed time interval, and then averages across the recorded stimuli; the receptive field is defined as this average stimulus that preceded the response by the given time interval. Following [1], the model was presented with a new white-noise stimulus every 20 milliseconds for a total of 50,000 presentations. We used time intervals between response and stimulus of 30, 50, 70 and 90 milliseconds (see also Fig. 4A), and obtained a composite receptive field by taking the difference of responses to bright stimuli and dark stimuli [2]. We used a spike timing interpretation of model responses and recorded stimuli when activation values in the model were above zero. However, we obtain similar results if we interpret model activation values as firing rates and weigh recorded stimuli by the unit's activation value.

Biphasic retinal input

In the modified model, the initial feedforward input into on-center type LGN cells is obtained using equation 2, and then this bottom-up input is updated in each feedforward-feedback cycle as follows:where is the number of feedforward-feedback cycles in the model. The retinal input into off-center type LGN cells is initialized using equation 3, and then updated in a similar fashion in each feedforward-feedback cycle. When mapped with reversed correlation, this results in retinal input that is biphasic in structure (Fig. 4D).

83 in total

Review 1. Looking back: corticothalamic feedback and early visual processing.

Authors: Javier Cudeiro; Adam M Sillito
Journal: Trends Neurosci Date: 2006-05-19 Impact factor: 13.837

Review 2. Always returning: feedback and sensory processing in visual cortex and thalamus.

Authors: Adam M Sillito; Javier Cudeiro; Helen E Jones
Journal: Trends Neurosci Date: 2006-05-19 Impact factor: 13.837

Review 3. Building the gist of a scene: the role of global image features in recognition.

Authors: Aude Oliva; Antonio Torralba
Journal: Prog Brain Res Date: 2006 Impact factor: 2.453

4. Learning receptive fields using predictive feedback.

Authors: Janneke F M Jehee; Constantin Rothkopf; Jeffrey M Beck; Dana H Ballard
Journal: J Physiol Paris Date: 2006-10-25

5. Functional alignment of feedback effects from visual cortex to thalamus.

Authors: Wei Wang; Helen E Jones; Ian M Andolina; Thomas E Salt; Adam M Sillito
Journal: Nat Neurosci Date: 2006-09-17 Impact factor: 24.884

Review 6. Brain states: top-down influences in sensory processing.

Authors: Charles D Gilbert; Mariano Sigman
Journal: Neuron Date: 2007-06-07 Impact factor: 17.173

7. The proactive brain: using analogies and associations to generate predictions.

Authors: Moshe Bar
Journal: Trends Cogn Sci Date: 2007-06-04 Impact factor: 20.229

8. Interactions between higher and lower visual areas improve shape selectivity of higher level neurons-explaining crowding phenomena.

Authors: Janneke F M Jehee; Pieter R Roelfsema; Gustavo Deco; Jaap M J Murre; Victor A F Lamme
Journal: Brain Res Date: 2007-04-12 Impact factor: 3.252

Predictive feedback can account for biphasic responses in the lateral geniculate nucleus.

Introduction

Results

Hierarchical model of predictive coding

Hierarchical model for predictive coding.

LGN-V1 connectivity structure after training

Receptive fields of model V1 units after training on natural images.

Connection weights after training.

Reversal of polarity due to predictive feedback

Spatio-temporal response of LGN on-center type cell.

Effects of feedback on LGN on-center and off-center type cells.

Discussion

Methods

Model dynamics

Predictive feedback using the matching pursuit algorithm.

Learning rule

Interpretation of model activation values

Reverse correlation

Biphasic retinal input

Review 1. Looking back: corticothalamic feedback and early visual processing.

Review 2. Always returning: feedback and sensory processing in visual cortex and thalamus.

Review 3. Building the gist of a scene: the role of global image features in recognition.

4. Learning receptive fields using predictive feedback.

5. Functional alignment of feedback effects from visual cortex to thalamus.

Review 6. Brain states: top-down influences in sensory processing.

7. The proactive brain: using analogies and associations to generate predictions.

8. Interactions between higher and lower visual areas improve shape selectivity of higher level neurons-explaining crowding phenomena.

9. A fast, reciprocal pathway between the lateral geniculate nucleus and visual cortex in the macaque monkey.

10. Extra-classical receptive field effects measured in striate cortex with fMRI.

Review 1. Theory of mind: a neural prediction problem.

2. Predictive coding as a model of cognition.

3. Preparatory delta phase response is correlated with naturalistic speech comprehension performance.

Review 4. The default-mode, ego-functions and free-energy: a neurobiological account of Freudian ideas.

5. Dynamic coding of signed quantities in cortical feedback circuits.

6. A corticothalamic circuit model for sound identification in complex scenes.

7. Dual roles for spike signaling in cortical neural populations.

Review 8. Reflections on agranular architecture: predictive coding in the motor cortex.

Review 9. Neural Elements for Predictive Coding.

10. Predictive Coding: A Possible Explanation of Filling-In at the Blind Spot.