
Active efficient coding explains the development of binocular vision and its failure in amblyopia.

Samuel Eckmann, Lukas Klimmasch, Bertram E Shi, Jochen Triesch.

Abstract

The development of vision during the first months of life is an active process that comprises the learning of appropriate neural representations and the learning of accurate eye movements. While it has long been suspected that the two learning processes are coupled, there is still no widely accepted theoretical framework describing this joint development. Here, we propose a computational model of the development of active binocular vision to fill this gap. The model is based on a formulation of the active efficient coding theory, which proposes that eye movements as well as stimulus encoding are jointly adapted to maximize the overall coding efficiency. Under healthy conditions, the model self-calibrates to perform accurate vergence and accommodation eye movements. It exploits disparity cues to deduce the direction of defocus, which leads to coordinated vergence and accommodation responses. In a simulated anisometropic case, where the refraction power of the two eyes differs, an amblyopia-like state develops in which the foveal region of one eye is suppressed due to inputs from the other eye. After correcting for refractive errors, the model can only reach healthy performance levels if receptive fields are still plastic, in line with findings on a critical period for binocular vision development. Overall, our model offers a unifying conceptual framework for understanding the development of binocular vision.
Copyright © 2020 the Author(s). Published by PNAS.

Keywords:  accommodation; active perception; amblyopia; efficient coding; vergence

Year:  2020        PMID: 32123102      PMCID: PMC7084066          DOI: 10.1073/pnas.1908100117

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   11.205


Our brains are responsible for 20% of our energy consumption (1). Therefore, organizing neural circuits to be energy efficient may provide a substantial evolutionary advantage. One means of increasing energy efficiency in sensory systems is to attune neural representations to the statistics of sensory signals. Based on this efficient coding hypothesis (2), numerous experimental observations in different sensory modalities have been explained (3, 4). For instance, it has been shown that receptive field properties in the early visual pathway can be explained through models that learn to efficiently encode natural images (5, 6). These findings have extended classic results showing that receptive field shapes in visual cortex are highly malleable and a product of the organism’s sensory experience (7–10). Importantly, however, animals can shape the statistics of their sensory inputs through their behavior (Fig. 1). This gives them additional degrees of freedom to optimize coding efficiency by jointly adapting their neural representations and behavior. This idea has recently been advanced as active efficient coding (11, 12). It can be understood as a generalization of the efficient coding hypothesis (2) to active perception (13). Along these lines, active efficient coding models have been able to explain the development of visual receptive fields and the self-calibration of smooth pursuit and vergence eye movements (11, 12). This has been achieved by optimizing the neural representation of the sensory signal statistics while simultaneously, via eye movements, optimizing the statistics of sensory signals themselves for maximal coding efficiency.
Fig. 1.

The action–perception loop in active efficient coding. The sensory input is obtained by sampling input signals from the environment (e.g., via eye movements). A percept is formed by neural encoding, which drives the selection of actions and thereby, shapes the sampling process. Therefore, perception depends on both neural encoding and active input sampling. Classic efficient coding theories do not consider the active sampling component (orange).

In our formulation of active efficient coding, we maximize coding efficiency as measured by the Shannon mutual information between the retinal stimulus, represented by retinal ganglion cell activity $R$, and its cortical representation $C$ under a limited resource constraint. The mutual information can be decomposed as $I(R;C) = H(R) - H(R \mid C)$, where $H(R)$ is the entropy of the retinal response and $H(R \mid C)$ is its conditional entropy given the cortical representation. Prior formulations focused on minimizing only the conditional entropy $H(R \mid C)$ (6). $H(R \mid C)$ measures the information that is lost (i.e., not represented in the cortical encoding). The limitation of this prior formulation is that this quantity can be minimized simply by reducing $H(R)$, the entropy of the retinal response, since $H(R \mid C) \le H(R)$. Thus, an active agent could minimize $H(R \mid C)$ by, for example, defocusing or closing the eyes altogether. In the free energy and predictive processing literature, this is known as the “dark room problem” (14, 15). In our formulation, $I(R;C)$ is maximized by simultaneously maximizing $H(R)$ and minimizing $H(R \mid C)$, which avoids this problem. We demonstrate this approach through a concrete model of the development of binocular vision, including the simultaneous calibration of vergence and accommodation control. Indeed, newborns have difficulties bringing objects into focus and cannot yet verge their eyes properly (16). How infants manage to self-calibrate their control mechanisms while interacting with their visual environment is currently unknown.
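To make the dark room argument concrete, the following Python sketch (an illustration, not code from the paper) computes $H(R)$, $H(R \mid C)$, and $I(R;C)$ for three hypothetical discrete joint distributions: a lossless encoder, a lossy encoder that merges pairs of retinal states, and a "dark room" in which the retinal response is constant. Minimizing $H(R \mid C)$ alone cannot distinguish the dark room from the lossless encoder and even prefers it over the lossy one, whereas $I(R;C)$ ranks the dark room last.

```python
import numpy as np


def entropy(p):
    """Shannon entropy (bits) of a probability table; zero entries are skipped."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(np.sum(p * np.log2(1.0 / p)))


def decompose(joint):
    """Return H(R), H(R|C) and I(R;C) for joint[r, c] = P(R = r, C = c)."""
    joint = np.asarray(joint, dtype=float)
    h_r = entropy(joint.sum(axis=1))          # entropy of the retinal response
    h_c = entropy(joint.sum(axis=0))          # entropy of the cortical code
    h_r_given_c = entropy(joint) - h_c        # chain rule: H(R|C) = H(R,C) - H(C)
    return h_r, h_r_given_c, h_r - h_r_given_c


perfect = np.eye(4) / 4                         # 4 equiprobable states, encoded losslessly
lossy = np.zeros((4, 2))
lossy[np.arange(4), np.arange(4) // 2] = 0.25   # pairs of states share one code word
dark_room = np.array([[1.0]])                   # constant input (eyes closed)

for name, joint in [("perfect", perfect), ("lossy", lossy), ("dark room", dark_room)]:
    print(name, decompose(joint))
# perfect: H(R|C) = 0, I = 2 bits; lossy: H(R|C) = 1, I = 1 bit;
# dark room: H(R|C) = 0 like the perfect coder, but I = 0.
```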
Additionally, in certain medical conditions, the calibration of vergence and accommodation control is impaired. For example, anisometropia describes a difference in the refractive error between the eyes. If not corrected early during development, this can evoke amblyopia: a disorder of the developing visual system that is characterized by an interocular difference in visual acuity that is not immediately resolved by refractive correction. Amblyopia can be associated with a loss of stereopsis and, in severe cases, leads to monocular blindness (17). Furthermore, vergence and accommodation eye movements are either less accurate or completely absent (18, 19). Although there have been recent advances in the treatment of amblyopia (20, 21), existing treatment methods do not lead to satisfactory outcomes in all patients. This is aggravated by the fact that treatment success strongly depends on the stage of neural circuit maturation (20). When young patients are still in a critical period of visual cortex plasticity (10), they often recover after refractive errors are corrected, whereas adults mostly do not (22, 23). The above findings are all readily explained by our model. Under healthy conditions, our model develops accurate vergence and accommodation eye movements. When the model is impaired by strong monocular hyperopia, we observe that an amblyopia-like state develops. We show that this is due to the abnormal development of binocular receptive fields in the model and demonstrate that healthy binocular vision is regained as the receptive fields readapt after refraction correction. However, if the sensory encoding is no longer plastic and does not adapt to the changes in the visual input statistics, suppression prevails. Overall, our model suggests that coding efficiency may provide a unifying explanation for the development of binocular vision.

Model Formulation

The active efficient coding model that we propose has a modular structure (Fig. 2). A cortical coding module learns an efficient representation of the binocular retinal input by minimizing the conditional entropy $H(R \mid C)$ of the retinal response $R$ given its cortical representation $C$. At the same time, an accommodation reinforcement learning module maximizes $H(R)$, and a vergence reinforcement learning module minimizes $H(R \mid C)$, so that together the modules maximize the mutual information $I(R;C) = H(R) - H(R \mid C)$. All three modules are plastic and adjust simultaneously in response to changes in the sensory input statistics. The exact choice of algorithms is not critical for the model to function. In fact, different cortical coding and reinforcement learning models have been successfully applied in previous active efficient coding models (24, 25).
Fig. 2.

Model architecture, with solid arrows representing the flow of sensory information and dashed arrows representing the flow of control commands. Sampled input images with given defocus blur and disparity are whitened at the retinal stage and contrast adjusted through an interocular suppression mechanism based on the recent history of cortical activity (Left). Thereafter, they are encoded by a set of binocular neurons that represents the cortical encoding (Center). The cortical population activity serves as input to two reinforcement learning modules (Right) that control vergence and accommodation commands. Details are in .

Our model is presented with a textured planar object. The object is sampled by the two eyes for 10 iterations, which constitute one fixation. After each fixation, a new object is presented at a new random distance. The retinal images are rendered based on the positions of the accommodation, vergence, and object planes (Fig. 3). The inputs are whitened, contrast adjusted by an interocular suppression mechanism, and then binocularly encoded by a population of cortical neurons. The reinforcement learning modules control the retinal input of the next iteration by shifting the accommodation and vergence planes along the egocentric axis (Fig. 3).
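The fixation schedule and the error measure of Fig. 3 can be sketched as follows. This is a minimal bookkeeping sketch under stated assumptions: `Planes`, `run_fixations`, and `oracle_policy` are hypothetical names, the oracle policy (which reads the object position directly) merely stands in for the reinforcement learning modules, and the numeric ranges, including a right eye that cannot focus positions below 3 a.u., are placeholders chosen to mimic the anisometropic case.

```python
import random
from dataclasses import dataclass

ITERATIONS_PER_FIXATION = 10  # from the text: 10 iterations form one fixation


@dataclass
class Planes:
    """Plane positions (a.u.) along the egocentric axis, as in Fig. 3."""
    obj: float
    verg: float
    acc_left: float
    acc_right: float

    def errors(self):
        # Errors are distances between each control plane and the object plane.
        return {
            "vergence": abs(self.verg - self.obj),
            "acc_left": abs(self.acc_left - self.obj),
            "acc_right": abs(self.acc_right - self.obj),
        }


def clip(x, lo, hi):
    return max(lo, min(hi, x))


def oracle_policy(p):
    """Hypothetical stand-in for the RL modules: step every plane onto the object."""
    return p.obj - p.verg, p.obj - p.acc_left, p.obj - p.acc_right


def run_fixations(n_fixations, policy, obj_range, acc_left_range, acc_right_range, seed=0):
    rng = random.Random(seed)
    planes = Planes(obj_range[0], obj_range[0], acc_left_range[0], acc_right_range[0])
    history = []
    for _ in range(n_fixations):
        planes.obj = rng.uniform(*obj_range)          # new object at a random distance
        for _ in range(ITERATIONS_PER_FIXATION):
            dv, dl, dr = policy(planes)
            # Vergence stays within the fixation range (= object range, Fig. 3B);
            # each eye's accommodation stays within its reachable range.
            planes.verg = clip(planes.verg + dv, *obj_range)
            planes.acc_left = clip(planes.acc_left + dl, *acc_left_range)
            planes.acc_right = clip(planes.acc_right + dr, *acc_right_range)
            history.append(planes.errors())
    return history


# Hypothetical anisometropic setting: the right eye cannot focus near positions (< 3).
hist = run_fixations(5, oracle_policy, (0.0, 10.0), (0.0, 10.0), (3.0, 13.0))
print(max(h["acc_right"] for h in hist))
```

Even with a perfect controller, the right accommodation error stays positive for near objects, which is the situation described for position 0 in Fig. 3B.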
Fig. 3.

Input sampling from the environment. (A) Object (obj.) position, vergence (verg.) distance, and left (l.) and right (r.) accommodation (acc.) distance are represented as different plane positions. (B) Abstraction of A. The gray horizontal bar indicates the range where objects are presented during the simulation and also indicates the fixation range (i.e., possible vergence plane positions). Horizontal axes indicate reachable accommodation plane positions for the left (light blue) and right (green) eyes. Note that, when the stimulus is placed at, for example, position 0, it cannot be focused by the right eye in this example. Accommodation and vergence errors are measured as the distance between the respective planes and the object position in a.u. (C) Position range of accommodation and vergence planes under different conditions. Same scheme as in B. (D) Examples of retinal input images for different plane position configurations. For better visibility, disparity shifts and defocus blur are increased compared to actual values.

Cortical Encoding.

In our model, the cortical population activity represents the binocular “percept” on the basis of which behavioral commands are generated (compare Fig. 1, Upper). The cortical encoding comprises two efficient coders: one for fine details in the foveal region and one for the periphery, which receives a low-pass-filtered input. Both are implemented using the standard matching pursuit algorithm (26) (). To find a set of neurons that best encodes the input, instead of minimizing the conditional entropy $H(R \mid C)$ directly, we minimize an upper bound on it given by the average encoding error $\langle E \rangle$ (27), where $E = \lVert R - \hat{R} \rVert^2$ and $\hat{R}$ is an estimate of the input $R$ based on the activities $c_i$ of cortical neurons with receptive fields $\phi_i$ ( has details): $\hat{R} = \sum_i c_i \phi_i$. In every iteration, both activities and receptive fields adjust online to minimize the encoding error (). Thus, the receptive fields reflect the stimulus statistics (28) and resemble those of simple cells in the visual cortex (6) ().
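The split into a full-resolution foveal coder and a low-pass peripheral coder can be sketched with SciPy. This is an illustration under assumptions: the crop sizes (40 px and 160 px), the pyramid blur `sigma`, and the helper names are hypothetical; only the idea of a central high-resolution input plus a Gaussian-pyramid input down-sampled by a factor of four is taken from the text.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def pyramid_down(img, levels=2, sigma=1.0):
    """Blur and subsample by 2 per level (simple Gaussian pyramid); 2 levels = factor 4."""
    for _ in range(levels):
        img = gaussian_filter(img, sigma)[::2, ::2]
    return img


def center_crop(img, size):
    h, w = img.shape
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]


def two_scale_input(img, fovea_px=40, periphery_px=160):
    """Hypothetical sizes: a fine foveal crop and a wider, low-pass peripheral view."""
    fovea = center_crop(img, fovea_px)                        # fine details, full resolution
    periphery = pyramid_down(center_crop(img, periphery_px))  # low-pass, 4x down-sampled
    return fovea, periphery


fovea, periphery = two_scale_input(np.random.default_rng(0).standard_normal((200, 200)))
print(fovea.shape, periphery.shape)
```

With these placeholder sizes both scales end up with the same number of pixels, so each coder sees inputs of equal dimensionality while the periphery covers a four times wider field.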

Vergence Learning.

The vergence reinforcement learner also aims to minimize the conditional entropy $H(R \mid C)$ (i.e., the encoding error $E$). Therefore, vergence movements that produce visual input that can be most accurately encoded with the current set of receptive fields are favored. This leads to a self-reinforcing feedback cycle (Fig. 4). If inputs of a certain disparity can be encoded particularly well, the vergence learner will try to produce visual input that is dominated by this disparity. This will cause even more neurons to become selective for this disparity and make the encoding of this disparity even more efficient (Fig. 4). Thus, an initial bias for, say, small disparities can be magnified until the model always favors input with small disparities and most neurons are tuned to small disparities.
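The self-reinforcing cycle can be illustrated with a deliberately simple mean-field toy model, which is not the paper's model: the fraction of neurons tuned to each disparity sets how well that disparity is encoded, a softmax "vergence learner" prefers low-error disparities, and the neural population slowly adapts to the disparities it is shown. The error function, including the slightly lower information content assumed at zero disparity, and all parameter values are hypothetical.

```python
import numpy as np


def feedback_loop(disparities, steps=200, lr=0.1, temperature=1.0, eps=1e-3):
    """Toy mean-field model of the active efficient coding feedback loop.

    p[d]     : fraction of neurons tuned to disparity d (starts uniform)
    error[d] : encoding error for input of disparity d; lower when more neurons
               are tuned to d, and slightly lower at zero disparity, where the
               two retinal images overlap most (hypothetical form)
    policy   : softmax preference for low-error disparities ("vergence learner")
    """
    d = np.asarray(disparities, dtype=float)
    p = np.full(d.shape, 1.0 / d.size)
    info = 1.0 + 0.1 * np.abs(d)              # hypothetical: more information at larger |d|
    for _ in range(steps):
        error = info / (eps + p)
        logits = -error / temperature
        policy = np.exp(logits - logits.max())
        policy /= policy.sum()
        p = (1 - lr) * p + lr * policy        # neurons adapt to the sampled statistics
    return p


p = feedback_loop(range(-5, 6))
print(np.round(p, 3))  # almost all neurons end up tuned to zero disparity
```

Starting from a uniform population, the small advantage at zero disparity is amplified until nearly the whole population is tuned to it, mirroring the positive feedback loop of Fig. 4A.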
Fig. 4.

The feedback loop of active efficient coding and reward dependencies. (A) Positive feedback loop of active efficient coding. An efficiently encoded stimulus is preferred over other stimuli (acting). Therefore, the sensory system is more frequently exposed to the stimulus, and neural circuits adapt to reflect this overrepresentation (statistical learning), which further increases encoding efficiency (neural coding). (B) Normalized (norm.) vergence (verg.) reward for different disparity distributions and neural populations (averaged over 300 textures). The receptive fields of 300 neurons adapted to different distributions of input disparities with color-coded SDs. Gray indicates an unbiased (uniform) distribution, pink and purple indicate Laplacian-distributed disparities, and dark blue indicates the model trained under healthy conditions. In each case, stimuli seen at zero absolute (abs.) disparity produce the highest average (avg.) vergence reward (i.e., the most efficient encoding). This advantage is even more pronounced when small disparities have been encountered more frequently (i.e., for smaller SDs). (C) Normalized accommodation (acc.) reward for different whitening filters. Zero-blur input yields the highest accommodation reward independent of the size of the whitening filter. However, smaller whitening filters induce a stronger preference for focused input. The smallest filter (dark blue) was used for the simulation ( has details).

Accommodation Learning.

The entropy of the retinal response, $H(R)$, is maximized via the accommodation reinforcement learning module. For this, $H(R)$ is approximated by the squared activity of the retinal representation (). We assume the spatial frequency tuning of retinal ganglion cells to be static and thus independent of the distribution of spatial frequencies in the retinal input, as suggested by deprivation experiments (9, 29, 30). However, the exact receptive field shape does not matter for the model to favor focused input (Fig. 4).
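Why a squared-activity proxy for $H(R)$ rewards focused input can be checked numerically. The sketch below assumes a whitening filter of the form $|f| \exp(-(|f|/f_0)^4)$ in the style of Olshausen and Field, with a hypothetical cutoff `f0 = 0.4` cycles/pixel, a white-noise stand-in for a natural image, and SciPy's Gaussian filter with circular boundaries as the defocus model. Because the Gaussian blur only attenuates spatial frequencies, the energy of the whitened response can only drop as defocus grows.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def whiten(img, f0=0.4):
    """Whitening filter |f| * exp(-(|f| / f0)**4) (after Olshausen and Field),
    applied in the Fourier domain; f in cycles/pixel, f0 is a placeholder."""
    fy = np.fft.fftfreq(img.shape[0])[:, None]
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    f = np.sqrt(fx ** 2 + fy ** 2)
    filt = f * np.exp(-(f / f0) ** 4)
    return np.real(np.fft.ifft2(np.fft.fft2(img) * filt))


def response_energy(img):
    """Summed squared activity of the whitened 'retinal' response (H(R) proxy)."""
    return float(np.sum(whiten(img) ** 2))


def defocus(img, sd):
    """Gaussian defocus blur with circular boundaries; sd = 0 means in focus."""
    return img if sd == 0 else gaussian_filter(img, sd, mode="wrap")


rng = np.random.default_rng(0)
scene = rng.standard_normal((64, 64))   # stand-in for a natural image patch
energies = [response_energy(defocus(scene, sd)) for sd in (0.0, 1.0, 2.0, 4.0)]
print(energies)  # decreases as defocus blur grows
```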

Suppression Model.

Interocular suppression is thought to be a central mechanism in amblyopia. We use a basic interocular suppression model (Fig. 5) to describe dynamic contrast modulation based on the ocular balance of the input encoding. If mostly right (left) monocular receptive fields are recruited during cortical encoding, the contrast of the left (right) eye input becomes suppressed in subsequent iterations. This is in agreement with reciprocal excitation of similarly tuned neurons in visual cortex (31, 32). At the same time, the total input energy is kept balanced to ensure similar activity levels for monocular and binocular visual experience as observed experimentally at high contrast levels (33–35) (). This leads to a self-reinforcing suppression cycle when left and right eye inputs are dissimilar (Fig. 5).
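A minimal sketch of such a contrast-unit mechanism is shown below. The functional forms are assumptions chosen for illustration, not the model's equations: the contrast unit leakily integrates the ocular balance of recently recruited neurons, the eye opposite to the dominant one is attenuated exponentially, and the two gains are rescaled so that $g_\text{left}^2 + g_\text{right}^2 = 2$, which keeps the total input energy equal to the balanced binocular case. The toy loop at the end, with a hypothetically degraded right-eye input, reproduces the self-reinforcing cycle of Fig. 5B.

```python
import numpy as np


def update_contrast_unit(state, dominance, rate=0.2):
    """Leaky integration of the ocular balance of recently recruited neurons.

    dominance in [-1, 1]: +1 if only right-monocular receptive fields were
    recruited, -1 if only left-monocular ones, 0 if balanced/binocular.
    """
    return (1 - rate) * state + rate * dominance


def eye_gains(state, strength=1.0):
    """Contrast gains (left, right): right-dominant recruitment (state > 0)
    suppresses the left eye and vice versa. Gains are rescaled so that
    g_left**2 + g_right**2 == 2, i.e. total input energy stays balanced."""
    g_left = np.exp(-strength * max(state, 0.0))
    g_right = np.exp(-strength * max(-state, 0.0))
    scale = np.sqrt(2.0 / (g_left ** 2 + g_right ** 2))
    return g_left * scale, g_right * scale


# Toy suppression cycle: the right eye's input is degraded (quality 0.7), so
# left-monocular neurons win more often, which suppresses the right eye further.
quality_left, quality_right = 1.0, 0.7   # hypothetical input quality
state = 0.0
for _ in range(50):
    g_left, g_right = eye_gains(state)
    dominance = np.tanh(3.0 * (quality_right * g_right - quality_left * g_left))
    state = update_contrast_unit(state, dominance)
print(eye_gains(state))  # left gain > 1 > right gain
```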
Fig. 5.

Interocular suppression model. (A) When mostly right (left) monocular neurons are activated to encode an input image patch, the right (left) contrast unit () is excited, and the left (right) retinal image is suppressed in subsequent iterations. Color hue indicates response selectivity for left eye (blue) or right eye (green). Dashed (solid) lines indicate inhibitory (excitatory) interactions. Connection strength is represented by line thickness. We model interocular suppression as being scale specific (i.e., when the high-resolution foveal region of the left eye is suppressed, the low-resolution periphery of the left eye may still provide unattenuated input) (). (B) Feedback cycle of the suppression model. Disparate inputs to both eyes lead to preferential recruitment of monocular neurons, which results in interocular suppression inducing competition between the eyes. This impedes precise vergence eye movements and exacerbates disparate input (purple; left cycle). On a slower timescale, receptive fields (RFs) adapt to suppression by becoming more monocular, which makes future suppression more likely (red; right cycle). Dashed lines indicate feedback that affects future input processing.

Results

Active Efficient Coding Leads to Self-Calibration of Active Binocular Vision.

In the healthy condition without refractive errors, the model learns to perform precise vergence and accommodation eye movements (Fig. 6 and ). The object is continuously tracked by the eyes (), and most neurons develop binocular receptive fields (). This is not due to artificially introducing a bias for zero disparity during initialization of the model. When receptive fields are adapted to a uniform input disparity distribution, the encoding of zero-disparity input is still most efficient (Fig. 4). Due to the overlap of the left and right eye visual fields, the information contained in the retinal response is smallest for zero disparity when the images projected onto the two eyes maximally overlap. Thus, even an unbiased encoder that can encode inputs of all disparities equally well will tend to encode zero-disparity input more accurately because such input contains less information. This bootstraps the positive feedback loop of active efficient coding (Fig. 4 and ).
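The claim that zero-disparity input carries the least information can be illustrated with a white-noise toy example (an assumption; natural images are correlated, which this sketch ignores). Binocular patches are formed from an 8 × 8 left-eye patch and a right-eye patch shifted horizontally by the disparity, and the rank of their sample covariance counts how many independent pixel values a patch contains. The helper names and sizes are hypothetical, and the sign of the disparity is ignored.

```python
import numpy as np


def binocular_patches(disparity, n=500, size=8, seed=0):
    """Sample binocular patches [left; right] from white-noise 'scenes'.

    The right-eye patch is the left-eye patch shifted horizontally by
    |disparity| pixels, so the eyes share size - |disparity| columns.
    """
    rng = np.random.default_rng(seed)
    shift = abs(disparity)
    scenes = rng.standard_normal((n, size, size + shift))
    left = scenes[:, :, :size]
    right = scenes[:, :, shift:shift + size]
    return np.concatenate([left.reshape(n, -1), right.reshape(n, -1)], axis=1)


def effective_dimension(patches):
    """Rank of the sample covariance: number of independent pixel values."""
    return int(np.linalg.matrix_rank(np.cov(patches, rowvar=False)))


for d in (0, 1, 2, 4):
    print(d, effective_dimension(binocular_patches(d)))
# expected: 64, 72, 80, 96 independent values for disparities 0, 1, 2, 4
```

At zero disparity the 128-pixel binocular patch holds only 64 independent values, so even an encoder with no disparity preference reconstructs it most easily.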
Fig. 6.

Model performance. (A) Average (avg.) absolute (abs.) vergence (verg.) and accommodation (acc.) errors of the left (l.) and right (r.) eye after training under healthy and anisometropic conditions. The dashed line indicates the expected average vergence error when accommodation planes are moved randomly under healthy conditions. (B) Vergence performance of the formerly anisometropic model after correction of all refractive errors at iteration (vertical gray line). The (dotted) solid line indicates the model with (non-)plastic receptive fields (RFs). The initial increase in the vergence error is due to the recalibration of the reinforcement learning module. (C) Histogram of foveal RFs binocularity as measured by the right monocular (monoc.) dominance before and after refractive error correction ( has details).

Accommodation performance becomes highly accurate as well. This is due to the edge-enhancing nature of retinal ganglion cell receptive fields. With their center-surround shape, they are selective for sharp contrasts and respond poorly to out-of-focus input (Fig. 4). For sharper input, the range of responses across the population, and thus the response entropy, increases (). Furthermore, accurate accommodation is achieved without obvious sign cues: in our simplified visual environment, defocus blur does not depend on whether an eye focuses behind or in front of the object. Also, our model provides neither chromatic nor other higher-order aberrations, which could help to steer focus in the right direction (36, 37). Instead, the model learns to infer the sign of defocus from disparity cues (). We further examined this entanglement under abnormal input conditions (e.g., when simulated lenses were placed in front of the eyes of an agent trained under healthy conditions). We find that the responses of the model qualitatively agree with experimental results (38, 39) ().

Anisometropia Drives Model into Amblyopic State.

To test how the model evolves under abnormal rearing conditions, we simulated an anisometropic case by adding a simulated lens in front of the right eye such that it became hyperopic and was unable to focus objects at close distances (Fig. 3, Middle). Therefore, unlike the healthy case, where neither eye is favored over the other, in the anisometropic case, the impaired eye receives systematically more defocused input. Cortical receptive fields reflect this imbalance and become more monocular, favoring the unimpaired eye (compare Fig. 6, Lower and , Upper Center). The combined effect of imbalanced input and adapting receptive fields results in a vicious cycle that drives the model into an amblyopia-like state (Fig. 5). Foveal input from the hyperopic eye becomes actively suppressed (), while the low-resolution peripheral input is unaffected and still provides binocular information such that a coarse control of vergence is maintained (Fig. 6 and ). This results in stable binocular receptive fields in the periphery (, Lower Center), which provide enough information for coarse stereopsis as observed in experiments (40–42). Accommodation adapts such that the stimulus is continuously tracked with the unimpaired eye (). When both eyes were similarly impaired but with opposite sign of the refractive error (Fig. 3, Bottom), receptive fields still become more monocular, but no eye is preferred (, Upper Right). As a result, the relatively more myopic eye is used for near vision, the relatively less myopic eye is used for distant vision (), and the respective other defocused eye is suppressed. At intermediate ranges, the stimulus history determines which eye gets recruited (). This configuration is similar to monovision, which results from a treatment method for presbyopia, where the ametropic condition is achieved via optical lenses or surgery (43).
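How monocular a receptive field has become can be summarized by a right monocular dominance index, as in the histograms of Fig. 6C. The exact definition used for the figure is not reproduced in this section; the sketch below assumes a common norm-ratio index, $(\lVert \phi_R \rVert - \lVert \phi_L \rVert) / (\lVert \phi_R \rVert + \lVert \phi_L \rVert)$, which is $-1$ for a purely left-eye field, 0 for a balanced binocular field, and $+1$ for a purely right-eye field. The random receptive fields in the usage lines are placeholders.

```python
import numpy as np


def monocular_dominance(rf):
    """Right monocular dominance of a binocular receptive field.

    `rf` stacks left- and right-eye halves of equal size (must not be all zero).
    Returns (|right| - |left|) / (|right| + |left|) in [-1, 1].
    """
    left, right = np.split(np.asarray(rf, dtype=float).ravel(), 2)
    nl, nr = np.linalg.norm(left), np.linalg.norm(right)
    return (nr - nl) / (nr + nl)


rng = np.random.default_rng(0)
rfs = rng.standard_normal((300, 128))          # placeholder receptive fields
counts, edges = np.histogram([monocular_dominance(rf) for rf in rfs],
                             bins=np.linspace(-1, 1, 11))
print(counts)  # random fields cluster around 0 (binocular)
```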

Early but Not Late Refractive Correction Rescues Binocular Vision.

To test whether the anisometropic model can recover from amblyopia upon correction of the refractive error, we first trained a fully plastic model under anisometropic conditions until it had converged to the amblyopic state. Then, all refractive errors were corrected. When the receptive fields were fixed after the refractive error was corrected, they remained monocular, and the model did not recover from the amblyopic state. Instead, it maintained a high level of vergence error (Fig. 6). In contrast, when receptive fields remained plastic and could adapt to the changed input statistics, the vergence error decreased (Fig. 6), and the strong suppression of the formerly impaired eye was reduced (). This was due to a shift from monocular to binocular receptive fields as a result of the changed input statistics (Fig. 6). This is in line with a large body of evidence suggesting that limited cortical plasticity in adults prevents recovery from amblyopia after the correction of refractive errors (10, 20, 21). Furthermore, the model predicts that therapies reinstating visual cortex plasticity should be effective.

Discussion

We have shown how simultaneously optimizing both behavior and encoding for efficiency leads to the self-calibration of active binocular vision. Specifically, our model, which is based on the active efficient coding theory, accounts for the simultaneous development of vergence and accommodation. Previous computational models have focused on either the development of disparity tuning or the development of vergence and accommodation control but have failed to capture their rich interdependence (28, 44–46). For example, a model by Hunt et al. (28) explained how disparity tuning may emerge through sparse coding and how alternate rearing conditions could give rise to systematic differences in receptive field properties, but their model completely neglected vergence and accommodation behavior. Conversely, others have presupposed populations of cells readily providing error signals for vergence and accommodation control without explaining their developmental origin (44, 46). Therefore, previous models have failed to explain how the visual system solves the fundamental “chicken-and-egg” problem of disparity tuning and eye movement control: the development of fine disparity detectors requires the ability to accurately focus and align the eyes, which, in turn, relies on the ability to detect fine disparities. Our active efficient coding model solves this problem through the positive feedback loop between disparity tuning, which facilitates the control of eye movements, and improved accommodation and vergence behavior, which enhances the representation of fine disparities. In the end, the tuning properties of sensory neurons reflect the image statistics produced by the system’s own behavior (47). Under healthy conditions, the model develops accurate vergence and accommodation eye movements.
For a simulated anisometropia, however, where one eye suffers from a refractive error while the other eye is unaffected, the model develops into an amblyopia-like state with monocular receptive fields and loss of fine stereopsis. Recovery from this amblyopia-like state is only possible if receptive fields in the model remain plastic, matching findings of a critical period for binocular development (10). An important mechanism in amblyopia is interocular suppression. The simple logic behind the model’s suppression mechanism is that every neuron suppresses input that is incongruent with its own receptive field (34, 35). This implementation proved sufficient to account for the development of an amblyopia-like state, with mostly monocular receptive fields in the representation of the fovea. More sophisticated suppression models could be incorporated in the future (48, 49), but we do not expect them to change the conclusions from the present model. Future work should focus on understanding the principles of interocular suppression within the active efficient coding framework. A topic of current interest is how suppression develops during disease and treatment (e.g., with the standard patching method) (50). A better understanding of the role of suppression in amblyopia could lead to improved therapies in the future. While we have focused on the development of active binocular vision, including accommodation and vergence control, our formulation of active efficient coding is very general and could be applied to many active perception systems across species and sensory modalities. Active efficient coding is rooted in classic efficient coding ideas (2–6), of which predictive coding theories are special examples (51–53). Classic efficient coding does not, however, consider optimizing behavior. Friston’s active inference approach does consider the generation of behavior in a very general fashion. There, motor commands are generated to fulfill sensory predictions.
In our formulation of active efficient coding, motor commands are learned to maximize the mutual information between the sensory input and its cortical representation. This implies maximizing the amount of sensory information sampled from the environment and avoids the problem of deliberately using accommodation to defocus the eyes or closing the eyes altogether to make the sensory input easy to encode and/or predict.

Materials and Methods

Input Image Rendering.

We used 300 grayscale-converted natural images of the "manmade" category from the McGill Database (54). One image was presented at a random position (Fig. 3) during one fixation (i.e., 10 subsequent iterations) before the next image and position were randomly selected for the next fixation. Errors were measured in arbitrary units (a.u.) as distances to the object plane. For every distance unit between the vergence and object planes, the left (right) eye image was shifted 1 px (pixel) to the left (right), so that the two eyes' images were displaced by 2 px relative to each other per distance unit. A Gaussian blur filter was applied to the left and right eye images, with SDs that depended linearly on the distance between the object plane and the respective accommodation plane: a distance of 1 a.u. corresponds to a blur SD of 0.8 px (SI Appendix). For the foveal (peripheral) scale, two retinal images of size () pixels were cropped from the center of the original image (SI Appendix).
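The rendering geometry can be illustrated with a minimal sketch that reduces each image to a single 1D row. The constants (1 px of shift per distance unit and 0.8 px of blur SD per a.u.) come from the text above; the function names, the rounding of shifts to whole pixels, the border clamping, the 3-SD kernel truncation, and the sign convention for the vergence error are our own assumptions, and the model's MATLAB code may handle these details differently.

```python
import math
from typing import List, Tuple

PX_SHIFT_PER_UNIT = 1.0  # each eye's image moves 1 px per distance unit (from the text)
BLUR_SD_PER_UNIT = 0.8   # Gaussian blur SD in px per a.u. of defocus (from the text)


def gaussian_kernel(sd: float) -> List[float]:
    """Normalized 1D Gaussian kernel truncated at 3 SD; sd <= 0 gives the identity."""
    if sd <= 0:
        return [1.0]
    radius = max(1, math.ceil(3 * sd))
    weights = [math.exp(-0.5 * (k / sd) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def sample(row: List[float], x: int) -> float:
    """Read pixel x, clamping indices at the borders (a simplifying assumption)."""
    return row[min(max(x, 0), len(row) - 1)]


def shift(row: List[float], px: int) -> List[float]:
    """Move the row content by px pixels (negative values move it to the left)."""
    return [sample(row, x - px) for x in range(len(row))]


def blur(row: List[float], sd: float) -> List[float]:
    """Convolve the row with a Gaussian kernel of the given SD."""
    kernel = gaussian_kernel(sd)
    radius = len(kernel) // 2
    return [
        sum(w * sample(row, x + j - radius) for j, w in enumerate(kernel))
        for x in range(len(row))
    ]


def render_eyes(
    row: List[float], vergence_err: float, acc_err_left: float, acc_err_right: float
) -> Tuple[List[float], List[float]]:
    """Return the (left, right) retinal rows for signed errors given in a.u."""
    d = round(PX_SHIFT_PER_UNIT * vergence_err)
    left = blur(shift(row, -d), BLUR_SD_PER_UNIT * abs(acc_err_left))
    right = blur(shift(row, d), BLUR_SD_PER_UNIT * abs(acc_err_right))
    return left, right
```

With a vergence error of 1 a.u., an impulse ends up 2 px apart in the two eyes' rows, and an accommodation error of 2 a.u. blurs it with an SD of 1.6 px.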

Input Processing.

The left and right retinal input images were whitened as described by Olshausen and Field (6) (SI Appendix has details). For each scale, images were cut and merged into 81 binocular patches of size 2 × 8 × 8 px, where the peripheral scale was down-sampled with a Gaussian pyramid by a factor of four (SI Appendix). The whitened retinal patches were normalized to zero mean intensity and subsequently contrast adjusted via the interocular suppression mechanism (see below). The contrast-adjusted patches were encoded with the matching pursuit algorithm (26). For each patch, we recruited $N$ of the 300 cortical neurons to encode the image as efficiently as possible. The cortical response $\mathbf{c}$ was determined via an iterative process, where the activities of neurons that were not selected for encoding remained zero. In the first encoding step, $k = 1$, the neuron whose receptive field was most similar to the retinal input $\mathbf{s}$ was selected:

$$i_1 = \arg\max_i \left| \langle \mathbf{f}_i, \mathbf{s} \rangle \right|,$$

where the similarity between a receptive field $\mathbf{f}_i$ and the retinal input $\mathbf{s}$ was measured with the scalar product $\langle \mathbf{f}_i, \mathbf{s} \rangle$. Before the next neuron is selected, all information already encoded by the first neuron is subtracted from the original input $\mathbf{s}$:

$$\mathbf{r}_1 = \mathbf{s} - \langle \mathbf{f}_{i_1}, \mathbf{s} \rangle\, \mathbf{f}_{i_1}, \qquad \mathbf{r}_k = \mathbf{r}_{k-1} - \langle \mathbf{f}_{i_k}, \mathbf{r}_{k-1} \rangle\, \mathbf{f}_{i_k},$$

where $\mathbf{r}_1$ is the residual image after the first encoding step and $\mathbf{r}_k$ is the generalized residual after the $k$th encoding step (with $\mathbf{r}_0 = \mathbf{s}$). Subsequent neurons are selected based on the similarity of their receptive fields with the residual, $i_k = \arg\max_i |\langle \mathbf{f}_i, \mathbf{r}_{k-1} \rangle|$. Greedily selecting the neuron with maximum response in each encoding step reduces the reconstruction error as much as possible at every step (i.e., it greedily maximizes coding efficiency). After encoding, all receptive fields were updated through gradient descent on the reconstruction error $E = \|\mathbf{s} - \sum_i c_i \mathbf{f}_i\|^2 = \|\mathbf{r}_N\|^2$ and normalized to unit length, so that their tuning reflects the input statistics (SI Appendix):

$$\Delta \mathbf{f}_i = \eta\, c_i\, \mathbf{r}_N,$$

where $\eta$ is a learning rate. Each patch of the foveal scale was encoded by a subset of the same 300 neural receptive fields. For the peripheral scale, a separate set of 300 neurons was used for encoding (SI Appendix).
At the beginning of each simulation, all receptive field weights were drawn randomly from a zero-mean Gaussian distribution and subsequently normalized to unit norm.
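The encoding and learning steps can be sketched in plain Python as below. This is an unoptimized illustration under stated assumptions: receptive fields are unit-norm lists, selection uses the absolute scalar product (the usual matching pursuit rule), and the names `matching_pursuit`, `update_fields`, and `random_fields`, the toy sizes, and the learning rate are hypothetical rather than taken from the model's code (ModelDB 261483).

```python
import math
import random
from typing import List, Tuple

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vec) -> Vec:
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def random_fields(n_neurons: int, dim: int, rng: random.Random) -> List[Vec]:
    """Zero-mean Gaussian weights normalized to unit norm, as at simulation start."""
    return [normalize([rng.gauss(0.0, 1.0) for _ in range(dim)]) for _ in range(n_neurons)]


def matching_pursuit(s: Vec, fields: List[Vec], n_active: int) -> Tuple[List[float], Vec, List[float]]:
    """Greedy matching pursuit with unit-norm receptive fields.

    Returns (activities c, final residual r_N, per-step coefficients). Neurons that are
    never selected keep zero activity; a neuron selected twice accumulates its coefficients.
    """
    c = [0.0] * len(fields)
    r = list(s)  # r_0 = s
    steps = []
    for _ in range(n_active):
        responses = [dot(f, r) for f in fields]
        k = max(range(len(fields)), key=lambda i: abs(responses[i]))  # best-matching neuron
        a = responses[k]
        c[k] += a
        steps.append(a)
        r = [ri - a * fi for ri, fi in zip(r, fields[k])]  # remove what neuron k encodes
    return c, r, steps


def update_fields(fields: List[Vec], c: List[float], r: Vec, eta: float) -> List[Vec]:
    """Gradient step on ||s - sum_i c_i f_i||^2 = ||r_N||^2, then renormalize to unit length."""
    updated = []
    for f, ci in zip(fields, c):
        if ci != 0.0:  # unselected neurons have zero gradient
            f = normalize([fj + eta * ci * rj for fj, rj in zip(f, r)])
        updated.append(f)
    return updated
```

In the model, each scale would use 300 such fields; the sizes in a quick check can be much smaller. The per-step coefficients are returned separately because a neuron selected more than once accumulates its coefficients in `c`, whereas the energy-conservation identity of matching pursuit is stated in terms of the individual steps.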

Reinforcement Learning.

We used two separate natural actor-critic reinforcement learners (55, 56) with identical architectures to control the accommodation planes and the vergence plane. Possible actions correspond to shifts in the respective plane positions (compare Fig. 3). The state information vector $\mathbf{z}$ comprises the patch-averaged squared responses of the cortical neurons:

$$z_i = \frac{1}{P} \sum_{p=1}^{P} c_{i,p}^2,$$

where $c_{i,p}$ is the activity of neuron $i$ after encoding patch $p$ and $P$ is the number of patches per scale. Because of the averaging over patches, $\mathbf{z}$ is spatially invariant, and because of the squaring, it does not depend on the polarity of the input. This is similar to the properties of complex cells in primary visual cortex (57, 58). After being normalized to unit norm, the peripheral and foveal state vectors are concatenated into the combined state vector of size 600. The next action $a$ is drawn from the actor's policy $\pi(a \mid \mathbf{z})$, and the critic estimates the state value $V(\mathbf{z})$. The actor and critic weights are updated via approximate natural gradient descent based on the temporal difference (TD) error (algorithm 3 in ref. 56):

$$\delta_t = R_t + \gamma\, V(\mathbf{z}_{t+1}) - V(\mathbf{z}_t),$$

where $R_t$ is the accommodation or vergence reward and $\gamma$ is the temporal discounting factor. At the beginning of each simulation, all network weights were initialized randomly.
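The state computation and the quantities used by the learners can be sketched as follows. The state vector follows the text; the linear-softmax actor and the linear critic V(z) = v · z are assumptions (a common parameterization for natural actor-critic methods with discrete actions), all function names are hypothetical, and the natural-gradient weight updates themselves are not shown.

```python
import math
from typing import List


def state_vector(activities: List[List[float]]) -> List[float]:
    """Patch-averaged squared responses, normalized to unit norm.

    activities[p][i] is the activity of neuron i after encoding patch p.
    """
    n_patches = len(activities)
    n_neurons = len(activities[0])
    z = [sum(activities[p][i] ** 2 for p in range(n_patches)) / n_patches for i in range(n_neurons)]
    norm = math.sqrt(sum(x * x for x in z))
    return [x / norm for x in z] if norm > 0 else z


def combined_state(periphery: List[List[float]], fovea: List[List[float]]) -> List[float]:
    """Concatenate the normalized peripheral and foveal state vectors."""
    return state_vector(periphery) + state_vector(fovea)


def softmax_policy(theta: List[List[float]], z: List[float]) -> List[float]:
    """ASSUMED linear-softmax actor: pi(a | z) proportional to exp(theta_a . z)."""
    logits = [sum(t * x for t, x in zip(row, z)) for row in theta]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - top) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


def td_error(reward: float, z: List[float], z_next: List[float], v: List[float], gamma: float) -> float:
    """delta_t = R_t + gamma * V(z_{t+1}) - V(z_t), with an ASSUMED linear critic V(z) = v . z."""
    def value(state: List[float]) -> float:
        return sum(w * x for w, x in zip(v, state))

    return reward + gamma * value(z_next) - value(z)
```

Squaring makes the state independent of input polarity and averaging over patches makes it independent of patch order; with 300 neurons per scale, `combined_state` returns a 600-dimensional vector.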

Approximating Mutual Information.

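Rewards for the reinforcement learners are based on the squared response after whitening and cortical encoding. Together, this can be understood as an empirical estimate of the mutual information between the whitened retinal response $\mathbf{s}$ and the cortical response $\mathbf{c}$:

$$I(\mathbf{s}; \mathbf{c}) = H(\mathbf{s}) - H(\mathbf{s} \mid \mathbf{c}),$$

where the conditional entropy $H(\mathbf{s} \mid \mathbf{c})$ is upper bounded by the reconstruction error, i.e., the energy $\|\mathbf{r}_N\|^2$ of the residual left after the $N$ matching pursuit encoding steps (SI Appendix). Due to the "energy conservation" property of the matching pursuit algorithm (26), the energy of the residual image is equal to the energy of the retinal representation minus the energy of the cortical representation:

$$\|\mathbf{r}_N\|^2 = \|\mathbf{s}\|^2 - \sum_{k=1}^{N} \langle \mathbf{f}_{i_k}, \mathbf{r}_{k-1} \rangle^2,$$

where $\mathbf{f}_{i_k}$ is the receptive field selected in the $k$th encoding step and $\mathbf{r}_{k-1}$ is the residual before that step ($\mathbf{r}_0 = \mathbf{s}$). Therefore, we take the difference between cortical and retinal response energy, $R_{\text{verg}} = \sum_{k=1}^{N} \langle \mathbf{f}_{i_k}, \mathbf{r}_{k-1} \rangle^2 - \|\mathbf{s}\|^2 = -\|\mathbf{r}_N\|^2$, as the reward for the vergence learner. For the accommodation learner, we maximize the entropy $H(\mathbf{s})$ of the whitened retinal response. We take each entry $s_j$ of $\mathbf{s}$ as an independent sample of the same underlying random variable and estimate the entropy of its probability distribution. The distribution is well approximated by a Laplace distribution, independent of the level of blur in the input (SI Appendix). Therefore, we approximate $H(\mathbf{s})$ with the entropy of a Laplace distribution with the same SD $\sigma_s$:

$$H(\mathbf{s}) \approx 1 + \ln\!\left(\sqrt{2}\,\sigma_s\right).$$

Since the retinal patches have zero mean, the expected squared activity $\langle s_j^2 \rangle$ of the retinal representation is equal to the variance $\sigma_s^2$, so it is also a monotonic function of the entropy estimate. More generally, since the retinal response has bounded support and its probability distribution is unimodal and Lipschitz continuous, the variance is a monotonic function of a lower bound of the entropy (59). Therefore, we use $\langle s_j^2 \rangle$ as an empirical estimate of $H(\mathbf{s})$ for the reward of the accommodation reinforcement learning module. As one would expect for the entropy $H(\mathbf{s})$, $\langle s_j^2 \rangle$ also decreases for increasing input blur. Under the assumption of a flat frequency spectrum after whitening, $\langle s_j^2 \rangle$ can be written explicitly as a decreasing function of $\sigma_{\text{blur}}$, the SD of the Gaussian blur filter that is applied before whitening to simulate defocus blur (SI Appendix).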
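A small numerical check of the quantities behind the rewards, in plain Python. It compares the closed-form Laplace entropy with numerical integration, computes the two rewards from a retinal response and the matching pursuit step coefficients, and illustrates that blur lowers response power for a flat (white) input spectrum. The 1D white-noise blur model, the use of the mean rather than the sum of squared responses, and all function names are our simplifying assumptions; this is not the model's own derivation.

```python
import math
from typing import List, Tuple


def laplace_entropy_from_sd(sd: float) -> float:
    """Differential entropy (nats) of a Laplace distribution with standard deviation sd.

    Scale b has variance 2 b^2 and entropy 1 + ln(2 b); with b = sd / sqrt(2)
    this gives 1 + ln(sqrt(2) * sd).
    """
    return 1.0 + math.log(math.sqrt(2.0) * sd)


def numeric_laplace_entropy(b: float, n: int = 100_000, span: float = 40.0) -> float:
    """Midpoint-rule estimate of -integral p ln p for a Laplace density with scale b."""
    h = 2.0 * span * b / n
    total = 0.0
    for k in range(n):
        x = -span * b + (k + 0.5) * h
        p = math.exp(-abs(x) / b) / (2.0 * b)
        total -= p * math.log(p) * h
    return total


def rewards(retinal: List[float], step_coeffs: List[float]) -> Tuple[float, float]:
    """Vergence reward (cortical minus retinal energy) and accommodation proxy (mean squared response)."""
    retinal_energy = sum(x * x for x in retinal)
    cortical_energy = sum(a * a for a in step_coeffs)
    vergence = cortical_energy - retinal_energy  # equals -||r_N||^2 by energy conservation
    accommodation = retinal_energy / len(retinal)
    return vergence, accommodation


def gaussian_kernel(sd: float) -> List[float]:
    """Normalized 1D Gaussian kernel truncated at 3 SD; sd <= 0 gives the identity."""
    if sd <= 0:
        return [1.0]
    radius = max(1, math.ceil(3 * sd))
    weights = [math.exp(-0.5 * (k / sd) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blurred_white_noise_power(sd: float) -> float:
    """Expected per-pixel power of unit-variance 1D white noise after Gaussian blur.

    For independent inputs, the output variance is the sum of squared kernel weights.
    """
    return sum(w * w for w in gaussian_kernel(sd))
```

In this idealized 1D setting the per-pixel power approaches 1/(2√π σ) once σ is about 1 px or more; for a separable 2D blur it is the square of the 1D value and therefore falls roughly as 1/σ².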

Reward Normalization.

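Before being passed to the reinforcement learning agents, the accommodation and the vergence rewards were normalized online to zero mean and unit variance:

$$\hat{R}_t = \frac{R_t - \mu_t}{\sigma_t},$$

where $\mu_t = \mu_{t-1} + \lambda\,(R_t - \mu_{t-1})$ is the exponentially weighted running average of the reward and $\sigma_t^2$ is an online estimate of its variance,

$$\sigma_t^2 = (1 - \lambda)\left[\sigma_{t-1}^2 + \lambda\,(R_t - \mu_{t-1})^2\right],$$

where $\lambda$ is an update rate that sets the decay of the exponential weighting (60).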
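The online normalization can be sketched as a small stateful helper. It uses an incremental exponentially weighted mean and variance update (Finch-style form); the class name, the initial mean of 0 and variance of 1, and the small `eps` guard are hypothetical choices, not values from the model.

```python
import math


class RunningNormalizer:
    """Online z-scoring with exponentially weighted running mean and variance."""

    def __init__(self, rate: float, init_mean: float = 0.0, init_var: float = 1.0, eps: float = 1e-8):
        self.rate = rate        # update rate lambda: larger values forget faster
        self.mean = init_mean   # hypothetical initial values
        self.var = init_var
        self.eps = eps          # guards against division by zero

    def __call__(self, reward: float) -> float:
        diff = reward - self.mean          # deviation from the previous mean
        incr = self.rate * diff
        self.mean += incr                  # mu_t = mu_{t-1} + lambda * diff
        # sigma_t^2 = (1 - lambda) * (sigma_{t-1}^2 + lambda * diff^2)
        self.var = (1.0 - self.rate) * (self.var + diff * incr)
        return (reward - self.mean) / math.sqrt(self.var + self.eps)
```

Feeding alternating rewards of 14 and 10 with a rate of 0.01 drives the running mean toward 12, the variance toward 4, and the normalized outputs toward ±1.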

Suppression Mechanism.

There are two separate suppression modules, one per scale, that adjust the contrast of the left and right input images (Fig. 5). We introduce a contrast measure $\hat{C}_{L(R)}$ that estimates the amount of left (right) monocular input over the previous iterations. The monocular dominance $m_i^{L(R)}$ of each neuron $i$ is weighted with its relative patch-averaged squared activation $z_i / \sum_j z_j$:

$$\hat{C}_{L(R)} = \Big\langle \sum_i \frac{z_i}{\sum_j z_j}\, m_i^{L(R)} \Big\rangle_{\tau}.$$

Here, $\langle \cdot \rangle_\tau$ is the exponential moving average over time with decay constant $\tau$ (SI Appendix). The monocular dominance $m_i^{L(R)}$ is computed from $\mathbf{f}_i^{L(R)}$, the left (right) monocular subfield of neuron $i$. The contrast estimates of the left and right subfields of the input image are processed separately by two contrast units: as the contrast estimate crosses the threshold $\theta$, the output $\rho_{L(R)}$ increases from zero until it saturates at $\rho_{\max}$. We chose the threshold just above the value for perfectly binocular input to provide some margin before the self-reinforcing feedback loop becomes active (Fig. 5). Furthermore, we set the saturation level $\rho_{\max}$ such that total suppression of one eye is prevented. Finally, the contrast of the subsequent left and right input subpatches for the cortical coder is adjusted according to the contrast-unit outputs, yielding the contrast-adjusted retinal response $\tilde{\mathbf{s}}$. Note that $I(\tilde{\mathbf{s}}; \mathbf{c}) = I(\mathbf{s}; \mathbf{c})$, since $\tilde{\mathbf{s}}$ is homeomorphic to $\mathbf{s}$ (61). Therefore, in our theoretical framework, we do not distinguish between the contrast-adjusted and raw retinal response. For the model implementation, the contrast-adjusted retinal response is used. SI Appendix has additional detail.
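The suppression pathway can be sketched as below to make the data flow concrete. Only the weighting of each neuron's monocular dominance by its relative patch-averaged squared activation follows the text. The dominance definition (share of receptive-field energy in the left subfield), the moving-average form, the piecewise-linear contrast unit, its parameters (threshold 0.6, slope 5, saturation 0.8), and the coupling in which each eye's input is scaled by one minus the other eye's contrast-unit output are all hypothetical placeholders, not the model's actual functions or values.

```python
from typing import List, Tuple


def monocular_dominance(f_left: List[float], f_right: List[float]) -> float:
    """HYPOTHETICAL definition: share of a receptive field's energy in the left subfield."""
    e_left = sum(x * x for x in f_left)
    e_right = sum(x * x for x in f_right)
    return e_left / (e_left + e_right)


def contrast_measure(z: List[float], dominance: List[float]) -> float:
    """Dominance of each neuron weighted by its relative patch-averaged squared activation."""
    total = sum(z)
    return sum(zi / total * mi for zi, mi in zip(z, dominance))


def ema(previous: float, current: float, tau: float) -> float:
    """ASSUMED exponential moving average with time constant tau (in iterations)."""
    return previous + (current - previous) / tau


def contrast_unit(c_hat: float, threshold: float = 0.6, slope: float = 5.0, rho_max: float = 0.8) -> float:
    """HYPOTHETICAL piecewise-linear unit: zero below threshold, saturating at rho_max < 1."""
    return min(rho_max, max(0.0, slope * (c_hat - threshold)))


def adjust_inputs(
    s_left: List[float], s_right: List[float], c_left: float, c_right: float
) -> Tuple[List[float], List[float]]:
    """Scale each eye's subpatch by one minus the other eye's contrast-unit output (assumed)."""
    g_left = 1.0 - contrast_unit(c_right)
    g_right = 1.0 - contrast_unit(c_left)
    return [g_left * x for x in s_left], [g_right * x for x in s_right]
```

Perfectly binocular fields give a dominance of 0.5, below the assumed threshold, so no suppression occurs. A strongly left-dominated representation saturates the left contrast unit and scales the right eye's input by 1 − 0.8 = 0.2 without silencing it, mirroring the role of a saturation level below full suppression.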

Software and Documentation.

Documented MATLAB code of the model is available in ModelDB under accession no. 261483 (62).