Wei-Mien M Hsu1, David B Kastner2, Stephen A Baccus3, Tatyana O Sharpee4. 1. Computational Neurobiology Laboratory, Salk Institute for Biological Studies, La Jolla, CA, USA; Department of Physics, University of California, San Diego, La Jolla, CA, USA. 2. Department of Psychiatry and Behavioral Sciences, University of California, San Francisco, School of Medicine, San Francisco, CA, USA; Department of Neurobiology, Stanford University School of Medicine, Stanford, CA, USA. 3. Department of Neurobiology, Stanford University School of Medicine, Stanford, CA, USA. 4. Computational Neurobiology Laboratory, Salk Institute for Biological Studies, La Jolla, CA, USA; Department of Physics, University of California, San Diego, La Jolla, CA, USA. Electronic address: sharpee@salk.edu.
Abstract
Modulation of neuronal thresholds is ubiquitous in the brain. Phenomena such as figure-ground segmentation, motion detection, stimulus anticipation, and shifts in attention all involve changes in a neuron's threshold based on signals from larger scales than its primary inputs. However, this modulation reduces the accuracy with which neurons can represent their primary inputs, creating a mystery as to why threshold modulation is so widespread in the brain. We find that modulation is less detrimental than other forms of neuronal variability and that its negative effects can be nearly completely eliminated if modulation is applied selectively to sparsely responding neurons in a circuit by inhibitory neurons. We verify these predictions in the retina where we find that inhibitory amacrine cells selectively deliver modulation signals to sparsely responding ganglion cell types. Our findings elucidate the central role that inhibitory neurons play in maximizing information transmission under modulation.
The need to use efficient representations within the nervous system currently provides one of the leading frameworks for understanding neural computation. This framework accounts for a number of different properties of neural responses (Bialek, 2012; Atick and Redlich, 1992; Pitkow and Meister, 2012; Haft and van Hemmen, 1998; Borghuis et al., 2008; Liu et al., 2009; Doi et al., 2012; Zhaoping, 2006; Garrigan et al., 2010; Gjorgjieva et al., 2014; Balasubramanian and Sterling, 2009; Ratliff et al., 2010; Laughlin, 1981; Kastner et al., 2015; Brinkman et al., 2016), including optimal ways for neural circuits to adapt to statistically consistent changes in the input statistics (Bialek, 2012; Fairhall et al., 2001; Simmons et al., 2013; Brenner et al., 2000a). However, it is also important to consider the case where information transmission occurs in the presence of fluctuations in input statistics that might not be strong enough, or persist for a long enough time, to trigger full-scale adaptation. These types of fluctuations are nevertheless important to take into account because they can evoke and/or represent modulatory influences from other circuits, as is ubiquitous in the brain. For example, modulatory influences include contextual or top-down signals about input properties on scales larger than that of the neuron’s primary receptive field, which closely follows the neuron’s linear or the so-called classic receptive field (Vinje and Gallant, 2000). Such contextual effects underlie figure-ground segmentation, motion selectivity, motion reversal or anticipation, and other predictive effects in the retina (Gollisch and Meister, 2010; Kastner and Baccus, 2013, 2014). These effects are also prominent in the cortex where they include cross-orientation suppression (Morrone et al., 1982; Nishimoto et al., 2006) and other non-classic receptive field effects in visual (Roelfsema, 2006; Vinje and Gallant, 2000) and auditory (Bar-Yosef and Nelken, 2007) cortices. 
Threshold modulation can also result from the direct action of neuromodulatory circuits (Aston-Jones and Cohen, 2005) that represent changes in arousal and attention (Kato et al., 2012; Luck et al., 1997; Goris et al., 2014). The ubiquity of modulatory signals makes it essential to consider how they may influence the properties of maximally informative neural circuits.
It turns out that modulation has surprisingly non-trivial effects on information transmission. On one hand, for a sensory circuit, modulation of neuronal threshold that is independent of the primary sensory input is bound to decrease the information that this circuit can transmit about that primary input. On the other hand, we will show that modulation always decreases information less than an equivalent increase in the primary noise. We further show that the negative impacts of modulation can be nearly eliminated if it is directed to a subset of sparsely responding neurons in a coupled neural circuit. In this way, the neural circuit can take advantage of the flexibility afforded by modulation of its response properties without suffering a reduction in information transmission.
We test predictions of this theory on responses of pairs of retinal ganglion cells (RGCs) that encode the same temporal fluctuations of light intensities but with different thresholds (Kastner and Baccus, 2011). These cells have been termed adapting and sensitizing based on their short-term plasticity, but for the present analyses in steady-state conditions, the main differences between these cell types are that adapting cells have higher thresholds and larger noise levels than sensitizing cells. Previous maximally informative solutions for pairs of neurons accounted for many aspects of these neurons’ responses, including why these two separate cell types are observed among Off neurons but not among On neurons (Kastner et al., 2015). 
However, some noticeable quantitative differences between theory and experimental measurements were left unexplained (Kastner et al., 2015). Recent studies have pointed out that incorporating multiple noise sources could affect the predictions for threshold differences between cell types (Brinkman et al., 2016). Therefore, we set out to determine whether modulatory effects on a cell’s threshold would influence the theoretical predictions, bringing them into better agreement with experimental measurements. After testing a number of scenarios, we found that a model where a secondary pathway modulated the threshold of the primary pathway for each cell type (Figure 1) could quantitatively account for the measurements of threshold differences between cell types, across several different contrasts. We envision that this threshold modulation occurs even for a fixed contrast, and in the case of the retina derives from contextual modulation from inputs on scales larger than neuronal receptive field center, or for cortical neurons, the classic receptive field (Vinje and Gallant, 2002).
Figure 1.
Two-pathway model of information transmission with threshold modulation
(A) The experimentally observed neural nonlinearity reflects two noise sources (purple line): the intrinsic noise ν in the primary pathway (blue) and threshold modulation that occurs on longer timescales with variance σ (red). Over time, the observed nonlinearity is an average over different threshold positions and has an effective width νeff.
(B) Threshold variation over time is much stronger than variation in the primary noise. Each point displays the variance in the threshold (μ) and the variance in the slope (ν) across the set of 30 nonlinearities estimated for each adapting cell. Each nonlinearity was estimated using responses to 10 s of visual stimulation with Gaussian white noise.
Fitting the maximally informative model with threshold modulation to the retinal data also made it possible to separate the observed neural variability into the contributions due to threshold modulation and noise in the primary pathway. We found that higher noise levels of adapting cells can be fully explained by larger threshold modulation experienced by these neurons compared to that experienced by sensitizing cells; the primary pathway noise levels were similar for both cell types. Mechanistically, threshold modulation in adapting cells could be implemented as additional input from inhibitory amacrine cells. To confirm this prediction, we then directly recorded from and manipulated sustained Off amacrine cells. These experiments revealed a more reliable distance-dependent input from amacrine cells to adapting cells compared to sensitizing cells, consistent with the scheme where amacrine cells modulate the thresholds of adapting cells.
The theoretical results are obtained here using basic concepts of information theory. Therefore, they should apply not only in the retina but also in the cortex and other neural circuits. The results highlight the importance of using inhibitory neurons to deliver modulatory signals into a circuit, which can provide a new framework for understanding the function of inhibitory neurons in the brain.
RESULTS
Impact of threshold modulation on information transmission
To understand information transmission in the presence of threshold modulation, we modeled responses of individual neurons as binary, 1 or 0, corresponding to the presence or absence of a spike in a small time bin, respectively. Spiking probability is modeled as a threshold crossing event, with a threshold (μ) and a noise level (ν), which determines the variation in neural responses for a given input value. When parameter ν is small, there is only a small range of stimuli for which neuronal responses vary strongly from trial to trial, with a spike probability of ~0.5. For inputs that are either much greater or smaller than the threshold μ, the response is nearly certain, with spike probabilities close to either 1 or 0, cf. Figure 1. When the parameter ν is large, the range of stimuli with uncertain neuronal responses is large. The increase in the uncertainty in neural responses with ν can be quantified using a quantity known as noise entropy (Brenner et al., 2000b), which represents the average uncertainty in the neural responses across different stimuli.
This model of neural responses yields a saturating nonlinearity shown in Figure 1 and described by Equation 1.
In this equation, we write νeff instead of ν to emphasize the fact that the observed noise in neural responses actually represents a joint effect of multiple different types of noise (Brinkman et al., 2016). Here, we will focus on two types of noise: the “primary” noise ν that arises in the direct afferent circuitry for each cell, and the secondary source of variability that arises from the modulation of the threshold μ of the primary pathway and acts on longer timescales. On short timescales, similar to those of the spike generating process, the threshold value does not vary, and variability in neural responses is described by ν only. 
On long timescales (~ seconds), which are necessary to measure the neural input-output function, its width is described by νeff, which combines the primary noise ν with the threshold modulation σ.
We note that, in principle, noise ν in the primary pathway can itself also be subject to modulation, not just the threshold μ. This modulation would also increase νeff. However, in practice, we found that variation in ν was much weaker (Figure 1B). Therefore, in what follows, we focus on the effect of modulation on changes in the threshold.
To analyze the impact of threshold modulation on information transmission, we compute the Shannon mutual information in two steps. In the first step, mutual information between stimuli and neural responses is computed on short timescales, i.e., for a fixed threshold μ, as a difference between the total response entropy S[p(r)] of neural responses and the “noise” entropy S[p(r|x)] in the neural response:
I = S[p(r)] − S[p(r|x)],
where x is the filtered stimulus according to the spatiotemporal receptive field of the neuron, and r ∈ {0, 1} represents the response of a single neuron before the incorporation of the modulation in the secondary pathway (σ = 0, νeff = ν). At this step, the mutual information quantifies the impact of the primary noise (without the input from the modulatory pathway). In the second step, we integrate this mutual information over threshold positions, weighted by the distribution of threshold values, to take into account the impact of variability from the modulatory pathway.
The information in Equation 5 is actually the so-called conditional mutual information (Cover and Thomas, 1991) I(X; R|M) between the input and the responses of the primary pathway, conditional on the signals μ from the modulatory pathway. As such, this information differs from the full information provided jointly by modulatory and primary pathways only by the term I(X; M): I(X; R|M) = I(X; {R, M}) − I(X; M), where I(X; M) represents information provided solely by the modulatory pathway. Because I(X; M) does not depend on the parameters of the nonlinearity of the primary pathway, it can be dropped when searching for the maximally informative properties of the primary pathway. Thus, one can find the maximally informative setting for the primary pathway and the optimal modulation by maximizing information from Equation 5. These arguments generalize to the case of multiple neurons where one evaluates information between inputs X to the primary pathway of each neuron and the vector of responses across the neural population R = {r}, r ∈ {0, 1}.
We start by considering the impact of threshold modulation on single neurons. Here, modulation always decreases information transmission (Figure 2A). However, for an equivalent amount of variance, modulation decreases information less than does primary noise. 
Therefore, if the system has a choice between reducing the primary noise or reducing modulation, it is always preferable to reduce the primary noise first, cf. Figure 2B.
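The single-neuron comparison above can be illustrated numerically. The following is a minimal sketch, not the authors' implementation, and it rests on assumptions that are not stated in the text: the filtered stimulus x is taken to be standard Gaussian, the nonlinearity is taken to be a cumulative Gaussian with threshold μ and width ν, and threshold modulation is taken to be Gaussian with standard deviation σ. Under these assumptions the time-averaged nonlinearity has width sqrt(ν² + σ²), so the two cases compared below share the same average nonlinearity and spike rate. All function names and parameter values are hypothetical.

```python
import numpy as np
from math import erf, sqrt

_erf = np.vectorize(erf)  # vectorized error function from the standard library

# Assumption: the filtered stimulus x is standard Gaussian, sampled on a grid.
X = np.linspace(-8.0, 8.0, 4001)
PX = np.exp(-X**2 / 2.0)
PX /= PX.sum()

def spike_prob(x, mu, nu):
    """Assumed cumulative-Gaussian nonlinearity with threshold mu and width nu."""
    return 0.5 * (1.0 + _erf((x - mu) / (sqrt(2.0) * nu)))

def binary_entropy(p):
    """Entropy in bits of a binary variable with P(1) = p."""
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))

def info_fixed_threshold(mu, nu):
    """Response entropy minus noise entropy for a fixed threshold (bits)."""
    p = spike_prob(X, mu, nu)
    rate = np.sum(PX * p)
    return float(binary_entropy(rate) - np.sum(PX * binary_entropy(p)))

def info_with_modulation(mu, nu, sigma, n_thresholds=201):
    """Average the fixed-threshold information over Gaussian threshold jitter."""
    thresholds = np.linspace(mu - 5 * sigma, mu + 5 * sigma, n_thresholds)
    weights = np.exp(-((thresholds - mu) ** 2) / (2 * sigma**2))
    weights /= weights.sum()
    return float(sum(w * info_fixed_threshold(m, nu)
                     for w, m in zip(weights, thresholds)))

# Hypothetical parameter values, chosen only for illustration.
mu, nu, sigma = 1.0, 0.2, 0.3
nu_eff = sqrt(nu**2 + sigma**2)  # width of the averaged nonlinearity

i_modulated = info_with_modulation(mu, nu, sigma)  # variance added as modulation
i_noisy = info_fixed_threshold(mu, nu_eff)         # same variance added as primary noise
i_clean = info_fixed_threshold(mu, nu)             # no added variability

print(f"no added variability: {i_clean:.3f} bits")
print(f"threshold modulation: {i_modulated:.3f} bits")
print(f"extra primary noise:  {i_noisy:.3f} bits")
```

Because the modulation here is independent of the stimulus, the conditional information cannot fall below the information carried by the averaged nonlinearity, which is why the modulated case is expected to come out ahead of the equivalent primary-noise case in this sketch.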
Figure 2.
Impact of threshold modulation on information transmission
(A) The difference in information before and after adding different types of variability: either modulation (blue lines) or primary noise (black lines). Both types of variability decrease information, but modulation (blue lines) decreases information much less than the primary noise (black lines). We note that both the primary noise and the modulation also increase the spike rate. Therefore, the baseline information (without modulation) is computed for the higher rate that matches the rate in the presence of modulation.
(B) The stronger detrimental effects of primary noise on information transmission compared with modulation are shown here for the case where primary noise and modulatory variance are constrained to a fixed sum. In this case, the smaller the primary noise (bottom x axis), the larger the information (y axis), despite the corresponding increases in modulatory variance (top x axis).
The effect becomes more interesting in groups of neurons, starting with pairs of neurons. Here, we find that if modulation is directed to the neuron with the lowest firing rate in the group, then the negative effect of modulation is almost completely removed, cf. Figures 3A and 3B. In these calculations, the firing rates were assigned to maximize information while constraining the average spike rate across the neurons (Figure S2). We find that one can apply much larger modulation to a single neuron than the modulation distributed to many neurons and still have less of a decrease in information. Selective application of modulation also maximized information in groups of three neurons (Figures 3C and 3D). With three neurons, information was maximally preserved under modulation when it was applied to the neuron with the smallest spike rate. The most detrimental effects of modulation were observed when modulation was applied to the neuron with the largest spike rate. This was followed by progressively better results if modulation was applied equally to all neurons or to the neurons with the intermediate spiking rate. However, these intermediate cases still led to worse performance compared to the case where modulation is directed to the neuron with the lowest spike rate (Figure 3D). The degree of protection from modulation-induced loss is higher for the three-neuron circuit compared with a two-neuron circuit (Figure 3D). This suggests that the benefits of including sparsely responding neurons can be larger in large groups of neurons.
Figure 3.
Modulation directed to sparsely responding neurons protects against modulation-induced information loss
(A) The information loss is smallest when only the lowest-spiking neuron (red line) receives modulation, compared to modulating all neurons (gray line) or the highest-spiking one (blue line). The black line shows information in the absence of modulation. The primary noise ν = 0.2 for all cases; lines with modulation have the same averaged effective noise νeff = 0.4 after modulation. Arrows describe how points on the unmodulated curve change in terms of information and spike rate upon adding the same amount of overall modulation. The red and blue arrows have different final values for spike rate because the modulation-induced increase in the spike rate depends on the initial spike rate values and is different for the lowest and highest spiking neuron in the pair. The averaged effective noises after modulation are νeff = 0.3 for all curves. The spike rates were optimized to yield maximal information for a given average spike rate. The corresponding rates are shown in Figure S2.
(B) Same as (A) but shows the results on an expanded scale in terms of percentage of information loss (relative to the black line in (A), i.e., Iloss = 1 − Ilong–term/Iwithout modulation from Equations 3 and 4). (C and D) Same as (A) and (B) but for three neurons. In (D), results from (B) pertaining to pairs of neurons are re-plotted using dashed lines for comparison. Green lines show the case where modulation is directed to the neurons with intermediate spike rates; other colors are the same as for pairs of neurons. Directing modulation to the most sparse neurons yields the smallest information loss from modulation. Modulation can be more fully compensated in three-neuron groups compared to two neurons, for smaller spike rates. Further details for the plots are provided in the Supplemental Information.
We also examined the case where neurons have the same thresholds and spike rates, as can be optimal for high values of the primary noise (Kastner et al., 2015). In this case, we found that the optimal ways to apply modulation differed depending on whether same-threshold neurons had small or large spike rates, cf. Figure S3. In the case where neurons had small rates, it was optimal to apply modulation equally to both of them. In the case where neurons had large response rates, it was optimal to direct modulation to one of the neurons rather than split it equally between both neurons. In this case, the application of modulation lowered the spike rate in the target neurons. The implication from these results therefore is that if a large neural circuit contains neurons of the same type that have small spike rates, such as, for example, the adapting cells in the retina, then modulation should be applied selectively to the class of neurons with sparse responses and equally within this class of neurons.
Why is it beneficial to direct modulation to the neuron with the lowest spike rate? An intuitive explanation for this phenomenon can be obtained by considering the shape of the information function for a single neuron with respect to its threshold (Figure 4A). This function is concave for small thresholds and convex for large thresholds. This is important because concave functions decrease their value upon averaging of their inputs, as occurs as a result of threshold modulation, while convex functions increase their value. This means that neurons with small thresholds, i.e., high spike rates, will suffer a decrease in information transmission upon modulation, cf. Figure 4B. In contrast, neurons with large thresholds, i.e., small spike rates, will increase information transmission upon threshold modulation. The lower the spike rate is, the greater the increase in the information transmission with modulation. 
This explains why directing modulation to the neuron with the lowest firing rate is more beneficial than directing modulation to neurons with higher firing rate. As a related point, one can also notice in Figure 3B that the protection against modulation-induced loss in information transmission decreases with the average spike rate.
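The convexity argument above can be checked with a small numerical sketch. As before, this is an illustration under assumptions that go beyond the text (standard Gaussian input, a cumulative-Gaussian nonlinearity, and hypothetical threshold and noise values), not the authors' code. It compares the mean information at two thresholds a and b with the information at their midpoint: a negative gap indicates a concave region, where threshold fluctuations reduce information, and a positive gap indicates a convex region, where they increase it.

```python
import numpy as np
from math import erf, sqrt

_erf = np.vectorize(erf)  # vectorized error function from the standard library

# Assumption: standard Gaussian input sampled on a grid.
X = np.linspace(-8.0, 8.0, 4001)
PX = np.exp(-X**2 / 2.0)
PX /= PX.sum()

def binary_entropy(p):
    """Entropy in bits of a binary variable with P(1) = p."""
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))

def info(mu, nu):
    """Information (bits) for an assumed cumulative-Gaussian nonlinearity."""
    p = 0.5 * (1.0 + _erf((X - mu) / (sqrt(2.0) * nu)))
    rate = np.sum(PX * p)
    return float(binary_entropy(rate) - np.sum(PX * binary_entropy(p)))

def jensen_gap(a, b, nu):
    """Mean information at thresholds a and b minus information at the midpoint."""
    return 0.5 * (info(a, nu) + info(b, nu)) - info(0.5 * (a + b), nu)

nu = 0.2  # hypothetical primary noise level
gap_low = jensen_gap(-0.5, 0.5, nu)   # low thresholds, high spike rate
gap_high = jensen_gap(2.5, 3.5, nu)   # high thresholds, sparse responses

print(f"gap at low thresholds:  {gap_low:+.4f} bits")
print(f"gap at high thresholds: {gap_high:+.4f} bits")
```

The two-point average is only a crude stand-in for a continuous distribution of thresholds, but it shows the sign change that drives the argument.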
Figure 4.
Modulation-induced transition in information transmitted as a function of spike rate
(A) Spike probability, computed according to Equation 1, is a convex function of threshold position (black line). In contrast, information (red line) changes convexity as a function of threshold. When a function has positive convexity (solid segments of the curve), the average of its two values at points a and b is always larger than the function value at (a + b)/2. In this regime, fluctuations increase information transmission. The opposite is true for regions of negative convexity (dashed curve). As a result, fluctuations in threshold decrease information when thresholds are low and increase information when thresholds are high, i.e., when neurons respond sparsely.
(B) Threshold modulation increases mutual information from Equation 4 when spike rates are small (filled dots) but decreases it when spike rates exceed a certain transitional value (open dots). The shaded pink region denotes the values where modulation increases information transmission. Thick solid lines show information in the absence of threshold modulation, for two noise levels ν1,2 = 0 (black) and 0.2 (light blue). Thin solid lines and the eight series of color dots on them show how curves shift upon introduction of modulation. Each series of color dots evolves from the same intrinsic noise (ν) and threshold (μ). Color denotes the resulting effective noise νeff. (Inset) The transitional value in response rate is plotted as a function of the intrinsic noise.
(C) Modulation increases response rate.
At this point, it is important to clarify that this increase in information transmission with modulation is accompanied by an increase in the spike rate. Unlike information, the firing rate function is convex for all values of its argument (Figure 4A). As a result, modulation always increases the spike rate (Figure 4C). The increase in the information from modulation is less than it would have been if the rate was simply increased by lowering the threshold, without the modulation. As a result, the information versus rate curve in the presence of modulation has the same shape as in the absence of modulation, just with reduced information for a given rate. Thus, these results are consistent with those in Figure 2A showing modulation decreases information. It is just that the increase in information upon modulation can nearly completely match the increase that would have been observed if the firing rate was increased without modulation.
The conclusions from the theoretical analyses of information transmission in the presence of threshold modulation indicate that modulation should not be equally distributed to all neurons in the target circuit. Instead, it should be directed to the neuron with the lowest spike rate with inhibitory signals. The use of inhibitory signals ensures that the rank ordering of neurons does not change under modulation, and the neuron that receives modulation does not get its spike rate raised. The theoretical analyses also illustrate the need to use neurons with diverse spike rates, because the average spike rate in the circuit sets the upper limit on the amount of information that this group of neurons can transmit, with or without modulation. To have the capability to transmit large amounts of information, the circuit has to include neurons with large spike rates. Including neurons with small response rates and directing modulation to them helps maintain information transmission near its maximal levels in the presence of modulation.
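The rise in spike rate under modulation discussed in this section can be written in closed form in a simplified setting. The worked example below is a sketch under assumptions that are not taken from the text: a standard Gaussian input x, a cumulative-Gaussian nonlinearity Φ with threshold μ and width ν, and Gaussian threshold modulation with standard deviation σ that is independent of x.

```latex
\begin{align}
P(r = 1 \mid x) &= \Phi\!\left(\frac{x - \mu}{\nu}\right),
  \qquad x \sim \mathcal{N}(0, 1), \\
\langle r \rangle_{\sigma = 0} &= \Phi\!\left(\frac{-\mu}{\sqrt{1 + \nu^{2}}}\right), \\
\langle r \rangle_{\sigma} &= \Phi\!\left(\frac{-\mu}{\sqrt{1 + \nu^{2} + \sigma^{2}}}\right).
\end{align}
```

In this setting, for μ > 0 (average rates below 1/2, which covers the sparse regime discussed here), the argument of Φ becomes less negative as σ grows, so the average rate rises with modulation. The effect is small when σ² is small compared with 1 + ν².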
Retinal input-output functions are maximally informative under threshold modulation
We now test these predictions using responses of pairs of cells in the retina that differ in their average spike rates. The adapting and sensitizing cells are two cell types that represent the same temporal pattern of light intensity modulation but have different thresholds. Our first analysis is to fit the maximally informative model with modulation to the responses of pairs of adapting/sensitizing cells. The fit was made while requiring that the effective noise and the average spike rate for the pair matched experimental measurements (see STAR Methods for details). The fit yields estimates for threshold modulation and primary noise for each neuron in the pair as well as an estimate for the difference in their thresholds. These estimates can then be compared to direct experimental measurements of these variables.
We find that the inferred amount of noise in the primary pathway was similar for both adapting and sensitizing cells (Figure 5A). However, the threshold modulation was substantial for adapting cells and very close to zero for the sensitizing cells (Figure 5A). The fitting results were consistent across cell pairs (Table S1). Thus, the differences in the effective noise that are observed between these two cell types (Kastner and Baccus, 2011) are due to differences in threshold modulation. We also note that threshold modulation was small in sensitizing cells even relative to their thresholds (the modulation was ~100 times smaller for sensitizing cells compared to adapting cells, whereas their thresholds are only approximately half as large as those of adapting cells).
(A) Intrinsic neural noise and threshold modulation inferred using the maximally informative model with modulation from retinal data, cf. Equation 8 in Method Details. Both neural types have comparable amounts of intrinsic neural noise (ν) but distinct levels of threshold modulation (σ). All noise types varied linearly with the stimulus contrast, except for modulatory noise in the sensitizing cells, which was small and contrast independent.
(B) The experimentally observed threshold variation (from Figure 1B) is positively correlated across adapting cells (r = 0.3, p = 0.015) with threshold modulation inferred from the maximally informative model from Equation 8. Both axes are in units of contrast. Colors denote different neurons. Data points for the same neuron/color represent measurements from different input contrasts. The bar denotes standard deviation of the data points.
The threshold modulation values predicted by the maximally informative model with modulation can be compared with direct experimental estimates of their threshold modulation. To compute the amount of threshold modulation that is observed experimentally, we estimated neuronal nonlinearities from shorter data subsets (1/4 to 1/6 compared to the full dataset). Each nonlinearity was fit with a logistic function to determine its threshold value. We find that the observed variation in thresholds for a given adapting cell matches that estimated using the maximally informative model (Figure 5B, paired non-parametric t test p = 0.73). (This analysis was only carried out for adapting cells, because threshold modulation was negligible in sensitizing cells.) Those adapting cells that had larger variance in thresholds across trials also had larger values of threshold modulation as indicated by fitting the maximally informative model to the full set of their responses (the correlation was statistically significant, with p = 0.015, Figure 5B). These analyses add credence to the use of the maximally informative model with modulation as a method for separating the noise component that is due to threshold modulation. They also indicate that the observed threshold modulation in adapting cells is maximally informative given their other parameters, such as the primary noise and firing rate.
Another prediction that one can obtain from the maximally informative model with modulation pertains to the differences in the thresholds between adapting and sensitizing cells. Previous predictions for the threshold differences obtained for pairs of neurons without taking modulation into account yielded values that were systematically larger than those observed experimentally (Kastner et al., 2015), replotted in Figure 6 with a black line. 
We find that the maximally informative model with modulation provided more accurate predictions for threshold differences between pairs of neurons than the model with no modulation, cf. Figure 6. Statistically, the threshold differences (in units of contrast) between adapting and sensitizing cells were consistent between the average values across contrasts for each cell pair from the maximally informative model and experimental measurements (paired non-parametric t test p = 0.14). By comparison, the model with no modulation yielded systematically greater threshold differences than are observed experimentally (black line in Figure 6). We note that experimental data points show larger residual variation across different contrasts than our model indicates. The reason for this is that, in the model, noise components and threshold modulation for adapting cells were constrained to change linearly with contrast (to reduce the number of fitted parameters, see STAR Methods). Thus, the model was not meant to predict residual variation across contrasts that remains after rescaling inputs by their contrast. Other than this variability, the predictions of the maximally informative model with modulation for threshold differences between adapting and sensitizing cells are fully consistent with experimental measurements (p > 0.14, Figure 6B).
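The subset-based estimate of threshold variation used earlier in this section (nonlinearities estimated from short data subsets, each fit with a logistic function) can be illustrated on synthetic data. The sketch below is not the authors' analysis code. The simulated thresholds, widths, and bin counts are hypothetical, the stimulus is assumed to be standard Gaussian, and the logistic fit is done by maximum likelihood with Newton's method, which is one reasonable choice among several.

```python
import numpy as np

def fit_logistic(x, r, n_iter=100):
    """Maximum-likelihood fit of P(r=1|x) = 1 / (1 + exp(-(a*x + b))).

    Newton's method with step halving. Returns (threshold, width) = (-b/a, 1/a).
    """
    A = np.column_stack([x, np.ones_like(x)])
    w = np.zeros(2)

    def loglik(w_):
        z = A @ w_
        return np.sum(r * z - np.logaddexp(0.0, z))

    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(A @ w)))
        grad = A.T @ (r - p)
        hess = (A * (p * (1.0 - p))[:, None]).T @ A + 1e-8 * np.eye(2)
        step = np.linalg.solve(hess, grad)
        t = 1.0
        # Halve the step until the log-likelihood does not decrease.
        while loglik(w + t * step) < loglik(w) and t > 1e-6:
            t *= 0.5
        w = w + t * step
        if np.max(np.abs(t * step)) < 1e-9:
            break
    a, b = w
    return -b / a, 1.0 / a

# Hypothetical simulation settings (not taken from the recordings).
rng = np.random.default_rng(0)
n_subsets, n_bins = 30, 5000
true_mean, true_sigma, true_width = 1.0, 0.1, 0.15

fitted_thresholds = []
for _ in range(n_subsets):
    mu_k = true_mean + true_sigma * rng.standard_normal()  # threshold for this subset
    x = rng.standard_normal(n_bins)                        # filtered stimulus
    p_spike = 1.0 / (1.0 + np.exp(-(x - mu_k) / true_width))
    r = (rng.random(n_bins) < p_spike).astype(float)       # binary responses
    mu_hat, _ = fit_logistic(x, r)
    fitted_thresholds.append(mu_hat)

fitted_thresholds = np.array(fitted_thresholds)
threshold_sd = fitted_thresholds.std(ddof=1)
print(f"estimated threshold modulation (s.d.): {threshold_sd:.3f}")
```

The spread of the fitted thresholds mixes true threshold modulation with estimation error from each short subset, so with very short subsets the estimate would be biased upward.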
Figure 6.
Maximally informative model with modulation accounts for threshold differences between adapting and sensitizing cells
Threshold differences between adapting and sensitizing cells are plotted in normalized coordinates relative to their optimal values in the absence of modulation (black lines in top row); see STAR Methods. Top row (A and B) shows normalized threshold differences as a function of average effective noise of the adapting/sensitizing cell pair. Bottom row (C and D) shows normalized threshold differences as a function of difference in the effective noise between the two neurons. Columns show data (left), maximally informative predictions with modulation (right). Different colors denote different cell pairs. Open circles represent data for a given contrast; filled circles show the average across contrasts. Black lines show predictions for threshold differences without threshold modulation. Gray dashed lines denote spinodal lines that separate regions where information has two maxima versus a single maximum. Points close to the spinodal lines (e.g., blue, light blue, and light green) are more difficult to fit because they mark the region where one of the maxima ceases to exist. This pushes the interpolated solutions away from the spinodal line (cf. Figure S1). Despite these technical issues, the overall distribution of mean threshold values normalized across contrasts was not statistically different between fitted and experimental values, p = 0.14.
Amacrine cells as a source of threshold modulation for adapting cells
One of the key predictions of the theory is that modulation should be directed to neurons with low spike rates. However, as we have seen above, modulation increases the spike rate (Figure 4C), albeit by moderate amounts. One way to minimize the risk of altering the rank ordering of neurons in terms of their spike rate is to deliver modulation through inhibitory neurons. In this way, the neuron that is undergoing modulation will automatically have its threshold raised and spike rate lowered. This is consistent with our observations in the retina where adapting neurons, which undergo modulation, also have larger thresholds and smaller spike rates. In the retina, inhibitory amacrine cells could be the source of that input (Figure 7A). If amacrine cells provide stronger inputs to adapting cells than to sensitizing cells, then this would simultaneously explain why the thresholds of adapting cells are higher and more variable than those of sensitizing cells. The fact that both the mean threshold and its modulation vary approximately linearly with contrast is also consistent with this wiring scheme: inputs to and from amacrine cells need only be scaled by contrast, just like inputs within the primary pathway for the adapting and sensitizing cells.
Figure 7.
Distance-dependent inputs from amacrine to adapting cells
(A) Inferred model of the presynaptic circuitry of the two types of Off retinal ganglion cells based on observed differences in the strength of the modulatory pathway.
(B) The nonlinearity of Off ganglion cells during depolarizing (dot) and hyperpolarizing (triangle) current injection into the amacrine cell. Inset shows the unit of the visual stimulus, which consisted of 100 ms steps up/down in contrast followed by 200 ms of mean contrast. The solid and dashed curves show fits with a sigmoid function. Error bars denote the standard error of the firing rate. The distance from the receptive field (RF) of the amacrine cell to that of the adapting cell was 0.090 mm, and to that of the sensitizing cell 0.101 mm.
(C) The amount of inhibitory input from amacrine cells to the adapting cell decreases significantly with distance (p×10−8, F-test). (Inhibition may be direct or polysynaptic, through circuitry involving bipolar cells or other amacrine cells.) The dependence on distance was not statistically significant for sensitizing cells (p = 0.9). Solid lines show exponential fits with distance. Error bars show the standard error of the sample.
We tested this hypothesis by performing a separate set of experiments to analyze how the hyperpolarization and depolarization of sustained Off-type amacrine cells by intracellular current injection affected responses of nearby adapting and sensitizing cells recorded simultaneously with a multielectrode array (see STAR Methods and Figure 7). The setup in these experiments was similar to our recent study (Kastner et al., 2019) that focused on the dynamics of sensitizing cells but included much larger steps in stimulus amplitude to probe responses of both adapting and sensitizing neurons. We analyzed the change in the mean threshold of adapting/sensitizing neurons between hyperpolarization and depolarization of the amacrine cell. When an amacrine cell is hyperpolarized (depolarized), this decreases (increases) its inhibition onto the neurons to which it is directly connected. Although we do not assume that there are direct connections between amacrine cells and the ganglion cells we recorded (the connection could be polysynaptic, through circuitry involving bipolar or other amacrine cells), this approach measures the functional effect of individual amacrine cells. We find that inputs from amacrine cells have a much stronger impact on the thresholds of nearby adapting cells than on those of sensitizing cells (p = 0.04, for cells within 0.2 mm of the amacrine cell RF; cf. Figure 7C). Here, we also plot the change in the threshold as a function of distance between the receptive fields (RFs) of the amacrine cell (that was subjected to hyperpolarization/depolarization) and the adapting/sensitizing cell whose nonlinearity was measured to estimate its threshold. In the case of adapting cells, there was a clear and statistically significant dependence of the threshold shift on the distance to the amacrine cell RF center (p = 8×10−5, F-test compared with the null hypothesis of no dependence on distance).
The dependence was not statistically significant in the case of sensitizing cells (p = 0.9). Thus, these data support the hypothesis that amacrine cells exert a stronger influence on the thresholds of adapting neurons than on those of sensitizing neurons, that the larger thresholds of adapting ganglion cells arise from inhibition by the amacrine cells, and that this inhibition also brings with it stronger threshold modulation.
DISCUSSION
In this work, we analyzed information transmission in the presence of threshold modulation. There are two main conclusions. The first conclusion is that modulation should not be equally applied to all neurons in the circuit. Instead, it should be directed to select neurons, preferably those with low spike rates in the circuit. The second conclusion describes the central role that inhibitory neurons play in delivering modulatory signals into the circuit. These conclusions are obtained from basic analyses using information theory and therefore should apply to all neural circuits. We now discuss the implications of these conclusions, with a focus on cortical circuits.
The first conclusion highlights the need to form circuits using neurons with different spike rates. The large number of sparsely firing neurons in the cortex has long presented a puzzle (Olshausen and Field, 2005). The chief explanation offered so far is that sparse responses arise because of metabolic constraints (Laughlin et al., 1998). However, if metabolic constraints were the leading cause of the sparseness of neural responses, one could hypothetically have used a smaller number of neurons with higher spike rates. The information-theoretic analyses in the presence of modulation offer a different explanation. Neural circuits need to have neurons with both high and low firing rates in order to transmit large amounts of information in the presence of modulation. High-firing neurons make it possible to transmit large amounts of information, whereas neurons with low spike rates protect against loss of information transmission in the presence of modulation.
The second conclusion describes a rather unexpected role for inhibitory neurons as intermediaries for delivering modulation signals. This setup helps to ensure that low-spiking neurons that receive modulation remain in this regime under varying modulation levels.
We find support for this prediction in the retina, where inhibitory amacrine cells send modulatory signals to sparsely spiking adapting cells. If modulation were delivered to adapting cells via an excitatory pathway, then this would risk making their spike rate greater than that of sensitizing cells and losing protection against the negative effects of threshold modulation on information transmission.
The amacrine cells studied here were sustained Off amacrine cells, which have been shown to be involved in various adaptive functions in the retinal circuit. They have been shown to act through disinhibition (Manu and Baccus, 2011), they contribute to the classic receptive field surround in ganglion cells (Manu et al., 2017), and adaptation of their transmission mediates the phenomenon of sensitization (Kastner et al., 2019). The same amacrine cells establish the threshold of the nonlinearity of ganglion cells during steady state (Figure 7), and their dynamics lead to the change in threshold that creates sensitization.
The theory of modulation analyzed here can be implemented with both spiking and non-spiking neurons. The sustained Off amacrine cells that we studied here experimentally are non-spiking, as are many amacrine cells in the salamander. However, elsewhere in the nervous system modulation is commonly delivered using spiking neurons. For example, most of the modulatory signals are delivered to cortical circuits via inhibitory neurons (Harris and Shepherd, 2015). This includes inhibitory neurons expressing the vasoactive intestinal peptide that are major recipients of neuromodulatory and context-dependent inputs from higher-order cortical areas (Harris and Shepherd, 2015). Similarly, somatostatin-expressing inhibitory neurons use this neuropeptide as a co-transmitter with GABA to modulate the activity of local neurons (Liguz-Lecznar et al., 2016).
The slow action of neuropeptides, such as somatostatin, conforms with our modeling framework, where modulation changes neuronal threshold on slower timescales than those on which the primary activation pathway operates. We note also that all of the other inhibitory neurons, including parvalbumin-positive inhibitory neurons, are directly responsive to neuromodulators such as acetylcholine and serotonin (Yi et al., 2014). Furthermore, even when neuromodulators, such as acetylcholine, act directly on excitatory neurons, they first exert an inhibitory response (Dasari et al., 2017) in their target neurons. In addition to these post-synaptic mechanisms of threshold modulation, there are several known mechanisms that operate pre-synaptically (Debanne et al., 2013) and are based on inactivating hyperpolarizing channels. These include inactivation of presynaptic K+ channels and modulation of G-protein coupled receptors that produce tonic inhibition of transmitter release (Debanne et al., 2013) and hyperpolarization-induced recovery of Na channels from inactivation (Rama et al., 2015). Our theoretical results suggest that there might be fundamental information-theoretic reasons why all of these different forms of threshold modulation engage hyperpolarization and inhibitory mechanisms.
Limitations of study
Analysis of information transmission in the presence of modulation was based on the separation of timescales, with threshold modulation having much slower dynamics than the response dynamics of the primary pathway and its noise characteristics.
From a numerical perspective, computation of the mutual information in the presence of threshold modulation (Equation 5) represents a multidimensional integral with a dimensionality equal to the number of cells. We can numerically compute this integral for arbitrary modulation strength only for pairs of neurons. For more than two neurons, we approximate the integral using the perturbation method for small modulation values. However, to compute the higher-dimensional integral without approximation, one might need other algorithms (e.g., Monte Carlo methods), which were not used here.
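As a purely illustrative aside on the Monte Carlo option mentioned above (which was not performed in this study), the sketch below estimates an average over independent Gaussian threshold shifts in several dimensions by random sampling. The integrand `toy_integrand` is a hypothetical stand-in and is not Equation 5; the Gaussian form of the shifts, the function names, and all numerical values are assumptions made for the example.

```python
import math
import random


def monte_carlo_gaussian_average(f, n_dims, sigma, n_samples=20000, seed=0):
    """Estimate E[f(delta)] for delta ~ N(0, sigma^2 I) in n_dims dimensions.

    Returns (estimate, standard_error). The Gaussian distribution of the
    shifts is an assumption made for this illustration.
    """
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(n_samples):
        delta = [rng.gauss(0.0, sigma) for _ in range(n_dims)]
        value = f(delta)
        total += value
        total_sq += value * value
    mean = total / n_samples
    variance = max(total_sq / n_samples - mean * mean, 0.0)
    return mean, math.sqrt(variance / n_samples)


def toy_integrand(delta):
    """Hypothetical integrand with a known answer: E[sum(d^2)] = n_dims * sigma^2."""
    return sum(d * d for d in delta)


estimate, stderr = monte_carlo_gaussian_average(toy_integrand, n_dims=5, sigma=0.3)
```

The appeal of this approach is that the sampling error shrinks with the number of samples regardless of the number of cells, whereas a grid-based integral grows exponentially with dimensionality.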
STAR★METHODS
RESOURCE AVAILABILITY
Lead contact
Further information and requests for resources should be directed to and will be fulfilled by the Lead Contact, Tatyana Sharpee (sharpee@salk.edu).
Materials availability
This study did not generate new unique reagents.
Data and code availability
Data is available upon request.
METHOD DETAILS
Experimental preparation
We use a combination of new and previously published experimental data (Kastner and Baccus, 2011). Full details of the experimental procedures for measuring neural nonlinearities are provided in Kastner and Baccus (2011). Briefly, uniform field stimuli were drawn from a Gaussian distribution with constant mean intensity, M, of 10 mW/m². Contrast is defined as σ = W/M, where W is the SD of the intensity distribution. Neurons were probed with flashes of nine different contrast values from 12% to 36% in 3% intervals. The contrasts were randomly interleaved and repeated. Each contrast was presented, in total, for ≥600 s. For the calculation of the response functions, the first 10 s of data in each contrast were not used to allow for a better estimation of the steady state.
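To make the contrast definition concrete, the following minimal sketch generates the nine contrast levels and draws Gaussian intensities whose SD is W = σM. It is an illustration only and not the stimulus-generation code used in the experiments; the function names, the sample count, and the random seed are assumptions.

```python
import random
import statistics

MEAN_INTENSITY = 10.0  # M in mW/m^2, the value quoted in the text


def contrast_levels(start=12, stop=36, step=3):
    """Contrast values as fractions: 12% to 36% in 3% intervals."""
    return [c / 100.0 for c in range(start, stop + 1, step)]


def gaussian_intensities(contrast, n_samples, mean=MEAN_INTENSITY, seed=0):
    """Draw intensities with SD W = contrast * mean, so that contrast = W / M.

    No clipping of rare negative draws is applied (a simplification).
    """
    rng = random.Random(seed)
    return [rng.gauss(mean, contrast * mean) for _ in range(n_samples)]


levels = contrast_levels()
samples = gaussian_intensities(levels[-1], n_samples=50000)
# Recover the contrast from the samples as SD / mean.
empirical_contrast = statistics.pstdev(samples) / statistics.mean(samples)
```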
Intracellular recording
Simultaneous intracellular and multielectrode recordings from the isolated intact salamander retina were performed as described (Manu and Baccus, 2011) but using stimuli with larger steps in visual contrast to fully probe both adapting and sensitizing nonlinearities. Sustained amacrine cells were distinguished from horizontal cells by their flash response and their spatiotemporal receptive fields, with horizontal cells lacking an inhibitory surround and being greater than 300 μm in diameter. For the intracellular recordings, the stimulus comprised randomly drawn contrasts with amplitudes ranging from 0 to 40% in Michelson contrast units, where Michelson contrast is defined as (Imax − Imin)/(Imax + Imin). The flash amplitude varied randomly every 400 ms: for the first 100 ms the flash was greater than the mean, from 100 to 200 ms the flash was lower than the mean, and for the last 200 ms the flash was at the mean luminance level (cf. inset in Figure 7B). Changing the distribution of amplitudes more slowly than the integration time of ganglion cells allowed for a rapid measurement of the ganglion cell response function without having to also measure the ganglion cell temporal filter (Brenner et al., 2000a). Synchronized to the visual stimulus, we injected hyperpolarizing (−500 pA) or depolarizing (+500 pA) current pulses, randomly interleaved, into the amacrine cell from 100 to 300 ms. The ganglion cell response function was calculated as the firing rate of the ganglion cell from 100 to 400 ms of each contrast amplitude. This focused on the Off response of the ganglion cell.
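The timing of one stimulus unit and of the current pulse can be sketched as follows. This is an illustration under stated assumptions: the up and down steps are taken to be symmetric about the mean (so that the Michelson contrast of the unit equals the step amplitude), the mean is set to 1 in arbitrary units, and the function names and 1-ms resolution are hypothetical.

```python
def michelson_contrast(i_max, i_min):
    """Michelson contrast: (Imax - Imin) / (Imax + Imin)."""
    return (i_max - i_min) / (i_max + i_min)


def flash_unit(amplitude, mean=1.0, dt_ms=1):
    """One 400-ms unit: 0-100 ms above mean, 100-200 ms below, 200-400 ms at mean.

    Symmetric steps of size amplitude * mean are an assumption of this sketch.
    """
    trace = []
    for t in range(0, 400, dt_ms):
        if t < 100:
            trace.append(mean * (1.0 + amplitude))
        elif t < 200:
            trace.append(mean * (1.0 - amplitude))
        else:
            trace.append(mean)
    return trace


def current_pulse(polarity, dt_ms=1, magnitude_pa=500.0):
    """Injected current in pA: a pulse from 100 to 300 ms; polarity is +1 or -1."""
    return [polarity * magnitude_pa if 100 <= t < 300 else 0.0
            for t in range(0, 400, dt_ms)]


trace = flash_unit(0.4)   # largest amplitude quoted in the text (40%)
pulse = current_pulse(-1)  # hyperpolarizing pulse
unit_contrast = michelson_contrast(max(trace), min(trace))
```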
Maximally informative modulation model for two neurons
Here we begin by reviewing the main features of maximally informative solutions for two neurons obtained in the absence of threshold modulation (Kastner et al., 2015; McDonnell et al., 2006). The most prominent feature of the mutual information is a bifurcation that occurs when noise decreases below a certain, critical value (Figure S1). In the case where both neurons have the same noise levels ν1 = ν2, a single peak at zero threshold difference splits into two symmetric peaks as the noise level decreases. Each of these peaks represents equivalent solutions obtained by exchanging neuronal indices. One of the peaks describes the case where μ1 > μ2 whereas the other describes the case where μ1 < μ2. When neurons have different noise values ν1 and ν2, the peak with μ1 < μ2 becomes suboptimal if ν1 > ν2. Thus, the lower-threshold neuron should have lower noise. This agrees with the intuition that a neuron that is more sensitive to small input fluctuations should have smaller noise. From the measurements of the average spike rate for the two neurons, one can predict the critical noise value (ν) below which one can expect to find neurons with different thresholds encoding the same filtered stimulus x. The critical noise value was indeed above the measured noise values for the adapting and sensitizing retinal ganglion cells (RGCs) (Kastner et al., 2015). In addition, one can make detailed predictions for the expected value μ1 − μ2 based on the measurements of other parameters ν1, ν2 and p, where p is the averaged total spiking probability.
Note that both the optimal threshold difference (μ1 − μ2) and critical noise (ν) depend on the average spike rate (pspike) for the cell pair. Therefore, to represent all retinal data (ν1, ν2, μ1 − μ2) in one coordinate frame that is universal across different pspike, we transformed the noise levels to a basis normalized by the rate-dependent ν. Then, we rescaled each observed μ1 − μ2 (y axis) relative to its optimal prediction and spinodal point (the black and the gray-dashed lines in Figure 6A), similar to the rescaling method of Kastner et al. (2015). Here, theoretical predictions were in qualitative agreement with experimental measurements, but quantitatively the observed threshold differences between the adapting/sensitizing neuron pairs were systematically smaller than those predicted based on maximizing information (Figure 6A). We now show that taking into account threshold modulation brings theoretical predictions into agreement with experimental data.
To understand how threshold modulation affects maximally informative threshold positions, one may note that threshold modulation effectively smooths the information surface computed over long timescales (Figure S4). In the regime where the mutual information has two maxima, it has the effect of bringing the maxima closer to each other. Another effect that proved necessary to take into account is that noise in the primary pathway can be larger for the neuron that experiences smaller threshold modulation, leading to a smaller overall effective noise value for that neuron. In this case, the information transmitted matches the smaller (local) of the two maxima of the information. In other words, the model allows for the possibility that coordination of neural thresholds between neurons might not be able to keep up with changes in input statistics for the circuit to match the properties of the global maximum of information.
Instead, we observed that in some cases neural response properties match a local maximum of the information that required smaller adjustments in thresholds following the change in input statistics.
Taking both of these effects into account (threshold modulation and the possibility of local optimality) made it possible to account for the observed threshold differences between sensitizing and adapting cells. Each cell pair was probed with flashes of nine different contrasts, producing four experimental parameters of the neuronal nonlinearity (νeff,1, νeff,2, μ1, μ2) at each contrast. The maximally informative model has six parameters (μ1, μ2, ν1, ν2, σ, σ). It can predict the difference μ1 − μ2 given a set of values for pspike, ν1, ν2, σ, σ; only three of these five parameters are constrained by the measured input-output functions. Thus, the model is under-constrained for one value of contrast. However, experiments indicate that once neurons are adapted to a given value of contrast, parameters of experimentally measured nonlinearities increase approximately as a linear function of contrast (Laughlin, 1981; Kastner and Baccus, 2011; Brenner et al., 2000a; Fairhall et al., 2001; Baccus and Meister, 2002). We use this observation to fit the maximally informative model across contrasts. The resulting model has eight parameters altogether: the linear and offset terms with respect to contrast for each of the four noise terms (ν1, ν2, σ, σ). Because the positions of information maxima are affected by changes in any of these parameters, the maximally informative model can therefore be used to predict 27 independent measurements across contrasts (three values of μ1 − μ2, νeff,1, and νeff,2 for each contrast). Supplemental Information contains additional details related to the formalism of maximizing information transmission in neural responses and the procedures for generating the figures.
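The low-noise behavior described at the start of this subsection, where separated thresholds transmit more information than identical thresholds, can be illustrated numerically. The sketch below is a simplified stand-in and not the authors' implementation: it assumes a cumulative-Gaussian response function, a unit-variance Gaussian stimulus, and conditionally independent binary responses; it imposes no constraint on the average spike rate and includes no threshold modulation. All function names and numerical values are hypothetical.

```python
import math


def spike_prob(x, mu, nu):
    """Assumed cumulative-Gaussian response: P(spike | x), threshold mu, noise nu."""
    return 0.5 * (1.0 + math.erf((x - mu) / (math.sqrt(2.0) * nu)))


def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def pair_information(mu1, mu2, nu1, nu2, x_max=8.0, dx=0.002):
    """I(x; r1, r2) in bits for x ~ N(0, 1), computed as H(R) - H(R | X).

    Integration over x uses a simple rectangle rule on [-x_max, x_max].
    """
    n_steps = int(round(2 * x_max / dx))
    marginal = [0.0, 0.0, 0.0, 0.0]  # P(r1, r2) for (0,0), (0,1), (1,0), (1,1)
    cond_entropy = 0.0
    norm = 0.0
    for k in range(n_steps + 1):
        x = -x_max + k * dx
        w = math.exp(-0.5 * x * x)  # unnormalized Gaussian weight
        p1 = spike_prob(x, mu1, nu1)
        p2 = spike_prob(x, mu2, nu2)
        joint = [(1 - p1) * (1 - p2), (1 - p1) * p2, p1 * (1 - p2), p1 * p2]
        for j in range(4):
            marginal[j] += w * joint[j]
        # Conditional independence: H(R | x) is the sum of the two binary entropies.
        cond_entropy += w * (entropy_bits([p1, 1 - p1]) + entropy_bits([p2, 1 - p2]))
        norm += w
    marginal = [m / norm for m in marginal]
    return entropy_bits(marginal) - cond_entropy / norm


info_same = pair_information(0.0, 0.0, 0.05, 0.05)      # identical thresholds
info_split = pair_information(-0.43, 0.43, 0.05, 0.05)  # separated thresholds
```

At this low noise level the separated-threshold pair transmits more information than the identical-threshold pair, because the two neurons together distinguish three stimulus ranges instead of two.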
Least-squares fitting of threshold modulation model parameters to RGC data
Based on the maximally informative modulation model, at a given pspike the solution for the threshold difference between a pair of adapting and sensitizing cells, Δμmodel, depends nonlinearly on the magnitude of each noise source (ν, σ). This allows us to separately estimate the magnitude of these noise components from the neural data.
The results of least-squares fitting were also constrained to match the observed values of νeff,i. Seven pairs of adapting (index 1) and sensitizing (index 2) cells were probed with nine different contrasts spanning the full range (σ = 12% to 36% in 3% intervals; Kastner and Baccus, 2011). The adaptive dynamics of noise level has been experimentally observed in many sensory systems (Laughlin, 1981; Kastner and Baccus, 2011; Brenner et al., 2000a; Fairhall et al., 2001; Baccus and Meister, 2002). Typically, the width of the transition region of the nonlinearity changes linearly with stimulus contrast (standard deviation). This adaptive process serves to optimize the information processing (Brenner et al., 2000a). Here, we assume that both the primary (ν) and the secondary (σ) noise sources are approximately linearly dependent on contrast (σ),
The effective noise also depends on contrast,
where i = 1, 2 denotes the adapting or sensitizing neuron, respectively.
The parameters are obtained by least-squares fitting for each cell pair while requiring them to also be consistent with the νeff measurements from the shape of the nonlinearity. This model has eight parameters. Although formally it can be fit to data points for each individual cell pair, we reduced the number of parameters by half by focusing on the dominant term between the linear and contrast-independent terms for each type of noise. Initial fitting of the model indicated very small values for four of these terms. The final fitting reported here was obtained by setting these terms to zero, i.e., noise in the primary pathway scales linearly with contrast for both types of cells; threshold modulation was set to be linearly increasing with contrast for adapting cells and to be contrast-independent for sensitizing cells.
The observed nonlinearities for a pair of adapting (index 1) and sensitizing (index 2) cells determine the threshold separation (Δμ = μ1 − μ2) and the effective noise levels (νeff,1 and νeff,2). For each cell pair, we aim to dissect two contributions to νeff,1 and νeff,2: the one from the intrinsic noise level (ν) and that due to threshold modulation (σ), by minimizing the squared error between the retinal data and the model predictions across the nine contrasts (σ = 12% to 36% in 3% intervals). Given a contrast (σ), a data point of a cell pair consists of three components,
and so does our model,
Here, Δμmodel is the predicted threshold separation from our model, which depends on the intrinsic noise ν and the modulatory noise σ of each cell type,
The predicted threshold differences (Δμmodel) were first computed on a discrete grid in the space (ν1, ν2, σ, σ) and then interpolated with a built-in Mathematica function to construct solutions between grid points. To avoid biasing the result toward the component with the largest error bar, we standardized each dimension by the inverse of its standard deviation. That is, the rescaling factors (weights) were
or more specifically,
We defined the sum of weighted squared errors (or residuals) as
where ⊙ denotes component-wise multiplication. The best-fit parameter set minimizes the weighted least-squared error,
which predicts how the intrinsic (ν) and the modulation noise (σ) depend on the stimulus contrast (σ). To quantify the goodness of fit, we use the variance (or reduced χ2)
where d.o.f. = N − n is the number of degrees of freedom; N is the number of observations (nine contrasts in our case), and n is the number of fitted parameters. Note that when threshold modulation is taken into account, the predictions for the minimal threshold differences between the two cell types cannot go below the spinodal line. This makes it difficult to fit the data points adjacent to or below the spinodal region with our model. Therefore, the fitting results for three cell pairs did not adequately capture the trends (Figure 6).
Finally, we also fit a single model across all cell pairs and contrasts. The resulting parameters (provided in the last row of Table S1) were consistent with average values of parameters fitted to individual cell pairs (Figure 5).
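The weighted least-squares procedure described in this subsection can be sketched in a few lines. This is an illustration only: `toy_model` is a hypothetical stand-in for the information-maximizing prediction (which in the study was computed on a grid and interpolated in Mathematica), the data are noiseless synthetic values, the weights are made-up inverse standard deviations, and a coarse grid search replaces whatever optimizer was actually used.

```python
def weighted_sse(data, model, weights):
    """Sum over contrasts of || w ⊙ (d_data - d_model) ||^2."""
    total = 0.0
    for d_obs, d_fit in zip(data, model):
        total += sum((w * (o - f)) ** 2 for w, o, f in zip(weights, d_obs, d_fit))
    return total


def reduced_chi2(sse, n_observations, n_parameters):
    """Weighted squared error per degree of freedom, d.o.f. = N - n."""
    return sse / (n_observations - n_parameters)


def toy_model(contrast, params):
    """Hypothetical prediction (delta_mu, nu_eff_1, nu_eff_2) at one contrast."""
    a1, a2 = params
    return (2.0 * (a1 - a2) * contrast, a1 * contrast, a2 * contrast)


def grid_search(data, contrasts, weights, grid):
    """Return (best_sse, best_params) over all parameter pairs on the grid."""
    best = None
    for a1 in grid:
        for a2 in grid:
            model = [toy_model(c, (a1, a2)) for c in contrasts]
            sse = weighted_sse(data, model, weights)
            if best is None or sse < best[0]:
                best = (sse, (a1, a2))
    return best


contrasts = [c / 100.0 for c in range(12, 37, 3)]       # nine contrasts
true_params = (0.8, 0.5)                                 # made-up values
data = [toy_model(c, true_params) for c in contrasts]    # noiseless synthetic data
weights = (1.0 / 0.02, 1.0 / 0.01, 1.0 / 0.01)           # inverse SDs (made up)
grid = [i / 10.0 for i in range(1, 11)]
best_sse, best_params = grid_search(data, contrasts, weights, grid)
chi2 = reduced_chi2(best_sse, n_observations=len(contrasts), n_parameters=2)
```

Weighting each component by the inverse of its standard deviation puts the three components on a common scale, so that the component with the largest error bar does not dominate the fit.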
Analysis of inhibition from amacrine cells versus RFs distance
To quantify the amount of inhibition from the amacrine cells to adapting/sensitizing ganglion cells (Figure 7), we analyzed how the threshold of the ganglion cells changes when nearby amacrine cells are depolarized or hyperpolarized. For each ganglion cell and amacrine cell condition, the relation between firing rate and filtered input was recorded (cf. Intracellular recording). Fitting the two response curves with sigmoid functions yielded thresholds of a ganglion cell during the hyperpolarizing (μh) and the depolarizing (μd) current injection to the amacrine cell. The difference in thresholds (μd − μh) reflects the impact of amacrine cell inputs on the response properties of the ganglion cell. We analyzed these differences as a function of the receptive field distance between the ganglion and amacrine cells. Overall, the analysis was based on current injection to 40 different amacrine cells and recordings from 144 Off ganglion cells. We note that an amacrine cell usually connects to multiple ganglion cells, and some of the ganglion cells receive inputs from multiple amacrine cells. The red and blue points shown in Figure 7 are obtained by binning (according to RF distance) results from 169 amacrine-to-adapting cell pairs and 32 amacrine-to-sensitizing pairs, respectively. The standard error in RF distance (x axis error) is too small to be visible in the plot.
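As an illustration of the distance analysis, the sketch below fits an exponential decay to threshold shifts as a function of RF distance. The fitting method used for Figure 7C is not stated in the text; the log-linear regression here is a substituted technique that only works when all shifts are positive, and the distances, amplitude, and length constant are made-up synthetic values.

```python
import math


def threshold_shift(mu_depolarized, mu_hyperpolarized):
    """Threshold difference (mu_d - mu_h) for one amacrine/ganglion cell pair."""
    return mu_depolarized - mu_hyperpolarized


def fit_exponential(distances, shifts):
    """Fit shift = A * exp(-distance / L) by linear regression on log(shift).

    Requires all shifts > 0. Returns (A, L).
    """
    logs = [math.log(s) for s in shifts]
    n = len(distances)
    mean_d = sum(distances) / n
    mean_l = sum(logs) / n
    sxx = sum((d - mean_d) ** 2 for d in distances)
    sxy = sum((d - mean_d) * (l - mean_l) for d, l in zip(distances, logs))
    slope = sxy / sxx
    intercept = mean_l - slope * mean_d
    return math.exp(intercept), -1.0 / slope


# Synthetic, noiseless example (values are not taken from the data).
distances_mm = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40]
synthetic_shifts = [0.2 * math.exp(-d / 0.15) for d in distances_mm]
amplitude, length_mm = fit_exponential(distances_mm, synthetic_shifts)
```

With noisy binned data that include shifts near or below zero, a nonlinear least-squares fit of the exponential would be needed instead of the logarithmic transform.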