Literature DB >> 23436991

Optimal properties of analog perceptrons with excitatory weights.

Abstract

The cerebellum is a brain structure which has been traditionally devoted to supervised learning. According to this theory, plasticity at the Parallel Fiber (PF) to Purkinje Cell (PC) synapses is guided by the Climbing fibers (CF), which encode an 'error signal'. Purkinje cells have thus been modeled as perceptrons, learning input/output binary associations. At maximal capacity, a perceptron with excitatory weights expresses a large fraction of zero-weight synapses, in agreement with experimental findings. However, numerous experiments indicate that the firing rate of Purkinje cells varies in an analog, not binary, manner. In this paper, we study the perceptron with analog inputs and outputs. We show that the optimal input has a sparse binary distribution, in good agreement with the burst firing of the Granule cells. In addition, we show that the weight distribution consists of a large fraction of silent synapses, as in previously studied binary perceptron models, and as seen experimentally.

Entities: Chemical Disease Species

Mesh：

Year: 2013 PMID： 23436991 PMCID： PMC3578758 DOI： 10.1371/journal.pcbi.1002919

Source DB: PubMed Journal: PLoS Comput Biol ISSN： 1553-734X Impact factor: 4.475

Introduction

Purkinje cells (PCs) are the only outputs of the cerebellar cortex, a brain structure involved in motor learning. They receive a very large number (150,000) of excitatory synaptic inputs from Granule Cells (GCs) through parallel fibers (PFs), and a single very strong input from the inferior olive through climbing fibers (CFs). Single PCs have long been considered as a neurobiological implementation of a perceptron [1], [2], the simplest feedforward network endowed with supervised learning [3], since CFs are thought to provide PCs with an error signal [4]. A perceptron learns associations between input patterns and a binary output that are imposed to it. Learning is due to synaptic modifications, under the control of an error signal. The learning capabilities of perceptrons have been extensively studied for unbiased [5], [6] as well as biased patterns [6], and for unconstrained synapses [5], [6]. In real neurons, synapses are either excitatory (glutamatergic synapses), or inhibitory (GABAergic synapses), depending on the identity of the pre-synaptic neurons (except during early development, when GABAergic synapses are initially excitatory and then become inhibitory). A multitude of experiments characterizing synaptic plasticity have shown that the strength, but not the sign, of a synapse can be modified by patterns of neuronal activity. This has led to the study of perceptrons with sign-constrained weights [7], [8], [9], [10]. In particular, Brunel et al. [10] showed that when synaptic weights are constrained to be excitatory (positive or zero), a perceptron at maximal capacity has a distribution of synaptic weights with two components: a finite fraction of zero-weight (‘silent’) synapses; and a truncated Gaussian distribution for the rest of the synapses. They further showed that this distribution is in striking agreement with experimental data [10]. Numerous experiments show however that in the course of specific motor tasks, the firing rate of Purkinje cell varies in an analog, not binary, fashion [11], [12], [13], [14]. We therefore set out to investigate the capacity and distribution of synaptic weights of a perceptron storing associations between analog inputs and outputs. More precisely, each input or output unit can take an analog value drawn from a distribution with a given mean and variance. We show that the optimal input distribution matches the firing pattern of the Granule cells, and weight distribution at maximal capacity reproduces the experimental Parallel Fiber to Purkinje cell synaptic weight distribution.

Results

The analog perceptron

The perceptron consists of inputs and one output. Both inputs and outputs take continuous values. We require this perceptron to learn a set of prescribed random input-output associations, where the inputs (, ) are drawn randomly and independently from a distribution , with mean and standard deviation while the target outputs are drawn randomly and independently from a distribution with mean and standard deviation . Note that since and represent firing rates of input and output cells, respectively, they must be non-negative quantities. In particular, , represent the mean firing rates of granule/Purkinje cell, respectively. The output of the perceptron when a pattern is presented in input is given bywhere is a monotonically increasing static transfer function (f-I curve), are the synaptic weights from input , represents inhibitory inputs that cancel the leading order term in so that the argument of is of order 1. In Purkinje cells, these inhibitory inputs are provided by interneurons of the molecular layer. The goal of perceptron learning is to find a set of synaptic weights for which for all . We focus for simplicity on a linear transfer function , but our results can be applied to arbitrary invertible transfer functions . Indeed, the problem of learning associations () in a perceptron with an arbitrary invertible transfer function is equivalent to the problem of learning () in a linear perceptron. All the results derived in this paper can then be applied to a perceptron with transfer function , except that and are now defined to be the two first moments of .

Storage capacity

In the large limit the probability of finding a set of weights that satisfies for all is expected to be 1 if is below a critical value , while it is 0 when [15]. is therefore the number of associations that can be learned per synapse, and is commonly used as a measure of storage capacity. This storage capacity can be computed analytically using the replica method (see Methods) [6], [16], [17], [10], [15]. The capacity is given by is given by the equation , , and depends on the statistics of the associations as Therefore, the maximal capacity only depends on a single parameter , which is a function of the statistics of the patterns that need to be learned. This dependence is shown in Fig. 1A. It shows that the capacity is exactly equal to 0.5 when , while it decreases monotonically as increases.

Figure 1

A. Maximal capacity as a function of .

A. Maximal capacity as a function of .

B. Mean squared error between the output and the target output as a function of , for (). Red: analytical calculation, Eq. (17); Blue, numerical simulations (with parameters: , , , simulation length , average over 20 trails, error bars: standard deviation). If the number of patterns to be learned exceeds the maximal capacity, the mean squared error becomes strictly positive. It can also be computed using the replica method (see Methods, Eq. (17)). Unsurprisingly, it increases monotonically with , as shown in Fig. 1B which shows the result of the analytical calculation, as well as numerical simulations. If uncorrelated noise is added to the perceptron, the total mean squared error is the sum of the error without noise (Eq. 17) and the variance of the uncorrelated noise. In the simulations, inputs and outputs are drawn from an exponential distribution. The weight update at each presentation is the standard perceptron one, i.e.where is the learning rate. is set to zero if application of the update leads to a negative weight. This corresponds to a gradient descent of a cost function proportional to , in the closed orthant . This learning rule is in qualitative agreement with experimental data on synaptic plasticity in GC to PC synapses [18], . In Purkinje cells, the error signal is thought to be conveyed by climbing fiber (CF) activation. Two protocols have been shown to be effective in eliciting long-term plasticity. Pairing GC with and CF activation leads to Long-Term Depression (LTD) of the synapse, while Long-Term Potentiation (LTP) is induced by stimulating the GC alone (see Fig. 3AB of [19] for details). Writing climbing fiber activation as , we see that Eq. (5) is recovered if one chooses , which captures the two experimental protocols described above.

Figure 3

A. Information as a function of for different levels of , and .

B. Optimal (green line, right y-axis) and information (blue line, left y-axis) as a function of .

Distribution of synaptic weights

The distribution of synaptic weights at maximal capacity can also be computed using the replica method (see [10] for details of the calculation). It turns out that the distribution obeys exactly the same equation as in the binary perceptron, i.e.whereand is the average synaptic weight. In particular the fraction of zero weight synapses is . Interestingly, there is a very simple relationship between capacity and fraction of silent synapses, , that holds for any value of . The fraction of silent synapses is shown as a function of in Fig. 2A. It shows that when , and increases monotonically with .

Figure 2

A. Fraction of silent synapses at maximal capacity as a function of .

A. Fraction of silent synapses at maximal capacity as a function of .

B. Distribution of synaptic weights for , at maximal capacity (). Red: analytical calculation, Eq. (6); Blue, numerical simulations (with parameters: , , , simulation length ). C. Fraction of silent synapses as a function of , beyond the maximal capacity (), for (red: analytical calculation, ); blue: numerical simulations, with parameters , , , simulation length , average over 10 trails, error bars: standard deviation). The full distribution of weights is shown in Fig. 2B, together with the results of a numerical simulation (see parameters in the caption of Fig. 2B). The theoretical distribution of synaptic weights is in good agreement with experimental measurements of the efficacy of a large set of GC to PC synapses, using paired recordings in vitro (see Fig. 6A of [10] for details) [20], [21], [10]. Above maximal capacity, , the distribution of synaptic weights is still given by Eq. (6), but the fraction of zero weight synapses decreases monotonically with , and goes to zero in the large limit (see Fig. 2C). In that limit the distribution becomes increasingly close to a Gaussian distribution peaked around a positive value, with a width that tends to zero in the large limit.

Statistics of inputs and outputs maximizing storage capacity

To maximize storage capacity, should be as small as possible. We first ask which distribution of inputs maximize capacity. From Eq. (4), it is clear that to maximize capacity, should be as small as possible, while should be as large as possible. Since is a distribution of firing rates, it must be bounded between 0 and a maximal firing rate . The distribution of a bounded variable that maximizes the variance with a fixed mean is a binary distribution . Thus, we predict that to optimize capacity, patterns of activity in the Granule cell layer should be sparse (to ensure is small), but active cells should be active close to their maximal firing rates. Interestingly, this is in striking agreement with available data [22], [19], [23] showing that (i) Granule cells have very sparse activity in vivo (average firing rates of 0.5 Hz [22]) (ii) they can respond with brief, high frequency bursts of action potentials to sensory inputs (with an average frequency of 77 Hz within the burst, and maximal frequencies up to 250 Hz, see e.g. Fig. 3 of [22]). The next question is which distribution of output firing rates optimizes the capacity. Eq. (4) makes it clear the capacity is optimized for . In this limit however, all input patterns lead to exactly the same output, and the Purkinje cell output contains no information on which input was presented. This is of course not a desirable outcome, and suggests the capacity is not the correct measure to maximize in this case. We therefore turn to the Shannon mutual information between the Purkinje cell output and its inputs as a more suitable measure. In the presence of additive Gaussian noise of zero mean and standard deviation , this is simply the mutual information of a Gaussian channel with a signal-to-noise ratio , i.e. bits per pattern (see e.g. [24]). The total information in bits per synapse is therefore . The information is zero when , and reaches a maximum for a finite value of , which depends on both the noise standard deviation and . Fig. 3A shows the information as a function of , for different values of , for . It shows that the optimal value of increases approximately linearly with for large (see Fig. 3B).

A. Information as a function of for different levels of , and .

B. Optimal (green line, right y-axis) and information (blue line, left y-axis) as a function of .

Discussion

In this paper, we have considered an analog firing rate model for a Purkinje cell with plastic excitatory weights, and derived both its maximal capacity and the distribution of weights at maximal capacity. We showed that the capacity is of the same order as in a binary perceptron model. The distribution of synaptic weights of the analog perceptron is composed at maximal capacity of two parts: a large fraction () of silent synapses and a truncated Gaussian. It has exactly the same shape as in several other models: a standard binary perceptron [10], and a bistable perceptron [25]. This distribution is in quantitative agreement with a combination of electron microscopy and electrophysiological data in adult rat slices [20], [21], [10]. Furthermore, a gradient descent learning rule leading to maximal capacity bears strong similarities with synaptic plasticity experiments: LTD when PF and CF are coactivated, LTP when PF fires alone (i.e. CF below baseline, thus ) [18], [19]. We found that in order to maximize the capacity, the input variance should be as large as possible. We argue that GCs in vivo are close to such an optimal distribution, since they fire high-frequency bursts at very low rates [22], [19], [23]. Furthermore, GC bursts have been found in some experiments to be critical to induce plasticity in PF to PC synapse [26]. Indeed, no plasticity is induced in those protocols with a single GC spike. Secondly, lower variance in the output also increases the capacity, but at a cost of losing information contained in the output, in the presence of noise. For a given variance of the noise, there is an optimal variance of the output that maximizes the information contained in the output. The model we have studied here is essentially equivalent to the ADALINE (Adaptive Linear Neuron) model [27], whose storage capacity, in the absence of constraints on synaptic weights, is equal to 1. The result can be easily intuitively understood by the fact that when , there are exactly N linear equations to solve, Eq. 1, with N unknowns, (see e.g. [15]). We have shown here that the constraints that all synaptic weights should be positive or zero leads to a capacity which is decreased by a factor 2 or more, depending on the value of . This decrease in capacity is similar to what is observed in the standard perceptron with excitatory synapses [7], [8], [9], [10]. Note that learning associations with constrained weights is similar conceptually to non-negative matrix factorization [28], [29]. Generalizations of such models in the temporal domain (the so-called adaptive filter models) have been proposed to describe learning in the cerebellar cortex [30], [31], [32], [33]. It would be of interest to investigate capacity and distribution of synaptic weights of such models. In this paper, we have focused on a single plasticity site, the GC to PC synapse. Many other sites of plasticity are known to exist in the cerebellum [18]. Future studies are needed to clarify the impact of these additional sites of plasticity on the learning capabilities of this structure.

Methods

Calculation of the storage capacity

The replica method involves calculating the average logarithm of the volume of the space of weights satisfying all constraints given by Eq. (1) [6]. To compute the average logarithm, one uses the replica trick: replicas of the system are introduced, one computeswhere represents an average over the patterns, and is a replica index. This calculation is done using a standard procedure. After introducing integral representations for the delta functions, one averages over the distribution of the patterns. One then introduces order parameters together with conjugate parameters , and . We then use a replica-symmetric ansatz (all the order parameters are taken to be independent of replica index ), perform the limit and obtain where in the Equation for (Eq. (12)), the two first lines are identical to the binary perceptron with excitatory weights [10], while the last line is specific to the analog perceptron. In the large limit, the integral in Eq. (11) is dominated by the extremum of . The typical values of all order parameters are then obtained by the resulting saddle point equations, setting the derivatives of with respect to all order parameters to zero. The maximal capacity is obtained in the limit , for which the volume vanishes. This limit yields Eqs. (2,4).

Calculation of the mean squared error

Following [34], we introduce a cost function which is given by the sum of the squared error for all patterns,and compute its minimum over the space of weights. This is done introducing a partition function ,where is an inverse temperature, and computing using the replica method. The mean squared error is then given by To perform this calculation, a new parameter has to be introduced,which will remain finite when in the limit , . The mean squared error is then given bywhere When , diverges to infinity, , and Eqs. (19,20) reduce to Eqs. (2,3).

21 in total

1. Learning the parts of objects by non-negative matrix factorization.

Authors: D D Lee; H S Seung
Journal: Nature Date: 1999-10-21 Impact factor: 49.962

Review 2. Beyond parallel fiber LTD: the diversity of synaptic and non-synaptic plasticity in the cerebellum.

Authors: C Hansel; D J Linden; E D'Angelo
Journal: Nat Neurosci Date: 2001-05 Impact factor: 24.884

3. Decorrelation control by the cerebellum achieves oculomotor plant compensation in simulated vestibulo-ocular reflex.

Authors: Paul Dean; John Porrill; James V Stone
Journal: Proc Biol Sci Date: 2002-09-22 Impact factor: 5.349

4. Integration of quanta in cerebellar granule cells during sensory processing.

Authors: Paul Chadderton; Troy W Margrie; Michael Häusser
Journal: Nature Date: 2004-04-22 Impact factor: 49.962

5. Quantitative study of granule and Purkinje cells in the cerebellar cortex of the rat.

Authors: R J Harvey; R M Napper
Journal: J Comp Neurol Date: 1988-08-08 Impact factor: 3.215

6. Discharge of Purkinje and cerebellar nuclear neurons during rapidly alternating arm movements in the monkey.

Authors: W T Thach
Journal: J Neurophysiol Date: 1968-09 Impact factor: 2.714

7. Simulation of adaptive modification of the vestibulo-ocular reflex with an adaptive filter model of the cerebellum.

Authors: M Fujita
Journal: Biol Cybern Date: 1982 Impact factor: 2.086

8. Functions of interneurons in mouse cerebellum.

Authors: Neal H Barmack; Vadim Yakhnitsa
Journal: J Neurosci Date: 2008-01-30 Impact factor: 6.167

9. Elimination of climbing fiber instructive signals during motor learning.

Authors: Michael C Ke; Cong C Guo; Jennifer L Raymond
Journal: Nat Neurosci Date: 2009-08-16 Impact factor: 24.884

10. Cerebellar motor learning: when is cortical plasticity not enough?

Authors: John Porrill; Paul Dean
Journal: PLoS Comput Biol Date: 2007-10 Impact factor: 4.475

8 in total

1. Is cortical connectivity optimized for storing information?

Authors: Nicolas Brunel
Journal: Nat Neurosci Date: 2016-04-11 Impact factor: 24.884

2. Purkinje cell intrinsic excitability increases after synaptic long term depression.

Authors: Zhen Yang; Fidel Santamaria
Journal: J Neurophysiol Date: 2016-06-15 Impact factor: 2.714

3. Cerebellar learning using perturbations.

Authors: Guy Bouvier; Johnatan Aljadeff; Claudia Clopath; Célian Bimbard; Jonas Ranft; Antonin Blot; Jean-Pierre Nadal; Nicolas Brunel; Vincent Hakim; Boris Barbour
Journal: Elife Date: 2018-11-12 Impact factor: 8.140

4. A Three-Threshold Learning Rule Approaches the Maximal Capacity of Recurrent Neural Networks.

Authors: Alireza Alemi; Carlo Baldassi; Nicolas Brunel; Riccardo Zecchina
Journal: PLoS Comput Biol Date: 2015-08-20 Impact factor: 4.475

5. Energy Efficient Sparse Connectivity from Imbalanced Synaptic Plasticity Rules.

Authors: João Sacramento; Andreas Wichert; Mark C W van Rossum
Journal: PLoS Comput Biol Date: 2015-06-05 Impact factor: 4.475

6. Latching dynamics in neural networks with synaptic depression.

Authors: Carlos Aguilar; Pascal Chossat; Martin Krupa; Frédéric Lavigne
Journal: PLoS One Date: 2017-08-28 Impact factor: 3.240

7. Gradients in the mammalian cerebellar cortex enable Fourier-like transformation and improve storing capacity.

Authors: Isabelle Straub; Laurens Witter; Abdelmoneim Eshra; Miriam Hoidis; Niklas Byczkowicz; Sebastian Maas; Igor Delvendahl; Kevin Dorgans; Elise Savier; Ingo Bechmann; Martin Krueger; Philippe Isope; Stefan Hallermann
Journal: Elife Date: 2020-02-05 Impact factor: 8.140

8. Optimal learning with excitatory and inhibitory synapses.

Authors: Alessandro Ingrosso
Journal: PLoS Comput Biol Date: 2020-12-28 Impact factor: 4.475

8 in total