Mathieu Schiess, Robert Urbanczik, Walter Senn.
Abstract
We study synaptic plasticity in a complex neuronal cell model where NMDA-spikes can arise in certain dendritic zones. In the context of reinforcement learning, two kinds of plasticity rules are derived, zone reinforcement (ZR) and cell reinforcement (CR), which both optimize the expected reward by stochastic gradient ascent. For ZR, the synaptic plasticity response to the external reward signal is modulated exclusively by quantities which are local to the NMDA-spike initiation zone in which the synapse is situated. CR, in addition, uses nonlocal feedback from the soma of the cell, provided by mechanisms such as the backpropagating action potential. Simulation results show that, compared to ZR, the use of nonlocal feedback in CR can drastically enhance learning performance. We suggest that the availability of nonlocal feedback for learning is a key advantage of complex neurons over networks of simple point neurons, which have previously been found to be largely equivalent with regard to computational capability.
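The contrast between a zone-local and a cell-feedback rule can be explored in a deliberately small toy. The sketch below rests on assumptions that do not come from the paper. Each zone emits a binary "NMDA-spike" with sigmoidal probability, the zone outputs add at the soma, and the soma responds with a binary Z through a sigmoidal escape function. `zr_update` is a plain REINFORCE-style (likelihood-ratio) estimator that uses only the zone's own spike and spike probability plus the global reward. `cr_update` marginalizes out the zone's own spike and uses the somatic response Z as nonlocal feedback. All function names, the soma threshold `b_soma`, and the reward `-z` are hypothetical. Neither function is the paper's ZR or CR equation.

```python
# Hypothetical toy model: NOT the paper's cell model or its ZR/CR equations.
import itertools

import numpy as np


def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))


def soma_likelihood(z, s, b_soma):
    """P(Z = z | summed zone output s) for an assumed sigmoidal escape soma."""
    q = sigmoid(s - b_soma)
    return q if z == 1.0 else 1.0 - q


def forward(W, x, b_soma, rng):
    """Zone nu emits y[nu] = 1 with probability p[nu] = sigmoid(W[nu] @ x[nu]);
    zone outputs add at the soma, which then emits a binary response z."""
    p = sigmoid(np.einsum("ij,ij->i", W, x))
    y = (rng.random(p.shape) < p).astype(float)
    z = float(rng.random() < soma_likelihood(1.0, y.sum(), b_soma))
    return p, y, z


def zr_update(R, x, p, y, eta):
    """ZR-like: reward times a purely zone-local eligibility,
    d/dw log P(y[nu] | x[nu]) = (y[nu] - p[nu]) * x[nu]."""
    return eta * R * (y - p)[:, None] * x


def cr_update(R, x, p, y, z, b_soma, eta):
    """CR-like: marginalize the zone's own spike and feed back the somatic
    response, giving d/dw log P(z | other zones) for each zone."""
    s_rest = y.sum() - y                           # drive from the other zones
    l1 = soma_likelihood(z, s_rest + 1.0, b_soma)  # P(z | this zone spiked)
    l0 = soma_likelihood(z, s_rest, b_soma)        # P(z | it stayed silent)
    ratio = (l1 - l0) / (p * l1 + (1.0 - p) * l0)  # large if z is improbable
    return eta * R * (ratio * p * (1.0 - p))[:, None] * x


def exact_expected_updates(W, x, b_soma, reward=lambda z: -z):
    """Enumerate every zone/soma outcome of a small neuron and return the
    exact expectations of the ZR-like and CR-like updates (eta = 1)."""
    p = sigmoid(np.einsum("ij,ij->i", W, x))
    e_zr, e_cr = np.zeros_like(W), np.zeros_like(W)
    for ys in itertools.product([0.0, 1.0], repeat=len(p)):
        y = np.array(ys)
        p_y = np.prod(np.where(y == 1.0, p, 1.0 - p))
        for z in (0.0, 1.0):
            prob = p_y * soma_likelihood(z, y.sum(), b_soma)
            e_zr += prob * zr_update(reward(z), x, p, y, 1.0)
            e_cr += prob * cr_update(reward(z), x, p, y, z, b_soma, 1.0)
    return e_zr, e_cr


def train(rule, n_trials=1500, n_zones=4, n_syn=20, eta=0.5, b_soma=3.0, seed=0):
    """Toy 'stay quiescent' task: one fixed pattern, reward -1 per somatic
    response. Returns the response rate over the last 200 trials."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 0.5, size=(n_zones, n_syn))
    x = (rng.random((n_zones, n_syn)) < 0.5).astype(float)
    responses = []
    for _ in range(n_trials):
        p, y, z = forward(W, x, b_soma, rng)
        R = -z
        if rule == "ZR":
            W += zr_update(R, x, p, y, eta)
        else:
            W += cr_update(R, x, p, y, z, b_soma, eta)
        responses.append(z)
    return float(np.mean(responses[-200:]))
```

On a two-zone neuron, `exact_expected_updates` should return two matching arrays. In this toy, both estimators therefore ascend the same reward gradient on average, so any difference in learning speed between `train("ZR")` and `train("CR")` comes from the variance of the updates. The CR-like `ratio` can also become very large when the observed response is improbable. This is loosely analogous to the large update described for Fig. 2C.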
Year: 2012 PMID: 22657827 PMCID: PMC3365869 DOI: 10.1186/2190-8567-2-2
Source DB: PubMed Journal: J Math Neurosci Impact factor: 1.300
Fig. 1. Sketch of the neuronal cell model. Spatio-temporally clustered postsynaptic potentials (PSPs, green) can give rise to NMDA-spikes (red), which superimpose additively in the soma (blue) and control the generation of action potentials (APs).
Fig. 2. Learning to stay quiescent. (A) Learning curves for cell reinforcement (CR, blue) and zone reinforcement (ZR, red) when the neuron should not respond with any somatic firing to a single, repeatedly presented pattern. Values are averages over 40 runs, each with different initial weights and a different input pattern. (B) Distributions of performance after 1500 trials. (C) A bad run of the CR rule in which performance drops dramatically after the 397th pattern presentation. The grey points show the Euclidean norm of the change in the neuron's weight matrix W, highlighting the excessively large synaptic update after trial 397. (D) Time course of the somatic potential U during trial 397 (the straight line marks a somatic spike). As the blow-up in the bottom row shows more clearly, an NMDA-spike in one zone yields a value of U that stays strongly positive for some time. (U drops thereafter because an NMDA-spike in a different zone ends.) Improbably, however, the sustained elevated value of U does not lead to a somatic spike. Hence, the likelihood of the observed somatic response Z, given the activity in the zone ν where the NMDA-spike occurred, is quite small. Indeed, the actual somatic response would have been much more likely without the NMDA-spike. The discrepancy between these two probabilities yields a large value of the corresponding term in Equation 24, leading to the strong weight change. Error bars show 1 SEM.
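The mechanism in panel D can be illustrated with a worked example in a simplified binary-zone toy. This toy is an assumption and is not Equation 24 of the paper. Let zone ν emit an NMDA-spike with probability p_ν, and let L_1 and L_0 denote the likelihoods of the observed somatic response Z with and without that spike, with the activity of the other zones held fixed. Marginalizing over the zone's own spike then gives the feedback factor below. The numbers are hypothetical.

```latex
\begin{aligned}
\frac{\partial}{\partial w}\log P(Z \mid \text{other zones})
  &= \frac{L_1 - L_0}{p_\nu L_1 + (1 - p_\nu)\,L_0}\,\frac{\partial p_\nu}{\partial w},\\[4pt]
p_\nu = 0.99,\; L_1 = 0.01,\; L_0 = 0.9:\quad
\frac{L_1 - L_0}{p_\nu L_1 + (1 - p_\nu)\,L_0}
  &= \frac{-0.89}{0.0099 + 0.0090} = \frac{-0.89}{0.0189} \approx -47.1 .
\end{aligned}
```

With the same likelihoods but p_ν = 0.5, the factor is only about −1.96 (−0.89 / 0.455). A purely zone-local factor such as y_ν − p_ν never exceeds 1 in magnitude. In this toy, a very large update therefore requires the observed response to be improbable under the marginal, which matches the qualitative account of trial 397 above.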
Fig. 3. Balanced cell reinforcement (bCR, Equation 26) compared to zone reinforcement. (A) Average performance of bCR (green) and ZR (red) on the same task as in Fig. 2A. (B) Performance when learning stimulus-response associations for four different patterns; bCR (green), ZR (red); the x-axis uses a logarithmic scale. The inset shows the distribution of NMDA-spike durations after learning the task with bCR. Performance values are averages over 40 runs, and error bars show 1 SEM. (C) Development of the average reward signal for bCR (green) and ZR (red) when the task is to spike at the midpoint of the single input pattern. The reward is computed from the n output spike times, the target spike time, and the pattern duration T; if there was no output spike within the pattern, we added one at T. (D) Spike raster plot of the output spike times Z for the task in C, using bCR. With ZR, the distribution of spike times after 3000 trials roughly corresponds to that for bCR after 160 trials (vertical line at ∗), where the two performances coincide (see ∗ and black lines in C). The mean and standard deviation of the spike times at the end of learning, averaged across the last 300 trials, were … for bCR and … for ZR.