Reinforcement learning using a continuous time actor-critic framework with spiking neurons.

Nicolas Frémaux, Henning Sprekeler, Wulfram Gerstner.

Abstract

Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have made first steps towards bridging the gap between the two approaches, but faced two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD) learning of Doya (2000) to the case of spiking neurons in an actor-critic network operating in continuous time, and with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task, in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity.
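The abstract refers to a TD signal computed in continuous time but does not write it out. As a rough illustration only, the continuous-time TD error of Doya (2000) is usually written as δ(t) = r(t) − V(t)/τ_r + dV/dt, where V is the critic's value estimate, r the reward rate, and τ_r the reward discount time constant. The sketch below evaluates that expression on a sampled trace. The function name, the uniform time grid, and the finite-difference estimate of dV/dt are assumptions made for this example; the spiking implementation described in the paper is not reproduced here.

```python
def continuous_td_error(values, rewards, dt, tau_r):
    """Return delta(t) = r(t) - V(t)/tau_r + dV/dt on a uniform time grid.

    values  -- sampled value estimates V(t) (hypothetical critic output)
    rewards -- sampled reward rates r(t), same length as values
    dt      -- sampling interval
    tau_r   -- reward discount time constant
    """
    if len(values) != len(rewards):
        raise ValueError("values and rewards must have the same length")
    if len(values) < 2:
        raise ValueError("need at least two samples to estimate dV/dt")

    deltas = []
    last = len(values) - 1
    for i, (v, r) in enumerate(zip(values, rewards)):
        # Central difference in the interior, one-sided at the two ends.
        lo, hi = max(i - 1, 0), min(i + 1, last)
        dv_dt = (values[hi] - values[lo]) / ((hi - lo) * dt)
        deltas.append(r - v / tau_r + dv_dt)
    return deltas


# Constant reward rate 2.0 with the self-consistent value V = r * tau_r = 8.0:
# the prediction is already correct, so the TD error vanishes at every sample.
print(continuous_td_error([8.0] * 5, [2.0] * 5, dt=0.01, tau_r=4.0))
```

A positive δ would indicate that reward arrived earlier or in larger amount than the critic predicted, and a negative δ the opposite. In an actor-critic arrangement this single scalar would modulate learning in both the critic and the actor.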

Year: 2013 PMID: 23592970 PMCID: PMC3623741 DOI: 10.1371/journal.pcbi.1003024

Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475

References: 42 in total

1. Spatial cognition and neuro-mimetic navigation: a model of hippocampal place cell activity.

Authors: A Arleo; W Gerstner
Journal: Biol Cybern Date: 2000-09 Impact factor: 2.086

2. Dopamine receptor activation is required for corticostriatal spike-timing-dependent plasticity.

Authors: Verena Pawlak; Jason N D Kerr
Journal: J Neurosci Date: 2008-03-05 Impact factor: 6.167

3. Gain in sensitivity and loss in temporal contrast of STDP by dopaminergic modulation at hippocampal synapses.

Authors: Ji-Chuan Zhang; Pak-Ming Lau; Guo-Qiang Bi
Journal: Proc Natl Acad Sci U S A Date: 2009-07-20 Impact factor: 11.205

4. A neuronal learning rule for sub-millisecond temporal coding.

Authors: W Gerstner; R Kempter; J L van Hemmen; H Wagner
Journal: Nature Date: 1996-09-05 Impact factor: 49.962

5. The hippocampus as a spatial map. Preliminary evidence from unit activity in the freely-moving rat.

Authors: J O'Keefe; J Dostrovsky
Journal: Brain Res Date: 1971-11 Impact factor: 3.252

6. Spike-based reinforcement learning in continuous state and action space: when policy gradient methods fail.

Authors: Eleni Vasilaki; Nicolas Frémaux; Robert Urbanczik; Walter Senn; Wulfram Gerstner
Journal: PLoS Comput Biol Date: 2009-12-04 Impact factor: 4.475

7. Is there a geometric module for spatial orientation? Insights from a rodent navigation model.

Authors: Denis Sheynikhovich; Ricardo Chavarriaga; Thomas Strösslin; Angelo Arleo; Wulfram Gerstner
Journal: Psychol Rev Date: 2009-07 Impact factor: 8.934

8. Timing is not Everything: Neuromodulation Opens the STDP Gate.

Authors: Verena Pawlak; Jeffery R Wickens; Alfredo Kirkwood; Jason N D Kerr
Journal: Front Synaptic Neurosci Date: 2010-10-25

9. Differential regulation of fronto-executive function by the monoamines and acetylcholine. (Review)

Authors: T W Robbins; A C Roberts
Journal: Cereb Cortex Date: 2007-09 Impact factor: 5.357

10. A learning theory for reward-modulated spike-timing-dependent plasticity with application to biofeedback.

Authors: Robert Legenstein; Dejan Pecevski; Wolfgang Maass
Journal: PLoS Comput Biol Date: 2008-10-10 Impact factor: 4.475

Cited by: 31 in total

1. Spatial generalization in operant learning: lessons from professional basketball.

Authors: Tal Neiman; Yonatan Loewenstein
Journal: PLoS Comput Biol Date: 2014-05-22 Impact factor: 4.475

2. Evolutionary algorithm optimization of biological learning parameters in a biomimetic neuroprosthesis.

Authors: S Dura-Bernal; S A Neymotin; C C Kerr; S Sivagnanam; A Majumdar; J T Francis; W W Lytton
Journal: IBM J Res Dev Date: 2017-05-23 Impact factor: 1.889

3. Control of synaptic plasticity in deep cortical networks. (Review)

Authors: Pieter R Roelfsema; Anthony Holtmaat
Journal: Nat Rev Neurosci Date: 2018-02-16 Impact factor: 34.870

4. A biologically plausible learning rule for the Infomax on recurrent neural networks.

Authors: Takashi Hayakawa; Takeshi Kaneko; Toshio Aoyagi
Journal: Front Comput Neurosci Date: 2014-11-25 Impact factor: 2.380

5. Combining hypothesis- and data-driven neuroscience modeling in FAIR workflows.

Authors: Olivia Eriksson; Upinder Singh Bhalla; Kim T Blackwell; Sharon M Crook; Daniel Keller; Andrei Kramer; Marja-Leena Linne; Ausra Saudargienė; Rebecca C Wade; Jeanette Hellgren Kotaleski
Journal: Elife Date: 2022-07-06 Impact factor: 8.713

6. A Dynamic Connectome Supports the Emergence of Stable Computational Function of Neural Circuits through Reward-Based Learning.

Authors: David Kappel; Robert Legenstein; Stefan Habenschuss; Michael Hsieh; Wolfgang Maass
Journal: eNeuro Date: 2018-04-24

7. Neuromodulatory adaptive combination of correlation-based learning in cerebellum and reward-based learning in basal ganglia for goal-directed behavior control.

Authors: Sakyasingha Dasgupta; Florentin Wörgötter; Poramate Manoonpong
Journal: Front Neural Circuits Date: 2014-10-28 Impact factor: 3.492