
Reinforcement learning using a continuous time actor-critic framework with spiking neurons.

Nicolas Frémaux, Henning Sprekeler, Wulfram Gerstner.

Abstract

Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On the one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have made first steps towards bridging the gap between the two approaches, but faced two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD) learning of Doya (2000) to the case of spiking neurons in an actor-critic network operating in continuous time, and with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task, in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity.
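The quantity at the center of the abstract is the continuous-time TD error of Doya (2000), delta(t) = r(t) - V(t)/tau + dV/dt, where V(t) is the critic's prediction of future reward discounted with time constant tau. The sketch below is a minimal, rate-based illustration of a critic learning from that signal, not the paper's spiking actor-critic network. Everything concrete in it is a hypothetical choice for illustration: the function names (`td_error`, `features`, `train_critic`), the fixed 1D track traversed at constant speed, the Gaussian "place-cell-like" features, the reward zone, and all parameter values. The actor, which in the paper is also modulated by the TD signal, is omitted.

```python
import math


def td_error(r, v, v_prev, dt, tau):
    """Continuous-time TD error delta = r - V/tau + dV/dt (Doya, 2000),
    with dV/dt approximated by a backward difference over one step dt."""
    return r - v / tau + (v - v_prev) / dt


def features(x, centers, sigma):
    """Gaussian tuning curves over position (hypothetical place-cell-like inputs)."""
    return [math.exp(-((x - c) ** 2) / (2 * sigma ** 2)) for c in centers]


def train_critic(n_trials=300, n_features=20, sigma=0.05, T=2.0, dt=0.01,
                 tau=1.0, eta=0.1, reward_rate=10.0, goal=0.95):
    """Train a linear critic V(x) = sum_i w_i * phi_i(x) on a fixed 1D path.

    Hypothetical task: the agent moves from x=0 to x=1 at constant speed in
    T seconds, receives reward at rate `reward_rate` once x >= goal, and the
    trial ends at x=1 (terminal value 0).
    """
    centers = [i / (n_features - 1) for i in range(n_features)]
    w = [0.0] * n_features
    n_steps = int(round(T / dt))

    def value(phi):
        # Critic output for a feature vector, using the current weights.
        return sum(wi * p for wi, p in zip(w, phi))

    for _ in range(n_trials):
        x_prev = 0.0
        phi_prev = features(x_prev, centers, sigma)
        for k in range(1, n_steps + 1):
            x = k / n_steps
            # Reward rate delivered during the interval starting at x_prev.
            r = reward_rate if x_prev >= goal else 0.0
            phi = features(x, centers, sigma)
            # The value after the final step is terminal, i.e. zero.
            v = value(phi) if k < n_steps else 0.0
            delta = td_error(r, v, value(phi_prev), dt, tau)
            # Credit the features of the earlier state (x_prev), scaled by dt.
            for i in range(n_features):
                w[i] += eta * delta * phi_prev[i] * dt
            x_prev, phi_prev = x, phi

    # Return the learned value function as a callable of position.
    return lambda x: value(features(x, centers, sigma))


if __name__ == "__main__":
    V = train_critic()
    for pos in (0.2, 0.5, 0.85):
        print(f"V({pos}) = {V(pos):.3f}")
```

With the backward difference for dV/dt and the update applied to the features of the earlier state, each step is equivalent to discrete TD(0) with discount factor gamma = 1 - dt/tau. After training, the learned V(x) increases as the agent approaches the reward zone, i.e. the critic signals upcoming reward before it arrives; a spiking implementation would replace the linear readout and weight update with spiking critic neurons and a neuromodulated plasticity rule.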

Year:  2013        PMID: 23592970      PMCID: PMC3623741          DOI: 10.1371/journal.pcbi.1003024

Source DB:  PubMed          Journal:  PLoS Comput Biol        ISSN: 1553-734X            Impact factor:   4.475


References (42 in total; first 10 shown):

1.  Spatial cognition and neuro-mimetic navigation: a model of hippocampal place cell activity.

Authors:  A Arleo; W Gerstner
Journal:  Biol Cybern       Date:  2000-09       Impact factor: 2.086

2.  Dopamine receptor activation is required for corticostriatal spike-timing-dependent plasticity.

Authors:  Verena Pawlak; Jason N D Kerr
Journal:  J Neurosci       Date:  2008-03-05       Impact factor: 6.167

3.  Gain in sensitivity and loss in temporal contrast of STDP by dopaminergic modulation at hippocampal synapses.

Authors:  Ji-Chuan Zhang; Pak-Ming Lau; Guo-Qiang Bi
Journal:  Proc Natl Acad Sci U S A       Date:  2009-07-20       Impact factor: 11.205

4.  A neuronal learning rule for sub-millisecond temporal coding.

Authors:  W Gerstner; R Kempter; J L van Hemmen; H Wagner
Journal:  Nature       Date:  1996-09-05       Impact factor: 49.962

5.  The hippocampus as a spatial map. Preliminary evidence from unit activity in the freely-moving rat.

Authors:  J O'Keefe; J Dostrovsky
Journal:  Brain Res       Date:  1971-11       Impact factor: 3.252

6.  Spike-based reinforcement learning in continuous state and action space: when policy gradient methods fail.

Authors:  Eleni Vasilaki; Nicolas Frémaux; Robert Urbanczik; Walter Senn; Wulfram Gerstner
Journal:  PLoS Comput Biol       Date:  2009-12-04       Impact factor: 4.475

7.  Is there a geometric module for spatial orientation? Insights from a rodent navigation model.

Authors:  Denis Sheynikhovich; Ricardo Chavarriaga; Thomas Strösslin; Angelo Arleo; Wulfram Gerstner
Journal:  Psychol Rev       Date:  2009-07       Impact factor: 8.934

8.  Timing is not Everything: Neuromodulation Opens the STDP Gate.

Authors:  Verena Pawlak; Jeffery R Wickens; Alfredo Kirkwood; Jason N D Kerr
Journal:  Front Synaptic Neurosci       Date:  2010-10-25

9.  Differential regulation of fronto-executive function by the monoamines and acetylcholine. (Review)

Authors:  T W Robbins; A C Roberts
Journal:  Cereb Cortex       Date:  2007-09       Impact factor: 5.357

10.  A learning theory for reward-modulated spike-timing-dependent plasticity with application to biofeedback.

Authors:  Robert Legenstein; Dejan Pecevski; Wolfgang Maass
Journal:  PLoS Comput Biol       Date:  2008-10-10       Impact factor: 4.475

Cited by (31 in total; first 10 shown):

1.  Spatial generalization in operant learning: lessons from professional basketball.

Authors:  Tal Neiman; Yonatan Loewenstein
Journal:  PLoS Comput Biol       Date:  2014-05-22       Impact factor: 4.475

2.  Evolutionary algorithm optimization of biological learning parameters in a biomimetic neuroprosthesis.

Authors:  S Dura-Bernal; S A Neymotin; C C Kerr; S Sivagnanam; A Majumdar; J T Francis; W W Lytton
Journal:  IBM J Res Dev       Date:  2017-05-23       Impact factor: 1.889

3.  Control of synaptic plasticity in deep cortical networks. (Review)

Authors:  Pieter R Roelfsema; Anthony Holtmaat
Journal:  Nat Rev Neurosci       Date:  2018-02-16       Impact factor: 34.870

4.  A biologically plausible learning rule for the Infomax on recurrent neural networks.

Authors:  Takashi Hayakawa; Takeshi Kaneko; Toshio Aoyagi
Journal:  Front Comput Neurosci       Date:  2014-11-25       Impact factor: 2.380

5.  Combining hypothesis- and data-driven neuroscience modeling in FAIR workflows.

Authors:  Olivia Eriksson; Upinder Singh Bhalla; Kim T Blackwell; Sharon M Crook; Daniel Keller; Andrei Kramer; Marja-Leena Linne; Ausra Saudargienė; Rebecca C Wade; Jeanette Hellgren Kotaleski
Journal:  Elife       Date:  2022-07-06       Impact factor: 8.713

6.  A Dynamic Connectome Supports the Emergence of Stable Computational Function of Neural Circuits through Reward-Based Learning.

Authors:  David Kappel; Robert Legenstein; Stefan Habenschuss; Michael Hsieh; Wolfgang Maass
Journal:  eNeuro       Date:  2018-04-24

7.  Neuromodulatory adaptive combination of correlation-based learning in cerebellum and reward-based learning in basal ganglia for goal-directed behavior control.

Authors:  Sakyasingha Dasgupta; Florentin Wörgötter; Poramate Manoonpong
Journal:  Front Neural Circuits       Date:  2014-10-28       Impact factor: 3.492

8.  How attention can create synaptic tags for the learning of working memories in sequential tasks.

Authors:  Jaldert O Rombouts; Sander M Bohte; Pieter R Roelfsema
Journal:  PLoS Comput Biol       Date:  2015-03-05       Impact factor: 4.475

9.  RM-SORN: a reward-modulated self-organizing recurrent neural network.

Authors:  Witali Aswolinskiy; Gordon Pipa
Journal:  Front Comput Neurosci       Date:  2015-03-24       Impact factor: 2.380

10.  Liquid computing on and off the edge of chaos with a striatal microcircuit.

Authors:  Carlos Toledo-Suárez; Renato Duarte; Abigail Morrison
Journal:  Front Comput Neurosci       Date:  2014-11-21       Impact factor: 2.380

