
Stimulus representation and the timing of reward-prediction errors in models of the dopamine system.

Elliot A Ludvig, Richard S Sutton, E James Kehoe.

Abstract

The phasic firing of dopamine neurons has been theorized to encode a reward-prediction error as formalized by the temporal-difference (TD) algorithm in reinforcement learning. Most TD models of dopamine have assumed a stimulus representation, known as the complete serial compound, in which each moment in a trial is distinctly represented. We introduce a more realistic temporal stimulus representation for the TD model. In our model, all external stimuli, including rewards, spawn a series of internal microstimuli, which grow weaker and more diffuse over time. These microstimuli are used by the TD learning algorithm to generate predictions of future reward. This new stimulus representation injects temporal generalization into the TD model and enhances the correspondence between model and data in several experiments, including those in which rewards are omitted or received early. This improved fit mostly derives from the absence of large negative errors in the new model, suggesting that dopamine alone can encode the full range of TD errors in these situations.
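The abstract's mechanism can be illustrated with a minimal sketch: a stimulus launches an exponentially decaying memory trace, a bank of Gaussian basis functions over that trace's height yields microstimuli that get weaker and broader with elapsed time, and linear TD(lambda) learns reward predictions over those features. All parameter values and function names below are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def microstimuli(t, n_micro=10, decay=0.985, sigma=0.08):
    """Microstimulus levels t time steps after stimulus onset.

    A memory trace decays exponentially after onset; each Gaussian
    basis function is centered on a different trace height, so later
    microstimuli are weaker and more spread out in time (the trace
    moves more slowly through the lower centers).
    """
    y = decay ** t                                  # decaying memory trace
    centers = np.linspace(1.0, 0.0, n_micro, endpoint=False)
    return y * np.exp(-(y - centers) ** 2 / (2 * sigma ** 2))

def run_trial(w, reward_time=20, alpha=0.05, gamma=0.98, lam=0.95):
    """One trial of linear TD(lambda) over microstimulus features.

    The reward ends the trial (terminal state has zero features).
    Returns the TD error at each step -- the model's dopamine signal.
    """
    deltas = []
    e = np.zeros_like(w)                            # eligibility trace
    x_prev = microstimuli(0)
    for t in range(1, reward_time + 1):
        terminal = (t == reward_time)
        x = np.zeros_like(w) if terminal else microstimuli(t)
        r = 1.0 if terminal else 0.0
        delta = r + gamma * (w @ x) - w @ x_prev    # TD error
        e = gamma * lam * e + x_prev
        w += alpha * delta * e                      # update in place
        deltas.append(delta)
        x_prev = x
    return deltas

# On the first trial the TD error spikes at reward delivery; with
# training, the microstimulus features come to predict the reward
# and that terminal error shrinks.
w = np.zeros(10)
errors_first = run_trial(w)
for _ in range(300):
    errors_late = run_trial(w)
```

Because the Gaussian microstimuli overlap in time, predictions generalize across nearby moments, which is the temporal generalization the abstract contrasts with the complete serial compound's distinct per-moment representation.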


Year:  2008        PMID: 18624657     DOI: 10.1162/neco.2008.11-07-654

Source DB:  PubMed          Journal:  Neural Comput        ISSN: 0899-7667            Impact factor:   2.026


  48 in total

1.  Selective maintenance of value information helps resolve the exploration/exploitation dilemma.

Authors:  Michael N Hallquist; Alexandre Y Dombrovski
Journal:  Cognition       Date:  2018-11-28

2. (Review)  Reinforcement learning models and their neural correlates: An activation likelihood estimation meta-analysis.

Authors:  Henry W Chase; Poornima Kumar; Simon B Eickhoff; Alexandre Y Dombrovski
Journal:  Cogn Affect Behav Neurosci       Date:  2015-06       Impact factor: 3.282

3.  The Medial Prefrontal Cortex Shapes Dopamine Reward Prediction Errors under State Uncertainty.

Authors:  Clara Kwon Starkweather; Samuel J Gershman; Naoshige Uchida
Journal:  Neuron       Date:  2018-04-12       Impact factor: 17.173

4. (Review)  Striatal action-learning based on dopamine concentration.

Authors:  Genela Morris; Robert Schmidt; Hagai Bergman
Journal:  Exp Brain Res       Date:  2009-11-11       Impact factor: 1.972

5. (Review)  Reinforcement learning, conditioning, and the brain: Successes and challenges.

Authors:  Tiago V Maia
Journal:  Cogn Affect Behav Neurosci       Date:  2009-12       Impact factor: 3.282

6.  Alternative time representation in dopamine models.

Authors:  François Rivest; John F Kalaska; Yoshua Bengio
Journal:  J Comput Neurosci       Date:  2009-10-22       Impact factor: 1.621

7.  A model of interval timing by neural integration.

Authors:  Patrick Simen; Fuat Balci; Laura de Souza; Jonathan D Cohen; Philip Holmes
Journal:  J Neurosci       Date:  2011-06-22       Impact factor: 6.167

8.  Rethinking dopamine as generalized prediction error.

Authors:  Matthew P H Gardner; Geoffrey Schoenbaum; Samuel J Gershman
Journal:  Proc Biol Sci       Date:  2018-11-21       Impact factor: 5.349

9.  Learning to represent reward structure: a key to adapting to complex environments.

Authors:  Hiroyuki Nakahara; Okihide Hikosaka
Journal:  Neurosci Res       Date:  2012-10-13       Impact factor: 3.304

10.  Temporal-difference reinforcement learning with distributed representations.

Authors:  Zeb Kurth-Nelson; A David Redish
Journal:  PLoS One       Date:  2009-10-20       Impact factor: 3.240

