
Signals in human striatum are appropriate for policy update rather than value prediction.

Jian Li, Nathaniel D Daw.

Abstract

Influential reinforcement learning theories propose that prediction error signals in the brain's nigrostriatal system guide learning for trial-and-error decision-making. However, since different decision variables can be learned from quantitatively similar error signals, a critical question is: what is the content of decision representations trained by the error signals? We used fMRI to monitor neural activity in a two-armed bandit counterfactual decision task that provided human subjects with information about forgone and obtained monetary outcomes so as to dissociate teaching signals that update expected values for each action, versus signals that train relative preferences between actions (a policy). The reward probabilities of both choices varied independently from each other. This specific design allowed us to test whether subjects' choice behavior was guided by policy-based methods, which directly map states to advantageous actions, or value-based methods such as Q-learning, where choice policies are instead generated by learning an intermediate representation (reward expectancy). Behaviorally, we found human participants' choices were significantly influenced by obtained as well as forgone rewards from the previous trial. We also found subjects' blood oxygen level-dependent responses in striatum were modulated in opposite directions by the experienced and forgone rewards but not by reward expectancy. This neural pattern, as well as subjects' choice behavior, is consistent with a teaching signal for developing habits or relative action preferences, rather than prediction errors for updating separate action values.
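The contrast the abstract draws between value-based and policy-based learning can be illustrated with a minimal sketch. This is not the authors' fitted model: the function names, the learning rate `alpha`, the inverse temperature `beta`, and the single-preference parameterization are all assumptions chosen for illustration. The value-based update moves each arm's expected reward toward its own outcome, so its error term depends on reward expectancy. The policy-based update adjusts one relative preference using the difference between obtained and forgone reward, so the two outcomes push in opposite directions and expectancy does not appear.

```python
import math


def value_update(q, chosen, obtained, forgone, alpha=0.1):
    """Value-based sketch (Q-learning style, hypothetical parameterization).

    q is a two-element list of expected rewards, one per arm. Each arm's
    value moves toward the outcome observed for that arm, so each error
    term depends on the current expectancy q[arm].
    """
    other = 1 - chosen
    q = list(q)  # copy so the caller's list is not mutated
    q[chosen] += alpha * (obtained - q[chosen])  # error for the chosen arm
    q[other] += alpha * (forgone - q[other])     # counterfactual error for the unchosen arm
    return q


def policy_update(pref, chosen, obtained, forgone, alpha=0.1):
    """Policy-based sketch (hypothetical parameterization).

    pref is a single relative preference for arm 0 over arm 1. Obtained and
    forgone rewards enter with opposite signs, and no expected value is used.
    """
    sign = 1.0 if chosen == 0 else -1.0
    return pref + alpha * sign * (obtained - forgone)


def p_choose_arm0(x, beta=1.0):
    """Logistic choice rule; x is q[0] - q[1] or the relative preference."""
    return 1.0 / (1.0 + math.exp(-beta * x))
```

With binary rewards, choosing arm 0, winning, and seeing that arm 1 would have paid nothing raises the preference for arm 0 by `alpha` under the policy sketch regardless of what was expected, whereas the size of the value-based update shrinks as the expected value approaches the outcome.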

Year:  2011        PMID: 21471387      PMCID: PMC3132551          DOI: 10.1523/JNEUROSCI.6316-10.2011

Source DB:  PubMed          Journal:  J Neurosci        ISSN: 0270-6474            Impact factor:   6.167


