Signals in human striatum are appropriate for policy update rather than value prediction.



Abstract

Influential reinforcement learning theories propose that prediction error signals in the brain's nigrostriatal system guide learning for trial-and-error decision-making. However, since different decision variables can be learned from quantitatively similar error signals, a critical question is: what is the content of decision representations trained by the error signals? We used fMRI to monitor neural activity in a two-armed bandit counterfactual decision task that provided human subjects with information about forgone and obtained monetary outcomes so as to dissociate teaching signals that update expected values for each action, versus signals that train relative preferences between actions (a policy). The reward probabilities of both choices varied independently from each other. This specific design allowed us to test whether subjects' choice behavior was guided by policy-based methods, which directly map states to advantageous actions, or value-based methods such as Q-learning, where choice policies are instead generated by learning an intermediate representation (reward expectancy). Behaviorally, we found human participants' choices were significantly influenced by obtained as well as forgone rewards from the previous trial. We also found subjects' blood oxygen level-dependent responses in striatum were modulated in opposite directions by the experienced and forgone rewards but not by reward expectancy. This neural pattern, as well as subjects' choice behavior, is consistent with a teaching signal for developing habits or relative action preferences, rather than prediction errors for updating separate action values.
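The contrast between the two families of learning rules can be made concrete with a small sketch. The code below is illustrative only and is not the authors' fitted model: the function names, the learning rate `alpha`, the inverse temperature `beta`, and the two-action encoding (0 and 1) are all assumptions introduced here. It shows a value-based update, in which each action's expected value moves toward its own outcome, next to a policy-based update, in which a single relative preference is pushed by the difference between obtained and forgone reward with no reward-expectancy term.

```python
import math


def choice_prob(diff, beta=3.0):
    """Probability of choosing action 0 given a value or preference difference.

    Logistic (two-action softmax) rule; beta is an assumed inverse temperature.
    """
    return 1.0 / (1.0 + math.exp(-beta * diff))


def q_learning_update(q, choice, obtained, forgone, alpha=0.2):
    """Value-based sketch: update a separate expected value for each action.

    q is a pair [q0, q1]. The chosen action learns from the obtained outcome
    and the unchosen action from the forgone outcome; each prediction error
    is the outcome minus that action's own reward expectancy.
    """
    other = 1 - choice
    new_q = list(q)
    new_q[choice] += alpha * (obtained - q[choice])
    new_q[other] += alpha * (forgone - q[other])
    return new_q


def policy_update(pref, choice, obtained, forgone, alpha=0.2):
    """Policy-based sketch: update one relative preference for action 0 over 1.

    The teaching signal is obtained minus forgone reward, so the two outcomes
    act in opposite directions and no expected value enters the update.
    """
    signal = obtained - forgone
    direction = 1.0 if choice == 0 else -1.0
    return pref + alpha * direction * signal


if __name__ == "__main__":
    # One trial: action 0 chosen, it paid off (1) and the forgone option did not (0).
    q = q_learning_update([0.5, 0.5], choice=0, obtained=1.0, forgone=0.0)
    pref = policy_update(0.0, choice=0, obtained=1.0, forgone=0.0)
    print("action values:", q, "P(choose 0):", choice_prob(q[0] - q[1]))
    print("preference:", pref, "P(choose 0):", choice_prob(pref))
```

In this sketch the value-based error shrinks as the expected value approaches the outcome, whereas the policy signal depends only on the obtained and forgone rewards. That is the kind of distinction the abstract describes between striatal responses that track reward expectancy and responses that do not.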


Year: 2011 PMID: 21471387 PMCID: PMC3132551 DOI: 10.1523/JNEUROSCI.6316-10.2011

Source DB: PubMed Journal: J Neurosci ISSN: 0270-6474 Impact factor: 6.167

47 in total

1. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control.

Authors: Nathaniel D Daw; Yael Niv; Peter Dayan
Journal: Nat Neurosci Date: 2005-11-06 Impact factor: 24.884

2. Valid conjunction inference with the minimum statistic.

Authors: Thomas Nichols; Matthew Brett; Jesper Andersson; Tor Wager; Jean-Baptiste Poline
Journal: Neuroimage Date: 2005-04-15 Impact factor: 6.556

3. Representation of action-specific reward values in the striatum.

Authors: Kazuyuki Samejima; Yasumasa Ueda; Kenji Doya; Minoru Kimura
Journal: Science Date: 2005-11-25 Impact factor: 47.728

4. Making working memory work: a computational model of learning in the prefrontal cortex and basal ganglia.

Authors: Randall C O'Reilly; Michael J Frank
Journal: Neural Comput Date: 2006-02 Impact factor: 2.026

5. The role of the dorsomedial striatum in instrumental conditioning.

Authors: Henry H Yin; Sean B Ostlund; Barbara J Knowlton; Bernard W Balleine
Journal: Eur J Neurosci Date: 2005-07 Impact factor: 3.386

6. Regret and its avoidance: a neuroimaging study of choice behavior.

Authors: Giorgio Coricelli; Hugo D Critchley; Mateus Joffily; John P O'Doherty; Angela Sirigu; Raymond J Dolan
Journal: Nat Neurosci Date: 2005-08-07 Impact factor: 24.884

7. Dynamic response-by-response models of matching behavior in rhesus monkeys.

Authors: Brian Lau; Paul W Glimcher
Journal: J Exp Anal Behav Date: 2005-11 Impact factor: 2.468

8. Characterizing dynamic brain responses with fMRI: a multivariate approach.

Authors: K J Friston; C D Frith; R S Frackowiak; R Turner
Journal: Neuroimage Date: 1995-06 Impact factor: 6.556

9. An fMRI study of reward-related probability learning.

Authors: M R Delgado; M M Miller; S Inati; E A Phelps
Journal: Neuroimage Date: 2004-11-18 Impact factor: 6.556

Review 10. Neural systems of reinforcement for drug addiction: from actions to habits to compulsion.

Authors: Barry J Everitt; Trevor W Robbins
Journal: Nat Neurosci Date: 2005-11 Impact factor: 24.884

65 in total

Review 1. The striatum: where skills and habits meet.

Authors: Ann M Graybiel; Scott T Grafton
Journal: Cold Spring Harb Perspect Biol Date: 2015-08-03 Impact factor: 10.005

2. The involvement of model-based but not model-free learning signals during observational reward learning in the absence of choice.

Authors: Simon Dunne; Arun D'Souza; John P O'Doherty
Journal: J Neurophysiol Date: 2016-04-06 Impact factor: 2.714

3. Striatal action-value neurons reconsidered.

Authors: Lotem Elber-Dorozko; Yonatan Loewenstein
Journal: Elife Date: 2018-05-31 Impact factor: 8.140

4. Reinforcement learning with Marr.

Authors: Yael Niv; Angela Langdon
Journal: Curr Opin Behav Sci Date: 2016-10

5. Impaired adaptation of learning to contingency volatility in internalizing psychopathology.

Authors: Christopher Gagne; Ondrej Zika; Peter Dayan; Sonia J Bishop
Journal: Elife Date: 2020-12-22 Impact factor: 8.140

6. Ventral striatum and the evaluation of memory retrieval strategies.

Authors: David Badre; Sophie Lebrecht; David Pagliaccio; Nicole M Long; Jason M Scimeca
Journal: J Cogn Neurosci Date: 2014-02-24 Impact factor: 3.225

7. Reward-related activity in ventral striatum is action contingent and modulated by behavioral relevance.

Authors: Thomas H B FitzGerald; Philipp Schwartenbeck; Raymond J Dolan
Journal: J Neurosci Date: 2014-01-22 Impact factor: 6.167

8. Impaired Flexible Reward-Based Decision-Making in Binge Eating Disorder: Evidence from Computational Modeling and Functional Neuroimaging.

Authors: Andrea M F Reiter; Hans-Jochen Heinze; Florian Schlagenhauf; Lorenz Deserno
Journal: Neuropsychopharmacology Date: 2016-06-15 Impact factor: 7.853

9. Learning the opportunity cost of time in a patch-foraging task.

Authors: Sara M Constantino; Nathaniel D Daw
Journal: Cogn Affect Behav Neurosci Date: 2015-12 Impact factor: 3.282

10. Hierarchical learning induces two simultaneous, but separable, prediction errors in human basal ganglia.

Authors: Carlos Diuk; Karin Tsai; Jonathan Wallis; Matthew Botvinick; Yael Niv
Journal: J Neurosci Date: 2013-03-27 Impact factor: 6.167