A distributional code for value in dopamine-based reinforcement learning.

Will Dabney¹, Zeb Kurth-Nelson²,³, Matthew Botvinick²,⁴, Naoshige Uchida⁵, Clara Kwon Starkweather⁵, Demis Hassabis², Rémi Munos².

Abstract

Since its introduction, the reward prediction error theory of dopamine has explained a wealth of empirical phenomena, providing a unifying framework for understanding the representation of reward and value in the brain [1-3]. According to the now canonical theory, reward predictions are represented as a single scalar quantity, which supports learning about the expectation, or mean, of stochastic outcomes. Here we propose an account of dopamine-based reinforcement learning inspired by recent artificial intelligence research on distributional reinforcement learning [4-6]. We hypothesized that the brain represents possible future rewards not as a single mean, but instead as a probability distribution, effectively representing multiple future outcomes simultaneously and in parallel. This idea implies a set of empirical predictions, which we tested using single-unit recordings from mouse ventral tegmental area. Our findings provide strong evidence for a neural realization of distributional reinforcement learning.

Substances: Dopamine

Year: 2020 PMID: 31942076 PMCID: PMC7476215 DOI: 10.1038/s41586-019-1924-6

Source DB: PubMed Journal: Nature ISSN: 0028-0836 Impact factor: 49.962

Cited by: 53 in total

1. Beyond the Average View of Dopamine.

Authors: Angela J Langdon; Nathaniel D Daw
Journal: Trends Cogn Sci Date: 2020-05-15 Impact factor: 20.229

Review 2. Distributional Reinforcement Learning in the Brain.

Authors: Adam S Lowet; Qiao Zheng; Sara Matias; Jan Drugowitsch; Naoshige Uchida
Journal: Trends Neurosci Date: 2020-10-19 Impact factor: 13.837

3. Response-based outcome predictions and confidence regulate feedback processing and learning.

Authors: Romy Frömer; Matthew R Nassar; Rasmus Bruckner; Birgit Stürmer; Werner Sommer; Nick Yeung
Journal: Elife Date: 2021-04-30 Impact factor: 8.140

4. Distinct temporal difference error signals in dopamine axons in three regions of the striatum in a decision-making task.

Authors: Iku Tsutsui-Kimura; Hideyuki Matsumoto; Korleki Akiti; Melissa M Yamada; Naoshige Uchida; Mitsuko Watabe-Uchida
Journal: Elife Date: 2020-12-21 Impact factor: 8.140

5. Adiposity covaries with signatures of asymmetric feedback learning during adaptive decisions.

6. Transforming task representations to perform novel tasks.

Review 7. How Outcome Uncertainty Mediates Attention, Learning, and Decision-Making.

8. Context-Dependent Multiplexing by Individual VTA Dopamine Neurons.

9. A Unified Framework for Dopamine Signals across Timescales.

Review 10. Dopamine, Updated: Reward Prediction Error and Beyond.