Will Dabney, Zeb Kurth-Nelson, Matthew Botvinick, Naoshige Uchida, Clara Kwon Starkweather, Demis Hassabis, Rémi Munos.
Abstract
Since its introduction, the reward prediction error theory of dopamine has explained a wealth of empirical phenomena, providing a unifying framework for understanding the representation of reward and value in the brain [1-3]. According to the now canonical theory, reward predictions are represented as a single scalar quantity, which supports learning about the expectation, or mean, of stochastic outcomes. Here we propose an account of dopamine-based reinforcement learning inspired by recent artificial intelligence research on distributional reinforcement learning [4-6]. We hypothesized that the brain represents possible future rewards not as a single mean, but instead as a probability distribution, effectively representing multiple future outcomes simultaneously and in parallel. This idea implies a set of empirical predictions, which we tested using single-unit recordings from mouse ventral tegmental area. Our findings provide strong evidence for a neural realization of distributional reinforcement learning.
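The contrast the abstract draws between a single scalar prediction and a distributional one can be illustrated with a toy simulation. The sketch below is not taken from the paper: it assumes one common distributional scheme, in which each value learner scales positive and negative prediction errors by different amounts, and it uses a made-up bimodal reward. The names `learn_values`, `taus`, and `lr` are hypothetical.

```python
import random


def learn_values(rewards, taus, lr=0.02):
    """Run one prediction-error learner per entry in `taus` on the same rewards.

    Assumed update rule (illustrative only): learner i scales positive
    prediction errors by taus[i] and negative ones by 1 - taus[i].
    With tau = 0.5 both signs are weighted equally, so the learner
    tracks the mean, as in the scalar account.
    """
    values = [0.0 for _ in taus]
    for r in rewards:
        for i, tau in enumerate(taus):
            delta = r - values[i]                  # prediction error
            scale = tau if delta > 0 else 1.0 - tau
            values[i] += lr * scale * delta
    return values


# Hypothetical bimodal reward: 1.0 or 9.0 with equal probability.
rng = random.Random(0)
rewards = [rng.choice([1.0, 9.0]) for _ in range(20000)]

# Scalar account: a single symmetric learner settles near the mean.
scalar = learn_values(rewards, taus=[0.5])[0]

# Distributional account: a population of learners with different asymmetries.
population = learn_values(rewards, taus=[0.1, 0.3, 0.5, 0.7, 0.9])

print("scalar estimate:", round(scalar, 2))
print("population estimates:", [round(v, 2) for v in population])
```

Under these assumptions the symmetric learner hovers near the mean of the two outcomes, a value that is never actually delivered. The population spreads out between the low and high outcomes, with pessimistic learners (small `tau`) settling lower and optimistic learners (large `tau`) settling higher, so the set of estimates carries information about the shape of the reward distribution rather than only its mean.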
Year: 2020 PMID: 31942076 PMCID: PMC7476215 DOI: 10.1038/s41586-019-1924-6
Source DB: PubMed Journal: Nature ISSN: 0028-0836 Impact factor: 49.962