Melissa J Sharpe, Geoffrey Schoenbaum.
Abstract
The phasic dopamine error signal is currently argued to be synonymous with the prediction error in Sutton and Barto's (1987, 1998) model-free reinforcement learning algorithm (Schultz et al., 1997). This theory argues that phasic dopamine reflects a cached-value signal that endows reward-predictive cues with the scalar value inherent in reward. Such an interpretation does not envision a role for dopamine in the more complex cognitive representations between events that underlie many forms of associative learning, restricting the role dopamine can play in learning. The cached-value hypothesis of dopamine makes three concrete predictions about when a phasic dopamine response should be seen and what types of learning this signal should be able to promote. We discuss these predictions in light of recent evidence that we believe provides particularly strong tests of their validity. In doing so, we find that while the phasic dopamine signal conforms to a cached-value account in some circumstances, other evidence demonstrates that this signal is not restricted to a model-free cached-value reinforcement learning signal. In light of this evidence, we argue that the phasic dopamine signal functions more generally to signal violations of expectancies to drive real-world associations between events. Published by Elsevier Inc.
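For readers unfamiliar with the cached-value account, the following is a minimal textbook-style sketch of a temporal-difference prediction error, not the authors' model. The function name `td_trial`, the two-step trial layout (cue, then reward), and the values of `alpha` and `gamma` are all assumptions chosen for illustration.

```python
def td_trial(values, rewards, alpha=0.1, gamma=1.0):
    """Run one trial of TD(0) over a fixed sequence of time steps.

    values  : cached value for each time step (updated in place)
    rewards : reward received on leaving each time step
    alpha   : learning rate (assumed value)
    gamma   : discount factor (assumed value)
    Returns the prediction error computed at each time step.
    """
    errors = []
    for t in range(len(values)):
        # Value of the following step; zero once the trial has ended.
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        # Prediction error: what was received plus what is now expected,
        # minus what was expected before.
        delta = rewards[t] + gamma * next_value - values[t]
        # The cached value is a single scalar nudged toward the outcome.
        values[t] += alpha * delta
        errors.append(delta)
    return errors


# Hypothetical trial: step 0 is the cue, step 1 immediately precedes reward.
values = [0.0, 0.0]
rewards = [0.0, 1.0]

first = td_trial(values, rewards)      # errors on the first trial
for _ in range(500):
    last = td_trial(values, rewards)   # errors after extended training
```

In this sketch the error at reward delivery starts large and shrinks toward zero as the reward becomes predicted, while the scalar value propagates back to the cue. Nothing in the cached values records which outcome the cue predicts, which is the restriction the abstract argues the dopamine signal is not bound by.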
Year: 2017 PMID: 29269085 PMCID: PMC6136434 DOI: 10.1016/j.nlm.2017.12.002
Source DB: PubMed Journal: Neurobiol Learn Mem ISSN: 1074-7427 Impact factor: 2.877
Fig. 1. Brief optogenetic activation of VTA dopamine neurons strengthens associations between cues (adapted from Sharpe, Chang, et al., 2017). Plots show the number of food cup entries occurring during cue presentation across all phases of the blocking of sensory preconditioning task for the eYFP control group (top) and the ChR2 experimental group (bottom): (A) preconditioning, (B) conditioning, and (C) the probe test. Brief stimulation of dopamine neurons in the ChR2 group during the presentation of X when it was preceded by compound AC unblocked learning of the C–X association. This allowed C to enter into an association with sucrose-pellet reward and promote conditioned responding directed towards the food port. ** indicates significance at the p < .05 level for either a main effect (F vs D) or a simple main effect following a significant interaction (D vs C).