Abstract
Recent work has advanced our knowledge of phasic dopamine reward prediction error signals. The error signal is bidirectional, reflects well the higher order prediction error described by temporal difference learning models, is compatible with model-free and model-based reinforcement learning, reports the subjective rather than physical reward value during temporal discounting and reflects subjective stimulus perception rather than physical stimulus aspects. Dopamine activations are primarily driven by reward, and to some extent risk, whereas punishment and salience have only limited activating effects when appropriate controls are respected. The signal is homogeneous in terms of time course but heterogeneous in many other aspects. It is essential for synaptic plasticity and a range of behavioural learning situations.
Year: 2012 PMID: 23267662 PMCID: PMC3866681 DOI: 10.1016/j.conb.2012.11.012
Source DB: PubMed Journal: Curr Opin Neurobiol ISSN: 0959-4388 Impact factor: 6.627
Figure 1. Characteristics of phasic dopamine reward prediction error responses. (a) Neuronal coding of reward prediction error closely parallels theoretical prediction error of temporal difference (TD) model ([4], © National Academy of Sciences USA). (b) Temporal discounting of neuronal response to stimulus predicting differently delayed rewards closely parallels behavioural discounting ([15], © Society for Neuroscience). (c) Neuronal response depends on subjective stimulus perception ([24], © National Academy of Sciences USA). (d) Stimulus generalisation explains majority of responses to conditioned aversive stimuli. Change in sensory modality of reward-predicting stimulus reduces response to unchanged aversive stimulus ([34], © Nature). (e) Percentages of dopamine neurons activated by reward (blue, left), motivational salience uncontrolled for stimulus or context generalisation (green) and true motivational salience (red, right). Data from [34]. (f) Graded coding of value prediction after initial generalisation coincides with stimulus identification by animal in dot motion task. Percentage of coherently moving dots results in graded percentage of correct performance and reward delivery ([3], © Society for Neuroscience).
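The bidirectional prediction error referred to in the abstract and in panel (a) can be illustrated with the standard temporal difference error, delta = r + gamma * V(next) - V(now). The sketch below is a minimal illustration only, not the model used in the cited studies: the function names, the discount factor `gamma`, the learning rate `alpha` and the single-cue trial structure are all assumptions chosen for clarity.

```python
def td_error(reward, value_now, value_next, gamma=1.0):
    """Standard TD error: delta = r + gamma * V(next) - V(now).

    gamma is an assumed discount factor (1.0 means no discounting).
    """
    return reward + gamma * value_next - value_now


def learn_cue_value(n_trials, alpha=0.2, reward=1.0):
    """Hypothetical single-cue conditioning: the cue's value is updated
    by the TD error at reward time on every trial (alpha is an assumed
    learning rate). Returns the final value and the per-trial errors."""
    value = 0.0
    errors = []
    for _ in range(n_trials):
        # The trial ends after the reward, so the next state's value is 0.
        delta = td_error(reward, value_now=value, value_next=0.0)
        errors.append(delta)
        value += alpha * delta
    return value, errors


# Three textbook cases of the bidirectional signal.
unpredicted = td_error(reward=1.0, value_now=0.0, value_next=0.0)  # positive error
predicted = td_error(reward=1.0, value_now=1.0, value_next=0.0)    # no error
omitted = td_error(reward=0.0, value_now=1.0, value_next=0.0)      # negative error

final_value, errors = learn_cue_value(n_trials=50)
```

In this toy setting the error at reward time starts at the full reward magnitude and shrinks toward zero as the cue comes to predict the reward, while omitting a predicted reward yields a negative error.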
Figure 2. Dopamine dependency of neuronal plasticity and behavioural learning. (a) Positive timing in spike time dependent plasticity (STDP) protocol results in long term potentiation (LTP) at synapses from cortical inputs to striato-nigral neurons (direct pathway) (black) and is blocked by dopamine D1 receptor antagonist SCH23390 (red) ([77], © Science). (b) Negative timing in STDP protocol results in long term depression (LTD) at cortical synapses onto striato-pallidal neurons (indirect pathway) (black) and is blocked by dopamine D2 receptor antagonist sulpiride (red) ([77], © Science). (c) T-maze learning deficit in mice with NMDA receptor knock-out in midbrain dopamine neurons impairing dopamine burst firing ([84], © National Academy of Sciences USA). (d) Separate performance deficit in mice tested in (c).