
The biological and behavioral computations that influence dopamine responses.

Abstract

Phasic dopamine responses demonstrate remarkable simplicity; they code for the differences between received and predicted reward values. Yet this simplicity belies the subtle complexity of the psychological, computational, and contextual factors that influence this signal. Advances in behavioral paradigms and models, in monkeys and rodents, have demonstrated that phasic dopamine responses reflect numerous behavioral computations and factors including choice, subjective value, confidence, and context. The application of optogenetics has provided evidence that dopamine reward prediction error responses cause value learning. Furthermore, studies using advanced circuit tracing techniques have begun to uncover the biological network implementation of the reward learning algorithm. The purpose of this review is to summarize the recent advances in dopamine neurophysiology and synthesize an updated account of the behavioral function of dopamine signals.

Substances: Dopamine

Year: 2018; PMID: 29505948; PMCID: PMC6095465; DOI: 10.1016/j.conb.2018.02.005

Source DB: PubMed; Journal: Curr Opin Neurobiol; ISSN: 0959-4388; Impact factor: 6.627

Introduction

Reward prediction errors are arguably one of the oldest biological computations on Earth. The single-celled bacteria that dominated life for over two billion years detected and responded to positive and negative differences, in time and space, in the concentrations of environmental substances [1]. Rising concentrations of beneficial substances evoke approach behavior in the form of movement towards the source, whereas increasing concentrations of harmful chemicals cause bacteria to avoid the source and ‘tumble’ away in random directions [2,3]. Much has changed in two billion (or so) years of evolution, but computing unpredicted changes for better or worse remains critical to optimal behavioral function and is broadly employed in the brain. Phasic dopamine responses constitute the prime example of neuronal reward prediction error coding. Dopamine neurons are predominantly located in the midbrain A8, A9, and A10 cell groups, which correspond roughly to the Retrorubral Field (RRF), the Substantia Nigra pars compacta (SNc), and the Ventral Tegmental Area (VTA), respectively [4]. These neurons receive synaptic input from over 30 different brain regions [5-11] and send the majority of their projections to basal ganglia and frontal cortex areas involved in motor control, learning, and cognitive function [11-14]. They respond to rewards and reward-predicting cues with phasic bursts of action potentials that code for reward prediction errors, the differences between received and predicted rewards [15,16]. Dopamine prediction error responses are therefore an ideal mechanism for guiding behavior towards more and better rewards. Positive prediction error responses indicate that the preceding action should be repeated or invigorated, whereas negative prediction error responses indicate that the preceding behavior should be decreased or avoided [17]. 
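The sign rule in the last two sentences can be illustrated with a minimal Python sketch of an RPE-driven action-value update. The function name `update_action_value`, the learning rate, and the two-option 'left'/'right' setup are hypothetical illustrations. They are not a model taken from the cited work.

```python
def update_action_value(values, action, reward, learning_rate=0.1):
    """Return updated action values and the prediction error for one outcome.

    `values` maps each action to its current predicted reward (hypothetical
    setup). Only the action that was actually taken is updated.
    """
    rpe = reward - values[action]                 # received minus predicted
    updated = dict(values)                        # leave the input unchanged
    updated[action] = values[action] + learning_rate * rpe
    return updated, rpe


values = {"left": 0.5, "right": 0.5}
values, rpe_better = update_action_value(values, "left", reward=1.0)   # better than expected
values, rpe_worse = update_action_value(values, "right", reward=0.0)   # worse than expected
print(rpe_better, rpe_worse)  # 0.5 -0.5
print(values)                 # "left" now above 0.5, "right" below 0.5
```

A positive error raises the value of the action that produced it. Under any choice rule that favors higher values, that action therefore becomes more likely to be repeated. A negative error does the reverse.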
Recent studies have shown that numerous behavioral computations, including value, choice, confidence, and contextual expectations, are factored into the canonical reward prediction error (RPE) response of dopamine neurons [18,19-21,22,23]. Next-generation technologies have been critical to understanding the behavioral functions of dopamine neurons [24,25,26,27,28], their downstream effects [29,30], and how they compute RPEs [6,9,31,32,33,34]. This non-comprehensive review highlights recent findings in dopamine physiology as they pertain to reward coding and its behavioral consequences.

Phasic dopamine signals are reinforcement learning signals

Prediction errors are the fundamental element of reinforcement learning models, including the Rescorla-Wagner [35] and temporal difference (TD) [17] models. Prediction errors are used to update (i.e., learn) the value of predictive stimuli. The prediction error in TD models provides a theoretical account of phasic dopamine activity [36,37]. The TD prediction error (TDPE) is the difference between the actual and the predicted value. In its standard form, $\delta(t) = r(t) + \gamma V(t+1) - V(t)$, where $r(t)$ is the reward received at time $t$, $V(t)$ is the predicted value at time $t$, and $\gamma$ is a temporal discount factor. Thus, subtraction is the fundamental operation that guides value updating. To verify that subtraction governs the response of dopamine neurons, the activity of optogenetically identified mouse dopamine neurons was recorded during reward delivery. The delivered rewards either (a) were completely unpredicted, so that neither their magnitude nor their timing was known, or (b) followed a cue (odor) that predicted the average magnitude and exact timing, so that the prediction was constant and only the exact reward magnitude was unknown. The constant reward expectation generated by the predictive odor reduced the dopamine response to every reward magnitude by an equivalent amount [32]. This result indicates that dopamine neurons subtract expected reward value from actual value, rather than using the divisive operations that are more commonly observed in neural circuits [38,39]. Moreover, every recorded dopamine neuron used a similar subtractive algorithm [31]. These results confirm that the magnitude of the phasic dopamine response is governed by subtraction, just like the prediction error signal at the core of reinforcement learning models. More than two decades of research has provided strong correlational evidence that phasic dopamine responses constitute a reward learning signal (for a concise summary, see [16]; for a more comprehensive review, see [15]). However, new techniques such as optogenetics finally permit us to ask whether dopamine signals cause learning to occur. 
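A toy calculation makes the distinction between subtractive and divisive operations concrete. The sketch rests on three hypothetical assumptions:

- The response to an unpredicted reward equals its magnitude in arbitrary units.
- The cue predicts the mean magnitude.
- Divisive scaling takes the form m / (1 + prediction).

None of these values come from [31,32].

```python
# Hypothetical reward magnitudes (arbitrary units); an unpredicted reward is
# assumed to evoke a response equal to its magnitude.
magnitudes = [0.1, 0.3, 0.5, 0.7, 0.9]
prediction = sum(magnitudes) / len(magnitudes)   # cue predicts the average

unpredicted = list(magnitudes)
subtractive = [m - prediction for m in magnitudes]       # RPE-style subtraction
divisive = [m / (1.0 + prediction) for m in magnitudes]  # gain-control-style division

# How much the predictive cue reduced each response under each model.
reduction_sub = [u - s for u, s in zip(unpredicted, subtractive)]
reduction_div = [u - d for u, d in zip(unpredicted, divisive)]

for m, rs, rd in zip(magnitudes, reduction_sub, reduction_div):
    print(f"reward {m:.1f}: subtractive -{rs:.2f}, divisive -{rd:.2f}")
```

Only the subtractive version reduces every magnitude by the same amount, which is the pattern reported in [32]. The divisive version reduces large responses more than small ones.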
Prediction error responses have been mimicked using optogenetic techniques in a variety of behavioral tasks in mice [27,28,40], rats [26,41,42], and monkeys [25]. In every species tested, phasic optogenetic stimulation or suppression of dopamine neurons has resulted in behavioral observations consistent with a critical role for dopamine neurons in reward learning. A fundamental insight from animal learning theory is that rewards must be unpredicted, that is, they must generate reward prediction errors, for learning to occur [35]. The strongest evidence for a causal role of dopamine in learning comes from experimental manipulations in behavioral paradigms where prediction errors would not normally occur, and where no learning would normally happen. These paradigms reveal how the introduction of phasic activations or suppressions of dopamine neurons affects learning. The effects of optical activation and suppression of dopamine neurons in rats have been tested in a blocking paradigm and an overexpectation paradigm, respectively [24,26]. During blocking, the formation of associative strength between a conditioned stimulus (CS) and an unconditioned stimulus (US) is ‘blocked’ by a second stimulus that already fully predicts the US. Dopamine neurons do not respond to CSs that have been blocked [43]. Artificial phasic dopamine activations unblock the CS, increasing conditioned responses (time spent in the reward port) and indicating that the unblocked CS-US association was learned [26]. Thus, optogenetic activation of phasic dopamine mimics the effects of positive prediction errors and is sufficient to cause associative learning. During overexpectation, the compound presentation of two reward-predicting CSs generates a heightened expectation (‘overexpectation’) that likely corresponds to both rewards being delivered. The negative prediction errors associated with delivery of only one reward lead to extinction of the original CS-reward associations [35,44-46]. 
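Both paradigms fall out of the Rescorla-Wagner rule, in which every cue present on a trial is updated by the shared error between the reward and the summed prediction. The sketch below is a minimal illustration under stated assumptions:

- `extra_pe` is a hypothetical stand-in for an optogenetic dopamine manipulation, modeled simply as a constant added to the prediction error.
- The learning rate and trial counts are arbitrary choices.

```python
def rw_trial(weights, cues, reward, alpha=0.3, extra_pe=0.0):
    """One Rescorla-Wagner trial; returns the prediction error.

    `extra_pe` is a hypothetical stand-in for an artificial phasic dopamine
    signal (positive = activation, negative = silencing) added to the error.
    """
    prediction = sum(weights[c] for c in cues)   # present cues sum their strengths
    pe = reward - prediction + extra_pe
    for c in cues:                               # only present cues are updated
        weights[c] += alpha * pe
    return pe


def blocking(extra_pe=0.0, n_trials=50):
    """Return the associative strength acquired by the to-be-blocked cue B."""
    w = {"A": 0.0, "B": 0.0}
    for _ in range(n_trials):                    # phase 1: A alone predicts reward
        rw_trial(w, ["A"], reward=1.0)
    for _ in range(n_trials):                    # phase 2: compound AB, same reward
        rw_trial(w, ["A", "B"], reward=1.0, extra_pe=extra_pe)
    return w["B"]


def overexpectation(compound_reward=1.0, extra_pe=0.0, n_trials=50):
    """Return cue A's strength after compound training with `compound_reward`."""
    w = {"A": 0.0, "B": 0.0}
    for _ in range(n_trials):                    # phase 1: each cue trained alone
        rw_trial(w, ["A"], reward=1.0)
        rw_trial(w, ["B"], reward=1.0)
    for _ in range(n_trials):                    # phase 2: compound AB
        rw_trial(w, ["A", "B"], reward=compound_reward, extra_pe=extra_pe)
    return w["A"]


print(f"blocked B: {blocking():.3f}, with added positive PE: {blocking(0.2):.3f}")
print(f"A after overexpectation: {overexpectation():.3f}")
```

**Blocking.** Cue B stays near zero, whereas the added positive error lets it acquire strength (about 0.1 here).

**Overexpectation.** The single-reward compound drives A from 1.0 toward 0.5.

**Modified variants.** Calling `overexpectation(compound_reward=2.0)` corresponds to delivering both rewards, and A's strength is then unchanged. Adding a negative `extra_pe` to that call makes A's strength fall again, in the spirit of the silencing result.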
In a modified overexpectation paradigm, the two rewards were actually delivered, fulfilling the heightened expectations, and this modification eliminated the extinction. Phasic optogenetic silencing of dopamine neurons reinstates extinction learning in the modified overexpectation task [24]. Thus, optogenetic silencing of dopamine mimics the effects of negative prediction errors, and is sufficient to cause extinction learning. Together, these findings provide evidence that phasic dopamine activations and suppressions constitute bidirectional teaching signals that cause increases and decreases (respectively) in the associative strengths between rewards and their predictors. In many situations, including the behavioral tasks reviewed so far, dopamine signals update predictions using a ‘model-free’-like algorithm. That is, cue-outcome associations are updated according to direct experience of the cues and outcomes. In contrast, some outcomes can be used to update a model that contains multiple associations. Such ‘model-based’ learning can occur, for instance, during a reversal learning task (for a review of model-free vs model-based reinforcement learning, see [47]). Monkeys learned that one cue predicted reward while another cue predicted no reward, and on a randomly selected trial the reward contingencies reversed. Model-based learning can use the outcome of the first reversal trial to update the value of both stimuli, even with no direct experience of the other cue-outcome association. Dopamine responses reflected values updated according to this model-based rule [48]. This result suggests that the dopamine system is adapted to efficiently learn environmental reward contingencies whether they are experienced directly or merely inferred. Accordingly, this neuronal teaching signal can support multiple forms of reinforcement learning and likely updates value correlates throughout the brain.
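The reversal example can be sketched by contrasting two updates:

- A model-free update changes only the experienced cue.
- A model-based update uses a task model to revise both cues at once.

The two-cue task model below (exactly one cue rewarded at a time) and the function names are hypothetical assumptions for illustration.

```python
def model_free_update(values, cue, reward, alpha=0.5):
    """Update only the cue that was experienced (direct cue-outcome learning)."""
    updated = dict(values)
    updated[cue] = values[cue] + alpha * (reward - values[cue])
    return updated


def model_based_update(values, cue, reward):
    """Hypothetical task model: exactly one of two cues is rewarded (1.0) and
    the other is not (0.0), so a single outcome determines both values."""
    other = next(c for c in values if c != cue)
    return {cue: float(reward), other: 1.0 - float(reward)}


before = {"X": 1.0, "Y": 0.0}          # X rewarded, Y unrewarded before reversal
# First reversal trial: X is shown and, unexpectedly, no reward follows.
mf = model_free_update(before, "X", reward=0.0)
mb = model_based_update(before, "X", reward=0.0)
print("model-free:", mf)    # {'X': 0.5, 'Y': 0.0}
print("model-based:", mb)   # {'X': 0.0, 'Y': 1.0}
```

On the next trial showing Y, a model-free learner still predicts no reward, whereas a model-based learner already predicts reward. The dopamine responses described above followed the model-based values [48].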

Dopamine responses reflect behavioral computations associated with value

Most rewards do not share a common physical scale for direct comparison, and they often enter awareness via biophysically distinct pathways. Food, drink, money, and social interaction are but a few examples of the larger category of objects, events, or thoughts that we readily recognize as rewards [15]. Despite the heterogeneity of reward features, individuals quickly appreciate reward value and readily exchange one reward for another. For example, people readily hand over money in exchange for ice cream. This behavior implies that the coding of reward value does not depend critically on the sensory properties of rewards. When monkeys choose between different reward types, dopamine responses are larger to more preferred rewards than to less preferred rewards. Importantly, when choices indicate indifference between two rewards, the dopamine responses to those rewards are indistinguishable (Figure 1a) [20]. Similarly, when rats have been fed to satiety on one reward, their choices indicate that the value of the overfed reward is decreased, and the dopamine response to the overfed reward is also decreased [49]. These patterns of activity suggest that dopamine responses reflect the subjective value of rewards.

Figure 1:

Dopamine neurons reflect the behavioral computations of value.

(a) Dopamine neurons represent a common scale of value. Monkeys indicated their preferences between different reward types by making choices. Orange and brown boxes represent the CSs that were associated with the different rewards and between which the monkeys chose. ‘Greater than’ symbols indicate more preferred rewards, whereas tildes indicate choice indifference. Peri-stimulus time histograms (PSTHs) show dopamine responses to the onset of the visual reward-predicting CS; individual PSTHs are color- and dash-coded according to the CS. Dopamine responses were largest for the most preferred rewards, smallest for the least preferred rewards, and indistinguishable for rewards between which the monkey was indifferent. This figure was modified and reproduced with permission from Ref. [20].

(b) The mathematical relationship between subjective value (utility) and physical reward size was described by an S-shaped function (red line). Grey bars indicate dopamine responses to unpredicted rewards that varied in magnitude between 0.1 ml and 1.2 ml in 0.1 ml increments. Error bars are SEM across 16 neurons. This figure was modified and reproduced with permission from Ref. [23].

(c) Raster plot (top) and PSTH (bottom) of one dopamine neuron in response to the onset of a random dot motion (RDM) stimulus. Data are divided according to the accuracy of the subsequent choice. Dopamine neurons were more active on trials when the monkey chose correctly (green) than when it chose incorrectly (red). Numbers along the side of the raster plot indicate RDM coherence. Shaded error bars on the PSTH are SEM across trials. This figure was modified and reproduced with permission from Ref. [19].

(d) Dopamine neurons are silenced by distorted audio feedback (DAF) during bird song learning. Voltage traces (top) and raster plots (bottom) are shown around normal (‘Normal’) and distorted (‘DAF’) audio feedback. This figure was modified and reproduced with permission from Ref. [18].

To demonstrate the functional relationship between subjective value and dopamine activity, subjective value was measured as a function of physical value in monkeys choosing between risky and safe outcomes. There is a mathematical relationship between risk attitude (whether the monkey is risk seeking or risk avoiding) and the curvature of the resulting value function [50]. Choices between risky rewards revealed an ‘S’-shaped subjective value (utility) function that reflected risk seeking for small rewards and risk avoidance for large rewards (Figure 1b, red line) [23,51]. The magnitudes of dopamine responses to unpredicted rewards were correlated with the shape of the measured utility functions (Figure 1b) [23]. During behavioral choices, dopamine responses scaled with the value of the chosen options [21,52]. Moreover, dopamine responses were larger on trials when the monkey made the correct choice than on trials when it was mistaken (Figure 1c) [19]. These results demonstrate that dopamine responses integrate moment-by-moment behavioral information with reinforcement learning to code the same dynamic value information that is used to make decisions.
Value may be gained from physical rewards, but it may also be derived from the internal evaluation of performance. A recent study examined dopamine activity during bird song learning. As juvenile birds learned to sing, distorted audio feedback (DAF) was provided at unpredictable times. Dopamine neurons paused their firing when the birds heard the DAF, as though signaling a negative prediction error (Figure 1d). The response was contingent upon the bird singing; dopamine neurons were unaffected by DAF when the birds were not singing [18]. This result demonstrates that dopamine neurons are engaged during performance monitoring. It remains unclear, however, whether this response reflects the performance error itself or the value of that error. 
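The link between utility curvature and risk attitude (Figure 1b) can be checked numerically. On the convex lower limb of an S-shaped function, a 50/50 gamble's certainty equivalent exceeds its expected value, which is risk seeking. On the concave upper limb it falls below, which is risk avoidance. The sketch assumes a logistic utility whose midpoint and slope are hypothetical illustrative values, not the fitted function from [23].

```python
import math

# Hypothetical S-shaped (logistic) utility over juice volume in ml.
# Midpoint and slope are illustrative assumptions, not fitted values.
MIDPOINT_ML = 0.65
SLOPE = 8.0


def utility(x):
    """Convex below MIDPOINT_ML, concave above it."""
    return 1.0 / (1.0 + math.exp(-SLOPE * (x - MIDPOINT_ML)))


def inverse_utility(u):
    """Volume whose utility equals u (0 < u < 1)."""
    return MIDPOINT_ML - math.log(1.0 / u - 1.0) / SLOPE


def certainty_equivalent(low, high, p_high=0.5):
    """Sure amount with the same utility as the gamble's expected utility."""
    expected_utility = p_high * utility(high) + (1 - p_high) * utility(low)
    return inverse_utility(expected_utility)


def risk_attitude(low, high):
    """Compare a 50/50 gamble's certainty equivalent with its expected value."""
    ev = 0.5 * (low + high)
    return "risk seeking" if certainty_equivalent(low, high) > ev else "risk avoiding"


print(risk_attitude(0.1, 0.5))   # risk seeking  (both outcomes on the convex limb)
print(risk_attitude(0.8, 1.2))   # risk avoiding (both outcomes on the concave limb)
```

The same `utility` function, evaluated at 0.1 to 1.2 ml, would give the kind of curve against which the dopamine responses in Figure 1b were compared.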
Developing behavioral technologies to measure the value of good performance is key to understanding how reward and motor systems interact to motivate motor learning. Value has many sources, including long-term reinforcement learning history, context, and trial-by-trial behavioral factors. Overall, these recent studies demonstrate that dopamine prediction error responses reflect these many sources of value, and they provide deeper insights into the nature of biological reinforcement learning.
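The correct-versus-error difference at stimulus onset (Figure 1c) [19] is one example of a trial-by-trial factor. A hypothetical signal-detection sketch shows how it can arise even at a single motion coherence if the onset response tracks confidence. The model assumes:

- Gaussian evidence centered on the true direction.
- A choice that follows the sign of the evidence.
- An onset "RPE" equal to the posterior probability of being correct, times a reward of 1, minus its average across trials.

```python
import math
import random


def simulate_onset_rpe(coherence=0.5, noise_sd=1.0, n_trials=20000, seed=1):
    """Mean onset 'RPE' on correct and error trials in a toy motion task.

    Hypothetical model: evidence ~ Normal(direction * coherence, noise_sd),
    the choice follows the sign of the evidence, and the onset response is
    confidence (posterior probability of being correct, times a reward of 1)
    minus the average expected reward across all trials.
    """
    rng = random.Random(seed)
    confidences = {True: [], False: []}
    for _ in range(n_trials):
        direction = rng.choice((-1, 1))
        evidence = rng.gauss(direction * coherence, noise_sd)
        choice = 1 if evidence >= 0 else -1
        # Log-likelihood ratio favoring the chosen direction over the other one.
        llr = 2.0 * coherence * abs(evidence) / noise_sd ** 2
        confidence = 1.0 / (1.0 + math.exp(-llr))
        confidences[choice == direction].append(confidence)
    all_conf = confidences[True] + confidences[False]
    baseline = sum(all_conf) / len(all_conf)
    return {correct: sum(c) / len(c) - baseline for correct, c in confidences.items()}


mean_rpe = simulate_onset_rpe()
print(f"correct trials: {mean_rpe[True]:+.3f}, error trials: {mean_rpe[False]:+.3f}")
```

In this toy model, error trials tend to arise from weak evidence. Their mean confidence, and hence their modeled onset response, is therefore lower than on correct trials.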

Biological implementation of reward prediction error computations

Dopamine neurons receive input from more than thirty brain areas, including the lateral hypothalamus, subthalamic nucleus, pedunculopontine nucleus, lateral habenula, striatum, and dorsal raphe nucleus [6,7,11,53,54]. How dopamine neurons integrate information from these diverse brain regions to compute RPEs is of considerable interest. Electrophysiological analysis of more than 200 neurons monosynaptically connected to VTA dopamine neurons revealed that most input neurons coded for some computational components of RPEs, such as responses to unpredicted rewards or reward expectation. These different component responses were not localized to specific input nuclei but were distributed across all the sampled nuclei [6]. Thus, dopamine neurons receive distributed inputs from many brain structures that code for parts of the RPE, and they appear to integrate these distributed inputs to compute RPE responses. A critical factor determining the dopamine response is the cell-type identity of the inputs. For example, the lateral hypothalamus (LH) is a major source of input to the midbrain. Optogenetic activation of GABAergic projections from the LH to the VTA causes phasic dopamine release in the striatum and promotes reward-seeking behaviors [9,33]. By contrast, phasic activation of LH glutamatergic projections to the midbrain results in avoidance behaviors [9]. The appetitive and aversive effects of LH inputs seem to operate through di-synaptic mechanisms involving local GABA neurons in the vicinity of the VTA [9]. However, the exact identity and location of the GABA neurons mediating the disinhibition that could lead to dopamine activations remain unclear. Although LH-GABA neurons synapse onto GABAergic neurons in the VTA [9], previous research has failed to find consistent, phasic inhibition of VTA-GABA neurons at the precise moments when such inhibition could mediate dopamine activations [55]. 
Similar to the input from the LH, the different cell types in the dorsal raphe nucleus (DRN) contribute differentially to appetitive and aversive dopamine-mediated behaviors. Both serotonergic and glutamatergic projections from the DRN appear to provide appetitive information to dopamine neurons [53-56], whereas DRN GABA neuron activity is correlated with aversive responses [56]. Glutamatergic projection neurons in the lateral habenula (LHb) are a major source of aversive information to dopamine neurons. Activation of the LHb inhibits the majority of dopamine neurons via di-synaptic connections with GABAergic neurons in the rostromedial tegmental nucleus (RMTg) [57-59]. Activation of this pathway causes conditioned place aversion [10,60], whereas lesioning it disrupts the normal inhibitory responses to unpredicted reward omission [8]. Thus, LHb activation is largely related to negative prediction error responses (pauses) in dopamine neurons. Recent studies in mice have discovered a group of dopamine neurons that receive monosynaptic, excitatory drive from the LHb [10,61]. Notably, this group of dopamine neurons has electrophysiological properties different from those of dopamine neurons identified with apomorphine [62-65] and optogenetics [31,32,55]. They emit action potentials at much higher rates than classical dopamine neurons and do not express somatodendritic dopamine D2 receptors [61]. As such, they are likely insensitive to apomorphine. Because of these distinct electrophysiological properties and putative apomorphine insensitivity, these neurons are unlikely to be sampled by studies that use traditional waveform identification techniques [62-65]. Therefore, it remains to be seen what the behavioral function of these neurons is, and indeed whether they code for prediction errors.
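Two di-synaptic motifs appear in this section:

- LH GABA inputs disinhibit dopamine neurons via local GABA neurons.
- LHb glutamatergic inputs inhibit dopamine neurons via RMTg GABA neurons.

A toy rectified-linear rate model captures the sign of each effect. All drives and weights are hypothetical, in arbitrary units. The model also simply assumes a single local GABA population, whereas the text notes that the identity of the GABA neurons mediating LH disinhibition is still unresolved.

```python
def relu(x):
    """Rectify: firing rates cannot go below zero."""
    return max(0.0, x)


def dopamine_rate(lh_gaba=0.0, lhb_glut=0.0, da_drive=6.0, local_gaba_drive=2.0,
                  w_lh=1.0, w_lhb=1.0, w_gaba_da=1.0):
    """Toy rate model of two di-synaptic motifs (hypothetical parameters).

    - LH GABA input inhibits local VTA GABA neurons, disinhibiting dopamine.
    - LHb glutamate excites RMTg GABA neurons, which inhibit dopamine.
    """
    local_gaba = relu(local_gaba_drive - w_lh * lh_gaba)
    rmtg_gaba = relu(w_lhb * lhb_glut)
    return relu(da_drive - w_gaba_da * (local_gaba + rmtg_gaba))


baseline = dopamine_rate()
print(baseline, dopamine_rate(lh_gaba=2.0), dopamine_rate(lhb_glut=2.0))  # 4.0 6.0 2.0
```

The rectification matters. Once local GABA activity reaches zero, further LH input cannot disinhibit dopamine neurons any more, so in this sketch the size of the disinhibitory effect is capped by the local neurons' ongoing activity.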

Diversity of dopamine responses

The majority of dopamine neurons are activated by reward [31,55,66]. However, dopamine neurons do not respond solely to rewards, nor do all dopamine neurons respond identically. In fact, numerous other stimuli elicit dopamine responses, including noxious, aversive, and physically salient stimuli [64,67-73], novelty [21,73], and large movements [74,75]. Likewise, the well-known role of dopamine neuron loss in movement disorders, including Parkinson’s disease, clearly implicates these neurons in movement, albeit indirectly. These observations have generated intense interest in the functional heterogeneity of the dopamine population. Phasic dopamine responses display complex temporal dynamics that may reflect different variables and may even serve multiple behavioral functions [19,21,70,71,76]. Dopamine neurons generally encode RPEs at latencies between 150 and 250 ms following reward; longer latencies are observed when stimuli are hard to distinguish or the task is highly dynamic [19,21,77]. On the other hand, many non-reward activations of dopamine neurons occur very early in the response, within 50–200 ms of the behavioral event. Aversive air puffs evoke short-latency activations in dorsolateral dopamine neurons and inhibitions in ventromedial dopamine neurons [72]. Similarly, aversive electrical shocks activate dopamine neurons projecting to the dorsolateral striatum (DLS), whereas the same stimuli inhibit dopamine neurons projecting to the dorsomedial striatum (DMS) [68]. The Ca2+ signal used to detect these shock-evoked activations and inhibitions has a slower time course than action potential signals [78]. Nevertheless, in dopamine neurons that project to the DLS, activations driven by electric shock appear faster and shorter than activations driven by reward [68]. Thus, dopamine responses to aversive stimuli are heterogeneous, but short-latency non-reward responses appear to arise earlier than RPE coding. 
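Because early non-reward components and later RPE components overlap in time, analyses often count spikes separately in an early and a late window. A minimal sketch of such a split, with baseline subtraction, is shown below. The window bounds are hypothetical choices loosely following the latency ranges above, and the spike train is invented for illustration.

```python
def window_rate_hz(rel_times_ms, start_ms, stop_ms):
    """Firing rate (Hz) from spikes with start_ms <= t < stop_ms."""
    n = sum(1 for t in rel_times_ms if start_ms <= t < stop_ms)
    return n / ((stop_ms - start_ms) / 1000.0)


def split_response(spike_times_ms, event_ms, early=(50, 150), late=(150, 250),
                   baseline=(-500, 0)):
    """Baseline-subtracted rates in an early and a late window after an event.

    Window bounds are hypothetical; real analyses would choose them from data.
    """
    rel = [t - event_ms for t in spike_times_ms]   # align spikes to the event
    base = window_rate_hz(rel, *baseline)
    return (window_rate_hz(rel, *early) - base,
            window_rate_hz(rel, *late) - base)


# Hypothetical spike train: a brief early burst, then a pause in the late window.
spikes = [600, 850, 1060, 1080, 1120]
early_hz, late_hz = split_response(spikes, event_ms=1000)
print(f"early {early_hz:+.1f} Hz, late {late_hz:+.1f} Hz")  # early +26.0 Hz, late -4.0 Hz
```

An early activation followed by a late pause, as in this invented example, is the kind of biphasic pattern that a single response window would average away.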
Context sensitivity also plays an important role in dopamine response heterogeneity. Context can be defined by numerous factors, including the overall reward availability, task dynamics, and the physical nature of stimuli and rewards. In particular, the overall reward availability strongly modulates dopamine responses to rewards, reward predictors, and non-rewards. For example, when the overall availability of rewards was low (i.e., rewards were delivered on only a small fraction of trials), dopamine neurons responded to aversive stimuli with inhibitions. However, when the overall availability of rewards was high, the very same dopamine neurons responded to aversive stimuli with short-latency activations rather than inhibitions [34]. Responses to neutral cues were similarly influenced by the amount of reward delivered in a specific context; greater overall reward availability resulted in greater responses to neutral cues [79]. Because of the influence of overall reward availability, the short-latency activations observed in these (and other) studies are likely instances of pseudo-conditioning, a process of generalization between USs, rather than true responses to non-rewarding stimuli [80]. In a similar fashion, sensory stimulus (CS) generalization has a major impact on the number of dopamine neurons that respond to aversive-predicting CSs. When visual cues are used to predict both rewarding and aversive outcomes, more than half of dopamine neurons can respond to the aversive visual cue [66,72]. By contrast, when an auditory cue predicts reward and a visual cue predicts an aversive air puff, only 15% of dopamine neurons respond to the aversive visual cue [66]. Thus, less stimulus generalization translates into fewer dopamine neurons responding to aversive events. 
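The pseudo-conditioning account can be illustrated with a deliberately simple two-term toy:

- A generalized component that scales with the context's overall reward availability.
- The stimulus's own signed value.

The additive form, the generalization weight, and the stimulus values are hypothetical assumptions, not a model from [34,79,80].

```python
def onset_response(stimulus_value, context_mean_reward, generalization=0.5):
    """Toy two-term onset response (hypothetical form and parameters).

    generalization * context_mean_reward: a component generalized from the
        rewards available in the current context (pseudo-conditioning).
    stimulus_value: the stimulus's own signed value (negative if aversive).
    """
    return generalization * context_mean_reward + stimulus_value


AIRPUFF = -0.3    # hypothetical value of an aversive stimulus
for context in (0.2, 1.0):   # low versus high overall reward availability
    print(f"context {context}: aversive {onset_response(AIRPUFF, context):+.2f}, "
          f"neutral {onset_response(0.0, context):+.2f}")
```

With these numbers, the same aversive stimulus yields a net negative response in the low-reward context and a net positive one in the high-reward context. Neutral-cue responses grow with reward availability, qualitatively matching the pattern described above.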
These critical studies highlight the importance of considering behavioral and contextual factors, in addition to the underlying circuits, when designing behavioral tasks and interpreting the motivational implications of dopamine activity.

Conclusions

The value assigned to rewards is a highly dynamic quantity influenced by numerous factors. Dopamine responses code for subjective reward value (utility) [20,21,23] and reflect the numerous behavioral factors that influence value, including choice [21,81], confidence [19], context [34,79], and satiety [49]. Optogenetic stimulation and suppression of dopamine neurons demonstrate that these signals cause value learning [24,26], which likely updates action values in the striatum and elsewhere [34,82-84]. Beyond learning, dopamine neurons have many behavioral functions, including roles in movement and motivation. Not discussed here, but at the cutting edge of dopamine investigations, are studies deciphering the precise behavioral roles of prediction error coding and dopamine release at the interface between motivation and movement [29,30]. Our current understanding of dopamine neurons has been greatly facilitated by recent technological developments, including optogenetics and advanced circuit tracing techniques. Optogenetic technologies have been used to unambiguously identify dopamine neurons and test their behavioral function [6,22,31,32,34,55]. These studies have confirmed the major role that dopamine neurons play in reward coding [55] and detailed the algorithm used by the dopamine population [31,32]. Optogenetic stimulation has been used to test and confirm the hypothesis that phasic dopamine activations and suppressions constitute a bidirectional teaching signal for value learning [24,26]. Optogenetic stimulation of monkey dopamine neurons biases the animal's choices towards the stimulation-reinforced options. It also extends these technological capabilities to a species with greater anatomical and functional homology with humans [25]. In the physical dopamine circuit, perhaps more than anywhere else in the brain, we are starting to understand the implementation (Marr's level III) of the reward prediction error algorithm [85]. 
Recent studies have mapped the anatomical and functional inputs of dopamine neurons [6,8,9,10,33,34,61]. These results demonstrate that different cell types, and the micro-circuits they form, are critical for understanding how dopamine responses are shaped. An important question for future studies to address is, ‘what level of circuit detail is relevant to behavioral function?’ Advancing technological capabilities are revealing ever more complex circuit maps with ever finer details, but it is critical that these findings be interpreted in light of well-founded behavioral theories and experiments [86]. Nevertheless, these developments promise to provide deeper insights into the behavioral functions and information processing capacities of this critical neural system. The next step, I believe, is to gain a broader and clearer appreciation of the nature of reward predictions. To say that dopamine neurons code for reward prediction errors is to imply that subtraction is taking place. The operation of subtraction includes three terms: the minuend (the number being subtracted from), the subtrahend (the number being subtracted), and the resulting difference. Recent studies have confirmed that dopamine prediction error responses truly represent differences [31,32]. Therefore, to fully understand the dopamine signal, we need to understand how dopamine neurons code the minuend (reward) and the subtrahend (prediction). Regarding the former, significant progress has been made in understanding how the brain codes for rewards. Signals related to subjective value have been recorded in multiple brain areas. Dopamine neurons reflect multiple attributes of reward, including reward magnitude [87], probability [88,89], and delay [90]. They integrate these attributes and code for a highly specified form of subjective reward value, economic utility, which places the value of different rewards on a common scale for easy comparison [20,23]. 
Well-defined and easily measured utility functions, therefore, provide a rigorous account of the dopaminergic minuend (Figure 1b). However, we know less about the dopaminergic subtrahend: reward predictions. The classical model for dopamine activity, the TD model, predicts the time-discounted expected value of future rewards [36]. Although this quantity is surely factored into dopamine signals, the results reviewed here demonstrate that this model provides an inadequate description of dopamine activity. Inference about hidden states of the world as well as factors related to decision confidence are incorporated into dopamine responses [19,22]; both of these factors are well beyond simple first-order reward statistics. These results indicate that the reward predictions made by dopamine neurons, and by implication the brain, are far richer than was previously thought. Fortunately, the well-defined nature of the dopamine signal provides an excellent substrate to learn about the shape and character of neuronal predictions.
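The minuend/subtrahend framing can be written out directly. The minuend is the utility of the received reward. The subtrahend here is the classical TD-style quantity, the time-discounted expected utility, which the text argues is an incomplete account of dopaminergic predictions. The square-root utility, the discount factor, and the reward schedule are hypothetical choices for illustration.

```python
import math


def expected_utility(outcomes, utility):
    """outcomes: list of (probability, magnitude) pairs for one time step."""
    return sum(p * utility(m) for p, m in outcomes)


def discounted_prediction(schedule, utility, gamma=0.9):
    """TD-style prediction: sum over steps k of gamma**k * E[u(reward at k)]."""
    return sum(gamma ** k * expected_utility(step, utility)
               for k, step in enumerate(schedule))


def prediction_error(received, prediction, utility):
    """Minuend (utility of the received reward) minus subtrahend (prediction)."""
    return utility(received) - prediction


sqrt_utility = math.sqrt   # hypothetical concave utility, for illustration only
# Cue now; one step later, 0.25 or 1.0 ml of reward with equal probability.
schedule = [[], [(0.5, 0.25), (0.5, 1.0)]]
v_cue = discounted_prediction(schedule, sqrt_utility)             # 0.9 * 0.75
v_at_reward = discounted_prediction(schedule[1:], sqrt_utility)   # 0.75
print(prediction_error(1.0, v_at_reward, sqrt_utility))           # 0.25
```

Richer predictions, such as those shaped by inferred hidden states or decision confidence, would replace `discounted_prediction`, while the subtraction itself stays the same.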
