
The causal role between phasic midbrain dopamine signals and learning.


Keywords: behavioral flexibility; dopamine; electrophysiology; errors; reward prediction error

Year: 2014 PMID: 24795588 PMCID: PMC4007013 DOI: 10.3389/fnbeh.2014.00139

Source DB: PubMed Journal: Front Behav Neurosci ISSN: 1662-5153 Impact factor: 3.558


Reinforcement learning occurs when organisms adapt behavior on the basis of associations with reward and punishment. Reinforcement learning is a useful algorithm because it is unsupervised, relying on trial-and-error learning under conditions in which the optimal solution is unknown. Recent neural network models of reinforcement learning are based on the neurophysiology of the rat, monkey, and human dopamine systems (Montague et al., 1996; Dayan and Balleine, 2002; Schultz, 2002; Montague et al., 2004; Pan et al., 2008). The main finding of this research is that the dopamine system appears to minimize errors in the prediction of reward through a process called temporal difference learning. As predicted by the temporal difference learning models, dopamine neurons respond during the early stages of classical and operant conditioning with a burst of action potentials (a phasic-like response) after reward presentation (Schultz, 1998; O'Doherty et al., 2006). However, after repeated pairings of a given stimulus and reinforcement, the dopamine neurons respond to the onset of the stimulus, be it a conditioned stimulus or a cue that triggers a stereotyped action that results in reward (Mirenowicz and Schultz, 1994). After an association has been formed between the stimulus and reinforcement, dopamine neurons cease responding to the reinforcer itself (Schultz et al., 1997). Based on these neurophysiological data, reinforcement learning models have proposed that the role of the midbrain phasic DA neurons is to act as a teaching signal which adjusts reward prediction errors and broadcasts such information to upstream cell populations involved in reward learning such as the nucleus accumbens (NAc) (Joel et al., 2002; Wassum et al., 2013).
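The shift of the prediction error from reward delivery to cue onset described above can be illustrated with a minimal tabular temporal difference sketch. This is an illustrative toy, not the model of any cited study: the function name `simulate_td`, the number of time steps, the learning rate `alpha`, and the discount factor `gamma` are all assumptions chosen for clarity, and the pre-cue state is assumed to have a fixed value of zero because cue onset is unpredictable.

```python
def simulate_td(n_trials=500, n_steps=5, alpha=0.2, gamma=0.98, reward=1.0):
    """Toy TD(0) simulation of a cue followed by reward after n_steps.

    Returns a list of (delta_cue, delta_reward) pairs, one per trial, where
    delta_cue is the prediction error at cue onset and delta_reward is the
    prediction error at reward delivery.
    """
    # One learned value per time step after cue onset (hypothetical layout).
    values = [0.0] * n_steps
    history = []
    for _ in range(n_trials):
        # Cue onset is unpredicted: the pre-cue state is assumed to have value 0,
        # so the error at cue onset is the discounted value of the first cue state.
        delta_cue = gamma * values[0] - 0.0
        delta_reward = 0.0
        for t in range(n_steps):
            last = t == n_steps - 1
            r = reward if last else 0.0             # reward arrives at the final step
            next_value = 0.0 if last else values[t + 1]
            delta = r + gamma * next_value - values[t]  # TD prediction error
            values[t] += alpha * delta
            if last:
                delta_reward = delta
        history.append((delta_cue, delta_reward))
    return history


if __name__ == "__main__":
    history = simulate_td()
    print("first trial (cue error, reward error):", history[0])
    print("last trial  (cue error, reward error):", history[-1])
```

Under these assumptions the error on the first trial occurs only at reward delivery, whereas after training it occurs at cue onset and is close to zero at the time of the now-predicted reward.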
More recently, a number of computational studies have added another layer of complexity to their models by incorporating the idea of incentive motivation as a way to better capture the role of dopamine in reward learning (McClure et al., 2003; Niv, 2007; Zhang et al., 2009; Morita et al., 2013). This has largely been based on findings from lesion and pharmacological studies whereby it has been hypothesized that dopamine neurons respond to conditioned stimuli by invigorating instrumental actions that lead to the obtainment of rewards (Berridge et al., 2009; Wassum et al., 2011). In the meantime, a number of authors have suggested that because midbrain dopamine neurons also respond to aversive and salient stimuli with phasic DA activations (Matsumoto and Hikosaka, 2009; Cohen et al., 2012; Ilango et al., 2012; Tan et al., 2012; Brooks and Berns, 2013; Fiorillo et al., 2013), their role in encoding reward prediction errors may be more limited than first envisaged (Horvitz et al., 1997; Redgrave and Gurney, 2006; Redgrave et al., 2008; May et al., 2009; Thirkettle et al., 2013). The scope of this Opinion article, however, is not to assess the validity of such claims. On the contrary, the aim of this article is to focus on one area of research that has received relatively little attention, namely, how the phasic DA signal may be causally related to action selection, goal-directed behavior, and behavioral flexibility.
This is partially because the vast majority of studies which have explored whether DA neurons may encode more than reward prediction errors (e.g., including measures related to behavioral flexibility such as reward value, reward probability, choice behavior, discounting of delayed rewards) (Fiorillo et al., 2003, 2008; Morris et al., 2006; Roesch et al., 2007; Takahashi et al., 2009; Bromberg-Martin et al., 2010a,b; Nomoto et al., 2010) have been based upon electrophysiological data, which by their very nature can only support a correlation between neuronal activation or inhibition and behavior but cannot establish causation. This has been acknowledged by Wolfram Schultz, who stated that “although the prediction error response of dopamine neurons would make a good teaching signal, the bulk of the available data are correlational” (Schultz, 2010). Therefore, to establish causation we will look at a number of recent studies that have used primarily optogenetic, voltammetric, and pharmacological interventions and that may provide an answer to this question. With the recent introduction of optogenetics, for example, it has been possible to perturb neural activity at millisecond timescales and directly relate this manipulation to an array of behaviors including sleep, anxiety, depression, and fear, to name but a few (Rolls et al., 2011; Kim et al., 2013; Tye et al., 2013; Courtin et al., 2014). More specifically, midbrain DA neurons and their striatal projections have also been selectively targeted, resulting in behavioral modifications of food intake, cocaine consumption, conditioned place preference and aversion (by inhibition of DA activity via GABAergic VTA cells) (Tsai et al., 2009; Lobo et al., 2010; Domingos et al., 2011; Tan et al., 2012). Optogenetic targeting of midbrain DA cells and their striatal projections has also revealed interesting observations regarding their causal role in reward prediction and, possibly, behavioral flexibility.
With regard to the causal role of DA in reward prediction, Kim et al. (2012) showed that phasic activation of VTA DA neurons after a nose poke could drive operant responses in the absence of food reward. In another laboratory, a blocking procedure was used to demonstrate that activation of DA neurons at the time of reward delivery during compound stimulus presentation could artificially produce a conditioned response to the normally blocked cue. In other words, phasic DA stimulation at a point in time (reward delivery) when this would normally be absent could unblock learning (Steinberg et al., 2013). In a separate study looking at manipulation of the GABAergic cells of the VTA on reward learning and its effect on DA release, optogenetic stimulation of VTA GABAergic neurons disrupted consummatory behavior, but not when the VTA GABA projections to the NAc were targeted. Moreover, stimulation of the GABA neurons suppressed VTA DA firing and release in the NAc (Van Zessen et al., 2012). In a further study to characterize the VTA GABA projections to the NAc, it was found that activation of this pathway selectively inhibited cholinergic neurons of the NAc, which in turn increased associative learning of an aversive predictive cue (Brown et al., 2012). Importantly, this effect was dopamine independent, as stimulation of GABA terminals in the NAc did not change baseline firing of VTA DA cells. Taken together, these studies confirm that within the VTA, DA activity regulates aspects related to appetitive reward learning. Moreover, these data highlight how the encoding of an aversive outcome may not only be signaled by DA cells projecting to the NAc but also by activation of cholinergic cells in the NAc that receive preferential input from VTA GABA neurons, extending the results from previous investigations (Tan et al., 2012).
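The logic of blocking and unblocking described above can be sketched with a simple Rescorla-Wagner style update. This is a hedged illustration rather than the analysis used by Steinberg et al. (2013): the function name `rescorla_wagner_blocking` and the parameter `stim_boost`, which stands in for an artificial prediction error added at reward delivery, are hypothetical, as are the learning rate and trial counts.

```python
def rescorla_wagner_blocking(stim_boost=0.0, alpha=0.1, lam=1.0,
                             n_phase1=200, n_phase2=50):
    """Toy blocking experiment with cue A (pretrained) and cue X (added later).

    stim_boost is a hypothetical extra prediction error added at reward
    delivery during compound (A+X) trials, standing in for phasic DA stimulation.
    Returns the final associative strengths (v_a, v_x).
    """
    v_a, v_x = 0.0, 0.0
    # Phase 1: cue A alone is paired with reward until it predicts it well.
    for _ in range(n_phase1):
        delta = lam - v_a
        v_a += alpha * delta
    # Phase 2: compound A+X is paired with the same reward.
    for _ in range(n_phase2):
        # The reward is already predicted by A, so delta is near zero
        # unless an artificial error (stim_boost) is added.
        delta = lam - (v_a + v_x) + stim_boost
        v_a += alpha * delta
        v_x += alpha * delta
    return v_a, v_x


if __name__ == "__main__":
    print("control (v_a, v_x):    ", rescorla_wagner_blocking(stim_boost=0.0))
    print("stimulated (v_a, v_x): ", rescorla_wagner_blocking(stim_boost=0.5))
```

In this sketch cue X acquires almost no associative strength in the control condition (blocking), whereas adding an error term at reward delivery lets X acquire strength (unblocking), mirroring the qualitative pattern reported in the text.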
With regard to the causal role of DA in behavioral flexibility, Adamantidis et al. (2011) targeted the dopaminergic neurons of the VTA by injecting channelrhodopsin-2 (ChR2) in Th-Cre mice. The initial behavioral paradigm required mice to bar press one of two levers. The “active” lever resulted in food delivery plus optogenetic stimulation whereas bar pressing on the “inactive” lever resulted in the delivery of food only. Compared to controls (YFP mice), phasic DA stimulation enhanced the effects of food-reward seeking (i.e., mice bar pressed the active lever preferentially over the inactive). Interestingly, they also found that after a series of extinction sessions during which no food reward or phasic DA stimulation occurred, preferential lever pressing (to the initial active lever) could be reestablished by DA stimulation in the absence of both external cues and, critically, food reward. Finally, the authors used a reversal learning session where the relationship between the active (optical stimulation + no food reward) and inactive (no optical stimulation + no food reward) levers was switched, and demonstrated that ChR2 mice switched their lever pressing to the previously inactive lever compared to control mice. This finding is particularly important because it suggests that not only is the phasic DA signal driving and enhancing simple stimulus-reward associations but it is also causally involved in flexible behavioral adaptations that occur as a result of changes in stimulus-reward contingencies. Behavioral flexibility has also been tested by optogenetic manipulations of dopamine-receiving NAc neurons. In a recent study, dopamine D1 and D2 receptors were selectively targeted while D1-Cre and D2-Cre mice were performing a probabilistic switching task (Tai et al., 2012).
The results showed that activation of D1 and D2 neurons was effective at increasing lose-shift behavior (i.e., moving from an incorrect to a correct response) compared to controls but had no effect on win-stay performance (i.e., repeating the previously rewarded response). Moreover, the effect was present when stimulation occurred before movement initiation but not when it was delayed by 150 ms. Interestingly, we recently found (Aquili et al., 2014) that non-specific optogenetic inhibition, and not excitation, of NAc shell neurons increased lose-shift behavior, but only if the inhibition occurred during feedback of results (between lever pressing and rewards or non-rewards) and not during action selection (preceding a lever press). We speculated that inhibition of NAc cells during specific time segments may have weakened reward expectancy signals, which would in turn facilitate switching to a correct response after an error. Differential effects between NAc core and shell on learning have been observed using fast-scan cyclic voltammetry, which may explain the contradictory findings from the two previous optogenetic studies. In fact, in one study cue-evoked dopamine release was larger and longer lasting in the NAc shell than in the core during goal-directed behavior for sucrose (Cacciapaglia et al., 2012). In two related studies, it was also found that concentrations of cue-evoked DA release closely tracked differences in reward magnitude in the NAc shell (Beyene et al., 2010) and reward delays in both NAc core and shell (Wanat et al., 2010). DA reward prediction error signals in the NAc core have also been reported using voltammetry (Hart et al., 2014). Here, using a probabilistic decision-making task, the authors found that dopamine concentrations varied systematically as differing degrees of reward uncertainty were introduced, in a manner closely resembling the predictions of reinforcement learning models and electrophysiological data of VTA DA neurons.
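The win-stay and lose-shift measures discussed above can be computed from a sequence of choices and outcomes. The following is a minimal sketch under a simplifying assumption: lose-shift is operationalized as switching response after an unrewarded trial, and win-stay as repeating the response after a rewarded trial. The function name `win_stay_lose_shift` and the example data are hypothetical and are not taken from any of the cited studies.

```python
def win_stay_lose_shift(choices, rewards):
    """Return (win_stay, lose_shift) proportions for a trial sequence.

    choices: sequence of response labels, one per trial (e.g. "L" or "R").
    rewards: sequence of truthy/falsy outcomes, one per trial.
    A proportion is NaN when there are no trials of the relevant type.
    """
    wins = stays_after_win = 0
    losses = shifts_after_loss = 0
    # Pair each trial's choice and outcome with the choice on the next trial.
    for prev_choice, prev_reward, choice in zip(choices, rewards, choices[1:]):
        if prev_reward:
            wins += 1
            stays_after_win += choice == prev_choice
        else:
            losses += 1
            shifts_after_loss += choice != prev_choice
    win_stay = stays_after_win / wins if wins else float("nan")
    lose_shift = shifts_after_loss / losses if losses else float("nan")
    return win_stay, lose_shift


if __name__ == "__main__":
    # Hypothetical five-trial session on a two-lever task.
    example_choices = ["L", "L", "R", "R", "R"]
    example_rewards = [1, 0, 1, 0, 0]
    print(win_stay_lose_shift(example_choices, example_rewards))
```

Comparing these two proportions between stimulation and control trials is one simple way to express the kind of selective effect on lose-shift behavior described in the text.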
Similarly, the observation that the DA phasic response to rewards gradually shifts to the earliest predictor of reinforcement over the course of learning, as predicted by temporal difference models (Sutton and Barto, 1981) and validated by DA electrophysiological recordings, has been confirmed by voltammetric data (Sunsay and Rebec, 2008). These findings are important because changes in firing rates may not always reflect changes in DA release (Youngren et al., 1993), and these voltammetric data allow us to better establish the causal role of DA in reward learning. Data from pharmacological manipulation of (mostly) dopamine D1 and D2 function in the striatum are another important component to take into account when trying to establish a causal link between neural activity and behavior. Dopamine depletion in the dorsomedial striatum, for example, results in reversal learning impairments (O'Neill and Brown, 2007). Moreover, in stimulant-dependent individuals who display perseverative behaviors following an incorrect response during a reversal learning task, administration of a dopamine D2/3 antagonist reduced perseverative errors and improved caudate nucleus function (Ersche et al., 2011), and in a separate study, administration of a D2 antagonist enhanced reward-related prediction error signals in the striatum (Jocham et al., 2011). Conversely, stimulation of D2 (but not D1) receptors using the agonist quinpirole impaired goal-directed behavior and decision making (St Onge et al., 2011; Naneix et al., 2013), and broad inactivation of caudate nucleus cells disrupted the ability to respond flexibly based on previous reward history (Muranishi et al., 2011). Interestingly, in monkeys, D2 receptor availability in the dorsal striatum was correlated with the number of reversal learning errors (Groman et al., 2011).
Overall, these data suggest that abnormal increases/decreases in striatal DA activity via D1/D2 receptors causally influence several important measures of behavioral flexibility. Studies that have looked at increasing dopamine concentration have demonstrated that DA stimulation by injection of amphetamine in the NAc core or shell increased instrumental responding to a conditioned stimulus predictive of reward (Pecina and Berridge, 2013), and administration of the dopamine precursor L-DOPA in older adults restored reward prediction error signaling (Chowdhury et al., 2013). In conclusion, increasing evidence from optogenetic, voltammetric, and pharmacological studies in recent years has added a new dimension to the established but mostly correlational relationship between the midbrain DA neurons and reward learning. This evidence suggests that the phasic DA response may have a causal role not only in reward prediction error signaling, but also in driving flexible behavioral adaptations to changes in stimulus-reward contingencies.

Conflict of interest statement

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

67 in total

1. Burst activity of ventral tegmental dopamine neurons is elicited by sensory stimuli in the awake cat.

Authors: J C Horvitz; T Stewart; B L Jacobs
Journal: Brain Res Date: 1997-06-13 Impact factor: 3.252

2. A neural substrate of prediction and reward. (Review)

Authors: W Schultz; P Dayan; P R Montague
Journal: Science Date: 1997-03-14 Impact factor: 47.728

3. Dissociable contributions by prefrontal D1 and D2 receptors to risk-based decision making.

Authors: Jennifer R St Onge; Hamed Abhari; Stan B Floresco
Journal: J Neurosci Date: 2011-06-08 Impact factor: 6.167

4. Prefrontal parvalbumin interneurons shape neuronal activity to drive fear expression.

Authors: Julien Courtin; Fabrice Chaudun; Robert R Rozeske; Nikolaos Karalis; Cecilia Gonzalez-Campo; Hélène Wurtz; Azzedine Abdi; Jerome Baufreton; Thomas C M Bienvenu; Cyril Herry
Journal: Nature Date: 2013-11-20 Impact factor: 49.962

5. The path to learning: action acquisition is impaired when visual reinforcement signals must first access cortex.

Authors: Martin Thirkettle; Thomas Walton; Ashvin Shah; Kevin Gurney; Peter Redgrave; Tom Stafford
Journal: Behav Brain Res Date: 2013-02-01 Impact factor: 3.332

6. Phasic dopamine release in the rat nucleus accumbens symmetrically encodes a reward prediction error term.

Authors: Andrew S Hart; Robb B Rutledge; Paul W Glimcher; Paul E M Phillips
Journal: J Neurosci Date: 2014-01-15 Impact factor: 6.167

7. Distinct actions of endogenous excitatory amino acids on the outflow of dopamine in the nucleus accumbens.

Authors: K D Youngren; D A Daly; B Moghaddam
Journal: J Pharmacol Exp Ther Date: 1993-01 Impact factor: 4.030

8. The temporal precision of reward prediction in dopamine neurons.

Authors: Christopher D Fiorillo; William T Newsome; Wolfram Schultz
Journal: Nat Neurosci Date: 2008-08 Impact factor: 24.884

9. Behavioral flexibility is increased by optogenetic inhibition of neurons in the nucleus accumbens shell during specific time segments.

Authors: Luca Aquili; Andrew W Liu; Mayumi Shindou; Tomomi Shindou; Jeffery R Wickens
Journal: Learn Mem Date: 2014-04-01 Impact factor: 2.460

10. A neural computational model of incentive salience.

Authors: Jun Zhang; Kent C Berridge; Amy J Tindell; Kyle S Smith; J Wayne Aldridge
Journal: PLoS Comput Biol Date: 2009-07-17 Impact factor: 4.475

1 in total

1. The effects of methylphenidate on cerebral responses to conflict anticipation and unsigned prediction error in a stop-signal task.

Authors: Peter Manza; Sien Hu; Jaime S Ide; Olivia M Farr; Sheng Zhang; Hoi-Chung Leung; Chiang-shan R Li
Journal: J Psychopharmacol Date: 2016-01-11 Impact factor: 4.153
