Behavioral flexibility is increased by optogenetic inhibition of neurons in the nucleus accumbens shell during specific time segments.

Luca Aquili, Andrew W Liu, Mayumi Shindou, Tomomi Shindou, Jeffery R Wickens.

Abstract

Behavioral flexibility is vital for survival in an environment of changing contingencies. The nucleus accumbens may play an important role in behavioral flexibility, representing learned stimulus-reward associations in neural activity during response selection and learning from results. To investigate the role of nucleus accumbens neural activity in behavioral flexibility, we used light-activated halorhodopsin to inhibit nucleus accumbens shell neurons during specific time segments of a bar-pressing task requiring a win-stay/lose-shift strategy. We found that optogenetic inhibition during action selection in the time segment preceding a lever press had no effect on performance. However, inhibition occurring in the time segment during feedback of results, whether rewards or nonrewards, reduced the errors that occurred after a change in contingency. Our results demonstrate critical time segments during which nucleus accumbens shell neurons integrate feedback into subsequent responses. Inhibiting nucleus accumbens shell neurons in these time segments, during reinforced performance or after a change in contingencies, increases lose-shift behavior. We propose that the activity of nucleus accumbens shell neurons in these time segments plays a key role in integrating knowledge of results into subsequent behavior, as well as in modulating lose-shift behavior when contingencies change.

Substances：
Halorhodopsins

Year: 2014 PMID： 24639489 PMCID： PMC3966536 DOI： 10.1101/lm.034199.113

Source DB: PubMed Journal: Learn Mem ISSN： 1072-0502 Impact factor: 2.460

Behavioral flexibility—the ability to change responses in accordance with feedback of results—is crucial for adaptive behavior. Tasks that include a switch in contingencies from a previously reinforced response to another response provide a sensitive measure of behavioral flexibility (Bitterman 1975; Kehagia et al. 2010; Rayburn-Reeves et al. 2013). The nucleus accumbens (NAcc) plays an important role in several forms of behavioral flexibility, including latent inhibition, attentional set shifting, and reversal learning (Stern and Passingham 1995; Cools et al. 2006; Floresco et al. 2006; O’Neill and Brown 2007), with the core and shell subregions of the NAcc regulating separate components (Weiner et al. 1996; Parkinson et al. 1999; Corbit et al. 2001; Ito et al. 2004; Cardinal and Cheung 2005; Pothuizen et al. 2005a,c; Granon and Floresco 2009). Inactivation of NAcc shell prior to initial discrimination learning improves performance of set shift behavior (Floresco et al. 2006) and blocks latent inhibition (Weiner et al. 1996; Jongen-Relo et al. 2002; Pothuizen et al. 2005b, 2006). The NAcc shell plays a particular role in responses to changes in the incentive value of conditioned stimuli (Floresco et al. 2008; Granon and Floresco 2009), which may be important in different forms of behavioral flexibility. Here we investigate the neural mechanisms underlying behavioral flexibility in a task requiring a shift in responses after a contingency switch, using brief optogenetic inhibition to silence NAcc shell neurons in specific time segments. At the cellular level, changes in the firing activity of NAcc neurons are associated with different phases of behavior, including preparation and response, reward expectation, and reward delivery (Carelli and Deadwyler 1994; Bowman et al. 1996; Carelli et al. 2000; Hollerman et al. 2000). 
During the response preparation phase, anticipatory increases in firing related to reward expectation occur (Carelli and Deadwyler 1994; Carelli et al. 2000). Similarly, phasic increases and more prolonged decreases in firing occur in response to a conditioned stimulus or the associated approach response (Day et al. 2006). Later in the sequence, when the outcome of the response is made known, some cells exhibit excitation, while others exhibit inhibition (Carelli and Deadwyler 1994; Carelli et al. 2000). This activation and inhibition of different neural subpopulations in the NAcc occurs in time segments related to different phases of action, from decision to feedback of results. Neural activity in these different time segments—response selection, reward expectancy, and reward delivery—may therefore play a specific causal role in response selection. Correlation of activation and inhibition of neural activity with behavior establishes the possibility of a causal relationship between the neural activity and the behavior, but a causal relationship cannot be inferred from recording studies showing only correlation. For example, behavior may cause the neural activity, rather than the converse. To show a causal role of neural activity during particular intervals it is necessary to manipulate this activity on similar timescales to the recorded activations and inhibitions. The recent introduction of optogenetic methods has made it possible to modify ongoing neural activity on millisecond timescales (Aravanis et al. 2007; Arenkiel et al. 2007; Gradinaru et al. 2007, 2008; Zhang et al. 2008; Tsai et al. 2009; Gunaydin et al. 2010; Liu and Tonegawa 2010; Lobo et al. 2010; Zhang et al. 2010). This temporal precision of optogenetics makes it possible to investigate the causal role of neural activity in different time segments of a response on a second-by-second basis, extending previous work based on correlation of neural activity and behavior (Tye et al. 2012; Nakamura et al. 
2013; Steinberg et al. 2013). In particular, the light-activated halorhodopsin (Han and Boyden 2007; Zhang et al. 2007; Gradinaru et al. 2008) provides a means to optogenetically inhibit neurons in the rodent brain. Recent developments support the use of halorhodopsin in rats (Witten et al. 2011; Stefanik et al. 2012; Nakamura et al. 2013), which have some advantages for testing behavioral flexibility (Whishaw 1995; Cressant et al. 2007). In the present study we investigated the effect on behavioral flexibility of optogenetic inhibition of the NAcc shell neurons in behaving rats during specific time segments related to task events. Based on evidence that NAcc shell neurons encode the incentive value of conditioned stimuli, we hypothesized that inhibition during feedback of results would change the probability of a shift in response after a switch in contingencies. To test this hypothesis we used viral mediated gene transfer to express halorhodopsin in neurons of the rat NAcc shell. We injected a lentiviral vector (pLenti-hSyn-eNpHR3.0-EYFP) into the NAcc shell bilaterally and implanted optic fibers above the injection sites on both sides. A light-emitting diode (LED) delivered light to the optic fibers, so that infected NAcc shell neurons were inhibited when the LED was on. The LED was turned on or off in specific time segments of a task in which contingencies switched several times during a session. We used this approach to investigate the causal role of NAcc shell neurons in integration of the results of previous responses into subsequent responses, focusing on the role of their activity in specific time segments of the sequence of behavior.

Results

Functional expression of halorhodopsin in medium spiny neurons

Halorhodopsin expression and optical fiber placement in the NAcc shell were confirmed by histology in all animals at the end of the experiments (Fig. 1). Cellular expression in the principal neurons of the NAcc shell, the medium spiny neurons (MSNs), was shown by colocalization of yellow fluorescent protein (YFP) and a specific marker for MSNs (DARPP-32, dopamine and cyclic-adenosine monophosphate responsive phosphoprotein of molecular weight 32 kDa) that labels both D1 and D2 subtypes of MSN (Bertran-Gonzalez et al. 2008; Matamales et al. 2009; Rajput et al. 2009) (Fig. 2A). In total, 93% (114/122) of halorhodopsin-expressing cells were MSNs, consistent with the percentage of MSNs measured in the striatum by quantitative neuroanatomy (Oorschot 1996). The infected neurons exhibited electrophysiological characteristics of MSNs including inward rectification and delayed action potential firing (Fig. 2B).

Figure 1.

Halorhodopsin expression and position of optical fibers in the NAcc shell. Location of maximal expression of halorhodopsin (eNpHR expression, left) and tips of optical fibers (right) are indicated by circles, color-coded by animal. Several animals showed halorhodopsin expression in multiple sections. Halorhodopsin expression extended ∼300 µm in the medial–lateral and 400 µm in the dorsal–ventral directions.

Figure 2.

Cellular expression of halorhodopsin and electrophysiology of light-activated inhibition in medium spiny neurons. (A) Medium spiny neurons (DARPP-32, red), halorhodopsin (YFP, green), and their colocalization (MERGED) indicating halorhodopsin expression in medium spiny neurons (red + green). Scale bar, 20 µm. (B) Electrophysiological recording from YFP-positive neurons in NAcc shows the characteristic voltage response (above) of a medium spiny neuron to depolarizing and hyperpolarizing current pulses (below). (C) Optical stimulation of YFP-positive neurons induces hyperpolarization in medium spiny neurons on short and long timescales. Black bars indicate illumination time (upper trace, 1.5 sec; lower trace, 10 sec). (D) Illumination (black bar) blocked repetitive action potential firing induced by suprathreshold current injection.

Functional expression of halorhodopsin was confirmed by hyperpolarization of YFP-positive neurons on exposure to yellow light for 1.5 sec or 10 sec (Fig. 2C). Illumination for 1.5 sec caused hyperpolarization of 26.8 mV below the resting membrane potential on average (n = 3), indicating strong inhibition. Action potential firing induced by strong depolarizing current was completely stopped by illumination (Fig. 2D). Control neurons (YFP-negative) from outside the injected areas were not responsive to light stimulation. Thus, optogenetic inhibition of MSNs was able to block firing in response to excitatory currents which were many times greater than synaptic inputs recorded in vivo (Wickens and Wilson 1998).

Experiment 1: Behavioral flexibility is increased by inhibition during feedback

We investigated the effect on behavioral flexibility of optogenetic inhibition during specific time segments of a task that involved within-session switching of contingencies. Rats (n = 8) were trained to criterion on tasks of increasing difficulty, initially learning to press one of two levers for an immediate food reward, and then progressing through stages: between-session reversal, single reversal within a session, and multiple reversals within a session. In the final, multiple reversal testing sessions there were 80 rewards per session. After 20 rewards had been given on one lever, the stimulus–reward contingencies were reversed so that pressing on the other lever was required for reward delivery. Four different stimulus–reward contingencies were tested in each session, with the switching sequence counterbalanced. Rats had 90 minutes to complete the task, and could choose to lever press at any time within a session. To investigate the role of neural activity in the NAcc shell in different phases of their responses, rats were subjected to optogenetic inhibition by turning on the LED during specific time segments (Fig. 3A). In the REWARD condition, the LED was turned on by a correct lever press and stayed on until reward collection. In the ERROR condition, the LED was turned on after an incorrect lever press and stayed on for 1.5 sec. In the ILPI (inter lever-press interval) condition, the LED was on throughout the session, but turned off whenever the rat made a lever press. If the lever press was correct the LED stayed off until reward collection. If the lever press was incorrect the LED stayed off for 1.5 sec. Control conditions included leaving the LED off throughout all sessions (OFF condition), or turning the LED on at random for 1.5 sec every 30, 45, or 60 sec (RANDOM condition). In addition, a separate group of rats received an inactive halorhodopsin to control for direct effects of LED illumination. 
The total time of LED illumination for each condition is shown in Supplemental Table 1.
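The task-locked LED schedules described above can be summarized as a small lookup. The following is only an illustrative sketch: the function name, the phase labels, and the idea of dividing a session into three phases are assumptions made for the example, not the control software used in the study. The RANDOM condition is left out because its 1.5-sec pulses are not tied to task phases.

```python
# Hypothetical sketch of the Experiment 1 LED schedule (names are illustrative).
# Phases assumed for the example:
#   "inter_press"     - time between lever presses
#   "reward_feedback" - from a correct press until reward collection
#   "error_feedback"  - the 1.5 sec following an incorrect press
PHASES = ("inter_press", "reward_feedback", "error_feedback")
TASK_LOCKED_CONDITIONS = ("OFF", "REWARD", "ERROR", "ILPI")


def led_is_on(condition: str, phase: str) -> bool:
    """Return True if the LED is on for this condition during this phase."""
    if condition not in TASK_LOCKED_CONDITIONS:
        raise ValueError(f"not a task-locked condition: {condition}")
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase}")
    if condition == "OFF":
        return False
    if condition == "REWARD":
        return phase == "reward_feedback"
    if condition == "ERROR":
        return phase == "error_feedback"
    # ILPI: on throughout, except during either feedback window.
    return phase == "inter_press"


if __name__ == "__main__":
    for cond in TASK_LOCKED_CONDITIONS:
        print(cond, {p: led_is_on(cond, p) for p in PHASES})
```

Written this way, ILPI is the complement of REWARD and ERROR taken together: it covers the interval preceding a lever press and leaves both feedback windows uninhibited.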

Figure 3.

Schematic representation of optogenetic conditions. (A) Experiment 1. REWARD: LED on after a correct lever press and off when the reward was collected. ERROR: LED on after an incorrect lever press and off after 1.5 sec. ILPI: LED on throughout, except that it was turned off after a correct lever press until reward collection and for 1.5 sec after an incorrect lever press. (B) Experiment 2. FEEDBACK: LED on for either REWARD or ERROR conditions as in Experiment 1. DECISION: LED on during tone and lever-out period until a correct or incorrect lever press.

We first analyzed errors that occurred immediately after the switch in contingencies at the end of each block of 20 rewards. Such errors may be due to a failure to cease pressing an unreinforced lever and could be considered failure to implement a lose–shift strategy. We found a main effect of LED condition (F(1.29,9.05) = 33.5, P < 0.001). We then made post-hoc comparisons of the error rate for each optogenetic manipulation (ERROR, REWARD, ILPI, RANDOM, OFF). There was a significant reduction in errors in REWARD or ERROR conditions compared to the two control conditions (REWARD vs. OFF, P = 0.002; REWARD vs. RANDOM, P = 0.001; ERROR vs. OFF, P = 0.002; ERROR vs. RANDOM, P = 0.002) (Fig. 4A). The reduction in errors in ERROR and REWARD conditions was also significant when compared to the ILPI condition (ERROR vs. ILPI, P = 0.001; REWARD vs. ILPI, P = 0.001). There was no significant difference between ILPI and control conditions (ILPI vs. OFF, P = 0.175; ILPI vs. RANDOM, P = 0.215), suggesting that inhibition of activity in the decision period preceding action selection had no effect on the errors. These results suggest that the activity of NAcc cells in the time after action selection (lever-pressing) until outcome (reward or nonreward), corresponding to feedback of reward or error results, is important for lose–shift behavior.
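The reversal-error measure used here (errors after each contingency switch until the first correct response, summed over the reversals in a session) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name and the trial representation, a `(block_index, correct)` pair with block 0 preceding the first reversal, are assumptions made for the example.

```python
from typing import Iterable, Tuple


def count_errors(trials: Iterable[Tuple[int, bool]]) -> Tuple[int, int]:
    """Split a session's errors into reversal errors and all other errors.

    trials: (block_index, correct) pairs in session order. Block 0 is the
    block before the first reversal; each later block starts at a switch.
    Returns (reversal_errors, other_errors).
    """
    reversal_errors = 0
    other_errors = 0
    blocks_with_correct = set()  # blocks where a correct press has occurred
    for block, correct in trials:
        if correct:
            blocks_with_correct.add(block)
        elif block > 0 and block not in blocks_with_correct:
            # Error after a switch and before the first correct response.
            reversal_errors += 1
        else:
            other_errors += 1
    return reversal_errors, other_errors


if __name__ == "__main__":
    session = [
        (0, True), (0, False), (0, True),               # initial block
        (1, False), (1, False), (1, True), (1, False),  # after reversal 1
        (2, False), (2, True),                          # after reversal 2
    ]
    print(count_errors(session))
```

The second value returned corresponds to errors excluding those made between a switch and the first correct response.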

Figure 4.

Reversal errors are reduced by optogenetic inhibition during REWARD or ERROR epochs. (A) Mean total number of lever-pressing errors after reversal until first correct response summed over three reversals. There is a significant decrease in these errors in the ERROR and REWARD conditions. (B) Mean total number of lever-pressing errors excluding those shown in A. There is no difference between the conditions. (C) Learning curve showing cumulative errors over rewards acquired (80 rewards per session) for the three optogenetic conditions (ERROR, REWARD, ILPI) and two control conditions (OFF, RANDOM), confirming that the main effect is in the errors after reversal and before the first correct response.

To investigate whether win–stay behavior was also affected, we analyzed errors that occurred after the first correct response. There was no significant main effect of optogenetic manipulation on lever-pressing errors that occurred after the first correct response (Fig. 4B). We then examined trial by trial whether the optogenetically stimulated rats and controls were learning the task differently. A learning curve (Fig. 4C) showing cumulative errors confirmed that the main effects occurred in the errors after reversal. The learning rate during rewarded correct responding was similar across conditions. These results indicate that win–stay behavior was not affected by any of the optogenetic manipulations.

We further examined the microstructure of learning by analyzing the number of times the animal made an error and then chose the correct lever on the next trial, expressed as a percentage of the total number of errors (lose–shift percentages). We also analyzed the number of times the animal received a reward for pressing one lever and then chose the same lever on the next trial, expressed as a percentage of the total number of rewards (win–stay percentages). Consistent with the statistical analysis of the number of errors between contingency shifts, lose–shift percentages were higher in the REWARD (62.1%) and ERROR (61.6%) conditions than in the ILPI (49.4%) or OFF (45.1%) conditions. In contrast, win–stay percentages were similar across all conditions (range 83.9% to 85.9%). These results confirm that the main effect of optogenetic inhibition in REWARD and ERROR conditions is increased probability of lose–shift behavior without an effect on win–stay behavior.

Optogenetic inhibition had no effect on how well the rats learned the task. Analysis of discrimination percentages for each block (Supplemental Table 2) confirmed that rats’ overall performance was similar across all conditions. Optogenetic condition also had no effect on motivational measures such as total time to complete each session (F(2.27,11.38) = 0.97, P > 0.05) (Fig. 5A) and latency of reward collection after a correct lever press (F(2.22,11.12) = 0.33, P > 0.05) (Fig. 5B).
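The win–stay and lose–shift percentages defined above can be computed from a trial sequence as in the sketch below. It follows the definitions in the text (denominators are the total numbers of rewards and errors); the function name and the `(lever, correct)` trial representation are assumptions made for the example, and returning NaN when a denominator is zero is a choice of this sketch rather than something stated in the paper.

```python
import math
from typing import List, Tuple


def win_stay_lose_shift(trials: List[Tuple[str, bool]]) -> Tuple[float, float]:
    """Return (win_stay_pct, lose_shift_pct) for one session.

    trials: (lever, correct) pairs in session order.
    Win-stay: a rewarded press followed by a press on the same lever,
    as a percentage of all rewarded presses.
    Lose-shift: an error followed by a correct press on the next trial,
    as a percentage of all errors.
    """
    wins = sum(1 for _, correct in trials if correct)
    losses = len(trials) - wins
    win_stay = 0
    lose_shift = 0
    # Compare each trial with the one that follows it.
    for (lever, correct), (next_lever, next_correct) in zip(trials, trials[1:]):
        if correct and next_lever == lever:
            win_stay += 1
        elif not correct and next_correct:
            lose_shift += 1
    win_stay_pct = 100.0 * win_stay / wins if wins else math.nan
    lose_shift_pct = 100.0 * lose_shift / losses if losses else math.nan
    return win_stay_pct, lose_shift_pct


if __name__ == "__main__":
    session = [("L", True), ("L", True), ("L", False), ("R", True),
               ("R", False), ("R", False), ("L", True)]
    ws, ls = win_stay_lose_shift(session)
    print(f"win-stay {ws:.1f}%, lose-shift {ls:.1f}%")
```

Because the denominators are totals, the final trial of a session counts toward its denominator even though it has no following trial.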

Figure 5.

Optogenetic stimulation has no effect on motivational variables. (A) There is no significant effect of condition on latency from correct lever press to reward collection. (B) There is no significant effect of condition on time to complete the task.

Light delivery may, in theory, alter neural activity even in nonexpressing cells. Although this is unlikely with the light levels used in the current experiment (Yizhar et al. 2011), to control for the nonoptogenetic effects of the LED we tested a separate group of rats (n = 6) that received an inactive halorhodopsin (pLenti-YFP). In these rats the three LED manipulations (ERROR, REWARD, and ILPI) had no effect on performance (Fig. 6A,B). This confirmed that direct physiological effects of LED illumination were unrelated to learning performance.

Figure 6.

Optical stimulation has no effect on control rats with inactive halorhodopsin. (A) Total number of lever-pressing errors after reversal until first correct response summed over three reversals. There is no effect of condition. (B) Total number of lever-pressing errors excluding errors in the period after reversal until first correct response. There is no effect of condition.

Experiment 2: Behavioral flexibility is increased by optogenetic inhibition during FEEDBACK but not DECISION periods

In Experiment 2 we made two modifications to the task in light of results from Experiment 1. First, we delineated a decision period so that optogenetic inhibition could be applied in a distinct time segment. In this DECISION time segment, optogenetic inhibition started with the onset of a discriminative stimulus (a tone starting with the protrusion of the two levers), and stopped either after 5 sec (when the tone ceased and the levers retracted) or when the rat pressed a lever (correct or incorrect). Second, to exclude the possibility that optogenetic inhibition in the former REWARD and ERROR conditions might act as a discriminative stimulus and increase the effect of reward or error outcomes, we applied optogenetic inhibition during both errors and correct responses (FEEDBACK) within the same session. In Experiment 2 (n = 6) we found a main effect of condition on the total number of lever-pressing errors after reversal until the first correct response over three reversals (F(3,15) = 19.3, P < 0.001). There was a significant error reduction in those sessions in which the LED was turned on during either a correct or incorrect response (FEEDBACK vs. OFF, P = 0.015; FEEDBACK vs. RANDOM, P = 0.024; FEEDBACK vs. DECISION, P = 0.027) (Fig. 7A), but not during decisions (DECISION vs. OFF, P = 0.630; DECISION vs. RANDOM, P = 0.170) (Fig. 7A). We found no significant main effect of optogenetic manipulation on the total number of lever-pressing errors excluding errors in the period after reversal until the first correct response (Fig. 7B). These results confirm that the activity of NAcc cells during FEEDBACK is important for lose–shift behavior, regardless of whether feedback is of reward or error results. These results also confirm that inhibition during a more distinct DECISION period has no significant effect on the measures tested.

Figure 7.

Reversal errors are reduced by optogenetic inhibition during FEEDBACK. (A) Mean total number of lever-pressing errors after reversal until first correct response summed over three reversals. There is a significant decrease in these errors in the FEEDBACK condition but not in the DECISION, RANDOM, or OFF conditions. (B) Mean total number of lever-pressing errors excluding those shown in A. There is no difference between the conditions. (C) Learning curve showing cumulative errors over rewards acquired (80 rewards per session) for the two optogenetic conditions (FEEDBACK, DECISION) and the control conditions (OFF, RANDOM), confirming that the main effect occurs in the period between contingency switch and the first correct response.

As in Experiment 1, the learning curve for each optogenetic condition shows that across the 80 rewards in a session, rats in all conditions learned at a similar rate (with an increase in the error rate between the 2nd and 3rd reversal), but made fewer errors immediately after the switch in the FEEDBACK condition (Fig. 7C). Consistent with the appearance of the learning curve, there was a higher lose–shift percentage in the FEEDBACK condition (57.8%) than in other conditions (40.7%–44.6%). In contrast, win–stay percentages were similar across all conditions (88.1%–88.8%). Analysis of discrimination percentages for each block confirmed that overall task performance was similar across all conditions (Supplemental Table 3). These results confirm that the main effect of optogenetic inhibition in FEEDBACK conditions is increased probability of lose–shift behavior.

Discussion

To the best of our knowledge, this is the first demonstration that optogenetic inhibition of NAcc shell neurons during reward or error feedback intervals increases behavioral flexibility in a task requiring a win–stay/lose–shift strategy. We found that optogenetic inhibition of NAcc shell activity in the time segment between action selection and outcome reduced the number of errors after a stimulus–reward contingency switch. However, optogenetic inhibition in other time segments had no effect on our behavioral measures. Our results demonstrated critical time windows during which NAcc shell neurons (1) integrate reward or error feedback history and (2) use this integrated history to resist lose–shift behavior after a contingency switch. Inhibiting NAcc shell neurons in these critical time windows increased lose–shift behavior, thus facilitating behavioral flexibility. The optogenetic manipulation we used enabled us to inhibit halorhodopsin-expressing neurons within range of the optic fiber. Based on the wavelength, the fiber numerical aperture, and the light power output from the tip of the optic fiber we estimate that the light emanating from the tip would penetrate 0.2–0.3 mm into the tissue to inactivate a volume of 0.034–0.11 mm³. Since the density of MSNs is 84,900 mm⁻³ (Oorschot 1996) we estimate that about 10⁴ neurons caused the effects we observed. In our intracellular recordings, the inhibitory effect of halorhodopsin was strong enough to block firing in response to injected currents that were several times larger than synaptic currents. No rebound spiking was observed in the MSNs, consistent with previous studies (Wickens and Wilson 1998; Lansink 2008). Thus, the main effect of the optogenetic manipulation was reduced firing of the MSN output neurons of the NAcc shell. In addition to the MSNs, the NAcc has a small population of fast-spiking interneurons (Kawaguchi et al. 1995).
These interneurons are relatively few in number (<1% of the total neurons) but they have strong inhibitory effects on MSNs (Koos and Tepper 1999; Koos et al. 2004). The promoter we used, synapsin, may have resulted in expression of halorhodopsin in fast spiking interneurons as well as in MSNs, raising the possibility of a disinhibitory action on MSNs. However, our intracellular records show this is unlikely because the inhibitory effect of halorhodopsin in MSNs is greater than the inhibitory currents caused by fast-spiking interneurons (Koos et al. 2004; Tepper et al. 2004). Moreover, although these low-threshold spiking interneurons do exhibit rebound low-threshold calcium bursts after release from hyperpolarization (Kubota and Kawaguchi 2000), such bursts would increase the inhibitory effect of halorhodopsin on MSNs rather than diminish it. Therefore, inhibition of MSN output is the dominant effect of the optogenetic manipulation. Our main finding is that the optogenetic inhibition of MSNs in the NAcc shell causes improved switching performance when inhibition is applied in either of two time windows: after correct responses (REWARD), or after errors (ERROR). We suggest that these effects are most probably mediated by the same neurons—those that express halorhodopsin—involved in a mechanism that first integrates feedback during learning and is later engaged after a shift in contingencies. We discuss this putative mechanism below. In the REWARD condition, NAcc shell neurons were inhibited only after correct responses. The effect of inhibition in the REWARD condition—seen after the reversal—was evident in the smaller number of errors before the first correct response. However, the rats in the REWARD condition receive no optogenetic inhibition during the unrewarded responding on the incorrect lever immediately after the contingency switch, because they are making incorrect responses. 
This means that the cause of the improvement must have occurred during the initial learning, before the reversal, when the rat is making correct responses. However, there is no evidence in our data of fewer correct responses in the periods before the reversal, where the learning curves are similar for all conditions. To explain the effect of optogenetic inhibition in the REWARD condition, we postulate weaker learning of reward expectancy prior to the switch in contingencies, caused by inhibition of NAcc shell neurons after a correct response. The discharge rate of ventral striatal neurons has been reported to correlate with reward magnitude and expectancies and encode cues related to reward (Cromwell and Schultz 2003; Roitman et al. 2005; Wood et al. 2011). Optogenetic inhibition of such neurons during learning experiences would reduce activity that is necessary for synaptic plasticity (Reynolds and Wickens 2000, 2002; Reynolds et al. 2001), leading to weaker reward expectation signals. Weaker reward expectation signals in turn may cause a reduced tendency to resist lose–shift behavior, resulting in more labile responding after a contingency switch. The effect of optogenetic inhibition in the ERROR condition requires a different explanation. Optogenetic inhibition when an incorrect response was made after a shift in contingency was sufficient to reduce the number of reversal errors. The mechanism mediating this effect presumably involves the same neurons as in the REWARD condition but at a different time point. In the ERROR condition, inhibition of these same neurons representing reward expectancy in the REWARD condition would also cause a weaker reward expectation signal in the ERROR condition. The weaker reward expectancy activity might result in more labile responding after a contingency shift. 
We considered but rejected an alternative interpretation that the increase in behavioral flexibility in REWARD and ERROR is caused by LED-induced neural inhibition acting as a cue to increase the effect of feedback. Such a cue might have been informative if it only occurred in association with errors in the ERROR condition, or in association with rewards in the REWARD condition. To test this possibility we added a FEEDBACK condition in Experiment 2, in which optogenetic inhibition was given during both REWARD and ERROR conditions. This ensured that the optogenetic inhibition provided no additional information about the outcome. We found that the FEEDBACK condition also increased lose–shift behavior, resulting in fewer errors after a switch, thus failing to support the alternative interpretation. Considering our results in the framework of behavioral flexibility, decision-making, and reinforcement learning, it is plausible to suggest that normally the tendency to shift after a loss is reduced by an expectancy of reward. Even in the absence of a reward on the most recent trial, the integrated history of previous association of the lever press with reward may sustain responding. This would cause continued incorrect responding after errors brought about by a switch in contingencies. Inhibiting such an expectancy of reward signal, as in our experiments, would lead to more rapid switching to the correct response. Such a framework combines rule-based win–stay, lose–shift models (Worthy et al. 2012) with reinforcement learning models (Sutton and Barto 1981; Barto and Sutton 1982) by basing decisions on the recency-weighted average reward, leading to selection of the option with greatest expected reward values (Worthy et al. 2013). Optimal behavior in such contingency switch tasks is possible using a win–stay, lose–shift strategy that repeats the response from the last trial if it was correct but switches to the alternative response if it was incorrect (Rayburn-Reeves et al. 
2013). Such a strategy requires integration of the results of previous responses into subsequent responses. Our results may be explained in this framework if REWARD, ERROR, and FEEDBACK conditions result in weaker reward expectancy signals. The present findings contribute to a larger body of work in which prolonged inactivation or lesions of the NAcc shell improve behavioral flexibility in various forms, including latent inhibition, Pavlovian-instrumental transfer, and attentional set-shifting (Weiner et al. 1996; Corbit et al. 2001; Pothuizen et al. 2005a; Floresco et al. 2006). For example, Floresco et al. (2006) found that NAcc shell inhibition during the first day of discrimination learning led to improved performance (fewer trials to reach criterion) relative to controls receiving inhibition after a set switch. The improvement in shifting from the previously learned strategy was interpreted as inhibition-induced inability of the rats to fully ignore the irrelevant stimulus during the first discrimination, making this stimulus more salient during the shift, and thus reducing perseverative errors. However, Ambroggi et al. (2011) found that NAcc shell inactivation did not impair the ability to discriminate between cues, even though it reduced inhibition of responding to a nonrewarded stimulus. Our findings are consistent with Ambroggi et al. (2011) but not with Floresco et al. (2006), because we did not see increased lose–shift behavior in the ILPI or DECISION conditions. However, the firing of a subset of NAcc neurons in the delay period preceding movement is correlated with the direction of subsequent movement (Taha et al. 2007) leaving open the possibility that they may contribute to decisions if inactivation over both response directions leaves some differential activation. 
Some effects of optogenetic inhibition of the NAcc shell may be mediated by dopamine neurons in the ventral tegmental area that receive inputs from the NAcc (Zahm and Heimer 1990; Heimer et al. 1991; Usuda et al. 1998; Aggarwal et al. 2012). Alterations in dopamine signaling have previously been associated with changes in behavioral flexibility. For example, Colpaert et al. (2007) found that systemic dopamine D2 antagonists increased win–shift behavior after a rewarded trial. Conversely, Halluk and Floresco (2009) found that infusions of the D2 agonist quinpirole directly into the NAcc impaired reversal learning without disrupting initial response learning. St Onge et al. (2011) found that rats would bias their choices toward a lose–shift strategy more than controls if a D1 agonist was injected into the prefrontal cortex. Together, these studies suggest that some of the effects of NAcc shell inhibition may be mediated by dopaminergic projections to the prefrontal cortex or NAcc (Floresco et al. 2009).

In conclusion, our optogenetic experiments indicate critical time segments during which NAcc shell neurons integrate reward or error feedback history, and use this integrated history to make decisions. Inhibiting NAcc shell neurons in these critical time segments increased lose–shift behavior, thus facilitating behavioral flexibility. The effects we observed may be explained by reduced integration of reinforcement history causing reduced reward expectancy, or disrupted readout of reward expectancy for decision making. Weaker reward expectancy signals might explain the more labile responding observed after a contingency shift. Further work is needed to examine the natural firing patterns of the infected neurons in these critical time windows, which our evidence suggests play a key role in integrating knowledge of results into subsequent behavior, and in modulating lose–shift behavior when contingencies change.

Materials and Methods

Subjects

Twenty Long Evans rats were obtained from Charles River weighing 250–275 g on arrival. The animals were initially housed in pairs and later housed individually after they had cannulae implanted. They were maintained on a 12-h light–dark cycle. Rats were restricted to 15 g of chow per day with free access to water. The Okinawa Institute of Science and Technology Animal Care and Use Committee approved the procedures.

Virus production and purification

The pLenti-hSyn-eNpHR3.0-EYFP lentiviral vector was kindly provided by Karl Deisseroth’s Lab. This vector carries a halorhodopsin (eNpHR) fusion protein driven by the hSyn promoter, which is highly specific for neurons (Kugler et al. 2003). It was used along with the packaging plasmid psPAX2 and the envelope plasmid pMD2.G in a liposome-mediated triple transfection (FuGene-6, Roche) of HEK293T cells (ATCC). After a period of 6 h, the medium was replaced with an Ultraculture serum-free medium (Lonza Bio) supplemented with 5-mM sodium butyrate, and viral particles shed from the cells were collected over a period of 36 h. The virus-containing medium was then filtered through a 0.45-µm SFCA filter unit (Nalgene) and spun in a CP100WX refrigerated (4°C) ultracentrifuge (Hitachi) for 2 h at 16K rpm. Supernatants were discarded and the viral pellets resuspended in PBS and frozen at −80°C for subsequent use. Final concentrated viral titers were 1.78 × 10¹¹ copies/mL, as determined by RT-PCR using the Lenti-X qRT-PCR Titration Kit (Clontech).

Stereotaxic optic fiber implantation and virus injection

We injected pLenti-eNpHR3.0-YFP bilaterally at stereotaxic coordinates for the NAcc shell and implanted optical fibers at the site of each injection on both sides. For these procedures, rats were anesthetized with a mixture of isoflurane and oxygen at a ratio of 5:1 (induction) and placed in a stereotaxic frame (David Kopf Instruments). The isoflurane to oxygen ratio was changed to 2:1 during the surgical procedure. Two holes were drilled and 1.0 µL virus was injected via Hamilton syringe into the NAcc shell bilaterally at the following coordinates from bregma: anterior–posterior, +1.6 mm; dorsoventral, −7.0 mm; medial–lateral, ±0.8 mm. After the injections, the two fiber-optic cannulae were inserted and anchored to the skull with stainless-steel screws and dental cement. For maximum viral expression, the animals were rested for 2 wk before behavioral training began.

Behavioral procedures and optogenetic conditions

Rats were trained and tested in sound-attenuated testing chambers (34 × 29 × 25 cm, Med Associates) and the behavioral task was programmed using a MED-PC system. Retractable levers were fitted on the left and right walls of the chamber, with a pellet receptacle in the center. A head entry detector was used to measure reward collection. A house light was located in the top center of the response panel and a Sonalert attachment was mounted above the cage. Food pellets (45 mg) were delivered via a pellet dispenser (ENV-203M). Light delivery into the rat’s fiber-optic cannulae was gated by a digital logic control signal between the MED-PC system and an LED driver. The LED (wavelength 590 nm) was connected via a two-channel fiber-optic swivel that allowed the animal to turn freely. Two mono fiber patch cords provided optical connection from the swivel to the fiber-optic cannulae. All optical equipment was obtained from Doric Lenses. The optical fiber diameter was 200 µm with a numerical aperture of 0.37. Power output from the tip of the fiber was 0.40 mW.
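As a rough guide to what the stated power means at the fiber tip, the output power can be converted to an irradiance by dividing by the cross-sectional area of the fiber core. The sketch below assumes the light is spread uniformly over the 200-µm core face and ignores beam divergence (numerical aperture 0.37) and tissue scattering, so it describes the tip only, not the irradiance reaching neurons at depth. The function name is hypothetical.

```python
import math


def tip_irradiance_mw_per_mm2(power_mw, core_diameter_um):
    """Irradiance at the fiber tip, assuming uniform output over the core face."""
    radius_mm = core_diameter_um / 2 / 1000   # micrometers to millimeters
    area_mm2 = math.pi * radius_mm ** 2       # area of the circular core face
    return power_mw / area_mm2


# Values taken from the text: 0.40 mW from a 200-um diameter fiber.
irradiance = tip_irradiance_mw_per_mm2(0.40, 200)
print(f"{irradiance:.1f} mW/mm^2")
```

Under these assumptions the tip irradiance is about 12.7 mW/mm².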

Experiment 1

We began training 2 wk after pLenti-eNpHR3.0-YFP injection, and tested from 4- to 8-wk post-injection. Training took place in four stages. In the first stage—fixed-ratio (FR1) discrimination—rats learned to press the left or right lever (counterbalanced) for a food reward. Each correct lever press resulted in the simultaneous illumination of a visual cue, the onset of an auditory stimulus, and the delivery of a 45-mg food pellet. Rats had 90 min to complete the task and could receive 80 rewards. The criterion for moving to the next stage of training was 90% discrimination accuracy. In the second stage—between-session reversal—the stimulus–reward contingencies were reversed so that if lever-pressing on the left previously resulted in reward delivery this ceased to be the case and lever-pressing on the right became rewarding, and vice versa. Rats underwent 3 d of training before moving on to the third stage. In the third stage—within session reversal—the stimulus–reward contingencies were switched twice within the same session, so that after 40 lever presses on the left to receive food pellets, rats had to lever press on the right to receive the remaining 40 pellets. In the final stage, four different stimulus–reward contingencies were tested in each session. Rats obtained 20 rewards on a lever before a switch occurred, with the switching sequence counterbalanced, accumulating a total of 80 rewards per session. The rat could choose to lever press at any time within a session. After 3 d of training on this paradigm, two mono fiber patch cords from the LED were connected to the implanted fiber-optic cannulae and the effects of optogenetic inhibition at different time points were tested. We conducted a total of 27 testing sessions of 90 min on different days. To investigate the role of neural activity in the NAcc shell in different time segments, each rat was subjected to optogenetic inhibition during days 10–36. 
Optogenetic conditions were defined by the timing of the LED illumination, as shown in Figure 3A.
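The block structure of the final training stage (20 rewards per contingency, 80 rewards per session) can be written as a small lookup. This is a sketch only, not the MED-PC program used in the experiments: the function name is hypothetical, and it assumes that the rewarded lever simply alternates between left and right at each switch, with the starting lever supplied by the caller to reflect counterbalancing.

```python
def rewarded_lever(rewards_earned, first_lever="left",
                   rewards_per_block=20, total_rewards=80):
    """Return which lever is rewarded, given the rewards earned so far.

    Assumption (not stated in the text): the rewarded lever alternates
    between left and right at every contingency switch.
    """
    if not 0 <= rewards_earned < total_rewards:
        raise ValueError("session is over or reward count is invalid")
    block = rewards_earned // rewards_per_block     # 0, 1, 2, 3
    other_lever = "right" if first_lever == "left" else "left"
    return first_lever if block % 2 == 0 else other_lever


# Rewarded lever at the start of each of the four blocks.
schedule = [rewarded_lever(n) for n in (0, 20, 40, 60)]
print(schedule)
```

Because trials are self-paced, the schedule is indexed by rewards earned rather than by trial number or elapsed time.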

Experiment 2

All training and testing procedures were identical to those in Experiment 1 in terms of stimulus–reward contingencies, but optogenetic conditions were changed to those shown in Figure 3B.

Histology

Expression of pLenti-eNpHR3.0-YFP and location of the optical fiber tips in the NAcc shell were confirmed by histology in all animals at the end of the experiments (Fig. 1). Following optogenetic behavioral experiments, all animals were sacrificed by injection of pentobarbital and perfused transcardially with PBS/heparin (60 U/mL) followed by 4% PFA. The brains were removed and postfixed in 4% PFA overnight. The following day the PFA was discarded and the brains were immersed in a 20% sucrose solution in PBS overnight. Brains were cut into 60-µm-thick sections on a freezing microtome (Yamato) and placed in PBS for storage at 4°C.

Immunocytochemistry

To determine the efficiency of expression of eNpHR3.0-YFP in NAcc medium spiny cells, slices were permeabilized/quenched with 0.05 M NH4Cl and 0.02% (w/v) saponin in PBS for 15 min and blocked with PGAS containing 2% goat serum (Jackson) and 1% (w/v) BSA for 1 h. For double staining of eNpHR and DARPP-32, brain sections were stained using chicken-derived anti-GFP (1:2000, Abcam) and rabbit-derived anti-DARPP-32 antibodies (1:1000, Chemicon) overnight at 4°C. The following day, after PGAS washes to remove the primary antibodies, the bound antibodies were labeled using goat-derived anti-chicken Alexa 488 and anti-rabbit Alexa 594 conjugated secondary antibodies (both at 1:500, Invitrogen) for 4 h at 25°C. After washing off the secondary antibodies with PGAS and PBS, slices were mounted onto slides with Vecta-Shield (Vector Labs) and images were acquired using a Zeiss LSM 510 Meta Confocal microscope. We quantified the infection efficacy by counting the number of YFP-positive cells that were also DARPP-32 positive (double labeled). The analysis was applied to each NAcc shell section (N = 4) in which YFP was present within a counting frame (200 µm × 200 µm × 50 µm). The total number of double-labeled cells (YFP+DARPP-32) was compared with the total number of YFP-positive cells.
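The comparison described above amounts to a simple ratio of double-labeled cells to all YFP-positive cells. A minimal sketch follows, assuming that counts are pooled across sections before dividing; the text does not say whether counts were pooled or averaged per section. The function name is hypothetical, and the counts are made-up placeholders, not data from this study.

```python
def double_labeled_fraction(section_counts):
    """Fraction of YFP-positive cells that are also DARPP-32 positive.

    section_counts: list of (double_labeled, yfp_positive) pairs, one per
    counting frame. Counts are pooled across sections (an assumption).
    """
    total_double = sum(double for double, _ in section_counts)
    total_yfp = sum(yfp for _, yfp in section_counts)
    if total_yfp == 0:
        raise ValueError("no YFP-positive cells counted")
    if total_double > total_yfp:
        raise ValueError("double-labeled cells cannot exceed YFP-positive cells")
    return total_double / total_yfp


# Placeholder counts for four sections (hypothetical, for illustration only).
example_counts = [(18, 20), (27, 30), (9, 10), (36, 40)]
print(f"{double_labeled_fraction(example_counts):.0%}")
```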

Electrophysiology

Electrophysiological studies were performed 2 wk after bilateral injection of pLenti-eNpHR3.0-YFP at 5 wk, corresponding to the time at which training began in behaviorally tested rats. To prepare brain slices for electrophysiology, coronal sections (300 µm thick) containing the NAcc were cut on a VT1000S microtome (Leica) in cold modified artificial CSF (ACSF) containing 50 mM NaCl, 2.5 mM KCl, 7 mM MgCl2, 0.5 mM CaCl2, 1.25 mM NaH2PO4, 25 mM NaHCO3, 95 mM sucrose, and 25 mM glucose, saturated with 95% O2/5% CO2. Slices were then incubated in oxygenated standard ACSF containing 120 mM NaCl, 2.5 mM KCl, 2 mM CaCl2, 1 mM MgCl2, 25 mM NaHCO3, 1.25 mM NaH2PO4, and 15 mM glucose. After recovery for 1–4 h, slices were transferred to a recording chamber where they were perfused with standard ACSF (3–4 mL/min, 30°C). Prior to recording, eNpHR-expressing neurons were identified by YFP fluorescence. We used patch pipettes (2–4 MΩ) filled with internal solution (115 mM K gluconate, 1.2 mM MgCl2, 10 mM HEPES, 4 mM ATP, 0.3 mM GTP, 0.5% biocytin, pH 7.2–7.4) to make whole-cell current-clamp recordings from eNpHR-YFP-positive neurons in NAcc. Neural electrical responses were amplified using a Multiclamp 700B amplifier and signals were digitized at 10 kHz. Stimulation of NpHR was achieved by epifluorescence illumination (100-W xenon arc lamp, 560-nm excitation filter; Semrock FF01-562/40) gated by a Uniblitz VS25 Shutter under through-the-lens control.

Statistical analyses

For statistical analyses we used SPSS version 18 (SPSS Inc). Data were analyzed using one-way repeated measures analysis of variance (ANOVA). During testing, we calculated the total number of errors either during the reversal phase (total number of lever-pressing errors after reversal until first correct response over three reversals) or before the reversal (total number of lever-pressing errors excluding errors in the period after reversal until first correct response) for all the optogenetic manipulations over three sessions (three sessions for each LED condition). If a main effect was found, we conducted post-hoc pairwise comparisons using the Bonferroni correction for the repeated measures ANOVA. A P value of <0.05 was considered significant.
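The two error measures defined above can be made concrete with a short counting routine. This is a sketch of one reading of the definitions, not the analysis code used in the study: each trial is represented as a hypothetical `(pressed_lever, rewarded_lever)` pair, a reversal is detected when the rewarded lever differs from that of the previous trial, and errors are assigned to the reversal phase until the first correct response after the reversal.

```python
def count_errors(trials):
    """Split lever-press errors into reversal-phase and other errors.

    trials: sequence of (pressed_lever, rewarded_lever) pairs in session order.
    Returns (reversal_errors, other_errors).
    """
    reversal_errors = 0
    other_errors = 0
    in_reversal_phase = False
    previous_rewarded = None

    for pressed, rewarded in trials:
        # A change in the rewarded lever marks the start of a reversal phase.
        if previous_rewarded is not None and rewarded != previous_rewarded:
            in_reversal_phase = True
        if pressed == rewarded:
            in_reversal_phase = False     # first correct response ends the phase
        elif in_reversal_phase:
            reversal_errors += 1          # error between reversal and first correct
        else:
            other_errors += 1             # error outside the reversal phase
        previous_rewarded = rewarded

    return reversal_errors, other_errors


# Toy session: one reversal from the left ("L") to the right ("R") lever.
toy_session = [("L", "L"), ("R", "L"), ("L", "L"),
               ("L", "R"), ("L", "R"), ("R", "R"),
               ("L", "R"), ("R", "R")]
print(count_errors(toy_session))
```

In the toy session the two presses on the old lever immediately after the switch count as reversal errors, while the stray errors before the switch and after the first correct response count toward the other measure.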

73 in total

1. Dependence of GABAergic synaptic areas on the interneuron type and target size.

Authors: Y Kubota; Y Kawaguchi
Journal: J Neurosci Date: 2000-01-01 Impact factor: 6.167

2. Multimodal fast optical interrogation of neural circuitry.

Authors: Feng Zhang; Li-Ping Wang; Martin Brauner; Jana F Liewald; Kenneth Kay; Natalie Watzke; Phillip G Wood; Ernst Bamberg; Georg Nagel; Alexander Gottschalk; Karl Deisseroth
Journal: Nature Date: 2007-04-05 Impact factor: 49.962

3. The nucleus accumbens in monkeys (Macaca fascicularis). III. Reversal learning.

Authors: C E Stern; R E Passingham
Journal: Exp Brain Res Date: 1995 Impact factor: 1.972

4. Simulation of anticipatory responses in classical conditioning by a neuron-like adaptive element.

Authors: A G Barto; R S Sutton
Journal: Behav Brain Res Date: 1982-03 Impact factor: 3.332

5. Dissociable contributions by prefrontal D1 and D2 receptors to risk-based decision making.

Authors: Jennifer R St Onge; Hamed Abhari; Stan B Floresco
Journal: J Neurosci Date: 2011-06-08 Impact factor: 6.167

6. Spatial learning in Long-Evans Hooded rats and C57BL/6J mice: different strategies for different performance.

Authors: Arnaud Cressant; Morgane Besson; Sandra Suarez; Anne Cormier; Sylvie Granon
Journal: Behav Brain Res Date: 2006-12-08 Impact factor: 3.332

Review 7. Learning and cognitive flexibility: frontostriatal function and monoaminergic modulation.

Authors: Angie A Kehagia; Graham K Murray; Trevor W Robbins
Journal: Curr Opin Neurobiol Date: 2010-02-16 Impact factor: 6.627

8. A comparison of rats and mice in a swimming pool place task and matching to place task: some surprising differences.

Authors: I Q Whishaw
Journal: Physiol Behav Date: 1995-10

Review 9. Neural circuits subserving behavioral flexibility and their relevance to schizophrenia.

Authors: Stan B Floresco; Ying Zhang; Takeshi Enomoto
Journal: Behav Brain Res Date: 2008-12-06 Impact factor: 3.332

10. L-DOPA disrupts activity in the nucleus accumbens during reversal learning in Parkinson's disease.

Authors: Roshan Cools; Simon J G Lewis; Luke Clark; Roger A Barker; Trevor W Robbins
Journal: Neuropsychopharmacology Date: 2006-07-12 Impact factor: 7.853
