Laser stimulation of the skin for quantitative study of decision-making and motivation.

Julia Pai^1, Takaya Ogasawara^1, Ethan S Bromberg-Martin^1, Kei Ogasawara^1, Robert W Gereau^1,2,3,4, Ilya E Monosov^1,2,4,5,6.

Abstract

Neuroeconomics studies how decision-making is guided by the value of rewards and punishments. But to date, little is known about how noxious experiences impact decisions. A challenge is the lack of an aversive stimulus that is dynamically adjustable in intensity and location, readily usable over many trials in a single experimental session, and compatible with multiple ways to measure neuronal activity. We show that skin laser stimulation used in human studies of aversion can be used for this purpose in several key animal models. We then use laser stimulation to study how neurons in the orbitofrontal cortex (OFC), an area whose many roles include guiding decisions among different rewards, encode the value of rewards and punishments. We show that some OFC neurons integrated the positive value of rewards with the negative value of aversive laser stimulation, suggesting that the OFC can play a role in more complex choices than previously appreciated.

Keywords: aversion; decision; motivation; neuroeconomics; orbitofrontal; value

Year: 2022 PMID: 36160041 PMCID: PMC9499993 DOI: 10.1016/j.crmeth.2022.100296

Source DB: PubMed Journal: Cell Rep Methods ISSN: 2667-2375

Introduction

The burgeoning fields of neuroeconomics and neuroethology study the neural basis of how humans and animals make decisions that are in large part regulated by the subjective experience of rewards and punishments, referred to as subjective value (Glimcher and Fehr, 2013; Glimcher and Rustichini, 2004; Padoa-Schioppa and Cai, 2011; Pessiglione and Delgado, 2015; Schultz, 2006; Seymour et al., 2007). The goal of much of that work is to determine how predictions and occurrences of rewards and noxious punishments impact choices among distinct behavioral options or offers. Despite this effort, when it comes to studying decision-making motivated by the threat of noxious stimuli, only relatively small progress has been made. One of the challenges is that most methods of delivering aversive stimuli are not temporally precise, or do not allow dynamic adjustments of intensity or stimulation location, during neural data acquisition on a trial-by-trial basis. Many of these methods are also difficult to deploy during imaging, neurophysiology, or both (Table 1). Finally, and perhaps most critically to study subjective value and its relationship to neural activity, large numbers of trials are often required within a single experimental session (Angner and Loewenstein, 2007; Camerer et al., 2004; Glimcher and Fehr, 2013; Kable and Glimcher, 2007; Padoa-Schioppa and Assad, 2006). Many methods of aversive stimulation are inappropriate to deliver many times to animals or humans within a single behavioral session.

Table 1

Some opportunities and limitations of different methods of aversive stimulus delivery

Acute stimulus type	Millisecond-by-millisecond temporal precision	Compatible with electrophysiology	Compatible with functional magnetic imaging	Compatible with calcium imaging	No excessive or extraneous sensory component	Dynamic (trial-by-trial) adjustment of intensity	Dynamic (trial-by-trial) adjustment of precise stimulation location	Suitability for psychophysics (many trials within single session)
Pressure (von Frey)	no	yes	yes	yes	no	yes	difficult	no
Shock	yes	no	difficult	yes	yes	yes	not applicable	yes
Thermal—hot plate	difficult	yes	difficult	yes	yes	yes	not applicable	no
Thermal—probe	difficult	yes	difficult	yes	yes	yes	difficult	no
Air puff	yes	yes	difficult	yes	no	difficult	difficult	difficult
Laser	yes	yes	yes	yes	yes	yes	yes	yes

Common methods used to deliver noxious stimuli include air puffs, electric shocks, thermal stimulation (e.g., cold or hot plate), and pressure (e.g., von Frey). Each of these methods has limitations (Table 1): some are not temporally precise, some are incompatible with electrophysiology or imaging, some cause too much stress, and some cannot be used in large numbers of trials within single experimental sessions. To take one example, air puffs aimed at the eye are the most commonly used noxious stimulus in non-human primate (NHP) studies and are commonly used in rodent studies of reinforcement learning. While air puffs are subjectively aversive (Amemori and Graybiel, 2012; Jezzini et al., 2021; Monosov, 2017), their prominent auditory components (Fiorillo et al., 2013), poor trial-by-trial controllability, and high sensory salience (Barberini et al., 2012) make neural and behavioral responses difficult to interpret. There are also issues in cross-species interpretability. Unlike many other animals, primates use eye movements to gather information (Bromberg-Martin and Monosov, 2020; Gottlieb, 2012; Gottlieb et al., 2013, 2014; Monosov, 2020), attend to salient stimuli (Ghazizadeh et al., 2016; Goldberg et al., 2002), and evaluate behavioral possibilities (Ghazizadeh et al., 2016; Hikosaka et al., 2013, 2014; Hunt et al., 2018; Traner et al., 2021), and their eye movements are often used as reports of their preferences in many, if not most, decision-making experiments (Britten et al., 1996; Bromberg-Martin and Hikosaka, 2011; Jezzini et al., 2021; Louie and Glimcher, 2010; Padoa-Schioppa and Assad, 2006). Air puffs aimed at the eye can interfere or even compete with other processes that influence overt action (Barberini et al., 2012; Matsumoto and Hikosaka, 2009; Monosov, 2017).
In rodents, using air puffs can produce variable results, with some animals either never expressing conditioned responses or becoming too stressed to participate in complex tasks. Air puffs are rarely used in human studies of value and decision-making, with experimenters often choosing other stimuli (Delgado et al., 2009; Ironside et al., 2020). If our goal is to translate the neural mechanisms of behavior from animal models to humans, we must develop aversive stimuli that are translatable across species and usable in complex behavioral tasks that allow for precise psychophysical and econometric measures.

Here, we show that laser-based aversive stimulation of the skin can be readily used for studies of decision-making, cognition, choice, and acute aversion in NHPs and mice. This method is already used to produce pain responses in humans (Hu et al., 2014; Moayedi et al., 2015; Mouraux and Iannetti, 2009; Ronga et al., 2013). We show that laser stimulation of the skin can be used to measure the subjective value of aversive experiences in NHPs and can be used to train head-fixed mice to associate auditory stimuli with aversive outcomes. We then show that aversive laser stimulation can address an outstanding issue in systems neuroscience: how value-based decision-making functions at the level of individual neurons. We asked if the regions in the primate orbitofrontal cortex (OFC) known to directly contribute to value-based choices between different juice types and quantities (Ballesta et al., 2020) also contain neurons that integrate multiple dimensions of offers predicting both future rewards and punishments to reflect a valence-independent subjective value signal that reflected the monkeys’ preferences and could, in principle, guide choice. We show that some OFC neurons integrate both rewards and punishments into a combined valence-independent value signal.

Results

Inducing negative value in decision-making with laser stimulation of the skin

We sought to shed light on the neuronal representation of negative values induced by aversive stimuli. To do so, we first introduced the use of skin laser stimulation in an economic decision-making paradigm. We used a 4 ms laser pulse generated by an infrared neodymium:yttrium-aluminum perovskite (Nd:YAP) laser with a wavelength of 1.34 μm (STAR Methods). This stimulus has been used to drive acute pain responses in human subjects, but its integration into a neuroeconomic behavioral paradigm to study mechanisms of decision-making has not been reported (Hu et al., 2014; Moayedi et al., 2015; Mouraux and Iannetti, 2009; Ronga et al., 2013). We used laser stimulation of lower energy than that used in human studies. The goal was to test whether skin laser stimulation could be useful as an aversive stimulus to induce negative value in an economic choice task. We trained two monkeys to make choices between two offers indicating different levels of reward (juice quantity) and punishment (laser power) that would be delivered over the back of the neck at the end of the trial. Each offer included two distinct bars (Figures 1A–1C). The height of one bar indicated the quantity of juice (mL) while the height of the other bar indicated the power of laser stimulation (joules [J]; Figure 1C). By letting the subjects choose between these offers (Figure 1D), we were able to measure their preferences to obtain reward and avoid laser stimulation and to infer the total subjective value of each offer. Thus, we tested if the monkeys considered both the punishment value (aversiveness) of laser stimulation and the positive value of the juice reward from each offer to generate decisions.

Figure 1

Valuation and aversive decision-making in non-human primates using laser stimulation

(A) Cartoon schematic of the experiment and laser parameters (top).

(B and C) Monkeys chose between two offers, which contained information about juice reward quantity and punishment magnitude (laser power). Each offer contained two bars, the height of which conveyed the appetitive (reward quantity) and aversive (laser power) attributes of each offer.

(D) Two monkeys’ (M1 and M2) average choice behavior indicates that the negative value of the laser grows as a function of laser power. y axis: percentage of choices of offers that contained laser stimulation versus those that contained no laser stimulation. x axis: choices organized by the difference in reward quantity between offers that predicted laser stimulation versus no laser stimulation. The choices are shown for the three laser powers separately (0.5–1.5 J). Shaded areas are confidence intervals. Indifference (50% of choosing either offer) is shown by a dashed line.

(E) Weights from a logistic regression fit to trials pooled from both subjects. Monkeys weighted increasing amounts of reward more positively and increasing laser powers more negatively. Error bars are ±1 SE. Asterisks indicate significance of each weight, and additional asterisks in between the bars indicate differences between weights of adjacent reward amounts or laser powers. ∗∗∗p < 0.001.

We found that the laser indeed had a negative subjective value that scaled with laser energy (Figures 1D and 1E). To confirm the aversiveness of the laser, we first compared the subjects’ preference (percentage chosen) for offers that contained different magnitudes of laser stimulation versus offers that did not deliver laser stimulation (Figure 1D). We did this separately for trials where there was medium, low, or no difference in the value of juice reward between the laser and no-laser offers (Figure 1D, x axis).
The results show that the animals sacrificed reward in order to avoid the laser. This was particularly the case for the highest laser power (1.5 J) and medium laser power (1 J). To quantitatively measure the aversiveness of the laser, we used a generalized linear model to model the animals’ log odds of choosing each offer as a linear weighted combination of its key decision attributes: the amount of reward and laser power (Figure 1E). The results corroborate the qualitative visualization in Figure 1D. As expected, the monkeys' choices were best fit by significant positive weights for reward quantity and negative weights for laser power. Consistent with Figure 1D, the greatest laser power had the greatest negative weight. Notably, even the low- and medium-power laser stimuli were significantly aversive, despite the fact that all three laser pulses were lower than those used in humans (Hu et al., 2014; Moayedi et al., 2015; Mouraux and Iannetti, 2009; Ronga et al., 2013), which produce subjective reports of “clear pinprick” sensations in humans (Moayedi et al., 2015; Mouraux and Iannetti, 2009). In M1, we used all three laser powers, and all three (0.5–1.5 J) were significantly fit by negative weights, indicating that they were aversive (Figure S1). Because one of our goals was to have a reliably aversive but mild stimulus that could be used over many trials, in M2, we restricted the laser stimuli to low and medium power. We found that medium power was reliably aversive, while the low laser power was not (Figure S1). In sum, laser skin stimulation was aversive in each animal, namely at the medium (1 J) power, and the subjects had differences in their subjective valuation of the weakest laser (0.5 J). Overall, these results indicate that laser stimulation is reliably aversive and can be used in neuroeconomic studies of aversive decision-making. 
While it is aversive, laser skin stimulation also produces behavioral variability, which is important for relating behavior and neural activity on a trial-by-trial basis, on a session-by-session basis, and across subjects.
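The generalized linear model described above (log odds of choosing an offer as a weighted sum of its reward and laser attributes) can be illustrated with a minimal sketch. This is not the authors' analysis code. It assumes continuous regressors (the difference in reward and in laser energy between the two offers), whereas Figure 1E reports separate weights per reward amount and laser power. The reward units, the simulated "true" weights, and the function names `simulate_choices` and `fit_logistic` are all hypothetical.

```python
import math
import random


def sigmoid(x):
    """Logistic function mapping log odds to choice probability."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_choices(n_trials, w_reward, w_laser, seed=0):
    """Simulate two-offer trials (synthetic data, not from the paper).

    Returns a list of (d_reward, d_laser, chose_first) tuples, where the
    d_* terms are offer-1 minus offer-2 attribute differences.
    """
    rng = random.Random(seed)
    rewards = [1.0, 2.0, 3.0]        # hypothetical reward levels (arbitrary units)
    lasers = [0.0, 0.5, 1.0, 1.5]    # laser energy in J; 0 means no laser
    trials = []
    for _ in range(n_trials):
        d_reward = rng.choice(rewards) - rng.choice(rewards)
        d_laser = rng.choice(lasers) - rng.choice(lasers)
        p_first = sigmoid(w_reward * d_reward + w_laser * d_laser)
        trials.append((d_reward, d_laser, 1 if rng.random() < p_first else 0))
    return trials


def fit_logistic(trials, lr=0.5, n_iter=500):
    """Fit the two weights by batch gradient ascent on the mean log-likelihood."""
    w_reward, w_laser = 0.0, 0.0
    n = len(trials)
    for _ in range(n_iter):
        g_reward = g_laser = 0.0
        for d_reward, d_laser, chose_first in trials:
            err = chose_first - sigmoid(w_reward * d_reward + w_laser * d_laser)
            g_reward += err * d_reward
            g_laser += err * d_laser
        w_reward += lr * g_reward / n
        w_laser += lr * g_laser / n
    return w_reward, w_laser


if __name__ == "__main__":
    data = simulate_choices(2000, w_reward=1.5, w_laser=-2.0, seed=1)
    w_r, w_l = fit_logistic(data)
    print("reward weight:", round(w_r, 2), "laser weight:", round(w_l, 2))
```

Under this simplified model, a positive reward weight and a negative laser weight correspond to the pattern reported in Figure 1E, and the ratio `-w_laser / w_reward` gives the amount of reward (in the model's units) that would offset 1 J of laser energy at indifference.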

Negative motivational value of laser skin stimulation in mice

Next, we sought to understand whether laser stimulation of the skin could be used as an aversive stimulus for studies of motivated behavior in rodents. This is important because rodent models provide powerful opportunities to study neural circuitry with a high degree of cellular and molecular precision. Laser stimulation has been used in rodents to activate cutaneous nociceptive terminals of superficial skin layers (Sikandar et al., 2013) but has not yet been reported in studies of motivation, such as in aversive conditioning or learning. We piloted a three-tone auditory Pavlovian conditioning task in head-fixed mice (Figure 2A). The three tones served as conditioned stimuli (CSs) predicting the delivery of unconditioned stimuli (USs) 2 s after the start of the CS. Tone 1 indicated to the mouse that juice reward (∼4 μL) would be delivered in 2 s. Tone 2 indicated that no outcome would be delivered. Tone 3 indicated that laser stimulation (0.75 J, 1 ms pulse duration) would be delivered to the posterior regions of the skin (Figure 2A, top). Head-fixed mice were placed on a wheel that allowed us to measure running and licking behavior throughout the session.
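The three-tone task structure described above can be written down as a small trial schedule. The sketch below is illustrative only: the tone-to-outcome mapping, the 2 s CS-to-US delay, and the outcome magnitudes come from the text, while the inter-trial interval (`ITI_S`), the randomized interleaving of trial types, and the names `TrialType` and `make_session` are assumptions made for the example.

```python
import random
from dataclasses import dataclass

CS_TO_US_DELAY_S = 2.0  # from the text: the US is delivered 2 s after CS onset
ITI_S = 10.0            # hypothetical inter-trial interval (not stated in the text)


@dataclass(frozen=True)
class TrialType:
    tone: str
    outcome: str      # "reward", "none", or "laser"
    magnitude: float  # microliters of liquid reward, or joules of laser energy


TRIAL_TYPES = (
    TrialType("tone_1", "reward", 4.0),   # ~4 uL reward
    TrialType("tone_2", "none", 0.0),     # no outcome
    TrialType("tone_3", "laser", 0.75),   # 0.75 J, 1 ms pulse
)


def make_session(n_per_type=60, seed=0):
    """Return a shuffled list of (cs_onset_s, us_time_s, TrialType) tuples.

    Random interleaving of trial types is an assumption of this sketch.
    """
    rng = random.Random(seed)
    order = [t for t in TRIAL_TYPES for _ in range(n_per_type)]
    rng.shuffle(order)
    schedule = []
    for i, trial in enumerate(order):
        cs_onset = i * (CS_TO_US_DELAY_S + ITI_S)
        schedule.append((cs_onset, cs_onset + CS_TO_US_DELAY_S, trial))
    return schedule


if __name__ == "__main__":
    for cs, us, trial in make_session(n_per_type=2, seed=3):
        print(f"CS at {cs:5.1f} s, US at {us:5.1f} s: {trial.tone} -> {trial.outcome}")
```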

Figure 2

Measuring motivational value of water rewards and laser punishments in head-fixed mice

(A) Behavioral task diagram.

(B) Time courses of licking activity aligned to tone CS onset, averaged over trials from all animals (S2: 540 trials, S5: 600 trials, S7: 840 trials, S8: 901 trials) over sessions after the animal learned the tones (18 sessions total; S2 = 3 sessions, S5 = 5 sessions, S7 = 5 sessions, S8 = 5 sessions). Shaded error bars are ±1 SE. Mice show anticipatory licking for water shortly after the onset of the reward CS tone, no licking to the neutral CS, and a small but significant increase in licking at the beginning of the punishment CS. (B and C, bottom) Colored bars show significant differences in time among the conditioned responses to reward and neutral (blue), punishment and neutral (red), and reward and punishment (purple) CSs (rank-sum test, p < 0.001). Gray bar shows time window used for analyses in (D)–(G) (last 500 ms of CS epoch before the US). (B, right inset) Licking activity from an example session from S7. Licks on individual trials are shown in gray on each row (60 trials for each condition). Average activity is shown in the overlaid line (blue = reward, gray = neutral, red = punishment).

(C) Time courses of normalized running activity aligned to CS time; conventions are same as in (B). Running speed was normalized within each session by first Z scoring running speeds across trials, then normalizing Z scores between 0 and 1. Mice show the greatest increase in running in anticipation of laser punishments. On some trials, mice flinch, pausing and/or running backwards before laser delivery, leading to a dip in average running activity before the US. Mice, on average, increase their running near the beginning of the reward CS but slow down in anticipation of receiving the water reward. (C, right inset) Raw running speeds from the same example session as (B). In the heatmap, red indicates forward speed, and blue indicates backward speed. Average activity is shown in overlaid line.

(D) Average of all licking (left) and running (right) activity, averaged over individual learned session averages. Different colored lines indicate average activity of different mice. Error bars are ±1 SE over session averages. Average licking activity was significantly different between reward and neutral conditions (p = 1.964 × 10−4, signed rank test) and between reward and punish conditions (p = 1.96 × 10−4). Average running activity was significantly different between punish and neutral conditions (p = 1.96 × 10−4) and between punish and reward conditions (p = 3.86 × 10−4).

(E) Licking (top) and running (bottom) for all animals across sessions of training. Mice quickly acquired conditioned responses to the reward and punish CSs, with differences in conditioned behavior to the different tones emerging after 1–3 sessions. Error bars are ±1 SE across trials.

(F and G) Average of all licking (F) and running (G) activity for the 5-condition Pavlovian conditioning paradigm. Naive mice were trained to associate 5 different tones with big water reward, small water reward, neutral (no outcome), small laser punishment, and big laser punishment outcomes. Colored lines are individual mice averages across sessions; error bars are ±1 SE across sessions (total sessions = 25; S15 = 5 sessions, S16 = 6 sessions, S17 = 7 sessions, S18 = 7 sessions).

(F) Mice show graded anticipatory licking to different CSs, licking most to the big reward CS, less to the small reward CS, and least to the neutral and punish CSs. Differences were significant between large reward and neutral (p = 1.25 × 10−5), small reward and neutral (p = 1.25 × 10−5), small punish and neutral (p = 0.025), large reward and small reward (p = 0.00942), large reward and small punish (p = 1.39 × 10−5), large reward and large punish (p = 1.23 × 10−5), small reward and small punish (p = 1.57 × 10−5), and small reward and large punish (p = 2.54 × 10−5).

(G) Mice show graded anticipatory running to increasing negative value, running most to the big punish CS and least to the big reward CS. Differences were significant between large reward and neutral (p = 7.33 × 10−4), small reward and neutral (p = 0.00143), small punish and neutral (p = 0.00351), large punish and neutral (p = 3.22 × 10−5), large punish and small punish (p = 2.4 × 10−4), large reward and small punish (p = 7.22 × 10−5), large reward and large punish (p = 8.09 × 10−5), small reward and small punish (p = 1.26 × 10−4), and small reward and large punish (p = 1.01 × 10−4; Wilcoxon signed rank test).
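The running-speed normalization described in (C), Z scoring within a session and then rescaling the Z scores to the range 0 to 1, can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `normalize_running` is hypothetical, and it assumes both steps are computed over the same set of samples. Under that assumption the result is identical to min-max scaling of the raw speeds, because Z scoring is an affine transform; the authors' pipeline may pool samples differently between the two steps.

```python
from statistics import mean, pstdev


def normalize_running(speeds):
    """Z score a session's running speeds, then rescale the Z scores to [0, 1].

    `speeds` is a flat list of speed samples from one session
    (negative values can represent backward running).
    """
    mu = mean(speeds)
    sd = pstdev(speeds)
    if sd == 0:
        # All samples identical: no variation to normalize.
        return [0.0 for _ in speeds]
    z = [(s - mu) / sd for s in speeds]
    lo, hi = min(z), max(z)
    return [(v - lo) / (hi - lo) for v in z]


if __name__ == "__main__":
    print(normalize_running([-2.0, 0.0, 2.0, 6.0]))
```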

An example session’s data obtained after 8 days of training are shown in Figures 2B and 2C (right panels). First, the mouse showed significant and consistent licking in anticipation of reward (Figure 2B, right). Licking was significantly greater for the reward tone than the no outcome (neutral) tone (p = 1.28 × 10−23, rank-sum test) and the laser-associated tone (p = 5.69 × 10−24). In contrast, running was greatest in anticipation of the laser punishment (Figure 2C, bottom right), significantly more so than to the neutral (p = 7.9 × 10−13) or reward tone (p = 3.12 × 10−21). Moreover, beyond the running behavior, on some trials, the mouse displayed flinching-like behavior shortly before the laser was delivered, pausing in its running or shifting backwards (Figure 2C, bottom right). Overall, running was related to the expectation of the laser stimulation and produced behavioral variability on a trial-by-trial basis that would be necessary to relate neural activity to behavior and to expectancy of aversive stimulation. We next analyzed the behavior of 4 mice. We found that on average the animals quickly learned the three-tone task (Figures 2B–2E), and that our task tended to produce a behavioral response that closely resembled the example session (Figures 2B and 2C).
Analyzing across many behavioral sessions also revealed additional important and diverse features in the animals’ reward- and punishment-related behavioral repertoire (Barberini et al., 2012; Monosov, 2017). In early running responses (∼1 s following the tone), the running increased in response to the two tones that predicted motivationally salient outcomes (the reward tone and laser tone) but not to the neutral tone. In late running responses (∼500 ms before the outcome), on average across many sessions, the anticipatory running selectively scaled with expected negative value: most running in anticipation of the laser, no change in running for the neutral tone, and slowly reducing running in anticipation of reward delivery and consumption (Figure 2C, but note that this reward expectation-related reduction in running was not evident on every single session or for every mouse). In early licking responses, like early running responses, licking increased at first to the two outcome-predicting tones (reward and laser) and was most prominent following the reward tone (Figure 2B). This behavior was not explained by the expected value of the tones (reward > no outcome > laser). Later in the trial, this effect was quenched, and the mice licked selectively for the reward tone. This result shows that mice readily learned the reward tone and differentiated the laser and no-outcome tones. It also demonstrates the interaction of salience and value during distinct phases of the task, which will be key for investigation of neural processing, for example in the context of reinforcement learning and emotional regulation (Barberini et al., 2012; Bromberg-Martin et al., 2010; Jezzini et al., 2021; Matsumoto and Hikosaka, 2009; Monosov, 2017; Monosov and Hikosaka, 2012; Shabel and Janak, 2009; Yee et al., 2021). To study how the laser affected mouse behavior over the course of a session, we examined their behavioral responses within and across different task epochs (Figure S2). 
During each experiment, we shifted the laser location slightly every 60 trials, such that for each experimental session, there were 2 or 3 stimulated locations on the back. We found that both licking and running behavior changed subtly over the course of the entire session and across stimulation locations (which correlated with the time course of the session). The direction of these changes suggested shifts in motivation over long timescales, rather than laser-specific or general stress-induced responses or simple sensitization. Overall, anticipatory licking roughly increased to reward cues and decreased to non-reward cues, and anticipatory running decreased to reward and laser cues over the course of the entire experiment. There was no significant difference in either running or licking behavior from the first half of a particular stimulation location epoch to the second half. To test whether the running escape-like behavior scaled with the aversive motivational value of the laser, we trained 4 additional naive mice on a five-tone task where the two additional tones indicated a smaller reward (∼2 μL) and a smaller magnitude of laser power (0.5 J). Mice learned the meanings of the 5 tones. During the CS epoch, mice’s anticipatory licking scaled with the magnitude of predicted rewards (Figure 2F). Anticipatory running increased with the magnitude of the predicted laser power, suggesting that running is a useful measure of the negative motivational value of the laser stimulation for many mice in our task (Figure 2G). We note that one of the mice displayed little selective anticipatory running behavior in anticipation of the laser. This heterogeneity can be highly helpful for studying neural and behavioral differences among mice and will be the subject of our own future investigations. For example, mice may have distinct strategies for coping with aversive states or may have different sensitivities or subjective experiences related to the laser.
Understanding the cause of this heterogeneity will open opportunities to understand the mechanisms of aversion-processing. It will be particularly important to test the effects of laser stimulation on free-moving, as well as head-fixed, mice to assess their responses to laser stimulation in contexts in which their full behavioral repertoire is available. Together, the results indicate that the laser stimulation produces aversive sensations in mice and that they reliably express anticipatory running behavior that scales with laser power. Moreover, they show inter-trial behavioral variability in their anticipatory running to the laser-predicting cue, freezing or pausing in their running on some trials right before the outcome. Running was also a sensitive measure of reward anticipation. Mice slowed their running in anticipation of reward, and this effect scaled with reward magnitude. Hence, our three- or five-tone Pavlovian conditioning approaches that incorporated laser stimulation provide a surprisingly behaviorally rich and reliable approach to study reward- and punishment-related behaviors and neural activity in mice. After the conclusion of these experiments, we conducted histology on the skin. We compared hematoxylin and eosin-stained samples from unstimulated skin, skin that had been subjected to acute laser stimulation (50 pulses, 3.5 s apart), and skin that had been regularly shaved and stimulated chronically by the laser in behavioral experiments over a period of 3 weeks in the same location (Figure S3). We noted that the collagen in acutely stimulated skin appeared sparser compared with the unstimulated skin, though the epidermis or deeper layers did not seem clearly affected, and histological signs of collagen degradation did not persist in the chronically stimulated skin in our analyzed skin samples. 
This indicates that our procedure of moving the stimulation site across the skin (STAR Methods) is sufficiently safe for research of negative valence and value that requires multiple trials and experimental sessions.

Correlates of negative value during decision-making in orbitofrontal neurons

Thus far, we showed that laser skin stimulation can be used to study decision-making motivated by aversive negative value in monkeys and mice. We next focused on the neural processing of such aversive negative value. To start, we chose to evaluate an outstanding issue in the systems neuroscience of decision-making: we used our appetitive-aversive choice task (Figure 1) and tested whether single OFC neurons integrate the value of rewards and punishments (Figure 1D). The results indicated that a small but significant proportion of OFC neurons reflected the integration of value in a valence independent manner. The lateral region of primate OFC (area 13) is causally involved in value-based choices among juice types and their quantities (Ballesta et al., 2020). Single OFC neurons in that area also signal the subjective values of choice offers that predict distinct juice types (Padoa-Schioppa and Assad, 2006), incorporating juice probability and quantity (O'Neill and Schultz, 2010; Raghuraman and Padoa-Schioppa, 2014; Setogawa et al., 2019; Yun et al., 2020), and desirability of rewards (Rudebeck et al., 2017). However, it remains unclear whether other decision-related attributes, such as the magnitude of noxious stimuli, are reflected in the activity of OFC neurons signaling reward value during the decision process. Anatomical studies suggest that the OFC is positioned as a second-order gustatory cortex (Lara et al., 2009; Ogawa, 1994; Ongur and Price, 2000; Rolls and Baylis, 1994; Seabrook and Borgland, 2020; Sewards and Sewards, 2001) that could, in principle, be explicitly dedicated to processing the value of primary consumptive reward. In contrast to this idea, human imaging studies indicate that the OFC is sensitive to financial and abstract rewards (Charpentier et al., 2018; Kahnt et al., 2010; O'Doherty et al., 2001). 
The OFC is also sensitive to reward loss and action costs, as well as to air puff predictions and deliveries during Pavlovian conditioning (Cai and Padoa-Schioppa, 2019; Hosokawa et al., 2013; Kennerley et al., 2009; Morrison and Salzman, 2009). But whether some single OFC neurons integrate both reward and aversive or noxious punishment values of offers during decision-making is unclear. We sought to test if OFC neurons signal only choice-related values of reward or if some of them could integrate both reward and punishment values of offers in our task. This was not intended to discount other theories of the functional role of OFC that do not concentrate on the comparisons of values (Stalnaker et al., 2015; Wilson et al., 2014; Zhou et al., 2021) but rather to concentrate on one computation supported by OFC: processing of subjective value. We recorded neural activity in the OFC while two monkeys participated in the appetitive-aversive decision-making task with low and medium laser power (Figure 1) and found that the OFC contained neurons that indeed integrated the positive subjective value of reward and the negative subjective value of punishments. One example neuron is shown in Figures 3A and 3B. The neuron’s activity was significantly correlated with the subjective value of rewards (Figure 3A, left) and punishments (Figure 3A, center) and to their total value (combined subjective value across the two attributes; Figure 3A, right). In other words, the neuron’s activity reflected the negative weight of aversive laser stimulation and the positive weight of reward quantity on the monkey’s choice. For this neuron, the effect was most prominent during the presentation of offer 2, the time when the monkeys could compare the values of the two offers and formulate their choice (Figure 1C).

Figure 3

Neural correlates of aversive decision-making in orbitofrontal cortex

(A) Example single neuron’s activity is related to the subjective value of reward (left column), punishment (middle column), and their total integrated subjective value (right column). The effects were significant during offer 2. Error bars are ±1 SE. ∗p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001 in all panels of this figure.

(B) Summary of rank correlation coefficients (rho) from (A). Error bars are bootstrapped confidence intervals (200 bootstraps).

(C) (Left) Percentage of neurons signaling subjective value of reward, punishment, and their total subjective value (STAR Methods). (Right) Among punishment subjective value neurons, a significant percentage of neurons’ activities signaled the subjective value of reward; similarly, among reward subjective value neurons, a significant percentage of neurons’ activities signaled the subjective value of punishment. Error bars are bootstrapped confidence intervals (200 bootstraps). Asterisks indicate a significantly higher proportion of neurons than expected by chance (one-tailed binomial test). The numbers of neurons are indicated for each analysis.

(D) Reward value neurons integrated punishment value. Reward subjective value neurons were selected as those having significant reward subjective value coding at a p < 0.05 threshold during offer 1 (left) and during offer 2 (right). Next, a separate receiver operating characteristic (ROC) analysis was used to measure the discrimination of reward magnitude and punishment magnitude for each neuron during non-overlapping trials. During offer 2, the reward and punishment discrimination indices were negatively correlated, meaning that neurons tended to encode reward and punishment with opposite signs, consistent with a total subjective value representation. Each dot is a neuron. Least-squares linear fits are shown for significant correlations (red). Correlations were Spearman’s rank correlations. We verified that making the threshold for inclusion of neurons stricter, at p < 0.001 (i.e., effectively tightening the definition of reward value coding; STAR Methods), did not change the results. These data are shown as insets for each offer. In fact, the stricter inclusion strengthened, not weakened, the correlation.

(E) The proportion of the total time spent looking at the offer that monkeys spent looking at the punishment bar. Monkeys spent significantly more of this time looking at the punishment bar during offer 2 than during offer 1 (rank sum test; p < 0.001).

Among 412 OFC neurons included in our analyses (STAR Methods), we found that there were significant proportions of neurons that signaled the subjective value of rewards, punishments, and their total integrated subjective value (Figure 3C). Moreover, among neurons that were sensitive to reward, a significant proportion (13/132; p = 0.01; binomial test) were sensitive to the negative value of noxious punishment, even though the strongest laser stimulation we used during neural recording was only mildly aversive (Figures 1D and 1E; highest laser power during neural recording was 1 J). In these data, the small proportion of neurons that significantly reflected both the positive value of reward and the negative value of the punishment could be due to weakness of the laser and/or because many OFC neurons are only sensitive to reward values. Therefore, the strongest test of our hypothesis that OFC neurons tend to integrate the positive value of reward and negative value of punishment of each offer was to assess whether reward and punishment coding are negatively correlated across OFC neurons. In other words, reward and punishment values should influence neural activity in opposite directions. This is important for several reasons. During neural recording, we used only the medium and low laser power. 
The purpose was to test if we could study aversive decision-making mechanisms with the weakest possible laser parameters that produced reliable negative value effects on choice (Figure 1E). Because of this, neural representations of the laser’s negative value are expected to be relatively weak. And indeed, fewer neurons were recruited by the weak negative value of laser than by the overwhelmingly large positive value of reward (Figure 3C; n = 29 versus 132 of 412 neurons; Figure 1). Nevertheless, if some OFC neurons integrate reward and punishment value, then even relatively weak punishment signals should decrease the value signal in reward value neurons. To test this, we used an analysis of discrimination (STAR Methods) and computed the strength of reward and punishment coding for each neuron. We found that this was indeed the case in neurons sensitive to reward value (n = 132; STAR Methods; Figure 3D). Interestingly, like for the example neuron (Figure 3A), the effect only occurred when offer 2 was presented, at which time the animals could decide between the offers. We used separate trials to compute the magnitudes of reward and punishment discrimination, so the negative correlation between them cannot be attributed to statistical double dipping (STAR Methods). This difference in neural activity between offer 1 and offer 2 in incorporating reward and punishment attributes seemed to have a behavioral correlate in the monkeys’ gaze. During the presentation of the offers, the animals overtly attended to punishment bars more during offer 2, at which time they had to compare the two offers and make a choice (Figure 3E). The observation that punishment value is represented more when animals attend to the visual features that signal it is consistent with the notion that attention or gaze influences decision processes in the OFC (Ballesta and Padoa-Schioppa, 2019; McGinty, 2019; McGinty et al., 2016; Rich and Wallis, 2016; Xie et al., 2018). 
However, the mechanisms of this modulation remain unknown. In sum, our results indicate that laser skin stimulation can be used as a weak negative value stimulus in an economic decision-making task to study how single neurons integrate the positive value of rewards and negative value of punishments.
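
The proportion test reported in this section (13 of 132 reward-sensitive neurons also sensitive to punishment) can be illustrated with a one-tailed binomial tail probability. The following is a minimal sketch and not the authors' analysis code. The 5% chance level is an assumption based on the p < 0.05 significance criterion, and `binomial_tail` is a hypothetical helper name.

```python
from math import comb


def binomial_tail(k: int, n: int, p_chance: float) -> float:
    """One-tailed probability P(X >= k) for X ~ Binomial(n, p_chance)."""
    return sum(
        comb(n, i) * p_chance**i * (1.0 - p_chance) ** (n - i)
        for i in range(k, n + 1)
    )


# Counts taken from the text: 13 of 132 reward-sensitive neurons.
# ASSUMPTION: chance level of 0.05 (false-positive rate at p < 0.05).
p_value = binomial_tail(13, 132, 0.05)
print(f"P(X >= 13 | n = 132, chance = 0.05) = {p_value:.4f}")
```

Under these assumptions the tail probability falls between 0.01 and 0.02, the same order as the reported value. A different chance level would change the result.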

Discussion

Translating aversive stimulation flexibly across humans, monkeys, and mice is required to solve the many behavioral and circuit-level puzzles in the neurobiology of decision-making, pain, and mood. We show that aversive laser stimulation can be used for this purpose. In this study, we developed and validated behavioral measures of aversion using laser stimulation in awake behaving animals and used the laser to demonstrate that some single OFC neurons integrate multiple dimensions (or attributes) of offers that include both appetitive and aversive values during economic decisions. One previous study used Pavlovian conditioning in primates and showed that single OFC neurons are sensitive to the expectation and receipt of juice rewards and air-puff punishments. Their results are compatible with our findings and suggest that during appetitive-aversive decisions, OFC neurons could guide decisions by signaling their total integrated subjective value (Morrison and Salzman, 2009). Another related study using electric shocks seemed to also suggest that the OFC could process aversive events, but precisely how was unclear (Hosokawa et al., 2007). To our knowledge, only a few studies successfully used aversive stimuli in monkeys to study the neural substrates of decision-making. This is in part due to the difficulty of using an air puff—the most common aversive stimulus in studies of choice and motivation in non-human primates—as an aversive stimulus. One study used an approach-avoid paradigm to infer subjective value rather than directly measure it using a 2-alternative forced choice task (Amemori and Graybiel, 2012). Our studies using probabilistic air-puff deliveries as negative outcomes in choice tasks (Jezzini et al., 2021; Monosov, 2017) suggest that non-value processes closely related to salience and to defensive behaviors such as blinking can interact or interfere with value-driven behavior and computations (Barberini et al., 2012; Ghazizadeh et al., 2016). 
And, because air puffs evoke an aversive experience but also are loud, physically strong, salient, and hence evoke a complex range of defense- and salience-related behaviors, we believe that, in many instances, they complicate the interpretation of behavioral and neural data. Here, we show that laser stimulation can be used instead of air puffs to study how monkeys evaluate different magnitudes of juice and of aversive or noxious stimuli in a 2 alternative forced choice task and that laser stimulation provides a titratable nociceptive stimulus. Beyond choice, laser stimulation opens many important directions to explore valence and cognition in primates. For example, when used in tasks in which monkeys or humans fixate in anticipation of outcomes, the laser can be used to further explore the important link between pupillary responses and value-based and cognition-related neural computations (Rudebeck et al., 2014). We do not discount theories of the OFC that concentrate on its roles in other cognitive functions beyond value comparisons (Schuck et al., 2016; Zhou et al., 2021). In fact, many of our neurons (Figure 3) did not signal values, consistent with previous studies (Padoa-Schioppa and Assad, 2006). We speculate that the OFC has many functions, depending on the task and cell type within the OFC that is engaged. The Nd:YAP laser activates Aδ and C pain fibers in the superficial skin. It can produce a rapid initial pricking sensation (Aδ-fiber mediated) followed by a prolonged burning sensation (C-fiber mediated), depending on the intensity and duration of stimulation (Hu et al., 2014; Legrain et al., 2012; Mancini et al., 2015; Moayedi et al., 2016; Mouraux and Iannetti, 2009; Xia et al., 2016). This is particularly the case when complex dynamic laser stimulation trains are used to explicitly dissociate these two different pain sensations. 
For our purposes, we used a combination of trial randomization, long inter-trial intervals, and a short single pulse to minimize complex nociceptive dynamics. The intensity and perceived aversiveness of the laser pulses are easily adjustable on a subject-by-subject basis (Hu et al., 2014; Mancini et al., 2015; Moayedi et al., 2015, 2016; Mouraux and Iannetti, 2009; Xia et al., 2016). Future studies can now ask not only what laser intensity is aversive but also what magnitude of laser stimulation is perceivable in NHPs or mice using detection tasks, on a subject-by-subject basis (Ronga et al., 2013; Wiech et al., 2010). This will be very useful to isolate pain responses from non-aversive (perception-related) responses. The capacity for dynamic calibration of aversive stimulation will allow for a powerful study of negative affective states across distinct behavioral profiles, circuits, and species. In principle, adjusting the location of stimulation on a trial-by-trial basis could go further in this direction. Among studies of aversive states and pain that we are aware of, aversive stimuli are either delivered broadly (e.g., shock, heat pads) or to a single location of the skin. Because of this, we do not yet know how much the representation of aversive experiences generalizes across the body. This is particularly interesting because sensitivity to stimuli is not constant across different regions of the body, and subjective perception of pain differs depending on its source, intensity, and location. Thus, the encoding of subjective value of aversive stimuli may differ depending on where on the body it is delivered. The laser can be easily modified (via attachments to manual or robotically driven manipulators) to shift the location of the laser stimulation rapidly and easily during a single experiment. 
With the level of control afforded by the laser, one can carefully study how the subjective experience of pain differs across intensity, duration, and location of the body being stimulated within the same experiment and assess how neural representation is dependent or independent of stimulation location. In our experiments, mice show consistent and rich behavioral responses to reward- and punishment-predicting cues. And yet, it is always possible that the components of these behaviors are context specific, for example, related to head fixation or to the task structure. The laser can also be used to deliver stimulation to subjects that are not head fixed with the help of online position/behavioral tracking tools (Kane et al., 2020). Further advancements in integrating the laser with free-moving behavioral paradigms can therefore unlock the entire repertoire of punishment-related actions. We note that there are non-nociceptive methods for injecting negative value into animals’ states, such as predator odor, loud tones, and bright lights commonly used in rodent studies. These methods rely on having “innate” negative value to rodents and can sometimes be relatively spatially imprecise and relatively limited in dynamic range. The laser stimulation allows generalizability across model species and compatibility with neural recording (e.g., bright light stimuli may be incompatible with imaging setups). However, in some contexts, the naturalistic qualities of other non-nociceptive stimuli may be desirable or required (Adams et al., 2012; Ahmadlou et al., 2021; Anderson and Perona, 2014; Bigot et al., 2022; Chang et al., 2013; Friedman et al., 2017; McCall et al., 2017; Menegas et al., 2017; Mobbs et al., 2018; Namburi et al., 2016; Olsson and Phelps, 2007) and could be combined with skin laser stimulation. 
Therefore, the laser can be used to answer how nociception affects decision-making, learning, or emotion compared with other forms of negative experiences and how these different states are represented in neural pathways (Čeko et al., 2022; Seymour et al., 2007). In summary, we demonstrate the efficacy of skin laser stimulation as an aversive stimulus that can be used across species. We used laser punishments in auditory conditioning experiments in mice and appetitive-aversive decision-making tasks in monkeys. Our behavioral and neural results show that animals perceive the aversiveness of different laser stimulation strengths and use it to guide anticipatory behavior (through conditioned running in mice) and economic decision-making (through choice behavior in monkeys). Even relatively weak levels of laser stimulation elicited effects on behavior and did not lead to long-term skin damage. Aversive laser stimulation of the skin can be safe, reliable, precise, and compatible with neural experiments. Therefore, it opens new opportunities to better understand topics ranging from pain to motivated decision-making in many experimental models, including humans.

Limitations of the study

Research of internal states related to expectation and receipt of rewards and punishments requires a large degree of stimulus titration and adjustment on an experiment-by-experiment basis. This can also be the case for laser stimulation. We do not provide a turnkey solution for all studies of aversive decision-making. Any aversive stimulus will evoke some salience-related and defensive behaviors and internal states that need to be considered. Also, comparisons of laser stimulation with other aversive stimuli will be useful to better understand the benefits and limitations of the laser for research of decision-making and motivation. A related technical detail to note for future experimental use is that the absorption of the laser may be affected by skin pigmentation. Melanin may attenuate the transmittance of laser-delivered infrared heat to the underlying nociceptors (Lenoir et al., 2017; Milanič and Majaron, 2013). The monkeys and mice used in our study all had relatively uniformly colored skin under the area of laser stimulation. But because subjects may vary in their skin pigmentation, the laser intensity may need to be calibrated on a subject-by-subject basis through psychophysics and careful examination of the skin. Unlike many other common methods of aversive stimulation, the dermal laser does not produce extraneous sensory components at the time of delivery. But one important exception is that ultrasonic sounds may occur during laser delivery, which mice are capable of hearing. In studies that seek to assess the responses to the laser itself in rodents, using masking noise during the experiments could be important. Our work does not assess whether the OFC contains cells dedicated to encoding offers in specific decision-related reference frames (e.g., only coding the value of the chosen versus non-chosen offer, or first versus second offer). 
We conjecture that depending on the architecture or reference frame of a decision task, OFC neurons, and other brain regions, likely support choice behavior in distinct manners. Also, we cannot strongly say whether all OFC neurons or only some tend to integrate rewards and punishments. In other words, it is highly possible that the OFC may contain neurons dedicated to integrating rewards and punishments but also ones that only process rewards or only punishments (e.g., as observed in the anterior cingulate [Monosov, 2017]). This possibility does not negate any key conclusions of the current study but should be investigated in the future. For example, high channel recording methods could allow this issue to be further investigated at the population level (Rich and Wallis, 2016; Wallis, 2018), where the relationship of single-unit activity and population-level activity (McGinty and Lupkin, 2021) can also be further assessed during reward-punishment value integration.

STAR★Methods

Key resources table

Resource availability

Lead contact

Further information and requests should be directed to Dr. Ilya E. Monosov (ilya.monosov@gmail.com).

Material availability

This study did not generate unique reagents.

Experimental model and subject details

Two adult male monkeys (Macaca mulatta; Monkey M1 and Monkey M2; ages: 7-10 years old) were used for recording and behavioral experiments studying economic choices. Eight mice (C57BL/6J, 3-6 months, 3 male (S2, S7, S8) and 5 female (S5, S15, S16, S17, S18)) were used for auditory Pavlovian conditioning. All these procedures conform to the Guide for the Care and Use of Laboratory Animals and were approved by the Institutional Animal Care and Use Committee at Washington University.

Method details

Monkeys

Relevant methods were previously detailed in Monosov (2017). Monkeys were monitored daily to ensure participating in the laser experiments did not lead to changes in their typical demeanor or damage to the skin.

Rodents

Mice were implanted with custom stainless steel headplates for head-fixed restraint. Anesthesia was induced with 3% isoflurane and maintained at 1-1.5%. The skin over the skull was shaved and sanitized, an incision was made over the skull, and the skull was cleaned and lightly scraped with a scalpel in order to promote adhesion to the headplate. Headplates were attached to the skull with dental cement (C&B Metabond, Parkell) and dental acrylic (Jet, Lang Dental). Mice recovered in their home cages for 7 days. After recovery, they were singly housed and water restricted to ∼80% of their baseline weight over a period of 5-7 days. Over this time, they were also acclimated daily to being handled, being put into the experimental rig, and being head restrained. Mice were weighed and inspected daily to ensure that weight, body condition, and demeanor were stable over the course of the experiment. A day before the start of behavioral training, mice were briefly anesthetized and the fur over their necks and upper backs was removed. We used a small animal shaver and hair removal cream (Nair) to remove all fur from the region under laser stimulation.

Skin histology in mice

We stimulated the posterior regions on the skin of two mice either acutely or chronically. In the acute sample, the skin was stimulated with 0.75 J laser pulses 50 times, 3.5 s apart, 1 h prior to sacrificing the mouse. In the chronic sample, the skin was regularly shaved and stimulated with 0.75 J laser pulses 20-30 times a day during daily behavioral testing (weekends excluded) for 3 weeks; the mouse was sacrificed 1 day after the end of testing. Mice were deeply anesthetized with a ketamine/xylazine cocktail and transcardially perfused with 0.1 M phosphate-buffered saline followed by 4% paraformaldehyde. Skin samples were taken from areas under laser stimulation and from adjacent non-stimulated areas. Samples were dehydrated in increasing concentrations of ethanol and stored in 70% ethanol. For histology, samples were embedded in paraffin blocks, sectioned at 5 μm, mounted on glass slides, stained with hematoxylin and eosin, and imaged at 40x magnification.

Data acquisition

While the monkeys participated in the behavioral procedure, we recorded single neurons in the OFC. The recording sites were determined with a grid system with 1 mm spacing and with the aid of MR (3T) and CT images. This imaging-based estimation of neuron recording locations was aided by custom-built software (PyElectrode, Daye et al., 2013). Single-unit recording was performed using 32-channel linear arrays (V-probes, Plexon). Arrays were inserted into the brain through a stainless-steel guide tube and advanced by an oil-driven micromanipulator (MO-97A, Narishige). Signal acquisition (including amplification and filtering) was performed using a Plexon 40 kHz recording system. Action potential waveforms were identified online by multiple time-amplitude windows, and the isolation was refined offline. Monkeys’ eye position was obtained with an infrared video camera (Eyelink, SR Research). Behavioral events and stimuli were controlled by MATLAB (Mathworks, Natick, MA) with Psychophysics Toolbox extensions. Juice, used as reward, was delivered with a solenoid delivery reward system (CRIST Instruments). For mice, behavioral events were controlled by MATLAB with Psychophysics Toolbox extensions. Water (for the three-tone task) or 15% sucrose water (for the five-tone task) rewards were delivered with custom-built solenoid delivery systems. Sine wave tones were generated in MATLAB and delivered through speakers (Sennheiser HD600). Wheel running speed was captured with a rotary encoder (Yumo). Recording targeted regions within OFC (area 13) previously associated with economic choice valuation (Ballesta et al., 2020; Padoa-Schioppa and Assad, 2006). In Monkey 1 (Sb), the mean recording location was 11.5 mm anterior and 9.5 mm lateral to the center of the anterior commissure (AC). The recording spanned +/− 1 mm around 9.5 mm lateral to the center of the AC, and +/− 2 mm around 11.5 mm anterior to the center of the AC. 
In Monkey 2 (Sl), the mean recording location was 11 mm anterior and 10.5 mm lateral to the center of the anterior commissure (AC). The recording spanned +/− 1 mm around 10.5 mm lateral to the center of the AC, and +/− 1 mm around 11 mm anterior to the center of the AC.

Behavioral tasks

Appetitive-aversive choice task

Each trial started with the presentation of a trial start cue at the center of the screen. Then, after 0.5 s, the first offer was presented, followed by a second offer 0.5 s later. The monkeys had 5 s to make a choice. Monkeys made saccadic eye movements to their preferred offer and fixated it for a required duration (0.3 s for M1 and 0.5 s for M2) to indicate their choice. Then, the unchosen stimulus disappeared. Then, after 3.5 s, the laser stimulation was delivered and the laser bar disappeared. On trials in which the offer did not predict laser stimulation, the same sequence occurred, but with no laser. Finally, 1.5 s after the laser delivery, reward was delivered and the reward bar disappeared, indicating the start of the inter-trial interval (ITI). During neural recordings, the juice quantities used for M1 were 0.22, 0.34, and 0.45 mL; for M2, they were 0.43, 0.65, and 0.86 mL. The ITI was ∼6.4 s. All laser and juice magnitudes were equally probable, and the offers were generated independently, except that the two offers were always different from each other (that is, they never indicated exactly the same juice size and laser power). The offers appeared 90 degrees and 270 degrees relative to the center fixation spot at an eccentricity of 11 degrees of visual angle. Each bar was 2 by 8 degrees of visual angle.
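
The offer-generation rule described above (equally probable magnitudes, independent offers, and no identical pairs) can be sketched as a simple rejection step. This is an illustrative sketch only, since the task itself was controlled in MATLAB. The function names are hypothetical, and the laser levels, including a no-laser level of 0 J alongside the low and medium powers, are an assumption about how the recording sessions were parameterized.

```python
import random

# Juice quantities for M1 during neural recordings (from the text), in mL.
JUICE_ML = (0.22, 0.34, 0.45)
# ASSUMPTION: laser levels offered during recording, in J
# (0.0 = offer predicts no laser; 0.5 = low; 1.0 = medium).
LASER_J = (0.0, 0.5, 1.0)


def draw_offer(rng: random.Random) -> tuple:
    """Draw one offer; every juice and laser magnitude is equally probable."""
    return (rng.choice(JUICE_ML), rng.choice(LASER_J))


def draw_offer_pair(rng: random.Random) -> tuple:
    """Draw two independent offers, redrawing offer 2 if it equals offer 1."""
    offer1 = draw_offer(rng)
    offer2 = draw_offer(rng)
    while offer2 == offer1:
        offer2 = draw_offer(rng)
    return offer1, offer2


rng = random.Random(0)  # fixed seed so the example is reproducible
pairs = [draw_offer_pair(rng) for _ in range(1000)]
print(pairs[:3])
```

Redrawing only the second offer leaves every ordered pair of distinct offers equally likely, which matches the stated constraint without biasing the first offer.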

Auditory Pavlovian conditioning

Mice were able to freely run and lick throughout the experiment. The lick spout delivering reward was positioned in front of the mouth. Each trial started with the presentation of a sine wave tone (CS) that played for 2 s, followed by the delivery of the trial outcome (water reward, laser punishment, or nothing). Tone type was presented pseudo-randomly for the 3-tone task and randomly for the 5-tone task. Different frequencies of tones predicted different outcomes. S2, S7, S8, and S5 were trained on the 3-tone task (4 μL reward = 3.5 kHz, 0.75 J laser punishment = 6.5 kHz, neutral = 2 kHz); S15, S16, S17, and S18 were trained on the 5-tone task (4 μL big reward = 3.5 kHz, 2 μL small reward = 4.7 kHz, 0.75 J big laser punishment = 8 kHz, 0.5 J small laser punishment = 6.5 kHz, neutral = 2 kHz). Inter-trial intervals were randomly distributed from 8 to 14 s. Sessions lasted from 120 to 180 trials.
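
The tone-to-outcome mapping and trial timing of the 5-tone task can be written down as a small schedule generator. This is a sketch under stated assumptions, not the original MATLAB task code. The dictionary layout and the name `make_session` are hypothetical, tones are assumed to be drawn independently and uniformly, and the inter-trial interval is assumed to be uniform between 8 and 14 s (the text says only "randomly distributed").

```python
import random

# Tone frequency (kHz) -> (outcome type, magnitude), as listed in the text.
# Reward magnitudes are in μL, laser magnitudes in J.
FIVE_TONE_TASK = {
    3.5: ("reward", 4.0),
    4.7: ("reward", 2.0),
    8.0: ("laser", 0.75),
    6.5: ("laser", 0.5),
    2.0: ("neutral", 0.0),
}
CS_DURATION_S = 2.0  # tone duration before outcome delivery (from the text)


def make_session(n_trials: int, rng: random.Random) -> list:
    """Build a list of trial dicts for one hypothetical 5-tone session."""
    tones = list(FIVE_TONE_TASK)
    trials = []
    for _ in range(n_trials):
        tone_khz = rng.choice(tones)        # ASSUMPTION: uniform, independent
        outcome, magnitude = FIVE_TONE_TASK[tone_khz]
        trials.append({
            "tone_khz": tone_khz,
            "cs_duration_s": CS_DURATION_S,
            "outcome": outcome,
            "magnitude": magnitude,
            "iti_s": rng.uniform(8.0, 14.0),  # ASSUMPTION: uniform ITI
        })
    return trials


session = make_session(150, random.Random(1))
print(session[0])
```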

Laser delivery

The laser beam was delivered through an optic fiber using a manufactured 1.34 μm laser system (Electronic Engineering, Florence, Italy). Pulse duration was 4 ms in monkey experiments, and 1 ms in mouse experiments. We used laser stimulation strengths of 0.5, 1, and 1.5 J in monkey experiments, and 0.5 and 0.75 J in mouse experiments. The spot size (diameter of the area hit by laser) was 4 mm.
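
For readers comparing these parameters with other setups, the pulse energies can be converted into an approximate energy density and mean pulse power. The sketch below is simple arithmetic on the values stated above and is not part of the original methods. It assumes the energy is spread uniformly over the 4 mm spot (a top-hat beam profile), which the text does not state, and the function names are hypothetical.

```python
import math

SPOT_DIAMETER_MM = 4.0  # spot diameter from the text


def fluence_j_per_cm2(energy_j: float, spot_diameter_mm: float = SPOT_DIAMETER_MM) -> float:
    """Energy per unit area, ASSUMING uniform illumination of a circular spot."""
    radius_cm = spot_diameter_mm / 10.0 / 2.0
    return energy_j / (math.pi * radius_cm**2)


def mean_pulse_power_w(energy_j: float, pulse_duration_s: float) -> float:
    """Average power during a single pulse (energy divided by duration)."""
    return energy_j / pulse_duration_s


# Pulse durations and energies as stated in the text.
settings = (
    ("monkey", 0.004, (0.5, 1.0, 1.5)),
    ("mouse", 0.001, (0.5, 0.75)),
)
for species, pulse_s, energies_j in settings:
    for energy_j in energies_j:
        print(
            f"{species}: {energy_j} J -> "
            f"{fluence_j_per_cm2(energy_j):.2f} J/cm^2, "
            f"{mean_pulse_power_w(energy_j, pulse_s):.0f} W"
        )
```

These figures are rough guides only, because real beam profiles are rarely uniform and skin absorption varies across subjects.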

Quantification and statistical analysis

All statistical tests were two-tailed and non-parametric unless otherwise noted. A significance threshold of p < 0.05 was used unless otherwise noted. We included 412/483 OFC neurons in our analyses that were recorded for more than n = 180 trials. We removed 71/483 neurons because they did not modulate their bulk average activity across task epochs (ITI, trial start, offer 1, offer 2, laser delivery, reward delivery; Kruskal-Wallis test; p ≥ 0.001). This test did not select for neurons with reward or punishment selectivity (e.g., it did not test for modulation of activity across task conditions within any single epoch). Offer 1 responses were studied in the 0.25 s time window starting 0.1 s after Offer 1 onset. Offer 2 responses were also studied in a 0.25 s time window, but starting 0.15 s after Offer 2 onset to avoid residual activity from Offer 1. Activity was analyzed for all completed trials. Each neuron’s firing rates were normalized by z-scoring, i.e., by subtracting the mean and then dividing by the SD of a vector of firing rates summarizing that neuron’s responses during the major task-related epochs of all performed trials (White et al., 2019), which consisted of all of the Offer 1 and Offer 2 responses described above, as well as pre-choice responses (the 0.75 s before choice) and pre-outcome responses (the 3.4 s before the time of laser onset or omission). To assess neural discrimination of reward and punishment values, we calculated receiver operating characteristics (ROCs) that assessed neuronal discrimination of reward magnitude by comparing trials with the lowest expected reward with trials with the highest expected reward, and of punishment magnitude by comparing trials with the lowest expected punishment with trials with the highest expected punishment. Reward discrimination and punishment discrimination were computed from non-overlapping trials.
These analyses were performed during the time windows defined above for the analyses of offer-related activity. The analysis was structured so that an area under the curve (AUC) > 0.5 indicated that the neuron's activity was higher for the higher magnitude of reward/punishment, while an AUC < 0.5 indicated that the neuron's activity was higher for the lower magnitude of reward/punishment.
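The AUC convention described above can be made concrete with a short sketch. It computes the area under the ROC curve as the probability that a randomly chosen high-magnitude trial has a larger response than a randomly chosen low-magnitude trial, with ties counted as one half (the standard equivalence with the Mann-Whitney U statistic). The function name and the example firing rates are illustrative and not taken from the paper.

```python
def roc_auc(low_trials, high_trials):
    """Area under the ROC curve comparing two sets of single-trial responses.

    Returns a value > 0.5 when responses tend to be larger on
    high-magnitude trials and < 0.5 when they tend to be larger on
    low-magnitude trials. Ties contribute 0.5.
    """
    if not low_trials or not high_trials:
        raise ValueError("both trial sets must be non-empty")
    wins = 0.0
    for hi in high_trials:
        for lo in low_trials:
            if hi > lo:
                wins += 1.0
            elif hi == lo:
                wins += 0.5
    return wins / (len(high_trials) * len(low_trials))


if __name__ == "__main__":
    # Made-up z-scored responses for illustration only.
    lowest_reward = [-0.4, 0.1, -0.2, 0.3]
    highest_reward = [0.5, 0.9, 0.2, 1.1]
    print(roc_auc(lowest_reward, highest_reward))
```

The pairwise loop is quadratic in the number of trials, which is acceptable for a few hundred trials per neuron; a rank-based implementation would be preferable for larger datasets.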

Analysis of monkeys’ behavioral choice and its relationship to OFC activity

To model choice behavior, each animal’s full dataset of binary choices was fit with a simple logistic regression model. In effect, their behavior was modeled as arising from a choice procedure in which the log odds of choosing Offer 2 over Offer 1 was equal to the difference in the values of the two offers (Jezzini et al., 2021) plus a constant factor β representing the animal’s order bias in choice (i.e., any generalized tendency to choose Offer 2 over Offer 1):

$$\log\frac{P(\text{choose Offer 2})}{P(\text{choose Offer 1})} = V(\text{Offer 2}) - V(\text{Offer 1}) + \beta$$

Each offer’s value was modeled as the sum of two factors: the value of its reward, and the value of its laser punishment. Importantly, this framework made it straightforward to model the fact that the subjective value individual animals placed on rewards and punishments could be nonlinear functions of their magnitudes (Figure 1E). To do this, given that an animal was presented with n possible punishment magnitudes ($p_1, p_2, \ldots, p_n$), we included a separate weight in the model for each of them that was above zero; analogously, given there were m possible reward magnitudes ($r_1, r_2, \ldots, r_m$), we included a separate weight for each of them that was above the lowest used for that animal. Thus, the value of an offer with reward and punishment magnitudes R and P is:

$$V(R, P) = \sum_{i:\, r_i > r_{\min}} w_{r_i}\, I(R, r_i) + \sum_{j:\, p_j > 0} w_{p_j}\, I(P, p_j)$$

where I is an indicator function that is 1 when its arguments are equal and 0 otherwise, $w_{r_i}$ and $w_{p_j}$ are the weights for the corresponding reward and punishment magnitudes, and $r_{\min}$ is the lowest reward magnitude used for that animal. Thus, taken together, the model was a simple logistic regression:

$$\log\frac{P(\text{choose Offer 2})}{P(\text{choose Offer 1})} = \beta + \sum_{i:\, r_i > r_{\min}} w_{r_i}\,\big[I(R_2, r_i) - I(R_1, r_i)\big] + \sum_{j:\, p_j > 0} w_{p_j}\,\big[I(P_2, p_j) - I(P_1, p_j)\big]$$

where $R_k$ and $P_k$ are the reward and punishment magnitudes of Offer k. The fitted weights for the regressors indicated that both animals placed greater positive value on higher reward magnitudes and greater negative value on higher laser punishment magnitudes (Figure 1E), as one would expect from behavior motivated by rewards and punishments. To model the relationship between OFC activity and value, we defined each offer’s total value with the value function described above (V(offer)), using the weights that were fitted to the animal’s behavior. Thus, each offer’s value was quantified in terms of our behavioral model’s estimate of its effect on choice, in units of log odds.
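The choice model described above can be illustrated with a small sketch that evaluates an offer's value from per-magnitude weights and converts the value difference into a choice probability. All weights and the order bias below are made-up numbers in log-odds units, not fitted values from the paper, and the function names are hypothetical. The lowest reward magnitude and the absence of laser serve as the baseline and therefore carry no weight.

```python
import math

# Hypothetical weights (log odds). Keys are magnitudes that receive their own
# regressor: rewards above the lowest one (mL) and laser energies above 0 (J).
REWARD_WEIGHTS = {0.34: 1.0, 0.45: 2.0}
PUNISH_WEIGHTS = {0.5: -0.5, 1.0: -1.0, 1.5: -2.0}
ORDER_BIAS = 0.1  # hypothetical constant favoring Offer 2


def offer_value(reward_ml, laser_j):
    """Sum of the reward and punishment terms; baseline magnitudes add 0."""
    return REWARD_WEIGHTS.get(reward_ml, 0.0) + PUNISH_WEIGHTS.get(laser_j, 0.0)


def p_choose_offer2(offer1, offer2):
    """Probability of choosing Offer 2 under the logistic choice model."""
    log_odds = offer_value(*offer2) - offer_value(*offer1) + ORDER_BIAS
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    # Small reward with strong laser versus large reward with no laser.
    print(p_choose_offer2((0.22, 1.5), (0.45, 0.0)))
```

Using one weight per magnitude, rather than a single slope, is what lets the value function be a nonlinear function of reward size and laser energy.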
We also defined each offer’s reward value as the sum of the reward-related terms in the value function, and its punishment value as the sum of the punishment-related terms in the value function. To visualize the relationship between OFC activity and offer value, we correlated OFC activity with the total, reward, and punishment value of the offers (Spearman’s rho; Figures 3B and 3C). Finally, to classify neurons as significantly responsive to total, reward, and/or punishment value, we simply regressed neural responses on those values. Specifically, for total value, we fit each neuron’s activity using an ordinary linear regression model with a constant factor and a single regressor equal to total offer value, and obtained the p value for that regressor. To pool data over offer 1 and offer 2, we fit this model separately for offer 1 and offer 2 and then converted the two resulting p values into a single combined p value using Fisher’s combined probability test. We then classified each neuron as significantly responsive to total offer value if that p value was significant (p < 0.05). Finally, we also fit each neuron’s activity using a regression model with a constant factor, a regressor equal to reward value, and a regressor equal to punishment value. We used the same procedure and classified each neuron as significantly related to reward and/or punishment value if the corresponding p values were significant (p < 0.05).
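Pooling the Offer 1 and Offer 2 regressions with Fisher's combined probability test can be sketched as follows. For two independent p values, Fisher's statistic X = -2(ln p1 + ln p2) follows a chi-square distribution with 4 degrees of freedom under the null hypothesis, and that distribution's survival function has the closed form exp(-X/2)(1 + X/2), so no statistics library is needed. The function name is illustrative, and the sketch covers only the two-test case used here.

```python
import math


def fisher_combine_two(p1, p2):
    """Combine two independent p values with Fisher's method.

    The statistic X = -2 * (ln p1 + ln p2) is chi-square distributed with
    4 degrees of freedom under the null; its upper tail probability is
    exp(-X / 2) * (1 + X / 2).
    """
    if not (0.0 < p1 <= 1.0 and 0.0 < p2 <= 1.0):
        raise ValueError("p values must lie in (0, 1]")
    x = -2.0 * (math.log(p1) + math.log(p2))
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


if __name__ == "__main__":
    # Two individually marginal results combine into a smaller p value.
    print(fisher_combine_two(0.05, 0.05))
```

A neuron would then be classified as significantly related to the regressor if the combined p value falls below 0.05, as described in the text.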

Laser usage and safety

We used laser stimulation in the range of 0.5-1.5 J in NHPs and 0.5-0.75 J in mice. These are lower energies than those used in humans. Also, in human studies, longer stimulation trains are used to try to dissociate or differentially engage Aδ and C fibers in the skin. Instead, we used single short-duration pulses (4 ms in NHPs, 1 ms in mice) and long inter-trial intervals separating randomized trial types to minimize accumulation or adaptation. We limited the number of trials per experimental session to ∼400 in NHPs and ∼120 in mice. We also changed the position of the laser halfway through each experimental session. In monkeys, the laser was often moved to a position ∼15 mm away from the previous position after ∼150 trials. In mice, the laser was moved ∼10 mm after 60 trials. Following each experiment, we carefully examined the skin for signs of inflammation and damage. We found that limiting laser power to 1 J in NHPs and 0.75 J in mice did not create any change to the underlying skin. After sessions using 1.5 J stimuli, we sometimes noticed the formation of a small red dot that disappeared over the next few days. For mice, if the skin appeared irritated after an experimental session (typically due to the laser inadvertently hitting a region of fur), the mouse was temporarily taken off the study and put on ad libitum water for 3 days. Overall, laser safety depends on several experimental factors. Preparing the skin so that it is free of all hair was crucial, as excess hair introduces extraneous sensory components and irritation. We shaved NHPs daily with a small animal shaver prior to the start of the experimental session. In mice, we shaved and applied hair removal cream one day before the start of experiments, monitored daily, and performed additional hair removal as necessary.
We did notice that in mice, but not monkeys, on occasion fur would start to regrow relatively more vigorously in the region of regular shaving and subsequent laser stimulation after 2-3 weeks of daily shaving and laser stimulation. The rate of fur regrowth after this stage can make keeping up with fur removal more of a challenge or nuisance. For short-term experiments, this may not be an issue. For longer-term experiments, we suggest either using a different skin area on the mouse or waiting until the fur completely regrows (∼1 week), then re-shaving the mouse and resuming experimentation.

REAGENT or RESOURCE	SOURCE	IDENTIFIER
Chemicals, peptides, and recombinant proteins

Phosphate buffered saline	Gibco	CAT#70011069
Paraformaldehyde	Sigma-Aldrich	CAT#158127-100G
Ketamine-xylazine cocktail	Nexgen	SKU#NC-0254

Experimental models: Organisms/strains

Rhesus macaque	PrimGen; NIH Animal Center at Poolesville	Macaca mulatta
Mouse	Jackson Labs	Mus musculus, C57BL/6

Software and algorithms

MATLAB	Mathworks	https://www.mathworks.com/
pyElectrode	Daye et al., 2013	https://github.com/pierredaye/pyElectrode
MATLAB toolbox for behavioral control (PLDAPS)	Eastman and Huk, 2012	https://github.com/HukLab/PLDAPS

Other

32 channel linear array	Plexon	V-probes
Oil-driven micromanipulator	Narishige	MO-97A
40kHz neural recording data acquisition system	Plexon	Omniplex
Eye tracker	SR research	EyeLink 1000 Plus
Behavioral data acquisition system	VPixx	DataPixx
Nd:YAP laser	Electronic Engineering (Florence, Italy)	Stimul 1340

91 in total

Review 1. Social learning of fear.

Authors: Andreas Olsson; Elizabeth A Phelps
Journal: Nat Neurosci Date: 2007-09 Impact factor: 24.884

2. Novelty is not enough: laser-evoked potentials are determined by stimulus saliency, not absolute novelty.

Authors: I Ronga; E Valentini; A Mouraux; G D Iannetti
Journal: J Neurophysiol Date: 2012-11-07 Impact factor: 2.714

Review 3. What the orbitofrontal cortex does not do.

Authors: Thomas A Stalnaker; Nisha K Cooch; Geoffrey Schoenbaum
Journal: Nat Neurosci Date: 2015-05 Impact factor: 24.884

4. Orbitofrontal cortex as a cognitive map of task space.

Authors: G Schoenbaum; Yael Niv; Robert C Wilson; Yuji K Takahashi
Journal: Neuron Date: 2014-01-22 Impact factor: 17.173

Review 5. Neuroethology of decision-making.

Authors: Geoffrey K Adams; Karli K Watson; John Pearson; Michael L Platt
Journal: Curr Opin Neurobiol Date: 2012-08-16 Impact factor: 6.627