Fast and slow transitions in frontal ensemble activity during flexible sensorimotor behavior.

Michael J Siniscalchi¹, Victoria Phoumthipphavong², Farhan Ali², Marc Lozano², Alex C Kwan¹,²,³.

Abstract

The ability to shift between repetitive and goal-directed actions is a hallmark of cognitive control. Previous studies have reported that adaptive shifts in behavior are accompanied by changes of neural activity in frontal cortex. However, neural and behavioral adaptations can occur at multiple time scales, and their relationship remains poorly defined. Here we developed an adaptive sensorimotor decision-making task for head-fixed mice, requiring them to shift flexibly between multiple auditory-motor mappings. Two-photon calcium imaging of secondary motor cortex (M2) revealed different ensemble activity states for each mapping. When adapting to a conditional mapping, transitions in ensemble activity were abrupt and occurred before the recovery of behavioral performance. By contrast, gradual and delayed transitions accompanied shifts toward repetitive responding. These results demonstrate distinct ensemble signatures associated with the start versus end of sensory-guided behavior and suggest that M2 leads in engaging goal-directed response strategies that require sensorimotor associations.

Year: 2016 PMID: 27399844 PMCID: PMC5003707 DOI: 10.1038/nn.4342

Source DB: PubMed Journal: Nat Neurosci ISSN: 1097-6256 Impact factor: 24.884

Operant behaviors are structured around stimuli, actions, and outcomes. Successful execution of a task requires selecting actions that are consistent with the contingencies between these task variables. Importantly, control of action selection in the brain should be both stable and flexible. On the one hand, stability allows a subject to sustain high performance to maximize reward. On the other hand, flexibility is essential for quickly adjusting behavior when a change in contingencies occurs. Striking the delicate balance between stability and flexibility is therefore a key requirement of adaptive decision-making. Moreover, a lack of balance between these opposing aspects of cognitive control is a hallmark of psychiatric disorder[1]. How do we know when to be stable or flexible in a changing environment? In tasks without explicit contextual cues, subjects may adjust their response strategy through reward feedback. Prior studies have observed task-dependent differences in neuronal firing rates and selectivity in multiple frontal cortical regions[2-4]. Notably, during periods of behavioral adjustment, evolution of cortical activity was found to be gradual and late, occurring at time courses that generally match or lag the improvement in task performance[5-8]. However, neurons in the frontal cortex exhibit substantial cell-to-cell variability in such time courses[5]. Population activity may therefore be more useful for capturing circuit dynamics[9-11]. Using ensemble recordings, two studies examined reward-guided adaptations, and found the corresponding changes in network activity to be surprisingly abrupt[12,13]. Determining the functional significance of these findings, however, will require quantitative comparisons of ensemble activity transitions that differ in their dynamics. Transitions that are relatively gradual versus abrupt, or that differ in onset with respect to behavioral changes, could reflect distinct underlying mechanisms for cognitive control. 
To study adaptive sensorimotor decision-making in mice, we designed a novel head-fixed task that requires animals to shift many times between three sets of stimulus-response contingencies. This task is a variant of arbitrary sensorimotor mapping, a classic paradigm in which subjects are required to follow “conditional rules”[14,15], such as 'for stimulus A, perform one action; for stimulus B, perform another action.' Once learned, the stimulus-response contingencies can then be switched, requiring the learning of novel mappings or retrieval of familiar associations. Associations are made by linking non-spatial stimuli or conditions to actions, and are therefore termed arbitrary[16]. A number of brain regions are involved in arbitrary sensorimotor mapping, including the frontal lobe, striatum, hippocampus and thalamus[16]. Within the frontal lobe, the dorsal premotor cortex has been implicated in the selection of motor programs based on antecedent conditions, as evidenced by the results of lesion studies[17-19], electrophysiology[5,6], functional imaging[20,21], and transcranial stimulation[22] in humans and non-human primates. Secondary motor cortex (M2) has been described as a potential rodent homolog of primate higher-order motor areas[23,24]. Its location, adjacent to the medial prefrontal and primary motor regions, suggests that it may function as a cognitive-motor interface. A long line of research has linked the premotor cortex and neighboring regions to the generation of volitional movements[25-27]. Recent studies in rodents have also focused on the role of M2 in driving motor actions. Random-ratio lever-pressing was shown to become insensitive to reward devaluation in M2-lesioned mice, suggesting a role in goal-directed actions[28]. Neural activity is modulated prior to movement, reflecting involvement in action preparation and initiation[29-31]. 
Moreover, M2 neurons not only encode current action, but also prior choice and outcome, indicating a broader role in decision-making[29]. One early study showed that rats with lesions of medial frontal cortex, a broader region including M2, had deficits in a visual conditional motor task[32]. To elucidate the relationship between frontal ensemble activity and adaptive behavior, we used two-photon calcium imaging to record from M2 neurons in behaving mice. We found distinct population activity patterns associated with each of the three sets of stimulus-response contingencies. Moreover, following a contingency switch, transitions between ensemble patterns occurred earlier and were more abrupt when animals were required to abort repetitive actions and use a conditional rule. In fact, this change in ensemble activity state could be detected after only a few error trials, preceding the more gradual recovery of behavioral performance. Our results uncover distinct neural transitions associated with different phases of voluntary behavior, and identify a leading role for M2 in engaging actions that require the use of sensorimotor associations.

Results

An adaptive decision-making task for head-fixed mice

We trained head-fixed mice to perform a task requiring flexible sensorimotor mapping. In each trial, fluid-restricted mice were presented with one of two randomized auditory stimuli – either logarithmic frequency-modulated sweeps from 5 to 15 kHz (“upsweep”), or from 15 to 5 kHz (“downsweep”) – and had to respond with a lick to the left or right port (Fig. 1a, Supplementary Video 1). A correct response was rewarded with 2 μL of water, while an incorrect response resulted in white noise. Trials were organized into blocks (Fig. 1b), each with a distinct set of stimulus-response contingencies: “sound-guided” (upsweep-left; downsweep-right), “action-left” (upsweep-left; downsweep-left), and “action-right” (upsweep-right; downsweep-right). When performance reached a criterion of 85% correct over 20 trials, a new block began with different contingencies. Sound and action blocks alternated, and no contextual cue was given to signal the block transition. Therefore, performance beyond the first block required flexible response selection and outcome monitoring. Mice were prepared for this task by initial training to an expert level on two-choice auditory discrimination, i.e. ~30 days on a task with only sound-guided trials. Here, we present data from mice with fewer than six sessions of experience in the adaptive decision-making task.
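The block-switch rule described above can be sketched in a few lines. This is a minimal illustration rather than the authors' task-control code: the function name `block_switch_trials` is hypothetical, 85% of 20 trials is taken as 17 correct responses, and the sliding window is assumed to restart with each new block (which matches the later definition of behavioral transition as trials to criterion minus 20, but is not stated explicitly here).

```python
from collections import deque


def block_switch_trials(outcomes, window=20, min_correct=17):
    """Return 0-based indices of the trials that complete a block.

    outcomes: iterable of booleans, True for a correct response.
    A block ends once at least `min_correct` of the last `window`
    trials were correct (17 of 20 corresponds to 85%).
    """
    recent = deque(maxlen=window)  # sliding window over the current block
    switches = []
    for trial, correct in enumerate(outcomes):
        recent.append(bool(correct))
        if len(recent) == window and sum(recent) >= min_correct:
            switches.append(trial)
            recent.clear()  # assumption: the window restarts after a switch
    return switches


# Five perseverative errors followed by correct responses: the criterion is
# first met once the window holds 17 correct trials.
print(block_switch_trials([False] * 5 + [True] * 17))
```

Because the criterion is evaluated over a trailing window, a block can never be shorter than 20 trials, and early errors delay the switch until they have scrolled out of the window.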

Figure 1

Behavioral performance of head-fixed mice in an adaptive sensorimotor decision-making task

(a) Schematic of experiment. Each trial begins with an auditory cue. A response window starts 0.5 s after cue onset, during which the first lick is recorded as the response for that trial. Water reward is delivered contingent on a correct response.

(b) There are three trial types, which vary by their cue-response mappings. In sound-guided trials, the correct response is left for the upsweep sound (5-to-15 kHz frequency-modulated) and right for the downsweep sound (15-to-5 kHz). For action-guided left and right trials, the correct responses are left and right, respectively, for either sound cue. Trials of the same type are presented in blocks. Block switches, in which a new trial type is introduced, occur when the correct rate reaches 85% for the last 20 trials.

(c) Behavioral performance surrounding a block switch, either from action to sound (top) or sound to action (bottom). Filled circle, hit rate. Open circle, perseverative error rate. Dotted line, other error rate. Mean±s.e.m. of 33 action to sound switches and 38 sound to action switches.

(d) Performance from one example behavioral session. Each trial results in 1 of 4 outcomes: correct (filled circle), perseverative error (open circle), other error (open triangle), or miss (cross). Vertical line, block switch.

(e) Lick rates detected at the left and right lick ports for upsweep or downsweep sound cues during all correct sound-guided (black), action-left (red), and action-right (blue) trials. For each choice direction, lick rates in action trials were compared with those in sound trials in 0.1 s bins, and the bars atop the panels denote significant differences (p<0.01, paired t-test). Line, mean. Shading, ±s.e.m.

n = 9 sessions from 5 mice.

As expected, a switch in contingencies was associated with an immediate drop in correct response rate (Fig. 1c, d). Most incorrect responses were perseverative errors, indicating a failure to update response strategy for ~20 trials after the switch. We obtained concurrent calcium imaging and behavioral data during 9 sessions from 5 mice (Supplementary Table 1). On average, these mice performed 418±49 trials per session, including 296±38 rewarded trials and 9±1 block switches (mean±s.e.m.; range: 6–19 switches; Supplementary Fig. 1a). To quantify motor output, we calculated the mean lick rates and the time of first lick for different trial types. Overall, licks were tightly locked to the time of auditory cue during correct trials (Fig. 1e). For congruent trials (in which stimulus-response contingencies match), lick rates were indistinguishable across sound and action blocks. For incongruent trials (e.g., left action for upsweep during sound block vs. downsweep during action-left block), there was a noticeable difference in mean lick rates and an increased latency to first lick (Supplementary Fig. 2). Nevertheless, the major determinant for the shape of the lick distribution was response direction, i.e. whether the animal chose left or right (Fig. 1e). Additionally, we used video tracking to monitor whisker and hindpaw positions, and found that their movements also depended mostly on response direction (Supplementary Fig. 3). Therefore, although tongue licks were the means for making operant responses in this head-fixed setup, mice performed more complex motor programs to indicate their choices.

Silencing M2 selectively impairs shift to sound-guided actions

To determine whether frontal cortical activity is necessary for adaptive decision-making in our task, we used the GABA-A receptor agonist muscimol to inactivate M2 bilaterally. Muscimol (5 mM, 46 nL per hemisphere) or saline vehicle was injected ~1 hr before behavioral testing (n=11 mice; Fig. 2a, Supplementary Table 1). We injected low-molecular-weight fluorescein to estimate the extent of the affected region, which included M2 and Cg1, but not other neighboring regions (Supplementary Fig. 4). Compared to controls, muscimol-injected mice performed fewer trials (Fig. 2b; saline: 608±42, muscimol: 476±31, mean±s.e.m.; p=0.007, W=62, Wilcoxon signed-rank test), although there was no difference in the number of switches per 100 trials (saline: 2.7±0.2, muscimol: 2.7±0.1, mean±s.e.m.; p=0.96, W=34). Notably, separate analyses of sound and action blocks revealed selective impairments in the animals’ ability to engage sound-guided actions, evidenced by a marked (55%) increase in the number of perseverative errors per block (Fig. 2c; saline: 5.7±1.1, muscimol: 8.9±1.9, mean±s.e.m.; p=0.042, W=10, Wilcoxon signed-rank test), and a greater number of trials to reach criterion (saline: 38±4, muscimol: 48±6, mean±s.e.m.; p=0.042, W=10). Nevertheless, muscimol-injected mice eventually reached the criterion of >85% correct, indicating that the transition to a high level of performance was slowed, but not blocked, by M2 inactivation. Intriguingly, silencing had the opposite effect on shifts into action blocks, during which the mice required fewer trials to reach criterion, although this effect fell short of statistical significance (saline: 43±4, muscimol: 32±2, mean±s.e.m.; p=0.054, W=55). Inactivation had no effect on the timing or rates of lick motor output (Supplementary Fig. 5). These results indicate a causal role for M2 in the flexible control of action selection. 
Additionally, the opposing effects of silencing are useful for understanding how the mouse performs the outlined task. One solution to the task would be to forget and re-learn the relevant stimulus-response associations after each contingency change, similar to a reversal task[33]. This approach predicts symmetric changes in behavior following perturbations. An alternative approach would be to rely on these associations for sound-guided trials, and then ignore them during action blocks to favor repeated selection of the same response. In this case, the mouse would perform the task by shifting the balance between conditional and non-conditional means of responding. The asymmetric deficits observed in our experiments are consistent with the second approach, and implicate M2 in the breaking of repetitive actions and biasing choices based on learned associations.
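For readers who want to relate the reported W statistics to their p-values, the sketch below computes an exact two-sided Wilcoxon signed-rank p-value by counting rank subsets. It assumes n=11 non-zero paired differences with no ties, and that W is one of the two signed rank sums; whether the authors used an exact or an approximate test is not stated in this section, and the function name `signed_rank_exact_p` is illustrative.

```python
def signed_rank_exact_p(w, n):
    """Exact two-sided p-value for a Wilcoxon signed-rank statistic.

    w: reported rank sum (either the positive or the negative sum).
    n: number of non-zero paired differences (no ties assumed).
    """
    total = n * (n + 1) // 2
    w_min = min(w, total - w)  # the null distribution is symmetric
    # counts[s] = number of subsets of the ranks {1..n} that sum to s
    counts = [0] * (total + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        for s in range(total, rank - 1, -1):
            counts[s] += counts[s - rank]
    tail = sum(counts[: w_min + 1])
    return min(1.0, 2 * tail / 2 ** n)


# W values reported for the muscimol experiments (n = 11 mice)
for w in (62, 10, 55):
    print(w, round(signed_rank_exact_p(w, 11), 3))
```

Under these assumptions the three values round to 0.007, 0.042, and 0.054, in line with the p-values quoted for trials performed, sound-block measures, and trials to criterion in action blocks.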

Figure 2

Bilateral inactivation of secondary motor cortex impairs adjustment to sound-guided trials

(a) Schematic of experiment.

(b) Task performance after bilateral infusion of saline vehicle (Veh) or muscimol (Mus) into M2.

Gray lines, individual paired experiments. Bar, mean±s.e.m. Wilcoxon signed-rank test. n = 11 mice.

Imaging task-related activity at cellular resolution in M2

To characterize neural activity, we injected adeno-associated viruses encoding GCaMP6s into layer 2/3 of M2 (AAV1-Syn-GCaMP6s-WPRE-SV40; Fig. 3a). GCaMP6s is a genetically encoded calcium indicator that exhibits a ~25% rise in fluorescence intensity per action potential in cortical pyramidal neurons[34]. While mice performed the adaptive decision-making task, we used two-photon microscopy to record from 62±6 cells per field of view (mean±s.e.m.; range, 26–83 cells; n=9 sessions from 5 mice; Fig. 3b). Figure 3c shows four example M2 neurons with fluorescence transients (ΔF/F) concurrent with responses during sound-guided trials. To examine how the use of conditional rules affects the activity of individual neurons, we averaged ΔF/F across correct trials for the congruent upsweep-left and downsweep-right conditions, separately for sound and action blocks. Neural responses were diverse, even for neurons within the same field of view (Fig. 3d). During sound-guided trials, neurons could exhibit higher ΔF/F for specific associations, i.e. upsweep-left (cell 2) or downsweep-right (cells 1 and 3), or have no preference (cell 4). The use of conditional rules clearly modulated ΔF/F in some neurons (cells 1, 2, and 3), and in other cases had no effect (cell 4).

Figure 3

Two-photon calcium imaging of task-related activity in secondary motor cortex

(a) Example post hoc and (b) in vivo two-photon images of GCaMP6s-expressing neurons in layer 2/3 of M2.

(c) Fractional fluorescence changes (ΔF/F) in example M2 neurons during performance of sound-guided trials. Vertical line indicates the time of response associated with correct left (solid black), correct right (dotted black), or incorrect (magenta) trials.

(d) Trial-averaged ΔF/F of four M2 neurons for correct left (solid line) and correct right (dotted line) responses in sound-guided (black) and action-guided (red) trials. Line, mean. Gray shading, 95% confidence intervals.

Neural transition is more rapid during shift to sound rule

The observed heterogeneity of neural responses opened the question of whether single-neuron activity in M2 reflects the components of an ensemble representation for specific task variables. If so, then population-level analyses might more effectively capture the content of such representations. Toward this end, we calculated population activity vectors from ΔF/F and used demixed principal component analysis[35,36] to project the vectors in a reduced representational space (see Methods). Plotting these vectors over time generates trajectories describing the time-dependent evolution of ensemble activity during behavior. To determine how the ensemble activity evolved around block switches on a trial-by-trial basis, we calculated the Mahalanobis distances between population activity vectors of each trial and those of the 20 trials prior to the last or next block switch. Following a contingency switch, we found that the ensemble activity migrated away from the previous representational subspace, toward a new subspace associated with the new rule (Fig. 4a). Notably, comparisons of the transition dynamics following a switch into conditional versus non-conditional rules uncovered marked differences. Out of 33 action-to-sound and 38 sound-to-action transitions, 33 and 35 respective switches could be fitted to a logistic function to compare the onset and rate of shifts in population activity patterns (Fig. 4b, c). State transitions associated with the shift to sound-guided responses occurred after only several trials, much earlier than with shifts into repeated actions (Fig. 4d, g; sound: x=4.0, action: x=10.4, median; p=0.007, z=2.70, Wilcoxon rank-sum test). Furthermore, breaking from repetitive to sound-guided responding involved transitions that were more abrupt (sound: k=1.02, action: k=0.35, median; p=0.03, z= −2.17, Wilcoxon rank-sum test). 
These differences in neural dynamics were not due to behavioral differences, because in this set of experiments, trials to criterion were similar for the two rule types (sound: 39, action: 38, median; p=0.9, z=0.09, Wilcoxon rank-sum test; Supplementary Fig. 1a). Overall, these results suggest that ensemble activity patterns in M2 shift earlier and more steeply when animals are required to abort repetitive actions and engage conditional associations to perform sound-guided behavior.
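The distance measure underlying this analysis can be illustrated with a toy example. The sketch below computes the Mahalanobis distance from one activity vector to a reference cloud of vectors, restricted to two dimensions so that the covariance matrix can be inverted by hand. The dimensionality of the reduced space and the exact way the two distances are combined into a ratio are given in the Methods, which are not reproduced here, so both are left open; `mahalanobis_2d` is a hypothetical helper.

```python
import math


def mahalanobis_2d(point, reference):
    """Mahalanobis distance from a 2-D point to a cloud of 2-D reference points."""
    n = len(reference)
    mx = sum(p[0] for p in reference) / n
    my = sum(p[1] for p in reference) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]] of the reference cloud
    sxx = sum((p[0] - mx) ** 2 for p in reference) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in reference) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in reference) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("reference covariance is singular")
    dx, dy = point[0] - mx, point[1] - my
    # Quadratic form with the inverse of the 2x2 covariance matrix
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)


# Toy data: vectors from the last trials of the previous and next blocks
previous_block = [(0, 0), (2, 0), (0, 2), (2, 2)]
next_block = [(8, 8), (10, 8), (8, 10), (10, 10)]
trial = (3, 1)
print(mahalanobis_2d(trial, previous_block), mahalanobis_2d(trial, next_block))
```

A trial early after the switch sits close to the previous block's cloud and far from the next block's cloud; tracking both distances trial by trial is what yields the transition curves that are then fitted with the logistic function.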

Figure 4

Transitions in ensemble activity occur earlier and are more abrupt following switch to sound-guided trials

(a) A schematic illustrating ensemble activity dynamics around a block switch. Each curved line represents a single-trial neural trajectory deduced from calcium imaging data. When the contingencies switch, neural trajectories move within the representational space. Trial-by-trial location of ensemble activity patterns was determined by calculating a ratio of two Mahalanobis distances, d_last and d_current, where d_last and d_current are the Mahalanobis distances from the neural trajectory in the current trial to those of the 20 trials pre-switch for the last and current blocks, respectively.

(b) Trial-by-trial location of ensemble activity patterns surrounding two switches from action to sound block. Trial outcomes are plotted on the top row: correct (filled circle), perseverative error (open circle), and other error (open triangle). Filled circles, Mahalanobis distance ratios for individual trials. Line, fit to the logistic function. Upward arrow, behavioral transition trial. Downward arrow, neural transition trial.

(c) Same as (b) for two switches from sound to action block. Note the vertical axis is inverted for presentation purposes.

(d) Summary of parameters extracted by fitting action-to-sound neural transitions with the logistic function. Arrow, median value.

(e) Neural transition trials plotted against behavioral transition trials for action-to-sound switches (see Methods for definition of transition trials). Each symbol represents one block switch. Symbol shapes denote the different sessions. Large circle, median value.

(f) Mean hit and error rates at the behavioral trial corresponding to specific neural transition locations as estimated by the logistic fit for each action-to-sound transition. Circles, mean±s.e.m.

(g–i) Same as (d–f) for switches from sound to action block; black arrows in (f) shown for comparison with (c). *, p<0.05; **, p<0.01, Wilcoxon rank-sum test. Difference in range, L, was not significant (sound: 0.36, action: 0.38, median; p = 0.8, z = 0.28, Wilcoxon rank-sum test). Rightmost bar of the histogram includes all instances above the range.

n = 33 action-to-sound and 35 sound-to-action switches from 9 sessions from 5 mice.

To what extent must population activity resemble the final ensemble state in order to improve behavior? To address this question, we performed two analyses to compare the timing of neural and behavioral transitions. In the first analysis, we defined “transition trials” for behavior (trials to criterion minus 20, the sliding window for assessing criterion) and neural ensemble activity (Mahalanobis distance ratio equaling 75% L based on logistic fit, see Methods). Block-by-block paired comparisons of neural and behavioral transition trials showed that ensemble activity in M2 shifted prior to the recovery of behavioral performance when adapting to conditional rules (Fig. 4e and Supplementary Fig. 6; p=0.003, z= −2.96; Wilcoxon signed-rank test). By contrast, neural and behavioral changes occurred at around the same time for shifts to non-conditional responding (Fig. 4h; p=0.19, z=1.32; Wilcoxon signed-rank test). We should note, however, that the definitions used for transition trials were arbitrary. Therefore, we performed a second, more unbiased analysis in which we determined the mean performance at the behavioral trial corresponding to a series of different neural transition locations. Compared with shifts to action trials (Fig. 4i), transitions to sound-guided trials were associated with hit and error rates that diverged later (Fig. 4f), indicating that behavioral improvement occurred later along the time course of neural transitions. Taken together, these two analyses suggest that when shifting to sound-guided actions, neural ensemble transitions in M2 are nearly complete before behavioral performance improvement can be detected.
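The two transition-trial definitions can be made concrete as follows. This sketch assumes the standard three-parameter logistic form L / (1 + exp(−k(x − x0))), with x0 the midpoint (reported as x in the text), k the steepness, and L the range; the exact parameterization is specified in the Methods and is an assumption here, as are the function names.

```python
import math


def logistic(x, L, k, x0):
    """Three-parameter logistic curve (assumed form of the fitted function)."""
    return L / (1.0 + math.exp(-k * (x - x0)))


def neural_transition_trial(k, x0, fraction=0.75):
    """Trial at which the fitted curve reaches `fraction` of its range L.

    Solving L / (1 + exp(-k (x - x0))) = fraction * L for x gives
    x = x0 + ln(fraction / (1 - fraction)) / k, independent of L.
    """
    return x0 + math.log(fraction / (1.0 - fraction)) / k


def behavioral_transition_trial(trials_to_criterion, window=20):
    """Trials to criterion minus the 20-trial sliding window."""
    return trials_to_criterion - window


# Illustration with the median parameters reported for M2. Medians of fitted
# parameters need not reproduce the median transition trial across blocks.
print(neural_transition_trial(k=1.02, x0=4.0))   # action-to-sound
print(neural_transition_trial(k=0.35, x0=10.4))  # sound-to-action
print(behavioral_transition_trial(39), behavioral_transition_trial(38))
```

For a 75% threshold the offset from the midpoint is ln(3)/k, so a steeper transition (larger k) reaches the threshold within about one trial of its midpoint, whereas a shallow one lags by several trials.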

Distinct activity patterns accompany rule implementations

Our results indicate that rule shifts are associated with distinct transitions in network activity. This leads naturally to the question of what ensemble dynamics accompany successful rule implementation. We examined trajectories associated with correct responses in the 20 trials pre-switch, when response strategies have stabilized (>85% correct by task design). Figure 5a shows the trajectories of a 56-cell ensemble for left and right responses during sound-guided trials. The trajectories are initially indistinguishable, and then diverge sharply after the animal has made a response. Expanding this analysis to include action blocks reveals population activity patterns that occupy additional, distinct subspaces within the same representational space (Fig. 5b). To quantify rule representations present in the population code, we asked how accurately block type could be predicted from individual population activity vectors. For each session, we constructed a classifier based on linear discriminant analysis (see Methods). Testing each classifier with five-fold cross-validation revealed that in all cases, trial type could be decoded well above chance (Fig. 5c; sound: 78±3%, action-left: 86±4%, action-right: 82±3%; versus chance level of 33%, p=1 × 10⁻⁶, 1 × 10⁻⁶, 5 × 10⁻⁷, t(8)=13.2, 12.8, 14.3, one-sample t-test, n=9 sessions). Repetition of this analysis using a moving window yielded high decoding accuracy at all times during a trial (Fig. 5d), consistent with a global shift in engagement of the network, rather than a simple change in the processing of cue, action, or outcome related signals. Next, we asked whether accuracy of the ensemble classifier could have been driven by a few cells that were highly selective for rule. When classifiers were trained on ΔF/F of individual cells, we found that 27% of the cells could be used to decode block-type above chance; however, accuracies fell along a continuum and at levels below the accuracy of the ensemble (Fig. 5e). 
To ensure that the differences in trajectories and decoding accuracies were not due to simple sensory or motor parameters, we computed trajectories with matched stimulus, prior choice, current choice, and outcome conditions. Analyses of these congruent trials that differed only by rule yielded similar results (Fig. 5f–i and Supplementary Fig. 7a–d). Taken together, these results indicate that the behavioral implementation of specific conditional and non-conditional rules is associated with distinct network activity patterns in M2, such that population activity from any time during behavior can be used to decode task contingencies with high accuracy.
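The cross-validated decoding procedure can be sketched as below. To keep the example dependency-free, a nearest-centroid classifier stands in for linear discriminant analysis; it is a simpler technique and not the authors' classifier. The toy data, labels, and function names are all hypothetical.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle trial indices and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def predict(centroids, x):
    """Label of the class centroid closest to x (squared Euclidean distance)."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(centroids[lab], x)))


def cross_validated_accuracy(X, y, k=5, seed=0):
    """Fraction of held-out trials whose block type is predicted correctly."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        # Fit on the training folds only, then score the held-out fold
        cents = {lab: centroid([X[i] for i in train if y[i] == lab])
                 for lab in {y[i] for i in train}}
        correct += sum(predict(cents, X[i]) == y[i] for i in test_idx)
    return correct / len(X)


# Toy population vectors: three well-separated block types, 10 trials each
labels = ("sound", "action-left", "action-right")
X = [(10.0 * c + 0.1 * j, 0.1 * j) for c in range(3) for j in range(10)]
y = [labels[c] for c in range(3) for j in range(10)]
print(cross_validated_accuracy(X, y, k=5))
```

With three block types, a classifier that guesses at random is correct on one third of trials, which is the 33% chance level against which the reported accuracies are tested.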

Figure 5

Multiple strategies are associated with distinct population activity patterns

(a) Neuronal circuit trajectories for an ensemble of 56 simultaneously imaged cells in one experiment. Trajectories were determined from trial-averaged ΔF/F for 44 correct left (dotted line) and 59 correct right (solid line) responses in sound-guided trials. Open circle, time of response. Filled circles, 3 and 6 s after response. PC, principal component.

(b) Same axes as (a), with additional trajectories from 51 correct action-left (red) and 51 correct action-right (blue) trials. Left, trajectories calculated using trial-averaged ΔF/F. Right, three representative single-trial trajectories from each trial type.

(c) Median accuracy of decoding trial type from individual population activity vectors. S, sound; AL, action-left; AR, action-right. Open triangles, individual experiments. Filled triangles, mean±s.e.m. Dotted line, chance-level accuracy.

(d) Temporal dependence of ensemble decoding accuracy, calculated by repeating the decoding analysis separately for each 0.28-s-long window with step size of 0.28 s. The window duration is the inverse of imaging frame rate, which is 3.6 Hz. Gray, individual experiments. Black, mean. Dotted line, chance-level accuracy.

(e) Median accuracy of decoding trial type from fluorescence transients of single neurons. Green circles, trial type-selective cells, i.e. cells whose 95% percentile confidence intervals lie above the chance level of 33.3%. Black circles, other cells. Black line, mean decoding accuracy using ensemble activity. Dotted line, chance-level accuracy.

(f–g) Same as (b–c) except restricting to trials matching these conditions: stimulus was upsweep, choice was left, and outcome was reward for the current trial, and choice was left for the prior trial. Trial type could be decoded well above chance (sound: 87±5%, action-left: 88±5%; versus chance level of 50%, p = 7 × 10⁻⁵, 7 × 10⁻⁵, t(8) = 7.50, 7.52, one-sample t-test).

(h–i) Same as (b–c) except restricting to trials matching these conditions: stimulus was downsweep, choice was right, and outcome was reward for the current trial, and choice was right for the prior trial. Trial type could be decoded well above chance (sound: 85±5%, action-right: 86±4%; versus chance level of 50%, p = 7 × 10⁻⁵, 8 × 10⁻⁶, t(8) = 7.54, 10.01, one-sample t-test).

n = 9 sessions from 5 mice.

Activity patterns toggle between rule-related configurations

When animals solve trials with the same contingencies a second time, do M2 ensembles revisit similar activity patterns, or alternatively, does population activity migrate to a previously uncharted region of state space? Our task was well suited to address this question because blocks of the same trial type were presented multiple times within the same behavioral session. Figure 6a shows an example set of neural circuit trajectories for the first 12 trial blocks within one behavioral session, in which trajectories could be clearly grouped by block-type, and not by their temporal order. To quantify the representational similarity of ensemble dynamics on a block-by-block basis, we calculated the mean Euclidean distances between all possible pairwise comparisons of trajectories within an experiment (see Methods). We found that neural circuit trajectories from blocks of the same type have a relatively small distance of separation, and are similarly compact (Fig. 6b and Supplementary Fig. 7e,f; for Sound (S), Action-left (AL), and Action-right (AR); S-S vs. AL-AL: p=0.6, W=3; S-S vs. AR-AR, p=0.5, W=13; Wilcoxon signed-rank test). By contrast, trajectories from different block types were represented by markedly different ensemble activity (S-S vs. S-AL, p=0.004, W=0; S-S vs. S-AR, p=0.004, W=0; S-S vs. AL-AR, p=0.004, W=0; corrected α =0.01, Wilcoxon signed-rank test with Bonferroni correction). These results indicate that during adaptive decision-making, M2 toggles between distinct functional configurations as the animal repeatedly engages corresponding changes in task demands.
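The block-by-block comparison can be sketched as a mean over pairwise trajectory distances. This is an illustrative simplification: the normalization applied in the paper is described in the Methods and is omitted here, trajectories are assumed to be sampled at the same time points, and the function names are hypothetical.

```python
import math
from itertools import combinations, product


def trajectory_distance(a, b):
    """Mean point-by-point Euclidean distance between two trajectories."""
    if len(a) != len(b):
        raise ValueError("trajectories must have the same number of time points")
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def mean_pairwise_distance(blocks_a, blocks_b=None):
    """Average distance over all pairs of block trajectories.

    With one argument, compares blocks of the same type with each other
    (e.g. S-S); with two, compares blocks across types (e.g. S-AL).
    """
    if blocks_b is None:
        pairs = list(combinations(blocks_a, 2))
    else:
        pairs = list(product(blocks_a, blocks_b))
    if not pairs:
        raise ValueError("need at least one pair of blocks")
    return sum(trajectory_distance(a, b) for a, b in pairs) / len(pairs)


# Toy example: two sound blocks that overlap and one distant action-left block
sound_blocks = [[(0, 0), (1, 0)], [(0, 0), (1, 0)]]
action_left_blocks = [[(3, 4), (4, 4)]]
print(mean_pairwise_distance(sound_blocks))                      # S-S
print(mean_pairwise_distance(sound_blocks, action_left_blocks))  # S-AL
```

A same-type comparison needs at least two blocks of that type within a session, which is why some sessions cannot contribute an AL-AL or AR-AR value.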

Figure 6

M2 ensembles revisit previous activity patterns upon re-exposure to corresponding trial-type

(a) Neural circuit trajectories, calculated from trial-averaged ΔF/F for each trial block during one behavioral session. Circled numbers denote temporal order in which trial blocks were presented. Open circles, time of response. Black, sound-guided. Blue, action-right. Red, action-left.

(b) Normalized distance between neural circuit trajectories from different trial types across all experiments (see Methods) for sound (S), action-left (AL), and action-right (AR). Open triangles, median distances from individual experiments. Solid triangles, mean±s.e.m. **, p<0.01, Wilcoxon signed-rank test, corrected α = 0.01. n = 9 sessions except for AL-AL (n = 4) and AR-AR (n = 8) because mice did not perform enough switches to experience the same block type again in some sessions.

n = 9 sessions from 5 mice.

Comparison of task-related neural dynamics in M2, ALM, and V1

Next, we sought to determine whether the observed neural dynamics are specific to M2, or may be found also in other brain regions. For this purpose, we imaged neural ensembles in layer 2/3 of anterior lateral motor cortex (ALM; 65±6 cells per field of view, mean±s.e.m.; 8 sessions from 4 mice; Supplementary Fig. 1b) and primary visual cortex (V1; 57±7 cells per field of view; 4 sessions from 2 mice; Supplementary Fig. 1c) to compare with the data from M2 (62±6 cells per field of view; 9 sessions from 5 mice). ALM has been implicated in motor planning and execution[37,38]; however, it is ~1.5 mm distant from M2, and the relationship between the two frontal cortical regions is not understood. V1 was chosen as a control region because the task is performed in the dark and involves no visual stimulus. Multiple linear regression analysis showed that M2 neurons robustly encode not only choice of the current trial, but also choices of the two prior trials (Fig. 7a). By contrast, a higher proportion of cells in ALM encode current choice; however, the signals decay faster, resulting in weaker encoding of prior choices (Fig. 7b). Activity of M2 and ALM neurons can prefer either the ipsilateral or contralateral direction (Supplementary Fig. 8a,b), consistent with prior studies[30,38]. Unexpectedly, choice signals were also observed in V1 (Fig. 7c). Choice selectivity in V1 was relatively weak, and ΔF/F was almost always higher when animals made an ipsilateral choice (Supplementary Fig. 8c). Because choice signals in V1 were transient and animals performed the task in the dark, we conjecture that the selectivity might relate to corollary discharge. To investigate ensemble activity, we employed the same dPCA and linear classifier analyses used for M2. 
We found that rule type could be decoded with high accuracy using ensemble activity from ALM (matched sound: action-left trials: 78±4%, t(7)=6.94; p=2 × 10⁻⁴; matched sound: action-right trials: 78±3%, t(7)=8.68; p=5 × 10⁻⁵; versus chance level of 50%, one-sample t-test; Fig. 7d), but with much lower accuracy for V1 (matched sound: action-left trials: 58±5%, t(3)=1.88; p=0.2; matched sound: action-right trials: 67±4%, t(3)=3.76; p=0.03). Therefore, both ALM and M2 exhibit task-specific ensemble activity patterns. However, unlike in M2, characterization of ensemble transitions in ALM did not reveal significant differences between switches to sound versus action blocks (sound: x=8.2, action: x=10.2, median, z=1.62, p=0.11; sound: k=0.37, action: k=0.56, median, z=0.89, p=0.4; Wilcoxon rank-sum test; Fig. 7e). There were also no detectable timing differences between neural and behavioral transitions in ALM (sound: p=0.5, z=0.61; action: p=0.13, z=−1.50; Wilcoxon signed-rank test; neural transition defined as 75% L). Taken together, these data indicate regionally specific ensemble dynamics associated with adaptive behavior.

Figure 7

Comparison between neural activity patterns in M2, ALM, and V1 during flexible sensorimotor behavior

(a) Multiple linear regression analysis was used to evaluate the fraction of 562 M2 neurons encoding choice signals as a function of time. Regression was performed with a moving window (duration = 0.5 s, step = 0.5 s) to test for significance with α = 0.01 on all sound-guided trials where the current and prior outcomes are hits (i.e. R(n) = 1 and R(n-1) = 1). The bars atop the panels denote significant fractions (p<0.01, binomial test). Gray shading, significance level of 0.01. Dotted line in the middle panel, fraction of cells significant for the interaction term C(n)*C(n-1). n = 9 sessions from 5 mice.

(b) Same as (a) for 518 ALM neurons. n = 8 sessions from 4 mice.

(d) Median accuracy of decoding trial type from individual population activity vectors restricted to matched trials, comparing M2, ALM, and V1 ensembles. For the S:AL subset, sound and action-left trial types were decoded from trials where stimulus was upsweep, choice was left, and outcome was reward for the current trial, and choice was left for the prior trial. For the S:AR subset, sound and action-right trial types were decoded from trials where stimulus was downsweep, choice was right, and outcome was reward for the current trial, and choice was right for the prior trial. Trial type was decoded with high accuracy using ensemble activity from M2 (matched sound: action-left trials: 84±4%, t(8)=8.37; p=3 × 10⁻⁵; matched sound: action-right trials: 81±2%, t(8)=12.73; p=1 × 10⁻⁶; versus chance level of 50%, one-sample t-test). Open triangles, individual experiments. Filled triangles, mean±s.e.m. Dotted line, chance-level accuracy.

(e) Neural transition parameters obtained by fitting action-to-sound (black) and sound-to-action (red) transitions with the logistic function, comparing M2 and ALM ensembles. Difference in L for ALM was not significant (sound: 0.35, action: 0.31, median; p = 0.51, z = −0.65, Wilcoxon rank-sum test). Filled circle, median. Line, 25th and 75th percentiles. *, p<0.05; **, p<0.01, Wilcoxon rank-sum test.

Discussion

The results support two novel insights regarding the function of higher-order motor cortex in adaptive choice behavior. First, fast and slow ensemble transitions are neural signatures for distinct phases of voluntary behavior. Comparison between transitions was possible because our task design allows for multiple shifts between multiple contingencies within a single behavioral session. Second, the relative timing of neural and behavioral shifts, as well as the specific deficits following inactivation, highlight a leading role for this region in the engagement of sensory cue-guided actions (Fig. 8). This conclusion contrasts with previous studies of homologous or nearby prefrontal cortical regions, in which neural changes closely match or lag the time course of behavioral adaptation[5,7,12]. A key difference is that prior studies have focused on the learning of novel sensorimotor mappings or new rules, whereas our task requires animals to repeatedly disengage and re-engage the learned associations required for sound-guided trials. Likewise, although our paradigm shares important features with other assays for flexibility[7,9,10,12,39-41], there are also crucial differences (see Methods). We found that bilateral inactivation of M2 selectively impairs the shift into sound-guided actions. This observation is highly consistent with results of dorsal premotor lesions in primates, which disrupt both the learning of novel visuomotor associations and the engagement of previously learned mappings[17-19]. Interestingly, adaptation to action blocks was facilitated by M2 inactivation. This effect could result from a tendency to repeat the prior choice[29]: if M2 normally biases animals toward sensory cue-guided actions, then inactivation may remove an important brake on the non-conditional strategy. In our hands, M2 inactivation slowed but did not preclude the eventual transition to high performance on sound-guided trials. 
This suggests that at least for trained mice, two-choice auditory discrimination alone does not require M2 and may be subserved by other circuits[42]. Furthermore, we have taken the opposing effects of inactivation on shifts to sound-guided versus repeated actions as evidence that mice perform the task by balancing the use of conditional and non-conditional responses. A key finding of this study regards how specific parameters of ensemble activity transitions may relate to behavior. We found that ensemble transitions are more abrupt when animals need to retrieve and begin using conditional associations. These fast transitions may be related to those observed in medial prefrontal cortex, which have been interpreted as neural correlates of insight[12], or abandonment of an inadequate internal model at the onset of exploration[13]. On what quantitative basis should transitions be classified as abrupt or gradual? In our study, the steepness of these transitions was compared directly to the slower transitions that accompanied action blocks. Moreover, ensemble transitions occurred after only a few errors, whereas behavioral improvements took tens of sound-guided trials (Fig. 4e,f). The difference in neural and behavioral timing suggests that M2 neural activity has mostly adjusted while the animal still systematically responds in a non-conditional manner. M2 may facilitate the engagement of sound-guided behavior by biasing the use of sensory information, suppressing repetitive actions, or both. By contrast, prior studies have shown that when an animal must acquire novel arbitrary associations, changes in cortical activity track behavioral improvements[5,43] and lag the more rapid remapping in the striatum[7]. A major difference between these studies of fast learning and our study is that the auditory-motor associations are already well learned in our task. We found that multiple rules were each associated with a distinct subset of population activity patterns. 
Such task-dependent changes in neural activity have been reported in multiple frontal cortical regions across species[2-4,12,44]. By asking the animal to shift repeatedly during a single session, we found that the network can return to a previously employed functional configuration to meet similar behavioral demands. This back-and-forth toggling of ensemble activity is reminiscent of the ensemble remapping observed in CA1 of the hippocampus during repeated exposure to spatial contexts[45,46]. Interestingly, one study reported that changes in environmental context also caused network activity shifts in the rodent medial prefrontal cortex. However, the ensemble code was not identical upon re-exposure, potentially due to a systematic drift over time[47]. The divergent findings of repeatable versus drifting network states in the rodent frontal cortex could reflect regional differences, or differences in how frontal areas encode cognitive versus environmental variables. Several lines of evidence support the idea that the neural dynamics in M2 reflect changes in internal processes (e.g. representation of task contingencies or motor planning and preparation), rather than differences in overt physical movements. First, three different ensemble analyses with matched, congruent trial conditions indicated distinct neural dynamics in sound and action blocks (Fig. 5f–i and Supplementary Fig. 7), despite a lack of observable difference in motor output for the same sets of trials (Fig. 1e and Supplementary Fig. 2). Second, neural signals related to motor execution should be strongest at the time of response. Instead, we found that the rule-specific separation of population activity patterns was significantly above chance at all times across a trial (Fig. 5d). Third, and perhaps the strongest evidence: muscimol inactivation of M2 had no detectable effect on motor output (Supplementary Fig. 5), while clearly affecting behavioral flexibility. 
What is the purpose of functional reconfiguration during adaptive decision-making? Ensemble activity patterns within multiple network subspaces reflect the diversity of neural representations in M2. Recent studies have shown that rodent M2 sends long-range projections to sensory cortex[48,49] and dorsal striatum[50]. Appropriate shifts in neural representations could allow M2 to exert differential top-down control in a task-dependent manner. Further study regarding the downstream impacts of frontal network transitions may yield important insights into neuropsychiatric disorders in which cognitive flexibility is impaired. Plausibly, the cognitive rigidity characteristic of disorders such as schizophrenia could result from an inability of frontal cortical networks to shift or maintain stable ensemble states.

Methods

Animals

Adult male mice with C57BL/6J genetic background were used. Mice were housed in groups of 3 – 5, in 12h/12h light-dark cycle (lights off at 7PM), and most experiments were performed in late afternoons and evenings (4PM – midnight). At the start of experiments, mice were P51 – 117. No statistical tests were used to pre-determine sample sizes, but sample sizes for this study are similar to those generally employed in the field. All experimental procedures were approved by the Institutional Animal Care and Use Committee, Yale University.

Surgery

Mice underwent two surgeries. For each surgery, the mouse was anesthetized with 2% isoflurane in oxygen during induction, which was then lowered to 1 – 1.5% for the remainder of the surgery. The mouse was placed over a water-circulating heating pad (TP-700, Gaymar Stryker) in a stereotaxic frame (David Kopf Instruments). Pre-operatively, the mouse was injected with carprofen (5 mg/kg, s.c.; #024751, Butler Animal Health) and dexamethasone (3 mg/kg, s.c.; Dexaject SP, #002459, Henry Schein Animal Health). Post-operatively, the mouse was injected with carprofen immediately after surgery (5 mg/kg, s.c.) and each day for 3 days following (5 mg/kg, s.c.). For the first surgery, an incision was made to expose the skull. Based on stereotaxic coordinates, the center location of the mouse secondary motor cortex (M2; AP = 1.5 mm, ML = 0.5 mm; relative to bregma) was marked in the right hemisphere. In other experiments, we targeted the anterior-lateral motor cortex[37] (ALM; AP = 2.5 mm, ML = 1.5 mm) or the primary visual cortex (V1; AP = −3.8 mm, ML = 2 mm) in the right hemisphere. A stainless steel head plate (eMachineshop.com) was affixed to the skull with Metabond (C&B, Parkell, Inc.), and a thin layer of clear Metabond was then applied to cover the entire skull. Mice were given at least 1 week to recover prior to behavioral training. Head plate-implanted mice were then trained on behavioral tasks (see below). Once a mouse reached a performance criterion of >90% correct rate on three consecutive days and was ready for imaging experiments, a second surgery was performed under anesthesia. Using a dental drill, a 3 mm-diameter craniotomy was made at the targeted location, which had been marked previously and remained visible through the Metabond. Dura was left intact, and was irrigated with artificial cerebrospinal fluid (ACSF, in mM: 5 KCl, 5 HEPES, 135 NaCl, 1 MgCl2, 1.8 CaCl2; pH 7.3). 
Using a glass micropipette attached to a microinjection system (Nanoject II, Drummond), 32 – 46 nL of AAV1-Syn-GCaMP6s-WPRE-SV40 (5 × 10¹³ titer; UPenn Vector Core) was injected at a depth of 400 μm below dura at each of 4 locations, vertices of a 200 μm-wide square centered at the targeted cortical region. The glass micropipette was left in place for 5 min after injection to reduce backflow. A drop of warmed agar (1.2% in ACSF, Type III-A, High EEO, A9793, Sigma-Aldrich) was then applied to the cortical surface. A two-layer glass window was fabricated by first etching out a 2-mm diameter circle from #0 thickness glass cover slip, then bonding with UV-activated polymer (61, Norland Optical Adhesive) to a #1 thickness, 3-mm diameter round glass cover slip (64-0720 CS-3R, Warner Instruments). This glass window was then placed against the cortical surface. While applying light pressure, super glue was added to the rim to attach the glass to the skull and Metabond. Mice were again given at least 1 week to recover before resuming behavioral training. Imaging experiments would begin when behavioral performance criterion was reached. Eight out of eleven mice went through this procedure involving two surgeries. For the remaining three mice, the head plate implant, viral injection, and window implant procedures were performed in the same surgery before behavioral training.

Behavioral setup

For head-fixed mouse behavior, we used a training apparatus that has two lick ports, thus enabling two alternative choices. The use of two lick ports was inspired by another study[37]. Two metal screws were used to affix the head plate of the mouse onto a stainless steel mount. The mouse was then restrained inside an acrylic tube, which restricted gross body movements but allowed postural adjustments. The lick ports were fabricated from stainless steel 20-gauge needles, which were positioned at 90 and 270 degrees with respect to the mouse’s head orientation, and held in place by a 3D-printed plastic part mounted on a micromanipulator for fine positional adjustment. Water was delivered at the ports by gravity feed and the liquid volume was controlled by pneumatic valves (EV-2-24, Clippard), calibrated with an intravenous dripper to deliver ~2 μL per pulse. A battery-operated touch detector circuit signaled when the mouse’s tongue contacted a lick port. Auditory stimuli were played through computer speakers placed directly in front of the animal. The intensity of the auditory stimuli was calibrated to ~85 dB peak amplitude. Water delivery, lick detection, and sound presentation were connected to a desktop computer via a data acquisition board (USB-201, Measurement Computing). Presentation software (Neurobehavioral Systems) controlled the entire behavioral system. An infrared webcam was used to monitor the animal while in the rig. Behavioral training was performed inside the closed compartment of an audio-visual cart that was dark and soundproofed with acoustic foams (5692T49, McMaster-Carr). For imaging, mice were tested using a replica of the behavioral training setup under a two-photon microscope.

Adaptive decision-making task

To motivate participation in the task, water consumption was restricted to behavioral sessions. Mice were trained for 1 session per day, 6 days a week. On the non-training day, water was provided ad libitum in the home cage for 15 min. The mice were trained through four phases to shape their behavior. Phase one (~2 days): mice were habituated to head fixation in the behavior box, and trained to lick either one of the two ports for water reward. Mice were advanced to the next phase when they made >100 responses in a session. Phase two (~2 days), mice were trained to sample both ports. Here, mice were required to lick the left port to obtain water rewards three times, followed by the right port for the next three rewards, and so on. Mice were advanced to the next phase when they made >100 correct responses in a session. Phase three (>15 days), animals underwent training for two-choice auditory discrimination. One of two auditory cues was presented to begin each trial: a 2 s-long train of 0.5 s-long logarithmic frequency modulated sweeps from 5-to-15 kHz (“upsweep”) or from 15-to-5 kHz (“downsweep”). The stimuli were interleaved randomly from trial to trial. At 0.5 s following the onset of the auditory cue, a response window would open lasting for a maximum duration of 2 s. The animal's first lick within this response window was registered as its response for the trial. All other licks were logged but had no consequences. Once a response was recorded, playback of the auditory cue was terminated. A correct response, i.e. a left lick for upsweep or a right lick for downsweep, resulted in immediate delivery of 2 μL of water from the corresponding port. The next trial would begin 6 s following response. Incorrect responses resulted in 2 s of white noise presentation, with the next trial beginning 4 s later. Each trial had a total duration within a range from 7.5 to 9 s. 
Animals were allowed to perform trials until satiated (20 consecutive misses), typically after ~60 minutes. Training continued daily until a correct rate of >90% was attained for 3 consecutive days. For imaging experiments, mice were then trained under the two-photon microscope (with laser turned off) for habituation to the recording setup. All the mice were able to discriminate at >90% correct rate after 1–3 days of re-training. Finally, mice were tested on the adaptive decision-making task. The task always began with a sound block (S) indistinguishable from the two-choice auditory discrimination task. However, once the mouse reached a performance criterion of >85% correct for the last 20 trials, the stimulus-response-outcome contingencies would change from sound- to action-guided trials. In action-guided trials, task structure was identical to sound-guided trials. However, the correct response became fixed to one response direction, e.g. always left, regardless of the stimulus identity. No cue signaled the change in contingencies. When the mouse reached performance criterion again, another block switch would occur. A sound block was always followed by an action block, and vice versa. The second block was randomly chosen for each experiment to be action-left (AL) or action-right (AR). However, once the first action block was chosen, the block sequence became fixed for the remainder of the session. Therefore, the sequence of blocks could be one of two possibilities: (S-AL-S-AR-S-AL-S-AR…) or (S-AR-S-AL-S-AR-S-AL-…). Each session was terminated after 20 consecutive misses (trials with no response). Mice typically performed the adaptive decision-making task for 60 – 90 min. Following each adaptive decision-making test, mice resumed daily two-choice auditory discrimination until the next recording session, up to a maximum of seven adaptive decision-making tests. 
Our behavioral paradigm consists of blocks of trials that require the animal to shift between conditional and non-conditional approaches to action selection. In principle, mice may solve this task by ignoring sensory information completely during action blocks. However, the temporally structured lick rates during action blocks (Fig. 1e) strongly suggest use of the stimulus for gating lick responses. Our task has similarities with other paradigms that test behavioral flexibility, but there are also crucial differences. In contrast to paradigms that use a contextual cue to instruct rapid executive control on a trial-by-trial basis[9,10,39], animals adapt on a time scale of tens of trials in our task (Fig. 1c). This relatively slow rate of adaptation is akin to learning during arbitrary visuomotor mapping, where the animal’s basis for action selection is updated gradually based on reward feedback[7,40]. Our task also differs from other strategy- or set-shifting tasks for rodents[12,41] because non-spatial stimuli were used to probe arbitrary sensorimotor associations that do not conform to classical definitions of exemplars or sets. Furthermore, analysis of the types of errors made during training suggests that mice perform two-choice auditory discrimination in part by suppressing a prepotent tendency to repeat a rewarded choice. Action trials could thus be considered a natural strategy to the animal, whereas sound-guided trials require weeks of training to achieve high performance. Therefore, one caveat for our task is that animals are likely to have different degrees of learned and intrinsic familiarity for sound versus action trials.
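
The block-switching rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the Presentation script used in the experiments: the function names are hypothetical, miss trials are assumed to have been removed beforehand, and the 20-trial performance window is assumed to restart at each block switch.

```python
from collections import deque


def correct_choice(block, stimulus):
    """Rewarded choice for a block type ('S', 'AL', 'AR') and stimulus."""
    if block == "S":  # sound block: upsweep -> left, downsweep -> right
        return "left" if stimulus == "upsweep" else "right"
    return "left" if block == "AL" else "right"  # action blocks ignore the stimulus


def block_sequence(first_action):
    """Yield S, first action block, S, other action block, and so on."""
    other = "AR" if first_action == "AL" else "AL"
    while True:
        yield "S"
        yield first_action
        yield "S"
        yield other


def run_session(trials, first_action="AL", window=20, criterion=0.85):
    """trials: iterable of (stimulus, choice). Returns the block label of each trial."""
    blocks = block_sequence(first_action)
    block = next(blocks)
    recent = deque(maxlen=window)  # outcomes of the last `window` trials in this block
    labels = []
    for stimulus, choice in trials:
        labels.append(block)
        recent.append(choice == correct_choice(block, stimulus))
        if len(recent) == window and sum(recent) / window > criterion:
            block = next(blocks)  # uncued change in contingencies
            recent.clear()
    return labels
```

With a 20-trial window, the >85% criterion corresponds to at least 18 correct responses, so a block can never be shorter than 20 performed trials.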

Two-photon calcium imaging

The two-photon microscope (Movable Objective Microscope, Sutter Instrument) was controlled using ScanImage software[51]. The excitation source was an ultrafast laser (Chameleon Ultra II, Coherent). Excitation intensity was controlled by a Pockels cell (350-80-LA-02, Conoptics) and focused onto the sample with a 20x, N.A. 0.95 water immersion objective (Olympus). The time-averaged excitation laser intensity was 90–100 mW after the objective. To image fluorescence transients from GCaMP6s-expressing neurons, excitation wavelength was set at 920 nm, and emission was collected from 475 – 550 nm with a GaAsP photomultiplier tube. Time-lapse images were acquired at a resolution of 256 x 256 pixels and a frame rate of 3.62 Hz using bidirectional scanning. To synchronize behavior with imaging, a TTL pulse was sent at the beginning of each trial from the data acquisition board of the behavioral system to the imaging system to act as an external trigger for initiating image acquisition.

Inactivation

Mice were implanted with a head plate. The locations of M2 were marked on both hemispheres (AP = 1.5 mm, ML = 0.5 mm), and then covered with a thin layer of clear Metabond. Mice were then trained as described above, in preparation for the adaptive decision-making test. Craniotomies were performed at the marked locations. Using a glass micropipette attached to a microinjection system (Nanoject II, Drummond), ACSF, with or without muscimol (5 mM, 46 nL per hemisphere; cat. #195336, MP Biomedical), was injected at a depth of 400 μm into M2 of both hemispheres. Behavioral testing began 1–3 hr following injection. The same mice were tested after saline and muscimol treatments on consecutive days in a counter-balanced design, with no blinding. The mice were randomized to receive either saline or muscimol first in an alternating manner depending on the order in which they reached the behavioral performance criterion. Twelve mice were allocated for this experiment; however, one was excluded due to equipment malfunction during testing.

Histology

Following experiments, mice underwent transcardial perfusion with chilled formaldehyde solution (4% in phosphate-buffered saline). The brains were sectioned with a vibratome and imaged with an inverted wide-field fluorescence microscope.

Analysis: behavioral data

Timestamps of stimulus presentation, licks, and water delivery were logged in a text file by Presentation software (Neurobehavioral Systems, Inc.). Scripts were written in MATLAB to parse the log files. For the adaptive decision-making task, a perseverative error was defined as an incorrect response that would have been correct according to the last trial block’s contingencies. For example, during an action-left block, the stimulus-response pairings of upsweep-left lick and downsweep-left lick would be “correct”. Downsweep-right lick would be a “perseverative error”, because this stimulus-response pairing would have been correct in the preceding sound-guided block. The remaining possible stimulus-response pairing, upsweep-right lick, would be classified as an “other error”. The number of trials performed included all correct and error trials, but excluded the miss trials when the mouse failed to lick within the response window. Miss trials typically occurred near the end of the session when the mouse was satiated. Trials-to-criterion was defined as the number of trials performed in a certain trial block before reaching a performance criterion of 85% correct for the last 20 trials. Therefore, the minimum value of this quantity is 20. Mean trials to criterion for each session was calculated excluding the first sound block, because contingency switches have not yet begun. Mean blocks per 100 trials, mean perseverative errors per block, and mean other errors per block were calculated excluding the last block (i.e. trials after the last block switch). For analysis, we often compared pre-switch and post-switch conditions, which were defined as the 20 trials prior to or following a block switch. The first lick time was defined as the time of the first lick after sound onset for each trial, which may occur prior to the start of the response window. The first lick time is thus a sum of the reaction time and movement time. 
For this measurement, we excluded trials in which the mouse licked within 0.5 s before cue onset, in which case the first lick may represent the continuation of a spontaneous lick bout rather than a reaction to the stimulus.
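
As an illustration of the definitions above, the following Python sketch classifies responses and counts trials to criterion. The function names are hypothetical, outcomes are assumed to exclude miss trials, and the criterion is implemented as strictly greater than 85%, following the task description.

```python
def rewarded(block, stimulus):
    """Rewarded choice under a block's contingencies ('S', 'AL' or 'AR')."""
    if block == "S":
        return "left" if stimulus == "upsweep" else "right"
    return "left" if block == "AL" else "right"


def classify(block, previous_block, stimulus, choice):
    """Label a response as correct, perseverative error, or other error."""
    if choice == rewarded(block, stimulus):
        return "correct"
    if choice == rewarded(previous_block, stimulus):
        # Wrong now, but would have been correct in the preceding block
        return "perseverative error"
    return "other error"


def trials_to_criterion(outcomes, window=20, criterion=0.85):
    """Trials performed in a block until the last `window` trials exceed criterion.

    outcomes: list of booleans (True = correct) for the performed trials of one block.
    Returns None if the criterion is never reached.
    """
    for n in range(window, len(outcomes) + 1):
        if sum(outcomes[n - window:n]) / window > criterion:
            return n
    return None
```

For example, `classify("AL", "S", "downsweep", "right")` returns a perseverative error, matching the action-left example in the text.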

Analysis: imaging data

Time-lapse fluorescence images were corrected for x-y motion using the TurboReg plug-in for ImageJ (NIH). We wrote a GUI in MATLAB to select cell bodies as regions of interest (ROIs). Values of pixels within an ROI were averaged to generate $F_{\mathrm{cell\_measured}}(t)$. For each cell, we estimated the neuropil signal by drawing a doughnut[52]: we approximated the ROI area as a circle to estimate a radius r, then created an annulus-shaped neuropil area with inner and outer diameters of 2r and 3r. This neuropil area excluded pixels if they were part of the ROI of another cell body. Values of pixels within the annulus-shaped neuropil area were averaged to generate $F_{\mathrm{neuropil}}(t)$. To subtract the neuropil signal, we calculated $F_{\mathrm{cell}}(t) = F_{\mathrm{cell\_measured}}(t) - \alpha F_{\mathrm{neuropil}}(t)$, where α is a correction factor ranging from 0.2 to 0.6. The value of α was calibrated for each experiment to avoid over-correction, by making sure that $F_{\mathrm{cell}}(t) > 0$ for each cell. For each ROI, the fractional change in fluorescence, ΔF/F(t), was calculated as $\Delta F/F(t) = \frac{F_{\mathrm{cell}}(t) - F_0(t)}{F_0(t)}$, where $F_0(t)$ is the baseline fluorescence as a function of time. To estimate baseline, we first obtained $F_{\mathrm{field}}(t)$, the mean pixel intensity for the entire 256 pixel x 256 pixel field of view as a function of time. $F_0(t)$ was then calculated as $F_0(t) = F^*_{\mathrm{cell}} \cdot \frac{F_{\mathrm{field},10\%}(t)}{F^*_{\mathrm{field}}}$, where $F_{\mathrm{field},10\%}(t)$ is the 10th percentile of $F_{\mathrm{field}}(t)$ within a sliding window of 10 minute duration. $F^*_{\mathrm{cell}}$ and $F^*_{\mathrm{field}}$ are the 10th percentile of $F_{\mathrm{cell}}(t)$ and $F_{\mathrm{field}}(t)$ within the first 10 minutes of the session, respectively. We verified that $F_{\mathrm{field},10\%}(t)$ does not vary with specific choices or rule blocks, and thus serves the purpose of compensating for slow, full-field signal drifts due to non-physiological sources. We repeated the ensemble analyses with two other methods for calculating baseline. One, estimating $F_0(t)$ using the 10th percentile of $F_{\mathrm{cell}}(t)$, on a per-cell basis, with a moving window of 10 minute duration. Two, estimating $F_0$ using the 10th percentile of $F_{\mathrm{cell}}(t)$ from the entire session, i.e. without a moving window. These different ways to estimate baseline led to qualitatively similar results for all the ensemble analyses.
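
The ΔF/F computation can be sketched as follows. This is an illustrative Python version under stated assumptions, not the MATLAB code used for the study: the baseline is assumed to be the cell's initial 10th-percentile fluorescence scaled by the relative drift of the full-field 10th percentile, the sliding window is assumed to be centered on each frame, and all function and argument names are hypothetical.

```python
def percentile10(values):
    """10th percentile, linearly interpolated between order statistics."""
    s = sorted(values)
    pos = 0.10 * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def sliding_percentile10(trace, window):
    """10th percentile of `trace` in a window of about `window` frames around each frame."""
    half = window // 2
    n = len(trace)
    return [percentile10(trace[max(0, i - half):min(n, i + half + 1)])
            for i in range(n)]


def delta_f_over_f(f_measured, f_neuropil, f_field, alpha, window):
    """Fractional fluorescence change for one ROI; `window` is given in frames."""
    # Neuropil subtraction: F_cell(t) = F_measured(t) - alpha * F_neuropil(t)
    f_cell = [m - alpha * p for m, p in zip(f_measured, f_neuropil)]
    if min(f_cell) <= 0:
        raise ValueError("alpha over-corrects: F_cell(t) must stay positive")
    # Reference levels from the first `window` frames of the session
    f_cell_star = percentile10(f_cell[:window])
    f_field_star = percentile10(f_field[:window])
    # Baseline follows slow full-field drift, scaled to this cell's initial level
    drift = sliding_percentile10(f_field, window)
    f0 = [f_cell_star * d / f_field_star for d in drift]
    return [(c - b) / b for c, b in zip(f_cell, f0)]
```

At the 3.62 Hz frame rate reported for these experiments, a 10-minute window corresponds to roughly 2172 frames. The list-based sliding percentile is written for clarity rather than speed.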

Analysis: task-related activity and choice encoding

To calculate trial-averaged fluorescence transients, we created time bins that were 0.5 s wide, and then assigned each ΔF/F(t) value at a particular time t to the corresponding time bin relative to the animal’s response. The binned ΔF/F(t) values were averaged to obtain trial-averaged ΔF/F. To estimate uncertainty of the trial-averaged ΔF/F, a bootstrap analysis was performed by drawing fluorescence transients per trial, with replacement, up to the same number used to construct the trial average. The median and 95% confidence intervals of trial-averaged ΔF/F were estimated from 1000 iterations of this bootstrap analysis. To quantify choice encoding, we performed multiple linear regression analysis on the ΔF/F(t) of each cell using the following equation: $\Delta F/F(t) = a_0 + a_1 C(n) + a_2 C(n-1) + a_3 C(n-2) + a_4 C(n)C(n-1) + \varepsilon(t)$, where C(n) was the choice of current trial, C(n-1) was the choice of prior trial, C(n-2) was the choice two trials ago, C(n)C(n-1) was the interaction between current and prior choices, ε(t) was the error term and the a’s were regression coefficients. We coded a choice of left as 1 and right as −1. We used a non-overlapping 0.5 s-long moving window with step size of 0.5 s. A cell was deemed to encode one of the choice parameters or interaction if p < 0.01 for the corresponding regression coefficient. To avoid confounds from rule and reward signals, we analyzed only sound-guided trials in which R(n) = 1 (outcome of current trial = reward) and R(n-1) = 1 (outcome of prior trial = reward). We did not analyze action trials, because parameters such as C(n) and C(n-1) were highly correlated by virtue of the task structure, precluding a simple interpretation of the analysis.
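
A minimal sketch of the choice regression for a single cell and a single time bin is given below. It is an assumption-laden illustration rather than the original MATLAB analysis: the ordering of the regressors (intercept, C(n), C(n-1), C(n-2), interaction) is assumed, the helper names are hypothetical, coefficients are estimated by ordinary least squares through the normal equations, and the significance test on each coefficient is omitted.

```python
def design_row(c_n, c_n1, c_n2):
    """Regressors for one trial; choices are coded left = +1, right = -1."""
    return [1.0, c_n, c_n1, c_n2, c_n * c_n1]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_choice_regression(choices, dff_bin):
    """Least-squares coefficients [a0, a1, a2, a3, a4] for one time bin.

    choices: +1/-1 per trial; dff_bin: dF/F of the cell in this bin, per trial.
    The first two trials are skipped because they lack a two-trial history.
    """
    rows = [design_row(choices[n], choices[n - 1], choices[n - 2])
            for n in range(2, len(choices))]
    y = dff_bin[2:]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)
```

In practice the fit would be repeated for every 0.5 s bin, restricted to sound-guided trials with R(n) = 1 and R(n-1) = 1, and followed by a test of each coefficient against zero.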

Analysis: neural circuit trajectories

Scripts for the ensemble analysis were written in MATLAB, and are available upon request. For state-space analysis, we used demixed principal component analysis[36] (dPCA). To prepare the imaging data for dPCA, ΔF/F(t) for each cell for each trial was aligned in time, from 0 to 6 s from the time of the response in that trial. We have tried numerous other time windows and found similar results. This alignment led to an array with dimensions = cells x time x trials. Using this array, we averaged across 4 trial types: C(n) = 1, R(n) = 1, pre-switch sound trials; C(n) = −1, R(n) = 1, pre-switch sound trials; C(n) = 1, R(n) = 1, pre-switch action trials; C(n) = −1, R(n) = 1, pre-switch action trials. This trial-averaged array (cells x time x 4) was input into the dPCA algorithm[36] to demix time- and task-dependent variances and obtain principal components (PCs). To calculate neuronal circuit trajectories, single-trial or trial-averaged ΔF/F were projected onto the first three PCs. To characterize similarities between the neuronal circuit trajectories across blocks, we calculated the neuronal circuit trajectory for each block by using the trial-averaged fluorescence across the 20 trials pre-switch. The similarity between a pair of trajectories was quantified by calculating the mean of the Euclidean distances between the trajectories at matching time points in state-space. In order to compare between different experiments, this distance was normalized for each experiment: the Euclidean distances were divided by the spread of all population vectors, calculated as the root mean square of distances between all population vectors and the centroid of the vectors. To quantify how the neuronal circuit trajectories evolve on a trial-to-trial basis, we used the Mahalanobis distance, which is a measure of distance between one point and another collection of points. 
We defined the origin as the 20 trials preceding a block switch, and the destination as the 20 trials preceding the next block switch. We were interested in the relative separation between the origin, an individual trial that occurred in between, and the destination. Therefore, for each time point of a trial, we calculated Mahalanobis distances, $d_{\mathrm{origin}}$ and $d_{\mathrm{destination}}$, from the individual trial (1 three-dimensional value) to the origin and destination respectively (20 three-dimensional values). The $d_{\mathrm{origin}}$ and $d_{\mathrm{destination}}$ for each individual trial are the medians of the corresponding distances across the ~30 time points within a trial. To estimate the location of an individual trial relative to the origin and destination, we calculated the ratio of Mahalanobis distances from $d_{\mathrm{origin}}$ and $d_{\mathrm{destination}}$. We fitted the Mahalanobis distance ratios, which are a function of trial number from switch, with a logistic function, $f(x) = \frac{L}{1 + e^{-k(x - x_0)}} + L_0$, where $x_0$ is the midpoint trial, k is the steepness, L is the range, and $L_0$ is the minimum value. The parameter $L_0$ is not fitted, but rather estimated for each transition by calculating the mean of Mahalanobis distance ratio using trials −5 to −1 from switch. We fitted every neural ensemble transition using this method, but excluded those in which the midpoint trial $x_0$ < −5 or $x_0$ > 200, indicating a poor fit. Based on this criterion, we excluded none (0/33) of the action-to-sound shifts and 8% (3/38) of the sound-to-action shifts in our analysis of M2 neural ensembles. For analysis of the ALM data set, we excluded 8% (2/26) of the action-to-sound shifts and 3% (1/32) of the sound-to-action shifts. When comparing behavioral and neural transitions, we defined ‘behavioral transition trial’ as the trial to criterion (85% correct for 20 trials) minus 20, i.e. the first of the sequence of 20 trials leading to block switch. The ‘neural transition trial’ was defined as the trial when the first term of the logistic fit of Mahalanobis distance ratios reached a value of 75% L. 
That is, the trial x that satisfies this equation: This definition is arbitrary; it is unknown how much the population activity pattern must resemble the final pre-switch ensemble state in order to qualify as a ‘transition’. Therefore, in another analysis we first fitted each neural transition with the logistic function, and identified the behavioral trial corresponding to each 5% L step of neural transition from 10 to 90% L. We then calculated the mean hit and error rates at those corresponding behavioral trials, thus plotting the relationship between behavioral performance and neural transition without explicitly defining a transition trial.
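To make the transition-trial definition concrete, here is a minimal Python sketch. It assumes the standard four-parameter logistic form, with midpoint `x0`, steepness `k`, range `L`, and minimum `L0`; these names and the helper functions are illustrative and are not taken from the authors' code. Under that assumption, the trial at which the first term reaches a given fraction p of the range has the closed form x = x0 + ln(p / (1 − p)) / k.

```python
import math


def logistic(x, x0, k, L, L0):
    """Four-parameter logistic: minimum L0 plus a sigmoid of range L."""
    return L0 + L / (1.0 + math.exp(-k * (x - x0)))


def transition_trial(x0, k, fraction=0.75):
    """Trial at which the first (sigmoid) term reaches `fraction` of L.

    Solves L / (1 + exp(-k (x - x0))) = fraction * L for x.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    if k == 0:
        raise ValueError("steepness k must be non-zero")
    return x0 + math.log(fraction / (1.0 - fraction)) / k


def is_poor_fit(x0, lower=-5, upper=200):
    """Exclusion criterion from the text: midpoint trial outside [-5, 200]."""
    return x0 < lower or x0 > upper


# Trials at each 5% step of the neural transition, from 10% to 90% of L,
# for a hypothetical fit with x0 = 10 and k = 0.5.
steps = [transition_trial(10.0, 0.5, p / 100) for p in range(10, 95, 5)]
```

Because the closed form depends only on `x0` and `k`, the transition trial is unaffected by the range or minimum of the fitted curve.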

Analysis: decoding

To determine how well ensemble dynamics could be used to predict trial type, we first selected, from the frame-by-frame imaging data (i.e., ΔF/F(t)), those imaging frames that occurred between 0 and 6 s from the time of response. We then projected these ΔF/F(t) onto the PCs deduced from dPCA to obtain population activity vectors. This procedure reduced the dimensionality of our data from (frames × cells) to (frames × 3). Each population activity vector in this analysis came from one of three possible trial types: R(n)=1, pre-switch sound trials; R(n)=1, pre-switch action-left trials; R(n)=1, pre-switch action-right trials. Other trial types were not considered for the decoding analysis. Using a randomly chosen fraction (80%) of the population activity vectors, we constructed a classifier based on linear discriminant analysis, using Mahalanobis distances with stratified covariance estimates (the “classify” function in MATLAB with the “Mahalanobis” option). We then tested the performance of this classifier on the remaining 20% of the population activity vectors, comparing the classification results with the actual trial types. This five-fold cross-validation process was repeated 1,000 times to obtain a median estimate of classifier accuracy. To investigate decoding accuracy across time, the timing information of each population activity vector relative to the time of response in each trial was retained. We then ran a separate decoding analysis on the population activity vectors measured during each time period, using a non-overlapping sliding window with a duration of 0.28 s and a step size of 0.28 s. This window duration is the inverse of the frame rate, which was 3.6 Hz. To decode from single-cell activity, ΔF/F(t) of each cell was used instead of population activity vectors as inputs to construct the classifier.
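The decoding procedure can be sketched in Python as follows. This is an illustrative approximation, not a port of MATLAB's `classify`: each vector is assigned to the class with the smallest Mahalanobis distance, using a separate covariance estimate per class. The function names, the unstratified random split, and the fixed seed are assumptions made for this example.

```python
import numpy as np


def fit_classes(X, y):
    """Per-class mean and inverse covariance (one covariance per class)."""
    model = {}
    for label in np.unique(y):
        Xc = X[y == label]
        model[label] = (Xc.mean(axis=0), np.linalg.inv(np.cov(Xc, rowvar=False)))
    return model


def predict(model, X):
    """Assign each row of X to the class with the smallest Mahalanobis distance."""
    labels = list(model)
    d2 = np.empty((X.shape[0], len(labels)))
    for j, label in enumerate(labels):
        mean, inv_cov = model[label]
        diff = X - mean
        # Squared Mahalanobis distance of every row to this class.
        d2[:, j] = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return np.asarray(labels)[d2.argmin(axis=1)]


def holdout_accuracy(X, y, n_repeats=1000, train_fraction=0.8, seed=0):
    """Median test accuracy over repeated random 80/20 splits.

    X: population activity vectors, shape (n_vectors, 3).
    y: trial-type label of each vector, shape (n_vectors,).
    Assumes every class has enough training samples in each split for a
    non-singular covariance estimate; this sketch does not check for that.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    n_train = int(round(train_fraction * len(y)))
    accuracies = []
    for _ in range(n_repeats):
        order = rng.permutation(len(y))
        train, test = order[:n_train], order[n_train:]
        model = fit_classes(X[train], y[train])
        accuracies.append(np.mean(predict(model, X[test]) == y[test]))
    return float(np.median(accuracies))
```

To decode across time under the same assumptions, `holdout_accuracy` would be called separately on the vectors falling in each 0.28 s window.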

Statistics

Statistical tests were performed in MATLAB, and are indicated in the main text or figure legends. Briefly, a Wilcoxon signed-rank test was used for all two-sample, paired comparisons. For two-sample, unpaired comparisons, a Wilcoxon rank-sum test was used. Paired t-tests were used for bin-wise analysis of lick rates. For quantification of choice signals as a function of time, multiple linear regression was first performed as detailed above; a binomial test was then applied to the proportion of cells significantly encoding choice within each time-bin. For ensemble decoding analyses, mean classification accuracy was tested against chance level using a one-sample t-test. For t-tests, the sampling distribution of the mean was assumed to be normal, but this was not formally tested. All t-tests were two-tailed. A statistics checklist is available in the Supplementary Materials.

Code availability

The custom MATLAB code used for this study is available upon request.

Data availability

The data that support the findings of this study are available from the corresponding author upon request.
