
Rodents monitor their error in self-generated duration on a single trial basis.

Tadeusz Władysław Kononowicz¹,², Virginie van Wassenhove³, Valérie Doyère².

Abstract

A fundamental question in neuroscience is what type of internal representation leads to complex, adaptive behavior. When faced with a deadline, individuals' behavior suggests that they represent the mean and the uncertainty of an internal timer to make near-optimal, time-dependent decisions. Whether this ability relies on simple trial-and-error adjustments or whether it involves richer representations is unknown. Richer representations suggest a possibility of error monitoring, that is, the ability of an individual to assess its internal representation of the world and estimate discrepancies in the absence of external feedback. While rodents show timing behavior, whether they can represent and report temporal errors in their own produced duration on a single-trial basis is unknown. We designed a paradigm requiring rats to produce a target time interval and, subsequently, evaluate its error. Rats received a reward in a given location depending on the magnitude of their timing errors. During the test trials, rats had to choose a port corresponding to the error magnitude of their just-produced duration to receive a reward. High choice accuracy demonstrates that rats kept track of the values of the timing variables on which they based their decision. Additionally, the rats kept a representation of the mapping between those timing values and the target value, as well as of the history of the reinforcements. These findings demonstrate error-monitoring abilities in evaluating self-generated timing in rodents. Together, these findings suggest an explicit representation of produced duration and the possibility of evaluating its relation to the desired target duration.


Keywords: error monitoring; metacognition; time perception; timing


Year: 2022 PMID: 35193973 PMCID: PMC8892352 DOI: 10.1073/pnas.2108850119

Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 12.779

In neuroscience, a fundamental question is how rich the internal representation of an individual’s experience must be to yield adaptive behavior. Let us consider a hungry individual that needs to find food fast: it may adopt a trial-and-error foraging strategy to maximize reward but may also, to maximize its efficiency, represent rich experiential variables, such as how much time it takes to reach a source of food. Both representing elapsed time and monitoring its inherent uncertainty play an important role in adaptive behavior, learning, and decision making (1). When representing these variables, the sources of uncertainty are both exogenous (stimulus driven) and endogenous (neural implementation). The mapping of exogenous sources of temporal uncertainty has been well described in timing behavior: for instance, mice can adjust their behavior to the width of the distribution of temporal intervals provided through external stimuli (2). On the other hand, the endogenous sources of uncertainty in time perception are less well understood and more difficult to address. Evidence that animals are sensitive to, and have access to, the internal uncertainty of elapsed time comes from tasks in which the individual must produce a required target duration using a lever press or a key press (1, 3, 4). In a task in which individuals must produce an interval of fixed duration to obtain a reward (Fig. 1), a plausible strategy to maximize reward would be to set the produced duration longer than the required target duration so as to allow a margin of error [internal target duration; (5)]. The larger an individual’s representational uncertainty, the larger the margin of error needed to maximize reward. Consistent with this, studies have shown that the magnitude of error in produced intervals varies with the magnitude of temporal uncertainty (6, 7), and participants with larger temporal uncertainty set larger margins of error [Fig. 1; (1, 7)].
The observed optimization of timing behavior raises the question of how rich the representation of elapsed time must be.
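A small numerical sketch makes the margin-of-error argument concrete. It rests on assumptions that are ours, not the authors'. TPs are taken to be Gaussian with scalar variability, meaning the SD equals a Weber fraction w times the mean TP. Only TPs of at least T are rewarded. Each trial is assumed to last roughly the mean TP plus a fixed 10-s ITI, and the animal is assumed to maximize rewards per second. The function names and the values of w and the ITI are hypothetical.

```python
import math


def p_success(mu: float, target: float, weber: float) -> float:
    """P(TP >= target) when TP ~ Normal(mu, (weber * mu) ** 2)."""
    z = (target - mu) / (weber * mu)
    # Upper-tail probability 1 - Phi(z), via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def best_internal_target(target: float, weber: float, iti: float = 10.0,
                         step: float = 0.001) -> float:
    """Grid-search the mean TP that maximizes rewards per second.

    Hypothetical reward-rate model: one reward per TP >= target, and a
    trial that lasts the mean TP plus a fixed ITI (both assumptions).
    """
    best_mu, best_rate = target, -1.0
    n_steps = int(round(2 * target / step))  # search mu in [target, 3 * target]
    for i in range(n_steps + 1):
        mu = target + i * step
        rate = p_success(mu, target, weber) / (mu + iti)
        if rate > best_rate:
            best_mu, best_rate = mu, rate
    return best_mu


if __name__ == "__main__":
    T = 3.2  # required minimum duration (s), as in the task
    for w in (0.1, 0.2, 0.3):
        print(f"w = {w}: reward-maximizing mean TP = {best_internal_target(T, w):.2f} s")
```

Under these assumptions, the reward-maximizing mean TP lies above T and moves further above it as w grows. This matches the qualitative pattern described above. The exact values depend entirely on the assumed ITI and reward model.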

Fig. 1.

The TP task and error-monitoring protocol. (A) Schematic of the box arrangement, with a lever in the middle of the panel and reward ports to the left and right of the lever. Reward availability was signaled by the port light, depicted by the lightbulbs. Reward delivery was triggered by the rat’s nose poke in the reward port. Depending on group assignment, rats had either to hold the lever pressed for a minimum of 3.2 s (HOLD group) or to press the lever twice with a minimal delay of 3.2 s between the two presses (PRESS group). (B) TP performance in error-monitoring test sessions follows Weber’s law in both groups, with signatures of optimality. (Upper) Probability density functions over TPs for each individual rat in the HOLD (blue) and PRESS (red) groups. Thresholds Θ (blue and red dashed lines for the HOLD and PRESS groups, respectively) are plotted for each individual. (Bottom Left) Average probability density functions over TPs for the HOLD and PRESS groups, superimposed. Note the distribution shift and width shrinkage for the HOLD group. (Bottom Right) For each rat, µ(TP) is plotted against σ(TP). Both at the individual and at the group level, the PRESS rats showed larger µ(TP) and σ(TP), visible as an upward-right shift of the red curve. This pattern indicates that rats make their choices optimally, taking into account their level of TP variability. The results hold within each rat and across sessions. (C) Schematic depiction of how rewards were assigned to specific parts of the TP distribution. Green is used for “small error” (SE) trials and orange for “large error” (LE) trials. Red indicates TPs that were out of reward range. The arrows indicate the probabilistic assignment of TP type (SE or LE) to the left and right ports on training trials. On test trials, the food–port assignments remained, but both ports were available; thus, the amount of reward was driven by the rat’s choice. (D) Schematic of the trial structure.
From top to bottom, the succession of task events is depicted. Events alternate along the TP axis (color bar with red, green, and orange) and show the different scenarios determined by the rat’s TP on a single trial. The ITI is the last event in a single-trial sequence.

A trial-and-error strategy would predict that near-optimal behavior can be parsimoniously explained by adaptation, so that timing behavior would fluctuate around the required duration. The representational view would predict that uncertainty and trial-to-trial errors are experiential variables that animals use to monitor their timing behavior. To settle whether rodents can monitor their timing errors relative to their target on a trial-by-trial basis, we developed a task inspired by human work. Humans required to generate a time interval can also reliably report the magnitude and the sign of their errors (8); that is, they can evaluate by how much [magnitude] their generated duration was too short or too long [sign] with respect to the target duration. Humans can also report how confident they are in their timing behavior (9). Here we tested these temporal cognitive abilities in rats. To obtain a reward, the rats were required to produce a time interval and, on some test trials, to correctly report the magnitude of their timing error. We show that rats correctly reported the magnitude of their timing error. This suggests that their timing behavior uses explicit representations of time intervals, together with their uncertainty around the internal target duration.

Results

Rats Produce Time Intervals Optimally.

Two groups of rats were trained to produce a minimum target duration (T = 3.2 s), either by pressing a lever twice to demarcate T (PRESS group) or by holding a lever pressed for T (HOLD group). Producing at least T provided access to a reward (Fig. 1). All rats produced the required time interval T, as all distributions of time productions (TPs) peaked after T (Fig. 1). Larger rewards were assigned to more accurate TPs so that the reward assignment rule exploited each animal’s temporal precision as efficiently as possible (Fig. 1). Consistent with Weber’s law in duration reproduction (10–12), the standard deviation (SD) of TPs [σ(TP)] increased with the mean TP [μ(TP); Fig. 1, Bottom; t(586) = 15.4, P < 10⁻¹⁰]. Slopes were comparable in the PRESS and HOLD groups (Fig. 1; ΔAIC = 0.3, P = 0.155). As σ(TP) shrank over sessions and across individuals (Fig. 1), the distance between T and μ(TP) shrank as well. This covariation strongly suggests near-optimal exploitation of timing uncertainty by the animals. Indeed, the animals displayed behavioral patterns suggesting that they adapt their mean responding on the basis of their timing uncertainty (1, 3, 4). This phenomenon has two plausible explanations. 1) The animals could adapt their TPs by trial and error (13, 14), based exclusively on the memorized association of different rewards with different TP lengths (15). 2) Alternatively, their behavior could be driven by model-based, explicit representations of TP and its relation to the internal target duration (16). This would imply tracking of their temporal error on a single-trial basis. We explore these two possibilities in the next two sections.
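A scalar-variability check of this kind can be sketched as an ordinary least-squares regression of per-session σ(TP) on μ(TP). This is a simplification. The reported statistic came from a model fitted across rats and sessions, whereas the sketch below pools synthetic sessions. The simulated Weber fraction (0.2), the session means and the trial counts are made-up values for illustration only.

```python
import random
import statistics


def simulate_sessions(n_sessions=40, n_trials=200, weber=0.2, seed=0):
    """Synthetic TPs (s) whose SD is proportional to the session mean."""
    rng = random.Random(seed)
    sessions = []
    for _ in range(n_sessions):
        mu = rng.uniform(3.5, 5.5)  # hypothetical session mean TP
        sessions.append([rng.gauss(mu, weber * mu) for _ in range(n_trials)])
    return sessions


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


sessions = simulate_sessions()
means = [statistics.fmean(s) for s in sessions]   # mu(TP) per session
sds = [statistics.stdev(s) for s in sessions]     # sigma(TP) per session
slope = ols_slope(means, sds)
print(f"sigma(TP) vs mu(TP) slope: {slope:.3f}")  # near the simulated Weber fraction
```

A positive slope close to the simulated Weber fraction is the scalar signature. A slope near zero would instead indicate constant rather than scalar variability.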

Rats Choose Accurately on Test Trials, Expressing Error Monitoring.

To assess temporal error monitoring, we developed a behavioral paradigm in which, on each trial, rats first produced a duration (TP) and subsequently reported its accuracy (i.e., how far the TP was from T). Rats were trained with the reward assignment rule (Fig. 1). This rule associated TP error (small error: TPSE trials versus large error: TPLE trials) with reward deliveries of different sizes (TPSE-2p and TPLE-1p trials) or with reward absence (TPSE-0p and TPLE-0p trials) in one of two ports. One port delivered 1 pellet (1p) after a large error (TPLE-1p) but 0 pellets (0p) after a small error (TPSE-0p; Fig. 1, “1p port”). The other port delivered 2p after a small error (TPSE-2p) but 0p after a large error (TPLE-0p; Fig. 1, “2p port”). Before test trials were introduced, we examined reaction times (RT) in the control training trials, in which only one port was lit. Rats reached the port faster when expecting a reward delivery based on their TP error. When probed on test trials with both reward ports lit, rats were rewarded if they chose according to the rule (Fig. 1), thus maximizing reward rate. Rats increased their choice RT on test trials compared with training trials, indicating typical choice behavior. Test trial accuracy (proportion of correct choices) is a direct measure of temporal error monitoring, as the reward rule was conditional on the magnitude of the TP error. Note that the reward assignment rule and threshold adjustment were constructed such that overall reward was similar in both ports. Furthermore, a bias to choose the 2p port on all test trials would yield chance-level accuracy (0.5). Indeed, we did not observe a strong association between accuracy and pellet ratio on a per-rat, per-session basis [β = −1.32, SE = 2.06, t(1,781) = −0.64, P = 0.53]. From Fig. 2, it is evident that accuracy on test trials (collapsed over 1p and 2p ports) was above chance level for all rats (each animal above the 50% chance level, P < 0.05, two-sided Wilcoxon test). Additionally, the PRESS group exhibited higher test trial accuracy [Fig. 2; β = 0.04, SE = 0.007, z(5967) = 5.46, P < 10⁻⁷]. The PRESS group also showed poorer TP performance, which suggests better choice performance in animals with larger variability. To test this hypothesis, we assessed the relationship between choice performance and SD across sessions by fitting a model in which choice accuracy was predicted by SD, with group included as a factor. Neither the effect of SD [β = 0.018, SE = 0.022, t(567) = 0.81, P = 0.417] nor the interaction between SD and group [β = −0.003, SE = 0.029, t(600) = −0.122, P = 0.902] was significant. This suggests that the difference in choice accuracy between groups was not driven by differences in variability. Instead, it was related to the different motor sequences required to produce the duration. As rodents may use motor routines to support temporal judgments (17–20), this raises the question of to what extent motor routines could contribute to interval timing and error monitoring.
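The reward assignment rule and the test-trial accuracy score can be summarized in a few lines of Python. This is a hypothetical sketch based on our reading of Fig. 1C and D. TPs shorter than T are unrewarded, TPs between T and the rat-specific threshold Θ count as small errors, and longer TPs count as large errors. The function names, the value Θ = 4.0 s, and the exclusion of TP < T trials from the accuracy score are assumptions. In the study, Θ was adjusted individually for each rat.

```python
from typing import List, Optional, Tuple

T = 3.2  # required minimum duration (s), from the text


def classify_tp(tp: float, theta: float, target: float = T) -> Optional[str]:
    """Return 'SE', 'LE', or None (out of reward range), per our reading of Fig. 1C."""
    if tp < target:
        return None
    return "SE" if tp < theta else "LE"


def pellets(tp: float, port: str, theta: float) -> int:
    """Pellets delivered at `port` ('1p' or '2p') after a given TP."""
    kind = classify_tp(tp, theta)
    if kind == "SE":
        return 2 if port == "2p" else 0   # TPSE-2p vs TPSE-0p
    if kind == "LE":
        return 1 if port == "1p" else 0   # TPLE-1p vs TPLE-0p
    return 0


def choice_accuracy(trials: List[Tuple[float, str]], theta: float) -> float:
    """Proportion of scorable test trials on which the chosen port pays out.

    Trials with TP < T are skipped (an assumption: no port pays out on them).
    """
    scored = [pellets(tp, port, theta) > 0
              for tp, port in trials if classify_tp(tp, theta) is not None]
    if not scored:
        raise ValueError("no scorable test trials")
    return sum(scored) / len(scored)


theta = 4.0  # hypothetical threshold (s)
trials = [(3.5, "2p"), (4.6, "1p"), (3.9, "1p"), (5.1, "1p"), (3.0, "2p")]
print(choice_accuracy(trials, theta))  # 3 of the 4 scorable trials are correct
```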

Fig. 2.

Rats report temporal errors by choosing accurately on test trials. (A) Gray data points show session-by-session accuracy on test trials for individual rats, that is, the proportion of test trials in which a rat chose the port at which a reward would actually be delivered. The 1p and 2p reward trials were averaged together. A strategy in which only one port was chosen would result in chance-level (0.5) accuracy. Each column shows the results for one rat across all experimental sessions for the two groups of rats. All rats chose significantly above chance level, indicating the ability to monitor temporal errors. (B) Accuracy on test trials plotted separately for the HOLD and PRESS groups. Notably, the PRESS group achieved a higher level of accuracy. (C) Probability density distribution over TPs on test trials plotted as a function of the rats’ choices. TPs reported by the rats as 2p trials are plotted in green, and trials classified by the rats as 1p trials are plotted in orange. The mean of each distribution is indicated by a dashed vertical line of the corresponding color. The 1p distribution (orange) is shifted with respect to the 2p distribution (green), clearly indicating that the rats’ reports are a function of TP. The main plot shows density across all rats and sessions. The inset shows one sample session of one rat. (D) Logistic function fitted to a sample session of one rat from C. The descending curve shows that as TP duration increases, the probability of choosing the 1p port on test trials increases and the probability of choosing the 2p port decreases. The color scale ranges from green to orange, corresponding to the reward assignment (Fig. 1). Data points at the top (green, 2p choice) and at the bottom (orange, 1p choice) of the plot show individual TPs. The color code signifies the 1p and 2p choices of rat “7” in session 32.
(E) GLMM fit including TP as predictor. Data from 14 rats were used to fit the model. Two types of shaded areas display CIs at the population level. The narrow shading displays population-level CIs. The wider shading displays population-level predictions that take into account uncertainty in the variance parameters (21); thus, the wider shading can be considered a prediction interval (22). As in D, the color progression from green to orange depicts the increasing probability of a 1p port choice as a function of TP. The inset displays the estimated choice odds for the TP predictor.

Test trial accuracy neglects the continuous link between choice and TP error. We therefore verified that the rats showed an increased probability of 1p choice as a function of TP error (Fig. 2). We used a generalized linear mixed model (GLMM) predicting the probability of choosing the 1p port versus the 2p port given the TP error [Fig. 2; β = 3.29, SE = 0.30, z(19,512) = 11.01, P < 10⁻¹⁵]. Rats exhibited choice uncertainty for TPs close to their internal threshold (Fig. 2). The better a rat’s internal threshold aligns with the threshold Θ, the higher its expected choice accuracy. Overall, both analyses showed that rats chose between the 1p and 2p options with relatively high accuracy. This suggests that subjects represent the relation between the duration of the just-produced interval and their internal target interval. That internal target is longer than the required interval and is chosen based on the level of temporal uncertainty. However, the representation of the required interval, the margin of error, and the representation of the time elapsed during an attempt to produce the required interval all have limited precision. The uncertainty in the port choice must reflect the uncertainties in these first-order variables, on which it depends. Therefore, near-perfect accuracy is not to be expected.
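The link between TP and port choice can be illustrated with a plain logistic regression of P(choose 1p) on TP, fitted by Newton–Raphson. This is a single-subject simplification of the GLMM described above, with no random effects. It is run on synthetic choices generated with a made-up slope of 3 per second and a made-up 50% point of 4.2 s. The TP at which the fitted curve crosses 0.5 plays the role of the internal threshold, around which choices are most uncertain.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(x, y, n_iter=25):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by Newton-Raphson; returns (b0, b1)."""
    xm = sum(x) / len(x)
    xc = [xi - xm for xi in x]  # centering x keeps the iterations well conditioned
    a0 = a1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(xc, y):
            p = sigmoid(a0 + a1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p          # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w              # Fisher information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a0 += (h11 * g0 - h01 * g1) / det  # Newton step: inverse(H) @ gradient
        a1 += (h00 * g1 - h01 * g0) / det
    # Undo the centering: b0 + b1 * x == a0 + a1 * (x - xm)
    return a0 - a1 * xm, a1


rng = random.Random(1)
tps = [rng.uniform(3.2, 6.0) for _ in range(2000)]  # synthetic TPs (s)
chose_1p = [1 if rng.random() < sigmoid(3.0 * (tp - 4.2)) else 0 for tp in tps]
b0, b1 = fit_logistic(tps, chose_1p)
print(f"slope = {b1:.2f} per s, 50% point = {-b0 / b1:.2f} s")
```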

Rats Keep Track of Temporal Errors and Reward History.

We next investigated whether the rats used sources of information other than TP errors to choose accurately on test trials. Recent work showed that the local mean or variance of TP is not stable over time in monkeys (23) or in humans (8). This nonstationarity of TP was also observed here. A sample plot (Fig. 3) depicts changes in the local mean of TP over the course of one sample session [we refer to the moving mean of the last n trials as μ(TPn)]. We hypothesized that, when faced with test trials, rats could use the fluctuating μ(TP) by keeping track of previous trials. Rats are known to track the reward history (RH) of previous trials (10), including in situations in which rewards are delivered on the left or on the right side of a main port (24, 25). In the current paradigm, the location of the reward delivery port depended on the TP error. Thus, as TPs were nonstationary, rats could have collected more rewards in just one port, in turn influencing their choices. Indeed, reward on previous trials contributed to test trial choices. We tested the hypothesis that rats based their choices on the RH. For example, if most of the recent trials were TPLE, the rat would mostly have been rewarded in the 1p port, increasing the chance that it chooses that port. To quantify the RH, we calculated the relative amount of reward received in the 1p and 2p ports. To test whether rats relied on RH, we extended the basic logistic regression model by adding the moving average of the n previous trials (μRHn) as an expression of RH. We kept the current TP in the same model and added RH over different numbers of preceding trials (μRH10, μRH20, or μRH30 for window sizes of 10, 20, or 30 trials). This allowed us to assess their relative contributions to choice behavior. Independently of the number of trials considered (that is, of the history window size), both the current TP and μRHn contributed to choice behavior, as evidenced by large values of the estimated choice odds [Fig. 3; RHn: β = 0.60, SE = 0.094, z(16,423) = 6.37, P < 10⁻⁹; μRH10: β = 0.47, SE = 0.041, z(16,423) = 11.34, P < 10⁻¹⁴; μRH20: β = 0.35, SE = 0.064, z(16,423) = 0.64, P = 10⁻⁷; μRH30: β = 2.35, z(16,962) = 4.54, P < 10⁻⁵].
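The history predictors can be sketched as trailing windows over the trials that precede the current one. The text specifies only that RH reflects the relative amount of reward received in the 1p and 2p ports. The share-of-pellets definition below is therefore an assumption. So are the value 0.5 used when no pellet was collected in the window and the choice to return None until a full window is available. The same trailing mean, applied to TPs, gives μTPn. All function names are hypothetical.

```python
from collections import deque
from typing import List, Optional, Sequence


def trailing_mean(values: Sequence[float], n: int) -> List[Optional[float]]:
    """Mean of the n values preceding each trial (current trial excluded)."""
    window: deque = deque(maxlen=n)
    out: List[Optional[float]] = []
    for v in values:
        out.append(sum(window) / n if len(window) == n else None)
        window.append(v)
    return out


def reward_history(pellets_1p: Sequence[int], pellets_2p: Sequence[int],
                   n: int) -> List[Optional[float]]:
    """Share of pellets collected at the 1p port over the previous n trials."""
    out: List[Optional[float]] = []
    for t in range(len(pellets_1p)):
        if t < n:
            out.append(None)  # not enough preceding trials yet
            continue
        a = sum(pellets_1p[t - n:t])
        b = sum(pellets_2p[t - n:t])
        out.append(a / (a + b) if a + b > 0 else 0.5)
    return out


# Per-trial pellets collected at each port (hypothetical session excerpt)
p1 = [1, 0, 0, 1, 0]
p2 = [0, 2, 0, 0, 2]
print(reward_history(p1, p2, n=2))             # [None, None, 0.333..., 0.0, 1.0]
print(trailing_mean([3.0, 4.0, 5.0, 6.0], 2))  # [None, None, 3.5, 4.5]
```

Each resulting series would then enter the logistic model as an extra predictor next to the current TP.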

Fig. 3.

Rats keep track of temporal errors and RH. (A) TPs (black) plotted across an example experimental session. The shaded red rectangle depicts a moving average window. The red trace marks the moving average of 20 TPs; that is, each data point of the moving average is the average of the preceding 20 trials. Fluctuation of the moving average illustrates the nonstationarity of TP. (B) Results of the GLMM fit, including the current TP and the running average of the last 10, 20, and 30 rewards (reward history). Data from 14 rats were used to fit the model. The formula in the plot shows the specification of fixed terms in the GLMM. Asterisks indicate significant terms. Error bars display CIs of the estimated effects. Similar to Fig. 2, the inset depicts the logistic fit for the current trial (circle) and RH (μRW10, triangle). (C) Results of the GLMM fit, including the current TP and the running average of the last 10, 20, and 30 TPs, as a proxy for RH. Data from 14 rats were used to fit the model. Significant model terms indicate that rats keep track of temporal errors on the current TP as well as of TP history. Similar to Fig. 2, the inset depicts the logistic fit for the current trial (circle) and TP history (μTP30, triangle).

Previous TPs can influence the current TP (26–28). We therefore also considered the possibility that RH, due to previous TPs (μTPn), could influence choices on test trials. As the amount of reward was conditional on TP, the mean TP in the n previous trials (μTPn) is a proxy of RH that incorporates TP history. Indeed, both the current TP and μTPn contributed to choice behavior, as evident in the large values of the estimated choice odds [Fig. 3; TPn: β = 2.66, z(16,962) = 7.45, P < 10⁻¹³; μTP10: β = 1.24, z(16,962) = 3.86, P < 10⁻³; μTP20: β = 0.50, z(16,962) = 0.91, P = 0.36; μTP30: β = 2.35, z(16,962) = 4.54, P < 10⁻⁵]. Similar results were obtained for rewarded and unrewarded trials. Remarkably, both factors, TP and RH, remained significant, indicating that rats rely on both sources of information while choosing on test trials.
Together, these results show that during choice behavior, rats take into account both factors: their temporal error on the current trial (TPn) and the RH (μRHn). The contribution of both factors demonstrates the richness of the representation of elapsed time, including error-monitoring abilities together with tracking of previous behavioral outcomes.

Inferred Timing Errors Can Guide Future Behavior.

As rats reported their errors, we asked whether they used this ability to guide future behavior. The first possibility is that temporal errors are used only in the experimental context to obtain rewards. The second possibility is that temporal errors can also be used for error adjustment on adjacent trials. Meck et al. (29) showed that when rats estimated 10- or 20-s time intervals, the probability of trial-to-trial alternation between durations shorter or longer than the median was above chance level. Alternation of shorter and longer trials in an ordered sequence was a signature of “threshold hunting.” We investigated this in our dataset by analyzing errors on consecutive trials, referred to as relative errors (re; Fig. 4). Focusing on relative errors instead of raw TPs highlights local adjustments of TP. Such adjustments would otherwise be confounded with other phenomena involving slow TP changes across multiple trials. If there is no representation of re, no adjustment of the next TP is possible. Therefore, no relation between the error on the nth TP (reₙ) and the error on the (n + 1)th TP (reₙ₊₁) would be expected. In contrast, if temporal errors are used, a small reₙ will be followed by a large reₙ₊₁ and vice versa, resulting from adjustment of consecutive relative errors. We refer to this behavior as threshold hunting (29, 30).

Fig. 4.

Inferred timing errors guide future behavior. (A) The error estimation hypothesis predicts adaptation on trial n + 1, the so-called “hunting” behavior. (Top) We considered triplets of trials in which the first two trials were 2p trials. The re in TP on trial n is calculated by subtracting the TP of the first trial from the TP of the second trial in a given triplet (reₙ = “n” − “n − 1”). The re in TP on trial n + 1 is calculated by subtracting the TP of the second trial from the TP of the third trial in a given triplet (reₙ₊₁ = “n + 1” − “n”). The bottom panels depict the predictions for cases in which temporal errors are or are not used to guide behavior. If temporal error monitoring does not guide behavior, no correlation between reₙ and reₙ₊₁ is predicted. However, the error-monitoring hypothesis predicts a negative association between reₙ and reₙ₊₁, depicted in the bottom right panel. (B) reₙ values were categorized into short and long bins based on reₙ magnitude. Values of reₙ₊₁ are plotted against the short and long reₙ bins. Gray lines connect data points for individual rats. Dotted lines represent the HOLD group, and solid lines represent the PRESS group. To compute an index of error adjustment (the “hunting index”), we subtracted the reₙ₊₁ of the long bin from that of the short bin. The values of the hunting index for each rat are plotted in C along the y-axis. (Top Inset) The values of the hunting index plotted for both groups. No between-group difference was found [t(12) = 0.40, P = 0.70, t test]. (Bottom Inset) Results of the GLMM fit, including the current reₙ as a predictor of reₙ₊₁ for each individual rat. Each line depicts the correlation slope for one rat. This negative relationship between reₙ and reₙ₊₁ was confirmed with linear mixed-effects regression [β = −0.36, t(12) = −5.70, P < 10⁻³, GLMM], in line with the analysis in the main plot in B.
(C) The blue (HOLD group) and red (PRESS group) data points display the hunting index for each rat. It is plotted against overall choice accuracy on test trials (Left) and against the values of the current TP (Right) and TP history (Right Inset) estimated from the same model fit. The lines depict the resulting fits of a robust regression model. The shaded gray area represents CIs.

We controlled for the effects of reward magnitude on TP by focusing on triplets of trials in which the first two trials were accurate trials rewarded with 2p (TPSE-2p). As reward magnitude calibrates performance variability on the next trial (23, 31), we verified that TP variability on the 2p trials was unaffected by reward magnitude. reₙ was calculated as the signed difference in temporal error between the second and the first trial in each triplet (second TPSE-2p − first TPSE-2p; Fig. 4). Likewise, reₙ₊₁ was calculated as the difference between the third and the second trial in each triplet (third TP − second TPSE-2p; Fig. 4). Each triplet resulted in one pair of reₙ and reₙ₊₁. We split the reₙ values into short and long bins of the same count. Based on the short and long reₙ bins, we then evaluated reₙ₊₁ (Fig. 4). In line with the error-monitoring hypothesis, reₙ₊₁ in the long reₙ bin was shorter than reₙ₊₁ in the short reₙ bin [Fig. 4; β = −0.096, t(2,441) = −7.34, P < 10⁻¹²]. Thus, after being rewarded with 2p on the nth trial, rats adjusted their re on the next trial, suggesting threshold-hunting behavior. If threshold-hunting behavior is a behavioral proxy for error monitoring, the degree to which individual rats exhibited hunting should be associated with error-monitoring metrics. These metrics are test trial accuracy and the estimated choice odds given the current TP and TP history. A hunting index was calculated by subtracting the long-bin reₙ₊₁ from the short-bin reₙ₊₁. As Fig. 4, Left, shows, the larger the hunting index, the larger the average accuracy [Fig. 4; t(12) = 3.25, P = 0.007; robust regression]. As revealed by previous analyses, overall test trial accuracy was influenced by two factors: current TP and TP history. To assess whether they relate differently to the hunting index, we retrieved the estimated GLMM parameters for TP and μTP30 for each rat. We chose μTP30 because this factor showed the highest estimates among the TP history factors in the previous analyses. If the degree of hunting is a proxy for error monitoring, the larger the hunting index, the larger the current TP estimates should be. The opposite prediction holds for TP history (μTP30). As suggested by Fig. 4, Right, the rats that relied more on the current TP also exhibited the largest hunting index [t(11) = 7.44, P < 10⁻⁴]. Those that relied more on TP history exhibited the smallest hunting index [t(11) = −4.19, P = 0.002; Fig. 4, Right Inset]. The addition of μTP30 to the model including TP was statistically justified [Wald test: W(1) = 17.58, P < 10⁻⁴]. Together, while controlling for the plausible effects of variance, we demonstrated hunting behavior in rats. The association of hunting behavior with indices of choice accuracy suggests that rats can use estimated errors to guide future behavior. This supports a functional role of error monitoring in guiding adaptive behavior.
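The triplet analysis can be sketched as follows. The relative error on trial n (reₙ) is the second TP minus the first, and reₙ₊₁ is the third TP minus the second. Following the text, the hunting index is the mean reₙ₊₁ in the short reₙ bin minus that in the long bin. Several details are our assumptions. Triplets may overlap, so each trial can serve as the middle trial of one triplet. The outcome label 'SE-2p' marks TPSE-2p trials. The equal-count bins are formed by sorting on reₙ, and the middle pair is dropped when the count is odd. The function names are hypothetical.

```python
from statistics import fmean
from typing import List, Sequence, Tuple


def relative_error_pairs(tps: Sequence[float],
                         outcomes: Sequence[str]) -> List[Tuple[float, float]]:
    """(re_n, re_n+1) for each triplet whose first two trials were TPSE-2p."""
    pairs = []
    for i in range(1, len(tps) - 1):
        if outcomes[i - 1] == "SE-2p" and outcomes[i] == "SE-2p":
            re_n = tps[i] - tps[i - 1]      # second TP minus first TP
            re_next = tps[i + 1] - tps[i]   # third TP minus second TP
            pairs.append((re_n, re_next))
    return pairs


def hunting_index(pairs: Sequence[Tuple[float, float]]) -> float:
    """Mean re_n+1 in the short re_n bin minus mean re_n+1 in the long bin."""
    if len(pairs) < 2:
        raise ValueError("need at least two (re_n, re_n+1) pairs")
    ordered = sorted(pairs, key=lambda p: p[0])
    half = len(ordered) // 2  # equal-count bins; a middle pair is dropped if odd
    short_bin = [p[1] for p in ordered[:half]]
    long_bin = [p[1] for p in ordered[len(ordered) - half:]]
    return fmean(short_bin) - fmean(long_bin)


tps = [4.0, 4.4, 4.1, 4.5, 4.2]
outcomes = ["SE-2p"] * 5
pairs = relative_error_pairs(tps, outcomes)
print(round(hunting_index(pairs), 3))  # 0.7
```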

Discussion

Although primates and rodents can report their confidence in perceptual tasks (e.g., refs. 32–35), whether rodents can monitor the errors of self-generated behavior, such as the production of a time interval, was not known. Our results provide evidence for self-monitoring of temporal errors in rats. During timing behavior and its evaluation, rats displayed behavioral characteristics suggesting an online assessment of their temporal errors. This is in line with the idea that rats represent rich, experiential variables. Previous studies suggested that rats may take into account their internal temporal uncertainty during timing behavior. However, for all of these studies, an alternative explanation has been offered for the behavior resembling error monitoring (for review, see ref. 5). For example, Foote and Crystal (36) trained rats to discriminate short and long stimulus durations. On trials close to the discrimination boundary, rats chose a smaller but certain reward rather than risk choosing short or long and receiving no reward. The alternative explanation is that rats learned differential reinforcement for different durations (15), which calls into question the monitoring of temporal errors. However, if animals were responding according to two different memory representations in our TP task, a bimodal TP distribution would have been expected, which we did not observe. Our paradigm overcame these previous difficulties by employing time production instead of stimulus duration discrimination. By adapting timing thresholds individually for each rat, we forced the animals to remain close to their individual limits of time discrimination. Additionally, by designing a way to report temporal errors, we avoided relying on the whole distribution of responses. The response distribution is only an indirect means of inferring the plausibility of monitoring internal uncertainty (1, 2).
Instead, rats reported estimated errors in their timing behavior on a trial-by-trial basis to maximize reward. Their behavior was based on both the mean and the precision of their representation of the elapsed interval. Remarkably, the rats were able to track their error on the current trial. This indicates that they kept track of the internal variables on which decisions were based and of the relation of those variables to the target value. Along with tracking their error on the current trial, they kept a history of recent reinforcements to maximize the amount of reward in a given amount of time. This suggests that, in addition to their capacity to store in memory multiple parameters of their behavior (4), rats have an exquisite ability to monitor their self-generated behavior. Furthermore, test trial accuracy was associated with the size of the TP correction between consecutive trials. This suggests that rats can use estimated errors to guide future behavior, supporting a functional role of error monitoring in guiding adaptive behavior. One limitation of the current study is the use of a single, very specific reward assignment rule. On the one hand, it forced the rats to be more precise, as TPs close to the objective threshold were rewarded with 2p. On the other hand, the distribution had to be titrated in proportion to the reward size. Thus, the question remains of how rats would perform with different reward assignment rules. The question is important, as it touches on the relation between levels of temporal uncertainty and the ability to report temporal errors. Investigating different levels of precision in the future will help clarify the properties of the representations that rats use to keep track of time. Modification of the reward assignment rule should also allow us to investigate whether rodents, like humans, can monitor the sign of their temporal errors (8, 9).
Our finding of error monitoring of timing behavior in rats is markedly similar to recent findings in humans, in which participants were able to report the signed magnitude of their temporal errors (8, 9). We do not claim that rats and humans use the same cognitive machinery for monitoring temporal errors. Nevertheless, our results indicate that rats use basic computations allowing error monitoring, whereby the monitoring process actively infers the state of an “internal clock.” Ultimately, these results deepen our understanding of error-monitoring abilities in rodents. They demonstrate a richness of representation of elapsed time that goes beyond a trial-and-error strategy in interval timing. Together, by virtue of our approach to investigating animal cognition, the present findings show that monitoring of internal uncertainty is not a privilege of primates. The current study contributes to the ongoing discussion on self-monitoring (32).

Methods

Subjects.

The subjects were male Sprague–Dawley rats (n = 16; Envigo), 3 mo old at the beginning of the experiments. The rats were housed in groups of four and kept under a 12:12 h light/dark cycle. The experiments were performed during the light phase. After 1 wk of acclimation to the vivarium with ad libitum access to food and water, rats were food restricted so that their weights reached and were maintained at 85% of their free-fed weight. All rats were handled, weighed, and fed daily until the end of the experiment. Two experimental sessions were scheduled per day for all rats. The first and second sessions of the day were separated by ∼3 h, during which rats returned to their home cages with access to water. The experiment was carried out in accordance with the guidelines established by the European Communities Council Directive (2010/63/EU) for compliance and use of laboratory animals and approved by the French Ministry of Research and the French National Ethical Committee (2013/6).

Apparatus.

The experimental setup was built using the Habitest Modular System (Coulbourn Instruments) with an operant box (Rat Test Cage, H10-11R-TC) placed in an isolation cubicle (H10-24A). Fig. 1 depicts the spatial arrangement of the behavioral box. On the left wall, a retractable lever (H23-17RC) was placed between two reward ports (H14-01R), each connected to a feeder (H14-23R) delivering grain-based pellets (45 mg, Phymep, Dustless Precision Pellets) and equipped with photocell sensors (H20-94) to detect port entries. The lever was 7 cm from the feeders and 6.5 cm above the floor grid. Each reward port contained a light-emitting diode that served as a reward cue after each successful TP. A red house light (H11-01R-LED) and a speaker (H12-01R) were placed on the right wall. The system was controlled, and data were collected, using Graphic State 4 software. The data were converted to text files and analyzed using custom-written R code.

Procedure: Pretraining.

Pretraining proceeded in several steps. The first step was feeder training (two sessions), which consisted of familiarizing the rats with the reward ports and with the pairing of the pellet drop and the onset of the reward port light. A trial started after an intertrial interval (ITI), randomly chosen from 10 to 15 s (in 1-s steps), with a pellet drop that coincided with the onset of the light in the reward port. The light stayed on for up to 2 s unless the pellet had been picked up. Rats had 7 s to pick up the pellet, and the next trial started when the pellet was collected or when 7 s had elapsed. The session ended when the rats had collected 40p. The side of reward delivery was pseudorandomized. In the second pretraining step (port light training, 10 sessions), the rats were required to nose poke in the reward port to receive the reward. The maximal duration of the port light cue was set to 5 s; all other aspects of the task remained unchanged. When all rats successfully collected rewards on more than 80% of trials in two consecutive sessions, we moved to the third step. In the third step (lever press training, three sessions), the rats were required to press a lever to activate the light cue. Each trial started with the presentation of the lever, which the rats had to press to turn on a port light cue (randomly assigned to the left or right side); the cue then stayed on, waiting for the rat to poke in for reward delivery. The lever was retracted after a lever press or after 5 s had elapsed, and a new ITI started. Rats could earn up to 50p in one session; otherwise, the session ended after 30 min. All rats reached a criterion of 80% rewarded trials in two consecutive sessions.

Duration and Precision Training.

Two groups of rats were formed, corresponding to two different ways of producing durations. One group (n = 8; HOLD group) was trained to hold the lever down for the target duration; the lever press and the lever release served as the onset and offset of the produced duration, respectively. The second group (n = 8; PRESS group) was trained to produce a duration delimited by two lever presses; the first and second lever presses served as the onset and offset, respectively. Generating a second press likely involves more degrees of freedom in the rat’s motor routine, and this slightly more complex motor pattern likely translated into a difference in TP precision (Fig. 1). Four operant boxes were used to train the rats, and two rats from each group (PRESS and HOLD) were assigned to each box. Within each home cage, two rats were assigned to PRESS and two to HOLD to counterbalance any “home cage” effects. We refer to the time interval that the rat produced on any given trial as TP. To train the rats to produce a duration of 3.2 s, which we refer to as the target duration (T), we followed the procedure used by Platt et al. (37). We started with a T of 0.4 s, for which rats obtained access to the reward only if they produced a duration of at least 0.4 s. To train the rats on longer durations, we progressively increased T through logarithmically spaced targets between 0.4 and 3.2 s (T: 0.4, 0.61, 0.92, 1.39, 2.11, and 3.2 s). For each rat and each session, a density function was estimated for the TP distribution, together with the proportion of the area under the curve that exceeded the criterion T. A longer T was introduced to all rats when 80% of the area under the curve exceeded the T criterion in two consecutive sessions. In the remaining parts of training, however, thresholds were computed individually.
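The target schedule and the advancement rule described above can be sketched in Python. This is a minimal illustration, not the authors' R code: the Gaussian kernel and the normal-reference bandwidth (`1.06 * SD * n**(-1/5)`) are assumptions, since the text does not say how the density was estimated, and all function names are hypothetical. Geometric spacing from 0.4 to 3.2 s in six steps reproduces the listed targets.

```python
import math
import statistics


def log_spaced_targets(t_min=0.4, t_max=3.2, n=6):
    """Geometrically (log-)spaced target durations in seconds, rounded to 0.01 s."""
    ratio = (t_max / t_min) ** (1 / (n - 1))
    return [round(t_min * ratio ** k, 2) for k in range(n)]


def kde_area_above(tps, target):
    """Area of a Gaussian kernel density estimate of TPs lying above `target`.

    Assumed bandwidth: normal-reference rule h = 1.06 * SD * n**(-1/5).
    Each kernel N(tp, h^2) contributes its upper tail beyond the target.
    """
    n = len(tps)
    h = 1.06 * statistics.stdev(tps) * n ** (-1 / 5)
    tails = [0.5 * math.erfc((target - tp) / (h * math.sqrt(2))) for tp in tps]
    return sum(tails) / n


def ready_for_next_target(session_areas, criterion=0.8):
    """True when the last two sessions both have >= 80% of the density above T."""
    return len(session_areas) >= 2 and all(a >= criterion for a in session_areas[-2:])


print(log_spaced_targets())  # the six training targets
```

The kernel estimate, rather than the raw fraction of TPs above T, follows the text's mention of an area under a density curve; with many trials the two give nearly the same number.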
On any given trial of duration and precision training, the lever was inserted after an ITI of 4 to 8 s (in 1-s increments). The rats had unlimited time to initiate their interval production by pressing the lever. When the TP (holding, or pressing twice, depending on group assignment) was too short (TP < 3.2 s), the lever was retracted, and the trial restarted with the insertion of the lever after a random ITI. After rewarded trials [3.2 s < TP < 2 × σ(TP)], rats had between 4 and 8 s to collect their reward, after which the lever was presented again. This trial structure remained unchanged until the end of the experiment. Once all subjects achieved the criterion for correct performance on the final T (3.2 s), a bound on longer durations was introduced that equated to 2 × σTP, meaning that rats were not reinforced on trials in which their TP exceeded the mean TP by more than two SDs (σTP), calculated over the two previous sessions. This procedure was intended to force the animals to be precise in their TPs. To signal to the rats that a TP was too long, a tone (4 kHz, 1 s, 75 dB) was played after the TP ended. The tone distinguished too-long, no-reward trials from too-short, no-reward trials. The ITI was shortened by the tone duration, resulting in ITI values ranging from 3 to 7 s. During precision training, the threshold for too-long TPs was recomputed for each rat every two sessions. TP precision increased over the course of precision training; however, we did not define a strict stopping criterion, as the changed payoff in the next training step could have caused a further increase in TP precision. Within one session, the rats could obtain up to 100p, and the session was limited to 45 min. Rats adjusted their behavior in response to the tone by producing a narrower TP distribution, rendering their TPs more precise.
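A sketch of the trial-outcome rule during duration and precision training. It assumes that the long-TP bound means "mean + 2 SD" of the TPs pooled over the two previous sessions, which is our reading of the text; the function names are hypothetical.

```python
import statistics


def long_bound(previous_tps):
    """Upper TP bound: mean + 2 SD of TPs pooled over the two previous sessions
    (assumed interpretation of "two SDs away from the mean TP")."""
    return statistics.mean(previous_tps) + 2 * statistics.stdev(previous_tps)


def classify_trial(tp, target=3.2, upper=None):
    """Outcome of one trial; upper=None corresponds to duration training,
    where long TPs were not yet penalized."""
    if tp < target:
        return "too_short"   # lever retracted, no reward, new ITI
    if upper is not None and tp > upper:
        return "too_long"    # 4 kHz tone, no reward
    return "rewarded"        # reward port lit


bound = long_bound([3.4, 3.9, 4.1, 3.6, 4.4, 3.8])  # made-up TPs (s)
print(bound, classify_trial(5.2, upper=bound))
```

Recomputing `upper` every two sessions, as described above, lets the bound tighten as the TP distribution narrows.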
During duration training, the port-light cue always indicated the reward port in which a reward was available. The reward port was randomly chosen with 50% probability. All rats chose the cued port first with accuracy above 95%. Duration training took 34 sessions, and precision training took 12 sessions.

Error-Monitoring Training.

We refer to the next phase as error-monitoring training; it lasted 36 sessions. In this phase, the rats learned that reward size depended on the size of their TP error. To assign reward size to TP errors, we estimated a threshold (Θ) based on each individual TP distribution. Fig. 1 depicts a hypothetical TP distribution for a given animal. More accurate TPs (TPSE, for TP Small Error), between T and Θ, were rewarded with 2p in one port (Fig. 1, green area). Less accurate TPs (TPLE, for TP Large Error), between Θ and 2 × σ(TP), were rewarded with 1p in the other port (Fig. 1, orange area). Θ was estimated as the point dividing the green and orange areas in a 1:2 ratio (Fig. 1). The first Θ was computed from each animal's performance in the last two sessions of precision training. The optimal ratio of 2p to 1p trials, calculated as the number of TPSE (2p) trials divided by the number of TPLE (1p) trials, was 0.5; this ratio guaranteed that an equal number of pellets was delivered in both ports. Thresholds Θ were computed for each animal individually. Θ was updated when a rat's ratio deviated from the optimal ratio by a factor of 3, that is, when it fell below the lower bound (0.16) or exceeded the upper bound (1.5). Individual thresholds were recalculated when these bounds were exceeded in two consecutive sessions. The contingencies assigning TPs to reward ports are depicted in Fig. 1. Specifically, on a given trial, if the TP fell between T and 2 × σ(TP), either the left or the right port was lit, chosen at random. On TPSE trials (green area), animals received 2p in one port (the 2p port) or 0p in the other (the 1p port). We refer to TPSE trials rewarded with 2p and 0p as TPSE-2p and TPSE-0p, respectively. A TPSE on a given trial therefore resulted in a 50% chance of a TPSE-2p or a TPSE-0p outcome (Fig. 1, arrows pointing to the 2p and 1p ports).
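One way to compute Θ and its update rule, sketched under stated assumptions: Θ is taken as the empirical one-third quantile of the rewarded TPs (the text does not specify the estimator), so that TPSE and TPLE trials occur in a 1:2 ratio. With 2p per TPSE trial and 1p per TPLE trial, both ports then deliver the same number of pellets. Function names are hypothetical.

```python
import math


def estimate_theta(tps, target, upper):
    """Θ such that about 1/3 of rewarded TPs (target <= TP <= upper) are TPSE.

    Assumed estimator: the ceil(n/3)-th smallest rewarded TP.
    """
    eligible = sorted(tp for tp in tps if target <= tp <= upper)
    k = math.ceil(len(eligible) / 3)  # number of TPSE trials
    return eligible[k - 1]


def needs_update(n_tpse, n_tple, optimal=0.5, factor=3.0):
    """True if the TPSE:TPLE (2p:1p) ratio leaves [optimal/factor, optimal*factor],
    i.e. roughly [0.16, 1.5]."""
    if n_tple == 0:
        return True
    ratio = n_tpse / n_tple
    return ratio < optimal / factor or ratio > optimal * factor


# With Θ on target, 3 TPSE x 2p equals 6 TPLE x 1p: equal pellets per port.
print(needs_update(3, 6))
```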
For less accurate TPs that landed in the orange area, animals received 1p in the 1p port or 0p in the 2p port. We refer to TPLE trials reinforced with 1p and 0p as TPLE-1p and TPLE-0p, respectively. The assignment of the 1p and 2p ports to the left and right sides was counterbalanced across animals. On average, the subjects entered the lit port on 95% of test trials. Entering the unlit port terminated the trial.
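The contingencies can be written as a lookup table. In this hypothetical sketch, a TP exactly at Θ counts as TPLE, a boundary convention the text does not specify, and the lit port is drawn with probability 0.5.

```python
import random

# Pellets delivered for each (trial type, entered port) pair.
PELLETS = {
    ("TPSE", "2p"): 2,  # TPSE-2p
    ("TPSE", "1p"): 0,  # TPSE-0p
    ("TPLE", "1p"): 1,  # TPLE-1p
    ("TPLE", "2p"): 0,  # TPLE-0p
}


def trial_type(tp, target, theta, upper):
    """Classify a TP; None means no port is lit (too short or too long)."""
    if tp < target or tp > upper:
        return None
    return "TPSE" if tp < theta else "TPLE"


def training_trial(tp, target, theta, upper, rng=random):
    """One training trial: light one port at random and return
    (lit_port, pellets earned if the rat enters the lit port)."""
    kind = trial_type(tp, target, theta, upper)
    if kind is None:
        return None
    lit = rng.choice(["2p", "1p"])
    return lit, PELLETS[(kind, lit)]


print(training_trial(3.4, target=3.2, theta=3.5, upper=5.0, rng=random.Random(1)))
```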

Error-Monitoring Test Sessions.

To test whether animals were able to report temporal errors, we used the same trial structure as in error-monitoring training but added choice trials (Fig. 1). On 80% of trials, we used the same paradigm as in the error-monitoring training sessions. On the remaining 20%, we introduced test trials in which both ports were lit, giving the animal an opportunity to choose according to the rule described in Error-Monitoring Training. Thus, on test trials, the rats could obtain a reward in either port, with a reward size (0p, 1p, or 2p) that depended on their TP. To maximize the reward rate, the animals should choose according to the accuracy of their TP, categorized as TPSE or TPLE. Choosing the 2p port on TPSE trials and the 1p port on TPLE trials would therefore indicate error-monitoring abilities. Within one session, rats could obtain up to 130p, and session time was limited to 65 min. In total, the rats performed 40 error-monitoring test sessions.
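The logic of the test trials can be checked with a short expected-value calculation. Assuming Θ keeps TPSE at one third of rewarded trials, always choosing the same port, or choosing at random, yields 2/3 pellet per test trial, whereas choosing according to the size of one's own error yields 4/3. Only error monitoring therefore raises the reward rate. The policy encoding below is our own illustration.

```python
from fractions import Fraction


def expected_pellets(p_tpse, policy):
    """Expected pellets on a test trial (both ports lit).

    p_tpse: probability that the TP is a TPSE (1/3 when Θ is on target).
    policy: trial type -> probability of choosing the 2p port.
    """
    p_tple = 1 - p_tpse
    # The 2p port pays 2 on TPSE trials; the 1p port pays 1 on TPLE trials.
    return p_tpse * policy["TPSE"] * 2 + p_tple * (1 - policy["TPLE"]) * 1


third, half = Fraction(1, 3), Fraction(1, 2)
strategies = {
    "error monitoring": {"TPSE": 1, "TPLE": 0},
    "always 2p port": {"TPSE": 1, "TPLE": 1},
    "always 1p port": {"TPSE": 0, "TPLE": 0},
    "random": {"TPSE": half, "TPLE": half},
}
for name, policy in strategies.items():
    print(name, expected_pellets(third, policy))
```

The 1:2 split is what makes port biases uninformative: a rat that ignores its own TP cannot beat 2/3 pellet per test trial.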

Data Processing.

All analyses used custom-written R scripts. Datasets from individual sessions were removed only if the equipment malfunctioned or if fewer than five test trials were obtained in one of the ports. In total, we removed 50 individual datasets (per rat and per session) out of the 640 datasets from all test sessions (≈8%).

Statistical Analysis.

To account for the dependency between samples, that is, multiple observations per animal across sessions, we used linear mixed-effects models (LMM) (e.g., ref. 39) as implemented in the R package lme4 (version 1.1-21). LMM are regression models that take multiple levels of the data into account. In general, an LMM can be expressed as follows (40):

$$y_i = X_i\beta + Z b_i + \varepsilon$$

where yi is the response vector for the i-th grouping, β is a vector of fixed effects, ε is the error term, bi is a vector of random effects, and Xi and Z are the design matrices of the fixed and random effects. Perhaps the most straightforward and intuitive distinction between fixed and random effects is that fixed effects are assumed to be constant across experiments, whereas random effects are a random subset of the levels in the underlying population (41). Animals and sessions were random effects in the model and were allowed to vary in their intercept. P values were calculated from a type 3 ANOVA with the Satterthwaite approximation of degrees of freedom, using the lmerTest package in R (42). The mixed-effects approach was combined with model comparisons, which allow the best-fitting model to be selected systematically. Model comparisons were based on the Akaike Information Criterion (AIC). As we collected 40 test sessions for each rat, we considered adding a nested random-effect structure, reflecting that each rat could perform slightly differently in each session. To select the random-effect structure, we fitted a model including only the intercept term and compared different random-effect structures using χ² tests and AIC. We started with a model containing only the intercept term for different rats. Model comparison showed that adding an intercept term for each session, nested within rat, was justified (ΔAIC = 3,164, χ² = 3,262, P = 10⁻¹⁵).
Adding a random slope for each predictor was also justified when compared against the model with random intercepts per rat and per session (ΔAIC = 2,290, χ² = 2,300, P = 10⁻¹⁵). The random slopes reflected the possibility that each rat could perform differently. The random-effects definition remained unchanged for all assessed models.
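The model-comparison quantities used above can be illustrated with a small helper. The log-likelihood values are hypothetical, and the paper's fits were done with lme4 in R; this sketch only shows how AIC and a likelihood-ratio χ² with one degree of freedom are computed from two nested fits. Testing a variance component against zero places the null on the boundary of the parameter space, which makes this p value conservative.

```python
import math


def aic(log_lik, n_params):
    """Akaike Information Criterion: 2k - 2 log L (lower is better)."""
    return 2 * n_params - 2 * log_lik


def lrt_one_df(log_lik_reduced, log_lik_full):
    """Likelihood-ratio test for nested models differing by one parameter.

    Returns (chi-square statistic, upper-tail p value with df = 1),
    using P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    stat = max(2 * (log_lik_full - log_lik_reduced), 0.0)
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical fits: rat intercept only vs. rat + session-within-rat intercepts.
ll_rat, ll_nested = -5230.4, -5101.7
print(aic(ll_rat, 2) - aic(ll_nested, 3), lrt_one_df(ll_rat, ll_nested))
```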

Time Production Analysis.

The mean (μ) and SD (σ) of TP were calculated across sessions and across rats. To remove the left tail of the distribution, which reflects impulsive responses and motor errors, we analyzed only TPs longer than 2 s. On average, each rat produced 218 trials per session when trials shorter than 2 s were excluded and 223 when trials of all durations were included. To assess whether the SD of TP scaled with the mean of TP, we fitted an LMM with rat as a random effect, of the following form:

$$\sigma TP_{t,s,r} = \beta_0 + \beta_1\,\mu TP_{t,s,r} + b_{0r} + \varepsilon_{t,s,r}$$

Indices t, s, and r stand for trial, session, and rat, respectively. μTP indicates the mean TP calculated per session and per trial, included as a fixed effect with coefficient β1. b0 stands for the random effect of rat, and ε is the error term. To compare the two groups of rats, we included session as a random effect.
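As a simplified, hypothetical stand-in for the LMM above, the sketch below computes per-session TP summaries after keeping only TPs longer than 2 s and fits a pooled ordinary least-squares slope of SD on mean. It ignores the random effect of rat and is meant only to show what β1 measures: a positive slope indicates that TP variability grows with the mean.

```python
import statistics


def session_summary(tps, min_tp=2.0):
    """Mean and SD of TP in one session, keeping only TPs longer than min_tp."""
    kept = [tp for tp in tps if tp > min_tp]
    return statistics.mean(kept), statistics.stdev(kept)


def ols_slope(xs, ys):
    """Pooled ordinary least-squares slope of ys on xs (no random effects)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Made-up sessions (TPs in s); the 0.4-s press is dropped as left-tail noise.
sessions = [[0.4, 3.1, 3.5, 3.9], [3.4, 3.9, 4.6], [3.6, 4.3, 5.1, 4.0]]
means, sds = zip(*(session_summary(s) for s in sessions))
print(ols_slope(means, sds))
```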

Reaction Time Analyses.

To assess whether rats expressed confidence in their choices, we calculated differences in RT between rewarded and unrewarded trials for the 1p port (RT TPLE-1p − RT TPSE-0p [dark red]) and for the 2p port (RT TPSE-2p − RT TPLE-0p [green]). Differences in mean RT were calculated for each session and each rat; the LMM therefore included an intercept term for session, allowed to vary within each rat. To test whether rats expressed choice behavior on test trials, in which both ports were lit, RTs were also averaged per session and per rat, and test trials were contrasted with training trials. This LMM was formulated similarly to Eq. , including a random effect of rat.

Choice Accuracy on Test Trials.

Overall, we collected 21,129 test trials. The average number of test trials per rat per session was 37.9. Individual accuracies were tested against chance level with a Wilcoxon test, separately for each rat; the samples were the mean accuracies per session. The exact results are presented in . The last column of  indicates in how many sessions the accuracy of an individual rat was higher than 55%. Additionally,  displays the 2p/1p ratio obtained per session and per rat. The data displayed in  were used to analyze the association between accuracy and pellet ratio in Rats Choose Accurately on Test Trials, Expressing Error Monitoring. To assess the difference in accuracy between the PRESS and HOLD groups, the average accuracy was computed per session and per rat, and the data were tested using an LMM with a random intercept of session.
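The per-rat test against chance can be sketched as an exact one-sided Wilcoxon signed-rank test on session accuracies. The one-sided alternative (accuracy above 0.5), dropping sessions exactly at chance, and the use of doubled average ranks to handle ties exactly are assumptions of this illustration; the original analysis was run in R.

```python
def signed_rank_test(accuracies, chance=0.5):
    """Exact one-sided Wilcoxon signed-rank test that accuracy > chance.

    Returns (W_plus, p_value). Sessions exactly at chance are dropped; tied
    |differences| receive average ranks, doubled so all ranks are integers.
    """
    diffs = [a - chance for a in accuracies if a != chance]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks2 = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):           # positions i..j share ranks i+1..j+1
            ranks2[order[k]] = i + j + 2    # twice their average rank
        i = j + 1
    w_plus2 = sum(r for r, d in zip(ranks2, diffs) if d > 0)
    # Under the null, all 2**n sign patterns are equally likely.
    # counts[s] = number of patterns whose doubled positive-rank sum equals s.
    total = sum(ranks2)
    counts = [1] + [0] * total
    for r in ranks2:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]
    p_value = sum(counts[w_plus2:]) / 2 ** n
    return w_plus2 / 2, p_value


# Made-up session accuracies for one rat.
print(signed_rank_test([0.62, 0.58, 0.71, 0.49, 0.66, 0.60]))
```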

Predicting Choices Based on TP.

Accuracies for rats 1 and 2 in the HOLD group were only marginally significant, and these rats did not exhibit choice behavior in at least half of the sessions (Fig. 2). They were therefore excluded from the single-trial analyses. To illustrate the principle of these analyses, we fitted a simplified GLM in which the choices (2p port versus 1p port) of one rat (rat 7, PRESS group) in a single session (session 32) were predicted from the TP on the same trial. That GLM had the following form:

$$\log\frac{p_n}{1 - p_n} = \beta_0 + \beta_1 TP_n$$

where p_n is the probability of choosing the 2p port and TPn stands for the TP on a single trial n. To determine whether TP predicts the rats’ choices, we used a generalized version of the LMM (GLMM), suited to the binary outcome variable (2p port versus 1p port choices). The most basic GLMM had the following form:

$$\log\frac{p_{t,s,r}}{1 - p_{t,s,r}} = \beta_0 + \beta_1 TP_{t,s,r} + b_0 + b_{1s}$$

The log odds of the rats’ choices were modeled as a function of TP, expressed by the fixed effect β1TP; b0 + b1s express random intercepts of rat and session, with session nested within rat. Nested random intercepts of session within rat were kept for all remaining model fits. The model was fit using 19,512 test trials.
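The single-session GLM can be sketched as a logistic regression fitted by Newton–Raphson. The simulated session below (Θ = 3.6 s, 80% correct choices, 1 = 2p port) is hypothetical and only illustrates the expected sign: longer TPs, that is, larger errors, should lower the odds of choosing the 2p port. The sketch omits the random effects of the GLMM.

```python
import math
import random


def fit_logistic(x, y, n_iter=25):
    """Maximum-likelihood logistic regression of y (0/1) on one predictor x
    by Newton-Raphson. Returns (intercept, slope) on the original x scale."""
    mx = sum(x) / len(x)
    xc = [xi - mx for xi in x]  # centre x for numerical stability
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(xc, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score (gradient of the log-likelihood)
            g1 += (yi - p) * xi
            h00 += w                # Fisher information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0 - b1 * mx, b1


# Hypothetical session: 300 test trials, 80% of choices follow the TPSE/TPLE rule.
rng = random.Random(7)
theta = 3.6
tps = [rng.uniform(3.2, 4.6) for _ in range(300)]
choices = []
for tp in tps:
    correct = 1 if tp < theta else 0  # 1 = 2p port, 0 = 1p port
    choices.append(correct if rng.random() < 0.8 else 1 - correct)

intercept, slope = fit_logistic(tps, choices)
print(f"beta0 = {intercept:.2f}, beta1 = {slope:.2f}")  # beta1 expected negative
```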

Predicting Choices Based on TP and Reward History.

To investigate whether rats used sources of information other than the current-trial TP error to choose accurately on test trials, we first fitted a model incorporating lagged rewards and TPs from previous trials. We used the current TP and 15 other predictors (different lags), which were reward size (RH) or TP vectors lagged by n trials (n = 1, …, 15). We used an approach similar to that in the previous paragraph, with several extensions. The model took the following form:

$$\log\frac{p}{1 - p} = \beta_0 + \beta_1 TP + \sum_{n=1}^{15} \beta_{n+1} RH_n + b_0 + b_{1s}$$

that is, the basic GLMM was extended with a fixed effect of RH (RHn), where n stands for the trial lag. The model in which TP was included instead of RH was of the same form. Both models were fit using 19,309 test trials. To investigate cumulative effects of RH, we calculated the mean μRHn across the last n trials. One individual trace of nonstationary TP is shown in Fig. 3. All RH traces used in this analysis are shown in . Using these predictors, we fitted a model of the following form:

$$\log\frac{p}{1 - p} = \beta_0 + \beta_1 TP + \beta_{\mu RH_n}\,\mu RH_n + b_0 + b_{1s} + b_{\mu RH_n}\,\mu RH_n$$

where β1TP expresses a fixed effect of TP and βμRHn expresses a fixed effect of cumulative reward in the 1p and 2p ports over the last n trials. In addition to the nested session-by-rat random intercepts used in the previous models, we added a random slope for each rat (bμRHn). A model of the same structure was fit for the cumulative TP history μTPn. Both models were fit using 16,973 test trials. The difference of 2,539 trials relative to the basic model arose because averaging over previous trials required removing the first 30 trials of each session.
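Building the history predictors is mostly bookkeeping. This hypothetical sketch assumes that lags and trailing means are computed within a session over the preceding trials only (excluding the current trial), and that the first 30 trials are dropped so that a 30-trial mean is always defined; in practice, only the rows of test trials would then enter the GLMM.

```python
def lagged(values, n):
    """Value from n trials back; None for the first n trials of a session."""
    return [values[i - n] if i >= n else None for i in range(len(values))]


def trailing_mean(values, n):
    """Mean over the n preceding trials, excluding the current one (assumption)."""
    return [sum(values[i - n:i]) / n if i >= n else None for i in range(len(values))]


def model_rows(tp, rh, n_lags=15, window=30, burn_in=30):
    """Per-trial predictor rows within one session: current TP, lagged reward
    sizes RH1..RH15, and trailing means muRH and muTP over `window` trials.
    The first `burn_in` trials are dropped so that every predictor is defined."""
    lags = {k: lagged(rh, k) for k in range(1, n_lags + 1)}
    mu_rh = trailing_mean(rh, window)
    mu_tp = trailing_mean(tp, window)
    rows = []
    for i in range(burn_in, len(tp)):
        row = {"TP": tp[i], "muRH": mu_rh[i], "muTP": mu_tp[i]}
        row.update({f"RH{k}": lags[k][i] for k in lags})
        rows.append(row)
    return rows


# Made-up session of 40 trials: reward sizes 0, 1, 2 pellets cycling.
rh = [i % 3 for i in range(40)]
tp = [3.0 + 0.01 * i for i in range(40)]
print(len(model_rows(tp, rh)), model_rows(tp, rh)[0]["muRH"])
```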

Predicting Relative Errors Based on Error-Monitoring Accuracy.

To investigate whether rats adjusted their errors between trials, we identified triplets of trials in which the first two trials were accurate trials rewarded with 2p. In total, we identified 2,750 such triplets across all rats (Fig. 4). The data contained outliers (Fig. 4), which violate the normality assumption of linear regression. We therefore used a robust linear model, which minimizes the influence of outlying observations without removing them from the dataset. With a robust and efficient weighted least squares estimator (43), the least squares weights were adaptively estimated from the empirical distribution of the residuals; such robust methods are less sensitive to nonnormal distributions. All models in this section were implemented using the “robust” R package [version 0.5 (44)]. The fitted model predicted the hunting index from test trial accuracy (Fig. 4). To disentangle the contributions of different components of test trial accuracy to the hunting index, we used individual coefficients for each rat, expressing current-trial and previous-trial contributions. To this end, we used the parameters of the fitted GLMM described in Eq. , but with only two terms: β1TP and β2μTP30. We retrieved the coefficients for each rat and used them as predictors of the hunting index in a second robust regression (Fig. 4).
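To show what a robust fit buys in the presence of outliers, the sketch below uses Huber M-estimation by iteratively reweighted least squares. This is a deliberately simpler substitute, not the robust and efficient weighted least squares estimator of the “robust” R package; the tuning constant 1.345 and the median-absolute-residual scale are conventional choices, and the data are made up.

```python
import statistics


def weighted_line(x, y, w):
    """Weighted least-squares intercept and slope for a straight line."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return my - slope * mx, slope


def huber_line(x, y, k=1.345, n_iter=50):
    """Huber M-estimate of a line by iteratively reweighted least squares."""
    w = [1.0] * len(x)  # first pass is ordinary least squares
    for _ in range(n_iter):
        a, b = weighted_line(x, y, w)
        r = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust residual scale: median absolute residual / 0.6745.
        s = max(statistics.median([abs(ri) for ri in r]) / 0.6745, 1e-12)
        # Full weight for small residuals, down-weighting beyond k * s.
        w = [1.0 if abs(ri) <= k * s else k * s / abs(ri) for ri in r]
    return weighted_line(x, y, w)


# Made-up data: y = 1 + 2x plus small noise, with one gross outlier at x = 9.
x = list(range(10))
noise = [0.1, -0.2, 0.15, -0.1, 0.05, -0.15, 0.2, -0.05, 0.1, -0.1]
y = [1 + 2 * xi + e for xi, e in zip(x, noise)]
y[9] += 50
print("OLS slope:  ", round(weighted_line(x, y, [1.0] * len(x))[1], 2))
print("Huber slope:", round(huber_line(x, y)[1], 2))
```

The single outlier pulls the ordinary least-squares slope far from 2, whereas the Huber fit stays close to it while still keeping every observation in the dataset.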

1. Generalization of prior information for rapid Bayesian time estimation.

2. Time-based reward maximization.

3. A behavioral theory of timing.

4. What is consciousness, and could machines have it?

5. Theoretical implications of quantitative properties of interval timing and probability estimation in mouse and rat.

6. Optimal response rates in humans and rats.

7. The rat approximates an ideal detector of changes in rates of reward: implications for the law of effect.

8. Temporal context calibrates interval timing.

9. Mice infer probabilistic models for timing.

10. Metacognition in the rat.
