David B Kastner, Anna K Gillespie, Peter Dayan, Loren M Frank.
Abstract
Animal behavior provides context for understanding disease models and physiology. However, that behavior is often characterized subjectively, creating opportunity for misinterpretation and misunderstanding. For example, spatial alternation tasks are treated as paradigmatic tools for examining memory; however, that link is actually an assumption. To test this assumption, we simulated a reinforcement learning (RL) agent equipped with a perfect memory process. We found that it learns a simple spatial alternation task more slowly and makes different errors than a group of male rats, illustrating that memory alone may not be sufficient to capture the behavior. We demonstrate that incorporating spatial biases permits rapid learning and enables the model to fit rodent behavior accurately. Our results suggest that even simple spatial alternation behaviors reflect multiple cognitive processes that need to be taken into account when studying animal behavior.
SIGNIFICANCE STATEMENT Memory is a critical function for cognition whose impairment has significant clinical consequences. Experimental systems aimed at testing various sorts of memory are therefore also central. However, experimental designs to test memory are typically based on intuition about the underlying processes. We tested this using a popular behavioral paradigm: a spatial alternation task. Using behavioral modeling, we show that the straightforward intuition that these tasks just probe spatial memory fails to account for the speed at which rats learn or the types of errors they make. Only when memory-independent dynamic spatial preferences are added can the model learn like the rats. This highlights the importance of respecting the complexity of animal behavior to interpret neural function and validate disease models.
Keywords: behavioral modeling; learning and memory; reinforcement learning; rodent behavior
Year: 2020 PMID: 32753514 PMCID: PMC7534917 DOI: 10.1523/JNEUROSCI.0972-20.2020
Source DB: PubMed Journal: J Neurosci ISSN: 0270-6474 Impact factor: 6.167
Figure 1. RL agent with memory does not learn a spatial alternation task in the same way as a group of rats. Layout of the track: reward wells were located at the end of the three arms of the track. Probability of getting reward averaged across all rats (black; n = 10) and for the RL agent with just memory, best fit (over 1012 trials) to the averaged data (orange) and fit to maximize reward (teal). The first 300 well visits are shown to highlight the trials over which the majority of the learning occurs. For each rat or single run of the agent, the presence or absence of reward over well visits was smoothed with a Gaussian filter with an SD of 2.25 well visits. For all curves, the width of the bar indicates SEM. Dotted lines show an exponential fit to the first 300 well visits. Graphic of the RL agent: colored symbols, reflecting the transition propensities and the value approximation, indicate the entities that change as the agent goes to arms and does or does not get reward. The state of this agent, and therefore the probability of transitioning to each of the arms, is defined by the current arm location and the previous arm location of the agent. The propensities consist only of the state-based transition matrix (i.e., the memory component). Probability of visiting each of the arms within the first 10 trials, averaged (±SEM) across all rats (black), across all repeats of the best fit model to the rewards (orange), and across all repeats of the model that maximizes the rewards (teal). Fitted parameter values of the exponential fits to the learning performance (dotted lines in the reward-probability panel). The vertical extent of the bars indicates the 99% confidence interval of the fit value. Average inbound (top) and outbound (bottom) errors across all rats (±SEM; black), for the model that best fits the reward rate (orange), and for the fit that maximizes the reward (teal), as in the reward-probability panel.
A third set of parameter values was fit to minimize the discrepancy between the inbound and outbound errors of the model and the averaged errors of the rats. These parameter values turn out to be very similar to those that maximize the total reward of the model, so the resulting curves are obscured by the teal lines and are not shown. Inbound and outbound errors for each animal were smoothed with a Gaussian filter with an SD of 2.25 errors and then interpolated to reflect well visits.
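The smoothing step described in the caption above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `gaussian_smooth`, the kernel truncation at 4 SD, the renormalization at the sequence edges, and the example reward sequence are all assumptions, since the caption specifies only the SD of 2.25 well visits.

```python
import math

def gaussian_smooth(values, sd=2.25, truncate=4.0):
    """Smooth a sequence with a Gaussian kernel of the given SD.

    Assumptions (not stated in the caption): the kernel is truncated at
    `truncate` SDs, and weights are renormalized near the edges so that
    the output stays on the same scale as the input.
    """
    radius = int(math.ceil(truncate * sd))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sd) ** 2) for k in offsets]
    n = len(values)
    smoothed = []
    for i in range(n):
        total = 0.0
        weight = 0.0
        for offset, w in zip(offsets, kernel):
            j = i + offset
            if 0 <= j < n:  # skip kernel taps that fall outside the data
                total += w * values[j]
                weight += w
        smoothed.append(total / weight)
    return smoothed

# Hypothetical reward sequence: 1 = rewarded well visit, 0 = unrewarded.
rewards = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1]
curve = gaussian_smooth(rewards)
```

Because the weights are renormalized, a binary reward sequence yields a smoothed curve that stays between 0 and 1 and can be read as a local reward probability.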
Figure 2. RL agent with memory and dynamic spatial preferences can learn a spatial alternation task as rapidly as a group of rats. Graphic of the RL agent: colored symbols indicate the entities that change as the agent goes to arms and does or does not get reward. The state of this agent, and therefore the probability of transitioning to each of the arms, is defined by the current arm location and the previous arm location of the agent. The propensities consist of the state-based transition matrix (i.e., the memory component) combined with an independent arm preference and a neighbor transition preference. Probability of getting reward averaged across all rats (black; n = 10) and for the RL agent with memory and the dynamic preferences for individual arms and neighbor transitions (green). The first 300 well visits are shown to highlight the time over which the majority of the learning occurs. For each rat or single run of the agent, the presence or absence of reward over well visits was smoothed with a Gaussian filter with an SD of 2.25 well visits. For all curves, the width of the bar indicates SEM. Dotted lines show an exponential fit to the first 300 well visits. Fitted parameter values of the exponential fits to the learning performance (dotted lines in the reward-probability panel). The vertical extent of the bars indicates the 99% confidence interval of the fit value. Average inbound (top) and outbound (bottom) errors across all rats (±SEM; black) and for the best fit model to those errors (green; different parameters from those used in the reward-probability panel). Inbound and outbound errors for each animal were smoothed with a Gaussian filter with an SD of 2.25 errors and then interpolated to reflect well visits.
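One way to picture how the three propensity components described above could yield arm-transition probabilities is sketched below. The caption says only that the components are "combined"; the additive combination, the softmax, the arm labels 1 to 3, the definition of neighbors as arms differing by one, and all names (`transition_probabilities`, `memory`, `arm_pref`, `neighbor_pref`) are assumptions made for illustration.

```python
import math

ARMS = (1, 2, 3)  # assumed labels for the three arms of the track

def transition_probabilities(state, memory, arm_pref, neighbor_pref):
    """Return a dict mapping each arm to its probability of being visited next.

    state         : (current_arm, previous_arm), as in the caption
    memory        : dict keyed by (state, arm) -> propensity (memory component)
    arm_pref      : dict keyed by arm -> state-independent arm preference
    neighbor_pref : scalar bonus for moving to an arm adjacent to the current one

    Assumption: the components add, and a softmax converts the summed
    propensities into probabilities.
    """
    current, _previous = state
    propensities = []
    for arm in ARMS:
        m = memory.get((state, arm), 0.0)
        b = arm_pref.get(arm, 0.0)
        nb = neighbor_pref if abs(arm - current) == 1 else 0.0
        propensities.append(m + b + nb)
    peak = max(propensities)  # subtract the max for numerical stability
    exps = [math.exp(p - peak) for p in propensities]
    total = sum(exps)
    return {arm: e / total for arm, e in zip(ARMS, exps)}

# With no learned memory, a positive neighbor preference alone biases the
# agent at arm 2 toward arms 1 and 3.
probs = transition_probabilities((2, 1), {}, {}, neighbor_pref=1.0)
```

In this sketch the spatial preferences act before any state-specific memory has been learned, which matches the abstract's description of them as memory-independent.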
Figure 3. RL agent with memory and dynamic preferences fits spatial alternation learning behavior of individual rats. Inbound (top) and outbound (bottom) error likelihood for an individual animal (black). Values were smoothed with a Gaussian filter with an SD of 2.25 errors and then interpolated to reflect well visits. In green is the average behavior of 200 repeats of the model using the parameters that minimize the rms difference between the model and the animal, over all trials performed by that animal. The periodic bumps in the plot of the inbound errors reflect the beginning of a session, where the rat or agent is likely not to start at arm 2 and thereby makes an inbound error. Average inbound (top) and outbound (bottom) errors (±SEM) across all rats (black) and individual fits to each rat (green). Dotted lines show an exponential fit to the curves.
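The fitting criterion mentioned above, the rms difference between the averaged model repeats and an individual animal, can be sketched as follows. The helper names (`mean_curve`, `rms_difference`, `best_parameters`), the candidate labels, and the toy numbers are hypothetical, and the selection over a fixed set of candidates stands in for whatever optimizer the authors actually used.

```python
import math

def mean_curve(runs):
    """Average several equal-length model runs, point by point."""
    n = len(runs)
    return [sum(column) / n for column in zip(*runs)]

def rms_difference(a, b):
    """Root-mean-square difference between two equal-length curves."""
    if len(a) != len(b):
        raise ValueError("curves must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def best_parameters(animal_curve, candidates):
    """Pick the candidate whose averaged runs lie closest to the animal.

    candidates: dict mapping a parameter label to a list of model runs
    (each run is an error-likelihood curve with the animal's length).
    """
    return min(
        candidates,
        key=lambda label: rms_difference(animal_curve, mean_curve(candidates[label])),
    )

# Toy example with two hypothetical parameter sets and two repeats each.
animal = [0.5, 0.25]
candidates = {
    "set_a": [[0.5, 0.25], [0.5, 0.25]],
    "set_b": [[1.0, 1.0], [1.0, 1.0]],
}
chosen = best_parameters(animal, candidates)
```

Averaging the repeats before comparing matters because a single stochastic run of the agent is noisy; the caption reports the average of 200 repeats.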