Aaron J Gruber, Rajat Thapa.
Abstract
The propensity of animals to shift choices immediately after unexpectedly poor reinforcement outcomes is a pervasive strategy across species and tasks. We report here that the memory supporting such lose-shift responding in rats rapidly decays during the intertrial interval and persists throughout training and testing on a binary choice task, despite being a suboptimal strategy. Lose-shift responding is not positively correlated with the prevalence and temporal dependence of win-stay responding, and it is inconsistent with predictions of reinforcement learning on the task. These data provide further evidence that win-stay and lose-shift are mediated by dissociated neural mechanisms and indicate that lose-shift responding presents a potential confound for the study of choice in the many operant choice tasks with short intertrial intervals. We propose that this immediate lose-shift responding is an intrinsic feature of the brain's choice mechanisms that is engaged as a choice reflex and works in parallel with reinforcement learning and other control mechanisms to guide action selection.
Keywords: WSLS; decay; lose-switch; memory; reinforcement
Year: 2016 PMID: 27896312 PMCID: PMC5112541 DOI: 10.1523/ENEURO.0167-16.2016
Source DB: PubMed Journal: eNeuro ISSN: 2373-2822
Figure 3. Invariance of lose-shift and win-stay models to movement times. Frequency of population ITIs after losses, showing that intervals were increased for long (green) compared with short (dark) barriers. Probability of lose-shift computed across the population independently for short (dark) and long (green) barriers. Both conditions were fitted well by the common model (dark solid line). The change in the area under the curve computed independently for each subject between conditions shows no difference (inset), indicating that the mnemonic process underlying lose-shift responding is invariant to the ITI distribution. Plots of ITI and probability of stay responses after wins, showing that win-stay is also invariant to barrier length. Mean lose-shift responding across subjects is decreased by longer barriers. Within-subject ITI increases after loss trials under long barriers compared with short barriers. Mean within-subject change in the probability of lose-shift due to longer barriers is predicted (magenta dashed line) by the change in ITI based on the log-linear model. Mean probability of win-stay computed across animals is not altered by barrier length. Long barriers led to more rewarded trials per session because of the reduction in predictable lose-shift responding. Mean probability of lose-shift for bins of 20 trials and rats for long and short barriers, showing an increase across sessions for either barrier length. Mean ITI after loss for each barrier condition, showing a decrease within the session. Mean number of licks prior to reinforcement across the session, showing a decrease within sessions but no effect of barrier length. (Inset) Plots of lose-shift and licking for each barrier condition, showing that licking is not sufficient to account for variance in lose-shift between barrier conditions. Statistically significant difference among group means: *p < 0.05, ***p < 0.001. Error bars show SEM.
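The log-linear model referred to in the Figure 3 caption relates the probability of lose-shift to the logarithm of the ITI. The following is a minimal sketch of such a fit, not the authors' code: it assumes the binned data are available as ITI bin centers with one lose-shift probability per bin, and it uses ordinary least squares on log10(ITI). The function names `fit_log_linear` and `predicted_change` and all numeric values are hypothetical.

```python
import math


def fit_log_linear(iti_bin_centers, p_lose_shift):
    """Ordinary least-squares fit of p = intercept + slope * log10(ITI).

    iti_bin_centers: ITI bin centers in seconds (all > 0).
    p_lose_shift: lose-shift probability observed in each bin.
    Returns (intercept, slope).
    """
    xs = [math.log10(t) for t in iti_bin_centers]
    ys = list(p_lose_shift)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predicted_change(slope, iti_before, iti_after):
    """Change in lose-shift probability the fitted line predicts when
    the ITI moves from iti_before to iti_after."""
    return slope * (math.log10(iti_after) - math.log10(iti_before))


# Hypothetical binned data, chosen only to illustrate the calculation.
intercept, slope = fit_log_linear([1.0, 10.0, 100.0], [0.9, 0.7, 0.5])
delta = predicted_change(slope, 1.0, 10.0)
```

A negative slope corresponds to lose-shift becoming less likely as the ITI grows, which is the direction the caption describes for longer barriers.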
Figure 5. Responses during every training session for one cohort. Responses plotted for each rat (symbol-color) and each day of training. Session 1 is the second time the rats were placed into the behavioral box, and reward probability was p = 0.5 for each feeder regardless of previous responses or rewards. Number of trials completed in each session. Rats were allowed 90 min to complete up to 150 trials in sessions 1–10, and hallways of increasing lengths were introduced in sessions 3–8. Plot of the probability of responding to the rightward feeder, probability of lose-shift, and probability of win-stay during the first 16 sessions. The majority of rats showed no side bias, strong lose-shift, and very little win-stay in initial trials. Only a few rats showed initial side bias, and therefore little lose-shift and strong win-stay (blue shading). Lose-shift was invariant over training, whereas win-stay increased (see text). Dark lines indicate median across all subjects for each day.
Figure 1. Prevalence of win-stay and lose-shift responses. Schematic illustration of the behavioral apparatus. Scatter plot and population histograms of win-stay and lose-shift responding, showing that these strategies are anticorrelated among subjects. Frequency of ITIs after loss trials across the population. Probability of lose-shift computed across the population for the bins of ITI in the preceding panel, revealing a marked log-linear relationship. Individual subjects also exhibit this behavior, as indicated by the nonzero mean of the frequency histogram of linear coefficient terms for fits to each subject’s responses (inset; see text for statistical treatment). Plots for win-stay, analogous to those for lose-shift, reveal a log-parabolic relationship with ITIs in the population and individual subjects. Vertical lines indicate SEM, and the dashed lines indicate chance levels (Prob = 0.5).
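The win-stay and lose-shift probabilities summarized in Figure 1 can be computed from a sequence of choices and outcomes. The sketch below is illustrative only and assumes a simple encoding that the source does not specify: one feeder label per trial and one boolean reward flag per trial. The function name `wsls_probabilities` and the example session are hypothetical.

```python
def wsls_probabilities(choices, rewards):
    """Return (p_win_stay, p_lose_shift) for one session.

    choices: per-trial feeder labels, e.g. 'L' or 'R'.
    rewards: per-trial booleans, True if the trial was rewarded.
    A probability is None when its conditioning event never occurred.
    """
    wins = win_stays = losses = lose_shifts = 0
    # Pair each trial with the choice made on the following trial.
    for prev_choice, prev_reward, choice in zip(choices, rewards, choices[1:]):
        stayed = choice == prev_choice
        if prev_reward:
            wins += 1
            win_stays += stayed
        else:
            losses += 1
            lose_shifts += not stayed
    p_win_stay = win_stays / wins if wins else None
    p_lose_shift = lose_shifts / losses if losses else None
    return p_win_stay, p_lose_shift


# Hypothetical five-trial session, used only to show the call.
p_ws, p_ls = wsls_probabilities(
    ["L", "L", "R", "R", "L"],
    [True, False, False, True, False],
)
```

Both values are compared against the chance level of 0.5 marked by the dashed lines in the figure.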
Details of statistical treatments.
| a | Mean of lose-shift probability across the population is not equal to 0.5. | 97 | 19.2 | 1.00E–34 | Reject H0 | 1 | Subjects | 0 | ||
| b | Mean of win-stay probability across the population is not equal to 0.5. | 97 | 1.4 | 0.17 | Accept H0 | 0.74 | Subjects | 0 | ||
| c | Relationship between win-stay and lose-shift across subjects is not linearly correlated. | Linear regression | 97 | 32.2 | 1.00E–06 | Reject H0 | 0.72 | Subjects | 0 | |
| d | Relationship between lose-shift probability and ITI computed from binned aggregate data from all subjects is explained by a constant model. | 14 | 398 | 1.00E–11 | Reject H0 | 1 | Binned probabilities | 0 | ||
| e | Mean regression slope computed from the independent log-linear regression of lose-shift to ITI is not different from 0. | 54 | 40 | 1.00E–40 | Reject H0 | 1 | Subjects | 42 | Insufficient samples for regression (criterion is ≥25 samples in 4 consecutive bins, after removing trials that follow entry of the non-chosen feeder) | |
| f | Relationship between win-stay probability and ITI for binned data across subjects is explained by a constant model. | 14 | 12.8 | 1.00E–03 | Reject H0 | 0.99 | Binned probabilities | 0 | ||
| g | Mean regression factor for the quadratic term computed from the independent regression of lose-shift to log10(ITI) is not different from 0. | 63 | 6.6 | 1.00E–08 | Reject H0 | 0.96 | Subjects | 32 | Insufficient samples for regression (criterion is ≥25 samples in 4 consecutive bins, after removing trials that follow entry of the non-chosen feeder) | |
| h | Relationship between the ITI after wins and the ITI after losses is explained by a constant model. | 97 | 225 | 1.00E–26 | Reject H0 | 1 | Subjects | 0 | ||
| i | Relationship between subject-wise lose-shift probability and logarithm of the ITI after losses is explained by a constant model. | 97 | 20.6 | 2.00E–05 | Reject H0 | 0.99 | Subjects | 0 | ||
| j | Relationship between subject-wise win-stay probability and logarithm of the ITI after wins is explained by a constant model. | 97 | 1.8 | 0.18 | Accept H0 | 0.6 | Subjects | 0 | ||
| k | Response time is invariant to the trial position within sessions, independent of barrier length (i.e., main effect). | RM-ANOVA | 9,864 | 2.8 | 0.003 | Reject H0 | 0.96 | Binned trials and subjects | 0 | |
| l | Anticipatory licking is invariant to the trial position within sessions, independent of barrier length (i.e., main effect). | RM-ANOVA | 9,864 | 8.8 | 1.00E–06 | Reject H0 | 1 | Binned trials and subjects | 0 | |
| m | Relationship between the within-session change in anticipatory licking and total licks (per trial) is explained by a constant model. | 8 | 38.7 | 3.00E–04 | Reject H0 | 0.99 | Binned trials | 0 | ||
| n | The prevalence of lose-shift responding is invariant to the trial position within sessions, independent of barrier length (i.e., main effect). | RM-ANOVA | 9,864 | 2.2 | 0.02 | Reject H0 | 0.89 | Binned trials and subjects | 0 | |
| o | Relationship between the within-session change in lose-shift prevalence and anticipatory licking is explained by a constant model. | 8 | 27.8 | 7.00E–04 | Reject H0 | 0.99 | Binned trials | 0 | ||
| p | ITI after loss is invariant to the trial position within sessions, independent of barrier length (i.e., main effect). | RM-ANOVA | 9,864 | 29 | 1.00E–06 | Reject H0 | 1 | Binned trials and subjects | 0 | |
| q | Relationship between the within-session change in lose-shift prevalence and log ITI after loss is explained by a constant model. | 8 | 24.8 | 1.00E–03 | Reject H0 | 0.99 | Binned trials | 0 | ||
| r | Mean running speed in the presence of shorter barriers is not different from the mean running speed in the presence of the longer barriers. | 18 | 0.05 | 0.96 | Accept H0 | 0.96 | Subjects | 0 | ||
| s | Mean % change in A.U.C for lose-shift vs. log(ITI) due to increasing barrier length for each subject is not different from 0 | 16 | 0.09 | 0.93 | Accept H0 | 0.95 | Subjects (within) | 2 | Insufficient samples for regression (criterion is ≥25 samples in 4 bins) | |
| t | Mean % change in A.U.C for win-stay vs. log(ITI) due to increasing barrier length for each subject is not different from 0 | 14 | 0.55 | 0.59 | Accept H0 | 0.87 | Subjects (within) | 5 | Insufficient samples for regression (criterion is ≥25 samples in 4 bins) | |
| u | Mean change in lose-shift probability across subjects when the longer barrier is introduced is not different from 0. | 18 | 4.7 | 2.00E–04 | Reject H0 | 0.71 | Subjects (within) | 0 | ||
| v | Mean difference between predicted and actual lose-shift decrease due to increased barrier length is not different from 0. | 18 | 0.14 | 0.89 | Accept H0 | 0.95 | Subjects (within) | 0 | ||
| w | Mean change in rewarded trials due to barrier length is not different from 0. | 18 | 2.45 | 0.02 | Reject H0 | 0.92 | Subjects (within) | 0 | ||
| x | The prevalence of lose-shift responding is invariant to the trial position within sessions, independent of barrier length (i.e., main effect). | RM-ANOVA | 6,109 | 1.6 | 0.16 | Accept H0 | 0.42 | Binned trials and subjects | 0 | |
| y | The ITI after loss is invariant to the trial position within sessions, independent of barrier length (i.e., main effect). | RM-ANOVA | 6,109 | 5.7 | 3.00E–05 | Reject H0 | 0.99 | Binned trials and subjects | 0 | |
| z | Anticipatory licking is invariant to the trial position within sessions, independent of barrier length (i.e., main effect). | RM-ANOVA | 6,109 | 6.8 | 4.00E–06 | Reject H0 | 1 | Binned trials and subjects | 0 | |
| aa | The prevalence of lose-shift responding is invariant to barrier length, independent of changes due to trial position in the session (i.e., main effect). | RM-ANOVA | 1,18 | 8.3 | 0.01 | Reject H0 | 0.78 | Binned trials and subjects | 0 | |
| ab | The ITI after loss is invariant to barrier length, independent of changes due to trial position in the session (i.e., main effect). | RM-ANOVA | 1,18 | 28 | 5.00E–05 | Reject H0 | 1 | Binned trials and subjects | 0 | |
| ac | Anticipatory licking is invariant to barrier length, independent of changes due to trial position in the session (i.e., main effect). | RM-ANOVA | 1,18 | 0.5 | 0.52 | Accept H0 | 0.9 | Binned trials and subjects | 0 | |
| ad | Relationship between lose-shift responding and anticipatory licking is explained by a constant model. | 5 | 10.1 | 0.02 | Reject H0 | 0.58 | Binned trials | 0 | ||
| ae | Mean difference in win-stay probability across subjects computed after a previous win vs. two previous wins at the same feeder is not greater than 0. | 48 | 10.2 | 1.00E–13 | Reject H0 | 1 | Subjects (within) | 2 | Insufficient occurrence of win-stay-wins sequences (criterion is ≥25) | |
| af | Mean difference in lose-shift probability across subjects computed after a previous loss vs. two previous losses at the same feeder is not greater than 0. | 32 | 2.2 | 0.99 | Accept H0 | 1 | Subjects (within) | 18 | Insufficient occurrence of lose-stay-lose sequences (criterion is ≥25) | |
| ag | Mean prediction accuracy of the Q-learning model and win-stay-lose-shift is not different from 0. | 34 | 5.2 | 1.00E–05 | Reject H0 | 0.96 | Subjects | 0 | ||
| ah | The median probability of lose-shift on the second training session is not different from chance (0.5). | Wilcox | 17 | 0.03 | Reject H0 | 0.77 | Subjects | 0 | ||
| ai | Mean probability of lose-shift did not change across training or testing days. | RM-ANOVA | 15,150 | 0.54 | 0.91 | Accept H0 | 1 | Subjects, sessions | 0 | |
| aj | Mean probability of win-stay did not change across training or testing days. | Wilcox | 17 | 0.01 | Reject H0 | 0.83 | Subjects | 0 | ||
| ak | Mean probability of win-stay did not change across training or testing days. | RM-ANOVA | 15,150 | 2.3 | 5.00E–03 | Reject H0 | 1 | Subjects, sessions | 0 |
Figure 2. Within-session changes of dependent variables. Mean response time (from nose-poke to feeder) over 15 consecutive trials and all animals in Fig. 1. Response time increases throughout the session after trial 30, suggesting a progressive decrease in motivation. Mean number of licks before reinforcement, which decreases within the session. The number of these anticipatory licks correlates strongly with the total number of licks at each feeder within the session (inset). Mean probability of lose-shift, which increases within the session and negatively correlates with licking (inset). Mean ITI after loss trials decreases within session. The within-session variance of lose-shift correlates strongly with the log of the within-session ITI after losses (inset). Error bars indicate SEM.
Figure 4. Effect of consecutive wins or losses on choice: test for reinforcement learning. Plot of probability of a stay response on trial n, after a win (i.e., win-stay; left) or win-stay-win sequence (right) for each rat. The latter is the probability that the rat will choose the same feeder in three consecutive trials given wins on the first two of the set. The data show an increased probability of repeating the choice given two previous wins on the same feeder compared with a win on the previous trial, consistent with RL theory. Plot of probability of a switch response on trial n after a loss (lose-shift; left) or after a lose-stay-lose sequence (right). The probability of shifting after two consecutive losses to the same feeder is not greater than the probability of shifting after a loss on the previous trial, which is inconsistent with the predictions of RL theory. In both plots, gray lines indicate a within-subject increase in probability, whereas red lines indicate a decrease. ***Statistical significance of increased probability (p < 0.001) within subjects.
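The comparison in Figure 4 between shifting after one loss and shifting after a lose-stay-lose sequence can be sketched as follows. This is an illustration under assumed data encoding (one feeder label and one boolean reward flag per trial), not the authors' analysis code; the function name `shift_after_losses` and the example session are hypothetical.

```python
def shift_after_losses(choices, rewards):
    """Return (p_shift_after_loss, p_shift_after_lose_stay_lose).

    choices: per-trial feeder labels, e.g. 'L' or 'R'.
    rewards: per-trial booleans, True if the trial was rewarded.
    A probability is None when its conditioning event never occurred.
    """
    one_shifts = one_total = 0  # trial n-1 was a loss
    two_shifts = two_total = 0  # trials n-2 and n-1 were losses at the same feeder
    for n in range(1, len(choices)):
        if rewards[n - 1]:
            continue  # only trials that follow a loss are counted
        shifted = choices[n] != choices[n - 1]
        one_total += 1
        one_shifts += shifted
        if n >= 2 and not rewards[n - 2] and choices[n - 2] == choices[n - 1]:
            two_total += 1
            two_shifts += shifted
    p_one = one_shifts / one_total if one_total else None
    p_two = two_shifts / two_total if two_total else None
    return p_one, p_two


# Hypothetical six-trial session, used only to show the call.
p_one, p_two = shift_after_losses(
    ["L", "L", "R", "R", "R", "L"],
    [False, False, True, False, False, False],
)
```

Note that the lose-stay-lose trials are a subset of the trials following a single loss, so the two estimates are not independent; the caption's within-subject comparison is between these two conditional probabilities for each rat.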