Nitzan Shahar, Rani Moran, Tobias U Hauser, Rogier A Kievit, Daniel McNamee, Michael Moutoussis, Raymond J Dolan.
Abstract
Model-free learning enables an agent to make better decisions based on prior experience while representing only minimal knowledge about an environment's structure. It is generally assumed that model-free state representations are based on outcome-relevant features of the environment. Here, we challenge this assumption by providing evidence that a putative model-free system assigns credit to task representations that are irrelevant to an outcome. We examined data from 769 individuals performing a well-described 2-step reward decision task where stimulus identity but not spatial-motor aspects of the task predicted reward. We show that participants assigned value to spatial-motor representations despite these being outcome irrelevant. Strikingly, spatial-motor value associations affected behavior across all outcome-relevant features and stages of the task, consistent with credit assignment to low-level state-independent task representations. Individual difference analyses suggested that the impact of spatial-motor value formation was attenuated for individuals who showed greater deployment of goal-directed (model-based) strategies. Our findings highlight a need for a reconsideration of how model-free representations are formed and regulated according to the structure of the environment.
Keywords: decision making; motor learning; reinforcement learning
Year: 2019 PMID: 31320592 PMCID: PMC6689934 DOI: 10.1073/pnas.1821647116
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 12.779
Fig. 1. Schematic of the 2-step task. (A) At a first stage, participants choose between 2 options (represented by abstract fractal images) that determine the presentation of 1 of 2 second-stage states according to a fixed transition probability of 70% (“common”) or 30% (“rare”). At a second stage, participants also choose between 2 fractals to gain a reward (£0 or £1 play pounds). (B) Fractals were randomly assigned on each trial and stage to the right/left side of the screen. Participants indicated their choice by pressing the corresponding left/right arrow key. Therefore, the same fractal could be selected by either a left or right key press, and a fractal’s expected value was unrelated to its location on screen or the motor effector response used to report a choice. The panel illustrates 2 random trial sequences where a common transition took place. These trial sequences demonstrate that the same fractal could be selected with different motor effector responses. Additional task information can be found in .
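The trial structure described in this caption can be sketched in Python. This is a minimal illustration, not the authors' task code: the function names, the coding of options and states as 0/1, and the convention that first-stage option i commonly leads to state i are assumptions. Only the 70%/30% transition split and the per-trial random left/right assignment come from the caption.

```python
import random

COMMON_P = 0.7  # "common" transition probability stated in the caption


def sample_second_stage_state(first_choice, rng):
    """Return (state, transition_type) for a first-stage choice coded 0 or 1.

    Assumption: option i commonly leads to second-stage state i.
    """
    common = rng.random() < COMMON_P
    state = first_choice if common else 1 - first_choice
    return state, ("common" if common else "rare")


def assign_sides(fractal_pair, rng):
    """Randomly place the two fractals on the left and right of the screen."""
    left, right = rng.sample(list(fractal_pair), 2)
    return {"left": left, "right": right}


def key_for_fractal(mapping, fractal):
    """Return the arrow key that selects the given fractal under this mapping."""
    return "left" if mapping["left"] == fractal else "right"
```

Because `assign_sides` is called anew on every trial and stage, the key returned by `key_for_fractal` for a given fractal varies across trials, which is why the response key carries no information about reward.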
Fig. 2. Within-state effects of reward on fractal and response key selection. (A) Example of a within-state trial sequence, where the same pair of second-stage fractals was offered with either the same or a flipped response mapping. (B) Effect of outcome (rewarded vs. unrewarded) and mapping (flipped vs. same) on the probability of choosing the same fractal at trial n + 1. Results highlight a tendency to repeat fractal selection after a rewarded trial. Notably, the effect of reward was greater when the fractal was mapped to the same response key than when it was mapped to the alternative key. This indicates that the effects of reward are evident at the level of the relevant fractal but also at the level of the outcome-irrelevant response key. Error bars represent 95% confidence intervals. (C) Raincloud plot (34) showing individual scores for the outcome × mapping interaction effect (as calculated in a mixed-effects regression). Positive values indicate greater involvement of spatial-motor value associations on choice behavior.
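The outcome × mapping contrast in panel B can be illustrated with raw stay proportions. This is a simplified sketch that uses plain cell proportions in place of the mixed-effects regression named in the caption. The trial fields (`state`, `fractal`, `left`, `reward`) and both function names are hypothetical, and the input is assumed to be a chronological list of second-stage trials.

```python
from collections import defaultdict


def stay_probabilities(trials):
    """P(repeat fractal at n + 1) per (outcome, mapping) cell.

    Each trial is a dict with hypothetical keys: 'state' (second-stage state),
    'fractal' (chosen fractal), 'left' (fractal shown on the left), 'reward'.
    Only consecutive trials that share the same second-stage state are used.
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [stays, total]
    for prev, nxt in zip(trials, trials[1:]):
        if prev["state"] != nxt["state"]:
            continue  # not a within-state pair
        outcome = "rewarded" if prev["reward"] else "unrewarded"
        mapping = "same" if prev["left"] == nxt["left"] else "flipped"
        cell = counts[(outcome, mapping)]
        cell[0] += nxt["fractal"] == prev["fractal"]
        cell[1] += 1
    return {cell: stays / total for cell, (stays, total) in counts.items()}


def outcome_by_mapping_interaction(p):
    """Reward effect under the same mapping minus that under the flipped one."""
    same = p[("rewarded", "same")] - p[("unrewarded", "same")]
    flipped = p[("rewarded", "flipped")] - p[("unrewarded", "flipped")]
    return same - flipped
```

A positive value from `outcome_by_mapping_interaction` corresponds to the pattern described in the caption: reward increases fractal repetition more when the fractal stays on the same response key.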
Fig. 3. Between-state effect of reward on response key selection. (A) Example of a sequence of trials included in the analysis, where a different pair of fractals was offered in the current and next second-stage state. (B) Effect of previous reward (rewarded vs. unrewarded) on the probability that the participant will select the same response key at trial n + 1. The result indicates that value was assigned to the response key independent of task state or fractal identity. Error bars represent 95% confidence intervals. (C) Raincloud plot (34) showing individual scores for the outcome effect (as calculated in a mixed-effects regression). Positive values indicate greater involvement of spatial-motor value associations on choice behavior.
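The between-state measure in panel B can be sketched in the same way, again with raw proportions instead of the regression named in the caption. The trial fields (`state`, `key`, `reward`) and the function name are hypothetical, and the input is assumed to be a chronological list of second-stage trials.

```python
def key_stay_by_reward(trials):
    """P(repeat response key at n + 1) after rewarded vs. unrewarded trials.

    Each trial is a dict with hypothetical keys: 'state' (second-stage state),
    'key' ('left' or 'right'), and 'reward' (0 or 1). Only consecutive trials
    that visit different second-stage states are used, so the fractal pair
    on offer changes between trial n and trial n + 1.
    """
    counts = {"rewarded": [0, 0], "unrewarded": [0, 0]}  # [stays, total]
    for prev, nxt in zip(trials, trials[1:]):
        if prev["state"] == nxt["state"]:
            continue  # same fractal pair: not a between-state transition
        cell = counts["rewarded" if prev["reward"] else "unrewarded"]
        cell[0] += nxt["key"] == prev["key"]
        cell[1] += 1
    # Cells with no observations are reported as None rather than dividing by 0.
    return {k: (s / n if n else None) for k, (s, n) in counts.items()}
```

Since no fractal is shared between the two trials, a higher key-repetition rate after reward cannot be attributed to fractal value, which is the logic behind the caption's conclusion.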
Fig. 4. Between-stage effect of reward on response key selection. (A) Example of a sequence of trials included in the analysis, where we examine response key selection at the first stage of trial n + 1 as a function of response key selection at the second stage and reward at trial n. (B) Effect of reward (rewarded vs. unrewarded) on the probability that the participant will select, in the first stage of trial n + 1, the same response key selected at the second stage of trial n. Results suggest that value was assigned to the outcome-irrelevant response key (or fractal location) independent of task stage, state, or fractal identity. Error bars represent 95% confidence intervals. (C) Raincloud plot (34) showing individual scores for the outcome effect (as calculated in the mixed-effects regression). Positive values indicate greater involvement of spatial-motor value associations on choice behavior.