Carolina Feher da Silva¹, Todd A Hare¹,².
Abstract
Many studies that aim to detect model-free and model-based influences on behavior use two-stage behavioral tasks of the type pioneered by Daw and colleagues in 2011. Such studies commonly modify existing two-stage decision paradigms to better address a given hypothesis, which is an important means of scientific progress. It is, however, critical to fully appreciate how any modified or novel design feature affects the expected results. Here, we use two concrete examples to demonstrate that relatively small changes in the design of the two-stage task can substantially change the pattern of actions taken by model-free and model-based agents as a function of the reward outcomes and transitions on previous trials. In the first example, we show that, under specific conditions, purely model-free agents produce the reward-by-transition interaction typically thought to characterize model-based behavior on a two-stage task. The second example shows that model-based agents' behavior is driven by a main effect of transition type, in addition to the canonical reward-by-transition interaction, whenever the reward probabilities of the final states do not sum to one. Together, these examples emphasize that model-free and model-based behavior depend on the task. They also highlight the benefits of using computer simulations to determine, for a given two-stage decision task, what pattern of results to expect from both model-free and model-based agents, so that choice paradigms and analysis strategies can be designed to suit the question at hand.
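The abstract recommends simulating both agent types before running a study. The following is a minimal Python sketch of such a simulation, not the authors' code. It assumes a hypothetical simplified task: each first-stage action leads to its "own" final state with probability 0.7 (common) and to the other with probability 0.3 (rare), and each final state has a single action that pays 1 with a fixed probability. The learning rate `alpha`, inverse temperature `beta`, eligibility trace `lam`, and the function names are illustrative assumptions. The model-free agent uses SARSA(λ)-style updates, while the model-based agent values each first-stage action by the expected value of the final states under the transition model.

```python
import math
import random


def softmax_choice(q_a, q_b, beta, rng):
    """Return 0 (action A) or 1 (action B) using a two-option softmax."""
    p_a = 1.0 / (1.0 + math.exp(-beta * (q_a - q_b)))
    return 0 if rng.random() < p_a else 1


def simulate(agent, reward_probs=(0.5, 0.5), n_trials=20000, alpha=0.5,
             beta=5.0, lam=1.0, p_common=0.7, seed=0):
    """Simulate a simplified two-stage task; return the four stay probabilities.

    Hypothetical simplification: action 0 commonly leads to final state 0 and
    action 1 to final state 1; each final state has one action that pays 1
    with a fixed probability, so there is no second-stage choice.
    Keys: "cr" common/rewarded, "cu" common/unrewarded,
          "rr" rare/rewarded,   "ru" rare/unrewarded.
    """
    rng = random.Random(seed)
    q1 = [0.5, 0.5]  # model-free first-stage action values
    v2 = [0.5, 0.5]  # final-state value estimates
    counts = {key: [0, 0] for key in ("cr", "cu", "rr", "ru")}  # [stays, total]
    prev = None  # (action, condition key) of the previous trial
    for _ in range(n_trials):
        if agent == "mf":
            qa, qb = q1
        else:  # model-based: expected final-state value under the transition model
            qa = p_common * v2[0] + (1 - p_common) * v2[1]
            qb = (1 - p_common) * v2[0] + p_common * v2[1]
        action = softmax_choice(qa, qb, beta, rng)
        if prev is not None:  # score stay/switch relative to the previous trial
            prev_action, key = prev
            counts[key][0] += int(action == prev_action)
            counts[key][1] += 1
        common = rng.random() < p_common
        state = action if common else 1 - action
        reward = 1 if rng.random() < reward_probs[state] else 0
        # SARSA(lambda)-style updates: first stage toward the final-state value,
        # final state toward the reward, then the trace carries delta2 back.
        delta1 = v2[state] - q1[action]
        q1[action] += alpha * delta1
        delta2 = reward - v2[state]
        v2[state] += alpha * delta2
        q1[action] += alpha * lam * delta2
        key = ("c" if common else "r") + ("r" if reward else "u")
        prev = (action, key)
    return {key: stays / total for key, (stays, total) in counts.items()}
```

For example, `simulate("mf")` and `simulate("mb")` return the four stay probabilities for a fixed 0.5/0.5 reward setting similar to Fig 2C. With `lam=1.0`, the model-free first-stage update reduces to `q1[action] += alpha * (reward - q1[action])`, so in this equal-probability setting its stay probability is driven by reward rather than transition, whereas the model-based agent shows the reward-by-transition interaction. Changing `reward_probs` or `lam` lets the same sketch explore other task variants.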
Year: 2018; PMID: 29614130; PMCID: PMC5882146; DOI: 10.1371/journal.pone.0195328
Source DB: PubMed; Journal: PLoS One; ISSN: 1932-6203; Impact factor: 3.240
Fig 1. Scheme of a typical two-stage task.
The thicker arrow indicates the common transition and the thinner arrow indicates the rare transition.
Fig 2. Simulated results for the classic two-stage task as originally described by Daw and colleagues (A) and for variations of it (B–F), obtained with model-free and model-based agents.
In all panels, the behavior of simulated model-free agents is shown in the left bar plots and that of model-based agents in the right bar plots. The y-axis shows the probability of staying with (i.e., repeating) the action chosen on the previous trial. The x-axis separates the data by previous outcome (rewarded, unrewarded) and transition (common = dark gray, rare = light gray). The data were analyzed by logistic regression, with the stay probability modeled as a function of the previous outcome and transition; for panel E, the analysis was extended with additional regressors (see Section “Unequal reward probabilities make model-free agents indirectly sensitive to transition probabilities”). The reward probabilities at each second-stage state and the agents’ eligibility trace (λ) are listed for each panel. A) Results from the classic two-stage task, as described by Daw et al. [1]. B) Stay probabilities when the second-stage reward probabilities are fixed at 0.8 and 0.2. C) Identical to panel A, except that both second-stage reward probabilities are fixed at 0.5 instead of drifting independently around a mean of 0.5. D) Identical to panel B, except that the agents’ eligibility traces are set to values < 1 instead of 1. E) The same data as in panel B, analyzed with the extended logistic regression discussed in Section “Unequal reward probabilities make model-free agents indirectly sensitive to transition probabilities”. F) Results of the modified task discussed in Section “Model-based agents will show main effects of transition in addition to transition by reward interactions under specific task conditions”, in which the second-stage reward probabilities sum to a value greater than 1.
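When a logistic regression contains only an intercept, reward, transition, and their interaction, and is fit to pooled trials, it is saturated over the four previous-trial conditions, so its maximum-likelihood coefficients can be read directly off the cell log-odds. The sketch below assumes effect coding (+1 rewarded or common, −1 unrewarded or rare) and a hypothetical function name; it does not reproduce the extended regression used for panel E.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def saturated_logistic_coefficients(p_cr, p_cu, p_rr, p_ru):
    """Coefficients of stay ~ 1 + reward + transition + reward:transition.

    Effect coding (assumed): reward +1/-1 = rewarded/unrewarded,
    transition +1/-1 = common/rare. With four cells and four parameters the
    model is saturated, so each coefficient is a contrast of the cell logits.
    """
    cells = {  # (reward code, transition code): log-odds of staying
        (+1, +1): logit(p_cr),
        (-1, +1): logit(p_cu),
        (+1, -1): logit(p_rr),
        (-1, -1): logit(p_ru),
    }
    return {
        "intercept": sum(cells.values()) / 4,
        "reward": sum(r * v for (r, t), v in cells.items()) / 4,
        "transition": sum(t * v for (r, t), v in cells.items()) / 4,
        "interaction": sum(r * t * v for (r, t), v in cells.items()) / 4,
    }
```

With illustrative inputs (not data), a pure reward effect such as `(0.8, 0.4, 0.8, 0.4)` yields a positive reward coefficient with zero transition and interaction terms, whereas the canonical model-based pattern `(0.9, 0.6, 0.6, 0.9)` yields only a positive interaction.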
Fig 3. Difference in stay probability for model-based agents.
Difference between the summed stay probabilities of model-based agents following common versus rare transitions (i.e., the sum of the dark gray bars minus the sum of the light gray bars), $(P_{cr} + P_{cu}) - (P_{rr} + P_{ru})$, as a function of the sum of the final-state reward probabilities ($p + b$). This example plot was generated assuming that the final-state reward probabilities are equal ($p = b$) and that the exploration-exploitation parameter in Eq 16 is $\beta = 2.5$. Here, $P_{cr}$ is the stay probability after a common transition and a reward, $P_{cu}$ the stay probability after a common transition and no reward, $P_{rr}$ the stay probability after a rare transition and a reward, and $P_{ru}$ the stay probability after a rare transition and no reward.
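Why the transition main effect vanishes only when the final-state reward probabilities sum to one can be illustrated with a toy calculation. This is a simplification assumed for illustration, not the derivation behind Eq 16. Suppose both final states pay off with probability v = (p + b)/2, the agent's estimates equal v before a trial, only the visited state's estimate then moves by α(r − v), and choices follow a two-option softmax with β = 2.5. The learning rate α = 0.5, the common-transition probability 0.7, and the function name are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def stay_difference(reward_sum, beta=2.5, alpha=0.5, p_common=0.7):
    """(P_cr + P_cu) - (P_rr + P_ru) for a model-based agent in a toy model.

    Hypothetical assumptions: p = b = v = reward_sum / 2, both estimates equal
    v before the trial, and only the visited state's estimate changes, by
    alpha * (r - v). Q(chosen) - Q(other) is then +/-(2 * p_common - 1) times
    that change, and the stay probability is its softmax (sigmoid).
    """
    v = reward_sum / 2.0
    k = beta * (2 * p_common - 1) * alpha  # maps an estimate change to a logit change
    p_cr = sigmoid(k * (1 - v))   # common, rewarded: chosen action gains value
    p_cu = sigmoid(-k * v)        # common, unrewarded: chosen action loses value
    p_rr = sigmoid(-k * (1 - v))  # rare, rewarded: the other action gains value
    p_ru = sigmoid(k * v)         # rare, unrewarded: the other action loses value
    return (p_cr + p_cu) - (p_rr + p_ru)
```

Because σ(x) − σ(−x) = tanh(x/2), this difference equals tanh(k(1 − v)/2) − tanh(kv/2), with k = β(2 · 0.7 − 1)α. A reward moves the estimate by α(1 − v) while an omission moves it by αv, and the two magnitudes are equal only at v = 0.5, i.e. p + b = 1; on either side of that point the difference is nonzero and its sign flips.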