| Literature DB >> 28653668 |
Reminders of past choices bias decisions for reward in humans.
Aaron M Bornstein1, Mel W Khaw2, Daphna Shohamy3,4, Nathaniel D Daw1,5.
Abstract
We provide evidence that decisions are made by consulting memories for individual past experiences, and that this process can be biased in favour of past choices using incidental reminders. First, in a standard rewarded choice task, we show that a model that estimates value at decision-time using individual samples of past outcomes fits choices and decision-related neural activity better than a canonical incremental learning model. In a second experiment, we bias this sampling process by incidentally reminding participants of individual past decisions. The next decision after a reminder shows a strong influence of the action taken and value received on the reminded trial. These results provide new empirical support for a decision architecture that relies on samples of individual past choice episodes rather than incrementally averaged rewards in evaluating options and has suggestive implications for the underlying cognitive and neural mechanisms.
Year: 2017 PMID: 28653668 PMCID: PMC5490260 DOI: 10.1038/ncomms15958
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Figure 1. Restless bandit task and re-analysis.
(a) Four-armed bandit from Daw et al. (2006). Participants chose between four slot machines to receive points. (b) Payoffs. The mean amount of points paid out by each machine varied slowly over the course of the experiment. (c) Model comparison. Log Bayes factors favouring sampling over the TD model.
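A per-subject log Bayes factor is the difference between the two models' log evidence. A minimal sketch, assuming the common BIC approximation to model evidence (the paper may use a different estimator; the sign convention here, positive favouring the sampler, matches the figure caption):

```python
import math

def bic(log_likelihood, n_params, n_trials):
    """Bayesian information criterion for one fitted model."""
    return -2.0 * log_likelihood + n_params * math.log(n_trials)

def log_bayes_factor(ll_sampler, k_sampler, ll_td, k_td, n_trials):
    """Approximate log Bayes factor favouring the sampler model,
    via the BIC approximation: log BF ~ (BIC_td - BIC_sampler) / 2.
    Positive values indicate more support for the sampler."""
    return (bic(ll_td, k_td, n_trials) - bic(ll_sampler, k_sampler, n_trials)) / 2.0
```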
Fit model parameters for Experiment 1.

| Model | | | | Log Bayes factor vs Sampler |
|---|---|---|---|---|
| TD | 0.7754 (0.0472) | 8.7069 (0.9564) | 0.6498 (0.0193) | 8.8867 (1.0811) |
| Sampler | 0.7277 (0.0447) | 9.0130 (1.1012) | 0.3046 (0.1288) | – |

TD, temporal difference.
The parameters shown are the mean (s.e.m.) across subjects. The final column shows the mean (s.e.m.) of the log Bayes factor versus the Sampler model (smaller is better).
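The two value-estimation schemes being compared can be sketched minimally. This is an illustration, not the paper's exact specification: parameter names (`alpha`, `decay`, `n_samples`) and the recency-weighted sampling rule are assumptions.

```python
import random

def td_value(rewards, alpha):
    """Incremental (delta-rule / TD) estimate: an exponentially
    weighted running average of the rewards received from one option."""
    v = 0.0
    for r in rewards:
        v += alpha * (r - v)  # prediction-error update
    return v

def sampled_value(rewards, decay, n_samples, rng):
    """Sample-based estimate: average a handful of individually
    recalled past outcomes, with recent episodes more retrievable."""
    if not rewards:
        return 0.0
    # recency weights: the outcome from lag L trials back gets weight (1-decay)^L
    n = len(rewards)
    weights = [(1.0 - decay) ** (n - 1 - t) for t in range(n)]
    samples = rng.choices(rewards, weights=weights, k=n_samples)
    return sum(samples) / n_samples
```

Both functions return a value estimate for one option; the difference is that the sampler consults individual episodes at decision time rather than maintaining a single running average.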
Figure 2. Ticket bandit task.
(a) The ticket-bandit task. Each slot machine (‘bandit’) delivered tickets: trial-unique photographs, each associated with a dollar value of either −$5 or $5. (b) Payoff probabilities. The probability of each bandit paying out a winning ticket varied slowly over the course of the experiment. Participants were told that their total payout would depend both on the number of winning tickets they accrued and on their ability, in a post-task memory test, to correctly recall the reward value and slot machine associated with each ticket. (c) Memory probes. Participants encountered 32 recognition memory probes. On 26 of these probe trials, participants were shown objects received on a previous choice trial (‘valid’); on the others they were shown new objects that were not part of any previous trial (‘invalid’). Participants were asked only to perform a simple old/new recognition judgement: to press ‘yes’ if they had seen the image previously in this task and ‘no’ if they had not. After each recognition probe, the sequence of slot machine choices continued as before.
Figure 3. Ticket bandit results.
(a) Model comparison. Log Bayes factors favouring sampling over the TD model. (b) Impact of probes. As in standard RL models, choices are affected by previously observed rewards (black points). Here, memory probes evoking past decisions (red) also modulate choices on the subsequent choice trial. Data points are log odds of choosing the right-hand option. (*P<0.05, **P<0.01 and ***P<0.001).
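The probe effect in panel (b) can be pictured as a choice rule in which the reminded episode contributes alongside recent rewards. A hypothetical sketch; the weights and functional form are made up for illustration, not fitted values from the paper:

```python
import math

def p_choose_right(recent_reward_diff, reminded_value, reminded_side,
                   w_reward=1.0, w_reminder=0.5):
    """Probability of a right-hand choice from log odds that sum recent
    reward evidence (right minus left) and a reminder term: the reminded
    trial's value, signed by which side was chosen on that trial.
    reminded_side: +1 if the reminded choice was the right bandit, -1 if left.
    Weights are illustrative assumptions."""
    log_odds = w_reward * recent_reward_diff + w_reminder * reminded_value * reminded_side
    return 1.0 / (1.0 + math.exp(-log_odds))
```

With no reward evidence, a reminder of a rewarded right-hand choice pushes the probability above 0.5, and a reminder of a rewarded left-hand choice pushes it below, mirroring the direction of the probe effect.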
Fit model parameters for Experiment 2.

| Model | | | | | Log Bayes factor vs Sampler |
|---|---|---|---|---|---|
| TD | 0.5552 (0.0862) | – | 1.7551 (0.6845) | −0.0930 (0.2354) | 6.9182 (1.3227) |
| Sampler | 0.5393 (0.0583) | 0.4386 (0.0990) | 2.2869 (0.4943) | 0.5855 (0.3215) | – |

TD, temporal difference.
The parameters shown are the mean (s.e.m.) across subjects. The final column shows the mean (s.e.m.) of the log Bayes factor versus the Sampler model (smaller is better).
Ticket values are modified by performance on a post-task recall memory probe.

| | $5 ticket | −$5 ticket |
|---|---|---|
| Correct on both questions | $5 | **−$2.50** |
| Incorrect on either question | **$0** | −$5 |

After the main slot machine task, ‘tickets’ paid out by the machines were presented to the participant again. The participant was asked to recall two specific details associated with the ticket: the machine that paid it out (left or right) and the value of the ticket (−$5 or $5). To encourage participants to encode the ticket-machine-value triplet, they were told that the final value of the tickets would depend on both the original value of the ticket and the participant’s performance on the two post-task recall questions. If they answered either question incorrectly, $5 tickets were modified to be worth $0. If they answered both questions correctly, −$5 tickets were modified to be worth −$2.50. The payout values of each ticket after the memory tests are described in this table. Values altered by the results of the memory tests are highlighted in bold.
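The adjustment rule described above can be written directly (the function name and the 0/1/2 correct-answer encoding are mine):

```python
def final_ticket_value(original_value, n_correct):
    """Post-task payout adjustment for one ticket.
    original_value: +5 or -5 (dollars); n_correct: how many of the two
    recall questions (machine, value) were answered correctly."""
    if original_value > 0:
        # winning tickets keep their $5 value only with perfect recall
        return 5.0 if n_correct == 2 else 0.0
    # losing tickets are softened from -$5 to -$2.50 by perfect recall
    return -2.5 if n_correct == 2 else -5.0
```

Note the incentive is symmetric: perfect recall preserves gains and reduces losses, so encoding each ticket is worthwhile regardless of its sign.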