A. Nicolle, M. Symmonds, R. J. Dolan.
Abstract
Action-outcome contingencies can be learnt either by active trial-and-error, or vicariously, by observing the outcomes of actions performed by others. The extant literature is ambiguous as to which of these modes of learning is more effective, as controlled comparisons of operant and observational learning are rare. Here, we contrasted human operant and observational value learning, assessing implicit and explicit measures of learning from positive and negative reinforcement. Compared to direct operant learning, we show observational learning is associated with an optimistic over-valuation of low-value options, a pattern apparent both in participants' choice preferences and their explicit post-hoc estimates of value. Learning of higher value options showed no such bias. We suggest that such a bias can be explained as a tendency for optimistic underestimation of the chance of experiencing negative events, an optimism repressed when information is gathered through direct operant learning.
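The abstract does not specify the authors' computational model, but the proposed bias — underestimating the chance of negative outcomes — is commonly formalized as a value update with a smaller learning rate for negative than for positive prediction errors. The sketch below is purely illustrative (the learning rates, trial counts, and function names are assumptions, not taken from the paper); it shows how such an asymmetry over-values a low-p(win) stimulus like the 20%-win option discussed below, while a symmetric learner converges near the true probability.

```python
import random

def learn_value(p_win, alpha_pos, alpha_neg, n_trials=2000, seed=0, v0=0.5):
    """Rescorla-Wagner-style value update with separate learning rates for
    positive and negative prediction errors (illustrative sketch only)."""
    rng = random.Random(seed)
    v = v0
    trace = []
    for _ in range(n_trials):
        outcome = 1.0 if rng.random() < p_win else 0.0
        delta = outcome - v                      # prediction error
        v += (alpha_pos if delta > 0 else alpha_neg) * delta
        trace.append(v)
    # Average over the second half of trials to smooth stochastic fluctuation.
    tail = trace[n_trials // 2:]
    return sum(tail) / len(tail)

# A low-value stimulus with a 20% win probability (hypothetical parameters):
v_symmetric = learn_value(0.20, alpha_pos=0.2, alpha_neg=0.2)
v_optimistic = learn_value(0.20, alpha_pos=0.3, alpha_neg=0.1)
print(v_symmetric, v_optimistic)
```

The asymmetric learner settles near alpha_pos * p / (alpha_pos * p + alpha_neg * (1 - p)), here roughly 0.43 rather than the true 0.20 — the qualitative pattern (over-valuation confined to low-value options) that the paper reports for observers.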
Year: 2011 PMID: 21354558 PMCID: PMC3081069 DOI: 10.1016/j.cognition.2011.02.004
Source DB: PubMed Journal: Cognition ISSN: 0010-0277
Fig. 1. Timeline for both actor session and observer session. Learning blocks (dark gray) and test blocks (light gray) alternate nine times, with rests at three regular intervals. In learning trials, actors make a free choice between a stimulus pair, indicated by the blue box. Outcomes of the chosen and unchosen stimulus are then displayed sequentially, with a yellow box indicating a win, and red indicating no win. In observer sessions, learning trials differ only in participants' response. Here, participants wait until the blue box is shown, indicating the "other participant's" choice, and then press the button corresponding to the selected stimulus. Outcomes are presented as in the actor session. In test trials, free choices between stimulus pairs are made by both actors and observers, but outcomes are not displayed.
Fig. 2. In (a), choice accuracy for each test trial gamble pair is shown collapsed across test blocks. In (b), the change in choice accuracy from actor to observer learning sessions (observer accuracy minus actor accuracy) is plotted separately for each of the nine test blocks. Pairs are labeled according to the probability of a win for each stimulus. Choice accuracy is measured as the probability that participants chose the stimulus with the highest probability of a win. Actor and observer learning differed only for the 40/20 p{win} pair, with observers showing significantly lower accuracy compared to actors. Error bars show the standard error of the mean.
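The accuracy measure in Fig. 2 — the probability of choosing the stimulus with the higher p(win) — reduces to a proportion over test trials. A minimal sketch, with hypothetical stimulus labels and choice data (not taken from the paper):

```python
def choice_accuracy(choices, pair):
    """Fraction of test trials on which the higher-p(win) stimulus was chosen.
    `pair` maps stimulus label -> true win probability; `choices` lists the
    label chosen on each trial (hypothetical data for illustration)."""
    best = max(pair, key=pair.get)
    return sum(c == best for c in choices) / len(choices)

# Hypothetical test-trial choices for the 40/20 p{win} pair from Fig. 2.
pair_40_20 = {"A": 0.40, "B": 0.20}
print(choice_accuracy(["A", "A", "B", "A"], pair_40_20))  # 0.75
```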
Fig. 3. Participants' estimated probability of a win (p{win}) for each stimulus, learned during the actor and observer sessions, plotted against the actual p{win} for each stimulus. Observers significantly overestimated the p{win} for the 20% win stimulus, compared to actors. Error bars show the standard error of the mean.