Associative learning of social value.

Timothy E J Behrens, Laurence T Hunt, Mark W Woolrich, Matthew F S Rushworth.

Abstract

Our decisions are guided by information learnt from our environment. This information may come via personal experiences of reward, but also from the behaviour of social partners. Social learning is widely held to be distinct from other forms of learning in its mechanism and neural implementation; it is often assumed to compete with simpler mechanisms, such as reward-based associative learning, to drive behaviour. Recently, neural signals have been observed during social exchange reminiscent of signals seen in studies of associative learning. Here we demonstrate that social information may be acquired using the same associative processes assumed to underlie reward-based learning. We find that key computational variables for learning in the social and reward domains are processed in a similar fashion, but in parallel neural processing streams. Two neighbouring divisions of the anterior cingulate cortex were central to learning about social and reward-based information, and for determining the extent to which each source of information guides behaviour. When making a decision, however, the information learnt using these parallel streams was combined within ventromedial prefrontal cortex. These findings suggest that human social valuation can be realized by means of the same associative processes previously established for learning other, simpler, features of the environment.

Year: 2008 PMID: 19005555 PMCID: PMC2605577 DOI: 10.1038/nature07538

Source DB: PubMed Journal: Nature ISSN: 0028-0836 Impact factor: 49.962

To compare learning strategies for social and reward-based information, we constructed a task in which each outcome revealed information both about likely future outcomes (reward-based information) and about the trust that should be assigned to future advice from a confederate (social information). Twenty-four subjects performed a decision-making task requiring the combination of information from three sources (fig 1, methods and supplementary information): (i) the reward magnitude of each option (generated randomly at each trial); (ii) the likely correct response (blue or green) based on their own experience of rewards on each option; and (iii) the confederate’s advice, and how trustworthy the confederate currently was. When a new outcome was witnessed, subjects could use this single outcome to learn in parallel about the likely correct action, and the trustworthiness of the confederate.

Figure 1

Experimental task and behavioural findings. (a) Experimental task (See methods and Supplementary information). Each trial consists of four phases. Subjects are presented with a decision (CUE), receive the advice (red square) of the confederate (SUGGEST) and respond using a button press (grey square). An INTERVAL period follows, before the correct outcome is revealed (MONITOR). If the subject chooses correctly the red bar is incremented by the number of points on the chosen option. (b,c) Reward schedules for reward (b) and social (c) information. Dashed lines show the true probability of blue being correct (b) and the true probability of correct confederate advice (c). Each schedule underwent periods of stability and volatility. Solid lines show the model’s estimate of the probabilities. (d) Optimal model estimates of the volatility of reward (green) and social (red) information. (e) Logistic regression on subject behaviour. Factors included were: The reward magnitude difference between options (RMD); the outcome probability derived from the model using reward outcomes (RLO); the outcome probability derived from the model using confederate advice (RLC); the possibility that the subjects would blindly follow the confederate without learning (BFC); and the possibility that subjects would assume the confederate would behave as in the previous trial (CPT). The logistic regression analysis revealed significant effects only on RMD, RLO and RLC.

The investigation resembles previous experiments that have compared animate and inanimate conditions in different trials or experiments5,6. Here, however, both sources of information were present on each trial outcome, but the relevance of each was manipulated continuously, allowing determination of both the fMRI signal and the behavioural influence associated with each source of information. Optimal behaviour in this task requires the subject to track the probability of the correct action and the probability of correct advice independently, and to combine these two probabilities into an overall probability of the correct response (supplementary information). Computational models of reinforcement learning (RL) have had considerable success in predicting how such probabilities are tracked in learning tasks outside the social domain7. The simplest RL models integrate information over trials by maintaining and updating the expected value of each option. When new information is observed, this value is updated by the product of the prediction error and the learning rate7. In our task, there are two dissociable prediction errors: the reward prediction error (actual reward - expected value), for learning about the correct option; and the confederate prediction error (actual - expected fidelity), for learning about the trustworthiness of the confederate. The optimal learning rate depends on the volatility of the underlying information source8-10. In volatile conditions, subjects should give more weight to recent information, using a fast learning rate. In stable conditions, subjects should weigh recent and historical information almost equally, using a slow learning rate. By ensuring that the correct option and the confederate’s advice became volatile at different times, we ensured that the learning rate for these two sources of information varied independently.
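
The update rule described above (value changes by learning rate times prediction error) can be sketched in a few lines of Python. This is a minimal illustration of a fixed-rate delta rule, not the Bayesian model used in the study; the function names, outcome sequences and learning-rate values are hypothetical. It runs two learners in parallel on the same trials and shows why a fast learning rate favours recent outcomes while a slow one weighs history more evenly.

```python
def delta_update(estimate, outcome, learning_rate):
    """Move the estimate toward the outcome by learning_rate * prediction error."""
    prediction_error = outcome - estimate
    return estimate + learning_rate * prediction_error


def track(outcomes, learning_rate, initial=0.5):
    """Return the estimate after each outcome (1 = event occurred, 0 = it did not)."""
    estimates = []
    estimate = initial
    for outcome in outcomes:
        estimate = delta_update(estimate, outcome, learning_rate)
        estimates.append(estimate)
    return estimates


def recency_weights(learning_rate, n_back):
    """Weight a fixed-rate delta rule gives the outcome k trials back, k = 0..n_back-1."""
    return [learning_rate * (1 - learning_rate) ** k for k in range(n_back)]


# Hypothetical outcome sequences: the same trials feed two parallel learners.
blue_correct = [1, 1, 0, 1, 1, 1]      # 1 if blue was the correct colour
advice_correct = [1, 0, 0, 0, 1, 0]    # 1 if the confederate's advice was correct

p_blue = track(blue_correct, learning_rate=0.1)     # slow rate, suited to a stable source
p_trust = track(advice_correct, learning_rate=0.5)  # fast rate, suited to a volatile source

print(p_blue[-1], p_trust[-1])
print(recency_weights(0.5, 3))  # fast rate: weight falls off quickly with trial lag
print(recency_weights(0.1, 3))  # slow rate: weights are nearly equal across lags
```

In this sketch the learning rate is fixed by hand; in the task it is the quantity that an optimal learner would raise when a source becomes volatile and lower when it is stable.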
We used a Bayesian reinforcement learning (RL)8 model (supplementary information) to generate the optimal estimates of prediction error, volatility and outcome probability separately for each source of information (fig 1b,c,d). We first sought to establish whether human behaviour matched predictions from the RL model. We used logistic regression to determine the degree to which subject choices were influenced by the optimally-tracked confederate and outcome probabilities, and by the difference in reward magnitudes between options. Parameter estimates for all three information sources were significantly greater than zero, and there was no significant difference in the degree to which subjects used reward and social information to determine their behaviour (fig 1e). Furthermore, there was no significant effect either of subjects blindly following confederate advice without learning its value, or of subjects assuming that the confederate would behave in the same way as on the previous trial (fig 1e). Hence subjects were able to integrate the fidelity of the confederate over many trials in an RL-like fashion. We then investigated whether the fMRI signal reflected the model’s estimates of prediction error and volatility, for both social and reward information, when subjects witnessed new outcomes. In the reward domain, neural responses have been identified that encode these key parameters8, 11-16. Dopamine neurons in the ventral tegmental area (VTA) code reward prediction errors12, 13, 17. Similar signals are reported in the dopaminoceptive striatum11, 18 and even in the VTA itself, when specialized strategies are used in human fMRI studies19. fMRI correlates of the learning rate in the reward domain have been reported in anterior cingulate sulcus (ACCs). If humans can learn from social information in a similar fashion, it should be possible to detect signals that co-vary with the same computational parameters, but in the social domain.
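
The behavioural regression mentioned above can be pictured with a short Python sketch. It assumes a standard logistic model in which the log-odds of choosing one option is a weighted sum of the regressors; the intercept-free form, the regressor coding, the example trials and the weights are illustrative assumptions, not the authors' fitted values or code.

```python
import math


def choice_probability(regressors, weights):
    """Probability of choosing blue under a logistic model of the weighted regressors."""
    log_odds = sum(w * x for w, x in zip(weights, regressors))
    return 1.0 / (1.0 + math.exp(-log_odds))


def log_likelihood(trials, choices, weights):
    """Sum of log-probabilities of the observed choices (1 = blue, 0 = green)."""
    total = 0.0
    for regressors, choice in zip(trials, choices):
        p = choice_probability(regressors, weights)
        total += math.log(p if choice == 1 else 1.0 - p)
    return total


# Hypothetical trials: (RMD, RLO, RLC), each coded so positive values favour blue.
trials = [(0.2, 0.3, -0.1), (-0.4, 0.1, 0.4), (0.1, -0.3, -0.2)]
choices = [1, 1, 0]

null_weights = (0.0, 0.0, 0.0)    # no regressor influences choice
equal_weights = (2.0, 2.0, 2.0)   # illustrative values, not fitted

print(log_likelihood(trials, choices, null_weights))
print(log_likelihood(trials, choices, equal_weights))
```

Fitting such a model means searching for the weights that maximise the log-likelihood; a weight reliably above zero across subjects indicates that the corresponding source of information influenced choices.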
We observed BOLD correlates of the confederate prediction error in dorsomedial prefrontal cortex (DMPFC) in the vicinity of the paracingulate sulcus, right middle temporal gyrus (MTG), and in the right superior temporal sulcus at the temporoparietal junction (STS/TPJ) (figure 2a). Equivalent signals were present in the left hemisphere at the same threshold, but did not pass the cluster extent criterion; similar effects were also found bilaterally in the cerebellum (supplementary information). Notably, these regions showed a pattern of activation similar to known dopaminergic activity in reward learning13, but for social information. Activity correlated with the probability of a confederate lie after the subject decision but before the outcome was revealed (a prediction signal). When the subjects observed the trial outcome, activity correlated negatively with this same probability, but positively with the event of a confederate lie (Figure 2b). This signal reflects both components of a prediction error signal for social information: The outcome (lie or truth) minus the expectation (Figure 2b). These signals cannot be influenced by reward prediction errors as the two types of prediction error were decorrelated in the task design. The presence of this prediction error signal in the brain is a prerequisite for any theory of an RL-like strategy for social valuation.

Figure 2

Predictions and prediction errors in social and non-social domains. Timecourses show (partial) correlations ± SEM. See figure S2. (a) Activation in the DMPFC, right TPJ/STS and MTG correlates with the social prediction error at the outcome (thresholded at Z>3.1, cluster size >50 voxels). (b) Deconstruction of signal change in the DMPFC. Similar results were found in the MTG and TPJ/STS. Top panel: Following the outcome, areas that encode prediction error correlate positively with the outcome and negatively with the predicted probability. Red: effect size of the confederate lie outcome (1 for lie, 0 for truth). Blue: effect size of the predicted confederate lie probability. To perform inference, we fit a hemodynamic model in each subject to the timecourse of this effect (i.e. to the blue line). The green line in the top panel shows the mean overall fit of this hemodynamic model (for comparison with the blue line). Bottom panel: The effect of lie probability (blue line from top panel) is decomposed into an hrf at each trial event (fig S2). Dashed and solid lines show mean responses ± SEM. Each region showed a significant positive effect of predicted confederate lie probability after the decision (t(22)=1.96 (p<0.05), 1.73 (p<0.05), 1.74 (p<0.05) for DMPFC, MTG and TPJ/STS respectively). Crucially, each brain region showed a significant negative effect of predicted confederate lie probability after the outcome (t(22)=2.68 (p<0.005), 2.35 (p<0.05), 3.27 (p<0.005)). (c) Ventral striatum is taken as an example of a number of regions revealed by the voxelwise analysis of reward prediction error (thresholded at Z>3.1, cluster size >100 voxels). (d) Panels are exactly as in (b), but coded in terms of reward and not in terms of confederate fidelity. Top panel shows the parameter estimate relating to the expected value of the trial (blue line) and, after the outcome, the parameter estimate relating to the magnitude of these rewards (grey line). To test for prediction error coding, we again fit a hemodynamic model to the expectation parameter estimate (shown by the green line, for comparison with blue line). Bottom panel: The timecourse showed a significant positive effect during the time of the decision (t(22)=3.32 (p<0.002)), and a significant negative effect after the trial outcome (t(22)=2.50, p<0.05); see supplementary information for further discussion.

We performed a similar analysis for prediction errors on reward information (reward minus expected reward). We found a significant effect of reward prediction error in the ventral striatum (figure 2c), the ventromedial prefrontal cortex, and anterior cingulate sulcus (see supplementary information). As in the social domain, we observed significant effects of all three elements of the reward prediction error (Figure 2d) (see supplementary information for discussion). As previously demonstrated8, the volatility of action-outcome associations predicted BOLD signal in a circumscribed region of ACCs (figure 3a). This effect varied across people such that those whose behaviour relied more on their own experiences (supplementary information) showed a greater volatility-related signal in this region (figure 3b). The volatility of confederate advice correlated with BOLD signal in a circumscribed region in the adjacent ACC gyrus (ACCg) (figure 3a). Subjects whose behaviour relied more on this advice showed greater signal change in this region (figure 3c). Notably, this double dissociation [reflected in a three-way interaction between area (ACCs versus ACCg), volatility type (social versus outcome) and degree of reliance on social (F(1,20)=7.145, p=0.015) or experiential information (F(1,20)=5.379, p=0.031)] can be understood by reference to a dissociation in macaque monkeys. Selective lesions to ACCs but not ACCg impair reward-guided decision-making in the reward domain20. In the social domain, male macaques will forgo food to acquire information about other individuals21, 22. Selective lesions to ACCg but not ACCs abolish this effect23. We found that BOLD signals in these two regions reflect the respective values of the same outcome for learning about the two different sources of information.

Figure 3

Agency-specific learning rates dissociate in the ACC. (a) Regions where the BOLD correlates of reward (green) and confederate (red) volatility predict the influence that each source of information has on subject behaviour (Z>3.1, p<0.05 cluster corrected for cingulate cortex). Subjects with high BOLD signal changes in response to reward volatility in the ACC sulcus are guided strongly by reward history information (max Z=3.7, correlation (R=0.7163, p<0.0001) shown in (b)). Subjects with high BOLD signal changes in response to confederate advice volatility in the ACC gyrus are guided strongly by social information (max Z=4.1, correlation (R=0.7252, p<0.0001) shown in (c)).

Learning about reward probability from vicarious and personal experiences recruits distinct neural systems, but subjects combine information across both sources when making decisions (figure 1e). A ventromedial portion of the prefrontal cortex (VMPFC) has been shown to code an expected value signal for the chosen action24, 25 during decision-making. We computed two probabilities of reward on the subject’s chosen option: one based only on experience and one based only on confederate advice. BOLD signal in the VMPFC was significantly correlated with both probabilities (figures 4a and S4). However, subjects varied in whether the VMPFC signal better reflected the reward probability based on outcome history or on social information. The extent to which the VMPFC data reflected each source of information (at the time of the decision) was predicted by the ACCs/ACCg response to outcome/social volatility (at the time when the outcomes were witnessed) (figure 4b,c).
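
One way to see how two separately learnt probabilities could yield a single estimate for a decision is Bayes' rule under an independence assumption. The Python sketch below is an illustration only: the combination rule actually used is deferred to the supplementary information, and the function name `combine`, the independence assumption and the example numbers are hypothetical.

```python
def combine(p_blue_experience, p_advice_correct, advice_is_blue):
    """Posterior probability that blue is correct given both information sources.

    Assumes the correctness of the advice is independent of which colour is
    correct (an illustrative assumption, not taken from the paper).
    """
    # Probability of the observed advice if blue is the correct colour.
    like_blue = p_advice_correct if advice_is_blue else 1.0 - p_advice_correct
    # Probability of the same advice if green is the correct colour.
    like_green = 1.0 - like_blue
    numerator = p_blue_experience * like_blue
    denominator = numerator + (1.0 - p_blue_experience) * like_green
    return numerator / denominator


# A confederate believed to be at chance leaves the experience-based estimate unchanged.
print(combine(0.7, 0.5, advice_is_blue=True))
# A trusted confederate advising blue pushes the estimate up.
print(combine(0.7, 0.8, advice_is_blue=True))
# A confederate expected to lie makes 'blue' advice count as evidence for green.
print(combine(0.7, 0.2, advice_is_blue=True))
```

Under this rule, advice from a source believed to be unreliable in a known direction is still informative, which is why tracking the confederate's fidelity matters even when the advice is often wrong.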

Figure 4

Combination of expected value of chosen option in VMPFC. (a) Activation for the combination (mean contrast) of experience-based probability during CUE and SUGGEST phases, and advice-based probability during SUGGEST phase (thresholded at Z>3.1, p<0.005 cluster-corrected for VMPFC). These phases represent the times at which subjects had these probabilities available to them (see supplementary information and figure S4). (b) Correlation between effect of outcome-based probability in VMPFC during the decision and effect of outcome volatility in ACCs during MONITOR (R = 0.6119, p<0.0002). (c) Correlation between effect of confederate-based probability in VMPFC during the decision and effect of confederate volatility in ACCg during MONITOR (R = 0.6119, p<0.0002).

Here, we have shown that the weighting assigned to social information is subject to learning and continual update via associative mechanisms. We use techniques that predict behaviour when learning from personal experiences to show that similar mechanisms explain behaviour in a social context. Furthermore, we demonstrate fundamental similarities between the neural encoding of key parameters for reward-based and social learning. Despite employing similar mechanisms, distinct anatomical structures code learning parameters in the two domains. However, information from both is combined in ventromedial prefrontal cortex when making a decision. By comparing the two sources of information, we find that social prediction error signals similar to those reported in dopamine neurons for reward-based learning are coded in the MTG, STS/TPJ and DMPFC. BOLD signal fluctuations in these regions are often seen in social tasks26, 27, and in tasks that involve the attribution of motive to stimuli28. Such activations have been thought critical in studies of theory of mind28. That these regions should code quantitative prediction and prediction error signals about a confederate lends more weight to the argument that social evaluation mechanisms are able to rely on simple associative processes. A second crucial parameter in reinforcement learning models is the learning rate, reflecting the value of each new piece of information. In the context of reward-based learning, this parameter predicts BOLD signal fluctuations in ACC sulcus at the crucial time for learning8, a finding that is replicated here. We further demonstrate that the exact same computational parameter, in the context of social learning, predicts BOLD fluctuations in the neighbouring ACC gyrus. This functional dissociation is mirrored by differences in the regions’ anatomical connectivity.
In the macaque monkey, connections with motor regions lie predominantly in ACCs29, giving access to information about the monkey’s own actions. Connections with visceral and social regions, including the STS, lie predominantly in ACCg29, giving access to information about other agents. Nevertheless, the fact that the same computational parameter is represented in ACCs and ACCg suggests that parallel streams of learning occur within ACC for social and non-social information. It has been suggested that VMPFC activity might represent a common currency in which the value of different types of items might be encoded25, 30. Here we show that the same portion of the VMPFC represents the expected value of a decision based on the combination of information from social and experiential sources. However, the extent to which the VMPFC signal reflects each source of information during a decision is predicted by the extent to which the ACCs and ACCg modulate their activity at the point when information is learnt. If, as is suggested, the VMPFC response codes the expected value of a decision, then the ACCs response to each new outcome predicts the extent to which this outcome will determine future valuation of an action; the ACCg response predicts the extent to which this outcome will determine future valuation of an individual.

Methods Summary

(For detailed methods see supplementary information). Short description of task (Figure 1a). Subjects performed a decision-making task whilst undergoing fMRI, repeatedly choosing between blue and green rectangles, each of which had a different reward magnitude available on each trial. The chance of the correct colour being blue or green depended on the recent outcome history. Prior to the experiment, subjects were introduced to a confederate. At each trial, the confederate would choose between supplying the subject with the correct or incorrect option, unaware of the number of points available. The subject’s goal was to maximise the number of points gained during the experiment. In contrast, the confederate’s goal was to ensure that the eventual score would lie within one of two pre-defined ranges, known to the confederate but not the subject. The confederate might therefore reasonably give consistently helpful or unhelpful advice, but this advice might change as the game progressed (supplementary information). During the experiment, the confederate was replaced by a computer that gave correct advice on a prescribed set of trials. Subjects knew that the trial outcomes were determined by an inanimate computer program, but believed that the social advice came from an animate agent’s decision.

29 in total

Review 1. Learning and selective attention.

Authors: P Dayan; S Kakade; P R Montague
Journal: Nat Neurosci Date: 2000-11 Impact factor: 24.884

2. Prediction of immediate and future rewards differentially recruits cortico-basal ganglia loops.

Authors: Saori C Tanaka; Kenji Doya; Go Okada; Kazutaka Ueda; Yasumasa Okamoto; Shigeto Yamawaki
Journal: Nat Neurosci Date: 2004-07-04 Impact factor: 24.884

3. Getting to know you: reputation and trust in a two-person economic exchange.

Authors: Brooks King-Casas; Damon Tomlin; Cedric Anen; Colin F Camerer; Steven R Quartz; P Read Montague
Journal: Science Date: 2005-04-01 Impact factor: 47.728

Review 4. Advances in functional and structural MR image analysis and implementation as FSL.

Authors: Stephen M Smith; Mark Jenkinson; Mark W Woolrich; Christian F Beckmann; Timothy E J Behrens; Heidi Johansen-Berg; Peter R Bannister; Marilena De Luca; Ivana Drobnjak; David E Flitney; Rami K Niazy; James Saunders; John Vickers; Yongyue Zhang; Nicola De Stefano; J Michael Brady; Paul M Matthews
Journal: Neuroimage Date: 2004 Impact factor: 6.556

Review 5. A neural substrate of prediction and reward.

Authors: W Schultz; P Dayan; P R Montague
Journal: Science Date: 1997-03-14 Impact factor: 47.728

6. Social perception from visual cues: role of the STS region.

Authors:
Journal: Trends Cogn Sci Date: 2000-07 Impact factor: 20.229

7. Perceptions of moral character modulate the neural systems of reward during the trust game.

Authors: M R Delgado; R H Frank; E A Phelps
Journal: Nat Neurosci Date: 2005-10-16 Impact factor: 24.884

8. A neural basis for social cooperation.

Authors: James Rilling; David Gutman; Thorsten Zeh; Giuseppe Pagnoni; Gregory Berns; Clinton Kilts
Journal: Neuron Date: 2002-07-18 Impact factor: 17.173

9. Dissociable roles of ventral and dorsal striatum in instrumental conditioning.

Authors: John O'Doherty; Peter Dayan; Johannes Schultz; Ralf Deichmann; Karl Friston; Raymond J Dolan
Journal: Science Date: 2004-04-16 Impact factor: 47.728

10. A role for the macaque anterior cingulate gyrus in social valuation.

Authors: P H Rudebeck; M J Buckley; M E Walton; M F S Rushworth
Journal: Science Date: 2006-09-01 Impact factor: 47.728
