Sophie Bavard, Maël Lebreton, Mehdi Khamassi, Giorgio Coricelli, Stefano Palminteri.
Abstract
In economics and perceptual decision-making, contextual effects are well documented: decision weights are adjusted as a function of the distribution of stimuli. Yet, in the reinforcement learning literature, whether and how contextual information pertaining to decision states is integrated in learning algorithms has received comparatively little attention. Here, we investigate reinforcement learning behavior and its computational substrates in a task where we orthogonally manipulate outcome valence and magnitude, resulting in systematic variations in state-values. Model comparison indicates that subjects' behavior is best accounted for by an algorithm that includes both reference-point dependence and range adaptation, two crucial features of state-dependent valuation. In addition, we find that state-dependent outcome valuation emerges progressively, is favored by increasing outcome information, and is correlated with explicit understanding of the task structure. Finally, our data clearly show that, while being locally adaptive (for instance in negative valence and small magnitude contexts), state-dependent valuation comes at the cost of seemingly irrational choices when options are extrapolated out of their original contexts.
Year: 2018 PMID: 30374019 PMCID: PMC6206161 DOI: 10.1038/s41467-018-06781-2
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Experimental design and normalization process. a Learning task with four different contexts: reward/big, reward/small, loss/small, and loss/big. Each symbol is associated with a probability (P) of gaining or losing an amount of money or magnitude (M). M varies as a function of the choice contexts (reward seeking: +1.0€ or +0.1€; loss avoidance: −1.0€ or −0.1€; small magnitude: +0.1€ or −0.1€; big magnitude: +1.0€ or −1.0€). b The graph schematizes the transition from absolute value encoding (where values are negative in the loss avoidance contexts and smaller in the small magnitude contexts) to relative value encoding (complete adaptation as in the RELATIVE model), where favorable and unfavorable options have similar values in all contexts, thanks to both reference-point and range adaptation
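The transition schematized in panel b can be illustrated with a small numerical example. The snippet below is a minimal sketch, not the authors' model: it assumes that the alternative to gaining or losing the magnitude M is an outcome of 0€, and it uses plain min-max rescaling as a stand-in for the combination of reference-point and range adaptation. The names `CONTEXTS` and `relative_value` are hypothetical.

```python
# Illustrative only: (worst, best) outcome per context, assuming the
# alternative to winning/losing the magnitude M is 0 EUR (an assumption).
CONTEXTS = {
    "reward/big": (0.0, 1.0),
    "reward/small": (0.0, 0.1),
    "loss/small": (-0.1, 0.0),
    "loss/big": (-1.0, 0.0),
}


def relative_value(outcome, worst, best):
    """Rescale an outcome to [0, 1] within its context (min-max rescaling).

    Subtracting `worst` plays the role of a reference point; dividing by
    the span `best - worst` plays the role of range adaptation.
    """
    return (outcome - worst) / (best - worst)


for name, (worst, best) in CONTEXTS.items():
    rel = [relative_value(o, worst, best) for o in (worst, best)]
    print(f"{name:13s} absolute={(worst, best)} relative={rel}")
```

Under these assumptions the best outcome maps to 1 and the worst to 0 in every context, which is the sense in which favorable and unfavorable options end up with similar values regardless of valence or magnitude.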
Correct choice rate of the learning sessions as a function of task factors in Experiments 1, 2 and both experiments
| Factor | Experiment 1: test statistic | Experiment 1: P value | Experiment 2: test statistic | Experiment 2: P value | Both experiments: test statistic | Both experiments: P value |
|---|---|---|---|---|---|---|
| Valence (Val) | 0.002 | 0.969 | 0.285 | 0.597 | 0.167 | 0.684 |
| Information (Inf) | – | – | 7.443 | 0.0095** | – | – |
| Magnitude (Mag) | 4.872 | 0.0398* | 4.267 | 0.0456* | 9.091 | 0.00378** |
| Val × Inf | – | – | 1.037 | 0.315 | – | – |
| Val × Mag | 4.011 | 0.0597 | 0.08 | 0.779 | 1.755 | 0.19 |
| Inf × Mag | – | – | 0.006 | 0.939 | – | – |
| Val × Inf × Mag | – | – | 0.347 | 0.559 | – | – |
**P < 0.01; *P < 0.05, t-test
Symbol choice rate of the transfer test as a function of task factors and option correctness in Experiments 1, 2 and both experiments
| Factor | Experiment 1: test statistic | Experiment 1: P value | Experiment 2: test statistic | Experiment 2: P value | Both experiments: test statistic | Both experiments: P value |
|---|---|---|---|---|---|---|
| Valence (Val) | 33.42 | 1.43e−05*** | 43.78 | 7.23e−08*** | 76 | 3.38e−12*** |
| Favorableness (Fav) | 57.66 | 3.6e−07*** | 149.5 | 6.46e−15*** | 203.5 | <2e−16*** |
| Magnitude (Mag) | 2.929 | 0.103 | 4.225 | 0.0466* | 0.525 | 0.472 |
| Val × Fav | 4.039 | 0.0589 | 6.584 | 0.0142* | 10.8 | 0.00171** |
| Val × Mag | 11.68 | 0.00289** | 3.565 | 0.0665 | 11.55 | 0.00122** |
| Fav × Mag | 10.8 | 0.00388** | 0.441 | 0.51 | 4.131 | 0.0466* |
| Val × Fav × Mag | 8.241 | 0.00979** | 1.529 | 0.224 | 7.159 | 0.00964** |
***P < 0.001; **P < 0.01; *P < 0.05; t-test
BICs as a function of the dataset used for parameter optimization (Learning sessions, Transfer test or Both) and the computational model
| Model | Experiment 1: Learning sessions (nt = 160) | Experiment 1: Transfer test (nt = 112) | Experiment 1: Both (nt = 272) | Experiment 2: Learning sessions (nt = 160) | Experiment 2: Transfer test (nt = 112) | Experiment 2: Both (nt = 272) | Both experiments: Learning sessions (nt = 160) | Both experiments: Transfer test (nt = 112) | Both experiments: Both (nt = 272) |
|---|---|---|---|---|---|---|---|---|---|
| ABSOLUTE (df = 2/3) | 179.8 ± 5.9 | 113.6 ± 5.7 | 295.1 ± 9.9 | 190.9 ± 5.9 | 126.9 ± 4.1 | 325.4 ± 6.5 | 187.2 ± 3.8 | 122.4 ± 3.4 | 315.3 ± 5.6 |
| RELATIVE (df = 2/3) | 193.3 ± 4.5 | 135.8 ± 5.1 | 329.6 ± 8.0 | 185.1 ± 5.6 | 121.1 ± 4.0 | 306.0 ± 7.3 | 187.9 ± 4.0 | 126.0 ± 3.3 | 313.9 ± 5.7 |
| HYBRID (df = 3/4) | 178.3 ± 6.0 | 109.3 ± 5.0 | 284.6 ± 9.1 | 181.5 ± 5.8 | 105.8 ± 4.1 | 290.5 ± 8.0 | 180.5 ± 4.3 | 106.9 ± 3.2 | 288.5 ± 6.1 |
| POLICY (df = 2/3) | 185.4 ± 6.9 | 123.7 ± 6.3 | 311.0 ± 12.2 | 190.1 ± 4.9 | 139.4 ± 3.9 | 334.6 ± 6.5 | 188.5 ± 3.9 | 134.2 ± 3.4 | 326.7 ± 6.0 |
| UTILITY (df = 3/4) | 173.9 ± 6.5 | 107.5 ± 6.3 | 282.2 ± 10.8 | 183.4 ± 5.6 | 123.1 ± 4.5 | 310.1 ± 7.1 | 180.2 ± 4.3 | 117.9 ± 3.8 | 300.8 ± 6.2 |
nt, number of trials; df, degrees of freedom
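For readers who want to relate the table to nt and df, the snippet below is a minimal sketch that assumes the standard definition of the Bayesian information criterion, BIC = df · ln(nt) + 2 · NLL, where NLL is the negative log-likelihood of the choices. This record does not state the exact formula used by the authors, so the function `bic` is an assumption, and `MEAN_BIC_BOTH` is a hypothetical name for values copied from the table (both experiments, parameters optimized on both datasets).

```python
import math


def bic(neg_log_likelihood, df, nt):
    """Standard Bayesian information criterion (lower is better).

    Assumed form: BIC = df * ln(nt) + 2 * NLL, where NLL is the negative
    log-likelihood of the choices under the fitted model.
    """
    return df * math.log(nt) + 2.0 * neg_log_likelihood


# Mean BICs copied from the table above
# (both experiments, parameters optimized on both datasets, nt = 272).
MEAN_BIC_BOTH = {
    "ABSOLUTE": 315.3,
    "RELATIVE": 313.9,
    "HYBRID": 288.5,
    "POLICY": 326.7,
    "UTILITY": 300.8,
}

# The model with the lowest mean BIC is preferred.
best_model = min(MEAN_BIC_BOTH, key=MEAN_BIC_BOTH.get)
print(best_model)  # HYBRID
```

Under this definition each additional free parameter costs ln(nt) BIC points, so a model with more parameters is preferred only if its fit improves by more than that penalty.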
Fig. 2 Behavioral results and model simulations. a Correct choice rate during the learning sessions. b Difference in correct choice rate between big and small magnitude contexts (big minus small) during the learning sessions. c and d Choice rate in the transfer test. Colored bars represent the actual data. Big black (RELATIVE), white (ABSOLUTE), and gray (HYBRID) dots represent the model-predicted choice rate. Small light gray dots above and below the bars represent individual subjects (N = 60). White stars indicate a significant difference from zero. Error bars represent s.e.m. **P < 0.01, t-test. Green arrows indicate significant differences between actual and predicted choices at P < 0.001, t-test
Fig. 3 Transfer test behavioral results and model simulations. Colored map of pairwise choice rates during the transfer test for each symbol when compared to each of the seven other symbols, noted here generically as ‘option 1’ and ‘option 2’. Comparisons between the same symbols are undefined (black squares). a Experimental data, b ABSOLUTE model, c RELATIVE model, and d HYBRID model
Fig. 4 Computational properties and behavioral correlates of value normalization. a Likelihood difference (from model fitting) between the RELATIVE and the ABSOLUTE models over the 80 trials of the task sessions for both experiments (N = 60). A negative likelihood difference means that the ABSOLUTE model is the best-fitting model for the trial and a positive likelihood difference means that the RELATIVE model is the best-fitting model for the trial. Green dots: likelihood difference significantly different from 0 (P < 0.05, t-test). b Likelihood difference between the RELATIVE and the ABSOLUTE models over the first part of the task (first 40 trials) and the last part (last 40 trials) for both experiments. c Likelihood difference between the RELATIVE and the ABSOLUTE models for the two experiments. A negative likelihood difference means that the ABSOLUTE model is the best-fitting model for the experiment and a positive likelihood difference means that the RELATIVE model is the best-fitting model for the experiment. d Subject-specific free parameter weight (ω) comparison for the two experiments. e Subject-specific free parameter weight (ω) as a function of correct debriefing for the two questions (“fixed pairs” and “number of pairs”). f Debriefing as a function of the weight parameter. Small light gray dots above and below the bars in a–f represent individual subjects (N = 60). g and h Correct choice rate as a function of subjects’ weight parameter in the learning sessions and the transfer test for both Experiments 1 and 2. One dot corresponds to one participant (N = 60); green lines represent the linear regression fits. Error bars represent s.e.m. ***P < 0.001, **P < 0.01, *P < 0.05, t-test
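The sign convention of the trial-wise comparison in panel a can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes that each model supplies, for every trial, the probability it assigned to the choice the subject actually made, and that the difference is taken between log-probabilities. The function name and the example probabilities are hypothetical.

```python
import math


def likelihood_difference(p_relative, p_absolute):
    """Per-trial log-likelihood difference, RELATIVE minus ABSOLUTE.

    Each argument is a sequence of probabilities that the given model
    assigned to the subject's actual choice on each trial.
    Positive values favor RELATIVE, negative values favor ABSOLUTE.
    """
    return [
        math.log(pr) - math.log(pa)
        for pr, pa in zip(p_relative, p_absolute)
    ]


# Made-up probabilities for three trials (not data from the study).
diffs = likelihood_difference([0.5, 0.6, 0.8], [0.5, 0.7, 0.6])
print([round(d, 3) for d in diffs])
```

In this toy input the first trial is a tie, the second favors ABSOLUTE (negative difference), and the third favors RELATIVE (positive difference).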
Model parameters of the HYBRID model as a function of the dataset used for parameter optimization (learning sessions, transfer test or Both) and the computational model
| Parameter | Experiment 1: Learning sessions | Experiment 1: Transfer test | Experiment 1: Both | Experiment 2: Learning sessions | Experiment 2: Transfer test | Experiment 2: Both | Both experiments: Learning sessions | Both experiments: Transfer test | Both experiments: Both |
|---|---|---|---|---|---|---|---|---|---|
| | 0.15 ± 0.04 | 0.12 ± 0.03 | 0.09 ± 0.02 | 0.30 ± 0.11 | 0.13 ± 0.04 | 0.17 ± 0.04 | 0.25 ± 0.08 | 0.13 ± 0.03 | 0.15 ± 0.03 |
| | 0.25 ± 0.06 | 0.30 ± 0.08 | 0.14 ± 0.04 | 0.23 ± 0.04 | 0.34 ± 0.07 | 0.20 ± 0.04 | 0.24 ± 0.04 | 0.33 ± 0.05 | 0.18 ± 0.03 |
| | – | – | – | 0.16 ± 0.04 | 0.25 ± 0.05 | 0.16 ± 0.03 | – | – | – |
| | 0.29 ± 0.07 | 0.34 ± 0.06 | 0.34 ± 0.06 | 0.52 ± 0.06 | 0.58 ± 0.06 | 0.58 ± 0.05 | 0.44 ± 0.05 | 0.50 ± 0.05 | 0.50 ± 0.04 |