Nitzan Shahar1,2, Tobias U Hauser1,2, Michael Moutoussis1,2, Rani Moran1,2, Mehdi Keramati1,2, Raymond J Dolan1,2.
Abstract
A well-established notion in cognitive neuroscience proposes that multiple brain systems contribute to choice behaviour. These include: (1) a model-free system that uses values cached from the outcome history of alternative actions, and (2) a model-based system that considers action outcomes and the transition structure of the environment. The widespread use of this distinction, across a range of applications, makes it important to index the distinct influences of these systems with high reliability. Here we consider the two-stage task, widely considered a gold-standard measure of the contribution of model-based and model-free systems to human choice. We tested the internal/temporal stability of measures from this task, including those estimated via an established computational model, as well as an extended model using drift-diffusion. Drift-diffusion modeling suggested that both choice in the first stage and RTs in the second stage are directly affected by a model-based/free trade-off parameter. Both parameter recovery and the stability of model-based estimates were poor but improved substantially when both choice and RT were used (compared to choice only), and when more trials (than conventionally used in research practice) were included in our analysis. The findings have implications for the interpretation of past and future studies based on the two-stage task, as well as for characterising the contribution of model-based processes to choice behaviour.
Year: 2019 PMID: 30759077 PMCID: PMC6391008 DOI: 10.1371/journal.pcbi.1006803
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.779
Fig 1. A schematic of the two-stage task (panel A) and an example of a random walk used to generate the true expected value for each of the four bandits at the second stage of the task (panel B). At the first stage, participants choose between two options (represented by abstract fractal images) that determine the presentation of the second stage via fixed transition probabilities of 70% (‘common’) or 30% (‘rare’). At the second stage, participants again choose between two bandits that lead to receipt of reward (£0 or £1 play pounds). Note that the second stage included two pairs of bandits where the composition of each pair was fixed, but where the value of each bandit drifted slowly and independently. More specifically, the rewards associated with the second-stage bandits were subject to random walks and thus had to be constantly learned by participants.
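The task structure described in the caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' task code: only the 70%/30% transition probabilities and the £0/£1 outcomes come from the caption, while the function names, the random-walk step size (`sd`), and the clipping bounds (`lo`, `hi`) are assumptions chosen for the example.

```python
import random

COMMON_P = 0.7  # probability of a 'common' transition (from the Fig 1 caption)


def step_reward_probs(probs, sd=0.025, lo=0.25, hi=0.75, rng=random):
    """Advance each bandit's reward probability by one Gaussian random-walk step.

    sd, lo and hi are illustrative assumptions, not values from the source.
    Each probability drifts independently and is clipped to [lo, hi].
    """
    return [min(hi, max(lo, p + rng.gauss(0.0, sd))) for p in probs]


def run_trial(first_choice, second_choice, probs, rng=random):
    """Simulate one trial of the two-stage task.

    first_choice:  0 or 1, the first-stage option
    second_choice: 0 or 1, the bandit chosen within the reached pair
    probs: four reward probabilities ordered
           [pair 0 / bandit 0, pair 0 / bandit 1, pair 1 / bandit 0, pair 1 / bandit 1]
    Returns (second_stage_state, transition_type, reward).
    """
    common = rng.random() < COMMON_P
    # A common transition leads to the pair linked to the chosen option,
    # a rare transition leads to the other pair.
    state = first_choice if common else 1 - first_choice
    reward = 1 if rng.random() < probs[2 * state + second_choice] else 0
    return state, ("common" if common else "rare"), reward
```

Calling `step_reward_probs` once per trial before `run_trial` reproduces the property noted above: reward values keep drifting, so an agent has to keep learning them.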
Fig 2. Examining the relationship between model-agnostic scores (MB-I(choice), MB-II(RT)) and the w-parameter.
To obtain these plots we gradually increased the w-parameter from 0 to 1 in .1 steps, each time simulating 200 experiments with 5,000 trials each using the DDM-RL model (all other parameters were selected randomly and uniformly from a pre-defined range of α1/2[0,1], λ[0,1], w[0,1], p[0,.5], b1/2[1,10], a1/2[1,3], τ1/2[.01,.5] for each experiment). (A) For each of the 200 experiments, we averaged the MB-I(choice) and MB-II(RT) scores (see Eqs 11 and 13). We then standardized the eleven mean scores, separately for each MB score. Results showed a strong relationship between the w-parameter and both model-agnostic measures. (B) Here we illustrate how deployment of model-based strategies in the first stage affects MB-II(RT) via systematic effects on second-stage value discrimination. Specifically, Panel B presents the averaged ΔQ-value (max–min Q-value) for the second-stage state the agent visited. Results confirmed that higher w-parameter values lead to higher/lower value discriminability (ΔQ-value) after common/uncommon transitions, respectively. Notably, in the DDM-RL model ΔQ-values are directly and positively associated with drift-rates and hence contribute to faster RTs (see Eq 8). This result illustrates why a higher w-parameter is associated with quicker/slower RT2 after common/uncommon transitions, respectively. (C/D) To further demonstrate how deployment of model-based strategies in the first stage leads to systematic value differences in the second stage, we labelled in each trial the best and worst state (the state that included the highest Q-value out of the four available second-stage bandits, and the alternative state). Panel C shows that across all simulations the best state was associated with higher value discriminability (higher ΔQ-value), regardless of the w-parameter. Panel D further shows that a higher w-parameter is associated with a higher probability of visiting the best state by means of common transitions (see Eq 3). Therefore, Panels C and D illustrate why a higher w-parameter leads to higher value discriminability after common trials, as shown in Panel B.
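The simulation design in this caption (eleven w values, 200 experiments per value, remaining parameters drawn uniformly) could be set up roughly as follows. This is only a sketch of the parameter sampling: the ranges are those quoted in the caption, but the dictionary keys and the function name are hypothetical, and the DDM-RL simulation itself is not reproduced here.

```python
import random

# Ranges quoted in the Fig 2 caption; the key names are illustrative assumptions.
PARAM_RANGES = {
    "alpha1": (0.0, 1.0), "alpha2": (0.0, 1.0),
    "lambda": (0.0, 1.0),
    "p": (0.0, 0.5),
    "b1": (1.0, 10.0), "b2": (1.0, 10.0),
    "a1": (1.0, 3.0), "a2": (1.0, 3.0),
    "tau1": (0.01, 0.5), "tau2": (0.01, 0.5),
}


def sweep_parameter_sets(n_experiments=200, seed=None):
    """Yield one parameter dictionary per simulated experiment.

    w is stepped from 0 to 1 in .1 increments (eleven values); every other
    parameter is drawn uniformly from its range for each experiment.
    """
    rng = random.Random(seed)
    for step in range(11):
        w = step / 10
        for _ in range(n_experiments):
            params = {name: rng.uniform(lo, hi)
                      for name, (lo, hi) in PARAM_RANGES.items()}
            params["w"] = w
            yield params
```

Each yielded dictionary would then be passed to a simulator of the DDM-RL model, which is outside the scope of this sketch.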
Correlation estimates describing the relationship between the different model-based estimates.
| | MB-I (choice) | MB-II (RT) |
|---|---|---|
| MB-II (RT) | .53 | . |
| w-parameter (RL) | .38 (.30-.45) | .26 (.18-.34) |
| w-parameter (DDM-RL) | .41 (.33-.47) | .24 (.16-.32) |
| MB-II (RT) | .61 | . |
| w-parameter (RL) | .31 | .33 |
| w-parameter (DDM-RL) | .37 | .36 |
Note.
a: Pearson correlation estimate.
b: Spearman rank estimate.
Fig 3. (A/B/C) Scatterplots showing the relationship between the three hierarchical model-based estimates obtained from empirical data (scores were averaged across baseline and follow-up).
Spearman's correlation estimating the relationship between the true and recovered parameters.
| Model | Parameter | Trials in the analysis | | | |
|---|---|---|---|---|---|
| RL model | α1 | .54 | .62 | .68 | .92 |
| | α2 | .95 | .98 | .99 | .99 |
| | λ | .53 | .71 | .71 | .88 |
| | | .82 | .90 | .91 | .97 |
| | β1 | .82 | .90 | .93 | .98 |
| | β2 | .89 | .96 | .98 | .99 |
| DDM-RL model | α1 | .68 | .72 | .84 | .94 |
| | α2 | .99 | .99 | .99 | .99 |
| | λ | .58 | .75 | .83 | .92 |
| | | .91 | .94 | .97 | .99 |
| | | .93 | .93 | .99 | .99 |
| | | .93 | .98 | .99 | .99 |
| | τ1 | .99 | .99 | .99 | .99 |
| | | .99 | .99 | .99 | .99 |
| | | .97 | .99 | .99 | .99 |
| | τ2 | .99 | .99 | .99 | .99 |
Fig 4. (A/B) Scatter plots of true versus recovered w-parameter (estimating the model-based/free trade-off). Results show a stronger correlation for the DDM-RL model (panel B; modeling choice and RT, r = .9) than for the RL (choice-only) model previously reported in the literature (panel A, r = .62).
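Parameter recovery of this kind is typically summarised by correlating the generating ("true") values with the values recovered by model fitting. The sketch below computes a Spearman rank correlation in plain Python, as one way such a recovery score could be obtained. The function names are illustrative, and the fitting step that produces the recovered values is assumed to have happened elsewhere.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(true_values, recovered_values):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(true_values), average_ranks(recovered_values))
```

Because it works on ranks, this score rewards any monotonic agreement between true and recovered values, so a systematically biased but order-preserving estimate can still recover well by this measure.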
Psychometric properties for model-based estimates.
| Measure | Score type | | |
|---|---|---|---|
| MB-I (choice) | Individual scores | .52 | .28 |
| | Hierarchical scores | .81 | .40 |
| MB-II (RT) | Individual scores | .87 | .33 |
| | Hierarchical scores | .87 | .33 |
| Latent score (choice & RT) | | . | .75 |
| w-parameter (RL model) | Individual scores | . | .16 |
| | Hierarchical scores | . | .21 |
| w-parameter (DDM-RL model) | Individual scores | . | .20 |
| | Hierarchical scores | . | .14 |
a: Pearson correlation estimate.
b: Spearman rank correlation estimate.
c: Spearman-Brown corrected Pearson correlation estimate.
Estimates in brackets represent 95% confidence intervals.
Fig 5. Internal consistency estimates for MB-I(choice) and MB-II(RT).
In all panels, the x-axis represents the number of trials in the analysis, and the y-axis the Pearson’s correlation (corrected using the Spearman-Brown formula) between the scores calculated for odd and even trials. (A) Internal stability for MB-I(choice) obtained from simulated data of the RL vs. DDM-RL models. Results suggest that reliability reached the criterion with fewer trials for the DDM-RL model than for the RL model. (B) Internal stability for MB-II(RT) obtained from simulated data of the DDM-RL model. (C/D) Internal stability for MB-I(choice) and MB-II(RT) calculated from empirical data (follow-up only). (E) Internal consistency in empirical data for the four conditions that make up MB-I(choice) (CR: common-rewarded, CU: common-unrewarded, UR: uncommon-rewarded, UU: uncommon-unrewarded, see Eqs 9–11). Ribbons present 95% CI. The horizontal line represents the .7 criterion for internal stability.
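The odd/even split-half estimate with Spearman-Brown correction described above can be sketched as follows. The correction itself is the standard formula r_SB = 2r / (1 + r). The function names and the generic `score_fn` argument are assumptions for illustration, and the paper's actual MB-I(choice) and MB-II(RT) scoring (Eqs 9–13) is not reproduced here.

```python
def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_brown(r):
    """Project a half-length correlation r to the reliability of the full-length test."""
    return 2 * r / (1 + r)


def split_half_reliability(trials_by_participant, score_fn):
    """Odd/even split-half reliability across participants.

    trials_by_participant: one list of trials per participant
    score_fn: hypothetical function mapping a list of trials to a single score
    """
    odd = [score_fn(trials[0::2]) for trials in trials_by_participant]   # trials 1, 3, 5, ...
    even = [score_fn(trials[1::2]) for trials in trials_by_participant]  # trials 2, 4, 6, ...
    return spearman_brown(pearson(odd, even))
```

For example, a raw odd/even correlation of .5 corresponds to a corrected estimate of about .67, which would still fall below the .7 criterion marked in the figure.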
Statistical power (percent of studies that rejected the null hypothesis, given that an effect exists) for a between-group design (control vs. experiment).
Table values show the chance of finding a statistically significant between-group effect as a function of true effect size, sample size and number of trials in the experiment.
| True effect size | Sample size | Number of trials | | |
|---|---|---|---|---|
| | 30 participants | 4.5% | 5.8% | 7.5% |
| | 100 participants | 6.2% | 9% | 8.2% |
| | 500 participants | 14.1% | 24% | 28.2% |
| | 30 participants | 7% | 7% | 7.9% |
| | 100 participants | 11% | 15% | 15.1% |
| | 500 participants | 52.8% | 64% | 67% |
| | 30 participants | 11.5% | 10.9% | 13% |
| | 100 participants | 31.5% | 30.3% | 37.8% |
| | 500 participants | 94.7% | 90.7% | 96.9% |
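Power figures of this kind can be approximated by Monte Carlo simulation: generate many two-group studies with a known effect, test each one, and count the proportion of rejections. The sketch below is a generic illustration and not the authors' procedure. It assumes normally distributed scores, treats sample size as per-group, uses a large-sample z-test instead of a t-test, and represents the effect of fewer trials as a lower measurement `reliability` (classical test theory: observed score = true score + noise). All names and defaults are hypothetical.

```python
import random
from statistics import NormalDist, mean, variance


def simulate_power(n_per_group, effect_size, reliability=1.0,
                   n_studies=2000, alpha=0.05, seed=None):
    """Estimate power for a two-group comparison by simulation.

    effect_size: true standardized group difference in the latent score
    reliability: var(true) / var(observed), assumed in (0, 1]
    Returns the proportion of simulated studies that reject the null.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Noise SD implied by the requested reliability when the true-score SD is 1.
    noise_sd = ((1 - reliability) / reliability) ** 0.5
    rejections = 0
    for _ in range(n_studies):
        control = [rng.gauss(0.0, 1.0) + rng.gauss(0.0, noise_sd)
                   for _ in range(n_per_group)]
        treated = [rng.gauss(effect_size, 1.0) + rng.gauss(0.0, noise_sd)
                   for _ in range(n_per_group)]
        se = (variance(control) / n_per_group + variance(treated) / n_per_group) ** 0.5
        z = (mean(treated) - mean(control)) / se
        if abs(z) > z_crit:  # two-sided test, normal approximation
            rejections += 1
    return rejections / n_studies
```

Under these assumptions, lowering `reliability` shrinks the observed effect by a factor of the square root of the reliability, which is one way to see why larger samples and more trials both raise the percentages in the table.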