Daniel J Schad, Elisabeth Jünger, Miriam Sebold, Maria Garbusow, Nadine Bernhardt, Amir-Homayoun Javadi, Ulrich S Zimmermann, Michael N Smolka, Andreas Heinz, Michael A Rapp, Quentin J M Huys.
Abstract
Theories of decision-making and its neural substrates have long assumed the existence of two distinct and competing valuation systems, variously described as goal-directed vs. habitual, or, more recently and based on statistical arguments, as model-free vs. model-based reinforcement-learning. Though both have been shown to control choices, the cognitive abilities associated with these systems are under ongoing investigation. Here we examine the link to cognitive abilities, and find that individual differences in processing speed covary with a shift from model-free to model-based choice control in the presence of above-average working memory function. This suggests shared cognitive and neural processes; provides a bridge between literatures on intelligence and valuation; and may guide the development of process models of different valuation components. Furthermore, it provides a rationale for individual differences in the tendency to deploy valuation systems, which may be important for understanding the manifold neuropsychiatric diseases associated with malfunctions of valuation.
Keywords: cognitive abilities; decision-making; fluid intelligence; habitual and goal-directed system; model-based and model-free learning; reward
Year: 2014 PMID: 25566131 PMCID: PMC4269125 DOI: 10.3389/fpsyg.2014.01450
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Figure 1. (A) Trial structure: Step 1 consisted of a choice between two abstract gray stimuli. The unchosen stimulus faded away while the chosen stimulus was highlighted with a red frame and moved to the top of the screen, where it remained visible for 1.5 s. In Step 2 a second, colored, stimulus pair appeared. Step 2 choices resulted either in a win of 20 Cents or no win. (B) Transition structure: Each first-stage stimulus led to one fixed second-stage pair in 70% of the trials (common transition), and to the other second-stage stimulus pair in 30% of the trials (rare transition). Reinforcement probabilities for each second-stage stimulus changed slowly and independently between 25% and 75% according to Gaussian random walks with reflecting boundaries (Daw et al., 2011). Win probabilities, P(reward), are displayed as a function of trial number. (C) Model predictions: Predictions from the computational model (Daw et al., 2011) based on the model-free (left panel) vs. model-based (right panel) system for the probability of repeating the choice from the previous trial as a function of reward (rew., rewarded; unrew., unrewarded) and transition type at the previous trial. Model-free choice predicts a main effect of reward, and no effect of transition. Model-based choice predicts an interaction of transition × reward. Figure partly adapted from Sebold et al. (2014).
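The task environment described in this caption can be sketched in a few lines of Python. This is only an illustrative simulation, not the authors' task code: the random-walk step size `sd`, the number of trials, the random starting probabilities, and the random choice policy are assumptions that do not come from the text above, and all function names are hypothetical.

```python
import random

def reflect(x, lo=0.25, hi=0.75):
    """Fold x back into [lo, hi], i.e. apply reflecting boundaries."""
    while x < lo or x > hi:
        if x < lo:
            x = 2 * lo - x
        if x > hi:
            x = 2 * hi - x
    return x

def drift_reward_probs(probs, sd=0.025, rng=random):
    """One Gaussian random-walk step per second-stage stimulus.
    sd is an assumed step size, not a value given in the caption."""
    return [reflect(p + rng.gauss(0.0, sd)) for p in probs]

def second_stage_pair(first_choice, p_common=0.7, rng=random):
    """First-stage stimulus i leads to pair i on common transitions (70%)
    and to the other pair on rare transitions (30%)."""
    common = rng.random() < p_common
    return first_choice if common else 1 - first_choice

def simulate(n_trials=200, seed=0):
    """Simulate a random-choice agent; n_trials is an assumed value."""
    rng = random.Random(seed)
    probs = [rng.uniform(0.25, 0.75) for _ in range(4)]  # 2 pairs x 2 stimuli
    trials = []
    for _ in range(n_trials):
        a1 = rng.randrange(2)                  # first-stage choice
        pair = second_stage_pair(a1, rng=rng)  # common or rare transition
        a2 = rng.randrange(2)                  # second-stage choice
        rewarded = rng.random() < probs[2 * pair + a2]
        trials.append((a1, pair, a2, rewarded))
        probs = drift_reward_probs(probs, rng=rng)
    return trials, probs
```

Calling `simulate(200, seed=1)` returns the list of simulated trials and the final reward probabilities, which stay within the 25% to 75% band by construction.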
Logistic mixed-effects model results testing the effects of individual cognitive abilities on reward, transition frequency, and their interaction in first-stage choice repetition.
| Main effect ability | |||||||||
| ability linear | 1.65 | 0.003 | 0.03 | 1.43 | 0.06 | 0.11 | 1.47 | 0.04 | 0.09 |
| ability quadratic | 0.81 | 0.02 | 0.08 | ||||||
| Reward × ability | |||||||||
| ability linear | 1.18 | 0.096 | 0.48 | 1.12 | 0.35 | 0.83 | 1.11 | 0.43 | 0.83 |
| ability quadratic | 0.78 | 0.002 | 0.02 | ||||||
| Transition × ability | |||||||||
| ability linear | 1.20 | 0.02 | 0.08 | 1.12 | 0.17 | 0.42 | 1.07 | 0.44 | 0.63 |
| ability quadratic | 0.92 | 0.14 | 0.42 | ||||||
| Reward × Transition × ability | |||||||||
| ability linear | 2.23 | 0.003 | 0.03 | 1.89 | 0.02 | 0.06 | 1.79 | 0.03 | 0.10 |
| ability quadratic | – | – | – | ||||||
Significance thresholds: p < 0.10; p < 0.05; p < 0.01.
Computational mixed-effects model parameter estimates.
| Mean | 5.00 | 3.63 | 0.39 | 0.25 | 0.45 | 0.49 | 0.12 |
| Subject SD | 0.46 | 0.18 | 0.34 | 0.51 | 0.44 | 0.80 | 0.16 |
Standard Deviations (SD) of the parameters are given on the transformed scale used for parameter fitting and statistical analysis. Statistical tests for model components are based on Bayesian model comparison (see SMO). Subject SD indicates variability of estimated model parameters across individual participants. In the model fitting, we allowed all model parameters to vary across subjects; this procedure effectively de-confounds effects of cognitive abilities on model parameters from any other process captured in the computational model.
Summary statistics of fluid intelligence scores.
| | DSST | TMT speed | TMT exec | Digit Span Backwards | MWT-B |
| Mean (SD) | 68.6 (15.9) | 34.2 (12.1) | 2.2 (0.76) | 7.6 (2.6) | 32 (3.2) |
| Range | 35–98 | 20–70 | 1.1–4.7 | 4–14 | 24–37 |
| TMT speed | −0.575 | | | | |
| TMT exec | −0.099 | −0.282 | | | |
| Digit Span Backwards | 0.306 | −0.121 | −0.090 | | |
| MWT-B | 0.576 | −0.490 | −0.205 | 0.379 | |
p < 0.01; p < 0.1.
DSST, Digit Symbol Substitution Task score; TMTspeed, Trail Making Test A in s; TMTexec, (Trail Making Test B in s)/TMTspeed; Digit Span, Digit Span Backwards maximum span retained; MWT-B, German vocabulary test.
Figure 2. (A–C) Choice repetition probabilities: Average proportion of trials on which participants repeated their previous choice, as a function of outcome (reward vs. no reward) and transition (common vs. rare) at the previous trial. Results are presented for individuals with a low (A, 35–59), medium (B, 59–75), and high (C, 76–98) performance score on the Digit Symbol Substitution Test (DSST). Error bars are subject-based standard errors of the means. (D–E) Individual reward and transition effects and DSST performance: Individual estimates of the main effect of reward (= rewarded − unrewarded; D) and the reward × transition interaction (= rewarded common − rewarded rare − unrewarded common + unrewarded rare; E) on repetition probabilities (p_repeat: repetition = 1, switch = 0) as a function of individual DSST scores. Lines show the estimated quadratic (D) and linear (E) effects with 95% confidence intervals.
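The two contrasts defined in this caption can be computed from raw repetition proportions as in the sketch below. It is a descriptive illustration only and is not the logistic mixed-effects analysis reported in the tables. The trial layout `(choice, rewarded, common)` and the function names are hypothetical, and averaging the reward effect over the two transition types with equal weight is an assumption.

```python
def stay_probabilities(trials):
    """trials: list of (first_stage_choice, rewarded, common) tuples.
    Returns P(repeat) keyed by the previous trial's (rewarded, common)."""
    counts = {}
    for prev, cur in zip(trials, trials[1:]):
        key = (prev[1], prev[2])
        stays, n = counts.get(key, (0, 0))
        counts[key] = (stays + (cur[0] == prev[0]), n + 1)
    return {key: stays / n for key, (stays, n) in counts.items()}

def reward_effect(p):
    """Main effect of reward: rewarded minus unrewarded, averaged over
    transition type (equal weighting is an assumption)."""
    rewarded = (p[(True, True)] + p[(True, False)]) / 2
    unrewarded = (p[(False, True)] + p[(False, False)]) / 2
    return rewarded - unrewarded

def interaction_effect(p):
    """Reward x transition interaction: rewarded common - rewarded rare
    - unrewarded common + unrewarded rare."""
    return (p[(True, True)] - p[(True, False)]
            - p[(False, True)] + p[(False, False)])
```

With these definitions, a purely model-free pattern (equal repetition after common and rare transitions) gives a positive reward effect and a zero interaction, whereas a model-based pattern shows up mainly in the interaction term.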
Figure 3. Individual parameter estimates and DSST performance: Maximum posterior parameter values of the dual-system reinforcement learning model for each participant as a function of performance on the DSST. The lines represent predictions from linear regressions of each model parameter on DSST scores, with 95% confidence intervals (CI). (A–D) Regression lines and CI in unbounded fitting-space were transformed to model-space for plotting by passing them through the inverse-logit function. (A) Best-fitting individual parameter values for the weighting parameter ω, which determines the balance between model-free (weight = 0) and model-based (weight = 1) control. (B) Regression of best-fitting weighting parameter values on the interaction between DSST scores × working memory span (median-split factor). (C) Best-fitting parameter values for the second-stage learning rate α2. (D) The lambda (λ) parameter determines update of model-free step 1 action values by step 2 prediction errors. (E) Repetition factor, p, indicates how strongly individuals tend to repeat previous actions.
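As a minimal sketch of the two ideas in this caption, the code below shows the inverse-logit mapping from unbounded fitting-space to the (0, 1) model-space, and a weighted mixture of model-based and model-free values controlled by ω. The linear mixing rule is an assumption based on the standard hybrid model form (ω = 0 purely model-free, ω = 1 purely model-based); the caption itself states only the endpoints. Function names are hypothetical.

```python
import math

def inv_logit(x):
    """Map an unbounded fitting-space value to (0, 1), numerically stable."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def logit(p):
    """Map a model-space value in (0, 1) to unbounded fitting-space."""
    return math.log(p / (1.0 - p))

def combined_value(q_model_based, q_model_free, omega):
    """Assumed linear mixture of the two systems' first-stage values:
    omega = 0 gives purely model-free, omega = 1 purely model-based."""
    return omega * q_model_based + (1.0 - omega) * q_model_free
```

A regression line fitted in fitting-space can be plotted in model-space by applying `inv_logit` to each predicted value, which keeps the plotted ω, α2, and λ curves inside the unit interval.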