Maël Lebreton, Karin Bacily, Stefano Palminteri, Jan B Engelmann.
Abstract
The ability to correctly estimate the probability of one's choices being correct is fundamental to optimally re-evaluate previous choices or to arbitrate between different decision strategies. Experimental evidence nonetheless suggests that this metacognitive process, confidence judgment, is susceptible to numerous biases. Here, we investigate the effect of outcome valence (gains or losses) on confidence while participants learned stimulus-outcome associations by trial and error. In two experiments, participants were more confident in their choices when learning to seek gains compared to avoiding losses, despite equal difficulty and performance between those two contexts. Computational modelling revealed that this bias is driven by the context-value, a dynamically updated estimate of the average expected value of choice options, necessary to explain equal performance in the gain and loss domains. The biasing effect of context-value on confidence, revealed here for the first time in a reinforcement-learning context, is therefore domain-general, with likely important functional consequences. We show that one such consequence emerges in volatile environments, where the (in)flexibility of individuals' learning strategies differs when outcomes are framed as gains or losses. Despite apparently similar behavior, profound asymmetries might therefore exist between learning to avoid losses and learning to seek gains.
Year: 2019 PMID: 30958826 PMCID: PMC6472836 DOI: 10.1371/journal.pcbi.1006973
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Reinforcement-learning.
Model comparison. AIC, Akaike Information Criterion (computed with nLLmax); BIC, Bayesian Information Criterion (computed with nLLmax); DF, degrees of freedom; nLLmax, negative log likelihood; nLPPmax, negative log of posterior probability; EF, expected frequency of the model given the data; XP, exceedance probability (computed using the Laplace approximation of the model evidence ME). The table summarizes the fitting performance of each model.
| Experiment | Model | DF | nLLmax | AIC | BIC | nLPPmax | EF | XP |
| Experiment 1 | ABSOLUTE | 3 | 385±20 | 392±20 | 404±20 | 391±20 | 0.28 | 0.02 |
| Experiment 1 | RELATIVE | 4 | 345±24 | 353±24 | 369±24 | 354±24 | 0.72 | 0.98 |
| Experiment 2 | ABSOLUTE | 3 | 411±15 | 417±15 | 429±15 | 416±15 | 0.05 | 0.0 |
| Experiment 2 | RELATIVE | 4 | 355±16 | 363±16 | 379±16 | 362±16 | 0.95 | 1.0 |
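The caption above names AIC and BIC as the comparison criteria. As a minimal sketch, assuming the standard textbook definitions (the exact scaling used for the table is not stated here), both can be computed from a negative log likelihood, the number of free parameters, and the number of observations. The function names and all numeric inputs below are hypothetical and chosen only for illustration.

```python
import math


def aic(nll: float, n_params: int) -> float:
    """Akaike Information Criterion from a negative log likelihood."""
    return 2.0 * nll + 2.0 * n_params


def bic(nll: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion from a negative log likelihood."""
    return 2.0 * nll + n_params * math.log(n_obs)


# Hypothetical values, not taken from the table: a 3-parameter model and a
# 4-parameter model fitted to the same 400 observations.
nll_simple, nll_complex = 200.0, 190.0
n_trials = 400

print("AIC:", aic(nll_simple, 3), aic(nll_complex, 4))
print("BIC:", bic(nll_simple, 3, n_trials), bic(nll_complex, 4, n_trials))
```

Lower values indicate a better trade-off between fit and complexity. BIC penalizes each extra parameter by the log of the number of observations, so it is the stricter criterion whenever there are more than about seven observations.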
Reinforcement-learning.
Free parameters. ABSOLUTE, absolute value learning model; RELATIVE, relative value learning model (best-fitting model); LL optimization, parameters obtained when minimizing the negative log likelihood; LPP optimization, parameters obtained when minimizing the negative log of the posterior probability. The table summarizes, for each model, the likelihood-maximizing (best) parameters averaged across subjects. Data are expressed as mean±s.e.m. The values retrieved from the LPP optimization procedure are those used to generate the variables used in the confidence GLME models.
| Experiment 1 | LL optimization: ABSOLUTE | LL optimization: RELATIVE | LPP optimization: ABSOLUTE | LPP optimization: RELATIVE |
| Inverse temperature | 6.29±0.63 | 54.04±38.8 | 6.07±0.61 | 12.65±1.47 |
| Factual learning rate | 0.37±0.05 | 0.23±0.04 | 0.36±0.04 | 0.24±0.04 |
| Counterfactual learning rate | 0.13±0.03 | 0.07±0.02 | 0.15±0.03 | 0.09±0.02 |
| Context learning rate | - | 0.46±0.10 | - | 0.46±0.10 |
| Experiment 2 | LL optimization: ABSOLUTE | LL optimization: RELATIVE | LPP optimization: ABSOLUTE | LPP optimization: RELATIVE |
| Inverse temperature | 102.00±99.49 | 83.05±73.15 | 2.65±0.29 | 6.86±0.81 |
| Factual learning rate | 0.49±0.07 | 0.26±0.04 | 0.49±0.07 | 0.24±0.04 |
| Counterfactual learning rate | 0.24±0.08 | 0.12±0.04 | 0.24±0.08 | 0.13±0.03 |
| Context learning rate | - | 0.41±0.09 | - | 0.40±0.09 |
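To make the roles of the four parameters concrete, here is a minimal sketch of a relative value learner. It is an illustration under stated assumptions, not the authors' implementation: the update equations, the order of the updates, the function names, the deterministic outcomes, and the assumption that both the chosen and unchosen outcomes are observed on every trial are all hypothetical. The learning rates and inverse temperature are rounded values borrowed from the table for illustration only.

```python
import math


def softmax_choice_prob(q_a: float, q_b: float, beta: float) -> float:
    """Probability of choosing option A over B with inverse temperature beta."""
    return 1.0 / (1.0 + math.exp(-beta * (q_a - q_b)))


def relative_update(q_chosen, q_unchosen, v_context, r_chosen, r_unchosen,
                    alpha_factual, alpha_counterfactual, alpha_context):
    """One trial of an assumed relative value learning rule."""
    # Context value tracks the average outcome available in this context.
    v_context += alpha_context * ((r_chosen + r_unchosen) / 2.0 - v_context)
    # Outcomes are centred on the context value before each delta-rule update.
    q_chosen += alpha_factual * ((r_chosen - v_context) - q_chosen)
    q_unchosen += alpha_counterfactual * ((r_unchosen - v_context) - q_unchosen)
    return q_chosen, q_unchosen, v_context


def run_context(r_chosen, r_unchosen, n_trials=200, alpha_factual=0.24,
                alpha_counterfactual=0.09, alpha_context=0.46):
    """Repeat the update with fixed (deterministic) outcomes, starting from zero."""
    q_c = q_u = v = 0.0
    for _ in range(n_trials):
        q_c, q_u, v = relative_update(q_c, q_u, v, r_chosen, r_unchosen,
                                      alpha_factual, alpha_counterfactual,
                                      alpha_context)
    return q_c, q_u, v


# Gain context: the better option yields +1, the worse 0.
# Loss context: the better option yields 0, the worse -1.
gain = run_context(1.0, 0.0)
loss = run_context(0.0, -1.0)
print("gain context (q_chosen, q_unchosen, v):", gain)
print("loss context (q_chosen, q_unchosen, v):", loss)
print("p(choose better option), gain:", softmax_choice_prob(gain[0], gain[1], 12.65))
print("p(choose better option), loss:", softmax_choice_prob(loss[0], loss[1], 12.65))
```

Under these assumptions, the option values converge to the same pair in both contexts, so the choice probabilities match, while the context value settles at a positive level for gains and a negative level for losses. This is the sense in which a context value can coexist with equal performance across valence domains.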
Modelling confidence ratings.
Estimated fixed-effect coefficients from generalized linear mixed-effect models.
| Experiment 1 | REDUCED GLME 1 | REDUCED GLME 2 | FULL GLME |
| Intercept | 0.52±0.04 | 0.72±0.02 | 0.53±0.04 |
| Choice difficulty | 0.33±0.06 | 0.47±0.07 | 0.30±0.05 |
| Preceding confidence | 0.28±0.04 | - | 0.28±0.03 |
| Context value | - | 0.45±0.14 | 0.47±0.14 |
| Experiment 2 | REDUCED GLME 1 | REDUCED GLME 2 | FULL GLME |
| Intercept | 0.53±0.03 | 0.75±0.02 | 0.53±0.03 |
| Choice difficulty | 0.18±0.02 | 0.25±0.04 | 0.17±0.03 |
| Preceding confidence | 0.29±0.04 | - | 0.30±0.04 |
| Context value | - | 0.17±0.7 | 0.16±0.06 |
Note that the number of degrees of freedom differs between REDUCED GLME 1 and REDUCED GLME 2 in Experiment 1 because some participants failed to answer within the allocated time, causing missed observations. This has a lower impact on the number of usable observations in REDUCED GLME 2 because this model does not use “preceding confidence”, which contributes additional missing observations, on top of the missed trials, in REDUCED GLME 1 and FULL GLME.
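As a rough illustration of how the fixed-effect coefficients combine, the sketch below evaluates a linear predictor using the four estimates in the third column of the first coefficient block above (0.53, 0.30, 0.28, 0.47). It assumes an identity link and predictors on an arbitrary common scale, neither of which is stated here. The function name and the input values are hypothetical, and random effects are ignored.

```python
def predicted_confidence(difficulty: float, preceding_confidence: float,
                         context_value: float, intercept: float = 0.53,
                         b_difficulty: float = 0.30, b_preceding: float = 0.28,
                         b_context: float = 0.47) -> float:
    """Fixed-effect linear predictor for confidence (identity link assumed)."""
    return (intercept
            + b_difficulty * difficulty
            + b_preceding * preceding_confidence
            + b_context * context_value)


# Hypothetical trial: same difficulty and preceding confidence, but a positive
# context value (gain-like) versus a negative one (loss-like).
conf_gain = predicted_confidence(0.2, 0.1, context_value=0.25)
conf_loss = predicted_confidence(0.2, 0.1, context_value=-0.25)
print("predicted confidence, positive context value:", conf_gain)
print("predicted confidence, negative context value:", conf_loss)
print("difference attributable to context value:", conf_gain - conf_loss)
```

With every other predictor held constant, the difference between the two predictions equals the context-value coefficient times the change in context value.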
Modelling performance and reaction times.
Estimated fixed-effect coefficients from generalized linear mixed-effect models (performance: logistic regression; reaction times: linear regression).
| Experiment 1 | Performance | Reaction times |
| Intercept | -0.84±0.20 | 1.90±0.09 |
| Choice difficulty | 9.90±1.67 | -0.65±0.20 |
| Preceding confidence | 1.28±0.36 | -0.24±0.14 |
| Context value | 1.19±0.54 | -0.37±0.11 |
| Experiment 2 | Performance | Reaction times |
| Intercept | -0.71±0.22 | 1.68±0.09 |
| Choice difficulty | 5.29±0.76 | -0.41±0.09 |
| Preceding confidence | 1.21±0.33 | -0.54±0.10 |
| Context value | 0.30±0.28 | -0.17±0.05 |
Assessing the specific role of context values on confidence.
Estimated fixed-effect coefficients from generalized linear mixed-effect models.
| Model including reaction times | Experiment 1 | Experiment 2 |
| Intercept | 0.58±0.05 | 0.68±0.03 |
| Choice difficulty | 0.27±0.05 | 0.13±0.03 |
| Preceding confidence | 0.26±0.03 | 0.24±0.04 |
| Context value | 0.43±0.14 | 0.15±0.06 |
| Reaction times | -0.03±0.01 | -0.09±0.01 |
| Model including q-values sum | Experiment 1 | Experiment 2 |
| Intercept | 0.53±0.04 | 0.53±0.03 |
| Choice difficulty | 0.24±0.05 | 0.14±0.03 |
| Preceding confidence | 0.28±0.04 | 0.30±0.04 |
| Context value | 0.10±0.05 | 0.06±0.02 |
| q-values sum | 0.22±0.09 | 0.06±0.02 |