Falk Lieder, Amitai Shenhav, Sebastian Musslick, Thomas L. Griffiths.
Abstract
The human brain has the impressive capacity to adapt how it processes information to high-level goals. While it is known that these cognitive control skills are malleable and can be improved through training, the underlying plasticity mechanisms are not well understood. Here, we develop and evaluate a model of how people learn when to exert cognitive control, which controlled process to use, and how much effort to exert. We derive this model from a general theory according to which the function of cognitive control is to select and configure neural pathways so as to make optimal use of finite time and limited computational resources. The central idea of our Learned Value of Control model is that people use reinforcement learning to predict the value of candidate control signals of different types and intensities based on stimulus features. This model correctly predicts the learning and transfer effects underlying the adaptive control-demanding behavior observed in an experiment on visual attention and four experiments on interference control in Stroop and Flanker paradigms. Moreover, our model explained these findings significantly better than an associative learning model and a Win-Stay Lose-Shift model. Our findings elucidate how learning and experience might shape people's ability and propensity to adaptively control their minds and behavior. We conclude by predicting under which circumstances these learning mechanisms might lead to self-control failure.
Year: 2018 PMID: 29694347 PMCID: PMC5937797 DOI: 10.1371/journal.pcbi.1006043
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
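The abstract's core mechanism, predicting the value of candidate control signals from stimulus features and updating those predictions from reward, can be sketched as follows. This is a minimal illustration under stated assumptions: the class name, the linear value model, the cost function, and all parameter values are assumptions for exposition, not the paper's implementation.

```python
import random

class LVOCSketch:
    """Illustrative sketch of Learned Value of Control-style learning.

    A linear model predicts the value of each candidate control
    intensity from binary stimulus features; the agent picks the
    intensity with the highest predicted value (epsilon-greedy) and
    updates the weights with a delta rule on the observed net value
    (reward minus an effort cost that grows with intensity).
    """

    def __init__(self, n_features, intensities, lr=0.1, cost=0.3, eps=0.1):
        self.intensities = list(intensities)   # candidate control signals
        self.lr, self.cost, self.eps = lr, cost, eps
        # one weight vector (features plus bias) per candidate intensity
        self.w = {s: [0.0] * (n_features + 1) for s in self.intensities}

    def predict(self, signal, features):
        feats = list(features) + [1.0]         # append bias term
        return sum(w * f for w, f in zip(self.w[signal], feats))

    def choose(self, features):
        if random.random() < self.eps:         # occasional exploration
            return random.choice(self.intensities)
        return max(self.intensities, key=lambda s: self.predict(s, features))

    def update(self, signal, features, reward):
        net_value = reward - self.cost * signal     # value net of effort cost
        delta = net_value - self.predict(signal, features)
        feats = list(features) + [1.0]
        for i, f in enumerate(feats):
            self.w[signal][i] += self.lr * delta * f
```

For example, with a single feature coding whether the stimulus color is one for which performance is rewarded (as in Krebs et al., 2010), repeated updates drive the predicted value of high control intensity up for rewarded colors, so the model comes to exert more control on exactly those stimuli.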
The core assumption of the LVOC model explains the learning effects observed in five different cognitive control experiments.
| Phenomenon | Finding | Explanation of the LVOC model |
|---|---|---|
| Lin et al. (2016), Exp. 1 | In the training block, participants learn to find the target increasingly faster when it always appears in a location with a certain color. In the test block, participants are significantly slower on trials that violate this regularity. | People learn to predict the value of attending to different locations from their color. |
| Krebs et al. (2010), Exp. 1 | People come to name the color of incongruent words faster and more accurately for colors for which performance is rewarded. | People learn to predict the value of increasing control intensity from the color of the word. |
| Braem et al. (2012), Exp. 1 | On a congruent Flanker trial, people are faster when the previous trial was rewarded and congruent than when it was unrewarded and congruent, but the opposite holds when the previous trial was incongruent. These effects are amplified in people with high reward sensitivity. | People learn to exert more control on incongruent trials. Rewarded incongruent trials therefore tend to reinforce higher control signals, while rewarded congruent trials tend to reinforce lower control signals; consequently, people increase control after the former and decrease it after the latter. |
| Bugg et al. (2008), Exp. 2 | People become faster and more accurate at naming the color of an incongruently colored word when it is usually incongruent than when it is usually congruent. | People learn that exerting more control is more valuable when the color or word is predictive of incongruence. |
| Bugg et al. (2011), Exp. 2 | People are faster at naming animals in novel, incongruently labelled images when that species was mostly incongruently labelled in the training phase than when it was mostly congruently labelled. | People learn that exerting more control is more valuable when the semantic category of the picture is predictive of incongruence. |
Fig 1. Learning to control the allocation of attention.
a) Visual search task used by Lin et al. (2016). b) Human data from Experiment 1 of Lin et al. (2016). c) Predictions of the LVOC model. d) Fit of Win-Stay Lose-Shift model. e) Fit of Rescorla-Wagner model.
Fig 2. The LVOC model captures that, in the paradigm by Krebs et al. (2010), people learn to exert more cognitive control on stimuli whose features predict that performance will be rewarded (a), which manifests in faster responses (b) and fewer errors (c).
Fig 3. Metacognitive reinforcement learning captures the effect of reward on learning from experienced conflict observed by Braem et al. (2012).
a) Illustration of the Flanker task by Braem et al. (2012). b) Human data from Braem et al. (2012). c) Fit of LVOC model. d) Fit of Rescorla-Wagner model. e) Fit of Win-Stay Lose-Shift model.
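The two comparison models fit in these figures can be sketched in a few lines. These are hedged illustrative versions: the function names, signatures, and default values are assumptions for exposition, not the paper's code.

```python
import random

def win_stay_lose_shift(prev_signal, prev_rewarded, signals):
    """Win-Stay Lose-Shift baseline: keep the previous control signal
    after a rewarded trial, otherwise switch to another one at random."""
    if prev_rewarded:
        return prev_signal
    return random.choice([s for s in signals if s != prev_signal])

def rescorla_wagner_update(strength, reward, alpha=0.1):
    """Rescorla-Wagner (associative learning) baseline: delta-rule
    update of an associative strength toward the obtained reward,
    with no stimulus features and no cost of control."""
    return strength + alpha * (reward - strength)
```

The key contrast with the LVOC model is that neither baseline predicts the value of control from stimulus features: Win-Stay Lose-Shift reacts only to the previous trial's outcome, and the Rescorla-Wagner rule learns a single reward association without representing control intensity or effort cost.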
Fig 4. The LVOC model captures the finding that people learn to adjust their control intensity based on features that predict incongruence.
a) Color-Word Stroop paradigm by Bugg et al. (2008). b-c) The LVOC model captures that people learn to exploit features that predict incongruency to respond faster and more accurately on incongruent trials. d) Picture-Word Stroop paradigm by Bugg, Jacoby, and Chanani (2011). e-f) Like human participants, the LVOC model responds more quickly and accurately to novel exemplars from animal categories that it previously learned to associate with more frequent incongruent trials.
Model parameters used in the simulations of empirical findings.
| Study | | | | | | |
|---|---|---|---|---|---|---|
| Krebs et al. (2010) | 1.60 | −0.01 | 3 | 0.05 | 1.60¢ | 3.5% |
| Braem et al. (2012) | 4.17 | −2 | 2.75 | 5 | 4.17¢ | 0.8% |
| Bugg et al. (2008) | 1.95 | −2.1 | 2.65 | 3.01 | 3.89¢ | 0.4% |
| Bugg et al. (2011) | 5 | −2 | 2.75 | 3 | 18.00¢ | 0.8% |