| Literature DB >> 24312028 |
Massimo Silvetti, Ruth Seurinck, Marlies E van Bochove, Tom Verguts.
Abstract
Decision making under uncertainty is challenging for any autonomous agent. The challenge increases when the environment's stochastic properties change over time, i.e., when the environment is volatile. To adapt efficiently to volatile environments, agents must rely primarily on recent outcomes to change their decision strategies quickly; in other words, they need to increase their knowledge plasticity. Conversely, in stable environments, knowledge stability must be preferred to preserve useful information against noise. Here we propose that, in the mammalian brain, the locus coeruleus (LC) is one of the nuclei involved in volatility estimation and in the subsequent control of neural plasticity. During a reinforcement learning task, LC activation, measured by means of pupil diameter, coded for both environmental volatility and learning rate. We hypothesize that the LC could be responsible, through noradrenergic modulation, for adaptations that optimize decision making in volatile environments. We also suggest a computational model of the interaction between the anterior cingulate cortex (ACC) and the LC for volatility estimation.
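The plasticity/stability trade-off described in the abstract can be illustrated with a minimal delta-rule sketch (this is an illustration of the general principle, not the authors' model; the learning-rate values and the simulated environment are arbitrary assumptions). A high learning rate tracks a sudden change in reward probability quickly, while a low one resists trial-to-trial noise but adapts slowly:

```python
# Illustrative sketch of the plasticity/stability trade-off (not the paper's
# model). A delta-rule learner updates a reward expectation V toward each
# binary outcome r with a fixed learning rate lr.

def track(outcomes, lr):
    """Return the trajectory of the reward expectation V under a delta rule."""
    V = 0.5                    # initial expectation
    trace = []
    for r in outcomes:
        V += lr * (r - V)      # move V toward the outcome by a fraction lr
        trace.append(V)
    return trace

# Hypothetical environment whose reward rate flips mid-run (volatility):
outcomes = [1] * 10 + [0] * 10
slow = track(outcomes, lr=0.1)   # "stable" learner: low plasticity
fast = track(outcomes, lr=0.5)   # "plastic" learner: high plasticity
```

After the flip at trial 10, the high-learning-rate trajectory converges to the new reward rate within a few trials, whereas the low-learning-rate trajectory lags behind; in a stable but noisy environment, the same high learning rate would instead make the estimate jitter around the true rate.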
Keywords: ACC; learning rate; locus coeruleus; norepinephrine; plasticity; prediction error; reinforcement learning; volatility
Year: 2013 PMID: 24312028 PMCID: PMC3826478 DOI: 10.3389/fnbeh.2013.00160
Source DB: PubMed Journal: Front Behav Neurosci ISSN: 1662-5153 Impact factor: 3.558
Figure 1. (A) Trial timeline (left rewarded, right unrewarded trial). Participants’ choice was communicated by the appearance of a yellow bar under the selected figure. Time intervals indicate jittering. (B) Figure sets. Each set was assigned a specific reward probability (p_1 or p_2). (C) Experimental timeline showing reward rates for each figure set (color) as a function of time. Each square represents a block of 18 trials. During the volatile (Vol) environment, reward probabilities were switched between sets every 18 trials.
Figure 2. (A) Behavioral learning rate (± s.e.m.) as a function of SE. (B) Average pupil size during the choice epoch (± s.e.m.) as a function of SE. (C) Pupil size time course grand averages (baseline corrected) for the choice epoch (± s.e.m.). Vertical bar: choice onset. Timeline in milliseconds. Grey horizontal bars indicate the time windows in which the difference between Vol and Stat2 is significant (cluster-level family-wise corrected p < .05). (D) Scatter plot representing the covariation between pupil size and learning rate during the Vol period. Each data point is a single-subject average. The regression line is shown in black.
Figure 3. (A) Schema representing the RVPM. The system consists of a module simulating the ACC (which estimates reward expectations, V unit, and computes PEs, δ units), a module simulating the dopaminergic brainstem nuclei (VTA), a module making decisions on the basis of the ACC expectations (Actor module), and a module representing stimuli or possible actions (Cues module). Once the choice is made, the environment provides an outcome that is encoded by the VTA module, which delivers a dopaminergic reward signal to the ACC module. The VTA module receives recurrent connections from the ACC module, allowing the dopamine signal to shift from the reward period to the cue period. (B) Results of RVPM simulations supporting the model, from Silvetti et al. (2013). The plot shows the ACC module PE signal (sum of all the units’ activity) as a function of trial number, in three different SEs. Although the average PE signal (green dashed line) is highest during Stat2 (highly uncertain environment), the Vol environment triggers very strong phasic PE activity. This property can be exploited for volatility detection; e.g., the red dashed line indicates a possible threshold for volatility detection based on PE magnitude.
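The thresholding idea in Figure 3B can be sketched in a few lines (a hedged toy version, not the RVPM itself: the threshold value, learning rates, and function names are illustrative assumptions). A large-magnitude prediction error is taken as a volatility cue and triggers a switch to a higher learning rate:

```python
# Toy volatility detector (not the RVPM): a delta-rule learner that boosts
# its learning rate when |PE| exceeds a threshold, mimicking the PE-magnitude
# criterion sketched by the red dashed line in Figure 3B. All numeric values
# are arbitrary assumptions.

def run_learner(outcomes, lr_stable=0.1, lr_volatile=0.5, pe_threshold=0.6):
    """Track a reward expectation V; large PEs transiently raise plasticity."""
    V = 0.5                              # initial reward expectation
    history = []
    for r in outcomes:
        pe = r - V                       # prediction error (delta)
        if abs(pe) > pe_threshold:       # crude volatility detection
            lr = lr_volatile             # boost plasticity
        else:
            lr = lr_stable               # favor stability
        V += lr * pe                     # delta-rule update
        history.append(V)
    return history
```

With these settings, a run of consistent rewards keeps the learner in its low-plasticity regime; an abrupt reversal produces a PE above threshold on the first post-reversal trial, so the estimate jumps toward the new contingency in a single large step rather than drifting slowly.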