Alireza Soltani, Etienne Koechlin.
Abstract
The real world is uncertain and ever changing, constantly presenting new sets of behavioral options. To attain the flexibility required to tackle these challenges successfully, most mammalian brains are equipped with computational abilities that rely on the prefrontal cortex (PFC). By examining learning in terms of internal models associating stimuli, actions, and outcomes, we argue here that adaptive behavior relies on specific interactions between multiple systems, including: (1) selective models learning stimulus–action associations through rewards; (2) predictive models learning stimulus– and/or action–outcome associations through statistical inferences anticipating behavioral outcomes; and (3) contextual models learning external cues associated with latent states of the environment. Critically, the PFC combines these internal models by forming task sets to drive behavior and, moreover, constantly evaluates the reliability of the actor task set in predicting external contingencies, in order to switch between task sets or create new ones. We review different models of adaptive behavior to demonstrate how their components map onto this unifying framework and onto specific PFC regions. Finally, we discuss how our framework may help to better understand the neural computations and cognitive architecture of the PFC regions guiding adaptive behavior.
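The selective and predictive models contrasted in the abstract can be illustrated with a minimal sketch: a delta-rule learner of stimulus–action values versus a count-based learner of outcome probabilities. The class names, learning rate, and Laplace smoothing below are illustrative assumptions, not the paper's implementation.

```python
class SelectiveModel:
    """Selective model: learns stimulus-action (S-A) values from experienced
    rewards with a simple delta rule (illustrative sketch)."""
    def __init__(self, n_stimuli, n_actions, alpha=0.1):
        self.q = [[0.0] * n_actions for _ in range(n_stimuli)]  # S-A value table
        self.alpha = alpha  # learning rate

    def update(self, s, a, reward):
        # reward-prediction-error update of the chosen S-A association
        self.q[s][a] += self.alpha * (reward - self.q[s][a])


class PredictiveModel:
    """Predictive model: learns action-outcome (A-O) probabilities by tracking
    outcome statistics, independent of the outcomes' reward values."""
    def __init__(self, n_actions, n_outcomes):
        # Laplace-smoothed outcome counts per action (assumed prior)
        self.counts = [[1] * n_outcomes for _ in range(n_actions)]

    def update(self, a, o):
        self.counts[a][o] += 1  # record the observed outcome

    def p_outcome(self, a):
        total = sum(self.counts[a])
        return [c / total for c in self.counts[a]]  # P(O | A)
```

Note the key difference the review emphasizes: the selective model stores only reward-shaped values, so it must relearn when reward contingencies change, whereas the predictive model stores outcome statistics that remain valid when the value of an outcome changes.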
Year: 2021 PMID: 34389808 PMCID: PMC8617006 DOI: 10.1038/s41386-021-01123-1
Source DB: PubMed Journal: Neuropsychopharmacology ISSN: 0893-133X Impact factor: 7.853
Fig. 1Dissecting adaptive behavior based on different types of links between stimuli, actions, and outcomes.
A The goal of learning is to obtain certain outcomes by selecting appropriate actions based on presented stimuli, while considering the context, which includes the internal state as well as external cues that reflect the latent state of the environment. This requires linking stimuli, actions, and outcomes, which can be done in multiple ways, each with a different level of flexibility.
B Different types of learning strategies for linking stimuli (S), actions (A), and outcomes (O), and their main shortcomings. (1) S-Rew associations link the reward values (Rew value) of outcomes to the stimuli that precede them, allowing the computation of stimulus value. Such a model cannot correctly link S and O if the reward that follows the same stimulus (Rew′) or the state of the animal changes. (2) A-Rew associations link the reward values (Rew value) of outcomes to the actions that precede them, allowing the computation of action value. Such a model cannot correctly link A and O if the reward that follows the same action (Rew′) or the state of the animal changes. (3) S–A associations, or selective models, link the chosen action and the stimulus that precedes it using experienced rewards. Such models cannot link S and A if the reward type or the state of the animal changes. (4) S–O (similarly feature–outcome, F–O) associations, or predictive models, link S (respectively, F) and O by learning the probability of outcomes contingent upon stimuli and/or their features, regardless of their reward values, through encoding the statistical occurrence of these outcomes. (5) A–O associations, or predictive models, link A and O by learning the probability of outcomes contingent upon actions, regardless of their reward values. Predictive models cannot easily transfer learning from one context to another.
C Flexible links between stimuli, actions, and outcomes through the creation of task sets consisting of multiple internal models (see text for more details).
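The shortcoming noted for predictive models — poor transfer across contexts — is what contextual models address: cues signal the latent state of the environment, so previously learned associations can be retrieved rather than relearned. A minimal sketch, assuming discrete cues and latent states with count-based inference (names and structure are illustrative, not from the paper):

```python
from collections import defaultdict

class ContextualModel:
    """Contextual model: learns which external cues signal which latent state
    of the environment, enabling retrieval of matching task sets."""
    def __init__(self):
        # cue -> latent-state occurrence counts
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, cue, state):
        self.counts[cue][state] += 1  # record cue-state co-occurrence

    def most_likely_state(self, cue):
        states = self.counts.get(cue)
        if not states:
            return None  # unknown cue: no learned state to retrieve
        return max(states, key=states.get)
```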
Fig. 2Functional architecture of the prefrontal cortex contributing to adaptive behavior.
Medial and lateral views of the human prefrontal cortex (PFC) and its main anatomical regions, shown in relation to their contribution to adaptive behavior. The laOFC (lateral orbitofrontal cortex), vmPFC (ventromedial PFC), and dmPFC (posterior and anterior dorsomedial PFC), along with the premotor cortex (and possibly the clPFC), are presumably present in all mammals from rodents to primates and humans. The lateral PFC, including the clPFC and especially the midlPFC (mid-lateral PFC), emerges in primates, whereas the fpPFC (frontopolar cortex) is specific to humans. In the proposed framework, the laOFC encodes stimulus reward values (S-Rew), and the posterior dmPFC encodes action reward values (A-Rew). The vmPFC encodes predictive models involving the learning of (Stimulus–)Action–Outcome associations. The lateral premotor cortex encodes low-level selective models (Stimulus–Action associations), whereas the clPFC encodes higher-level selective models (Cue–(S–A) associations). Task sets (TSs) form large-scale neural frames linking the internal models encoded in these various PFC regions in order to invoke them together to guide behavior. TS reliability is the ability of a TS's internal models to jointly predict external contingencies. The midlPFC learns contextual models predicting TS reliability according to external cues. The actor TS is the TS driving ongoing behavior, whose reliability is monitored in the vmPFC. Counterfactual TSs are the TSs whose reliability is monitored in the fpPFC without contributing to ongoing behavior. White arrows indicate major information flows related to actor task-set reliability (medial PFC) and counterfactual task-set reliabilities (lateral PFC). Black arrows indicate major information flows related to the reward values of action outcomes (ventral PFC) and the reliability-based inhibition or selection of the actor TS in the dorsal PFC. See text for more explanation.
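The reliability-monitoring idea in this caption — track how well the actor task set predicts outcomes and switch when it fails — can be sketched as a simple Bayesian update against a chance baseline. The Bernoulli likelihoods, chance level of 0.5, and fixed switch threshold below are assumptions for illustration, not the framework's actual arbitration rule.

```python
def update_reliability(prior, p_outcome_under_ts, p_outcome_chance=0.5):
    """Bayesian update of the probability that the actor task set (TS)
    still predicts external contingencies better than chance."""
    evidence = prior * p_outcome_under_ts + (1 - prior) * p_outcome_chance
    return prior * p_outcome_under_ts / evidence

def monitor_actor_ts(outcome_likelihoods, prior=0.9, threshold=0.3):
    """Track actor-TS reliability over trials; signal a switch when the
    actor TS becomes unreliable (illustrative threshold)."""
    r = prior
    for p in outcome_likelihoods:
        r = update_reliability(r, p)
        if r < threshold:
            return "switch"  # retrieve a counterfactual TS or create a new one
    return "stay"
```

When the actor TS keeps predicting outcomes well (likelihoods near 1), reliability stays high and behavior continues; a run of poorly predicted outcomes drives reliability below threshold and triggers a switch.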