| Literature DB >> 31920796 |
Bernard W Balleine1, Amir Dezfouli2.
Abstract
It is now commonly accepted that instrumental actions can reflect goal-directed control; i.e., they can show sensitivity to changes in the relationship to and the value of their consequences. With overtraining, stress, neurodegeneration, psychiatric conditions, or after exposure to various drugs of abuse, goal-directed control declines and instrumental actions are performed independently of their consequences. Although this latter insensitivity has been argued to reflect the development of habitual control, the lack of a positive definition of habits has rendered this conclusion controversial. Here we consider various alternative definitions of habit, including recent suggestions they reflect chunked action sequences, to derive criteria with which to categorize responses as habitual. We consider various theories regarding the interaction between goal-directed and habitual controllers and propose a collaborative model based on their hierarchical integration. We argue that this model is consistent with the available data, can be instantiated both at an associative level and computationally and generates interesting predictions regarding the influence of this collaborative integration on behavior.Entities:
Keywords: action sequences; chunking; goal-directed action; habits; model-based; model-free; reinforcement learning
Year: 2019 PMID: 31920796 PMCID: PMC6917637 DOI: 10.3389/fpsyg.2019.02735
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Figure 1Competition and collaboration in goal-directed and habitual action control. (A) Simple model of competition for performance with goal-directed and habitual controllers mutually inhibiting one another. (B) More sophisticated approach to competition, with goal-directed and habitual controllers competing through arbitration. (C) Behavioral evidence suggests, in contrast to competition, that habit and goal-directed processes are intimately connected and collaborate in action selection, evaluation, and execution. (D) A formal associative architecture that instantiates the collaboration between habit and goal-directed controllers through the interaction of habit memory and associative memory systems, the latter feeding back to control performance. Action selection in the habit memory is mediated by the association of S1 and R1 that feeds forward to provide both subthreshold activation of the motor output and activation of the action representation, A1, in the associative memory provoking retrieve of the action outcome (O1) and its evaluation through the interaction of the associative and evaluative memory systems. The latter provides a promiscuous, feedback (cybernetic) signal that sums with the forward excitation from the habit memory. If positively evaluated (blue lines/arrows), it provokes action execution; if negatively evaluated (red lines/arrows), it blocks performance. (E) An example of the representation of a complex habit sequence in the habit memory incorporating lever press and magazine approach responses together with a simple lever press action. Both are represented in the habit memory (the expanded sequence, the acquisition of which is supported by proprioceptive feedback from motor output) and its chunked representation in the associative memory (e.g., ALO-MA). (F) The formal associative-cybernetic model incorporating chunked action sequences and simple actions in both the habit memory and the associative memory.
Figure 2Evidence for hierarchical collaboration in humans and rats. (A) Two-stage task in human subjects. (B) After a rare transition (example shown) and revaluation of O2 (upper panel), an expanded action repertoire using action sequences (e.g., A1R1) can induce insensitivity to revaluation of the second stage choice (e.g., R1). (C) The influence of reward and non-reward on the tendency to stay on the same first stage choice after a common and a rare transition in human subjects. (D) Simulated (sim) second stage choices from various flat model-based and/or model-free RL models (left panel), a hierarchical RL model (center), and the human data (right panel). (E) Design of a two-stage task in rats with training conducted on a two-stage discrimination that is reversed, initially, every four trials and subsequently every eight trials. At various points in training, we included rare transitions as probe tests (sessions 40, 66, 78, 87, and 94). (F) The odds ratio of staying on the same stage 1 action after reward on the previous trial over the odds ratio after no reward. The horizontal line represents the indifference point. Each vertical line is one session. (G) Results from the probe tests. Note the comparable performance of rats and humans when rats show evidence of having acquired an accurate representation of the multistage nature of the task. (H) Rat data from second stage choices using a comparable version of the task to that used in humans. Panels (A–D,G,H) are taken directly from Dezfouli and Balleine (2013, 2019). Panels (E,F) are redrawn from Dezfouli and Balleine (2019).