T V Lim, R N Cardinal, G Savulich, P S Jones, A A Moustafa, T W Robbins, K D Ersche.
Abstract
RATIONALE: Drug addiction has been suggested to develop through drug-induced changes in learning and memory processes. Whilst the initiation of drug use is typically goal-directed and hedonically motivated, over time, drug-taking may develop into a stimulus-driven habit, characterised by persistent use of the drug irrespective of the consequences. Converging lines of evidence suggest that stimulant drugs facilitate the transition of goal-directed into habitual drug-taking, but their contribution to goal-directed learning is less clear. Computational modelling may provide an elegant means for elucidating changes during instrumental learning that may explain enhanced habit formation.
Keywords: Appetitive discrimination learning; Computational modelling; Extinction; Goal-directed learning/behaviour; Habit; Hierarchical Bayesian; Perseveration; Positive feedback; Reinforcement sensitivity
Year: 2019 PMID: 31372665 PMCID: PMC6695345 DOI: 10.1007/s00213-019-05330-z
Source DB: PubMed Journal: Psychopharmacology (Berl) ISSN: 0033-3158 Impact factor: 4.530
Fig. 1 Outline of the appetitive discrimination learning task. Participants had to learn by trial and error which response to each animal picture gained them points. Feedback was provided by a picture of another animal coupled with either a number of points or an empty box with no points.
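The two-choice task in Fig. 1 is the kind of problem the reinforcement learning models in the model-comparison table are fitted to. As a rough illustration only, the Python sketch below computes the log-likelihood of a participant's choices under one common Q-learning (Rescorla-Wagner) formulation with a learning rate, an optional separate extinction rate for non-rewarded trials, a reinforcement sensitivity that scales the outcome, and a perseveration bonus for repeating a previous response. The authors' exact equations are not given here: the parameterisation, the per-stimulus perseveration, the binary reward coding and all function names are assumptions.

```python
import math

def choice_probs(values, last_choice, perseveration):
    """Softmax over the two responses to one stimulus.

    Assumption: perseveration adds a fixed bonus to whichever response was
    chosen the last time this stimulus appeared (None if not seen before).
    """
    logits = [values[a] + (perseveration if a == last_choice else 0.0) for a in (0, 1)]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def log_likelihood(trials, alpha, rho, perseveration, alpha_ext=None):
    """Log-likelihood of a choice sequence under a hypothetical Q-learning model.

    trials: iterable of (stimulus, choice, reward), choice in {0, 1}, reward in {0, 1}.
    alpha: learning rate (used for all outcomes if alpha_ext is None).
    alpha_ext: optional separate learning rate for non-reward ("extinction rate").
    rho: reinforcement sensitivity, scaling the outcome before the prediction error.
    """
    q = {}     # stimulus -> [value of response 0, value of response 1]
    last = {}  # stimulus -> response chosen on its previous presentation
    total = 0.0
    for stim, choice, reward in trials:
        values = q.setdefault(stim, [0.0, 0.0])
        p = choice_probs(values, last.get(stim), perseveration)
        total += math.log(p[choice])
        # Use the extinction rate only on non-rewarded trials, if the model has one
        rate = alpha_ext if (alpha_ext is not None and not reward) else alpha
        values[choice] += rate * (rho * reward - values[choice])  # prediction-error update
        last[stim] = choice
    return total

# Hypothetical example data; a random-choice baseline scores len(trials) * log(0.5).
trials = [("cat", 0, 1), ("cat", 0, 1), ("dog", 1, 0)]
print(log_likelihood(trials, alpha=0.3, rho=2.0, perseveration=0.5))
print(len(trials) * math.log(0.5))
```

Dropping `alpha_ext` or setting `perseveration=0` gives the reduced variants, which is how nested models with different parameter combinations can be compared on the same data.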
Summary of the reinforcement learning models tested. Several models with different parameter combinations were compared, with each model's marginal likelihood estimated via bridge sampling. The table shows the posterior probability of each model, i.e. the probability of the model given the data, assuming all models were equiprobable before the data were observed. Models were ranked accordingly; the best-fitting model used three parameters: learning rate, reinforcement sensitivity and perseveration. Log10 Bayes factors comparing each model with the next-ranked model are also included. According to the criteria of Kass and Raftery (1995), there was overwhelming evidence that the top two ranked models were superior to all other models. Although the difference between the top two models was marginal, we selected the more likely model, which was also the more parsimonious of the two. [Note: Logs are natural logarithms unless stated.]
| Free parametersᵃ | Log marginal likelihood | Log posterior probability | Posterior probability | Log10 Bayes factor (relative to next-ranked model) | Ranking |
|---|---|---|---|---|---|
| Learning rate, reinforcement sensitivity, perseveration | −6718.8 | −0.578 | 0.561 | 0.106 | 1 |
| Learning rate, extinction rate, reinforcement sensitivity, perseveration | −6719.0 | −0.823 | 0.439 | 18.03 | 2 |
| Three parameters | −6760.5 | −42.33 | 0 | 0.407 | 3 |
| Two parameters | −6761.5 | −43.27 | 0 | 140.71 | 4 |
| Three parameters | −7085.5 | −367.27 | 0 | 20.04 | 5 |
| Two parameters | −7131.6 | −416.40 | 0 | 492.78 | 6 |
| None (random choice model) | −8266.3 | −1548.06 | 0 | N/A | 7ᵇ |
ᵃ For some models, the learning rate was fractionated into separate rates for learning from reward and from non-reward (the latter termed the extinction rate). If a model does not include an extinction rate, its single learning rate covers learning from both reward and non-reward.
ᵇ To verify that these results were not spurious, we included a random choice model, which assumes that responses were selected at random (p = 0.5 for each of the two possible responses). Our results suggest that all tested models fit the data better than the random choice model.
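With equal prior probabilities, each model's posterior probability is its marginal likelihood divided by the sum of the marginal likelihoods of all models, and the log10 Bayes factor between two models is the difference of their natural-log marginal likelihoods divided by ln 10. The Python sketch below applies these two identities to the log marginal likelihoods in the table; it is an illustration, not the authors' analysis code, and the function names are hypothetical. Because the tabulated log marginal likelihoods are rounded to one decimal place, the results only approximate the tabulated figures (about 0.55 rather than 0.561 for the top-ranked model's posterior probability).

```python
import math

# Natural-log marginal likelihoods from the table, in rank order (models 1-7).
log_ml = [-6718.8, -6719.0, -6760.5, -6761.5, -7085.5, -7131.6, -8266.3]

def posterior_probs(log_ml):
    """Posterior model probabilities, assuming every model has the same prior."""
    top = max(log_ml)
    # Shift by the max so the leading models do not underflow to exp(-6718) == 0
    weights = [math.exp(x - top) for x in log_ml]
    z = sum(weights)
    return [w / z for w in weights]

def log10_bayes_factors(log_ml):
    """log10 Bayes factor of each model relative to the next-ranked model."""
    return [(a - b) / math.log(10) for a, b in zip(log_ml, log_ml[1:])]

probs = posterior_probs(log_ml)
bfs = log10_bayes_factors(log_ml)
for rank, p in enumerate(probs, start=1):
    print(rank, round(p, 3))
print([round(b, 2) for b in bfs])
```

Working in log space and subtracting the maximum is what makes the calculation feasible, since the raw marginal likelihoods are far below the smallest representable floating-point number.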
Fig. 2 Mean group differences in the posterior distributions of each learning parameter in the model. Parameters showing group differences (indicated in red) have 95% highest density intervals (HDIs) that exclude zero. Compared with healthy control volunteers, patients with cocaine use disorder (CUD) show a reduced learning rate. The 95% HDIs of the group differences in reinforcement sensitivity and perseveration both included zero. (Note: the reinforcement sensitivity parameter is placed on a different axis due to scale differences.)
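A 95% highest density interval is the narrowest interval containing 95% of the posterior mass, and a group difference is treated as credible when that interval excludes zero. The Python sketch below estimates an HDI from posterior draws, assuming the posterior for each group difference is available as MCMC samples. The Gaussian draws are simulated placeholders, not the study's posteriors, and the function and variable names are hypothetical.

```python
import math
import random

def hdi(samples, mass=0.95):
    """Narrowest interval containing at least `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples the interval must contain
    widths = [xs[i + k - 1] - xs[i] for i in range(n - k + 1)]
    i = min(range(len(widths)), key=widths.__getitem__)  # start of the narrowest window
    return xs[i], xs[i + k - 1]

def excludes_zero(samples, mass=0.95):
    """True if the HDI of a group-difference posterior lies entirely above or below zero."""
    lo, hi = hdi(samples, mass)
    return lo > 0 or hi < 0

# Simulated placeholder draws of patient-minus-control differences (not study data).
rng = random.Random(1)
diff_learning_rate = [rng.gauss(-0.1, 0.03) for _ in range(10_000)]
diff_perseveration = [rng.gauss(0.05, 0.1) for _ in range(10_000)]
print(excludes_zero(diff_learning_rate))  # True
print(excludes_zero(diff_perseveration))  # False
```

Unlike an equal-tailed percentile interval, the HDI adapts to skewed posteriors, which matters for bounded parameters such as learning rates.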
Fig. 3 Structural connectivity, indexed by mean fractional anisotropy (FA), between brain regions involved in (a) the goal-directed system, which has been linked with interactions between the medial prefrontal cortex, the anterior caudate nucleus and ventral parts of the striatum, and (b) the habit system, which depends on interactions between the pre-motor cortex (BA6) and the posterior putamen. (c) Scatter plot depicting the significant relationships in healthy control volunteers between learning rates and mean FA values within the neural pathway suggested to underlie goal-directed learning. (d) Scatter plot showing the lack of such a relationship in CUD patients.