Roshan Cools, Hanneke E M den Ouden, Mojtaba Rostami Kandroodi, Jennifer L Cook, Jennifer C Swart, Monja I Froböse, Dirk E M Geurts, Abdol-Hossein Vahabie, Majid Nili Ahmadabadi.
Abstract
RATIONALE: Brain catecholamines have long been implicated in reinforcement learning, exemplified by catecholamine drug and genetic effects on probabilistic reversal learning. However, the mechanisms underlying such effects are unclear. OBJECTIVES AND METHODS: Here we investigated effects of an acute catecholamine challenge with methylphenidate (20 mg, oral) on a novel probabilistic reversal learning paradigm in a within-subject, double-blind randomised design. The paradigm was designed to disentangle effects on punishment avoidance from effects on reward perseveration. Given the known large individual variability in methylphenidate's effects, we stratified the effects by working memory capacity and trait impulsivity, which putatively modulate the effects of methylphenidate, in a large sample (n = 102) of healthy volunteers.
Keywords: Catecholamines; Computational modelling of behaviour; Dopamine; Methylphenidate; Reversal learning; Working memory
Year: 2021 PMID: 34676440 PMCID: PMC8629893 DOI: 10.1007/s00213-021-05974-w
Source DB: PubMed Journal: Psychopharmacology (Berl) ISSN: 0033-3158 Impact factor: 4.415
Fig. 1 Experimental design and basic results. A Study timeline. Participants took part in 2 sessions: on day 1, they started with a 20-min medical screening, and on day 2, they completed a working memory test (listening span). Neuropsychological questionnaires (Quest) were completed at home between the two sessions. Mood and medical symptom ratings (MMSR) were acquired at 3 time points. A battery of 6 tasks (Swart et al. 2017; Froböse et al. 2018; Cook et al. 2019) was performed in a fixed order, with the probabilistic reversal learning (PRL) task always performed last. Average timings are indicated, with the timings most relevant for the current study in purple. B Reversal learning design. On each trial, three visual stimuli were presented in three out of four randomly selected locations. Participants had to choose one of the stimuli with a mouse click and subsequently received feedback. The feedback was either a reward (green, happy emotion) or a punishment (red, sad emotion). During acquisition, selecting the rewarded stimulus (here purple) resulted in a 75:25 ratio of reward/punishment. Selecting the neutral stimulus (orange) and the punished stimulus (blue) led to 50:50 and 25:75 ratios of reward/punishment, respectively. After 40 trials, the reversal phase started, and the contingencies of the rewarded and punished stimuli reversed. The participant now had to learn to select the blue stimulus. C Trial-by-trial choice. Trial-by-trial averaged probability of selection of each stimulus. A sliding window with a 5-trial width was used for smoothing. Overall, participants learned to make the correct selection in each of the phases and showed relatively rapid reversal. D Average choice probability. Distribution of choice probability averaged within the acquisition (dark blue) and reversal (light blue) phases. In both phases, participants learnt to select the option rewarded 75% of the time but did so significantly less well during the reversal phase. E Feedback sensitivity. The degree to which people repeated a choice was modulated by the valence of the previous outcome for that choice: people were more likely to reselect a stimulus (‘stay’) after it had been rewarded than after it had been punished. This effect was weaker during the reversal phase (less stay after a win, more stay after a loss), in line with slower learning during reversal. Note that the intercept in this plot is chance-level stay (1/3). Even after a loss, people were more likely to stay than chance (simple effect of pStay following loss: F(1,99) = 77.1; p < 0.0001; following win: F(1,99) = 1603.4; p < 0.0001), reflecting a relatively small impact of a single (negative) outcome and thus slow learning.
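The contingencies in panel B and the stay measure in panel E can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' task or analysis code: the function names, the 0-indexed trial counter, the +1/-1 outcome coding, and the definition of 'stay' as repeating the immediately preceding choice are all assumptions made for the example.

```python
import random

# Reward probabilities from the caption, keyed by each stimulus's
# role during acquisition (the key names are illustrative).
ACQUISITION = {"rewarded": 0.75, "neutral": 0.50, "punished": 0.25}
# After reversal, the rewarded and punished contingencies swap.
REVERSAL = {"rewarded": 0.25, "neutral": 0.50, "punished": 0.75}
REVERSAL_TRIAL = 40  # the reversal phase starts after 40 trials


def reward_probability(stimulus, trial):
    """Probability of a reward for `stimulus` on a 0-indexed trial."""
    table = ACQUISITION if trial < REVERSAL_TRIAL else REVERSAL
    return table[stimulus]


def sample_feedback(stimulus, trial, rng):
    """Return +1 (reward) or -1 (punishment) for one choice."""
    return 1 if rng.random() < reward_probability(stimulus, trial) else -1


def stay_probabilities(choices, outcomes):
    """Return {+1: P(stay | previous win), -1: P(stay | previous loss)}.

    Assumption: a 'stay' means repeating the previous trial's choice.
    """
    counts = {1: [0, 0], -1: [0, 0]}  # outcome -> [stays, total]
    for prev_choice, prev_outcome, choice in zip(choices, outcomes, choices[1:]):
        counts[prev_outcome][1] += 1
        counts[prev_outcome][0] += int(choice == prev_choice)
    return {k: (s / n if n else float("nan")) for k, (s, n) in counts.items()}
```

With three options, a participant choosing at random would repeat the previous choice on one third of trials, which is the chance level referred to in panel E.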
Fig. 2 Effects of methylphenidate on reversal learning task performance. A No significant main effect of methylphenidate on average probability of stimulus selection. The distribution of the difference in stimulus selection probability between the two sessions (MPH minus placebo) is shown in dark blue for the acquisition phase and in light blue for the reversal phase. Methylphenidate did not consistently affect either overall learning or differential learning during acquisition and reversal. B Trial-by-trial averaged probability of selection of each stimulus (median split based on WM span). Left panel: in the high WM group (n = 48), the probability of selecting the ‘rewarded’ stimulus increased under methylphenidate (dashed line) in comparison to placebo (solid line) during the acquisition phase. Right panel: in the low WM group (n = 54), the probability of selecting the ‘rewarded’ stimulus decreased under methylphenidate in comparison to placebo during the acquisition phase. A sliding window with a 5-trial width was used for smoothing. C Methylphenidate effects predicted by WM span. Methylphenidate increased the accuracy of selecting the ‘rewarded’ vs the ‘punished’ option in the acquisition phase more than in the reversal phase for high WM span participants, yet decreased it for low WM span participants (r = 0.26, p = 0.009). When the methylphenidate effect is split by phase, participants with high WM span improved under methylphenidate during acquisition (middle panel; r = 0.30, p = 0.002), but there is no significant association during reversal (right panel).
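The analysis in panel C relates a per-participant drug effect (MPH minus placebo) to WM span. A minimal sketch of that computation follows. It assumes one accuracy score per participant per session and a plain Pearson correlation. The function names are hypothetical, and the source does not say which software or which correlation variant the authors used.

```python
import math


def drug_effect(acc_mph, acc_placebo):
    """Per-participant difference score: accuracy under MPH minus placebo."""
    if len(acc_mph) != len(acc_placebo):
        raise ValueError("both sessions need one score per participant")
    return [m - p for m, p in zip(acc_mph, acc_placebo)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

A positive `pearson_r(wm_span, drug_effect(acc_mph, acc_placebo))` would correspond to the pattern in the caption: larger methylphenidate-induced improvement with higher WM span.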
Fig. 3 Model fitting and simulation results for PRL task modelling. A Model comparison on base models. Model frequency and protected exceedance probability indicate that the EWA + F model (EWA with a forgetting rate for unchosen options) provides the best description of the data (PXP = 0.91). B Trial-by-trial simulated choice. Model simulations of the winning base model verify that the EWA + F model captures the behavioural data (grey lines indicate average raw data). C Simulated average choice probability. The distribution of stimulus selection probability is shown in dark blue for the acquisition phase and in light blue for the reversal phase (compare with Fig. 1D). The data simulated with the EWA + F model replicate the participants’ behaviour qualitatively and quantitatively and reproduce key features of the data.
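The caption describes EWA + F only as an experience-weighted attraction (EWA) model with a forgetting rate for unchosen options, so the model equations are not available here. The sketch below uses a commonly cited form of the EWA update together with a softmax choice rule. The parameter names (`phi`, `rho`, `beta`, `forget`) and, in particular, the rule that unchosen values decay toward zero are assumptions made for illustration, not the authors' specification.

```python
import math


def softmax(values, beta):
    """Choice probabilities from values, with inverse temperature beta."""
    top = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(beta * (v - top)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def ewa_f_update(values, n_prev, choice, outcome, phi, rho, forget):
    """One trial of an EWA-style update with forgetting (hypothetical form).

    values  : current value of each option
    n_prev  : experience weight before this trial
    choice  : index of the chosen option
    outcome : +1 for reward, -1 for punishment
    phi     : decay of the chosen option's previous value
    rho     : decay of the experience weight
    forget  : assumed decay rate of unchosen values toward zero
    """
    n_new = rho * n_prev + 1.0
    new_values = []
    for i, v in enumerate(values):
        if i == choice:
            new_values.append((phi * n_prev * v + outcome) / n_new)
        else:
            new_values.append((1.0 - forget) * v)
    return new_values, n_new
```

In this form, a larger experience weight shrinks the influence of each new outcome, so the value of a frequently experienced option changes more slowly.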
Model evidence for base model and methylphenidate model families and parameter estimates for winning models in each family (see supplementary table 1 for full details for all models)
| Base model family | | | | | | |
| Model | Param.* | Constraint | Median | Range (25–75%) | PXP | |
| EWA | | | | | 0.01 | 0.22 |
| EWA + F | | | | | 0.91 | |
| | | | 0.77 | 0.29–0.87 | | |
| | | | 0.63 | 0.27–0.83 | | |
| | | | 4.23 | 3.11–7.88 | | |
| | | | 0.35 | 0.02–0.68 | | |
| Hybrid | | | | | 0.0 | 0.10 |
| Hybrid + F | | | | | 0.08 | 0.28 |
| Methylphenidate model family | | | | | | |
| EWA + F | | | | | 0.02 | 0.25 |
| EWA + F | | | | | 0 | 0.13 |
| EWA + F | | | | | 0.98 | |
| | | | 0.70 | 0.26–0.86 | | |
| | | | 0.70 | 0.33–0.84 | | |
| | | | 0.56 | 0.21–0.79 | | |
| | | | 4.46 | 3.13–7.94 | | |
| | | | 0.22 | 0.01–0.62 | | |
| EWA + F | | | | | 0 | 0.12 |
| EWA + F | | | | | 0 | 0.08 |
*A weakly informative Gaussian prior was used for all parameters (x ∼ N(μ, σ²), where μ is the mean and σ² the variance). Sigmoid or exponential transformations were applied according to the theoretical constraints of each parameter.
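The footnote implies that parameters are estimated on an unconstrained scale and then mapped to their valid range. A minimal sketch of such a mapping is given below, assuming a sigmoid for parameters bounded in (0, 1) and an exponential for strictly positive parameters. The prior mean and standard deviation are arguments because their values are not given here, and all function names are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Map a real number into (0, 1), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def to_unit_interval(x):
    """Transform for parameters constrained to lie between 0 and 1."""
    return sigmoid(x)


def to_positive(x):
    """Transform for parameters constrained to be positive."""
    return math.exp(x)


def sample_unconstrained(rng, mu, sigma):
    """Draw one value from a Gaussian prior on the unconstrained scale.

    mu and sigma are placeholders; the source does not state their values.
    """
    return rng.gauss(mu, sigma)
```

For example, `to_unit_interval(sample_unconstrained(random.Random(1), 0.0, 1.0))` always returns a value between 0 and 1, whatever the draw.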
Fig. 4 Modelling the effects of methylphenidate. A Simulated choice. Simulated data replicated the observed absence of a main effect of methylphenidate (see Fig. 2A). B Model comparison on methylphenidate models. Model frequency and protected exceedance probability (PXP = 0.98) indicate that the EWA + F model that allows for differential learning rates under methylphenidate and placebo best captures the data. C-F Model validation. C Simulated trial-by-trial behaviour. Simulated data replicate observed behaviour for high WM and low WM participants (see Fig. 2B). D Choice simulation. Data simulated with the winning model reproduce quantitative characteristics of the data, particularly a positive effect of methylphenidate on performance for high WM participants (see Fig. 2C). E Methylphenidate changes the inverse learning rate as a function of working memory span. The difference in inverse learning rate under methylphenidate vs placebo covaries with WM span (r = 0.2; p = 0.043). Methylphenidate increases the inverse learning rate in high WM participants, whereas it decreases it in low WM participants. F Methylphenidate-induced effect on raw performance scores. The methylphenidate-induced change in inverse learning rate is correlated with the methylphenidate-induced change in raw performance (r = 0.33; p < 0.001).