Computational models of exploration and exploitation characterise onset and efficacy of treatment in methamphetamine use disorder.

Alex H Robinson¹, Trevor T-J Chong¹, Antonio Verdejo-Garcia¹.

Abstract

People with Methamphetamine Use Disorder (PwMUD) spend substantial time and resources on substance use, which hinders their ability to explore alternate reinforcers. Gold-standard behavioural treatments attempt to remedy this by encouraging action towards non-drug reinforcers, but substance use often persists. We aimed to unravel the mechanistic drivers of this behaviour by applying a computational model of explore/exploit behaviour to decision-making data (Iowa Gambling Task) from 106 PwMUD and 48 controls. We then examined the longitudinal link between explore/exploit mechanisms and changes in methamphetamine use 6 weeks later. Exploitation parameters included reinforcement sensitivity and inverse decay (i.e., number of past outcomes used to guide choices). Exploration parameters included maximum directed exploration value (i.e., value of trying novel actions). The Timeline Follow Back measured changes in methamphetamine use. Compared to controls, PwMUD showed deficits in exploitative decision-making, characterised by reduced reinforcement sensitivity, U = 3065, p = 0.009, and less use of previous choice outcomes, U = 3062, p = 0.010. This was accompanied by a behavioural pattern of frequent shifting between choices, which appeared consistent with random exploration. Furthermore, PwMUD with greater reductions of methamphetamine use at 6 weeks had increased directed exploration (β = 0.22, p = 0.045); greater use of past choice outcomes (β = -0.39, p = 0.002) and greater choice consistency (β = -0.39, p = 0.002). Therefore, limited computational exploitation and increased behavioural exploration characterise PwMUD's presentation to treatment, while increased directed exploration, use of past choice outcomes and choice consistency predict greater reductions of methamphetamine use.

Entities: Chemical

Keywords: computational modelling; decision-making; explore/exploit; methamphetamine; predictive; substance use

Substances:
Methamphetamine

Year: 2022 PMID: 35470564 PMCID: PMC9286537 DOI: 10.1111/adb.13172

Source DB: PubMed Journal: Addict Biol ISSN: 1355-6215 Impact factor: 4.093

INTRODUCTION

People with Methamphetamine Use Disorder (PwMUD) spend substantial time and resources on their substance use, which hinders their ability to act towards alternative reinforcers. Clinically, we see this in the persistent preoccupation with methamphetamine use; the decreased ability to complete non‐drug‐related goals; and the impoverished health, social relationships and quality of life that accompany their presentation to treatment. Furthermore, despite treatment interventions attempting to shift people away from methamphetamine use and towards adaptive, non‐drug‐related activities (i.e., positive social groups, treatment attendance and personal goals), substance use often persists.

Computational models of the exploration/exploitation framework are a novel approach to investigate the mechanisms preventing behavioural change in PwMUD. These models assess whether an organism takes a new action to seek out greater rewards (exploration) or instead repeats familiar actions already associated with a known reward (exploitation). Exploration can be further separated into directed exploration (goal‐oriented behaviours to gain information about an unfamiliar action) and random exploration (exploration without the conscious goal of gaining new information). The prevailing view in the addiction literature is that persistent substance use is linked to a bias towards exploitative decision‐making. However, evidence supporting this hypothesis is mixed and varies depending on the substance of concern and form of analysis (i.e., basic behavioural outcomes versus computational modelling). To overcome this challenge, more recent computational approaches now avoid the dichotomous, behavioural separation of explore/exploit biases, and instead provide explanations of how dysfunction may be driven in either process. For example, Smith et al. found that explore/exploit dysfunction was driven by hyposensitivity to loss, hypersensitivity to reward and increased random exploration amongst a mixed sample of people with Substance Use Disorders. In another example, Morris et al. found that people with Alcohol Use Disorder had reduced effectiveness in exploiting reward, driven by decreased exploration after reinforcement. As these two findings provide more nuanced explanations of explore/exploit decision‐making (when compared to summary behavioural measures), computational modelling appears to be a promising measure to identify underlying cognitive processes relevant to PwMUD's treatment.

The Iowa Gambling Task (IGT) is one paradigm that appears well‐suited to measure the mechanistic underpinnings of PwMUD's explore/exploit behaviour. This is because advantageous performance on the task requires an initial exploration of the available choices, followed by a subsequent exploitation of the most advantageous choices. Such a pattern is clinically relevant as it mimics the efforts of PwMUD to progressively explore and exploit non‐drug reinforcers across their recovery. Furthermore, the IGT is a useful measure to identify decision‐making differences between PwMUD and healthy controls and has been linked to clinical outcomes in PwMUD. For example, similar to other substance use disorders, early IGT studies were able to identify generalised and/or ‘risky’ decision‐making in PwMUD compared to controls. Following these findings, the IGT has since identified relationships between generalised decision‐making dysfunction and clinical outcomes in PwMUD, such as length of abstinence and treatment response. However, while the IGT has been computationally modelled to identify specific decision‐making mechanisms in other stimulant use disorders, such an approach has not yet been implemented in PwMUD, and the vast majority of the above findings are based on a single summary metric of performance (i.e., net score).
Recently, a computational model has been developed that can sensitively measure the underlying computational parameters involved in explore/exploit decisions on the IGT. This model, the Value plus Sequential Exploration model (VSE), quantifies the strength of multiple, interlinked mechanisms involved in exploitation (i.e., sensitivity to reinforcement and number of previous choices used to guide the current decision) and directed exploration (i.e., the value associated with gaining new information and how frequently people reach this value). It then uses these individual parameters to compute the overall values of exploring and exploiting, which can then be used to predict participants' choice behaviour. The inclusion of these directed exploration parameters separates the VSE from previous models of the IGT and allows a mechanistic understanding of PwMUD's behaviour on the task.

Our study had two aims. First, we sought to reveal the computational mechanisms driving maladaptive explore/exploit decisions amongst treatment‐seeking PwMUD. This was achieved by applying the VSE model to behaviour on the IGT and then comparing the parameters characterising explore/exploit behaviour between PwMUD and drug‐naïve controls. Second, we investigated whether the baseline explore/exploit parameters in PwMUD could predict changes in methamphetamine use six weeks after treatment engagement, which is a vulnerable period for drug reduction amongst PwMUD. Both these aims were exploratory and did not include any a priori hypotheses.

MATERIALS AND METHODS

Design

We applied the VSE model of the IGT to a large, previously collected, dataset comparing PwMUD and drug‐naïve controls. Our first aim of identifying group differences in explore/exploit computational parameters used a cross‐sectional design. Our second aim of identifying whether baseline parameters of explore/exploit decisions predicted changes in methamphetamine use after 6 weeks used a prospective design.

Participants

Data were available for 106 PwMUD and 48 control participants, of whom 79 PwMUD (74.53%) returned for follow‐up. PwMUD were recruited during the first 3 weeks of treatment from private and public treatment settings in Melbourne, Australia. These included inpatient rehabilitation and detoxification services, as well as outpatient counselling services. The inclusion criteria for PwMUD were (1) a primary diagnosis of methamphetamine dependence identified by the Structured Clinical Interview for the DSM‐IV (SCID‐IV); (2) self‐reported abstinence from methamphetamine for more than 48 h (to rule out acute drug effects) but less than 21 days (to prevent heterogeneity in recovery stages); and (3) no secondary substance dependence (excluding alcohol, cannabis and tobacco, measured by SCID‐IV). Controls were required to have never used methamphetamine and to not meet criteria for any substance dependence. All participants were required to have never experienced (1) loss of consciousness greater than 30 min (self‐reported); (2) a diagnosis of bipolar, schizophrenia or other psychotic disorder (screened by the SCID‐IV); (3) intellectual disability (screened by the Wechsler Abbreviated Scale of Intelligence II; WASI‐II); and (4) neurological disorders (self‐reported). Table 1 presents each group's sociodemographic statistics. Table 2 presents substance use in PwMUD.

TABLE 1

Descriptive statistics of sociodemographic characteristics in PwMUD and controls

	PwMUD	Controls	Mann‐Whitney p (frequentist)	Mann‐Whitney Bayes factor
Sex (F/M)	27F/79M	12F/36M	p = 0.95
Age	31.20 (7.25)	31.59 (8.67)	p = 0.65	BF₁₀ = 0.18
Education (years)	12.93 (2.25)	13.26 (2.07)	p = 0.32	BF₁₀ = 0.29
FSIQ	96.19 (10.96)	101.92 (11.36)	p = 0.007	BF₁₀ = 5.14
Employed (any)	28 (26.92%)	26 (54.17%)	p = 0.001
Depression (CES‐D)	28.51 (12.30)	7.25 (5.93)	p < 0.001	BF₁₀ = 87 412.26
Sociodem. Status	6.97 (2.41)	7.92 (1.60)	p = 0.042	BF₁₀ = 1.019

Note: Values represent means with standard deviations in parentheses.

Abbreviation: CES‐D, Centre for Epidemiologic Studies Depression Scale.

TABLE 2

PwMUD's baseline methamphetamine and other common substance use

	M	SD	Range
Severity of Dependence Meth	11.10	3.15	[1–15]
Severity of Dependence Cann.	2.63	4.24	[0–15]
Severity of Dependence Alch.	1.71	3.30	[0–14]
Methamphetamine
Daily dose (grammes)	0.71	0.56	[0.03–4.00]
Frequency (days/month)	23.30	9.25	[4–31]
Duration of use (years)	6.95	4.90	[0.6–30]
Other substance use
Cigarettes per month	407.13	257.50	[0–930]
Standard drinks per month	75.34	160.76	[0–775]
Cannabis per month (grammes)	24.89	42.21	[0–217]

Note: Severity of Dependence Scale scores range between 0 and 15, with those >4 indicating problematic methamphetamine use, ≥3 indicating likely Alcohol Use Disorder and ≥4 indicating likely Cannabis Use Disorder.

Procedure

Participants completed the IGT as part of a larger cognitive assessment, from which results have been reported elsewhere. Testing sessions occurred at treatment centres or quiet, easily accessible spaces (e.g., community libraries). The follow‐up assessments had a similar structure and nature to the baseline assessment but were briefer due to the omission of the clinical interview. Measures were administered by researchers with postgraduate training in clinical psychology and followed well‐established guidelines to ensure reliability. Study procedures were approved by the Eastern Health Human Research Ethics Committee (E52/1213) and participants received $40 AUD in grocery gift cards.

Measures

Sociodemographic measures

Participants self‐reported their sex, age, years of education and employment status. The WASI‐II estimated full‐scale IQ. The Centre for Epidemiologic Studies Depression Scale (CES‐D) estimated depressive symptomology. Socioeconomic status was estimated by the Australian Bureau of Statistics' Socio‐Economic Indexes for Areas.

History of methamphetamine and secondary drug use

A modified version of the Interview for Research on Addictive Behaviours (IRAB) estimated PwMUD's daily methamphetamine use (grammes), duration of use (years), route of administration and secondary substance use. The Severity of Dependence Scale (SDS) measured severity of dependence to alcohol, cannabis and methamphetamine.

Iowa Gambling Task

The IGT is a computer‐based measure that requires participants to make 100 selections from four decks of cards, with the goal of making as much game‐based currency as possible. We used the ABCD version in which decks C and D are ‘good’ (providing small wins but even smaller losses, leading to a net profit) and decks A and B are ‘bad’ (providing large wins but even larger losses, leading to a net loss). The likelihood of punishment also varies across decks. For example, decks A and C regularly enforce losses (50% likelihood), while decks B and D rarely enforce losses (10% likelihood). Participants received standardised, verbal instructions outlining the task goal and its visual layout, with intentionally vague information on the reward structure (‘some decks are better than others’). While the IGT has typically been measured using the net score (number of ‘good’ decks selected minus number of ‘bad’ decks; either measured once as a single outcome or measured in blocks of 20 trials across the task), we analysed numerous additional measurements of choice behaviour and computational parameters using the VSE Toolbox.
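As an illustration of the traditional net score described above, the following minimal Python sketch computes it once over the whole task and in blocks of 20 trials. The function names and the representation of choices as a list of deck letters are assumptions made for this example; they are not taken from the VSE Toolbox.

```python
from typing import List

GOOD_DECKS = {"C", "D"}  # advantageous decks in the ABCD version
BAD_DECKS = {"A", "B"}   # disadvantageous decks in the ABCD version


def net_score(choices: List[str]) -> int:
    """Number of 'good' deck selections minus number of 'bad' deck selections."""
    good = sum(1 for c in choices if c in GOOD_DECKS)
    bad = sum(1 for c in choices if c in BAD_DECKS)
    return good - bad


def block_net_scores(choices: List[str], block_size: int = 20) -> List[int]:
    """Net score within consecutive blocks of trials (five blocks for 100 trials)."""
    return [
        net_score(choices[i:i + block_size])
        for i in range(0, len(choices), block_size)
    ]


# Hypothetical participant: 10 picks from A, 10 from C, then 80 from D.
example = ["A"] * 10 + ["C"] * 10 + ["D"] * 80
print(net_score(example))
print(block_net_scores(example))
```

The block-wise version is the quantity that would enter a blocks × groups analysis of improvement across the task.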

Choice behaviour

Choice behaviour measures were used to provide an overall understanding of the overt behaviour of participants on the task. These would then be later used to provide a context to the VSE parameter findings. Beyond the traditional measure of net score, the VSE toolbox also provides behavioural measures of win/stay and lose/switch frequency, mutual information (higher values indicate that choices on subsequent trials are better predicted by the current choice), choice entropy (higher values indicate that participants evenly selected from all decks; lower values indicate that participants selected from only a few decks/one deck) and the frequency with which individuals chose three different decks over three consecutive trials (‘sequential exploration 3’) or four decks over four trials (‘sequential exploration 4’). We note that the ‘sequential exploration 3/4’ outcomes have previously been referred to as ‘directed exploration 3/4’. However, because high frequencies of switching decks may also represent random exploration (particularly if this is occurring in later trials with little improvement in performance), we believe this new label is more accurate. Finally, we also measured log median reaction time (after removing the first trial) in order to measure task engagement. We used this approach as significantly faster responses may highlight a participant response strategy of ‘rushing’ at the cost of accuracy, while significantly longer responses may highlight ‘mind‐wandering’ during the task.
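The behavioural indices above can be sketched in a few lines of Python. This is a hedged illustration only: the exact definitions used by the VSE Toolbox are not given in the text, so the sketch assumes Shannon entropy in bits for choice entropy, sliding windows of consecutive trials for sequential exploration, and the sign of the net outcome on a trial to classify it as a win or a loss. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def choice_entropy(choices: Sequence[str]) -> float:
    """Shannon entropy (bits) of deck selection frequencies (assumed definition)."""
    n = len(choices)
    counts = Counter(choices)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def sequential_exploration(choices: Sequence[str], k: int) -> float:
    """Share of k-trial windows in which k different decks were chosen."""
    windows = [choices[i:i + k] for i in range(len(choices) - k + 1)]
    if not windows:
        return 0.0
    return sum(len(set(w)) == k for w in windows) / len(windows)


def win_stay_lose_switch(
    choices: Sequence[str], net_outcomes: Sequence[float]
) -> Tuple[float, float]:
    """Win/stay and lose/switch frequencies, classifying trials by net outcome sign."""
    wins = stays = losses = switches = 0
    for t in range(len(choices) - 1):
        stayed = choices[t + 1] == choices[t]
        if net_outcomes[t] > 0:
            wins += 1
            stays += stayed
        elif net_outcomes[t] < 0:
            losses += 1
            switches += not stayed
    win_stay = stays / wins if wins else float("nan")
    lose_switch = switches / losses if losses else float("nan")
    return win_stay, lose_switch
```

Under these assumed definitions, a participant who cycles through all four decks has the maximum choice entropy of 2 bits and a sequential exploration frequency of 1 for both window lengths, whereas a participant who stays on one deck scores 0 on all three.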

Exploration/exploitation modelling: Value plus sequential exploration model

The VSE model computes value weightings for both exploitation and directed exploration and estimates several parameters related to each process. The Exploitation weight (Exploit) of a given deck, d, calculates the value of continuing to select that deck on a given trial, t. This value is influenced by an inverse decay parameter, Δ, which controls how many past outcomes are used to guide the decision (Δ ∈ [0, 1], with Δ = 0 using only the outcome from the previous trial to update value and Δ = 1 using the outcomes of all previous trials). The Exploit weight also incorporates a value function, v, which includes a parameter that represents the sensitivity of the individual to gains and losses on the current trial (reinforcement sensitivity, θ ∈ [0, 1]; higher values reflect greater sensitivity to both gains and losses). A decay of previous Exploit weightings is also applied to unselected decks, using the same inverse decay parameter, ∆.

In contrast, the Exploration weight (Explore) computes the value for participants to undertake directed exploration towards a specific deck on each trial. Here, the Explore value of the selected deck is set to zero after selection (Equation 4), while unexplored decks are updated via a delta‐rule to increase the drive towards exploration. This occurs via (1) a directed exploration bonus parameter (φ, unbounded), which sets the ‘threshold’ of how biased a person is towards directed exploration (i.e., positive values mean a person is more attracted to explore, while negative values indicate a person is more likely to continue selecting the current deck), and (2) a directed exploration learning rate (α ∈ [0, 1]), which drives how quickly a person gets to the upper limit of exploration demand (Equation 5). In this manner, if two people had all other parameter values equal, the person with a higher exploration learning rate (α) would more frequently explore.
Finally, the overall probability of choosing a particular deck was determined by comparing the Exploit and Explore weights using a softmax function (Equation 6). The inverse temperature parameter, C (or consistency), reflects stochastic behaviour (with higher values representing more deterministic behaviour in line with the above computations). Using the parameters above, we can estimate (1) the relative strength of exploitative processes (i.e., inverse decay, ∆, and sensitivity to reinforcement, θ); (2) the relative strength of directed exploration processes (directed exploration bonus, φ, and directed exploration learning rate, α); and (3) the stochasticity of individuals' decisions (consistency, C). These five parameters were our primary outcomes of interest given they drive the explore/exploit behaviour described in the previous section.

The VSE model has been shown to have good to excellent parameter recovery. The creator of the model found a mean parameter recovery of r = 0.81, with all parameters falling within a recoverability range of r = 0.67–0.95. A simulation based on our own sample size (N = 149) found a mean parameter recoverability of rho = 0.73 (reinforcement sensitivity, θ, rho = 0.56; inverse decay, Δ, rho = 0.83; directed exploration bonus, φ, rho = 0.87; directed exploration learning rate, α, rho = 0.58; consistency, C, rho = 0.79; p < 0.001 for all parameters). Table S1 reports the correlation matrix between the five parameters from our analysis. The small and non‐significant correlations between exploitative‐based and explorative‐based parameters (i.e., reinforcement sensitivity, θ, and inverse decay, Δ, did not correlate with directed exploration bonus, φ, and/or directed exploration learning rate, α) highlight how exploitation and exploration are separable constructs in the model.
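To make the update rules in this section concrete, the following Python sketch implements one trial of a VSE‐style update and the softmax choice rule. It is an approximation under stated assumptions, not the VSE Toolbox implementation: the value function is assumed to take the form gain^θ − |loss|^θ, the inverse decay Δ is assumed to multiply the previous Exploit weight of every deck, and the consistency parameter C is assumed to enter the softmax directly as an inverse temperature. All function and variable names are hypothetical.

```python
import math
from typing import Dict

DECKS = ("A", "B", "C", "D")


def outcome_value(gain: float, loss: float, theta: float) -> float:
    """Assumed value function: gain and loss magnitudes are each raised to theta."""
    return gain ** theta - abs(loss) ** theta


def update_weights(
    exploit: Dict[str, float],
    explore: Dict[str, float],
    chosen: str,
    gain: float,
    loss: float,
    theta: float,
    delta: float,
    alpha: float,
    phi: float,
) -> None:
    """Update Exploit and Explore weights in place after one trial."""
    value = outcome_value(gain, loss, theta)
    for deck in DECKS:
        # Past Exploit weights decay for every deck (delta = 0 forgets them).
        exploit[deck] *= delta
        if deck == chosen:
            exploit[deck] += value   # only the chosen deck receives the outcome
            explore[deck] = 0.0      # Explore resets to zero after selection
        else:
            # Delta rule: unchosen decks drift towards the exploration bonus phi.
            explore[deck] += alpha * (phi - explore[deck])


def choice_probabilities(
    exploit: Dict[str, float], explore: Dict[str, float], consistency: float
) -> Dict[str, float]:
    """Softmax over the summed Exploit and Explore weights."""
    logits = [consistency * (exploit[d] + explore[d]) for d in DECKS]
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    return {d: w / total for d, w in zip(DECKS, weights)}
```

In this sketch, a larger α moves the Explore weight of unchosen decks towards φ in fewer trials, which matches the statement that a higher exploration learning rate leads to more frequent exploration when all other parameters are equal.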

Changes in methamphetamine use at six‐weeks

The Timeline Follow Back interview (TLFB) measured the number of days PwMUD used methamphetamine in the month before commencing treatment and in the month before the follow‐up assessment. To aid accuracy, we provided several cues to aid recollection (i.e., including public holidays on the calendar; prompting participants to describe their patterns of use and identifying any deviations; and assuring confidentiality). We calculated reduction of methamphetamine use by subtracting follow‐up TLFB scores from baseline TLFB scores.

Statistical analyses

Overview

When comparing groups, we applied Bayesian and frequentist independent t tests, Mann‐Whitney U tests, and Chi‐Square tests using JASP. For the traditional analysis of net score across 20‐trial blocks, a 5 (number of blocks) × 2 (groups) repeated‐measures ANOVA was used. A Bayes Factor (BF10) of <1/30 indicated evidence in favour of the null, and >3 indicated evidence in favour of the alternative. Alpha was set at 0.05. We implemented the VSE model using the VSE Toolbox in MATLAB 2017a. Outliers were removed based on choice behaviour data and were defined as values more than 3.29 standard deviations from the mean.
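The outlier rule described above can be illustrated with a short Python sketch. It assumes the sample standard deviation and a two‐sided criterion; the text does not specify either, and the function name is hypothetical.

```python
from statistics import mean, stdev
from typing import List


def flag_outliers(values: List[float], z_cutoff: float = 3.29) -> List[bool]:
    """Flag values lying more than z_cutoff sample SDs from the mean."""
    centre = mean(values)
    spread = stdev(values)  # sample SD (n - 1 denominator); an assumption
    if spread == 0:
        return [False] * len(values)
    return [abs(v - centre) / spread > z_cutoff for v in values]


# Invented scores for illustration only: 29 identical values and one extreme value.
scores = [0.0] * 29 + [100.0]
print(flag_outliers(scores))
```

A participant would then be excluded if flagged on one or more behavioural measures, which is how the exclusions in the Results are described.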

Aim 1: Modelling and comparing explore/exploit mechanisms between groups

The VSE toolbox identified whether PwMUD and controls differed in their parameters of explore/exploit decisions. This toolbox uses a Variational Bayesian scheme, implemented via the VBA Toolbox. To first see whether the VSE model was the best fitting model of participants' behaviour, we compared model fit between the VSE and other models of the IGT. These comparisons used the Akaike Information Criterion and included (1) the VSE model; (2) a modified VSE model which includes separate parameters for sensitivity towards reward and losses; (3) the Expectancy Valence model; (4) the Prospect Valence Learning model; (5) the Value Plus Perseverance model; and (6) the Outcome‐Representation Learning model. For brief explanations of the other models, see the supporting information. We based each individual's parameters on the best fitting model.
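The model comparison step can be sketched as follows, using the standard definition AIC = 2k − 2 ln L, where k is the number of free parameters and L is the maximised likelihood. The model names and fit values in the example are invented placeholders for illustration and are not results from this study.

```python
from typing import Dict, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion; lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood


def best_model(fits: Dict[str, Tuple[float, int]]) -> Tuple[str, Dict[str, float]]:
    """Return the name of the lowest-AIC model and the AIC of every model.

    `fits` maps a model name to (maximised log-likelihood, number of parameters).
    """
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    return min(scores, key=scores.get), scores


# Placeholder fits: model_b fits slightly better but uses three more parameters.
fits = {"model_a": (-120.0, 5), "model_b": (-118.0, 8)}
winner, scores = best_model(fits)
print(winner, scores)
```

The example shows why a more flexible model does not win automatically: the small gain in log‐likelihood for the second placeholder model does not offset its penalty for extra parameters.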

Covarying for cognitive or demographic group differences

Because PwMUD can differ from controls in areas such as attention, working memory, depression and IQ, we also investigated whether any of these four factors were explaining any of the computational, between‐group findings. This is particularly important, as previous work with this sample of PwMUD found increased inattention and depression, in addition to decreased full‐scale IQ, compared to controls. Therefore, we obtained previously collected measures of inattention (omission errors on the Continuous Performance Test II; CPT‐II Om Errors), inattention/disinhibition (commission errors on the CPT‐II; CPT‐II Comm Errors) and working memory (longest digit sequencing span from the Letter Number Sequencing task; LDSS) used previously in this sample. We then ran correlations between the above five measures (intelligence, depression, inattention, inattention/disinhibition and working memory) and the VSE parameters of interest, controlling for multiple comparisons. If significant, they were included as covariates in an ANCOVA analysis. A brief description of the additional cognitive measures and their descriptive statistics are presented in the supporting information.

Aim 2: Predicting reductions in methamphetamine use

We used multiple hierarchical regression to identify whether any VSE parameters could predict changes in PwMUD's methamphetamine use at 6‐week follow‐up. We chose the VSE parameters to predict reductions in methamphetamine use, rather than summary behavioural statistics, as they provide a more mechanistic explanation of why methamphetamine use may change, over and above the raw behavioural outputs alone. Here, we used the stepAIC function in R to identify the best combination of parameters. We applied the stepwise regression both backwards (i.e., starting with a model including all parameters and then systematically removing parameters) and forwards (i.e., starting with the simplest model that includes only the intercept and then systematically adding parameters). The best fitting model was then recreated in JASP to determine Bayesian estimates.

RESULTS

Behavioural analyses

Five PwMUD represented extreme outliers in one or more behavioural domains (i.e., net score, win/stay, lose/switch, mutual information, choice entropy and sequential exploration over three/four trials) and were removed from analyses. Visual inspection of Figure 1 shows control participants continuously improved their net score across the task. In contrast, PwMUD did not increase their score. Furthermore, controls showed a higher level of sequential exploration for three and four consecutive decks in the first 10 trials, before reducing their exploration (Figure 2). In contrast, PwMUD started with lower levels of exploration yet maintained these at a higher level throughout the task. Thus, controls appeared to experience an early period of exploration, followed by adaptive exploitation, while PwMUD appeared to struggle identifying which decks should have been exploited and instead appeared to engage in random exploration.

FIGURE 1

Cumulative net score of PwMUD (red) and controls (blue) across IGT. Note. Solid line represents mean; shaded area represents 95% confidence intervals

FIGURE 2

Comparison of behavioural indices of sequential exploration (SE3 and SE4) between PwMUD (red) and controls (blue). Note. SE3, sequential exploration 3. SE4, sequential exploration 4. These refer to how often participants chose three/four different decks across three/four subsequent trials (i.e., choosing deck A then C then B then D). Dotted line represents theoretical chance of each event (0.33 for SE3, and 0.09 for SE4). Solid line represents mean; shaded area represents 95% confidence intervals

Traditional group comparisons yielded significant differences on net score, with controls performing better than PwMUD, t(70.83) = 4.50, p < 0.001, BF10 = 2265.34, as well as controls showing a greater improvement in their net score across the blocks of the task, F(3.32, 500.74) = 6.07, p < 0.001, BF = 100.95. Using more detailed behavioural analysis, controls also showed greater mutual information, an indication of autocorrelation between subsequent choices, U = 3486, p < 0.001, BF10 = 89.22. Conversely, PwMUD showed greater choice entropy, indicating greater selection of different decks across the task, U = 1488, p < 0.001, BF = 60.43. Consistent with Figure 2, PwMUD exhibited greater sequential exploration across three consecutive trials, U = 1729.5, p = 0.005, BF10 = 7.51, though there was mixed evidence as to whether this trend continued over four trials, U = 1925, p = 0.043, BF10 = 1.75. There were no significant group differences on win/stay (U = 2726.5, p = 0.22, BF10 = 0.45) or lose/switch behaviour (U = 2054, p = 0.13, BF10 = 0.81). Figure 3 visualises these results.

FIGURE 3

Behavioural indices of choice behaviour in PwMUD (red) and controls (blue). Note. Individual dots represent each participant in each group; * = p < .05. While sequential exploration 4 was significant, BF10 = 1.75. Higher values of mutual information indicate that choices on subsequent trials are better predicted by the current choice. Higher values of choice entropy indicate that participants evenly explored all decks; lower values of choice entropy indicate selecting from only a few decks/one deck. Sequential exploration 3/4 refers to how often participants chose three/four different decks across three/four subsequent trials

Aim 1: Modelling and comparing explore/exploit mechanisms between groups

Compared to all other models, the VSE had the best fit in both groups (see Figures S1 and S2). From this, we found that PwMUD's behaviour on the IGT appeared primarily driven by deficits in exploitative processes. First, we investigated whether any of the five covariates previously identified in Section 2.5.2 were particularly relevant to explaining between‐group differences in the computational parameters of the VSE models (see Table S2). Here, inattention/disinhibition (CPT‐II Comm Errors) correlated with directed exploration bonus (φ; rho = 0.30, p = 0.002), and intelligence (WASI‐II) correlated with inverse decay (Δ; rho = 0.26, p = 0.003). Thus, both were included as covariates in their respective between‐group analysis. In comparing the computational parameters, PwMUD were first found to be less sensitive to reinforcement (both negative/positive) than controls, shown by a reduced reinforcement sensitivity (θ) parameter, U = 3065, p = 0.009, BF10 = 3.22. Second, there was a trend for PwMUD to use fewer previous choices to guide their current choice, as shown by a borderline difference in the inverse decay (∆) parameter, U = 3062, p = 0.010, BF10 = 2.48. When including the covariate of intelligence (F[1, 145] = 5.98, p = 0.02, BF = 4.44, ηp² = 0.04), this between‐group difference on inverse decay (∆) remained similar (F[1, 145] = 4.33, p = 0.04, BF = 1.96, ηp² = 0.03). Third, PwMUD and controls did not significantly differ in the upper bound of their directed exploration value, shown by the directed exploration bonus parameter (φ; U = 2046, p = 0.13, BF10 = 0.39). Moreover, when we included inattention/disinhibition as a covariate (F[1, 146] = 10.47, p = 0.002, BF = 31.52, ηp² = 0.07), the between‐group differences on directed exploration bonus (φ) became null (F[1, 146] = 0.27, p = 0.61, BF = 0.22, ηp² = 0.002).
Fourth, and in contrast to the previous result, PwMUD reached their upper bound of directed exploration slower than controls, shown by the directed exploration learning rate parameter (α), U = 3094, p = 0.007, BF = 5.07. Finally, choice stochasticity was similar between groups, shown by the consistency parameter (C), U = 2525, p = 0.68, BF = 0.21. Figure 4 visualises these group differences across the different parameters.

FIGURE 4

VSE parameter estimates of PwMUD (red) and controls (blue). Note. Individual dots represent each participant in each group; *p < .05. While inverse decay was significant, BF10 = 2.48. Higher values of reinforcement sensitivity indicate greater sensitivity for reward/punishment. Higher values of inverse decay indicate use of more previous outcomes. Higher values of consistency reflect less choice stochasticity. Higher values of exploration bonus indicate a greater maximum value of directed exploration. Higher values of exploration learning rate indicate a greater frequency of reaching the maximum directed exploration value

Aim 2: Predicting reductions in methamphetamine use

The best fitting combination of VSE parameters (Table 3) significantly predicted changes in self‐reported methamphetamine use between baseline and 6‐week follow‐up, F(3,69) = 5.93, p = 0.001, adj. R² = 0.17. This model retained three parameters: inverse decay (∆), directed exploration bonus (φ) and consistency (C), all of which significantly predicted changes in methamphetamine use at follow‐up. Here, inverse decay (∆) and consistency (C) were negatively associated with reduced methamphetamine use (i.e., using fewer previous choices to guide decisions and having a greater choice stochasticity were associated with smaller reductions in methamphetamine use). Conversely, directed exploration bonus (φ) was positively associated with reduced methamphetamine use (i.e., a greater tendency towards directed exploration was associated with larger reductions in methamphetamine use). These results were not explained by task engagement, with no correlation between log median reaction time and methamphetamine use (r = .02, 95% CI [−0.21, 0.25], p = 0.87, BF10 = 0.15).

TABLE 3

Statistics of best fitting model using the VSE parameters to predict PwMUD's reduction of use between baseline and follow‐up

Predictor	Unstandardised coefficient	Standard error	Standardised Coef. (β)	β 95% CI	t	p	BF _inc
Intercept	33.31	4.67			7.14	<0.001	1.00
Inverse decay, ∆	−16.12	5.05	−0.39	[−0.63, −0.14]	−3.19	0.002	16.29
Exploration bonus, φ	1.35	0.66	0.22	[0.04, 0.44]	2.04	0.045	4.36
Consistency, C	−11.94	3.71	−0.39	[−0.64, −0.15]	−3.22	0.002	14.94
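
The t values in Table 3 can be cross‐checked against the other columns, since each t statistic is the unstandardised coefficient divided by its standard error. The short sketch below is an illustration only: it assumes the standard definition of the t statistic and uses the rounded values from the table, so small discrepancies in the last decimal place are expected.

```python
# (predictor, unstandardised coefficient, standard error, reported t) from Table 3
rows = [
    ("Intercept", 33.31, 4.67, 7.14),
    ("Inverse decay", -16.12, 5.05, -3.19),
    ("Exploration bonus", 1.35, 0.66, 2.04),
    ("Consistency", -11.94, 3.71, -3.22),
]


def t_statistic(coef, se):
    """t statistic of a regression coefficient: estimate / standard error."""
    return coef / se


for name, coef, se, reported_t in rows:
    t = t_statistic(coef, se)
    print(f"{name}: t = {t:.2f} (reported {reported_t:.2f})")
```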

We then conducted a post hoc hierarchical regression to test whether our best fitting model of VSE predictors (directed exploration bonus, φ; consistency, C; and inverse decay, ∆) had stronger predictive utility than a previous approach in this sample that used working memory to predict future methamphetamine use. The first block of this regression, which retained the original study's variables (age, sex, intelligence, sociodemographic status, depression, severity of dependence on methamphetamine, severity of dependence on cannabis, working memory and delay discounting k score), did not significantly predict reductions in TLFB scores, F(9,61) = 1.51, p = 0.17, and explained 6% of the variance. In comparison, adding the three best VSE parameters in the second block made the model significant, F(12,58) = 2.61, p = 0.008, with the model explaining 22% of the variance in PwMUD's reduction in TLFB scores.
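
The gain from the second block can be made concrete with the standard F‐change formula for nested regressions. The sketch below is a derivation from the reported statistics, not a result reported by the authors: it assumes both blocks are ordinary least‐squares models with an intercept fitted to the same participants (the degrees of freedom, 9 + 61 and 12 + 58, are consistent with this), and the function names are illustrative. Because the inputs are rounded, the output is approximate.

```python
def r2_from_f(f_stat, df_model, df_resid):
    """R^2 implied by an overall F test (OLS with intercept assumed)."""
    return (f_stat * df_model) / (f_stat * df_model + df_resid)


def adjusted_r2(r2, df_model, df_resid):
    """Adjusted R^2 for a model with df_model predictors plus an intercept."""
    return 1.0 - (1.0 - r2) * (df_model + df_resid) / df_resid


def f_change(r2_reduced, r2_full, df_added, df_resid_full):
    """F statistic for the R^2 increase when df_added predictors are added."""
    return ((r2_full - r2_reduced) / df_added) / ((1.0 - r2_full) / df_resid_full)


r2_block1 = r2_from_f(1.51, 9, 61)   # original study's variables
r2_block2 = r2_from_f(2.61, 12, 58)  # plus the three VSE parameters

print(f"Block 1: adjusted R^2 = {adjusted_r2(r2_block1, 9, 61):.2f}")
print(f"Block 2: adjusted R^2 = {adjusted_r2(r2_block2, 12, 58):.2f}")
f_delta = f_change(r2_block1, r2_block2, df_added=3, df_resid_full=58)
print(f"F change on (3, 58) df = {f_delta:.2f}")
```

Under these assumptions the adjusted R² values round to the reported 6% and 22%, and the implied F change for the three VSE parameters is roughly 5.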

DISCUSSION

We revealed the mechanisms driving maladaptive explore/exploit decisions amongst PwMUD and identified how some of these mechanisms can predict future changes in methamphetamine use. At treatment onset, PwMUD exhibited limited exploitative computational processes, which seemed to produce a persistent form of ‘random’ behavioural exploration. This was evidenced by PwMUD having a reduced sensitivity to overall reinforcement (reinforcement sensitivity, θ); a likely preference for using fewer, more recent trials to guide decision‐making (inverse decay, ∆); and a behavioural pattern of consistently switching their choices across the task without improving their performance. In addition, those PwMUD who showed greater reductions of methamphetamine use during early recovery demonstrated, at baseline, greater directed exploration (directed exploration bonus, φ), reduced choice stochasticity (consistency, C) and greater incorporation of previous outcomes in their decisions (inverse decay, ∆). These findings highlight that PwMUD's presentation to treatment is characterised by dysfunction in exploitative computational processes, while both explorative and exploitative computational processes impact treatment success. To aid the reader, Table 4 summarises the VSE parameters and their key findings.

TABLE 4

Summary of VSE parameters, their interpretation, meanings and findings in between‐group and predictive analysis

Parameter name	Range	Interpretation	Between‐group differences	Predicting changes in meth. use
Exploitative parameters
Reinforcement sensitivity, θ	θ ∈ [0, 1]	Influences the strength of rewards/losses on value calculations. Values near 0 reflect a weaker sensitivity to value of rewards/losses equally. Values near 1 reflect a greater sensitivity to value of rewards/losses equally.	PwMUD had less sensitivity to reinforcement than controls.	Was not included in the best fitting model.
Inverse decay, Δ	Δ ∈ [0, 1]	Identifies how many previous trial outcomes the participant is using to guide their current decision. At 0, only the previous trial's outcome guides the next decision. At 1, all previous trials' outcomes guide the next decision.	PwMUD likely used fewer previous trial outcomes to guide their current decision, compared to controls.	Was in the best fitting model. Using fewer previous choices was associated with a smaller reduction in methamphetamine use.
Explorative parameters
Directed exploration bonus, φ	φ ∈ (−∞, ∞)	Sets a maximal ‘threshold’ of how biased the participant is towards selecting decks that have not been explored in recent trials. Negative values indicate an overall preference to keep selecting familiar decks (exploitation). Positive values indicate an overall preference to explore recently unselected decks (directed exploration).	A likely null difference between groups. Neither group had a higher threshold or bias towards directed exploration.	Was in the best fitting model. Having a greater bias towards directed exploration was associated with a greater reduction in methamphetamine use.
Directed exploration learning rate, α	α ∈ [0, 1]	How quickly participants return to their maximal value of exploration again, after having recently explored. Values near 0 reflect a slow return to the maximal explore value. Values near 1 reflect a quick return to the maximal explore value.	PwMUD reached their upper value of directed exploration slower than controls. Please note, this does not necessarily mean their exploration value was lower than exploitation.	Was not included in the best fitting model.
Other parameters
Consistency, C	C ∈ [0, ∞)	Reflects stochastic behaviour (i.e., whether the behaviour of the participant is consistent or not with the equations of the VSE model). Greater values reflect behaviour with greater consistency with the VSE model. Smaller values reflect behaviour that is less predictable under the VSE model.	A likely null difference between groups. This means both groups appeared to act equally consistently within the parameters of the VSE model.	Was in the best fitting model. Having greater choice stochasticity was associated with a smaller reduction in methamphetamine use.
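
To illustrate how the five parameters in Table 4 interact, the following is a minimal sketch of a single‐trial update in a VSE‐style model. It follows the formulation commonly given for the VSE model in the wider literature; the equations actually fitted in this study are specified in the Methods rather than in this section, so the update rules, function names, outcome scaling and example values below should all be read as assumptions.

```python
import math


def vse_trial_update(exploit, explore, chosen, gain, loss,
                     theta, delta, alpha, phi):
    """Update exploit/explore values of every deck after one choice (sketch).

    gain and loss are non-negative magnitudes of the trial outcome.
    """
    # Reinforcement sensitivity (theta) compresses gains and losses equally.
    value = gain ** theta - loss ** theta
    new_exploit, new_explore = [], []
    for deck, (ex, er) in enumerate(zip(exploit, explore)):
        if deck == chosen:
            # Inverse decay (delta): how much past value is carried forward.
            new_exploit.append(delta * ex + value)
            # The chosen deck has just been explored, so its bonus resets.
            new_explore.append(0.0)
        else:
            new_exploit.append(delta * ex)
            # Unchosen decks drift back towards the bonus phi at rate alpha.
            new_explore.append(er + alpha * (phi - er))
    return new_exploit, new_explore


def choice_probabilities(exploit, explore, consistency):
    """Softmax over summed values; higher consistency means less stochasticity."""
    scores = [consistency * (ex + er) for ex, er in zip(exploit, explore)]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: four decks, deck 0 chosen, outcome +100 / -0.
exploit, explore = vse_trial_update(
    [0.0] * 4, [0.0] * 4, chosen=0, gain=100.0, loss=0.0,
    theta=0.5, delta=0.5, alpha=0.5, phi=1.0)
probs = choice_probabilities(exploit, explore, consistency=1.0)
print(exploit, explore, [round(p, 3) for p in probs])
```

In this sketch, a low Δ quickly erases the value of earlier outcomes, and a low C flattens the choice probabilities, which is one way to picture the less outcome‐guided, more stochastic choice patterns discussed in the text.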

Currently, the prevailing view is that substance addiction represents a bias away from exploration and towards excessive exploitation. Our finding that PwMUD demonstrate weakened exploitative mechanisms challenges this perspective. In fact, previous behavioural, computational and neuroimaging studies have shown that PwMUD have a reduced sensitivity to positive reinforcement and decreased dopaminergic functioning in key regions related to exploitative reinforcement learning (e.g., the striatum). Furthermore, our finding of a likely decreased inverse decay amongst PwMUD is similar to another computational study of decision‐making amongst PwMUD. Together, these lines of evidence, in conjunction with the lack of between‐group differences in the directed exploration bonus parameter (φ), indicate that PwMUD have fundamental and specific weaknesses in identifying and exploiting prior instances of reinforcement, which then lead to a behavioural presentation of ‘random exploration’.

In contrast, PwMUD who showed the greatest reductions of methamphetamine use had greater directed exploration, showed greater consistency in their decisions and incorporated more past outcomes into their choices. There are at least two non‐mutually exclusive possibilities for this finding. First, PwMUD with greater directed exploration may be more open to engaging with alternative, non‐drug rewards. As such, these participants may have been more likely to identify rewarding aspects of non‐drug related actions (i.e., social connectedness and personal meaning), which is protective towards recovery. Second, because these participants incorporated more past choices into their decision‐making, they may be more mindful about the outcomes of methamphetamine‐related choices during recovery. For example, such PwMUD may have been able to incorporate greater amounts of evidence towards recovery (or against relapse) and thus have greater biases towards treatment goals.

Our study highlights how computational modelling of the explore/exploit framework can help predict the clinical prognosis of PwMUD. This is particularly important because the search for neuropsychological predictors of PwMUD's treatment outcomes is still relatively new; such modelling provides an additional data source for clinicians to identify those at risk of poor treatment outcomes. Furthermore, the VSE parameters identified appear to show a significant improvement over previous modelling attempts with this dataset when predicting changes of methamphetamine use over the early stages of recovery. Finally, we show how the VSE model and detailed behavioural outcomes can dissect critical decision‐making mechanisms that characterise explore/exploit behaviour at treatment onset.

However, these findings must be considered in the context of the limitations of our study. For example, the original protocol, on which our data are based, lost ~25% of PwMUD to follow‐up. Even though this retention rate is satisfactory relative to other studies, our results may be skewed towards those who can consistently engage with research programmes. Furthermore, our PwMUD group presented with higher scores of depressive symptomatology and inattention/disinhibition and lower scores of full‐scale IQ, all of which may impact task performance. Still, we accounted for these potential covariates in our between‐group analyses and found no change to our overall findings. Finally, it is important to note that performance on a rapid, small‐scale and structured computer task may not necessarily generalise to the slower, larger‐scale and less structured decisions a person makes in their day‐to‐day life during recovery. As such, replication of our findings in more ecologically relevant paradigms appears necessary.

To conclude, we identified specific dysfunction in explore/exploit mechanisms amongst PwMUD and then used these parameters to predict changes in methamphetamine use during early recovery. At treatment onset, PwMUD exhibited a behavioural pattern of random exploration, driven by deficits in identifying and exploiting the outcomes of non‐drug actions. In comparison, we found that the greatest reductions in methamphetamine use were associated with greater directed exploration, more consistent choices and a greater use of previous outcomes. Our findings highlight the importance of identifying the underlying computational mechanisms of explore/exploit behaviour in addiction, while also providing a useful methodological approach that can be applied across other addiction‐related cohorts.

AUTHOR CONTRIBUTIONS

AHR and AVG designed the study. AHR ran the analyses and interpreted the results, with support from TT‐JC. AHR led the manuscript writing process with TT‐JC and AVG providing significant feedback and revisions. All authors critically reviewed the above content and approved this final version for publication.

Figure S1. Comparing VSE model fit to several alternative models for PwMUD.
Figure S2. Comparing VSE model fit to several alternative models for Controls.
Table S1. Spearman's Rho correlations between all VSE parameters.
Table S2. Spearman's Rho correlations between potential covariates and VSE parameters for between‐group comparisons.
