Ravi D Mill1, Emily C Winfield1, Michael W Cole1, Suchismita Ray2. 1. Center for Molecular and Behavioral Neuroscience, Rutgers University, Newark, NJ 07102, USA. 2. Department of Health Informatics, School of Health Professions, Rutgers Biomedical and Health Sciences, Newark, NJ 07103, USA. Electronic address: shmita@shp.rutgers.edu.
Abstract
Prescription opioid use disorder (POUD) has reached epidemic proportions in the United States, raising an urgent need for diagnostic biological tools that can improve predictions of disease characteristics. The use of neuroimaging methods to develop such biomarkers have yielded promising results when applied to neurodegenerative and psychiatric disorders, yet have not been extended to prescription opioid addiction. With this long-term goal in mind, we conducted a preliminary study in this understudied clinical group. Univariate and multivariate approaches to distinguishing between POUD (n = 26) and healthy controls (n = 21) were investigated, on the basis of structural MRI (sMRI) and resting-state functional connectivity (restFC) features. Univariate approaches revealed reduced structural integrity in the subcortical extent of a previously reported addiction-related network in POUD subjects. No reliable univariate between-group differences in cortical structure or edgewise restFC were observed. Contrasting these mixed univariate results, multivariate machine learning classification approaches recovered more statistically reliable group differences, especially when sMRI and restFC features were combined in a multi-modal model (classification accuracy = 66.7%, p < .001). The same multivariate multi-modal approach also yielded reliable prediction of individual differences in a clinically relevant behavioral measure (persistence behavior; predicted-to-actual overlap r = 0.42, p = .009). Our findings suggest that sMRI and restFC measures can be used to reliably distinguish the neural effects of long-term opioid use, and that this endeavor numerically benefits from multivariate predictive approaches and multi-modal feature sets. This can serve as theoretical proof-of-concept for future longitudinal modeling of prognostic POUD characteristics from neuroimaging features, which would have clearer clinical utility.
Prescription opioid use disorder (POUD) has reached epidemic proportions in the United States, raising an urgent need for diagnostic biological tools that can improve predictions of disease characteristics. The use of neuroimaging methods to develop such biomarkers have yielded promising results when applied to neurodegenerative and psychiatric disorders, yet have not been extended to prescription opioid addiction. With this long-term goal in mind, we conducted a preliminary study in this understudied clinical group. Univariate and multivariate approaches to distinguishing between POUD (n = 26) and healthy controls (n = 21) were investigated, on the basis of structural MRI (sMRI) and resting-state functional connectivity (restFC) features. Univariate approaches revealed reduced structural integrity in the subcortical extent of a previously reported addiction-related network in POUD subjects. No reliable univariate between-group differences in cortical structure or edgewise restFC were observed. Contrasting these mixed univariate results, multivariate machine learning classification approaches recovered more statistically reliable group differences, especially when sMRI and restFC features were combined in a multi-modal model (classification accuracy = 66.7%, p < .001). The same multivariate multi-modal approach also yielded reliable prediction of individual differences in a clinically relevant behavioral measure (persistence behavior; predicted-to-actual overlap r = 0.42, p = .009). Our findings suggest that sMRI and restFC measures can be used to reliably distinguish the neural effects of long-term opioid use, and that this endeavor numerically benefits from multivariate predictive approaches and multi-modal feature sets. This can serve as theoretical proof-of-concept for future longitudinal modeling of prognostic POUD characteristics from neuroimaging features, which would have clearer clinical utility.
Prescription opioid use disorder (POUD) is a serious public health problem in the United States. It is well documented that prescription opioids are easily available and this has resulted in an increase in both their medical and non-medical use (Compton et al., 2016, Dart et al., 2015). According to CDC Wonder1, drug overdose deaths relating to prescription opioids increased from 3,442 in 1999 to 17,029 in 2017, leading to widespread acknowledgement of an ongoing opioid epidemic in the US. Whilst neuropsychological research has identified opioid-related cognitive deficits via behavioral studies (e.g. risky decision-making and aberrant cognitive control; Baldacchino et al., 2012), research on the long-term neural effects of POUD is comparatively lacking. Integrating greater knowledge of the neural effects with insights into cognitive dysfunction is likely critical to advancing a mechanistic understanding of opioid use disorders, and improving treatments in the clinic.Previously reported human neuroimaging studies have focused predominantly on illicit opioid use i.e. heroin. These studies have reported opioid-induced alterations to structural MRI (sMRI; Li et al., 2014, Liu et al., 2009), diffusion imaging (Li et al., 2016, Sun et al., 2017) and functional MRI (Fareed et al., 2017, Ieong and Yuan, 2017, Ma et al., 2010, Yuan et al., 2010) measures, relative to healthy controls. These large-scale effects have been supplemented by the results of animal models of long-term opioid use, which also revealed prominent changes to brain structure and function at the cellular and systems level (Cunha-Oliveira et al., 2007, Kibaly et al., 2019, Liao et al., 2005). Despite likely overlap in the neural pathophysiology of heroin addiction and POUD, differences might also arise due to the purer chemical composition of prescription opioid drugs (Risser et al., 2007, Upadhyay et al., 2010) and the different routes of consumption (i.e. sublingual for PO versus intravenous for heroin; Upadhyay et al., 2012). These possibilities further corroborate the need for research targeting the long-term neural effects of prescription opioids specifically.To this end, the few previous neuroimaging studies conducted in POUD subjects have almost exclusively used “univariate” analysis approaches to discriminate between POUD and healthy control groups. This typically takes the form of 2-sample ttests contrasting structural (e.g. sMRI morphometry measures) and/or functional (e.g. resting-state functional connectivity, restFC) measures separately for each region/connection between the two groups. Such approaches have identified POUD-related alterations in distributed cortical (i.e. insula, orbitofrontal cortex, anterior cingulate cortex, ventromedial prefrontal cortex) and subcortical (i.e. nucleus accumbens, dorsal striatum, amygdala, hippocampus; Lin et al., 2016, McConnell et al., 2020, Upadhyay et al., 2010, Younger et al., 2011) regions, which overlap to a large degree with a “drug cue processing network” linked to general addiction-related dysfunction (Jasinska et al., 2014, Ray et al., 2015). Some of these regions have also been associated with behavioral variables with likely clinical relevance, such as structural volume in the nucleus accumbens correlating with the self-assessed severity of opioid misuse (McConnell et al., 2020; see also Upadhyay et al., 201022).In contrast to these univariate approaches, the field of clinical neuroscience has more recently transitioned towards “multivariate” approaches that seek to classify disease characteristics on the basis of the multivariate pattern of input brain imaging measures or “features” (Arbabshirani et al., 2017, Woo et al., 2017). Such approaches typically involve some degree of model training, rendering them part of a broad category of supervised “machine learning” (ML) algorithms in biomedical research. The coefficients resulting from ML model training are then applied to generate predictions of disease characteristics in held-out test subjects. Multivariate ML approaches have been successfully applied to classify binary group status (1 = clinical, 0 = healthy control) across diverse forms of pathology (e.g. Alzheimer’s Disease; Kloppel et al., 2008, Liu et al., 2018, Rathore et al., 2017, Woo et al., 2017) and psychopathology (e.g. schizophrenia; Abi-Dargham and Horga, 2016, Orrù et al., 2012, Yahata et al., 2017, Yassin et al., 2020). The findings consistently demonstrate greater statistical sensitivity for multivariate ML classification approaches compared to univariate between-group contrasts (Hebart and Baker, 2018, Jimura and Poldrack, 2012, Spronk et al., 2020). The multivariate ML approaches are also by design more viable for clinical translation, given that trained models can be readily applied to generate predictions in individual subjects (unlike univariate approaches), as would be the use-case in the clinic. In addition to binary classifications of group status, multivariate ML approaches have more recently been used in predictive modeling of continuous behavioral outcome measures (Mill et al., 2020, Rosenberg et al., 2016, Shen et al., 2017). The potential for modeling clinically relevant behavior has brought the field closer to the long-term goal of developing neuroimaging “biomarkers”, with an ideal application being to harness multivariate ML methods towards prospective modeling of clinically useful disease characteristics (e.g. illness severity and prognosis; Matthews and Hampshire, 2016, Woo et al., 2017).Despite this potential, multivariate ML approaches to analyzing neuroimaging data have thus far been scarcely applied to opioid addiction. The few previously published opioid-related papers have focused on classification of heroin users, and on the basis of isolated (uni-modal) MRI measures (resting-state fMRI activations or arterial spin labeling; Li et al., 2019, Zhang et al., 2011a, Zhang et al., 2011b). The lack of multivariate ML applications is surprising given the sustained high relapse rate in opioid addiction (Li et al., 2016, Rong et al., 2016, Stewart et al., 2019, Tang et al., 2006), highlighting a clear need to go beyond current purely behavioral bases of diagnosing and monitoring outcomes in the clinic.We aimed to address these various lacunae in the present report. Multi-modal MRI data (sMRI and restFC) were analyzed for POUD subjects and healthy controls, encompassing cortical and subcortical regions. The efficacy of univariate and multivariate approaches in discriminating between the two groups was explored across both MRI modalities. Multivariate classification approaches were used to discriminate group status, whereas multivariate predictive modeling approaches were used to predict individual differences in a clinically relevant outcome measure (persistence in goal pursuit). The results provide theoretical proof-of-concept for the efficacy of multivariate (versus univariate) and multi-modal (versus uni-modal) approaches in the study of POUD. Specifically, we demonstrate that at a given timepoint following a POUD diagnosis, the neural and behavioral effects of long-term POUD use can be reliably identified from multivariate structural and functional neuroimaging data. Whilst this finding in of itself has theoretical utility in advancing our understanding of the brain basis of opioid addiction, our preliminary results are also discussed with a view to the long-term goal of developing more clinically useful prognostic biomarkers from neuroimaging features. To preview the points raised in the Discussion, our reported success in classifying clinical status and predicting relevant behavior from neuroimaging data acquired at the present timepoint could serve as a precursory step to more clinically useful prospective modeling of future disease outcomes (e.g. relapse rate assessed at a later timepoint) from the same data.
Materials and methods
Participants
Twenty-six (5 female) individuals with prescription opioid use disorder (POUD group), and 21 (9 female) healthy volunteers with no history of substance use (control group), matched for age, education, and ethnic background were included in the study (see Table 1 for full sample demographics). Inclusion criteria for both groups included being between 21 and 54 years of age, English as their primary language, right-handedness, and near 20/20 vision (or corrected). Exclusion criteria for both groups included any serious physical illness, a history of childhood learning disability or special education that was current, presence of any serious psychiatric illness (for the POUD group, this entailed any psychiatric or substance use disorder other than POUD; further details below), MRI contraindications, claustrophobia, abnormal hearing in either ear, history of loss of consciousness for more than 30 minutes, alcohol abuse and substance dependence including past dependence, and pregnancy for women.
Table 1
Demographic and substance use information for individuals with POUD and healthy controls. “Diff. stat” refers to test statistics for between-group difference contrasts of specified variables; 2-sample ttests were conducted for all variables apart from the categorical “Female” sex measure (for which the odds ratio from a Fisher’s exact test is provided).
POUD (n = 26): Mean, Range (SD)
Control (n = 21): Mean, Range (SD)
Diff. stat
p
Age (yrs.)
32, 22–48 (6.3)
33, 21–54 (6.5)
0.74
0.46
Education (yrs.)
12, 3–20 (2.4)
13, 7–20 (2.1)
1.04
0.30
Race/Ethnicity
Caucasian
11
6
African American
14
11
Hispanic
1
4
Female (n)
5
9
3.15
0.112
Opioid Craving (before scan)
3.04, 1–7 (2.11)
1, 1–1 (0.00)
4.43
0.001
Prescription Opioid Use
Frequency (days/week)
6.8, 3–7 (0.38)
N/A
Duration of use (yrs.)
5.5, 2–20 (3.35)
N/A
Money spent ($/week)
$739, $70–2,500 (4 8 3)
N/A
Alcohol Use (no. of drinks/week)
0.62, 0–2 (0.85)
0.86, 0–3 (1.1)
−0.87
0.39
Cigarette Use
Frequency (days/week)
7
7
Quantity (cigarettes/day)
6.21, 4–9 (1.53)
5.75, 3–8 (1.67)
0.66
0.52
Non-smokers (#)
12
13
Non-cigarette smokers’ frequency and quantity data were not included in the group average.
Demographic and substance use information for individuals with POUD and healthy controls. “Diff. stat” refers to test statistics for between-group difference contrasts of specified variables; 2-sample ttests were conducted for all variables apart from the categorical “Female” sex measure (for which the odds ratio from a Fisher’s exact test is provided).Non-cigarette smokers’ frequency and quantity data were not included in the group average.To clarify inclusion/exclusion criteria for the POUD subjects specifically, these were recruited on the basis of meeting DSM 5 criteria for moderate-to-severe opioid use disorder, as operationalized in the SCID-5-RV (First, 2015). Our participants reported using opioids because of their strong desire to use these drugs, and their inability to control opioid use. Opioid patients with comorbid other substance use disorders (SUDs) were excluded from the study. Comorbid psychiatric disorders including psychotic disorders, depressive disorders, anxiety disorders, trauma and stressor-related disorders, and eating disorders were also exclusionary. To focus our study on long-term PO use, inclusion in the POUD group also required a history of using prescription opioid (PO) pills for at least the past 1 year and participants were excluded if they were co-dependent on both PO and any other substance. Note also that we only included POUD subjects that were not experiencing chronic pain, thereby avoiding another potential confounding influence on the assessed neural measures (Upadhyay et al., 2010). These strict inclusion/exclusion criteria were essential in allowing us to study the long-term neural effects of PO use in a targeted manner.Opioid using participants were recruited from Integrity House’s inpatient addiction treatment center, a drug free facility in Newark, New Jersey. Patients were detoxified before admission to this treatment facility, which had a 6 month-long inpatient treatment program. During the 2nd/3rd week since their admission, we conducted a screening interview and if eligible, the patients were recruited for the study. All participants took part in the study towards the end of the first month since their admission. All recruited POUD subjects had been abstinent from PO since admission to the center, and were not undergoing methadone or buprenorphine treatment. Thus, by the time patients took part in the study, they were not experiencing opioid withdrawal. To summarize, our screening protocols ensured that POUD subjects in our final sample were long-term PO users (mean duration of use = 5.5 years, see Table 1), were not undergoing withdrawal, were free from chronic pain, and were free from serious physical or psychiatric comorbidity. Control participants were recruited by advertising in North Jersey Craigslist and by word-of-mouth.On the day of scanning at the Rutgers University Brain Imaging Center (RUBIC), all participants provided written informed consent approved by the Rutgers University Institutional Review Board (IRB), and were administered a urine screen to rule out pregnancy in women, and to ensure negative urine toxicology for cocaine, methamphetamine, THC, opiate, and benzodiazepines (via the One Step Multi-Drug Screen Test Panel). They were also assessed for recent alcohol use with a breathalyzer, and self-rated opioid craving. The groups did not differ in alcohol or tobacco use at the time of enrollment (see Table 1 for all between-group comparisons). At the end of the study, participants received a VISA gift certificate worth $100 for their participation.
Data acquisition.
MRI was collected on a 3 T Siemens TRIO scanner with a 32-channel head coil at the Rutgers University Brain Imaging Center (RUBIC). The findings reported here focus on structural and resting-state functional MRI scans, which were collected as part of a broader study that included task fMRI scans targeting cue reactivity, reward and punishment processing underlying POUD (which are not analyzed here). Structural MRI images were acquired with a T1-weighted MPRAGE sequence (TR = 1900 ms, TE = 2.52 ms, flip angle = 9°, FOV = 256 mm, 1 mm isotropic voxels, 176 slices in sagittal orientation). Resting-state fMRI images were acquired via a whole-brain echo planar imaging (EPI) sequence with multiband acceleration (TR = 600 ms, TE = 28.2 ms, flip angle 30°, FOV = 208 mm, 1.5 × 1.5 × 3 mm voxels, 35 interleaved slices in axial orientation, multiband acceleration factor 5; 1 run of 600 volumes). Dual-echo gradient-recalled echo (GRE) fieldmaps were also acquired prior to the fMRI sequence to correct b0 distortions in the functional data. During the rest fMRI scans, participants were instructed to keep their eyes open and focus on a fixation cross for 6 minutes.
Estimation of structural MRI measures
All processing of the T1 sMRI images was conducted in Freesurfer (Dale et al., 1999, Fischl, 2004, Fischl et al., 2002). Full anatomical T1 segmentation was performed for each subject using the “recon-all” command. From the outputs of this command, the “aparcstats2table” command was used to extract cortical surface gray matter thickness (in mm) from regions parcellated from a standard anatomical atlas (Desikan et al., 2006). We used the “asegstats2table” command to extract mean image intensity (in raw MR signal units) from atlas-defined subcortical volume regions. Both the cortical thickness and subcortical intensity measures were confined to anatomical atlas regions that overlapped with regions in the drug cue processing network (Ray et al., 2015), previously highlighted as a critical network of dysfunction in drug addiction. Homologues for all 18 ROIs reported in the Ray et al paper (which was established in an independent sample of subjects) were identified in the aparc/aseg results and averaged across hemispheres. This yielded a total of 10 ROIs that formed the basis of the univariate and multivariate sMRI analyses: 4 subcortical regions (amygdala; hippocampus; nucleus accumbens, NA; dorsal striatum) with corresponding intensity values, and 6 cortical regions (anterior cingulate cortex, ACC; dorsolateral prefrontal cortex, dlPFC; insula; medial orbitofrontal cortex, mOFC; medial prefrontal cortex, mPFC; parahippocampus, PHC) with corresponding thickness values.
Estimation of resting-state functional connectivity measures
Preprocessing of the rest fMRI data was conducted in FSL using the FEAT toolbox (Woolrich et al., 2001). The fMRI volumes were first motion-corrected using a single-band volume acquired at the start of the run as the reference for realignment. The volumes then underwent b0 unwarping using the GRE fieldmaps, removal of non-brain tissue, highpass filtering with a 0.01 Hz cutoff, linear coregistration to the subject’s anatomical T1 and nonlinear normalization to an MNI template. Resting-state activation timeseries were then extracted from 264 whole-brain regions defined from the Power functional atlas (Power et al., 2011). General linear model (GLM) nuisance regression was performed for the regional timeseries to remove artifacts, which included regressors for 6 motion parameters, white matter and ventricular timeseries, and their temporal derivatives. Volumes with a framewise displacement (FD; Power et al., 2012, Power et al., 2014) greater than 0.25 mm were scrubbed from the GLM to reduce motion contamination. Resting-state functional connectivity (restFC) was finally estimated from the cleaned residual timeseries for each subject, as the Pearson correlation coefficient between all pairs of regions. This yielded a 264-by-264 symmetric restFC matrix for each subject that was used for all univariate and multivariate analyses.
Univariate between-group analyses involving sMRI and restFC features
Univariate sMRI analyses consisted of mixed factor analysis of variance (ANOVA), conducted separately for the subcortical regional intensity and cortical regional thickness measures. For the subcortical regions, a 2 (group: POUD, control) × 4 (region: amygdala, hippocampus, NA, dorsal striatum) mixed ANOVA was conducted. For the cortical regions, a 2 (group: POUD, control) × 6 (region: ACC, dlPFC, insula, mOFC, mPFC, PHC) mixed ANOVA was conducted. In both cases, ‘group’ was the between-subjects factor and ‘region’ was the within-subjects factor. Planned 2-sample ttests contrasting POUD versus control groups for each region were also performed to follow up on the ANOVA results. For these analyses, significance was assessed against an uncorrected alpha threshold of 0.05.Univariate restFC analyses also followed a standard approach, with 2-sample ttests contrasting the POUD and control groups separately for each connection in the upper diagonal of the 264-by-264 matrix (34716 connections total). The r values for each connection were Fisher-z transformed to improve normality before being submitted to the ttest. Significance was assessed after FDR correction for multiple comparisons across connections (corrected alpha = 0.05; Benjamini and Hochberg, 1995).
Multivariate machine learning classification using sMRI and restFC features
We trained multivariate machine learning (ML) classifiers to discriminate between the POUD and control group subjects. Unlike the univariate approaches described above that treated each region (for sMRI measures) or connection (for restFC) as a discrete unit of analysis, multivariate classifiers utilize algorithms that search for the pattern amongst such units (i.e. features) that can optimally separate classes to which input observations are drawn from. In our context, we classified subjects (observations) into their clinical groups (classes): POUD or control. Our chosen algorithm was a minimum-distance classification approach related to representational similarity analysis, which has been successfully applied to human neuroimaging data in a number of studies (Diedrichsen and Kriegeskorte, 2017, Haxby, 2001, Hebart and Baker, 2018, Kriegeskorte et al., 2008, Mur et al., 2009). As described in more detail below, this minimum distance approach classifies an individual subject on the basis of the multivariate distance (Pearson correlation) between their feature values and two group templates (created by averaging over the same features in “control” and “POUD” group subjects in the held-out training data). This simple linear classifier has been used successfully to distinguish a wide array of clinical groups from healthy controls on the basis of neuroimaging data (Mill et al., 2020, Spronk et al., 2020).Separate classifiers were trained on 4 feature sets extracted from the individual subjects. The “sMRI only” classifier used the same measures from the univariate sMRI analyses: subcortical image intensity from 4 regions and cortical thickness from 6 regions. The “sMRI subcortex” classifier used only the image intensity values from the 4 subcortical regions. The “restFC only” classifier used the same measures from the univariate restFC analyses: connections (Pearson r values) extracted from the upper diagonal of each subject’s full restFC matrix. Given the high dimensionality of restFC (34716 connections total), we performed a feature selection step to identify the top 50 most diagnostic features (described fully in the next paragraph). Finally, the “combined” (i.e. multi-modal) classifier input the 10 structural features used in the sMRI classifier and the top 50 features identified by the restFC classifier to distinguish between the groups.The procedure for quantifying classification performance followed the same leave-one-subject-out crossvalidation scheme across all 4 feature sets. Firstly, given that the POUD group (n = 26) was larger than the control group (n = 21), we confined the classification analyses to a random subset of 21 POUD subjects to ensure that model training was performed on balanced group sizes (Poldrack et al., 2019). A total of 42 subjects were hence submitted to crossvalidation, wherein each subject was iteratively held out as a test subject, with all other subjects used for model training. Specific to the restFC and combined classifiers, feature selection was performed at the start of each crossvalidation loop to confine the number of connectivity features to those that were most diagnostic of group status. These diagnostic features were identified via 2-sample ttests contrasting the two groups (excluding the held-out test subject to prevent circularity), separately for each connection (similar to Dosenbach et al., 2010). The top 50 connections were selected by sorting the absolute t statistics. Note that this feature selection step was fully embedded in the crossvalidation scheme (i.e. is applied to the training set only) rather than being applied as a preliminary step to the full sample. The latter approach has been highlighted as a common pitfall in applying machine learning approaches to disease classification, as it leads to inflated classification accuracies from double-dipping (Arbabshirani et al., 2017).Following feature selection, group templates were created by averaging selected features separately for the POUD and control subjects in the training set (again, excluding the held-out test subject). Pearson correlation (r) was used to assess the distance between the test subject’s features and both group templates, with a binary classification decision (“POUD” versus “control”) set on the basis of which group template r value was higher. This process was repeated holding out each subject in the sample, with classification performance assessed as the average decision accuracy across crossvalidation loops.Classification accuracy was quantitatively compared across the four feature sets. Statistical significance of the classification accuracies was assessed via a permutation approach. This generated a null distribution of classification accuracies by scrambling the group labels prior to running the classification on the same feature sets (over 1000 iterations). P values were set as the proportion of permuted classification accuracies that were greater than that observed for each (unscrambled) classifier, at alpha = 0.05. This permutation testing approach ensured that random noise in the sample was not driving the between-groups classification (Mill et al., 2020). To facilitate comparison across the different feature sets, the same scrambled group labels (for each of the 1000 iterations) were used when generating null distributions for all reported ML models. Note that the pattern of classification accuracies was identical when using binomial tests against 50% chance accuracy. Where noted, we also corrected classification accuracy p values for multiple modeling variations via the Holm-Bonferroni familywise error correction procedure (Holm, 1979, Mill et al., 2016).We conducted a number of control analyses to illustrate that the significant group classification achieved by the “combined” feature model was robust to analytic variation. Firstly, whilst we chose 50 as the number of diagnostic restFC features on a somewhat principled basis (i.e. as the nearest “whole” number matching the sample size of 42), we also trained models after varying this hyperparameter to select the top 100 or top 150 connections. We also ran the classifications with an alternative k-folds crossvalidation approach, wherein 90% of subjects were assigned to training and 10% to testing sets over 10 folds (each test fold containing equal numbers of POUD and control subjects that were uniquely assigned across folds). We also ran the classifications using an alternative linear classification algorithm - support vector machine - using the libSVM toolbox implementation (Chang and Lin, 2011) with leave-one-out crossvalidation. Finally, we also demonstrated that classification performance was not driven by head motion: firstly by calculating the Spearman correlation between subjects’ group status (1 = POUD, 0 = control) and the mean framewise displacement (FD) across their resting-state session. For completeness, we also regressed out the mean FD from each input feature prior to running the “combined” classification model (following e.g. Rao et al., 2017, Todd et al., 2013).
Multivariate predictive modeling of task behavior
We used a multivariate approach to model individual differences in a clinically relevant behavioral measure. The selected measure was behavioral performance on the Persistence-After-Setbacks (PAS) task, which has been described in detail previously (Bhanji et al., 2016, Bhanji and Delgado, 2014; Bhanji et al., in revision). Briefly, participants played a “path decision game” in which they chose a path and tried to earn points by advancing a stick figure through obstacles to the end of the path. To start each round, participants chose between three paths with a point value at the end (80, 70, or 60 points). Participants then encountered obstacles while taking steps along the chosen path, and pressed a button to see if the obstacles resulted in a negative event received (62.5% of obstacles, stick figure is sent back to the beginning of the path) or negative event avoided (37.5% of obstacles, stick figure advances one step forward along the path). After receiving a negative event, participants made a decision to persist with their chosen path (i.e., the path where they just received a negative event) or choose a different path. Persistence behavior was calculated as the proportion of choices to persist with the same path after a negative event. This measure was used as the outcome for our predictive modeling analyses as it captures cognitive processes underlying persistent pursuit of a goal despite setbacks that are likely disrupted in clinical addiction disorders, and are likely prognostically relevant (e.g. capturing unwillingness to persist on a rehabilitation program after experiencing a relapse; Sinha, 2007). In the Results, we confirm the clinical relevance of this behavioral measure by contrasting it across the POUD and control groups via 2-sample ttest.Our multivariate modeling approach was applied to all subjects that completed the task (POUD n = 19, control n = 19). Multiple linear regression models were trained to predict persistence behavior from the same four feature sets used in the ML classification analyses: sMRI only (4 subcortical image intensity plus 6 cortical thickness features), sMRI subcortex (4 subcortical image intensity features), restFC only (top 50 diagnostic connection features), and combined (multi-modal: 10 sMRI features plus top 50 diagnostic restFC features). Leave-one-subject-out crossvalidation was again used to assess model performance. On each crossvalidation loop, the held-out test subject was identified and the remaining subjects were assigned to the training set. All input features were then zscored across subjects, using the training set mean and standard deviation to prevent circularity. The to-be-predicted persistence data were also zscored in non-circular fashion (using the training set mean and standard deviation) prior to model estimation.For models involving restFC features (“restFC only” and “combined”), we adapted the top 50 diagnostic feature selection approach used for the ML classifications to better reflect individual differences-style modeling. Diagnostic connections were identified as the top 50 absolute values for the Pearson correlation between connectivity and the persistence outcome measure, computed across training set subjects. Note we did not vary this “top n” hyperparameter to maintain consistency with the top n value that previously yielded positive ML classification results. For the restFC and combined models, this resulted in the number of features (50 and 60 respectively) exceeding the number of observations (37 training subjects). Hence, principal component analysis (PCA) was used to reduce the dimensionality of the input feature matrix and ensure it was full rank (Mill et al., 2020). PCA was conducted on training set features (trainX), with the number of retained components varied over two values: the max permitted by the data (i.e. the number of training set subjects minus 1 = 36), and 10. The latter was expected to perform better as including all available PCs likely causes the model to overfit to noise in the training set.The persistence data for the training set (trainY) were then regressed onto the selected number of PC scores (max or 10), with the resulting regression coefficients multiplied by the corresponding PCA loadings to obtain model beta estimates. These betas were multiplied with the held-out test subject’s features (testX) to generate a predicted behavioral score (testŶ). This process was repeated with all 38 subjects iteratively assigned to the test set, with behavioral prediction accuracy finally quantified as the Pearson correlation between the predicted and actual persistence scores. Statistical significance was assessed via p values accompanying the Pearson r (alpha = 0.05). Whilst the Pearson correlation values provided insight into how well the pattern of individual differences in behavior across subjects was predicted, for completeness we also reported the mean absolute error (MAE) capturing prediction accuracy in more absolute terms (Poldrack et al., 2019, Scheinost et al., 2019), as well as the R squared coefficient of determination (calculated using the sum of squares formulation, as recommended in Poldrack et al., 2019). Note that we corrected for the variation in the number of PCs hyperparameter via the Holm-Bonferroni familywise error correction procedure (Holm, 1979, Mill et al., 2016).As with the ML classification analyses, we demonstrated that prediction accuracy for the “combined” model was not driven by head motion: both by computing the Spearman correlation between behavioral persistence and mean FD, and also by regressing out mean FD from each input feature prior to running the “combined” predictive model. To further demonstrate robustness, we also repeated the combined predictive modeling analysis with an alternative k-folds crossvalidation scheme (90% of subjects assigned to training and 10% to testing sets, over 9 folds).
Results
Univariate sMRI analyses revealed between-group structural differences in subcortical but not cortical regions
The utility of univariate approaches in separating the POUD group from the control group on the basis of sMRI features was investigated via mixed factor ANOVA, with “group” entered as a between-subjects factor and “region” as a within-subjects factor. For subcortical drug cue-related regions from which MR image intensity had been extracted (see Method 2.3), the 2 (group: POUD, control) × 4 (region: amygdala, hippocampus, NA, dorsal striatum) ANOVA yielded a main effect of region: F(3,135) = 2017.83, p < .001, η2G = 0.91 (see Fig. 1a for descriptives). The main effect of group was a non-significant trend in the direction of lower regional intensity for the POUD subjects: F(1,45) = 3.50, p = .068, η2G = 0.06 (marginal means: POUD = 82.84, control = 83.69). Importantly, the region × group interaction was significant: F(3,135) = 3.97, p = .009, η2G = 0.02. Examining the means in Fig. 1a revealed that the interaction was driven by the POUD < control effect varying in magnitude (but not direction) across the regions. Planned 2-sample ttests supported this, revealing that the lower intensity for the POUD subjects was significant for the dorsal striatum (t(45) = 2.45, p = .018, Cohen’s d = 0.72) and NA (t(45) = 2.31, p = .025, d = 0.68), but not for the hippocampus (t(45) = 0.91, p = .370, d = 0.27) and amygdala (t(45) = 0.74, p = .461, d = 0.22).
Fig. 1
Univariate analyses revealed between-group differences for subcortical sMRI measures, but not cortical sMRI or restFC measures. A) Subcortical sMRI results: mean image intensity (MR units) across volumetric Freesurfer ROIs. Each bar represents the mean and standard error for each region, with individual subject data points overlaid, separately for the POUD (blue) and Control (orange) groups. Asterisks denote significant between-group differences via 2-sample ttest at p < .05 (uncorrected). d. striatum = dorsal striatum; Hippo. = hippocampus; N. Acc. = nucleus accumbens. B) Cortical sMRI results: gray matter thickness (mm units) across surface Freesurfer ROIs. Plotting conventions are the same as in panel A. mOFC = medial orbitofrontal cortex; PHC = parahippocampus; dlPFC = dorsolateral prefrontal cortex; mPFC = medial prefrontal cortex; ACC = anterior cingulate cortex. C) RestFC results: 264-by-264 region restFC matrices for the group average POUD (top left), group average Control (top middle) and group difference (POUD minus Control; top right). Matrices plot Pearson r connectivity values, with the Power et al (2011) functional atlas from which the 264 regions were localized depicted in the bottom panel. Note that the unthresholded difference matrix is provided as none of the connections significantly differed between the two groups after multiple comparison correction. Network affiliations for each region: AUD = auditory; CER = cerebellar; CON = cingulo-opercular; DMN = default mode; DAN = dorsal attention; FPN = fronto-parietal; SAL = salience; SMH = somatomotor hand; SMF = somatomotor face; SUB = subcortical; VAN = ventral attention; VIS = visual. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Univariate analyses revealed between-group differences for subcortical sMRI measures, but not cortical sMRI or restFC measures. A) Subcortical sMRI results: mean image intensity (MR units) across volumetric Freesurfer ROIs. Each bar represents the mean and standard error for each region, with individual subject data points overlaid, separately for the POUD (blue) and Control (orange) groups. Asterisks denote significant between-group differences via 2-sample ttest at p < .05 (uncorrected). d. striatum = dorsal striatum; Hippo. = hippocampus; N. Acc. = nucleus accumbens. B) Cortical sMRI results: gray matter thickness (mm units) across surface Freesurfer ROIs. Plotting conventions are the same as in panel A. mOFC = medial orbitofrontal cortex; PHC = parahippocampus; dlPFC = dorsolateral prefrontal cortex; mPFC = medial prefrontal cortex; ACC = anterior cingulate cortex. C) RestFC results: 264-by-264 region restFC matrices for the group average POUD (top left), group average Control (top middle) and group difference (POUD minus Control; top right). Matrices plot Pearson r connectivity values, with the Power et al (2011) functional atlas from which the 264 regions were localized depicted in the bottom panel. Note that the unthresholded difference matrix is provided as none of the connections significantly differed between the two groups after multiple comparison correction. Network affiliations for each region: AUD = auditory; CER = cerebellar; CON = cingulo-opercular; DMN = default mode; DAN = dorsal attention; FPN = fronto-parietal; SAL = salience; SMH = somatomotor hand; SMF = somatomotor face; SUB = subcortical; VAN = ventral attention; VIS = visual. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)We repeated the mixed ANOVA approach for cortical drug cue-related regions from which surface gray matter thickness had been extracted (see Method 2.3). The 2 (group: POUD, control) × 6 (region: ACC, dlPFC, insula, mOFC, mPFC, PHC) mixed ANOVA yielded a significant main effect of region: F(5,225) = 159.21, p < .001, η2G = 0.63 (see Fig. 1b for descriptives). The main effect of group was non-significant: F(1,45) = 0.03, p = .863, η2G = 0.00 (marginal means: POUD = 2.66, control = 2.65). The region × group interaction was a non-significant trend: F(5,225) = 2.16, p = .060, η2G = 0.02. Inspection of the means in Fig. 1b suggested that the group effect on cortical thickness varied in both magnitude and direction across regions. Planned 2-sample ttests revealed no significant differences between the POUD and control groups for any of the 6 regions (sign of all t statistics reflects POUD > control): PHC t(45) = 1.27, p = .212, d = 0.37; mPFC t = -1.23, p = .223, d = 0.36; mOFC t = -0.91, p = .368, d = 0.27; ACC t = 0.76, p = .450, d = 0.22; insula t = 0.35, p = .730, d = 0.10; dlPFC t = -0.05, p = .960, d = 0.02.To rule out the potential confounding influence of sex and head motion (quantified as the mean framewise displacement, FD, across each subject’s resting-state fMRI session), we repeated the above analyses after including both variables as covariates3. For the 2 × 4 subcortical ANOVA, the pattern was unchanged: significant main effect of ROI (F(3,129) = 256.01, p < .001, η2G = 0.57), non-significant main effect of group (F(1,43) = 2.19, p = .146, η2G = 0.04), with a significant interaction (F(3,129) = 4.37, p = .006, η2G = 0.02). Planned pairwise contrasts of the subcortical structural measures across the two groups were conducted via one-way ANCOVA (including the same covariates), separately for each region. This also yielded the same pattern of results: significant group differences in the same directions for the dorsal striatum (F(1,43) = 4.99, p = .031) and NA (F(1,43) = 4.37, p = .043), but not for the other regions (both p > .600). For the 2 × 6 cortical ANOVA, the pattern was unchanged: significant main effect of ROI (F(5,215) = 26.94, p < .001, η2G = 0.23), non-significant main effect of group (F(1,43) = 0.00, p = .968, η2G = 0.00) and non-significant interaction (F(5,215) = 1.93, p = .090, η2G = 0.02). All cortical regional pairwise ANCOVA contrasts were also non-significant (all p > .200). Hence, the pattern of sMRI results was unchanged by inclusion of sex and head motion as covariates, ruling out their potential confounding influence.To summarize, the univariate sMRI analyses were able to distinguish the POUD subjects from the healthy controls based on subcortical intensity but not cortical surface thickness. The subcortical intensity results suggested a trend towards generally reduced structure in subcortical regions associated with drug cue processing in the POUD subjects, with this trend reaching statistical significance in the dorsal striatum and NA (i.e. ventral striatum). The cortical surface thickness analyses failed to yield a significant result, both in the ANOVA and the planned 2-sample ttests. This latter result highlights a partial lack of sensitivity of the univariate approaches in discriminating between the POUD and control groups, which we expand upon in subsequent sections.
Univariate restFC analyses failed to distinguish between POUD and healthy subjects
The restFC analyses focused on 264 regions localized from the whole-brain Power functional atlas (Power et al., 2011). To further probe the efficacy of univariate approaches, we contrasted restFC between the POUD and control groups via 2-sample ttests at each individual connection (after Fisher-z transforming the Pearson r values, with FDR correction for multiple comparisons; see Method 2.5). Fig. 1c depicts the average restFC matrix for each group, and the unthresholded difference matrix for POUD minus control. Whilst there are subtle differences visible in the unthresholded difference matrix (e.g. numerically greater within-network connectivity of the default mode network, DMN, for POUD), no connection survived multiple comparison correction. This lack of significance held after controlling for potential sex and head motion confounds via one-way ANCOVAs conducted at each connection, with sex and mean FD entered as covariates, and resulting p values FDR-corrected for multiple comparisons. Coupled with the mixed significance observed with the univariate sMRI analyses, this result further evidences a lack of sensitivity of standard univariate approaches in separating POUD from healthy control subjects.
Multivariate ML classification: sMRI features reliably discriminated POUD from healthy subjects
We examined whether multivariate machine learning approaches could more successfully discriminate between the two groups on the basis of the multivariate pattern amongst input features. All reported classifiers used the same minimum-distance classification algorithm (Diedrichsen and Kriegeskorte, 2017, Mill et al., 2020, Mur et al., 2009, Spronk et al., 2020), with performance quantified via the same crossvalidation approach, and significance of the observed classification accuracies assessed via permutation testing (see Method 2.6 for details).The first “sMRI only” classifier input the same structural features as those used in the univariate analyses (MR image intensity from 4 subcortical regions and gray matter thickness from 6 cortical regions; see Method 2.6). This classifier was able to reliably discriminate between POUD and control group subjects (see Fig. 2): 64.3% mean classification accuracy, 66.7% sensitivity in classifying POUDs, 61.9% specificity in classifying controls; p = .013 via permutation test, null classification mean = 50.2%. Note that this classifier included the 6 cortical sMRI features that failed to yield significant between-group differences in the univariate analyses (via mixed ANOVA and 2-sample ttests, see section 3.1). To probe the influence these non-significant univariate features might have had on multivariate classification performance, we trained an additional “sMRI subcortex” classifier that used only the 4 subcortical intensity features. This yielded numerically worse classification accuracy than the “sMRI only” classifier, and failed to reach statistical significance (see Fig. 2): 57.1% mean accuracy, 61.9% POUD sensitivity, 52.4% control specificity; p = .105, null mean = 49.8%.
Fig. 2
Multivariate classifiers trained on sMRI only, restFC only and combined (multi-modal: sMRI and restFC) feature sets reliably discriminated between POUD and control subjects. Plotted are the observed mean classification accuracies across models (orange circles), and the mean of the permuted null classifications (blue circles, bars represent standard deviation over n = 1000 permutations). Asterisks denote significance of the observed classification accuracies versus the permuted null. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Multivariate classifiers trained on sMRI only, restFC only and combined (multi-modal: sMRI and restFC) feature sets reliably discriminated between POUD and control subjects. Plotted are the observed mean classification accuracies across models (orange circles), and the mean of the permuted null classifications (blue circles, bars represent standard deviation over n = 1000 permutations). Asterisks denote significance of the observed classification accuracies versus the permuted null. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)The results demonstrate statistically reliable discrimination of POUD subjects via a multivariate ML classifier trained using sMRI features. Further highlighting the sensitivity of multivariate analyses, we observed a numerical improvement in classification accuracy with inclusion of cortical thickness features that did not distinguish between the groups via univariate approaches.
Multivariate ML classification: restFC features reliably discriminated POUD from healthy subjects
We applied the same multivariate classification approach to distinguish between the POUD and control group subjects, this time on the basis of restFC features. Given the high dimensionality of restFC (34716 connections from the upper diagonal of the 264-by-264 matrix), we performed a feature selection step at the start of each crossvalidation loop that confined classifier training to the top 50 most diagnostic connections (see Method 2.6. for details). The resulting “restFC only” classifier also achieved significantly above-chance classification accuracy (see Fig. 2): 64.3% mean accuracy, 76.2% POUD sensitivity, 52.4% control specificity; p = .015, null mean = 50.3%. This finding further highlights the utility of multivariate over univariate approaches in capturing neural differences characteristic of opioid addiction. ML classifiers trained on restFC features yielded significantly above-chance classification accuracy whereas univariate approaches applied to the same features were unable to do so.
Multivariate ML classification: Combining sMRI and restFC features in a multi-modal classifier numerically improved classification accuracy
We next determined whether combining the same sMRI and restFC features, which yielded significant classification accuracies in their standalone ML models, in a single multi-modal classifier would numerically improve performance. This “combined” ML classifier included the 10 sMRI features and the top 50 diagnostic restFC features (see Method 2.6 for details). As anticipated, classification accuracy was numerically improved for this “combined” classifier compared to all previously presented uni-modal classifiers (see Fig. 2): 66.7% mean accuracy, 66.7% POUD sensitivity, 66.7% control specificity; p < .0014, null mean = 50.2%. The combined ML classifier yielded a 2.4%, 9.6% and 2.4% improvement in classification accuracy compared to the sMRI only, sMRI subcortex and restFC only classifiers respectively. We demonstrate in the following section that these numerical improvements are reliable across variation in subsampling of the larger POUD group. Additional control analyses are also presented that highlight the robustness of the “combined” classification results, thereby evidencing the utility of multi-modal over uni-modal approaches.
Control analyses highlighted robustness of the combined ML classifier
To demonstrate robustness of the combined ML classification results, we firstly varied the “top n” hyperparameter governing the number of diagnostic restFC connections included as features. This was set to top 50 for the main results (see Method 2.6). Multivariate ML classifiers that were trained using the top 100 and top 150 diagnostic restFC features yielded identical, significantly above-chance classification accuracies; for both top 100 and top 150: 66.7% mean accuracy, 66.7% POUD sensitivity, 66.7% control specificity; p < .001, null mean = 50.2%. Secondly, we performed the combined classification using an alternative k-fold crossvalidation approach (see Method 2.6). This also yielded significantly above-chance classification accuracy: 65.0% mean accuracy, 65.0% POUD sensitivity, 65.0% control specificity; p = .017, null mean = 50.2%. Finally, we also ran the combined classification using an alternative linear classifier (support vector machine, see Method 2.6 for details), which again yielded significantly above-chance accuracy: 73.8% mean accuracy, 76.2% POUD sensitivity, 71.4% control specificity; p = .003, null mean = 46.7%. Note that statistical significance was maintained after correcting for experimenter degrees of freedom across these 5 variations in combined ML model estimation parameters (top 50 with leave-one-out crossvalidation, top 100 with leave-one-out, top 150 with leave-one-out, top 50 with k-folds and top 50 with SVM leave-one-out), via the Holm-Bonferroni familywise error procedure (at corrected alpha = 0.05).For the main “combined” classifier, we randomly selected 21 subjects from the POUD group (n = 26) to match the 21 healthy control subjects recruited, thereby ensuring that group sizes were balanced during classifier training (see Method 2.6 for details). To ensure that the pattern of results was not affected by this subsampling, we reran the entire ML analysis four times, each with unique randomization of the POUD subjects. As can be observed in the summary Table 2, the “combined” model yielded significantly above-chance classification accuracy across these POUD subsampling variations (5 including the main classifier presented in section 3.5). The results of the random POUD subsampling also highlight the robustness of the numerical improvement in classification accuracy for the “combined” versus uni-modal classifiers. Across all 5 variations, the “combined” model yielded higher classification accuracies 3/5 times versus the “sMRI only” classifier (with the two remaining instances yielding equal classification accuracies), 5/5 times versus the “sMRI subcortex” classifier, and 5/5 times versus the “restFC only” classifier.
Table 2
High performance of the combined ML classifier was preserved across variation in POUD group subsampling. Presented are the mean classification accuracies for each of the 4 classifier models (columns) across the 5 subsampling variations (rows). P values assessing the statistical significance of the observed accuracies against permuted null distributions are presented in parenthesis.
sMRI only
sMRI subcortex
restFC only
combined
R1 (main)
64.3% (p = .013)
57.1% (p = .105)
64.3% (p = .015)
66.7% (p < .001)
R2
61.9% (p = .025)
57.1% (p = .113)
54.8% (p = .162)
61.9% (p = .025)
R3
64.3% (p = .020)
57.1% (p = .117)
61.9% (p = .016)
66.7% (p = .011)
R4
61.9% (p = .029)
59.5% (p = .058)
64.3% (p = .005)
66.7% (p = .006)
R5
64.3% (p = .011)
50.0% (p = .364)
42.9% (p = .825)
64.3% (p = .011)
High performance of the combined ML classifier was preserved across variation in POUD group subsampling. Presented are the mean classification accuracies for each of the 4 classifier models (columns) across the 5 subsampling variations (rows). P values assessing the statistical significance of the observed accuracies against permuted null distributions are presented in parenthesis.
Multivariate model using combined features yielded numerically strongest prediction of individual differences in persistence behavior
We next implemented a multivariate approach to predictive modeling of clinically relevant behavior. Subjects in our sample completed an out-of-scanner behavioral task designed to measure individual differences in the ability to persist in the face of setbacks (Bhanji et al., 2016, Bhanji and Delgado, 2014; see Method 2.7 for task details). We firstly confirmed the clinical relevance of this measure by contrasting it across the POUD and control groups via 2-sample ttest. Persistence was significantly lower for POUD versus control subjects, reflecting the former group’s reduced willingness to persist in the face of setbacks: POUD group mean persistence = 41.3%, control group mean = 59.2%; t(36) = 2.58, p = .014, Cohen’s d = 0.85. This supports the presence of dysfunctional persistence behavior in the POUD subjects, which we attempted to capture via our multivariate predictive modeling approach.Behavioral prediction accuracy for models trained on the same 4 feature sets used in the classification analyses are presented in Fig. 3 (see Method 2.7 for modeling details). Depicted are scatterplots for each model along with the Pearson correlation (r) and the mean absolute error (MAE) between predicted and actual persistence behavior across subjects, which quantified model performance. Inspection of Fig. 3 conveys that individual differences in clinically relevant persistence behavior were reliably modeled using MRI features. Comparison of the numerical differences in prediction accuracy across feature sets revealed a similar pattern to the ML classification results: prediction accuracy was clearly non-significant for the “sMRI subcortex” model, marginally non-significant for the “sMRI only” model and clearly significant for the “restFC only” model5. However, the multi-modal “combined” model once again numerically outperformed all uni-modal prediction accuracies (r change = 0.34, 0.12 and 0.03 respectively).
Fig. 3
Combined multivariate model yielded the highest accuracy in predicting persistence behavior. Depicted are the model-predicted versus actual persistence behavior across subjects, for the following models: A) sMRI only, B) sMRI subcortex, C) restFC only, and D) combined (multi-modal: sMRI and restFC features). Across panels, predicted and actual behavioral scores were zscored in non-circular fashion as part of model estimation (see Method 2.7 for details). Pearson correlation (r) values capturing the predicted-to-actual overlap (model accuracy) with accompanying p values are provided, along with the mean absolute error (MAE) and linear lines of best fit. Asterisks denote significant prediction accuracy at p < .05. Note that the accompanying R squared (coefficient of determination) values for panels A-D are -0.06, -0.09, 0.14 and 0.16 respectively.
Combined multivariate model yielded the highest accuracy in predicting persistence behavior. Depicted are the model-predicted versus actual persistence behavior across subjects, for the following models: A) sMRI only, B) sMRI subcortex, C) restFC only, and D) combined (multi-modal: sMRI and restFC features). Across panels, predicted and actual behavioral scores were zscored in non-circular fashion as part of model estimation (see Method 2.7 for details). Pearson correlation (r) values capturing the predicted-to-actual overlap (model accuracy) with accompanying p values are provided, along with the mean absolute error (MAE) and linear lines of best fit. Asterisks denote significant prediction accuracy at p < .05. Note that the accompanying R squared (coefficient of determination) values for panels A-D are -0.06, -0.09, 0.14 and 0.16 respectively.As expected, prediction accuracy was numerically lower for the “restFC only” and “combined” models if the maximum number of PCs were used in model training (likely due to overfitting; see Method 2.7 for details), yet both remained significant: restFC maxPC r = 0.37, p = .022, MAE = 0.85, R squared = 0.06; combined maxPC r = 0.33, p = .041, MAE = 0.80, R squared = 0.08. Indeed, highlighting the robustness of the model accuracies, all 4 relevant p values (restFC only/combined × maxPC/PC10) survived Holm-Bonferroni familywise error correction for experimenter degrees of freedom in varying the number of retained PCs (corrected alpha = 0.05). To further demonstrate robustness, we also reran the combined predictive model using an alternative k-folds crossvalidation scheme (see Method 2.7 for details), and obtained a similar pattern of results: k-folds maxPC r = 0.39, p = .019 (remains significant after Holm-Bonferroni correction), MAE = 0.75, R squared = 0.13; k-folds PC10 r = 0.31, p = .070, MAE = 0.79, R squared = 0.08.Overall, our results highlight the efficacy of multivariate predictive modeling approaches in capturing individual differences in persistence behavior, which were demonstrated to be clinically relevant to POUD. Paralleling our ML classification results, multi-modal feature sets once again yielded the numerically highest prediction accuracy compared to uni-modal feature sets.
Head motion control analyses
Given widely documented concerns over head motion confounding MRI-based multivariate classification and behavioral modeling results (Rao et al., 2017, Siegel et al., 2017, Todd et al., 2013), we conducted a series of analyses to rigorously probe such influences. Note that partial mitigation of head motion artifacts was already achieved for the main results via removal (i.e. “scrubbing”, Power et al., 2014) of high motion timepoints and nuisance regression of motion estimates during preprocessing (see Method 2.4 for details). To probe any residual influence of head motion on the multivariate ML classification results, we computed the Spearman correlation between actual group status (1 = POUD, 0 = control) and the mean framewise displacement (FD) across each subject’s resting-state session. This yielded a non-significant relationship: rho = 0.28, p = .072. To conclusively remove the influence of head motion, we also regressed out the mean FD across subjects from each input feature prior to rerunning the “combined” classification model (following e.g. Rao et al., 2017, Todd et al., 2013). The resulting classification accuracy was numerically lower than the main “combined” model, but remained statistically significant: 61.9% mean classification accuracy, 61.9% POUD sensitivity, 61.9% control specificity; p = .029, null mean = 49.8%.A similar approach was taken to probe the influence of head motion on the predictive modeling results. The Spearman correlation between behavioral persistence and mean FD across subjects was clearly non-significant: rho = -0.15, p = .375. Regressing out mean FD from each feature prior to running the “combined” predictive model again yielded statistically significant (albeit slightly numerically reduced) model performance: r = 0.38, p = .020, MAE = 0.76, R squared = 0.13.Hence, whilst regressing out head motion from multi-modal features reduced prediction accuracy for both the multivariate ML classification and predictive behavioral models, this operation did not affect the recovery of significantly above-chance model performance. This rules out the possibility that head motion was predominantly driving the efficacy of the multivariate models (Laumann et al., 2016, Power et al., 2014). Note also that the observed reduction in model performance might also result from discarding cognitively/clinically relevant trait information that is likely associated with head motion (e.g. task engagement; Zeng et al., 2014).
Discussion
Our findings demonstrate the sensitivity of structural and functional MRI measures to long-term neural effects of prescription opioid addiction. Despite the severe individual and societal burden of POUD, functional neuroimaging investigations in this clinical group have remained scarce. To rectify this, we employed univariate and multivariate (machine learning) approaches to discriminate POUD characteristics on the basis of structural (sMRI morphometric) and functional (restFC) information. Importantly, we only included long-term POUD subjects that had been detoxified and that were not concurrently experiencing chronic pain or serious physical or psychiatric comorbidity, thereby avoiding a number of potential confounding influences on our brain imaging measures. Of the univariate analyses conducted, we observed a statistically reliable reduction in subcortical integrity in POUD compared to healthy controls. This effect was maximal in the NA (i.e. ventral striatum) and dorsal striatum (see Fig. 1a). The striatum has long been implicated in the cellular action profile following opioid consumption (Cunha-Oliveira et al., 2008, Volkow et al., 2003), with dopamine release in this region serving as the endpoint of intracellular events triggered by the endogenous opioid system (Compton et al., 2016, Volkow et al., 1999). Critically, findings from animal models suggest that repeated modulation of this system over long-term opioid use can lead to structural changes, such as decreased dendritic spine density (Liao et al., 2005), lower dopamine D2 receptor availability (Zijlstra et al., 2008), and apoptosis of brain neurons and microglia (Hu et al., 2002). Such cellular changes plausibly underpin the striatal structural loss observed here, with previous human neuroimaging studies reporting similar sMRI changes (McConnell et al., 2020) even after just one month of opioid use (Lin et al., 2016, Younger et al., 2011). It remains unclear whether this structural loss results from direct neurotoxic impacts of chronic drug ingestion (Cunha-Oliveira et al., 2008) and/or cognitive processes relating to negative reinforcement that perpetuate cycles of addiction (Kibaly et al., 2019, Stewart et al., 2019). Nevertheless, our results converge with previous reports in suggesting reduced subcortical structural integrity, most prominent in the dorsal and ventral striatum (NA), as a large-scale neural substrate of long-term opioid addiction.Unlike the subcortical sMRI analyses, univariate analyses of both cortical sMRI and whole-brain fMRI restFC failed to reliably discriminate between the POUD and control groups at standard statistical thresholds. For the cortical sMRI analyses, the observed non-significant main effect of group status is anticipated by previous reports of mixed patterns of increase and decreases in gray matter volume across cortical regions in opioid users (Lin et al., 2016, Younger et al., 2011). However, region-specific contrasts also failed to recover a significant group difference even at uncorrected thresholds, revealing a general lack of sensitivity in univariate detection of cortical structural changes. Our results partially align with a previous report employing univariate approaches that also found significant loss of structure in subcortical (amygdala) but not cortical regions for POUD subjects (Upadhyay et al., 2010). Our failure to observe significant univariate differences in restFC is also in keeping with the pattern in previous opioid addiction studies; whilst connectivity changes involving similar drug cue-related regions as in the present report have been documented, the sign of these changes has alternated inconsistently (Ieong and Yuan, 2017). This is consistent with the recent findings of Spronk and colleagues (Spronk et al., 2020), who reported subtle univariate alterations in restFC across multiple mental disorders. It is possible that with access to comparably large samples as used by Spronk et al (clinical groups ranged from 59 to 90 subjects) the resultant increase in statistical power could have yielded significant connectivity differences in the present report. We elaborate on limitations associated with our small sample in a later paragraph.Nevertheless, even in this limited sample, our multivariate ML classification analyses were overall more successful, recovering significantly above-chance group classifications for both uni-modal sMRI and restFC feature sets. In addition to showing significant classification accuracy from restFC features that did not yield significant univariate differences, we also demonstrated that the sMRI classifications benefited from inclusion of cortical features that also failed to recover univariate differences (via comparison of “sMRI only” and “sMRI subcortex” accuracies in Fig. 2). This suggests that long-term effects of opioid use are reflected in the multivariate pattern amongst subcortical and cortical morphometric measures, as well as restFC. The fact that non-significant univariate information can still have discriminative utility in a multivariate classification approach highlights the greater sensitivity of the latter class of methods (Hebart and Baker, 2018, Jimura and Poldrack, 2012). When considered with the greater viability of translating ML classification approaches towards making individualized predictions in the clinic (compared to univariate methods), we hope that our preliminary results can spur future applications of such approaches in POUD.In guiding these future extensions, we demonstrated the particular efficacy of combining feature sets across sMRI and restFC modalities. The resulting “combined” classifier numerically outperformed all other uni-modal classifiers, and this held across a number of control analyses (including randomised sampling of the larger POUD group, and confound regression of motion estimates). Despite the dearth of ML classification analyses applied to opioid disorders, general support for the superiority of multi-modal approaches comes from Alzheimer’s disease research (Adamczuk et al., 2015, Liu et al., 2018, Rathore et al., 2017, Zhang et al., 2011a). Given that this subdomain of clinical neuroscience has witnessed the greatest volume of ML applications, its convergence towards multi-modal feature sets should be harnessed to expedite developments in the more nascent field of opioid use, and addiction generally. The superiority of multi-modal compared to uni-modal feature sets might arise from the different but complementary “perspectives” of POUD-related neural dysfunction captured by distinct modalities (Liu et al., 2018, Zhang et al., 2011b). Another compatible explanation is that incorporating multi-modal features from separate scan sequences likely mitigates the influence of artifacts that are specific to a given modality. This increases the likelihood of classification algorithms identifying multivariate patterns that reflect cognitively and clinically relevant information.The benefit of multi-modal feature sets was also apparent in the final set of analyses targeting multivariate prediction of a continuous behavioral variable (persistence in goal pursuit; Bhanji et al., 2016). The highest prediction accuracy was again obtained for a “combined” (sMRI and restFC) model, which survived rigorous correction for possible head motion confounds. More generally, these analyses extended the multivariate ML classifications of group status towards modeling of clinically relevant behavior. This link to behavior has been previously highlighted as a critical step in developing clinically useful biomarkers from functional neuroimaging measures (Mill et al., 2020, Rosenberg et al., 2016, Woo et al., 2017). It is also worth highlighting the distinction between our present “predictive” approach (testing on held-out subjects) versus “associative” approaches (e.g. correlating imaging and behavioral measures in the full sample) that have been predominantly applied in the study of opioid addiction. Whereas the latter approach risks overfitting by testing a model on the same data used to train it, the former predictive approach avoids this by testing on held-out data, thereby deriving more generalizable brain-behavior models (Shen et al., 2017).This ability to generate predictions for held-out subjects again suggests potential applications in the clinic, and it is worth noting that reliable classifications of group status and modeling of persistence behavior was achieved with just 20 min of total scan time (sMRI and restFC). The use of a resting-state fMRI session is particularly appealing as it leads to greater subject compliance than task fMRI, with the latter imposing lengthier instructions and/or task training in a sample that already poses challenges for recruitment and within-scanner compliance. Beyond these practical benefits, our success in using restFC information to improve clinically relevant group classification and behavioral modeling (compared to the uni-modal sMRI models) theoretically attests to the cognitive and clinical relevance of restFC (Mill et al., 2020, Mill et al., 2017).However, we must emphasize the preliminary nature of the present findings and acknowledge a number of limitations. Firstly, although our sample size of POUD subjects was larger than the majority of previous POUD reports, it remained objectively small. Issues with predictive modeling in small samples have been raised recently for both ML classification (Poldrack et al., 2019) and behavioral modeling (Marek et al., 2020) approaches. Future research should aim to rectify the dearth of large samples in POUD (and addiction research more generally), albeit noting the inherent challenges in recruiting this specific clinical group. To this end, consortium attempts to recruit large samples of POUD through coordination across multiple scanner sites would be useful. A larger sample would also enable more rigorous testing of the significance of observed accuracy differences between feature set models, via permutation or subsampling approaches. This formal testing was not undertaken in the present report and hence model accuracy differences were consistently described as “numerical” rather than “statistically significant”. A second limitation was the low duration of resting-state fMRI data collected per subject (6 min). Recent studies have highlighted improved detection of individualized (trait) aspects of restFC with greater amounts of resting-state data per subject (greater than30 min; Gordon et al., 2017, Gratton et al., 2018). This relates to a third limitation of the present study: all model performance effect sizes, despite being statistically reliable, numerically fell far below prevalent guidelines for clinical utility (i.e. 80% classification accuracy and 80% predicted-to-actual behavioral overlap). Future extensions of our ML approaches should strive for improved model performance, which should be aided by collecting more rest data per subject. Whilst we purposely used simple linear classification and predictive modeling approaches for this first foray into POUD, it is possible that more intensive nonlinear modeling approaches would improve classification accuracy, albeit with a potential loss of interpretability (Woo et al., 2017). Finally, improved modeling on the basis of restFC features might also be achieved by exploring more advanced functional connectivity estimation methods, with greater causal validity than Pearson correlation (Mill et al., 2016, Mill et al., 2017, Reid et al., 2019).Considering these limitations, we interpret the present findings as constituting a theoretical proof-of-concept that lays the foundation for future MRI investigations of POUD. The present report supplements previous insights into cognitive changes associated with long-term opioid addiction (Baldacchino et al., 2012) with insight into accompanying neural changes. An integrated theoretical account of both cognitive and neural underpinnings of opioid addiction will be critical in improving current treatment limitations, as indexed by the high relapse rate. In the long-term extension of our modeling approaches towards a useful clinical protocol, not only do the afore-mentioned targets for model accuracy (80%) need to be met, but the modeling should also target more clinically useful predictions. This will critically depend on the availability of longitudinal data for POUD subjects. Such follow-up is planned in the present sample, and would enable investigation of whether MRI features at the presently assessed timepoint 1 (upon admission to a rehabilitation program) are able to predict clinical outcomes (e.g. relapse rate) at a later timepoint 2. If successful, prospective modeling on the basis of subjects’ neuroimaging profiles (“biotypes”, Drysdale et al., 2017) would have clear clinical utility as a basis for tailoring POUD treatment and monitoring programs to individual patients. Future neuroimaging studies in POUD will therefore be critical in extending our preliminary findings from the lab to the clinic.Author contributions: SR collected the data, which was formatted/curated by ECW. RDM, SR and MWC conceptualized the analysis approach. RDM conducted the data analyses, with input from ECW and SR. RDM and SR wrote the manuscript, with feedback provided by ECW and MWC.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Authors: Timothy P Morris; Agnieszka Burzynska; Michelle Voss; Jason Fanning; Elizabeth A Salerno; Ruchika Prakash; Neha P Gothe; Susan Whitfield-Gabrieli; Charles H Hillman; Edward McAuley; Arthur F Kramer Journal: Med Sci Sports Exerc Date: 2022-04-25