| Literature DB >> 35908308 |
Jacqueline K Harris1, Stefanie Hassel2, Andrew D Davis3, Mojdeh Zamyadi4, Stephen R Arnott3, Roumen Milev5, Raymond W Lam6, Benicio N Frey7, Geoffrey B Hall8, Daniel J Müller9, Susan Rotzinger10, Sidney H Kennedy10, Stephen C Strother4, Glenda M MacQueen2, Russell Greiner11.
Abstract
Many previous intervention studies have used functional magnetic resonance imaging (fMRI) data to predict the antidepressant response of patients with major depressive disorder (MDD); however, practical constraints have limited many of those attempts to small, single-centre studies, which may not adequately reflect how these models will generalize in clinical practice. Collecting data at multiple sites not only increases sample sizes (a critical point in machine learning development), it also yields a more heterogeneous dataset, owing to systematic differences between scanners at different sites and geographical differences in patient populations. As part of the Canadian Biomarker Integration Network in Depression (CAN-BIND-1) study, 144 MDD patients from six sites underwent resting-state fMRI prior to starting escitalopram treatment, and again two weeks after treatment began. Here, we consider ways to use machine learning techniques to produce models that predict response (measured eight weeks after initiation) based on various parcellations, functional connectivity (FC) metrics, dimensionality reduction algorithms, and base learners, and also whether to use scans from one or both time points. Models that used only baseline (pre-treatment) or only week 2 (early-response) whole-brain FC features consistently failed to perform significantly better than default models. Using the change in FC between these two time points, however, yielded significant results, with the best performing analytic pipeline achieving 69.6% (SD 10.8) accuracy. These results appear contrary to findings from many smaller, single-site studies, which report substantially higher predictive accuracies from models trained on only baseline resting-state FC features, suggesting those models may not generalize well beyond the data used for development.
Further, these results indicate the potential value of collecting data both before and shortly after treatment initiation.
Keywords: Depression; Functional connectivity; Machine learning; Resting State; Treatment response; fMRI
Year: 2022 PMID: 35908308 PMCID: PMC9421454 DOI: 10.1016/j.nicl.2022.103120
Source DB: PubMed Journal: Neuroimage Clin ISSN: 2213-1582 Impact factor: 4.891
Fig. 1(Top) Diagram of data flow through analytic pipelines. Pre-processed resting state fMRI data from either baseline or week 2 is fed through a series of operations that first generate functional connectivity (FC) features and then generate a predictive model based on these features. Alternative approaches are used for parcellation, connectivity estimation, dimensionality reduction, and base learner, every combination of which is tested for each set of input data, resulting in a total of 240 models for each of the baseline and week 2 datasets (5 parcellations × 3 connectivity metrics × 4 dimensionality reduction techniques × 4 classifiers) – leading to a total of 480 models. Along the highlighted pathway, for example, pre-processed data collected at week 2 is first parcellated using the Power coordinates, resulting in a 259 × 295 matrix for each participant’s data, where 259 is the number of regions of interest (ROIs) included in the Power coordinates after SNR masking, and 295 is the length of the temporal dimension of the original fMRI dataset. The correlation between time-courses is then calculated for every pair of ROIs, resulting in a single value per ROI pair. The full set of correlations corresponds to the lower triangle of the full correlation matrix, which is then vectorized into a feature vector of length 33,411 for each participant. This process of feature generation is repeated for each participant, resulting in a 144 × 33,411 matrix of FC features. Since feature generation is independent of response label, this procedure is completed prior to model generation. Features fed into model generation are scaled and passed to ANOVA feature selection, which selects the k most relevant features (with k chosen by internal cross-validation) for use in the linear SVM classifier. (Bottom) Delta models are processed through the same analytic pathway, with the addition of a subtraction step at the end of feature generation.
Here, both baseline and week 2 FC features are generated, and the difference of the two matrices (the delta FC feature matrix) is used in subsequent predictive modelling. An additional 240 models are generated using these delta features, considering all possible combinations of processing steps.
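The feature-generation steps described above (parcellated time series → pairwise correlations → vectorized lower triangle → delta features) can be sketched in NumPy. This is a minimal illustration, not the study's code: the array shapes follow the highlighted pathway (259 Power ROIs × 295 time points), the data here is random, and the function name is hypothetical.

```python
import numpy as np

def fc_features(timeseries):
    """Vectorize the lower triangle of the ROI-by-ROI correlation matrix.

    timeseries: (n_rois, n_timepoints) parcellated BOLD data for one subject.
    Returns a 1-D vector of length n_rois * (n_rois - 1) / 2.
    """
    corr = np.corrcoef(timeseries)                 # (n_rois, n_rois) correlations
    rows, cols = np.tril_indices_from(corr, k=-1)  # strictly lower triangle
    return corr[rows, cols]

# Hypothetical data matching the highlighted pathway: 259 ROIs x 295 time points
rng = np.random.default_rng(0)
baseline = rng.standard_normal((259, 295))
week2 = rng.standard_normal((259, 295))

fc_base = fc_features(baseline)   # length 259 * 258 / 2 = 33,411
fc_week2 = fc_features(week2)
fc_delta = fc_week2 - fc_base     # the subtraction step for the delta models
```

Stacking one such vector per participant yields the 144 × 33,411 FC feature matrix; the delta matrix is the element-wise difference of the week 2 and baseline matrices.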
Fig. 2Ranked models with highest mean cross-validation accuracy for data from baseline (left), week 2 (center), and delta (right) timepoints. Bar charts depict the mean test accuracy for each pipeline, with error bars representing standard deviation across folds. Pipelines that performed significantly better than default accuracy (p-value ≤ 0.05) are indicated with an asterisk, which occurs only in the delta models and a single week 2 model. Tables below further detail the pipeline settings of the top-performing models along with mean cross-validation accuracy and standard deviation.
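The model-generation stage evaluated here (scaling, ANOVA feature selection with k chosen by internal cross-validation, linear SVM) could be sketched with scikit-learn as below. This is a hedged stand-in, not the authors' implementation: the feature matrix is random, uses far fewer than 33,411 features to keep the sketch fast, and the candidate k values are illustrative.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import LinearSVC
from sklearn.model_selection import GridSearchCV, cross_val_score

# Hypothetical stand-in for the 144-participant FC feature matrix
rng = np.random.default_rng(0)
X = rng.standard_normal((144, 500))  # 500 features instead of 33,411 for speed
y = rng.integers(0, 2, size=144)     # binary responder / non-responder labels

pipe = Pipeline([
    ("scale", StandardScaler()),                     # feature scaling
    ("select", SelectKBest(score_func=f_classif)),   # ANOVA feature selection
    ("svm", LinearSVC()),                            # linear SVM base learner
])

# k is tuned by internal cross-validation, then the whole tuned pipeline is
# scored by an outer cross-validation loop (mean test accuracy, as in Fig. 2)
search = GridSearchCV(pipe, {"select__k": [10, 50, 100]}, cv=3)
scores = cross_val_score(search, X, y, cv=5)
```

With random labels, the mean of `scores` should hover near the default (chance) accuracy, which is exactly the baseline the reported pipelines are tested against.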
Fig. 3Impact of analytic pipeline choices on prediction accuracy. For each evaluated step of the analytic pipeline, the figure shows the mean difference between pipelines using the indicated alternative and the overall mean predictive accuracy. Error bars are scaled to one-sixth standard deviation. For each step in the pipeline, alternative operations were compared using a Wilcoxon matched-pairs signed-rank test with Bonferroni correction. Steps involved in feature generation had the greatest impact on model performance, specifically the choice of connectivity metric, where correlation showed the highest mean performance. Neither step in model generation (dimensionality reduction or base learner) had a substantial impact on performance.
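The statistical comparison described above can be sketched with SciPy. The paired accuracies below are synthetic and the number of comparisons is illustrative; only the test itself (Wilcoxon matched-pairs signed-rank with Bonferroni correction) follows the caption.

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical paired accuracies for the same pipelines under two alternative
# operations at one step (e.g. two different connectivity metrics)
rng = np.random.default_rng(0)
acc_option_a = 0.60 + 0.05 * rng.standard_normal(48)
acc_option_b = acc_option_a - 0.04 + 0.03 * rng.standard_normal(48)

stat, p = wilcoxon(acc_option_a, acc_option_b)  # matched-pairs signed-rank test

n_comparisons = 6                               # illustrative comparison count
p_bonferroni = min(1.0, p * n_comparisons)      # Bonferroni-corrected p-value
```

Pairing is what makes the test appropriate here: each pipeline differs from its counterpart in exactly one step, so differences in accuracy are attributable to that step.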
Fig. 4Mean resubstitution (blue) and testing (orange) accuracy across timepoints and base learners. Mean accuracy results for all models, separated by time point and base learner, are plotted; each box extends from the first quartile to the third quartile with a median line, and whiskers extend to show the full data range, with mean default-model accuracy indicated by the red horizontal line through each plot.