Sage Hahn1, Scott Mackey1, Janna Cousijn2, John J Foxe3, Andreas Heinz4, Robert Hester5, Kent Hutchison6, Falk Kiefer7, Ozlem Korucuoglu8, Tristram Lett4, Chiang-Shan R Li9, Edythe London10, Valentina Lorenzetti11,12,13, Maartje Luijten14, Reza Momenan15, Catherine Orr1, Martin Paulus16,17, Lianne Schmaal18,19, Rajita Sinha9, Zsuzsika Sjoerds20,21, Dan J Stein22, Elliot Stein23, Ruth J van Holst24, Dick Veltman25, Henrik Walter4, Reinout W Wiers2, Murat Yucel10,26, Paul M Thompson27, Patricia Conrod28, Nicholas Allgaier1, Hugh Garavan1.
Abstract
To identify neuroimaging biomarkers of alcohol dependence (AD) from structural magnetic resonance imaging, it may be useful to develop classification models that are explicitly generalizable to unseen sites and populations. This problem was explored in a mega-analysis of previously published datasets from 2,034 AD and comparison participants spanning 27 sites curated by the ENIGMA Addiction Working Group. Data were grouped into a training set used for internal validation, including 1,652 participants (692 AD, 24 sites), and a test set used for external validation, with 382 participants (146 AD, 3 sites). An exploratory data analysis was conducted first, followed by an evolutionary-search-based feature selection to identify site-generalizable, high-performing subsets of brain measurements. The exploratory data analysis revealed that including case-only and control-only sites led to the inadvertent learning of site effects; cross-validation methods that do not properly account for site can drastically overestimate results. Evolutionary feature selection leveraging leave-one-site-out cross-validation, used to combat this unintentional learning, identified cortical thickness in the left superior frontal gyrus and right lateral orbitofrontal cortex, cortical surface area in the right transverse temporal gyrus, and left putamen volume as final features. Ridge regression restricted to these features yielded a test-set area under the receiver operating characteristic curve of 0.768. These findings evaluate strategies for handling multi-site data with varied underlying class distributions and identify potential biomarkers for individuals with current AD.
Keywords: addiction; alcohol dependence; genetic algorithm; machine learning; multi-site; prediction; structural MRI
Year: 2020 PMID: 33064342 PMCID: PMC8675424 DOI: 10.1002/hbm.25248
Source DB: PubMed Journal: Hum Brain Mapp ISSN: 1065-9471 Impact factor: 5.399
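The abstract's leave-one-site-out validation can be sketched with scikit-learn's `LeaveOneGroupOut`, treating site membership as the grouping variable. This is an illustrative sketch on synthetic stand-in data, not the paper's actual pipeline; the site labels and model settings here are assumptions.

```python
# Sketch of leave-one-site-out cross-validation with site as the group
# label. Data and the number of sites (5) are synthetic stand-ins.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
sites = rng.integers(0, 5, size=len(y))  # stand-in for the 24 training sites

clf = LogisticRegression(max_iter=1000)
# Each fold holds out every participant from one site, so the score
# estimates generalization to an entirely unseen site.
aucs = cross_val_score(clf, X, y, cv=LeaveOneGroupOut(),
                       groups=sites, scoring="roc_auc")
print(f"leave-one-site-out AUC: {aucs.mean():.3f} \u00b1 {aucs.std():.3f}")
```

With real multi-site data the per-fold AUCs can vary widely across held-out sites, which is why the tables below report a standard deviation alongside the mean.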
FIGURE 1 The distribution of the training (Sites 1–24) and testing (Sites 25–27) datasets, broken down further by the AD-to-control ratio per site and by site category (e.g., balanced vs. control-only)
Sex and age across the full collected dataset of 27 sites, split into the training and withheld testing sets, and by alcohol dependence (AD) versus control
| Split | Participants | Male, n (%) | Mean age ± SD (years) |
|---|---|---|---|
| Train‐AD | 692 | 423 (61) | 33.36 ± 9.96 |
| Train‐Control | 960 | 554 (57) | 28.54 ± 9.56 |
| Test‐AD | 146 | 79 (54) | 44.72 ± 10.55 |
| Test‐Control | 236 | 99 (42) | 42.33 ± 12.31 |
FIGURE 2 The different permutations of analyses conducted internally on the training set, with differing input dataset options (top row), classifiers (middle row), and computed CV scoring metrics (bottom row)
FIGURE 3 A simplified view of the final pipeline, in which the full training dataset is used in an evolutionary feature search designed to produce optimal subsets of high-performing features. From this collection of feature subsets, a meta-analysis of feature importance is conducted and a subset of "best" features is selected. Next, a logistic regression classifier is trained and evaluated on the testing dataset with access to only the "best" subset of features
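The evolutionary feature search in Figure 3 can be sketched as a simple genetic algorithm over binary feature masks, with leave-one-site-out CV AUC as the fitness function. Everything here (population size, number of generations, mutation rate, the synthetic data) is an illustrative assumption, not the paper's actual configuration.

```python
# Minimal genetic-algorithm feature search in the spirit of Figure 3.
# Fitness = mean leave-one-site-out CV AUC; hyperparameters are
# illustrative assumptions, not the paper's settings.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(42)
X, y = make_classification(n_samples=200, n_features=15,
                           n_informative=4, random_state=42)
sites = rng.integers(0, 4, size=len(y))  # synthetic site labels

def fitness(mask):
    """Leave-one-site-out AUC of a model restricted to the masked features."""
    if not mask.any():
        return 0.0
    clf = LogisticRegression(max_iter=1000)
    scores = cross_val_score(clf, X[:, mask], y, cv=LeaveOneGroupOut(),
                             groups=sites, scoring="roc_auc")
    return scores.mean()

pop = rng.random((12, X.shape[1])) < 0.3  # 12 random binary feature masks
for gen in range(5):
    fits = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(fits)[::-1][:6]]  # keep the fittest half
    children = []
    for _ in range(len(pop) - len(parents)):
        a, b = parents[rng.integers(0, len(parents), size=2)]
        child = np.where(rng.random(X.shape[1]) < 0.5, a, b)  # uniform crossover
        child ^= rng.random(X.shape[1]) < 0.05                # bit-flip mutation
        children.append(child)
    pop = np.vstack([parents, children])

# Weighted feature importance as in Figure 4: the fraction of final
# models in which each feature appears (0 = never, 1 = always).
importance = pop.mean(axis=0)
best = np.flatnonzero(importance >= 0.5)
print("selected features:", best)
```

The final `importance` vector mirrors the meta-analysis step: features that survive in many of the evolved subsets are treated as the most robust, and only those are passed to the final classifier.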
The results for each of the three considered classifiers on the base dataset alone, the base dataset with added case-only sites, and the full dataset with added control-only sites (see Figure 1 for which sites are balanced vs. case- or control-only), across both cross-validation (CV) strategies highlighted in Figure 2
| Dataset | Classifier | Random three‐fold CV AUC (±SD) | Leave‐site‐out CV (5 sites) AUC (±SD) |
|---|---|---|---|
| Base | Logistic regression | 0.723 ± 0.042 | 0.644 ± 0.125 |
| Base | SGD | 0.731 ± 0.034 | 0.663 ± 0.139 |
| Base | SVM | 0.724 ± 0.038 | 0.623 ± 0.096 |
| Base + case‐only | Logistic regression | 0.907 ± 0.022 | 0.560 ± 0.189 |
| Base + case‐only | SGD | 0.896 ± 0.012 | 0.561 ± 0.183 |
| Base + case‐only | SVM | 0.912 ± 0.011 | 0.578 ± 0.111 |
| Full (case + control) | Logistic regression | 0.917 ± 0.012 | 0.636 ± 0.169 |
| Full (case + control) | SGD | 0.919 ± 0.009 | 0.652 ± 0.132 |
| Full (case + control) | SVM | 0.915 ± 0.014 | 0.631 ± 0.139 |
Note: The standard deviation in area under the receiver operating characteristic curve (AUC) across cross-validated folds is provided as an estimate of confidence. Random three-fold CV was stratified according to AD status and was repeated 50 times with different random splits.
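The gap in the table between random three-fold CV and leave-site-out CV on the datasets with single-class sites can be reproduced on synthetic data: when case-only and control-only sites carry site-specific offsets, a randomly split CV rewards memorizing the site rather than the diagnosis. The effect sizes below are assumptions chosen to make the inflation visible; no true diagnostic signal is present at all.

```python
# Demonstration of site-confound inflation: diagnosis is perfectly
# confounded with site (sites 0-2 case-only, 3-5 control-only) and the
# features contain only site offsets, not diagnostic signal.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (LeaveOneGroupOut, StratifiedKFold,
                                     cross_val_score)

rng = np.random.default_rng(0)
n_sites, n_per_site, n_feat = 6, 60, 10
site = np.repeat(np.arange(n_sites), n_per_site)
y = (site < 3).astype(int)  # case-only vs control-only sites

# Each site gets its own random offset vector; labels carry no signal.
offsets = rng.normal(0.0, 1.5, size=(n_sites, n_feat))
X = rng.normal(size=(len(y), n_feat)) + offsets[site]

clf = LogisticRegression(max_iter=1000)
random_cv = cross_val_score(clf, X, y, scoring="roc_auc",
                            cv=StratifiedKFold(3, shuffle=True, random_state=0))
# Hold out one case-site/control-site pair at a time so every held-out
# fold contains both classes (analogous to leaving sets of sites out).
pair = site % 3
site_cv = cross_val_score(clf, X, y, scoring="roc_auc",
                          cv=LeaveOneGroupOut(), groups=pair)
print(f"random 3-fold AUC:  {random_cv.mean():.3f}")
print(f"leave-sites-out AUC: {site_cv.mean():.3f}")
```

Because the classifier can only succeed by recognizing sites it has already seen, the randomly split score is high while the leave-sites-out score collapses, mirroring the pattern in the table (e.g., 0.907 vs. 0.560 for logistic regression on Base + case-only).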
FIGURE 4 (a) The top 15 features (threshold chosen for readability), ranked by average weighted feature importance, where 0 indicates a feature appeared in none of the GA final models and 1 that it appeared in all. (b) Cortical thickness and (c) cortical average surface area feature importance scores above an a priori selected threshold of 0.1, projected onto the fsaverage surface space
FIGURE 5 The ROC curve for the final logistic regression model on the testing set, restricted to only the "best" subset of four features
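The final evaluation in Figure 5 amounts to fitting a classifier on only the selected features and tracing its ROC curve on the held-out test set. The sketch below uses synthetic stand-in data and hypothetical feature indices; it is not the paper's model or data.

```python
# Sketch of the Figure 5 evaluation: restrict to the selected "best"
# features, fit on training data, and compute the test-set ROC curve.
# Data and the feature indices in `best` are synthetic stand-ins.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# shuffle=False keeps the informative features in the first columns,
# so the stand-in "best" indices actually carry signal.
X, y = make_classification(n_samples=400, n_features=20, n_informative=4,
                           shuffle=False, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)

best = [0, 1, 2, 3]  # stand-in for the four selected brain measures
clf = LogisticRegression(max_iter=1000).fit(X_tr[:, best], y_tr)
probs = clf.predict_proba(X_te[:, best])[:, 1]

fpr, tpr, _ = roc_curve(y_te, probs)   # points of the ROC curve
auc = roc_auc_score(y_te, probs)       # area under that curve
print(f"test AUC: {auc:.3f}")
```

Plotting `fpr` against `tpr` reproduces a curve of the kind shown in Figure 5; the paper's reported test-set AUC for its four-feature model was 0.768.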