Oualid Benkarim, Casey Paquola, Bo-Yong Park, Valeria Kebets, Seok-Jun Hong, Reinder Vos de Wael, Shaoshi Zhang, B T Thomas Yeo, Michael Eickenberg, Tian Ge, Jean-Baptiste Poline, Boris C Bernhardt, Danilo Bzdok.
Abstract
Brain imaging research has increasingly adopted supervised machine learning for single-participant disease classification. Yet, the success of these algorithms likely depends on population diversity, including demographic differences and other factors that may be outside the primary scientific interest. Here, we capitalize on propensity scores as a composite confound index to quantify diversity due to major sources of population variation. We delineate the impact of population heterogeneity on predictive accuracy and pattern stability in 2 separate clinical cohorts: the Autism Brain Imaging Data Exchange (ABIDE, n = 297) and the Healthy Brain Network (HBN, n = 551). Across various analysis scenarios, our results uncover the extent to which cross-validated prediction performance is interlocked with diversity. The instability of extracted brain patterns attributable to diversity is located preferentially in regions that are part of the default mode network. Collectively, our findings highlight the limitations of prevailing deconfounding practices in mitigating the full consequences of population diversity.
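As an illustration of a propensity score used as a composite confound index, here is a minimal sketch in plain Python. It assumes the score is the estimated probability of belonging to the patient group (1 = patient, 0 = control) given covariates such as standardized age and sex, and it assumes an ordinary logistic regression fitted by batch gradient descent. The exact covariates and estimator used in the study are not given in this excerpt, and the helpers `fit_logistic` and `propensity_scores` are hypothetical.

```python
import math
from typing import List, Sequence

def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def _linear(w: Sequence[float], x: Sequence[float]) -> float:
    # w[0] is the intercept; w[1:] are the covariate weights.
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))

def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Hypothetical helper: fit P(group = 1 | covariates) by batch gradient descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            # Gradient of the log-loss with respect to the logit is (p - y).
            err = _sigmoid(_linear(w, xi)) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def propensity_scores(X: Sequence[Sequence[float]], w: Sequence[float]) -> List[float]:
    """Hypothetical helper: one score in (0, 1) per participant."""
    return [_sigmoid(_linear(w, xi)) for xi in X]

# Toy covariates (assumed): [standardized age, sex (1 = female)]; y: 1 = patient, 0 = control.
X = [[-1.0, 0], [-0.5, 1], [0.0, 0], [0.5, 1], [1.0, 0], [1.5, 1]]
y = [0, 0, 0, 1, 1, 1]
w = fit_logistic(X, y)
ps = propensity_scores(X, w)
```

Because participants with similar propensity scores have similar covariate profiles, the absolute difference between two scores gives a one-dimensional summary of how different two participants are on all covariates at once.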
Year: 2022 PMID: 35486643 PMCID: PMC9094526 DOI: 10.1371/journal.pbio.3001627
Source DB: PubMed Journal: PLoS Biol ISSN: 1544-9173 Impact factor: 9.593
Fig 3. Participant diversity is a major determinant for the classification accuracy of predictive models.
Results are based on the classification of ASD versus TD using functional connectivity profiles. For each dataset, results show prediction performance for every possible choice of 5 of the 10 strata for training, with the remaining 5 strata held out. Performance is measured by AUC (top) and F1 score (bottom) in 2 cohorts: ABIDE (left) and HBN (right). For each cohort, the first column shows model performance under a 10-fold CV strategy applied to the training set only; here, diversity is computed as the average of all pairwise absolute differences in propensity scores (i.e., WD). The second column shows performance on each individual held-out stratum; here, diversity is the mean absolute difference in propensity scores between the participants of the training set and the unseen participants of that held-out stratum (i.e., OOD). The strength of the association between performance and diversity is reported as the Pearson correlation coefficient (r). Our empirical results show a strong relationship between predictive performance and diversity, although the direction of the correlation differs between the ABIDE and HBN cohorts. Data underlying this figure can be found in S1 Data. ABIDE, Autism Brain Imaging Data Exchange; ASD, autism spectrum disorder; AUC, area under the curve; CV, cross-validation; HBN, Healthy Brain Network; OOD, out of distribution; TD, typically developing; WD, within distribution.
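The caption defines two diversity measures on propensity scores and a strata-based split scheme. The sketch below computes within-distribution diversity (WD) as the mean absolute difference over all unordered pairs of participants, out-of-distribution diversity (OOD) as the mean absolute difference over all (training, held-out) pairs, enumerates the C(10, 5) = 252 ways to choose 5 of 10 strata for training, and computes Pearson's r. Treating "pairwise" as unordered pairs of distinct participants is an assumption, how the strata are formed is not specified in this excerpt, and all function names are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean
from typing import Iterator, Sequence, Tuple

def wd_diversity(ps: Sequence[float]) -> float:
    """Within-distribution diversity: mean |p_i - p_j| over unordered pairs i < j."""
    return mean(abs(a - b) for a, b in combinations(ps, 2))

def ood_diversity(train_ps: Sequence[float], heldout_ps: Sequence[float]) -> float:
    """Out-of-distribution diversity: mean |p_i - p_j|, i in training set, j in held-out stratum."""
    return mean(abs(a - b) for a in train_ps for b in heldout_ps)

def strata_splits(n_strata: int = 10,
                  k_train: int = 5) -> Iterator[Tuple[Tuple[int, ...], Tuple[int, ...]]]:
    """Yield every (training strata, held-out strata) partition with k_train training strata."""
    ids = range(n_strata)
    for train in combinations(ids, k_train):
        yield train, tuple(s for s in ids if s not in train)

def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

splits = list(strata_splits())
print(len(splits))  # 252
print(wd_diversity([0.1, 0.3, 0.6]))  # approximately 0.333
print(ood_diversity([0.2, 0.4], [0.5]))  # approximately 0.2
```

In the setting the caption describes, one would record a performance score (AUC or F1) together with a WD or OOD diversity value for each split or held-out stratum, then pass the two lists to `pearson_r`.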
Demographics for each site and group.
| Cohort | Site | N, TD | N, ASD | N, ADHD | N, ANX | M/F, TD | M/F, ASD | M/F, ADHD | M/F, ANX | Age, TD | Age, ASD | Age, ADHD | Age, ANX |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ABIDE | | 70 | 56 | - | - | 69/1 | 52/4 | - | - | 15.4 ± 7.0 | 14.5 ± 7.9 | - | - |
| ABIDE | | 22 | 20 | - | - | 22/0 | 20/0 | - | - | 19.7 ± 7.0 | 20.8 ± 7.3 | - | - |
| ABIDE | | 19 | 18 | - | - | 19/0 | 18/0 | - | - | 15.8 ± 3.2 | 14.5 ± 3.3 | - | - |
| ABIDE | | 40 | 52 | - | - | 40/0 | 52/0 | - | - | 21.5 ± 7.8 | 23.6 ± 7.6 | - | - |
| ABIDE | Total | 151 | 146 | - | - | 150/1 | 142/4 | - | - | 17.7 ± 7.3 | 18.6 ± 8.4 | - | - |
| HBN | | 23 | 41 | 172 | 86 | 11/12 | 37/4 | 119/53 | 52/34 | 11.7 ± 3.3 | 12.5 ± 4.0 | 11.5 ± 3.3 | 12.4 ± 3.7 |
| HBN | | 33 | 49 | 130 | 74 | 19/14 | 36/13 | 87/43 | 43/31 | 11.6 ± 3.5 | 12.7 ± 4.0 | 11.6 ± 3.3 | 12.1 ± 3.5 |
| HBN | | 46 | 16 | 38 | 41 | 20/26 | 11/5 | 24/14 | 23/18 | 11.8 ± 3.7 | 12.6 ± 3.4 | 12.8 ± 4.1 | 12.8 ± 3.8 |
| HBN | Total | 102 | 106 | 340 | 201 | 50/52 | 84/22 | 230/110 | 118/83 | 11.7 ± 3.5 | 12.6 ± 3.9 | 11.7 ± 3.4 | 12.4 ± 3.6 |
Number of participants (N), males/females (M/F), and mean age ± standard deviation (years) for each site and group. Some HBN participants are diagnosed with more than one disorder, so the HBN diagnostic groups overlap.
ABIDE, Autism Brain Imaging Data Exchange; ADHD, attention-deficit/hyperactivity disorder; ANX, anxiety; ASD, autism spectrum disorder; CBIC, CitiGroup Cornell Brain Imaging Center; HBN, Healthy Brain Network; NYU, New York University Langone Medical Center; PITT, University of Pittsburgh, School of Medicine; RU, Rutgers University Brain Imaging Center; SI, Staten Island; TCD, Trinity Centre for Health Sciences, Trinity College Dublin; TD, typically developing; USM, University of Utah, School of Medicine.