Jingwei Li, Danilo Bzdok, Jianzhong Chen, Angela Tam, Leon Qi Rong Ooi, Avram J Holmes, Tian Ge, Kaustubh R Patil, Mbemba Jabbi, Simon B Eickhoff, B T Thomas Yeo, Sarah Genon.
Abstract
Algorithmic biases that favor majority populations pose a key challenge to the application of machine learning for precision medicine. Here, we assessed such bias in prediction models of behavioral phenotypes from brain functional magnetic resonance imaging. We examined the prediction bias using two independent datasets (preadolescent versus adult) of mixed ethnic/racial composition. When predictive models were trained on data dominated by white Americans (WA), out-of-sample prediction errors were generally higher for African Americans (AA) than for WA. This bias toward WA corresponds to more WA-like brain-behavior association patterns learned by the models. When models were trained on AA only, compared to training only on WA or an equal number of AA and WA participants, AA prediction accuracy improved but stayed below that for WA. Overall, the results point to the need for caution and further research regarding the application of current brain-behavior prediction models in minority populations.
Year: 2022 PMID: 35294251 PMCID: PMC8926333 DOI: 10.1126/sciadv.abj1812
Source DB: PubMed Journal: Sci Adv ISSN: 2375-2548 Impact factor: 14.957
Fig. 1. Ethnic/racial composition in our datasets and the brain atlases used for RSFC calculation.
Subpopulation composition of (A) HCP and (B) ABCD and functional connectivity ROIs. Note that the naming of ethnic/racial categories in (B) followed the definition given by the ABCD consortium, which was slightly different from the National Institutes of Health definition. (C) The 400-area cortical parcellation derived by Schaefer et al. Parcel colors correspond to 17 large-scale networks. (D) Nineteen subcortical ROIs from the Desikan/Killiany atlas.
Fig. 2. Procedures for data splitting and the matching between WA and AA.
(A) HCP dataset. (B) ABCD dataset.
Fig. 3. The prediction error is larger (i.e., lower predictive COD) in AA than in matched WA in the HCP dataset (left) and the ABCD dataset (right).
Each violin plot shows the distribution of predictive COD across 40 data splits in (A) and (C) and across 120 training-test splits in (B) and (D). Blue and green violins represent AA and WA, respectively. Red violins show the difference. Gray violins show the null distribution of the difference, generated by flipping the AA versus WA labels. An asterisk indicates that the difference in predictive COD between matched AA and WA is significant (FDR controlled at q = 5%). The gray dashed line indicates 0.
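The label-flipping null and the FDR control mentioned in this caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the COD formula (1 - SSE/SST around a reference mean), the scheme of swapping labels within each matched AA/WA pair, the use of the Benjamini-Hochberg procedure, and all function names are hypothetical choices made for the example.

```python
import random

def cod(y_true, y_pred, ref_mean):
    """Coefficient of determination, assumed here to be 1 - SSE/SST,
    with SST taken around a reference mean supplied by the caller."""
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - ref_mean) ** 2 for t in y_true)
    return 1.0 - sse / sst

def cod_difference(pairs, ref_mean):
    """pairs: list of ((aa_true, aa_pred), (wa_true, wa_pred)), one entry
    per matched AA/WA pair. Returns COD(AA) - COD(WA)."""
    aa_true, aa_pred = zip(*(aa for aa, _ in pairs))
    wa_true, wa_pred = zip(*(wa for _, wa in pairs))
    return cod(aa_true, aa_pred, ref_mean) - cod(wa_true, wa_pred, ref_mean)

def label_flip_null(pairs, ref_mean, n_perm=1000, seed=0):
    """Null distribution of the COD difference: in each permutation the
    AA/WA labels are swapped within each matched pair with probability 0.5
    (an assumed flipping scheme)."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        flipped = [(wa, aa) if rng.random() < 0.5 else (aa, wa)
                   for aa, wa in pairs]
        null.append(cod_difference(flipped, ref_mean))
    return null

def two_sided_p(observed, null):
    """Two-sided permutation p value with the usual +1 correction."""
    extreme = sum(1 for d in null if abs(d) >= abs(observed))
    return (extreme + 1) / (len(null) + 1)

def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure (assumed FDR method).
    Returns one True/False rejection flag per input p value."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            max_rank = rank  # largest rank that passes its threshold
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= max_rank
    return rejected
```

In such a setup, one p value would be computed per behavioral measure and the whole set passed to `benjamini_hochberg` with `q=0.05`.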
Fig. 4. Predictive COD of AA and WA when no confounding variable was regressed during model building.
(A) HCP dataset. Each violin plot shows the distribution of predictive COD across 40 data splits. (B) ABCD dataset. Each violin plot shows the distribution of predictive COD across 120 training-test splits. Blue and green violins represent AA and WA, respectively. Red violins show the difference. Gray violins show the null distribution of the difference, generated by flipping the AA versus WA labels. An asterisk indicates that the difference in predictive COD between matched AA and WA is significant (FDR controlled at q = 5%). The gray dashed line indicates 0.
Fig. 5. Extended analyses.
(A) Impact of training population on the behavioral prediction model. The influence of training population was evaluated using predictive COD. Each bar corresponds to one of three types of prediction models: (i) trained on AA only, (ii) trained on the same number of random WA, and (iii) trained on both. For each model, the length of the navy blue bar indicates the number of behavioral measures with better performance in WA than AA, the mint bar indicates the number with better performance in AA than WA, and the gray bar indicates the number showing no significant difference in test accuracy between AA and WA. (B and C) For full-dataset models (models trained on the entire dataset), the AA versus WA accuracy difference (predictive COD; vertical axis) is plotted against the difference in similarity between model-learned brain-behavior association patterns and true groupwise brain-behavior association patterns (horizontal axis). Each red dot represents a behavioral measure.
List of behavioral measures with better prediction in WA, with better prediction in AA, or without significant difference in prediction between WA and AA, for the models trained only on AA and only on WA.
| Model trained only on AA | • Visuospatial reaction time | • Short-delay recall | • Visual episodic memory |
| | • Reading (pronunciation) | • Total prodromal psychosis symptoms | • BAS—fun seeking |
| | • Visuospatial efficiency | • Anxious/depressed | • Executive function (card sort) |
| | • Thought problems | • Rule-breaking behavior | • BAS—drive |
| | • Social problems | • Somatic complaints | • Withdrawn/depressed |
| | • Behavioral inhibition | • Long-delay recall | • Visuospatial accuracy |
| | • Mania | • Attention problems | • Prodromal psychosis severity |
| | • Matrix reasoning | • BAS—reward responsiveness | • Processing speed |
| | • Aggressive behavior | • Picture vocabulary | |
| | • Crystallized cognition | | |
| | • Positive urgency | | |
| | • Fluid cognition | | |
| | • Cognitive control/attention (flanker) | | |
| | • Lack of perseverance | | |
| | • Lack of planning | | |
| | • Sensation seeking | | |
| | • Working memory (list sort) | | |
| | • Overall cognition | | |
| | • Negative urgency | | |
| Model trained only on WA | • Picture vocabulary | • Total prodromal psychosis symptoms | • Somatic complaints |
| | • Working memory (list sort) | • Attention problems | |
| | • Visuospatial reaction time | • Executive function (card sort) | |
| | • Reading (pronunciation) | • Fluid cognition | |
| | • Crystallized cognition | • Mania | |
| | • BAS—drive | • Rule-breaking behavior | |
| | • Aggressive behavior | • Processing speed | |
| | • Overall cognition | • Anxious/depressed | |
| | • Visuospatial efficiency | • Prodromal psychosis severity | |
| | • Thought problems | | |
| | • Sensation seeking | | |
| | • Withdrawn/depressed | | |
| | • Behavioral inhibition | | |
| | • Matrix reasoning | | |
| | • Cognitive control/attention (flanker) | | |
| | • Lack of perseverance | | |
| | • Social problems | | |
| | • Long-delay recall | | |
| | • Short-delay recall | | |
| | • Negative urgency | | |
| | • BAS—reward responsiveness | | |
| | • Visuospatial accuracy | | |
| | • Lack of planning | | |
| | • Positive urgency | | |
| | • Visual episodic memory | | |
| | • BAS—fun seeking | | |
Fig. 6. Scatterplots of predicted scores against true behavioral scores for the behavioral measures on which predictive COD and Pearson’s correlation led to inconsistent conclusions.
For each behavioral measure, a representative data split is shown (A to F). Each blue or green dot represents one AA or WA test participant, respectively. The numbers reported in blue color correspond to AA, while the green ones correspond to WA. Behavioral variance refers to the variance of the true behavioral score. Prediction shift was calculated as the square of the mean difference between true and predicted scores.
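The quantities named in this caption can be illustrated with a small Python sketch showing how Pearson's correlation and predictive COD can disagree when predictions are systematically shifted. Prediction shift follows the caption's definition. The use of population variance for behavioral variance, the COD formula (1 - SSE/SST around the mean of the true test scores), the toy data, and the function names are assumptions for illustration only.

```python
import math
from statistics import mean, pvariance

def prediction_shift(y_true, y_pred):
    """Square of the mean difference between true and predicted scores."""
    return mean([t - p for t, p in zip(y_true, y_pred)]) ** 2

def behavioral_variance(y_true):
    """Variance of the true behavioral scores (population variance assumed)."""
    return pvariance(y_true)

def pearson_r(x, y):
    """Pearson's correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cod(y_true, y_pred):
    """Coefficient of determination, assumed here to be 1 - SSE/SST with
    SST taken around the mean of the true scores."""
    m = mean(y_true)
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - m) ** 2 for t in y_true)
    return 1.0 - sse / sst

# Toy data: predictions track the true scores perfectly but are offset by +2.
y_true = [1, 2, 3, 4, 5]
y_pred = [3, 4, 5, 6, 7]

print("Pearson r:          ", pearson_r(y_true, y_pred))
print("Predictive COD:     ", cod(y_true, y_pred))
print("Prediction shift:   ", prediction_shift(y_true, y_pred))
print("Behavioral variance:", behavioral_variance(y_true))
```

In this toy case the correlation is perfect while the COD is negative, because the prediction shift is large relative to the behavioral variance.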