Emma A M Stanley (1,2,3), Matthias Wilms (2,3,4), Pauline Mouches (1,2,3), Nils D Forkert (1,2,3,4).
1. University of Calgary, Department of Biomedical Engineering, Calgary, Alberta, Canada.
2. University of Calgary, Department of Radiology, Calgary, Alberta, Canada.
3. University of Calgary, Hotchkiss Brain Institute, Calgary, Alberta, Canada.
4. University of Calgary, Alberta Children's Hospital Research Institute, Calgary, Alberta, Canada.
Abstract
Purpose: Explainability and fairness are two key factors for the effective and ethical clinical implementation of deep learning-based machine learning models in healthcare settings. However, there has been limited work on investigating how unfair performance manifests in explainable artificial intelligence (XAI) methods, and how XAI can be used to investigate potential reasons for unfairness. Thus, the aim of this work was to analyze the effects of previously established sociodemographic-related confounders on classifier performance and explainability methods.
Approach: A convolutional neural network (CNN) was trained to predict biological sex from T1-weighted brain MRI datasets of 4547 9- to 10-year-old adolescents from the Adolescent Brain Cognitive Development study. Performance disparities of the trained CNN between White and Black subjects were analyzed and saliency maps were generated for each subgroup at the intersection of sex and race.
Results: The classification model demonstrated a significant difference in the percentage of correctly classified White male (90.3% ± 1.7%) and Black male (81.1% ± 4.5%) children. Conversely, slightly higher performance was found for Black female (89.3% ± 4.8%) compared with White female (86.5% ± 2.0%) children. Saliency maps showed subgroup-specific differences, corresponding to brain regions previously associated with pubertal development. In line with this finding, average pubertal development scores of subjects used in this study were significantly different between Black and White females (p < 0.001) and males (p < 0.001).
Conclusions: We demonstrate that a CNN with significantly different sex classification performance between Black and White adolescents can identify different important brain regions when comparing subgroup saliency maps. Importance scores vary substantially between subgroups within brain structures associated with pubertal development, a race-associated confounder for predicting sex. We illustrate that unfair models can produce different XAI results between subgroups and that these results may explain potential reasons for biased performance.
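The subgroup comparison reported in the Results can be illustrated with a minimal sketch that computes the fraction of correctly classified subjects for each subgroup at the intersection of sex and race. This is not the authors' code: the function name `subgroup_accuracy` and the toy labels are hypothetical and chosen only for illustration, and the study's cross-validation and significance testing are not reproduced.

```python
from collections import defaultdict


def subgroup_accuracy(y_true, y_pred, race):
    """Return the fraction of correctly classified subjects per (sex, race) subgroup.

    y_true: true sex labels, y_pred: predicted sex labels, race: race labels.
    All three sequences must have the same length.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for sex, pred, r in zip(y_true, y_pred, race):
        key = (sex, r)  # subgroup at the intersection of sex and race
        total[key] += 1
        correct[key] += int(sex == pred)
    return {key: correct[key] / total[key] for key in total}


# Toy, made-up labels purely for illustration (NOT study data).
y_true = ["M", "M", "F", "F", "M", "M", "F", "F"]
y_pred = ["M", "M", "F", "M", "M", "F", "F", "F"]
race = ["White", "White", "White", "White", "Black", "Black", "Black", "Black"]

rates = subgroup_accuracy(y_true, y_pred, race)
for (sex, r), rate in sorted(rates.items()):
    print(f"{r} {sex}: {rate:.1%}")
```

Reporting one rate per subgroup, rather than a single overall accuracy, is what makes a disparity such as the one between White male and Black male children visible.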