Darya Chyzhyk1,2,3, Gaël Varoquaux1,2, Michael Milham3,4, Bertrand Thirion1,2. 1. Parietal project-team, INRIA Saclay-île de France, France. 2. CEA/Neurospin bât 145, 91191 Gif-Sur-Yvette, France. 3. Center for the Developing Brain, Child Mind Institute, New York, NY 10022, USA. 4. Center for Biomedical Imaging and Neuromodulation, Nathan S. Kline Institute for Psychiatric Research, Orangeburg, NY 10962, USA.
Abstract
BACKGROUND: With increasing data sizes and more easily available computational methods, the neurosciences rely more and more on predictive modeling with machine learning, e.g., to extract disease biomarkers. Yet a successful prediction may capture a confounding effect correlated with the outcome rather than brain features specific to the outcome of interest. For instance, because patients tend to move more in the scanner than controls, imaging biomarkers of a disease condition may mostly reflect head motion, leading to inefficient use of resources and misinterpretation of the biomarkers.

RESULTS: Here we study how to adapt statistical methods that control for confounds to predictive modeling settings. We review how to train predictors that are not driven by such spurious effects. We also show how to measure the unbiased predictive accuracy of these biomarkers from a confounded dataset. For this purpose, cross-validation must be modified to account for the nuisance effect. To guide understanding and practical recommendations, we apply various strategies to assess predictive models in the presence of confounds, on simulated data and in population brain imaging settings. Theoretical and empirical studies show that deconfounding should not be applied to the train and test data jointly: instead, modeling the effect of confounds on the training data only should be decoupled from removing them.

CONCLUSIONS: Cross-validation that isolates nuisance effects gives an additional piece of information: confound-free prediction accuracy.
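The key recommendation above, fit the confound model on the training data only, then use it to remove the confound from both train and test sets, can be illustrated with a minimal sketch. This is a hedged illustration of standard linear confound regression, not the authors' exact code; function names and the use of plain least squares are assumptions for the example.

```python
import numpy as np

def fit_deconfounder(X_train, z_train):
    """Estimate the linear effect of a confound z on each feature of X,
    using the TRAINING data only (illustrative sketch, not the paper's code)."""
    # Design matrix: intercept column + confound values
    Z = np.column_stack([np.ones(len(z_train)), z_train])
    # Least-squares coefficients mapping the confound to each feature
    beta, *_ = np.linalg.lstsq(Z, X_train, rcond=None)
    return beta

def remove_confound(X, z, beta):
    """Subtract the confound effect learned on the training set.
    Applied separately to train and test data, so the test set never
    influences the deconfounding model."""
    Z = np.column_stack([np.ones(len(z)), z])
    return X - Z @ beta
```

The point of the split is to avoid leakage: deconfounding the pooled train+test data lets test-set information shape the confound model, biasing the cross-validated accuracy estimate.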