How to remove or control confounds in predictive models, with applications to brain biomarkers.

Darya Chyzhyk, Gaël Varoquaux, Michael Milham, Bertrand Thirion.

Abstract

BACKGROUND: With increasing data sizes and more easily available computational methods, neurosciences rely more and more on predictive modeling with machine learning, e.g., to extract disease biomarkers. Yet, a successful prediction may capture a confounding effect correlated with the outcome instead of brain features specific to the outcome of interest. For instance, because patients tend to move more in the scanner than controls, imaging biomarkers of a disease condition may mostly reflect head motion, leading to inefficient use of resources and wrong interpretation of the biomarkers.
RESULTS: Here we study how to adapt statistical methods that control for confounds to predictive modeling settings. We review how to train predictors that are not driven by such spurious effects. We also show how to measure the unbiased predictive accuracy of these biomarkers, based on a confounded dataset. For this purpose, cross-validation must be modified to account for the nuisance effect. To guide understanding and practical recommendations, we apply various strategies to assess predictive models in the presence of confounds on simulated data and population brain imaging settings. Theoretical and empirical studies show that deconfounding should not be applied to the train and test data jointly: modeling the effect of confounds, on the training data only, should instead be decoupled from removing confounds.
CONCLUSIONS: Cross-validation that isolates nuisance effects gives an additional piece of information: confound-free prediction accuracy.
© The Author(s) 2022. Published by Oxford University Press GigaScience.
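
The abstract's recommendation that the confound effect be modeled on the training data only, and then removed from both train and test data, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a single scalar confound and a per-feature linear confound model, and the names `fit_confound_model` and `remove_confound`, as well as the simulated data, are hypothetical.

```python
import random
from statistics import fmean


def fit_confound_model(X_train, z_train):
    """Fit x_j ~ a_j + b_j * z by least squares for each feature j.

    Only training rows are used, so no information from the test set
    leaks into the confound model (assumption: one scalar confound z).
    """
    z_mean = fmean(z_train)
    z_var = sum((z - z_mean) ** 2 for z in z_train)
    if z_var == 0:
        raise ValueError("confound is constant on the training set")
    params = []
    for j in range(len(X_train[0])):
        col = [row[j] for row in X_train]
        x_mean = fmean(col)
        cov = sum((z - z_mean) * (x - x_mean) for z, x in zip(z_train, col))
        slope = cov / z_var
        params.append((x_mean - slope * z_mean, slope))  # (intercept, slope)
    return params


def remove_confound(X, z, params):
    """Subtract the fitted confound effect from every feature of every row."""
    return [
        [x - (a + b * zi) for x, (a, b) in zip(row, params)]
        for row, zi in zip(X, z)
    ]


# Simulated example (hypothetical data): two features driven partly by z.
random.seed(0)
z_all = [random.gauss(0, 1) for _ in range(300)]
X_all = [[2.0 * z + random.gauss(0, 1), -1.0 * z + random.gauss(0, 1)] for z in z_all]

# Split first, then fit the confound model on the training part only.
X_train, X_test = X_all[:200], X_all[200:]
z_train, z_test = z_all[:200], z_all[200:]

params = fit_confound_model(X_train, z_train)
X_train_clean = remove_confound(X_train, z_train, params)
X_test_clean = remove_confound(X_test, z_test, params)  # reuses the train-only fit
```

The design point is the order of operations: the split happens before any fitting, and the test data are only ever transformed with parameters estimated on the training data.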

Keywords:  biomarkers; confound; deconfounding; phenotype; predictive models; statistical testing; subsampling

Year: 2022; PMID: 35277962; PMCID: PMC8917515; DOI: 10.1093/gigascience/giac014

Source DB: PubMed; Journal: Gigascience; ISSN: 2047-217X; Impact factor: 6.524


