Statistical quantification of confounding bias in machine learning models.

Tamas Spisak

Abstract

BACKGROUND: The lack of nonparametric statistical tests for confounding bias significantly hampers the development of robust, valid, and generalizable predictive models in many fields of research. Here I propose the partial confounder test, which, for a given confounder variable, tests the null hypothesis that the model is unconfounded.
RESULTS: The test provides strict control of type I errors and high statistical power, even for the nonnormally and nonlinearly dependent predictions often seen in machine learning. Applying the proposed test to models trained on large-scale functional brain connectivity data (N = 1,865) (i) reveals previously unreported confounders and (ii) shows that state-of-the-art confound mitigation approaches may fail to prevent confounding bias in several cases.
CONCLUSIONS: The proposed test (implemented in the package mlconfound; https://mlconfound.readthedocs.io) can aid the assessment and improvement of the generalizability and validity of predictive models and thereby foster the development of clinically useful machine learning biomarkers.
© The Author(s) 2022. Published by Oxford University Press GigaScience.
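
ILLUSTRATION: The null hypothesis described above (predictions are independent of the confounder once the target is accounted for) can be probed with a conditional permutation scheme. The following is a minimal sketch under a strong simplifying assumption, namely that the target y is categorical, so that shuffling the confounder c within each class of y preserves the c-y relationship under the null. It is not the author's procedure and does not use mlconfound; the function name `stratified_confound_test`, the squared-correlation statistic, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np


def stratified_confound_test(y, yhat, c, n_perm=1000, seed=0):
    """Permutation p-value for H0: yhat is independent of c given y.

    Assumption (not from the source): y is categorical, so permuting c
    within each class of y leaves the c-y association untouched.
    """
    y = np.asarray(y)
    yhat = np.asarray(yhat, dtype=float)
    c = np.asarray(c, dtype=float)
    rng = np.random.default_rng(seed)

    def r2(a, b):
        # Squared Pearson correlation as a simple association statistic.
        return np.corrcoef(a, b)[0, 1] ** 2

    observed = r2(yhat, c)
    # Index sets of the samples belonging to each class of y.
    strata = [np.flatnonzero(y == label) for label in np.unique(y)]

    exceed = 0
    for _ in range(n_perm):
        c_perm = c.copy()
        for idx in strata:
            # Shuffle confounder values among samples of the same class.
            c_perm[idx] = c[rng.permutation(idx)]
        if r2(yhat, c_perm) >= observed:
            exceed += 1

    # Add-one correction keeps the p-value strictly positive.
    p_value = (1 + exceed) / (1 + n_perm)
    return observed, p_value


# Simulated example: the confounder is associated with the target.
rng = np.random.default_rng(42)
n = 400
y = rng.integers(0, 2, size=n)
c = y + rng.normal(size=n)
yhat_clean = y + rng.normal(scale=0.5, size=n)  # depends on y only
yhat_biased = yhat_clean + 0.8 * c              # additionally driven by c

_, p_clean = stratified_confound_test(y, yhat_clean, c)
_, p_biased = stratified_confound_test(y, yhat_biased, c)
print(f"unconfounded predictions: p = {p_clean:.3f}")
print(f"confounded predictions:   p = {p_biased:.3f}")
```

In this sketch, a small p-value for `yhat_biased` indicates that the predictions carry confounder information beyond what the target explains. Continuous targets would need a different way of sampling from the conditional distribution of c given y, which the stratified shuffle above does not provide.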

Keywords:  conditional independence; conditional permutation; confounder test; confounding bias; machine learning; predictive modeling

Year:  2022        PMID: 36017878      PMCID: PMC9412867          DOI: 10.1093/gigascience/giac082

Source DB:  PubMed          Journal:  Gigascience        ISSN: 2047-217X            Impact factor:   7.658

