
Removing the influence of group variables in high-dimensional predictive modelling.

Emanuele Aliverti, Kristian Lum, James E Johndrow, David B Dunson.

Abstract

In many application areas, predictive models are used to support or make important decisions. There is increasing awareness that these models may contain spurious or otherwise undesirable correlations. Such correlations may arise from a variety of sources, including batch effects, systematic measurement errors, or sampling bias. Without explicit adjustment, machine learning algorithms trained using these data can produce poor out-of-sample predictions which propagate these undesirable correlations. We propose a method to pre-process the training data, producing an adjusted dataset that is statistically independent of the nuisance variables with minimum information loss. We develop a conceptually simple approach for creating an adjusted dataset in high-dimensional settings based on a constrained form of matrix decomposition. The resulting dataset can then be used in any predictive algorithm with the guarantee that predictions will be statistically independent of the group variable. We develop a scalable algorithm for implementing the method, along with theory support in the form of independence guarantees and optimality. The method is illustrated on some simulation examples and applied to two case studies: removing machine-specific correlations from brain scan data, and removing race and ethnicity information from a dataset used to predict recidivism. That the motivation for removing undesirable correlations is quite different in the two applications illustrates the broad applicability of our approach.
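The pre-processing idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it is one simple way to build a low-rank adjusted dataset whose columns have zero sample covariance with a group variable, assuming NumPy is available. The function name `adjust_for_group`, the `rank` parameter, and the simulated data are hypothetical and introduced only for illustration. The sketch enforces linear orthogonality only, which is weaker than full statistical independence.

```python
import numpy as np


def adjust_for_group(X, Z, rank):
    """Return a rank-`rank` version of X whose columns are orthogonal to Z.

    Illustrative sketch only (hypothetical helper, not the paper's code):
    1. centre X and Z,
    2. regress every column of X on Z and keep the residuals,
    3. take a truncated SVD of the residual matrix.
    """
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float).reshape(len(X), -1)  # n x q group design

    x_mean = X.mean(axis=0)
    Xc = X - x_mean            # centred features
    Zc = Z - Z.mean(axis=0)    # centred group variable(s)

    # Least-squares projection of the features onto the group variable(s);
    # the residuals R are orthogonal to every column of Zc.
    coef, *_ = np.linalg.lstsq(Zc, Xc, rcond=None)
    R = Xc - Zc @ coef

    # Truncated SVD of the residuals gives the best rank-`rank` approximation
    # (in Frobenius norm) among matrices whose columns are orthogonal to Zc.
    U, d, Vt = np.linalg.svd(R, full_matrices=False)
    low_rank = (U[:, :rank] * d[:rank]) @ Vt[:rank]

    # Restore the original column means (a constant shift has zero covariance
    # with the centred group variable).
    return low_rank + x_mean


# Simulated example (assumed data): a binary group shifts every feature.
rng = np.random.default_rng(0)
n, p = 200, 50
z = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, p)) + 2.0 * z[:, None]

X_adj = adjust_for_group(X, z, rank=10)

# Largest absolute cross-product between the centred group variable and any
# adjusted feature; it should be zero up to floating-point error.
max_cov = np.abs((z - z.mean()) @ (X_adj - X_adj.mean(axis=0))).max()
print(max_cov)
```

Any downstream model that is linear in `X_adj` then produces fitted values that are uncorrelated with `z` in the training sample. Removing the group signal before the decomposition, rather than after, keeps the approximation error as small as possible under the orthogonality constraint.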


Keywords:  Constrained optimization; Criminal justice; Neuroscience; Orthogonal predictions; Predictive modelling; Singular value decomposition

Year: 2021    PMID: 35755858    PMCID: PMC9221581    DOI: 10.1111/rssa.12613

Source DB: PubMed    Journal: J R Stat Soc Ser A Stat Soc    ISSN: 0964-1998    Impact factor: 2.175


