
Hierarchical confounder discovery in the experiment-machine learning cycle.

Alex Rogozhnikov¹, Pavan Ramkumar¹, Rishi Bedi¹, Saul Kato¹,², G Sean Escola¹,³.

Abstract

The promise of machine learning (ML) to extract insights from high-dimensional datasets is tempered by confounding variables. It behooves scientists to determine if a model has extracted the desired information or instead fallen prey to bias. Due to features of natural phenomena and experimental design constraints, bioscience datasets are often organized in nested hierarchies that obfuscate the origins of confounding effects and render confounder amelioration methods ineffective. We propose a non-parametric statistical method called the rank-to-group (RTG) score that identifies hierarchical confounder effects in raw data and ML-derived embeddings. We show that RTG scores correctly assign the effects of hierarchical confounders when linear methods fail. In a public biomedical image dataset, we discover unreported effects of experimental design. We then use RTG scores to discover crossmodal correlated variability in a multi-phenotypic biological dataset. This approach should be generally useful in experiment-analysis cycles and to ensure confounder robustness in ML models.
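The abstract does not spell out how the RTG score is computed. The following minimal sketch, written under stated assumptions, illustrates the general idea of a rank-based, non-parametric group score. It takes the Mann-Whitney U test (listed among the keywords) as its basis. This is not the authors' published definition or code. The function name `rank_to_group_sketch`, the use of Euclidean distance, and the `parent` argument are hypothetical choices made for illustration. The `parent` argument restricts comparisons to samples that share a higher-level label, such as a plate containing several wells. In this sketch, a score near 0.5 means group membership does not make samples closer. A score near 1 means samples of the same group rank closest to each other.

```python
import numpy as np


def rank_to_group_sketch(X, group, parent=None):
    """Hypothetical rank-based group score (NOT the authors' implementation).

    For each anchor sample i, the other samples are split into "positives"
    (same `group` label as i) and "negatives" (different label). If `parent`
    is given, both sets are restricted to samples sharing i's parent label,
    so a higher-level (nesting) variable cannot inflate the score.
    The per-anchor score is the Mann-Whitney U statistic divided by the
    number of (positive, negative) pairs: the probability that a random
    positive is closer to i than a random negative (ties count as 0.5).
    Returns the mean over anchors that have at least one of each.
    """
    X = np.asarray(X, dtype=float)
    group = np.asarray(group)
    parent = np.zeros(len(X), dtype=int) if parent is None else np.asarray(parent)

    # Pairwise Euclidean distances between all samples (assumed metric).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    scores = []
    for i in range(len(X)):
        others = np.arange(len(X)) != i
        same_parent = parent == parent[i]
        pos = others & same_parent & (group == group[i])
        neg = others & same_parent & (group != group[i])
        if not pos.any() or not neg.any():
            continue  # anchor has no valid comparison set
        dp = dist[i, pos][:, None]
        dn = dist[i, neg][None, :]
        # Normalized U: fraction of pairs where the positive ranks closer.
        u = (dp < dn).mean() + 0.5 * (dp == dn).mean()
        scores.append(u)
    return float(np.mean(scores)) if scores else float("nan")


# Synthetic nested design: 8 wells of 5 samples, 4 wells per plate.
# Only the plate shifts the data; the wells themselves have no effect.
rng = np.random.default_rng(0)
plate = np.repeat([0, 1], 20)
well = np.repeat(np.arange(8), 5)
X = rng.normal(size=(40, 4)) + 5.0 * plate[:, None]

naive = rank_to_group_sketch(X, well)                  # inflated by the plate effect
adjusted = rank_to_group_sketch(X, well, parent=plate)  # close to 0.5
print(naive, adjusted)
```

In the usage example, the naive well score ends up well above 0.5. The cause is that samples from the same well also share a plate, so the plate effect is misattributed to the wells. Once comparisons are restricted to samples on the same plate, the score falls back toward 0.5. This toy case reproduces the kind of hierarchical confounding the abstract describes, but its specific design choices are assumptions of this sketch.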
© 2022 The Author(s).


Keywords:  Mann-Whitney U test; bias; confounders; debiasing; experimental design; hierarchical confounders; machine learning; robustness; stem cell biology

Year:  2022        PMID: 35465234      PMCID: PMC9024009          DOI: 10.1016/j.patter.2022.100451

Source DB:  PubMed          Journal:  Patterns (N Y)        ISSN: 2666-3899


