Sample Selection Bias in Evaluation of Prediction Performance of Causal Models.

James P Long, Min Jin Ha.

Abstract

Causal models are notoriously difficult to validate because they make untestable assumptions regarding confounding. New scientific experiments offer the possibility of evaluating causal models using prediction performance. Prediction performance measures are typically robust to violations in causal assumptions. However, prediction performance does depend on the selection of training and test sets. In particular, biased training sets can lead to optimistic assessments of model performance. In this work, we revisit the prediction performance of several recently proposed causal models tested on the genetic perturbation data set of Kemmeren [5]. We find that sample selection bias is likely a key driver of model performance. We propose using a less-biased evaluation set for assessing prediction performance and compare models on this new set. In this setting, the causal models have similar or worse performance compared to standard association-based estimators such as Lasso. Finally, we compare the performance of causal estimators in simulation studies that reproduce the Kemmeren structure of genetic knockout experiments but without any sample selection bias. These results provide an improved understanding of the performance of several causal models and offer guidance on how future studies should use the Kemmeren data set.
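The point that measured prediction performance depends on which cases are evaluated can be made concrete with a toy simulation. This is a minimal, hypothetical sketch, not the authors' evaluation procedure and not based on the Kemmeren data. It assumes Gaussian true effects, a "model" whose prediction is a noisy copy of the true effect, an independent noisy measurement used as ground truth, sign agreement as the performance measure, and an arbitrary threshold of |y| > 2 defining a "strong effects only" evaluation set. The names `simulate` and `threshold` are illustrative only.

```python
import random


def simulate(n=20_000, threshold=2.0, seed=0):
    """Hypothetical toy setup (not the paper's): compare sign-agreement accuracy
    on all items versus a subset selected for large observed outcomes."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    all_hits = 0
    sel_hits = 0
    sel_n = 0
    for _ in range(n):
        mu = rng.gauss(0.0, 1.0)       # true effect of a hypothetical intervention
        x = mu + rng.gauss(0.0, 1.0)   # noisy estimate used as the model's prediction
        y = mu + rng.gauss(0.0, 1.0)   # independent noisy measurement used as ground truth
        hit = (x > 0) == (y > 0)       # did the prediction get the sign of the effect right?
        all_hits += hit
        if abs(y) > threshold:         # biased evaluation set: only "strong" observed effects
            sel_n += 1
            sel_hits += hit
    return all_hits / n, sel_hits / sel_n, sel_n


if __name__ == "__main__":
    overall, selected, size = simulate()
    print(f"sign accuracy, all items:      {overall:.3f}")
    print(f"sign accuracy, |y| > 2 subset: {selected:.3f} (n={size})")
```

Under these assumptions the prediction and the ground-truth measurement have correlation 1/2, so sign agreement over all items should be close to 1/2 + arcsin(1/2)/π = 2/3. The subset restricted to large observed effects scores noticeably higher even though the model is unchanged. This shows one simple way a non-representative evaluation set can inflate apparent performance.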

Keywords: causal inference; genetic perturbation experiments; prediction; sample selection bias

Year: 2021; PMID: 35498876; PMCID: PMC9053600; DOI: 10.1002/sam.11559

Source DB: PubMed; Journal: Stat Anal Data Min; ISSN: 1932-1864; Impact factor: 1.247


References (3 in total)

1. Large-scale genetic perturbations reveal regulatory networks and an abundance of gene-specific repressors.

Authors: Patrick Kemmeren; Katrin Sameith; Loes A L van de Pasch; Joris J Benschop; Tineke L Lenstra; Thanasis Margaritis; Eoghan O'Duibhir; Eva Apweiler; Sake van Wageningen; Cheuk W Ko; Sebastiaan van Heesch; Mehdi M Kashani; Giannis Ampatziadis-Michailidis; Mariel O Brok; Nathalie A C H Brabers; Anthony J Miles; Diane Bouwmeester; Sander R van Hooff; Harm van Bakel; Erik Sluiters; Linda V Bakker; Berend Snel; Philip Lijnzaad; Dik van Leenen; Marian J A Groot Koerkamp; Frank C P Holstege
Journal: Cell; Date: 2014-04-24; Impact factor: 41.582

2. Methods for causal inference from gene perturbation experiments and validation.

Authors: Nicolai Meinshausen; Alain Hauser; Joris M Mooij; Jonas Peters; Philip Versteeg; Peter Bühlmann
Journal: Proc Natl Acad Sci U S A; Date: 2016-07-05; Impact factor: 11.205

3. Regularization Paths for Generalized Linear Models via Coordinate Descent.

Authors: Jerome Friedman; Trevor Hastie; Rob Tibshirani
Journal: J Stat Softw; Date: 2010; Impact factor: 6.440