Uri Shaham (1), Kelly P Stanton (2,3), Jun Zhao (3), Huamin Li (4), Khadir Raddassi (5), Ruth Montgomery (6), Yuval Kluger (2,3,4)
1. Department of Statistics, Yale University, New Haven, CT 06511, USA.
2. Department of Pathology, Yale School of Medicine, New Haven, CT 06510, USA.
3. Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA.
4. Applied Mathematics Program, Yale University, New Haven, CT 06511, USA.
5. Departments of Neurology and Immunobiology.
6. Department of Internal Medicine, Yale School of Medicine, New Haven, CT 06510, USA.
Abstract
MOTIVATION: Sources of variability in experimentally derived data include measurement error in addition to the physical phenomena of interest. This measurement error combines systematic components, which originate from the measuring instrument, with random measurement errors. Several novel biological technologies, such as mass cytometry and single-cell RNA-seq (scRNA-seq), are plagued with systematic errors that may severely affect statistical analysis if the data are not properly calibrated.
RESULTS: We propose a novel deep learning approach for removing systematic batch effects. Our method is based on a residual neural network trained to minimize the Maximum Mean Discrepancy between the multivariate distributions of two replicates measured in different batches. We apply our method to mass cytometry and scRNA-seq datasets and demonstrate that it effectively attenuates batch effects.
AVAILABILITY AND IMPLEMENTATION: Our code and data are publicly available at https://github.com/ushaham/BatchEffectRemoval.git.
CONTACT: yuval.kluger@yale.edu.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
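To make the training objective concrete, the following is a minimal sketch of how a squared Maximum Mean Discrepancy between two batches could be estimated. It is not the authors' implementation: the function names `gaussian_kernel` and `mmd2`, the single-bandwidth Gaussian kernel, the value `sigma=1.0`, and the synthetic data are all assumptions made for illustration, and the residual network that would be trained against this quantity is not shown.

```python
import numpy as np


def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    # Pairwise squared Euclidean distances via the expansion |a|^2 + |b|^2 - 2ab.
    sq = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    sq = np.maximum(sq, 0.0)  # guard against tiny negative values from rounding
    return np.exp(-sq / (2.0 * sigma**2))


def mmd2(x, y, sigma=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    k_xx = gaussian_kernel(x, x, sigma).mean()
    k_yy = gaussian_kernel(y, y, sigma).mean()
    k_xy = gaussian_kernel(x, y, sigma).mean()
    return k_xx + k_yy - 2.0 * k_xy


# Synthetic stand-ins for two replicates (hypothetical data, not from the paper):
# batch_b is drawn from the same distribution as batch_a, shifted by 0.5 per marker.
rng = np.random.default_rng(0)
batch_a = rng.normal(0.0, 1.0, size=(200, 5))
batch_b = rng.normal(0.5, 1.0, size=(200, 5))

same = mmd2(batch_a, batch_a)      # a sample compared with itself
shifted = mmd2(batch_a, batch_b)   # a sample compared with a shifted batch
```

In this sketch the estimate is essentially zero when a sample is compared with itself and larger when one batch is shifted, which is the kind of signal a calibration network could be trained to reduce.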