Md Bahadur Badsha (1), Rui Li (1), Boxiang Liu (2), Yang I Li (3), Min Xian (4), Nicholas E Banovich (5), Audrey Qiuyan Fu (1).
1. Department of Statistical Science, Institute for Bioinformatics and Evolutionary Studies, Institute for Modeling Collaboration & Innovation, University of Idaho, Moscow, ID 83844, USA.
2. Department of Biology, Stanford University, Stanford, CA 94305, USA.
3. Section of Genetic Medicine, University of Chicago, Chicago, IL 60637, USA.
4. Department of Computer Science, University of Idaho, Idaho Falls, ID 83401, USA.
5. The Translational Genomics Research Institute, Phoenix, AZ 85004, USA.
Abstract
BACKGROUND: Single-cell RNA-sequencing (scRNA-seq) is a rapidly evolving technology that enables measurement of gene expression levels at an unprecedented resolution. Despite the explosive growth in the number of cells that can be assayed in a single experiment, scRNA-seq still has several limitations. One is a high rate of dropouts, which leave a large number of genes with zero read counts in the scRNA-seq data and complicate downstream analyses.

METHODS: To overcome this problem, we treat zeros as missing values and develop nonparametric deep learning methods for imputation. Specifically, our LATE (Learning with AuToEncoder) method trains an autoencoder with random initial values of the parameters, whereas our TRANSLATE (TRANSfer learning with LATE) method further allows for the use of a reference gene expression data set to provide LATE with an initial set of parameter estimates.

RESULTS: On both simulated and real data, LATE and TRANSLATE outperform existing scRNA-seq imputation methods, achieving lower mean squared error in most cases, recovering nonlinear gene-gene relationships, and better separating cell types. They are also highly scalable and can efficiently process over 1 million cells in just a few hours on a GPU.

CONCLUSIONS: We demonstrate that our nonparametric approach to imputation based on autoencoders is powerful and highly efficient.
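The idea described under METHODS can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the network architecture, loss, or optimizer, so the single ReLU hidden layer, the mean squared error restricted to nonzero entries (one natural reading of "treat zeros as missing values"), plain gradient descent, the choice to keep observed values unchanged in the output, and all names (`init_params`, `train`, `impute`) and the simulated data are assumptions made for illustration. Training from random values mimics the LATE setting, while reusing parameters fitted on a reference matrix as the starting point mimics the TRANSLATE setting.

```python
import numpy as np

def init_params(n_genes, n_hidden, rng):
    """Random initial parameter values (the LATE-style starting point)."""
    return {
        "W1": rng.normal(0.0, 0.1, (n_genes, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_genes)),
        "b2": np.zeros(n_genes),
    }

def forward(X, p):
    H = np.maximum(0.0, X @ p["W1"] + p["b1"])  # ReLU hidden layer
    return H, H @ p["W2"] + p["b2"]             # linear reconstruction

def train(X, params, n_epochs=1000, lr=0.02):
    """Gradient descent on the MSE over nonzero entries only (assumed loss)."""
    p = {k: v.copy() for k, v in params.items()}
    mask = (X > 0).astype(float)  # zeros are treated as missing
    n_obs = mask.sum()
    losses = []
    for _ in range(n_epochs):
        H, Y = forward(X, p)
        resid = (Y - X) * mask
        losses.append(float((resid ** 2).sum() / n_obs))
        dY = 2.0 * resid / n_obs
        dZ = (dY @ p["W2"].T) * (H > 0)  # backpropagate through the ReLU
        grads = {"W1": X.T @ dZ, "b1": dZ.sum(axis=0),
                 "W2": H.T @ dY, "b2": dY.sum(axis=0)}
        for k in p:
            p[k] -= lr * grads[k]
    return p, losses

def impute(X, p):
    """Fill zeros with the (non-negative) reconstruction; keep observed values."""
    _, Y = forward(X, p)
    return np.where(X > 0, X, np.maximum(Y, 0.0))

# Simulated log-scale expression: 200 cells x 20 genes from 3 latent factors.
rng = np.random.default_rng(0)
loadings = 0.5 * rng.normal(size=(3, 20))

def simulate(n_cells):
    # Softplus keeps every true expression value strictly positive.
    return np.log1p(np.exp(rng.normal(size=(n_cells, 3)) @ loadings))

reference = simulate(200)                       # hypothetical reference data
truth = simulate(200)
X = truth * (rng.random(truth.shape) > 0.3)     # about 30% simulated dropouts

start = init_params(n_genes=20, n_hidden=8, rng=rng)
late_params, late_losses = train(X, start)                 # random start
ref_params, _ = train(reference, start)                    # fit on reference
translate_params, translate_losses = train(X, ref_params)  # warm start
X_imputed = impute(X, translate_params)
```

In this toy setting the warm start simply begins training from parameters that already reconstruct similar data, which is the intuition behind transferring estimates from a reference data set; how much it helps on real data depends on how closely the reference matches the target.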
Keywords:
autoencoder; deep learning; gene expression; single-cell