Chong Chen (1,2), Changjing Wu (1), Linjie Wu (1), Xiaochen Wang (1), Minghua Deng (1,3,4), Ruibin Xi (1,3).
(1) School of Mathematical Sciences, Peking University, Beijing, China.
(2) Damo Academy, Alibaba Group, Beijing, China.
(3) Center for Statistical Sciences, Peking University, Beijing, China.
(4) Center for Quantitative Biology, Peking University, Beijing, China.
Abstract
MOTIVATION: Single-cell RNA sequencing (scRNA-seq) technology enables whole-transcriptome profiling at single-cell resolution and holds great promise for many biological and medical applications. Nevertheless, scRNA-seq often fails to capture expressed genes, leading to the prominent dropout problem. These dropouts cause many problems in downstream analysis, such as a significant increase in noise, loss of power in differential expression analysis, and obscured gene-to-gene or cell-to-cell relationships. Imputing these dropout values can therefore benefit scRNA-seq data analysis.
RESULTS: In this paper, we model the dropout imputation problem as a robust matrix decomposition. This model makes minimal assumptions and allows us to develop a computationally efficient imputation method called scRMD. Extensive data analysis shows that scRMD can accurately recover the dropout values and help improve downstream analyses such as differential expression analysis and clustering.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. The R package scRMD is available at https://github.com/XiDsLab/scRMD.
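To make the idea of imputation by robust matrix decomposition concrete, the following is a minimal illustrative sketch in Python with NumPy. It is not the scRMD implementation and does not reproduce the R package's interface. It assumes one common formulation that the abstract does not spell out: the observed expression matrix Y is approximated as a low-rank matrix L minus a sparse, non-negative dropout matrix S that is supported only on the zero entries of Y. The objective, the function names (`svd_soft_threshold`, `rmd_impute_sketch`) and the parameters (`tau`, `lam`, `n_iter`) are all hypothetical choices made for illustration.

```python
import numpy as np


def svd_soft_threshold(M, tau):
    """Shrink the singular values of M by tau (proximal step for the nuclear norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s = np.maximum(s - tau, 0.0)
    return (U * s) @ Vt


def rmd_impute_sketch(Y, tau=1.0, lam=0.1, n_iter=100):
    """Illustrative low-rank + sparse decomposition (NOT the scRMD algorithm).

    Assumed objective (an assumption of this sketch):
        minimize 0.5 * ||Y - L + S||_F^2 + tau * ||L||_* + lam * ||S||_1
        subject to S >= 0 and S = 0 wherever Y is nonzero.
    Solved by alternating exact minimization over L and S.
    """
    Y = np.asarray(Y, dtype=float)
    zero_mask = (Y == 0)          # candidate dropout positions
    S = np.zeros_like(Y)          # estimated dropout amounts
    L = Y.copy()                  # low-rank estimate of the expression matrix
    for _ in range(n_iter):
        # L-step: singular value thresholding of the dropout-corrected data.
        L = svd_soft_threshold(Y + S, tau)
        # S-step: non-negative soft-thresholding, restricted to zero entries of Y.
        S = np.where(zero_mask, np.maximum(L - Y - lam, 0.0), 0.0)
    # Observed nonzero values are kept; zeros receive the estimated dropout.
    imputed = Y + S
    return imputed, L, S


# Toy usage: a rank-1 "expression" matrix with a few entries zeroed out.
rng = np.random.default_rng(0)
truth = np.outer(rng.uniform(1, 3, size=8), rng.uniform(1, 3, size=6))
Y_obs = truth.copy()
Y_obs[[0, 3, 5], [1, 2, 4]] = 0.0   # simulated dropouts
imputed, L_hat, S_hat = rmd_impute_sketch(Y_obs, tau=0.5, lam=0.05)
```

In this sketch the low-rank term captures the shared structure across cells and genes, while the sparse term absorbs entries that look like dropouts. The thresholds `tau` and `lam` would need to be tuned for real data.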