Samarendra Das (1), Shesh N Rai (2).
1. Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India; Biostatistics and Bioinformatics Facility, JG Brown Cancer Center, University of Louisville, Louisville, KY 40202, USA; School of Interdisciplinary and Graduate Studies, University of Louisville, Louisville, KY 40292, USA. Electronic address: samarendra.das@louisville.edu.
2. Biostatistics and Bioinformatics Facility, JG Brown Cancer Center, University of Louisville, Louisville, KY 40202, USA; School of Interdisciplinary and Graduate Studies, University of Louisville, Louisville, KY 40292, USA; Hepatobiology and Toxicology Center, University of Louisville, Louisville, KY 40202, USA; Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY 40202, USA; Biostatistics and Informatics Facility, Center for Integrative Environmental Research Sciences, University of Louisville, Louisville, KY 40202, USA; Christina Lee Brown Envirome Institute, University of Louisville, Louisville, KY 40202, USA. Electronic address: shesh.rai@louisville.edu.
Abstract
Single-cell RNA sequencing (scRNA-seq) is a powerful technology capable of generating gene expression data at the resolution of individual cells. scRNA-seq data are characterized by dropout events, which severely bias results if left unaddressed. Few differential expression (DE) approaches account, in the modeling process, for the biological processes that lead to dropout events. We therefore develop SwarnSeq, an improved method for DE and other downstream analyses that considers the molecular capture process in scRNA-seq data modeling. The proposed method is benchmarked against 11 existing methods on 10 real scRNA-seq datasets under three comparison settings. We demonstrate that SwarnSeq performs better than the 11 existing methods, and this improvement is consistently observed across several public scRNA-seq datasets generated using different scRNA-seq protocols. External spike-in data can be used in SwarnSeq to enhance its performance. AVAILABILITY AND IMPLEMENTATION: The method is implemented as a publicly available R package at https://github.com/sam-uofl/SwarnSeq. Published by Elsevier Inc.