Sam Tracy, Guo-Cheng Yuan, Ruben Dries.
Abstract
BACKGROUND: Single-cell RNA-sequencing technologies provide a powerful tool for systematic dissection of cellular heterogeneity. However, the prevalence of dropout events complicates data analysis and, despite numerous efforts from the community, this challenge has yet to be solved.
Keywords: Bootstrap; Dropout; Imputation; RNA-seq; Single-cell
Year: 2019 PMID: 31299886 PMCID: PMC6624880 DOI: 10.1186/s12859-019-2977-0
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Motivation for the RESCUE imputation pipeline, illustrated with a hypothetical example of simulated data. a Heatmap of a log-transformed normalized expression matrix with cell type clustering affected by dropout. b t-SNE visualizations of cell clusters determined with the principal components of many subsamples of informative genes, and a histogram showing the bootstrap distribution of the within-cluster non-zero gene expression means for one missing expression value in the data set. c Heatmap of the expression data after imputing zero values with a summary statistic of the bootstrap distributions
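The steps in this caption (subsample genes, reduce to principal components, cluster cells, take within-cluster non-zero means, summarize across bootstraps) can be sketched as follows. This is a minimal illustration under stated assumptions, not the RESCUE implementation: the function names `kmeans` and `bootstrap_impute`, the use of k-means as the clustering step, gene subsampling without replacement, the parameter defaults, and the mean as the summary statistic are all choices made here for illustration.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=20):
    """Tiny k-means; a stand-in (assumption) for the pipeline's clustering step."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every cell to every center.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def bootstrap_impute(expr, k=2, n_boot=20, frac=0.6, n_pcs=2, seed=0):
    """Fill zeros in a non-negative cells x genes matrix with bootstrap estimates."""
    rng = np.random.default_rng(seed)
    n_cells, n_genes = expr.shape
    total = np.zeros((n_cells, n_genes))
    for _ in range(n_boot):
        # Subsample genes (assumed: without replacement, fixed fraction).
        n_sub = max(n_pcs, int(frac * n_genes))
        genes = rng.choice(n_genes, size=n_sub, replace=False)
        sub = expr[:, genes] - expr[:, genes].mean(axis=0)
        # Principal components of the subsample via SVD.
        u, s, _ = np.linalg.svd(sub, full_matrices=False)
        pcs = u[:, :n_pcs] * s[:n_pcs]
        labels = kmeans(pcs, k, rng)
        for j in range(k):
            members = labels == j
            if not members.any():
                continue
            block = expr[members]
            counts = (block > 0).sum(axis=0)
            sums = block.sum(axis=0)  # zeros add nothing to the sum
            # Within-cluster mean over non-zero entries, per gene.
            means = np.divide(sums, counts, out=np.zeros(n_genes), where=counts > 0)
            total[members] += means
    summary = total / n_boot  # mean of the bootstrap distribution
    imputed = expr.copy()
    zeros = expr == 0
    imputed[zeros] = summary[zeros]  # observed values are left untouched
    return imputed
```

Only entries that are exactly zero are replaced, so observed expression values pass through unchanged. Swapping the mean for another summary of the bootstrap distribution would require storing the per-bootstrap estimates rather than a running total.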
Fig. 2 Estimation bias after imputing simulated data (Additional file 14: Table S1; Primary). a Scatter plots compare the true transcript counts (x-axis) to estimated counts (y-axis) for those lost to dropout. The red diagonal indicates unbiased estimation. b The percent absolute error for all missing counts. c The percent error for counts specific to the top ten marker genes across cell types. The dashed lines indicate 100% error, or no improvement over dropout
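To see why 100% error corresponds to no improvement over dropout, here is a short sketch of one natural reading of percent absolute error. The exact formula is an assumption, since the caption does not state it, and `percent_abs_error` is a hypothetical helper name.

```python
import numpy as np

def percent_abs_error(true_counts, estimated):
    """Absolute error as a percentage of the true (non-zero) count.

    Assumed definition: 100 * |estimate - truth| / truth.
    """
    true_counts = np.asarray(true_counts, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    return 100.0 * np.abs(estimated - true_counts) / true_counts

# A count left at zero after dropout scores exactly 100%,
# so any useful imputation must fall below that line.
errors = percent_abs_error([10, 4], [0, 5])
```

Under this definition, an estimate can also exceed 100% error when it overshoots the true count by more than the count itself.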
Fig. 3 Data visualization and cell-type clustering before and after imputing simulated data (Additional file 14: Table S1; Primary). a t-SNE visualization of the original data labeled by cell type. b t-SNE after dropout. c t-SNE after application of RESCUE. d t-SNE after application of scImpute. e t-SNE after application of DrImpute. f The percent improvement after imputation over the data containing dropout in similarity measures between known cell types and clustering results
Fig. 4 Estimation bias and recovery of differential expression after imputing the MCA uterus tissue data. a The percent absolute error for all missing counts. b The percent error for counts specific to top marker genes across cell types. Above 100% indicates no improvement over the data containing simulated dropout. c Log-fold changes in the two most differentially expressed marker genes for each cell type that went undetected after dropout
Fig. 5 Data visualization and cell-type clustering before and after imputing the MCA data. a t-SNE visualization of the original uterus tissue data labeled by cell type. b t-SNE after dropout. c t-SNE after application of RESCUE. d t-SNE after application of scImpute. e t-SNE after application of DrImpute. f The percent improvement after imputation over the data containing dropout in similarity measures between known cell types and clustering results for all four tissue types