Wenbin Ye, Guoli Ji, Pengchao Ye, Yuqi Long, Xuesong Xiao, Shuchao Li, Yaru Su, Xiaohui Wu.
Abstract
BACKGROUND: Single-cell RNA-sequencing (scRNA-seq) is rapidly becoming a powerful tool for profiling genome-scale transcriptomes of individual cells and capturing transcriptome-wide cell-to-cell variability. However, scRNA-seq technologies suffer from high levels of technical noise and variability, which hinder reliable quantification of lowly and moderately expressed genes. Since most downstream analyses of scRNA-seq data, such as cell type clustering and differential expression analysis, rely on the gene-cell expression matrix, preprocessing is a critical preliminary step in scRNA-seq data analysis.
Keywords: Cell type clustering; Dropout imputation; Network propagation; Similarity measurement; Single cell RNA-sequencing
Year: 2019 PMID: 31068142 PMCID: PMC6505295 DOI: 10.1186/s12864-019-5747-5
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Schematic diagram of the scNPF framework. scNPF consists of two modules: scNPF-propagation, which imputes dropouts, and scNPF-fusion, which fuses multiple smoothed expression matrices into a cell-to-cell similarity matrix. Outputs from scNPF-propagation and scNPF-fusion can be used for downstream analyses of scRNA-seq data, such as visualization, dimension reduction, clustering, and lineage reconstruction
Fig. 2 Benchmarking of scNPF-propagation on eight published scRNA-seq data sets. Clustering is performed by applying a consensus clustering method called SC3 to the imputed expression matrices. SC3 clustering is repeated 10 times. Each dot represents an individual SC3 clustering run and each bar represents the median performance. The adjusted Rand index (ARI) is employed to measure the concordance between inferred and true cluster labels. Detailed information on the data sets is shown in Additional file 2: Table S1
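The evaluation protocol described in this caption (score each clustering run with ARI, then report the median over runs) can be sketched as follows. This is a minimal standard-library illustration, not the authors' benchmarking code: the function names `adjusted_rand_index` and `median_ari` are hypothetical, and the label vectors in the usage note are made-up toy values rather than SC3 output.

```python
from collections import Counter
from math import comb
from statistics import median


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index between two labelings of the same cells."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label vectors must have the same length")
    total_pairs = comb(len(true_labels), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: the labelings trivially agree

    # Contingency counts and their marginals.
    cell_counts = Counter(zip(true_labels, pred_labels))
    row_counts = Counter(true_labels)
    col_counts = Counter(pred_labels)

    # Number of cell pairs placed together by both, by truth, by prediction.
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_rows = sum(comb(c, 2) for c in row_counts.values())
    sum_cols = sum(comb(c, 2) for c in col_counts.values())

    expected = sum_rows * sum_cols / total_pairs  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings use a single cluster
    return (sum_cells - expected) / (max_index - expected)


def median_ari(true_labels, runs):
    """Median ARI over repeated clustering runs (one label vector per run)."""
    return median(adjusted_rand_index(true_labels, run) for run in runs)
```

For example, `median_ari([0, 0, 1, 1], [[1, 1, 0, 0], [0, 0, 1, 1], [0, 1, 0, 1]])` scores three toy runs against the same reference labels; ARI is invariant to how clusters are numbered, so the first two runs count as perfect matches.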
Fig. 3 Benchmarking of scNPF-propagation on eight published scRNA-seq data sets using different propagation modes and/or prior networks. Clustering is performed by applying SC3 to the imputed expression matrices. SC3 clustering is repeated 10 times. Each dot represents an individual SC3 clustering run and each bar represents the median performance. ARI is employed to measure the concordance between inferred and true cluster labels. Detailed information on the data sets is shown in Additional file 2: Table S1
Fig. 4 Benchmark results of scNPF-fusion on the Darmanis data. a Heatmaps for similarities learned from the data by Euclidean distances, pairwise Pearson correlations, SIMLR, scNPF-fusion, and RAFSIL. The scales in relative units denote the similarity. Cells with the same cell type (annotated by the colored axes) are grouped together. b t-SNE visualization for similarity matrices learned from different similarity measures. Each point denotes a cell. A smaller distance between two cells indicates higher similarity. True labels were not used as inputs for dimension reduction but are shown in distinct colors to validate the results. RAFSIL1 and RAFSIL2 denote the results from the RAFSIL tool with the embedded RAFSIL1 or RAFSIL2 method
Fig. 5 Performance comparison of the five similarity measurements on eight published scRNA-seq data sets. a The Dunn index, an internal validation metric, is employed to measure cell separation. b ARI is employed to measure the concordance between inferred and true cluster labels. K-means clustering is applied to the similarity matrices obtained from different methods. K-means clustering is repeated 10 times. Each dot represents an individual K-means clustering run and each bar represents the median performance. Detailed information on the data sets is shown in Additional file 2: Table S1
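For readers unfamiliar with the Dunn index used in panel a, the sketch below shows one common formulation: the smallest distance between points in different clusters divided by the largest within-cluster diameter, so higher values indicate more compact, better separated clusters. It is an illustration under stated assumptions, not the authors' implementation: it takes cells as Euclidean coordinate tuples (the paper evaluates learned similarity matrices, which would first need converting to distances), and the function name `dunn_index` is hypothetical.

```python
from itertools import combinations
from math import dist


def dunn_index(points, labels):
    """Dunn index: min between-cluster distance / max within-cluster diameter."""
    # Group the points by cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the Dunn index needs at least two clusters")

    # Largest distance between any two points that share a cluster.
    max_diameter = max(
        (dist(a, b)
         for members in clusters.values()
         for a, b in combinations(members, 2)),
        default=0.0,
    )

    # Smallest distance between any two points in different clusters.
    min_separation = min(
        dist(a, b)
        for members_a, members_b in combinations(clusters.values(), 2)
        for a in members_a
        for b in members_b
    )

    if max_diameter == 0.0:
        return float("inf")  # every cluster is a single point or duplicates
    return min_separation / max_diameter
```

With two tight toy clusters, `dunn_index([(0, 0), (0, 1), (5, 0), (5, 1)], [0, 0, 1, 1])` divides the closest cross-cluster distance (5) by the widest cluster diameter (1). Other variants of the index use centroid distances or average diameters; which variant the paper used is not stated in this excerpt.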
Fig. 6 Performance comparison of similarities learned from scNPF-fusion with different network combinations on eight published scRNA-seq data sets. a The Dunn index, an internal validation metric, is employed to measure cell separation. b ARI is employed to measure the concordance between inferred and true cluster labels. K-means clustering is applied to the similarity matrices obtained from different methods. K-means clustering is repeated 10 times. Each dot represents an individual K-means clustering run and each bar represents the median performance. Detailed information on the data sets is shown in Additional file 2: Table S1