Rui-Yi Li, Jihong Guan, Shuigeng Zhou.
Abstract
BACKGROUND: The rapid development of single-cell RNA sequencing (scRNA-seq) enables the exploration of cell heterogeneity, which is usually done by clustering scRNA-seq data. In essence, scRNA-seq data clustering groups cells by measuring the similarities among the genes/transcripts of cells. The selection of features for evaluating cell similarity is therefore of great importance, as it significantly affects both clustering effectiveness and efficiency.
Keywords: Clustering; Feature weighting; Single cell RNA sequencing; feature selection
Year: 2021 PMID: 34078287 PMCID: PMC8171019 DOI: 10.1186/s12859-021-04033-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
A summary of 8 scRNA-seq datasets
| Datasets | #Cells | #Clusters | #Genes | Unit | Sequencing protocol |
|---|---|---|---|---|---|
| GSE59892 | 49 | 3 | 25737 | FPKM | Smart-seq |
| GSE36552 | 90 | 7 | 19595 | FPKM | Tang et al. |
| E-MTAB-3321 | 124 | 5 | 28223 | CPM | Smart-Seq2 |
| GSE51372 | 187 | 7 | 15584 | FPKM | Tang et al. |
| E-MTAB-2600 | 704 | 3 | 21231 | CPM | Smart-Seq2 |
| GSE108097 | 2746 | 16 | 20670 | UMI | Microwell-seq |
| GSE60361 | 3005 | 9 | 19972 | UMI | Islam et al. |
| SRP073767 | 4271 | 8 | 16449 | UMI | 10X |
Feature selection results of the 8 scRNA-seq datasets
| Datasets | #Genes | #Genes-S1 | #Genes-S2 |
|---|---|---|---|
| GSE59892 | 25737 | 9894 (38.4%) | 765 (3.0%) |
| GSE36552 | 19595 | 9786 (49.9%) | 283 (1.4%) |
| E-MTAB-3321 | 28223 | 9948 (35.2%) | 904 (3.2%) |
| E-MTAB-2600 | 30768 | 9897 (32.2%) | 981 (3.2%) |
| GSE51372 | 29018 | 3922 (28.5%) | 173 (0.7%) |
| GSE60361 | 19972 | 6740 (33.7%) | 79 (0.4%) |
| GSE108097 | 20670 | 8814 (42.6%) | 921 (4.5%) |
| SRP073767 | 16653 | 8997 (54.0%) | 830 (5.0%) |
DBI values of datasets before and after feature selection
| Datasets | Genes-all | Genes-S1 | Genes-S2 |
|---|---|---|---|
| GSE59892 | 2.03 | 2.12 | 1.81 |
| GSE36552 | 1.99 | 2.15 | 1.51 |
| E-MTAB-3321 | 3.31 | 3.20 | 3.15 |
| E-MTAB-2600 | 7.80 | 7.69 | 7.59 |
| GSE51372 | 4.48 | 4.08 | 2.87 |
| GSE60361 | 5.95 | 5.93 | 3.80 |
| GSE108097 | 7.98 | 7.49 | 5.59 |
| SRP073767 | 10.85 | 10.04 | 6.05 |
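
The table above compares DBI values across feature sets, where a lower value indicates more compact and better-separated clusters. As a minimal sketch, assuming DBI here denotes the Davies-Bouldin index, the comparison can be reproduced in spirit with scikit-learn's `davies_bouldin_score`. The data below are synthetic, and the "noise genes" and variable names are illustrative assumptions rather than anything taken from the paper or from CaFew itself.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

# Synthetic stand-in for an expression matrix (cells x genes); NOT data from the paper.
# 300 "cells", 50 informative "genes", 3 known clusters.
X_informative, labels = make_blobs(
    n_samples=300, n_features=50, centers=3, random_state=0
)

# Append 200 pure-noise features to mimic uninformative genes (an assumption
# made for illustration only).
rng = np.random.default_rng(0)
noise = rng.normal(scale=10.0, size=(300, 200))
X_all = np.hstack([X_informative, noise])

# Davies-Bouldin index of the same labelling under each feature set.
dbi_all = davies_bouldin_score(X_all, labels)               # all features
dbi_selected = davies_bouldin_score(X_informative, labels)  # informative features only

print(f"DBI (all features):      {dbi_all:.2f}")
print(f"DBI (selected features): {dbi_selected:.2f}")
```

In this toy setting, dropping the noise features lowers the index because within-cluster scatter shrinks while the distances between cluster centroids stay roughly the same. That is the same direction of change the Genes-S2 column shows relative to Genes-all.
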
Fig. 1 Results of 5 traditional clustering algorithms on 8 scRNA-seq datasets before (Genes-all) and after (Genes-CaFew) feature selection
Fig. 2 Results of clustering methods specifically for scRNA-seq data. The value in the parentheses following each method’s name in the legend is the average ARI
Fig. 3 Two-dimensional visualization results of 4 scRNA-seq datasets before (Genes-all) and after (Genes-CaFew) feature selection. a t-SNE visualization, b UMAP visualization
Fig. 4 The pipeline of CaFew