Jiayi Dong, Yin Zhang, Fei Wang.
Abstract
BACKGROUND: Modern sequencing technology now yields hundreds of thousands of single-cell RNA-sequencing (scRNA-seq) profiles, which make it possible to explore heterogeneity at the cell level, but the data are high-dimensional and highly sparse. Dimensionality reduction is essential for downstream analysis, such as clustering to identify cell subpopulations. Dimensionality reduction usually follows an unsupervised approach.
Keywords: Autoencoder; Dimensionality reduction; Fine-tuning; Semi-supervised
Year: 2022 PMID: 35513780 PMCID: PMC9069784 DOI: 10.1186/s12859-022-04703-0
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Fig. 1 Framework of scSemiAE. (1) Annotation: predicting cell types with a classifier and labeling the subset of cells predicted with high confidence; (2) Pretraining: training an autoencoder on all cells; (3) Fine-tuning: adjusting the encoder weights using the labeled cells
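Step (1) of the framework keeps only confidently predicted cells as labeled data. The caption does not say how confidence is measured, so the following is a minimal sketch under stated assumptions: confidence is taken to be the classifier's highest class probability, and the function name `select_confident_labels` and the `threshold` value of 0.9 are hypothetical, not taken from scSemiAE.

```python
def select_confident_labels(probabilities, threshold=0.9):
    """Return {cell_index: predicted_class} for confidently predicted cells.

    probabilities: one list of class probabilities per cell.
    threshold: hypothetical cutoff on the top class probability (assumption).
    """
    labeled = {}
    for cell_index, probs in enumerate(probabilities):
        # Index of the most probable class for this cell.
        best_class = max(range(len(probs)), key=probs.__getitem__)
        if probs[best_class] >= threshold:
            labeled[cell_index] = best_class
    return labeled


# Toy classifier output for three cells and three classes (made-up numbers).
probs = [
    [0.97, 0.02, 0.01],  # confident: class 0
    [0.40, 0.35, 0.25],  # ambiguous: left unlabeled
    [0.05, 0.93, 0.02],  # confident: class 1
]
confident = select_confident_labels(probs, threshold=0.9)
```

Cells that fall below the cutoff stay unlabeled; according to the caption they still take part in pretraining, which uses all cells, while only the labeled cells are used for fine-tuning.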
Fig. 2 Change in ARI (Louvain and K-means) and ACC (kNN) values as the labeled proportion increases, for six methods. a On the Cortex dataset; b on the Heart dataset; c on the Limb Muscle dataset; d on the Embryos dataset
Fig. 3 Change in ARI (Louvain and K-means) values as the number of labeled cell subpopulations increases, for three semi-supervised methods. a On the Cortex dataset; b on the Heart dataset; c on the Limb Muscle dataset; d on the Embryos dataset
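The ARI reported in Figs. 2 and 3 is the adjusted Rand index, which compares a clustering with reference labels and corrects for chance agreement. As a hedged illustration of the standard pair-counting formula (not the authors' evaluation code), it can be computed with the Python standard library alone; the function name `adjusted_rand_index` is chosen here for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same cells."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare; treated as perfect agreement

    # Contingency counts: cells sharing both a true label and a cluster.
    pair_counts = Counter(zip(labels_true, labels_pred))
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_pairs - expected) / (max_index - expected)


# Same grouping under different label names gives a perfect score.
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# A clustering that splits every true group scores below zero.
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

A value of 1 means the clustering matches the reference subpopulations exactly, values near 0 correspond to chance-level agreement, and negative values indicate agreement worse than chance.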
Details of all datasets used
| Dataset | # Cells | # Cell subpopulations | # Genes | # Cells in each subpopulation |
|---|---|---|---|---|
| Cortex | 3005 | 7 | 19972 | 939, 820, 399, 290, 235, 224, 98 |
| Heart | 4433 | 11 | 23341 | 775, 458, 344, 127, 100, 93, 58, 47, 41, 8 |
| Limb Muscle | 1521 | 6 | 23341 | 683, 354, 205, 172, 70, 37 |
| Embryos | 1529 | 5 | 24557 | 466, 415, 377, 190, 81 |
| Pancreas | 6321 | 13 | 34363 | 2281, 1172, 1065, 711, 405, 359, 180, 61, 24, 20, 17, 14, 12 |
Fig. 4 a Change in ARI (Louvain) and ACC (kNN) values as the labeled proportion increases, for six methods on the Pancreas dataset; b change in ARI (Louvain) values as the number of labeled cell subpopulations increases, for three semi-supervised methods on the Pancreas dataset
Fig. 5 Comparison on the Pancreas dataset. a Visualization of the raw data; b visualization of the scSemiAE embedding; c visualization of the netAE embedding
Fig. 6 Comparison on the Embryos dataset. a Visualization of the raw data; b visualization of the scSemiAE embedding; c visualization of the netAE embedding; d visualization of the scANVI embedding