Hongyu Chen, Yang Lv, Xinxin Yin, Xi Chen, Qinjie Chu, Qian-Hao Zhu, Longjiang Fan, Longbiao Guo.
Abstract
Single-cell RNA (scRNA) profiling, or scRNA-sequencing (scRNA-seq), makes it possible to investigate in parallel the diverse molecular features of multiple cell types in a given plant tissue and to uncover cell developmental processes. In this study, we evaluated the effects of sample size (i.e., cell number) on the outcome of single-cell transcriptome analysis by sampling different numbers of cells from a pool of ~57,000 Arabidopsis thaliana root cells integrated from five published studies. Our results indicated that the most significant principal components were obtained when 20,000-30,000 cells were sampled; that relatively reliable cell clustering was achieved with ~20,000 cells, with little further improvement from using more cells; that 96% of the differentially expressed genes were successfully identified with no more than 20,000 cells; and that a relatively stable pseudotime could be estimated in the subsample with 5,000 cells. Together, these results provide a general guide for optimizing the sample size used in plant scRNA-seq studies.
Keywords: Arabidopsis thaliana; cell number; sampling coverage; single-cell RNA (scRNA)
Year: 2021 PMID: 34698115 PMCID: PMC8929096 DOI: 10.3390/cimb43030119
Source DB: PubMed Journal: Curr Issues Mol Biol ISSN: 1467-3037 Impact factor: 2.976
Figure 1. Distribution of sample sizes (i.e., cell numbers) in 1244 recent studies on single-cell RNA profiling. "Technique" refers to the method used to profile single-cell gene expression. The numbers in the rounded boxes give the cell numbers used in plant single-cell studies. The distribution of the 1244 studies across species is also shown in Supplementary Materials Table S2.
Figure 2. Numbers of significant principal components (PCs) in the nine subsamples with different cell numbers.
Figure 3. Effect of cell numbers on cell clustering. (A) t-SNE maps of the nine subsamples with different cell numbers. (B) Change in the number of cell clusters as the cell number increases. (C) Change in the annotated cell types as the cell number increases. (D) Composition of the cell types annotated in the different subsamples. (E) Scores of the five statistical measures used to evaluate cluster similarity across sample sizes. The five indices are the Rand index (RI), Morey and Agresti's adjusted Rand index (MA), Hubert and Arabie's adjusted Rand index (HA), the Fowlkes and Mallows index (FM), and the Jaccard index (JI).
Figure 4. Effect of cell numbers on cell trajectory inference. (A) UMAP plots of the nine subsamples with different cell numbers. (B) Density distribution of meristem and hair cells along the pseudotime axis in the nine subsamples with different cell numbers. (C) Number of genes differentially expressed as a function of pseudotime.
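
The cluster-similarity indices named in the Figure 3 caption are all pair-counting measures: each compares two clusterings of the same cells by asking, for every pair of cells, whether the pair is grouped together in both, in only one, or in neither. The sketch below illustrates four of them (RI, HA, FM, JI) from their standard textbook definitions. It is not the authors' code; the function names and the toy labels are hypothetical, and Morey and Agresti's index (MA) is left out.

```python
from itertools import combinations
from math import sqrt


def pair_counts(labels_a, labels_b):
    """Classify every pair of cells by how the two clusterings treat it.

    Returns (n11, n10, n01, n00):
      n11 = pair together in both clusterings
      n10 = together in A only, n01 = together in B only
      n00 = separated in both clusterings
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same cells")
    n11 = n10 = n01 = n00 = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            n11 += 1
        elif same_a:
            n10 += 1
        elif same_b:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00


def similarity_indices(labels_a, labels_b):
    """Return RI, HA (Hubert-Arabie adjusted Rand), FM and JI as a dict."""
    n11, n10, n01, n00 = pair_counts(labels_a, labels_b)
    total = n11 + n10 + n01 + n00
    if total == 0:
        raise ValueError("need at least two cells")

    def ratio(num, den):
        # Degenerate case (e.g. both clusterings trivial): treat as
        # perfect agreement. This convention is an assumption.
        return num / den if den else 1.0

    ha_den = (n11 + n10) * (n10 + n00) + (n11 + n01) * (n01 + n00)
    return {
        "RI": (n11 + n00) / total,
        "HA": ratio(2 * (n11 * n00 - n10 * n01), ha_den),
        "FM": ratio(n11, sqrt((n11 + n10) * (n11 + n01))),
        "JI": ratio(n11, n11 + n10 + n01),
    }


# Toy example: four cells, where the second clustering splits one cluster.
full_sample = [0, 0, 1, 1]
subsample = [0, 0, 1, 2]
scores = similarity_indices(full_sample, subsample)
```

The explicit loop over pairs is quadratic in the number of cells, so it is suitable only for illustration; for subsamples of tens of thousands of cells, the same counts would normally be derived from the contingency table of the two label vectors.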