Yinghao Cao, Xiaoyue Wang, Gongxin Peng.
Abstract
Most current methods annotate cell types manually after clustering single-cell RNA sequencing (scRNA-seq) data. Such methods are labor-intensive and rely heavily on user expertise, which may lead to inconsistent results. We present SCSA, an automatic tool for annotating cell types from scRNA-seq data, based on a score annotation model that combines differentially expressed genes (DEGs) with confidence levels of cell markers drawn from both known and user-defined information. Evaluation on real scRNA-seq datasets from different sources, in comparison with other methods, shows that SCSA assigns cells to the correct types in a fully automated mode with desirable precision.
Keywords: CellMarker database; cell type annotation; differentially expressed genes; score annotation model; single-cell RNA sequencing
Year: 2020 PMID: 32477414 PMCID: PMC7235421 DOI: 10.3389/fgene.2020.00490
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Flowchart of SCSA. First, the DEGs of each cluster are extracted and filtered from the gene expression file. Next, SCSA uses marker gene databases to annotate the cell clusters. In this step, a known marker gene database and a user-defined marker database can be used simultaneously. For each cluster and each database, a cell-gene matrix (M) and two vectors (E, L) are generated to form a raw score vector (S). If multiple databases are selected, the vectors are normalized and combined into a new vector (Z), which is then multiplied by a database weight matrix (W) to give the final uniform score vector. In the last step, a ranked cell type vector is generated according to the uniform score. In addition, SCSA uses GO enrichment analysis to give users clues about unidentified clusters.
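The scoring flow in the Figure 1 caption can be illustrated with a minimal Python sketch. It is not SCSA's implementation: the caption does not give the formulas, so the combination rule (S as the sum of M × E × L over genes), the reading of E as per-gene DEG strength and L as per-gene marker confidence, the z-score normalization, the use of one scalar weight per database in place of the weight matrix W, and all gene names and numbers are assumptions made for illustration.

```python
from statistics import mean, pstdev


def raw_score(M, E, L):
    """Raw score vector S for one cluster and one database.

    ASSUMPTION: S[c] = sum over genes g of M[c][g] * E[g] * L[g], where
    M is a 0/1 cell-type-by-gene marker matrix, E is per-gene DEG strength,
    and L is per-gene marker confidence. The caption does not state this rule.
    """
    return [sum(m * e * l for m, e, l in zip(row, E, L)) for row in M]


def zscore(values):
    """Z-score normalize a vector; a constant vector maps to all zeros."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def uniform_score(raw_by_db, weights):
    """Normalize each database's S into a row of Z, then weight and sum.

    ASSUMPTION: one scalar weight per database stands in for W.
    """
    Z = [zscore(S) for S in raw_by_db]
    n_types = len(Z[0])
    return [sum(w * z[c] for w, z in zip(weights, Z)) for c in range(n_types)]


def rank_cell_types(cell_types, scores):
    """Return (cell type, score) pairs sorted from best to worst."""
    return sorted(zip(cell_types, scores), key=lambda p: p[1], reverse=True)


# Hypothetical toy data for a single cluster (genes: CD3E, CD79A, CD14).
cell_types = ["T cell", "B cell", "Monocyte"]
E = [2.0, 0.1, 0.3]                           # made-up DEG strengths
M_known = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # known marker database
L_known = [1.0, 1.0, 1.0]
M_user = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]    # user-defined marker database
L_user = [0.5, 0.5, 0.5]

S_known = raw_score(M_known, E, L_known)
S_user = raw_score(M_user, E, L_user)
scores = uniform_score([S_known, S_user], weights=[0.7, 0.3])
ranking = rank_cell_types(cell_types, scores)
print(ranking[0][0])  # top-ranked cell type for this toy cluster
```

In this toy setup the cluster's strongest DEG is the hypothetical T cell marker, so "T cell" ranks first. Normalizing each database separately before weighting keeps a database with larger raw scores from dominating the combined result.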
Figure 3. Cell components of PBMCs predicted by SCSA. (A) Clustering of the uniform scores of the top five predicted cell types in four PBMC datasets by SCSA. Each column represents one cluster from the four PBMC datasets and each row represents one cell type. Uniform scores were normalized using the z-score method to make clusters comparable. (B) Percentages of four different cell types in the four PBMC datasets based on SCSA's prediction. (C) Five cell types plotted by t-SNE based on SCSA's prediction for the four PBMC datasets.
Figure 2. Performance of SCSA compared with other methods (scMatch, CellAssign, and Garnett) on three datasets with known cell types. The dataset identity is labeled at the top of each panel, with the number of clusters in brackets. In the legend, “Positive” is the percentage of correctly predicted clusters, “Negative” is the percentage of incorrectly predicted clusters, and “Missed” is the percentage of predictions with uncertain cell types.
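The three legend categories in the Figure 2 caption can be tallied as in the short Python sketch below. It assumes that a prediction counts as "Positive" only on an exact label match and that uncertain predictions carry a hypothetical placeholder label, `"Unknown"`. Neither convention comes from the source, and the example labels are invented.

```python
from collections import Counter


def summarize_predictions(predicted, truth, unknown_label="Unknown"):
    """Return the percentage of Positive / Negative / Missed clusters.

    ASSUMPTIONS: exact string match defines a correct prediction, and
    `unknown_label` (a hypothetical placeholder) marks uncertain calls.
    """
    if len(predicted) != len(truth) or not truth:
        raise ValueError("predicted and truth must be non-empty and equal length")
    counts = Counter()
    for p, t in zip(predicted, truth):
        if p == unknown_label:
            counts["Missed"] += 1
        elif p == t:
            counts["Positive"] += 1
        else:
            counts["Negative"] += 1
    total = len(truth)
    return {k: 100.0 * counts[k] / total for k in ("Positive", "Negative", "Missed")}


# Invented labels for four clusters of a dataset with known cell types.
truth = ["T cell", "B cell", "Monocyte", "NK cell"]
predicted = ["T cell", "B cell", "Unknown", "T cell"]
summary = summarize_predictions(predicted, truth)
print(summary)
```

For these invented labels, two of four clusters match, one is uncertain, and one is wrong, giving 50% Positive, 25% Negative, and 25% Missed.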