Massimo Andreatta, Ariel J Berenstein, Santiago J Carmona.
Abstract
A common bioinformatics task in single-cell data analysis is to purify a cell type or cell population of interest from heterogeneous datasets. Here we present scGate, an algorithm that automates marker-based purification of specific cell populations, without requiring training data or reference gene expression profiles. scGate purifies a cell population of interest using a set of markers organized in a hierarchical structure, akin to the gating strategies employed in flow cytometry. scGate outperforms state-of-the-art single-cell classifiers and can be applied to multiple modalities of single-cell data (e.g. RNA-seq, ATAC-seq, CITE-seq). scGate is implemented as an R package and integrated with the Seurat framework, providing an intuitive tool to isolate cell populations of interest from heterogeneous single-cell datasets. AVAILABILITY: R package source code and reproducible tutorials are available at https://github.com/carmonalab/scGate. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
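The hierarchical, marker-based gating idea described in the abstract can be illustrated with a minimal Python sketch. This is not scGate's implementation or API (scGate is an R package): the `GateLevel` class, the `gate` function, the fixed expression threshold and the toy expression values are all hypothetical, and the markers PTPRC and CD68 are chosen here only for illustration. The sketch assumes a cell passes a level when at least one positive marker exceeds the threshold and no negative marker does, and that each level only examines cells that passed the previous one.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GateLevel:
    """One level of a hypothetical gating model: positive and negative markers."""
    positive: List[str]
    negative: List[str] = field(default_factory=list)


def passes_level(cell: Dict[str, float], level: GateLevel, threshold: float) -> bool:
    """Assumed rule: any positive marker above threshold, no negative marker above it."""
    has_positive = any(cell.get(gene, 0.0) > threshold for gene in level.positive)
    has_negative = any(cell.get(gene, 0.0) > threshold for gene in level.negative)
    return has_positive and not has_negative


def gate(cells: Dict[str, Dict[str, float]], levels: List[GateLevel],
         threshold: float = 0.5) -> List[str]:
    """Apply the levels in order; only cells kept at one level reach the next."""
    kept = list(cells)
    for level in levels:
        kept = [name for name in kept if passes_level(cells[name], level, threshold)]
    return kept


# Toy expression values (made up for illustration only).
cells = {
    "cell_a": {"PTPRC": 2.0, "CD68": 1.5},   # immune and macrophage-like
    "cell_b": {"PTPRC": 1.8, "CD3D": 2.2},   # immune, but lacks the level-2 marker
    "cell_c": {"CD68": 0.9},                 # fails the first (immune) level
}

# Two-level model: immune cells first, then macrophages (markers are assumptions).
levels = [
    GateLevel(positive=["PTPRC"]),
    GateLevel(positive=["CD68"], negative=["CD3D"]),
]

purified = gate(cells, levels)
print(purified)
```

Ordering the levels from broad to specific mirrors a flow cytometry gating tree: a cell that expresses a second-level marker but fails the first level, such as `cell_c` above, is never considered at the second level.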
Year: 2022 PMID: 35258562 PMCID: PMC9048671 DOI: 10.1093/bioinformatics/btac141
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Purifying cell populations from single-cell datasets using scGate. (A) Uniform Manifold Approximation and Projection (UMAP) representation of scRNA-seq data of PBMC populations annotated by Hao et al. (B) Purification of target cell types using scGate: B cells on the left (using the marker MS4A1 [encoding CD20]) and NK cells on the right (using NCAM [encoding CD56] and KLRD1 as positive markers, and CD3D as a negative marker). The violin plots display normalized ADT counts for the indicated proteins on the same cells. Precision (PREC), recall (REC) and Matthews correlation coefficient (MCC) are shown. (C) UMAP representation of scRNA-seq data of melanoma tumors annotated by Jerby-Arnon et al. (D) Purification of macrophages using a hierarchical gating model (GM): immune cells at the first level (left panel) and macrophages at the second level (middle panel). Macrophage gene signature (UCell) scores are shown in the right panel. (E) scGate purification of monocytes using DNA accessibility from a PBMC 10× multiomics dataset. Violin plots display coupled RNA expression values. Gene-associated accessibility values were inferred using Signac (Stuart et al.). (F) PREC (positive predictive value) and MCC values for five publicly available scRNA-seq datasets (derived from blood or tumors) for scGate and three other cell type classifiers.
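The caption reports precision (PREC), recall (REC) and MCC, taken here to be the Matthews correlation coefficient, for each purification. As a hedged illustration of how such metrics are conventionally computed from a binary gating result, the sketch below compares predicted labels with reference annotations. The function name `gate_metrics` and the example labels are hypothetical; the formulas are the standard definitions, not code taken from the paper.

```python
import math
from typing import Dict, Sequence


def gate_metrics(predicted: Sequence[bool], truth: Sequence[bool]) -> Dict[str, float]:
    """Precision, recall and MCC for a binary 'target cell type or not' call."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")

    # Confusion-matrix counts.
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    tn = sum(1 for p, t in zip(predicted, truth) if not p and not t)

    # Undefined ratios (zero denominators) are reported as 0.0 by convention here.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"precision": precision, "recall": recall, "mcc": mcc}


# Made-up example: five cells, gating result versus reference annotation.
predicted = [True, True, False, False, True]
truth = [True, False, False, False, True]
metrics = gate_metrics(predicted, truth)
print(metrics)
```

Unlike precision or recall alone, MCC uses all four confusion-matrix counts, which makes it informative when the target population is a small fraction of the dataset.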