| Literature DB >> 29409532 |
F Alexander Wolf1, Philipp Angerer2, Fabian J Theis3,4.
Abstract
SCANPY is a scalable toolkit for analyzing single-cell gene expression data. It includes methods for preprocessing, visualization, clustering, pseudotime and trajectory inference, differential expression testing, and simulation of gene regulatory networks. Its Python-based implementation efficiently deals with data sets of more than one million cells ( https://github.com/theislab/Scanpy ). Along with SCANPY, we present ANNDATA, a generic class for handling annotated data matrices ( https://github.com/theislab/anndata ).Entities:
Keywords: Bioinformatics; Clustering; Differential expression testing; Graph analysis; Machine learning; Pseudotemporal ordering; Scalability; Single-cell transcriptomics; Trajectory inference; Visualization
Mesh:
Year: 2018 PMID: 29409532 PMCID: PMC5802054 DOI: 10.1186/s13059-017-1382-0
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1aSCANPY’s analysis features. We use the example of 68,579 peripheral blood mononuclear cells of [6]. We regress out confounding variables, normalize, and identify highly variable genes. TSNE and graph-drawing (Fruchterman–Reingold) visualizations show cell-type annotations obtained by comparisons with bulk expression. Cells are clustered using the Louvain algorithm. Ranking differentially expressed genes in clusters identifies the MS4A1 marker gene for B cells in cluster 7, which agrees with the bulk labels. We use pseudotemporal ordering from a root cell in the CD34+ cluster and detect a branching trajectory, visualized with TSNE and diffusion maps. b Speedup over CELL RANGER R kit. We consider representative steps of the analysis [6]. c Visualizing and clustering 1.3 million cells. The data, brain cells from E18 mice, are publicly available from 10x Genomics. PCA = principal component analysis, DC = diffusion component