Erik P Storrs1,2, Daniel Cui Zhou1,2, Michael C Wendl1,2, Matthew A Wyczalkowski1,2, Alla Karpova1,2, Liang-Bo Wang1,2, Yize Li1,2, Austin Southard-Smith1,2, Reyka G Jayasinghe1,2, Lijun Yao1,2, Ruiyang Liu1,2, Yige Wu1,2, Nadezhda V Terekhanova1,2, Houxiang Zhu1,2, John M Herndon3,4, Sid Puram1, Feng Chen1, William E Gillanders3,4, Ryan C Fields3,4, Li Ding1,2,4.
Abstract
Motivation: The use of single-cell methods is expanding at an ever-increasing rate. While there are established algorithms that address cell classification, they are limited in terms of cross-platform compatibility, reliance on the availability of a reference dataset and classification interpretability. Here, we introduce Pollock, a suite of algorithms for cell type identification that is compatible with popular single-cell methods and analysis platforms, provides a set of pretrained human cancer reference models, and reports interpretability scores that identify the genes that drive cell type classifications.
Year: 2022 PMID: 35603231 PMCID: PMC9115775 DOI: 10.1093/bioadv/vbac028
Source DB: PubMed Journal: Bioinform Adv ISSN: 2635-0041
Fig. 1. Pollock overview schema. Overview of Pollock model architecture, training, cell type prediction and pretrained model usage. During training, single-cell inputs are split into training and validation sets. (1a and b) A VAE with a classification head is fit to the training partition of the single-cell data. The model is trained with contributions from three loss functions: KL divergence loss on the latent embedding, ZINB gene expression reconstruction loss and cross-entropy loss on the cell type predictions. (2) Evaluation metrics are then computed on a validation set of withheld single-cell data. In addition to cell type predictions, Pollock also outputs feature importances for the input features of each predicted cell. (3) Following training, Pollock models are saved and can be used for cell type inference at a later date.
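The three loss terms named in the Fig. 1 caption can be illustrated with a minimal, dependency-free sketch. This is not Pollock's implementation: the function names, the per-term weights (`w_kl`, `w_rec`, `w_ce`) and the mean/dispersion/dropout parameterization of the ZINB are assumptions made for illustration, and the caption does not say how the terms are weighted.

```python
import math


def kl_divergence(mu, logvar):
    """KL(N(mu, exp(logvar)) || N(0, 1)), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def zinb_nll(x, mean, theta, pi):
    """Negative log-likelihood of one count under a zero-inflated negative binomial.

    Assumed parameterization: mean > 0, dispersion theta > 0, dropout 0 <= pi < 1.
    """
    # log NB(0) = theta * log(theta / (theta + mean))
    log_nb_zero = theta * (math.log(theta) - math.log(theta + mean))
    if x == 0:
        # A zero arises either from dropout or from the NB component itself.
        return -math.log(pi + (1.0 - pi) * math.exp(log_nb_zero))
    log_nb = (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + log_nb_zero
        + x * (math.log(mean) - math.log(theta + mean))
    )
    return -(math.log(1.0 - pi) + log_nb)


def cross_entropy(logits, label):
    """Softmax cross-entropy for one cell, using a stable log-sum-exp."""
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_norm - logits[label]


def total_loss(mu, logvar, counts, means, thetas, pis, logits, label,
               w_kl=1.0, w_rec=1.0, w_ce=1.0):
    """Weighted sum of the three terms for a single cell (weights are hypothetical)."""
    reconstruction = sum(
        zinb_nll(x, m, t, p) for x, m, t, p in zip(counts, means, thetas, pis)
    )
    return (
        w_kl * kl_divergence(mu, logvar)
        + w_rec * reconstruction
        + w_ce * cross_entropy(logits, label)
    )
```

In a real model these quantities would be computed per batch on tensors produced by the encoder, decoder and classification head; the scalar version above only shows how each term is defined and combined.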
Fig. 2. Pollock feature comparison and benchmarking dataset overview. (A) Comparison of Pollock features against features implemented in other popular single-cell classification tools. (B) Datasets used for benchmarking and the training of disease-specific models.
Fig. 3. Pollock benchmarking and performance. (A) Pollock cell type classification performance (F1-score) compared against six established single-cell classification methods for each disease and data type. (B) Comparison of Pollock cell type classification performance between disease-specific and generalized models. Confusion matrices showing the overlap of cell types predicted by the generalized model with ground-truth cell labels for (C–E) scRNA-seq, snRNA-seq and snATAC-seq validation datasets and (F) a publicly available HCA bone marrow dataset.
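The F1-score and confusion-matrix comparisons described in the Fig. 3 caption can be sketched in a few lines. The caption does not state how F1 is averaged across cell types, so the macro average below is an assumption, and the function names are illustrative rather than part of Pollock.

```python
from collections import Counter


def confusion_counts(truth, predicted):
    """Count (ground-truth label, predicted label) pairs across all cells."""
    return Counter(zip(truth, predicted))


def per_class_f1(truth, predicted):
    """F1-score for each cell type, treating it as the positive class."""
    counts = confusion_counts(truth, predicted)
    scores = {}
    for label in sorted(set(truth) | set(predicted)):
        tp = counts[(label, label)]
        fp = sum(n for (t, p), n in counts.items() if p == label and t != label)
        fn = sum(n for (t, p), n in counts.items() if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(truth, predicted):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = per_class_f1(truth, predicted)
    return sum(scores.values()) / len(scores)


# Toy example with two hypothetical cell types.
truth = ["T cell", "T cell", "B cell", "B cell"]
predicted = ["T cell", "B cell", "B cell", "B cell"]
scores = per_class_f1(truth, predicted)
```

Macro averaging gives rare cell types the same weight as abundant ones; a support-weighted average would be the alternative when overall per-cell accuracy matters more.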
Fig. 4. Pollock cell state annotation in a pan-immune atlas. (A) Confusion matrix showing the overlap of Pollock-predicted versus ground-truth cell labels for a scRNA-seq BRCA dataset annotated with immune cell states. (B) Comparison of Pollock feature importance scores and gene expression for literature-based single-cell marker genes. (C) Significant GO: Molecular Function pathways enriched in the top 20 DWGs for the following NK/T cell states: NK, CD8 T cell-proliferating, CD8 T cell-exhausted and Treg. Pathways are rank-ordered by their −log10 FDR-corrected P-values. (D) Heatmap displaying feature importance scores for the top 20 DWGs for each immune cell state.