Stefano Mangiola, Maria A Doyle, Anthony T Papenfuss.
Abstract
MOTIVATION: Seurat is one of the most popular software suites for the analysis of single-cell RNA sequencing data. Considering the popularity of the tidyverse ecosystem, which offers a large set of data display, query, manipulation, integration, and visualisation utilities, a great opportunity exists to interface the Seurat object with the tidyverse. This interface gives the large data science community of tidyverse users the possibility to operate with familiar grammar.
Year: 2021 PMID: 34028547 PMCID: PMC9502154 DOI: 10.1093/bioinformatics/btab404
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.931
Fig. 1. Comparison between the data structure (https://github.com/boxuancui/DataExplorer) (top; abstracted tibble for tidyseurat) and the information presented to the user (bottom) for Seurat (A) and tidyseurat (B; including transcript information). The dataset underlying these visualizations is a subset of a peripheral blood mononuclear cell fraction provided by 10x Genomics (10xgenomics.com).
Example of a tibble abstraction of a Seurat table
A Seurat-tibble abstraction: 8033 × 11 (Features = 1000; Active assay = SCT; Assays = RNA, SCT)

| Cell | Total count | Total transcripts | PC1 | UMAP1 | Cluster | Cell type |
|---|---|---|---|---|---|---|
| cell_1 | 10 456 | 450 | −1.23 | −3.47 | 1 | T cell |
| cell_2 | 2088 | 400 | 0.98 | −1.59 | 2 | B cell |
| cell_3 | 11 309 | 699 | 5.55 | 1.26 | 5 | Monocyte |
| cell_4 | 8791 | 423 | −5.42 | −4.42 | 1 | Monocyte |
Note: Pre-existing cell-wise annotation and newly calculated information are all coexisting in a unique table.
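The behaviour of this abstraction can be sketched in a few lines. The example below is a minimal illustration, not the authors' workflow: it assumes the Seurat, tidyseurat and dplyr packages are installed, and it uses the small `pbmc_small` example object distributed with SeuratObject rather than the dataset summarised in the table. The column name `groups` belongs to that example object.

```r
library(Seurat)
library(tidyseurat)
library(dplyr)

# Example object shipped with SeuratObject (an assumption of this sketch,
# not the dataset shown in the table)
pbmc_small <- SeuratObject::pbmc_small

# Filtering keeps one row per cell, so a Seurat object is returned
g1_cells <- pbmc_small %>% filter(groups == "g1")

# Counting collapses cells, so an ordinary tibble is returned
cells_per_group <- pbmc_small %>% count(groups)

# Explicit conversion to a tibble of cell-wise information
cell_table <- pbmc_small %>% as_tibble()
```

In this sketch, `g1_cells` can still be passed to Seurat functions, whereas `cells_per_group` and `cell_table` are plain tables for independent analysis.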
Fig. 2. A cheat sheet of the tidyverse functionalities that tidyseurat enables for Seurat objects. This cheat sheet provides examples of the alternative tidyverse and Seurat syntax. The green colour scheme includes procedures that output a tidyseurat object, provided that: (i) they do not lead to cell duplication; and (ii) key columns (e.g. cell identifier) are not excluded, modified or renamed (e.g. through select, mutate or rename commands). If either condition is violated, a table (rather than an abstraction) is returned for independent analysis and visualization. The blue colour scheme includes procedures that return tibble tables for independent analyses and plotting. The grey-shaded boxes include the alternative code utilizing Seurat and base R.
Fig. 3. Pseudo-code representing procedures for the analysis of single-cell RNA sequencing data integrating Seurat and tidyverse functions through tidyseurat. Package prefixes were added for functions that are part of neither tidyseurat nor base R.
Fig. 4. Tidyverse-compatible libraries offer powerful, flexible and extensible tools to visualize single-cell RNA sequencing data. Natively interfacing with such tools expands the possibilities for the user to learn from the data. Graphical results of the example workflow, integrating Seurat and tidyverse with tidyseurat. (A) Sample-wise distribution of biological indicators, including the proportion of mitochondrial transcripts and cell-cycle phase scores. For optimum visualization, a 20% subsampling was performed on the cell set. (B) Cells mapped in two- and three-dimensional Uniform Manifold Approximation and Projection (UMAP) space. The default integration of reduced dimensions and other cell-wise information in a tibble abstraction facilitates such visualization. (C) Distribution of transcript abundance for two marker genes, identified for each cluster obtained by unsupervised estimation. Cells are mapped in two-dimensional UMAP space. (D) Mapping of cells between the cell- or cluster-wise methods for cell-type classification. Only large clusters are labelled here. The colour scheme refers to cell types classified according to clusters. The bottom containers refer to the classification based on single cells. (E) Heatmap of the marker genes for cell clusters, produced with tidyHeatmap, annotated with data source and the first principal component. Only the ten largest clusters are displayed. The integrated visualization of transcript abundance, cell annotation and reduced dimensions is facilitated by the ‘join_features’ functionality and by the default complete integration of cell-wise information (including reduced dimensions) in the tibble abstraction.
Example of a nested tidyseurat table, with gene markers calculated internally for each major immune cell type
A tibble: 2 × 3

| Cell class | Data | Top markers |
|---|---|---|
| lymphoid | <tidyseurat> | RPL34, RPS27, RPL32, RPS3A, RPL21, RPL31 |
| myeloid | <tidyseurat> | S100A8, S100A9, S100A12, VCAN, CYP1B1, CD14 |
Note: This nesting is obtained with the nest-map combination from tidyverse.
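The nest-map pattern itself can be sketched on an ordinary tibble standing in for the cell-wise table. This is an illustrative assumption rather than the analysis behind the table: the values are hypothetical, and a per-group mean replaces the marker-gene calculation, which in the original is run on each nested tidyseurat object.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Toy cell-wise table with hypothetical abundance values
cells <- tibble(
  cell = paste0("cell_", 1:6),
  cell_class = c("lymphoid", "lymphoid", "lymphoid",
                 "myeloid", "myeloid", "myeloid"),
  CD14 = c(0, 1, 0, 5, 7, 6)
)

nested <- cells %>%
  # One row per cell class; the remaining columns go into a list-column
  nest(data = -cell_class) %>%
  # Apply a function to each nested table and store the result alongside it
  mutate(
    n_cells = map_int(data, nrow),
    mean_CD14 = map_dbl(data, ~ mean(.x$CD14))
  )
```

Each row of `nested` holds one cell class, its own data subset, and the summaries computed within that subset, which mirrors the layout of the table above.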
Fig. 5. Presence of gamma delta T cells among lymphocytes, part of the case study for comparing Seurat with tidyseurat. (A) Integrative UMAP plot including both the signature score and the genes within the signature. Plots are faceted horizontally for biological condition (artifactual). (B) Interactive gating of high scoring cells for the gamma delta T cell signature (Pizzolato ), using tidygate (github.com/stemangiola/tidygate). (C) Distribution of the proportion of gamma delta T cells across patients from conditions A and B.
Case study for the detection of gamma delta T cells among lymphoid cells
| Step | Seurat | tidyseurat |
|---|---|---|
| 1 | `signature_score_1 = seurat_obj[c("CD3D", "TRDC", "TRGC1", "TRGC2"),] %>% Seurat::GetAssayData(assay="SCT", slot="data") %>% colSums() %>% scales::rescale(to=c(0,1)); signature_score_2 = seurat_obj[c("CD8A", "CD8B"),] %>% Seurat::GetAssayData(assay="SCT", slot="data") %>% colSums() %>% scales::rescale(to=c(0,1)); seurat_obj$signature_score = signature_score_1 - signature_score_2` | `seurat_obj_sig = seurat_obj %>% join_features(features = c("CD3D", "TRDC", "TRGC1", "TRGC2", "CD8A", "CD8B"), shape = "wide", assay = "SCT") %>% mutate(signature_score = scales::rescale(CD3D + TRDC + TRGC1 + TRGC2, to=c(0,1)) - scales::rescale(CD8A + CD8B, to=c(0,1)))` |
| 2 | `splits = colnames(seurat_obj) %>% split(seurat_obj$sample); min_size = splits %>% sapply(length) %>% min(); cell_subset = splits %>% lapply(function(x) sample(x, min_size)) %>% unlist(); seurat_obj = seurat_obj[, cell_subset]` | `seurat_obj_sig %>% add_count(sample, name = "tot_cells") %>% mutate(min_cells = min(tot_cells)) %>% group_by(sample) %>% sample_n(min_cells) %>%` |
| 3 | `Seurat::FeaturePlot(seurat_obj, features = c("signature_score", "CD3D", "TRDC", "TRGC1", "TRGC2", "CD8A", "CD8B"), split.by = "type", min.cutoff = 0.1)` | `pivot_longer(cols=c("CD3D", "TRDC", "TRGC1", "TRGC2", "CD8A", "CD8B", "signature_score")) %>% group_by(name) %>% mutate(value = scale(value)) %>% ggplot(aes(UMAP_1, UMAP_2, color=value)) + geom_point() + facet_grid(type ~ name)` |
| 4 | `p = Seurat::FeaturePlot(seurat_obj, features = "signature_score"); seurat_obj$within_gate = colnames(seurat_obj) %in% CellSelector(plot = p); seurat_obj[[]] %>% # Pass object to plot` | `seurat_obj_sig %>% mutate(gamma_delta = tidygate::gate_chr(UMAP_1, UMAP_2, .color = signature_score)) %>%` |
| 5 | `add_count(sample, name = "tot_cells") %>% count(sample, type, tot_cells, within_gate) %>% mutate(frac = n/tot_cells) %>% filter(within_gate == T) %>%` | `add_count(sample, name = "tot_cells") %>% count(sample, type, tot_cells, within_gate) %>% mutate(frac = n/tot_cells) %>% filter(within_gate == T) %>%` |
| 6 | `ggplot(aes(type, frac)) + geom_boxplot() + geom_point()` | `ggplot(aes(type, frac)) + geom_boxplot() + geom_point()` |
Note: Both Seurat and tidyseurat style coding is shown.
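The arithmetic behind the signature score in the case study can be worked through on a toy matrix. The sketch below uses base R only. The abundance values and cell names are hypothetical, and `rescale01` is an illustrative helper that reproduces the min-max behaviour of `scales::rescale(x, to = c(0, 1))`.

```r
# Min-max rescaling to [0, 1] (hypothetical helper standing in for
# scales::rescale(x, to = c(0, 1)))
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Hypothetical normalised abundances: genes in rows, cells in columns
expr <- matrix(
  c(2, 0, 1,   # CD3D
    3, 0, 0,   # TRDC
    1, 0, 1,   # TRGC1
    2, 0, 0,   # TRGC2
    0, 0, 3,   # CD8A
    0, 1, 2),  # CD8B
  nrow = 6, byrow = TRUE,
  dimnames = list(
    c("CD3D", "TRDC", "TRGC1", "TRGC2", "CD8A", "CD8B"),
    c("cell_1", "cell_2", "cell_3")
  )
)

# Summed abundance of the gamma delta genes, rescaled across cells
positive <- rescale01(colSums(expr[c("CD3D", "TRDC", "TRGC1", "TRGC2"), ]))

# Summed abundance of the CD8 genes, rescaled across cells
negative <- rescale01(colSums(expr[c("CD8A", "CD8B"), ]))

# High scores mark cells rich in gamma delta genes and poor in CD8 genes
signature_score <- positive - negative
```

With these toy values the gamma delta sums are 8, 0 and 2 and the CD8 sums are 0, 1 and 5, so the scores are 1, -0.2 and -0.75: only `cell_1` would fall in a gate drawn around high-scoring cells.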