Boxiang Liu, Yanjun Li, Liang Zhang.
Abstract
Human and animal tissues consist of heterogeneous cell types that organize and interact in highly structured ways. Bulk and single-cell sequencing technologies remove cells from their original microenvironments, resulting in a loss of spatial information. Spatial transcriptomics is a recent technological innovation that measures transcriptomic information while preserving spatial information. Spatial transcriptomic data can be generated in several ways: RNA molecules are measured by in situ sequencing, in situ hybridization, or spatial barcoding, each of which recovers the original spatial coordinates. The inclusion of spatial information expands the range of possibilities for analysis and visualization and has spurred the development of numerous novel methods. In this review, we summarize the core concepts of spatial genomics technology and provide a comprehensive review of current analysis and visualization methods for spatial transcriptomics.
Keywords: cell-type identification; clustering; dimensionality reduction; single-cell RNA-seq (scRNA-seq); spatial expression pattern; spatial interaction; spatial transcriptomics; visualization
Year: 2022 PMID: 35154244 PMCID: PMC8829434 DOI: 10.3389/fgene.2021.785290
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
FIGURE 1 An overview of spatial transcriptomic tasks. (A) Spatial transcriptomic datasets map gene expression measurements to their respective locations. (B) A spatial transcriptomic dataset can be analyzed in gene expression space, irrespective of spatial locations. Tasks such as clustering and cell-type identification fall into this category. (C) Spatial information can be used jointly with gene expression to detect spatial expression patterns and spatial domains. (D) These two sources of information can also be used to detect cell-cell and gene-gene interactions.
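Panels (A), (C) and (D) all rely on pairing each spot's expression profile with its coordinates. A minimal NumPy sketch of that pairing follows, assuming a spots × genes expression matrix and a spots × 2 coordinate array. The helpers `spatial_knn` and `neighbor_smooth` are hypothetical names, and neighbor averaging is only a generic illustration of using the two sources of information jointly, not the method of any tool in this review.

```python
import numpy as np


def spatial_knn(coords: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: indices of each spot's k nearest spatial neighbors."""
    # Dense pairwise Euclidean distances between spot coordinates (n_spots x n_spots).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # A spot is not its own neighbor.
    np.fill_diagonal(dist, np.inf)
    # Keep the k closest spots in each row.
    return np.argsort(dist, axis=1)[:, :k]


def neighbor_smooth(expr: np.ndarray, neighbors: np.ndarray) -> np.ndarray:
    """Hypothetical helper: average each spot's expression with its neighbors'."""
    # expr is spots x genes; expr[neighbors] is spots x k x genes.
    stacked = np.concatenate([expr[:, None, :], expr[neighbors]], axis=1)
    return stacked.mean(axis=1)


# Toy dataset: four spots on a line, one gene (spots x genes).
coords = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [10.0, 0.0]])
expr = np.array([[0.0], [2.0], [4.0], [6.0]])
nb = spatial_knn(coords, k=1)       # nb.ravel() -> [1, 0, 1, 2]
smoothed = neighbor_smooth(expr, nb)  # smoothed.ravel() -> [1., 1., 3., 5.]
```

The dense distance matrix needs O(n²) memory; for large datasets a KD-tree such as `scipy.spatial.cKDTree` is the usual substitute.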
Current analysis and visualization tools for spatial transcriptomic datasets (accession date: 12/22/2021).
| Task | Tool | Inputs | Description | Language |
|---|---|---|---|---|
| Preprocessing | Space Ranger | Microscope images and FASTQ files | Space Ranger is an analysis pipeline for alignment, tissue and fiducial detection, barcode/UMI counting, and feature-spot matrix generation. | Bash and GUI |
| | scran (2016) | Gene expression | scran computes pool-based size factors and deconvolves them into cell-based size factors for single-cell gene expression normalization. | R |
| | SCnorm (2017) | Gene expression | SCnorm uses a double quantile regression-based model for gene-group normalization. | R |
| Clustering | K-means | Gene expression | K-means iteratively assigns each observation to the cluster with the nearest mean (centroid) and recomputes the centroids. | R and Python |
| | Gaussian mixture model (GMM) | Gene expression | GMM is similar to K-means but softly assigns observations to clusters based on Gaussian distributions. | R and Python |
| | Hierarchical clustering | Gene expression | Hierarchical clustering iteratively merges the closest observations or clusters. | R and Python |
| | Louvain (2008) | Gene expression | Louvain performs community detection within networks by iteratively optimizing modularity. | R and Python |
| | Leiden (2019) | Gene expression | Leiden is a variant of the Louvain algorithm that guarantees well-connected communities. | R and Python |
| | SC3 (2017) | Gene expression | SC3 performs consensus clustering of single-cell RNA-seq data. | R |
| | SIMLR (2017) | Gene expression | SIMLR is a multi-kernel learning approach for single-cell RNA-seq clustering. | R and MATLAB |
| Cell-specific marker genes | scran (2016) | Gene expression | scran identifies consistently up-regulated genes through pairwise comparisons between clusters. | R |
| | scGeneFit (2021) | Gene expression | scGeneFit is a label-aware compressive classification method for selecting informative marker genes. | Python |
| Cell-type identification | scmap (2018) | Gene expression | scmap projects single cells onto reference datasets using an approximate k-nearest-neighbor search. | R |
| | SingleR (2019) | Gene expression | SingleR calculates correlations between single cells and reference profiles and iteratively removes lowly correlated cell types for noise control. | R |
| | Cell-ID (2021) | Gene expression of reference and target single-cell datasets | Cell-ID performs multiple correspondence analysis (MCA)-based gene signature extraction and cell identification. | R |
| | JSTA (2021) | | JSTA is a deep-learning-based cell segmentation and cell-type annotation method that iteratively adjusts the assignment of boundary pixels based on the cell-type probabilities of each pixel. | Python |
| Dimensionality reduction | Principal component analysis (PCA) | Gene expression | PCA identifies orthogonal vectors that maximize the variance of the data points' projections. | R and Python |
| | t-SNE (2008) | Gene expression | t-SNE iteratively refines projections in the low-dimensional space so that pairwise similarities match those in the high-dimensional space. | R and Python |
| | UMAP (2018) | Gene expression | UMAP is similar to t-SNE but is faster and better preserves the global structure of the high-dimensional data. | R and Python |
| Spatially coherent genes | SpatialDE (2018) | Gene expression + spatial coordinates | SpatialDE uses Gaussian process regression to decompose expression variability into spatial and non-spatial components. | Python |
| | Trendsceek (2018) | Gene expression + spatial coordinates | Trendsceek uses marked point processes to identify spatial expression patterns. | R |
| | SPARK (2018) | Gene expression + spatial coordinates | SPARK uses a generalized linear spatial model to identify spatial expression patterns. | R |
| Spatial domains | Zhu et al. | Gene expression + spatial coordinates | Zhu et al. use a hidden Markov random field (HMRF) model to identify spatial domains. | R and Python |
| | SpaGCN (2021) | Gene expression + spatial coordinates + histology image | SpaGCN is a graph-convolutional-network-based method that jointly identifies spatial domains and spatially variable genes. | Python |
| Spot deconvolution | DSTG (2021) | Gene expression + spatial coordinates | DSTG builds a graph of real and pseudo spatial transcriptomic data and applies a graph convolutional network to predict the cell-type composition of the real data, using labels from the pseudo data. | Python |
| Super-resolution | BayesSpace (2021) | Gene expression + spatial coordinates | BayesSpace is a Bayesian model that leverages neighborhood information to enhance resolution. | R |
| Cell-cell interaction | SpaOTsc (2020) | Gene expression + spatial coordinates | SpaOTsc uses structured optimal transport between the distributions of sender and receiver cells to identify cell-cell communication. | Python |
| Receptor-ligand interaction | GCNG (2020) | Gene expression + spatial coordinates | GCNG is a graph convolutional neural network that encodes spatial information as a graph and predicts whether a gene pair interacts. | Python |
| Integrative | Seurat (2018) | Gene expression + spatial coordinates | Seurat is an R package for integrative single-cell transcriptomic analysis. | R |
| | Giotto (2021) | Gene expression + spatial coordinates | Giotto is an R package for integrative spatial transcriptomic analysis. | R |
| | Scanpy (2018) | Gene expression + spatial coordinates | Scanpy is a Python package for integrative single-cell transcriptomic analysis. | Python |
| | Squidpy (2021) | Gene expression + spatial coordinates | Squidpy is a Python package for integrative spatial transcriptomic analysis. | Python |
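The K-means entry can be made concrete with Lloyd's algorithm, which alternates an assignment step (each observation joins the cluster with the nearest centroid) with an update step (each centroid moves to the mean of its members). The sketch below is a minimal NumPy version, assuming the input is a cells × features matrix such as principal component scores; `kmeans` and `_assign` are hypothetical helpers, and library implementations such as scikit-learn's `KMeans` or R's `kmeans` add smarter initialization and multiple restarts.

```python
import numpy as np


def _assign(X: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    # Squared Euclidean distance from every observation to every centroid.
    dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    # Assignment step: index of the nearest centroid for each observation.
    return dist.argmin(axis=1)


def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Hypothetical helper: Lloyd's algorithm on a cells x features matrix."""
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct observations chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = _assign(X, centroids)
        # Update step: move each centroid to the mean of its assigned observations
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: centroids no longer move
        centroids = new_centroids
    # Final assignment consistent with the returned centroids.
    return _assign(X, centroids), centroids


# Two well-separated toy groups of cells in a 2-D feature space.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids = kmeans(X, k=2)
```

Because the result depends on the initial centroids, running several seeds and keeping the solution with the lowest within-cluster sum of squares is common practice.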
Current experimental methods for spatial transcriptomic profiling.
| Method | Type | Resolution | Genes |
|---|---|---|---|
| Visium | Spatial barcoding | 55 μm | Whole transcriptome |
| Slide-seq | Spatial barcoding | 10 μm | Whole transcriptome |
| HDST | Spatial barcoding | 2 μm | Whole transcriptome |
| DBiT-seq | Spatial barcoding | 10 μm | Whole transcriptome |
| Seq-Scope | Spatial barcoding | 0.5–0.8 μm | Whole transcriptome |
| Stereo-seq | Spatial barcoding | 0.5 or 0.715 μm | Whole transcriptome |
| seqFISH | In situ hybridization | Single-molecule | >10,000 |
| MERFISH | In situ hybridization | Single-molecule | 100–1,000 |
| STARmap | In situ sequencing | Single-cell | 160–1,020 |
| FISSEQ | In situ sequencing | Subcellular | ∼8,000 |
FIGURE 2 Comparison between principal component analysis and t-SNE. (A) Principal component analysis iteratively identifies vectors that minimize the sum of squared distances from the data points to the line spanned by each vector; each vector is orthogonal to all previously selected vectors. (B) t-SNE calculates pairwise similarities in the original high-dimensional space based on the probability density function of the Gaussian distribution. The points are randomly placed in a low-dimensional space and iteratively refined so that the similarities in low dimension match those in high dimension. At each iteration, similar pairs attract each other and dissimilar pairs repel each other.
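Panel (A) and the first step of panel (B) can be sketched in NumPy. The greedy, orthogonality-constrained search in (A) yields the same directions as the right singular vectors of the centered data, so the sketch obtains them from a single SVD. For (B), it computes symmetrized Gaussian similarities with one fixed bandwidth `sigma`, which is a simplifying assumption: t-SNE itself tunes a bandwidth per point to match a user-chosen perplexity, and the low-dimensional refinement (which uses a Student t kernel) is omitted here. The function names are hypothetical.

```python
import numpy as np


def pca(X: np.ndarray, n_components: int):
    """Hypothetical helper: PCA of a cells x genes matrix via SVD."""
    # Center each gene so the components describe variance around the mean.
    Xc = X - X.mean(axis=0)
    # Rows of Vt are orthonormal principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    # Project the centered data onto the leading directions.
    scores = Xc @ components.T
    return scores, components


def gaussian_affinities(X: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Hypothetical helper: t-SNE-style input similarities with a fixed bandwidth."""
    # Squared Euclidean distances between all pairs of points.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    # Gaussian kernel; a point is not its own neighbor.
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Row-normalize to conditional probabilities p_{j|i}.
    P_cond = W / W.sum(axis=1, keepdims=True)
    # Symmetrize as in t-SNE: p_ij = (p_{j|i} + p_{i|j}) / (2n), so P sums to 1.
    n = X.shape[0]
    return (P_cond + P_cond.T) / (2.0 * n)


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
scores, components = pca(X, n_components=2)
P = gaussian_affinities(X)
```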
FIGURE 3 Visualization of gene expression in Euclidean space. (A) Spatially coherent genes and spatial domains can be visualized as 2D images. (B) Spot deconvolution methods estimate the proportion of each cell type within each spot; pie charts are routinely used to represent these proportions. (C) Spot super-resolution methods estimate the cell type of sub-spots based on correlation with neighboring spots. In this example, each spot of the original dataset is divided into nine sub-spots in the super-resolved dataset.
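Panel (B) can be illustrated with a simple reference-based formulation: model a spot's expression vector as a non-negative mixture of cell-type reference profiles, then normalize the fitted weights into proportions. The sketch below solves that non-negative least squares (NNLS) problem with projected gradient descent. NNLS is a stand-in chosen for illustration, not the graph convolutional approach of DSTG; the helper `deconvolve_spot` and the genes × cell types layout of the reference matrix are assumptions. `scipy.optimize.nnls` solves the same problem directly.

```python
import numpy as np


def deconvolve_spot(ref: np.ndarray, spot: np.ndarray, n_iter: int = 5000) -> np.ndarray:
    """Hypothetical helper: cell-type proportions of one spot via NNLS.

    ref  -- genes x cell_types matrix of reference expression profiles
    spot -- length-genes expression vector for a single spot
    """
    # Step size 1/L, where L (largest singular value squared) is the
    # Lipschitz constant of the least-squares gradient.
    L = np.linalg.norm(ref, ord=2) ** 2
    x = np.zeros(ref.shape[1])
    for _ in range(n_iter):
        # Gradient of 0.5 * ||ref @ x - spot||^2 with respect to x.
        grad = ref.T @ (ref @ x - spot)
        # Gradient step, then projection onto the non-negative orthant.
        x = np.maximum(x - grad / L, 0.0)
    total = x.sum()
    # Normalize the non-negative weights into proportions.
    return x / total if total > 0 else x


# Toy reference: 4 genes x 3 cell types, and a spot mixed 50/30/20.
ref = np.array([[10, 0, 1], [0, 8, 1], [1, 1, 9], [2, 0, 0]], dtype=float)
true_props = np.array([0.5, 0.3, 0.2])
spot = ref @ true_props
estimate = deconvolve_spot(ref, spot)
```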
FIGURE 4 Identification of cell-cell and receptor-ligand interactions. (A) Cell-cell interactions can be identified by the correlation of gene expression values between cell pairs. (B) Receptor-ligand interactions can be identified by the correlation between genes in interacting cell types.
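Panel (B) can be sketched as a correlation computed over spatially neighboring cell pairs: ligand expression in one cell of each pair is correlated with receptor expression in the other. In the sketch below, the fixed-radius neighbor definition, the use of Pearson correlation, and the helper names are all illustrative assumptions, not the procedure of SpaOTsc or GCNG. A real analysis would also need a null model (for example, permuting expression across cells) to judge significance.

```python
import numpy as np


def neighbor_pairs(coords: np.ndarray, radius: float):
    """Hypothetical helper: ordered pairs (i, j), i != j, within `radius`."""
    d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
    # Exclude self-pairs on the diagonal.
    mask = (d <= radius) & ~np.eye(len(coords), dtype=bool)
    return np.nonzero(mask)


def ligand_receptor_corr(ligand: np.ndarray, receptor: np.ndarray,
                         coords: np.ndarray, radius: float) -> float:
    """Hypothetical helper: Pearson correlation of ligand (sender i)
    with receptor (neighbor j) across all neighboring pairs."""
    i, j = neighbor_pairs(coords, radius)
    return float(np.corrcoef(ligand[i], receptor[j])[0, 1])


# Three spatially separated pairs of adjacent cells on a line.
coords = np.array([[0, 0], [1, 0], [10, 0], [11, 0], [20, 0], [21, 0]], dtype=float)
ligand = np.array([1.0, 0.0, 2.0, 0.0, 3.0, 0.0])    # expressed by senders
receptor = np.array([0.0, 1.0, 0.0, 2.0, 0.0, 3.0])  # expressed by their neighbors
r = ligand_receptor_corr(ligand, receptor, coords, radius=1.5)
```

Here each sender's ligand level is matched by its neighbor's receptor level, so the correlation across pairs is 1.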