Zexian Zeng, Yawei Li, Yiming Li, Yuan Luo
Abstract
The recent advancement in spatial transcriptomics technology has enabled multiplexed profiling of cellular transcriptomes and spatial locations. As the capacity and efficiency of the experimental technologies continue to improve, there is an emerging need for the development of analytical approaches. Furthermore, with the continuous evolution of sequencing protocols, the underlying assumptions of current analytical methods need to be re-evaluated and adjusted to harness the increasing data complexity. To motivate and aid future model development, we herein review the recent development of statistical and machine learning methods in spatial transcriptomics, summarize useful resources, and highlight the challenges and opportunities ahead.
Year: 2022 PMID: 35337374 PMCID: PMC8951701 DOI: 10.1186/s13059-022-02653-7
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 Applications of computational approaches in spatial transcriptomics research. A Spatially resolved transcriptomics measures transcriptomes while preserving spatial information. Although spatial transcriptomics data retains spatial information, it is limited by low cellular resolution and low read coverage. B Computational approaches that harness the complexity of spatial transcriptomics data have been developed for localized gene expression pattern identification, spatial decomposition, gene imputation, and cell-cell interaction inference. Some of these models leverage gene expression profiles from single-cell RNA-seq (scRNA-seq) data or prior ligand-receptor information from relevant databases. C Sequencing protocols for scRNA-seq have achieved high-throughput profiling at single-cell resolution, but cellular spatial information is lost during sequencing. Compared with spatial transcriptomics, scRNA-seq is more accessible and reaches cellular resolution. D By leveraging information from spatial transcriptomics data, spatial locations can be reconstructed for scRNA-seq data, which lack spatial information. In addition, spatial locations can be reconstructed de novo by integrating prior knowledge such as ligand-receptor pair information. E A typical analysis workflow for spatial transcriptomics data. GEMs, gel beads-in-emulsion; UMAP, uniform manifold approximation and projection
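Panel D describes reconstructing spatial locations for scRNA-seq cells from spatial reference data, and the summary table lists correlation-based variants of this idea (Peng et al. with Spearman rank correlation, Achim et al.). The following is a minimal sketch of that idea, not the published implementation of either method. It assumes both matrices are already restricted to the same shared genes in the same order, and it assigns each cell to the single location with the highest Spearman correlation. The names `rank_rows` and `spearman_map` and the toy data are hypothetical.

```python
import numpy as np


def rank_rows(x: np.ndarray) -> np.ndarray:
    """Rank the entries of each row (0 = smallest); ties are ordered, not averaged."""
    return np.argsort(np.argsort(x, axis=1), axis=1).astype(float)


def spearman_map(sc_expr: np.ndarray, st_expr: np.ndarray):
    """Map each scRNA-seq cell to its best-matching spatial location.

    sc_expr: (cells, genes); st_expr: (locations, genes); same shared genes, same order.
    Returns the best location index per cell and the cell x location correlation matrix.
    """
    # Spearman correlation is the Pearson correlation of ranks.
    r_sc, r_st = rank_rows(sc_expr), rank_rows(st_expr)
    # Standardize each row (population std) so mean(z_a * z_b) is the correlation.
    z_sc = (r_sc - r_sc.mean(axis=1, keepdims=True)) / r_sc.std(axis=1, keepdims=True)
    z_st = (r_st - r_st.mean(axis=1, keepdims=True)) / r_st.std(axis=1, keepdims=True)
    corr = z_sc @ z_st.T / sc_expr.shape[1]
    return corr.argmax(axis=1), corr


# Toy data: 4 locations x 30 shared genes; 3 cells are noisy copies of locations 2, 0, 3.
rng = np.random.default_rng(0)
st = rng.gamma(shape=2.0, scale=2.0, size=(4, 30))
sc = st[[2, 0, 3]] + rng.normal(0.0, 0.1, size=(3, 30))
best, corr = spearman_map(sc, st)
print(best)
```

Ties are ordered by sort position rather than averaged, and a constant row would divide by zero. A more careful version might use average ranks (for example `scipy.stats.rankdata`) and keep the full correlation matrix as a soft assignment instead of taking only the argmax.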
A summary of algorithms, application scenarios, advantages, and disadvantages of the reviewed methods
| Name | Algorithms | Application scenarios | Advantages | Disadvantages |
|---|---|---|---|---|
| SpatialDWLS [ | Weighted least squares | Spatial decomposition | More accurate and faster than benchmarked tools | High bias in estimating the proportions of rare cell types |
| SPOTlight [ | Seeded NMF regression | Spatial decomposition | High accuracy across multiple tissues | Does not incorporate capture location information to model spatial decomposition |
| RCTD [ | Poisson distribution with MLE | Spatial decomposition | Systematically models platform effect | Assumes that platform effects are shared among cell types |
|  | Negative binomial distribution with MAP | Spatial decomposition | Utilizes complete expression profiles rather than selected marker genes to achieve a higher accuracy | Requires deep sequencing depth |
| DSTG [ | Semi-supervised GCN | Spatial decomposition | Higher accuracy than benchmarked tools | Highly dependent on the quality of the link graph used to build the GCN |
| ProximID [ | Cluster label permutations | Cell-cell/gene-gene interactions | Does not require physically separating the cells in FISH images | Cannot detect interactions between cells that are not physically attached |
| MISTy [ | Multi-view framework to dissect effects related to CCI | Cell-cell/gene-gene interactions | 1. Does not require cell type annotation 2. Utilizes complete expression profiles | The extracted interactions cannot be directly considered as causal |
| stLearn [ | A toolbox containing integrated algorithms from multiple studies | 1. Cell-cell/gene-gene interactions 2. Spatial clustering 3. Cell trajectory inference | A streamlined package from raw inputs to in-depth downstream analysis | Only compatible with certain ST platforms |
| SVCA [ | Gaussian processes | Cell-cell/gene-gene interactions | Is applicable to both RNA-seq and proteomic data | Does not account for technology-specific noise |
| GCNG [ | GCN | Cell-cell/gene-gene interactions | Can infer novel CCIs and predict novel functional genes | The hyperparameters need to be re-optimized when applied to different datasets |
| Seurat V3 [ | Analysis pipelines with integrated algorithms | 1. Gene imputation 2. Spatial location reconstruction for scRNA-seq data 3. Others | 1. A comprehensive data analysis pipeline 2. Can be applied to multi-omics datasets, including transcriptomic, epigenomic, proteomic, and spatially resolved single-cell data | Only available for certain types of ST platforms |
| LIGER [ | Integrative NMF | 1. Gene imputation 2. Spatial location reconstruction for scRNA-seq data | The embeddings maintain both common and dataset-specific terms | Memory intensive compared to benchmarked tools |
| SpaGE [ | Domain adaptation model to align ST and scRNA-seq data to a common space | 1. Gene imputation 2. Spatial location reconstruction for scRNA-seq data | Uses less memory and runs faster than benchmarked tools on large datasets | Only genes shared by both datasets are included in the model |
| stPlus [ | Autoencoder model for dimensionality reduction to map ST and scRNA-seq data into a shared space | Gene imputation | 1. Higher accuracy than benchmarked tools in cell type clustering 2. Uses less time and memory than most benchmarked tools other than SpaGE [ | Only applicable to data from image-based sequencing platforms |
| gimVI [ | Variational autoencoders for dimensionality reduction to map ST and scRNA-seq data into a shared space | 1. Gene imputation 2. Dimensionality reduction and feature extraction | Generates platform-specific patterns in the model for better biological interpretability | Slower than benchmarked tools on large datasets |
| Harmony [ | Maximum diversity clustering and mixture-model-based batch correction | 1. Gene imputation 2. Spatial location reconstruction for scRNA-seq data | Can impute low-abundance genes with high accuracy | The embeddings lack biological interpretability |
| DEEPsc [ | ANN | Gene imputation | A system-adaptive method specifically designed for gene imputation | Does not incorporate spatial information into the computation |
| Trendsceek [ | Marked point process | Identify SVGs | Does not need to specify a distribution or a spatial region of interest | Limited to a single gene at a time, computationally intensive |
| SpatialDE [ | Gaussian process regression | Identify SVGs | Can detect both temporal and periodic gene expression patterns for SVG identification | Does not identify spatial regions with distinct expression patterns, computationally intensive |
| SPARK [ | Generalized linear spatial models | 1. Identify SVGs 2. Spatial location reconstruction for scRNA-seq data | 1. Low false discovery rate 2. Does not require the user to preprocess the raw count matrix | The hyperparameters (kernels and weights) need to be re-optimized when applied to different datasets |
| SpaGCN [ | GCN | 1. Identify SVGs 2. Spatial location reconstruction for scRNA-seq data | Jointly identifies SVGs and spatial domains | Does not incorporate cell type information and tissue anatomical structure into the computation |
| SPARK-X [ | Non-parametric covariance test | 1. Identify SVGs 2. Spatial location reconstruction for scRNA-seq data | Uses less time and memory and has a lower false discovery rate than most benchmarked tools, especially on large-scale, sparse ST data | Accuracy varies with the choice of similarity measure and covariance function |
|  | Diffusion model | 1. Identify SVGs 2. Spatial location reconstruction for scRNA-seq data | Can detect genes with irregular spatial patterns | Has CPU parallelization, but no GPU acceleration |
| GLISS [ | Graph Laplacian-based model | 1. Identify SVGs 2. Spatial location reconstruction for scRNA-seq data | Does not need to make distributional assumptions for either spatial or scRNA-seq data | Requires pre-specified landmark genes either manually or through other algorithms |
| Zhu et al. [ | HMRF | 1. Profile localized gene expression pattern 2. Identify SVGs 3. Identify interactions between cell type and spatial environment | Can identify de novo spatially associated subpopulations | Only available for in situ hybridization datasets |
| BayesSpace [ | Bayesian statistical method | 1. Profile localized gene expression pattern to enhance ST data resolution 2. Spatial clustering | Does not require independent single-cell data | Only considers the neighborhood structure present in data from ST and Visium platforms |
| Bergenstråhle et al. [ | Deep generative model | Gene expression prediction from histology images | Can infer transcriptome-wide gene expression from histology images | Only applicable to in situ RNA capture technologies |
| Seurat V1 [ | L1-constrained linear model | 1. Spatial location reconstruction for scRNA-seq data 2. Gene imputation | The idea of landmark genes allows a small number of genes to be used for spatial location reconstruction | Requires pre-computing the positions of landmark genes |
| CSOmap [ | Reconstructs cellular spatial organization based on cell-cell affinity by ligand-receptor interactions | 1. Identify cell-cell/gene-gene interactions 2. Spatial location reconstruction for scRNA-seq data | 1. Does not need to predefine the tissue shape for cell-cell interaction inference 2. Does not need to predefine landmark gene sets | The extracted spatial structure is a pseudo-space structure |
| DistMap [ | Mapping scores to measure the similarity between spatial and scRNA-seq data | Construct a 3D gene expression blueprint of the Drosophila embryo | High accuracy with only 84 in situ genes | Accuracy could be improved by considering gene regulation in addition to the in situ genes |
| Peng et al. [ | Spearman rank correlation to measure the similarity between spatial and scRNA-seq data | Spatial location reconstruction for scRNA-seq data | High accuracy with a small number of genes and cells required | No benchmark studies for accuracy comparison |
| Achim et al. [ | Measure correlations between spatial and scRNA-seq data | Spatial location reconstruction for scRNA-seq data | Most cells can be mapped with high confidence with only a small number of marker genes (~ 50 to 100) | Requires filtering low-quality genes before modeling |
| SpaOTsc [ | Structured optimal transport model | 1. Spatial location reconstruction for scRNA-seq data 2. Cell-cell/gene-gene interactions 3. Identify gene pairs that potentially intercellularly regulate each other | 1. Most cells can be accurately mapped with only a small number of genes 2. Can identify intercellular gene-gene regulatory information | Does not consider the time delay (including the ligand diffusion time or the reaction time of intracellular cascades) that may occur in cell-cell communication |
| novoSpaRc [ | Generalized optimal-transport model | Spatial location reconstruction for scRNA-seq data | Does not need to specify landmark genes for alignment | Accuracy may be improved by using different loss functions |
| Tangram [ | Non-convex optimization by deep learning methods for spatial alignment | 1. Spatial location reconstruction for scRNA-seq data 2. Spatial decomposition 3. Gene imputation from histology data | Is compatible with both capture-based and image-based ST data | Histology gene expression prediction is less accurate if cells cannot be segmented in the images |
| Cell2location [ | Hierarchical Bayesian framework | 1. Spatial location reconstruction for scRNA-seq data 2. Spatial decomposition | Capable of inferring the absolute number of cells per cell type for each capture location | Requires pre-specified hyperparameters whose appropriate values are often unknown to the user |
| SC-MEB [ | HMRF based on empirical Bayes | Spatial clustering | Faster and more accurate than benchmarked tools, especially on large datasets | The assumed fixed hexagonal neighborhood structure may not maintain high accuracy on all ST platforms |
| STAGATE [ | Graph attention auto-encoder | 1. Spatial clustering 2. Identify SVGs | Can be applied to three-dimensional ST datasets | Boundaries between two sections need further refinement |
| MULTILAYER [ | Agglomerative clustering of quantile normalized ST data | 1. Spatial clustering 2. Identify SVGs | Higher accuracy than benchmarked tools when applied to data from different ST platforms | Sensitive to ST data with low spatial resolution |
| HisToGene [ | Attention-based (vision transformer) model | Gene expression prediction from histology images | Can predict the gene expression in histology images at capture location level | Requires a large number of tissue samples for model training |
| STARCH [ | HMRF and HMM | Infer copy number aberrations | Higher accuracy than benchmarked tools in predicting CNAs in spatial datasets | A limited number of CNV states (deletion, neutral, amplification) are considered |
| Giotto [ | A toolbox containing integrated algorithms from multiple studies | A comprehensive toolbox for ST analysis and visualization | Offers comprehensive pipelines for ST data analysis | Only available for some ST platforms |
Abbreviations: MLE maximum-likelihood estimation, MAP maximum a posteriori, GCN graph convolutional network, GNN graph neural network, NMF non-negative matrix factorization, PCA principal components analysis, HMRF hidden Markov random field, ANN artificial neural network, MCC Matthews correlation coefficient, HMM hidden Markov model, SVG spatially variable gene, CNA copy number alteration, CNV copy number variation, ST spatial transcriptomics, CCI cell-cell interaction, FISH fluorescence in situ hybridization
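Several decomposition methods in the table (for example, RCTD with a Poisson likelihood) estimate cell type proportions at each capture location from scRNA-seq reference profiles. The following is a minimal sketch of only the core estimation step. It assumes a plain multinomial model (equivalently, Poisson counts with a fixed total) and omits the platform effects, priors, and overdispersion that the published methods add. The expectation-maximization (EM) update keeps the proportions on the simplex. Note that the estimates are shares of the location's transcripts, not of its cells, because cell types with more RNA contribute more reads. The function name `deconvolve_spot` and the simulated data are hypothetical.

```python
import numpy as np


def deconvolve_spot(counts, profiles, n_iter=1000, tol=1e-10):
    """EM estimate of cell type mixing proportions for one capture location.

    counts:   (genes,) observed counts at the location.
    profiles: (genes, cell_types) reference mean expression from scRNA-seq.
    Returns proportions of the location's transcripts attributed to each type
    (non-negative, summing to 1).
    """
    # Per-type gene distributions; a tiny pseudocount avoids dividing by zero.
    mu = profiles + 1e-12
    mu = mu / mu.sum(axis=0, keepdims=True)
    w = np.full(mu.shape[1], 1.0 / mu.shape[1])   # start from equal proportions
    total = counts.sum()
    for _ in range(n_iter):
        mix = mu @ w                               # expected gene distribution under w
        # E-step (each type's share of every gene's reads) and M-step combined.
        w_new = w * (mu.T @ (counts / mix)) / total
        converged = np.abs(w_new - w).max() < tol
        w = w_new
        if converged:
            break
    return w


# Toy data: 200 genes, 3 hypothetical cell types mixed 60/30/10.
rng = np.random.default_rng(1)
profiles = rng.gamma(shape=1.0, scale=1.0, size=(200, 3))
true_w = np.array([0.6, 0.3, 0.1])
expected = (profiles / profiles.sum(axis=0)) @ true_w
counts = rng.multinomial(20000, expected).astype(float)
w_hat = deconvolve_spot(counts, profiles)
print(np.round(w_hat, 3))
```

The multiplicative update never produces negative values and preserves the sum of 1 exactly (up to rounding). This is why an EM-style update is a convenient alternative to constrained least squares for this toy model.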
Fig. 2 Model workflow for testing independence between gene expression and spatial location in spatial transcriptomics data. A Spatial transcriptomics technology has enabled multiplexed profiling of cellular transcriptomes and spatial locations. B In spatial transcriptomics data, the transcriptome is represented by a matrix with genes as rows and spatial locations as columns. Distances between spatial locations are computed from their coordinates. C Covariance matrices are calculated from the gene expression values and from the spatial coordinates, respectively. D A significance test of whether gene expression is independent of the spatial coordinates, based on the two covariance matrices. E Spatial transcriptomics data modeled as a graph, in which each node corresponds to a spatial location and two nodes are connected if they are spatially proximate or have similar expression profiles. Graph convolutional networks aggregate features from each spatial location's neighbors through convolutional layers and use the learned representations for node classification, community detection, and link prediction. Extended applications include spatial decomposition, localized expression pattern identification, and cell-cell interaction inference
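Panels B to D can be illustrated with a single-gene permutation test. The following is a minimal sketch, not SPARK-X or any other published method. It assumes a Gaussian kernel on pairwise distances as the spatial covariance matrix and uses the outer product of the centered expression vector as the gene's covariance matrix. The statistic is the trace of their product, and the p-value comes from permuting expression across locations rather than from an analytic null distribution. `spatial_independence_test`, the bandwidth, and the toy grid are hypothetical choices.

```python
import numpy as np


def gaussian_spatial_kernel(coords, bandwidth):
    """Spatial covariance matrix: Gaussian kernel of pairwise Euclidean distances."""
    sq_dist = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def spatial_independence_test(expr, coords, bandwidth=2.0, n_perm=999, seed=0):
    """Permutation test of H0: a gene's expression is independent of location.

    With y the centered expression vector, y y^T plays the role of the expression
    covariance matrix, and trace(y y^T K) = y^T K y is the test statistic.
    """
    y = expr - expr.mean()
    k = gaussian_spatial_kernel(coords, bandwidth)
    t_obs = y @ k @ y
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_perm):
        y_perm = rng.permutation(y)           # shuffling breaks any spatial link
        exceed += int(y_perm @ k @ y_perm >= t_obs)
    # Add-one correction keeps the permutation p-value strictly positive.
    p_value = (1 + exceed) / (1 + n_perm)
    return t_obs, p_value


# Toy data: a 10 x 10 grid of locations.
rng = np.random.default_rng(42)
grid = np.array([[i, j] for i in range(10) for j in range(10)], dtype=float)
gradient_gene = grid[:, 0] + rng.normal(0.0, 1.0, size=100)   # rises along x
noise_gene = rng.normal(0.0, 1.0, size=100)                   # no spatial pattern
_, p_grad = spatial_independence_test(gradient_gene, grid)
_, p_noise = spatial_independence_test(noise_gene, grid)
print(p_grad, p_noise)
```

Each permutation costs O(n^2) for n locations, so this brute-force version only suits small examples. Methods designed for large, sparse data avoid both the dense kernel and the permutation loop.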
Fig. 3 Leveraging expression profiles from scRNA-seq data and spatial patterns from spatial transcriptomics data benefits the analysis of both data types. A In sequencing protocols where a capture location is larger than a cell, multiple cells are profiled as a mixture. Cell type-specific expression profiles derived from scRNA-seq data can be used to estimate cell type proportions at each capture location. B Once scRNA-seq and spatial transcriptomics data are projected to and clustered in a common latent space, complementary information from one data type can be used to impute features missing from the other, for instance, predicting spatial patterns for scRNA-seq data and imputing genes for spatial transcriptomics data. C Graphs can then be constructed from feature similarities in the latent space, enabling downstream graph-based methods such as graph convolutional networks. UMAP, uniform manifold approximation and projection
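Panel C, like panel E of Fig. 2, describes building a graph from feature similarities and applying a graph convolutional network. The following is a minimal forward-pass sketch. It assumes a symmetric k-nearest-neighbor graph on latent features and the widely used Kipf and Welling propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W). It does not reproduce the architecture of any specific method in the table, and the weights are random rather than trained. The function names and toy data are hypothetical.

```python
import numpy as np


def knn_adjacency(features, k):
    """Symmetric k-nearest-neighbor graph (0/1, no self-loops) on Euclidean distance."""
    sq_dist = ((features[:, None, :] - features[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(sq_dist, np.inf)                 # exclude each node from its own list
    nbrs = np.argsort(sq_dist, axis=1)[:, :k]
    adj = np.zeros(sq_dist.shape)
    rows = np.repeat(np.arange(len(features)), k)
    adj[rows, nbrs.ravel()] = 1.0
    return np.maximum(adj, adj.T)                     # keep an edge if either end chose it


def gcn_layer(adj, h, weight):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W), after Kipf and Welling."""
    a_hat = adj + np.eye(len(adj))                    # self-loops keep each node's own features
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ h @ weight, 0.0)       # aggregate neighbors, project, ReLU


# Toy data: 50 spots or cells embedded in a shared 10-dimensional latent space.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 10))
adj = knn_adjacency(latent, k=5)
weight = rng.normal(scale=0.1, size=(10, 4))          # random, untrained weights
out = gcn_layer(adj, latent, weight)
print(out.shape)  # (50, 4)
```

In practice, the weight matrices of stacked layers are learned by minimizing a task loss, such as node classification or link prediction, usually with a deep learning framework. This sketch only shows how each node's representation mixes in its neighbors' features.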