Yijun Li1, Stefan Stanojevic2, Lana X Garmire1,2.
Abstract
Spatial transcriptomics (ST) has advanced significantly in the last few years. Such advancement comes with the urgent need for novel computational methods to handle the unique challenges of ST data analysis. Many artificial intelligence (AI) methods have been developed to utilize various machine learning and deep learning techniques for computational ST analysis. This review provides a comprehensive and up-to-date survey of current AI methods for ST analysis.
Keywords: Artificial intelligence; Deep learning; Machine learning; Spatial transcriptomics
Year: 2022 PMID: 35765645 PMCID: PMC9201012 DOI: 10.1016/j.csbj.2022.05.056
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 6.155
Fig. 1. Overview of AI methodologies and application areas in ST data analysis. (a) Timeline of emerging AI methods in ST analysis; (b) characteristics of ST data, potential reference datasets such as associated histology images and scRNA-Seq data, and the application areas in computational ST analysis: SVG detection, clustering, communication analysis, deconvolution, and enhancement.
Summary of AI methods in Spatial Transcriptomics Analysis.
| Method Category | Method Name | Description | Algorithm | Input | Advantage | Disadvantage | Programming language | Software link |
|---|---|---|---|---|---|---|---|---|
| SVG detection | SOMDE | Uses self-organizing maps to reduce the dimension of the ST dataset while retaining spatial structure and then detects SVGs using a Gaussian Process model. | self-organizing maps; Gaussian Process | ST data | SOMDE is runtime and memory efficient. | Performance on single-cell resolution ST datasets is unknown. | Python | https://github.com/WhirlFirst/somde |
| | scGCO | Identifies distinct gene expression patterns by optimizing the MRF model with graph cut. | graph cut; Markov random field | ST data | scGCO is runtime and memory efficient and potentially scalable to large ST datasets. | Reproducibility needs to be further tested by comparison with other methods. | Python | https://github.com/WangPeng-Lab/scGCO |
| Clustering | SEDR | An autoencoder framework that learns low-dimensional joint embedding of spatial and gene expression information. | autoencoder; deep generative model | ST data | Capable of handling high-resolution ST datasets. | The utilization of spatial adjacency matrices could pose a problem for scaling up to the analysis of large ST datasets. | Python | https://github.com/HzFu/SEDR |
| | CoSTA | Treats each gene expression pattern as an image, extracts spatially-aware gene expression feature vectors through a CNN, and clusters genes by spatial expression similarity. | convolutional neural network | ST data | Flexible to extend to model genes from neighboring samples, not just the same tissue. | The SVG detection functionality is not as sensitive as traditional SVG methods. | Python | https://github.com/rpmccordlab/CoSTA |
| | STAGATE | Uses a graph attention autoencoder and cell-type aware pruning module to cluster ST data. | graph attention autoencoder | ST data | Capable of handling ST datasets of diverse resolutions, especially those with cellular or sub-cellular resolution. | Doesn’t incorporate heterogeneity across tissue samples. | Python | https://github.com/zhanglabtools/STAGATE |
| | RESEPT | Embeds ST data to an RGB image through a graph autoencoder, and detects spatial domains by analyzing the RGB image with ResNet101, an established computer vision deep learning model. | graph autoencoder; deep convolutional neural network | ST data or RNA velocity | Flexible to analyze RNA velocity data as well as gene expression data. | Robustness regarding varying ST data resolution, technology platforms, and tissue types remains unexplored. | Python | https://github.com/OSU-BMBL/RESEPT |
| | SpaGCN | Defines spatial domains by combining gene expression and spatial information through a graph convolutional neural network. | graph convolutional neural network | ST data; H&E images (optional) | Flexible enough to leverage H&E images in learning the embedded representation of ST data. | The reproducibility of the detection of spatially variable genes or metagenes remains unvalidated. | Python | https://github.com/jianhuupenn/SpaGCN |
| | stLearn | Uses pre-trained ResNet50 to leverage spatial neighborhood information in H&E images and extract morphological features for each spot, which are used to compute spatially-aware normalized gene expression. | deep convolutional neural network | ST data; H&E images | Clustering functionality can detect rare cell types in addition to spatial domains. | Method performance is dependent on the resolution of morphological images (if available). | R/Python | https://stlearn.readthedocs.io/en/latest/ |
| | SpaCell | Extracts image features with a pre-trained ResNet50 and combines them with gene expression through an autoencoder to detect spatial domains. | deep convolutional neural network; autoencoder | ST data; H&E images | Can analyze multiple images simultaneously to predict patient disease state. | Doesn’t utilize spatial coordinate information of the spots. | Python | https://github.com/BiomedicalMachineLearning/SpaCell |
| | MAPLE | Simultaneously analyzes multiple ST datasets with a graph autoencoder and Bayesian finite mixture model to define cell spot sub-populations. | graph autoencoder; Bayesian finite mixture model | Multi-sample ST data | Allows for simultaneous analysis of multiple ST datasets. | Assumes the same number of cell spot sub-populations across samples. | R | https://github.com/carter-allen/maple |
| | conST | An interpretable, multi-modal contrastive learning framework for learning joint graphical embedding of ST data for clustering and other downstream analyses. | contrastive learning | ST data; matched H&E images (if applicable) | conST is the first contrastive learning computational method for ST data. | The parameter tuning in contrastive learning is non-trivial. | Python | https://github.com/ys-zong/conST |
| Communication Analysis | GCNG | Infers ligand-receptor gene pair relationships by learning joint embedded features of gene pair expression values and cell-adjacency matrix using a graph convolutional neural network. | graph convolutional neural network | single-cell ST data | Directly uses spatial information in gene pair relationship inference and can predict novel interactions. | Does not incorporate prior cell-type knowledge in gene-gene relationship inference. | Python | https://github.com/xiaoyeye/GCNG |
| | NCEM | Disentangles cell-cell communication on multiple orders through variations of a graph neural network model. | deep generative model | single-cell ST data | The multi-level framework of NCEM allows it to reconcile variation attribution and communication in different orders in a single model. | NCEM is currently only applicable to MERFISH datasets; its performance on other single-cell ST platforms is unknown. | Python | https://github.com/theislab/ncem |
| | MISTy | An ensemble machine learning algorithm that uses random forest submodels to simultaneously learn gene interactions, local cellular niche effects, and overall communication analysis that accounts for tissue structure. | ensemble machine learning | ST data | Doesn’t require prior-knowledge-based cell type annotations. | Doesn’t guarantee causality for the extracted interactions. | R | https://saezlab.github.io/mistyR/ |
| Deconvolution | Tangram | Aligns ST dataset with sn/sc RNA-seq data by matching spatial cell densities. The resulting mapping can be used for the deconvolution of lower-resolution ST data. | soft mapping | ST data; sn/sc RNA-Seq; H&E images (optional) | Capable of incorporating ST data with diverse resolutions. | The spot-to-cell assignment in deconvolution is random and can’t provide one-to-one alignment. | Python | https://github.com/broadinstitute/Tangram |
| | DestVI | A deep generative model that learns cell-type-specific latent variables in scRNA-Seq data and maps them to ST data for deconvolution and cell state estimation. | deep generative model | ST data; scRNA-Seq | Addresses marked variation within cell types by directly estimating cell-type-specific latent variables. | External benchmark studies showed that DestVI’s performance was not robust across heterogeneous tissue types. | Python | https://scvi-tools.org/ |
| | CellDART | Deconvolves ST data using ADDA, where the model adaptively learns to distinguish between pseudo-spots generated from a reference dataset with known cell proportions and actual ST spots. | Adversarial Discriminative Domain Adaptation (ADDA) | ST data; scRNA-Seq | Accommodates both ST and scRNA-Seq as reference data. | The size of the pseudo-spots is fixed, which could be susceptible to tissue types with heterogeneous spatial cell densities. | Python | https://github.com/mexchy1000/CellDART |
| | DSTG | Deconvolves ST data by aligning pseudo-ST data and real ST data with a graph convolutional neural network. | graph convolutional neural network | ST data; scRNA-Seq | Simultaneously utilizes graphical structures and variable genes. | An external benchmark showed that DSTG’s performance was not robust when the reference dataset was unmatched. | Python | https://github.com/Su-informatics-lab/DSTG |
| Enhancement & Imputation | XFuse | A deep generative model that infers super-resolved spatial gene expression data by learning joint embedding space of ST data and high-resolution histological images. | deep generative model | single-cell ST data; histological images | Capable of spatial gene expression inference on a full-transcriptome scale. | The implicit assumption that histological images and ST data share the same latent space may introduce bias in spatial gene expression inference. | Python | https://github.com/ludvb/xfuse |
| | DeepSpaCE | A convolutional neural network model that predicts spatial gene expression from histological images. | convolutional neural network | Matched H&E images from ST data. | The training of the DeepSpaCE model doesn’t require multiple samples. | Performance on other tissue types (besides human breast cancer) remains unvalidated. | Python | https://github.com/tmonjo/DeepSpaCE |
| | DEEPsc | A neural network-based method that infers spatial locations of scRNA-Seq data by extracting and aligning ST and scRNA-Seq feature vectors. | neural network | ST data; scRNA-Seq | Robust to random noise. | Training time is dependent on the dimension of spatial locations, which could pose scalability issues. | Matlab | https://github.com/fmaseda/DEEPsc |
| | stPlus | Enhances ST data by learning joint embedding of ST and scRNA-seq data via an autoencoder. | Autoencoder; k-NN | ST data; scRNA-Seq | Computationally scalable to large sample sizes or gene numbers. | An external benchmark study showed that stPlus had a relatively low accuracy rate in predicting spatial distribution of RNA transcripts. | Python | http://health.tsinghua.edu.cn/software/stPlus/ |
Fig. 2. General schematic of (a) the fully connected neural network, (b) the convolutional neural network, (c) the graph convolutional neural network, and (d) the autoencoder.
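As a rough illustration of the graph convolutional architecture that several of the tabulated methods build on, the sketch below links toy spots into a k-nearest-neighbour spatial graph and applies one graph convolution with the widely used symmetric normalization. It is not taken from any of the listed tools: the function names `knn_adjacency` and `gcn_layer`, the choice of k = 4, the random weights, and the simulated coordinates and counts are all assumptions made for illustration.

```python
import numpy as np

def knn_adjacency(coords, k=4):
    """Symmetric k-nearest-neighbour adjacency matrix from spot coordinates."""
    n = coords.shape[0]
    # Pairwise Euclidean distances between all spots.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a spot is never its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    adj = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    adj[rows, neighbours.ravel()] = 1.0
    # Keep an edge if either spot lists the other as a neighbour.
    return np.maximum(adj, adj.T)

def gcn_layer(adj, features, weights):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])   # add self-loops
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weights, 0.0)

# Toy data (assumed, not from any real ST dataset): 50 spots, 20 genes.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(50, 2))
expr = rng.poisson(2.0, size=(50, 20)).astype(float)
weights = rng.normal(scale=0.1, size=(20, 8))  # untrained, random weights

adj = knn_adjacency(coords, k=4)
embedding = gcn_layer(adj, np.log1p(expr), weights)
print(embedding.shape)  # (50, 8)
```

Each row of `embedding` mixes a spot's own log-transformed expression with that of its spatial neighbours. In a real method the weights would be learned, for example inside an autoencoder or a clustering objective, rather than drawn at random.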