
SpatialDB: a database for spatially resolved transcriptomes.

Zhen Fan, Runsheng Chen, Xiaowei Chen.

Abstract

Spatially resolved transcriptomic techniques allow the characterization of the spatial organization of cells in tissues, which has revolutionized the study of tissue function and disease pathology. New strategies for detecting spatial gene expression patterns are emerging, and spatially resolved transcriptomic data are accumulating rapidly. However, it is not convenient for biologists to exploit these data because of the diversity of strategies and the complexity of data analysis. Here, we present SpatialDB, the first manually curated database for spatially resolved transcriptomic techniques and datasets. The current version of SpatialDB contains 24 datasets (305 sub-datasets) from 5 species generated by 8 spatially resolved transcriptomic techniques. SpatialDB provides a user-friendly web interface for visualization and comparison of spatially resolved transcriptomic data. To further explore these data, SpatialDB also provides spatially variable genes and their functional enrichment annotation. SpatialDB offers a repository for the research community to investigate the spatial cellular structure of tissues, and may bring new insights into understanding the cellular microenvironment in disease. SpatialDB is freely available at https://www.spatialomics.org/SpatialDB.
© The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2020        PMID: 31713629      PMCID: PMC7145543          DOI: 10.1093/nar/gkz934

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Cells are recognized as the fundamental unit of multicellular organisms. The establishment of single-cell RNA sequencing (scRNA-seq) (1,2) has boosted the development of modern cellular and molecular biology. scRNA-seq unmasks cell subsets within bulk RNA-seq data and provides the transcriptome of each individual cell, so that cells with similar transcription patterns can be grouped into subpopulations. Cells are then classified into types accordingly. However, when applied to solid tissues, a dissociation step must be performed to obtain a cell suspension for the subsequent scRNA-seq analysis. This process loses spatial information, which is critical to cellular fate and properties. To fully understand cell type identity in a multicellular organism (such as human and mouse), one must integrate individual cells’ transcriptome profiles with their spatial positions in a given tissue. Several methods have been developed to preserve spatial information. Single-molecule RNA fluorescence in situ hybridization (smFISH) (3) has been applied to quantitate RNA transcripts at single-cell resolution within a particular tissue context, but only a small number of genes can be measured. To improve the throughput, other imaging-based approaches, such as multiplexed error-robust FISH (MERFISH) (4) and sequential FISH (seqFISH) (5), have emerged. Meanwhile, sequencing-based methods, such as laser capture microdissection sequencing (LCM-seq) (6), Tomo-seq (7), spatial transcriptomics (ST) (8) and Slide-seq (9), take advantage of high-throughput sequencing technology to obtain spatially resolved gene expression, even at single-cell or subcellular resolution. The development of spatially resolved transcriptomic techniques has profoundly impacted many fields, including neuroscience (5,10,11), developmental biology (7,12,13) and immunology (14). These techniques have also been applied to cancer tissues.
In 2016, a study of breast cancer using ST uncovered unexpected heterogeneity within a biopsy (which would be impossible to detect by regular transcriptome analysis), and provided more detailed prognostic information (8). Profiling >6000 tissue regions in a single prostate by ST, Berglund et al. measured spatial gene expression in prostate cancer tissue sections and identified gene expression gradients for re-stratifying the tumor microenvironment (15). Improvements in spatially resolved transcriptomic techniques have led to the rapid accumulation of complex datasets with positional information. Because the available approaches differ dramatically, a database that enables convenient comparison, integration and visualization of spatially resolved transcriptomic data has been lacking. Therefore, we developed SpatialDB, a manually curated resource of spatially resolved transcriptomes for researchers to efficiently investigate and reuse these published data. The current version of SpatialDB includes 24 spatially resolved transcriptomic datasets in 5 species (human, mouse, Drosophila, Caenorhabditis elegans and zebrafish) generated by 8 spatially resolved transcriptomic techniques, including ST (8), Slide-seq (9), LCM-seq (6), seqFISH (5), MERFISH (4), Liver single cell zonation (16), Geo-seq (12) and Tomo-seq (7). SpatialDB provides an online tool for visualization of spatially resolved transcriptomic data and quick retrieval of spatial gene expression in a tissue of interest. Moreover, spatially variable (SV) genes were identified in 10 datasets, and functional enrichment analysis was performed. We expect that SpatialDB will serve as a helpful resource to facilitate exploration of the spatial organization of cells in tissues.

DATA COLLECTION AND PROCESSING

Data collection

We collected published spatially resolved transcriptomic datasets by searching PubMed with the following keywords: ‘spatial’ AND (‘transcriptome’ OR ‘transcriptomics’ OR ‘RNA-seq’ OR ‘RNA sequencing’). We obtained 8 spatially resolved transcriptomic techniques and 24 datasets from the search results (Figure 1 and Table 1). For each technique, we extracted a brief description and a schematic from the papers. For each dataset, we read the original paper and extracted the corresponding metadata, including publication information, data description, experimental design, samples, data availability, etc. We downloaded the gene expression matrix data and spatial position information from the supplementary materials of the papers, the GEO database and custom data hubs. If a dataset contained biological/technical replicates or samples from different tissues or treatments, we manually divided it into multiple sub-datasets. A total of 305 sub-datasets were obtained (Table 1). Datasets generated by four techniques (Table 1) were at single-cell resolution. All the collected datasets and studies were published before May 2019.
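The keyword search above can be expressed as a reproducible query. The following is a minimal sketch that only builds a search URL for the public NCBI E-utilities `esearch` endpoint; the paper does not say how the search was run, so the endpoint, the double-quoted phrase syntax, the date bounds and the function name `build_pubmed_search_url` are all assumptions made for illustration.

```python
from urllib.parse import urlencode

# Keyword query taken from the text; double quotes mark phrases in PubMed.
QUERY = (
    '"spatial" AND ("transcriptome" OR "transcriptomics" '
    'OR "RNA-seq" OR "RNA sequencing")'
)

# Public NCBI E-utilities search endpoint (an assumption, not named in the paper).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(query, max_date="2019/04/30", retmax=200):
    """Return an esearch URL for `query`, restricted by publication date."""
    params = {
        "db": "pubmed",
        "term": query,
        "datetype": "pdat",       # filter on publication date
        "mindate": "1900/01/01",  # assumed lower bound; esearch expects both bounds
        "maxdate": max_date,      # assumption: "before May 2019" read as end of April
        "retmax": retmax,
        "retmode": "json",
    }
    return ESEARCH + "?" + urlencode(params)


url = build_pubmed_search_url(QUERY)
print(url)
```

The hits returned by such a query would still need the manual reading and curation described above; the sketch covers only the retrieval step.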
Figure 1.

Overview of SpatialDB database. Spatially resolved transcriptomic data generated by eight techniques were collected from public resources. SpatialDB provided a web interface for online visualization and comparison of these data. Users can browse, search and download the datasets, SV genes and their functional annotations.

Table 1.

Statistics and description of spatially resolved transcriptomic techniques in SpatialDB

Techniques                  Datasets No. (a)  SV genes (b)  Single-cell resolution  Data storage  Highcharts module
Spatial Transcriptomics     5 (46)            4             -                       MySQL         Scatter, Heatmap
Slide-seq                   1 (5)             1             Single cell             JSON          Scatter, Heatmap
LCM-seq                     4 (9)             1             -                       MySQL         Scatter, Heatmap
seqFISH                     3 (35)            2             Single cell             MySQL         Scatter, Heatmap
MERFISH                     1 (181)           1             Single cell             JSON          Scatter, Heatmap
Liver single cell zonation  2 (2)             -             Single cell             MySQL         Scatter, Heatmap
Geo-seq                     1 (3)             1             -                       MySQL         Scatter, Heatmap
Tomo-seq                    7 (24)            -             -                       MySQL         Line
Total                       24 (305)          10            4                       -             -

(a) The number of datasets (sub-datasets) for each technique.

(b) The number of datasets in which the SV genes were identified.


Data processing

We performed median ratio normalization on ST datasets using DESeq2 (version 1.22.2) (17). Two LCM-seq datasets (6,18) did not provide precise coordinates of samples, so we obtained the 2D positions of samples using the t-SNE algorithm (19). Five pucks (coronal hippocampus, sagittal cerebellum, kidney, liver and sagittal cortex) from the Slide-seq dataset (9) were processed for visualization. Cells that had fewer than 100 read counts were removed. Genes that had fewer than 300 read counts in all cells were removed. To facilitate online visualization, the expression matrix and spatial position information of the Slide-seq and MERFISH datasets were represented in JSON (JavaScript Object Notation) format for each gene. Datasets from the other six techniques were stored in MySQL (version 5.5.60), and one data table was created for each sub-dataset (Table 1). We used two methods, SpatialDE (20) and trendsceek (21), to identify SV genes in 10 datasets (Table 1). The source code of the two methods was obtained from GitHub. The q-value threshold for significant SV genes identified by SpatialDE was 0.05. Trendsceek performed four statistical tests (Emark, Vmark, MarkCorr and MarkVario) for each gene. Genes were considered to be significantly SV if q-values were <0.05 for at least one of the four tests. GO and KEGG enrichment analyses of SV genes were performed using the clusterProfiler package (version 3.12.0) (22). The package was obtained from Bioconductor (release 3.9). The parameters of enrichment analysis were as follows. GO: ont = ‘ALL’, pAdjustMethod = ‘BH’, pvalueCutoff = 0.05, qvalueCutoff = 0.2, keyType = ‘ENTREZID’. KEGG: pvalueCutoff = 0.05, pAdjustMethod = ‘BH’, minGSSize = 10, maxGSSize = 500, qvalueCutoff = 0.2, use_internal_data = FALSE.
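Three of the steps above (the Slide-seq count filters, the per-gene JSON representation and the trendsceek significance rule) can be sketched in a few lines. This is an illustrative sketch, not the authors' pipeline. It assumes that cells are filtered before genes, that the gene threshold applies to the total count summed over the remaining cells, and that the JSON field names (`gene`, `points`, `cell`, `x`, `y`, `count`) are free to choose, since the paper does not give the schema. All function names are hypothetical.

```python
import json

MIN_COUNTS_PER_CELL = 100  # cells with fewer read counts are removed (from the text)
MIN_COUNTS_PER_GENE = 300  # genes with fewer read counts are removed (from the text)
Q_THRESHOLD = 0.05         # q-value cutoff for SV genes (from the text)


def filter_counts(counts):
    """Filter a {gene: {cell_id: read_count}} mapping.

    Assumption: cells are filtered first, then genes are filtered on their
    total count over the remaining cells.
    """
    cell_totals = {}
    for per_cell in counts.values():
        for cell, n in per_cell.items():
            cell_totals[cell] = cell_totals.get(cell, 0) + n
    kept_cells = {c for c, t in cell_totals.items() if t >= MIN_COUNTS_PER_CELL}

    filtered = {}
    for gene, per_cell in counts.items():
        row = {c: n for c, n in per_cell.items() if c in kept_cells}
        if sum(row.values()) >= MIN_COUNTS_PER_GENE:
            filtered[gene] = row
    return filtered


def gene_to_json(gene, per_cell, coords):
    """Serialize one gene as a JSON document of positioned counts.

    The field names are hypothetical.
    """
    points = [
        {"cell": c, "x": coords[c][0], "y": coords[c][1], "count": n}
        for c, n in sorted(per_cell.items())
    ]
    return json.dumps({"gene": gene, "points": points})


def is_sv_trendsceek(q_values):
    """Apply the stated rule: q < 0.05 in at least one of the four tests.

    q_values maps test name (Emark, Vmark, MarkCorr, MarkVario) to q-value.
    """
    return any(q < Q_THRESHOLD for q in q_values.values())
```

Storing one JSON document per gene means a chart request for a single gene needs to load only that gene's points, which is presumably why this layout suits online visualization of the larger Slide-seq and MERFISH datasets.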

DATABASE CONSTRUCTION AND CONTENT

Database construction

SpatialDB was constructed on a CentOS Linux server (version 7.6). The web services were built using Apache (version 2.4.6). The website was developed using PHP (version 7.0.33). The front-end of the website was developed using Bootstrap framework (version 3.3.7). The DataTables framework (version 1.10.19) was used to display data in tables. The online visualization of spatially resolved transcriptomes was implemented using Highcharts (version 7.1.1), jQuery (version 3.4.0) and d3 (version 3.3.10) JavaScript libraries. SpatialDB is freely available to the research community at https://www.spatialomics.org/SpatialDB and requires no registration or login.
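The storage layout described under Data processing, with one data table per sub-dataset, can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server. The long-format schema (`gene`, `x`, `y`, `expression`), the `sub_` table prefix and the function names are hypothetical; the paper does not describe the actual table structure.

```python
import re
import sqlite3


def table_name(sub_dataset_id):
    """Build a safe table name; identifiers cannot be bound as SQL parameters."""
    if not re.fullmatch(r"[A-Za-z0-9_]+", sub_dataset_id):
        raise ValueError(f"unsafe sub-dataset id: {sub_dataset_id!r}")
    return f"sub_{sub_dataset_id}"


def create_sub_dataset(conn, sub_dataset_id, rows):
    """Create one table for a sub-dataset and load (gene, x, y, expression) rows."""
    name = table_name(sub_dataset_id)
    conn.execute(
        f"CREATE TABLE {name} (gene TEXT NOT NULL, x REAL, y REAL, expression REAL)"
    )
    # Index on gene, since lookups are by gene of interest.
    conn.execute(f"CREATE INDEX idx_{name}_gene ON {name} (gene)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?, ?)", rows)


def spatial_profile(conn, sub_dataset_id, gene):
    """Return [(x, y, expression), ...] for one gene in one sub-dataset."""
    name = table_name(sub_dataset_id)
    cur = conn.execute(
        f"SELECT x, y, expression FROM {name} WHERE gene = ? ORDER BY x, y",
        (gene,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
create_sub_dataset(
    conn,
    "demo_1",
    [("Actb", 0.0, 0.0, 12.0), ("Actb", 1.0, 0.0, 7.0), ("Gapdh", 0.0, 0.0, 3.0)],
)
print(spatial_profile(conn, "demo_1", "Actb"))
```

With one table per sub-dataset, a request for a gene in a selected sample touches a single table, at the cost of many tables (305 sub-datasets in the current version).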

Visualization of spatially resolved transcriptomes

We combined the scatter module with the heatmap module of the Highcharts framework to visualize the spatial gene expression profiles generated by seven techniques (Figure 1 and Table 1). Users can browse the spatial expression profile of a gene of interest in the selected sample. Each point in the chart represents a group of cells or a single cell. Users can manually set the radius and symbol of the points for each chart. The color of a point represents the gene expression level, and a color bar is shown at the bottom of the chart. Hovering the mouse over a point opens a popup box with detailed information (read counts, coordinates, etc.). Through the chart context menu in the top right corner of the chart, users can view the chart in full screen, print it, view the data table and download the image or data in various formats. To zoom in, users can hold down the left mouse button and drag a rectangle in the chart. To move the field of view, users can hold down the ‘Shift’ key and the left mouse button and drag in the chart. The histological images of tissue sections are shown as the background of charts for two ST datasets (8,15). Diagrams of the murine small intestinal villus and liver lobule are displayed as the background of charts for the LCM-seq dataset (23) and the liver single cell zonation datasets (16,24), respectively, so users can see the spatial positions of points in the original tissues. A data transformation drop-down list is provided for each chart. Users can manually apply data transformation, including log2, ln and log10. We used the line module of Highcharts to visualize the datasets generated by Tomo-seq (Figure 1 and Table 1). In the charts of Tomo-seq datasets, the X-axis coordinates represent the serial numbers of tissue slices, and the Y-axis represents the gene expression level.
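The data transformation options can be illustrated with a short sketch. The paper does not say how zero counts are handled, so the pseudocount below is an assumption, as are the function names and the `x`/`y`/`value` point layout used to pair coordinates with transformed values.

```python
import math

# Transform names match the drop-down options listed in the text.
TRANSFORMS = {
    "log2": math.log2,
    "ln": math.log,
    "log10": math.log10,
}


def transform_expression(values, method, pseudocount=1.0):
    """Apply a log transform to expression values.

    Assumption: a pseudocount is added so that zero counts stay defined.
    """
    if method not in TRANSFORMS:
        raise ValueError(f"unknown transform: {method!r}")
    fn = TRANSFORMS[method]
    return [fn(v + pseudocount) for v in values]


def to_points(coords, values):
    """Pair (x, y) coordinates with values as chart-ready point dictionaries."""
    return [{"x": x, "y": y, "value": v} for (x, y), v in zip(coords, values)]


raw = [0, 1, 3]
points = to_points([(0, 0), (1, 0), (2, 0)], transform_expression(raw, "log2"))
print(points)
```

A log transform compresses the dynamic range of read counts, so a few highly expressed points do not dominate the color scale of the chart.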

Side-by-side comparison of spatial gene expression

We implemented the comparison of spatial gene expression profiles in the SpatialDB web interface. We provided two web pages to compare the heatmap charts and the line charts, respectively. Users can compare the spatial gene expression of two datasets generated by the same or different techniques at the same time side by side. Taking the heatmap charts comparison web page as an example, users can select ‘Dataset1’ and ‘Dataset2’ from the drop-down list, then click ‘Compare’ button to show the two charts side by side. For each chart, users can input a gene of interest, select a certain sample and click ‘Submit’ to show its spatial expression profile. The charts in the comparison web page contain all the options and properties mentioned in the above section.

Spatially variable (SV) genes

For each of the 10 datasets, SV genes identified in all sub-datasets by the same method were collected together and displayed in one data table. Taking ST as an example, clicking ‘Dataset’ in the navigation bar of the web interface displays a description of the ST technique and a table of ST datasets. Users can then select a dataset (for example, dataset 27365449) and click ‘Details’ in the fifth column of the first row to display a table of dataset details. A tab named ‘SV genes’ is located under the dataset details table. Users can click a column title to sort the SV gene table in ascending or descending order. A fuzzy search box at the top right of the table lets users quickly search the table by keywords of interest. Functional enrichment analysis results are listed below the SV gene table, and the 50 most enriched GO or KEGG terms are shown in a dot plot.

Search, download and upload

Users can easily obtain the spatial expression of a gene of interest across all the datasets from different techniques by searching SpatialDB. A quick search box is embedded in the homepage of the web interface. A fuzzy search box is also provided at the top right of the search results table. Users can narrow the search scope by selecting a species from the drop-down list. In addition, users can download all data via the ‘Download’ web page. Users who would like to share their data can send the necessary information to us through the ‘Upload’ web page; we will process the data and add them to SpatialDB. A detailed tutorial on the usage of the database is also provided on the ‘Help’ page.

DISCUSSION

One of the main goals of the Human Cell Atlas Project (25) is to characterize the spatial relationship of all cell types. Spatially mapping multiple cell types simultaneously based on expression signatures may help to study the interactions between different cell types. Some strategies have been developed to detect spatial gene expression profiles at single-cell resolution, such as Slide-seq (9), seqFISH (5) and MERFISH (10). To help the research community explore spatial gene expression data more efficiently, we constructed SpatialDB, a database for spatially resolved transcriptomes. To the best of our knowledge, SpatialDB is the first dedicated database to curate spatially resolved transcriptomes. The online visualization of spatial gene expression, the comparison tools and the SV gene annotations will help biologists explore the spatial organization of cells. The Human Cell Atlas Project has greatly propelled life science research at the single-cell level. It is expected that more spatial transcriptomic techniques will be developed, and spatial gene expression data will accumulate rapidly. We will keep collecting new techniques and datasets to update SpatialDB. Furthermore, we will integrate more tools and data sources to analyze the data.
  24 in total

1.  clusterProfiler: an R package for comparing biological themes among gene clusters.

Authors:  Guangchuang Yu; Li-Gen Wang; Yanyan Han; Qing-Yu He
Journal:  OMICS       Date:  2012-03-28

2.  Genome-wide RNA Tomography in the zebrafish embryo.

Authors:  Jan Philipp Junker; Emily S Noël; Victor Guryev; Kevin A Peterson; Gopi Shah; Jan Huisken; Andrew P McMahon; Eugene Berezikov; Jeroen Bakkers; Alexander van Oudenaarden
Journal:  Cell       Date:  2014-10-23       Impact factor: 41.582

3.  Spatial Transcriptomics of C. elegans Males and Hermaphrodites Identifies Sex-Specific Differences in Gene Expression Patterns.

Authors:  Annabel Ebbing; Ábel Vértesy; Marco C Betist; Bastiaan Spanjaard; Jan Philipp Junker; Eugene Berezikov; Alexander van Oudenaarden; Hendrik C Korswagen
Journal:  Dev Cell       Date:  2018-11-08       Impact factor: 12.270

4.  A gene expression atlas of early craniofacial development.

Authors:  Eric W Brunskill; Andrew S Potter; Andrew Distasio; Phillip Dexheimer; Andrew Plassard; Bruce J Aronow; S Steven Potter
Journal:  Dev Biol       Date:  2014-04-26       Impact factor: 3.582

5.  In Situ Transcription Profiling of Single Cells Reveals Spatial Organization of Cells in the Mouse Hippocampus.

Authors:  Sheel Shah; Eric Lubeck; Wen Zhou; Long Cai
Journal:  Neuron       Date:  2016-10-19       Impact factor: 17.173

6.  Spatial reconstruction of immune niches by combining photoactivatable reporters and scRNA-seq.

Authors:  Chiara Medaglia; Amir Giladi; Liat Stoler-Barak; Marco De Giovanni; Tomer Meir Salame; Adi Biram; Eyal David; Hanjie Li; Matteo Iannacone; Ziv Shulman; Ido Amit
Journal:  Science       Date:  2017-12-07       Impact factor: 47.728

7.  Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic region.

Authors:  Jeffrey R Moffitt; Dhananjay Bambah-Mukku; Stephen W Eichhorn; Eric Vaughn; Karthik Shekhar; Julio D Perez; Nimrod D Rubinstein; Junjie Hao; Aviv Regev; Catherine Dulac; Xiaowei Zhuang
Journal:  Science       Date:  2018-11-01       Impact factor: 47.728

8.  Spatial maps of prostate cancer transcriptomes reveal an unexplored landscape of heterogeneity.

Authors:  Emelie Berglund; Jonas Maaskola; Niklas Schultz; Stefanie Friedrich; Maja Marklund; Joseph Bergenstråhle; Firas Tarish; Anna Tanoglidi; Sanja Vickovic; Ludvig Larsson; Fredrik Salmén; Christoph Ogris; Karolina Wallenborg; Jens Lagergren; Patrik Ståhl; Erik Sonnhammer; Thomas Helleday; Joakim Lundeberg
Journal:  Nat Commun       Date:  2018-06-20       Impact factor: 14.919

9.  Transcriptome-scale super-resolved imaging in tissues by RNA seqFISH.

Authors:  Chee-Huat Linus Eng; Michael Lawson; Qian Zhu; Ruben Dries; Noushin Koulena; Yodai Takei; Jina Yun; Christopher Cronin; Christoph Karp; Guo-Cheng Yuan; Long Cai
Journal:  Nature       Date:  2019-03-25       Impact factor: 49.962

10.  SpatialDE: identification of spatially variable genes.

Authors:  Valentine Svensson; Sarah A Teichmann; Oliver Stegle
Journal:  Nat Methods       Date:  2018-03-19       Impact factor: 28.547

