Sergio Marco Salas1, Xiao Yuan2, Christer Sylven3, Mats Nilsson1, Carolina Wählby2, Gabriele Partel2,4,5. 1. Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden. 2. Department of Information Technology and Science for Life Laboratory Uppsala University, Uppsala, Sweden. 3. Department of Medicine, Karolinska Institutet, Huddinge, Stockholm, Sweden. 4. Laboratory of Multi-omic Integrative Bioinformatics, Department of Human Genetics, KU Leuven, Leuven, Belgium. 5. VIB-KU Leuven Center for Brain & Disease Research, Laboratory of Computational Biology, Department of Human Genetics, Leuven, Belgium.
Abstract
With the emergence of high-throughput single cell techniques, the understanding of the molecular and cellular diversity of mammalian organs has rapidly increased. In order to understand the spatial organization of this diversity, single cell data is often integrated with spatial data to create probabilistic cell maps. However, targeted cell typing approaches relying on existing single cell data achieve incomplete and biased maps that could mask the true diversity present in a tissue slide. Here we applied a de novo technique to spatially resolve and characterize cellular diversity of in situ sequencing data during human heart development. We obtained and made accessible well-defined spatial cell-type maps of fetal hearts from 4.5 to 9 post-conception weeks, not biased by probabilistic cell typing approaches. With our analysis, we could characterize previously unreported molecular diversity within cardiomyocytes and epicardial cells and identify their characteristic expression signatures, comparing them with specific subpopulations found in single cell RNA sequencing datasets. We further characterized the differentiation trajectories of epicardial cells, identifying a clear spatial component in them. All in all, our study provides a novel technique for conducting de novo spatiotemporal analyses in developmental tissue samples and a useful resource for online exploration of cell-type differentiation during heart development at sub-cellular image resolution.
Recent cell atlasing efforts to describe the cellular complexity of human organs, and provide comprehensive maps of their cell types, have been supported by both technological developments and a number of international initiatives [1,2]. One such technological advance is single-cell RNA sequencing (scRNA-seq) [3,4], which enables profiling of the transcriptome of tens of thousands of individual cells after tissue dissociation, and thus defines the cell-type composition of a tissue. Single cell RNA sequencing data can be further combined with more recent spatially resolved techniques [5,6] to create organ-wide gene expression atlases that map cell-type distributions and spatial biological programs directly in tissue samples.
With the aim of producing a spatiotemporal gene expression and cell atlas of the developmental human heart, Asp et al. [7] recently combined three different high-throughput technologies for gene expression profiling with immunohistochemical staining. They studied three developmental stages in the first trimester at 4.5–5, 6.5 and 9 post-conception weeks (PCW) using (1) Spatial Transcriptomics [8] (ST) for untargeted spatial gene expression profiling of grid-microdissected tissues, (2) scRNA-seq of 6.5 PCW tissue samples for dissecting cellular heterogeneity at single cell resolution, and (3) in situ sequencing [9] (ISS) to resolve the spatial heterogeneity at subcellular resolution. Finally, probabilistic cell-mapping via pciSeq [10] was applied to achieve single cell level maps of the cell-type distribution. PciSeq both assigns in situ decoded reads to segmented cells and assigns cells to scRNA-seq defined cell types by a probabilistic approach. The result of the overall study was the first comprehensive spatial transcriptional atlas of the developmental human heart.
Despite this achievement, some important limitations were found in the study, as also pointed out by Phansalkar et al. [11] in a commentary to the paper.
One of the main limitations of the probabilistic approach is that preexisting knowledge of the tissue constituent cell types is required to characterize the spatial cellular heterogeneity. Thus, probabilistic cell typing by in situ sequencing (pciSeq) was only possible for the 6.5 PCW developmental stage where single cell data was available, leaving the cellular diversity in the 4.5–5 and 9 PCW time points unexplored at a single cell level. Additionally, cell-typing methods that depend on priors defined by scRNA-seq, such as pciSeq or Tangram [12], may introduce a strong bias that can limit the possibility to distinguish between rare cell sub-types or sub-states that might not be fully captured by scRNA-seq sampling approaches, thus preventing the full range of heterogeneity of a sample from being resolved. Finally, most of the current cell-typing methods rely on their ability to segment out 3D cells from a 2D representation of the tissue, leading to possible misidentification of cells and misclassification of reads.
In order to overcome these limitations, we present a de novo spatiotemporal analysis of the Asp et al. ISS data where we model all three developmental stages at the same time. For our analysis we applied a data driven approach, called spage2vec [13], that generates a common representation of the spatial gene expression at the different developmental stages. Spage2vec represents the spatial gene expression as a graph and applies a graph representation learning technique to create a lower dimensional representation of the data independent from scRNA-seq defined priors. We then used this common data representation to define the identities and the spatiotemporal relationship of cell- and sub-cell-type gene expression signatures across the three developmental stages of the embryonic heart, identifying previously unreported molecular identities within cardiomyocytes as well as in atrial sub-epicardial cells.
We provide newly molecularly defined maps of cell-type signatures during embryonic human heart development on an online platform for interactive exploration (https://tissuumaps.research.it.uu.se/human_heart.html) [14], which, together with the Asp et al. atlas [7], represents a useful resource for future studies on human heart development.
Results
Identification of de novo cellular signatures during the early heart development
With the aim of exploring cellular diversity during human heart development, spage2vec [13] was used to identify cellular expression signatures de novo across the three developmental stages (Materials and Methods). The analysis is based on the locations of genes represented in the gene panel used during ISS, and is thus dependent on the panel’s ability to represent heterogeneity in gene expression. A total of 27 clusters with specific cellular expression signatures were found across the three time points of heart development. The signatures were grouped in five main classes, and annotated according to their expression patterns, including atrial cardiomyocytes, ventricular cardiomyocytes, fibroblast-like cells/epicardium-derived cells, epicardial cells and neural crest cells. Clusters assigned to the same class were found to have similar molecular and spatial patterns. Their distribution was also found to be consistent through the different samples and time points analyzed and most of them were found to have a conserved location in the heart between PCW 4.5–5, PCW 6.5 and PCW 9.
Spage2vec represents the cellular diversity better than pciSeq
In order to identify common findings and discrepancies with previous results, we compared the cellular identities defined by spage2vec with the ones described in Asp et al. [7]. This comparison was limited to the only time point analyzed via probabilistic pciSeq [10], PCW 6.5 (Materials and Methods). A total of 27 meaningful cellular identities shared across all three time points were defined, in contrast to the 15 cell types defined from scRNA-seq data and assigned in situ via pciSeq in Asp et al. Most of these additional clusters capture a previously undescribed diversity within cardiomyocytes, while other cell types such as endothelial cells or fibroblast-like cells present a one-to-one correspondence. This is also observed when comparing the expression signatures of the spage2vec clusters and the cell types described in the single cell RNA sequencing dataset from Asp et al. Regarding the spatial location of the clusters, both methods agreed on the location of some clusters such as epicardial cells and, to a lesser extent, capillary endothelial cells. However, more striking differences were observed when comparing the location of other cell types, such as the clusters with a fibroblast-like expression signature, where spage2vec clusters present a more specific spatial distribution through the tissue.
Individual time point analysis uncovers time-point specific expression signatures
To investigate whether the larger PCW 9 samples, which contain a higher number of cells, could be driving the clustering results and leading to a misclassification of cells in the earlier and smaller tissue samples, clustering was performed separately on each of the time points (Materials and Methods). A total of 94 clusters were found, including 34 in PCW 4.5–5 and 30 each in PCW 6.5 and PCW 9. Overall, the clusters found in the different time points present a similar distribution in the latent space. In addition, most clusters found in specific time points recapitulated molecular and spatial signatures found when analyzing all time points together. The clear correspondence between both analyses shows that the diversity found at different time points was not driven by any of the samples individually.
One important strength of the de novo analysis presented here is its ability to resolve the cellular heterogeneity at a higher resolution compared to the scRNA-seq data driven analyses, finding a larger number of clusters with distinct and consistent spatial distribution across the different samples, even though the molecular information per cell is much lower than in scRNA-seq. One reason for this is the vastly larger number of cells analyzed in the ISS experiment (20,920 cells), compared to the scRNA-seq experiment (3,777 cells). In order to assess whether this spatially defined diversity could also be found in the scRNA-seq dataset, the molecular signatures from the intermediate time point samples (PCW 6.5) and the corresponding scRNA-seq dataset were integrated using SpaGE [15] (Materials and Methods). As observed in Fig 2B, several molecular signatures matched specific subpopulations in the single cell dataset at week 6.5. To compare the differences between the subpopulations, we identified their most differentially expressed genes (Materials and Methods), assessed their gene ontology (GO) characteristics using scRNA-seq and compared their spatial locations in the tissue.
Fig 2
Analysis of individual developmental stages and correspondence with single cell data.
A. PAGA plots of the clusters found in each time point specific analysis. Clusters from each time point are represented in a PAGA plot, including 4.5–5 PCW (top), 6.5 PCW (middle) and 9 PCW (bottom). Background colors represent the main cell types found in the dataset. B. Heatmap representing correspondence in terms of cosine similarities between scRNA-seq data and spage2vec clusters from PCW 6.5 (Materials and Methods), normalized by row, therefore ranging from 0 to 1. Cells from scRNA-seq dataset (rows) are sorted based on their cell type in order to facilitate the interpretation.
Exploration of molecular signatures identified within cardiomyocytes and endothelial cells.
A. Dot plots representing the expression of the 3 most differentially expressed genes between each of the clusters (cl) related with endothelial cells (clusters 14 and 17) (left), ventricular cardiomyocytes (clusters 4, 8 and 15) (middle) and atrial cardiomyocytes (clusters 1, 2 and 7) (right). Expression is shown in the endothelial and cardiomyocyte related clusters linked to specific populations within the scRNA-seq dataset from Asp et al. [7]. B. Spatial maps highlighting the reads assigned to the clusters related with endothelial cells and cardiomyocytes (right), with gene ontology enrichment of biological processes for the top 15 most differentially expressed genes of each cluster (left). Color codes as in A.
Cardiomyocytes present a previously uncaptured cellular diversity
The cardiomyocytes of the developing heart provide a clear example where spage2vec clusters identify high molecular diversity. Three different molecular signatures were found within atrial cardiomyocytes and a total of five molecular signatures were found within ventricular cardiomyocytes, three of them having supporting scRNA-seq data. Top differentially expressed genes (DEG) between the different subtypes were found to be expressed only in cardiomyocytes, presenting in scRNA-seq the same coexpression patterns as in the ISS dataset, and suggesting that the diversity identified within this group of cells is not caused by the influence of neighboring cells. These newly identified clusters also present a better-defined region-specific location compared to analogous pciSeq cell-type maps in Asp et al., where some atrial cells are misplaced in the ventricles and vice versa. Within ventricular cardiomyocytes, the five different spage2vec clusters presented unique expression patterns and spatial distributions from the periphery to the interior of the heart. However, not all the clusters were aligned with corresponding cell subpopulations from scRNA-seq data integration (Materials and Methods). While clusters 4, 8 and 15 aligned within both ventricular and MYOZ2-enriched cardiomyocytes, clusters 16 and 24 presented a very weak alignment within the cells described in the scRNA-seq analysis. Indeed, we related pcw6.5–16 with the presence of some endothelial markers in the neighboring cells, suggesting a mixed signature. We found cluster 4 to be located within the ventricular wall, presenting a high expression of MYH7 and characteristics of trabecular myocardium. In contrast, cluster 8 had a peripheral location and was also found to have a strong expression of MYH7, consonant with outer, compact myocardium. Both molecular signatures had GO characteristics of contracting ventricular muscle, although these GO terms were more pronounced for trabecular myocardium. 
Finally, cluster 15, with a preferentially diffuse location in the outer compact myocardium, was found to be related with cell division. As a consequence, we believe that these cells may be cardiomyoblasts participating in the consolidation of the compact myocardium.
Regarding the clusters associated with atrial cardiomyocytes, the three molecular signatures identified presented distinct locations and molecular functions as suggested by Gene Ontology analysis. While cluster 1 was located mainly in the periphery of the atria and presented GO characteristics of appendage formation, cluster 2 partly had a more central location, presenting GO characteristics of cardiac conduction. However, cluster 7 was found to be the most distinct molecular signature both spatially, being localized in the cranial and caudal part of the atria, and molecularly, presenting GO characteristics of morphogenesis and epithelial to mesenchymal transition. These spatial and GO characteristics suggest that these cells might be involved in the formation of the atrial septum, which occurs at this stage of development.
Finally, two different clusters were found to be associated with endocardium-related cells, one being located in the atria and the other one in the ventricles. The cells belonging to these clusters show very distinct spatial localization. However, unlike cardiomyocytes, we did not find notable differences in their gene expression; both present enrichment in GO terms involved in cardiovascular morphogenesis and development. In addition, DEG between both clusters were found solely expressed in atrial and ventricular cardiomyocytes respectively, suggesting that both clusters identified the same cell type (endocardial cells) in two different spatial contexts.
An atria-specific subepicardial signature during human heart development
Perhaps one of the most remarkable aspects of the analysis was the identification of very thin sub-epicardial mesenchymal cell layers in the time point-specific analysis of PCW 6.5, possibly originating from epithelium via epithelial–mesenchymal transition (EMT) [16,17]. Using pseudotime analysis, we identified two main branches emerging from epicardial cells (cluster 18), which could indicate two differentiation trajectories: one describing the differentiation of epicardial cells into epicardium-derived cells (EPDCs) and fibroblasts (i.e., clusters 18-21-26-9-12-10), and a second one describing the differentiation of epicardial cells into atrial cardiomyocytes (i.e., clusters 18-11-1-2-7). An additional branch connects EPDCs to atrial cardiomyocytes (i.e., clusters 12-27-7-2-1), suggesting that EPDCs undergo mesenchymal transition and differentiate into cardiomyocytes [18,19]. By mapping both the spage2vec identities and the pseudotime scores of these branches into the tissue we observed that pseudotime has a clear spatial component, matching the gradient from the periphery to the interior of the heart in the developing atria. GO analysis of clusters with enough supporting scRNA-seq cells shows terms enriched for EMT and atrial morphogenesis.
Description of the differentiation of epicardial cells in the human heart development.
A,B. Diffusion map of pseudo-cell expression profiles defined in PCW 6.5 (Materials and Methods) and assigned to clusters related with epicardial cells, ventricular cardiomyocytes, epicardium-derived cells and fibroblasts. Each spot is labelled in A according to the cluster it was assigned to in Fig 2A. In B, the color of each spot represents its pseudotime score, considering the root in cluster 18 (epicardial cells). Two main branches can be observed. Pseudotime scores above 0.5 were trimmed for visualization purposes. C. Spatial map highlighting the spots assigned to the clusters present in Fig 4A in one of the two sections from PCW 6.5. D. Spatial map representing the pseudotime scores of each of the spots described in Fig 4B in one of the sections from PCW 6.5 and a region of interest present in the same tissue. Clusters were represented in two different plots, depending on whether they were situated in Branch 1 (top) or Branch 2 (bottom) according to Fig 4A and 4B. E. Dot plot showing enrichment of Gene Ontology biological processes for top 15 most differentially expressed genes in the clusters represented in Fig 4A.
Discussion
The improvement of targeted spatially resolved transcriptomic approaches [20,21] in terms of signal-to-noise ratio, sequencing depth, number of genes and number of cells analyzed is leading towards the generation of larger datasets that will enable more and more comprehensive data driven spatial analysis. So far, methods such as ISS have primarily been a useful complement to scRNA-seq strategies by uncovering the spatial location of scRNA-seq defined cell populations. However, spatial molecular organization in itself presents intrinsic critical information of the cellular heterogeneity that is not fully captured by non-spatial methods; thus, de novo approaches that do not rely on previous knowledge are starting to gain relevance in the field due to their notable advantages [13,22]. In this study, thanks to one of these de novo approaches, spage2vec [13], we have been able to define 27 molecular signatures conserved during the developmental process of the heart based solely on the spatial location of the expressed molecules of 69 targeted genes.
In contrast with the original study [7], where cell typing was constrained by availability of scRNA-seq data, our approach is able to define, in a spatiotemporal manner, different molecular signatures stable over the different time points analyzed during heart development. Our analysis was able to capture stable cell populations through the developmental process, such as epicardial cells, and could be used for understanding biological processes like migration and differentiation. Supervised cell typing approaches [10,12,23] will force the ISS data to fit signatures designed from scRNA-seq, with the risk of introducing biases and losing part of the potential biological information available in the ISS data. Furthermore, supervised approaches may fail to assign cells to a cell type due to discrepancies between the detected molecular signatures and the scRNA-seq data. 
As a consequence, while de novo approaches such as spage2vec assign a molecular signature to each read analyzed, probabilistic cell typing approaches avoid assigning a signature to many of the reads analyzed, missing in some cases molecular patterns with a true biological implication.
Moreover, unlike most existing cell typing strategies, spage2vec does not rely on cell segmentation. This aspect can be highly beneficial when working with compact tissue, where cell borders are difficult to define. Spage2vec directly clusters the mRNA reads based on their local environment, and neighborhood information is incorporated in the process. In order to capture spatial signatures at cellular resolution, this method aggregates local information from neighborhoods within a radius of approximately 15 μm, which is a reasonable inter-cell distance, although the detected spatial clusters can represent cellular and even subcellular gene expression patterns. Since the method is completely unsupervised, super-cellular or sub-cellular patterns may also be captured depending on multiple factors that are related to the gene panel selected, sequencing resolution, and local differences in cell density. This needs to be taken into consideration when interpreting the outcome of the method, since individual cell types present in different spatial contexts can be split into different environment-dependent clusters due to the effect of the neighboring cells in the creation of the embedding, as we believe occurs with endocardium-related cells in this study. In these cases, further verification of the clusters using independent techniques is recommended.
For its unsupervised analysis, spage2vec depends on a targeted ISS gene panel. In this case, the genes were selected at an early stage of the Asp et al. study [7], based on scRNA-seq and Spatial Transcriptomics data. 
Despite the clear limitation of using a subset of markers for identifying clusters de novo, we have shown that, by leveraging the representation power of deep learning, spage2vec can also identify subpopulations through non-linear aggregation of spatial marker features, even without marker genes that can directly identify all cell populations. This is exemplified when comparing atrial and ventricular clusters described here with those reported in a follow-up study by Sylvén et al. [24] based on the combination of scRNA-seq and Spatial Transcriptomics. They report two atrial clusters, trabecular and conduit atrium, that annotate to similar tissue distributions as spage2vec clusters pcw6-1-2 and pcw6-7. While available gene markers annotate trabecular atrium in spage2vec, no highly specific markers are available for conduit atrium but rather a concerted profile of markers that identifies it.
For the ventricular clusters, similar principles prevail. Thus, Sylvén et al. [24] report characteristics of compact, trabecular with subtypes, and Purkinje-related myocardium that have tissue distributions similar to spage2vec clusters pcw6-4,8,15 (see viewer) annotating outer and inner ventricular myocardium, which spage2vec thus identifies based on concerted gene profiles. An exception is spage2vec cluster 24 with CDK1, NUSAP1 and TOP2A gene expressions that are highly specific for cardiomyoblasts with high fractions of cell cycle G2M and S phases and exosome-enriched gene expressions.
Apart from its ability to capture specific subpopulations, here we show that segmentation-free methods can be used to describe a differentiation process, including its spatial component. In this manuscript we report two main trajectories involving epicardial cells in atrial development. This observation is supported by Singh et al. 2013 [25], Greulich et al. 2011 [26], Cai et al. 2008 [18] and Zhou et al. 2008 [19], who report that at the atrial level epicardial cells flow into the atrial myocardial wall of venous origin and, through epithelial-to-mesenchymal transition, differentiate into arterial endothelium, smooth muscle and perivascular fibroblasts and may contribute to myocardialization of the atrial wall.
All in all, by applying spage2vec to study human heart development we have been able to perform a spatiotemporal analysis of the cells found in post-conception weeks 4.5–5, 6.5 and 9, identifying different molecular signatures and developmental processes previously undescribed at this resolution. The spatial maps of the newly characterized cell-type signatures are publicly available and can be interactively explored at https://tissuumaps.research.it.uu.se/human_heart.html. Furthermore, this study demonstrates the advantages of using de novo strategies to jointly model the spatial gene expression at different developmental stages, without relying on cell segmentation and scRNA-seq to characterize developmental processes. Thus, it opens the possibility of applying this technique to similar biological systems where reference single cell RNA sequencing data may be limited or not available.
Materials and methods
Datasets
The ISS dataset of the developing human heart [7] comprises gene expression information of 69 marker genes and decoded spatial coordinates of mRNA spots in eight tissue sections at three developmental time points (Fig 1A). There are 189541, 812808, and 1471602 mRNA reads at the three time points respectively, summing up to a total of 2473951 reads.
Fig 1
Overview of the Spage2vec approach to characterize the human developmental heart.
A. Spage2vec constructs a graph from the spatial gene expression of the tissue samples and projects spatial markers in a common latent space. Scale bars: 1 mm, cutout 15 μm. B. PAGA plot representing the different expression profiles defined using spage2vec. Background colors represent main cell classes manually annotated based on cluster expression profiles. C. Heatmap showing the mean expression of each gene (i.e. expression profile) in the clusters defined by spage2vec, along with suggested cell classes, color coded as in Fig 1C. Expression is normalized by gene. D. Spatial maps of the different expression profiles defined using spage2vec in three sections (color coded as Fig 1B), one from each time point. For interactive multi-resolution viewing: https://tissuumaps.research.it.uu.se/human_heart.html.
Spatiotemporal representation of ISS gene expression data with spage2vec
Spage2vec [13] learns to map local neighborhood relationships between mRNA spots as distances in a continuous latent space using a deep learning model. As a result, a numerical vector is assigned to each individual mRNA spot describing its neighborhood composition. Therefore, molecules that share similar local environments are described with numerically similar vectors and consequently mapped in close proximity in the learned latent space. In such a way, we are able to build a spatiotemporal representation of the spatial gene expression in an unsupervised manner and without using any prior information. The learned representation is then used to perform clustering analysis to define localized gene expression signatures that represent cell-type signatures across the three embryonic stages.
Constructing a spatial gene expression graph
We first construct an undirected graph where each node represents an mRNA spot, with a one-hot encoding feature vector representing its corresponding gene. Each node is then connected by edges to its spatial local neighbors of the same tissue section within a maximum distance (d_max = 44.9 pixels / 14.58 μm). We estimate the maximum distance such that 99% of nodes in the graph are connected to at least one neighbor. Connected components with fewer than six nodes are subsequently removed from the graph to exclude spurious reads such as spots located outside of the region of the heart sample, thus leaving 97.7% of the original mRNA reads for further processing.
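The graph construction described above can be sketched in plain Python for a single tissue section. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the brute-force neighbor search and the percentile-based reading of the 99% criterion are all hypothetical choices made for readability, and a dataset of millions of spots would need a spatial index instead of pairwise loops.

```python
import math

def one_hot(gene, panel):
    """One-hot feature vector of a spot's gene over the gene panel."""
    return [1.0 if g == gene else 0.0 for g in panel]

def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def estimate_d_max(points, fraction=0.99):
    """Smallest radius at which `fraction` of spots have at least one neighbor.

    Assumption: this is taken as the `fraction` quantile of the
    nearest-neighbor distances.
    """
    nearest = sorted(
        min(distance(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    )
    index = max(0, math.ceil(fraction * len(nearest)) - 1)
    return nearest[index]

def build_graph(points, d_max):
    """Undirected radius graph: node index -> set of neighbor indices."""
    adjacency = {i: set() for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if distance(points[i], points[j]) <= d_max:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency

def drop_small_components(adjacency, min_size=6):
    """Keep only nodes in connected components with at least `min_size` nodes."""
    seen, kept = set(), set()
    for start in adjacency:
        if start in seen:
            continue
        component, stack = {start}, [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    stack.append(neighbor)
        seen |= component
        if len(component) >= min_size:
            kept |= component
    return {node: adjacency[node] & kept for node in kept}
```

In this sketch each tissue section would be processed separately, so that no edge connects spots from different sections.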
Graph convolutional neural network model and training
We then train a graph convolutional neural network on the spatial gene expression graph to produce the spage2vec latent representation for each mRNA spot. The neural network consists of two GraphSAGE [27] layers. At each layer, the features of a node and its local neighborhood are aggregated and propagated to the next layer. The neural network learns its parameters in an unsupervised setting by minimizing a loss function based on random walks. The loss function of a node encourages similarity between the node and a direct neighbor that occurs in a random walk, and dissimilarity between the node and another node randomly sampled from the graph. Regarding the hyperparameters of the model, we use the mean aggregator at each layer and ReLU as the activation function for the first layer. The size of each layer is 32. The model is trained for 10 epochs with a batch size equal to 64, using Adam optimizer [28] with a learning rate equal to 0.001. The output for each mRNA spot is then a spage2vec latent vector of length 32.
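To make the two ingredients of this training scheme concrete, the sketch below shows a mean aggregation step and a random-walk loss in plain Python. It is an illustration only: the text does not give the exact form of the loss, so the negative-sampling objective used here (the one commonly paired with GraphSAGE) is an assumption, the function names are hypothetical, and the learned weight matrices and ReLU of a real layer are omitted.

```python
import math

def mean_aggregate(node, adjacency, features):
    """Element-wise mean of the feature vectors of a node's neighbors."""
    neighbors = adjacency[node]
    dim = len(features[node])
    if not neighbors:
        return [0.0] * dim
    return [
        sum(features[n][d] for n in neighbors) / len(neighbors)
        for d in range(dim)
    ]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def random_walk_loss(z_node, z_positive, z_negatives):
    """Loss for one node (assumed negative-sampling form).

    The first term rewards similarity to a neighbor met on a random walk;
    each remaining term penalizes similarity to a randomly sampled node.
    """
    loss = -log_sigmoid(dot(z_node, z_positive))
    for z_negative in z_negatives:
        loss -= log_sigmoid(-dot(z_node, z_negative))
    return loss
```

The loss is small when the embedding of a spot points in the same direction as that of its walk neighbor and away from the random samples, which is what pushes spots with similar neighborhoods together in the latent space.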
Cluster analysis and visualization
After predicting a latent vector for each mRNA spot based on its neighborhood composition, we compute a weighted kNN graph (k = 15) of the spage2vec latent vectors and apply the Leiden clustering algorithm [29] (with clustering resolution r = 1) to the kNN graph. We then use PAGA [30] to quantify the connectivity of the resulting clusters, which represents the clusters' proximity in the latent space. Each cluster with fewer than 1000 nodes is merged into the larger cluster to which it has the maximum connectivity in the PAGA graph, provided that connectivity is greater than 0.1. Otherwise, the small cluster is considered an outlier and filtered out. After merging and filtering out the small clusters, we count the number of spots per cluster per gene and apply cluster-wise Z-score normalization to create a cluster expression matrix. This yields the final set of spage2vec clusters, which can be visualized interactively using TissUUmaps [14].
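The merging rule and the cluster expression matrix can be sketched as follows. This is a hedged illustration: the function names and the dictionary-based inputs are hypothetical, the PAGA connectivities are assumed to be already computed, and "cluster-wise" normalization is interpreted here as a Z-score across genes within each cluster, which is an assumption about the text rather than a confirmed detail.

```python
from collections import Counter
from statistics import mean, pstdev


def merge_small_clusters(sizes, connectivity, min_size=1000, min_conn=0.1):
    """Map every cluster to its final label, or to None if it is an outlier.

    sizes:        {cluster: number of spots}
    connectivity: {(cluster_a, cluster_b): PAGA connectivity}, either order
    """
    large = [c for c, n in sizes.items() if n >= min_size]
    mapping = {c: c for c in large}
    for cluster, n in sizes.items():
        if n >= min_size:
            continue
        best, best_weight = None, min_conn      # must exceed the 0.1 threshold
        for big in large:
            weight = connectivity.get((cluster, big),
                                      connectivity.get((big, cluster), 0.0))
            if weight > best_weight:
                best, best_weight = big, weight
        mapping[cluster] = best                 # None means filtered out
    return mapping


def cluster_expression(labels, genes, gene_order):
    """Spot counts per cluster and gene, Z-scored within each cluster."""
    counts = Counter(zip(labels, genes))
    matrix = {}
    for cluster in sorted({lab for lab in labels if lab is not None}):
        row = [counts[(cluster, g)] for g in gene_order]
        mu, sd = mean(row), pstdev(row)
        matrix[cluster] = [(v - mu) / sd if sd > 0 else 0.0 for v in row]
    return matrix
```

For example, with sizes `{"a": 5000, "b": 3000, "c": 200, "d": 50}` and connectivities `{("c", "a"): 0.4, ("c", "b"): 0.2, ("d", "a"): 0.05}`, cluster `c` is merged into `a` and cluster `d` is dropped as an outlier.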
Spage2vec and scRNA-seq data integration
We perform data integration between the spage2vec clusters from the individual analysis of the PCW 6.5 ISS data and the corresponding scRNA-seq data from Asp et al. Specifically, we first log-normalize scRNA-seq total counts per cell. Then, we generate pseudo-cell gene expression profiles for each mRNA spot by aggregating its k nearest neighbors (k = 100) in the spage2vec latent space. Next, we filter out genes with fewer than 100 reads and log-normalize total counts per pseudo-cell. We thereafter integrate pseudo-cell and scRNA-seq gene expression profiles using SpaGE [15]. The two datasets are aligned by projecting them into a common latent space by domain adaptation [31] using 30 principal vectors. After alignment, we can either infer the spatial profile of genes that are missing from the original ISS gene panel or, vice versa, assign scRNA-seq cells to spage2vec clusters by k-nearest neighbor imputation. Specifically, for each scRNA-seq cell we compute the cosine similarity in the common latent space with respect to its k (k = 15) nearest-neighbor pseudo-cells, and we define its correspondence with a spage2vec cluster as the sum of the cosine similarities with respect to those pseudo-cells belonging to the given cluster. We then exclude scRNA-seq cells with low correspondence to the spatial clusters (i.e. maximum cosine similarity smaller than 0.3), and we assign each remaining scRNA-seq cell to the spage2vec cluster with the highest cosine similarity. Spatial clusters with fewer than 10 assigned scRNA-seq cells are marked as weakly aligned, as they lack sufficient supporting scRNA-seq cells, and are thus excluded from further analyses.
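The cell-to-cluster assignment step can be illustrated with a short sketch. It rests on several assumptions that the text does not settle: the function names are hypothetical, nearest neighbors are ranked by cosine similarity itself, the 0.3 threshold is applied to the single best similarity, and the final label is the cluster with the largest summed similarity among the k neighbors. The SpaGE alignment is assumed to have been run beforehand, so the inputs are already coordinates in the common latent space.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def assign_cell(cell, pseudo_cells, pseudo_labels, k=15, min_similarity=0.3):
    """Return the spage2vec cluster for one scRNA-seq cell, or None.

    cell:          latent coordinates of the scRNA-seq cell
    pseudo_cells:  latent coordinates of the pseudo-cells
    pseudo_labels: spage2vec cluster of each pseudo-cell
    """
    ranked = sorted(
        ((cosine(cell, p), label) for p, label in zip(pseudo_cells, pseudo_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    # Assumption: a cell is excluded when even its best match is below 0.3.
    if not ranked or ranked[0][0] < min_similarity:
        return None
    correspondence = defaultdict(float)
    for similarity, label in ranked:
        correspondence[label] += similarity   # summed similarity per cluster
    return max(correspondence, key=correspondence.get)
```

Clusters that receive fewer than 10 cells after this step would then be flagged as weakly aligned, as described above.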
Comparison of spage2vec clusters with cell-type annotations from Asp et al.
A. Heatmap representing the confusion matrix between the cell type assigned to each read via pciSeq in Asp et al. [7] and the spage2vec cluster annotations. B. Heatmap representing the correlation between the expression profile of each spage2vec cluster and each cell type described using scRNA-seq in Asp et al. [7] for the 69 genes included in both datasets. C. Spatial location of a subset of clusters from the spage2vec analysis (top) and pciSeq (bottom) in a specific sample from pcw 6.5. The selected clusters represent both epicardial cells and fibroblast-like cells/epicardium-derived cells in both cases, and colors are based on the similarities between spage2vec clusters and pciSeq clusters. A zoomed-in region is shown for both datasets. D. Spatial location of cardiomyocyte-related clusters defined by both spage2vec (top) and pciSeq (bottom) in a specific section from pcw 6.5. Each spage2vec cluster assigned to cardiomyocytes was classified as atrial or ventricular according to its molecular signature in Fig 1C. (TIF)
Correspondence between pciSeq and spage2vec clusters.
A. Map of the main morphological regions identified in pcw 6.5 sections. Regions were calculated by redefining every read based on the reads present within a radius of 70 pixels/22.8 μm to capture the main tissue domains and applying Leiden clustering to the result. B. Scatter plot representing the abundances of specific paired spage2vec-pciSeq clusters in the regions defined in S2A (from left to right: epicardial cells-cluster 20; capillary endothelium-cluster 18; cardiac neural crest cells-cluster 26; and Fibroblast-like (VD)-cluster 16). The Pearson correlation for every pair of clusters is included in the scatter plot. (TIF)
Integration of time point-specific analyses.
A. PAGA plot representing all clusters found in the time point-specific analyses of pcw 4.5–5, pcw 6.5 and pcw 9. Each cluster is represented as a node, and background colors indicate main cell type annotations. B. Comparison of spatial locations between general clusters (Figs 1, S1, S2 and S3) and time point-specific clusters. Three main clusters are represented: cluster 13 (left), cluster 20 (middle) and cluster 18 (right) in one of the samples of each time point, together with their corresponding time point-specific clusters. (TIF)
Correspondence between general clusters and time point-specific clusters.
Heatmap representing the confusion matrix between the cluster assigned to each read in the general analysis (Figs 1, S1, S2 and S3) and the cluster assigned to each read in the time point-specific analysis. The color column next to the time point-specific cluster labels indicates the time point at which each cluster was detected. The color code used in Fig 1A is used to label each time point. (TIF)
Differential expression analysis of scRNA-seq data based on spage2vec cluster annotations.
Top 15 differentially expressed genes for the clusters found in the individual analysis of pcw 6.5. The score of each gene (y axis) corresponds to the Wilcoxon rank-sum test score. (TIF)
Exploration of differentially expressed genes between the endothelial and cardiomyocyte-related clusters.
A. Heatmap representing the correlation patterns found in scRNAseq (left) and spage2vec (right) between the three most differentially expressed genes (DEG) between endothelium-related clusters represented in Fig 3A. B. Dot plot representing the expression detected via scRNAseq of the three most DEG between endothelium-related clusters in the cell types identified in Asp et al. C. Heatmap representing the correlation patterns found in scRNAseq (left) and spage2vec (right) between the three most DEG between atrium cardiomyocytes subclusters represented in Fig 3A. D. Dot plot representing the expression detected via scRNAseq of the three most DEG between atrium cardiomyocytes subclusters in the cell types identified in Asp et al. E. Heatmap representing the correlation patterns found in scRNAseq (left) and spage2vec (right) between the three most DEG between ventricular cardiomyocytes subclusters represented in Fig 3A. F. Dot plot representing the expression detected via scRNAseq of the three most DEG between ventricular cardiomyocytes subclusters in the cell types identified in Asp et al. G. Dot plot representing the expression of the 2 most DEG of each fibroblast/EPDC subcluster (cl) H. Dot plot representing the expression detected via scRNAseq of the top DEG between fibroblast/EPDC subclusters in the cell types identified in Asp et al.
Fig 3
Exploration of molecular signatures identified within cardiomyocytes and endothelial cells.
A. Dot plots representing the expression of the 3 most differentially expressed genes between each of the clusters (cl) related to endothelial cells (clusters 14 and 17) (left), ventricular cardiomyocytes (clusters 4, 8 and 15) (middle) and atrial cardiomyocytes (clusters 1, 2 and 7) (right). Expression is shown in the endothelial and cardiomyocyte-related clusters linked to specific populations within the scRNA-seq dataset from Asp et al. [7]. B. Spatial maps highlighting the reads assigned to the clusters related to endothelial cells and cardiomyocytes (right), with gene ontology enrichment of biological processes for the top 15 most differentially expressed genes of each cluster (left). Color codes as in A.
(TIF)
Exploration of spage2vec pcw6.5 clusters.
A. Heat map representing the mean expression of each spage2vec pcw6.5 cluster for the genes included in the ISS panel. B. Heat map representing the cross correlation of the spage2vec pcw6.5 clusters based on their mean expression. C. Jaccard index derived from the integration of the scRNA-seq dataset and the spage2vec pcw6.5 clusters. D. Dot plot representing the expression of the genes expressed in the spage2vec pcw6.5–13 cluster in the different scRNA-seq cell types. E. Region of interest of the in situ sequencing samples (pcw 6.5) representing the location of cluster pcw6.5–13 (right) and some of the genes expressed in this cluster (left). F. Dot plot representing the expression of the genes expressed in the spage2vec pcw6.5–16 cluster in the different scRNA-seq cell types. G. Region of interest of the in situ sequencing samples (pcw 6.5) representing the location of cluster pcw6.5–16 (right) and some of the genes expressed in this cluster (left). (TIF)
11 Mar 2022
Dear Professor Nilsson,
Thank you very much for submitting your manuscript "De novo spatiotemporal modelling of cell-type signatures in the developmental human heart using graph convolutional neural networks" for consideration at PLOS Computational Biology.
As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.
We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.
When you are ready to resubmit, please upload the following:
[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript.
Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.
[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).
Important additional instructions are given below your reviewer comments.
Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.
Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.
Sincerely,
Qing Nie
Associate Editor
PLOS Computational Biology
Daniel Beard
Deputy Editor
PLOS Computational Biology
Reviewer's Responses to Questions
Comments to the Authors:
Please note here if the review is uploaded as an attachment.
Reviewer #1: Review Comments:
Graph neural networks has recently become a very widely used and powerful method for analyzing diverse complex networks, e.g., social network, financial network, protein networks, brain networks [1]. The paper applied the Spage2vec method, an advanced graph neural network model, to effectively characterize the cell types at sub-cellular imaging resolution for the development of fetal heart from 4.5 to 9 post conception weeks (PCWs), three developmental stages.
However, the in situ pciSeq technique published in 2020 only explored cell diversity for the 6.5 PCW heart development stage, yet not considered the 4.5-5 PCW and 9 PCW time points.
Spage2vec requires no cell segmentation, and can learns the latent cell expression representations by incorporating the critical spatial molecular organization structure property. Using the obtained latent cell representations, it enables to characterize previously unreported molecular diversity within cardiomyocytes and epicardial cells. Moreover, it can identify their characteristic expression signatures for diverse downstream tasks, e.g., subpopulation detection. The experimental results show that the proposed method enables to discover 27 meaningful cellular clusters (more than the 12 cell types defined in the scRNA-seq data) shared across the three different heart development embryonic stages.
The authors also built an open-source GitHub code repo for their work and provided an online platform for interactively visualization of multiple cell type clusters in different colors. Having public repo and online visualization platform can greatly aid other researchers in this field in better understanding the method implementations and the entire study.
After reading through the paper, I listed some of my comments and questions below.
Page 3, Line 70-71: If I understood well, the study on the pciSep technique was initially developed for brain cortex, yet not for the development of the heart.
However, the author described that “the overall study was the first comprehensive spatial atlas of the developmental human heart”.
Page 7 Line 147: There is a typo in the sentence “clusters whith (typo) distinct and consistent spatial…”.
Material and Methods:
• Page 13, Line 301/304: Does Spage2vec can capture global graph structure property, i.e., second-order proximity, or just capture local neighborhood relations among mRNA spots?
• Line 311, page 14: I would suggest to add more details about the number of nodes in the constructed graph, and what is the node feature dimension? What is the adjacency matrix dimension in the constructed graph? weighted or unweighted, directed, or undirected?
• What are the performance if directly apply conventional clustering t-sNE or IsoMap
• Is there any correlation between the three different development stages? If has, is it possible to learn these evolving dynamics information directly using spage2vec model? or using Spage2vec to capture both spatial and temporal graph embeddings in the latent space simultaneously?
Reviewer #2: Dear authors,
I have reviewed the manuscript titled “De novo spatiotemporal modelling of cell-type signatures in the developmental human” presented by Salas et al. In their work they present their re-analysis of the human developmental heart spatial data presented by Asp et al 2019 using a graphical modelling approach that is independent of identifying cells. In doing so, they identify previously undescribed molecular diversity within cardiomyocytes and epicardial cell populations and describe gene expression trajectory dynamics within the tissues.
Both the methodolocial approach and results are indeed very interesting, and I would like to see them published.
Cell segmentation free approaches are still underappreciated in spatially resolved transcriptomics data analysis.
However, I feel that the paper is currently lacking investigation and validation of the differences between the obtained results, and those presented by Asp et al. I hope that I can elaborate on my concerns with the following points.
Major
1) Comparative analysis of the spage2vec and pciSeq/scRNAseq data seems lacking in places. Some specific points regarding this issue are:
1a) Methodological differences leading to different cluster numbers. In scRNAseq it is well established that the clustering parameters play a major role in how many clusters are identified. How invariant are the 27 clusters to parameters used by spage2vec (line 120)? How invariant are the 12 clusters to parameters used by pciSeq? How invariant are the 15 clusters to parameters used by scRNAseq? The authors should investigate that reasonable parameter changes in e.g. the scRNAseq cluster do not result in e.g. 27 clusters that match the spage2vec clusters perfectly (.. which is unlikely, but it should be explored). If the aut
1b) I would expect that for the 6.5 PCW dataset (where matched scRNAseq and ISS data exist) that the cell types obtained between the spage2vec can be found in the scRNAseq data. The authors postulate that the detection of additional clusters in the ISS data using spage2vec is due to sampling more cells - this would explain the difference for rare cell types, but would also implicate that cell types with many cell should be equally represented in both the scRNAseq and ISS data. However given that such a large proportion of those cells are seen in the spage2vec data and not in the spage scRNAseq analysis makes it unlikely that this arises for differences in cell number. For example, the spage analysis in Figure 2 shows that a number of spage2vec e.g. clusters 16, 24, 19, 23, 13, 25 and 3 do not appear to have matches in the spage analysis of the scRNAseq data (fig 2B). However, nearly all of these spage2vec clusters have have many cells annotated in the tissue (fig s2) at 6.5PWC. In light of this, how do the authors explain this discrepency?
1c) If the authors cannot recapitulate the novel spage2vec cell types in the scRNAseq data, the authors need to explain why this is the case, with the support from other studies or by validation.
1d) I find the consistency of the analysis of the Spage scRNAseq clusters (e.g. presented in figure 3A, S4E, S7) to be confusing, as the scRNAseq cluster variability do not correspond to spage2vec clusters variability. An example of how this can lead to confusion is that in cluster 14 and 17 have very different gene expression (e.g TCIM in cluster 14, and MYL2 for 17) in figure 1C, however, neither of these genes appear in the most variable genes for 14 and 17 in Figure 3A, but instead the genes shown in 3A seem to not be very variable at all. How do the authors explain this?
1e) While spage2vec resolves much more sub-cell-types/state in e.g. atrial cardiomyocytes, it does not capture “Fibroblast-like (AV-rel)”, “Endothelium/ pericytes” (Figure S4A), “Erythrocytes 2”, or “Myoz2-enriched cardiomyocytes” very well. This is partially contradictory to the statement of resolving cell type heterogeneity better than scRNAseq (lines 145-147). The authors should clarify the implications of this on this current study, and critically assess the previous cell-typing in Asp et al. Looking at figure S4D, it seems that Spage2Vec misses the visual enrichment of “Myoz2-enriched cardiomyocytes” in the inner part of the left ventricle (and to lesser extent in the right ventricle). The authors should explain how the spage2vec clusters are an improvement over the pciseq clusters in figure S4D.
1f) How was the spatial concordance of cell types determined (line 127-128)?
This should be performed statistically.
1g) The authors postulate that the spage2vec results are more inline with expected distributions of cell types (line 132-133). Please provide evidence for this, especially in light of point 1b.
Intermediate
2. Is spage2vec a “graph convolutional neural network”? While there are elements of graphs, convolutional operations, and neural networks in the implementation, the authors should revisited this to see whether their description fits with the current level of precision in defining tools in the field. Interestingly, the word “convolution” does not appear in the original spage2vec publication.
3. The methods are well described, but lack any information of computational frameworks/tools/versions that are used. For example, the authors do not describe whether their analysis is performed in matlab/R/Python/etc for numerous sections. In fairness, the authors provide access to a very good Jupyter notebook in their github repository, but (i) the link to the repository it not a DOI (see https://docs.github.com/en/repositories/archiving-a-github-repository/referencing-and-citing-content), and (ii) the authors and editors should decide on the level of appropriate reporting of computational tools (e.g. for NPG, I would typically list the version of tools and libraries within the methods text)
4. It would be interesting to see how spage2vec compares to results obtained by HMRF based methods (e.g. https://dx.doi.org/10.1038%2Fnbt.4260, https://doi.org/10.1038/s41587-021-01044-w) for this dataset. While this would be interesting, a systematic comparison this may fall out of scope of the current study, so perhaps the authors could comment on how they expect their method would hold up against these other models.
5. The human heart contain multi-nucleated cells, that may not be amenable to microfluidics based scRNAseq.
Can the authors comment on the extent to which they expect this to be an issue in the scRNAseq data analysis, and how this may potentially link to cardiomyocyte diversity identified by spage2vec?
6. Why do the authors only find “immune” cells in the 6.5PCW dataset?
7. Datasets. There isn’t a link to the datasets that were analysed in the paper. On the github repo, the link to the raw data seems broken: https://doi.org/10.5281/zenodo.5060858.
Minor (these may be suggestions or questions):
8. Subheadings in the results would make is easier to read.
9. As a general minor issue, the paper reads like “I used tool X on dataset Y”, with spage2vec being mentioned a few too many times (it sometimes appears three times within a paragraph)
10. The abstract could perhaps also mention their trajectory analysis results.
11. Line 50. “Cell Atlasing” -> “cell atlasing”.
12. Line 55. Should “Single cell” be “Single cell RNAseq”?
13. Line 56. Are citations 5-9 appropiate? Given the large number of SRT methods, perhaps one or 2 review papers should be cited?
14. Line 68-69. I cannot see what this states in addition to lines 67-68.
15. Line 70. The use of “comprehensive” together with “spatial atlas” here is questionable (… to be clear Asp et al was a great study, but we should be precise with what the results were). Are the authors sure that there were no other (non-transcriptomics) imaging based analysis of the human heart? Likewise, while in the “spatial atlas” is synonymous with “spatial transcriptional atlas” in many field of science, in other fields it could mean something different (e.g. “spatial proteomic atlas”, or atlasing using radiological approaches). The authors should either tone down “comprehensive”, or rephrase the result (e.g. “spatial transcriptional atlas”/”spatial cellular atlas”).
16. E.g. Line 85. Should “reads” be replaced for something more generic, to also be applicable for, e.g. FISH based molecule detection?
The nature of this problem is not specific to ISS, but to all single molecule resolved SRT assays.
17. Line 91. “Powerful” – is this hyperbole? Graph-based approaches are indeed useful, but the use of “powerful” does not add to the sentence.
18. Line 149. The differences in the number of cells in the ISS and scRNAseq data should be enumerated.
19. Line 383. “e.i.” should be “i.e.”
Figures
20. Please check that figures are shown consistently. E.g Figure S4 shows the 6.5 PWC image, but this is flipped compared figure 1D.
21. Figure 1C. The scale bar has no units.
22. Figure 2B. The scale bar has no units. The x-axis label should be improved, e.g. “PCW6.5 spage2vec clusters”.
23. Figure 3A. Abbreviation “cl” is not explained. Perhaps add “Cluster” as the top axis label?
24. Figure 3B. X-axis labelling is inconsistent.
25. Figure S1, S2, S3. It would be nice if the panels were arranged by biological grouping.
26. Figure S5. Perhaps this should be a main figure.
27. Figure S6. No units on the scale bar. X-axis label should be “spage2vec cluster”
Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g. participant privacy or use of data from a third party), those must be specified.
Reviewer #1: Yes
Reviewer #2: No: The zenodo data repository link is provided, but the link does not work.
PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.
If you choose “no”, your identity will remain anonymous but your review may still be made public.
Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
Reviewer #2: No
Figure Files:
While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us.
Data Requirements:
Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.
Reproducibility:
To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols.
Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols
13 Jun 2022
Submitted filename: Spage2vec_reviewer_PLOS_commbio_response.docx
6 Jul 2022
Dear Professor Nilsson,
We are pleased to inform you that your manuscript 'De novo spatiotemporal modelling of cell-type signatures in the developmental human heart using graph convolutional neural networks' has been provisionally accepted for publication in PLOS Computational Biology.
Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.
Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.
IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.
Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article.
All press must be co-ordinated with PLOS.
Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.
Best regards,
Qing Nie
Associate Editor
PLOS Computational Biology
Daniel Beard
Deputy Editor
PLOS Computational Biology
Reviewer's Responses to Questions
Comments to the Authors:
Please note here if the review is uploaded as an attachment.
Reviewer #2: Dear authors,
I have reviewed the point by point response and resubmitted work by Salas and colleagues. I am satisfied with the responses they provide and would recommend the work for publication.
Thank you for entertaining my criticisms, and congratulations on a nice study.
Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?
Reviewer #2: Yes
PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.
If you choose “no”, your identity will remain anonymous but your review may still be made public.
Do you want your identity to be public for this peer review?
For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #2: Yes: Naveed Ishaque
23 Jul 2022
PCOMPBIOL-D-22-00047R1
De novo spatiotemporal modelling of cell-type signatures in the developmental human heart using graph convolutional neural networks
Dear Dr Nilsson,
I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.
The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.
Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.
Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!
With kind regards,
Zsofi Zombor
PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol