Maria Schmidt, Henry Loeffler-Wirth, Hans Binder.
Abstract
Single-cell RNA sequencing has become a standard technique to characterize tissue development. Computational methods based on similarity measures that distinguish cell phenotypes transform cross-sectional snapshots of the diversity of cell transcriptomes into (pseudo-)longitudinal trajectories of cell differentiation. Cell development is driven by alterations of transcriptional programs, e.g., by differentiation from stem cells into various tissues or by adaptation to micro-environmental requirements. Here, we complement developmental trajectories in cell-state space with trajectories in gene-state space to address this latter aspect more clearly. Such trajectories can be generated using self-organizing map (SOM) machine learning. The method transforms multidimensional gene expression patterns into two-dimensional data landscapes, which resemble the metaphoric Waddington epigenetic landscape. Trajectories in this landscape visualize the transcriptional programs that cells pass through along their developmental paths from stem cells to differentiated tissues. In addition, we generated developmental "vector fields" using RNA velocities to forecast changes of RNA abundance in the expression landscapes. We applied the method to tissue development of planarian as an illustrative example. Gene-state space trajectories complement our data portrayal approach with (pseudo-)temporal information about the changing transcriptional programs of the cells. Future applications can be seen in the fields of tissue and cell differentiation, ageing, and tumor progression, and also with other data types such as genome, methylome, and clinical and epidemiological phenotype data.
Keywords: differentiation of tissues; machine learning; planarian; pseudotime trajectories; self-organizing maps; single cell RNA sequencing; transcriptomic landscapes
Year: 2020 PMID: 33081343 PMCID: PMC7603055 DOI: 10.3390/genes11101214
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. Analysis of developmental paths (schematic overview). (A) Single-cell transcriptome data are provided as a matrix of cells (columns) versus transcripts (rows). Similarity sorting along genes and cells is applied to obtain gene- and cell-state trajectories, respectively. (B) Multibranched cell-state trajectories order cells along a similarity measure called pseudotime (pt). They are visualized either as a hierarchical tree or as a three-dimensional manifold using, e.g., “URD”-plots [29]. They reflect transcriptional changes upon development and differentiation of cells. (C) The gene-state trajectories were generated in self-organizing map (SOM) space using a network formed between clusters of co-regulated genes, which are marked with letters in the figure. Trajectories follow paths of maximum expression that link source (stem cells) and sink (differentiated tissues) nodes along the edges of the net. (D) RNA velocity analysis delivers vector fields in cell and gene phase space. Each vector points in the direction of increasing transcript abundance, either between cell states or between gene states. In effect, RNA velocity forecasts the “future” transcriptional state of each gene. The RNA-velocity vector of a cell has the RNA velocities of all genes as its components, while the RNA-velocity vector in gene space is composed of the RNA velocities of all genes in the respective local gene cluster.
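The idea in panel C, a path of maximum expression from a source node to a sink node along the edges of a spot network, can be illustrated with a minimal sketch. It swaps in a widest-path (maximum-bottleneck) search as a simple stand-in and should not be read as the maximum-flow procedure used for the figures. The function name `widest_path`, the spot labels, and all edge weights are hypothetical toy values.

```python
import heapq

def widest_path(edges, source, sink):
    """Return (bottleneck, path) for the source-to-sink path whose weakest
    edge is as strong as possible. Edge weights must be positive."""
    graph = {}
    for u, v, w in edges:  # undirected network between spots
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    best = {source: float("inf")}   # best bottleneck found so far per node
    parent = {source: None}
    heap = [(-float("inf"), source)]  # max-heap via negated widths
    while heap:
        neg_width, node = heapq.heappop(heap)
        width = -neg_width
        if width < best.get(node, 0.0):
            continue  # stale heap entry
        if node == sink:
            break
        for nxt, w in graph.get(node, []):
            cand = min(width, w)
            if cand > best.get(nxt, 0.0):
                best[nxt] = cand
                parent[nxt] = node
                heapq.heappush(heap, (-cand, nxt))
    if sink not in best:
        return 0.0, []
    path, node = [], sink
    while node is not None:
        path.append(node)
        node = parent[node]
    return best[sink], path[::-1]

# Hypothetical toy network: weights stand for the expression level shared
# by two neighbouring spots; "N" is the source, "J" the sink.
toy_edges = [("N", "P", 0.9), ("P", "K", 0.7), ("K", "J", 0.8),
             ("N", "O", 0.95), ("O", "J", 0.2)]
bottleneck, path = widest_path(toy_edges, "N", "J")
print(bottleneck, path)
```

In this toy network the route via `O` starts with the strongest edge but ends in a weak one, so the search prefers the route whose weakest link is strongest.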
Figure 2. SOM portrayal of the developing planarian single-cell transcriptome: (A) Single-cell resolved tSNE plot (cell types are color-coded as shown in the legend in part E). SOM expression portraits of selected cell types provide a glimpse of cell and gene transcriptional space. (B) Expression portraits of stem cells, of differentiating progenitors, and of differentiated muscle body and goblet cells show overexpression spots at different positions, as indicated by the dashed circles. The spot profile reveals specific upregulation of the spots in the respective tissue-specific cells. (C) The URD [40] plot resolves multibranched single-cell developmental manifolds. Paths of selected tissue types are indicated by arrows. (D) Gene activation trajectories of cell differentiation from stem cells into four different tissues were calculated by means of maximum flow on the SOM embedding. (E) Legend of colors used for cell-type assignment (see also [29]). The color code is used throughout the paper.
Figure 3. The single-cell transcriptomic landscape of the planarian Schmidtea mediterranea: (A) The module overexpression map provides an overview of modules of co-expressed and highly variant genes. They mainly arrange along the edges of the SOM and are labeled with bold capital letters A, D, F, J, L, N, R, B1, D1, E1, F1. The grey arrows illustrate selected developmental paths by linking the stemness spot N with the respective “tissue” spots. Expression profiles (gene set enrichment Z-score, GSZ-scale) and gene maps of selected functional gene sets underpin the functional context of these modules. They reveal specific up- and downregulation of the gene sets in different tissues (red arrows). The gene maps indicate the location of the genes by dots. Accumulations of genes in and near module areas are indicated by red arrows and dashed ellipses. (B) Population and variance maps visualize the number of genes in each metagene (log-scale) and the variance of metagene expression, respectively. About 65% of all genes studied show virtually invariant expression. They accumulate in the blue area in the lower right corner, as indicated. (C) Expression profiles of the modules of low variant expression (labelled with non-bold letters in part A) are shown as a heatmap. The arrows above the heatmap point in the direction of differentiation from stem cells into different tissues. The spot modules group roughly into three clusters, as indicated in the right part of the heatmap and along the edges of the SOM in part A. (D) Locations of key reference genes are shown by dots (see also Figure S2 and Table S1). They are mostly found in or near the spots that are upregulated in the respective tissues.
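To make the terms "metagene", "population map", and "variance map" concrete, the following is a minimal, self-contained sketch of a self-organizing map trained on gene expression profiles. It is a textbook SOM with a Gaussian neighbourhood, not the software used for the figures; the grid size, learning schedule, function names, and toy profiles are all assumptions made for illustration.

```python
import math
import random
import statistics

def best_matching_unit(weights, profile):
    """Grid node whose prototype (metagene) is closest to the profile."""
    return min(weights, key=lambda node: sum(
        (w - x) ** 2 for w, x in zip(weights[node], profile)))

def train_som(profiles, rows, cols, epochs=50, seed=0):
    """Train a rows x cols SOM; each node stores one metagene profile."""
    rng = random.Random(seed)
    dim = len(profiles[0])
    weights = {(r, c): [rng.uniform(-1, 1) for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = 0.5 * (1 - frac)                          # decaying learning rate
        radius = max(rows, cols) / 2 * (1 - frac) + 0.5  # shrinking neighbourhood
        for profile in rng.sample(profiles, len(profiles)):
            bmu = best_matching_unit(weights, profile)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * radius ** 2))
                for k in range(dim):
                    w[k] += lr * h * (profile[k] - w[k])
    return weights

def population_map(weights, profiles):
    """Number of genes assigned to each metagene."""
    counts = {node: 0 for node in weights}
    for profile in profiles:
        counts[best_matching_unit(weights, profile)] += 1
    return counts

def variance_map(weights):
    """Variance of each metagene profile across samples."""
    return {node: statistics.pvariance(w) for node, w in weights.items()}

# Hypothetical toy data: six genes measured in four samples.
toy_profiles = [[1.0, 0.8, -0.9, -1.0], [0.9, 1.0, -1.0, -0.8],
                [-1.0, -0.9, 1.0, 0.9], [-0.8, -1.0, 0.9, 1.0],
                [0.0, 0.1, 0.0, -0.1], [0.1, 0.0, -0.1, 0.0]]
som = train_som(toy_profiles, rows=2, cols=2)
population = population_map(som, toy_profiles)
variance = variance_map(som)
```

Each gene is assigned to exactly one grid node, so the population map always sums to the number of genes; nodes that collect nearly invariant genes have low variance.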
Table 1. Overview of module-wise enriched gene sets and marker genes of the planarian scRNA-seq data.
| Cluster | Genes | Name | Enriched BP Gene Sets | Marker Genes | Tissue |
|---|---|---|---|---|---|
| A | 235 | Neurons 1 | Neurotransmitter secretion (−6) | pc2, ChAT, gpas | cav-1+ neurons, ChAT neurons 1, |
| B | 1314 | | Phosphorus metabolic process (−4) | GABRB3 (dd21541), CAVII-like, th | |
| C | 206 | | | | |
| D | 265 | Pharynx | | vim-1, VIT (dd1071), NPEPL1 (dd181) | Epidermis, Pharynx |
| E | 383 | | | 1.G9.2, ifn | |
| F | 370 | Neurons 2 | Microtubule-based movement (−18) | pkd2l-2, rootletin (dd6573), cav-1 | cav-1+ neurons, GABA neurons |
| G | 218 | | | | |
| H | 330 | | | Post-2c | |
| I | 228 | | | nb.22.le | |
| J | 111 | Epidermis | actin filament capping (−2) | prog-2, prog-1, agat-3 | Epidermis, late epidermal progenitors 1/2 |
| K | 100 | | | pds | |
| L | 150 | Muscle | muscle structure development (−9) | COL4A6A (dd2337), collagen, COL21A1 (dd9565) | Muscle body |
| M | 1032 | | | cali, if-1, HSPG2 (dd8356) | |
| N | 121 | Stem cells | translational elongation and termination (≥−79) | smedwi-1, dd_6998 | Stem cells |
| O | 143 | | | | |
| P | 114 | | | bruli | |
| Q | 44 | | | vasa-1 | |
| R | 119 | Run specific | Unknown origin, presumably due to batch effects | | Run specific cells |
| S | 974 | | | dd_5560, SAMD15 (dd19710), wntP-3 | |
| T | 404 | | | sp-5 | |
| U | 235 | | | | |
| V | 265 | | | TYMS, gH4 | |
| W | 146 | | Gene expression (−8) | | |
| X | 398 | | | dd_13666 | |
| Y | 445 | | | GLIPR1 (dd210), npp-18 | |
| Z | 575 | | | TMPRSS9 (dd7966), CTSL2 (dd582), gata 4/5/6 | |
| A1 | 18239 | | G-protein coupled receptor signaling pathway (≥−47) | glipr-1, PI16 (dd940), ASCL4 (dd1854) | |
| B1 | 181 | Pigment | organonitrogen compound catabolic process (−6) | pgbd-1, PSAPL1 (dd1706), KMO (dd7884) | Pigment cells |
| C1 | 458 | | | | |
| D1 | 262 | Parenchym | regulation of intracellular signal transduction (−3) | ctsl2, | aqp+ parenchymal cells, pgrn+ parenchymal cells, psap+ parenchymal cells |
| E1 | 226 | Phagocytes | | mat | Phagocytes |
| F1 | 725 | Goblet, secretory cells | | | Goblet and secretory cells |
Figure 4. Gene expression trajectories of planarian tissue development. (A) Expression portraits (in log FC and loglog FC scales) of the epidermal lineage illustrate the change of module patterns during differentiation. The “loglogFC” scale better resolves subtle changes between over- and under-expression, shown in red and blue, respectively. Alterations of expression patterns become evident as a “lava lamp”-like flow of the red spot from the right to the left. The expression of the stemness spot N and the epidermal spot J changes in an antagonistic fashion along the pseudotime. (B) The gene-state trajectory of epidermal development links the stemness spot with the “epidermal” spots in the expression landscape (shown as cumulative loglogFC SOM). The expression profile along the trajectory assigns genes and functions along the path. (C) Selected expression profiles of spots along the developmental path of epidermal cells switch from negative to positive slopes between spots P and K. (D) Developmental characteristics of four different planarian tissues. The top row shows SOM trajectories and pt-profiles. Below, the heatmaps of expression modules characterize expression changes of the respective tissues upon development as a function of pt. The respective expression portraits are shown below the heatmaps. (E) Gene-state developmental trajectories, obtained from the difference between the SOMs of stemness and differentiated tissue states using “topoDistance” analysis, show differentiation paths as in parts A–D. The largest expression changes were observed in the “final” slope, which describes differentiation of tissues from progenitor states.
Figure 5. Branched pseudotime analysis of planarian tissue development. (A) URD plot [40] of branched differentiation in 3D, shown in two projections. The pseudotime is calculated as the average number of diffusion steps required to reach each cell from the root cell (neoblast 1) [40]. (B) Pseudotime coloring of the t-SNE plot and the URD branching tree. In pt-units, epidermal tissues are most distant from the root. (C) Branching tree of cell differentiation. (D) Expression profiles of selected lineages along the branches of the URD differentiation tree. Samples are sorted along pseudotime (gray arrow). The black LOESS (locally estimated scatterplot smoothing) curves serve as a guide for the eye. Step-wise transitions between stemness and tissue transcriptional programs are shown by vertical dashed lines. (E) A tree of tissue-related gene clusters together with marker genes was taken from [29]. Accordingly, 48 co-regulated gene sets are assigned to planarian tissues and combinations of them. Most of these genes are not located in the spots referring to fully differentiated tissues. The top five genes in these clusters according to our analysis are listed in the figure (see boxes and also Table 1).
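One simple reading of "average number of diffusion steps from the root" is the expected first-passage (hitting) time of a random walk on a cell-neighbour graph. The sketch below computes that quantity exactly for a small graph. It is offered only as an illustration of the idea and is an assumption, not the procedure implemented in URD; the function names and the three-node toy graph are hypothetical.

```python
def solve_linear(aug):
    """Solve a small dense system given as an augmented matrix [A | b]
    by Gauss-Jordan elimination with partial pivoting."""
    n = len(aug)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: is the graph connected?")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def hitting_times_from(root, adjacency):
    """Expected number of unbiased random-walk steps needed to first reach
    each node when starting at `root` (connected, undirected graph)."""
    nodes = sorted(adjacency)
    times = {}
    for target in nodes:
        if target == root:
            times[target] = 0.0
            continue
        others = [n for n in nodes if n != target]
        index = {n: i for i, n in enumerate(others)}
        m = len(others)
        # For every node i != target: h(i) = 1 + mean of h over neighbours,
        # with h(target) = 0. Written as (I - P) h = 1.
        aug = [[0.0] * (m + 1) for _ in range(m)]
        for n in others:
            i = index[n]
            aug[i][i] += 1.0
            aug[i][m] = 1.0
            degree = len(adjacency[n])
            for nb in adjacency[n]:
                if nb != target:
                    aug[i][index[nb]] -= 1.0 / degree
        times[target] = solve_linear(aug)[index[root]]
    return times

# Hypothetical toy lineage: a chain of three cell states.
toy_graph = {"neoblast": ["progenitor"],
             "progenitor": ["neoblast", "epidermis"],
             "epidermis": ["progenitor"]}
pseudotime = hitting_times_from("neoblast", toy_graph)
print(pseudotime)
```

On this chain the walk reaches the intermediate state in one step on average but needs four steps on average to reach the end of the chain, because it can step back toward the root before moving on.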
Figure 6. RNA velocity portrayal: (A) The RNA velocity vector field at metagene resolution identifies overexpression spot modules as attractors of overexpression, except for the “kernel” metagenes in the center of each spot (see text). (B) Velocity trajectories using SOM area segmentation indicate “watersheds” of differentiation between the attractors. The expression profiles illustrate expression changes along the selected trajectories.
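The vector fields above rest on per-gene RNA velocities. As a minimal sketch, assuming the commonly used steady-state model in which the change of spliced RNA is v = u − γ·s (u: unspliced, s: spliced abundance, γ: degradation rate in units of the splicing rate), the velocity of a single gene and its short-term forecast can be computed as follows. The plain least-squares fit of γ, the function names, and the toy counts are assumptions for illustration and are not taken from the analysis pipeline behind the figure.

```python
def fit_gamma(spliced, unspliced):
    """Least-squares slope through the origin of unspliced vs. spliced counts,
    used here as a simple stand-in for the steady-state ratio gamma."""
    denom = sum(s * s for s in spliced)
    if denom == 0:
        raise ValueError("gene has no spliced counts")
    return sum(u * s for u, s in zip(unspliced, spliced)) / denom

def velocity(unspliced, spliced, gamma):
    """RNA velocity ds/dt = u - gamma * s; positive means upregulation."""
    return unspliced - gamma * spliced

def extrapolate(spliced, v, dt=1.0):
    """Forecast the spliced abundance a short time dt ahead (never negative)."""
    return max(0.0, spliced + v * dt)

# Hypothetical toy counts for one gene across three steady-state cells.
toy_spliced = [1.0, 2.0, 4.0]
toy_unspliced = [0.5, 1.0, 2.0]
gamma = fit_gamma(toy_spliced, toy_unspliced)

# A fourth cell with excess unspliced RNA: the gene is being induced.
v_induced = velocity(3.0, 4.0, gamma)
forecast = extrapolate(4.0, v_induced, dt=0.5)
```

Cells that lie on the fitted steady-state line have zero velocity; an excess of unspliced RNA gives a positive velocity and a forecast above the current spliced level.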