Fabrizio Costa, Dominic Grün, Rolf Backofen.
Abstract
Cell types can be characterized by expression profiles derived from single-cell RNA-seq. Subpopulations are identified via clustering, yielding intuitive outcomes that can be validated by marker genes. Clustering, however, implies a discretization that cannot capture the continuous nature of differentiation processes. One could give up the detection of subpopulations and directly estimate the differentiation process from cell profiles. A combination of both types of information, however, is preferable. Crucially, clusters can serve as anchor points of differentiation trajectories. Here we present GraphDDP, which integrates both viewpoints in an intuitive visualization. GraphDDP starts from a user-defined cluster assignment and then uses a force-based graph layout approach on two types of carefully constructed edges: one emphasizing cluster membership, the other, based on density gradients, emphasizing differentiation trajectories. We show on intestinal epithelial cells and myeloid progenitor data that GraphDDP allows the identification of differentiation pathways that cannot be easily detected by other approaches.
Year: 2018 PMID: 30206223 PMCID: PMC6134144 DOI: 10.1038/s41467-018-05988-7
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Steps in our visualization approach. a Each cell is initially assigned to the class determined by the user-provided clustering; additional pre-processing such as filtering and feature selection is also performed. b For each pair of cells, the similarity of the expression profiles is calculated using different metrics (see Methods). c To emphasize class membership in the layout, we add for each cell an edge to its k nearest neighbors of the same class; each edge is annotated with the desired distance between the two cells. d To visualize differentiation pathways, we add another type of edge, called k-shift edges, which connects cells to the k′ densest neighbors of a different class. e A force layout algorithm interprets each edge as a spring. f The optimal 2D configuration is determined by minimizing the total energy of the system. g We determine the convex hull of a given class in the layout. h Ternary plots are provided to further investigate differentiation pathways. Using a multi-class prediction approach, cells that are clearly members of a class are close to the corners, cells on the differentiation pathway between two classes lie on the corresponding edges, and undetermined ones are placed in the center of the plot.
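Steps c and d of Fig. 1 can be illustrated with a minimal Python sketch. It is not the GraphDDP implementation. The function name `build_edges`, the Euclidean metric, the density proxy (inverse mean distance to the `density_k` nearest cells), and the rule used for k-shift edges (the `k_shift` closest cells of another class that are denser than the cell itself) are all assumptions made for illustration; the paper's Methods define the actual metrics and density estimate.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_edges(profiles, labels, k=2, k_shift=1, density_k=2):
    """Return (knn_edges, shift_edges) as lists of (source, target, distance).

    Hypothetical sketch: requires at least two cells. Density is approximated
    as the inverse mean distance to the density_k nearest cells.
    """
    n = len(profiles)
    dist = [[euclidean(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]

    # Assumed density proxy: cells with close neighbors count as dense.
    density = []
    for i in range(n):
        nearest = sorted(dist[i][j] for j in range(n) if j != i)[:density_k]
        density.append(1.0 / (sum(nearest) / len(nearest) + 1e-12))

    knn_edges, shift_edges = [], []
    for i in range(n):
        # Step c: edges to the k nearest cells of the same class,
        # annotated with the desired distance.
        same = sorted(
            (j for j in range(n) if j != i and labels[j] == labels[i]),
            key=lambda j: dist[i][j],
        )
        knn_edges += [(i, j, dist[i][j]) for j in same[:k]]

        # Step d (assumed rule): edges to the closest denser cells
        # that belong to a different class.
        denser_other = sorted(
            (j for j in range(n)
             if labels[j] != labels[i] and density[j] > density[i]),
            key=lambda j: dist[i][j],
        )
        shift_edges += [(i, j, dist[i][j]) for j in denser_other[:k_shift]]

    return knn_edges, shift_edges
```

The two edge lists could then be handed to any spring-based layout routine that treats the annotated distance as the rest length of each spring (steps e and f); that part is omitted here.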
Fig. 2 GraphDDP reveals differentiation trajectories of intestinal epithelial cells. a t-SNE map representation of intestinal epithelial single-cell transcriptome data from Grün et al.[17]. Clusters, highlighted in different colors, were derived by RaceID2 in the original study and correspond to distinct cell types or progenitor stages. b The lineage tree inferred by StemID is overlaid on cell clusters. Thicker links reflect higher coverage of a link by cells, and the color reflects the significance of a link measured by a logarithmic p-value. The lineage tree was found to be in good agreement with the current model of intestinal cell differentiation. c Visualization of the intestinal epithelial data by GraphDDP. The convex hull of each cluster is shown and shift edges are depicted to reflect the relations between clusters. The representation places stem cells in the center, recapitulates the differentiation trajectories shown in b, and identifies novel lineage-specific progenitor relations (see text).
Fig. 3 Visualization of scRNA-seq data[21] of bone marrow resident myeloid progenitors. The edges connect each cell to denser neighbors of a different class (i.e., k-shift edges), indicating differentiation trajectories. The meta-cluster consisting of C1–C7 (encircled in green) represents erythrocyte differentiation, with C1 being the endpoint expressing hemoglobin and C7 being an early erythrocyte progenitor. The differentiation order is clear for clusters C7, C6, and C5, as indicated by the many k-shift edges between C7 and C6 and between C6 and C5 in the layout. This is also supported by the ternary plot for C7, C6, C5 (upper triangle), where the cells are mostly located close to the C7–C6 line or the C6–C5 line of the triangle. The associated low confusion score (see Methods) of 0.06 also clearly indicates a transition. For C2, C3, and C4, there is no obvious ordering, as they are connected by many k-shift edges between all pairwise combinations (C2–C3, C2–C4, and C3–C4). Again, this is supported by the ternary plot (lower triangle), with many cells in the middle of the triangle, indicating an equal likelihood of being classified as a member of C2, C3, or C4. The associated high confusion score (0.72) also clearly indicates that there is no clear transition. This divergence can also be seen in the expression profiles of different markers for erythrocyte differentiation (plots on the right), which support the order C4→C2→C3→C1 as well as C4→C3→C2→C1. The meta-cluster consisting of clusters C13, C14, C15, and C16 (encircled in red), on the other hand, relates to neutrophil differentiation, which is supported by the expression profiles of marker genes for neutrophil cells (plots on the left).
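The geometry of the ternary plots described in the captions can be sketched as follows, assuming each cell comes with three class-membership scores (for example, predicted probabilities for C7, C6, and C5). The function name `ternary_xy` and the corner placement are illustrative choices, and the sketch does not compute the confusion score, which is defined in the Methods.

```python
import math


def ternary_xy(p_a, p_b, p_c):
    """Map three non-negative class scores to 2D ternary-plot coordinates.

    Assumed corner placement: class A at (0, 0), class B at (1, 0),
    class C at (0.5, sqrt(3) / 2). Scores are normalized to sum to 1.
    """
    total = p_a + p_b + p_c
    if total <= 0:
        raise ValueError("at least one score must be positive")
    b = p_b / total
    c = p_c / total
    # Barycentric combination of the three corners; A contributes (0, 0).
    x = b + 0.5 * c
    y = (math.sqrt(3) / 2) * c
    return x, y
```

With this mapping, a cell scored (1, 0, 0) lands on the corner of class A, a cell scored (0.5, 0.5, 0) lands on the midpoint of the A–B edge, and a cell with three equal scores lands in the center of the triangle, matching the interpretation given in the captions.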