Abstract
Emerging single-cell technologies profile multiple types of molecules within individual cells. A fundamental step in the analysis of the resulting high-dimensional data is their visualization using dimensionality reduction techniques such as t-SNE and UMAP. We introduce j-SNE and j-UMAP as their natural generalizations to the joint visualization of multimodal omics data. Our approach automatically learns the relative contribution of each modality to a concise representation of cellular identity that promotes discriminative features while suppressing noise. On eight datasets, j-SNE and j-UMAP produce unified embeddings that better agree with known cell types and that harmonize RNA and protein velocity landscapes.
Keywords: Data visualization; Multimodal omics; Protein velocity; RNA velocity; Single-cell sequencing; UMAP; t-SNE
Year: 2021 PMID: 33941244 PMCID: PMC8091681 DOI: 10.1186/s13059-021-02356-5
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 17.906
Fig. 1 Overview of the joint embedding in JVis. Metrics d (left) and d′ (right) measure the dissimilarity between individual cells with respect to different cellular phenotypes, such as the expression of surface proteins (left) and mRNA (right). t-SNE and UMAP learn a low-dimensional embedding of cells that preserves the distribution of similarities quantified by d or d′ alone, which can leave certain cell types indistinguishable in one modality or the other. In this example, blue and red cells cannot be distinguished by their measured surface proteins, and green and black cells overlap in transcriptomic space. In JVis, we generalize t-SNE and UMAP to learn a joint embedding that preserves similarities in all modalities simultaneously. We integrate d and d′ through a convex combination of KL divergences (j-SNE) or cross entropies (j-UMAP) between corresponding similarities in low- and high-dimensional space. An arrangement of cells that minimizes this convex combination, with simultaneously learned weights, takes into account similarities and differences in both mRNA and surface protein expression to represent cellular identity more accurately (middle).
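The j-SNE objective described in Fig. 1 can be sketched in Python with NumPy. This is a minimal, illustrative sketch under stated assumptions, not the JVis implementation: input affinities use one fixed Gaussian bandwidth per modality instead of perplexity calibration, the modality weights are updated with an entropy-regularized rule chosen here only for illustration (the parameter `lam` is hypothetical), and all function names are hypothetical. The sketch highlights one property of the convex combination: because the weights sum to 1, the gradient with respect to the shared embedding equals the ordinary t-SNE gradient for the weighted mixture of per-modality affinities. A j-UMAP analogue would replace the KL terms with UMAP's fuzzy-set cross entropy; that variant is not shown.

```python
import numpy as np


def pairwise_sq_dists(X):
    # Squared Euclidean distances between all rows (cells) of X.
    sq = np.sum(X ** 2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)


def high_dim_affinities(X, sigma=1.0):
    # Simplified t-SNE input affinities for ONE modality. ASSUMPTION: a single
    # fixed Gaussian bandwidth `sigma`; real t-SNE calibrates a bandwidth per
    # cell to a target perplexity.
    P = np.exp(-pairwise_sq_dists(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)
    P /= P.sum(axis=1, keepdims=True)          # conditional p_{j|i}
    return (P + P.T) / (2.0 * P.shape[0])      # symmetric joint p_ij, sums to 1


def low_dim_affinities(Y):
    # Shared embedding affinities: Student-t kernel, normalized to sum to 1.
    # Also returns the unnormalized kernel, which the gradient reuses.
    W = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(W, 0.0)
    return W / W.sum(), W


def kl(P, Q, eps=1e-12):
    # KL(P || Q) over all pairs with p_ij > 0.
    m = P > 0
    return float(np.sum(P[m] * np.log(P[m] / np.maximum(Q[m], eps))))


def jsne_objective(P_list, Y, alphas):
    # Convex combination of per-modality KL divergences to ONE embedding Y.
    Q, _ = low_dim_affinities(Y)
    return sum(a * kl(P, Q) for a, P in zip(alphas, P_list))


def jsne_gradient(P_list, Y, alphas):
    # Because sum(alphas) == 1, the Y-gradient of the combined objective is the
    # ordinary t-SNE gradient for the mixed matrix sum_m alphas[m] * P_m.
    P = sum(a * Pm for a, Pm in zip(alphas, P_list))
    Q, W = low_dim_affinities(Y)
    M = (P - Q) * W
    return 4.0 * (M.sum(axis=1)[:, None] * Y - M @ Y)


def update_weights(P_list, Y, lam=1.0):
    # ASSUMED weight rule (illustrative, not necessarily the paper's): minimizing
    # sum_m a_m*KL_m + lam*sum_m a_m*log(a_m) over the simplex gives
    # a_m proportional to exp(-KL_m / lam).
    Q, _ = low_dim_affinities(Y)
    costs = np.array([kl(P, Q) for P in P_list])
    w = np.exp(-(costs - costs.min()) / lam)   # shift for numerical stability
    return w / w.sum()


# Toy usage with random stand-ins for two modalities of the same 30 cells.
rng = np.random.default_rng(0)
rna = rng.normal(size=(30, 50))     # hypothetical mRNA features
adt = rng.normal(size=(30, 10))     # hypothetical surface-protein features
P_list = [high_dim_affinities(rna, sigma=5.0), high_dim_affinities(adt, sigma=2.0)]
Y = rng.normal(scale=1e-2, size=(30, 2))
alphas = np.array([0.5, 0.5])
for it in range(300):
    Y -= 10.0 * jsne_gradient(P_list, Y, alphas)   # plain gradient descent on Y
    if (it + 1) % 100 == 0:
        alphas = update_weights(P_list, Y)         # re-weight the modalities
print("weights:", alphas, "objective:", jsne_objective(P_list, Y, alphas))
```

Alternating embedding updates with weight updates only loosely mirrors the "simultaneously learned weights" in the caption; a practical t-SNE optimizer would also add perplexity calibration, momentum, and early exaggeration.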
Fig. 2 Comparison of cell types and protein acceleration in unimodal and multimodal embeddings. First row: visualization of perturbed SNARE-seq measurements. Chromatin accessibility (ChrAcc) and gene expression were measured simultaneously in single cells from the human cell lines BJ, H1, K562, and GM12878. Gene expression measurements were randomly shuffled between cell lines BJ and H1 (MixRNA). a Conventional t-SNE embedding of cells based on the shuffled gene expression alone. b j-SNE visualization of shuffled gene expression and (unchanged) chromatin accessibility. c j-UMAP visualization of shuffled gene expression and (unchanged) chromatin accessibility. Second row: t-SNE/j-SNE visualizations of CBM cells. Cluster labels were identified by Specter. Embeddings were computed from RNA measurements alone (d), protein expression (ADT) alone (e), or jointly from both (f). Third row: protein acceleration in the ECCITE-seq (ctrl) data set projected onto the transcriptome-based t-SNE embedding (g) and onto the joint mRNA- and surface-protein-based j-SNE (h) and j-UMAP (i) embeddings.