Laleh Haghverdi1, Florian Buettner2, Fabian J Theis1. 1. Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany and Department of Mathematics, Technische Universität München, 85748 Garching, Germany. 2. Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany and Department of Mathematics, Technische Universität München, 85748 Garching, Germany.
Abstract
MOTIVATION: Single-cell technologies have recently gained popularity in cellular differentiation studies owing to their ability to resolve potential heterogeneities in cell populations. Analyzing such high-dimensional single-cell data poses its own statistical and computational challenges. Popular multivariate approaches are based on data normalization, followed by dimension reduction and clustering to identify subgroups. However, in the case of cellular differentiation, we would not expect clear clusters to be present but instead expect the cells to follow continuous branching lineages. RESULTS: Here, we propose the use of diffusion maps to deal with the problem of defining differentiation trajectories. We adapt this method to single-cell data by an adequate choice of kernel width and the inclusion of uncertainties or missing measurement values, which enables the establishment of a pseudotemporal ordering of single cells in a high-dimensional gene expression space. We expect this output to reflect cell differentiation trajectories, where the data originate from intrinsic diffusion-like dynamics. Starting from a pluripotent stage, cells move smoothly within the transcriptional landscape towards more differentiated states, with some stochasticity along their path. We demonstrate the robustness of our method with respect to extrinsic noise (e.g. measurement noise) and sampling density heterogeneities on simulated toy data as well as on two single-cell quantitative polymerase chain reaction datasets (i.e. mouse haematopoietic stem cells and mouse embryonic stem cells) and an RNA-Seq dataset of human pre-implantation embryos. We show that diffusion maps perform considerably better than Principal Component Analysis and are advantageous over other techniques for non-linear dimension reduction, such as t-distributed Stochastic Neighbour Embedding, for preserving the global structures and pseudotemporal ordering of cells.
AVAILABILITY AND IMPLEMENTATION: The Matlab implementation of diffusion maps for single-cell data is available at https://www.helmholtz-muenchen.de/icb/single-cell-diffusion-map. CONTACT: fbuettner.phys@gmail.com, fabian.theis@helmholtz-muenchen.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
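As an illustration of the general technique named in the abstract, a minimal diffusion-map sketch in Python with NumPy might look as follows. It is not the authors' Matlab implementation: it assumes a fixed, user-supplied Gaussian kernel width `sigma` and complete measurements (no handling of uncertainties or missing values), and the function name `diffusion_map` and its parameters are hypothetical.

```python
import numpy as np


def diffusion_map(data, sigma, n_components=2):
    """Generic diffusion-map sketch (hypothetical helper, not the paper's code).

    data: (n_cells, n_genes) array of expression values.
    sigma: Gaussian kernel width, assumed to be chosen by the caller.
    Returns (eigenvalues, components) with the trivial constant mode dropped.
    """
    # Pairwise squared Euclidean distances between cells.
    sq_norms = np.sum(data ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * data @ data.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from round-off

    # Gaussian kernel. Zeroing the diagonal (no self-transitions) is an
    # assumption of this sketch; sigma must be large enough that every cell
    # keeps some non-zero weight to other cells.
    kernel = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(kernel, 0.0)

    # Density normalisation, to reduce the effect of uneven sampling density.
    density = kernel.sum(axis=1)
    kernel = kernel / np.outer(density, density)

    # Symmetric conjugate of the row-normalised Markov transition matrix,
    # so that a symmetric eigensolver can be used.
    row_sums = kernel.sum(axis=1)
    sym = kernel / np.sqrt(np.outer(row_sums, row_sums))

    eigvals, eigvecs = np.linalg.eigh(sym)
    order = np.argsort(eigvals)[::-1]  # sort eigenvalues in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Map back to right eigenvectors of the transition matrix.
    psi = eigvecs / np.sqrt(row_sums)[:, None]

    # Skip the first eigenpair (eigenvalue 1, constant eigenvector).
    return eigvals[1:n_components + 1], psi[:, 1:n_components + 1]
```

For cells sampled along a single smooth trajectory, sorting the cells by the first returned component would give one candidate pseudotemporal ordering, up to a sign flip. Branching lineages would generally need more than one component.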