George C Linderman, Manas Rachh, Jeremy G Hoskins, Stefan Steinerberger, Yuval Kluger.
Abstract
t-distributed stochastic neighbor embedding (t-SNE) is widely used for visualizing single-cell RNA-sequencing (scRNA-seq) data, but it scales poorly to large datasets. We dramatically accelerate t-SNE, obviating the need for data downsampling, and hence allowing visualization of rare cell populations. Furthermore, we implement a heatmap-style visualization for scRNA-seq based on one-dimensional t-SNE for simultaneously visualizing the expression patterns of thousands of genes. Software is available at https://github.com/KlugerLab/FIt-SNE and https://github.com/KlugerLab/t-SNE-Heatmaps.
Year: 2019 PMID: 30742040 PMCID: PMC6402590 DOI: 10.1038/s41592-018-0308-4
Source DB: PubMed Journal: Nat Methods ISSN: 1548-7091 Impact factor: 28.547
Figure 1. FIt-SNE allows for embedding of the full 1.3 million mouse brain cell dataset (left), enabling the identification of known cell populations that cannot be identified when downsampling to a random 50,000 cells (right). (In the left panel, all cells expressing the marker genes are shown, but only 100,000 of the cells not expressing the marker genes are plotted rather than all 1.3 million embedded points.)
Time taken for 1000 iterations of the gradient descent phase of 2D t-SNE using Barnes-Hut t-SNE (BH t-SNE) and our implementation (FIt-SNE), measured on a 2017 MacBook Pro for a given number of points N. See section 8.3.5 for more details.
| N | BH t-SNE | FIt-SNE |
|---|---|---|
| 10,000 | 1 min. | < 1 min. |
| 100,000 | 11 min. | < 1 min. |
| 500,000 | 1 hr. 10 min. | 3 min. |
| 1,000,000 | 3 hr. 9 min. | 15 min. |
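The two rows of the table above that report exact times for both methods can be turned into rough speedup factors. The snippet below is only an illustrative calculation on the reported figures. The helper `to_minutes` and the `timings` dictionary are names introduced here, and the rows listed as "< 1 min." are left out because they give no exact value to divide by.

```python
def to_minutes(hours=0, minutes=0):
    """Convert an 'X hr. Y min.' table entry to minutes (hypothetical helper)."""
    return 60 * hours + minutes


# (BH t-SNE, FIt-SNE) times in minutes, copied from the table rows
# that give exact values for both methods.
timings = {
    500_000: (to_minutes(1, 10), to_minutes(0, 3)),
    1_000_000: (to_minutes(3, 9), to_minutes(0, 15)),
}

# Ratio of BH t-SNE time to FIt-SNE time for each N.
speedups = {n: bh / fit for n, (bh, fit) in timings.items()}

for n, factor in speedups.items():
    print(f"N = {n:,}: FIt-SNE is roughly {factor:.1f}x faster")
```

Because the table rounds to whole minutes, these ratios are approximate.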
Time taken to compute input similarities in Barnes-Hut t-SNE (vptree) compared to FIt-SNE using either multithreaded vantage-point trees (vptreeMT) or a multithreaded approximate nearest neighbor (annMT) approach, measured on a 2017 MacBook Pro for a given number of points N in 50 and 100 dimensions.
| N | vptree (50 dim.) | vptreeMT (50 dim.) | annMT (50 dim.) | vptree (100 dim.) | vptreeMT (100 dim.) | annMT (100 dim.) |
|---|---|---|---|---|---|---|
| 10,000 | < 1 min. | < 1 min. | < 1 min. | < 1 min. | < 1 min. | < 1 min. |
| 100,000 | 2 min. | < 1 min. | < 1 min. | 3 min. | < 1 min. | < 1 min. |
| 500,000 | 56 min. | 15 min. | 3 min. | 1 hr. 30 min. | 20 min. | 4 min. |
| 1,000,000 | 4 hr. 45 min. | 1 hr. 15 min. | 6 min. | 7 hr. 9 min. | 1 hr. 40 min. | 8 min. |
Figure 2. Schematic and demo of t-SNE Heatmaps. Starting with the expression matrix (A), we compute a 1D t-SNE, which is plotted in (B) colored by the expression of each gene (with added jitter). We bin the 1D t-SNE and represent each gene by its average expression in each bin (C), and then generate a heatmap of these vectors, so that genes with similar expression patterns in the t-SNE are grouped together (D). In (E), we demonstrate t-SNE heatmaps using retinal bipolar cells [11].
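The binning step in (C) can be sketched in a few lines of plain Python. This is a minimal illustration, not the code from the t-SNE-Heatmaps repository. It assumes the 1D embedding has already been computed, uses equal-width bins, and orders genes by the bin in which their average expression peaks as a simple stand-in for the grouping of similar genes in (D). The function names and the toy data are hypothetical.

```python
def bin_average_expression(embedding_1d, expression, n_bins=100):
    """Average each gene's expression within equal-width bins of a 1D embedding.

    embedding_1d: one coordinate per cell.
    expression:   one row per cell, one column per gene.
    Returns one profile per gene (a list of n_bins averages); empty bins are None.
    """
    lo, hi = min(embedding_1d), max(embedding_1d)
    width = (hi - lo) / n_bins or 1.0  # guard against a degenerate embedding
    n_genes = len(expression[0])
    sums = [[0.0] * n_genes for _ in range(n_bins)]
    counts = [0] * n_bins
    for position, cell in zip(embedding_1d, expression):
        # The maximum coordinate would land in bin n_bins, so clamp it.
        b = min(int((position - lo) / width), n_bins - 1)
        counts[b] += 1
        for g, value in enumerate(cell):
            sums[b][g] += value
    return [
        [sums[b][g] / counts[b] if counts[b] else None for b in range(n_bins)]
        for g in range(n_genes)
    ]


def order_genes_by_peak(profiles):
    """Order genes by the bin where their binned profile peaks (an assumed,
    simplified substitute for grouping genes with similar patterns)."""
    def peak_bin(profile):
        filled = [float("-inf") if v is None else v for v in profile]
        return filled.index(max(filled))
    return sorted(range(len(profiles)), key=lambda g: peak_bin(profiles[g]))


# Toy data: 4 cells, 2 genes. Gene 0 is expressed at the right end of the
# embedding, gene 1 at the left end.
embedding = [0.0, 0.1, 0.9, 1.0]
expression = [[0, 1], [0, 3], [2, 0], [4, 0]]
profiles = bin_average_expression(embedding, expression, n_bins=2)
order = order_genes_by_peak(profiles)
```

Plotting the rows of `profiles` in the order given by `order` yields a heatmap in which genes that peak in the same region of the embedding sit next to each other.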
Algorithm 1: FFT-accelerated Interpolation-based t-SNE (FIt-SNE)
Algorithm 2: Out-of-Core PCA (oocPCA)