Jiankang Xiong1,2, Fuzhou Gong1,2, Lin Wan1,2, Liang Ma3.
Abstract
The dramatic increase in the amount and size of single-cell RNA sequencing (scRNA-seq) data calls for more efficient and scalable dimensionality-reduction and visualization tools. Here, we design a GPU-accelerated method, NeuralEE, which combines the advantages of elastic embedding (EE) and neural networks. We show that NeuralEE is both scalable and generalizable for dimensionality reduction and visualization of large-scale scRNA-seq data. In addition, the GPU-based implementation makes NeuralEE applicable under limited computational resources while maintaining high performance: it takes only half an hour to visualize 1.3 million mouse brain cells, and NeuralEE generalizes to the integration of newly generated data.
Keywords: elastic embedding; generalizable models; large-scale; neural networks; parametric models; single-cell RNA sequencing; stochastic optimization
Year: 2020 PMID: 33193561 PMCID: PMC7587292 DOI: 10.3389/fgene.2020.00786
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1(A) The flow chart of NeuralEE. In brief, NeuralEE constructs a neural network (NN) that defines a parametric mapping from the original space to the embedded space. The full dataset is first randomly partitioned into several batches (a single batch is also acceptable, i.e., without the mini-batch trick or stochastic optimization). On each batch, the attractive and repulsive weight matrices are calculated and fed into the loss function of EE, which is expressed as a composite function of the original data. The parameters of the NN are optimized by backpropagation, and the embedding mapping is thus learned. (B) Visualization of 1.3 million mouse brain cells by NeuralEE-SO.
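The pipeline in Figure 1A can be sketched in a few lines. The sketch below is illustrative, not the paper's implementation: it uses a linear map as a stand-in for the NN, the standard elastic-embedding weights (Gaussian affinities for attraction, squared input-space distances for repulsion), and an analytic gradient in place of automatic differentiation; all hyperparameters (`sigma`, `lam`, `lr`) are assumptions.

```python
import numpy as np

def ee_weights(X, sigma=1.0):
    """Attractive (Gaussian affinity) and repulsive (squared distance)
    weight matrices, as in standard elastic embedding."""
    D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    Wp = np.exp(-D2 / (2.0 * sigma ** 2))
    np.fill_diagonal(Wp, 0.0)        # no self-affinity
    return Wp, D2                    # D2 doubles as the repulsive weights

def ee_loss_grad(Y, Wp, Wn, lam):
    """EE objective on an embedding Y and its analytic gradient w.r.t. Y."""
    D2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    E = np.exp(-D2)
    loss = np.sum(Wp * D2) + lam * np.sum(Wn * E)
    M = Wp - lam * Wn * E            # signed effective weight matrix
    L = np.diag(M.sum(axis=1)) - M   # graph-Laplacian form of the gradient
    return loss, 4.0 * L @ Y

def train_linear_ee(X, lam=0.1, lr=1e-4, iters=100, seed=0):
    """Tiny EE trainer: a linear map stands in for the NN of Figure 1A."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(X.shape[1], 2))
    Wp, Wn = ee_weights(X)           # weights computed once per batch
    losses = []
    for _ in range(iters):
        Y = X @ W                    # forward pass to the 2-D embedding
        loss, gY = ee_loss_grad(Y, Wp, Wn, lam)
        W -= lr * X.T @ gY           # "backprop" through the linear map
        losses.append(loss)
    return W, losses
```

With mini-batches, `ee_weights` and the gradient step would be applied per batch, which is the stochastic-optimization (SO) variant.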
Figure 2Comparison of NeuralEE to other visualization methods on the ArtificialTree data. (A) True embedding. (B) EE. (C) NeuralEE. (D) NeuralEE-SO. (E) t-SNE. (F) UMAP. (G) PHATE. (H) PCA. Each color corresponds to a different branch of the artificial tree.
Figure 3(A) NeuralEE or (B) net-SNE on the full ArtificialTree data. Their embeddings with stochastic optimization are shown in Supplementary Figure 6. (C) The top row shows NeuralEE trained on sub-samples at the indicated sub-sampling scale, and the bottom row shows the mapping of all samples to the embedded space by the NN trained in the top row. (D) net-SNE under experiments similar to (C). (E) From left to right: NeuralEE on the entire CORTEX data, NeuralEE trained on sub-samples with a sub-sampling scale of 25%, and the mapping of all samples to the embedded space by the NN trained on the sub-samples. (F) net-SNE under experiments similar to (E).
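The generalizability experiments in panels (C)-(F) rely on the defining property of a parametric method: once the map is trained on a sub-sample, unseen cells are embedded with a single forward pass, without re-optimizing the embedding. A minimal sketch, again with a linear map `W` as a stand-in for the trained NN; all names, shapes, and values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50                                   # illustrative input dimensionality
W = rng.normal(scale=0.1, size=(d, 2))   # stand-in for trained NN parameters
X_sub = rng.normal(size=(250, d))        # sub-sample the map was trained on
X_full = rng.normal(size=(1000, d))      # full dataset, including unseen cells

Y_full = X_full @ W                      # out-of-sample embedding: one forward pass
```

This is why the bottom-row panels in (C) and (E) cost only a matrix of forward passes, while non-parametric methods would have to re-run the optimization on all samples.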
Figure 4Embedding results on the 1.3 million mouse brain cells dataset. (A) NeuralEE-SO with a batch size of 5,000. (B) FIt-SNE. (C) UMAP. Labels represent different clusters from the Louvain community detection algorithm (Blondel et al., 2008). Because NeuralEE is a parametric method, it can produce a few outliers in the embedded space when applied to million-scale data; we manually removed them to make the layout tighter. The raw embedding is shown in Supplementary Figure 7.