A deep adversarial variational autoencoder model for dimensionality reduction in single-cell RNA sequencing analysis
Eugene Lin, Sudipto Mukherjee, Sreeram Kannan
Abstract
BACKGROUND: Single-cell RNA sequencing (scRNA-seq) is an emerging technology that can assess the function of an individual cell and cell-to-cell variability at the single-cell level in an unbiased manner. Dimensionality reduction is an essential first step in downstream analysis of scRNA-seq data. However, scRNA-seq data are challenging for traditional methods due to their high-dimensional measurements as well as an abundance of dropout events (that is, zero expression measurements).
Keywords: Adversarial autoencoder; Dimensionality reduction; Generative adversarial networks; Single-cell RNA sequencing; Variational autoencoder
Year: 2020 PMID: 32085701 PMCID: PMC7035735 DOI: 10.1186/s12859-020-3401-5
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 The novel architecture of an Adversarial Variational AutoEncoder with Dual Matching (AVAE-DM). An autoencoder (that is, a deep encoder and a deep decoder) reconstructs the scRNA-seq data from a latent code vector z. The first discriminator network D1 is trained to predict whether a sample arises from the prior distribution or from the latent code distribution of the autoencoder. The second discriminator D2 is trained to predict whether the scRNA-seq data are real or reconstructed (fake)
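The dual-discriminator layout can be made concrete with a short sketch. The following PyTorch code is a minimal illustration only, assuming 720 input genes, a 10-D latent space, an N(0, I) prior, and simple MLP modules; the authors' exact layer counts, widths, and activations are not reproduced here.

```python
# Minimal PyTorch sketch of the dual-discriminator AVAE-DM layout in Fig. 1.
# All sizes (720 genes, 10-D latent, MLP widths) are illustrative assumptions.
import torch
import torch.nn as nn

N_GENES, LATENT = 720, 10

def mlp(d_in, d_hidden, d_out):
    """Two-layer MLP used as a stand-in for the paper's deep networks."""
    return nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(),
                         nn.Linear(d_hidden, d_out))

encoder = mlp(N_GENES, 256, LATENT)                     # x -> latent code z
decoder = mlp(LATENT, 256, N_GENES)                     # z -> reconstruction
d1 = nn.Sequential(mlp(LATENT, 128, 1), nn.Sigmoid())   # prior vs. encoded z
d2 = nn.Sequential(mlp(N_GENES, 128, 1), nn.Sigmoid())  # real vs. reconstructed x

x = torch.rand(64, N_GENES)          # stand-in mini-batch of expression profiles
z = encoder(x)
x_hat = decoder(z)
p_z_prior = d1(torch.randn_like(z))  # D1 on samples from an assumed N(0, I) prior
p_z_enc = d1(z)                      # D1 on the encoder's latent codes
p_x_real = d2(x)                     # D2 on real scRNA-seq profiles
p_x_fake = d2(x_hat)                 # D2 on reconstructed ("fake") profiles
```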
Summary of the scRNA-seq datasets employed in this study. The 720 highest-variance genes were selected from each dataset for the subsequent experiments
| Dataset | Number of cells | Number of cell types | Reference |
|---|---|---|---|
| Zeisel-3 k | 3005 | 7 | Zeisel et al. |
| Macosko-44 k | 44,808 | 39 | Macosko et al. |
| Zheng-68 k | 68,579 | 10 | Zheng et al. |
| Zheng-73 k | 73,233 | 8 | Zheng et al. |
| Rosenberg-156 k | 156,049 | 73 | Rosenberg et al. |
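A gene-selection step like the one described in the caption can be sketched in a few lines of NumPy. This is a generic illustration with stand-in random data, not the authors' preprocessing code.

```python
# Sketch of selecting the 720 highest-variance genes from a cells x genes
# matrix. `counts` is random stand-in data, not one of the real datasets.
import numpy as np

rng = np.random.default_rng(0)
counts = rng.poisson(1.0, size=(1000, 5000)).astype(float)  # stand-in matrix

gene_var = counts.var(axis=0)                  # per-gene variance across cells
top720 = np.sort(np.argsort(gene_var)[-720:])  # column indices of top-720 genes
reduced = counts[:, top720]                    # shape: (n_cells, 720)
print(reduced.shape)                           # -> (1000, 720)
```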
Experimental results (NMI scores) for the various dimensionality reduction algorithms: DR-A, PCA, ZIFA, scVI, SAUCIE, t-SNE, and UMAP. We carried out the experiments using the Rosenberg-156 k, Zheng-73 k, Zheng-68 k, Macosko-44 k, and Zeisel-3 k datasets. Each algorithm was investigated with (a) 2 latent dimensions (K = 2), (b) 10 latent dimensions (K = 10), and (c) 20 latent dimensions (K = 20)
| Algorithm | Rosenberg-156 k | Zheng-73 k | Zheng-68 k | Macosko-44 k | Zeisel-3 k |
|---|---|---|---|---|---|
| (a) K = 2 | | | | | |
| DR-A | 0.5573 | | | | |
| PCA | 0.2523 | 0.3396 | 0.2538 | 0.2984 | 0.4721 |
| ZIFA | 0.3049 | 0.3794 | 0.2810 | 0.3120 | 0.4250 |
| scVI | 0.5199 | 0.8261 | 0.5417 | 0.4599 | 0.7006 |
| SAUCIE | 0.4046 | 0.4304 | 0.2749 | 0.2707 | 0.4622 |
| t-SNE | 0.4343 | 0.6562 | 0.4081 | 0.4091 | 0.7103 |
| UMAP | | 0.6507 | 0.4377 | 0.4184 | 0.7214 |
| (b) K = 10 | | | | | |
| DR-A | | | | | |
| PCA | 0.3276 | 0.5612 | 0.3877 | 0.4243 | 0.5559 |
| ZIFA | 0.5074 | 0.8354 | 0.5152 | 0.4785 | 0.7807 |
| scVI | 0.5821 | 0.8060 | 0.5571 | 0.5155 | 0.7606 |
| SAUCIE | 0.4773 | 0.4209 | 0.3147 | 0.2874 | 0.5110 |
| t-SNE | N/A | N/A | N/A | N/A | N/A |
| UMAP | 0.5735 | 0.6911 | 0.4393 | 0.4129 | 0.7413 |
| (c) K = 20 | | | | | |
| DR-A | | | | | |
| PCA | 0.3761 | 0.5623 | 0.3874 | 0.4306 | 0.5561 |
| ZIFA | N/A | N/A | N/A | N/A | 0.7114 |
| scVI | 0.5831 | 0.7976 | 0.5691 | 0.5105 | 0.7419 |
| SAUCIE | 0.4740 | 0.4254 | 0.2952 | 0.2775 | 0.4808 |
| t-SNE | N/A | N/A | N/A | N/A | N/A |
| UMAP | 0.5656 | 0.6906 | 0.4413 | 0.4177 | 0.7419 |
N/A denotes that we could not run the given algorithm
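For reference, an NMI score of this kind can be computed by clustering the K-dimensional latent codes and comparing the clusters with the annotated cell types. The sketch below uses KMeans and scikit-learn's normalized_mutual_info_score on stand-in data; whether the paper used exactly this clustering protocol is an assumption.

```python
# Sketch of one plausible NMI evaluation: cluster the K-dimensional latent
# codes with KMeans and compare against annotated cell types. The data here
# are stand-ins, and KMeans with k = number of cell types is an assumption.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 10))         # stand-in latent codes (K = 10)
cell_types = rng.integers(0, 7, size=1000)   # stand-in labels (7 cell types)

pred = KMeans(n_clusters=7, n_init=10, random_state=0).fit_predict(latent)
print(normalized_mutual_info_score(cell_types, pred))
```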
Hyperparameter settings of DR-A used for the experimental results in Table 2. We carried out the experiments using the Rosenberg-156 k, Zheng-73 k, Zheng-68 k, Macosko-44 k, and Zeisel-3 k datasets. The DR-A algorithm was investigated with (a) 2 latent dimensions (K = 2), (b) 10 latent dimensions (K = 10), and (c) 20 latent dimensions (K = 20). G denotes a generative model and D denotes a discriminative model
| Dataset | Batch size | Hidden layers | Hidden units | Learning rate |
|---|---|---|---|---|
| (a) K = 2 | | | | |
| Rosenberg-156 k | 128 | 4 | | 7 × 10⁻⁵ |
| Zheng-73 k | 128 | 3 | | 6 × 10⁻⁵ |
| Zheng-68 k | 128 | 4 | | 1 × 10⁻⁴ |
| Macosko-44 k | 128 | 3 | | 1 × 10⁻⁴ |
| Zeisel-3 k | 128 | 4 | | 8 × 10⁻⁴ |
| (b) K = 10 | | | | |
| Rosenberg-156 k | 128 | 4 | | 6 × 10⁻⁵ |
| Zheng-73 k | 128 | 4 | | 2 × 10⁻⁵ |
| Zheng-68 k | 128 | 4 | | 7 × 10⁻⁵ |
| Macosko-44 k | 128 | 4 | | 7 × 10⁻⁵ |
| Zeisel-3 k | 128 | 1 | | 7 × 10⁻⁴ |
| (c) K = 20 | | | | |
| Rosenberg-156 k | 128 | 4 | | 6 × 10⁻⁵ |
| Zheng-73 k | 128 | 4 | | 1 × 10⁻⁵ |
| Zheng-68 k | 128 | 1 | | 2 × 10⁻⁵ |
| Macosko-44 k | 128 | 1 | | 7 × 10⁻⁵ |
| Zeisel-3 k | 128 | 1 | | 7 × 10⁻⁴ |
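As a usage illustration, one row of this table can be turned into a model and optimizer as below. The hidden-unit count is not recoverable from the table and the value 256 is an assumption; everything else follows the (a) K = 2, Zeisel-3 k row.

```python
# Sketch: building an encoder and optimizer from one row of the
# hyperparameter table ((a) K = 2, Zeisel-3 k). The hidden-unit count
# is missing from the table, so the value 256 below is an assumption.
import torch
import torch.nn as nn

cfg = {"batch_size": 128, "hidden_layers": 4, "hidden_units": 256,
       "lr": 8e-4, "latent_dim": 2}  # lr = 8 × 10⁻⁴ from the table

layers, d_in = [], 720               # 720 input genes (Table 1 caption)
for _ in range(cfg["hidden_layers"]):
    layers += [nn.Linear(d_in, cfg["hidden_units"]), nn.ReLU()]
    d_in = cfg["hidden_units"]
layers.append(nn.Linear(d_in, cfg["latent_dim"]))
encoder = nn.Sequential(*layers)

optimizer = torch.optim.Adam(encoder.parameters(), lr=cfg["lr"])
```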
Fig. 2 2-D visualization of the Zeisel-3 k dataset. The Zeisel-3 k dataset was reduced to 2-D by using (a) DR-A, (b) PCA, (c) ZIFA, (d) scVI, (e) SAUCIE, (f) t-SNE, (g) UMAP, and (h) DR-A combined with t-SNE. Each point in the 2-D plot represents a cell in the testing set of the Zeisel-3 k dataset, which contains 7 distinct cell types. The original dataset was split into 80% training and 20% testing sets in these experiments
Fig. 3 2-D visualization of the Zheng-73 k dataset. The Zheng-73 k dataset was reduced to 2-D by using (a) DR-A, (b) PCA, (c) ZIFA, (d) scVI, (e) SAUCIE, (f) t-SNE, (g) UMAP, and (h) DR-A combined with t-SNE. Each point in the 2-D plot represents a cell in the testing set of the Zheng-73 k dataset, which contains 8 distinct cell types. The original dataset was split into 80% training and 20% testing sets in these experiments
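Panel (h) in Figs. 2 and 3 chains a learned latent representation into t-SNE for display. The sketch below shows that two-stage pattern with stand-in 10-D latent codes; the latent matrix, label array, and plot styling are all assumptions, not the paper's plotting code.

```python
# Sketch of the two-stage visualization in panel (h): take an existing
# latent matrix (e.g., 10-D DR-A codes), embed it into 2-D with t-SNE,
# and color points by cell type. Latent codes and labels are stand-ins.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
latent = rng.normal(size=(600, 10))        # stand-in 10-D latent codes
cell_types = rng.integers(0, 7, size=600)  # stand-in labels (7 cell types)

xy = TSNE(n_components=2, random_state=0).fit_transform(latent)
plt.scatter(xy[:, 0], xy[:, 1], c=cell_types, s=5, cmap="tab10")
plt.xlabel("t-SNE 1")
plt.ylabel("t-SNE 2")
plt.savefig("latent_tsne.png", dpi=150)
```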
Fig. 4 The overall architecture of an Adversarial Variational AutoEncoder (AVAE) framework. An autoencoder (that is, a deep encoder and a deep decoder) reconstructs the scRNA-seq data from a latent code vector z. A discriminator network is trained to predict whether a sample arises from a prior distribution or from the latent code distribution of the autoencoder
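In adversarial autoencoder training of this kind, the reconstruction objective alternates with a discriminator phase and a generator (encoder) phase. The following is a minimal sketch of one such alternating step under assumed choices (an N(0, I) prior, MSE reconstruction, Adam with illustrative learning rates); it is not the paper's exact training recipe.

```python
# Sketch of one alternating AAE-style update for the Fig. 4 setup:
# (1) reconstruction, (2) discriminator update (prior = real, encoded
# codes = fake), (3) encoder update to fool the discriminator.
# The N(0, I) prior, MSE loss, and learning rates are assumptions.
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Linear(720, 256), nn.ReLU(), nn.Linear(256, 10))
dec = nn.Sequential(nn.Linear(10, 256), nn.ReLU(), nn.Linear(256, 720))
disc = nn.Sequential(nn.Linear(10, 128), nn.ReLU(), nn.Linear(128, 1))

opt_ae = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

x = torch.rand(128, 720)  # stand-in expression mini-batch

# (1) reconstruction phase: update encoder and decoder
opt_ae.zero_grad()
nn.functional.mse_loss(dec(enc(x)), x).backward()
opt_ae.step()

# (2) discriminator phase: prior samples labeled 1, encoded codes labeled 0
opt_d.zero_grad()
z_prior, z_enc = torch.randn(128, 10), enc(x).detach()
d_loss = (bce(disc(z_prior), torch.ones(128, 1)) +
          bce(disc(z_enc), torch.zeros(128, 1)))
d_loss.backward()
opt_d.step()

# (3) generator phase: encoder tries to make its codes look like the prior
opt_ae.zero_grad()
bce(disc(enc(x)), torch.ones(128, 1)).backward()
opt_ae.step()
```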