Siamak Zamani Dadaneh, Paul de Figueiredo, Sing-Hoi Sze, Mingyuan Zhou, Xiaoning Qian.
Abstract
BACKGROUND: Single-cell RNA sequencing (scRNA-seq) is a powerful profiling technique at single-cell resolution. Appropriate analysis of scRNA-seq data can characterize molecular heterogeneity and shed light on the underlying cellular processes, helping to better understand development and disease mechanisms. The unique analytic challenge is to appropriately model highly over-dispersed scRNA-seq count data with prevalent dropouts (zero counts), which has made zero-inflated dimensionality reduction techniques popular for scRNA-seq data analyses. Employing zero-inflated distributions, however, may place extra emphasis on zero counts, leading to potential bias when identifying the latent structure of the data.
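The point about over-dispersion and zeros can be illustrated with a small simulation. The sketch below is not the hGNB model; it only draws counts from a plain negative binomial distribution (written as a gamma-Poisson mixture) and compares the fraction of zeros with what a Poisson distribution of the same mean would give. The function names and the values `mean = 2.0` and `dispersion = 0.5` are illustrative assumptions, not taken from the paper.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) count with Knuth's multiplication method.

    Adequate for the small rates used here; exp(-lam) underflows for very large lam.
    """
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_nb(mean, dispersion, rng):
    """Negative binomial draw as a gamma-Poisson mixture.

    The rate is Gamma(shape=dispersion, scale=mean/dispersion), so its mean is `mean`.
    """
    lam = rng.gammavariate(dispersion, mean / dispersion)
    return sample_poisson(lam, rng)


def nb_zero_prob(mean, dispersion):
    """Closed-form P(count = 0) under the same negative binomial."""
    return (dispersion / (dispersion + mean)) ** dispersion


rng = random.Random(0)          # fixed seed for reproducibility
mean, dispersion = 2.0, 0.5     # hypothetical values chosen for illustration
counts = [sample_nb(mean, dispersion, rng) for _ in range(20000)]
zero_frac = sum(c == 0 for c in counts) / len(counts)

print("empirical zero fraction (NB):", round(zero_frac, 3))
print("analytic zero probability (NB):", round(nb_zero_prob(mean, dispersion), 3))
print("zero probability (Poisson, same mean):", round(math.exp(-mean), 3))
```

With these settings the analytic zero probability is about 0.45 for the negative binomial versus about 0.14 for a Poisson with the same mean, which shows how an over-dispersed count model can already account for many zeros without a separate zero-inflation component.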
Keywords: Bayesian; Hierarchical modeling; Single-cell RNA sequencing
Year: 2020 PMID: 32900358 PMCID: PMC7487589 DOI: 10.1186/s12864-020-06938-8
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Graphical representation of the hierarchical gamma-negative binomial (hGNB) model
Parameters of the hierarchical gamma-negative binomial (hGNB) model and their interpretations in the context of scRNA-seq data
| Parameter | Constraint | Interpretation |
|---|---|---|
| | | Expression heterogeneity of genes in sample |
| | | Gene-latent factor association |
| | | Popularity of factor |
| | | Impact of cell covariate |
| | | Impact of gene covariate |
The inputs of hGNB are the gene counts n and the vectors of cell- and gene-level covariates x and z
Fig. 2 Mean-difference (MD) plot for S1/CA1 dataset. The solid red line represents the local regression fit to the data
Clustering performance based on synthetic data (J=100)
| Method | 40% zero-inflation | 60% zero-inflation | 80% zero-inflation |
|---|---|---|---|
| hGNB | 0.0905 ±0.009 | ||
| PCA | 0.2929 ±0.012 | 0.1265 ±0.012 | 0.0631 ±0.013 |
| ZIFA | 0.2642 ±0.019 | 0.1314 ±0.011 | 0.0728 ±0.010 |
| ZINB | 0.3501 ±0.022 | 0.1641 ±0.011 | |
| Monocle | 0.2453 ±0.015 | 0.1155 ±0.017 | 0.0613 ±0.010 |
| scVI | 0.3122 ±0.029 | 0.1476 ±0.023 | 0.0593 ±0.005 |
Clustering performance based on synthetic data (J=1000)
| Method | 40% zero-inflation | 60% zero-inflation | 80% zero-inflation |
|---|---|---|---|
| hGNB | 0.1470 ±0.010 | 0.0669 ±0.008 | |
| PCA | 0.2594 ±0.009 | 0.0964 ±0.018 | 0.0349 ±0.018 |
| ZIFA | 0.3189 ±0.011 | 0.1191 ±0.004 | 0.0475 ±0.002 |
| ZINB | 0.3574 ±0.019 | ||
| Monocle | 0.2316 ±0.015 | 0.0995 ±0.011 | 0.0490 ±0.001 |
| scVI | 0.2590 ±0.046 | 0.1025 ±0.014 | 0.0351 ±0.006 |
Fig. 3 Low-dimensional representations of the S1/CA1 dataset. Panels correspond to (a) PCA (on total-count normalized data), (b) ZIFA (on total-count normalized data), (c) ZINB-WaVE, and (d) hGNB
Fig. 4 Average silhouette width in scRNA-seq datasets (a) S1/CA1, (b) mESC, and (c) V1. Silhouette widths were computed in the low-dimensional space, using the groupings provided by the authors of the original publications. PCA and ZIFA were applied both to unnormalized (RAW) data and to data after total-count (TC) normalization
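As a reminder of what the average silhouette width measures, the following is a minimal sketch, assuming Euclidean distance in the low-dimensional space and known group labels. The function name `average_silhouette_width` and the toy coordinates are hypothetical; this is not the code used to produce the figure.

```python
import math


def average_silhouette_width(points, labels):
    """Mean silhouette width over all points (needs at least two groups).

    For each point, a is the mean distance to the other members of its own
    group and b is the smallest mean distance to the members of any other
    group; the silhouette width is (b - a) / max(a, b).
    """
    groups = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, []).append(idx)

    widths = []
    for i, lab in enumerate(labels):
        own = groups[lab]
        if len(own) == 1:
            widths.append(0.0)  # common convention for singleton groups
            continue
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in groups.items()
            if other != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return sum(widths) / len(widths)


# Toy 2-D embedding: two tight, well-separated groups.
embedding = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = average_silhouette_width(embedding, ["A", "A", "B", "B"])
bad = average_silhouette_width(embedding, ["A", "B", "A", "B"])
print("labels matching the geometry:", round(good, 3))
print("labels cutting across the geometry:", round(bad, 3))
```

Values near 1 indicate that the given grouping is compact and well separated in the embedding, while values near 0 or below indicate overlapping or mismatched groups, which is why the measure is used to compare low-dimensional representations against known cell groupings.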
Correspondence between identified clusters and cell types in OE dataset
| Cell Type | Clusters |
|---|---|
| GBC | cl4, cl9 |
| mSUS | cl2, cl3, cl5, cl11 |
| mOSN | cl8, cl12, cl13 |
| Immature Neurons | cl10 |
| MV | cl14 |
Fig. 5 Lineage inference on the OE dataset. The low-dimensional data representation derived by hGNB was used to cluster cells with RSEC. The minimum spanning tree (MST) of the derived clusters, constructed by slingshot, is also displayed
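The idea of connecting clusters by a minimum spanning tree can be sketched in a few lines. The example below is only an illustration under simplifying assumptions: it uses plain Euclidean distance between cluster centroids and Prim's algorithm, whereas slingshot's own distance measure and tree construction may differ. The function names, cluster labels, and coordinates are hypothetical.

```python
import math


def cluster_centroids(points, labels):
    """Mean coordinate of each cluster in the low-dimensional space."""
    sums, counts = {}, {}
    for point, lab in zip(points, labels):
        acc = sums.setdefault(lab, [0.0] * len(point))
        for d, value in enumerate(point):
            acc[d] += value
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: tuple(s / counts[lab] for s in acc) for lab, acc in sums.items()}


def minimum_spanning_tree(centroids):
    """Prim's algorithm over the complete graph of cluster centroids.

    Returns a list of (cluster_in_tree, newly_added_cluster) edges.
    """
    names = sorted(centroids)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Pick the shortest edge leaving the current tree.
        u, v = min(
            ((a, b) for a in in_tree for b in names if b not in in_tree),
            key=lambda e: math.dist(centroids[e[0]], centroids[e[1]]),
        )
        edges.append((u, v))
        in_tree.add(v)
    return edges


# Toy embedding with three clusters laid out along one axis.
points = [(0.0, 0.1), (0.0, -0.1), (1.0, 0.1), (1.0, -0.1), (3.0, 0.1), (3.0, -0.1)]
labels = ["cl1", "cl1", "cl2", "cl2", "cl3", "cl3"]
tree = minimum_spanning_tree(cluster_centroids(points, labels))
print(tree)
```

In this toy layout the tree links neighbouring clusters along the axis rather than connecting the two most distant ones directly, which is the property lineage inference relies on when ordering clusters.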