Shiquan Sun, Yabo Chen, Yang Liu, Xuequn Shang.
Abstract
BACKGROUND: Single-cell RNA sequencing (scRNAseq) data always involve various unwanted variables, which can mask the true signal needed to identify cell types. A more efficient way of dealing with this issue is to extract low-dimensional information from the high-dimensional gene expression data to represent cell-type structure. In the past two years, several powerful matrix factorization tools have been developed for scRNAseq data, such as NMF, ZIFA, pCMF and ZINB-WaVE. However, the existing approaches either cannot directly model the raw counts of scRNAseq data or are very time-consuming when handling a large number of cells (e.g. n>500).
Keywords: Deep learning; Matrix factorization; Read count; Single-cell RNA sequencing
Year: 2019 PMID: 30953530 PMCID: PMC6449882 DOI: 10.1186/s12918-019-0699-6
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Fig. 1 A simple example showing how parameters and optimizers affect NMI and ARI when clustering scRNA-seq data. a Relationship between mean gene expression level and dropout rate. The black line indicates the observed value, computed as the number of cells in which the gene is unexpressed divided by the total number of cells; the red line represents the expected value, calculated from a negative binomial distribution with the mean gene expression level and dispersion parameter ψ (ψ = mean(ψ)). b How the choice of optimizer affects the performance of the different methods in terms of NMI and ARI. c-d How the number of factors affects NMI and ARI, respectively
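The two quantities compared in panel a can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names and the read counts are hypothetical, and the negative binomial is assumed to be parameterized so that variance = mean + dispersion × mean², which gives a zero probability of (1 + dispersion × mean)^(−1/dispersion).

```python
import math
from typing import Sequence


def observed_dropout_rate(counts: Sequence[int]) -> float:
    """Fraction of cells in which the gene has a zero read count."""
    return sum(1 for c in counts if c == 0) / len(counts)


def expected_nb_dropout_rate(mean: float, dispersion: float) -> float:
    """P(count = 0) under a negative binomial with the given mean.

    Assumed parameterization: variance = mean + dispersion * mean**2.
    With dispersion == 0 this reduces to the Poisson value exp(-mean).
    """
    if dispersion < 0:
        raise ValueError("dispersion must be non-negative")
    if dispersion == 0:
        return math.exp(-mean)
    return (1.0 + dispersion * mean) ** (-1.0 / dispersion)


# Hypothetical read counts of one gene across eight cells.
counts = [0, 0, 3, 5, 0, 2, 0, 10]
mean_expression = sum(counts) / len(counts)
observed = observed_dropout_rate(counts)
expected = expected_nb_dropout_rate(mean_expression, dispersion=1.0)
print(observed, expected)
```

Repeating this per gene and plotting both values against the mean expression level would give curves of the kind the caption describes.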
Fig. 2 Performance evaluation on the human brain scRNA-seq data set. This data set contains 420 cells in eight cell types after the exclusion of hybrid cells. Each color represents one cell type. a-h Clustering output in the two tSNE dimensions for the eight matrix factorization methods (PCA, Nimfa, NMFEM, tSNE, ZIFA, pCMF, ZINB-WaVE, and scNBMF). f NMI and ARI values of the eight compared methods
Clustering comparison of the matrix factorization-based methods in terms of Normalized Mutual Information (NMI) and Adjusted Rand Index (ARI)
| Method | Brain NMI | Brain ARI | Embryo NMI | Embryo ARI | Pancreas NMI | Pancreas ARI |
|---|---|---|---|---|---|---|
| PCA | 0.582 | 0.339 | 0.366 | 0.187 | 0.630 | 0.368 |
| Nimfa | 0.494 | 0.258 | 0.414 | 0.173 | 0.456 | 0.114 |
| NMFEM | 0.456 | 0.264 | 0.741 | 0.614 | 0.435 | 0.175 |
| tSNE | 0.712 | 0.544 | 0.658 | 0.538 | | |
| ZIFA | 0.797 | 0.721 | 0.888 | 0.748 | 0.641 | 0.429 |
| pCMF | 0.787 | 0.788 | 0.822 | 0.659 | 0.547 | 0.334 |
| ZINB-WaVE | 0.892 | 0.916 | 0.888 | 0.721 | 0.518 | 0.342 |
| scNBMF | | | | | 0.716 | 0.472 |
Numbers in bold indicate the best-performing method and numbers in grey indicate the second-best-performing method
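For readers unfamiliar with the two scores, the following is a minimal pure-Python sketch of how ARI and NMI can be computed from a true and a predicted labelling. It is not taken from the paper: the function names and toy labels are hypothetical, and NMI is normalized by the arithmetic mean of the two entropies, which is one common convention and may differ from the one the authors used.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def adjusted_rand_index(truth: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Adjusted Rand Index computed from the contingency table of two labellings."""
    n = len(truth)
    if n < 2:
        return 1.0
    pairs = Counter(zip(truth, pred))
    index = sum(math.comb(c, 2) for c in pairs.values())
    sum_a = sum(math.comb(c, 2) for c in Counter(truth).values())
    sum_b = sum(math.comb(c, 2) for c in Counter(pred).values())
    expected = sum_a * sum_b / math.comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster on both sides
        return 1.0
    return (index - expected) / (max_index - expected)


def normalized_mutual_info(truth: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Mutual information divided by the mean of the two label entropies (assumed convention)."""
    n = len(truth)
    a, b = Counter(truth), Counter(pred)
    pairs = Counter(zip(truth, pred))
    mi = sum((c / n) * math.log(n * c / (a[u] * b[v])) for (u, v), c in pairs.items())
    h_a = -sum((c / n) * math.log(c / n) for c in a.values())
    h_b = -sum((c / n) * math.log(c / n) for c in b.values())
    denom = (h_a + h_b) / 2
    return mi / denom if denom > 0 else 1.0


# Toy example: six cells, three true cell types.
truth = [0, 0, 1, 1, 2, 2]
relabelled = [1, 1, 0, 0, 2, 2]   # same partition, different cluster names
merged = [0, 0, 1, 1, 1, 1]       # two of the true types merged into one cluster

print(adjusted_rand_index(truth, relabelled), normalized_mutual_info(truth, relabelled))
print(adjusted_rand_index(truth, merged), normalized_mutual_info(truth, merged))
```

Both scores depend only on the partition, not on the cluster names, so the relabelled clustering scores as a perfect match while the merged clustering is penalized.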
Fig. 3 Performance evaluation on the human embryonic stem scRNA-seq data set, which contains 1018 cells in seven cell types. Each color again represents one cell type. a-h Clustering output in the two tSNE dimensions for the eight matrix factorization methods (PCA, Nimfa, NMFEM, tSNE, ZIFA, pCMF, ZINB-WaVE, and scNBMF). f NMI and ARI values of the eight compared methods
Computation times (seconds) of the matrix factorization-based methods on the human brain scRNAseq data set; k represents the number of factors
| Method | ||||||
|---|---|---|---|---|---|---|
| PCA | 11.54 | 11.55 | 11.70 | 11.35 | 11.37 | 11.59 |
| Nimfa | 639.15 | 1990.66 | 2260.13 | 2490.05 | 2705.42 | 2924.87 |
| NMFEM | 1471.39 | 1628.2 | 1913.11 | 2248.18 | 2659.23 | 3027.5 |
| tSNE | 1.85 | 14.41 | 32.11 | 56.01 | 77.20 | 101.25 |
| ZIFA | 5331.25 | 5831.04 | 6347.08 | 6987.52 | 7338.26 | 7722.33 |
| pCMF | 12391.6 | 13517.12 | 14260.26 | 15111.55 | 15978.44 | 17158.42 |
| ZINB-WaVE | 71053.1 | 79402.17 | 90118.3 | 101072.9 | 115379.7 | 126575.2 |
| scNBMF | 456.12 | 478.90 | 541.31 | 717.88 | 1053.22 | 1563.75 |
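To make the timings easier to compare, the short sketch below divides each method's time by the scNBMF time, column by column. The numbers are copied from the table above; the dictionary and function names are hypothetical, and the columns are simply taken in table order.

```python
# Computation times in seconds, one list entry per table column.
times = {
    "PCA": [11.54, 11.55, 11.70, 11.35, 11.37, 11.59],
    "Nimfa": [639.15, 1990.66, 2260.13, 2490.05, 2705.42, 2924.87],
    "NMFEM": [1471.39, 1628.2, 1913.11, 2248.18, 2659.23, 3027.5],
    "tSNE": [1.85, 14.41, 32.11, 56.01, 77.20, 101.25],
    "ZIFA": [5331.25, 5831.04, 6347.08, 6987.52, 7338.26, 7722.33],
    "pCMF": [12391.6, 13517.12, 14260.26, 15111.55, 15978.44, 17158.42],
    "ZINB-WaVE": [71053.1, 79402.17, 90118.3, 101072.9, 115379.7, 126575.2],
    "scNBMF": [456.12, 478.90, 541.31, 717.88, 1053.22, 1563.75],
}


def time_ratio(method: str, baseline: str = "scNBMF") -> list:
    """Per-column ratio of a method's time to the baseline's time."""
    return [t / b for t, b in zip(times[method], times[baseline])]


for method in times:
    ratios = ", ".join(f"{r:.2f}" for r in time_ratio(method))
    print(f"{method:10s} {ratios}")
```

A ratio above 1 means the method took longer than scNBMF in that column; a ratio below 1 means it was faster.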