Hung-I Harry Chen, Yufang Jin, Yufei Huang, Yidong Chen.
Abstract
BACKGROUND: Advances in next-generation sequencing technology enable gene expression to be mapped at the single-cell level, so that single-cell RNA sequencing (scRNA-seq) can track cell heterogeneity and determine cell subpopulations. Unlike conventional RNA-seq, where differential expression analysis is the integral component, the most important goal of scRNA-seq is to identify highly variable genes across a population of cells, accounting for the discrete nature of single-cell gene expression and the unique library preparation protocols of single-cell sequencing. However, there is a lack of a generic expression variation model for different scRNA-seq data sets. Hence, the objective of this study is to develop a gene expression variation model (GEVM) that uses the relationship between the coefficient of variation (CV) and the average expression level to address the over-dispersion of single-cell data, together with a corresponding statistical significance measure to quantify the variably expressed genes (VEGs).
Keywords: Cell heterogeneity; Gene expression variation model; Negative binomial distribution; Single-cell; Single-cell RNA-Seq; Variably expressed genes
Year: 2016 PMID: 27556924 PMCID: PMC5001205 DOI: 10.1186/s12864-016-2897-6
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
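The CV-mean relationship named in the abstract can be illustrated with a minimal sketch. It is not the authors' GEVM or the noise model of their Eq. 9, neither of which is reproduced in this record. It only uses the standard fact that a negative binomial count with mean μ and dispersion φ has variance μ + φμ², so that CV² = 1/μ + φ (φ = 0 gives the Poisson case). The function names, the toy count vectors, and the simple "CV excess" score are all hypothetical and chosen for illustration.

```python
import math
import statistics


def gene_mean_and_cv(counts):
    """Return (mean, CV) of one gene's counts across cells.

    CV is the population standard deviation divided by the mean.
    Using the population (not sample) standard deviation is an
    assumption of this sketch.
    """
    mean = statistics.fmean(counts)
    if mean == 0:
        return 0.0, float("nan")  # CV is undefined for an unexpressed gene
    return mean, statistics.pstdev(counts) / mean


def nb_expected_cv(mean, dispersion=0.0):
    """CV implied by a negative binomial with variance mean + dispersion * mean**2.

    dispersion = 0 reduces to the Poisson baseline, CV = 1 / sqrt(mean).
    """
    return math.sqrt(1.0 / mean + dispersion)


def cv_excess(counts, dispersion=0.0):
    """Observed CV minus the baseline CV at the same mean (hypothetical score)."""
    mean, cv = gene_mean_and_cv(counts)
    return cv - nb_expected_cv(mean, dispersion)


# Toy data (hypothetical): one Poisson-like gene and one bursty gene.
toy_counts = {
    "gene_poisson_like": [0, 1, 0, 2, 1, 0, 3, 1],
    "gene_bursty": [0, 0, 0, 40, 0, 0, 40, 0],
}

for name, counts in toy_counts.items():
    mean, cv = gene_mean_and_cv(counts)
    print(name, round(mean, 3), round(cv, 3), round(cv_excess(counts), 3))
```

A gene whose observed CV sits well above the baseline curve at its mean expression level is a candidate variably expressed gene. Turning that excess into a significance call requires the statistical model described in the paper itself.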
Fig. 1 Workflow of identifying significantly variably expressed genes and the following analyses for single-cell RNA-seq data
Fig. 2 CV-mean plot of data under different α and β. Other parameters were fixed as gene number = 15,000, cell number = 1,000, prct = 10 %, and s = 2
Estimation of model parameters under different α and β with fixed number of cells, prct, s, and gene number = 15,000. Compared with the noise model in Eq. 9, the regression obtained a fairly low RMSE in each condition
| # of cells | prct (%) | s | α | β | α (regression) | β (regression) | RMSE | RMSE (Eq. 9) |
|---|---|---|---|---|---|---|---|---|
| 1,000 | 10 | 2 | 0 | 1 | 0.0003 ± 0.0002 | 1.0293 ± 0.0014 | 0.0074 ± 0.0006 | 0.044 ± 0.010 |
| 1,000 | 10 | 2 | 0 | 1.2 | 0.0004 ± 0.0003 | 1.2187 ± 0.0024 | 0.0047 ± 0.0006 | 0.066 ± 0.020 |
| 1,000 | 10 | 2 | 0 | 1.5 | 0.0007 ± 0.0004 | 1.5032 ± 0.0039 | 0.0028 ± 0.0007 | 0.091 ± 0.019 |
| 1,000 | 10 | 2 | 0.15 | 1 | 0.1557 ± 0.0005 | 1.0116 ± 0.0009 | 0.0047 ± 0.0003 | 0.049 ± 0.008 |
| 1,000 | 10 | 2 | 0.15 | 1.2 | 0.1562 ± 0.0007 | 1.1965 ± 0.0020 | 0.0038 ± 0.0004 | 0.030 ± 0.004 |
| 1,000 | 10 | 2 | 0.15 | 1.5 | 0.1569 ± 0.0007 | 1.4756 ± 0.0047 | 0.0043 ± 0.0006 | 0.017 ± 0.001 |
| 1,000 | 10 | 2 | 0.5 | 1 | 0.5146 ± 0.0013 | 1.0020 ± 0.0010 | 0.0038 ± 0.0004 | 0.060 ± 0.006 |
| 1,000 | 10 | 2 | 0.5 | 1.2 | 0.5161 ± 0.0011 | 1.1837 ± 0.0023 | 0.0039 ± 0.0003 | 0.047 ± 0.006 |
| 1,000 | 10 | 2 | 0.5 | 1.5 | 0.5187 ± 0.0016 | 1.4561 ± 0.0045 | 0.0054 ± 0.0005 | 0.030 ± 0.004 |
Estimation of model parameters under different prct and s with fixed number of cells, α, β, and gene number = 15,000
| # of cells | α | β | s | prct (%) | α (regression) | β (regression) | RMSE | RMSE (Eq. 9) |
|---|---|---|---|---|---|---|---|---|
| 1,000 | 0.15 | 1.2 | 1 | 10 | 0.1563 ± 0.0005 | 1.1965 ± 0.0018 | 0.0037 ± 0.0003 | 0.028 ± 0.002 |
| 1,000 | 0.15 | 1.2 | 1 | 30 | 0.1579 ± 0.0006 | 1.2017 ± 0.0019 | 0.0048 ± 0.0003 | 0.026 ± 0.001 |
| 1,000 | 0.15 | 1.2 | 1 | 50 | 0.1612 ± 0.0009 | 1.2076 ± 0.0023 | 0.0071 ± 0.0005 | 0.027 ± 0.001 |
| 1,000 | 0.15 | 1.2 | 2 | 10 | 0.1563 ± 0.0005 | 1.1961 ± 0.0019 | 0.0040 ± 0.0005 | 0.033 ± 0.007 |
| 1,000 | 0.15 | 1.2 | 2 | 30 | 0.1612 ± 0.0017 | 1.2015 ± 0.0024 | 0.0077 ± 0.0014 | 0.036 ± 0.001 |
| 1,000 | 0.15 | 1.2 | 2 | 50 | 0.1713 ± 0.0014 | 1.2080 ± 0.0024 | 0.0147 ± 0.0012 | 0.050 ± 0.002 |
| 1,000 | 0.15 | 1.2 | 3 | 10 | 0.1572 ± 0.0012 | 1.1963 ± 0.0026 | 0.0056 ± 0.0009 | 0.048 ± 0.008 |
| 1,000 | 0.15 | 1.2 | 3 | 30 | 0.1649 ± 0.0010 | 1.1997 ± 0.0027 | 0.0122 ± 0.0011 | 0.054 ± 0.002 |
| 1,000 | 0.15 | 1.2 | 3 | 50 | 0.1775 ± 0.0012 | 1.2078 ± 0.0030 | 0.0225 ± 0.0011 | 0.096 ± 0.003 |
Estimation of model parameters under different numbers of cells with fixed α, β, prct, s, and gene number = 15,000
| prct (%) | s | α | β | # of cells | α (regression) | β (regression) | RMSE | RMSE (Eq. 9) |
|---|---|---|---|---|---|---|---|---|
| 10 | 2 | 0.15 | 1.2 | 50 | 0.1595 ± 0.0024 | 1.1078 ± 0.0040 | 0.0127 ± 0.0007 | 0.037 ± 0.007 |
| 10 | 2 | 0.15 | 1.2 | 100 | 0.1587 ± 0.0023 | 1.1416 ± 0.0044 | 0.0085 ± 0.0009 | 0.034 ± 0.006 |
| 10 | 2 | 0.15 | 1.2 | 500 | 0.1575 ± 0.0009 | 1.1836 ± 0.0036 | 0.0047 ± 0.0008 | 0.032 ± 0.008 |
Fig. 3 (a) CV-mean plot of data set GSE65525 and (b) the CV difference histogram
Fig. 4 (a) CV-mean plot of data set GSE60361 and (b) the CV difference histogram
Fig. 5 3-D PCA plot of data set GSE65525