A fast and efficient count-based matrix factorization method for detecting cell types from single-cell RNAseq data.

Shiquan Sun, Yabo Chen, Yang Liu, Xuequn Shang

Abstract

BACKGROUND: Single-cell RNA sequencing (scRNAseq) data typically contain unwanted sources of variation that can mask the true signal needed to identify cell types. An efficient way of dealing with this issue is to extract low-dimensional information from the high-dimensional gene expression data to represent the cell-type structure. In the past two years, several powerful matrix factorization tools have been developed for scRNAseq data, such as NMF, ZIFA, pCMF and ZINB-WaVE. However, the existing approaches either cannot directly model the raw counts of scRNAseq data or are very time-consuming when handling a large number of cells (e.g. n>500).
RESULTS: In this paper, we developed a fast and efficient count-based matrix factorization method (single-cell negative binomial matrix factorization, scNBMF) based on the TensorFlow framework to infer the low-dimensional structure of cell types. To evaluate the method, we conducted a series of experiments on three public scRNAseq data sets: brain, embryonic stem, and pancreatic islet. The experimental results show that scNBMF is more powerful for detecting cell types and 10 to 100 times faster than the bespoke scRNAseq tools.
CONCLUSIONS: We proposed a fast and efficient count-based matrix factorization method, scNBMF, which is more powerful for detecting cell types. A series of experiments on three public scRNAseq data sets shows that scNBMF is a more powerful tool for large-scale scRNAseq data analysis. scNBMF was implemented in R and Python, and the source code is freely available at https://github.com/sqsun .

Keywords:  Deep learning; Matrix factorization; Read count; Single-cell RNA sequencing

Year:  2019        PMID: 30953530      PMCID: PMC6449882          DOI: 10.1186/s12918-019-0699-6

Source DB:  PubMed          Journal:  BMC Syst Biol        ISSN: 1752-0509


Background

Single-cell RNA sequencing (scRNAseq) analysis plays an important role in investigating tumour evolution and is a powerful way to characterize intra-tumour cellular heterogeneity [1, 2]. Compared with traditional RNA sequencing (i.e. bulk RNAseq), which measures gene expression levels within a cell population, scRNAseq quantifies gene expression levels within an individual cell [3, 4]. scRNAseq makes it possible to study in detail the biological processes behind cell developmental trajectories and cell-to-cell heterogeneity, providing fresh insights into cell composition, dynamic cell states, and regulatory mechanisms [5-8]. However, several major challenges must be carefully addressed when analyzing scRNAseq data [9, 10]. The first challenge is that scRNAseq data easily pick up unwanted variation [11, 12], e.g. batch effects and confounding factors. Moreover, scRNAseq data sets have their own characteristics: the gene expression matrix is extremely sparse because of the very small number of mRNAs represented in each cell [13]; current sequencing technologies, e.g. CEL-Seq2 [14] and Drop-seq [15], do not have enough power to quantify the actual concentration of mRNAs (the well-known “dropout events”) [16]; heavy amplification may result in strong amplification bias [17]; and cell cycle state, cell size or other unknown factors may contribute to cell-to-cell heterogeneity even within the same cell type [18]. The second important feature of scRNAseq data is their count nature [19]. In most RNA sequencing studies, the number of reads mapped to a given gene or isoform is used as an intuitive estimate of its expression level.
To account for the count nature of RNA sequencing data and the resulting mean-variance dependence, most statistical methods for differential expression analysis were developed using discrete distributions, e.g., PQLseq [20], edgeR/DESeq [21, 22], and MACAU [23]. Therefore, a natural choice for analyzing scRNAseq data is to develop count-based dimensionality reduction methods. Although several dimensionality reduction techniques have already been applied to scRNAseq data analysis, such as principal component analysis (PCA) [24], independent component analysis (ICA) [25], diffusion maps [26], partial least squares (PLS) [27, 28], and nonnegative matrix factorization (or factor analysis) [29, 30], they do not exploit the fact that gene expression levels are inherently quantified by counts [31, 32]. This has triggered the development of bespoke scRNAseq dimensionality reduction methods within the last two years. The first factor analysis method, ZIFA, models dropout events via a zero-inflated model but does not take the count nature of the data into account [33]; pCMF builds a sparse Gamma-Poisson factor model within a Bayesian framework but does not include covariates [34]; ZINB-WaVE incorporates both gene-level and sample-level covariates via a hierarchical model but is very time-consuming when the sample size is large [35, 36].
Here, we propose a fast and efficient count-based matrix factorization method, single-cell Negative Binomial-based Matrix Factorization (scNBMF), which uses the negative binomial distribution to account for the over-dispersion of scRNAseq count data. We chose the negative binomial model instead of the zero-inflated negative binomial model for two reasons: most scRNAseq data sets do not show much technical contribution to zero-inflation (Fig. 1a), and dropping the zero-inflation component largely reduces the computational burden of estimating dropout parameters for each gene. With the stochastic optimization method Adam [37] implemented within the TensorFlow framework, scNBMF is roughly 10 to 100 times faster than the existing count-based matrix factorization methods, such as pCMF and ZINB-WaVE. To evaluate the proposed method, we apply scNBMF to three publicly available scRNAseq data sets. The results demonstrate that scNBMF is more efficient and powerful than other matrix factorization methods.
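The zero-inflation check behind Fig. 1a can be illustrated with a small sketch: compare the observed fraction of zero counts for a gene with the zero probability implied by a negative binomial model. The parameterisation below (variance μ + φμ², so that φ→0 gives the Poisson case) is an assumption, and the function names and the toy counts are hypothetical, not taken from the authors' code or data.

```python
import math

def nb_zero_prob(mu, phi):
    """P(Y = 0) for a negative binomial with mean mu and over-dispersion phi.

    Assumed parameterisation: variance = mu + phi * mu**2.
    For phi <= 0 the Poisson value exp(-mu) is returned.
    """
    if phi <= 0:
        return math.exp(-mu)
    return (1.0 + phi * mu) ** (-1.0 / phi)

def observed_dropout(counts):
    """Fraction of cells in which the gene has a zero count."""
    return sum(1 for c in counts if c == 0) / len(counts)

# Toy counts for one gene across ten cells (made-up numbers).
gene_counts = [0, 3, 0, 1, 7, 0, 2, 0, 5, 2]
mean_expr = sum(gene_counts) / len(gene_counts)
phi_bar = 1.0  # hypothetical shared dispersion value

expected = nb_zero_prob(mean_expr, phi_bar)
observed = observed_dropout(gene_counts)
print(f"observed dropout = {observed:.3f}, NB-expected dropout = {expected:.3f}")
```

If the observed dropout rates track the expected curve across genes, a plain negative binomial model already explains most zeros, which is the argument made above for not adding a zero-inflation component.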
Fig. 1

A simple example showing how the optimizer and the number of factors affect clustering performance (NMI and ARI) on scRNA-seq data. a Relationship between mean gene expression levels and dropout rates. The black line indicates the observed value, computed as the number of cells in which a gene is not expressed divided by the total number of cells; the red line represents the expected value, calculated from the negative binomial distribution with the mean gene expression level and a dispersion parameter ψ set to the mean of the estimated dispersions (ψ = mean(ψ)). b How the optimizer affects the performance of different methods on NMI and ARI. c-d How the number of factors affects NMI and ARI, respectively


Materials and methods

scNBMF: model and algorithm

scNBMF fits the log-likelihood function of a negative binomial model-based matrix factorization. Given n cells and p genes, we denote by $Y$ the gene expression matrix, whose element $y_{ij}$ is the count of gene $i$ in cell $j$. To account for over-dispersion, we model the gene expression level $y_{ij}$ as a random variable following the negative binomial distribution with parameters $\mu_{ij}$ and $\phi_i$, i.e.,
$$y_{ij} \sim \mathrm{NB}(\mu_{ij}, \phi_i),$$
where the rate parameter $\mu_{ij}$ denotes the mean expression level for gene $i$ and cell $j$; the parameter $\phi_i$ governs the variance of gene expression and is the gene-specific over-dispersion; and NB is the negative binomial distribution, i.e.
$$\Pr(y_{ij} \mid \mu_{ij}, \phi_i) = \frac{\Gamma(y_{ij} + \phi_i^{-1})}{\Gamma(y_{ij} + 1)\,\Gamma(\phi_i^{-1})} \left(\frac{1}{1 + \phi_i \mu_{ij}}\right)^{\phi_i^{-1}} \left(\frac{\phi_i \mu_{ij}}{1 + \phi_i \mu_{ij}}\right)^{y_{ij}}.$$
For the rate parameter $\mu_{ij}$, we consider the following regression model
$$\log \mu_{ij} = \log N_j + \sum_{k=1}^{K} w_{ik} h_{kj},$$
where $N_j$ is the total read count for the individual cell $j$ (a.k.a. read depth or coverage); $W = (w_{ik})$ is the loading matrix, while $H = (h_{kj})$ is the factor matrix that gives the coordinates of the cells and can be used to identify cell types; and $K$ is the pre-defined number of components. When all $\phi_i \to 0$, the negative binomial distribution reduces to the standard Poisson distribution. The log-likelihood function for gene $i$ and cell $j$ is therefore
$$\ell_{ij}(\mu, \phi) = \log\Gamma(y_{ij} + \phi_i^{-1}) - \log\Gamma(y_{ij} + 1) - \log\Gamma(\phi_i^{-1}) - \phi_i^{-1}\log(1 + \phi_i \mu_{ij}) + y_{ij}\left[\log(\phi_i \mu_{ij}) - \log(1 + \phi_i \mu_{ij})\right],$$
where $\mu$ denotes the mean gene expression matrix with element $\mu_{ij} = N_j \exp\left(\sum_{k=1}^{K} w_{ik} h_{kj}\right)$, and $\phi$ is a p-vector whose element $\phi_i$ represents the over-dispersion parameter for gene $i$. To make the model more interpretable for biological applications, we introduce a sparse penalty (LASSO) on the loading matrix $W$, since in real-world biological processes some genes are expressed while others are not. The objective function of the optimization problem therefore becomes
$$\min_{W, H} \; -\sum_{i=1}^{p}\sum_{j=1}^{n} \ell_{ij}(\mu, \phi) + \lambda \lVert W \rVert_1,$$
where $\lVert\cdot\rVert_1$ is the $l_1$-norm (i.e. the LASSO penalty) and $\lambda$ denotes the penalty parameter. In the above model, we are interested in extracting the factor matrix $H$ for detecting cell types. We first estimate the dispersion parameter $\phi_i$ for each gene via edgeR [21] with default parameter settings, then fit the above model using the Adam optimizer within TensorFlow.
For the deep learning model, we set the learning rate of the network to 0.001 and the maximum number of iterations to 18000.
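To make the model description concrete, the following is a minimal pure-Python sketch of the penalised negative binomial objective. It is not the authors' TensorFlow implementation: the function names are hypothetical, the toy matrices are made up, and the parameterisation (mean μ with variance μ + φμ², a log link, and log read depth as an offset) is an assumption chosen to agree with the statement that φ→0 recovers the Poisson distribution. In the paper this kind of objective is minimised with Adam (learning rate 0.001, at most 18000 iterations); the sketch only evaluates it.

```python
import math

def nb_loglik(y, mu, phi):
    """Log-probability of count y under NB with mean mu and over-dispersion phi."""
    r = 1.0 / phi
    return (math.lgamma(y + r) - math.lgamma(y + 1.0) - math.lgamma(r)
            - r * math.log1p(phi * mu)
            + y * (math.log(phi * mu) - math.log1p(phi * mu)))

def mean_matrix(W, H, depth):
    """mu[i][j] = depth[j] * exp(sum_k W[i][k] * H[k][j]) (log link, depth offset)."""
    n_genes, n_factors, n_cells = len(W), len(H), len(H[0])
    return [[depth[j] * math.exp(sum(W[i][k] * H[k][j] for k in range(n_factors)))
             for j in range(n_cells)]
            for i in range(n_genes)]

def objective(Y, W, H, depth, phi, lam):
    """Negative log-likelihood plus lam * ||W||_1, the quantity to minimise."""
    mu = mean_matrix(W, H, depth)
    nll = -sum(nb_loglik(Y[i][j], mu[i][j], phi[i])
               for i in range(len(Y)) for j in range(len(Y[0])))
    l1 = sum(abs(w) for row in W for w in row)
    return nll + lam * l1

# Toy problem: 2 genes x 3 cells, K = 1 factor (all values are made up).
Y = [[0, 2, 5], [1, 0, 3]]          # raw counts, genes in rows
depth = [100.0, 200.0, 400.0]       # total read count per cell
W = [[-4.0], [-5.0]]                # gene loadings (p x K)
H = [[1.0, 1.0, 1.0]]               # cell factors (K x n)
phi = [0.5, 1.0]                    # gene-wise dispersions, e.g. from edgeR

print(objective(Y, W, H, depth, phi, lam=0.1))
```

Keeping the dispersions fixed, as the text describes, leaves only W and H as free parameters, so any gradient-based optimiser can be applied to `objective` once gradients are available, for example through automatic differentiation.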

Compared methods and evaluations

To benchmark scNBMF, we compared it with seven existing methods in the experiments, i.e. PCA, Nimfa, NMFEM, tSNE, ZIFA, pCMF, and ZINB-WaVE. Since PCA and ZIFA only accept normalized gene expression data, we normalized the raw count data following previous recommendations [38]. Specifically, we transformed the count data into continuous data using base 2 and a pseudo count of 1.0, i.e., log2(Y+1.0). The performance of each method was evaluated by the normalized mutual information (NMI), as defined in [39], and the adjusted Rand index (ARI), as defined in [40]. In these definitions, $L_p$ and $L_t$ are the predicted cluster labels and the true labels, respectively; $K_p$ and $K_t$ are the predicted cluster number and the true cluster number, respectively; $n_k$ denotes the number of cells assigned to a specific predicted cluster $k$ ($k = 1, 2, \cdots, K_p$); similarly, $n_t$ denotes the number of cells assigned to true cluster $t$ ($t = 1, 2, \cdots, K_t$); $n_{kt}$ represents the number of cells shared between clusters $k$ and $t$; and $n$ is the total number of cells.
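Both evaluation criteria can be computed from the contingency counts described above (cluster sizes, shared counts, and the total number of cells). The sketch below is a minimal standard-library version with hypothetical function names. The ARI follows the usual Hubert-Arabie form; the NMI normalisation (mutual information divided by the arithmetic mean of the two entropies) is an assumption, since several normalisations are in use and the exact form in [39] is not reproduced here.

```python
import math
from collections import Counter

def contingency(pred, true):
    """Counts shared by each (predicted, true) pair plus the two marginals."""
    return Counter(zip(pred, true)), Counter(pred), Counter(true)

def adjusted_rand_index(pred, true):
    """Adjusted Rand index; assumes at least two cells."""
    n_kt, n_k, n_t = contingency(pred, true)
    n = len(pred)
    sum_kt = sum(math.comb(v, 2) for v in n_kt.values())
    sum_k = sum(math.comb(v, 2) for v in n_k.values())
    sum_t = sum(math.comb(v, 2) for v in n_t.values())
    expected = sum_k * sum_t / math.comb(n, 2)
    max_index = 0.5 * (sum_k + sum_t)
    if max_index == expected:  # degenerate case, e.g. one cluster on both sides
        return 1.0
    return (sum_kt - expected) / (max_index - expected)

def entropy(counts, n):
    """Shannon entropy (natural log) of a labelling given its cluster sizes."""
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def normalized_mutual_info(pred, true):
    """Mutual information divided by the mean of the two entropies (assumed form)."""
    n_kt, n_k, n_t = contingency(pred, true)
    n = len(pred)
    mi = sum((c / n) * math.log(c * n / (n_k[k] * n_t[t]))
             for (k, t), c in n_kt.items())
    h_mean = 0.5 * (entropy(n_k, n) + entropy(n_t, n))
    return mi / h_mean if h_mean > 0 else 1.0

true_labels = [0, 0, 0, 1, 1, 1]
relabelled = [1, 1, 1, 0, 0, 0]   # same partition, different label names
imperfect = [0, 0, 1, 1, 2, 2]    # splits the cells into three clusters

print(adjusted_rand_index(relabelled, true_labels),
      normalized_mutual_info(relabelled, true_labels))
print(adjusted_rand_index(imperfect, true_labels),
      normalized_mutual_info(imperfect, true_labels))
```

Both scores depend only on the partition, not on the label names, which is why they suit comparing k-means output with annotated cell types.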

Public scRNAseq data sets

Three publicly available scRNAseq data sets were collected from three studies.
The first scRNAseq data set was collected from human brain [41]. After excluding hybrid cells, there are 420 cells in eight cell types: fetal quiescent cells (110 cells), fetal replicating cells (25 cells), astrocytes (62 cells), neurons (131 cells), endothelial cells (20 cells), oligodendrocytes (38 cells), microglia (16 cells), and oligodendrocyte precursor cells (OPCs, 16 cells). After filtering out the lowly expressed genes, 16,619 genes remain to be tested. The original data were downloaded from the Gene Expression Omnibus data repository (GEO; GSE67835).
The second scRNAseq data set was collected from human pancreatic islet [42]. After excluding undefined cells, there are 60 cells in six cell types: alpha cells (18 cells), delta cells (2 cells), pp cells (9 cells), duct cells (8 cells), beta cells (12 cells) and acinar cells (11 cells). After filtering out the lowly expressed genes, 116,414 genes remain to be tested. The original data were downloaded from the Gene Expression Omnibus data repository (GEO; GSE73727).
The third scRNAseq data set was collected from human embryonic stem cells [43]. There are 1018 cells belonging to seven known cell subpopulations: neuronal progenitor cells (NPCs, 173 cells), definitive endoderm derivative cells (DEDs), endothelial cells (ECs, 105 cells), trophoblast-like cells (TBs, 69 cells), undifferentiated H1 (212 cells) and H9 (162 cells) ESCs, and foreskin fibroblasts (HFFs, 159 cells). After the filtering step, 17,027 genes remain to be tested. The original data were downloaded from the Gene Expression Omnibus data repository (GEO; GSE75748).

Results

Model selection

Our first set of experiments selects the optimization method for the log-likelihood function of the negative binomial matrix factorization model. Without loss of generality, we choose the human brain scRNAseq data set. Five optimization methods were compared for optimizing the neural network, i.e., Adam, gradient descent, Adagrad, Momentum and Ftrl. The results show that Adam significantly outperforms the other optimization methods regardless of the criterion chosen (Fig. 1b). Specifically, for NMI, Adam, gradient descent, Adagrad, Momentum, and Ftrl achieve 0.8579, 0.0341, 0.0348, 0.4859, and 0.1251, respectively. Therefore, in the following experiments, we use Adam to optimize the neural network.
Our second set of experiments selects the number of factors in the low-dimensional structure of cell types. Without loss of generality, we again choose the human brain scRNAseq data set. We varied the number of factors (k = 4, 6, 10, 15, and 20). The results demonstrate that the number of factors does not impact PCA (Fig. 1c and d; blue line). The other four methods show an increasing pattern as the number of factors varies from 4 to 20 (Fig. 1c and d). Therefore, we choose the top 20 factors in the following experiments.
Our third set of experiments applies scNBMF to three real scRNAseq data sets: human brain, human pancreatic islet, and human embryonic stem. The cell type information of the three data sets was reported by the original studies. For comparison, we included seven other methods: PCA, Nimfa, NMFEM, tSNE, ZIFA, pCMF and ZINB-WaVE. For the evaluation, we extracted the low-dimensional structure with the top 10 factors and applied k-means clustering in an unsupervised manner, repeated 100 times, to test how well each method recovers the cell type assignments of the original studies in terms of NMI and ARI.
The first biological data application is performed on the human brain scRNAseq data set. Figure 2 shows the tSNE visualizations for the eight compared methods. scNBMF shows clear cell type patterns that agree with the annotated cell types (Fig. 2h). We carried out the same analysis using PCA (Fig. 2a), Nimfa (Fig. 2b), NMFEM (Fig. 2c), tSNE (Fig. 2d), ZIFA (Fig. 2e), pCMF (Fig. 2f), and ZINB-WaVE (Fig. 2g). For both NMI and ARI, scNBMF outperforms the other methods. Specifically, for the NMI criterion, PCA, Nimfa, NMFEM, tSNE, ZIFA, pCMF, ZINB-WaVE and scNBMF achieve 0.582, 0.494, 0.456, 0.712, 0.797, 0.787, 0.892, and 0.901, respectively (Fig. 2i and Table 1); for the ARI criterion, they achieve 0.339, 0.258, 0.264, 0.544, 0.721, 0.788, 0.916, and 0.933, respectively (Fig. 2i and Table 1).
Fig. 2

Performance evaluation on human brain scRNA-seq data. In this data set there are 420 cells in eight different cell types after the exclusion of hybrid cells. Each color represents one cell type. a-h Two-dimensional tSNE visualizations of the clustering output of the eight matrix factorization methods (PCA, Nimfa, NMFEM, tSNE, ZIFA, pCMF, ZINB-WaVE, and scNBMF). i NMI and ARI values of the eight compared methods

Table 1

Clustering comparison of the matrix factorization-based methods in terms of Normalized Mutual Information (NMI) and Adjusted Rand Index (ARI)

Method       Brain              Embryo             Pancreas
             NMI      ARI       NMI      ARI       NMI      ARI
PCA          0.582    0.339     0.366    0.187     0.630    0.368
Nimfa        0.494    0.258     0.414    0.173     0.456    0.114
NMFEM        0.456    0.264     0.741    0.614     0.435    0.175
tSNE         0.712    0.544     0.658    0.538     0.793*   0.652*
ZIFA         0.797    0.721     0.888†   0.748†    0.641    0.429
pCMF         0.787    0.788     0.822    0.659     0.547    0.334
ZINB-WaVE    0.892†   0.916†    0.888†   0.721     0.518    0.342
scNBMF       0.901*   0.933*    0.908*   0.763*    0.716†   0.472†

An asterisk (*) marks the best-performing method in each column and a dagger (†) marks the second-best

The second biological data application investigates the human pancreatic islet scRNAseq data set. This data set has a small number of cells: only 60 cells in six cell types. Since none of the methods has enough power to detect the cell type clustering patterns, we do not show the tSNE plots for this data set. For NMI and ARI, tSNE shows the highest performance, while scNBMF achieves the second best performance (Table 1). Specifically, tSNE achieves 0.793 and 0.652 on NMI and ARI, respectively, while scNBMF achieves 0.716 and 0.472.
The third biological data application investigates lineage-specific transcriptomic features at single-cell resolution. To elucidate the distinctions between different lineages, we applied the eight matrix factorization methods, i.e., PCA (Fig. 3a), Nimfa (Fig. 3b), NMFEM (Fig. 3c), tSNE (Fig. 3d), ZIFA (Fig. 3e), pCMF (Fig. 3f), ZINB-WaVE (Fig. 3g), and scNBMF (Fig. 3h). scNBMF separates the respective cell-type patterns more clearly than the other methods. The H1 and H9 cell types overlap tightly, indicating the relative homogeneity of human ES cells; this result is consistent with previous findings [43]. For NMI and ARI, scNBMF outperforms the other methods (Fig. 3i and Table 1). Specifically, for NMI, PCA, Nimfa, NMFEM, tSNE, ZIFA, pCMF, ZINB-WaVE and scNBMF achieve 0.366, 0.414, 0.741, 0.658, 0.888, 0.822, 0.888, and 0.908, respectively; for ARI, they achieve 0.187, 0.173, 0.614, 0.538, 0.748, 0.659, 0.721, and 0.763, respectively.
Fig. 3

Performance evaluation on the human embryonic stem scRNA-seq data set, which contains 1018 cells in seven cell types. Different colors represent different cell types. a-h Two-dimensional tSNE visualizations of the clustering output of the eight matrix factorization methods (PCA, Nimfa, NMFEM, tSNE, ZIFA, pCMF, ZINB-WaVE, and scNBMF). i NMI and ARI values of the eight compared methods


Computation time

The last set of experiments compares the computation time of scNBMF with that of PCA, Nimfa, NMFEM, tSNE, ZIFA, pCMF, and ZINB-WaVE. Without loss of generality, we use the human brain data set to show the computation time of the compared methods (Table 2). Nimfa, NMFEM, ZIFA, pCMF, and ZINB-WaVE are the bespoke scRNAseq methods. Among the count-based methods, scNBMF is roughly 100-fold faster than ZINB-WaVE and 10-fold faster than pCMF. Even compared with the non-count-based methods ZIFA, Nimfa, and NMFEM, scNBMF is still the fastest method.
Table 2

Computation times (seconds) of the matrix factorization-based methods on the human brain scRNAseq data set; k represents the number of factors

Method       k=2        k=4        k=6        k=10        k=15        k=20
PCA          11.54      11.55      11.70      11.35       11.37       11.59
Nimfa        639.15     1990.66    2260.13    2490.05     2705.42     2924.87
NMFEM        1471.39    1628.2     1913.11    2248.18     2659.23     3027.5
tSNE         1.85       14.41      32.11      56.01       77.20       101.25
ZIFA         5331.25    5831.04    6347.08    6987.52     7338.26     7722.33
pCMF         12391.6    13517.12   14260.26   15111.55    15978.44    17158.42
ZINB-WaVE    71053.1    79402.17   90118.3    101072.9    115379.7    126575.2
scNBMF       456.12     478.90     541.31     717.88      1053.22     1563.75
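The speed-up figures quoted in the text can be checked directly against Table 2. The short sketch below divides the running times of the two other count-based methods by those of scNBMF for each number of factors; the values are copied from Table 2 and the helper name is hypothetical.

```python
# Running times in seconds on the human brain data set, copied from Table 2.
factors = [2, 4, 6, 10, 15, 20]
times = {
    "pCMF":      [12391.6, 13517.12, 14260.26, 15111.55, 15978.44, 17158.42],
    "ZINB-WaVE": [71053.1, 79402.17, 90118.3, 101072.9, 115379.7, 126575.2],
    "scNBMF":    [456.12, 478.90, 541.31, 717.88, 1053.22, 1563.75],
}

def speedup(method):
    """Ratio of a method's running time to scNBMF's, per number of factors."""
    return [t / s for t, s in zip(times[method], times["scNBMF"])]

zinb_ratio = speedup("ZINB-WaVE")
pcmf_ratio = speedup("pCMF")

for k, z, p in zip(factors, zinb_ratio, pcmf_ratio):
    print(f"k={k:>2}: ZINB-WaVE/scNBMF = {z:6.1f}x, pCMF/scNBMF = {p:5.1f}x")
```

On these numbers the ratio to ZINB-WaVE stays between roughly 80 and 170 and the ratio to pCMF between roughly 11 and 28, so "100-fold" and "10-fold" are best read as order-of-magnitude statements.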

Conclusion

With rapidly developing sequencing technology, large numbers of scRNAseq data sets are now easily obtained from different sources, so computation time has become one of the major issues for downstream analysis. At the same time, scRNAseq data have their own characteristics, i.e., count nature, noise, and sparsity. These considerations motivated the development of a fast and efficient count-based matrix factorization method. In this paper, we proposed a count-based matrix factorization method (scNBMF) that models the raw count data directly and thereby avoids the information loss caused by normalizing raw counts. On three public biological scRNAseq data sets, scNBMF performs strongly compared with seven other methods in terms of NMI, ARI, and computation time.
A zero-inflated distribution, as used in ZIFA and ZINB-WaVE, is a more appropriate way to account for dropouts. In the current study, we did not consider a zero-inflated model because the tested data sets do not show many dropouts. However, modelling zero-inflation is a necessary step in analyzing some scRNAseq data sets, so we will add a zero-inflated distribution in a future version of scNBMF.
Biologically, incorporating all genes in scRNAseq data analysis is likely to introduce unwanted variation, because not all genes are expressed in a given biological process. An interesting direction for improving the performance of scNBMF is therefore to select informative genes first; this step can largely reduce unwanted variation and exclude redundant genes [44, 45] from the downstream analysis. In addition, gene expression levels are highly affected by gene-specific annotations, such as GC content, gene length, and chromatin states [46]. If some variables of interest in the statistical model, such as the dropout parameter, can be inferred from annotation information, the method will probably gain substantial power for detecting cell types from scRNAseq data.
  41 in total

1.  Utility of Single-Cell Genomics in Diagnostic Evaluation of Prostate Cancer.

Authors:  Joan Alexander; Jude Kendall; Jean McIndoo; Linda Rodgers; Robert Aboukhalil; Dan Levy; Asya Stepansky; Guoli Sun; Lubomir Chobardjiev; Michael Riggs; Hilary Cox; Inessa Hakker; Dawid G Nowak; Juliana Laze; Elton Llukani; Abhishek Srivastava; Siobhan Gruschow; Shalini S Yadav; Brian Robinson; Gurinder Atwal; Lloyd C Trotman; Herbert Lepor; James Hicks; Michael Wigler; Alexander Krasnitz
Journal:  Cancer Res       Date:  2017-11-27       Impact factor: 12.701

2.  Comparative Analysis of Single-Cell RNA Sequencing Methods.

Authors:  Christoph Ziegenhain; Beate Vieth; Swati Parekh; Björn Reinius; Amy Guillaumet-Adkins; Martha Smets; Heinrich Leonhardt; Holger Heyn; Ines Hellmann; Wolfgang Enard
Journal:  Mol Cell       Date:  2017-02-16       Impact factor: 17.970

3.  Normalizing single-cell RNA sequencing data: challenges and opportunities.

Authors:  Catalina A Vallejos; Davide Risso; Antonio Scialdone; Sandrine Dudoit; John C Marioni
Journal:  Nat Methods       Date:  2017-05-15       Impact factor: 28.547

4.  Heritability estimation and differential analysis of count data with generalized linear mixed models in genomic sequencing studies.

Authors:  Shiquan Sun; Jiaqiang Zhu; Sahar Mozaffari; Carole Ober; Mengjie Chen; Xiang Zhou
Journal:  Bioinformatics       Date:  2019-02-01       Impact factor: 6.937

5.  Robust classification of single-cell transcriptome data by nonnegative matrix factorization.

Authors:  Chunxuan Shao; Thomas Höfer
Journal:  Bioinformatics       Date:  2016-09-23       Impact factor: 6.937

6.  Single-cell RNA-seq and computational analysis using temporal mixture modelling resolves Th1/Tfh fate bifurcation in malaria.

Authors:  Tapio Lönnberg; Valentine Svensson; Kylie R James; Daniel Fernandez-Ruiz; Ismail Sebina; Ruddy Montandon; Megan S F Soon; Lily G Fogg; Arya Sheela Nair; Urijah Liligeto; Michael J T Stubbington; Lam-Ha Ly; Frederik Otzen Bagger; Max Zwiessele; Neil D Lawrence; Fernando Souza-Fonseca-Guimaraes; Patrick T Bunn; Christian R Engwerda; William R Heath; Oliver Billker; Oliver Stegle; Ashraful Haque; Sarah A Teichmann
Journal:  Sci Immunol       Date:  2017-03-03

Review 7.  Application of single-cell genomics in cancer: promise and challenges.

Authors:  Quin F Wills; Adam J Mead
Journal:  Hum Mol Genet       Date:  2015-06-25       Impact factor: 6.150

8.  The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells.

Authors:  Cole Trapnell; Davide Cacchiarelli; Jonna Grimsby; Prapti Pokharel; Shuqiang Li; Michael Morse; Niall J Lennon; Kenneth J Livak; Tarjei S Mikkelsen; John L Rinn
Journal:  Nat Biotechnol       Date:  2014-03-23       Impact factor: 54.908

Review 9.  Challenges and emerging directions in single-cell analysis.

Authors:  Guo-Cheng Yuan; Long Cai; Michael Elowitz; Tariq Enver; Guoping Fan; Guoji Guo; Rafael Irizarry; Peter Kharchenko; Junhyong Kim; Stuart Orkin; John Quackenbush; Assieh Saadatpour; Timm Schroeder; Ramesh Shivdasani; Itay Tirosh
Journal:  Genome Biol       Date:  2017-05-08       Impact factor: 13.583

10.  A general and flexible method for signal extraction from single-cell RNA-seq data.

Authors:  Davide Risso; Fanny Perraudeau; Svetlana Gribkova; Sandrine Dudoit; Jean-Philippe Vert
Journal:  Nat Commun       Date:  2018-01-18       Impact factor: 14.919
