switchde: inference of switch-like differential expression along single-cell trajectories.

Kieran R Campbell, Christopher Yau.

Abstract

Motivation: Pseudotime analyses of single-cell RNA-seq data have become increasingly common. Typically, a latent trajectory corresponding to a biological process of interest, such as differentiation or the cell cycle, is discovered. However, relatively little attention has been paid to modelling the differential expression of genes along such trajectories.
Results: We present switchde, a statistical framework and accompanying R package for identifying switch-like differential expression of genes along pseudotemporal trajectories. Our method includes fast model fitting that provides interpretable parameter estimates corresponding to how quickly a gene is up- or down-regulated as well as where in the trajectory such regulation occurs. It also reports a P-value in favour of rejecting a constant-expression model for switch-like differential expression and optionally models the zero-inflation prevalent in single-cell data.
Availability and Implementation: The R package switchde is available through the Bioconductor project at https://bioconductor.org/packages/switchde.
Contact: kieran.campbell@sjc.ox.ac.uk.
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author 2016. Published by Oxford University Press.

Year:  2017        PMID: 28011787      PMCID: PMC5408844          DOI: 10.1093/bioinformatics/btw798

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Single-cell RNA-sequencing (scRNA-seq) has transformed biology by providing high-throughput quantification of mRNA abundance in individual cells, allowing, amongst other things, the identification of novel cell types and gene expression heterogeneity (Trapnell, 2015). Single-cell pseudotime estimation (Ji and Ji, 2016; Reid and Wernisch, 2016; Shin et al., 2015; Trapnell et al., 2014) has also enabled gene expression profiles to be mapped to a unique value known as the pseudotime, a surrogate measure of the cellular state in a temporally evolving biological process such as differentiation or the cell cycle.

Once a pseudotime has been assigned to each cell it is possible to identify genes that exhibit a strong pseudotemporal dependence through differential expression testing. An approach first introduced in Trapnell et al. (2014) was to regress gene expression on pseudotime using cubic B-spline basis functions with a Tobit likelihood. However, the flexible nonparametric nature of such models may lead to overfitting, and the resulting fits may be difficult to interpret. To our knowledge no other differential-expression-along-pseudotime models have been proposed.

As a solution to these issues we present switchde, a statistical model and accompanying R package for identifying switch-like differential expression along single-cell trajectories. We model sigmoidal expression changes along pseudotime, which provides interpretable parameter estimates corresponding to gene regulation strength and timing, along with hypothesis testing for differential expression. Our model optionally incorporates zero-inflation for datasets that exhibit high numbers of missing measurements.

2 Materials and methods

We begin with a C × G expression matrix $Y$ for G genes and C cells, with column vector $y_g$ for gene g, that is non-negative and represents gene expression in a form comparable to $\log(\mathrm{TPM} + 1)$. We define the sigmoid function as

$$f(t_c; \mu_0, k, t_0) = \frac{2\mu_0}{1 + \exp\left(-k\,(t_c - t_0)\right)}$$

where $t_c$ is the latent pseudotime of cell c. The parameters (Fig. 1A) may be interpreted as the average peak expression level ($\mu_0$), the activation strength $k$, or how quickly a gene is up- or down-regulated, and the activation time ($t_0$), or where in the trajectory the gene regulation occurs.
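To make the role of each parameter concrete, here is a minimal Python sketch of the sigmoidal mean. It assumes the curve takes the form 2μ0 / (1 + exp(−k(t − t0))); the function name `sigmoid_mean` and the parameter values are illustrative only and are not part of the switchde R package.

```python
import math


def sigmoid_mean(t, mu0, k, t0):
    """Sigmoidal mean expression at pseudotime t.

    Assumed form: 2 * mu0 / (1 + exp(-k * (t - t0))).
    mu0: average peak expression (the value of the curve at t = t0)
    k:   activation strength (k > 0 up-regulation, k < 0 down-regulation)
    t0:  activation time (where along the trajectory the switch occurs)
    """
    # Clamp the exponent so math.exp cannot overflow for extreme inputs.
    z = min(-k * (t - t0), 700.0)
    return 2.0 * mu0 / (1.0 + math.exp(z))


# Made-up parameters: a gene that is switched off around pseudotime 15.
mu0, k, t0 = 3.0, -0.5, 15.0
curve = [sigmoid_mean(t, mu0, k, t0) for t in (0.0, 15.0, 77.0)]
```

With a negative k the curve starts near 2·mu0, passes through mu0 at t0 and decays towards zero; setting k = 0 gives the flat value mu0 at every pseudotime.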
Fig. 1. Sigmoidal expression across pseudotime. (A) The sigmoid curve as a model of gene expression along single-cell trajectories, parametrized by the average peak expression μ0, the activation strength k and the activation time t0. (B) An example using the NDC80 gene from the Trapnell dataset (Trapnell et al., 2014), which had the lowest P-value of all genes tested. Gene expression measurements are shown as the grey points with the maximum likelihood sigmoid fit denoted by the dark line. The maximum likelihood estimates of k and t0 for this fit are discussed in Section 3. (C) Zero-inflated differential expression for the transcription factor MYOG. The solid line shows the MLE sigmoidal mean, while crosses show imputed expression for measurements recorded as zero. (D) Posterior predictive density for the zero-inflated model, with the solid line denoting the MLE sigmoidal mean.

We fit the model using gradient-based L-BFGS-B optimization to find maximum likelihood estimates (MLEs) of the parameters (Supplementary Methods). By setting k = 0 we identify a nested constant-expression model in which the mean expression is $\mu_0$ for every cell, and so can perform a likelihood ratio test for differential expression, where twice the difference in the maximized log-likelihood between the sigmoidal and constant models asymptotically follows a $\chi^2$ distribution with two degrees of freedom.

scRNA-seq data are also known to exhibit a large number of dropouts, where the expression measurements of low-abundance transcripts are zero (Kharchenko et al., 2014). This leads to sparse input matrices for downstream analysis, which may violate the assumptions of statistical models such as the Gaussian likelihood above. Therefore, we have also developed an extension for datasets with high dropout rates that incorporates a zero-inflated likelihood similar to Pierson and Yau (2015).
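The likelihood ratio test described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the package's implementation: it assumes a Gaussian likelihood with the variance profiled out at its MLE, the sigmoid form 2μ0 / (1 + exp(−k(t − t0))), and sigmoid parameters that have already been fitted by an optimizer (here they are simply supplied by hand). The helper names and the toy data are hypothetical. For two degrees of freedom the χ² tail probability has the closed form exp(−x/2), so no statistics library is needed.

```python
import math


def sigmoid_mean(t, mu0, k, t0):
    # Assumed sigmoid form; the exponent is clamped to avoid overflow.
    return 2.0 * mu0 / (1.0 + math.exp(min(-k * (t - t0), 700.0)))


def gaussian_loglik(y, means):
    """Gaussian log-likelihood with the variance set to its MLE (mean squared residual)."""
    n = len(y)
    sigma2 = sum((yi - mi) ** 2 for yi, mi in zip(y, means)) / n
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)


def lrt_switch_vs_constant(y, t, mu0, k, t0):
    """Return (statistic, p-value) for the sigmoidal versus constant model.

    (mu0, k, t0) are assumed to be maximum likelihood estimates.
    """
    ybar = sum(y) / len(y)  # MLE of the mean under the constant model
    ll_const = gaussian_loglik(y, [ybar] * len(y))
    ll_sig = gaussian_loglik(y, [sigmoid_mean(ti, mu0, k, t0) for ti in t])
    stat = max(0.0, 2.0 * (ll_sig - ll_const))
    p_value = math.exp(-0.5 * stat)  # chi-squared survival function, 2 d.o.f.
    return stat, p_value


# Toy data: a sigmoid with hand-picked parameters plus small fixed perturbations.
t = [float(i) for i in range(10)]
noise = [0.05, 0.1, -0.05, -0.1, 0.1, -0.1, 0.05, -0.05, 0.1, -0.1]
y = [sigmoid_mean(ti, 2.0, 1.5, 4.5) + e for ti, e in zip(t, noise)]
stat, p_value = lrt_switch_vs_constant(y, t, mu0=2.0, k=1.5, t0=4.5)
```

If the supplied parameters are not the true maximizers, the statistic is understated, so a real analysis would obtain them from the L-BFGS-B fit first.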

3 Results and discussion

We applied switchde to the set of differentiating myoblasts from Trapnell et al. (2014). Using the originally published pseudotimes, we removed cells corresponding to contaminating mesenchymal cells and fitted switch-like models for the 11 253 genes expressed in at least 20% of cells with a mean expression of 0.1 FPKM, which took less than a minute on a laptop computer. In total, 2336 genes were found to be significantly differentially expressed at 5% FDR after Benjamini-Hochberg multiple testing correction. The gene with the lowest reported P-value was NDC80, whose expression is plotted in Figure 1B along with the MLE sigmoid fit. The maximum likelihood estimate of the activation strength $k$ was negative, indicating strong down-regulation, and the estimate of the activation time $t_0$, given that the pseudotimes range from 0 to 77, indicates that this down-regulation occurs within the first quarter of the trajectory.

We next applied switchde in zero-inflated mode to a subset of genes from the same dataset. While zero-inflated mode accounts for dropout and is thus a less mis-specified model, the Expectation-Maximization algorithm required for inference takes on average an order of magnitude longer. The resulting fit for the transcription factor MYOG can be seen in Figure 1C. One advantage of the zero-inflated model is that transcripts that exhibit dropout may be imputed given the pseudotemporal trend, shown by the crosses in the figure.

Finally, since switchde specifies a fully generative probabilistic model, we can generate a posterior predictive distribution of gene expression over pseudotime. This distribution for MYOG is shown in Figure 1D, demonstrating that the model is well calibrated with the overall pseudotemporal trend. Further data examples are given in the Supplementary Material.

In this paper we have introduced switchde, the first dedicated statistical framework for modelling differential expression over pseudotime. By assuming a parametric model of gene expression along trajectories, our model provides interpretable parameter estimates corresponding to gene regulation strength and timing, and it can incorporate the zero-inflation that is prevalent in many scRNA-seq datasets. Finally, our model provides hypothesis testing for switch-like differential expression, though in practice this may lead to an inflated false discovery rate due to the assumption that pseudotimes are fixed (Campbell and Yau, 2016).
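The multiple testing step mentioned in the results, Benjamini-Hochberg correction at 5% FDR, can be sketched as follows. This is a generic implementation of the standard procedure rather than code from the switchde package; the function name and the example P-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene P-values from a likelihood ratio test.
pvals = [0.001, 0.008, 0.039, 0.041, 0.6]
qvals = benjamini_hochberg(pvals)
significant = [q <= 0.05 for q in qvals]  # genes called at 5% FDR
```

In this toy example only the first two genes pass the 5% threshold after adjustment, even though four raw P-values are below 0.05.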
References

1.  Single-Cell RNA-Seq with Waterfall Reveals Molecular Cascades underlying Adult Neurogenesis.

Authors:  Jaehoon Shin; Daniel A Berg; Yunhua Zhu; Joseph Y Shin; Juan Song; Michael A Bonaguidi; Grigori Enikolopov; David W Nauen; Kimberly M Christian; Guo-li Ming; Hongjun Song
Journal:  Cell Stem Cell       Date:  2015-08-20       Impact factor: 24.633

2.  TSCAN: Pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis.

Authors:  Zhicheng Ji; Hongkai Ji
Journal:  Nucleic Acids Res       Date:  2016-05-13       Impact factor: 16.971

3.  Defining cell types and states with single-cell genomics.

Authors:  Cole Trapnell
Journal:  Genome Res       Date:  2015-10       Impact factor: 9.043

4.  The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells.

Authors:  Cole Trapnell; Davide Cacchiarelli; Jonna Grimsby; Prapti Pokharel; Shuqiang Li; Michael Morse; Niall J Lennon; Kenneth J Livak; Tarjei S Mikkelsen; John L Rinn
Journal:  Nat Biotechnol       Date:  2014-03-23       Impact factor: 54.908

5.  Bayesian approach to single-cell differential expression analysis.

Authors:  Peter V Kharchenko; Lev Silberstein; David T Scadden
Journal:  Nat Methods       Date:  2014-05-18       Impact factor: 28.547

6.  Order Under Uncertainty: Robust Differential Expression Analysis Using Probabilistic Models for Pseudotime Inference.

Authors:  Kieran R Campbell; Christopher Yau
Journal:  PLoS Comput Biol       Date:  2016-11-21       Impact factor: 4.475

7.  ZIFA: Dimensionality reduction for zero-inflated single-cell gene expression analysis.

Authors:  Emma Pierson; Christopher Yau
Journal:  Genome Biol       Date:  2015-11-02       Impact factor: 13.583

8.  Pseudotime estimation: deconfounding single cell time series.

Authors:  John E Reid; Lorenz Wernisch
Journal:  Bioinformatics       Date:  2016-06-17       Impact factor: 6.937
