Qi Jiang1,2, Shuo Zhang1,2, Lin Wan1,2. 1. NCMIS, LSC, LSEC, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China. 2. School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China.
Abstract
Time series single-cell RNA sequencing (scRNA-seq) data are emerging. However, dynamic inference of an evolving cell population from time series scRNA-seq data is challenging owing to the stochasticity and nonlinearity of the underlying biological processes. This calls for the development of mathematical models and methods capable of reconstructing cellular dynamic transition processes and uncovering the nonlinear cell-cell interactions. In this study, we present GraphFP, a nonlinear Fokker-Planck equation on graph based model and dynamic inference framework, with the aim of reconstructing the cell state-transition complex potential energy landscape from time series single-cell transcriptomic data. The free energy of our model explicitly takes into account of the cell-cell interactions in a nonlinear quadratic term. We then recast the model inference problem in the form of a dynamic optimal transport framework and solve it efficiently with the adjoint method of optimal control. We evaluated GraphFP on the time series scRNA-seq data set of embryonic murine cerebral cortex development. We illustrated that it 1) reconstructs cell state potential energy, which is a measure of cellular differentiation potency, 2) faithfully charts the probability flows between paired cell states over the dynamic processes of cell differentiation, and 3) accurately quantifies the stochastic dynamics of cell type frequencies on probability simplex in continuous time. We also illustrated that GraphFP is robust in terms of cluster labelling with different resolutions, as well as parameter choices. Meanwhile, GraphFP provides a model-based approach to delineate the cell-cell interactions that drive cell differentiation. GraphFP software is available at https://github.com/QiJiang-QJ/GraphFP.
Time series single-cell RNA sequencing (scRNA-seq) data are emerging. However, dynamic inference of an evolving cell population from time series scRNA-seq data is challenging owing to the stochasticity and nonlinearity of the underlying biological processes. This calls for the development of mathematical models and methods capable of reconstructing cellular dynamic transition processes and uncovering the nonlinear cell-cell interactions. In this study, we present GraphFP, a nonlinear Fokker-Planck equation on graph based model and dynamic inference framework, with the aim of reconstructing the cell state-transition complex potential energy landscape from time series single-cell transcriptomic data. The free energy of our model explicitly takes into account of the cell-cell interactions in a nonlinear quadratic term. We then recast the model inference problem in the form of a dynamic optimal transport framework and solve it efficiently with the adjoint method of optimal control. We evaluated GraphFP on the time series scRNA-seq data set of embryonic murine cerebral cortex development. We illustrated that it 1) reconstructs cell state potential energy, which is a measure of cellular differentiation potency, 2) faithfully charts the probability flows between paired cell states over the dynamic processes of cell differentiation, and 3) accurately quantifies the stochastic dynamics of cell type frequencies on probability simplex in continuous time. We also illustrated that GraphFP is robust in terms of cluster labelling with different resolutions, as well as parameter choices. Meanwhile, GraphFP provides a model-based approach to delineate the cell-cell interactions that drive cell differentiation. GraphFP software is available at https://github.com/QiJiang-QJ/GraphFP.
The dynamics of cell developmental processes (e.g., cell differentiation and tumorigenesis) are highly complex. Advances in single-cell RNA sequencing (scRNA-seq) technologies enable cell-resolved investigation of heterogeneous cell populations, offering a systematic approach to reveal underlying developmental dynamics, cell communication, and gene regulation [1]. The dynamic inferences of cell development from scRNA-seq transcriptomic profiles draw heavily on advances in computational and systems biology. Many efforts have been advanced to reconstruct cell developmental trajectories and pseudo-time from the single-cell snapshot profile sampled from an evolving cell population [2]. Methods have also been developed to quantify cell developmental landscape [3-6]. However, state-of-the-art dynamic inference methods based on the static single-cell snapshot profile alone may lack identifiability of complex dynamic processes [7].Recently, time series scRNA-seq data profiled from cells sampled at multiple physical time stages have been accumulating, accounting for additional temporal dimension. The wider dynamic ranges enriched by the temporal dimension show great promise in overcoming the difficulties that arise during the inferences from static single-cell snapshot profiling. Computational methods that explicitly incorporate temporal information have been developed. TASIC determined the temporal trajectories based on the probabilistic graphical model to integrate expression and time information [8], while CSHMM developed a continuous state hidden Markov model to infer trajectory structure and assigned cells to paths [9]. TSEE incorporated temporal information into a nonlinear dimensionality reduction algorithm of elastic embedding to visualize dynamic gene expression patterns, offering enhanced temporal resolution [10]. ScPADGRN reconstructed the dynamic gene regulatory network with a preconditioned ADMM optimization method [11]. Tempora incorporated biological pathway information to accurately identify cell types and then incorporated the time information to infer evolving cell-type trajectories [12].An emerging number of methods are being developed to reconstruct cell developmental energy landscape from time series single-cell data using the mathematical framework of optimal transport. Optimal transport has received considerable attention in recent years for various disciplines such as machine learning and statistical data analysis, as it has been proven to be a powerful tool in the analysis of complex data [13]. The core concept of optimal transport, Wasserstein distance between two probability distributions, quantifies an optimal cost of transporting one data distribution to the other. As a remarkably rich and fruitful concept, Wasserstein distance “enables a mechanism transforming the probability space into a Riemannian manifold (known as a Wasserstein manifold), so that geometric structures and partial differential equation (PDE) techniques can be established and analyzed” [14]. Amongst existing methods, Waddington-OT [15] reported landmark work that developed an unbalanced optimal transport framework to reconstruct the cell developmental landscape by inferring cell-cell probabilistic couplings based on the distributions between adjacent time points [15]; TrajectoryNet set up a dynamic optimal transport neural network framework to reconstruct the continuous normalizing flows of evolving cell populations on the continuous state space [16]; PRESCIENT modelled cell differentiation as a diffusion process over a potential energy landscape learned by the neural network framework [17]. However, the computation of optimal transport is still a bottleneck when processing large-scale data [13].In this study, we present GraphFP, a nonlinear Fokker-Planck equation on graph based model and dynamic inference framework, with the aim of reconstructing the cell state-transition complex potential energy landscape from time series single-cell transcriptomic data. The Fokker-Planck equation (FPE) is ubiquitously applied in the modelling of statistical physics and biological systems, including single-cell data analysis [5, 7, 18]. GraphFP is built on the mathematical framework established by the FPE on finite graph originally introduced by Chow et al. [19, 20] and Li [21] (see [14] for a recent survey). Building upon the fundamental form of optimal transport, GraphFP learns the complex geometry of data, as well as provides a novel way to quantify cell-cell interactions during cell development. It models the cell developmental process as stochastic dynamics of the cell state/type frequencies on probability simplex in continuous time. The discrete Wasserstein distance is introduced to transform the probability simplex into a Riemannian manifold, called discrete Wasserstein manifold. The FPE is proven to be the gradient flow of the free energy on the discrete Wasserstein manifold. The free energy of our model consists of a static linear potential energy term and a nonlinear quadratic interaction energy term that characterizes cell-cell interactions [22, 23]. To estimate the parameters which represent the linear potential energy and cell-cell interaction strengths, we recast the model inference problem in the form of a dynamic optimal transport framework and solve it efficiently with the celebrated adjoint method of optimal control [24]. The adjoint method also played a central role in the development of the well-known neural network algorithm NeuralODE [25].We evaluated GraphFP on the time series scRNA-seq dataset of embryonic murine cerebral cortex development [26]. GraphFP reconstructed the cell state potential energy, which is a measure of cellular differentiation potency from both static and dynamic points of view. It faithfully charted the probability flows of cell state-transitions, consistent with the gold standard benchmarks. It also accurately quantified the stochastic dynamics of cell type frequencies on probability simplex in continuous time. GraphFP delineated cell-cell interactions that drive cell development in a model-based fashion. We tested the cell-cell interaction term of GraphFP by illustrating its ability to fit the nonlinear curves of experimental data and recover held-out time points. We illustrated that GraphFP is robust in terms of cluster labelling with different resolutions, as well as parameter choices. We compared GraphFP with state-of-the-art cell developmental energy landscape reconstruction methods in Discussion.
Methods
GraphFP is a coherent computational framework that simultaneously reconstructs the cell state-transition complex potential energy landscape and infers cell-cell interactions from time series single-cell transcriptomic data (Fig 1). It models cell state-transition dynamics with cell-cell interactions based on the mathematical framework introduced by Chow et al. [19, 20] and Li [21].
Fig 1
Overview of GraphFP algorithm.
GraphFP takes the input of time series single-cell transcriptomic data incorporated with experimental temporal information. It identifies cell states/types, estimates the cell type frequencies at each time point, estimates the linear potential energy Φ and the cell-cell interaction matrix based on the adjoint method. GraphFP outputs the stochastic dynamics of cell type frequencies (t) on probability simplex in continuous time, the cell state transition potential energy, and the probability flows of cell state-transitions underlying the evolving cell population.
Overview of GraphFP algorithm.
GraphFP takes the input of time series single-cell transcriptomic data incorporated with experimental temporal information. It identifies cell states/types, estimates the cell type frequencies at each time point, estimates the linear potential energy Φ and the cell-cell interaction matrix based on the adjoint method. GraphFP outputs the stochastic dynamics of cell type frequencies (t) on probability simplex in continuous time, the cell state transition potential energy, and the probability flows of cell state-transitions underlying the evolving cell population.GraphFP allows for reconstructing cell state potential energy, charting the probability flows between paired cell states over dynamic process, quantifying the stochastic dynamics of cell type frequencies on probability simplex in continuous time, and delineating cell-cell interactions that drive cell differentiation [22]. The main steps for GraphFP are organized in the following subsections.
Identifying cell states/types
Given the time series scRNA-seq data, the single-cell samples are collected at f time stages {t1, t2, …, t}, and for each time stage t(1 ≤ l ≤ f), a number of m single cells are sequenced with the corresponding gene expression vectors , where D is the number of genes for single cells. Thus, the single cell gene expression profile of a total number of cells is contained in the data matrixSuppose that the total number of M cells forms n cell states corresponding to n cell types. When the cells are already annotated or clustered, GraphFP directly utilizes the prior information of cell types. When cell type information is not available, GraphFP will apply state-of-the-art single-cell clustering and annotation methods (e.g., Louvain-Jaccard algorithm [27], the single-cell data analysis platform Seurat [28], and the single cell deep generative method scDEC [29]) to cluster cells into n clusters as the cell states/types.
Constructing the cell state-transition graph
The cell state-transition graph G = (V, E) is an undirected graph, where each vertex i in V represents a cell state/type and each edge {i, j} in E represents that cell state i can directly transit to cell state j or vice versa. In this study, with no inherent reliance on any prior information, we construct the cell state-transition graph G as a complete graph supported on all cell types. However, we can also incorporate prior information of cell state-transition into graph G when available.
Modelling cell state-transition dynamics with the nonlinear FPE on graph equipped with discrete L2-Wasserstein distance
GraphFP models the stochastic dynamics of cell state/type frequencies in the evolving cell population with the nonlinear FPE on graph. The underlying assumption of this model is that cell state-transitions follow the minimum total kinetic energy path during cell differentiation, which can be measured by the discrete L2-Wasserstein distance on graph. To estimate the parameters in the nonlinear FPE of GraphFP, we propose a gradient method to find the parameters that minimize the discrete L2-Wasserstein distance.We use discrete probability distributions supported on the vertices of cell state-transition graph G to represent the state of the system at time t. Suppose the number of vertices of G is n, the set of system states is the probability simplex supported on all vertices of G:The cell-state frequencies or probabilities are estimated for each time point separately, resulting in .Here, we mainly follow the setups in Chow et al. [20]; Li [21], and define the discrete nonlinear free energy of the cell-state system as follows:
where , , and represent the static linear potential energy parametrized by vector , the quadratic interaction energy of paired cell states parametrized by matrix = (w)1≤, and Boltzmann entropy with a hyper-parameter β ≥ 0, respectively. In general, the interaction matrix is asymmetry, with its element w representing the interaction strength from cell state j (as a signalling sender) to cell state i (as a signalling receiver) (see Eq (9) and section “Reconstruction of cell developmental energy landscape and modelling of cell-cell interactions” for further discussion). We hereinafter denote the parameters of the free energy as ≡ {Φ, }.Based on the specific form of free energy , the corresponding FPE on G is defined as follows: for any i ∈ V,
where , N(i) = {j ∈ V|{i, j} ∈ E} is the adjacency set of vertex i ∈ V, and g((t)) is defined asChow et al. [20] and Li [21] proved that, the dynamics of (t) is evolving along the gradient of free energy (Eq (1)) when is equipped with the discrete L2-Wasserstein distance on graph G: for any ,
where the infimum is taken over is a piecewise C1 curve that satisfies the FPE of Eq (2), (t1) = 1 and (t) = }. Intuitively, the discrete L2-Wasserstein distance can be understood as the total work for transporting 1 to on , which is the summation of the kinetic energies of the flows (mass×squared velocity) on the edges of graph G during the time period [t1, t].
Parameter estimation and model optimization
To estimate the parameters = {Φ, } of the free energy using all time series scRNA-seq data collected at time points {t1, t2, ⋯, t} (assume that is constant over the entire time period), we formulate the estimation problem as a minimization problem of the discrete L2-Wasserstein distance:
subject to the constraints
where n is the size of vertex set V. However, the dynamic optimization problem with constrains of multiple fixed points at (t)s in Eq (4) is extremely difficult, and maybe even unsolvable.In this study, to estimate parameters *, we follow TrajectoryNet [16] and relax the constraints of s at time points {t2, ⋯, t} by moving them into the object function as follows
subject to the constraints
where λ ≥ 0 is a constant regularization parameter, n is the size of vertex set V, and is the Kullback-Leibler divergence between the probability distributions and .The optimization problem of Eq (5) can be interpreted as an optimal control problem with fixed initial point and the parameters can be regarded as the control. We therefore propose a gradient method to estimate * based on the Pontryagin’s Maximum Principle (also known as the adjoint method) [24] to solve the optimal control problem of Eq (5). In our implementation, we treat the integral part and the KL divergence part in Eq (5) separately, solve each of them using the adjoint method, and then combine them together through the tradeoff parameters λs. See the details and the pseudocode of GraphFP algorithm in S1 Text.In our implementation, we centralize both Φ and such thatTherefore, in our study, w with a high absolute value indicates strong interaction from cell state j to cell state i (see the following section for further discussion).
Reconstruction of cell developmental energy landscape and modelling of cell-cell interactions
Following Chow et al. [20] and Li [21], we define the dynamic potential energy of cell type i as follows:
which consists of three components: (1) a static linear potential energy Φ that is time-independent and measures the cell differentiation potency of cell type i; (2) an interaction potential energy that coordinates cell development through intercellular communication; and (3) an entropy energy β log p(t) which is an analog of the potential energy induced by white noise in diffusion process [14].Overall, the Ψ(t) depicts a dynamic potential energy landscape of cell type i at time t: cell state i with a higher potential energy Ψ(t) tends to transit to more stable states with lower potential energies; while the Φ quantifies the cell differentiation potency of cell type i, as well as a way to represent cell developmental time (see The linear potential energy Φ by GraphFP quantifies cell differentiation potency in Results).Cell development relies on the coordination of cellular activities based on temporal and local cell-cell communication through molecular signalling events [22]. As a key component, the interaction potential energy in Eq (8) models and quantifies cell-cell interactions that drive cell development, where w is the interaction strength from cell type k (as a signalling sender) to cell type i (as a signalling receiver): when w > 0, cell type k will send signals to upgrade the potential energy of cell type i to a higher level, thus inhibiting the decrease of potential energy of cell type i; when w < 0, cell type k will send signals to downgrade the potential energy of cell type i to a lower level, thus stimulating the decrease of potential energy of cell type i; when w = 0 or w ≈ 0, cell type k will send no or weak signals to cell type i, thus unable to alter the potential energy level of cell type i.We say that cell types i and k have no mutual interactions when both w and w are zero or close to zero. Such modelling of the interaction matrix is inspired and evidenced by our increasing understanding that both positive and negative feedback circuits composed of stimulatory and inhibitory factors are involved in the regulation of precise coordination of cell fate decisions through intercellular communication [22, 30].
Dynamic inference of cell developmental process
Once we estimate parameters * = {Φ*, *}, we can quantify the stochastic dynamics of the cell type frequencies (t) on probability simplex in continuous time t(> t1), according to Eq (2) given the initial point of probability 1 on probability simplex.The potential energy difference between cell states i and j isWe can draw the curves of the potential energy of all cell states over time to illustrate the cell state-transition potential energy landscape from a dynamic point of view.Based on Eq (2) which is the gradient flow of free energy [20, 21], we define the probability flow from vertex j into vertex i through edge {i, j} between time stages [t, t] as
which measures the total probability mass transporting from vertex j into vertex i through edge {i, j} between adjacent time stages [t, t]: a positive value indicates a probability mass gain of vertex i resulted from the flow of cells transiting from state j into state i, while a negative value indicates a probability mass loss of vertex i resulted from the flow of cells transiting from state i into state j. If total probability mass is conservative (e.g., no cell proliferation is considered), we haveThe intuition of the probability flow definition is that, when potential energy difference Δ(t) = F((t)) − F((t)) > 0, cells tend to transit from a higher potential energy state j to a lower potential energy state i, resulting in a probability mass gain of vertex i.
Results
GraphFP models and infers cell differentiation as a cell state-transition process described by the nonlinear FPE on cell state-transition graph in continuous time (see Fig 1 and Methods for details).In this study, we evaluated the performance of GraphFP using the time series scRNA-seq dataset of embryonic murine cerebral cortex development [26]. This dataset was analyzed by Tempora [12]. We downloaded the processed data, including the cell type annotations provided by Tempora [12]. The time series transcriptomic profile contains 6,316 cells collected at embryonic day 11.5 (E11.5), E13.5, E15.5 and E17.5. Overall, these cells represent neuronal development states from the early precursors (apical precursors (APs) and radial precursors (RPs)) to intermediate progenitors (IPs) and differentiated cortical neurons. Fig 2a illustrates the gold standard trajectory of the 4 major cell states curated by Tempora. Tempora identified 7 cell types by clustering and annotation methods: two AP/RP clusters denoted as “3-APs/RPs” and “5-APs/RPs”, two IP clusters denoted as “4-IPs” and “7-IPs”, two young neuron clusters denoted as “2-Young Neurons” and “6-Young Neurons”, and one neuron cluster denoted as “1-Neurons” (Fig 2b).
Fig 2
GraphFP accurately reconstructs the cell state-transition energy landscape of the murine cerebral cortex dataset.
(a) The gold standard trajectory of embryonic murine cerebral cortex development. (b) The t-SNE plot of cells from the murine cerebral cortex dataset, colored by their cell-type labels. (c) GraphFP estimated the linear potential energy Φ. (d) GraphFP estimated the cell-cell interaction matrix . (e) Static linear potential energy landscape of cells on the t-SNE plot: cells are color-coded according to the linear potential energies Φs of their corresponding cell types. (f) The free energy (Eq (1)) of the system decreased over time. (g) The reconstructed potential energy landscape Ψ(t) of cell types (colored curves) over time. (h) The potential energies of the cell state pairs with the top 3 highest positive values of cell-cell interaction strengths ws: “2-Young Neurons ← 1-Neurons” (left panel), “6-Young Neurons ← 3-APs/RPs” (middle panel), and “4-IPs ← 1-Neurons” (right panel). (i) The potential energies of the cell state pairs with the top 3 lowest negative values of cell-cell interaction strengths ws: “6-Young Neurons ← 1-Neurons” (left panel), “4-IPs ← 3-APs/RPs” (middle panel), and “2-Young Neurons ← 4-IPs” (right panel).
GraphFP accurately reconstructs the cell state-transition energy landscape of the murine cerebral cortex dataset.
(a) The gold standard trajectory of embryonic murine cerebral cortex development. (b) The t-SNE plot of cells from the murine cerebral cortex dataset, colored by their cell-type labels. (c) GraphFP estimated the linear potential energy Φ. (d) GraphFP estimated the cell-cell interaction matrix . (e) Static linear potential energy landscape of cells on the t-SNE plot: cells are color-coded according to the linear potential energies Φs of their corresponding cell types. (f) The free energy (Eq (1)) of the system decreased over time. (g) The reconstructed potential energy landscape Ψ(t) of cell types (colored curves) over time. (h) The potential energies of the cell state pairs with the top 3 highest positive values of cell-cell interaction strengths ws: “2-Young Neurons ← 1-Neurons” (left panel), “6-Young Neurons ← 3-APs/RPs” (middle panel), and “4-IPs ← 1-Neurons” (right panel). (i) The potential energies of the cell state pairs with the top 3 lowest negative values of cell-cell interaction strengths ws: “6-Young Neurons ← 1-Neurons” (left panel), “4-IPs ← 3-APs/RPs” (middle panel), and “2-Young Neurons ← 4-IPs” (right panel).
GraphFP reconstructs the cell state-transition energy landscape
We applied GraphFP to the embryonic murine cerebral cortex development scRNA-seq dataset based on the cell state/type labels of 7 clusters provided by Tempora. We estimated the cell state frequencies of the 7 cell states for each of the 4 time points separately. GraphFP first estimated parameters = {Φ, } of the free energy based on the adjoint method (Fig 2c and 2d).In general, the static landscape of the estimated linear potential energies Φs shows a consistent understanding of the differentiation potencies of the 7 cell states. The early precursors states of “5-APs/RPs” and “3-APs/RPs” that mostly comprise cells at E11.5 have the highest two Φs of 0.059 and 0.054, respectively. The two IP clusters, “7-IPs” and “4-IPs”, and one young neuron cluster, “6-Young Neurons”, have the three intermediate-valued Φs of 0.052, -0.051, and 0.035, respectively. The differentiated cortical neuron clusters “1-Neurons” and “2-Young Neurons” have the lowest two Φs of -0.073 and -0.075, respectively (Fig 2c and 2e).We modelled the cell state-transition energy landscape from a dynamical geometric point of view. The dynamic potential energy Ψ(t) consists of not only the static linear part of Φ, but also an interaction energy part as shown in Eq (8). It provides a global and holistic view of cell development process. For example, when only looking at the static landscape of Φ, we found that the linear potential energy of “1-Neurons” (Φ1 = −0.073) is slightly higher than that of “2-Young Neurons” (Φ2 = −0.075). This is in conflict with our understanding that “1-Neurons” should have the lowest potential since it is located at the terminal node of the cell lineage (Fig 2a). However, when looking at the dynamic potential energy Ψ(t), we found that “1-Neurons” has a strong inhibitory interaction over “2-Young Neurons” (w21 = 0.26 versus w12 = 0.04) which can upgrade the potential energy of “2-Young Neurons” during the process. The resultant dynamic potential energy of “2-Young Neurons” (Ψ2(t)) surpasses that of “1-Neurons” (Ψ1(t)) with higher value after time point E13 (Fig 2g and Left Panel of Fig 2h). The potential energy difference (Δ21) between “2-Young Neurons” and “1-Neurons” diverges with enlarging gaps as time evolves, especially in the latter time stages after time point E15.5 (Left Panel of Fig 2h). This is well consistent with our understanding that: (1) “2-Young Neurons” tends to transit to “1-Neurons” during cell development (Fig 2a), and (2) the transition from “2-Young Neurons” to “1-Neurons” mainly occurs at the late neurogenesis between E15.5 and E17.5 [26].Further results on the linear potential energy, the cell-cell interactions and the dynamic potential energy will be provided in the following two sections.
The linear potential energy Φ by GraphFP quantifies cell differentiation potency
Computational quantification of cell differentiation potency (also known as cell stemness) is a challenging issue [18]. The pioneer work by Shi et al. [5] established a rigorous mathematical theory on quantifying cell stemness from scRNA-seq data based on continuous birth-death process.Here, we demonstrated that the linear potential Φ estimated by GraphFP can be used to quantify the cell differentiation potency. In this study, each cell will be assigned the same linear potential value as that of its corresponding cell type/state. Shi et al. [5] also quantified the cell differentiation potency at the cluster level (e.g., cell state and cell type), which makes the results more accurate and robust. Quantifying the cell differentiation potency at single-cell level is still difficult, as the single-cell gene expression profiles are known to be error-prone due to various technique issues [31].Following the study in Shi et al. [5], we tested whether our linear potential energies for pluripotent stem cells (at early time point) are higher than those for differentiated cells (at latter time point). It is clearly shown that, cells from samples collected at earlier time stages tend to have higher potential Φ and vice versa (Fig 3a). When using the one-sided Wilcoxon ranksum statistic as applied by Shi et al. [5], we confirmed with highly statistically significant results that the linear potential values of cells sampled at the earliest time stage E11.5 are higher than those cells sampled at the subsequent time stages E13.5 (p < 1.554e − 07), E15.5 (p < 2.2e − 16), and E17.5 (p < 2.2e − 16), respectively.
Fig 3
The linear potential energy Φ quantifies cell differentiation potency.
(a) Boxplot of the linear potential energies of cells sampled different time stages of the embryonic murine cerebral cortex development. (b) Trend in the addictive inverse of linear potential (circular points connected by black lines with y axis on left-hand side) and temporal score (triangle points connected by red lines with y axis on right-hand side) across cell types. (c) The linear potential energy Φ estimated by GraphFP. (d) The stationary probability distribution of the cell types.
The linear potential energy Φ quantifies cell differentiation potency.
(a) Boxplot of the linear potential energies of cells sampled different time stages of the embryonic murine cerebral cortex development. (b) Trend in the addictive inverse of linear potential (circular points connected by black lines with y axis on left-hand side) and temporal score (triangle points connected by red lines with y axis on right-hand side) across cell types. (c) The linear potential energy Φ estimated by GraphFP. (d) The stationary probability distribution of the cell types.Next, we checked the linear potential energies of the cell types with their pseudo-time during cell development process. Tempora [12] provided each cell type with a temporal score by adjusting its cell composition from each time point such that a cell type containing more cells from an early time point will have a lower score and vice versa. We therefore used the temporal scores as the pseudo-time for the 7 cell types. It is clearly demonstrated that the addictive inverse values of linear potential energy are strongly correlated with the temporal scores (Fig 3b, Pearson correlation coefficient = 0.91), further confirming that the linear potential energy well quantifies cell stemness.It is worth noting that the linear potential energy Φ (Fig 3c) is different from the stationary distribution of cell types (Fig 3d). The stationary distribution , which is the cell type frequencies or cell densities calculated using the merged data across all time points, is often used to construct the stationary energy landscape ≡ −log in scRNA-seq data analysis [6]. However, as pointed out by Shi et al. [5], the stationary energy landscape is the equilibrium potential induced by diffusion without birth and death. In some extent, the linear potential energy Φ by GraphFP is an analogy of the cell potential V(x) proposed by Shi et al. [5], which was taken as their quantification of cell differentiation potency.
GraphFP delineates cell-cell interactions
We quantified the cell-cell interactions and intercellular communication during embryonic murine cerebral cortex development using the cell-cell interaction matrix estimated by GraphFP. The measures the cell-cell interaction strength between each pair of two cell types/states, one as the signalling sender and the other as the signalling receiver (see Eq (9) and section “Reconstruction of cell developmental energy landscape and modelling of cell-cell interactions” in Methods for its biological interpretations).The estimated is a sparse matrix with most elements having values equal or close to zero (Fig 2d), indicating a majority of the pairs having no or weak interactions. For example, the cell types “5-APs/RPs” and “7-IPs” have no mutual interactions with w57 = 0 and w75 = −0.02 (Fig 2d). Furthermore, the estimated cell-cell interaction strengths in the first row of that corresponds to “5-APs/RPs” are zero or close to zero with |w| ≤ 0.04, making the contributions from its interaction term in Eq (8) negligible. As such, the dynamic potential energy Ψ5(t) is dominated by its linear potential energy (Φ5) with a resultant flattening potential energy curve (Fig 2g).We also observed a number of strong cell-cell interactions with large ws deviating from zero. Cell states other than “5-APs/RPs” and “7-IPs” have at least one w with strong interaction strength (e.g., |w| > 0.1), resulting in the deviation of their potential energies Ψ(t)s from their linear potential energies Φs largely driven by their interaction energies with sharpened potential energy curves (Fig 2g).We further examined cell state pairs with strong interactions.
The pairs of “2-Young Neurons ← 1-Neurons” (w21 = 0.26), “6-Young Neurons ← 3-APs/RPs” (w63 = 0.14) and “4-IPs ← 1-Neurons” (w41 = 0.12) have the top 3 highest positive values of ws (Fig 2d), indicating that the sender cell types (“1-Neurons”, “3-APs/RPs” and “1-Neurons”) pass strong inhibitory signalling to their corresponding receiver cell types (“2-Young Neurons”, “6-Young Neurons” and “4-IPs”, respectively). Their potential energy differences (|Δ|s) diverge with enlarging gaps as time evolves (Fig 2h), resulting in that cells tend to transit in one direction from the cell state with higher potential energy to the cell state with lower potential energy, only rarely transiting in the reverse direction.
In particular, “2-Young Neurons” tends to transit to “1-Neurons”, “3-APs/RPs” tends to transit to “6-Young Neurons”, “4-IPs” tends to transit to “1-Neurons”. These results are consistent with our understanding of the cell development process depicted in Fig 2a.On the other hand, the pairs of “6-Young Neurons ← 1-Neurons” (w61 = −0.21), “4-IPs ← 3-APs/RPs” (w43 = −0.15) and “2-Young Neurons ← 4-IPs” (w24 = −0.14) have the top 3 lowest negative values of ws, indicating that the sender cell types (“1-Neurons”, “3-APs/RPs” and “4-IPs”) pass strong stimulatory signalling to their receiver cell types (“6-Young Neurons”, “4-IPs” and “2-Young Neurons”, respectively).
In particular, the potential energy differences (|Δ|s) of the pair “6-Young Neurons: 1-Neurons” and the pair “3-APs/RPs: 4-IPs” converge with shrinking gaps as time evolves (Fig 2i), making the transitions between the paired cell states approaching to equilibrium in both directions. This result is consistent with our probability flow (Fig 4), where the net probability flows from “6-Young Neurons” to “1-Neurons” as well as from “3-APs/RPs” to “4-IPs” gradually decrease over time. The potential energy difference between “4-IPs” and “2-Young Neurons” starts from a small value close to zero at time point E11.5, then gradually increases to its largest gap at E14, and then gradually declines to zero again at E17.5. This result indicates that the transition from “4-IPs” to “2-Young Neurons” mainly occurs at the intermediate time region from E13.5 to E15.5, which is consistent with our understanding that the “4-IPs” cells are the intermediate progenitors of cell development (Fig 2a).
Fig 4
GraphFP charts the probability flows of cell state-transitions.
The circle point represents cell type (point size is proportional to the cell-type frequency at each time point); the line between cell types represents probability flow from source cell type to target cell type (line width is proportional to the value of probability flow).
GraphFP charts the probability flows of cell state-transitions.
The circle point represents cell type (point size is proportional to the cell-type frequency at each time point); the line between cell types represents probability flow from source cell type to target cell type (line width is proportional to the value of probability flow).Based on our calculation using GraphFP, we also confirmed that free energy (Eq (1)) of the system decreased over time (Fig 2f), which is consistent with accepted mathematical theory [20, 21]. However, according to our calculation, free energy of the system did not converge to its minimum free energy state at time point E17.5 when the experiment ended (see the vertical dashed red line in Fig 2f). We predicted from Fig 2f that the system would reach its minimum free energy state after time point E30.
GraphFP faithfully charts the probability flows of cell state-transitions during cell development
We next examined the ability of GraphFP to quantify the dynamics of cell state-transitions during embryonic murine cortical development by calculating the probability flows (Eq (10)) between each time intervals of the adjacent time stages (Fig 4). In the early stage from E11.5 to E13.5, cells mainly transit from the early precursors of “3-APs/RPs” and “5-APs/RPs” to the intermediate progenitor “4-IPs” and the two neuron clusters of “2-Young Neurons” and “1-Neurons”. In the middle stage from E13.5 to E15.5, the intermediate progenitor “4-IPs” joins in with “3-APs/RPs” and “5-APs/RPs” as the major source clusters that transit to two neuron clusters, “2-Young Neurons” and “1-Neurons”. Meanwhile, as a source cluster, “2-Young Neurons” starts to transit to “1-Neurons”, and in the latter stage from E15.5 to E17.5, “4-IPs” takes a leading role in transiting to the neuron cluster of “1-Neurons” and young neuron clusters, “2-Young Neurons” and “6-Young Neurons”. Meanwhile, “2-Young Neurons” continues as one of the major source clusters transiting to “1-Neurons” (Fig 4). Compared with the gold standard trajectory shown in Fig 2a by Tempora [12], we identified a new path whereby the IP cells of cluster “4-IPs” transit to neuron cells of cluster “1-Neurons”, as confirmed by Yuzwa et al. [26], who reported that cortical RPs divide asymmetrically from E11.5 to E17.5 to generate neurons directly or indirectly via transit-amplifying cells of IPs.
Cell-cell interactions drive the stochastic and nonlinear dynamics of cell development
GraphFP explicitly models cell-cell interactions with a nonlinear quadratic interaction term in the free energy (Eq (1)). To account for cell-cell interactions, we evaluated GraphFP on its ability to fit the experimental data (Fig 5a) and recover held-out time points (Fig 5b–5d) with cell-cell interaction term ( ≠ 0; solid lines in Fig 5) and without cell-cell interaction term ( = 0; dashed lines in Fig 5). To evaluate the performance on estimation accuracy, we applied Kullback-Leibler divergence (KLD) to measure the difference between the estimated probability distribution with/without interaction term and true probability distribution at each time points (Table 1). A lower KLD value is indicative of better performance.
Fig 5
GraphFP accurately quantifies the stochastic dynamics of the cell type frequencies by modelling cell-cell interactions.
GraphFP calculated the stochastic dynamics of the cell type frequencies (t) with cell-cell interaction term ( ≠ 0; solid lines) and without cell-cell interaction term ( = 0; dashed lines). Triangle points are the estimated cell type frequencies at each time point where red represents the input data point to GraphFP, while blue represents the held-out data point to GraphFP. (a) Using all 4 time points as input. (b) Held-out E13.5. (c) Held-out E15.5. (d) Held-out E13.5 and E15.5.
Table 1
Evaluation of GraphFP’s performance on quantifying the stochastic dynamics of cell-type frequencies with cell-cell interaction term (W ≠ 0) and without cell-cell interaction term (W = 0) on the murine cerebral cortex dataset.
KLD
With all time points
Held out E13.5
Held out E15.5
Held out E13.5 and E15.5
with
without
with
without
with
without
with
without
E13.5
0.0007
0.0101
0.0340
0.0217
7.0020e-05
0.0377
0.0675
0.0900
E15.5
0.0133
0.0168
0.0009
0.0119
0.0372
0.0518
0.1036
0.1130
E17.5
0.0056
0.0861
0.0036
0.0785
5.0603e-05
0.0240
1.9569e-06
2.0707e-06
The Kullback-Leibler divergence (KLD) distance was used to measure the difference between the estimated probability distribution by GraphFP and true probability distribution at each time points (E13.5, E15.5, E17.5).
GraphFP accurately quantifies the stochastic dynamics of the cell type frequencies by modelling cell-cell interactions.
GraphFP calculated the stochastic dynamics of the cell type frequencies (t) with cell-cell interaction term ( ≠ 0; solid lines) and without cell-cell interaction term ( = 0; dashed lines). Triangle points are the estimated cell type frequencies at each time point where red represents the input data point to GraphFP, while blue represents the held-out data point to GraphFP. (a) Using all 4 time points as input. (b) Held-out E13.5. (c) Held-out E15.5. (d) Held-out E13.5 and E15.5.The Kullback-Leibler divergence (KLD) distance was used to measure the difference between the estimated probability distribution by GraphFP and true probability distribution at each time points (E13.5, E15.5, E17.5).First, we applied GraphFP to the embryonic murine cerebral cortex development scRNA-seq dataset using all 4 time points. Based on the estimated parameters *, we calculated the stochastic dynamics of the cell type frequencies (t) on probability simplex in continuous time t(> t1) according to Eq (2) with given initial point of . Overall, GraphFP with cell-cell interaction term outperforms GraphFP without cell-cell interaction term on the fitting of the nonlinear curves for the 7 clusters (Table 1), especially for “2-Young Neurons”, “4-IPs” and “6-Young Neurons” (Fig 5a).Next, we applied GraphFP to the scRNA-seq datasets of (i) one held-out time point E13.5 (Fig 5b) and (ii) one held-out time point E15.5 (Fig 5c), separately.
GraphFP with cell-cell interaction term always outperforms GraphFP without cell-cell interaction term on both nonlinear curve fitting and held-out time point recovering except for one comparison on Held out E13.5 dataset at time stage E13.5. (Table 1).Finally, we applied GraphFP to the scRNA-seq data set of two held-out time points, E13.5 and E15.5 (Fig 5d). It is not surprising that both models drop their accuracies markedly on recovering the held-out time points.
In addition, the results by GraphFP with cell-cell interaction term still outperform those by GraphFP without cell-cell interaction term (Table 1 and Fig 5d).As shown in Fig 5, our results illustrate that the stochastic and nonlinear dynamics of cell development are not merely determined by the linear potential energies Φs, but also driven by nonlinear cell-cell interactions. Specifically, the evolving probability frequencies of cell types can be nonmonotonic (e.g., “2-Young Neurons”, “4-IPs”, “6-Young Neurons” in Fig 5).
Meanwhile, time series scRNA-seq data with cells profiled at more time points will provide more temporal information to recover the biologically complex dynamic processes.
GraphFP is robust to input data
As also shown in Fig 5a–5c, our results illustrate that GraphFP robustly recovers the stochastic and nonlinear dynamics of cell development by using all datasets or datasets with one held-out time point.Since GraphFP works on cells with cluster labels or cell type annotations, we then examined whether GraphFP is sufficiently robust to account for the uncertainty presented in the clustering or annotation methods. In the above sections, we have illustrated the outputs of GraphFP based on the labelling of 7 clusters with a fine resolution provided by Tempora [12]. Here, we further grouped the cells into 4 clusters with a coarse resolution as follows: “A-Neurons” constituted by cells from “1-Neurons”; “B-Young Neurons” constituted by cells from “2-Young Neurons” and “6-Young Neurons”; “C-APs/RPs” constituted by cells from “3-APs/RPs” and “5-APs/RPs”; and “D-IPs” constituted by cells from “4-IPs” and “7-IPs”. We compared the results of GraphFP based on the labelling of 7 cell types and the labelling of 4 cell types (Fig 6). To make the results comparable, we aggregated the results based on the labelling of 7 cell types by averaging the results from i) “3-APs/RPs” and “5-APs/RPs”, ii) “2-Young Neurons” and “6-Young Neurons” and iii) “4-IPs” and “7-IPs”, separately, resulting in the same dimensions as those based on the labelling of 4 cell types. It should be noted that the results based on 4 cell types and the aggregated results based on 7 cell types are consistent with similar patterns of linear energies Φs (Fig 6a and 6d), interaction matrices s (Fig 6b and 6e), and probability flows (Fig 6c and 6f). In addition, GraphFP is robust to the hyper-parameter choices in wide ranges.
Fig 6
GraphFP is robust to uncertainty presented in cell type labels.
GraphFP was applied to the murine cerebral cortex dataset based on the labelling of 4 cell types with a coarse resolution (a-c) and the labelling of 7 cell types with a fine resolution (d-f), separately. The estimated Φ (a), (b) and charted probability flow (c) by GraphFP based on the labelling of 4 clusters (“A-Neurons”, “B-Young Neuron”, “C-APs/RPs”, “D-IPs”). Aggregated results of the estimated Φ (d), (e) and charted probability flow (f) by GraphFP based on the labelling of 7 clusters, averaging the results from i) “3-APs/RPs” and “5-APs/RPs”, ii) “2-Young Neurons” and “6-Young Neurons” and iii) “4-IPs” and “7-IPs”, separately, resulting in the same dimensions as those based on the labelling of 4 cell types.
GraphFP is robust to uncertainty presented in cell type labels.
GraphFP was applied to the murine cerebral cortex dataset based on the labelling of 4 cell types with a coarse resolution (a-c) and the labelling of 7 cell types with a fine resolution (d-f), separately. The estimated Φ (a), (b) and charted probability flow (c) by GraphFP based on the labelling of 4 clusters (“A-Neurons”, “B-Young Neuron”, “C-APs/RPs”, “D-IPs”). Aggregated results of the estimated Φ (d), (e) and charted probability flow (f) by GraphFP based on the labelling of 7 clusters, averaging the results from i) “3-APs/RPs” and “5-APs/RPs”, ii) “2-Young Neurons” and “6-Young Neurons” and iii) “4-IPs” and “7-IPs”, separately, resulting in the same dimensions as those based on the labelling of 4 cell types.
The computational cost of GraphFP
We examined the impact of the number of cell types (n) on the computational cost of GraphFP. When working on the murine cerebral cortex dataset with 7 cell types (n = 7) and 4 time stages, the runtime of GraphFP was around 3 minutes for each task on a personal laptop (MacBook Pro with CUP 2.4 GHz Intel Core i5 and Memory 8 GB 2133 MHz LPDDR3) (S1 Table). In our implementation, we set λ = 1000(l ∈ 2, 3, 4), β = 0.001, the learning rate as α = 0.01/λ and Integral_step as 0.1.We next examined GraphFP on another time series scRNA-seq dataset of the mouse spinal cord injury healing process provided by [32] (see the detailed results in S2 Text). The new dataset contains 13 clusters (cell types) and 4 time points. We applied GraphFP to this dataset on the same computer with the same hyper-parameter settings as those for the murine cerebral cortex dataset. GraphFP still achieved accurate and robust reconstruction of cell state-transition energy landscape on this dataset (S2 Text), and also achieved a reasonable performance on computational speed: the runtime for each task was around 9 minutes (S2 Table).Based on the above experiments, we found that the computational speed of GraphFP is sustainable for tasks with moderate number of cell types. Meanwhile, the computation of GraphFP might be problematic for large n (e.g., n > 100) at current settings. As we applied the complete cell state-transition graph, the degree of freedom (e.g., the parameters of the cell-cell interaction matrix ) will grow in the order of O(n2), making the computation difficult. However, this problem is solvable. One way to solve this problem is to take advantage of the sparse structure of the cell-cell interaction matrix . As we have already noted, the estimated s of both the murine cerebral cortex dataset (Fig 2d) and the mouse spinal cord injury dataset (Fig A(b) in S2 Text) are sparse. Therefore, we can solve this problem by adding a L1 regularization term of the matrix to the loss function to enforce to be sparse. We plan to pursue this topic in our future work.On the other hand, in practice, for large number of cell types, we can trade off the estimation accuracy and the information of cell-cell interactions for runtime performance. GraphFP without the cell-cell interaction term will be efficient for large number of cell types since the degree of freedom (e.g., the parameters of linear potential energy Φ) will grow in the order of O(n). For example, the runtimes of GraphFP without cell-cell interaction term ( = 0) for both the murine cerebral cortex dataset (S1 Table) and the mouse spinal cord injury dataset (S2 Table) are all less than 20 seconds.
Discussion
Modelling of cell development has long been a key goal of systems biology.
The Waddington landscape is a classic metaphor for describing cell development. Mathematical framework of cell developmental energy landscape has been developed to study the dynamics of cell state-transitions from gene regulatory network (GRN) based perspective (e.g., [33-36]) and state manifold based perspective (e.g., [5, 37, 38]) (see [18] for a recent review of the two approaches). Traditional GRN-based landscape can be hindered by the computational issue raised by high-dimensional GRNs. Recently, a model-based dimension reduction approach of the landscape (DRL) was proposed to construct a low-dimensional energy landscape of high-dimensional GRNs [36], which overcomes the limitations of traditional methods. Although great success has been achieved, the GRN-based landscape depends on prior biological knowledge of the underlying GRN. When the information of GRNs is unavailable or not complete, the state manifold based landscape will be constructed, especially for scRNA-seq data analysis. The state manifold based methods model the cell development with stochastic Markov process and/or drift-diffusion PDE, where cell states (e.g., cell types and cell clusters) represent the local attractors of the underlying dynamic systems [18].In this study, we propose GraphFP, a state manifold based computational framework, to reconstruct the complex potential energy landscape and infer the stochastic dynamics of cell state-transition during cell development. GraphFP models cell development based on the diffusion process in a discrete spectrum of states [19-21]. It can be viewed from the lens of dynamic optimal transport on networks as solving an optimal control problem to minimize the kinetic energy of flow between adjacent time points [14, 39]. The FPE of GraphFP can be characterized as a gradient flow of free energy when the probability simplex of discrete states is equipped with the discrete L2-Wasserstein metric defined on the graphs [19-21]. Beyond its clear theoretical importance, GraphFP has enabled critical insight into nonlinear dynamic cell state-transition, as well as cell-cell interactions during cell development. We demonstrated that the cell-cell interaction part of GraphFP plays a key role in capturing the stochastic dynamic of the cell-type frequencies on both the murine cerebral cortex dataset (Table 1 and Fig 5) and the mouse spinal cord injury dataset (S2 Text).GraphFP has the following strengths over existing methods. First, GraphFP models the dynamics of cell clusters (e.g., cell states and cell types) on a discrete state space. In contrast, methods, such as Waddington-OT [15], TrajectoryNet [16] and PRESCIENT [17], modelled the dynamics of individual cells with drift-diffusion equations on a continuous state space. With the dramatic increase in amount and size of scRNA-seq data, the cluster-based approaches, which work on a relatively small number of clusters that usually represent annotated cell types, warrant both scalability to large-scale scRNA-seq data and ease of biological interpretability [12].Second, GraphFP is built on a nonlinear model that explicitly takes into account cell-cell interactions in free energy. The current computational methods for inferring cell-cell interactions from single-cell data are mainly based on machine learning or statistics, relying heavily on the domain knowledge as learning materials [22, 23]. On the other hand, GraphFP provides an alternative and model-based approach to decipher cell-cell interactions that drive cell development. In contrast, the underlying models of both Waddington-OT [15] and PRESCIENT [17] are only able to characterize cell state-transition on the static potential energy landscape driven by random noises, failing to account for cell-cell interactions. Although able to reconstruct nonlinear development landscape, TrajectoryNet was based on the neural network framework without explicit system models, thus lacking biological interpretability.Nonetheless, some aspects still need to be improved. Firstly, the current GraphFP does not account for cell proliferation during cell development, which may result in that probability masses are not conservative over time. We can solve this problem by adopting the unbalanced optimal transport framework that has been used by Waddington-OT [15] and TrajectoryNet [16] to quantify cell proliferation. Secondly, as the existing time series scRNA-seq methods such as Waddington-OT [15] and Tempora [12], the current GraphFP works in an off-line fashion such that the cell clustering and annotation are performed on the entire data by merging cells from all time points together. This approach offers an unbiased, comprehensive and quantitative definition of discrete cell types. However, with the emerging large-scale scRNA-seq data, it may be computationally cumbersome to cluster the massive and continually arriving scRNA-seq datasets as a whole. Therefore, developing an on-line framework of GraphFP that can cluster and annotate the single-cell time series scRNA-seq data in different batches in a serial fashion should be an interesting topic. The newly developed single-cell data analysis tools such as the on-line integration method online iNMF [40] and the cell type annotation method scArches based on transfer learning [41] can be adopted.
Details for the parameter estimation of GraphFP.
This document provides detailed description of the parameter estimation and pseudocode for the GrapFP algorithm.(PDF)Click here for additional data file.
Application of GraphFP to the mouse spinal cord injury dataset.
Fig A. GraphFP reconstructs the cell state-transition energy landscape on the mouse spinal cord injury scRNA-seq dataset. Fig B. The linear potential energy Φ quantifies cell differentiation potency. Table A. Evaluation of GraphFP’s performance on quantifying the stochastic dynamics of cell type frequencies with cell-cell interaction term ( ≠ 0) and without cell-cell interaction term ( = 0) on the mouse spinal cord injury dataset.(PDF)Click here for additional data file.
Runtimes of GraphFP on the murine cerebral cortex dataset.
(DOCX)Click here for additional data file.
Runtimes of GraphFP on the mouse spinal cord injury dataset.
(DOCX)Click here for additional data file.6 Sep 2021Dear Dr. Wan,Thank you very much for submitting your manuscript "Dynamic inference of cell developmental complex energy landscape from time series single-cell transcriptomic data" for consideration at PLOS Computational Biology.As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.When you are ready to resubmit, please upload the following:[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).Important additional instructions are given below your reviewer comments.Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.Sincerely,Qing NieAssociate EditorPLOS Computational BiologyDouglas LauffenburgerDeputy EditorPLOS Computational Biology***********************Reviewer's Responses to QuestionsComments to the Authors:Please note here if the review is uploaded as an attachment.Reviewer #1: In this work, the authors propose an approach to infer cell state transition dynamics and uncover cell-cell interactions from time series single-cell RNA-seq data, by constructing the energy landscape of the system. Based on a nonlinear Fokker-Planck equation on Graph based model, they solve the model inference problem in a dynamic optimal transport framework. They further illustrate the validity of their approach by applying it to a time series scRNA-seq data set of embryonic murine cerebral cortex development.Overall, this is an interesting work, aiming to address a critical problem in dynamic model inference and data-driven approach in biology. However, I have some concerns that need to be addressed.1, The authors assume that the cell-cell interaction matrix W is symmetric, what does this assumption mean in biology? And why is this a reasonable assumption? For example, should cells with different differentiation potency have symmetric interactions?2, Fig.2 (a)(b)(c), the use of color for different cell types is very confusing. Please try to make them consistent.3, It’s interesting to get cell-cell interaction matrix W as in Fig.2. However, the biological indication needs to be further discussed. For example, from Fig. 2 (g and h) it is found that cell type 3 (Aps/RPs) has positive interaction with cell type 6 (Young Neurons), and negative interaction with cell type 4 (IPs) or 2 (Young Neurons). What could be the biological meaning here based on the linage commitment route in Fig. 2(a)? Why do the two Young Neurons populations (cell type 6 and 2) have opposite types of interactions with cell type 3 (Aps/RPs)?4, The energy landscape here is defined for each cell type, could it be expanded to single cell? i.e., each cell has a potential energy?5, Current work focus on the cell-cell interactions, and molecular expression data is used only for estimating probability of cell types at different time point. As we know, the molecular interactions are important to cell fate transition determinations, and corresponding landscape construction approaches have been proposed (Kang, Advanced Science, 8, 2003133 (2021)). The authors may want to discuss the relationship between the two types of approaches.Reviewer #2: The authors consider the dynamic inference of an evolving cell population from the time series scRNA-seq data. The main idea of their approach is to apply the optimal transport theory on graph with a free energy consists of the potential, mean field interaction, and the entropy terms. The parameters describing the potential and interaction terms are estimated by fitting the model to the available data. The computation applied to embryonic murine cerebral cortex development gives reasonable results. The literature review is also fair. Overall, I think the paper is interesting and present simple yet useful tool to analyze the time series scRNA-seq data.Some major and minor points are as below.Major:1. The computational cost and robustness. The authors state that the method will be applied to the cell states/types in a complete graph. If it is only for cell types, which have a small number, the DOF is small and the computation cost should be light. If it is for the whole cell states, the DOF might be huge and the computation is difficult to be done. BTW, in different time points, how will the authors choose the common cell states if they do not do grouping at first? How will the different grouping affect the result?2. Quantification of the cell stemness. This is an important issue in scRNA-seq data analysis. Will the proposed method give a quantification on this problem? I would like to see some tests and comments on this point.3. The authors choose the optimal transport (OT) approach and and Wasserstein distance to the scRNA-seq data analysis, which is a hot topic in this area and machine learning community. Yes, it is fancy to take OT. But it is also possible to utilize other simpler model and optimization approach to accomplish similar task. I wonder whether the authors have some reason to take OT instead of other methods. Some comments and discussion on this point should be added.Minor:1. Some typo or gramatical errors.Line 73: double periods in the text.Line 121: as follows**********Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.Reviewer #1: YesReviewer #2: Yes**********PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.If you choose “no”, your identity will remain anonymous but your review may still be made public.Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.Reviewer #1: NoReviewer #2: NoFigure Files:While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, . PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at .Data Requirements:Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc.. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.Reproducibility:To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols5 Nov 2021Submitted filename: ResponseLetter-GraphFP.pdfClick here for additional data file.8 Dec 2021Dear Dr. Wan,Thank you very much for submitting your manuscript "Dynamic inference of cell developmental complex energy landscape from time series single-cell transcriptomic data" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.When you are ready to resubmit, please upload the following:[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).Important additional instructions are given below your reviewer comments.Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.Sincerely,Qing NieAssociate EditorPLOS Computational BiologyDouglas LauffenburgerDeputy EditorPLOS Computational Biology***********************A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:[LINK]Reviewer's Responses to QuestionsComments to the Authors:Please note here if the review is uploaded as an attachment.Reviewer #1: The authors have addressed my concerns thoroughly. I have only a few minor comments.1, Line 483, is ref. 37 the right reference here? It seems that ref. 37 is not talking about data-driven landscape.2, The final word of the paper (Line 533), “adopt” should be adopted?3, Fig. 3 (c and d), the y axis seems to be labeled incorrectly.Reviewer #2: The authors responded to my concerns perfectly. Now I recommend it for publication.**********Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.Reviewer #1: YesReviewer #2: Yes**********PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.If you choose “no”, your identity will remain anonymous but your review may still be made public.Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.Reviewer #1: NoReviewer #2: NoFigure Files:While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.Data Requirements:Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc.. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.Reproducibility:To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocolsReferences:Review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript.If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.14 Dec 2021Submitted filename: ResponseLetter-GraphFP-R2.pdfClick here for additional data file.10 Jan 2022Dear Dr. Wan,We are pleased to inform you that your manuscript 'Dynamic inference of cell developmental complex energy landscape from time series single-cell transcriptomic data' has been provisionally accepted for publication in PLOS Computational Biology.Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.Best regards,Qing NieAssociate EditorPLOS Computational BiologyDouglas LauffenburgerDeputy EditorPLOS Computational BiologyFeilim Mac GabhannEditor-in-ChiefPLOS Computational Biology***********************************************************20 Jan 2022PCOMPBIOL-D-21-00954R2Dynamic inference of cell developmental complex energy landscape from time series single-cell transcriptomic dataDear Dr Wan,I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!With kind regards,Agnes PapPLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol
Authors: Scott A Yuzwa; Michael J Borrett; Brendan T Innes; Anastassia Voronova; Troy Ketela; David R Kaplan; Gary D Bader; Freda D Miller Journal: Cell Rep Date: 2017-12-26 Impact factor: 9.423
Authors: Lindsay M Milich; James S Choi; Christine Ryan; Susana R Cerqueira; Sofia Benavides; Stephanie L Yahn; Pantelis Tsoulfas; Jae K Lee Journal: J Exp Med Date: 2021-06-16 Impact factor: 14.307
Authors: Suoqin Jin; Christian F Guerrero-Juarez; Lihua Zhang; Ivan Chang; Raul Ramos; Chen-Hsiang Kuan; Peggy Myung; Maksim V Plikus; Qing Nie Journal: Nat Commun Date: 2021-02-17 Impact factor: 17.694
Authors: Chao Gao; Jialin Liu; April R Kriebel; Sebastian Preissl; Chongyuan Luo; Rosa Castanon; Justin Sandoval; Angeline Rivkin; Joseph R Nery; Margarita M Behrens; Joseph R Ecker; Bing Ren; Joshua D Welch Journal: Nat Biotechnol Date: 2021-04-19 Impact factor: 54.908