
DeepCME: A deep learning framework for computing solution statistics of the chemical master equation.

Ankit Gupta, Christoph Schwab, Mustafa Khammash.

Abstract

Stochastic models of biomolecular reaction networks are commonly employed in systems and synthetic biology to study the effects of stochastic fluctuations emanating from reactions involving species with low copy-numbers. For such models, Kolmogorov's forward equation is called the chemical master equation (CME), and it is a fundamental system of linear ordinary differential equations (ODEs) that describes the evolution of the probability distribution of the random state-vector representing the copy-numbers of all the reacting species. The size of this system is given by the number of states that are accessible by the chemical system, and for most examples of interest this number is either very large or infinite. Moreover, approximations that reduce the size of the system by retaining only a finite number of important chemical states (e.g. those with non-negligible probability) result in high-dimensional ODE systems, even when the number of reacting species is small. Consequently, accurate numerical solution of the CME is very challenging, despite the linear nature of the underlying ODEs. One often resorts to estimating the solutions via computationally intensive stochastic simulations. The goal of the present paper is to develop a novel deep-learning approach for computing solution statistics of high-dimensional CMEs by reformulating the stochastic dynamics using Kolmogorov's backward equation. The proposed method leverages the superior approximation properties of Deep Neural Networks (DNNs) to reliably estimate expectations under the CME solution for several user-defined functions of the state-vector. This method is algorithmically based on reinforcement learning and it only requires a moderate number of stochastic simulations (in comparison to typical simulation-based approaches) to train the "policy function".
This allows the numerical approximation not just of various expectations under the CME solution but also of their sensitivities with respect to all the reaction network parameters (e.g. rate constants). We provide four examples to illustrate our methodology and suggest several directions for future research.


Year:  2021        PMID: 34879062      PMCID: PMC8687598          DOI: 10.1371/journal.pcbi.1009623

Source DB:  PubMed          Journal:  PLoS Comput Biol        ISSN: 1553-734X            Impact factor:   4.475


This is a PLOS Computational Biology Methods paper.

1 Introduction

Stochastic modelling in systems and synthetic biology has become an indispensable tool to quantitatively understand the intrinsically noisy dynamics within living cells [1]. Intracellular reaction networks typically involve low copy-number species in reactions that fire intermittently at random times, as opposed to continuously. Hence, deterministic models of such networks based on Ordinary Differential Equations (ODEs) fail to capture the essential properties of the system, and stochastic models become necessary [2]. Among the most widely used stochastic models are continuous-time Markov chains (CTMCs) whose states represent the copy-numbers of all species involved in the Chemical Reaction Network (CRN) [3]. If the number of species in the CRN is n, the Markov chain evolves over a discrete, possibly infinite, state-space comprising all accessible states. In most applications, the key object of interest is the probability distribution p(t) of the random state at time t. This probability distribution evolves in time t according to Kolmogorov’s forward equation, more famously known in the chemical literature as the Chemical Master Equation (CME) (see, e.g., [4], and (8)). The CME is a system of coupled, deterministic ODEs describing the rates of inflow and outflow of probability at each state in the state-space. Even for very small CRNs, the number of such states can be very large or infinite, and hence the CME cannot be solved directly despite the linear nature of its constituent ODEs. Hence, one typically estimates CME solutions numerically, either by simulating the CTMC trajectories with methods like the Stochastic Simulation Algorithm (SSA) [5] or the modified Next Reaction Method (mNRM) [6], or by modelling (parts of) the CME asymptotically in various parameter regimes, such as the large copy-number limit or the large systems limit (see, e.g., [3, 7] and the references therein). In the latter case, Fokker-Planck PDEs govern the evolution of the limiting densities.
Solutions of these PDEs are known to admit DNN approximations which are free from the “Curse of Dimensionality” (CoD); see e.g. [8] and the references therein. The main drawback of simulation-based solvers is that obtaining statistically precise estimates of the CME solution can be very cumbersome, due to the high cost of CTMC trajectory simulation. This led to the development of the Finite State Projection (FSP) method [9], which approximately solves the CME by truncating the state-space to a finite, tractable size. The FSP has been successfully used in many important biological studies with stochastic reaction network models. Over time, numerous algorithmic improvements to the original FSP method have been made, using advanced techniques such as Krylov subspace approximations [10] or Tensor-Train representations [11]. Despite these advances, the scope of FSP’s applicability is still fairly limited because of the CoD inherent to the CME for complex CRNs: the dimension of the copy-number space grows with the number of species involved in the CRN and can quickly become prohibitive. With the algorithmic complexity of deterministic solution methods for the CME scaling exponentially with the number of species n, the CoD precludes the efficient numerical treatment of the CME for complex CRNs. In spite of these drawbacks, simulation schemes like the SSA or mNRM combined with FSP and its variants have emerged as the methodology of choice during the past decades for the computational exploration of complex CRNs in systems biology. This is mainly due to the lack of computational schemes that can effectively deal with the CoD. In the past decade, with the ubiquitous emergence of possibly massive, noisy data from natural biological CRNs, and the possibility of engineering synthetic biological CRNs, the question of efficient numerical analysis of CRNs has become pivotal.
Indeed, several tasks in computational biology strongly depend on the availability of scalable, efficient computational tools to analyse large CRNs. These include structure and parameter identification in large CRNs, assimilation of observable data into CRN models, and Bayesian estimation of non-observable quantities of interest conditional on CRNs, among many others. Recently, in the context of high-dimensional partial differential equations (PDEs), deep-learning based numerical approaches have been found highly effective in dealing with the CoD in these settings, and they appear efficient in the numerical approximation of PDE solutions with high-dimensional state and parameter spaces [12-14]. We refer to the survey [8] and the references therein. Importantly, several types of PDEs considered in these studies also arise from various asymptotic scalings (large copy-numbers, large systems limits) of large CRNs (e.g. [4, 7, 15, 16]). Furthermore, DNNs have been shown to be at least as expressive as certain tensor-structured formats from numerical multi-linear algebra, which were developed for the CME in [11] (see also [17]). Motivated by these advances and observations, in this paper we develop and explore corresponding deep-learning approaches for the efficient numerical solution of CMEs and for the related tasks of parameter estimation and inference. Before detailing our approach, we remark that leveraging Machine Learning (ML) based approaches for the numerical treatment of complex CRNs is, in our view, natural and critical: CRNs being themselves networks, any viable computational approach should, in some sense, mimic this structure in order to accommodate the complexity of CRNs. This is in line with our previous work on tensor network based computational methods (e.g. [11, 18]). On the other hand, ML-based computational methodologies for data assimilation and quantitative prediction of complex systems are currently undergoing intense development.
We therefore expect that corresponding advances in computational ML, such as progress in interpretability and training methods for DNNs, will entail corresponding methodological advances in the exploration of large, complex CRNs in biological systems engineering. Next we briefly describe our ML approach to solving the CME. Instead of estimating the probability p(t, x) for each state x in the state-space, one is often interested in learning the expectation of a suitable real-valued function g, referred to as the output function, under this probability distribution. Therefore, one is interested in the input-output map that associates an initial density to the expectation E[g(X(t))] = Σ_x g(x) p(t, x). When the state-space is infinite, this sum is formal for now; we later indicate some sufficient conditions for it to be well-defined. Applying state-space truncation schemes (e.g. [9]), we may assume that the summation is finite, with only a small error in the estimated expectation. For example, if g(x) = x_i^m for some m ≥ 1 and i ∈ {1, …, n}, then the output to be estimated is the m-th moment E[X_i(t)^m] of the random copy-number of the i-th species at time t. Another relevant example is when g = 1_A, the indicator function of some subset A of the state-space, defined by 1_A(x) = 1 if x ∈ A and 1_A(x) = 0 otherwise; then the output is the probability P(X(t) ∈ A) of the state X(t) being in the set A at time t. One method of choice to numerically approximate this map is by stochastic simulations generated with the SSA and its variants (e.g. [5, 19–21] and the references therein), combined with ensemble averaging. Generally, this approach mandates a large number of sample paths to achieve Monte Carlo convergence to reasonable accuracy at fixed t > 0. In the present paper, we propose DeepCME, a deep neural network based methodology to emulate the above-mentioned map. Path simulation is also required in the present approach, during the DNN training phase.
However, we find that the number of paths required for DNN training is generally lower than that needed by a direct Monte Carlo estimator combined with stochastic simulations; accuracy is achieved through the generalization properties of DNNs rather than through approximation over admissible sets of initial densities. As is by now well-known in ML, an essential ingredient in DNN based approaches to emulate high-dimensional maps is the mathematical setup of suitable loss functions which determine the training process. In DeepCME, we propose a particular loss function which is inspired by other, recent approaches in computational finance (e.g. [12] and the references therein). Specifically, using Kolmogorov’s backward equation, Kurtz’s random time change formulation [22] and Ito’s formula for jump processes, we identify an equation that the output quantity of interest, along with some “policy map”, must uniquely satisfy for each stochastic trajectory (X(t)) almost surely. Minimising a “loss” function that measures the error in this equation allows us to train a deep neural network to learn the policy map and the quantity of interest in a reinforcement learning framework. Remarkably, this approach also yields the sensitivities of the quantity of interest w.r.t. all model parameters. Estimating these parametric sensitivities is important for many applications, but it is considered a difficult problem towards which a lot of research effort has recently been directed [23-31]. This paper is organised as follows. In Section 2 we provide some background on the CTMC model of a reaction network. In Section 3 we present our main results that allow us to cast the problem of solving a CME into the reinforcement learning framework. In Section 4 we describe our deep-learning approach and its implementation in TensorFlow. In Section 5 we illustrate this approach with four examples. Finally, in Section 6 we conclude and present directions for future research.
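As a concrete illustration of the two kinds of output functions mentioned above, the following minimal Python sketch estimates a second moment and a set probability by ensemble averaging over simulated end-states. The Poisson samples stand in for SSA end-states (for a simple birth-death network started at zero, the CME solution is in fact Poisson); the rate 5.0 and the set A = {3, …, 7} are arbitrary choices for this example.

```python
import numpy as np

# Stand-in for SSA end-states: for a birth-death network started at zero the
# CME solution is Poisson, so we draw X_i(T) ~ Poisson(5) as a placeholder.
rng = np.random.default_rng(0)
samples = rng.poisson(5.0, size=20000).astype(float)

# Output g(x) = x_i^2: the ensemble average estimates the 2nd moment E[X_i(T)^2].
second_moment = np.mean(samples**2)

# Output g = 1_A with A = {3, ..., 7}: the average of the indicator
# estimates P(X_i(T) in A).
prob_A = np.mean((samples >= 3) & (samples <= 7))
```

For Poisson(5) the exact values are E[X^2] = 30 and P(3 ≤ X ≤ 7) ≈ 0.742, so the two Monte Carlo estimates should be close to these numbers.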

2 Preliminaries

2.1 The stochastic model

We start by describing the continuous-time Markov chain (CTMC) model of a reaction network. Consider a network with n species, denoted by X_1, …, X_n, that participate in K reactions of the form

ν_{1k} X_1 + ⋯ + ν_{nk} X_n ⟶ ν′_{1k} X_1 + ⋯ + ν′_{nk} X_n,  k = 1, …, K,

where ν_{ik} (resp. ν′_{ik}) is the number of molecules of species X_i consumed (resp. produced) by reaction k. The system’s state at any time is the vector x = (x_1, …, x_n) of copy-numbers of the n species. As time advances, this state gets displaced by the stoichiometric vector ζ_k = (ν′_{1k} − ν_{1k}, …, ν′_{nk} − ν_{nk}) when reaction k fires, and this event occurs at rate λ_k(x), where λ_k is the propensity function for reaction k. Commonly λ_k is given by mass-action kinetics [3], i.e. λ_k(x) = c_k ∏_{i=1}^n (x_i choose ν_{ik}), with c_k > 0 being the associated rate constant. There are many ways to formally specify the CTMC representing a reaction network. One way is through its generator, which is an operator that captures the rate of change of the probability distribution of the process (see Chapter 4 in [22]). It is given by

(𝔸f)(x) = Σ_{k=1}^K λ_k(x) ( f(x + ζ_k) − f(x) )

for any f that is a bounded real-valued function on the state-space of the Markov chain. The state-space is assumed to be nonempty and closed under the reaction dynamics, i.e. if x is in the state-space and λ_k(x) > 0 then (x + ζ_k) is also in it. Another way to specify the CTMC is via Kurtz’s random time change representation (see Chapter 6 in [22]),

X(t) = X(0) + Σ_{k=1}^K ζ_k R_k(t),

where R_k(t) is a counting process that counts the number of firings of reaction k in the time-period [0, t]. As is customary in trajectory simulation (e.g. [3, 19]), which will also be required by us for DNN training, we express it in terms of an independent unit-rate Poisson process Y_k as

R_k(t) = Y_k( ∫_0^t λ_k(X(s)) ds ),

see e.g. [21]. With this representation in place, we consider the CTMC (X(t)) on the canonical probability space generated by the independent unit-rate Poisson processes Y_1, …, Y_K.
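The propensity-driven jump dynamics described above can be simulated directly. The sketch below is a minimal Gillespie SSA in Python for a hypothetical birth-death network (∅ → X at rate c1, X → ∅ at rate c2·x); the function and parameter names are our illustration, not the paper's code.

```python
import numpy as np

def ssa_final_state(x0, stoich, propensities, T, rng):
    """Gillespie's SSA: simulate one CTMC path and return the state at time T."""
    t, x = 0.0, np.array(x0, dtype=float)
    while True:
        rates = np.array([lam(x) for lam in propensities])
        total = rates.sum()
        if total == 0.0:
            return x                       # absorbing state: no reaction can fire
        t += rng.exponential(1.0 / total)  # holding time ~ Exp(total rate)
        if t > T:
            return x                       # next firing would occur after the horizon
        k = rng.choice(len(propensities), p=rates / total)
        x += stoich[k]                     # displace state by the stoichiometric vector

rng = np.random.default_rng(1)
# hypothetical birth-death example: 0 -> X at rate c1, X -> 0 at rate c2 * x
c1, c2 = 10.0, 1.0
stoich = np.array([[1.0], [-1.0]])         # zeta_1 = +1, zeta_2 = -1
propensities = [lambda x: c1, lambda x: c2 * x[0]]
end = np.array([ssa_final_state([0], stoich, propensities, 5.0, rng)[0]
                for _ in range(2000)])
mean_at_T = end.mean()
```

For this network started at zero, E[X(T)] = (c1/c2)(1 − e^{−c2·T}) ≈ 9.93 at T = 5, which the ensemble average should approach.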

2.2 Kolmogorov’s forward and backward equations

Let (X(t)) be the CTMC representing the reaction dynamics with some initial state x_0. For any state x, let p(t, x) = P(X(t) = x) be the probability that the CTMC is in state x at time t. These probabilities evolve deterministically in time according to Kolmogorov’s forward equation, more widely known as the Chemical Master Equation (CME) [3, 4]. The CME is the following system of deterministic linear ODEs:

dp(t, x)/dt = Σ_{k=1}^K [ λ_k(x − ζ_k) p(t, x − ζ_k) − λ_k(x) p(t, x) ]

for each state x in the state-space. Note that the number of ODEs in this CME system is equal to the number of states in the state-space, which is typically exorbitantly large or even infinite. Consider an output function g such that E|g(X(T))| < ∞ for some finite time horizon T > 0. Kolmogorov’s backward equation [32] describes the evolution of the martingale V(t, X(t)) = E[ g(X(T)) | X(t) ] (w.r.t. the filtration generated by (X(t))) in the time interval [0, T], and it is given by

∂V(t, x)/∂t + Σ_{k=1}^K λ_k(x) [ V(t, x + ζ_k) − V(t, x) ] = 0

with the terminal condition V(T, x) = g(x) for each state x. The backward Eq (10) will play a key role in our development of a deep learning approach for estimating quantities of the form E[g(X(T))]. In the case where the state-space is finite, with m states, we can enumerate it as {x^(1), …, x^(m)}. Then the CTMC generator in (11) can be expressed as the m × m transition rate matrix Q = [Q_{ij}] given by Q_{ij} = λ_k(x^(i)) if x^(j) = x^(i) + ζ_k for some reaction k, Q_{ii} = −Σ_{k=1}^K λ_k(x^(i)), and Q_{ij} = 0 otherwise. Here we assume for convenience that all stoichiometry vectors (ζ_k-s) are distinct. Viewing p(t) as the vector (p(t, x^(1)), …, p(t, x^(m))), we can express the CME (8) as dp(t)/dt = Q^T p(t). CME (12) admits the closed-form solution p(t) = e^{Q^T t} p(0). Similarly, viewing V(t) as the vector (V(t, x^(1)), …, V(t, x^(m))), the backward Eq (11) can be solved as V(t) = e^{Q(T − t)} g, where g denotes the vector (g(x^(1)), …, g(x^(m))). We are interested in networks where the state-space is extremely large or infinite. Then, numerically computing the matrix exponential in (13) or in (14) is not an option.
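For a finite (or truncated) state-space, both the forward solution and the backward solution are matrix exponentials, and they must agree on the expected output: Σ_x g(x) p(T, x) = V(0, x_0). The sketch below checks this on a birth-death network truncated to {0, …, 60}; the network and parameter values are our example, and we use the row-vector convention p(t) = p(0) e^{Qt}.

```python
import numpy as np
from scipy.linalg import expm

c1, c2, N, T = 10.0, 1.0, 60, 1.0     # birth rate, per-molecule death rate, truncation, horizon
Q = np.zeros((N + 1, N + 1))          # transition rate matrix on states {0, ..., N}
for i in range(N + 1):
    if i < N:
        Q[i, i + 1] = c1              # birth: x -> x + 1 at rate c1
    if i > 0:
        Q[i, i - 1] = c2 * i          # death: x -> x - 1 at rate c2 * x
    Q[i, i] = -Q[i].sum()             # rows sum to zero

p0 = np.zeros(N + 1); p0[0] = 1.0     # deterministic initial state x0 = 0
pT = p0 @ expm(Q * T)                 # forward (CME) solution as a row vector
g = np.arange(N + 1, dtype=float)     # output function g(x) = x
V0 = expm(Q * T) @ g                  # backward solution V(0) = e^{QT} g
forward_mean = pT @ g                 # sum_x g(x) p(T, x)
backward_mean = V0[0]                 # V(0, x0)
```

For this network the exact answer is E[X(T)] = (c1/c2)(1 − e^{−c2·T}) ≈ 6.32; the truncation at N = 60 introduces a negligible error.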

2.3 Parametric sensitivity analysis

Now consider the situation where the propensity functions depend on a scalar parameter θ (like the reaction rate constant for mass-action kinetics, temperature, etc.). Denoting the θ-dependent CTMC as (X_θ(t)), it is often of interest to compute the parametric sensitivity (d/dθ) E[g(X_θ(T))] of the observed output at time T. Such sensitivity values are important for many applications; their direct calculation is generally impossible, but a number of simulation-based approaches have recently been developed to provide efficient numerical estimation of these values; we mention only [23-31]. Theorem 3.3 in [29] proves that

(d/dθ) E[g(X_θ(T))] = Σ_{k=1}^K E[ ∫_0^T (∂λ_k/∂θ)(X_θ(t)) Δ_k V_θ(t, X_θ(t)) dt ],      (16)

where Δ_k V(t, x) ≔ V(t, x + ζ_k) − V(t, x) and V_θ(t, x) is defined by (10) with X(⋅) replaced by X_θ(⋅). The main difficulty in using this formula for computing sensitivities is that the function V_θ is unknown and hence it must be estimated “on the fly” by numerically generating auxiliary paths [29]. In the method we develop in this paper we shall “learn” (i.e., emulate by ML techniques) this function using deep neural networks. This provides a simple, direct way to estimate the parameter sensitivity via formula (16). This approach in fact yields sensitivities w.r.t. all the model parameters in one shot, unlike what is afforded by existing sensitivity estimation approaches. In other words, once the function x ↦ Δ_k V_θ(t, x) is available for each k ∈ 1: K, we can use a common set of simulated trajectories to evaluate Monte Carlo estimators for sensitivities w.r.t. all parameters, based on expression (16), without any extra simulation effort. This is unlike most simulation-based approaches, where estimation of each parameter sensitivity requires an additional set of distinct trajectories.
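On a small truncated network, formula (16) can be sanity-checked deterministically: summing it over the state-space gives S_θ = ∫_0^T p(t) (∂Q/∂θ) V(t) dt, which can be compared against a finite-difference derivative of E[g(X(T))]. The sketch below does this for the birth-death network used in our earlier illustrations, with θ = c1; all names are our own, not the paper's code.

```python
import numpy as np
from scipy.linalg import expm

def bd_Q(c1, c2, N):
    """Rate matrix of the truncated birth-death network 0 <-> X on {0, ..., N}."""
    Q = np.zeros((N + 1, N + 1))
    for i in range(N + 1):
        if i < N:
            Q[i, i + 1] = c1
        if i > 0:
            Q[i, i - 1] = c2 * i
        Q[i, i] = -Q[i].sum()
    return Q

c1, c2, N, T = 10.0, 1.0, 60, 1.0
g = np.arange(N + 1, dtype=float)                  # output g(x) = x
p0 = np.zeros(N + 1); p0[0] = 1.0
Q = bd_Q(c1, c2, N)
dQ = bd_Q(1.0, 0.0, N)                             # elementwise dQ/dc1

# S = int_0^T p(t) (dQ/dtheta) V(t) dt with V(t) = e^{Q(T-t)} g, via the
# trapezoidal rule; the integrand at time t equals
# sum_x p(t,x) sum_k (d lambda_k / d theta)(x) * Delta_k V(t, x).
ts = np.linspace(0.0, T, 201)
vals = np.array([(p0 @ expm(Q * t)) @ dQ @ (expm(Q * (T - t)) @ g) for t in ts])
dt = ts[1] - ts[0]
S = dt * (vals.sum() - 0.5 * (vals[0] + vals[-1]))

# finite-difference check of d/dc1 E[g(X(T))]
h = 1e-4
fd = ((p0 @ expm(bd_Q(c1 + h, c2, N) * T)) @ g
      - (p0 @ expm(bd_Q(c1 - h, c2, N) * T)) @ g) / (2 * h)
```

Here the exact sensitivity is (1 − e^{−c2·T})/c2 ≈ 0.632, and both estimates should match it closely.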

3 Main results

In this section we state and prove the key result on which our deep learning approach depends. Recall that our goal is to estimate E[g(X(T))] (see (1)), which is the same as V(0, x_0) (see (10)) if the initial state of the CTMC is X(0) = x_0. Also recall the random time-change representation (5) and the definition of the reaction counting process R_k from (6). Henceforth we shall denote the centred version of this process as

M_k(t) ≔ R_k(t) − ∫_0^t λ_k(X(s)) ds.      (18)

This centred process is a local martingale w.r.t. the filtration generated by (X(t)) (see Chapter 1 in [33]). We now state an assumption that we require for our approach.

Assumption 3.1 (Non-explosivity of the CTMC) Let (X(t)) be the CTMC given by (5) with deterministic initial condition X(0) = x_0, and consider the filtration it generates. For each M > 0, let τ_M be the stopping time defined by

τ_M ≔ inf{ t ≥ 0 : Σ_{k=1}^K R_k(t) ≥ M }.

Then τ_M → ∞ almost surely as M → ∞.

Remark 3.2 There are a number of works in the literature that provide sufficient conditions for this non-explosivity condition to hold, subject to the form of the propensity functions (see for example [34-36] and the references therein). Under the no-explosion assumption a probability distribution p(t) satisfying the CME exists uniquely (see e.g. Lemma 1.23 in [33]).

Next we present the main result on which our deep learning approach is based.

Theorem 3.3 (Expected output and policy map characterization) Suppose Assumption 3.1 holds for the CTMC (X(t)) and the output function g satisfies (9). Let V̂ be a real number and let β̂ = (β̂_1, …, β̂_K) be a measurable vector-valued function on [0, T] × the state-space such that the following relation holds almost surely:

g(X(T)) = V̂ + Σ_{k=1}^K ∫_0^T β̂_k(t, X(t−)) dM_k(t).      (19)

Then V̂ and β̂ exist uniquely and they can be identified as

V̂ = E[g(X(T))] = V(0, x_0)  and  β̂_k(t, x) = Δ_k V(t, x), k = 1, …, K,      (20)

where V(t, x) is given by (10) and the difference operator Δ_k is as in (17).

Remark 3.4 (Connection to our deep learning approach) Before we prove this theorem, we briefly describe how this result translates into our deep learning approach, details of which will be provided in Section 4.
We can view β̂ as the “policy map” (in the parlance of reinforcement learning) that decides actions based on the current time-state pair (t, x); depending on these actions, the constant initial value V̂ is evolved in the time-interval [0, T] according to the r.h.s. of (19) for any CTMC trajectory (X(t)). Theorem 3.3 shows that the only way the final outcome of this evolution matches g(X(T)) at time T is when V̂ is exactly the expected output E[g(X(T))], and the policy map is exactly (Δ_1 V(t, x), …, Δ_K V(t, x)), where Δ_k V(t, x) is the difference in the expected output at time T due to a single firing of reaction k at time t with the system’s state at x = X(t). Using the modified next reaction method [6], one can easily generate trajectories of the CTMC (X(t)) along with the associated centred reaction counting processes M_1, …, M_K. For each such trajectory, relation (19) can be interpreted in terms of known and unknown quantities as

g(X(T)) [known] = V̂ [unknown] + Σ_{k=1}^K ∫_0^T β̂_k(t, X(t−)) [unknown] dM_k(t) [known].      (21)

We represent the unknown map β̂ by a DNN and consider the unknown V̂ as an optimisation variable. Then, by minimising a “loss” function that measures the discrepancy in relation (21), we try to recover the optimal values of V̂ and β̂ that are given by (20). This allows us to estimate the output of interest (as V̂) and also its parametric sensitivities, by substituting β̂_k(t, x) for Δ_k V(t, x) in (16). Observe that in traditional simulation-based estimation approaches, each simulated trajectory contributes with a small weight (viz. the reciprocal of the sample size) to the Monte Carlo estimator for the output or one of its parameter sensitivities. This is quite unlike the proposed deep learning approach, where each trajectory specifies an almost sure relationship between the unknown quantities that determine both the output and all its parameter sensitivities. Hence the deep learning approach is able to extract more information out of a small number of simulated trajectories, as our examples in Section 5 illustrate. Proof. [Proof of Theorem 3.3] We prove this result in two steps.
We first show that V̂ and β̂ given by (20) satisfy (19) almost surely. Then, we prove that if another pair satisfying (19) exists, it must coincide with (20). Applying Ito’s formula for jump Markov processes to V(t, X(t)) we obtain

V(t, X(t)) = V(0, x_0) + ∫_0^t (∂V/∂s)(s, X(s)) ds + Σ_{k=1}^K ∫_0^t Δ_k V(s, X(s−)) dR_k(s).

Using Kolmogorov’s backward Eq (11) and simplifying, we get

V(t, X(t)) = V(0, x_0) + Σ_{k=1}^K ∫_0^t Δ_k V(s, X(s−)) dM_k(s).      (22)

Noting that V(T, X(T)) = g(X(T)) and V(0, x_0) = E[g(X(T))], we see that (19) holds with V̂ and β̂ given by (20). Now let (Ṽ, β̃) be another pair satisfying (19), i.e.

g(X(T)) = Ṽ + Σ_{k=1}^K ∫_0^T β̃_k(t, X(t−)) dM_k(t).

We subtract (22) (at t = T) from this equation to obtain

0 = (Ṽ − V(0, x_0)) + m(T),      (23)

where

m(t) ≔ Σ_{k=1}^K ∫_0^t [ β̃_k(s, X(s−)) − Δ_k V(s, X(s−)) ] dM_k(s).

Note that (m(t)) is a local martingale w.r.t. the filtration generated by (X(t)), as it is defined as a sum of stochastic integrals whose integrands are adapted to this filtration and whose integrators are local martingales w.r.t. it (see Appendix A.3 in [33]). If τ_M is the stopping time defined in Assumption 3.1, then the stopped process m(t ∧ τ_M) is a martingale, where a ∧ b ≔ min{a, b}. Applying Doob’s maximal inequality [22] on the submartingale |m(t ∧ τ_M)| we obtain, for every ε > 0,

P( sup_{0 ≤ t ≤ T} |m(t ∧ τ_M)| ≥ ε ) ≤ E|m(T ∧ τ_M)| / ε.      (24)

Note that the terms on both sides of the inequality are monotonically increasing in M. This monotonicity is obvious for the term on the l.h.s., and for the term on the r.h.s. it follows from the conditional Jensen’s inequality and from the martingale property. Letting M → ∞ and using the monotone convergence theorem on both sides of (24), we obtain

P( sup_{0 ≤ t ≤ T} |m(t)| ≥ ε ) ≤ E|m(T)| / ε,

where we have used the fact that τ_M → ∞ as M → ∞ due to Assumption 3.1. Relation (23) informs us that m(T) = 0 almost surely (since the stopped martingale has zero expectation, the deterministic quantity Ṽ − V(0, x_0) must vanish), and hence sup_{0 ≤ t ≤ T} |m(t)| = 0 almost surely. This is sufficient to conclude that Ṽ = V(0, x_0) and β̃_k(t, X(t−)) = Δ_k V(t, X(t−)) for each k and any t ∈ [0, T]. As this holds for any CTMC trajectory (X(t)), we must have β̃_k(t, x) = Δ_k V(t, x) for any reachable time-state pair. This completes the proof of this theorem.
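The pathwise identity in Theorem 3.3 can be verified numerically on a network where V is available in closed form. For the birth-death network ∅ → X (rate c1), X → ∅ (rate c2·x) with g(x) = x, one has V(t, x) = x e^{−c2(T−t)} + (c1/c2)(1 − e^{−c2(T−t)}), hence Δ_1 V(t, x) = e^{−c2(T−t)} and Δ_2 V = −Δ_1 V. The sketch below (our illustration, not the paper's code) simulates one trajectory and checks that g(X(T)) equals V(0, x_0) plus the accumulated stochastic integral, up to floating-point error.

```python
import numpy as np

c1, c2, T = 10.0, 1.0, 1.0
rng = np.random.default_rng(3)

def dV(t):
    # Delta_1 V(t, x) for g(x) = x; Delta_2 V(t, x) = -dV(t) (death, zeta_2 = -1)
    return np.exp(-c2 * (T - t))

def int_dV(a, b):
    # closed form of the integral of exp(-c2*(T - s)) ds over [a, b]
    return (np.exp(-c2 * (T - b)) - np.exp(-c2 * (T - a))) / c2

t, x = 0.0, 0
alpha = (c1 / c2) * (1.0 - np.exp(-c2 * T))   # V(0, x0) with x0 = 0
integral = 0.0                                 # accumulates sum_k int Delta_k V dM_k
while t < T:
    birth, death = c1, c2 * x
    total = birth + death
    t_next = min(t + rng.exponential(1.0 / total), T)
    # compensator part: -int_t^{t_next} (lambda_1 - lambda_2) * dV(s) ds (x constant here)
    integral -= (birth - death) * int_dV(t, t_next)
    if t_next < T:
        if rng.random() < birth / total:       # birth fires
            integral += dV(t_next); x += 1
        else:                                   # death fires
            integral -= dV(t_next); x -= 1
    t = t_next

residual = x - (alpha + integral)              # Theorem 3.3: g(X(T)) = alpha + integral
```

The residual is zero in exact arithmetic for every simulated path, which is precisely the almost-sure relation that the training loss penalises.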

4 DeepCME: Deep learning formulation for CME

In this section we detail our deep learning method for solving the CME, referred to as DeepCME. We have implemented this method using the machine learning library TensorFlow [37]. Our source code for generating the ensuing numerical experiments is available at GitHub: https://github.com/ankitgupta83/DeepCME. As outlined in Remark 3.4, our approach is based on the almost sure relationship established in Theorem 3.3. Even though this result was presented for a single output function g(x), it can be easily extended to a vector-valued function g(x) = (g_1(x), …, g_R(x)) by considering the unknown variable V̂ as an R-dimensional vector and the unknown map β̂ as one that takes a time-state pair (t, x) as input and produces an output in the space of R × K matrices. Such an extension is useful because in most applications one is interested in estimating multiple statistical properties (like means, variances, covariances etc.) of the CME solution p(T, ⋅). We now define the “loss” function that measures the discrepancies in the R almost sure relations given by (21). Let ϕ be the following continuously differentiable function:

ϕ(x_1, …, x_R) = Σ_{i=1}^R ψ(x_i / Δ_i),  where ψ(u) = u² if |u| ≤ 1 and ψ(u) = 2|u| − 1 if |u| > 1,

where Δ = (Δ_1, …, Δ_R) is a vector of positive threshold values. We define the loss function as

L(V̂, β̂) = E[ ϕ( g_1(X(T)) − Ŝ_1, …, g_R(X(T)) − Ŝ_R ) ],  with Ŝ_i = V̂_i + Σ_{k=1}^K ∫_0^T β̂_{ik}(t, X(t−)) dM_k(t),      (25)

where the expectation is estimated by computing the sample mean over a finite batch of “training” trajectories. During the training process this loss function is minimised in order to learn the optimal V̂, which estimates our expectations of interest, and the optimal matrix-valued policy map β̂ (see Remark 3.4). This policy map will enable us to estimate sensitivities of the quantities of interest w.r.t. all the model parameters, as discussed in Section 2.3. The threshold values Δ = (Δ_1, …, Δ_R) help in neutralising the disparities in the relative magnitudes of the estimated quantities and the discrepancies in the corresponding almost sure relations.
The loss function minimisation is performed with the stochastic gradient descent (SGD) algorithm, which makes use of the automatic differentiation routines that are built into TensorFlow. Differentiability properties of the function ϕ which defines the loss function are important for convergence of the SGD iterations. Our choice of ϕ makes ϕ(x_1, …, x_R) a differentiable combination of the ℓ1-norm (for components with absolute values greater than 1) and the squared ℓ2-norm (for components with absolute values strictly less than 1). Having such a combination makes the training more robust and promotes sparsity. In DeepCME we first encode the matrix-valued policy map β̂ by a DNN, and we include V̂ as a vector of trainable variables. Then a batch of training trajectories is generated, and based on V̂ and the DNN-encoded policy map β̂, the loss function is evaluated for this training data by measuring the discrepancy (according to (25)) in the almost sure relationship presented in Theorem 3.3. Keeping the training data fixed, this loss function is then minimised by adjusting V̂ and the DNN with SGD for a given number of iterations. Once these iterations are complete, V̂ provides estimates for the expectations of interest, and their parametric sensitivities can be estimated by evaluating Monte Carlo estimators based on expression (16), using the DNN-encoded policy map β̂ and the training trajectories. In the next two subsections we elaborate on the DNN encoding of the policy map and on the loss function computation based on simulated trajectories.
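The combination of penalties described above (quadratic for small arguments, linear for large ones) is, up to constants, the classical Huber function. The sketch below is our reading of such a ϕ and of the sample-mean loss; the paper's exact normalisation constants may differ.

```python
import numpy as np

def phi(u):
    # C^1 Huber-type penalty: quadratic for |u| <= 1, linear (2|u| - 1) beyond;
    # the two branches and their derivatives agree at |u| = 1.
    a = np.abs(u)
    return np.where(a <= 1.0, a**2, 2.0 * a - 1.0)

def loss(residuals, deltas):
    # residuals: shape (M, R), one row per training trajectory, one column per
    # output; deltas: positive thresholds Delta_1, ..., Delta_R normalising
    # the relative magnitudes of the R discrepancies.
    return float(np.mean(np.sum(phi(residuals / deltas), axis=1)))
```

For instance, two single-output trajectories with residuals 0.5 and 2.0 (thresholds 1) contribute 0.25 and 3.0, giving a mean loss of 1.625.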

4.1 DNN encoding of the policy map

Recall from Section 2.2 that if the state-space is finite then V(t, x) can in principle be found by exponentiating the transition rate matrix Q multiplied by (T − t) (see (14)). Hence, if λ = λ_1 + iλ_2 is an eigenvalue of Q, then on the associated eigenspace we would expect the dependence of V(t, x) on time t to be of the form e^{λ_1(T−t)} cos(λ_2(T−t)). Motivated by this rationale, rather than passing the time-values t directly as inputs to the DNN that encodes β̂, we shall pass temporal features of the form

e^{λ_{j1}(T−t)} cos( λ_{j2}(T−t) + ψ_j ),  j = 1, …, r,      (26)

where λ_{11}, …, λ_{r1}, λ_{12}, …, λ_{r2} are 2r trainable variables that represent the real and imaginary parts of the r dominant eigenvalues of the generator of the CTMC. Additionally, r trainable variables ψ_1, …, ψ_r are included to represent ‘phase shifts’. Problem-specific temporal features, like the ones we consider, have been successfully employed in existing deep learning methods for ODE-based reaction network models (see, e.g., [38] and the references therein). Note that the mapping between time t and the temporal features is one-to-one, and hence no information is lost by substituting time inputs with temporal features. We encode the policy map β̂ as a fully connected feedforward deep neural network whose architecture is schematically shown in Fig 1. This neural network consists of an input layer, L hidden layers and an output layer. Mathematically, the DNNs Φ considered here are determined by a tuple

Φ = ( (A_1, ρ_1), …, (A_{L+1}, ρ_{L+1}) ),      (27)

where in layer ℓ = 1, …, L + 1, the map A_ℓ is an affine transformation, i.e. A_ℓ(z) = W_ℓ z + b_ℓ, with weight matrix W_ℓ and bias vector b_ℓ. As mentioned, in the presently considered DNNs, the input layer takes the temporal features and the state vector x = (x_1, …, x_n).
Fig 1

Architecture of the neural network.

DNN architecture to encode the matrix-valued map . The inputs (t, x) are passed to an input layer, which leaves the state values x unchanged but activates a dictionary of temporal features (26). The resulting output is propagated through a DNN with L fully connected hidden layers, and an additional output layer with each layer having N nodes. For simplicity, we assume no sparsity in the weight matrices and the bias vectors of these layers. In the final step, the output from the output layer is cast into the policy-map matrix corresponding to inputs (t, x). In Section 4.2 we describe how the loss function can be computed using this matrix-valued map for a batch of stochastic trajectories.

The nonlinearities in (27) act on vectors component-wise, with possibly different activations at each layer. The number L + 1 denotes the number of layers (sometimes referred to as the depth) of the DNN Φ, and L denotes the number of hidden layers of the DNN Φ. With the DNN Φ, we associate a realization, i.e., the map

R(Φ): z ↦ A_{L+1}( ρ_L( A_L( ⋯ ρ_1(A_1(z)) ⋯ ) ) ).

The relation between the DNN parameters Φ and the realisation as a map is not one-to-one: several choices of Φ may give rise to the same realization R(Φ). This over-parametrization of DNNs is well-known to cause multiple minima in loss functions of DNN parameters, and to obstruct the use of efficient optimisation algorithms in numerical DNN training. The goal of DNN approximation is to provide a parsimonious surrogate map R(Φ) for many-parametric, input-output maps which are not explicitly known and are accessible computationally only through possibly noisy evaluations. The input layer transforms the time-value t into the temporal features (26) but leaves the state vector x = (x_1, …, x_n) unchanged. For the layers, we assume a fixed width, i.e., each layer consists of N nodes (including the output layer). We also assume that no activation is applied at the output layer, i.e. ρ_{L+1} is the identity function, and that all activations in the hidden layers are equal, i.e. ρ_{ℓ,i} = ρ for ℓ = 1, …, L and i = 1, …, N. In the ensuing numerical examples, we employ the so-called ReLU activation for the hidden layers, which is given by

ReLU(x) = max{x, 0}.      (28)

Remark 4.1 More generally, for k ∈ ℕ, we may choose the activations ρ(x) = (max{x, 0})^k, observing that increasing the value of k increases the differentiability of realizations of the DNN Φ. This may be of relevance in cases where the diffusion limits for large copy-number counts of particular species imply higher smoothness of the map x ↦ p(T, x).
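A forward pass of such a policy network can be sketched in a few lines of numpy. The temporal-feature form e^{λ1(T−t)} cos(λ2(T−t) + ψ) is our reading of (26), and we give the output layer R·K nodes that are reshaped into the policy matrix, which is a simplification of the figure's fixed-width description; the weights are random placeholders rather than trained values.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, R, L, N, r, T = 3, 6, 2, 2, 4, 1, 1.0   # species, reactions, outputs, hidden layers, width, features, horizon

# trainable temporal-feature parameters (real/imaginary eigenvalue parts and phase shifts)
lam1, lam2, psi = -0.5 * np.ones(r), 1.0 * np.ones(r), np.zeros(r)

def temporal_features(t):
    # our reading of (26): exp(lam1*(T-t)) * cos(lam2*(T-t) + psi)
    return np.exp(lam1 * (T - t)) * np.cos(lam2 * (T - t) + psi)

relu = lambda z: np.maximum(z, 0.0)
dims = [r + n] + [N] * L + [R * K]            # input -> L hidden layers -> output
Ws = [rng.normal(0.0, 0.5, (dims[i + 1], dims[i])) for i in range(len(dims) - 1)]
bs = [np.zeros(dims[i + 1]) for i in range(len(dims) - 1)]

def policy(t, x):
    """DNN realisation: (t, x) -> R x K policy matrix."""
    z = np.concatenate([temporal_features(t), x])
    for W, b in zip(Ws[:-1], bs[:-1]):
        z = relu(W @ z + b)                    # hidden layers with ReLU activation
    z = Ws[-1] @ z + bs[-1]                    # identity activation at the output layer
    return z.reshape(R, K)

beta = policy(0.3, np.array([1.0, 0.0, 2.0]))
```

Evaluating `policy` on a time-state pair yields the R × K matrix that plays the role of β̂(t, x) in the loss computation of Section 4.2.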

4.2 Loss function computation based on the training data

To numerically evaluate the loss function, we require simulated training trajectories of the form (X(t), M(t)), t ∈ [0, T], where X denotes the CTMC and M = (M_1, …, M_K) is the vector of centred reaction counting processes defined by (18). Such trajectories can be easily generated with Anderson’s modified next reaction method (mNRM) [6]. We discretise the time-interval [0, T] as 0 = t_0 < t_1 < ⋯ < t_J = T. Based on this partition, each simulated trajectory can be viewed as a collection of J + 1 triplets (t_j, X(t_j), M(t_j)), j = 0, 1, …, J. For each j we pass the time-state pair (t_j, X(t_j)) as input to the DNN-encoded matrix-valued policy map to obtain β̂(t_j, X(t_j)). This allows us to compute the r.h.s. of (21) iteratively as

Ŝ_{j+1} = Ŝ_j + β̂(t_j, X(t_j)) ( M(t_{j+1}) − M(t_j) ),  j = 0, …, J − 1,

with Ŝ_0 = V̂. Here each Ŝ_j is an R × 1 vector, β̂(t_j, X(t_j)) is an R × K matrix and (M(t_{j+1}) − M(t_j)) is a K × 1 vector. Following this scheme we can compute the terminal value Ŝ_J^(q) for the q-th simulated trajectory. With M such i.i.d. trajectories, the loss function (25) can be estimated as

(1/M) Σ_{q=1}^M ϕ( g(X^(q)(T)) − Ŝ_J^(q) ).      (29)

Here, we made use of (23). Remark 4.2 In the loss function (25) and its MC estimate (29), one could add a sparsity-promoting regularization term, in which case (25) would be augmented by a penalty term μ‖W‖_1, where μ ≥ 0 is a penalty parameter and the ℓ1-norm of the weight matrices W of Φ promotes sparsity. In the numerical experiments we report, we did not use this device. Remark 4.3 When the time-interval [0, T] is large, instead of using a single DNN to approximate the policy map β̂, it may be beneficial to employ multiple temporal DNNs that are uniformly distributed in the time-interval [0, T]. All these DNNs have the same structure, as shown in Fig 1. If N such DNNs are employed, then we use the m-th DNN to represent the policy map for t ∈ [(m − 1)δ, mδ), where m = 1, …, N and δ = T/N. Distributing DNNs across time reduces the complexity of the policy map (as a function of time t) that needs to be learned. This is helpful in scenarios where the stochastic dynamics has intricate temporal features, such as oscillations.
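The iteration Ŝ_{j+1} = Ŝ_j + β̂(t_j, X(t_j))(M(t_{j+1}) − M(t_j)) and the Monte Carlo loss estimate can be sketched as follows (numpy; `policy` stands for the DNN realisation and `phi` for the penalty function, both supplied by the caller — these names are our illustration).

```python
import numpy as np

def terminal_sum(Vhat, policy, ts, xs, Ms):
    """Iterate S_{j+1} = S_j + policy(t_j, x_j) @ (M_{j+1} - M_j), S_0 = Vhat.
    ts: (J+1,) time grid; xs: (J+1, n) states; Ms: (J+1, K) centred counts."""
    S = np.array(Vhat, dtype=float)
    for j in range(len(ts) - 1):
        S = S + policy(ts[j], xs[j]) @ (Ms[j + 1] - Ms[j])   # (R,K) @ (K,) -> (R,)
    return S

def mc_loss(Vhat, policy, phi, batch, deltas):
    """Sample-mean loss over a batch of (ts, xs, Ms, gT) tuples, gT = g(X(T))."""
    vals = [np.sum(phi((gT - terminal_sum(Vhat, policy, ts, xs, Ms)) / deltas))
            for (ts, xs, Ms, gT) in batch]
    return float(np.mean(vals))
```

In the actual method this computation is expressed in TensorFlow so that gradients w.r.t. V̂ and the DNN weights can be taken automatically.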

5 Examples

We now present four examples to illustrate our DeepCME method. All these examples are reaction networks with n species, denoted by X1, …, Xn, and 2n reactions. By varying n, we shall investigate how the DeepCME method performs as the network gets larger, and compare its performance with simulation-based methods. In all the examples, we assume that all species have zero copy-numbers initially, and we consider two output functions, g1(x) = xn and g2(x) = xn², whose expectations are to be estimated under the probability distribution given by the CME solution at time T = 1. In other words, we shall use DeepCME to estimate the first two moments (31) of the copy-number of species Xn at time T. We shall compare these estimates to those obtained by simulating 1000 CTMC trajectories with mNRM [6]. Our method DeepCME also yields estimates of the sensitivities of the estimated moments (31) w.r.t. all model parameters. We plot these estimates and compare them with the estimates obtained via the simulation-based Bernoulli Path Algorithm (BPA) [29]. These latter estimates are based on a sample of size 1000, and for each sample BPA requires the generation of a certain number of auxiliary paths (see Section 2.3), which we set to 10 in our examples.

In all the examples, we encode the policy map as a DNN with L = 2 hidden layers and N = 4 nodes per layer (see Fig 1), irrespective of the number of species n. The activation function for all hidden-layer nodes is ReLU(x) (see (28)), and we choose r = 1 for the temporal features (26) to transform the time-values. For loss function computation, we partition the time-interval [0, T] into J = 50 equal-size time-increments. The neural network is trained with a batch of M = 100 trajectories generated a priori with mNRM (see Section 4), and another such batch of M = 100 trajectories is used for validation. We display the loss function for the validation trajectories to track the training process.
To facilitate comparison across network sizes, we normalise all the loss function trajectories to equal one at the start of training. Note that the definition of our loss function (25) depends on certain threshold values Δ = (Δ1, Δ2). We choose these values based on the sample mean and the sample standard deviation of the values of the output function g over the trajectories in the training batch. Finally, to gauge the computational efficiency of DeepCME, we compare the total central processing unit (CPU) time it requires (including the time to generate training and validation trajectories) to the total CPU time required by the simulation-based approaches (mNRM and BPA) to estimate the expectations (31) and all their parameter sensitivities. All computations were performed on the Euler computing cluster of ETH Zurich [39].
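The normalisation of the loss trajectories described above is a one-line operation; a minimal sketch (the function name is ours):

```python
import numpy as np

def normalise_loss(loss_values):
    """Scale a loss trajectory so it equals 1 at the start of training,
    enabling comparison across network sizes n (initial loss nonzero)."""
    loss_values = np.asarray(loss_values, dtype=float)
    return loss_values / loss_values[0]
```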

5.1 Independent birth death network

As our first example (see Fig 2A), we consider a network of n species all undergoing independent birth-death reactions. We set k = 10 and γ = 1. The propensity functions obey mass-action kinetics and are hence affine functions of the state variable x.
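Since this immigration-death network has affine propensities and starts at zero copy-numbers, each species is Poisson distributed at time t with mean m(t) = (k/γ)(1 − e^{−γt}), so the first two moments at T and the sensitivities of the mean w.r.t. k and γ have closed forms. The sketch below (our own helper, not from the paper) computes the reference values that the DeepCME estimates can be checked against:

```python
import math

def birth_death_moments(k=10.0, gamma=1.0, T=1.0):
    """Exact moments of one birth-death species started at 0:
    X(t) ~ Poisson(m(t)) with m(t) = (k/gamma)*(1 - exp(-gamma*t))."""
    m = (k / gamma) * (1.0 - math.exp(-gamma * T))
    second = m + m * m        # E[X^2] = Var + mean^2 = m + m^2 (Poisson)
    # sensitivities of the mean w.r.t. the model parameters k and gamma
    dm_dk = (1.0 - math.exp(-gamma * T)) / gamma
    dm_dgamma = (k / gamma) * T * math.exp(-gamma * T) - m / gamma
    return m, second, dm_dk, dm_dgamma
```

With k = 10, γ = 1, T = 1 this gives a mean of 10(1 − e⁻¹) ≈ 6.32, which is what the mNRM confidence intervals in Fig 2 should bracket.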
Fig 2

Independent birth death network.

(A) Depicts the network with n species and 2n reactions with mass-action kinetics. (B) The CPU times are shown for DeepCME for different values of n (denoted as # species), and for comparison the time needed by simulation based methods (mNRM for function estimates and BPA for parameter sensitivities) with 1000 trajectories is also indicated. (C) Plots the validation loss function w.r.t. training steps for various n values. Panels (D-F) show estimates for the function values ( and ) at T = 1 and the parameter sensitivities. The estimates with simulation based methods are shown as 95% confidence intervals with 1000 samples.

For n = 5, 10, 20 species, we apply DeepCME to this reaction network by training the neural network for 10′000 SGD iterations. Based on the trained neural network, we compute estimates of the first two moments (31) and their sensitivities to both model parameters k and γ. We also estimate these quantities with the simulation-based methods (mNRM and BPA) using 1000 samples, and since the propensity functions are linear we can also compute them exactly. In the plots shown in Fig 2D, 2E and 2F, we compare the estimates from all these approaches for various values of n. Observe that DeepCME is in general quite accurate in estimating both the moments and their parametric sensitivities, but there are a few cases where the error is significant (e.g. the sensitivity w.r.t. γ for n = 20). These errors can in principle be reduced by employing a different neural network to encode the policy map. In our experience, these errors were also reduced in some cases by including a sparsity-promoting term in the loss function (see Remark 4.2), but the result was highly sensitive to the relative weight (i.e. the parameter μ in (30)) of this term. The CPU times required by DeepCME and the simulation-based methods for obtaining moment and sensitivity estimates are plotted in Fig 2B.
Note that the CPU time for the simulation-based methods grows linearly with the network size n, whereas for DeepCME this growth is sub-linear owing to the fixed structure of the underlying neural network. Despite this fixed structure, the validation loss function trajectories for DeepCME are similar for all n (see Fig 2C), indicating that the training process depends only weakly on the number of species, probably because the species evolve independently.

5.2 Linear signalling cascade

Our second example is a linear cascade with n species and 2n reactions (see Fig 3A), in which species Xi catalyses the production of species Xi+1. We set β0 = 10, k = 5 and γ = 1. As in the previous example, all the propensity functions obey mass-action kinetics and are hence affine functions of the state.
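For this cascade, the affine propensities again give closed moment ODEs for the means: dm1/dt = β0 − γm1 and dmi/dt = k·m_{i−1} − γmi for i ≥ 2. A small classical RK4 integration (an illustrative sketch, not the paper's code) reproduces the exact first moments used for comparison:

```python
import numpy as np

def cascade_means(n=5, beta0=10.0, k=5.0, gamma=1.0, T=1.0, steps=2000):
    """Integrate dm1/dt = beta0 - gamma*m1 and
    dmi/dt = k*m_{i-1} - gamma*mi (i = 2..n) with RK4 from m(0) = 0."""
    def rhs(m):
        dm = np.empty(n)
        dm[0] = beta0 - gamma * m[0]
        dm[1:] = k * m[:-1] - gamma * m[1:]
        return dm
    m = np.zeros(n)
    h = T / steps
    for _ in range(steps):
        k1 = rhs(m)
        k2 = rhs(m + 0.5 * h * k1)
        k3 = rhs(m + 0.5 * h * k2)
        k4 = rhs(m + h * k3)
        m = m + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return m
```

The first component agrees with the birth-death formula m1(T) = (β0/γ)(1 − e^{−γT}), which serves as a sanity check of the integration.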
Fig 3

Linear signalling cascade.

(A) Depicts the network with n species and 2n reactions with mass-action kinetics. (B) The CPU times are shown for DeepCME for different values of n (denoted as # species), and for comparison the time needed by simulation based methods (mNRM for function estimates and BPA for parameter sensitivities) with 1000 trajectories is also indicated. (C) Plots the validation loss function w.r.t. training steps for various n values. Panels (D-F) show estimates for the function values ( and ) at T = 1 and the parameter sensitivities. The estimates with simulation based methods are shown as 95% confidence intervals with 1000 samples.

For n = 2, 5, 10 species, we apply DeepCME to this reaction network by training the neural network for 10′000 SGD iterations. Then we compute the moment estimates (31) and their sensitivities to all the model parameters. These quantities are also estimated with the simulation-based methods (mNRM and BPA) using 1000 samples, and as in the previous example, the linearity of the propensity functions enables us to compute them exactly as well. In the plots shown in Fig 3D, 3E and 3F, we compare the estimates from all these approaches for various values of n. Observe that DeepCME is accurate in estimating the moments, but some of the parameter sensitivity estimates are not very accurate (e.g. the sensitivity w.r.t. k for n = 10). This is because the training process is not fully successful, as indicated by the validation loss function trajectories shown in Fig 3C. The CPU times for DeepCME and the simulation-based methods are plotted in Fig 3B and, as expected, they show sub-linear growth w.r.t. n for the former but linear growth for the latter.

It is natural to ask whether the accuracy of the estimates provided by DeepCME for n = 10 can be improved by making the DNN "deeper" (by increasing the number of hidden layers L) or "wider" (by increasing the number of nodes per layer N). We tested this by doubling each of these shape parameters in turn, while keeping the other fixed, and rerunning the DeepCME training procedure. As the results in Fig 4A and 4B indicate, changing the DNN shape parameters did not improve the accuracy of the estimates. However, we found that the accuracy does improve when the number of training trajectories is increased (see Fig 4C), and this improvement is quite substantial in some cases (e.g. the sensitivity w.r.t. γ for n = 10).
Fig 4

Linear signalling cascade (continued).

In panels (A) and (B) we illustrate that the accuracy of the DeepCME estimates remains unaltered when the DNN shape parameters—N (number of nodes per layer) and L (number of hidden layers)—are doubled from their default values of N = 4 and L = 2. In panel (C) we illustrate that the accuracy of these estimates improves when the number of training trajectories is increased from M = 100 to M = 500.

5.3 Nonlinear signalling cascade

We now consider a variant of the network in the previous example in which the catalytic production of species Xi+1 by species Xi is non-linear (see Fig 5A), being given by an activating Hill propensity with a basal rate, where b = 1, k = 100, k0 = 10 and H = 1. The other reactions have mass-action kinetics as in the previous example, with the same rate constants β0 = 10 and γ = 1.
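The exact Hill formula (32) is not reproduced in this extraction; the sketch below uses one standard parameterisation of an activating Hill propensity with basal rate, λ(x) = b + k·x^H/(k0^H + x^H), which is consistent with the stated parameters but should be checked against (32):

```python
def hill_activation(x, b=1.0, k=100.0, k0=10.0, H=1.0):
    """Assumed activating Hill propensity with basal rate:
    lambda(x) = b + k * x^H / (k0^H + x^H).
    Equals b at x = 0 and saturates at b + k for large x."""
    return b + k * x ** H / (k0 ** H + x ** H)
```

With H = 1 and k0 = 10, the propensity reaches half of its maximal induction (b + k/2) at a copy-number of 10.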
Fig 5

Nonlinear signalling cascade.

(A) Depicts the network with n species and 2n reactions. The reactions shown with red dashed-arrow have propensities given by a nonlinear activating Hill function (32). Other reactions have mass-action kinetics. (B) The CPU times are shown for DeepCME for different values of n (denoted as # species), and for comparison the time needed by simulation based methods (mNRM for function estimates and BPA for parameter sensitivities) with 1000 trajectories is also indicated. (C) Plots the validation loss function w.r.t. training steps for various n values. Panels (D-F) show estimates for the function values ( and ) at T = 1 and the parameter sensitivities (only the significant sensitivities are shown). The estimates with simulation based methods are shown as 95% confidence intervals.

For n = 2, 5, 10 species, we apply DeepCME to this reaction network by training the neural network for 10′000 SGD iterations. Then we compute the moment estimates (31) and their sensitivities to all the model parameters. These quantities are also estimated with the simulation-based methods (mNRM and BPA) using 1000 samples; unlike in the previous examples, we cannot compute them exactly due to the nonlinear propensities. In the plots shown in Fig 5D, 5E and 5F, we compare the estimates from DeepCME and the simulation-based approaches for various values of n. Observe that DeepCME is reasonably accurate in estimating the moments and their parametric sensitivities for all values of n. The success of the training process is shown by the validation loss function profiles in Fig 5C. Note that these loss functions increase monotonically with n, consistent with the observation that the errors in the DeepCME estimates increase with n (see Fig 5D, 5E and 5F). The CPU times for DeepCME and the simulation-based methods are displayed in Fig 5B and, as in the previous examples, they show sub-linear growth w.r.t. n for the former but linear growth for the latter.

5.4 Linear signalling cascade with feedback

Lastly, we consider another variant of the network in the second example in which there is negative feedback: the production of X1 by species Xn (see Fig 6A) is given by a repressing Hill function with a basal rate, where b = 1, k = 100, k0 = 10 and H = 1. The other reactions have mass-action kinetics as in the second example, with the same rate constants k = 5 and γ = 1. Due to the presence of feedback, oscillations can arise in the dynamics, and to better represent this temporal dependence of the policy map we encode it with N = 5 identical DNNs (see Remark 4.3).
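At evaluation time, the multi-DNN scheme of Remark 4.3 reduces to a time-binning rule for selecting which of the N networks handles a given t; a minimal sketch (the function name is ours):

```python
def select_dnn(t, T=1.0, N=5):
    """Return the 0-based index m of the DNN covering t in
    [m*delta, (m+1)*delta) with delta = T/N; t = T maps to the last DNN."""
    delta = T / N
    return min(int(t / delta), N - 1)
```

During training, each triplet of a simulated trajectory would be routed to the network returned by this rule, so that every DNN only learns the policy map on its own sub-interval of [0, T].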
Fig 6

Linear signalling cascade with feedback.

(A) Depicts the network with n species and 2n reactions. The reaction shown with red dashed-arrow has propensity given by a nonlinear repressing Hill function (33). All other reactions have mass-action kinetics. (B) The CPU times are shown for DeepCME for different values of n (denoted as # species), and for comparison the time needed by simulation based methods (mNRM for function estimates and BPA for parameter sensitivities) with 1000 trajectories is also indicated. (C) Plots the validation loss function w.r.t. training steps for various n values. Panels (D-F) show estimates for the function values ( and ) at T = 1 and the parameter sensitivities (only the significant sensitivities are shown). The estimates with simulation based methods are shown as 95% confidence intervals.

For n = 2, 5, 10 species, we apply DeepCME to this reaction network by training the neural network for 10′000 SGD iterations. Then we compute the moment estimates (31) and their sensitivities to all the model parameters, and we also estimate these quantities with the simulation-based methods (mNRM and BPA) using 1000 samples. In the plots shown in Fig 6D, 6E and 6F, we compare the estimates from both approaches for various values of n. Observe that DeepCME is quite accurate in estimating the moments for n = 2, 5, and the parametric sensitivities only for n = 2. For n = 5, 10 only some of the sensitivities are accurate, while the others are not with our neural network architecture. The validation loss function trajectories are shown in Fig 6C. The CPU times for DeepCME and the simulation-based methods are plotted in Fig 6B and show a similar growth pattern to our earlier examples.

6 Conclusion

Over the past couple of decades, stochastic reaction network models have become increasingly popular as a modelling paradigm for noisy intracellular processes. Many consequential biological studies have experimentally highlighted the random dynamical fluctuations within living cells, and have employed such stochastic models to quantify the effects of this randomness in shaping the phenotype at both the population and single-cell levels [40]. As experimental technologies continue to improve at a rapid pace, it is urgent to develop computational tools able to bring larger and more realistic systems within the scope of stochastic modelling and analysis.

The central object of interest in stochastic reaction network models is a high-dimensional system of linear ODEs, called the chemical master equation (CME). Numerical solutions to the CME are difficult to obtain, and commonly used simulation-based schemes for estimating the solutions often require an inordinate amount of computational time, even for moderately-sized networks. Inspired by the recent success of machine learning approaches in solving high-dimensional PDEs [13], our goal in this paper is to devise a similar strategy, based on deep reinforcement learning, to numerically estimate solution statistics of CMEs. We develop such a method, called DeepCME, and illustrate it with a number of examples. The neural network we train in DeepCME provides estimates for expectations under the CME solution, and in principle it also provides estimates for the sensitivities of these expectations w.r.t. all the model parameters without any extra effort. Such parametric sensitivities are important for many applications, such as evaluating a network's robustness properties [41] or identifying its critical components [42], but they are even more difficult to estimate than solutions of the CME [23-31]. Our work opens up several directions for future research.
The machine-learning based computational framework and the mathematical formulation we provide allow one to deploy and transfer strong ML methodologies to the quantitative analysis of, and data assimilation into, complex CRNs. The present, basic approach can be improved and extended in a number of ways.

Firstly, it needs to be investigated how the architecture of the neural network can be optimally selected, based on the CRN model, for improved convergence of the training process. Overparametrised neural network architectures may be regularised by adding suitable weight-bias penalties to the loss function. The resulting improved convergence would increase the accuracy of the estimates provided by DeepCME, especially for the parameter sensitivities, and reduce the number of trajectories needed for training.

Secondly, although the proposed framework requires relatively few 'exact' stochastic simulations of the dynamics, it could nevertheless be computationally prohibitive for many large biological networks, especially if they are multiscale in nature [43, 44]. It might be possible to improve efficiency by replacing exact simulations with τ-leaping simulations [20], multi-level schemes [21] or simulations based on reduced models for multiscale networks [16, 43-45]. Incorporating such approaches for generating training trajectories would make our approach computationally feasible for much larger networks than those considered here. In particular, multi-level simulation schemes based on coupling techniques [21] would allow one to construct a lower-variance estimator for the loss function (29). This could in turn benefit the accuracy and the convergence of the training process (see, e.g. [46] for the development of multilevel DNN training algorithms, albeit in another class of applications).
In the context of multiscale networks, identifying the appropriate copy-number scalings that give rise to reduced models with simpler dynamics is a highly specialised task requiring careful theoretical analysis [16]. However, our approach can be extended to "learn" these scaling factors during the training process by including them as trainable subnetworks in the ML feature space and employing them to scale the state-vectors in the input layer of the DNNs (see Fig 1). It is quite possible that incorporating these scaling factors would enhance the expressivity of the DNN. Thirdly, the parameter sensitivities that we compute could be employed in an 'outer' gradient descent method to infer model parameters by matching the computed statistics of the CME solution with experimental data [47]. On the theoretical front, greater mathematical effort is required to understand how deep reinforcement-learning approaches can help circumvent the curse of dimensionality inherent to CMEs. Alternative training approaches, such as Generative Adversarial Nets (GANs), may also be suitable for accelerating the training process (see, e.g., [48]). Finally, the architecture of the DNNs may include feature spaces comprising parametric dictionaries of motifs, which are adjusted during training to the reaction rates and to the kinetics of the CRN under consideration.

31 Jul 2021

Dear Prof. Khammash,

Thank you very much for submitting your manuscript "DeepCME: A deep learning framework for solving the Chemical Master Equation" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.
Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments. Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,
James R. Faeder, Associate Editor, PLOS Computational Biology
Daniel Beard, Deputy Editor, PLOS Computational Biology

***********************

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately: [LINK]

Reviewer's Responses to Questions

Comments to the Authors: Please note here if the review is uploaded as an attachment.

Reviewer #1: review is uploaded as attachment

Reviewer #2: At a high level, the authors are using machine learning techniques (basic feedforward neural networks) to solve for (time-dependent) expectations of stochastically modeled reaction networks. I think that this paper has the potential to be very good.
It is the rare paper I review in which I think to myself "I wish I had done this!". Here are my main comments/critiques that I would like the authors to consider.

1. The title claims to solve the chemical master equation (Kolmogorov's forward equation). However, the paper is actually focused on solving for expectations. Now, I agree that a solution to the CME can come from using indicator functions. However, that does not really seem to be a natural application for this method (and there are no examples showing whether the method is good for that).

2. A very important part of the paper is Theorem 3.3. More precisely: equation (19), which gives an equality that must hold (almost surely) for every *path*. In my view, the heart of the proof is that m(t) (unnumbered equation between (23) and (24)) is a local martingale. A few more words can be given here (or a reference) proving/saying that \Delta \hat V_k(s, X(s)) is adapted to the proper filtration and \tilde R_k is a local martingale, thus.... Pointing to the proper reference will be helpful to readers not familiar with these things.

3. Page 10. Why was \phi as given chosen? (This is minor.)

4. This is my only major critique. I could not figure out exactly what the input to the NN was. This is explained on pages 10 and 11, but I did not understand it. Of course, I would really like to know how a path gets inputted since this is an absolutely key part of the method. I wonder if a simple example would be helpful here (I would suggest the birth death process of Section 5.1).

5. Why were the NNs used so small (L = 2 and N_H = 4)?

6. The method does not seem viable at this stage for computing sensitivities. Perhaps soften your language to something along the lines of "we note that sensitivities can theoretically be computed as well via these methods"; you could then give the reasoning as is.
However, you could point out that it sometimes works and sometimes doesn't (since the examples are all over the map), and then point to this as future research.

Reviewer #3: DeepCME is a novel deep learning framework for the numerical solution of the chemical master equation (CME). The authors reformulate the problem of obtaining the expectations of specific functions of the state space, and their sensitivities, using Kolmogorov's backward equation (Theorem 3.3). Using this reformulation they construct appropriate loss functions to train deep neural networks in a reinforcement framework using a relatively small number of stochastic simulations (SSA). They demonstrate the power and utility of their method using a series of examples. This is a timely, well written and important study. I have the following specific comments:

- Could the authors provide some insight into the advantage of the specific formulation, compared for example to the forward equation, in solving the CME using deep neural networks?

- At the beginning of Section 5 the authors describe specific choices of a large number of hyperparameters (number of layers, nodes, etc.). Could the authors explore the significance of at least some of these choices on the performance in an example?

- The authors show the CPU time required for direct SSA vs DeepCME. Could the authors also make a comparison in terms of the number of simulations used in training, how that changes with n, and the quality of the results?

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file).
The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose "no", your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No
Reviewer #3: Yes: Vahid Shahrezaei

Figure Files: While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements: Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc.
For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility: To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

References: Review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article's retracted status in the References list and also include a citation and full reference for the retraction notice.

Submitted filename: Gupta_review.pdf

10 Sep 2021

Submitted filename: ReviewerResponse.pdf

18 Oct 2021

Dear Prof. Khammash,

Thank you very much for submitting your manuscript "DeepCME: A deep learning framework for computing solution statistics of the Chemical Master Equation" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.
In particular, Reviewer 2 is asking for clarification on the exact form of input to the neural net, which would be needed to replicate the method.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

James R. Faeder
Associate Editor
PLOS Computational Biology

Daniel Beard
Deputy Editor
PLOS Computational Biology

***********************

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately: [LINK]

Reviewer's Responses to Questions

Comments to the Authors: Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors have sufficiently addressed my comments.

Reviewer #2: The manuscript has improved.
However, one of my main worries about the manuscript (and apparently this worry was also shared by Reviewer #1) remains unresolved. In particular, the inputs to the NN are still quite unclear to me. Let me be very explicit: I have no idea what is going on with the lambda's and phi's of what is now section 4.1. The authors describe them as "temporal features", but that doesn't mean anything to me. A trajectory is simply a listing of states and times. The authors need to explain how a trajectory is input into the NN. If there is a mapping to the lambda's and phi's somehow, then so be it, but please tell us clearly what that mapping is. At this point I would not be able to implement this method, as the whole portion on "temporal features" is mysterious.

Here are some smaller comments:

1. In the abstract it again says “The goal… estimating solutions of high-dimensional CMEs…”; this is not really what it is doing. The next line is more accurate. Maybe merge and fix? (Except the next line says "arbitrarily chosen functions", which seems strange.)

2. (Minor.) Line 63: “which for” should be “for which”.

3. Line 123: this is super small, but in the definition of the Q matrix you seem to be assuming that each reaction vector is uniquely associated with a reaction. This is not always the case.

4. Line 143: “on which our deep learning approach depends on” has one too many “on”s.

5. Same line: “which is same” should be “which is the same”.

6. Line 250: Why is this line here: “The relation between Φ and its realisation R(Φ) as a map is not one-to-one.”? I guess it’s true: there can be lots of choices of rho’s and T’s that give R. But is this important (i.e., I’m concerned I’m missing something)?

Reviewer #3: The authors have addressed all of my concerns in the revised manuscript.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?
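Reviewer #2's third comment touches a standard subtlety in assembling the CME generator: when two distinct reactions share the same stoichiometry vector ζ, their propensities must be summed into a single off-diagonal entry of Q, so defining Q entry-wise "per reaction" silently overwrites one of them. A minimal sketch of the correct construction, using a hypothetical one-species birth-death network and a truncated state space of my own choosing (not the networks from the manuscript):

```python
import numpy as np

# Hypothetical network on one species S, truncated to copy numbers 0..N.
# Reactions: (1) 0 -> S with propensity k1, (2) S -> 0 with propensity k2*x,
# (3) 2S -> S with propensity k3*x*(x-1). Reactions 2 and 3 share the
# stoichiometry vector zeta = -1, so their propensities must be summed.
N = 50
k1, k2, k3 = 10.0, 1.0, 0.05

reactions = [
    (+1, lambda x: k1),                # birth
    (-1, lambda x: k2 * x),            # degradation
    (-1, lambda x: k3 * x * (x - 1)),  # dimerization-driven loss
]

# Generator: Q[x, x + zeta] accumulates the propensities of ALL reactions
# with that stoichiometry vector; Q[x, x] is minus the total outflow rate.
Q = np.zeros((N + 1, N + 1))
for x in range(N + 1):
    for zeta, lam in reactions:
        y = x + zeta
        if 0 <= y <= N:                # drop transitions leaving the truncation
            Q[x, y] += lam(x)          # "+=" merges reactions with equal zeta
            Q[x, x] -= lam(x)

# Sanity check: every row of a generator matrix sums to zero.
assert np.allclose(Q.sum(axis=1), 0.0)
```

With this convention the truncated CME reads dpᵀ/dt = pᵀQ for the row vector p of state probabilities; at x = 2, for instance, the single entry Q[2, 1] correctly carries k2·2 + k3·2·1 from both ζ = -1 reactions.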
The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file).

Reviewer #1: None
Reviewer #2: Yes
Reviewer #3: Yes

**********

Do you want your identity to be public for this peer review?

Reviewer #1: No
Reviewer #2: No
Reviewer #3: No

1 Nov 2021
Submitted filename: ReviewerResponse2.pdf

8 Nov 2021

Dear Prof. Khammash,

We are pleased to inform you that your manuscript 'DeepCME: A deep learning framework for computing solution statistics of the Chemical Master Equation' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow-up email. A member of our team will be in touch with a set of requests.
Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.

Best regards,

James R. Faeder
Associate Editor
PLOS Computational Biology

Daniel Beard
Deputy Editor
PLOS Computational Biology

***********************************************************

22 Nov 2021
PCOMPBIOL-D-21-01072R2
DeepCME: A deep learning framework for computing solution statistics of the Chemical Master Equation

Dear Dr Khammash,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.
Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Anita Estes
PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom
ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol
