Literature DB >> 25620846

New "Tau-Leap" Strategy for Accelerated Stochastic Simulation.

Doraiswami Ramkrishna¹, Che-Chi Shu¹, Vu Tran¹.

Abstract

The "Tau-Leap" strategy for stochastic simulations of chemical reaction systems due to Gillespie and co-workers has had considerable impact on various applications. This strategy is reexamined with Chebyshev's inequality for random variables as it provides a rigorous probabilistic basis for a measured τ-leap thus adding significantly to simulation efficiency. It is also shown that existing strategies for simulation times have no probabilistic assurance that they satisfy the τ-leap criterion while the use of Chebyshev's inequality leads to a specified degree of certainty with which the τ-leap criterion is satisfied. This reduces the loss of sample paths which do not comply with the τ-leap criterion. The performance of the present algorithm is assessed, with respect to one discussed by Cao et al. (J. Chem. Phys.2006, 124, 044109), a second pertaining to binomial leap (Tian and Burrage J. Chem. Phys.2004, 121, 10356; Chatterjee et al. J. Chem. Phys.2005, 122, 024112; Peng et al. J. Chem. Phys.2007, 126, 224109), and a third regarding the midpoint Poisson leap (Peng et al., 2007; Gillespie J. Chem. Phys.2001, 115, 1716). The performance assessment is made by estimating the error in the histogram measured against that obtained with the so-called stochastic simulation algorithm. It is shown that the current algorithm displays notably less histogram error than its predecessor for a fixed computation time and, conversely, less computation time for a fixed accuracy. This computational advantage is an asset in repetitive calculations essential for modeling stochastic systems. The importance of stochastic simulations is derived from diverse areas of application in physical and biological sciences, process systems, and economics, etc. Computational improvements such as those reported herein are therefore of considerable significance.

Entities: Chemical Disease Gene

Year: 2014 PMID： 25620846 PMCID： PMC4299402 DOI： 10.1021/ie502929q

Source DB: PubMed Journal: Ind Eng Chem Res ISSN： 0888-5885 Impact factor: 3.720

Introduction

In recent times, stochastic simulations have assumed extraordinary importance in modeling biological and nano systems. The importance of the Monte Carlo simulation methodology arises from its capacity to substitute for the less tractable master equation and mean field approaches. This capacity is derived from its basic simplicity which could, however, be seriously compromised if computational times become forbiddingly large because of the repetitive attribute of the methodology. Consequently, considerable effort has come to pass on finding efficient simulation strategies so that computation of statistical averages of stochastic behavior can become facile. The issue of efficiency arises in negotiating accuracy with sacrifice of rigor or through crafty approximations. In stochastic systems, however, the validity of approximations is uncertain because they invariably involve future values of random variables in the system leading to conservative and hence less efficient heuristics. The remedy must lie in the use of mathematical propositions concerning probabilities with which approximations about random variables are true. It is the purpose of this work to aid in the quest for efficient stochastic simulation strategies by exploiting an important inequality that could greatly rationalize the process. This is Chebychev’s inequality that is routinely encountered in the treatment of random variables.[1] The inequality is concerned with the probability with which a random variable deviates from its expectation in terms of its variance. We are concerned here with a stochastic system of chemical transformations involving several species at a fixed temperature. Temperature variation can be captured in the reaction rate constants which can, in turn, influence the rate at which the propensity function changes. However, this temperature change needs to be small since one big assumption made throughout the development of τ-leap is to assume the propensity function stays constant during each leap. The stochastic behavior arises out of the randomness in reaction rates and is reflected in the numbers of different reaction species at different times. Simulation of this system can be performed with an exact algorithm that has come to be known as SSA (stochastic simulation algorithm). The term “exact” is derived from the fact that the simulation is based on rigorously derived distributions of “waiting times” during which the system remains “quiescent” without any reaction event. While the methodology is usually attributed to Gillespie,[2] a basic version of the idea appeared in a paper by Kendall[3] who used the term “quiescent interval” instead of “waiting time”. This method was generalized for the simulation of population balance equations in a doctoral thesis by Shah,[4] subsequently published in the chemical engineering literature[5] independently of Gillespie’s publication[2] in the geophysical literature. Some further insights as to the connection of the foregoing algorithm to the master equation for population balances are given by Ramkrishna.[6] The virtue of SSA’s exactness is computational accuracy but at the cost of large computation times. SSA also provides the starting point for various improvements pursued by researchers in this area to minimize the cost of computation. The term “tau-leap”, coined by Gillespie and co-workers, aptly describes the underlying strategy of “leaping” over relatively less significant events (as reflected by minimal changes in the so-called “propensities” or transition rates of individual reactions). For all of the methods discussed in this work, each leap is checked after each calculation. Anderson’s method[7] is a precheck method and hence produces a very different mechanism of filtering leaps. For any given selection of ε, postcheck methods inevitably encounter some ineffective leap steps at which the leap condition is violated. Depending upon the number of violations and the extent of each violation, the accuracy of the simulation will be affected. Our method introduces a way to promote the likelihood of acquiring more efficient leap steps by appropriate increase of δ. Anderson’s method introduces a procedure to do precheck before performing a calculation of leaps. Only steps where the criteria are satisfied will continue with the calculation of the corresponding leaps. Moreover, only the first step of picking τ will follow the formula of Cao et al.[8] The subsequent steps of evaluating τ strictly follow their procedure. For each new calculated τ, it is then evaluated with the leap condition. Depending upon the value of τ, different scaling factors can be applied to adjust τ. If a leap for ε = ε′ is rejected, τ will be decreased by multiplying with some p < 1. If a leap is accepted for ε = ε′ but would fail if ε = 3ε′/4, it will be reduced by multiplying it by some p* that satisfies p < p* < 1. If a leap is accepted for both cases, τ can be enlarged by raising it to the power of q such that q is between 0 and 1. Thus, the algorithm for precheck will only accept those τ that satisfy the leap condition. However, in stochastic simulations, good results can still be accomplished, as long as the number of unsatisfied leaps is low. Every time a leap is rejected, a new calculation will need to be made; therefore the question is a balance between accuracy and speed. Anderson’s approach focuses on checking the leap condition, whereas all other postcheck methods focus on the issue of how to select a τ value. Therefore, the method cannot be included in our comparison of postcheck methods with the approach of this work. For that reason, this work will exclusively focus on comparison among postcheck methods. Negative population is another problem associated with the usage of Poisson random number generation. In the later version of the method of Cao et al.[8] the author introduced ways of capturing and treating critical reactions in which the negative population might occur; Peng et al. implemented a binomial leap strategy[9−11] to resolve the problem. In addition to the preceding treatments, we implement a novel strategy using the Chebyshev inequality to improve the simulation process. In this work, we show how the Chebyshev inequality can be used to craft this temporal leap by specifying the probability with which it influences the deviations in the stochastic numbers of chemical species from their expected values. In so doing, following Cao et al.,[8] we retain the Taylor series expansion of the propensities about the state of the system specified at instant t truncated beyond the quadratic terms. A new algorithm is derived simply using different probabilistic criteria for approximations.

Chebyshev Inequality

We recall that for any random variable X with finite expectation EX and variance VX the following inequality[1] is satisfied for an arbitrary ε > 0.For our purposes, the preceding inequality may be restated asClearly inequality 2 stipulates the minimum of the probability with which the random variable X deviates from EX by less than ε.

Development of the New “Tau-Leap” Strategy

As observed earlier, we adhere to the stochastic system formulated by Gillespie and Petzold.[8] Thus, there are N chemical species in numbers represented by the random vector X, involved in M transformations (reactions) with the vector ν ≡ {ν, j = 1, 2, ..., N} denoting the change in the numbers of species due to each jth reaction event. Further, the reaction propensity vector, conditional on X = x, is represented by {a(x); j = 1, 2, ..., M}. If at time t, X(t) = x, then after time τ (i.e., τ > 0), we have[1]in which K(τ;x) represents the number of jth reaction events during the interval from t to t + τ. For the formulation of τ-leap, K(τ;x) is assumed to be a Poisson process with mean and variance given by a (x) τ, notwithstanding the change in X during this time interval. Thus, (3) is replaced byThe “leap condition” of Gillespie and Petzold[1] is given bywhere the symbol ε is to be distinguished from that appearing in the Chebyshev inequality (1) or (2). Other replacements considered by these authors for ao(x) do not affect the treatment of this work. DefiningThe change in propensity during the time interval between t and t + τ is given byThe random variable Δa (τ;x), in view of the approximation in (8), has expectation EΔa (τ;x) ≡ μ(x)τ and variance VΔa(τ;x) = σ2(x)τ. Gillespie and Petzold choose the interval τ in accord with We now consider the application of the Chebyshev inequality to the random variable Δa(τ;x) toward determining a suitable criterion that will ensure the leap condition (5). We observe first that the triangular inequality implies thatBy imposing thatinequality (10) implies the leap condition (5). From (11), it follows thatThe preceding inequality does require that the right-hand side be positive so thatto the right of which is a potential candidate for τ as prescribed by Cao et al.[8] Invoking the Chebyshev inequality (2) by replacing the ε which appears there by (εao(x) – |μ(x)|τ), we obtainSince the right-hand side of (13) must be nonnegative, it is readily found that an upper bound for τ is ao2ε2/σ2, thus recovering the second candidate for τ-leap due to Gillespie and Penzold. However, we produce a more effective choice of τ-leap in this development. If we let the right-hand side of (13) be appropriately close to unity, a choice becomes available for τ. Thus, setting its value equal to δ (e.g., δ = 0.95), we have (dropping henceforth the argument x for notational simplicity)which obtains the following solution for τ, the subscript referring to its being specific to the jth reaction.The positive solution is in conflict with (12) so that we must seek only the smaller root with the negative sign which is indeed less than ao(x)ε/|μ(x)|. Thus, we havewhich is of course consistent with (12). We conclude from this analysis that the required τ is given bywhich assures the required leap condition (5) with a probability of at least δ. It is next of interest to compare the development here with the Gillespie–Petzold solution (hereafter referred to as GPS). Because (12) represents a common requirement, we only need consider the other alternative ε2ao2/σ2 for τ with that arising from expression 16. In passing, the Cauchy–Schwartz inequality (which states that the absolute value of the inner product between any two vectors is at most equal to the product of their norms) can be used on (7) to show thatwhich serves to further calibrate the plot of δ versus τ. We return to compare our prescription for τ to that by GPS. Suppose a specific value of ε is chosen. If we begin with choosingwhich implies that aoε/|μ| < ε2ao2/σ2, so that from GPS τ = aoε/|μ|. For this ε, Figure 1a shows the layout of δ and τ and thus the extent to which they may be negotiated for an efficient simulation as guided by Chebyshev’s inequality. In particular, it would appear that τ should in fact be safely less than aoε/|μ| (instead of being equal to it as in GPS) so that δ has a suitably positive value. Of course the resulting τ-leap would be lowered, but sample paths generated from the process would more reliably satisfy the τ-leap condition (5) which is indicative of higher simulation efficiency. It is noteworthy that the GPS choice is not backed by the probabilistic assurance of satisfying the τ-leap condition (5). If, on the other hand, ε < σ2/ao|μ|, the GPS prescription would call for τ = ao2ε2/σ2 < σ2/|μ|2; furthermore, it can be shown (see Appendix in the Supporting Information) that τ > τ0, where τ0 is obtained from (16) by setting δ = 0; thus the δ – τ scenario would change to that shown in Figure1b. Thus, regardless of the choice of τ in (9), Figure 2 locates it in the domain of negative δ thus denying any probabilistic assurance that the τ-leap condition 5 would be satisfied. It does not preclude, however, the generation of meaningful sample paths for the process.

Figure 1

Figure 2

Comparison of the accuracy for the histogram between the two methods: binomial method[13] and the present method for ε = 0.02.

Plots showing how τ (indicated as τ) may be chosen for various values of δ, the probability with which the τ-leap condition is satisfied for the case ε > σ2/ao|μ|. (a, top) The GPS solution for τ appears at the asymptote! (b, bottom) The GPS obtains τ in the region for which δ < 0. Comparison of the accuracy for the histogram between the two methods: binomial method[13] and the present method for ε = 0.02. As long as τ is chosen to satisfy (12), there is no explicit restriction on the choice of ε, although its smallness would indeed govern the quality of the τ-leap criterion. Throughout the simulation, both the method of Cao et al. as well as the current method encounter a number of defective steps at which the leap condition fails. As the number of these wasteful steps increases, the accuracy will be negatively influenced (i.e., it will require a higher number of sample paths to accomplish a fixed accuracy). Cao et al. proposed a procedure to generate leaps but never proved mathematically how it can be satisfied. Hence, some generated leaps might violate the leap condition badly and affect strongly the overall quality of the simulation. One way to avoid this problem and therefore to optimize the process is to increase the probability of getting a good selection of leaps. Two different ways which can accomplish this goal are as follows. Clearly, smaller values of ε will be progressively conservative and hence less effective in reducing computation time. On the other hand, if we fix ε, the use of the Chebyshev inequality may be regarded as a way to reduce the number of wasteful simulations that violate the τ-leap criterion with the stipulated ε. From (17), it is transparent that, for fixed ε, as δ → 1, τ → 0, which would be undesirable. Higher values of τ can be chosen by negotiating the value of δ to be suitably less than 1. While it may at first seem that the lower leaps in our approach may, not surprisingly, lead to more accurate solutions, it must be understood that the methodology lies in its efficiency in that a solution of given accuracy is obtained with a smaller number of simulations. The proposed method thrives in the facility to manipulate both the parameters ε and δ to promote efficiency. The computational demonstration to follow would of course confirm the foregoing observations. The development of this work is readily adapted to the new τ-leap method presented by Cao et al.,[8] who relate the relative change in propensity to that of the stochastic state variables involved. For example, if a(x) involves a first order reaction in the ith species (alone), we haveFor the foregoing case, the τ-leap condition in the jth propensity translates to |ΔX| ≤ εx in state variable domain. For more general cases, Cao et al.[8] obtain the τ-leap condition in the state variables aswhere g(≥1) represents the highest order of reaction with respect to species i. Note that (21) implies a change of at least one molecule; for an explanation of this and other aspects of this algorithm, the reader is referred to Cao et al.’s report.[8] The application of our methodology leads to the following expression for τ which will satisfy the leap condition for X with probability of at least δ.where α ≡ max{εx/g,1}, μ̂(x) ≡ Σνa(x), and σ̂2(x) ≡ Σν2a(x). The efficacy of the proposed algorithm is shown by comparing it with that of Cao et al.[8] The stochastic algorithm is usually employed to obtain the average behavior of the system for a chosen time interval by averaging several sample path simulations using the algorithm. The SSA[8,12] serves as a benchmark in evaluating the accuracy of any algorithm by comparing the histogram obtained by sample path averaging with that of SSA. In this regard, we use the same metric as that used by Cao and Petzold.[13] The comparison between the algorithms presented in this work with others has been made in two different ways. Thus, we compare the accuracies obtained for a fixed computation time, which is more focused on accuracy than computation time. Alternatively, where accuracy is prescribed, computation times can be compared for any two algorithms. We employ two different examples for our demonstration. They are (i) the Schlogl model noted for its bistability and (ii) a linear consecutive reaction system. The algorithms used for comparison are those due to Cao’s method,[8] Gillespie’s midpoint Poisson τ-leap method,[11,12] and the binomial leap method[9,10] subsequently improved by Peng et al.[11]

Examples

Example 1

We consider the consecutive linear reaction system as in (23).We apply our τ selecting method with formula 22 for the generation of τ. With rate constants c1 = 1 and c2 = 1 and initial conditions X1 = 104, X2 = 1 and X3 = 0, we calculate X3 until time is equal to 0.1.

Example 2

Schlogl’s chemical reaction model is shown in (24). This model is noted for its bistability. B1 and B2 are constants with their particle numbers as N1 and N2, respectively.The values of parameters, adapted from Cao et al.,[8] are c1 = 3 × 10–7, c2 = 10–4, c3 = 10–3, c4 = 3.5, N1 = 1 × 105, and N2 = 2 × 105. The initial condition of X is 250, and we simulate the system up to t = 4.

Discussion of Results

Figure 2 shows the probability density for X3 at t = 0.1 for SSA, the present method and the binomial method of Peng et al.,[11] using ε = 0.02. The higher and close proximity of the distribution by the present method to the SSA is clearly evident. Table 1 shows the number of sample paths used for averaging is only 10,000 for an accuracy notably higher than that from the algorithm under comparison. Although the computational time for each sample path for the present method is higher, a more accurate solution is obtained with 40% higher speed. Figure 3 shows the accuracy of the algorithms in comparison as a function of simulation time at different values of ε. Histogram distance errors are measured by 106 samples and 105 samples generated from the SSA method and the two τ-leap methods, respectively, at different values of ε. For the same simulation time, the accuracy of the histogram is notably higher for the present method.

Table 1

Comparison of Results Generated from Two Different Methods: Binomial Method[13] and the Present Method for the Consecutive Linear Reactiona

	binomial method	present method
no. of trajectories	30,000	10,000
ε	0.02	0.02
δ		0.667
histogram error	0.0090	0.0018
total simulation time (s)	5.417 × 10³	3.269 × 10³ (40% faster)

This table shows all of the parameters and results shown in Figure 2.

Figure 3

Comparison of histogram error corresponding to different simulation times for the two methods being used to model the consecutive linear reaction system. The binomial method of Peng et al. is shown in red whereas ours is shown in blue. This table shows all of the parameters and results shown in Figure 2. Figure 4 shows comparison of the histogram error as a function of simulation time for the present method against that by Cao et al.[8] for example 2. As in example 1, the notably higher accuracy of the current method is evident for this example also. Figure 5 compares the algorithm of Gillespie’s midpoint Poisson method with the present method for the same system. In both Figures 4 and 5, the number of SSA samples is 106. The comparison was made based on a fixed number of simulations for those methods at different values of ε. The simulation time for the former is considerably higher than that for the latter. When higher accuracy is needed, the computational time needed for Gillespie’s method is substantially longer. Another verification for our claim is shown in Figure 6. In this figure, ε is fixed to be 0.2, and results generated using two different methods at different numbers of samples were then compared. Similarly, we observed that our present algorithm improves both speed and accuracy. Although the computation time for each sample path is notably larger for the present work because of smaller time intervals, the higher efficiency of the resulting sample path allows accurate results with a significantly smaller number of sample paths as established in Figures 3, 4, and 5. Indeed the foregoing results bear testimony to notable improvements with our algorithm arising from the use of Chebyshev’s inequality.

Figure 4

Figure 5

Comparison of histogram error with respect to different simulation times of two different methods: Gillespie’s midpoint Poisson method in red and ours in blue. This plot shows the accuracy level of the two methods for Schlogl’s system at different ε.

Figure 6

Comparison of histogram error with respect to different simulation times of two different methods: the method of Cao et al.[8] in red and ours in blue. This plot shows the accuracy level of the two methods for Schlogl’s system at different ε. Comparison of histogram error with respect to different simulation times of two different methods: Gillespie’s midpoint Poisson method in red and ours in blue. This plot shows the accuracy level of the two methods for Schlogl’s system at different ε. Comparison of histogram error with respect to different simulation times of two different methods: Gillespie’s midpoint Poisson method in red and ours in blue. This plot shows the accuracy level of the two methods for Schlogl’s system at different numbers of samples at ε = 0.2.

Conclusion

The algorithm presented in this work has attributes of efficiency earned from being able to account for the likelihood with which approximations for τ-leap are satisfied by simulated sample paths. The choice of this probability is a rational guideline to Monte Carlo simulations.

5 in total

1. Binomial leap methods for simulating stochastic chemical kinetics.

Authors: Tianhai Tian; Kevin Burrage
Journal: J Chem Phys Date: 2004-12-01 Impact factor: 3.488

2. Binomial distribution based tau-leap accelerated stochastic simulation.

Authors: Abhijit Chatterjee; Dionisios G Vlachos; Markos A Katsoulakis
Journal: J Chem Phys Date: 2005-01-08 Impact factor: 3.488

3. Efficient binomial leap method for simulating chemical kinetics.

Authors: Xinjun Peng; Wen Zhou; Yifei Wang
Journal: J Chem Phys Date: 2007-06-14 Impact factor: 3.488

4. Efficient step size selection for the tau-leaping simulation method.

Authors: Yang Cao; Daniel T Gillespie; Linda R Petzold
Journal: J Chem Phys Date: 2006-01-28 Impact factor: 3.488

5. Incorporating postleap checks in tau-leaping.

Authors: David F Anderson
Journal: J Chem Phys Date: 2008-02-07 Impact factor: 3.488

5 in total

2 in total

1. On speeding up stochastic simulations by parallelization of random number generation.

Authors: Che-Chi Shu; Vu Tran; Jeremy Binagia; Doraiswami Ramkrishna
Journal: Chem Eng Sci Date: 2015-12-01 Impact factor: 4.311

2. Analytical approximations for spatial stochastic gene expression in single cells and tissues.

Authors: Stephen Smith; Claudia Cianci; Ramon Grima
Journal: J R Soc Interface Date: 2016-05 Impact factor: 4.118

2 in total