The "Tau-Leap" strategy for stochastic simulations of chemical reaction systems due to Gillespie and co-workers has had considerable impact on various applications. This strategy is reexamined with Chebyshev's inequality for random variables as it provides a rigorous probabilistic basis for a measured τ-leap thus adding significantly to simulation efficiency. It is also shown that existing strategies for simulation times have no probabilistic assurance that they satisfy the τ-leap criterion while the use of Chebyshev's inequality leads to a specified degree of certainty with which the τ-leap criterion is satisfied. This reduces the loss of sample paths which do not comply with the τ-leap criterion. The performance of the present algorithm is assessed, with respect to one discussed by Cao et al. (J. Chem. Phys.2006, 124, 044109), a second pertaining to binomial leap (Tian and Burrage J. Chem. Phys.2004, 121, 10356; Chatterjee et al. J. Chem. Phys.2005, 122, 024112; Peng et al. J. Chem. Phys.2007, 126, 224109), and a third regarding the midpoint Poisson leap (Peng et al., 2007; Gillespie J. Chem. Phys.2001, 115, 1716). The performance assessment is made by estimating the error in the histogram measured against that obtained with the so-called stochastic simulation algorithm. It is shown that the current algorithm displays notably less histogram error than its predecessor for a fixed computation time and, conversely, less computation time for a fixed accuracy. This computational advantage is an asset in repetitive calculations essential for modeling stochastic systems. The importance of stochastic simulations is derived from diverse areas of application in physical and biological sciences, process systems, and economics, etc. Computational improvements such as those reported herein are therefore of considerable significance.
The "Tau-Leap" strategy for stochastic simulations of chemical reaction systems due to Gillespie and co-workers has had considerable impact on various applications. This strategy is reexamined with Chebyshev's inequality for random variables as it provides a rigorous probabilistic basis for a measured τ-leap thus adding significantly to simulation efficiency. It is also shown that existing strategies for simulation times have no probabilistic assurance that they satisfy the τ-leap criterion while the use of Chebyshev's inequality leads to a specified degree of certainty with which the τ-leap criterion is satisfied. This reduces the loss of sample paths which do not comply with the τ-leap criterion. The performance of the present algorithm is assessed, with respect to one discussed by Cao et al. (J. Chem. Phys.2006, 124, 044109), a second pertaining to binomial leap (Tian and Burrage J. Chem. Phys.2004, 121, 10356; Chatterjee et al. J. Chem. Phys.2005, 122, 024112; Peng et al. J. Chem. Phys.2007, 126, 224109), and a third regarding the midpoint Poisson leap (Peng et al., 2007; Gillespie J. Chem. Phys.2001, 115, 1716). The performance assessment is made by estimating the error in the histogram measured against that obtained with the so-called stochastic simulation algorithm. It is shown that the current algorithm displays notably less histogram error than its predecessor for a fixed computation time and, conversely, less computation time for a fixed accuracy. This computational advantage is an asset in repetitive calculations essential for modeling stochastic systems. The importance of stochastic simulations is derived from diverse areas of application in physical and biological sciences, process systems, and economics, etc. Computational improvements such as those reported herein are therefore of considerable significance.
In
recent times, stochastic simulations have
assumed extraordinary
importance in modeling biological and nano systems. The importance
of the Monte Carlo simulation methodology arises from its capacity
to substitute for the less tractable master equation and mean field
approaches. This capacity is derived from its basic simplicity which
could, however, be seriously compromised if computational times become
forbiddingly large because of the repetitive attribute of the methodology.
Consequently, considerable effort has come to pass on finding efficient
simulation strategies so that computation of statistical averages
of stochastic behavior can become facile. The issue of efficiency
arises in negotiating accuracy with sacrifice of rigor or through
crafty approximations. In stochastic systems, however, the validity
of approximations is uncertain because they invariably involve future
values of random variables in the system leading to conservative and
hence less efficient heuristics. The remedy must lie in the use of
mathematical propositions concerning probabilities with which approximations
about random variables are true.It is the purpose of this work
to aid in the quest for efficient
stochastic simulation strategies by exploiting an important inequality
that could greatly rationalize the process. This is Chebychev’s
inequality that is routinely encountered in the treatment of random
variables.[1] The inequality is concerned
with the probability with which a random variable deviates from its
expectation in terms of its variance.We are concerned here
with a stochastic system of chemical transformations
involving several species at a fixed temperature. Temperature variation
can be captured in the reaction rate constants which can, in turn,
influence the rate at which the propensity function changes. However,
this temperature change needs to be small since one big assumption
made throughout the development of τ-leap is to assume the propensity
function stays constant during each leap. The stochastic behavior
arises out of the randomness in reaction rates and is reflected in
the numbers of different reaction species at different times. Simulation
of this system can be performed with an exact algorithm
that has come to be known as SSA (stochastic simulation algorithm).
The term “exact” is derived from the fact that the simulation
is based on rigorously derived distributions of “waiting times”
during which the system remains “quiescent” without
any reaction event. While the methodology is usually attributed to
Gillespie,[2] a basic version of the idea
appeared in a paper by Kendall[3] who used
the term “quiescent interval” instead of “waiting
time”. This method was generalized for the simulation of population
balance equations in a doctoral thesis by Shah,[4] subsequently published in the chemical engineering literature[5] independently of Gillespie’s publication[2] in the geophysical literature. Some further insights
as to the connection of the foregoing algorithm to the master equation
for population balances are given by Ramkrishna.[6]The virtue of SSA’s exactness is computational
accuracy
but at the cost of large computation times. SSA also provides the
starting point for various improvements pursued by researchers in
this area to minimize the cost of computation. The term “tau-leap”,
coined by Gillespie and co-workers, aptly describes the underlying
strategy of “leaping” over relatively less significant
events (as reflected by minimal changes in the so-called “propensities”
or transition rates of individual reactions). For all of the methods
discussed in this work, each leap is checked after each calculation.
Anderson’s method[7] is a precheck
method and hence produces a very different mechanism of filtering
leaps. For any given selection of ε, postcheck methods inevitably
encounter some ineffective leap steps at which the leap condition
is violated. Depending upon the number of violations and the extent
of each violation, the accuracy of the simulation will be affected.
Our method introduces a way to promote the likelihood of acquiring
more efficient leap steps by appropriate increase of δ. Anderson’s
method introduces a procedure to do precheck before performing a calculation
of leaps. Only steps where the criteria are satisfied will continue
with the calculation of the corresponding leaps. Moreover, only the
first step of picking τ will follow the formula of Cao et al.[8] The subsequent steps of evaluating τ strictly
follow their procedure. For each new calculated τ, it is then
evaluated with the leap condition. Depending upon the value of τ,
different scaling factors can be applied to adjust τ. If a leap
for ε = ε′ is rejected, τ will be decreased
by multiplying with some p < 1. If a leap is accepted
for ε = ε′ but would fail if ε = 3ε′/4,
it will be reduced by multiplying it by some p* that
satisfies p < p* < 1. If a
leap is accepted for both cases, τ can be enlarged by raising
it to the power of q such that q is between 0 and 1. Thus, the algorithm for precheck will only accept
those τ that satisfy the leap condition. However, in stochastic
simulations, good results can still be accomplished, as long as the
number of unsatisfied leaps is low. Every time a leap is rejected,
a new calculation will need to be made; therefore the question is
a balance between accuracy and speed. Anderson’s approach focuses
on checking the leap condition, whereas all other postcheck methods
focus on the issue of how to select a τ value. Therefore, the
method cannot be included in our comparison of postcheck methods with
the approach of this work. For that reason, this work will exclusively
focus on comparison among postcheck methods. Negative population is
another problem associated with the usage of Poisson random number
generation. In the later version of the method of Cao et al.[8] the author introduced ways of capturing and treating
critical reactions in which the negative population might occur; Peng
et al. implemented a binomial leap strategy[9−11] to resolve
the problem. In addition to the preceding treatments, we implement
a novel strategy using the Chebyshev inequality to improve the simulation
process. In this work, we show how the Chebyshev inequality can be
used to craft this temporal leap by specifying the probability with
which it influences the deviations in the stochastic numbers of chemical
species from their expected values. In so doing, following Cao et
al.,[8] we retain the Taylor series expansion
of the propensities about the state of the system specified at instant t truncated beyond the quadratic terms. A new algorithm
is derived simply using different probabilistic criteria for approximations.
Chebyshev Inequality
We recall that for any random
variable X with
finite expectation EX and variance VX the following inequality[1] is satisfied for an arbitrary ε > 0.For our purposes, the preceding inequality
may be restated asClearly inequality 2 stipulates
the minimum of the probability with which the random
variable X deviates from EX by less than ε.
Development of the New “Tau-Leap”
Strategy
As observed earlier, we adhere to the stochastic
system formulated
by Gillespie and Petzold.[8] Thus, there
are N chemical species in numbers represented by
the random vector X, involved in M transformations
(reactions) with the vector ν ≡ {ν, j = 1, 2, ..., N} denoting the change in the numbers
of species due to each jth reaction event. Further,
the reaction propensity vector, conditional on X = x, is represented by {a(x); j = 1, 2, ..., M}. If at time t, X(t) = x, then after time τ (i.e., τ > 0),
we have[1]in which K(τ;x) represents
the number of jth reaction events during the interval
from t to t + τ. For the formulation
of τ-leap, K(τ;x) is
assumed to be a Poisson process with mean and variance given by a (x) τ,
notwithstanding the change in X during this time interval.
Thus, (3) is replaced byThe “leap condition” of Gillespie
and Petzold[1] is given bywhere the symbol ε is to be distinguished
from that appearing in the Chebyshev inequality (1) or (2). Other replacements considered by
these authors for ao(x) do
not affect the treatment of this work. DefiningThe change in propensity during the time interval
between t and t + τ is given
byThe random variable Δa (τ;x), in view of
the approximation in (8), has expectation EΔa (τ;x) ≡ μ(x)τ and variance VΔa(τ;x) = σ2(x)τ. Gillespie and Petzold choose the interval
τ in accord withWe now consider the application of the Chebyshev
inequality to
the random variable Δa(τ;x) toward determining a suitable criterion
that will ensure the leap condition (5). We
observe first that the triangular inequality implies thatBy imposing thatinequality
(10) implies
the leap condition (5). From (11), it follows thatThe preceding inequality does require that
the right-hand side be positive so thatto the right of
which is a potential candidate
for τ as prescribed by Cao et al.[8]Invoking the Chebyshev inequality (2) by
replacing the ε which appears there by (εao(x) – |μ(x)|τ), we obtainSince the right-hand side of (13) must be nonnegative, it is readily found that an upper bound
for τ is ao2ε2/σ2, thus recovering the second candidate for τ-leap due to Gillespie
and Penzold. However, we produce a more effective choice of τ-leap
in this development.If we let the right-hand side of (13) be
appropriately close to unity, a choice becomes available for τ.
Thus, setting its value equal to δ (e.g., δ = 0.95), we
have (dropping henceforth the argument x for notational
simplicity)which obtains the following solution for τ, the subscript referring to its being specific
to the jth reaction.The positive solution is
in conflict with
(12) so that we must seek only the smaller root
with the negative sign which is indeed less than ao(x)ε/|μ(x)|. Thus, we havewhich is of course consistent
with (12). We conclude from this analysis that
the required
τ is given bywhich assures the required leap condition
(5) with a probability of at least δ.It is next of interest to compare the development here with
the
Gillespie–Petzold solution (hereafter referred to as GPS).
Because (12) represents a common requirement,
we only need consider the other alternative ε2ao2/σ2 for τ with that arising from expression 16. In passing,
the Cauchy–Schwartz inequality (which states that the absolute
value of the inner product between any two vectors is at most equal
to the product of their norms) can be used on (7) to show thatwhich serves to further calibrate the plot
of δ versus τ. We return to compare our prescription for
τ to that by GPS. Suppose a specific
value of ε is chosen. If we begin with choosingwhich
implies that aoε/|μ| < ε2ao2/σ2, so that from GPS τ = aoε/|μ|. For this ε, Figure 1a shows the layout
of δ and τ and thus the extent
to which they may be negotiated for an efficient simulation as guided
by Chebyshev’s inequality. In particular, it would appear that
τ should in fact be safely less
than aoε/|μ| (instead of being equal to it as in GPS) so that δ
has a suitably positive value. Of course the resulting
τ-leap would be lowered, but sample paths generated from the
process would more reliably satisfy the τ-leap condition (5) which is indicative of higher simulation efficiency.
It is noteworthy that the GPS choice is not backed by the probabilistic
assurance of satisfying the τ-leap condition (5). If, on the other hand, ε < σ2/ao|μ|, the GPS prescription would call for τ = ao2ε2/σ2 < σ2/|μ|2; furthermore, it can be shown (see Appendix in the Supporting Information) that τ > τ0, where τ0 is obtained from (16) by setting δ =
0; thus the δ – τ scenario would change to that
shown in Figure1b. Thus, regardless of the
choice of τ in (9), Figure 2 locates it in the domain
of negative δ thus denying any probabilistic
assurance that the τ-leap condition 5 would
be satisfied. It does not preclude, however, the generation of meaningful
sample paths for the process.
Figure 1
Plots
showing how τ (indicated as τ)
may be chosen for various values of δ, the probability with
which the τ-leap condition is satisfied for the case ε
> σ2/ao|μ|. (a, top) The GPS solution for τ appears at the asymptote! (b, bottom) The GPS obtains
τ in the region for which δ
< 0.
Figure 2
Comparison of the accuracy for the histogram between the two methods:
binomial method[13] and the present method
for ε = 0.02.
Plots
showing how τ (indicated as τ)
may be chosen for various values of δ, the probability with
which the τ-leap condition is satisfied for the case ε
> σ2/ao|μ|. (a, top) The GPS solution for τ appears at the asymptote! (b, bottom) The GPS obtains
τ in the region for which δ
< 0.Comparison of the accuracy for the histogram between the two methods:
binomial method[13] and the present method
for ε = 0.02.As long as τ is
chosen to satisfy (12), there is no explicit
restriction on the choice of ε, although
its smallness would indeed govern the quality of the τ-leap
criterion. Throughout the simulation, both the method of Cao et al.
as well as the current method encounter a number of defective steps
at which the leap condition fails. As the number of these wasteful
steps increases, the accuracy will be negatively influenced (i.e.,
it will require a higher number of sample paths to accomplish a fixed
accuracy). Cao et al. proposed a procedure to generate leaps but never
proved mathematically how it can be satisfied. Hence, some generated
leaps might violate the leap condition badly and affect strongly the
overall quality of the simulation. One way to avoid this problem and
therefore to optimize the process is to increase the probability of
getting a good selection of leaps. Two different ways which can accomplish
this goal are as follows. Clearly, smaller values of ε will
be progressively conservative and hence less effective in reducing
computation time. On the other hand, if we fix ε, the use of
the Chebyshev inequality may be regarded as a way to reduce the number
of wasteful simulations that violate the τ-leap criterion with
the stipulated ε. From (17), it is transparent
that, for fixed ε, as δ → 1, τ → 0,
which would be undesirable. Higher values of τ can be chosen
by negotiating the value of δ to be suitably less than 1. While
it may at first seem that the lower leaps in our approach may, not
surprisingly, lead to more accurate solutions, it must be understood
that the methodology lies in its efficiency in that a solution of
given accuracy is obtained with a smaller number of simulations. The
proposed method thrives in the facility to manipulate both the parameters
ε and δ to promote efficiency. The computational demonstration
to follow would of course confirm the foregoing observations.The development of this work is readily adapted to the new τ-leap
method presented by Cao et al.,[8] who relate
the relative change in propensity to that of the stochastic state
variables involved. For example, if a(x) involves a first order reaction
in the ith species (alone), we haveFor the foregoing
case, the τ-leap condition
in the jth propensity translates to |ΔX| ≤ εx in state variable domain. For more
general cases, Cao et al.[8] obtain the τ-leap
condition in the state variables aswhere g(≥1) represents the highest order of reaction
with respect
to species i. Note that (21) implies a change of at least one molecule; for an explanation of
this and other aspects of this algorithm, the reader is referred to
Cao et al.’s report.[8]The
application of our methodology leads to the following expression
for τ which will satisfy the leap condition for X with probability of at least δ.where
α ≡ max{εx/g,1}, μ̂(x) ≡ Σνa(x), and σ̂2(x) ≡ Σν2a(x).The efficacy
of the proposed algorithm is shown by comparing it with
that of Cao et al.[8]The stochastic
algorithm is usually employed to obtain the average
behavior of the system for a chosen time interval by averaging several
sample path simulations using the algorithm. The SSA[8,12] serves as a benchmark in evaluating the accuracy of any algorithm
by comparing the histogram obtained by sample path averaging with
that of SSA. In this regard, we use the same metric as that used by
Cao and Petzold.[13] The comparison between
the algorithms presented in this work with others has been made in
two different ways. Thus, we compare the accuracies obtained for a
fixed computation time, which is more focused on accuracy than computation
time. Alternatively, where accuracy is prescribed, computation times
can be compared for any two algorithms.We employ two different
examples for our demonstration. They are
(i) the Schlogl model noted for its bistability and (ii) a linear
consecutive reaction system. The algorithms used for comparison are
those due to Cao’s method,[8] Gillespie’s
midpoint Poisson τ-leap method,[11,12] and the binomial
leap method[9,10] subsequently improved by Peng
et al.[11]
Examples
Example
1
We consider the consecutive linear reaction
system as in (23).We apply our τ selecting method with
formula 22 for the generation of τ. With
rate constants c1 = 1 and c2 = 1 and initial conditions X1 = 104, X2 =
1 and X3 = 0, we calculate X3 until time is equal to 0.1.
Example 2
Schlogl’s chemical reaction model
is shown in (24). This model is noted for its
bistability. B1 and B2 are constants with their particle numbers as N1 and N2, respectively.The values of parameters,
adapted from Cao et al.,[8] are c1 = 3 × 10–7, c2 = 10–4, c3 = 10–3, c4 = 3.5, N1 = 1 × 105, and N2 = 2 × 105. The initial condition of X is 250, and we simulate
the system up to t = 4.
Discussion
of Results
Figure 2 shows the probability
density for
X3 at t = 0.1 for SSA, the present method and the
binomial method of Peng et al.,[11] using
ε = 0.02. The higher and close proximity of the distribution
by the present method to the SSA is clearly evident. Table 1 shows the number of sample paths used for averaging
is only 10,000 for an accuracy notably higher than that from the algorithm
under comparison. Although the computational time for each sample
path for the present method is higher, a more accurate solution is
obtained with 40% higher speed. Figure 3 shows
the accuracy of the algorithms in comparison as a function of simulation
time at different values of ε. Histogram distance errors are
measured by 106 samples and 105 samples generated
from the SSA method and the two τ-leap methods, respectively,
at different values of ε. For the same simulation time, the
accuracy of the histogram is notably higher for the present method.
Table 1
Comparison of Results Generated from
Two Different Methods: Binomial Method[13] and the Present Method for the Consecutive Linear Reactiona
binomial method
present
method
no. of trajectories
30,000
10,000
ε
0.02
0.02
δ
0.667
histogram error
0.0090
0.0018
total simulation time (s)
5.417 × 103
3.269 × 103 (40% faster)
This table shows all of the parameters
and results shown in Figure 2.
Figure 3
Comparison
of histogram error corresponding to different simulation
times for the two methods being used to model the consecutive linear
reaction system. The binomial method of Peng et al. is shown in red
whereas ours is shown in blue.
Comparison
of histogram error corresponding to different simulation
times for the two methods being used to model the consecutive linear
reaction system. The binomial method of Peng et al. is shown in red
whereas ours is shown in blue.This table shows all of the parameters
and results shown in Figure 2.Figure 4 shows
comparison of the histogram
error as a function of simulation time for the present method against
that by Cao et al.[8] for example 2. As in
example 1, the notably higher accuracy of the current method is evident
for this example also. Figure 5 compares the
algorithm of Gillespie’s midpoint Poisson method with the present
method for the same system. In both Figures 4 and 5, the number of SSA samples is 106. The comparison was made based on a fixed number of simulations
for those methods at different values of ε. The simulation time
for the former is considerably higher than that for the latter. When
higher accuracy is needed, the computational time needed for Gillespie’s
method is substantially longer. Another verification for our claim
is shown in Figure 6. In this figure, ε is fixed to be 0.2, and results generated
using two different methods at different numbers of samples were then
compared. Similarly, we observed that our present algorithm improves
both speed and accuracy. Although the computation time for each sample
path is notably larger for the present work because of smaller time
intervals, the higher efficiency of the resulting sample path allows
accurate results with a significantly smaller number of sample paths
as established in Figures 3, 4, and 5. Indeed the foregoing results
bear testimony to notable improvements with our algorithm arising
from the use of Chebyshev’s inequality.
Figure 4
Comparison
of histogram error with respect to different simulation times of two
different methods: the method of Cao et al.[8] in red and ours in blue. This plot shows the accuracy level of the
two methods for Schlogl’s system at different ε.
Figure 5
Comparison of histogram error with respect to
different simulation
times of two different methods: Gillespie’s midpoint Poisson
method in red and ours in blue. This plot shows the accuracy level
of the two methods for Schlogl’s system at different ε.
Figure 6
Comparison of histogram error with respect to
different simulation
times of two different methods: Gillespie’s midpoint Poisson
method in red and ours in blue. This plot shows the accuracy level
of the two methods for Schlogl’s system at different numbers
of samples at ε = 0.2.
Comparison
of histogram error with respect to different simulation times of two
different methods: the method of Cao et al.[8] in red and ours in blue. This plot shows the accuracy level of the
two methods for Schlogl’s system at different ε.Comparison of histogram error with respect to
different simulation
times of two different methods: Gillespie’s midpoint Poisson
method in red and ours in blue. This plot shows the accuracy level
of the two methods for Schlogl’s system at different ε.Comparison of histogram error with respect to
different simulation
times of two different methods: Gillespie’s midpoint Poisson
method in red and ours in blue. This plot shows the accuracy level
of the two methods for Schlogl’s system at different numbers
of samples at ε = 0.2.
Conclusion
The algorithm presented in this
work has attributes of efficiency
earned from being able to account for the likelihood with which approximations
for τ-leap are satisfied by simulated sample paths. The choice
of this probability is a rational guideline to Monte Carlo simulations.