Literature DB >> 24699269

Exact hybrid particle/population simulation of rule-based models of biochemical systems.

Justin S Hogg¹, Leonard A Harris¹, Lori J Stover¹, Niketh S Nair¹, James R Faeder¹.

Abstract

Detailed modeling and simulation of biochemical systems is complicated by the problem of combinatorial complexity, an explosion in the number of species and reactions due to myriad protein-protein interactions and post-translational modifications. Rule-based modeling overcomes this problem by representing molecules as structured objects and encoding their interactions as pattern-based rules. This greatly simplifies the process of model specification, avoiding the tedious and error prone task of manually enumerating all species and reactions that can potentially exist in a system. From a simulation perspective, rule-based models can be expanded algorithmically into fully-enumerated reaction networks and simulated using a variety of network-based simulation methods, such as ordinary differential equations or Gillespie's algorithm, provided that the network is not exceedingly large. Alternatively, rule-based models can be simulated directly using particle-based kinetic Monte Carlo methods. This "network-free" approach produces exact stochastic trajectories with a computational cost that is independent of network size. However, memory and run time costs increase with the number of particles, limiting the size of system that can be feasibly simulated. Here, we present a hybrid particle/population simulation method that combines the best attributes of both the network-based and network-free approaches. The method takes as input a rule-based model and a user-specified subset of species to treat as population variables rather than as particles. The model is then transformed by a process of "partial network expansion" into a dynamically equivalent form that can be simulated using a population-adapted network-free simulator. The transformation method has been implemented within the open-source rule-based modeling platform BioNetGen, and resulting hybrid models can be simulated using the particle-based simulator NFsim. Performance tests show that significant memory savings can be achieved using the new approach and a monetary cost analysis provides a practical measure of its utility.

Entities: CellLine Chemical Disease Species

Mesh：

Year: 2014 PMID： 24699269 PMCID： PMC3974646 DOI： 10.1371/journal.pcbi.1003544

Source DB: PubMed Journal: PLoS Comput Biol ISSN： 1553-734X Impact factor: 4.475

This is a PLOS Computational Biology Methods article.

Introduction

Rule-based modeling

Cell signaling encompasses the collection of cellular processes that sample the extracellular environment, process and transmit that information to the interior of the cell, and regulate cellular responses. In a typical scenario, molecules outside of the cell bind to cognate receptors on the cell membrane, resulting in conformational changes or clustering of receptors. A complex series of protein binding and biochemical events then occurs, ultimately leading to the activation or deactivation of proteins that regulate gene expression or other cellular processes [1]. A typical signaling protein possesses multiple interaction sites with activities that can be modified by direct chemical modification or by the effects of modification or interaction at other sites. This complexity at the protein level leads to a combinatorial explosion in the number of possible species and reactions at the level of signaling networks [2]. Combinatorial complexity poses a major barrier to the development of detailed, mechanistic models of biochemical systems. Traditional modeling approaches that require manual enumeration of all potential species and reactions in a network are infeasible or impractical [2]–[4]. This has motivated the development of rule-based modeling languages, such as the BioNetGen language (BNGL) [5], [6], Kappa [7], [8], and others [9]–[12], that provide a rich yet concise description of signaling proteins and their interactions [13]. The combinatorial explosion problem is avoided by representing interacting molecules as structured objects and using pattern-based rules to encode their interactions. In the graph-based formalisms of BNGL and Kappa, molecules are represented as graphs and biochemical interactions by graph-rewriting rules. Rules are local in the sense that only the properties of the reactants that are transformed, or are required for the transformation to take place, affect their ability to react. As such, each rule defines a class of reactions that share a common set of transformations (e.g., the formation of a bond between molecules) and requirements for those transformations to take place (e.g., that one or more components have a particular covalent modification). The number of reactions encoded by a rule varies depending on the specifics of the model; a rule-based encoding is considered compact if it contains rules that encode large numbers of reactions. Overviews of rule-based modeling with BNGL can be found in Sec. S3.1 of Text S1 and Refs. [6], [14]. A description of the graph-theoretic formalism underlying BNGL is provided in Sec. S4.1 of Text S1, building on a previous graph-theoretical treatment [15].

Network-based and network-free simulation of rule-based models

An important characteristic of rule-based models is that they can encode both finite and infinite reaction networks. If the network is finite and “not too large” (≲10000 reactions [16]) it can be generated from the rule-based model algorithmically by a process known as “network generation” [5], [6], [14], [15], [17]. Network generation begins by applying the rules of a rule-based model to a set of initial “seed” species, which define the initial state of the model system, to generate new species and reactions. The new species are then matched against the existing species to determine whether or not they are already present in the network [18]. Any species that are not already present are added to the network and an additional round of rule application is performed. This iterative process continues until an iteration is encountered in which no new species are generated. The resulting system of reactions can then be simulated using a variety of network-based deterministic and stochastic simulation methods. For example, network-based simulation methods currently implemented within BioNetGen include SUNDIALS CVODE [19] for ordinary differential equation (ODE)-based simulations, Gillespie's stochastic simulation algorithm (SSA; direct method with dynamic propensity sorting) [20], [21], and the accelerated-stochastic “partitioned-leaping algorithm” [22]. The rule-based methodology also provides a way to simulate models with prohibitively large or infinite numbers of species and reactions. This “network-free” approach involves representing molecular complexes as particles and applying rule transformations to those particles at runtime using a kinetic Monte Carlo update scheme [23], [24]. At each simulation step, reactant patterns are matched to the molecular complexes within the system to calculate rule propensities. The rule to next fire is then selected probabilistically as in the SSA [20] and the particle(s) to participate in the transformation is (are) selected randomly from the set of matches. When the rule fires, transformations are applied to the reactant complexes to create the products. Since the reactants and products are determined at runtime there is no need to enumerate all species and reactions a priori as in network-based methods. This procedure is a particle-based variant of Gillespie's algorithm [23], [24] and a generalization of the “n-fold way” of Bortz et al. [25], which was originally developed to accelerate the simulation of Ising spin systems. An efficient, open-source implementation that is compatible with BNGL models is NFsim, the “network-free simulator” [16]. Other network-free simulation tools for rule-based models include RuleMonkey [26], DYNSTOC [27], SRsim [28], and KaSim [24]. A recent paper [29] compares the rejection-based sampling technique [23] used in NFsim with the rejection-free approach employed in RuleMonkey. For models of multivalent ligand-receptor binding, rejection-based sampling was shown to be more efficient in the vicinity of the solution-gel phase boundary, while rejection-free sampling was more efficient for simulating the dynamics within the gel phase. Since only the current set of molecular complexes and the transformations that can be applied to them are tracked, network-free methods can efficiently simulate systems that are intractable to network-based methods [16], [23], [24], [29]. However, the explicit representation of every molecule in the system is a major shortcoming of the approach. As such, network-free methods can require large amounts of computational memory for systems that contain large numbers of particles, a potential barrier to simulating systems such as the regulatory networks of a whole cell [30], [31]. A typical eukaryotic cell, for example, contains on the order of protein-coding genes, mRNA molecules, and protein molecules [32], [33], along with much larger numbers of metabolites, lipids, and other small molecules. Simulating a cell at this level of detail using a network-free approach would be impractical. There is a need, therefore, for new approaches that can reduce the memory requirements of network-free simulation methods.

Computational complexity

A common measure of the computational cost of an algorithm is its computational complexity. In basic terms, computational complexity measures how the computational cost increases as an algorithm is applied to increasingly larger data sets [34]. For the simulation methods considered in this paper, two types of computational complexity are important: (i) space complexity, the number of memory units consumed during the execution of an algorithm; (ii) time complexity, the number of computational steps required to complete an algorithm. Network-based exact-stochastic simulation methods, like Gillespie's SSA [20], [35], [36], treat species as lumped variables with a population counter. Therefore, their space complexity is constant in the number of particles in the system. However, representing the reaction network has a space complexity that is linear (or worse if a reaction dependency graph is used [37], [38]) in the number of reactions. Network-based SSA methods are thus space efficient for systems with large numbers of particles, but less so for systems with large numbers of reactions. The time complexity of SSA methods is more difficult to quantify. It depends on model-specific factors such as the number of reactions in the network and the values of rate constants and species concentrations, as well as methodological factors such as how the next reaction to fire in the system is selected [20], [21], [37]–[41] and how reaction propensities are updated after each reaction firing [37], [38]. However, for our purposes, what matters is that the time cost per event (reaction firing) for these methods is constant in the number of particles in the system and increases with the number of reactions in the network. Network-free methods, in contrast, represent each particle individually. Thus, their space complexity is linear in the number of particles. This is the primary shortcoming of these methods, as it limits the size of system that can be feasibly simulated. However, since reactions are not enumerated, their space complexity is linear in the number of rules, rather than the number of reactions. This is a key advantage for models where very large reaction networks are encoded by a small number of rules. Network-free methods also have an advantage over network-based methods in that their time complexity per event also scales with the number of rules, rather than the number of reactions. Since the number of rules in a rule-based model is typically far less than the number of reactions, this can be a substantial improvement. For example, NFsim has been demonstrated to significantly outperform network-based SSA methods for a family of receptor signaling models with large reaction networks [16]. We also note that for many models network-free methods have a time cost per event that is constant in the number of particles. However, for systems in which large aggregates form (e.g., models with polymerization dynamics [42], [43]) the cost can be significantly higher, scaling with the number of particles [16], [24]. Nevertheless, network-free methods are still usually the best option in these cases because these types of models tend to encode very large reaction networks [16]. In Table 1, we summarize the space and time complexities for different network-based SSA variants and for the network-free algorithm. Of most relevance to the current work are the entries that show: (i) the space complexity of network-based methods is constant in the number of particles and linear (or worse) in the reaction network size; (ii) the space complexity of network-free methods is linear in the number of particles and independent of the reaction network size, depending instead on the number of rules; (iii) the time complexity of network-based methods depends on the number of reactions in the network while for network-free methods it depends on the number of rules. Network-based methods are thus the best choice for systems with large numbers of particles and a small to moderate reaction network, and network-free methods are the best choice for systems with a large reaction network and small to moderate numbers of particles. However, neither method is optimal for systems that contain both a large number of particles and a large reaction network.

Table 1

Space and time complexities for network-based (SSA) and network-free (NF) stochastic simulation algorithms.

	SSA		NF
	Particles (P)	Reactions (R)	Particles (P)	Rules ()
Space		a, b		a
Time (per event)		c, d, e	, f	g

No dependency graph.

Dependency graph [37], [38].

Logarithmic classes (with dependency graph) [21], [39], [40].

Next-reaction method (with dependency graph) [37].

Direct method (with or without dependency graph) [20].

Polymerizing systems in gel phase [23], [42] (see Fig. 5B).

Direct method-like implementation.

Scalings are shown with respect to particle number, , and number of reactions, , or rules, . For combinatorially-complex models, . Note that time complexity is given on a “per event” (reaction/rule firing) basis. If a reaction dependency graph [37] is used, the space and time complexities of SSA methods with respect to depend on , the maximum number of reactions updated after each reaction firing [37], [38]. In combinatorially-complex models, often increases with (see Figure S1 of the supporting information). The time complexity of SSA methods with respect to also depends on the method used for selecting the next reaction to fire in the system. Scalings are shown for three different SSA variants that use different selection methods [20], [21], [37], [39], [40]. Also note that optimized variants of the direct method [21], [38], [41] have been shown to outperform methods with lower asymptotic complexity in some cases [38]. Space and time complexities of the NF algorithm with respect to assume no dependency graph and that the next rule to fire is selected as in Gillespie's direct method [20], although in principle other variants are possible.

No dependency graph. Dependency graph [37], [38]. Logarithmic classes (with dependency graph) [21], [39], [40]. Next-reaction method (with dependency graph) [37]. Direct method (with or without dependency graph) [20]. Polymerizing systems in gel phase [23], [42] (see Fig. 5B).

Figure 5

HPP performance analysis for the TLBR model.

(A) peak memory usage (left: absolute, right: relative to NFsim); (B) CPU run time (left: absolute, right: relative to NFsim); (C) number of reaction events fired during a simulation (); (D) equilibrium distribution of number of clusters (). The slight deviation from linearity for ‘NF’ in (A) is an artifact of how memory is allocated in NFsim.

Direct method-like implementation. Scalings are shown with respect to particle number, , and number of reactions, , or rules, . For combinatorially-complex models, . Note that time complexity is given on a “per event” (reaction/rule firing) basis. If a reaction dependency graph [37] is used, the space and time complexities of SSA methods with respect to depend on , the maximum number of reactions updated after each reaction firing [37], [38]. In combinatorially-complex models, often increases with (see Figure S1 of the supporting information). The time complexity of SSA methods with respect to also depends on the method used for selecting the next reaction to fire in the system. Scalings are shown for three different SSA variants that use different selection methods [20], [21], [37], [39], [40]. Also note that optimized variants of the direct method [21], [38], [41] have been shown to outperform methods with lower asymptotic complexity in some cases [38]. Space and time complexities of the NF algorithm with respect to assume no dependency graph and that the next rule to fire is selected as in Gillespie's direct method [20], although in principle other variants are possible.

Combining network-based and network-free methodologies

The key idea pursued in this work is that memory consumption can be reduced in network-free simulators if simple species and small molecular complexes that exist in the system in large numbers are treated as population variables with counters rather than as particles. However, retaining the ability to address combinatorial complexity requires retaining the particle representation for species and complexes that are comprised of many molecules and/or have a large number of internal states. Here, we present an approach, termed the hybrid particle/population (HPP) simulation method, that accomplishes this. Given a user-defined set of species to treat as population variables, the HPP method partially expands the network around these population species and then simulates the partially-expanded model using a population-adapted particle-based method. By treating complex species as structured particles, HPP capitalizes on the reduced time complexity with respect to network size characteristic of the network-free approach. However, for the subset of species treated as population variables, we take advantage of the constant memory requirements of the network-based methodology. It is important to emphasize that in the HPP approach it is the system that is represented in a hybrid manner, as a collection of particles and population variables. The underlying simulator remains the same particle-based variant of Gillespie's algorithm that is used in existing network-free simulators [23], [24], but with small modifications to support population variables. This distinguishes HPP from other types of hybrid methods that combine different simulation methodologies, e.g., ODE/SSA integrators [44]–[53].

Related work

While numerous rule-based modeling frameworks have been developed, little has been done with regard to hybrid particle/population simulation. Kappa [7], [8] has the concept of “tokens,” which are structureless population-type species. Modelers can write hybrid models in terms of both structured “agents” and structureless tokens and simulate them using KaSim 3, the most recent version of the Kappa-compatible network-free simulator (https://github.com/jkrivine/KaSim). However, there is no facility for transforming a model written exclusively in terms of agents into a hybrid form, as in our HPP method. Bittig et al. [10] have developed a spatial rule-based language called ML-Space that builds upon the multi-level language ML-Rules [9]. “Entities” that are assigned optional attributes such as shape, volume, and position in continuous space are automatically treated as particles diffusing via Brownian motion, while those without these attributes are treated as population variables reacting and diffusing within a discretized space (subvolumes). For non-spatial models, the population-based network-free algorithm (PNFA) of Liu et al. [54] employs a similar philosophy: all multi-state (structured) species are automatically treated as particles, while single-state species are treated as population variables. Both ML-Space and PNFA lack a general representation of intermolecular bonding, which makes it difficult to account for combinatorial complexity associated with aggregation processes [2], [29]. Falkenberg et al. [55] have proposed a hybrid deterministic/stochastic method that specifically addresses the problem of aggregation. Their approach first calculates occupancy probabilities as a function of time for all binding-site types by treating them as population variables and numerically integrating an associated set of deterministic ODEs describing the binding/unbinding kinetics. An ensemble of system states is then obtained by randomly distributing bonds, based on these probabilities, among a finite number of discrete molecules. The method assumes that inter- and intra-molecular bond formations occur with equal rates. Thus, although efficient for problems with high symmetry, its applicability to more general cases may be limited. Other approaches aimed at improving the efficiency of rule-based simulations include “on-the-fly” network generation [17], [56], [57], where the reaction network is gradually built up by adding reactions only when new species appear in the system. The approach has only been developed within the context of discrete-stochastic simulation and has been shown to be significantly less efficient than network-free approaches when applied to combinatorially-complex models [23], [58]. An alternative approach to reducing computational cost is exact model reduction (EMR) [59]–[64]. EMR aims to reduce the state space of a rule-based model while preserving the exact system dynamics with respect to observable quantities. These methods can achieve dramatic reductions in model complexity when applied within the context of ODEs, so long as the model does not contain significant cooperative or allosteric interactions [62], [64]. EMR for stochastic simulations, however, has so far been less successful (see http://infoscience.epfl.ch/record/142570/files/stochastic_fragments.pdf).

Methods

Example models

We have tested the performance of the HPP method by applying it to four example models, summarized in Table 2 and discussed in further detail below. All of the models are biologically relevant and are either taken directly from the literature or are based on models taken from the literature. Complete BNGL encodings, HPP configuration files (containing actions for loading models, defining population maps, and executing simulations), and partially-expanded versions of all example models are provided as Texts S5, S6, S7, S8, S9, S10, S11, S12, S13, S14, S15, S16, S17 of the supporting information.

Table 2

Summary of example models used to test the performance of the HPP method.

Model	Rules	Reactions	Species	Particles (f = 1)	Population species	Rules after PNE	t_end (s)
TLBR [16], [42], [65]	4	∞	∞	5.3×10⁶	2	9	500
Actin [16], [43]	21	∞	∞	1.2×10⁶	2	25	1000
FcεRI [16], [70], [81]	24	58 276	3744	6.9×10⁶	1/6	25/38	2400
EGFR [18], [71], [72]	113	415 858	18 950	2.2×10⁶	29	159	1200

Number of particles is for an NFsim simulation of a full cell volume (). Fractional cell volumes as low as 0.001 and as high as 1 are used in the performance analyses (see “Example models” for details). Number of rules after PNE includes the population-mapping rules (one per population species).

Trivalent-ligand bivalent-receptor

The trivalent-ligand bivalent-receptor (TLBR) model is a simplified representation of receptor aggregation following multivalent ligand binding. TLBR has biological relevance to antigen-antibody interaction at the cell surface, where bivalent receptor complexes aggregate in the presence of multivalent antigen [65]. A theoretical study of the TLBR system was presented by Goldstein and Perelson [65], who derived analytical conditions for a solution-gel phase transition in terms of binding equilibrium constants, free ligand concentration, and receptors per cell. A more recent study considered the effects of steric constraints and ring closure on the solution-gel phase transition [42]. Despite its simplicity, the TLBR system experiences a state-space explosion near the solution-gel phase boundary. A computational study by Sneddon et al. using NFsim [16] reproduced the analytical results of Goldstein and Perelson. Due to large excesses of ligand and receptor under certain conditions, TLBR is a natural test case for HPP. We simulated the TLBR system using HPP with free ligand and receptor treated as population species. All simulations were performed with parameters as defined in Monine et al. [42], which lie within the solution-gel phase coexistence region. A cell-scale simulation assumed extracellular volume per cell ( cells/ml) with ligand and receptors per cell. Simulations were performed at fractional cell volumes, , ranging from to with a lumping rate constant k_lump = 10000/s (see below).

Actin polymerization

Actin polymerization plays a key role in cell morphology and motility [66], [67]. Roland et al. [43] presented a dynamic model of actin polymerization featuring filament elongation by monomer addition, stabilization by ATP hydrolysis, and severing mediated by actin depolymerizing factor (ADF)/cofilin. Sneddon et al. [16] presented a rule-based formulation of the Roland et al. model and replicated their results using NFsim. The model features an excess of actin monomer and ADF molecules. Therefore, we speculated that substantial memory reduction would be possible using the hybrid approach. We applied HPP to the Sneddon et al. rule-based model of actin dynamics (hereafter referred to as the Actin model) with actin monomer and ADF treated as population species. A cell-scale simulation assumed intracellular volume with actin monomer and ADF/cofilin. Simulations were performed at fractional cell volumes, , ranging from to with a lumping rate constant . = 10000/s

signaling

is a membrane receptor that binds IgE antibodies. Signaling through regulates basophilic histamine release in response to IgE antibody-antigen interaction [68]. Faeder et al. [69], [70] developed a rule-based model of receptor assembly and activation in which receptor dimerization/clustering is mediated by chemically cross-linked IgE, which serve as multivalent ligands. Dimerized receptors are transphosphorylated, leading to Syk and Lyn recruitment and phosphorylation. Sneddon et al. [16] presented several extensions of the Faeder et al. model, including the gamma2 variant with two phosphorylation sites. Particle-based NFsim simulations of the gamma2 model were found to be substantially faster than network-based SSA simulations. Due to the excess of free ligand, the HPP method was applied to the gamma2 model to reduce memory consumption. The method was applied with two different sets of population species. In the first case, only free ligand was treated as a population species (). In the second, cytosolic Lyn and all four phosphorylation states of cytosolic Syk were also treated as populations (). A cell-scale simulation assumed intracellular volume with extracellular space per cell ( cells/ml), ligand, and receptors per cell. Simulations were performed at fractional cell volumes, , ranging from to with a lumping rate constant = 10000/s.

EGFR signaling

A model of signaling through the epidermal growth factor receptor (EGFR), beginning with ligand binding and concluding with nuclear phospho-ERK activity, was constructed by combining three existing models: (i) a rule-based model of EGFR complex assembly [18]; (ii) a Ras activation model [71]; (iii) a pathway model of Raf, MEK and ERK activation [72]. Ras activation was coupled to the EGFR complex assembly by treating receptor-recruited Sos as the Ras GEF. Activated Ras was coupled to the Raf/MEK/ERK cascade through RasGTP-Raf binding and subsequent phosphorylation of Raf. Parameters for the combined model were obtained from the respective models. However, parameters governing Ras-GEF (i.e., Sos) activity had to be changed from their original values [71] in order to account for the known GEF-mediated activation of Ras [73]. Specifically, we used and (unitless). Free EGF and Raf-, MEK-, and ERK-based species were treated as population species in the hybrid variant. Ras-based species were also treated as populations except for those that include a Sos molecule. A cell-scale simulation assumed cytosolic and nuclear volume, with extracellular space, ligand, and receptors per cell. Simulations were performed at fractional cell volumes, , ranging from to with a lumping rate constant = 100000/s.

Performance metrics

HPP was evaluated for peak memory use, CPU run time, and accuracy as compared to particle-based NFsim simulations. For models where network generation is possible ( and EGFR), comparisons were also made to SSA simulations (as implemented within BioNetGen [6]). All simulations were run on a Intel Xeon E5520 @ 2.27 GHz (8 cores, 16 threads, x86_64 instruction set) with of RAM running the GNU/Linux operating system. To ensure that each process had access to of the compute cycles of a thread, no more than simulations were run simultaneously.

Peak memory

Average peak memory usage for each simulation method was calculated based on seven independent simulation runs. Peak memory for each run was evaluated by peak virtual memory allocation reported by the operating system with the command “cat/proc//status”. For all tested models, peak memory was achieved early in the simulation and remained steady throughout (data not shown).

CPU run time

Average CPU run time for each simulation method was calculated based on seven independent simulation runs using clock time as a metric. Clock time for each run was recorded using the Time::HiRes Perl module. Run time included initialization as well as the simulation phase. Partial network expansion for HPP simulations was a one time cost, typically a few seconds, and was not included in the calculation.

Accuracy

Simulation accuracy was quantified using several approaches. First, since HPP, NFsim, and SSA are all exact-stochastic methods, they should all produce statistically the same number of reaction firings. To verify this, for all tested models the total number of reaction firings was recorded for each of 40 independent simulation runs of each method (firings of population-mapping rules were subtracted from the total in HPP simulations). The Mann-Whitney U test [74], [75] was then used to test the null hypothesis that none of the methods produces a larger number of reaction firings. For the TLBR and Actin models, we further compared equilibrium distributions for key observables. These include the number of receptor clusters in the TLBR model and the length of actin polymers in the Actin model. 10 000 samples were collected over 100 000 seconds of simulated time and distributions were compared by binning samples (20 bins) and performing a two-sample chi-squared test [76]. For the and EGFR models, we compared dynamic trajectories for key observables. These include phosphorylated receptor and receptor-recruited, phosphorylated Syk in the model, and activated Sos and nuclear phosphorylated ERK in the EGFR model. Due to complications of autocorrelation, a statistical test was not applied to the dynamic trajectory comparison. Instead, moving averages and frequency envelopes, based on simulation runs of each method using a sampling window of , were plotted for inspection by eye.

Software

All HPP and NFsim simulations reported in this work were run using NFsim version 1.11, which is available for download at http://emonet.biology.yale.edu/nfsim. All simulations (SSA included) were invoked through BioNetGen version 2.2.4, which implements the hybrid model generator and is distributed with NFsim 1.11. Instructions for running simulations with BioNetGen (ODE, SSA, and HPP) can be found in Secs. S3.2 and S3.3 of Text S1 and Refs. [6], [14]. NFsim and BioNetGen source code are available at http://code.google.com/p/nfsim and http://code.google.com/p/bionetgen, respectively. Additional documentation for BioNetGen can be found at http://bionetgen.org.

Results

A hybrid particle/population simulation approach

In this section, we first present an approach, termed “partial network expansion,” for transforming a rule-based model into a dynamically-equivalent, partially-expanded form. We then describe a simple modification to the network-free simulation protocol that permits simulation of the transformed model as a collection of both particles and population variables. We refer to the combination of these methods as the hybrid particle/population (HPP) simulation method. The basic workflow is shown in Fig. 1.

Figure 1

Basic workflow of the HPP simulation method.

Basic workflow of the HPP simulation method.

Given a rule-based model and a user-specified set of population-mapping rules (which define the population species), partial network expansion (PNE) is performed to generate a hybrid version of the original model. The hybrid model is then passed to a population-adapted network-free simulator (e.g., NFsim 1.11), which generates the time-evolution trajectories for all observable quantities specified in the original model. The HPP approach is analogous to the coupled procedure of network generation and simulation described above, where a rule-based model is first transformed into a fully-expanded reaction network and then simulated as a collection of population variables (i.e., species) using a network-based simulator. The obvious differences are that in HPP the network is only partially expanded and the system can only be simulated stochastically using a population-adapted network-free simulator. The partial network expansion algorithm has been implemented within the open-source rule-based modeling package BioNetGen [5], [6], [14] and resulting hybrid models can be simulated using version 1.11 (or later) of the network-free simulator NFsim [16], which has been modified to handle population-type species. For convenience, we adhere in this paper to the BNGL syntax, which is summarized in Sec. S3.1 of Text S1 of the supporting material. However, the HPP method is generally applicable to any rule-based modeling language for which there exists a network-free simulator capable of handling a mixed particle/population system representation, e.g., KaSim 3.x for Kappa language models (see https://github.com/jkrivine/KaSim).

Population species and population-mapping rules

Given a rule-based model, the first step in the HPP approach is to select a subset of species to treat as “lumped” population variables. There are no hard-and-fast rules for doing this but, generally speaking, species that are good candidates for a population treatment (i) have a small number of components and internal states, (ii) participate in a small number of rules, and (iii) maintain a large population throughout the course of a simulation. An example is a simple ligand species that exists in great excess in the extracellular environment and interacts with cell surface receptors. It is our experience that these simple rules of thumb, combined with the experience and intuition of the modeler, are usually sufficient for selecting an adequate set of population species. However, in some cases a more systematic approach may be desirable. We will return to this topic below. For now, however, let us assume that we have selected a suitable set of population species. The next step in the HPP approach is to map each of these to an associated unstructured species. The mapping is accomplished by defining a population-mapping rule, which follows the same syntactic conventions as a standard BNGL rule. For example, the rulemaps the unbound EGF ligand, Egf(r), to the unstructured species pop_Egf(). To avoid confusion, we will henceforth refer to species on the reactant side of a population-mapping rule, such as Egf(r), as structured population species and to those on the product side as unstructured population species. Importantly, unstructured population species differ from conventional unstructured molecules in BNGL in that they possess a property, called a count, which records their current population (see Sec. S3.3 of Text S1 and Texts S4, S7, S10, S13, S14, and S17 to see how the population keyword is used to make this distinction). The action of the population-mapping rule above is thus to delete the Egf(r) molecule and to increment by one the count of pop_Egf(). The role of the rate parameter k_lump, termed the lumping rate constant, will be explained in detail below.

Partial network expansion

Ultimately, our goal in the HPP method is to replace in the simulation environment large numbers of indistinguishable particles with small numbers of lumped objects containing population counters (the unstructured population species), thus significantly reducing memory usage. In order to accomplish this without losing any information regarding the dynamics of the system, we must partially expand the rule set of the original model until all interactions and transformations in which the structured population species participate as reactants (see below) are enumerated. We can then swap the structured species with their unstructured counterparts, which have been specified via the population-mapping rules. We refer to this procedure as partial network expansion (PNE). The PNE algorithm is comprised of three basic steps, which are applied to each rule of a rule-based model: For each reactant pattern in the rule, identify all matches of that pattern into the set of structured population species. Also collect a self-match of the reactant pattern unless it equals one of the population species (this can only happen if the reactant pattern is a fully-specified species; see below for further discussion). Derive an expanded set of rules by applying the rule to all possible combinations (the cartesian product) of the pattern matches collected in Step 1. For each derived rule from Step 2, replace each instance of a structured population species with its unstructured population counterpart. The result is an expanded rule set consisting of three general types of rules: (i) particle rules, in which all reactants are conventional reactant patterns; (ii) mixed particle/population rules, where at least one reactant is a conventional reactant pattern and one is an unstructured population species; (iii) pure population reactions, where all reactants are unstructured population species. This expanded rule set has the property that every possible action of the original rule set on the population species is enumerated while actions on particle objects remain pattern-based (i.e., non-enumerated). For a more formal presentation of the PNE algorithm, complete with pseudocode, we direct the reader to Sec. S4.2 of Text S1.

Role of the population-mapping rules

After completion of PNE, the final step in transforming a rule-based model into a form that can be simulated as a hybrid particle/population system is to append the population-mapping rules to the expanded rule set. The reason for doing this is not immediately obvious. We have seen above that the population-mapping rules specify which structured species are to be replaced in the transformed model with population variables. However, an obvious question to ask is why we have chosen to specify this information via a set of reaction rules, rather than simply as a list of species to be lumped. The answer is combinatorial complexity. As explained above, systems that are combinatorially complex are comprised of a relatively small number of constituent parts but exhibit an explosion in the number of potential species and reactions due to the myriad number of ways in which these parts can be connected and arranged. Rule-based modeling is effective in representing these systems because it focuses only on the portions of molecular complexes that affect biochemical reactivity, not on entire species. However, a consequence of this approach is that there is often ambiguity regarding the products of a reaction rule. A rule may describe the breaking of a bond between two molecules, for example, but the exact composition of the resulting complexes is left necessarily ambiguous (see Fig. 2).

Figure 2

Simple illustration of ambiguity in the products of reaction rules.

Simple illustration of ambiguity in the products of reaction rules.

(A) A simple rule encodes the reversible binding of two molecule types, A and B. (B)–(D) If both molecules have multiple binding sites then they may be present within arbitrarily complex complexes. Breaking the bond between A and B thus produces a variety of product species, some of which may correspond to population species and others not. Dashed line represents a bond addition/deletion operation. With regard to the HPP approach, this ambiguity in the products of a reaction rule complicates the process of PNE. Application of a reaction rule to one complex may produce a population species, whereas application of the same rule to a different complex may not. Distinguishing between cases where population species are produced and where they are not is difficult, and may even be impossible if the system is combinatorially complex. Thus, the strategy that we have adopted here is to expand the network out only to the point where all population species on the reactant side are enumerated and to handle the ambiguity in products by adding the population-mapping rules to the rule set. The role of the population-mapping rules is thus to detect any instances of structured population species that appear in the simulation environment as products of a rule application and to gather them up into the unstructured population pool. This returns us to the issue of the lumping rate constant, k_lump. In Step 1 of the PNE algorithm, if a reactant pattern equals a population species then we discard the self-match (the structured version of the population species). To see why we do this, consider the binding rule depicted in Fig. 2A. However, different from Figs. 2B–D, assume that molecules A and B have only one binding site each. If we choose to lump the unbound molecules then we must define the following population-mapping rules: Obviously, these structured population species are equivalent to the reactant patterns in Fig. 2A. However, let us choose not to discard the self-matches in this case. PNE would then generate the following four derived rules: We see that the first three of these rules have conventional (structured) reactant patterns. However, if k_lump is sufficiently large then particle instances of A(b) and B(a) will never exist in the system long enough to be matched to these patterns. Thus, these rules can be safely discarded, which is equivalent to discarding the self-match in Step 1 of the PNE algorithm. Retaining only the fourth derived rule (the pure population version) simplifies the process and keeps the size of the derived rule set to a minimum. The consequence of this is obviously that the HPP method is formally exact only for an infinite lumping rate constant. From a practical point of view, this could be a problem if the network-free simulator being used does not support infinite rates (e.g., NFsim currently does not). However, our performance tests indicate that as long as k_lump is “large” with respect to the model dynamics then essentially exact results can be obtained (see ‘Performance analyses’). Nevertheless, we have implemented in BioNetGen a “safe” mode for PNE that retains all of the self-matches and, hence, produces exact results for any value of k_lump (see Sec. S3.3 of Text S1 for instructions on how to call this method). For a select number of examples, we have confirmed that both approaches give essentially identical results for sufficiently large k_lump and that the “safe” mode is less efficient (data not shown).

Simple example of PNE

PNE is best illustrated through an example. In Fig. 3, we present a simple rule-based model of receptor activation (for brevity, parameters, initial populations, and output observables are omitted; see Text S2 of the supporting material for the complete model in BNGL format). The model includes a ligand, L, its cognate receptor, R, and three cytosolic proteins, A, B, and C, that are recruited to the phosphorylated receptor. The 16 rules (six unidirectional and five reversible), describing ligand-receptor binding, receptor phosphorylation/dephosphorylation, and protein recruitment, encode a reaction network comprised of 56 species and 287 reactions. In applying the HPP method, eight species are selected for lumping: free ligand, free A, B and C, and complexes of A, B and C that exclude the receptor. Receptor complexes are treated as particles because there are many possible receptor configurations (48 total).

Figure 3

Simple receptor activation model in BNGL format.

Abridged; see Text S2 of the supporting material for the complete model and Text S3 for the population-mapping rules.

Simple receptor activation model in BNGL format.

Abridged; see Text S2 of the supporting material for the complete model and Text S3 for the population-mapping rules. In Fig. 4, a step-by-step application of PNE to rule 11f (forward) of Fig. 3 is presented. First, both reactant patterns are matched to the structured population species. Reactant pattern 1 has one match, while reactant pattern 2 has two. Note that since neither reactant pattern exactly equals a species (i.e., is isomorphic to one) the self match (identity automorphism) is added to the reactant match list in both cases. Next, the rule is applied to each possible reactant set (the cartesian product of the reactant match lists). This results in a set of six derived rules. The structured population species are then replaced in these rules by their associated unstructured species, resulting in one pure particle rule (the original rule), three mixed particle/population rules, and two pure population reactions. Including the population-mapping rules, the hybrid model contains a total of 42 rules, more than the original 16 but significantly less than the 287 reactions of the fully-expanded network. The complete partially-expanded HPP model in BNGL format can be found in Text S4 of the supporting material.

Figure 4

Partial network expansion (PNE) applied to Rule 11f of Fig. 3.

See Text S4 of the supporting material for the complete, partially-expanded model.

Partial network expansion (PNE) applied to Rule 11f of Fig. 3.

See Text S4 of the supporting material for the complete, partially-expanded model.

Population-adapted network-free simulation

Although modified relative to the original, the hybrid model generated from PNE remains properly a rule-based model. As such, it can, in principle, be simulated with any of the network-based (after network generation) and network-free simulation methods described above. However, the advantage of recasting the original model into the hybrid form is that it can be represented as a collection of particles and population objects and simulated using a modified network-free method that has the following attributes: (i) a population count property for each molecule object; (ii) a transformation that performs population increments and decrements; (iii) a method for calculating population-weighted propensities (rates). Examples of population-adapted network-free simulators are NFsim 1.11 and KaSim 3.x. The population-weighted propensity of a rule can be calculated asHere, is the rate constant (more generally, the “single-site rate law” [6]), is the symmetry factor (see Note 4.21 of Ref. [6]), is the number of reactant patterns in the rule (i.e., the molecularity), is the total number of complexes in the system, is the population of complex (unity in the case of particles), and is the number of matches of reactant pattern into complex (unity or zero for unstructured population species, i.e., the species either is the reactant or it is not). The difference between Eq. 1 and the formula used for calculating propensities in standard network-free simulators is the term ; a fully particle-based network-free calculation is recovered if all . Conversely, the difference between Eq. 1 and the formula used in network-based SSA simulators is the term ; a fully population-based calculation is recovered if all or , in which case is the total number of species in the network. Equation 1 thus generalizes the concept of propensity for hybrid systems comprised of both particles and population variables. Also note that for symmetric population reactions, e.g., pop_A()+pop_A()→A(a!0).A(a!0), the possibility of a null event must be calculated in order to prevent reactions involving the same molecule. This is accomplished by rejecting the event with probability . Furthermore, since population species have zero components, if complex is a population species and , then for all . This property is useful because it guarantees that a reactant pattern matches either particles or population species exclusively, never a mixture of both. Thus, once a rule has been selected to fire, the particles to participate in that rule can be selected from a uniform distribution rather than from a population-weighted distribution.

Performance analyses

Peak memory use and CPU run time

In Figs. 5–8, panels A, we show absolute and relative (with respect to NFsim) peak memory use as a function of cell fraction, , for all models considered. We see that in all tested cases HPP requires less memory than NFsim. For NFsim, we also see the expected linear relationship (Table 1) between peak memory use and particle number (i.e., cell fraction; the slight deviation from linearity is an artifact of how memory is allocated in NFsim). For HPP, peak memory use also scales linearly with particle number, but with a smaller slope. This is the expected behavior since as the cell fraction is increased (keeping concentrations constant) a portion of the added particles, and hence memory cost, is always absorbed by the population portion of the system. Furthermore, in cases where network generation is possible (, Fig. 7A; EGFR, Fig. 8A), we see the expected constant relationship between memory usage and particle number for the SSA (Table 1). We also see that the SSA requires more memory than both NFsim and HPP for all cell fractions considered. This is due to the high memory cost of the dependency update graph [38] used in the SSA implementation within BioNetGen, which scales with the product of the number of reactions in the network and the number of reactions updated after each reaction firing (see Table 1).

Figure 8

HPP performance analysis for the EGFR signaling model.

Figure 7

HPP performance analysis for the signaling model.

HPP performance analysis for the TLBR model.

(A) peak memory usage (left: absolute, right: relative to NFsim); (B) CPU run time (left: absolute, right: relative to NFsim); (C) number of reaction events fired during a simulation (); (D) equilibrium distribution of number of clusters (). The slight deviation from linearity for ‘NF’ in (A) is an artifact of how memory is allocated in NFsim.

HPP performance analysis for the actin polymerization model.

(A) peak memory usage (left: absolute, right: relative to NFsim); (B) CPU run time (left: absolute, right: relative to NFsim); (C) number of reaction events fired during a simulation (); (D) equilibrium distribution of actin polymer lengths (). The slight deviation from linearity for ‘NF’ in (A) is an artifact of how memory is allocated in NFsim.

HPP performance analysis for the signaling model.

HPP performance analysis for the EGFR signaling model.

(A) peak memory usage (left: absolute, right: relative to NFsim); (B) CPU run time (left: absolute, right: relative to NFsim); (C) number of reaction events fired during a simulation (); (D) timecourses (means and 5–95% frequency envelopes; ) for activated Sos (top) and nuclear phosphorylated ERK (bottom). The slight deviation from linearity for ‘NF’ in (A) is an artifact of how memory is allocated in NFsim. Due to high computational expense, SSA statistics were not collected in (C) and (D). In Figs. 5–8, panels B, we show absolute and relative (with respect to NFsim) CPU run times as a function of cell fraction. Generally speaking, HPP and NFsim run times are comparable in all cases, indicating that the reductions in memory use seen in Figs. 5–8, panels A, are not achieved at the cost of increased run times. In fact, HPP is slightly faster than NFsim in most cases. This is because operations on population species (e.g., increment/decrement) are less costly than the graph operations applied to particles (e.g., subgraph matching). Also note in Fig. 5B the expected quadratic relationship between run time and particle number for the TLBR model (Table 1), which is due to the formation of a super aggregate near the solution-gel phase boundary [23], [42]. In Figs. 7B and 8B, we see that the SSA is slower than both NFsim and HPP for all cell fractions considered. The difference is most pronounced at small cell fractions and is much more significant for EGFR than for . This is expected since previous work [16] has shown that network-free methods perform particularly well for systems with small numbers of particles and large networks (the EGFR network is significantly larger than the network; Table 2). Finally, we see in Fig. 7B that the CPU run time increases as we increase the number of species treated as populations in the model, even though the memory usage remains constant (Fig. 7A). This is interesting because it suggests that the variant, with free ligand as the only population species, is near-optimally lumped for the cell fractions considered. We revisit the issue of optimal lumping sets below. In Figs. 5–8, panels C, we show distributions of the number of reaction firings per simulation run for each of the simulation methods considered. It is evident that for all models the distributions, as illustrated by box plots, are similar for NFsim, HPP, and SSA (the latter for only; Fig. 7C). Statistically speaking, the two-sided Mann-Whitney U test [74], [75] was unable to reject the null hypothesis in all cases at the 5% significance level (TLBR: ; Actin: ; : ; EGFR: ). There is no evidence, therefore, that HPP does not generate statistically identical numbers of reaction firings to both NFsim and SSA. This is as expected since all methods are exact-stochastic approaches. In Figs. 5–8, panels D, we compare distributions obtained from NFsim and HPP simulations of all models. In Fig. 5D, we show equilibrium distributions of the number of receptor clusters in the TLBR model (). In Fig. 6D, equilibrium distributions of polymer lengths in the Actin model are shown (). In both cases, the NFsim and HPP distributions are statistically indistinguishable (TLBR: ; Actin: ). In Fig. 7D, time courses for phosphorylated receptor and receptor-recruited, phosphorylated Syk are shown (). In Fig. 8D, time courses for membrane-recruited (active) SOS and nuclear phospho-ERK are shown (). Although we did not perform any statistical tests, visual inspection of the trajectories clearly shows that in all cases the NFsim and HPP results are virtually identical.

Figure 6

HPP performance analysis for the actin polymerization model.

(A) peak memory usage (left: absolute, right: relative to NFsim); (B) CPU run time (left: absolute, right: relative to NFsim); (C) number of reaction events fired during a simulation (); (D) equilibrium distribution of actin polymer lengths (). The slight deviation from linearity for ‘NF’ in (A) is an artifact of how memory is allocated in NFsim.

Systematic approach to selecting population species

All of the HPP results presented in Figs. 5–8 were obtained with “hand-picked” sets of population species chosen based on modeler experience and intuition. The significant memory savings seen in these plots imply that this approach will often be sufficient in practice. However, it is fair to ask whether a more systematic approach to selecting population species can achieve additional memory savings. In order to address this question, we considered a variety of different lumping sets for each example model and compared their performance in terms of memory usage and CPU run time. The lumping sets were chosen based on average species populations calculated over the course of a single NFsim pre-simulation at cell fraction . Specifically, at periodic intervals, the full set of complexes in the system was collected, each complex canonically labeled, and the number of instances of each label (i.e., species) counted. Average values over the entire simulation were then calculated for each species. Sets of population species were constructed by lumping all species with an average population greater than a range of pre-defined thresholds. For convenience, we chose thresholds of . Average species populations obtained from each NFsim pre-simulation are provided in supplementary Dataset S1. The script that implements this method (for a single threshold) has been included in the recent BioNetGen 2.2.5 release (auto_hpp.pl in the Perl2 subdirectory). In Fig. 9, we show peak memory use and CPU run times for HPP simulations of each model at each lumping set considered. In general, these results illustrate the success of the hand-picked lumping sets, which produced memory savings close to the optimal in most cases. There was, however, some room for improvement in the model (Fig. 9C). This is because the fourth and fifth most populated species for this model were complexes comprised of five molecular subunits (see Dataset S1). Since we did not anticipate this result, these high-population species were not included in the hand-picked lumping set. The majority of the memory savings seen in Fig. 9C for thresholds are due to lumping of these species. Thus, our results also illustrate the value of using a more systematic approach to selecting population species in some cases.

Figure 9

HPP performance analyses for various lumping thresholds at cell fraction .

HPP performance analyses for various lumping thresholds at cell fraction .

(A) TLBR; (B) Actin; (C) ; (D) EGFR. In all plots, threshold values for different lumping sets are shown on the x-axis. For TLBR and Actin, some thresholds yield the same set of population species as larger thresholds and are thus omitted from the figures. For TLBR, results for thresholds are omitted due to impractically large partial networks in those cases. Results for NFsim (‘NF’) and the hand-picked lumping sets from Figs. 5–8 (‘HPP’) are shown in all plots for comparison. Error bars show standard error (three samples). It is also interesting to note in Figs. 9C and 9D the presence of an optimal lumping threshold between the maximum and minimum values considered. At high thresholds, most species are treated as particles and higher memory use is expected. At low thresholds, however, the higher memory use is due to the larger size of the partially-expanded network. Also interesting is that the run time results in Fig. 9 show a weak (if any) dependence on the chosen threshold, despite the fact that the time complexity of network-free methods scales linearly with rule set size (Table 1). Presumably, this is because the lower cost operations (increment/decrement) associated with the population species offset the increased cost of larger rule sets. This robustness of the time cost with respect to the size of the lumping set is a positive attribute of the HPP method.

Discussion

We have presented a hybrid particle/population simulation approach for rule-based models of biological systems. The HPP approach is applied in two stages (Fig. 1): (i) transformation of a rule-based model into a dynamically-equivalent hybrid form by partially expanding the network around a selected set of population species; (ii) simulation of the transformed model using a population-adapted network-free simulator. The method is formally exact for an infinite population lumping rate constant, but can produce statistically exact results in practice provided that a sufficiently large value is used (Figs. 5–8, panels C and D). As currently implemented, the primary advantage of the HPP method is in reducing memory usage during simulation (Figs. 5–8, panels A). Importantly, this is accomplished with little to no impact on simulation run time (Figs. 5–8, panels B). We have shown that peak memory use for HPP scales linearly with particle number (with a slope that is smaller than for NFsim; Figs. 5–8, panels A) and confirmed that when network generation is possible SSA memory use is approximately independent of particle number (Figs. 7A and 8A). At the system volumes that we have considered here, HPP memory use is significantly less than for SSA. However, the linear scaling of HPP and the constant scaling of SSA indicate that with further increases in the system volume there will invariably come a point where HPP memory use exceeds that of SSA. This is because species that are rare at small volumes, and hence chosen to be treated as particles, become plentiful at large volumes. Intuitively, a partially-expanded network should never require more memory than a fully-enumerated network. However, as currently implemented, there is no way to strictly enforce this restriction because HPP requires that population species be chosen prior to PNE. In Fig. 9, we have shown how a systematic approach to choosing population species can optimize memory usage for a given system volume. However, this approach requires running an NFsim pre-simulation, which may not be feasible for systems with extremely large numbers of particles (e.g., whole cells). Thus, we propose to develop a more general version of HPP that dynamically tracks the populations of species during the course of a simulation and automatically selects those to treat as population variables based on some criteria, e.g., that their population exceeds a certain threshold. In this automated version of HPP (aHPP), PNE would be performed every time a new species is lumped. If all species in the system become lumped then the network will naturally become fully enumerated. Hence, the memory load will never exceed that of the fully-expanded network. In Fig. 10, we provide a qualitative sketch of how we expect the memory usage of this hypothetical aHPP method to scale with system volume (particle number). Included for comparison are scalings for HPP, NFsim, and SSA. For models with finite networks (such as and EGFR), aHPP memory use should plateau once the entire reaction network has been generated. For models with infinite networks (such as TLBR and Actin), we expect aHPP memory use at large volumes to scale somewhere between constant and linear (no worse than HPP) depending on the model. A detailed analysis of the space complexity of a hypothetical, “optimal” aHPP method is provided in Sec. S2 of supplementary Text S1.

Figure 10

Memory use vs. simulated volume for different simulation methods, including a hypothetical automated HPP (aHPP).

Memory use vs. simulated volume for different simulation methods, including a hypothetical automated HPP (aHPP).

For finite networks, aHPP memory use plateaus once the entire reaction network has been generated. For infinite networks, the scaling at large volumes should fall somewhere between constant and linear (no worse than HPP) depending on the model (see Sec. S2 of Text S1 for an analysis). In order to frame our results within a real-world context, we have estimated the cost of simulation based on hourly rates of on-demand instances on the Amazon Elastic Compute Cloud (EC2). In Fig. 11, we show the hourly cost (per “effective compute unit”) of simulation as a function of required memory per simulation (details of the calculation can be found in Sec. S1 of Text S1). Also included in the plot are values for HPP (), NFsim (), and SSA () simulations of the EGFR model at cell fraction (Fig. 8A). Our calculations show that below of required memory High-CPU instances are the most cost effective. Above this threshold High-Memory instances are the better option. The HPP simulation falls below this cutoff while both NFsim and SSA lie above. There is a quantifiable benefit, therefore, to reducing memory usage in this case; HPP simulations on the EC2 would be and times less expensive, respectively, than NFsim and SSA (HPP is slightly faster than NFsim and significantly faster than SSA; Fig. 8B). Thus, the reduction in memory usage offered by HPP is not simply of academic interest but can impact, in a tangible way, the cost of doing computational research.

Figure 11

Cost of running simulations on the Amazon Elastic Compute Cloud (EC2).

Cost of running simulations on the Amazon Elastic Compute Cloud (EC2).

The minimum cost as a function of memory requirement was calculated based on January 2012 pricing (http://aws.amazon.com/ec2/) of all Standard, High-CPU, and High-Memory EC2 instances (see Sec. S1 of Text S1 for details of the calculation). Also included are values for NFsim, HPP, and SSA simulations of the EGFR model at cell fraction . Finally, even greater benefits are possible if, in addition to reducing memory usage, the speed of HPP simulations can be increased. leaping [36], [77]–[79] is an approach for accelerating stochastic simulations of chemically reactive systems. With a few exceptions (e.g., Ref. [80]), leaping has been applied primarily to fully-enumerated reaction networks. We believe that the HPP method provides a unique setting for the application of because, unlike in pure particle-based methods, there exists a partial network of reactions that act on population species. Thus, a network-based leaping method can be applied exclusively to the population component of a system while retaining the network-free approach in the particle component. We have recently implemented a leaping variant in BioNetGen, known as the partitioned-leaping algorithm [22], and are actively working on integrating it with the HPP. Average species populations from NFsim pre-simulations () of all example models considered in Fig. 9. (XLSX) Click here for additional data file. Average number of reactions that must be updated after each reaction firing (i.e, dependencies) for a collection of signaling models of varying network size (all models are included in the BioNetGen 2.2.5 release available at http://bionetgen.org). (EPS) Click here for additional data file. Sec. S1: Details of the monetary cost analysis shown in Fig. 11; Sec. S2: Space complexity analyses for the network-based SSA, network-free, HPP, and hypothetical aHPP methods (Fig. 10); Sec. S3: Overview of BNGL, model files, and running HPP simulations with BioNetGen/NFsim; Sec. S4: BNGL formalism and the formal foundation of the PNE algorithm (with pseudocode). (PDF) Click here for additional data file. Complete BNGL file for the simple receptor activation model of Fig. 3 (receptor_activation.bngl). (TXT) Click here for additional data file. HPP configuration file for the simple receptor activation model, including population mapping rules and instructions for executing NFsim and HPP simulations (run_receptor_activation.bngl). (TXT) Click here for additional data file. Partially-expanded (HPP) version of the simple receptor activation model of Fig. 3 generated using the method outlined in Fig. 4 (receptor_activation_hpp.bngl). (TXT) Click here for additional data file. BNGL file for the TLBR model (tlbr.bngl). (TXT) Click here for additional data file. HPP configuration file for the TLBR model (run_tlbr.bngl). (TXT) Click here for additional data file. HPP version of the TLBR model (tlbr_hpp.bngl). (TXT) Click here for additional data file. BNGL file for the Actin model (actin_simple.bngl). (TXT) Click here for additional data file. HPP configuration file for the Actin model (run_actin_simple.bngl). (TXT) Click here for additional data file. HPP version of the Actin model (actin_simple_hpp.bngl). (TXT) Click here for additional data file. BNGL file for the model (fceri_gamma2.bngl). (TXT) Click here for additional data file. HPP configuration file for the model (run_fceri_gamma2.bngl). (TXT) Click here for additional data file. HPP version of the model with free ligand treated as the only population species (fceri_gamma2_hpp1.bngl). (TXT) Click here for additional data file. HPP version of the model with free ligand, cytosolic Lyn and all four phosphorylation states of cytosolic Syk treated as population species (fceri_gamma2_hpp6.bngl). (TXT) Click here for additional data file. BNGL file for the EGFR model (egfr_extended.bngl). (TXT) Click here for additional data file. HPP configuration file for the EGFR model (run_egfr_extended.bngl). (TXT) Click here for additional data file. HPP version of the EGFR model (egfr_extended_hpp.bngl). (TXT) Click here for additional data file.

56 in total

1. SnapShot: key numbers in biology.

Authors: Uri Moran; Rob Phillips; Ron Milo
Journal: Cell Date: 2010-06-25 Impact factor: 41.582

2. Automatic generation of cellular reaction networks with Moleculizer 1.0.

Authors: Larry Lok; Roger Brent
Journal: Nat Biotechnol Date: 2005-01 Impact factor: 54.908

3. 'On-the-fly' or 'generate-first' modeling?

Authors: Michael L Blinov; James R Faeder; Jin Yang; Byron Goldstein; William S Hlavacek
Journal: Nat Biotechnol Date: 2005-11 Impact factor: 54.908

4. Signaling through receptors and scaffolds: independent interactions reduce combinatorial complexity.

Authors: Nikolay M Borisov; Nick I Markevich; Jan B Hoek; Boris N Kholodenko
Journal: Biophys J Date: 2005-05-27 Impact factor: 4.033

5. Pleomorphic ensembles: formation of large clusters composed of weakly interacting multivalent molecules.

Authors: Cibele V Falkenberg; Michael L Blinov; Leslie M Loew
Journal: Biophys J Date: 2013-12-03 Impact factor: 4.033

6. A whole-cell computational model predicts phenotype from genotype.

Authors: Jonathan R Karr; Jayodita C Sanghvi; Derek N Macklin; Miriam V Gutschow; Jared M Jacobs; Benjamin Bolival; Nacyra Assad-Garcia; John I Glass; Markus W Covert
Journal: Cell Date: 2012-07-20 Impact factor: 41.582

7. Kinetic Monte Carlo method for rule-based modeling of biochemical networks.

Authors: Jin Yang; Michael I Monine; James R Faeder; William S Hlavacek
Journal: Phys Rev E Stat Nonlin Soft Matter Phys Date: 2008-09-10

Review 8. Guanine nucleotide exchange factors: activators of Ras superfamily proteins.

Authors: A F Overbeck; T R Brtva; A D Cox; S M Graham; S Y Huff; R Khosravi-Far; L A Quilliam; P A Solski; C J Der
Journal: Mol Reprod Dev Date: 1995-12 Impact factor: 2.609

Review 9. Toward a comprehensive language for biological systems.

Authors: James R Faeder
Journal: BMC Biol Date: 2011-10-17 Impact factor: 7.431

10. Emergence of large-scale cell morphology and movement from local actin filament growth dynamics.

Authors: Catherine I Lacayo; Zachary Pincus; Martijn M VanDuijn; Cyrus A Wilson; Daniel A Fletcher; Frank B Gertler; Alex Mogilner; Julie A Theriot
Journal: PLoS Biol Date: 2007-09 Impact factor: 8.029

16 in total

1. Derivative-Free Optimization of Rate Parameters of Capsid Assembly Models from Bulk in Vitro Data.

Authors: Lu Xie; Gregory R Smith; Russell Schwartz
Journal: IEEE/ACM Trans Comput Biol Bioinform Date: 2016-05-05 Impact factor: 3.710

Review 2. Modeling for (physical) biologists: an introduction to the rule-based approach.

Authors: Lily A Chylek; Leonard A Harris; James R Faeder; William S Hlavacek
Journal: Phys Biol Date: 2015-07-16 Impact factor: 2.583

Review 3. Rule-based modeling: a computational approach for studying biomolecular site dynamics in cell signaling systems.

Authors: Lily A Chylek; Leonard A Harris; Chang-Shung Tung; James R Faeder; Carlos F Lopez; William S Hlavacek
Journal: Wiley Interdiscip Rev Syst Biol Med Date: 2013-09-30

4. Lazy Updating of hubs can enable more realistic models by speeding up stochastic simulations.

Authors: Kurt Ehlert; Laurence Loewe
Journal: J Chem Phys Date: 2014-11-28 Impact factor: 3.488

5. Asynchronous τ-leaping.

Authors: Zbigniew Jȩdrzejewski-Szmek; Kim T Blackwell
Journal: J Chem Phys Date: 2016-03-28 Impact factor: 3.488

6. BioNetGen 2.2: advances in rule-based modeling.

Authors: Leonard A Harris; Justin S Hogg; José-Juan Tapia; John A P Sekar; Sanjana Gupta; Ilya Korsunsky; Arshi Arora; Dipak Barua; Robert P Sheehan; James R Faeder
Journal: Bioinformatics Date: 2016-07-08 Impact factor: 6.937

7. Mechanisms of noncanonical binding dynamics in multivalent protein-protein interactions.

Authors: Wesley J Errington; Bence Bruncsics; Casim A Sarkar
Journal: Proc Natl Acad Sci U S A Date: 2019-11-27 Impact factor: 11.205

8. In silico prediction of ErbB signal activation from receptor expression profiles through a data analytics pipeline.

Authors: Arya A Das; Elizabeth Jacob
Journal: J Biosci Date: 2018-06 Impact factor: 1.826

9. A scalable, open-source implementation of a large-scale mechanistic model for single cell proliferation and death signaling.

Authors: Cemal Erdem; Arnab Mutsuddy; Ethan M Bensman; William B Dodd; Michael M Saint-Antoine; Mehdi Bouhaddou; Robert C Blake; Sean M Gross; Laura M Heiser; F Alex Feltus; Marc R Birtwistle
Journal: Nat Commun Date: 2022-06-21 Impact factor: 17.694

10. BiPSim: a flexible and generic stochastic simulator for polymerization processes.

Authors: Stephan Fischer; Marc Dinh; Vincent Henry; Philippe Robert; Anne Goelzer; Vincent Fromion
Journal: Sci Rep Date: 2021-07-08 Impact factor: 4.379