Literature DB >> 23423320

Programming biological models in Python using PySB.

Carlos F Lopez¹, Jeremy L Muhlich, John A Bachman, Peter K Sorger.

Abstract

Mathematical equations are fundamental to modeling biological networks, but as networks get large and revisions frequent, it becomes difficult to manage equations directly or to combine previously developed models. Multiple simultaneous efforts to create graphical standards, rule-based languages, and integrated software workbenches aim to simplify biological modeling but none fully meets the need for transparent, extensible, and reusable models. In this paper we describe PySB, an approach in which models are not only created using programs, they are programs. PySB draws on programmatic modeling concepts from little b and ProMot, the rule-based languages BioNetGen and Kappa and the growing library of Python numerical tools. Central to PySB is a library of macros encoding familiar biochemical actions such as binding, catalysis, and polymerization, making it possible to use a high-level, action-oriented vocabulary to construct detailed models. As Python programs, PySB models leverage tools and practices from the open-source software community, substantially advancing our ability to distribute and manage the work of testing biochemical hypotheses. We illustrate these ideas using new and previously published models of apoptosis.

Entities: CellLine Chemical Disease Gene Species

Mesh：

Substances：

Year: 2013 PMID： 23423320 PMCID： PMC3588907 DOI： 10.1038/msb.2013.1

Source DB: PubMed Journal: Mol Syst Biol ISSN： 1744-4292 Impact factor: 11.429

Introduction

Mechanistic studies that couple experimentation and computation typically rely on models optimized for specific questions and biological settings. Such ‘fit-to-purpose' models can effectively describe and elucidate complex biological processes, but given the available data, they are usually restricted in scope, encompassing only a subset of cellular biochemistry (Batchelor et al, 2008; Xu et al, 2010; Huber et al, 2011; Schleich et al, 2012). Even in disciplines in which modeling is more mature, all-encompassing models are rare. However, it is common for fit-to-purpose models to require changes involving the addition, modification or elimination of species and reactions based on new discoveries. Often a family of models is needed (Albeck et al, 2008b; Muzzey et al, 2009; Rehm et al, 2009; Xu et al, 2010; Kleiman et al, 2011), each of which represents a competing molecular hypothesis derived from the literature, a different way of encoding ambiguous ‘word models' drawn from molecular biology, or postulated differences in a network from one cell type to the next (Gnad et al, 2012). One of the promises of systems biology is that collaborative and iterative construction and testing of models can improve hypotheses by subjecting them to an ongoing process of testing and improvement. However, the current proliferation of independently derived fit-to-purpose models frustrates this goal (Krakauer et al, 2011). We require ‘second generation' tools that leverage existing approaches to biological model construction and documentation while adding new means for modifying, comparing and sharing models in a transparent manner. Dynamic biological systems are generally modeled using stochastic or deterministic rate equations. The latter can be described using networks of ordinary differential equations (ODEs) that precisely represent mass action kinetics. However, when a network model has many species and variables, equations become a poor tool for model development and understanding. Even familiar biochemical processes are remarkably complex at the level of equations: for example, fully accounting for the binding and posttranslational modifications underlying activation of growth factor receptors can require thousands of equations that are tedious to generate, hard to error-check and difficult to understand (Hlavacek et al, 2006). The need for frequent updates is also a challenge because even a simple modification of a biochemical cascade can require dozens of small changes in the corresponding equations. When operating at the level of equations, it is also difficult to reuse the work of others directly. For example, in the field of apoptosis, Howells et al (2010) described a conceptual extension of a previously published model of Bcl-2 proteins (which control mitochondrial outer-membrane permeabilization (MOMP), Box 1) (Chen et al, 2007a) by adding the BH3-only Bcl-2 family member Bad. Interactions among the core set of Bcl-2 proteins were identical between the two models, but Howells et al (2010) rewrote the original set of ODEs simply to add a few new reactions. Manually rebuilding earlier models is not only time-consuming but also error-prone: as described in detail below, the practice has introduced errors and unintended changes in another pair of related apoptosis models. Moreover, the tendency to make numerous trivial changes in duplicated elements (e.g., by renaming species) makes it difficult to focus on key differences, frustrating later attempts at model comparison (Mallavarapu et al, 2008). Several projects are underway to address the problems of model proliferation and complexity using formalisms that aim for abstraction, transparency and reusability. Chief among these is rule-based modeling (Hlavacek et al, 2006; Bachman and Sorger, 2011) in which models are created using specialized languages such as BioNetGen language (BNGL) (Faeder et al, 2009) or Kappa (Danos et al, 2007b). These languages describe local interaction rules between specific domains or ‘sites' on molecules (e.g., enzymes and their substrates) and are easier to understand and reuse than equations (Danos, 2007). Rule-based approaches enable modeling of otherwise intractably complex systems in which posttranslational modification and the formation of multi-protein signaling complexes give rise to large numbers of distinct species (Blinov et al, 2006; Sneddon et al, 2010; Deeds et al, 2012). Rules can also be subjected to formal analysis (Danos et al, 2008; Feret et al, 2009) and used to generate both deterministic and agent-based simulations (Danos et al, 2007a; Sneddon et al, 2010). Although powerful, rule-based languages such as BNGL and Kappa do not exploit ‘higher-order' patterns in biochemical systems such as multi-step catalysis, scaffold assembly, polymerization, receptor internalization, etc. These patterns often recur several times in a single model and also are frequently encountered in models of different molecular networks. Currently, it is necessary to regenerate the rule sets for biochemical patterns each time the patterns arise. The creation of models having related but variant topologies presents an important special case of a higher-order pattern, particularly when a core process remains the same across all models and modifications focus on specific reactions. In both rule- and ODE-based models, it is necessary to implement the change for each model in the set, a laborious process when the number of models is large. Modeling tools that leverage approaches from software engineering are one way to increase reusability and reduce duplication (Mallavarapu et al, 2008; Pedersen and Plotkin, 2008; Danos et al, 2009; Mirschel et al, 2009; Gnad et al, 2012). In particular, LISP frameworks such as little b (Mallavarapu et al, 2008) and ProMot (Mirschel et al, 2009) have demonstrated the value of programmatic approaches. However, ProMot does not use rules, limiting its effectiveness for combinatorially complex systems; little b, while implementing rules internally, does not interoperate with tools and languages from the broader rule-based modeling community and is no longer in development (the similarities and differences between the little b and ProMot approaches have been described previously (Mallavarapu et al, 2008)). Combining the strengths of rule-based and programmatic approaches to modeling is a key goal of the work described here. A benefit of modeling biological systems using contemporary approaches from computer science and open-source software engineering is the ready availability of tools and best practices for managing and testing complex code. Good software engineering practice promotes abstraction, composition and modularity (Mallavarapu et al, 2008; Mirschel et al, 2009). Through abstraction, the core features of a concept or process are separated from the particulars: for example, a pattern of biochemical reactions (e.g., phosphorylation–dephosphorylation of a substrate) is described once in a generic form as a subroutine and then instantiated for specific models simply by specifying the arguments (e.g., species such as Raf, PP2A, and MEK). In programming, abstraction is achieved through the use of parameterizable functions or macros that are written once and then invoked as needed. Functions can be built up from other functions, a process known as composition. Abstraction and composition can occur at all levels of complexity: just as complex functions can be built from simple functions, large programs can be built up from smaller subsystems that are documented and tested individually. When these subsystems have well-defined input–output interfaces, they can be used as libraries that make it possible to write new programs using a simple vocabulary of well-tested concepts (e.g., a library of biochemical actions or core pathways such as the MAPK cascade) (Pedersen and Plotkin, 2008). The decomposition of complex biological models in this fashion facilitates extensibility and transparency, because well-developed mechanisms can be reused and changes can be localized to the subsystem that needs revision. Contemporary software engineering has much to teach us about the difficult task of developing and documenting models in a distributed setting. Software engineers ‘publish' their findings using robust programming tools that support code annotation, documentation, and verification, all significant challenges in biological modeling (Hlavacek, 2009). The open-source software community also provides a valuable socio-cultural framework for managing large, collaborative projects in the public domain. Version control tools such as Git and Subversion, along with ‘social coding' websites such as GitHub, have facilitated the collaborative development of software as complex as the kernel of the Linux operating system (http://github.com). It would be highly desirable to exploit such social and technical innovation in solving the problems of incremental model development and reuse in biology. In this paper, we describe PySB, an open-source programming framework written in Python that allows concepts and methodologies from contemporary software engineering to be applied to the construction of transparent, extensible and reusable biological models (http://python.org; Oliphant, 2007). A critical feature of modeling with PySB is that models are Python programs, and tools for documentation, testing, and version control (e.g., Git) can be used to help manage model development. Strictly speaking, a PySB ‘model' is a Python program, that, when executed, produces another program (the underlying reaction rules) that can be analyzed or used to create equations for simulation. The PySB framework provides a high-level action-oriented vocabulary congruent with our intuitive understanding of biochemistry (A phosphorylates B, C translocates to the nucleus, etc.). PySB is closely integrated with Python numerical tools for simulation and parameter estimation and graphical tools that enable plotting of model trajectories and topologies. We demonstrate the use of PySB to re-instantiate 12 previously published ODE-based models of the Bcl-2 family proteins that regulate apoptosis (Chen et al, 2007a, Chen et al, 2007b; Cui et al, 2008; Albeck et al, 2008b; Howells et al, 2010). We show how PySB can be used to decompose models into reusable macros that can be independently tested, and we generate composite models that combine a prior model from our lab describing extrinsic apoptosis (Albeck et al, 2008b, Albeck et al, 2008a) with alternative hypotheses about Bcl-2 family interactions from other investigators. Finally, we develop and calibrate an expanded cell-death model that spans the diversity of the multi-protein Bcl-2 family and draws on findings from leading biochemists in the field, depicting a unified, ‘embedded together' mechanism for mitochondrial membrane permeabilization (Leber et al, 2010; Llambi et al, 2011). All models in this paper, along with the PySB source code and documentation, are available for sharing and further development at GitHub and the PySB website (http://pysb.org; Materials and methods).

Results

We chose Python as the language for PySB because of its widespread use in the computational biology community, support for object-oriented and functional programming, and rich ecosystem of mathematical and scientific libraries. At the outset, we determined that PySB should interoperate seamlessly with BioNetGen (Faeder et al, 2009) and Kappa (Danos et al, 2007b), and thereby build on substantial investments in rule-based modeling. A PySB model consists of instances of a core set of classes: Model, Monomer, Parameter, Compartment, Rule, Initial and Observable, closely mirroring the form of BNGL and Kappa models. However, in PySB, the component declarations return software objects inside Python, allowing model elements to be manipulated programmatically. To simplify the process of writing rules and to maximize the syntactic match with BNGL, PySB redefines (overloads) some of the mathematical operators in Python to create a shorthand that resembles chemistry notation (Box 2). For example, in the context of a PySB rule definition, the ‘+' operator (which in other contexts represents mathematical addition) is used to enumerate a list of reactants or products. It is not necessary to use overloaded PySB operators but it makes the models easier to write and understand (see ‘PySB syntax' in the Materials and methods and the Supplementary Materials). By way of illustration, consider a ‘Hello World' program in PySB involving reversible binding of proteins L and R, each of which contains a single binding site s. A PySB program for this simple reaction has high overhead relative to the equivalent set of ODEs but serves to introduce the basic PySB syntax and show how it interoperates with other software such as BNG, the VODE integrator (Brown et al, 1989), and the Matplotlib plotting library (Hunter, 2007). In the first block of the program (Figure 1A; block 1), a declaration of molecule types (‘monomers' in PySB) is followed by the forward and reverse rate parameters, kf and kr, and initial conditions for unbound L and R. The syntax for the reversible binding rule (which will be familiar to users of rule-based languages) reads as follows: when the proteins L and R both have empty binding sites s (e.g., L(s=None)), they reversibly bind to form a complex that shares a single ‘bond' (e.g., L(s=1) % R(s=1)), at rates kf and kr. This approach to naming binding sites (and calling interactions ‘bonds,' without implying covalent modification) is drawn from rule-based languages and makes it possible to describe molecules having multiple sites of interaction with different specificities, modifications, and occupancy states (e.g., the distinct binding sites on the TRAIL receptor for ligand and death-inducing signaling complex (DISC) adaptor proteins). The ‘Hello World' model concludes by designating an observable of interest, the complex LR. (For additional details on the syntax used in this model, see Box 2 and the ‘PySB syntax' section of the Supplementary Materials; a unified modeling language (UML) class diagram of the core PySB classes is also provided in Supplementary Figure S3.) The second block of code in Figure 1A defines a time range and calls the PySB function odesolve, which performs deterministic model simulation by invoking BNG (to generate the reaction network) and the numerical integrator VODE (see also Figure 1B). Simulation results are returned as a matrix that is graphed using the plot command from Matplotlib (Figures 1A, block 3 and B).

Figure 1

Creation and simulation of a ‘Hello World' model in PySB. (A) Creation and deterministic simulation of a model using PySB. The call to Model() creates the pysb.core.Model object to which all subsequently declared components are added. The first block of code declares the molecule types, parameters, initial conditions, reversible reaction rule, and observable necessary for modeling and simulating the reversible binding of proteins L and R. The second block of code calls the odesolve function from the pysb.integrate module to generate and integrate the ODEs. The third block plots the simulated time course using the Matplotlib plot command. Numbers associated with the code blocks identify the correspondences between the code and the control flow shown in B. (B) Control flow for an ODE simulation of a PySB model. The columns ‘User,' ‘PySB', and ‘External Tools' indicate the locus of control of each step in the process (boxes). The ‘Result' column shows the result of each individual step: the gray box indicates results of steps that are internal to the call to odesolve, whereas the other results are visible to the user at the top level. After declaring the model elements as in A, the user calls odesolve, which generates the corresponding BNGL for the model, invokes BNG externally on the generated BNGL code to create the reaction network, parses the reaction network to generate the corresponding set of ODEs, and calls an external integrator (VODE) to generate the trajectories. The trajectories are returned to the user as a NumPy record array, where they are visualized with a call to the plot function from Matplotlib.

Using macros to model recurrent biochemical actions

The benefits of PySB become apparent only with more complex and interesting models in which programming constructs such as conditionals, loops, functions, classes, and modules are used to define reusable elements. The complexity of these elements can vary from a few species to multi-component cascades. Macros, reusable Python functions that are programmatically expanded into rules and reactions, define low-level biochemical actions such as ‘catalyze,' ‘bind,' or ‘assemble,' mirroring the way we describe biological processes verbally. PySB currently contains 13 general-purpose macros covering reversible binding, catalytic modification, synthesis, degradation, and pore assembly. Users can generate other model-specific macros to implement new or uncommon mechanisms, thereby creating libraries of biochemical actions for subsequent modeling projects (we are currently adding new macros ourselves); phosphorylation/dephosphorylation cascades and loops are obvious candidates for such a library. As an example, the catalyze macro implements a mass action model of an enzyme-mediated biochemical transformation (Figure 2A) based on six user-specified arguments: the enzyme and its binding site for substrate, the substrate and its binding site for enzyme, the product, and a list of forward, reverse, and catalytic rate constants. Figure 2A illustrates the application of catalyze to a reaction in which caspase-8 cleaves Bid to form truncated Bid (tBid; see Box 1 for a description of the relevant biology). Improved transparency is an important benefit of using macros in that they explicitly describe how elementary reactions are implemented. For example, catalyze invokes a two-step model of catalysis in which an enzyme–substrate complex is formed as an intermediate step: E+S ↔ES→E+P. Some published models of apoptosis (Chen et al, 2007a, Chen et al, 2007b; Cui et al, 2008; Howells et al, 2010) use a pseudo-first-order, one-step approximation E+S→E+P that has the merit of fewer parameters. However, depending on time scales and parameter values, there is a profound difference in the dynamics of one- and two-step catalysis: the one-step model, for example, makes the strict assumption that the enzyme always operates in its linear range and cannot be saturated. These differences are not apparent from the visual or existing verbal descriptions of the ODEs but instead require careful retrospective analysis. In contrast, PySB allows exploration of mechanistic differences simply by calling an alternative catalysis macro, catalyze_one_step (Supplementary Figure S1B). The benefit of macros is not only that the model is more concise (the catalyze macro call replaces two rules and four ODEs) but also that the difference between one- and two-step catalytic schemes is clearly declared and need not be inferred retrospectively. Because macros are tested programmatically, they also ensure correct instantiation of forward, reverse, and catalytic reactions, an important benefit because failure to implement these processes correctly is remarkably common (see below and Supplementary Note).

Figure 2

Three examples of mechanistic abstractions using macros. Full implementation of all macros can be found in the macros.py file in the PySB source code online (Materials and methods). (A) catalyze. The example call shows how the catalyze macro is called to add a catalytic reaction in which active caspase-8 (C8) binds to untruncated Bid (Bid(state=‘U')) to yield tBid (Bid(state=‘T')). Rate parameters (forward, reverse, and catalytic) are provided in the list `klist'. The ‘Basic implementation' in Supplementary Figure S1A shows the Python source code for a simplified version of the catalyze macro. Execution of the catalyze macro leads to the creation and addition of two rules to the model, which, when converted into BNGL, take the form shown below. Generation of the reaction network via BNG then results in the system of four ODEs shown at bottom. (B) assemble_pore_sequential models the assembly of a pore in a sequential fashion in which monomers bind to form dimers, dimers bind monomers to form trimers, trimers bind monomers to form tetramers, etc. Pores of size three (trimers) and above have a closed, ring-shaped topology, reflecting the variable structure and stoichiometry for the Bax pore suggested by recent in vitro experiments (Martinez-Caballero et al, 2009). The maximal size of the pore is given by the fourth argument to the macro (i.e., 6). With the parameters shown in the example, the execution of the macro results in five rules (for binding of monomers to monomers, monomers to dimers, monomers to trimers, etc) and six species and ODEs (monomers through hexamers). Pores with greater stoichiometry can be modeled simply by changing the pore size argument in the macro call. (C) bind_table is used to concisely represent combinatorial binding among two related groups of molecules. In the example call, the species in the column headers are the five known anti-apoptotic Bcl-2 proteins, whereas the row headers are various pro-apoptotic BH3-only proteins. The table entries represent the dissociation constants for binding between each pair of proteins drawn from in vitro measurements, given in units of nanomolar (Willis et al, 2005; Certo et al, 2006); the Python constant None indicates that no binding occurs. (In place of a dissociation constant, a table cell may alternatively contain a pair of explicit association and dissociation rates.) The names of the binding sites for the row-group and column-group proteins (i.e., ‘bf') are given as the second and third arguments. The final argument (i.e., kf=1e-3) specifies a default association rate to be applied to all reactions, given in units of nM−1 s−1. The execution of the bind_table call results in the instantiation of 28 reversible binding rules, each with the given association rate and a dissociation rate calculated from the dissociation constant provided in the table entries; this further expands to 41 ODEs.

The power of macros is most evident with complex biochemical actions. For example, the assemble_pore_sequential macro (Figure 2B) implements sequential assembly of a membrane-associated pore from multiple identical subunits. It was written to explore how Bax and Bak associate in the mitochondrial outer membrane to form the pore that translocates cytochrome c and Smac into the cytosol during MOMP (Youle and Strasser, 2008). The arguments to assemble_pore_sequential are the identity of the pore-forming protein and its two binding sites, the maximum number of proteins in a pore and a table containing rate constants for each assembly step. In vitro experiments suggest that Bax and Bak assemble by sequential addition of subunits (Martinez-Caballero et al, 2009), but the precise size of an active MOMP pore is unknown. Exploring the effects of changing the number n of subunits in a pore requires creation and modification of n–1 BNGL rules or n ODEs but only one value in the PySB assemble_pore_sequential macro. Many biochemical processes are controlled by families of proteins that have overlapping binding specificities. In apoptosis, MOMP is regulated by ∼20 pro- and anti-apoptotic Bcl-2 family proteins that have a range of affinities for each other and bind in various arrangements (Box 1). Modeling the binding of any two Bcl-2 proteins is simple, but managing equations or rules for all possible binding reactions is much harder. The bind_table macro uses a simple tabular representation to model interactions among members of multi-protein families (Figure 2C). The first argument to the macro is a table (a list of lists in Python) in which row and column headers identify pairs of interacting proteins and each table entry contains binding constants or the value None to indicate that there is no measurable interaction. The second and third arguments specify the binding site names for row and column species, respectively. In addition to being concise (a single bind_table call for the Bcl-2 proteins generates 28 rules and 41 ODEs), the tabular format highlights relationships between proteins by grouping them into functional classes; new family members can be introduced simply by adding rows and columns. A bind_table is therefore a simple computable representation of the ‘binding codes' that have been created by others to summarize structural and biochemistry studies on Bcl-2 family members (Chen et al, 2005; Kuwana et al, 2005; Certo et al, 2006). By changing the first argument in the bind_table call, it is straightforward to explicitly substitute one set of binding data for another, a useful feature for exploring differences in published binding codes or for modeling the behavior of mutant proteins (Fire et al, 2010; Debartolo et al, 2012). Models that use macros such as bind_table naturally acquire a ‘self-documenting' character that minimizes the need for additional explanatory description (see Figures 4B and 5A for examples of bind_table in the context of models of MOMP initiation).

Modules and model reuse

Macros abstract biochemical reactions at a fairly low level of detail involving a few proteins, but Python also supports a wide variety of methods for reusing more complex model elements. We have found three to be particularly useful: (1) Instance Reuse, for making small changes to an existing model; (2) Module Reuse, for programmatically composing a model from reusable pieces or modules; and (3) Class Reuse, for models involving combinatorial variation of several independent features. Of these, Instance Reuse is the simplest, entailing programmatic duplication of a previous model and explicit specification of new and modified elements. Instance Reuse proved to be an appropriate way to update the PySB version of EARM 1.0 (Albeck et al, 2008a) to include synthesis and degradation reactions (Figure 3A). This approach replaces conventional ‘cut-and-paste' copying and editing of ODEs or BNGL rules. Reuse is achieved by cloning the old model into a new model and then explicitly declaring the elements that are added or modified, making the changes easy to understand and track.

Figure 3

Three approaches to model reuse. Boxes with gray tabs represent Python modules/files. Statements marked ‘Reuse' identify the point of reuse of previously created model code. Background highlighting indicates correspondence between elements in the reuse schematic and the PySB code. (A) Direct reuse and subsequent modification of pre-existing model code. A pre-existing model is declared in its own Python file (earm_1_0.py). The extending model in the file earm_1_3.py (representing a later version) imports and duplicates the model object from earm_1_0.py and subsequently adds a list of synthesis and degradation reactions. (B) Reuse of modules using macros. Macros instantiating the components for the upstream (rec_to_bid) and downstream (pore_to_parp) portions of the extrinsic apoptosis pathway are placed in a Python file, earm_modules.py. Variant models differing only in the reaction topology for MOMP initiation (e.g., model_variant.py) are then created by invoking these macros for the shared upstream and downstream elements. (C) Reuse and recombination of model elements through class inheritance. A shared implementation of the compartmentalization-independent reactions for Bax activation (i.e., tBid_activates_Bax) is contained within the base class CptBase. Alternative compartmentalization strategies are implemented in the child classes TwoCpt and MultiCpt, which separately implement the compartmentalization-dependent reactions for tBid and Bax translocation (i.e., translocate_tBid_Bax). Because these classes inherit from CptBase, they acquire the implementation of tBid_activates_Bax, representing a point of reuse. Alternative protein-interaction topologies are implemented within the build_model function in the two classes Topo1 and Topo2. Models with either of the compartmentalization implementations (e.g., TwoCpt) and either of the interaction topologies (e.g., Topo1) can then be created dynamically by inheriting from the appropriate classes, representing an additional point of reuse. Readers familiar with the concept of polymorphism from object-oriented programming will note that calling the build_model method on any of the hybrid classes will polymorphically refer to the correct implementations in the parent classes.

Module Reuse involves the separation of a model into independent elements (‘motifs' or ‘modules'; see also Figure 7) that are written as callable subroutines in Python. It is not necessary for a biological process to be modular in a functional sense for modularization in PySB to be advantageous. Building a model from subroutines enables a ‘mix-and-match' approach in which a subset of the interactions in a model is subjected to revision or re-examination, whereas the rest remain the same. For example, we divided EARM 1.0 into three modules each involving self-contained blocks of PySB code for: (1) reactions from ligand–death receptor association to binding of DISC components; (2) interactions among Bcl-2 family members controlling MOMP; and (3) the cascade of reactions involving initiator and effector caspases and their immediate regulators (Figure 3B; Box 1). A series of papers examining alternative models of MOMP have been published by multiple groups (Chen et al, 2007a, 2007b; Cui et al, 2008; Howells et al, 2010), but MOMP has most commonly been studied in isolation from reactions occurring upstream and downstream. However, it has been shown that multi-protein cascades do not exhibit the same behavior in isolation as when they are part of larger networks (Del Vecchio et al, 2008; Chen et al, 2009). One of the primary aims of modeling signal transduction is to contextualize molecular mechanisms by embedding them in a network context. Thus, studies of extrinsic apoptosis would benefit from models in which alternative hypotheses for MOMP regulation are embedded in a more complete reaction pathway. Using conventional modeling tools, it is challenging to add MOMP ‘mini-models' to upstream and downstream reactions (Albeck et al, 2008b). In contrast, in PySB, this type of composition is simple: we have written a PySB program in which any of 15 models of MOMP are called up, along with common rec_to_bid and pore_to_parp modules to create 15 fully functioning hybrid models of extrinsic apoptosis (Figure 3B). The upstream and downstream reactions have also been modeled in several different ways by others (Rehm et al, 2006; O'Connor et al, 2008; Fricker et al, 2010; Neumann et al, 2010; Schleich et al, 2012), and it would be straightforward to use Module Reuse to combine different proposals for rec_to_bid and pore_to_parp with different MOMP modules. Class Reuse is a third and more sophisticated approach that exploits the class inheritance mechanism in Python. Common model features are coded in a base class, and model variants are written as ‘child' classes of the base class, able to inherit code from the base class directly with or without programmatic modification. Code from multiple variants can then be combined by further inheritance from more than one of these classes. For example, we used Class Reuse to model the effects of reaction compartmentalization on interactions among pro-apoptotic Bcl-2 proteins at the mitochondrial membrane (Figure 3C). In one case, we assumed that reactions took place in two well-mixed reaction compartments corresponding to cytosol and membrane (Figure 3C, TwoCpt), and in the second, we assumed that each mitochondrion constituted a distinct reaction compartment (MultiCpt). In addition, we independently explored different reaction topologies involving the Bcl-2 proteins tBid and Bax. Both topologies (Topo1 and Topo2) include the translocation of tBid and Bax to membranes but only the second topology (Topo2) incorporates activation of Bax by tBid. We used inheritance to automate creation of four different models having different compartmentalization schemes and reaction topologies (e.g., TwoCpt_Topo1). The notable feature of Class Reuse is that model variants are created and combined over multiple independent ‘axes'—in this example, compartmentalization and protein-interaction topology—transparently and with no duplicated code.

Integration with the Python ecosystem and external modeling tools

The iterative process of model development is dramatically accelerated when tools for model creation, simulation, analysis, and visualization are integrated. Many commercial and academic software packages, including Mathematica (Wolfram Research) and MATLAB (Mathworks, 2012), provide integrated tools for equation-based models but are unwieldy to use with rule-based or programmatic approaches because models must be exported and imported using SBML (Hoops et al, 2006; Maiwald and Timmer, 2008). At the same time, rule-based model editors such as RuleBender for BNGL (Smith et al, 2012) and RuleStudio for Kappa (https://github.com/kappamodeler/rulestudio) facilitate development of rule-based models but do not incorporate tools for data analysis, parameter fitting, and symbolic math. Simply by virtue of being written in Python, PySB interacts natively with a large and growing library of open-source scientific software such as NumPy, SciPy, SymPy, and Matplotlib (Table I). Models written using PySB can also exploit Python tools for documentation generation (sphinx) and for unit testing (unittest, nose, and doctest), both of which we used extensively in creating the models of extrinsic apoptosis described below.

Table 1

Integration with external modeling tools

Tool	Reference	Interface	Description (relevance to PySB)
NumPy	Oliphant, 2007	Python	Efficient array and matrix operations
SciPy	Oliphant, 2007	Python	Scientific algorithms, e.g., ODE integration, statistics, and optimization
SymPy	SymPy Development Team, 2012	Python	Symbolic manipulation of mathematical expressions
Matplotlib	Hunter, 2007	Python	Plotting and other data visualizations
Graphviz	Gansner and North, 2000	Python	Layout and rendering of node-edge graphs
BNG	Faeder et al, 2009	Wrapper	Translation of rules to a reaction network; stochastic simulation
Kappa	Danos et al, 2007b	Wrapper	Stochastic simulation; visualization and analysis of rules models
SBML	Hucka, 2003	Export	Compatibility with SBML tools
Mathematica	Wolfram Research	Export	General-purpose scientific computing
MATLAB	Mathworks	Export	General-purpose scientific computing

To interface PySB with BNG and Kappa, which are not implemented in Python, we wrote Python ‘wrapper' libraries, providing access to agent-based simulation, static analysis, and visualization. The wrappers also manage the syntactic differences between BNGL and Kappa, allowing either to be used for the same PySB model. Models can be trivially exported in BNGL format for use with established ‘all-in-one' tools that support BNGL, such as V-Cell (Moraru et al, 2002). PySB can export models as systems of ODEs in formats for SBML (Hucka, 2003), MATLAB, Mathematica, or PottersWheel (Maiwald and Timmer, 2008). Finally, to facilitate unique identification of model components, we added a light-weight annotation capability that allows any model element (including macros and modules) to be tagged with identifiers from external databases using subject–object–predicate triples compatible with MIRIAM (Le Novère et al, 2005). The net result is a software environment that combines the flexibility of a general-purpose scientific computing package with programmatic and rule-based modeling tools and an open-source code base.

EARM version 2.0: a family of models of extrinsic apoptosis and MOMP

To explore the ability of PySB to model the latest molecular data on apoptosis while also building on previous work, we used macros and Module Reuse to construct a family of cell-death models involving 15 different modules for MOMP regulation. Seven of the MOMP modules were previously published by the research group of Shen and colleagues (Chen et al, 2007a, 2007b; Cui et al, 2008), one module extends a Shen group model (Howells et al, 2010), five modules are from the work of Albeck et al (2008b), and three modules are entirely new and incorporate more complete sets of interactions among Bcl-2 proteins (Figure 4A). The three new modules are derived from word models in recent studies from Green and Andrews that unify previously competing mechanisms of pore formation (Billen et al, 2008; Leber et al, 2010; Llambi et al, 2011). Each of the 15 modules was instantiated in PySB as a distinct subroutine that can be called and analyzed in the context of a receptor-to-caspase pathway. The set of 15 MOMP modules is by no means complete, and several noteworthy models of extrinsic apoptosis and MOMP (Bentele et al, 2004; Bagci et al, 2006; Legewie et al, 2006; Rehm et al, 2009; Düssmann et al, 2010) have not yet been coded in PySB. However, because our objective is to explore model reuse and composition using PySB, we limited ourselves to the 15 MOMP-focused examples described above. We collectively denote the resulting set of 30 variant models (15 models only of MOMP plus 15 models of extrinsic apoptosis incorporating the MOMP modules) as Extrinsic Apoptosis Reaction Model version 2.0 (EARM 2.0); the models are summarized in Table II and are available as a Python package with source code downloadable from GitHub (Materials and methods).

Figure 4

Refactoring of published models into PySB. (A) Relationships between the models examined in this paper. The ‘Albeck Group' incorporates a series of incrementally expanded models shown in Figure 11 of Albeck et al (2008b); the ‘Shen/Howells Group' incorporates models from three papers from the research group of Shen and colleagues (Chen et al, 2007a, Chen et al, 2007b; Cui et al, 2008) and a derivative model from Howells et al (2010); the ‘Lopez group' includes three expanded models introduced in this paper. The arrows indicate that one model has been derived or extended from a prior model and point in the direction Base Model → Derivative Model. (B) The ‘Direct' model from Chen et al (2007b) in its original ODE-based representation. (C) Conversion of the Chen ‘Direct' model to a PySB module. The execution of the chen_febs_direct function results in rules that exactly reproduce the ODEs shown above (the molecule type Bad in the PySB function corresponds to the generic enabler species Ena in the original equations; Bid corresponds to the generic activator Act). The macro catalyze_one_step_reversible implements the two-reaction scheme E+S→E+P, P→S; assemble_pore_spontaneous implements the order-4 reaction 4 × subunit pore. The bind_table macro is illustrated in Figure 2C. (D) Model extension in PySB. Module Reuse (Figure 3B) was used to implement the ‘Direct' model from Cui et al (2008) as an extension of the prior ‘Direct' model from Chen et al (2007b) shown in Figure 4C. Invocation of the PySB function chen_febs_direct incorporates the elements of the original Chen et al (2007b) model; subsequent statements specify the modifications and additions required to yield the derived model from Cui et al (2008).

Table 2

Summary of models in EARM 2.0

Model name[a]	ID (full/MOMP)	Reference	MOMP-only (Mnb)[b]			Full apoptosis (Mna)[c]
			Rules	ODEs	Parameters	Rules	ODEs	Parameters
Lopez Embedded	M1a/b	This paper	39	40	78	66	76	133
Lopez Direct	M2a/b	This paper	27	32	58	54	68	113
Lopez Indirect	M3a/b	This paper	29	34	64	56	70	119
Albeck 11b	M4a/b	Albeck et al, 2008b	7	13	17	34	48	71
Albeck 11c	M5a/b	Albeck et al, 2008b	11	17	25	38	52	79
Albeck 11d	M6a/b	Albeck et al, 2008b	12	18	27	39	53	81
Albeck 11e	M7a/b	Albeck et al, 2008b	14	21	31	41	56	85
Albeck 11f	M8a/b	Albeck et al, 2008b	14	21	31	41	56	85
Chen 2007 Biophys J	M9a/b	Chen et al, 2007a	6	7	12	37	49	75
Chen 2007 FEBS Direct	M10a/b	Chen et al, 2007b	5	8	12	36	48	74
Chen 2007 FEBS Indirect	M11a/b	Chen et al, 2007b	3	6	9	34	48	72
Cui Direct	M12a/b	Cui et al, 2008	18	10	26	49	52	89
Cui Direct 1	M13a/b	Cui et al, 2008	22	11	33	53	53	96
Cui Direct 2	M14a/b	Cui et al, 2008	23	11	34	54	53	97
Howells	M15a/b	Howells et al, 2010	14	12	22	45	49	84

aModel names are drawn from the first author of the paper in which the mathematical model was published.

bMOMP-only variants are identified as Mnb, e.g., M1b for the MOMP-only variant of ‘Lopez Embedded'.

cFull apoptosis variants are identified as Mna, e.g., M1a for the full-apoptosis version of ‘Lopez Embedded'.

While porting existing models into PySB, we observed that several published ODE networks contained one or more errors relative to their verbal or graphical descriptions in the original paper (Cui et al, 2008) (see Supplementary Note). We were unable to discern whether the errors in the published ODE networks represent genuine mathematical errors or merely transcription errors made in the process of converting computer models to text. Even when we used the ODE networks as published, we found cases in which we were unable to reproduce the results described in the figures. Our own previous work was not entirely free of this problem: we could not reproduce the simulation results in Figure 11 of Albeck et al (2008b) without access to MATLAB source code that was inadvertently omitted from the original publication. Our aim is not to criticize these papers but instead to emphasize that the current practice of maintaining different forms of a model for the purpose of simulation, illustration, and publication is highly problematic. The lists of equations included as supplementary materials in most modeling papers are particularly troublesome because they exist independently of the simulation model and the two tend to deviate. These problems can be addressed by using electronic formats for model exchange with a single master from which all other versions are derived (Hucka et al, 2003; Waltemath et al, 2011). As an electronic format for models, PySB complements XML-based formats such as SBML in that macros, modules, and other high-level abstractions make model structure more intelligible than SBML alone. In addition, modeling biochemical processes by reusing previously validated macros eliminates ‘bookkeeping' errors such as those we identified in published MOMP models. To ensure that the reinstantiated models reproduced the behavior of the originally published versions, we wrote a series of unit tests using the Python modules unittest and nose. The tests guarantee that the reinstantiated models reproduce validated states, despite translation into PySB. Further details on our approach to unit testing can be found in the online documentation (Materials and methods). The structure and origin of the MOMP models are easier to understand using PySB than the underlying sets of ODEs. This can most easily be seen by comparing the ODE and PySB versions of a model from Chen et al (2007b) (Figures 4B and 4C). The original model is relatively simple (only seven ODEs), but understanding the precise mechanism for MOMP requires careful inspection of each equation. By comparison, the PySB model exploits macros to make the mechanisms transparent: single-step catalysis, combinatorial binding, and pore assembly. Many of the 15 MOMP models in EARM 2.0 represent incremental extensions of earlier models (this is particularly true of the models from Howells et al (2010) and Chen et al (2007a, 2007b), as well as the five models from Albeck et al (2008a, 2008b); Figure 4A). The authors of these models proceeded by duplicating ODEs from previous models or papers and then adding new species or reactions as required: e.g., the three models of Cui et al (2008) are derived directly from the ‘direct' model of Chen et al (2007b), whereas the model of Howells et al (2010) is based on an earlier model from Chen et al (2007a). However, the process of renaming species and variables in the derived models makes it difficult to verify that each variant correctly recapitulates the structure of the original model as claimed. For example, in Cui et al (2008), the authors stated simply that the ‘direct model' was ‘mainly based' on the earlier work of Chen et al (2007b), but we found that there were several important additions and modifications in the derived model, including addition of displacement, synthesis and degradation reactions, and a change in the MOMP pore from a Bax tetramer to a Bax dimer. Inspection of the PySB source code for the direct model of Cui et al (2008) (Figure 4D) makes these differences explicit by calling a subroutine for the earlier chen_febs_direct model (Figure 4C) and adding only the new reactions. PySB Module Reuse facilitated the process of embedding each of the 15 models of MOMP within the context of receptor–proximate reactions (ligand binding to Bid cleavage) and downstream reactions creating ‘Full Apoptosis' and ‘MOMP-only' versions (summarized in Table II; see also Materials and methods). We are currently developing additional apoptosis modules (e.g., alternative topologies for receptor activation and DISC formation) that will soon be part of the EARM repository; other researchers can also ‘fork' the code on GitHub and contribute their own additions. This should allow a cumulative and distributed approach to model development and comparison.

Embedded together: an updated and expanded MOMP model

The EARM 2.0 extrinsic apoptosis model incorporating the ‘Lopez Embedded' MOMP module variant, denoted EARM 2.0-M1a for short (Table II), implements a mathematical interpretation of recent experimental findings from Andrews (Billen et al, 2008; Leber et al, 2010) and Green (Llambi et al, 2011) and differs significantly from previously published models of MOMP (Figure 5). Interactions among Bcl-2 family members occur at the mitochondrial membrane rather than in the cytosol (Lovell et al, 2008), and anti-apoptotic proteins are able to bind both the pore-forming proteins, such as Bax and Bak, and a larger family of BH3-only Bcl-2 family members, thus serving as dominant-negative effectors (Billen et al, 2008; Leber et al, 2010). This is also consistent with a recent ‘Unified Model' by Green and coworkers demonstrating both ‘direct' and ‘indirect' modes of action by the anti-apoptotic Bcl-2 proteins (Llambi et al, 2011) (Box 1). The overlapping binding specificities implied by this model are summarized in a bind_table call that includes the key effector for extrinsic apoptosis (Bid), two BH3-only sensitizers (Bad and Noxa), two pore-forming effectors (Bax and Bak), and three anti-apoptotic proteins (Bcl-2, Bcl-XL, and Mcl-1), along with affinity data obtained from in vitro experiments (Willis et al, 2005; Certo et al, 2006) (Figure 5A). There is some doubt about whether peptide-based affinity measurements are directly relevant to protein–protein interactions occurring on the membranes of living cells, and the bind_table macro makes it straightforward to experiment with different values (Figure 2C). EARM 2.0-M1a also assumes auto-activation of Bax (and Bak), which has been demonstrated in multiple experimental contexts (Tan et al, 2006; Gavathiotis et al, 2010).

Figure 5

Representations of the Bcl-2 interaction topology in the EARM 2.0-M1b (‘Lopez Embedded') model. The Bcl-2 interaction model consists of five basic mechanistic elements or ‘motifs'—tBid/Bax/Bcl-xL translocation, activation of Bax and Bak by tBid, Bcl-2 family binding, Bax/Bak auto-activation, and pore assembly—that are highlighted in A, B, and C according to the colors in the legend. (A) PySB source code for the model, edited for brevity. (B) Simplified, manually drawn representation. (C) The full reaction network, generated from the PySB model using the PySB render_reactions tool. Rectangles represent species, circles represent reactions, lines represent reactions with the solid arrowhead representing the nominal forward direction, and the empty arrowhead (for reversible reactions only) representing the reverse direction. Catalytic reactions are depicted with a boxed arrow pointing from the catalyst to the reaction circle (species for enzyme–substrate complexes are omitted for clarity). (D) Kappa contact map, which shows the superset of all possible bonds between monomers calculated by static analysis (Danos et al, 2008). The contact map was computed using Kappa's complx tool accessed through the PySB Kappa wrapper library. Rectangles represent monomers, circles represent sites, and lines represent bonds.

Our previously published EARM 1.0–1.4 models (Albeck et al, 2008a, 2008b; Spencer et al, 2009; Aldridge et al, 2011; Gaudet et al, 2012) assumed that the all-or-none quality of MOMP arose from the ability of Bcl-2 to bind Bax monomers, dimers, and tetramers (Albeck et al, 2008b). However, subsequent immunoprecipitation experiments failed to support the existence of such higher-order hetero-oligomers (Kim et al, 2009). To determine whether the updated reaction topology in EARM 2.0-M1a can reproduce MOMP dynamics measured in single TRAIL-treated HeLa cells using Förster resonance energy transfer reporter proteins (Spencer et al, 2009), we fitted it to data using the simulated annealing algorithm in SciPy (Materials and methods). We found that EARM 2.0-M1a had as good a fit to data as previous models (Figure 6), and we therefore judge it to be superior to our earlier EARM 1.0 model based simply on better correspondence with prior knowledge. The fitting exercise also demonstrated that Python numerical tools can efficiently simulate and calibrate PySB models (parameter estimation functions are included in the EARM 2.0 Python package).

Figure 6

Simulated annealing fits for EARM 2.0-M1a for three experimentally measured protein signals IC-RP/tBid (A), IMS-RP/Smac release (B), and EC-RP/PARP cleavage (C). Gray lines indicate the experimental data with error bars indicating the s.d. In the case of B, the gray line denotes the mean time of death TExp, used to align the trajectories (Materials and methods). The purple curves show the simulated trajectories using nominal parameter values; the orange curves show the simulated trajectories after model fitting. The objective function for fitting is described in the Materials and methods.

Discussion

In this paper, we describe the development and use of PySB, a framework for creating, managing, and analyzing biological models in which models are full-fledged Python programs. PySB modules and macros generate BNGL or Kappa rules, which can be converted into mathematical equations. This hierarchical process is analogous to the way in which programs are written in a high-level language such as C++ and converted into microprocessor code by the compiler. This complexity is hidden from PySB users who work with macro libraries encoding biochemical actions such as ‘catalyze,' ‘bind,' ‘assemble,' etc. The advantages of a high-level abstraction layer include greater transparency and intelligibility, a reduction in implementation errors and a dramatic increase in the ability to compare and reuse previous work. Each level of representation remains accessible for analysis and there are no explicitly black-box steps: the Python model code reveals model ontogeny, structure, and approach to mechanism; the BNGL or Kappa rules generated by PySB support agent-based simulation and static analysis; and equations enable numerical integration, sensitivity analysis, steady-state analysis, etc. Use of familiar but powerful programming concepts in PySB models such as abstraction, composition, modularity, inheritance, and polymorphism make it possible to create variant models from pre-existing models across several axes of variation and build new models from previously tested elements. We expect these features of PySB to facilitate collaborative model development and evaluation. PySB draws on well-established practices in the open-source programming community for model documentation and sharing. Because PySB models are programs, they can be tracked and shared using the powerful tools developed for distributed, open-source software development (e.g., all the models in this paper are available, with documentation, at GitHub; see Materials and methods). It is simple to update models online, highlight differences with previous work and divide development among multiple individuals and research teams. Finally, PySB can be used as a general-purpose modeling tool because it interoperates with diverse scientific applications written in Python (e.g., NumPy, SciPy, SymPy, and Matplotlib). Unlike conventional all-in-one programs, PySB itself tackles only certain steps in the modeling process, relying on interoperability with programs developed and maintained by others to create a full-fledged solution. A benefit of this approach is that improvements in any of these programs accrue directly to users of PySB. The power of PySB derives, in part, from its ability to encode recurrent biochemical patterns in reusable macros (Figure 2) and to divide complex networks into modules that are defined once and called when needed (Figures 3B and 4C). By eliminating re-implementation, macros and modules separate fundamental mechanistic concepts from implementation details, and thereby make clear the purpose and origins of specific model features (Mallavarapu et al, 2008; Pedersen and Plotkin, 2008; Mirschel et al, 2009; Gnad et al, 2012). The ability of real biological networks to be meaningfully decomposed into functional modules is highly context dependent and a matter of controversy (Del Vecchio et al, 2008), but there is no requirement that modules in PySB correspond to modules in a biological or ‘black-box' engineering sense: the full reaction network is always accessible without simplification. Instead, PySB modules are defined according to flexible and convenient organizational boundaries, keeping open the possibility for cross-talk and emergent interactions with other modules. This style of modularity follows the open-ended approach of little b (Mallavarapu et al, 2008) and differs from ProMoT, in which modules interact only through previously designated molecular species (Mirschel et al, 2009). In general, choosing the right boundaries for a module, whether a software program or a biological model, is a matter of art and practical experience. In the models of extrinsic apoptosis analyzed in this paper, reactions governing MOMP are a good candidate for modularization because they largely take place in a discrete compartment (the mitochondrial membrane) and focus on reactions among Bcl-2 proteins. We have found that PySB naturally supports a hierarchy of modeling concepts (Figure 7A). At the top of this hierarchy are the models themselves, which represent a specific hypothesis about the topology and activity of a biological system or network; at the bottom of the hierarchy are specific mathematical equations (e.g., ODEs). Typical approaches to modeling proceed by directly rendering the hypothesis in equations, making it difficult to discern the assumptions implicit in the process of mathematical translation (Figure 7B). Rule-based approaches represent an intermediate level of abstraction in that they enumerate local interactions between proteins in a way that is less explicit than equations (Figure 7C). PySB adds an additional layer of abstraction in that the user works with macros and functions (Figure 7A; see also Figure 2). Sets of macros are then grouped into reusable subroutines that implement small mechanistic ‘motifs' corresponding roughly to a sentence in a word model, such as ‘tBid activates Bax and Bak.' Such ‘motifs' are then composed into modules, and modules into models. Constructed in this fashion, a set of variant models forms a ‘web' of intertwined elements that is largely self-documenting.

Figure 7

A conceptual hierarchy for model building with PySB. (A) Simplified hierarchy for EARM 2.0-M2a and M3a. Models can be decomposed into a small number of separable modules; modules can in turn be decomposed into smaller recurrent mechanistic elements or motifs. Motifs are collections of macro calls (e.g., the motif ‘tBid activates effectors' consists of two catalyze calls; Figure 5A). Each macro expands into multiple rules (Figure 2) and rules generate equations (Figure 1B). The gray box highlights the intermediate abstraction layers afforded by PySB. (B) ODE-based approaches to model construction proceed directly from a complex biological system to its mathematical representation, whereas (C) rule-based approaches provide one abstraction layer above ODEs.

PySB as a second-generation approach

PySB is not the first attempt to create a high-level language for modeling biochemistry and was inspired by ProMoT and little b, both of which represent models as LISP programs (Mallavarapu et al, 2008; Mirschel et al, 2009). However, as described above, these tools had limited or no support for rule-based modeling. PySB is based on the much more familiar Python language and is interoperable with BNGL and Kappa. The rule-based modeling community is also developing tools for managing complex models. For example, MetaKappa targets redundancy in models having related molecular species and partially overlapping functional characteristics (e.g., a set of mutants or isoforms of a single protein) (Danos et al, 2009). The Language for Biological Systems (LBS) is another approach in which rules are combined with methods for constructing parameterized modules (Pedersen and Plotkin, 2008). MetaKappa and LBS are examples of domain-specific languages (DSL) for high-level biological modeling, whereas PySB supports high-level modeling through the structured programming features of Python (Box 2). Through Python, PySB provides substantial flexibility in organizing models. A potential drawback of this flexibility is that static analysis of models may be more challenging to implement than for DSLs such as MetaKappa or LBS. For the time being, available static analysis algorithms can be applied to the rules generated from PySB models (Danos et al, 2008; Feret et al, 2009). We expect the strengths and weaknesses of these different approaches to become clearer as high-level languages are more widely adopted by the modeling community. Graphical tools represent an alternative approach to making biological models easier to understand. CellDesigner (Kitano et al, 2005) and BioUML (Kutumova et al, 2012) employ visual interfaces and graphical languages for model creation, interpretation, and revision. Our experience has been that purely visual approaches to model creation do not scale well with model size and are ambiguous with respect to the underlying biochemical mechanisms. This parallels experience in the software engineering community with UML diagrams: diagrams are a helpful adjunct to programmatic abstractions but they are not a replacement. This implies that visualizations should be created from an underlying computable representation and not the other way around; this is the approach taken by the rxncon software, which uses specialized tables to describe reactions in a way that specifically supports multiple visualizations (Tiger et al, 2012). However, to be most useful, the computable representation must be made as transparent as possible: approaches that rely on generated visualizations for transparency are hard to modify, because changes must be made to the underlying (non-transparent) representation. Although visualization remains important for communicating and describing models (Chylek et al, 2011), PySB follows the common software engineering paradigm in which programmatic abstraction serves as the principal tool for managing complexity and visualization serves to illustrate specific properties of a system (Danos et al, 2008). By way of illustration, we show how a reaction network (Figure 5C) and a Kappa contact map (Figure 5D) can be generated from EARM 2.0-M1b (the MOMP-module-only variant of M1a) and a species graph can be generated for the full EARM 2.0-M1a model (Supplementary Figure S2). These visualizations proved useful in debugging models during the preparation of this manuscript, but it was the PySB model code that supported straightforward revisions (Figure 5A). The code not only contains more information than the visualizations, but with only 10 macro calls falling into 5 ‘motifs' (color-coded regions), it compares favorably with even a simplified, hand-drawn ‘cartoon' representation (Figure 5B) in terms of intelligibility. The advantages of using programmatic methods for modeling biological pathways are not necessarily evident from simple examples in which the underlying equations are self-explanatory (e.g., the ‘Hello World' model in Figure 1A). This has led some to dispute the value of such methods and to argue that direct modeling in equations is superior. However, as models become more realistic and complex, equations rapidly become difficult to understand and errors creep in, a problem that was evident with the published MOMP models we reinstantiated (Figure 4). The advantages of programmatic abstraction also become more evident when a model must be revised or shared, particularly if the original developers have moved on to something new. This paper shows how PySB, in combination with BNGL and Kappa, goes a long way toward addressing this problem.

The EARM 2.0 models of extrinsic cell death

One aim of this paper was to create a new model of extrinsic apoptosis that incorporated the latest thinking on the biochemistry of MOMP while facilitating comparison with previously published models. EARM 2.0 includes 15 different models for the reactions controlling MOMP, 12 of which were previously published in five papers and three of which are novel. In general, previously published models of MOMP do not explore Bcl-2 biochemistry in the context of a complete receptor-to-effector caspase network. Such models also simplify the biology of Bcl-2 proteins, representing only a subset of the family members. We overcame these limitations by modularizing the extrinsic apoptosis pathway and using composition to embed different MOMP modules within a larger network. Use of the bind_table macro (for modeling interactions among members of a multi-protein family) made it possible to efficiently encode the differential binding affinities of many Bcl-2 proteins for each other. Although the molecular interactions included in our EARM 2.0 models are not comprehensive, the extensible nature of the PySB representation makes it simple to add additional mechanistic details and species in future work. One important feature of apoptosis is that MOMP pores do not form until a pro-death stimulus has been applied at a sufficient concentration for several hours; however, once pores form cells die quickly (Goldstein et al, 2000; Rehm et al, 2002; Albeck et al, 2008b). Our original explanation for the all-or-none regulation of MOMP was that an anti-apoptotic protein such as Bcl-2 or Bcl-XL binds to successively larger Bax and Bak oligomers, thereby creating a cooperative inverse relationship between Bcl-2 and pore levels. This idea has not been borne out by experiments and instead it appears that the kinetic properties of MOMP must arise from the dual affinity of anti-apoptotic proteins for both BH3-only proteins and pore-forming effectors (Billen et al, 2008; Llambi et al, 2011). We find that an ‘embedded together' model (EARM 2.0-M1a) incorporating this revised thinking about Bcl-2 family proteins (Leber et al, 2010) can reproduce the dynamics of MOMP as measured in single cells using live-cell imaging (Spencer et al, 2009). An additional round of experimental and computational ‘model discrimination' studies is needed to show whether EARM 2.0-M1a is indeed superior to previous models, but we prefer it simply on the basis of its faithful recapitulation of current knowledge.

PySB as a means to incremental and collaborative model development

One of the key aims for PySB, and also for BNGL, Kappa and related meta-languages, is to promote distributed, incremental, and collaborative approaches to modeling. There are both technical and conceptual challenges that must be addressed for this to be successful. A significant technical hurdle in model integration and reuse is the need for a standard nomenclature for model species. The SBML community's MIRIAM standard is an essential resource in this regard (Le Novère et al, 2005), but the way in which rule-based models represent species and complexes will demand a modified approach to annotation. As a first step, we have implemented a basic annotation capability in PySB (see Supplementary Figure S3) based on MIRIAM-style subject–object–predicate triples that should help resolve naming ambiguities. However, the fundamental challenge for integrating and reusing models of disparate biological processes remains the fact that biological models remain ‘fit-to-purpose,' focused on addressing specific biological problems or contexts (Krakauer et al, 2011). PySB does not prescribe ‘universal' approaches to representing biological components or processes but instead makes fit-to-purpose modeling more transparent and manageable through the use of both hierarchical abstractions (Figure 7) and tools for documenting, testing, and tracking models drawn from software engineering. In the short-term, these features should allow communities of biologists working on related biological problems to work in parallel toward shared goals; in the longer-term, real-world experimentation with approaches for collaborative modeling should yield best practices for building broadly reusable models. The fact that PySB models are programs allows us to exploit the tools and social conventions of the open-source software development community for distributed model development. In open-source software, derivative or variant branches of a source tree can be spun off and then merged into the principal source tree if desired. Version control systems such as Git allow this process to be managed and visualized. PySB models shared via GitHub can have both private and public branches that preserve the integrity of ongoing model development while allowing for external contributions. Multiple groups can develop derivative models with confidence that the relationships among variants can be tracked and managed. SBML versions of PySB models can also be deposited in the BioModels repository, supporting current procedures for indexing, citation, and search. Software tests can be written to ensure that models and modules behave as documented. As models get larger and the scope of the underlying biology exceeds the expertise of a single modeling team, tools such as PySB will be needed to create reusable, shareable, and transparent biological models in a distributed manner—a major goal of a systems-level program of biological discovery.

Materials and methods

PySB code and documentation

PySB is freely available under an open-source BSD license. Links to the GitHub source code repository as well as documentation and other didactic materials are available at http://pysb.org. The EARM 2.0 models and associated documentation are available along with the data used for model calibration at http://sorgerlab.github.com/earm. An SBML version of the EARM 2.0-M1a model is included in the Supplementary Materials of the paper.

PySB syntax

PySB implements a syntax for rules based on that of BNGL using classes and overloaded operators from the Python language, leveraging Python as the language parser. Rule definitions are built up using patterns (represented internally as instances of the classes MonomerPattern, ComplexPattern, and ReactionPattern) that specify both a rule's preconditions and its effects. Site and state conditions for monomers are specified using an overloaded __call__ method for the Monomer class, which takes site and state conditions as keyword arguments. For example, if L is an instance of Monomer, L(s='P') specifies L with site s having state P. This use of the __call__ method, along with the overloaded operators +, %, >>, and <>, allow rules to be specified using a syntax that parallels that of BNGL and Kappa (Figure 1A). See also the ‘PySB syntax' section of the Supplementary Materials.

Component self-export

By default, when a model component is constructed, it is added to the current instance of the Model class; a variable referring to the newly created component (with a name matching the name passed to the component constructor) is also inserted into the global namespace. This eliminates the need to retain references to newly created objects and explicitly add them to the model. This ‘self-export' functionality is managed by the PySB core class SelfExporter, which identifies the module in which the current Model instance was declared and adds global variables for components to that namespace. If alternative approaches to component and model management are desired, the self-export feature can be disabled by adding the keyword argument _export=False to each component constructor. See also the ‘PySB syntax' section of the Supplementary Materials.

ODE integration

PySB generates the reaction network through an external call to BioNetGen and extracts the network graph by parsing the resulting .net file. The network graph is then used to build up the right-hand sides of the ODEs as SymPy symbolic math expressions containing the appropriate rate constants, mass action terms, and stoichiometric coefficients. If a C compiler is available, the right-hand side function is implemented in C using scipy.weave.inline; otherwise, the right-hand side function is evaluated as a Python expression. A reference to the right-hand side function is passed to scipy.integrate.ode, a generic wrapper library for ODE integration routines; for this work, we used the FORTRAN integrator VODE (Brown et al, 1989).

Conversion of published models to PySB

For the models lacking an electronic version (Chen et al, 2007a, 2007b; Cui et al, 2008; Howells et al, 2010), the ODEs generated by the PySB versions were manually validated against the ODEs listed in the original publications. In the case of the three models with errors in the published ODEs (Cui et al, 2008) (see Supplementary Note), the PySB version was written to generate the ODEs corresponding to the described reaction scheme without these errors. For the models for which we had access to the original MATLAB code (Albeck et al, 2008b), the PySB versions were also programmatically validated against the output from the published versions.

Modularization of MOMP models

Each MOMP-only model (Table II) was written to have the addition of tBid as its most upstream event, and the release of cytochrome c and Smac as its most downstream event. In some cases, these default boundaries did not match the boundaries for the MOMP module in the original publications: Albeck et al (2008b) had the addition of active caspase-8 as the most upstream event, whereas the Shen/Howells group models had Bax oligomerization (rather than Smac release) as the most downstream event (Chen et al, 2007a, 2007b; Cui et al, 2008; Howells et al, 2010). In these cases, the networks of the original models were modified to achieve consistent boundaries across modules. The boundaries of the original models can nevertheless be reproduced through the use of optional arguments to the module subroutines that add or remove reactions as appropriate.

Simulation and parameter estimation

Simulations of EARM 2.0-M1a were carried out using the VODE integrator via the SciPy library using Newton's method for root evaluations and the backward differentiation formula integration method. Absolute and relative tolerances were set to 10−5. Parameter estimation was performed using the simulated annealing routine implemented in scipy.optimize.anneal with an appropriately defined objective function (described below). Nominal values for rate constants for the DISC and PARP modules were set to their published values in EARM 1.0; rate constants for the MOMP module were drawn from Certo et al (2006), Willis et al (2005), or set to values from similar rates from EARM 1.0. During the annealing process, all rate constants were allowed to vary two orders of magnitude above and below their nominal values (i.e., 0.01X—100X); initial protein concentrations were held fixed and not estimated. Trajectories for the initiator caspase reporter protein (IC-RP), mitochondrial inter-membrane space reporter protein (IMS-RP), and effector caspase reporter protein (EC-RP) were used from previously published data (Spencer et al, 2009). In the model, tBid, cytosolic Smac, and cleaved PARP were fit to the data for IC-RP, IMS-RP, and EC-RP, respectively. IMS-RP data from 10 cells indicated an average MOMP time of 9810±2690, s after the exposure of the cells to ligand. The IC-RP and EC-RP signals were normalized and aligned to this MOMP time to yield an average trajectory for each. The objective function used to calculate model fit was the sum of component functions for each of the experimental observables as follows: where are the rate parameters for the system of ODEs, and , , and are the objective functions for corresponding observables. These objective functions were defined as follows: where N=112 is the number of experimental time points, is the simulated value of tBid at time t (with the given parameters,, Bid0 is the total amount of Bid in the simulation, and ICRP is the distribution of MOMP-aligned experimental IC-RP trajectories at time t. The objective function for EC-RP was defined similarly, but with cleaved PARP in place of tBid and EC-RP in place of IC-RP: Data from previous experiments showed that release of IMS-RP from mitochondria due to MOMP is both rapid and complete under all tested experimental conditions, typically reaching completion within 1–3 min (Albeck et al, 2008a). To ensure that the model could reproduce these kinetics, we used an alternative objective function for the cSmac/IMS-RP signal to consider the onset (T) and duration (T) of MOMP: is the experimental ‘death time,' recorded as the time of MOMP as measured by IMS-RP release; TExp is the experimental ‘switching time,' defined as the time required for IMS-RP to be released, estimated to have a mean value of 180 s and a variance of 3600, s2. The corresponding simulated values were defined as where T90 and T10 are the times taken by the simulation to reach 90% and 10% of maximum Smac release, respectively. The final term in constrains the simulated Smac trajectory to achieve approximately 100% release: cSmac is the final simulated value of cytosolic Smac, and Smac0 is total Smac.

71 in total

1. Hierarchical regulation of mitochondrion-dependent apoptosis by BCL-2 subfamilies.

Authors: Hyungjin Kim; Mubina Rafiuddin-Shah; Ho-Chou Tu; John R Jeffers; Gerard P Zambetti; James J-D Hsieh; Emily H-Y Cheng
Journal: Nat Cell Biol Date: 2006-11-19 Impact factor: 28.824

2. Auto-activation of the apoptosis protein Bax increases mitochondrial membrane permeability and is inhibited by Bcl-2.

Authors: Chibing Tan; Paulina J Dlugosz; Jun Peng; Zhi Zhang; Suzanne M Lapolla; Scott M Plafker; David W Andrews; Jialing Lin
Journal: J Biol Chem Date: 2006-03-29 Impact factor: 5.157

Review 3. Rules for modeling signal-transduction systems.

Authors: William S Hlavacek; James R Faeder; Michael L Blinov; Richard G Posner; Michael Hucka; Walter Fontana
Journal: Sci STKE Date: 2006-07-18

4. COPASI--a COmplex PAthway SImulator.

Authors: Stefan Hoops; Sven Sahle; Ralph Gauges; Christine Lee; Jürgen Pahle; Natalia Simus; Mudita Singhal; Liang Xu; Pedro Mendes; Ursula Kummer
Journal: Bioinformatics Date: 2006-10-10 Impact factor: 6.937

5. Modeling of the role of a Bax-activation switch in the mitochondrial apoptosis decision.

Authors: Chun Chen; Jun Cui; Haizhu Lu; Rui Wang; Shuai Zhang; Pingping Shen
Journal: Biophys J Date: 2007-03-30 Impact factor: 4.033

Review 6. The BCL-2 protein family: opposing activities that mediate cell death.

Authors: Richard J Youle; Andreas Strasser
Journal: Nat Rev Mol Cell Biol Date: 2008-01 Impact factor: 94.444

7. Apoptosis initiated when BH3 ligands engage multiple Bcl-2 homologs, not Bax or Bak.

Authors: Simon N Willis; Jamie I Fletcher; Thomas Kaufmann; Mark F van Delft; Lin Chen; Peter E Czabotar; Helen Ierino; Erinna F Lee; W Douglas Fairlie; Philippe Bouillet; Andreas Strasser; Ruth M Kluck; Jerry M Adams; David C S Huang
Journal: Science Date: 2007-02-09 Impact factor: 47.728

8. Systems analysis of effector caspase activation and its control by X-linked inhibitor of apoptosis protein.

Authors: Markus Rehm; Heinrich J Huber; Heiko Dussmann; Jochen H M Prehn
Journal: EMBO J Date: 2006-08-24 Impact factor: 11.598

9. Robustness analysis identifies the plausible model of the Bcl-2 apoptotic switch.

Authors: Chun Chen; Jun Cui; Wei Zhang; Pingping Shen
Journal: FEBS Lett Date: 2007-10-08 Impact factor: 4.124

10. Mathematical modeling identifies inhibitors of apoptosis as mediators of positive feedback and bistability.

Authors: Stefan Legewie; Nils Blüthgen; Hanspeter Herzel
Journal: PLoS Comput Biol Date: 2006-07-28 Impact factor: 4.475

74 in total

1. Pycellerator: an arrow-based reaction-like modelling language for biological simulations.

Authors: Bruce E Shapiro; Eric Mjolsness
Journal: Bioinformatics Date: 2015-10-26 Impact factor: 6.937

2. Constant-complexity stochastic simulation algorithm with optimal binning.

Authors: Kevin R Sanft; Hans G Othmer
Journal: J Chem Phys Date: 2015-08-21 Impact factor: 3.488

3. Parameter Estimation and Uncertainty Quantification for Systems Biology Models.

Authors: Eshan D Mitra; William S Hlavacek
Journal: Curr Opin Syst Biol Date: 2019-11-06

4. BioSimulator.jl: Stochastic simulation in Julia.

Authors: Alfonso Landeros; Timothy Stutz; Kevin L Keys; Alexander Alekseyenko; Janet S Sinsheimer; Kenneth Lange; Mary E Sehl
Journal: Comput Methods Programs Biomed Date: 2018-10-10 Impact factor: 5.428

5. MOLNs: A CLOUD PLATFORM FOR INTERACTIVE, REPRODUCIBLE, AND SCALABLE SPATIAL STOCHASTIC COMPUTATIONAL EXPERIMENTS IN SYSTEMS BIOLOGY USING PyURDME.

Authors: Brian Drawert; Michael Trogdon; Salman Toor; Linda Petzold; Andreas Hellander
Journal: SIAM J Sci Comput Date: 2016-06-01 Impact factor: 2.373

Review 6. Modeling for (physical) biologists: an introduction to the rule-based approach.

Authors: Lily A Chylek; Leonard A Harris; James R Faeder; William S Hlavacek
Journal: Phys Biol Date: 2015-07-16 Impact factor: 2.583

Review 7. Rule-based modeling: a computational approach for studying biomolecular site dynamics in cell signaling systems.

Authors: Lily A Chylek; Leonard A Harris; Chang-Shung Tung; James R Faeder; Carlos F Lopez; William S Hlavacek
Journal: Wiley Interdiscip Rev Syst Biol Med Date: 2013-09-30

8. Smart computational exploration of stochastic gene regulatory network models using human-in-the-loop semi-supervised learning.

Authors: Fredrik Wrede; Andreas Hellander
Journal: Bioinformatics Date: 2019-12-15 Impact factor: 6.937

9. Intrinsic limits of information transmission in biochemical signalling motifs.

Authors: Ryan Suderman; Eric J Deeds
Journal: Interface Focus Date: 2018-10-19 Impact factor: 3.906

10. Tracing the footsteps of autophagy in computational biology.

Authors: Dipanka Tanu Sarmah; Nandadulal Bairagi; Samrat Chatterjee
Journal: Brief Bioinform Date: 2021-07-20 Impact factor: 11.622