Pauline Traynard1, Luis Tobalina2, Federica Eduati3, Laurence Calzone1, Julio Saez-Rodriguez2,3. 1. Institut Curie, PSL Research University, Mines Paris Tech, Inserm, U900, Paris, France. 2. RWTH Aachen University, Faculty of Medicine, Joint Research Centre for Computational Biomedicine, Aachen, Germany. 3. European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, UK.
Abstract
Here we present logic modeling as an approach to understand deregulation of signal transduction in disease and to characterize a drug's mode of action. We discuss how to build a logic model from the literature and experimental data and how to analyze the resulting model to obtain insights of relevance for systems pharmacology. Our workflow uses the free tools OmniPath (network reconstruction from the literature), CellNOpt (model fit to experimental data), MaBoSS (model analysis), and Cytoscape (visualization).
Here we present logic modeling as an approach to understand deregulation of signal transduction in disease and to characterize a drug's mode of action. We discuss how to build a logic model from the literature and experimental data and how to analyze the resulting model to obtain insights of relevance for systems pharmacology. Our workflow uses the free tools OmniPath (network reconstruction from the literature), CellNOpt (model fit to experimental data), MaBoSS (model analysis), and Cytoscape (visualization).
Finding effective drugs and understanding how they work still pose considerable challenges with profound effects on human health. Several options exist to find compounds that could be used to treat a specific disease, such as target‐based or phenotypic screening approaches.1 More recent developments in experimental techniques such as shRNA2 and CRISPR‐Cas93 or microfluidics4, 5 promise to ease the discovery of new targets and drugs by increasing the number of targets that can be tested at once and decreasing the amount of biological material necessary to perform the experiments. However, a target discovered with such techniques may not necessarily be actionable in the clinical setting.6, 7 A compound that showed promising results in the laboratory may fail to be effective on patients. Understanding how a drug acts at the systems level is key to increasing the likelihood of success. Likewise, understanding the molecular basis driving a disease can be of great help in the search for its cure.Mechanistic models can help to better understand a drug's mode of action and predict the behavior of a biological system in response to drugs. The nascent field of Quantitative Systems Pharmacology aims to address these challenges by combining mechanistic models with pharmacokinetic modeling.8, 9, 10 It is known that many diseases involve alterations in the signaling pathways used by cells to interpret the cues from their environment.11, 12, 13, 14 Likewise, many drugs are designed to target components of these signaling pathways. These different pathways are not merely linear signaling conduits activated by different stimuli, but are also interconnected via crosstalk mechanisms to regulate each other, giving rise to signaling networks.
REGULATORY NETWORKS
The structure and function of signaling networks are complex, and they are differently deregulated in different biological contexts in nontrivial ways. Previous clinical studies have shown that inhibiting the same oncogenes can vary in efficacy, depending on the patient. The best example is the treatment of BRAF mutations in melanoma compared to colon carcinomas. In melanoma, a particular BRAF activating mutation, V600E, leads to the activation of signaling pathways involved in proliferation. Treatments targeting specifically the mutation, such as vemurafenib, show an efficient immediate response in melanomas. However, colon cancerpatients with the same mutation do not respond to the treatment.15 The difference between these two responses to the drug is due to different cellular contexts. Only present in colon cancer, a feedback loop on EGFR activation leading to the activation of the mitogen‐activated protein kinases (MAPK) pathway, through RAS, may be responsible for the poor outcome. This example highlights the importance of accurately describing the networks regulating these signaling pathways and their crosstalks. To ensure efficiency of the drug treatments, a good knowledge of these complex interactions and how patient mutations affect the cellular fate is necessary.The regulatory networks used in these models are generally extracted from databases and the literature. These sources represent the current knowledge available about interactions involving the proteins of interest. However, little information is known about which regulations are specific to a given cell type or a particular biological context. This information is important to highlight the specific mechanisms of regulation in different contexts. For example, comparing signaling pathways of healthy and diseased cells allows investigating which mechanisms are deregulated as a cause or an effect of the disease.16, 17 Similarly, the comparison of pathway activity in different cells from the same cancer type allows investigating mechanism of resistance to drugs, which can be exploited to suggest personalized therapies.18, 19, 20 If experimental data are available, optimization procedures can be used to refine the initial networks to be cell line‐ or context‐specific.
BASICS OF LOGIC MODELING
Mathematical modeling can help to decipher these complex mechanisms.18 Several modeling formalisms exist to deal with complex gene regulatory and signaling network structures.11, 21, 22 Because models can also provide insights into a drug's mode of action, scientists are also working towards building bridges between drug design and network modeling techniques.9, 23, 24Among the modeling techniques, logic(al) modeling has proven to be very versatile and able to provide useful biological insights.25, 26, 27 It has been applied for studying several biological phenomena (e.g., developmental processes,28 hematopoiesis,29 or cell fate decision30), but primarily used in signaling and gene regulation.26 Compared to other modeling techniques commonly used to describe biological systems, such as ordinary differential equations based on chemical kinetics, logic models are better suited to describe medium‐ or large‐scale networks, where detailed biological knowledge is often incomplete and where a more schematic representation of the system can improve the overall interpretability of the results. However, it is necessary to keep in mind that the formulation of logic rules implies a simplification of the described biological system and might not be able to fully capture the complexity of the underlying system.22Within the logic formalism, signaling networks are modeled by defining a signed and directed causal regulatory network as well as rules to update the state of its components.22, 25, 27 The sign of the interaction depends on whether it is an activation or an inhibition, and the direction indicates which node acts over another one (e.g., in a kinase–substrate interaction where the kinase activates its substrate, the edge will go from the kinase to the substrate and its sign will be positive). In these networks, proteins are represented by nodes and interactions between proteins are represented by edges. The state of a node can represent its activation status. This way, the nodes can take different values: 0 (False) or 1 (True), in the case of Boolean models; or discrete values: 0, 1, 2, or higher, in the case of multivalued models for representing distinct activation states. These states are updated following defined update rules, which take into account the state of input nodes and the type of interaction (positive or negative). Several different variations have been developed, ranging from Petri nets31 to logic‐based Ordinary Differential Equations (ODEs).32 An overview on some of the different formalisms available to model signaling networks (interaction graphs, logical models, and logic‐based ODE models) and the types of analyses that can be achieved with them is reviewed elsewhere.33 By including nodes corresponding to the targets of different drugs in these models, their values can be manipulated in the simulations to find out what would be the effect of these compounds when applied to the system.
BIOLOGICAL APPLICATIONS OF LOGIC MODELING
The examples where logic modeling has been applied to understand biological mechanisms are numerous and go from the study of T‐cell receptor signaling34, 35 to cell‐fate decision,36 mammalian cell cycle,37 or host immune response.38 In particular, cancer research has motivated many of the applications of logic modeling and fostered several methodological developments, because it requires studying large signaling and regulatory networks.39 For example, discrete logical modeling has been used to describe the DNA damage response signal transduction network in human epithelial cells and predict candidate target proteins for sensitization of carcinomas to DNA‐damaging agents.39 Also, the ErbB network was explored with a model built from prior knowledge and protein phosphorylation data to understand the drug resistance mechanisms of breast cancer cell lines.40 Logic modeling was applied to bladder cancer to investigate the effects of gene alterations leading to invasiveness.41 However, the use of logic modeling tools is not limited to cancer applications, and research in other diseases also benefit from these techniques. For instance, Boolean modeling was applied to the study of systemic lupus erythematosus, to stratify patients and find the best matching treatment.42 It has also been used to study the immunological response to infections,43 or to understand apoptosis given its relevance in diseases like Alzheimer's and Parkinson's.30 These and other examples19, 44 illustrate the value of logic modeling to enhance our understanding of the systemic effect of therapies. The models provide a formal tool to quickly evaluate in silico the effect of targeting one specific component of the model or explore the effects of possible drug combinations.20, 45, 46, 47 It is also possible to assess how changes in the network (e.g., missing or inactive receptor, etc.) may affect the effect of a drug.In this tutorial, we show the different steps involved in logic modeling on a prostate cancer example that involves some of the key phosphorylation pathways of this malignancy, and that we use to predict cell survival in different conditions (Figure
1). The steps include the building of the signaling network, its improvement using available data, and its simulation and analysis geared towards obtaining useful biological insights and predictions. We also present several tools that are useful for such a workflow, namely, Omnipath
48 to construct the signaling network, CellNOpt
49 to build a model trained to data, MaBoSS
50 to simulate and predict treatment response, and Cytoscape
51 to visualize some results of the simulations. It should be noted that many other excellent tools to model and analyze signaling networks exist, such as BoolSim,52 BoolNet,53 GINsim,54 etc. Many of these are compatible via the SBML‐qual format,55 and their development is coordinated by the CoLoMoTo initiative.56
Figure 1
Workflow suggested when applying logic modeling to the study of a biological question. In this tutorial we use Omnipath
48 for signaling database mining, CellNOpt
49 for model fitting, MaBoSS
50 for simulations, and Cytoscape
51 for visualization and network analysis. Different steps of the pipeline include 1) selecting a system and a question of interest and building a first version of the network, 2) choosing a modeling formalism and improving the model with data, and 3) analyzing the model, making predictions and comparing them to experimental data. The dashed arrow indicates a comparison between the results of the analysis and the experimental data. Dotted arrows represent feedback of the results into the modeling pipeline. Rounded boxes represent elements that can be considered part of the different types of the analysis. PKN, prior knowledge network.
Workflow suggested when applying logic modeling to the study of a biological question. In this tutorial we use Omnipath
48 for signaling database mining, CellNOpt
49 for model fitting, MaBoSS
50 for simulations, and Cytoscape
51 for visualization and network analysis. Different steps of the pipeline include 1) selecting a system and a question of interest and building a first version of the network, 2) choosing a modeling formalism and improving the model with data, and 3) analyzing the model, making predictions and comparing them to experimental data. The dashed arrow indicates a comparison between the results of the analysis and the experimental data. Dotted arrows represent feedback of the results into the modeling pipeline. Rounded boxes represent elements that can be considered part of the different types of the analysis. PKN, prior knowledge network.
Selection of a system to study
The decision to use logic models is usually spurred by the wish to gain a mechanistic understanding of a biological system for which the size, the lack of knowledge, or both, precludes more refined approaches such as reaction‐based modeling. The system of study might be a specific pathway, a cell line, or a disease, for example. The available data and publications related to the chosen system are usually a starting point for the following steps.41 In other cases, new experiments are designed and carried out to provide the necessary data for modeling.16 For this tutorial, we decided to study a small signaling network in prostate cancer, and do so using published data describing the phosphorylation response of key proteins in prostate cancer cell lines in response to the addition of several ligands and inhibitors.57The study providing the data used in this tutorial57 takes a data‐driven approach based on multivariate regression analysis to predict prostate cancer cell survival from the phosphorylation levels of 14 key proteins. These proteins are related to core signaling pathways that drive cell growth in three prostate cancer cell lines in response to various treatments by ligands or kinase inhibitors. The study focuses on the MAPK, PI3K, and IKK pathways. Correlations between phosphoproteins and cell survival are discussed in relation with their known roles in signaling pathways, with the goal of studying the sources of castration resistance between the cell lines. The results suggest that, in prostate cancer, androgen‐independent growth and androgen‐mediated signaling are largely driven via MAPK and PI3K signaling.We aim to study this system with a modeling approach, focusing on the data measured in the LNCaP cell line.
Construction of a regulatory network
Constructing a regulatory network is the first step in the modeling process. A first network, commonly referred to as the prior knowledge network (PKN), gathers the biological knowledge already known for the main components involved in the process being studied as a signed and directed graph. The PKN is then used for downstream modeling.When building a PKN, it is critical to know what the essential elements that should be included in the model are. The nodes often describe primarily proteins, but they can be considered as other types of elements such as genes that affect protein functionality. Elements to include in a cancer model comprise proteins related to the most frequently mutated genes, differentially expressed genes, drug targets, and phenotypic readouts (e.g., cyclins for cell cycle, caspases for apoptosis, EMT regulators for EMT, etc.). Many of these usually surface during the explorative data analysis preceding the mechanistic modeling. The major players of signaling pathways known to be deregulated, such as EGFR, RAS, MEK for the MAPK pathway, or TP53, MDM2, CASP8, and/or CASP3 for the apoptosis pathway, also need to be considered. Finally, input nodes that account for the microenvironment (presence or absence of growth factors, nutrients or ligands, hypoxia conditions, etc.) or for the treatments performed in the experiments and used for the modeling have to be added. The list of nodes can, of course, evolve throughout the construction of the network, through the exploration of published experiments where new genes or proteins are identified as playing an important role, or the inclusion of intermediary nodes required to link two processes.Once defined, the selected nodes are linked with edges that represent direct interactions or indirect regulations, identified in the literature. When available, the representation of detailed mechanisms, however, is challenged by the requirement to keep a reasonable size for the network, and as a consequence a reduced computation cost for simulations. A good balance has to be found between detailed mechanistic descriptions of the key signaling components and more schematic representations for less important processes reduced to the necessary players. This process often requires careful consideration of the mechanisms defined as essential for the question at hand. In addition, some simplification can be done once the model is built in an automatic manner. Automatic reduction tools exist to help the process while preserving the dynamics in the logic formalism,47, 58 such as removing intermediary nodes in linear pathways.Another complementary approach to sketching the network from literature knowledge is to use database information. There exist some pathway databases that depict, in great detail, the literature curated signaling pathways. Among these databases, KEGG,59 Reactome,60, 61 BioCarta,62 Wikipathways,63 Signor,64 Signalink,65 ACSN,66 etc., can be cited. Retrieving information from them is faster than the tedious process of reviewing the literature. However, the modeler should keep in mind the limits of such resources: because they are not always up‐to‐date, specific but important information not included in the databases might be missed. To facilitate the retrieval of information from all these databases, we use Omnipath,48 a comprehensive collection of existing manually curated pathway databases. Omnipath comes with a Python tool called pypath, developed to query its content, and manipulate and filter it easily. This allows us to extract the interactions that concern only the predefined list of proteins, and execute complex queries such as retrieving all possible paths between two proteins.There are different ways in which the network can be represented.67 For simplicity, we present our example with a type of diagram referred to as “entity relations” in the standard format SBGN,67 where nodes correspond to genes, proteins, modified proteins, complexes, or phenotypes, and edges are directed arrows representing activation or inhibition of one node over the other. This representation provides a simplified mechanistic view of the processes involved in the disease where the details of the reactions (degradation, synthesis, phosphorylation, etc.) are not explicitly represented but rather pictured as each node having a positive or a negative influence on other nodes of the network.In our case study, we build a simple network representing the signaling cascades involved in the survival of prostate cancer cells (Figure
2). We aim at validating the network discussed in the initial publication of the data,57 where two versions are provided. The first one is an undirected map containing the proteins whose phosphorylation status are measured in response to treatment with ligands and inhibitors (Erk1, Erk2, Akt1, Akt2, Akt3, RPS6, GSK3α, GSk3β, p38, JNK1, JNK2, JNK3, HSP27, Stat3), the ligands (EGF, IGF1, IL6, TNFα, and DHT) that we select as inputs, and key components of the PI3K/mTOR and MAPK pathways (PI3K, mTOR, RPS6, β‐catenin, Jak, RAS, MEK, Rac, and IKK). The second network suggests some possible links between the androgen receptor (AR) and the PI3K signaling pathway, RPS6, or cell cycle targets.
Figure 2
(a) Prior knowledge network (PKN) derived from public resources, including interactions connecting nodes which are measured (in blue) or perturbed (stimulated in green and inhibited in red) in the experimental data.57 The network was further expanded to include more components from the apoptotic pathway (p53, Caspase8, and Caspase 9) and Myc for the cell cycle activation and their regulation of Survival. Network layout is generated with Cytoscape.51 (b) Examples of logic rules used to convert the network to a logic model. All other nodes in the model with more than one input edge are modeled with a simple OR gate.
(a) Prior knowledge network (PKN) derived from public resources, including interactions connecting nodes which are measured (in blue) or perturbed (stimulated in green and inhibited in red) in the experimental data.57 The network was further expanded to include more components from the apoptotic pathway (p53, Caspase8, and Caspase 9) and Myc for the cell cycle activation and their regulation of Survival. Network layout is generated with Cytoscape.51 (b) Examples of logic rules used to convert the network to a logic model. All other nodes in the model with more than one input edge are modeled with a simple OR gate.Using Omnipath, signed interactions between the complete list of proteins used in the two initial networks were retrieved in order to connect them into a single network.For instance, the query related to an interaction between PI3K (PIK3CA) and AKT (AKT1) in Omnipath returns an activation of AKT1 by PIK3CA from four databases (SignaLink3, Laudanna_effects, Wang, Signor), and an inhibition of AKT1 by PIK3CA from one database (Laudanna_effects). We can therefore include in the network an activation of AKT by PI3K as the most confident interaction. Looking back at the four literature references associated with the interaction also confirms the sign and direction. A detailed description of the network is reported in the Supplementary Materials, with the python code used with pypath.In general, one or more phenotypes of interest will be modeled as additional nodes (e.g., survival, apoptosis, proliferation, etc.) and linked to the rest of the network by appropriate edges, to facilitate model predictions. In our case study, we predict cell survival, available along with phosphorylation data,57 as a phenotypic outcome of the model. Therefore, the resulting network was extended to describe roughly the regulation of the cell cycle and the apoptotic pathway, both of which influence an output node called “Survival.” Aiming at simplicity for this tutorial, we chose a few prominent components among the high number of possible apoptotic and proliferative factors that could be included in the PKN. Thus, Caspase 8, p53, and Caspase 9 represent the possible modes of activation of the apoptotic pathway, whereas the proliferative pathways are depicted by MYC and a generic node named Cell_cycle. For more information on this extension, see the Supplementary Materials.The resulting PKN is displayed in Figure
2. Complex crosstalks exist between pathways and are usually taken into account in the detailed model. For instance, the apoptotic regulation modeled in another work36 integrates three highly intertwined pathways activated by death receptors. For this tutorial, however, the network is kept simple to focus on core signaling cascades.
Translation into a logic model
As shown in Figure
1, the PKN, or interaction graph, can be directly studied with structural analysis, or translated into a logic model to extend the scope of the analysis. Structural analysis include topological analyses such as degree and centrality measures, pathway searches, feedback and feedforward loop identification, or characterization of minimal cut and intervention sets.68
Cytoscape
51 is a graph visualization and analysis tool that provides access to many of these methods. The translation of the interaction graph into a logic model allows formally studying how signaling information flows through the network and how perturbations of its components affect this flow.33In a logic model, nodes of the regulatory network are assigned a state: active/true or inactive/false in the case of Boolean models. A set of logic rules are assigned to the edges, which determine how the state of each node will be updated as a function of the state of the nodes that influence them. When a node has two upstream activators, if its activation depends on the simultaneous presence of its two upstream nodes, the two inputs will be linked by an AND gate. If its activation depends on either of the two upstream nodes, the two input nodes will be linked by an OR gate. Inhibition of a node by an upstream node is described with a NOT gate. Combining AND, OR, and NOT gates, we can express any logical function as a sum of products, also known as disjunctive normal form.In some cases, when a node has several upstream regulators, it may be unclear if they act in combination or alone. In other words, they may act through any combination of logic gates. In those cases, we can instantiate the different possible rules and use experimental data to find the appropriate representation of the interaction.47 It is thus possible to find an appropriate translation of the regulatory network to a logic model that can be used for simulation purposes.An example of a complex logic rule (Figure
2
b) in our case is the one regulating the “Survival” node. The upstream nodes for “Survival” are “Cell cycle,” “MYC,” “Caspase8,” and “Caspase9” nodes. To activate “Survival,” the “Cell cycle” node needs to be active, but the presence of either Caspase 8 or Caspase 9 can inhibit survival. However, if MYC is also active, both caspases need to be active in order to inhibit survival.This can be written in the following way:Survival = 1 if (Cell_cycle AND ((NOT Caspase8 AND NOT Caspase9) OR (MYC AND (NOT Caspase8 OR NOT Caspase9)))Altered variants (often referred as “mutants”) of the model can then be generated by replacing the logic rule defining the regulation of a component by a constant (0 or 1 in the Boolean case), to model inhibiting or activating perturbations, such as drug treatments or gene mutations.The Boolean version of logic modeling might be somewhat restrictive in terms of the scenarios it can represent due to its limitation to binary states. Extensions have been developed to overcome this limitation. In this tutorial we will show two of these alternatives with two selected tools: CellNOpt,49 which implements logic‐based ODEs, and MaBoSS,50 a tool for continuous time Boolean modeling.The logic‐based ODE approach deals with the conversion of logic functions that only accept binary inputs to continuous functions that replicate their behavior. By allowing continuous values for state and time, quantitative time course data can be matched. There are several approaches to build such ODEs,32, 52 and we use here the one of Wittman et al.32 Because the number of variables and parameters for a given system is much smaller than ODE systems derived from a mechanistic (biochemical) description, logic‐based ODE models can include a larger number of biological components. Therefore, logic‐based ODEs are particularly useful when dealing with: 1) a medium‐scale network, where only a qualitative biological knowledge of the system under investigation is available, which can be easily interpreted with the logic formalism, and 2) quantitative (and possibly time‐course) data resulting from experiments which can be well described with the ODE formalism. The software CellNOpt
49 provides the necessary tools to transform a logic model into a logic‐based ODE model and find a set of good parameters by fitting the model to the available data.MaBoSS is a C++ software that simulates continuous time Markov processes on a Boolean network.50 In contrast to deterministic continuous approaches, such as logic‐based ODEs, this formalism handles the asynchronous updates of the states of the nodes in a stochastic way and generates a population of trajectories as sequences of Boolean states. Transition rates can be associated with each node, and probabilities of network states can be estimated given a set of initial probabilities. The main result consists of the Boolean state distribution after reaching an asymptotic behavior, and the implicit time provided by sequence steps allows considering transient dynamics. The size of the population of trajectories can be adapted to the scale of the network, under the control of probability errors, which makes MaBoSS useful even for very large networks.In the next section, we take the regulatory network built previously, with the logic gates already selected based on literature information, and we use some of the available experimental data to find the appropriate weights for an ODE version of the logic model. These weights will also be used as some of the transition rates of the continuous time Markov process model.
Training with data
The purpose of using data to train a network is to obtain a better representation of our system, since information available in the literature and databases may be incomplete, and specify a generic model to a particular biological system. For example, different cell types may share the same signaling networks but with some differences in their wiring, and this may not be reported in the literature. That way, networks can be refined to be cell line‐ or context‐specific by comparing model simulations with experimental data and refining model parameters until simulations are able to reproduce the measurements. This process is named “model training” and corresponds to solving an optimization problem where an objective function that captures how well the model can describe the data is minimized. This optimization can be performed automatically using different algorithms.The most useful data to train a model are measurements of the states of model variables under different conditions, if possible tracing their dynamic behavior. In networks, nodes typically represent proteins or complexes and a commonly used measurable proxy of their activation or deactivation are posttranslational modifications such as phosphorylation. Perturbations of the system by stimulating the pathway with ligands and/or inhibiting the proteins with targeted drugs or antibodies provide information about the dynamics, which is not available from static data.Protein phosphorylation upon perturbation is commonly measured using mass‐spectrometry or antibody‐based techniques. For a detailed description of the advantages and disadvantages of these techniques, we refer to Saez‐Rodrigues et al.11 Briefly, with antibody‐based techniques, only a maximum of a few dozen phosphorylation sites can be measured at the same time. Additionally, antibodies are selected to be specific for a phosphosite, which implies making assumptions on which the phosphorylation site represents the activity of each node. With mass‐spectrometry techniques, this issue is overcome by the possibility of measuring thousands of phosphorylation sites at the same time. However, the application of mass‐spectrometry to the investigation of dynamic signaling pathways has so far been limited by the relatively low number of experimental conditions, due to the expensive and laborious nature of this technology when compared with antibody‐based techniques. Accordingly, modeling efforts with this type of data have been scarce,69 but this is likely to change in the near future with the rapid development of the technology.70In the data considered for our example,57 the LNCaPprostate cancer cell line was perturbed with combinations of ligands (EGF, IGF1, IL6, TNF, DHT; in green in Figure
2) and kinase inhibitors targeting nodes in the network (PI3K, MEK, IKK, mTOR, p38; in red in Figure
2) for a total of 44 different conditions. The study also included data from perturbation with docetaxel, which we did not consider, as little variation in the phosphoproteome was reported in the docetaxel condition as compared to controls. Docetaxel is known to target β‐tubulin, which is related to the cytoskeleton and has no clear effect on our network of interest. However, it is known that cytotoxic treatments can also rewire cell signaling networks.71 More measurements in the docetaxel condition could allow studying cell signaling rewiring with logic models through network optimization.Data were measured at three timepoints (30 min, 4 h, and 24 h) using an antibody‐based technique to measure key phosphosites of eight proteins in the network (AKT, RPS6, GSK3, ERK, AKT, p38, JNK, HSP27, Stat3; in blue in Figure
2). Since the antibody used for GSK3 measures an inhibitory site, the sign of the regulatory interactions to and from GSK3 were inverted. We chose to consider only the first two timepoints (30 min and 4 h), as signaling through phosphorylation changes works on a fast time scale (affecting a wide part of the cell on the order of minutes) and is expected to reach a semisteady‐state within the first few hours. Considering a longer time scale might also require taking into account slower effects, such as transcription regulation, that would lead to changes in the levels of proteins and thereby to a rewiring of the network. We use these data to train cell line‐ and context‐specific logic model, as detailed in the next paragraphs.
Training the logic‐based ODE model with CellNOpt
First, the PKN is interpreted as a logic‐based ODE as described in Wittmann et al.32 and implemented in the R package CellNOptR.49 The dynamic of each node is described by an ODE as a function of its regulators. For example, for the rate of change of the variable representing AKT activity the corresponding ODE is:
where parameter τ represents the life‐time of the node AKT. When τ = 0, dAKT/dt will also be = 0, meaning that AKT remains at the basal value, i.e., its initial condition. The transfer function f(PI3K, k represents the regulation exerted by PI3K, and is a monotonically increasing sigmoid in the variable PI3K (in the range {0,1}). The increase rate depends on the parameter k defining the strength of the regulatory interaction. When k the dynamic of AKT is independent from PI3K.20Second, some of the nodes and edges in the logic model are not identifiable (i.e., corresponding parameters cannot be precisely estimated) based on the available experimental data and are automatically compressed, as described in Saez‐Rodriguez et al.47 and implemented in CellNOptR,49 in order to reduce the size of the model. These include the so‐called noncontrollable elements, which are the ones that are not affected by any of the perturbed or measured species (e.g., the node “Stress” and its regulation to Rac) and the nonobservable elements, which are the ones downstream of all the measured or perturbed nodes (e.g., the node “Caspase8” and its regulation by TNFR).Third, data are stored using the MIDAS format72 (see Supplementary Materials) and then normalized between 0 and 1 to be in the same range of model simulations. Here, the normalization is performed separately for each measured species (different antibodies can have different affinities and corresponding data should therefore in general be normalized separately), by computing the log2 of the fold change of each perturbed condition with respect to the basal (unperturbed) state, and then linearly scaling the resulting values between 0 and 1, with 0.5 corresponding to the basal state (i.e., the 0 in the log2 fold change). In this way, all initial conditions are set to 0.5 (which is the basal state, or the state at time 0) and data at time 30 min and 4 h can remain at the unperturbed state or show an increased or a decreased activity. More complex normalization procedures, involving saturation effects, are implemented in CellNOptR and can be used, for example, in the case of strong outliers in the data, which would otherwise mask smaller effects.Fourth, the model is trained to the experimental data by looking for the parameter set which minimizes the discrepancy between model simulation and experimental data in terms of sum of squares of the difference between measured and simulated data (namely, RSS, residual sum of squares). We assume that the wave of activation of signaling pathways upon stimulation reaches semisteady‐state within the first few hours. Hence, an additional term (SSpenalty) is included to penalize simulations that do not reach steady‐state within the time range observed in the experimental data. The resulting optimized cost function (Q) can be schematically written as:This optimization problem (code available as Supplementary Materials) is solved using a global population‐based optimization method based on enhanced scatter search, as implemented in the MEIGO software.73 As the optimization problem is nonconvex, we solved the problem 10 times with different initial random guesses for the parameter values in order to further reduce our chances of ending the solution process in a bad local optimum (the local minima reached were actually very similar; coefficient of variation across the 10 runs = 0.009).Fifth, we consider as the best model the one that better fits the experimental data in terms of minimum RSS across the 10 runs (Figure
3
a) and verify that experimental data are well described (Figure
3
b) (best model is available as Supplementary Materials). If the fit is not good, the initial PKN has to be refined, as there might be missing interactions that are supported by the experimental data. This refinement can be performed with a revision of information from literature or databases, or using a combination of data‐driven approaches.74 After refinement of the PKN, model training must be repeated. In our case, the fit is good, as shown in Figure
3
b,c, where measured values are plotted against model simulations showing a good agreement for most of the measured conditions. Importantly, the model is able to represent in the simulations most of the conditions where data show a strong increase or decrease with respect to the basal value (i.e., 0.5).
Figure 3
(a) Optimized model with node and edge parameter values represented in grayscale. Dotted lines correspond to compressed nodes and edges which are removed before training the model, as not identifiable from the experimental data. (b) Top panels show four examples of fit of optimal model simulation to experimental values. For each measured phosphoprotein in each experimental condition, color scale is used to represent the mean squared error (MSE). (c) Scatterplot of simulations using the optimal model with respect to experimental data, showing good correlation. (d) Comparison of best model with the results of model optimization after bootstrap (repeated 300 times), network randomization (100 times), and data randomization (300 times) using different scoring metrics, i.e., MSE, coefficient of determination (COD), Pearson correlation (r).
(a) Optimized model with node and edge parameter values represented in grayscale. Dotted lines correspond to compressed nodes and edges which are removed before training the model, as not identifiable from the experimental data. (b) Top panels show four examples of fit of optimal model simulation to experimental values. For each measured phosphoprotein in each experimental condition, color scale is used to represent the mean squared error (MSE). (c) Scatterplot of simulations using the optimal model with respect to experimental data, showing good correlation. (d) Comparison of best model with the results of model optimization after bootstrap (repeated 300 times), network randomization (100 times), and data randomization (300 times) using different scoring metrics, i.e., MSE, coefficient of determination (COD), Pearson correlation (r).Additionally, we performed bootstrapping to assess the variability of the optimal model when the optimization is repeated (300 times), using for training the data resampled with replacement. In order to assess the statistical significance of the trained models, we also repeated the optimization in two types of randomized conditions: 1) data randomization (data shuffling across all timepoints, measured species and conditions), while keeping as scaffold the network derived from prior knowledge (repeated 300 times); 2) network randomization (using BiRewire
75 to preserve network properties), while maintaining for training the measured data (repeated 100 times). As shown in Figure
3
d, both randomizations showed a significant (P‐values, one‐sided t‐test <10−26) decrease in performance when compared to the bootstrapped distribution or to the best model, regardless of the metric used for comparison (mean squared error, MSE = RSS/N with N size of the training data; coefficient of determination, COD = 1‐SSres/SStot, where SSres is the residual sum of squares and SStot is the total sum of squares; Pearson correlation, r). Thus, these results prove that experimental data and the prior knowledge network are indeed informative and that our best model performs significantly better than random. The results of bootstrap analysis can also be used to assess the variability of the optimized parameters, highlighting which parameters or parts of the network are not well constrained, and possibly suggesting new targeted experiments which would better constrain the problem. Other approaches to address this problem consist in the use of regularization techniques to induce sparsity of the network, thus improving the identifiability of essential parameters.20After assessing the quality of our optimized model, we can use it for further analyses. One thing we observe is that IGF1‐R, TNFR, and IKK are associated with null τ parameters, which means that they become independent from their regulator activities. The system therefore does not depend on the inputs IGF‐1 and TNFα. Moreover, IL6‐R is associated with a very small value for τ, suggesting that the dynamics of the system will predominantly depend on pathways activated by EGF.
Simulation of a logical model in different conditions
We have just shown how it is possible to use experimental data to train and refine our PKN modeled using logic‐based ODEs. Some nodes and interactions, however, were omitted because no data were available to determine their underlying parameters. In this section, we will use continuous time Markov processes on a Boolean network to predict the effect of perturbations in a semiquantitative manner, including those that we could not include in the logic‐based ODE model. In particular, we will focus on the probability of survival that can be predicted and compared to experimental observations.We use the software MaBoSS
50 to compute time trajectories in this graph: continuous time Markov processes generate a population of sequences (1,000 in this case) of asynchronous transitions between states of the system. Such stochastic simulations can exploit continuous values in addition to the Boolean formalism: the proportion of trajectories found in a state represents the probability of reaching this state, and converges toward an asymptotic value. Initial probabilities of inputs (values between 0 and 1) represent environmental conditions and ligand treatments. The relative probability needed to reach the activated output node “Survival” in a perturbed condition compared to the unperturbed condition can be related to the relative proportion of surviving cells after treatment compared to the control experiment.Moreover, as previously mentioned, MaBoSS allows us to associate transition rates with each variable activation or inactivation. Although we keep the inactivation transition rates to the default value of 1, we can assign the values given by the parameters τ obtained in the optimized logic‐based ODE model to the transition rates associated with node activations. Interactions associated with a null strength, an extreme situation, can be removed from the model: it is the case for the activation of JNK by Rac and the activation of AR by AKT whose strengths are close to 0 (k = 6.7e‐05, k = 8.4e‐04). Finally, treatment with targeted inhibitors can be encoded as alterations of the model variables. In the next section, these alterations will be refined to take into account dose–response.To validate the model, we first aim at verifying the main experimental observations reported in Lescarbeau and Kaplan57 on the global behaviors of LNCaP cells across the different treatment conditions,57 indicated on the first row of Table
1. For this purpose, the model is simulated until an asymptotic state is reached. Initial states are random for all nodes, including inputs, with the exception of “Stress,” which is not related to the treatments and is therefore kept at the value of 0. We introduce several perturbations to the model to reproduce the effect of drug treatments. These simulations are explained below. Figure
4 details the simulation outputs for the unperturbed model (control condition) and the final survival probabilities in different conditions are summarized in Table
1. MaBoSS also computes an error associated with each probability, which remains smaller than 0.016 in each condition, showing that the differences between results in different conditions are statistically significant. Detailed simulations with probability errors are given in the Supplementary Materials.
Table 1
Comparison between experimental data and model simulations for both untrained and trained model
Control
DHT=1
DHT=1 and PI3Ki
PI3Ki
mTORi
Androgen depleted (DHT=0)
DHT=0 and PI3Ki
Experimental observations
1.00
1.38
0.75
0.60
0.78
0.50
NA
Survival probability, untrained model
0.86
0.86
0.61
0.61
0.59
0.74
0.59
Survival probability, trained model
0.68
0.86
0.63
0.33
0.53
0.53
0.06
First row: proportion of surviving cells measured in the data57 (NA, not available). Second row: Survival probabilities for simulations in different conditions, with random inputs, with the untrained model (model derived from PKN). Third row: Survival probabilities for the same simulations with the trained model.
Figure 4
Outputs of MaBoSS simulations with random initial states. (a) Time trajectories of unperturbed model (WT) or model treated with PI3K inhibitor (iPI3K) and mTOR inhibitor (imTOR), with arbitrary time units. (b) Barplot of final state distribution for the unperturbed model. The probability of seven final model states are shown (Caspase 8‐Myc state means that the two variables are present, all the others are 0).
Outputs of MaBoSS simulations with random initial states. (a) Time trajectories of unperturbed model (WT) or model treated with PI3K inhibitor (iPI3K) and mTOR inhibitor (imTOR), with arbitrary time units. (b) Barplot of final state distribution for the unperturbed model. The probability of seven final model states are shown (Caspase 8‐Myc state means that the two variables are present, all the others are 0).Comparison between experimental data and model simulations for both untrained and trained modelFirst row: proportion of surviving cells measured in the data57 (NA, not available). Second row: Survival probabilities for simulations in different conditions, with random inputs, with the untrained model (model derived from PKN). Third row: Survival probabilities for the same simulations with the trained model.Androgen treatment with DHT was experimentally shown to cause a 38% increase in survival as compared to the control condition. This survival advantage was essentially abrogated when treated in combination with the PI3K inhibitor: the slightly decreased survival caused by PI3K inhibitor treatment was only increased by 25% with the addition of DHT. To show that the model is able to reproduce experimental results, we run simulations with an initial probability of 0.5 for the control, or 1 for the DHT input, and with or without an inhibition of PI3K. Initial values for other inputs are kept random. We compare the survival probability when the system has reached asymptotic solutions. We find a survival probability of 0.68 for the control condition (unperturbed trained model, random values for inputs including DHT), and an increase of probability in the condition DHT = 1 (probability of survival is 0.87). We also observe a decrease of survival probability with respect to control when the model is perturbed with a complete inhibition of PI3K (survival probability: 0.33). Consistent with experimental observations, the survival probability in this condition is slightly increased with the addition of DHT (survival probability: 0.63), but is not reestablished to the control value.Interestingly, performing the same simulations on the Boolean model derived from the prior knowledge network shows no survival advantage induced by DHT on the untrained model, suggesting that there is an advantage in training the model to experimental data.The decreased survival probability for the trained model with inhibited PI3K in the condition of random DHT (the probability of survival is 0.33) compared to the condition DHT = 1 (the probability of survival is 0.63) suggests a greater dependency of androgen‐independent survival to the PI3K pathway rather than the MAPK pathway in the trained model. This can be verified by simulating the model with inhibited PI3K in the condition DHT = 0. In that case the survival probability is very low (probability 0.06). Therefore, androgen‐independent survival is dependent on PI3K pathway, while androgen‐mediated growth also relies on the MAPK pathway.The experimentally observed effect of mTOR inhibitor treatment on survival was lower than that of PI3K inhibitor treatment, with a relative increase of cell survival between these two conditions of 31%. Our simulations with the trained model reproduce this trend, with a higher survival probability (0.53) predicted when the model is perturbed with an mTOR knockout. In contrast, inhibiting PI3K or mTOR has the same effect on survival probability with the untrained model.Finally, the relative survival measured in experiments in androgen‐depleted conditions was reported to represent more than 50% of the one in normal growth media condition (control).Here again, simulating the model with DHT = 0 confirms this observation, with a predicted survival probability of 0.53, 76% of the probability in the unperturbed model.We have seen in the previous section that although the PKN encompasses the different pathways discussed in the work by Lescarbeau and Kaplan,57 training the model on the experimental data points toward a predominant role for the pathways activated by EGFR. This finding is validated by the stochastic simulations of the trained model. Indeed, simulations in different conditions predict survival probabilities that reproduce the main dynamical observations reported on LNCaP cells.57
Simulations of drug treatments and application to patient data
Beyond the validation of mechanistic hypotheses, the model can be used to predict the effect of new inhibitions, which can suggest new drug treatments or combinations of drugs. The simulation of partial inhibitions that can be related to the dose–response data can be encoded with the introduction of new nodes inhibiting directly possible drug targets in the model. For instance, the inhibition of PI3K can be enforced with a new input node “Anti_PI3K,” which inhibits PI3K. The probability chosen for “Anti_PI3K” represents the strength of inhibition on PI3K. It allows us to assess the effect of low concentrations of inhibitors.We compare the survival probability predicted by simulations of the model with random input values when inhibiting each node of the model separately, with three levels of inhibition: 0.1, 0.5, and 1 (full inhibition). The results are displayed in Figure
5. As expected, the inhibitions of MYC, PI3K, or AKT have the most effect on survival for each inhibition level.
Figure 5
Probability of the node “Survival” predicted by the model for different node inhibitions. Survival probability in the control case (unperturbed model) is marked with a gray line.
Probability of the node “Survival” predicted by the model for different node inhibitions. Survival probability in the control case (unperturbed model) is marked with a gray line.We note that the inhibition of p38 is predicted to increase survival, consistent with experimental observations.57, 76, 77 However, the observation that Erk and Stat3 have an important role in LNCaP cells57 is not verified, as it has no influence on survival in the model. This suggests that the network should be further extended with a description of the effect of Stat3 on downstream proteins in order to account for this experimental observation.Beyond single drug predictions, predicting efficacy and specificity of drug treatments and drug combinations within specific tumor contexts and for individual patients is a major challenge, especially for diseases characterized by a high heterogeneity, such as prostate cancer.78 Multiple perturbations are easily introduced in logic models and can address this problem.As a possible first step in this direction, we suggest to systematically compare the effect between single and double perturbations.79 A drug–drug interaction denoting synergy or antagonism is found when the phenotypic quantitative effect of a double perturbation deviates from the effect predicted by the simple linear additive combination of single perturbations, and can suggest candidate targets for drug combinations. Each interaction is thus associated with a score:
where
and
are phenotype ϕ (here survival) fitness values (asymptotic probabilities) of single perturbations,
is the phenotype fitness of the double perturbation, and
is the phenotype fitness of the unperturbed model.The synergistic or antagonistic interactions associated with the highest score (absolute value higher than 2.3) are represented as a graph in Figure
6, and the table of synergistic scores is available in the Supplementary Materials.
Figure 6
Network of synergistic and antagonistic interactions computed for the trained model, with random initial conditions (except for Stress = 0), with Survival as quantitative phenotype. Red triangles represent gain of function alterations and green glyphs represent loss of function alterations. Edges between two alterations show that a combined alteration has a drastic decreasing (in blue) or increasing (in green) effect on the Survival probability when compared to single alterations.
Network of synergistic and antagonistic interactions computed for the trained model, with random initial conditions (except for Stress = 0), with Survival as quantitative phenotype. Red triangles represent gain of function alterations and green glyphs represent loss of function alterations. Edges between two alterations show that a combined alteration has a drastic decreasing (in blue) or increasing (in green) effect on the Survival probability when compared to single alterations.Blue interactions denote a negative effect on survival (synergy) and predict efficient drug combinations, while antagonistic interactions (in green) have a positive effect on survival and show possible resistance.The intensity of the color is proportional to the score value. Nonsymmetric interactions are represented by an arrow. In that case, the source node is responsible for the predominant effect: it is the most efficient single treatment in synergistic interactions, and it confers the resistance in antagonistic interactions.Computational predictions of drug–drug interactions is especially useful for larger networks, since they can be used to inform high‐throughput screenings, thereby decreasing the scope and hence cost of experiments.Here, the graph contains several synergistic interactions between loss‐of‐function perturbations on the Ras and PI3K pathways, which characterize their parallel influence on the survival probability and predict that combinations of drugs targeting both pathways have a major impact on cell survival.Moreover, such interactions can also be interpreted as genetic interactions between mutations (knockout or overexpression), associated with epistasis scores. Genetic interactions provide insights into relationships between different biological functions, and highlight mutation properties such as sensitivity or resistance predicted by the model. Patient genetic profiles can then be exploited to predict resistance mechanisms and identify personalized treatments. For example, Figure
6 shows that an activating mutation of RAS induces a sensitivity to p53 targeting. In contrast, a gain‐of‐function mutation of AKT provides a resistance to PI3K‐targeted treatment.
CONCLUSION
In this tutorial, we outlined a methodology using logic modeling to better understand and predict the way a system responds to different perturbations. The workflow we present is generic and flexible enough to be adapted to many different other cases. Once we have chosen a system to model, the first step is to build a network based on what is known about the system (PKN). Here, public databases and resources like Omnipath help to gather and relate known information. When appropriate experimental data are available, they can be used to refine the PKN. Tools like CellNOpt help us in this process. Finally, with a functional model, using tools like MaBoSS, we can simulate different cellular and experimental conditions and predict the effect of pharmacological interventions.In all these processes, the construction of the network is crucial in accurately predicting drug effects.Although not all the tools used here work under the same programming environment, the use of standard formats, in particular SBML‐qual,55 facilitate the communication between them. To get the most out of Omnipath, the use of its related Pypath python package is recommended. While the advanced features of CellNOpt are available in R (a wrapper for python is also available), MaBoSS is a command line software and Cytoscape is mainly a standalone application.To help with finding the correct network structures, experimental data coming from technology such as antibody‐based arrays and/or mass spectrometry is of invaluable help. But having access to these data is not always feasible. Both of these approaches require a high amount of cell material, which is not a problem when dealing with cell lines or patient material available in large amounts (e.g., blood samples), but becomes a limiting factor when only a small amount of cells are available, such as in the case of biopsies or resections from solid tumors. Approaches for functional ex vivo screening of patient samples are currently being developed and recent works focus in particular on the investigation of new fast reporters80 and microfluidic‐based technologies that exploit small volumes to test a large number of drug combinations.81 These methods are so far limited to the investigation of the activity of only one node in the network. On the other hand, single timepoint omics data measured on patient samples provide constraints on many nodes for model training, with mutations integrated as perturbations and expression data as initial conditions or steady state goals.While new technology is being developed to predict drug response in patients, cell lines and in silico systems are being successfully exploited to show how differential responses to drugs can derive from a different wiring of the signaling network and how systems pharmacology approaches provide useful tools for personalized medicine. In particular, a number of recent studies have successfully studied logic models to investigate signaling pathways and suggest effective drug combinations which were then validated in vitro and/or in vivo.18, 19, 20, 45 Mathematical models calibrated using cell lines have also been proved effective in predicting clinical patient outcomes.82Overall, network‐based models allow us to formalize this reasoning into a mechanistic computational model, and infer conclusions about a drug's effects from quantitative simulations in a principled manner. For this purpose, logic modeling is a useful approach to capture biological mechanisms in a simplified manner.23 While this simplicity can render models unable to accurately describe important molecular mechanisms,83 it allows us to model larger signaling networks than more detailed approaches such as reaction‐based differential equations. Due to this scalability, we expect logic modeling to become an increasingly used approach in systems pharmacology to gain valuable insights, powered by new developments in data acquisition techniques.
SUPPLEMENTARY MATERIALS
Supplementary materials can be found on GitHub at https://github.com/saezlab/CPT_QSPtutorial. Supplementary files include: model files, codes for CellNOptR and for pypath, results of analyses (logic‐based ODE parameters, MaBoSS simulations and genetic interactions), and a description of a step‐by‐step model extension using pypath.Supporting InformationClick here for additional data file.Supporting InformationClick here for additional data file.Supporting InformationClick here for additional data file.Supporting InformationClick here for additional data file.Supporting InformationClick here for additional data file.Supporting InformationClick here for additional data file.Supporting InformationClick here for additional data file.Supporting InformationClick here for additional data file.Supporting InformationClick here for additional data file.Supporting InformationClick here for additional data file.Supporting InformationClick here for additional data file.Supporting InformationClick here for additional data file.Supporting InformationClick here for additional data file.
Authors: Michael J Lee; Albert S Ye; Alexandra K Gardino; Anne Margriet Heijink; Peter K Sorger; Gavin MacBeath; Michael B Yaffe Journal: Cell Date: 2012-05-11 Impact factor: 41.582
Authors: Federica Eduati; Javier De Las Rivas; Barbara Di Camillo; Gianna Toffolo; Julio Saez-Rodriguez Journal: Bioinformatics Date: 2012-06-25 Impact factor: 6.937
Authors: Alessandro Di Cara; Abhishek Garg; Giovanni De Micheli; Ioannis Xenarios; Luis Mendoza Journal: BMC Bioinformatics Date: 2007-11-26 Impact factor: 3.169
Authors: David Croft; Antonio Fabregat Mundo; Robin Haw; Marija Milacic; Joel Weiser; Guanming Wu; Michael Caudy; Phani Garapati; Marc Gillespie; Maulik R Kamdar; Bijay Jassal; Steven Jupe; Lisa Matthews; Bruce May; Stanislav Palatnik; Karen Rothfels; Veronica Shamovsky; Heeyeon Song; Mark Williams; Ewan Birney; Henning Hermjakob; Lincoln Stein; Peter D'Eustachio Journal: Nucleic Acids Res Date: 2013-11-15 Impact factor: 16.971
Authors: Boris Aguilar; David L Gibbs; David J Reiss; Mark McConnell; Samuel A Danziger; Andrew Dervan; Matthew Trotter; Douglas Bassett; Robert Hershberg; Alexander V Ratushny; Ilya Shmulevich Journal: Gigascience Date: 2020-07-01 Impact factor: 6.524
Authors: Karim Azer; Chanchala D Kaddi; Jeffrey S Barrett; Jane P F Bai; Sean T McQuade; Nathaniel J Merrill; Benedetto Piccoli; Susana Neves-Zaph; Luca Marchetti; Rosario Lombardo; Silvia Parolo; Selva Rupa Christinal Immanuel; Nitin S Baliga Journal: Front Physiol Date: 2021-03-25 Impact factor: 4.566