Literature DB >> 31510648

Controlling large Boolean networks with single-step perturbations.

Alexis Baudin¹, Soumya Paul², Cui Su³, Jun Pang^2,3.

Abstract

MOTIVATION: The control of Boolean networks has traditionally focussed on strategies where the perturbations are applied to the nodes of the network for an extended period of time. In this work, we study if and how a Boolean network can be controlled by perturbing a minimal set of nodes for a single-step and letting the system evolve afterwards according to its original dynamics. More precisely, given a Boolean network (BN), we compute a minimal subset Cmin of the nodes such that BN can be driven from any initial state in an attractor to another 'desired' attractor by perturbing some or all of the nodes of Cmin for a single-step. Such kind of control is attractive for biological systems because they are less time consuming than the traditional strategies for control while also being financially more viable. However, due to the phenomenon of state-space explosion, computing such a minimal subset is computationally inefficient and an approach that deals with the entire network in one-go, does not scale well for large networks.
RESULTS: We develop a 'divide-and-conquer' approach by decomposing the network into smaller partitions, computing the minimal control on the projection of the attractors to these partitions and then composing the results to obtain Cmin for the whole network. We implement our method and test it on various real-life biological networks to demonstrate its applicability and efficiency. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Entities: CellLine Chemical Disease Gene Species

Mesh：

Year: 2019 PMID： 31510648 PMCID： PMC6612811 DOI： 10.1093/bioinformatics/btz371

Source DB: PubMed Journal: Bioinformatics ISSN： 1367-4803 Impact factor: 6.937

1 Introduction

In control theory, a dynamical system is controllable if, through an appropriate manipulation of a few parameters, it can be driven from any initial state to any desired final state within finite time. Although control theory is a mathematically highly developed branch of engineering with applications to electric circuits, manufacturing processes, communication systems, robots etc., fundamental questions pertaining to the controllability of complex biological networks have resisted rapid advances. The reasons for this are 3-fold. First, biological networks tend to be large with an exponential increase in combinatorial complexity with the addition of every parameter or interaction which in turn effects their controllability. This is often referred to as the ‘dimensionality problem’ (Hecker ). Secondly, such networks are highly non-linear with switch-like interactions between the components. It is unclear how the linear functions usually studied in traditional control theory could capture such dynamics (Tyson , 2003; Zañudo and Albert, 2015). And finally, the notion of controllability in biological systems is different from the classical definition of linear controllability. In such systems, rather than controlling single states, the control of collective dynamic behaviour may be more feasible (Wang ). The recent discoveries in cell reprogramming have rekindled the interest in the control of cellular behaviour and biological systems in general. Cell reprogramming is a way to change one cell phenotype to another, allowing tissue or neuron regeneration techniques. Current studies have shown that differentiated adult cells can be reprogrammed to an embryonic-like pluripotent state or directly to other types of adult cells without the need of intermediate reversion to a pluripotent state (Graf and Enver, 2009; Sol and Buckley, 2014). This has led to a surge in regenerative medicine and there is a growing need for the discovery of new and efficient methods for the control of cellular behaviour. Such medicines target specific proteins within the cellular systems aiming to drive it from any state to a desired phenotype. This motivates the question of identifying multiple drug targets using which the network can be ‘controlled’, i.e. driven from any state to any desired target. Furthermore, for the feasibility of the synthesis of such drugs, the number of such targets should be minimized. However, as already mentioned, biological networks are intrinsically large (number of components, parameters, interactions, etc.) which results in an exponentially increasing number of potential drug target combination making a purely experimental approach quickly infeasible. This reinforces the need for mathematical modelling and efficient computational techniques. Boolean networks (BNs), first introduced by Kauffman (1969), are a popular and well-established framework for modelling gene regulatory networks and their associated signalling pathways. Its main advantage is that it is simple and is yet able to capture the important dynamical properties of the system under study (D’haeseleer ), thus facilitating the modelling of large biological systems as a whole. The BN is assumed to evolve dynamically by moving from one state to the next governed by a Boolean function for each of its components. The steady state behaviour of a BN is given by its subset of states called attractors to one of which the dynamics eventually settles down. In biological context, attractors are hypothesized to characterize cellular phenotypes (Kauffman, 1969) and also correspond to functional cellular states such as proliferation, apoptosis, differentiation, etc. (Huang, 2001). The control of a BN therefore refers to the reprogramming/changing of the parameters of the BN (functions, values of variables, etc.) so that its dynamics eventually reaches a desired attractor or steady state. The control of linear networks is a well-studied problem (Kalman, 1963) and such control strategies have been proposed over the years. Recent work on network controllability has shown that the control and reprogramming of intercellular networks can be achieved by a small number of control targets (Kim ). The control of such networks can have two objectives: to drive the dynamics to (i) a single desired target attractor of the network irrespective of the current state. We shall call such a control target control or TC, (ii) any attractor of the network irrespective of the current state. We shall call this type of control full control or FC. Now, biological networks (both intracellular and intercellular) are intrinsically non-linear and the strategies developed for the control of linear networks do not directly apply to these networks. Moreover, networks with non-linear dynamics are arguably more complex with many feed-forward and feedback loops for both activation and inhibition. This might explain why there has not been a lot of work on the control problem for non-linear networks. For the target control problem, Kim developed a method to identify the so-called ‘control kernel’, which is a minimal set of nodes for driving a synchronous BN into a desired attractor. Their method is based on the construction of the full state transition graph of the network and as such does not scale well for large networks. Zhao developed a network graph aggregation approach to control synchronous BNs. These two methods, however, are not applicable for asynchronous BNs. For the control of asynchronous BNs, Zañudo and Albert (2015) developed an extended period control method to identify a set of nodes based on the ‘stable motifs’ (SM) of the network to drive the network towards a desired target attractor. For the problem of full control, Fiedler ; Mochizuki ; Zañudo ) developed a method for controlling networks, whose dynamics are governed by ordinary differential equations (ODEs) by computing the feedback vertex set (FVS) of the corresponding dependency graph. It is however unclear how their method can be lifted to the discrete switch-like dynamics of BNs. The control strategies in the above and most of the methods studied in the literature have one thing in common—the perturbation is applied continuously for an extended period of time. However, there are and can be obvious drawbacks to such a strategy. For example, the concentration of the complexes (drugs, viruses, etc.) applied for the perturbations might fall below the requisite threshold over time in which case it needs to be administered again and again to maintain appropriate levels. For example, half-life or decay rates exist for almost any substance that is ever added to cells—whether it is a drug or a nutrient or a virus—and there will be degradation due to temperature, evaporation, depletion by the cells. etc. This is discussed, for example in Michels and Frei (2013) where they mention the decay of ascorbate in cell culture medium. For the case of adding virus to cells, the depletion of active virus in the cell culture dish happens relatively fast and can therefore be a limiting factor for inserting a potential gene (say) into the target cells via the virus. This typical issue of low transduction efficiency is often counteracted by adding the virus to the cell repeatedly (e.g. see Zhu ). Such repeated administration of the virus is also called for when the experimenter wants to target multiple cells instead of just one (Charrier ; Hofherr ). The phenomenon also occurs when inserting smaller copies of gene into the cell without integrating it into the genome, which does not require the help of a virus. Even in such cases, the experimenter has to try to add the gene-copies repeatedly since the genes are not attached to the cell’s genome (Cervera ). The repeated addition of the complexes to the cell thus requires constant monitoring of the system over an extended period of time. Furthermore, the complexes themselves are difficult and expensive to acquire prohibiting their extensive use. Thus a more short-term control strategy might be well suited for biological networks (Cornelius ). In this work we explore such a control strategy where the perturbation is applied for a single time-step (read instantaneously) and the system is left to evolve on its own, according to its original dynamics. For both versions of the control problem, TC and FC, we develop a method to identify an exact minimal set of nodes of a given Boolean network BN, such that the above controls can be achieved by perturbing some of the nodes in . Such short-term control strategies have been studied in the literature (Cornelius ), where a control method based on simulations for large networks has been proposed. Although the ideas presented in Cornelius are quite relevant to those we use here, their methods do not directly compare to the ones that we develop in this work. Indeed, since first, they deal with ODE networks, and not Boolean networks. And secondly, since there does not yet exist ways to compute the basins of attractions of ODE networks, their method is based on simulations where the search is automatically terminated if the system is not controlled within a sufficiently large number of iterations. On the other hand, we can indeed compute efficiently the attractors and basins of BNs using methods developed in-house, and hence can compute the ‘exact’ minimal control for a given BN. It is well-known that the precise identification of control parameters and control strategies of non-linear networks must exploit both their structural and dynamic properties (Gates and Rocha, 2016). This rules out purely structure-based methods for identifying the exact control subset, like that of Liu , which has been shown to either overshoot or undershoot the control subset for different networks (Gates and Rocha, 2016). The dynamics of a Boolean network BN is given in terms of its transition system, which as we already observed is exponential in the size of BN itself. Any non-simulation-based algorithm that purely exploits this dynamics by working on entire BN in one-go has to, in principle, work with the full transition system, and thus has limited scalability. As the BN grows in size, the number of possible behaviours (traces) grows exponentially with it (state-space explosion). This means that even any simulation-based algorithm has to deal with a very large number of traces to preserve their guaranteed accuracy. This, in turn, limits their efficiency as well. Our algorithm takes the approach of ‘divide-and-conquer’ whereby it decomposes the network into smaller partitions, computes the minimal set of control nodes in each of these partitions and then composes the results to obtain the set for the whole network. While doing the composition, the algorithm crucially needs to check whether there exist subsets of that can be perturbed in the starting state that results in a state that belongs to the ‘basin of attraction’ of the target attractor(s), from which there only exist paths towards the target attractor and there is no path leading to any other attractor of the network. We therefore assume that the algorithm is able to call the efficient procedure developed in Paul to compute the basin of attraction of an attractor of BN. It is worth noting that our algorithm always computes an exact minimal set of control nodes. We have implemented our algorithm and tested it on a variety of real-life biological networks modelled as BNs. We also compared our results with the existing approaches for the control of non-linear networks. For TC, we compared our method with the stable motifs based control (SM) of Zañudo and Albert (2015) and for FC, we evaluate its performance without any comparison as, to the best of our knowledge, no method for the full control of asynchronous BNs exists in the literature. Our findings can be summarized as follows: For TC, our method outperforms the SM based method in terms of efficiency (for almost all the networks). For FC, our method can compute the minimal full control set efficiently. The advantage of our method is that we give the exact strategy to be applied for the control given any source state and any target attractor. We particularly note that, even for very large networks, the subset of control nodes identified for both control strategies forms a relatively small set which is a desirable property for the control of such networks.

2 Background and notations

Let where . A Boolean network is a tuple where such that each x is a Boolean variable and is a tuple of Boolean functions over . In what follows, i will always range over N, unless stated otherwise. A Boolean network may be viewed as a directed graph , called the dependency graph of BN, where is the set of vertices or nodes (intuitively, v corresponds to the variable x for all i) and for every , there is a directed edge from v to v, often denoted as , if and only if f depends on x. Thus V is ordered according to the ordering of . The structure of BN refers to the structure of its dependency graph. For any vertex , we let be the index of v in this ordering. For any subset W of V, . For the rest of the exposition, we assume an arbitrary but fixed network BN of n variables is given to us and is its associated dependency graph. A states of BN is an element in . Let S be the set of states of BN. For any state , and for every i, the value of s, often denoted as , represents the value that the variable x takes when the BN ‘is in state s’. For some i, suppose f depends on . Then will denote the value . For two states , the Hamming distance between s and s′ will be denoted as and will denote the set of indices in which s and s′ differ. For a state s and a subset , the Hamming distance between s and S′ is defined as . We let denote the set of subsets of N such that if and only if I is a set of indices of the variables that realize . The behaviour of BN is captured by its evolution dynamics which is defined as follows. Initially, BN is in a state and its state changes in every discrete time-step according to the update functions f. In this work, we shall be exclusively concerned with the asynchronous updating scheme but all our results transfer to the synchronous updating scheme as well. Suppose is an initial state of BN. The asynchronous evolution of BN is a function such that and for every , if then , is a possible next state of s, if and only if either and where or and there exists i such that . Note that the asynchronous dynamics is non-deterministic. The dynamics of a Boolean network can be represented as a state transition graph or a transition system (TS). The transition system of BN, denoted by the generic notation TS is a tuple where the vertices are the set of states S and for any two states s and s′ there is a directed edge from s to s′, denoted , if and only if s′ is a possible next state of s. A path from a state s to a state s′ is a (possibly empty) sequence of transitions from s to s′ in TS. A path from a state s to a subset S′ of S is a path from s to any state . For a state denotes the set of states s′ such that there is a path from s to s′ in TS. An attractor A of TS (or of BN) is a minimal subset of states of S such that for every . A state which is not part of an attractor is a transient state. An attractor A of TS is said to be reachable from a state s if . Attractors represent the stable behaviour of the BN according to the dynamics. For an attractor A of TS, the basin of attraction of A, denoted , is a subset of states of S such that if and for any attractor of BN. A control C is a (possibly empty) subset of N. For a state , the application ofC to s, denoted , is defined as the state such that if and otherwise. Henceforth, we shall drop the subscripts TS when no ambiguity arises. Control problems. Let BN be a given Boolean network, S be the set of states of BN and be the set of all its attractors. We are interested in the following kinds of control on BN. Note that for us, the control is applied in a single time-step (hence simultaneously) to the current state s under consideration and the system is let to evolve as per its original dynamics afterwards. Source-target control (STC): Let and let be a target attractor, A control is an STC for s and A if, after the application of to s, BN eventually reaches A. Target control (TC): Let be a target attractor. A control is a TC for A if for any attractor , and for any state , there exists a subset of such that is an STC of s for A. Full control (FC): A control C is an FC for BN if for any pair of attractors , and for any state , there exists a subset of C such that is an STC of s for A. Given the above kinds of control, we are interested in the following control problems on a non-linear, asynchronous BN: min- Given BN, a source state s and a target attractor , find a minimal STC. Such an STC will be called a min-STC and denoted as . min- Given BN, and a target attractor , find a minimal TC. Such a TC will be called a min-TC and denoted as . min-FC problem: Given BN and the set of attractors , find a minimal FC for BN. Such a control will be called a min-FC and denoted as . In Paul , we developed a decomposition-based approach for the efficient solution to the min-STC problem [item (1) above] for large BNs exploiting both their structure and dynamics. We showed that the efficient computation of the minimal control given a target attractor A boils down to the efficient computation of the basin, of A. We therefore developed an algorithm for the computation of by decomposing the BN into connected components called blocks, computing the local basins of the projections of A to each of these blocks and then eventually merging these local basins to obtain . We demonstrated both efficiency and effectiveness of our approach on different real-life biological networks. In this work, we shall target the control problems (2) and (3) listed above. Note that for control problem (3), we assume that the set of attractors of BN is already given to us. If however, is not known, we first need to compute from BN for which we have already developed and implemented efficient procedures (see e.g. Mizera , 2018; Yuan ). In the algorithms that we develop here, we shall use the procedure to compute the basin of a given attractor A of a given Boolean network developed in Paul and shall refer to it as Compute_Basin(A).

3 Results

Towards the solution of control problems 2 and 3 above, we first define a generic control problem which we call the Minimal All-Pairs Control. Minimal All-Pairs Control (min- Let be the set of all attractors of BN and let be subsets of attractors, called source and target attractors respectively. A control is an APC for and if for any pair of attractors and any state , there exists such that is an STC of s for A. An APC which is minimal is called a min-APC and is denoted as . The min-APC problem is then: given and , find a min-APC. The control problems min-TC and min-FC are special cases of the min-APC problem when is a singleton and when , respectively. We first observe that the min-APC problem is computationally at least as hard as the min-STC problem. Indeed, since the min-STC problem for a source state s and a target attractor A, where s is a fixpoint attractor, is a special case of the min-APC problem where and . Since min-STC is already hard for PSPACE (Mandon ; Paul ), efficient solutions for min-APC are highly unlikely. To gain an intuition into the problem, suppose all the attractors in are singleton states (fixed points). Suppose, is a source attractor and is a target attractor. It can be easily observed that the BN eventually and surely reaches A following the update dynamics, after a control C is applied to s, if and only if (Paul ). Also, for any state , the number of nodes to perturb to move from s to t is and these nodes are given as . So, let M be a matrix such that for every pair of attractors and , the (A, A)th entry of is a set of subsets of N such that for any subset if and only if there exists such that . is then a minimal subset of N such that there exists a subset of in for every pair of attractors and . The following example illustrates the problem in details. Example 1. Consider a Boolean networkwhereandwhereand. The dependency graph of BN and its TS is shown inWe suppress the self loops present in each of the states of the TS to avoid clutter. It has 3 single-state attractorsshown as dark grey nodes, where. The basins of attractions of the respective attractors are shown as shaded grey regions.

Fig. 1.

(a) Boolean functions, (b) dependency graph and (c) TS for Example 1. The basins of attractions of the respective attractors are shown as shaded grey regions

(a) Boolean functions, (b) dependency graph and (c) TS for Example 1. The basins of attractions of the respective attractors are shown as shaded grey regions shows the matrix M that notes the indices of the variables that need to be changed to move from an attractor Ato the basin of another attractor A. From M we see that both the sets {1, 2, 3} and {1, 3, 4} are min-FCs. However, {2, 3, 4}, for example, is not a min-FC since it is not possible to move to the basin of A4.

Table 1.

The matrix showing the indices to be controlled for pairs of attractors

	A ₁	A ₂	A ₃
A ₁	∅	{{3}, {2, 3}, {3, 4}, {2, 3, 4}}	{{1}, {1, 2}, {1, 3}, {1, 4}, {1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {1, 2, 3, 4}}
A ₂	{{3}, {3, 4}, {2, 3, 4}}	∅	{{1}, {1, 2}, {1, 3}, {1, 4}, {1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {1, 2, 3, 4}}
A ₃	{{1, 2, 3}, {1, 3, 4}, {1, 2, 3, 4}}	{{1}, {1, 2}, {1, 4}, {1, 2, 4}}	∅

The matrix showing the indices to be controlled for pairs of attractors We propose an algorithm based on the approach of ‘divide-and-conquer’ wherein we decompose the network into smaller partitions and solve the min-APC problem on these partitions. We then combine the results to obtain the control set for the entire network. We show that using such an approach, we can solve the problem on large Boolean networks arising from real-life biological systems much more efficiently compared with a global approach that works on the entire network in a single go. Towards that, we first need the notion of projection of a state to a subset of nodes of BN. All-pairs control Let be a subset of V, the projection of s to , denoted is an element of defined as . The projection operation is lifted to a subset S′ of S as . A decomposition of BN is defined as a partitioning of V. Each will be called a partition of BN. For any attractor and for any partition V, and are well-defined. Given sets of source and target attractors and , respectively, and a partition V, is an APC on V if it satisfies the all-pairs control properties on BN projected to V. That is, for all and implies for all , there exists such that . The idea of the algorithm is based on the following proposition. Proposition 1. Letandbe sets of source and targets attractors of BN and letbe a decomposition of V. Ifis a min-APC of BN thenis a min-APC on partition Vj for all. Furthermore, . Proof. Suppose that is an APC of BN. Then by definition, for every pair of attractors and , and for all , there exists such that . This implies, for every partition V, . Now, it must hold that . Thus, by definition, is an APC on V. Moreover, since the partitions are mutually disjoint, we have . Next, suppose is also a minimal APC of BN but there exist some such that is not a minimal APC on . Let be a minimal APC on such that . Then, from above, we have that there is another control which is a minimal APC of BN and , since the partitions are mutually disjoint. But this contradicts the minimality of .

3.1 Main algorithm

We now describe our Algorithm 1, to solve the min-APC problem. The algorithm takes as input the functions of a Boolean network BN, sets of source and target attractors and and the size m of partitions that BN will be decomposed into and works as follows. It first computes and stores the basins of attractors of the attractors in using the procedure Compute_Basin developed in Paul . It randomly decomposes BN into partitions each of size at most m (line 2 of Algorithm 1). For each partition V it computes the set of min-APCs, , on V using the helper function Min_Control (line 4). Let where . By Proposition 1, we know that the size of a min-APC, for BN is at least r. The algorithm chooses one min-APC from each partition and checks if their union is a valid APC on the entire network BN by using the helper function Is_Control which queries the basins of attractions of the target attractors already computed. This is done in lines 8–14. If it cannot find an APC of size r, it increases the value of r by 1 and repeats the process: for each partition V it computes the set of APCs of the next larger size on V using the helper function Fixed_Control (line 18). It checks if there is a union of APC from each of the partitions the sizes of which sum to the new value of r and such that it forms a valid APC on BN. It repeats this process each time increasing the value of r by 1 till it finds a min-APC for BN, (lines 15–20). The correctness of the algorithm is therefore trivially guaranteed. We next describe the procedures Min_Control, Is_Control and Fixed_Control (Algorithm 2) used in Algorithm 1. We assume that the basins for all the attractors in has been computed using the procedure Compute_Basin developed in Paul and stored in an appropriate global data structure and can be accessed by all these procedures. For will denote the basin of A as computed using Compute_Basin. Algorithm 2. Helper functions Min_Control takes as input the description of the Boolean network, the sets of the source and the target attractors and a partition V and it returns the min-APCs on partition V. To do that it first computes the projection to V of every state for every and of every . Then for i from 0 to , it checks if any subset C of of size i satisfies the APC properties on V. That is, if for all and implies for all . It returns all such subsets of size i (for the lowest value of i) and exits. The procedure Fixed_Control is similar to Min_Control except that it returns an APC on V of size if it exists. Otherwise, it returns the empty set. Is_Control checks if the given subset C is indeed an APC for and . It does so by verifying if for all and all and for all , there exists a subset of C such that . As explained in Section 3, TC and FC are special cases of the APC problem. Thus, we compute and with Algorithm 1 by setting and , respectively. We explain the working of Algorithm 1 here with a representative example. Example 2. Continuing with the Boolean network of Example 1, suppose now that we divide the vertices V of , and. The projections to these partitions of the attractors inand their respective basins are given in

Table 2.

The projections of the attractors and basins to V1 and V2

V1={v1,v2}		V2={v3,v4}
Attractor	Basin	Attractor	Basin
00	00, 01	00	00, 01
00	00, 01	11	11, 10
11	11, 10	11	11,10,01,00

The projections of the attractors and basins to V1 and V2 The algorithm works as follows. In Step 1, it computes the min-APC sets for the projections to the partitions Vand, respectively. Combiningandwe get {1, 3} but the check Is_Control returns that {1, 3} is not a valid full control for the whole network. So the algorithm moves to Step 2, where it looks for controls of size 3. For that it needs to find APCs of size 2 in the projections to each of the partitions Vandcomputed in Step 1, to find a control for the whole network. TheAPCs of size 2 that it finds for the two partitions in Step 2 areand. Combiningandwe get {1, 3, 4} and combiningandwe get {1, 2, 3} both of which are validAPCs for BN which are alsoFCs in this example. Hence, the size of a minimumFCis 3. Remark. We make a quick remark on the computational complexity of our algorithm. Note that the algorithm can, in the worst case, take time exponential in the size of its input, which is the BN, the source and target attractors and the partition size. One way in which this can happen is, for example, when Is_Control in line 11 of Algorithm 1 returns FALSE for exponentially many potential controls before finding a valid APC. This, in turn, occurs when although each of the local controls are valid APCs on the partitions but their union C is not a valid APC for the entire BN (the resulting state does not belong to the strong basin of some target attractor in ). However, as we see in Section 4, such a case is extremely rare for BNs constructed for real-life biological networks. For such networks, Is_Control succeeds to find a valid APC within 2–3 iterations. This makes our procedure quite efficient on such networks.

4 Evaluation

As discussed in Section 1, the control method based on the computation of stable motifs (SM) (Zañudo and Albert, 2015) is a method of control applied for an extended period for the target control of asynchronous BNs. In this section, we compare our single-step control method for the min-TC problem (which we simply call TC) with SM even though the control computed by our method is applied only for a single time-step. Regarding the full control of asynchronous BNs, as we are not aware of any previous work in the literature that deals with the exactly same problem, we simply evaluate the performance of our method to compute the min-FC of a BN (which we simply call FC henceforth) to demonstrate its potential. We apply these methods to 10 biological networks (Cohen ; Conroy ; Grieco ; Kim ; Naldi ; Offermann ; Remy ; Saez-Rodriguez ; Schlatter ; Singh ). Our methods for the computation of min-TC and min-FC are implemented as part of the software tool ASSA-PBN (Mizera ). All the experiments are performed on a computer with a CPU of Intel Core i7 @3.1 GHz and 8 GB of DDR3 RAM. Description of the networks. We first describe the networks under study. The myeloid differentiation network is designed to model myeloid differentiation from common myeloid progenitors to megakaryocytes, erythrocytes, granulocytes and monocytes (Krumsiek ). This network has 11 nodes and 6 attractors, 4 of which agrees with microarray expression profiles of two different studies. The tumour network is built to study the role of individual mutations or their combinations in the metastatic process (Cohen ). This network contains 32 nodes and 9 attractors, which are consistent with Cohen . The PC12 cell network models the temporal sequence of protein signalling, transcriptional response and subsequent autocrine feedback (Offermann ). It has 33 nodes and 7 attractors. The bladder cancer network allows one to identify the deregulated pathways and their influence on bladder tumourigenesis (Wang ). It has 35 nodes. When the input nodes EGFR_stimulus and Growth_inhibitors are set to ON and DNA_damage is set to OFF, the network has four attractors: three correspond to growth arrest and one corresponds to cell proliferation. The MAPK network is constructed to study the MAPK responses to different stimuli and their contributions to cell fates (Grieco ). In this paper, we use the MPAK mutant r3, which has 53 nodes and 20 attractors. The model for HGF-induced keratinocyte migration captures the onset and maintenance of hepatocyte growth factor-induced migration of primary human keratinocytes (Singh ). It has 66 nodes and 18 attractors. The Th-cell differentiation network models the regulatory network and the signalling pathways controlling Th-cell differentiation (Naldi ). It consists of 68 nodes and 12 attractors with the same initial condition as mentioned in Naldi . The model of T-cell receptor signalling describes the complex signalling network governing the activation of T-cells via several receptors, including the T-cell receptor, the CD4/CD8 co-receptor, and the accessory signalling receptor CD28 (Saez-Rodriguez ). It has 95 nodes and 16 attractors are detected under certain conditions. The apoptosis network captures the central intrinsic and extrinsic apoptosis pathways and the pathways connected with them (Schlatter ). It has 97 nodes and 32 attractors when the nodes FASL_2, IL_1, TNF, UV, UV_2, FASL are fixed to OFF. The CD4+ T-cell network allows us to study the downstream effects of CAV1+/+, CAV1+/− and CAV1−/− on cell signalling and intracellular networks (Conroy ). This network is comprised of 188 nodes and 12 attractors under certain initial conditions. An overview of the networks is given in Table 3. (We refer the sizes of the basins of attractors to the Supplementary Material.)

Table 3.

An overview of the networks and a comparison of the three methods on the control sets

Network	Nodes	Edges	Attractors	CminTC	CSM	∩	CminFC
Myeloid	11	30	6	3	3	2	8
Tumour	32	158	9	2	*	*	14
PC12	33	62	7	1	1	1	15
Bladder	35	116	4	1	1	1	16
MAPK	53	105	20	4	4	4	20
HGF	66	103	18	4	*	*	34
Th-diff	68	175	12	3	2	2	17
T-cell	95	159	16	4	4	4	4
Apoptosis	97	192	32	5	5	5	5
CD4+	188	380	12	4	3	3	5

Note: represents the overlaps between and The symbol ‘*’ means the method fails to compute the results within 12 h.

An overview of the networks and a comparison of the three methods on the control sets Note: represents the overlaps between and The symbol ‘*’ means the method fails to compute the results within 12 h. Selection of the partition size. We perform experiments on the biological networks described above to find out the best size of partitions for TC and FC. Since TC is a special case of FC, we only perform experiments for FC by setting the maximum size of a partition from 1 to 20 and comparing the time costs. Figure 2 shows the normalized time costs with different sizes of partitions for the 10 networks. When the size equals 3, FC has the best efficiency for most of the networks. Hence, we set the partition size m = 3 except for the TC of HGF-induced keratinocyte migration, which is explained later.

Fig. 2.

Influence of the block size on the efficiency of FC

Influence of the block size on the efficiency of FC Effectiveness. As illustrated in Proposition 1, our computation methods TC and FC identify the minimal control sets for single-step control. SM is an extended period control and it does not guarantee the minimality of the control sets as mentioned in Zañudo and Albert (2015). Table 3 gives the sizes of the control sets computed by the three methods. It is worth noting that SM may capture unnecessary nodes. Taking the myeloid differentiation network as an example, Figure 3 gives the control nodes required by TC and SM to drive the network towards one of the attractors. The grey rectangular node—required by SM solely—has the same value in all the attractors, thus there is no need to control it.

Fig. 3.

The results of TC and SM on the myeloid differentiation network

The results of TC and SM on the myeloid differentiation network Columns and are the number of driver nodes for one of the attractors computed by TC and SM. We can see that the results computed by the two methods are very close (see column in Table 3). Compared with SM, TC may lead to slightly larger control sets, like in the results of the Th-cell differentiation network and the CD4+ T-cell network, due to the application of different control strategies—SM focuses on extended period control while we use single-step control. Despite that, the number of control nodes for single-step control are still small relative to the sizes of the networks. The column describes the number of driver nodes required for the full control of the networks. For most of the networks, is much larger than . Three large networks (the T-cell network, the apoptosis network and the CD4+ T-cell network) have small control sets because the attractors are caused by few nodes. For instance, the 32 attractors of the apoptosis network result from all combinations of values of five input nodes, i.e. 25. Even though it has 97 nodes and 32 attractors, by controlling the five input nodes, we can gain full control of the network. Efficiency. Table 4 gives the execution time of the three methods. Note that the partition size m only has influence on and . The attractors and their basins are computed with methods in Mizera ) and Paul and their computation time may increase as the sizes of the networks increase.

Table 4.

The time costs of the three control methods (TC, FC and SM)

Network	TC and FC				SM
Network	Tatt	Tbas	TCminTC	TCminFC	Tatt	TCSM
Myeloid	0.002	0.004	0.004	0.001	6.989	7.846
Tumour	0.622	1.009	0.177	0.028	*	*
PC12	0.019	0.146	0.017	0.009	97.211	263.249
Bladder	0.881	0.318	0.813	0.745	26.955	32.587
MAPK	2.175	9.409	0.404	0.270	53.354	436.898
HGF	2.552	23.571	860.776	1.164	104.447	*
Th-diff	3.664	17.347	0.824	0.282	121.821	400.043
T-cell	2.170	14.762	0.565	0.335	58.418	9.967
Apoptosis	11.285	1230.200	1.778	1.045	222.241	55.578
CD4+	182.185	948.667	1.850	1.613	60.525	30.894

Note: Units of time are in seconds.

The time costs of the three control methods (TC, FC and SM) Note: Units of time are in seconds. and are the total time costs for computing the target control sets for all attractors of the networks. In general, our computation method TC outperforms SM in terms of efficiency for most of the networks. For the CD4+ T-cell network, SM is faster than our method on attractor detection, mainly due to the fact that this network is sparse and has a simple structure. But this is rare for biological networks, as they are necessarily dense to performs remarkably robust regulatory tasks (Adai ; Blanchini and Franco, 2011). The of HGF-induced keratinocyte migration shows that the iteration of Algorithm 1 (lines 7–21) can be very time consuming. Taking one of the attractors as an example, the initial is 13 and is of size 19. This implies that we need to traverse all solutions of size 13–19 to find and there may exist a considerable number of such solutions. According to extensive experiments, a larger m leads to a larger initial C, which reduces the number of iterations. However, a larger m also increases the time for the computation of . So m is the critical parameter in our control algorithms and has to be properly chosen. For this network, TC has the best efficiency when m = 10. Finally, the numbers for in Table 4 also show that our method is very efficient and scales well even for large-scale networks.

5 Conclusion

In this work, we have described a method to identify a minimal set of nodes Cmin, by perturbing which, for a single time-step, the network can be driven from any initial state in a source attractor to any target attractor. This method is adapted to solve the target control and full control of large-scale BNs. Compared with the traditional methods of control where the perturbation is applied for an extended period, such a control strategy is also realistic and easier to carry out in biological lab experiments. We showed that our method is efficient and the nodes required to control the network form a small subset of the set of all nodes in the network. In the future, as a continuation of the current work, we would like to apply our control algorithm to larger real-life biological networks and study its performance and applicability. As mentioned in Section 4, we found that the size of the partitions, m, has a big influence on the efficiency of our method. We would like to explore whether this is caused by a structural, or dynamic property of the network or a combination of the two. We would also like to extend our work to the setting of probabilistic Boolean networks (PBNs) and explore if and how to adapt the single-step control strategy to such networks and design efficient algorithms for their implementation.

Funding

This work was partially supported by the project SEC-PBN funded by University of Luxembourg and the ANR-FNR project AlgoReCell (INTER/ANR/15/11191283) funded by Luxembourg National Research Fund. Conflict of Interest: none declared.

Algorithm 1.

All-pairs control

1: procedure All_Pairs_Control (BN=(x,f),As,At,m)
2: i:=0,APC:={},k:=n/m, partitions:={V1,V2,…,Vk}
3: for j in [1, k] do	//initialize CMIN0j for all j
4: CMIN0j:= Min_Control(BN,As,At,Vj)	//min full control for V_j
5: size_min(j):=\|Cj\|whereCj∈CMIN0j	//minimum size of the control set on V_j
6: end for
7: whileAPC={}do
8: fora1,…,ak≥0such thata1+⋯+ak=ido
9: possAPC:={C1∪…∪Ck\|Ci∈CMINai(i),i∈{1,…,k}}	//for all possible controls of combined size i
10: forC∈possAPCdo
11: if Is_Control(C, BN,As,At) thenAPC:=APC∪{C}	//check if it is a valid APC for BN
12: end if
13: end for
14: end for
15: ifAPC={}then	//if a valid APC for BN has not yet been found
16: i←i+1;	//increase the size of the potential APC by 1
17: forj = 1 to kdo
18: CMINij:=Fixed_Control(BN,As,At,Vj,size_min(j)+i)	//look for an APC of the new size
19: end for
20: end if
21: end while
22: return APC
23: end procedure

Algorithm 2. Helper functions

1: procedure Min_Control (BN,As,At,Vj)
2: CMINj:={},max=\|Vj\|,success:=TRUE
3: fori = 0 to max do
4: forC⊆ind(Vj),\|C\|=ido	//for all subsets C of size at most max of the indices in V_i
5: forAs∈As,At∈At,As≠Atdo
6: fors∈Asdo
7: if¬(∃C'⊆C,C'(s\|Vj)∈bas(At)\|Vj)thensuccess:= FALSE	//check if there exists a subset of C such that applying it to
8: end if	//the projection of s to V_j results in a state in the projection
9: end for	//of bas(At) to V_j
10: end for
11: ifsuccess=TRUEthen
12: CMINj:=CMINj∪{C}, max:=\|C\|	//if a control has been found, max is set to its size
13: end if
14: end for
15: end for
16: returnCMINj
17: end procedure
18: procedure Fixed_Control(BN,As,At,Vj,m)
19: CMINj:={},success:=TRUE
20: forC⊆ind(Vj),\|C\|=mdo	//the potential control is of a fixed size m
21: forAs∈As,At∈At,As≠Atdo
22: fors∈Asdo
23: if¬(∃C'⊆C,C'(s\|Vj)∈bas(At)\|Vj)thensuccess:=FALSE
24: end if
25: end for
26: end for
27: ifsuccess=TRUEthenCMINj:=CMINj∪{C}	//a valid control on V_j has been found
28: end if
29: end for
30: returnCMINj
31: end procedure
32: procedure Is_Control (C,BN,As,At)
33: success:=TRUE
34: forAs∈As,At∈At,As≠Atdo
35: fors∈Asdo
36: if¬(∃C'⊆C,C'(s)∈bas(At))thensuccess:=FALSE	//check if there is a subset of C which is a valid APC on BN
37: end if
38: end for
39: end for
40: returnsuccess
41: end procedure

37 in total

1. Genetic network inference: from co-expression clustering to reverse engineering.

Authors: P D'haeseleer; S Liang; R Somogyi
Journal: Bioinformatics Date: 2000-08 Impact factor: 6.937

Review 2. Genomics, complexity and drug discovery: insights from Boolean network models of cellular regulation.

Authors: S Huang
Journal: Pharmacogenomics Date: 2001-08 Impact factor: 2.533

Review 3. Network dynamics and cell physiology.

Authors: J J Tyson; K Chen; B Novak
Journal: Nat Rev Mol Cell Biol Date: 2001-12 Impact factor: 94.444

Review 4. Sniffers, buzzers, toggles and blinkers: dynamics of regulatory and signaling pathways in the cell.

Authors: John J Tyson; Katherine C Chen; Bela Novak
Journal: Curr Opin Cell Biol Date: 2003-04 Impact factor: 8.382

5. LGL: creating a map of protein function with an algorithm for visualizing very large biological networks.

Authors: Alex T Adai; Shailesh V Date; Shannon Wieland; Edward M Marcotte
Journal: J Mol Biol Date: 2004-06-25 Impact factor: 5.469

Review 6. Gene regulatory network inference: data integration in dynamic models-a review.

Authors: Michael Hecker; Sandro Lambeck; Susanne Toepfer; Eugene van Someren; Reinhard Guthke
Journal: Biosystems Date: 2008-12-27 Impact factor: 1.973

Review 7. Forcing cells to change lineages.

Authors: Thomas Graf; Tariq Enver
Journal: Nature Date: 2009-12-03 Impact factor: 49.962

8. Diversity and plasticity of Th cell types predicted from regulatory network modelling.

Authors: Aurélien Naldi; Jorge Carneiro; Claudine Chaouiya; Denis Thieffry
Journal: PLoS Comput Biol Date: 2010-09-02 Impact factor: 4.475

9. ON/OFF and beyond--a boolean model of apoptosis.

Authors: Rebekka Schlatter; Kathrin Schmich; Ima Avalos Vizcarra; Peter Scheurich; Thomas Sauter; Christoph Borner; Michael Ederer; Irmgard Merfort; Oliver Sawodny
Journal: PLoS Comput Biol Date: 2009-12-11 Impact factor: 4.475

10. A logical model provides insights into T cell receptor signaling.

Authors: Julio Saez-Rodriguez; Luca Simeoni; Jonathan A Lindquist; Rebecca Hemenway; Ursula Bommhardt; Boerge Arndt; Utz-Uwe Haus; Robert Weismantel; Ernst D Gilles; Steffen Klamt; Burkhart Schraven
Journal: PLoS Comput Biol Date: 2007-07-05 Impact factor: 4.475

2 in total

1. Exploring attractor bifurcations in Boolean networks.

Authors: Nikola Beneš; Luboš Brim; Jakub Kadlecaj; Samuel Pastva; David Šafránek
Journal: BMC Bioinformatics Date: 2022-05-11 Impact factor: 3.307

2. Network controllability solutions for computational drug repurposing using genetic algorithms.

Authors: Victor-Bogdan Popescu; Krishna Kanhaiya; Dumitru Iulian Năstac; Eugen Czeizler; Ion Petre
Journal: Sci Rep Date: 2022-01-26 Impact factor: 4.379

2 in total