Literature DB >> 31510702

MCS2: minimal coordinated supports for fast enumeration of minimal cut sets in metabolic networks.

Reza Miraskarshahi¹, Hooman Zabeti¹, Tamon Stephen², Leonid Chindelevitch¹.

Abstract

MOTIVATION: Constraint-based modeling of metabolic networks helps researchers gain insight into the metabolic processes of many organisms, both prokaryotic and eukaryotic. Minimal cut sets (MCSs) are minimal sets of reactions whose inhibition blocks a target reaction in a metabolic network. Most approaches for finding the MCSs in constrained-based models require, either as an intermediate step or as a byproduct of the calculation, the computation of the set of elementary flux modes (EFMs), a convex basis for the valid flux vectors in the network. Recently, Ballerstein et al. proposed a method for computing the MCSs of a network without first computing its EFMs, by creating a dual network whose EFMs are a superset of the MCSs of the original network. However, their dual network is always larger than the original network and depends on the target reaction. Here we propose the construction of a different dual network, which is typically smaller than the original network and is independent of the target reaction, for the same purpose. We prove the correctness of our approach, minimal coordinated support (MCS2), and describe how it can be modified to compute the few smallest MCSs for a given target reaction.
RESULTS: We compare MCS2 to the method of Ballerstein et al. and two other existing methods. We show that MCS2 succeeds in calculating the full set of MCSs in many models where other approaches cannot finish within a reasonable amount of time. Thus, in addition to its theoretical novelty, our approach provides a practical advantage over existing methods.
AVAILABILITY AND IMPLEMENTATION: MCS2 is freely available at https://github.com/RezaMash/MCS under the GNU 3.0 license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Entities: Chemical Disease Species

Mesh：

Year: 2019 PMID： 31510702 PMCID： PMC6612898 DOI： 10.1093/bioinformatics/btz393

Source DB: PubMed Journal: Bioinformatics ISSN： 1367-4803 Impact factor: 6.937

1 Introduction

Constraint-based modeling of metabolic networks has been a major subfield of systems biology thanks to its ability to identify key qualitative characteristics of networks for analyzing and extracting useful information (Bordbar ; Lewis ; Price ). A metabolic network is a collection of chemical reactions which comprise the metabolic activities (i.e. the biochemical transformation of molecules into other molecules for the purpose of maintenance and growth) of a specific organism. One important application of metabolic network analysis is to find interventions that can block a reaction of interest, typically referred to as the target reaction, with applications in drug target identification (Harder ; Hartman ; Imielinski and Belta, 2008; Trinh ; von Kamp and Klamt, 2017) and metabolic engineering (Mahadevan ). When this is achieved by disabling one or more other reactions, the disabled reactions are called a cut set. A cut set is called ‘minimal’ if no proper subset of it can disable the target reaction. The concept of minimal cut sets (MCS) was introduced by Klamt and Gilles (2004) and its applications are examined in detail in Klamt (2006). At the moment, the main approach used for enumerating the MCSs for a target reaction is to compute the elementary flux modes (EFMs) containing the target and then use a dualization procedure to produce the MCSs (Gainer-Dewar and Vera-Licona, 2017). Here, flux modes are possible distributions of fluxes through the reactions, and those can be modeled as hyperedges on the vertex set of possible reactions. EFMs are flux modes which are support-minimal, and it is known that any flux mode can be written as a non-negative linear combination of EFMs. Given the full set of EFMs, MCS can be obtained through the dualization of the hypergraph they define (Haus ; Klamt and Gilles, 2004). Two approaches to do this are Berge’s algorithm (Berge, 1984) and Fredman and Khachiyan’s dualization procedure (Fredman and Khachiyan, 1996). However, both suffer from poor worst-case complexity and produce mixed results in practice. A comparatively new approach (Ballerstein ) produces the MCSs without first computing the EFMs. It works by generating a dual network, which is larger than the original network and depends on the target reaction, and then computing a subset of the EFMs of that network with a specific property, which guarantees that they are precisely the MCSs in the original network. We call this the target-specific dual network method. In this article, we develop a new method, minimal coordinated support (MCS2), which also generates a dual network (either explicitly or implicitly), but in a way that is independent of the target reaction from the original network, then computes the MCSs from those EFMs of the dual network that satisfy a certain property, which also guarantees that they are precisely the MCSs for the target reaction in the original network. MCS2 is based on a generalization of some theoretical results by the last author (Chindelevitch, 2014). We implement MCS2 and find it to be effective on most instances we test it on. We compare it to three alternate methods for enumerating all the MCSs for a target set. The first two methods are to compute the EFMs, and then dualize them with either Berge’s algorithm, or an optimized implementation of Fredman–Khachiyan dualization, respectively. For Berge’s algorithm, we used the implementation in CellNetAnalayzer (Klamt ), containing the enhancements described in Eiter ) and Haus ). For Fredman–Khachiyan dualization, we used the recent implementation of Sedaghat . The target-specific dual network method (Ballerstein ) first creates a dual network based on the given stoichiometric matrix and the given target reaction. It then proceeds to compute the EFMs of that dual network. Following some post-processing, the supports of these EFMs are reduced to give the required MCSs; because the MCSs correspond to only a subset of the vectors produced, this post-processing includes removing any supersets. Like MCS2, the target-specific dual network method reports all the MCSs without first computing the EFMs or requiring them as an input. The authors of Ballerstein did not provide a publicly available implementation of their method, so we did so ourselves, including all the enhancements mentioned in their Supplementary Materials. Most of these enhancements have improved the performance of the target-specific dual network method, for instance by reducing the size of the intermediate results. For the majority of the models we investigated, we find that MCS2 is more efficient than these other methods, in terms of both running time and memory use. On the negative side, we show that our approach does not allow the enumeration of all MCSs through a given target reaction in incremental polynomial time, something that therefore remains a major open problem in the field. Given the challenges in enumerating all MCSs (in part due to their large number, which can be exponential in the size of the network), some recent work (von Kamp and Klamt, 2014; Vieira ) uses mixed-integer linear programming (MILP) formulations to enumerate a subset of the MCSs, in increasing order of size. In some practical applications, quickly obtaining a few MCSs of minimum size may be more desirable than enumerating the complete set. We therefore adapt the MCS2 approach to use MILP formulations to address this task. We also implement this method, which we call MCS2-MILP, and compare it to MCSEnumerator, the target-specific dual network approach adapted to MILP (von Kamp and Klamt, 2014). The comparison shows that MCS2-MILP performs at least as well as MCSEnumerator using a state-of-the-art MILP solver (IBM). We conclude that MCS2 is a promising approach for the computation of MCSs in metabolic networks, and expect it to be a beneficial addition to the analysis tools available for metabolic network models. We now introduce the terminology we will be using throughout this article. When we speak of a metabolic network, it is understood that we are talking about a model in the constraint-based modeling formalism. (Stoichiometric matrix). The stoichiometric matrix S is an m × n matrix with each row representing a metabolite (indexed from 1 to m) and each column, a reaction (indexed from 1 to n). The entry S indicates how many units of metabolite i are produced (if S > 0) or consumed (if S < 0) by reaction j. A vector v is feasible with respect to S if it is in the null space of S, i.e. if it satisfies Sv = 0. (Reaction irreversibility). The set of irreversible reactions is a subset of the set of reactions constrained to have only non-negative fluxes. Its complement , the set of reversible reactions, is allowed to have fluxes of any sign. A vector v respects the reaction irreversibility constraints if , also written as . (Metabolic network). A metabolic network is a pair , where is a stoichiometric matrix and is the set of irreversible reactions. A vector v is a flux mode if it is feasible with respect to S and respects the irreversibility constraints, i.e. Sv = 0 and . The set of all such vectors is called the network’s flux cone. (Reconfigured network). Let be a metabolic network. We can reconfigure this network by replacing S with which contains the columns of S that are not in , and then consider all reactions as irreversible. This is equivalent to splitting each reversible reaction in the network into its forward reaction and reverse reaction. (Null space matrix and network). Let S be a matrix. A null space matrix of S is a matrix whose rows form a basis of the null space of S. The null space network of a metabolic network with stoichiometric matrix S is the fully reversible metabolic network (i.e. with ) whose stoichiometric matrix is a null space matrix of S. (Positive and negative support). Let v be a vector. The positive support of v, , is the set of positions i where v is positive: . The negative support of v, , is the set of positions i where v is negative: . Their union is the support of v: . (Coordinated support). Let v be a vector of size n and let be a set of positions. The A-coordinated support of v, , is the union of its negative support on the positions in A and its support everywhere else: , where A is the complement of the set A with respect to . (EFM). Let be a metabolic network, and let v be a flux mode of . It is an EFM if its support is minimal among all the flux modes of , i.e. for any flux mode w which (Gagneur and Klamt, 2004; Schuster ). (MCS). Let be a metabolic network, and let t be a reaction. C is a cut set for t if . C is a MCS if it is inclusion-minimal: for any set implies that (Klamt and Gilles, 2004). (Canonical form of a network). Let be a metabolic network. We say that is in canonical form if it satisfies: No blocked reactions: for every reaction i, there exists a flux vector v with v = 1 Proper directedness: for every reaction , there exists a flux vector w with No enzyme subsets: no reaction pair satisfies with for all flux vectors v No redundant constraints: S has full row rank A metabolic network can be reduced to an equivalent one in canonical form (a.k.a. compressed form) in time polynomial in m and n (Chindelevitch, 2014).

2 The MCS2 method

Let S be the ith row of the stoichiometric matrix S. Then S represents the amount of metabolite i consumed or produced by reaction r (in these cases, S < 0 and S > 0, respectively). Assume that reaction r produces metabolite i if it has a positive flux. Then, in a steady state where no reaction consuming metabolite i is active, reaction r must be inactive in the forward direction. If reaction r happens to be reversible, it must be consuming metabolite i, and its flux must be negative. This shows that reaction r is blocked in the forward direction if we disable every reaction that can consume metabolite i, i.e. every irreversible reaction with a negative value in row i and every reversible reaction with a non-zero value in row i. Disabling a reaction can be done by removing the corresponding column from the stoichiometric matrix or adding the constraint that this reaction’s flux is zero to the problem. The set of such reactions is then a cut set for the forward direction of reaction r. Every row gives us some, not necessarily minimal, cut set in this manner. We can apply the same reasoning to linear combinations of the metabolites. Consider a new virtual metabolite x, which represents a linear combination of rows S and S corresponding to metabolites i and j respectively, say for example . Since the fluxes producing and consuming each metabolite are balanced in any admissible vector, so are the fluxes of their linear combinations, so the virtual metabolite x must also be balanced. If we pick a reaction with a positive value in u, it produces a virtual metabolite x when it has a positive flux. It will therefore be blocked if we cut all irreversible reactions with negative values in u and all reversible reactions with non-zero values in u. Thus, we can obtain cut sets from the vector u, which is a member of the row space of S, as we did with S and S. A proposal for finding cut sets by analyzing the row space of the stoichiometric matrix was introduced in the Ph.D. thesis of Chindelevitch (Chindelevitch, 2014). The intuition we described shows how vectors in the row space can generate cut sets. However, the lemmas proven in Chindelevitch (2014) work only for the fully reversible networks (i.e. with ) or fully reversible networks (i.e. with ). We generalize them here to networks with both irreversible reactions and reversible reactions.

2.1 Enumerating the full set of MCSs

In the MCS2 method, the dual network is the null space network of the original network, i.e. a fully reversible network whose stoichiometric matrix is the null space matrix of the original stoichiometric matrix. The EFMs in this dual network map to MCS of the original network, though, as we will see, the mapping can be many to one. The dual network has the same number of reactions, n, as the original one, but it typically has fewer metabolites; if the original network has m metabolites and the stoichiometric matrix is full rank, the dual has n – m metabolites. (MCSs for an irreversible reaction). Let be a metabolic network. Let be an irreversible target reaction. Then C is a cut set for t if and only if there exists a vector u in the row space of S, , such that u = 1 and . This lemma is an extension of Lemma 3 of Chindelevitch (2014). We observe that C being a cut set for irreversible reaction t is equivalent to: Based on Farkas’ Lemma and the irreversibility of t, we only need to find a constraint that implies . Thus, there exists a y such that: Here, e is a vector with a 1 in the ith position and 0 elsewhere. Thus: Therefore and , and so . For the other direction, suppose that u satisfies Equation (3) for some C. Then the union of the indices i in for which and the indices i in for which is a subset of C, i.e. . Then equality (2) holds and, by Farkas’ lemma, so does condition (1). □ (MCSs for one direction of a reversible reaction). Let be a metabolic network. Let t be a reversible target reaction. Then C is a cut set for the forward (reverse) direction of t if and only if there exists a vector such that and u = 1 (), respectively. If we assume that t is irreversible for a moment, the first part already follows from the previous lemma. For the second part, replace t with –t in S to create . Reaction t is blocked in the forward direction in if and only if reaction t is blocked in the reverse direction in S, and there is a bijection between the vectors in Row(S) and those in via the mapping that multiplies the tth coordinate by –1. □ With these lemmas, Algorithm 1 can be used to find the MCS for a set of target reactions in an arbitrary metabolic network , where T has separate elements for the opposite directions of a reversible reaction. We call this method MCS2 because it computes MCSs as the MCS2 of the EFMs in the dual network.

Algorithm 1. MCS enumeration via the MCS2 method

Input: A metabolic network , and a target set Output: MCS of target reactions T. 1: functionMCS_Enumeration() 2: Reduce S to its canonical form. 3: Compute the null space matrix N of S. 4: Compute all EFMs of N. 5: for alldo Compute , the set of all elements of involving target t. 6: for alldo Let be the set of minimal -coordinated supports of the elements of . 7: Let . 8: Let be the result of pruning to remove any supersets. 9: Return . The MCS2 method computes the null space network of the original network. Then, by applying coordinated support to the EFMs in which each target reaction is active in turn, it generates the cut sets related for this target. All the MCSs are among these cut sets, and are obtained by pruning. The null space network is fully reversible, and since the null space of the null space is the original space, the dual network of the dual network is equivalent to the original network all of whose reactions have become reversible. An example is shown in Figure 1, where the dual network appears alongside the original one. As shown in the example, flux modes with an active target reaction in the dual network map to cut sets for the target reaction in the original network.

Fig. 1.

Example of a metabolic network with its associated dual network created by the nullspace matrix. Some of its FMs involving target reaction are shown; their coordinated supports result in cut sets for it in the original network Flux modes finders such as FluxModeCalculator (van Klinken and Willems van Dijk, 2016) reconfigure the network before applying the double description method. The double description method is an algorithmic approach for finding the extreme rays of a pointed cone described by linear constraints. The reconfigured network is , where N is the null space network of S. Figure 2 shows the null space matrix of , which is the starting point of double description method. The double description method begins by using elementary row operations to put the matrix in the form suggested in Wagner (2004) which contains an identity matrix of size m + n. When the double description method finishes, it outputs the extreme rays describing a cone in 2n-dimensional space (Terzer and Stelling, 2008). These extreme rays are the non-zero vectors in the flux cone with minimal support. On the other hand, -coordinated support does not count non-zero values in some dimensions, namely, those that correspond to positive values in irreversible reactions. If we ignore these dimensions, we project the cone into a lower-dimensional subspace. While the image of a pointed cone remains a pointed cone, the extreme rays of the new cone are those in the original flux cone with minimal support in the remaining dimensions. Figure 3 shows why all the MCS2 are among the minimal supports, while arguing that there may be some redundant results among them as well. The next theorem formalizes this:

Fig. 2.

This matrix is the nullspace of the reconfigured nullspace of stoichiometry matrix . The double description method begins on this space and finds extreme rays with length

Fig. 3.

Each extreme ray of the projected cone is an image of an extreme ray in the original cone, while some extreme rays of the original cone do not project to extreme rays. It is also possible that two or more extreme rays in the original cone project onto the same one. Our desired projections lie in the plane where the value in the target position is

This matrix is the nullspace of the reconfigured nullspace of stoichiometry matrix . The double description method begins on this space and finds extreme rays with length Each extreme ray of the projected cone is an image of an extreme ray in the original cone, while some extreme rays of the original cone do not project to extreme rays. It is also possible that two or more extreme rays in the original cone project onto the same one. Our desired projections lie in the plane where the value in the target position is (Correctness of the method). Algorithm 1 returns precisely the set of MCS of the network for a single target reaction. Let t be the target reaction. We prove the inclusion in both directions. First, let be one of the sets returned by the method above. Then C is a cut set for t in the reconfigured network, by Lemma 1 and by construction. Indeed, contains flux modes of N involving t, which are precisely the vectors in the row space of S involving t, and (as well as ) contains the -coordinated supports of these vectors. Now, let C be a MCS for t in . We will show that . By Lemma 1, there exists a vector such that u = 1 and . Since , u is a conical combination of the EFMs of N. Note that the results of Müller and Regensburger (2016) imply that since the space to which u belongs is linear (i.e. it does not need to satisfy any non-negativity constraints), this conical combination can be chosen to be conformal, meaning that there are no cancellations involved in any component. Let such a conformal conical combination be given by Since all the coefficients are strictly positive in (4), we deduce that Indeed, each must have a negative component in at least one of the f, as otherwise the jth component of the right-hand side of (4) will be non-negative, which gives the direction, and the fact that the combination is conformal gives the direction, as otherwise there would be a cancellation. In particular, we deduce that for each . In this case, the minimality of C implies that either or f has a 0 in position t, for each . But since u has a 1 in position t, there must be at least one f in the first category, so that and therefore, . Once again, by the minimality of C we conclude that since it cannot be a superset of the -coordinated support of another vector in , concluding the proof. □

2.1.1 Limitations

Our method is limited to blocking one direction of a given reaction. In practice, blocking one direction of a given reaction is the typical objective (Burgard ). To block multiple reactions, it is possible to compute the MCSs of every target reaction, take unions of all possible combinations, then remove the supersets. However, this may not always be efficient. A more critical issue is the possibility of generating a large number of non-MCS before the post-processing. The following Lemma shows that this type of blow-up can occur in theory: (Large number of supersets in the final step). For every integer there exists a network containing k + 2 metabolites, reactions and elementary vectors for the target reaction t = 1 that map to the exact same MCS. This network is in canonical form, and is elementally balanced as per Zabeti . We construct the network as follows. There are two special metabolites, denoted M (initial) and M (final), and k intermediate metabolites, denoted M for . For each metabolite, we have an export reaction and an import reaction, with the export reactions for each intermediate metabolite coupled with an import of the final metabolite. Lastly, each intermediate metabolite except the first one can be transformed into the first one, M1, which itself can also be transformed into the initial metabolite M. All reactions in the network are irreversible and all the stoichiometric coefficients are ±1. We order these reactions as follows (for simplicity of argument): The stoichiometric matrix then looks as follows (shown for k = 2): Here, a + represents a 1 and a – represents a –1. We now proceed to show each part of the desired statement: The network is elementally balanced because every reaction that is not pure import or pure export is an exchange of one metabolite for another in a 1-1 ratio, so we can consider each metabolite as containing exactly 1 atom. The network is in canonical form because every reaction can be active and no pair of reactions is constrained to have proportional fluxes; this is evidenced by the following flux modes: This set of fluxes includes every reaction at least twice, and in at least two of these the sets of other active reactions are disjoint. The only exceptions are R1 and R3, which need R2 to be active in order to occur, but the first and third flux modes (respectively second and third flux modes) show that their fluxes are not proportional to that of R2 or to each other; and the reactions and for , both of which need to be active in order to occur, but not in a fixed ratio, as evidenced by taking linear combinations of the last two sets of flux vectors. The stoichiometric matrix has full row rank, i.e. no metabolite generates a redundant constraint, because every metabolite except M has a pure import reaction, while M has a pure export reaction. There is a unique MCS for target reaction R1, namely, R2. This is because R2 is the only reaction consuming M (recall that all reactions are irreversible). The first row of S, produces this MCS via its negative support. Lastly, there are additional vectors in the row space of S that produce supersets of this MCS via their negative support. The first one is obtained by adding the second row of S to u, replacing it by and then picking any subset P of the set of entries to form a new vector v, as follows. Let be an element of the chosen subset, where . We will replace with via the addition of the -nd row of S (corresponding to the intermediate metabolite ) to the starting vector. Indeed, this row contains three non-zero entries: a –1 from reaction (which cancels out the 1 in position ), as well as another 1 from reaction and another –1 from reaction . We do this addition independently for each element of P to get v (if we get ). It is easy to check that v has support: No proper subset of this support can produce a non-trivial vector in the row space of S, as it is impossible by construction to add a linear combination (possibly with negative coefficients) of the rows of S to v without adding any new elements to its support, so each v is elementary. Furthermore, the negative support of v is: which is a strict superset of the negative support {2} of u. □

2.1.2 Advantages

An advantage of the MCS2 approach is that we find the MCSs directly, without first computing the EFMs of the original network. Also, we do not need to reconfigure or alter the stochiometric matrix; every step is performed directly on the original stoichiometric matrix or its null space matrix. Network compression or reduction may be done as a preprocessing step before going through the main procedure, but these are only used to reduce the running time and space and are optional. These advantages are shared with the target-specific dual network approach. However, there are additional advantages that MCS2 has over this method. First, the null space matrix is typically smaller than the original matrix, especially if the original matrix is nearly full rank, while the target-specific dual network has a matrix that is always larger. This difference in input size can lead to substantial resource savings during EFM computation. Second, and perhaps most importantly, the dual network is independent of the target reaction in our method, while it is not with the target-specific approach. This means that we can calculate these EFMs once and use them for any given target reaction to be blocked.

2.2 Generating a partial set of MCSs via MILP

An alternative strategy for computing EFMs is via MILP, particularly when only a few small MCSs are required instead of a full enumeration (Rezola , 2015). Recall that EFMs are minimal-support vectors in the null space. Our method for finding small MCSs, which we call MCS2-MILP, similarly looks for vectors with MCS2 in the row space, which is to say, EFMs with minimal coordinated support in the dual network. (MCSs of a target set of reactions in a fully irreversible metabolic network). Let S be the stoichiometric matrix of a fully irreversible metabolic network . Let T be a set of target reactions. Then C is a cut set for all the reactions in T if and only if there exist a vector such that and . We need to show that every cut set for the target reactions arises from a vector in the row space with the described constraints, and every vector in the row space satisfying those constraints maps to a cut set. Let C be a cut set for all reactions in T. Therefore, C is a cut set for each reaction in individually. Based on Lemma 1, there exist vectors such that and for . In other words, for all vectors the only negative elements are the ones with indices belonging to C, and all other elements are non-negative, with a strictly positive value in the one with index t in the vector u, for . If we define the vector , then and , and u is clearly in Row(S). Now, let u be a vector in Row(S) such that and . Then for all . Based on Lemma 1, is a cut set for the reaction t, for each . Therefore, C is a cut set for all the reactions in T, completing the proof. □ Based on this Lemma we are able to find MCS for every set of target reactions without the restriction of only blocking one direction of a reaction. Since reversible reactions can be split into two reactions after reconfiguration, we can block a reversible reaction in one direction or in both directions. Let be the reconfigured matrix of the m × n stoichiometric matrix S with irreversible reactions . Since all the values in the stoichiometric matrix are proportions of consumed and produced metabolites, we can scale each row of S to have only integer entries without changing its structural properties. We now describe how to encode the problem of finding the smallest MCS for a target set T as a MILP. Let be a vector in the row space of the reconfigured matrix corresponding to the smallest MCS for target reaction set . Then there exists a vector s.t . If we define as the indicator vectors of the positive and negative supports of v, respectively, we may force v to be non-positive if is 0, and force it to be non-negative if is 0, by adding the following constraints using and as indicator variables: There must also be positive values in the target positions: These constraints also ensure that v = 0 is not in our feasible space. To make v a vector in the row space of we need to add the y variables, namely, the entries of a vector y with size m. The constraint then ensures that v is an element of the row space of . The objective function for finding the smallest MCS is: since the cut set is precisely the negative support of v, i.e. r–. Suppose that we have found the smallest MCS . To find the next smallest MCS, we need to exclude and all its supersets from our feasible space. This is achieved by the following constraint: We can keep excluding newly found MCSs and thus enumerate them in order of increasing size. As we stated above, in most scenarios we only wish to block an irreversible reaction or one direction of a reversible reaction. In those cases, we can avoid re-configuring the network to have a smaller stoichiometric matrix. Let t be the only target reaction. Instead of the constraints (6), we only need one constraint if we want to block it in forward direction, and we need the constraint if we need to block it in the reverse direction. The objective function (7) and constraints (8) can be updated as follows to reflect the coordinated support instead of the negative support: Unlike von Kamp and Klamt (2014), our problem formulation does not require any additional constraints, because they only reduce a part of the feasible space of our problem without affecting the optimum objective value. This concerns constraints such as and .

3 Implementation details

Except where noted, the implementations we discuss are in MATLAB. Each method that we consider requires an extreme ray computation, with the underlying cone varying. We used FluxModeCalculator’s EFM generator (van Klinken and Willems van Dijk, 2016) for this purpose. Note that the optimized Berge algorithm implemented by CellNetAnalyzer (Klamt ) uses the older EFM finder of CellNetAnalyzer by default. However, we observed that it is a slower implementation of an identical calculation, so we rewrote this part to use FluxModeCalculator in order to make a fair comparison. The MCS2 method and the target-specific dual network method both need to remove redundant supersets from the obtained extreme rays, since the desired minimality is not with respect to the full support of the vector, but only a specific part of it (the -coordinated support for the former, and support in the v-coordinates for the latter). We use an implementation in Java whose time complexity is for a collection of N sets. All stoichiometric matrices are compressed by the Mongoose (Leonid ) before processing, which converts them to a canonical form. Since the null space is needed for the row space method, we calculate the null space basis matrix using Mongoose (Leonid ). Since finding the MCSs in every method takes several seconds to several minutes, and the computation time of the null space basis matrix is less than a second in every case, we ignore this component of it. The reduced matrix given by Mongoose (for Berge and MFK), the target-specific dual matrix (for the target-specific dual method) and the null space basis matrix (for MCS2) get further compressed by FluxModeCalculator before processing. For the Berge algorithm, we used CellNetAnalyzer (Klamt ). We also used an existing implementation of the improved modified Fredman–Khachiyan (MFK) algorithm (Sedaghat ). However, we implemented the target-specific dual method from scratch using MATLAB and the source code of FluxModeCalculator. All the enhancements mentioned in the Supplementary Material of the original paper (Ballerstein ) were implemented as well. We used CPLEX (IBM) to solve the MILPs. The implementation was done via the Java API and has been implemented for single target reactions without network reconfiguration, and for multiple target reactions with network reconfiguration. Since the stoichiometric matrices needed to contain only integers, we used the integralize function of MONGOOSE (Leonid ) to multiply each row by the smallest possible integer that makes all the values integer (which is the least common multiple of the denominators of its entries). We also tested the results of our MILP in small networks against other implementations to make sure that the results are consistent. The implementation of all the methods and the MILP version of our method are publicly available at https://github.com/RezaMash/MCS under the GNU 3.0 license. Some of them require the use of non-public modules available for academic use, such as CellNetAnalyzer (Klamt ) and CPLEX (IBM).

4 Results

In this section, we summarize the performance of MCS2 and MCS2-MILP in comparison to the other methods. We ran the implementations on the first 5 models in our database in the GitHub repository. We compared the set of MCSs in these five small examples to confirm that all the implementations produced same results. For the other results presented in Tables 1–3, we again checked the number of MCSs reported by implementations if they finished, and the numbers matched in all cases. We then ran the methods on several models from the BioModels database (Li ). There were a few models on which our method either was not able to finish in the given time (5 h) or took much longer to report the MCSs, while the optimized Berge was able to finish in time and beat our method (see Table 3 for an example). This is due to the large number of supersets generated in that example by MCS2. However, MCS2 always performed better than the target-specific dual network approach, despite all its suggested enhancements being implemented. In addition, as can be seen for the hepatic polyamine and sulfur amino acid combined model (Reyes-Palomares ), the Berge and MFK methods could not finish in 5 h, but MCS2 generated results in 4 min, and the dual method in 30 min.

Table 1.

Result of running the methods on the hepatic polyamine and sulfur amino acid network (Reyes-Palomares )

All times are in seconds	Optimized Berge	Improved MFK	Target-specific dual network	MCS² dual network
Extreme ray computation	270.2	270.2	1191.9	79.8
Secondary process time	>18 000	>18 000	591.3	157.4
Total time	>18 000	>18 000	1783.2	237.2

Note: ; target reaction 1.

Table 3.

Result of running the methods on Fernandez2006 ModelB (Fernandez ) with ; target reaction 1

All times are in seconds	Optimized Berge	Improved MFK	Target-specificdual network	MCS² dual network
Extreme ray computation	99.5	99.5	>18 000	>18 000
Secondary process	2.1	1445.1	—	—
Total time	101.6	1544.6	>18 000	>18 000

Note: This is an example where MCS2 and the target-specific dual methods could not finish in time, while the Berge and MFK methods reported all 194 689 MCSs for the compressed network’s reaction 1 fairly quickly.

Result of running the methods on the hepatic polyamine and sulfur amino acid network (Reyes-Palomares ) Note: ; target reaction 1. Result of running the methods on the kinetic model of yeast network (Stanford ) with ; all the reactions were used as targets Berge computed the MCSs for the first 5 reactions before running out of time. The MFK and target-specific dual methods were not able to finish the computation of the MCSs even for the first reaction. Result of running the methods on Fernandez2006 ModelB (Fernandez ) with ; target reaction 1 Note: This is an example where MCS2 and the target-specific dual methods could not finish in time, while the Berge and MFK methods reported all 194 689 MCSs for the compressed network’s reaction 1 fairly quickly. The first task of every method is an extreme ray computation, which for Berge and MFK is the well-known EFM computation. Berge and MFK then proceed to generate the MCSs through dualization, while the secondary process of the target-specific dual network and MCS2 approaches is removing the redundant cut sets. In the first two provided examples, the target is the forward direction of the first reaction. Table 2 shows the computation time for calculating the MCSs for all possible target reactions. In the kinetic model of yeast metabolic network, described in Stanford , our method’s advantage is clear—it was able to finish computing the MCSs for all the reactions in under 14 s. Note that the dimensions stated in the tables are the ones before compression is applied. The conclusion is there are models for which it was not feasible to enumerate the full set of MCSs for a given target reaction before our work, but it is feasible now with MCS2.

Table 2.

Result of running the methods on the kinetic model of yeast network (Stanford ) with ; all the reactions were used as targets

All times are in seconds	Optimized Berge	Improved MFK	Target-specific Dual network	MCS² Dual network
Extreme ray computation	86.0	86.0	>18 000	53.0
Secondary process time	>18 000^a	>18 000	—	13.6
Total time	>18 000^a	>18 000	>18 000	66.6

Berge computed the MCSs for the first 5 reactions before running out of time. The MFK and target-specific dual methods were not able to finish the computation of the MCSs even for the first reaction.

We ran the MILP versions on larger networks alongside the MILP version of the target-specific dual approach, as described in von Kamp and Klamt (2014). This version is also a part of CellNetAnalyzer and is believed to be the state of the art for extracting some of the smallest MCSs in increasing order of size. We were able to compute 100 MCSs for reaction 10 (the first reaction with at least 100 MCSs) in Li ) model, which has 578 reactions after compression. The time required by both approaches on the first 40 MCSs is shown in Figure 4.

Fig. 4.

Time (in seconds) for computing each of the 40 smallest MCSs for reaction 10 (the first reaction which has at least 100 MCSs) of the Li2012 calcium-mediated synaptic plasticity model (Li )

Time (in seconds) for computing each of the 40 smallest MCSs for reaction 10 (the first reaction which has at least 100 MCSs) of the Li2012 calcium-mediated synaptic plasticity model (Li ) MILP was used to find a subset of the MCSs (Song ). The target-specific dual method has previously been used for this task, in a method called MCSEnumerator (von Kamp and Klamt, 2014; Vieira ). As its authors state, not all the EFMs in the dual space result in valid MCSs, but by adding the appropriate constraints, one can remove the redundant results from the ILP’s feasible space. To get a sense of how our approach, MCS2-MILP, performs compared to MCSEnumerator, we implemented the MILP described in von Kamp and Klamt (2014), currently part of CellNetAnalyzer. We ran our implementation of MCSEnumerator (von Kamp and Klamt, 2014) and MCS2-MILP on E.coli iAF1260 for the sake of comparison, which showed a similar performance, as shown in Table 4. This table contains the result of running MILPs for each reaction as a target reaction once per iteration. In each iteration, we restricted the MILPs to not spend more than 1 min on finding MCSs. Table 5 shows the result of the same experiment repeated with models chosen from the BiGG database (King ).

Table 4.

Result of running MCS2-MILP and MCSEnumerator on the E.coli iAF1260 network with 2382 reactions (981 reactions after compression)

Method used	Average number of MCSs	Average time for shortest MCS	Number of targets MILP failed on
MCS²-MILP	12.74 MCSs	4.45 s	17
MCSEnumerator	12.07 MCSs	5.22 s	13

Table 5.

Result of running MCS2-MILP and MCSEnumerator on the models from the BiGG database which initially have 2000–2600 reactions

Model ID	Average time for shortest MCS for MCSE numerator (s)	Average time for shortest MCS for MCS²-MILP (s)	Reactions before (after)
Model ID	Average time for shortest MCS for MCSE numerator (s)	Average time for shortest MCS for MCS²-MILP (s)	compression
iJO1366	4.66	3.98	2583 (1106)
iRC1080	7.12	7.19	2191 (1080)
STM_v1_0	1.82	1.83	2545 (1031)
iSbBS512_1146	14.10	19.03	2591 (1018)
iAF1260	5.22	4.45	2382 (981)
iSDY_1059	8.00	9.63	2539 (942)
iYL1228	1.88	2.11	2262 (805)

Result of running MCS2-MILP and MCSEnumerator on the E.coli iAF1260 network with 2382 reactions (981 reactions after compression) Result of running MCS2-MILP and MCSEnumerator on the models from the BiGG database which initially have 2000–2600 reactions The results in Figure 4 and Table 5 suggest that while MCS2-MILP does not perform faster than the MCSEnumerator, it is not slower either. On the other hand, MCS2-MILP is in the first version of itself. It has very simple structure compare to the MCSEnumerator and there is a possibility that it would improve. Even if not, MCS2-MILP is an alternative method which is easier to implement and understand and can help to have a better insight about what is exactly happening during the procedure of these methods.

5 Conclusions and future work

One key advantage of our method is that it does not depend on the target reaction to construct the dual network. The computations for one target reaction can therefore be reused for a different target reaction. Furthermore, it tends to operate on a smaller network than the original. One limitation to our method is that it is primarily designed for single target reactions (rather than a target containing a set of reactions), while both are just as easily handled by the competitor methods. Although MCS2 does not find the MCSs for a set of reactions directly, it can easily find the MCSs for each reaction individually, then prune any supersets from the union of these MCSs. An alternate strategy for computing MCSs is via MILP, particularly when only a few short sets are required, rather than a complete enumeration (Rezola , 2015). We showed that MCS2 can be easily adapted to this task via the MCS2-MILP method, which has shown performance not inferior to that of the state of the art. Another topic for further investigation can be the problem of finding a partial set of irreversible minimal cut sets (iMCSs) (Annika ) with MILP. With proper additional constraints, we can find a partial set of iMCSs. There has been some work in enumerating iMCSs; the MILP version of this method is a suitable contender. Another strategy is to alter the double description method to directly find rays with MCS2 instead of minimal support, e.g. by ignoring some of dimensions of the reconfigured network. Here it is important to be careful about zero-cycle flux modes, which are flux modes that have fluxes in both direction of a split reversible reaction. These are not valid flux modes, but they do appear in the output of the double description method (Gagneur and Klamt, 2004) and they may cause the omission of some rays which contain them in their support. As we mentioned, there are many models for which our method outperforms all other existing methods, while for some models, the best performance is obtained by the Berge algorithm. The challenge is to find out what features of these models are different, and then to decide ahead of time what method to choose for a given model. Our method is based on novel insights, and may be refined further. Possible additional sources of improvement include identifying and removing unwanted supersets during the execution of the double description method and optimizing the process of superset removal during post-processing. We believe that our method opens the door to further ideas exploring this different kind of duality between EFMs and MCSs, and deeper insights into the structure of metabolic network models.

Funding

TS would like to acknowledge financial support from an NSERC Discovery Grant. LC would like to acknowledge financial support from a Sloan Foundation Fellowship and an NSERC Discovery Grant. Conflict of Interest: none declared. Click here for additional data file.

32 in total

1. A general definition of metabolic pathways useful for systematic organization and analysis of complex metabolic networks.

Authors: S Schuster; D A Fell; T Dandekar
Journal: Nat Biotechnol Date: 2000-03 Impact factor: 54.908

2. Minimal cut sets in biochemical reaction networks.

Authors: Steffen Klamt; Ernst Dieter Gilles
Journal: Bioinformatics Date: 2004-01-22 Impact factor: 6.937

Review 3. Genome-scale models of microbial cells: evaluating the consequences of constraints.

Authors: Nathan D Price; Jennifer L Reed; Bernhard Ø Palsson
Journal: Nat Rev Microbiol Date: 2004-11 Impact factor: 60.633

4. Generalized concept of minimal cut sets in biochemical networks.

Authors: Steffen Klamt
Journal: Biosystems Date: 2005-11-21 Impact factor: 1.973

5. Computing knock-out strategies in metabolic networks.

Authors: Utz-Uwe Haus; Steffen Klamt; Tamon Stephen
Journal: J Comput Biol Date: 2008-04 Impact factor: 1.479

6. Design, construction and performance of the most efficient biomass producing E. coli bacterium.

Authors: Cong T Trinh; Ross Carlson; Aaron Wlaschin; Friedrich Srienc
Journal: Metab Eng Date: 2006-08-15 Impact factor: 9.783

7. Computation of elementary modes: a unifying framework and the new binary approach.

Authors: Julien Gagneur; Steffen Klamt
Journal: BMC Bioinformatics Date: 2004-11-04 Impact factor: 3.169

8. Minimal reaction sets for Escherichia coli metabolism under different growth requirements and uptake environments.

Authors: A P Burgard; S Vaidyaraman; C D Maranas
Journal: Biotechnol Prog Date: 2001 Sep-Oct

9. Structural and functional analysis of cellular networks with CellNetAnalyzer.

Authors: Steffen Klamt; Julio Saez-Rodriguez; Ernst D Gilles
Journal: BMC Syst Biol Date: 2007-01-08

10. DARPP-32 is a robust integrator of dopamine and glutamate signals.

Authors: Eric Fernandez; Renaud Schiappa; Jean-Antoine Girault; Nicolas Le Novère
Journal: PLoS Comput Biol Date: 2006-11-06 Impact factor: 4.475

3 in total

1. An extended and generalized framework for the calculation of metabolic intervention strategies based on minimal cut sets.

Authors: Philipp Schneider; Axel von Kamp; Steffen Klamt
Journal: PLoS Comput Biol Date: 2020-07-27 Impact factor: 4.475

2. Speeding up the core algorithm for the dual calculation of minimal cut sets in large metabolic networks.

Authors: Steffen Klamt; Radhakrishnan Mahadevan; Axel von Kamp
Journal: BMC Bioinformatics Date: 2020-11-09 Impact factor: 3.169

3. Rapid-SL identifies synthetic lethal sets with an arbitrary cardinality.

Authors: Payam Setoodeh; Habil Zare; Mehdi Dehghan Manshadi
Journal: Sci Rep Date: 2022-08-18 Impact factor: 4.996

3 in total