Literature DB >> 32069395

MEMO: A Method for Computing Metabolic Modules for Cell-Free Production Systems.

Abstract

Cell-free bioproduction systems represent a promising alternative to classical microbial fermentation processes to synthesize value-added products from biological feedstocks. An essential step for establishing cell-free production systems is the identification of suitable metabolic modules with defined properties. Here we present MEMO, a novel computational approach to find smallest metabolic modules with specified stoichiometric and thermodynamic constraints supporting the design of cell-free systems in various regards. In particular, one key challenge for a sustained operation of cell-free systems is the regeneration of utilized cofactors (such as ATP and NAD(P)H). Given a production pathway with certain cofactor requirements, MEMO can be used to compute smallest regeneration modules that recover these cofactors with required stoichiometries. MEMO incorporates the stoichiometric and thermodynamic constraints in a single mixed-integer linear program, which can then be solved to find smallest suitable modules from a given reaction database. We illustrate the applicability of MEMO by calculating regeneration modules for the recently published synthetic CETCH cycle for in vitro carbon dioxide fixation. We demonstrate that MEMO is very flexible in taking into account the diverse constraints of the CETCH cycle (e.g., regeneration of 1 ATP, 4 NADPH and of 1 acetyl-group without net production of CO2 and with permitted side production of malate) and is able to determine multiple solutions in reasonable time in two large reaction databases (MetaCyc and BiGG). The most promising regeneration modules found utilize glycerol as substrate and require only 8 enzymatic steps. It is also shown that some of these modules are robust against spontaneous loss of cofactors (e.g., oxidation of NAD(P)H or hydrolysis of ATP). Furthermore, we demonstrate that MEMO can also find cell-free production systems with integrated product synthesis and cofactor regeneration. Overall, MEMO provides a powerful method for finding metabolic modules and for designing cell-free production systems as one particular application.

Entities: Chemical Disease Gene Species

Keywords: cofactor regeneration; constraint-based modeling; design of cell-free systems; metabolic networks; mixed-integer linear programming; synthetic biology

Mesh：

Substances：

Year: 2020 PMID： 32069395 PMCID： PMC7308072 DOI： 10.1021/acssynbio.9b00434

Source DB: PubMed Journal: ACS Synth Biol ISSN： 2161-5063 Impact factor: 5.110

The biobased production of chemicals, materials, and fuels represents one important building block to overcome the dependency on fossil feedstocks. Apart from the replacement of petrochemical feedstocks with biorenewable resources, this includes the development and use of biochemical conversion systems to synthesize value-added products. Biochemical production processes often rely on microbial fermentation where specifically selected or/and genetically (re)designed micro-organisms convert a given substrate to the target product. The optimization of suitable cell factories with superior performance is a key topic of metabolic engineering.[1] This top-down approach has proven successful for biosynthesis of a number of industrially relevant products.[2] An alternative—mainly bottom-up—approach is the use of cell-free production systems, where purified enzymes and selected metabolites are combined in vitro to facilitate bioconversion.[3−6] While cell-free systems initially consisted only of a single or a few enzymes (e.g., in the classical field of biocatalysis), the complexity and scope of cell-free bioproduction systems has grown rapidly in recent years. This includes the successful in vitro reconstruction of entire central metabolic pathways[7,8] and of cell-free production systems for a range of chemicals, for example, ethanol, isobutanol,[9] 1,3-propanediol,[10] and monoterpenes.[8] Moreover, cell-free systems allow the construction of completely new (synthetic) pathways such as the recently published CETCH cycle for carbon dioxide fixation which involves 17 reaction steps.[11] As detailed in the literature,[4,5] both cell-based and cell-free production processes have particular advantages and weaknesses. For example, while cellular systems provide a robust environment for enzyme operation and enable a cheap and integrated synthesis of the required pathway components, cell-free systems allow one to assemble completely new pathways, to precisely control system parameters, and to circumvent the intricate cellular (e.g., genetic) regulation often impeding higher yields and productivities of whole-cell systems. Furthermore, since biomass synthesis does not take place in cell-free systems, up to 100% conversion of the substrate into the desired product is possible and pathways involving toxic compounds can be used in vitro that would not be feasible in vivo. However, one particular challenge for cell-free systems is the balancing and regeneration of cofactors (mainly but not exclusively ADP/ATP and NAD(P)/NAD(P)H) consumed or produced by the in vitro metabolic pathway.[4] While cells can balance the net consumption or net production of those cofactors via multiple dedicated metabolic pathways, their imbalance in cell-free systems will quickly stop the operation of the in vitro system. Accordingly, different regeneration submodules have been used to replenish cofactor pools. For example, for regenerating ATP from ADP, Schwander et al.(11) added polyphosphate and a polyphosphate transferase to the carbon dioxide fixing CETCH cycle. A similar approach has been used for cell-free production of nucleotide sugars.[12] Although this kind of ATP regeneration system is simple and relatively cheap and worked sufficiently in these examples, it has the disadvantage of accumulation of inorganic phosphate which, for thermodynamic reasons, may slow down the ATP regeneration step and thus negatively affect the efficiency of the whole system in longer batch runs. Similar negative effects arise when using other compounds with high-energy phosphate bonds (such as PEP), which have frequently been used for cell-free protein synthesis.[3] Alternative ATP regeneration systems have therefore been developed, for example, via generation of acetyl-phosphate from pyruvate[13] (possibly extend by several upstream glycolytic steps[14]), which is then used to produce ATP via acetate kinase. This approach circumvents net production of inorganic phosphate but leads in turn to the accumulation of acetate, which may again have adverse (e.g., inhibitory) effects on the efficiency of the ATP regeneration cycle or/and of the whole process when running over longer time periods. Regarding regeneration of NAD(P)H, Schwander et al.(11) used formate as an electron donor to replenish the NADPH pool via a formate dehydrogenase. While this one-step module is simple and cheap and was fully sufficient to demonstrate the CO2-fixing capability of the synthetic CETCH cycle, it would, in a realistic application, negatively affect the overall stoichiometry of CO2 fixation because CO2 is released as side product of the formate dehydrogenase reaction (however, if formate is produced electrochemically (from CO2), then its use would be carbon neutral). These examples underline the necessity of a systematic approach to identify suitable regeneration modules for cell-free production systems. As a minimal requirement, a regeneration module must fulfill certain stoichiometric and possibly further specifications so that its coupling with the actual production module leads to an optimal balancing of all cofactors in the entire system. Figure illustrates the situation for the CETCH cycle, which will serve as a running example in this study. As mentioned above, this synthetic cycle is a collection of enzyme-catalyzed reaction steps that assimilate CO2in vitro to form glyoxylate as primary product.[11] Since its first conception it has been improved in many iterations, and in this study we focus on version 5.0, which includes also reactions that combine glyoxylate with externally provided acetyl-CoA to produce the end product malate.[11] CETCH 5.0 comprises 17 enzyme-catalyzed reactions, thereof 14 core reactions, a catalase reaction decomposing hydrogen peroxide (produced as side product during the CETCH cycle) to water and oxygen, as well as the two aforementioned reactions for regenerating ATP (via polyphosphate and polyphosphate kinase) and NADPH (via formate and formate dehydrogenase). Leaving out the two reactions for cofactor regeneration the net stoichiometry of the CETCH cycle reads:Assuming external provision of oxygen and carbon dioxide, the CETCH cycle thus requires for one iteration the provision of 1 ATP, 4 NADPH, and 1 acetyl-CoA (Figure ). In the original work, ATP and NADPH were regenerated with the two described mechanisms, whereas acetyl-CoA was externally provided. However, since acetyl-CoA is very expensive and because its consumption leads to net accumulation of CoA and thus potentially to inhibitory effects over longer batch runs, we will herein consider the task to find a suitable regeneration module that replenishes not only the pools of ATP (from ADP and Pi) and NADPH (from NADP) but also of acetyl-CoA (from CoA). For the special context of the CETCH cycle we demand in addition (1) that no carbon dioxide is released as this would compromise the ultimate goal of the CETCH cycle, namely carbon dioxide fixation, (2) that malate is the only allowed organic byproduct of the regeneration module (but its production is not mandatory), and (3) that the algorithm may choose from a given set of possible (typical) substrates (including glucose, glycerol, acetate, and others). Constraint (2) renders the problem significantly harder but ensures that malate is the only organic product of the CETCH cycle when being coupled with the regeneration module, which will simplify downstream processing.

Figure 1

Application of the MEMO algorithm for finding suitable regeneration modules for the cell-free operation of the CETCH cycle.

Application of the MEMO algorithm for finding suitable regeneration modules for the cell-free operation of the CETCH cycle. Herein we present MEMO, a computational method to find smallest metabolic modules with specified stoichiometric and thermodynamic constraints. MEMO supports the design of cell-free systems in various regards, and the design of cofactor regeneration modules is one particular application on which we will focus herein. In this context, MEMO takes as input a universe of metabolic reactions and identifies smallest, thermodynamically feasible regeneration modules delivering cofactors with specified stoichiometric requirements, e.g., as posed for the CETCH cycle (Figure ). MEMO is generically applicable and based on constraint-based modeling techniques. We demonstrate its applicability for the CETCH cycle example showing that even for this case with rather specific requirements metabolic regeneration modules can be found with as few as 8 reactions. As reaction universe we use modified versions of the BiGG[15] and the MetaCyc[16] databases (Figure ), where reactions between different compartments (and within the membrane compartment) have been removed as their use would complicate the cell-free production system. Apart from the application to find regeneration modules, at the end we will demonstrate that MEMO can also be used to find entire cell-free systems with integrated production pathway and regeneration module ensuring balanced overall conversions.

Methods

Mixed-Integer Linear Program for Calculating Metabolic Modules

In the following, we describe MEMO, a method to compute, from a given reaction database, metabolic modules fulfilling certain specifications. The method is based on an extended version of OptMDFpathway,[17] which is a mixed-integer linear program (MILP) originally developed to identify pathways with highest thermodynamic driving force (according to the max-min driving force (MDF) definition given in[18]) within a given metabolic reaction network. Using OptMDFpathway as a starting point ensures that the computed metabolic modules are (1) balanced and (2) thermodynamically feasible with a (maximal) overall driving force (MDF) exceeding a specified threshold. However, here we need to extend OptMDFpathway in order to enforce the required stoichiometries of the metabolic modules to be identified and to ensure that we find the minimal module (in terms of number of reactions) with these properties. Furthermore, while the original OptMDFpathway implementation was based on the standard Gibbs free energies ΔG′0 of reactions, the extended formulation can alternatively use the Gibbs free energies of formation ΔG′0 of the metabolites. Given a metabolic network (in our application, this will be a reaction database), we consider the following equations of a classical flux balance analysis (FBA) problem on the reaction rates r:As usual, the matrix N comprises the internal metabolites that are considered to be in steady state (eq ). We assume that the network contains only irreversible reactions (α ≥ 0), that is, reversible reactions have been split into forward and backward directions. Furthermore, for technical reasons we demand that the upper flux bounds β must be finite. Constraint eq can be used to add other inequality constraints (like yield constraints) to specify desired properties. Importantly, eqs and 3 will later also be used to incorporate desired stoichiometric requirements of the metabolic module (in the context of regeneration modules, these will be the required cofactor stoichiometries). The thermodynamic driving forces of the reactions are defined as their negative change of Gibbs free energy and collected in vector f. If the standard Gibbs free energies changes of the reactions (ΔG′0) are available and used as thermodynamic parameters then the driving force of reaction i is given byN̂•, is the transposed i-th column (reaction) of the full stoichiometric matrix N̂. N̂ extends the stoichiometric matrix of the internal metabolites (N) with external metabolites which need not be in steady state. R is the ideal gas constant, T the temperature used for determining the Gibbs free energies, and x contains the logarithmized metabolite concentrations. As the exact concentration values are typically not known one demands that x must comply with certain concentration ranges:Eq can be alternatively written with the Gibbs free energies of formation of metabolites (ΔG′):If the standard Gibbs free energy of formation of metabolite j (ΔG′0) is known thenotherwise the value of ΔG′ is allowed to assume an arbitrary value within a specified range bywhere L is a sufficiently large constant (e.g., 10 000). Using constraint eq together with eqs or 7b ensures that no thermodynamically infeasible cycles will arise in the identified solutions for the rate vector r. This even holds true for eq where the ΔG′ is practically unspecified (this approach is similar to st-FBA as proposed by Noor[19]). In a preprocessing step we determine the minimum (f) and maximum (f) values for each driving force f, which depend on the concentration ranges eq and/or on the value of L in case constraints of type eq are present. Each reaction is associated with a binary variable z that must be 1 when there is a flux through this reaction. This is achieved by the constraintsIn order to ensure that the minimal driving force of all active reactions (r > 0) is greater than a given value B > 0 (the driving force of inactive reactions with zero flux is not taken into account) the following constraints are added to the optimization problem: With K = max(f), we set M = K – f, and with B ≤ K, these constraints are always fulfilled for all reactions with r = z = 0. For all reactions with z = 1 it must hold that f ≥ B. Thus, B acts as a demanded lower bound on the MDF of the solution. Finally, in order to minimize the number of active reactions in a given solution the objective functionis imposed, subject to the relevant constraints in eqs –9. Its solution (x, r, z) will deliver a flux distribution r where, given the calculated logarithmized concentration vector x, each active reaction has a driving force of at least B. With the enforced minimization of the number of active reactions, the rate vector r of the solution will represent an elementary flux vector (EFV[20]) (cf. also ref (17)). This means that in networks, where a full enumeration of EFVs is computationally feasible, the optimal solution could be identified by screening the set of these vectors, and even a ranking of all possible minimal solutions could be generated. However, herein we will deal with large networks with thousands of reactions where this is not possible. Instead, the optimal solution of the mixed-integer linear programming (MILP) problem posed by eqs –10 (with x and r as continuous and z as integer variables) will be identified with appropriate MILP solvers.

Incorporation of Required Stoichiometries of Regeneration Modules

Generally, when using MEMO to find certain metabolic modules, the stoichiometric input/output requirements (i.e., the net conversion) of the module must be specified appropriately. In particular, employing MEMO to identify suitable cofactor regeneration modules for a given cell-free production system, one needs to specify (a) a set of possible substrates that can be used, (b) certain stoichiometries of the cofactors to be regenerated, and (c) allowable byproducts of the regeneration module. The set of allowed substrates and byproducts can be easily specified by setting the respective upper bounds β of the substrate uptake and product excretion reactions either to β > 0 (allowed substrate or byproduct) or to zero (not allowed). Compared to classical metabolic network models, which often allow synthesis of, e.g., certain fermentation products, typically only very few (or even no) byproducts will be allowed for regeneration modules. To finally consider the demanded net synthesis of cofactors such as ATP, NAD(P)H, or others, we need to include appropriate demand (pseudo)reactions. For example, the provision of ATP by the regeneration module (and its consumption in the cell-free production system) is modeled by including a reaction that consumes (hydrolyzes) ATP yielding ADP and orthophosphate:Likewise, for the provision (regeneration module) and consumption (production module) of NADH and NADPH, we may include the reactionsandOther cofactors or metabolites to be regenerated can be included as well. In certain applications, NAD(P)H might occur as byproducts of the production module, in this case the regeneration module would need to replenish the NAD(P) pool; hence, the reactions or 13 would be specified in the reverse direction. Finally, the concrete requirements of the cofactor stoichiometries are configured by setting the respective bounds of these exchange reactions in eq . For example, a demanded stoichiometric ATP/NADH ratio of 2:1 would be enforced by setting αATPconsum = βATPconsum = 2 and αNADHconsum = βNADHconsum = 1 effectively fixing rATPconsum = 2 and rNADHconsum = 1. Sometimes the sum of the produced reduction equivalents must give a certain number (e.g., rNADHconsum + rNADPHconsum = 4); such constraints can be incorporated in eq .

Reaction Databases (Universal Networks)

An essential input for MEMO is a metabolic network or, more generally, a reaction universe or database, within which the metabolic module can be identified. Due to the here intended use of MEMO to identify metabolic regeneration modules for cell-free production systems, we are free to compile enzymatic reaction steps from different organisms. On the other hand, we assume that the use of cellular compartments (e.g., mitochondria, chloroplasts etc.) within the cell-free production system is not possible or to be avoided as they would add an additional layer of complexity. We therefore do not consider reactions that involve metabolites from different compartments. Likewise, reactions involving quinone metabolites (which reside in membranes) are not taken into account. Of course, these constraints can be adapted or relaxed for certain applications. In the following we describe the setup of two such universal network models, which were derived from the BiGG[15] and from the MetaCyc[16] database, respectively.

BiGG Model

The BiGG model is a fusion of the 85 models present in the BiGG database[15] (accessed February 2018). As a first step, the models were successively merged while at the same time technical reactions (exchanges, demands, sinks) were removed. For model merging the mergeTwoModels function of the COBRA toolbox[21] was used. This function removes duplicate reactions via the checkDuplicateRxn function, which we extended with a new option that recognizes duplicate reactions by their stoichiometry (up to a scalar factor) and also properly handles the reaction reversibility. Next, reactions that involve metabolites from different compartments were removed for the reasons described above. Although the metabolite IDs in the BiGG models are largely unique, some of them have different formulas in different compartments (mostly due to different protonation states). Such IDs were semiautomatically disambiguated by choosing either a common formula or by making the IDs explicitly distinct. After this the mass balance for the elements C, H, O, N, S, P as well as the charge balance for all reactions were determined and the unbalanced reactions removed. Metabolites occurring in different compartments were mapped onto one ID and the compartment information was removed. Reaction duplicates that had been produced during this process were afterward merged with the extended checkDuplicateRxn function. Finally, reactions that involve any quinone metabolites (which reside in membranes) were deactivated for the calculations. The resulting adapted BiGG network model comprises 5863 metabolites and 8983 reactions. The ΔG′0 values [kJ/mol] for the metabolites were retrieved, as far as available, from the eQuilibrator API[22] by applying a mapping from BiGG metabolite IDs to KEGG IDs. For reactions in which all involved metabolites have a known ΔG′0, reaction reversibility was restricted according to their minimal/maximal driving force. The reversibility of the remaining reactions was left as defined in the BiGG models. For the calculations, driving forces were set up according to eq together with ΔG′ constraints of the form eq or 7b (accordingly, eq was left out).

MetaCyc Model

This model was constructed from the flat file distribution of MetaCyc[16] version 22.6, which contains the files compounds.dat (metabolites) and metabolic-reactions.xml, which were parsed for this purpose. As for the BiGG model, only those reactions were kept whose metabolites all come from the same compartment, and the compartment information for the metabolites was then removed. Duplicate reactions were removed, the mass balance for the elements C, H, O, N, S, P as well as the charge balance were determined, and the unbalanced reactions were removed. Furthermore, reactions that involve metabolites without known ΔG′0 values were removed, and ΔG′0 values (in MetaCyc given in [kcal/mol]) for the remaining reactions were calculated from the ΔG′0. For each reaction, its reversibility was restricted in accordance with its minimal/maximal driving force. Because in this model all reactions have a defined ΔG′0 value, constraints of type eq were used for the driving forces (accordingly, eqs , 7a, and 7b were left out). It should be noted that although the ΔG′0 values in MetaCyc have been determined by a predecessor of the eQuilibrator method,[22] the actual values for the same reaction can be quite distinct between both. The adapted MetaCyc network model comprises 8964 metabolites and 11634 reactions.

Implementation and Availability

Calculations were performed using dedicated functions from (version 2019.3) CellNet-Analyzer,[23,24] including the OptMDFpathway function, which was extended as described in the Methods section. A package with the MATLAB script files for the calculations and with the models (in COBRA format) is available from the following GitHub repository: https://github.com/ARB-Lab/MEMO.git. CPLEX 12.8 was used as MILP solver. The calculations were performed on a cluster node with two Intel Xeon 8-core processors and 192 GB RAM.

Results and Discussion

Computing Regeneration Modules for the CETCH Cycle

To demonstrate the applicability of our MEMO method we employed it, together with the two adapted BiGG and MetaCyc reaction databases (see Methods), to compute regeneration modules that perfectly complement the CETCH cycle.[11] The precise stoichiometric requirements of the CETCH 5.0 cycle were already described in the introduction section and are illustrated in Figure . The following constraints were accordingly set for the MEMO optimization problem: In one iteration, the CETCH cycle consumes 4 NADPH, 1 ATP, and 1 acetyl-CoA (Figure ). Accordingly, the flux through the NADPH consumption reaction was fixed to 4, and the NADH consumption reaction was fixed to zero flux. The consumption of acetyl-CoA was accounted for by introducing another artificial consumption reactionand by fixing its flux value to 1. When computing regeneration modules, it is important to precisely specify the allowed end products (apart from the cofactor requirements). An accumulation of undesired byproducts may not only slow down the regeneration module (and possibly also the actual production system) but also compromise the economic viability of the whole process, for example, due to necessary separation of the main product from byproducts in downstream processing. The main product of the CETCH cycle is malate, and we therefore allowed a net synthesis of this metabolite in the regeneration module as well. Thus, a malate exchange reaction was introduced in the model but without fixing its flux (its synthesis in the regeneration module is thus permitted but not mandatory per se). Additionally, water and oxygen can be exchanged with the environment. Since the CETCH cycle aims to fix carbon dioxide, we did not permit net production of carbon dioxide as this would compromise the efficiency of the whole process. Furthermore, in the BiGG model, protons can also be exchanged while in the MetaCyc model charged metabolites are balanced with protons in their exchange reactions (e.g., when acetate is exchanged this occurs together with one proton). Because all non-exchange reactions are H- and charge-balanced any proton exchange flux in the BiGG model only reflects the difference between the dissociated protons of substrate and product. We separately computed regeneration modules for 9 simple and cheap substrates yielding a set of distributed entry points in the central metabolism: acetate, glycerol, glucose, sorbitol, formate, xylose, lactate, methanol, and succinate. The lower bound B for the thermodynamic driving force (MDF) in eq was set to 0.01 kJ/mol. The metabolite concentrations were allowed to vary between 100 mM and 1 μM, except for CO2 and bicarbonate for which the lower/upper bound was set to 0.1 μM/1 mM. We aimed to calculate the 10 shortest regeneration modules for each substrate. Computation time for each single optimization was limited to 1 h, which, in several cases, was sufficient to reach the guaranteed optimum. Typically, the optimum becomes more difficult to prove with an increasing number of reactions in the smallest regeneration module. Nonetheless, even if the optimum was not proven within 1 h one can usually get a good solution. This can then be evaluated together with the best bound (minimum number of reactions required for any solution) to decide whether or not it may be useful to invest more computation time to try and search for a smaller solution. In a few cases the solver was neither able to prove the infeasibility of the problem nor able to find a feasible solution within 1 h. In such cases the optimization might be repeated with different solver settings and/or the time limit be extended to see if a solution can be found. The results of the computations are shown in Table and in the Supporting Information. Within the given time frame, CETCH regeneration modules were found for 6 out of the 9 substrates in both networks (glucose, glycerol, sorbitol, xylose, lactate, and methanol). All found solutions require between 8 and 16 reactions (excluding the exchange/consumption reactions), which is thus in all cases less than the 17 reaction steps of the entire CETCH cycle. Where both networks offered solutions, they are of the same size or shorter in MetaCyc. In contrast, no solutions were found in either network for the substrates acetate, succinate and formate. In the MetaCyc network, flux variability analysis (without thermodynamic constraints) already reveals that no solution is possible for formate.

Table 1

Smallest CETCH Regeneration Modules (Number of Reactions) and Their Largest Thermodynamic Driving Force (MDF) Found for the Different Substrates in the MetaCyc and in the BiGG Networka

	MetaCyc		BiGG
substrate	number of reactions in smallest module	max MDF [kJ/mol]	number of reactions in smallest module	max MDF [kJ/mol]
glucose	14	12.08	15	2.96
sorbitol	12	8.83	12	1.47
glycerol	8	18.01	8	12.38
methanol	11	3.33	12	3.27
xylose	13	16.28	16	11.50
lactate	11	8.84	14	1.13
succinate	not found		not found
acetate	not found		not found
formate	not found (proven to be infeasible)		not found

Computation time limited to one hour per solution; at most 10 solutions per substrate were computed. The unit of the MDF values from the MetaCyc model were converted from [kcal/mol] to [kJ/mol].

Computation time limited to one hour per solution; at most 10 solutions per substrate were computed. The unit of the MDF values from the MetaCyc model were converted from [kcal/mol] to [kJ/mol]. The shortest regeneration modules were found with glycerol as substrate and require only 8 reactions (proven by the solver to be optimal) in both networks. One of the MetaCyc solutions of size 8 is shown in Figure . In this solution, glycerol is oxidized in two steps to glycerate which in total yields two NADPH. Glycerate is then phosphorylated to 2-phospho-glycerate (2-PG) via the backward direction of the phosphoglycerate phosphatase. Interestingly, this phosphatase has a ΔG′0 of 6.7 kcal/mol and thus favors actually the kinase direction. 2-PG is then converted by the enolase to phosphoenol-pyruvate (PEP) where the pathway splits into two branches. In the first branch PEP is dephosphorylated to pyruvate from which acetyl-CoA, CO2, and NADH are produced. In the second branch PEP is carboxylated to oxaloacetate with concomitant ATP formation. Under consumption of NADH from the first branch oxaloacetate is then converted to malate.

Figure 2

CETCH regeneration module of size 8 found with glycerol as substrate in the MetaCyc database (cf. MetaCyc solution 6 for glycerol in Supporting Information). Including the CETCH cycle, the overall reaction is 2 CO2 + 2 glycerol + O2 → 2 malate + 2 H2O (+ 4 protons). The reactions (MetaCyc identifiers) are (a) GLYCEROL-DEHYDROGENASE-NADP+-RXN, (b) RXN-15115, (c) PHOSPHOGLYCERATE-PHOSPHATASE-RXN, (d) 2PGADEHYDRAT-RXN, (e) PHOSPHOENOLPYRUVATE-PHOSPHATASE-RXN, (f) PYRUVDEH-RXN, (g) PEPCARBOXYKIN-RXN, (h) MALATE-DEH-RXN. The red numbers indicate the (relative) fluxes needed to deliver the cofactors in required stoichiometries to the CETCH cycle (indicated in blue). For the BiGG network, one of the two solutions with 8 reactions (solution 1 in Supporting Information) is shown in Figure . Here, three molecules of glycerol (with concomitant NADPH production) are converted to dihydroxyacetone which is then phosphorylated. From dihydroxyacetone-phosphate (dhap) the pathway splits in two branches. In the first branch two dhap are converted to methyglyoxal from which pyruvate and NADPH are produced. By two separate reactions, pyruvate is converted to acetyl-CoA and malate respectively. In the second branch from dhap, this compound is converted to glycerol-3-phosphate from which ATP and glycerol are produced. Therefore, this regeneration module contains a cyclic (sub)pathway through glycerol constituted by reactions GLYCD, r0242, r0202, and GLYK. However, this does not constitute a thermodynamically infeasible cycle because its net stoichiometry is not empty:Although the ΔG′0 of this partial overall reaction is 27.4 kJ/mol, its ΔG′ can indeed become negative under a suitable metabolite concentration profile, which is compatible with the concentration bounds used for the calculations here.

Figure 3

CETCH regeneration module of size 8 found with glycerol as substrate in the BiGG database (cf. BIGG solution 1 for glycerol in Supporting Information). Including the CETCH cycle, the overall reaction is 2 CO2 + 2 glycerol + O2 → 2 malate + 2 H2O (+ 4 protons). The reactions (BiGG identifiers) are (a) GLYCDy, (b) r0242, (c) r0202, (d) GLYKm, (e) MGSA, (f) ALR, (g) PDHm, (h) ME_x. The red numbers indicate the (relative) fluxes needed to deliver the cofactors in required stoichiometries to the CETCH cycle (indicated in blue). To further analyze the regeneration modules we calculated the elementary modes[20] within each module (cf. Supporting Information). This revealed that each of the two solutions described above is the only steady-state solution within its regeneration module. However, there are also regeneration modules that contain more than one elementary steady-state solution, and therefore can potentially realize multiple cofactor regeneration requirements. As an example, consider the second glycerol solution found in the MetaCyc model (Figure ), where two elementary modes exist. The difference between the modes is that in the first mode the phosphorylation of (the two) glycerate molecules requires no ATP (reaction (c) in Figure ), whereas ATP is consumed in the second mode (reaction (d) in Figure ). The first mode thus has a 2:4:1 (ATP:NADPH:acetyl-CoA) ratio, while the second mode produces a 0:4:1 ratio. When both modes operate together in a 1:1 ratio the result is the desired 1:4:1 ratio, which was enforced by setting the respective fluxes of the cofactor exchange reactions when computing the modules. While the variability with two modes may, at a first glance, appear disadvantageous, it should be noted that when coupling production and regeneration module, operation of the production module will push the net stoichiometry of the regeneration module toward what is consumed in the net by the production module. Moreover, regeneration modules consisting of multiple elementary modes have the advantage that they can, through variation of the flux ratio between the modes, compensate for potential fluctuations in the ATP level, caused, e.g., by its spontaneous hydrolysis. Thereby, this regeneration module may act in a similar manner as the ATP rheostat described in ref (25).

Figure 4

CETCH regeneration module of size 8 found with glycerol as substrate in the MetaCyc database (cf. MetaCyc solution 2 for glycerol in Supporting Information). Including the CETCH cycle, the overall reaction is 2 CO2 + 2 glycerol + O2 → 2 malate + 2 H2O (+ 4 protons). The reactions (MetaCyc identifiers) are (a) GLYCEROL-DEHYDROGENASE-NADP+-RXN, (b) RXN-15115, (c) PHOSPHOGLYCERATE-PHOSPHATASE-RXN, (d) GKI-RXN, (e) 2PGADEHYDRAT-RXN, (f) PEPDEPHOS-RXN, (g) PYRUVDEH-RXN, (h) RXN-19748. The red numbers indicate the (relative) fluxes needed to deliver the cofactors in required stoichiometries to the CETCH cycle (indicated in blue). Of all glycerol solutions, the discussed MetaCyc solution 2 in Figure actually is one of those with the highest MDF for this substrate (18.01 kJ/mol, Table ). The MetaCyc solution in Figure has an MDF of only 3.22 kJ/mol, which is similar to that for the BiGG solution in Figure (3.34 kJ/mol). Therefore, from a thermodynamic standpoint, MetaCyc solution 2 would be preferable. It has been suggested that a MDF of 3 kJ/mol should be sufficient to allow large net fluxes through all participating reactions.[18]Table shows that not for all substrate/network combinations considered here this threshold can be reached with the solutions we calculated so far. However, such a criterion could be enforced during the calculations by setting the value of B in eq to a desired minimum threshold. Following glycerol, methanol is the substrate that allows for the next smallest regeneration modules with 11 reactions in MetaCyc and 12 reactions in BiGG. Exemplarily we discuss the first solution found in MetaCyc and BiGG. In both solutions, the first step after methanol uptake is its conversion to formaldehyde with the concomitant formation of NADH. After this, formaldehyde is bound to a tetrahydrofolate species as a methyl-group. From a part of the methylated tetrahydrofolate the methyl-group is released as glycine under the consumption of CO2, NH4 (which will be released again in later steps), and NADH. A part of this glycine combines with the remaining methylated tetrahydrofolate to produce serine. The remaining glycine is converted to glyoxylate and with acetyl-CoA further to malate, using different reactions in the two models. In the CETCH cycle, the fusion of glyoxylate and acetyl-CoA takes place as well,[11] and it is conceivable that the enzymes involved there could perform the same function if combined with the other reactions from the regeneration modules thereby reducing the number of additional regeneration enzymes needed. In both regeneration modules, serine is converted to pyruvate, which is the basis for acetyl-CoA formation via the pyruvate dehydrogenase. Also, both solutions include a NADH to NADPH transhydrogenase. Nonetheless, there are some differences: The MetaCyc solution requires one reaction less because ATP regeneration occurs together with malate formation and NADPH regeneration in the pyruvate dehydrogenase reaction, while in BiGG both regenerations are coupled to an alpha-keto-glutarate/glutamate/glutamine conversion cycle. Furthermore, the BiGG solution requires 4 methanol (and yields 0.5 malate), while the MetaCyc solution uses 5 methanol and 1 CO2 (and produces 1 malate). Both solutions have a relatively low MDF (3.33 kJ/mol for the MetaCyc and 0.54 kJ/mol for the BiGG solution). Of the three carbohydrates glucose, xylose, and sorbitol, the last allows for the smallest regeneration modules with 12 reactions in both networks, making it the preferable substrate of these three. Despite the fact that an acetyl-group must be generated for the CETCH-cycle, acetate as substrate turned out not to be promising because in neither network a solution was found. Formate alone cannot be used for the regeneration tasks, probably because of its highly oxidized nature and the restriction that no net CO2 must be produced. For succinate no solutions were found either, despite the fact that its conversion to malate should quite easily render the required reduction equivalents. Lactate is then the only of the four acids for which regeneration modules were found. In MetaCyc, the smallest module contains only 11 reactions, while in BiGG it needs 14 reactions.

Thermodynamics of the Combined Production (CETCH) and Regeneration Modules

The MDF in Table and discussed above are given for the stand-alone regeneration modules and can decrease when the modules are integrated with the CETCH cycle. However, it is straightforward to calculate the resulting MDF of the integrated system by taking the union of the regeneration module reactions (without the consumption pseudoreactions) and the CETCH reactions and recalculating the MDF. The results are shown in the Supporting Information, and it turns out that the regeneration modules shown in Figures –4 are still thermodynamically feasible when combined with the CETCH cycle, although with reduced MDF. However, several of the other solutions have an MDF < 0 when combined with the CETCH cycle, which means that they would not be suitable regeneration modules in this context. It is possible to exclude such solutions during the MILP optimization directly by enforcing, e.g., the CETCH reactions to be active with the required fluxes (using α and β of eq ) instead of the consumption pseudoreactions (eqs –14). For the regeneration modules that are thermodynamically feasible when combined with the CETCH cycle, we also calculated the concentration ranges of the participating metabolites that are necessary to keep the MDF ≥ 0.01 (cf. Supporting Information, only those metabolites are listed whose lower or upper concentration bound differs from the default concentration bound). Such results are potentially useful for assessing the overall practicality of the pathway.

Regeneration Modules with Multiple Substrates and Alternative Cofactor Requirements

So far, we only considered single inputs, but it is also possible to have multiple substrate uptakes open together to see whether this would allow for smaller regeneration modules. We therefore calculated 20 solutions in each network with all 9 substrate uptakes open (cf. Supporting Information). It turns out that all these solutions require 8 reactions which means that, in this setting, multiple substrates do not make smaller regeneration modules possible. In fact, for the BiGG model, all 20 solutions use glycerol as single substrate only. All MetaCyc solutions also use glycerol, but there are some that co-utilize acetate or acetate and formate. Interestingly, neither acetate nor formate are usable as single substrate, but might be useful in conjunction with glycerol. All the results above fulfill the requirements of the CETCH 5.0 cycle, but earlier CETCH versions had different cofactor requirements.[11] For instance, if the second reductive carboxylation step (acrylyl-CoA to methylmalonyl-CoA) is replaced by an ATP-dependent carboxylation step (propionyl-CoA to methylmalonyl-CoA) as in CETCH 1.0, then one iteration of this modified cycle requires 3 NADPH, 2 ATP and 1 AcCoA (instead of 4 NADPH, 1 ATP, and 1 AcCoA in CETCH 5.0). We repeated the single substrate calculations with these new requirements to assess the changes in minimal regeneration module sizes. The results (cf. Supporting Information) show that, depending on the network and the substrate, the minimal module sizes differ by at most one reaction. In the BiGG model, the minimal size increases slightly for glycerol and lactate while it remains unchanged for the other substrates. For the MetaCyc model, the size decreases for glucose, sorbitol, and methanol, but increases for xylose and lactate. Thus, whether trading one NADPH for one ATP is favorable or not depends on the substrate used in the regeneration module.

Computing Modules with Integrated Production and Regeneration

So far, the focus of MEMO has been on regeneration modules for cofactors, which are often required for cell-free production system. However, MEMO is more general and can search for any metabolic module as long as the module specification can be formulated with the type of constraints used in MEMO. In fact, with MEMO it is also possible to design entire production systems with desired input-output stoichiometries where production and cofactor regeneration are integrated. Computing such integrated modules comes with the advantage that thermodynamic infeasibilities cannot arise when merging production and regeneration module (as seen for some regeneration modules computed for the CETCH cycle). As an example, we consider the production of the monoterpene limonene from glucose which has recently been achieved with a cell-free system.[8] This particular conversion comprises 23 reaction steps and produces one mol limonene per three mol glucose (Supporting Figure 1 in the original ref (8)). It basically consists of two parts, the glycolysis which converts glucose to acetyl-CoA and cofactors and the mevalonate pathway that uses acetyl-CoA together with the cofactors to produce limonene. Because the glycolysis produces more reduction equivalents than needed for the mevalonate pathway the system has been designed to incorporate a purge valve that (i) removes excess reduction equivalents (via NoxE, an NADH oxidase) and (ii) produces balanced amounts of NADH/NADPH (by a combined use of a NAD-specific and of a NADP-specific glyceraldehyde-3-phosphate dehydrogenase). We checked whether this system is in the solution space of the MetaCyc network by fixing the fluxes of the glycolysis, mevalonate pathway, and purge valve to the values required to the production of one limonene from three glucose (Supporting Information). This leads to a solution which is the same as in ref (8), which shows that this limonene production system can in principle be found by MEMO. To look for alternative production systems, we specified a minimal yield of 1/3 limonene per glucose as sole constraint and then calculated 10 smallest solutions (cf. Supporting Information). All the found solutions require 21 reactions and thus less than for the original system. However, the MDF of the original system is higher (1.03 kcal/mol = 4.31 kJ/mol) than the best MDF of the identified alternative solutions (0.75 kcal/mol = 3.14 kJ/mol). One of the computed solutions is similar to the original system[8] in that it uses many reactions from the glycolysis, but the others mix glycolytic reactions with those of the Entner–Doudoroff pathway. There also are variations as to how cofactor production and consumption are tied together, but in all cases excess reduction equivalent removal proceeds via NAD(P)H oxidases.

Conclusions

In this work, we presented MEMO, a generic approach to identify smallest metabolic modules fulfilling specified stoichiometric and thermodynamic constraints. Although MEMO has many possible applications, herein we focused on its use for finding regeneration modules for cell-free production systems. As exemplified with the CETCH cycle, MEMO is very flexible in taking into account diverse requirements (e.g., regeneration of two cofactors plus one acetyl-group with specific stoichiometry, net production of CO2 not permitted, net synthesis of malate allowed, minimum threshold for MDF) and is able to enumerate multiple solutions in reasonable time also in very large universal networks. A method that may hold similar capabilities but was so far not used for the design of cell-free (regeneration) modules is optStoic.[26] For some predefined reactants and products, this two-step approach identifies in a first step overall conversions that maximize a given objective function (e.g., maximization of product yield). The calculations in this first step are solely based on stoichiometric balance and Gibbs energy changes of the reactants and products without taking into account the reaction steps between them. The actual metabolic pathway or network (generating the overall conversions found in the first step) is identified in the second step by searching for suitable combinations of reactions from a given reaction database. While the applicability of this approach has been proven in several case studies, for example, for finding alternative pathways in metabolic networks,[26] our approach is simpler and requires only one single optimization step to identify stoichiometrically and thermodynamically feasible solutions. In fact, overall conversions found in the first step of optStoic might not have a corresponding pathway in the reaction database used in the second step. Moreover, even if a stoichiometrically balanced pathway has been identified in the second step of optStoic, it may happen that it is thermodynamically infeasible within the defined concentration ranges of the intermediate metabolites (optStoic tests, in a preprocessing step, separately for each reaction whether it can proceed in a certain direction with the given metabolite concentration ranges, however, it does not ensure that a metabolite concentration vector within the specified concentration ranges exits where all reactions of the found pathway are active in the required direction). MEMO directly accounts for thermodynamic feasibility via the MDF approach and it even allows one to set lower thresholds for the MDF thus providing a flexible approach for an integrated search of metabolic modules. The regeneration modules found for the CETCH cycle appear very interesting and, in contrast to the original system, they all avoid accumulation of side products and use of expensive compounds such as acetyl-CoA. The modules found with glycerol as substrate are especially promising and would require only 8 metabolic reactions (enzymes) which is even less than the half of the enzymes used so far in the CETCH cycle. The analysis of the glycerol solutions also revealed that the modules, although with identical net stoichiometry, may nevertheless have different properties. Apart from their thermodynamic driving force, this concerns in particular the phenomenon that the modules may either consist of a single elementary mode or of a combination of modes. In the first case, the demanded stoichiometry will be exactly fulfilled when the substrate is converted by the module. In the second case, the demanded stoichiometry is feasible with the respective module, however, alternative output stoichiometries would also be possible. When coupling production and regeneration module, operation of the production module will push the net stoichiometry of the regeneration module toward that of the production module. However, regeneration modules consisting of multiple elementary modes have the advantage that they can balance the spontaneous loss (decay) of cofactors (e.g., oxidation of NAD(P)H or hydrolysis of ATP), which, in the long run, may lead to problems in modules with fixed stoichiometries (single modes). Such flexible regeneration modules have thus similar (desired) functionalities as the recently published ATP rheostat[25] or as purge valves.[8,27] Although the application focus herein was on regeneration modules, MEMO is much more general and can be used to search for other types of cell-free (or even intracellular) metabolic modules as well. As one additional application for cell-free systems, we used the example of limonene synthesis to demonstrate that MEMO also supports the design of entire cell-free production systems with integrated production and regeneration: instead of fixing first the production module and finding then a suitable regeneration module, MEMO can also directly—in one step—identify the smallest integrated cell-free system with desired input–output stoichiometries. Overall, together with the adapted MetaCyc and BiGG master networks, our MEMO approach thus provides a powerful and flexible framework for designing cell-free production systems and can likewise be used for detecting or/and designing (intra)cellular metabolic modules with desired properties.

24 in total

1. In vitro production of n-butanol from glucose.

Authors: Borimas Krutsakorn; Kohsuke Honda; Xiaoting Ye; Takashi Imagawa; Xiaoyu Bei; Kenji Okano; Hisao Ohtake
Journal: Metab Eng Date: 2013-09-19 Impact factor: 9.783

2. Cell-free metabolic engineering: production of chemicals by minimized reaction cascades.

Authors: Jan-Karl Guterl; Daniel Garbe; Jörg Carsten; Fabian Steffler; Bettina Sommer; Steven Reiße; Anja Philipp; Martina Haack; Broder Rühmann; Andre Koltermann; Ulrich Kettling; Thomas Brück; Volker Sieber
Journal: ChemSusChem Date: 2012-10-19 Impact factor: 8.928

3. Regeneration of adenosine triphosphate from glycolytic intermediates for cell-free protein synthesis.

Authors: D M Kim; J R Swartz
Journal: Biotechnol Bioeng Date: 2001-08-20 Impact factor: 4.530

Review 4. A critical comparison of cellular and cell-free bioproduction systems.

Authors: Nico J Claassens; Simon Burgener; Bastian Vögeli; Tobias J Erb; Arren Bar-Even
Journal: Curr Opin Biotechnol Date: 2019-06-14 Impact factor: 9.740

Review 5. The emerging impact of cell-free chemical biosynthesis.

Authors: Kristen M Wilding; Song-Min Schinn; Emily A Long; Bradley C Bundy
Journal: Curr Opin Biotechnol Date: 2018-01-05 Impact factor: 9.740

6. Creation and analysis of biochemical constraint-based models using the COBRA Toolbox v.3.0.

Authors: Laurent Heirendt; Sylvain Arreckx; Thomas Pfau; Sebastián N Mendoza; Anne Richelle; Almut Heinken; Hulda S Haraldsdóttir; Jacek Wachowiak; Sarah M Keating; Vanja Vlasov; Stefania Magnusdóttir; Chiam Yu Ng; German Preciat; Alise Žagare; Siu H J Chan; Maike K Aurich; Catherine M Clancy; Jennifer Modamio; John T Sauls; Alberto Noronha; Aarash Bordbar; Benjamin Cousins; Diana C El Assal; Luis V Valcarcel; Iñigo Apaolaza; Susan Ghaderi; Masoud Ahookhosh; Marouen Ben Guebila; Andrejs Kostromins; Nicolas Sompairac; Hoai M Le; Ding Ma; Yuekai Sun; Lin Wang; James T Yurkovich; Miguel A P Oliveira; Phan T Vuong; Lemmer P El Assal; Inna Kuperstein; Andrei Zinovyev; H Scott Hinton; William A Bryant; Francisco J Aragón Artacho; Francisco J Planes; Egils Stalidzans; Alejandro Maass; Santosh Vempala; Michael Hucka; Michael A Saunders; Costas D Maranas; Nathan E Lewis; Thomas Sauter; Bernhard Ø Palsson; Ines Thiele; Ronan M T Fleming
Journal: Nat Protoc Date: 2019-03 Impact factor: 13.491

7. A synthetic pathway for the fixation of carbon dioxide in vitro.

Authors: Thomas Schwander; Lennart Schada von Borzyskowski; Simon Burgener; Niña Socorro Cortina; Tobias J Erb
Journal: Science Date: 2016-11-18 Impact factor: 47.728

Review 8. From elementary flux modes to elementary flux vectors: Metabolic pathway analysis with arbitrary linear flux constraints.

Authors: Steffen Klamt; Georg Regensburger; Matthias P Gerstl; Christian Jungreuthmayer; Stefan Schuster; Radhakrishnan Mahadevan; Jürgen Zanghellini; Stefan Müller
Journal: PLoS Comput Biol Date: 2017-04-13 Impact factor: 4.475

9. Pathway thermodynamics highlights kinetic obstacles in central metabolism.

Authors: Elad Noor; Arren Bar-Even; Avi Flamholz; Ed Reznik; Wolfram Liebermeister; Ron Milo
Journal: PLoS Comput Biol Date: 2014-02-20 Impact factor: 4.475

10. OptMDFpathway: Identification of metabolic pathways with maximal thermodynamic driving force and its application for analyzing the endogenous CO2 fixation potential of Escherichia coli.

Authors: Oliver Hädicke; Axel von Kamp; Timur Aydogan; Steffen Klamt
Journal: PLoS Comput Biol Date: 2018-09-24 Impact factor: 4.475

1 in total

1. The automated Galaxy-SynBioCAD pipeline for synthetic biology design and engineering.

Authors: Joan Hérisson; Thomas Duigou; Melchior du Lac; Kenza Bazi-Kabbaj; Mahnaz Sabeti Azad; Gizem Buldum; Olivier Telle; Yorgo El Moubayed; Pablo Carbonell; Neil Swainston; Valentin Zulkower; Manish Kushwaha; Geoff S Baldwin; Jean-Loup Faulon
Journal: Nat Commun Date: 2022-08-29 Impact factor: 17.694

1 in total