Literature DB >> 32196640

Application of dynamic metabolic flux analysis for process modeling: Robust flux estimation with regularization, confidence bounds, and selection of elementary modes.

Lukas Hebing^1,2, Tobias Neymann¹, Sebastian Engell².

Abstract

In macroscopic dynamic models of fermentation processes, elementary modes (EM) derived from metabolic networks are often used to describe the reaction stoichiometry in a simplified manner and to build predictive models by parameterizing kinetic rate equations for the EM. In this procedure, the selection of a set of EM is a key step which is followed by an estimation of their reaction rates and of the associated confidence bounds. In this paper, we present a method for the computation of reaction rates of cellular reactions and EM as well as an algorithm for the selection of EM for process modeling. The method is based on the dynamic metabolic flux analysis (DMFA) proposed by Leighty and Antoniewicz (2011, Metab Eng, 13(6), 745-755) with additional constraints, regularization and analysis of uncertainty. Instead of using estimated uptake or secretion rates, concentration measurements are used directly to avoid an amplification of measurement errors by numerical differentiation. It is shown that the regularized DMFA for EM method is significantly more robust against measurement noise than methods using estimated rates. The confidence intervals for the estimated reaction rates are obtained by bootstrapping. For the selection of a set of EM for a given st oichiometric model, the DMFA for EM method is combined with a multiobjective genetic algorithm. The method is applied to real data from a CHO fed-batch process. From measurements of six fed-batch experiments, 10 EM were identified as the smallest subset of EM based upon which the data can be described sufficiently accurately by a dynamic model. The estimated EM reaction rates and their confidence intervals at different process conditions provide useful information for the kinetic modeling and subsequent process optimization.

Entities: CellLine Chemical Disease Species

Keywords: CHO fermentations; dynamic metabolic flux analysis; elementary modes; process modeling

Year: 2020 PMID： 32196640 PMCID： PMC7317520 DOI： 10.1002/bit.27340

Source DB: PubMed Journal: Biotechnol Bioeng ISSN： 0006-3592 Impact factor: 4.530

INTRODUCTION

For model‐based optimization of fermentation processes, for example, for process design or control, simple dynamic models which are accurate enough to predict the process behavior under varying conditions are needed (Frahm et al., 2002; Neddermeyer, Rossner, & King, 2015; Teixeira, Alves, Alves, Carrondo, & Oliveira, 2007). Essential elements of models of fermentation processes are the stoichiometry of the biochemical conversion and the dependency of the reaction rates on the process conditions. The metabolism of the cells is very complex and comprises hundreds of chemical reactions, so that it is infeasible to derive rate equations for all these reactions. For the derivation of efficient models—efficient meaning sufficiently accurate with predictive capabilities but not overly complex—the usage of small metabolic networks at steady state (Nolan & Lee, 2011) or selections of elementary modes (EM) as macro reactions (Gao, Gorenflo, Scharer, & Budman, 2007; Provost, 2006; Soons, Ferreira, & Rocha, 2011; Teixeira et al., 2007) have been shown to be a powerful approaches. EM are calculated from a metabolic network and therefore provide a physiologically meaningful abstraction of the metabolism without the need of including dynamic intracellular mass balances and reaction kinetics. Formal kinetics or black‐box models like multilayer perceptron networks (MLP) can then be used to model the dependency of the reaction rates of the EM on the process conditions or on the concentrations of species in the reactor. Alternative modeling approaches use empirical qualitative reaction schemes as macro reactions and fit the corresponding stoichiometric coefficients to data (Herold & King, 2014; Mailier & Wouwer, 2009). The complexity of these models is comparable with the complexity of models which use EM as macro reactions, as internal balances and reactions are lumped onto a few macroscopic pathways. However, physiological constraints as, for example, balances of internal components, energy carriers, or redox‐species cannot be taken into account and available biological knowledge is neglected. For the generation of models that are based on EM it is necessary to (a) select a set of EM from a usually large number of possible EM and (b) select and fit kinetics for each of the reactions in the set. Previously published methods for the selection of EM use estimated uptake or secretion rates. Soons et al. (2010) use a controlled random search algorithm to find the best set of EM which minimizes the difference to the estimated rates, and Abbate et al. (2019) link the number of reactions to the fraction of the explained variance in the estimated rates by comparing eigenvalues of a SVD decomposition. The subset is then found by solving a linear optimization problem. In both approaches, the trade‐off between the size of the set of EM and the accuracy of the representation of the estimated rates is exploited. For the selection and fitting of kinetics, estimates of the cell‐specific EM reaction rates are needed. Several algorithms that have been proposed for this analysis use measured cell‐specific fluxes of medium components (Poolman, Venkatesh, Pidcock, & Fell, 2004; Schwartz & Kanehisa, 2006). A disadvantage of these algorithms for the selection of the EM and the estimation of their reaction rates is that the estimation of the cell‐specific uptake or secretion rates has to be carried out first. This implies that derivatives of concentrations have to be computed in the first step, and, as the data is usually significantly corrupted by measurement errors, the resulting rates show large fluctuations as the derivation amplifies the errors. In this paper, we present methods for the analysis and selection of EM for process modeling where measurement data is used directly: A method for the analysis of EM reaction rates is presented which is based on the approach for dynamic metabolic flux analysis (DMFA) by Leighty and Antoniewicz (2011) for the computation of internal flux distributions of a metabolic network. This method was not developed for the analysis of EM, but, as will be shown, it can be used for this purpose as well. The advantage of the approach by Leighty and Antoniewicz (2011) is that random noise in the measurements is attenuated by solving a linear optimization problem such that smoothing and numerical differentiation is not necessary. The method from Leighty and Antoniewicz (2011) is extended in this paper by the following elements: Additional linear constraints are included so that the fluxes are estimated taking into account the irreversibility of certain reactions. For large sets of reactions, the objective which was proposed in the approach of Schwartz and Kanehisa (2005) is considered in the objective of the DMFA method as a regularization term. The propagation of the measurement errors to the computed rates is obtained by bootstrapping. It is shown that the regularized DMFA for EM method is more robust against measurement noise than other methods which are based upon the estimation of cell specific fluxes and that tuning of the algorithm is straightforward by using cross‐validation. The selection of a subset of EM which are suitable for dynamic process modeling is determined by the following procedure: Before the EM are analyzed, the original DMFA problem with additional constraints to account for the irreversibility of certain reactions provides the best possible fit for a given metabolic network and therefore can be used as a benchmark. A suitable selection of EM should exhibit a comparable fit to the data. If the quality‐of‐fit is sufficient, the selection of EM can be carried out, otherwise essential reactions in the metabolic network are missing. A first reduction is carried out by means of geometrical reduction in which geometrically similar EM are discarded from the possibly large set of possible EM. For a further reduction, a multiobjective genetic algorithm (GA) is used together with the DMFA for EM method. The two objectives of the multiobjective GA are to optimize the fit of the predictions to the measured data and to minimize the size of the employed subset of the EM To explore the trade‐off between a good fit to the data and a low number of EM which is preferable because it reduces the model complexity. After a set of EM has been found, the pdf of the corresponding EM reaction rates over time at different process conditions can be evaluated by a bootstrap method in which the estimation is repeated with resampled measurements. This information is useful for selecting and fitting kinetic expressions of the reactions by statistical methods. A quantification of the uncertainty of the estimated cell‐specific rates is necessary as the magnitude of this uncertainty varies considerably during the course of a fermentation process. Figure 1 gives a graphical overview about the different steps of our modeling procedure.

Figure 1

Overview of the steps for the selection and analysis of EM for process modeling. After a metabolic network has been chosen, the possible quality‐of‐fit can be tested using the bounded DMFA method. If this fit is sufficient, a selection of EM from this network can be performed using the geometrical reduction and the multiobjective genetic algorithm. The reaction rates of the selected set of EM and their confidence intervals are then obtained from the DMFA for EM method with bootstrapping. To obtain a dynamic model, kinetic equations are finally fitted to the estimates of the reaction rates. DMFA, dynamic metabolic flux analysis; EM, elementary modes [Color figure can be viewed at wileyonlinelibrary.com] Some elements of this procedure were already published presented in (Hebing, Neymann, Thüte, Jockwer, & Engell, 2016). This contribution extends this study by (a) adding regularization to the DMFA for EM problem, (b) adding linear constraints to the original DMFA approach, (c) showing how the bounded DMFA method can be used for the evaluation of a metabolic network, and (d) calculating confidence intervals of the specific reaction rates. The selection and analysis of EM from a small metabolic network is demonstrated in Section 4 with real data from a CHO cultivation fed‐batch process under varying conditions based on an extended metabolic network from Nolan and Lee (2011). The selection and fitting of the kinetic equations based on these estimates is only sketched, as this is beyond the scope of this contribution.

THEORY

In this section, a short introduction of the original DMFA method by Leighty and Antoniewicz (2011) (which we will call unbounded DMFA) and the related new formulations (bounded DMFA and DMFA for EM) is given. Furthermore, the evaluation of the confidence intervals of the DMFA estimates and the regularization are explained.

Dynamic metabolic flux analysis

The differential equations that govern the evolution of the concentrations of the species in the reaction medium in a batch process (i.e., without addition or removal of substances from the reaction volume) can be calculated from the time‐dependent vector of fluxes in the metabolic network, : where P is the (external) stoichiometric matrix, the cell‐specific flux vector, and is the concentration of viable cells. The volumetric flux vector is: The evolution of the concentrations of the species inside the cell can also be calculated from the time‐dependent fluxes in the metabolic network, . The steady state assumption for the internal metabolites in metabolic flux analysis (MFA) postulates that the derivatives of the internal concentrations are zero. This results in a, usually under‐determined, system of linear equations: where N is the (internal) stoichiometric matrix. Under this assumption, can be calculated from volumetric rates of the so‐called free fluxes : where null(N) denotes that . All fluxes which satisfy Equation ((4a), (4b)) fulfill the steady state assumption for the internal metabolites. Leighty and Antoniewicz (2011) assume that the volumetric free fluxes are piece‐wise linear over time between inflection points , such that Equation (1) has a simple analytic solution. The values of can then be determined by solving the unbounded DMFA problem (5): where sums of squared residuals (SSR) is the sim of squared residuals: Each in Equation (6) can be computed as a linear combination of the estimated fluxes . The number of inflection points is usually chosen manually according to the expected profiles of the volumetric rates which are computed by this method. In general, the resulting estimation problem is over‐determined and the estimated concentration profiles therefore are smoothened and the influence of the measurement noise is averaged out to a certain extent. The smoothing of the estimated concentration profile is a consequence of the limited flexibility of the assumed piece‐wise linear volumetric reaction rates. For further details, the reader is referred to Leighty and Antoniewicz (2011). In the bounded DMFA problem, constraints for certain internal fluxes that are non‐negative due to the irreversibility of the corresponding reactions are taken into account: s.t. All fluxes which are calculated from Equations ((4a), (4b)) and ((7a), (7b)7) fulfill the steady state assumption and the irreversibility constraints. The (volumetric) fluxes which fulfill the steady state assumption and irreversibility constraints can also be expressed as a non‐negative linear combination of reaction rates of EM, assembled in the vector (Schuster & Hilgetag, 1994): where E is the elementary mode matrix. The profile of the (volumetric) EM reaction rates over time is obtained by solving the DMFA for EM problem (9): s.t. where is the value of the (volumetric) reaction rate of EM k at the inflection point , is the number of EM and the number of inflection points. Each can be computed as a linear combination of all estimated quantities . The cell‐specific reaction rates can be obtained from the volumetric rates by:

Regularization: Minimizing the norm of the specific reaction rates

For large sets of EM, the solution of the estimation problem (9) may lead to ill‐conditioned problems, so that even large changes in result in only small changes of the value of the objective function, as many EM with a similar stoichiometry exits. To overcome this problem, a penalty term is added to the cost function: where is the concentration of viable cells and is a scalar weighting factor which optimal value is found by cross‐validation. With this additional term, the L2‐norm of the cell‐specific rates is penalized. Penalizing cell‐specific rates is preferred over penalizing volumetric rates as unwanted and unrealistic behavior on the cellular level is less likely to occur. Otherwise, unrealistically high cell‐specific rates might be obtained from the DMFA for EM method in the beginning or at the end of a process where the concentration of viable cells is low. This criterion is also used by Schwartz and Kanehisa (2005) for the estimation of EM reaction rates.

Confidence intervals by resampling of measurements

Classical methods for the estimation of confidence intervals like the Cramer‐Rao lower bound are problematic in systems with many parameters from which a few might be unobservable, which can happen in the DMFA method when a large number of reactions are involved. Additionally, constraints on the reaction rates cannot be taken into account. We therefore propose to use a bootstrap method instead. Here, the measurements are resampled from their expected probability density functions. Often, the probability density functions of the measurements can be assumed to be Gaussian distributed and the variances can be estimated or are available from sensor data. Alternatively, these uncertainties can be estimated by a maximum likelihood estimator. The estimation of the reaction rates with the DMFA method can then be repeated using the sampled data. However, two major changes have to be made to get rid of the estimation bias by the DMFA method: The position of the inflection time‐points must be randomized. The regularization coefficient must be zero or small such that numerically ill‐conditioned estimations can be carried out without strongly influencing the result. From the sampled estimates of the reaction rates and their confidence intervals are obtained.

Choice of a subset of EM

The different variants of the DMFA method can be used to identify a set of active EM directly from measurement data and thus enable the modeler to select a suitable subset of EM for a dynamic process model. The two DMFA problems bounded DMFA and DMFA for EM for the complete set of EM are both based on the steady state assumption for the internal metabolites and take the irreversibility of reactions into account. The minimum SSR, which can be calculated from Equations (7) and (9) therefore are equal if the complete set of EM is used. When some EM are removed from the complete set, the corresponding columns in E and elements of R (Equation (8)) are deleted and the calculated SSR will increase. So the SSR value from the solution of the bounded DMFA provides a lower bound which is only dependent on the assumed metabolic network. Choosing a subset of EM will lead to a higher SSR value, that is, a worse fit to the measurement data. If the results from the bounded DMFA are not sufficiently accurate, a modification of the metabolic network should be considered before a subset of EM is chosen. For the selection of a subset for a process model, two algorithms are proposed: A geometrical reduction, which discards similar EM from the original set based on their cosine similarity. This algorithm can be used for a preliminary reduction if the initial number of EM is very large. The SSR value which is calculated using the DMFA for EM method can be used to ensure that no significant EM is removed. The algorithm is described in the appendix. A multiobjective GA which considers as objectives the SSR value and the number of EM in the set. The degrees of freedom which the GA optimizes are binary decision variables for each EM. The variable determines whether the corresponding EM is part of the subset or not. For each selected subset, problem (9) is solved using a linear solver to obtain the corresponding SSR value. The optimization problem can be written formally as: The resulting Pareto front describes the SSR which results from an optimized selection of EM as a function of the cardinality of the set of EM. The Pareto front helps the modeler to decide how many EM are necessary to capture the measured behavior of the process with an acceptable model error. The SSR value approaches the SSR value of the bounded DMFA problem when the number of EM in the subset increases. Thus, before any kinetic expression is fitted, the SSR value of the resulting sets of EM give an indication of the expected quality‐of‐fit and of the complexity of the process model which is based on this set of reactions. Practically, one will search for the lowest number of EM which still provide a desired fit to the data.

SIMULATION STUDY: EM ANALYSIS USING NOISY CONCENTRATION DATA

In this simulation study, the capability of predicting reaction rates of EM from noisy concentration measurements is tested. We compare the DMFA for EM method that was presented above with other commonly used techniques that are based on the analysis of cell‐specific uptake or secretion rates, namely Poolman et al. (2004) and Soons et al. (2011). As the estimation method of these cell‐specific rates from time‐series of noisy concentration measurements plays a significant role in the analysis, several combinations for smoothing, interpolation, and filtering are also tested. A small metabolic network with known kinetic terms was used to generate artificial measurement data of a fed‐batch process at different levels of measurement noise. The complete model, the chosen boundary conditions and the estimation methods are described in the appendix. Figure 2 shows all estimated reaction rates together with the real reaction rates of the EM.

Figure 2

True reaction rates of all EM and estimations with the Poolman, Soons and DMFA for EM methods with interpolation and filtering steps. For this example, the variance of the Gaussian noise was chosen such that the 95% confidence interval equals 5% of the magnitude of the measurements. DMFA, dynamic metabolic flux analysis; EM, elementary modes [Color figure can be viewed at wileyonlinelibrary.com] It can be seen that either smoothing or filtering is necessary to obtain good estimates when the Poolman or the Soons methods are used. The choice of the interpolation and filtering method which is used for calculating the specific rates has a significantly higher impact on the result than the estimation method itself. Table 1 shows the average approximation error at different levels of measurement noise. It can be seen that the estimations from the regularized DMFA for EM method are more accurate in the presence of realistic measurement noise. However, in the absence of noise the estimation is worse due to the lower flexibility of the piece‐wise linear volumetric rates.

Table 1

Average reconstruction error of different estimation methods for EM reaction rates at different levels of simulated measurement noise

Note: The calculation of specific rates is carried out using different interpolation (splines, smoothing splines)‐ and filtering methods (MA = moving average filter). No smoothing and filtering is employed for the DMFA for EM method. More details are described in the appendix.

Average reconstruction error of different estimation methods for EM reaction rates at different levels of simulated measurement noise Note: The calculation of specific rates is carried out using different interpolation (splines, smoothing splines)‐ and filtering methods (MA = moving average filter). No smoothing and filtering is employed for the DMFA for EM method. More details are described in the appendix.

REAL WORLD EXAMPLE: EM SELECTION AND ANALYSIS USING EXPERIMENTAL CHO FED‐BATCH FERMENTATION DATA

The concept for the choice of an EM reaction set was applied to data‐sets from six fed‐batch fermentations of a CHO culture. One experiment was excluded from this procedure and used as a validation data set for the model which was built using the chosen set of EM and fitted to the other five data sets. For brevity, we will show only three of the remaining five experiments in the following which represent the most “extreme” responses of the process to different set‐points for the pH value and glucose levels. Measurements of the viable cell density (), the total cell density, concentrations of the components of the medium, and dissolved oxygen‐sensor information were available and used. For confidentiality reasons, not more details can be given and not all measured components can be shown in the following.

Metabolic network

For the calculation of pseudo‐batch data from the fed‐batch measurements, the influence of the liquid feed as well as the mass‐transfer over the gas‐liquid phase boundary were compensated according to the shifting method which is described in the appendix. The metabolic network from which the EM were calculated was taken from Nolan and Lee (2011) and extended by the glyoxylate‐cycle and a maintenance reaction in which ATP is consumed. The network reactions are listed in Table 2.

Table 2

Reactions of the reduced metabolic network from Nolan and Lee (2011), extended as described. Components which are written in red are extracellular

Note: Components which are written in red are extracellular.

Reactions of the reduced metabolic network from Nolan and Lee (2011), extended as described. Components which are written in red are extracellular Note: Components which are written in red are extracellular. Before the EM analysis and the choice of an EM subset, it was determined how well the network can explain the evolutions of the concentrations in the measured data. With the methods unbounded DMFA (5) and bounded DMFA (7), the fluxes of the reactions in the network and their confidence intervals were estimated by minimizing the difference between the estimated concentration profile and the (resampled) measured data (cf. Section 2.3). Figure 3 shows the fit to the data of one experiment for four out of 10 different concentration profiles. Figure 4 shows the estimated reaction rates from eight chosen reactions of the metabolic network. It can be seen that the irreversibility of some reactions is violated when the unbounded DMFA method is used which leads to unrealistic results.

Figure 3

Figure 4

Estimated reaction rates of eight chosen reactions from the metabolic network obtained using the bounded DMFA (in blue) and unbounded DMFA (in red) methods for the data of one CHO fermentation fed‐batch experiment. The color shadings refer to the 95% and the 68% confidence bounds and the median of all estimations after 500 resamples of the measurements (cf. Section 2.3). Although the estimated concentrations are similar in the unbounded and bounded estimations, the differences in the estimated rates are quite large for some intracellular reactions. The major reason for this is the reversibility of the death rate in the unbounded DMFA method which also leads to lower reaction rates in anaplerotic reactions in the TCA cycle. The low concentration of the biomass in the beginning of the process leads to larger confidence bounds of the rates at these time‐points. DMFA, dynamic metabolic flux analysis [Color figure can be viewed at wileyonlinelibrary.com]

Measured data of different medium concentrations of one data set with the profiles of the estimated concentrations which were generated using the bounded DMFA (in blue, left column) and unbounded DMFA (in red, right column) methods. The color shadings refer to the 95% and the 68% confidence bounds and the median of all estimations after 500 resamples of the measurements (cf. Section 2.3). The estimated evolutions of the concentrations do not differ much, but from the evolution of mAb and Amm it can be seen that flux bounds bounds are useful for the reduction of the confidence bounds as the uptake of these substances is ruled out in the bounded DMFA. DMFA, dynamic metabolic flux analysis [Color figure can be viewed at wileyonlinelibrary.com] Estimated reaction rates of eight chosen reactions from the metabolic network obtained using the bounded DMFA (in blue) and unbounded DMFA (in red) methods for the data of one CHO fermentation fed‐batch experiment. The color shadings refer to the 95% and the 68% confidence bounds and the median of all estimations after 500 resamples of the measurements (cf. Section 2.3). Although the estimated concentrations are similar in the unbounded and bounded estimations, the differences in the estimated rates are quite large for some intracellular reactions. The major reason for this is the reversibility of the death rate in the unbounded DMFA method which also leads to lower reaction rates in anaplerotic reactions in the TCA cycle. The low concentration of the biomass in the beginning of the process leads to larger confidence bounds of the rates at these time‐points. DMFA, dynamic metabolic flux analysis [Color figure can be viewed at wileyonlinelibrary.com] The quality‐of‐fit of the bounded DMFA method determines how well the measured concentration profiles can be expressed by any selection of EM of the network. As this quality‐of‐fit is satisfactory, the EM analysis and the selection of a subset of EM were carried out.

Identifying the set of active EM

Using the metatool‐software (Pfeiffer, Nu, Montero, & Schuster, 1999), more than 18,000 EM were calculated from the network. With the geometrical reduction technique, based on the cosine similarity (cf. chapter 2.4), this number was first reduced. The resulting approximation errors for the different EM selections are shown in Figure 5. The tolerance value in the geometrical reduction was set to 1% of the initial SSR. With <100 EM this tolerance value is exceeded. It can be seen that a further reduction of the set of EM below 100 would lead to an significant increase of the SSR when the geometrical reduction is used.

Figure 5

Pareto front of the size of the optimized EM subsets and the corresponding optimized SSR values. The red lines indicate the SSR values which were obtained by solving the unbounded‐ and bounded DMFA problems. The SSR values for different EM subsets approaches the SSR value of the bounded DMFA with an increasing number of EM in the set. DMFA, dynamic metabolic flux analysis; EM, elementary modes; SSR, sums of squared residuals [Color figure can be viewed at wileyonlinelibrary.com] The multiobjective GA, described in Section 2.4, was then used to generate optimal selections of EM on the Pareto front between the approximation quality (SSR) over the number of EM. In each evaluation of the objective function of the multiobjective GA, the DMFA for EM method was used for all experiments. The number of inflection points was set to 5. Figure 5 shows the calculated SSR values for different reduced sets of EM together with the SSR of the unbounded and bounded DMFA problem for the training data. In this study, the MATLAB® implementation gamultiobj was used in which the NSGA‐II algorithm is utilized. Custom mutation and crossover functions were created to account for the binary variables (Equation (12)). The computation time of the GA was approximately 4 hours on an Intel four core i7 desktop computer with 2.67 GHz. The SSR value of the bounded DMFA problem provides the lower limit of the SSR values for the subsets of EM. This lower bound is only dependent on the network itself and not on the number and the choice of the EM. The geometrical reduction turned out to be a suitable algorithm for reducing large sets of EM without a loss of physiologically important EM, but if smaller subset sizes are aimed at, the evolutionary algorithm gives much better results. From the Pareto front of the multiobjective evolutionary algorithm, it is easily possible to select a subset of EM for a process model. The set should be as small as possible but with an acceptable SSR value (i.e., an acceptable fit to the data). The fit to the data of one experiment for four out of ten different concentration profiles is shown for different selections of EM subsets in Figure 6. As expected from the Pareto front (cf. Figure 5), the quality of fit of subsets with 10 reactions and more is comparable to the result obtained by the bounded DMFA method. For <10 reactions, the approximation becomes significantly worse. With <10 EM, essential pathways seem to be missing and the data cannot be reconstructed well enough. The stoichiometry of the 10 EM is shown in Table 3. The actual state of the metabolism can sufficiently be described by a combination of these metabolic modes. The following metabolic features are present in the set of EM:

Figure 6

Measured data of four different medium concentrations of one experiment of the CHO fermentation with the profiles of the estimated concentrations ĉ which were generated by the DMFA for EM method using different EM subsets from the Pareto front of Figure 5. The sizes of the subsets are: . Additionally, the death‐rate was also included. DMFA, dynamic metabolic flux analysis; EM, elementary modes [Color figure can be viewed at wileyonlinelibrary.com]

Table 3

Stoichiometry of the chosen subset of EM consisting of 10 EM

EM 1:	0.1649Glc + 0.010215Lac + 0.022254Gln + 0.66661O₂→0.36483Xv + 0.062105Amm + 0.50773CO₂
EM 2:	0.47847Glc + 0.006275Gln + 0.015173O₂→0.10287Xv + 0.86106Lac + 0.090627CO₂
EM 3:	0.23253Glc + 0.0004173Gln + 9.2732e−05O₂→0.046366Xv + 0.42958Glu + 0.87011CO₂
EM 4:	0.045706Glc + 0.41079Lac + 0.37711Gln + 0.059863Amm + 0.18476O₂→0.10112Xv + 0.79296CO₂
EM 5:	0.26767Glc + 0.014165Gln + 0.72659Amm + 0.50654O₂→0.31478mAb + 0.1456Glu + 0.15284CO₂
EM 6:	0.58685Glc + 0.024585Gln + 0.57456O₂→0.40304Xv
EM 7:	0.85201Glc + 0.01146Gln + 0.213O₂→0.25468mAb + 0.40461Glu
EM 8:	0.065947Glc + 0.655Lac + 0.53142Glu + 0.012693Gln + 0.26659O₂→0.1459Xv + 0.41281CO₂
EM 9:	0.70711Glu→0.70711Gln
EM 10:	0.57735Gln→0.57735Glu + 0.57735Amm
Death rate:	Xv→

Note: The death rate is added as an additional reaction.

Abbreviation: EM, elementary modes.

Oxic‐ and anoxic conversion of glucose with‐ and without lactate formation Different yields for biomass and/or product formation Utilization of different nitrogen sources Reutilization of by‐products like lactate. Measured data of four different medium concentrations of one experiment of the CHO fermentation with the profiles of the estimated concentrations ĉ which were generated by the DMFA for EM method using different EM subsets from the Pareto front of Figure 5. The sizes of the subsets are: . Additionally, the death‐rate was also included. DMFA, dynamic metabolic flux analysis; EM, elementary modes [Color figure can be viewed at wileyonlinelibrary.com] Stoichiometry of the chosen subset of EM consisting of 10 EM Note: The death rate is added as an additional reaction. Abbreviation: EM, elementary modes.

EM reaction rates and confidence intervals

After the set of 10 EM has been found, the confidence intervals for the EM reaction rates are estimated using the bootstrap method (cf. Section 2.3). For each component, the measurement uncertainty was assumed to be a linear function of the magnitude of the measurement: where and were estimated from replicates of the experiments. The sampled estimates for four different components in three different experiments are shown in (Figure 7). The differences in this experiments are due to different process conditions. The estimated EM reaction rates and their confidence intervalsare shown in Figure 8. It can be seen that the magnitude of the estimation uncertainty varies significantly over time. Especially in the beginning and at the end of the process, the confidence bounds of the reaction rates are very large. This is due to the low concentration of viable cells. As a consequence, no significant differences in the reaction rates can be observed. Only in the middle of the process, some reaction rates are clearly different due to the different process conditions. In this example, the reaction rates of , , and are not significantly influenced by the different process conditions but the other EM show observable differences.

Figure 7

Figure 8

Estimated EM reaction rates r(t) [mmol/106 cells/h] of three different experiments at different conditions which were generated using the DMFA for EM method using the selected 10 EM. The color shadings refer to the 95% and the 68% confidence bounds and the median of all estimations after 500 re‐samples of the measurements (cf. section 2.3). DMFA, dynamic metabolic flux analysis; EM, elementary modes [Color figure can be viewed at wileyonlinelibrary.com]

Concentration data of three different experiments at different conditions with the profiles of the estimated concentrations ĉ which were generated using the DMFA for EM method using the selected 10 EM. The color shadings refer to the 95% and the 68% confidence bounds and the median of all estimations after 500 re‐samples of the measurements (cf. section 2.3). DMFA, dynamic metabolic flux analysis; EM, elementary modes [Color figure can be viewed at wileyonlinelibrary.com] Estimated EM reaction rates r(t) [mmol/106 cells/h] of three different experiments at different conditions which were generated using the DMFA for EM method using the selected 10 EM. The color shadings refer to the 95% and the 68% confidence bounds and the median of all estimations after 500 re‐samples of the measurements (cf. section 2.3). DMFA, dynamic metabolic flux analysis; EM, elementary modes [Color figure can be viewed at wileyonlinelibrary.com]

Further modeling steps

After the EM have been chosen and analyzed, kinetic equations must be selected for each reaction which should adequately represent the influence of the process conditions on the reactions. The selection and fitting of kinetics is not in the scope of this paper so these steps are only sketched. The information about reaction rates and their confidence intervals is very useful for the modeling of the process, as it enables the modeler to differentiate between significant and nonsignificant influences on the reaction rates. The kinetic functions for the reaction rates should express only the significant influences. The results of the real‐world example showed that the evaluation of the confidence intervals is especially useful for fitting specific rate equations to estimates as the observability of these rates heavily depends on the state of the process. In this example, the range of the confidence intervals comprises several orders of magnitude. In an earlier contribution (Hebing et al., 2016; Neymann, Hebing, & Engell, 2019) we proposed to select and fit nonlinear reaction kinetics to estimated rates of EM r by solving: here, is the parameter vector and is the standard deviation of the estimate which can be obtained from the bootstrap samples of the estimate . Only with a reliable estimate of , it is possible to fit and compare meaningful kinetics based on a statistical measure as, for example, the Akaike information criterion. Also black‐box models like MLP for reaction kinetics of the selected EM can be fitted to the estimated reaction rates , for example, by back‐propagation. Here also, the standard deviation of the rates can be used for the weighting of the values at different time‐points. If this information is neglected, the identified rate expressions would be corrupted by unreliable estimation noise, especially in beginning and at the end of the fed‐batch process where the observability of the specific reaction rates is bad due to a low viable cell density. With the selected EM and fitted kinetics, the dynamic process model for fermentation processes is ready to be used for process design, optimization, model‐predictive control, or other purposes.

CONCLUSION

This paper presents methods for the selection of small sets of EMs and for the estimation of reaction rates from noisy concentration measurements. We propose extensions to the method of Leighty and Antoniewicz (2011) which help to (a) consider irreversible reactions in the estimation and (b) increase the robustness against measurement noise due to additional regularization. It could be shown in a simulation study that the DMFA for EM method is superior for estimating EM reaction rates from noisy concentrations measurements. The presented algorithm for the selection of small subsets of EM was used to select a suitable set of EM for a model of a CHO fed‐batch fermentation using real‐world data. It could be shown that the metabolism can be described by 10 EM with an acceptable accuracy at the experimental conditions. Using a bootstrap resampling method, the confidence intervals of the EM reaction rates of this set were calculated. This information can then be used for the choice and fitting of kinetic equations.

Table 4

Parameter values, initial conditions, and inputs used for the generation of artificial fed‐batch data

Parameter name	value	unit
Ain	300	mmol/L
fx	0.1	g/mmol
μd,min	0.01	1/hr
μd,add	0.025	1/hr
Kd,C	15	mmol/L
ν1,max	1.5	mmol/hr/g
KM,A	40	mmol/L
ν3,max−	0.25	mmol/hr/g
KM,C	10	mmol/L
KI,A	5	mmol/L
ν2,max	0.5	mmol/hr/g
A(t=0)	100	mmol/L
C(t=0)	0	mmol/L
Xv(t=0)	0.1	g/L
V(t=0)	1	L
V˙Feed(t∈{[0,80),[85,120),[125,140),[145,200]})	0	L/hr
V˙Feed(t∈{[0,80),[80,85),[120,125),[140,145)})	0.05	L/hr

11 in total

1. METATOOL: for studying metabolic networks.

Authors: T Pfeiffer; I Sánchez-Valdenebro; J C Nuño; F Montero; S Schuster
Journal: Bioinformatics Date: 1999-03 Impact factor: 6.937

2. Adaptive, model-based control by the Open-Loop-Feedback-Optimal (OLFO) controller for the effective fed-batch cultivation of hybridoma cells.

Authors: Björn Frahm; Paul Lane; Hendrik Atzert; Axel Munack; Martin Hoffmann; Volker C Hass; Ralf Pörtner
Journal: Biotechnol Prog Date: 2002 Sep-Oct

3. Dynamic model of CHO cell metabolism.

Authors: Ryan P Nolan; Kyongbum Lee
Journal: Metab Eng Date: 2010-10-07 Impact factor: 9.783

4. A quadratic programming approach for decomposing steady-state metabolic flux distributions onto elementary modes.

Authors: Jean-Marc Schwartz; Minoru Kanehisa
Journal: Bioinformatics Date: 2005-09-01 Impact factor: 6.937

5. Dynamic metabolic modeling for a MAB bioprocess.

Authors: Jianying Gao; Volker M Gorenflo; Jeno M Scharer; Hector M Budman
Journal: Biotechnol Prog Date: 2007 Jan-Feb

6. Automatic identification of structured process models based on biological phenomena detected in (fed-)batch experiments.

Authors: Sebastian Herold; Rudibert King
Journal: Bioprocess Biosyst Eng Date: 2013-12-10 Impact factor: 3.210

7. Hybrid elementary flux analysis/nonparametric modeling: application for bioprocess control.

Authors: Ana P Teixeira; Carlos Alves; Paula M Alves; Manuel J T Carrondo; Rui Oliveira
Journal: BMC Bioinformatics Date: 2007-01-29 Impact factor: 3.169

8. A method for the determination of flux in elementary modes, and its application to Lactobacillus rhamnosus.

Authors: M G Poolman; K V Venkatesh; M K Pidcock; D A Fell
Journal: Biotechnol Bioeng Date: 2004-12-05 Impact factor: 4.530

9. Quantitative elementary mode analysis of metabolic pathways: the example of yeast glycolysis.

Authors: Jean-Marc Schwartz; Minoru Kanehisa
Journal: BMC Bioinformatics Date: 2006-04-03 Impact factor: 3.169

10. Application of dynamic metabolic flux analysis for process modeling: Robust flux estimation with regularization, confidence bounds, and selection of elementary modes.

Authors: Lukas Hebing; Tobias Neymann; Sebastian Engell
Journal: Biotechnol Bioeng Date: 2020-05-12 Impact factor: 4.530

1 in total

1. Application of dynamic metabolic flux analysis for process modeling: Robust flux estimation with regularization, confidence bounds, and selection of elementary modes.

Authors: Lukas Hebing; Tobias Neymann; Sebastian Engell
Journal: Biotechnol Bioeng Date: 2020-05-12 Impact factor: 4.530

1 in total