
Modeling Based Identifiability and Parametric Estimation of an Enzymatic Hydrolysis Process of Amylaceous Materials.

Daniel Padierna-Vanegas^1,2, Juan Camilo Acosta-Pavas^3,4, Laura María Granados-García^1, Héctor Antonio Botero-Castro^2.

Abstract

This work presents the modeling of an enzymatic hydrolysis process of amylaceous materials, taking the parameter identification problem as the basis for constructing the model. To this end, a modeling methodology is modified to incorporate the identifiability property and improve the proposed model structure. A brief theoretical explanation of identifiability is given; the concept is based on the observability property of a nonlinear dynamic system. The methodology used is the phenomenological based semiphysical model (PBSM) methodology, under which the structure of a dynamic model can only be improved through new mass or energy balances suggested by the model suppositions. Additionally, a computer algorithm is included in the methodology to check whether the model is structurally locally identifiable and, if it is not, to determine which parameters are unidentifiable. An optimization algorithm is then used to obtain the numeric values of the identifiable parameters and, hence, guarantee the validity of the result. The methodology focuses on the liquefaction and saccharification stages of an enzymatic hydrolysis process. The results of the model are compared with experimental data. The comparison shows low errors of 7.96% for liquefaction and 7.35% for saccharification. These errors represent a significant improvement over previous models and validate the proposed modeling methodology.


Year: 2022 PMID: 35557667 PMCID: PMC9088767 DOI: 10.1021/acsomega.1c06193

Source DB: PubMed Journal: ACS Omega ISSN: 2470-1343

Introduction

Many years ago, various industries introduced mathematical models into their processes to analyze the behavior of the process against variations in initial product concentrations, temperature, or hydrogen potential (pH). Specifically, the introduction of phenomenological models in the biotechnology industry has allowed for better understanding of dynamics of variables interacting in a bioprocess, such as substrates, products, enzymes, or microorganisms, and from there, achievement of improvements in design, optimization, and control. Some examples of these models can be seen in ref (1), where the phenomenology of a submerged membrane bioreactor was developed to couple biomass kinetics with bubble aeration to understand membrane behavior in wastewater treatment. Likewise, in ref (2) an online model was proposed for the growth of Chilean mussel crops in some regions of Chile for predictive purposes. Also in ref (3) a polymerization reactor was modeled to implement a scale-up methodology and to facilitate its implementation at the industrial level. Although the modeling of processes is becoming increasingly relevant, previous works show difficulties related to a considerable number of parameters involved in the model and the obtainment of numerical values that are able to adjust the dynamics of the nonlinear system. One of the solutions is the use of constitutive equations and bibliographic references. However, sometimes it is necessary to model a bioprocess that has not been extensively studied, and experimentation must be used to quantify some parameters. 
This approach to parameter identification can be observed in ref (4), where a bioprocess identification strategy based on the generalized bioprocess model was proposed to improve the adjustment of the parameters associated with the reaction kinetics used in the modeling stage, and in refs (5) and (6), which use genetic algorithms to perform a global search of the parameters describing bioreactor behavior for the production of ethanol and the culture of E. coli MC4110 in a semibatch reactor, respectively. Finally, ref (7) proposed a methodology for modeling the enzymatic hydrolysis process that estimates some arbitrarily selected parameters using the mean square error as a cost function. However, that research does not present an identifiability analysis, an important aspect in this type of model because of its sensitivity to changes in some of its parameters. The previous works give an understanding of the methodologies used for parameter identification, but they do not explain whether the implemented algorithms provide a valid solution for the system, that is, whether, given a set of outputs y(x, ψ, t), it is possible to obtain a set of parameters p that solves the dynamic system ẋ = f(x, θ, t).[8] The feasibility of this solution can be evaluated in dynamic systems by means of the concept of identifiability. This concept can guarantee reliability in the process of parameter identification. It can also be used to iterate the modeling process so that the algorithms used to obtain the parameter values are more reliable. Therefore, the main purpose of this work is to present the modeling of an enzymatic hydrolysis process of amylaceous materials, considering the problem of parameter identification and using the verification of identifiability as the basis for the construction of the model. 
These bioprocesses are good candidates for applying the proposed methodology because enzymatic hydrolysis is characterized by the complexity arising from its considerable number of model parameters and from the nonlinearities present in the bioprocess. Furthermore, the phenomenology of the process forces the designer to focus on the identifiability problem when developing the associated phenomenological based semiphysical model (PBSM). This process is the basis of a major research project in bioprocess estimation and control started by Acosta-Pavas and Ruiz-Colorado.[9] Additionally, a better model of the enzymatic hydrolysis process of amylaceous materials can contribute to a better understanding of how to use agro-industrial wastes for the production of value-added products. The article is organized as follows. First, the concept of identifiability and the method used to check it are explained. Next, the methodology used for process modeling aimed at parameter identification is explained step by step. The following section deals with the enzymatic hydrolysis model and its development, indicating hypotheses, definitions, equations, and identifiability analysis. Finally, the results of the strategy applied to enzymatic hydrolysis are discussed and the conclusions are presented.

Identifiability and the Method Used to Check It

The concept of identifiability was originally introduced by Bellman and Åström.[10] In that work, an error estimation function was defined as

$$J(p) = \int_{0}^{T} \lVert y(t) - \hat{y}(t) \rVert^{2} \, dt \tag{1}$$

where y(t) is the output vector and ŷ(t) is the estimated output vector. In addition, y(t) is related to the nonlinear dynamic system

$$\dot{x} = f(x, \theta, t) \tag{2a}$$
$$y = h(x, \psi, t) \tag{2b}$$
$$x(t_0) = x_0 \tag{2c}$$

where x is the state vector, θ is the parameter vector associated with the state function f(x, θ, t), ψ is the parameter vector associated with the output function h(x, ψ, t), and x0 is the initial condition vector. If a parameter belongs to θ, it is associated with the process phenomenology and is part of the proposed constitutive equations. On the other hand, if a parameter belongs to ψ, it is associated with how the process outputs are measured (installed sensor, signal processing, and output interpretation). Furthermore, the parameter vectors can be gathered in a single vector p = [θ, ψ]. The system described by eq 2 shows that there are internal couplings between x, p, and t defined by the structure of f. Additionally, the information available on the system output y is defined by the structure of h, i.e., by internal couplings between x, p, and t. This implies that a nonlinear dynamic system is understood if the structures of the related functions (f and h) are well-known, and those structures are well-known if the p vector is known or identifiable.[10] Considering eq 1, the system defined by eq 2 is structurally locally identifiable if there is a candidate parameter vector p̂ at which J(p) has a local minimum. If p̂ is also a global minimum, the system is structurally globally identifiable.[10] Identifiability in this sense is an a priori property of the dynamic system; that is, it depends only on the model structure proposed in eq 2. This paper focuses only on identifiability as an a priori property. This choice allows a phenomenological review of the model, in contrast with a posteriori concepts based only on data quality.[8]

Structural Identifiability as an Observability Redefinition

To obtain a correct notion of the structural identifiability of a system, many alternative definitions related to the original concept of Bellman and Åström[10] have been proposed. These proposals include the use of Taylor series expansion, differential algebra, or similarity transformation.[8] However, in recent years, studying structural identifiability through the observability concept has emerged as a novel proposal because it is easy to understand and to apply in nonlinear systems. That proposal was developed by Villaverde[11] as follows. With the model defined by eq 2 and the zeroth Lie derivative taken as L_f^0 h(x) = h(x), the first and ith Lie derivatives of the output function h(x) (eq 2b) with respect to the state function f(x) (eq 2a) are defined as

$$L_f h(x) = \frac{\partial h(x)}{\partial x} f(x) \tag{3a}$$
$$L_f^{i} h(x) = \frac{\partial L_f^{i-1} h(x)}{\partial x} f(x) \tag{3b}$$

Then the model is locally observable around the neighborhood defined for the initial point x0 (eq 2c) if rank(O(x0)) = n, where n is the number of states and O(x) is the nonlinear observability matrix calculated as

$$O(x) = \begin{bmatrix} \dfrac{\partial h(x)}{\partial x} \\ \dfrac{\partial L_f h(x)}{\partial x} \\ \vdots \\ \dfrac{\partial L_f^{n-1} h(x)}{\partial x} \end{bmatrix} \tag{4}$$

The above condition is known as the nonlinear observability rank condition (ORC).[11] Now, consider the model defined by eq 2, with p the static parameter vector (ṗ = 0) and x̃ the augmented state vector defined by eq 5a:

$$\tilde{x} = \begin{bmatrix} x \\ p \end{bmatrix}, \qquad \dot{\tilde{x}} = \tilde{f}(\tilde{x}) = \begin{bmatrix} f(x, \theta, t) \\ 0 \end{bmatrix} \tag{5a}$$
$$O_I(\tilde{x}) = \begin{bmatrix} \dfrac{\partial h(\tilde{x})}{\partial \tilde{x}} \\ \dfrac{\partial L_{\tilde{f}} h(\tilde{x})}{\partial \tilde{x}} \\ \vdots \\ \dfrac{\partial L_{\tilde{f}}^{\,n + n_\theta + n_\psi - 1} h(\tilde{x})}{\partial \tilde{x}} \end{bmatrix} \tag{5b}$$

The model is structurally locally identifiable around the neighborhood defined for the initial point x̃0 if rank(O_I(x̃0)) = n + n_θ + n_ψ, where n_θ is the number of parameters associated with the state function (θ), n_ψ is the number of parameters associated with the output function (ψ), and O_I is the augmented nonlinear observability–identifiability matrix given by eq 5b.[11] This condition is known as the nonlinear observability–identifiability condition (OIC). If it is fulfilled, then all elements of p are structurally locally identifiable and all states in x are locally observable around x̃0. Otherwise, at least one element of p is structurally unidentifiable or one state in x is unobservable.[11]
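As a minimal worked illustration of the OIC, consider a toy system chosen only for this sketch (it is not part of the hydrolysis model): a single state with dynamics ẋ = −θx, output y = x, one unknown parameter θ, and no output parameters, so that n = 1, n_θ = 1, and n_ψ = 0.

```latex
\begin{align}
\tilde{x} &= \begin{bmatrix} x \\ \theta \end{bmatrix}, \qquad
\tilde{f}(\tilde{x}) = \begin{bmatrix} -\theta x \\ 0 \end{bmatrix}, \qquad
h(\tilde{x}) = x \\
L_{\tilde{f}}^{0} h &= x, \qquad
L_{\tilde{f}}^{1} h = \frac{\partial h}{\partial \tilde{x}}\,\tilde{f}
  = \begin{bmatrix} 1 & 0 \end{bmatrix}
    \begin{bmatrix} -\theta x \\ 0 \end{bmatrix} = -\theta x \\
O_I(\tilde{x}) &=
\begin{bmatrix}
  \partial h / \partial \tilde{x} \\
  \partial (L_{\tilde{f}} h) / \partial \tilde{x}
\end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ -\theta & -x \end{bmatrix},
\qquad \det O_I(\tilde{x}) = -x
\end{align}
```

For any initial point with x0 ≠ 0 the determinant is nonzero, so the rank is 2 = n + n_θ + n_ψ and the toy system satisfies the OIC. At x0 = 0 the rank drops to 1, which reflects that the output carries no information about θ when the state stays at zero.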

Toolbox for the OIC Computation

The computation of the augmented observability–identifiability matrix requires symbolic differentiation toolboxes. STRIKE-GOLDD (structural identifiability taken as extended-generalized observability with Lie derivatives and decomposition) is a Matlab toolbox that analyzes the local structural identifiability, observability, and invertibility of (possibly nonlinear) dynamic models described by ordinary differential equations. This toolbox was developed by Villaverde et al.[12] One advantage of STRIKE-GOLDD is that it computes the minimum number of Lie derivatives needed, which reduces the computational cost of obtaining the matrix. If the OIC is not met, the toolbox can analyze each model parameter and report which parameters are structurally locally identifiable, which are structurally unidentifiable, and which combinations of parameters may be identifiable. This feature reveals the problematic parameters in the model that could prevent a feasible solution in the parametric estimation. The package thus aims to reduce computation time and to provide new knowledge about each element of the model. These benefits make it possible to iterate on the model and adjust the system until the identifiability property is obtained. In this work, STRIKE-GOLDD is used to determine whether the model parameters can be correctly identified through an optimization algorithm. If the OIC is not fulfilled, then the optimizer cannot guarantee that p̂ is the solution vector of the identification problem or that the model describes the real process.

Basis for Parametric Identification

From the point of view of optimization algorithms, parameter identification in dynamic systems is a topic of constant development. Many cost functions for parameter identification problems have been proposed on the basis of eq 1. Roeva[6] proposed the cost function defined by eq 6, where y(t) is not a continuous measurement; instead, the process provides a finite number of samples m of the output vector, which has dimension n. Starting from eq 6, J(p) can be modified to increase the weight of some samples in the algorithm. For example, Rivera et al.[5] use the maximum possible value of each output, as seen in eq 7. The purpose of that modification is to penalize estimation errors that could exceed a defined boundary and affect the optimization trajectory. Meanwhile, Richelle and Bogaerts[4] use the standard deviation σ of each measurement, as seen in eq 8. The objective of that change is to reduce the influence of the noisiest measurements. In addition, J(p) can take into account the number of experiments performed, as in eq 9. This cost function can be solved using a maximum likelihood estimator.[13] To carry out the parameter identification as an optimization problem, the fmincon function available in the Matlab Optimization Toolbox was selected to find a local minimum of the problem formulated in the previous section. The function takes as input arguments the cost function and the matrices that encode the model constraints (bounds, equalities, and inequalities). fmincon offers several optimization algorithms; the interior point algorithm is used for both enzymatic hydrolysis stages. The interior point algorithm generates a new cost function by introducing a barrier function, as seen in eq 10. This function adds a logarithmic barrier term on the slack variables s, weighted by the barrier parameter μ. As a consequence, the problem changes from an inequality constrained problem to an equality constrained problem[14] as follows:

$$\min_{p,\,s} J_{\mu}(p, s) = \min_{p,\,s} \; J(p) - \mu \sum_{i} \ln(s_i) \quad \text{subject to} \quad s \ge 0, \quad g_{eq}(p) = 0, \quad g_{ineq}(p) + s = 0 \tag{10}$$

With regard to eq 10, the algorithm takes one of the following steps in each iteration. Direct step: The algorithm tries to solve the Karush–Kuhn–Tucker (KKT) equations of the optimization problem using a linear approximation. Conjugate gradient step: If the direct step does not yield a feasible result, the algorithm computes the Lagrange multipliers by solving the approximate KKT equations and then minimizes the problem within a trust region, which keeps the trajectory feasible in each iteration.
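The cost function variants discussed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the formulation of refs (4)–(6): the function names, the use of the largest measured value as the normalizing constant, and the sample data are all hypothetical, and the last helper only shows how a logarithmic barrier term modifies a cost value.

```python
import math

def sse_cost(y_meas, y_est):
    """Plain sum of squared errors over m samples (rows) and n outputs (columns)."""
    return sum((ym - ye) ** 2
               for row_m, row_e in zip(y_meas, y_est)
               for ym, ye in zip(row_m, row_e))

def max_normalized_cost(y_meas, y_est):
    """Squared errors scaled by the largest measured value of each output (assumed scaling)."""
    n_out = len(y_meas[0])
    y_max = [max(abs(row[j]) for row in y_meas) for j in range(n_out)]
    return sum(((row_m[j] - row_e[j]) / y_max[j]) ** 2
               for row_m, row_e in zip(y_meas, y_est)
               for j in range(n_out))

def sigma_weighted_cost(y_meas, y_est, sigma):
    """Squared errors scaled by the standard deviation of each measurement."""
    return sum(((ym - ye) / s) ** 2
               for row_m, row_e, row_s in zip(y_meas, y_est, sigma)
               for ym, ye, s in zip(row_m, row_e, row_s))

def barrier_cost(cost, slacks, mu):
    """Cost value minus mu times the sum of log(slack), for strictly positive slacks."""
    return cost - mu * sum(math.log(s) for s in slacks)

# Hypothetical data: m = 3 samples, n = 2 outputs.
y_meas = [[1.0, 10.0], [2.0, 20.0], [4.0, 40.0]]
y_est = [[1.0, 12.0], [3.0, 20.0], [4.0, 36.0]]
sigma = [[0.5, 2.0], [0.5, 2.0], [0.5, 2.0]]

print(sse_cost(y_meas, y_est))
print(max_normalized_cost(y_meas, y_est))
print(sigma_weighted_cost(y_meas, y_est, sigma))
```

In the plain sum, the second output dominates because its magnitude is ten times larger; both scaled variants put the two outputs on a comparable footing, which is the motivation given for modifying the basic cost function.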

Used Methodology for Process Modeling

The methodology used is based on the PBSM methodology, developed by Alvarez et al.[15] and applied in ref (9), which can generate gray-box and lumped parameter models. In this work, steps 6–11 were added in order to consider the identifiability of parameters. The methodology steps are as follows, and they are summarized in Figure 1.

Figure 1. Flow chart for the proposed methodology.

Step 1: Describe the process to identify the elements and physical interactions that should be in the model. This description is the input for the balances, variables, and constitutive equations of the next steps.
Step 2: Propose the model suppositions associated with the process. These statements must take into account the main objective of the model, the variables involved, the physical limits, and the tools available for parameter identification and model testing.
Step 3: Define the balances associated with the dynamic process. This step yields an initial PBSM, i.e., a state function f and an output function y with variables, inputs, and outputs.
Step 4: Obtain the constitutive equations and the parameter vector p of the system from the suppositions, balances, and variables indicated in steps 2 and 3.
Step 5: Calculate the augmented nonlinear observability–identifiability matrix (eq 5) by means of the STRIKE-GOLDD toolbox. If the matrix has full rank, then the system is structurally locally identifiable. Otherwise, the modeling process restarts from step 2, considering how to change the model to obtain the identifiability property. This return to step 2 is possible because the toolbox indicates the unidentifiable parameters that must be modified or removed from the PBSM.
Step 6: Choose a cost function J(p) for the optimization problem. If the previous steps were satisfactory, the PBSM guarantees the existence of a local solution, so the objective is to obtain a set of optimal parameter values p̂ that describes the process through the proposed model.
Step 7: Define the constraints based on the model description and the related suppositions. This step defines a subspace in which p̂ is feasible. In the first iteration of the methodology, the constraints should be defined from knowledge of the process and the model suppositions, because these inputs give information about the logical scales, units, and values of the system parameters.
Step 8: Get a data set associated with the model in order to evaluate J(p) and find p̂. For this case study, the selected data set is based on the measurement of various samples of the process.
Step 9: Filter the data set to remove information that the optimization algorithm does not need.
Step 10: Define an initial point p0 to initialize the optimization algorithm.
Step 11: Execute an optimization algorithm to solve the parameter identification problem. If the vector p̂ obtained from the optimization fits the PBSM and there is no alternative for improving the proposed model, then p̂ ≈ p; that is, the identification problem associated with the proposed model is solved and the methodology generates an acceptable result. If p̂ cannot fit the PBSM, the methodology is re-executed from step 7 to improve the algorithm inputs (initial point, constraints, cost function, and data). If there is some feasible improvement to the PBSM, the methodology restarts from step 2.

The proposed modeling methodology allows the design of a PBSM with structurally locally identifiable parameters. Step 5 is essential for achieving that property because it works as a validation test for the parameters used in step 4 and suggested in steps 2 and 3. When the OIC cannot be guaranteed, the designer must review the results given by STRIKE-GOLDD and identify which parameters are problematic, i.e., unidentifiable. Because each parameter has at least one associated constitutive equation, the designer must return to step 2 to review which supposition could cause or influence the use of that parameter. Once the designer identifies a supposition–parameter pair, it is possible to analyze how to modify this association without contradicting the process description:
Redefine the supposition to obtain new constitutive equations and new, possibly identifiable, parameters.
Change the supposition to modify balances or constitutive equations and their associated parameters.
Eliminate the supposition to redefine balances and their associated equations and variables.
Add a new supposition to obtain new balances and their associated equations and variables.
These changes improve the structure of the dynamic system only if they are based on a good knowledge of the real process. Each change in a supposition can significantly modify the equations or variables used in the next steps of the methodology loop.

When the methodology guarantees that the obtained model is structurally locally identifiable, steps can be added for estimating the values of the parameters included in the PBSM via the optimization strategy (steps 6–11). This is possible because, as mentioned in the section on identifiability, the identifiable model has a candidate parameter vector p̂ at which an error estimation function J(p) proposed by the designer has a local minimum. Additionally, step 7 allows constraints to be defined from the model suppositions so that the feasible neighborhood of the p vector is limited and the process is simulated correctly. For example, a model supposition can suggest rules for setting a logical value or range for the initial condition of the model. Therefore, if the p vector is obtained but the designer identifies a point of improvement in the final results, the methodology suggests reviewing the model suppositions to check whether new knowledge of the process or its associated phenomena can set a new parameter neighborhood. However, points of improvement can also be proposed by adding new dynamic behavior. This option requires proposing new suppositions, and consequently, new variables must be added without losing the identifiability property of the model.
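Steps 6–11 can be illustrated on a toy problem. The following is a minimal sketch under loud assumptions: the one-parameter first order response, the synthetic noise-free data, and the bounds are hypothetical, and a golden-section search is swapped in for the fmincon interior point algorithm because it needs only the standard library. Since golden-section search works on a bracket rather than an initial point, the bounds of step 7 also play the role of step 10 here.

```python
import math

def model(k, t, g_inf=10.0):
    """First order response used as a stand-in for the PBSM output (assumed form)."""
    return g_inf * (1.0 - math.exp(-k * t))

def cost(k, times, data):
    """Step 6: sum of squared errors between the data and the model output."""
    return sum((y - model(k, t)) ** 2 for t, y in zip(times, data))

def golden_section(fun, lower, upper, tol=1e-8):
    """Step 11: minimize a unimodal scalar function on the interval [lower, upper]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lower, upper
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if fun(c) < fun(d):
            b = d
        else:
            a = c
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2.0

# Step 8: hypothetical noise-free samples generated with k = 0.3.
times = [0.0, 1.0, 2.0, 4.0, 8.0]
data = [model(0.3, t) for t in times]

# Step 9: drop samples that carry no information on k (the model is 0 at t = 0 for any k).
pairs = [(t, y) for t, y in zip(times, data) if t > 0.0]
times_f = [t for t, _ in pairs]
data_f = [y for _, y in pairs]

# Step 7: bounds on k taken from assumed process knowledge.
k_lower, k_upper = 0.01, 2.0

k_hat = golden_section(lambda k: cost(k, times_f, data_f), k_lower, k_upper)
print(k_hat)
```

With noise-free data the estimate recovers the value used to generate the samples. With real measurements, the final check of step 11 (whether p̂ fits the model well enough) decides whether to return to step 7 or step 2.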

Results and Discussions

Case of Study: Enzymatic Hydrolysis from Wheat Starch

An interesting case study for applying the methodology is the enzymatic hydrolysis process of amylaceous materials. These materials come from agro-industrial wastes, which, when subjected to hydrolysis processes, yield inputs for the production of jams, medicines, fuels, and other goods.[9,16−21] These agro-industrial wastes are associated with the production of cereal derivatives such as rice, corn, wheat, barley, and oats; legumes such as chickpeas, lentils, soy, and beans; and tubers such as potato, cassava, and yam.[22−24]

Step 1: Model Description

This type of agro-industrial residue is composed of two polymers: amylose, with a linear structure of d-glucose linked by α-(1, 4) bonds, and amylopectin, with a branched structure linked by α-(1, 4) and α-(1, 6) bonds.[22,25−27] Products derived from these polymers can be obtained through enzymatic hydrolysis, which consists of swelling an initial substrate (starch) and then breaking the α-(1, 4) and α-(1, 6) bonds by means of enzymes through a series of synergistic stages: gelatinization, liquefaction, and saccharification.[24,28−31] Each stage aims to modify the type of d-glucose structure and its quantity in the final product. If a polymer is composed of s linked d-glucose units, then it has a degree of polymerization (DP) equal to s. In the gelatinization stage, the starch granules are subjected to high temperatures, which allows swelling by absorption of water and the subsequent release of amylose and amylopectin.[28,30,32−36] In the liquefaction stage, the amylose and amylopectin are degraded by the enzyme α-amylase, and the internal α-(1, 4) and α-(1, 6) bonds are hydrolyzed, producing the successive division of both polymers. An intermediate product is obtained, composed of smaller oligosaccharides (DP < 7).[20,29,30,34,36−39] Finally, in the saccharification stage, the products of the previous stage are degraded by the enzyme amyloglucosidase until products such as glucose or maltose are obtained.[20,29,30,34,36−39]

Step 2: Model Suppositions

For the Liquefaction Stage

The bioprocess consists of the formation of oligosaccharides with DP ≤ 7. However, the production of maltopentose G5 lumps together the concentrations of oligosaccharides with 3 < DP ≤ 7. The remaining products, glucose G1, maltose G2, and maltotriose G3, are not lumped together.
On the basis of process data obtained via high-performance liquid chromatography (HPLC), the liquefaction stage provides measurements of G1, G2, and maltodextrins. The maltodextrin values are equal to the weighted sum of the remaining oligosaccharides in the process.
As the oligosaccharide DP becomes smaller, oligosaccharides with greater DP compete for the active site of the enzyme. As a consequence, there are reversible reactions among oligosaccharides with DP < 5.
E01 hydrolyzes the internal bonds of the initial substrate S0, producing oligosaccharides according to the individual concentrations and the hydrolysis efficiency.
The initial conditions of the stage are null for all oligosaccharides with DP ≤ 3.
The growth of each oligosaccharide follows a first order kinetic model.
It is an endothermic bioprocess.
There is heat transfer between the process (temperature T) and the jacketed stirred tank (temperature T_j).
The reaction velocity is related to the process temperature by an Arrhenius function.

For the Saccharification Stage

The bioprocess consists of the formation of oligosaccharides with DP ≤ 7. However, the production of maltopentose G5 lumps together the concentrations of oligosaccharides with 3 < DP ≤ 7. The remaining products, glucose G1, maltose G2, and maltotriose G3, are not lumped together.
On the basis of process data obtained via high-performance liquid chromatography (HPLC), the liquefaction stage provides measurements of G1, G2, and maltodextrins. The maltodextrin values are equal to the weighted sum of the remaining oligosaccharides in the process.
All oligosaccharides compete for the active site of the amyloglucosidase enzyme.
The inhibition for substrate k is only considered for oligosaccharides with DP > 3.
The initial conditions of the bioprocess are equal to the last values of the liquefaction stage before the introduction of the amyloglucosidase enzyme.
The behavior of each oligosaccharide in this stage follows Michaelis–Menten kinetics.
With respect to maltotriose and maltose, the product of the oligosaccharide with lower DP can decelerate growth.
In this stage, the reactions among oligosaccharides follow the stoichiometric relationships of eq 11.
It is an endothermic bioprocess.
There is heat transfer between the process (temperature T) and the jacketed stirred tank (temperature T_j).
The reaction velocity is related to the process temperature by an Arrhenius function.

Step 3: Definition of Variables and Dynamics

Table S1 in the Supporting Information describes the liquefaction process variables. The state vector of the system is given by eq 12, the dynamic system for this state vector by eq 13, and the output vector y by eq 14. The dynamics correspond to the rates of change of the concentrations of glucose, maltose, and maltodextrins and of the process temperature. The mass balances associated with each oligosaccharide depend only on their concentration rates.[9] Meanwhile, the energy balance includes the heat exchanged with the thermal jacket and the heat taken up by the process, which depends on the concentration rates. The outputs correspond to the concentrations of glucose, maltose, and maltodextrins and the process temperature. In contrast to Acosta-Pavas and Ruiz-Colorado,[9] the maltodextrin concentration depends not only on maltotriose and maltopentose but also on the values of β1 and β2. These parameters express a weighted sum relationship between those concentrations during the liquefaction process. For that reason, there are two additional parameters in the model, with an equality constraint for parameter identification.

Table S3 in the Supporting Information describes the saccharification process variables. The dynamic system is again defined for the state vector of eq 12, and y is defined as equal to eq 14. The dynamics correspond to the rates of change of the concentrations of glucose, maltose, and maltodextrins and of the process temperature. The mass balances associated with each oligosaccharide depend on the stoichiometric relationships of the G5, G3, and G2 concentration rates in eq 11. Meanwhile, the energy balance includes the heat exchanged with the thermal jacket and the heat taken up by the process, which depends on the concentration rates of the saccharification stage. In contrast with the liquefaction stage, the numeric values of β1 and β2 express a linear combination relationship, because the maltodextrins are composed of oligosaccharides with 3 ≤ DP ≤ 5; the parameters therefore take different values even though the output equation is the same in both stages. In addition, these parameters have an inequality constraint associated with this change in the relationship. That inequality prevents the maltodextrin concentration from taking an extreme form, namely the sum of the oligosaccharides with 3 ≤ DP ≤ 5 with equal weights or the representation of a single oligosaccharide in that range.

Step 4: Constitutive Equations and Parameters

For the Liquefaction Stage

Reaction Velocity

The reaction velocity k_i associated with the oligosaccharide G_i is defined by an Arrhenius function of the process temperature, where i ∈ DP = {1, 2, 3, 5} indicates the DP of the associated oligosaccharide. Its terms are the maximum reaction velocity of G_i, the activation energy for G_i, the Arrhenius constant R, and a scale factor that compensates the final value of k_i.

First Order Reaction

The reaction for each oligosaccharide with DP = i is defined by a first order expression in which k_i is the reaction velocity defined above and h_i is the hydrolysis relationship between S0 and G_i.
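The interplay between the Arrhenius reaction velocity and the first order reaction can be sketched numerically. This is only an illustration under assumptions: the exact expressions of the model are not reproduced here, so the multiplicative scale factor, the rate law dG/dt = k (h S0 − G), and every numeric value below are hypothetical choices made for this sketch.

```python
import math

R_GAS = 8.314  # universal gas constant in J/(mol K)

def arrhenius(k_max, e_act, temp_k, scale=1.0):
    """Assumed Arrhenius form: scale * k_max * exp(-E / (R * T))."""
    return scale * k_max * math.exp(-e_act / (R_GAS * temp_k))

def simulate_first_order(k, h, s0, t_end, dt=0.01):
    """Explicit Euler integration of the assumed law dG/dt = k * (h * S0 - G), G(0) = 0."""
    g = 0.0
    for _ in range(int(round(t_end / dt))):
        g += dt * k * (h * s0 - g)
    return g

# Hypothetical parameter values; temperatures in kelvin.
k_cold = arrhenius(k_max=1.0e6, e_act=4.0e4, temp_k=333.15)
k_hot = arrhenius(k_max=1.0e6, e_act=4.0e4, temp_k=363.15)

# Oligosaccharide concentration after a long time at the higher temperature.
g_end = simulate_first_order(k=k_hot, h=0.2, s0=100.0, t_end=20.0)
print(k_cold, k_hot, g_end)
```

Under these assumptions the velocity increases with temperature and the concentration starts from zero, as stated in the suppositions, and settles at h S0. The sketch also shows why the scale factor matters for estimation: the exponential term alone is many orders of magnitude smaller than the resulting velocity.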

Convective Heat Transfer

The heat transferred between the thermal jacket and the process is described as

$$Q_j = \frac{U A \,(T_j - T)}{\rho V c}$$

where U is the heat transfer coefficient, A is the heat transfer area, ρ is the product density, V is the process volume, c is the heat capacity, and T_j is the jacket temperature.

Heat Produced by the Process

The heat generated by the liquefaction stage is described in terms of the process enthalpy ΔH, the first order reaction of each oligosaccharide defined above, and the molecular weight of each oligosaccharide G_i with i ∈ DP.

Parameters

Based on the constitutive equations, Table S1 in the Supporting Information describes the liquefaction parameters. In addition, Table S2 in the Supporting Information provides the model constants used in the liquefaction stage.

For the Saccharification Stage

The reaction velocity v_i associated with the reaction rate r_i is defined by an Arrhenius function, where i ∈ r = {2, 3, 5} indicates the reaction rate associated with the oligosaccharide with DP = i. Its terms are the maximum reaction velocity of G_i, the activation energy E for the final product (in this case, G1), the Arrhenius constant R, and two scale factors. The first scale factor compensates for the magnitude of the exponential expression, and the second includes an offset to that magnitude. Both serve an objective similar to that of the scale factor in the liquefaction stage.

Reaction Rate of Maltopentose

The reaction rate associated with G5 is defined in eq 23, where K5, K3, and K2 are the inverses of the inhibition constants for G5, G3, and G2, respectively, and the two remaining constants K are also inverses of inhibition constants, one of them for the substrate.

Reaction Rate of Maltotriose

The reaction rate associated with G3 is defined in eq 24, which includes the equilibrium constant associated with G3.

Reaction Rate of Maltose

The reaction rate associated with G2 is defined in eq 25, which includes the equilibrium constant associated with G2. In comparison with the traditional structure of Michaelis–Menten kinetics, eqs 23, 24, and 25 use the inverses of the inhibition constants present in the stage. This change makes each reaction rate simpler to handle mathematically because its derivatives with respect to these parameters do not require differentiating a quotient. Therefore, the OIC is easier to compute in STRIKE-GOLDD for this model than for other models, which permits validation of the parameter identification. The heat transferred between the thermal jacket and the process is described by the same expression as in the liquefaction stage. The heat generated by the saccharification stage is described in terms of the reaction rates defined in eqs 23, 24, and 25.
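The benefit of working with inverse inhibition constants can be seen on a generic competitive inhibition term. The expressions below are a textbook illustration chosen for this sketch, not a reproduction of eqs 23–25; the symbols v_max, K_m, K_I, S, and P are assumptions introduced here.

```latex
\begin{align}
\text{classical form:}\quad
r &= \frac{v_{\max}\, S}{K_m \left(1 + \dfrac{P}{K_I}\right) + S},
&\frac{\partial}{\partial K_I}\!\left(\frac{P}{K_I}\right) &= -\frac{P}{K_I^{2}} \\
\text{inverse constant } K = \frac{1}{K_I}:\quad
r &= \frac{v_{\max}\, S}{K_m \left(1 + K P\right) + S},
&\frac{\partial}{\partial K}\left(K P\right) &= P
\end{align}
```

With the inverse constant, the parameter enters the denominator linearly, so the repeated symbolic differentiation needed to build the Lie derivatives produces shorter expressions. Both forms describe the same rate once K = 1/K_I is substituted.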

Parameters

Based on the constitutive equations, Table S3 in the Supporting Information describes the saccharification parameters. In addition, Table S4 in the Supporting Information provides the model constants used in the saccharification stage.

Step 5: Identifiability Analysis

For the liquefaction stage, the STRIKE-GOLDD toolbox in Matlab was used to compute the OIC, with the initial model conditions known. The software obtained a rank of 15, which matches the 5 states plus 10 parameters of this stage, in 2.8 s of execution time, calculating up to the third Lie derivative. This result indicates that the observability-identifiability matrix is full rank; that is, the system is structurally identifiable for all of the parameters. In previous proposals, it was not possible to guarantee this condition.[17,24,40] However, it was possible with the current model due to the changes in the constitutive equations.

For the saccharification stage, the STRIKE-GOLDD toolbox in Matlab was likewise used to compute the OIC, with the initial model conditions known. The software obtained a rank of 17, which matches the 5 states plus 12 parameters of this stage, in 398.4 s of execution time, calculating up to the fourth Lie derivative. This result indicates that the observability-identifiability matrix is full rank; that is, the system is structurally identifiable for all of the parameters. In previous proposals, it was not possible to guarantee this condition.[17,24,40] However, it was possible with the current model due to the changes in the constitutive equations.
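To illustrate what a full-rank result means, the following minimal sketch applies the same kind of rank test to a toy one-state model. It is not the hydrolysis model and does not use the STRIKE-GOLDD interface. The toy system dx/dt = −p·x with output y = x, the function names, and the hand-derived Lie derivative are all assumptions made for illustration. The parameter p is appended to the state as a constant, and the matrix rows are the gradients of the output and of its first Lie derivative with respect to (x, p).

```python
from fractions import Fraction


def matrix_rank(rows):
    """Exact rank by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def toy_oi_matrix(x, p):
    """Hypothetical toy model: dx/dt = -p*x, dp/dt = 0, output h = x.

    Row 0 is the gradient of h = x with respect to (x, p).
    Row 1 is the gradient of the first Lie derivative L_f h = -p*x.
    """
    return [[1, 0], [-p, -x]]


# Full rank (2 = 1 state + 1 parameter) at a generic point.
rank_generic = matrix_rank(toy_oi_matrix(x=2, p=3))
# Rank drops at x = 0, where the output carries no information about p.
rank_at_zero = matrix_rank(toy_oi_matrix(x=0, p=3))
print(rank_generic, rank_at_zero)
```

In this toy case, full rank equals the number of states plus the number of parameters, which is the same counting that gives 15 and 17 for the two stages.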

Step 6: Cost Function in the Liquefaction and Saccharification Stages

As intended in the proposed methodology, the enzymatic hydrolysis model has numerous parameters in both stages: 10 in the liquefaction stage and 12 in the saccharification stage. Let g(ψ) be an equality constraint function between ψ parameters and a constant c, and let g(ψ) be an inequality constraint function between ψ and another constant c, where the function must be less than or equal to the constant. Let p and p be the lower and upper boundaries of the vector p, respectively. For parameter identification in both the liquefaction and saccharification stages, J(p) is proposed as

subject to

where y̅ and σ(y̅) are the mean value of the output j and the standard deviation associated with the sample i, respectively, and are defined as

J(p) is subject to the mean values described in eq 29. This structure makes it possible to use the experimental data reported in ref (9) and provides a point of comparison between the proposed model and the model in that reference. Therefore, J(p) measures the integral quadratic error of the parameter identification, adjusted by the standard deviation and mean of each sample so that all of the information in the measured data is included, in contrast to previous works.[7,9] The lower the value of J(p), the nearer the local minimum p̂ is to the real value of p.
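A cost function of this kind can be sketched as follows. The exact expression for J(p) is not reproduced here, so the form used below is an assumption: a sum of squared differences between the replicate mean and the model output, each scaled by the replicate standard deviation. The function names and the data are made up for illustration.

```python
from statistics import mean, stdev


def summarize(replicates):
    """Mean and sample standard deviation of one output at one sampling time."""
    return mean(replicates), stdev(replicates)


def cost(model_values, replicate_sets):
    """Assumed form of J(p): squared deviation from the replicate mean,
    scaled by the replicate standard deviation, summed over samples."""
    total = 0.0
    for y_model, replicates in zip(model_values, replicate_sets):
        y_bar, sigma = summarize(replicates)
        total += ((y_bar - y_model) / sigma) ** 2
    return total


# Made-up data: one output, two sampling times, three replicate experiments each.
replicate_sets = [[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]]
# Made-up model predictions at the same two sampling times.
model_values = [2.5, 6.0]

J = cost(model_values, replicate_sets)
print(J)
```

Scaling by the standard deviation gives less weight to samples whose replicates disagree, and it also explains why a sample with zero standard deviation cannot be used in the sum.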

Step 7: Constraints’ Definition in the Liquefaction and Saccharification Stages

In the liquefaction stage, the algorithm does not consider eq and uses eq as the basis of eq for including the weighted sum constraint in the maltodextrin concentration. Meanwhile, the saccharification stage only considers eqs 28e–31 so that ψ can adjust the maltodextrin concentration between the limits. Table S5 in the Supporting Information shows the numeric values of the constraints in the lower boundary p and the upper boundary p. In the liquefaction stage, the boundaries are based on previous results reported in the literature,[9,17] except for the h parameters, which must lie between −1 and 0 in the proposed model. This range is imposed because each h must indicate the growing possibility of the respective oligosaccharide with respect to S0. In the saccharification stage, the initial numerical values of the boundaries are based on previous results reported in the literature.[9] Those values were then iterated toward expected values selected using phenomenological knowledge of the system. The objective is to narrow the search neighborhood and improve the solution p̂.
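A feasibility check against such constraints might look like the sketch below. The h values are the liquefaction initial values from Table 1 and the [−1, 0] range is the one stated above. The equality constraint β1 + β2 = 1 is only an assumption, suggested by the liquefaction values in Table 1 (0.0612 + 0.9388), and the function name is hypothetical.

```python
def bound_violations(values, lower, upper):
    """Return the names whose value lies outside [lower, upper]."""
    return [
        name
        for name, value in values.items()
        if not lower[name] <= value <= upper[name]
    ]


# Initial h values for the liquefaction stage, copied from Table 1.
h0 = {"h1": -0.3407, "h2": -0.3362, "h3": -0.3228, "h5": -0.4567}
# Range stated in the text for every h parameter.
h_lower = {name: -1.0 for name in h0}
h_upper = {name: 0.0 for name in h0}

violations = bound_violations(h0, h_lower, h_upper)

# Assumed equality constraint (not confirmed by the text): beta1 + beta2 = 1.
beta0 = {"beta1": 0.0612, "beta2": 0.9388}
equality_residual = beta0["beta1"] + beta0["beta2"] - 1.0

print(violations, equality_residual)
```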

Steps 8 and 9: Obtain and Filter Measurements of the Liquefaction and Saccharification Stages

For each stage, Acosta-Pavas and Ruiz-Colorado[9] ran three experiments (m). For each experiment, y̅ and σ(y̅) were calculated. The details of the experiments can be found in refs (9 and 41). Based on this data set, some considerations were made for each model. In the liquefaction stage, the data set has six available measurements (m) taken over a process time of 2 h for each of the outputs defined in eq . In this data set, the first measurements (initial condition) were ignored because they cause problems with the cost function: standard deviations equal to zero would introduce singularities into it. In the saccharification stage, the data set has 15 available measurements (m) taken over a process time of 6.5 h for each of the outputs defined in eq . In this data set, the eighth and ninth measurements (initial condition) were ignored, for the same reason as in the liquefaction stage.
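The filtering step described above can be sketched as follows: samples whose replicates have zero standard deviation are dropped before the cost function is evaluated. The data and the function name are made up for illustration and are not the measurements of ref (9).

```python
from statistics import mean, stdev


def usable_samples(replicates_by_time):
    """Keep (index, mean, std) only for samples with nonzero standard deviation."""
    kept = []
    for index, replicates in enumerate(replicates_by_time):
        sigma = stdev(replicates)
        if sigma > 0:
            kept.append((index, mean(replicates), sigma))
    return kept


# Made-up data: three replicate experiments at three sampling times.
# The first sample mimics an initial condition that is identical in every run.
data = [[0.0, 0.0, 0.0], [1.0, 1.2, 1.4], [2.0, 2.5, 3.0]]

kept = usable_samples(data)
kept_indices = [index for index, _, _ in kept]
print(kept_indices)
```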

Step 10: Initial Point in the Liquefaction and Saccharification Stages

For the initial conditions x0 in each stage, the algorithm took the published values in ref (9). The initial values of the parameter vector p0 for executing fmincon were selected such that the algorithm can obtain a feasible p̂ over the parameter boundaries p and p. These values are shown in Table 1.

Table 1

Initial Values for p and x

Liquefaction model			Saccharification model
Variable	Value	Units	Variable	Value	Units
G₁	0		G₁	0.0041
G₂	0		G₂	0.0063
G₃	0		G₃	0.0259
G₅	213.49		G₅	0.2419
T	331.95	[K]	T	333.75	[K]
	1.55		K_I	80000
	404		K_S	390
	185		K₂	460000
	14.25		K₃	2.15
h₁	–0.3407	[ – ]	K₅	3500
h₂	–0.3362	[ – ]		150
h₃	–0.3228	[ – ]		0.008
h₅	–0.4567	[ – ]		9
β₁	0.0612	[ – ]		3000
β₂	0.9388	[ – ]		3500
-	-	-	β₁	0.0612	[ – ]
-	-	-	β₂	0.9387	[ – ]

Step 11: Optimization Results in the Liquefaction and Saccharification Stages

Table 2 shows the p̂ values obtained through the optimization algorithm. In the liquefaction stage, the proposed methodology obtained a lower J(p) value (3.8665 × 10³) than the evaluation of the cost function over the model developed in ref (9) (2.0461 × 10⁴). In terms of the relative error between the model and the data, the proposed model obtains 7.64% compared with 7.96% for the previous model. This result can be interpreted as an improvement of the model. In the saccharification stage, the J(p) value obtained by the proposed methodology (2.2062 × 10³) is also lower than the evaluation of the cost function over the model developed in ref (9) (7.4025 × 10³), although the difference is smaller than in the liquefaction stage. The improvement in relative error between each model and the respective experimental data, however, is more significant, indicating a better adjustment with the proposed model: the proposed model obtains 7.35% compared with 11.99% for the previous model.
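The comparison can be made concrete with a short calculation on the reported values, reading the exponents as 10³ and 10⁴. The sketch below computes how many times larger the previous model's cost is and by how many percentage points the relative error drops. The variable names are illustrative.

```python
# Reported cost function values J(p) for each stage.
j_values = {
    "liquefaction": {"proposed": 3.8665e3, "previous": 2.0461e4},
    "saccharification": {"proposed": 2.2062e3, "previous": 7.4025e3},
}
# Reported overall relative errors, in percent.
rel_errors = {
    "liquefaction": {"proposed": 7.64, "previous": 7.96},
    "saccharification": {"proposed": 7.35, "previous": 11.99},
}

# Ratio of previous to proposed cost: larger means a bigger reduction in J(p).
cost_ratio = {
    stage: values["previous"] / values["proposed"]
    for stage, values in j_values.items()
}
# Drop in relative error, in percentage points.
error_drop = {
    stage: values["previous"] - values["proposed"]
    for stage, values in rel_errors.items()
}

for stage in j_values:
    print(f"{stage}: J ratio {cost_ratio[stage]:.2f}, error drop {error_drop[stage]:.2f} points")
```

The cost ratio is about 5.3 for liquefaction and about 3.4 for saccharification, while the error drop is 0.32 points for liquefaction and 4.64 points for saccharification, so the two measures rank the stages differently.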

Table 2

Obtained Values for p̂

Liquefaction model			Saccharification model
Variable	Estimated value	Units	Variable	Estimated value	Units
	1.548671		K_I	75032.6049
	396.445826		K_S	554.9090
	182.857411		K₂	460614.7352
	7.152166		K₃	0.6699
h₁	–0.343724	[ – ]	K₅	3915.2964
h₂	–0.335595	[ – ]		137.8292
h₃	–0.322011	[ – ]		0.0059
h₅	–0.455272	[ – ]		10.9983
β₁	0.061203	[ – ]		2782.6009
β₂	0.938797	[ – ]		5651.7907
-	-	-	β₁	1.2043	[ – ]
-	-	-	β₂	1.3040	[ – ]

Figure 2 shows the simulation of the proposed model with the identified parameters. In Figure 2a it is possible to see the match between the data and the model. In Figure 2b there is a difference between the data and the model in the initial phase of growth because of the assumption that oligosaccharides with higher DP compete for the active site of the enzyme. As a consequence, there are reversible reactions among oligosaccharides with DP < 5 in the model.

Figure 2

Comparison between the models for the liquefaction stage: (a) glucose; (b) maltose; (c) maltodextrins; (d) temperature.

In Figure 2c it was impossible to obtain a better result because this output variable is a combination of two state variables; however, this result is better than the previous result obtained in ref (9). Finally, Figure 2d shows no significant change in temperature because the process has a feedback control loop not incorporated in this study. Despite this, the developed methodology aims at adjusting the bioprocess dynamics correctly for all outputs. For G1, G2, β1G3 + β2G5, and T, the relative errors in the liquefaction stage were 12.72%, 12.33%, 4.96%, and 0.54%, respectively. In other words, the proposed model obtained relative errors in the liquefaction stage <13%. Notably, the best-adjusted concentration dynamic in the liquefaction stage was the maltodextrin concentration, which indicates that the changes introduced into the model for that output were positive for bioprocess modeling.

Figure 3 shows the simulation of the proposed model with the identified parameters for the saccharification stage. In Figure 3a–c it is possible to see the match between the data and the model. This result was possible because the amount of data was significantly greater than in the liquefaction stage. Finally, Figure 3d shows no significant change in temperature because this variable was controlled. Therefore, the developed methodology aimed to adjust the bioprocess dynamics correctly for all outputs. For G1, G2, β1G3 + β2G5, and T, the relative errors in the saccharification stage were 5.66%, 20.87%, 2.74%, and 0.13%, respectively. In other words, the proposed model obtained relative errors in the saccharification stage <6%, except for maltose, where the error was 20.87% (Figure 3b). As in the liquefaction stage, the best-adjusted concentration dynamic in the saccharification stage was the maltodextrin concentration, which indicates that the change in the use of the constraint for ψ was positive for bioprocess modeling.
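The per-output errors can be related to the overall figures with a short check. The overall errors of 7.64% and 7.35% are consistent with the arithmetic mean of the four per-output errors in each stage (7.6375% and 7.35%). Whether the overall figures were actually obtained this way is an assumption, and the variable names below are illustrative.

```python
from statistics import mean

# Reported relative errors (percent) for G1, G2, beta1*G3 + beta2*G5, and T.
per_output_errors = {
    "liquefaction": [12.72, 12.33, 4.96, 0.54],
    "saccharification": [5.66, 20.87, 2.74, 0.13],
}
# Reported overall relative errors (percent) for the proposed model.
reported_overall = {"liquefaction": 7.64, "saccharification": 7.35}

# Arithmetic mean of the per-output errors in each stage.
mean_error = {stage: mean(errors) for stage, errors in per_output_errors.items()}

for stage, value in mean_error.items():
    gap = abs(value - reported_overall[stage])
    print(f"{stage}: mean {value:.4f}%, reported {reported_overall[stage]}%, gap {gap:.4f}")
```

The maltose error of 20.87% dominates the saccharification mean, so the overall figure hides a large spread between outputs.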

Figure 3

Comparison between the models for the saccharification stage: (a) glucose; (b) maltose; (c) maltodextrins; (d) temperature.

Conclusions

A methodology to obtain a PBSM of the enzymatic hydrolysis process of amylaceous materials was applied in this work. The methodology considers the identifiability concept, which is evaluated by using the STRIKE-GOLDD tool. This new perspective takes into account the viability of parameter identification in the mathematical structure of the dynamic system to be iterated. Therefore, it is possible to simplify the modeling process and to obtain better results when applying optimization algorithms. To show the effectiveness of the methodology, it was applied to the liquefaction and saccharification stages of the enzymatic hydrolysis of amylaceous materials. The resulting models have a less complex mathematical structure than previous models found in the literature. The errors with respect to the experimental data were 7.64% and 7.35% for the liquefaction and saccharification stages, respectively. A relevant modification to the model was the change in the definition of the concentration of maltodextrins based on the parameters β1 and β2, depending on the stage. This change was associated with additional restrictions for ψ that allowed a better fit of the model. Finally, the aim is to provide greater robustness in the modeling and simulation of biological processes based on their phenomenology and parametric identification, achieving user-friendly methodologies.
