Albert Uhoraningoga, Gemma K Kinsella, Gary T Henehan, Barry J Ryan.
Abstract
The production of high yields of soluble recombinant protein is one of the main objectives of protein biotechnology. Several factors, such as the expression system, vector, host, media composition and induction conditions, can influence recombinant protein yield. Identifying the most important factors for optimum protein expression may require a significant investment of time and considerable cost. To address this problem, statistical approaches such as Design of Experiments (DoE) have been used to optimise recombinant protein production. This review examines the application of DoE to the production of recombinant proteins in prokaryotic expression systems, with specific emphasis on media composition and culture conditions. It examines the most commonly used DoE screening and optimisation designs and provides examples of DoE applied to the optimisation of media and culture conditions.
Keywords: design of experiments; process optimization; recombinant protein production; response surface methodology; screening design
Year: 2018 PMID: 30347746 PMCID: PMC6316313 DOI: 10.3390/bioengineering5040089
Source DB: PubMed Journal: Bioengineering (Basel) ISSN: 2306-5354
Summary of the most widely used recombinant expression strains from E. coli and Bacillus species outlining their advantages and disadvantages.
| Host (Common Strains) | General Advantages | Disadvantages | References |
|---|---|---|---|
| E. coli (most common: BL21, …) | Rapid expression, high yield, ease of culture and gene modification, cost effective. | Post-translational modification not possible. | […] |
| Bacillus (most common: …) | Preferred for homologous expression of some enzymes (e.g., proteases and amylases), … | Contains proteases, which may hydrolyse recombinant proteins. | […] |
Figure 1. A typical DoE workflow in protein production. Case study A illustrates the optimisation of recombinant lipase KV1 expression in E. coli [84], where a screening step was not required because only a small number of factors (four) affect expression of this enzyme. The four factors (A, B, C, D) were therefore optimised directly using a Central Composite Design (CCD) within Response Surface Methodology (RSM), which resulted in a 3.1-fold increase in protein expression. Case study B describes the optimisation process for high-yield production of recombinant human interferon-γ [85]. In this case, a large number of factors (nine) was involved, so the factors were screened before optimisation. Plackett-Burman Design (PBD) screening identified four of the nine factors (X1, X2, X3, X7) as the most influential, and these were carried forward for optimisation. A Box–Behnken Design (BBD), also within RSM, was then used to optimise the screened factors and increased the production of human interferon-γ by up to 5.1-fold. Further details of these two case studies can be found in the cited references, and similar cases are listed in Tables 4 and 7.
Figure 2. Comparison between the Design of Experiments (DoE) and One-Factor-at-a-Time (OFAT) approaches, examining the effect of two parameters, P1 (Parameter 1) and P2 (Parameter 2). (a) OFAT requires more experiments than DoE (each black dot represents an experiment) and does not identify the true optimum (indicated by the red oval). (b) The DoE approach uses fewer experiments and has a high likelihood of finding the optimum conditions (in red) for the process being studied. With DoE, the combined (interaction) effect of P1 and P2 on the response can also be identified and measured. The ovals indicate production yields: blue indicates the lowest yields and red the highest, where the optimum is found. The DoE approach also identifies a pathway to the optimum response (indicated by the arrow).
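The contrast drawn in Figure 2 can be reproduced numerically. The sketch below is a toy illustration, not data from this review: the yield surface `response` is a hypothetical function with a diagonal ridge (a strong P1 × P2 interaction), and the levels 0 to 4 are arbitrary. OFAT varies P1 at a fixed P2 and then P2 at the best P1, and stops well below the true optimum at P1 = P2 = 2. A 2 × 2 factorial at the extreme levels shows that both main effects are zero while the interaction effect is large, which is exactly the information OFAT cannot provide.

```python
from itertools import product

LEVELS = [0, 1, 2, 3, 4]  # arbitrary settings for P1 and P2


def response(p1, p2):
    """Hypothetical yield surface with a diagonal ridge (strong P1 x P2 interaction)."""
    return 10 - (p1 - p2) ** 2 - 0.1 * (p1 + p2 - 4) ** 2


def ofat(start_p2=0):
    """One-factor-at-a-time: vary P1 with P2 fixed, then vary P2 at the best P1."""
    runs = set()
    best_p1 = max(LEVELS, key=lambda p1: response(p1, start_p2))
    runs.update((p1, start_p2) for p1 in LEVELS)
    best_p2 = max(LEVELS, key=lambda p2: response(best_p1, p2))
    runs.update((best_p1, p2) for p2 in LEVELS)
    return (best_p1, best_p2), len(runs)  # len(runs) counts distinct experiments


def factorial_effects(low=0, high=4):
    """Main and interaction effects from a 2x2 factorial run at the low/high levels."""
    y = {(a, b): response(high if a > 0 else low, high if b > 0 else low)
         for a, b in product((-1, 1), repeat=2)}

    def effect(contrast):
        # Effect = mean response where the contrast is +1 minus mean where it is -1.
        plus = [v for k, v in y.items() if contrast(*k) > 0]
        minus = [v for k, v in y.items() if contrast(*k) < 0]
        return sum(plus) / len(plus) - sum(minus) / len(minus)

    return {"P1": effect(lambda a, b: a),
            "P2": effect(lambda a, b: b),
            "P1*P2": effect(lambda a, b: a * b)}


if __name__ == "__main__":
    (p1, p2), n_runs = ofat()
    print(f"OFAT stops at P1={p1}, P2={p2} (yield {response(p1, p2):.1f}) after {n_runs} runs")
    print(f"Yield at the true optimum P1=P2=2: {response(2, 2):.1f}")
    print("2x2 factorial effects:", factorial_effects())
```

In this toy case OFAT settles at P1 = P2 = 0 with a yield of 8.4 after nine distinct runs, whereas the optimum yield is 10; the factorial reports zero main effects and a P1*P2 interaction effect of 14.4.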
Figure 3. A typical DoE workflow for the optimisation of recombinant protein production. The figure describes the main steps of the experimental design when both screening and optimisation designs are used. (1) The objectives of the study are defined, including the selection of factors, levels and responses. (2) Process variables and expected responses are identified; for a two-level study, the levels of each process variable are coded as high (+1) and low (−1), and a centre point (0) is sometimes included. (3) The experimental screening design is selected based on the objectives of the study and the number of factors involved. (4) A mathematical model is built subject to conditions that meet the desired objectives (e.g., measurement of all the desired responses, process stability and accurate approximation by polynomial models). (5) The response data are analysed and visualised using plots to aid interpretation. At this stage, a reduced number of factors (i.e., the most influential) is retained for the subsequent optimisation phase. (6) Further optimisation can then be carried out using an optimisation design.
An example of a two-level experimental design with nine factors known to influence recombinant protein expression. Here the nine factors relate to two experimental components: media composition and induction conditions. When planning the screening phase, the factors (yeast extract, tryptone, glycerol, NaCl, inoculum size, IPTG concentration, induction temperature, incubation time and pH, labelled X1 to X9, respectively) and their levels (high, coded +1, and low, coded −1) are chosen to cover the intended experimental space (i.e., the productive range). The low and high levels bound the known working range of each factor.
| Component | Factor | Low Level | High Level |
|---|---|---|---|
| Media composition | X1 Yeast extract | − | + |
| | X2 Tryptone | − | + |
| | X3 Glycerol | − | + |
| | X4 NaCl | − | + |
| Induction conditions | X5 Inoculum size | − | + |
| | X6 IPTG concentration | − | + |
| | X7 Induction temperature | − | + |
| | X8 Incubation time | − | + |
| | X9 pH | − | + |
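Because the design above is written in coded units, each actual setting is mapped onto the −1 to +1 scale before analysis using x = (X − X_centre) / (half-range), where X_centre is the midpoint of the low and high levels. The sketch below assumes a hypothetical working range of 25 to 37 °C for induction temperature (X7); the table gives no actual values, so these limits are for illustration only.

```python
def to_coded(value, low, high):
    """Map an actual factor setting onto the coded scale (-1 at low, +1 at high)."""
    centre = (low + high) / 2
    half_range = (high - low) / 2
    return (value - centre) / half_range


def to_actual(coded, low, high):
    """Inverse mapping: coded level back to the actual factor setting."""
    centre = (low + high) / 2
    half_range = (high - low) / 2
    return centre + coded * half_range


if __name__ == "__main__":
    # Hypothetical limits for X7 (induction temperature, degrees C); not taken from the table.
    low_t, high_t = 25.0, 37.0
    for temp in (25.0, 31.0, 37.0):
        print(temp, "->", to_coded(temp, low_t, high_t))
    print("coded +0.5 ->", to_actual(0.5, low_t, high_t))
```

Working in coded units puts all factors on the same scale, so model coefficients for, say, NaCl and temperature can be compared directly.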
A comparison of DoE screening designs commonly used in optimising recombinant protein production. The table lists each type of screening design, the effects explained by the model and the number of runs required for two to seven factors (a run refers to a single experiment). Extra runs (such as centre points) can be added when required. Custom designs are more flexible and allow the designer to select the number of experimental runs.
| Screening Design | Effect Explained by the Model | 2 Factors | 3 Factors | 4 Factors | 5 Factors | 6 Factors | 7 Factors |
|---|---|---|---|---|---|---|---|
| Full Factorial Design | Main effect and 2 factor interactions | 4 | 8 | 16 | 32 | 64 | 128 |
| Fractional Factorial Design | Main effect only | - | - | - | 8 | 8 | 8 |
| | Main effect and 2 factor interactions | - | 8 | 8 | 16 | 16 | 16 |
| | Main effect and 2 factor interactions | - | - | 16 | 16 | 32 | 64 |
| Plackett-Burman Design | Main effect only | - | - | - | - | 12 | 12 |
| Definitive Screening Design | Main effect and 2 factor interactions | - | 13 | 13 | 13 | 13 | 17 |
| | Main effect, 2 factor interactions and quadratic effects | - | 17 | 17 | 17 | 17 | 22 |
| Custom Design | Main effect only | ≥3 | ≥4 | ≥5 | ≥6 | ≥7 | ≥8 |
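Some of the run counts above follow from simple rules: a two-level full factorial needs 2^k runs, and a 2^(k−p) fractional factorial keeps 1/2^p of them by defining some factor columns as products of others. The sketch below builds one possible 8-run design for five factors, a 2^(5−2) fraction with generators D = AB and E = AC (a common textbook choice assumed here, not necessarily the fraction any given software would pick), and checks that the five main-effect columns stay mutually orthogonal. Because D and E are aliased with two-factor interactions, this fraction estimates main effects only, assuming interactions are negligible.

```python
from itertools import product

import numpy as np


def full_factorial_runs(k):
    """Number of runs in a two-level full factorial with k factors."""
    return 2 ** k


def fractional_factorial_5_2():
    """2^(5-2) design: full factorial in A, B, C plus generated columns D = AB, E = AC."""
    base = np.array(list(product((-1, 1), repeat=3)))  # 8 runs x 3 base factors
    a, b, c = base.T
    return np.column_stack([a, b, c, a * b, a * c])


if __name__ == "__main__":
    # Run counts for 2..7 factors, matching the Full Factorial Design row.
    print([full_factorial_runs(k) for k in range(2, 8)])
    design = fractional_factorial_5_2()
    print(design)
    # Balanced, orthogonal columns give X^T X = n * I (here n = 8).
    print(design.T @ design)
```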
A selection of widely used screening designs and their application in identifying the factors that influence the production of recombinant proteins.
| Host Organism | Protein Involved | Screening Design | Factors Studied | Screened Significant Factors | Reference |
|---|---|---|---|---|---|
| | Xylanase | Full Factorial Design | Media composition | Xylan, casein hydrolysate, NH4Cl | […] |
| | Non-structural protein NS3 | Full Factorial Design | Culture condition | Temperature, induction length | […] |
| | Fibrinolytic enzyme | Full Factorial Design | Media composition | pH, maltose, NaH2PO4 | […] |
| | Zinc-metalloprotease (SVP2) | Fractional Factorial Design | Media composition and culture condition | IPTG and Ca2+ ion concentration, temperature | […] |
| | Soluble pneumolysin | Fractional Factorial Design | Media composition and culture condition | Temperature, tryptone, kanamycin | […] |
| | L-asparaginase | Plackett-Burman Design | Media composition | Soya bean meal, asparagine, woodchips, NaCl | […] |
| | Vascular endothelial growth factor | Plackett-Burman Design | Media composition and culture condition | Glycerine, inducing time, peptone | […] |
| | L-asparaginase | Plackett-Burman Design | Culture condition | pH, casein hydrolysate, corn steep liquor | […] |
| | Human interferon gamma | Plackett-Burman Design | Media composition | Gluconate, glycine, KH2PO2 | […] |
| | Chitinase | Plackett-Burman Design | Media composition | Yeast extract, K2HPO4, KH2PO4 | […] |
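Plackett-Burman designs, the most frequent screening choice in the table above, can be generated by cyclically shifting a single sign row. The sketch below uses the standard 12-run generator row tabulated by Plackett and Burman (+ + − + + + − − − + −): eleven cyclic shifts plus a final row with every factor at −1 give 12 runs for up to 11 two-level factors. Assigning, for example, nine factors X1 to X9 to the first nine columns would leave two unused ("dummy") columns that can help estimate experimental error; that assignment is an illustrative assumption.

```python
import numpy as np

# Standard generator row for the 12-run Plackett-Burman design.
PB12_GENERATOR = np.array([1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1])


def plackett_burman_12(n_factors=11):
    """Return a 12 x n_factors Plackett-Burman design in coded units (-1/+1)."""
    if not 1 <= n_factors <= 11:
        raise ValueError("the 12-run design accommodates 1 to 11 factors")
    # Each row is the generator shifted cyclically by one more position.
    rows = [np.roll(PB12_GENERATOR, shift) for shift in range(11)]
    rows.append(-np.ones(11, dtype=int))  # final run with every factor at its low level
    return np.vstack(rows)[:, :n_factors]


if __name__ == "__main__":
    design = plackett_burman_12(9)  # e.g. nine factors X1..X9
    print(design)
    print("runs:", design.shape[0])
```

The columns are balanced (six runs at each level) and mutually orthogonal, which is why each main effect can be estimated independently from only 12 runs.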
Identification of the statistically significant factors during a screening process using a Fractional Factorial Design. The table shows the effect (positive or negative) and p-value for the seven factors examined (labelled X1 to X7). The sign of each effect, positive (+) or negative (−), is determined during the analysis stage using the statistical routines embedded in the DoE software (JMP in this example). Interaction effects can also be identified (e.g., X5*X1 and X3*X7, where * indicates an interaction). The p-value of each factor is evaluated at a significance level of 0.05. In this example, the highlighted factors (X3, X6, X1) were identified as the most influential, based on their large absolute effects (−1.11273, 0.2252, 0.17492) and p-values below 0.05 (0.001, 0.0143, 0.0296). Thus, only X3, X6 and X1 are statistically significant at the 0.05 level, with X3 having a negative effect and X6 and X1 positive effects. The other factors (X2, X4, X5, X7) and the interactions X5*X1 and X3*X7 are not statistically significant.
| Factor | Effect | p-Value |
|---|---|---|
| X3 | −1.11273 | 0.001 |
| X6 | 0.2252 | 0.0143 |
| X1 | 0.17492 | 0.0296 |
| X4 | 0.06408 | 0.2215 |
| X7 | 0.04154 | 0.4112 |
| X2 | −0.07970 | 0.1421 |
| X5 | 0.00233 | 0.9664 |
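Effects like those above come from fitting a linear model to coded screening data. The sketch below shows the underlying arithmetic on a hypothetical, noise-free 2^3 factorial; the response values are invented and unrelated to the table. The classical effect of a factor is the mean response at its high level minus the mean at its low level, and the least-squares coefficient in coded units is half of that effect. Software packages differ in which of the two they label "effect" or "estimate", so it is worth checking the documentation of the package used. p-values additionally require an error estimate (replicates or residual degrees of freedom), which this noise-free example omits.

```python
from itertools import product

import numpy as np

# Hypothetical 2^3 full factorial in coded units (columns X1, X2, X3).
X = np.array(list(product((-1, 1), repeat=3)), dtype=float)

# Invented, noise-free responses: X1 raises yield, X3 lowers it, X2 has no effect.
y = 10 + 2.0 * X[:, 0] - 1.5 * X[:, 2]


def classical_effects(X, y):
    """Mean response at the high level minus mean response at the low level, per column."""
    return np.array([y[X[:, j] > 0].mean() - y[X[:, j] < 0].mean()
                     for j in range(X.shape[1])])


def coded_coefficients(X, y):
    """Least-squares coefficients (intercept first) of a main-effects model in coded units."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


if __name__ == "__main__":
    effects = classical_effects(X, y)
    coefs = coded_coefficients(X, y)
    for name, eff, b in zip(["X1", "X2", "X3"], effects, coefs[1:]):
        print(f"{name}: effect = {eff:+.2f}, coded coefficient = {b:+.2f}")
```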
Figure 4. A comparative illustration of screening and optimisation designs. (a) Screening designs examine a large number of factors in a small number of runs to identify the important factors affecting the process. (b) Optimisation designs examine a reduced number of factors in a larger number of runs to find the optimum conditions for high recombinant protein yield.
Common CCD components and the possible total number of runs. Factorial, axial and central points are the main components of a typical CCD, and the total number of runs is dictated by the number of factors being tested. As the number of factors increases, the number of points in each component increases, and so does the total number of runs. In some cases, CCDs are run without axial points, particularly when the variance of the model prediction is not a concern [140].
| Number of Factors | Number of Factorial Points | Number of Axial Points | Number of Central Points | Total Number of Runs |
|---|---|---|---|---|
| 2 | 4 | 4 | 5 | 13 |
| 3 | 8 | 6 | 6 | 20 |
| 4 | 16 | 8 | 7 | 31 |
| 5 | 16 | 10 | 6 | 32 |
| 6 | 32 | 12 | 9 | 53 |
| 7 | 64 | 14 | 14 | 92 |
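The totals in the table can be reproduced as factorial points (2^k, or a half fraction 2^(k−1) for the five-, six- and seven-factor rows) plus 2k axial points plus the chosen number of centre points. The sketch below computes these counts and builds a full-factorial CCD in coded units. It assumes the rotatable axial distance α = (number of factorial points)^(1/4), one common choice among several (face-centred designs, for instance, use α = 1); the function names are illustrative.

```python
from itertools import product

import numpy as np


def ccd_run_count(k, n_centre, fraction=0):
    """Factorial (2^(k - fraction)) + axial (2k) + centre points."""
    return 2 ** (k - fraction) + 2 * k + n_centre


def central_composite(k, n_centre, alpha=None):
    """Full-factorial CCD in coded units; alpha defaults to the rotatable value."""
    factorial = np.array(list(product((-1.0, 1.0), repeat=k)))
    if alpha is None:
        alpha = len(factorial) ** 0.25  # rotatable axial distance
    axial = np.zeros((2 * k, k))
    for i in range(k):
        axial[2 * i, i] = -alpha      # one factor at -alpha, the rest at the centre
        axial[2 * i + 1, i] = alpha   # one factor at +alpha, the rest at the centre
    centre = np.zeros((n_centre, k))
    return np.vstack([factorial, axial, centre])


if __name__ == "__main__":
    # (factors, centre points, fraction) for each row of the table above
    rows = [(2, 5, 0), (3, 6, 0), (4, 7, 0), (5, 6, 1), (6, 9, 1), (7, 14, 1)]
    print([ccd_run_count(*r) for r in rows])  # the Total Number of Runs column
    print(central_composite(2, 5).round(3))
```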
CCD has been extensively used to optimise the production of recombinant proteins (see Table 7).
RSM designs used to optimise the production of recombinant proteins, with their effect on yield and the corresponding reference.
| Microorganism | Recombinant Protein | RSM Methods | Optimised Factors | Optimised vs. Non-Optimised Yield | Reference |
|---|---|---|---|---|---|
| | Superoxide dismutase | Box–Behnken Design | Tryptone, Tween-80, lactose | Enzyme activity increased by 1.54-fold | […] |
| | Human interferon beta | Box–Behnken Design | Temperature, cell density, NaCl | hIFN-β concentration increased by 5-fold | […] |
| | Human interferon gamma | Box–Behnken Design | Temperature, biomass concentration, NaCl | hIFN-γ concentration increased by 13-fold | […] |
| | β-glucosidase | Box–Behnken Design | Sorbitol, MeOH, pH | Enzyme activity increased by 3.3-fold | […] |
| | Amylase | Central Composite Design | Soybean meal, yeast extract, wheat bran | Enzyme yield increased by 1.25-fold | […] |
| | | Central Composite Design | Starch, yeast extract, glycerol, peptone | Enzyme activity reached 17.54 IU/mL | […] |
| | Cyclodextrin glucanotransferase | Central Composite Design | IPTG, arabinose, post-induction temperature | Enzyme activity increased by 3.45-fold | […] |
| | Cytochrome 2C9 protein | Central Composite Design | Ampicillin, chloramphenicol, IPTG, peptone | Enzyme production increased by 1.05-fold | […] |
| | Interferon beta | Central Composite Design | DCW (dry cell weight), IPTG | Production increased by more than 3-fold | […] |
| | L-Asparaginase | Central Composite Design | Tryptone, yeast extract, peptone, CaCl2 | Enzyme activity reached 17,386 U/L | […] |
| | Peptide T-20 | Central Composite Design | NPK, IPTG, post-induction time | Production increased by more than 2-fold | […] |
| | TaqI endonuclease | Central Composite Design | Glucose, (NH4)2HPO4, KH2PO4, MgSO4·7H2O | Enzyme yield increased by about 3.6-fold | […] |
| | Xylanase | Central Composite Design | Glucose, (NH4)2HPO4, K2HPO4, KH2PO4, MgSO4 | Production increased by 1.7-fold | […] |
| | Bromelain | Central Composite Design | Temperature, inducer concentration, post-induction period | Enzyme activity increased by 1.3-fold | […] |
| | Phytase | Central Composite Design | Tryptone, yeast extract, NaCl | Production increased by 2.78-fold | […] |
| | Chitinase | Central Composite Design | Temperature, incubation time | Total activity increased by 1.54-fold | […] |
| | Zinc metalloprotease | Central Composite Design | IPTG, Ca2+, induction time | Production increased by 15-fold | […] |
| | Carboxymethyl-Cellulose | Central Composite Design | Rice bran, tryptone, initial pH of medium | Production increased by 3-fold | […] |
| | Phytase | Central Composite Design | Yeast extract, Tween-80, methanol | Specific activity increased by 21.8-fold | […] |
| | MBP-Heparinase | Central Composite Design (Orthogonal) | Yeast extract, glucose, Ca2+, OD600 | Specific activity increased by 2.5-fold | […] |
| | | Central Composite Design (Rotatable) | Inoculation level, induction-starting time, lactose, induction temperature, induction time | Enzyme activity increased by 4.6-fold | […] |
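Box–Behnken designs, used in several of the studies above, avoid the corner points of the cube: each run sets a pair of factors to ±1 while the remaining factors stay at their centre level, and centre points are added. The sketch below builds the design from every pair of factors. This pairwise construction reproduces the commonly tabulated Box–Behnken layouts for three and four factors (15 and 27 runs with three centre points); published designs for larger numbers of factors may use different block structures, so treat it as an illustration rather than a general generator.

```python
from itertools import combinations, product

import numpy as np


def box_behnken(k, n_centre=3):
    """Box-Behnken-style design in coded units built from every pair of factors."""
    if k < 3:
        raise ValueError("Box-Behnken designs need at least three factors")
    runs = []
    for i, j in combinations(range(k), 2):
        for a, b in product((-1, 1), repeat=2):
            run = [0] * k  # all other factors held at the centre level
            run[i], run[j] = a, b
            runs.append(run)
    runs.extend([[0] * k for _ in range(n_centre)])
    return np.array(runs)


if __name__ == "__main__":
    design = box_behnken(3)
    print(design)
    print("runs:", len(design))  # 12 edge midpoints + 3 centre points
```

Because no run places every factor at an extreme simultaneously, this layout can be attractive when combinations such as maximum temperature with maximum inducer concentration are impractical.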
Central Composite Design for four independent factors (labelled X1, X2, X3, X4) in coded units. The 26 runs comprise 16 factorial points (each factor at ±1), 8 axial points placed on the faces of the design space (one factor at ±1, the others at 0) and two centre-point replicates (all factors at 0). The table also shows the types of response commonly used in the optimisation process: (1) actual data are the experimental results; (2) predicted data are generated by the software from the fitted model; and the residuals are the difference between the actual and predicted data.
| Run | X1 | X2 | X3 | X4 | Actual | Predicted | Residual |
|---|---|---|---|---|---|---|---|
| 1 | −1 | 1 | −1 | 1 | | | |
| 2 | −1 | −1 | 1 | 1 | | | |
| 3 | 0 | 0 | 0 | 0 | | | |
| 4 | −1 | 0 | 0 | 0 | | | |
| 5 | −1 | 1 | 1 | −1 | | | |
| 6 | 1 | 1 | 1 | 1 | | | |
| 7 | 1 | 1 | −1 | 1 | | | |
| 8 | −1 | 1 | 1 | 1 | | | |
| 9 | 1 | −1 | −1 | 1 | | | |
| 10 | 0 | −1 | 0 | 0 | | | |
| 11 | 1 | 1 | 1 | −1 | | | |
| 12 | 0 | 0 | 0 | 0 | | | |
| 13 | 0 | 0 | 1 | 0 | | | |
| 14 | 0 | 1 | 0 | 0 | | | |
| 15 | 1 | 0 | 0 | 0 | | | |
| 16 | 0 | 0 | 0 | 1 | | | |
| 17 | 1 | 1 | −1 | −1 | | | |
| 18 | −1 | 1 | −1 | −1 | | | |
| 19 | −1 | −1 | 1 | −1 | | | |
| 20 | −1 | −1 | −1 | 1 | | | |
| 21 | 1 | −1 | −1 | −1 | | | |
| 22 | 0 | 0 | 0 | −1 | | | |
| 23 | 1 | −1 | 1 | 1 | | | |
| 24 | 0 | 0 | −1 | 0 | | | |
| 25 | 1 | −1 | 1 | −1 | | | |
| 26 | −1 | −1 | −1 | −1 | | | |

Response data (actual, predicted and residuals) are used during the optimisation analysis to evaluate the validity of the model and to determine the optimum.
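Predicted values and residuals like those in the table are obtained by fitting a second-order polynomial to the actual responses. The sketch below generates the same 26 coded runs (16 factorial, 8 face-centred axial and 2 centre points, in a different order) and, since the table gives no response values, substitutes simulated "actual" responses from an invented quadratic plus random noise. It fits the full quadratic model (intercept, 4 linear, 6 two-factor interaction and 4 squared terms) by least squares; the helper names are illustrative.

```python
from itertools import combinations, product

import numpy as np


def face_centred_ccd(k=4, n_centre=2):
    """Coded runs: 2^k factorial, 2k axial points at +/-1 and n_centre centre points."""
    factorial = list(product((-1, 1), repeat=k))
    axial = []
    for i in range(k):
        for level in (-1, 1):
            point = [0] * k
            point[i] = level
            axial.append(tuple(point))
    centre = [(0,) * k] * n_centre
    return np.array(factorial + axial + centre, dtype=float)


def quadratic_terms(X):
    """Model matrix for a full second-order polynomial."""
    k = X.shape[1]
    cols = [np.ones(len(X))]
    cols += [X[:, i] for i in range(k)]                                 # linear
    cols += [X[:, i] * X[:, j] for i, j in combinations(range(k), 2)]  # interactions
    cols += [X[:, i] ** 2 for i in range(k)]                            # squared
    return np.column_stack(cols)


def simulate_actual(X, seed=0):
    """Invented 'true' quadratic plus Gaussian noise, standing in for measured yields."""
    rng = np.random.default_rng(seed)
    x1, x2, x3, x4 = X.T
    return (50 + 3 * x1 - 2 * x3 + 1.5 * x1 * x2 - 4 * x1 ** 2 - 2 * x4 ** 2
            + rng.normal(0, 0.5, len(X)))


def fit_quadratic(X, y):
    """Least-squares fit; returns coefficients, predicted values and residuals."""
    M = quadratic_terms(X)
    coef, *_ = np.linalg.lstsq(M, y, rcond=None)
    predicted = M @ coef
    return coef, predicted, y - predicted


if __name__ == "__main__":
    X = face_centred_ccd()
    actual = simulate_actual(X)
    _, predicted, residuals = fit_quadratic(X, actual)
    for run, (a, p, r) in enumerate(zip(actual, predicted, residuals), start=1):
        print(f"{run:2d}  actual={a:6.2f}  predicted={p:6.2f}  residual={r:+.2f}")
```

With measured data, the simulated `actual` array would simply be replaced by the experimental responses in run order.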
Figure 5. A typical DoE analysis route from the initial experiments to validation and conclusions. The purpose of the data analysis is to evaluate the effects of the variables on the response. The Graphical Representation stage shows how the data are distributed. The Statistical Analysis and Probability stage identifies the variables that are statistically significant, and therefore the variables worth carrying forward to the subsequent optimisation step. The Visualization and Interpretation stage focuses on graphical analysis that identifies the optimal levels.
An example of an Analysis of Variance (ANOVA) for a response surface model fitted to a second-order polynomial equation. The table shows the degrees of freedom (DF), adjusted sum of squares (Adj SS), adjusted mean square (Adj MS), F-value and p-value for the model and its terms; R-squared (R2), adjusted R-squared (Adj-R2) and predicted R-squared (Pred-R2) are typically reported alongside.
| Source | DF | Adj SS | Adj MS | F-Value | p-Value |
|---|---|---|---|---|---|
| Model | 11 | 40.4149 | 3.67408 | 1255.77 | 0.0001 |
| Linear | 4 | 3.1531 | 0.78828 | 269.43 | 0.0001 |
| Square | 4 | 35.3209 | 8.83022 | 3018.09 | 0.0001 |
| Interaction | 3 | 1.9409 | 0.64697 | 221.13 | 0.0001 |
| Residual error | 40 | 0.117 | 0.00293 | | |
| Lack-of-fit | 13 | 0.0369 | 0.00284 | 0.96 | 0.515 |
| Pure error | 27 | 0.0802 | 0.00297 | | |
| Total | 51 | 40.532 | | | |
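The entries of an ANOVA table are linked by simple arithmetic: each mean square is SS/DF, the model F-value is MS_model / MS_residual, and the lack-of-fit F-value is MS_lack-of-fit / MS_pure-error. A lack-of-fit p-value well above 0.05, as here, indicates no significant lack of fit. The sketch below recomputes these quantities, together with R² and adjusted R², from the sums of squares and degrees of freedom in the table, using a lack-of-fit SS of 0.0369 (the value consistent with the tabulated mean square, 0.00284 × 13). Small differences from the printed F-values reflect rounding of the tabulated mean squares. Converting F-values to p-values requires the F distribution (for example `scipy.stats.f.sf`), which is omitted to keep the sketch dependency-free.

```python
def anova_summary(ss_model, df_model, ss_lof, df_lof, ss_pe, df_pe):
    """Recompute key ANOVA statistics for a fitted response surface model."""
    ss_resid = ss_lof + ss_pe          # residual error = lack-of-fit + pure error
    df_resid = df_lof + df_pe
    ss_total = ss_model + ss_resid
    df_total = df_model + df_resid
    ms_model = ss_model / df_model
    ms_resid = ss_resid / df_resid
    return {
        "F_model": ms_model / ms_resid,
        "F_lack_of_fit": (ss_lof / df_lof) / (ss_pe / df_pe),
        "R2": ss_model / ss_total,
        "adj_R2": 1 - ms_resid / (ss_total / df_total),
    }


if __name__ == "__main__":
    stats = anova_summary(ss_model=40.4149, df_model=11,
                          ss_lof=0.0369, df_lof=13,
                          ss_pe=0.0802, df_pe=27)
    for name, value in stats.items():
        print(f"{name}: {value:.4f}")
```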
Figure 6. A linear plot estimating the accuracy of a regression model by comparing actual and predicted data. The plot shows the correlation between the model's predictions and the actual data and thereby indicates how well the model fits the data. The closer the value of R is to 1, the better the line fits the data and the better the model.
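The statistics behind a predicted-versus-actual plot, and the R², adjusted R² and predicted R² values commonly reported with an RSM ANOVA, can be computed directly from a least-squares fit. The sketch below uses a small invented data set (one factor at coded levels with a quadratic model) purely for illustration. Predicted R² is based on the PRESS statistic, in which each residual is divided by 1 − h_ii, where h_ii are the diagonal elements of the hat matrix; this yields leave-one-out prediction errors without refitting the model n times.

```python
import numpy as np

# Invented data: one factor at coded levels with replicated centre points.
x_demo = np.array([-1, -1, -0.5, 0, 0, 0, 0.5, 1, 1], dtype=float)
y_demo = np.array([4.1, 4.4, 6.0, 7.1, 6.8, 7.0, 6.3, 5.2, 5.0])


def fit_statistics(M, y):
    """R, R^2, adjusted R^2 and predicted R^2 (PRESS-based) for a least-squares model."""
    n, p = M.shape
    coef, *_ = np.linalg.lstsq(M, y, rcond=None)
    predicted = M @ coef
    resid = y - predicted
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    hat_diag = np.diag(M @ np.linalg.pinv(M.T @ M) @ M.T)  # leverages h_ii
    press = float(((resid / (1 - hat_diag)) ** 2).sum())   # leave-one-out errors
    return {
        "R": float(np.corrcoef(y, predicted)[0, 1]),
        "R2": 1 - ss_res / ss_tot,
        "adj_R2": 1 - (ss_res / (n - p)) / (ss_tot / (n - 1)),
        "pred_R2": 1 - press / ss_tot,
    }


if __name__ == "__main__":
    M = np.column_stack([np.ones_like(x_demo), x_demo, x_demo ** 2])  # quadratic model
    for name, value in fit_statistics(M, y_demo).items():
        print(f"{name}: {value:.3f}")
```

For a least-squares model with an intercept, R² equals the square of the correlation R shown in the plot, and a predicted R² much lower than R² is a common warning sign of overfitting.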
Figure 7. An example of a response surface and contour plot, adapted from Nelofer et al., 2012 [163]. The figure depicts a two-factor interaction (here between glucose and culture temperature), in which the effect of one factor on the response depends on the level of the other. It also shows how optimum levels can be visualised. The colour scale indicates lipase activity (IU/mL): red indicates the region of optimal yield, yellow medium yield and green low yield. In this case, the optimal enzyme activity (33 IU/mL) was achieved at a culture temperature between 30 °C and 34 °C and a glucose concentration between 40 and 50 g/L. Image used with permission.
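Locating the optimum on a response surface such as the one in Figure 7 amounts to finding the stationary point of the fitted second-order model y = b0 + xᵀb + xᵀBx, which lies at x_s = −½B⁻¹b. The signs of the eigenvalues of B show whether this point is a maximum (all negative), a minimum (all positive) or a saddle point. The sketch below uses invented coefficients in coded units for two factors (for example glucose and temperature); they are not the coefficients reported by Nelofer et al. A grid evaluation, as used to draw a contour plot, is included as a cross-check.

```python
import numpy as np

# Invented second-order model in coded units: y = b0 + x.b + x^T B x
b0 = 30.0
b = np.array([1.0, 0.5])                 # linear coefficients
B = np.array([[-1.0, 0.25],              # diagonal: squared-term coefficients
              [0.25, -1.0]])             # off-diagonal: half the interaction coefficient


def predict(x):
    """Evaluate the fitted quadratic at coded point x."""
    x = np.asarray(x, dtype=float)
    return b0 + x @ b + x @ B @ x


def stationary_point():
    """Solve grad y = b + 2 B x = 0 and classify the point from the eigenvalues of B."""
    x_s = np.linalg.solve(-2.0 * B, b)
    eig = np.linalg.eigvalsh(B)
    if np.all(eig < 0):
        kind = "maximum"
    elif np.all(eig > 0):
        kind = "minimum"
    else:
        kind = "saddle point"
    return x_s, predict(x_s), kind


if __name__ == "__main__":
    x_s, y_s, kind = stationary_point()
    print("stationary point (coded):", x_s.round(3), "response:", round(float(y_s), 3), kind)
    # Grid evaluation over the coded region, as used to draw contour plots.
    grid = np.linspace(-1, 1, 201)
    g1, g2 = np.meshgrid(grid, grid)
    surface = (b0 + b[0] * g1 + b[1] * g2 + B[0, 0] * g1 ** 2 + B[1, 1] * g2 ** 2
               + 2 * B[0, 1] * g1 * g2)
    i, j = np.unravel_index(surface.argmax(), surface.shape)
    print("grid maximum near:", g1[i, j], g2[i, j])
```

Here the stationary point is a maximum at coded levels (0.6, 0.4) with a predicted response of 30.4; mapping these coded values back to actual units gives the recommended settings. If the stationary point falls outside the coded region, the prediction is an extrapolation and should be confirmed experimentally.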