Zachary Cosenza1, Raul Astudillo2, Peter I Frazier2, Keith Baar3, David E Block4. 1. Department of Chemical Engineering, University of California Davis, Davis, California, USA. 2. Operations Research and Information Engineering, Cornell University, Ithaca, New York, USA. 3. Departments of Neurobiology, Physiology, and Behavior and Physiology and Membrane Biology, University of California Davis, Davis, California, USA. 4. Department of Viticulture and Enology, University of California, Davis, USA.
Abstract
Culture media used in industrial bioprocessing and the emerging field of cellular agriculture is difficult to optimize due to the lack of rigorous mathematical models of cell growth and culture conditions, as well as the complexity of the design space. Rapid growth assays are inaccurate yet convenient, while robust measures of cell number can be time-consuming to the point of limiting experimentation. In this study, we optimized a cell culture media with 14 components using a multi-information source Bayesian optimization algorithm that locates optimal media conditions based on an iterative refinement of an uncertainty-weighted desirability function. As a model system, we utilized murine C2C12 cells, using AlamarBlue, LIVE stain, and trypan blue exclusion cell counting assays to determine cell number. Using this experimental optimization algorithm, we were able to design media with 181% more cells than a common commercial variant with a similar economic cost, while doing so in 38% fewer experiments than an efficient design-of-experiments method. The optimal medium generalized well to long-term growth up to four passages of C2C12 cells, indicating the multi-information source assay improved measurement robustness relative to rapid growth assays alone.
Culture media used in industrial bioprocessing and the emerging field of cellular agriculture is difficult to optimize due to the lack of rigorous mathematical models of cell growth and culture conditions, as well as the complexity of the design space. Rapid growth assays are inaccurate yet convenient, while robust measures of cell number can be time-consuming to the point of limiting experimentation. In this study, we optimized a cell culture media with 14 components using a multi-information source Bayesian optimization algorithm that locates optimal media conditions based on an iterative refinement of an uncertainty-weighted desirability function. As a model system, we utilized murine C2C12 cells, using AlamarBlue, LIVE stain, and trypan blue exclusion cell counting assays to determine cell number. Using this experimental optimization algorithm, we were able to design media with 181% more cells than a common commercial variant with a similar economic cost, while doing so in 38% fewer experiments than an efficient design-of-experiments method. The optimal medium generalized well to long-term growth up to four passages of C2C12 cells, indicating the multi-information source assay improved measurement robustness relative to rapid growth assays alone.
Every bioprocess in which cells are the final product or used in the production process requires suitable culture conditions for cell growth and product quality. In the rapidly growing cellular agriculture/cultivated meat industry, where cells are grown for consumption to replace carbon‐intensive and often unethical animal agriculture, cost‐effective media has been identified as the most critical aspect in scale‐up and commercialization (O'Neill et al., 2021). Optimizing these conditions is difficult due to a large number of media components with nonlinear and interacting effects between cells, medium, matrix material, and reactor environment (Brunner et al., 2010). Typically, culture media used for processes in cellular agriculture consist of a basal medium of glucose, amino acids, vitamins, and salts (such as the common Dulbecco's Modified Eagle Medium [DMEM]) supplemented with fetal bovine serum (FBS) for improved cell survival. FBS is an undefined, animal‐derived serum consisting of proteins, hormones, and other large molecular weight components, and contributes substantially to the cost of media (van der Valk, 2018). Even when enriched with additional growth factors or FBS, media is often far from optimal for all cell types and requires adaptation and/or optimization (Kolkmann et al., 2020), which is difficult for media mixtures with >30 components, as is common in cell culture.To manage this complexity, design‐of‐experiments (DOE) methods are often employed in which factors (concentrations or environmental conditions) are set to a user‐specified value (usually “high” or “low”) and outputs are measured (Singh et al., 2017; Zhang & Mills, & Block, 2009). These DOE designs are arranged in such a way that statistically meaningful correlations can be found in fewer experiments than techniques like intuition, “one‐factor‐at‐a‐time” sequences, or random designs. A more advanced form of this is to use sequential, model‐based DOEs such as a radial basis function (Cosenza & David, 2020; Zhang & Block, 2009; Zhang et al., 2007) or Gaussian Process (GP) (Kotthoff et al., 2021), combined with an optimizer/sampling policy, to automatically select sequences of optimal designs. These approaches are often more efficient than traditional DOE at optimizing systems using fewer experiments (Cosenza et al., 2021) and allow for more natural incorporation of process priors (Coleman & Block, 2006), measurement noise (Ankenman et al., 2010; Tracey & Wolpert., 2018), probabilistic output constraints and constraint learning (Letham et al., 2019), multiobjective (Belakaria et al., 2019), multipoint (Wang et al., 2020), and multi‐information source designs (Jian & Frazier, 2018; Shu et al., 2021; Takeno et al., 2019).Even with these methods available, limitations still exist. In previous work, we applied a machine learning approach to optimize complex media design spaces but had limited success due to the difficulty in measuring cell number for multipassage growth (Cosenza et al., 2021). Therefore, in this study, we utilized a multi‐information source (IS) Bayesian model to fuse “cheap” measures of cell biomass (rapid chemical assays which can be done at scale) with more “expensive” but higher quality measurements (cell numbers over time which represents a high‐quality metric of growth media quality) to predict long‐term medium performance. We refer to the simpler and cheap assays as “low‐fidelity” IS, and more complex and expensive assays as “high‐fidelity” IS. While not always predictive of long‐term growth, these lower fidelity assays are at least correlated with cell health and can help in identifying interesting regions of the design space for further study with the high‐fidelity IS. We used this model, with Bayesian optimization (BO) tools, to optimize a cell culture medium with 14 components while minimizing the number of experiments, optimally allocating laboratory resources, and building process knowledge to improve our optimization scheme and model. In Section 2 we discuss the computational and experimental components of this BO method. In Section 3 we present the results of the BO method in comparison to a traditional DOE method, followed by Section 4 where we demonstrate the importance of fusing multiple sources of information to obtain relevant process knowledge and/or optimization results.
METHODS
Cells and media components
The system under consideration was the proliferation of C2C12 (ATCC) cells. These cells are immortalized muscle cells with similar metabolism and growth characteristics as other adherent cell lines useful in the cellular agriculture industry. Cells were stored in 70% DMEM (Gibco), 20% FBS (BioWest), 10% dimethyl sulfoxide (Thermo Fischer) freeze medium at −196°C until thawed. Vials were thawed to 25°C and the freezing medium was removed by centrifugation at 1500 g for 5 min. The centrifuged cell pellet was resuspended in 17 ml of DMEM with 10% FBS and placed on 15 cm sterile plastic tissue culture dishes (at about 106 cells/plate) (Cellstar, Greiner Bio‐One). Cells were incubated in a 37°C and 5% CO2 environment. After 24 h the medium was removed, the culture dish‐washed with Phosphate Buffer Solution (PBS) (Gibco), and fresh DMEM with 10% FBS was introduced. After an additional 24 h, cells were harvested using tripLE solution (Gibco), diluted in PBS, and counted using Countess II with trypan blue exclusion and disposable slides (Invitrogen). The process of removing cells from a plate, counting, and re‐plating them with fresh medium is called subculturing or passaging. How well the C2C12 cells survive and grow after passaging is indicative of their long‐term potential in a large cellular agriculture process.The design space was comprised of the components and minimum/maximum concentrations listed in Table 1. These components were chosen because they are often used to supplement standard DMEM to improve cell growth; this represents a reasonable test case for the industrial application of these multi‐IS BO methods to the cellular agricultural industry. The composition of standard DMEM (such as the medium used above), is shown in Table 3, and should not be confused with the base DMEM “supplement” (Gibco), which contains only amino acids, trace metals, salts, and vitamins and none of the other 14 components. pH and osmolarity are not controlled in this study, so act as latent variables.
Table 1
Media components.
Abrev.
Component
Conc. min (mg/ml)
Conc. Max (mg/ml)
Cost
Use in cell culture
T
Transferrin
0
0.026
6.53E−03
Iron transport, homeostasis
I
Insulin
0
0.035
1.43E−02
GF for glucose and amino acid utilization
SS
Sodium selenite
0
1.75E−05
6.4E−09
Chemical pathways
AA
Ascorbic acid
0
8.75E−03
9.8E−06
Antioxidant
Glu
Glucose
0
15.75
0.2
Carbon source
Gluta
Glutamine (GlutaMAX)
0
1.519
2.09E−02
Carbon source
Albu
Albumin (AlbuMAX)
0
1.4
4.94
Stabilization of small molecules
FBS
FBS (% v/v)
0
17.5
14.00
Shear protection, cytokines, other
H
Hydrocortisone
0
1.75E−05
1.1E−05
Proliferation, differentiation, inhibition
D
Dexamethasone
0
7.00E−04
7.2E−03
Short‐term proliferation, muscle breakdown
P
Progesterone
0
1.75E−05
4.0E−07
Proliferation
Esd
Estradiol
0
8.75E−06
1.6E−06
Proliferation
Ethan
Ethanolamine
0
6.65E−03
6.1E−06
Phospholipid synthesis
Glutath
Glutathione
0
3.50E−03
6.0E−04
Antioxidant, thiol chemical pathways
–
DMEM supplement (% v/v)
–
***54.3
2.1E−02
Amino acids, vitamins, salts, buffer
Note: All components are shown were stored as per manufacturers (PreproTech unless specified) instructions in stock solutions. The concentration (mg/ml) of all media was between the minimum and maximum listed. The cost shown is a unitless scalarization of the relative economic cost of each component.
All media have a 54.3% v/v (volume percent) base of DMEM supplement (liquid form, no glucose, glutamine, or FBS). Remaining volume (minus component volumes) was made up in water.
Table 3
Optimal media.
Abrev.
Component conc. (mg/ml)
BO
DOE
DMEM (control)
T
Transferrin
2.16E−02
0
0
I
Insulin
2.27E−02
0
0
SS
Sodium selenite
6.34E−06
0
0
AA
Ascorbic acid
0
0
0
Glu
Glucose
8.97
6.75
4.50
Gluta
Glutamine (GlutaMAX)
1.32
0.65
0.43
Albu
Albumin (AlbuMAX)
0
0
0
FBS
FBS (% v/v)
10.1
7.5
10.0
H
Hydrocortisone
0
0
0
D
Dexamethasone
0
0
0
P
Progesterone
1.75E−05
0
0
Esd
Estradiol
8.75E−06
0
0
Ethan
Ethanolamine
3.64E−03
0
0
Glutath
Glutathione
2.49E−03
0
0
–
DMEM supplement (%v/v)
54.3
54.3
54.3
Outputs (dimensionless)
Num. experiments
81
133
–
D(x)
Desirability
0.94
0.40
0.44
y(x)
Proliferation metric
2.82
0.86
1.00
c(x)
Cost metric
8.22
6.12
8.09
Note: Concentrations (mg/ml) of best BO‐designed medium alongside that found by DOE and the DMEM control used throughout this study. The resulting objective function , cell number , and cost of each medium are shown with the required number of experiments to get the optimal result.
Media components.Note: All components are shown were stored as per manufacturers (PreproTech unless specified) instructions in stock solutions. The concentration (mg/ml) of all media was between the minimum and maximum listed. The cost shown is a unitless scalarization of the relative economic cost of each component.Abbreviations: DMEM, Dulbecco's modified Eagle's medium; FBS, fetal bovine serum.All media have a 54.3% v/v (volume percent) base of DMEM supplement (liquid form, no glucose, glutamine, or FBS). Remaining volume (minus component volumes) was made up in water.
Cell growth experiments and assays
For the high‐fidelity IS, 750 µl of cell suspension containing 60,000 cells were placed in a six well plate (three replicates) with 2.25 ml of the test medium. For low‐fidelity IS, 25 µl of cell suspension containing 2000 cells were placed in 96 well plates (four replicates and two control wells without cells) with 75 µl of the test medium. All experiments thus had 6250 cells/cm2 and 312.5 ml/cm2 of media. After 72 h, all wells were measured using the IS methods shown in Table 2.
Table 2
Information sources for cell number measurement.
Information source
Time required
Format
Mechanism of action
Passage 2 (high‐fidelity)
6 Days
6 Wells
Trypsinization/trypan blue exclusion
Passage 1***
3 Days
6 Wells
Trypsinization/trypan blue exclusion
AlamarBlue (low‐fidelity)
3 Days
96 Wells
Mitochondria activity/colorimetric
LIVE Stain (low‐fidelity)
3 Days
96 Wells
Nuclear/fluorometric
Note: These sources of information (IS) were used to approximate and model C2C12 cell number. In this study, Passage 2 cell numbers were considered the highest‐fidelity IS, while AlamarBlue and the LIVE stains were the lowest.
Passage 1 cell number measurements were necessary to get Passage 2, so were included as a separate IS. Every high‐fidelity IS measurement of a medium was also made in parallel with a low‐fidelity measurement. Their inter‐IS correlations are shown in Figure 8c.
Information sources for cell number measurement.Note: These sources of information (IS) were used to approximate and model C2C12 cell number. In this study, Passage 2 cell numbers were considered the highest‐fidelity IS, while AlamarBlue and the LIVE stains were the lowest.Passage 1 cell number measurements were necessary to get Passage 2, so were included as a separate IS. Every high‐fidelity IS measurement of a medium was also made in parallel with a low‐fidelity measurement. Their inter‐IS correlations are shown in Figure 8c.
Figure 8
Kernel plots and IS distributions. (a) and (b) Show the output of the kernel for all data collected and a simulated data set where only is varied from , respectively. Darker regions indicate large values of , and thus a correlation. Also (c) the various IS cell number/correlate distributions (diagonal histograms) are shown. Above the diagonal (squares) are the actual inter‐IS correlations for each IS with their respective R
2 values, and below the diagonal (circles) are the predicted inter‐IS correlations for a random data set.
The AlamarBlue assay required staining wells with 10% v/v (10 µl) AlamarBlue stock solution (Invitrogen), 4 h of incubation in a 37°C and 5% CO2 incubator, and measurement of 570 nm and 600 nm absorbance wavelengths (Molecular Devices, ImageXpress Pico) as well as the control wells and (no cells) to get AlamarBlue reduction metric .The LIVE assay required that the test wells be washed with PBS, and 100 µl of 1 µM LIVE stain Calcein AM (Biotium) be introduced into the test wells and incubated for 1.5 h at 37°C and 5% CO2. The biomass/cell number correlates was then measured using a fluorometer (Molecular Devices, ImageXpress Pico) at Ex/Em 494/530 fluorescein filters and calculated using the emission . Both and metrics are correlated with cell number and thus were the low‐fidelity IS metric of cell number.We also measured the cell number using an automatic cell counter (Countess II) with trypan blue exclusion. This required trypsinization outlined in the previous section. Because we wished to measure long‐term cell viability, after the first cell count (Passage 1), we re‐seeded the cells under the same conditions and measured the cell count after an additional 72 h (Passage 2). The Passage 2 metric incorporated long‐term viability and the effect of trypsinization, and thus was the most robust measurement of cell number. All measurements/correlates of cell number (, , cell number) were reported as ratios, normalized to the DMEM control (whose concentration is shown in Table 3).Optimal media.Note: Concentrations (mg/ml) of best BO‐designed medium alongside that found by DOE and the DMEM control used throughout this study. The resulting objective function , cell number , and cost of each medium are shown with the required number of experiments to get the optimal result.Abbreviations: BO, Bayesian optimization; DMEM, Dulbecco's modified Eagle's medium; DOE, design‐of‐experiments; FBS, fetal bovine serum.
DOE method
To compare our BO method to a typical method used in the optimization of fermentation/bioprocess systems (Singh et al., 2017; Zhang & Mills, & Block, 2009), we used a DOE. We first screened all 14 components using a folded Plackett‐Burman (PB) design (which is a normal PB combined with an additional PB design with “high” factors set to “low” and vice versa) using the AlamarBlue IS. A linear model let us quantify the desirability of each component using the slope (desirability will be talked about in Section 2.4.2). The lower bound of the PB was set to (0 mg/ml) and the upper bound (28% of maximum concentration shown in Table 1, for example, glucose would be set to 0.28 × 15.75 mg/ml = 4.41 mg/ml) so that component quality could first be judged at modest concentration where nonlinear effects would be minimal. We then set unimportant or harmful () components to for the rest of the DOE study, then ran a Box‐Behnken (BB) design over the remaining useful components using AlamarBlue IS. The BB was used to estimate an interaction‐polynomial model and a multistart Newton's Method was used to find the ‐optimal concentrations inside the design space. If the optimal concentrations were found to be on any edge of the current BB, then the bounds of the design were shifted dimensionless units in that direction (steepest accent) and another BB was run using these new bounds. This was done because the optimal boundary of the design space is uncertain and needed to be found. The sequential BB was run until the optimal bounds were found or resources exhausted. The best medium was then reported as the optimal point found using multistart Newton's Method within the final optimal bounds.
BO method
Bayesian model
In standard BO, a function is modeled using a Gaussian Process (GP) (Poloczek et al., 2017), characterized by a prior mean and covariance , with the property that for any finite collection of points with dimensionality , the prior distribution of the output is normal . The prior determines the directionality and “wigglyness” of the function through the covariance kernel function , which models the relationship between any two points and . We chose the squared exponential function for the kernel to encode the belief that (i) similar experiments are more alike than dissimilar experiments governed by hyperparameters and and (ii) that the overall biological processes underlying the response surface are smooth with (iii) each component response governed by , allowing each component to have different degrees of “wigglyness.”If we collect observations of inputs and outputs from the generative process we can get the posterior distribution where the mean and variance of are given by Equations (2) and (3) respectively for homoscedastic noise with process noise variance .A more detailed discussion of GP models can be found in (Schulz et al., 2018). With a predictive model of the mean and variance of cell number, we can use past experimental data inputs and outputs to inform future process optimization.The key objective of this study was to maximize C2C12 cell growth/accumulation while minimizing media costs. To do this, measuring cell growth was critical but experimentally expensive. Less expensive assays can approximate cell growth, yet with reduced accuracy. Therefore, it was beneficial to use combinations of cell growth assays to facilitate experimentation while decreasing the overall experimental burden. This provides a balance between quality of information and experimental cost. To this end we adopted the multi‐information source GP model introduced by (Poloczek et al. (2017), which utilizes auxiliary information sources to model an underlying “true” function. We chose this model over the more typical multi‐task GP to encode the prior belief that the generative model includes an underlying “true” function and several biased/variable but correlated auxiliary functions, and to provide the flexibility of allowing different length‐scale hyperparameters for each IS to be learned from the data.Let us assume a generative model for a given medium combination at an IS indexed by . We, therefore, have one independent GP for the underlying function and one for each auxiliary IS deviation function for the th auxiliary IS (where references the underlying IS). To implement this, Equation (1) is modified by adding an additional kernel (squared exponential) to the original kernel anytime an auxiliary datapoint is referenced and and have the same IS index using an indicator function .Further details about the noise model of the GP, training, and using prior information can be found in Appendix 1. In addition to information on cell numbers, however, we wish to incorporate information about the process cost of . Therefore, we formulate a cost function where is a scaled marginal cost of each media component whose coefficients can be found in Table 1.
BO acquisition function
To maximize media utility, we wish to maximize while minimizing cost for the highest‐fidelity IS. Therefore, we posed this multi‐objective optimization problem as a single‐objective in the form of a desirability function (Akteke‐Ozturk et al., 2018) where cell number and cost are scaled as and respectively.
where is a feasibility indicator function that is non‐zero when the predicted is greater than or equal to some minimum cell number metric . We set and to exclude media that fail to be 50% as proliferative as the control media to preferentially select high‐performance media. We scale as a “smaller‐the‐better” metric where and so that we may solve our new cost‐aware objective function as a single‐objective problem .With a predictive multi‐IS GP modeling and computing from Equation (5), we can use it to suggest optimal media conditions . However, because we would like to solve for some optimal group of experiments rather than a single experiment (it is much more efficient to run multiple experiments at a time), we pose the optimization problem as a ‐dimensional multi‐point optimization problem for multiple optimal media conditions at once. This formulation (i) does not consider uncertainty when quantifying the value of a particular set of media components and (ii) does not have an analytical form. We solved both problems by using the multi‐point expected improvement function (Wang et al., 2020).
where is the ‐optimal desirability of the points collected and is the ‐optimal desirability of the points evaluated by . If (no improvement from evaluating ) then the “+” operator sets the improvement of the design to . Thus, with we can quantify the value of multiple points rather than just a single point . Evaluating for any group of experiments requires further mathematical treatment, which can be found in Appendix 2.
BO algorithm
The BO algorithm that designs optimal experiments is shown in Figure 1. After collecting some initial data, the multi‐IS GP is trained and found using multistart L‐BFGS‐B for some maximum allowable number of experiments (based on laboratory constraints). The L‐BFGS‐B optimizer was chosen because it performs well on high dimensional problems, can be ran with multiple restarts thus improving its global optimization capabilities, and has access to gradients and Hessian approximations thus reducing computational time. Because we want to optimize the high‐fidelity IS (long‐term growth as Passage 2) all calculations in the BO algorithm are done using the high‐fidelity IS prediction. With in hand, we now must find the optimal IS to sample. We start by defining the number of high‐fidelity samples we are willing to measure , with the remaining being low‐fidelity IS (Figure 2).
Figure 1
BO algorithm. This loop describes the Bayesian Optimization algorithm to maximize some acquisition function for a process given high‐fidelity IS and low‐fidelity IS samples per batch of experiments. After each batch, the process is repeated until process is optimized or resources are exhausted. BO, Bayesian optimization
Figure 2
Simulation results. (a) Number of cumulative high‐fidelity simulations used plotted against average (with standard deviation for five runs of the entire optimization loop) optimal output from across five sequential iterations of the optimization framework. The multi‐IS GP (solid) had access to total simulations with high‐fidelity and low fidelity simulations per iteration (multi‐IS GP has stopped one iteration early to reduce computational burden). The regular GP (dotted) only had access to the high‐fidelity simulations per iteration. Each test function had two biased low‐fidelity versions whose correlations are described by plots (b). Squares and triangles represent a given and respectively. The solid line represents the underlying high fidelity IS . Hyperparameter and acquisition function optimization was done using multistart L‐BFGS‐B implemented in botorch/scipy.
BO algorithm. This loop describes the Bayesian Optimization algorithm to maximize some acquisition function for a process given high‐fidelity IS and low‐fidelity IS samples per batch of experiments. After each batch, the process is repeated until process is optimized or resources are exhausted. BO, Bayesian optimizationSimulation results. (a) Number of cumulative high‐fidelity simulations used plotted against average (with standard deviation for five runs of the entire optimization loop) optimal output from across five sequential iterations of the optimization framework. The multi‐IS GP (solid) had access to total simulations with high‐fidelity and low fidelity simulations per iteration (multi‐IS GP has stopped one iteration early to reduce computational burden). The regular GP (dotted) only had access to the high‐fidelity simulations per iteration. Each test function had two biased low‐fidelity versions whose correlations are described by plots (b). Squares and triangles represent a given and respectively. The solid line represents the underlying high fidelity IS . Hyperparameter and acquisition function optimization was done using multistart L‐BFGS‐B implemented in botorch/scipy.We can pose the IS‐allocation problem as “which designs in has the highest in combination”? This requires calculating for all combinations in , and allocating the highest‐fidelity budget to the dominant combination. The remaining experiments can be allocated to low‐fidelity IS. New experiments are collected using the IS and component concentrations found, and the procedure looped until the process was optimized to satisfaction or resources are exhausted.We started our BO method by initialization with 10 experiments according to Latin Hypercube design similar to (Poloczek et al., 2017) (10 experiments being the approximate capacity of our laboratory at any given time). The algorithm allocates experiments with high‐fidelity IS and low‐fidelity IS using the combinatorial heuristic described above for the optimal group . This was repeated seven times, with iterative training and optimization stages to improve our model while simultaneously find optimal media. After 80 experiments (we stopped after exhausting our cell bank) a final high‐fidelity IS experiment was performed at the theoretical optima , for 81 experiments total.
Computational environment and packages
Hardware used: Dell Precision 5820 Tower, Intel Xeon W‐2145 DDR4‐2666 Processor (3.7 GHz), 32 GB Memory. Software used: python 3.9.7 (for all programming), gpytorch 1.3.0, pytorch 1.8.1, and botorch 0.4.0 (for modeling and Bayesian optimization), pydoe 0.3.8 (for initialization using Latin Hypercube experiments).
RESULTS
Computational validation of BO method
Before optimizing our experimental system, we tested the BO algorithm on various multi‐information source mathematical test functions (Appendix 3) solving using the nosey expected multi‐point improvement acquisition function on a 10‐dimensional problem. Each had two low‐fidelity test functions ( and ) which differed substantially from the true test function. Given an extremely limited high‐fidelity budget (10 simulations at two per iteration of the optimization loop), the multi‐IS GP saw better average performance (higher outputs) compared to a regular GP with otherwise the same model architecture (hyperparameters, training method, priors, etc.). The major limitation of this experiment is that these test functions do not represent the true biological process. However, as the test functions were created to mimic noisy biological processes, we should be able to differentiate the performance of optimization methods using these results.
Experimental validation of BO method
We then applied our BO method to C2C12 media optimization design problem. The BO method achieved a maximum desirability of in 81 total experiments while the DOE only achieved a maximum at requiring 132 experiments. This represented a 132% improvement over the DOE and a 113% improvement over the control DMEM with 38% fewer experiments. The optimal BO medium corresponds to cell number with cost , or a 227% improvement in cell number over DOE at a 34% increase in cost, and a 181% improvement in cell number over the DMEM control at a mere 1.6% increase in cost. As seen in Figure 3a the BO method also found a suboptimal medium, with higher than DOE and the DMEM control, within 30 experiments, or a 77% reduction in experimental effort.
Figure 3
Learning curve and trade‐off curve of BO method. (a) Learning curve of shows BO and DOE method designing experiments over the course of the optimization study. The line and dots represent the high‐fidelity IS optima and designs, the dashed and dotted lines represent the DMEM control and DOE optima values for respectively. Each IS experiment is shown in (b) the trade‐off curve indicating a clear tradeoff between cost and cell number , where the dots, triangles, squares, and “x's” represent Passage 2, Passage 1, AlamarBlue, and Live Stain respectively. (c) Simulated trade‐off curve also shown for high‐fidelity IS also showing a predicted parabolic relationship between competing objectives and . BO, Bayesian optimization; DMEM, Dulbecco's modified Eagle's medium; DOE, design‐of‐experiments
Learning curve and trade‐off curve of BO method. (a) Learning curve of shows BO and DOE method designing experiments over the course of the optimization study. The line and dots represent the high‐fidelity IS optima and designs, the dashed and dotted lines represent the DMEM control and DOE optima values for respectively. Each IS experiment is shown in (b) the trade‐off curve indicating a clear tradeoff between cost and cell number , where the dots, triangles, squares, and “x's” represent Passage 2, Passage 1, AlamarBlue, and Live Stain respectively. (c) Simulated trade‐off curve also shown for high‐fidelity IS also showing a predicted parabolic relationship between competing objectives and . BO, Bayesian optimization; DMEM, Dulbecco's modified Eagle's medium; DOE, design‐of‐experimentsTable 3 shows the media concentrations resulting from the BO and DOE methods along with the DMEM control used throughout this study. The BO method found that transferrin, glutamine, progesterone, and estradiol should be at a high relative concentration. Ascorbic acid, hydrocortisone, and dexamethasone should be at a low/zero concentration. The remaining components should be somewhere in between the two extremes. The DOE method, using only AlamarBlue, used a PB screening design (32 experiments) to reduce the problem size from 14 components to four, finding that glucose, glutamine, albumin, and FBS had the highest positive effect on . Next, four sequential BB designs (25 experiments each), with bounds shifting in the direction of ‐steepest accent after each BB, used 100 experiments to find the optimal bounds of the four‐dimensional factor space. Optimal factors were predicted to be nearly identical to the DMEM control, resulting in nearly identical desirability ( vs for DOE and DMEM control, respectively).As expected, there was a trade‐off between a number of cells and medium‐cost captured in Figure 3b,c. More nutrients, especially FBS, improved cell number at the expense of higher cost; this trend then breaks down as more FBS and Albumin have a deleterious effect on growth.We also note from Figure 4 that the BO algorithm found the optimal concentration of some components faster than others, as indicated by heavier clustering of datapoints. This is a function of how confident the multi‐IS GP was in certain regions of the design space, with denser sampling being indicative of higher confidence in improvement.
Figure 4
Learned optimal concentration. The conditions of each experiment (concentration ranges in Table 1) are shown plotted as a function of the cumulative number of experiments in the BO (circle) and DOE (box) study. The moving average (solid and dashed line for BO and DOE respectively) shows how each method searches for optimal concentrations. The horizontal line represents the final BO optimal concentration. BO, Bayesian optimization; DOE, design‐of‐experiments
Learned optimal concentration. The conditions of each experiment (concentration ranges in Table 1) are shown plotted as a function of the cumulative number of experiments in the BO (circle) and DOE (box) study. The moving average (solid and dashed line for BO and DOE respectively) shows how each method searches for optimal concentrations. The horizontal line represents the final BO optimal concentration. BO, Bayesian optimization; DOE, design‐of‐experiments
Experimental validation of long‐term cell number objective function
The robustness of the multi‐IS GP model was evaluated by re‐sampling the optimal BO medium which had a cell number metric of . When measured again the cell number metric was , indicating measurement and overall system reproducibility. Next, all four optimal media were cultured for 288 total hours (to Passage 4 with 72 h/passage), to determine how well our high‐fidelity IS generalized to longer‐term growth. The optimal medium designed by the BO method outperformed the DOE and DMEM control substantially in a number of cells grown at Passage 4, with results summarized in Figure 5.
Figure 5
Long‐term validation of optima media. The optimal BO‐designed (dots), DOE (triangles), and DMEM control (squares) media performance up to Passage 4 Each passage was 72 h of growth at 37°C and 5% CO2. Trypsinization took place after each 72 h period to count cells and replate them to allow for further growth (standard deviations indicated). The BO method designed an optimal media with substantially improved long‐term growth capacity than the DOE or DMEM control. BO, Bayesian optimization; DMEM, Dulbecco's modified Eagle's medium; DOE, design‐of‐experiments
Long‐term validation of optima media. The optimal BO‐designed (dots), DOE (triangles), and DMEM control (squares) media performance up to Passage 4 Each passage was 72 h of growth at 37°C and 5% CO2. Trypsinization took place after each 72 h period to count cells and replate them to allow for further growth (standard deviations indicated). The BO method designed an optimal media with substantially improved long‐term growth capacity than the DOE or DMEM control. BO, Bayesian optimization; DMEM, Dulbecco's modified Eagle's medium; DOE, design‐of‐experiments
Sensitivity analysis
We then examined the first and second‐order effects of each component as predicted by the multi‐IS GP (training on all datapoints). Most components show a parabolic effect in both and (Figure 6), where the optimal medium is in the middle of the factor space, often in sample dense regions.
Figure 6
Predicted first and second‐order effects. First‐order predicted effects of each component of the high‐fidelity IS are shown on diagonal plots (y‐axis is not to scale) with solid and dashed lines representing predicted cell number and desirability respectively. The “above” diagonal plots are second‐order plots for cell number and “below” are those for desirability . The range of all components as described in Table 1. Labels are left off for clarity; to find the axis labels read the x‐axis labels horizontal from the diagonal label and read the y‐axis labels vertical from the label.
Predicted first and second‐order effects. First‐order predicted effects of each component of the high‐fidelity IS are shown on diagonal plots (y‐axis is not to scale) with solid and dashed lines representing predicted cell number and desirability respectively. The “above” diagonal plots are second‐order plots for cell number and “below” are those for desirability . The range of all components as described in Table 1. Labels are left off for clarity; to find the axis labels read the x‐axis labels horizontal from the diagonal label and read the y‐axis labels vertical from the label.To quantify the magnitude of the predicted global effect of each component, we employ the VARS method (Razavi & Gupta, 2016a, 2016b) of sensitivity analysis because standard methods of sensitivity analysis cannot capture the “importance” of a given factor in the presence of nonlinear effects. In VARS we defined as the number of pairs in a set such that all possible pairs of points and are separated by a normalized factor distance . We then integrated the variance of all pairs separated by to get the variogram . If we set (10% of total normalized factor space for a given component) we are estimating a “local” variability in the output whereas would be an estimate of the “long‐range” effect. Figure 7 shows these variograms for each component integrated to their “local,” “medium,” and “global” ranges, showing albumin, FBS, dexamethasone, and glutamine have the largest effect on , with FBS being by far the most critical component.
Figure 7
Variogram sensitivity analysis. The local (horizontal hatching), global (diagonal hatching), and mid‐range sensitivity of each component on is indicated by the height of the bars. Albumin, FBS, dexamethasone, and glutamine have the largest effect on , with FBS being by far the most critical component with respect to global sensitivity. Predicted variogram for each component was formed from random samples from domain . FBS, fetal bovine serum.
Variogram sensitivity analysis. The local (horizontal hatching), global (diagonal hatching), and mid‐range sensitivity of each component on is indicated by the height of the bars. Albumin, FBS, dexamethasone, and glutamine have the largest effect on , with FBS being by far the most critical component with respect to global sensitivity. Predicted variogram for each component was formed from random samples from domain . FBS, fetal bovine serum.It was also useful to examine the correlations between different IS. The model predicts all IS to have very linear correlations (Figure 8c), while Passage 1, having the most experimental noise, had the weakest inter‐IS correlations. Biases are predicted at the upper end of the output range as indicated by the deviation from the 45° line in Figure 8c. This fact is also evident in the predicted kernel matrix in Figure 8b, where the more error‐prone Passage 1 data displays high off‐diagonal intra‐IS correlation, and the other IS show nearly identical inter and intra‐IS correlations.Kernel plots and IS distributions. (a) and (b) Show the output of the kernel for all data collected and a simulated data set where only is varied from , respectively. Darker regions indicate large values of , and thus a correlation. Also (c) the various IS cell number/correlate distributions (diagonal histograms) are shown. Above the diagonal (squares) are the actual inter‐IS correlations for each IS with their respective R
2 values, and below the diagonal (circles) are the predicted inter‐IS correlations for a random data set.
DISCUSSION AND CONCLUSION
Production scale cellular agricultural processes will require >10 passages of cell growth (O'Neill et al., 2021) so optimizing growth based on single‐passage information is not adequate (Cosenza et al., 2021). However, multipassage growth assays are difficult/expensive to measure, and even more difficult to optimize when given many components. We managed this complexity by coupling long‐term (i.e., >1 passages) cell number measurements with simpler but less valuable rapid growth chemical assays (single passage) in murine C2C12 cultures as a model system for cellular agricultural applications, capturing a more wholistic model of the process. We combined this with an optimization algorithm that efficiently allocates laboratory resources toward solving for desirability function , a function that incorporates both cell growth and medium cost. This resulted in a 38% reduction in experimental effort, relative to a comparable DOE method, to find a media 227% more proliferative than the DMEM control at nearly the same cost. As the longer‐term passaging study suggests, our Passage 2 objective function and IS were well‐calibrated to mimicking the complex industrial process of growing large batches of cells over many passages, with Passage 4 cell numbers well‐predicted by this objective function.The reasons for the success of the BO are myriad. The BO method iteratively refines a single process model to improve certainty in ‐optimal regions, whereas the DOE relies on a series of BB designs where the older data sets are ignored because they were outside of the optimal factor space. The BO also used a variety of IS, whereas the DOE only used a single low‐fidelity AlamarBlue metric (as is common in analysis of growth media). Looking at Figure 8c, the AlamarBlue and LIVE tended to cluster around the point , making it difficult to distinguish between high‐quality and low‐quality media. This may be due to the deviation of linearity of the and metric at high biomass. The BO method also refined its multi‐IS model over the entire feasible design space, allowing it to take advantage of optimal combinations and concentrations of all 14 components over the entire domain, whereas the DOE needed to reduce the design and factor spaces to reduce the number of experiments needed, and may have identified the wrong optimal boundary locations resulting in suboptimal experimental designs. The BO method was also able to leverage information about process uncertainty to improve the model is poorly understood regions of the design space, whereas the steepest accent method used by the DOE chased after improved with little regard for overall noise or experimental errors. This was worsened by the sensitivity of the polynomial model to random inter‐batch fluctuations in , which may have driven the DOE to suboptimal media. Note that the success of our BO method should not be taken as generic superiority over all potential instantiations of DOE or commercial media used for C2C12 growth.While the BO method worked well at solving the experimental optimization problem, the multi‐IS GP accuracy was limited to highly sampled regions of the design space, thus limiting the efficacy of sensitivity analysis. This was a conscious decision made to trade off postfacto analysis for sampling media with high desirability . Accuracy was also limited by the low amount of data available relative to the large dimensionality , which is inherently the case in complex biological experiments where each batch of experiments takes >1 week to evaluate. Finally, the hyperparameters used in the multi‐IS squared exponential kernel were deliberately regularized with prior distributions to smooth the posterior of the prediction . Regularization may have diminished the quality of the inter‐IS correlations; the model hyperparameters ignored features where IS differed in favor of a simpler correlative structure to explain the data. This is seen in Figure 8b,c, where the kernel evaluations show nearly equal inter‐IS correlative strength for most IS used. This may have “squished”/ignored features that could have provided additional information, but at the cost of sampling the design space too widely, again a deliberate choice of model skepticism towards outliers.Even with these limitations, the BO method clearly performs well on media optimization systems relevant to cellular agriculture, that is, those with multiple and potentially conflicting information sources with varying levels of difficulty in measuring. The media resulting from the BO algorithm supported significantly more C2C12 cell growth with only a small increase in cost. This algorithm performs better than traditional DOE in this case, especially in incorporating critical data from growth after the multiple passages in an affordable manner. With these results, it should be possible to implement this type of experimental optimization algorithm in other systems of importance to cellular agriculture and cell culture production processes with difficult‐to‐measure output spaces, including for optimization of serum‐free media for cell growth and for differentiation.Supporting information.Click here for additional data file.
Authors: Daniel Brunner; Jürgen Frank; Helmut Appl; Harald Schöffl; Walter Pfaller; Gerhard Gstraunthaler Journal: ALTEX Date: 2010 Impact factor: 6.043
Authors: Jan van der Valk; Karen Bieback; Christiane Buta; Brett Cochrane; Wilhelm G Dirks; Jianan Fu; James J Hickman; Christiane Hohensee; Roman Kolar; Manfred Liebsch; Francesca Pistollato; Markus Schulz; Daniel Thieme; Tilo Weber; Joachim Wiest; Stefan Winkler; Gerhard Gstraunthaler Journal: ALTEX Date: 2017-08-09 Impact factor: 6.043
Authors: Edward N O'Neill; Joshua C Ansel; Grace A Kwong; Michael E Plastino; Jenny Nelson; Keith Baar; David E Block Journal: NPJ Sci Food Date: 2022-09-29