
Faster, reduced cost calibration method development methods for the analysis of fermentation product using near-infrared spectroscopy (NIRS).

Nosa Agbonkonkon¹, Greg Wojciechowski¹, Derek A Abbott¹, Sara P Gaucher¹, Daniel R Yim¹, Andrew W Thompson¹, Michael D Leavell¹.

Abstract

Recent innovations in synthetic biology, fermentation, and process development have decreased time to market by reducing strain construction cycle time and effort. Faster analytical methods are required to keep pace with these innovations, but current methods of measuring fermentation titers often involve manual intervention and are slow, time-consuming, and difficult to scale. Spectroscopic methods like near-infrared (NIR) spectroscopy address this shortcoming; however, NIR methods require calibration model development that is often costly and time-consuming. Here, we introduce two approaches that speed up calibration model development. The first, generalized calibration modeling (GCM) or sibling modeling, reduces calibration modeling time and cost by up to 50% by reducing the number of samples required. Instead of constructing analyte-specific models, GCM combines a reduced number of spectra from several individual analytes into a large pool of spectra for a generalized model that predicts all analyte levels. The second, randomized multicomponent multivariate modeling (RMMM), reduces modeling time by mixing multiple analytes into one sample matrix before taking the spectral measurements; individual calibration models are then developed for the various components in the mixture. The time saved by RMMM is proportional to the number of components or analytes in the mixture. When combined, the two methods reduce the cost and time of calibration model development by a factor of 10.

Keywords: Fermentation; Generalized calibration model (GCM); NIR spectroscopy; RMMM; Sibling model

Year: 2021 PMID: 34089321 PMCID: PMC9113423 DOI: 10.1093/jimb/kuab033

Source DB: PubMed Journal: J Ind Microbiol Biotechnol ISSN: 1367-5435 Impact factor: 4.258

Introduction

Across the biotechnology sector, the ultimate test of a new strain is its performance in a fermentation vessel. This typically involves measurements of sugar, biomass density, and product for evaluation of key performance metrics such as yield and productivity (rate of product formation). In current practice, fermentation sample analysis is a slow and cumbersome process in which samples are collected and prepared manually and analyzed by unique wet chemistry assays (Fig. 1). A limited number of samples can be taken each day, which limits temporal resolution over the fermentation time-course. Sample sets are also often analyzed in batches, resulting in significant delays between fermentation completion and data availability.

Fig. 1

Schematic of current wet chemistry method showing its long turnaround time as well as traditional individualized NIR model method. GCM and RMMM time savings are also depicted.

Because performance in the fermentation vessel is critical to strain development, there is a need for rapid, accurate, and nondestructive analytical methods that require little or no sample preparation, making fermentation measurements faster and less labor intensive. Rapid, real-time methods decrease fermentation analysis time by 75–90% by eliminating almost all of the daily sampling requirements, and they improve the quality of decisions by enabling biologists and process engineers to gain useful insights into the minute-by-minute or hour-by-hour metabolic performance of strains (Cozzolino et al., 2006). These real-time insights will lead to further reductions in daily sampling as critical parameters are monitored and understood. Sampling will become more strategic, used either for confirmatory purposes or for model maintenance and improvement. Among the various methods being investigated and developed, near-infrared (NIR) spectroscopy is foremost. NIR spectroscopy is a nondestructive measurement technique that has found application in many industries, including pharmaceuticals, petrochemicals, and food (Prajapati et al., 2016; Rhiel et al., 2002; Riley et al., 1999, 2001; Roggo et al., 2007). NIR absorptions arise from the overtones and combination bands of the fundamental vibrations observed in the mid-IR region.
NIR, in combination with chemometric tools such as partial least squares (PLS) and principal component analysis, and coupled with mathematical pretreatment or spectral conditioning methods such as first and second derivatives, standard normal variate (SNV), and multiplicative scatter correction (MSC), produces powerful methods that enable both qualitative strain ranking (Cozzolino et al., 2006) and quantitative determination of product and intermediate concentrations (Roggo et al., 2007). The history, theory, and principles of NIR are well described (Bec & Huck, 2019; Burns & Ciurczak, 2007; Davis, 2017; Williams, 2019). Several metrics are used to determine the quality of NIR calibration models, including:
R²: A measure of the variation in the calibration samples that is described by the model. It also measures the strength of correlation between the NIR measurement and the reference method.
Rank: NIR calibration models are made by dimensional reduction of the complex NIR spectra, and calibration models are chosen based in part on the number of ranks (also known as factors or principal components) needed to accurately describe the data.
RPD: Ratio of the standard deviation to the standard error of prediction, a measure of the predictive power of a model.
RMSECV (root mean square error of cross-validation): A measure of the error in NIR models.
Bias: A measure of constant bias in the model.
NIR has emerged as the workhorse of process analytical technology methodology and practice, as reflected in the enormous investment pharmaceutical companies have made to comply with both FDA and European Union guidelines for the implementation of NIR (European Medicines Agency, 2012; U.S. Food & Drug Administration, 2004; Whitford & Julien, 2008). Lack of sample preparation and fast analysis on a millisecond time-scale (Card et al., 2008) are two advantages of NIR.
The nondestructive nature of NIR measurement means that sample integrity is maintained, allowing measurements to be made in the sample's native state (Li et al., 2020). This lack of sample preparation makes real-time measurements possible and is a major reason NIR is an attractive mode of measurement in pharmaceutical and chemical manufacturing, where real-time adjustments are made to manufacturing processes to maintain consistent product quality (Vann & Sheppard, 2017). Another notable feature of NIR is the ability to measure multiple components from one spectral scan, so separate sample preparations for multiple assays are no longer needed, saving time and resources. Over the years, the use of NIR in fermentation operations (Cervera et al., 2009; Tosi et al., 2003) has steadily increased, albeit more slowly than hoped, particularly in the critical area of strain development or strain screening (Cozzolino et al., 2006; Saha & Jackson, 2017), where broth matrices are constantly changing as strains and fermentation management practices improve. Another challenge is the large number of samples needed to develop calibration models, as a model must capture all the variables, parameters, and conditions that describe the fermentation process. This makes initial development of an NIR calibration model expensive and labor intensive. In this study, we address some of these challenges with new model-building methods that speed up NIR calibration modeling. By reducing the barriers to NIR calibration model development, the potential of NIR in fermentation process operation can be realized by (1) dramatically reducing the need for daily sample collection and preparation, (2) reducing analysis times, and (3) increasing information density, giving critical and actionable insights into strain behavior and performance. NIR calibration model development is by its very nature an offline activity.
The data and discussion presented here were all generated in an offline mode; however, once validated, NIR methods can be transferred and used for real-time monitoring of fermentation processes without any issues.
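The SNV and first-derivative pretreatments mentioned above are simple per-spectrum transforms. The sketch below illustrates them in plain Python, assuming a spectrum is a list of absorbance values on an evenly spaced wavenumber grid; the function names are hypothetical. Commercial chemometric packages often compute derivatives with a Savitzky–Golay filter, whereas a plain central difference is used here for brevity.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: center a spectrum on its own mean and
    scale it by its own standard deviation (sample SD, n - 1)."""
    mu = mean(spectrum)
    sd = stdev(spectrum)
    return [(a - mu) / sd for a in spectrum]


def first_derivative(spectrum, step=1.0):
    """First derivative by central differences; the two end points use
    one-sided differences so the output keeps the input length."""
    n = len(spectrum)
    if n < 2:
        raise ValueError("need at least two points")
    d = [0.0] * n
    d[0] = (spectrum[1] - spectrum[0]) / step
    d[-1] = (spectrum[-1] - spectrum[-2]) / step
    for i in range(1, n - 1):
        d[i] = (spectrum[i + 1] - spectrum[i - 1]) / (2 * step)
    return d


if __name__ == "__main__":
    raw = [0.10, 0.20, 0.35, 0.30, 0.15]
    # The same spectrum with a multiplicative scatter effect and a baseline offset
    distorted = [2.0 * a + 0.5 for a in raw]
    print(snv(raw))
    print(snv(distorted))  # equal to snv(raw) up to rounding: SNV removes offset and scale
    print(first_derivative([a + 0.5 for a in raw]))  # the constant offset drops out
```

Chaining the two, for example `snv(first_derivative(s))`, gives one reading of the "First derivative + SNV" entries in Tables 1 and 2; the order in which the authors' software applies combined pretreatments is not stated in the text.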

The Evolution of Calibration Model Development: Two Novel Calibration Methods

Calibration model development is perhaps the most important aspect of implementing NIR titer measurements in fermentation. However, developing effective, high-performing calibration models can be expensive. In strain and fermentation process development, it requires acquiring samples across processes that are often complex and that differ in raw materials, strain, and operation. Building new models for every new molecule or strain makes the aggregated cost over time enormous.

The prevailing practice across many industries in developing NIR calibration models is to make one model per analyte in stable, well-defined, and characterized systems (Cozzolino et al., 2006; Monono et al., 2012). In this study, we demonstrate that this practice can be improved upon by using GCM, also known as sibling modeling, which reduces the amount of data required for calibration model building by up to 50%. Generalized modeling aims to develop a single model that can be used across related molecules in a chemical class (chemical siblings). GCMs can be used for a broad range of analytes with similar functional groups or other structural attributes (hence the moniker ‘sibling’). The principle behind generalizability is that if a group of molecules or analytes share similar core structural functionality, a single model can be built by pooling a reduced number of samples of each molecule into a larger set that represents both the shared similarities and the unique differences among all analytes. Using one of the molecules in the group as the base analyte, we can build a generalized model for the group by starting with a minimum number of samples for the base analyte and then adding a reduced number of samples for each of the other chemical siblings. As an example, suppose we are to build a GCM for a group of alcohols including ethylene glycol, 1,2-propanediol, 1,3-propanediol, and 1-propanol.
We can build a good model for ethylene glycol with about 40 samples; then, by adding about 20 samples each of 1,2-propanediol, 1,3-propanediol, and 1-propanol, we can extend this model into one calibration model that is generalized across the group. Instead of building four individual calibration models of 40 samples apiece (a total of 160 samples), we end up with a single GCM of similar quality using only 100 samples, a reduction of 37.5%.

The limitation of the generalized model approach is that it requires the analytes to share some structural or functional features (chemical siblings), to have similar NIR vibrational frequencies, or to have some overlapping NIR frequency bands. When generalizability is not possible, we used multicomponent measurement (Monono et al., 2012; Quentin et al., 2017; Riley et al., 1997, 2001) to develop calibration models for multiple analytes or molecules at the same time. Because the spectrum of a multicomponent mixture is the concentration-weighted linear combination of the spectra of its individual components, multivariate statistics can be used to develop individual models from mixture spectra. Thus, instead of taking a separate spectral measurement for each analyte on its own, a single spectral measurement of the mixture is taken, and chemometric methods are then used to develop calibration models for the individual analytes. For this to be possible, two conditions must be met: (1) analytes in the mixtures must not react chemically with each other or with the matrix, so that each retains its individual identity rather than forming new compounds; and (2) the mixtures must be designed so that the correlation between the concentrations of each pair of analytes is zero or very close to zero, or, if uncorrelated mixtures are not available, different spectral regions must be used.
In this study, analytes were grouped into nonreactive sets, and analyte concentrations in the sample mixtures were then assigned randomly with the aid of a random number generator (Supplementary Fig. A).
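The randomized assignment can be illustrated with a short sketch. It assumes that each analyte in a nonreactive group receives an independent, uniformly distributed concentration in every mixture; the distribution, the upper limits, and the function names are illustrative assumptions, not the authors' protocol. Under that assumption the pairwise Pearson correlations between analyte concentrations stay close to zero, which is the second condition above. A naive design in which all analytes are stepped up together is included for contrast: its concentrations are perfectly correlated, so a model cannot attribute a spectral change to one analyte rather than another.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def randomized_design(max_conc, n_samples, seed=0):
    """Give every analyte an independent uniform random concentration in each
    mixture. max_conc maps analyte name -> upper concentration limit."""
    rng = random.Random(seed)  # seeded so the design can be reproduced
    return {name: [rng.uniform(0.0, hi) for _ in range(n_samples)]
            for name, hi in max_conc.items()}


def max_abs_correlation(design):
    """Largest |r| over all analyte pairs; should be near zero for RMMM."""
    return max(abs(pearson(design[a], design[b]))
               for a, b in combinations(design, 2))


if __name__ == "__main__":
    # Hypothetical upper limits (g/kg) for the spiked group B analytes
    group_b = {"squalene": 20.0, "d-limonene": 5.0, "oleic acid": 15.0}
    design = randomized_design(group_b, n_samples=200, seed=42)
    print("randomized max |r|:", round(max_abs_correlation(design), 3))

    # Naive design: all analytes stepped up together, so |r| = 1 for every pair
    steps = [i / 10 for i in range(11)]
    naive = {name: [s * hi for s in steps] for name, hi in group_b.items()}
    print("naive max |r|:", round(max_abs_correlation(naive), 3))
```

Checking the largest pairwise correlation before any spiking is done is a cheap way to confirm that a proposed mixture set satisfies the uncorrelated-design requirement.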

Materials and Methods

Analyzer and Probe

NIR spectra were acquired using a Bruker Matrix-F spectrometer and a custom IN271 transflection probe. The probe was made of stainless steel 1.4404 (316L) with a sapphire window. Probe optical pathlength was 2 mm with a 1 mm slit, and the probe had an attached 3 m fiber-optic bundle of seven low-OH quartz fibers (600 µm core diameter) terminated in two SMA-905 connectors, with a Kalrez 6375 O-ring seal. The probe housing was 15 mm in diameter with an electropolished surface. Housing immersion depth was 220 mm, with a sealing plug for autoclaving. The probe and housing were rated for a temperature range of −50°C to 1,400°C with a maximum pressure of 5 bar.

Generalized Calibration Modeling Method

In this study, we present one GCM for a group of organic acids and a second for terpenes. Succinic, tartaric, adipic, and glutaric acids made up the organic acids, while farnesene and α-bisabolol made up the terpene group (Supplementary Fig. B). In the organic acid group, the analytes were spiked into a fermentation broth of a naïve Saccharomyces cerevisiae strain to simulate production, while in the terpene group, α-bisabolol and farnesene were produced in situ by engineered strains.

Organic Acid Generalized Model

Sample concentrations for the organic acids were calculated after the analytes were spiked into 50 ml of fermentation broth from six fermentors. Succinic acid (99% purity), glutaric acid (99% purity), adipic acid (99.5% purity), and tartaric acid (99.5% purity) were all obtained from Sigma-Aldrich, St. Louis, MO. Aliquots were added to the broth by weight to achieve calculated concentrations of 0–0.4 M. Glutaric acid was designated the base analyte, with 30 spiked samples, and five spiked samples were created for each of the other analytes in the group. Fermentation ran for 5 days, and spectra of the spiked fermentation samples were taken daily to simulate the growth and production of these analytes in fermentation. All spectra were combined in the Bruker OPUS chemometric software, and PLS regression with first-derivative pretreatment was used to create a single organic acid GCM. NIR measurements were collected offline, with the measured broth in a 100 ml media bottle into which the organic acids were spiked. A magnetic stir bar was used both to mix the broth thoroughly and to keep it homogeneous during spectral acquisition. Media bottles containing broth spiked with analytes were placed on a stir plate and stirred at 2000 rpm while the NIR spectra were obtained.
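The sample arithmetic behind the GCM savings can be written out explicitly. The sketch below assumes the simplest bookkeeping, one spectrum per spiked sample per sampling day, and the function names are illustrative. Under that assumption it reproduces the 100-versus-160 comparison in the alcohol example and, for the organic acid design described here (30 glutaric acid samples plus five samples of each of three sibling acids, measured daily for 5 days), the 225-sample calibration set reported in the Results.

```python
def gcm_calibration_size(base_samples, samples_per_sibling, n_siblings, n_days=1):
    """Spectra in a generalized (sibling) calibration set: a full set for the
    base analyte plus a reduced set for each sibling, repeated per sampling day."""
    return (base_samples + samples_per_sibling * n_siblings) * n_days


def individual_calibration_size(samples_per_model, n_analytes, n_days=1):
    """Spectra needed when every analyte gets its own full calibration model."""
    return samples_per_model * n_analytes * n_days


def fractional_reduction(gcm_size, individual_size):
    """Fraction of samples saved by the GCM relative to individual models."""
    return 1 - gcm_size / individual_size


if __name__ == "__main__":
    # Alcohol example: 40 ethylene glycol samples + 3 siblings x 20 samples
    gcm = gcm_calibration_size(40, 20, 3)            # 100
    individual = individual_calibration_size(40, 4)  # 160
    print(gcm, individual, f"{fractional_reduction(gcm, individual):.1%}")

    # Organic acid GCM: 30 glutaric + 3 siblings x 5 samples, measured on 5 days
    print(gcm_calibration_size(30, 5, 3, n_days=5))  # 225
```

The first print shows `100 160 37.5%`, the reduction quoted for the alcohol example.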

Terpene (α-Bisabolol/Farnesene) Generalized Model

The generalized model was tested further using actual production (manufacturing scale-down) fermentors making farnesene and α-bisabolol independently. Ten 2 l fermentors containing a farnesene-producing strain and twenty 2 l fermentors containing an α-bisabolol-producing strain were used. NIR spectra of 250 samples of α-bisabolol broth spanning 13 days of fermentation and 114 samples of farnesene broth spanning 11 days were used for calibration model development. Spectra of an additional 60 farnesene samples taken from a different set of 10 fermentors over 6 days were used to test the predictive power of the terpene GCM. The terpene generalized model was further tested by using it to predict α-bisabolol concentrations in 10 samples taken from a 300 l pilot-scale fermentor. As in the organic acid measurements, NIR spectra were taken from 50 ml broth samples of α-bisabolol and farnesene by placing a magnetic stir bar in the media bottle, placing the bottle on a stir plate, and immersing the NIR transflection probe into the sample until the broth covered the probe window.

Randomized Multicomponent Multivariate Modeling

Here, six analytes spanning chemical class and biological diversity were selected. They were divided into two groups based on chemical compatibility, so that members of a group would not react with one another to form new compounds. Cadaverine, 2-phenylethanol, and α-bisabolol made up group A, and squalene, d-limonene, and oleic acid made up group B. An engineered yeast strain producing α-bisabolol was used for group A, while broth from a nonengineered S. cerevisiae strain was used for group B. Ten fermentors were used for group A and 12 for group B. Fermentation was run for 10 days in a fed-batch process. Analytes were purchased from Sigma-Aldrich, St. Louis, MO, with the following purities: oleic acid, 90%; limonene, 97%; squalene, 98%; cadaverine, 95%; and 2-phenylethanol, 99%; all were liquids. α-Bisabolol was produced via fermentation using an engineered yeast strain, and its concentration was determined by gas chromatography as described below. The concentrations of all remaining analytes were calculated gravimetrically after they were aliquoted directly into the fermentation broth.

Sample Measurement Vessel and Mixing Accessories

For RMMM, a different mixing strategy was employed from the one used for GCMM. Instead of a media bottle with a stir bar, we employed the IKA Ultra-Turrax Tube Drive P workstation (Ident. No. 0025005836) and a 50 ml mixing cup, as shown in Fig. C (Supplementary Material). This setup enabled continuous mixing while spectra were acquired. The speed of the Ultra-Turrax Tube Drive was set at 2000 rpm to match the magnetic stirrer speed.

NIR Spectrum Acquisition (GCM and RMMM)

Fermentation broth (50 ml) was measured into the media bottle (GCM) or the IKA mixing cup (RMMM), and a varying amount of analyte was added. With the mixing rate set at 2000 rpm, the sample was mixed for about 2 min; the NIR probe was then lowered into the sample and spectra were taken. Typical NIR spectra of fermentation broth containing both spiked-in analytes and the fermentation products mentioned above, acquired with the Bruker Matrix-F spectrometer, are shown in Fig. D (Supplementary Material).

Gas Chromatography Assay

Determination of farnesene and α-bisabolol titers in broth was carried out using gas chromatography with flame ionization detection (GC-FID). Farnesene: A 10 ml farnesene broth sample was collected in a 20 ml glass vial and mixed well. Then, 0.5 ml was transferred into a tared 20 ml glass vial and the weight recorded. Eighteen milliliters of 90:10:0.25 methanol:butoxyethanol:tetradecane was added to the broth, and the total weight of broth, extraction reagent, and vial was recorded. Extractions were thoroughly mixed and left to stand for 15 min to allow the solids to settle. One hundred microliters of supernatant was transferred to a GC-FID vial and diluted with 900 µl of ethyl acetate. Triplicate injections were made using pulsed split injection onto an Agilent J&W DB-1MS-LTM column (methyl silicone, 10 m × 0.10 mm × 0.10 µm film thickness). The GC-FID inlet was a split/splitless inlet operated with a split ratio of 108:1, an injection pulse pressure of 75 psi, and a split flow of 70.2 ml/min for a duration of 0.15 min. Inlet temperature was 300°C, at a constant pressure of 59.7 psi, with hydrogen as the carrier gas. The GC-FID oven temperature was initially set at 100°C for 0.15 min, then ramped to 175°C at a rate of 15°C/min until it reached a final temperature of 320°C, for a total runtime of 5.5 min. Farnesene titer was calculated using tetradecane as an internal standard and reported as grams per kilogram (g/kg) of broth. α-Bisabolol: As for farnesene, 10 ml of α-bisabolol broth was sampled into a 20 ml scintillation vial and mixed well. Two hundred fifty microliters of broth was aliquoted while stirring and transferred to a second vial. The weight of the vial plus sample was recorded, 15 ml of 1400:1200:2.5 methanol:ethyl acetate:hexadecane was added, and the vial was reweighed. Samples were vortexed for 120 s and then left to stand for at least 30 min, or until the solids had settled completely and the supernatant was clear.
Five hundred microliters of supernatant was transferred to a GC-FID vial and diluted with 500 µl of ethyl acetate. One microliter was injected into an inlet set to 275°C with a split ratio of 50:1 and pressurized to 75 psi for 0.2 min immediately following injection. Separation was performed on an Agilent HP-1 column (10 m × 0.10 mm × 0.10 µm film thickness), using hydrogen at a constant pressure of 65 psi. After injection, the oven temperature was held at 140°C for 0.1 min, then ramped to 240°C at 25°C/min, then to 300°C at 30°C/min, then to 320°C at 20°C/min, and finally held at 320°C for 0.5 min. Detector temperature was 325°C, with hydrogen, air, and makeup gas (nitrogen) flows set at 30, 360, and 45 ml/min, respectively.
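Internal-standard quantitation, as used for the farnesene titer, can be sketched as follows. This is a generic illustration, not the laboratory's validated calculation: it assumes a relative response factor (RRF) of the analyte versus the internal standard determined from calibration standards (the values below are hypothetical), and that the mass of internal standard delivered with the extraction reagent is known from the recorded weights. Because the analyte and the internal standard are diluted together, the subsequent 100 µl into 900 µl dilution changes both peak areas by the same factor and drops out of the ratio.

```python
def titer_g_per_kg(area_analyte, area_is, mass_is_g, mass_broth_g, rrf):
    """Analyte titer in g per kg of broth from GC-FID peak areas (generic sketch).

    area_analyte, area_is : peak areas of analyte and internal standard (IS)
    mass_is_g             : grams of IS added to the extraction (e.g. tetradecane)
    mass_broth_g          : grams of broth extracted
    rrf                   : relative response factor, defined here as
                            (area_analyte / mass_analyte) / (area_is / mass_is)
    """
    mass_analyte_g = (area_analyte / area_is) * mass_is_g / rrf
    return 1000.0 * mass_analyte_g / mass_broth_g


if __name__ == "__main__":
    # Hypothetical numbers for illustration only
    t = titer_g_per_kg(area_analyte=2.0e5, area_is=1.0e5,
                       mass_is_g=0.01, mass_broth_g=0.5, rrf=1.0)
    print(round(t, 3))  # 40.0 (g/kg)
    # A 1:10 dilution scales both areas equally, so the titer is unchanged
    t_diluted = titer_g_per_kg(2.0e4, 1.0e4, 0.01, 0.5, 1.0)
    print(abs(t - t_diluted) < 1e-9)
```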

Results/Discussion

Glutaric acid was used as the base analyte for the GCM approach, with 30 samples per day taken across 5 days of fermentation. Each day, the model was extended by adding tartaric, succinic, and adipic acid samples at five concentration levels each. The resulting 225 samples constituted the calibration set. Individual models for each acid are shown in Fig. 2 (a–d). Table 1 shows the spectral region and math pretreatment used for each model, along with model quality metrics including the correlation coefficient (R²), rank, RMSECV, and RPD. In each case, leave-one-out cross-validation was used. Although the concentrations of tartaric, succinic, and adipic acids added to the individual broth samples were the same each day, we arrived at calibration models optimized for each analyte by using different modeling parameters such as spectral region, math pretreatment, and rank. While the calibration regression plots look similar, all with high R², Table 1 also reveals overlapping spectral regions and math pretreatments that were exploited in building a generalized model for the combined analytes.
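Leave-one-out cross-validation refits the model once per sample, each time leaving that sample out and predicting it, and RMSECV is the root mean square of those held-out prediction errors. The sketch below shows the bookkeeping with an ordinary least-squares line standing in for the PLS model actually used; this is a deliberate simplification, and a real calibration would refit the PLS model, including any mean-centering, inside each fold. Data and names are illustrative.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least-squares slope and intercept (stand-in for a PLS fit)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def loo_rmsecv(x, y):
    """Root mean square error of leave-one-out cross-validation."""
    squared_errors = []
    for i in range(len(x)):
        # Train on every sample except sample i
        x_train, y_train = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        slope, intercept = fit_line(x_train, y_train)
        # Predict the held-out sample and record its squared error
        y_hat = slope * x[i] + intercept
        squared_errors.append((y[i] - y_hat) ** 2)
    return sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    signal = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]     # e.g. a single spectral feature
    conc = [0.01, 0.08, 0.17, 0.23, 0.33, 0.40]  # hypothetical reference values (M)
    print(f"RMSECV = {loo_rmsecv(signal, conc):.4f} M")
```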

Fig. 2

Calibration models for individual organic acids. This individualized model method is the traditional mode for NIR model development.

Table 1.

Calibration Model Parameters and Quality Metrics for the Individual Organic Acid Models and the Generalized Model

Name	Rank	R²	RMSECV (M)	Spectral range (cm⁻¹)	Math pretreatment	RPD
Glutaric acid	6	0.9984	0.0045	6348–5315	SNV	24.7
Tartaric acid	8	0.9984	0.0035	7505–6796, 4428–4242	First derivative + SNV	25.4
Succinic acid	5	0.9954	0.0072	8458–7498, 6101–5446	SNV	15
Adipic acid	5	0.9676	0.017	9403–7498	First derivative	5.59
Generalized model	11	0.9915	0.0103	9403–5446	First derivative	10.8

In each case, the R² of the correlation between concentrations measured by NIR and the calculated reference concentrations was over 0.96, meaning that the models developed for individual molecules described over 96% of the concentration variation in the calibration sample set. The ranks used ranged from 5 to 8, which is reasonable for a complex system like fermentation broth, whose matrix changes as cells grow, sugar and nutrients are consumed, and products are made. Both the ranks and the R² values of these calibrations are consistent with previous reports. Card et al. (2008) reported correlation coefficients ranging from 0.926 to 0.995 and ranks from 3 to 7 for a mammalian culture that was primarily producing glucose, lactate, and glutamine, while Riley et al. (1997) reported ranks from 4 to 8 for a fed-batch process. Cervera et al. (2009) compiled, in a detailed review, the spectral regions, math pretreatments, R² values, ranks, and errors reported in cell culture and fermentation for various types of compounds, and our results appear consistent with the values they compiled. When the calibration set spectra were combined to develop a generalized model, we obtained a model comparable in quality to the individualized models (Fig. 3a). The GCM R² of 0.9915 shows that the model describes over 99% of the concentration variation in the calibration set. The rank used (11) is higher than that of any of the individual models (Table 1) but is still consistent with the literature, albeit for much simpler systems (Cervera et al., 2009). This was expected: the combined sample set is more complex, so a higher rank is required to describe it adequately.
The broad spectral region used in the generalized model (9403–5446 cm⁻¹) also reflects the need to capture more information in the GCM than in the individual models. While there is debate about what RPD value is most useful (Cozzolino & Moron, 2006; Camacho-Tamayo et al., 2014; Saeys et al., 2005; Tenhunen et al., 1994), it is generally accepted that an RPD above 4 indicates good predictive power. The RPD of 10.8 calculated for the GCM is a strong indication that it could be used to predict any of the acids in the group within the calibrated range.
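The model-quality metrics discussed here can be computed from paired reference and NIR-predicted values. The sketch below uses common textbook definitions, which is an assumption about how the tabulated values were obtained: R² as the coefficient of determination (1 − SSres/SStot), bias as the mean residual, and RPD as the standard deviation of the reference values divided by the root mean square error (some authors divide by the bias-corrected SEP instead). Chemometrics software may use slightly different conventions.

```python
from math import sqrt
from statistics import mean, stdev


def calibration_metrics(reference, predicted):
    """R², RMSE, bias and RPD for paired reference / predicted values."""
    residuals = [p - r for r, p in zip(reference, predicted)]
    ss_res = sum(e * e for e in residuals)
    ref_mean = mean(reference)
    ss_tot = sum((r - ref_mean) ** 2 for r in reference)
    rmse = sqrt(ss_res / len(residuals))
    return {
        "r2": 1 - ss_res / ss_tot,       # fraction of reference variation explained
        "rmse": rmse,                    # RMSECV when predictions are cross-validated
        "bias": mean(residuals),         # constant offset of the predictions
        "rpd": stdev(reference) / rmse,  # > 4 is commonly read as good predictive power
    }


if __name__ == "__main__":
    ref = [1.0, 2.0, 3.0, 4.0, 5.0]
    pred = [1.1, 1.9, 3.2, 3.8, 5.0]
    for name, value in calibration_metrics(ref, pred).items():
        print(f"{name}: {value:.4f}")
```

Rearranging RPD = SD/RMSECV, the glutaric acid entry in Table 1 (RPD 24.7, RMSECV 0.0045 M) implies a reference standard deviation of roughly 24.7 × 0.0045 ≈ 0.11 M, in line with concentrations spread over 0–0.4 M (a uniform spread over that range has a standard deviation of about 0.12 M).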

Fig. 3

The organic acid GCM combining spectra from samples containing the individual organic acids is shown in (a). The organic acid GCM built with glutaric, tartaric, and adipic acids only is shown in (b). Model (b) was used to predict concentrations of succinic acid, which was not included in its calibration set. The regression plot (c) shows good correlation between measured and predicted succinic acid concentrations.

Having shown that the predictive power of the generalized model is as good as that of the individual models, we can now demonstrate one of the advantages of this method: measuring analytes that are within the class but were not used in model development. As an example, we built a different organic acid generalized model, this time with glutaric, tartaric, and adipic acids only (Fig. 3b), and used it to predict spiked-in succinic acid concentrations. As shown in Fig. 3c, the NIR-predicted concentrations of succinic acid were highly correlated with the calculated concentrations and were statistically equivalent to them within 5% by two one-sided t-tests (TOST; analysis not shown). To further demonstrate the power and applicability of GCMM, we also tested the predictive power of a GCM built using two of our products, α-bisabolol and farnesene. Individual models for farnesene and α-bisabolol were built, and their spectra were then combined to build a generalized terpene model. Two hundred fifty samples of α-bisabolol were used to develop the α-bisabolol-only model, 115 samples of farnesene were used for the farnesene-only model, and all 365 spectra were combined in the terpene generalized model. Individual calibration models for α-bisabolol and farnesene are shown in Fig. 4(a, b). Each had an R² of over 0.99, with ranks of 8 and 7 and RPDs of 11.8 and 31.3, respectively, indicating a high level of predictivity.
The extremely high RPD of the farnesene model may be due to the well-developed and characterized GC-FID farnesene assay used in our laboratory. Table 2 summarizes the model parameters. As with the organic acid generalized model, different spectral ranges and math pretreatments were used for the individual models. The different spectral regions and math pretreatments reflect the uniqueness of each model, while the regions of spectral overlap showed commonalities that were used in developing the generalized model. The generalized model combining α-bisabolol and farnesene spectra had an R² of 0.9958, indicating that the model describes over 99% of the concentration variation across the combined population of α-bisabolol and farnesene samples used to build it. As in the organic acid generalized model, the terpene GCM used a spectral region and math pretreatment shared with the individual models.

Fig. 4

Individual α-bisabolol and farnesene models are shown in (a) and (b). The terpene GCM built from the combined spectra of samples containing either α-bisabolol or farnesene (c) is as good as the two individual models. The calibration range of farnesene was higher than that of α-bisabolol, and combining both analytes into the terpene GCM allowed extension of the α-bisabolol range. (d) Comparison of α-bisabolol concentrations predicted by the lab-scale terpene GCM with GC-FID measurements for samples generated at pilot scale.

Table 2.

Calibration Model Parameters and Quality Metrics of the Individual α-Bisabolol and Farnesene Models and the Terpene GCM

Name	Rank	R²	RMSECV (g/kg)	Spectral range (cm⁻¹)	Math pretreatment	RPD
α-Bisabolol	9	0.9929	2.240	8454–7498, 6102–5446	First derivative + SNV	11.8
Farnesene	7	0.9990	1.310	7506–6094	SNV	31.3
Terpenes	15	0.9958	2.090	7506–5446	SNV	15.4

The generalized terpene calibration model built from the combined spectra of α-bisabolol and farnesene is shown in Fig. 4c. An interesting feature of the terpene generalized model, and indeed of all generalized models, is the potential to extend the model's quantification range (here, the titer range) for any member of the group. As new strains are designed and developed, improvements are measured in critical parameters like titer, yield, and productivity. The terpene generalized model shows that it could be used to measure α-bisabolol at titers up to about 150 g/kg, beyond the α-bisabolol-only model maximum of about 120 g/kg, without acquiring new α-bisabolol samples above 120 g/kg. This reduces development cost and time, because a ready-to-use method is available until α-bisabolol strains produce more than the 150 g/kg upper limit of the farnesene titer range. Another important feature of the generalized model method, and a major requirement for its efficacy, is that members must have overlapping concentration ranges. This overlap is critical both to building a good generalized model and to ensuring its performance.
With the region of overlap established, one or two components can then be used to extend the overall calibration range, which can then be applied to all members of the group or to other molecules that share similar characteristics. This is demonstrated in both the organic acid and terpene GCMs (see Figs 3a, b and 4c). For the terpene GCM, the overlapping region is 1–120 g/kg, while in the organic acid GCM it was the entire calibration range (0–0.4 M). These regions act as a base that makes it possible to extend the calibration range in the future as new samples are added, which makes GCMM very powerful. The predictive power of the terpene GCM was tested by applying it to samples from a pilot-scale production (manufacturing scale-down) fermentor. This independent data set consisted of samples taken from a 300 l fermentor, 150× larger than the 2 l fermentors used to build the model. While we make every effort to keep process conditions the same across scales, the difference in volume naturally leads to process differences, such as in mixing and localized temperature variation within the fermentors, that affect sample composition. Even with these differences, however, the terpene GCM created using 2 l fermentors accurately predicted the α-bisabolol concentrations measured by GC-FID in samples taken from the 300 l fermentor (Fig. 4d). The α-bisabolol concentrations predicted by NIR showed no fixed or proportional bias compared with those measured by GC-FID in a Passing–Bablok regression analysis (Passing & Bablok, 1983; not shown).
The slope of 0.967 was not significantly different from 1, and the intercept of −1.117 was not significantly different from zero, indicating that the GCM built at laboratory scale accurately predicted α-bisabolol concentrations in our pilot plant fermentor. Next, the terpene GCM and the farnesene-only model were used to predict farnesene in a set of 60 independent farnesene samples taken from ten 2 l fermentors over a 6-day fermentation period. NIR spectra of these samples were acquired offline, and farnesene titers were predicted with each model in order to compare the accuracy of both models against the GC-FID reference assay. Fig. E (Supplementary Material) shows the farnesene titers in the ten fermentors measured by the GC-FID reference assay, along with the titers predicted by the terpene GCM and the farnesene-only model. The titers predicted by both NIR models were very close to those measured by GC-FID, showing that either NIR model will measure farnesene with a high level of accuracy, and neither the difference between the titers predicted by the two models (about 1.0 g/kg) nor their standard deviations (0.1 g/kg) was significant in a Passing–Bablok regression analysis. The terpene GCM did appear slightly positively biased, consistently predicting slightly higher titers than the GC-FID reference method, whereas the farnesene-only model was slightly negatively biased, predicting on average less farnesene in the broth than the reference method.
However, these biases were not significant as determined by Passing–Bablok regression (Passing & Bablok, 1983) (analysis not shown), which returned slopes not significantly different from 1 and intercepts not significantly different from 0 (slope = 0.9987 and intercept = −0.9627 for the GC-FID vs. farnesene-only model comparison; slope = 0.9970 and intercept = 0.3893 for the GC-FID vs. terpene GCM comparison). Thus, neither model showed fixed or proportional bias relative to the GC-FID reference assay. The terpene GCM agreed slightly more closely with the GC-FID reference method than the farnesene-only model did, although, as described above, this difference was not statistically significant. This slight difference may result from the larger sample size used in model development: adding the α-bisabolol samples to the original farnesene data set provided more extensive coverage of the fermentation environment and a more robust description of the system. The three methods (the two NIR models and the GC-FID assay) differ by less than 4%, which indicates that either model is a suitable replacement for the GC-FID method and that the GCM can accurately predict any of the components in the group. It is this generalizability across a product class that allows rapid deployment of GCM methods to new molecules: new generalized models can be built quickly from historical spectral libraries of similar molecules plus a reduced number of samples of the new molecule, avoiding the cost of developing new models from scratch. As demonstrated here, the generalized model can remain in use even as strain improvements lead to higher titers.
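The Passing–Bablok comparisons used above can be reproduced with a short, dependency-free routine. The Python sketch below follows the standard estimator from Passing & Bablok (1983): the slope is a shifted median of all pairwise slopes, the intercept is the median of y − bx, and the confidence intervals use the paper's rank-based normal approximation. It is not the software the authors used; the function name `passing_bablok` is hypothetical, and the cusum linearity test is omitted.

```python
import math
from statistics import NormalDist, median


def passing_bablok(x, y, alpha=0.05):
    """Hypothetical helper: Passing-Bablok regression of method y on method x.

    Returns (slope, intercept, slope_ci, intercept_ci) with approximate
    (1 - alpha) rank-based confidence intervals. No linearity test.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired measurements")
    slopes = []
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx, dy = x[j] - x[i], y[j] - y[i]
            if dx == 0 and dy == 0:
                continue  # identical pairs carry no slope information
            s = math.copysign(math.inf, dy) if dx == 0 else dy / dx
            if s == -1:
                continue  # slopes of exactly -1 are discarded by the method
            slopes.append(s)
    slopes.sort()
    big_n = len(slopes)
    k = sum(1 for s in slopes if s < -1)  # shift that keeps the estimate unbiased

    def s_at(rank):
        # 1-based rank into the sorted slopes, as written in the original paper.
        if not 1 <= rank <= big_n:
            raise ValueError("too few points for the requested confidence level")
        return slopes[rank - 1]

    # Shifted median of the pairwise slopes.
    if big_n % 2:
        b = s_at((big_n + 1) // 2 + k)
    else:
        b = 0.5 * (s_at(big_n // 2 + k) + s_at(big_n // 2 + 1 + k))
    a = median(yi - b * xi for xi, yi in zip(x, y))

    # Rank-based confidence interval for the slope.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    c = z * math.sqrt(n * (n - 1) * (2 * n + 5) / 18)
    m1 = round((big_n - c) / 2)
    m2 = big_n - m1 + 1
    b_lo, b_hi = s_at(m1 + k), s_at(m2 + k)
    # The intercept bounds follow from the opposite slope bounds.
    a_lo = median(yi - b_hi * xi for xi, yi in zip(x, y))
    a_hi = median(yi - b_lo * xi for xi, yi in zip(x, y))
    return b, a, (b_lo, b_hi), (a_lo, a_hi)
```

To check a comparison like the ones above, pass the GC-FID values as `x` and the NIR predictions as `y`: a slope interval containing 1 indicates no proportional bias, and an intercept interval containing 0 indicates no fixed bias. The rank-based interval needs a reasonable number of paired samples; the helper raises `ValueError` when the requested ranks fall outside the available slopes.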

Randomized Multicomponent Multivariate Modeling Method

In the RMMM calibration models built for this study, analyte concentrations ranged from 0 to 30 g/l for the spiked-in analytes, and the α-bisabolol produced by the engineered strain ranged from 0 to 110 g/kg. Molecules were grouped based on solubility and chemical compatibility to prevent cross-reactivity. We developed good models for all molecules in each group, which illustrates the efficacy of RMMM and the ability to develop NIR calibration models for new analytes quickly. Table 3 summarizes the calibration model results for the analytes in the two groups. Even though the analytes in each group were mixed together artificially and one spectrum per sample was taken, the calibration models were not affected by the presence of the other analytes in the mix. The combination of spectral range and math pretreatment differed for most members of a group; even where the same pretreatment was used, as for α-bisabolol and cadaverine in group A (first derivative + SNV), the spectral regions and ranks differed.
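One way to prepare RMMM mixtures is to vary each analyte's concentration independently of the others, so that a model for one analyte cannot simply borrow signal from a correlated one. The paper does not say how its concentrations were randomized; the Python sketch below assumes a Latin-hypercube-style scheme (the same evenly spaced levels for every analyte, shuffled independently), and the function names, seed, and sample count are hypothetical. The default 0–30 g/l range matches the spiked-in analytes above.

```python
import random


def rmmm_design(analytes, n_samples, low=0.0, high=30.0, seed=None):
    """Hypothetical randomized multicomponent design (concentrations in g/l).

    Every analyte gets the same evenly spaced levels, shuffled
    independently, so the analytes do not rise and fall together.
    """
    if n_samples < 2:
        raise ValueError("need at least two samples")
    rng = random.Random(seed)
    step = (high - low) / (n_samples - 1)
    levels = [low + step * k for k in range(n_samples)]
    columns = {}
    for name in analytes:
        column = levels[:]  # copy so each analyte shuffles its own list
        rng.shuffle(column)
        columns[name] = column
    # One recipe per physical sample; one spectrum is acquired per recipe.
    return [{name: columns[name][i] for name in analytes}
            for i in range(n_samples)]


def spectra_saved(n_analytes, n_samples):
    """Acquisitions for separate per-analyte sets vs. one mixed set.

    Assumes each analyte would otherwise need the same number of samples.
    """
    separate = n_analytes * n_samples
    mixed = n_samples
    return separate, mixed, separate / mixed
```

For example, `rmmm_design(["squalene", "oleic acid", "d-limonene"], 30, seed=1)` returns 30 mixture recipes, and `spectra_saved(3, 30)` returns `(90, 30, 3.0)`: the threefold reduction in spectral acquisitions expected when three analytes share each sample.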

Table 3.

Summary Results of the NIR Calibration Models for Molecules in Groups A and B, Showing the Metrics Used to Determine Model Quality. Different Spectral Regions, Math Pretreatment, and Ranks Were Used

Group	Name	Rank	R²	RMSECV (g/kg)	Spectral range (cm⁻¹)	Math pretreatment	RPD
A	Cadaverine	9	0.9767	0.84	9403–6094, 4605–4420	First derivative + SNV	6.6
	2-Phenylethanol	10	0.9610	1.41	9403–7336, 6310–5785	First derivative	5.1
	α-Bisabolol	5	0.9902	3.57	6012–5447	First derivative + SNV	10.3
B	Squalene	6	0.9735	1.16	6102–5770	Second derivative	6.2
	Oleic acid	7	0.9753	1.28	10,391–9588, 8794–7992, 6395–5592	MSC	6.4
	d-Limonene	4	0.9951	0.534	6102–5770	First derivative + MSC	14.2

Spectra for group B samples were acquired in triplicate, whereas single measurements were taken for group A. There appeared to be no difference between using a single spectrum and triplicate spectra, as both yielded good NIR calibration models. The calibration model plots (Fig. 5) show that good models were developed in all cases, as evidenced by R² values ranging from 0.961 to 0.995 and by the other metrics listed in Table 3 and Fig. 5. Thus, the calibration models adequately described the sample sets used to build them, regardless of the other analytes in the sample mixture. Group A is particularly interesting: the quality of the models suggests that spiking in cadaverine and 2-phenylethanol did not affect the α-bisabolol calibration model, and that the in situ production of α-bisabolol by an engineered strain did not affect the models of the spiked-in analytes. Group B models showed similar characteristics, with no impact from the other analytes.
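The quality metrics in Table 3 can be computed from the reference values and the cross-validated predictions. The Python sketch below uses common chemometric definitions: RMSECV is the root-mean-square cross-validation error and RPD is the sample standard deviation of the reference values divided by RMSECV. It assumes R² is computed on the cross-validated predictions, which the text does not state, and the helper name `calibration_metrics` is hypothetical.

```python
import math
from statistics import mean, stdev


def calibration_metrics(reference, predicted_cv):
    """Hypothetical helper: (R^2, RMSECV, RPD) from paired values.

    `reference` holds the reference-assay values; `predicted_cv` holds the
    cross-validated NIR predictions for the same samples, in the same units.
    """
    if len(reference) != len(predicted_cv) or len(reference) < 2:
        raise ValueError("need two or more paired values")
    residuals = [p - r for r, p in zip(reference, predicted_cv)]
    # Root-mean-square error of the cross-validated predictions.
    rmsecv = math.sqrt(mean(e * e for e in residuals))
    ref_mean = mean(reference)
    ss_tot = sum((r - ref_mean) ** 2 for r in reference)
    ss_res = sum(e * e for e in residuals)
    r2 = 1 - ss_res / ss_tot
    # Ratio of performance to deviation, using the sample SD of the reference.
    rpd = stdev(reference) / rmsecv
    return r2, rmsecv, rpd
```

Under these definitions RPD is approximately 1/√(1 − R²), and the values in Table 3 are consistent with that relationship; for α-bisabolol, R² = 0.9902 gives 1/√0.0098 ≈ 10.1, close to the reported RPD of 10.3.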

Fig. 5

Calibration models of the molecules in groups A and B generated using RMMM.

These results show that the timeline for NIR calibration model development can be compressed by combining the spectral acquisition for multiple analytes, which is often time-consuming and laborious, into one set of measurements. Instead of taking individual measurements and repeating the same work for each analyte, we can combine the analytes, take one spectral measurement, and develop individual calibration models. Although we have presented results from groups of three, we have also demonstrated (data not shown) that the RMMM calibration method extends to building models for at least five analytes in a sample mixture. The two approaches described here can be used together, and in combination have the potential to reduce the cost and time required for NIR calibration model development 10-fold.
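One reading of the 10-fold figure, assuming the two savings are independent and multiply, combines the roughly 50% sample reduction from GCM with the RMMM reduction by the number of analytes k per mixture, taking k = 5 from the five-analyte mixtures mentioned above. The text does not spell out this calculation, so the worked arithmetic below is an interpretation rather than the authors' derivation; r_GCM denotes the fractional sample reduction from GCM.

```latex
F_{\text{combined}}
  = \underbrace{\frac{1}{1 - r_{\text{GCM}}}}_{\text{GCM}}
    \times \underbrace{k}_{\text{RMMM}}
  = \frac{1}{1 - 0.5} \times 5
  = 2 \times 5
  = 10
```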

Future Work

Here, we have demonstrated that the GCM and RMMM modeling approaches can reduce the time and number of samples needed to develop NIR calibration models. However, more work is needed to improve these methods and further reduce the sample and time requirements. One area for further investigation, which could lead to a greater reduction in the number of calibration samples required, is the common practice of using a single calibration model for processes with more than one phase. Fermentation offers a good example. As the process moves from the initial growth phase, where cells are primarily multiplying, to the production phase, where cells are primarily making product, the NIR spectrum can change significantly. We hypothesize that building a single model that covers two qualitatively different process phases equally well will require many more samples than building separate models for different process phases, such as the lag/log and production phases of fermentation. Splitting the model by process phase would yield phase-specific models that in total should require fewer samples to describe the entire process. Such work will help lower the cost of integrating NIR into fermentation monitoring and operations.

Conclusion

While NIR spectroscopy offers the chance to increase data information density, provide real-time information, and eliminate the sample preparation required for traditional chromatographic assays, its initial cost is steep in terms of model development time and effort. In this study, we show two ways of reducing the cost and time associated with model development. The GCM method reduced the number of samples required for model building by about 50% by combining spectra of samples of chemical siblings into one generalized model that works for all components in the group, instead of multiple individual models. The generalized terpene model built with GCM predicted farnesene in fermentors as well as, or slightly better than, a farnesene-only model. Using the RMMM method, we also showed that a single spectrum of a mixture of different analytes can be used to build a model for each analyte, reducing the number of sample measurements and the time required for spectral data acquisition by a factor equal to the number of analytes in the mixture.

Funding

The authors gratefully acknowledge the Defense Advanced Research Projects Agency (DARPA) for funding this work (Technology Investment Agreement (TIA) number HR0011-15-3-001).

References

1. Nondestructive near-infrared spectroscopic measurement of multiple analytes in undiluted samples of serum-based cell culture media.

Authors: Martin Rhiel; Michael B Cohen; David W Murhammer; Mark A Arnold
Journal: Biotechnol Bioeng Date: 2002-01-05

2. Simultaneous measurement of 19 components in serum-containing animal cell culture media by fourier transform near-infrared spectroscopy.

Authors: M R Riley; H M Crider; M E Nite; R A Garcia; J Woo; R M Wegge
Journal: Biotechnol Prog Date: 2001 Mar-Apr

3. A review of near infrared spectroscopy and chemometrics in pharmaceutical technologies.

Authors: Yves Roggo; Pascal Chalus; Lene Maurer; Carmen Lema-Martinez; Aurélie Edmond; Nadine Jent
Journal: J Pharm Biomed Anal Date: 2007-03-30

4. Use of near-infrared spectroscopy (NIRs) in the biopharmaceutical industry for real-time determination of critical process parameters and integration of advanced feedback control strategies using MIDUS control.

Authors: Lucas Vann; John Sheppard
Journal: J Ind Microbiol Biotechnol Date: 2017-10-25

5. A new biometrical procedure for testing the equality of measurements from two different analytical methods. Application of linear regression procedures for method comparison studies in clinical chemistry, Part I.

Authors: H Passing; W Bablok
Journal: J Clin Chem Clin Biochem Date: 1983-11

6. Analysis of moisture, oil, and fatty acid composition of olives by near-infrared spectroscopy: development and validation calibration models.

Authors: Uttam Saha; Daniel Jackson
Journal: J Sci Food Agric Date: 2017-10-12

7. Combining near infrared spectroscopy and multivariate analysis as a tool to differentiate different strains of Saccharomyces cerevisiae: a metabolomic study.

Authors: D Cozzolino; L Flood; J Bellon; M Gishen; M De Barros Lopes
Journal: Yeast Date: 2006 Oct-Nov