Ian Walsh, Matthew Myint, Terry Nguyen-Khuong, Ying Swan Ho, Say Kong Ng, Meiyappan Lakshmanan.
Abstract
Ensuring consistently high yields and product quality is a key challenge in biomanufacturing. Even minor deviations in critical process parameters (CPPs) such as media and feed compositions can significantly affect product critical quality attributes (CQAs). To identify CPPs and their interdependencies with product yield and CQAs, design of experiments and multivariate statistical approaches are typically used in industry. Although these models can predict the effect of CPPs on product yield, there is room to improve CQA prediction performance by capturing the complex relationships in high-dimensional data. In this regard, machine learning (ML) approaches offer immense potential in handling non-linear datasets and are thus able to identify new CPPs that could effectively predict the CQAs. ML techniques can also be synergized with mechanistic models as a 'hybrid ML' or 'white box ML' to identify how CPPs affect the product yield and quality mechanistically, thus enabling rational design and control of the bioprocess. In this review, we describe the role of statistical modeling in Quality by Design (QbD) for biomanufacturing, and provide a generic outline of how relevant ML can be used to meaningfully analyze bioprocessing datasets. We then offer our perspectives on how relevant use of ML can accelerate the implementation of systematic QbD within the biopharma 4.0 paradigm.
Keywords: Biomanufacturing; Multivariate data analysis (MVDA); Quality by Design (QbD); hybrid modeling; machine learning (ML); upstream bioprocess design
Year: 2022 PMID: 35000555 PMCID: PMC8744891 DOI: 10.1080/19420862.2021.2013593
Source DB: PubMed Journal: MAbs ISSN: 1942-0862 Impact factor: 5.857
Figure 1. Historical trends of QbD in biomanufacturing. A comprehensive literature survey was performed focussing on studies which employed QbD in upstream bioprocessing to analyze the CPP, titer and CQA inter-relationships. (a) Conventional QbD framework: Selected process parameters are varied within a range as guided by DoE and the corresponding variations in titer and CQAs are measured, and the CPP – titer/CQA interrelationship is analyzed using statistical approaches. (b) Historical trends in the focus of process outputs and the statistical methods used to establish mathematical models.
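As a rough illustration of the step in panel (a) where process parameters are varied within a range, the sketch below enumerates a full factorial design, which is only one of many possible DoE layouts. The CPP names, levels, and the `full_factorial` helper are hypothetical and are not taken from the review.

```python
from itertools import product

# Hypothetical CPPs and levels; names and values are illustrative only.
cpp_levels = {
    "glucose_feed_g_per_L": [2.0, 4.0, 6.0],
    "temperature_C": [35.0, 37.0],
    "pH": [6.9, 7.1],
}


def full_factorial(levels):
    """Return one dict per run, covering every combination of the levels."""
    names = list(levels)
    combos = product(*(levels[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


design = full_factorial(cpp_levels)
print(len(design))  # 3 * 2 * 2 = 12 runs
print(design[0])    # first run: lowest level of every CPP
```

Each run in `design` would then be executed experimentally, and the measured titer and CQAs would be related back to the CPP settings with a statistical model.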
The pros and cons of MVDA and machine learning
| Method | Pros | Cons |
|---|---|---|
| MVDA | Simple to set up, with excellent computational tools with graphical user interfaces readily available; fast to optimize; models are often understandable linear equations; suitable when the number of CPPs is small; useful for data visualization (2D and 3D) | Linear equation-based algorithms such as PCA/PLSR can lose information; cannot model complex relationships between CPPs and CQAs when the data are noisy and involve non-linear relationships |
| ML | Can capture complex relationships/functions, including non-linear relationships, that may model the underlying process more effectively; can handle very large datasets obtained from different sources, e.g., multi-omics; ML feature selection algorithms can find novel levers/CPPs in high-dimensional data | Large amounts of data are usually required for efficient model training; often slow to optimize and may need high computational power; complicated to set up and can therefore often be incorrectly designed |
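The contrast in the table between linear models and methods that capture non-linear relationships can be seen on a toy problem. The sketch below uses synthetic data in which a hypothetical CQA peaks at an intermediate CPP value. A straight-line fit stands in for the linear MVDA-style model, and a k-nearest-neighbour regressor stands in for a non-linear ML method. Both stand-ins and all numbers are assumptions chosen for illustration, not methods or data from the review.

```python
import random

random.seed(0)

# Synthetic data: a hypothetical CQA that peaks at an intermediate CPP value.
xs = [i / 49 for i in range(50)]
ys = [1.0 - 4.0 * (x - 0.5) ** 2 + random.gauss(0, 0.02) for x in xs]

# Alternate points between training and test sets.
points = list(zip(xs, ys))
train = points[0::2]
test = points[1::2]


def fit_line(data):
    """Ordinary least-squares straight line (the linear stand-in)."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    slope = sxy / sxx
    return lambda x: mean_y + slope * (x - mean_x)


def fit_knn(data, k=3):
    """k-nearest-neighbour regression (the non-linear stand-in)."""
    def predict(x):
        nearest = sorted(data, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k
    return predict


def rmse(model, data):
    """Root-mean-square error of a model on (x, y) pairs."""
    return (sum((model(x) - y) ** 2 for x, y in data) / len(data)) ** 0.5


linear_rmse = rmse(fit_line(train), test)
knn_rmse = rmse(fit_knn(train), test)
print(f"linear RMSE: {linear_rmse:.3f}, kNN RMSE: {knn_rmse:.3f}")
```

On this curved relationship the straight line cannot follow the peak, so its test error is larger than that of the non-linear model. With few, nearly linear data points the ranking could easily reverse, which is the trade-off the table describes.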
Figure 2. Example ML application for simulating CQAs using different feeding and physiochemical variables. (a) The training and testing strategy to develop the final feeding and physiochemical model. The full dataset of M fed-batch cultures is split into a training set used for model optimization and a testing set used to evaluate the performance of the model. The final model is the one that performs best on the test set. (b) If the model performance in (a) is acceptable, the model can be used to simulate which media components and physiochemical variables yield a desired CQA prediction. In this example, “s” simulations are performed, with simulation i showing the closest match to the desired CQA. A final validation of the model can be done by using the CPPs of simulation i in the fed-batch process to confirm experimentally that the desired CQA was achieved.
Figure 3. Hypothetical ML models and their multiple applications in QbD. The tables show all CPPs and CQAs found in the literature, and whether each CQA is a regression or classification ML problem. Real-time models are trained to predict the output variables on the same day (or at the same moment) the input variables are collected. Forecasting involves using data from the current and previous days to predict a future day's CQA.
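The difference between real-time and forecasting models comes down to how inputs and outputs are paired in the training set. The sketch below shows one possible way to build those pairs from daily series. The `make_pairs` helper, its `horizon` and `history` parameters, and the measurement values are hypothetical.

```python
def make_pairs(inputs, outputs, horizon=0, history=1):
    """Pair input windows with outputs from two equally long daily series.

    horizon=0 gives real-time pairs (same-day output);
    horizon>0 gives forecasting pairs (output `horizon` days ahead).
    history is the number of days of inputs (current and previous) per window.
    """
    pairs = []
    for day in range(history - 1, len(inputs) - horizon):
        window = inputs[day - history + 1 : day + 1]
        pairs.append((window, outputs[day + horizon]))
    return pairs


# Made-up daily measurements for a single culture.
cpp_series = [5.0, 4.2, 3.1, 2.5, 2.0, 1.4]
cqa_series = [0.1, 0.3, 0.7, 1.2, 1.8, 2.3]

# Real-time: today's input predicts today's CQA.
real_time = make_pairs(cpp_series, cqa_series, horizon=0, history=1)

# Forecasting: the last three days of inputs predict the CQA two days ahead.
forecast = make_pairs(cpp_series, cqa_series, horizon=2, history=3)

print(real_time[0])  # ([5.0], 0.1)
print(forecast)      # [([5.0, 4.2, 3.1], 1.8), ([4.2, 3.1, 2.5], 2.3)]
```

Either list of pairs can then be passed to a regression or classification model, depending on the type of CQA.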
| Abbreviation | Definition |
|---|---|
| ANNs | Artificial neural networks |
| CPPs | Critical process parameters |
| CQAs | Critical quality attributes |
| dO2 | Dissolved oxygen |
| DoE | Design of experiments |
| EMA | European Medicines Agency |
| ET | Ensemble trees |
| FDA | Food and Drug Administration |
| GPR | Gaussian process regression |
| IoT | Internet of Things |
| mAbs | Monoclonal antibodies |
| ML | Machine learning |
| MVDA | Multivariate data analysis |
| NGRK | Nonparametric regression with Gaussian kernel |
| PAT | Process analytical technologies |
| PCA | Principal component analysis |
| PCs | Principal components |
| PLSR | Partial least squares regression |
| QbD | Quality by Design |
| RBF-ANN | Radial basis function neural network |
| RF | Random forests |
| RT | Regression trees |
| SVM | Support vector machines |