Sara Sousa Rosa1,2, Davide Nunes3, Luis Antunes3, Duarte M F Prazeres1,2, Marco P C Marques4, Ana M Azevedo1,2.
Abstract
Messenger RNA (mRNA) vaccines are a new alternative to conventional vaccines with a prominent role in infectious disease control. These vaccines are produced in in vitro transcription (IVT) reactions, catalyzed by RNA polymerase in cascade reactions. To ensure an efficient and cost-effective manufacturing process, which is essential for large-scale production and an effective vaccine supply chain, the IVT reaction needs to be optimized. IVT is a complex reaction with a large number of variables that can affect its outcome. Traditional optimization relies on classic Design of Experiments methods, which are time-consuming and can introduce human bias or rest on simplifying assumptions. In this contribution, we propose the use of Machine Learning approaches to perform a data-driven optimization of an mRNA IVT reaction. A Bayesian optimization method and model interpretability techniques were used to automate experiment design, providing a feedback loop. In under 60 optimization runs, IVT reaction conditions were found that produced 12 g · L−1 of mRNA in only 2 h. The results obtained outperform published industry standards and data reported in the literature in terms of both achievable reaction yield and reduction of production time. Furthermore, this shows the potential of Bayesian optimization as a cost-effective optimization tool within (bio)chemical applications.
Keywords: Bayesian optimization; in vitro transcription; mRNA; machine learning; vaccines
Year: 2022 PMID: 36017534 PMCID: PMC9539360 DOI: 10.1002/bit.28216
Source DB: PubMed Journal: Biotechnol Bioeng ISSN: 0006-3592 Impact factor: 4.395
In vitro transcription (IVT) reaction parameters and evaluation metric (see also Figure 1c)
| Name | Units | Type | Domain/range |
|---|---|---|---|
| Cofactor | Cofactor choice | Categorical | MgAcetate, MgCl2 |
| Cofactor concentration | mM | Real number | [0, 100] |
| DTT | mM | Real number | |
| RNase inhibitor | U ml−1 | Integer | |
| NTPs | mM | Real number | |
| DNA template | n | Integer | [10, 100] |
| Inorganic pyrophosphatase | U ml−1 | Integer | [0, 10] |
| Spermidine | mM | Real number | [0, 10] |
| T7 RNA polymerase | U ml−1 | Integer | [1000, 50,000] |
| Temperature | °C | Integer | [20,50] |
| Reaction time | min | Integer | [10, 300] |
| pH | — | Real number | [6.5,8] |
| Evaluation | g mRNA L−1 | Integer | |
Abbreviations: DTT, dithiothreitol; mRNA, messenger RNA; NTP, nucleoside triphosphate.
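The parameter table above can be read as a mixed-type search space. The following is a minimal sketch of how such a space might be encoded and sampled at random, for instance to seed an optimizer with initial experiments. The dictionary keys, the helper names `sample_configuration` and `is_valid`, and the use of uniform sampling are assumptions made for illustration; they are not taken from the article. Parameters whose domain is missing from the table (DTT, RNase inhibitor, NTPs) are left out rather than guessed.

```python
import random

# Search space transcribed from the table above.
# Hypothetical key names; DTT, RNase inhibitor and NTPs are omitted because
# their domains are not given in the table.
SEARCH_SPACE = {
    "cofactor": ("categorical", ["MgAcetate", "MgCl2"]),
    "cofactor_mM": ("real", (0.0, 100.0)),
    "dna_template": ("integer", (10, 100)),
    "pyrophosphatase_U_per_mL": ("integer", (0, 10)),
    "spermidine_mM": ("real", (0.0, 10.0)),
    "t7_rnap_U_per_mL": ("integer", (1000, 50000)),
    "temperature_C": ("integer", (20, 50)),
    "reaction_time_min": ("integer", (10, 300)),
    "pH": ("real", (6.5, 8.0)),
}


def sample_configuration(rng):
    """Draw one random configuration that respects each type and domain."""
    config = {}
    for name, (kind, domain) in SEARCH_SPACE.items():
        if kind == "categorical":
            config[name] = rng.choice(domain)
        elif kind == "integer":
            low, high = domain
            config[name] = rng.randint(low, high)  # inclusive on both ends
        else:
            low, high = domain
            config[name] = rng.uniform(low, high)
    return config


def is_valid(config):
    """Check that every value lies inside its declared domain."""
    for name, (kind, domain) in SEARCH_SPACE.items():
        value = config[name]
        if kind == "categorical":
            if value not in domain:
                return False
        else:
            low, high = domain
            if not (low <= value <= high):
                return False
            if kind == "integer" and not isinstance(value, int):
                return False
    return True


rng = random.Random(0)  # fixed seed so the draw is reproducible
example = sample_configuration(rng)
print(example)
```

Keeping the type next to the bounds matters because integer and categorical parameters cannot be treated as continuous values without rounding or encoding.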
mRNA production reaction parameters and mRNA concentration for highest production conditions after Bayesian optimization (Reactions 1–6) compared to the benchmark reaction (Bancel et al., 2016) condition (Reaction 7)
| Reaction | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Cofactor C (mM) | 60.00 | 48.46 | 41.79 | 59.87 | 49.28 | 40.00 | 40.00 |
| DTT (mM) | 7.09 | 3.85 | 5.57 | 9.85 | 5.27 | 3.99 | 5.00 |
| RNase I (U mL−1) | 829 | 1072 | 1217 | 986 | 1474 | 1045 | 1000 |
| NTPs (mM) | 8.57 | 8.81 | 9.89 | 8.50 | 7.75 | 9.29 | 7.50 |
| DNA template (n | 61 | 100 | 89 | 100 | 89 | 72 | 40 |
| Ppase (U mL−1) | 10 | 9 | 5 | 2 | 8 | 7 | 1 |
| Spermidine (mM) | 2.65 | 1.35 | 2.24 | 1.31 | 2.25 | 2.03 | 1.00 |
| T7 RNAP (U mL−1) | 7346 | 7320 | 6607 | 6166 | 7743 | 7748 | 7000 |
| Temperature (°C) | 43 | 39 | 44 | 40 | 44 | 44 | 37 |
| Time (min) | 263 | 98 | 120 | 148 | 121 | 279 | 240 |
| pH | 6.89 | 6.80 | 6.65 | 6.78 | 6.67 | 6.60 | 8.00 |
| mRNA C (g · L−1) | 12.61 ± 0.82 | 10.76 ± 0.47 | 11.76 ± 0.66 | 12.27 ± 0.77 | 12.18 ± 0.98 | 11.52 ± 0.23 | 7.64 ± 0.87 |
Abbreviations: DTT, dithiothreitol; mRNA, messenger RNA; NTP, nucleoside triphosphate.
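As a worked reading of the table above, the sketch below computes two derived quantities for each reaction: the yield relative to the benchmark (Reaction 7) and a volumetric productivity in g · L−1 · h−1. It assumes that the listed mRNA concentration was reached at the listed reaction time, which the table suggests but does not state outright. The function names are hypothetical.

```python
# Values copied from the table above: reaction id -> (time in min, mRNA in g/L).
REACTIONS = {
    1: (263, 12.61),
    2: (98, 10.76),
    3: (120, 11.76),
    4: (148, 12.27),
    5: (121, 12.18),
    6: (279, 11.52),
    7: (240, 7.64),  # benchmark reaction
}
BENCHMARK_ID = 7


def fold_over_benchmark(reaction_id):
    """mRNA concentration divided by the benchmark concentration."""
    return REACTIONS[reaction_id][1] / REACTIONS[BENCHMARK_ID][1]


def productivity_g_per_L_h(reaction_id):
    """Concentration divided by reaction time in hours (assumed endpoint value)."""
    minutes, concentration = REACTIONS[reaction_id]
    return concentration / (minutes / 60.0)


for rid in sorted(REACTIONS):
    print(
        f"Reaction {rid}: "
        f"{fold_over_benchmark(rid):.2f}x benchmark, "
        f"{productivity_g_per_L_h(rid):.2f} g/L/h"
    )
```

Under this assumption, the reaction with the highest final concentration is not the one with the highest productivity, because the shorter reactions reach comparable concentrations in roughly half the time.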
Figure 1. Bayesian optimization of messenger RNA (mRNA) in vitro transcription (IVT) reaction. (a) Bayesian optimization workflow. (b) One‐dimensional example of Bayesian optimization process using a Gaussian process surrogate model and corresponding acquisition function, maximized to select the next set of parameters to be tested. The surrogate model is plotted as the posterior mean, with the shaded region representing a posterior distribution uncertainty of 2 units. (c) All parameter configurations for all the IVT experimental runs along with their respective evaluation in mRNA concentration (g · L−1). (d) Convergence plot depicting the best evaluation throughout all the IVT experimental runs, and convergence to the optimum. (e) Feature importance summary computed from the average SHapley Additive exPlanation (SHAP) values (Lundberg & Lee, 2017) computed for the Gaussian Process regressor predictions across all IVT experimental data. (f) Impact of feature value in model prediction value for the Gaussian Process regressor used as surrogate model in the Bayesian optimization process.
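The one-dimensional process described in panel (b) can be sketched in a few dozen lines. The example below is an illustration under stated assumptions, not the authors' implementation: the squared-exponential kernel, its length scale, the upper-confidence-bound acquisition (posterior mean plus 2 standard deviations), the candidate grid, and the toy objective standing in for a measured yield are all choices made here for the sake of a runnable example.

```python
import math


def rbf(a, b, length_scale=0.15):
    """Squared-exponential kernel with unit amplitude (assumed kernel)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(A):
    """Lower-triangular L with L L^T = A, for a small positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_sub(L, b):
    """Solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def backward_sub(L, y):
    """Solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def fit_gp(xs, ys, noise=1e-4):
    """Condition a zero-mean Gaussian process on mean-centred observations."""
    y_mean = sum(ys) / len(ys)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = backward_sub(L, forward_sub(L, [y - y_mean for y in ys]))
    return L, alpha, y_mean


def predict(model, xs, x_new):
    """Posterior mean and standard deviation of the surrogate at x_new."""
    L, alpha, y_mean = model
    k_star = [rbf(x_new, a) for a in xs]
    mean = y_mean + sum(k * a for k, a in zip(k_star, alpha))
    v = forward_sub(L, k_star)
    variance = max(1.0 - sum(vi * vi for vi in v), 0.0)
    return mean, math.sqrt(variance)


def toy_objective(x):
    """Hypothetical stand-in for a measured yield; its maximum is at x = 0.7."""
    return -(x - 0.7) ** 2


def bayes_opt(objective, n_iter=10, kappa=2.0):
    """Fit surrogate, maximize the acquisition, evaluate, and repeat."""
    grid = [i / 100 for i in range(101)]
    xs = [grid[0], grid[50], grid[100]]  # initial design
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        model = fit_gp(xs, ys)

        def ucb(x):
            mean, std = predict(model, xs, x)
            return mean + kappa * std  # upper confidence bound

        candidates = [x for x in grid if x not in xs]
        x_next = max(candidates, key=ucb)
        xs.append(x_next)
        ys.append(objective(x_next))
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best], xs


best_x, best_y, sampled = bayes_opt(toy_objective)
print(best_x, best_y)
```

In a laboratory setting the call to `objective` would be replaced by running the experiment and measuring the mRNA concentration, which is why each iteration proposes only one new condition.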
Figure 2. Messenger RNA (mRNA) production analysis. (a) mRNA production profile of the six best runs obtained by Bayesian optimization and the benchmark reaction 19 for a time course of 5 h. Error bars represent standard deviation obtained for each point. A second order polynomial function was used as a trendline for visualization purposes. (b) Percentage of mRNA produced as a function of reaction time in minutes, taking the highest mRNA concentration produced in each set of runs as 100%. Error bars represent standard deviation obtained for each point. A second order polynomial function was used as a trendline for visualization purposes. (c) Agarose gel electrophoresis analysis of the reaction mixture at the defined setpoints for the best production run (Run 5). (d) mRNA production profile for a time course of 5 h using the best run parameters (Run 5) and templates of different sizes (EGFP, 1195 bp; RBD_EGFP, 1864 bp; Cas9_EGFP, 5299 bp). Error bars represent standard deviation obtained for each point. A second order polynomial function was used as a trendline for visualization purposes. (e) mRNA production using the parameters of Run 5 and 2 h of reaction time with the three different size templates (EGFP, RBD_EGFP, Cas9_EGFP). (f) mRNA concentration produced using the EGFP template for the following runs: optimized (Run 5 and 2 h of reaction time); Moderna, Inc. (Bancel et al., 2016) and Curevac N.V. (Wochner et al., 2021) patent conditions; and literature conditions (Henderson et al., 2021).
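The two transformations mentioned in the caption above, a second order polynomial trendline and normalization to the highest concentration in a series, can be sketched as follows. This is an illustrative least-squares implementation with hypothetical function names; the time-course measurements themselves are not part of this record, so the example fits synthetic points generated from a known quadratic instead of real data.

```python
def fit_quadratic(ts, ys):
    """Least-squares fit of y = a + b*t + c*t**2 via the normal equations."""
    # Power sums give the 3x3 matrix X^T X for the columns 1, t, t^2.
    power_sums = [sum(t ** p for t in ts) for p in range(5)]
    rows = [[power_sums[i + j] for j in range(3)] for i in range(3)]
    rhs = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(3)]
    aug = [row + [r] for row, r in zip(rows, rhs)]

    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            factor = aug[r][col] / aug[col][col]
            aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]

    # Back substitution.
    beta = [0.0] * 3
    for i in reversed(range(3)):
        tail = sum(aug[i][j] * beta[j] for j in range(i + 1, 3))
        beta[i] = (aug[i][3] - tail) / aug[i][i]
    return beta  # [a, b, c]


def percent_of_max(ys):
    """Express each value as a percentage of the largest value in the series."""
    top = max(ys)
    return [100.0 * y / top for y in ys]


# Synthetic check: points generated from y = 1 + 2t - 0.25t^2 (t in hours).
hours = [0, 1, 2, 3, 4, 5]
values = [1 + 2 * t - 0.25 * t ** 2 for t in hours]
a, b, c = fit_quadratic(hours, values)
print(a, b, c)
print(percent_of_max(values))
```

Expressing time in hours rather than minutes keeps the power sums small, which helps the conditioning of the normal equations.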