Zhenpeng Zhou1, Xiaocheng Li2, Richard N Zare1. 1. Department of Chemistry, Stanford University, Stanford, California 94305, United States. 2. Department of Management Science and Engineering, Stanford University, Stanford, California 94305, United States.
Abstract
Deep reinforcement learning was employed to optimize chemical reactions. Our model iteratively records the results of a chemical reaction and chooses new experimental conditions to improve the reaction outcome. This model outperformed a state-of-the-art blackbox optimization algorithm by using 71% fewer steps on both simulations and real reactions. Furthermore, we introduced an efficient exploration strategy by drawing the reaction conditions from certain probability distributions, which resulted in an improvement on regret from 0.062 to 0.039 compared with a deterministic policy. Combining the efficient exploration policy with accelerated microdroplet reactions, optimal reaction conditions were determined in 30 min for the four reactions considered, and a better understanding of the factors that control microdroplet reactions was reached. Moreover, our model showed a better performance after training on reactions with similar or even dissimilar underlying mechanisms, which demonstrates its learning ability.
Unoptimized
chemical reactions are costly and inefficient in regard to time and
reagents. A common way for chemists to optimize reactions is to change
a single experimental condition at a time while fixing all the others.[1] This approach will often miss the optimal conditions.
Another way is to search exhaustively all combinations of reaction
conditions via batch chemistry. Although this approach has a better
chance of finding the globally optimal conditions, it is time-consuming
and expensive. An efficient and effective framework to optimize chemical
reactions will be of great importance for both academic research and
industrial production. We present here one potential approach to achieving
this goal.

There have been various attempts to use automated
algorithms to optimize chemical reactions.[2] Jensen and co-workers utilized the simplex method to optimize reactions
in microreactors.[1,3] Poliakoff and co-workers constructed
a system with what they called a stable noisy optimization by branch
and fit (SNOBFIT) algorithm to optimize reactions in supercritical
carbon dioxide.[4] Jensen and co-workers
optimized the Suzuki–Miyaura reaction, which involves discrete
variables, by automated feedback.[5] There
are also numerous studies on optimizing a chemical reaction in flow
reactors.[6] Truchet and co-workers optimized
the Heck–Matsuda reaction with a modified version of the Nelder–Mead
method.[7] Lapkin and co-workers developed
a model-based design of experiments and self-optimization approach
in flow.[8] Ley and co-workers built a novel
Internet-based system for reaction monitoring and optimization.[9] Bourne and co-workers developed automated continuous
reactors, which use high performance liquid chromatography (HPLC)[10] or online mass spectrometry (MS)[11] for reaction monitoring and optimization. deMello
and co-workers designed a microfluidic reactor for controlled synthesis
of fluorescent nanoparticles.[12] Cronin
and co-workers provided a flow-NMR platform for monitoring and optimizing
chemical reactions.[13] We are going to suggest
a different approach that we believe will further improve the reaction
optimization process.

Recently, the idea of machine learning
and artificial intelligence[14,15] has produced surprising
results in the field of theoretical and computational chemistry. Aspuru-Guzik
and co-workers designed graph convolutional neural networks for molecules,[16] and realized automatic chemical design with
data-driven approaches.[17−19] One step further, Pande and co-workers
extended the idea of graph convolution,[20] and proposed one-shot learning for drug discovery.[21] Meanwhile, both the Aspuru-Guzik group and the Jensen group
approached the prediction of organic reactions from a machine learning perspective.[22,23] In addition, machine learning approaches have succeeded in making predictions directly from experimental data.
Norquist and co-workers predicted the reaction outcome from failed
experiments with the help of a support vector machine.[24] Sebastian and co-workers utilized neural networks
to identify skin cancers,[25] and Zare and co-workers
applied machine learning/statistics on mass spectrometry data to determine
cancer states[26] and identify personal information.[27] Inspired by all the current successes achieved
for prediction, we have applied the decision-making framework to problems
in chemistry, specifically chemical reactions.

We developed
a model we call the Deep Reaction Optimizer (DRO) to guide the interactive
decision-making procedure in optimizing reactions by combining state-of-the-art
deep reinforcement learning with the domain knowledge of chemistry.
Iteratively, the DRO interacts with chemical reactions to obtain the
current experimental condition and yield, and determines the next
experimental condition to attempt. In this way, DRO not only served
as an efficient and effective reaction optimizer but also provided
us with a better understanding of the mechanism of chemical reactions than
that obtained using traditional approaches. With extensive experiments
on simulated reactions, our method outperformed several state-of-the-art
blackbox optimization algorithms, namely the covariance matrix adaptation evolution
strategy (CMA-ES),[28] the Nelder–Mead
simplex method,[29] and stable noisy optimization
by branch and fit (SNOBFIT)[30] by using
71% fewer steps. We also demonstrated that DRO applied to four real
microdroplet reactions found the optimal experimental conditions within
30 min, owing its success to the acceleration of reaction rates in
microdroplet chemistry[31,32] and to its efficient optimization
algorithm. Moreover, our model achieved better performance after running
on real reactions, showing its capability to learn from past experience.

In addition, our model showed strong generalizability in two ways:
First, based on optimization of a large family of functions, our optimization
goal can be not only yield but also selectivity, purity, or cost,
because all of them can be modeled by a function of experimental parameters.
Second, a wide range of reactions can be accelerated by 10^3 to 10^6 times in microdroplets.[33] Having shown that a microdroplet reaction can be optimized in 30 min by the DRO, we propose that a large class of reactions can be optimized by our model. The wide applicability of our model
suggests it to be useful in both academic research and industrial
production.
Method
Optimization of Chemical Reactions
A reaction can be
viewed as a system taking multiple inputs (experimental conditions)
and providing one desired output. Example inputs include temperature,
solvent composition, pH, catalyst, and time. Example outputs include
product yield, selectivity, purity, and cost. The reaction can be
modeled by a function r = R(s), where s stands for the experimental conditions and r denotes the objective, say, the yield. The function R describes
how the experimental conditions affect r. Reaction optimization refers to the procedure for searching
the combination of experimental conditions that achieves the objective
in an optimal manner, desirably with the least number of steps.

In general, chemical reactions are expensive and time-consuming
to conduct, and the outcome can vary considerably, in part because of measurement errors. Motivated by these considerations, we developed
our model of Deep Reaction Optimizer (DRO) with the help of reinforcement
learning.
Deep Reaction Optimizer by Reinforcement Learning
Reinforcement
learning is an area of machine learning concerned with how the “decision-maker(s)”
ought to take sequential “actions” in a prescribed “environment”
so as to maximize a notion of cumulative “reward”. In
the context of reaction optimization, where the reaction is the environment,
an algorithm or person (decision-maker) decides what experimental
conditions to try (actions), in order to achieve the highest yield
(reward).

Mathematically, the underlying model for reinforcement learning is the Markov decision process characterized by the tuple (S, A, P, R), where:

S denotes the set of states s. In the context of reaction optimization, S is the set of all possible combinations of experimental conditions.

A denotes the set of actions a. In the context of reaction optimization, A is the set of all changes that can be made to the experimental conditions, for example, increasing the temperature by 10 °C and so forth.

P denotes the state transition probabilities. Concretely, P(s′|s, a) specifies the probability of transitioning from a state s to another state s′ under action a. In the context of a chemical reaction, P specifies to what experimental conditions the reaction will move if we decide to make a change a to the experimental condition s. Intuitively, P captures the inaccuracy in operating the instrument. For example, the action of increasing the temperature by 10 °C may result in a temperature increase of 9.5–10.5 °C.

R denotes the reward function of state s and action a. In the environment of a reaction, the reward r is only a function of the state s, i.e., a certain experimental condition (state) s is mapped to the yield r (reward) by the reward function r = R(s).

The core of reinforcement learning is to search for an optimal
policy, which captures the mechanism of decision-making. In the context
of chemical reactions, the policy refers to the algorithm that interacts
with the chemical reaction to obtain the current reaction condition
and reaction yield, from which the next experimental conditions are
chosen. Rigorously, we define the policy as the function π,
which maps the current experimental condition s_t and the history of the experiment record h_t to the next experimental condition, that is,

s_{t+1} = π(s_t, h_t)

where h_t is the history and t records the number of steps we have taken in reaction optimization.

Intuitively, the optimization procedure can be
explained as follows: We iteratively conduct an experiment under a
specific experimental condition and record the yield. Then the policy
function makes use of all the history of experimental record (what
condition led to what yield) and tells us what experimental condition
we should try next. This procedure is described in Algorithm 1 and illustrated in Scheme 1.
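As an illustration of this loop (Algorithm 1), a minimal Python sketch is given below. The reaction callable and the policy object (with its n_params, initial_state, and step members) are hypothetical placeholders for the experiment interface and the trained policy, not the released implementation.

```python
import numpy as np

def optimize_reaction(reaction, policy, n_steps=50):
    """Sketch of the DRO loop: run the reaction, record the yield, and let the
    policy propose the next experimental conditions from the observed history."""
    s = np.random.uniform(0.0, 1.0, size=policy.n_params)  # random initial conditions
    hidden = policy.initial_state()                         # hidden state summarizes the history
    history = []
    for t in range(n_steps):
        r = reaction(s)                                     # conduct the experiment, observe the yield
        history.append((s.copy(), r))
        s, hidden = policy.step(s, r, hidden)               # choose the next conditions to try
    best_conditions, best_yield = max(history, key=lambda pair: pair[1])
    return best_conditions, best_yield, history
```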
Scheme 1
Visualization of
the DRO Model Unrolled over Three Time Steps
As stated earlier, the environment of the chemical reaction is characterized by the reaction function r = R(s).
Recurrent Neural Network as the Policy Function
Our DRO model employs the recurrent neural network (RNN) to fit
the policy function π under the settings of chemical reactions.
A recurrent neural network is a nonlinear neural network architecture
in machine learning. An RNN parametrized by θ usually takes the form

h_{t+1}, x_{t+1} = RNN_θ(h_t, x_t)

where x_t and x_{t+1} are the data at time steps t and t + 1, and h_t and h_{t+1} refer to the hidden state at time steps t and t + 1. The hidden state contains
the history information that RNN passes to the next time step, enabling
the network to remember past events, which can be used to interpret
new inputs. This property makes RNN suitable as the policy for making
decisions, which takes a similar form to the policy function defined above. We use a modified version of the RNN to model the policy function:

h_{t+1}, s_{t+1} = RNN_θ(h_t, s_t, r_t)

where, at time step t, h_t is the hidden state that models the history of the experiment record, s_t denotes the state of the reaction condition, and r_t is the yield (reward) of the reaction outcome. The policy RNN maps the inputs at time step t to outputs at time step t + 1. The model is illustrated in Scheme 1.
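A minimal NumPy sketch of such a policy step is shown below, with a single tanh recurrence standing in for the modified LSTM cell described under Experimental Details; the weight shapes and the sigmoid squashing of the output into the unit interval are illustrative assumptions. The class matches the policy interface assumed in the loop sketch above.

```python
import numpy as np

class RNNPolicy:
    """Illustrative RNN policy: maps (current condition s_t, yield r_t, hidden state h_t)
    to (next condition s_{t+1}, new hidden state h_{t+1})."""
    def __init__(self, n_params, hidden_size=80, seed=0):
        rng = np.random.default_rng(seed)
        self.n_params = n_params
        self.hidden_size = hidden_size
        in_dim = hidden_size + n_params + 1                 # concatenated [h_t, s_t, r_t]
        self.W_h = rng.normal(0.0, 0.1, (hidden_size, in_dim))
        self.b_h = np.zeros(hidden_size)
        self.W_s = rng.normal(0.0, 0.1, (n_params, hidden_size))
        self.b_s = np.zeros(n_params)

    def initial_state(self):
        return np.zeros(self.hidden_size)

    def step(self, s, r, h):
        x = np.concatenate([h, s, [r]])
        h_next = np.tanh(self.W_h @ x + self.b_h)           # update the history summary
        s_next = 1.0 / (1.0 + np.exp(-(self.W_s @ h_next + self.b_s)))  # next condition in [0, 1]
        return s_next, h_next
```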
Training the
DRO
The objective in reinforcement learning is to maximize
the reward by taking actions over time. Under the settings of reaction
optimization, our goal is to find the optimal reaction condition with
the least number of steps. Then, our loss function l(θ) for the RNN parameters θ is defined as

l(θ) = −Σ_{t=1}^{T} (r_t − max_{i<t} r_i)

where T is the time horizon (total number of steps) and r_t is the reward at time step t. The term inside the parentheses measures the improvement
we can achieve by iteratively conducting different experiments. The
loss function above
encourages reaching the optimal condition faster, in order to address
the problem that chemical reactions are expensive and time-consuming
to conduct.

The loss function l(θ) is
minimized with respect to the RNN parameters θ by an algorithm
of gradient descent, which computes the gradient of the loss function
∇θl(θ), and updates
the parameter of θ by the rule θ ← θ –
η∇θl(θ), where
η is the step size.
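To make the objective concrete, the sketch below evaluates this loss for one unrolled trajectory of recorded yields; the exact weighting is our reading of the equation above (the discounted variant with γ appears under Experimental Details), not the authors' released code.

```python
import numpy as np

def dro_loss(rewards, gamma=1.0):
    """Negative cumulative improvement over a trajectory of yields r_1, ..., r_T.
    Each step contributes r_t - max_{i<t} r_i; gamma < 1 weights late steps more."""
    rewards = np.asarray(rewards, dtype=float)
    T = len(rewards)
    loss, best_so_far = 0.0, rewards[0]
    for t in range(1, T):
        improvement = rewards[t] - best_so_far              # r_t - max_{i<t} r_i
        loss -= gamma ** (T - 1 - t) * improvement
        best_so_far = max(best_so_far, rewards[t])
    return loss

# Training minimizes this loss with respect to the RNN parameters theta by gradient
# descent (theta <- theta - eta * grad l(theta)), with gradients obtained by
# backpropagation through the unrolled trajectory in a framework such as TensorFlow.
```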
Results and Discussion
Pretraining
on Simulated Reactions
As mentioned earlier, chemical reactions
are time-consuming to evaluate. Although our DRO model can greatly
accelerate the procedure, we still propose to first train the model
on simulated reactions (a technique called pretraining in machine
learning). A class of nonconvex “mixture Gaussian density functions”
is used as the simulated reaction environment r = R(s). The nonconvex functions
could have multiple local minima.

The motivation for using a
mixture of Gaussian density functions comes from the idea that they
can be used to approximate arbitrarily closely all continuous functions
on a bounded set. We assume that the response surface for most reactions
is a continuous function, which can be well approximated by a mixture
of Gaussian density functions. In addition, a mixture of Gaussian density
functions often has multiple local minima. The rationale behind this
is that the response surface of a chemical reaction may also have
multiple local optima. As a result, we believe a mixture of Gaussian
density functions can be a good class of function to simulate real
reactions.

We compared our DRO model with several state-of-the-art
blackbox optimization algorithms: the covariance matrix adaptation evolution
strategy (CMA-ES), Nelder–Mead simplex method, and stable noisy
optimization by branch and fit (SNOBFIT) on another set of mixture
Gaussian density functions that are unseen during training. This comparison
is a classic approach for model evaluation in machine learning. We
use “regret” to evaluate the performance of the models.
The regret at step t is defined as

regret_t = r* − r_t

where r* is the largest reward that is possible and r_t is the current reward; the regret thus measures the gap between the current reward and the largest reward that is possible. Lower regret indicates better optimization. In the context of simulated reactions, the functions are randomly generated and we can access the global maximum/minimum value of the function, which corresponds to the largest reward that is possible.

Figure 1A shows
the average regret versus time steps of the algorithms, from which we see that DRO outperforms CMA-ES significantly by reaching a lower
regret value in fewer steps. For 5000 random functions, DRO can reach
the criterion of regret ≤0.05 in approximately 32 steps, on
average, whereas CMA-ES needs 111 steps, SNOBFIT needs 187 steps,
and Nelder–Mead fails to reach the criterion. Overall, the
experiments demonstrate that our model outperforms those algorithms
on the task of nonconvex function optimization, i.e., simulated chemical
reaction optimization.
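The comparison can be scored as sketched below, where run_optimizer is a placeholder for any of the methods above and is assumed to return the list of (condition, yield) pairs it visited on a given test function.

```python
import numpy as np

def average_regret(run_optimizer, test_cases, n_steps=50):
    """Average per-step regret over simulated reactions with known global maxima.
    Each test case is (reaction_function, r_star), with r_star the largest possible reward."""
    curves = []
    for reaction, r_star in test_cases:
        history = run_optimizer(reaction, n_steps)          # list of (conditions, yield) pairs
        curves.append([r_star - r for _, r in history])     # regret at each step
    return np.mean(curves, axis=0)                          # regret-vs-step curve (cf. Figure 1A)
```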
Figure 1
(A) Comparison of average regret of CMA-ES, Nelder–Mead
simplex method, SNOBFIT, and DRO. The average regret is calculated
as the average regret on 1000 random nonconvex functions. (B) The
observed regret of 10 random nonconvex functions in which each line
is the regret of one function.
Randomized Policy for Deep Exploration
Although our model
of DRO optimizes nonconvex functions faster than CMA-ES, we observe
that DRO sometimes gets stuck in a local maximum (Figure 1B) because of the deterministic
“greedy” policy, where greedy means making the locally
optimal choice at each stage without exploration. In the context of
reaction optimization, a greedy policy will stick to one reaction
condition if it is better than any other conditions observed. However,
the greedy policy will get trapped in a local optimum, failing to
explore some regions in the space of experimental conditions, which
may contain a better reaction condition that we are looking for. To
further accelerate the optimization procedure in this aspect, we proposed
a randomized exploration regime to explore different experimental
conditions, in which randomization means drawing the decision randomly
from a certain probability distribution. This idea came from the work
of Van Roy and co-workers,[34] which showed
that deep exploration can be achieved from randomized value function
estimates. The stochastic policy also addresses the problem of randomness
in chemical reactions.

A stochastic recurrent neural network
was used to model a randomized policy,[35] which can be written as

h_{t+1}, μ_{t+1}, Σ_{t+1} = RNN_θ(h_t, s_t, r_t),   s_{t+1} ~ N(μ_{t+1}, Σ_{t+1})

Similar to the notation introduced before, the RNN is used to generate the mean μ_{t+1} and the covariance matrix Σ_{t+1}; the next state s_{t+1} is then drawn from the multivariate Gaussian distribution N(μ_{t+1}, Σ_{t+1}) at time step t + 1. This approach
achieved deep exploration in a computationally efficient way.

Figure 2 compares the greedy policy and the randomized policy on another group
of simulated reactions. Although the randomized policy was slightly
slower, it arrived at a better function value owing to its more efficient
exploration strategy. Comparing the randomized policy with a deterministic
one, the average regret was improved from 0.062 to 0.039, which shows
a better chance of finding the global optimal conditions.
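A minimal sketch of this sampling step is given below; clipping the draw to the allowed range of the instrument is our own illustrative addition, since the text does not specify how out-of-range samples are handled.

```python
import numpy as np

def sample_next_condition(mu, sigma, low=0.0, high=1.0, rng=None):
    """Randomized policy step: draw the next experimental condition from the
    multivariate Gaussian N(mu, sigma) produced by the RNN."""
    rng = np.random.default_rng() if rng is None else rng
    s_next = rng.multivariate_normal(mu, sigma)             # exploration comes from this sampling step
    return np.clip(s_next, low, high)                       # keep the conditions within instrument limits
```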
Figure 2
Comparison
of deterministic policy and randomized policy in the model of DRO.
Optimization of Real Reactions
We carried out four experiments in microdroplets and recorded the
product yield: the Pomeranz–Fritsch synthesis of isoquinoline
(Scheme 2a),[36] the Friedländer synthesis of a substituted quinoline (Scheme 2b),[36] the synthesis of ribose phosphate (Scheme 2c),[37] and the reaction between 2,6-dichlorophenolindophenol (DCIP) and ascorbic acid (Scheme 2d).[38] In all four reactions,
two reagents are loaded into separate syringes. Solutions from those
two syringes are mixed in a T-junction and sprayed from an electrospray
ionization (ESI) source with high voltage and pressure. The flow rate
(from 0 to 10 μL/min), voltage (from 0 to 10 kV), and pressure
(from 0 to 120 psi) applied on the spray source are the experimental
parameters that are optimized, with all other conditions held constant.
In these four reactions, the variables are continuous. The reaction
yield of product, which was measured by mass spectrometry, was set
as the optimization objective. The initial reaction conditions were randomly chosen. DRO, CMA-ES, and one variable at a time (OVAT) methods
were compared on the four reactions. The DRO model had been pretrained
on simulated reaction data, and the “OVAT” refers to
the method of scanning a single experimental condition while fixing
all the others, i.e., hold all variables but one, and see the best
result when the one free variable is varied. As mentioned before,
CMA-ES is the state-of-the-art blackbox optimizer in machine learning
and OVAT is the classic approach followed by many researchers and
practitioners in chemistry. DRO outperformed the other two methods
by reaching a higher yield in fewer steps (Figure 3). In all four reactions, DRO found the optimal condition within 40 steps, with a total time of 30 min required to optimize a reaction. In comparison, CMA-ES needed more than 120 steps to reach the same reaction yield as DRO, and OVAT failed to find the optimal reaction condition.
Scheme 2
(a) Pomeranz–Fritsch
Synthesis of Isoquinoline, (b) Friedländer Synthesis of a Substituted
Quinoline, (c) Synthesis of Ribose Phosphate, and (d) the Reaction
between 2,6-Dichlorophenolindophenol (DCIP) and Ascorbic Acid
Figure 3
Performance comparison of CMA-ES, DRO,
and OVAT methods on the microdroplet reaction of (A) Pomeranz–Fritsch
synthesis of isoquinoline, (B) Friedländer synthesis of a substituted
quinoline, (C) synthesis of ribose phosphate, and (D) the reaction
between DCIP and ascorbic acid. The signal intensity can be converted
into reaction yield with calibration.
The optimal conditions in microdroplet reactions may not
always be the same as those in bulk reactions. It is also our experience
that most reactions in bulk follow the same reaction pathway as in
microdroplets, so that we feel that learning to optimize microdroplet
reactions may often have a direct bearing on bulk reactions. For simulated
reactions, we showed that the model of DRO can optimize any random
mixture Gaussian density function. It can be proven that all continuous functions on a bounded set can be approximated arbitrarily closely by a mixture of Gaussian density functions. Given that the response surface
of a large number of reactions can be viewed as a continuous function,
we propose that our model of DRO can optimize a bulk-phase reaction
as well.

To demonstrate the applicability of our model to a
more general experimental setup, we optimized the bulk-phase reaction
of silver nanoparticle synthesis. Silver nanoparticles were synthesized
by mixing silver nitrate (AgNO3), sodium borohydride (NaBH4), and trisodium citrate (TSC).[39] The optimization objective was set to be maximizing the absorbance
at 500 nm (in order to get silver nanoparticles of approximately 100
nm), and the optimization parameters were the concentration of NaBH4 (from 0.1 to 10 mM), the concentration of TSC (from 1 to
100 mM), and reaction temperature (from 25 to 100 °C), with all
other conditions held constant. Figure 4 shows the comparison of DRO and CMA-ES on silver nanoparticle
synthesis. We therefore conclude that DRO is extendable to bulk-phase
reactions.
Figure 4
Performance comparison of CMA-ES and DRO on the bulk-phase reaction
of silver nanoparticle synthesis.
Learning for Better Optimization
We also observed that the
DRO algorithm is capable of learning while optimizing on real experiments.
In other words, each run on a similar or even dissimilar reaction improves the DRO policy. To demonstrate this point, the DRO was
first trained on the reaction of the Pomeranz–Fritsch synthesis
of isoquinoline and then tested on the reaction of the Friedländer
synthesis of substituted quinoline. Figure 5A compares the performance of the DRO before
and after training. The policy after training showed a better performance
by reaching a higher yield more quickly.
Figure 5
(A) The performance on
Friedländer synthesis of DRO before and after training on the
Pomeranz–Fritsch synthesis. (B) The performance on ribose phosphate
synthesis of DRO before and after training on the Pomeranz–Fritsch
and Friedländer syntheses.
The DRO policy showed better performance not only after training
on a reaction with similar mechanism but also on reactions with different
mechanisms. Figure 5B compares the performance of the DRO on ribose phosphate synthesis
before and after training on the Pomeranz–Fritsch and Friedländer
syntheses. Although both achieved a similar product yield, the DRO after training reached the optimal condition faster.
Understanding the Reaction Mechanism
The reaction optimization
process also provided us insights into the reaction mechanism. The
reaction response surface was fitted by a Gaussian process (Figure 6), which showed that
the yield at a low voltage of 1 kV was more sensitive to the pressure
and flow rate change than the reaction at a higher voltage. On the
other hand, the feature selection by Lasso[40,41] suggests that pressure/(flow rate), voltage/(flow rate), and square
of pressure were the three most important features in determining
the reaction yield. Flow rate and pressure were correlated because
higher flow rate resulted in more liquid flowing out. In turn, the
higher liquid flow needed higher pressure to generate smaller-sized
droplets, in which reactions had a higher rate. The correlation was
similar for voltage/(flow rate) pairs. Higher flow rate made the droplets
larger; as a result, a higher voltage was required to generate enough
charge to drive the reactions occurring inside them. The quadratic
dependence on the pressure suggested that there was an optimal pressure
for the reaction, because higher pressure would generate smaller-sized
droplets with higher reaction rates, but smaller droplets would also
evaporate faster; the faster evaporation would result in a shorter
reaction time. The optimization process of DRO provides a better sampling
than a grid search for Lasso regression. The DRO algorithm samples
around the response surface with higher uncertainty, which reduces
the bias of fitting. Besides, DRO also samples more around the optimal
point, in order to get a more accurate fit near the optimum. All
of this data analysis leads us to a better understanding of how reactions
occur in microdroplets.
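One way to obtain such a surface from the optimization trace is to fit the recorded (condition, yield) pairs with a Gaussian process; the scikit-learn sketch below, with an RBF-plus-noise kernel, is an illustrative choice rather than the exact model used for Figure 6.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def fit_response_surface(conditions, yields):
    """Fit the (conditions, yield) pairs collected during optimization with a
    Gaussian process to visualize the reaction response surface."""
    kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-2)   # smooth surface + measurement noise
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gp.fit(np.asarray(conditions), np.asarray(yields))
    return gp   # gp.predict(grid, return_std=True) gives the mean surface and its uncertainty
```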
Figure 6
Possible reaction response surface of the Friedländer
synthesis of a substituted quinoline, predicted from the optimization
process.
Conclusion
We
have developed the DRO model for optimizing chemical reactions and
demonstrated that it has superior performance under a number of different
circumstances. The DRO model combines state-of-the-art deep reinforcement
learning techniques with the domain knowledge of chemistry, showcasing
its capability in both speeding up reaction optimization and providing
insight into how reactions take place in droplets. We suggest that
the optimization strategy of integrating microdroplet reactions with
our DRO acceleration can be applied to a wide range of reactions.
Experimental
Details
Model Design
A modified long short-term memory (LSTM)[42] architecture is proposed that accepts the two inputs x_{t−1} and y_{t−1} and outputs the new x_t. The variables in the LSTM cell are x, the input (reaction conditions) at time step t − 1; y, the output (reaction yield) at time step t − 1; c, the cell state vector; f, i, and o, the gate vectors; and W, U, and b, the parameter matrices and vectors. We denote elementwise multiplication by ◦ to distinguish it from matrix multiplication. The functions f, i, and o are named after the forget, input, and output gates. The proposed LSTM cell can be abbreviated as

x_t, c_t = LSTM_θ(x_{t−1}, y_{t−1}, c_{t−1})

In order to learn the optimal
policy modeled by the RNN, we define a loss function inspired by the
regret in the reinforcement learning community as

l(θ) = −Σ_{t=1}^{T} γ^{T−t} (r_t − max_{i<t} r_i)

where T is the overall time horizon, r_t is the reward at time step t, and γ ∈ (0, 1) is the discount factor. The term inside the parentheses measures the improvement we can achieve by exploration. The intuition behind applying the discount factor is that the importance of getting a maximum reward increases as time goes on.

As mentioned before, chemical reactions are time-consuming
to evaluate. Therefore, we need to train the model on mock reactions
first. A class of nonconvex functions of the mixture Gaussian probability
density functions is used as mock reactions, which allows a general
policy to be trained. A Gaussian error term is added to the function
to model the large variance property of chemical reaction measurements.
The mock reactions can be written as

R(s) = Σ_{i=1}^{N} c_i (2π)^{−k/2} |Σ_i|^{−1/2} exp(−(1/2)(s − μ_i)^T Σ_i^{−1}(s − μ_i)) + ε

where c_i is the coefficient, μ_i is the mean, and Σ_i is the covariance of a multivariate Gaussian distribution; k is the dimension of the variables; and ε is the error term, a random variable drawn from a Gaussian distribution with mean 0 and variance σ^2. In our case, the number of parameters to be optimized is three. N was set to 6. Each μ_i was sampled from a uniform distribution between 0.01 and 0.99, each diagonal Σ_i was sampled from a Gaussian distribution, and the coefficients c_i were sampled from a Gaussian distribution and then normalized.
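The sketch below generates one such mock reaction; the sampling ranges used for the covariances and coefficients are placeholders, since their exact distribution parameters are not reproduced in the text above.

```python
import numpy as np
from scipy.stats import multivariate_normal

def make_mock_reaction(n_params=3, n_components=6, noise_sigma=0.01, seed=None):
    """Build a simulated reaction R(s): a random mixture of Gaussian densities over
    the normalized experimental conditions plus Gaussian measurement noise."""
    rng = np.random.default_rng(seed)
    means = rng.uniform(0.01, 0.99, size=(n_components, n_params))
    widths = np.abs(rng.normal(0.1, 0.03, size=(n_components, n_params))) + 1e-3   # placeholder spread
    coeffs = np.abs(rng.normal(1.0, 0.5, size=n_components))
    coeffs /= coeffs.sum()                                   # normalize the mixture coefficients

    def reaction(s):
        s = np.asarray(s, dtype=float)
        value = sum(c * multivariate_normal.pdf(s, mean=m, cov=np.diag(w ** 2))
                    for c, m, w in zip(coeffs, means, widths))
        return value + rng.normal(0.0, noise_sigma)          # noisy "yield" measurement
    return reaction
```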
Training Details
The TensorFlow framework[43] is used to formulate and train the model. The
LSTM structure is unrolled on trajectories of T =
50 steps, the nonconvex functions with random parameters in each batch
are used as training sets, and the loss function defined above is used
as the optimization goal. The Adam optimizer is used to train the
neural network.

The hyperparameters chosen are “batch_size”,
128; “hidden_size”, 80; “num_layers”,
2; “num_epochs”, 50000; “num_params”,
3; “num_steps”, 50; “unroll_length”, 50;
“learning_rate”, 0.001; “lr_decay”, 0.75;
“optimizer”, “Adam”; “loss_type”,
“oi”; and “discount_factor”, 0.97.
Chemicals
and Instrumentation
All chemicals were purchased as MS grade from Sigma-Aldrich (St. Louis, MO). Mass spectra were obtained using an LTQ Orbitrap XL Hybrid Ion Trap-Orbitrap Mass Spectrometer from
Thermo Fisher Scientific Inc. (Waltham, MA).
Experimental Setup
In the microdroplet reactions, reactant solutions from two syringes
are mixed through a T-junction and sprayed in a desorption electrospray
ionization source with application of high voltage and pressure. The
reactions occur in droplets and are monitored by a mass spectrometer.

In the reaction of silver nanoparticle synthesis, sodium borohydride
and trisodium citrate of specific concentrations were mixed and put
into a bath at a specific temperature. Silver nitrate solution was
then added dropwise. The concentration of AgNO3 was fixed
at 1 mM. The concentration of NaBH4 ranged from 0.1 to
10 mM, the concentration of TSC ranged from 1 to 100 mM, and reaction
temperature ranged from 25 to 100 °C.
Feature Selection by Lasso
Let p be the gas pressure, u be
the voltage, and v be the flow rate. The engineered
features are u, v, pu, pv, uv, p/u, p/v, u/v, p^2, u^2, v^2. The loss function of Lasso regression is

l(θ) = ∥y − xθ∥_2^2 + λ∥θ∥_1

where ∥•∥_2 is the ℓ2 norm, ∥•∥_1 is the ℓ1 norm, θ is the model parameter, x is the input features, y is the output results, and λ is the regularization
coefficient. As the regularization coefficient increases, features drop out of the model in order of increasing importance while the loss function is minimized.[13] Lasso regression is
performed repeatedly so that exactly 1, 2, or 3 features are selected;
the top three most important features are p/v, u/v, and p^2.
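This procedure can be sketched with scikit-learn as follows; the regularization grid and the standardization of the features are illustrative assumptions. Features that remain in the model at stronger regularization are ranked as more important.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

def lasso_feature_ranking(p, u, v, y, alphas=np.logspace(-3, 1, 40)):
    """Rank the engineered features by the order in which they enter the Lasso model
    as the regularization coefficient is relaxed from strong to weak."""
    names = ["u", "v", "p*u", "p*v", "u*v", "p/u", "p/v", "u/v", "p^2", "u^2", "v^2"]
    X = np.column_stack([u, v, p * u, p * v, u * v, p / u, p / v, u / v, p ** 2, u ** 2, v ** 2])
    X = StandardScaler().fit_transform(X)
    entered = []
    for alpha in sorted(alphas, reverse=True):               # strong -> weak regularization
        coef = Lasso(alpha=alpha, max_iter=10000).fit(X, y).coef_
        for name in np.array(names)[np.abs(coef) > 1e-8]:
            if name not in entered:
                entered.append(name)                          # earlier entry = more important
    return entered
```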
Data Availability
The corresponding
code will be released on GitHub.