Bayesian Optimization of Computer-Proposed Multistep Synthetic Routes on an Automated Robotic Flow Platform.

Anirudh M K Nambiar¹, Christopher P Breen², Travis Hart¹, Timothy Kulesza¹, Timothy F Jamison², Klavs F Jensen¹.

Abstract

Computer-aided synthesis planning (CASP) tools can propose retrosynthetic pathways and forward reaction conditions for the synthesis of organic compounds, but the limited availability of context-specific data currently necessitates experimental development to fully specify process details. We plan and optimize a CASP-proposed and human-refined multistep synthesis route toward an exemplary small molecule, sonidegib, on a modular, robotic flow synthesis platform with integrated process analytical technology (PAT) for data-rich experimentation. Human insights address catalyst deactivation and improve yield by strategic choices of order of addition. Multi-objective Bayesian optimization identifies optimal values for categorical and continuous process variables in the multistep route involving 3 reactions (including heterogeneous hydrogenation) and 1 separation. The platform's modularity, robotic reconfigurability, and flexibility for convergent synthesis are shown to be essential for allowing variation of downstream residence time in multistep flow processes and controlling the order of addition to minimize undesired reactivity. Overall, the work demonstrates how automation, machine learning, and robotics enhance manual experimentation through assistance with idea generation, experimental design, execution, and optimization.

Year: 2022 PMID: 35756374 PMCID: PMC9228554 DOI: 10.1021/acscentsci.2c00207

Source DB: PubMed Journal: ACS Cent Sci ISSN: 2374-7943 Impact factor: 18.728

Introduction

Machine assistance has helped to automate and accelerate steps in the synthesis of organic compounds, speeding the discovery and development of new medicines and materials. Going from a molecular structure to a tangible product or a fully defined synthesis route involves two high-level tasks: (1) synthesis planning, where the starting materials, reactions, and reagents that can be used to make the target molecule are identified, and (2) process development, where the unit operations and reaction conditions needed to maximize process performance are specified. Computer-aided synthesis planning (CASP) tools,[1,2] based on human-curated reaction rules or algorithmically learned transformations from published reaction data, can automatically generate recommendations for retrosynthetic routes and reaction conditions, helping to generate ideas quickly and reducing manual database lookups. On the process development side, walk-up multistep synthesis platforms[3−11] equipped with reactors, separators, process analytical technology (PAT), and automation tools (e.g., liquid handlers, reaction sampling, code scripts, user interfaces) enable data-rich experimentation and lighten the manual workload associated with repetitive reaction execution and analysis. There is growing interest in experimentally validating CASP suggestions and integrating CASP tools with automated synthesis platforms. Experimental validation of CASP recommendations has been demonstrated via batch synthesis[12−14] and on a robotic continuous flow synthesis platform,[6] but disclosed examples remain rare.
The main obstacle to direct implementation of CASP-designed routes without human intervention is that synthesis procedures require a fully defined set of instructions (a recipe) specifying the sequence of unit operations and corresponding reaction conditions (e.g., concentrations, temperature, time, stoichiometry) with a level of precision that exceeds what can currently be learned or extracted from limited data.[15] Therefore, human input was still needed in the reported cases to fill in key procedural details and manually optimize approximate reaction conditions generated by CASP. Algorithm-guided automated optimization of reaction conditions to maximize a desired objective function (e.g., yield) is another example of machine assistance in organic synthesis.[4,16−23] In this approach, an algorithm proposes reaction conditions to evaluate within a defined search space (set of values that continuous or categorical process variables can take) based on feedback and analysis of results from previous experiments. Coupling the algorithm with an automated synthesis platform and inline/online PAT tools results in closed-loop design and execution of experiments for efficient reaction development.[24−26] Prior applications of algorithmic optimization to chemistry have focused primarily on single-step transformations[17] involving model reactions, with recent work on two-step processes.[27−29] Furthermore, only a few reports consider categorical reaction variables[18,21,22] (e.g., catalyst) and multiple objectives.[19,22] In reality, however, process development for functional organic compounds often involves multireaction pathways with categorical reagent choices and several process metrics of interest (e.g., yield, productivity). In this work, we leveraged algorithm-guided multistep reaction optimization as a tool to identify optimal process conditions for approximate synthesis recipes generated by CASP (Figure 1).
The open source CASP software ASKCOS[6,30] was used to generate recommendations for synthesis routes and reaction conditions for an exemplary small molecule target, sonidegib. After manual assessment of synthetic feasibility, we selected a high-ranked route to optimize involving three reactions and one phase separation. While ASKCOS recommendations and manual solubility testing populated some reaction conditions (solvent, base, catalyst, concentration) in the approximate recipe, specification of critical process parameters was left to a multi-objective Bayesian algorithm that optimized both continuous and categorical variables with respect to multiple process metrics. Experiments proposed by the algorithm were executed on a modular, robotically reconfigurable, continuous flow synthesis platform equipped with multiple PAT tools for reaction analysis and feedback. We consider the chemical and physical interdependencies that arise when optimizing multiple telescoped steps simultaneously and show how the synthesis platform’s modularity unlocks an independent degree of freedom for varying downstream reaction time.

Figure 1

Overall approach for machine-assisted synthesis planning and process development. Computer-aided synthesis planning (CASP) recommendations for synthesis routes and reaction conditions to make the target molecule are assessed by humans to create an approximate recipe with missing process details. A multi-objective Bayesian reaction optimization algorithm coupled to a robotic multistep flow synthesis platform optimizes continuous and categorical reaction conditions to fully specify the synthesis recipe.

Results and Discussion

Computer-Aided Synthesis Planning for Sonidegib

The active pharmaceutical ingredient sonidegib 6 (Odomzo)[31,32] was chosen as an exemplary target molecule for multistep synthesis. The open source CASP software ASKCOS was used to generate retrosynthetic pathways and forward reaction conditions (Scheme 1, full details in SI). In the rank #1 pathway, the first retrosynthetic disconnection of 6 occurs at the central amide bond, resulting in substituted aminopyridine 4 and biaryl 5 (Scheme 1A). Aminopyridine 4 may be constructed either in one step via the rank #1 SNAr reaction of morpholine 1 with a haloaminopyridine 7 or in two steps via the rank #2 pathway involving reduction of 3 and SNAr starting with halonitropyridine 2. The software further proposed that chlorinated 2a, brominated 2b, and fluorinated 2c were viable starting materials. Structure–reactivity principles suggested that the SNAr reaction of 1 and 7 would be disfavored while the same reaction of 1 and 2 would proceed smoothly. Therefore, we decided to proceed with the 3-step route starting from 1 and 2.

Scheme 1

Computer-Aided Synthesis Planning Recommendations for Sonidegib 6

(A) Top-ranked retrosynthetic pathways. (B) Proposed forward reaction conditions with continuous variables in red and categorical variables in blue. Abbreviations: EDG (electron donating group), EWG (electron withdrawing group), HATU (hexafluorophosphate azabenzotriazole tetramethyl uronium), EDC (1-ethyl-3-(3-dimethylaminopropyl)carbodiimide), HOBt (1-hydroxybenzotriazole).

In the forward reaction conditions for the 3-step route (Scheme 1B, shown under the arrows), dimethylformamide (DMF) was recommended as the rank #1 solvent for both the SNAr and amide coupling reactions. For the SNAr reaction, both inorganic and organic amine bases were recommended but solubility testing revealed that DIPEA (N,N-diisopropylethylamine, or Hünig’s base) was necessary to fully solubilize the ammonium halide salt byproduct. The need to verify that all components are soluble at the reaction concentration before flow synthesis is a step where human insight is still needed in the current approach. A heterogeneous palladium(0) (Pd0) catalyst with H2 as the terminal reductant was proposed for the reduction of 3. While these details partially populated the synthesis recipe, there were both continuous and categorical reaction conditions (red and blue, respectively, in Scheme 1B) that were left to experimental optimization. These included continuous variables such as temperature, residence time, and stoichiometry, but also categorical choices recommended by ASKCOS such as the SNAr halide leaving group (Cl, Br, or F) and the amide coupling reagent (HATU or EDC/HOBt).

Robotic Flow Synthesis Platform

The CASP-proposed and human-refined synthesis route was experimentally optimized on a modular, robotically reconfigurable, continuous flow synthesis platform (Figure 2) originally developed by our lab[6] and improved with a faster robot and capabilities for reaction analysis and feedback optimization (see Supplementary Movie and SI for more details). The enabling features of continuous synthesis include access to elevated temperatures and pressures to accelerate reaction rates, tight control over reaction conditions due to shorter length scales for heat and mass transfer, ability to model scaling and mixing behavior due to well-defined reactor geometries, and availability of hardware for inline separations and reaction monitoring.[26,33−35]

Figure 2

Multistep flow synthesis platform with a library of robotically reconfigurable process modules for reactions, separations, and inline/online analysis.

Continuous telescoped multistep synthesis, i.e., all steps performed simultaneously in an uninterrupted sequence,[36] can be realized on the platform by fluidically connecting multiple unit operations together. The platform consists of 2 process stacks with 8 process bays in total, and process modules can be placed by a 4-axis gantry robot (3 linear XYZ Cartesian axes + 1 rotary θ axis) onto the process bays in any order to access both linear and convergent sequences. The linear and rotary axes of the gantry robot facilitate straightforward path planning (compared to our previous six-axis robot) and have a high repeatability (XYZ axes: 5 μm, θ axis: 5 arc-sec) for consistent pick-and-place within 1 min per process module. Process modules available include heated tubular reactors with various volumes (0.5, 1, and 3 mL of PFA tubing), a packed bed reactor, and a membrane phase separator for inline liquid–liquid or gas–liquid separations. The robot also interfaces with a reagent tubing “switchboard” (akin to a telephone wire switchboard) to attach inlets and outlets only to the process bays where they are required. Two of the inlet lines are connected to selector valves that allow switching between different reagent candidates during an experiment. New robotically reconfigurable analytical modules for inline FT-IR spectroscopy and online LC-MS were integrated into the platform, enabling data-rich experimentation. The analytical modules (three available in total) contain tubing that directs the process stream either to an external FT-IR instrument with a flow cell or to an HPLC sample injection valve connected to an LC-MS, and brings it back to the process stack post-analysis, enabling further reactions to be performed if necessary.
The modular approach allowed analysis at multiple points in a multistep sequence to quickly obtain critical process performance information such as conversion, yield, and impurity profiles.

Interdependencies in Multistep Flow Processes

Telescoped flow sequences, analogous to one-pot batch synthesis, are attractive since they minimize the number of intermediate purification steps required and allow unstable high-potency intermediates to react shortly after formation. However, chemical compatibility of the reagents, solvents, and byproducts must be ensured across all individual unit operations. Optimization of multiple interconnected reactions simultaneously, in comparison to separately or sequentially, reduces the number of optimization campaigns needed, decreases the time and manual effort spent to isolate intermediates, and increases the likelihood of identifying a global system optimum. The process systems engineering literature shows that when there are interdependencies between individual subsystems that affect their respective performance, combining the subsystem optima does not necessarily lead to the global system optimum.[37,38] In the context of chemical process optimization, examples of interdependencies include cases where excess reagent improves the yield of an upstream reaction but adversely affects the yield of a downstream reaction, or cases in which a certain solvent may be optimal for one reaction but another provides the best overall process yield and ensures solubility. It is therefore beneficial, when possible, to optimize all variables in the overall process simultaneously to identify any interactions and avoid reoptimizing steps.
Another type of interdependency that arises in multistep flow processes is physical constraints on downstream residence times. In continuous reactors, input flow rate and reactor volume determine residence time. While the residence time in the first reactor can be varied independently by changing the flow rate, downstream flow rates are fully specified by stoichiometric relationships with upstream reagents. Consequently, downstream residence times cannot be varied independently. Chatterjee et al. overcame this constraint in their radial synthesis platform[8] by storing intermediate solutions in an interim vessel until needed for the next reaction. While this approach decouples residence times, it is restricted to chemistries that do not produce unstable or hazardous intermediates that may pose additional challenges upon accumulation and long-term storage. In this work, we leveraged the modular nature and robotic reconfigurability of our synthesis platform to introduce an independent degree of freedom enabling variation of downstream residence time. Specifically, the robot was used to automatically reconfigure downstream reactor volume between a 1 and 3 mL module on the fly during a multistep optimization campaign. In this way, both shorter and longer downstream residence times could be accessed without altering upstream flow rates.
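The residence-time constraint described above can be illustrated with a short calculation. This is a minimal sketch, not the platform's control code: the function name and every flow rate below are hypothetical values chosen for illustration. It assumes residence time equals reactor volume divided by total volumetric flow, and that the downstream reagent feed is tied to the upstream flow by a fixed ratio set by stoichiometry and stock concentrations.

```python
def telescoped_residence_times(upstream_flow_ml_min, feed_ratio,
                               first_volume_ml, downstream_volume_ml):
    """Return (t1, t2) in minutes for two reactors in series.

    upstream_flow_ml_min: total flow through reactor 1 (hypothetical value)
    feed_ratio: downstream reagent flow divided by upstream flow; assumed
                fixed by stoichiometry and stock concentrations
    """
    t1 = first_volume_ml / upstream_flow_ml_min
    # The downstream reactor sees the upstream stream plus the added feed.
    downstream_flow = upstream_flow_ml_min * (1.0 + feed_ratio)
    t2 = downstream_volume_ml / downstream_flow
    return t1, t2


# Hypothetical numbers, not taken from the reported experiments.
UPSTREAM_FLOW = 0.2  # mL/min
FEED_RATIO = 1.5     # downstream feed is 1.5x the upstream flow

for volume in (1.0, 3.0):  # the two swappable downstream modules
    t1, t2 = telescoped_residence_times(UPSTREAM_FLOW, FEED_RATIO, 1.0, volume)
    print(f"downstream module {volume:.0f} mL: t1 = {t1:.1f} min, t2 = {t2:.1f} min")
```

With the upstream flow held fixed, t1 does not change; only swapping the downstream module changes t2, which is the extra degree of freedom the robotic reconfiguration provides.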

Bayesian Optimization Algorithm

The open source Bayesian optimization package, Dragonfly, developed by Kandasamy et al.[39−41] was chosen as the optimization algorithm (further details in SI). Dragonfly employs a flexible Gaussian process (GP) surrogate model to mathematically describe the relationship between input variables and objective functions. It supports continuous (e.g., temperature), discrete categorical (e.g., leaving group), and discrete numeric (e.g., reactor volume) variables, all of which were present in our case. Another useful feature in Dragonfly is the capability to optimize multiple objectives simultaneously, which we took advantage of to consider multiple process metrics of interest (yield, productivity, cost). An optimization campaign begins with an initial space-filling design of experiments. After fitting the model to initialization data, the algorithm was queried for one new reaction condition to evaluate at each refinement iteration. Refinement experiments are generated using the upper confidence bound (UCB) and Thompson sampling (TS) acquisition functions, which balance an exploitative strategy (query regions where objective is expected to be high) with an explorative one (query regions where uncertainty is high) to increase the likelihood of finding globally optimal points.[39,42] For multi-objective optimization, the algorithm samples different weights (relative importance) for each objective function at each refinement iteration and maximizes the weighted sum.[40] If the system contains a Pareto trade-off front[16,43] where one objective cannot be improved without making another worse, sampling different weights enables the algorithm to identify multiple Pareto optimal points. Hardware set point control and execution of experimental designs were automated using Python scripts. 
Once the user specified the process configuration and optimization variable bounds in a spreadsheet file, it was loaded into a graphical user interface (GUI) where the backend code parsed this information, queried the algorithm for experimental conditions, calculated flow rates, and generated a queue of commands to set up, execute, and transition between experiments. The GUI automatically read and converted LC peak areas to reaction yields using a predetermined calibration curve, which closed the information feedback loop and allowed the platform to operate independently once initiated.
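The weighted-sum strategy described in this section can be sketched in a few lines. The code below is an illustration only and is not Dragonfly's implementation or API: the function names, the candidate list, and the predicted means and standard deviations are all hypothetical stand-ins for what a fitted surrogate model would supply. It assumes objectives are normalized to a common 0 to 1 scale and uses an upper confidence bound of the form mean + beta × standard deviation.

```python
import random


def sample_weights(n_objectives, rng):
    """Draw random non-negative weights that sum to 1."""
    raw = [rng.random() + 1e-12 for _ in range(n_objectives)]
    total = sum(raw)
    return [value / total for value in raw]


def ucb(mean, std, beta):
    """Upper confidence bound: optimistic estimate of an objective."""
    return mean + beta * std


def pick_next(candidates, weights, beta=2.0):
    """Return the candidate maximizing the weighted sum of per-objective UCBs."""
    def score(candidate):
        return sum(w * ucb(mean, std, beta)
                   for w, (mean, std) in zip(weights, candidate["predictions"]))
    return max(candidates, key=score)


# Hypothetical surrogate predictions: (mean, std) for two normalized objectives.
CANDIDATES = [
    {"name": "A", "predictions": [(0.90, 0.02), (0.30, 0.05)]},
    {"name": "B", "predictions": [(0.60, 0.05), (0.80, 0.05)]},
    {"name": "C", "predictions": [(0.50, 0.20), (0.55, 0.20)]},
]

rng = random.Random(0)
for iteration in range(3):
    weights = sample_weights(2, rng)
    chosen = pick_next(CANDIDATES, weights)
    print(iteration, [round(w, 2) for w in weights], chosen["name"])
```

Setting beta to zero makes the choice purely exploitative, while a larger beta favors uncertain candidates such as C; resampling the weights at each iteration steers successive queries toward different regions of a trade-off front.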

Fully Telescoped Process Experiments

The fully telescoped sonidegib 6 synthesis process was executed on the platform (Figure 3A,B). The first step was performed in a 1 mL reactor followed by an LC-MS module to determine the yield of SNAr product 3. The platform switched between the three different halonitropyridine starting materials 2a, 2b, and 2c via a selector valve. Subsequent reduction to amine 4 was accomplished using the packed bed reactor module wherein H2 was introduced via a mass flow controller and the resulting biphasic gas–liquid stream passed through heated tubes containing a heterogeneous Pd0 catalyst. Hydrogenation reactions generally benefit from continuous processing due to enhanced gas–liquid–solid mass transfer and smaller reaction volumes that improve throughput and safety.[44] Immediately after the packed bed, an inline membrane separator module removed residual hydrogen gas which exited in the retentate stream, since the downstream amide coupling step and reaction sampling modules required a homogeneous liquid stream. The liquid permeate was conveyed to the inline FT-IR module for real-time monitoring of nitro 3 and amine 4 concentrations. We observed that when exposed to ambient air, solutions of amine 4 changed from colorless to dark red because of amine oxidation on the time scale of minutes. The telescoped flow approach was therefore beneficial in this case for minimizing exposure to oxygen, obviating the need for purification, and ensuring that the unstable amine was utilized promptly after formation. After FT-IR analysis, the reaction stream was delivered to the second process stack where the final amide coupling of 4 and 5 was conducted in either a 1 or 3 mL reactor to access different residence times. The overall yield of sonidegib 6 for the three-step sequence was determined using the second LC-MS module. Deploying three analytical modules simultaneously enabled data-rich process development, since each experiment provided information about three different reactions.

Figure 3

Fully telescoped process experiments for multistep synthesis of sonidegib. (A) Platform configuration. (B) Process scheme. (C) FT-IR timecourse data. (D) LC chromatograms. (E) Schemes of (a) telescoped and (b) pure nitro reduction. Abbreviations: CSM (catalytic static mixer).

Before running a multistep optimization campaign, preliminary experiments (Table 1) were performed to verify that the fully telescoped process and Pd0 catalyst activity were stable. For these preliminary experiments, continuous variable values were typically set near the midpoint of the intended optimization range. Two different catalysts, Pd0/silica[45,46] and Pd0-electroplated stainless steel catalytic static mixers (Pd-CSM),[47,48] were initially investigated.

Table 1

Preliminary Experiments with Fully Telescoped Processᵃ

entry	S_NAr leaving group	S_NAr yield [%]ᵇ	reduction catalyst	coupling reagent	coupling time [min] (V_R)	overall yield [%]ᵇ	observations
1	Cl	84	Pd/silicaᶜ	HATU	2 (1 mL)	14	Incomplete reduction and catalyst deactivation (even at 100 °C)
2	F	95	Pd/silicaᶜ	EDC/HOBt	2 (1 mL)	-	S_NAr fluoride salt byproduct resulted in disintegration of silica support
3	F	96	Pd-CSMᵈ	EDC/HOBt	2 (1 mL)	17	SS316L support resistant to fluoride
4	F	96	Pd-CSMᵈ	HATU	6 (3 mL)	53	Catalyst deactivation eventually observed after a few hours

ᵃ Fixed process conditions (see Figure 3A): S_NAr (T = 70 °C, t_res = 5 min, 1 equiv 1, 1 equiv DIPEA), amide coupling (T = 60 °C, 2 equiv DIPEA, 1 equiv coupling reagent).

ᵇ Yield determined by LC with an internal standard.

ᶜ Pd/silica reduction conditions: 0.5 g catalyst, T = 70 °C, backpressures = 100 psi retentate, 95 psi outlet, H2 flow = 20 sccm (10 equiv).

ᵈ Pd-CSM reduction conditions: two CSMs in series, T = 120 °C, backpressures = 125 psi retentate, 120 psi outlet, H2 flow = 30 sccm (15 equiv). Abbreviations: CSM (catalytic static mixer), V_R (reactor volume).

The first LC-MS module showed that the SNAr reaction proceeded to >80% yield with both leaving groups (Table 1, entries 1–4). However, the inline FT-IR after the reduction step revealed that Pd catalyst deactivation within the time scale of an experiment (tens of minutes to an hour) was a major issue preventing stable operation with either catalyst support. With Pd/silica, the FT-IR timecourse showed a gradual rise in unconverted nitro starting material 3 with a concomitant decrease in amine product 4 (similar to Figure 3C(a)). While the Cl leaving group in 2a resulted only in deactivation, the F leaving group in 2c led to disintegration of the silica support (Table 1, entries 1–2). A stoichiometric byproduct of the upstream SNAr reaction is the conjugate acid salt of DIPEA (Figure 3B), and the disintegration of the silica support can be attributed to chemical incompatibility with fluoride. To solve the chemical compatibility issue, we turned our attention to Pd-CSMs, which are 3D-printed from chemically resistant stainless steel 316L.[47] The nitro reduction was operated at the platform’s upper limit of 120 °C to suppress catalyst deactivation.[49,50] When we experimentally evaluated two Pd-CSMs in series with 2c starting material (Table 1, entries 3–4), the stainless steel scaffold was indeed chemically resistant to the DIPEA·HF byproduct, but deactivation was eventually observed after a few hours of operation.
An attempt to regenerate the Pd catalyst in situ[46,51] did not restore the original catalyst activity (Figure 3C(a)). To investigate the cause of deactivation (see SI for further details), only the nitro reduction step was carried out with isolated nitro compound 3. FT-IR and LC-MS data (Figure 3C(b),D(b)) showed complete conversion and stable performance over multiple hours with both catalysts. While the pure nitro reduction proceeded cleanly, the telescoped reduction outlet contained a significant amount of hydrazo dimer 8, an intermediate in the reduction pathway (Figure 3D).[52] Therefore, the quick onset of catalyst deactivation observed with the telescoped process (Figure 3E) was likely due to catalyst poisoning by the SNAr byproduct salt or by strong adsorption of condensation intermediates on the Pd surface.[51] The ability to integrate analytical modules both within and at the end of the multistep sequence was crucial for identifying which reaction was affecting the overall yield and stability of the fully telescoped process. Due to the catalyst stability issues, it was necessary to divide the process into a single-step SNAr optimization with offline purification of SNAr product 3, followed by a multistep optimization involving reduction of isolated 3, gas–liquid membrane separation, and amide coupling to form sonidegib 6.

SNAr Multi-Objective Optimization Campaign

For the SNAr campaign (Figure 4), five optimization variables and three objective functions were optimized simultaneously. The reaction was carried out in a heated 1 mL reactor module (operated up to the hardware limit of 120 °C) followed by an LC-MS module for yield quantification and an FT-IR module for steady-state monitoring (see SI Movie). The optimization domain consisted of four continuous variables (residence time, temperature, equiv of 1, equiv of DIPEA) and one categorical variable (2a, 2b, or 2c). We considered objective functions related to both reaction performance and raw material costs. Therefore, in addition to reaction yield and productivity (grams product/hour), we included cost as a third optimization objective which was defined as the cost of reagents used per mole of product made. The reagent cost accounted for the different costs of the three starting material candidates (cost 2a < 2b < 2c) as well as the additional cost of using excess morpholine 1 and DIPEA.
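The three objectives defined above can be written as a short calculation. This is a sketch under stated assumptions rather than the authors' code: the function name, the molar feed rate, the product molecular weight, and all reagent prices are hypothetical placeholders. It assumes productivity is the molar rate of product multiplied by its molecular weight, and that cost is the price of all reagents fed per mole of limiting starting material divided by the fractional yield.

```python
def snar_objectives(yield_fraction, limiting_mol_per_h, product_mw_g_mol,
                    equivalents, price_usd_per_mol):
    """Return (yield in %, productivity in g/h, reagent cost in $/mol product).

    equivalents and price_usd_per_mol are dicts keyed by reagent name;
    equivalents are relative to the limiting starting material.
    """
    if not 0.0 < yield_fraction <= 1.0:
        raise ValueError("yield_fraction must be in (0, 1]")
    productivity = yield_fraction * limiting_mol_per_h * product_mw_g_mol
    # Cost of everything fed per mole of limiting reagent, including excess.
    cost_per_mol_limiting = sum(
        equivalents[name] * price_usd_per_mol[name] for name in equivalents
    )
    cost_per_mol_product = cost_per_mol_limiting / yield_fraction
    return 100.0 * yield_fraction, productivity, cost_per_mol_product


# Placeholder inputs only; none of these values come from the paper.
EQUIV = {"halonitropyridine": 1.0, "amine": 1.1, "base": 1.1}
PRICE = {"halonitropyridine": 300.0, "amine": 50.0, "base": 40.0}

y, p, c = snar_objectives(0.90, 0.03, 200.0, EQUIV, PRICE)
print(f"yield {y:.1f}%, productivity {p:.2f} g/h, cost ${c:.0f}/mol")
```

Because the cost is divided by yield, a cheaper starting material only wins on cost if its yield does not fall too far, which is why the three objectives have to be considered together.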

Figure 4

SNAr multi-objective optimization campaign. (A) Platform configuration. (B) Reaction scheme, optimization variables, and objective functions. (C) Objective values versus experiment number. (D) 3D plot of objective values. (E) 3D scatter plot showing continuous variable values explored. (F) Yield response surfaces generated using Gaussian process models fitted to experimental data. Equiv of 1 and DIPEA set to 1.1 to enable visualization.

Thirty experiments (Figure 4C–E; tabulated results in SI) were run continuously over 10 h (3 experiments/hour, ∼0.4 g starting materials/experiment) in a closed-loop manner (automated design and execution of experiments, reaction sampling, data processing, and feedback). The initialization phase, which aims to conduct a preliminary scan of the design space, contained 9 experiments evenly divided between the 3 leaving groups (3 each). In the refinement phase, the algorithm proposed one new experiment at a time, with the aim of focusing experimentation on regions where optimal values for all three objectives could be simultaneously achieved. Objective values obtained over the course of the 30 experiments are plotted (Figure 4C,D) with data points color-coded by leaving group. In the initialization phase (Figure 4C), yields ranged from 84% to nearly quantitative (>98%) when 2c was employed as the starting material (orange points). As a result of the relatively high yields for all potential starting material candidates, the cost per mole of product depended primarily on the cost differences among the leaving groups. This is visually apparent as 2a (∼$400/mol), 2b (∼$500/mol), and 2c (∼$600/mol) data points are clustered in the bottom, middle, and top regions of the cost axis. The utility of the multi-objective algorithm becomes evident in the refinement phase.
With regard to the productivity objective, the initialization phase provided throughput values between 0.6 and 2.6 g/h, whereas the refinement phase led to a significant improvement (Figure 4C) with most experiments operating in the 5–6 g/h range. This corresponds to more points populating the right-hand portion of the 3D scatter plot (Figure 4D) where productivity is high. Furthermore, in contrast to the initialization phase where the goal is to scan the chemical design space, the algorithm’s goal in the refinement phase is to identify conditions where all three objective values are optimized simultaneously. Consequently, the majority of experiments in the refinement phase have high yields as well as high productivities. For certain systems, multi-objective optimization can lead to situations where one objective cannot be improved without making another one worse off. The set of optimal points for which this trade-off exists is known as the Pareto front.[16,43] Instead of fixing the weight (relative importance) of each objective for the campaign, which would result in the identification of one optimal point along the Pareto front, the Dragonfly algorithm samples different weights at each refinement iteration, which enables it to find multiple Pareto optimal points.[40] For this SNAr reaction, the data show that 2c (orange, F leaving group) led to the highest yield and productivity combinations (e.g., expt 30: 98.3%, 5.97 g/h, $595/mol). However, 2a (blue, Cl leaving group) conditions were ∼33% cheaper but came with the trade-off of slightly lower yields and productivities (e.g., expt 27: 93.8%, 5.70 g/h, $414/mol). Both these conditions are Pareto-optimal, and it is up to the experimenter to decide which point is optimal for their specific context given other process considerations (e.g., purity requirements, separations).
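The notion of Pareto optimality used here can be checked directly on the two quoted experiments. The sketch below is illustrative only: the function names are hypothetical, and the third data point is an invented dominated example added purely to show the filter at work. Yield and productivity are maximized and cost is minimized.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better once."""
    at_least_as_good = (a["yield"] >= b["yield"]
                        and a["productivity"] >= b["productivity"]
                        and a["cost"] <= b["cost"])
    strictly_better = (a["yield"] > b["yield"]
                       or a["productivity"] > b["productivity"]
                       or a["cost"] < b["cost"])
    return at_least_as_good and strictly_better


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]


EXPERIMENTS = [
    # Values quoted in the text (yield %, productivity g/h, cost $/mol).
    {"name": "expt 30", "yield": 98.3, "productivity": 5.97, "cost": 595.0},
    {"name": "expt 27", "yield": 93.8, "productivity": 5.70, "cost": 414.0},
    # Invented point for illustration only; not from the campaign.
    {"name": "hypothetical", "yield": 90.0, "productivity": 5.00, "cost": 600.0},
]

front = pareto_front(EXPERIMENTS)
print([p["name"] for p in front])
```

Experiments 30 and 27 both survive because each beats the other on at least one objective, whereas the invented point is worse than experiment 30 on all three and is filtered out.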
To visualize the impact of continuous variables, the experimental data are plotted (Figure 4E) with respect to the three most influential continuous variables (temperature, time, equiv of 1) where a data point’s color and size correlate with yield and productivity, respectively. Experiments with 2c (orange lines) resulted in nearly quantitative yields even at mild reaction conditions, whereas 2a (blue) and 2b (green) required higher temperatures and longer residence times to promote reactivity. Data points with the largest size (highest productivity) are concentrated on the right face of the plot corresponding to the shortest residence time of 1 min. The Gaussian process (GP) mathematical model that underlies the Bayesian algorithm was useful not only during the optimization, but also after the fact for process characterization. By regressing the GP model to the experimental data and evaluating it over the design space, yield response surfaces for the three leaving groups were generated (Figure 4F). The surfaces represent the predicted yield (GP mean value) and the color reflects the local model uncertainty (GP standard deviation) at that point. The flatness of the 2c surface reflects the robust reaction rate even at low temperature, while the curvature of the 2a and 2b surfaces illustrates the significant rate acceleration that increased temperature provided. Model uncertainty is lowest (dark blue) in the more optimal regions of the design space preferentially explored by the algorithm, and uncertainty is greatest (most yellow) at low temperature and long residence time, since very few experiments were sampled from this suboptimal design region. It is worth noting that if the experimenter wanted to improve model accuracy even further for process modeling purposes, the ability to quantify local uncertainty using the GP model can be leveraged to perform post-optimization experiments in regions with the highest uncertainty.

Multistep Downstream Process Optimization Campaign

Following the SNAr campaign, we optimized the multistep downstream process (Figure ) involving nitro reduction, gas–liquid separation, and amide coupling for converting purified SNAr product 3 into sonidegib 6. An important change was made to the process configuration, however, based on prior results. During the fully telescoped process experiments (Figure , Table ), we noticed a significant side-product in the final amide coupling LC-MS data (Figure F(a)) when HATU was employed as the coupling reagent. Based on our understanding of the reaction mechanism (see SI),[53] this side-product was assigned as guanidinium 10, formed by an undesired reaction between amine 4 and HATU, which occurred due to simultaneous addition of the amide coupling streams (Figure A). This side-reaction can be suppressed by adding the amine only after HATU has been consumed in the desired activation reaction with carboxylic acid 5. Therefore, the activation and amide coupling steps were performed sequentially in separate reactors to control the order of addition. To do this, we took advantage of the synthesis platform’s flexibility by switching from a linear to a convergent process configuration (Figure A,B) containing two parallel branches (nitro reduction to form amine 4, and activation to form the activated ester 9b or 9c) that merge downstream for coupling. Two FT-IR modules were deployed to monitor both the activation reaction and the nitro reduction Pd catalyst stability in real-time over the course of the campaign. A spherical Pd(0)/C catalyst[50] (Heraeus GmbH) with high surface area (see SI) and low pressure drop was used for the nitro reduction, which was operated at high temperature (125 °C) to minimize the likelihood of deactivation. The overall yield of sonidegib 6 was determined via an LC-MS module after the amide coupling reactor.

Figure 5

Multistep downstream process optimization campaign for sonidegib synthesis. (A) Platform configuration for convergent process. (B) Process scheme, optimization variables, and objective functions. (C) Objective values versus experiment number. (D) 2D plot of objective values. (E) 3D scatter plot showing continuous variable values explored. (F) LC chromatograms from (a) linear and (b) convergent processes. (G) FT-IR timecourse data from (a) nitro reduction of 3 and (b) activation of 5. (H) Yield response surfaces generated using Gaussian process models fitted to experimental data. Activation time and equiv 3:5 set to 1 min and 1.1 equiv to enable visualization. Abbreviations: HOAt (1-hydroxy-7-azabenzotriazole).

Five optimization variables and two objective functions (Figure B) were considered for the multistep campaign. The activation reagent (EDC/HOBt or HATU) and activation residence time were categorical and continuous variables, respectively. The equivalents of nitro 3 with respect to carboxylic acid 5 were included as a continuous variable, since amide couplings are often performed with a slight excess of the amine. Since the flow rate of 3 was fully specified by the 3:5 stoichiometry and the acid flow rate in the activation reactor, the reduction residence time also varied based on equivalents of 3 and activation time. For the final amide coupling step, in addition to including temperature as a continuous variable, reactor volume (1 or 3 mL) was a discrete numeric variable that was automatically reconfigured by the robotic gantry arm during the campaign to provide an independent degree of freedom for accessing shorter and longer residence times without altering upstream flow rates. Overall yield and productivity of sonidegib 6 were chosen as the objective functions, while cost was not included in this case, since both coupling reagents were similarly priced.
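The coupling between these variables can be made concrete with a small sketch of the flow-rate bookkeeping. It assumes liquid-only plug flow with residence time equal to reactor volume divided by volumetric flow rate, treats the activation branch as a single stream, and ignores the hydrogen gas in the reduction branch. The function name, the reactor volumes other than the 1 and 3 mL coupling modules, and both concentrations are hypothetical placeholders rather than values from the paper.

```python
def downstream_times(
    activation_time_min,
    equiv_3_to_5,
    coupling_volume_ml,
    activation_volume_ml=1.0,  # hypothetical activation reactor volume
    reduction_volume_ml=2.0,   # hypothetical liquid hold-up of the Pd bed
    conc_5_molar=0.2,          # hypothetical acid concentration
    conc_3_molar=0.2,          # hypothetical nitro concentration
):
    """Flow rates (mL/min) and residence times (min) for the convergent process.

    Simplifying assumptions: liquid-only plug flow, one combined stream in
    the activation branch, and residence time = volume / flow rate.
    """
    # Activation time fixes the flow rate through the activation reactor.
    q_activation = activation_volume_ml / activation_time_min
    # Stoichiometry then fixes the flow rate of nitro compound 3
    # (mmol/min of acid 5, scaled by the equivalents, divided by conc. of 3).
    mmol_5_per_min = conc_5_molar * q_activation
    q_nitro = equiv_3_to_5 * mmol_5_per_min / conc_3_molar
    # The two branches merge before the coupling reactor.
    q_coupling = q_activation + q_nitro
    return {
        "q_activation_ml_min": q_activation,
        "q_nitro_ml_min": q_nitro,
        "reduction_time_min": reduction_volume_ml / q_nitro,
        "coupling_time_min": coupling_volume_ml / q_coupling,
    }


small = downstream_times(1.0, 1.1, coupling_volume_ml=1.0)
large = downstream_times(1.0, 1.1, coupling_volume_ml=3.0)
slower = downstream_times(2.0, 1.1, coupling_volume_ml=3.0)
print(small["coupling_time_min"], large["coupling_time_min"])
```

Under these assumptions, swapping the 1 mL coupling reactor for the 3 mL one triples the coupling residence time while leaving the upstream flow rates and the reduction residence time untouched, whereas changing the activation time shifts every downstream residence time together.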
Fifteen total experiments were run over 13 h (∼1.4 g starting materials/experiment) with 8 initialization and 7 refinement experiments (Figure C–G; tabulated results in SI). The initialization runs were evenly divided between the two activation reagents and reactor volumes (Figure C), and higher yields were observed with HATU (red points, 73–93%) than EDC/HOBt (blue points, 23–56%). After the first 4 experiments which utilized a 1 mL reactor for the amide coupling, the system performed a mid-run robotic reconfiguration from the 1 to 3 mL reactor (see SI Movie) and brought the process back up to resume the campaign. The ability to change reactor volume helped us investigate the effect of coupling time independently of other variables. In this case, longer reaction times provided by the larger volume increased conversion and yield. This is particularly evident when comparing EDC/HOBt experiment 3 (1 mL, 1 min, 25% yield, 1.1 g/h) and experiment 8 (3 mL, 3 min, 57% yield, 2.65 g/h), which employed otherwise similar conditions. For all the refinement experiments (9–15), the algorithm proposed HATU and the 3 mL reactor. Compared to the initialization (e.g., expt 5, 93% yield, 1.6 g/h), the multi-objective optimization algorithm identified conditions in the refinement phase with simultaneously high yields and productivities (optimal expt 15, 93% yield, 7.4 g/h) (Figure C,D). When plotting the data (Figure E) with respect to the three most impactful continuous variables (coupling temperature, coupling time, equiv 3:5), the refinement experiments (points with solid vertical lines) generally employed shorter coupling times and elevated temperatures to boost the reaction rate, yield, and productivity. The 3 mL reactor was key to providing sufficiently long residence times (∼1.6–2 min) at high flow rates to achieve greater conversion compared to the 1 mL reactor, which is likely why the algorithm selected the 3 mL module for each refinement experiment.
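One way to picture the initialization design described above (8 runs split evenly across the two activation reagents and the two reactor volumes, with the 1 mL runs executed first so that only one robotic reconfiguration is needed) is the following sketch. The text does not state how the continuous variables were sampled during initialization, so the uniform random sampling here is an assumption, as are the function name and the bounds on equivalents and coupling temperature.

```python
import itertools
import random

REAGENTS = ("EDC/HOBt", "HATU")
REACTOR_VOLUMES_ML = (1, 3)

# Bounds on the continuous variables. These are hypothetical placeholders
# chosen for illustration, not the limits used in the campaign.
BOUNDS = {
    "activation_time_min": (1.0, 5.0),
    "equiv_3_to_5": (1.0, 1.3),
    "coupling_temp_c": (25.0, 100.0),
}


def balanced_initialization(n_runs=8, seed=0):
    """Build initialization runs balanced over reagent and reactor volume.

    Continuous variables are drawn uniformly at random (an assumption).
    Runs are ordered by reactor volume so the reactor is swapped only once.
    """
    combos = list(itertools.product(REACTOR_VOLUMES_ML, REAGENTS))
    if n_runs % len(combos) != 0:
        raise ValueError("n_runs must be a multiple of the number of combos")
    rng = random.Random(seed)
    runs = []
    for volume, reagent in combos * (n_runs // len(combos)):
        run = {"reactor_volume_ml": volume, "reagent": reagent}
        for name, (low, high) in BOUNDS.items():
            run[name] = round(rng.uniform(low, high), 2)
        runs.append(run)
    # Python's sort is stable, so reagent order within each volume is kept.
    runs.sort(key=lambda r: r["reactor_volume_ml"])
    return runs


init_runs = balanced_initialization()
for number, run in enumerate(init_runs, start=1):
    print(number, run)
```

Grouping the runs by reactor volume reflects the practical point that each hardware change interrupts the campaign, so a design that is balanced but ordered keeps the number of reconfigurations to a minimum.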
Yield response surfaces (Figure H) generated using the Gaussian process model fitted to the experimental data help visualize the impact of coupling temperature and time. Model uncertainty is generally much lower (dark purple) for HATU than EDC/HOBt (more orange/yellow), since 11 out of 15 experiments were conducted with the optimal HATU reagent. Representative analytical data (Figure F,G) collected by the integrated LC-MS and FT-IR modules provided insight into the individual reactions within the multistep process. Comparing the LC profiles from the fully telescoped linear process and convergent process (Figure F) showed that guanidinium impurity 10 (red peak) formed by the undesired side-reaction between HATU and amine 4 was significantly reduced in the convergent process. Monitoring the activation reactor outlet using FT-IR module 2 (Figure G(b)) revealed that the steady-state concentration of the HATU activated ester 9c (purple profile) reached the same plateau for all HATU experiments which ranged from 1 to 5 min of activation time. This indicated that activation of acid 5 with HATU was rapid and proceeded essentially to full conversion even at 1 min. As a result of the controlled order of addition with the convergent process, minimal HATU proceeded downstream, which suppressed guanidinium side-product 10 and increased overall process yield. The first FT-IR module, which analyzed the liquid permeate from the nitro reduction and gas–liquid separator, played a key role in helping us verify in real-time that the Pd catalyst activity was stable as the campaign progressed. The diagnostic IR signal from 3 (Figure G(a), yellow profile) was at the baseline, which indicated high levels of conversion even at the highest liquid flow rates (∼0.8 mL/min). This was corroborated by the complementary LC-MS module downstream which also revealed complete reduction to amine 4. 
The enabling features of continuous processing for hydrogenation (enhanced gas–liquid–solid mass transfer, high local catalyst loading, and prompt telescoping of the oxidation-sensitive amine into the amide coupling) contributed to realizing high nitro reduction throughput and sonidegib productivity.

Conclusion

A CASP-proposed and human-refined multistep synthesis recipe for an exemplary small molecule, sonidegib 6, was experimentally optimized on a modular, robotic flow synthesis platform. Integrated analytics (FT-IR, LC-MS) facilitated data-rich experimentation and monitoring of multiple reactions (SNAr, nitro reduction, amide coupling) for thorough process understanding. Automation scripts for reaction execution, sampling, and data processing helped accelerate and reduce manual effort during experimentation. By utilizing a multi-objective Bayesian optimization algorithm that iteratively proposed new conditions to evaluate in order to optimize several objectives simultaneously (yield, productivity, cost), optimal settings for categorical (reagent choice) and continuous (temperature, time, stoichiometry) reaction conditions were identified to complete the synthesis recipe as well as to generate predictive mathematical process models. The platform’s hardware features, such as robotically reconfigurable reactor volumes (1 and 3 mL) and convergent synthesis capability, were essential for allowing variation of downstream residence time in multistep flow synthesis and controlling the order of addition to minimize unproductive reactivity. This work not only explored how algorithmic optimization can help fill in process details when integrating CASP tools with recipe-driven synthesis platforms but also identified several areas where human input is still needed. In multistep syntheses, any chemical incompatibilities between reagents across steps that are hard to anticipate a priori (e.g., SNAr byproduct leading to catalyst deactivation in the sonidegib 6 case study) pose a major challenge for one-pot or telescoped synthesis. Such cases may necessitate intermediate purification procedures which can be challenging to predict or develop, especially when nonchromatographic. 
Furthermore, procedural details such as concentration and solubility require additional experiments, while knowledge of the reaction mechanism and human interpretation of analytical data are needed to inform the order of reagent addition and strategies to eliminate side-reactions. Due to the limited data availability, it is currently not possible to plan every detail required to execute a synthesis without human intervention. However, this work exemplifies how machine assistance in the repetitive aspects of initial formulation, experimental execution, and data collection can help us focus on the application of domain knowledge, critical interpretation of data, and creative problem-solving where human input still provides the greatest utility.
