Brittany C Haas, Adam E Goetz, Ana Bahamonde, J Christopher McWilliams, Matthew S Sigman.
Abstract
Amides are ubiquitous in biologically active natural products and commercial drugs. The most common strategy for introducing this functional group is the coupling of a carboxylic acid with an amine, which requires the use of a coupling reagent to facilitate elimination of water. However, the optimal reaction conditions often appear rather arbitrary to the specific reaction. Herein, we report the development of statistical models correlating measured rates to physical organic descriptors to enable the prediction of reaction rates for untested carboxylic acid/amine pairs. The key to the success of this endeavor was the development of an end-to-end data science–based workflow to select a set of coupling partners that are appropriately distributed in chemical space to facilitate statistical model development. By using a parameterization, dimensionality reduction, and clustering protocol, a training set was identified. Reaction rates for a range of carboxylic acid and primary alkyl amine couplings utilizing carbonyldiimidazole (CDI) as the coupling reagent were measured. The collected rates span five orders of magnitude, confirming that the designed training set encompasses a wide range of chemical space necessary for effective model development. Regressing these rates with high-level density functional theory (DFT) descriptors allowed for identification of a statistical model wherein the molecular features of the carboxylic acid are primarily responsible for the observed rates. Finally, out-of-sample amide couplings are used to determine the limitations and effectiveness of the model.
Keywords: amide coupling; data science; reactivity
Year: 2022; PMID: 35412905; PMCID: PMC9169781; DOI: 10.1073/pnas.2118451119
Source DB: PubMed; Journal: Proc Natl Acad Sci U S A; ISSN: 0027-8424; Impact factor: 12.779
Fig. 1. (A) Project workflow. (B) PCA plots showing clusters by color for carboxylic acids in 22 clusters and primary alkyl amines in 16 clusters. Three PCs describe 57.2% (72.6% in 5 PCs) of the variance in the acids and 59.5% (78.1% in 5 PCs) of the variance in the amines. (C) Selected carboxylic acids and (D) primary alkyl amines for training set substrates.
Fig. 2. (A) Reaction under study. (B) Amine-quenching reaction, noting colored amine adduct 7 and internal standard (8, 2,6-dinitrotoluene). (C) Alternative amine consumption pathways by CO2 and CDI.
Fig. 3. MLR equation and model plotted as predicted vs. measured ln(k), with representations of the molecular descriptors used in the model. Reaction half-life (t1/2), calculated for second-order kinetics, and time to 97% conversion (denoted with superscript letters a and b, respectively) are given for reactions with initial amine and CDI concentrations of 0.5 M.
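The half-life and time-to-conversion values mentioned in the Fig. 3 caption can be illustrated with the standard integrated second-order rate law. The sketch below is not the authors' code. It assumes a rate law of the form rate = k[A][B] with both reacting partners at the same initial concentration (0.5 M by default), so that 1/[A] − 1/[A]0 = kt, and it assumes k is expressed in M⁻¹ per unit time (the returned times carry that same time unit). The function names are hypothetical.

```python
import math


def second_order_half_life(k, c0):
    """Half-life for A + B -> P with rate = k[A][B] and [A]0 = [B]0 = c0.

    Assumption: equimolar partners, so t1/2 = 1 / (k * c0).
    """
    return 1.0 / (k * c0)


def time_to_conversion(k, c0, conversion):
    """Time to reach fractional conversion x under the same assumptions.

    From 1/[A] - 1/c0 = k*t with [A] = (1 - x) * c0:
        t = x / ((1 - x) * k * c0)
    """
    if not 0.0 <= conversion < 1.0:
        raise ValueError("conversion must be in [0, 1)")
    return conversion / ((1.0 - conversion) * k * c0)


def times_from_ln_k(ln_k, c0=0.5, conversion=0.97):
    """Convert a (predicted or measured) ln(k) into (t1/2, t at conversion)."""
    k = math.exp(ln_k)  # assumed units: M^-1 per unit time
    return second_order_half_life(k, c0), time_to_conversion(k, c0, conversion)
```

Under these assumptions the time to 97% conversion is always 0.97/0.03 ≈ 32.3 half-lives, independent of k and of the initial concentration.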
Fig. 4. (A) MLR model, using the same descriptors as the Fig. 3 model, used to predict rates of external validation amide couplings. (B) Model retrained on all training and external validation data (pseudorandom 70:30 training/test split), where x indicates a coupling previously in the external validation set. (C) External validation couplings with their measured rates and predicted rates based on the MLR model in A and B. (D) Living model concept schematic.
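A pseudorandom 70:30 training/test split of the kind named in the Fig. 4 caption might be sketched as follows. The source does not say how the split was generated, so the seed, the function name, and the coupling labels here are all hypothetical placeholders rather than the authors' procedure.

```python
import random


def split_train_test(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of items with a seeded RNG and cut it at train_fraction.

    The fixed seed (an assumption) makes the pseudorandom split reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical labels standing in for the pooled training and validation couplings.
couplings = [f"coupling_{i}" for i in range(1, 21)]
train, test = split_train_test(couplings)
```

With 20 placeholder couplings this yields 14 training and 6 test entries; rerunning with the same seed reproduces the same partition.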