Literature DB >> 35936424

Eliminating Transition State Calculations for Faster and More Accurate Reactivity Prediction in Sulfa-Michael Additions Relevant to Human Health and the Environment.

Piers A Townsend^1,2, Elliot H E Farrar², Matthew N Grayson².

Abstract

Fast and accurate computational approaches to predicting reactivity in sulfa-Michael additions are required for high-throughput screening in toxicology (e.g., predicting excess aquatic toxicity and skin sensitization), chemical synthesis, covalent drug design (e.g., targeting cysteine), and data set generation for machine learning. The kinetic glutathione chemoassay is a time-consuming in chemico method used to extract kinetic data in the form of log(k GSH) for organic electrophiles. In this work, we use density functional theory to compare the use of transition states (TSs) and enolate intermediate structures following C-S bond formation in the prediction of log(k GSH) for a diverse group of 1,4 Michael acceptors. Despite the widespread use of transition state calculations in the literature to predict sulfa-Michael reactivity, we observe that intermediate structures show much better performance for the prediction of log(k GSH), are faster to calculate, and easier to obtain than TSs. Furthermore, we show how linear combinations of atomic charges from the isolated Michael acceptors can further improve predictions, even when using inexpensive semiempirical quantum chemistry methods. Our models can be used widely in the chemical sciences (e.g., in the prediction of toxicity relevant to the environment and human health, synthesis planning, and the design of cysteine-targeting covalent inhibitors), and represent a low-cost, sustainable approach to reactivity assessment.

Entities: Chemical

Year: 2022 PMID： 35936424 PMCID： PMC9352231 DOI： 10.1021/acsomega.2c03739

Source DB: PubMed Journal: ACS Omega ISSN： 2470-1343

Introduction

Michael addition reactions, characterized by nucleophilic attack at the β-carbon of an α,β-unsaturated carbonyl compound, have been widely used in synthesis for generating a variety of carbon-nucleophile bond types (e.g., C–S, C–N, and C–C bonds).[1−3] The sulfa-Michael addition in particular is a highly important reaction given its extensive use in organic synthesis[4−6] pharmacology,[7,8] toxicology,[9,10] and materials science.[11] Therefore, the ability to assess and predict the reactivity of Michael acceptors (MAs) toward sulfur nucleophiles is of paramount importance across a range of disciplines. In chemical synthesis, predicting rates of reaction between MAs and a given nucleophile would provide low-cost, quantitative predictions for novel synthetic transformations. In toxicology, such predictions could be used in the chemical risk assessment of aquatic toxicity and skin sensitization, while aligning with the increasingly important “Green” toxicology approach.[12−14] Lastly, in drug discovery, such predictions could be used in the design of cysteine-targeting, targeted covalent inhibitors (TCIs). TCIs are an emerging class of compounds in drug discovery and recently, three FDA-approved drugs (Afatanib, Ibrutinib, and Osimertinib) were designed to react irreversibly with a sulfur-containing cysteine residue through a hetero-Michael addition.[15,16] In 2006, Schultz et al. presented a pioneering framework for modeling reactions between an electrophilic toxicant and a biological macromolecule, many of which are nucleophilic.[17] This framework proposed the use of model nucleophiles to better understand reactive toxicity and the ultimate biological effects associated with its existence. Traditionally, MA reactivity has been assessed using chemoassays, with the kinetic glutathione chemoassay developed by Böhme et al. being the most widely used to examine sulfa-Michael additions.[18] This method has been used extensively to obtain second-order rate constants [L mol–1 min–1] to assess the reactivity of different α,β-unsaturated carbonyl compounds and uses glutathione (GSH) as the nucleophile (Figure ).[19] In 2016, Schürmann and co-workers compiled, to the best of our knowledge, the largest known experimental kinetic glutathione assay rate data set for α,β-unsaturated carbonyl compounds.[20] As we move toward a more sustainable future, it is vital that in silico approaches are developed that align with the principles of green chemistry. Predictive in silico approaches provide advantages in the design of safer chemicals and less hazardous syntheses, along with reducing harmful waste products that are commonly utilized in experimental methods. Thus, to reduce the environmental, financial, and time-based costs attached to chemoassays, there have been numerous attempts to develop quantum chemical transition state (TS) approaches to the prediction of MA reactivity data sets. In 2010, Cronin and co-workers performed TS calculations for 22 MAs reacting with methanethiolate.[21] They presented a QSAR regression equation for the prediction of log(kGSH) that used the activation barrier ΔE‡PCM for both the forward and backward reaction, and the solvent accessible surface area (SASA), resulting in an impressive squared Pearson correlation coefficient (r2) of 0.90 between their calculated and experimental values for log(kGSH):

Figure 1

Generalized reaction mechanism between glutathione and a 1,4 Michael acceptor. “Glut” represents a glutathione residue minus its sulfur atom.

Generalized reaction mechanism between glutathione and a 1,4 Michael acceptor. “Glut” represents a glutathione residue minus its sulfur atom. Although this work did provide strong regression statistics, multiple descriptors were required, and when only ΔE‡PCM and the SASA were utilized as features, a relatively poor correlation (r2 = 0.51) was observed. In the same year, Schürmann and co-workers published a set of multidescriptor models based on ground state features such as a charge-limited local electrophilicity index, a σ-bond energy (obtained from natural bond order analysis), and SASAs for both the α- and β-carbons.[22] Overall, their models showed strong statistics, with a four-descriptor model showing the best performance (r2 = 0.93). In 2011, Mulliner et al. used TS calculations to predict log(kGSH) for 35 α,β-unsaturated carbonyl compounds.[23] They also performed calculations under the PCM solvent model to examine the effect of implicit solvation on the prediction of log(kGSH). For solvated and nonsolvated models, respectively, they obtained regression equations with r2 = 0.76 and r2 = 0.68 between calculated and experimental log(kGSH) values. Following this in 2013, the same authors calculated reaction barriers (ΔE‡) for a set of esters, and from ΔE‡, calculated log(kGSH) using their previously developed regression model from 2011. This study examined the use of log(kGSH) and hydrophobicity (in the form of log(Kow), where Kow is the octanol–water partition coefficient) for the prediction of 50% growth inhibition ofT. pyriformis (log(EC50)). It is thus clear that TSs have commonly been used in assessing MA reactivity, but far fewer studies have examined high energy intermediate structures (HEI, see Figure ) and their ability to predict reactivity.[24] In 2013, Enoch and Roberts performed DFT calculations on a data set of 26 MAs and calculated their ΔEHEI values upon reaction with methanethiolate.[25] With the full data set, their results showed a poor correlation (r2 = 0.02) between skin sensitization potency (pEC3) and intermediate energies upon linear regression analysis. Upon refining the data set and removing some problematic compounds from the analysis, their results showed improved correlation (r2 = 0.43) for a single descriptor model using intermediate energies. This was further improved by adding a SASA descriptor, resulting in a multivariate regression model with strong statistics (r2 = 0.79). Further in 2016, Ebbrell et al. developed an in silico profiler for the prediction of RC50 values using intermediate MA structures, where RC50 is the concentration of electrophiles needed to reduce the concentration of GSH by 50%. Their results showed that a single descriptor model using ΔEHEI resulted in modest regression statistics (n = 54, r2 = 0.52). However, this model was greatly improved (n = 41, r2 = 0.87) by adopting a multivariate approach and including an additional SASA descriptor.[10] In 2017, Ebbrell et al. presented a thorough analysis on how their previously developed fragment profiler could be used in the prediction of aquatic toxicity (towardT. pyriformis) and skin sensitization potential.[26] This study highlighted the crucial role of ΔEHEI regression models for the prediction of important toxicological endpoints such as EC50/pEC50. Thus, it is clear that intermediates and ΔEHEI are of great use and importance in predictive toxicology. Furthermore, intermediate structures are particularly desirable from a practical standpoint; TS calculations are not only computationally costly but are challenging for both experts and nonexperts to perform.[27] Compared with TSs, intermediate structure calculations involve optimization to a minimum, which is typically less costly, easier to automate (e.g., for data set generation in machine learning (ML)), and much easier to perform.[24] In this work, we find that easily calculable intermediate energies perform better than TSs in the prediction of log(kGSH). Furthermore, we find that linear regression models built with very inexpensive features from semiempirical quantum mechanical (SQM) calculations can make reliable predictions of log(kGSH) comparable to methods using DFT calculations, even when only reactant-derived features are used.

Figure 2

GSH-MA reaction pathway showing the structure of the reactants, intermediate, TS, and product.

Materials and Methods

Kinetic glutathione assay data (log(kGSH)) for 23 1,4 MAs were taken from the work published by Böhme et al., providing experimental rate data for nine esters, seven aldehydes, and seven ketones (see Figure and Table S1 in the Supporting Information).[20] DFT calculations were performed with Gaussian 16 (Rev. A.03)[28] at the M06-2X/def2-TZVPP level of theory under the IEFPCM implicit solvation model (water), which has been used extensively for modeling organic reactions.[29−32] Implicit solvation in water was chosen to simulate the experimental conditions used in the kinetic glutathione chemoassay. In line with previous studies, and to ensure computational feasibility, methanethiolate was used as a model nucleophile in all calculations.[10,25] All regression models were developed via the Scikit-learn Python package.[33] Activation barriers and intermediate energy differences were calculated according to:where ΔG‡ is the activation free energy, GTS is the TS free energy, GMA is the free energy of a MA, ΔGHEI is the free energy difference between the intermediate and the reactant state, GInt is the free energy of the intermediate structure, and Gnuc is the free energy of methanethiolate. For full computational details, see the Supporting Information.

Figure 3

Visualizing log(kGSH) for the 23 compounds included in this study. Red indicates a higher reactivity with glutathione, while green indicates lower reactivity. Blue compounds indicate that minor truncation was performed (see the SI for full details).

Results and Discussion

In total, across the data set, 94 reactant ground states, 226 TSs, and 229 intermediate structures were obtained and verified as either minima or first-order saddle points on the M06-2X/def2-TZVPP-IEFPCM (water) potential energy surface. Two compounds were minorly truncated in this work; 1-pentene-3-one to methyl vinyl ketone and trans-2-pentenal to but-2-enal. Previous literature demonstrates that conformational freedom is often neglected in the construction of QSAR models that use descriptors derived from quantum chemical methods.[34−36] However, a more thorough exploration of chemical space is needed. To illustrate this, six reactant conformers were obtained for 2-ethylacrolein, and the lowest energy structure was 3.26 kcal/mol lower in energy than the highest energy conformation. Thus, if single conformations are examined, large errors can be introduced in the calculation of thermochemical data. A further example includes the two reactant conformers obtained for methyl methacrylate; a difference of 7.73 kcal/mol was obtained between the two conformers. Thus, it is quite clear that without direct consideration of molecular flexibility, calculated activation barriers and relative intermediate energies can vary considerably.

TS Calculations

Across the data set, experimental rate constants ranged from 3.1 to −2.15 log units (see Figure ). The more positive a value of log(kGSH), the faster the reaction between an MA and methanethiolate, and thus, the corresponding MA is more reactive. Our calculated activation energies ranged from 9.7 to 16.1 kcal/mol across all compounds, with esters showing the largest range among the three groups. A single descriptor linear regression analysis showed a poor correlation between log(kGSH) and ΔG‡ (Figure ), with a squared Pearson correlation coefficient (r2) of 0.49 between the predicted and measured log(kGSH). The resultant model demonstrates that TSs and free energies of activation provide a poor prediction of log(kGSH) in the MA data set, with a relatively large test set mean absolute error (MAE) of 0.69 log units. These results show good agreement with previous work by Schwöbel et al., where it was shown that at the B3LYP/6-31G** level of theory, no global model for esters, ketones, and aldehydes exists that utilizes activation energies derived from TSs.[21] Additionally, it has been previously reported that DFT can prove problematic in the study of charged ionic TSs due to large errors in the calculated reactivity parameters.[37] It is also known that most implicit solvent models define the solvent cavity as a set of overlapping spheres with fixed atomic radii. These radii are experimentally calculated, and for IEFPCM, the Bondi data set is used; for each atom, the atomic radius is calculated through multiplication of a constant (1.2) by the Bondi atomic van der Waals radius.[38,39] These empirically determined radii use solvation free energy training sets for parameterization, and TSs are often not included in the initial training set, thereby introducing error when TSs are examined with implicit solvent models.[40] Additionally, radii of the specific atoms involved in the bond-forming/breaking process (e.g., S–C from our study) are often poorly defined in TSs for this reason. It must also be noted that from a practical perspective, explicit solvation is too costly and would eliminate the easy-to-use nature of our models.

Figure 4

Linear regression of log(kGSH) on the activation energy derived from transition state structures (M06-2X/def2-TZVPP-IEFPCM (water)) was performed. Predicted log(kGSH) is plotted against measured log(kGSH). Therefore, our work, combined with previous literature, indicates that a poor correlation between log(kGSH) and activation energy is likely to be independent of both basis set and DFT functional. However, to further corroborate this, benchmarking studies could be performed to thoroughly examine the role that level of theory and solvation method plays in this problem. Upon examination of our results and previously published work, it is clear that single-descriptor ΔG‡ models do not provide a sufficient description of MA reactivity toward sulfur nucleophiles and should not be used for making predictions about the reactivity of MAs with glutathione.

Intermediate Structures

The computed stability of the enolate intermediates after C–S bond formation, which closely resembles the TSs in line with Hammond’s postulate, may be used to accurately predict the reactivity of MAs toward thiol containing compounds.[41] Across the data set, calculated intermediate energies ΔGHEI varied from 2.4 to 13.3 kcal/mol. Intermediate energies for ketones ranged from 2.4 to 11.6 kcal/mol, aldehydes ranged from 2.8 to 7.5 kcal/mol, and esters ranged from 6.2 to 13.3 kcal/mol. A linear regression analysis with ΔGHEI resulted in a significantly improved model compared to TS barriers (Figure ), with r2= 0.76 and a test set MAE of 0.48 log units. This is a noteworthy result; generally, it would be expected that TS barriers correlate more strongly with kinetic data such as log(kGSH), if the rate-determining step has been modeled. As highlighted by Schürmann and co-workers, the rate-determining step is the addition of methanethiolate to the MA, and the protonation step (see Figure , TS-2) is expected to be fast and not rate-limiting.[22] However, to ensure we have focused our calculations on the rate-determining step (TS-1), we calculated reaction barriers of the protonation step (ΔG‡prot) for six structures in our data set; two esters, two ketones, and two aldehydes, with a fast and slow reacting compound being chosen for each (see Table S2). Methanethiol was the proton source as per the work by Northrop and co-workers.[42] All calculated barriers for the protonation step were lower than the corresponding barrier for the addition of methanethiolate by an average value of 6.41 kcal/mol, thus confirming it highly likely that thiolate addition is the rate-determining step. From these results, it remains clear that, in line with the Hammond postulate, calculation of ΔGHEI can provide a fast, high-throughput measure of 1,4 MA reactivity toward methanethiolate (and thus, thiol containing compounds), with our models permitting strong quantitative predictions to be made for initial, preliminary reactivity assessments. As detailed above, it is likely that the deficiencies of DFT TS modeling account for why our intermediate models perform better than those using TS-derived features.

Figure 5

Linear regression of log(kGSH) on the intermediate energy differences (M06-2X/def2-TZVPP-IEFPCM (water)). Predicted log(kGSH) is plotted against measured log(kGSH).

Linear regression of log(kGSH) on the intermediate energy differences (M06-2X/def2-TZVPP-IEFPCM (water)). Predicted log(kGSH) is plotted against measured log(kGSH). A key advantage in the development of this model is its simplicity. A single variable regression model that uses calculated intermediate energies is very simple to use for both experts and nonexperts. TS searching is a nontrivial task, involving specialist knowledge and a high degree of human input to arrive at suitable structures. Optimization of only reactant and intermediate structures is significantly easier for the nonspecialist to make predictions. Minima, such as the intermediate structures presented here, converge more readily, thereby providing a great practical advantage for making predictions.[43] Successfully obtaining a set of TSs for a single structure can take many rounds of re-optimization, while intermediate geometries can be readily obtained after a single calculation. A generalized workflow for obtaining intermediate structures was presented in one of our previous studies, and can be equally applied in the context of this study.[24] Our model presents a fast, easy-to-use method for predicting the reactivity of 1,4 MAs with thiol containing compounds.

Multivariate Models

To see if the intermediate or TS models could be improved upon, a number of multivariate linear regression models were also generated using key Mulliken atomic charges, APT atomic charges (Figure ), and several other chemical features (see Supporting Information) for all MAs, intermediates, and TSs. With the aim to create an easy-to-use method, Mulliken and APT charges, which are readily extracted from Gaussian output files and are thus very user friendly for the nonexpert performing calculations, were selected. The best results were obtained using MA features only, with r2 = 0.88 and an MAE of 0.35 log units via a three-descriptor model combining the Mulliken atomic charges of the α- and β-carbons and the carbonyl oxygen (Figure ). Not only are these results a significant improvement on the ΔGHEI model, but with fewer atoms and less conformational flexibility than the corresponding intermediates, calculations on reactant MA structures are even more trivial.

Figure 6

Mulliken and APT atomic charges were calculated for highlighted atoms only.

Figure 7

Linear regression of log(kGSH) on key atomic charges of the MA (M06-2X/def2-TZVPP-IEFPCM (water)). Predicted log(kGSH) is plotted against measured log(kGSH).

Mulliken and APT atomic charges were calculated for highlighted atoms only. Linear regression of log(kGSH) on key atomic charges of the MA (M06-2X/def2-TZVPP-IEFPCM (water)). Predicted log(kGSH) is plotted against measured log(kGSH).

Semiempirical Models

Finally, we reoptimized all MAs and intermediates using the SQM AM1 method[44] and generated multivariate linear models using the same chemical features as above. Because SQM calculations are typically orders of magnitude faster than DFT, regression models derived from SQM calculations could substantially reduce the computational expense of predictions. A result of particular significance is the negative relationship observed between log(kGSH) and the Mulliken charges (of C1 and C2) in the regression equation (see Figure ). Although counterintuitive according to chemical intuition, previous literature indicates that MAs have a propensity to show interesting charge behavior on the α- and β-carbons. Spencer et al. used 13C NMR chemical shifts to examine the effect of electron-withdrawing groups (EWGs) at different positions on the aryl group of methyl cinnamates upon reaction with GSH.[45] The presence of EWGs resulted in an increased rate of reaction with GSH, with the chemical shifts of the β-carbon being shifted slightly upfield (less positively charged). Additionally, although downfield shifts were observed for the α-carbon resonances with an increasing rate of reaction, no obvious statistical relationship was apparent between the rate of GSH addition and the α-carbon chemical shifts. When ortho-hydroxyl substitution on the aryl group was considered, a significant observation of an upfield shift of 3–6 ppm was observed at the β-carbon. These observations suggest that the electron distribution (and thus charge) at the α- and β-carbons can behave in an unusual way in the context of sulfa-Michael addition, and they agree with our results. Further to this point, the regression equation demonstrates a positive relationship between log(kGSH) and the APT charge on the carbonyl oxygen. It is possible that due to the highly electronegative nature of oxygen, an increasingly positive charge on the carbonyl oxygen is indicative of less overall electron density in the conjugated α,β-unsaturated carbonyl substructure. Thus, with lower electron density, the MA electrophile should become more receptive to nucleophilic attack, and the rate of reaction would increase as the charge becomes increasingly positive on oxygen. Similar to the model presented in the previous section, the best performing model utilized atomic charges on the α- and β-carbons and the carbonyl oxygen of the MA, with r2 = 0.89 and an MAE of 0.37 log units (Figure ). This provides very similar performance to the DFT MA model, and is better than the single feature ΔGHEI model but reduces the need for time-consuming DFT calculations. It is significant to note that a combination of models with differing levels of theory, opens up their use to a wider audience with differing computational resources.

Figure 8

Linear regression of log(kGSH) on key atomic charges of the MA (AM1). Predicted log(kGSH) is plotted against measured log(kGSH).

Conclusions

In this work, TS, intermediate, and reactant structures were explored in reactivity predictions (log(kGSH)) for a group of 23 sulfa-Michael additions. Such an approach to predicting sulfa-Michael reactivity is desirable for high-throughput screening in chemical synthesis, toxicology (aquatic toxicity and skin sensitization prediction), drug discovery (targeted covalent inhibitor design), and reactivity data set generation for ML. Further, our models provide a practical advantage in shifting the focus toward more sustainable approaches to chemical reactivity assessment. TSs were first considered, and activation free energies showed poor predictive performance in regression analyses toward log(kGSH) (r2 = 0.49, MAE = 0.69 log units). Intermediate enolate structures that follow the C–S bond-forming TSs were also considered and showed stronger predictive performance in regression analyses toward log(kGSH) (r2 = 0.76, MAE = 0.48 log units). Intermediate structures provide two key advantages over TSs: they are easier to compute and provide increased calculation speeds. Using linear combinations of purely reactant-derived chemical features, thus simplifying the calculations even further, resulted in noticeable improvements to our models with respective squared Pearson correlation coefficients and MAEs of 0.88 and 0.35 log units with DFT, and 0.89 and 0.37 log units with SQM. The models presented here are fast and easy-to-use methods for predicting log(kGSH).

31 in total

1. A new polymer-anchored chiral catalyst for asymmetric Michael addition reactions.

Authors: G Sundararajan; N Prabagaran
Journal: Org Lett Date: 2001-02-08 Impact factor: 6.005

2. Quantum descriptors for predictive toxicology of halogenated aliphatic hydrocarbons.

Authors: S Trohalaki; R Pachter
Journal: SAR QSAR Environ Res Date: 2003-04 Impact factor: 3.000

3. An evaluation of selected global (Q)SARs/expert systems for the prediction of skin sensitisation potential.

Authors: G Patlewicz; A O Aptula; E Uriarte; D W Roberts; P S Kern; G F Gerberick; I Kimber; R J Dearman; C A Ryan; D A Basketter
Journal: SAR QSAR Environ Res Date: 2007 Jul-Sep Impact factor: 3.000

4. Validation of a Fragment-Based Profiler for Thiol Reactivity for the Prediction of Toxicity: Skin Sensitization and Tetrahymena pyriformis.

Authors: David J Ebbrell; Judith C Madden; Mark T D Cronin; Terry W Schultz; Steven J Enoch
Journal: Chem Res Toxicol Date: 2017-01-19 Impact factor: 3.739

5. Vinylidene Homologation of Boronic Esters and its Application to the Synthesis of the Proposed Structure of Machillene.

Authors: James M Fordham; Matthew N Grayson; Varinder K Aggarwal
Journal: Angew Chem Int Ed Engl Date: 2019-09-18 Impact factor: 15.336

10. Chemoavailability of Organic Electrophiles: Impact of Hydrophobicity and Reactivity on Their Aquatic Excess Toxicity.

Authors: Alexander Böhme; Anja Laqua; Gerrit Schüürmann
Journal: Chem Res Toxicol Date: 2016-05-13 Impact factor: 3.739