Piers A Townsend1,2, Elliot H E Farrar2, Matthew N Grayson2. 1. Centre for Sustainable Chemical Technologies, Department of Chemistry, University of Bath, Claverton Down, Bath BA2 7AY, U.K. 2. Department of Chemistry, University of Bath, Claverton Down, Bath BA2 7AY, U.K.
Abstract
Fast and accurate computational approaches to predicting reactivity in sulfa-Michael additions are required for high-throughput screening in toxicology (e.g., predicting excess aquatic toxicity and skin sensitization), chemical synthesis, covalent drug design (e.g., targeting cysteine), and data set generation for machine learning. The kinetic glutathione chemoassay is a time-consuming in chemico method used to extract kinetic data in the form of log(k GSH) for organic electrophiles. In this work, we use density functional theory to compare the use of transition states (TSs) and enolate intermediate structures following C-S bond formation in the prediction of log(k GSH) for a diverse group of 1,4 Michael acceptors. Despite the widespread use of transition state calculations in the literature to predict sulfa-Michael reactivity, we observe that intermediate structures show much better performance for the prediction of log(k GSH), are faster to calculate, and easier to obtain than TSs. Furthermore, we show how linear combinations of atomic charges from the isolated Michael acceptors can further improve predictions, even when using inexpensive semiempirical quantum chemistry methods. Our models can be used widely in the chemical sciences (e.g., in the prediction of toxicity relevant to the environment and human health, synthesis planning, and the design of cysteine-targeting covalent inhibitors), and represent a low-cost, sustainable approach to reactivity assessment.
Fast and accurate computational approaches to predicting reactivity in sulfa-Michael additions are required for high-throughput screening in toxicology (e.g., predicting excess aquatic toxicity and skin sensitization), chemical synthesis, covalent drug design (e.g., targeting cysteine), and data set generation for machine learning. The kinetic glutathione chemoassay is a time-consuming in chemico method used to extract kinetic data in the form of log(k GSH) for organic electrophiles. In this work, we use density functional theory to compare the use of transition states (TSs) and enolate intermediate structures following C-S bond formation in the prediction of log(k GSH) for a diverse group of 1,4 Michael acceptors. Despite the widespread use of transition state calculations in the literature to predict sulfa-Michael reactivity, we observe that intermediate structures show much better performance for the prediction of log(k GSH), are faster to calculate, and easier to obtain than TSs. Furthermore, we show how linear combinations of atomic charges from the isolated Michael acceptors can further improve predictions, even when using inexpensive semiempirical quantum chemistry methods. Our models can be used widely in the chemical sciences (e.g., in the prediction of toxicity relevant to the environment and human health, synthesis planning, and the design of cysteine-targeting covalent inhibitors), and represent a low-cost, sustainable approach to reactivity assessment.
Michael addition reactions, characterized
by nucleophilic attack
at the β-carbon of an α,β-unsaturated carbonyl compound,
have been widely used in synthesis for generating a variety of carbon-nucleophile
bond types (e.g., C–S, C–N, and C–C bonds).[1−3] The sulfa-Michael addition in particular is a highly important reaction
given its extensive use in organic synthesis[4−6] pharmacology,[7,8] toxicology,[9,10] and materials science.[11] Therefore, the ability to assess and predict
the reactivity of Michael acceptors (MAs) toward sulfur nucleophiles
is of paramount importance across a range of disciplines. In chemical
synthesis, predicting rates of reaction between MAs and a given nucleophile
would provide low-cost, quantitative predictions for novel synthetic
transformations. In toxicology, such predictions could be used in
the chemical risk assessment of aquatic toxicity and skin sensitization,
while aligning with the increasingly important “Green”
toxicology approach.[12−14] Lastly, in drug discovery, such predictions could
be used in the design of cysteine-targeting, targeted covalent inhibitors
(TCIs). TCIs are an emerging class of compounds in drug discovery
and recently, three FDA-approved drugs (Afatanib, Ibrutinib, and Osimertinib)
were designed to react irreversibly with a sulfur-containing cysteine
residue through a hetero-Michael addition.[15,16]In 2006, Schultz et al. presented a pioneering framework for
modeling
reactions between an electrophilic toxicant and a biological macromolecule,
many of which are nucleophilic.[17] This
framework proposed the use of model nucleophiles to better understand
reactive toxicity and the ultimate biological effects associated with
its existence. Traditionally, MA reactivity has been assessed using
chemoassays, with the kinetic glutathione chemoassay developed by
Böhme et al. being the most widely used to examine sulfa-Michael
additions.[18] This method has been used
extensively to obtain second-order rate constants [L mol–1 min–1] to assess the reactivity of different α,β-unsaturated
carbonyl compounds and uses glutathione (GSH) as the nucleophile (Figure ).[19] In 2016, Schürmann and co-workers compiled, to the
best of our knowledge, the largest known experimental kinetic glutathione
assay rate data set for α,β-unsaturated carbonyl compounds.[20] As we move toward a more sustainable future,
it is vital that in silico approaches are developed that align with
the principles of green chemistry. Predictive in silico approaches
provide advantages in the design of safer chemicals and less hazardous
syntheses, along with reducing harmful waste products that are commonly
utilized in experimental methods. Thus, to reduce the environmental,
financial, and time-based costs attached to chemoassays, there have
been numerous attempts to develop quantum chemical transition state
(TS) approaches to the prediction of MA reactivity data sets. In 2010,
Cronin and co-workers performed TS calculations for 22 MAs reacting
with methanethiolate.[21] They presented
a QSAR regression equation for the prediction of log(kGSH) that used the activation barrier ΔE‡PCM for both the forward and backward
reaction, and the solvent accessible surface area (SASA), resulting
in an impressive squared Pearson correlation coefficient (r2) of 0.90 between their calculated and experimental
values for log(kGSH):
Figure 1
Generalized reaction mechanism between glutathione
and a 1,4 Michael
acceptor. “Glut” represents a glutathione residue minus
its sulfur atom.
Generalized reaction mechanism between glutathione
and a 1,4 Michael
acceptor. “Glut” represents a glutathione residue minus
its sulfur atom.Although this work did provide strong regression
statistics, multiple
descriptors were required, and when only ΔE‡PCM and the SASA were utilized as features,
a relatively poor correlation (r2 = 0.51)
was observed. In the same year, Schürmann and co-workers published
a set of multidescriptor models based on ground state features such
as a charge-limited local electrophilicity index, a σ-bond energy
(obtained from natural bond order analysis), and SASAs for both the
α- and β-carbons.[22] Overall,
their models showed strong statistics, with a four-descriptor model
showing the best performance (r2 = 0.93).
In 2011, Mulliner et al. used TS calculations to predict log(kGSH) for 35 α,β-unsaturated carbonyl
compounds.[23] They also performed calculations
under the PCM solvent model to examine the effect of implicit solvation
on the prediction of log(kGSH). For solvated
and nonsolvated models, respectively, they obtained regression equations
with r2 = 0.76 and r2 = 0.68 between calculated and experimental log(kGSH) values. Following this in 2013, the same authors
calculated reaction barriers (ΔE‡) for a set of esters, and from ΔE‡, calculated log(kGSH) using their previously
developed regression model from 2011. This study examined the use
of log(kGSH) and hydrophobicity (in the
form of log(Kow), where Kow is the octanol–water partition coefficient)
for the prediction of 50% growth inhibition ofT. pyriformis (log(EC50)). It is thus clear that TSs have commonly
been used in assessing MA reactivity, but far fewer studies have examined
high energy intermediate structures (HEI, see Figure ) and their ability to predict reactivity.[24] In 2013, Enoch and Roberts performed DFT calculations
on a data set of 26 MAs and calculated their ΔEHEI values upon reaction with methanethiolate.[25] With the full data set, their results showed
a poor correlation (r2 = 0.02) between
skin sensitization potency (pEC3) and intermediate energies upon linear
regression analysis. Upon refining the data set and removing some
problematic compounds from the analysis, their results showed improved
correlation (r2 = 0.43) for a single descriptor
model using intermediate energies. This was further improved by adding
a SASA descriptor, resulting in a multivariate regression model with
strong statistics (r2 = 0.79). Further
in 2016, Ebbrell et al. developed an in silico profiler for the prediction
of RC50 values using intermediate MA structures, where
RC50 is the concentration of electrophiles needed to reduce
the concentration of GSH by 50%. Their results showed that a single
descriptor model using ΔEHEI resulted
in modest regression statistics (n = 54, r2 = 0.52). However, this model was greatly improved
(n = 41, r2 = 0.87) by
adopting a multivariate approach and including an additional SASA
descriptor.[10] In 2017, Ebbrell et al. presented
a thorough analysis on how their previously developed fragment profiler
could be used in the prediction of aquatic toxicity (towardT. pyriformis) and skin sensitization potential.[26] This study highlighted the crucial role of ΔEHEI regression models for the prediction of
important toxicological endpoints such as EC50/pEC50. Thus, it is clear that intermediates and ΔEHEI are of great use and importance in predictive
toxicology. Furthermore, intermediate structures are particularly
desirable from a practical standpoint; TS calculations are not only
computationally costly but are challenging for both experts and nonexperts
to perform.[27] Compared with TSs, intermediate
structure calculations involve optimization to a minimum, which is
typically less costly, easier to automate (e.g., for data set generation
in machine learning (ML)), and much easier to perform.[24] In this work, we find that easily calculable
intermediate energies perform better than TSs in the prediction of
log(kGSH). Furthermore, we find that linear
regression models built with very inexpensive features from semiempirical
quantum mechanical (SQM) calculations can make reliable predictions
of log(kGSH) comparable to methods using
DFT calculations, even when only reactant-derived features are used.
Figure 2
GSH-MA
reaction pathway showing the structure of the reactants,
intermediate, TS, and product.
GSH-MA
reaction pathway showing the structure of the reactants,
intermediate, TS, and product.
Materials and Methods
Kinetic glutathione assay data
(log(kGSH)) for 23 1,4 MAs were taken
from the work published by Böhme
et al., providing experimental rate data for nine esters, seven aldehydes,
and seven ketones (see Figure and Table S1 in the Supporting
Information).[20] DFT calculations were performed
with Gaussian 16 (Rev. A.03)[28] at the M06-2X/def2-TZVPP
level of theory under the IEFPCM implicit solvation model (water),
which has been used extensively for modeling organic reactions.[29−32] Implicit solvation in water was chosen to simulate the experimental
conditions used in the kinetic glutathione chemoassay. In line with
previous studies, and to ensure computational feasibility, methanethiolate
was used as a model nucleophile in all calculations.[10,25] All regression models were developed via the Scikit-learn Python
package.[33] Activation barriers and intermediate
energy differences were calculated according to:where ΔG‡ is the activation free energy, GTS is
the TS free energy, GMA is the free energy
of a MA, ΔGHEI is the free energy
difference between the intermediate and the reactant state, GInt is the free energy of the intermediate structure,
and Gnuc is the free energy of methanethiolate.
For full computational details, see the Supporting Information.
Figure 3
Visualizing log(kGSH) for
the 23 compounds
included in this study. Red indicates a higher reactivity with glutathione,
while green indicates lower reactivity. Blue compounds indicate that
minor truncation was performed (see the SI for full details).
Visualizing log(kGSH) for
the 23 compounds
included in this study. Red indicates a higher reactivity with glutathione,
while green indicates lower reactivity. Blue compounds indicate that
minor truncation was performed (see the SI for full details).
Results and Discussion
In total, across the data set,
94 reactant ground states, 226 TSs,
and 229 intermediate structures were obtained and verified as either
minima or first-order saddle points on the M06-2X/def2-TZVPP-IEFPCM
(water) potential energy surface. Two compounds were minorly truncated
in this work; 1-pentene-3-one to methyl vinyl ketone and trans-2-pentenal
to but-2-enal. Previous literature demonstrates that conformational
freedom is often neglected in the construction of QSAR models that
use descriptors derived from quantum chemical methods.[34−36] However, a more thorough exploration of chemical space is needed.
To illustrate this, six reactant conformers were obtained for 2-ethylacrolein,
and the lowest energy structure was 3.26 kcal/mol lower in energy
than the highest energy conformation. Thus, if single conformations
are examined, large errors can be introduced in the calculation of
thermochemical data. A further example includes the two reactant conformers
obtained for methyl methacrylate; a difference of 7.73 kcal/mol was
obtained between the two conformers. Thus, it is quite clear that
without direct consideration of molecular flexibility, calculated
activation barriers and relative intermediate energies can vary considerably.
TS Calculations
Across the data set, experimental rate
constants ranged from 3.1 to −2.15 log units (see Figure ). The more positive
a value of log(kGSH), the faster the reaction
between an MA and methanethiolate, and thus, the corresponding MA
is more reactive. Our calculated activation energies ranged from 9.7
to 16.1 kcal/mol across all compounds, with esters showing the largest
range among the three groups. A single descriptor linear regression
analysis showed a poor correlation between log(kGSH) and ΔG‡ (Figure ), with a squared
Pearson correlation coefficient (r2) of
0.49 between the predicted and measured log(kGSH). The resultant model demonstrates that TSs and free energies
of activation provide a poor prediction of log(kGSH) in the MA data set, with a relatively large test set mean
absolute error (MAE) of 0.69 log units. These results show good agreement
with previous work by Schwöbel et al., where it was shown that
at the B3LYP/6-31G** level of theory, no global model for esters,
ketones, and aldehydes exists that utilizes activation energies derived
from TSs.[21] Additionally, it has been previously
reported that DFT can prove problematic in the study of charged ionic
TSs due to large errors in the calculated reactivity parameters.[37] It is also known that most implicit solvent
models define the solvent cavity as a set of overlapping spheres with
fixed atomic radii. These radii are experimentally calculated, and
for IEFPCM, the Bondi data set is used; for each atom, the atomic
radius is calculated through multiplication of a constant (1.2) by
the Bondi atomic van der Waals radius.[38,39] These empirically
determined radii use solvation free energy training sets for parameterization,
and TSs are often not included in the initial training set, thereby
introducing error when TSs are examined with implicit solvent models.[40] Additionally, radii of the specific atoms involved
in the bond-forming/breaking process (e.g., S–C from our study)
are often poorly defined in TSs for this reason. It must also be noted
that from a practical perspective, explicit solvation is too costly
and would eliminate the easy-to-use nature of our models.
Figure 4
Linear regression
of log(kGSH) on the
activation energy derived from transition state structures (M06-2X/def2-TZVPP-IEFPCM
(water)) was performed. Predicted log(kGSH) is plotted against measured log(kGSH).
Linear regression
of log(kGSH) on the
activation energy derived from transition state structures (M06-2X/def2-TZVPP-IEFPCM
(water)) was performed. Predicted log(kGSH) is plotted against measured log(kGSH).Therefore, our work, combined with previous literature,
indicates
that a poor correlation between log(kGSH) and activation energy is likely to be independent of both basis
set and DFT functional. However, to further corroborate this, benchmarking
studies could be performed to thoroughly examine the role that level
of theory and solvation method plays in this problem. Upon examination
of our results and previously published work, it is clear that single-descriptor
ΔG‡ models do not provide
a sufficient description of MA reactivity toward sulfur nucleophiles
and should not be used for making predictions about the reactivity
of MAs with glutathione.
Intermediate Structures
The computed stability of the
enolate intermediates after C–S bond formation, which closely
resembles the TSs in line with Hammond’s postulate, may be
used to accurately predict the reactivity of MAs toward thiol containing
compounds.[41] Across the data set, calculated
intermediate energies ΔGHEI varied
from 2.4 to 13.3 kcal/mol. Intermediate energies for ketones ranged
from 2.4 to 11.6 kcal/mol, aldehydes ranged from 2.8 to 7.5 kcal/mol,
and esters ranged from 6.2 to 13.3 kcal/mol. A linear regression analysis
with ΔGHEI resulted in a significantly
improved model compared to TS barriers (Figure ), with r2= 0.76 and a test set MAE of 0.48 log units. This is a noteworthy
result; generally, it would be expected that TS barriers correlate
more strongly with kinetic data such as log(kGSH), if the rate-determining step has been modeled. As highlighted
by Schürmann and co-workers, the rate-determining step is the
addition of methanethiolate to the MA, and the protonation step (see Figure , TS-2) is expected
to be fast and not rate-limiting.[22] However,
to ensure we have focused our calculations on the rate-determining
step (TS-1), we calculated reaction barriers of the protonation step
(ΔG‡prot) for
six structures in our data set; two esters, two ketones, and two aldehydes,
with a fast and slow reacting compound being chosen for each (see Table S2). Methanethiol was the proton source
as per the work by Northrop and co-workers.[42] All calculated barriers for the protonation step were lower than
the corresponding barrier for the addition of methanethiolate by an
average value of 6.41 kcal/mol, thus confirming it highly likely that
thiolate addition is the rate-determining step. From these results,
it remains clear that, in line with the Hammond postulate, calculation
of ΔGHEI can provide a fast, high-throughput
measure of 1,4 MA reactivity toward methanethiolate (and thus, thiol
containing compounds), with our models permitting strong quantitative
predictions to be made for initial, preliminary reactivity assessments.
As detailed above, it is likely that the deficiencies of DFT TS modeling
account for why our intermediate models perform better than those
using TS-derived features.
Figure 5
Linear regression of log(kGSH) on the
intermediate energy differences (M06-2X/def2-TZVPP-IEFPCM (water)).
Predicted log(kGSH) is plotted against
measured log(kGSH).
Linear regression of log(kGSH) on the
intermediate energy differences (M06-2X/def2-TZVPP-IEFPCM (water)).
Predicted log(kGSH) is plotted against
measured log(kGSH).A key advantage in the development of this model
is its simplicity.
A single variable regression model that uses calculated intermediate
energies is very simple to use for both experts and nonexperts. TS
searching is a nontrivial task, involving specialist knowledge and
a high degree of human input to arrive at suitable structures. Optimization
of only reactant and intermediate structures is significantly easier
for the nonspecialist to make predictions. Minima, such as the intermediate
structures presented here, converge more readily, thereby providing
a great practical advantage for making predictions.[43] Successfully obtaining a set of TSs for a single structure
can take many rounds of re-optimization, while intermediate geometries
can be readily obtained after a single calculation. A generalized
workflow for obtaining intermediate structures was presented in one
of our previous studies, and can be equally applied in the context
of this study.[24] Our model presents a fast,
easy-to-use method for predicting the reactivity of 1,4 MAs with thiol
containing compounds.
Multivariate Models
To see if the intermediate or TS
models could be improved upon, a number of multivariate linear regression
models were also generated using key Mulliken atomic charges, APT
atomic charges (Figure ), and several other chemical features (see Supporting Information) for all MAs, intermediates, and TSs. With the
aim to create an easy-to-use method, Mulliken and APT charges, which
are readily extracted from Gaussian output files and are thus very
user friendly for the nonexpert performing calculations, were selected.
The best results were obtained using MA features only, with r2 = 0.88 and an MAE of 0.35 log units via a
three-descriptor model combining the Mulliken atomic charges of the
α- and β-carbons and the carbonyl oxygen (Figure ). Not only are these results
a significant improvement on the ΔGHEI model, but with fewer atoms and less conformational flexibility
than the corresponding intermediates, calculations on reactant MA
structures are even more trivial.
Figure 6
Mulliken and APT atomic charges were calculated
for highlighted
atoms only.
Figure 7
Linear regression of log(kGSH) on key
atomic charges of the MA (M06-2X/def2-TZVPP-IEFPCM (water)). Predicted
log(kGSH) is plotted against measured
log(kGSH).
Mulliken and APT atomic charges were calculated
for highlighted
atoms only.Linear regression of log(kGSH) on key
atomic charges of the MA (M06-2X/def2-TZVPP-IEFPCM (water)). Predicted
log(kGSH) is plotted against measured
log(kGSH).
Semiempirical Models
Finally, we reoptimized all MAs
and intermediates using the SQM AM1 method[44] and generated multivariate linear models using the same chemical
features as above. Because SQM calculations are typically orders of
magnitude faster than DFT, regression models derived from SQM calculations
could substantially reduce the computational expense of predictions.
A result of particular significance is the negative relationship observed
between log(kGSH) and the Mulliken charges
(of C1 and C2) in the regression equation (see Figure ). Although counterintuitive according to
chemical intuition, previous literature indicates that MAs have a
propensity to show interesting charge behavior on the α- and
β-carbons. Spencer et al. used 13C NMR chemical shifts
to examine the effect of electron-withdrawing groups (EWGs) at different
positions on the aryl group of methyl cinnamates upon reaction with
GSH.[45] The presence of EWGs resulted in
an increased rate of reaction with GSH, with the chemical shifts of
the β-carbon being shifted slightly upfield (less positively
charged). Additionally, although downfield shifts were observed for
the α-carbon resonances with an increasing rate of reaction,
no obvious statistical relationship was apparent between the rate
of GSH addition and the α-carbon chemical shifts. When ortho-hydroxyl
substitution on the aryl group was considered, a significant observation
of an upfield shift of 3–6 ppm was observed at the β-carbon.
These observations suggest that the electron distribution (and thus
charge) at the α- and β-carbons can behave in an unusual
way in the context of sulfa-Michael addition, and they agree with
our results. Further to this point, the regression equation demonstrates
a positive relationship between log(kGSH) and the APT charge on the carbonyl oxygen. It is possible that
due to the highly electronegative nature of oxygen, an increasingly
positive charge on the carbonyl oxygen is indicative of less overall
electron density in the conjugated α,β-unsaturated carbonyl
substructure. Thus, with lower electron density, the MA electrophile
should become more receptive to nucleophilic attack, and the rate
of reaction would increase as the charge becomes increasingly positive
on oxygen. Similar to the model presented in the previous section,
the best performing model utilized atomic charges on the α-
and β-carbons and the carbonyl oxygen of the MA, with r2 = 0.89 and an MAE of 0.37 log units (Figure ). This provides
very similar performance to the DFT MA model, and is better than the
single feature ΔGHEI model but reduces
the need for time-consuming DFT calculations. It is significant to
note that a combination of models with differing levels of theory,
opens up their use to a wider audience with differing computational
resources.
Figure 8
Linear regression of log(kGSH) on key
atomic charges of the MA (AM1). Predicted log(kGSH) is plotted against measured log(kGSH).
Linear regression of log(kGSH) on key
atomic charges of the MA (AM1). Predicted log(kGSH) is plotted against measured log(kGSH).
Conclusions
In this work, TS, intermediate, and reactant
structures were explored
in reactivity predictions (log(kGSH))
for a group of 23 sulfa-Michael additions. Such an approach to predicting
sulfa-Michael reactivity is desirable for high-throughput screening
in chemical synthesis, toxicology (aquatic toxicity and skin sensitization
prediction), drug discovery (targeted covalent inhibitor design),
and reactivity data set generation for ML. Further, our models provide
a practical advantage in shifting the focus toward more sustainable
approaches to chemical reactivity assessment. TSs were first considered,
and activation free energies showed poor predictive performance in
regression analyses toward log(kGSH) (r2 = 0.49, MAE = 0.69 log units). Intermediate
enolate structures that follow the C–S bond-forming TSs were
also considered and showed stronger predictive performance in regression
analyses toward log(kGSH) (r2 = 0.76, MAE = 0.48 log units). Intermediate structures
provide two key advantages over TSs: they are easier to compute and
provide increased calculation speeds. Using linear combinations of
purely reactant-derived chemical features, thus simplifying the calculations
even further, resulted in noticeable improvements to our models with
respective squared Pearson correlation coefficients and MAEs of 0.88
and 0.35 log units with DFT, and 0.89 and 0.37 log units with SQM.
The models presented here are fast and easy-to-use methods for predicting
log(kGSH).
Authors: G Patlewicz; A O Aptula; E Uriarte; D W Roberts; P S Kern; G F Gerberick; I Kimber; R J Dearman; C A Ryan; D A Basketter Journal: SAR QSAR Environ Res Date: 2007 Jul-Sep Impact factor: 3.000
Authors: David J Ebbrell; Judith C Madden; Mark T D Cronin; Terry W Schultz; Steven J Enoch Journal: Chem Res Toxicol Date: 2017-01-19 Impact factor: 3.739