Literature DB >> 33534569

Nucleophilicity Prediction via Multivariate Linear Regression Analysis.

Manuel Orlandi1,2, Margarita Escudero-Casao1,2, Giulia Licini1,2.   

Abstract

The concept of nucleophilicity is at the basis of most transformations in chemistry. Understanding and predicting the relative reactivity of different nucleophiles is therefore of paramount importance. Mayr's nucleophilicity scale likely represents the most complete collection of reactivity data, which currently includes over 1200 nucleophiles. Several attempts have been made to theoretically predict Mayr's nucleophilicity parameters N based on calculation of molecular properties, but a general model accounting for different classes of nucleophiles could not be obtained so far. We herein show that multivariate linear regression analysis is a suitable tool for obtaining a simple model predicting N for virtually any class of nucleophiles in different solvents for a set of 341 data points. The key descriptors of the model were found to account for the proton affinity, solvation energies, and sterics.

Entities:  

Year:  2021        PMID: 33534569      PMCID: PMC7901016          DOI: 10.1021/acs.joc.0c02952

Source DB:  PubMed          Journal:  J Org Chem        ISSN: 0022-3263            Impact factor:   4.354


Introduction

The term nucleophile (and electrophile) was first introduced by Ingold in 1933.[1,2] Since then, the field of physical organic chemistry has grown shaping the thinking of organic chemists, and the concept of nucleophilicity established roots becoming one of the basis of organic chemistry. In the attempt to establish general reactivity rules, several groups developed different nucleophilicity scales. The Swain–Scott[3] and the Ritchie[4,5] equations are examples of early contributions in the field. However, a more general scale was more recently found to rely on the Mayr–Patz[6,7] equation (eq )where k is the reaction rate constant, N is the nucleophilicity parameter, E is the electrophilicity parameter, and sN is a nucleophile-specific sensitivity parameter. Equation was found to provide linear correlations for more than 1200 nucleophiles and 330 electrophiles over a reactivity range greater than 35 orders of magnitude.[7c] The reasons for such long linear correlations and the physical basis of these parameters are still not fully understood.[7d,8] However, the advantages provided by the Mayr–Patz reactivity scale from an experimental perspective are unquestionable.[9] Given N, sN, and E for two reaction partners, it is possible to predict semi-quantitatively their reaction rate constant k at 20 °C. Moreover, comparison of N for different nucleophiles (or E for different electrophiles) gives direct evaluation of their relative reactivity. The determination of N and sN parameters from experimentally measured rate data for a nucleophile requires the kinetic analysis of the reaction with a series of suitable reference electrophiles of known electrophilicity E.[6a,7d] However, this experimental procedure is time-consuming in the context of predicting/screening the reactivity of compounds outside the database. Given the increasing precision of quantum chemical calculations, the kinetics and thermodynamics of most organic reactions can now be evaluated in reasonable time. Therefore, several groups employed computational methods to evaluate Mayr’s reactivity parameters for different species. The groups of Houk and Mayr reported that the electrophilicities E of benzhydrylium ions correlate with their affinities for CH3–, OH–, and H–.[8a] Contreras, Pérez, Gázquez, and Fujiyama explored interesting trends between E and N with hard and soft acids and bases theory (HSAB) indices.[10] Fu and Liu evaluated N, sN, and E for a set of nucleophiles and electrophiles by reproducing the experimental procedure in silico via calculation of reaction energy barriers.[8d] These studies showed that benzhydrylium electrophilicities E correlate with simple descriptors such as the lowest unoccupied molecular orbital (LUMO) energy or the Mulliken atomic charge at the electrophilic C atom. However, Michael acceptors did not follow the same trend and were shown to correlate with their methyl anion affinities instead.[8a,11] Similarly, nucleophiles provided fair correlations with the highest occupied molecular orbital (HOMO) energy only when divided in structurally similar subsets over a relatively small range of N values. A general correlation accounting for different types of nucleophiles could not be found so far. This is likely due to the structural diversity of nucleophiles in the database with respect to the electrophiles. In fact, these include π-, n-, and σ-nucleophiles of different functional groups, charges (neutral or anionic nucleophiles), and steric properties. Thus, in order to access a general and simple approach for the prediction of nucleophilic reactivity, we reasoned that such a variety of features affecting reactivity would be better described by a multidimensional correlation. Ideally, each electronic, steric, or solvation factor affecting nucleophilicity could be described by an appropriate parameter and embodied into a polynomial equation via multivariate regression analysis as described by Sigman and co-workers.[12] Analysis of the model equation obtained could also provide insights regarding the significance of each descriptor for a given class of nucleophiles, thus deciphering the origin of its reactivity (Figure ).[12]
Figure 1

Rational of the work and parameters obtained.

Rational of the work and parameters obtained. According to eq , relative nucleophilic reactivities are controlled not only by the nucleophilicity parameter N but also by the susceptibility sN, with the consequence that a nucleophile A which reacts faster than B with one electrophile may react slower than B with another electrophile. This is graphically translated in line crossings in log k versus E correlations.[7a,7b] However, by using a floating scale of reference electrophiles for determining N according to eq , far-reaching extrapolations are avoided with the result that the nucleophilicity parameter N represents a good approximation for average nucleophilic reactivities. Being aware of this limitation, we will focus on the interpretation of N in this work and refer to the Supporting Information for evaluation of N·sN as a possible alternative.

Results and Discussion

The diversity of these nucleophiles poses a challenge for their parameterization as this is most often performed on sets of similar compounds in order to avoid biases given by major perturbations to the structure.[8d,10] How can steric descriptors be calculated consistently for different classes of compounds? For instance, which bond should be chosen for obtaining steric parameters in olefins to obtain meaningful description of sterics when compared to NHCs or amines? To solve this problem, we reasoned that consistent parameters could be calculated from the “electrophile perspective”. Therefore, the structure of the protonated nucleophiles (, Figure ) was optimized at the M06-2X/def2TZVP level of theory alongside with the plain nucleophile structures (, Figure ). In the general structure , the proton H represents a probe electrophile which is common for every nucleophile (N is independent from the electrophile) and allows for obtaining consistent parameters through the whole nucleophile data set. The computed parameters include the following (Figure ): % V, buried volume of a sphere centered on the electrophile H in ;[13]B1 and B5, sterimol minimum and maximum width of the nucleophile, respectively, measured from the H– bond in considering atomic van der Waals radii (i.e., least and most distant points from the bond trajectory of the van der Waals isosurface’s projection on the plane orthogonal to ); L,[14] sterimol length of the nucleophile measured from the H– bond in ;[14]NBO and MK, natural bond orbital[15] and Merz–Kollman[16] charges of the nucleophilic atom in , respectively; e and e, HOMO and LUMO energies of , respectively; e and e, HOMO and LUMO energies of , respectively;[17]μ and η, chemical potential and hardness of according to HSAB, respectively;[18]E, protonation energy of (calculated as the difference between the electronic energy of and ), that is, the negative of the proton affinity; S, S, and S, solvation energy of , , and H+, respectively, calculated by single-point energy calculation at the M06-2X/def2TZVP [SMD = solvent] level on the M06-2X/def2TZVP structure optimized in the gas phase.[19] Additional convenient descriptors were added to the parameter matrix in order to provide possible corrections to the models. These are q, the nucleophile charge accounting for rough electrostatic effects, and ε, the solvent dielectric constant. We obtained these parameters for 264 nucleophiles in different solvents for a total number of 341 data points from Mayr’s data set, which include 41 olefins,[7b,20] 38 (hetero)aromatics,[7b,20,21] 22 Si-enolates (silyl ketene acetals, silyl enol ethers),[7b,20,22] 35 enamines,[7b,20a,23] 26 ylides,[24] 5 NHCs,[25] 31 anionic C-nucleophiles (carbanions),[26] 67 N-nucleophiles (aliphatic amines, pyridines, imidazoles, etc.),[7c,7e,27] 27 anionic N-nucleophiles,[28] 41 O-nucleophiles (phenolates and carboxylates),[29a,29b] and 8 S-nucleophiles.[29c,29d] These were chosen over a nucleophilicity range of 35 N units in 7 different solvents (DCM, ACN, THF, acetone, DMF, DMSO, and water). The full list of the nucleophiles included in this study is given in Chart .
Chart 1

Nucleophiles Included in This Study

Preliminary analysis of the computed descriptors via single-parameter correlation against N showed that E provides a good basis for further multidimensional modeling. The plot reported in Figure a shows a fair correlation (R2 = 0.86) when removing anionic and N(sp3)-nucleophiles from the data set. Interestingly, this latter subset gave a parallel trend with a similar slope. Anionic nucleophiles follow a separate trend and do not correlate. As recently shown by Van Vranken in the context of electrophilicity E,[11b] this is due to stronger solvation effects at place in the case of charged species. Upon addition of solvation descriptors S and S to E, the raw model in Figure b is obtained (eq 2). Thus, inclusion of solvent effects increased the quality of the fit (R2 = 0.87) and allowed maintaining all the nucleophiles into the same correlation. Aiming at deeper understanding of the factors implicated in the correlation between N and E, we performed a multivariate linear regression analysis using E as the observable and the other acquired descriptors as the parameters. Since E is the difference between the energies of and by definition, it is not surprising that the regression rendered a linear combination of e and e with similar coefficients (R2 = 0.98, Figure ).[17]
Figure 2

(a) Correlation between the nucleophilicity and the proton affinity E of the nucleophiles. (b) Raw model obtained by adding solvation effects to E. The number of data points for each class of compounds is reported in parenthesis.

Figure 3

Correlation showing the major contribution of e and e to the parameter E. Orange crosses refer to those nucleophiles for which the σ* orbital energy was considered instead of e (vide infra).

(a) Correlation between the nucleophilicity and the proton affinity E of the nucleophiles. (b) Raw model obtained by adding solvation effects to E. The number of data points for each class of compounds is reported in parenthesis. Correlation showing the major contribution of e and e to the parameter E. Orange crosses refer to those nucleophiles for which the σ* orbital energy was considered instead of e (vide infra). This suggests that E (and therefore N) depends not only on the activation of the reagent (high e) but also on the electronic stabilization of the product (high e). The highlighted data points in the plot (orange crosses) identify those cases in which the LUMO of does not correspond to the reacting MO. These can be corrected to the shown values by changing such e values with the energy of the MO corresponding to the formed σ* orbital (see the Supporting Information). Interestingly, even though e and e are related (e is obtained by mixing e into the electrophile’s LUMO), these two parameters do not correlate among them for the whole data set (see the Supporting Information). Equation 2 in Figure b allows establishing a simple linear free-energy relationship. According to the thermodynamic cycle for a generic protonation reaction in the solvent phase (s), eq holds, where thermal and entropic factors are neglected in E.[30] Substitution with N according to eq 2 gives eq , which upon differentiation with respect to the nucleophile structural variation gives eq Equation is reminiscent of the linear free-energy relationship αδΔG = δΔG, where N, being proportional to log k, is also related to ΔG(s). Thus, eq 2 correlates N with a set of parameters reproducing the nucleophile computed relative pKa’s. This could explain why S-nucleophiles are moderate outliers in the plot in Figure b as these are known to follow a reversed H- versus C-basicity trend with respect to O-nucleophiles, for instance.[31a] This is likely due to solvation factors as Arnett showed that in the gas phase, Me2S and MeSH have higher proton affinity than Me2O and MeOH, respectively.[31b] For this reason, the methyl cation affinity was computed for a subset of 96 nucleophile/solvent combinations including all the S-nucleophiles. However, these were found to follow a similar trend to the proton affinities without providing any improvement (see the Supporting Information). The correlation in Figure b is in line with previous work by Bordwell and co-workers, who found a correlation between the nucleophilicity toward butyl chloride and the pKa of a set of fluorene-derived anionic nucleophiles.[32] Interestingly, they also found that different sets of nucleophiles follow different correlations depending on steric effects.[32] Mayr and co-workers found several correlations between N and experimental pKa values. However, they showed in different instances that such correlations are dependent on several experimental factors including the steric and solvent.[33] This prompted us to include additional parameters into our correlation via multidimensional correlation analysis in order to access an increased predictive power. We divided randomly our data set into a training set (222 data points) and an external validation set (119 data points). Submitting the N array and the parameter matrix of the training set to multivariate linear regression resulted in the model depicted in Figure (eq 6). In addition to E, S, and S, the parameters q, B1, % V, e, ε, and S were found to be relevant descriptors for N (see the Supporting Information). B1 and % V describe the steric hindrance of the nucleophile and suggest this to impact its reactivity. The magnitude of the coefficients (0.18 and −0.17 for B1 and % V, respectively) suggests that steric effects are much less important than electronics in this instance. It is worth mentioning that the combination of steric parameters rendered by the multivariate regression procedure is not a general representation of the nucleophile steric hindrance. This is related to the steric effects arising when the nucleophile is combined with benzhydrylium cations (or structurally related quinone methides). Different steric effects may be expected to impact the reaction rate when used in combination with a different set of electrophiles![34] Such effects also include noncovalent interactions that can be specific for each nucleophile–electrophile pair.[35] Additional considerations can be made concerning the inclusion of the other parameters. The correlation in Figure b, on which the multidimensional model is based, is purely thermodynamic in nature. That is, the kinetic behavior of different nucleophiles is correlated with a thermodynamic property: the pKa. This is a limitation as it is known that in some cases, Marcus’ intrinsic barriers ΔG are not colinear and not negligible with respect to the reaction free energy ΔG. This plays a crucial role in ambident reactivity.[29d,36] Mayr’s N values also bring information regarding the solvation energy of the reference electrophile, the E value of which is defined to be independent from the solvent. In addition, N, sN, and E are derived via fitting of experimental k-values, and the parameters calculated hereby are dependent on the used computational method. These procedures do not come without errors. These considerations provide a rationalization for the requirement for additional parameters that are needed to provide fine tuning of the model (q, e, ε, and S).
Figure 4

(a) Multidimensional model for Mayr’s nucleophilicity (training set: blue dots) and cross validation by external predictions (orange crosses). (b) Analysis of the residuals plotted by the nucleophile class as the training set (dots) or external predictions (crosses). For the color legend, see Figure .

(a) Multidimensional model for Mayr’s nucleophilicity (training set: blue dots) and cross validation by external predictions (orange crosses). (b) Analysis of the residuals plotted by the nucleophile class as the training set (dots) or external predictions (crosses). For the color legend, see Figure . Our model shows an excellent quality of fit (R2 = 0.935, Figure a) and robustness as suggested by cross validation analyses (leave-one-out, Q2 = 0.927 and average-2-fold, Q2 = 0.922). Moreover, the root-mean-square error (RMSE) associated to the fitting is as low as 1.41, suggesting good predictive quality. The much lower RMSE given by eq 6 compared to the one provided by eq 2 (2.56) supports the requirement for the additional parameters for fine tuning of the nucleophilicity description. Prediction of the nucleophilicities of the validation set with the model eq 6 resulted in a fit with similar properties to the training set (R2 = 0.946 and RMSE = 1.40). The plot of the residuals for both training (circles) and validation sets (crosses) is reported in Figure b and divided in subsets according with the color legend in Figure b. The residual plot shows that the error roughly increases with the nucleophilicity, thus making prediction of anionic nucleophiles more challenging. Modeling of the product N·sN showed that this could arise from neglecting sN for strong nucleophiles as the sensitivity parameter becomes more important as the nucleophilicity N increases (see the Supporting Information). The RMSE for each class of nucleophiles is also reported in Figure . Overall, the nucleophilicity of 295 nucleophiles (87% of the set explored) could be predicted with an error <2.5 units.

Conclusions

In summary, we have shown that parameterization of a variety of nucleophiles featuring a wide structural diversity is possible. This adds to recent advances into the development of holistic multidimensional models.[37] Multivariate linear regression analysis of Mayr’s nucleophilicity N with the acquired parameters allowed establishing a multidimensional model accounting for the reactivity of >300 nucleophiles over a range of 35 orders of magnitude. The model is statistically sound and provides high predicting power (RMSE < 1.5). Moreover, the descriptors appearing in the model are informative regarding the key factors affecting nucleophilicity. The nucleophile proton affinity (−E) and the solvation energy of both the nucleophile (S) and the addition product (S) appear as the main parameters. As these were shown to coincide with the pKa, this suggests that thermodynamics is a key factor contributing to the reaction rate constant k in agreement with the Bell–Evans–Polanyi principle. However, other minor parameters were rendered by multidimensional correlation analysis, which account for several effects such as solvation, sterics, and electrostatic attraction that might play a role at the level of intrinsic activation energies (e.g., deviations from the Bell–Evans–Polanyi principle according with the Marcus theory). We also showed that E can be decomposed into two major contributors, which are the HOMO energy of the nucleophile and the LUMO energy of the addition product. This observation suggests that describing general nucleophilicity as a sole function of one nucleophile feature (e.g., HOMO energy) is reductive. This could be done only in specific cases in which e and e are colinear, that is, in the case of structurally similar subsets. In these cases, factors contributing to e (change of geometry, overlap integral, mixing of other orbitals, etc.) can be neglected. Due to the high impact that Mayr’s reactivity scale has been having since its development, we foresee that this work will support its implementation in the chemists’ everyday effort toward exploring and rationalizing new chemical reactivity.

Experimental Section

All computations were performed using the Gaussian 16[38] suite of programs. The nucleophile conformational space was explored manually as most nucleophiles included in this study are small, simple molecules with only few rotatable bonds. Geometry optimizations, vibrational frequencies, and intensities were calculated at the M06-2X/def2TZVP level of theory.[39] All the structures computed in this study are minima as confirmed by vibrational analysis showing no imaginary frequencies. NBO charges were computed using NBO 3.1 implemented in Gaussian 16.[40] Buried volumes were computed using SambVca 2.1.[41] Solvation energies were computed by single-point energy calculation on the M06-2X/def2TZVP geometry at the same theory level with the addition of the SMD[42] solvation model implemented in Gaussian 16. Multidimensional regression analyses were performed using Matlab.[43] Further details are given in the Supporting Information.
  71 in total

1.  Conceptual density functional theory.

Authors:  P Geerlings; F De Proft; W Langenaeker
Journal:  Chem Rev       Date:  2003-05       Impact factor: 60.622

2.  Electrodonating and electroaccepting powers.

Authors:  José L Gázquez; Andrés Cedillo; Alberto Vela
Journal:  J Phys Chem A       Date:  2007-02-17       Impact factor: 2.781

3.  Universal solvation model based on solute electron density and on a continuum model of the solvent defined by the bulk dielectric constant and atomic surface tensions.

Authors:  Aleksandr V Marenich; Christopher J Cramer; Donald G Truhlar
Journal:  J Phys Chem B       Date:  2009-05-07       Impact factor: 2.991

4.  Quantifying electro/nucleophilicity by partitioning the dual descriptor.

Authors:  Vincent Tognetti; Christophe Morell; Laurent Joubert
Journal:  J Comput Chem       Date:  2015-02-03       Impact factor: 3.376

5.  Nucleophilic reactivities of imide and amide anions.

Authors:  Martin Breugst; Takahiro Tokuyasu; Herbert Mayr
Journal:  J Org Chem       Date:  2010-08-06       Impact factor: 4.354

6.  Nucleophilic reactivities of sulfur ylides and related carbanions: comparison with structurally related organophosphorus compounds.

Authors:  Roland Appel; Herbert Mayr
Journal:  Chemistry       Date:  2010-08-02       Impact factor: 5.236

7.  First-principles prediction of nucleophilicity parameters for pi nucleophiles: implications for mechanistic origin of Mayr's equation.

Authors:  Chen Wang; Yao Fu; Qing-Xiang Guo; Lei Liu
Journal:  Chemistry       Date:  2010-02-22       Impact factor: 5.236

8.  From QCA (Quantum Cellular Automata) to Organocatalytic Reactions with Stabilized Carbenium Ions.

Authors:  Andrea Gualandi; Luca Mengozzi; Elisabetta Manoni; Pier Giorgio Cozzi
Journal:  Chem Rec       Date:  2016-04-09       Impact factor: 6.771

9.  Inverse solvent effects in carbocation carbanion combination reactions: the unique behavior of trifluoromethylsulfonyl stabilized carbanions.

Authors:  Stefan T A Berger; Armin R Ofial; Herbert Mayr
Journal:  J Am Chem Soc       Date:  2007-07-18       Impact factor: 15.419

Review 10.  Reactions of Allylmagnesium Reagents with Carbonyl Compounds and Compounds with C═N Double Bonds: Their Diastereoselectivities Generally Cannot Be Analyzed Using the Felkin-Anh and Chelation-Control Models.

Authors:  Nicole D Bartolo; Jacquelyne A Read; Elizabeth M Valentín; K A Woerpel
Journal:  Chem Rev       Date:  2020-01-06       Impact factor: 60.622

View more
  5 in total

1.  Gut Microbiota of Individuals Could Be Balanced by a 14-Day Supplementation With Laminaria japonica and Differed in Metabolizing Alginate and Galactofucan.

Authors:  Xueqian Zhang; Changyu Su; Cui Cao; Guiping Gong; Linjuan Huang; Zhongfu Wang; Shuang Song; Beiwei Zhu
Journal:  Front Nutr       Date:  2022-05-18

2.  Basicity as a Thermodynamic Descriptor of Carbanions Reactivity with Carbon Dioxide: Application to the Carboxylation of α,β-Unsaturated Ketones.

Authors:  Pietro Franceschi; Catia Nicoletti; Ruggero Bonetto; Marcella Bonchio; Mirco Natali; Luca Dell'Amico; Andrea Sartorel
Journal:  Front Chem       Date:  2021-11-24       Impact factor: 5.221

3.  Predicting relative efficiency of amide bond formation using multivariate linear regression.

Authors:  Brittany C Haas; Adam E Goetz; Ana Bahamonde; J Christopher McWilliams; Matthew S Sigman
Journal:  Proc Natl Acad Sci U S A       Date:  2022-04-11       Impact factor: 12.779

4.  Uncertainty Quantification of Reactivity Scales.

Authors:  Jonny Proppe; Johannes Kircher
Journal:  Chemphyschem       Date:  2022-03-18       Impact factor: 3.520

5.  Analysis of the Stage Performance Effect of Environmental Protection Music and Dance Drama Based on Artificial Intelligence Technology.

Authors:  Li Zeng
Journal:  J Environ Public Health       Date:  2022-09-19
  5 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.