
On the Accuracy of the Direct Method to Calculate pK_a from Electronic Structure Calculations.

Felipe Ribeiro Dutra¹, Cleuton de Souza Silva², Rogério Custodio¹.

Abstract

The direct method (HA(soln) ⇌ A(soln)– + H(soln)+) for calculating pKa of monoprotic acids is as efficient as thermodynamic cycles. The free energy of the proton in solution was adjusted selectively against experimental pKa data. The procedure was analyzed at different levels of theory. The solvent was described by the solvation model density (SMD) model, with and without explicit water molecules, and three training sets were tested. The best performance under any condition was obtained by the G4CEP method with a mean absolute error close to 0.5 units of pKa and an uncertainty around ±1 unit of pKa for any training set including or excluding explicit solvent molecules. PM6 and AM1 performed very well with average absolute errors below 0.75 units of pKa but with uncertainties up to ±2 units of pKa, using only the SMD solvent model. Density functional theory (DFT) results were highly dependent on the basis functions and explicit water molecules. The best DFT performance was observed for the local spin density approximation (LSDA) functional in almost all calculations and, under certain conditions, was as good as that obtained by G4CEP. Basis set complexity and explicit solvent molecules were important factors to control in DFT calculations. The training set should reflect the diversity of the compounds to be calculated.


Year: 2020 PMID: 33356255 PMCID: PMC7872415 DOI: 10.1021/acs.jpca.0c08283

Source DB: PubMed Journal: J Phys Chem A ISSN: 1089-5639 Impact factor: 2.781

Introduction

The pKa of an acid can be accurately determined experimentally by different techniques, such as capillary electrophoresis, spectrophotometry, and high-performance liquid chromatography.[1−3] Theoretically, the simplest way to estimate the pKa of an acid in solution using quantum methods is based on the calculation of the equilibrium reaction Gibbs energy: HA(soln) ⇌ A(soln)– + H(soln)+. This calculation process is known as a direct method, and its application is rarely found in the literature.[4−6] Usually, significant errors are produced in determining free energies of these three chemical species in solution. One of the most difficult terms to estimate theoretically is the free energy of the solvated proton.[7] Some have attempted using experimental values or theoretical approaches to achieve acceptable values of ΔGsolv(H+),[8−11] which are normally between −252.6 and −271.7 kcal mol–1.[12] Currently, the most used value is −265.6 ± 1 kcal mol–1; however, the use of this value is questioned because the error is considered significant.[13−16] One of the most common procedures for minimizing calculation errors uses thermodynamic cycles involving deprotonation reactions in the gas phase, combined with the same reaction in solution for the acid of interest.[17−21] The calculation of Gibbs energies in the gas phase is usually performed by ab initio or density functional theory (DFT) calculations. In solution, different implicit solvation models are used (conductor-like polarizable continuum model (C-PCM),[22] conductor-like screening model (COSMO),[23] and solvation models (SM-X)[8]), whether or not they explicitly include solvent molecules. Explicit inclusion of solvent molecules can assist in modeling solute–solvent interactions. 
There are, however, open problems concerning the number of solvent molecules and the best position of each solvent molecule around the solute.[21,24] Even with these uncertainties regarding the inclusion of explicit molecules, the method works reasonably well for compounds with low and, in some cases, intermediate complexity.[25−28] However, this approach is not efficient for flexible molecules due to conformational changes in solution with respect to the gas phase, especially when the acid has many degrees of freedom and can assume several conformations in solution, requiring the use of other methodologies.[29,30] To simplify the calculation steps and improve the performance of theoretical methods for calculating pKa, in addition to eliminating the gas phase dependency, another common procedure uses isodesmic reactions. The method considers proton competition between two acids. One is a reference acid, and the pKa of the second acid is determined relative to it. In other words, the free energy of the reaction is estimated according to the reaction HA(soln) + Ref(soln)– ⇌ A(soln)– + HRef(soln).[25,31−33] The isodesmic method is quite simple and provides satisfactory results for several compounds. Another advantage is that it does not require the uncertain free energy of the solvated proton. However, the method depends directly on the choice of the reference substance.
The usual recommendation is to use reference molecules with similar chemical structures and pKa values to the molecule of interest.[31,34−37] In addition to these two strategies, there are others that follow the same patterns, by determinations related to a reference molecule, more complex thermodynamic cycles, or descriptions through empirical equations correlating some molecular properties to the pKa itself.[38−40] Regardless of the adopted methodology, errors or significant uncertainties always arise with respect to the experimental values used, the choice of references, or conformational differences between aqueous and gaseous phases, in addition to errors from the implicit solvation model or the inclusion of explicit solvent molecules.[41,42] The direct method is the simplest alternative, depending only on the Gibbs energies of the acid, its conjugate base, and the proton in solution. As mentioned previously, one of the main difficulties is the determination of the free energy of the solvated proton. Many papers have estimated the best free energy based on cluster models, least-squares methods, or a combination of theoretical and experimental data.[14,43−45] The general tendency of the literature also suggests that the use of a specific parameter for the free energy of the solvated proton is satisfactory for high-level calculations.[28,34,35] It seems convenient to have a simple procedure to estimate pKa using a Gibbs energy of the solvated proton compatible with the level of theory and calculation conditions. Therefore, the objective of this work is to evaluate the performance of the pKa calculation using the direct method, considering the energy of the solvated proton as an adjustable parameter. The pKa calculation is evaluated at different levels of theory, such as Hartree–Fock (HF), semiempirical, DFT, and composite methods. The solvent effect is assessed using a continuum solvation model, with and without explicit solvent molecules.

Computational Method

The Gibbs energy of the direct deprotonation reaction (HA(aq) ⇌ A(aq)– + H(aq)+) can be calculated by the equation

$$\Delta G_{aq} = G_{aq}(\mathrm{A}^-) + G_{aq}(\mathrm{H}^+) - G_{aq}(\mathrm{HA}) = RT \ln(10)\, \mathrm{p}K_a \qquad (1)$$

where R is the gas constant and T is the temperature. Gaq(A–), Gaq(HA), and Gaq(H+) are the respective Gibbs energies of the conjugate base (A–), protonated acid (HA), and the proton (H+) in solution.[46] Gaq(H+) cannot be calculated accurately by solvation models. Therefore, the present work proposes to determine it by rearranging eq 1 as

$$G_{aq}(\mathrm{H}^+) = RT \ln(10)\, \mathrm{p}K_a(\mathrm{exp}) + G_{aq}(\mathrm{HA}) - G_{aq}(\mathrm{A}^-) \qquad (2)$$

Thus, one can estimate the average value of Gaq(H+), ⟨Gaq(H+)⟩, for a training set from the experimental pKa(exp) values and theoretical values of Gibbs energies of the acid and its conjugate base at any level of theory. This average Gibbs energy of the solvated proton can then be used to determine the pKa of acids at the same level of theory from eq 1.

In the present work, the free energies Gaq(HA) and Gaq(A–) were calculated using the Gaussian16 program[47] at the temperature of 298.15 K using the solvation model density (SMD).[8] SMD has been recommended by Gaussian16 for continuous representation of the solvent effect. Calculations of all acids and the respective conjugate bases were performed at the following theory levels according to the Gaussian definition: AM1,[48] PM6,[49] HF,[50] local spin density approximation (LSDA), PBE0,[51] M06-2X,[52] B3LYP,[53,54] CAM-B3LYP,[55] WB97XD,[56] B2PLYP,[57] and the G4CEP composite method.[58] LSDA is a combination of the Slater exchange potential and the Vosko–Wilk–Nusair[59] correlation functional. HF and DFT calculations were performed with aug-cc-pVDZ and aug-cc-pVTZ basis functions.
The AM1 and PM6 semiempirical methods are generally used to predict pKa in association with chemometric methods, quantitative structure–activity relationship (QSAR) or quantitative structure–property relationship (QSPR).[60−62] However, the literature indicates that they can achieve a certain accuracy in the calculation of pKa.[63] HF calculations were considered because of the absence of electronic correlation effects. The DFT functionals were chosen to range from simple ones widely used in the literature, such as LSDA, PBE0,[51] and B3LYP,[53,54] to more recent and sophisticated hybrid functionals, such as M06-2X,[52] CAM-B3LYP,[55] WB97XD,[56] and B2PLYP.[57] The G4CEP composite method was chosen because it includes extrapolation of the basis function, reduction of the computational cost using pseudopotentials, and additional corrections related to deficiencies of the basis functions and electronic correlation.[58] It is important to mention that the G4CEP method has been used to calculate pKa values using a thermodynamic cycle with excellent performance.[28] In addition to implicit solvation, the presence of explicit water molecules was also analyzed.[33] The orientation and position of the water molecules in these systems are extremely important; the water molecules were initially placed close to the oxygens from two carboxyl groups. To identify the most stable molecular geometries, a preoptimization was performed at the B3LYP/aug-cc-pVDZ level for the DFT calculations, and the structures were then reoptimized at the respective level of theory. This procedure was used for calculations with and without explicit water molecules.
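
The fitting procedure described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names `fit_proton_energy` and `predict_pka` are hypothetical, the training energies are made-up placeholders, and all Gibbs energies are assumed to be already converted to kcal mol–1 at 298.15 K.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal mol^-1 K^-1
T = 298.15             # temperature in K, as used in the text
RT_LN10 = R_KCAL * T * math.log(10.0)   # about 1.364 kcal mol^-1 per pKa unit


def fit_proton_energy(training):
    """Average G_aq(H+) over a training set.

    `training` is a list of (pKa_exp, G_aq(HA), G_aq(A-)) tuples,
    with energies in kcal/mol. Each acid gives one estimate of
    G_aq(H+) = RT ln(10) pKa(exp) + G_aq(HA) - G_aq(A-).
    """
    estimates = [RT_LN10 * pka + g_ha - g_a for pka, g_ha, g_a in training]
    return sum(estimates) / len(estimates)


def predict_pka(g_ha, g_a, g_proton):
    """pKa from the direct method using a fitted proton energy."""
    return (g_a + g_proton - g_ha) / RT_LN10


# Hypothetical placeholder energies (NOT taken from the paper).
training = [
    (4.76, -143500.00, -143227.00),
    (4.88, -168150.00, -167876.50),
]
g_proton = fit_proton_energy(training)
pka_new = predict_pka(-192800.00, -192527.50, g_proton)
```

Because the proton energy is fitted at a given level of theory, it is only meaningful together with acid and base energies computed at that same level.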

Results and Discussion

Assessing the Training Set

To assess the dependence of the pKa calculation on the set of reference molecules, a set of 22 monoprotic acids, previously used by de Souza Silva and Custodio,[28] was employed. The average proton solvation energy was determined from three distinct training sets: (a) training set 1: the entire set of 22 acids, (b) training set 2: three very simple reference acids (acetic, propanoic, and butanoic acids), and (c) training set 3: three acids chosen arbitrarily from the 22 (pentanoic, 2-chlorobutanoic, and 2-methylbutanoic acids). Comparing these three sets indicates how sensitive the average energy of the solvated proton is to the number and type of reference molecules. Training set 1 was the first set analyzed. It is representative of all molecules, and in principle, the solvated proton energy should provide the smallest error for this set of molecules. If the free energy of the solvated proton is transferable, a small training set is expected to provide a result equivalent to that obtained with the full set of molecules. This hypothesis will be analyzed by comparing the results from the three training sets. Table 1 shows the 22 acids studied, the respective experimental pKa values, and the differences between the experimental and calculated values for each level of theory, in addition to the mean absolute error (MAE), standard deviation (std. dev.), and the largest positive and negative deviation with respect to the experimental data for each theoretical method. These are the simplest calculations, using aug-cc-pVDZ basis functions for the Hartree–Fock and DFT methods. The solvent effect was represented using only the SMD model for all calculations. The literature suggests that calculated pKa values are considered acceptable if they have a mean absolute error below one pKa unit.[64] Table 1 shows that five of the 11 levels satisfy the criterion.
The lowest mean absolute errors were obtained by the PM6 (0.57), G4CEP (0.63), AM1 (0.73), HF (0.95), and B2PLYP (0.96) methods, with standard deviations of 0.66, 0.42, 0.81, 0.89, and 0.94, respectively, in units of pKa. Standard deviations multiplied by 2 provide an uncertainty estimate with 95% confidence. The maximum positive and negative deviations show that some results present significant errors. However, Table 1 indicates that unusual deviations are usually related to a few specific acids that produce inadequate results for almost all methods, such as trichloroacetic and hexanoic acids. Surprisingly, the computationally most expensive method, G4CEP, and the semiempirical ones, PM6 and AM1, showed the best performances. PM6 was previously tested in the literature using the isodesmic method with an average error similar to that of the present work.[63] If computational cost is considered, the semiempirical methods are more advantageous than G4CEP, although the latter presents a lower uncertainty, which can be verified both by the standard deviation multiplied by 2 and by the most positive and negative deviations with respect to the experimental data. Importantly, the results obtained with the direct method using G4CEP are a little better than those obtained using a thermodynamic cycle.[28] The seven functionals tested yielded inadequate performance. Six of the seven showed mean absolute errors just above one pKa unit, and the best result (B2PLYP, 0.96) was close to one.

Table 1

Experimental pKa Values and Differences between Experimental and Calculated Values for Different Levels of Theory, in Addition to the Mean Absolute Error, Standard Deviation, and the Largest Positive and Negative Deviations (a)

acids	pKa (exp) (b)	G4CEP	AM1	PM6	HF	LSDA	PBE	B3LYP	CAM B3LYP	WB97XD	M062X	B2PLYP
acetic	4.76	0.12	–0.53	0.04	–0.73	–1.32	–0.59	–1.45	–1.46	–0.53	–0.67	–1.31
propanoic	4.88	–0.77	–0.77	0.75	–0.47	–1.37	–1.47	–1.32	–1.29	–1.48	–1.27	–1.32
butanoic	4.82	0.27	0.55	–1.20	–0.89	–1.49	–1.28	–0.97	–0.80	–1.09	–1.24	–0.86
pentanoic	4.82	–0.32	–0.64	0.11	–1.67	–0.92	–1.61	–1.44	–1.36	–1.56	–1.07	–1.39
hexanoic	4.85	–1.61	0.64	0.20	–1.69	–1.12	–2.16	–2.50	–1.50	–1.01	–1.75	–1.60
chloroacetic	2.86	0.79	0.19	–0.38	0.98	1.13	1.23	0.08	–0.07	1.26	0.99	0.07
bromoacetic	2.90	0.21	–0.61	–0.42	0.68	1.22	0.71	0.81	0.68	0.58	0.50	0.67
trichloroacetic	0.70	1.04	3.52	2.66	4.52	4.34	5.03	5.07	4.63	4.78	4.63	4.79
2-chlorobutanoic	2.83	0.78	1.17	0.01	0.76	0.45	0.71	1.17	0.68	0.61	0.40	1.27
3-chlorobutanoic	3.98	–0.43	0.00	0.52	–0.28	–0.20	–0.32	–0.06	–0.17	–0.17	–0.31	–0.11
4-chlorobutanoic	4.52	–0.52	–0.37	0.81	–0.27	–0.62	–0.18	–0.01	0.23	0.19	0.64	–0.15
3-butenoic	4.35	–0.07	0.25	–0.27	0.01	–0.26	–0.29	–0.09	–1.07	–0.27	–1.14	–0.16
2-methylpropanoic	4.84	–0.17	–0.09	0.72	–0.96	0.05	–0.01	–1.15	–0.78	–0.83	–1.13	–0.80
2,2-dimethylpropanoic	5.03	–0.64	0.20	–0.34	–1.02	–1.16	–0.92	–0.73	–0.82	–0.93	–0.77	–0.76
3-methylbutanoic	4.77	–1.29	0.09	0.25	–0.89	–1.09	–1.17	–0.95	–0.75	–1.81	–0.90	–0.99
2-methylbutanoic	4.80	–0.87	–0.24	0.01	–1.18	–1.45	–1.01	–0.90	–0.79	–1.15	–1.11	–0.78
2-butynoic	2.62	1.17	1.45	–0.28	1.13	1.30	0.75	1.59	1.75	1.17	1.36	0.77
2-chloropropanoic	2.83	0.82	–0.19	–0.06	1.01	0.75	0.72	1.29	1.15	0.92	0.36	1.07
3-bromopropanoic	4.00	1.01	–0.81	0.21	0.00	0.96	1.24	0.41	0.35	–0.04	1.18	0.35
3-chloropropanoic	3.98	0.05	–0.59	–0.51	0.57	0.91	0.39	0.53	0.56	1.25	1.20	0.51
trans-crotonic	4.69	–0.22	–0.75	–0.64	–0.44	–1.02	–0.79	–0.57	–0.36	–0.92	–0.60	–0.31
formic	3.75	0.64	–2.46	–2.20	0.82	0.93	1.01	1.21	1.19	1.03	0.70	1.05
MAE		0.63	0.73	0.57	0.95	1.09	1.07	1.11	1.02	1.07	1.09	0.96
std		0.42	0.81	0.66	0.89	0.81	1.00	1.04	0.91	0.93	0.85	0.94
max		1.17	3.52	2.66	4.52	4.34	5.03	5.07	4.63	4.78	4.63	4.79
min		–1.61	–2.46	–2.20	–1.69	–1.49	–2.16	–2.50	–1.50	–1.81	–1.75	–1.60

(a) HF and DFT calculations used aug-cc-pVDZ basis functions. All calculations were performed with solvent represented by SMD and free energies of the solvated proton obtained with training set 1.

(b) Data from refs (72) and (73).
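
The summary rows of Table 1 can be reproduced from the per-acid deviations. The sketch below does this for the G4CEP column. The table does not state how the standard deviation was taken; using the population standard deviation of the absolute deviations is an assumption made here, chosen because it matches the tabulated value for this column. Variable names are illustrative.

```python
import statistics

# G4CEP deviations (experimental minus calculated pKa) from Table 1,
# listed in table order from acetic to formic acid.
g4cep_dev = [
    0.12, -0.77, 0.27, -0.32, -1.61, 0.79, 0.21, 1.04, 0.78, -0.43, -0.52,
    -0.07, -0.17, -0.64, -1.29, -0.87, 1.17, 0.82, 1.01, 0.05, -0.22, 0.64,
]

abs_dev = [abs(d) for d in g4cep_dev]

mae = statistics.fmean(abs_dev)        # mean absolute error
std = statistics.pstdev(abs_dev)       # assumption: population std of |deviation|
uncertainty_95 = 2.0 * std             # the "2 x std. dev." estimate used in the text
largest_pos = max(g4cep_dev)
largest_neg = min(g4cep_dev)

print(f"MAE = {mae:.2f}, std = {std:.2f}, 2*std = {uncertainty_95:.2f}")
print(f"max = {largest_pos:.2f}, min = {largest_neg:.2f}")
```

The same lines can be applied to any other column of the table by replacing the list of deviations.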

In general, error in the pKa calculation produced by the direct method may result from sensitivity to the training set. Table 2 shows the mean absolute errors, standard deviations, and the largest positive and negative deviations for each theoretical method using training sets 2 and 3, which use only three acids to determine the average energy of the solvated proton. The calculated pKa values are available as Supporting Information in Tables S1 and S2.

Table 2

Mean Absolute Error (MAE), Standard Deviation (Std. Dev.), and the Largest Positive (Max) and Negative (Min) Deviations at Different Levels of Theory (a)

Training Set 2
	G4CEP	AM1	PM6	HF	LSDA	PBE	B3LYP	CAM B3LYP	WB97XD	M062X	B2PLYP
MAE	0.63	0.73	0.59	1.01	1.41	1.30	1.41	1.26	1.21	1.18	1.25
std. dev.	0.43	0.85	0.66	1.08	1.35	1.29	1.38	1.29	1.27	1.28	1.27
max	1.30	3.77	2.80	5.22	5.73	6.14	6.32	5.81	5.81	5.69	5.95
min	–1.49	–2.21	–2.06	–0.99	–0.10	–1.05	–1.25	–0.32	–0.78	–0.69	–0.43

(a) HF and DFT calculations used aug-cc-pVDZ basis functions. All calculations were performed with the SMD model and training sets 2 and 3.

Table 2 shows that the methods with mean absolute errors below one pKa unit remain, in increasing order of error, PM6, G4CEP, and AM1. Almost all functionals maintained errors above one pKa unit, and the errors did not present a well-defined trend. Compared with Table 1, the mean absolute error produced by training set 3 is closer to that of set 1 than that of set 2 for DFT calculations. The mean absolute errors for the G4CEP, PM6, and AM1 methods are not particularly sensitive to the chosen reference acids. In the case of DFT calculations, training set 2 reached a maximum mean absolute error of 1.41 units of pKa while training set 3 achieved a value of 1.13 pKa units. The worse performance of training set 2 is certainly associated with the similarity of the reference acids, the diversity in the set of 22 acids, and the exchange and correlation effects of the functionals. The more diversified electronic environments of training set 3 certainly provided a better representation of the substances for calculation of the 22 acids.

Basis Set Dependency

The Hartree–Fock and DFT methods depend on the choice of basis function. The best alternative for a quantum calculation is to consider a complete basis set or the extrapolation of properties with increasing complexity of the basis function.[25,65−67] Calculations applicable to medium or large molecules are made with modest basis functions, like the one used in the previous section. However, it is necessary to assess whether enlargement of the basis function is significant in determining pKa using the direct method. Therefore, the Hartree–Fock and DFT calculations were also performed with aug-cc-pVTZ basis functions. Table 3 summarizes the mean absolute errors for all levels tested, standard deviations, and the largest positive and negative deviations for each method. Details of the pKa deviations for all acids and the three training sets are available as Supporting Information in Tables S3–S5. Table 3 shows that there are important consequences in increasing the basis function. Almost all DFT calculations improve, with mean absolute errors below 1.1 pKa units. The only exception is the WB97XD method for the second training set, which shows a significant increase in the average error. On the other hand, calculations employing the LSDA, M062X, and B2PLYP functionals provided mean errors below one unit of pKa. Training sets 1 and 3 produce results similar to each other and near the average errors obtained with the aug-cc-pVDZ basis function. Calculations for training set 2 are significantly improved with aug-cc-pVTZ compared to aug-cc-pVDZ. These changes are, in part, a consequence of the nature of the energies produced by the methods themselves and small changes in the optimized molecular geometries. The structures are optimized initially at the B3LYP/aug-cc-pVDZ level and, later, at the corresponding level of calculation.
Thus, the exceedingly small mean absolute error, standard deviation, and largest positive and negative deviations from the LSDA results are surprising for any training set with aug-cc-pVTZ basis functions. These data surpass the performance verified by the semiempirical and G4CEP methods. Tables S3–S5 show that the pKa deviations calculated with LSDA, with respect to the experimental data, are usually lower than 0.5 units of pKa with few exceptions. A final aspect of Table 3 is that, although almost all functionals improved with the size of the basis function, the largest positive deviations persist. Analysis of Tables S3–S5 indicates that trichloroacetic acid is responsible for the persistent pKa deviation for almost all functionals tested. The remaining deviations are within the estimated uncertainty.

Table 3

Experimental pKa, Mean Absolute Error (MAE), Standard Deviations (Std. Dev.), and the Largest Positive (Max) and Negative (Min) Deviation at HF and DFT Levels (a)

Training Set 1
	HF	LSDA	PBE	B3LYP	CAM B3LYP	WB97XD	M062X	B2PLYP
MAE	1.10	0.39	1.04	1.07	1.06	1.07	0.82	0.93
std. dev.	0.88	0.33	1.00	0.86	0.87	0.86	0.74	0.90
max	4.37	1.43	5.04	4.42	4.30	4.33	3.85	4.50
min	–2.04	–1.12	–1.96	–1.58	–1.79	–1.72	–1.17	–1.70

(a) All calculations were carried out with the SMD model, aug-cc-pVTZ basis set, and training sets 1, 2, and 3.

In general, the larger basis set improves the performance of the DFT calculations. Although the computational cost increases, the deviation with respect to the experimental data for part of the tested functionals is significant. The LSDA functional with aug-cc-pVTZ functions produced exceptional results at a considerably reduced computational cost, which qualifies it as the best alternative associated with direct determination of pKa.

Explicit Solvent

In thermodynamic cycles, the literature frequently indicates that, in addition to the reaction field, the inclusion of explicit solvent molecules improves the pKa estimate. Table 4 presents the mean absolute errors and standard deviations of pKa with respect to experimental data using the SMD model and one explicit water molecule. The position of the water molecule can change the value of the pKa, and the optimized molecular geometry should characterize the global minimum of energy and not a local one. Data related to the deviation of each pKa with respect to experimental results for the three training sets and including one water molecule are found in Tables S6–S11.

Table 4

Mean Absolute Error (MAE) and Standard Deviations (Std. Dev.) at Different Levels of Theory Using the SMD Model and One Explicit Water Molecule with Training Sets 1, 2, and 3 (a)

Training Set 1 + SMD + H₂O
	G4CEP	AM1	PM6	HF	LSDA	PBE	B3LYP	CAM B3LYP	WB97XD	M062X	B2PLYP
aug-cc-pVDZ
MAE	0.50	0.72	0.89	1.12	0.49	0.91	0.84	0.79	0.77	0.59	0.78
std. dev.	0.29	0.74	0.85	0.87	0.34	0.60	0.60	0.57	0.67	0.61	0.61
aug-cc-pVTZ
MAE				1.01	0.47	0.65	0.71	0.73	0.72	0.63	0.68
std. dev.				0.81	0.33	0.54	0.54	0.48	0.65	0.47	0.61

(a) The aug-cc-pVDZ and aug-cc-pVTZ basis sets were used for HF and DFT calculations.

Table 4 shows that the G4CEP method maintains the same regularity with excellent performance for training sets 1 and 3 with mean errors around 0.5 units of pKa and uncertainties less than ±0.6. The mean error increases for training set 2 but is still an excellent option since the average error is 0.70 units of pKa with an uncertainty around ±1 pKa unit. The AM1 and PM6 semiempirical methods performed worse with the inclusion of one water molecule, even for training sets 1 and 3. The PM6 results show a mean absolute error around 0.72 units of pKa. In contrast, for AM1, this error is significantly larger, about 1.13 units of pKa for training sets 1 and 3. Uncertainties also increase to approximately ±1.5 and ±2 units of pKa for PM6 and AM1, respectively. For training set 2, the mean absolute error and uncertainties increase for both methods, though more significantly for AM1. The inclusion of a second water molecule in semiempirical calculations keeps the errors in the same order of magnitude but favors the AM1 method instead of PM6 (data not shown). The Hartree–Fock results improve and achieve a mean absolute error lower than one unit of pKa with the aug-cc-pVDZ basis function but worsen with aug-cc-pVTZ. HF calculations tend to favor bonded states by reducing bond lengths with larger basis functions, which affects geometry, cancellation of errors, and the quality of calculated results. On the other hand, DFT results improve significantly with the inclusion of one water molecule. Training sets 1 and 2 improve with aug-cc-pVTZ, while with training set 3, this association is not evident. Almost all functionals tested present mean absolute errors lower than 1 unit of pKa. The worst performances are related to PBE/aug-cc-pVDZ calculations and training set 2.
These results and analyses of all previous data indicate that the largest deviations occurred with acids containing halogens. The inclusion of halogenated compounds in the training set provides average free energies of the solvated proton suitable to the acids tested in this article. This information shows that the training set must be representative of acids in the validation set. However, one of the most remarkable aspects is, once again, the excellent performance of the LSDA functional. The mean absolute error and standard deviation are usually below 0.5 pKa units, except for aug-cc-pVTZ and training set 3. Due to its simplicity, LSDA is not recommended for the calculation of chemical properties. However, the use of this functional with empirically adjusted solvated proton free energy yields an efficient cancellation of errors. Additionally, by increasing to two explicit water molecules, the mean absolute errors are reduced even further for LSDA and almost all other functionals, producing uncertainties below ±1 unit of pKa for both aug-cc-pVDZ and aug-cc-pVTZ calculations.

Gibbs Energy of the Solvated Proton

The literature presents a set of possibilities for the free energy of the solvated proton with values between −252.6 and −271.7 kcal mol–1.[12] Many studies use the value of −265.6 ± 1 kcal mol–1, due to reduced experimental uncertainty and quality of the pKa estimates. As an example, Zhan and Dixon[68] performed high-level ab initio calculations in the supermolecule/continuum approach and obtained a value of −264.3 kcal mol–1, which became −265.63 ± 0.22 kcal mol–1 when corrected to the standard condition of 1 M.[69] However, calculations in the present work demonstrate that the adjustment of this energy is essential and can significantly reduce the pKa error. As a consequence, the direct method is extremely attractive, economical, and simple for the determination of pKa values. Table 5 shows all Gibbs energies of proton solvation used in this work. The values obtained with the AM1 and PM6 semiempirical methods differ significantly from those obtained from ab initio and DFT calculations. This difference arises because AM1 and PM6 produce enthalpies of formation at 298 K, rather than molecular electronic energies. Therefore, programs that use AM1 and PM6 energies to estimate thermochemical quantities are working with the free energy of formation and not the free molecular energy. Regardless of how the free-energy calculation is conducted, the pKa results are quite promising and follow a relatively accurate trend, especially without the inclusion of explicit solvent molecules.

Table 5

Average Gibbs Energies of the Solvated Proton Calculated at Different Levels of Theory with the aug-cc-pVDZ and aug-cc-pVTZ Basis Functions with Training Sets 1, 2, and 3 and Solvent Effect Represented by SMD and with and without One Explicit Water Molecule. Data in kcal mol−1.

SMD
	G4CEP	AM1	PM6	HF	LSDA	PBE	B3LYP	CAM B3LYP	WB97XD	M062X	B2PLYP
aug-cc-pVDZ
train. 1	–266.91	104.93	120.86	–277.22	–267.22	–271.97	–273.07	–271.88	–274.71	–273.14	–272.12
train. 2	–267.08	104.59	120.67	–278.17	–269.12	–273.49	–274.78	–273.49	–276.12	–274.59	–273.71
train. 3	–267.09	105.06	120.92	–278.17	–268.09	–272.84	–273.61	–272.55	–275.67	–273.95	–272.53
aug-cc-pVTZ
train. 1				–279.02	–271.67	–273.43	–273.95	–272.84	–276.11	–272.65	–273.18
train. 2				–271.74	–274.62	–275.46	–274.09	–277.98	–273.87	–274.37	–271.74
train. 3				–279.49	–271.19	–274.37	–274.43	–273.61	–277.17	–273.46	–273.69
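
To see why the choice of training set matters, it helps to convert a change in the proton free energy into pKa units. From the direct-method relation, every predicted pKa shifts by ΔG(H+)/(RT ln 10). The sketch below is illustrative only; the helper name `pka_shift` is hypothetical, and the two B3LYP/aug-cc-pVDZ values are read from the SMD block of Table 5.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal mol^-1 K^-1
T = 298.15             # temperature in K


def pka_shift(delta_g_kcal):
    """Uniform shift in predicted pKa caused by changing G_aq(H+) by delta_g_kcal."""
    return delta_g_kcal / (R_KCAL * T * math.log(10.0))


# One kcal/mol in the proton energy is worth roughly 0.73 pKa units.
per_kcal = pka_shift(1.0)

# B3LYP/aug-cc-pVDZ, SMD only (Table 5): training set 1 vs training set 2.
g_train1 = -273.07
g_train2 = -274.78
shift_b3lyp = pka_shift(g_train2 - g_train1)   # shift when set 2 replaces set 1

print(f"{per_kcal:.3f} pKa units per kcal/mol")
print(f"B3LYP shift, set 2 vs set 1: {shift_b3lyp:.2f} pKa units")
```

A difference of less than 2 kcal mol–1 between fitted proton energies is therefore enough to move every predicted pKa by more than one unit.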

On the other hand, the ab initio and DFT free energies of the solvated proton are relatively close to the interval given by the literature. Tests using only the SMD model, changing the basis functions, or including explicit solvent molecules indicate a greater similarity in the mean absolute errors and standard deviations for training sets 1 and 3 due to the greater diversity of the acids present in training. Table 5 shows a greater similarity between the free energies of the solvated proton involving these two training sets than data obtained with set 2. In general, each theoretical method presents a specific Gibbs energy of the solvated proton that corrects a systematic error in obtaining pKa. The G4CEP method produced extremely reliable results with reduced standard deviations in all tests performed and presented a free energy for the solvated proton close to the most used value of −265.6 kcal mol–1. On the other hand, DFT calculations presented values close to −270 kcal mol–1. The best results produced with LSDA also showed Gibbs energies of the solvated proton around this value. Note that the lack of electronic correlation in the HF method significantly increases the magnitude of the energy of the solvated proton. It is important to remember that when performing a frequency calculation in SMD, the standard state is 1 atm, and not 1 mol L–1. Therefore, the Gibbs energies of all species require a correction of 1.9 kcal mol–1, as shown in the literature.[44,70,71] This correction is just an additive constant, and its effect cancels between the acid and the respective conjugate base. On the other hand, it must be considered when defining the free energy of the solvated proton. Since the main objective of the paper is to find an empirical transferable parameter to be used by each method, the formal energies of the solvated proton shown in Table 5 were not corrected by this constant.
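
The 1.9 kcal mol–1 standard-state correction quoted above follows from the ideal-gas conversion between 1 atm and 1 mol L–1, ΔG = RT ln(V_m), where V_m is the molar volume in liters at 1 atm. The short check below is a sketch using standard values of the gas constant; the variable names are illustrative.

```python
import math

R_KCAL = 1.987204e-3    # gas constant in kcal mol^-1 K^-1
R_L_ATM = 0.082057366   # gas constant in L atm mol^-1 K^-1
T = 298.15              # temperature in K
P_ATM = 1.0             # gas-phase standard-state pressure in atm

# Ideal-gas molar volume at 1 atm and 298.15 K, in L/mol (about 24.5).
molar_volume = R_L_ATM * T / P_ATM

# Free-energy change for compressing from 1 atm to 1 mol/L, in kcal/mol.
correction = R_KCAL * T * math.log(molar_volume)

print(f"V_m = {molar_volume:.2f} L/mol, correction = {correction:.2f} kcal/mol")
```

The result, about 1.89 kcal mol–1, rounds to the 1.9 kcal mol–1 used in the text.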

Conclusions

The direct method (HA(soln) ⇌ A(soln)– + H(soln)+) for the pKa calculation of monoprotic acids seems to be as efficient as thermodynamic cycles. The results of direct calculation are sensitive to the level of calculation and Gibbs energy of the solvated proton. The procedure was analyzed at different levels of theory: two semiempirical levels (AM1 and PM6), one composite method (G4CEP), seven functionals (LSDA, PBE0, B3LYP, M06-2X, CAM-B3LYP, WB97XD, and B2PLYP), and Hartree–Fock (HF). Two basis functions were tested for HF and DFT: aug-cc-pVDZ and aug-cc-pVTZ. The solvent was described by the SMD model, including and excluding explicit water molecules. The Gibbs energy of the solvated proton was determined using three training sets chosen from 22 monoprotic carboxylic acids: (a) training set 1, which included the entire set of 22 acids; (b) training set 2, which contained three very simple reference acids (acetic, propanoic, and butanoic acids); and (c) training set 3, which consisted of three acids chosen arbitrarily from the 22 (pentanoic, 2-chlorobutanoic, and 2-methylbutanoic acids). Evaluation of the results involving all of the mentioned conditions allowed specific and general conclusions to be drawn. Acceptable pKa results were considered to have mean absolute errors less than 1 pKa unit. In this sense, the best performance in any condition was obtained by the G4CEP method. The mean absolute errors were close to 0.5 units of pKa with a standard deviation usually below this quantity, leading to an uncertainty around ±1 unit of pKa for any training set with or without explicit solvent. This performance is better than the thermodynamic cycles with the same set of acids. Another important aspect is the proximity of the optimized Gibbs energy of the solvated proton, which is close to the most used value of −265.6 kcal mol–1. 
The PM6 and AM1 methods perform very well, with average absolute errors below 0.75 units of pKa and uncertainties of less than ±2 units of pKa, using the SMD solvent model without explicit solvent molecules. The Gibbs energies of the solvated proton adjusted with the semiempirical methods have no correlation with the experimental data, since the electronic energies of these methods reproduce enthalpies of formation and not absolute molecular enthalpies. The Hartree–Fock and DFT results obtained with aug-cc-pVDZ basis functions and SMD were worse than those of the semiempirical and G4CEP methods. On the other hand, the use of aug-cc-pVTZ basis functions and explicit water molecules significantly reduced the mean absolute error and pKa uncertainty, making these methods attractive in terms of computational cost and accuracy. The best result was achieved by the LSDA functional under almost all calculation conditions. The performance of this functional is exceptional, mainly at the aug-cc-pVTZ level. The errors and uncertainties are as good as those obtained by the G4CEP method, i.e., around 0.5 units and ±1 unit of pKa, respectively. The free energies of the solvated proton for almost all functionals were generally higher than −270 kcal mol–1. The only functional that had a value below −270 kcal mol–1 was LSDA. Hartree–Fock calculations performed worse than semiempirical calculations under any condition. Clearly, the inclusion of electronic correlation is mandatory for an acceptable pKa result, and empirical adjustment alone is not sufficient. The Hartree–Fock Gibbs energies of the solvated proton were the furthest from the most used value compared with experimental data. In general, the addition of solvent molecules tends to improve results, except at the semiempirical levels. An increase in the complexity of the basis functions is an important factor to be controlled, especially for DFT calculations.
Regarding the training sets, better results are obtained using selected molecules representing the chemical diversity of all species to be calculated.
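The adjustment of the proton Gibbs energy against a training set, summarized above, can be illustrated with a minimal sketch. It assumes a least-squares fit, for which the optimal G(H+) is simply the mean of the values implied by each training acid; the paper's actual fitting procedure may differ. The function names, the temperature of 298.15 K, and the numerical Gibbs energies in the usage example are hypothetical placeholders chosen for illustration, not results from the paper.

```python
import math

R_KCAL = 1.987204e-3                      # gas constant, kcal mol^-1 K^-1
T = 298.15                                # assumed temperature in K
RT_LN10 = R_KCAL * T * math.log(10.0)     # kcal/mol per pKa unit


def fit_proton_free_energy(training):
    """Least-squares estimate of G(H+) in kcal/mol.

    training: list of (G_HA, G_A, pKa_exp), Gibbs energies in kcal/mol.
    Direct method: pKa = (G_A + G_H - G_HA) / (RT ln 10), so each acid
    implies G_H = RT ln 10 * pKa_exp - (G_A - G_HA). The least-squares
    optimum over the set is the mean of these implied values.
    """
    implied = [RT_LN10 * pka - (g_a - g_ha) for g_ha, g_a, pka in training]
    return sum(implied) / len(implied)


def predict_pka(g_ha, g_a, g_h):
    """pKa from the direct reaction HA -> A- + H+ in solution."""
    return (g_a + g_h - g_ha) / RT_LN10


def mean_absolute_error(acids, g_h):
    """Mean absolute deviation between predicted and experimental pKa."""
    errors = [abs(predict_pka(g_ha, g_a, g_h) - pka) for g_ha, g_a, pka in acids]
    return sum(errors) / len(errors)


# Hypothetical placeholder data: (G_HA, G_A, pKa_exp), energies relative to HA.
example_set = [
    (0.0, 276.5, 4.76),
    (0.0, 276.7, 4.87),
    (0.0, 276.4, 4.82),
]
g_h_fit = fit_proton_free_energy(example_set)
print(f"fitted G(H+) = {g_h_fit:.2f} kcal/mol")
print(f"training MAE = {mean_absolute_error(example_set, g_h_fit):.2f} pKa units")
```

In this form the fitted proton term absorbs any constant systematic error of the electronic structure method, which is why each level of theory ends up with its own value.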
