Radka Svobodová Vařeková, Stanislav Geidl, Crina-Maria Ionescu, Ondřej Skřehota, Tomáš Bouchal, David Sehnal, Ruben Abagyan, Jaroslav Koča.
Abstract
The acid dissociation constant pKa is a very important molecular property, and there is strong interest in the development of reliable and fast methods for pKa prediction. We evaluated the pKa prediction capabilities of QSPR models based on empirical atomic charges calculated by the Electronegativity Equalization Method (EEM). Specifically, we collected 18 EEM parameter sets created for 8 different quantum mechanical (QM) charge calculation schemes. We then prepared a training set of 74 substituted phenols and, for each molecule, generated its dissociated form by removing the phenolic hydrogen. For all the molecules in the training set, we calculated EEM charges using the 18 parameter sets, and QM charges using the 8 above-mentioned charge calculation schemes. For each type of QM and EEM charges, we created one QSPR model employing charges from the non-dissociated molecules (three-descriptor QSPR models), and one QSPR model based on charges from both dissociated and non-dissociated molecules (five-descriptor QSPR models). Afterwards, we calculated the quality criteria and evaluated all the QSPR models obtained. We found that QSPR models employing EEM charges are a good approach for the prediction of pKa (63% of these models had R² > 0.9, while the best had R² = 0.924). As expected, QM QSPR models provided more accurate pKa predictions than the EEM QSPR models, but the differences were not significant. Furthermore, a major advantage of the EEM QSPR models is that their descriptors (i.e., EEM atomic charges) can be calculated markedly faster than the QM charge descriptors. Moreover, we found that the EEM QSPR models are less strongly influenced by the choice of charge calculation approach than the QM QSPR models. The robustness of the EEM QSPR models was subsequently confirmed by cross-validation.
The applicability of EEM QSPR models to other chemical classes was illustrated by a case study focused on carboxylic acids. In summary, EEM QSPR models constitute a fast and accurate pKa prediction approach that can be used in virtual screening.
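A QSPR model of the kind described in the abstract can be read as a linear regression of pKa on a handful of atomic charges. The sketch below is only an illustration under stated assumptions: the charges and pKa values are synthetic, the coefficients are arbitrary, and the function names `fit_qspr` and `predict_qspr` are hypothetical. It fits a three-descriptor model by ordinary least squares and reports R² as the squared Pearson correlation between the reference and predicted values.

```python
import numpy as np

def fit_qspr(charges, pka):
    """Least-squares fit of pKa = sum_k c_k * q_k + c_0 (hypothetical helper)."""
    design = np.column_stack([charges, np.ones(len(pka))])  # last column: intercept
    coef, *_ = np.linalg.lstsq(design, pka, rcond=None)
    return coef

def predict_qspr(coef, charges):
    """Apply the fitted coefficients to a matrix of charge descriptors."""
    return charges @ coef[:-1] + coef[-1]

# Synthetic stand-in data: 74 "molecules", 3 charge descriptors each.
rng = np.random.default_rng(0)
charges_3d = rng.normal(scale=0.05, size=(74, 3))
true_coef = np.array([-30.0, 20.0, 10.0])            # arbitrary, not from the paper
pka = charges_3d @ true_coef + 9.0 + rng.normal(scale=0.1, size=74)

coef = fit_qspr(charges_3d, pka)
r2 = np.corrcoef(pka, predict_qspr(coef, charges_3d))[0, 1] ** 2
print(f"fitted coefficients: {coef}, R^2 = {r2:.3f}")
```

A five-descriptor model would use the same code with a 74 × 5 descriptor matrix that also holds charges from the dissociated forms.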
Year: 2013 PMID: 23574978 PMCID: PMC3663834 DOI: 10.1186/1758-2946-5-18
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Summary information about the EEM parameter sets used in the present study
| QM theory level + basis set | PA | EEM parameter set name | Published by | Year of publication | Elements included |
|---|---|---|---|---|---|
| HF/STO-3G | MPA | Svob2007_cbeg2 | Svobodova et al. | 2007 | C, O, N, H, S |
| | | Svob2007_cmet2 | Svobodova et al. | 2007 | C, O, N, H, S, Fe, Zn |
| | | Svob2007_chal2 | Svobodova et al. | 2007 | C, O, N, H, S, Br, Cl, F, I |
| | | Svob2007_hm2 | Svobodova et al. | 2007 | C, O, N, H, S, F, Cl, Br, I, Fe, Zn |
| | | Baek1991 | Baekelandt et al. | 1991 | C, O, N, H, P, Al, Si |
| | | Mort1986 | Mortier et al. | 1986 | C, O, N, H |
| HF/6–31G* | MK | Jir2008_hf | Jirouskova et al. | 2008 | C, O, N, H, S, F, Cl, Br, I, Zn |
| B3LYP/6–31G* | MPA | Chaves2006 | Chaves et al. | 2006 | C, O, N, H, F |
| | | Bult2002_mul | Bultinck et al. | 2002 | C, O, N, H, F |
| | NPA | Ouy2009 | Ouyang et al. | 2009 | C, O, N, H, F |
| | | Ouy2009_elem | Ouyang et al. | 2009 | C, O, N, H, F |
| | | Ouy2009_elemF | Ouyang et al. | 2009 | C, O, N, H, F |
| | | Bult2002_npa | Bultinck et al. | 2002 | C, O, N, H, F |
| | Hir. | Bult2002_hir | Bultinck et al. | 2002 | C, O, N, H, F |
| | MK | Jir2008_mk | Jirouskova et al. | 2008 | C, O, N, H, S, F, Cl, Br, I, Zn |
| | | Bult2002_mk | Bultinck et al. | 2002 | C, O, N, H, F |
| | CHELPG | Bult2002_che | Bultinck et al. | 2002 | C, O, N, H, F |
| | AIM | Bult2004_aim | Bultinck et al. | 2004 | C, O, N, H, F |
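Each parameter set above supplies per-element EEM parameters. As a rough illustration of how such parameters turn into atomic charges, the sketch below solves the EEM equations in their commonly used form, χ_i = A_i + B_i·q_i + κ·Σ_{j≠i} q_j/R_ij, requiring that all χ_i are equal and that the charges sum to the total molecular charge. The geometry, the values of A, B and κ, and the function name `eem_charges` are made up for illustration and are not taken from any of the listed sets.

```python
import numpy as np

def eem_charges(coords, A, B, kappa, total_charge=0.0):
    """Solve the EEM linear system; returns (charges, equalized electronegativity)."""
    coords = np.asarray(coords, dtype=float)
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    n = len(A)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    matrix = np.zeros((n + 1, n + 1))
    with np.errstate(divide="ignore"):
        matrix[:n, :n] = kappa / dist           # off-diagonal terms kappa / R_ij
    matrix[np.arange(n), np.arange(n)] = B      # diagonal terms B_i
    matrix[:n, n] = -1.0                        # minus the common electronegativity
    matrix[n, :n] = 1.0                         # charge conservation row
    rhs = np.append(-A, total_charge)
    solution = np.linalg.solve(matrix, rhs)
    return solution[:n], solution[n]

# Hypothetical water-like geometry and parameters, for illustration only.
xyz = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])
A = np.array([8.0, 4.5, 4.5])
B = np.array([9.0, 13.0, 13.0])
kappa = 0.5

q, chi = eem_charges(xyz, A, B, kappa)
print("charges:", q, "equalized electronegativity:", chi)
```

Because the charges come from a single linear solve rather than a QM calculation, this step is cheap, which is consistent with the speed advantage the abstract attributes to EEM descriptors.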
Quality criteria and statistical criteria for all the QSPR models analyzed in the present study and focused on phenols
| QM theory level + basis set | PA | EEM parameter set name | QSPR model | R² | RMSE | | s | F |
|---|---|---|---|---|---|---|---|---|
| HF/STO-3G | MPA | - | 3d QM | 0.9515 | 0.490 | 0.388 | 0.504 | 458 |
| | | - | 5d QM | 0.9657 | 0.412 | 0.310 | 0.430 | 358 |
| | | Svob2007_cbeg2 | 3d EEM | 0.8671 | 0.812 | 0.571 | 0.835 | 152 |
| | | | 3d EEM WO | 0.9239 | 0.482 | 0.382 | 0.497 | 255 |
| | | | 5d EEM | 0.9179 | 0.638 | 0.481 | 0.666 | 152 |
| | | Svob2007_cmet2 | 3d EEM | 0.8663 | 0.814 | 0.577 | 0.837 | 151 |
| | | | 3d EEM WO | 0.9239 | 0.482 | 0.386 | 0.497 | 255 |
| | | | 5d EEM | 0.9189 | 0.634 | 0.476 | 0.661 | 154 |
| | | Svob2007_chal2 | 3d EEM | 0.8737 | 0.792 | 0.554 | 0.814 | 161 |
| | | | 3d EEM WO | 0.9127 | 0.483 | 0.387 | 0.498 | 220 |
| | | | 5d EEM | 0.9203 | 0.629 | 0.473 | 0.656 | 157 |
| | | Svob2007_hm2 | 3d EEM | 0.8671 | 0.812 | 0.578 | 0.835 | 152 |
| | | | 3d EEM WO | 0.9241 | 0.481 | 0.387 | 0.496 | 256 |
| | | | 5d EEM | 0.9179 | 0.638 | 0.478 | 0.666 | 152 |
| | | Baek1991 | 3d EEM | 0.9099 | 0.669 | 0.531 | 0.688 | 236 |
| | | | 3d EEM WO | 0.9166 | 0.531 | 0.423 | 0.548 | 231 |
| | | | 5d EEM | 0.9195 | 0.632 | 0.493 | 0.659 | 155 |
| | | Mort1986 | 3d EEM | 0.8860 | 0.752 | 0.577 | 0.773 | 181 |
| | | | 3d EEM WO | 0.9151 | 0.520 | 0.405 | 0.536 | 226 |
| | | | 5d EEM | 0.9142 | 0.652 | 0.524 | 0.680 | 145 |
| HF/6–31G* | MK | - | 3d QM | 0.8405 | 0.890 | 0.727 | 0.915 | 123 |
| | | - | 5d QM | 0.8865 | 0.750 | 0.641 | 0.782 | 106 |
| | | Jir2008_hf | 3d EEM | 0.8612 | 0.830 | 0.582 | 0.853 | 145 |
| | | | 3d EEM WO | 0.9182 | 0.500 | 0.394 | 0.516 | 236 |
| | | | 5d EEM | 0.9154 | 0.648 | 0.488 | 0.676 | 147 |
| B3LYP/6–31G* | MPA | - | 3d QM | 0.9671 | 0.404 | 0.317 | 0.415 | 686 |
| | | - | 5d QM | 0.9724 | 0.370 | 0.274 | 0.386 | 479 |
| | | Chaves2006 | 3d EEM | 0.891 | 0.735 | 0.570 | 0.756 | 191 |
| | | | 3d EEM WO | 0.9198 | 0.505 | 0.398 | 0.521 | 241 |
| | | | 5d EEM | 0.9192 | 0.633 | 0.489 | 0.660 | 155 |
| | | Bult2002_mul | 3d EEM | 0.8876 | 0.747 | 0.589 | 0.768 | 184 |
| | | | 3d EEM WO | 0.9151 | 0.520 | 0.416 | 0.536 | 226 |
| | | | 5d EEM | 0.9158 | 0.646 | 0.504 | 0.674 | 148 |
| B3LYP/6–31G* | NPA | - | 3d QM | 0.9590 | 0.451 | 0.349 | 0.464 | 546 |
| | | - | 5d QM | 0.9680 | 0.399 | 0.295 | 0.416 | 411 |
| | | Ouy2009 | 3d EEM | 0.8731 | 0.793 | 0.541 | 0.815 | 161 |
| | | | 3d EEM WO | 0.9043 | 0.505 | 0.379 | 0.521 | 198 |
| | | | 5d EEM | 0.9094 | 0.670 | 0.503 | 0.699 | 137 |
| | | Ouy2009_elem | 3d EEM | 0.8727 | 0.795 | 0.546 | 0.817 | 160 |
| | | | 3d EEM WO | 0.9113 | 0.487 | 0.382 | 0.502 | 216 |
| | | | 5d EEM | 0.9132 | 0.656 | 0.495 | 0.684 | 143 |
| | | Ouy2009_elemF | 3d EEM | 0.8848 | 0.756 | 0.519 | 0.777 | 179 |
| | | | 3d EEM WO | 0.9012 | 0.512 | 0.386 | 0.528 | 192 |
| | | | 5d EEM | 0.8866 | 0.750 | 0.520 | 0.782 | 106 |
| | | Bult2002_npa | 3d EEM | 0.9044 | 0.689 | 0.532 | 0.708 | 221 |
| | | | 3d EEM WO | 0.9098 | 0.523 | 0.405 | 0.539 | 212 |
| | | | 5d EEM | 0.9180 | 0.638 | 0.488 | 0.666 | 152 |
| | Hir. | - | 3d QM | 0.9042 | 0.689 | 0.503 | 0.708 | 220 |
| | | - | 5d QM | 0.9477 | 0.509 | 0.356 | 0.531 | 246 |
| | | Bult2002_hir | 3d EEM | 0.8415 | 0.887 | 0.636 | 0.912 | 124 |
| | | | 3d EEM WO | 0.8838 | 0.579 | 0.414 | 0.597 | 160 |
| | | | 5d EEM | 0.9050 | 0.687 | 0.522 | 0.717 | 130 |
| | MK | - | 3d QM | 0.8447 | 0.878 | 0.705 | 0.903 | 127 |
| | | - | 5d QM | 0.8960 | 0.718 | 0.594 | 0.749 | 117 |
| | | Jir2008_dft | 3d EEM | 0.8696 | 0.804 | 0.555 | 0.827 | 156 |
| | | | 3d EEM WO | 0.9224 | 0.487 | 0.371 | 0.502 | 250 |
| | | | 5d EEM | 0.9148 | 0.650 | 0.489 | 0.678 | 146 |
| | | Bult2002_mk | 3d EEM | 0.8639 | 0.822 | 0.610 | 0.845 | 148 |
| | | | 3d EEM WO | 0.9053 | 0.519 | 0.384 | 0.535 | 201 |
| | | | 5d EEM | 0.9131 | 0.657 | 0.508 | 0.685 | 143 |
| | Chel. | - | 3d QM | 0.8528 | 0.854 | 0.712 | 0.878 | 135 |
| | | - | 5d QM | 0.9087 | 0.673 | 0.552 | 0.702 | 135 |
| | | Bult2002_che | 3d EEM | 0.8695 | 0.805 | 0.597 | 0.828 | 155 |
| | | | 3d EEM WO | 0.8863 | 0.588 | 0.436 | 0.606 | 164 |
| | | | 5d EEM | 0.9057 | 0.684 | 0.540 | 0.714 | 131 |
| | AIM | - | 3d QM | 0.9609 | 0.440 | 0.332 | 0.452 | 573 |
| | | - | 5d QM | 0.9677 | 0.400 | 0.285 | 0.417 | 407 |
| | | Bult2004_aim | 3d EEM | 0.8646 | 0.819 | 0.619 | 0.842 | 149 |
| | | | 3d EEM WO | 0.8972 | 0.590 | 0.438 | 0.608 | 183 |
| | | | 5d EEM | 0.9017 | 0.698 | 0.571 | 0.728 | 125 |
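For the rows checked here, the last two numeric columns of the table follow from the first two through the usual regression formulas: the standard error of the estimate s = RMSE·sqrt(n/(n−p−1)) and the Fisher statistic F = (R²/p) / ((1−R²)/(n−p−1)). The sketch below assumes n = 74 molecules and p = 3 or 5 descriptors, as stated in the abstract; the function names are hypothetical, and the check covers only the two rows shown, not the whole table.

```python
import math

def std_error_of_estimate(rmse, n, p):
    """s = RMSE * sqrt(n / (n - p - 1)) for a model with p descriptors."""
    return rmse * math.sqrt(n / (n - p - 1))

def f_statistic(r2, n, p):
    """F = (R^2 / p) / ((1 - R^2) / (n - p - 1))."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))

# HF/STO-3G, MPA, 3d QM row: R^2 = 0.9515, RMSE = 0.490 (tabulated 0.504 and 458).
s_3d = std_error_of_estimate(0.490, n=74, p=3)
f_3d = f_statistic(0.9515, n=74, p=3)

# B3LYP/6-31G*, MPA, 5d QM row: R^2 = 0.9724, RMSE = 0.370 (tabulated 0.386 and 479).
s_5d = std_error_of_estimate(0.370, n=74, p=5)
f_5d = f_statistic(0.9724, n=74, p=5)

print(round(s_3d, 3), round(f_3d), round(s_5d, 3), round(f_5d))
```

The "WO" rows cannot be checked the same way from this table alone, because the number of molecules they were fitted on is not listed.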
Figure 1. R² for the correlation between calculated and experimental pKa.
Average R² between experimental and predicted pKa for all QSPR models of a certain type and percentages of QSPR models whose R² values are in a certain interval
| QSPR model | 3d EEM | 3d EEM WO | 5d EEM | 3d QM | 5d QM |
|---|---|---|---|---|---|
| Average R² | 0.876 | 0.911 | 0.913 | 0.929 | 0.951 |
| Interval of R²: R² > 0.9 | 11% | 83% | 94% | 78% | 83% |
| 0.9 ≥ R² > 0.85 | 83% | 17% | 6% | 6% | 17% |
| 0.85 ≥ R² | 6% | 0% | 0% | 17% | 0% |

| QSPR model | EEM | QM |
|---|---|---|
| Average R² | 0.900 | 0.940 |
| Interval of R²: R² > 0.9 | 63% | 81% |
| 0.9 ≥ R² > 0.85 | 35% | 13% |
| 0.85 ≥ R² | 2% | 6% |
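The interval percentages in the 3d EEM column can be reproduced directly from the eighteen 3d EEM R² values listed in the table of quality criteria for phenols. The short sketch below does the counting; the function name `interval_shares` is hypothetical, and the thresholds 0.9 and 0.85 are the ones that appear in the table.

```python
# R^2 of the eighteen 3d EEM models, copied from the quality-criteria table.
r2_3d_eem = [
    0.8671, 0.8663, 0.8737, 0.8671, 0.9099, 0.8860,  # HF/STO-3G, MPA
    0.8612,                                          # HF/6-31G*, MK
    0.891, 0.8876,                                   # B3LYP/6-31G*, MPA
    0.8731, 0.8727, 0.8848, 0.9044,                  # B3LYP/6-31G*, NPA
    0.8415,                                          # Hirshfeld
    0.8696, 0.8639,                                  # MK
    0.8695,                                          # CHELPG
    0.8646,                                          # AIM
]

def interval_shares(values):
    """Percentage of models with R^2 > 0.9, in (0.85, 0.9], and <= 0.85."""
    n = len(values)
    high = sum(v > 0.9 for v in values)
    middle = sum(0.85 < v <= 0.9 for v in values)
    low = sum(v <= 0.85 for v in values)
    return tuple(round(100 * count / n) for count in (high, middle, low))

shares = interval_shares(r2_3d_eem)
print(shares)
```

For these values the counts are 2, 15 and 1 out of 18 models, which round to the 11%, 83% and 6% reported for 3d EEM.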
Average R² between experimental and predicted pKa for all QSPR models using atomic charges calculated by a specific combination of theory level and basis set, or by a specific population analysis
| QSPR model | | 3d EEM | 3d EEM WO | 5d EEM | 3d QM | 5d QM |
|---|---|---|---|---|---|---|
| Theory level and basis set * | HF/STO-3G | 0.878 | 0.919 | 0.918 | 0.952 | 0.966 |
| | B3LYP/6–31G* | 0.889 | 0.917 | 0.918 | 0.967 | 0.972 |
| Population analysis ** | MPA | 0.889 | 0.917 | 0.918 | 0.967 | 0.972 |
| | NPA | 0.884 | 0.907 | 0.907 | 0.959 | 0.968 |
| | Hirshfeld | 0.842 | 0.884 | 0.905 | 0.904 | 0.948 |
| | MK | 0.867 | 0.914 | 0.914 | 0.845 | 0.896 |
| | CHELPG | 0.870 | 0.886 | 0.906 | 0.853 | 0.909 |
| | AIM | 0.865 | 0.897 | 0.902 | 0.961 | 0.968 |
*Only QSPR models employing MPA were included in this analysis.
**Only QSPR models using B3LYP/6–31G* were included in this analysis.
Figure 2. Correlation graphs. Graphs showing the correlation between experimental and calculated pKa for selected QSPR models.
Comparison between the performance of the QSPR models developed here, and previously developed models
| Method | Theory level | PA | Basis set | Descriptors | R² | s | F | Number of molecules | Source |
|---|---|---|---|---|---|---|---|---|---|
| QM | B3LYP | NPA | 6–311G** | | 0.789 | 1.300 | 48 | 15 | Kreye and Seybold |
| | B3LYP | NPA | 6–311G** | | 0.731 | 1.500 | 38 | 15 | Kreye and Seybold |
| | B3LYP | NPA | 6–31+G* | | 0.880 | 0.970 | 95 | 15 | Kreye and Seybold |
| | B3LYP | NPA | 6–31+G* | | 0.865 | 1.000 | 38 | 15 | Kreye and Seybold |
| | B3LYP | NPA | 6–311G(d,p) | | 0.911 | 0.252 | 173 | 19 | Gross and Seybold |
| | B3LYP | NPA | 6–311G(d,p) | | 0.887 | 0.283 | 134 | 19 | Gross and Seybold |
| | B3LYP | NPA | 6–31G* | | 0.961 | 0.440 | 986 | 124 | Svobodova and Geidl |
| | B3LYP | NPA | 6–311G | | 0.962 | 0.435 | 1013 | 124 | Svobodova and Geidl |
| | B3LYP | NPA | 6–31G* | | 0.959 | 0.464 | 545 | 74 | This work |
| | B3LYP | NPA | 6–31G* | | 0.968 | 0.410 | 705 | 74 | This work |
| EEM | B3LYP | NPA | 6–31G* | | 0.918 | 0.656 | 261 | 74 | This work |
| QM | B3LYP | MPA | 6–311G(d,p) | | 0.913 | 0.248 | 179 | 19 | Gross and Seybold |
| | B3LYP | MPA | 6–311G(d,p) | | 0.894 | 0.274 | 144 | 19 | Gross and Seybold |
| | B3LYP | MPA | 6–311G | | 0.938 | 0.556 | 605 | 124 | Svobodova and Geidl |
| | B3LYP | MPA | 6–31G* | | 0.959 | 0.450 | 936 | 124 | Svobodova and Geidl |
| | B3LYP | MPA | 6–31G* | | 0.967 | 0.415 | 685 | 74 | This work |
| | B3LYP | MPA | 6–31G* | | 0.972 | 0.380 | 822 | 74 | This work |
| EEM | B3LYP | MPA | 6–31G* | | 0.919 | 0.651 | 265 | 74 | This work |
| QM | B3LYP | MK | 6–311G(d,p) | | 0.344 | 0.682 | 9 | 19 | Gross and Seybold |
| | B3LYP | MK | 6–311G(d,p) | | 0.692 | 0.467 | 38 | 19 | Gross and Seybold |
| | B3LYP | MK | 6–311G | | 0.822 | 0.941 | 185 | 124 | Svobodova and Geidl |
| | B3LYP | MK | 6–31G* | | 0.808 | 0.978 | 168 | 124 | Svobodova and Geidl |
| | B3LYP | MK | 6–31G* | | 0.845 | 0.902 | 126 | 74 | This work |
| | B3LYP | MK | 6–31G* | | 0.896 | 0.739 | 201 | 74 | This work |
| EEM | B3LYP | MK | 6–31G* | | 0.915 | 0.669 | 250 | 74 | This work |
a With solvent model SM5.4.
b With solvent model SM8.
c EEM parameter set Bult2002_npa.
d EEM parameter set Chaves2006.
e EEM parameter set Jir2008_mk.
Comparison of the quality criteria and statistical criteria for the training set, test set and complete set for some selected charge calculation approaches
5d EEM QSPR model employing Svob2007_chal2 EEM parameters:

Complete set:

| R² | RMSE | s | F | Number of molecules |
|---|---|---|---|---|
| 0.920 | 0.629 | 0.647 | 269 | 74 |

| Cross-validation step | Training set R² | Training set RMSE | Training set s | Training set F | Training set number of molecules | Test set R² | Test set RMSE | Test set s | Test set F | Test set number of molecules |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.9283 | 0.5211 | 0.5498 | 137 | 59 | 0.9202 | 1.0754 | 1.3884 | 21 | 15 |
| 2 | 0.9210 | 0.6538 | 0.6899 | 124 | 59 | 0.9029 | 0.5394 | 0.6963 | 17 | 15 |
| 3 | 0.9191 | 0.6442 | 0.6796 | 120 | 59 | 0.9275 | 0.5823 | 0.7517 | 23 | 15 |
| 4 | 0.9207 | 0.6244 | 0.6588 | 123 | 59 | 0.9271 | 0.6878 | 0.8880 | 23 | 15 |
| 5 | 0.9274 | 0.6302 | 0.6643 | 138 | 60 | 0.9008 | 0.6678 | 0.8834 | 15 | 14 |

5d EEM QSPR model employing Ouy2009_elemF EEM parameters:

Complete set:

| R² | RMSE | s | F | Number of molecules |
|---|---|---|---|---|
| 0.8866 | 0.7501 | 0.7825 | 106 | 74 |

| Cross-validation step | Training set R² | Training set RMSE | Training set s | Training set F | Training set number of molecules | Test set R² | Test set RMSE | Test set s | Test set F | Test set number of molecules |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.8936 | 0.6349 | 0.6698 | 89 | 59 | 0.8704 | 1.2857 | 1.6598 | 12 | 15 |
| 2 | 0.8953 | 0.7526 | 0.7940 | 91 | 59 | 0.8018 | 0.7802 | 1.0072 | 7 | 15 |
| 3 | 0.8908 | 0.7481 | 0.7893 | 86 | 59 | 0.8647 | 0.7983 | 1.0306 | 12 | 15 |
| 4 | 0.8821 | 0.7614 | 0.8033 | 79 | 59 | 0.9154 | 0.7481 | 0.9658 | 19 | 15 |
| 5 | 0.8956 | 0.7557 | 0.7966 | 93 | 60 | 0.8089 | 0.8396 | 1.1107 | 7 | 14 |
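The training and test set sizes in the table (59 or 60 versus 15 or 14 molecules) are what a five-fold split of 74 molecules produces. The sketch below shows one way such a cross-validation could be run. It is an assumption-laden illustration: the descriptors and pKa values are synthetic, the fold assignment is a random permutation (the source does not say how molecules were assigned to folds), and R² is taken as the squared Pearson correlation.

```python
import numpy as np

def fit(X, y):
    """Ordinary least squares with an intercept (hypothetical helper)."""
    design = np.column_stack([X, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def r_squared(y, y_hat):
    """Squared Pearson correlation between reference and predicted values."""
    return np.corrcoef(y, y_hat)[0, 1] ** 2

rng = np.random.default_rng(0)
n_molecules, n_descriptors, k = 74, 5, 5
X = rng.normal(size=(n_molecules, n_descriptors))        # synthetic descriptors
y = X @ rng.normal(size=n_descriptors) + 9.0 + rng.normal(scale=0.5, size=n_molecules)

folds = np.array_split(rng.permutation(n_molecules), k)  # sizes 15, 15, 15, 15, 14
results = []
for step, test_idx in enumerate(folds, start=1):
    train_idx = np.concatenate([f for i, f in enumerate(folds, start=1) if i != step])
    coef = fit(X[train_idx], y[train_idx])
    pred_train = X[train_idx] @ coef[:-1] + coef[-1]
    pred_test = X[test_idx] @ coef[:-1] + coef[-1]
    results.append((step,
                    len(train_idx), r_squared(y[train_idx], pred_train),
                    len(test_idx), r_squared(y[test_idx], pred_test)))

for step, n_train, r2_train, n_test, r2_test in results:
    print(step, n_train, round(r2_train, 4), n_test, round(r2_test, 4))
```

Comparing training and test R² step by step, as the table does, gives a quick view of how much a model's quality depends on which molecules it was fitted on.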
Figure 3. Correlation between calculated and experimental pKa for carboxylic acids.