Nicolas Tielker, Daniel Tomazic, Lukas Eberlein, Stefan Güssregen, Stefan M Kast.
Abstract
Results are reported for octanol-water partition coefficients (log P) of the neutral states of drug-like molecules provided during the SAMPL6 (Statistical Assessment of Modeling of Proteins and Ligands) blind prediction challenge, obtained by applying the "embedded cluster reference interaction site model" (EC-RISM) as a solvation model for quantum-chemical calculations. Following the strategy outlined during earlier SAMPL challenges, we first train 1- and 2-parameter water-free ("dry") and water-saturated ("wet") models for n-octanol solvation Gibbs energies with respect to experimental values from the "Minnesota Solvation Database" (MNSOL), yielding a root mean square error (RMSE) of 1.5 kcal mol−1 for the best-performing 2-parameter wet model, while the optimal water model developed for the pKa part of the SAMPL6 challenge is kept unchanged (RMSE 1.6 kcal mol−1 for neutral compounds from a model trained on both neutral and ionic species). Applying these models to the blind prediction set yields a log P RMSE of less than 0.5 for our best model (2-parameter, wet). Further analysis of our results reveals that a single compound, SM15, is responsible for most of the error; without it, the RMSE drops to 0.2. Since this is the only compound in the challenge dataset with a hydroxyl group, we investigate other alcohols for which Gibbs energy of solvation data for both water and n-octanol are available in the MNSOL database to demonstrate a systematic cause of error and to discuss strategies for improvement.
Keywords: EC-RISM; Integral equation theory; Quantum chemistry; SAMPL6; Solvation model; log P
Year: 2020 PMID: 31981015 PMCID: PMC7125249 DOI: 10.1007/s10822-020-00283-4
Source DB: PubMed Journal: J Comput Aided Mol Des ISSN: 0920-654X Impact factor: 3.686
Fig. 1 Calculated vs. experimental Gibbs energies of solvation in n-octanol for the MNSOL dataset [27] based on EC-RISM calculations for various n-octanol models: dry octanol (A) and wet octanol (B) using either a single (1-par, light blue triangles) or two parameters (2-par, dark blue triangles) in the trained correction. Uncorrected data is shown as red squares. Dashed lines indicate descriptive regression results. Optimized solution and gas phase structures are provided as Online Resource 2; calculated data, also split into separate components, are provided as Online Resource 3
Table 1 Regression parameters of optimized EC-RISM-based Gibbs energy of solvation models (three coefficients c: one dimensionless, one in kcal mol−1 Å−3, and one in kcal mol−1 e−1) along with statistical metrics (root-mean-square error RMSE / kcal mol−1, mean absolute error MAE / kcal mol−1, mean signed error MSE / kcal mol−1, slope m′, intercept b′ / kcal mol−1, and coefficient of determination R² from descriptive regression). For water, as taken from Ref. [3], separate metrics are reported for neutrals, anions, and cations in addition to the full MNSOL dataset
| Solvent | RMSE | MAE | MSE | m′ | b′ | R² | c | c / kcal mol−1 Å−3 | c / kcal mol−1 e−1 |
|---|---|---|---|---|---|---|---|---|---|
| Water | | | | | | | | | |
| All | 2.04 | 1.43 | −0.26 | 1.00 | −0.35 | 1.00 | – | −0.10251 | −15.728 |
| Neutrals | 1.56 | 1.13 | −0.36 | 0.97 | −0.47 | 0.89 | – | – | – |
| Anions | 3.07 | 2.46 | 0.01 | 1.10 | 7.18 | 0.94 | – | – | – |
| Cations | 2.98 | 2.10 | 0.02 | 0.96 | −2.62 | 0.85 | – | – | – |
| Octanol (dry) | | | | | | | | | |
| 1-par | 1.78 | 1.33 | 0.03 | 0.66 | −2.15 | 0.85 | – | −0.00799 | – |
| 2-par | 1.48 | 1.14 | −0.08 | 0.89 | −0.78 | 0.87 | 1.33446 | −0.00609 | – |
| Octanol (wet) | | | | | | | | | |
| 1-par | 1.73 | 1.31 | −0.01 | 0.68 | −2.08 | 0.85 | – | −0.01552 | – |
| 2-par | 1.51 | 1.16 | −0.10 | 0.87 | −0.93 | 0.86 | 1.28924 | −0.01315 | – |
Fig. 2 EC-RISM-derived vs. experimental log P values for the SAMPL6 log P dataset using either a single parameter (1-par) for the n-octanol model (A) or a two-parameter (2-par) n-octanol model (B). Data generated using dry/wet octanol are shown as light/dark blue squares, respectively. Optimized solution phase structures are provided as Online Resource 4; calculated data, also split into separate components, are provided as Online Resource 5
Table 2 Individual experimental and corresponding predicted log P values for all models
| Compound | log P (exp.) | Dry, 1-par | Wet, 1-par | Dry, 2-par | Wet, 2-par |
|---|---|---|---|---|---|
| SM02 | 4.09 | 3.74 | 3.66 | 4.56 | 4.19 |
| SM04 | 3.98 | 2.97 | 3.00 | 4.08 | 3.86 |
| SM07 | 3.21 | 2.60 | 2.65 | 3.62 | 3.46 |
| SM08 | 3.10 | 1.55 | 1.62 | 3.78 | 3.37 |
| SM09 | 3.03 | 2.23 | 2.31 | 3.41 | 3.22 |
| SM11 | 2.10 | 0.22 | 0.29 | 2.25 | 2.01 |
| SM12 | 3.83 | 3.19 | 3.15 | 4.25 | 3.92 |
| SM13 | 2.92 | 1.99 | 2.22 | 3.28 | 3.22 |
| SM14 | 1.95 | 0.05 | 0.18 | 1.51 | 1.42 |
| SM15 | 3.07 | 0.42 | 0.51 | 1.85 | 1.71 |
| SM16 | 2.62 | 1.64 | 1.65 | 3.00 | 2.73 |
Submission IDs for the individual submissions are 2tzb0 (dry, 1-par), rdsnw (wet, 1-par), qyzjx (dry, 2-par), j8nwc (wet, 2-par)
Table 3 Statistical metrics for log P predictions (root-mean-square error RMSE, mean absolute error MAE, mean signed error MSE, slope m′, intercept b′, and coefficient of determination R² from descriptive regression) for various models, encoded according to Table 2
| Model | Submission ID | RMSE | MAE | MSE | m′ | b′ | R² |
|---|---|---|---|---|---|---|---|
| Dry, 1-par | 2tzb0 | 1.38 | 1.21 | −1.21 | 1.58 | −2.99 | 0.79 |
| Wet, 1-par | rdsnw | 1.32 | 1.15 | −1.15 | 1.51 | −2.72 | 0.77 |
| Dry, 2-par | qyzjx | 0.54 | 0.45 | 0.15 | 1.22 | −0.51 | 0.73 |
| Wet, 2-par | j8nwc | 0.47 | 0.31 | −0.07 | 1.14 | −0.51 | 0.73 |
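The metrics for the wet, 2-par model can be recomputed from the individual log P values listed earlier. The following is a minimal sketch, not the authors' evaluation script. It assumes that the first numeric column of that table holds the experimental value, that errors are taken as predicted minus experimental, and that the descriptive regression fits predicted against experimental values. The function and variable names are illustrative.

```python
import math

# (experimental log P, wet 2-par prediction), transcribed from the
# table of individual log P values; column assignment is an assumption.
LOGP = {
    "SM02": (4.09, 4.19), "SM04": (3.98, 3.86), "SM07": (3.21, 3.46),
    "SM08": (3.10, 3.37), "SM09": (3.03, 3.22), "SM11": (2.10, 2.01),
    "SM12": (3.83, 3.92), "SM13": (2.92, 3.22), "SM14": (1.95, 1.42),
    "SM15": (3.07, 1.71), "SM16": (2.62, 2.73),
}


def error_metrics(pairs):
    """Return (RMSE, MAE, mean signed error), error = predicted - experimental."""
    errors = [pred - exp for exp, pred in pairs]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(errors) / n
    return rmse, mae, mse


def descriptive_regression(pairs):
    """Least-squares slope and intercept of predicted vs. experimental values."""
    n = len(pairs)
    mean_x = sum(exp for exp, _ in pairs) / n
    mean_y = sum(pred for _, pred in pairs) / n
    sxy = sum((exp - mean_x) * (pred - mean_y) for exp, pred in pairs)
    sxx = sum((exp - mean_x) ** 2 for exp, _ in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


all_pairs = list(LOGP.values())
without_sm15 = [pair for name, pair in LOGP.items() if name != "SM15"]

rmse_all, mae_all, mse_all = error_metrics(all_pairs)
rmse_no_sm15, _, _ = error_metrics(without_sm15)
slope, intercept = descriptive_regression(all_pairs)

print(f"all compounds: RMSE={rmse_all:.2f} MAE={mae_all:.2f} MSE={mse_all:.2f}")
print(f"without SM15:  RMSE={rmse_no_sm15:.2f}")
print(f"regression:    m'={slope:.2f} b'={intercept:.2f}")
```

Under these assumptions the sketch reproduces the tabulated RMSE, MAE, MSE, slope, and intercept for the wet, 2-par model, and the RMSE without SM15 comes out at about 0.24, in line with the value of 0.2 quoted in the abstract.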
Table 4 Calculated Gibbs energies of the neutral microstates relative to the most favorable tautomer (microstate) of each compound for both solvents (in kcal mol−1)
| Microstate | Water | Octanol (wet, 2-par) | Octanol (dry, 2-par) | Octanol (wet, 1-par) | Octanol (dry, 1-par) |
|---|---|---|---|---|---|
| SM02_micro002 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| SM02_micro003 | 5.16 | 5.57 | 5.66 | 5.65 | 5.71 |
| SM02_micro007 | 6.18 | 8.86 | 8.80 | 10.30 | 10.40 |
| SM04_micro003 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| SM04_micro004 | 8.45 | 9.81 | 9.74 | 10.68 | 10.76 |
| SM04_micro009 | 11.10 | 11.72 | 11.78 | 12.15 | 12.24 |
| SM07_micro002 | 8.97 | 10.59 | 10.61 | 11.63 | 11.78 |
| SM07_micro003 | 6.75 | 7.97 | 8.00 | 8.34 | 8.41 |
| SM07_micro004 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| SM08_micro008 | 10.26 | 24.63 | 24.61 | 32.59 | 33.52 |
| SM08_micro010 | 5.69 | 6.05 | 6.56 | 4.70 | 4.89 |
| SM08_micro011 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| SM09_micro002 | 6.79 | 9.55 | 9.45 | 11.45 | 11.57 |
| SM09_micro003 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| SM09_micro011 | 5.60 | 6.02 | 6.09 | 6.46 | 6.55 |
| SM11_micro005 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| SM11_micro028 | 7.14 | 8.07 | 8.21 | 8.46 | 8.61 |
| SM11_micro029 | 14.81 | 17.69 | 17.68 | 18.81 | 18.93 |
| SM11_micro030 | 26.91 | 34.04 | 34.12 | 36.10 | 36.40 |
| SM12_micro002 | 4.73 | 5.21 | 5.32 | 5.35 | 5.43 |
| SM12_micro011 | 5.76 | 8.48 | 8.42 | 10.04 | 10.14 |
| SM12_micro012 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| SM13_micro005 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| SM13_micro007 | 6.23 | 6.28 | 6.31 | 6.69 | 6.76 |
| SM13_micro009 | 8.01 | 10.72 | 10.51 | 12.78 | 12.84 |
| SM14_micro001 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| SM14_micro005 | 28.76 | 37.41 | 37.02 | 41.99 | 42.23 |
| SM15_micro001 | 9.24 | 19.80 | 18.80 | 26.68 | 26.76 |
| SM15_micro002 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| SM16_micro002 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| SM16_micro003 | 12.41 | 13.39 | 13.61 | 12.68 | 12.79 |
| SM16_micro007 | 6.75 | 11.48 | 11.49 | 13.61 | 13.93 |
Individual tautomer Gibbs energies in each solvent are provided as Online Resource 6. In contrast to the calculation of the partition coefficients, where special treatment is not necessary, here we made sure that individual conformations undergoing a protonation shift during QC optimization were manually assigned to the correct microstate before evaluation of the partition function.
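To illustrate what the relative microstate Gibbs energies imply, the sketch below converts them into Boltzmann populations and into the Gibbs energy of the tautomer ensemble relative to its most stable member. This is the textbook partition-function expression, not necessarily the authors' exact workflow. The temperature of 298.15 K is an assumption, and the function names are illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def boltzmann_populations(relative_g, temperature=298.15):
    """Populations of states from relative Gibbs energies in kcal/mol.

    The default temperature is an assumed value, not taken from the source.
    """
    rt = R_KCAL * temperature
    weights = [math.exp(-g / rt) for g in relative_g]
    z = sum(weights)  # partition function relative to the lowest state
    return [w / z for w in weights]


def ensemble_correction(relative_g, temperature=298.15):
    """-RT ln Z: ensemble Gibbs energy relative to the most stable state."""
    rt = R_KCAL * temperature
    z = sum(math.exp(-g / rt) for g in relative_g)
    return -rt * math.log(z)


# SM02 in water: micro002, micro003, micro007 (values from the table above)
sm02_water = [0.00, 5.16, 6.18]
populations = boltzmann_populations(sm02_water)
correction = ensemble_correction(sm02_water)

for name, p in zip(["micro002", "micro003", "micro007"], populations):
    print(f"SM02_{name}: {p:.6f}")
print(f"ensemble correction: {correction:.5f} kcal/mol")
```

With gaps of 5 kcal mol−1 or more, as for SM02 in water, the most stable tautomer holds more than 99.9% of the population at the assumed temperature, so the minor tautomers shift the ensemble Gibbs energy only marginally.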
Fig. 3 Calculated vs. experimental log P of the combined SAMPL6 and MNSOL datasets (A) and errors in the solvation Gibbs energies of the MNSOL compounds in both solvents (B). In panel (A), SAMPL6 data are represented by squares, MNSOL data by triangles. Additionally, alcoholic compounds and their regression statistics are colored in red (y = 1.03 x − 1.16) while all other compound classes are shown in blue (y = 1.14 x − 0.37). In panel (B), aliphatic alcohols are depicted as squares while aromatic alcohols are depicted as triangles. Dark blue data points represent the errors of the solvation Gibbs energy in water, whereas light blue points refer to the errors of the solvation Gibbs energies in wet n-octanol, sorted in ascending n-octanol error order per group
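The link between the solvation Gibbs energy errors in panel (B) and the log P errors in panel (A) can be made concrete with the standard thermodynamic relation log P = (G_water − G_octanol) / (RT ln 10). The sketch below assumes this sign convention and a temperature of 298.15 K, neither of which is stated in this excerpt. The function names and the input values in the usage lines are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def kcal_per_log_unit(temperature=298.15):
    """Gibbs energy difference (kcal/mol) that corresponds to one log P unit."""
    return R_KCAL * temperature * math.log(10)


def log_p_from_gibbs(g_water, g_octanol, temperature=298.15):
    """log P from Gibbs energies in water and n-octanol (kcal/mol)."""
    return (g_water - g_octanol) / kcal_per_log_unit(temperature)


def log_p_error(err_water, err_octanol, temperature=298.15):
    """Propagate per-solvent Gibbs energy errors (calc - exp) into log P."""
    return (err_water - err_octanol) / kcal_per_log_unit(temperature)


# Hypothetical inputs, chosen only to show the behaviour of the formula.
print(f"{kcal_per_log_unit():.3f} kcal/mol per log P unit")
print(f"equal errors cancel:   {log_p_error(1.0, 1.0):+.2f}")
print(f"octanol-only error:    {log_p_error(0.0, 1.0):+.2f}")
```

At the assumed temperature one log P unit corresponds to roughly 1.36 kcal mol−1, so errors of equal size and sign in both solvents cancel, while an error present in only one solvent carries over in full.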