
Prediction of Partition Coefficients of Environmental Toxins Using Computational Chemistry Methods.

David van der Spoel1, Sergio Manzetti1,2, Haiyang Zhang3, Andreas Klamt4,5.   

Abstract

The partitioning of compounds between aqueous and other phases is important for predicting toxicity. Although thousands of octanol-water partition coefficients have been measured, these represent only a small fraction of the anthropogenic compounds present in the environment. The octanol phase is often taken to be a mimic of the inner parts of phospholipid membranes. However, the core of such membranes is typically more hydrophobic than octanol, and other partition coefficients with other compounds may give complementary information. Although a number of (cheap) empirical methods exist to compute octanol-water (log k OW) and hexadecane-water (log k HW) partition coefficients, it would be interesting to know whether physics-based models can predict these crucial values more accurately. Here, we have computed log k OW and log k HW for 133 compounds from seven different pollutant categories as well as a control group using the solvation model based on electronic density (SMD) protocol based on Hartree-Fock (HF) or density functional theory (DFT) and the COSMO-RS method. For comparison, XlogP3 (log k OW) values were retrieved from the PubChem database, and KowWin log k OW values were determined as well. For 24 of these compounds, log k OW was computed using potential of mean force (PMF) calculations based on classical molecular dynamics simulations. A comparison of the accuracy of the methods shows that COSMO-RS, KowWin, and XlogP3 all have a root-mean-square deviation (rmsd) from the experimental data of ≈0.4 log units, whereas the SMD protocol has an rmsd of 1.0 log units using HF and 0.9 using DFT. PMF calculations yield the poorest accuracy (rmsd = 1.1 log units). Thirty-six out of 133 calculations are for compounds without known log k OW, and for these, we provide what we consider a robust prediction, in the sense that there are few outliers, by averaging over the methods. 
The results supplied may be instrumental when developing new methods in computational ecotoxicity. The log k HW values are found to be strongly correlated to log k OW for most compounds.


Year:  2019        PMID: 31497695      PMCID: PMC6713992          DOI: 10.1021/acsomega.9b01277

Source DB:  PubMed          Journal:  ACS Omega        ISSN: 2470-1343


Introduction

The chemical properties, biochemical interference, and biopersistence of environmental pollutants are critical factors for toxicology programs and strategies such as the REACH program and Tox21c.[1−3] It is necessary to label, classify, and predict toxicological properties of chemicals so as to develop evidence-based environmental health and safety (EHS) standards for new and emerging compounds.[4−6] As many new and emerging compounds and pollutants are known to pose serious environmental and health risks,[7] effective and inexpensive means of assessing chemical and toxicological properties are required. In this study, we use a number of computational chemistry methods to estimate the octanol–water (log kOW) and hexadecane–water (log kHW) partition coefficients in order to gain insight into membrane permeability. The modeling methods are applied to 133 compounds from the following categories: haloalkanes, haloaromatics, polycyclic aromatic hydrocarbons (PAHs), polychlorinated biphenyls (PCBs), perfluorinated compounds (PFCs), parabens (PRBs), and phthalates (PHTs). All of these compounds belong to ubiquitous pollutant categories.[7] The log kOW value approximates a compound’s potential to partition into membranes, which is indirectly related to toxicity, because for most modes of action compounds have to cross a cell membrane. log kOW values therefore represent a cornerstone in pharmaceutical as well as environmental chemistry and toxicology, and it is important to determine log kOW for pollutants. Indeed, a toxicity profile including log kOW has to be determined before a chemical is allowed to enter the market in Europe, the United States of America, and Japan.[8] However, for many emerging pollutants, the log kOW values have not been resolved,[9] and questions on their toxicity as well as regulatory decisions are still pending.[10−13] In addition, there have been reports of disagreements in log kOW measurements.[8] Indeed, experimental numbers for physicochemical observables may need more scrutiny in general.[14] A number of recent experimental studies have focused on the compound classes studied here, for example, Quinn et al. studied partitioning of PCBs into different phases,[15] while Xiang and co-workers focused on log kOW of perfluorinated carboxylic acids,[16] but for numerous compounds no reliable measurements are available. In general, the error in experimental measurements of log kOW varies from 0.1 to 1 log units.[17] There is a large body of work related to measurement or prediction of log kOW based on, for instance, quantitative structure–property relationships (QSPRs).[18,19] Although efforts to develop better experimental methods are ongoing,[20] more research is focused on computational predictions ranging from coarse-grained simulations[21] to analytical reference interaction site model theory,[22] to quantum chemistry,[23] to machine learning[24,25] and other empirical methods.[26,27] Interestingly, log kOW values have been used to parameterize models for dissipative particle dynamics simulations as well.[28] At the same time, there are still new efforts to measure physicochemical data including log kOW, with improvement of QSPR methods being one of the reasons.[29] This is important as experimental databases are known to have errors in everything from names and structures to physicochemical properties.[14,30,31] Environmental and toxicological sciences can, in principle, benefit from using the tools of computational chemistry to determine log kOW values and other properties of new chemicals to facilitate the differentiation of pollutants from innocuous chemicals.[23,24,32−39] In this study, we compare three computational approaches to calculate log kOW of 133 pollutant compounds, and, in addition, the log kHW values are computed using two of the methods. The results are compared to experimental data as well as the XlogP prediction method[40,41] and the widely used KowWin (kOW-Windows),[42,43] which is part of the EPI Suite.[44]

Methods

One hundred and thirty-three compounds were selected from seven pollutant categories, haloalkanes, haloaromatics, PAHs, polychlorinated biphenyls (PCBs), PFCs, PRBs, PHTs, and a control category. All compounds and computed values are listed in Table S1. Initial 3D coordinates of the pollutants were taken, if available, from the PubChem database directly,[45] and if not available, the structures were built manually using Discovery Studio 4.5 Visualizer, followed by AM1[46] and PM6[47] optimizations in the gas phase using the Gaussian 09[48] software. It should be noted that a varying number of the compounds studied in this paper have been used for parameterizing the methods used here. Experimental log kOW and log kHW values were taken from a number of sources (Table S1). In some cases, the partition coefficients were determined from the difference between solvation free energies in water and octanol, alternatively water and hexadecane.[49] Some of the log kHW were taken from Hafkenscheid and Tomlinson who specify that the solvent is “aliphatic alkane”.[50] All results can be visualized on the (http://virtualchemistry.org) website.[51] It should be noted that many papers in the literature refer to computed data as experimental or even present log kOW without any reference whatsoever.

SMD Calculations

Solvation model based on electronic density (SMD) calculations were performed with the Gaussian 09[48] or the Gaussian 16[52] software. Optimizations and frequency calculations were carried out for all the pollutants in the liquid phase with SMD solvent models (i.e., water, n-octanol, and n-hexadecane)[49] and in the gas phase separately at the HF/6-31+G(d,p)[53] level of theory. The LANL2DZdp-ECP basis set[54] was used for iodine atoms, as this has been shown to yield reasonable results in other studies.[55] The abbreviation Hartree–Fock (HF) will be used for these calculations in the remainder of this work. In earlier work,[56] a number of levels of theory were used, and it was found that both HF and density functional theory (DFT) predict values that are too low, which could be due not only to the basis set size applied but also to the method. In order to distinguish these two possibilities, a further set of calculations was done using the BP86 functional[57,58] (denoted as BP86 in what follows). The solvation free energy of a pollutant molecule was defined as the difference between the free energy of the solute calculated in the liquid phase and in the gas phase.[56,59,60] The partition coefficients were computed from the differences in solvation free energy of each compound in water, 1-octanol, and n-hexadecane. This approximation ignores the fact that octanol is significantly hydrated, the solubility of water in octanol being 48.8 g/kg at room temperature.[61] The Minnesota solvation database[62] has been used to develop and tune the SMD method, and some compounds from the database are used here.
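
The conversion from solvation free energies to a partition coefficient can be illustrated with a minimal sketch. It uses the standard thermodynamic relation log k = (ΔG_water − ΔG_organic)/(RT ln 10); the function name, the choice of kcal/mol, the temperature of 298.15 K, and the two free energies in the example are assumptions made for illustration and are not taken from the paper.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def log_partition(dg_water, dg_organic, temperature=298.15):
    """Return log10 of the organic/water partition coefficient.

    dg_water and dg_organic are solvation free energies in kcal/mol
    (gas phase to solvent). A more negative value in the organic phase
    gives a positive log k.
    """
    return (dg_water - dg_organic) / (R_KCAL * temperature * math.log(10))


# Hypothetical solvation free energies (kcal/mol), for illustration only.
dg_wat = -2.0
dg_oct = -5.0
print(f"log kOW = {log_partition(dg_wat, dg_oct):.2f}")
```

At room temperature, RT ln 10 is roughly 1.36 kcal/mol, so each 1.36 kcal/mol of extra stabilization in the organic phase raises log k by one unit.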

COSMO-RS Calculations

COSMO-RS, that is, the conductor-like screening model for realistic solvation,[63−65] is a quantum chemically based approach to predict thermodynamic equilibrium properties of molecules in liquids. It starts from polarization charge densities of solute and solvent molecules, which arise if the molecules are embedded in a virtual conductor. These can be efficiently calculated using DFT combined with the conductor-like screening model (COSMO),[66] which is available in most quantum chemical programs. The TURBOMOLE program[67] with a Becke–Perdew functional[57,58] and a TZVPD basis set[68] was used for these calculations, together with the default COSMO parameters in TURBOMOLE. On the basis of the individual COSMO results of solutes and solvent molecules, the COSMO-RS method expresses the specific interactions of molecules in a liquid system, that is, electrostatic interactions and hydrogen bonding, as pairwise, local interactions of surface segments, quantified by the COSMO polarization charge densities σ of the interacting segments. By an efficient and accurate statistical thermodynamics calculation for the interacting surfaces, the chemical potentials and free energies of the molecules in pure and mixed solvents are calculated. For the current project, standard COSMO-RS calculations have been performed with the COSMOtherm program with the BP_TZVPD_FINE_18 parameterization.[69] This means that the conformations and geometries used in the COSMO-RS calculations for the solutes and solvents were generated and handled as described in Klamt et al.[70] For compounds for which multiple conformations are relevant, the free energy in each solvent is calculated from the logarithm of the conformational partition function, leading to a multiconformational treatment which would be cumbersome in methods like SMD. log kOW values of 16 of the compounds studied here were used in tuning the COSMO-RS code.
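
The multiconformational treatment mentioned above can be sketched as a Boltzmann-weighted free energy, G = −RT ln Σ exp(−G_i/RT), where G_i is the free energy of conformer i in a given solvent. This is a generic illustration of a conformational partition function and not the COSMOtherm implementation; the function name, the units (kcal/mol), and the example energies are assumptions.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def conformer_free_energy(energies, temperature=298.15):
    """Free energy of a conformer ensemble from per-conformer free energies.

    energies: free energies (kcal/mol) of each conformer in one solvent.
    The lowest energy is factored out before exponentiation to keep the
    sum numerically stable.
    """
    if not energies:
        raise ValueError("at least one conformer is required")
    rt = R_KCAL * temperature
    g_min = min(energies)
    partition = sum(math.exp(-(g - g_min) / rt) for g in energies)
    return g_min - rt * math.log(partition)


# Hypothetical conformer free energies in one solvent (kcal/mol).
print(conformer_free_energy([-5.0, -4.6, -3.9]))
```

The ensemble free energy is never higher than that of the lowest conformer; additional low-lying conformers lower it further, which is the entropic contribution that a single-conformer calculation misses.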

XlogP3 and KowWin

The XlogP3 algorithm first searches for the most similar compound in the database, and if there is no full hit, the differences in the structures are accounted for by an incremental method.[41] The values for the compounds studied here were downloaded from PubChem[71] (Table S1). The algorithm yielded a root-mean-square deviation (rmsd) of 0.41 log units for 8199 compounds in the original paper.[41] A significant fraction of the compounds studied here is in the training set for XlogP3. KowWin log kOW were computed based on an empirical atom and fragment contribution method[42] by the widely used EPI Suite.[43,44]

Potential of Mean Force Calculations

Rectangular boxes containing 313 molecules of 1-octanol and 2627 water molecules were built where the 1-octanol phase was slightly hydrated (see analysis in Results and Discussion). This box was equilibrated for 2 ns in order to obtain a stable biphasic system. Pollutant input files were generated as described above. The generalized Amber force field (GAFF[72]) was used for 1-octanol and all pollutant compounds. Charges for the pollutants were computed from the electrostatic potential using the Merz–Kollman procedure[73,74] in Gaussian 16,[52] computed using DFT (B3LYP[57,75−77]) combined with the aug-cc-pVTZ basis set.[78−80] The compounds are part of the Alexandria database,[31] and Gaussian log files are available for download at Zenodo.[81] The TIP3P water model[82] was used. The GROMACS 2016 software package[83,84] was used for all simulations. Long-range Coulomb interactions and Lennard-Jones (LJ) interactions were treated using the particle-mesh Ewald method (PME).[85,86] LJ-PME was used because it has been shown that the omission of long-range LJ interactions leads to incorrect surface tensions of liquids[87−89] and biological membranes[86] and, in addition, has an effect on protein aggregation at high protein concentrations in simulations.[90] Constraints were used on all chemical bonds to hydrogen atoms, applying the LINCS algorithm,[91] allowing a 1 fs integration time step. Temperature coupling in production simulations was applied using the v-rescale algorithm[92] with a time constant of 0.5 ps. The pressure was controlled using the Parrinello–Rahman algorithm[93] with a time constant of 10 ps, using the semi-isotropic scheme where the direction orthogonal to the interface is coupled separately from the other two directions. Simulations of 2 ns were performed “pulling” the pollutant through the 1-octanol/water box (see Movie M1) perpendicular to the water/1-octanol interface. We note that in the PMF method explicit water molecules do enter the octanol phase and contribute to the energy profile as discussed below. As the resulting numbers are in principle a property of the force field only, they will be denoted GAFF-ESP because electrostatic potential-derived charges were used.
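
One way to turn a PMF profile into a partition coefficient is to take the difference between the plateau values in the two bulk phases. The sketch below assumes that approach; the paper does not spell out its analysis procedure, so the function name, the plateau windows, the units (kJ/mol and nm), and the synthetic profile are all hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol K)


def log_k_from_pmf(z, pmf, water_range, octanol_range, temperature=298.15):
    """Estimate log kOW from a PMF along the interface normal.

    z: positions (nm); pmf: free energy at each position (kJ/mol).
    water_range, octanol_range: (low, high) windows in z that are assumed
    to lie in the bulk of each phase.
    """
    def plateau(low, high):
        values = [w for zi, w in zip(z, pmf) if low <= zi <= high]
        if not values:
            raise ValueError("no PMF points inside the requested window")
        return sum(values) / len(values)

    # Transfer free energy from water to octanol.
    dg_transfer = plateau(*octanol_range) - plateau(*water_range)
    return -dg_transfer / (R_KJ * temperature * math.log(10))


# Synthetic profile: water bulk at 0 kJ/mol, octanol bulk at -10 kJ/mol.
z = [0, 1, 2, 3, 4, 5, 6, 7, 8]
pmf = [0.0, 0.0, 0.0, 0.0, -5.0, -10.0, -10.0, -10.0, -10.0]
print(f"log kOW = {log_k_from_pmf(z, pmf, (0, 3), (5, 8)):.2f}")
```

With RT ln 10 of about 5.7 kJ/mol at room temperature, a transfer free energy of −10 kJ/mol corresponds to a log kOW of roughly 1.75.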

Results and Discussion

The prediction of log kOW values is summarized quantitatively in Table 1 for each of the classes of compounds. The lowest rmsd from experimental data for all compounds is obtained for XlogP3, COSMO-RS, and KowWin (all about 0.4 log units), followed by the SMD method (HF: 1.0 and BP86: 0.9 log units) and the potential of mean force (PMF) calculations (1.1 log units). Both SMD methods systematically underestimate log kOW with a mean signed error (MSE) of ≈−0.6. The HF method is known to overpolarize compounds;[31,94] however, this should not be the case for BP86, and therefore, there may be other contributing factors. The GAFF-ESP calculations, on the other hand, overestimate log kOW, possibly due to the lack of explicit polarizability. The results are skewed slightly by outliers in some of the compound classes, which will be discussed in some detail below. The XlogP3 rmsd is low in part because some of the compounds may be part of the database used for optimizing the algorithm. It should also be noted that the experimental error varies between 0.1 and 1.0 log units, with larger compounds having larger uncertainty.[17]
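Table 1

Statistics for Prediction of log kOW per Method and Compound Class. The Number of (Neutral) Compounds Included Is Determined by the Availability of Experimental Data (a)

| class | HF N | HF r2 | HF rmsd | HF MSE | BP86 N | BP86 r2 | BP86 rmsd | BP86 MSE | COSMO-RS N | COSMO-RS r2 | COSMO-RS rmsd | COSMO-RS MSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| control | 12 | 0.98 | 0.57 | −0.33 | 12 | 0.99 | 0.43 | −0.08 | 12 | 0.99 | 0.22 | −0.04 |
| haloalkane | 8 | 0.98 | 0.42 | 0.37 | 8 | 0.97 | 0.46 | 0.41 | 8 | 0.98 | 0.18 | 0.01 |
| haloaromatic | 12 | 0.99 | 0.47 | −0.44 | 12 | 0.67 | 0.63 | −0.11 | 12 | 0.74 | 0.42 | −0.05 |
| PAH | 17 | 0.89 | 1.51 | −1.35 | 17 | 0.71 | 1.36 | −1.15 | 17 | 0.93 | 0.41 | −0.18 |
| PCB | 23 | 0.87 | 0.69 | −0.64 | 23 | 0.86 | 0.51 | −0.44 | 23 | 0.88 | 0.30 | −0.19 |
| PFC | 6 | 0.88 | 0.38 | −0.13 | 6 | 0.54 | 0.60 | −0.09 | 6 | 0.67 | 0.89 | 0.76 |
| PRB | 7 | 0.98 | 1.82 | −1.81 | 7 | 0.95 | 1.77 | −1.75 | 7 | 0.99 | 0.31 | −0.15 |
| PHT | 12 | 0.83 | 1.09 | −0.42 | 12 | 0.93 | 1.00 | −0.76 | 12 | 0.99 | 0.56 | 0.47 |
| all | 97 | 0.87 | 0.99 | −0.64 | 97 | 0.87 | 0.92 | −0.52 | 97 | 0.96 | 0.42 | 0.01 |

(a) Number of compounds N, squared correlation coefficient r2, rmsd from experiment (rmsd) and MSE, both in log P units.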

The calculation times vary from less than a second for the QSPR methods to minutes for COSMO-RS, days for the DFT and HF methods, and half a year for each of the PMFs. Although long calculation times may preclude high-throughput usage of the methods, it is important to establish the relative accuracies of the methods.
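
The three statistics reported per method (r2, rmsd, and MSE) can be computed as in the following sketch. The sign convention for the MSE (calculated minus experimental) follows the text, where a negative MSE means underestimation; taking r2 as the squared Pearson correlation coefficient is an assumption, and the function name and example numbers are hypothetical.

```python
import math


def prediction_statistics(experimental, calculated):
    """Return N, r2, rmsd, and mean signed error (MSE) for paired values."""
    n = len(experimental)
    if n != len(calculated) or n < 2:
        raise ValueError("need two equally long lists with at least 2 values")

    # Signed errors: negative means the method underestimates experiment.
    errors = [c - e for e, c in zip(experimental, calculated)]
    mse = sum(errors) / n
    rmsd = math.sqrt(sum(d * d for d in errors) / n)

    # Squared Pearson correlation coefficient.
    mean_e = sum(experimental) / n
    mean_c = sum(calculated) / n
    cov = sum((e - mean_e) * (c - mean_c)
              for e, c in zip(experimental, calculated))
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    var_c = sum((c - mean_c) ** 2 for c in calculated)
    r2 = cov * cov / (var_e * var_c)

    return {"N": n, "r2": r2, "rmsd": rmsd, "MSE": mse}


# Hypothetical log kOW values: a constant offset of -0.5 log units.
print(prediction_statistics([1.0, 2.0, 3.0], [0.5, 1.5, 2.5]))
```

A constant offset, as in this example, leaves r2 at 1 while rmsd and MSE both reflect the shift, which is why a high correlation coefficient and a large rmsd can coexist for the same method.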

Control Compounds

The control class consists of a number of small polar and apolar compounds including aromatic compounds. They were chosen to have known experimental values and to span a range of log kOW values, including negative ones. All methods except GAFF-ESP perform relatively well for this category with small MSE (Table 1).

Haloalkanes

The haloalkane group comprises a set of eight compounds (Table S1), for which log kOW predictions all yield high correlation coefficients and low rmsd (Table 1). Interestingly, SMD yields an almost perfect correlation coefficient r2 of 0.98; however, all log kOW are overestimated by 0.42 log units, while, in contrast, all other groups are underestimated systematically (Table 1). This result suggests that there is room for improvement in the parameterization of the 1-octanol solvent in the SMD model. The compounds chloromethane, chloroethane, and pentachloroethane were also evaluated using GAFF-ESP, and these are overestimated in all cases, in particular 1,1,1,2,2-pentachloroethane. The reason for this discrepancy is likely associated with the force-field parameters, which are not specifically optimized for compounds such as haloalkanes. Addition of a virtual site to model halogen bonding, as present in other general force fields,[95] might help resolve these issues to some extent.

Haloaromatic Compounds

log kOW for 12 haloaromatic compounds were predicted using the SMD and COSMO-RS methods, and for three of these, including 1-chloro-3-phenylbenzene and hexachlorobenzene, predictions were made using GAFF-ESP as well. The results (Tables 1 and S1) show an rmsd between 0.4 and 0.5 for all methods. The rmsd for SMD-BP86 is slightly higher than for the other methods because of one compound, namely, 1-chloro-3-phenylbenzene (Table S1).

Polycyclic Aromatic Hydrocarbons

It should be relatively easy to predict log kOW for PAHs given their planar structures, lack of substituents, and limited number of geometrical conformations. Indeed, our calculations display a reasonable agreement with empirical data (Table 1) with r2 > 0.85 in all cases except SMD-BP86. However, both SMD methods have a few severe outliers (−2 log units, Table S1). It may be that larger PAHs (>C16), which have higher log kOW values, are more difficult to predict correctly using SMD as a result of their aromatic moment across the large planar structures.[32] 1,2-Dihydroacenaphthylene and chrysene were predicted using GAFF-ESP, yielding moderate overestimations in both cases (Table S1), in line with the overall trend (Table 1).

Polychlorinated Biphenyls

PCBs are predicted quite accurately by all methods, in particular COSMO-RS. SMD-HF has an MSE of −0.64 log units and SMD-BP86 an MSE of −0.44 log units. Given that PCBs have been present in the environment for a long time,[7] there is a large amount of data available and only two predictions are made here (Table 2).
Table 2

Predictions for log kOW from Multiple Calculations for Neutral Compounds Where No Experimental Data Are Available (a)

| compound | category | HF | BP86 | COSMO-RS | XlogP3 | KowWin | average |
|---|---|---|---|---|---|---|---|
| (2R)-1,2-dibromo-3-chloropropane | haloalkane | 3.03 | 3.17 | 2.30 | 2.40 |  | 2.7 (0.4) |
| 1,2,3,4,5-pentafluoro-6-(2,3,4,5,6-pentafluorophenyl)benzene | haloaromatic | 3.73 | 4.36 | 5.84 | 4.60 | 5.76 | 4.9 (0.9) |
| 5-methylchrysene | PAH | 4.40 | 4.71 | 5.76 | 6.00 | 6.07 | 5.4 (0.7) |
| 7H-benzo[c]fluorene | PAH | 4.17 | 4.39 | 5.15 | 5.70 | 5.19 | 4.9 (0.6) |
| benzo[j]fluoranthene | PAH | 4.39 | 4.69 | 5.83 | 6.40 | 6.11 | 5.5 (0.8) |
| cyclopenta[cd]pyrene | PAH | 4.03 | 4.31 | 5.12 | 5.50 | 5.70 | 4.9 (0.7) |
| indeno[1,2,3-cd]pyrene | PAH | 4.72 | 5.12 | 6.22 | 7.00 | 6.70 | 6.0 (0.9) |
| 1,2,3,4,5-pentachloro-6-(2,3,4,6-tetrachlorophenyl)benzene | PCB | 7.45 | 7.76 | 7.72 | 8.20 | 9.56 | 8.1 (0.8) |
| 1,2,3,5-tetrachloro-4-(2,3,5,6-tetrachlorophenyl)benzene | PCB | 6.98 | 7.27 | 7.29 | 7.70 | 8.91 | 7.6 (0.7) |
| 2,2,3,3,4,4,4-heptafluorobutanoic acid | PFC | 1.18 | 0.98 | 2.88 | 2.20 | 2.14 | 1.9 (0.8) |
| (3R)-2,2,3-trifluoro-3-(trifluoromethyl)oxirane | PFC | 1.52 | 1.67 | 2.94 | 2.10 | 1.72 | 2.0 (0.6) |
| 2,3,4,5,6-pentafluorobenzoic acid | PFC | 1.99 | 1.42 | 2.82 | 2.00 | 1.78 | 2.0 (0.5) |
| (4R,6S)-1,1,2,2,3,3,4,5,5,6-decafluoro-4,6-bis(trifluoromethyl)cyclohexane | PFC | 4.55 | 4.42 | 5.31 | 5.40 | 6.02 | 5.1 (0.6) |
| 2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,12-tricosafluorododecanoic acid | PFC | 4.43 | 5.55 | 7.74 | 7.60 | 7.49 | 6.6 (1.4) |
| 2,2,3,3,4,4,5,5,6,6,7,7,7-tridecafluoroheptanoic acid | PFC | 2.54 | 2.78 | −1.82 | 4.30 | 4.15 | 2.4 (2.3) |
| 1,1,2,2,3,3,4,4,5,5,6,6,6-tridecafluorohexane-1-sulfonic acid | PFC | 0.79 | 1.21 | 5.12 | 3.70 | 3.16 | 2.8 (1.7) |
| 2,2,3,3,4,4,5,5,6,6,6-undecafluorohexanoic acid | PFC | 2.30 | 1.87 | 4.20 | 3.60 | 3.48 | 3.1 (0.9) |
| 1,1,1,2,2,3,3,4,5,5,5-undecafluoro-4-(trifluoromethyl)pentane | PFC | 4.09 | 4.26 | 4.89 | 5.10 | 5.02 | 4.7 (0.5) |
| 2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,9-heptadecafluorononanoic acid | PFC | 3.09 | 4.64 | 6.16 | 5.60 | 5.48 | 5.0 (1.1) |
| 2,2,3,3,4,4,5,5,5-nonafluoropentanoic acid | PFC | 1.01 | 1.80 | 3.50 | 2.90 | 2.81 | 2.4 (0.9) |
| 1,1,2,2,3,3,4,4,5,5,5-undecafluoro-N,N-bis(1,1,2,2,3,3,4,4,5,5,5-undecafluoropentyl)pentan-1-amine | PFC | 8.21 | 8.39 | 10.83 | 11.90 | 10.21 | 9.9 (1.5) |
| 2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,11-henicosafluoroundecanoic acid | PFC | 3.63 | 4.04 | 7.44 | 6.90 | 6.82 | 5.8 (1.6) |
| 2-methylpropyl 4-hydroxybenzoate | PRB | 1.74 | 2.05 | 3.17 | 3.40 | 3.40 | 2.8 (0.8) |
| propan-2-yl 4-hydroxybenzoate | PRB | 1.07 | 1.58 | 2.73 | 2.80 | 2.91 | 2.2 (0.8) |
| octyl 4-hydroxybenzoate | PRB | 4.05 | 3.70 | 5.55 | 5.40 | 5.43 | 4.8 (0.8) |
| pentyl 4-hydroxybenzoate | PRB | 1.91 | 2.55 | 3.88 | 3.80 | 3.96 | 3.2 (0.9) |
| 1-O-butyl 2-O-cyclohexyl benzene-1,2-dicarboxylate | PHT | 4.36 | 4.64 | 5.30 | 5.30 | 5.41 | 5.0 (0.5) |
| dicyclohexyl benzene-1,2-dicarboxylate | PHT | 4.17 | 5.00 | 5.66 | 5.20 | 6.20 | 5.2 (0.7) |
| bis(7-methyloctyl) benzene-1,2-dicarboxylate | PHT | 7.75 | 8.15 | 9.80 | 9.60 | 9.37 | 8.9 (0.9) |
| diundecyl benzene-1,2-dicarboxylate | PHT | 9.34 | 11.85 | 12.33 | 12.30 | 11.49 | 11.5 (1.2) |
| butyl decyl phthalate | PHT | 5.68 | 6.61 | 8.05 | 8.00 | 7.56 | 7.2 (1.0) |
| bis(5-methylhexyl) benzene-1,2-dicarboxylate | PHT | 5.58 | 6.56 | 7.69 | 7.40 | 7.41 | 6.9 (0.8) |
| bis(4-methylpentyl) benzene-1,2-dicarboxylate | PHT | 4.96 | 5.61 | 6.53 | 6.30 | 6.43 | 6.0 (0.6) |
| bis(11-methyldodecyl) benzene-1,2-dicarboxylate | PHT | 11.77 | 12.61 |  | 13.90 | 13.30 | 12.9 (0.8) |
| bis(9-methyldecyl) benzene-1,2-dicarboxylate | PHT | 8.23 | 9.18 | 11.99 | 11.70 |  | 10.3 (1.7) |
| bis(2-propylheptyl) benzene-1,2-dicarboxylate | PHT | 8.12 | 9.05 | 10.72 | 9.60 | 10.36 | 9.6 (1.0) |

(a) XlogP3 values taken from PubChem.[99] KowWin values produced using EPI suite.[42] Standard deviations within brackets.

Perfluorinated Compounds

PFCs have an antipromiscuous chemistry, which makes these compounds associate with neither water nor octanol. This might lead to problems in experimental assessments too. Our predictions for the few compounds with experimental data are quite close to the experimental values; however, the values in Table 2 vary a lot between the methods used here, with a large standard deviation for a number of compounds. Hidalgo and colleagues[96] recently reported log kOW values computed using SMD for medium-weight (up to 11 carbon atoms) linear PFCs and compared the results to empirical log kOW from the KowWin program.[42−44] They question some of the experimental data and also find systematic differences between the results obtained using purely empirical methods and the quantum chemistry-based SMD results. GAFF-ESP simulations were done for four PFCs from this set, namely, 1,1,1,2,2,2-hexafluoroethane, 1,1,1,2,2,3,3,3-octafluoropropane, 1,1,1,3,3,3-hexafluoropropan-2-ol, and 2,2,2-trifluoroacetic acid, yielding overestimations of 1.2, 1.0, 1.4, and 1.1 log units relative to experiment, respectively. The PMF method with the GAFF-ESP force field therefore does not improve on the accuracy of the SMD methods or COSMO-RS.

Parabens

The log kOW for PRBs are predicted accurately using COSMO-RS and for XlogP3 and KowWin as well. For SMD, a large systematic deviation (MSE of −1.8 log units) was found using both HF and BP86 methods. To our knowledge however, only one study has reported multiple theoretically predicted log kOW values for PRBs.[97] In that work by Casoni and Sârbu, the most accurate method for calculation was found to be ACLogP, where methylparaben, ethylparaben, propylparaben, and butylparaben were predicted with a deviation of 0–0.22 log units compared with experimental results from a study by Kitagawa and Li.[98] From the study by Casoni and Sârbu, it can be concluded that the larger the PRB molecule becomes, the higher the deviation from the empirical results. The results for SMD based on either HF or BP86 point more to a constant offset, however. Five of the PRBs were also studied using PMFs, namely, methyl 4-hydroxybenzoate, ethyl 4-hydroxybenzoate, propyl 4-hydroxybenzoate, butyl 4-hydroxybenzoate, and heptyl-4-hydroxybenzoate, which gave a lower rmsd than SMD (1.2 log units) but opposite sign of the MSE.

Phthalates

PHTs turned out to be difficult to predict with an rmsd > 0.9 for both SMD methods (Table 1). For COSMO-RS, this category displays the largest (positive) MSE of all. The PHTs bear a ring moiety with two carbon chains attached and therefore have a large number of degrees of freedom, which may contribute entropically to the free energies of solvation and hence to the high rmsd. Although the PHT category contains many relatively large compounds, there is no clear correlation between molecular weight and error in the prediction.

log kOW Predictions

Table 2 displays computed log kOW for 36 compounds. The averages over the five numbers are proposed to be the predicted values because the rmsd for the average for the compounds where there are experimental data is slightly lower than that of any of the methods by itself. Inoue and co-workers reported predictions of log kOW for some large PFCs[100] using the KowWin program.[42−44] Their numbers are quite a bit higher than what is found here using SMD, but it seems that the log kOW of this category of compounds as well as PRBs and PHTs are underestimated systematically in SMD (Figure 1). For PAHs, the size-dependent underestimation is present for the SMD methods. Nevertheless, there is a good correspondence between the methods, and because of the physical approach used in SMD and COSMO-RS, it seems reasonable to assume that the average numbers provided in Table 2 are good approximations.
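
The averaging over methods can be sketched as follows, using two rows from the table of predictions as input. Missing values are skipped, so the average is taken over four or five numbers. The use of the sample standard deviation is an assumption, since the source does not state which estimator was used for the values in brackets; the function name is hypothetical.

```python
from statistics import mean, stdev


def consensus(values):
    """Average and sample standard deviation over the available methods.

    values: predictions in the order HF, BP86, COSMO-RS, XlogP3, KowWin,
    with None for a method that gave no value.
    """
    available = [v for v in values if v is not None]
    if len(available) < 2:
        raise ValueError("need at least two predictions")
    return mean(available), stdev(available)


# Rows taken from the table of predictions (log kOW).
decafluorobiphenyl = [3.73, 4.36, 5.84, 4.60, 5.76]
dibromochloropropane = [3.03, 3.17, 2.30, 2.40, None]  # no KowWin value

for name, row in [("decafluorobiphenyl", decafluorobiphenyl),
                  ("dibromochloropropane", dibromochloropropane)]:
    avg, sd = consensus(row)
    print(f"{name}: {avg:.1f} ({sd:.1f})")
```

A large spread between methods, as for several of the PFCs, is a sign that the consensus value should be treated with more caution than the average alone suggests.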
Figure 1

Correlation between experimental and calculated log P (residual) for six methods separately for all the classes considered.

The PMF calculations described here allow water to enter the octanol phase and in this manner influence the log kOW through preferential solvation or, in principle, by binding alcohol groups in the octanol phase. An analysis of the amount of water in the octanol phase yields no difference depending on the solute: in all systems, approximately 6 ± 1 water molecules are found in the octanol phase.

log kHW

log kOW partition coefficients have been used in environmental analysis and for toxicity prediction for several decades, and various methods for calculating and determining log kOW empirically have been devised. However, some studies[101,102] suggest that relying solely on the solubility of a compound in octanol and water may not yield a complete picture of the potential toxicity of the compound. For this reason, we have also performed predictions of the hexadecane (C16H34)–water partition coefficient log kHW (Table S1). C16H34 has a higher capacity to solvate heavy apolar compounds, such as large PAHs and hydrocarbons and potentially also PFCs.[50] All computed log kHW are given in Table S1 alongside a small number (12) of experimental data points obtained from the Minnesota solvation database[62] (based on experimental data from, e.g., Abraham[103]) and from partition coefficients for water/aliphatic alkanes.[50] Compared to these 12 data points, the predictions are within 1 log unit for all compounds except for urea. Figure 2 shows the correlation between log kHW computed using COSMO-RS and both SMD methods. When neglecting one outlier, 2,2,3,3,4,4,5,5,6,6,7,7,7-tridecafluoroheptanoic acid, the correlations are r2 = 0.72 for HF and 0.76 for BP86, with an rmsd between SMD and COSMO-RS of 2.4 (HF) and 1.1 (BP86) log units. Because both COSMO-RS and SMD contain empirical elements, it is difficult to pinpoint the underlying reason for the discrepancies, but BP86 is much closer to COSMO-RS than HF. It may be that less effort has gone into fine-tuning the parameterization of implicit solvent models for hexadecane than for water and octanol. Nevertheless, COSMO-RS is known to perform well for a range of solvents,[104,105] while SMD also has been shown to outperform implicit solvent models based on empirical force fields.[56] Another issue could be that the basis set is not sufficiently large for PFCs, but evaluation of basis sets is beyond the scope of this paper.
Figure 2

Comparison between log kHW computed using the COSMO-RS (X-axis) and the quantum chemical method SMD (Y-axis) for the HF and BP86 methods.

Figure 3 shows that the correlation plots for the PFCs, PRBs, and PHTs all have a slope close to one but an offset of −3 to 4 log units. For haloalkanes, haloaromatics, and PCBs, the difference between log kOW and log kHW is small, as it is for the hydrophobic (log kOW > 0) control compounds. The truly hydrophilic control compounds are much more soluble in octanol than in hexadecane. For most PAH compounds log kHW is slightly larger than log kOW. These findings are in line with the well-known result that more aliphatic compounds solvate more readily in lipid bilayers.[106] This suggests that there is not a lot of extra information to be had from log kHW calculations (or measurements) if the log kOW is known already.
Figure 3

Comparison between log kOW and log kHW computed using the quantum chemical methods SMD-HF and SMD-BP86 as well as COSMO-RS for all compounds. The green lines correspond to log kOW = log kHW and are plotted to guide the eye.

Conclusions

Comparisons of empirical methods for computing log kOW have been published previously, including molecular modeling studies[107,108] and more approximative models.[109−113] Bannan et al. used separate free energy of solvation calculations in different solvents to obtain the log kOW.[108] These authors obtained an rmsd of ≈1.6 log units, slightly larger than the value found here (1.1, Table ). Although the number of compounds is too small to draw any conclusion and the compounds studied are different, it might be worthwhile to study whether the biphasic system used here improves the predictive power. Benfenati and co-workers compared KowWin to a number of other software packages and found this package to be one of the most accurate ones.[109] More recently, dos Reis and co-workers compared different prediction algorithms in a statistical analysis and found KowWin to be one of the most accurate ones.[111] In contrast, Geisler et al., in a comparison of log kOW prediction for small compounds, found KowWin to be quite a bit less accurate than COSMOtherm,[112] which is more in line with the results presented here. It is of interest that development of QSPR methods is ongoing,[114] in part fueled by the finding that databases used to derive older QSPR methods from needed to be curated.[30] In this study, we have derived log kOW values for a large set of compounds from different chemical classes using four computational methods. The quantum chemical SMD approach and the COSMO-RS methods were used to compute log kOW and log kHW for 133 compounds, while XlogP3 (log kOW) values were downloaded for reference and KowWin values computed using the EPI suite. By taking the average over four to five values, we provide what we consider accurate log kOW predictions for 36 compounds for which no experimental data are available (Table ). 
Because the number of available experimental data points remains limited despite decades of measurements, we hope these numbers may be of use in environmental toxicity applications. Of the three methods used for the predictions, the SMD method systematically underestimates log kOW, while COSMO-RS and XlogP3 overestimate log kOW slightly. COSMO-RS yields the most accurate predictions in the tests provided here. For a number of difficult cases, molecular dynamics simulations were used to compute the PMF for transport through the octanol-water system. This method has an rmsd from the experimental data of ≈1.1 log units. However, if the large MSE (≈1.0 log units) is subtracted from the results, the rmsd reduces to 0.8 log units. The finding that the PMF method systematically overestimates log kOW could be related to a deficiency in either of the solvent models, although the TIP3P water model is known to reproduce solvation relatively well.[104,105,115,116]
Considering the computational cost, PMFs are not competitive because the quality of their predictions is no better than that of the cheaper methods. Of the other methods, SMD is relatively expensive, with CPU requirements varying from minutes to days because quantum chemical calculations scale nonlinearly with system size. DFT is more CPU-time efficient than HF. COSMO-RS is considerably more efficient than SMD, while KowWin is virtually instantaneous. Nevertheless, given the present quality of predictions, it may be wise to apply more than one method. Finally, our results should not be extrapolated to compounds with chemical moieties far outside the range of compounds studied here.
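The error measures quoted for the PMF results (rmsd, MSE, and the rmsd after subtracting the MSE) can be sketched as follows. This assumes that MSE denotes the mean signed error, and that the correction consists of subtracting this constant offset from every prediction; how the correction was applied in detail is not stated here. The function name and the data are hypothetical and are not values from this study.

```python
import math


def error_statistics(predicted, experimental):
    """Return (MSE, rmsd, bias-corrected rmsd) for paired values.

    MSE is taken to be the mean signed error (an assumption); the
    bias-corrected rmsd is the rmsd left after subtracting that MSE
    from every individual error.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [p - e for p, e in zip(predicted, experimental)]
    n = len(errors)
    mse = sum(errors) / n
    rmsd = math.sqrt(sum(d * d for d in errors) / n)
    rmsd_corrected = math.sqrt(sum((d - mse) ** 2 for d in errors) / n)
    return mse, rmsd, rmsd_corrected


# Hypothetical log kOW values, for illustration only.
predicted = [2.0, 3.5, 5.0, 1.0]
experimental = [1.0, 3.0, 3.5, 0.0]

mse, rmsd, rmsd_corrected = error_statistics(predicted, experimental)
print(f"MSE = {mse:.2f}, rmsd = {rmsd:.2f}, corrected rmsd = {rmsd_corrected:.2f}")
```

With these definitions, the squared rmsd equals the squared MSE plus the squared corrected rmsd, so a large systematic offset can dominate the rmsd even when the scatter around that offset is modest.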