Reliable Prediction of the Octanol-Air Partition Ratio.

Sivani Baskaran¹, Ying Duan Lei¹, Frank Wania¹.

Abstract

The octanol-air equilibrium partition ratio (KOA ) is frequently used to describe the volatility of organic chemicals, whereby n-octanol serves as a substitute for a variety of organic phases ranging from organic matter in atmospheric particles and soils, to biological tissues such as plant foliage, fat, blood, and milk, and to polymeric sorbents. Because measured KOA values exist for just over 500 compounds, most of which are nonpolar halogenated aromatics, there is a need for tools that can reliably predict this parameter for a wide range of organic molecules, ideally at different temperatures. The ability of five techniques, specifically polyparameter linear free energy relationships (ppLFERs) with either experimental or predicted solute descriptors, EPISuite's KOAWIN, COSMOtherm, and OPERA, to predict the KOA of organic substances, either at 25 °C or at any temperature, was assessed by comparison with all KOA values measured to date. In addition, three different ppLFER equations for KOA were evaluated, and a new modified equation is proposed. A technique's performance was quantified with the mean absolute error (MAE), the root mean square error (RMSE), and the estimated uncertainty of future predicted values, that is, the prediction interval. We also considered each model's applicability domain and accessibility. With an RMSE of 0.37 and a MAE of 0.23 for predictions of log KOA at 25 °C and RMSE of 0.32 and MAE of 0.21 for predictions made at any temperature, the ppLFER equation using experimental solute descriptors predicted the KOA the best. Even if solute descriptors must be predicted in the absence of experimental values, ppLFERs are the preferred method, also because they are easy to use and freely available. Environ Toxicol Chem 2021;40:3166-3180.

© 2021 The Authors. Environmental Toxicology and Chemistry published by Wiley Periodicals LLC on behalf of SETAC.

Keywords: Environmental partitioning; Organic contaminants; Partitioning coefficient; Partitioning ratio; Quantitative structure-activity relationships

Year: 2021 PMID: 34473856 PMCID: PMC9292506 DOI: 10.1002/etc.5201

Source DB: PubMed Journal: Environ Toxicol Chem ISSN: 0730-7268 Impact factor: 4.218

INTRODUCTION

The use and importance of equilibrium partition ratios have been well established in environmental chemistry and in chemical exposure and risk assessment (Mackay et al., 2015; Schwarzenbach et al., 2005), in agricultural chemistry (Lacoste et al., 2020), and in the pharmaceutical sciences (Lipinski et al., 1997). The octanol–air equilibrium partition ratio (K OA) describes the distribution of a compound between n‐octanol and the gas phase at equilibrium. The log K OA is frequently used to describe the volatility of organic chemicals or their tendency to be absorbed from the gas phase. The solvent n‐octanol serves as a substitute for a variety of organic phases or sorbents. This includes organic matter in atmospheric particles (Finizio et al., 1997), soils (Finizio et al., 1998), and sludge (Cousins et al., 1997), as well as biological tissues including plant foliage (Paterson et al., 1990), lipids (Kelly & Gobas, 2003), blood, and milk (Batterman et al., 2002). It has also been applied to describe organic vapors partitioning into technical sorbents, such as polyurethane foam (Shoeib & Harner, 2002a) and silicone (Anderson et al., 2017), as well as phases such as organic films (Harner et al., 2003), house dust (Bennett & Furtaw, 2004), and cotton and polyester (Saini et al., 2016). Accordingly, it has been increasingly used in chemical exposure and risk assessment including estimations of a chemical's potential for bioaccumulation in air‐breathing species (Gobas et al., 2003). Laboratory measurements of the K OA by methods such as the generator column (Harner & Mackay, 1995), headspace (Lei et al., 2019; Xu & Kropscott, 2012, 2013, 2014), and gas chromatography retention time techniques (Su et al., 2002; Wania et al., 2002; Zhang et al., 1999) are often time consuming, difficult, and/or expensive. Reliable prediction methods could be useful as an alternative to determining the K OA experimentally. 
Multiple quantitative structure–activity relationships (QSARs) have been presented for predicting the K OA of organic chemicals. QSARs are commonly used in the pharmaceutical industry and in environmental science and chemistry to predict a property of a chemical based on its structure. Many of these QSARs are focused on predicting the K OA of a specific subset of closely related chemicals, for example, polychlorinated dibenzo‐p‐dioxins (PCDDs; Chen et al., 2002; Zeng et al., 2013), polybrominated diphenyl ethers (PBDEs; Chen et al., 2003a; Liu et al., 2013; Xu et al., 2007), polychlorinated naphthalenes (PCNs; Chen et al., 2003b), polychlorinated biphenyls (PCBs; Chen et al., 2003c; Chen et al., 2016; Li et al., 2020; Yuan et al., 2016), and polycyclic aromatic hydrocarbons (PAHs; Ferreira, 2001); thus they are too limited in their applicability domain for most purposes. Not all prediction models are limited to a specific subset of compounds. Polyparameter linear free energy relationships (ppLFERs) use system constants that describe the characteristics of the bulk phases, whereas solute descriptors describe specific characteristics of a solute. If experimental solute descriptors are not available, they can be estimated from molecular structure using QSARs (e.g., ACD/Absolv [Advanced Chemistry Development, 2021] and IFSQSAR [Brown et al., 2012]). Lack of structural diversity among the chemicals used in the calibration of the system constants can limit the applicability domain of a ppLFER equation. COSMOtherm, OPERA, EPISuite™ (US Environmental Protection Agency [USEPA], 2012), and SPARC Performs Automated Reasoning in Chemistry (SPARC) can also estimate the K OA from molecular structure. Different estimation software packages can yield divergent results and consequently may result in inconsistent regulatory decisions depending on the selected model (Zhang et al., 2010). 
It is important to compare the performance of various estimation techniques for K OA prior to implementation of predictions within regulatory procedures. For an estimation technique to be useful, it must be able to predict the K OA for a large variety of chemicals with quantifiable uncertainty, which is largely dependent on the diversity of its training and validation sets. To our knowledge, no assessment has compared different prediction techniques for K OA, but previous work has compared the performance of COSMOtherm, Absolv, and SPARC with respect to their ability to predict the hexadecane–air partition ratio (Bronner et al., 2010; Stenzel et al., 2012). Bronner et al. (2010) found that ppLFERs in combination with Absolv‐predicted solute descriptors (root mean square error [RMSE] = 0.40) were best able to predict the hexadecane–air partition ratio for bifunctional compounds, whereas for pesticides, drugs, and hormones the COSMOtherm model had the smallest RMSE (0.97). Stenzel et al. (2012) also reported that COSMOtherm (RMSE = 0.94) and ppLFER/Absolv (RMSE = 0.99) performed better than the SPARC model for predicting hexadecane–air partition ratios. Further research by Stenzel et al. (2014) in assessing the ability of these approaches to predict partitioning in four liquid/liquid systems again found that COSMOtherm and ppLFER/Absolv had similar levels of performance, and performed much better than SPARC. In the present study, using an exhaustive data set of experimentally determined K OA values recently compiled by Baskaran et al. (2021), we comparatively evaluated the ability of five methods, namely, EPISuite, COSMOtherm, OPERA, and the ppLFERs with either experimental or estimated solute descriptors, to reliably predict the K OA of an organic compound both at 25 °C and at any temperature. The five different prediction techniques were selected based on their capacity to predict K OA for a wide range of chemicals.

MATERIALS AND METHODS

Outline of the approach

The basis of the evaluation was deviations of predicted from experimental log K OA. Specifically, we used RMSEs, residuals (log K observed – log K predicted), and mean absolute errors (MAEs) as numerical criteria of model performance. Positive residuals indicate the model is underpredicting the log K OA, and negative residuals indicate an overprediction. The MAE is the average of the absolute residuals over the n predictions:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\log K_{\mathrm{observed},i} - \log K_{\mathrm{predicted},i}\right|$$
A 95% prediction interval for future estimates was calculated for each model using the mean (x̄) and standard deviation (SD) of the residuals. The prediction interval is a confidence interval that predicts the residual for a future estimate (x new) made using the model, and can therefore give a range in which the true log K OA value will likely fall.
The performance of the model was compared over the whole range of experimental data, whereby the average of multiple measurements for the same chemical at the same temperature was taken (n = 1453). In addition, the model performance was assessed for the much smaller set of chemicals for which all five techniques succeeded in making log K OA predictions at 25 °C (n = 439).
The methods EPISuite, OPERA, and ppLFERs with estimated solute descriptors also provide information on how well a chemical's molecular structure fits into the model's applicability domain and how reliable the prediction therefore is likely to be. We also evaluated separately the deviations from measurements of predictions belonging to different categories of reliability. Because the applicability domain for each model is defined differently, these categories of reliability cannot be compared between the different estimation techniques but are rather a measure of how reliable a model considers its prediction. 
For example, a prediction classified as “good” from a ppLFER using estimated solute descriptors is not comparable to an OPERA estimated value marked “excellent.” Instead, these are independent assessments that tell us that the ppLFER system considers this prediction to be reliable and the OPERA model assesses this prediction to be very good. Although the rankings of different models cannot be compared, we can compare the prediction performance for the different categories of reliability of one model. Finally, we also compare the different techniques with respect to the size of their applicability domain, and the accessibility and ease‐of‐use of the modeling software.
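The performance metrics described in this outline can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function name `residual_stats` is hypothetical, the prediction interval is assumed to take the standard form x̄ ± q · SD · √(1 + 1/n), and the quantile q is taken from the normal distribution as a large-sample stand-in for the Student's t quantile.

```python
import math
from statistics import NormalDist, mean, stdev


def residual_stats(observed, predicted, level=0.95):
    """Summarise residuals (observed - predicted) of log K_OA predictions."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    n = len(residuals)
    mae = sum(abs(r) for r in residuals) / n            # mean absolute error
    rmse = math.sqrt(sum(r * r for r in residuals) / n)  # root mean square error
    x_bar = mean(residuals)
    sd = stdev(residuals)  # sample standard deviation (n - 1 denominator)
    # Assumption: normal quantile used in place of the t quantile (large n).
    q = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = q * sd * math.sqrt(1 + 1 / n)
    return {
        "n": n,
        "mean": x_bar,
        "mae": mae,
        "sd": sd,
        "rmse": rmse,
        "pi_lower": x_bar - half_width,
        "pi_upper": x_bar + half_width,
    }
```

A positive mean residual returned by this sketch would indicate systematic underprediction of log K OA, in line with the sign convention stated above.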

Experimental data

The data used in the present study were taken from an extensive literature review that identified 2017 experimental K OA values in the literature (Baskaran et al., 2021). After K OA values for wet octanol (K′ OA), chemical mixtures, ambiguous or nonorganic chemicals, and K OA values judged unreliable (Baskaran et al., 2021) were removed from the data set, 1950 experimental data points for 604 organic chemicals measured between –10 and 110 °C were included. The temperature of each experimental log K OA was rounded to the nearest tenth. If there was more than one log K OA measurement for a chemical at a given temperature, the average experimental log K OA was used. This procedure yielded 475 and 1673 unique log K OA values at 25 °C and any temperature, respectively. The original calibration of ppLFER and OPERA models was based on measured K OA data, and thus the measurements used to train and validate these models are included in our experimental data set. These values have not been removed from our analysis, and may bias the results of the assessment slightly in favor of models with more chemicals in the training and validation data sets. Note that the COSMOtherm and EPISuite models require no external calibration. The simplified molecular‐input line‐entry system (SMILES) notations, CAS numbers, and names of the chemicals were included in the database by Baskaran et al. (2021). Structure data files (SDFs) having explicit hydrogen bonding and three‐dimensional coordinates were generated using Open Babel (Ver 2.4.1; O'Boyle et al., 2011).

Prediction techniques

ppLFERs

Two of the prediction techniques included in the comparison are ppLFERs that use solute descriptors provided by the online UFZ‐LSER Database (Ulrich et al., 2017). The ppLFERs quantify a solute's interactions with bulk phases using descriptors reflecting the properties of the solute and system constants characterizing the solvating phases (Endo & Goss, 2014). Solute descriptors include a chemical's H‐bond basicity (B), H‐bond acidity (A), dipolarity/polarizability (S), excess molar refraction (E), McGowan molar volume (V), and log hexadecane–air partition ratio (L). The L and V terms describe the size of the chemical and general solute–solvent interactions, and the remaining terms quantify specific solute–solvent interactions (Abraham et al., 2001). Specifically, the A and B terms describe hydrogen‐bonding interactions (Abraham, 1993), S accounts for solute polarizability and dipolarity (Abraham et al., 1990), and E also describes polarizability and interactions between π and lone electron pairs (Abraham et al., 1990; Abraham, 1993). Constants for the octanol–air system, determined using multiple linear regressions of measured log K OA values against solute descriptors, have been reported by Abraham and Acree (2008; Abraham et al., 2010; Ulrich et al., 2017; Equation 3) and Endo and Goss (2014; Equation 4). The standard error (SE) associated with each system constant is included in the parentheses. The difference between Equations (3) and (4) is the use of either E or V as a solute descriptor. When used in conjunction, as in Equation (3), the E and L (or V) parameters together describe the cavity energy and van der Waals interactions (Goss, 2005). Because V describes the size of the cavity, it must inherently contain information on the cavity energy, which is the basis of ppLFER equations of type (4) (Goss, 2005). The latter is often preferred because V is easily calculated. 
For most organic compounds, V and L are highly correlated; exceptions are polyfluorinated compounds and organosilicon compounds such as the cyclic volatile methyl siloxanes (Endo & Goss, 2014; Supporting Information, Figure SI 4). The system constant for the V term in Equation (4) is very small, has a p value of 0.17, and has a high relative uncertainty, suggesting that this term may not be necessary. Further details are provided in the Supporting Information. Thus, using the same data set of 181 chemicals used by Endo and Goss (2014), we calibrated a ppLFER equation that uses only the S, A, B, and L parameters (Equation 5).
Mintz et al. (2008) presented a ppLFER for the enthalpy of solvation in octanol (∆H°OA in units of kJ/mol; Equation 6). Although ∆H°OA is temperature dependent, a ∆H°OA value obtained with Equation (6) is assumed to apply to the range from –10 to 45 °C (Mintz et al., 2008). We did not consider an alternative ppLFER for ∆H°OA that uses the E solute descriptor instead of V (Mintz et al., 2007). The ∆H°OA can be converted to an internal energy of phase transfer between octanol and the gas phase (∆U°OA in units of kJ/mol; Atkinson & Curthoys, 1978; Goss & Eisenreich, 1996) using R, the ideal gas constant (8.314 × 10–3 kJ K–1 mol–1), and the temperature T at which ∆H°OA was measured (assumed to be 298.15 K; Equation 7). A log K OA at 25 °C and a ∆U°OA, in combination with the van't Hoff equation, allow for the estimation of K OA at different temperatures (Equation 8).
A temperature‐dependent ppLFER equation for log K OA was presented by Jin et al. (2017), who expanded the ppLFER approach proposed by Chen et al. (2002) for PCBs to a wider range of chemicals (Equation 9). The use of Equation (9) eliminates the need to use Equations (6–8) to derive a temperature‐dependent estimate of the log K OA. Currently only the system constants for Equations (3) and (6) are integrated within the UFZ‐LSER website (Ulrich et al., 2017). 
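The temperature adjustment described above can be illustrated with a short Python sketch. It is not taken from the paper: the function names are hypothetical, and the sign convention is an assumption, namely that ∆H°OA is an enthalpy of solvation (gas phase to octanol, usually negative), so that ∆U°OA = ∆H°OA + RT, and that the integrated van't Hoff equation is applied with a temperature‐independent ∆U°OA.

```python
import math

R_KJ = 8.314e-3  # ideal gas constant in kJ K^-1 mol^-1, as given in the text
T_REF = 298.15   # reference temperature in K (25 °C)


def delta_u_from_delta_h(delta_h_kj, t_ref=T_REF):
    """Convert an enthalpy of solvation to an internal energy of phase transfer.

    Assumption: delta_h_kj refers to transfer from the gas phase to octanol,
    so one mole of gas disappears and delta_U = delta_H + R*T.
    """
    return delta_h_kj + R_KJ * t_ref


def log_koa_at_temperature(log_koa_ref, delta_u_kj, temp_k, t_ref=T_REF):
    """Integrated van't Hoff adjustment of log K_OA from t_ref to temp_k.

    Assumption: delta_u_kj is constant over the temperature range of interest.
    """
    slope = -delta_u_kj / (math.log(10) * R_KJ)
    return log_koa_ref + slope * (1.0 / temp_k - 1.0 / t_ref)
```

Under this convention, a negative ∆U°OA makes the estimated log K OA increase as the temperature drops below 25 °C, which is the behavior expected for partitioning from air into octanol.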
In the literature, one can also find ppLFERs for the partitioning between water‐saturated (“wet”) octanol and the gas phase (Endo & Goss, 2014; Flanagan et al., 2005). Most measured K OA data are, however, for partitioning between “dry” octanol and the gas phase. The UFZ‐LSER Database (Ulrich et al., 2017) provides experimental and estimated solute descriptors. Chemicals were identified within the UFZ website using either the CAS number or, if not available, the universal SMILES format. For some chemicals the UFZ database contains multiple sets of solute descriptors from different peer‐reviewed articles. In the present study we used three types of solute descriptors: (1) those experimentally derived and reported in the peer‐reviewed literature, labeled as “UFZ‐preselected published values”; (2) those that have been part of the Absolv database, which are for the most part experimentally derived values, but for which an explicit reference is not available; and (3) those that have been predicted with QSAR models built into the UFZ‐LSER website and that rely on the Iterative Fragment Selection algorithm (Brown et al., 2012). It is expected that the reliability of the solute descriptors decreases from type 1 to type 3. When we selected experimental solute descriptors for the chemicals with experimental K OA values, preference was therefore given to descriptors from peer‐reviewed studies over the Absolv data set. In general, the “UFZ‐preselected published values” were very similar to the Absolv solute descriptors. Experimental solute descriptors can be determined by measuring partitioning ratios, solubilities, and chromatographic data (Sprunger et al., 2007). It is assumed that if experimental solute descriptors exist for a given chemical, it is inherently within the applicability domain of the ppLFER model. 
In addition, estimated solute descriptors were determined for all chemicals with experimental K OA values using the IFSQSARs (available from Brown, 2020; also see Brown, 2014; Brown et al., 2012). The IFSQSAR models have been integrated into the UFZ website (Ulrich et al., 2017). For PCDDs with a single substitution in the 2‐ or 7‐position (i.e., PCDD 26, PCDD 29, and PCDD 50), the estimated L value was corrected by +4.039 (T.N. Brown, personal communication, 2020). Ultimately, we evaluated the predictive performance of six ppLFERs: Equations (3), (4), and (5), each with either experimental or IFSQSAR‐predicted solute descriptors. Of these six ppLFERs, we selected the best performing equations for both experimental and estimated solute descriptors for comparison with the other prediction techniques. The UFZ website provides information on how well a molecule fits within the applicability domain of the IFSQSAR models used to estimate solute descriptors. It is based on a chemical similarity score and leverage value. Using the IFSQSAR model directly provides an SE for each predicted solute descriptor (except V; Brown, 2020). Because SEs for the system constants in Equations (3)–(6) are also available, we applied a Monte Carlo analysis to calculate the overall error of a predicted log K OA. The Supporting Information includes a sample calculation. Using the overall error of the prediction, we assigned a reliability score to each prediction (Table 1). Because the SEs for experimental solute descriptors are usually not available, only the error of the system constants could be considered in that case.

Table 1

Reliability score of polyparameter linear free energy relationship (ppLFER) predictions based on the overall error (OE) of the estimate

Reliability score	Guideline
Poor	OE > 1
Fair	OE ≤ 1
Good	OE ≤ 0.75
Excellent	OE ≤ 0.5
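As a rough illustration of how an overall error could be propagated and mapped onto the categories of Table 1, the following Python sketch draws system constants and solute descriptors from independent normal distributions and takes the standard deviation of the simulated log K OA values as the overall error. The independence and normality assumptions, the function names, and the dictionary layout are all hypothetical; the authors' actual procedure is described in their Supporting Information.

```python
import random
import statistics


def overall_error(constants, descriptors, n_draws=10_000, seed=0):
    """Monte Carlo estimate of the overall error (OE) of a ppLFER prediction.

    constants:   dict mapping a term name to (value, SE); must contain the
                 intercept under the key "c" plus one entry per descriptor.
    descriptors: dict mapping a term name to (value, SE) for one solute.
    Assumption: all errors are independent and normally distributed.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        log_k = rng.gauss(*constants["c"])
        for term, (d_value, d_se) in descriptors.items():
            coefficient = rng.gauss(*constants[term])
            log_k += coefficient * rng.gauss(d_value, d_se)
        draws.append(log_k)
    return statistics.stdev(draws)


def reliability_score(oe):
    """Map an overall error onto the reliability categories of Table 1."""
    if oe <= 0.5:
        return "Excellent"
    if oe <= 0.75:
        return "Good"
    if oe <= 1.0:
        return "Fair"
    return "Poor"
```

For experimental solute descriptors without reported SEs, the descriptor SE can be set to zero in this sketch so that only the uncertainty of the system constants contributes.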

EPISuite

The KOAWIN model for predicting K OA is a part of the US EPA's (2012) Estimation Programs Interface (EPI) Suite software. It predicts K OA from a thermodynamic triangle with the Henry's law constant (HLC; Pa m3 mol–1) and the octanol–water equilibrium partition ratio (K OW):
$$K_{\mathrm{OA}} = \frac{K_{\mathrm{OW}}\,R\,T}{\mathrm{HLC}}$$
where R is the ideal gas constant (8.314 Pa m3 mol–1 K–1) and T is the temperature (K) of the HLC. If the dimensionless air–water partition ratio (K AW) is used, this simplifies to:
$$K_{\mathrm{OA}} = \frac{K_{\mathrm{OW}}}{K_{\mathrm{AW}}}$$
The K OW describes chemical partitioning between two mutually saturated solvents. By using measured or estimated K OW to derive K OA in a thermodynamic triangle, KOAWIN is calculating a partition ratio between water‐saturated octanol and air, which we denote as K′ OA. The KOAWIN model can provide two K′ OA values for a compound, using either K OW and HLC values estimated from KOWWIN and HENRYWIN or by substituting any available experimental data for these estimated values. Although the latter is expected to provide more reliable results, for many chemicals EPISuite's PhysProp database does not contain measured HLC values. Consequently, the value for K OA recommended by KOAWIN is often a value at least partially derived from estimated values.
The K OW and HLC at 25 °C are estimated with KOWWIN and HENRYWIN, respectively, which are both fragment‐based QSARs. By default, KOAWIN uses HLCs predicted by the bond method within HENRYWIN, which can make predictions for a larger range of compounds than the group contribution method, although it is expected to be less accurate (Meylan & Howard, 1991). The KOAWIN model assumes that the K OW does not vary greatly with temperature, and the effect of temperature on K OA can be estimated from the temperature dependence of the HLC (Meylan & Howard, 2005). 
The HENRYWIN model uses Equation (11), with the compound‐specific parameters A h and B h, to express the temperature dependence of the HLC (USEPA, 2012). Whereas A h and B h for some compounds have been compiled from the literature, a slope analogy method is applied to estimate B h for most compounds, whereby classes of similar chemicals, such as different aldehydes or PCB congeners, share the same slope B h (USEPA, 2012). The A h is then obtained using the HLC at 25 °C, which is an experimental value, if available.
Version 1.11 of KOAWIN was run in batch mode using EPISuite Version 4.11 (USEPA, 2012) with the 2017 updated files provided on the US EPA website. The temperature dependence equations were obtained from HENRYWIN Version 3.21 in batch mode. Thus, two sets of estimated K OA values were obtained from EPISuite. The first set comprises predictions made at 25 °C only, using the HENRYWIN and KOWWIN predictions, hereafter referred to as EPISuite‐25. The second set, referred to as EPISuite‐T, uses the temperature‐dependent equations provided by HENRYWIN to adjust the HLC in the calculation of K OA. Because these equations use both experimental and estimated HLC values, the resulting K OA at 25 °C can differ from those made by EPISuite‐25.
Because the K OA is determined using a thermodynamic triangle, KOAWIN does not have an applicability domain. However, the applicability domains of the KOWWIN and HENRYWIN models should be considered. The applicability domains of HENRYWIN include both a range for the molecular mass (26.04 to 451.47 g/mol) and the log K AW (–11.64 to 2.92) of the chemicals in the training set (USEPA, 2012). The reported applicability domain of KOWWIN is based on the molecular mass range of chemicals in the training set (18.02 to 719.92 g/mol; USEPA, 2012). We assign a reliability score for every EPISuite estimation of K OA based on how a chemical fits within the applicability domains of HENRYWIN and KOWWIN. 
Equation (11) does not have an applicability domain limit, and in some cases the HLC value used is experimental, so the classifications of the predictions for EPISuite‐25 and EPISuite‐T are different (Table 2).

Table 2

Reliability of the EPISuite‐25 and EPISuite‐T predictions determined using the applicability domain (AD) set by the KOWWIN and HENRYWIN models

EPISuite set	Reliability score	Guideline
EPISuite‐25	Poor	Outside all 3 AD limits
	Fair	Outside 2 of the AD limits
	Good	Outside 1 of the AD limits
	Excellent	Inside all AD limits
EPISuite‐T	Poor	Outside all 3 AD limits
	Fair	Outside 2 of the AD limits
	Good	Outside of the KOWWIN AD or uses slope analogy to obtain the HLC equation
	Excellent	Inside KOWWIN AD, experimental HLC equation

HLC = Henry's law constant.
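The thermodynamic triangle and the EPISuite‐25 scoring in Table 2 can be sketched as follows. This is an illustrative reading of the text, not KOAWIN's own code: the function names are hypothetical, and the three applicability domain limits are taken to be the KOWWIN molecular mass range, the HENRYWIN molecular mass range, and the HENRYWIN log K AW range quoted above.

```python
import math

R = 8.314  # ideal gas constant in Pa m^3 mol^-1 K^-1


def log_koa_from_triangle(log_kow, hlc_pa_m3_mol, temp_k=298.15):
    """log K_OA from log K_OW and a Henry's law constant in Pa m^3 mol^-1."""
    log_kaw = math.log10(hlc_pa_m3_mol / (R * temp_k))  # dimensionless K_AW
    return log_kow - log_kaw


def episuite25_reliability(molar_mass, log_kaw):
    """Score an EPISuite-25 estimate by the number of AD limits it violates."""
    outside = 0
    outside += not (18.02 <= molar_mass <= 719.92)  # KOWWIN molecular mass range
    outside += not (26.04 <= molar_mass <= 451.47)  # HENRYWIN molecular mass range
    outside += not (-11.64 <= log_kaw <= 2.92)      # HENRYWIN log K_AW range
    return ("Excellent", "Good", "Fair", "Poor")[outside]
```

Because the result is built from K OW, it corresponds to the wet‐octanol quantity K′ OA discussed above rather than to a dry‐octanol K OA.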

OPERA

The command‐line version of the OPEn structure–activity/property Relationship App (OPERA) model (Version 2.5) by Mansouri et al. (2018) was downloaded via GitHub (available from Mansouri, 2018; also see Mansouri & Williams, 2017). The OPERA model for the K OA at 25 °C is a QSAR model that uses molecular descriptors calculated using PaDEL and a weighted k‐nearest neighbor approach (Mansouri & Williams, 2017). It was developed with the PhysProp database within EPISuite (Mansouri & Williams, 2017). The two PaDEL descriptors used for K OA prediction are the number of H‐bond donors, expressing the chemical's capacity for hydrogen bonding, and the log of the gas–hexadecane partitioning ratio (Mansouri & Williams, 2017). The number of H‐bond donors plays a similar role to the solute descriptors B, A, and S in a ppLFER, whereas the log hexadecane–air partition ratio is identical to the solute descriptor L. The OPERA model does not estimate the K OA at temperatures other than 25 °C. The applicability domain of the model is assessed through a global and a local applicability domain index (Mansouri & Williams, 2017). The global index assesses whether a chemical fits within the space of the training set used to create OPERA (Mansouri et al., 2018), and the local index compares how similar the chemical is to the five nearest neighbors in the model space (Mansouri et al., 2018). The confidence level index assesses the reliability of the prediction based on the distances of the chemical to its nearest neighbors and the accuracy of the predictions for these nearest neighbors (Mansouri et al., 2018). For chemicals that fall within the global applicability domain, the confidence level index provides additional information regarding the reliability of the prediction (Mansouri et al., 2018). Each estimate from the OPERA model is categorized as excellent, good, fair, or poor, based on the importance of the global and local applicability domain level, and the confidence index (Table 3).

Table 3

The reliability score for the OPERA model, determined based on the reported information regarding the applicability domain (AD) of the prediction

Reliability score	Global AD	Local AD level	Confidence level index
Poor	Outside	<0.4
Fair	Outside	0.4–0.6
Fair	Inside	<0.6
Good	Outside	≥0.6
Good	Inside	≥0.6	<0.75
Excellent	Inside	≥0.6	≥0.75
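The rules in Table 3 can be written as a small decision function. The sketch below is one possible reading of the table, with a hypothetical function name; it assumes that the third column always refers to the local AD level, that the "0.4–0.6" band excludes 0.6, and that the confidence level index is only consulted for chemicals inside the global AD with a local AD level of at least 0.6.

```python
def opera_reliability(inside_global_ad, local_ad, confidence):
    """Assign a Table 3 reliability category to one OPERA prediction.

    inside_global_ad: True if the chemical is inside the global AD.
    local_ad:         local applicability domain level (0 to 1).
    confidence:       confidence level index (0 to 1).
    """
    if local_ad < 0.4:
        # Outside the global AD this is "Poor"; inside, any value < 0.6 is "Fair".
        return "Fair" if inside_global_ad else "Poor"
    if local_ad < 0.6:
        return "Fair"
    if not inside_global_ad:
        return "Good"
    return "Excellent" if confidence >= 0.75 else "Good"
```

As noted earlier, such a score is only meaningful within OPERA and should not be compared directly with the categories of the other techniques.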

COSMO‐RS

The use of the COnductor like Screening Model for Realistic Solvents (COSMO‐RS) software suite requires both COSMOconf with TURBOMOLE and COSMOtherm from Dassault Systèmes. The approach uses both quantum chemical density functional theory (DFT) and statistical thermodynamics of the molecular interactions to predict K OA (Klamt et al., 2009). In brief, COSMOconf with TURBOMOLE, using DFT/COSMO calculations, determines different possible conformations of a molecule based on its polar charge density and how those charges interact with a virtual conductor (Klamt et al., 2009). The resulting electron density and geometry of the molecule are used to identify the most energetically optimal state for the compound in the virtual conductor (Klamt et al., 2009). The COSMOtherm model uses the polar charge density of the different conformations of the compound to quantify the interaction energy of the chemical in octanol and the gas phase, which combined with statistical thermodynamics allows for the calculation of the chemical potential of the compound in the different phases and subsequently the Gibbs free energy of the phase transfer (∆G°OA; Klamt et al., 2009). The SDFs were entered into COSMOconf (Ver 20.0.0) with TURBOMOLE (Ver 4.5) to generate COSMO files. All conformers were used by COSMOtherm (Ver 20.0.0) using the BP‐TZVPD‐FINE+GAS parameterization to calculate the ∆G°OA for each chemical at a given temperature, which is then used to calculate the K OA. Because this method, hereafter referred to as COSMOtherm, does not involve calibration using existing measurements or any experimental data, there is no applicability domain. Thus no reliability score can be assigned to the predictions.

RESULTS AND DISCUSSION

Selecting a ppLFER equation

Predictions made with ppLFER Equations (3–5) and (9) using experimental descriptors showed similar performance when assessed against measured K OA values (Table 4). This was expected, because Equations (3–5) were calibrated from similar training sets. Endo and Goss (2014) expanded the K OA data set by Abraham and Acree (2008) with chemicals with reliable solute descriptors including some polyfluorinated and organosilicon compounds, and we subsequently used the Endo and Goss data set for calibrating Equation (5). The data set Jin et al. (2017) used to train their model is much larger than the others, because it includes K OA values at temperatures other than 25 °C. However, the temperature‐dependent K OA values included by Jin et al. (2017) are similar to those used to parameterize Equation (6) by Mintz et al. (2008). Most of the chemicals included in the Jin et al. (2017) data set that are not included in the data sets used to develop Equations (3–6) are persistent organic pollutants such as PCBs and PCNs, as noted by Jin et al. (2017) and shown in the Supporting Information, Table SI 4.

Table 4

Performance of different polyparameter linear free energy relationships (ppLFERs) using experimental solute descriptors

	25 °C				All temperatures
Estimate	No.	MAE	RMSE	PI_width	No.	MAE	RMSE	PI_width
Abraham & Acree, 2008 (Equation 3)	337	0.21	0.33	1.29	1363	0.22	0.32	1.23
Endo & Goss, 2014 (Equation 4)	347	0.22	0.37	1.41	1395	0.20	0.32	1.25
Modified (Equation 5)	347	0.23	0.37	1.43	1395	0.21	0.32	1.27
Jin et al. 2017 (Equation 9)	337	0.21	0.33	1.28	1363	0.22	0.33	1.28

MAE = mean absolute error; RMSE = root mean square error; PI = prediction interval.

Although predictions made at 25 °C by Equation (3) were slightly better than those made by the other three equations (Table 4), the difference was small. The residuals for each of the equations were also highly correlated, and Equations (4) and (5) performed better for fluorinated and organosilicon chemicals (see the Supporting Information). Because Equations (3) and (9) cannot predict a log K OA for solutes without an E value, Equations (4) and (5) gave 10 more predictions at 25 °C and 32 more at all temperatures. Because all ppLFER system equations performed equally well and Equations (4) and (5) allow for predictions of more molecules, we only compared the results of Equation (5) with the other prediction techniques. Statistics for the residuals from the Abraham and Acree (2008), Endo and Goss (2014), and Jin et al. (2017) equations are included in the Supporting Information (Table SI 5).

Comparing model performance in predicting KOA at 25 °C

We first compared the ability of different approaches to accurately predict K OA at 25 °C. A violin plot shows the distribution of residuals for all prediction methods (Supporting Information, Figure SI 7). The EPISuite‐25, OPERA, COSMOtherm, and ppLFER with estimated solute descriptors models can predict the K OA for all 475 chemicals with a measured value at 25 °C. For 128 chemicals (27%), the lack of empirical data prevented the K OA prediction with the ppLFER with experimental solute descriptors. The EPISuite‐T model was not able to predict log K OA for one chemical, because the empirical HLC temperature equation reported for acetic acid (CAS# 64‐19‐7) to predict HLC at 25 °C appeared to be erroneous and did not match what was reported in the original work by Khan and Brimblecombe (1992) nor did the original equation produce a sensible result. Because the EPISuite‐25 method already considers HLC values predicted with the bond method at 25 °C we did not substitute these HLC estimates in EPISuite‐T. Table 5 lists the numerical metrics of the comparison of predicted with measured K OA values. This comparison does not provide an entirely level playing field, because of the lower number of predictions for two of the techniques. There were 129 chemicals for which K OA could not be predicted by at least one of the six methods. We therefore also compared the predictive performance for the 346 measured K OA values at 25 °C, for which all six models were able to provide a prediction (Table 6).

Table 5

Prediction tool	No.	Mean	MAE	Median	SD	RMSE	PI_U	PI_L
ppLFER, experimental	347	–0.07	0.23	–0.01	0.37	0.37	0.65	–0.79
ppLFER, estimated	475	–0.09	0.34	–0.05	0.50	0.51	0.89	–1.07
EPISuite‐25	475	0.00	0.58	0.06	0.78	0.78	1.54	–1.54
EPISuite‐T	474	–0.01	0.54	0.01	0.74	0.74	1.43	–1.45
OPERA	475	0.07	0.33	0.00	0.52	0.52	1.09	–0.95
COSMOtherm	475	0.02	0.41	–0.04	0.56	0.56	1.12	–1.08

ppLFER = polyparameter linear free energy relationship.

Table 6

Prediction tool	Mean	MAE	Median	SD	RMSE	PI_U	PI_L
ppLFER, experimental	–0.07	0.23	–0.01	0.37	0.37	0.65	–0.79
ppLFER, estimated	–0.05	0.31	–0.01	0.45	0.45	0.83	–0.94
EPISuite‐25	0.08	0.47	0.09	0.64	0.65	1.35	–1.18
EPISuite‐T	0.02	0.44	0.05	0.60	0.60	1.19	–1.15
OPERA	0.00	0.26	–0.01	0.44	0.44	0.86	–0.86
COSMOtherm	0.00	0.34	–0.04	0.46	0.46	0.90	–0.90

ppLFER = polyparameter linear free energy relationship.

Table 5: Statistics on the residuals, including the mean absolute error (MAE), standard deviation (SD), root mean square error (RMSE), and upper (PIU) and lower (PIL) prediction interval, when considering all log K OA estimates at 25 °C from all models.

Table 6: Statistics on the residuals, including the mean absolute error (MAE), standard deviation (SD), root mean square error (RMSE), and upper (PIU) and lower (PIL) prediction interval, when considering only estimates for chemicals for which all models could make predictions at 25 °C (n = 346).

By any of the metrics in Tables 5 and 6, the ppLFER with experimental solute descriptors performed the best in predicting the K OA at 25 °C, including when the comparison was restricted to the same set of chemicals (Table 6). The MAE and RMSE of the prediction were just over a fifth and a third of a log unit, respectively. When all possible predictions were considered (Table 5), the ppLFER equation with estimated solute descriptors had a slightly lower RMSE (0.51) than the OPERA (0.52) and COSMOtherm (0.56) models. When only the 346 chemicals in Table 6 were considered, the OPERA model had the smallest RMSE of the three, although the differences were small. The ppLFER with experimental solute descriptors and COSMOtherm performed particularly well for volatile chemicals with a log K OA less than 6 (Figure 1).
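The summary statistics reported in Tables 5 and 6 can be computed from paired predicted and measured values along the following lines. This is a sketch only: the sign convention of the residual (predicted minus measured) and the use of the sample SD are assumptions, and the function name and example values are hypothetical.

```python
import math
import statistics


def residual_statistics(predicted, measured):
    """Summarize residuals of log KOA predictions.

    Residuals are taken as predicted minus measured (an assumed convention).
    """
    residuals = [p - m for p, m in zip(predicted, measured)]
    return {
        "n": len(residuals),
        "mean": statistics.fmean(residuals),
        "mae": statistics.fmean(abs(r) for r in residuals),
        "median": statistics.median(residuals),
        "sd": statistics.stdev(residuals),  # sample SD (n - 1 denominator)
        "rmse": math.sqrt(statistics.fmean(r * r for r in residuals)),
    }


# Hypothetical log KOA values, for illustration only.
stats = residual_statistics([7.0, 8.5, 10.0], [7.2, 8.5, 9.6])
```

Because RMSE² equals the squared mean residual plus (n − 1)/n times the squared sample SD, the SD and RMSE columns are nearly identical whenever the mean residual is close to zero and n is large, as is the case in the tables.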

Figure 1

Plots of the residual of the log K OA predictions at 25 °C against the measured log K OA value for 346 chemicals for which an estimate could be made by all models. The dashed lines indicate the prediction interval and the mean. The color and shape of each point indicate the reliability score of each prediction as described in the Materials and Methods section. The COSMOtherm model has no applicability domain or reliability score. The corresponding plot for log K OA predictions at 25 °C for all available data is available in the Supporting Information. ppLFER = polyparameter linear free energy relationship.

The two EPISuite predictions had the largest deviations from the measured values, almost twice those of the best performing method. Invalidity of the assumption that wet and dry octanol have the same solvation properties (Abraham & Acree, 2008; Pinsuwan et al., 1995) may contribute to the higher residuals. Wet octanol has a stronger hydrogen bonding acidity than dry octanol and is less capable of dissolving hydrophobic chemicals (Abraham & Acree, 2008). Thus K′OA will typically be lower than K OA for nonpolar compounds and higher for more polar compounds (Abraham & Acree, 2008). However, when chemicals are grouped by their ability to undergo hydrogen bonding, as described by Baskaran et al. (2021), no correlation with the residual is observed (Supporting Information, Figure SI 9). If we looked at the partition ratios used to calculate K OA in EPISuite (Supporting Information, Figure SI 10), we found that residuals were larger when the log K OW was greater than 5, and in these cases, the tendency for the EPISuite models to under‐ and overpredict K OA occurred more frequently when the log K AW was greater than 3 and less than –2, respectively. The KOAWIN model performed well for a small subset of chemicals with a log K OW between 2 and 5 and log K AW between –3 and 3.
When the log K OW was less than 5, predictions for K OA were generally close to experimental values, except at very low K AW. This analysis suggests that when one is using estimated K OW and HLC values in a thermodynamic triangle, the log K OA is less reliable for more hydrophobic chemicals, as suggested by Abraham and Acree (2008). Only for 256 chemicals with a K OA measured at 25 °C did EPISuite contain experimental data for both log K OW and HLC. Although it was expected that the use of experimental values would improve predictions of K OA, it is difficult to see whether this was the case due to the sparsity of data (Supporting Information, Figure SI 11). The EPISuite‐T model performed slightly better than EPISuite‐25 because it incorporates some experimental values for HLC, whereas EPISuite‐25 relies only on estimated log K OW and HLC values. Using both estimated K OW and HLC, EPISuite was reported to have a standard deviation of 0.688 and MAE of 0.479 (n = 310; USEPA, 2012), which is very similar to our assessment (Table 5).

The good performance of the ppLFER models and OPERA may in part be attributable to an overlap between the data set that was used in their calibration and the data set of measured values we used for model evaluation. The training sets for the ppLFER and OPERA models consist of 181 (see the Supporting Information, Table SI 4) and 270 chemicals, respectively. Because our data set aimed to be comprehensive and include all reliable measured K OA values that have been reported in the literature (Baskaran et al., 2021), it is likely that almost all the log K OA data used to calibrate these models were also contained in the evaluation data set. Given the limited number of K OA data, particularly at 25 °C, the overlap of the training chemicals with the chemicals used in the present assessment will reduce the error calculated with these prediction techniques.
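The thermodynamic triangle referred to in the discussion of the EPISuite results combines an octanol-water and an air-water partition ratio. The sketch below illustrates the arithmetic only and does not reproduce the KOAWIN implementation. It assumes that the HLC is supplied in Pa m³/mol and that the air-water partition ratio is obtained as K AW = HLC/(RT); the function name and the example inputs are hypothetical.

```python
import math

R = 8.314462618  # gas constant in Pa m^3 mol^-1 K^-1


def log_koa_from_triangle(log_kow, hlc_pa_m3_per_mol, temp_k=298.15):
    """Estimate log K'OA = log KOW - log KAW, with KAW = HLC / (R T).

    The result refers to water-saturated (wet) octanol, because KOW does.
    """
    if hlc_pa_m3_per_mol <= 0 or temp_k <= 0:
        raise ValueError("HLC and temperature must be positive")
    log_kaw = math.log10(hlc_pa_m3_per_mol / (R * temp_k))
    return log_kow - log_kaw


# Hypothetical inputs: log KOW = 5.0 and HLC = 10 Pa m^3/mol at 25 °C.
estimate = log_koa_from_triangle(5.0, 10.0)
```

Any error in the estimated K OW or HLC carries over directly into the result, which is consistent with the larger residuals observed when both inputs are estimated.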
Lampic and Parnis (2020) compared the prediction performance of ppLFERs with estimated solute descriptors, OPERA, COSMOtherm, and EPISuite for various physical–chemical properties of per‐ and polyfluoroalkyl compounds at 25 °C. The OPERA model had the smallest reported RMSE and MAE for K OA (Lampic & Parnis, 2020). However, their experimental data set included multiple measurements for the same compound, which can bias the statistical calculations. It also included K OA values for fluorotelomer alcohols, perfluorooctane sulfonamido ethanol (FOSE), and fluorooctane sulfonamide measured with the gas chromatography‐retention time (GC‐RT) technique (Lei et al., 2004). We excluded those data from our data set because this technique is not suited for polar compounds (Baskaran et al., 2021).

Comparing model performance in predicting KOA at any temperature

We next compared the prediction performance of the four models that can predict the K OA at temperatures other than 25 °C. Because only 28% of measured K OA values are for 25 °C, this data set is considerably larger; it comprises 1676 data points for 604 chemicals at temperatures ranging from –10 to 110 °C. The COSMOtherm and the ppLFER using estimated solute descriptors methods were able to predict K OA corresponding to all 1676 literature values (Table 7). The ppLFER equations using experimental solute descriptors were limited by the availability of the solute descriptors for 281 chemicals. The EPISuite‐T model was able to predict K OA values for all but one compound, acetic acid, as mentioned in the previous section, Comparing model performance in predicting K OA at 25 °C. In total, 1394 measurements could be compared against all four models (Table 8).

Table 7

Prediction tool	No.	Mean	MAE	Median	SD	RMSE	PI_U	PI_L
ppLFER, experimental	1395	0.01	0.21	0.03	0.32	0.32	0.64	–0.63
ppLFER, estimated	1676	–0.02	0.29	0.01	0.43	0.43	0.82	–0.86
EPISuite‐T	1675	0.13	0.59	0.10	0.80	0.81	1.69	–1.44
COSMOtherm	1676	0.04	0.40	0.04	0.55	0.56	1.12	–1.05

ppLFER = polyparameter linear free energy relationship.

Table 8

Statistics on the residuals of log K OA predictions at temperatures –10–110 °C that could be made with all models (n = 1394), including the mean absolute error (MAE), standard deviation (SD), root mean square error (RMSE), upper (PIU), and lower prediction interval (PIL)

Prediction tool	Mean	MAE	Median	SD	RMSE	PI_U	PI_L
ppLFER, experimental	0.01	0.21	0.03	0.32	0.32	0.64	–0.63
ppLFER, estimated	0.01	0.26	0.02	0.38	0.38	0.75	–0.73
EPISuite‐T	0.11	0.51	0.10	0.69	0.69	1.46	–1.23
COSMOtherm	0.08	0.34	0.05	0.45	0.46	0.97	–0.81

ppLFER = polyparameter linear free energy relationship.

Table 7: Statistics on the residuals of log K OA predictions at temperatures between –10 and 110 °C, including the mean absolute error (MAE), standard deviation (SD), root mean square error (RMSE), upper (PIU), and lower prediction interval (PIL).

All models predicted the log K OA with a MAE less than 0.6 and SDs less than or equal to 0.8. The residuals consistently fell within 3 log units of the measured value (Figure 2 and Supporting Information, Figure SI 12), with some exceptions: the log K OA predictions at 5 °C by COSMOtherm for N‐ethyl FOSE and N‐methyl FOSE, the prediction at 45 °C for 2,2′,3,4,4′,5′,6‐heptabromodiphenyl ether (BDE 183) and at 5 and 10 °C for endosulfan I by EPISuite‐T, and the 25 °C prediction for 2′‐methoxy‐2,4,4′‐tribromodiphenyl ether (2′‐MeO BDE 28) by the ppLFER equation using estimated solute descriptors. The ppLFERs using experimental solute descriptors had the lowest RMSE (0.32) and MAE (0.21) values. The ppLFERs with estimated solute descriptors performed marginally better than COSMOtherm.

Figure 2

The measured log K OA is plotted against the residual of the log K OA prediction for chemicals when estimates could be made with all models (n = 1394). The dashed lines indicate the prediction interval and the mean. The color and shape of each point indicate the reliability score of each prediction as described in Materials and Methods. A similar plot for all chemicals with measured data is available in the Supporting Information. ppLFER = polyparameter linear free energy relationship.

The ∆H°OA is itself temperature dependent, and Equation (6) is meant to calculate ∆H°OA within the much narrower range of 10–45 °C (Mintz et al., 2008), so we explored whether the error of the prediction is dependent on temperature (Supporting Information, Figure SI 16). Although we saw no temperature dependence of the residuals, we evaluated the ppLFER equations within the applicability domain of the ∆H°OA equation (Supporting Information, Figure SI 19). There was little change in the model performance (Supporting Information, Table SI 7). In fact, the RMSE increased slightly to 0.33 and 0.44 for ppLFERs using experimental and estimated solute descriptors, respectively. In addition, the ∆H°OA and log K OA at 25 °C predicted by the ppLFERs were highly correlated (R² = 0.98), which suggests that ∆H°OA could be estimated from log K OA (Supporting Information, Figure SI 5). Further research is needed to understand the relationship between these properties using empirical data. Because the equations in EPISuite are intended to predict HLC within the range from 0 to 50 °C, these considerations could equally apply to explain the relatively poor performance of EPISuite‐T. However, the MAEs for EPISuite‐T are not higher at extreme temperatures. The EPISuite information notes that a log K OA at 10 °C estimated from an HLC at 10 °C and a K OW at 25 °C can be expected to have an SD of approximately 0.575 and a MAE of 0.433, based on a sample size of 126 compounds (USEPA, 2012).
In the present study we estimated a higher SD (0.80) and MAE (0.59) for EPISuite‐T predictions at any temperature, based on a sample size of 1675. Limiting estimates to the temperature applicability domain of the model (0–50 °C) had little effect on the SD (0.81) and MAE (0.60; Supporting Information, Table SI 8). Because COSMOtherm requires no internal calibration, it is particularly impressive that it predicted the log K OA at different temperatures so well. The COSMOtherm model was previously shown to systematically underpredict (with high residuals) the log K OA for substituted PAHs (Parnis et al., 2015). In addition, we found that COSMOtherm systematically overpredicted the log K OA for the polar fluorinated compounds and PBDEs (Supporting Information, Figures SI 14 and SI 22). The ppLFER equations also tended to underpredict the log K OA for PBDEs. It is also important to acknowledge that in some instances large residuals may be due to flawed measurements. Supporting Information, Figure SI 23, compiles measured data that cause absolute residuals greater than 0.75 when predicted with ppLFERs and COSMOtherm. The models all over‐ or underpredicted the K OA to a similarly large extent for these compounds, with the exception of benzo[ghi]perylene (Odabasi et al., 2006). This finding suggests that the measured K OA values may be too low for BDE 183, cyclopentadecanone, 1,3,5‐tribromo‐2‐(2,3‐dibromopropoxy)benzene (DPTE), 2,4,6‐tribromophenyl allyl ether (TBPAE), endosulfan I, N‐nitrosodibutylamine, and N‐nitrosodipropylamine and too high for β‐hexachlorocyclohexane (HCH) and δ‐HCH. The K OA values for cyclopentadecanone, TBPAE, and DPTE were measured using a GC‐RT technique (Okeme et al., 2020). Other log K OA values from Okeme et al. (2020) had been excluded from our data set because the chemicals were judged to be too polar for this technique (Baskaran et al., 2021).
It is likely, particularly in the case of TBPAE and DPTE, that the chemicals are capable of some hydrogen bonding with octanol and did not interact with a nonpolar stationary phase in the same way they would with octanol (Baskaran et al., 2021). The reported log K OA would then be expected to be smaller than the true value. The K OA values for N‐nitrosodipropylamine and N‐nitrosodibutylamine were measured using a static technique relating the volatility of analytes from fish tissue to octanol (Hiatt, 1997). Although many of the values reported in the present study did not stand out as erroneous, the SDs of some measurements were high (e.g., N‐nitrosodipropylamine 20,000 ± 5100; N‐nitrosodibutylamine 16,000 ± 8500). The log K OA of 11.96 for BDE 183 measured using a generator column technique (Harner & Shoeib, 2002) may also be erroneous. This value is close to the limits of this technique, and it is possible that BDE 183 never reached equilibrium in the generator column. This would also explain why the reported ΔU°OA (referred to as ΔH° OA) for BDE 183 is 10 kJ/mol lower than the ΔU°OA for BDE 153 (2,2′,4,4′,5,5′‐hexabromodiphenyl ether) and BDE 156 (2,3,3′,4,4′,5‐hexabromodiphenyl ether), even though these two PBDEs have measured log K OA values of 11.82 and 11.97 at 25 °C, respectively (Harner & Shoeib, 2002). A log K OA for BDE 183 closer to 13, as predicted by the ppLFER models and COSMOtherm, would also be more consistent with the other physical–chemical properties reported for this congener (Wania & Dugani, 2003). Both β‐HCH and δ‐HCH had a measured log K OA of almost 9 at 25 °C using the generator column technique, whereas log K OA values for α‐HCH and γ‐HCH measured using the same technique were in the region of 7.5 and 8 (Shoeib & Harner, 2002b). 
As stereoisomers, these chemicals are unlikely to have log K OA values differing by more than 1 log unit, and the true log K OA for β‐HCH and δ‐HCH is likely closer to the value estimated by the ppLFER equations and COSMOtherm.

Predictive performance and applicability domains

Reliability scores for predictions were assigned as described in the Materials and Methods section. Most models consider chemicals with measured K OA values within their applicability domain as having a reliability score of excellent. The reliability scores for EPISuite‐25 predictions appeared to give a reasonable indication of the error associated with a prediction, namely, predictions falling outside of the 95% prediction interval (Figure 1) are scored as either good or fair. The 423 predictions made by EPISuite‐25 that are scored excellent have an RMSE of 0.66 (Supporting Information, Table SI 9). The RMSE values for good and fair predictions were 2.24 (n = 6) and 1.29 (n = 46), respectively. The reliability scores of EPISuite‐T predictions at 25 °C were generally lower than for the other models, with 46 judged good (RMSE = 1.22) and 268 fair (RMSE = 0.68). The lower reliability scores of the EPISuite‐T predictions also occurred for predictions made at different temperatures (Supporting Information, Table SI 10). The RMSE of poor predictions (2.25 at 25 °C and 1.66 at all temperatures) was higher than that of the excellent predictions (0.49 at 25 °C, 0.55 at all temperatures). However, as with EPISuite‐25, the RMSE of the good EPISuite‐T predictions was in both cases higher than that of the fair predictions. The good and fair categorizations used to describe the applicability domain of the EPISuite model appeared to be unreliable indicators of the uncertainty of the prediction, which may reflect the impact of a few outliers on the RMSE for a relatively small set of chemicals. When ppLFER equations with experimental solute descriptors were used, the overall error of the prediction was always estimated to be less than 0.2, which meant all predictions were considered excellent.
This means that considering only the uncertainty of the system constants and ignoring the error of the solute descriptors does not provide a metric suitable for judging the reliability of the prediction. Most of the chemicals with a measured K OA at 25 °C fell within the applicability domain of the ppLFER model with estimated solute descriptors, with 362 chemicals falling into the excellent (RMSE = 0.40) and 96 into the good category (RMSE = 0.94). The reliability scores, determined using Monte Carlo analysis with the standard error of system constants and solute descriptors, gave a good indication of the error of the prediction, particularly for values at 25 °C. If we compare the reliability scores with the RMSE of the predictions at all temperatures (Supporting Information, Table SI 10), predictions with scores of excellent and poor had the smallest (0.33) and largest (0.81) RMSEs, respectively. The RMSEs for the good (0.65) and fair (0.41) categories suggest that these intermediate scores are less reliable indicators of prediction quality, possibly again because of the small sample size. The ppLFER equations generally had the fewest predictions that scored excellent and had an absolute residual greater than 1. The OPERA model also judged most chemicals to have a good fit with its applicability domain, with predictions assigned categories of excellent and good for 417 and 58 chemicals, respectively. The RMSE values of the residuals for the good predictions were higher (0.64) than those for the excellent predictions (0.50), as would be expected. The OPERA model had 29 predictions with an absolute residual greater than 1 even though they were within the global applicability domain and had high local applicability domain and confidence levels. After EPISuite‐25 (n = 53), this was the highest number of absolute residuals larger than 1 seen for excellent predictions at 25 °C.
There is no means of assessing whether a prediction made by COSMOtherm falls within the applicability domain of the model.
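The Monte Carlo analysis mentioned above for the ppLFER reliability scores can be sketched as follows for a generic linear equation of the form log K = c + Σ (system constant × solute descriptor). This is an illustration under stated assumptions, not the procedure used in the study: errors are treated as independent and normally distributed, all names and numbers are hypothetical placeholders, and the thresholds that map the resulting spread to a reliability score are those given in the Materials and Methods section and are not reproduced here.

```python
import random
import statistics


def monte_carlo_sd(constants, descriptors, n_draws=20000, seed=0):
    """Propagate standard errors through log K = c + sum(coef_i * desc_i).

    constants:   {"c": (value, se), "<name>": (value, se), ...}
    descriptors: {"<name>": (value, se), ...}, same names as the constants
    Returns the standard deviation of the simulated log K values.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        log_k = rng.gauss(*constants["c"])
        for name, (value, se) in descriptors.items():
            coefficient = rng.gauss(*constants[name])
            log_k += coefficient * rng.gauss(value, se)
        draws.append(log_k)
    return statistics.stdev(draws)


# Hypothetical one-descriptor example: only the intercept carries an error.
sd_intercept_only = monte_carlo_sd({"c": (0.0, 0.1), "x": (1.0, 0.0)},
                                   {"x": (2.0, 0.0)})
# Same example with every standard error set to zero.
sd_no_error = monte_carlo_sd({"c": (0.0, 0.0), "x": (1.0, 0.0)},
                             {"x": (2.0, 0.0)})
```

Setting the descriptor standard errors to zero in such a simulation leaves only the system-constant uncertainty, which is the situation described above for the ppLFER with experimental solute descriptors.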

Estimating the uncertainty of future predictions

Our analysis makes it possible to estimate the likely error of future predictions made with the investigated methods. For a new prediction for a chemical within the applicability domain of a model, there is a 95% chance that the residual falls within the prediction interval. A narrower prediction interval therefore gives higher confidence in future predictions. Figure 3 shows the prediction intervals for each technique at 25 °C and for the four models capable of predicting log K OA at other temperatures. Because the prediction interval is calculated using the SD of the residuals, the width of the prediction interval scales directly with the reported SDs. The prediction interval for each model is listed in Tables 5 to 8. For future predictions, we recommend using the prediction intervals as reported in Table 9.
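The link between the SD of the residuals and the prediction interval can be illustrated with a short calculation. The exact formula used in the study is defined in its methods and is not shown in this section; the sketch below assumes the common form mean ± z · SD · √(1 + 1/n) with a normal quantile, which is a reasonable approximation for the large sample sizes involved. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def prediction_interval(mean_residual, sd_residual, n, level=0.95):
    """Approximate prediction interval for the residual of one new prediction."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = z * sd_residual * math.sqrt(1 + 1 / n)
    return mean_residual - half_width, mean_residual + half_width


# Rounded inputs for the ppLFER with experimental descriptors (Table 5).
lower, upper = prediction_interval(-0.07, 0.37, 347)
```

With these rounded inputs the result agrees with the tabulated interval (–0.79 to 0.65) to within about 0.01 log units; the remaining difference is attributable to rounding of the mean and SD.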

Figure 3

Table 9

Summary of the mean absolute error (MAE), root mean square error (RMSE), and prediction intervals (PIs) for the prediction models that work best to predict log K OA at 25 °C and at any temperature

T	Rank	Prediction tool	No.	MAE	RMSE	PI_U	PI_L	PI_width
25 °C	1	Modified ppLFER, experimental	347	0.23	0.37	0.65	–0.79	1.43
	2	Modified ppLFER, estimated	475	0.34	0.51	0.89	–1.07	1.96
	3	OPERA	475	0.33	0.52	1.09	–0.95	2.03
	4	COSMOtherm	475	0.41	0.56	1.12	–1.08	2.19
Any T	1	Modified ppLFER, experimental	1395	0.21	0.32	0.64	–0.63	1.27
	2	Modified ppLFER, estimated	1676	0.29	0.43	0.82	–0.86	1.68
	3	COSMOtherm	1676	0.40	0.56	1.12	–1.05	2.17

Models are ranked based on their performance and usability within each temperature range.

PIwidth is equal to |PIU| + |PIL|.

ppLFER = polyparameter linear free energy relationship.

Figure 3: Prediction interval for each model for log K OA predictions at 25 °C and at all temperatures. The red points indicate the upper and lower prediction intervals and the bar indicates the mean error. ppLFER = polyparameter linear free energy relationship.

Accessibility and usability of models

Another factor to consider when comparing the different prediction techniques is their usability and accessibility. Although COSMOtherm performed very well, it is licensed software that can be expensive to purchase. The COSMOconf calculations are very demanding in central processing unit (CPU) time. Using a supercomputer can reduce the calculation time, but this requires both access to a supercomputer and the set‐up of the COSMOconf calculations on it. On the other hand, once COSMO files for the different congeners of a chemical have been generated, any number of partition ratios at any temperature can be obtained with COSMOtherm with very limited additional CPU demand. On balance, the cost and time necessary to use COSMOtherm reduce its accessibility and usefulness as a routine prediction tool. Experimental solute descriptors for ppLFER equations can be obtained from the UFZ website (Ulrich et al., 2017). Although Equations (3) and (6) are integrated into the website and it is possible to directly export the log K OA of a chemical at 25 °C and the ΔH°OA, using Equations (4), (5), or (9) or predicting K OA values at temperatures other than 25 °C requires a simple spreadsheet for calculating the partition ratio and adjusting it for temperature. Equation (5) uses the fewest solute descriptors while still performing as well as Equations (3) or (9). This means that there is a higher likelihood that a complete set of experimental solute descriptors would be available for calculating log K OA. If no experimental solute descriptors are available, the estimated solute descriptors can be calculated using the IFSQSARs implemented in the UFZ website (Ulrich et al., 2017), or by downloading the IFSQSAR prediction software by Brown (2020). Using the UFZ website for estimated solute descriptors will be sufficient for most people wanting to predict a log K OA.
The use of the stand‐alone IFSQSAR model is only necessary to obtain the SE of the estimated solute descriptors. The OPERA model is another freely available software that includes a graphic user interface that makes it easy to use (Mansouri, 2018). However, it is available only on Windows and Linux operating systems. The OPERA model has also been integrated into the CompTox Dashboard, which provides the prediction and details regarding the applicability domain and reliability of the prediction for a single chemical. When predicting the log K OA for multiple chemicals, obtaining these details from the CompTox Dashboard is not easy, and using the downloadable software is recommended. The EPISuite models (EPISuite‐25 and EPISuite‐T) are also freely available from the US EPA website for computers running on a Windows operating system (USEPA, 2012). Of course, using the thermodynamic triangle approach does not require the use of the software, but the KOAWIN model can incorporate the use of experimental K OW and HLC values when available. The models KOAWIN, HENRYWIN, and KOWWIN can all complete batch mode calculations, which is useful for large datasets. However, extracting the temperature equations for multiple chemicals from HENRYWIN for EPISuite‐T can be time consuming, because the format of the equation can differ between chemicals.
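The temperature adjustment that the text assigns to a simple spreadsheet can also be written as a few lines of code. The sketch below uses the integrated van 't Hoff equation and rests on explicit assumptions: the energy term is the enthalpy (or internal energy) change for transfer from air to octanol, negative when the transfer is exothermic, and it is treated as constant over the temperature interval, although the text notes that ∆H°OA is itself temperature dependent. The sign convention, the function name, and the example values are illustrative and may differ from those of the equations in the paper.

```python
import math

R = 8.314462618  # gas constant in J mol^-1 K^-1


def adjust_log_koa(log_koa_ref, delta_h_j_per_mol, temp_k, temp_ref_k=298.15):
    """Shift log KOA from temp_ref_k to temp_k (integrated van 't Hoff equation).

    delta_h_j_per_mol: assumed energy change for air-to-octanol transfer,
    negative for an exothermic transfer and constant over the interval.
    """
    slope = delta_h_j_per_mol / (math.log(10) * R)
    return log_koa_ref - slope * (1 / temp_k - 1 / temp_ref_k)


# Hypothetical chemical: log KOA = 8.0 at 25 °C, transfer energy of -80 kJ/mol.
log_koa_10c = adjust_log_koa(8.0, -80_000.0, 283.15)
```

For this hypothetical chemical the log K OA rises by roughly 0.74 log units when the temperature drops from 25 to 10 °C, which illustrates why the temperature of interest matters when predictions are applied to environmental conditions.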

CONCLUSIONS

If only the K OA at 25 °C is required, the ppLFER equation using experimental solute descriptors has the best performance, although the ppLFER equation with estimated solute descriptors, OPERA, and COSMOtherm also give reasonably good estimates. Both EPISuite‐25 and EPISuite‐T consistently had the worst performance of all assessed prediction techniques. The ppLFER using estimated solute descriptors and COSMOtherm were able to reliably predict the log K OA for the greatest number of chemicals and temperatures. Solute descriptors can be estimated for almost all neutral chemicals, and the COSMO‐RS approach for estimating log K OA can be applied to virtually any chemical. Both EPISuite‐T and ppLFER equations using experimental solute descriptors were also able to predict the log K OA for chemicals at various temperatures, but these are limited by the availability of HLC temperature correction equations and experimental solute descriptors, respectively. For most models, the reliability score assigned on the basis of the fit with the applicability domain correlated only loosely with the residual of the actual prediction. In terms of usability, the ppLFER equations and the OPERA model are the easiest to use. The COSMOtherm model requires a paid license and demanding calculations, which raises the cost and labor of using it. In summary, a ppLFER using experimental solute descriptors is the best predictor of log K OA regardless of temperature. If experimental descriptors are not available, a ppLFER with estimated solute descriptors or the COSMOtherm model is also well suited to predicting the log K OA. The OPERA model also works well for predicting log K OA at 25 °C.

Supporting Information

The Supporting Information is available on the Wiley Online Library at https://doi.org/10.1002/etc.5201. This article includes online‐only Supporting Information.

41 in total

1. Iterative fragment selection: a group contribution approach to predicting fish biotransformation half-lives.

Authors: Trevor N Brown; Jon A Arnot; Frank Wania
Journal: Environ Sci Technol Date: 2012-07-20 Impact factor: 9.028

2. Method for simultaneous determination of partition coefficients for cyclic volatile methylsiloxanes and dimethylsilanediol.

Authors: Shihe Xu; Bruce Kropscott
Journal: Anal Chem Date: 2012-02-06 Impact factor: 6.986

3. QSPR/QSAR models for prediction of the physicochemical properties and biological activity of polybrominated diphenyl ethers.

Authors: Hui-Ying Xu; Jian-Wei Zou; Qing-Sen Yu; Yan-Hua Wang; Jian-Ying Zhang; Hai-Xiao Jin
Journal: Chemosphere Date: 2006-09-08 Impact factor: 7.086

4. Using measured octanol-air partition coefficients to explain environmental partitioning of organochlorine pesticides.

Authors: Mahiba Shoeib; Tom Harner
Journal: Environ Toxicol Chem Date: 2002-05 Impact factor: 3.742

5. Characterization of polymer coated glass as a passive air sampler for persistent organic pollutants.

Authors: Tom Harner; Nick J Farrar; Mahiba Shoeib; Kevin C Jones; Frank A P C Gobas
Journal: Environ Sci Technol Date: 2003-06-01 Impact factor: 9.028

6. Characterizing the sorption of polybrominated diphenyl ethers (PBDEs) to cotton and polyester fabrics under controlled conditions.

Authors: Amandeep Saini; Cassandra Rauert; Myrna J Simpson; Stuart Harrad; Miriam L Diamond
Journal: Sci Total Environ Date: 2016-04-29 Impact factor: 7.963

7. Quantitative relationships between molecular structures, environmental temperatures and octanol-air partition coefficients of polychlorinated biphenyls.

Authors: J W Chen; Tom Harner; K-W Schramm; X Quan; X Y Xue; A Kettrup
Journal: Comput Biol Chem Date: 2003-07 Impact factor: 2.877

8. Quantitative structure-property relationships for octanol-air partition coefficients of polychlorinated naphthalenes, chlorobenzenes and p,p'-DDT.

Authors: Jingwen Chen; Xingya Xue; Karl-Werner Schramm; Xie Quan; Fenglin Yang; Antonius Kettrup
Journal: Comput Biol Chem Date: 2003-07 Impact factor: 2.877

9. OPERA models for predicting physicochemical properties and environmental fate endpoints.

Authors: Kamel Mansouri; Chris M Grulke; Richard S Judson; Antony J Williams
Journal: J Cheminform Date: 2018-03-08 Impact factor: 5.514

10. Evaluation of the three-phase equilibrium method for measuring temperature dependence of internally consistent partition coefficients (K(OW), K(OA), and K(AW)) for volatile methylsiloxanes and trimethylsilanol.

Authors: Shihe Xu; Bruce Kropscott
Journal: Environ Toxicol Chem Date: 2014-10-31 Impact factor: 3.742

1 in total

1. Identifying organic chemicals not subject to bioaccumulation in air-breathing organisms using predicted partitioning and biotransformation properties.

Authors: Frank Wania; Ying Duan Lei; Sivani Baskaran; Alessandro Sangion
Journal: Integr Environ Assess Manag Date: 2021-12-16 Impact factor: 3.084

1 in total