Muhammad Irfan Khawar1, Deedar Nabi1,2. 1. Institute of Environmental Sciences and Engineering (IESE), National University of Sciences and Technology (NUST), H-12, Islamabad 48000, Pakistan. 2. Bigelow Laboratory for Ocean Sciences, 60 Bigelow Dr, East Boothbay, Maine 04544, United States.
Abstract
Over the past 3 decades, low-density polyethylene (PE) passive sampling devices have been widely used to scout organic chemicals in air, water, sediments, and biotic phases. Experimental partition coefficient data, required to calculate the concentrations in environmental compartments, are not widely available. In this study, we developed and rigorously evaluated linear free energy relationships (LFERs) to predict the partition coefficient between the PE and the water phase (log Kpe–w). Poly-parameter (pp) LFERs based on Abraham solute parameters performed better (root-mean-square error, rmse = 0.333–0.350 log unit) in predicting log Kpe–w than the two one-parameter (op) LFERs built on n-hexadecane–water and octanol–water partition coefficients (rmse = 0.41–0.42 log unit), indicating that one parameter is not able to account for all types of interactions experienced by a chemical during PE–water exchange. Dimensionality analyses show that the calibration dataset used to train pp-LFERs fulfills all the requirements to obtain a robust model for log Kpe–w. Van der Waals interactions of the molecule tend to favor the PE phase, and polar interactions of the molecule favor the water phase. The PE phase is the most sensitive to polarizable chemicals compared to other commonly used passive sampling polymeric phases such as polydimethylsiloxane, polyoxymethylene, and polyacrylate. For op-LFERs, the PE phase is better represented by the hexadecane phase than by the octanol phase. A computational method based on the conductor-like screening model for real solvents theory did a good job of estimating log Kpe–w for chemicals that were neither very hydrophobic nor very hydrophilic in nature. Our models can be used to reliably predict the log Kpe–w values of simple neutral organic chemicals. This study provides insights into the partitioning behavior of PE samplers compared to other commonly used passive samplers.
Over the past 3 decades, passive sampling devices (PSDs) have been
widely used to scout organic chemicals in air, water, sediments, and
biotic phases.[1] The increasing popularity
of passive sampling techniques among analytical chemists may be attributed
to the facts that passive samplers provide cleaner extracts, improved
detection limits, ease of storage, and archiving of samples.[2]
To environmental chemists, passive sampling
brings value because
PSDs bio-mimic the passive uptake of truly dissolved concentrations
(Cfree)[3] of
chemicals in the environment.[4] Cfree is considered a more accurate endpoint
of chemical exposure than the total concentration measured
using conventional sampling methods.[5] At
environmental levels, measuring Cfree is
equivalent to determining the chemical activity of a contaminant.[6] At equilibrium, chemical activities between multiple
phases such as a whole organism (or its compositional components such
as lipids, proteins, and carbohydrates) and a reference phase (e.g.,
water and air) are equal; using appropriate partition coefficients,
we can calculate concentrations in different compartments of interest
and evaluate the bioaccumulation disequilibrium.[6−8] In a related
development, laboratory experimentalists have started to prefer passive
dosing methods, which use the same polymeric phases as used in passive
sampling, to determine exposure,[9] toxicity,[10] bioconcentration,[11] and speciation and fractionation[12] of
hydrophobic organic chemicals in complex systems. Passive dosing offers
a tight control on the exposure concentrations of hydrophobic chemicals
in laboratory experiments involving multiple phases.[13] Taken together, environmental chemists can gain insights
about bioavailability, ecotoxicity, bioaccumulation, and biomagnification
of organic contaminants using low-cost and low-tech PSDs.[14]
In the field, several types of PSDs have shown
promise in monitoring
organic pollutants in environmental waters.[1] These PSDs include polydimethylsiloxane (PDMS),[15] polyoxymethylene (POM),[16] polyacrylate
(PA),[17] ethylene–vinyl acetate,[18] semipermeable membrane devices,[19] high-density polyethylene (HDPE),[20] and low-density polyethylene (PE).[20] Among these,
PE is a cheap and widely available material with proven robustness
for long-term monitoring of organic pollutants.[21] These organic pollutants include diverse chemical families
such as organochlorine pesticides (OCPs), polycyclic aromatic hydrocarbons
(PAHs), nitro-PAHs, polychlorinated biphenyls (PCBs), polybrominated
diphenyl ethers (PBDEs), alkyl benzenes, and alkyl phenols.[22−26]
The partition coefficient between the PE and water phase (Kpe–w) is required to calculate the concentration
(Cfree) of organic pollutants in environmental
waters (eq 1).[27]
$$C_{\mathrm{free}} = \frac{C_{\mathrm{pe}}}{K_{\mathrm{pe-w}}} \qquad (1)$$
where Cpe is the
equilibrium concentration of contaminants accumulated in the PE passive
sampler deployed in water. However, experimental Kpe–w data are available for only a few hundred
chemicals. Experimental methods are expensive, laborious, and difficult,
especially for hydrophobic chemicals. Consequently, environmental
analysts resort to different estimation methods to compute Kpe–w.
The estimation approaches
mostly include one-parameter (op-) and
poly-parameter (pp-) linear free energy relationships (LFERs).[21] Theoretically speaking, log Kpe–w is a free energy related property (eq 2), which may be related
to other free energy properties to predict log Kpe–w.
$$\log K_{\mathrm{pe-w}} = -\frac{\Delta_{\mathrm{pe-w}}G}{2.303\,RT} \qquad (2)$$
where R is the universal
gas constant, T is the absolute temperature, and Δpe–wG is the Gibbs free energy change
for the transfer of solute in the PE–water system.
Previously,
free energy properties such as subcooled pure liquid
solubility and partition coefficients for octanol–water (log Kow) and hexadecane–water (log Khexadecane–w) systems have been used to develop
op correlations for the estimation of log Kpe–w.[21] However, such relationships between
two partitioning properties work accurately only if the same type
of interactions governs both properties.[28] This is because one parameter can capture only one of the many types
of intermolecular interactions involved in partitioning.[29] Thus, op-LFERs are limited to specific chemical
classes and cannot be applied to estimate properties of chemicals belonging
to different chemical classes.[8,29]
To explore the
entire spectrum of interactions governing the partitioning
of diverse chemicals, both specific and nonspecific intermolecular
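interactions need to be taken into account.[8] In other words, the total free energy change of partitioning, Δpe–wG, is a linear combination of free energy changes
due to van der Waals (Δpe–wGvdW) and polar interactions (Δpe–wGpolar) (eq 3).[30]
$$\Delta_{\mathrm{pe-w}}G = \Delta_{\mathrm{pe-w}}G^{\mathrm{vdW}} + \Delta_{\mathrm{pe-w}}G^{\mathrm{polar}} \qquad (3)$$
Abraham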
and co-workers successfully linked the van der Waals and
polar interactions of chemicals to their macroscopic partitioning
properties. They demonstrated that not more than five to six intermolecular
interaction parameters are required to develop LFERs for diverse partitioning
properties.[31−35] Such pp-LFERs are also referred to as Abraham solvation models (ASMs).
According to the ASM (eqs 4 and 5)[36,37] and other variants
proposed by Goss (eq 6)[38] and van Noort (eq 7),[39] equilibrium
partition coefficients (log K) of nonionic chemicals for a system of two phases, x and y, can be described by the following
general expressions
$$\log K_{xy} = c + eE + sS + aA + bB + vV \qquad (4)$$
$$\log K_{xy} = c + eE + sS + aA + bB + lL \qquad (5)$$
$$\log K_{xy} = c + sS + aA + bB + vV + lL \qquad (6)$$
$$\log K_{xy} = c + sS + aA + bB + lL \qquad (7)$$
where E defines the polarizability
of the solute in excess of that of a comparably sized n-alkane; the S parameter blends the electrostatic polarity
and polarizability of the solute;[40,52] and A and B are the parameters depicting the hydrogen
bond donating and hydrogen bond accepting capacities of the solute,
respectively. V is the McGowan molecular volume of
the solute, and L is the logarithmic hexadecane–air
partition coefficient of the solute at 25 °C. The lowercase letters c, e, s, a, b, l, and v are
coefficients specific to each two-phase partitioning system xy. Generally, for liquid–liquid partitioning processes,
the lL term is ignored (eq 4, henceforth referred to as the ESABV model),
and for gas–liquid partitioning, the vV term
is not considered (eq 5, henceforth referred to as the ESABL model).[37] Goss developed a single equation, keeping both the vV and lL terms and ignoring the eE term, to describe both liquid–liquid and
gas–liquid partitioning (eq 6, henceforth referred to as the SABVL model).[38] Van Noort proposed that for air–organic
solvent partitioning, both the vV and eE terms can be excluded without any loss of statistical quality (eq 7, henceforth referred to
as the SABL model).[39]
In a recent
study, Zhu and co-workers[41] developed a pp-LFER
based on three Abraham solute parameters (ASPs), A, B, and V, using a diverse
set of 254 chemicals. Their pp-LFER explained 78% of the variance in the
experimental log Kpe–w training
data (n = 203) and exhibited a root-mean-square
error (rmse) of 0.59 log unit relative to the experimental log Kpe–w training data. However, the authors
did not discuss whether they evaluated the role of the other ASPs, E, S, and L, while developing
their pp-LFER. Furthermore, the authors did not report whether they used
experimental or calculated data for ASPs to develop the pp-LFER. In another
study, Zhu and co-workers[42] developed a pp-LFER
based on theoretical parameters such as average molecular polarizability,
dipole moment, and the net charge of the most negative atoms as a
proxy of hydrogen bonding interaction using a diverse set of 191 chemicals.
The reported rmse for this equation is 0.60 log unit. The authors successfully
validated their pp-LFER using an external set of 48 chemicals. However,
this model requires quantum-chemically computed theoretical descriptors,
which are not readily accessible to common users of passive samplers.
The octanol–water partition coefficient (log Kow) is the most widely used parameter to develop op-LFERs
for many passive sampling phases.[15,21,33,43] Octanol–water
partition coefficient-based LFERs have also been developed for the PE–water
system.[21] However, octanol is not considered
a good solvent to represent the PE phase because of its semipolar
character.[44,45] Generally, such relationships are good only
for a single chemical family and cannot be applied to diverse chemicals.[21,27,44,46]
Long-chain n-alkanes such as n-hexadecane are expected to be a more appropriate proxy of the PE
phase than n-octanol. The solvation similarity between n-hexadecane–water and PE–water systems has
been reported for chlorinated organic chemicals.[47] Hale and co-workers developed an op-LFER between n-hexadecane–water and PE–water systems for 14 OCPs.[44] Sacks and Lohmann reported an op-LFER between
the partition coefficients of triclosan and alkyl phenols for the n-hexadecane–water (log Khexadecane–w) and PE–water systems.[45] The accuracy
of these op-LFERs was better than that of the ones based on linear relationships
between log Kow and log Kpe–w for chemical families having semipolar traits.
The LFER approach requires calibration and validation with reliable
experimental data. Generally, the parameters used in the LFERs are
not readily available. For instance, the experimental data for ASPs
are available only for 8000 chemicals.[48] In contrast, quantum-chemical approaches generally do not require parameterization.
A computational solvation approach based on the COSMO-RS (conductor-like
screening model for real solvents) theory[49] has been widely used to predict thermodynamic properties such as
activity coefficients, solubility, partition coefficients, vapor pressure,
and free energy of solvation.[50] COSMO-RS
integrates a continuum solvation model and surface interaction methods
to calculate the chemical potential of molecules in a variety of solvents,
mixtures, and polymers. It considers the solvent as a dielectric continuum,
in which the solute molecule is embedded in a cavity of its own size and shape,
to calculate the surface charge density of the molecules. The solute–solvent
interaction energies are computed based on the interactions of surface
segments.[51] The partition coefficients
are computed through fast statistical thermodynamics of interacting
molecular surface segments. COSMOtherm (COSMOlogic GmbH & Co.
KG) is a predictive tool that implements COSMO-RS.[52]
Goss evaluated the performance of COSMOtherm
to predict partition
coefficients of neutral organic compounds for several polymers used
in analytical chemistry.[53] The author demonstrated
the utility of COSMOtherm in selecting polymers for various applications
in analytical chemistry. For instance, linear regression between experimental
and COSMOtherm-predicted values resulted in R2 = 0.95 and rmse = 0.44 log unit for 146 chemicals. In another
study, Loschen and Klamt[54] calculated the
solubilities and partitioning of gaseous and liquid solutes in different
polymers. They demonstrated the importance of free volume correction
in improving the prediction of partitioning properties of polymers.
For PE–water partition coefficients, when regressed against
the experimental values, the predicted values calculated without a
combinatorial term resulted in R2 = 0.87
and rmse = 1.83 log units for 10 chemicals. With incorporation of
free volume term and estimated crystalline fraction of 0.67 for PE,
this comparison improved yielding R2 =
0.86 and rmse = 1.13 log units for 10 chemicals.
The objectives
of the study were to (i) develop op- and pp-LFERs,
(ii) understand what types of intermolecular interactions are relevant
to the PE–water system, and (iii) evaluate the performance
of the COSMO-RS model compared to LFERs for the prediction of partition
coefficients for PE–water samplers.
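As an illustration of how the quantities introduced above fit together, the following minimal Python sketch evaluates an ESABV-type expression (log K = c + eE + sS + aA + bB + vV) and then converts a sampler concentration into a freely dissolved concentration through Cfree = Cpe/Kpe–w. The class and function names are hypothetical, and the system coefficients and solute descriptors are arbitrary placeholders chosen for easy arithmetic; they are not the fitted values of this study. Units are assumed to be consistent (for example, Cpe in ng/kg PE and Kpe–w in L/kg, giving Cfree in ng/L).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemCoefficients:
    """System coefficients (c, e, s, a, b, v) of an ESABV-type pp-LFER."""
    c: float
    e: float
    s: float
    a: float
    b: float
    v: float


def log_k_esabv(coef, E, S, A, B, V):
    """Return log K = c + eE + sS + aA + bB + vV for one solute."""
    return (coef.c + coef.e * E + coef.s * S
            + coef.a * A + coef.b * B + coef.v * V)


def c_free(c_pe, log_k_pe_w):
    """Return the freely dissolved concentration, Cfree = Cpe / Kpe-w."""
    return c_pe / (10.0 ** log_k_pe_w)


# Hypothetical placeholder values, NOT fitted coefficients from the study.
PLACEHOLDER = SystemCoefficients(c=0.0, e=0.5, s=-1.0, a=-2.0, b=-4.0, v=4.0)

log_k = log_k_esabv(PLACEHOLDER, E=1.0, S=1.0, A=0.0, B=0.25, V=1.0)
print("log Kpe-w (placeholder model):", log_k)
print("Cfree for Cpe = 1000 and log K = 3:", c_free(1000.0, 3.0))
```

With real system coefficients substituted for the placeholders, the same two functions would cover the step from solute descriptors to a water concentration.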
Materials and Methods
Data Source and Analysis
The experimental
values of low-density PE to water partition coefficient (Kpe–w) comprising 270 chemicals were taken from
compilations given in the previous works.[41,44] These data are shown as Table S1 in the Supporting Information. Multiple values reported in the literature were
averaged. Experimental ASPs, which were available for 214 chemicals
(Dataset-I), were imported from the Helmholtz Centre for Environmental
Research-Linear Solvation Energy Relationships (UFZ-LSER) database[48] (Table S2). Dataset-I
was used to train and validate the pp-LFER equations. The estimated
values of ASPs for the remaining 56 chemicals were calculated using
the UFZ-LSER Database tool[48] (Table S3). However, only 26 out of the 56 remaining
chemicals (Dataset-II) were retained for additional validation of
the pp-LFER equations (Section ) as they were found within the acceptable application domain[48,55] (Table S4). We excluded the estimated
values that were outside of the application domain.
Experimental
values of log Kow, ranging from 2.7 to
8.6 log unit, were taken from the literature[41,56] (Table S2). The data of log Khexadecane–w values, which spanned about 7 orders
of magnitude, were estimated using the ASM equation reported in the
literature[57] (Table S2). These partition coefficients were used to develop op-LFERs
for the PE–water passive sampler.
System coefficients
of pp-LFER equations for common passive samplers
such as PDMS–water,[58] POM–water,[35] and PA–water,[33] and for technical solvent systems such as octanol–water[59] and hexadecane–water,[57] were taken from the literature to compare the similarities
of PE with these systems (Table S5).
The experimental Kpe–w dataset,
used to develop the pp-LFER equations, was diverse and spanned over
6 orders of magnitude (Table S1) of Kpe–w. The dataset contains chemicals
with diverse structures and comprises the compounds from families
such as n-alkane, linear alkyl benzenes, chlorinated
benzenes, OCPs, PAHs, nitro-PAHs, PCBs, PBDEs, polyhalogenated dibenzo-p-dioxins (PHDDs), and dibenzofurans (PHDFs).
A dataset
comprising 238 chemicals was used to evaluate the COSMOtherm
model. This dataset (Table S6) spanned
wider ranges of ASPs than Dataset-I (Table S1). However, experimental data were available only for 47
chemicals (Table S7) to compare with the
COSMOtherm predictions. For the remaining 189 chemicals, Kpe–w values were estimated using the pp-LFER equation
developed in this study.
Statistical Analysis
The statistical
tests such as correlation analysis, principal component analysis (PCA),
multiple linear regression, and cross-validation tests were performed
using R Program (3.5.3)[60] and XLSTAT (2018).[61] The significant and optimum number of descriptors
for each model was selected using step-wise multiple linear regression
based on statistical criteria such as Student’s t-test, the Akaike information criterion, adjusted R2, and the variance inflation factor. Uncertainties around
the regression coefficients, which correspond to a 95% probability interval
of the fitted values, were estimated using the bootstrap method with
1000 resamples. Where the intercept was found to be statistically
indistinguishable from zero, the regression was repeated with the intercept
set equal to zero.
To define the domain of applicability, and
to find the influential values in the training datasets, the regression
diagnostics such as Studentized Residuals, Hat Values, and Cook’s
Distance were applied to each model (Tables S9–S12). The bootstrap resampling method was used to estimate the standard
errors of beta coefficients for all models. Cross-validation tests
such as K-fold, repeated K-fold
(r = 10), leave-one-out, and bootstrapping (n = 1000) were performed for each model to evaluate the
robustness (Sections S1–S4). The
PCA test was used to find the contribution of all variables in the
principal components. Applicability domains (ADs) of all predictive
models were assessed using influence plots, which reflect leverages,
studentized residuals, and Cook’s D values.
Chemicals that have values for these metrics above the critical thresholds
were flagged as outside the AD.
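The three influence metrics named above can be computed directly from an ordinary least-squares fit. The sketch below uses NumPy and is only an illustration of the standard definitions (hat values from the projection matrix, externally studentized residuals, and Cook's distance); the function names are hypothetical, and the cutoffs in `flag_influential` (2p/n for leverage, |t| > 2 for residuals, 4/n for Cook's distance) are common rules of thumb assumed here, since the critical thresholds used in the study are not stated in the text.

```python
import numpy as np


def influence_diagnostics(X, y):
    """Return hat values, externally studentized residuals, and Cook's distance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])  # add the intercept column
    p = design.shape[1]                        # number of fitted coefficients

    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta

    # Diagonal of the hat matrix from the thin QR factorization.
    q, _ = np.linalg.qr(design)
    hat = np.sum(q ** 2, axis=1)

    sse = float(resid @ resid)
    mse = sse / (n - p)
    internal = resid / np.sqrt(mse * (1.0 - hat))

    # Leave-one-out residual variance gives externally studentized residuals.
    loo_var = (sse - resid ** 2 / (1.0 - hat)) / (n - p - 1)
    external = resid / np.sqrt(loo_var * (1.0 - hat))

    cooks = internal ** 2 * hat / (p * (1.0 - hat))
    return hat, external, cooks


def flag_influential(X, y, resid_cut=2.0):
    """Boolean masks per metric; the cutoffs are assumed rules of thumb."""
    hat, student, cooks = influence_diagnostics(X, y)
    n = len(hat)
    p = np.asarray(X).reshape(n, -1).shape[1] + 1
    return {
        "high_leverage": hat > 2.0 * p / n,
        "large_residual": np.abs(student) > resid_cut,
        "high_cooks": cooks > 4.0 / n,
    }
```

Returning one mask per metric, rather than a single combined flag, leaves open how the criteria are combined when deciding whether a chemical falls outside the AD.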
COSMOtherm Calculation
Cosmo files
comprising the screening charge densities for the select 47 chemicals,
for which the experimental log Kpe–w values were available (Table S7), were
generated using the TURBOMOLE package[62] at the B-P86 density functional level with a def-TZVP basis set.
Cosmo files for 189 chemicals (Table S8) were taken from the COSMOtherm database. These cosmo files were
then input into the COSMOtherm software using the BP_TZVP_C30_1701
parametrization to calculate log Kpe–w.
Results and Discussion
Appropriateness of Dataset for the Development of ASM
To begin with, we investigated if the calibration
dataset fulfills the necessary requirements stipulated in the literature[63] for developing a robust ASM equation. First,
we note that the calibration dataset approximately follows a normal distribution
and spans more than 6 orders of magnitude (10^2.02–10^8.4) of Kpe–water values (Figure a).
Second, all ASM parameters in the calibration dataset, except the
hydrogen bond donating ability, span a reasonable range of values
(Figure b). Only 5
of 214 chemicals in the calibration dataset have nonzero values for the A parameter, which still fulfills the condition of having
a minimum of four solutes per parameter[63] required
to account for the true dependence of Kpe–water on acidity. Third, the solute set must not exhibit significant covariance
among the ASM parameters. As evident from the correlogram (Figure c), there are moderate
correlations among the E, S, V, and L parameters. A and B are fairly uncorrelated parameters. This
overlap in information is expected as ASPs do not represent orthogonal
information in terms of fundamental intermolecular interactions such
as dispersion, Keesom, Debye, and hydrogen bond forces. For example, the S parameter represents a mixture of polarity and polarizability.[40,63] Similarly, the polarizability and induction effects are rooted in
the definitions of L and E.[63] In fact, polarizability is linearly correlated
with the size of the molecule.[63] As a result,
descriptors correlate with each other due to mixing of fundamental
intermolecular interactions. In the presence of correlations among
the solute parameters, the choice of training set becomes critical
to develop meaningful ASM-type equations.
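The covariance check described above can be sketched with NumPy as follows. The snippet computes a Pearson correlation matrix for a descriptor table and, as a related collinearity measure, the variance inflation factor (VIF) of each descriptor obtained by regressing it on the remaining ones. The function names are hypothetical, and the code is an illustrative sketch rather than the R or XLSTAT routine used in the study.

```python
import numpy as np


def correlation_matrix(descriptors):
    """Pearson correlation matrix; columns are descriptors, rows are solutes."""
    return np.corrcoef(np.asarray(descriptors, dtype=float), rowvar=False)


def variance_inflation_factors(descriptors):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the others."""
    data = np.asarray(descriptors, dtype=float)
    n, k = data.shape
    vifs = np.empty(k)
    for j in range(k):
        target = data[:, j]
        others = np.column_stack([np.ones(n), np.delete(data, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs
```

A VIF of 1 means a descriptor is uncorrelated with the others, and values approaching a chosen cutoff signal that its information is largely reproduced by the remaining descriptors.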
Figure 1
Partitioning variability
embedded in the training set (n = 214) in terms of
ASPs and PE–water partition
coefficient. Top panels show the distribution of (a) PE–water
partition coefficients (log KPE–water) and (b) ASPs. Lower panels show (c) the correlogram of the correlation
matrix obtained by Pearson correlation analysis and
(d) the percent contribution of variables in the first five dimensions
obtained by PCA of the 214 × 8 matrix [ESABVL log Kpe–water]. In panel (c),
red and purple colors show positive and negative correlations
between the pair, respectively. The value of the correlation coefficient for each pair
of variables is shown in each square. In panel (d), color intensity
and size of the circle are proportional to the percent contribution
of a variable. In panel (d), Dim. stands for dimension.
To further investigate the impact of overlap in chemical
information
among solute parameters and their relationship with log Kpe–w, we performed PCA on the 214 × 7 matrix
[ESABVL log Kpe–w]. The first five dimensions account for more than 99% of the information
in this matrix. The total variance in the dataset due to ASM parameters
is partitioned in almost all the orthogonal dimensions obtained after
PCA (Figure d). This
is also indicative of the absence of multi-collinearity (i.e., one
of the solute parameters might be a linear function of a combination
of other parameters). The lack of multi-collinearity is further corroborated
by the variance inflation factors (VIFs) obtained after regression
of log Kpe–w against the ASM parameters.
Similarly, the contribution of log Kpe–w is significant in all the orthogonal dimensions. This indicates
that all ASM parameters are important to explain the variance in the
log Kpe–w of the dataset. Taken
together, these results show that the calibration dataset fulfills
all the requirements to obtain a robust ASM equation for log Kpe–w.
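The PCA step can be illustrated with a short NumPy sketch that returns the percent variance explained by each dimension and the percent contribution of each variable to each dimension. It assumes the columns are standardized before decomposition and that the contribution of a variable is the squared loading expressed as a percentage; both choices are assumptions made for illustration, and the function name is hypothetical.

```python
import numpy as np


def pca_contributions(matrix):
    """Return (% variance per dimension, % contribution of each variable).

    Rows are solutes and columns are variables (for example the six ASPs
    plus log Kpe-w). Columns are standardized first, which is an assumption.
    """
    data = np.asarray(matrix, dtype=float)
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

    # Rows of vt are the principal axes; singular values come sorted descending.
    _, singular, vt = np.linalg.svd(z, full_matrices=False)

    explained = singular ** 2 / np.sum(singular ** 2) * 100.0
    contributions = (vt.T ** 2) * 100.0  # rows: variables, columns: dimensions
    return explained, contributions
```

Each column of the contribution table sums to 100, so a variable that dominates a single dimension stands out immediately, whereas contributions spread over several dimensions indicate information shared among variables.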
pp-LFERs for PE–Water Partitioning
We calibrated and evaluated
four variants of the pp-LFER model
based on Abraham solvation parameters. Models based on the ESABL and
SABL pp-LFERs are presented in the Supporting Information (Sections S5 and S6). Two models, ESABV and SABVL
pp-LFERs, are discussed here in detail.
ESABV Model
The ESABV model, based
on the relationship of log Kpe–w with a linear combination of E, S, A, B, and V parameters,
successfully described 99% of variation in the log Kpe–w data (eq and Figure a).
where n, R2, Radj2, rmse,
and F-statistic denote the number of experimental
values of log Kpe–w, coefficient
of determination, adjusted coefficient of determination, root-mean-squared
error, and the overall Fisher statistic, respectively.
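For readers who wish to reproduce the reported statistics, the following plain-Python sketch shows one common way to compute them from observed and predicted log Kpe–w values. It assumes that rmse is the square root of the mean squared residual (dividing by n rather than by the residual degrees of freedom) and that the F-statistic is the overall F of a least-squares fit with an intercept; the study may have used slightly different conventions, and the function name is hypothetical.

```python
import math


def fit_statistics(observed, predicted, n_predictors):
    """Return n, R2, adjusted R2, rmse, and the overall F-statistic."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    dof = n - n_predictors - 1  # residual degrees of freedom

    r2 = 1.0 - ss_res / ss_tot
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / dof
    rmse = math.sqrt(ss_res / n)  # assumed convention: divide by n
    f_stat = (r2 / n_predictors) / ((1.0 - r2) / dof)
    return {"n": n, "R2": r2, "R2_adj": r2_adj, "rmse": rmse, "F": f_stat}


# Toy numbers for illustration only, not data from the study.
stats = fit_statistics([1.0, 2.0, 3.0, 4.0, 5.0],
                       [1.1, 1.9, 3.2, 3.8, 5.0],
                       n_predictors=1)
print(stats)
```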
Figure 2
Linear regression plots
for the (a) ESABV model, (b) SABVL model,
(c) op-LFER model based on the hexadecane–water partition coefficient,
and (d) op-LFER model based on the octanol–water partition coefficient.
Upper and lower green lines bound the 95% confidence interval around the
regression line, which is shown as a dotted black line in the middle.
Blue diamonds (◇) and purple circles (◯) represent the
data points in the training set and validation set, respectively.
For external validation, the PPM full dataset (n = 214, Table S2) was split
randomly into
a training set (n = 174, Table S13) and a validation set (n = 40, Table S14). Equation was derived using the training set of 174
compounds.
The fitting coefficients and
regression statistics of eq are statistically similar to eq . For eq , the largest
VIF value among ASM parameters was 4.8 for the S parameter,
which was lower than the cutoff value of 10 for multicollinearity.[64] Predictions of eq compared favorably with the experimental data for
the external validation set (Rexternal2 = 0.962 and rmseexternal = 0.296).
The results of four types of cross-validation tests for eq were in good agreement
with each other, indicating that the model is internally valid for
predictive purpose (Section S1). These
tests exhibited rmse and R2 in the range
of 0.33–0.39 log unit and 0.915–0.930, respectively.
The values of ASM parameters for 26 chemicals, for which experimental
ASM data were not available, were calculated using the UFZ-LSER website
(Dataset-II). With the input of these calculated ASM parameters (n = 26), the values predicted by eq were in good agreement with the experimental values
of log Kpe–w (rmse = 0.58 log units).
In this comparison, the largest residuals were observed for chemicals
that were either very hydrophobic or had significant hydrogen bonding
interactions (Figure S1).
The application
domain for the ESABV model was established by using
an influence plot (Figure a). The following six chemicals were flagged as influential observations
in this plot: PCB 209, aldrin, methoxychlor, n-dodecylbenzene, n-octylphenol, and triclosan. These chemicals are either
very hydrophobic or have substantial hydrogen bonding interactions.
Leverages for n-octylphenol and triclosan were significantly
higher than other chemicals in the dataset. Leverage higher than the
critical values generally indicates possible issues with predictor
variables, which in our case are the ASM parameters for these solutes.
The values of ASM parameters for some chemicals, especially for very
hydrophobic and complex molecules, might be in considerable error.[65,66] Using different sets of published experimental solute parameters
from several different sources, estimated water–air partition
coefficient for pesticides exhibited rmse values ranging from 0.54
to 1.39 log units when compared to experimental data.[66] For triclosan, the percent relative standard deviations
for the values of S, A, and B reported in the literature are 17, 36, and 33%, respectively
(Table S15).
Figure 3
Comparison of experimental
values with the predicted values obtained
by inputting calculated values of ASPs in the (a) ESABV model and
(b) SABVL model for chemicals for which the experimental values of
ASPs were not available. The dotted line in the middle shows 1:1 agreement,
and upper and lower dotted lines indicate 1:2 agreement between the
experimental and predicted values.
Higher than the critical studentized residual values for PCB 209,
aldrin, methoxychlor, and n-dodecylbenzene may indicate
a problem in measured log Kpe–w values for these compounds. These compounds are considerably hydrophobic
with log Kow ranging from 5.08 to 8.65.
Water–solvent partitioning properties of hydrophobic chemicals
fall near the extreme limits of analytical techniques and therefore
suffer from significant uncertainties in the reported values.[67]
SABVL Model
The model equation based
on the relationship of log Kpe–w with a linear combination of S, A, B, V, and L parameters
(SABVL model) successfully explained more than 99% of variation in
the log Kpe–w data (eq ).
For eq , the largest VIF value among ASM parameters was 8.4
for the L parameter, which is acceptable, being below
the cutoff value of 10 for multicollinearity.[64] The higher VIF value observed for the SABVL model can be attributed to a
higher correlation coefficient of L with S (r = 0.81) and V (r = 0.87) parameters as compared to the correlation coefficient
of E with S (r =
0.80) and V (r = 0.61) parameters
used in the ESABV model (Figure c). The ESABV model performed slightly better than
the SABVL model, with an rmse that was 0.017 log
unit lower than that of the SABVL model.
External validation
was performed by splitting the full dataset
(n = 214) randomly into a training set (n = 174, Table S16) and a validation set
(n = 40, Table S17). Equation was obtained using
the training set of 174 compounds.
The fitting coefficients and
regression statistics of eq are statistically similar
to eq . For eq , the largest VIF value
among ASM parameters was 8.2. Predictions of eq for the external validation set were in
good agreement with the experimental values (Rexternal2 = 0.9364, rmseexternal = 0.456)
(Figure b).
Cross-validation tests (Section S2)
yielded rmse and R2 values in the range
of 0.35–0.48 log unit and 0.893–0.925, respectively,
indicating robustness of the model.
With the input of calculated
ASM parameters (n = 26), eq predicted
values compared with the experimental values of log Kpe–w with an rmse = 0.75 log units for Dataset-II.
Higher rmse observed for the SABVL model than for the ESABV model
might be attributed to the errors in the calculation of the L parameter. Inter-laboratory variation for the experimental
value of the L parameter for hydrophobic chemicals
has been reported significant.[31]The application domain of the SABVL model is somewhat similar to
that of the ESABV model (Figure b). Four chemicals, PCB 209, endrin, n-octylphenol, and triclosan, were flagged as influential observations
in the influence plot. Only endrin was not flagged as influential
in the ESABV model (Figure a). These deviations may be rationalized either due to error
in the response or predictor variables used for these chemicals. For
example, percent relative standard deviations found for the values
of L reported in different literature sources for
endrin and triclosan were more than 9% (Table S18).
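The multicollinearity screening reported earlier in this section can be illustrated with a short sketch. The VIF of a predictor is 1/(1 − R2), where R2 comes from regressing that predictor on the remaining predictors. The function name `vif` and the descriptor values below are our own assumptions; the data are synthetic and do not reproduce the calibration dataset.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n_samples x n_predictors)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])   # intercept + remaining predictors
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic descriptors (NOT the paper's data): the third column is built
# to correlate with the first two, mimicking a correlated L parameter.
rng = np.random.default_rng(0)
S = rng.normal(size=200)
V = rng.normal(size=200)
L = 0.6 * S + 0.7 * V + 0.3 * rng.normal(size=200)

vifs = vif(np.column_stack([S, V, L]))
exceeds_cutoff = vifs > 10   # cutoff of 10 quoted in the text
```

A predictor whose VIF exceeds the cutoff would be a candidate for removal or replacement before fitting the LFER.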
Figure 4
Application domains for (a) ESABV model, (b) SABVL model,
(c) op-LFER
model based on hexadecane–water partition coefficient and (d)
op-LFER model based on octanol–water partition coefficient.
Studentized residuals are plotted against hat-values, and the size
of each circle is proportional to Cook’s distance. The hat-value is
a measure of leverage. Observations 170, 182, 184, 185, 186, 189, 199,
208, 209, 210, and 212 flagged in the above figures correspond to
PCB 209, aldrin, endrin, endrin aldehyde, endrin ketone, methoxychlor,
toluene, n-dodecylbenzene, n-octylphenol,
pentane, and triclosan, respectively.
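The three quantities shown in an influence plot can be computed directly from an ordinary least-squares fit. The sketch below is a minimal illustration with synthetic data; the function name `influence_measures` and the planted outlier are our assumptions, and the software used to produce Figure 4 is not stated in this section.

```python
import numpy as np

def influence_measures(X, y):
    """Hat-values, externally studentized residuals, and Cook's distances for OLS."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])      # design matrix with intercept
    p = A.shape[1]
    H = A @ np.linalg.pinv(A)                 # hat (projection) matrix
    h = np.diag(H)                            # leverage of each observation
    e = y - H @ y                             # raw residuals
    s2 = e @ e / (n - p)                      # residual variance
    r_int = e / np.sqrt(s2 * (1.0 - h))       # internally studentized residuals
    r_ext = r_int * np.sqrt((n - p - 1) / (n - p - r_int**2))
    cooks = r_int**2 * h / (p * (1.0 - h))
    return h, r_ext, cooks

# Synthetic example: 30 well-behaved points plus one high-leverage outlier.
rng = np.random.default_rng(1)
x = np.append(np.linspace(0.0, 1.0, 30), 3.0)
y = 2.0 * x + rng.normal(0.0, 0.05, x.size)
y[-1] += 2.0                                  # planted deviation at the last point

hat, t_resid, cooks_d = influence_measures(x, y)
most_influential = int(np.argmax(cooks_d))
```

Plotting `t_resid` against `hat` with marker size scaled by `cooks_d` gives the layout described in the caption.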
Types of Interaction Dominating PE–Water Partitioning
The LSERs shed light on the type and relative
importance of the interactions that govern contaminant uptake by passive
samplers from the water phase. This is important information for
choosing the polymeric phase for passive sampling.
In the PE–water system, solute parameters such as size of the molecule
(V), polarizability (E), and dispersion
interactions (L) favor the transport of the contaminants
in the direction of the PE phase. On the other hand, contaminants
that have stronger polar interactions such as hydrogen bonding interactions
(A and B) and polarity/polarizability
(S) tend to favor the water phase over the PE phase
(Figure a). These
relative transport tendencies are somewhat similar to those of other
passive sampling phases such as PDMS, POM, and PA.
Figure 5
Intermolecular interactions governing
PE–water and other
related partitioning systems. (a) Standardized regression coefficients
obtained by regressing log KPE–water against all six ASPs indicate the relative contribution of ASPs
in controlling the partitioning of chemicals between the water phase
and the PE phase. Error bars indicate the 95% confidence interval
around the mean. (b) A biplot between the first two orthogonal dimensions
obtained by PCA on the system coefficients for PE–water, PDMS–water,
POM–water, PA–water, octanol–water, and hexadecane–water
partitioning systems. Red lines are the projections of system coefficients
on the two-dimensional space. The first principal dimension (Dim 1)
and the second principal dimension (Dim 2) account for 65.47 and 19.51%,
respectively, of variance for these partitioning systems.
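Standardized regression coefficients of the kind shown in panel (a) can be obtained by z-scoring the response and every predictor before fitting. The sketch below uses two synthetic descriptors rather than the six ASPs; the function name and data are our assumptions, and confidence intervals are omitted.

```python
import numpy as np

def standardized_coefficients(X, y):
    """OLS coefficients after z-scoring predictors and response."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    # No intercept column is needed because all variables are centered.
    beta, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return beta

# Synthetic data (NOT the paper's dataset): one descriptor raises the response,
# the other lowers it with a smaller weight.
rng = np.random.default_rng(2)
X = rng.normal(size=(150, 2))
y = 3.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0.0, 0.5, 150)

beta = standardized_coefficients(X, y)
```

Because all variables share unit variance, the magnitudes in `beta` can be compared directly, and the signs show which phase a descriptor favors.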
Comparison of pp-LFER coefficients for the PE–water
system
with those for the PDMS–water, POM–water, and PA–water
systems indicates that the PE phase has the largest e coefficient among these passive sampling phases. Consequently, the
PE phase shows the highest affinity for chemicals that are highly
polarizable. This is also indicated by the biplot obtained by the
PCA on the system coefficients of these passive samplers (Figure b). The PE–water
system stands out in terms of the e coefficient.
System coefficients corresponding to the specific interactions (i.e., s, a, and b) for the PA
and POM phases are higher than for PE and PDMS phases. This indicates
that the PA and POM are more polar than the PDMS and PE passive sampler
phases. The PDMS phase almost occupies the central position in the
biplot, which indicates that it has good affinity for chemicals with
a wide range of polarities.
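A biplot of this kind can be sketched with a singular value decomposition of the centered matrix of system coefficients. The example below uses only the e, s, a, and b coefficients quoted in this paper for three systems (PE–water, hexadecane–water, and octanol–water); the v coefficients and the PDMS, POM, and PA systems are omitted because their values are not quoted here, so the result does not reproduce Figure 5b. Centering without scaling is our assumption.

```python
import numpy as np

systems = ["PE-water", "hexadecane-water", "octanol-water"]
coef_names = ["e", "s", "a", "b"]

# Rows: systems; columns: e, s, a, b as quoted in the text.
C = np.array([
    [1.00, -1.30, -1.82, -4.04],
    [0.67, -1.62, -3.59, -4.87],
    [0.56, -1.05,  0.03, -3.46],
])

Cc = C - C.mean(axis=0)                       # center each coefficient
U, sing, Vt = np.linalg.svd(Cc, full_matrices=False)

scores = U * sing                             # coordinates of the systems
loadings = Vt.T                               # directions of the coefficients
explained = sing**2 / np.sum(sing**2)         # fraction of variance per dimension

# Euclidean distance between two systems in the plane of the first two dimensions.
dist_pe_hex = np.linalg.norm(scores[0, :2] - scores[1, :2])
dist_pe_oct = np.linalg.norm(scores[0, :2] - scores[2, :2])
```

Plotting the first two columns of `scores` as points and the first two columns of `loadings` as arrows gives the biplot layout.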
op-LFER Model
The op-LFER models based
on the linear relationships of log Kpe–w with log Kow and with log Khexadecane–w were re-examined with statistical diagnostics
and compared with the pp-LFER models.
Hexadecane–Water LFER
Comparison of the system coefficients of the Abraham solvation equations for the PE–water
and hexadecane–water systems (Table S5) reveals that the cost of cavity formation in hexadecane is approximately
1 order of magnitude lower than in the PE phase. This is expected
as PE is a rigid polymeric matrix compared to hexadecane, which is
a liquid solvent.[21] The hydrogen bond donating
trait for PE relative to water (a = −1.82)
is about 58 times higher than that for hexadecane (a = −3.59). The hydrogen bond accepting trait for PE relative
to water (b = −4.04) is about 6 times higher
than that for hexadecane (b = −4.87). The
polarity/polarizability trait of PE is s = −1.30
compared to s = −1.62 for hexadecane. The
polarizability trait of PE (e = 1.00) is about twice
that of hexadecane (e = 0.67). These tendencies can
easily be discerned from the biplot (Figure b). The Euclidean distance between the hexadecane–water
system and the PE–water system is larger than the distance
between PE–water and PDMS–water systems. This indicates
that the solvation trait of PE is more similar to PDMS than to hexadecane.
Taken together, hexadecane shows a stronger nonpolar trait than the
PE phase.
Linear regression of log KPE–water against log Khexadecane–water resulted in the following form of op-LFER (eq ).
The higher rmse observed for eq compared to the ESABV
model indicates that one parameter
is not enough to account for the total variance of the log KPE–water data. The performance of eq is significantly better than that of the
previously reported hexadecane–water LFERs,[21,45] which were developed using smaller and less diverse datasets.
For external validation, the full dataset (n =
214) was split randomly into a training set (n =
174, Table S19) and a validation set (n = 40, Table S20). Equation was derived using
the training set of 174 compounds.
The fitting coefficients and
regression statistics of eq are statistically similar
to those of eq . Predictions
of eq compared favorably
with the experimental data for the external validation set (external R2 = 0.9336, external rmse = 0.390).
The results of four types of cross-validation tests
for eq were in good agreement
with each other, indicating that the model is internally valid for
predictive purposes (Section S3). These
tests exhibited rmse and R2 in the range
of 0.40–0.41 log unit and 0.898–0.901, respectively.
Three chemicals, PCB 209, aldrin, and n-dodecylbenzene, were found to be outside the application domain
of the hexadecane–water LFER. These chemicals were flagged in the ESABV
model as well, and are either very hydrophobic or have substantial
hydrogen bonding interactions.
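The fold differences quoted above for the PE–water and hexadecane–water system coefficients appear consistent with reading the difference between two coefficients as a power of ten, that is, the change in the partition coefficient for a solute whose corresponding descriptor equals 1. This reading is our assumption; the text does not state how the factors were derived. The function name `fold_difference` is hypothetical.

```python
def fold_difference(coef_1, coef_2):
    """Fold difference implied by two log10-scale system coefficients
    for a solute with a descriptor value of 1 (assumed interpretation)."""
    return 10 ** abs(coef_1 - coef_2)

# Coefficients quoted in the text: PE-water first, hexadecane-water second.
a_ratio = fold_difference(-1.82, -3.59)   # hydrogen bond donating trait
b_ratio = fold_difference(-4.04, -4.87)   # hydrogen bond accepting trait
e_ratio = fold_difference(1.00, 0.67)     # polarizability trait
```

Under this assumption the three ratios fall close to the factors of about 58, about 6, and about 2 given in the text.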
Octanol–Water LFER
As evident from the respective ASM equations (Table S5), the solvation character of octanol is significantly different
from that of the PE. The hydrogen bond acidity coefficient for the
PE–water system (a = −1.82) is about
62 times higher than that for the octanol–water system (a = 0.03). The hydrogen bond basicity trait for PE relative
to water (b = −4.04) is about 6 times higher
than that for octanol (b = −3.46). The polarity/polarizability
trait of the PE–water system is more pronounced (s = −1.30) as compared to that for the octanol–water
system (s = −1.05). The polarizability trait
of PE (e = 1.00) is about twice that of octanol (e = 0.56). The cost of cavity formation in octanol is approximately
0.4 log unit lower than in the PE phase. This is further corroborated
by the biplot (Figure b) where the octanol–water system occupies the position in
the direction of a, b, and s coefficients, and the PE–water system is oriented
more toward the e coefficient. These differences
in the system coefficients for the two systems imply that the fugacity
of polar and semipolar chemicals from the water phase to the PE phase is lower
than from the water phase to the octanol phase.
The linear regression
of log KPE–water against log Kow resulted in the following model equation.
The performance of eq is considerably lower than that of eq , which further substantiates the notion
that the octanol phase is not a good representation of the PE phase.
To evaluate the external validity of the octanol–water op-LFER,
the full dataset (n = 214) was split randomly into
a training set (n = 174, Table S21) and a validation set (n = 40, Table S22). Equation was obtained using the training set of 174
compounds.
The fitting coefficients and
regression statistics of eq are statistically similar
to those of eq . Predictions
of eq compared favorably
with the experimental data for the external validation set (external R2 = 0.8716, external rmse = 0.442).
The results of four types of cross-validation tests
for eq were in good agreement
with each other, indicating that the model is internally valid for
predictive purposes (Section S4). These
tests exhibited rmse and R2 in the range
of 0.40–0.42 log unit and 0.892–0.896, respectively.
The application domain for the octanol–water LFER was established
by using an influence plot. The following five chemicals were flagged as
influential observations in this plot: PCB 209, endrin aldehyde, endrin
ketone, toluene, and pentane.
We compared the predictions from
our four models with those from
chemical class-specific octanol–water LFER equations reported
for PCB and PAH families in Ghosh et al.[68] For 117 PCBs, the comparison of experimental log KPE–water values with the predicted log KPE–water values from the ESABV model
(eq ), hexadecane–water
LFER (eq ), octanol–water
LFER (eq ), and PCB-specific
octanol–water LFER equation (eq S1) resulted in rmse values of 0.30, 0.30, 0.26, and
0.29 log units, respectively (Table S23). The residuals
(predicted–experimental values) observed for superhydrophobic
chemicals such as PCB 207 and 209 were more than 1 order of magnitude
for all models. For 47 PAHs, the agreement between experimental log KPE–water values and predicted log KPE–water values from the ESABV model
(eq ), hexadecane–water
LFER (eq ), octanol–water
LFER (eq ), and PAH-specific
octanol–water LFER (eq S2) exhibited
rmse values of 0.23, 0.41, 0.29, and 0.26 log units, respectively
(Table S24). As expected, the op-LFERs
trained on datasets comprising specific chemical families (PCBs and
PAHs) perform better than the op-LFERs trained using diverse multiclass
chemicals.
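The external validation procedure used for the op-LFERs (a random 174/40 split, a fit on the training set, and statistics on the held-out set) can be sketched as follows. The data, slope, intercept, and noise level are synthetic assumptions and do not correspond to the fitted equations, and the definition of external R2 shown here is one common choice that the text does not confirm.

```python
import numpy as np

rng = np.random.default_rng(42)
n_total, n_valid = 214, 40

# Synthetic predictor and response (NOT the paper's data); the slope of 1.0,
# intercept of -0.5, and noise of 0.4 log unit are hypothetical.
log_k_solvent_w = rng.uniform(0.0, 9.0, n_total)
log_k_pe_w = 1.0 * log_k_solvent_w - 0.5 + rng.normal(0.0, 0.4, n_total)

# Random split into validation (40) and training (174) indices.
idx = rng.permutation(n_total)
valid, train = idx[:n_valid], idx[n_valid:]

# One-parameter LFER fitted on the training set only.
slope, intercept = np.polyfit(log_k_solvent_w[train], log_k_pe_w[train], 1)

# Statistics on the held-out validation set.
pred = slope * log_k_solvent_w[valid] + intercept
resid = log_k_pe_w[valid] - pred
rmse_external = np.sqrt(np.mean(resid**2))
ss_tot = np.sum((log_k_pe_w[valid] - log_k_pe_w[valid].mean()) ** 2)
r2_external = 1.0 - np.sum(resid**2) / ss_tot
```

Repeating the split with different seeds gives a sense of how sensitive the external statistics are to the particular partition.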
COSMOtherm Predictions
COSMOtherm
did a reasonable job in predicting the PE–water partition coefficients
for diverse chemicals. Overall, COSMOtherm predictions were in good
agreement with the experimental values of all chemicals (n = 47, rmse = 0.52 log unit) (Figure a and Table S7). A regression
line with an intercept set equal to zero between the two sets of values
resulted in slope = 0.96 with R2 = 0.99.
For chemical families such as PCBs (n = 28), OCPs
(n = 7), PBDEs (n = 3), and hydrocarbons
(n = 8), the comparison of predicted values with
the experimental values yielded rmse values of 0.50, 0.24, 0.22, and
0.82 log unit, respectively. In general, the largest deviations (residual
= predicted log Kpe–w –
experimental log Kpe–w) were for
the compounds that were significantly hydrophobic in nature with the
exceptions of n-pentane (residual = 1.47) and n-hexane (residual = 0.99).
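The zero-intercept regression mentioned above has a closed-form slope, sum(xy)/sum(x²). The sketch below uses synthetic values and assumes that predictions are regressed on experimental values, which the text does not specify. It also reports two versions of R2, because for regressions through the origin the statistic is often computed against the uncentered sum of squares, which is never smaller than the centered version. Which convention the reported R2 follows is not stated in the text.

```python
import numpy as np

def through_origin_fit(x, y):
    """Slope of y = slope * x (no intercept) with uncentered and centered R2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope = np.sum(x * y) / np.sum(x * x)
    ss_res = np.sum((y - slope * x) ** 2)
    r2_uncentered = 1.0 - ss_res / np.sum(y**2)
    r2_centered = 1.0 - ss_res / np.sum((y - y.mean()) ** 2)
    return slope, r2_uncentered, r2_centered

# Synthetic values (NOT the paper's data): 47 "experimental" values and
# "predicted" values scattered around a slope of 0.96.
rng = np.random.default_rng(3)
experimental = rng.uniform(3.0, 9.0, 47)
predicted = 0.96 * experimental + rng.normal(0.0, 0.5, 47)

slope, r2_unc, r2_cen = through_origin_fit(experimental, predicted)
rmse = np.sqrt(np.mean((predicted - experimental) ** 2))
```

The rmse is computed directly from the paired values and does not depend on the fitted slope.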
Figure 6
Comparison of COSMOtherm predictions with
(a) experimental and
(b) ESABV model predicted log Kpe–w values. Lower panel (c) shows the variation of deviations (residual
= ESABV model predicted log Kpe–w – COSMOtherm log Kpe–w) as a function of log Kow.
For a set of 192 chemicals for which the experimental
values
of log KPE–water were not available, agreement
between the predictions of the ESABV model and those of COSMOtherm
was good (R2 = 0.93) but had a systematic
bias (slope = 0.77). Overall, comparison of COSMOtherm predictions
with those of the ESABV model resulted in rmse = 0.97 log unit (Figure b, Table S8). COSMOtherm systematically underpredicted the log KPE–water values with respect to predictions
of the ESABV model for hydrophobic chemicals. For most hydrophilic
chemicals, COSMOtherm overpredicted the values compared to the ESABV
model. This indicates that COSMOtherm predictions require an
adjustment factor to offset these systematic biases.
The deviations (residual = ESABV model predicted log Kpe–w – COSMOtherm log Kpe–w) from the two models were plotted against
log Kow to inspect if the deviations depend
on hydrophobicity of chemicals (Figure c). For hydrophilic compounds, deviations are positive,
implying that COSMOtherm underestimates the values as compared to
the ESABV model. These hydrophilic compounds have significant hydrogen
bonding interaction traits (Table S6).
On the other hand, COSMOtherm overestimated log Kpe–w values as compared to the ESABV model for
hydrophobic chemicals. A similar pattern was reported in the literature[69] for COSMOtherm calculations of PDMS–water
partition coefficients, where large deviations were found for compounds
that were either very hydrophobic or had significant hydrogen
bonding interactions.
From a practical standpoint, we recommend
that users prefer the ESABV
model (eq ) on account
of its better predictive performance compared to other models developed
in this study for estimation of log Kpe–w values. For this purpose, users can find the experimental values
of ASPs from the UFZ-LSER database.[48] Where
experimental values are not available, users may input the calculated
ASPs from the UFZ-LSER site,[48] as long
as the estimated values fall within the acceptable application domain.
Where reliable estimates of ASPs are not available, we recommend the
use of hexadecane–water LFER model (eq ) instead of octanol–water LFER model
(eq ). This is due
to better solvation proximity between hexadecane–water and
PE–water systems than is observed between octanol–water
and PE–water systems. For eq , users can input values of log Khexadecane–water that are either available in an experimental
database[57] or can be predicted quickly
using the estimation approaches listed elsewhere.[44,70] The COSMO-RS model is of value to users who have access to supercomputers
and commercial software such as TURBOMOLE and COSMOtherm to compute
log Kpe–w for the chemicals for
which reliable experimental or estimated solute descriptors are not
available for input into the above LFERs. However, the COSMO-RS model
requires corrections for the free volume and crystallinity of the
PE, which might not always be readily accessible. Finally, users are
advised to carefully evaluate the quality of input data to these models,
especially for compounds that are either very hydrophobic or have
substantial hydrogen bonding interactions. As discussed above,
these compounds might fall outside the application domain of these
calibrated models.
In summary, we successfully trained op- and
pp-LFER models using
datasets that are larger and structurally more diverse than those reported
in previous studies. These models were subjected to rigorous validation
tests to verify their predictive robustness. Overall, pp-LFERs performed
better than op-LFERs in describing the partitioning variability for
the PE–water system. The COSMOtherm model, which is based on
the COSMO-RS theory, predicted log Kpe–w values with reasonable accuracy for chemicals that are moderately
hydrophobic in nature. These models also provide insights about the
partitioning behavior of neutral organic chemicals during PE–water
exchange, which may help evaluate the utility of PE water passive
samplers for the contaminants of interest.