
The First Attempt at Non-Linear in Silico Prediction of Sampling Rates for Polar Organic Chemical Integrative Samplers (POCIS).

Thomas H Miller¹, Jose A Baz-Lomba², Christopher Harman³, Malcolm J Reid², Stewart F Owen⁴, Nicolas R Bury⁵, Kevin V Thomas², Leon P Barron¹.

Abstract

Modeling and prediction of polar organic chemical integrative sampler (POCIS) sampling rates (Rs) for 73 compounds using artificial neural networks (ANNs) is presented for the first time. Two models were constructed: the first was developed ab initio using a genetic algorithm (GSD-model) to shortlist 24 descriptors covering constitutional, topological, geometrical and physicochemical properties, and the second was adapted for Rs prediction from a previous chromatographic retention model (RTD-model). Mechanistic evaluation of descriptors showed that models did not require comprehensive a priori information to predict Rs. Average prediction errors for the verification and blind test sets were 0.03 ± 0.02 L d⁻¹ (RTD-model) and 0.03 ± 0.03 L d⁻¹ (GSD-model) relative to experimentally determined Rs. Prediction variability in replicated models was the same as or less than that for measured Rs. Networks were externally validated using a measured Rs data set of six benzodiazepines. The RTD-model outperformed the GSD-model for these compounds (average absolute errors of 0.0145 ± 0.008 L d⁻¹ and 0.0437 ± 0.02 L d⁻¹, respectively). Improvements to the generalizability of modeling approaches will rely on standardized guidelines for Rs measurement. The use of in silico tools for Rs determination represents a more economical approach than laboratory calibrations.

Year: 2016 PMID: 27363449 PMCID: PMC5089532 DOI: 10.1021/acs.est.6b01407

Source DB: PubMed Journal: Environ Sci Technol ISSN: 0013-936X Impact factor: 9.028

Introduction

Contamination of the aquatic environment with herbicides, pesticides, pharmaceuticals, and personal care products (PPCPs), among other contaminants, has been the focus of environmental monitoring campaigns over the last two decades. Reported concentrations and associated adverse effects of these contaminants have led to the introduction of legislative procedures to monitor and assess risk associated with pollutants, such as the EU water framework directive and the EU registration, evaluation, authorization, and restriction of chemicals (REACH).[1,2] High frequency sampling campaigns often involve the use of grab or composite sampling, but are practically difficult and costly to manage for monitoring longer-term fluctuations in contaminant concentrations in the aquatic environment. These methods are also often labor intensive with respect to sampling and can lead to considerable cost during instrumental analysis. More recently, however, the development and use of passive sampling devices (PSDs) have increased due to their capability for a time-integrated approach to averaging contaminant concentrations in surface waters as well as influent and effluent wastewater over extended periods.[3] PSDs minimize sample preparation and allow in situ enrichment of analytes, which may reduce limits of quantification in comparison to those achieved by point sampling.[4] Passive sampling devices are well established in some fields, such as the use of semipermeable membrane devices (SPMDs) for organochlorines[5] and other similarly hydrophobic compounds.[6−8] However, one type of PSD that is currently emerging is the polar organic chemical integrative sampler (POCIS). 
These samplers have been used to determine the occurrence of a range of chemically diverse and comparatively polar to moderately nonpolar compounds.[9−13] However, for quantitative studies, POCIS suffer from some limitations, mainly relating to the reliability of sampling rates (Rs) estimated from experimental measurements, as well as the lack of a well-developed performance reference compound (PRC) exposure correction method.[14−16] One further hindrance is that reported sampling rate data are few and methods for their estimation vary, which limits transferability to other locations or studies.[17] Given the time-intensive nature of determining Rs experimentally, computational modeling approaches could offer a solution that enables prediction of sampling rate data for compounds without the need for experimental determination. A previous investigation by Stephens et al.[18] evaluated the use of an empirical method (Sherwood correlation) to determine PSD kinetic parameters for a limited number of compounds, showing maximum errors of +40 and −20% for the estimation of the aqueous boundary layer mass transfer coefficient (kf). In contrast to empirical methods for estimating specific parameters, quantitative structure–property relationship (QSPR) models are becoming more frequently used in ecotoxicology, where a set of x variables is used to predict a response, y.[19] The variables are often molecular descriptors that cover constitutional, topological, geometrical, and physicochemical properties, which can then be used to model a desired output. Models can vary from simple linear regression approaches to complex nonlinear functions, where such models are often designed by machine learning methods. 
Two well-known machine learning methods are support vector machines (SVMs) and artificial neural networks (ANNs), which have been used successfully in related areas such as the prediction of bioconcentration factors (BCFs), octanol–water partition coefficients (logP) and biosolid/water partition coefficients (Kd), as well as for suspect compound screening via prediction of chromatographic retention time.[20−27] The use of SVMs for environmental applications is still in its infancy and substantial programming capability is required for routine application. On the other hand, ANNs are well-known and more user-friendly software has been available for many years. ANNs comprise a layered structure (normally three layers), with each layer serving a different purpose. The input layer contains the molecular descriptor data for each compound for training, verification and blind testing, and the output layer is the response. The hidden layer sits in between and contains several nodes, and often multiple sublayers of such nodes, where linear or nonlinear functions are used to relate the descriptors to the output layer. The residual errors are monitored and reduced by using iterative algorithms which adjust weights associated with the nodes in the hidden layer. Thus, such modeling approaches could greatly increase the applicability of POCIS in environmental monitoring studies by bypassing the need for laboratory and in situ calibrations. The aim of this work was to investigate the potential of ANNs to model and predict Rs for POCIS devices for a range of pharmaceuticals, endocrine disrupting chemicals, pesticides, herbicides and drugs of abuse. The objectives were to identify suitable analyte molecular descriptors; to build, train and test a range of suitable model types and architectures; and finally to externally validate the approach for predicting Rs for several compounds which were, for comparison, determined in parallel by laboratory calibration. 
To the authors’ knowledge, this represents the first study to draw together, harmonize and predict the published Rs data for ionizable pharmaceutical compounds on POCIS. Ultimately, where such tools can provide adequate predictions using new data generated in the future, this approach could reduce the analytical burden of laboratory estimations of Rs.
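The layered structure described above can be illustrated with a minimal forward pass through a four-layer perceptron. This is only a sketch: the 16–14–9–1 layer sizes follow the RTD-model architecture reported later in this work, but the tanh hidden activations, linear output node, random weights and random descriptor values are assumptions made for illustration, and the names `init_mlp` and `forward` are hypothetical rather than part of the software used here.

```python
import numpy as np

def init_mlp(layer_sizes, seed=0):
    """Create small random weights and zero biases for each pair of adjacent layers."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

def forward(params, x):
    """Propagate descriptor rows through the hidden layers to a single output."""
    h = x
    for w, b in params[:-1]:
        h = np.tanh(h @ w + b)      # nonlinear hidden nodes (assumed tanh)
    w, b = params[-1]
    return (h @ w + b).ravel()      # linear output node, one value per compound

# Hypothetical input: 5 compounds x 16 standardized descriptors.
params = init_mlp([16, 14, 9, 1])
descriptors = np.random.default_rng(1).normal(size=(5, 16))
predicted_rs = forward(params, descriptors)
print(predicted_rs.shape)  # (5,)
```

Training would then adjust the entries of `params` iteratively to reduce the residual error between the output and the measured Rs.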

Materials and Methods

Selection of Data Sets, Molecular Descriptors and ANN Models

A working data set derived from the literature (2007-present) was used to build, train and optimize models for Rs prediction on POCIS. A total of n = 73 compound Rs data were derived from Fauvelle et al.[28] and Morin et al.,[24] which were generated using similar experimental conditions to give the largest combined data set of all studies. Compounds included herbicides, pesticides, endocrine disrupting compounds and pharmaceuticals. Where duplicate compound Rs data existed, both values were removed entirely from the data set (six compounds). Generally, in these cases Rs differed and it was uncertain which value was correct or whether an average was appropriate for modeling. Simplified molecular input line entry system (SMILES) strings were generated from Chemspider (Royal Society of Chemistry, UK). Using these, n = 185 molecular descriptors were generated from Parameter Client freeware (Virtual Computational Chemistry Laboratory, Munich, Germany) and an additional n = 16 descriptors were from ACD laboratories Percepta software (Advanced Chemistry Development Laboratories, ON, Canada). Two models were generated using two separate sets of descriptors covering constitutional, topological, geometrical and physicochemical properties that were investigated for their comparative prediction performance. The first subset of 24 descriptors (see Supporting Information (SI), Table S1) was generated using a genetic feature selection algorithm to produce the genetically selected descriptor model (GSD-model). Genetic feature selection algorithms follow evolutionary concepts to convert input descriptors into binary strings, in this case to prioritise descriptors for Rs prediction. Using a process similar to natural selection, prioritised strings are crossed to form a new population of strings. The generational “breeding” of strings produced an optimized selection of input variables for application to prediction. 
The parameters for the GA were as follows: population = 100, generations = 100, mutation rate = 0.1 and crossover rate = 1. In an alternative approach, a much simpler descriptor data set previously used to model elution from reversed-phase liquid chromatography (RPLC) stationary phases was investigated to assess any improvement (see SI Table S2).[23,25] This model is referred to as the retention time descriptor model (RTD-model). POCIS devices contain a divinylbenzene and N-vinylpyrrolidone copolymer, which enables dual polar and nonpolar interactions for retention. As retention on reversed-phase chromatographic columns is also governed predominantly by hydrophobic interactions, it is possible that these same descriptors will also be important in passive sampling. No retention data were available for the studies by Fauvelle et al.[28] and Morin et al.[24,29] However, correlation between Rs and 21 corresponding retention times (tR) gathered on a C18 stationary phase in a study by Bade et al.[30] showed a weak relationship (R = 0.472). For both descriptor subsets, several network types were tested for predictive ability using Trajan 6.0 neural network software (Trajan Software Ltd., Lincolnshire, UK) and these included radial basis function (RBF) networks, generalized regression neural networks (GRNNs) and multilayer perceptrons (MLPs). Following training and optimization using both data sets, the GSD- and RTD-models were produced. The GSD-model architecture was a four-layer MLP with 24 descriptors in the input layer (independent variables), two hidden layers containing 17 and 14 nodes, and the dependent variable output layer (Rs). Training involved two types of algorithms: the first was back-propagation (BP) and the second was conjugate gradient descent (CGD). The data set was split into 45:14:14 cases for the training, verification and test subsets (optimized). The RTD-model architecture was also a four-layer MLP using both BP and CGD. 
The first and fourth layers were the inputs (using the set of descriptors previously used for chromatographic retention modeling) and outputs (Rs), respectively, and the second and third layers (hidden layers) contained 14 and 9 nodes, respectively. The division of cases included 51 compounds for training, 11 compounds for verification and 11 compounds for blind testing (optimized).[27] All cases were randomly selected to avoid bias. The verification data set was used to characterize network predictive performance during training and also to allow regularisation to prevent overfitting. The test set was then used to validate the model to ensure that the model generalized well to new cases. The optimized models were selected based on the lowest errors and consistency across the training, verification and test subsets.
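The genetic feature selection described in this section can be sketched as follows. The default population, generation, mutation and crossover values are those stated above. Everything else is an assumption made for illustration and is not a description of the software used here: the fitness function (verification-set error of a linear least-squares fit as a stand-in for network error), tournament selection, single-point crossover, per-bit mutation, elitism and the synthetic data. The function names are hypothetical.

```python
import numpy as np

def fitness(mask, X_tr, y_tr, X_va, y_va):
    """Score a binary descriptor string: negative verification-set SSE of a
    linear least-squares fit (an assumed stand-in for the ANN error)."""
    if not mask.any():
        return -np.inf
    A_tr = np.column_stack([X_tr[:, mask], np.ones(len(X_tr))])
    A_va = np.column_stack([X_va[:, mask], np.ones(len(X_va))])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return -float(np.sum((A_va @ coef - y_va) ** 2))

def ga_select(X_tr, y_tr, X_va, y_va, population=100, generations=100,
              mutation_rate=0.1, crossover_rate=1.0, seed=0):
    rng = np.random.default_rng(seed)
    n_desc = X_tr.shape[1]
    pop = rng.random((population, n_desc)) < 0.5          # random binary strings
    best_mask, best_score = pop[0].copy(), -np.inf
    for _ in range(generations):
        scores = np.array([fitness(m, X_tr, y_tr, X_va, y_va) for m in pop])
        top = int(np.argmax(scores))
        if scores[top] > best_score:
            best_mask, best_score = pop[top].copy(), scores[top]
        # Binary tournament selection of parents.
        a, b = rng.integers(population, size=(2, population))
        parents = pop[np.where(scores[a] >= scores[b], a, b)]
        children = parents.copy()
        # Single-point crossover between consecutive parents.
        for i in range(0, population - 1, 2):
            if rng.random() < crossover_rate:
                cut = rng.integers(1, n_desc)
                children[i, cut:] = parents[i + 1, cut:]
                children[i + 1, cut:] = parents[i, cut:]
        # Bit-flip mutation, then keep the best string found so far (elitism).
        children ^= rng.random(children.shape) < mutation_rate
        children[0] = best_mask
        pop = children
    return best_mask, best_score

# Synthetic example: 45 training + 14 verification cases, 12 candidate descriptors.
rng = np.random.default_rng(42)
X = rng.normal(size=(59, 12))
y = 0.2 + 0.05 * X[:, 0] - 0.03 * X[:, 3] + rng.normal(0, 0.01, 59)
mask, score = ga_select(X[:45], y[:45], X[45:], y[45:], population=40, generations=30)
print(np.flatnonzero(mask))   # indices of the shortlisted descriptors
```

The demonstration call uses a smaller population and fewer generations than the defaults only to keep the run time short.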

Laboratory Calibration of Sampling Rates to Test Model Generalizability

Sampling rates (L d⁻¹) were determined using a static renewal method over a 14 day exposure period, in a similar manner to the literature data used for modeling here.[28] Briefly, 3 L high-density polyethylene vessels were filled with ultrapure water, the pH adjusted to 7.6 with 20 mg L⁻¹ NaHCO₃ and spiked with the mixture of respective compounds to expose the POCIS. Each vessel contained three POCIS devices for exposure to an aqueous-based standard mixture of 200 ng L⁻¹ of each target compound (solvent <0.001%). This standard solution was prepared and replaced daily in 3 L volumetric flasks to maintain the nominal concentration. Following this, all three POCIS were removed from each vessel at days 4, 7, and 14, rinsed with ultrapure water and frozen at −20 °C. Extraction of POCIS sorbents was performed using a wash phase of 5 mL of ultrapure water and then elution using 5 mL of MeOH. The eluate was dried under nitrogen at 35 °C for 40 min. The dried residue was then reconstituted in 0.5 mL of starting mobile phase. The analysis of the benzodiazepines was performed on an Acquity UPLC system coupled to a Xevo G2 S QTOF mass analyzer (Milford, MA) with an online Oasis HLB Direct Connect HP loading column. Analyte separation was performed on an Acquity UPLC BEH C18 column (1.7 μm, 50 × 2.1 mm) from Waters (Milford, MA) at 50 °C. Gradient elution (0.6 mL min⁻¹) for analyte separation was with 0.1% (v/v) formic acid in water (phase A) and 0.1% formic acid in methanol (phase B). Full method details for the laboratory calibration experiments and analysis are given in the SI.
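As a hedged illustration of how a sampling rate can be derived from such a calibration, the sketch below assumes the standard linear (integrative) uptake relation, Ms = Rs · Cw · t, so that Rs is the slope of accumulated mass against time divided by the water concentration. The exact calculation used here is given in the SI and may differ. The exposure days, triplicate design and 200 ng L⁻¹ concentration follow the text; the accumulated masses and the name `sampling_rate` are hypothetical.

```python
import numpy as np

def sampling_rate(days, mass_ng, conc_ng_per_l):
    """Estimate Rs (L/d) from a zero-intercept least-squares slope of
    accumulated mass (ng) versus exposure time (d), divided by Cw (ng/L)."""
    days = np.asarray(days, dtype=float)
    mass_ng = np.asarray(mass_ng, dtype=float)
    slope = np.sum(days * mass_ng) / np.sum(days ** 2)   # ng per day
    return slope / conc_ng_per_l

# Three POCIS retrieved at each of days 4, 7 and 14 (as in the text).
days = np.repeat([4, 7, 14], 3)
true_rs = 0.2                       # hypothetical Rs, L/d
mass = true_rs * 200.0 * days       # noise-free hypothetical masses, ng
rs = sampling_rate(days, mass, 200.0)
print(round(rs, 3))  # 0.2
```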

Results and Discussion

Rs Prediction Using a GSD-Model

Following genetic feature selection, a 24–17–14–1 MLP yielded the best performance using 24 input descriptors, with R² = 0.8800, 0.8694, and 0.8050 for training, verification and blind test sets, respectively (sums of squared residual errors were 0.084, 0.062, and 0.116, respectively). Therefore, this model initially seemed quite promising for application to prediction of Rs for new compounds (Figure 1a). Many shortlisted descriptors were derived from topological indices, but some others were expected to have more importance for this application, such as those that describe molecular hydrophobicity. These include the octanol–water partition coefficient (logP) and the distribution ratio between octanol and water (logDow). The latter takes into account the ionized proportion of a compound at a particular pH and is dependent on the logP and the pKa of all ionizable functional groups in a molecule. An investigation by Booij et al.[31] demonstrated that uptake rates in SPMDs correlated well with logP, where Rs ≈ P^(−0.044). Correlations between logP and Rs have also been observed for POCIS devices.[32−34] Assessment of the collinearity with Rs (SI Table S7) showed, rather unsurprisingly for so many ionizable compounds, that logDow had by far the highest correlation (R = 0.59), but was insufficient by itself to describe sorption to POCIS sorbents. To the authors’ knowledge, no previous investigations have used logDow to model Rs, although it has been weakly correlated with Rs.[35] Furthermore, collinearity also existed between input descriptors, especially for constitutional descriptors such as the number of non-H bonds (nBO) and the sum of conventional bond orders (SCBO), as well as topological descriptors such as the log Narumi simple topological index (Snar) and the second Zagreb index (ZM2). Pearson’s coefficients were ≥0.8 for these descriptors with at least eight other descriptors. 
Therefore, though genetic algorithms shortlisted useful descriptors for potential Rs modeling here, back-interpretation of model sensitivity to descriptor data for derivation of mechanistic understanding of physicochemical POCIS uptake mechanisms would be limited. However, as a tool to predict Rs, the training set overall displayed good accuracy within 22% of the measured value on average. In comparison, the verification and test subsets were predicted on average within 19% of their measured values showing consistency across all subsets (SI, Figure S1). For particular blind test cases, however, some notably large inaccuracies were observed such as for sotalol (80% inaccuracy) where the lower hydrophobicity of this molecule may explain poorer correlation with Rs.[31] Larger errors were also recorded for acetochlor ethanesulfonic acid (40%), diclofenac (39%) and sulcotrione (38%). The verification subset contained two largely inaccurate predictions (2,4-dichlorophenoxyacetic acid at 59% and timolol at 31%), but all remaining compounds were within 20% of measured Rs. Larger inaccuracies may be related to poor learning from selected training data. For example, mesotrione had a 59% inaccuracy to the measured value in the training set which may explain the poor prediction of another structurally similar compound, sulcotrione, in the test set. Overall, inaccuracy was most prevalent for sulfonate-containing compounds where genetic selection did not sufficiently prioritise descriptors for this portion of cases for reliable Rs prediction. As the number of available cases expands, genetic selection of descriptors may improve for such compounds in the future. It is also unclear whether sulfonate bearing molecules are subject to steric and/or repulsive forces arising from the PES membrane. Furthermore, larger inaccuracies (>30%) in the full data set generally corresponded to compounds with Rs < 0.1 such as 2,4-dichlorophenoxyacetic acid, sotalol, sulcotrione and nicosulfuron. 
However, when predictive accuracy was plotted against Rs for all compounds, no correlation was observed for other compounds with Rs < 0.1 (SI, Figures S2 and S3).
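The collinearity assessment between input descriptors discussed above can be sketched as a count, for each descriptor, of how many other descriptors it correlates with at |r| ≥ 0.8. The threshold is taken from the text. The function name `collinear_counts` and the toy descriptor matrix are hypothetical, and constant-valued descriptors would need to be removed first because their correlation is undefined.

```python
import numpy as np

def collinear_counts(X, threshold=0.8):
    """For each descriptor (column), count the other descriptors whose
    absolute Pearson correlation with it is at or above the threshold."""
    r = np.corrcoef(X, rowvar=False)
    np.fill_diagonal(r, 0.0)              # ignore self-correlation
    return (np.abs(r) >= threshold).sum(axis=1)

# Toy matrix: column 1 is a linear function of column 0; column 2 is unrelated.
a = np.arange(10, dtype=float)
X = np.column_stack([a, 2.0 * a + 1.0, np.tile([1.0, -1.0], 5)])
counts = collinear_counts(X)
print(counts.tolist())  # [1, 1, 0]
```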

Figure 1

Measured Rs against predicted Rs for (a) the GSD-model and (b) the RTD-model. Crosses, circles and triangles are the training, verification and test subsets, respectively. Open circles and triangles indicate predicted inaccuracies of >30% of the measured value.

Rs Prediction Using a RTD-Model

The correlation of predicted versus measured Rs for the RTD-model is shown in Figure 1b. The sums of squared errors for the subsets were 0.092, 0.062, and 0.121 for the training, verification and test sets, respectively. The model was, again, a four-layered MLP with a 16–14–9–1 architecture. Generally, acceptable correlations were achieved for the training, verification and blind test sets (R² = 0.8511, 0.9085, and 0.6425, respectively), though this model performed slightly worse (training and test) than the GSD-model. The training subset showed several larger errors which corresponded to the compounds t-butylphenol (149%), 2,4-dichlorophenol (41%), and simazine (41%). The compound sulfamethoxazole showed an 81% overestimation of its experimentally determined Rs. As discussed earlier, this large inaccuracy was also reflected in the GSD-model, which showed an overestimation of 230% for sulfamethoxazole, which also bears a sulfonate group. Overall, however, the model showed relatively good predictions of Rs (mean absolute error for training set = 15%; and for both verification and test subsets = 22%). The average error ± standard deviation across the verification and blind test subsets was 0.03 ± 0.02 L d⁻¹, showing acceptable overall predictive accuracy for Rs. Atenolol, the compound with the lowest Rs, yielded poor prediction accuracy (predicted Rs = 0.067, measured Rs = 0.025), which was initially thought to be due to its higher polarity in comparison to others selected for this study. However, no correlation was observed between predictive accuracy and logDow (Figure 2). The average predictive mean error of the verification and blind test sets both reduced to ∼15% upon removal of the atenolol data point. Importantly, as very polar compounds are generally not retained well by N-vinylpyrrolidone-co-divinylbenzene-based polymer sorbents, inaccuracy in measured Rs may be compound specific as a result, which in turn may contribute to RTD-model prediction errors. 
This highlights the lack of consistent measurements available for training of such models for predictive purposes. Nonetheless, considering this performance alongside the potential for inaccuracy in Rs data from different laboratory calibrations, predictions using these models were considered reasonable.
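The error measures used in this section can be sketched as below. The percentage inaccuracy is taken as the absolute difference relative to the measured value, which is how the text describes it. R² is computed here as the squared Pearson correlation between predicted and measured Rs, which is an assumption about the definition used. The function names are hypothetical. The atenolol values are those quoted above.

```python
import numpy as np

def percent_inaccuracy(predicted, measured):
    """Absolute prediction error as a percentage of the measured value."""
    predicted = np.asarray(predicted, dtype=float)
    measured = np.asarray(measured, dtype=float)
    return 100.0 * np.abs(predicted - measured) / measured

def sum_squared_error(predicted, measured):
    """Sum of squared residuals between predicted and measured values."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(measured, dtype=float)
    return float(np.sum(diff ** 2))

def r_squared(predicted, measured):
    """Squared Pearson correlation (assumed definition of R2)."""
    return float(np.corrcoef(predicted, measured)[0, 1] ** 2)

# Atenolol: predicted Rs = 0.067, measured Rs = 0.025 (values from the text).
atenolol = float(percent_inaccuracy(0.067, 0.025))
print(round(atenolol, 1))  # 168.0
```

On this definition a single low-Rs compound such as atenolol can dominate a mean percentage error even when its absolute error in L d⁻¹ is small.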

Figure 2

RTD-model residual plot of predicted Rs values for the verification and test subset only, ordered in parentheses by their ascending distribution ratio values between octanol and water (logDow). Circles and triangles represent the verification and test subset, respectively. The measured Rs values are displayed in parentheses on the x-axis. 2,4-D (2,4-dichlorophenoxyacetic acid), ESA (ethanesulfonic acid), OA (oxanilic acid), and IPPMU (isoproturon-monodemethyl).

Model Interpretation and Descriptor Contribution to Rs Prediction

Given the level of multicollinearity observed for GSD-model descriptors, a sensitivity analysis could only be performed to identify the relative contribution of each descriptor to predictions in the RTD-model. This was represented as the error ratio, i.e. the ratio between the model error using all descriptors and the model error when one descriptor was removed. However, as in the GSD-model, the use of sensitivity analysis to further mechanistic understanding of sorption processes should be approached with caution if some individual descriptors display multicollinearity (see SI Tables S1–S3 for full descriptor details and data). The logDow, the Moriguchi octanol–water partition coefficient (MlogP), the Ghose-Crippen octanol–water partition coefficient (AlogP) and the number of benzene rings (nBnz) were the top four descriptors used by the RTD-model (Figure 3). This is in agreement with Bäuerlein et al., who showed that hydrophobicity and pi-pi interactions (e.g., via benzene rings) were important for adsorption to HLB sorbents in batch experiments,[36] and these properties can also affect diffusion. Other important descriptors were the number of triple bonds (nTB; error ratio = 1.2165), number of five-membered rings (nR05; error ratio = 1.2041) and number of nine-membered rings (nR09; error ratio = 1.4544). 
The importance of the n-membered ring descriptors could be attributed to molecular size and flexibility thus affecting the diffusivity of molecules through the water boundary layer (WBL), PES membrane (pore = 0.1 μm) or pores of the HLB copolymer (80 Å).[37−39] A previous investigation showed that size descriptors were also important for predicting soil sorption coefficients for pesticides.[39] We also previously showed these descriptors were important for ANN-based predictions of pharmaceutical sorption to soils and sludge.[40] In addition to those mentioned above, the number of carbons (nC), number of oxygens (nO) and hydrophilic factor (Hy) also showed that they were important to the RTD-model. Hy relates to the number of hydrophilic groups in the molecule such as hydroxyls, thiols and sulfonates. As polar surface area has been previously shown to influence interactions with HLB sorbents, it is logical that hydrophilicity/polarity related descriptors would have some importance.[36] Several authors have suggested that diffusion is the main factor governing uptake rates in PSDs.[41] We have attributed the importance of the descriptors mainly to sorbent interactions so far, but it is also possible that these same descriptors could relate to diffusion processes due to the number of molecular properties that will affect it including dipole moments, polarizability, molecular size (including hydration radius) and electrostatic charge.[42] The genetic feature selection algorithm did not select some recognized diffusion-related descriptors, such as molecular weight as a simple example. However, it did select other descriptors that showed interdependencies on factors affecting diffusion such as number of atoms, number of rotatable bonds, and electrotopological states. 
Rs has been attributed mainly in the past to diffusion processes in partition samplers such as silicone rubbers.[41] The portion of Rs governed by diffusion in adsorption samplers using HLB-type sorbents in POCIS remains unclear especially whether sorption of analytes via hydrogen bonding, dipole–dipole, dipole–induced dipole, van der Waals and pi-pi interactions plays a more significant role. It is also possible that the models presented here for Rs prediction could be developed and improved further with additional or alternative descriptors, such as diffusion coefficients. However, adding such descriptors may introduce a greater uncertainty into the model as estimates can be based on several different approaches.[43−45] In addition, diffusion coefficients will be affected by numerous environmental factors and hydrodynamic conditions that would be difficult to replicate or control in situ. Inclusion of larger numbers of descriptors to cover all the processes involved will likely inhibit model generalizability. Indeed, ANNs learn more holistically, making predictions possible without the need for such comprehensive a priori information. However, such a holistic approach obviously limits deeper understanding of the precise contribution of individual mechanisms involved in POCIS.
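The error-ratio sensitivity analysis used in this section can be sketched as follows. Two assumptions are made for illustration. First, the ratio is taken as the error with one descriptor made unavailable divided by the error with all descriptors, so that values above 1 (as reported above) indicate an important descriptor. Second, a descriptor is "removed" by replacing its column with the column mean rather than by retraining the network. The function `error_ratios`, the toy linear model and the synthetic data are hypothetical.

```python
import numpy as np

def error_ratios(predict, X, y):
    """Ratio of the sum of squared errors with one descriptor masked
    (set to its mean) to the baseline error with all descriptors."""
    base = np.sum((predict(X) - y) ** 2)
    ratios = []
    for j in range(X.shape[1]):
        X_masked = X.copy()
        X_masked[:, j] = X[:, j].mean()   # descriptor j carries no information
        ratios.append(np.sum((predict(X_masked) - y) ** 2) / base)
    return np.array(ratios)

# Toy "trained model" that relies strongly on column 0, weakly on column 1,
# and not at all on column 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(73, 3))
model = lambda d: 2.0 * d[:, 0] + 0.5 * d[:, 1]
y = model(X) + rng.normal(0, 0.1, 73)
ratios = error_ratios(model, X, y)
print(np.argsort(ratios)[::-1])   # descriptors ranked from most to least important
```

A descriptor the model does not use yields a ratio of exactly 1, and collinear descriptors can mask each other's contribution, which is why such rankings need cautious interpretation.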

Figure 3

Sensitivity analysis of the optimized RTD-model. Acronyms: nDB/nTB = number of double/triple bonds; nC/nO = number of carbon/oxygen atoms; nR04–nR09 = number of 4–9 membered rings; Ui = unsaturation index; Hy = hydrophilic factor; nBnz = number of benzene-like rings; MlogP/AlogP = Moriguchi/Ghose-Crippen logarithm of octanol–water partition coefficient; logD7.6 = logarithm of distribution ratio between octanol and water at pH 7.6.

By comparison, the GSD-model featured many more topological and geometrical descriptors than in the RTD-model. These descriptors showed multicollinearity and therefore the sensitivity analysis could not be performed reliably (Table S7). Simply adding noncollinear descriptors to the RTD-model is also disadvantageous at this point. As the number of descriptors increases, overfitting of data is more likely to occur and would require significantly more case examples for valid application.[46] Model complexity will also limit the ability of the network to generalize when predicting unknown compounds therefore a smaller number of descriptors (and nodes in the hidden layer(s)) is ultimately more beneficial.

Reproducibility of Predicted and Experimentally Determined Rs

Model performance and generalizability are limited by the quality of input data. Measured Rs can differ considerably even within calibration studies performed in the same laboratory. The largest variance in measured Rs used corresponded to diclofop, which had a 60% relative standard deviation (RSD), n = 15.[28] Many of the reported Rs values vary by more than 2-fold depending on the methodology used for their estimation.[29] Although pH and temperature during collection of both sets of data used herein were similar, the type of calibration experiment applied was slightly different (flow-through and static renewal); therefore, the resulting differences in the Rs estimates from each investigation could have affected the performance of a model. For the six compounds common to both calibration methods that were removed from the original data set used for model optimization, the absolute difference in Rs was 0.088 ± 0.072 L d⁻¹ between measurements. The average % RSD of measured Rs data used herein was 11% (mean deviation was ±0.017 L d⁻¹) (Figure 4). In 45% of all cases, the % RSDs of predicted Rs across triplicate networks trained ab initio were lower than the % RSDs of the measured data. For several specific cases, such as DET, DIA, diclofop and ioxynil, the experimental variation was relatively large when compared to the variation in predicted Rs. Such deviation in experimentally derived sampling rates can be attributed to factors similar to those already discussed above (e.g., temperature, pH, flow rate etc.). Figure 4 shows that for cases which had poor predictive accuracy with respect to the mean true value, such as for acetochlor ESA, the standard deviation of the predicted Rs overlapped with the reported experimental variance. 
A review by Harman et al., suggests that literature reported Rs data should only be considered as an approximation.[47] However, in the absence of a standardized method for POCIS calibration, either in the laboratory or in the field, it would seem that Rs modeling in this way offers similar accuracy and precision without being labor or resource intensive. Calibration experiments for each compound can take several weeks, requiring a large mass of reference material for static renewal and flow through experiments, or very frequent and accurate water sampling for in situ experiments. Furthermore, given that models developed herein are derived from a very limited number of training cases, any new reported Rs data generated by similar methods to those used herein will likely enable better generalizability in the future, as was observed with retention time predictions in reversed-phase liquid chromatography.[27]

Figure 4

Comparison of the measured and predicted Rs values and their respective variances against the variance in predicted Rs (n = 73) from replicate RTD-models (n = 3). Inset: Optimized 16–14–9–1 model architecture. Compounds in bold represent the verification and blind test cases. All others were used for model training.

External Application to Rs Prediction

To further support the application of the optimized modeling approach, Rs data for several additional benzodiazepines were experimentally determined in our laboratory using a similar approach. In the previous sections, blind test compounds were structurally diverse, which is logical for testing model accuracy.[48] However, for this experiment, structural similarity was deliberately chosen to externally test the models' discriminative power. Despite this similarity, it was expected that measured Rs could be different on POCIS given their slight differences in chromatographic retention on C18 phases. The retention order of the benzodiazepines was as follows: oxazepam (3.26 min), nitrazepam (3.26 min), clonazepam (3.29 min), lorazepam (3.29 min), alprazolam (3.31 min), midazolam (3.32 min), flunitrazepam (3.37 min) and diazepam (3.58 min). As discussed previously, measurement of Rs often suffers from some imprecision. The calibration experiment performed here was not exempt from this either. Two compounds, lorazepam (Rs: 0.205 L d⁻¹) and oxazepam (Rs: 0.226 L d⁻¹), were originally present in the training set and verification set, respectively, during model development. The Rs values for these compounds were experimentally determined again here to characterize the variance between the calibration method used here and the method by Morin et al.[24] The Rs determined here varied by approximately 0.1 L d⁻¹ for both compounds (lorazepam: 0.302 L d⁻¹ and oxazepam: 0.327 L d⁻¹). This observation showed again that the difference in calibrations between flow-through and static renewals is not negligible and was an unavoidable limitation of the calibration experiment used here. Standard deviations for the six compounds ranged from ±0.024 to ±0.055 L d⁻¹ (n = 9). 
Overall, the average RSD for all compounds was 20 ± 6% (flunitrazepam: 19%; clonazepam: 13%; nitrazepam: 13%; midazolam: 23%; diazepam: 23%; and alprazolam: 29%) and this variance was consistent with other studies.[29] As shown in Figure 5, both the GSD- and RTD-models predicted Rs well relative to the measured value for all six compounds. The two largest errors in the RTD-model corresponded to those substances with the highest Rs variance (diazepam and alprazolam at 16 and 17%, respectively), but the four remaining compounds showed little inaccuracy (≤5%). In terms of absolute inaccuracy of the measured Rs, however, examination of the RTD-model residual errors showed that predictions were slightly overestimated for all compounds except nitrazepam. The GSD-model performed worse by comparison (Figure 5). The two largest errors corresponded to nitrazepam and midazolam, which were 37% and 43% inaccurate, respectively. The remaining compound inaccuracies were alprazolam (28%), clonazepam (18%), diazepam (10%) and flunitrazepam (19%). The average absolute error for the GSD-model predictions was 0.0437 L d⁻¹ and all compounds were predicted within ±0.075 L d⁻¹. By contrast, the RTD-model had an average absolute error of 0.0145 L d⁻¹ for these benzodiazepines (and Rs for all compounds were predicted within 0.03 L d⁻¹). These predictions again demonstrated that predicted Rs were similar enough to those determined experimentally to be practical.

Figure 5

Residual plot of the predicted Rs values for the GSD-model (cross) and RTD-model (diamond) for external prediction validation (as a blind test application) using six additional benzodiazepines. Measured Rs values are displayed in parentheses.

Passive sampling for nonhydrophobic compounds is mainly used for screening purposes and as a semiquantitative technique. Furthermore, in situ exposures are difficult to quantify accurately, as laboratory calibrations may not translate well into field Rs owing to factors such as biofouling and other matrix- or environment-related effects on diffusion. In addition, the performance reference compound approach to reliable quantification has limited availability and application for polar passive sampling because of the strong retention of analytes on HLB sorbents.[49] However, modeling approaches could potentially overcome these limitations if models were built from in situ calibration data. It is also possible that estimation of Rs by in silico approaches may offer a viable alternative for compounds where Rs cannot be estimated from field studies because of poor correlation between concentrations in water and the sampled mass on the PSD.[35] Lastly, both approaches to molecular descriptor selection presented here showed acceptable predictive accuracy for polar compound passive sampling. However, the use of descriptors derived for tR prediction in a model for Rs prediction holds significant potential for application to new compounds based solely on their SMILES strings, as it simultaneously allows preliminary identification (by tR and high-resolution m/z, for example) and estimation of Rs using the same descriptors.
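To illustrate why the size of an Rs prediction error matters for quantification, the sketch below uses the standard integrative-sampler relationship C_w = M_s / (R_s × t), which assumes linear uptake over the deployment. This relationship and the function names are assumptions introduced for illustration and are not taken from this section. The 5% and 17% inputs are the RTD-model inaccuracies quoted above, treated here as overestimates of Rs.

```python
def twa_concentration(mass_ng: float, rs_l_per_day: float, days: float) -> float:
    """Time-weighted average water concentration (ng/L), assuming linear uptake.

    Implements C_w = M_s / (R_s * t), where M_s is the analyte mass on the
    sorbent, R_s the sampling rate and t the deployment time.
    """
    if rs_l_per_day <= 0 or days <= 0:
        raise ValueError("sampling rate and deployment time must be positive")
    return mass_ng / (rs_l_per_day * days)


def concentration_bias_percent(rs_error_percent: float) -> float:
    """Percent bias in C_w caused by a signed percent error in Rs.

    A positive rs_error_percent means Rs was overestimated, which lowers the
    calculated concentration because Rs sits in the denominator.
    """
    ratio = 1.0 + rs_error_percent / 100.0
    if ratio <= 0:
        raise ValueError("Rs error must be greater than -100%")
    return 100.0 * (1.0 / ratio - 1.0)


# RTD-model inaccuracies quoted in the text: <=5% for four compounds and
# 17% for alprazolam, treated here as overestimates of Rs.
for rs_error in (5.0, 17.0):
    bias = concentration_bias_percent(rs_error)
    print(f"Rs overestimated by {rs_error:.0f}% -> C_w biased by {bias:.1f}%")
```

Because Rs enters the denominator, an overestimated Rs gives an underestimated water concentration, and the bias is slightly smaller in magnitude than the Rs error itself.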

35 in total

1. The problem of overfitting.

Authors: Douglas M Hawkins
Journal: J Chem Inf Comput Sci Date: 2004 Jan-Feb

2. Sorption behavior of charged and neutral polar organic compounds on solid phase extraction materials: which functional group governs sorption?

Authors: Patrick S Bäuerlein; Jodie E Mansell; Thomas L Ter Laak; Pim de Voogt
Journal: Environ Sci Technol Date: 2012-01-04 Impact factor: 9.028

3. The challenge of exposure correction for polar passive samplers--the PRC and the POCIS.

Authors: Christopher Harman; Ian John Allan; Patrick Steven Bäuerlein
Journal: Environ Sci Technol Date: 2011-10-06 Impact factor: 9.028

4. Modelling and field application of the Chemcatcher passive sampler calibration data for the monitoring of hydrophobic organic pollutants in water.

Authors: Branislav Vrana; Graham A Mills; Michiel Kotterman; Pim Leonards; Kees Booij; Richard Greenwood
Journal: Environ Pollut Date: 2006-08-21 Impact factor: 8.071

5. Prediction of chromatographic retention time in high-resolution anti-doping screening data using artificial neural networks.

Authors: Thomas H Miller; Alessandro Musenga; David A Cowan; Leon P Barron
Journal: Anal Chem Date: 2013-10-04 Impact factor: 6.986

6. In situ calibration of a passive sampling device for selected illicit drugs and their metabolites in wastewater, and subsequent year-long assessment of community drug usage.

Authors: Christopher Harman; Malcolm Reid; Kevin V Thomas
Journal: Environ Sci Technol Date: 2011-06-07 Impact factor: 9.028

7. Use of mixed-mode ion exchange sorbent for the passive sampling of organic acids by polar organic chemical integrative sampler (POCIS).

Authors: Vincent Fauvelle; Nicolas Mazzella; François Delmas; Karine Madarassou; Mélissa Eon; Hélène Budzinski
Journal: Environ Sci Technol Date: 2012-12-06 Impact factor: 9.028

8. Calibration and field evaluation of Polar Organic Chemical Integrative Sampler (POCIS) for monitoring pharmaceuticals in hospital wastewater.

Authors: Emilie Bailly; Yves Levi; Sara Karolak
Journal: Environ Pollut Date: 2012-12-17 Impact factor: 8.071

9. POCIS passive samplers as a monitoring tool for pharmaceutical residues and their transformation products in marine environment.

Authors: M J Martínez Bueno; S Herrera; D Munaron; C Boillot; H Fenet; S Chiron; E Gómez
Journal: Environ Sci Pollut Res Int Date: 2014-11-11 Impact factor: 4.223

10. Polymer selection for passive sampling: a comparison of critical properties.

Authors: Tatsiana P Rusina; Foppe Smedes; Jana Klanova; Kees Booij; Ivan Holoubek
Journal: Chemosphere Date: 2007-02-28 Impact factor: 7.086
