Literature DB >> 35420030

Applicability Domain of Polyparameter Linear Free Energy Relationship Models Evaluated by Leverage and Prediction Interval Calculation.

Abstract

Polyparameter linear free energy relationships (PP-LFERs) are accurate and robust models employed to predict equilibrium partition coefficients (K) of organic chemicals. The accuracy of predictions by a PP-LFER depends on the composition of the respective calibration data set. Generally, extrapolation outside the domain defined by the calibration data is likely to be less accurate than interpolation. In this study, the applicability domain (AD) of PP-LFERs was systematically evaluated by calculating the leverage (h) and prediction interval (PI). Repeated simulations with experimental data showed that the root mean squared error of predictions increased with h. However, the analysis also showed that PP-LFERs calibrated with a large number (e.g., 100) of training data were highly robust against extrapolation error. For such PP-LFERs, the common definition of extrapolation (h > 3 hmean, where hmean is the mean h of all training compounds) may be excessively strict. Alternatively, the PI is proposed as a metric to define the AD of PP-LFERs, as it provides a concrete estimate of the error range that agrees well with the observed errors, even for extreme extrapolations. Additionally, published PP-LFERs were evaluated in terms of their AD using the new concept of AD probes, which indicated the varying predictive performance of PP-LFERs in the existing literature for environmentally relevant compounds.

Entities: Chemical

Keywords: QSAR; QSPR; applicability domain; extrapolation; linear solvation energy relationship; partition coefficient; perfluoroalkyl substances; property prediction

Mesh：

Substances：
Organic Chemicals
Water

Year: 2022 PMID： 35420030 PMCID： PMC9069697 DOI： 10.1021/acs.est.2c00865

Source DB: PubMed Journal: Environ Sci Technol ISSN： 0013-936X Impact factor: 9.028

Introduction

Equilibrium partition coefficients largely determine the environmental distribution of organic contaminants and are crucial parameters for environmental risk assessments. Among various models, the linear solvation energy relationships (LSERs)[1] or, generally, polyparameter linear free energy relationships (PP-LFERs) that use Abraham’s solute descriptors have been confirmed to be accurate and robust for predicting partition coefficients.[2] The PP-LFERs cover the intermolecular interactions relevant to the phase partitioning of neutral organic compounds. Their successful environmental applications have been previously reviewed.[3,4] PP-LFERs are multiple linear regression models that typically use five solute descriptors. The following three types of equations are most often applied.[1,5] The symbols denote the following: K, partition coefficient; E, excess molar refraction; S, solute polarizability/dipolarity parameter; A, solute hydrogen (H)-bond donor property; B, solute H-bond acceptor property; V, McGowan’s molar volume; and L, logarithmic hexadecane/air partition coefficient. The lowercase letters are regression coefficients and are typically trained with several tens of compounds, for which experimental log K and the solute descriptors (i.e., E, S, A, B, V, and L) are available. The fitting of the PP-LFERs is high even to data that are highly diverse in size and polarity.[1,3] For solvent/water and solvent/air partition coefficients, the calibration typically results in a standard deviation (SD) of 0.2 or below for the log K values.[1] Partition systems that involve a heterogeneous phase (e.g., natural organic matter) can exhibit a lower quality of fit (SD, 0.3–0.5 log units).[3] PP-LFERs are derived from a multiple linear regression; therefore, their applicability domain (AD) is related to the training (calibration) set of compounds. Generally, extrapolation (i.e., prediction beyond a specific domain defined by calibration data) is likely to be less accurate than interpolation. Moreover, a long-range extrapolation is expected to be more error-prone than a short-range extrapolation. However, in a multidimensional space (here, 5 descriptors), it is not straightforward to define the terms “interpolation” and “extrapolation” and to establish a quantitative relationship between the extent of extrapolation and prediction accuracy. Notably, an extrapolation can be less accurate but is not necessarily inaccurate or unreliable. The required accuracy depends on the purpose of the model use, and extrapolation can be acceptable within the range where its accuracy is satisfactory. Among various approaches, the calculation of the leverages has been considered to define and evaluate the AD for linear regression models.[6−9] The leverage is a quantitative measure of the distance from the entire set of calibration data. Leverage calculation is applied to identify outliers within the calibration set, and it can also be used to quantitatively define extrapolation in the prediction. A large leverage value indicates a long distance from the calibration data in terms of explanatory variables and thus an extrapolation with the possibility of increased error. The prediction interval (PI) is the range of values where future data are expected to fall at a given frequency.[10] Typically, 95% or 99% PIs are calculated. Although PIs are frequently calculated for predictions by a simple linear regression model, they are not commonly presented for multiple linear regression models, including PP-LFERs. However, the PI can be more useful than the leverage, as the PI considers both the distance from the calibration set and the quality of the model fitting (see Section for more details). The purposes of this study are threefold: (i) to quantitatively demonstrate how the prediction accuracy of a PP-LFER decreases when moving away from a specific domain of calibration defined by the leverage, (ii) to compare actual prediction errors with error margins expected by PIs, and (iii) to evaluate several calibration sets for PP-LFERs in terms of their AD using a new concept of AD probes. On the basis of these, a discussion is presented on the definition and evaluation of AD for PP-LFER models. The information should also be helpful for the future development of PP-LFERs because it ensures an optimized calibration data set.

Methodology

Definition and Calculation of the Leverage and PI

The definition and calculation of the leverage and PI are described in full in the Supporting Information and only briefly here. The PP-LFER regression can be expressed in a matrix form as followswhere y is the vector of observations for log K, β is the vector of regression coefficients, ε is the error vector, and X is the design matrix consisting of a column of ones and the solute descriptors of n training compounds. The hat matrix (H) can be derived from X, and the diagonals of H (i.e., hii) are referred to as the leverages and infer the distance of each calibration compound from the others in terms of the solute descriptor combination. hii is between 0 and 1, and the sum of hii for the n training compounds is equal to the number of fitting parameters p, which is 6 for the PP-LFERs (including the regression constant). An overly high hii indicates that the respective calibration compound is an outlier in terms of its descriptors. Typically, hii = 3hmean is considered a threshold value,[6−9] where hmean is the mean of hii for all calibration compounds and is equal to p/n. To evaluate the extrapolation for compound j, which is not included in the calibration set, h is calculated aswhere xj is the column vector containing the solute descriptors of j. Analogous to the identification of outliers in the training set, h = 3hmean is typically considered the threshold value for extrapolation.[6−9] The PI of the PP-LFER can be expressed as [log Kj – Δ(log K), log Kj + Δ(log K)], where log Kj is the value of compound j predicted with eq (i.e., log Kj = xT β) and Δ(log K) is half the width of the PI. Δ(log K) is calculated as[10]where tα/2, is the two-tailed t-value for a given confidence level (α, e.g., 95%), number of training data (n), and number of independent variables (k; 5 for PP-LFERs). SDtraining is the standard deviation of the PP-LFER model fitted to the training data. Δ(log K) may be normalized to SDtraining, as In this study, the following two tests were performed to discuss the use of h and the PIs to delineate the AD of PP-LFERs.

Test 1: Comparison of Prediction Errors with h and the PIs

In the first test, the variation in actual prediction errors by PP-LFERs with h and the PIs was examined. Six experimental data sets of partition coefficients from the existing literature were used: octanol/water (Kow, n = 314),[11] air/water (Kaw, n = 390),[12] oil/water (Koilw, n = 247),[13] soil organic carbon/water (Koc, n = 79),[14] phospholipid liposome/water (Klipw, n = 131),[15] and bovine serum albumin/water (KBSAw, n = 82).[16] These data sets comprise a relatively large number of compounds and exhibit environmental and toxicological relevance. Kow, Kaw, and Koilw were partition coefficients between two homogeneous solvents, whereas Koc, Klipw, and KBSAw involved a heterogeneous or anisotropic phase. The K values and solute descriptors were obtained from the aforecited references, are listed in Tables S1–S6, and are summarized in Table S7. To evaluate the prediction accuracy, the K data of each set were divided into training and test sets. Training compounds were randomly selected from the entire data set. The number of the training compounds (ntraining) was 20, 30, 40, 50, 75, or 100. Rather small values of ntraining were also included in this test to simulate cases of insufficient calibration. The compounds that were not selected as training compounds were used as test compounds. The PP-LFER in the form of eq was calibrated with the training data and was used to predict log K for the test compounds. Prediction errors (predicted log K – experimental log K) were calculated and compared with h and Δ(log K). For each combination of the K set and ntraining, the cycle of “random generation of a training set”, “calibration of the PP-LFER”, and “prediction for the test set” was repeated 200 times. This number was arbitrary but appeared sufficient for stable results. Additionally, using the 200 calibrated PP-LFERs for each case, the log K values of per- and polyfluoroalkyl substances (PFASs) and organosilicon compounds (OSCs) were predicted. PFASs and OSCs possess extremely weak van der Waals interaction properties; thus, the E and L values are comparatively low for their molecular sizes.[17,18] Therefore, PP-LFERs have to be extrapolated to predict K values of PFASs and OSCs unless calibrated with these compounds.[18] PFASs and OSCs are not present in the data set of any considered PP-LFER and are used to evaluate the influences of extrapolation on the prediction accuracy. All calculations mentioned above were performed with R software.

Test 2: Evaluating Reported PP-LFERs with AD Probes

In the second test, h and PI calculation was applied to evaluate the AD of the reported PP-LFER equations. Here, n, SDtraining, and the solute descriptors of the calibration compounds were extracted from the existing literature and used to calculate h and PIs for 25 selected compounds (Table S8). These compounds, referred to as AD probes herein, were selected because of their wide variations in descriptor values, structural diversity, and environmental relevance. They represent aliphatic and aromatic compounds with varying molecular size and hydrogen (H)-bond interaction properties and include multifunctional polar compounds such as various pesticides and pharmaceuticals, a neutral PFAS, and an OSC. Experimental solute descriptors for the AD probes were obtained from the UFZ-LSER database and are listed in Table S8.[19] Test 2 did not require the experimental K values of the AD probes, and only solute descriptors were used for the calculation. An Excel file with a macro is available on the Web that calculates h, h/hmean, and Δ(log K) for the AD probes and any desired chemical based on the user-entered training data (https://doi.org/10.26434/chemrxiv-2022-qs03q). Note that there exist compounds with extreme descriptor values that are not covered by the 25 AD probes proposed here. For example, an antibiotic erythromycin (E = 2.90, S = 3.73, A = 1.25, B = 4.96, V = 5.773)[20] exhibits exceptionally high S, B, and V values. However, such compounds are rarely used for calibration and are thus highly likely to be out of the AD, which is clear without testing; therefore, compounds with extreme descriptor values were not included in the AD probe set.

Results and Discussion

Prediction Errors Compared to h and the PIs (Test 1)

Figure S1 shows the root mean squared errors (RMSEs) for training and testing sets randomly generated 200 times. The test compounds were grouped into several bins according to the h normalized to hmean (h/hmean) before the RMSEs were calculated. The observed RMSE for the test compounds increased with h for a given K data set and ntraining. The increasing trend of RMSE with h was particularly clear for simulations with small ntraining values (i.e., 20, 30). The increasing trend was sometimes unclear, or even an apparent decrease was seen (e.g., for log Koilw) for simulations with high ntraining values in a high h/hmean range, likely because a large ntraining resulted in a relatively small ntest, which may not be able to provide representative RMSEs, particularly for high h/hmean bins. In other words, the sample size was sometimes too small to derive accurate RMSEs for high h/hmean bins. To demonstrate the increase in RMSE with h/hmean more clearly, the RMSE values for the test data relative to the RMSE for the training data were calculated (Figures and S2). The relative RMSE generally increased with h/hmean but to a lesser extent when ntraining was large. For example, the relative RMSEs of log Kow data in the “2 < h/hmean < 3” bin were 1.75, 1.52, 1.42, and 1.34 for ntraining = 20, 40, 75, and 100, respectively. This result suggests that if the PP-LFER is trained with a sufficient size of data, the RMSEs for interpolations (i.e., h/hmean < 3) resemble the RMSE for the training set. Noteworthily, even for the “3 < h/hmean < 4” bin (i.e., extrapolation), the relative RMSE for any K considered was <1.5 when ntraining ≥ 50 and <2.2 when ntraining ≥ 20. These RMSEs can be sufficiently accurate for various purposes. Although h > 3hmean is the common definition of extrapolation, the actual threshold of h may be adapted to the required accuracy of predictions, depending on the quality of the PP-LFER fit and ntraining. For example, if the required accuracy is 0.3 log units, which is typically the level of accuracy of contaminant fate models,[21] extrapolations by the PP-LFERs for log Kow and log Kaw up to an h/hmean of 4 can be allowed, according to the results of Test 1 (Figure S1). In contrast, a stricter criterion, for example, h/hmean < 2 or even <1, should be set to log Koc, log Klipw, and log KBSAw to comply with the criterion of 0.3 log unit RMSE. Alternative AD thresholds are further discussed in Section .

Figure 1

RMSEs of the test data, sorted according to h/hmean, relative to the RMSE of the training data. The plots for ntraining = 30 and 50 and log Koc and log KBSAw are available in Figure S2.

RMSEs of the test data, sorted according to h/hmean, relative to the RMSE of the training data. The plots for ntraining = 30 and 50 and log Koc and log KBSAw are available in Figure S2. Along with average errors, such as RMSEs, the risk of an extremely inaccurate prediction is of interest. Individual data of Test 1 for log Kow and log Klipw were plotted against h (Figure ). All other data are shown in Figure S3. When ntraining was small (e.g., 20, 30), both h (x-axis) and prediction errors (y-axis, normalized to SDtraining) for the test data were widely distributed. Extremely large errors (|error/SDtraining| > 5) occasionally occurred, particularly if h was large (>10hmean). In contrast, when ntraining was large (e.g., 75, 100), the training and test data were similarly distributed in terms of h and the prediction errors.

Figure 2

Prediction errors normalized to SDtraining plotted against h. Results from 200 simulations are shown. The vertical line indicates 3hmean. The dashed horizontal lines indicate errors that are three times the SDtraining. The curves indicate the 95% (inside) and 99% (outside) prediction intervals. Top, log Kow; bottom, log Klipw. All other data are shown in Figure S3. The percentage of large prediction errors, defined by |error/SDtraining| > 3, was generally higher for extrapolation (h/hmean > 3) than interpolation (h/hmean < 3) (Figure S4). However, the percentage strongly decreased with ntraining. As an example, for log Kow, when ntraining = 20, 3.3% of the interpolations and 17% of the extrapolations suffered from large prediction errors. In contrast, when ntraining = 100, 0.94% of the interpolations and 4.7% of the extrapolations resulted in large prediction errors, the latter conversely indicating that 95% of the extrapolations ended up with errors within 3 SDtraining. Figure additionally shows the 95% and 99% PIs as a function of h. The PIs were narrow up to h ∼ 1 and diverged with h, as expected from eq . The extent of divergence was large when ntraining was small, which can be explained by a large tα/2, in eq . The data points from Test 1 were within the PIs with a few outliers. The percentage of the test data within a given PI agrees with the theoretical expectations; for example, ca. 95% of the test data are within the 95% PI, independent of ntraining (Figure S5). Overall, Test 1 demonstrated that h increased with the mean prediction error and could be used to identify “risky predictions” that frequently cause high inaccuracy. However, a threshold of 3hmean did not appear to be versatile in defining the AD, as the ntraining appeared to influence the range of prediction errors. The plots in Figures , 2, and S1–S5 suggest that, when ntraining was large, h = 3hmean might be overly strict as a threshold, because prediction errors were often similar in magnitude even when h > 3hmean. Note that Test 1 was also performed with eq , the PP-LFER equation that uses L instead of E. However, the results were similar to those of eq and are thus not discussed herein.

PFASs and OSCs

Using 200 trained PP-LFERs, log Kow of 3 PFASs [4:2 fluorotelomer alcohol (FTOH), 6:2 FTOH, and 8:2 FTOH] and 3 OSCs [octamethylcyclotetrasiloxane (D4), decamethylcyclopentasiloxane (D5), and dodecamethylcyclohexasiloxane (D6)] were predicted and compared to the experimental data (Figure ; additional data in Figure S6).[18] For this comparison, eq instead of eq was used because the latter is unsuitable for PFASs and OSCs (ref (18); also compare Figures S6 and S7). The h/hmean ratios for these six chemicals were always above 3 with any ntraining used and were up to 300, indicating strong extrapolations. The predictions were highly inaccurate when ntraining was small. However, the predictions appeared to improve with an increase in ntraining. When ntraining = 100, even largely extrapolated FTOHs (h ∼ 2, h/hmean ∼ 33) were frequently predicted within 3 SDtraining. The dependence of the prediction error on h was well captured by the PIs; the majority of the data were within the 99% PIs, and this was the case for extreme extrapolations as well (Figures and S6). The results for PFASs and OSCs can be considered another indication that well-calibrated PP-LFERs are robust against extrapolation and that h = 3hmean as the cutoff is excessively strict if the ntraining is large. Notably, although well-calibrated PP-LFERs appear to bear extrapolation, the inclusion of PFASs and OSCs in the calibration set is the first choice to develop PP-LFERs that work for these classes of chemicals, as that substantially decreases h for PFASs and OSCs.[18]

Figure 3

Prediction errors for log Kow of PFASs and OSCs normalized to SDtraining plotted against h. The results from 200 simulations are shown. The lines indicate the same as in Figure . Equation was used for this plot (see text for more details). Additional data are in Figure S6.

Defining the AD of PP-LFERs?

In previous discussions regarding the AD of quantitative structure activity relationships (QSARs), the use of h with a cutoff value of 3hmean has been frequently presented. As shown in Test 1 of this study, however, this cutoff may excessively limit the potential of well-calibrated PP-LFERs to predict a broad range of compounds above the 3hmean threshold. The use of the PI, in contrast, has rarely been investigated in the context of QSAR development but may be more practical for multiple linear regression models, such as PP-LFERs, because the PI encompasses the distance (h), quality of model fit (SDtraining), and size of training data (influencing h and tα/2,) and provides a concrete estimate of the error range (eq ). To use the PI to define the AD, an upper threshold for Δ(log K) must be set. Here, two ways that may be acceptable are discussed.

Set the Δ(log K) Threshold at a Multiple of SDtraining

The AD may be defined by a Δ(log K) threshold that is a multiple of SDtraining. An example of such a criterion is Δ(log K)99%PI < 3SDtraining. According to eq , this condition corresponds to Inequality 9 describes the two intersections in Figures and 3 where the curves for the 99% PI meet the horizontal lines for ±3SDtraining. By solving this inequality for h, we obtain Inequality 10 describes a new h criterion that is derived from “Δ(log K)99%PI < 3SDtraining“ and is a function of tα/2,. As tα/2, is dependent on ntraining, this h threshold is also dependent on ntraining (Figure ). For example, if ntraining = 50, the new threshold of h is 0.24, which corresponds to h/hmean = 2.0. If ntraining = 100, the threshold is h = 0.30, which is h/hmean = 5.0. The common threshold of h/hmean = 3 can be derived when ntraining = 66.6. Thus, the new threshold is stricter if ntraining ≤ 66 and less strict if ntraining ≥ 67, compared with the 3hmean rule.

Figure 4

New thresholds of h (blue dash-dotted line) and h/hmean (orange solid line) derived from the Δ(log K)99%PI < 3SDtraining criterion (eq ) as a function of ntraining. The horizontal arrows indicate the axes that the data refer to. The ntraining value that corresponds to the h/hmean = 3 threshold is also indicated.

Set the Δ(log K) Threshold at a Certain Value

In the second approach, the AD is defined in such a way that the PI becomes narrower than a certain range. For example, we may consider Δ(log K)99%PI < 0.5 (i.e., a factor of 3 for K) as an acceptable error margin; then, eq becomeswhich can be rewritten as Using the SDtraining value for the PP-LFER of log Kow (Table S7) as an example, we can derive a threshold of h specific to log Kow. By inserting SDtraining = 0.154 and t99/2, = 2.59 (with n = 314) in inequality 12, we obtained h < 0.57 (i.e., h/hmean < 30) as the new criterion. Note that if SDtraining is high (e.g., 0.285 for log Klipw), “Δ(log K)99%PI < 0.5” is not achievable no matter how large ntraining is, because t99/2, is >2.58 regardless of ntraining and the right-hand side of inequality 12 is always negative. The difficulty associated with this approach to define the AD may be to set the acceptable Δ(log K)99%PI level such that it is both useful and achievable.

Evaluating the AD of Published PP-LFERs with AD Probes (Test 2)

Test 1 demonstrated the usefulness and limitations of h and the PI in evaluating the AD of PP-LFERs. In Test 2, h and the PI were applied to evaluate 10 published PP-LFER equations[11−16,22−25] including those that had originally been derived from the data sets used in Test 1 (Figures and S8). In this test, 25 environmentally relevant chemicals were considered as AD probes, as explained in the method section.

Figure 5

Leverage (bars) and prediction intervals (triangles and circles) of 25 applicability domain (AD) probes calculated with the training data sets of PP-LFERs for Kow, Klipw (liposome/water), KFAw (fulvic acid/water), and KACw (activated carbon/water). Solid horizontal lines indicate h/hmean = 3 and Δ(log K) = 3SD. For convenience, chemicals are grouped, according to their structure and polarity, into nonpolar (nonP), H-bond acceptor (H-A), H-bond donor (H-D), multiple functional polar (multi), PFAS and OSC (F/Si), aliphatic, and aromatic chemicals. The cited reference does not give SD but a “mean error” of 0.2, which was used here (asterisked). Plots for all 10 PP-LFERs are shown in Figure S8. The h calculation showed that none of the 10 training sets considered encompassed all the 25 AD probes within the 3hmean domain. This indicates that certain environmentally relevant compounds must be extrapolated with these PP-LFERs. Particularly, 8:2 FTOH and D5 always appeared as highly extrapolated chemicals (h/hmean = 8–50), reflecting the fact that PFASs and OSCs were not included in any of the training sets and indicating that these compounds were not well represented by other training compounds. For each type of chemical, relatively small compounds (e.g., dichloromethane, methyl tert-butyl ether, and benzene) exhibited lower h/hmean ratios than larger compounds (e.g., hexadecane, tri-n-butyl phosphate, and benzo[ghi]perylene). Generally, relatively small compounds are easy to measure, and their data are present in the training set, whereas obtaining data for large compounds tends to be more challenging.[26,27] Consequently, PP-LFERs must be frequently extrapolated for large compounds. The data sets for log Kow(11) and log Kaw(12) exhibited similar patterns for h/hmean and Δ(log K). Thus, the h/hmean ratios of the small compounds were <3 (interpolation) and those of the large compounds were in the range of 3–15 (extrapolation) (Figure A). However, the Δ(log K) values were not largely different across the 25 AD probes. Although 12 out of 25 AD probes exhibited h/hmean > 3, Δ(log K)95%PI and Δ(log K)99%PI were ∼0.3 and ∼0.4, respectively, for all the AD probes. Even for strongly extrapolated 8:2 FTOH, Δ(log K)95%PI and Δ(log K)99%PI of log Kow predictions were 0.36 and 0.47, respectively. These relatively low Δ(log K) values for the extrapolated compounds originated from the low SDtraining and the substantial size of the training data for Kow and Kaw. The log Koilw(13) data set resulted in similar patterns for h/hmean and Δ(log K), but the values of Δ(log K) were higher than those of log Kow and log Kaw because of the higher SDtraining of log Koilw (Figure S8). The data set for log Klipw(15) had the benefit of excellent coverage of the AD probes; only 5 out of 25 AD probes exhibited h/hmean > 3 (Figure B). A wealth of data for hydrophobic compounds (e.g., PAHs), substituted phenols, hormones, and pharmaceuticals in addition to simple aliphatic and aromatic and polar and nonpolar compounds with varying sizes resulted in low h/hmean for the AD probes. Because of the low h/hmean and high n, the Δ(log K) values were similar for all AD probes. Nevertheless, the values of Δ(log K)95%PI and Δ(log K)99%PI (∼0.6 and ∼0.8, respectively) for log Klipw were higher than those for log Kow by a factor of ∼2, because the SDtraining of log Klipw was higher by the same factor. Figure C,D shows illustrative examples of PP-LFERs with limited training data. The data set of fulvic acid/water partition coefficients (KFAw)[22] comprised 34 training data, and 16 out of 25 AD probes were extrapolated (h/hmean > 3). The major difference from log Kow and log Klipw was the wide range of Δ(log K); the Δ(log K)95%PI and Δ(log K)99%PI values for log KFAw were in the range of 0.5–1.0 and 0.7–1.4, respectively. The data set of activated carbon/water partition coefficients (KACw)[25] was a clearer example of insufficient calibration. It contained only 14 training data, and all AD probes were considered extrapolated (h/hmean, 8–480). Although the model fitting seemed to be good (stated mean error: 0.2),[25] the PIs were extremely broad, with Δ(log K)95%PI and Δ(log K)99%PI being 1.0–6.8 and 1.5–10, respectively. These results indicate that PP-LFERs from such small training sets will have a limited predictive ability for external compounds. Conversely, the calculation of h and the PIs will be most useful for such poorly calibrated PP-LFERs, as they can identify compounds for which the precision of prediction is still acceptable. In Supporting Information 10, a comparative discussion is provided for three data sets of log Koc(14,23,24) in terms of their ADs. These data sets comprise different calibration compounds and, accordingly, cover different types of compounds within their ADs, as demonstrated by the AD probes. Overall, it can be concluded that the 25 AD probes are useful in illustrating the strength and weakness of calibrated PP-LFERs. The missing classes of compounds in the training data, for example, large hydrophobic compounds and multifunctional polar compounds, can be identified using the h/hmean values, and the associated elevation of error margins can be evaluated by calculating the PIs. While 25 AD probes were exemplarily used in this study, other sets of AD probes could be also used with a larger or smaller number of chemicals or with specific chemicals of interest (e.g., pesticides), depending on the purpose of evaluation.

Practical Implications

This study demonstrated that extrapolation was error-prone when the number of training data was limited and the h/hmean value was extremely high. In contrast, PP-LFERs calibrated with many training data (e.g., 100) were highly robust even when h/hmean signified extrapolation. For partition coefficients between solvent phases or solvent and air such as Kow and Kaw, the data are typically accurate and abundant. Thus, extrapolations can frequently result in low prediction errors. Extrapolation is expected to cause unacceptable errors more often for heterogeneous environmental, biological, and technical phases, because the data are often limited, and SDtraining tends to be large. The commonly used threshold of h being 3 hmean appeared to be not useful in defining the AD of PP-LFER models. Alternatively, two possible ways were proposed in this article to define the AD based on the calculation of the PIs. For practical purposes, presenting the PIs for each prediction may be highly recommended. For example, using the PP-LFER, log Kow for hexachlorobenzene is predicted as 5.49 with a 95% PI of [5.16, 5.81]. With these PI values, the model user can appreciate the reliability of the prediction and decide whether the value is taken or not, following the accuracy required for the given model use. It could be claimed that calculating the PI each time is more important and useful than seeking a strict definition of the AD because the former presents a quantitative estimate of the error range, while the latter is a qualitative, binomial indicator with an arbitrary cutoff in the end. To develop a robust PP-LFER, the training set should contain (i) a large number (>60, preferably >100) of (ii) accurate experimental K data for (iii) diverse compounds with (iv) accurate descriptors available. The reason is that (i) decreases tα/2, and h, (ii) and (iv) decrease SDtraining, and (iii) decreases h in eq , all contributing to tight PIs. The predictive performance of an empirical model is always restricted by the quality and quantity of the underlying experimental data. The improvement in data accuracy and availability will contribute to the further development of PP-LFER approaches. Extended use of the PI may be considered for evaluating the AD of QSARs that are derived by the multiple linear regression analysis. The calculation of the PI is no more complex than h is, but the former provides far more insights into the reliability of predictions, as discussed above. Noteworthily, the success of applying the PIs for PP-LFERs stems from the excellent linear dependence of log K on the solute descriptors over a wide range, which is the premise of PP-LFER models such as eqs –3. This conversely means that, if the linear relationship between the solute descriptors and log K is weak (e.g., for complex phases), prediction errors can be larger and extreme outliers can occur more frequently than predicted by the PIs. The suitability of the PI for various partitioning phases and for various existing QSAR descriptors and properties warrants future investigation.

20 in total

Review 1. Linear free energy relationships used to evaluate equilibrium partitioning of organic compounds.

Authors: K U Goss; R P Schwarzenbach
Journal: Environ Sci Technol Date: 2001-01-01 Impact factor: 9.028

2. Evaluating dissolved organic carbon-water partitioning using polyparameter linear free energy relationships: Implications for the fate of disinfection by-products.

Authors: Peta A Neale; Beate I Escher; Kai-Uwe Goss; Satoshi Endo
Journal: Water Res Date: 2012-04-11 Impact factor: 11.236

3. Current status of methods for defining the applicability domain of (quantitative) structure-activity relationships. The report and recommendations of ECVAM Workshop 52.

Authors: Tatiana I Netzeva; Andrew Worth; Tom Aldenberg; Romualdo Benigni; Mark T D Cronin; Paolo Gramatica; Joanna S Jaworska; Scott Kahn; Gilles Klopman; Carol A Marchant; Glenn Myatt; Nina Nikolova-Jeliazkova; Grace Y Patlewicz; Roger Perkins; David Roberts; Terry Schultz; David W Stanton; Johannes J M van de Sandt; Weida Tong; Gilman Veith; Chihae Yang
Journal: Altern Lab Anim Date: 2005-04 Impact factor: 1.303

4. Statistical external validation and consensus modeling: a QSPR case study for Koc prediction.

Authors: Paola Gramatica; Elisa Giani; Ester Papa
Journal: J Mol Graph Model Date: 2006-08-04 Impact factor: 2.518

5. Air to lung partition coefficients for volatile organic compounds and blood to lung partition coefficients for volatile organic compounds and drugs.

Authors: Michael H Abraham; Adam Ibrahim; William E Acree
Journal: Eur J Med Chem Date: 2007-04-24 Impact factor: 6.514

Review 6. Estimation of the environmental properties of compounds from chromatographic measurements and the solvation parameter model.

Authors: Colin F Poole; Thiloka C Ariyasena; Nicole Lenca
Journal: J Chromatogr A Date: 2013-05-27 Impact factor: 4.759

7. LFERs for soil organic carbon-water distribution coefficients (Koc) at environmentally relevant sorbate concentrations.

Authors: Satoshi Endo; Peter Grathwohl; Stefan B Haderlein; Torsten C Schmidt
Journal: Environ Sci Technol Date: 2009-05-01 Impact factor: 9.028

8. Predicting partition coefficients of Polyfluorinated and organosilicon compounds using polyparameter linear free energy relationships (PP-LFERs).

Authors: Satoshi Endo; Kai-Uwe Goss
Journal: Environ Sci Technol Date: 2014-02-11 Impact factor: 9.028

9. Predicting sorption of pesticides and other multifunctional organic chemicals to soil organic carbon.

Authors: Guido Bronner; Kai-Uwe Goss
Journal: Environ Sci Technol Date: 2010-12-31 Impact factor: 9.028

10. Determining octanol-water partition coefficients for extremely hydrophobic chemicals by combining "slow stirring" and solid-phase microextraction.

Authors: Michiel T O Jonker
Journal: Environ Toxicol Chem Date: 2016-03-18 Impact factor: 3.742