A Furuhama, T Toida, N Nishikawa, Y Aoki, Y Yoshioka, H Shiraishi.
Abstract
The KAshinhou Tool for Ecotoxicity (KATE) system, which includes ecotoxicity quantitative structure-activity relationship (QSAR) models, was developed by the Japanese National Institute for Environmental Studies (NIES) using the database of aquatic toxicity results gathered by the Japanese Ministry of the Environment and the US EPA fathead minnow database. In this system, chemicals are entered as one-dimensional structures and classified by substructure. The QSAR equations for predicting the toxicity of a chemical compound assume a linear correlation between its log P value and its aquatic toxicity. KATE uses a structural domain called C-judgement, defined by the substructures of specified functional groups in the QSAR models. Internal validation by the leave-one-out method confirms that the QSAR equations with r2 > 0.7, RMSE ≤ 0.5, and n > 5 give acceptable q2 values. External validation indicates that a group of chemicals that is in-domain of the KATE C-judgements exhibits a lower root mean square error (RMSE). These findings demonstrate that the KATE system has the potential to enable chemicals to be categorised as potential hazards.
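The internal validation described in the abstract can be illustrated with a minimal Python sketch. It fits the assumed linear form toxicity = a × log P + b by ordinary least squares and computes r2 and a leave-one-out q2. The function names and the sample data are hypothetical, and q2 is taken here as 1 − PRESS/SS_tot about the full-sample mean, which is one common convention and not necessarily the one used in KATE.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def total_sum_of_squares(ys):
    """Sum of squared deviations of ys from their mean."""
    mean_y = sum(ys) / len(ys)
    return sum((y - mean_y) ** 2 for y in ys)


def r_squared(xs, ys):
    """Squared correlation coefficient of the fitted line."""
    a, b = fit_line(xs, ys)
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - ss_res / total_sum_of_squares(ys)


def q_squared_loo(xs, ys):
    """Leave-one-out q2: refit without point i, then predict point i."""
    press = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a * xs[i] + b)) ** 2
    return 1.0 - press / total_sum_of_squares(ys)


# Hypothetical illustrative data, not taken from the KATE training sets.
log_p = [0.5, 1.0, 2.0, 3.0, 4.0, 5.0]
toxicity = [-0.6, -0.2, 0.4, 1.1, 1.5, 2.4]  # stand-in for log(1/LC50[mM])

print("r2 =", round(r_squared(log_p, toxicity), 3))
print("q2 =", round(q_squared_loo(log_p, toxicity), 3))
```

For a least-squares line, each leave-one-out residual is at least as large in magnitude as the corresponding fitted residual, so q2 computed this way never exceeds r2. A large gap between the two suggests the fit depends heavily on individual chemicals.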
Year: 2010 PMID: 20818579 PMCID: PMC2946238 DOI: 10.1080/1062936X.2010.501815
Source DB: PubMed Journal: SAR QSAR Environ Res ISSN: 1026-776X Impact factor: 3.000
QSARs for fish acute toxicity estimated by the equation: log(1/LC50[mM]) = a ∗ log P + b.
| Class name | a, b | n | RMSE | r2, q2 | log P range | ∗1 |
| | 0.630, −0.883 | 43 | 0.368 | 0.826, 0.803 | [0.60, 5.17] | |
| | 0.568, 0.551 | 12 | 0.669 | 0.331, 0.170 | [0.56, 3.60] | |
| | 0.678, −0.693 | 9 | 0.300 | 0.875, 0.760 | [0.82, 5.10] | |
| | −0.005, 2.671 | 7 | 0.354 | 0.001, 0.887 | [−0.30, 4.47] | |
| | 0.012, 1.863 | 7 | 0.307 | 0.003, 0.737 | [3.67, 8.47] | C |
| | 0.214, 0.945 | 16 | 0.305 | 0.272, 0.106 | [0.15, 3.68] | |
| | 0.725, −0.779 | 56 | 0.321 | 0.900, 0.890 | [0.51, 7.54] | |
| | 0.544, −0.612 | 22 | 0.324 | 0.661, 0.600 | [0.35, 3.50] | |
| | 0.529, −0.622 | 23 | 0.406 | 0.803, 0.741 | [−2.04, 3.60] | |
| | 0.592, −0.595 | 10 | 0.512 | 0.731, 0.605 | [−1.43, 2.79] | C |
| | 0.417, 1.832 | 4 | 0.413 | 0.884, 0.639 | [−1.68, 4.70] | |
| | 0.746, −1.026 | 17 | 0.601 | 0.696, 0.607 | [−0.48, 3.80] | |
| | 0.638, −0.600 | 13 | 0.393 | 0.722, 0.651 | [0.18, 3.65] | |
| | 0.513, −0.157 | 9 | 0.253 | 0.856, 0.790 | [1.94, 5.53] | |
| | 0.484, 0.279 | 15 | 0.557 | 0.272, 0.111 | [−0.34, 2.47] | |
| | 0.728, −1.652 | 9 | 0.355 | 0.816, 0.667 | [0.33, 4.20] | |
| | 0.122, 0.045 | 3 | 0.039 | 0.607, 0.271 | [0.35, 1.33] | |
| | 0.753, 2.084 | 4 | 1.012 | 0.463, 0.111 | [−1.11, 2.20] | |
| | 0.436, 0.901 | 17 | 1.007 | 0.264, 0.066 | [−0.38, 4.10] | |
| | NO–QSAR | | | | | |
| | 0.371, 0.732 | 4 | 0.291 | 0.910, 0.633 | [−0.17, 6.12] | C |
| | 0.753, −1.336 | 8 | 0.259 | 0.699, 0.573 | [2.46, 4.16] | C |
| | 0.386, 0.845 | 6 | 0.666 | 0.210, 0.012 | [1.74, 4.44] | C |
| | 0.004, 1.894 | 11 | 0.519 | 0.000, 0.645 | [−0.47, 4.60] | |
| | NO–QSAR | | | | | |
| | 0.158, 1.498 | 6 | 0.155 | 0.474, 0.022 | [−0.21, 2.36] | |
| | 0.465, −0.031 | 6 | 0.417 | 0.657, 0.293 | [0.47, 4.54] | |
| | 0.323, 1.055 | 4 | 0.272 | 0.755, 0.283 | [0.08, 3.98] | C |
| | 1.583, −2.560 | 4 | 0.291 | 0.657, 0.927 | [1.47, 2.10] | |
| | 0.691, −0.111 | 11 | 0.856 | 0.389, 0.175 | [2.23, 5.33] | |
| | 0.274, 0.956 | 9 | 0.579 | 0.791, 0.628 | [−8.36, 6.69] | C |
| | 0.254, 1.325 | 6 | 0.971 | 0.112, 0.078 | [0.45, 4.50] | |
| | 0.824, −0.318 | 8 | 0.560 | 0.879, 0.810 | [−0.06, 5.04] | |
| | 0.783, −1.291 | 42 | 0.263 | 0.879, 0.868 | [1.25, 4.89] | |
| | NO–QSAR | | | | | |
| | 0.839, −1.154 | 6 | 0.254 | 0.938, 0.901 | [−0.34, 3.12] | N |
| | 0.864, −1.602 | 21 | 0.345 | 0.891, 0.867 | [−0.24, 4.09] | N |
| | 0.853, −1.958 | 23 | 0.321 | 0.950, 0.924 | [−0.77, 5.82] | N |
| | 0.891, −1.926 | 3 | 0.257 | 0.865, 0.485 | [2.83, 4.59] | N |
| | 0.753, −1.286 | 15 | 0.289 | 0.824, 0.785 | [2.42, 5.56] | N |
| | 0.749, −1.806 | 11 | 0.190 | 0.972, 0.962 | [−0.54, 4.25] | N |
| | 0.870, −1.466 | 10 | 0.233 | 0.922, 0.892 | [1.16, 4.21] | N |
| | 0.842, −1.674 | 88 | 0.384 | 0.924, 0.919 | [−0.77, 5.82] | |
| | 0.744, −0.898 | 25 | 0.714 | 0.712, 0.660 | [−1.35, 5.50] | |
∗1 C: the equation was generated using calculated Clog P values. N: a member of the Neutral organics class. Note: n, RMSE, r2 and q2 denote the number of chemicals in a class, the root mean square error, the squared correlation coefficient, and the leave-one-out version of the squared correlation coefficient, respectively. The log P range gives the minimum and maximum log P values.
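As a worked illustration of the equation log(1/LC50[mM]) = a ∗ log P + b, the following Python sketch converts a log P value into a predicted LC50 in mM and reports whether the input lies inside the tabulated log P range. It uses the coefficients of the first row of the table (a = 0.630, b = −0.883, range [0.60, 5.17]). The function name is hypothetical, the logarithm is assumed to be base 10, and treating the log P range as a simple inclusive interval check is an assumption about how the log P-judgement works.

```python
def predict_lc50_mM(log_p, a, b, log_p_range):
    """Predict LC50 (mM) from log(1/LC50[mM]) = a*log_p + b.

    Returns (lc50_mM, in_range), where in_range is True when log_p lies
    inside the inclusive [min, max] log P range of the QSAR class.
    """
    log_inverse_lc50 = a * log_p + b      # log10(1 / LC50[mM])
    lc50_mM = 10.0 ** (-log_inverse_lc50)  # invert the base-10 logarithm
    low, high = log_p_range
    return lc50_mM, low <= log_p <= high


# Coefficients and range from the first row of the fish table.
lc50, in_range = predict_lc50_mM(3.0, a=0.630, b=-0.883, log_p_range=(0.60, 5.17))
print(f"{lc50:.3f} mM, in range: {in_range}")  # 0.098 mM, in range: True
```

With log P = 3.0 the equation gives log(1/LC50) = 0.630 × 3.0 − 0.883 = 1.007, so LC50 is about 0.098 mM. A log P of 6.0 would still yield a number, but the range check would flag it as an extrapolation.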
QSARs for daphnia acute toxicity estimated by the equation: log(1/EC50[mM]) = a ∗ log P + b.
| Class name | a, b | n | RMSE | r2, q2 | log P range | ∗1 |
| | 0.607, −0.414 | 26 | 0.351 | 0.808, 0.762 | [0.65, 5.17] | |
| | 0.408, 0.632 | 5 | 0.561 | 0.343, 0.090 | [0.56, 3.60] | |
| | 0.547, −0.164 | 4 | 0.238 | 0.915, 0.675 | [1.17, 5.10] | |
| | 0.085, 2.441 | 7 | 0.443 | 0.057, 0.375 | [−0.33, 3.41] | |
| | 0.097, 1.152 | 6 | 0.277 | 0.239, 0.031 | [3.67, 8.47] | C |
| | 0.132, 1.748 | 16 | 0.406 | 0.119, 0.018 | [0.04, 3.91] | |
| | 0.576, −0.042 | 28 | 0.297 | 0.838, 0.814 | [1.32, 6.06] | |
| | 0.552, 0.114 | 12 | 0.260 | 0.802, 0.728 | [1.18, 3.91] | |
| | 0.189, −0.059 | 4 | 0.248 | 0.390, 0.095 | [−1.31, 1.49] | |
| | 0.133, 0.200 | 4 | 0.150 | 0.517, 0.040 | [−1.50, 1.45] | |
| | 0.190, 1.987 | 5 | 0.289 | 0.766, 0.360 | [−2.46, 4.70] | C |
| | 0.212, 0.585 | 8 | 0.593 | 0.151, 0.135 | [0.23, 3.80] | |
| | 0.666, −0.819 | 6 | 0.324 | 0.927, 0.762 | [0.25, 5.41] | |
| | 0.459, −0.417 | 3 | 0.010 | 1.000, 0.998 | [1.60, 4.72] | |
| | 0.521, 0.295 | 5 | 0.555 | 0.616, 0.084 | [0.42, 4.47] | C |
| | 0.222, −0.113 | 7 | 0.644 | 0.133, 0.298 | [0.08, 4.20] | |
| | 0.057, 0.248 | 3 | 0.143 | 0.025, 0.947 | [0.35, 1.33] | |
| | 0.630, 1.393 | 5 | 0.321 | 0.957, 0.916 | [−1.76, 4.65] | C |
| | 0.213, 0.906 | 11 | 0.775 | 0.097, 0.047 | [0.17, 3.70] | |
| | NO–QSAR | | | | | |
| | 0.427, 1.410 | 4 | 0.786 | 0.647, 0.049 | [−0.17, 6.12] | C |
| | NO–QSAR | | | | | |
| | 1.041, −0.724 | 3 | 0.480 | 0.865, 0.635 | [1.74, 4.44] | C |
| | 0.046, 2.991 | 4 | 0.688 | 0.008, 0.523 | [0.94, 4.60] | |
| | NO–QSAR | | | | | |
| | 0.003, 1.401 | 4 | 0.069 | 0.002, 0.646 | [−0.21, 2.36] | |
| | 0.461, −0.422 | 5 | 0.301 | 0.824, 0.653 | [0.47, 4.54] | |
| | 0.486, 0.589 | 4 | 0.341 | 0.817, 0.598 | [0.08, 3.98] | C |
| | NO–QSAR | | | | | |
| | 2.133, −2.376 | 3 | 1.477 | 0.204, 0.526 | [3.08, 3.88] | |
| | NO–QSAR | | | | | |
| | −0.665, 4.825 | 3 | 0.350 | 0.800, 0.998 | [2.09, 4.50] | |
| | 0.880, −0.317 | 4 | 0.552 | 0.860, 0.494 | [1.10, 5.04] | |
| | 0.826, −1.008 | 24 | 0.237 | 0.901, 0.883 | [1.47, 4.73] | |
| | NO–QSAR | | | | | |
| | NO–QSAR | | | | | N |
| | NO–QSAR | | | | | N |
| | 0.641, −1.053 | 6 | 0.214 | 0.958, 0.923 | [1.10, 5.82] | N |
| | 0.579, −0.634 | 3 | 0.103 | 0.983, 0.922 | [1.44, 4.59] | N |
| | 0.660, −0.555 | 10 | 0.268 | 0.891, 0.797 | [2.42, 6.54] | N |
| | NO–QSAR | | | | | N |
| | 0.492, 0.285 | 4 | 0.437 | 0.406, 0.088 | [2.16, 4.21] | N |
| | 0.696, −0.870 | 26 | 0.418 | 0.857, 0.835 | [0.68, 6.54] | |
| | 0.537, 0.078 | 12 | 1.097 | 0.475, 0.287 | [−1.02, 5.50] | |
∗1 C: the equation was generated using calculated Clog P values. N: a member of the Neutral organics class. Note: n, RMSE, r2 and q2 denote the number of chemicals in a class, the root mean square error, the squared correlation coefficient, and the leave-one-out version of the squared correlation coefficient, respectively. The log P range gives the minimum and maximum log P values.
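The abstract names three thresholds for the QSAR equations (r2 > 0.7, RMSE ≤ 0.5, n > 5). A minimal Python sketch of screening table rows against those thresholds is shown below, using five rows copied from the daphnia table. The `QsarRow` type and `meets_criteria` function are hypothetical names, and whether KATE applies exactly this filter in practice is not stated here.

```python
from dataclasses import dataclass


@dataclass
class QsarRow:
    a: float      # slope of the QSAR equation
    b: float      # intercept of the QSAR equation
    n: int        # number of chemicals in the class
    rmse: float   # root mean square error
    r2: float     # squared correlation coefficient
    q2: float     # leave-one-out squared correlation coefficient


def meets_criteria(row, min_r2=0.7, max_rmse=0.5, min_n=5):
    """Apply the thresholds quoted in the abstract: r2 > 0.7, RMSE <= 0.5, n > 5."""
    return row.r2 > min_r2 and row.rmse <= max_rmse and row.n > min_n


# Five rows copied from the daphnia table (class names are not available).
rows = [
    QsarRow(0.607, -0.414, 26, 0.351, 0.808, 0.762),
    QsarRow(0.408, 0.632, 5, 0.561, 0.343, 0.090),
    QsarRow(0.547, -0.164, 4, 0.238, 0.915, 0.675),
    QsarRow(0.576, -0.042, 28, 0.297, 0.838, 0.814),
    QsarRow(0.537, 0.078, 12, 1.097, 0.475, 0.287),
]

accepted = [row for row in rows if meets_criteria(row)]
for row in accepted:
    print(row)
```

Of these five rows, only the first and fourth pass all three thresholds. The third has a high r2 but only four chemicals, which illustrates why n is screened separately from goodness of fit.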
Figure 1. The correlation between log P and the measured toxicity values of chemicals used in KATE for the daphnia end-point. The dotted-dashed, dashed and bold lines are the QSAR equations of amines aromatic or phenols4, amines aromatic or phenols5, and neutral organics, respectively.
Statistical information comparing measured and calculated fish log(1/LC50[mM]) of 287 test set compounds. The complete results are shown in Appendix 5-1.
| | | | KATE | | | | | |
| | TIMES∗1 | ECOSAR∗2 | All∗3 | log P∗4 | C(1)∗5 | C(2)∗6 | log P | log P |
| Chemicals∗7 | 274 | 242 | 274 | 207 | 152 | 192 | 111 | 144 |
| Predicted∗8 | 274 | 259 | 318 | 252 | 187 | 233 | 145 | 179 |
| r2 | 0.751 | 0.790 | 0.868 | 0.833 | 0.901 | 0.890 | 0.886 | 0.866 |
| RMSE | 0.935 | 0.869 | 0.685 | 0.641 | 0.644 | 0.655 | 0.588 | 0.617 |
| Under∗9 | 11.3 | 10.0 | 4.7 | 5.2 | 5.3 | 5.6 | 2.8 | 3.9 |
| Over∗10 | 5.1 | 8.1 | 7.2 | 6.7 | 8.0 | 6.9 | 8.3 | 1.3 |
Notes:
∗1 Each chemical is identified by one QSAR class.
∗2 When a chemical is found to belong to more than one QSAR class, all the estimated data are adopted. If only the name of the class is available, such data are omitted.
∗3 Both in-domain and out-of-domain data for log P and C-judgements are included.
∗4 In-domain of log P-judgement.
∗5 In-domain of C-judgement is defined as all substructures of a test chemical being found in reference chemicals in the class.
∗6 In-domain of C-judgement is defined as all substructures of a test chemical being found in reference chemicals in either Neutral organics or the class.
∗7 The number of compounds that can be predicted.
∗8 The total number of values predicted using the training sets. Some chemicals belong to more than one class, and thus Predicted is larger than Chemicals. r2, RMSE, Under and Over were calculated based on the Predicted number.
∗9 Fraction (%) of underestimated chemicals. Underestimation is defined as [calculated log(1/LC50) − measured log(1/LC50)] < −1.
∗10 Fraction (%) of overestimated chemicals. Overestimation is defined as [calculated log(1/LC50) − measured log(1/LC50)] > 1.
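The RMSE, Under and Over rows of the table can be computed from paired measured and calculated values, as in the following Python sketch. It assumes that underestimation means a difference (calculated minus measured) below −1 log unit, the symmetric counterpart of the overestimation rule, and that RMSE is divided by the number of predictions. The function name and sample values are hypothetical and are not taken from the test set.

```python
import math


def error_summary(measured, calculated, threshold=1.0):
    """Return (rmse, under_percent, over_percent) for paired log(1/LC50) values.

    A prediction counts as underestimated when calculated - measured < -threshold
    and as overestimated when calculated - measured > threshold.
    """
    diffs = [c - m for m, c in zip(measured, calculated)]
    n = len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    under = 100.0 * sum(1 for d in diffs if d < -threshold) / n
    over = 100.0 * sum(1 for d in diffs if d > threshold) / n
    return rmse, under, over


# Hypothetical illustrative values of log(1/LC50[mM]).
measured = [1.0, 2.0, 0.5, 3.0]
calculated = [1.2, 0.5, 1.8, 3.0]

rmse, under, over = error_summary(measured, calculated)
print(f"RMSE = {rmse:.3f}, Under = {under:.1f}%, Over = {over:.1f}%")
```

Here the differences are 0.2, −1.5, 1.3 and 0.0, so one of the four predictions is underestimated and one is overestimated. Because the quantity is log(1/LC50), an underestimate corresponds to predicting a chemical as less toxic than measured.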
Statistical information comparing measured and calculated Daphnia log(1/EC50[mM]) for 98 test set compounds. The complete results are shown in Appendix 5-2.
| | | | KATE2 | | | | | |
| | TIMES∗1 | ECOSAR∗2 | All∗3 | log P∗4 | C(1)∗5 | C(2)∗6 | log P | log P |
| Chemicals∗7 | 93 | 82 | 94 | 58 | 43 | 55 | 25 | 33 |
| Predicted∗8 | 93 | 85 | 102 | 66 | 46 | 61 | 31 | 39 |
| r2 | 0.668 | 0.699 | 0.662 | 0.732 | 0.793 | 0.686 | 0.807 | 0.801 |
| RMSE | 1.404 | 1.364 | 0.993 | 0.784 | 0.799 | 0.968 | 0.639 | 0.689 |
| Under∗9 | 21.5 | 14.1 | 9.8 | 1.5 | 6.5 | 8.2 | 0.0 | 0.0 |
| Over∗10 | 11.8 | 18.8 | 14.7 | 15.2 | 6.5 | 11.5 | 6.5 | 10.3 |
Notes:
∗1 Each chemical is identified by one QSAR class.
∗2 When a chemical is found to belong to more than one QSAR class, all the estimated data are adopted. If only the name of the class is available, such data are omitted.
∗3 Both in-domain and out-of-domain data for log P and C-judgements are included.
∗4 In-domain of log P-judgement.
∗5 In-domain of C-judgement is defined as all substructures of a test chemical being found in reference chemicals in the class.
∗6 In-domain of C-judgement is defined as all substructures of a test chemical being found in reference chemicals in either Neutral organics or the class.
∗7 The number of compounds that can be predicted.
∗8 The total number of values predicted using the training sets. Some chemicals belong to more than one class, and thus Predicted is larger than Chemicals. r2, RMSE, Under and Over were calculated based on the Predicted number.
∗9 Fraction (%) of underestimated chemicals. Underestimation is defined as [calculated log(1/EC50) − measured log(1/EC50)] < −1.
∗10 Fraction (%) of overestimated chemicals. Overestimation is defined as [calculated log(1/EC50) − measured log(1/EC50)] > 1.
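Notes ∗5 and ∗6 define the two C-judgement variants as set-inclusion tests, which can be sketched in Python as follows. The substructure labels are hypothetical plain strings chosen for illustration; the actual substructure definitions used by KATE are not given here, and the function name `c_judgement` is likewise an assumption.

```python
def c_judgement(test_substructures, class_reference, neutral_reference=()):
    """Return (c1_in_domain, c2_in_domain) for one test chemical.

    C(1): every substructure of the test chemical occurs among the reference
          chemicals of the QSAR class.
    C(2): every substructure occurs among the reference chemicals of either
          the QSAR class or the Neutral organics class.
    """
    test = set(test_substructures)
    class_set = set(class_reference)
    c1 = test <= class_set
    c2 = test <= (class_set | set(neutral_reference))
    return c1, c2


# Hypothetical substructure labels, for illustration only.
class_reference = {"aromatic ring", "hydroxyl"}
neutral_reference = {"aromatic ring", "alkyl chain", "ether"}
test_chemical = {"aromatic ring", "hydroxyl", "alkyl chain"}

c1, c2 = c_judgement(test_chemical, class_reference, neutral_reference)
print("C(1) in-domain:", c1)  # False: "alkyl chain" is absent from the class
print("C(2) in-domain:", c2)  # True: "alkyl chain" occurs in Neutral organics
```

Since C(2) checks against a superset of the C(1) reference set, any chemical that is in-domain for C(1) is also in-domain for C(2), which is consistent with the larger chemical counts in the C(2) column.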