Hao Zhu, Lin Ye, Ann Richard, Alexander Golbraikh, Fred A Wright, Ivan Rusyn, Alexander Tropsha.
Abstract
BACKGROUND: Accurate prediction of in vivo toxicity from in vitro testing is a challenging problem. Large public-private consortia have been formed with the goal of improving chemical safety assessment by means of high-throughput screening.
Keywords: IC50; LD50; LOAEL; NOAEL; QSAR; acute toxicity; computational toxicology
Year: 2009 PMID: 19672406 PMCID: PMC2721870 DOI: 10.1289/ehp.0800471
Source DB: PubMed Journal: Environ Health Perspect ISSN: 0091-6765 Impact factor: 9.031
Results of partitioning the compounds with rat LD50, mouse LD50, rat chronic LOAEL, and rat chronic NOAEL data in the ZEBET data set into two classes using cytotoxicity IC50 values.
| Model | No. of C1 compounds | C1 ratio (%) | No. of C2 compounds | C2 ratio (%) |
|---|---|---|---|---|
| Rat LD50 (original set) | 137 | 60 | 93 | 40 |
| Mouse LD50 | 119 | 56 | 92 | 44 |
| Rat LOAEL | 21 | 49 | 21 | 51 |
| Rat NOAEL | 19 | 46 | 22 | 54 |
| Rat LD50 (full data set) | 258 | 61 | 167 | 39 |
Abbreviations: C1, Class 1; C2, Class 2.
Statistical information for the five most statistically significant kNN QSAR models based on three modeling sets.
| Model | N-training | Pred-training | N-test | Pred-test | NNN |
|---|---|---|---|---|---|
| The best | | | | | |
| 1 | 173 | 0.84 | 55 | 0.73 | 1 |
| 2 | 147 | 0.86 | 74 | 0.70 | 1 |
| 3 | 193 | 0.83 | 37 | 0.73 | 1 |
| 4 | 165 | 0.86 | 59 | 0.70 | 1 |
| 5 | 173 | 0.81 | 55 | 0.75 | 1 |
| The best | | | | | |
| 1 | 103 | 0.66 | 34 | 0.81 | 3 |
| 2 | 103 | 0.73 | 34 | 0.71 | 2 |
| 3 | 111 | 0.71 | 26 | 0.74 | 3 |
| 4 | 115 | 0.65 | 22 | 0.79 | 5 |
| 5 | 77 | 0.73 | 60 | 0.71 | 2 |
| The best | | | | | |
| 1 | 80 | 0.61 | 13 | 0.84 | 2 |
| 2 | 77 | 0.67 | 16 | 0.77 | 1 |
| 3 | 80 | 0.69 | 13 | 0.74 | 1 |
| 4 | 80 | 0.65 | 13 | 0.76 | 2 |
| 5 | 79 | 0.63 | 14 | 0.78 | 2 |
Abbreviations: NNN, number of nearest neighbors used for prediction; N-test, number of compounds in the test set; N-training, number of compounds in the training set; Pred-test, overall predictivity for the test set (correct classification rate for classification models, R2 for continuous models); Pred-training, overall predictivity for the training set (correct classification rate for classification models, q2 for continuous models).
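The two test-set predictivity measures named in the abbreviations can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and R2 is assumed here to be the coefficient of determination (1 minus the ratio of residual to total sum of squares), which this record does not confirm.

```python
from typing import Sequence


def correct_classification_rate(observed: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of compounds whose predicted class equals the observed class."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(o == p for o, p in zip(observed, predicted))
    return hits / len(observed)


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination (assumed definition of R2)."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values must not all be identical")
    return 1.0 - ss_res / ss_tot
```

The training-set measure q2 is normally computed the same way as R2 but on cross-validated predictions, so the same function could be reused once those predictions are available.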
Figure 1. The workflow of the two-step kNN QSAR LD50 modeling.
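As a rough illustration of what a two-step kNN prediction could look like, the sketch below first assigns a compound to a class (C1 or C2) with a kNN classifier and then predicts its LD50 with a kNN regressor trained only on that class. Everything in it is an assumption for illustration: the descriptor vectors, the Euclidean distance, the unweighted neighbor average, and all function names are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def nearest(query, rows, k):
    """Return the k rows whose descriptor vector is closest to query (Euclidean)."""
    return sorted(rows, key=lambda row: math.dist(query, row[0]))[:k]


def knn_classify(query, rows, k):
    """Majority class label among the k nearest neighbors."""
    votes = Counter(label for _, label in nearest(query, rows, k))
    return votes.most_common(1)[0][0]


def knn_regress(query, rows, k):
    """Unweighted mean activity of the k nearest neighbors (an assumed scheme)."""
    neighbors = nearest(query, rows, k)
    return sum(value for _, value in neighbors) / len(neighbors)


def two_step_predict(query, class_rows, value_rows_by_class, k_class=1, k_value=2):
    """Step 1: pick a class. Step 2: predict with that class's continuous model."""
    predicted_class = knn_classify(query, class_rows, k_class)
    predicted_value = knn_regress(query, value_rows_by_class[predicted_class], k_value)
    return predicted_class, predicted_value


# Toy data: (descriptor vector, class label) and (descriptor vector, activity value).
class_rows = [((0.0, 0.0), "C1"), ((1.0, 1.0), "C1"), ((5.0, 5.0), "C2"), ((6.0, 6.0), "C2")]
value_rows_by_class = {
    "C1": [((0.0, 0.0), 2.0), ((1.0, 1.0), 3.0)],
    "C2": [((5.0, 5.0), 1.0), ((6.0, 6.0), 1.5)],
}
result = two_step_predict((0.2, 0.1), class_rows, value_rows_by_class)
```

The parameters `k_class` and `k_value` play the role of NNN in the models table, where each model uses its own number of nearest neighbors.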
Figure 2. Identification of the baseline correlation between cytotoxicity (IC50) and various types of in vivo toxicity testing results. (A) Rat LD50. (B) Mouse LD50. (C) Rat LOAEL. (D) Rat NOAEL. C1, class 1; C2, class 2.
Figure 3. The correlation between experimental and predicted LD50 values for 27 external compounds within the applicability domain (A) using TOPKAT and (B) using the two-step model developed in this study.
Comparison between TOPKAT and the two-step model predictions for the external compounds.
| Measure | Two-step model, no applicability domain | Two-step model, with applicability domain | TOPKAT, no applicability domain | TOPKAT, with applicability domain |
|---|---|---|---|---|
| Prediction of 27 new ZEBET compounds | | | | |
| R2 | 0.64 | 0.86 | 0.16 | 0.60 |
| MAE | 0.38 | 0.29 | 0.78 | 0.50 |
| Coverage (%) | 100 | 67 | 100 | 67 |
| Prediction of 1,562 RTECS compounds with 70% confidence level | | | | |
| R2 | 0.26 | 0.33 | 0.19 | 0.22 |
| MAE | 0.65 | 0.54 | 0.76 | 0.65 |
| Coverage (%) | 100 | 62 | 100 | 62 |
| Prediction of 1,562 RTECS compounds with 90% confidence level | | | | |
| R2 | 0.42 | 0.62 | 0.19 | 0.26 |
| MAE | 0.60 | 0.42 | 0.84 | 0.66 |
| Coverage (%) | 12 | 6 | 12 | 6 |
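The MAE and coverage rows of the comparison can be understood through a small sketch. It assumes that MAE is the mean absolute difference between experimental and predicted values over the compounds retained for prediction, and that coverage is the percentage of compounds retained; the `in_domain` flags stand in for an applicability-domain test that is not described in this record, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def mae_and_coverage(
    observed: Sequence[float],
    predicted: Sequence[float],
    in_domain: Sequence[bool],
) -> Tuple[float, float]:
    """Return (MAE over retained compounds, coverage in percent)."""
    if not (len(observed) == len(predicted) == len(in_domain)) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Keep only compounds flagged as inside the (assumed) applicability domain.
    kept = [(o, p) for o, p, d in zip(observed, predicted, in_domain) if d]
    if not kept:
        raise ValueError("no compounds fall inside the applicability domain")
    mae = sum(abs(o - p) for o, p in kept) / len(kept)
    coverage = 100.0 * len(kept) / len(observed)
    return mae, coverage


# Toy example: two of four compounds fall inside the domain.
mae, coverage = mae_and_coverage(
    [1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.0, 4.0], [True, True, False, False]
)
```

Setting every flag to `True` corresponds to the "no applicability domain" columns, where coverage is 100% by construction.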
Figure 4. Fraction of compounds versus prediction errors obtained by the two-step rat LD50 model, TOPKAT, and random sampling for 965 and 101 RTECS compounds.