Jing Lu, Jianlong Peng, Jinan Wang, Qiancheng Shen, Yi Bi, Likun Gong, Mingyue Zheng, Xiaomin Luo, Weiliang Zhu, Hualiang Jiang, Kaixian Chen.
Abstract
BACKGROUND: Acute toxicity is the ability of a substance to cause adverse effects within a short period following dosing or exposure, and its assessment is usually the first step in the toxicological investigation of unknown substances. The median lethal dose, LD50, is frequently used as a general indicator of a substance's acute toxicity, and there is high demand for non-animal-based prediction of LD50. Unfortunately, it is difficult to predict compound LD50 accurately with a single QSAR model, because acute toxicity may involve complex mechanisms and multiple biochemical processes.
Keywords: Acute toxicity; Applicability domain; Consensus model; Local lazy learning
Year: 2014 PMID: 24959207 PMCID: PMC4047767 DOI: 10.1186/1758-2946-6-26
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Performance of four LLL models using different similarity metrics on the test set using reference set I (Group I) versus the best model of the reference
| LLR | ECFP4 | 0.347 | 0.622 | 0.415 | 0.589 | 0.437 | 0.580 | 0.647 | 0.474 |
| | FCFP4 | 0.333 | 0.629 | 0.433 | 0.589 | 0.460 | 0.575 | 0.649 | 0.477 |
| | MACCS | 0.317 | 0.640 | 0.386 | 0.608 | 0.407 | 0.599 | 0.644 | 0.506 |
| | DES | 0.255 | 0.667 | 0.348 | 0.608 | 0.381 | 0.588 | 0.578 | 0.444 |
| | LLR_consensus | 0.434 | 0.535 | 0.511 | 0.508 | 0.534 | 0.499 | 0.738 | 0.401 |
| SA | ECFP4 | 0.390 | 0.563 | 0.464 | 0.535 | 0.486 | 0.526 | 0.688 | 0.427 |
| | FCFP4 | 0.363 | 0.581 | 0.469 | 0.543 | 0.494 | 0.527 | 0.674 | 0.441 |
| | MACCS | 0.379 | 0.573 | 0.452 | 0.546 | 0.467 | 0.542 | 0.679 | 0.452 |
| | DES | 0.329 | 0.591 | 0.419 | 0.539 | 0.448 | 0.523 | 0.627 | 0.407 |
| | SA_consensus | 0.450 | 0.528 | 0.527 | 0.493 | 0.544 | 0.490 | 0.746 | 0.394 |
| SR | ECFP4 | 0.406 | 0.555 | 0.483 | 0.524 | 0.508 | 0.514 | 0.719 | 0.408 |
| | FCFP4 | 0.376 | 0.575 | 0.485 | 0.536 | 0.513 | 0.519 | 0.701 | 0.426 |
| | MACCS | 0.383 | 0.570 | 0.457 | 0.543 | 0.473 | 0.539 | 0.690 | 0.445 |
| | DES | 0.329 | 0.590 | 0.419 | 0.539 | 0.448 | 0.523 | 0.627 | 0.407 |
| | SR_consensus | 0.457 | 0.515 | 0.536 | 0.489 | 0.555 | 0.485 | 0.761 | 0.384 |
| GP | ECFP4 | 0.413 | 0.550 | 0.491 | 0.519 | 0.514 | 0.510 | 0.719 | 0.406 |
| | FCFP4 | 0.367 | 0.583 | 0.477 | 0.543 | 0.504 | 0.527 | 0.689 | 0.437 |
| | MACCS | 0.360 | 0.586 | 0.443 | 0.556 | 0.461 | 0.551 | 0.692 | 0.451 |
| | DES | 0.329 | 0.586 | 0.421 | 0.534 | 0.450 | 0.518 | 0.627 | 0.412 |
| | GP_consensus | 0.460 | 0.512 | 0.545 | 0.483 | 0.565 | 0.477 | 0.771 | 0.371 |
| Final_consensus | | 0.466 | 0.510 | 0.545 | 0.483 | 0.565 | 0.478 | 0.769 | 0.374 |
| Reference | Individual (a) | n/a (b) | n/a (b) | 0.41 | 0.55 | 0.41 | 0.56 | 0.70 | 0.41 |
| | Consensus | n/a | n/a | 0.42 | 0.52 | 0.48 | 0.51 | 0.71 | 0.39 |
(a) For each coverage set, the best result among all individual models reported by Zhu et al. [7] is shown.
(b) The reference did not report the prediction results for all compounds in the test set.
Figure 1. Example showing how the consensus model improves the prediction results. Comparison of estimates for an herbicide (CAS: 89427-44-1) using individual and consensus models from Group I. The y-axis shows the prediction error.
Performance of our models on the test set using the reference set II (Group II)
| LLR | ECFP4 | 0.513 | 0.495 | 0.623 | 0.439 | 0.657 | 0.415 | 0.867 | 0.222 |
| | FCFP4 | 0.481 | 0.513 | 0.604 | 0.457 | 0.640 | 0.435 | 0.818 | 0.299 |
| | MACCS | 0.476 | 0.511 | 0.568 | 0.458 | 0.610 | 0.431 | 0.833 | 0.282 |
| | DES | 0.459 | 0.525 | 0.594 | 0.451 | 0.633 | 0.428 | 0.824 | 0.259 |
| | LLR_consensus | 0.608 | 0.420 | 0.711 | 0.371 | 0.743 | 0.354 | 0.908 | 0.186 |
| SA | ECFP4 | 0.519 | 0.498 | 0.610 | 0.468 | 0.634 | 0.457 | 0.731 | 0.400 |
| | FCFP4 | 0.492 | 0.513 | 0.593 | 0.481 | 0.619 | 0.470 | 0.716 | 0.438 |
| | MACCS | 0.484 | 0.517 | 0.573 | 0.485 | 0.608 | 0.470 | 0.745 | 0.397 |
| | DES | 0.472 | 0.522 | 0.576 | 0.477 | 0.610 | 0.462 | 0.745 | 0.361 |
| | SA_consensus | 0.577 | 0.452 | 0.664 | 0.424 | 0.690 | 0.416 | 0.779 | 0.365 |
| SR | ECFP4 | 0.551 | 0.475 | 0.650 | 0.438 | 0.676 | 0.424 | 0.834 | 0.319 |
| | FCFP4 | 0.520 | 0.495 | 0.627 | 0.458 | 0.656 | 0.444 | 0.772 | 0.391 |
| | MACCS | 0.494 | 0.510 | 0.585 | 0.476 | 0.621 | 0.460 | 0.765 | 0.381 |
| | DES | 0.473 | 0.521 | 0.576 | 0.477 | 0.610 | 0.461 | 0.747 | 0.360 |
| | SR_consensus | 0.593 | 0.442 | 0.683 | 0.411 | 0.710 | 0.402 | 0.825 | 0.330 |
| GP | ECFP4 | 0.587 | 0.436 | 0.684 | 0.391 | 0.709 | 0.375 | 0.928 | 0.161 |
| | FCFP4 | 0.547 | 0.462 | 0.659 | 0.413 | 0.694 | 0.394 | 0.844 | 0.278 |
| | MACCS | 0.489 | 0.493 | 0.585 | 0.451 | 0.630 | 0.428 | 0.845 | 0.292 |
| | DES | 0.501 | 0.494 | 0.610 | 0.445 | 0.647 | 0.426 | 0.817 | 0.297 |
| | GP_consensus | 0.623 | 0.413 | 0.717 | 0.374 | 0.744 | 0.362 | 0.916 | 0.213 |
| Final_consensus | | 0.619 | 0.422 | 0.712 | 0.385 | 0.740 | 0.374 | 0.885 | 0.265 |
Figure 2. Distribution of R values for the prediction of the whole test set (“Set_3874”) using different reference sets.
Results for the test compound flocoumafen (CAS: 90035-08-8) and its neighbors from reference sets I and II
| Compound | Experimental –log(LD50) (a) | Tanimoto distance (b) | Predicted –log(LD50), best individual model (c) | Predicted –log(LD50), final consensus model (d) |
|---|---|---|---|---|
| Flocoumafen | 6.336 | --- | --- | --- |
| Neighbors from reference set I | 4.068 | 0.543 | 3.435 | 3.085 |
| | 3.129 | 0.691 | | |
| | 2.687 | 0.714 | | |
| | 3.147 | 0.720 | | |
| | 3.166 | 0.743 | | |
| Neighbors from reference set II | 6.218 | 0.169 | 6.295 | 5.948 |
| | 5.807 | 0.177 | | |
| | 5.743 | 0.194 | | |
| | 5.686 | 0.338 | | |
| | 5.873 | 0.343 | | |
(a) Experimental –log(LD50).
(b) The Tanimoto distance of flocoumafen to its neighbor using ECFP4.
(c) The –log(LD50) value predicted by the best individual model.
(d) The –log(LD50) value predicted by the final consensus model.
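The neighbor values in the table above suggest why reference set II serves flocoumafen better: its five nearest neighbors are closer in ECFP4 Tanimoto distance and much nearer in experimental –log(LD50). As a rough illustration only, the sketch below takes the unweighted mean of the five neighbors' experimental values from each reference set and compares it with the experimental value of 6.336. The plain average is an assumption made for illustration and is not one of the models evaluated here; the name `mean_neighbor_estimate` is hypothetical.

```python
from statistics import mean

# Experimental -log(LD50) of flocoumafen, copied from the table above.
EXPERIMENTAL = 6.336

# (experimental -log(LD50), ECFP4 Tanimoto distance) for the five neighbors
# listed in the table for each reference set.
NEIGHBORS = {
    "reference set I": [
        (4.068, 0.543), (3.129, 0.691), (2.687, 0.714),
        (3.147, 0.720), (3.166, 0.743),
    ],
    "reference set II": [
        (6.218, 0.169), (5.807, 0.177), (5.743, 0.194),
        (5.686, 0.338), (5.873, 0.343),
    ],
}


def mean_neighbor_estimate(neighbors):
    """Unweighted mean of the neighbors' experimental values.

    Illustrative assumption only: this is NOT the LLR, SA, SR, or GP model.
    """
    return mean(value for value, _distance in neighbors)


for name, neighbors in NEIGHBORS.items():
    estimate = mean_neighbor_estimate(neighbors)
    error = abs(estimate - EXPERIMENTAL)
    print(f"{name}: estimate={estimate:.3f}, absolute error={error:.3f}")
```

Under this simplification the neighbor mean is about 3.24 for reference set I and about 5.87 for reference set II, so the absolute error drops from roughly 3.10 to roughly 0.47 log units, in line with the direction of the improvement reported in the table.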
Figure 3. The MAEs of all individual models for the test set in Group II. all (blue): the MAEs of all compounds in the test set; within (red): the MAEs of compounds within the AD; outside (green): the MAEs of compounds outside the AD; comp: the number of compounds within the AD.
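The four quantities named in the Figure 3 caption (all, within, outside, comp) can be sketched as follows. How a compound is assigned to the applicability domain (AD) is not stated in this excerpt, so the sketch simply takes a precomputed boolean mask as input. The function names and the input values are hypothetical and made up for illustration; they are not taken from the paper.

```python
def mae(pairs):
    """Mean absolute error over (true, predicted) pairs; NaN if empty."""
    pairs = list(pairs)
    if not pairs:
        return float("nan")
    return sum(abs(t - p) for t, p in pairs) / len(pairs)


def mae_by_domain(y_true, y_pred, in_domain):
    """Split the MAE by applicability-domain membership.

    in_domain is a list of booleans; how it is derived is left open here.
    """
    rows = list(zip(y_true, y_pred, in_domain))
    within = [(t, p) for t, p, inside in rows if inside]
    outside = [(t, p) for t, p, inside in rows if not inside]
    return {
        "all": mae((t, p) for t, p, _ in rows),
        "within": mae(within),
        "outside": mae(outside),
        "comp": len(within),  # number of compounds inside the AD
    }


# Made-up values for illustration only (not results from the paper).
y_true = [2.0, 3.0, 4.0, 5.0]
y_pred = [2.5, 3.0, 3.0, 7.0]
in_domain = [True, True, False, False]

summary = mae_by_domain(y_true, y_pred, in_domain)
print(summary)
```

Reporting `comp` next to the within-AD error matters because a stricter domain can lower the within-AD MAE simply by covering fewer compounds.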
Figure 4. Flowchart of the LLR modeling.
Figure 5. Triangle inequality. A triangle can be constructed for any three compounds by using the Euclidean or Tanimoto distance as the edge length. D is the projection of vertex C onto line AB.
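The construction in the Figure 5 caption can be sketched in a few lines. The code below assumes fingerprints are represented as sets of "on" bit indices, computes the Tanimoto distance as one minus the Jaccard similarity, checks the triangle inequality for three toy fingerprints, and locates the projection D with the law of cosines. The fingerprints and the helper names (`tanimoto_distance`, `projection_length`) are hypothetical, and how the triangle is used in the modeling workflow is not covered by this excerpt.

```python
import math


def tanimoto_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of 'on' fingerprint bits."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / len(union)


def projection_length(ab, ac, bc):
    """Signed length AD, where D is the projection of C onto line AB.

    Follows from the law of cosines: AD = (AC^2 + AB^2 - BC^2) / (2 * AB).
    """
    return (ac ** 2 + ab ** 2 - bc ** 2) / (2.0 * ab)


# Toy fingerprints for illustration only.
A = {1, 2, 3}
B = {2, 3, 4}
C = {3, 4, 5}

d_ab = tanimoto_distance(A, B)
d_ac = tanimoto_distance(A, C)
d_bc = tanimoto_distance(B, C)

# Each edge is no longer than the sum of the other two.
assert d_ac <= d_ab + d_bc
assert d_ab <= d_ac + d_bc
assert d_bc <= d_ab + d_ac

ad = projection_length(d_ab, d_ac, d_bc)
cd = math.sqrt(max(d_ac ** 2 - ad ** 2, 0.0))  # height of C above line AB
print(f"AB={d_ab:.2f} AC={d_ac:.2f} BC={d_bc:.2f} AD={ad:.2f} CD={cd:.2f}")
```

For these toy sets the edges are AB = 0.5, BC = 0.5, and AC = 0.8, which gives AD = 0.64 and CD = 0.48. Because AD exceeds AB here, D falls on the extension of AB beyond B, which is why the caption speaks of the line AB rather than the segment.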