Domenico Gadaleta1, Kristijan Vuković2, Cosimo Toma2,3, Giovanna J Lavado2, Agnes L Karmaus4, Kamel Mansouri4, Nicole C Kleinstreuer5, Emilio Benfenati2, Alessandra Roncaglioni2.
Abstract
The median lethal dose for rodent oral acute toxicity (LD50) is a standard piece of information required to categorize chemicals in terms of the potential hazard posed to human health after acute exposure. The exclusive use of in vivo testing is limited by the time and costs required for performing experiments and by the need to sacrifice a number of animals. (Quantitative) structure-activity relationships [(Q)SAR] have proved a valid alternative to reduce and assist in vivo assays for assessing acute toxicological hazard. In the framework of a new international collaborative project, the NTP Interagency Center for the Evaluation of Alternative Toxicological Methods and the U.S. Environmental Protection Agency's National Center for Computational Toxicology compiled a large database of rat acute oral LD50 data, with the aim of supporting the development of new computational models for predicting five regulatory-relevant acute toxicity endpoints. In this article, a series of regression and classification computational models were developed by employing different statistical and knowledge-based methodologies. External validation was performed to demonstrate the real-life predictability of the models. Integrated modeling was then applied to improve the performance of single models. Statistical results confirmed the relevance of the developed models in regulatory frameworks, and confirmed the effectiveness of integrated modeling. The best integrated strategies reached RMSEs lower than 0.50, and the best classification models reached balanced accuracies over 0.70 for multi-class and over 0.80 for binary endpoints. Computed predictions will be hosted on the EPA's Chemistry Dashboard and made freely available to the scientific community.
Keywords: (Q)SAR; Acute rat oral toxicity; Computational toxicology; Integrated modeling; LD50
Year: 2019 PMID: 33430989 PMCID: PMC6717335 DOI: 10.1186/s13321-019-0383-2
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Number of chemicals included in the TS and the ES for each toxicity class
| Endpoint | TS: # | TS: Class 1 | TS: Class 2 | TS: Class 3 | TS: Class 4 | TS: Class 5 | ES: # | ES: Class 1 | ES: Class 2 | ES: Class 3 | ES: Class 4 | ES: Class 5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LD50 | 6279 | – | – | – | – | – | 2169 | – | – | – | – | – |
| vT | 8462 | 703 | 7759 | – | – | – | 2888 | 242 | 2646 | – | – | – |
| nT | 8402 | 4779 | 3623 | – | – | – | 2884 | 1651 | 1233 | – | – | – |
| EPA | 8259 | 703 | 1806 | 4142 | 1608 | – | 2859 | 242 | 643 | 1421 | 553 | – |
| GHS | 8331 | 165 | 538 | 1089 | 2916 | 3623 | 2879 | 58 | 184 | 391 | 1013 | 1233 |
Hazard categories for the two multi-class modeled endpoints (EPA and GHS classes) are sorted by decreasing toxicity, from 1 to 5. For the vT classification, class 1 corresponds to the positive (i.e., very toxic) compounds, while for the nT endpoint class 1 corresponds to the negative (i.e., toxic) compounds.
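To make the relationship between the continuous LD50 value and the four categorical endpoints concrete, the following minimal sketch assigns class labels from an oral LD50 in mg/kg. The cut-offs are the commonly cited GHS and EPA acute oral category limits and are an assumption here, since this record does not state them. The vT and nT definitions (EPA class 1 and GHS class 5, respectively) are inferred from the matching class counts in the table above. The function and constant names are hypothetical.

```python
from bisect import bisect_left
from typing import Sequence

# Inclusive upper bounds in mg/kg (assumed cut-offs, not taken from this record).
GHS_UPPER_BOUNDS = (5.0, 50.0, 300.0, 2000.0)  # classes 1-4; above 2000 -> class 5
EPA_UPPER_BOUNDS = (50.0, 500.0, 5000.0)       # classes 1-3; above 5000 -> class 4


def hazard_class(ld50_mg_per_kg: float, upper_bounds: Sequence[float]) -> int:
    """Return the 1-based hazard class; class 1 is the most toxic."""
    if ld50_mg_per_kg <= 0:
        raise ValueError("LD50 must be positive")
    # bisect_left counts the bounds strictly below the value, so a value
    # equal to a bound stays in the more toxic class.
    return bisect_left(upper_bounds, ld50_mg_per_kg) + 1


def is_very_toxic(ld50_mg_per_kg: float) -> bool:
    """vT positive: assumed to coincide with EPA class 1."""
    return hazard_class(ld50_mg_per_kg, EPA_UPPER_BOUNDS) == 1


def is_nontoxic(ld50_mg_per_kg: float) -> bool:
    """nT positive: assumed to coincide with GHS class 5."""
    return hazard_class(ld50_mg_per_kg, GHS_UPPER_BOUNDS) == 5
```

Under these assumed cut-offs, a compound with an LD50 of 300 mg/kg falls in GHS class 3 and EPA class 2, and is neither very toxic nor nontoxic.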
Summary of modeling methods used
| Method | Software | Descriptors | Applicability domain | LD50 point estimate | vT | nT | EPA | GHS |
|---|---|---|---|---|---|---|---|---|
| BRF/rRF | KNIME | Dragon | “Error” model, confidence, similarity | ✓ | ✓ | ✓ | ✓ | ✓ |
| aiQSAR | R | Dragon | ADM | ✓ | ✓ | ✓ | ✓ | ✓ |
| istKNN | istKNN | Fingerprints + structural keys | Similarity/activity-based thresholds | ✓ | | | | |
| SARpy | SARpy | SAs | Presence/absence of SAs | | ✓ | ✓ | | |
| HPT-RF | R (Caret) | Dragon | Mirror matrix, isolation forest | ✓ | | | ✓ | ✓ |
| GLM | R (H2O) | Dragon | NA | | | ✓ | | |
For each method, the software, the descriptors used, the applicability domain definition and the modeled endpoints are specified. The methods listed are balanced random forest (BRF)/regression random forest (rRF); ab initio QSAR (aiQSAR); istKNN; hyper-parameter tuning random forest (HPT-RF); generalized linear model (GLM)
Fig. 1 Workflow for aiQSAR-based model development (adapted from [34])
rRF and BRF settings
| Endpoint | Model | iTS | iVS | #descrs | #trees | k | TC | TD | TE |
|---|---|---|---|---|---|---|---|---|---|
| LD50 point estimate | rRF | 5028 | 1251 | 1352 | 150 | 1 | – | 90th | 1.00 |
| nT | BRF | 6722 | 1680 | 1247 | 150 | 1 | 0.65 | 100th | – |
| vT | BRF | 6772 | 1690 | 1250 | 100 | 1 | 0.65 | 95th | – |
| EPA | BRF | 6607 | 1652 | 1243 | 100 | 1 | 0.40 | 100th | – |
| GHS | BRF | 6663 | 1668 | 1244 | 100 | 5 | 0.30 | 90th | – |
For each model, the size of the internal training set (iTS) and internal validation set (iVS), the number of descriptors (#descrs), the number of trees (#trees) and the tuned parameters for AD definition are indicated
HPT-RF settings
| Endpoint | mtry | splitrule | min.node.size |
|---|---|---|---|
| LD50 | 748 | Extratrees | 5 |
| EPA | 38 | Extratrees | 1 |
| GHS | 38 | Extratrees | 1 |
For each model, the number of descriptors in each tree (mtry), the rule for descriptor selection for single trees (splitrule) and the minimal node size of trees (min.node.size) are indicated
External performance of single models for predicting single point logLD50 (mmol/kg)
| Model | R2 | MAE | RMSE | #AD | %AD |
|---|---|---|---|---|---|
| rRF | 0.590 | 0.432 | 0.585 | 1966 | |
| aiQSAR | | 0.390 | | 1843 | 0.850 |
| istKNN | 0.628 | | 0.545 | 1917 | 0.884 |
| HPT-RF | 0.620 | 0.398 | | 1885 | 0.869 |
For each model, the R2, the mean absolute error (MAE), the root-mean squared error (RMSE), the number (#AD) and the percentage (%AD) of predictions in AD are reported. The best values for each metric are italicized
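The regression statistics named in this caption can be sketched as follows. This is a minimal illustration, assuming that R2 is the coefficient of determination (the record does not give the exact formula used) and that %AD is the number of in-domain predictions divided by the size of the evaluation set. Function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute R2, MAE and RMSE for paired experimental and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)        # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,           # assumed definition of R2
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(ss_res / n),
    }


def coverage(n_in_domain: int, n_total: int) -> float:
    """Fraction of compounds predicted inside the applicability domain (%AD)."""
    return n_in_domain / n_total
```

For example, `coverage(1843, 2169)` reproduces the aiQSAR %AD of 0.850 after rounding to three decimals, using the evaluation-set size reported for the LD50 endpoint.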
External performance of single models for predicting classification endpoints (vT, nT, EPA, GHS)
| Endpoint | Model | SEN | SPE | MCC | BA | #AD | %AD |
|---|---|---|---|---|---|---|---|
| nT | BRF | | | | | 2100 | 0.728 |
| nT | aiQSAR | 0.723 | 0.829 | 0.556 | 0.776 | 2567 | 0.890 |
| nT | SARpy | 0.772 | 0.724 | 0.492 | 0.748 | 2488 | 0.863 |
| nT | GLM | 0.779 | 0.650 | 0.425 | 0.714 | 2884 | |
| vT | BRF | | 0.903 | 0.585 | | 2103 | 0.728 |
| vT | aiQSAR | 0.682 | | | 0.822 | 2572 | 0.891 |
| vT | SARpy | 0.710 | 0.896 | 0.467 | 0.803 | 2613 | |
| EPA | BRF | 0.614 | 0.851 | 0.405 | 0.733 | 2301 | 0.805 |
| EPA | aiQSAR | 0.603 | 0.857 | 0.450 | 0.730 | 2547 | |
| EPA | HPT-RF | | | | | 2180 | 0.763 |
| GHS | BRF | 0.539 | 0.872 | 0.342 | 0.705 | 1410 | 0.490 |
| GHS | aiQSAR | 0.568 | 0.895 | 0.469 | 0.731 | 1475 | |
| GHS | HPT-RF | | | | | 1291 | 0.448 |
For each model, the sensitivity (SEN), the specificity (SPE), the Matthews correlation coefficient (MCC), the balanced accuracy (BA), the number (#AD) and the percentage (%AD) of predictions in AD are reported. For multi-category endpoints (EPA and GHS), SEN and SPE are the average of values computed separately for each class, while BA is the arithmetic mean of the average SEN and SPE. The best values for each metric are italicized
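The averaging scheme described in this caption can be sketched as below. The binary function uses the standard confusion-matrix definitions. The multi-category function follows the caption by computing SEN and SPE one class at a time (one-vs-rest, which is an assumption about how "separately for each class" was implemented) and then taking BA as the mean of the two averages. The multi-class MCC is left out because the record does not say which generalization was used. Function names are hypothetical.

```python
import math
from typing import Dict, Hashable, Sequence


def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """SEN, SPE, BA and MCC from the cells of a 2x2 confusion matrix."""
    sen = tp / (tp + fn)
    spe = tn / (tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SEN": sen, "SPE": spe, "BA": (sen + spe) / 2, "MCC": mcc}


def multiclass_metrics(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[str, float]:
    """Class-averaged SEN and SPE (one-vs-rest) and their arithmetic mean (BA)."""
    classes = sorted(set(y_true))
    if len(classes) < 2:
        raise ValueError("at least two classes are needed")
    n = len(y_true)
    sens, spes = [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        tn = n - tp - fn - fp
        sens.append(tp / (tp + fn))
        spes.append(tn / (tn + fp))
    sen = sum(sens) / len(sens)
    spe = sum(spes) / len(spes)
    return {"SEN": sen, "SPE": spe, "BA": (sen + spe) / 2}
```

Note that `multiclass_metrics` is intended for the EPA and GHS endpoints only; applied to a binary endpoint it would average over both classes and therefore return identical SEN and SPE values.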
External performance of the continuous integrated model for predicting single point logLD50 (mmol/kg)
| R2 | MAE | RMSE | #AD | %AD | PF |
|---|---|---|---|---|---|
| 0.632 | 0.397 | 0.549 | 2152 | 0.992 | 0.25 |
| 0.646 | 0.390 | 0.535 | 2085 | 0.961 | 0.50 |
| 0.675 | 0.373 | 0.512 | 1900 | 0.876 | 0.75 |
| 0.716 | 0.348 | 0.477 | 1474 | 0.680 | 1.00 |
The R2, the mean absolute error (MAE), the root-mean squared error (RMSE), the number (#AD) and the percentage (%AD) of predictions in AD are reported, with respect to the PF threshold for defining predictions in AD
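The role of the PF threshold can be illustrated with a small sketch. It assumes that the prediction fraction is the share of single models for which a compound falls inside the applicability domain, and that the integrated value is the plain average of those in-domain predictions. Both points are assumptions, since this record reports only the thresholds and the resulting statistics; the PF steps of 0.25 are at least consistent with four single regression models. The function name is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def integrate_predictions(
    predictions: Sequence[Optional[float]],
    pf_threshold: float,
) -> Tuple[Optional[float], float]:
    """Combine single-model predictions for one compound.

    `predictions` holds one entry per single model; None marks a model
    whose applicability domain does not cover the compound.
    Returns (integrated prediction or None, prediction fraction).
    """
    if not predictions:
        raise ValueError("at least one single model is required")
    in_domain = [p for p in predictions if p is not None]
    pf = len(in_domain) / len(predictions)
    if not in_domain or pf < pf_threshold:
        # The compound is treated as outside the integrated AD.
        return None, pf
    # Assumed integration rule: unweighted mean of in-domain predictions.
    return sum(in_domain) / len(in_domain), pf
```

With two of four models in domain, a compound has PF = 0.50: it is kept at thresholds 0.25 and 0.50 and discarded at 0.75 and 1.00, which mirrors the drop in #AD as the threshold rises.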
Fig. 2 Comparison of experimental and predicted (integrated) logLD50 (mmol/kg bw) for ES chemicals. Darkest circles indicate samples with a high prediction fraction (PF), while lightest circles indicate samples with lower PF. Dashed line represents the case of ideal correlation, while dotted lines delimit samples with an absolute error in prediction lower than 1.00 log unit
External performance of integrated models for predicting classification endpoints (vT, nT, EPA, GHS)
| Endpoint | SEN | SPE | MCC | BA | #AD | %AD | CS |
|---|---|---|---|---|---|---|---|
| nT | 0.794 | 0.796 | 0.587 | 0.795 | 2665 | 0.924 | 1 |
| nT | 0.840 | 0.841 | 0.677 | 0.840 | 2182 | 0.757 | 2 |
| nT | 0.878 | 0.858 | 0.733 | 0.868 | 1704 | 0.591 | 3 |
| nT | 0.913 | 0.883 | 0.794 | 0.898 | 1222 | 0.424 | 4 |
| vT | 0.743 | 0.938 | 0.577 | 0.840 | 2742 | 0.949 | 1 |
| vT | 0.796 | 0.976 | 0.737 | 0.886 | 2316 | 0.802 | 2 |
| vT | 0.890 | 0.978 | 0.820 | 0.934 | 1556 | 0.539 | 3 |
| EPA | 0.602 | 0.856 | 0.439 | 0.729 | 2653 | 0.928 | 1 |
| EPA | 0.701 | 0.885 | 0.550 | 0.793 | 1731 | 0.605 | 2 |
| EPA | 0.739 | 0.898 | 0.600 | 0.819 | 1200 | 0.420 | 3 |
| GHS | 0.567 | 0.894 | 0.461 | 0.731 | 1561 | 0.542 | 1 |
| GHS | 0.644 | 0.911 | 0.541 | 0.777 | 908 | 0.315 | 2 |
| GHS | 0.676 | 0.916 | 0.573 | 0.796 | 617 | 0.214 | 3 |
For each model, the sensitivity (SEN), the specificity (SPE), the Matthews correlation coefficient (MCC), the balanced accuracy (BA), the number (#AD) and the percentage (%AD) of predictions in AD are reported, with respect to the CS threshold for defining predictions in AD. For multi-category endpoints (EPA and GHS), SEN and SPE are the average of sensitivities/specificities computed separately for each class, while BA is the arithmetic mean of the average SEN and SPE
Fig. 3 Overview of Pareto optimal solutions for regression and classification models. a Performance of continuous LD50 models is described as root-mean squared error (RMSE) versus percentage of compounds in the AD (%AD). b Performance of classification (nT, vT, EPA, GHS) models is described as balanced accuracy (BA) versus percentage of compounds in the AD (%AD). All the parameters refer to the ES. Models in the bottom-left part of the plots are characterized by the best compromise in terms of performance and coverage, with dotted lines representing the Pareto front for a given endpoint. White indicators are single models (R = rRF; B = BRF; H = HPT-RF; K = istKNN; S = SARpy; Q = aiQSAR), while black indicators are integrated models, flagged with the corresponding PF (for regression) or CS (for classification) threshold
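The notion of a Pareto front used in this figure can be sketched for the regression case, where a model is preferred when it has a lower RMSE and a higher coverage. The sketch below uses the four integrated-model rows and the one single model (istKNN) for which both RMSE and %AD are available in the tables of this record. The dominance rule is the standard one and is an assumption about how the figure was built; class and function names are hypothetical.

```python
from typing import List, NamedTuple


class ModelPoint(NamedTuple):
    name: str
    rmse: float      # lower is better
    coverage: float  # fraction of ES compounds in the AD; higher is better


def dominates(a: ModelPoint, b: ModelPoint) -> bool:
    """True if `a` is at least as good as `b` on both criteria and better on one."""
    no_worse = a.rmse <= b.rmse and a.coverage >= b.coverage
    strictly_better = a.rmse < b.rmse or a.coverage > b.coverage
    return no_worse and strictly_better


def pareto_front(points: List[ModelPoint]) -> List[ModelPoint]:
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Values copied from the regression tables in this record.
POINTS = [
    ModelPoint("integrated PF=0.25", 0.549, 0.992),
    ModelPoint("integrated PF=0.50", 0.535, 0.961),
    ModelPoint("integrated PF=0.75", 0.512, 0.876),
    ModelPoint("integrated PF=1.00", 0.477, 0.680),
    ModelPoint("istKNN", 0.545, 0.884),
]

front_names = [p.name for p in pareto_front(POINTS)]
```

On these five points the front contains the four integrated models, while istKNN is dominated by the integrated model at PF = 0.50, which has both a lower RMSE and a higher coverage.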
External performance of the continuous integrated model for separate PFs
| R2 | MAE | RMSE | # | % | %AE ≥ 1 | PF |
|---|---|---|---|---|---|---|
| 0.265 | 0.620 | 0.865 | 67 | 0.031 | 0.209 | 0.25 |
| 0.391 | 0.568 | 0.736 | 185 | 0.085 | 0.151 | 0.50 |
| 0.540 | 0.458 | 0.616 | 426 | 0.196 | 0.085 | 0.75 |
| 0.716 | 0.348 | 0.477 | 1474 | 0.680 | 0.043 | 1.00 |
The R2, the mean absolute error (MAE), the root-mean squared error (RMSE), the number (#) and the percentage (%) of predictions with a given prediction fraction (PF) are reported. In addition, the percentage of samples with a given PF value and an absolute error in prediction equal to or greater than 1.00 log unit, with respect to the total number of samples with the same PF (%AE ≥ 1), is reported
External performance of other published acute toxicity models developed within the NICEATM and EPA’s NCCT collaborative project
| Method | LD50 single point | nT | vT | EPA | GHS |
|---|---|---|---|---|---|
| Multifingerprint similarity search [ | R2 = 0.737; RMSE = 0.408; %AD = 0.347 | SEN = 0.873; SPE = 0.915; MCC = 0.793; %AD = 0.263 | SEN = 0.789; SPE = 0.998; MCC = 0.857; %AD = 0.320 | MCC = 0.730; %AD = 0.159 | MCC = 0.733; %AD = 0.223 |
| Bayesian consensus [ | – | SEN = 0.800; SPE = 0.840; BA = 0.820; %AD = 0.730 | SEN = 0.850; SPE = 0.940; BA = 0.900; %AD = 0.770 | – | – |
For each method, performance on the five relevant acute toxicity endpoints is reported