Xingmei Chen, Limin Dang, Hai Yang, Xianwei Huang, Xinliang Yu.
Abstract
Predicting the acute toxicity of a large dataset of diverse chemicals against fathead minnows (Pimephales promelas) is challenging. In this paper, 963 organic compounds with acute toxicity towards fathead minnows were split into a training set (482 compounds) and a test set (481 compounds), an approximate ratio of 1:1. Only six molecular descriptors were used to establish the quantitative structure-activity/toxicity relationship (QSAR/QSTR) model for 96 hour pLC50 through a support vector machine (SVM) combined with a genetic algorithm. The optimal SVM model (R² = 0.756) was verified using both internal (leave-one-out cross-validation) and external validation. The validation results (q²_int = 0.699 and q²_ext = 0.744) were satisfactory in predicting acute toxicity in fathead minnows compared with other models reported in the literature, even though our SVM model uses only six molecular descriptors and was evaluated on a large test set of 481 compounds. This journal is © The Royal Society of Chemistry.
Year: 2020 PMID: 35517078 PMCID: PMC9056962 DOI: 10.1039/d0ra05906d
Source DB: PubMed Journal: RSC Adv ISSN: 2046-2069 Impact factor: 4.036
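The internal (leave-one-out) and external validation statistics quoted in the abstract can be illustrated with a minimal sketch. It assumes the common definitions q²_int = 1 − PRESS/SS_tot on the training set and q²_ext = 1 − Σ(y − ŷ)²/Σ(y − ȳ_train)² on the test set; the record does not say which variants the authors used. A one-descriptor least-squares line stands in for the SVM, and all numbers are made-up toy values, not data from the paper.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares with one descriptor: returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def q2_loo(x, y):
    """Leave-one-out q2: refit without compound i, then predict compound i."""
    press = 0.0
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    y_bar = mean(y)
    return 1.0 - press / sum((yi - y_bar) ** 2 for yi in y)

def q2_ext(y_test, y_pred, y_train_mean):
    """External q2, referenced to the training-set mean (one common variant)."""
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_test, y_pred))
    ss_tot = sum((yt - y_train_mean) ** 2 for yt in y_test)
    return 1.0 - ss_res / ss_tot

# Toy values (hypothetical): a descriptor such as logP against pLC50.
x_train = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y_train = [2.1, 2.4, 3.1, 3.3, 3.9, 4.2]
x_test = [0.8, 1.8, 2.8]
y_test = [2.3, 3.2, 4.0]

slope, intercept = fit_line(x_train, y_train)
y_pred = [slope * xi + intercept for xi in x_test]
q2_internal = q2_loo(x_train, y_train)
q2_external = q2_ext(y_test, y_pred, mean(y_train))
print(f"q2_int = {q2_internal:.3f}, q2_ext = {q2_external:.3f}")
```

For a real model the same two functions would be called with the 482 training and 481 test compounds, with the refit step replaced by whatever learner is being validated.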
Model summary obtained with stepwise MLR
| Model | R | R² | Adjusted R² | Std. error of the estimate |
|---|---|---|---|---|
| 1 | 0.722 | 0.522 | 0.521 | 1.005398 |
| 2 | 0.773 | 0.598 | 0.597 | 0.922460 |
| 3 | 0.791 | 0.626 | 0.624 | 0.890379 |
| 4 | 0.806 | 0.649 | 0.648 | 0.862459 |
| 5 | 0.821 | 0.674 | 0.672 | 0.831622 |
| 6 | 0.830 | 0.689 | 0.687 | 0.812323 |
| 7 | 0.836 | 0.699 | 0.697 | 0.800239 |
| 8 | 0.840 | 0.706 | 0.703 | 0.791266 |
Model 1 predictors: (constant), CLOGP.
Model 2 predictors: (constant), CLOGP, SM6_B(P).
Model 3 predictors: (constant), CLOGP, SM6_B(P), NDB.
Model 4 predictors: (constant), CLOGP, SM6_B(P), NDB, nHM.
Model 5 predictors: (constant), CLOGP, SM6_B(P), NDB, nHM, SPMAD_EA.
Model 6 predictors: (constant), CLOGP, SM6_B(P), NDB, nHM, SPMAD_EA, MOR10E.
Model 7 predictors: (constant), CLOGP, SM6_B(P), NDB, nHM, SPMAD_EA, MOR10E, B10[C–N].
Model 8 predictors: (constant), CLOGP, SM6_B(P), NDB, nHM, SPMAD_EA, MOR10E, B10[C–N], MLOGP.
Characteristics of the molecular descriptors in the MLR model
| Descriptor | Coefficient | Std. error | t | Sig. | VIF |
|---|---|---|---|---|---|
| Constant | 0.309 | 0.510 | 0.605 | 0.546 | — |
| CLOGP | 0.408 | 0.025 | 16.1 | 0.00 | 1.89 |
| SM6_B(P) | 0.725 | 0.097 | 7.46 | 0.00 | 3.84 |
| NDB | 0.305 | 0.040 | 7.68 | 0.00 | 1.39 |
| nHM | 0.149 | 0.027 | 5.53 | 0.00 | 1.26 |
| SPMAD_EA | −2.25 | 0.363 | −6.21 | 0.00 | 2.54 |
| MOR10E | 0.390 | 0.075 | 5.22 | 0.00 | 1.15 |
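The columns of the descriptor table are linked by simple arithmetic, which the following sketch checks. It assumes that the third numeric column is the t statistic (coefficient divided by its standard error) and that VIF follows the usual definition VIF = 1/(1 − R_j²), where R_j² comes from regressing descriptor j on the other descriptors. Because the published values are rounded, the recomputed ratios only agree to within a small tolerance.

```python
# (coefficient, std. error, reported t) copied from the table above.
rows = {
    "Constant": (0.309, 0.510, 0.605),
    "CLOGP": (0.408, 0.025, 16.1),
    "SM6_B(P)": (0.725, 0.097, 7.46),
    "NDB": (0.305, 0.040, 7.68),
    "nHM": (0.149, 0.027, 5.53),
    "SPMAD_EA": (-2.25, 0.363, -6.21),
    "MOR10E": (0.390, 0.075, 5.22),
}

def t_ratio(coef, std_err):
    """t statistic of a regression coefficient."""
    return coef / std_err

def r2_from_vif(vif):
    """Invert VIF = 1 / (1 - R_j^2) to recover R_j^2."""
    return 1.0 - 1.0 / vif

# Relative difference between the recomputed and the reported t value.
rel_diff = {
    name: abs(t_ratio(coef, se) - t_rep) / abs(t_rep)
    for name, (coef, se, t_rep) in rows.items()
}
for name, diff in rel_diff.items():
    print(f"{name:10s} relative difference in t: {diff:.2%}")

# Largest VIF in the table (SM6_B(P), 3.84) expressed as an R_j^2.
print(f"R_j^2 for VIF 3.84: {r2_from_vif(3.84):.3f}")
```

All recomputed ratios fall within about 2% of the reported values, which is consistent with rounding of the tabulated coefficients and standard errors.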
Fig. 1 Plot of experimental versus predicted pLC50 with the SVM model.
Comparison of the current SVM model with related previous works
| Algorithm | No. of descriptors | N (training) | R² (training) | N (test) | R² (test) | Reference |
|---|---|---|---|---|---|---|
| MLR | 3 | 556 | 0.65 | 219 | 0.51 | |
| MLR | 3 | 556 | 0.65 | 169 | 0.41 | |
| Consensus | — | 557 | 0.71 | 201 | 0.60 | |
| Consensus | — | 557 | 0.71 | 144 | 0.58 | |
| MLR + ANN | 5 | 445 | 0.712–0.776 | 110 | 0.553–0.632 | |
| MLR | 3–5 | 63–247 | (0.707–0.903) | 16–62 | (0.660–0.858) | |
| GA-kNN | 6 | 726 | 0.62–0.73 | 182 | 0.61–0.77 | |
| MLR + ANN | 6 | 340 | 0.865 | 99–226 | 0.504–0.548 | |
| GA-MLR | 8 | 771 | 0.70 | 192 | (0.641) | |
| SVM | 6 | 482 | 0.756 | 481 | 0.686 | Current study |
Fig. 2 Williams plot with a warning leverage of 0.0436.
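The warning leverage in the Williams plot can be reproduced with a short calculation. The sketch assumes the widely used convention h* = 3(k + 1)/n, with k the number of descriptors and n the number of training compounds; the record does not state the formula, but with k = 6 and n = 482 this convention yields the quoted value. The ±3 limit on standardized residuals in `in_domain` is likewise a common convention and an assumption here.

```python
def warning_leverage(n_descriptors, n_training):
    """Warning leverage h* = 3(k + 1)/n used as the Williams plot cutoff."""
    return 3 * (n_descriptors + 1) / n_training

def in_domain(leverage, std_residual, h_star, residual_limit=3.0):
    """True if a compound lies inside the applicability domain box."""
    return leverage <= h_star and abs(std_residual) <= residual_limit

# Six descriptors and 482 training compounds, as given in the abstract.
h_star = warning_leverage(6, 482)
print(f"h* = {h_star:.4f}")

# Hypothetical compounds: (leverage, standardized residual).
print(in_domain(0.020, 1.2, h_star))   # low leverage, small residual
print(in_domain(0.060, 0.5, h_star))   # leverage above h*
```

A compound with leverage above h* is structurally distant from the training set, so its prediction is an extrapolation even when its residual is small.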