Ting Wang, Lili Tang, Feng Luan, M Natália D S Cordeiro.
Abstract
Organic compounds released into the environment usually act as mixtures rather than as single chemicals, and it is in this form that they adversely affect the environment and human health. In this paper, we establish reliable classical quantitative structure-activity relationship (QSAR) models to evaluate the toxicity of 99 binary mixtures. The models were built by forward stepwise multiple linear regression (MLR) and by nonlinear radial basis function neural networks (RBFNNs), both using the hypothetical descriptors. For the MLR model, the statistical parameters were N (number of compounds in the training set) = 79, R² (squared correlation coefficient between the predicted and observed activities) = 0.869, LOOq² (leave-one-out cross-validated correlation coefficient) = 0.864, F (Fisher's test) = 165.494, and RMS (root mean square error) = 0.599 for the training set, and N_ext (number of compounds in the external test set) = 20, R² = 0.853, q²_ext (leave-one-out correlation coefficient for the test set) = 0.825, F = 30.861, and RMS = 0.691 for the external test set. For the RBFNN model, the results were N = 79, R² = 0.925, LOOq² = 0.924, F = 950.686, and RMS = 0.447 for the training set, and N_ext = 20, R² = 0.896, q²_ext = 0.890, F = 155.424, and RMS = 0.547 for the external test set. Both models were further evaluated with additional statistical parameters and validation methods. The results confirm that the models are acceptable and can be used to predict the toxicity of binary mixtures.
Keywords: mixture; multiple linear regression (MLR); radial basis function neural networks (RBFNNs); toxicity
Year: 2018 PMID: 30384505 PMCID: PMC6274693 DOI: 10.3390/ijms19113423
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
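The training-set statistics quoted in the abstract (R², LOOq², RMS) can be illustrated with a minimal sketch. It assumes ordinary least squares with an intercept, R² as the squared Pearson correlation, q² as 1 − PRESS/SST, and RMS without a degrees-of-freedom correction; the paper's exact formulas are not given here, and the data below are synthetic, not the paper's descriptors.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Apply fitted coefficients (intercept first) to a descriptor matrix."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def r_squared(y_obs, y_pred):
    """Squared Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(y_obs, y_pred)[0, 1] ** 2)

def rms_error(y_obs, y_pred):
    """Root mean square of the residuals (assumed definition)."""
    return float(np.sqrt(np.mean((y_obs - y_pred) ** 2)))

def loo_q2(X, y):
    """Leave-one-out q2 = 1 - PRESS / total sum of squares."""
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i          # drop sample i, refit, predict it
        coef = fit_mlr(X[keep], y[keep])
        press += (y[i] - predict_mlr(coef, X[i:i + 1])[0]) ** 2
    return float(1.0 - press / np.sum((y - y.mean()) ** 2))

# Synthetic stand-in data: 79 samples, 3 descriptors (hypothetical values).
rng = np.random.default_rng(0)
X = rng.normal(size=(79, 3))
y = 1.0 + X @ np.array([-0.7, 1.8, 0.6]) + rng.normal(scale=0.5, size=79)

coef = fit_mlr(X, y)
y_hat = predict_mlr(coef, X)
r2, q2, rms = r_squared(y, y_hat), loo_q2(X, y), rms_error(y, y_hat)
```

For a least-squares fit, q² computed this way can never exceed the training R², which is why a small gap between the two (0.869 versus 0.864 for MLR) is usually read as a sign of a stable model.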
Toxicity data of the single chemicals.
| No. | Single Chemicals | CAS | Experimental −logEC50 (mol/L) | Predicted −logEC50 (mol/L) | Residual |
|---|---|---|---|---|---|
| 1# | Acetaldehyde | 75-07-0 | 2.36 | 3.177 | 0.817 |
| 2# | Propanal | 123-38-6 | 2.72 | 3.212 | 0.492 |
| 3# | Butyraldehyde | 123-72-8 | 3.25 | 3.224 | −0.0265 |
| 4# | Valeraldehyde | 110-62-3 | 3.27 | 3.628 | 0.358 |
| 5# | Benzaldehyde | 100-52-7 | 3.43 | 4.552 | 1.122 |
| 6# | | 555-16-8 | 4.28 | 3.634 | −0.646 |
| 7# | | 623-27-8 | 4.07 | 4.880 | 0.810 |
| 8# | | 104-88-1 | 3.97 | 3.876 | −0.094 |
| 9# | | 1122-91-4 | 4.3 | 3.861 | −0.437 |
| 10# | | 123-08-0 | 4.54 | 3.777 | −0.763 |
| 11# | | 104-87-0 | 3.82 | 4.030 | 0.210 |
| 12# | | 123-11-5 | 4.03 | 3.985 | −0.0448 |
| 13# | | 100-10-7 | 5.4 | 4.622 | −0.778 |
| 14# | Malononitrile | 109-77-3 | 2.55 | 1.783 | −0.767 |
| 15# | Glycolonitrile | 107-16-4 | 2.98 | 2.141 | −0.839 |
| 16# | α-Hydroxyisobutyronitrile | 75-86-5 | 3.61 | 3.834 | 0.227 |
| 17# | Allyl cyanide | 109-75-1 | 2.06 | 1.507 | −0.553 |
| 18# | Benzonitrile | 100-47-0 | 3.48 | 3.456 | −0.0237 |
| 19# | Benzyl cyanide | 140-29-4 | 4.23 | 2.963 | −1.267 |
| 20# | Acetonitrile | 75-05-8 | 0.75 | 2.023 | 1.273 |
| 21# | Acrylonitrile | 107-13-1 | 1.51 | 1.467 | −0.0414 |
| 22# | Succinonitrile | 110-61-2 | 0.36 | 2.401 | 2.042 |
| 23# | Phthalonitrile | 91-15-6 | 3.51 | 2.622 | −0.888 |
| 24# | Lactonitrile | 78-97-7 | 2.01 | 2.440 | 0.430 |
| 25# | Atrazine | 1912-24-9 | 6.68 | 7.543 | 0.863 |
| 26# | Prometryn | 7287-19-6 | 8.07 | 6.457 | −1.613 |
| 27# | Simetryn | 1014-70-6 | 6.29 | 5.565 | −0.725 |
| 28# | Prometone | 1610-18-0 | 8.99 | 7.801 | −1.182 |
| 29# | Simazine | 122-34-9 | 5.43 | 6.892 | 1.462 |
| 30# | Metribuzin | 21087-64-9 | 5.7 | 6.873 | 1.173 |
| 31# | Cyanazine | 21725-46-2 | 6.61 | 6.631 | 0.0212 |
| 32# | Terbutryn | 886-50-0 | 7.95 | 6.131 | −1.819 |
| 33# | Terbuthylazine | 5915-41-3 | 6.93 | 8.225 | 1.295 |
| 34# | Ametryn | 834-12-8 | 6.56 | 5.516 | −1.044 |
| 35# | Diuron | 330-54-1 | 7.72 | 6.065 | −1.655 |
| 36# | Chlorotoluron | 15545-48-9 | 8.4 | 5.898 | −2.502 |
| 37# | Monolinuron | 1746-81-2 | 7.33 | 6.384 | −0.946 |
| 38# | Monuron | 150-68-5 | 6.3 | 6.254 | −0.0460 |
| 39# | Methabenzthiazuron | 18691-97-9 | 7.02 | 6.022 | −0.998 |
| 40# | Isoproturon | 34123-59-6 | 7.18 | 6.554 | −0.627 |
| 41# | Fenuron | 101-42-8 | 7.33 | 6.665 | −0.665 |
| 42# | Ethametsulfuron | 111353-84-5 | 4.13 | 6.302 | 2.172 |
| 43# | Chlorsulfuron | 64902-72-3 | 6.42 | 6.337 | −0.0833 |
| 44# | Metsulfuron | 79510-48-8 | 6.29 | 6.245 | −0.0447 |
| 45# | Sulfamethazine | 57-68-1 | 4.08 | 5.506 | 1.426 |
| 46# | Sulfapyridine | 144-83-2 | 3.84 | 3.407 | −0.433 |
| 47# | Sulfamethoxazole | 723-46-6 | 4.45 | 4.511 | 0.0609 |
| 48# | Sulfadiazine | 68-35-9 | 4.5 | 5.021 | 0.521 |
| 49# | Sulfisoxazole | 127-69-5 | 4.43 | 5.506 | 1.076 |
| 50# | Sulfamonomethoxine | 1220-83-3 | 5.05 | 4.535 | −0.515 |
| 51# | Sulfachloropyridazine | 80-32-0 | 4.78 | 5.117 | 0.337 |
| 52# | Sulfachinoxalin | 59-40-5 | 4.53 | 5.203 | 0.673 |
| 53# | Sulfamethoxydiazine | 651-06-9 | 4.41 | 5.050 | 0.640 |
| 54# | Sulfamethoxypyridazine | 80-35-3 | 4.36 | 4.934 | 0.579 |
| 55# | Trimethoprim | 738-70-5 | 3.22 | 5.209 | 1.989 |
Descriptors, Coefficients, Standard Error, and t-Test Values for the Best Multiple Linear Regression (MLR) Model.
| Coefficients | Standard Error | t-Test | Descriptors | VIF | MF |
|---|---|---|---|---|---|
| −0.405 | 0.463 | −0.874 | Intercept | | |
| −0.688 | 0.322 | −2.137 | Number of triple bonds (NTB) | 2.210 | −0.051 |
| 1.847 | 0.255 | 7.257 | Average Complementary Information content (order 2) (ACIC2) | 1.109 | 0.401 |
| 63.611 | 7.542 | 8.434 | Max partial charge for a C atom (Zefirov’s PC) | 1.002 | 0.650 |
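As a quick consistency check on the table above, each t-test value should be close to the coefficient divided by its standard error. The sketch below recomputes them and also writes the regression as a function; the key `max_pc_c` and the function name are hypothetical labels, and the descriptor inputs would have to be mixture descriptors computed as in the paper, which this section does not specify.

```python
# Coefficients and standard errors copied from the table above.
mlr_terms = {
    "intercept": (-0.405, 0.463),
    "ntb": (-0.688, 0.322),        # number of triple bonds
    "acic2": (1.847, 0.255),       # average complementary information content (order 2)
    "max_pc_c": (63.611, 7.542),   # max partial charge for a C atom (hypothetical key name)
}

# t statistic = coefficient / standard error
t_values = {name: coef / se for name, (coef, se) in mlr_terms.items()}

def predict_neg_log_ec50(ntb, acic2, max_pc_c):
    """Linear combination of the three descriptors using the tabulated coefficients."""
    return (mlr_terms["intercept"][0]
            + mlr_terms["ntb"][0] * ntb
            + mlr_terms["acic2"][0] * acic2
            + mlr_terms["max_pc_c"][0] * max_pc_c)
```

Small differences from the tabulated t values (for example for ACIC2) are expected because the published coefficients and standard errors are rounded to three decimals.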
Correlation matrix of the 3 descriptors used in the model.
| | ACIC2 | NTB | Max partial charge for a C atom |
|---|---|---|---|
| ACIC2 | 1.000 | | |
| NTB | −0.314 | 1.000 | |
| Max partial charge for a C atom | 0.740 | 0.013 | 1.000 |
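A correlation matrix of this kind is often used together with the variance inflation factor (VIF) to screen for collinear descriptors. One standard definition takes VIF_j as the j-th diagonal element of the inverse correlation matrix; the sketch below uses that definition on a hypothetical two-descriptor matrix. How the paper computed its own VIF values is not stated in this section, so this is an illustration of the general technique only.

```python
import numpy as np

def vif_from_correlation(corr):
    """VIF_j = j-th diagonal element of the inverse descriptor correlation matrix."""
    corr = np.asarray(corr, dtype=float)
    return np.diag(np.linalg.inv(corr))

# Hypothetical example: two descriptors with pairwise correlation r = 0.5.
example_corr = [[1.0, 0.5],
                [0.5, 1.0]]
example_vifs = vif_from_correlation(example_corr)  # each equals 1 / (1 - 0.5**2)

# Uncorrelated descriptors give VIF = 1 for every descriptor.
identity_vifs = vif_from_correlation(np.eye(3))
```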
The statistical results of the external test set for the MLR and radial basis function neural network (RBFNN) models.
| | MLR | RBFNN |
|---|---|---|
| R² | 0.853 | 0.896 |
| q²_ext | 0.825 | 0.890 |
| | 0.849 | 0.896 |
| (R2− | 0.005 | 0.000 |
| k | 0.983 | 1.030 |
| F | 30.861 | 155.424 |
| RMS | 0.691 | 0.547 |
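Two row labels in the table above did not survive extraction. The numbers are consistent with a commonly used external-validation check in which the unlabeled row (0.849, 0.896) is the coefficient of determination for regression through the origin, and the row whose label is cut off after “(R2−” is the relative gap between it and R². That reading is an assumption, not something the table states; the sketch below only shows that the arithmetic fits, using the q² values for the external test set from the abstract and thresholds that are commonly cited for this check.

```python
# Values copied from the external test set table; "unlabeled" is ASSUMED to be
# the through-origin coefficient of determination.
external_stats = {
    "MLR":   {"r2": 0.853, "q2_ext": 0.825, "unlabeled": 0.849, "k": 0.983},
    "RBFNN": {"r2": 0.896, "q2_ext": 0.890, "unlabeled": 0.896, "k": 1.030},
}

def relative_gap(r2, r0_squared):
    """(R2 - R0^2) / R2, under the assumption described above."""
    return (r2 - r0_squared) / r2

def passes_common_criteria(stats):
    """Commonly cited thresholds: R2 > 0.6, q2 > 0.5, gap < 0.1, 0.85 <= k <= 1.15."""
    gap = relative_gap(stats["r2"], stats["unlabeled"])
    return (stats["r2"] > 0.6 and stats["q2_ext"] > 0.5
            and gap < 0.1 and 0.85 <= stats["k"] <= 1.15)

gaps = {model: round(relative_gap(s["r2"], s["unlabeled"]), 3)
        for model, s in external_stats.items()}
verdicts = {model: passes_common_criteria(s) for model, s in external_stats.items()}
```

Under this reading, the recomputed gaps round to the tabulated 0.005 and 0.000, and both models satisfy all four thresholds.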
The No., chemicals in the mixtures, ratio of toxic unit, experimental −log(EC50mix), predicted −log(EC50mix), and their corresponding residual.
| No. | Chemicals in the Mixtures | Ratio of Toxic Unit | Experimental −log(EC50mix) (mol/L) | MLR Predicted −log(EC50mix) (mol/L) | MLR Residual | RBFNN Predicted −log(EC50mix) (mol/L) | RBFNN Residual | Set* |
|---|---|---|---|---|---|---|---|---|
| 1 | 1#:14# | 1:1 | 2.44 | 2.36 | −0.08 | 2.67 | 0.23 | A |
| 2 | 2#:14# | 1:1 | 2.63 | 2.38 | −0.25 | 2.68 | 0.05 | B |
| 3* | 3#:14# | 1:1 | 2.77 | 2.38 | −0.39 | 2.69 | −0.08 | T |
| 4 | 4#:14# | 1:1 | 2.78 | 2.61 | −0.17 | 2.77 | −0.01 | C |
| 5 | 5#:14# | 1:1 | 2.8 | 3.14 | 0.34 | 2.79 | −0.01 | D |
| 6 | 6#:14# | 1:1 | 2.84 | 2.61 | −0.23 | 2.80 | −0.04 | A |
| 7 | 7#:14# | 1:1 | 2.84 | 3.33 | 0.49 | 2.70 | −0.14 | B |
| 8* | 8#:14# | 1:1 | 2.83 | 2.75 | −0.08 | 2.82 | −0.01 | T |
| 9 | 9#:14# | 1:1 | 2.84 | 2.75 | −0.09 | 2.81 | −0.03 | C |
| 10 | 10#:14# | 1:1 | 2.85 | 2.70 | −0.15 | 2.81 | −0.04 | D |
| 11 | 11#:14# | 1:1 | 2.83 | 2.84 | 0.01 | 2.82 | −0.01 | A |
| 12 | 12#:14# | 1:1 | 2.84 | 2.82 | −0.02 | 2.82 | −0.02 | B |
| 13* | 13#:14# | 1:1 | 2.85 | 3.18 | 0.33 | 2.78 | −0.07 | T |
| 14 | 5#:15# | 1:1 | 3.15 | 3.25 | 0.10 | 3.03 | −0.12 | C |
| 15 | 6#:15# | 1:1 | 3.26 | 2.72 | −0.54 | 3.21 | −0.05 | D |
| 16 | 7#:15# | 1:1 | 3.25 | 3.44 | 0.19 | 3.17 | −0.08 | A |
| 17 | 8#:15# | 1:1 | 3.24 | 2.86 | −0.38 | 3.11 | −0.13 | B |
| 18* | 9#:15# | 1:1 | 3.26 | 2.85 | −0.41 | 3.10 | −0.16 | T |
| 19 | 10#:15# | 1:1 | 3.27 | 2.80 | −0.47 | 3.14 | −0.13 | C |
| 20 | 11#:15# | 1:1 | 3.22 | 2.95 | −0.27 | 3.02 | −0.20 | D |
| 21 | 13#:15# | 1:1 | 3.28 | 3.29 | 0.01 | 3.06 | −0.22 | A |
| 22 | 1#:16# | 1:1 | 2.64 | 3.43 | 0.79 | 3.38 | 0.74 | B |
| 23* | 2#:16# | 1:1 | 2.97 | 3.45 | 0.48 | 3.38 | 0.41 | T |
| 24 | 3#:16# | 1:1 | 3.39 | 3.46 | 0.07 | 3.38 | −0.01 | C |
| 25 | 5#:16# | 1:1 | 3.51 | 4.22 | 0.71 | 3.71 | 0.20 | D |
| 26 | 6#:16# | 1:1 | 3.83 | 3.70 | −0.13 | 3.42 | −0.41 | A |
| 27 | 7#:16# | 1:1 | 3.78 | 4.41 | 0.63 | 3.52 | −0.26 | B |
| 28* | 8#:16# | 1:1 | 3.75 | 3.83 | 0.08 | 3.56 | −0.19 | T |
| 29 | 9#:16# | 1:1 | 3.83 | 3.82 | −0.01 | 3.56 | −0.27 | C |
| 30 | 10#:16# | 1:1 | 3.86 | 3.78 | −0.08 | 3.51 | −0.35 | D |
| 31 | 11#:16# | 1:1 | 3.7 | 3.92 | 0.22 | 3.67 | −0.03 | A |
| 32 | 12#:16# | 1:1 | 3.77 | 3.90 | 0.13 | 3.61 | −0.16 | B |
| 33* | 13#:16# | 1:1 | 3.9 | 4.26 | 0.36 | 3.68 | −0.22 | T |
| 34 | 1#:17# | 1:1 | 2.18 | 2.10 | −0.08 | 1.81 | −0.37 | C |
| 35 | 3#:17# | 1:1 | 2.33 | 2.13 | −0.20 | 1.87 | −0.46 | D |
| 36 | 4#:17# | 1:1 | 2.34 | 2.35 | 0.01 | 1.93 | −0.41 | A |
| 37 | 5#:17# | 1:1 | 2.34 | 2.89 | 0.55 | 2.68 | 0.34 | B |
| 38* | 6#:17# | 1:1 | 2.36 | 2.36 | 0.00 | 2.28 | −0.08 | T |
| 39 | 7#:17# | 1:1 | 2.36 | 3.07 | 0.71 | 3.06 | 0.70 | C |
| 40 | 8#:17# | 1:1 | 2.36 | 2.50 | 0.14 | 2.27 | −0.09 | D |
| 41 | 10#:17# | 1:1 | 2.36 | 2.44 | 0.08 | 2.25 | −0.11 | A |
| 42 | 11#:17# | 1:1 | 2.35 | 2.59 | 0.24 | 2.27 | −0.08 | B |
| 43* | 12#:17# | 1:1 | 2.35 | 2.56 | 0.21 | 2.28 | −0.07 | T |
| 44 | 13#:17# | 1:1 | 2.36 | 2.93 | 0.57 | 2.74 | 0.38 | C |
| 45 | 5#:18# | 1:1 | 3.45 | 4.01 | 0.56 | 4.06 | 0.61 | D |
| 46 | 6#:18# | 1:1 | 3.72 | 3.48 | −0.24 | 3.53 | −0.19 | A |
| 47 | 7#:18# | 1:1 | 3.68 | 4.19 | 0.51 | 3.88 | 0.20 | B |
| 48* | 8#:18# | 1:1 | 3.66 | 3.62 | −0.04 | 3.76 | 0.10 | T |
| 49 | 10#:18# | 1:1 | 3.74 | 3.56 | −0.18 | 3.68 | −0.06 | C |
| 50 | 11#:18# | 1:1 | 3.62 | 3.71 | 0.09 | 3.94 | 0.32 | D |
| 51 | 12#:18# | 1:1 | 3.67 | 3.68 | 0.01 | 3.86 | 0.19 | A |
| 52 | 13#:18# | 1:1 | 3.78 | 4.05 | 0.27 | 4.03 | 0.25 | B |
| 53* | 5#:19# | 1:1 | 3.67 | 3.72 | 0.05 | 4.23 | 0.56 | T |
| 54 | 6#:19# | 1:1 | 4.25 | 3.20 | −1.05 | 3.25 | −1.00 | C |
| 55 | 7#:19# | 1:1 | 4.14 | 3.91 | −0.23 | 4.23 | 0.09 | D |
| 56 | 8#:19# | 1:1 | 4.08 | 3.34 | −0.74 | 3.56 | −0.52 | A |
| 57 | 9#:19# | 1:1 | 4.26 | 3.33 | −0.93 | 3.56 | −0.70 | B |
| 58* | 10#:19# | 1:1 | 4.36 | 3.28 | −1.08 | 3.45 | −0.91 | T |
| 59 | 13#:19# | 1:1 | 4.5 | 3.76 | −0.74 | 4.23 | −0.27 | C |
| 60 | 25#:35# | 1:1 | 6.94 | 7.10 | 0.16 | 7.78 | 0.84 | D |
| 61 | 25#:36# | 1:1 | 6.97 | 7.01 | 0.04 | 7.63 | 0.66 | A |
| 62 | 25#:37# | 1:1 | 6.89 | 7.28 | 0.39 | 6.95 | 0.06 | B |
| 63* | 25#:38# | 1:1 | 6.45 | 7.21 | 0.76 | 7.92 | 1.47 | T |
| 64 | 26#:35# | 1:1 | 7.86 | 6.48 | −1.38 | 6.76 | −1.10 | C |
| 65 | 26#:36# | 1:1 | 8.2 | 6.39 | −1.81 | 6.90 | −1.30 | D |
| 66 | 26#:37# | 1:1 | 7.56 | 6.67 | −0.89 | 7.64 | 0.08 | A |
| 67 | 26#:38# | 1:1 | 6.59 | 6.60 | 0.01 | 6.45 | −0.14 | B |
| 68* | 27#:35# | 1:1 | 6.58 | 5.97 | −0.61 | 6.95 | 0.37 | T |
| 69 | 27#:36# | 1:1 | 6.59 | 5.88 | −0.71 | 6.81 | 0.22 | C |
| 70 | 27#:37# | 1:1 | 6.55 | 6.16 | −0.39 | 7.10 | 0.55 | D |
| 71 | 27#:38# | 1:1 | 6.29 | 6.08 | −0.21 | 7.03 | 0.74 | A |
| 72 | 28#:35# | 1:1 | 8 | 7.25 | −0.75 | 7.64 | −0.36 | B |
| 73* | 28#:36# | 1:1 | 8.6 | 7.15 | −1.45 | 7.79 | −0.81 | T |
| 74 | 28#:37# | 1:1 | 7.62 | 7.43 | −0.19 | 7.90 | 0.28 | C |
| 75 | 28#:38# | 1:1 | 6.6 | 7.36 | 0.76 | 7.31 | 0.71 | D |
| 76 | 29#:35# | 1:1 | 5.73 | 6.73 | 1.00 | 6.12 | 0.39 | A |
| 77 | 29#:36# | 1:1 | 5.73 | 6.64 | 0.91 | 6.38 | 0.65 | B |
| 78* | 29#:37# | 1:1 | 5.73 | 6.92 | 1.19 | 7.29 | 1.56 | T |
| 79 | 29#:38# | 1:1 | 5.68 | 6.85 | 1.17 | 5.67 | −0.01 | C |
| 80 | 45#:55# | 1:1 | 5.08 | 5.45 | 0.37 | 5.86 | 0.78 | D |
| 81 | 46#:55# | 1:1 | 4.85 | 4.26 | −0.59 | 3.98 | −0.87 | A |
| 82 | 47#:55# | 1:1 | 5.5 | 4.89 | −0.61 | 5.02 | −0.48 | B |
| 83* | 48#:55# | 1:1 | 5.42 | 5.17 | −0.25 | 5.56 | 0.14 | T |
| 84 | 49#:55# | 1:1 | 5.45 | 5.45 | 0.00 | 6.06 | 0.61 | C |
| 85 | 50#:55# | 1:1 | 6.01 | 4.90 | −1.11 | 5.05 | −0.96 | D |
| 86 | 51#:55# | 1:1 | 5.73 | 5.23 | −0.50 | 5.56 | −0.17 | A |
| 87 | 47#:55# | 13396:1 | 3.49 | 4.48 | 0.99 | 3.84 | 0.35 | B |
| 88* | 47#:55# | 8587:1 | 3.49 | 4.48 | 0.99 | 3.84 | 0.35 | T |
| 89 | 47#:55# | 2747:1 | 3.49 | 4.48 | 0.99 | 3.84 | 0.35 | C |
| 90 | 47#:55# | 858:1 | 3.51 | 4.48 | 0.97 | 3.84 | 0.33 | D |
| 91 | 47#:55# | 274:1 | 3.55 | 4.49 | 0.94 | 3.85 | 0.30 | A |
| 92 | 47#:55# | 85:1 | 3.67 | 4.49 | 0.82 | 3.86 | 0.19 | B |
| 93* | 47#:55# | 27:1 | 3.92 | 4.51 | 0.59 | 3.92 | 0.00 | T |
| 94 | 47#:55# | 15:1 | 4.08 | 4.53 | 0.45 | 3.98 | −0.10 | C |
| 95 | 47#:55# | 4:1 | 4.52 | 4.64 | 0.12 | 4.32 | −0.20 | D |
| 96 | 47#:55# | 1:6 | 5.34 | 4.59 | −0.75 | 4.18 | −1.16 | A |
| 97 | 47#:55# | 1:21 | 5.43 | 5.25 | −0.18 | 5.37 | −0.06 | B |
| 98* | 47#:55# | 1:37 | 5.45 | 5.27 | −0.18 | 5.36 | −0.09 | T |
| 99 | 47#:55# | 1:116 | 5.46 | 5.28 | −0.18 | 5.34 | −0.12 | C |
*Test set. In the Set* column, “T” means the mixture belongs to the external test set; “A”, “B”, “C”, and “D” mean the mixture belongs to the corresponding subset of the training set.
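The Set column in the table above follows a fixed repeating pattern by mixture number (A, B, T, C, D), so every fifth mixture starting from No. 3 lands in the external test set. A minimal sketch of that assignment, with hypothetical function and variable names, reproduces the 79/20 split reported in the abstract:

```python
SET_PATTERN = ["A", "B", "T", "C", "D"]  # order read off the Set column

def assign_set(mixture_no):
    """Map a 1-based mixture number to its subset label."""
    return SET_PATTERN[(mixture_no - 1) % len(SET_PATTERN)]

labels = {n: assign_set(n) for n in range(1, 100)}   # mixtures 1..99
n_test = sum(1 for s in labels.values() if s == "T")
n_train = len(labels) - n_test
```

A systematic split like this spreads the test mixtures across the whole range of compound classes and toxicities, because the table is ordered by chemical family.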
Figure 1. Plot of the predicted versus experimental −log(EC50) for the training and test sets by the MLR model (a) and the RBFNN model (b).
Figure 2. Residuals of the training and test sets by MLR (a) and RBFNN (b).
Figure 3. Williams plot of the training and external test sets.
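A Williams plot shows standardized residuals against leverage values to outline a model's applicability domain. The sketch below assumes the usual conventions: leverage h = x(XᵀX)⁻¹xᵀ computed from the training descriptor matrix with an intercept column, and a warning leverage h* = 3(p + 1)/n. Whether the paper used exactly these conventions is not stated in this section, and the descriptor matrix here is synthetic.

```python
import numpy as np

def leverages(X_train, X_query):
    """Hat values of query points relative to the training design matrix."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    Q = np.column_stack([np.ones(len(X_query)), X_query])
    xtx_inv = np.linalg.inv(A.T @ A)
    # h_i = q_i (A'A)^-1 q_i' for every row q_i of Q
    return np.einsum("ij,jk,ik->i", Q, xtx_inv, Q)

def warning_leverage(n_train, n_descriptors):
    """h* = 3 (p + 1) / n, a commonly used cut-off (assumed here)."""
    return 3.0 * (n_descriptors + 1) / n_train

# Synthetic stand-in for the 79 x 3 training descriptor matrix.
rng = np.random.default_rng(3)
X_train = rng.normal(size=(79, 3))

h_train = leverages(X_train, X_train)
h_star = warning_leverage(n_train=79, n_descriptors=3)   # 3 * 4 / 79
outside_domain = h_train > h_star                        # high-leverage flags
```

Points with leverage above h* or with standardized residuals beyond about ±3 are the ones usually inspected as potential outliers.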
The R² and LOOq² values of 10 Y-randomization tests.
| No. | MLR R² | MLR LOOq² | RBFNN R² | RBFNN LOOq² |
|---|---|---|---|---|
| 1 | 0.028 | 0.016 | 0.019 | 0.006 |
| 2 | 0.013 | 0.000 | 0.027 | 0.014 |
| 3 | 0.02 | 0.007 | 0.017 | 0.004 |
| 4 | 0.047 | 0.035 | 0.035 | 0.022 |
| 5 | 0.03 | 0.017 | 0.034 | 0.022 |
| 6 | 0.014 | 0.002 | 0.024 | 0.011 |
| 7 | 0.034 | 0.022 | 0.017 | 0.004 |
| 8 | 0.013 | 0.001 | 0.04 | 0.028 |
| 9 | 0.018 | 0.005 | 0.024 | 0.012 |
| 10 | 0.049 | 0.036 | 0.014 | 0.001 |
| Average | 0.0266 | 0.0141 | 0.0251 | 0.0124 |
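The Y-randomization test behind the table above refits the model after shuffling the response values, so that any remaining R² reflects chance correlation only. A minimal sketch of the idea for a linear model, on synthetic data with hypothetical function names (the paper's exact procedure is not described in this section):

```python
import numpy as np

def ols_r2(X, y):
    """R2 of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2))

def y_randomization(X, y, n_runs=10, seed=0):
    """Refit on permuted responses; returns one R2 per run."""
    rng = np.random.default_rng(seed)
    return [ols_r2(X, rng.permutation(y)) for _ in range(n_runs)]

# Synthetic stand-in data: 79 samples, 3 descriptors.
rng = np.random.default_rng(1)
X = rng.normal(size=(79, 3))
y = X @ np.array([-0.7, 1.8, 0.6]) + rng.normal(scale=0.5, size=79)

true_r2 = ols_r2(X, y)
scrambled_r2 = y_randomization(X, y)
```

A real structure-activity relationship should give scrambled R² values far below the unscrambled one, as in the table, where the averages stay below 0.03.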
Validation of the MLR model.
| Training Set | R² | F | RMS | Test Set | R² | F | RMS |
|---|---|---|---|---|---|---|---|
| A+B+C+D | 0.869 | 165.290 | 0.599 | T | 0.853 | 30.861 | 0.691 |
| B+C+D+T | 0.857 | 150.314 | 0.633 | A | 0.904 | 50.357 | 0.527 |
| A+C+D+T | 0.864 | 159.122 | 0.611 | B | 0.883 | 40.100 | 0.611 |
| A+B+D+T | 0.867 | 162.819 | 0.600 | C | 0.866 | 32.528 | 0.674 |
| A+B+C+T | 0.868 | 167.263 | 0.594 | D | 0.856 | 29.716 | 0.713 |
| Average | 0.865 | 160.962 | 0.607 | | 0.872 | 36.712 | 0.643 |
Validation of the RBFNN model.
| Training Set | R² | F | RMS | Test Set | R² | F | RMS |
|---|---|---|---|---|---|---|---|
| A+B+C+D | 0.925 | 950.686 | 0.447 | T | 0.896 | 155.424 | 0.547 |
| B+C+D+T | 0.915 | 827.525 | 0.494 | A | 0.932 | 247.421 | 0.418 |
| A+C+D+T | 0.910 | 778.942 | 0.519 | B | 0.954 | 371.815 | 0.361 |
| A+B+D+T | 0.915 | 824.679 | 0.500 | C | 0.931 | 244.294 | 0.455 |
| A+B+C+T | 0.921 | 911.329 | 0.472 | D | 0.900 | 152.589 | 0.559 |
| Average | 0.917 | 858.632 | 0.486 | | 0.923 | 234.309 | 0.468 |
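The two validation tables rotate the five subsets (A, B, C, D, T) so that each one serves once as the test set while the other four form the training set. A minimal sketch of that rotation for a linear model, using synthetic data and the repeating A, B, T, C, D labelling from the mixture table; the function names are hypothetical, and an RBFNN would simply replace the fit-and-predict step:

```python
import numpy as np

def fit_predict(X_train, y_train, X_test):
    """Least-squares fit with an intercept, then prediction on X_test."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef

def rotate_subsets(X, y, labels, subsets=("T", "A", "B", "C", "D")):
    """Hold out each subset in turn; return (R2, RMS) on the held-out part."""
    results = {}
    for held_out in subsets:
        test = labels == held_out
        pred = fit_predict(X[~test], y[~test], X[test])
        r2 = float(np.corrcoef(y[test], pred)[0, 1] ** 2)
        rms = float(np.sqrt(np.mean((y[test] - pred) ** 2)))
        results[held_out] = (r2, rms)
    return results

# Synthetic stand-in for the 99 mixtures and 3 descriptors.
rng = np.random.default_rng(2)
X = rng.normal(size=(99, 3))
y = X @ np.array([-0.7, 1.8, 0.6]) + rng.normal(scale=0.5, size=99)
labels = np.array([["A", "B", "T", "C", "D"][i % 5] for i in range(99)])

results = rotate_subsets(X, y, labels)
mean_test_r2 = float(np.mean([r2 for r2, _ in results.values()]))
```

Similar statistics across all five rotations, as reported in the tables, suggest that the model quality does not depend on which mixtures happened to be held out.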