Litang Qin, Xin Zhang, Yuhan Chen, Lingyun Mo, Honghu Zeng, Yanpeng Liang.
Abstract
Several hundred disinfection byproducts (DBPs) have been identified in drinking water, and they are known to have potentially adverse health effects. Toxicological data are missing for most DBPs, and predictive methods may provide an effective way to fill these gaps. In silico models of the toxicological endpoints of DBPs have rarely been studied. The main aim of the present study is to develop predictive quantitative structure-activity relationship (QSAR) models for the reactive toxicities of 50 DBPs in the five bioassays X-Microtox, GSH+, GSH−, DNA+ and DNA−. All-subset regression was used to select the optimal descriptors, and multiple linear regression models were built. The developed QSAR models for the five endpoints satisfied the internal and external validation criteria: coefficient of determination (R²) > 0.7, explained variance in leave-one-out prediction (Q²LOO) and in leave-many-out prediction (Q²LMO) > 0.6, variance explained in external prediction (Q²F1, Q²F2, and Q²F3) > 0.7, and concordance correlation coefficient (CCC) > 0.85. The applicability domains of the QSAR models and the meaning of the selected descriptors were discussed. The obtained QSAR models can be used to predict the toxicities of the 50 DBPs.
Keywords: QSAR; disinfection byproduct; drinking water; toxicity; validation
Year: 2017 PMID: 28991213 PMCID: PMC6151816 DOI: 10.3390/molecules22101671
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
The five endpoints of the 50 drinking water disinfection byproducts (DBPs).
| Bioassay | Test Species (Strain/Cell Line) a | Endpoint | Detected Signal |
|---|---|---|---|
| Microtox | | Cytotoxicity | Bioluminescence as indicator for cell viability |
| GSH+ / GSH− | E. coli MJF276 (GSH+) / MJF335 (GSH−) | Interaction with proteins/peptides | OD at 600 nm as indicator for cell density and descriptor of cell growth |
| DNA+ / DNA− | E. coli MV1161 (DNA+) / MV4108 (DNA−) | Interaction with DNA | OD at 600 nm as indicator for cell density and descriptor of cell growth |
a: GSH+: EC50 for E. coli strain MJF276, which is capable of producing glutathione (GSH); GSH−: EC50 for E. coli strain MJF335, which cannot produce GSH and is hence more susceptible to compounds that react with proteins (i.e., soft electrophiles); DNA+: EC50 for E. coli strain MV1161, which is capable of repairing DNA damage; DNA−: EC50 for E. coli strain MV4108, which cannot repair DNA damage and is hence more susceptible to compounds that react with DNA (i.e., hard electrophiles).
Observed and calculated effect concentrations (pEC50 (negative logarithm of 50% effective concentration) for X-Microtox and pECIR1.5 (negative logarithm of 1.5 induction ratio effective concentration) for the other assays, mol/L) of disinfection byproducts.
| No. | Name | X-Microtox observed pEC50 | X-Microtox calculated pEC50 | GSH+ observed pECIR1.5 | GSH+ calculated pECIR1.5 | GSH− observed pECIR1.5 | GSH− calculated pECIR1.5 | DNA+ observed pECIR1.5 | DNA+ calculated pECIR1.5 | DNA− observed pECIR1.5 | DNA− calculated pECIR1.5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1,1-dichloroethene | 3.1549 | 3.3214 | 1.3516 | 2.4847 | 1.4145 | 2.4357 | 1.0706 * | 1.7985 * | 0.4318 | 1.6512 |
| 2 | dichloromethane | 2.2840 | 3.1963 | 1.0888 * | 0.7391 * | 0.8861 | 0.7818 | 0.8539 | 1.6828 | 0.7212 | 0.9997 |
| 3 | bromochloromethane | 5.0706 | 4.4813 | 1.2182 | 1.6474 | 0.7328 | 1.8741 | 0.6021 | 1.7808 | - | - |
| 4 | chloroform | 2.4318 | 2.4675 | 1.2366 | 0.9330 | 1.4089 * | 1.0149 * | 1.1675 * | 1.2621 * | 1.0809 | 0.6385 |
| 5 | bromodichloromethane | 5.0269 | 4.7723 | 1.2111 * | 1.7614 * | 1.4034 | 1.9032 | 1.3188 | 1.6196 | 1.5686 * | 1.2324 * |
| 6 | bromoform | 3.6383 * | 3.2549 * | 1.0066 | 1.7407 | 1.5850 | 1.9863 | 1.0862 | 1.9166 | 1.1675 | 1.9629 |
| 7 | dibromochloromethane | 3.0000 * | 3.0193 * | 2.0809 | 1.7617 | 1.9208 | 2.0116 | 1.6990 * | 1.8321 * | 1.5528 * | 1.6647 * |
| 8 | dichloroiodomethane | 3.4949 * | 3.5960 * | 2.7959 | 2.9585 | 3.2076 | 3.4508 | 2.1938 | 1.8353 | 2.3279 | 1.6163 |
| 9 | bromochloroiodomethane | 1.6021 | 2.2717 | 2.7959 | 2.8967 | 3.3279 * | 3.3766 * | 2.2291 | 2.9723 | 2.3188 | 1.9777 |
| 10 | dibromoiodomethane | 4.0506 * | 4.0353 * | 2.8697 * | 2.8209 * | 3.4559 | 3.2854 | 2.0000 * | 1.9755 * | 1.9586 | 2.2126 |
| 11 | chlorodiiodomethane | 4.6576 | 4.4046 | 3.0809 | 2.9161 | 3.6990 | 3.3999 | 2.4437 * | 2.0457 * | 2.3098 | 2.2495 |
| 12 | bromodiiodomethane | 5.6021 | 3.8512 | 3.0969 * | 2.8369 * | 3.5850 | 3.3046 | 3.0506 | 1.9836 | 2.9586 * | 2.4210 * |
| 13 | triiodomethane | 2.4202 | 2.6149 | 3.3615 | 2.8486 | 3.9337 | 3.3188 | 2.8861 | 1.9398 | 2.8861 | 2.5894 |
| 14 | trichloronitromethane | 4.3098 | 3.7132 | 4.6383 | 4.2152 | 5.3143 | 4.9812 | 4.2007 * | 4.2819 * | 4.0809 | 3.3428 |
| 15 | tribromonitromethane | 2.7447 | 2.7705 | 5.3820 | 5.1283 | 6.4949 | 6.0793 | 4.8861 | 5.0640 | 4.7447 | 4.7139 |
| 16 | dichloroacetonitrile | 4.5086 | 3.7184 | 3.2757 | 2.5481 | 3.7632 * | 2.5119 * | 3.0362 | 2.5123 | 2.8239 | 2.9808 |
| 17 | trichloroacetonitrile | 4.8861 | 4.2672 | 3.8979 | 2.6617 | 3.7447 | 2.6486 | 3.7447 | 3.8560 | 3.4815 | 3.0753 |
| 18 | bromochloroacetonitrile | 4.0132 | 3.8159 | 4.3188 | 4.1067 | 4.2757 | 4.2982 | 3.8539 | 3.3736 | 3.7212 | 3.3159 |
| 19 | dibromoacetonitrile | 4.7696 | 3.9655 | 4.7100 | 4.7981 | 4.7825 * | 5.0416 * | 4.2291 | 4.1282 | 4.1938 | 3.5113 |
| 20 | 1,1-dichloropropanone | 2.7212 | 4.0555 | 3.0506 * | 2.2187 * | 3.3188 | 2.2263 | 2.4318 * | 2.2303 * | 2.3565 | 2.7129 |
| 21 | 1,1,1-trichloropropanone | 3.6576 * | 3.4857 * | 2.2803 | 2.3311 | 2.7364 | 2.3615 | 2.3872 | 1.0225 | 3.0000 | 2.2423 |
| 22 | chloroacetic acid | 6.0088 | 5.6592 | 2.1367 | 1.5737 | 1.9851 | 1.6179 | 1.6990 | 1.0652 | 1.6576 | 2.3448 |
| 23 | bromoacetic acid | 1.8861 | 3.2782 | 3.8697 | 2.5921 | 4.2111 | 2.8428 | 4.0655 | 3.3127 | 4.0000 | 3.1346 |
| 24 | iodoacetic acid | 2.6778 | 3.8068 | 4.3768 * | 3.8079 * | 4.7212 * | 4.305 * | 3.7447 | 2.4730 | 3.6576 | 3.8483 |
| 25 | dichloroacetic acid | 3.2147 | 4.0622 | 1.2967 | 1.6975 | 1.5229 | 1.7669 | 0.9208 | 1.2745 | 0.6198 | 1.1912 |
| 26 | bromochloroacetic acid | 5.1612 | 4.4508 | 2.0783 | 2.6122 | 2.3565 | 2.8669 | 1.1938 * | 1.9983 * | 1.6990 * | 1.8777 * |
| 27 | dibromoacetic acid | 4.1487 | 4.1865 | 2.2403 | 2.6637 | 2.4318 * | 2.9289 * | 1.6021 | 2.1776 | 1.8861 * | 2.2108 * |
| 28 | chloroiodoacetic acid | 1.8239 | 2.9267 | 4.4034 | 3.8242 | 4.5302 | 4.3246 | 4.0362 | 4.7239 | 4.1024 * | 4.8634 * |
| 29 | bromoiodoacetic acid | 3.7959 * | 3.8762 * | 4.2403 * | 3.8168 * | 4.0200 | 4.3157 | 3.7212 * | 4.8968 * | 3.8861 | 5.1786 |
| 30 | trichloroacetic acid | 3.2924 | 3.6489 | 1.4034 | 1.8108 | 1.4034 | 2.0112 | 1.0555 | 1.5905 | 1.0506 | 1.6881 |
| 31 | bromodichloroacetic acid | 1.4318 | 2.4029 | 2.7100 | 2.6367 | 2.9031 * | 2.8964 * | 1.7959 | 2.0658 | 1.6576 | 0.6340 |
| 32 | dibromochloroacetic acid | 3.5229 | 2.6718 | 2.7959 | 2.6882 | 2.8539 | 2.9584 | 1.4815 | 2.1417 | 1.4559 | 2.7074 |
| 33 | tribromoacetic acid | 4.4202 | 3.2073 | 3.3372 | 2.7322 | 3.6882 | 3.0113 | 2.1805 | 2.3490 | 2.6021 * | 2.9859 * |
| 34 | chloral hydrate | 2.1675 | 2.2067 | 2.2636 | 2.1046 | 2.1707 | 2.1359 | 1.3098 | 1.7185 | 1.6778 | 1.9750 |
| 35 | dichloroacetamide | 6.5229 | 6.9769 | 1.1135 | 1.4161 | 1.2798 | 1.5222 | 0.5850 | 1.5056 | 1.0506 | 1.6881 |
| 36 | bromochloroacetamide | 2.5686 * | 2.9516 * | 1.8539 * | 2.9712 * | 2.3565 | 3.3043 | 1.4559 | 2.8215 | 1.8239 * | 2.4295 * |
| 37 | dibromoacetamide | 3.0706 | 3.2043 | 4.2218 * | 3.6626 * | 4.2596 | 4.0477 | 3.9586 | 3.6467 | 3.6198 | 2.7807 |
| 38 | chloroiodoacetamide | 2.6576 | 3.3142 | 3.7212 | 3.5437 | 4.1192 | 4.0810 | 2.7212 | 2.0333 | 2.5376 | 2.1670 |
| 39 | bromoiodoacetamide | 3.3768 | 4.4729 | 3.1163 | 4.1762 | 3.7959 * | 4.7536 * | 2.2291 | 1.9651 | 2.0706 | 2.4718 |
| 40 | diiodoacetamide | 1.4318 | 1.1768 | 2.7825 | 3.5724 | 3.0482 | 4.1155 | 2.2218 | 2.1941 | 2.1938 | 2.1785 |
| 41 | trichloroacetamide | 2.0000 | 1.8559 | 0.3565 | 1.5288 | 0.7825 * | 1.6577 * | 1.0706 | 1.4651 | 1.5850 | 2.1329 |
| 42 | bromodichloroacetamide | 4.3098 * | 4.099 * | 3.6198 | 2.9951 | 3.8239 | 3.3331 | 4.0315 | 2.6693 | 3.7959 | 2.7569 |
| 43 | dibromochloroacetamide | 4.3768 * | 4.2953 * | 3.9566 | 3.6865 | 4.3188 | 4.0764 | 3.9586 | 3.8000 | 3.6383 | 3.1107 |
| 44 | tribromoacetamide | 2.1308 | 1.5148 | 4.3233 | 4.3703 | 4.6676 | 4.8106 | 4.4437 | 4.7359 | 4.2147 * | 3.4421 * |
| 45 | n-nitrosodimethylamine | 2.9208 | 3.0246 | - | - | - | - | - | - | - | - |
| 46 | n-nitrosodiethylamine | 7.4202 | 7.1686 | - | - | - | - | - | - | - | - |
| 47 | n-nitrosopiperidine | 3.8861 | 4.6292 | - | - | - | - | - | - | - | - |
| 48 | n-nitrosomorpholine | 3.8539 | 3.3096 | - | - | - | - | - | - | - | - |
| 49 | nitrosodi-n-butylamine | 3.5850 * | 3.4285 * | - | - | - | - | - | - | - | - |
| 50 | 3-chloro-4-(dichloromethyl)-5- | 4.7447 | 3.1323 | 5.2596 | 6.0139 | 5.6108 | 6.4454 | 4.8861 | 4.3949 | 4.9586 | 4.7578 |
* The chemical included in the test set.
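The external validation statistics named in the abstract can be recomputed from the starred (test-set) rows of the table above. The following is a minimal sketch, assuming the conventional definitions of Q²F1, Q²F2, Q²F3 and the concordance correlation coefficient; the function names are illustrative and are not taken from the paper. The sample data are the ten X-Microtox test-set pairs copied from the table.

```python
import math

def _press(obs, pred):
    """Sum of squared prediction errors over the external set."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred))

def rmse(obs, pred):
    """Root mean square error of prediction."""
    return math.sqrt(_press(obs, pred) / len(obs))

def q2_f1(obs, pred, train_mean):
    """Q2F1: deviations of the test data are taken from the training-set mean."""
    return 1.0 - _press(obs, pred) / sum((o - train_mean) ** 2 for o in obs)

def q2_f2(obs, pred):
    """Q2F2: deviations of the test data are taken from the test-set mean."""
    test_mean = sum(obs) / len(obs)
    return 1.0 - _press(obs, pred) / sum((o - test_mean) ** 2 for o in obs)

def q2_f3(obs, pred, y_train):
    """Q2F3: mean squared test error scaled by the training-set variance."""
    train_mean = sum(y_train) / len(y_train)
    tss_train = sum((y - train_mean) ** 2 for y in y_train)
    return 1.0 - (_press(obs, pred) / len(obs)) / (tss_train / len(y_train))

def ccc(obs, pred):
    """Concordance correlation coefficient between observed and predicted values."""
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    s_op = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    s_oo = sum((o - mean_o) ** 2 for o in obs)
    s_pp = sum((p - mean_p) ** 2 for p in pred)
    return 2.0 * s_op / (s_oo + s_pp + n * (mean_o - mean_p) ** 2)

# X-Microtox test-set compounds (rows 6, 7, 8, 10, 21, 29, 36, 42, 43, 49).
obs = [3.6383, 3.0000, 3.4949, 4.0506, 3.6576, 3.7959, 2.5686, 4.3098, 4.3768, 3.5850]
calc = [3.2549, 3.0193, 3.5960, 4.0353, 3.4857, 3.8762, 2.9516, 4.0990, 4.2953, 3.4285]

print(f"RMSEext = {rmse(obs, calc):.3f}")
print(f"Q2F2    = {q2_f2(obs, calc):.3f}")
print(f"CCC     = {ccc(obs, calc):.3f}")
```

Q²F1 and Q²F3 additionally need the training-set responses (the unstarred rows), which can be passed in the same way. The printed values can be compared with the thresholds quoted in the abstract (Q² > 0.7, CCC > 0.85).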
QSAR (quantitative structure–activity relationship) models and statistical parameters for the five endpoints of the disinfection byproducts (training set = 80% of the whole dataset, test set = 20% of the whole dataset).
| Endpoint | Equation a | Modeling b | Internal Validation c | External Validation d | Golbraikh & Tropsha e |
|---|---|---|---|---|---|
| X-Microtox | pEC50 = −11.8502 + 0.1230 SpDiam_B(m) + 4.9744 AVS_B(v) + 0.8805 Eig05_AEA(dm) − 3.3986 SddsN | | | | |
| GSH+ | pECIR1.5 = −2.4744 + 0.1022 C% + 0.3184 SpDiam_B(m) + 0.0725 P_VSA_LogP_8 + 0.2132 T(N..Br) | | | | |
| GSH− | pECIR1.5 = −2.4133 + 0.0894 C% + 0.3829 SpDiam_B(m) + 0.0835 P_VSA_LogP_8 + 0.2270 T(N..Br) | | | | |
| DNA+ | pECIR1.5 = 1.8732 + 0.0493 P_VSA_LogP_7 − 0.2258 Mor04s + 0.2798 T(N..Br) − 0.8971 T(N..I) | | | | |
| DNA− | pECIR1.5 = 0.9105 + 0.3091 Sv + 0.0493 P_VSA_LogP_7 + 0.2008 Mor03s − 1.0911 T(N..I) | | | | |
a SpDiam_B(m): spectral diameter from Burden matrix weighted by mass; AVS_B(v): average vertex sum from Burden matrix weighted by van der Waals volume; Eig05_AEA(dm): eigenvalue no. 5 from augmented edge adjacency matrix weighted by dipole moment; SddsN: sum of ddsN E-states; Sv: sum of atomic van der Waals volumes (scaled on carbon atom); C%: percentage of C atoms; P_VSA_LogP_7: P_VSA-like on LogP, bin 7; P_VSA_LogP_8: P_VSA-like on LogP, bin 8; T(N..Br): sum of topological distances between N..Br; T(N..I): sum of topological distances between N..I; Mor04s: signal 04/weighted by I-state; Mor03s: signal 03/weighted by I-state.
b ntr: number of samples in the training set; R²: coefficient of determination; adjusted R²; RMSEtr: root mean square error in fitting; F: F-value.
c Q²LOO: explained variance in leave-one-out prediction; RMSEcv: root mean square error in cross-validation prediction; Q²LMO: explained variance in leave-many-out prediction; the corresponding R² and Q² in Y-scrambling.
d ntest: number of samples in the test set; RMSEext: root mean square error in the test set; the external determination coefficient; Q²F1, Q²F2, and Q²F3: variance explained in the test set; CCC: concordance correlation coefficient; the average and delta values of the Roy criteria.
e k and k′: slopes of the regression lines over the external data; the R² values in the Golbraikh & Tropsha criteria.
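Each model in the table above is a four-descriptor linear equation, so applying it is a weighted sum plus an intercept. The sketch below copies the coefficients from the table; the `MODELS` dictionary, the function names, and the descriptor values in the example are hypothetical illustrations. Real descriptor values have to be computed with molecular descriptor software and are not reproduced in this record.

```python
# Intercepts and coefficients copied from the model table above.
MODELS = {
    "X-Microtox": (-11.8502, {"SpDiam_B(m)": 0.1230, "AVS_B(v)": 4.9744,
                              "Eig05_AEA(dm)": 0.8805, "SddsN": -3.3986}),
    "GSH+": (-2.4744, {"C%": 0.1022, "SpDiam_B(m)": 0.3184,
                       "P_VSA_LogP_8": 0.0725, "T(N..Br)": 0.2132}),
    "GSH-": (-2.4133, {"C%": 0.0894, "SpDiam_B(m)": 0.3829,
                       "P_VSA_LogP_8": 0.0835, "T(N..Br)": 0.2270}),
    "DNA+": (1.8732, {"P_VSA_LogP_7": 0.0493, "Mor04s": -0.2258,
                      "T(N..Br)": 0.2798, "T(N..I)": -0.8971}),
    "DNA-": (0.9105, {"Sv": 0.3091, "P_VSA_LogP_7": 0.0493,
                      "Mor03s": 0.2008, "T(N..I)": -1.0911}),
}

def predict(endpoint, descriptors):
    """Return the calculated pEC50 (X-Microtox) or pECIR1.5 (other assays)."""
    intercept, coefficients = MODELS[endpoint]
    missing = set(coefficients) - set(descriptors)
    if missing:
        raise KeyError(f"missing descriptors for {endpoint}: {sorted(missing)}")
    return intercept + sum(c * descriptors[name] for name, c in coefficients.items())

def effect_concentration(p_value):
    """Convert a negative-log value back to a concentration in mol/L."""
    return 10.0 ** (-p_value)

# Hypothetical descriptor values, for illustration only.
example = {"P_VSA_LogP_7": 10.0, "Mor04s": 1.5, "T(N..Br)": 2.0, "T(N..I)": 0.0}
p = predict("DNA+", example)
print(f"calculated pECIR1.5 = {p:.4f}, ECIR1.5 = {effect_concentration(p):.3e} mol/L")
```

Raising an error on a missing descriptor, rather than silently treating it as zero, avoids producing a plausible-looking but wrong prediction.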
Figure 1. Scatter plot of observed versus calculated pEC50 (A) and Williams plot of the final model (B) for 50 disinfection byproducts in the X-Microtox assay. "●": training set; "○": test set; h*: warning leverage.
Figure 2. Scatter plots of observed versus calculated pECIR1.5 ((A) for GSH+ and (C) for GSH−) and Williams plots ((B) for GSH+ and (D) for GSH−) of the final models for 45 disinfection byproducts. "●": training set; "○": test set.
Figure 3. Scatter plots of observed versus calculated pECIR1.5 ((A) for DNA+ and (C) for DNA−) and Williams plots ((B) for DNA+ and (D) for DNA−) of the final models for 45 disinfection byproducts. "●": training set; "○": test set.
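A Williams plot places each compound by its leverage and standardized residual, and compounds whose leverage exceeds the warning leverage h* lie outside the applicability domain. The sketch below assumes the conventional definitions, h_i = x_iᵀ(XᵀX)⁻¹x_i with an intercept column and h* = 3(p + 1)/n for p descriptors and n training compounds; the record itself does not state the formula. The descriptor matrix is random stand-in data, and the code is dependency-free Python rather than the authors' software.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def leverages(train, query):
    """Leverage of each query row with respect to the training descriptor matrix."""
    X = [[1.0] + list(row) for row in train]  # prepend the intercept column
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    result = []
    for row in query:
        x = [1.0] + list(row)
        z = solve(xtx, x)  # z = (X^T X)^-1 x
        result.append(sum(xi * zi for xi, zi in zip(x, z)))
    return result

def warning_leverage(n_train, n_descriptors):
    """h* = 3 (p + 1) / n."""
    return 3.0 * (n_descriptors + 1) / n_train

# Stand-in data: 40 training compounds (80% of 50) with four descriptors each.
random.seed(0)
train = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(40)]
h_star = warning_leverage(len(train), 4)
h_train = leverages(train, train)
h_far = leverages(train, [[10.0, 10.0, 10.0, 10.0]])[0]  # far from the training data

print(f"h* = {h_star:.3f}")
print(f"training compounds above h*: {sum(h > h_star for h in h_train)}")
print(f"distant query inside domain: {h_far <= h_star}")
```

The training leverages sum to the number of model parameters (five here), which is a quick sanity check on any implementation.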