Tengjiao Fan, Guohui Sun, Lijiao Zhao, Xin Cui, Rugang Zhong.
Abstract
To better understand the mechanism of in vivo toxicity of N-nitroso compounds (NNCs), rat acute oral toxicity data (50% lethal dose, LD50) for 80 NNCs were used to establish quantitative structure-activity relationship (QSAR) and classification models. Descriptors calculated by quantum chemistry methods were combined with Dragon descriptors to describe the molecular information of all compounds. Genetic algorithm (GA) and multiple linear regression (MLR) analyses were combined to develop QSAR models. Fingerprints and machine learning methods were used to establish classification models. The quality and predictive performance of all established models were evaluated by internal and external validation techniques. The best GA-MLR-based QSAR model, containing eight molecular descriptors, was obtained with Q²loo = 0.7533, R² = 0.8071, Q²ext = 0.7041 and R²ext = 0.7195. The QSAR results showed that the acute oral toxicity of NNCs mainly depends on three factors, namely the polarizability, the ionization potential (IP) and the presence/absence and frequency of the C–O bond. For classification studies, the best model was obtained using the MACCS keys fingerprint combined with the artificial neural network (ANN) algorithm. The classification models suggested that several representative substructures, including nitrile, hetero N nonbasic, alkylchloride and amine-containing fragments, are the main contributors to the high toxicity of NNCs. Overall, the developed QSAR and classification models of the rat acute oral toxicity of NNCs showed satisfactory predictive ability. The results provide insight into the toxicity mechanism of NNCs in vivo and might be used for a preliminary assessment of NNC toxicity to mammals.
Keywords: N-nitroso compounds; QSAR; acute oral toxicity; classification; toxicity mechanism
Year: 2018 PMID: 30282923 PMCID: PMC6213880 DOI: 10.3390/ijms19103015
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Fitting and internal validation parameters of GA-MLR-based QSAR models selected by Multi-Criteria Decision Making (MCDM).
| No. | Model No. | Number of Descriptors | Descriptors | R² | R²adj | RMSEtr | CCCtr | F | Q²loo | RMSEcv | CCCcv | Q²lmo | R²Yscr | Q²Yscr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 21 | 8 | nR06 MATS6p MATS4i JGI4 SpMin7_Bh(i) B01[C-O] F04[C-O] HOMO | 0.8071 | 0.7796 | 0.2661 | 0.8933 | 29.2961 | 0.7533 | 0.3010 | 0.8651 | 0.7379 | 0.1247 | −0.1890 |
| 2 | 27 | 8 | nR06 MATS6p MATS4i JGI4 SpMin7_Bh(i) P_VSA_MR_1 B01[C-O] F04[C-O] | 0.8033 | 0.7752 | 0.2688 | 0.8909 | 28.5870 | 0.7432 | 0.3071 | 0.8596 | 0.7267 | 0.1247 | −0.1856 |
| 3 | 29 | 8 | D/Dtr06 MATS6p MATS4i JGI4 SpMin7_Bh(i) B01[C-O] F04[C-O] HOMO | 0.8023 | 0.7740 | 0.2695 | 0.8903 | 28.3984 | 0.7504 | 0.3028 | 0.8632 | 0.7335 | 0.1268 | −0.1880 |
| 4 | 33 | 7 | MATS6p MATS4i GATS1m JGI4 SpMin7_Bh(i) B01[C-O] F04[C-O] | 0.7872 | 0.7611 | 0.2796 | 0.8809 | 30.1220 | 0.7322 | 0.3136 | 0.8520 | 0.7169 | 0.1097 | −0.1644 |
| 5 | 34 | 7 | Mp MATS6p MATS4i JGI4 SpMin7_Bh(i) B01[C-O] F04[C-O] | 0.7848 | 0.7584 | 0.2811 | 0.8794 | 29.6947 | 0.7262 | 0.3171 | 0.8484 | 0.7076 | 0.1100 | −0.1636 |
| 6 | 36 | 7 | MATS6p MATS4i JGI4 SpMin7_Bh(i) H-046 B01[C-O] F04[C-O] | 0.7807 | 0.7538 | 0.2838 | 0.8768 | 28.9864 | 0.7276 | 0.3163 | 0.8491 | 0.7140 | 0.1077 | −0.1700 |
| 7 | 37 | 7 | ZM1Mad MATS6p MATS4i GGI4 SpMin7_Bh(i) B01[C-O] F04[C-O] | 0.7797 | 0.7527 | 0.2844 | 0.8762 | 28.8222 | 0.7214 | 0.3199 | 0.8446 | 0.7045 | 0.1094 | −0.1669 |
| 8 | 38 | 7 | MATS6p MATS4i JGI4 SpMin5_Bh(s) P_VSA_MR_1 B01[C-O] F04[C-O] | 0.7786 | 0.7514 | 0.2851 | 0.8755 | 28.6378 | 0.7223 | 0.3194 | 0.8463 | 0.7040 | 0.1104 | −0.1621 |
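The R²adj and F columns above follow from R², the number of descriptors p, and the training-set size n through the usual MLR formulas. The sketch below is only an illustration: the source does not state n, and the value of 65 used here is an assumption chosen because it reproduces the tabulated R²adj and F to within rounding.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R² for an MLR model with p descriptors fitted on n compounds."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def f_statistic(r2, n, p):
    """Fisher F ratio: explained variance per descriptor over residual variance."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


# Assumption: n = 65 is inferred from the table, not stated in the source.
N_TRAIN = 65

# Model 21 (8 descriptors, R² = 0.8071); table lists R²adj = 0.7796, F = 29.2961.
print(round(adjusted_r2(0.8071, N_TRAIN, 8), 4))
print(round(f_statistic(0.8071, N_TRAIN, 8), 2))

# Model 33 (7 descriptors, R² = 0.7872); table lists R²adj = 0.7611, F = 30.1220.
print(round(adjusted_r2(0.7872, N_TRAIN, 7), 4))
print(round(f_statistic(0.7872, N_TRAIN, 7), 2))
```

Small differences from the tabulated values are expected because the inputs are already rounded to four decimals.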
Figure 1. Experimental versus predicted toxicity values for compounds in the training set (red circles) and test set (blue squares) of the best GA-MLR (genetic algorithm-multiple linear regression)-based quantitative structure-activity relationship (QSAR) model.
External validation parameters of GA-MLR-based QSAR models selected by MCDM.
| No. | Model No. | Number of Descriptors | Descriptors | R²ext | RMSEext | Q²F1 | Q²F2 | Q²F3 | CCCext | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 21 | 8 | nR06 MATS6p MATS4i JGI4 SpMin7_Bh(i) B01[C-O] F04[C-O] HOMO | 0.5401 | 0.3544 | 0.5147 | 0.5144 | 0.6581 | 0.7023 | 0.9774 | 1.0132 |
| 2 | 27 | 8 | nR06 MATS6p MATS4i JGI4 SpMin7_Bh(i) P_VSA_MR_1 B01[C-O] F04[C-O] | 0.5100 | 0.3709 | 0.4685 | 0.4681 | 0.6255 | 0.7003 | 0.9784 | 1.0110 |
| 3 | 29 | 8 | D/Dtr06 MATS6p MATS4i JGI4 SpMin7_Bh(i) B01[C-O] F04[C-O] HOMO | 0.5153 | 0.3659 | 0.4862 | 0.4823 | 0.6355 | 0.6767 | 0.9742 | 1.0159 |
| 4 | 33 | 7 | MATS6p MATS4i GATS1m JGI4 SpMin7_Bh(i) B01[C-O] F04[C-O] | 0.4632 | 0.3806 | 0.4381 | 0.4399 | 0.6056 | 0.6398 | 0.9787 | 1.0101 |
| 5 | 34 | 7 | Mp MATS6p MATS4i JGI4 SpMin7_Bh(i) B01[C-O] F04[C-O] | 0.4712 | 0.3813 | 0.4381 | 0.4377 | 0.6041 | 0.6508 | 0.9752 | 1.0138 |
| 6 | 36 | 7 | MATS6p MATS4i JGI4 SpMin7_Bh(i) H-046 B01[C-O] F04[C-O] | 0.4726 | 0.3780 | 0.4478 | 0.4474 | 0.6109 | 0.6609 | 0.9804 | 1.0084 |
| 7 | 37 | 7 | ZM1Mad MATS6p MATS4i GGI4 SpMin7_Bh(i) B01[C-O] F04[C-O] | 0.6322 | 0.3295 | 0.5805 | 0.5802 | 0.7044 | 0.7851 | 0.9751 | 1.0171 |
| 8 | 38 | 7 | MATS6p MATS4i JGI4 SpMin5_Bh(s) P_VSA_MR_1 B01[C-O] F04[C-O] | 0.4710 | 0.3786 | 0.4461 | 0.4457 | 0.6097 | 0.6702 | 0.9855 | 1.0030 |
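For readers unfamiliar with the external validation columns, the sketch below computes RMSEext, Q²F1, Q²F2, Q²F3 and CCCext using their commonly cited definitions. The formulas are assumed to match those used in the study, and the numbers are toy values for illustration only, not data from the paper.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sum_sq_dev(values, center):
    """Sum of squared deviations of values from a given center."""
    return sum((v - center) ** 2 for v in values)


def external_metrics(y_train, y_ext, y_pred):
    """External validation statistics from observed and predicted responses."""
    press = sum((o - p) ** 2 for o, p in zip(y_ext, y_pred))
    mean_tr, mean_ext, mean_pred = mean(y_train), mean(y_ext), mean(y_pred)
    n_tr, n_ext = len(y_train), len(y_ext)
    cov = sum((o - mean_ext) * (p - mean_pred) for o, p in zip(y_ext, y_pred))
    return {
        "rmse_ext": math.sqrt(press / n_ext),
        # Q2F1 references the training-set mean, Q2F2 the external-set mean.
        "q2_f1": 1 - press / sum_sq_dev(y_ext, mean_tr),
        "q2_f2": 1 - press / sum_sq_dev(y_ext, mean_ext),
        # Q2F3 scales both terms by their own set sizes.
        "q2_f3": 1 - (press / n_ext) / (sum_sq_dev(y_train, mean_tr) / n_tr),
        # Concordance correlation coefficient between observed and predicted.
        "ccc_ext": 2 * cov / (
            sum_sq_dev(y_ext, mean_ext)
            + sum_sq_dev(y_pred, mean_pred)
            + n_ext * (mean_ext - mean_pred) ** 2
        ),
    }


# Toy values (hypothetical, not taken from the study).
metrics = external_metrics(
    y_train=[2.0, 3.0, 4.0, 5.0], y_ext=[2.0, 4.0], y_pred=[2.5, 3.5]
)
print(metrics)
```

Because Q²F1 and Q²F3 depend on the training-set response values, they can differ from Q²F2 even for identical predictions, which is one reason several criteria are reported side by side.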
Figure 2. Williams plot for the best GA-MLR-based QSAR model. The horizontal dashed lines mark ±3 standardized residuals; the vertical black line marks the warning leverage h* = 0.415.
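The warning leverage in a Williams plot is conventionally set to h* = 3(p + 1)/n, where p is the number of descriptors and n the number of training compounds. The sketch below assumes that convention and a training-set size of 65, which is not stated in the source but reproduces the reported h* = 0.415 for the eight-descriptor model.

```python
def warning_leverage(n_descriptors, n_train):
    """Conventional applicability-domain threshold h* = 3(p + 1) / n."""
    return 3.0 * (n_descriptors + 1) / n_train


# Assumption: 65 training compounds (inferred, not stated in the source).
h_star = warning_leverage(n_descriptors=8, n_train=65)
print(round(h_star, 3))
```

A compound whose leverage exceeds h* lies outside the structural domain of the training set, so its prediction is an extrapolation and should be treated with caution.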
Type and chemical meaning of molecular descriptors in the best QSAR model.
| Descriptor | Type | Chemical Meaning |
|---|---|---|
| nR06 | Ring descriptors | Number of 6-membered rings |
| MATS6p | 2D autocorrelations | Moran autocorrelation of lag 6 weighted by polarizability |
| MATS4i | 2D autocorrelations | Moran autocorrelation of lag 4 weighted by ionization potential |
| JGI4 | 2D autocorrelations | Mean topological charge index of order 4 |
| SpMin7_Bh(i) | Burden eigenvalues | Smallest eigenvalue n. 7 of Burden matrix weighted by ionization potential |
| B01[C-O] | 2D Atom Pairs | Presence/absence of C–O at topological distance 1 |
| F04[C-O] | 2D Atom Pairs | Frequency of C–O at topological distance 4 |
| HOMO | QM descriptors | Highest occupied molecular orbital energy |
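To make the two atom-pair descriptors concrete, the sketch below counts C–O pairs at a given topological distance (number of bonds on the shortest path) in a hand-built heavy-atom graph. The graph is an assumed connectivity for methoxymethyl-ethylnitrosamine, CH3-O-CH2-N(N=O)-CH2-CH3, drawn by hand for illustration; the function names are hypothetical and this is not the Dragon implementation.

```python
from collections import deque


def topological_distances(adjacency, start):
    """Breadth-first search: shortest bond count from start to every atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def atom_pair_count(elements, adjacency, a, b, lag):
    """Number of unordered a-b atom pairs separated by exactly `lag` bonds."""
    atoms = sorted(elements)
    count = 0
    for i, u in enumerate(atoms):
        dist = topological_distances(adjacency, u)
        for v in atoms[i + 1:]:
            if {elements[u], elements[v]} == {a, b} and dist.get(v) == lag:
                count += 1
    return count


# Assumed heavy-atom graph: C1-O2-C3-N4(-N5=O6)-C7-C8 (hydrogens omitted).
elements = {1: "C", 2: "O", 3: "C", 4: "N", 5: "N", 6: "O", 7: "C", 8: "C"}
bonds = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (4, 7), (7, 8)]
adjacency = {atom: [] for atom in elements}
for u, v in bonds:
    adjacency[u].append(v)
    adjacency[v].append(u)

b01_co = int(atom_pair_count(elements, adjacency, "C", "O", 1) > 0)  # presence
f04_co = atom_pair_count(elements, adjacency, "C", "O", 4)           # frequency
print(b01_co, f04_co)
```

In this graph the ether oxygen is bonded to two carbons, so the presence flag is 1, and both oxygens sit four bonds from the terminal ethyl carbon, so the frequency at distance 4 is 2.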
Figure 3. Several typical compounds containing 6-membered rings among the 22 highly toxic N-nitroso compounds (NNCs).
Figure 4. Chemical diversity analysis of the training and external test sets. (A) Chemical space defined by molecular weight (MW) and Ghose−Crippen LogKow (ALogP). N is the number of chemicals in each data set. (B) Similarity heat map of Euclidean distance metrics calculated using the MACCS keys fingerprint for the training and external test sets.
Figure 5. Performance of 10-fold cross-validation for the training set in 28 classification models. CA, classification accuracy; AUC, the area under the ROC curve; SE, sensitivity; SP, specificity.
Performance of the top eight models for training set and external test set in classification study 1.
| Data Set | Model | CA | SE | SP | AUC | TP | TN | FP | FN |
|---|---|---|---|---|---|---|---|---|---|
| Training set | MACCS-ANN | 0.821 | 0.67 | 0.88 | 0.905 | 10 | 36 | 5 | 5 |
| | PubChem-ANN | 0.839 | 0.73 | 0.88 | 0.885 | 11 | 36 | 5 | 4 |
| | SubFP-ANN | 0.732 | 0.33 | 0.88 | 0.872 | 5 | 36 | 5 | 10 |
| | PubChem-LR | 0.804 | 0.47 | 0.93 | 0.860 | 7 | 38 | 3 | 8 |
| | PubChem-RF | 0.804 | 0.47 | 0.93 | 0.857 | 7 | 38 | 3 | 8 |
| | Est-ANN | 0.732 | 0.33 | 0.88 | 0.840 | 5 | 36 | 5 | 10 |
| | MACCS-LR | 0.768 | 0.40 | 0.90 | 0.832 | 6 | 37 | 4 | 9 |
| | MACCS-SVM | 0.750 | 0.47 | 0.85 | 0.770 | 7 | 35 | 6 | 8 |
| Test set | MACCS-ANN | 0.792 | 0.29 | 1.00 | 0.992 | 2 | 17 | 0 | 5 |
| | PubChem-ANN | 0.708 | 0.29 | 0.88 | 0.765 | 2 | 15 | 2 | 5 |
| | SubFP-ANN | 0.667 | 0.29 | 0.82 | 0.626 | 2 | 14 | 3 | 5 |
| | PubChem-LR | 0.792 | 0.43 | 0.94 | 0.889 | 3 | 16 | 1 | 4 |
| | PubChem-RF | 0.708 | 0.14 | 0.94 | 0.693 | 1 | 16 | 1 | 6 |
| | Est-ANN | 0.750 | 0.14 | 1.00 | 0.790 | 1 | 17 | 0 | 6 |
| | MACCS-LR | 0.875 | 0.57 | 1.00 | 0.899 | 4 | 17 | 0 | 3 |
| | MACCS-SVM | 0.875 | 0.71 | 0.94 | 0.958 | 5 | 16 | 1 | 2 |
1 Notes: CA, classification accuracy; SE, sensitivity; SP, specificity; AUC, the area under the ROC curve; TP, the number of true positive compounds; TN, the number of true negative compounds; FP, the number of false positive compounds; FN, the number of false negative compounds.
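CA, SE and SP in the table can be recomputed from the TP, TN, FP and FN counts with the standard confusion-matrix formulas. The sketch below checks two rows of the table; the function name is illustrative.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    return {
        "CA": (tp + tn) / (tp + tn + fp + fn),  # fraction classified correctly
        "SE": tp / (tp + fn),                   # high-toxicity compounds recovered
        "SP": tn / (tn + fp),                   # low-toxicity compounds recovered
    }


# MACCS-ANN on the training set: TP = 10, TN = 36, FP = 5, FN = 5.
train_ann = classification_metrics(tp=10, tn=36, fp=5, fn=5)
print({name: round(value, 3) for name, value in train_ann.items()})

# MACCS-SVM on the test set: TP = 5, TN = 16, FP = 1, FN = 2.
test_svm = classification_metrics(tp=5, tn=16, fp=1, fn=2)
print({name: round(value, 3) for name, value in test_svm.items()})
```

Both rows agree with the tabulated values after rounding, which also confirms that the training and test sets hold 56 and 24 compounds respectively.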
Privileged substructures in compounds with high toxicity identified by information gain and frequency analysis methods.
| No. | Description | SMARTS | General Structures | Representative Compounds | IG | FH |
|---|---|---|---|---|---|---|
| SubFP133 | Nitrile | [NX1]#[CX2] | | | 0.048 | 3.64 |
| SubFP181 | Hetero N nonbasic | [nX2,nX3+] | | | 0.037 | 2.73 |
| SubFP184 | Heteroaromatic | [a;!c] | | | 0.037 | 2.73 |
| SubFP8 | Alkylchloride | [ClX1][CX4] | | | 0.024 | 3.64 |
| SubFP26 | Tertiary aliph amine | [NX3H0+0,NX4H1+;!$([N][!C]);!$([N]*~[#7,#8,#15,#16])] | | | 0.024 | 3.64 |
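Information gain (IG) measures how much knowing whether a substructure is present reduces the uncertainty about the toxicity class. The sketch below uses the standard entropy-based definition with base-2 logarithms, which is an assumption about the study's exact settings; the counts are hypothetical and are not taken from the paper.

```python
import math


def entropy(counts):
    """Shannon entropy in bits of a class distribution given as counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(high_with, low_with, high_without, low_without):
    """Class entropy minus the entropy remaining after splitting on a substructure."""
    n_with = high_with + low_with
    n_without = high_without + low_without
    total = n_with + n_without
    before = entropy([high_with + high_without, low_with + low_without])
    after = 0.0
    if n_with:
        after += (n_with / total) * entropy([high_with, low_with])
    if n_without:
        after += (n_without / total) * entropy([high_without, low_without])
    return before - after


# Hypothetical counts: 3 high / 1 low toxicity with the fragment,
# 1 high / 3 low without it.
ig_example = information_gain(3, 1, 1, 3)
print(round(ig_example, 4))
```

A fragment that splits the classes perfectly gives the maximum IG for the data set, while one distributed evenly across both classes gives zero.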
Names, CAS no. and corresponding toxicity values of N-nitroso compounds used in this study.
| No. | Name | CAS No. | LD50 (mg/kg) | log(LD50)⁻¹ | Predicted log(LD50)⁻¹ |
|---|---|---|---|---|---|
| | Diallylnitrosamine a | 16338-97-9 | 800 (L) b | 3.10 | 3.20 |
| | Dipentylnitrosamine | 13256-06-9 | 1750 (L) | 2.76 | 2.72 |
| | | 99-80-9 | 1370 (L) | 2.86 | 3.20 |
| | Nitroso- | 13256-11-6 | 48 (H) b | 4.32 | 4.30 |
| | | 82018-90-4 | 960 (L) | 3.02 | 3.20 |
| | Nitrosodibutylamine | 924-16-3 | 1200 (L) b | 2.92 | 3.17 |
| | | 621-64-7 | 480 (L) b | 3.32 | 3.25 |
| | Nitrosoethylmethylamine | 10595-95-6 | 90 (H) b | 4.05 | 4.34 |
| | 2-Nitrosomethylaminopyridine a | 16219-98-0 | 60 (H) | 4.22 | 4.23 |
| | Nitrosomethylaniline | 614-00-6 | 225 (L) | 3.65 | 4.15 |
| | Diisopropylnitrosamine | 601-77-4 | 850 (L) | 3.07 | 2.68 |
| | | 625-89-8 | 300 (L) | 3.52 | 3.46 |
| | | 3398-69-4 | 1600 (L) b | 2.80 | 2.71 |
| | | 3684-97-7 | 45 (H) | 4.35 | 4.25 |
| | | 3817-11-6 | 1800 (L) b | 2.74 | 2.34 |
| | | 4549-40-0 | 24 (H) | 4.62 | 4.51 |
| | | 4549-43-3 | 340 (L) | 3.47 | 3.55 |
| | | 4549-44-4 | 380 (L) b | 3.42 | 3.71 |
| | | 5336-53-8 | 900 (L) | 3.05 | 2.92 |
| | | 5432-28-0 | 30 (H) b | 4.52 | 3.92 |
| | Nitrosomethyl-n-butylamine | 7068-83-9 | 130 (H) | 3.89 | 3.99 |
| | | 13147-25-6 | 7500 (L) | 2.12 | 2.53 |
| | | 13256-07-0 | 120 (H) | 3.92 | 3.85 |
| | Dinitrosodimethylethylenediamine | 13256-12-7 | 125 (H) b | 3.90 | 3.90 |
| | Vinylethylnitrosamine | 13256-13-8 | 88 (H) | 4.06 | 3.68 |
| | | 13256-22-9 | 5000 (L) | 2.30 | 2.68 |
| | 2-Chloro- | 16339-16-5 | 22 (H) | 4.66 | 4.10 |
| | | 39885-14-8 | 700 (L) | 3.15 | 3.34 |
| | Methyl(acetoxymethyl)nitrosamine a | 56856-83-8 | 130 (H) | 3.89 | 3.83 |
| | Acetoxymethylbutylnitrosamine a | 56986-36-8 | 1500 (L) | 2.82 | 2.89 |
| | 1-Methoxy-ethyl-ethylnitrosamine | 61738-03-2 | 1000 (L) b | 3.00 | 2.84 |
| | Methoxymethyl-ethylnitrosamine | 61738-04-3 | 540 (L) | 3.27 | 3.12 |
| | 1-Methoxy-ethyl-methylnitrosamine | 61738-05-4 | 240 (L) | 3.62 | 3.35 |
| | Acetoxymethylpropylnitrosamine | 66017-91-2 | 1000 (L) | 3.00 | 3.05 |
| | Methyl(butyroxymethyl)nitrosamine | 67557-56-6 | 800 (L) b | 3.10 | 3.20 |
| | Acetoxymethyltrideuteromethylnitrosamine | 67557-57-7 | 120 (H) | 3.92 | 3.88 |
| | | 148-97-0 | 490 (L) b | 3.31 | 3.53 |
| | | 937-40-6 | 18 (H) b | 4.74 | 4.22 |
| | 4-(Methylnitrosoamino)benzaldehyde a | 7431-19-8 | 2000 (L) | 2.70 | 2.76 |
| | 3-( | 13256-21-8 | 750 (L) | 3.12 | 2.88 |
| | Aethyl-4-picolylnitrosamin | 13256-23-0 | 40 (H) | 4.40 | 4.02 |
| | | 13256-32-1 | 280 (L) | 3.55 | 3.50 |
| | | 13344-50-8 | 4000 (L) | 2.40 | 2.65 |
| | 4-Nitrosomethylaminopyridine | 16219-99-1 | 200 (L) | 3.70 | 3.81 |
| | | 16339-04-1 | 1100 (L) b | 2.96 | 3.19 |
| | | 16339-14-3 | 95 (H) b | 4.02 | 4.05 |
| | | 16339-18-7 | 163 (H) | 3.79 | 3.89 |
| | | 20689-96-7 | 250 (L) b | 3.60 | 3.66 |
| | | 56235-95-1 | 1000 (L) b | 3.00 | 2.79 |
| | | 62783-48-6 | 90 (H) | 4.05 | 3.96 |
| | | 62783-49-7 | 600 (L) | 3.22 | 3.41 |
| | | 62783-50-0 | 400 (L) b | 3.40 | 3.89 |
| | | 68690-89-1 | 600 (L) | 3.22 | 4.00 |
| | | 68690-90-4 | 2100 (L) | 2.68 | 2.82 |
| | 3-Nitrosomethylaminopyridine | 69658-91-9 | 10 (H) | 5.00 | 4.40 |
| | | 55-18-5 | 220 (L) | 3.66 | 3.62 |
| | | 62-75-9 | 37 (H) | 4.43 | 4.53 |
| | | 86-30-6 | 1825 (L) | 2.74 | 2.74 |
| | | 3276-41-3 | 900 (L) | 3.05 | 3.05 |
| | | 14026-03-0 | 600 (L) | 3.22 | 2.94 |
| | | 36702-44-0 | 600 (L) | 3.22 | 3.00 |
| | | 20917-49-1 | 283 (L) | 3.55 | 3.58 |
| | | 59-89-2 | 282 (L) | 3.55 | 3.23 |
| | | 930-55-2 | 900 (L) | 3.05 | 3.35 |
| | 1-Nitrosopiperazine | 5632-47-3 | 2260 (L) | 2.65 | 3.39 |
| | | 100-75-4 | 200 (L) | 3.70 | 3.39 |
| | | 40548-68-3 | 830 (L) b | 3.08 | 2.95 |
| | | 932-83-2 | 336 (L) | 3.47 | 3.51 |
| | | 7633-57-0 | 320 (L) | 3.49 | 3.40 |
| | | 16339-07-4 | 100 (H) b | 4.00 | 3.51 |
| | | 20917-50-4 | 566 (L) b | 3.25 | 3.40 |
| | 3-Nitrosotetrahydro-1,3-oxazine | 35627-29-3 | 600 (L) | 3.22 | 3.29 |
| | | 39884-52-1 | 1500 (L) | 2.82 | 2.92 |
| | 1-Amyl-1-nitrosourea a | 10589-74-9 | 560 (L) | 3.25 | 3.30 |
| | | 869-01-2 | 400 (L) b | 3.40 | 3.49 |
| | | 759-73-9 | 300 (L) | 3.52 | 3.46 |
| | | 684-93-5 | 110 (H) | 3.96 | 4.27 |
| | Propylnitrosourea | 816-57-9 | 480 (L) | 3.32 | 3.13 |
| | | 13860-69-0 | 450 (L) b | 3.35 | 3.73 |
| | Ethylnitrosobiuret a | 32976-88-8 | 1050 (L) | 2.98 | 3.53 |
a Test set in QSAR study; b Test set in classification study; c Outlier in the best GA-MLR-based QSAR model.
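The transformed toxicity values in the table appear consistent with 6 − log10(LD50 in mg/kg), that is, the negative logarithm of the dose expressed as a mass fraction of body weight. This relationship is inferred from the tabulated numbers rather than stated in the source, so the sketch below should be read as a plausibility check, not as the authors' documented procedure.

```python
import math


def neg_log_ld50(ld50_mg_per_kg):
    """Assumed transform: -log10 of LD50 expressed in kg per kg body weight."""
    return 6.0 - math.log10(ld50_mg_per_kg)


# (LD50 in mg/kg, tabulated transformed value) pairs copied from the table.
examples = [(800, 3.10), (48, 4.32), (7500, 2.12), (10, 5.00), (2260, 2.65)]
for ld50, tabulated in examples:
    print(ld50, round(neg_log_ld50(ld50), 2), tabulated)
```

Under this transform a larger value means a lower lethal dose, so higher predicted values correspond to more toxic compounds.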