Natalja Fjodorova, Marjan Vracko, Marjana Novic, Alessandra Roncaglioni, Emilio Benfenati.
Abstract
BACKGROUND: One of the main goals of the new chemical regulation REACH (Registration, Evaluation and Authorization of Chemicals) is to fill the gaps in data on the properties of chemicals that affect human health. (Q)SAR models are accepted as a suitable source of this information. The EU-funded CAESAR project aimed to develop models predicting five endpoints for regulatory purposes. Carcinogenicity is one of the endpoints under consideration.
Year: 2010 PMID: 20678182 PMCID: PMC2913330 DOI: 10.1186/1752-153X-4-S1-S3
Source DB: PubMed Journal: Chem Cent J ISSN: 1752-153X Impact factor: 4.215
Figure 1. Statistical performance of the model with 8 MDL descriptors (model A) and network dimension 35×35 as a function of the number of learning epochs. The optimal model corresponds to 800 learning epochs (test-set accuracy 0.73).
Figure 2. Statistical performance of the model with 12 Dragon descriptors (model B) and network dimension 35×35 as a function of the number of learning epochs. The optimal model corresponds to 200 learning epochs (test-set accuracy 0.69).
Statistical performance of models using 8 MDL descriptors (Model A) and 12 Dragon descriptors (Model B).
| | Model A (8 MDL descriptors), training set | Model A, test set | Model B (12 Dragon descriptors), training set | Model B, test set |
|---|---|---|---|---|
| Accuracy, % | 91 | 73 | 89 | 69 |
| Sensitivity, % | 96 | 75 | 90 | 75 |
| Specificity, % | 86 | 69 | 87 | 61 |
Figure 3. Accuracy (ACC), sensitivity (SE) and specificity (SP) of the test set (161 compounds) vs. classification threshold for CP ANN model A.
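Figure 3 shows how ACC, SE and SP change as the classification threshold moves. The sketch below illustrates such a sweep. It assumes that the network returns a continuous score between 0 and 1 for each compound, and that a compound is assigned to the carcinogen class when its score is at or above the threshold. The function names and toy scores are hypothetical and are not the CAESAR test set.

```python
def confusion_counts(scores, labels, threshold):
    """Count TP, TN, FP, FN; label 1 = carcinogen, 0 = non-carcinogen."""
    tp = tn = fp = fn = 0
    for score, observed in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and observed == 1:
            tp += 1
        elif predicted == 0 and observed == 0:
            tn += 1
        elif predicted == 1 and observed == 0:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def sweep_thresholds(scores, labels, thresholds):
    """Return (threshold, ACC, SE, SP) for every threshold in the list."""
    rows = []
    for t in thresholds:
        tp, tn, fp, fn = confusion_counts(scores, labels, t)
        acc = (tp + tn) / (tp + tn + fp + fn)
        se = tp / (tp + fn) if tp + fn else 0.0
        sp = tn / (tn + fp) if tn + fp else 0.0
        rows.append((t, acc, se, sp))
    return rows


if __name__ == "__main__":
    # Hypothetical network scores and observed classes
    scores = [0.9, 0.8, 0.65, 0.4, 0.3, 0.55, 0.2, 0.1]
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    for t, acc, se, sp in sweep_thresholds(scores, labels, [0.25, 0.5, 0.75]):
        print(f"threshold={t:.2f} ACC={acc:.2f} SE={se:.2f} SP={sp:.2f}")
```

With these toy values, raising the threshold from 0.25 to 0.75 moves SE from 1.00 down to 0.50 and SP from 0.50 up to 1.00. This is the same kind of trade-off that a threshold plot makes visible.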
Confusion matrix for external validation set of 738 chemicals of the model obtained with MDL descriptors (model A).
| | Leadscope experimental carcinogenicity class | |
|---|---|---|
| | 231 | 155 |
| | 130 | 222 |
Confusion matrix for external validation set of 738 chemicals of the model obtained with Dragon descriptors (model B).
| | Leadscope experimental carcinogenicity class | |
|---|---|---|
| | 223 | 157 |
| | 138 | 220 |
Range of the MDL descriptors for model A
| MDL descriptor symbol | Minimum value | Maximum value |
|---|---|---|
| | 0.000 | 30.74 |
| | 0.000 | 18.000 |
| | 0.000 | 4.000 |
| | -0.4009 | 7.7061 |
| | 0.000 | 7.000 |
| | -5.0185 | 2.000 |
| | 0.000 | 66.2633 |
| | 0.000 | 8.000 |
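The minimum and maximum values above give the span of each model A descriptor. This section does not say how the ranges are used. One common use, assumed here, is a bounding-box check of whether a new compound falls inside the descriptor space covered by the model. Another is min-max scaling of inputs before they reach the network. The sketch below copies the eight ranges from the table and assumes that the descriptor order follows the table rows. The helper names are hypothetical.

```python
# (min, max) of the eight MDL descriptors of model A, in table row order
# (row order is assumed to match the descriptor order used by the model)
MODEL_A_RANGES = [
    (0.000, 30.74),
    (0.000, 18.000),
    (0.000, 4.000),
    (-0.4009, 7.7061),
    (0.000, 7.000),
    (-5.0185, 2.000),
    (0.000, 66.2633),
    (0.000, 8.000),
]


def outside_range(values, ranges=MODEL_A_RANGES):
    """Return indices of descriptors whose value lies outside [min, max]."""
    if len(values) != len(ranges):
        raise ValueError("expected one value per descriptor")
    return [i for i, (v, (lo, hi)) in enumerate(zip(values, ranges))
            if v < lo or v > hi]


def min_max_scale(values, ranges=MODEL_A_RANGES):
    """Scale each descriptor to [0, 1] using the tabulated range."""
    return [(v - lo) / (hi - lo) for v, (lo, hi) in zip(values, ranges)]


if __name__ == "__main__":
    compound = [12.5, 6, 1, 2.1, 2, -0.8, 20.3, 0]  # hypothetical descriptor values
    print(outside_range(compound))  # prints [] (all values within range)
    print(min_max_scale(compound))
```

A non-empty list from `outside_range` flags a compound as lying outside the tabulated descriptor space, so a prediction for it would be an extrapolation.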
Range of the Dragon descriptors for model B
| DRAGON descriptor symbol | Minimum value | Maximum value |
|---|---|---|
| | 0 | 0.167 |
| | 0 | 1114.64 |
| | -1.357 | 1.0 |
| | -1.0 | 3.9280 |
| | 0.693 | 19.939 |
| | 0.0 | 15.483 |
| | 0.0 | 15.111 |
| | 0.0 | 0.047 |
| | 0.0 | 2.0 |
| | 0.0 | 2.0 |
| | 0.0 | 2.0 |
| | 0.0 | 4.0 |
Figure 4. Representation of the CAESAR dataset (left panel) and the external test set (right panel) in terms of the number of chemicals containing specific fragments, as calculated by Leadscope software.
Structural diversity of the CAESAR dataset of 805 chemicals, expressed as the number of chemicals containing each structural alert (SA) extracted from the Toxtree program.
| Structural Alert (SA) | Number of chemicals |
|---|---|
| SA_2: alkyl (C<5) or benzyl ester of sulphonic or phosphonic acid | 4 |
| SA_3: N-methylol derivatives | 2 |
| SA_4: Monohaloalkene | 5 |
| SA_5: S or N mustard | 7 |
| SA_6: Propiolactones or propiosultones | 0 |
| SA_7: Epoxides and aziridines | 22 |
| SA_8: Aliphatic halogens | 47 |
| SA_9: Alkyl nitrite | 1 |
| SA_10: α,β-unsaturated carbonyls | 0 |
| SA_11: Simple aldehyde | 4 |
| SA_12: Quinones | 22 |
| SA_13: Hydrazine | 32 |
| SA_14: Aliphatic azo and azoxy | 7 |
| SA_15: isocyanate and isothiocyanate groups | 4 |
| SA_16: alkyl carbamate and thiocarbamate | 5 |
| SA_17: Thiocarbonyl | 18 |
| SA_18: Polycyclic Aromatic Hydrocarbons | 12 |
| SA_19: Heterocyclic Polycyclic Aromatic Hydrocarbons | 5 |
| SA_20: (Poly) Halogenated Cycloalkanes | 9 |
| SA_21: alkyl and aryl N-nitroso groups | 107 |
| SA_22: azide and triazene groups | 3 |
| SA_23: aliphatic N-nitro group | 1 |
| SA_24: α,β-unsaturated aliphatic alkoxy group | 1 |
| SA_25: aromatic nitroso group | 4 |
| SA_26: aromatic ring N-oxide | 1 |
| SA_27: Nitro-aromatic | 75 |
| SA_28: primary aromatic amine, hydroxyl amine and its derived esters | 52 |
| SA_28bis: Aromatic mono- and dialkylamine | 7 |
| SA_28ter: aromatic N-acyl amine | 17 |
| SA_29: Aromatic diazo | 4 |
| SA_30: Coumarins and Furocoumarins | 8 |
| SA_31a: Halogenated benzene | 16 |
| SA_31b: Halogenated PAH | 7 |
| SA_31c: Halogenated dibenzodioxins | 2 |
Eight MDL descriptors selected for modeling.
| MDL_ID Descriptor Code | Symbol | Definition |
|---|---|---|
| | | Sum of all (=CH-) E-State values in molecule |
| | | Count of all (=C<) groups in molecule |
| | | Count of all (=N) groups in molecule |
| | | Difference simple 9th order path chi indices |
| | | Number of 6-membered rings |
| | | Smallest atom E-State value in molecule |
| | | Sum of hydrogen E-State on sp3 C on saturated bond |
| | | Count of internal hydrogen bonds with 2 skeletal bonds between donor and acceptor |
Twelve Dragon descriptors selected for modeling.
| Dragon Descriptor's code | Symbol | Definition |
|---|---|---|
| | | Path/walk 5 - Randic shape index |
| | | Distance/detour ring index of order 6 |
| | | Moran autocorrelation - lag 2/weighted by atomic polarizabilities |
| | | Eigenvalue 10 from edge adj. matrix weighted by edge degrees |
| | | Spectral moment 11 from edge adj. matrix weighted by edge degrees |
| | | Spectral moment 09 from edge adj. matrix weighted by dipole moments |
| | | Topological charge index of order 2 |
| | | Mean topological charge index of order 6 |
| | | Number of N-nitroso groups (aliphatic) |
| | | Number of phosphates/thiophosphates |
| | | Al2-NH |
| | | Ar-N=X / X-N=X |
The distribution of carcinogens and non-carcinogens in total, training and test sets.
| | Total set | Training set | Test set |
|---|---|---|---|
| Carcinogens | 421 | 332 | 89 |
| Non-carcinogens | 384 | 312 | 72 |
| Total | 805 | 644 | 161 |
Figure 5. Counter-propagation neural network architecture.
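A counter-propagation network combines two layers. A Kohonen (input) layer maps each compound's descriptor vector to a winning neuron. An output layer, laid out on the same grid, stores the class value at that neuron's position. The NumPy sketch below illustrates this idea and is not the CAESAR implementation. It assumes a square grid (35×35 in Figures 1 and 2), 8 or 12 descriptor inputs, a learning rate and neighbourhood radius that both shrink linearly, a triangular neighbourhood, and a target of 1 for carcinogens and 0 for non-carcinogens. The class name `CPANN` and all default hyperparameters are hypothetical.

```python
import numpy as np


class CPANN:
    """Minimal counter-propagation ANN: a Kohonen layer and an output layer
    sharing one square grid of neurons."""

    def __init__(self, grid=35, n_inputs=8, n_outputs=1, seed=0):
        rng = np.random.default_rng(seed)
        self.grid = grid
        # Kohonen (input) weights: one descriptor vector per neuron
        self.kohonen = rng.random((grid, grid, n_inputs))
        # Output weights: one class value per neuron
        self.output = rng.random((grid, grid, n_outputs))
        # (row, col) coordinates of every neuron, shape (grid, grid, 2)
        self.coords = np.moveaxis(np.indices((grid, grid)), 0, -1)

    def winner(self, x):
        """Grid position of the neuron whose Kohonen weights are closest to x."""
        dist = np.sum((self.kohonen - x) ** 2, axis=-1)
        return np.unravel_index(np.argmin(dist), dist.shape)

    def fit(self, X, Y, epochs=100, rate_max=0.5, rate_min=0.01):
        X = np.asarray(X, dtype=float)
        Y = np.asarray(Y, dtype=float).reshape(len(X), -1)
        total_steps = epochs * len(X)
        step = 0
        for _ in range(epochs):
            for x, y in zip(X, Y):
                frac = step / total_steps
                # Learning rate and neighbourhood radius shrink linearly
                rate = rate_max - (rate_max - rate_min) * frac
                radius = (self.grid / 2) * (1 - frac)
                win = np.array(self.winner(x))
                # Square (Chebyshev) grid distance of every neuron from the winner
                d = np.max(np.abs(self.coords - win), axis=-1)
                # Triangular neighbourhood: 1 at the winner, 0 beyond the radius
                h = np.clip(1 - d / (radius + 1), 0, None)[..., None]
                # The input selects the winner; both layers are corrected
                # at the same grid positions
                self.kohonen += rate * h * (x - self.kohonen)
                self.output += rate * h * (y - self.output)
                step += 1
        return self

    def predict(self, X, threshold=0.5):
        """Return the output-layer score at each winner and the thresholded class."""
        X = np.asarray(X, dtype=float)
        scores = np.array([self.output[self.winner(x)][0] for x in X])
        return scores, (scores >= threshold).astype(int)


if __name__ == "__main__":
    # Toy two-descriptor data (hypothetical); 1 = carcinogen, 0 = non-carcinogen
    X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]
    Y = [0, 0, 0, 1, 1, 1]
    net = CPANN(grid=5, n_inputs=2, seed=1).fit(X, Y, epochs=100)
    print(net.predict(X)[1])
```

The Kohonen layer is trained only on the descriptors. The output layer is updated at the positions the Kohonen layer selects. For this reason, prediction needs only the winning neuron's stored output value, which can then be compared against a threshold.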
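Confusion matrix for a two-class classifier (P: positive, N: negative).
| | Predicted non-carcinogens (N) | Predicted carcinogens (P) | Total observed |
|---|---|---|---|
| Observed non-carcinogens (N) | TN | FP | Nnegative |
| Observed carcinogens (P) | FN | TP | Npositive |
| Total predicted | TN + FN | FP + TP | Ntotal |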
*Definitions in Table 6:
TP: true positives;
TN: true negatives;
FP: false positives;
FN: false negatives;
Nnegative is the number of negative compounds (non-carcinogens) in the dataset;
Npositive is the number of positive compounds (carcinogens) in the dataset;
Ntotal is the total number of negative (non-carcinogens) and positive (carcinogens) compounds in the dataset.
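From these counts, the statistics quoted for models A and B follow the standard formulas: accuracy = (TP + TN)/Ntotal, sensitivity = TP/Npositive and specificity = TN/Nnegative. The sketch below computes them. The class name and the example counts are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Two-class confusion matrix: positive = carcinogen, negative = non-carcinogen."""
    tp: int
    tn: int
    fp: int
    fn: int

    @property
    def n_positive(self) -> int:
        return self.tp + self.fn  # observed carcinogens

    @property
    def n_negative(self) -> int:
        return self.tn + self.fp  # observed non-carcinogens

    @property
    def n_total(self) -> int:
        return self.n_positive + self.n_negative

    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.n_total

    def sensitivity(self) -> float:
        return self.tp / self.n_positive

    def specificity(self) -> float:
        return self.tn / self.n_negative


if __name__ == "__main__":
    cm = ConfusionMatrix(tp=40, tn=30, fp=10, fn=20)  # hypothetical counts
    print(f"ACC={cm.accuracy():.2f} SE={cm.sensitivity():.2f} SP={cm.specificity():.2f}")
    # prints: ACC=0.70 SE=0.67 SP=0.75
```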