Swapnil Chavan, Ian A Nicholls, Björn C G Karlsson, Annika M Rosengren, Davide Ballabio, Viviana Consonni, Roberto Todeschini.
Abstract
A series of 436 Munro database chemicals was studied with respect to their experimental LD50 values to investigate the possibility of establishing a global QSAR model for acute toxicity. Dragon molecular descriptors were used for QSAR model development, and genetic algorithms were used to select descriptors better correlated with the toxicity data. Toxicity values were discretized into qualitative classes on the basis of the Globally Harmonized Scheme: the 436 chemicals were divided into three classes according to their experimental LD50 values (highly toxic, intermediately toxic, and low to non-toxic). The k-nearest neighbor (k-NN) classification method was calibrated on 25 molecular descriptors and gave a non-error rate (NER) of 0.66 and 0.57 for the internal and external prediction sets, respectively. Although the classification performance is not optimal, the subsequent analysis of the selected descriptors and their relationship with toxicity levels constitutes a step towards the development of a global QSAR model for acute toxicity.
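The k-NN step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the distance metric, the value of k, or the descriptor scaling, so the Euclidean distance, `k=3`, the function name `knn_predict`, and the two-descriptor toy data below are all assumptions made for illustration only.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training sample (assumed metric).
    distances = [
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    ]
    # Keep the k closest samples; sorted() is stable, so ties keep input order.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    # Equal vote counts are resolved in favor of the label seen first (closest).
    return votes.most_common(1)[0][0]

# Toy data: two hypothetical, already-scaled descriptors per chemical.
train_X = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9), (0.5, 0.5), (0.45, 0.6)]
train_y = ["I", "I", "III", "III", "II", "II"]

predicted = knn_predict(train_X, train_y, (0.15, 0.15), k=3)
print(predicted)
```

In a real setting the 25 selected descriptors would replace the two toy columns, and they would normally be scaled to comparable ranges first so that no single descriptor dominates the distance.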
Year: 2014 PMID: 25302621 PMCID: PMC4227209 DOI: 10.3390/ijms151018162
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Calculated 2D descriptors for 441 Munro database chemicals.
| Sr. | Descriptor Type | No. of Descriptors |
|---|---|---|
| 1 | Constitutional indices | 43 |
| 2 | Topological indices | 75 |
| 3 | Connectivity indices | 37 |
| 4 | 2D matrix based descriptors | 550 |
| 5 | ETA indices | 23 |
| 6 | Atom type E-state indices | 170 |
| 7 | 2D atom pairs | 1596 |
| 8 | Drug like indices | 27 |
| 9 | Ring descriptors | 32 |
| 10 | Walk and path counts | 46 |
| 11 | Information indices | 48 |
| 12 | 2D auto correlations | 213 |
| 13 | P-VSA like descriptors | 45 |
| 14 | Edge adjacency indices | 324 |
| 15 | CATS 2D | 150 |
| 16 | Atom-centered fragments | 115 |
| 17 | Molecular properties | 20 |
| 18 | Functional group counts | 154 |
| Total | All 18 types | 3668 |
The distribution of class I, II and III chemicals into training and test sets based on the principle of keeping 20% of chemicals from each class as a test set.
| Set | Class I | Class II | Class III | Total |
|---|---|---|---|---|
| Training | 82 | 136 | 129 | 347 |
| Test | 21 | 35 | 33 | 89 |
| Total | 103 | 171 | 162 | 436 |
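A per-class 20% hold-out of this kind can be sketched as follows. How the test chemicals were actually chosen is not stated here, so the random selection, the seed, and the function name `stratified_split` are assumptions. Rounding the 20% share up to the next integer is also an assumption, although it does yield the tabulated test-set sizes (21, 35 and 33).

```python
import random

def stratified_split(labels, test_percent=20, seed=0):
    """Return (train_idx, test_idx), holding out test_percent of each class."""
    rng = random.Random(seed)
    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # random choice of test members (assumption)
        # Integer ceiling of n * test_percent / 100 (rounding up is an assumption).
        n_test = (len(indices) * test_percent + 99) // 100
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Class sizes taken from the table: 103, 171 and 162 chemicals.
labels = ["I"] * 103 + ["II"] * 171 + ["III"] * 162
train_idx, test_idx = stratified_split(labels)
test_counts = {c: sum(labels[i] == c for i in test_idx) for c in ("I", "II", "III")}
print(len(train_idx), len(test_idx), test_counts)
```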
Classification parameters of k-NN classification model.
| Set | NER | ER | Sensitivity (Class I) | Sensitivity (Class II) | Sensitivity (Class III) | Specificity (Class I) | Specificity (Class II) | Specificity (Class III) |
|---|---|---|---|---|---|---|---|---|
| Fitting | 0.66 | 0.34 | 0.53 | 0.46 | 0.65 | 0.80 | 0.74 | 0.78 |
| cv | 0.67 | 0.33 | 0.54 | 0.49 | 0.65 | 0.81 | 0.76 | 0.78 |
| External | 0.57 | 0.43 | 0.39 | 0.35 | 0.55 | 0.73 | 0.68 | 0.74 |
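The quantities in this table can be derived from a multiclass confusion matrix. The sketch below assumes the commonly used definitions (per-class sensitivity and specificity computed one-vs-rest, NER as the mean of the class sensitivities, ER = 1 − NER). The formulas used in the study are not given here, and the confusion matrix is hypothetical, so the sketch is not claimed to reproduce the tabulated values.

```python
def class_metrics(confusion):
    """Per-class sensitivity/specificity plus NER and ER.

    confusion[i][j] = number of samples of true class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    sensitivity, specificity = [], []
    for c in range(n_classes):
        tp = confusion[c][c]                                      # true positives
        fn = sum(confusion[c]) - tp                               # missed members of class c
        fp = sum(confusion[r][c] for r in range(n_classes)) - tp  # wrongly assigned to c
        tn = total - tp - fn - fp                                 # everything else
        sensitivity.append(tp / (tp + fn))
        specificity.append(tn / (tn + fp))
    # NER taken as the mean class sensitivity (assumed definition).
    ner = sum(sensitivity) / n_classes
    return {"sensitivity": sensitivity, "specificity": specificity,
            "NER": ner, "ER": 1 - ner}

# Hypothetical 3-class confusion matrix (not data from the study).
confusion = [
    [12, 6, 2],
    [5, 20, 10],
    [3, 7, 25],
]
metrics = class_metrics(confusion)
print(metrics)
```

Averaging sensitivities over classes, rather than counting correct predictions overall, keeps a large class from masking poor performance on a small one.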
Description of the 25 descriptors derived by the genetic algorithm coupled with k-NN classification.
| Sr. | Name | Description | Type |
|---|---|---|---|
| 1 | MATS1e | Moran autocorrelation of lag 1 weighted by Sanderson electronegativity | 2D autocorrelations |
| 2 | SpMAD_B(s) | Spectral mean absolute deviation from Burden matrix weighted by I-State | 2D matrix-based descriptors |
| 3 | SpPosA_B(p) | Normalized spectral positive sum from Burden matrix weighted by polarizability | 2D matrix-based descriptors |
| 4 | MATS1v | Moran autocorrelation of lag 1 weighted by van der Waals volume | 2D autocorrelations |
| 5 | Mi | Mean first ionization potential (scaled on Carbon atom) | Constitutional indices |
| 6 | AAC | Mean information index on atomic composition | Information indices |
| 7 | SpMAD_B(m) | Spectral mean absolute deviation from Burden matrix weighted by mass | 2D matrix-based descriptors |
| 8 | GATS1p | Geary autocorrelation of lag 1 weighted by polarizability | 2D autocorrelations |
| 9 | C-026 | R--CX--R | Atom-centred fragments |
| 10 | SIC0 | Structural Information Content index (neighborhood symmetry of 0-order) | Information indices |
| 11 | nDB | Number of double bonds | Constitutional indices |
| 12 | SIC1 | Structural Information Content index (neighborhood symmetry of 1-order) | Information indices |
| 13 | ATS6e | Broto-Moreau autocorrelation of lag 6 (log function) weighted by Sanderson electronegativity | 2D autocorrelations |
| 14 | P_VSA_MR_3 | P_VSA-like on Molar Refractivity, bin 3 | P_VSA-like descriptors |
| 15 | DLS_02 | Modified drug-like score from Oprea | Drug-like indices |
| 16 | nCL | Number of Chlorine atoms | Constitutional indices |
| 17 | J_Dz(Z) | Balaban-like index from Barysz matrix weighted by atomic number | 2D matrix-based descriptors |
| 18 | SM6_B(s) | Spectral moment of order 6 from Burden matrix weighted by I-State | 2D matrix-based descriptors |
| 19 | GATS1v | Geary autocorrelation of lag 1 weighted by van der Waals volume | 2D autocorrelations |
| 20 | JGI4 | Mean topological charge index of order 4 | 2D autocorrelations |
| 21 | P_VSA_i_4 | P_VSA-like on ionization potential, bin 4 | P_VSA-like descriptors |
| 22 | P-117 | X3-P = X (phosphate) | Atom-centred fragments |
| 23 | B01[S-P] | Presence/absence of S–P at topological distance 1 | 2D Atom Pairs |
| 24 | B03[C-S] | Presence/absence of C–S at topological distance 3 | 2D Atom Pairs |
| 25 | BLTF96 | Verhaar Fish base-line toxicity from MLOGP (mmol/L) | Molecular properties |
Figure 1. PCA score plot using the 25 descriptors, showing similarity and variability among the training set chemicals with respect to their class.
Figure 2. PCA score plot using the 25 descriptors, showing similarity and variability among the test set chemicals with respect to their class.
Figure 3. Loading plot showing the significant descriptors.