Alexander Sedykh, Hao Zhu, Hao Tang, Liying Zhang, Ann Richard, Ivan Rusyn, Alexander Tropsha.
Abstract
BACKGROUND: Quantitative high-throughput screening (qHTS) assays are increasingly being used to inform chemical hazard identification. Hundreds of chemicals have been tested in dozens of cell lines across extensive concentration ranges by the National Toxicology Program in collaboration with the National Institutes of Health Chemical Genomics Center.
Year: 2010 PMID: 20980217 PMCID: PMC3060000 DOI: 10.1289/ehp.1002476
Source DB: PubMed Journal: Environ Health Perspect ISSN: 0091-6765 Impact factor: 9.031
Figure 1. Modeling workflow. (A) Preparation of the target data set. (B) Modeling procedure for qHTS LD50 data set.
Figure 2. Examples of qHTS concentration–response curves and their noise-filtering transformations. (A) Original concentration–response curves for three sample chemicals from the qHTS data set (Jurkat cell line, AID no. 426). (B) Data after noise filtering (THR = 15%, MXDV = 5%). THR controls data variation near baseline; MXDV controls deviation from monotonicity. (C) Representation of concentration–response by binary fingerprints. (D) Concentration–response curve fingerprint of β-nitrostyrene. The x-axis indicates the qHTS profile based on 14 concentrations: “00 . . . 00 01 11 11 11” indicates 2^6 + 2^5 + 2^4 + 2^3 + 2^2 + 2^1 + 2^0 = 127.
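The fingerprint-to-integer mapping in panel D can be illustrated with a minimal Python sketch. It assumes the profile is read as an unsigned binary number with the highest concentration in the least significant bits, and that each of the 14 concentrations contributes two bits (an assumption used here to expand the elided zeros). The function name `fingerprint_to_int` is hypothetical and not taken from the article.

```python
def fingerprint_to_int(bits: str) -> int:
    """Read a binary fingerprint (spaces allowed) as an unsigned integer."""
    cleaned = bits.replace(" ", "")
    if not cleaned or set(cleaned) - {"0", "1"}:
        raise ValueError("fingerprint must contain only 0, 1 and spaces")
    return int(cleaned, 2)


# Assumed expansion of the caption's example: 14 two-bit groups,
# with only the last seven bits set.
example = "00 00 00 00 00 00 00 00 00 00 01 11 11 11"
value = fingerprint_to_int(example)
print(value)  # 127
```

Leading zeros do not change the value, so the shorter string "01 11 11 11" maps to the same integer.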
Figure 3. Pairwise Euclidean distances in the chemical (y-axis) and biological (x-axis) descriptor space for the qHTS LD50 data set. Dots represent compound pairs; colors reflect in vivo toxicity: blue, pairs of nontoxic compounds; red, pairs of toxic compounds; green, pairs where one compound is toxic and another nontoxic.
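A plot of this kind could be assembled roughly as follows. This is a sketch on toy data, not the authors' code: the descriptor vectors, the toxicity flags, and the function name `pairwise_distances` are all hypothetical, and any descriptor scaling the article may have applied is omitted.

```python
from itertools import combinations
from math import dist


def pairwise_distances(chem, bio, toxic):
    """Return (biological distance, chemical distance, pair label) per compound pair."""
    labels = ("nontoxic pair", "mixed pair", "toxic pair")
    pairs = []
    for i, j in combinations(range(len(toxic)), 2):
        n_toxic = int(toxic[i]) + int(toxic[j])  # 0, 1 or 2 toxic members
        pairs.append((dist(bio[i], bio[j]), dist(chem[i], chem[j]), labels[n_toxic]))
    return pairs


# Toy data (hypothetical): three compounds, 2 chemical and 1 biological descriptor.
chem = [[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]]
bio = [[0.0], [1.0], [0.0]]
toxic = [False, True, True]

pairs = pairwise_distances(chem, bio, toxic)
for x, y, label in pairs:
    print(f"bio={x:.2f} chem={y:.2f} {label}")
```

Each tuple corresponds to one dot in the figure: the first value is the x coordinate, the second the y coordinate, and the label determines the color.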
CCRs of 5-fold external validation for kNN and random forest models.
| Split no. | Chemical descriptors only | Hybrid, THR = 0% | Hybrid, THR = 5% | Hybrid, THR = 15% | Hybrid, THR = 25% |
|---|---|---|---|---|---|
| kNN | | | | | |
| 1 | 0.75 | 0.74 | 0.79 | 0.79 | 0.79 |
| 2 | 0.76 | 0.67 | 0.79 | 0.79 | 0.79 |
| 3 | 0.75 | 0.74 | 0.90 | 0.86 | 0.87 |
| 4 | 0.71 | 0.79 | 0.78 | 0.81 | 0.74 |
| 5 | 0.83 | 0.77 | 0.81 | 0.82 | 0.83 |
| Mean | 0.76 | 0.74 | 0.81* | 0.81* | 0.80* |
| Random forest | |||||
| 1 | 0.75 | 0.70 | 0.79 | 0.80 | 0.77 |
| 2 | 0.77 | 0.79 | 0.84 | 0.83 | 0.82 |
| 3 | 0.80 | 0.77 | 0.85 | 0.88 | 0.86 |
| 4 | 0.74 | 0.74 | 0.71 | 0.74 | 0.71 |
| 5 | 0.84 | 0.83 | 0.83 | 0.83 | 0.83 |
| Mean | 0.78 | 0.77 | 0.80* | 0.82* | 0.80* |
Figure 4. External prediction results of kNN models using different classification criteria: distribution of the predicted values (A) and heat maps illustrating classification (B, CCR) and coverage (C, percent chemicals within the applicability domain) results for each pair of classification thresholds T1, T2 (i.e., “nontoxic” < T1 ≤ “not covered” < T2 ≤ “toxic”). Red dashed (A) and diagonal (B,C) lines denote a default single-threshold classification (T1 = T2 = 0.5). Gray (A) and black (B,C) dashed lines denote an example of double-threshold classification (T1 = 0.3 and T2 = 0.7).
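The double-threshold rule in this caption can be written out as a short Python sketch. The boundary handling follows the inequality given above (“nontoxic” < T1 ≤ “not covered” < T2 ≤ “toxic”); the function names `classify` and `coverage` and the sample scores are hypothetical.

```python
from typing import Optional, Sequence


def classify(score: float, t1: float = 0.5, t2: float = 0.5) -> Optional[str]:
    """Return 'nontoxic', 'toxic', or None when the score falls in the uncovered band."""
    if t1 > t2:
        raise ValueError("t1 must not exceed t2")
    if score < t1:
        return "nontoxic"
    if score >= t2:
        return "toxic"
    return None  # t1 <= score < t2: not covered


def coverage(scores: Sequence[float], t1: float, t2: float) -> float:
    """Fraction of scores that receive a class label."""
    covered = sum(classify(s, t1, t2) is not None for s in scores)
    return covered / len(scores)


scores = [0.1, 0.5, 0.9]  # hypothetical predicted values
print([classify(s, 0.3, 0.7) for s in scores])
print(coverage(scores, 0.3, 0.7))
```

With T1 = T2 = 0.5 the uncovered band is empty and every compound is classified, which is the single-threshold default described in the caption. Widening the band trades coverage for confidence in the remaining predictions.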
Classification results for external validation set.
| Metric | TOPKAT | Chemical descriptors only, kNN | Chemical descriptors only, RF | Hybrid (THR = 0%), kNN | Hybrid (THR = 0%), RF | Hybrid (THR = 15%), kNN | Hybrid (THR = 15%), RF |
|---|---|---|---|---|---|---|---|
| CCR | 0.69 | 0.75 | 0.77 | 0.70 | 0.80 | 0.88 | 0.87 |
| Sensitivity | 0.45 | 0.73 | 0.73 | 0.55 | 0.82 | 0.91 | 0.91 |
| Specificity | 0.93 | 0.78 | 0.80 | 0.85 | 0.78 | 0.85 | 0.83 |
RF, random forest. Each misclassification corresponds to an error of ≥ 1 log10 unit on a continuous LD50 scale.
p < 0.05, TOPKAT model predictions versus all other models by using the permutation (10,000 times) test.
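The CCR values in this table are consistent with CCR being the mean of sensitivity and specificity (for TOPKAT, 0.45 and 0.93 average to 0.69). The sketch below assumes that definition; the function names and the confusion-matrix counts are hypothetical.

```python
def ccr(sensitivity: float, specificity: float) -> float:
    """Correct classification rate, assumed to be the mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2


def ccr_from_counts(tp: int, fn: int, tn: int, fp: int) -> float:
    """Compute CCR from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of toxic compounds predicted toxic
    specificity = tn / (tn + fp)  # fraction of nontoxic compounds predicted nontoxic
    return ccr(sensitivity, specificity)


# Sensitivity and specificity taken from the table's first and sixth value columns.
print(round(ccr(0.45, 0.93), 2))
print(round(ccr(0.91, 0.85), 2))

# Hypothetical counts, for illustration only.
print(round(ccr_from_counts(tp=9, fn=1, tn=8, fp=2), 2))
```

Because the two classes are weighted equally, a model with high specificity but low sensitivity, such as the first column here, is penalized more than plain accuracy on an imbalanced set would suggest.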
Figure 5. Occurrence frequencies of the descriptors in the hybrid kNN (THR = 15%) model (A) and relative frequencies of qHTS biological descriptors (B). Max, maximum. The fraction of most frequent descriptors selected by mean occurrence is marked by a dashed line (A) and by a red arrowhead and red boxes (B).
Frequently used descriptors in a kNN Hybrid (THR = 15%) model.
| Dragon chemical descriptor (label and occurrence) | Representation | T/N-T | Example |
|---|---|---|---|
| nCH2RX (59%) | Alkyl halides | 19/4 | |
| Br-091 (12%) | |||
| Cl-086 (5%) | |||
| B03[O-Cl] (55%) | Aryl halides, haloalkyl ethers | 18/3 | |
| F03[O-Cl] (13%) | |||
| B04[Cl-Cl] (7%) | |||
| B05[O-Cl] (5%) | |||
| B01[C-Br] (5%) | |||
| nS (36%) | Thiophosphates | 22/17 | |
| B04[C-S] (28%) | |||
| B05[C-S] (26%) | |||
| B03[C-S] (13%) | |||
| B01[C-S] (12%) | |||
| F05[C-S] (9%) | |||
| F04[C-S] (7%) | |||
| B02[C-S] (7%) | |||
| B07[C-S] (4%) | |||
| nRCN (21%) | Alkyl nitriles | 5/1 | |
| nTB (10%) | |||
| C-001 (20%) | Methyl groups | 25/29 | CH3–[C,N,O,S,]... |
| C-005 (10%) | |||
| Mv (38%) | Molecular size | — | |
| AMW (17%) | |||
| F02[C-C] (13%) | Carbon backbone | — | |
| nCIC (10%) | Rings count | — | |
| ARR (8%), nCbH (9%), nCb- (5%) | Aromatic compounds | — | |
“T/N-T” is the number of “toxic” and “nontoxic” chemicals that represent the corresponding descriptor in the qHTS LD50 data set.
Classifications for similar compounds.
| Item no. | Compounds | qHTS profile | Activity | Classification | Structure |
|---|---|---|---|---|---|
| 1 | X=Cl, Y=H; CAS no. 58-90-2 | 0000000111 | 1 | 1 | |
| X,Y=Cl; CAS no. 87-86-5 | 0000000111 | 1 | 1 | ||
| X=H, Y=Cl; CAS no. 4901-51-3 | 0000001111 | 1 | 1 | ||
| 2 | X=H; CAS no. 71-41-0 | 0000000000 | 0 | 0 | |
| X=CH3; CAS no. 105-30-6 | 0000000000 | 0 | 0 | ||
| 3 | X=H; CAS no. 141-78-6 | 0000000000 | 0 | 0 | |
| X=CH3; CAS no. 108-21-4 | 0000000000 | 0 | 0 | ||
| 4 | X=H; CAS no. 100-52-7 | 0001010101 | 0 | 0 | |
| X=CH3; CAS no. 529-20-4 | 0000000000 | 0 | 0 | ||
| 5 | CAS no. 74-96-4 | 0000000000 | 0 | 0.9 | H–CH2–CH2–Br |
| CAS no. 106-93-4 | 0000000001 | 1 | 0.9 | Br–CH2–CH2–Br | |
| CAS no. 107-04-0 | 0000000000 | 1 | 1 | Cl–CH2–CH2–Br | |
| 6 | X=Me; CAS no. 75-05-8 | 0000000000 | 0 | 0.8 | X–C≡N |
| X=Et; CAS no. 107-12-0 | 0000000000 | 1 | 0 | ||
| X=i-Pr; CAS no. 78-82-0 | 0000000000 | 1 | 0 | ||
| 7 | X=1,3-di-Me-But, Y=H, Z=Ph; CAS no. 793-24-8 | 0000011111 | 0 | 0.6 | |
| X,Y=CH3, Z=H; CAS no. 99-98-9 | 0000011011 | 1 | 0.6 | ||
| X=H, Y,Z=2-But; CAS no. 101-96-2 | 1011111111 | 1 | 0.7 | ||
| 8 | CAS no. 123-38-6 | 0000000000 | 0 | 0 | CH3–CH2–CH=O |
| CAS no. 107-02-8 | 0000000111 | 1 | 0.3 | CH2=CH–CH=O | |
| 9 | CAS no. 78-93-3 | 0000000000 | 0 | 0 | CH3–CH2–C(CH3)=O |
| CAS no. 78-94-4 | 0000000000 | 1 | 0 | CH2=CH–C(CH3)=O | |
| CAS no. 78-92-2 | 0000000000 | 0 | 0 | CH3–CH2–C(CH3)–OH | |
Abbreviations: But, butyl; Et, ethyl; i-Pr, isopropyl; Me, methyl; Ph, phenyl. Only the bits for the five highest concentrations are shown. “Activity,” experimental activity class; “Classification,” predicted class (average across all random forest and kNN models).
A concentration–response curve fingerprint based on the five highest concentrations only (see “Materials and Methods”) derived at THR = 15%, MXDV = 5% (maximum across 13 cell lines).