William Bains1,2.
Year: 2016 PMID: 26897366 PMCID: PMC4761184 DOI: 10.1186/s12918-016-0262-7
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Fig. 2 Extent of biochemical space. Fraction (Y axis) of Fragments derived from the space of all possible chemicals that are not found in actual metabolites, compared to the fraction that would be expected not to be found in the same number of chemicals sampled at random from the chemical space of possible metabolites, plotted as a function of fragment size (N – X axis). Blue squares – fraction of fragments not found in actual metabolites. Red circles – fraction not found in an equivalent size collection of random molecules. Panel a: Fragments not found in the ‘core metabolism’ of 611 molecules represented in the ExPASy metabolic map. Panel b: Fragments not found in the ~45,000 unique molecules listed in the Dictionary of Natural Products [11]
Fig. 4 UnBiological vs. toxicity for selected organisms. Plots of UnBiological vs. toxicity endpoints for three of the data sets analysed here. Each dot represents the LD50 (X axis) vs Ub value (Y axis) for one compound. Ub is calculated as described in the Methods section and Appendix 2. In summary, Ub represents the largest region on a molecule that is not present in a metabolite, as defined by a 5-atom (Ub5) or 6-atom (Ub6) overlap. a: Ub vs. LD50 for Chlorella, b: Ub6 vs. LD50 for Rainbow trout, c: Ub vs. LD50 for Lemna, intoxicated with compounds other than herbicides
Fig. 5 Correlation of UnBiological with toxicity by potency band. Correlation of Ub5 and Ub6 with different toxicity endpoints. For each data set, compounds were binned by their EC50 or LD50 values, and the correlation of Ub with the toxicity endpoint was calculated within each concentration range. Thus for Rat oral LD50 (Panel a), toxicity was binned into Log(LD50) < −4, Log(LD50) between −4 and −3, Log(LD50) between −3 and −2, and Log(LD50) > −2, all values in molar. Correlations were calculated for each of these four data subsets. Error bars are 95 % confidence limits for the correlation, based on the number of data points in each bin. For all panels: X axis = concentration bins, in log (molar). Y axis: correlation of Ub and toxicity within that data subset. Panels a to f: rat oral toxicity, mouse oral toxicity, rat carcinogenic potential (from CPDBAS), NCI cell line cytotoxicity, Fathead minnow toxicity and Tetrahymena toxicity
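The banding described for Panel a can be sketched as follows. This is a minimal illustration rather than the paper's own code: the band edges are taken from the caption, Spearman's coefficient is assumed as the rank correlation, and the function names (`band_index`, `spearman`, `correlation_by_band`) are hypothetical.

```python
from bisect import bisect_right
from math import sqrt

BAND_EDGES = [-4.0, -3.0, -2.0]  # log10(LD50, molar) edges quoted in the caption


def band_index(log_ld50):
    """Return 0..3 for the bands < -4, -4..-3, -3..-2, > -2.

    A value exactly on an edge goes to the higher band (an assumption;
    the caption does not say how edge values were treated).
    """
    return bisect_right(BAND_EDGES, log_ld50)


def ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation undefined when one variable is constant
    return cov / sqrt(var_x * var_y)


def correlation_by_band(log_ld50, ub):
    """Group compounds by potency band, then correlate Ub with toxicity per band."""
    bands = {}
    for tox, u in zip(log_ld50, ub):
        xs, ys = bands.setdefault(band_index(tox), ([], []))
        xs.append(tox)
        ys.append(u)
    # Bands with fewer than 3 compounds are skipped (an arbitrary cut-off for this sketch).
    return {b: spearman(xs, ys) for b, (xs, ys) in sorted(bands.items()) if len(xs) >= 3}
```

Calling `correlation_by_band` with paired lists of log(LD50) and Ub values returns one coefficient per populated band, keyed by band index.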
Fig. 6 NCI screening data analysis. X axis: concentration. Y axis: number of compounds in NCI public datasets on cell-based screens that show an inhibitory effect in that assay, as a fraction of the number of compounds tested at that concentration. Results are binned by concentration on a log scale, each bin spanning 0.25 units of log (concentration). Blue diamonds: HIV screening data [39]. Red squares: cell line screening for anti-cancer effect [40]
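The binning described in this caption can be illustrated with a short sketch. It assumes each record is a (concentration in molar, active or not) pair; the record layout and the function name `fraction_active_by_bin` are hypothetical and not taken from the NCI data files.

```python
from collections import defaultdict
from math import floor, log10


def fraction_active_by_bin(records, bin_width=0.25):
    """Map the lower edge of each log10(concentration) bin to the fraction active.

    records: iterable of (concentration_molar, is_active) pairs.
    """
    tested = defaultdict(int)
    active = defaultdict(int)
    for concentration, is_active in records:
        bin_number = floor(log10(concentration) / bin_width)
        tested[bin_number] += 1
        if is_active:
            active[bin_number] += 1
    # Report each bin by its lower edge on the log scale.
    return {n * bin_width: active[n] / tested[n] for n in sorted(tested)}


# Hypothetical records: two compounds tested near 2-3 micromolar, one at 50 micromolar.
example = fraction_active_by_bin([(2e-6, True), (3e-6, False), (5e-5, True)])
```

Here the first two records fall in the bin starting at log (concentration) = −5.75 and the third in the bin starting at −4.5.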
Fig. 7 Docking small molecules with entire protein structures. a. Binding of 18 known ABL inhibitors, compared to the binding of 56 drugs or natural products not reported to have any effect on ABL kinase activity. Y axis: Vina output binding energy. X axis: molecular weight. b. Comparison of the predicted binding energy of 15 alpha amino acids and their alpha-N methyl alpha-carboxymethyl derivatives with the binding energy of equivalent beta amino acids and amino acid derivatives to ABL, Aldolase, HIV protease, PDE2b4b and PPAR gamma structures. Excluded amino acids were: glycine, which has no beta amino acid; beta alanine, which is a metabolite in its own right; beta aspartate and asparagine, which are the same as alpha aspartate and asparagine; and beta threonine, which is likely to be unstable and so not a realistic chemical structure. Error bars are 95 % confidence limits (1.98*standard error of the mean)
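As a worked illustration of the error bars, the sketch below computes the mean plus or minus 1.98 times the standard error of the mean. The multiplier comes from the caption; the function name `confidence_limits` and the example binding energies are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def confidence_limits(values, multiplier=1.98):
    """Return (lower, upper) limits as mean -/+ multiplier * standard error of the mean."""
    centre = mean(values)
    standard_error = stdev(values) / sqrt(len(values))  # sample standard deviation / sqrt(n)
    return centre - multiplier * standard_error, centre + multiplier * standard_error


# Hypothetical binding energies for one ligand across three structures.
lower, upper = confidence_limits([-6.0, -7.0, -8.0])
```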
Fig. 1 Examples of non-metabolites
Fig. 3 Cartoon of calculating UnBiological (Ub). This takes a ‘toy domain’ of four metabolites and three target molecules to explain the process. Only fragments of 3 or 4 atoms are considered in this example. In reality there are 611 metabolites, ~5000 target molecules (note that a number of molecules are tested in more than one experimental series in Table 3) and 30912 Fragments of size 3 to 14 atoms. Metabolites and target molecules are used to generate Fragments that are present in at least two of the overall set of molecules (this is a convenient limitation on the number of Fragments, and may be revised in future implementations of the algorithm). Fragments are classified as to whether they occur in the set of metabolites (green) or do not occur in metabolites (red). The target set of molecules is then matched to the set of Fragments that do not occur in metabolites – the size of the largest such Fragment is the Ub measure. Note that in this simplified model it is clear that the presence of a chlorine atom confers ‘UnBiological-ness’ on a molecule. The size of the Ub fragment can be the same as the size of the whole molecule (e.g. 1-chloropropane in this example). As illustrated here, this approach takes no account of the potential reactivity of a molecule, only its topological structure
Biological datasets
| Data set | Number of compounds | Species | Measured endpoint | Source | Notes |
|---|---|---|---|---|---|
| Whole organism toxicity endpoints | |||||
| Trout (24 h) | 186 | Oncorhynchus mykiss | Death | [ | These two data sets differ only in the time of exposure – 1 and 4 days |
| Trout (96 h) | 181 | ||||
| Pteronarcys (24 h) | 52 | Pteronarcys californica | Death | [ | These two data sets differ only in the time of exposure – 1 and 4 days |
| Pteronarcys (96 h) | 52 | ||||
| Bluegill (24 h) | 157 | Lepomis macrochirus | Death | [ | These two data sets differ only in the time of exposure – 1 and 4 days |
| Bluegill (96 h) | 172 | ||||
| Gammarus (24 h) | 113 | Combined data from G. fasciatus, G. lacustris and G. pseudolimnaeus | Death | [ | These two data sets differ only in the time of exposure – 1 and 4 days |
| Gammarus (96 h) | 132 | ||||
| Fathead minnow | 578 | Pimephales promelas | Death | [ | |
| Rat oral | 814 | Rattus norvegicus | Death | [ | Rodent toxicity data was manually curated from The Merck Index. Note that ‘molar’ values for mammalian whole organism studies are calculated as moles/kg body mass |
| Mouse oral | 398 | Mus musculus | |||
| Rat IP | 170 | Rattus norvegicus | |||
| Mouse IP | 290 | Mus musculus | |||
| AMES (mutagenicity) | 163 | Salmonella typhimurium | Mutated colony formation | Data collected and provided by Choracle Ltd, derived from Toxnet [ | |
| CPDBAS rat | 519 | Rattus norvegicus | Tumour formation frequency | [ | |
| CPDBAS mouse | 402 | Mus musculus | |||
| CPDBAS hamster | 44 | Mesocricetus auratus | |||
| Drosophila | 139 | Drosophila melanogaster | Death | [ | Only compounds with at least two compound concentrations reported included |
| Lemna - non-Herbicides | 149 | Lemna gibba and Lemna minor | lack of growth/leaflet reduction | [ | Compounds developed for reasons other than their herbicide effect |
| Lemna - Herbicides | 174 | Lemna gibba and Lemna minor | lack of growth/leaflet reduction | [ | Compounds developed as herbicides (primarily for macroscopic land plants) |
| Tetrahymena | 334 | Tetrahymena pyriformis | Death | [ | |
| Chlorella | 91 | Chlorella vulgaris | Death | [ | |
| Scenedesmus | 63 | | Cell numbers (combination growth inhibition and death) | [ | Data set heavy on chlorinated and nitrated aromatic compounds |
| Yeast | 253 | Saccharomyces cerevisiae | Growth inhibition | [ | Mostly drug-like molecules: see Methods section for details of this analysis |
| Other endpoints | |||||
| NCI | 768 | Homo sapiens | Cell number (cell growth vs. cell killing) | [ | Cell culture assay, not whole organism. Cytotoxicity data from the NCI anti-HIV compounds screening programme. |
| HERG | 229 | Homo sapiens | Ion channel blockade | [ | Ion channel assay in cloned receptor assay, not whole organism test |
| Oestrogenic | 131 | Rattus norvegicus | Receptor binding IC50 | [ | Receptor binding assay, not a cell- or organism-based assay |
| Tadpole narcosis | 141 | Rana temporaria | Narcosis (reversible lack of motion) | [ | |
| COX-2 | 107 | N/A | Cyclooxygenase-2 inhibition | [ | |
| Antihistamine | 61 | N/A | Histamine receptor blockade | [ | A variety of related structures, including anti-psychotics |
Data sets used in this paper
Correlations of Ub with toxicity endpoints
| Endpoint | Number | Ub5 | Ub6 |
|---|---|---|---|
| Trout 24 h | 186 | −0.230** | −0.337*** |
| Trout 96 h | 181 | −0.419*** | −0.516*** |
| Pteronarcys (24 h) | 52 | −0.433** | −0.385** |
| Pteronarcys (96 h) | 52 | −0.456*** | −0.369** |
| Bluegill (24 h) | 157 | −0.149 | −0.215** |
| Bluegill (96 h) | 172 | −0.216** | −0.276*** |
| Gammarus (24 h) | 113 | −0.437*** | −0.208* |
| Gammarus (96 h) | 132 | −0.407*** | −0.205* |
| Fathead minnow | 578 | −0.311*** | −0.308*** |
| Rat oral | 814 | −0.441*** | −0.372*** |
| Mouse oral | 398 | −0.199*** | −0.191*** |
| Rat IP | 170 | −0.214** | −0.147 |
| Mouse IP | 290 | −0.180** | −0.161** |
| AMES (mutagenicity) | 163 | −0.316*** | −0.518*** |
| CPDBAS rat | 519 | −0.198*** | −0.191*** |
| CPDBAS mouse | 402 | −0.145** | −0.198*** |
| CPDBAS hamster | 44 | −0.430** | −0.351* |
| Drosophila | 139 | −0.397*** | −0.337*** |
| Lemna - non-Herbicides | 149 | −0.428*** | −0.502*** |
| Lemna - Herbicides | 174 | −0.392*** | −0.428*** |
| Tetrahymena | 334 | −0.408*** | −0.448*** |
| Chlorella | 91 | −0.578*** | −0.738*** |
| Scenedesmus | 63 | −0.237 | −0.467*** |
| Yeast | 253 | 0.095 | −0.014 |
| NCI | 768 | −0.113** | −0.137*** |
Rank correlation coefficient between toxicity endpoints and UnBiological (Ub) measures. Two Ub measures are shown – Ub5 and Ub6, calculated from an overlap of 5 and 6 atoms respectively between the target molecule and the pool of metabolites. See Appendix 2 for more detailed descriptions of the calculation of Ub. Column 1: toxicity endpoint. Column 2: number of data points. Columns 3 and 4: correlation of Ub5 and Ub6 respectively with the appropriate toxicity endpoint. Significance of the correlation is flagged by asterisks: * = p < 0.05, ** = p < 0.01, *** = p < 0.000714. Note that the *** threshold is 0.05/(35*2), chosen to correct for multiple testing of 35 toxicity endpoints and 2 correlates. If Ub5 and Ub6 were randomly distributed with respect to toxicity, then we would expect to have to do this study 20 times to come up with one correlation of p < 0.000714
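The arithmetic behind the strictest threshold can be checked with a small sketch. The numbers (0.05, 35 endpoints, 2 correlates) come from the footnote above; the function names `bonferroni_threshold` and `significance_flag` are hypothetical.

```python
def bonferroni_threshold(alpha, n_endpoints, n_correlates):
    """Per-test threshold after dividing alpha by the total number of tests."""
    return alpha / (n_endpoints * n_correlates)


def significance_flag(p_value, strict_threshold):
    """Return the asterisk flag used in the table for a given p value."""
    if p_value < strict_threshold:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""


threshold = bonferroni_threshold(0.05, 35, 2)  # 0.05 / 70
# With 70 tests at this threshold, about 0.05 chance hits are expected per study,
# i.e. roughly one such hit per 20 repetitions of the whole study.
expected_hits_per_study = 35 * 2 * threshold
studies_per_chance_hit = 1 / expected_hits_per_study
```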
Correlation of Ub with other biological endpoints
| Endpoint | Number | Ub5 | Ub6 |
|---|---|---|---|
| HERG | 229 | −0.062 | 0.179** |
| Oestrogenic | 131 | −0.024 | −0.342*** |
| Tadpole narcosis | 141 | −0.043 | −0.267** |
| COX-2 | 107 | −0.069 | −0.149 |
| Antihistamine | 61 | −0.097 | −0.0126 |
Rank correlation coefficient of three target-related toxicity measures and two pharmacological endpoints with UnBiological measures Ub5 and Ub6. Column 1: pharmacological endpoint. Column 2: number of data points. Columns 3 and 4: correlations with Ub5 and Ub6 respectively. Significance flags are the same as in Table 1
Structures sets used for docking
| Protein | PDB structural data sets used for docking |
|---|---|
| ABL | 2e2b 1m52 1iep 3k5v 3qri 3qrk 3g6g 1ab2 2g2h 2hiw 2gqg 2hz0 3cs9 |
| Aldolase | 1ald 2ald 4ald |
| HIV protease | 1a94 1kj4 2bpz 2qhz 2qi6 2r5p 2r5q |
| PDE2b4b | 1f0j 1ro6 1ro9 1ror 2qyl 3frg 3gwt 3hmv 3o57 |
| PPAR | 1i7g 1kkq 2npa 2p54 2rew 2znn 3et1 3kdu |
Modelled expectations of fragment matching to sets of chemicals
| AF | NF | AM = 7, NM = 50 | AM = 7, NM = 200 | AM = 7, NM = 611 | AM = 7, NM = 2000 | AM = 14, NM = 50 | AM = 14, NM = 200 | AM = 14, NM = 611 | AM = 14, NM = 2000 | AM = 21, NM = 50 | AM = 21, NM = 200 | AM = 21, NM = 611 | AM = 21, NM = 2000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 62 | 0.356 | 0.177 | 0.076 | 0.024 | 0.292 | 0.136 | 0.041 | 0.019 | 0.245 | 0.084 | 0.048 | 0.012 |
| 4 | 318 | 0.571 | 0.273 | 0.101 | 0.021 | 0.465 | 0.202 | 0.063 | 0.022 | 0.356 | 0.127 | 0.039 | 0.011 |
| 5 | 1363 | 0.832 | 0.561 | 0.321 | 0.119 | 0.702 | 0.448 | 0.240 | 0.092 | 0.611 | 0.335 | 0.164 | 0.064 |
| 6 | 9240 | 0.975 | 0.904 | 0.771 | 0.529 | 0.899 | 0.760 | 0.581 | n/m | 0.852 | 0.674 | 0.476 | n/m |
AF – number of (non-hydrogen) atoms in the fragment set. NF – number of fragments in the chemical space containing AF atoms. NM – number of test molecules in the test set. AM – number of atoms in the molecules in the test set. Cells show the fraction of Fragments that are not found in at least one of the Molecules in the test set. n/m = not modelled
Summary statistics on the toxicity and UnBiological values for different biological endpoints
| Endpoint | Number | Ub5 correlation with toxicity | Ub6 correlation with toxicity | Endpoint mean | Endpoint median | Endpoint St. Dev. | Endpoint skew | Endpoint max | Endpoint min | Ub5 mean | Ub5 median | Ub5 St. Dev. | Ub5 skew | Ub5 max | Ub5 min | Ub6 mean | Ub6 median | Ub6 St. Dev. | Ub6 skew | Ub6 max | Ub6 min |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Trout 24 h | 186 | -0.230 ** | -0.337 *** | -5.73 | -5.64 | 1.39 | -0.14 | -0.99 | -10.04 | 7.58 | 7 | 1.92 | 0.01 | | | 8.94 | 9 | 1.91 | -0.74 | | |
| Trout 96 h | 181 | -0.419 *** | -0.516 *** | -5.73 | -5.78 | 1.77 | 0.76 | -0.23 | | 7.58 | 7 | | 0.01 | | | 8.94 | 9 | 1.92 | -0.76 | | |
| Pteronarcys (24 h) | 52 | -0.433 ** | -0.385 ** | -6.25 | -6.83 | 1.27 | 0.75 | -3.24 | -7.98 | | | 1.70 | 0.15 | | 6 | 9.71 | | 1.60 | -0.37 | | 6 |
| Pteronarcys (96 h) | 52 | -0.456 *** | -0.369 ** | -7.08 | -7.58 | 1.44 | 0.61 | -3.71 | -9.55 | 8.44 | | 1.67 | 0.22 | | 6 | 9.65 | | 1.58 | -0.29 | | 6 |
| Bluegill (24 h) | 157 | -0.149 | -0.215 ** | -5.71 | -5.63 | 1.41 | -0.21 | -2.31 | -9.61 | 7.92 | | 1.82 | 0.22 | | 4 | 9.30 | 9 | 1.65 | -0.28 | | 4 |
| Bluegill (96 h) | 172 | -0.216 ** | -0.276 *** | -5.88 | -5.84 | 1.48 | -0.16 | -2.31 | -10.20 | 7.85 | | 1.82 | 0.23 | | 4 | 9.23 | 9 | 1.70 | -0.33 | | 4 |
| Gammarus (24 h) | 113 | -0.437 *** | -0.208 * | -6.17 | -6.35 | 1.46 | -0.16 | -3.52 | -10.73 | 8.19 | | 1.71 | 0.43 | | 6 | 9.64 | | 1.46 | -0.16 | | 6 |
| Gammarus (96 h) | 132 | -0.407 *** | -0.205 * | -6.56 | -6.66 | 1.63 | -0.12 | -3.20 | -10.82 | 8.14 | | 1.71 | 0.46 | | 6 | 9.57 | 9 | 1.47 | -0.12 | | 6 |
| Fathead minnow | 578 | -0.311 *** | -0.308 *** | -3.83 | -3.82 | 1.38 | -0.15 | -0.04 | -9.38 | 5.59 | 5 | 1.41 | | | | 7.07 | 7 | 1.75 | 0.19 | | |
| Rat oral | 814 | -0.441 *** | -0.372 *** | -2.40 | -2.27 | 0.91 | -0.99 | -0.43 | -6.98 | 6.57 | 6 | 1.64 | 0.40 | | | 8.11 | 9 | 2.14 | -0.34 | | |
| Mouse oral | 398 | -0.199 *** | -0.191 *** | -2.56 | -2.47 | 0.82 | | -0.65 | -6.34 | 6.96 | 7 | 1.43 | -0.02 | | | 8.80 | 9 | 1.86 | -0.57 | | |
| Rat IP | 170 | -0.214 ** | -0.147 | -3.03 | -2.88 | 0.86 | -0.66 | -0.90 | -5.61 | 6.85 | 7 | 1.35 | 0.24 | | | 8.64 | 9 | 1.71 | -0.31 | | |
| Mouse IP | 290 | -0.180 ** | -0.161 ** | -3.21 | -3.02 | 1.04 | -0.93 | -0.99 | -7.50 | 6.85 | 7 | 1.45 | -0.09 | | | 8.50 | 9 | 1.98 | -0.46 | | |
| AMES | 163 | -0.316 *** | -0.518 *** | -4.72 | -4.25 | 1.67 | -0.51 | -1.52 | -9.88 | 6.33 | 7 | 1.33 | 0.07 | | | 8.40 | 9 | 2.09 | -0.49 | | |
| CPDBAS rat | 519 | -0.198 *** | -0.191 *** | -4.18 | -4.19 | 1.42 | -0.24 | -0.47 | -9.85 | 6.36 | 6 | 1.35 | -0.09 | | | 8.03 | 8 | 2.08 | -0.23 | | |
| CPDBAS mouse | 402 | -0.145 ** | -0.198 *** | -3.62 | -3.54 | 1.18 | -0.60 | -0.53 | -9.32 | 6.19 | 6 | 1.43 | -0.18 | 10 | | 7.85 | 8 | 2.21 | -0.32 | | |
| CPDBAS hamster | 44 | -0.430 ** | -0.351 * | -4.46 | -4.53 | 0.99 | 0.43 | -1.89 | -6.05 | 5.95 | 6 | 1.51 | | 9 | | 7.32 | 8 | | -0.17 | | |
| Drosophila | 139 | -0.397 *** | -0.337 *** | | | 0.94 | -0.82 | 0.23 | | 5.82 | 6 | 1.35 | 0.17 | 10 | | 7.32 | 7 | 2.07 | 0.00 | | |
| Lemna - non-Herbicides | 149 | -0.428 *** | -0.502 *** | -4.85 | -5.12 | 1.47 | 0.85 | -0.24 | -8.20 | 7.01 | 7 | 1.50 | -0.37 | | | 8.68 | 9 | 1.94 | -0.87 | | |
| Lemna - Herbicides | 174 | -0.392 *** | -0.428 *** | -6.21 | -6.08 | 1.65 | -0.19 | -2.99 | -10.23 | 8.02 | | 1.63 | 0.27 | | 4 | 9.56 | 9 | 1.77 | -0.02 | | 4 |
| Tetrahymena | 334 | -0.408 *** | -0.448 *** | -3.51 | -3.59 | 0.97 | 0.10 | -1.07 | -5.82 | 5.62 | 5.5 | 1.03 | -0.01 | | | 7.16 | 7 | 1.55 | -0.11 | | |
| Chlorella | 91 | -0.578 *** | -0.738 *** | -2.86 | -3.16 | 1.47 | 0.43 | | -6.10 | 6.15 | 6 | 1.74 | 0.86 | | | 7.62 | 7 | 1.77 | -0.43 | | |
| Scenedesmus | 63 | -0.237 | -0.467 *** | -5.80 | -5.90 | 1.61 | | 0.10 | -8.04 | 6.27 | 6 | 1.15 | 0.10 | 9 | 3 | 8.17 | 9 | 1.53 | | | 3 |
| Yeast | 253 | 0.095 | -0.014 | -4.69 | -4.56 | | -0.61 | -4.01 | -5.77 | 6.76 | 7 | 1.02 | 0.01 | 10 | 4 | 8.96 | 9 | 1.42 | -0.01 | | 5 |
| NCI | 768 | -0.113 ** | -0.137 *** | -5.03 | -4.67 | 1.38 | -1.19 | -2.03 | -10.02 | 7.04 | 7 | 1.07 | 0.81 | | 4 | 9.14 | 9 | 1.33 | -0.20 | | 5 |
| HERG | 229 | -0.062 | 0.179 ** | -5.50 | -5.42 | 1.27 | -0.09 | -2.36 | -8.59 | 7.14 | 7 | 1.01 | -0.14 | 10 | 4 | 9.67 | 9 | 1.29 | -0.03 | | 5 |
| Oestrogenic | 131 | -0.024 | -0.342 *** | -5.62 | -5.32 | | -0.47 | -2.52 | -9.65 | 6.16 | 6 | 0.95 | 0.49 | 10 | 4 | 8.97 | 9 | 1.24 | -0.06 | | 6 |
| Tadpole narcosis | 141 | -0.043 | -0.267 ** | -2.47 | -2.37 | 1.15 | -0.32 | -0.19 | -5.33 | | | 1.32 | 0.82 | 10 | | | | 1.82 | | | |
| COX-2 | 107 | -0.069 | -0.149 | -6.36 | -6.52 | 1.51 | 0.48 | -3.00 | -8.70 | 7.72 | | | 0.36 | 10 | | 9.20 | 9 | | -0.08 | | |
| Antihistamine | 61 | -0.097 | -0.0126 | | | 1.27 | 0.84 | | -10.59 | 6.92 | 7 | 0.76 | 0.61 | 9 | 6 | | | 1.26 | -0.23 | | 7 |
Left column: biological endpoints correlated in this study. Column 2: number of molecules in the data set for that endpoint. Columns 3 and 4: correlation of the endpoint with UnBiological measures Ub5 and Ub6, as per Table 3. The rest of the table lists summary statistics on the toxicity endpoints (all in moles/litre for solution studies, moles/kg for animal studies), Ub5 and Ub6. Listed are mean, median, standard deviation (St. Dev.), skew (a measure of the asymmetry of the data: positive skewness indicates a distribution with an asymmetric tail extending toward more positive values, negative skewness indicates a distribution with an asymmetric tail extending toward more negative values), and maximum (max) and minimum (min) values
The maximum and minimum values in each column are underlined for ease of comparison. There are more minimum values for the biological endpoint in the pharmacologically defined endpoints COX-2 and antihistamine, as would be expected as these molecules have been selected specifically for pharmacological potency (i.e. for low EC50 values). The molecular set used for Tadpole Narcosis has more minimum values for Ub5 and Ub6 statistics than might be expected by chance, suggesting that the set of molecules tested for narcosis induction in tadpoles is biased with respect to the other sets used in this study. Other sets are generally not obviously different from each other, and specifically the Ub5 and Ub6 statistics for HERG, Oestrogenic potential, COX-2 inhibition and Antihistamine efficacy appear to be broadly similar to the Ub5 and Ub6 statistics for the molecule sets analysed for toxicity endpoints
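The summary statistics listed in this table can be reproduced for any list of values with a short sketch. The skew definition used here (third central moment divided by the 1.5 power of the second) is an assumption, since the footnote does not say which formula was applied, and the function names `skewness` and `summarise` are hypothetical.

```python
from statistics import mean, median, stdev


def skewness(values):
    """Population skewness m3 / m2**1.5 (assumed definition; some tools apply
    a small-sample correction and give slightly different numbers)."""
    centre = mean(values)
    n = len(values)
    m2 = sum((v - centre) ** 2 for v in values) / n
    m3 = sum((v - centre) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def summarise(values):
    """Return the six statistics reported per endpoint in the table."""
    return {
        "mean": mean(values),
        "median": median(values),
        "st_dev": stdev(values),  # sample standard deviation
        "skew": skewness(values),
        "max": max(values),
        "min": min(values),
    }


# Hypothetical values with a tail toward larger numbers, so the skew is positive.
stats = summarise([1, 2, 3, 4, 10])
```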