Thomas Luechtefeld1, Alexandra Maertens1, Daniel P Russo2, Costanza Rovida3, Hao Zhu2,4, Thomas Hartung1,3.
Abstract
The European Chemicals Agency (ECHA) warehouses the largest public dataset of in vivo and in vitro toxicity tests. In December 2014 this data was converted into a structured, machine-readable and searchable database using linguistic search engines. It contains data for 9,801 unique substances, 3,609 unique study descriptions and 816,048 study documents. This allows exploring toxicological data on a scale far larger than previously available. Substance similarity analysis was used to determine clustering of substances for hazards by mapping to PubChem. Similarity was measured using PubChem 2D conformational substructure fingerprints, which were compared via the Tanimoto metric. Following K-Core filtration, the Blondel et al. (2008) module recognition algorithm was used to identify chemical modules showing clusters of substances in use within the chemical universe. The Globally Harmonized System of Classification and Labelling provides a valuable information source for hazard analysis. The most prevalent hazards are H317 "May cause an allergic skin reaction" with 20% and H318 "Causes serious eye damage" with 17% positive substances. Such prevalences obtained for all hazards here are key for the design of integrated testing strategies. The data allowed estimation of animal use. The ECHA data cover about 20% of substances in the high-throughput biological assay database Tox21 (1,737 substances) and have a 917-substance overlap with the Comparative Toxicogenomics Database (~7% of CTD). The biological data available in these datasets combined with ECHA in vivo endpoints have enormous modeling potential. A case is made that REACH should systematically open regulatory data for research purposes.
Keywords: animal testing; chemical toxicity; computational toxicology; database; in silico
Year: 2016 PMID: 26863090 PMCID: PMC5408747 DOI: 10.14573/altex.1510052
Source DB: PubMed Journal: ALTEX ISSN: 1868-596X Impact factor: 6.043
Fig. 1 Prevalence of purpose flags
Prevalence of the four purpose flags (disregarded study, key study, supporting study, weight of evidence) over an extraction of 509,083 studies with purpose flags in REACH registrations 2008–2014.
Fig. 2 Klimisch score pie chart
Prevalence of different Klimisch values over 539,675 studies with assignable Klimisch values in REACH registrations 2008–2014.
Fig. 3 Chemical similarity for 3,122 substances mapped from ECHA dossiers to PubChem
Edges require a minimum similarity of 0.6. Substances without neighbors are filtered out. The Gephi layout algorithm “ForceAtlas 2” was used for layout.
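The construction of such a similarity graph can be sketched as follows. This is a minimal illustration, not the authors' pipeline: fingerprints are represented as sets of on-bit indices, the bit positions in the toy data are invented, the function names are hypothetical, and treating the 0.6 cutoff as inclusive is an assumption.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_graph(fingerprints, threshold=0.6):
    """Link every pair of substances whose similarity reaches the threshold.

    Returns a dict mapping each substance to its set of neighbors.
    Substances without any neighbor never enter the dict, which mirrors
    the filtering of isolated substances described in the caption.
    """
    adjacency = {}
    for (name_a, fp_a), (name_b, fp_b) in combinations(fingerprints.items(), 2):
        if tanimoto(fp_a, fp_b) >= threshold:  # inclusive cutoff is an assumption
            adjacency.setdefault(name_a, set()).add(name_b)
            adjacency.setdefault(name_b, set()).add(name_a)
    return adjacency


# Toy fingerprints with invented bit positions (not real PubChem bits).
toy_fingerprints = {
    "A": {1, 2, 3, 4, 5},
    "B": {1, 2, 3, 4, 6},
    "C": {1, 2, 3, 4, 5, 6},
    "D": {10, 11},
}
graph = similarity_graph(toy_fingerprints)
```

In the toy data, A, B and C are mutually similar and form a triangle, while D shares no bits with the others and is dropped.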
Fig. 4 Filtering of chemical similarity graph via K-Core
Substances are colored by membership in one of the nine global modules, numbered 0–8, as determined by the Blondel et al. (2008) algorithm.
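K-core filtration keeps the largest subgraph in which every substance still has at least k neighbors. A minimal plain-Python sketch follows; the adjacency-set representation and the function name are assumptions, and because the value of k used for this figure is not stated here, k = 2 is purely illustrative. Module detection itself is not shown.

```python
def k_core(adjacency, k):
    """Return the k-core of an undirected graph given as {node: set(neighbors)}.

    Nodes with fewer than k remaining neighbors are removed repeatedly
    until every surviving node has degree >= k.
    """
    core = {node: set(neighbors) for node, neighbors in adjacency.items()}
    changed = True
    while changed:
        changed = False
        low_degree = [node for node, neighbors in core.items() if len(neighbors) < k]
        for node in low_degree:
            # Remove the node and detach it from neighbors that are still present.
            for other in core.pop(node):
                if other in core:
                    core[other].discard(node)
            changed = True
    return core


# Toy graph: a triangle (a, b, c) with one pendant node d attached to c.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
core = k_core(toy_graph, k=2)
```

With k = 2 the pendant node d is removed and the triangle survives intact.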
Fig. 5 Chemical examples from each module in Figure 4
Characteristic substructures for each module (Fig. 4) as determined by modular frequency × inverse chemical frequency (MF_ICF)
Orange = Hierarchic Element Counts. Grey = Rings in a canonic extended smallest set of smallest rings (ESSR) ring set. Green = Simple atom nearest neighbors. Blue = SMARTS patterns. High MF_ICF numbers indicate stronger relationships between the given module and substructure.
| FP | MF_ICF | FP | MF_ICF | FP | MF_ICF |
|---|---|---|---|---|---|
| O=C-O-C:C | 0.92 | C-O-C-C=C | 0.25 | O=C-C-N-C | 0.49 |
| OC1C(O)CCCC1 | 0.88 | C=C-C-O-C | 0.25 | ≥1 Fe | 0.31 |
| Oc1c(O)cccc1 | 0.87 | O=C-C=C-[#1] | 0.18 | O=C-C-N | 0.3 |
| Cc1c(O)cccc1 | 0.84 | O=C-C=C | 0.16 | ≥1 Cu | 0.26 |
| O-C:C-O-[#1] | 0.83 | O-C-C=C | 0.15 | N-C-C-N-C | 0.23 |
| O-C:C-O | 0.82 | C-C-O-C-C | 0.13 | O-C-C-N-C | 0.22 |
| O=C-C:C-O | 0.8 | O(~C)(~C) | 0.11 | N-C-C-N | 0.2 |
| O-C:C-O-C | 0.8 | ≥1 Sn | 0.11 | O-C-C-N | 0.16 |
| C-C:C-O-[#1] | 0.77 | C(-C)(-O)(=O) | 0.1 | ≥2 Na | 0.16 |
| Cc1ccc(O)cc1 | 0.75 | C(-O)(=O) | 0.09 | ≥8 O | 0.15 |
| Nc1c(Cl)cccc1 | 0.87 | O-C-C=O | 0.33 | S=C-N-[#1] | 0.33 |
| NC1C(Cl)CCCC1 | 0.87 | O=C-C-O | 0.33 | C-S-C:C | 0.32 |
| O=C-C-C-C-C(N)-C | 0.86 | O=C-C-C-O | 0.24 | C(~N)(:C)(:C) | 0.31 |
| C-C=N-N-C | 0.85 | O-C-C-C=O | 0.24 | Cc1ccc(N)cc1 | 0.31 |
| N-N-C:C | 0.84 | ≥1 Zr | 0.22 | N-C-C-C:C | 0.31 |
| N(~C)(~H)(~N) | 0.84 | O=C-C-O-C | 0.18 | N-C-C:C-C | 0.31 |
| C=N-N-C | 0.83 | O-C-C-O-[#1] | 0.18 | N-C-C:C | 0.3 |
| N(~H)(~N) | 0.79 | ≥1 Pb | 0.17 | CC1CCC(N)CC1 | 0.3 |
| N-N-C-C | 0.79 | O=C-C-C-C-O | 0.17 | C(-C)(-N)(=C) | 0.3 |
| ≥5 unsaturated non-aromatic carbon-only ring size 6 | 0.71 | O-C-C-O | 0.16 | N-C:C:C-C | 0.29 |
| O=C-C-C-C-C-C | 0.33 | Cc1ccc(S)cc1 | 0.29 | CC1CC(O)CC1 | 0.97 |
| ≥1 Sn | 0.32 | CC1CCC(S)CC1 | 0.29 | CC1C(O)CCC1 | 0.97 |
| O=C-C-C-C-C | 0.32 | N-S-C:C | 0.28 | ≥3 saturated or aromatic carbon-only ring size 6 | 0.93 |
| O-O | 0.31 | C(~C)(~H)(~P) | 0.25 | ≥2 saturated or aromatic carbon-only ring size 6 | 0.75 |
| O=C-C-C-C-C-C-C | 0.29 | Cc1ccc(C)cc1 | 0.24 | ≥2 saturated or aromatic carbon-only ring size 5 | 0.74 |
| O-C-C-C-C-C-C-C | 0.29 | C(-C)(-Cl)(=O) | 0.2 | CC1C(C)CCC1 | 0.71 |
| O=C-C-C-C | 0.29 | N-S | 0.18 | ≥1 saturated or aromatic carbon-only ring size 5 | 0.71 |
| C(-C)(-O)(=O) | 0.28 | C(-C)(-H)(=O) | 0.18 | CC1CC(C)CC1 | 0.69 |
| O-C-C-C-C-C-C | 0.27 | C-P | 0.17 | CC1CC(O)CCC1 | 0.53 |
| O-C-C-C-C-C | 0.27 | S-C:C-C | 0.14 | ≥1 saturated or aromatic carbon-only ring size 6 | 0.48 |
Intermodular similarity as determined by the cosine of the angle between module substructure importance vectors
Substructure importance vectors are determined via an analog of TF-IDF, where the importance of a substructure for a given module is its frequency within the module multiplied by the inverse of its frequency across all substances. Green cells show the greatest similarity for the module in each row. These similarities fit well with visual inspection of Figure 4.
| Module | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.00 | 0.26 | 0.02 | 0.24 | 0.18 | 0.19 | 0.26 | 0.36 | 0.18 |
| 1 | 0.26 | 1.00 | 0.06 | 0.09 | 0.23 | 0.08 | 0.43 | 0.16 | 0.15 |
| 2 | 0.02 | 0.06 | 1.00 | 0.06 | 0.10 | 0.05 | 0.10 | 0.04 | 0.02 |
| 3 | 0.24 | 0.09 | 0.06 | 1.00 | 0.04 | 0.53 | 0.11 | 0.32 | 0.05 |
| 4 | 0.18 | 0.23 | 0.10 | 0.04 | 1.00 | 0.02 | 0.43 | 0.04 | 0.22 |
| 5 | 0.19 | 0.08 | 0.05 | 0.53 | 0.02 | 1.00 | 0.13 | 0.40 | 0.04 |
| 6 | 0.26 | 0.43 | 0.10 | 0.11 | 0.43 | 0.13 | 1.00 | 0.20 | 0.30 |
| 7 | 0.36 | 0.16 | 0.04 | 0.32 | 0.04 | 0.40 | 0.20 | 1.00 | 0.09 |
| 8 | 0.18 | 0.15 | 0.02 | 0.05 | 0.22 | 0.04 | 0.30 | 0.09 | 1.00 |
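The weighting and comparison described in the caption above can be sketched as follows. This is a literal reading of the caption (module frequency divided by overall frequency, then cosine similarity), not the authors' code: any further normalization they applied is not given here, so the numbers are not expected to reproduce the table. The function names and toy data are hypothetical.

```python
import math


def importance_vector(module_members, substance_features, substructures):
    """TF-IDF-style importance of each substructure for one module.

    substance_features maps every substance to its set of substructure keys.
    Importance = (fraction of module members with the substructure)
                 * 1 / (fraction of all substances with the substructure).
    """
    vector = []
    for key in substructures:
        module_freq = sum(key in substance_features[s] for s in module_members) / len(module_members)
        overall_freq = sum(key in feats for feats in substance_features.values()) / len(substance_features)
        vector.append(module_freq / overall_freq if overall_freq else 0.0)
    return vector


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy data: four substances, three substructure keys, two modules.
features = {
    "s1": {"O-C:C-O", "C-P"},
    "s2": {"O-C:C-O"},
    "s3": {"N-S"},
    "s4": {"N-S", "C-P"},
}
keys = ["O-C:C-O", "N-S", "C-P"]
module_a = importance_vector(["s1", "s2"], features, keys)  # [2.0, 0.0, 1.0]
module_b = importance_vector(["s3", "s4"], features, keys)  # [0.0, 2.0, 1.0]
similarity = cosine(module_a, module_b)
```

The two toy modules share only the C-P substructure, so their similarity is low (about 0.2), while each module compared with itself gives 1.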
Top 3 OECD TG counts by category in REACH registrations 2008–2014
Counts give total number of studies following the given OECD TG.
| Category | OECD TG | Count | Description |
|---|---|---|---|
| InVitro | 471 | 6044 | Bacterial Reverse Mutation Test |
| InVitro | 431 | 3576 | |
| InVitro | 435 | 3287 | |
| QSAR/PCHEM | 105 | 2920 | Water Solubility |
| QSAR/PCHEM | 109 | 2420 | Density of liquids and solids |
| QSAR/PCHEM | 102 | 2322 | Melting Point/Range |
| InVivo | 404 | 8548 | Acute Dermal Irritation/Corrosion |
| InVivo | 405 | 8142 | Acute Eye Irritation/Corrosion (Draize) |
| InVivo | 401 | 7852 | Acute Oral Toxicity |
| ReadAcross | 471 | 3896 | Bacterial Reverse Mutation Test |
| ReadAcross | 401 | 2747 | Acute Oral Toxicity |
| ReadAcross | 201 | 2679 | Cyanobacteria Growth Inhibition Test |
Fig. 6 Frequency of different health hazards in extracted dataset of REACH registrations 2008–2014
Hazard definitions are given in Table 4 (hazard values extracted for 6,186 substances). Green bars designate the frequency of chemicals labeled with the given hazard; red bars designate the frequency of chemicals not labeled with the given hazard.
Hazard counts for extracted GHS hazards in REACH registrations 2008–2014
6,186 substances had extractable classification and labeling data in ECHA dossiers.
| Description | Hazard | Labeled substances | Conclusive but not sufficient for classification | Data lacking | Inconclusive | Failed extraction |
|---|---|---|---|---|---|---|
| Unstable explosive | H200 | 6 (0.1%) | 5507 (89%) | 492 (8%) | 22 (0.4%) | 159 (2.6%) |
| Explosive; mass explosion hazard | H201 | 14 (0.2%) | 5499 (88.9%) | 492 (8%) | 22 (0.4%) | 159 (2.6%) |
| Fire or projection hazard | H204 | 16 (0.3%) | 5497 (88.9%) | 492 (8%) | 22 (0.4%) | 159 (2.6%) |
| Extremely flammable gas | H220 | 41 (0.7%) | 4841 (78.3%) | 1122 (18.1%) | 23 (0.4%) | 159 (2.6%) |
| Flammable gas | H221 | 2 (0%) | 4880 (78.9%) | 1122 (18.1%) | 23 (0.4%) | 159 (2.6%) |
| Extremely flammable liquid and vapour | H224 | 90 (1.5%) | 5047 (81.6%) | 877 (14.2%) | 13 (0.2%) | 159 (2.6%) |
| Highly flammable liquid and vapour | H225 | 268 (4.3%) | 4869 (78.7%) | 877 (14.2%) | 13 (0.2%) | 159 (2.6%) |
| Flammable liquid and vapour | H226 | 401 (6.5%) | 4741 (76.6%) | 872 (14.1%) | 13 (0.2%) | 159 (2.6%) |
| Combustible liquid | H227 | 2 (0%) | 5136 (83%) | 876 (14.2%) | 13 (0.2%) | 159 (2.6%) |
| Flammable solid | H228 | 68 (1.1%) | 5277 (85.3%) | 659 (10.7%) | 23 (0.4%) | 159 (2.6%) |
| May react explosively even in the absence of air | H230 | 2 (0%) | 4880 (78.9%) | 1122 (18.1%) | 23 (0.4%) | 159 (2.6%) |
| Heating may cause a fire or explosion | H241 | 4 (0.1%) | 4879 (78.9%) | 1126 (18.2%) | 17 (0.3%) | 160 (2.6%) |
| Heating may cause a fire | H242 | 55 (0.9%) | 4831 (78.1%) | 1123 (18.2%) | 17 (0.3%) | 160 (2.6%) |
| Catches fire spontaneously if exposed to air | H250 | 13 (0.2%) | 5077 (82.1%) | 918 (14.8%) | 19 (0.3%) | 159 (2.6%) |
| Self-heating; may catch fire | H251 | 13 (0.2%) | 4886 (79%) | 1109 (17.9%) | 19 (0.3%) | 159 (2.6%) |
| Self-heating in large quantities; may catch fire | H252 | 8 (0.1%) | 4892 (79.1%) | 1108 (17.9%) | 19 (0.3%) | 159 (2.6%) |
| In contact with water releases flammable gases which may ignite spontaneously | H260 | 15 (0.2%) | 5076 (82.1%) | 914 (14.8%) | 22 (0.4%) | 159 (2.6%) |
| In contact with water releases flammable gas | H261 | 8 (0.1%) | 5083 (82.2%) | 914 (14.8%) | 22 (0.4%) | 159 (2.6%) |
| May cause or intensify fire; oxidizer | H270 | 3 (0%) | 4862 (78.6%) | 1139 (18.4%) | 23 (0.4%) | 159 (2.6%) |
| May cause fire or explosion; strong oxidizer | H271 | 15 (0.2%) | 5262 (85.1%) | 732 (11.8%) | 18 (0.3%) | 159 (2.6%) |
| May intensify fire; oxidizer | H272 | 41 (0.7%) | 5236 (84.6%) | 732 (11.8%) | 18 (0.3%) | 159 (2.6%) |
| Contains gas under pressure; may explode if heated | H280 | 60 (1%) | 4784 (77.3%) | 1161 (18.8%) | 22 (0.4%) | 159 (2.6%) |
| Contains refrigerated gas; may cause cryogenic burns or injury | H281 | 1 (0%) | 4843 (78.3%) | 1161 (18.8%) | 22 (0.4%) | 159 (2.6%) |
| May be corrosive to metals | H290 | 125 (2%) | 3722 (60.2%) | 2155 (34.8%) | 25 (0.4%) | 159 (2.6%) |
| Fatal if swallowed | H300 | 33 (0.5%) | 5709 (92.3%) | 273 (4.4%) | 12 (0.2%) | 159 (2.6%) |
| Toxic if swallowed | H301 | 225 (3.6%) | 5518 (89.2%) | 272 (4.4%) | 12 (0.2%) | 159 (2.6%) |
| Harmful if swallowed | H302 | 1072 (17.3%) | 4677 (75.6%) | 266 (4.3%) | 12 (0.2%) | 159 (2.6%) |
| May be harmful if swallowed | H303 | 23 (0.4%) | 5720 (92.5%) | 272 (4.4%) | 12 (0.2%) | 159 (2.6%) |
| May be fatal if swallowed and enters airways | H304 | 453 (7.3%) | 2913 (47.1%) | 2626 (42.5%) | 35 (0.6%) | 159 (2.6%) |
| May be harmful if swallowed and enters airways | H305 | 3 (0%) | 3361 (54.3%) | 2628 (42.5%) | 35 (0.6%) | 159 (2.6%) |
| Fatal in contact with skin | H310 | 30 (0.5%) | 4905 (79.3%) | 1074 (17.4%) | 18 (0.3%) | 159 (2.6%) |
| Toxic in contact with skin | H311 | 164 (2.7%) | 4774 (77.2%) | 1071 (17.3%) | 18 (0.3%) | 159 (2.6%) |
| Harmful in contact with skin | H312 | 209 (3.4%) | 4728 (76.4%) | 1072 (17.3%) | 18 (0.3%) | 159 (2.6%) |
| May be harmful in contact with skin | H313 | 9 (0.1%) | 4924 (79.6%) | 1076 (17.4%) | 18 (0.3%) | 159 (2.6%) |
| Causes severe skin burns and eye damage | H314 | 615 (9.9%) | 5105 (82.5%) | 290 (4.7%) | 16 (0.3%) | 160 (2.6%) |
| Causes skin irritation | H315 | 1010 (16.3%) | 4714 (76.2%) | 287 (4.6%) | 15 (0.2%) | 160 (2.6%) |
| Causes mild skin irritation | H316 | 10 (0.2%) | 5706 (92.2%) | 294 (4.8%) | 16 (0.3%) | 160 (2.6%) |
| May cause an allergic skin reaction | H317 | 1255 (20.3%) | 4317 (69.8%) | 428 (6.9%) | 26 (0.4%) | 160 (2.6%) |
| Causes serious eye damage | H318 | 1087 (17.6%) | 4574 (73.9%) | 352 (5.7%) | 14 (0.2%) | 159 (2.6%) |
| Causes serious eye irritation | H319 | 885 (14.3%) | 4762 (77%) | 366 (5.9%) | 14 (0.2%) | 159 (2.6%) |
| Causes eye irritation | H320 | 44 (0.7%) | 5592 (90.4%) | 377 (6.1%) | 14 (0.2%) | 159 (2.6%) |
| Fatal if inhaled | H330 | 119 (1.9%) | 3385 (54.7%) | 2480 (40.1%) | 43 (0.7%) | 159 (2.6%) |
| Toxic if inhaled | H331 | 188 (3%) | 3314 (53.6%) | 2482 (40.1%) | 43 (0.7%) | 159 (2.6%) |
| Harmful if inhaled | H332 | 446 (7.2%) | 3064 (49.5%) | 2474 (40%) | 43 (0.7%) | 159 (2.6%) |
| May cause allergy or asthma symptoms or breathing difficulties if inhaled | H334 | 127 (2.1%) | 2054 (33.2%) | 3825 (61.8%) | 21 (0.3%) | 159 (2.6%) |
| May cause respiratory irritation | H335 | 377 (6.1%) | 4156 (67.2%) | 1409 (22.8%) | 35 (0.6%) | 209 (3.4%) |
| May cause drowsiness or dizziness | H336 | 207 (3.3%) | 4315 (69.8%) | 1415 (22.9%) | 35 (0.6%) | 214 (3.5%) |
| May cause genetic defects | H340 | 143 (2.3%) | 4983 (80.6%) | 770 (12.4%) | 80 (1.3%) | 210 (3.4%) |
| Suspected of causing genetic defects | H341 | 126 (2%) | 5005 (80.9%) | 766 (12.4%) | 79 (1.3%) | 210 (3.4%) |
| May cause cancer | H350 | 342 (5.5%) | 2260 (36.5%) | 3373 (54.5%) | 24 (0.4%) | 187 (3%) |
| Suspected of causing cancer | H351 | 143 (2.3%) | 2460 (39.8%) | 3372 (54.5%) | 24 (0.4%) | 187 (3%) |
| May damage fertility or the unborn child | H360 | 191 (3.1%) | 3854 (62.3%) | 1927 (31.2%) | 54 (0.9%) | 160 (2.6%) |
| Suspected of damaging fertility or the unborn child | H361 | 370 (6%) | 3677 (59.4%) | 1925 (31.1%) | 54 (0.9%) | 160 (2.6%) |
| May cause harm to breast-fed children | H362 | 9 (0.1%) | 2132 (34.5%) | 3865 (62.5%) | 21 (0.3%) | 159 (2.6%) |
| Causes damage to organs | H370 | 32 (0.5%) | 4476 (72.4%) | 1414 (22.9%) | 35 (0.6%) | 229 (3.7%) |
| May cause damage to organs | H371 | 24 (0.4%) | 4485 (72.5%) | 1415 (22.9%) | 35 (0.6%) | 227 (3.7%) |
| Causes damage to organs through prolonged or repeated exposure | H372 | 258 (4.2%) | 4466 (72.2%) | 1216 (19.7%) | 44 (0.7%) | 202 (3.3%) |
| May cause damage to organs through prolonged or repeated exposure | H373 | 453 (7.3%) | 4277 (69.1%) | 1212 (19.6%) | 44 (0.7%) | 200 (3.2%) |
| Very toxic to aquatic life | H400 | 805 (13%) | 4258 (68.8%) | 938 (15.2%) | 16 (0.3%) | 169 (2.7%) |
| Toxic to aquatic life | H401 | 31 (0.5%) | 4893 (79.1%) | 1078 (17.4%) | 16 (0.3%) | 168 (2.7%) |
| Harmful to aquatic life | H402 | 33 (0.5%) | 4888 (79%) | 1080 (17.5%) | 16 (0.3%) | 169 (2.7%) |
| Very toxic to aquatic life with long-lasting effects | H410 | 715 (11.6%) | 4840 (78.2%) | 455 (7.4%) | 16 (0.3%) | 160 (2.6%) |
| Toxic to aquatic life with long-lasting effects | H411 | 870 (14.1%) | 4649 (75.2%) | 489 (7.9%) | 16 (0.3%) | 162 (2.6%) |
| Harmful to aquatic life with long-lasting effects | H412 | 615 (9.9%) | 4904 (79.3%) | 489 (7.9%) | 16 (0.3%) | 162 (2.6%) |
| May cause long-lasting harmful effects to aquatic life | H413 | 276 (4.5%) | 5240 (84.7%) | 492 (8%) | 16 (0.3%) | 162 (2.6%) |
| Harms public health and the environment by destroying ozone in the upper atmosphere | H420 | 3 (0%) | 2706 (43.7%) | 3272 (52.9%) | 8 (0.1%) | 197 (3.2%) |
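The percentages in the table are each count divided by the 6,186 substances with extractable classification and labeling data, and the five columns of a row add up to that total. A short check for the two most prevalent hazards, using counts copied from the H317 and H318 rows (variable names are illustrative):

```python
TOTAL_SUBSTANCES = 6186

# (labeled, conclusive but not sufficient, data lacking, inconclusive, failed extraction)
rows = {
    "H317": (1255, 4317, 428, 26, 160),
    "H318": (1087, 4574, 352, 14, 159),
}

# Each row should account for every substance exactly once.
row_is_complete = {code: sum(counts) == TOTAL_SUBSTANCES for code, counts in rows.items()}

# Prevalence of positive labels, in percent, rounded to one decimal.
prevalence = {
    code: round(100 * counts[0] / TOTAL_SUBSTANCES, 1)
    for code, counts in rows.items()
}
```

This gives 20.3% for H317 and 17.6% for H318, matching the table.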
Fig. 7 Number of sources from each year
Possible double counting due to absence of reference identifiers in ECHA dossiers.
Fig. 8 Number of animals used in data sources from referenced year in REACH registrations 2008–2014
Possible double counting due to missing reference identifiers in ECHA dossiers.
Number of substances shared between pairs of toxicological/chemical databases
REACH refers only to substances extracted for this publication.
| | REACH | ToxRefDB | Tox21 | CTD | ChEMBL | PubChem |
|---|---|---|---|---|---|---|
| REACH | 9,801 | | | | | |
| ToxRefDB | 51 | 474 | | | | |
| Tox21 | 1,737 | 375 | 8,599 | | | |
| CTD | 917 | 230 | 2,511 | 13,446 | | |
| ChEMBL | 2,080 | 339 | 6,001 | 5,490 | 1,715,667 | |
| PubChem | 4,955 | 465 | 8,065 | 7,729 | 1,394,860 | 68,369,258 |
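The coverage figures quoted in the abstract follow from this table: the number of substances shared with REACH divided by the size of the other database (its diagonal entry). A small sketch with the Tox21 and CTD values copied from the table; the variable names are illustrative:

```python
# Diagonal entries: total substances per database.
database_size = {"Tox21": 8599, "CTD": 13446}

# Off-diagonal entries in the REACH column: substances shared with REACH.
shared_with_reach = {"Tox21": 1737, "CTD": 917}

# Share of each database that is also covered by the extracted REACH data, in percent.
coverage_percent = {
    name: 100 * shared_with_reach[name] / database_size[name]
    for name in database_size
}
```

The results are roughly 20% for Tox21 and 7% for CTD, consistent with the abstract.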