Kamel Mansouri, Ahmed Abdelaziz, Aleksandra Rybacka, Alessandra Roncaglioni, Alexander Tropsha, Alexandre Varnek, Alexey Zakharov, Andrew Worth, Ann M Richard, Christopher M Grulke, Daniela Trisciuzzi, Denis Fourches, Dragos Horvath, Emilio Benfenati, Eugene Muratov, Eva Bay Wedebye, Francesca Grisoni, Giuseppe F Mangiatordi, Giuseppina M Incisivo, Huixiao Hong, Hui W Ng, Igor V Tetko, Ilya Balabin, Jayaram Kancherla, Jie Shen, Julien Burton, Marc Nicklaus, Matteo Cassotti, Nikolai G Nikolov, Orazio Nicolotti, Patrik L Andersson, Qingda Zang, Regina Politi, Richard D Beger, Roberto Todeschini, Ruili Huang, Sherif Farag, Sine A Rosenberg, Svetoslav Slavov, Xin Hu, Richard S Judson.
Abstract
BACKGROUND: Humans are exposed to thousands of man-made chemicals in the environment. Some chemicals mimic natural endocrine hormones and, thus, have the potential to be endocrine disruptors. Most of these chemicals have never been tested for their ability to interact with the estrogen receptor (ER). Risk assessors need tools to prioritize chemicals for evaluation in costly in vivo tests, for instance, within the U.S. EPA Endocrine Disruptor Screening Program.
Year: 2016 PMID: 26908244 PMCID: PMC4937869 DOI: 10.1289/ehp.1510267
Source DB: PubMed Journal: Environ Health Perspect ISSN: 0091-6765 Impact factor: 9.031
Methods adopted by the participant groups (alphabetic order) in the modeling procedure.
| Model name | Calibration method | Descriptors software/type | Training set (No. of chemicals) | Predictions type |
|---|---|---|---|---|
| DTU | PLS/fragments | Leadscope | METI (595,481)/ToxCast™ (1,422) | Categorical |
| EPA_NCCT | GA + PLS-DA | PaDEL | ToxCast™ (1,529) | Categorical |
| FDA_NCTR_DBB (Ng et al. 2014) | DF | Mold2 | ToxCast™ (1,677) | Categorical |
| FDA_NCTR_DSB | PLS | 3D-SDAR | ToxCast™ (1,019) | Categorical |
| ILS_EPA (Zang et al. 2013) | SVM + RF | QikProp | ToxCast™ (1,677) | Categorical |
| IRCCS_CART (Roncaglioni et al. 2008) | CART-VEGA | 2D descriptors | METI (806) | Categorical |
| IRCCS_Ruleset | Ruleset | SMARTS | ToxCast™ (1,529) | Categorical |
| JRC_Ispra (Poroikov et al. 2000) | PASS | MNA | — | Categorical |
| Lockheed Martin | kNN | Fingerprints | ToxCast™ (1,677) | Categorical + continuous |
| NIH_NCATS | Docking | AutoDock score | — | Categorical |
| NIH_NCI_GUSAR (Filimonov et al. 2009) | RBF-SCR | MNA, QNA | ToxCast™ (1,677) | Categorical |
| NIH_NCI_PASS (Poroikov et al. 2000) | PASS | MNA | ToxCast™ (1,677) | Categorical |
| OCHEM (2015) | | 11 descriptor types | ToxCast™ (1,660) | Categorical + continuous |
| RIFM | SVM | Fingerprints | ToxCast™ (1,677) | Categorical |
| Umeå (Rybacka et al. 2015) | ASNN | DRAGON | METI + (Kuiper et al. 1997; Taha et al. 2010) | Categorical |
| UNC_MML | SVM + RF | DRAGON | ToxCast™ (120) | Categorical |
| UNIBA (Trisciuzzi et al. 2015) | Docking | GLIDE score | ToxCast™ (1,677) | Categorical |
| UNIMIB | kNN | DRAGON + fingerprints | ToxCast™ (1,677) | Categorical |
| UNISTRA (Horvath et al. 2014) | SVM | ISIDA | ToxCast™ (1,529) | Categorical + continuous |
Predictions type: a categorical model provides an active/inactive call for each chemical, whereas a continuous model provides a prediction of the potency (in μM) for each active chemical. Calibration methods: PLS (partial least-squares), PLS-DA (partial least-squares discriminant analysis), SVM (support vector machines), RF (random forest), DF (decision forest), kNN (
Evaluation set for binary categorical models. Distribution of the number of active and inactive chemicals within the three different classes: binding, agonists and antagonists.
| Class/activity | Active | Inactive | Total |
|---|---|---|---|
| Binding | 1,982 | 5,301 | 7,283 |
| Agonist | 350 | 5,969 | 6,319 |
| Antagonist | 284 | 6,255 | 6,539 |
| Total | 2,017 | 7,024 | 7,522 |
The classification into actives and inactives is based on a consensus between the literature data sources that were in agreement.
Evaluation set for quantitative models. Distribution of the number of chemicals in the five potency levels within the three different classes (binding, agonists, and antagonists), classifications based on average scores.
| Class/activity | Inactive | Very weak | Weak | Moderate | Strong | Total |
|---|---|---|---|---|---|---|
| Binding | 5,042 | 685 | 894 | 72 | 77 | 6,770 |
| Agonist | 5,892 | 19 | 179 | 31 | 42 | 6,163 |
| Antagonist | 6,221 | 76 | 188 | 10 | 10 | 6,505 |
| Total | 6,892 | 702 | 916 | 81 | 93 | 7,253 |
The classification of the chemicals in the five potency levels is based on the concentration responses from the literature sources that were in agreement.
Confusion matrices of categorical consensus predictions for binding.
| Observed/predicted | ToxCast™ data predicted actives | ToxCast™ data predicted inactives | Literature evaluation set (all: 7,283) predicted actives | Literature evaluation set (all: 7,283) predicted inactives |
|---|---|---|---|---|
| Observed actives | 76 | 13 | 467 | 1,515 |
| Observed inactives | 25 | 1,415 | 268 | 5,033 |
Statistics of categorical consensus predictions for binding on ToxCast™ and literature data.
| Statistics/used data | ToxCast™ data | Literature evaluation set (all: 7,283) | Literature evaluation set (> 6 sources: 1,257) |
|---|---|---|---|
| Sensitivity | 0.85 | 0.23 | 0.85 |
| Specificity | 0.98 | 0.95 | 0.97 |
| Balanced accuracy | 0.92 | 0.59 | 0.91 |
The literature data with more than six sources represents the most consistent part of the evaluation set.
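The statistics in this table can be reproduced from the confusion matrices for binding. The sketch below assumes the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), balanced accuracy = their mean); the function name `binary_metrics` is illustrative and not taken from the source.

```python
def binary_metrics(tp, fn, fp, tn):
    """Return (sensitivity, specificity, balanced accuracy) for a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)   # fraction of observed actives predicted active
    specificity = tn / (tn + fp)   # fraction of observed inactives predicted inactive
    balanced_accuracy = (sensitivity + specificity) / 2
    return sensitivity, specificity, balanced_accuracy


# Counts copied from the confusion matrices of the categorical consensus for binding.
toxcast = binary_metrics(tp=76, fn=13, fp=25, tn=1415)
literature = binary_metrics(tp=467, fn=1515, fp=268, tn=5033)

for name, (sens, spec, ba) in (("ToxCast", toxcast), ("Literature (all)", literature)):
    print(f"{name}: sensitivity={sens:.3f} specificity={spec:.3f} BA={ba:.3f}")
```

Under these assumed definitions the computed values agree with the tabulated ones to within about 0.01.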
Figure 1. ROC curves of the categorical corrected consensus predictions for binding, evaluated against subsets of the evaluation set with different numbers of literature sources. The number of available chemicals in each subset (in brackets) decreased with higher numbers of literature sources. The true and false positive rates are determined based on the number of actives in each subset.
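As a reminder of how a ROC curve of this kind is built, the following minimal sketch sweeps a decision threshold over per-chemical scores and records the false and true positive rates at each step. The scores and labels are invented for illustration; the source does not give the underlying consensus scores, and `roc_points` and `auc` are hypothetical helper names.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs obtained by lowering a threshold over the scores."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical consensus scores and observed activity (1 = active, 0 = inactive).
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

pts = roc_points(scores, labels)
print(pts)
print(f"AUC = {auc(pts):.3f}")
```

Repeating the calculation on each subset of the evaluation set would give one curve per number of literature sources.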
Figure 2. Box plot of the positive class potency levels in the corrected quantitative consensus predictions for binding. The concordance between models is the fraction of models that agree on the prediction for a given chemical. Boxes extend from the 25th to the 75th percentile, horizontal bars represent the median, whiskers indicate the 10th and 90th percentiles, and outliers are represented as points.
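The concordance defined in this caption can be illustrated with a short sketch. It assumes an unweighted majority call across models, which is a simplification: the source does not describe here how individual models are weighted in the consensus. The function name and the example calls are hypothetical.

```python
from collections import Counter


def concordance(calls):
    """Return (majority call, fraction of models that made that call)."""
    counts = Counter(calls)
    call, n_agree = counts.most_common(1)[0]
    return call, n_agree / len(calls)


# Hypothetical active (1) / inactive (0) calls from five models for one chemical.
calls = [1, 1, 0, 1, 1]
majority, fraction = concordance(calls)
print(majority, fraction)
```

With four of five models calling the chemical active, the majority call is active and the concordance is 0.8.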
Number of chemicals reclassified after applying each one of the four prediction correction rules.
| Rule | Agonist | Antagonist | Binding |
|---|---|---|---|
| Rule 1 | 1,288 | 2,760 | 1,587 |
| Rule 2 | 217 | 14 | 344 |
| Rule 3 | 145 | 161 | 38 |
| Rule 4 | | | 966 |
Rule 1: Chemicals that changed from inactive to active in the quantitative consensus based on the categorical
Confusion matrices of the modified categorical consensus predictions for binding.
| Observed/predicted | ToxCast™ data predicted actives | ToxCast™ data predicted inactives | Literature evaluation set (All: 7,283) predicted actives | Literature evaluation set (All: 7,283) predicted inactives |
|---|---|---|---|---|
| Observed actives | 83 | 6 | 597 | 1,385 |
| Observed inactives | 40 | 1,400 | 463 | 4,838 |
Statistics of the modified categorical consensus for binding predictions on ToxCast™ and literature data.
| Statistics/used data | ToxCast™ data | Literature evaluation set (All: 7,283) | Literature evaluation set (> 6 Sources: 1,275) |
|---|---|---|---|
| Sensitivity | 0.93 | 0.30 | 0.87 |
| Specificity | 0.97 | 0.91 | 0.94 |
| Balanced accuracy | 0.95 | 0.61 | 0.91 |
Figure 3. Variation of the balanced accuracy of the corrected categorical consensus predictions for binding with the positive concordance threshold (agreement between models on predictions for active chemicals), at different numbers of literature sources.
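One way to read this figure is as a threshold sweep: a chemical is called active only when the fraction of models predicting "active" reaches a chosen positive concordance threshold, and the balanced accuracy is recomputed at each threshold. The sketch below follows that reading, which is an assumption about how the curve was produced; all data and function names are hypothetical.

```python
def balanced_accuracy(predicted, observed):
    """Mean of sensitivity and specificity for boolean predictions."""
    tp = sum(p and o for p, o in zip(predicted, observed))
    tn = sum((not p) and (not o) for p, o in zip(predicted, observed))
    n_pos = sum(observed)
    n_neg = len(observed) - n_pos
    return (tp / n_pos + tn / n_neg) / 2


def sweep(pos_concordance, observed, thresholds):
    """Balanced accuracy when 'active' requires concordance >= threshold."""
    results = {}
    for t in thresholds:
        predicted = [c >= t for c in pos_concordance]
        results[t] = balanced_accuracy(predicted, observed)
    return results


# Hypothetical fraction of models predicting "active" for six chemicals,
# and the hypothetical observed activity of each chemical.
pos_concordance = [0.9, 0.7, 0.6, 0.4, 0.2, 0.1]
observed = [True, True, False, True, False, False]

curve = sweep(pos_concordance, observed, thresholds=[0.3, 0.5, 0.8])
for t, ba in curve.items():
    print(f"threshold={t:.1f} balanced accuracy={ba:.3f}")
```

Running the sweep separately on subsets with different minimum numbers of literature sources would yield one curve per subset.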