Kota Kurosaki1, Raymond Wu1, Yoshihiro Uesawa1.
Abstract
Because the health effects of many compounds are unknown, regulatory toxicology must often rely on quantitative structure-activity relationship (QSAR) models to efficiently discover molecular initiating events (MIEs) in the adverse-outcome pathway (AOP) framework. However, the QSAR models used in numerous toxicity prediction studies are not publicly available and are therefore difficult to use in practical applications. Approaches that simultaneously identify the various toxic responses induced by a compound are also scarce. The present study develops Toxicity Predictor, a web application tool that comprehensively identifies potential MIEs. Using chemicals in the Toxicology in the 21st Century (Tox21) 10K library, we identified potential endocrine-disrupting chemicals (EDCs) with a machine-learning approach. We built QSAR models with the XGBoost algorithm from molecular descriptors calculated on optimized three-dimensional (3D) molecular structures. Their predictive performance and applicability domain were evaluated, and the models were incorporated into Toxicity Predictor. The prediction performance of the constructed models matched that of the top model in the Tox21 Data Challenge 2014. These advanced prediction results for MIEs are freely available on the Internet.
Keywords: machine learning; molecular descriptor; nuclear receptor; prediction model; stress response pathway
Year: 2020 PMID: 33113912 PMCID: PMC7660166 DOI: 10.3390/ijms21217853
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Activity distribution of 59 molecular initiating events (MIEs) in the Tox21 10K library: (a) the number of chemical compounds under criteria 40 and (b) the number of chemical compounds under criteria 1. Orange and blue denote active and inactive chemicals, respectively.
Predictive performances in the test set for each target.
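| No. | AID | Abbreviation | AUC (criteria 40) | SE (criteria 40) | SP (criteria 40) | ACC (criteria 40) | BAC (criteria 40) | MCC (criteria 40) | AUC (criteria 1) | SE (criteria 1) | SP (criteria 1) | ACC (criteria 1) | BAC (criteria 1) | MCC (criteria 1) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|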
| 1 | 720516 | ATAD5_ind | 0.840 | 0.750 | 0.843 | 0.840 | 0.796 | 0.272 | 0.845 | 0.744 | 0.847 | 0.839 | 0.795 | 0.395 |
| 2 | 720552 | p53_ago | 0.899 | 0.824 | 0.830 | 0.830 | 0.827 | 0.356 | 0.845 | 0.804 | 0.793 | 0.794 | 0.799 | 0.458 |
| 3 | 720637 | MMP_disr | 0.919 | 0.845 | 0.846 | 0.846 | 0.845 | 0.501 | 0.795 | 0.698 | 0.788 | 0.758 | 0.743 | 0.475 |
| 4 | 720719 | GR_ago | 0.783 | 0.600 | 0.931 | 0.923 | 0.766 | 0.300 | 0.841 | 0.754 | 0.807 | 0.800 | 0.780 | 0.416 |
| 5 | 720725 | GR_ant | 0.808 | 0.577 | 0.905 | 0.888 | 0.741 | 0.328 | 0.827 | 0.801 | 0.721 | 0.743 | 0.761 | 0.471 |
| 6 | 743053 | Arlbd_ago | 0.878 | 0.765 | 0.947 | 0.941 | 0.856 | 0.481 | 0.766 | 0.582 | 0.843 | 0.806 | 0.712 | 0.357 |
| 7 | 743054 | ARfull_ant | 0.774 | 0.750 | 0.681 | 0.683 | 0.716 | 0.169 | 0.833 | 0.841 | 0.700 | 0.734 | 0.770 | 0.468 |
| 8 | 743063 | Arlbd_ant | 0.844 | 0.786 | 0.791 | 0.790 | 0.788 | 0.338 | 0.833 | 0.805 | 0.724 | 0.745 | 0.765 | 0.469 |
| 9 | 743067 | TR_ant | 0.783 | 0.511 | 0.924 | 0.906 | 0.718 | 0.306 | 0.829 | 0.740 | 0.825 | 0.796 | 0.782 | 0.555 |
| 10 | 743077 | ERlbd_ago | 0.782 | 0.536 | 0.961 | 0.938 | 0.748 | 0.457 | 0.735 | 0.600 | 0.843 | 0.812 | 0.722 | 0.362 |
| 11 | 743078 | ERlbd_ant | 0.810 | 0.815 | 0.684 | 0.691 | 0.750 | 0.237 | 0.805 | 0.696 | 0.789 | 0.767 | 0.743 | 0.444 |
| 12 | 743091 | ERfull_ant | 0.826 | 0.872 | 0.699 | 0.705 | 0.785 | 0.235 | 0.862 | 0.730 | 0.870 | 0.842 | 0.800 | 0.555 |
| 13 | 743122 | AhR_ago | 0.888 | 0.713 | 0.907 | 0.887 | 0.810 | 0.513 | 0.749 | 0.728 | 0.695 | 0.702 | 0.711 | 0.359 |
| 14 | 743139 | Arom_ant | 0.801 | 0.892 | 0.598 | 0.608 | 0.745 | 0.186 | 0.807 | 0.825 | 0.661 | 0.704 | 0.743 | 0.429 |
| 15 | 743140 | PPARg_ago | 0.813 | 0.750 | 0.823 | 0.821 | 0.786 | 0.238 | 0.832 | 0.735 | 0.819 | 0.805 | 0.777 | 0.457 |
| 16 | 743199 | PPARg_ant | 0.829 | 0.786 | 0.798 | 0.798 | 0.792 | 0.290 | 0.810 | 0.824 | 0.645 | 0.682 | 0.734 | 0.383 |
| 17 | 743219 | ARE_ago | 0.785 | 0.794 | 0.652 | 0.672 | 0.723 | 0.317 | 0.795 | 0.770 | 0.715 | 0.733 | 0.742 | 0.461 |
| 18 | 743226 | PPARd_ant | 0.681 | 0.600 | 0.885 | 0.884 | 0.743 | 0.111 | 0.811 | 0.764 | 0.749 | 0.751 | 0.756 | 0.374 |
| 19 | 743227 | PPARd_ago | 0.812 | 0.615 | 0.954 | 0.949 | 0.785 | 0.296 | 0.796 | 0.705 | 0.790 | 0.780 | 0.747 | 0.356 |
| 20 | 743228 | HSR_act | 0.788 | 0.576 | 0.922 | 0.910 | 0.749 | 0.315 | 0.790 | 0.667 | 0.808 | 0.789 | 0.737 | 0.370 |
| 21 | 743239 | FXR_ago | 0.775 | 0.727 | 0.836 | 0.835 | 0.782 | 0.163 | 0.817 | 0.689 | 0.834 | 0.825 | 0.762 | 0.325 |
| 22 | 743240 | FXR_ant | 0.757 | 0.933 | 0.565 | 0.577 | 0.749 | 0.178 | 0.843 | 0.788 | 0.799 | 0.798 | 0.794 | 0.481 |
| 23 | 743241 | VDR_ago | N.D. | N.D. | N.D. | N.D. | N.D. | N.D. | 0.826 | 0.769 | 0.727 | 0.730 | 0.748 | 0.297 |
| 24 | 743242 | VDR_ant | 0.716 | 1.000 | 0.399 | 0.403 | 0.699 | 0.066 | 0.701 | 0.630 | 0.689 | 0.678 | 0.660 | 0.258 |
| 25 | 1159518 | NFkB_ago | 0.780 | 0.667 | 0.846 | 0.846 | 0.756 | 0.081 | 0.871 | 0.692 | 0.912 | 0.900 | 0.802 | 0.427 |
| 26 | 1159519 | ERsr_ago | 0.638 | 0.857 | 0.441 | 0.445 | 0.649 | 0.052 | 0.801 | 0.655 | 0.833 | 0.816 | 0.744 | 0.349 |
| 27 | 1159523 | ROR_ant | 0.828 | 0.789 | 0.764 | 0.766 | 0.777 | 0.323 | 0.695 | 0.523 | 0.819 | 0.703 | 0.671 | 0.359 |
| 28 | 1159528 | AP1_ago | 0.777 | 0.553 | 0.877 | 0.851 | 0.715 | 0.319 | 0.799 | 0.765 | 0.722 | 0.729 | 0.743 | 0.372 |
| 29 | 1159531 | RXR_ago | 0.532 | 0.235 | 0.964 | 0.951 | 0.600 | 0.135 | 0.725 | 0.527 | 0.841 | 0.756 | 0.684 | 0.374 |
| 30 | 1159555 | RAR_ant | 0.831 | 0.800 | 0.742 | 0.746 | 0.771 | 0.308 | 0.683 | 0.740 | 0.511 | 0.601 | 0.626 | 0.249 |
| 31 | 1224892 | CAR_ago | 0.889 | 0.826 | 0.808 | 0.810 | 0.817 | 0.455 | 0.847 | 0.684 | 0.889 | 0.845 | 0.787 | 0.556 |
| 32 | 1224893 | CAR_ant | 0.809 | 0.652 | 0.880 | 0.874 | 0.766 | 0.239 | 0.793 | 0.700 | 0.768 | 0.746 | 0.734 | 0.448 |
| 33 | 1224894 | HIF1_ago | 0.556 | 0.250 | 1.000 | 0.997 | 0.625 | 0.499 | 0.854 | 0.769 | 0.829 | 0.824 | 0.799 | 0.395 |
| 34 | 1224895 | TSHR_ago | 0.872 | 0.750 | 0.880 | 0.874 | 0.815 | 0.355 | 0.838 | 0.692 | 0.831 | 0.816 | 0.762 | 0.389 |
| 35 | 1224896 | H2AX_ago | 0.834 | 0.696 | 0.892 | 0.880 | 0.794 | 0.394 | 0.779 | 0.605 | 0.842 | 0.814 | 0.724 | 0.354 |
| 36 | 1259247 | Arfulls_ant | 0.856 | 0.857 | 0.733 | 0.747 | 0.795 | 0.401 | 0.824 | 0.788 | 0.767 | 0.774 | 0.778 | 0.534 |
| 37 | 1259248 | Erfulls_ant | 0.835 | 0.850 | 0.702 | 0.711 | 0.776 | 0.283 | 0.793 | 0.668 | 0.798 | 0.770 | 0.733 | 0.416 |
| 38 | 1259387 | ARant_ago | 0.852 | 0.727 | 0.946 | 0.939 | 0.837 | 0.460 | 0.712 | 0.494 | 0.872 | 0.841 | 0.683 | 0.275 |
| 39 | 1259388 | HDAC_ant | 0.897 | 0.783 | 0.888 | 0.883 | 0.835 | 0.407 | 0.868 | 0.768 | 0.879 | 0.871 | 0.824 | 0.447 |
| 40 | 1259390 | Shh_ago | 0.571 | 1.000 | 0.219 | 0.223 | 0.609 | 0.042 | 0.724 | 0.609 | 0.913 | 0.905 | 0.761 | 0.266 |
| 41 | 1259391 | ERaant_ago | 0.934 | 0.850 | 0.959 | 0.956 | 0.904 | 0.493 | 0.782 | 0.551 | 0.898 | 0.880 | 0.725 | 0.299 |
| 42 | 1259392 | Shh_ant | 0.829 | 0.809 | 0.718 | 0.731 | 0.764 | 0.379 | 0.758 | 0.642 | 0.745 | 0.705 | 0.693 | 0.383 |
| 43 | 1259393 | TSHR_agoant | 0.834 | 0.750 | 0.875 | 0.874 | 0.812 | 0.120 | 0.669 | 0.727 | 0.681 | 0.682 | 0.704 | 0.093 |
| 44 | 1259394 | ERb_ago | 0.980 | 0.923 | 0.973 | 0.972 | 0.948 | 0.531 | 0.729 | 0.444 | 0.937 | 0.900 | 0.691 | 0.348 |
| 45 | 1259395 | TSHR_ant | 0.865 | 0.933 | 0.715 | 0.721 | 0.824 | 0.244 | 0.850 | 0.800 | 0.807 | 0.807 | 0.804 | 0.381 |
| 46 | 1259396 | Erb_ant | 0.825 | 0.677 | 0.863 | 0.851 | 0.770 | 0.352 | 0.798 | 0.743 | 0.763 | 0.758 | 0.753 | 0.462 |
| 47 | 1259401 | ERRPGC_ant | 0.843 | 0.698 | 0.843 | 0.837 | 0.770 | 0.290 | 0.751 | 0.595 | 0.793 | 0.723 | 0.694 | 0.390 |
| 48 | 1259402 | ERRPGC_ago | 0.840 | 0.650 | 0.937 | 0.925 | 0.794 | 0.415 | 0.805 | 0.734 | 0.777 | 0.768 | 0.756 | 0.444 |
| 49 | 1259403 | ERR_ant | 0.812 | 0.653 | 0.856 | 0.835 | 0.755 | 0.392 | 0.819 | 0.696 | 0.826 | 0.786 | 0.761 | 0.510 |
| 50 | 1259404 | ERR_ago | 0.884 | 0.880 | 0.814 | 0.816 | 0.847 | 0.274 | 0.803 | 0.680 | 0.820 | 0.777 | 0.750 | 0.491 |
| 51 | 1347030 | TRHR_ago | 0.748 | 0.833 | 0.637 | 0.638 | 0.735 | 0.077 | 0.751 | 0.593 | 0.853 | 0.846 | 0.723 | 0.201 |
| 52 | 1347031 | PR_ant | 0.892 | 0.880 | 0.794 | 0.804 | 0.837 | 0.473 | 0.831 | 0.757 | 0.821 | 0.802 | 0.789 | 0.550 |
| 53 | 1347032 | TGFb_ant | 0.809 | 0.750 | 0.765 | 0.764 | 0.757 | 0.273 | 0.860 | 0.780 | 0.824 | 0.817 | 0.802 | 0.493 |
| 54 | 1347033 | PXR_ago | 0.851 | 0.759 | 0.817 | 0.805 | 0.788 | 0.517 | 0.838 | 0.745 | 0.817 | 0.790 | 0.781 | 0.556 |
| 55 | 1347034 | CaspH_ind | 0.870 | 0.791 | 0.852 | 0.849 | 0.821 | 0.348 | 0.858 | 0.773 | 0.856 | 0.848 | 0.814 | 0.452 |
| 56 | 1347035 | TGFb_ago | 0.968 | 1.000 | 0.938 | 0.938 | 0.969 | 0.174 | 0.900 | 0.818 | 0.937 | 0.936 | 0.878 | 0.311 |
| 57 | 1347036 | PR_ago | 0.943 | 0.833 | 0.989 | 0.986 | 0.911 | 0.701 | 0.799 | 0.537 | 0.986 | 0.967 | 0.761 | 0.564 |
| 58 | 1347037 | CaspC_ind | 0.884 | 0.850 | 0.785 | 0.786 | 0.817 | 0.216 | 0.863 | 0.771 | 0.882 | 0.878 | 0.827 | 0.351 |
| 59 | 1347038 | TRHR_ant | 0.822 | 0.700 | 0.841 | 0.840 | 0.771 | 0.148 | 0.828 | 0.870 | 0.701 | 0.709 | 0.785 | 0.260 |
AID denotes the PubChem assay ID. Predictive performance was evaluated using the following metrics: area under the receiver operating characteristic curve (AUC), sensitivity (SE), specificity (SP), accuracy (ACC), balanced accuracy (BAC), and Matthews correlation coefficient (MCC). N.D. indicates no data.
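The six metrics named above can be illustrated with a minimal Python sketch. It uses the standard textbook definitions and is not the authors' evaluation code; the function names `classification_metrics` and `roc_auc` and the confusion-matrix counts in the usage lines are hypothetical.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Threshold-based metrics from confusion-matrix counts (standard definitions)."""
    se = tp / (tp + fn)                    # sensitivity: fraction of actives recovered
    sp = tn / (tn + fp)                    # specificity: fraction of inactives recovered
    acc = (tp + tn) / (tp + fn + tn + fp)  # accuracy over all compounds
    bac = (se + sp) / 2                    # balanced accuracy: mean of SE and SP
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SE": se, "SP": sp, "ACC": acc, "BAC": bac, "MCC": mcc}


def roc_auc(labels, scores):
    """AUC as the probability that a random active outscores a random inactive."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical, imbalanced example: 40 actives and 960 inactives.
metrics = classification_metrics(tp=30, fn=10, tn=850, fp=110)
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

Because BAC is the mean of SE and SP, it can be checked directly against the table; for ATAD5_ind under criteria 40, (0.750 + 0.843) / 2 is approximately the reported 0.796. With few actives, ACC can stay high while MCC remains low, which is why several metrics are reported together.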
Figure 2. Receiver operating characteristic (ROC) curves for the test set under criteria 40.
Figure 3. Receiver operating characteristic (ROC) curves for the test set under criteria 1.
Mean predictive performances for all assay targets.
| Metrics | Criteria 40 | Criteria 1 |
|---|---|---|
| AUC | 0.817 ± 0.088 | 0.802 ± 0.051 |
| SE | 0.750 ± 0.151 | 0.705 ± 0.094 |
| SP | 0.809 ± 0.149 | 0.801 ± 0.082 |
| ACC | 0.807 ± 0.144 | 0.788 ± 0.069 |
| BAC | 0.780 ± 0.069 | 0.753 ± 0.045 |
| MCC | 0.307 ± 0.141 | 0.402 ± 0.096 |
Each of the six performance metrics is shown as mean ± standard error. n = 58 (criteria 40), n = 59 (criteria 1).
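A mean ± standard error summary of per-target values can be computed as in the following sketch. It is an illustration only: the helper name `summarize` and the four AUC values are hypothetical and are not taken from the study. The function also returns the sample standard deviation, since the two spread measures are easily confused.

```python
import math
import statistics


def summarize(values):
    """Return (mean, sample standard deviation, standard error of the mean)."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 in the denominator
    se = sd / math.sqrt(n)         # standard error of the mean
    return mean, sd, se


# Hypothetical per-target AUC values, for illustration only.
mean, sd, se = summarize([0.8, 0.9, 0.7, 0.8])
print(f"{mean:.3f} ± {se:.3f} (SE), SD = {sd:.3f}")
```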
Figure 4. Comparison of the Toxicity Predictor models with the Tox21 Data Challenge 2014 models. The figure shows the predictive performance of the top 10 Tox21 Data Challenge models and the Toxicity Predictor models, which were built for 10 toxicity targets (AhR_ago, Arlbd_ago, ERlbd_ago, Arom_ant, PPARg_ago, ARE_ago, ATAD5_ind, HSR_act, MMP_disr, and p53_ago). The horizontal axis denotes the names of the modeling teams in the Tox21 Data Challenge, and the vertical axis indicates the area under the curve (AUC).
Figure 5. The platform screens of Toxicity Predictor.
Figure 6. Prediction results in Toxicity Predictor: (a) the position of the query compound in the chemical space of the training set, visualized with principal component analysis. The gray points are compounds in the training set, and the blue point is the query compound. (b) The results predicted by Toxicity Predictor for 59 MIEs under each of criteria 1 and criteria 40. Normalized prediction scores for each target are displayed as bar charts. Red, blue, and gray bars show scores above 0.6, below 0.4, and between 0.4 and 0.6, respectively.
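The color banding of the bar charts can be expressed as a small rule. The sketch below assumes only the thresholds stated in the caption (above 0.6, below 0.4, and in between); the function name `score_band` and the example scores are hypothetical, and the treatment of scores exactly equal to 0.4 or 0.6 as gray is an assumption, since the caption does not specify the boundary cases.

```python
def score_band(score, low=0.4, high=0.6):
    """Map a normalized prediction score to the bar color described in the caption."""
    if score > high:
        return "red"    # score above 0.6
    if score < low:
        return "blue"   # score below 0.4
    return "gray"       # score between 0.4 and 0.6 (boundaries assumed gray)


# Hypothetical normalized scores for three targets, for illustration only.
example_scores = {"AhR_ago": 0.75, "GR_ago": 0.20, "p53_ago": 0.50}
bands = {target: score_band(s) for target, s in example_scores.items()}
```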
Molecular Initiating Events (MIEs) used in this study.
| No. | AID | Molecular Initiating Events | Activity Type | Abbreviation |
|---|---|---|---|---|
| 1 | 720516 | ATAD5 | genotoxic inducer | ATAD5_ind |
| 2 | 720552 | p53 | agonist | p53_ago |
| 3 | 720637 | mitochondrial membrane potential | disruptor | MMP_disr |
| 4 | 720719 | glucocorticoid receptor | agonist | GR_ago |
| 5 | 720725 | glucocorticoid receptor | antagonist | GR_ant |
| 6 | 743053 | androgen receptor lbd | agonist | Arlbd_ago |
| 7 | 743054 | androgen receptor full | antagonist | ARfull_ant |
| 8 | 743063 | androgen receptor lbd | antagonist | Arlbd_ant |
| 9 | 743067 | thyroid receptor | antagonist | TR_ant |
| 10 | 743077 | estrogen receptor alpha lbd | agonist | ERlbd_ago |
| 11 | 743078 | estrogen receptor alpha lbd | antagonist | ERlbd_ant |
| 12 | 743091 | estrogen receptor alpha full | antagonist | ERfull_ant |
| 13 | 743122 | aryl hydrocarbon receptor | agonist | AhR_ago |
| 14 | 743139 | aromatase | antagonist | Arom_ant |
| 15 | 743140 | peroxisome proliferator-activated receptor gamma | agonist | PPARg_ago |
| 16 | 743199 | peroxisome proliferator-activated receptor gamma | antagonist | PPARg_ant |
| 17 | 743219 | antioxidant response element | agonist | ARE_ago |
| 18 | 743226 | peroxisome proliferator-activated receptor delta | antagonist | PPARd_ant |
| 19 | 743227 | peroxisome proliferator-activated receptor delta | agonist | PPARd_ago |
| 20 | 743228 | heat shock response | activator | HSR_act |
| 21 | 743239 | farnesoid-X-receptor | agonist | FXR_ago |
| 22 | 743240 | farnesoid-X-receptor | antagonist | FXR_ant |
| 23 | 743241 | vitamin D receptor | agonist | VDR_ago |
| 24 | 743242 | vitamin D receptor | antagonist | VDR_ant |
| 25 | 1159518 | NFkB | agonist | NFkB_ago |
| 26 | 1159519 | endoplasmic reticulum stress response | agonist | ERsr_ago |
| 27 | 1159523 | retinoid-related orphan receptor gamma | antagonist | ROR_ant |
| 28 | 1159528 | activator protein-1 | agonist | AP1_ago |
| 29 | 1159531 | retinoid X receptor-alpha | agonist | RXR_ago |
| 30 | 1159555 | retinoic acid receptor | antagonist | RAR_ant |
| 31 | 1224892 | constitutive androstane receptor | agonist | CAR_ago |
| 32 | 1224893 | constitutive androstane receptor | antagonist | CAR_ant |
| 33 | 1224894 | hypoxia | agonist | HIF1_ago |
| 34 | 1224895 | thyroid stimulating hormone receptor | agonist | TSHR_ago |
| 35 | 1224896 | histone variant H2AX | agonist | H2AX_ago |
| 36 | 1259247 | androgen receptor with stimulator | antagonist | Arfulls_ant |
| 37 | 1259248 | estrogen receptor alpha with stimulator | antagonist | Erfulls_ant |
| 38 | 1259387 | androgen receptor with antagonist | agonist | ARant_ago |
| 39 | 1259388 | histone deacetylase | antagonist | HDAC_ant |
| 40 | 1259390 | sonic hedgehog signaling | agonist | Shh_ago |
| 41 | 1259391 | estrogen receptor alpha with antagonist | agonist | ERaant_ago |
| 42 | 1259392 | sonic hedgehog signaling | antagonist | Shh_ant |
| 43 | 1259393 | thyroid stimulating hormone receptor | agonist antagonist | TSHR_agoant |
| 44 | 1259394 | estrogen receptor beta | agonist | ERb_ago |
| 45 | 1259395 | thyroid stimulating hormone receptor | antagonist | TSHR_ant |
| 46 | 1259396 | estrogen receptor beta | antagonist | Erb_ant |
| 47 | 1259401 | estrogen related receptor with PGC | antagonist | ERRPGC_ant |
| 48 | 1259402 | estrogen related receptor with PGC | agonist | ERRPGC_ago |
| 49 | 1259403 | estrogen related receptor | antagonist | ERR_ant |
| 50 | 1259404 | estrogen related receptor | agonist | ERR_ago |
| 51 | 1347030 | thyrotropin releasing hormone receptor | agonist | TRHR_ago |
| 52 | 1347031 | progesterone receptor | antagonist | PR_ant |
| 53 | 1347032 | transforming growth factor beta | antagonist | TGFb_ant |
| 54 | 1347033 | human pregnane X receptor | agonist | PXR_ago |
| 55 | 1347034 | caspase-3/7 in HepG2 | inducer | CaspH_ind |
| 56 | 1347035 | transforming growth factor beta | agonist | TGFb_ago |
| 57 | 1347036 | progesterone receptor | agonist | PR_ago |
| 58 | 1347037 | caspase-3/7 in CHO-K1 | inducer | CaspC_ind |
| 59 | 1347038 | thyrotropin releasing hormone receptor | antagonist | TRHR_ant |
AID denotes the PubChem assay ID.
Figure 7. Relationship between the thresholds and the active/inactive judgment. Red and white squares denote active and inactive judgments, respectively. Blue squares denote AIDs and SIDs.
Figure 8. The modeling pipeline used in this study, which integrates a validator, a recorder, and a filter.