Xiaocong Pang1, Weiqi Fu1, Jinhua Wang1,2,3, Lvjie Xu1, Ying Zhao1, Ai-Lin Liu1,2,3, Guan-Hua Du1,2,3.
Abstract
Estrogen receptor α (ERα) is a successful target for ER-positive breast cancer and is also reported to be relevant in many other diseases. Selective estrogen receptor modulators (SERMs) show good therapeutic effects in the clinic. Because of the drug resistance and side effects of current SERMs, the discovery of new SERMs is receiving increasing attention. Virtual screening is a validated and highly effective method for identifying novel bioactive small molecules. Ligand-based machine learning methods and structure-based molecular docking were first applied to identify ERα antagonists from an in-house natural product library. Naive Bayesian and recursive partitioning models with two kinds of descriptors were built and validated using a training set, a test set, and an external test set and were then used to distinguish active from inactive compounds. In total, 162 compounds were predicted to be ER antagonists and were further evaluated by molecular docking. According to docking score, we selected 8 representative compounds for both an ERα competitor assay and a luciferase reporter gene assay. Genistein, daidzein, phloretin, ellagic acid, ursolic acid, (-)-epigallocatechin-3-gallate, kaempferol, and naringenin exhibited different levels of antagonistic activity against ERα. These studies validated the feasibility of machine learning methods for predicting the bioactivities of ligands and provided better insight into natural products acting as estrogen receptor modulators, which are important lead compounds for future drug design.
Year: 2018 PMID: 29861831 PMCID: PMC5971309 DOI: 10.1155/2018/6040149
Source DB: PubMed Journal: Oxid Med Cell Longev ISSN: 1942-0994 Impact factor: 6.543
56 molecular descriptors selected by the Pearson correlation analysis and stepwise regression.
| Descriptor class | Number of descriptors | Descriptors |
|---|---|---|
| DS descriptors | 24 | PEOE_VSA_FNEG, PEOE_RPC+, a_base, a_ICM, a_nBr, a_nCl, a_nN, a_nO, a_nS, ast_violation, b_rotR, BCUT_SMR_0, BCUT_SMR_1, chi1_C, chiral_u, density, GCUT_PEOE_1, GCUT_SLOGP_1, GCUT_SMR_0, PEOE_VSA_NEG, PEOE_VSA_POS, PEOE_VSA+3, PEOE_VSA-0, PEOE_VSA-1, PEOE_VSA-3, PEOE_VSA-6, radius, reactive, rings, SMR_VSA0, vdw_vol, vsa_don |
| MOE descriptors | 32 | a_donacc, a_ICM, a_nCl, a_nN, a_nO, a_nS, b_rotR, BCUT_SMR_1, chi1_C, chiral_u, density, FCharge, GCUT_PEOE_1, GCUT_SLOGP_1, PEOE_RPC, PEOE_RPC+, PEOE_VSA_FNEG, PEOE_VSA_FPOL, PEOE_VSA_NEG, PEOE_VSA_POS, PEOE_VSA+1, PEOE_VSA-0, PEOE_VSA-1, PEOE_VSA-3, PEOE_VSA-6, rings, SlogP, SlogP_VSA7, SMR_VSA0, SMR_VSA7, vdw_vol, vsa_don |
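The caption of this table names Pearson correlation analysis as one of the descriptor-selection steps. The following is a minimal sketch of one common way such a filter can work, assuming a pairwise cutoff on |r|. The 0.9 threshold, the function names `pearson` and `drop_correlated`, and the toy descriptor values are illustrative assumptions and are not taken from the paper; the stepwise regression step is not shown.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant descriptor has no defined correlation
    return sxy / sqrt(sxx * syy)

def drop_correlated(descriptors, threshold=0.9):
    """Greedily keep a descriptor only if |r| with every kept one is below threshold.

    `threshold` is a hypothetical cutoff chosen for illustration.
    """
    kept = []
    for name, values in descriptors.items():
        if all(abs(pearson(values, descriptors[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Toy values only; the descriptor names are borrowed from the table for readability.
toy = {
    "a_nO": [1, 2, 3, 4, 5],
    "vsa_don": [2, 4, 6, 8, 10.5],  # nearly proportional to a_nO, so it is dropped
    "rings": [3, 1, 4, 1, 5],
}
selected = drop_correlated(toy)
print(selected)
```

The greedy order matters: the first descriptor of a correlated pair is the one that survives, so a real workflow would usually sort descriptors by some relevance criterion first.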
Figure 1. Diversity distribution of the training set and test set as described by principal component analysis (PCA).
Performance of Bayesian and recursive partitioning models and their 5-fold cross-validation results.
| Model | TP | FN | FP | TN | SE | SP | MCC | Q+ | Q− |
|---|---|---|---|---|---|---|---|---|---|
| NB-a | 1091 | 35 | 223 | 5206 | 0.969 | 0.959 | 0.874 | 0.830 | 0.993 |
| NB-b | 1119 | 8 | 21 | 5408 | 0.993 | 0.996 | 0.985 | 0.982 | 0.999 |
| NB-c | 749 | 316 | 455 | 5037 | 0.704 | 0.917 | 0.591 | 0.622 | 0.941 |
| NB-d | 1054 | 11 | 75 | 5416 | 0.990 | 0.986 | 0.953 | 0.933 | 0.998 |
| RP-a | 1113 | 13 | 51 | 5379 | 0.988 | 0.991 | 0.966 | 0.956 | 0.998 |
| RP-b | 1111 | 15 | 49 | 5381 | 0.987 | 0.991 | 0.966 | 0.958 | 0.997 |
| RP-c | 1007 | 57 | 177 | 5314 | 0.946 | 0.968 | 0.876 | 0.850 | 0.989 |
| RP-d | 1022 | 42 | 95 | 5396 | 0.960 | 0.983 | 0.925 | 0.915 | 0.992 |
Note: NB-a: NB model with MOE 2D descriptors; NB-b: NB model with MOE 2D descriptors + ECFP_6; NB-c: NB model with DS 2D descriptors; NB-d: NB model with DS 2D descriptors + ECFP_6; RP-a: RP model with MOE 2D descriptors; RP-b: RP model with MOE 2D descriptors + ECFP_6; RP-c: RP model with DS 2D descriptors; RP-d: RP model with DS 2D descriptors + ECFP_6.
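The column abbreviations in the table above are not defined on this page. Assuming the standard definitions (SE: sensitivity, SP: specificity, Q+: precision for actives, Q−: precision for inactives, MCC: Matthews correlation coefficient), the reported values can be recomputed from the confusion-matrix counts. The sketch below does this for the NB-a row; the function name is illustrative, and it assumes no row or column total of the confusion matrix is zero.

```python
from math import sqrt

def classification_metrics(tp, fn, fp, tn):
    """Compute SE, SP, MCC, Q+ and Q- from confusion-matrix counts."""
    se = tp / (tp + fn)        # sensitivity: fraction of actives recovered
    sp = tn / (tn + fp)        # specificity: fraction of inactives recovered
    q_pos = tp / (tp + fp)     # precision of the "active" predictions
    q_neg = tn / (tn + fn)     # precision of the "inactive" predictions
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SE": se, "SP": sp, "MCC": mcc, "Q+": q_pos, "Q-": q_neg}

# Counts from the NB-a row of the table above.
m = classification_metrics(tp=1091, fn=35, fp=223, tn=5206)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Under these definitions the NB-a row is reproduced to three decimals (SE 0.969, SP 0.959, MCC 0.874, Q+ 0.830, Q− 0.993), which supports reading the columns this way.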
Figure 2. Y-scrambling, repeated 30 times, to evaluate the possibility of chance correlation in the naive Bayesian (NB) and recursive partitioning (RP) models, measured by the Matthews correlation coefficient (MCC). The red bars show the performance of NB and RP without Y-scrambling. The blue bars show the decreased MCC values after Y-scrambling.
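The idea behind Y-scrambling can be sketched as follows: the activity labels are randomly permuted, the model is refit, and its MCC is compared with that of the model trained on the real labels. This is a minimal illustration under stated assumptions, not the authors' procedure: a nearest-centroid classifier stands in for the NB and RP models, the data are synthetic, and all function names are hypothetical. Only the 30 repetitions come from the caption.

```python
import random
from math import sqrt

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def fit_centroids(X, y):
    """Stand-in classifier (an assumption): mean feature vector per class."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, X):
    """Assign each sample to the class with the nearer centroid."""
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [0 if dist2(x, centroids[0]) <= dist2(x, centroids[1]) else 1 for x in X]

def y_scramble(X_train, y_train, X_test, y_test, n_rounds=30, seed=0):
    """Return the MCC with real labels and the MCCs after label permutation."""
    rng = random.Random(seed)
    true_mcc = mcc(y_test, predict(fit_centroids(X_train, y_train), X_test))
    scrambled = []
    n_train = len(y_train)
    for _ in range(n_rounds):
        labels = list(y_train) + list(y_test)
        rng.shuffle(labels)  # break the link between structure and activity
        y_tr, y_te = labels[:n_train], labels[n_train:]
        model = fit_centroids(X_train, y_tr)
        scrambled.append(mcc(y_te, predict(model, X_test)))
    return true_mcc, scrambled

# Synthetic two-class data: 150 "inactive" points near 0 and 150 "active" near 3.
rng = random.Random(42)
data = [([rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)], 0) for _ in range(150)]
data += [([rng.gauss(3.0, 1.0), rng.gauss(3.0, 1.0)], 1) for _ in range(150)]
rng.shuffle(data)
X = [x for x, _ in data]
y = [t for _, t in data]

true_mcc, scrambled = y_scramble(X[:200], y[:200], X[200:], y[200:])
mean_scrambled = sum(scrambled) / len(scrambled)
print(f"MCC with real labels: {true_mcc:.3f}")
print(f"mean MCC after scrambling: {mean_scrambled:.3f}")
```

A model that has learned a real structure-activity relationship should score far above the scrambled runs, whose MCC values scatter around zero.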
Performance of Bayesian and recursive partitioning models on the test set.
| Model | TP | FN | FP | TN | SE | SP | MCC | Q+ | Q− |
|---|---|---|---|---|---|---|---|---|---|
| NB-a | 426 | 7 | 68 | 2018 | 0.985 | 0.967 | 0.904 | 0.862 | 0.997 |
| NB-b | 425 | 8 | 27 | 2059 | 0.981 | 0.987 | 0.952 | 0.941 | 0.996 |
| NB-c | 346 | 62 | 356 | 1756 | 0.849 | 0.832 | 0.559 | 0.493 | 0.966 |
| NB-d | 389 | 19 | 40 | 2071 | 0.954 | 0.981 | 0.917 | 0.907 | 0.991 |
| RP-a | 426 | 8 | 31 | 2055 | 0.983 | 0.985 | 0.948 | 0.932 | 0.996 |
| RP-b | 427 | 6 | 29 | 2057 | 0.987 | 0.986 | 0.953 | 0.936 | 0.997 |
| RP-c | 379 | 54 | 81 | 2005 | 0.875 | 0.961 | 0.817 | 0.824 | 0.974 |
| RP-d | 401 | 33 | 35 | 2051 | 0.925 | 0.983 | 0.906 | 0.920 | 0.984 |
Note: NB-a: NB model with MOE 2D descriptors; NB-b: NB model with MOE 2D descriptors + ECFP_6; NB-c: NB model with DS 2D descriptors; NB-d: NB model with DS 2D descriptors + ECFP_6; RP-a: RP model with MOE 2D descriptors; RP-b: RP model with MOE 2D descriptors + ECFP_6; RP-c: RP model with DS 2D descriptors; RP-d: RP model with DS 2D descriptors + ECFP_6.
Figure 3. MCC values of the 8 classifiers on the external test set. The NB and RP models built with MOE 2D descriptors plus ECFP_6 were the most powerful for predicting ERα antagonists.
Figure 4. Examples of 10 good (a) and bad (b) fragments evaluated by the NB-b model. The Bayesian score (score) is given for each fragment.
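The page does not state how the per-fragment Bayesian score is computed. One formulation commonly used for naive Bayes models on fingerprint features is the Laplacian-corrected estimator, in which a fragment seen in N samples, A of them active, receives the weight log((A + 1) / (N · P + 1)), where P is the overall fraction of actives. The sketch below assumes that formulation; the fragment names, the toy data, and the function names are hypothetical.

```python
from math import log

def fragment_weights(samples, labels):
    """Laplacian-corrected log weight per fragment (assumed formulation).

    samples: list of sets of fragment identifiers, one set per compound
    labels:  list of 1 (active) or 0 (inactive), aligned with samples
    """
    base_rate = sum(labels) / len(samples)  # overall fraction of actives, P
    counts = {}  # fragment -> [compounds containing it, actives containing it]
    for fragments, label in zip(samples, labels):
        for frag in fragments:
            entry = counts.setdefault(frag, [0, 0])
            entry[0] += 1
            entry[1] += label
    return {
        frag: log((n_active + 1.0) / (n_total * base_rate + 1.0))
        for frag, (n_total, n_active) in counts.items()
    }

def bayesian_score(fragments, weights):
    """Sum the weights of the fragments present; unseen fragments contribute 0."""
    return sum(weights.get(frag, 0.0) for frag in fragments)

# Toy data with made-up fragment names, for illustration only.
samples = [
    {"phenol", "chromone"}, {"phenol"}, {"phenol", "ester"},
    {"ester"}, {"amide"}, {"amide", "ester"},
]
labels = [1, 1, 1, 0, 0, 0]

weights = fragment_weights(samples, labels)
for frag, w in sorted(weights.items(), key=lambda item: -item[1]):
    print(f"{frag}: {w:+.3f}")
```

In this scheme a positive weight marks a fragment enriched among actives (a "good" fragment) and a negative weight marks one enriched among inactives (a "bad" fragment); the correction pulls the weight of rarely seen fragments toward zero.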
IC50 values (nM) of 8 representative natural products as ERα antagonists and their binding affinities evaluated with Discovery Studio 2016.
| Chemical | IC50 (nM) | -CDOCKER_ENERGY | -CDOCKER_INTERACTION_ENERGY |
|---|---|---|---|
| Estradiol | 7.38 ± 0.80 | 32.60 | 42.1106 |
| Genistein | 29.38 ± 6.13 | 31.63 | 45.9033 |
| Daidzein | 107.62 ± 9.38 | 38.321 | 42.813 |
| Phloretin | 74.55 ± 24.24 | 44.56 | 49.7287 |
| Ellagic acid | 62.61 ± 9.34 | 27.54 | 40.3217 |
| EGCG | 66.01 ± 11.59 | 46.72 | 64.262 |
| Ursolic acid | 977.38 ± 125.30 | −65.38 | 24.1527 |
| Kaempferol | 316.67 ± 14.33 | 30.31 | 41.5717 |
| Naringenin | 967.54 ± 70.95 | 30.08 | 38.021 |
Figure 5. Binding modes of 6 compounds with different skeleton structures: genistein (a), naringenin (b), EGCG (c), phloretin (d), ellagic acid (e), and ursolic acid (f), which belong to the isoflavone, flavone, catechin, dihydrochalcone, polyphenol, and triterpenoid classes, respectively.
Figure 6. Antiestrogenic effects of 8 natural products in the ERα transactivation assay using MCF-7 cells transiently transfected with pERE-TATA-Luc. Cells were treated with the tested chemicals at a series of concentrations. The values represent the mean ± SD of three independent experiments and are presented as the percentage of the response, with the control defined as 100%. ∗P < 0.01, ∗∗P < 0.05, and ∗∗∗P < 0.001.