Huixiao Hong, Jie Shen, Hui Wen Ng, Sugunadevi Sakkiah, Hao Ye, Weigong Ge, Ping Gong, Wenming Xiao, Weida Tong.
Abstract
Endocrine disruptors such as polychlorinated biphenyls (PCBs), diethylstilbestrol (DES) and dichlorodiphenyltrichloroethane (DDT) are agents that interfere with the endocrine system and cause adverse health effects, and they have become a major public health concern. One mechanism of endocrine disruption is the binding of endocrine disruptors to hormone receptors in target cells, so entry of a disruptor into the target cells is a precondition of endocrine disruption. The binding capability of a chemical with proteins in the blood affects its entry into the target cells and is therefore very informative for assessing the potential endocrine disruption of chemicals. α-Fetoprotein is one of the major serum proteins and binds a variety of chemicals such as estrogens. To facilitate assessment of endocrine disruption of environmental chemicals, we developed a model for predicting α-fetoprotein binding activity using a novel pattern recognition method (Decision Forest) and molecular descriptors calculated from two-dimensional structures by Mold² software. The predictive capability of the model was evaluated through internal validation using 125 training chemicals (average balanced accuracy of 69%) and external validation using 22 chemicals (balanced accuracy of 71%). Prediction confidence analysis revealed that the model performed much better at high prediction confidence. Our results indicate that the model is useful (when predictions are made with high confidence) in endocrine disruption risk assessment of environmental chemicals, though improvement by increasing the number of training chemicals is needed.
Keywords: assessment; binding; disruption; endocrine; model; prediction; α-fetoprotein
Year: 2016 PMID: 27023588 PMCID: PMC4847034 DOI: 10.3390/ijerph13040372
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
Figure 1. Overview of the study design.
Figure 2. Boxplots for the predictions from the DF models in the 5-fold cross validations. Performance was measured by the metrics indicated on the x-axis.
Summary of cross validations, permutation tests, and external validation.
| Parameter | Cross Validations (Mean) | Cross Validations (STD) | Permutation Tests (Mean) | Permutation Tests (STD) | External Validation |
|---|---|---|---|---|---|
| Accuracy | 0.689 | ±0.034 | 0.498 | ±0.049 | 0.546 |
| Sensitivity | 0.675 | ±0.054 | 0.427 | ±0.067 | 0.412 |
| Specificity | 0.700 | ±0.046 | 0.558 | ±0.061 | 1.000 |
| MCC | 0.570 | ±0.026 | 0.497 | ±0.009 | 0.371 |
| Balanced accuracy | 0.688 | ±0.034 | 0.492 | ±0.050 | 0.706 |
STD: standard deviation.
Figure 3. Distributions of the 1000 prediction accuracy values calculated from the DF models in the permutation tests (red line) and from the DF models in the cross validations (blue line).
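The comparison in Figure 3 rests on a label-permutation test: the activity labels are shuffled, the model is rebuilt and cross-validated, and the resulting accuracies are compared with those obtained from the true labels. The sketch below illustrates the idea only. It assumes a simple one-dimensional nearest-centroid classifier as a stand-in for Decision Forest, synthetic data in place of the 125 training chemicals, and 200 permutations rather than 1000; all function and variable names are hypothetical.

```python
import random
from statistics import mean

def nearest_centroid_cv_accuracy(xs, ys, n_folds=5):
    """Cross-validated accuracy of a 1-D nearest-centroid classifier
    (a stand-in learner, not Decision Forest)."""
    n = len(xs)
    correct = 0
    for fold in range(n_folds):
        # Interleaved folds: every n_folds-th sample goes to the test set.
        test_idx = set(range(fold, n, n_folds))
        train = [(xs[i], ys[i]) for i in range(n) if i not in test_idx]
        c1 = mean(x for x, y in train if y == 1)  # centroid of class 1
        c0 = mean(x for x, y in train if y == 0)  # centroid of class 0
        for i in test_idx:
            pred = 1 if abs(xs[i] - c1) < abs(xs[i] - c0) else 0
            correct += pred == ys[i]
    return correct / n

# Synthetic, well-separated data (assumption: illustration only).
rng = random.Random(0)
xs = [rng.gauss(2.0, 1.0) for _ in range(50)] + \
     [rng.gauss(-2.0, 1.0) for _ in range(50)]
ys = [1] * 50 + [0] * 50

# Accuracy with the true labels.
observed = nearest_centroid_cv_accuracy(xs, ys)

# Accuracy distribution after shuffling the labels.
permuted = []
for _ in range(200):
    shuffled = ys[:]
    rng.shuffle(shuffled)
    permuted.append(nearest_centroid_cv_accuracy(xs, shuffled))

# Empirical p-value: share of permuted runs at least as good as the real one.
p_value = (1 + sum(a >= observed for a in permuted)) / (1 + len(permuted))
print(observed, mean(permuted), p_value)
```

If the model has learned a real structure-activity relationship, the accuracy with true labels should sit well above the permuted distribution, which is the separation the two curves in Figure 3 are meant to show.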
Figure 4. Predictions and accuracy at different confidence levels. The distribution of predictions is given on the left y-axis and the prediction accuracy on the right y-axis. Prediction confidence is given on the x-axis. All predictions are plotted as a green line, correct predictions as a blue line, incorrect predictions as a red line, and prediction accuracy as a black line.
Figure 5. The distribution of descriptors used in the DF models.
Informative descriptors identified from the cross validations.
| ID | Models | Descriptor Definition |
|---|---|---|
| D282 | 4429 | complementary information content (neighborhood symmetry of 2-order) |
| D281 | 4099 | structural information content (neighborhood symmetry of 2-order) |
| D450 | 4075 | Geary autocorrelation-lag 4/weighted by atomic masses |
| D432 | 3916 | Broto-Moreau autocorrelation of a topological structure-lag 2/weighted by atomic Sanderson electronegativity |
| D458 | 3770 | Geary autocorrelation-lag 4/weighted by atomic van der Waals volumes |
| D361 | 3391 | ratio of multiple path counts to path counts |
| D213 | 3233 | valence connectivity index chi-1 |
| D467 | 3225 | Geary autocorrelation-lag 5/weighted by atomic Sanderson electronegativity |
| D491 | 3091 | Moran autocorrelation-lag 5/weighted by atomic van der Waals volumes |
| D259 | 3084 | mean information content on the distance degree equality |
| D496 | 2272 | Moran autocorrelation-lag 2/weighted by atomic Sanderson electronegativity |
| D478 | 2238 | Geary autocorrelation-lag 8/weighted by atomic polarizabilities |
| D463 | 2024 | Geary autocorrelation-lag 1/weighted by atomic Sanderson electronegativity |
| D246 | 1995 | Maximum of the differences between vertex distance and unipolarity |
| D473 | 1799 | Geary autocorrelation-lag 3/weighted by atomic polarizabilities |
| D595 | 1698 | highest eigenvalue n. 8 of Burden matrix/weighted by atomic polarizabilities |
Figure 6. Decision trees of the AFP binding activity prediction DF model. The descriptors and the criteria used to split the intermediate nodes are given under the nodes. The left nodes are the sets of chemicals that meet the criteria for splitting their parent nodes; the right nodes are the sets of chemicals that do not meet the criteria. The root node (the whole training data set) and the intermediate nodes are shown as empty (white) circles. The letter Y in a circle indicates that the chemicals in the node meet the splitting criterion, whereas the letter N means that they do not. The terminal nodes are the leaves of the trees, where the AFP binding activity predictions are determined, and are shown as grey circles. The number 1 in a circle indicates that the chemicals in the node are predicted as AFP binders, while the number 0 marks a node where chemicals are predicted as AFP non-binders.
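The tree layout described in the Figure 6 caption can be sketched in code as follows. This is a minimal illustration, not the published model: the descriptor IDs are borrowed from the informative-descriptor table, but the thresholds, the tree shapes, the "descriptor < threshold" form of the splitting criterion, and the rule for combining trees (mean of the tree outputs, cut at 0.5) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class Node:
    # Intermediate node: split on descriptor < threshold (assumed form).
    descriptor: Optional[str] = None
    threshold: Optional[float] = None
    yes: Optional["Node"] = None  # left child: criterion met (Y)
    no: Optional["Node"] = None   # right child: criterion not met (N)
    # Terminal node: 1 = predicted AFP binder, 0 = predicted non-binder.
    label: Optional[int] = None

def predict_tree(node: Node, chem: Dict[str, float]) -> int:
    """Walk from the root to a leaf and return the leaf label."""
    while node.label is None:
        met = chem[node.descriptor] < node.threshold
        node = node.yes if met else node.no
    return node.label

def predict_forest(trees: List[Node], chem: Dict[str, float]) -> Tuple[int, float]:
    """Combine trees by averaging their outputs (assumed combination rule)."""
    p = sum(predict_tree(t, chem) for t in trees) / len(trees)
    return (1 if p >= 0.5 else 0), p

# Two hypothetical trees with made-up thresholds.
tree_a = Node("D282", 0.5,
              yes=Node(label=1),
              no=Node("D213", 3.0, yes=Node(label=0), no=Node(label=1)))
tree_b = Node("D450", 1.0, yes=Node(label=0), no=Node(label=1))

# Hypothetical descriptor values for two chemicals.
chem_1 = {"D282": 0.3, "D213": 4.2, "D450": 1.7}
chem_2 = {"D282": 0.9, "D213": 2.0, "D450": 0.4}

print(predict_forest([tree_a, tree_b], chem_1))
print(predict_forest([tree_a, tree_b], chem_2))
```

Each chemical follows the Y branch when it meets the splitting criterion and the N branch otherwise, ending at a grey leaf whose label (1 or 0) is that tree's prediction.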
The experimental and predicted AFP binding activity of the external data set.
| Chemical Name | Experiment | Prediction | Reference |
|---|---|---|---|
| 17-α-Ethynylestradiol | 1 | 1 | [ |
| 11-β-Ethyloxyestradiol | 1 | 0 | [ |
| 11-β-Methoxyestradiol | 1 | 1 | [ |
| Compound | 1 | 0 | [ |
| 16-α-Fluoroestradiol (FES) | 1 | 1 | [ |
| Compound | 1 | 0 | [ |
| Compound | 1 | 1 | [ |
| Compound | 1 | 1 | [ |
| Compound | 1 | 0 | [ |
| Compound | 1 | 0 | [ |
| Compound | 1 | 1 | [ |
| 11-β-Ethyl-17-α-ethynylestradiol | 1 | 0 | [ |
| 11-β-Ethylestradiol | 1 | 0 | [ |
| Compound | 1 | 0 | [ |
| 17-α-Ethynyl-11-β-Methoxyestradiol | 1 | 0 | [ |
| Compound | 1 | 0 | [ |
| 4-Nonylphenoxyacetic acid (NP1EC) | 1 | 1 | [ |
| 4- | 0 | 0 | [ |
| Igepal | 0 | 0 | [ |
| 2,4’-DDT | 0 | 0 | [ |
| 2,4’-DDE | 0 | 0 | [ |
| Kepone | 0 | 0 | [ |
AFP binding data: 1 represents binder and 0 indicates non-binder.