Alfonso T García-Sosa, Uko Maran.
Abstract
Many chemicals that enter the environment, the food chain, and the human body can disrupt androgen-dependent pathways and mimic hormones, and may therefore be responsible for multiple diseases, ranging from reproductive disorders to tumors. Modeling and predicting androgen receptor activity is thus an important area of research. The aim of the current study was to find a method, or a combination of methods, to predict compounds that can bind to and/or disrupt the androgen receptor, and thereby guide decision making and further analysis. A stepwise procedure started with the analysis of protein structures from human, chimpanzee, and rat, followed by docking and then by ligand-based and statistics-based techniques that gradually improved classification. The best methods used multivariate logistic regression on combinations of chimpanzee protein structural docking scores, extended connectivity fingerprints, and naïve Bayesian models of known binders and non-binders. Combination or consensus methods included data from a variety of procedures to improve the final model accuracy.
Keywords: androgen receptor; bayesian; chemical fingerprints; chimp; docking; ecfp; human; multivariate logistic regression; rat; toxicity
Year: 2021 PMID: 34206613 PMCID: PMC8267747 DOI: 10.3390/ijms22136695
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Receiver-operator curves for docking to the human (in yellow), rat (in purple), and chimp (in green) androgen receptor as compared to a random pick (diagonal line). Chimp AUC = 0.832; human AUC = 0.797; and rat AUC = 0.744.
Figure 2. Distributions of docking scores (in kcal/mol) for binding (agonists plus antagonists) and non-binding compounds for the training set against chimp protein.
Figure 3. Flowchart of the different Procedures (1–13, see Section 3.2 in text) applied on the training set.
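The AUC values quoted for Figure 1 summarize how well docking scores rank binders ahead of non-binders. A minimal sketch of the rank-based (Mann-Whitney) AUC is given below. It assumes that a lower (more negative) docking score means more favorable predicted binding; the function name `roc_auc` and the toy scores are hypothetical illustrations and are not taken from the study.

```python
def roc_auc(binder_scores, nonbinder_scores):
    """Rank-based AUC: fraction of (binder, non-binder) pairs in which the
    binder has the more favorable (lower) docking score; ties count 0.5."""
    wins = 0.0
    for b in binder_scores:
        for n in nonbinder_scores:
            if b < n:
                wins += 1.0
            elif b == n:
                wins += 0.5
    return wins / (len(binder_scores) * len(nonbinder_scores))


# Hypothetical docking scores in kcal/mol (illustrative only, not study data).
binders = [-9.1, -8.4, -7.9, -6.2]
nonbinders = [-8.0, -6.5, -5.8, -5.1, -4.3]

auc = roc_auc(binders, nonbinders)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to the random-pick diagonal in Figure 1, and 1.0 to a perfect ranking.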
True positives (TP), true negatives (TN), false positives (FP), false negatives (FN), specificity (SP, %), sensitivity (SE, %), accuracy (Acc., %), and Matthews correlation coefficient (MCC) for 13 different procedures for the training set.
| Procedure | TP | TN | FP | FN | SP (%) | SE (%) | Acc. (%) | MCC |
|---|---|---|---|---|---|---|---|---|
| Docking score threshold | 97 | 1157 | 323 | 108 | 78.18 | 47.32 | 74.42 | 0.1926 |
| Bayesian on scores | 89 | 1232 | 248 | 116 | 83.24 | 43.41 | 78.40 | 0.2179 |
| Logistic regression on Bayesian | 100 | 1078 | 402 | 105 | 72.84 | 48.78 | 69.91 | 0.1545 |
| Modified Bayesian | 90 | 1233 | 247 | 115 | 83.31 | 43.90 | 78.52 | 0.2224 |
| Fingerprints (ECFP) | 189 | 795 | 685 | 16 | 53.72 | 92.20 | 58.40 | 0.3004 |
| Docking scores then ECFP | 169 | 978 | 502 | 36 | 66.08 | 82.44 | 68.07 | 0.3240 |
| ECFP then docking scores | 191 | 712 | 768 | 14 | 48.11 | 93.17 | 53.59 | 0.2725 |
| Logistic regr. on docking scores and ECFP | 71 | 1463 | 17 | 134 | 98.85 | 34.63 | 91.04 | 0.4920 |
| Consensus Docking and ECFP else | 170 | 1054 | 426 | 35 | 71.22 | 82.93 | 72.64 | 0.3702 |
| Consensus Bayesian and ECFP else | 117 | 1302 | 178 | 88 | 87.97 | 57.07 | 84.21 | 0.3875 |
| Logistic regr. on docking scores and ECFP and ratio of Bayesian | 69 | 1463 | 17 | 136 | 98.85 | 33.66 | 90.92 | 0.4829 |
| Logistic regression on Bayesian avgs. and fingerprints | 42 | 1462 | 18 | 163 | 98.78 | 20.49 | 89.26 | 0.3400 |
| Logistic regression on docking scores and Bayesian avgs. and fingerprints | 75 | 1469 | 11 | 130 | 99.26 | 36.59 | 91.63 | 0.5324 |
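The SP, SE, Acc., and MCC columns follow directly from the four confusion-matrix counts. The sketch below uses the standard textbook definitions, which is an assumption about how the authors computed them, although the counts in the last row of the table (TP = 75, TN = 1469, FP = 11, FN = 130) reproduce the reported 99.26, 36.59, 91.63, and 0.5324. The function name `confusion_metrics` is hypothetical.

```python
import math


def confusion_metrics(tp, tn, fp, fn):
    """Return SP, SE, Acc. (all in %) and MCC from confusion-matrix counts."""
    specificity = 100.0 * tn / (tn + fp)   # true negative rate
    sensitivity = 100.0 * tp / (tp + fn)   # true positive rate
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a marginal total is zero; report 0.0 in that case.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SP": specificity, "SE": sensitivity, "Acc": accuracy, "MCC": mcc}


# Counts from the last row of the table above.
best = confusion_metrics(tp=75, tn=1469, fp=11, fn=130)
print({name: round(value, 4) for name, value in best.items()})
```

Because non-binders outnumber binders roughly 7 to 1 in this training set, accuracy is dominated by specificity, which is why MCC is the more informative single-number summary here.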
PPV (probability of predicted binder), NPV (probability of predicted nonbinder), positive (+LR) and negative (−LR) likelihood ratios, and BCR (modified correct classification rate) for 13 different procedures for the training set.
| Procedure | NPV | PPV | +LR | −LR | BCR |
|---|---|---|---|---|---|
| | 91.4145 | 19.8795 | 2.1133 | 0.7999 | 0.2774 |
| | 91.3944 | 22.6107 | 2.1079 | 0.6793 | 0.4356 |
| | 91.1243 | 19.9203 | 1.7959 | 0.7032 | 0.4618 |
| | 91.4562 | 26.7062 | 2.6270 | 0.6735 | 0.3855 |
| | 91.8699 | 87.2093 | 49.2239 | 0.6389 | 0.2535 |
| | 98.0271 | 21.6247 | 1.9920 | 0.1453 | 0.4488 |
| | 96.4497 | 25.1863 | 2.4305 | 0.2657 | 0.6211 |
| | 98.0716 | 19.9166 | 1.7955 | 0.1420 | 0.3881 |
| | 91.6093 | 80.6818 | 30.1521 | 0.6613 | 0.2388 |
| | 96.7860 | 28.5235 | 2.8810 | 0.2397 | 0.6805 |
| | 93.6691 | 39.6610 | 4.7454 | 0.4880 | 0.5011 |
| | 91.4947 | 80.2326 | 29.3027 | 0.6711 | 0.2306 |
| | 89.9692 | 70.0000 | 16.8455 | 0.8049 | 0.1294 |
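The predictive values and likelihood ratios can be derived from the same confusion-matrix counts. The sketch below uses the standard definitions, which is an assumption about the authors' calculation. As a check, the counts TP = 75, TN = 1469, FP = 11, FN = 130 from the table of confusion-matrix counts reproduce the row 91.8699 / 87.2093 / 49.2239 / 0.6389. BCR is left out because its modified definition is not given in this excerpt. The function name `predictive_metrics` is hypothetical.

```python
def predictive_metrics(tp, tn, fp, fn):
    """Return NPV and PPV (in %) and the positive/negative likelihood ratios."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "NPV": 100.0 * tn / (tn + fn),            # predicted non-binders that are non-binders
        "PPV": 100.0 * tp / (tp + fp),            # predicted binders that are binders
        "+LR": sensitivity / (1.0 - specificity),
        "-LR": (1.0 - sensitivity) / specificity,
    }


# Counts for the logistic regression on docking scores, Bayesian averages,
# and fingerprints, as listed in the confusion-matrix table.
row = predictive_metrics(tp=75, tn=1469, fp=11, fn=130)
print({name: round(value, 4) for name, value in row.items()})
```

A +LR near 49 means a predicted binder is about 49 times more likely to come from the binder class than from the non-binder class, while a −LR of about 0.64 shows that a negative prediction only weakly rules out binding.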
AR pathway in vitro reference compounds and their predicted class according to Procedure 13.
| CAS | Name | Agonist | Antagonist | Predicted | Correct |
|---|---|---|---|---|---|
| 52806-53-8 | hydroxyflutamide | NA | Strong | 0 | X |
| 90357-06-5 | Bicalutamide | NA | Strong | 0 | X |
| 122-14-5 | Fenitrothion | NA | Strong | 0 | X |
| 63612-50-0 | Nilutamide | Negative | Moderate | 0 | X |
| 427-51-0 | cyproterone acetate | Weak | Moderate | 1 | Yes |
| 80-05-7 | bisphenol A | NA | Moderate/weak | 1 | Yes |
| 330-55-2 | Linuron | NA | Moderate/weak | 0 | X |
| 13311-84-7 | Flutamide | Negative | Moderate/weak | 0 | X |
| 67747-09-5 | Prochloraz | Negative | Moderate/weak | 0 | X |
| 789-02-6 | | Negative | Weak | 0 | Yes |
| 60168-88-9 | Fenarimol | Negative | Very weak | 0 | Yes |
| 58-18-4 | methyl testosterone | Strong | Negative | 1 | Yes |
| 58-22-0 | Testosterone | Strong | Negative | 1 | Yes |
| 63-05-8 | 4-androstenedione | Moderate | Negative | 1 | Yes |
| 1912-24-9 | Atrazine | Negative | Negative | 0 | Yes |
| 52918-63-5 | Deltamethrin | Negative | Negative | 0 | Yes |
| 10161-33-8 | 17b-trenbolone | Strong | NA | 1 | Yes |
| 797-63-7 | Levonorgestrel | Strong | NA | 1 | Yes |
| 68-22-4 | Norethindrone | Strong | NA | 1 | Yes |
| 521-18-6 | 5a-dihydrotestosterone | Strong | NA | 1 | Yes |