Ali Ahmed, Ammar Abdo, Naomie Salim.
Abstract
Many similarity-based virtual screening approaches assume that molecular fragments unrelated to biological activity carry the same weight as the important ones. This motivated the use of Bayesian networks as an alternative to existing tools for similarity-based virtual screening. In our recent work, the retrieval performance of the Bayesian inference network (BIN) was observed to improve significantly when molecular fragments were reweighted using relevance feedback information. In this paper, a set of active reference structures was used to reweight the fragments in the reference structure. In this approach, higher weights were assigned to fragments that occur more frequently in the set of active reference structures, while the others were penalized. Simulated virtual screening experiments with MDL Drug Data Report datasets showed that the proposed approach significantly improved the retrieval effectiveness of ligand-based virtual screening, especially when the active molecules being sought had a high degree of structural heterogeneity.
Year: 2012 PMID: 22623895 PMCID: PMC3353468 DOI: 10.1100/2012/410914
Source DB: PubMed Journal: ScientificWorldJournal ISSN: 1537-744X
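The reweighting idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' BINRF weighting scheme, whose exact formula is not given in this record; the function name `reweight_fragments`, the `boost`, `penalty`, and `threshold` parameters, and the toy fragment sets are all assumptions made for illustration.

```python
from collections import Counter


def reweight_fragments(reference, active_set, boost=1.0, penalty=0.5, threshold=0.5):
    """Return one weight per fragment of `reference` (hypothetical scheme).

    reference  : set of fragment identifiers in the reference structure
    active_set : list of fragment sets, one per active reference structure
    """
    if not active_set:
        raise ValueError("active_set must not be empty")
    # Count in how many active structures each fragment occurs.
    counts = Counter(f for frags in active_set for f in set(frags))
    n = len(active_set)
    weights = {}
    for frag in reference:
        freq = counts[frag] / n  # fraction of actives containing the fragment
        if freq >= threshold:
            weights[frag] = 1.0 + boost * freq  # frequent fragment: raise weight
        else:
            weights[frag] = penalty  # rare or absent fragment: penalize
    return weights


# Toy data (hypothetical fragment labels, not from the MDDR datasets).
actives = [{"a", "b", "c"}, {"a", "b"}, {"a", "d"}, {"a", "b", "e"}]
weights = reweight_fragments({"a", "b", "d", "z"}, actives)
print(weights)
```

In this toy run, fragments shared by most of the active set receive weights above 1, while fragments that are rare or absent in the active set fall back to the penalty value.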
Figure 1: Bayesian inference network model.
MDDR activity classes for DS1 dataset.
| Activity index | Activity class | Active molecules | Pairwise similarity (mean) |
|---|---|---|---|
| 31420 | Renin inhibitors | 1130 | 0.290 |
| 71523 | HIV protease inhibitors | 750 | 0.198 |
| 37110 | Thrombin inhibitors | 803 | 0.180 |
| 31432 | Angiotensin II AT1 antagonists | 943 | 0.229 |
| 42731 | Substance P antagonists | 1246 | 0.149 |
| 06233 | 5HT3 antagonists | 752 | 0.140 |
| 06245 | 5HT reuptake inhibitors | 359 | 0.122 |
| 07701 | D2 antagonists | 395 | 0.138 |
| 06235 | 5HT1A agonists | 827 | 0.133 |
| 78374 | Protein kinase C inhibitors | 453 | 0.120 |
| 78331 | Cyclooxygenase inhibitors | 636 | 0.108 |
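The "Pairwise similarity (mean)" column summarizes how structurally diverse each activity class is. A minimal sketch of how such a value could be computed is given here, assuming the Tanimoto coefficient on binary fingerprints represented as sets of "on" bit positions; the record does not state which coefficient or fingerprint was used, and the toy fingerprints are hypothetical.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def mean_pairwise_similarity(fingerprints):
    """Average Tanimoto similarity over all unordered pairs of fingerprints."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        raise ValueError("need at least two fingerprints")
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


# Toy fingerprints (hypothetical bit positions, not MDDR molecules).
toy = [{1, 2, 3}, {2, 3, 4}, {1, 2, 3, 4}]
mps = mean_pairwise_similarity(toy)
print(round(mps, 3))
```

A lower mean indicates a more heterogeneous class, which is the regime where the abstract reports the largest gains.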
MDDR activity classes for DS3 dataset.
| Activity index | Activity class | Active molecules | Pairwise similarity (mean) |
|---|---|---|---|
| 09249 | Muscarinic (M1) agonists | 900 | 0.111 |
| 12455 | NMDA receptor antagonists | 1400 | 0.098 |
| 12464 | Nitric oxide synthase inhibitors | 505 | 0.102 |
| 31281 | Dopamine β-hydroxylase inhibitors | 106 | 0.125 |
| 43210 | Aldose reductase inhibitors | 957 | 0.119 |
| 71522 | Reverse transcriptase inhibitors | 700 | 0.103 |
| 75721 | Aromatase inhibitors | 636 | 0.110 |
| 78331 | Cyclooxygenase inhibitors | 636 | 0.108 |
| 78348 | Phospholipase A2 inhibitors | 617 | 0.123 |
| 78351 | Lipoxygenase inhibitors | 2111 | 0.113 |
Recall (%) of active molecules in the top 1% and top 5% of the ranked DS1 dataset using TAN, BIN, and BINRF.
| Activity index | TAN (1%) | BIN (1%) | BINRF (1%) | TAN (5%) | BIN (5%) | BINRF (5%) |
|---|---|---|---|---|---|---|
| 31420 | 55.84* | 74.08* | 81.8** | 85.49* | 87.61** | 84.12* |
| 71523 | 22.26* | 28.26* | 43.86** | 42.7* | 52.72* | 68.72** |
| 37110 | 12.54* | 26.05* | 41.25** | 24.11* | 48.2* | 71.05** |
| 31432 | 33.36* | 39.23* | 46.5** | 68.2* | 77.57* | 91.59** |
| 42731 | 16.24* | 21.68* | 28.13** | 32.81* | 26.63* | 42.39** |
| 06233 | 14.23* | 14.06* | 16.75** | 27.01* | 23.49* | 32.93** |
| 06245 | 10.06** | 6.31* | 10.04* | 22.9* | 14.86* | 28.8** |
| 07701 | 8.91* | 11.45* | 19.75** | 23.1* | 27.79* | 41.24** |
| 06235 | 11.87* | 10.84* | 12.45** | 24.54* | 23.78* | 31.89** |
| 78374 | 16.75* | 14.25* | 25.49** | 24.26* | 20.2* | 39.18** |
| 78331 | 8.05* | 6.03* | 8.14** | 16.83** | 11.8* | 11.20* |
| Shaded cells (**) | 1 | 0 | 10 | 1 | 1 | 9 |
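The recall values in the table above are percentages of a class's actives retrieved within a cutoff of the ranked database. A minimal sketch of that calculation follows; it assumes recall is measured against all actives of the class and that the cutoff is rounded up, and it leaves out details this record does not specify (whether the reference structure is excluded, how ties are broken, how results are averaged over queries). The function name and toy identifiers are hypothetical.

```python
def recall_at_top(ranked_ids, active_ids, percent):
    """Percentage of actives found in the top `percent` % of a ranked list.

    ranked_ids : molecule identifiers sorted from most to least similar
    active_ids : set of identifiers known to be active
    percent    : integer cutoff, e.g. 1 or 5
    """
    if not active_ids:
        raise ValueError("active_ids must not be empty")
    # Integer ceiling of len * percent / 100 avoids float rounding surprises.
    cutoff = (len(ranked_ids) * percent + 99) // 100
    hits = sum(1 for mol in ranked_ids[:cutoff] if mol in active_ids)
    return 100.0 * hits / len(active_ids)


# Toy ranking of 200 molecules with 5 actives (hypothetical data).
ranked = [f"m{i}" for i in range(200)]
actives = {"m0", "m1", "m5", "m9", "m150"}
r1 = recall_at_top(ranked, actives, 1)
r5 = recall_at_top(ranked, actives, 5)
print(r1, r5)
```

Here the top 1% covers two molecules and the top 5% covers ten, so widening the cutoff can only keep or raise recall for a fixed ranking.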
Recall (%) of active molecules in the top 1% and top 5% of the ranked DS3 dataset using TAN, BIN, and BINRF.
| Activity index | TAN (1%) | BIN (1%) | BINRF (1%) | TAN (5%) | BIN (5%) | BINRF (5%) |
|---|---|---|---|---|---|---|
| 09249 | 25.09** | 15.33* | 15.51* | 40.21** | 25.72* | 29.08* |
| 12455 | 7.7* | 9.37* | 11.59** | 19.08** | 14.65* | 16.77* |
| 12464 | 9.02* | 8.45* | 11.67** | 14.56* | 16.55* | 27.1** |
| 31281 | 27.53* | 18.29* | 44.48** | 44* | 28.29* | 59.9** |
| 43210 | 11.1** | 7.34* | 9.41* | 26.37** | 14.41* | 21.27* |
| 71522 | 2.35* | 4.08* | 11.39** | 6.28* | 8.44* | 23.62** |
| 75721 | 24.02* | 20.41* | 28.24** | 28.97* | 30.02* | 56.39** |
| 78331 | 6.27* | 7.51* | 10.11** | 15.79* | 12.03* | 18.82** |
| 78348 | 4.69* | 9.79** | 8.99* | 13.16* | 20.76* | 24.15** |
| 78351 | 4.31* | 13.68* | 16.64** | 10.55* | 12.94* | 20.16** |
| Shaded cells (**) | 2 | 1 | 7 | 3 | 0 | 7 |
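Per-column summaries of the DS3 table can be recomputed from its rows. The sketch below copies the ten DS3 rows as listed, strips the asterisk markers to obtain numeric recall values, and then computes the column means and the number of `**` cells per column. Recomputed means may differ from published ones by rounding; the variable names are illustrative.

```python
ROWS = """
09249 | 25.09** | 15.33* | 15.51* | 40.21** | 25.72* | 29.08*
12455 | 7.7* | 9.37* | 11.59** | 19.08** | 14.65* | 16.77*
12464 | 9.02* | 8.45* | 11.67** | 14.56* | 16.55* | 27.1**
31281 | 27.53* | 18.29* | 44.48** | 44* | 28.29* | 59.9**
43210 | 11.1** | 7.34* | 9.41* | 26.37** | 14.41* | 21.27*
71522 | 2.35* | 4.08* | 11.39** | 6.28* | 8.44* | 23.62**
75721 | 24.02* | 20.41* | 28.24** | 28.97* | 30.02* | 56.39**
78331 | 6.27* | 7.51* | 10.11** | 15.79* | 12.03* | 18.82**
78348 | 4.69* | 9.79** | 8.99* | 13.16* | 20.76* | 24.15**
78351 | 4.31* | 13.68* | 16.64** | 10.55* | 12.94* | 20.16**
""".strip().splitlines()

LABELS = ["TAN 1%", "BIN 1%", "BINRF 1%", "TAN 5%", "BIN 5%", "BINRF 5%"]

values, marked = [], []
for line in ROWS:
    cells = [c.strip() for c in line.split("|")][1:]  # drop the activity index
    values.append([float(c.rstrip("*")) for c in cells])  # numeric recall
    marked.append([c.endswith("**") for c in cells])  # True for ** cells

n = len(values)
means = [sum(col) / n for col in zip(*values)]  # column-wise mean recall
counts = [sum(col) for col in zip(*marked)]  # column-wise number of ** cells

for label, mean, count in zip(LABELS, means, counts):
    print(f"{label}: mean recall {mean:.2f}, ** cells {count}")
```

The `**` counts produced this way can be checked against the last row of the table.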
Rankings of weighting functions based on Kendall W test results: DS1–DS3 Top 1% and 5%.
| Dataset | Recall type | W | P | Ranking |
|---|---|---|---|---|
| DS1 | 1% | 0.75 | <0.01 | BINRF > BIN > TAN |
| DS1 | 5% | 0.71 | <0.01 | BINRF > TAN > BIN |
| DS2 | 1% | 0.39 | >0.01 | BINRF > BIN > TAN |
| DS2 | 5% | 0.28 | <0.01 | BINRF > BIN > TAN |
| DS3 | 1% | 0.37 | <0.01 | BINRF > TAN > BIN |
| DS3 | 5% | 0.39 | <0.01 | BINRF > TAN > BIN |
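Kendall's coefficient of concordance W measures how consistently a set of judges ranks a set of objects, from 0 (no agreement) to 1 (identical rankings). The sketch below shows the textbook formula without tie correction, treating each activity class as a judge and each search model as a ranked object. That mapping, the function name, and the toy recall values are assumptions; this record does not describe how the test was set up, so the sketch is not expected to reproduce the W values in the table.

```python
def kendall_w(scores):
    """Kendall's W (no tie correction) and the per-object rank sums.

    scores: one row per judge, one column per object being ranked.
    Assumes no tied values within a row.
    """
    m = len(scores)  # number of judges
    n = len(scores[0])  # number of ranked objects
    rank_sums = [0] * n
    for row in scores:
        # Rank 1 goes to the lowest score in the row, rank n to the highest.
        order = sorted(range(n), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    mean_rank_sum = m * (n + 1) / 2
    s = sum((r - mean_rank_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n)), rank_sums


# Hypothetical recall values: rows are activity classes,
# columns are three search models (e.g. TAN, BIN, BINRF).
toy = [
    [10, 20, 30],
    [12, 18, 25],
    [15, 14, 40],
    [5, 9, 11],
]
w, rank_sums = kendall_w(toy)
print(w, rank_sums)
```

The rank sums give the overall ordering of the models (a higher sum means a model was ranked higher more often), which is how a ranking column like the one in the table can be derived alongside W.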
Number of (∗ and ∗∗) cells for mean recall of actives using different search models for DS1–DS3 Top 1% and 5%.
| Cutoff | Dataset | TAN | BIN | BINRF |
|---|---|---|---|---|
| Top 1% | DS1 | 1 | 0 | 10 |
| Top 1% | DS2 | 1 | 1 | 8 |
| Top 1% | DS3 | 2 | 1 | 7 |
| Top 5% | DS1 | 1 | 1 | 9 |
| Top 5% | DS2 | 2 | 2 | 6 |
| Top 5% | DS3 | 3 | 0 | 7 |
MDDR activity classes for DS2 dataset.
| Activity index | Activity class | Active molecules | Pairwise similarity (mean) |
|---|---|---|---|
| 07707 | Adenosine (A1) agonists | 207 | 0.229 |
| 07708 | Adenosine (A2) agonists | 156 | 0.305 |
| 31420 | Renin inhibitors | 1130 | 0.290 |
| 42710 | CCK agonists | 111 | 0.361 |
| 64100 | Monocyclic β-lactams | 1346 | 0.336 |
| 64200 | Cephalosporins | 113 | 0.322 |
| 64220 | Carbacephems | 1051 | 0.269 |
| 64500 | Carbapenems | 126 | 0.260 |
| 64350 | Tribactams | 388 | 0.305 |
| 75755 | Vitamin D analogues | 455 | 0.386 |
Recall (%) of active molecules in the top 1% and top 5% of the ranked DS2 dataset using TAN, BIN, and BINRF.
| Activity index | TAN (1%) | BIN (1%) | BINRF (1%) | TAN (5%) | BIN (5%) | BINRF (5%) |
|---|---|---|---|---|---|---|
| 07707 | 78.3** | 72.18* | 72.33* | 91.08** | 74.81* | 74.17* |
| 07708 | 74.01* | 96* | 100** | 88.52* | 99.61* | 100** |
| 31420 | 46.44* | 79.82* | 82.71** | 77.6* | 95.46* | 97.15** |
| 42710 | 57.22* | 76.27* | 95.36** | 67.59* | 92.55* | 99.36** |
| 64100 | 93.22* | 88.43** | 87.75* | 97.89* | 99.22** | 98.93* |
| 64200 | 63.39* | 70.18* | 71.79** | 89.82* | 99.2** | 99.12* |
| 64220 | 73.56* | 68.32* | 82.47** | 92.05* | 91.32* | 98.89** |
| 64500 | 60.75* | 81.2* | 96.56** | 74.98* | 94.96* | 99.28** |
| 64350 | 76.69* | 81.89* | 93.67** | 90.34* | 91.47* | 98.24** |
| 75755 | 95.99* | 98.06* | 98.26** | 98.78** | 98.33* | 98.33* |
| Shaded cells (**) | 1 | 1 | 8 | 2 | 2 | 6 |