Mubarak Himmat, Naomie Salim, Mohammed Mumtaz Al-Dabbagh, Faisal Saeed, Ali Ahmed.
Abstract
Quantifying the similarity of molecules is one of the major tasks in virtual screening. Many similarity measures have been proposed for this purpose, some of them derived from document and text retrieval, because measures that perform well in document retrieval can often achieve good results in virtual screening as well. In this work, we propose a similarity measure for ligand-based virtual screening that is derived from a text processing similarity measure and adapted to suit virtual screening; we call it the Adapted Similarity Measure of Text Processing (ASMTP). To evaluate ASMTP, we conducted several experiments on two benchmark datasets: the Maximum Unbiased Validation (MUV) and the MDL Drug Data Report (MDDR). In each experiment, 10 reference structures were chosen at random from each activity class as queries, and performance was evaluated as the recall at the 1% and 5% cut-offs. The overall results were compared with those of other similarity methods, including the Tanimoto coefficient, which is considered the conventional, standard similarity coefficient for fingerprint-based similarity calculations. The results show that ASMTP improves the performance of ligand-based virtual screening and outperforms the Tanimoto coefficient and the other methods.
Keywords: chemoinformatics; drug discovery; similarity coefficients; similarity search; virtual screening
Year: 2016 PMID: 27089312 PMCID: PMC6274479 DOI: 10.3390/molecules21040476
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Similarity measures most used in text retrieval.
| Similarity Measure | Formula |
|---|---|
| Extended Jaccard coefficient | |
| Dice | |
| Euclidean distance | |
| Cosine similarity | |
| Pairwise-adaptive | |
| IT-Sim | |
where, in Pairwise-adaptive, each reduced vector d is a subset of the corresponding document vector (d1 or d2) containing the values of the features that form the union of the K largest features appearing in d1 and d2, respectively.
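The first four measures named in the table have widely used textbook forms, which can be sketched as below. This is a minimal illustration under stated assumptions, not necessarily the notation or implementation used by the authors: the function names are hypothetical, vectors are plain Python sequences, and returning 0.0 for two all-zero vectors is a convention chosen here. Pairwise-adaptive and IT-Sim are omitted because their definitions are not recoverable from this record.

```python
import math
from typing import Sequence


def dot(x: Sequence[float], y: Sequence[float]) -> float:
    """Inner product of two equal-length feature vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(x, y))


def extended_jaccard(x: Sequence[float], y: Sequence[float]) -> float:
    """x.y / (x.x + y.y - x.y); on 0/1 fingerprints this is the Tanimoto coefficient."""
    xy = dot(x, y)
    denom = dot(x, x) + dot(y, y) - xy
    return xy / denom if denom else 0.0  # assumption: 0.0 for two empty vectors


def dice(x: Sequence[float], y: Sequence[float]) -> float:
    """2 x.y / (x.x + y.y)."""
    denom = dot(x, x) + dot(y, y)
    return 2.0 * dot(x, y) / denom if denom else 0.0


def cosine(x: Sequence[float], y: Sequence[float]) -> float:
    """x.y / (|x| |y|)."""
    denom = math.sqrt(dot(x, x) * dot(y, y))
    return dot(x, y) / denom if denom else 0.0


def euclidean_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Square root of the summed squared differences (a distance, not a similarity)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Two toy binary fingerprints: 3 bits set in each, 2 bits in common.
fp_a = [1, 1, 0, 1, 0]
fp_b = [1, 0, 0, 1, 1]
print(extended_jaccard(fp_a, fp_b))  # 0.5, i.e. 2 / (3 + 3 - 2)
```

On binary fingerprints the extended Jaccard coefficient reduces to c / (a + b - c), where a and b are the numbers of bits set in each fingerprint and c is the number in common, which is the Tanimoto coefficient used as the baseline in this work.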
MDDR activity classes for the DS1 dataset.
| Activity Index | Activity Class | Active Molecules |
|---|---|---|
| 31420 | Renin inhibitors | 1130 |
| 71523 | HIV protease inhibitors | 750 |
| 37110 | Thrombin inhibitors | 803 |
| 31432 | Angiotensin II AT1 antagonists | 943 |
| 42731 | Substance P antagonists | 1246 |
| 06233 | 5HT3 antagonists | 752 |
| 06245 | 5HT reuptake inhibitors | 359 |
| 07701 | D2 antagonists | 395 |
| 06235 | 5HT1A agonists | 827 |
| 78374 | Protein kinase C inhibitors | 453 |
DS2 dataset activity classes.
| Activity Index | Activity Class | Active Molecules |
|---|---|---|
| 07707 | Adenosine (A1) agonists | 207 |
| 07708 | Adenosine (A2) agonists | 156 |
| 31420 | Renin inhibitors | 1130 |
| 42710 | CCK agonists | 111 |
| 64100 | Monocyclic β-lactams | 1346 |
| 64200 | Cephalosporins | 113 |
| 64220 | Carbacephems | 1051 |
| 64500 | Carbapenems | 126 |
| 64350 | Tribactams | 388 |
| 75755 | Vitamin D analogues | 455 |
MUV activity classes.
| Activity Index | Activity Class |
|---|---|
| 466 | S1P1 rec. (agonists) |
| 548 | PKA (inhibitors) |
| 600 | SF1 (inhibitors) |
| 644 | Rho-Kinase2 (inhibitors) |
| 652 | HIV RT-RNase (inhibitors) |
| 689 | Eph rec. A4 (inhibitors) |
| 692 | SF1 (agonists) |
| 712 | HSP 90 (inhibitors) |
| 713 | ER-α-Coact. Bind. (inhibitors) |
| 733 | ER-β-Coact. Bind. (inhibitors) |
| 737 | ER-α-Coact. Bind. (potentiators) |
| 810 | FAK (inhibitors) |
| 832 | Cathepsin G (inhibitors) |
| 846 | FXIa (inhibitors) |
| 852 | FXIIa (inhibitors) |
| 858 | D1 rec. (allosteric modulators) |
| 859 | M1 rec. (allosteric inhibitors) |
Recall (%) at the top 1% and top 5% of the ranked DS1 dataset for Tanimoto (TAN), SQB, and ASMTP.
| Activity Class | TAN 1% | TAN 5% | SQB 1% | SQB 5% | ASMTP 1% | ASMTP 5% |
|---|---|---|---|---|---|---|
| 31420 | 57.3 | 85.85 | 73.73 | 87.22 | 78.83 | 96.81 |
| 71523 | 29.96 | 58.09 | 26.84 | 48.7 | 12.82 | 51.94 |
| 37110 | 14.38 | 29.98 | 24.73 | 45.62 | 39.53 | 63.84 |
| 31432 | 36.37 | 76.85 | 36.66 | 70.44 | 45.22 | 97.45 |
| 42731 | 16.89 | 27.74 | 21.17 | 19.35 | 13.95 | 20.88 |
| 06233 | 22.72 | 37.78 | 12.49 | 21.04 | 22.77 | 36.75 |
| 06245 | 5.03 | 14.83 | 6.03 | 13.63 | 11.73 | 26.26 |
| 07701 | 8.45 | 23.07 | 11.35 | 21.85 | 8.95 | 17.26 |
| 06235 | 9.03 | 21 | 10.15 | 19.13 | 21.91 | 37.17 |
| 78374 | 12.08 | 17.81 | 13.08 | 20.55 | 1.77 | 2.65 |
| 78331 | 8.77 | 16.71 | 5.92 | 13.1 | 3.31 | 10.24 |
| Mean | 20.09 | 37.25 | 22.01 | 34.05 | 23.65 | 41.93 |
| Shaded cells | 3 | 4 | 0 | 0 | 8 | 7 |
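The recall values in this kind of table can be computed as sketched below. This is a minimal illustration, not the authors' code: it assumes that each query yields one similarity score per database molecule, that recall is the percentage of all actives of the class that appear in the top fraction of the ranking, and that the cut-off size is rounded to the nearest whole molecule. The function names `recall_at_cutoff` and `mean_recall` are hypothetical.

```python
from typing import Sequence


def recall_at_cutoff(scores: Sequence[float],
                     is_active: Sequence[bool],
                     fraction: float) -> float:
    """Percentage of all actives retrieved in the top `fraction` of the ranking."""
    if len(scores) != len(is_active):
        raise ValueError("scores and labels must have the same length")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    n_total = len(scores)
    n_actives = sum(1 for flag in is_active if flag)
    if n_actives == 0:
        raise ValueError("at least one active molecule is required")
    # Assumption: cut-off size rounded to the nearest molecule, at least one.
    n_top = max(1, round(fraction * n_total))
    # Rank by decreasing similarity; ties keep their input order (stable sort).
    order = sorted(range(n_total), key=lambda i: scores[i], reverse=True)
    hits = sum(1 for i in order[:n_top] if is_active[i])
    return 100.0 * hits / n_actives


def mean_recall(per_query_recalls: Sequence[float]) -> float:
    """Average recall over the queries of one activity class (e.g. 10 reference structures)."""
    return sum(per_query_recalls) / len(per_query_recalls)


# Toy ranking: 20 molecules with distinct scores, 4 of them active.
scores = [float(s) for s in range(20, 0, -1)]
labels = [i in (0, 2, 7, 15) for i in range(20)]
print(recall_at_cutoff(scores, labels, 0.25))  # 50.0: 2 of 4 actives in the top 5
```

Averaging `recall_at_cutoff` over the queries of a class with `mean_recall` gives one cell of the table; the Mean row then averages those cells over the activity classes.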
Recall (%) at the top 1% and top 5% of the ranked DS2 dataset for Tanimoto (TAN), SQB, and the proposed method (ASMTP).
| Activity Class | TAN 1% | TAN 5% | SQB 1% | SQB 5% | Proposed Method 1% | Proposed Method 5% |
|---|---|---|---|---|---|---|
| 09249 | 61.84 | 70.39 | 58.5 | 74.22 | 72.82 | 73.3 |
| 12455 | 47.03 | 56.58 | 55.61 | 100 | 99.35 | 100 |
| 12464 | 65.1 | 88.19 | 62.22 | 95.24 | 81.66 | 96.46 |
| 31281 | 81.82 | 86.64 | 83 | 93 | 92.73 | 99.09 |
| 43210 | 80.31 | 93.75 | 80.73 | 98.94 | 88.2 | 99.85 |
| 71522 | 53.84 | 77.68 | 53.13 | 98.93 | 81.25 | 99.11 |
| 75721 | 46.8 | 63.94 | 34.61 | 90.9 | 77.27 | 98.67 |
| 78331 | 30.56 | 44.8 | 29.04 | 92.72 | 80 | 96.8 |
| 78348 | 80.18 | 91.71 | 81.86 | 93.75 | 82.17 | 99.74 |
| 78351 | 87.56 | 94.82 | 85.4 | 95.39 | 96.48 | 96.92 |
| Mean | 63.504 | 76.85 | 62.41 | 93.31 | 85.193 | 95.994 |
| Shaded cells | 0 | 0 | 0 | 0 | 10 | 10 |
Recall (%) at the top 1% and top 5% of the ranked lists for the 17 MUV activity classes.
| Activity Index | Tanimoto 1% | Tanimoto 5% | SQB 1% | SQB 5% | Proposed Method 1% | Proposed Method 5% |
|---|---|---|---|---|---|---|
| 466 | 3.1 | 5.86 | 1.38 | 8.62 | 5.86 | 9.66 |
| 548 | 8.62 | 22.76 | 11.38 | 24.14 | 10.34 | 17.93 |
| 600 | 3.79 | 11.38 | 5.52 | 16.21 | 6.21 | 13.45 |
| 644 | 7.59 | 17.59 | 8.97 | 17.93 | 7.24 | 12.41 |
| 652 | 2.76 | 7.93 | 3.79 | 9.66 | 5.86 | 11.38 |
| 689 | 3.79 | 9.66 | 4.48 | 11.72 | 5.86 | 9.71 |
| 692 | 0.69 | 4.83 | 1.38 | 4.83 | 3.79 | 6.55 |
| 712 | 4.14 | 10.34 | 5.17 | 11.03 | 6.21 | 8.97 |
| 713 | 3.1 | 7.24 | 2.76 | 5.86 | 6.21 | 9.31 |
| 733 | 3.45 | 8.97 | 4.14 | 8.62 | 5.86 | 9.31 |
| 737 | 2.41 | 8.28 | 1.72 | 8.28 | 7.59 | 14.14 |
| 810 | 2.07 | 6.9 | 1.72 | 11.03 | 7.24 | 13.1 |
| 832 | 6.55 | 13.1 | 8.28 | 14.83 | 13.1 | 20 |
| 846 | 9.66 | 28.62 | 12.41 | 26.9 | 10.69 | 25.52 |
| 852 | 12.41 | 21.38 | 9.66 | 20 | 13.45 | 21.03 |
| 858 | 1.72 | 5.86 | 1.38 | 6.21 | 6.21 | 7.93 |
| 859 | 1.38 | 8.97 | 2.41 | 8.62 | 5.86 | 10.69 |
| Mean | 4.54 | 11.70 | 5.09 | 12.61 | 7.50 | 12.99 |
| Shaded cells | 0 | 2 | 3 | 5 | 14 | 11 |
Figure 1. ROC curves and AUCs at the 5% cutoff for the DS1 data set.
Figure 2. ROC curves and AUCs at the 5% cutoff for the DS2 data set.
Figure 3. ROC curves and AUCs at the 5% cutoff for the MUV data set.
Comparison of enrichment values, BEDROC (α = 20) and EF (1%), for Tanimoto, SQB, and ASMTP on the MDDR (DS1 and DS2) and MUV data sets.
| Method | DS1 BEDROC (α = 20) Mean | DS1 BEDROC (α = 20) Median | DS1 EF (1%) Mean | DS1 EF (1%) Median | DS2 BEDROC (α = 20) Mean | DS2 BEDROC (α = 20) Median | DS2 EF (1%) Mean | DS2 EF (1%) Median | MUV BEDROC (α = 20) Mean | MUV BEDROC (α = 20) Median | MUV EF (1%) Mean | MUV EF (1%) Median |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Tan | 0.48 | 0.46 | 80.01 | 86.01 | 0.33 | 0.34 | 23.01 | 23.01 | 0.37 | 0.37 | 16.69 | 17.92 |
| SQB | 0.53 | 0.57 | 90.01 | 89.31 | 0.44 | 0.39 | 29.01 | 22.01 | 0.41 | 0.39 | 18.01 | 19.74 |
| ASMTP | 0.61 | 0.64 | 92.9 | 90.23 | 0.46 | 0.50 | 28.27 | 25.32 | 0.44 | 0.42 | 18.93 | 20.14 |
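The two enrichment metrics in this table can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: EF is taken as the hit rate in the top fraction divided by the hit rate in the whole list, and BEDROC is computed in the commonly used normalised form (RIE rescaled between its minimum and maximum). The function names are hypothetical, and the input is assumed to be a list of activity flags already sorted by decreasing similarity to the query.

```python
import math
from typing import Sequence


def enrichment_factor(ranked_actives: Sequence[bool], fraction: float) -> float:
    """EF = (hit rate in the top fraction) / (hit rate in the whole ranked list)."""
    n_total = len(ranked_actives)
    n_actives = sum(1 for flag in ranked_actives if flag)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty list with at least one active")
    # Assumption: cut-off size rounded to the nearest molecule, at least one.
    n_top = max(1, round(fraction * n_total))
    hits_top = sum(1 for flag in ranked_actives[:n_top] if flag)
    return (hits_top / n_top) / (n_actives / n_total)


def bedroc(ranked_actives: Sequence[bool], alpha: float = 20.0) -> float:
    """BEDROC in [0, 1]; larger alpha puts more weight on early recognition."""
    n_total = len(ranked_actives)
    n_actives = sum(1 for flag in ranked_actives if flag)
    if n_actives == 0 or n_actives == n_total:
        raise ValueError("need both actives and inactives")
    ratio = n_actives / n_total
    # Exponentially weighted sum over the 1-based ranks of the actives.
    weight_sum = sum(math.exp(-alpha * rank / n_total)
                     for rank, flag in enumerate(ranked_actives, start=1) if flag)
    # Expected value of the same sum if actives were uniformly distributed.
    random_sum = ratio * (1.0 - math.exp(-alpha)) / (math.exp(alpha / n_total) - 1.0)
    rie = weight_sum / random_sum
    rie_max = (1.0 - math.exp(-alpha * ratio)) / (ratio * (1.0 - math.exp(-alpha)))
    rie_min = (1.0 - math.exp(alpha * ratio)) / (ratio * (1.0 - math.exp(alpha)))
    return (rie - rie_min) / (rie_max - rie_min)


# Toy example: 100 molecules, 10 actives, all ranked first.
perfect = [True] * 10 + [False] * 90
print(enrichment_factor(perfect, 0.1))  # 10.0: top 10% is all actives vs 10% overall
print(bedroc(perfect, alpha=20.0))
```

With these definitions a perfect ranking gives a BEDROC of 1 and the worst possible ranking gives 0, while EF is bounded above by the smaller of 1/fraction and the total number of molecules divided by the number of actives.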