Hamza Hentabli, Faisal Saeed, Ammar Abdo, Naomie Salim.
Abstract
Molecular similarity is a pervasive concept in drug design. It rests on the similar property principle, which states that structurally similar molecules will exhibit similar physicochemical and biological properties. This paper introduces a new graph-based molecular descriptor (GBMD). The GBMD gives a rough description of a 2D molecular structure in textual form, based on canonical representations of the molecule's outline shape, and it allows rigorous structure specification using small and natural grammars. Simulated virtual screening experiments with the MDDR database clearly show the superiority of the graph-based descriptor over many standard descriptors (ALOGP, MACCS, EPFP4, CDKFP, PCFP, and SMILE) when searches were carried out using the Tanimoto coefficient (TAN) and the basic local alignment search tool (BLAST).
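As a minimal sketch of the Tanimoto coefficient mentioned in the abstract, the snippet below scores two fingerprints represented as sets of "on" bit positions. The function name, the set representation, and the example bit positions are illustrative assumptions; the abstract does not say how the fingerprints were stored or which software computed the coefficient.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient: shared bits / (bits in a + bits in b - shared bits)."""
    if not a and not b:
        return 0.0  # convention assumed here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


# Hypothetical fingerprints: positions of the bits that are set.
query = {1, 4, 7, 9}
candidate = {1, 4, 8, 9, 12}
score = tanimoto(query, candidate)  # 3 shared bits out of 6 distinct bits
print(score)
```

In a similarity search of this kind, every database molecule would be scored against the query and the database sorted by decreasing score.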
Year: 2014 PMID: 25140330 PMCID: PMC4130360 DOI: 10.1155/2014/286974
Source DB: PubMed Journal: ScientificWorldJournal ISSN: 1537-744X
Figure 1: Aspirin molecule structure.
Figure 2: Construction of atomic connectivity values in GBMD using the Morgan algorithm.
Figure 3: Graphical representation of the tree graph extracted from the aspirin molecule graph.
Figure 4: Process of generating the GBMD descriptor.
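The connectivity values referred to in the Figure 2 caption can be illustrated with a short sketch of the classic Morgan extended-connectivity iteration on the hydrogen-suppressed aspirin graph. The atom numbering, the bond list, and the helper names are assumptions made for this example, and the sketch covers only the generic Morgan step, not how GBMD goes on to use the values.

```python
from collections import defaultdict

# Heavy-atom bonds of aspirin, CC(=O)Oc1ccccc1C(=O)O, with an arbitrary
# numbering chosen for this sketch (bond orders are ignored).
ASPIRIN_BONDS = [
    (1, 2), (2, 3), (2, 4), (4, 5), (5, 6), (6, 7), (7, 8),
    (8, 9), (9, 10), (10, 5), (10, 11), (11, 12), (11, 13),
]


def build_adjacency(bonds):
    """Turn a list of bonds into an atom -> set of neighbours mapping."""
    adjacency = defaultdict(set)
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


def morgan_connectivity(adjacency):
    """Iterate Morgan's extended connectivity values.

    Each atom starts with its degree; every round replaces an atom's value
    with the sum of its neighbours' values. Iteration stops when a round no
    longer increases the number of distinct values, and the values from the
    previous round are returned.
    """
    values = {atom: len(nbrs) for atom, nbrs in adjacency.items()}
    distinct = len(set(values.values()))
    while True:
        updated = {
            atom: sum(values[n] for n in nbrs)
            for atom, nbrs in adjacency.items()
        }
        new_distinct = len(set(updated.values()))
        if new_distinct <= distinct:
            return values
        values, distinct = updated, new_distinct


connectivity = morgan_connectivity(build_adjacency(ASPIRIN_BONDS))
for atom in sorted(connectivity):
    print(atom, connectivity[atom])
```

Atoms that are topologically equivalent in this graph, such as the two carboxyl oxygens (atoms 12 and 13 in the numbering above), always receive the same value.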
MDDR activity classes for DS1 data set.
| Activity index | Activity class | Active molecules | Pairwise similarity |
|---|---|---|---|
| 31420 | Renin inhibitors | 1130 | 0.290 |
| 71523 | HIV protease inhibitors | 750 | 0.198 |
| 37110 | Thrombin inhibitors | 803 | 0.180 |
| 31432 | Angiotensin II AT1 antagonists | 943 | 0.229 |
| 42731 | Substance P antagonists | 1246 | 0.149 |
| 06233 | 5HT3 antagonist | 752 | 0.140 |
| 06245 | 5HT reuptake inhibitors | 359 | 0.122 |
| 07701 | D2 antagonists | 395 | 0.138 |
| 06235 | 5HT1A agonists | 827 | 0.133 |
| 78374 | Protein kinase C inhibitors | 453 | 0.120 |
| 78331 | Cyclooxygenase inhibitors | 636 | 0.108 |
MDDR activity classes for DS2 data set.
| Activity index | Activity class | Active molecules | Pairwise similarity |
|---|---|---|---|
| 07707 | Adenosine (A1) agonists | 207 | 0.229 |
| 07708 | Adenosine (A2) agonists | 156 | 0.305 |
| 31420 | Renin inhibitors | 1130 | 0.290 |
| 42710 | CCK agonists | 111 | 0.361 |
| 64100 | Monocyclic lactams | 1346 | 0.336 |
| 64200 | Cephalosporins | 113 | 0.322 |
| 64220 | Carbacephems | 1051 | 0.269 |
| 64500 | Carbapenems | 126 | 0.260 |
| 64350 | Tribactams | 388 | 0.305 |
| 75755 | Vitamin D analogues | 455 | 0.386 |
MDDR activity classes for DS3 data set.
| Activity index | Activity class | Active molecules | Pairwise similarity |
|---|---|---|---|
| 09249 | Muscarinic (M1) agonists | 900 | 0.111 |
| 12455 | NMDA receptor antagonists | 1400 | 0.098 |
| 12464 | Nitric oxide synthase inhibitors | 505 | 0.102 |
| 31281 | Dopamine hydroxylase inhibitors | 106 | 0.125 |
| 43210 | Aldose reductase inhibitors | 957 | 0.119 |
| 71522 | Reverse transcriptase inhibitors | 700 | 0.103 |
| 75721 | Aromatase inhibitors | 636 | 0.110 |
| 78331 | Cyclooxygenase inhibitors | 636 | 0.108 |
| 78348 | Phospholipase A2 inhibitors | 617 | 0.123 |
| 78351 | Lipoxygenase inhibitors | 2111 | 0.113 |
Retrieval results of top 1% for data set DS1.
| Activity index | GBMD | SMILE | PCFP | ALOGP | MACCS | EPFP4 | CDKFP |
|---|---|---|---|---|---|---|---|
| 31420 |  | 53.69 | 26.13 | 22.06 | 28.65 | 34.75 | 41.8 |
| 71523 | 26.49 |  | 9.61 | 13.72 | 14.71 | 14.29 | 19.6 |
| 37110 |  | 15.52 | 12.38 | 9.26 | 17.99 | 18.8 | 18.74 |
| 31432 |  | 34.96 | 15.55 | 16.52 | 24.52 | 22.81 | 25.75 |
| 42731 |  | 16.20 | 9.63 | 6.05 | 8.18 | 10.08 | 12.27 |
| 6233 | 8.16 | 7.30 | 6.8 | 7.98 | 8.8 | 8.35 |  |
| 6245 | 5.75 | 5.47 | 4.11 | 3.66 | 4.94 | 5.61 |  |
| 7701 |  | 7.79 | 4.62 | 5.86 | 7.39 | 6.75 | 7.77 |
| 6235 | 7.83 | 7.43 | 4.27 | 6.22 | 6.91 | 6.55 |  |
| 78374 | 8.03 | 7.92 |  | 7.81 | 6.02 | 8.01 | 10.64 |
| 78331 | 6.09 | 5.98 | 5.13 | 4.11 |  | 4.94 | 5.72 |
| Mean |  | 17.19 | 10.13 | 9.39 | 12.22 | 12.81 | 15.21 |
| Bold cells | 6 | 1 | 1 | 0 | 1 | 0 | 3 |
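The recall figures in retrieval tables like this one can be read with the following sketch, which assumes that each value is the percentage of an activity class's molecules found in the top 1% (or 5%) of the similarity-ranked database. This definition, the function name, and the toy identifiers are assumptions; the excerpt itself does not spell out the calculation.

```python
def recall_at_top(ranked_ids, active_ids, percent):
    """Percentage of actives found in the top `percent` of a ranked list.

    ranked_ids: molecule identifiers sorted by decreasing similarity.
    active_ids: identifiers of molecules in the activity class searched for.
    percent: cutoff as a percentage of the database size, e.g. 1 or 5.
    """
    cutoff = max(1, round(len(ranked_ids) * percent / 100))
    retrieved = set(ranked_ids[:cutoff]) & set(active_ids)
    return 100.0 * len(retrieved) / len(active_ids)


# Toy database of 200 molecules, already ranked; five of them are active.
ranking = list(range(200))
actives = {0, 1, 5, 50, 150}

print(recall_at_top(ranking, actives, 1))  # top 2 molecules hold 2 of 5 actives
print(recall_at_top(ranking, actives, 5))  # top 10 molecules hold 3 of 5 actives
```

Per-class values like those in the table would then come from averaging this measure over several query molecules from the same class, which is again an assumption about the protocol.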
Retrieval results of top 5% for data set DS3.
| Activity index | GBMD | SMILE | PCFP | ALOGP | MACCS | EPFP4 | CDKFP |
|---|---|---|---|---|---|---|---|
| 9249 | 24.19 | 28.04 | 16.23 | 20.51 | 21.97 |  | 21.46 |
| 12455 | 10.21 |  | 11.87 | 8.03 | 9.26 | 8.51 | 9.39 |
| 12464 | 7.2 | 18.69 | 22.96 | 19.8 |  | 18.19 | 21.69 |
| 31281 | 25.62 | 27.52 | 23.9 | 33.24 |  | 34.95 | 27.71 |
| 43210 | 10.4 |  | 11.42 | 14.05 | 18.61 | 12.63 | 13.19 |
| 71522 |  | 9.17 | 12.86 | 8.5 | 12.02 | 13.89 | 10.92 |
| 75721 |  | 27.18 | 29.72 | 29.15 | 27.86 | 30.79 | 30.76 |
| 78331 |  | 12.25 | 9.61 | 9.26 | 13.75 | 10.17 | 9.43 |
| 78348 |  | 9.33 | 7.43 | 9.72 | 8.6 | 10.47 | 9.01 |
| 78351 | 12.93 | 16.61 | 15.03 | 10.58 | 12.89 |  | 16.16 |
| Mean | 18.19 | 18.17 | 16.1 | 16.28 |  | 18.59 | 16.97 |
| Bold cells | 4 | 2 | 0 | 0 | 3 | 2 | 0 |
Retrieval results of top 1% for data set DS2.
| Activity index | GBMD | SMILE | PCFP | ALOGP | MACCS | EPFP4 | CDKFP |
|---|---|---|---|---|---|---|---|
| 7707 | 60.39 | 60.87 | 43.69 | 64.56 | 63.35 | 72.52 |  |
| 7708 | 68.65 | 82.90 | 78 | 91.03 | 71.03 | 97.94 |  |
| 31420 | 64.59 |  | 39.05 | 37.91 | 29.61 | 43.14 | 49.2 |
| 64100 | 74.27 | 79.09 | 46.36 | 79.27 | 86.45 |  | 83.55 |
| 64200 | 83.2 | 67.13 | 69.68 | 81.59 | 83.72 |  | 91.03 |
| 64220 | 49.04 | 39.02 | 47.86 | 47.41 | 53.48 | 66.16 |  |
| 64500 | 59.66 | 65.69 | 46.26 | 30.94 | 52.99 |  | 67.9 |
| 64300 | 76.48 | 30.64 | 22.72 | 34.64 | 52.88 |  | 85.68 |
| 65000 |  | 62.14 | 45.97 | 65.06 | 57.88 | 75.45 | 70.34 |
| 75755 | 93.72 | 93.61 | 86.26 | 85.62 | 66.94 | 92.38 |  |
| Mean | 71.01 | 64.97 | 52.59 | 61.8 | 61.83 | 79.12 |  |
| Bold cells | 1 | 1 | 0 | 0 | 1 | 4 | 5 |
Retrieval results of top 1% for data set DS3.
| Activity index | GBMD | SMILE | PCFP | ALOGP | MACCS | EPFP4 | CDKFP |
|---|---|---|---|---|---|---|---|
| 9249 | 14.64 |  | 10.11 | 11.71 | 11.45 | 15.44 | 13.6 |
| 12455 | 5.81 |  | 6.81 | 3.94 | 4.94 | 6.27 | 6.23 |
| 12464 | 3.17 | 9.40 | 11.13 | 9.31 |  | 8.65 | 11.61 |
| 31281 | 16.38 | 16.19 | 12.19 |  | 21.05 | 17.33 | 16.48 |
| 43210 | 9.58 | 9.16 | 4.6 | 5.84 |  | 6.98 | 7.04 |
| 71522 |  | 4.36 | 6.27 | 3.71 | 5.38 | 6.62 | 5.75 |
| 75721 |  | 19.54 | 22.09 | 17.8 | 19.39 | 22.24 | 22.33 |
| 78331 |  | 5.81 | 4.16 | 3.97 | 6.08 | 4.55 | 5.09 |
| 78348 |  | 4.70 | 3.58 | 4.24 | 3.27 | 4.29 | 3.69 |
| 78351 | 11.05 | 14.46 | 12.69 | 8.32 | 13.32 | 15.01 |  |
| Mean |  | 10.68 | 9.36 | 9.11 |  | 10.74 | 10.73 |
| Bold cells | 5 | 2 | 0 | 1 | 2 | 0 | 1 |
Retrieval results of top 5% for data set DS1.
| Activity index | GBMD | SMILE | PCFP | ALOGP | MACCS | EPFP4 | CDKFP |
|---|---|---|---|---|---|---|---|
| 31420 | 82.74 |  | 45.95 | 45.08 | 55.41 | 76.76 | 80.27 |
| 71523 | 52.92 |  | 19.73 | 33.38 | 29.97 | 33.31 | 37.92 |
| 37110 | 31.47 | 33.77 | 27.99 | 26.71 | 34.7 |  | 37.26 |
| 31432 |  | 66.27 | 33.73 | 39.37 | 48.29 | 41.01 | 51.46 |
| 42731 |  | 24.00 | 19.32 | 12.91 | 19.36 | 20.71 | 23.2 |
| 6233 | 18.28 | 14.94 | 17 | 20.47 |  | 20 | 19.92 |
| 6245 | 13.69 | 13.46 | 10.08 | 10.59 | 11.06 | 12.65 |  |
| 7701 |  | 19.92 | 11.62 | 13.6 | 22.34 | 17.69 | 18.86 |
| 6235 | 19.35 |  | 13.51 | 14.71 | 20.33 | 17.82 | 19.21 |
| 78374 | 15.8 | 13.05 |  | 14.71 | 11.73 | 12.59 | 15.11 |
| 78331 | 13.51 | 14.25 | 11.23 | 9.97 |  | 9.37 | 10.55 |
| Mean |  | 32.84 | 20.75 | 21.95 | 26.51 | 27.44 | 30.15 |
| Bold cells | 4 | 2 | 1 | 0 | 2 | 1 | 1 |
Retrieval results of top 5% for data set DS2.
| Activity index | GBMD | SMILE | PCFP | ALOGP | MACCS | EPFP4 | CDKFP |
|---|---|---|---|---|---|---|---|
| 7707 | 71.12 | 71.41 | 71.6 | 75.49 |  | 77.96 | 75.39 |
| 7708 | 83.42 | 98.26 | 90.32 | 99.74 | 97.23 |  |  |
| 31420 | 90.5 |  | 55.58 | 61.15 | 55.9 | 81.73 | 88.81 |
| 64100 | 87.45 | 89.64 | 81.64 | 85.82 |  | 95.64 | 94.55 |
| 64200 | 96.53 | 88.24 | 89.6 | 95.36 | 97.87 |  | 99.63 |
| 64220 | 83.63 | 72.77 | 67.41 | 65.27 | 78.39 |  | 99.82 |
| 64500 | 76.84 | 93.51 | 78.89 | 59.66 | 85.77 |  | 97.41 |
| 64300 | 90.32 | 47.92 | 40.64 | 56.24 | 83.12 |  | 99.6 |
| 65000 |  | 71.01 | 71.91 | 85.5 | 81.34 | 86.59 | 85.43 |
| 75755 | 97.09 | 97.53 | 96.81 | 98.04 | 86.96 | 97.8 |  |
| Mean | 87.30 | 82.33 | 74.44 | 78.23 | 84.32 |  | 93.88 |
| Bold cells | 1 | 1 | 0 | 0 | 2 | 6 | 2 |
Rankings of various types of descriptors based on Kendall W test results: Top 1 and 5%.
| Data set | Recall type | W | p | Ranking |
|---|---|---|---|---|
| DS1 | Top 1% | 0.599 | <0.01 | GBMD > CDKFP > SMILE > EPFP4 = MACCS > PCFP > ALOGP |
| DS1 | Top 5% | 0.372 | <0.01 | GBMD > SMILE > CDKFP > MACCS > EPFP4 > ALOGP > PCFP |
| DS2 | Top 1% | 0.503 | <0.01 | CDKFP > EPFP4 > GBMD > SMILE > MACCS > ALOGP > PCFP |
| DS2 | Top 5% | 0.443 | <0.01 | EPFP4 > CDKFP > MACCS > GBMD > ALOGP > SMILE > PCFP |
| DS3 | Top 1% | 0.189 | <0.01 | GBMD > EPFP4 = CDKFP = SMILE > MACCS > PCFP > ALOGP |
| DS3 | Top 5% | 0.141 | <0.01 | EPFP4 > GBMD > MACCS > CDKFP > PCFP > SMILE > ALOGP |
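Kendall's coefficient of concordance W, which the ranking table reports, can be sketched as follows. The sketch assumes that each activity class acts as a "judge" ranking the descriptors by recall, and it uses the textbook formula W = 12S / (m²(n³ − n)) without a tie correction. The function names and the toy recall values are illustrative, and the excerpt does not state which software or tie handling was actually used.

```python
def average_ranks(scores):
    """Rank scores so the highest gets rank 1; tied scores share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos..end
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def kendalls_w(score_rows):
    """Kendall's W for m judges (rows) scoring n objects (columns), no tie correction."""
    m = len(score_rows)
    n = len(score_rows[0])
    rank_sums = [0.0] * n
    for row in score_rows:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Toy recall values (not taken from the paper): rows are activity classes,
# columns are three hypothetical descriptors.
toy_recall = [
    [60.0, 40.0, 20.0],
    [30.0, 25.0, 10.0],
    [15.0, 18.0, 5.0],
]
print(kendalls_w(toy_recall))
```

W ranges from 0 (no agreement between classes on which descriptor is best) to 1 (identical rankings in every class); the descriptor ordering itself follows from the summed ranks.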
Numbers of bold cells for mean recall of actives using different descriptors: DS1-DS3 Top 1% and 5%.
| Recall type | Data set | GBMD | SMILE | PCFP | ALOGP | MACCS | EPFP4 | CDKFP |
|---|---|---|---|---|---|---|---|---|
| Top 1% | DS1 | 6 | 1 | 1 | 0 | 1 | 0 | 3 |
| Top 1% | DS2 | 1 | 1 | 0 | 0 | 1 | 4 | 5 |
| Top 1% | DS3 | 5 | 2 | 0 | 1 | 2 | 0 | 1 |
| Top 5% | DS1 | 4 | 2 | 1 | 0 | 2 | 1 | 1 |
| Top 5% | DS2 | 1 | 1 | 0 | 0 | 2 | 6 | 2 |
| Top 5% | DS3 | 4 | 2 | 0 | 0 | 3 | 2 | 0 |
Retrieval results of top 1% and 5% for data set DS1 compared with LWDOSM and Lingo-DOSM.
| Activity index | GBMD (top 1%) | LINGO-DOSM (top 1%) | LWDOSM (top 1%) | GBMD (top 5%) | LINGO-DOSM (top 5%) | LWDOSM (top 5%) |
|---|---|---|---|---|---|---|
| 31420 | 60.93 | 61.1 |  | 82.74 | 84.82 |  |
| 71523 |  | 26.1 | 20.4 |  | 50.11 | 43.5 |
| 37110 |  | 17.37 | 12.18 |  | 28.19 | 23.8 |
| 31432 |  | 38.63 | 36.03 | 74.24 |  | 68.18 |
| 42731 |  | 11.86 | 14.34 |  | 21.62 | 27.51 |
| 6233 | 8.16 |  | 9.36 | 18.28 |  | 16.32 |
| 6245 | 5.75 | 4.66 |  | 13.69 | 10.92 |  |
| 7701 |  | 10.38 | 8.98 |  | 22.99 | 24.31 |
| 6235 | 7.83 |  | 8.23 | 19.35 |  | 21.42 |
| 78374 | 8.03 |  | 11.66 | 15.8 | 18.3 |  |
| 78331 |  | 5.8 | 4.79 |  | 10.16 | 12.98 |
| Mean |  | 19.06 | 18.65 |  | 34.04 | 33.42 |
| Bold cells | 7 | 3 | 2 | 6 | 3 | 3 |
Retrieval results of top 1% and 5% for data set DS3 compared with LWDOSM and Lingo-DOSM.
| Activity index | GBMD (top 1%) | LINGO-DOSM (top 1%) | LWDOSM (top 1%) | GBMD (top 5%) | LINGO-DOSM (top 5%) | LWDOSM (top 5%) |
|---|---|---|---|---|---|---|
| 9249 | 14.64 |  | 9.84 | 24.19 |  | 16.24 |
| 12455 | 5.81 |  | 8.29 | 10.21 | 14.62 |  |
| 12464 | 3.17 |  | 4.8 | 7.2 | 11.27 | 12.1 |
| 31281 |  | 16.19 | 12.1 | 25.62 |  | 21.9 |
| 43210 | 5.58 | 4.56 |  | 10.4 | 10.4 |  |
| 71522 |  | 2.36 | 2.98 |  | 5.89 | 6.34 |
| 75721 |  | 22.96 | 23.67 |  | 28.17 | 33.4 |
| 78331 | 6.83 |  | 7.59 | 14.38 | 15.48 |  |
| 78348 |  | 7.09 | 7.69 |  | 13.9 | 18.05 |
| 78351 |  | 10.31 | 8.24 | 12.93 |  | 11.46 |
| Mean |  | 10.36 | 9.13 |  | 16.69 | 16.64 |
| Bold cells | 6 | 4 | 1 | 4 | 3 | 3 |
Retrieval results of top 1% and 5% for data set DS2 compared with LWDOSM and Lingo-DOSM.
| Activity index | GBMD (top 1%) | LINGO-DOSM (top 1%) | LWDOSM (top 1%) | GBMD (top 5%) | LINGO-DOSM (top 5%) | LWDOSM (top 5%) |
|---|---|---|---|---|---|---|
| 7707 |  | 60.29 | 54.71 | 71.12 |  | 70.34 |
| 7708 | 68.65 |  | 82 | 83.42 |  | 93.29 |
| 31420 | 64.59 | 66.77 |  | 90.5 | 84.81 |  |
| 64100 | 74.27 | 79 |  | 87.45 | 98.45 |  |
| 64200 | 83.2 |  | 86.07 | 96.53 | 99.55 |  |
| 64220 | 49.04 |  | 68.6 | 83.63 |  | 92.23 |
| 64500 | 59.66 |  | 74.65 | 76.84 |  | 95.56 |
| 64300 | 76.48 |  | 73.52 | 90.32 |  | 91.28 |
| 65000 |  | 51.19 | 51.06 |  | 78.24 | 67.8 |
| 75755 | 93.72 |  | 96.3 | 97.09 |  | 97.91 |
| Mean | 71.01 |  | 74.71 | 87.3 |  | 90 |
| Bold cells | 2 | 7 | 2 | 1 | 6 | 3 |