Maged Nasser1, Naomie Salim1, Faisal Saeed2, Shadi Basurra2, Idris Rabiu1, Hentabli Hamza1,3, Muaadh A Alsoufi1.
Abstract
Molecular similarity is widely used in rational drug design, where molecular databases are searched for structurally similar molecules in order to retrieve functionally similar ones. The most widely used conventional similarity methods rely on two-dimensional (2D) fingerprints to evaluate the similarity of molecules to a target query. However, these descriptors include redundant and irrelevant features that can degrade the performance of similarity searching methods. This study therefore proposes a new approach that identifies the important features of molecules in chemical datasets by representing the molecular features with an autoencoder (AE), with the aim of removing irrelevant and redundant features. The proposed approach was evaluated on the MDL Drug Data Report (MDDR) standard dataset. In these experiments it performed better than several existing benchmark similarity methods, namely the Tanimoto similarity method (TAN), the adapted similarity measure of text processing (ASMTP), and the quantum-based similarity method (SQB). The improvement was most pronounced on structurally heterogeneous datasets, where the approach yielded better results than previously used methods that share the goal of improving molecular similarity searching.
Keywords: autoencoder; drug design; irrelevant and redundant features; molecular similarity
Year: 2022 PMID: 35454097 PMCID: PMC9029813 DOI: 10.3390/biom12040508
Source DB: PubMed Journal: Biomolecules ISSN: 2218-273X
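
The TAN baseline named in the abstract is commonly computed as the Tanimoto coefficient over binary 2D fingerprints. The sketch below is a minimal illustration, assuming each fingerprint is represented as a set of on-bit indices; the function name `tanimoto` and the toy fingerprints are hypothetical and are not taken from the paper.

```python
from typing import AbstractSet


def tanimoto(fp_a: AbstractSet[int], fp_b: AbstractSet[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    shared = len(fp_a & fp_b)
    # |A ∩ B| / (|A| + |B| - |A ∩ B|)
    return shared / (len(fp_a) + len(fp_b) - shared)


# Toy fingerprints (hypothetical bit positions).
query = {1, 4, 7, 9}
candidate = {1, 4, 8}
score = tanimoto(query, candidate)
print(score)  # 2 shared bits out of 5 distinct bits -> 0.4
```

A similarity search would apply such a function between the query and every database molecule and then sort the database by the resulting score.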
Figure 1. The standard structure of the autoencoder.
Figure 2. Visual description of the autoencoder.
Figure 3. Autoencoder framework for molecular dimensionality reduction.
Figure 4. The proposed AE1-DR design.
Figure 5. The proposed AE2-DR design.
Figure 6. The proposed AE3-DR design.
Figure 7. The experimental design process.
The MDDR-DS1 structure-activity classes.
| Activity Class | Active Molecules | Activity Index | Pairwise Similarity |
|---|---|---|---|
| Renin inhibitors | 1130 | 31,420 | 0.290 |
| HIV protease inhibitors | 750 | 71,523 | 0.198 |
| Thrombin inhibitors | 803 | 37,110 | 0.180 |
| Angiotensin II AT1 antagonists | 943 | 31,432 | 0.229 |
| Substance P antagonists | 1246 | 42,731 | 0.149 |
| 5HT3 antagonist | 752 | 06233 | 0.140 |
| 5HT reuptake inhibitors | 359 | 06245 | 0.122 |
| D2 antagonists | 395 | 07701 | 0.138 |
| 5HT1A agonists | 827 | 06235 | 0.133 |
| Protein kinase C inhibitors | 453 | 78,374 | 0.120 |
| Cyclooxygenase inhibitors | 636 | 78,331 | 0.108 |
The MDDR-DS2 structure-activity classes.
| Activity Class | Active Molecules | Activity Index | Pairwise Similarity |
|---|---|---|---|
| Adenosine (A1) agonists | 207 | 07707 | 0.229 |
| Adenosine (A2) agonists | 156 | 07708 | 0.305 |
| Renin inhibitors | 1130 | 31,420 | 0.290 |
| CCK agonists | 111 | 42,710 | 0.361 |
| Monocyclic | 1346 | 64,100 | 0.336 |
| Cephalosporins | 113 | 64,200 | 0.322 |
| Carbacephems | 1051 | 64,220 | 0.269 |
| Carbapenems | 126 | 64,500 | 0.260 |
| Tribactams | 388 | 64,350 | 0.305 |
The MDDR-DS3 structure-activity classes.
| Activity Class | Active Molecules | Activity Index | Pairwise Similarity |
|---|---|---|---|
| Muscarinic (M1) agonists | 900 | 09249 | 0.111 |
| NMDA receptor antagonists | 1400 | 12,455 | 0.098 |
| Nitric oxide synthase inhibitors | 505 | 12,464 | 0.102 |
| Dopamine | 106 | 31,281 | 0.125 |
| Aldose reductase inhibitors | 957 | 43,210 | 0.119 |
| Reverse transcriptase inhibitors | 700 | 71,522 | 0.103 |
| Aromatase inhibitors | 636 | 75,721 | 0.110 |
| Cyclooxygenase inhibitors | 636 | 78,331 | 0.108 |
| Phospholipase A2 inhibitors | 617 | 78,348 | 0.123 |
| Lipoxygenase inhibitors | 2111 | 78,351 | 0.113 |
Retrieval results at the top 1% for the MDDR-DS1 dataset.
| Activity Index | TAN | ASMTP | SQB | AE1_DR | AE2_DR | AE3_DR |
|---|---|---|---|---|---|---|
| 31,420 | 69.69 | 73.84 | 73.73 | 71.31 | 70.43 | 70.99 |
| 71,523 | 25.94 | 15.03 | 26.84 | 28.37 | 25.37 | 25.85 |
| 37,110 | 9.63 | 20.82 | 24.73 | 21.40 | 21.90 | 20.92 |
| 31,432 | 35.82 | 37.14 | 36.66 | 41.34 | 40.71 | 41.04 |
| 42,731 | 17.77 | 19.53 | 21.17 | 19.23 | 17.67 | 22.03 |
| 06233 | 13.87 | 10.35 | 12.49 | 13.01 | 14.04 | 14.87 |
| 06245 | 6.51 | 5.50 | 6.03 | 6.03 | 7.78 | 7.08 |
| 07701 | 8.63 | 7.99 | 11.35 | 9.87 | 8.91 | 12.31 |
| 06235 | 9.71 | 9.94 | 10.15 | 10.71 | 11.07 | 10.49 |
| 78,374 | 13.69 | 13.90 | 13.08 | 11.91 | 12.04 | 13.74 |
| 78,331 | 7.17 | 6.89 | 5.92 | 7.23 | 7.07 | 8.14 |
| Mean | 19.86 | 20.08 | 22.01 | 21.86 | 21.54 | |
| Best results (count) | 0 | 2 | 1 | 2 | 2 | |
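
The two summary rows that close the table above are consistent with, first, the per-method mean over the 11 activity classes and, second, the number of classes in which each method scores highest. The sketch below recomputes both from the per-class values listed in the table; this reading of the rows is an inference from the numbers rather than something the source states, and the variable names are illustrative.

```python
methods = ["TAN", "ASMTP", "SQB", "AE1_DR", "AE2_DR", "AE3_DR"]

# Per-class values copied from the MDDR-DS1 top-1% table (one row per activity index).
recall = [
    [69.69, 73.84, 73.73, 71.31, 70.43, 70.99],
    [25.94, 15.03, 26.84, 28.37, 25.37, 25.85],
    [9.63, 20.82, 24.73, 21.40, 21.90, 20.92],
    [35.82, 37.14, 36.66, 41.34, 40.71, 41.04],
    [17.77, 19.53, 21.17, 19.23, 17.67, 22.03],
    [13.87, 10.35, 12.49, 13.01, 14.04, 14.87],
    [6.51, 5.50, 6.03, 6.03, 7.78, 7.08],
    [8.63, 7.99, 11.35, 9.87, 8.91, 12.31],
    [9.71, 9.94, 10.15, 10.71, 11.07, 10.49],
    [13.69, 13.90, 13.08, 11.91, 12.04, 13.74],
    [7.17, 6.89, 5.92, 7.23, 7.07, 8.14],
]

# Column-wise mean, rounded to two decimals as in the table.
means = [round(sum(col) / len(col), 2) for col in zip(*recall)]

# For each activity class, credit the method with the highest value.
best_counts = [0] * len(methods)
for row in recall:
    best_counts[row.index(max(row))] += 1

print(means[:5])        # [19.86, 20.08, 22.01, 21.86, 21.54]
print(best_counts[:5])  # [0, 2, 1, 2, 2]
```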
Retrieval results at the top 5% for the MDDR-DS1 dataset.
| Activity Index | TAN | ASMTP | SQB | AE1_DR | AE2_DR | AE3_DR |
|---|---|---|---|---|---|---|
| 31,420 | 83.49 | 86 | 87.75 | 85.8 | 85.03 | 87.08 |
| 71,523 | 48.92 | 51.33 | 60.16 | 55.21 | 57.22 | 56.41 |
| 37,110 | 21.01 | 23.87 | 39.81 | 43.53 | 42.17 | 41.79 |
| 31,432 | 74.29 | 76.63 | 82 | 78.72 | 80.40 | 80.12 |
| 42,731 | 29.68 | 32.9 | 28.77 | 27.04 | 26.03 | 27.04 |
| 06233 | 27.68 | 26.2 | 20.96 | 23.8 | 24.11 | 25.19 |
| 06245 | 16.54 | 15.5 | 15.39 | 19.76 | 21.17 | 21.07 |
| 07701 | 24.09 | 23.9 | 26.90 | 25.21 | 24.78 | 26.25 |
| 06235 | 20.06 | 23.6 | 22.47 | 22.08 | 21.91 | 24.17 |
| 78,374 | 20.51 | 22.26 | 20.95 | 18.19 | 19.88 | 23.74 |
| 78,331 | 16.2 | 15 | 10.31 | 11.07 | 11.9 | 13.19 |
| Mean | 34.77 | 36.11 | 37.77 | 37.31 | 37.69 | |
| Best results (count) | 2 | 1 | | 0 | 1 | |
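
The entries in these retrieval tables are read here as recall percentages at a cutoff, that is, the share of a class's active molecules that appear in the top 1% or 5% of the similarity-ranked database. The source does not spell out the computation, so the following is only a sketch under that assumption; `recall_at_fraction` and the toy scores are hypothetical.

```python
import math
from typing import Sequence


def recall_at_fraction(scores: Sequence[float], is_active: Sequence[bool], fraction: float) -> float:
    """Percentage of all actives that fall within the top `fraction` of the ranking."""
    n_top = max(1, math.ceil(len(scores) * fraction))
    # Rank molecule indices by descending similarity to the query.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = sum(1 for i in ranked[:n_top] if is_active[i])
    total = sum(1 for flag in is_active if flag)
    return 100.0 * hits / total if total else 0.0


# Toy database of 10 molecules, 4 of which are active (made-up values).
scores = [0.9, 0.8, 0.75, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
is_active = [True, False, True, True, False, False, False, True, False, False]

print(recall_at_fraction(scores, is_active, 0.5))  # 3 of 4 actives in the top 5 -> 75.0
```

In a full experiment this value would be averaged over several query molecules per activity class before being tabulated.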
Retrieval results at the top 1% for the MDDR-DS2 dataset.
| Activity Index | TAN | ASMTP | SQB | AE1_DR | AE2_DR | AE3_DR |
|---|---|---|---|---|---|---|
| 07707 | 61.84 | 67.86 | 72.09 | 70.15 | 73.18 | 73.46 |
| 07708 | 47.03 | 97.87 | 95.68 | 95.73 | 97.57 | 98.75 |
| 31,420 | 65.10 | 73.51 | 78.56 | 73.75 | 75.17 | 74.04 |
| 42,710 | 81.27 | 81.17 | 76.82 | 80.12 | 83.03 | 82.01 |
| 64,100 | 80.31 | 86.62 | 87.80 | 86.19 | 88.17 | 87.79 |
| 64,200 | 53.84 | 69.11 | 70.18 | 67.61 | 67.02 | 69.08 |
| 64,220 | 38.64 | 66.26 | 67.58 | 67.96 | 66.74 | 67.19 |
| 64,500 | 30.56 | 46.24 | 79.20 | 74.04 | 76.02 | 79.72 |
| 64,350 | 80.18 | 68.01 | 81.68 | 81.96 | 81.77 | 83.09 |
| 75,755 | 87.56 | 93.48 | 98.02 | 97.26 | 97.08 | 98.15 |
| Mean | 62.63 | 75.01 | 80.76 | 79.48 | 80.58 | |
| Best results (count) | 0 | 0 | 2 | 1 | 2 | |
Retrieval results at the top 5% for the MDDR-DS2 dataset.
| Activity Index | TAN | ASMTP | SQB | AE1_DR | AE2_DR | AE3_DR |
|---|---|---|---|---|---|---|
| 07707 | 70.39 | 76.17 | 74.22 | 73.33 | 77.78 | 80.24 |
| 07708 | 56.58 | 99.99 | 100 | 97.9 | 98.03 | 99.28 |
| 31,420 | 88.19 | 95.75 | 95.24 | 92.08 | 94.11 | 95.22 |
| 42,710 | 88.09 | 96.73 | 93 | 91.06 | 91.27 | 92.71 |
| 64,100 | 93.75 | 98.27 | 98.94 | 98.90 | 97.41 | 97.85 |
| 64,200 | 77.68 | 96.16 | 98.93 | 93.80 | 94.80 | 95.90 |
| 64,220 | 52.19 | 94.13 | 90.9 | 91.5 | 92.09 | 92.33 |
| 64,500 | 44.8 | 90.6 | 92.72 | 89.04 | 91.08 | 91.07 |
| 64,350 | 91.71 | 98.6 | 93.75 | 91.11 | 92.44 | 90.9 |
| 75,755 | 94.82 | 97.27 | 98.75 | 98.08 | 97.09 | 97.19 |
| Mean | 75.82 | | 93.61 | 91.68 | 92.61 | 93.27 |
| Best results (count) | 0 | 4 | | 0 | 0 | 1 |
Retrieval results at the top 1% for the MDDR-DS3 dataset.
| Activity Index | TAN | SQB | AE1_DR | AE2_DR | AE3_DR |
|---|---|---|---|---|---|
| 09249 | 12.12 | 10.99 | 15.01 | 16.03 | 17.76 |
| 12,455 | 6.57 | 7.03 | 7.88 | 9.17 | 6.77 |
| 12,464 | 8.17 | 6.92 | 11.12 | 12.50 | 12.04 |
| 31,281 | 16.95 | 18.67 | 17.66 | 17.75 | 16.5 |
| 43,210 | 6.27 | 6.83 | 9.76 | 9.07 | 10.90 |
| 71,522 | 3.75 | 6.57 | 7.19 | 9.14 | 9.02 |
| 75,721 | 17.32 | 20.38 | 22.29 | 21.66 | 23.90 |
| 78,331 | 6.31 | 6.16 | 6.09 | 5.06 | 8.98 |
| 78,348 | 10.15 | 8.99 | 9.11 | 6.89 | 6.40 |
| 78,351 | 9.84 | 12.5 | 14.02 | 15.78 | 16.06 |
| Mean | 9.75 | 10.50 | 12.01 | 12.31 | |
| Best results (count) | 1 | 1 | 0 | 3 | |
Retrieval results at the top 5% for the MDDR-DS3 dataset.
| Activity Index | TAN | SQB | AE1_DR | AE2_DR | AE3_DR |
|---|---|---|---|---|---|
| 09249 | 24.17 | 17.8 | 26.08 | 26.02 | 25.79 |
| 12,455 | 10.29 | 11.42 | 14.85 | 15.86 | 14.99 |
| 12,464 | 15.22 | 16.79 | 19.76 | 20.74 | 19.78 |
| 31,281 | 29.62 | 29.05 | 32.33 | 33.19 | 35.01 |
| 43,210 | 16.07 | 14.12 | 19.11 | 20.22 | 19.55 |
| 71,522 | 12.37 | 13.82 | 15.44 | 15.07 | 16.06 |
| 75,721 | 25.21 | 30.61 | 33.71 | 34.45 | 35.33 |
| 78,331 | 15.01 | 11.97 | 13.22 | 13.10 | 14.12 |
| 78,348 | 24.67 | 21.14 | 20.87 | 20.98 | 21.89 |
| 78,351 | 11.71 | 13.30 | 17.50 | 16.45 | 18.08 |
| Mean | 18.43 | 18.00 | 21.29 | 21.61 | |
| Best results (count) | 2 | 0 | 1 | 3 | |
Rankings of the TAN, ASMTP, SQB, AE1-DR, AE2-DR, and AE3-DR approaches based on Kendall W test results for DS1, DS2, and DS3 at the top 1% and top 5%.
| Data Set | Recall Cut-Off | W | P | Mean Rank (TAN) | Mean Rank (ASMTP) | Mean Rank (SQB) | Mean Rank (AE1_DR) | Mean Rank (AE2_DR) | Mean Rank (AE3_DR) |
|---|---|---|---|---|---|---|---|---|---|
| DS1 | 1% | 0.19 | 0.00012 | 1.56 | 1.64 | 2.59 | 2.95 | 2.55 | |
| DS1 | 5% | 0.11 | 0.03 | 1.73 | 2.55 | 2.82 | 2.05 | 2.36 | |
| DS2 | 1% | 0.49 | 0.002 | 0.4 | 1.7 | 3.2 | 2.4 | 3.2 | |
| DS2 | 5% | 0.61 | 0.001 | 0.2 | | | 1.8 | 2.6 | 3.1 |
| DS3 | 1% | 0.23 | 0.0001 | 1 | Not used | 1.4 | 2.3 | 2.6 | |
| DS3 | 5% | 0.47 | 0.0011 | 1.1 | Not used | 0.7 | 2.2 | 2.7 | |
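
Kendall's coefficient of concordance W, reported in the table above, measures how strongly a set of judges agree when ranking the same objects. The sketch below uses the standard tie-free formula W = 12S / (m²(n³ − n)). It assumes, as a reading of the setup rather than a statement from the source, that activity classes play the role of judges and similarity methods are the ranked objects; `kendalls_w` and the toy rankings are illustrative.

```python
from typing import Sequence


def kendalls_w(rank_matrix: Sequence[Sequence[int]]) -> float:
    """Kendall's W for m judges ranking n objects (ranks 1..n, no ties).

    rank_matrix[j][i] is the rank that judge j assigns to object i.
    """
    m = len(rank_matrix)
    n = len(rank_matrix[0])
    # Total rank received by each object across all judges.
    totals = [sum(judge[i] for judge in rank_matrix) for i in range(n)]
    mean_total = m * (n + 1) / 2
    # Sum of squared deviations of the rank totals from their mean.
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Three judges in complete agreement over four objects.
agree = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
# Two judges with exactly opposite rankings of three objects.
oppose = [[1, 2, 3], [3, 2, 1]]

print(kendalls_w(agree))   # 1.0
print(kendalls_w(oppose))  # 0.0
```

W ranges from 0 (no agreement) to 1 (complete agreement); tied ranks would need a correction term that this sketch omits.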