Andreas Jahn, Georg Hinselmann, Nikolas Fechner, Andreas Zell.
Abstract
BACKGROUND: Ligand-based virtual screening is an important task in the early stage of drug discovery. An ambitious aim of each experiment is to identify active structures based on new scaffolds. To perform such "scaffold hopping" for individual problems and targets, a plethora of similarity methods based on diverse techniques have been published in recent years. The optimal assignment approach on molecular graphs, a successful method in the field of quantitative structure-activity relationships, has not yet been tested as a ligand-based virtual screening method.
Year: 2009 PMID: 20150995 PMCID: PMC2820492 DOI: 10.1186/1758-2946-1-14
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Figure 1. Optimal atom assignment. Optimal atom assignment of two angiotensin-converting enzyme molecules. The assignments are based on the local atom similarity calculations of the OAK. The color of the mapping edges indicates the atom similarity: green edges indicate a high similarity, red edges a low similarity.
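The assignment shown in Figure 1 can be illustrated with a minimal sketch. It assumes that an optimal assignment maps each atom of the smaller molecule to a distinct atom of the larger one so that the sum of pairwise atom similarities is maximal. The similarity values and the function name `optimal_assignment` are hypothetical, and the brute-force search stands in for the polynomial-time assignment solver a real implementation would use.

```python
from itertools import permutations

def optimal_assignment(sim):
    """Return (best_score, mapping) for a similarity matrix sim[i][j].

    Rows are atoms of the smaller molecule, columns atoms of the larger one.
    Brute force over all injective mappings; only feasible for tiny inputs.
    """
    n_rows, n_cols = len(sim), len(sim[0])
    if n_rows > n_cols:
        raise ValueError("put the smaller molecule on the rows")
    best_score, best_cols = float("-inf"), None
    for cols in permutations(range(n_cols), n_rows):
        score = sum(sim[i][j] for i, j in enumerate(cols))
        if score > best_score:
            best_score, best_cols = score, cols
    return best_score, list(enumerate(best_cols))

# Hypothetical similarities: 2 atoms of molecule A vs. 3 atoms of molecule B.
sim = [
    [0.9, 0.8, 0.1],
    [0.8, 0.1, 0.2],
]
score, mapping = optimal_assignment(sim)
# mapping == [(0, 1), (1, 0)]: atom 0 gives up its best partner (column 0)
# so that atom 1 can take it, which maximizes the total similarity.
```

A greedy row-by-row choice would pair atom 0 with column 0 and reach a total of only 1.1, which is why the assignment is solved globally.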
Figure 2. Local flexibility. Visualization of the local flexibility for one core atom. The colored shapes represent possible positions of the atoms with the same color. The black core atom is the source of the flexibility objects.
Figure 3. Optimal atom assignment with topological errors. Atom mapping of the OAK on two benzodiazepine derivatives, revealing topological errors. Each of the four intersecting edges maps one atom of the aromatic system onto the condensed system.
Figure 4. Fragmentation and assignment of fragments. Result of the fragmentation algorithm and the first assignment step. The aromatic and condensed systems were mapped onto each other. The terminal nitro group forms a conjugated fragment but has no assignment partner and remains unassigned.
Figure 5. Optimal atom assignment using the two-step hierarchical assignment. Result of an optimal assignment using the two-step hierarchical assignment approach with the case differentiation of the pairwise atom similarity calculation. The hierarchical assignment reduces the number of topological errors and yields a substructure-preserving mapping. The mapping of the carbon atom of the condensed ring system onto the nitrogen of the nitro group is an example of the sixth case and results in a penalized mapping.
Figure 6. Binned geometrical distances, spheres and trie. The upper left figure shows the spheres of the binned geometrical distances of 1.0, 2.0, and 3.0 Å for the centered carbon atom. The sphere of the binned geometrical distance of 0.0 Å (distances in the range [0.0;1.0)) is not visualized as an individual sphere because it contains no atoms. The upper right figure illustrates the resulting local atom pair environment of binned geometrical distances. For simplicity, only the distances to non-carbon atoms are displayed. The lower figure visualizes the corresponding trie of geometric atomic distances of the annotated atom in the upper figures. The root and leaves are labeled with the corresponding atom type. The leaves additionally contain the total number of occurrences in the local atom pair environment.
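The binning and the trie described in Figure 6 can be sketched as a nested dictionary: root atom type, then distance bin, then leaf atom type with its count. This is only an illustration under stated assumptions: the bin width of 1.0 Å follows the caption, while the cutoff `max_dist`, the coordinates, and the function name `binned_environment` are hypothetical.

```python
import math
from collections import defaultdict

def binned_environment(center, atoms, bin_width=1.0, max_dist=4.0):
    """Build {root type: {distance bin: {atom type: count}}} for one atom.

    center: (atom_type, (x, y, z)); atoms: list of such tuples.
    A distance d falls into the bin [k * bin_width, (k + 1) * bin_width).
    """
    c_type, c_xyz = center
    bins = defaultdict(lambda: defaultdict(int))
    for a_type, xyz in atoms:
        d = math.dist(c_xyz, xyz)
        if d == 0.0 or d >= max_dist:
            continue  # skip the center itself and atoms beyond the cutoff
        lower = math.floor(d / bin_width) * bin_width
        bins[lower][a_type] += 1
    return {c_type: {b: dict(t) for b, t in sorted(bins.items())}}

# Hypothetical coordinates in Angstrom around a central carbon atom.
center = ("C", (0.0, 0.0, 0.0))
neighbors = [
    ("N", (1.4, 0.0, 0.0)),
    ("O", (2.3, 0.0, 0.0)),
    ("O", (0.0, 2.5, 0.0)),
    ("N", (0.0, 0.0, 3.2)),
]
trie = binned_environment(center, neighbors)
# {'C': {1.0: {'N': 1}, 2.0: {'O': 2}, 3.0: {'N': 1}}}
```

As in the figure, the bin for [0.0;1.0) stays empty here because no neighbor lies that close to the central atom.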
Data sets.
| target | number of actives | number of decoys | number of clusters (a) | PDB code (b) |
|---|---|---|---|---|
| ace (c) | 46 | 1796 | 19 (p) | |
| ache (d) | 100 | 3859 | 18 | |
| cdk2 (e) | 47 | 2070 | 32 | |
| cox2 (f) | 212 | 12606 | 44 | |
| egfr (g) | 365 | 15560 | 40 | |
| fxa (h) | 64 | 2092 | 19 | |
| hivrt (i) | 34 | 1494 | 17 | |
| inha (j) | 57 | 2707 | 23 | |
| p38 (k) | 137 | 6779 | 20 | |
| pde5 (l) | 26 | 1698 | 22 | |
| pdgfrb (m) | 124 | 5603 | 22 | |
| src (n) | 98 | 5679 | 21 | |
| vegfr2 (o) | 48 | 2712 | 31 | |
Overview of the data sets used: the number of actives, decoys, and distinct chemotype clusters, and the PDB code of the complexed crystal structure that contains the search query.
a Number of clusters using the reduced graph algorithm from Barker et al. [55]
b PDB code of the complexed crystal structures from which the search queries were taken.
c Angiotensin-converting enzyme
d Acetylcholinesterase
e Cyclin-dependent kinase
f Cyclooxygenase-2
g Epidermal growth factor receptor
h Factor Xa
i HIV reverse transcriptase
j Enoyl ACP reductase
k P38 mitogen-activated protein
l Phosphodiesterase 5
m Platelet-derived growth factor receptor kinase
n Tyrosine kinase
o Vascular endothelial growth factor receptor
p The molecule with the ZINC ID 03814157 does not contain a ring and is therefore assigned to a dummy cluster, which forms one additional cluster.
awROC Enrichments at 0.5%.
| target | DOCK | FieldScreen | MACCS | OAK | OAKFLEX | 2SHA | OAAP |
|---|---|---|---|---|---|---|---|
| ace | 17.0 ± 6.2 | 14.7 | 55.7 ± 8.5 | 94.5 ± 9.6 | 73.5 ± 8.6 | 30.5 ± 8.4 | |
| ache | 0.0 ± 0.0 | 16.7 | 19.1 ± 4.6 | 24.5 ± 4.8 | 24.8 ± 4.9 | 23.1 ± 4.7 | |
| cdk2 | 4.0 ± 6.1 | 7.5 | 9.4 ± 1.8 | 9.4 ± 1.8 | 9.4 ± 1.8 | 9.4 ± 3.7 | |
| cox2 | 1.9 ± 0.6 | 48.8 | 17.0 ± 3.0 | 25.7 ± 3.6 | 42.5 ± 4.5 | 38.3 ± 6.4 | |
| egfr | 7.6 ± 2.2 | 52.4 | 40.3 ± 3.2 | 56.6 ± 3.9 | 47.4 ± 3.6 | 53.1 ± 4.3 | |
| fxa | 15.1 ± 6.8 | 0.0 | 20.0 ± 6.3 | 10.0 ± 4.6 | 10.0 ± 4.6 | 20.0 ± 6.3 | |
| hivrt | 4.4 ± 1.8 | 22.0 ± 5.5 | 20.1 ± 5.6 | 20.1 ± 5.6 | 20.1 ± 5.6 | 34.8 ± 8.0 | |
| inha | 0.0 ± 0.0 | 56.7 | 49.9 ± 7.7 | 31.8 ± 8.7 | 43.9 ± 8.4 | 57.9 ± 13.5 | |
| p38 | 0.0 ± 0.0 | 3.7 | 1.0 ± 0.4 | 22.2 ± 5.9 | 18.8 ± 5.0 | 10.7 ± 4.3 | |
| pde5 | 4.4 ± 4.3 | 6.8 | 4.3 ± 3.3 | 4.3 ± 2.3 | 0.0 ± 0.0 | | |
| pdgfrb | 0.0 ± 0.0 | 27.3 | 42.0 ± 7.4 | 47.0 ± 6.7 | 44.4 ± 6.7 | 43.9 ± 6.7 | |
| src | 0.0 ± 0.0 | 0.0 ± 0.0 | 5.7 ± 1.2 | 9.2 ± 1.2 | 4.3 ± 1.6 | 4.7 ± 1.0 | |
| vegfr2 | 6.2 ± 3.0 | 12.9 | 6.3 ± 1.6 | 12.5 ± 3.4 | 12.5 ± 3.4 | 12.5 ± 4.1 | |
| avg. rank | 6.27 | 4.08 | 4.35 | 3.23 | 3.27 | 3.54 | 3.12 |
awROC enrichments at a false positive fraction of 0.5% for DOCK, FieldScreen, MACCS keys, and the optimal assignment methods.
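For readers unfamiliar with the metric, the following sketch shows one way an arithmetic-weighted ROC enrichment could be computed. It assumes that each active is weighted by the inverse of its chemotype cluster size, so that every cluster contributes equally to the true positive rate, and that the enrichment is the weighted true positive rate divided by the false positive fraction. The function name `aw_roc_enrichment` and the toy ranking are hypothetical and do not reproduce the values in the table.

```python
def aw_roc_enrichment(ranked, fpf):
    """Arithmetic-weighted ROC enrichment at false positive fraction fpf.

    ranked: best-first list with a cluster id per active and None per decoy.
    Each active counts 1 / (number of clusters * size of its cluster).
    """
    n_decoys = sum(1 for c in ranked if c is None)
    sizes = {}
    for c in ranked:
        if c is not None:
            sizes[c] = sizes.get(c, 0) + 1
    max_decoys = fpf * n_decoys  # decoys tolerated before the cutoff
    tpr, seen_decoys = 0.0, 0
    for c in ranked:
        if c is None:
            seen_decoys += 1
            if seen_decoys > max_decoys:
                break
        else:
            tpr += 1.0 / (len(sizes) * sizes[c])
    return tpr / fpf

# Toy ranking: cluster "a" has two actives, cluster "b" one, plus four decoys.
ranking = ["a", "a", None, None, "b", None, None]
value = aw_roc_enrichment(ranking, fpf=0.25)  # 2.0
```

In this toy case the unweighted enrichment would be (2/3) / 0.25, roughly 2.67, because both retrieved actives would count in full even though they belong to the same chemotype.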
awROC Enrichments at 1.0%.
| target | DOCK | FieldScreen | MACCS | OAK | OAKFLEX | 2SHA | OAAP |
|---|---|---|---|---|---|---|---|
| ace | 13.5 ± 4.1 | 12.6 | 36.8 ± 4.9 | 47.3 ± 4.8 | 42.0 ± 4.9 | 20.5 ± 4.3 | |
| ache | 0.0 ± 0.0 | 9.8 ± 2.4 | 13.0 ± 2.5 | 14.2 ± 2.6 | 15.2 ± 2.8 | 11.8 ± 2.4 | |
| cdk2 | 9.5 ± 2.7 | 3.8 | 4.9 ± 1.0 | 6.5 ± 1.5 | 4.9 ± 1.5 | 11.1 ± 2.4 | |
| cox2 | 5.3 ± 1.4 | 29.5 | 10.7 ± 1.7 | 17.2 ± 2.0 | 24.0 ± 2.4 | 33.7 ± 2.9 | |
| egfr | 7.7 ± 1.6 | 29.5 | 20.4 ± 1.6 | 40.3 ± 3.1 | 26.2 ± 1.8 | 28.3 ± 2.0 | |
| fxa | 13.5 ± 3.6 | 2.8 | 10.5 ± 3.3 | 5.2 ± 2.4 | 5.2 ± 2.4 | 10.5 ± 3.3 | |
| hivrt | 2.2 ± 0.9 | 11.7 | 10.7 ± 3.7 | 10.7 ± 2.8 | 10.7 ± 2.8 | 10.7 ± 2.8 | |
| inha | 0.0 ± 0.0 | 31.2 | 31.0 ± 4.0 | 25.2 ± 3.6 | 22.9 ± 3.6 | 31.8 ± 3.7 | |
| p38 | 0.0 ± 0.0 | 1.8 | 0.5 ± 0.2 | 14.1 ± 3.2 | 11.8 ± 2.5 | 5.6 ± 2.2 | |
| pde5 | 4.5 | 2.3 ± 1.7 | 4.5 ± 2.0 | 4.5 ± 2.0 | 2.3 ± 1.2 | 0.0 ± 0.0 | |
| pdgfrb | 0.0 ± 0.0 | 13.6 | 23.2 ± 3.5 | 23.9 ± 3.4 | 22.9 ± 3.4 | 22.3 ± 3.4 | |
| src | 4.7 ± 2.1 | 7.0 | 0.0 ± 0.0 | 3.8 ± 0.5 | 5.4 ± 0.6 | 2.8 ± 0.5 | |
| vegfr2 | 3.2 ± 1.5 | 8.1 | 3.1 ± 0.8 | 6.3 ± 1.7 | 9.4 ± 3.1 | | |
| avg. rank | 5.54 | 4.00 | 4.73 | 3.65 | 4.04 | 2.73 | 3.23 |
awROC enrichments at a false positive fraction of 1.0% for DOCK, FieldScreen, MACCS keys, and the optimal assignment methods.
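The "avg. rank" rows summarize each method across all targets. A minimal sketch of such a summary is given below, assuming that methods are ranked per target by descending enrichment and that tied methods share the mean of the ranks they occupy. The tie rule, the function names, and the scores are assumptions for illustration, not values taken from the tables.

```python
def average_ranks(scores):
    """Rank methods for one target; higher score is better, ties share ranks."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks, i = {}, 0
    while i < len(ordered):
        j = i
        # Extend j over the run of methods tied with ordered[i].
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        mean_rank = (i + 1 + j + 1) / 2  # mean of the 1-based ranks i+1..j+1
        for method in ordered[i:j + 1]:
            ranks[method] = mean_rank
        i = j + 1
    return ranks

def mean_rank_over_targets(table):
    """Average the per-target ranks of every method over all targets."""
    totals = {}
    for scores in table.values():
        for method, rank in average_ranks(scores).items():
            totals[method] = totals.get(method, 0.0) + rank
    return {method: total / len(table) for method, total in totals.items()}

# Hypothetical enrichments for three methods on two targets.
table = {
    "t1": {"A": 10.0, "B": 10.0, "C": 4.0},
    "t2": {"A": 1.0, "B": 3.0, "C": 2.0},
}
summary = mean_rank_over_targets(table)
# {'A': 2.25, 'B': 1.25, 'C': 2.5}
```

A lower average rank indicates a method that tends to perform better across targets.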
awROC Enrichments at 2.0%.
| target | DOCK | FieldScreen | MACCS | OAK | OAKFLEX | 2SHA | OAAP |
|---|---|---|---|---|---|---|---|
| ace | 8.2 ± 1.9 | 8.9 | 21.3 ± 2.4 | 27.6 ± 2.5 | 21.0 ± 3.0 | 14.7 ± 2.6 | |
| ache | 0.0 ± 0.0 | 5.3 ± 1.2 | 7.5 ± 1.3 | 7.5 ± 1.3 | 10.1 ± 1.4 | 6.3 ± 1.3 | |
| cdk2 | 7.0 ± 1.4 | 1.9 | 2.5 ± 0.5 | 5.5 ± 1.2 | 4.8 ± 1.1 | 7.9 ± 1.4 | |
| cox2 | 5.9 ± 1.3 | 17.8 | 6.8 ± 0.8 | 14.1 ± 1.6 | 17.6 ± 1.5 | 22.9 ± 1.4 | |
| egfr | 7.2 ± 0.8 | 18.1 | 11.0 ± 0.8 | 24.1 ± 1.3 | 20.5 ± 1.3 | 15.8 ± 1.1 | |
| fxa | 6.8 ± 1.8 | 5.4 | 5.3 ± 1.7 | 2.6 ± 1.2 | 2.6 ± 1.2 | 5.3 ± 1.7 | |
| hivrt | 2.2 ± 0.5 | 8.8 ± 1.8 | 8.3 ± 2.1 | 5.4 ± 1.8 | 5.4 ± 1.4 | 9.8 ± 2.0 | |
| inha | 0.0 ± 0.0 | 15.6 | 17.5 ± 2.0 | 15.9 ± 1.9 | 13.8 ± 1.7 | 16.2 ± 2.0 | |
| p38 | 0.8 ± 0.6 | 0.9 | 0.8 ± 0.3 | 9.3 ± 1.6 | 8.8 ± 1.6 | 5.7 ± 1.7 | |
| pde5 | 7.8 ± 1.7 | 1.1 ± 0.8 | 2.3 ± 1.0 | 3.4 ± 0.8 | 4.5 ± 0.8 | 0.0 ± 0.0 | |
| pdgfrb | 0.0 ± 0.0 | 9.1 | 12.8 ± 1.8 | 12.1 ± 1.7 | 11.5 ± 1.7 | 11.3 ± 1.7 | |
| src | 2.5 ± 1.1 | 3.7 | 0.0 ± 0.0 | 2.5 ± 0.2 | 6.4 ± 1.1 | 1.6 ± 0.3 | |
| vegfr2 | 3.2 ± 1.1 | 7.3 | 6.4 ± 1.3 | 3.2 ± 0.9 | 3.2 ± 0.9 | 4.8 ± 1.1 | |
| avg. rank | 6.54 | 3.62 | 4.93 | 3.58 | 4.12 | 2.81 | 3.73 |
awROC enrichments at a false positive fraction of 2.0% for DOCK, FieldScreen, MACCS keys, and the optimal assignment methods.
awROC Enrichments at 5.0%.
| target | DOCK | FieldScreen | MACCS | OAK | OAKFLEX | 2SHA | OAAP |
|---|---|---|---|---|---|---|---|
| ace | 4.6 ± 0.9 | 4.7 | 10.7 ± 1.2 | 11.6 ± 1.0 | 8.0 ± 1.0 | | |
| ache | 0.8 ± 0.2 | 2.1 ± 0.5 | 3.9 ± 0.6 | 4.4 ± 0.6 | 5.4 ± 0.6 | 4.0 ± 0.6 | |
| cdk2 | 2.8 ± 0.6 | 0.8 | 2.6 ± 0.6 | 2.6 ± 0.4 | 2.6 ± 0.4 | 3.5 ± 0.7 | |
| cox2 | 5.5 ± 0.5 | 10.4 | 4.9 ± 0.5 | 9.0 ± 0.6 | 8.8 ± 0.6 | 9.7 ± 0.6 | |
| egfr | 4.5 ± 0.4 | 9.5 | 5.0 ± 0.4 | 11.6 ± 0.5 | 11.3 ± 0.5 | 7.3 ± 0.5 | |
| fxa | 5.4 | 3.1 ± 0.8 | 2.1 ± 0.7 | 1.1 ± 0.5 | 2.6 ± 0.8 | 2.1 ± 0.7 | |
| hivrt | 2.2 ± 0.6 | 3.5 ± 1.1 | 3.3 ± 0.8 | 3.3 ± 0.8 | 3.5 ± 0.7 | | |
| inha | 0.0 ± 0.0 | 6.5 | 7.0 ± 0.8 | 8.6 ± 0.8 | 5.7 ± 0.7 | 7.0 ± 0.8 | |
| p38 | 1.4 ± 0.5 | 0.5 | 0.5 ± 0.1 | 4.3 ± 0.7 | 4.0 ± 0.6 | 2.9 ± 0.7 | |
| pde5 | 4.3 ± 0.8 | 0.5 ± 0.3 | 2.3 ± 0.6 | 1.4 ± 0.3 | 2.7 ± 0.6 | 1.4 ± 0.6 | |
| pdgfrb | 0.0 ± 0.0 | 3.8 | 6.0 ± 0.7 | 4.9 ± 0.7 | 4.9 ± 0.7 | 4.5 ± 0.7 | |
| src | 1.0 ± 0.4 | 2.5 | 0.1 ± 0.0 | 3.7 ± 0.7 | 4.5 ± 0.8 | 1.0 ± 0.1 | |
| vegfr2 | 1.3 ± 0.5 | 3.5 | 2.6 ± 0.5 | 1.3 ± 0.4 | 1.3 ± 0.4 | 2.6 ± 0.5 | |
| avg. rank | 5.54 | 3.69 | 5.04 | 3.69 | 4.19 | 2.23 | 3.62 |
awROC enrichments at a false positive fraction of 5.0% for DOCK, FieldScreen, MACCS keys, and the optimal assignment methods.
awAUC values.
| target | DOCK | FieldScreen | MACCS | OAK | OAKFLEX | 2SHA | OAAP |
|---|---|---|---|---|---|---|---|
| ace | 0.67 ± 0.03 | 0.64 | 0.84 ± 0.03 | 0.81 ± 0.03 | 0.86 ± 0.03 | 0.72 ± 0.04 | |
| ache | 0.57 ± 0.02 | 0.37 ± 0.03 | 0.44 ± 0.03 | 0.46 ± 0.04 | 0.50 ± 0.04 | 0.50 ± 0.03 | |
| cdk2 | 0.53 ± 0.03 | 0.44 | 0.55 ± 0.02 | 0.46 ± 0.02 | 0.48 ± 0.03 | 0.53 ± 0.03 | |
| cox2 | 0.68 ± 0.02 | 0.82 | 0.56 ± 0.02 | 0.77 ± 0.02 | 0.77 ± 0.01 | 0.79 ± 0.01 | |
| egfr | 0.55 ± 0.02 | 0.60 ± 0.02 | 0.72 ± 0.02 | 0.70 ± 0.02 | 0.72 ± 0.02 | 0.49 ± 0.02 | |
| fxa | 0.72 ± 0.03 | 0.45 ± 0.03 | 0.46 ± 0.03 | 0.50 ± 0.03 | 0.56 ± 0.03 | 0.57 ± 0.03 | |
| hivrt | 0.63 | 0.54 ± 0.04 | 0.53 ± 0.03 | 0.47 ± 0.03 | 0.53 ± 0.04 | 0.65 ± 0.03 | |
| inha | 0.26 ± 0.02 | 0.64 ± 0.03 | 0.53 ± 0.04 | 0.52 ± 0.04 | 0.64 ± 0.03 | 0.59 ± 0.04 | |
| p38 | 0.36 ± 0.02 | 0.27 | 0.38 ± 0.02 | 0.47 ± 0.03 | 0.49 ± 0.03 | 0.43 ± 0.03 | |
| pde5 | 0.48 ± 0.04 | 0.28 ± 0.03 | 0.37 ± 0.04 | 0.32 ± 0.03 | 0.38 ± 0.03 | 0.35 ± 0.03 | |
| pdgfrb | 0.40 ± 0.02 | 0.40 | 0.54 ± 0.03 | 0.52 ± 0.03 | 0.49 ± 0.03 | | |
| src | 0.52 ± 0.02 | 0.39 | 0.50 ± 0.02 | 0.66 ± 0.02 | 0.72 ± 0.02 | 0.30 ± 0.02 | |
| vegfr2 | 0.42 ± 0.03 | 0.53 | 0.42 ± 0.03 | 0.31 ± 0.03 | 0.33 ± 0.02 | 0.41 ± 0.03 | |
| avg. rank | 4.27 | 3.42 | 4.50 | 4.04 | 4.62 | 3.08 | 3.92 |
awAUC values of DOCK, FieldScreen, MACCS keys, and the optimal assignment methods.
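The awAUC can be sketched in the same spirit, assuming it is the area under the ROC curve with each active weighted by the inverse of its cluster size. Under that assumption the area equals the weighted average, over all actives, of the fraction of decoys ranked below the active. The function name `aw_auc` and the toy ranking are hypothetical.

```python
def aw_auc(ranked):
    """Arithmetic-weighted AUC for a best-first ranking.

    ranked: cluster id per active, None per decoy.
    Each active counts 1 / (number of clusters * size of its cluster).
    """
    n_decoys = sum(1 for c in ranked if c is None)
    sizes = {}
    for c in ranked:
        if c is not None:
            sizes[c] = sizes.get(c, 0) + 1
    auc, decoys_above = 0.0, 0
    for c in ranked:
        if c is None:
            decoys_above += 1
        else:
            weight = 1.0 / (len(sizes) * sizes[c])
            # Fraction of decoys that this active outranks.
            auc += weight * (n_decoys - decoys_above) / n_decoys
    return auc

# Toy ranking: cluster "a" has two actives, cluster "b" one, plus four decoys.
auc = aw_auc(["a", "a", None, None, "b", None, None])  # 0.75
```

A value of 0.5 corresponds to a random ranking, so entries below 0.5 in the table indicate rankings that place actives worse than chance on that target.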
Figure 7. Example of egfr clusters. The figure shows four parent structures of different egfr clusters. Although these structures belong to different clusters, they share a common basic scaffold.
Figure 8. Chemotype enrichment and "scaffold hopping" on p38. The left figure shows the chemotype enrichment of the four optimal assignment methods, the MACCS keys, and the random performance on the p38 data set. A chemotype is considered retrieved once one structure of the chemotype has been ranked. The right figure shows five different chemotypes that were retrieved only by the 2SHA method after ranking 25% of the data set.
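The retrieval criterion in this caption can be made concrete with a short sketch. It assumes a best-first ranking in which each active carries its chemotype id and each decoy is `None`; the function name `chemotypes_retrieved`, the method labels, and the rankings are hypothetical.

```python
def chemotypes_retrieved(ranked, fraction):
    """Set of chemotypes with at least one member in the top fraction."""
    cutoff = int(len(ranked) * fraction)
    return {c for c in ranked[:cutoff] if c is not None}

# Hypothetical rankings of the same eight compounds by two methods.
method_x = ["a", None, "a", "b", None, None, "c", None]
method_y = ["a", "c", None, None, "a", None, "b", None]

found_x = chemotypes_retrieved(method_x, 0.5)  # {'a', 'b'}
found_y = chemotypes_retrieved(method_y, 0.5)  # {'a', 'c'}
only_x = found_x - found_y                     # {'b'}
```

The set difference corresponds to the right-hand panel: chemotypes that one method retrieves within the cutoff and the other does not.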
Figure 9. Rank correlation coefficient of the chemotype discovery between optimal assignment methods. The diagram shows all pairwise rank correlation coefficients of the order of chemotype discovery between two optimal assignment methods. A high correlation indicates that the two methods discover the chemotypes in a similar order. Each box plot was created from the correlations of the order of chemotype discovery on each data set used in this study.
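A rank correlation of discovery orders can be sketched as follows. The caption does not name the coefficient, so Spearman's rho is used here purely as an assumption, and the helper names and rankings are hypothetical. The order of discovery of a chemotype is taken to be the position at which its first member appears in the ranking, counted among chemotypes.

```python
def discovery_order(ranked):
    """Map each chemotype to the 1-based order of its first retrieval."""
    order = {}
    for c in ranked:
        if c is not None and c not in order:
            order[c] = len(order) + 1
    return order

def spearman(order_a, order_b):
    """Spearman's rho for two tie-free orderings of the same chemotypes."""
    n = len(order_a)
    d2 = sum((order_a[k] - order_b[k]) ** 2 for k in order_a)
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

# Hypothetical rankings by two methods; None marks a decoy.
order_1 = discovery_order(["x", None, "y", "x", "z"])  # x=1, y=2, z=3
order_2 = discovery_order(["y", "x", None, "z"])       # y=1, x=2, z=3
rho = spearman(order_1, order_2)  # 0.5
```

Computing one such coefficient per data set and method pair would give the samples summarized by each box plot.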
Figure 10. Rank correlation coefficient of the chemotype discovery between optimal assignment methods and DOCK/MACCS keys. The box plots show the correlation coefficients of the order of chemotype discovery between the optimal assignment methods and DOCK as well as the MACCS keys. The experimental setup is identical to that of the previous correlation analysis between two optimal assignment methods.