Anita Rácz, Dávid Bajusz, Károly Héberger.
Abstract
BACKGROUND: Interaction fingerprints (IFP) have been repeatedly shown to be valuable tools in virtual screening for identifying novel hit compounds that can subsequently be optimized into drug candidates. As a complementary method to ligand docking, IFPs can be applied to quantify the similarity of predicted binding poses to a reference binding pose. For this purpose, a large number of similarity metrics can be applied, and various parameters of the IFPs themselves can be customized. In a large-scale comparison, we have assessed the effect of similarity metrics and IFP configurations on a number of virtual screening scenarios with ten different protein targets and thousands of molecules. In particular, we studied the effect of considering general interaction definitions (such as Any Contact, Backbone Interaction and Sidechain Interaction), the effect of filtering methods, and the different groups of similarity metrics.
Keywords: ANOVA; Binary fingerprints; FPKit; Interaction fingerprint; SRD; Similarity metrics; Virtual screening
Year: 2018 PMID: 30288626 PMCID: PMC6755604 DOI: 10.1186/s13321-018-0302-y
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Summary of the bit definitions of the modified SIFt implemented in the Schrödinger Suite and applied in this work
| Abbreviation | Short definition | Description |
|---|---|---|
| Any | Any contact | A ligand atom is within the required distance of a receptor atom |
| BB | Backbone interaction | A ligand atom is within the required distance of a receptor backbone atom |
| SC | Sidechain interaction | A ligand atom is within the required distance of a receptor side chain atom |
| Pol | Polar residues | A ligand atom is within the required distance of an atom in a polar residue of the receptor (ARG, ASP, GLU, HIS, ASN, GLN, LYS, SER, THR, ARN, ASH, GLH, HID, HIE, LYN) |
| Hyd | Hydrophobic residues | A ligand atom is within the required distance of an atom in a hydrophobic residue of the receptor (PHE, LEU, ILE, TYR, TRP, VAL, MET, PRO, CYS, ALA, CYX) |
| HBA | Hydrogen bond acceptor | The ligand forms a hydrogen bond with an acceptor in a receptor residue |
| HBD | Hydrogen bond donor | The ligand forms a hydrogen bond with a donor in a receptor residue |
| Aro | Aromatic residue | A ligand atom is within the required distance of an atom in an aromatic residue of the receptor (PHE, TYR, TRP, TYO) |
| Chg | Charged residue | A ligand atom is within the required distance of an atom in a charged residue of the receptor (ARG, ASP, GLU, LYS, HIP, CYT, SRO, TYO, THO) |
Confusion matrix for a pair of interaction fingerprints, containing the frequencies of common on bits (a), common off bits (d), and exclusive on bits for Complex 1 (b) and Complex 2 (c)
| | Complex 2: 1 (interaction present) | Complex 2: 0 (interaction absent) |
|---|---|---|
| Complex 1: 1 (interaction present) | a | b |
| Complex 1: 0 (interaction absent) | c | d |
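As a minimal sketch (not the FPKit implementation), the four frequencies defined in the caption above can be counted directly from two equal-length bit vectors. The function names are hypothetical, and the Tanimoto and simple matching coefficients are included only as two well-known examples of similarity metrics built from a, b, c and d; they are not meant to represent the full set of metrics compared in the study.

```python
def confusion_counts(fp1, fp2):
    """Return (a, b, c, d) for two equal-length binary fingerprints.

    a: bits on in both, b: on only in fp1 (Complex 1),
    c: on only in fp2 (Complex 2), d: off in both.
    """
    if len(fp1) != len(fp2):
        raise ValueError("fingerprints must have the same length")
    a = b = c = d = 0
    for x, y in zip(fp1, fp2):
        if x and y:
            a += 1
        elif x and not y:
            b += 1
        elif y:
            c += 1
        else:
            d += 1
    return a, b, c, d


def tanimoto(fp1, fp2):
    """Tanimoto (Jaccard) coefficient: a / (a + b + c)."""
    a, b, c, _ = confusion_counts(fp1, fp2)
    denom = a + b + c
    return a / denom if denom else 0.0  # convention assumed for all-zero pairs


def simple_matching(fp1, fp2):
    """Simple matching coefficient: (a + d) / (a + b + c + d)."""
    a, b, c, d = confusion_counts(fp1, fp2)
    total = a + b + c + d
    return (a + d) / total if total else 0.0


# Toy fingerprints (illustrative values, not data from the study)
reference = [1, 1, 0, 0, 1, 0, 0, 1, 0]
docked = [1, 0, 0, 0, 1, 1, 0, 1, 0]
counts = confusion_counts(reference, docked)
```

Note that the Tanimoto coefficient ignores d, whereas simple matching rewards common off bits; this difference matters for sparse fingerprints in which most bits are 0.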
Summary of the applied protein targets and ligand sets
| No. | Short name | Name | Uniprot | Protein family | PDB code | No. actives | No. inactives |
|---|---|---|---|---|---|---|---|
| 1 | ACE | Angiotensin-converting enzyme | P12821 | Hydrolase | 4CA5 | 49 | 1727 |
| 2 | ACHE | Acetylcholine esterase | P22303 | Hydrolase | 4M0F | 105 | 3708 |
| 3 | ALR2 | Aldose reductase | P15121 | Oxidoreductase | 4XZH | 26 | 917 |
| 4 | AR | Androgen receptor agonists | P10275 | Transcription factor | 4OEA | 64 | 2234 |
| 5 | CDK2 | Cyclin dependent kinase 2 | P24941 | Protein kinase | 1AQ1 | 48 | 1763 |
| 6 | COMT | Catechol O-methyltransferase | P21964 | Transferase | 3BWM | 11 | 428 |
| 7 | ER | Estrogen receptor antagonists | P03372 | Nuclear receptor | 3ERT | 39 | 1388 |
| 8 | PARP | Poly(ADP-ribose) polymerase | P09874 | Transferase | 4PJT | 33 | 1175 |
| 9 | SRC | Tyrosine kinase SRC | P12931 | Protein kinase | 2H8H | 155 | 5784 |
| 10 | VEGFr2 | Vascular endothelial growth factor receptor kinase | P35968 | Transferase | 3VHE | 71 | 2617 |
Fig. 1 (a) Docked complex of a small-molecule virtual hit (green sticks) to JAK2 [16]. Potentially interacting residues in the vicinity of the ligand are highlighted in red. (b) Excerpt from the interaction fingerprint of the docked complex. Interacting residues are highlighted in red, while non-interacting residues are represented as gray blocks. Inside the red blocks, those interactions are grayed out that cannot be established by definition. (c) Short definition of the SIFt filtering rules implemented in this work. Residue-based filtering (RES) omits any residue that is found to be consistently non-interacting across the whole docked dataset. Interaction-based filtering (INTS) additionally omits any individual interaction that is not established even once across the whole dataset. The latter includes (but is not restricted to) those interactions that cannot be established by definition (grayed-out interactions inside red blocks); for example, the “Aromatic” bit will be 0 for any residue that lacks an aromatic ring
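The two filtering rules described in the caption above can be sketched as column selections over a set of fingerprints. This is an illustrative sketch under stated assumptions, not the authors' code: each fingerprint is assumed to be a flat list of bits with a fixed number of bits per residue, the function names are hypothetical, and the toy example uses 3 bits per residue for brevity rather than the full set of interaction types.

```python
def res_filter_columns(fingerprints, bits_per_residue):
    """RES: keep every bit of each residue that interacts at least once."""
    n_bits = len(fingerprints[0])
    keep = []
    for start in range(0, n_bits, bits_per_residue):
        block = range(start, min(start + bits_per_residue, n_bits))
        # A residue survives if any of its bits is on in any fingerprint.
        if any(fp[i] for fp in fingerprints for i in block):
            keep.extend(block)
    return keep


def ints_filter_columns(fingerprints):
    """INTS: keep only the individual bits that are on at least once."""
    n_bits = len(fingerprints[0])
    return [i for i in range(n_bits) if any(fp[i] for fp in fingerprints)]


def apply_columns(fingerprints, keep):
    """Project every fingerprint onto the retained bit positions."""
    return [[fp[i] for i in keep] for fp in fingerprints]


# Toy dataset: 3 residues x 3 bits each (illustrative values only)
dataset = [
    [1, 0, 1, 0, 0, 0, 1, 1, 0],
    [1, 0, 0, 0, 0, 0, 0, 1, 0],
]
res_keep = res_filter_columns(dataset, bits_per_residue=3)
ints_keep = ints_filter_columns(dataset)
res_filtered = apply_columns(dataset, res_keep)
ints_filtered = apply_columns(dataset, ints_keep)
```

In this toy case RES drops only the middle residue, while INTS also drops the never-established bits inside the two interacting residues, so the INTS fingerprints are always at most as long as the RES ones.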
Fig. 2 Workflow of the input matrix generation and the complete protocol of the study
Fig. 3 Factorial ANOVA with the use of the protein targets and the similarity measures as factors. (AUC values are plotted against the similarity metrics.) The protein targets (with PDB codes) are marked with different colors and marks on the plot. Average values (dots) and 95% confidence intervals (lines) are shown in each case
Fig. 4 Factorial ANOVA with the use of scaling and similarity metrics as factors. Normalized SRD values [%] are plotted against the similarity metrics. The different scaling methods are marked with different symbols and lines. (RGS: range scaling, RANK: rank transformation, AUTO: autoscaling.)
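For readers unfamiliar with the three scaling options named in the Fig. 4 caption, the sketch below applies their textbook definitions to a single column of values. It is an assumption-laden illustration rather than the study's preprocessing code: the function names are hypothetical, ties in the rank transformation are assumed to receive average ranks, and autoscaling is assumed to use the sample standard deviation.

```python
from statistics import mean, stdev


def range_scale(values):
    """RGS: map values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


def rank_transform(values):
    """RANK: replace values by 1-based ranks; ties get the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    return ranks


def autoscale(values):
    """AUTO: subtract the mean and divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    if s == 0:
        return [0.0] * len(values)
    return [(v - m) / s for v in values]


# Toy similarity values (illustrative only)
scores = [0.2, 0.8, 0.5, 0.8]
scaled = range_scale(scores)
ranked = rank_transform(scores)
auto = autoscale(scores)
```

All three transformations put columns with different value ranges on a comparable footing; rank transformation additionally discards the spacing between values and keeps only their order.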
Fig. 5 Factorial ANOVA with the similarity measures as the factor. Average values are marked with blue dots and the blue lines below and above the dots denote 95% confidence intervals. Normalized SRD values [%] are plotted against the similarity measures. The red dashed lines are arbitrary thresholds defined to select the best few metrics, and to identify the region with the less consistent similarity measures
Fig. 6 Factorial ANOVA with the bit selection and the filtering rule as factors. SRD values [%] are plotted against the bit selection options. Interaction-based filtering (INTS) is marked with a blue dotted line, no filtering (NO) is marked with a red continuous line and residue-based filtering (RES) is marked with a green dashed line
Fig. 7 The result of ANOVA with metricity (a) and symmetricity (b) as factors. SRD values [%] are plotted against the different groups of similarity measures. Average values are plotted and the 95% confidence intervals are indicated with whiskers