Fabio Broccatelli, Nathan Brown.
Abstract
Virtual screening with docking is an integral component of drug design, particularly during hit-finding phases. While successful prospective studies of virtual screening exist, it remains a significant challenge to identify best practices a priori due to the many factors that influence the final outcome, including targets, data sets, software, metrics, and expert knowledge of the users. This study investigates the extent to which ligand-based methods can be applied to improve structure-based methods. The use of ligand-based methods to modulate the number of hits identified using the protein-ligand complex, and also the diversity of these hits from the crystallographic ligand, is discussed. In this study, 40 CDK2 ligand complexes were used together with two external data sets: one containing actives and inactives from GlaxoSmithKline (GSK) and one containing actives and decoys from the Directory of Useful Decoys (DUD). Results show how ligand-based modeling can be used to select a more appropriate protein conformation for docking, as well as to assess the reliability of the docking experiment. The time gained by reducing the pool of virtual screening candidates via ligand-based similarity can be invested in more accurate docking procedures, as well as in downstream labor-intensive approaches (e.g., visual inspection), maximizing the use of the chemical and biological information available. This provides a framework for molecular modeling scientists who are involved in initiating virtual screening campaigns, with practical advice to make the best use of the information available to them.
Year: 2014 PMID: 24877883 PMCID: PMC4068864 DOI: 10.1021/ci5001604
Source DB: PubMed Journal: J Chem Inf Model ISSN: 1549-9596 Impact factor: 4.956
Figure 1. Docking success rate for three Glide protocols: HTVS, SP, and XP. Docking pose prediction is considered correct if the RMSD from the crystallographic ligand is below 2 Å.
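The success criterion in this caption can be illustrated with a minimal Python sketch. It assumes both poses are given as lists of (x, y, z) coordinates with identical atom ordering in the same protein frame; the function names `rmsd` and `success_rate` are illustrative and are not taken from the study or from Glide, and no symmetry correction is attempted.

```python
import math

RMSD_CUTOFF = 2.0  # Å, the threshold stated in the caption


def rmsd(pose, reference):
    """RMSD between two poses given as lists of (x, y, z) tuples.

    Assumes identical atom ordering and a shared coordinate frame
    (no superposition, no symmetry correction).
    """
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for p, r in zip(pose, reference)
        for a, b in zip(p, r)
    )
    return math.sqrt(squared / len(pose))


def success_rate(rmsds, cutoff=RMSD_CUTOFF):
    """Fraction of docking runs whose RMSD is strictly below the cutoff."""
    if not rmsds:
        raise ValueError("no RMSD values supplied")
    return sum(1 for value in rmsds if value < cutoff) / len(rmsds)
```

A pose lying exactly at the 2 Å cutoff is counted as incorrect here, following the caption's "below 2 Å" wording.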
Figure 2. Distribution of the relative percentage of the accurate docking experiments with respect to the docking score interval (upper plot). Variation of the docking success rate with respect to the docking score interval (lower plot).
Figure 3. Box plot describing the distributions of the RMSDs of predicted binding modes from crystallographic ligands using different docking approaches and protocols. The box is delimited by the 25th and the 75th percentile values; the line within the box represents the median. The whiskers represent maximum and minimum values for non-outliers. Data points are defined as outliers (green dots) if their distance from the 75th percentile is over 1.5 times the interquartile range (distance between the 25th and 75th percentile).
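The box plot quantities described in this caption can be computed as in the hedged sketch below. It uses the conventional two-sided 1.5 × IQR rule (the caption only mentions the upper side explicitly, which is the relevant one for non-negative RMSDs) and the "inclusive" quartile method from Python's `statistics` module; the plotting software used in the study may compute quartiles differently, and the name `box_plot_stats` is illustrative.

```python
import statistics


def box_plot_stats(values):
    """Return quartiles, whisker ends, and outliers for a list of numbers.

    Outliers lie more than 1.5 * IQR above the 75th percentile or
    below the 25th percentile; whiskers span the remaining points.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inliers = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inliers),
        "whisker_high": max(inliers),
        "outliers": outliers,
    }
```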
Figure 4. Screening success for different virtual screening campaigns using the DUD and the GSK data sets. For the DUD data sets, both the AUC (representing the ability to distinguish actives from inactives) and the BEDROC at α = 20 (representing the ability to associate the highest scores to active molecules) are considered due to the high ligand-to-decoy ratio. Methods other than “Multiple” are only aware of either a single protein structure or a single crystallographic ligand.
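As an illustration of the two metrics named in this caption, the sketch below computes the ROC AUC by pairwise comparison and the BEDROC in its rescaled robust-initial-enhancement form (Truchon and Bayly). It assumes that a higher score means a better-ranked molecule, so docking scores where more negative is better would need to be negated first; tied scores in `bedroc` are broken by input order. The function names are illustrative and this is not the code used in the study.

```python
import math


def roc_auc(scores, labels):
    """Probability that a random active outscores a random inactive (ties count 0.5)."""
    actives = [s for s, y in zip(scores, labels) if y]
    inactives = [s for s, y in zip(scores, labels) if not y]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive")
    wins = 0.0
    for a in actives:
        for d in inactives:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


def bedroc(scores, labels, alpha=20.0):
    """BEDROC: early-recognition metric rescaled to the range [0, 1]."""
    n_total = len(scores)
    # Rank molecules from best (rank 1) to worst (rank n_total).
    order = sorted(range(n_total), key=lambda i: scores[i], reverse=True)
    active_ranks = [rank for rank, i in enumerate(order, start=1) if labels[i]]
    n_actives = len(active_ranks)
    if n_actives == 0 or n_actives == n_total:
        raise ValueError("need at least one active and one inactive")
    ratio = n_actives / n_total
    # Robust initial enhancement: observed exponential sum over its random expectation.
    observed = sum(math.exp(-alpha * r / n_total) for r in active_ranks)
    expected = (1.0 / n_total) * (1.0 - math.exp(-alpha)) / (math.exp(alpha / n_total) - 1.0)
    rie = observed / (n_actives * expected)
    # RIE values for the best and worst possible rankings.
    rie_max = (1.0 - math.exp(-alpha * ratio)) / (ratio * (1.0 - math.exp(-alpha)))
    rie_min = (1.0 - math.exp(alpha * ratio)) / (ratio * (1.0 - math.exp(alpha)))
    return (rie - rie_min) / (rie_max - rie_min)
```

With α = 20, most of the BEDROC weight falls on the top of the ranked list, which is why it complements the AUC when actives are heavily outnumbered.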
Performances of Virtual Screening Strategies Aware of 40 CDK2 PDB Structures
| Row | Method | AUC (GSK) | CCR (GSK) | AUC (DUD) | BEDROC20 (DUD) |
|---|---|---|---|---|---|
| 1 | Ens HTVS | 0.95 | 0.92 | 0.81 | 0.53 |
| 2 | Ens ROCS_TC | 0.95 | 0.89 | 0.78 | 0.43 |
| 3 | Ens ECFP_4 | 0.91 | 0.88 | 0.69 | 0.43 |
| 4 | HTVS (ROCS_TC) | 0.93 | 0.87 | 0.73 | 0.45 |
| 5 | HTVS (ECFP_4) | 0.91 | 0.89 | 0.72 | 0.47 |
| 6 | Z2 HTVS-ROCS_TC-ECFP_4 | 0.98 | 0.93 | 0.84 | 0.54 |
| 7 | Z2 HTVS(ECFP_4)-ROCS_TC-ECFP_4 | 0.97 | 0.92 | 0.80 | 0.51 |
AUC is reported for the DUD and the GSK data sets. The CCR (the average of sensitivity and specificity) is reported for the GSK data set, for which inactives are available. The BEDROC20 is reported for the DUD data set due to the high ligand-to-decoy ratio. “Ens” is the abbreviation of ensemble, where the best score is selected after screening against all the crystallographic structures available. HTVS (ROCS_TC) and HTVS (ECFP_4) identify HTVS docking experiments based on a protein structure selected by means of ligand-based similarity. Z2 identifies data fusion approaches based either on ensemble HTVS docking, ensemble ROCS Tanimoto Combo, and ensemble ECFP_4 similarity (row 6), or on HTVS docking on a single protein structure selected using ECFP_4 similarity, ensemble ROCS Tanimoto Combo, and ensemble ECFP_4 (row 7).
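Two of the definitions in this footnote, the CCR and the ensemble ("Ens") best score, lend themselves to a short Python sketch. The function names, the dictionary layout, and the structure identifiers in the usage note are hypothetical; the Z2 data fusion is not sketched because its formula is not given here.

```python
def ccr(predicted, actual):
    """Average of sensitivity and specificity for binary predictions."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one active and one inactive")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def ensemble_best(scores_per_structure, higher_is_better=True):
    """Keep each molecule's best score across all screened structures.

    scores_per_structure maps a structure identifier to a list of scores,
    with molecules in the same order in every list.
    """
    if not scores_per_structure:
        raise ValueError("no structures supplied")
    pick = max if higher_is_better else min
    return [pick(column) for column in zip(*scores_per_structure.values())]
```

For example, `ensemble_best({"structure_A": [0.2, 0.9], "structure_B": [0.5, 0.1]})` keeps 0.5 for the first molecule and 0.9 for the second; for docking scores where lower is better, pass `higher_is_better=False`.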
Figure 5. Relative percentage of actives and decoys (DUD data set, upper plot) and actives and inactives (GSK data set, lower plot) in different ECFP_4 and HTVS Glide docking score ranges.