Manon Réau, Florent Langenfeld, Jean-François Zagury, Nathalie Lagarde, Matthieu Montes.
Abstract
Virtual Screening (VS) is designed to prospectively help identify potential hits, i.e., compounds capable of interacting with a given target and potentially modulating its activity, out of large compound collections. Among the variety of methodologies, it is crucial to select the protocol that is best adapted to the query/target system under study and that yields the most reliable output. To this end, the performance of VS methods is commonly evaluated and compared by computing their ability to retrieve active compounds in benchmarking datasets. The benchmarking datasets contain a subset of known active compounds together with a subset of decoys, i.e., assumed non-active molecules. The composition of both the active and the decoy compound subsets is critical to limit biases in the evaluation of VS methods. In this review, we focus on the selection of decoy compounds, which has changed considerably over the years, from randomly selected compounds to highly customized or experimentally validated negative compounds. We first outline the evolution of decoy selection in benchmarking databases, as well as current benchmarking databases that tend to minimize the introduction of biases, and second, we propose recommendations for the selection and design of benchmarking datasets.
Keywords: benchmarking; benchmarking databases; decoy; ligand-based drug design; structure-based drug design; virtual screening
Year: 2018 PMID: 29416509 PMCID: PMC5787549 DOI: 10.3389/fphar.2018.00011
Source DB: PubMed Journal: Front Pharmacol ISSN: 1663-9812 Impact factor: 5.810
Figure 1. Decoy selection in MUV and DEKOIS 2.0. (A) For each active in the MUV, the distance to the 500th nearest neighbor was computed in 100 random samples drawn from multiple drug-like compound collections. The 90th percentile was recorded as the confidence distance for a good embedding (d). Active compounds were accepted only if their 500th nearest neighbor in the decoy compound set lay within this confidence distance. (B) Selected active compound datasets from the MUV were adjusted to the same level of spread (ΣG ≈ constant), and decoy compound sets were, in turn, adjusted to this level of spread (ΣF ≈ ΣG). (C) The chemical space of both active and decoy compounds was divided into cells characterized by a set of 8 physicochemical properties. Each user-provided compound is associated with its property-matching cell, and 1,500 decoys are selected from the same cell, or from directly neighboring cells if the parent cell is not sufficiently populated.
Figure 2. Example of Bemis-Murcko atomic framework clustering of protein kinase C beta type (KPCB) ligands from the DUD-E.
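The cell-based selection described in panel (C) of Figure 1 can be illustrated with a minimal sketch. It is not the DEKOIS 2.0 implementation: the function names (`cell_of`, `build_grid`, `pick_decoys`), the bin widths, and the toy two-property pool are assumptions made for illustration, whereas the caption refers to 8 properties and 1,500 decoys per compound. The sketch only shows the idea of binning property space into cells and falling back to directly neighboring cells when the parent cell is underpopulated.

```python
from collections import defaultdict
from itertools import product


def cell_of(props, widths):
    """Map a property vector to a grid cell (one integer bin index per property)."""
    return tuple(int(p // w) for p, w in zip(props, widths))


def build_grid(pool, widths):
    """Group candidate decoys (name -> property tuple) by the cell they fall into."""
    grid = defaultdict(list)
    for name, props in pool.items():
        grid[cell_of(props, widths)].append(name)
    return grid


def pick_decoys(active_props, grid, widths, n_decoys):
    """Take decoys from the active's own cell, then from directly neighboring cells."""
    home = cell_of(active_props, widths)
    picked = list(grid.get(home, []))[:n_decoys]
    if len(picked) < n_decoys:
        # Visit every cell that differs by at most one bin along each property axis.
        for offset in product((-1, 0, 1), repeat=len(home)):
            if not any(offset):
                continue  # skip the parent cell itself
            neighbor = tuple(h + o for h, o in zip(home, offset))
            for name in grid.get(neighbor, []):
                if len(picked) == n_decoys:
                    return picked
                picked.append(name)
    return picked


# Hypothetical pool described by (molecular weight, logP); the bin widths are arbitrary.
widths = (50.0, 1.0)
pool = {
    "d1": (310.0, 2.1),
    "d2": (330.0, 2.8),
    "d3": (360.0, 2.5),
    "d4": (480.0, 4.9),
}
grid = build_grid(pool, widths)
print(pick_decoys((320.0, 2.4), grid, widths, 3))  # ['d1', 'd2', 'd3']
```

Here `d1` and `d2` share the active's cell, `d3` comes from a directly neighboring cell, and `d4` is too far away in property space to be considered. A real protocol would additionally rank the candidates, for instance by the weighted physicochemical similarity mentioned for DEKOIS.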
| Database | Year | Ligand source | Decoy source | Targets/classes | Decoy selection | Comments |
|---|---|---|---|---|---|---|
| Rognan's decoy set (Bissantz et al.) | 2000 | Literature | ACD | 2/2 | Random selection | Design of decoy sets to evaluate the performance of 3 docking programs and 7 scoring functions |
| Shoichet's decoy set (McGovern and Shoichet) | 2003 | MDDR | MDDR | 9/4 | Remove compounds with unwanted functional groups | Compare VS performance depending on the binding site definition (apo, holo, or homology-modeled structures) |
| Li's decoy set (Diller and Li) | 2003 | Literature | MDDR | 6/1 | Fit polarity and MW to known kinase inhibitors | Compare decoy and ligand physicochemical properties to select decoys |
| Jain's decoy set (Jain and Nicholls) | 2006 | PDBbind | ZINC “drug-like” and Rognan's decoy set | 34/7 | 1,000 random molecules from ZINC that comply with MW ≤ 500, logP ≤ 5, HBA ≤ 10, HBD ≤ 5 and RB ≤ 12, and Rognan's decoys with RB ≤ 15 | Use of 5 physicochemical properties to match decoy sets to ligand sets |
| Directory of Useful Decoys (DUD) (Huang et al.) | 2006 | Literature and PDBbind | ZINC “drug-like” | 40/6 | Decoys must be Lipinski-compliant. The selection is based on both topological dissimilarity to ligands and the fit of physicochemical properties | Largest decoy data set so far (40 proteins) and first attempt to select topologically dissimilar decoys |
| DUD Clusters (Meyer) | 2008 | DUD | – | 40/6 | – | DUD clusters more relevant for scaffold hopping |
| WOMBAT Datasets (Meyer) | 2007 | WOMBAT | – | 13/4 | – | Designed to decrease the analog bias on 13 of the 40 DUD targets; enriches DUD active data sets with compounds from the WOMBAT database |
| Maximum Unbiased Validation (MUV) (Rohrer and Baumann) | 2009 | PubChem | PubChem | 18/7 | Two functions measure the active-active and decoy-active distances using 2D chemical descriptors. Actives with the maximum spread within the active set were chosen, and decoys with a similar spatial distribution were selected | Ligands and decoys are drawn from biologically active and inactive compounds, i.e., are true actives and inactives, respectively |
| DUD LIB | 2009 | DUD-cluster | DUD | 13/4 | Subset of the DUD database, with more stringent criteria on MW (≤450) and AlogP (≤4.5), and a minimal number of chemotypes | Initially designed for “scaffold-hopping” studies |
| Charge Matched DUD | 2010 | DUD | ZINC | 40/6 | Apply a net charge property match on DUD datasets | |
| REPROVIS-DB | 2011 | Literature | Literature | – | Extracted from previous successful studies | Designed for LBVS only |
| Virtual Decoy sets (VDS) (Wallach and Lilien) | 2011 | DUD | ZINC | 40/6 | Same as DUD, but does not consider synthetic feasibility | Purely virtual decoys; availability is not considered |
| DEKOIS (Vogel et al.) | 2011 | BindingDB | ZINC | 40/6 | Classify decoys and ligands into “cells” based on 6 physicochemical properties and select the closest decoys based on (1) a weighted physicochemical similarity and (2) a LADS score based on functional fingerprint similarity derived from the active set | Original treatment of the physicochemical similarity, and introduce the concept of |
| GPCR Ligand (GLL)/Decoys Database (GDD) (Xia et al.) | 2012 | GLIDA and PDB structures and Vilar et al. | ZINC | 147/1 | Physicochemical property fit and topological dissimilarity filter. Final selection based on MW | First extensive database targeting a specific protein family |
| Decoy Finder (Cereto-Massagué et al.) | 2012 | User | User | – | Same as DUD | Graphical tool to generate decoy data sets with adaptable thresholds for physicochemical properties |
| DUD Enhanced (DUD-E) (Mysinger et al.) | 2012 | CHEMBL | ZINC | 102/8 | Physicochemical property fit along with a topological dissimilarity filter. Random selection of decoys is then applied | Largest database so far (1,420,433 decoys and 66,695 actives) |
| DEKOIS 2.0 (Ibrahim et al.) | 2013 | BindingDB | ZINC | 81/11 | Same as DEKOIS with 3 additional physicochemical properties (nFC, nPC, Ar), a PAINS filter and an improved, weighted LADS score | |
| NRLiSt BDB (Lagarde et al.) | 2014 | CHEMBL | ZINC and DUD-E decoys generator | 27/1 | Use the DUD-E decoy generation tool | Ligands can be either agonists or antagonists (other actives are removed), depending on the purpose of the study |
| MUBD-HDACs (Xia et al.) | 2015 | CHEMBL and literature | ZINC | 14/1 | Select decoys based on a weighted physicochemical similarity (6 physicochemical properties are considered), and ensure a random spatial distribution of the decoys (i.e., decoys should be as distant from the other actives as a reference ligand) | Applicable to both SBVS and LBVS strategies; uses ligands with proven bioactivity |
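
As an illustration of the simplest property-based strategy in the table, the sketch below applies the thresholds listed for Jain's decoy set (MW ≤ 500, logP ≤ 5, HBA ≤ 10, HBD ≤ 5, RB ≤ 12) to a pool of candidates and then draws decoys at random. The `Compound` record, the function names, and the example values are hypothetical; the properties are assumed to have been computed beforehand with a cheminformatics toolkit, and this is not the original authors' code.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    """Hypothetical record holding precomputed physicochemical properties."""
    name: str
    mw: float    # molecular weight
    logp: float  # octanol/water partition coefficient
    hba: int     # hydrogen-bond acceptors
    hbd: int     # hydrogen-bond donors
    rb: int      # rotatable bonds


# Upper bounds quoted in the table for Jain's decoy set.
LIMITS = {"mw": 500, "logp": 5, "hba": 10, "hbd": 5, "rb": 12}


def passes_filter(compound, limits=LIMITS):
    """Return True if every property is at or below its upper bound."""
    return all(getattr(compound, prop) <= cap for prop, cap in limits.items())


def random_decoys(pool, n, seed=0):
    """Randomly draw up to n decoys among the compounds that pass the filter."""
    eligible = [c for c in pool if passes_filter(c)]
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(eligible, min(n, len(eligible)))


pool = [
    Compound("c1", 350.0, 3.2, 5, 2, 6),
    Compound("c2", 520.0, 4.0, 6, 1, 8),   # rejected: MW above 500
    Compound("c3", 410.0, 5.6, 4, 2, 7),   # rejected: logP above 5
    Compound("c4", 298.0, 1.9, 3, 1, 13),  # rejected: more than 12 rotatable bonds
    Compound("c5", 455.0, 4.8, 9, 5, 12),
]
decoys = random_decoys(pool, n=2)
print(sorted(c.name for c in decoys))  # ['c1', 'c5']
```

A filter of this kind only constrains decoys to a drug-like property range. The later entries in the table go further by matching decoy properties to those of each active and by enforcing topological dissimilarity.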