James B Dunbar, Richard D Smith, Kelly L Damm-Ganamet, Aqeel Ahmed, Emilio Xavier Esposito, James Delproposto, Krishnapriya Chinnaswamy, You-Na Kang, Ginger Kubish, Jason E Gestwicki, Jeanne A Stuckey, Heather A Carlson.
Abstract
A major goal in drug design is the improvement of computational methods for docking and scoring. The Community Structure Activity Resource (CSAR) has collected several data sets from industry and added in-house data sets that may be used for this purpose (www.csardock.org). CSAR has so far obtained data from Abbott, GlaxoSmithKline, and Vertex and is working on obtaining data from several others. Combined with our in-house projects, we are providing a data set consisting of 6 protein targets, 647 compounds with biological affinities, and 82 crystal structures. Multiple congeneric series are available for several targets, with a few representative crystal structures for each series. These series generally contain a few inactive compounds, usually not available in the literature, to provide an upper bound to the affinity range. The affinity ranges typically span 3-4 orders of magnitude per series. For our in-house projects, we have had compounds synthesized for biological testing. Affinities were measured by Thermofluor and Octet RED and, for the most soluble compounds, by isothermal titration calorimetry. This allows a direct comparison of the affinities measured for the same compounds by different methods, providing a measure of the variance in the experimental affinity. There appears to be considerable variance in the absolute value of the affinity, which makes predicting the absolute value ill-defined. However, the relative rankings within the methods agree much better, which fits the observation that predicting relative ranking is a more tractable problem computationally. For the in-house compounds, we have also measured the following physical properties: logD, logP, thermodynamic solubility, and pKa. This data set also provides a substantial decoy set for each target, consisting of diverse conformations covering the entire active site, for all 58 CSAR-quality crystal structures.
The CSAR data sets (CSAR-NRC HiQ and the 2012 release) provide substantial, publicly available, curated data sets for use in parametrizing and validating docking and scoring methods.
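The abstract's distinction between absolute affinity (ill-defined across assays) and relative ranking (more consistent) can be made concrete with a small sketch. It assumes affinities are reported as Kd in mol/L and converted to pKd = -log10(Kd), so a "3-4 orders of magnitude" range is a 3-4 unit spread in pKd. The function names and the example values are hypothetical, and the rank helper ignores ties for simplicity.

```python
import math
from statistics import mean


def kd_to_pkd(kd_molar: float) -> float:
    """Convert a dissociation constant (mol/L) to pKd = -log10(Kd)."""
    return -math.log10(kd_molar)


def affinity_range_log_units(kds_molar: list[float]) -> float:
    """Span of a compound series in orders of magnitude (pKd units)."""
    pkds = [kd_to_pkd(k) for k in kds_molar]
    return max(pkds) - min(pkds)


def ranks(values: list[float]) -> list[int]:
    """0-based rank of each value (0 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order):
        result[index] = position
    return result


def compare_methods(pkd_a: list[float], pkd_b: list[float]) -> tuple[float, bool]:
    """Mean absolute pKd difference between two assays, and whether they rank compounds identically."""
    mad = mean(abs(a - b) for a, b in zip(pkd_a, pkd_b))
    return mad, ranks(pkd_a) == ranks(pkd_b)
```

For example, if a hypothetical second assay reported every Kd exactly tenfold weaker than the first, `compare_methods` would give a mean absolute difference of about 1 pKd unit while still reporting an identical ranking: the absolute values disagree, but the order used to prioritize compounds is preserved.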
Year: 2013 PMID: 23617227 PMCID: PMC3753885 DOI: 10.1021/ci4000486
Source DB: PubMed Journal: J Chem Inf Model ISSN: 1549-9596 Impact factor: 4.956
CSAR Criteria for High-Quality Crystal Structures
- Overall Rmerge ≤ 0.1 (highest-resolution bin ≤ 0.4)
- Resolution of 2.5 Å or better
- Signal-to-noise ratio ≥ 2 for 50% or more of reflections in the highest-resolution bin (I/σ(I) ≥ 3 preferred)
- Completeness of data ≥ 90% (highest-resolution bin ≥ 50%)
- Redundancy ≥ 2 (low symmetry)
- Redundancy ≥ 3 (high symmetry)
- Rfree − Rwork ≤ 5%
- MolProbity (protein structure):
  - Poor rotamers < 1%
  - Ramachandran outliers < 0.2%
  - Residues with bad bonds 0%
  - Residues with bad angles 0%
  - Clashscore ≤ 5
- WHAT_CHECK:
  - RMS Z-scores near 1.0
  - Torsions
  - B-factor distribution
  - Bonds and angles
- PARVATI:
  - Bonds linking atomic displacement parameters (TLS) have correlation coefficient > 0.92
- No ring puckers
- No eclipsed hydrogens
- Real-space R ≤ 0.2
- Real-space correlation coefficient ≥ 0.9
- ≥ 90% of compound atoms in 2Fo–Fc density
- No large unexplained density within 5 Å of the compound
- No severe clashes between reduced amino acids
- No symmetry-related atoms within 5 Å of any compound atom
- No ambiguously fitted compounds: all alternate conformations clearly defined by density
- No more than two alternate conformations of the compound may coexist
- Structures created using SMILES input to Grade (Global Phasing, Inc.)
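The numeric thresholds in these criteria lend themselves to an automated pre-filter. The sketch below is an illustration under stated assumptions, not CSAR's actual pipeline: the `StructureStats` container and its field names are hypothetical, fractions are on a 0-1 scale, and the values would have to come from the crystallographic statistics and validation reports (for example MolProbity). Only a subset of the quantitative criteria is encoded. The density-based and qualitative checks (unexplained density, alternate conformations, ring puckers) still need inspection.

```python
from dataclasses import dataclass, replace


@dataclass
class StructureStats:
    """Hypothetical summary of one protein-ligand structure's statistics.

    Field names are illustrative only; fractions are on a 0-1 scale.
    """
    resolution: float               # Å
    rmerge: float                   # overall
    rmerge_outer: float             # highest-resolution bin
    completeness: float             # overall
    completeness_outer: float       # highest-resolution bin
    redundancy: float
    high_symmetry: bool             # selects the redundancy threshold
    r_free: float
    r_work: float
    clashscore: float               # MolProbity clashscore
    real_space_r: float             # for the compound
    real_space_cc: float            # for the compound
    ligand_atoms_in_density: float  # fraction of compound atoms in 2Fo-Fc density


def csar_failures(s: StructureStats) -> list[str]:
    """Return the names of the numeric criteria that s fails (empty list = all pass)."""
    min_redundancy = 3 if s.high_symmetry else 2
    checks = {
        "resolution <= 2.5 Å": s.resolution <= 2.5,
        "Rmerge <= 0.1": s.rmerge <= 0.1,
        "Rmerge (outer bin) <= 0.4": s.rmerge_outer <= 0.4,
        "completeness >= 90%": s.completeness >= 0.90,
        "completeness (outer bin) >= 50%": s.completeness_outer >= 0.50,
        f"redundancy >= {min_redundancy}": s.redundancy >= min_redundancy,
        "Rfree - Rwork <= 5%": (s.r_free - s.r_work) <= 0.05,
        "clashscore <= 5": s.clashscore <= 5,
        "real-space R <= 0.2": s.real_space_r <= 0.2,
        "real-space CC >= 0.9": s.real_space_cc >= 0.9,
        ">= 90% of compound atoms in density": s.ligand_atoms_in_density >= 0.90,
    }
    return [name for name, ok in checks.items() if not ok]


if __name__ == "__main__":
    good = StructureStats(
        resolution=2.0, rmerge=0.07, rmerge_outer=0.35, completeness=0.98,
        completeness_outer=0.90, redundancy=4.0, high_symmetry=False,
        r_free=0.23, r_work=0.19, clashscore=3.0, real_space_r=0.12,
        real_space_cc=0.95, ligand_atoms_in_density=0.97,
    )
    print(csar_failures(good))                           # []
    print(csar_failures(replace(good, resolution=2.8)))  # ['resolution <= 2.5 Å']
```

Returning the list of failed criteria, rather than a single pass/fail flag, makes it easy to see which threshold excluded a structure.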
Figure 1. Selection method utilizing recursive partitioning and coupled multiple distribution analysis in JMP [31].
2012 Release Data Set Summary^a
^a Targets in yellow were used in the 2012 Exercise. Numbers in parentheses indicate the number included in the 2012 Exercise.
Figure 2. Representative ligands in the 2012 release data set.
Figure 3. Percentage frequencies of RMSDs (Å) between the decoy poses and the native crystal pose for each target. The frequencies are based on all the structures available for each target.
Figure 4. Percentage frequencies of pairwise RMSDs (Å) among the decoy poses themselves for each target. The frequencies are based on all the structures available for each target.
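Figures 3 and 4 summarize two RMSD distributions: decoys against the native pose, and decoys against one another. The sketch below shows one way such percentage frequencies could be computed. It assumes poses are lists of (x, y, z) heavy-atom coordinates in a common protein frame with atoms in matching order. RMSD is therefore taken in place, without superposition, and symmetry-equivalent atom mappings are not handled. The 1 Å bin width and all function names are illustrative choices, not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations

Coord = tuple[float, float, float]


def rmsd(pose_a: list[Coord], pose_b: list[Coord]) -> float:
    """In-place RMSD (no superposition); atoms must be listed in the same order."""
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses need the same, non-zero number of atoms")
    total = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                for (xa, ya, za), (xb, yb, zb) in zip(pose_a, pose_b))
    return math.sqrt(total / len(pose_a))


def rmsds_to_native(native: list[Coord], decoys: list[list[Coord]]) -> list[float]:
    """Decoy-vs-native RMSDs (the quantity binned in Figure 3)."""
    return [rmsd(native, decoy) for decoy in decoys]


def pairwise_rmsds(decoys: list[list[Coord]]) -> list[float]:
    """RMSD for every unordered pair of decoys (the quantity binned in Figure 4)."""
    return [rmsd(a, b) for a, b in combinations(decoys, 2)]


def percent_frequencies(values: list[float], bin_width: float = 1.0) -> dict[float, float]:
    """Percentage of values in each [k*w, (k+1)*w) bin, keyed by the bin's lower edge."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    return {k * bin_width: 100.0 * c / len(values) for k, c in sorted(counts.items())}
```

With 200 decoys per structure, `pairwise_rmsds` produces 19,900 values per structure. The pooled histogram over all structures of a target is the kind of distribution Figure 4 reports.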
Figure 5. Representative complexes for the different targets: protein (green), native bound pose (red), and the 200 decoy poses (gray). The complexes shown are CDK2-CS12, CDK2-CyclinA-CS260, CHK1-70, ERK2-000075, LpxC-CS252, and Urokinase-15 (the second term is the ligand number in the data set).
Figure 6. Multivariate analysis of the CDK2 pKd data in JMP [31]: Pearson r, Spearman ρ, and Kendall τ.
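The symbols r, ρ, and τ in the Figure 6 caption are the conventional notation for the Pearson, Spearman, and Kendall correlation coefficients. The last two measure rank agreement, which is the quantity the abstract argues is better defined than absolute affinity. Below is a self-contained sketch for comparing two lists of pKd values, for example two assays on the same compounds. The Kendall function computes τ-a; statistics packages, possibly JMP among them, often report a tie-corrected variant such as τ-b, and the two coincide when there are no ties. The function names are illustrative.

```python
import math
from itertools import combinations


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation (undefined if either list is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(v: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman rho: Pearson correlation of the tie-averaged ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def kendall_tau_a(x: list[float], y: list[float]) -> float:
    """Kendall tau-a: (concordant - discordant) / number of pairs; tied pairs count as neither."""
    pairs = list(combinations(range(len(x)), 2))
    score = 0
    for i, j in pairs:
        product = (x[i] - x[j]) * (y[i] - y[j])
        score += (product > 0) - (product < 0)
    return score / len(pairs)
```

For example, with x = [1, 2, 3, 4] and y = [1, 3, 2, 4] (one adjacent swap), ρ = 0.8 and τ = 4/6 ≈ 0.67. A single transposition lowers τ more than ρ here because τ counts discordant pairs directly.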