BindingDB: a web-accessible database of experimentally determined protein-ligand binding affinities.

Tiqing Liu, Yuhmei Lin, Xin Wen, Robert N Jorissen, Michael K Gilson.

Abstract

BindingDB (http://www.bindingdb.org) is a publicly accessible database currently containing approximately 20,000 experimentally determined binding affinities of protein-ligand complexes, for 110 protein targets including isoforms and mutational variants, and approximately 11,000 small molecule ligands. The data are extracted from the scientific literature, data collection focusing on proteins that are drug-targets or candidate drug-targets and for which structural data are present in the Protein Data Bank. The BindingDB website supports a range of query types, including searches by chemical structure, substructure and similarity; protein sequence; ligand and protein names; affinity ranges and molecular weight. Data sets generated by BindingDB queries can be downloaded in the form of annotated SDfiles for further analysis, or used as the basis for virtual screening of a compound database uploaded by the user. The data in BindingDB are linked both to structural data in the PDB via PDB IDs and chemical and sequence searches, and to the literature in PubMed via PubMed IDs.

Substances:
Ligands
Proteins

Year: 2006 PMID: 17145705 PMCID: PMC1751547 DOI: 10.1093/nar/gkl999

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

The early steps in a modern drug discovery project typically include identifying a biological macromolecule that plays a key role in a disease process, and seeking a low-molecular-weight compound that inactivates this macromolecular target by binding it with high affinity. Ligand discovery involves a substantial component of trial and error, despite advances in computer-aided drug design. Projects directed at ligand discovery therefore generate large quantities of binding data, not only for drugs but also for compounds that do not themselves become drugs. When published, these data become a valuable resource for scientists studying the same macromolecular target, and also for those seeking to develop improved computational models of molecular recognition.

Currently, binding data are published almost exclusively via the scientific journals, which provide an indispensable archival service, are now available in electronic formats, and can be searched in useful ways. However, the journals also impose severe restrictions, as recently emphasized (1). For example, they provide no mechanism for accessing data in numerical form, querying according to chemical structure, downloading computer representations of chemical structure, publishing large datasets in any detail or navigating among binding, structural and sequence data. By providing these missing functionalities, especially to researchers in academia and in small companies who do not have access to the resources of the major pharmaceutical firms, a database of measured binding affinities should accelerate the discovery of targeted ligands.

Potential applications of a binding database include:

- Analysis of ligands for a specific target to discover chemical features or pharmacophores that correlate with affinity.
- Development of quantitative structure–activity relationships.
- Interpretation of measured entropies and enthalpies of binding in the context of a receptor's 3D structure.
- Parameterization and validation of broadly applicable methods of ligand design.
- Identification of candidate lead compounds for a new drug target, by searching for ligands known to bind similar proteins.
- Identification of drug candidates with a high risk of side effects, by checking whether similar compounds bind multiple receptors.
- Elucidation of the mechanism of a biological effector molecule; e.g. if a naturally occurring compound inhibits cellular proliferation, a search of the database for chemically similar compounds may reveal that a similar compound binds a protein known to be involved in regulation of the cell cycle.

A binding database also offers the possibility of publishing data that are not amenable to journal publication, such as very large data sets, and raw experimental data which can be useful in the assessment of data quality.

BindingDB (http://www.bindingdb.org) was created to address these needs. It currently holds ∼20 000 measurements, making it one of the most extensive public databases of protein–ligand binding affinities, and it is continuing to grow. The present paper summarizes these data holdings as well as new website features and capabilities; basic technical aspects of BindingDB have been described previously (2–4).

CONTENTS OF THE DATABASE

Data collection currently focuses on targets whose three-dimensional structures are available in the Protein Data Bank (PDB) (5,6) or can be accurately modeled. Such data are of particular interest because they are amenable to structural analysis and are suitable for the development and validation of computational models of binding. Statistical sampling of the PDB in 2003 revealed that ∼150 of the non-redundant proteins therein were considered current or potential drug-targets (unpublished data) and were thus suitable for data collection by BindingDB. This analysis omits additional drug-targets whose structures could be built by comparative modeling. Restricting attention to proteins of known structure allows BindingDB to complement, rather than overlap, other binding databases collecting data for membrane proteins whose 3D structures are, in the main, unavailable; e.g. GPCRDB (7), the IUPHAR receptor database and GLIDA (8).

Proteins are selected for data collection based upon their importance as drug-targets or model systems, as well as the availability of suitable data. Once a protein is selected, relevant scientific articles are identified and their data are extracted and deposited into BindingDB. Data from multiple laboratories and companies are sought in order to obtain a wide range of chemotypes for the targeted protein. The journals from which data are drawn include J. Med. Chem., Bioorg. Med. Chem. Lett. and Biochem. Web-accessible forms also allow direct deposition by experimentalists, but this route has not generated a significant number of entries. The majority of the data are based upon enzyme inhibition studies (>19 000 measurements), but a smaller number of measurements from the more informative method of isothermal titration calorimetry are also included (416 measurements). Each data entry includes detailed experimental conditions, such as solution composition, pH and temperature, because these can affect the measured affinities.
BindingDB currently holds ∼20 000 binding data for ∼11 000 different small molecule ligands and 110 different drug-targets, or 74 targets when mutants and isoforms are not counted separately. Examples include anthrax lethal factor, various caspases and kinases, and HIV protease and reverse transcriptase. Perhaps the most similar public effort is KiBank (9), which provides a sparser user interface to a substantial data set of ∼16 000 Ki data for 5900 small molecule ligands and 50 protein targets, apparently including proteins for which no structural data are available. For a perspective on BindingDB's current data holdings, Figure 1 shows the number of binding measurements for various targets and target classes, and Figure 2 provides histograms of Ki and IC50 values, and of the molecular weights of the small molecules across all entries.

Although structural data are available for every protein target included in BindingDB, BindingDB collects data for many ligands that are not represented in the PDB. For example, the PDB has ∼50 structures of acetylcholinesterases, while BindingDB has affinity data for acetylcholinesterase with ∼250 different ligands. More generally, ∼2% of ligands in BindingDB have an exact match in the PDB and ∼15% of ligands in BindingDB have 90% similarity to a ligand in the PDB based upon the search criterion of the PDB. Thus, BindingDB's data collection differs significantly from those of databases which only collect affinities for protein–ligand complexes in the PDB, notably Binding MOAD (10) which holds ∼1400 data, PDBbind (11,12) with ∼1600 data, and AffinDB (13) with ∼750 data.

Figure 1

Number of measurements in BindingDB for various targets and target classes.

Figure 2

Histograms of binding affinities (1 M standard concentration), and molecular weights of ligands in BindingDB.

WEB INTERFACE: QUERY, DOWNLOAD AND VIRTUAL COMPOUND SCREENING

The BindingDB website provides an increasingly rich set of tools for query, analysis and download of binding data. Search capabilities include queries by target name; ligand name; affinity range; chemical structure, substructure and similarity; and target sequence, via BLAST (14). Query results are presented in a summary table, with the option to drill down to more detail on a given measurement. Available details include citation data, with links to PubMed and the option to retrieve all binding data from the same publication; sequence data; SMILES strings (15,16); and chemical structures. Hyperlinks to the PDB allow easy navigation to structural data for a given ligand, protein or complex. Additional tools allow the user to build a ‘data set’ which can be downloaded in the form of an MDL SDfile containing chemical structures, target information and affinities.

The website also provides web-accessible tools for virtual screening of candidate ligands; we are not aware of any other public website that provides this functionality. The user provides a training set of ligands active against a given target or class of targets, either by using queries to form a BindingDB data set, or by uploading an SDfile from disk. The user then uploads his or her own SDfile of candidate ligands, selects one of three machine-learning methods installed on the BindingDB server, and starts the calculation. The software returns a ranking of the user's candidate ligands, where the top-ranked compounds are most likely to share the activity of the training set of active compounds. The results can be downloaded in the form of an SDfile containing the score of each compound; optionally, the compounds in the SDfile can be ranked according to their scores. The three machine-learning methods are as follows.

Maximum similarity

JChem (17) chemical fingerprints are computed with default parameters for each active compound and for each candidate ligand. The software computes the Tanimoto similarity [see, e.g. (18)] of each candidate compound to each active, and ranks the candidate compounds according to their maximal similarity to any active.
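The ranking rule described above can be illustrated with a minimal Python sketch. It is not the BindingDB or JChem implementation: fingerprints are represented here simply as sets of on-bit indices, and the function names and example bit sets are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Assumption: a fingerprint is modeled as the set of bit positions that are on.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: bits on in both / bits on in either."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_by_max_similarity(
    actives: Sequence[Fingerprint],
    candidates: Dict[str, Fingerprint],
) -> List[Tuple[str, float]]:
    """Score each candidate by its highest similarity to any active,
    then sort candidates from most to least similar."""
    scored = [
        (name, max((tanimoto(fp, act) for act in actives), default=0.0))
        for name, fp in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: two actives and three candidates.
actives = [frozenset({1, 2, 3, 4}), frozenset({10, 11, 12})]
candidates = {
    "cand_A": frozenset({1, 2, 3, 5}),
    "cand_B": frozenset({10, 11, 12}),
    "cand_C": frozenset({20, 21}),
}
for name, score in rank_by_max_similarity(actives, candidates):
    print(f"{name}\t{score:.2f}")
```

Because each candidate needs only one pass over the actives and no model training, this approach scales straightforwardly to large candidate sets.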

Binary kernel discrimination

JChem chemical fingerprints are computed with default parameters for each active compound and for a set of decoy compounds that are presumed to be inactive. The decoy compounds can be supplied by the user, or BindingDB can supply a random set of drug-like compounds drawn from the ZINC compound database (19). The BKD method (20) is then trained on a subset of the known actives and decoys, and tested on the remainder of the actives and decoys. The results of the test are reported to the user in terms of the fold enrichment of the known actives among the top 2% and top 10% of the ranked test-set compounds. If a high degree of enrichment is obtained (e.g. 10-fold enrichment), then it is reasonable to screen the user's candidate ligands with the trained model. When the user uploads these compounds, JChem fingerprints are computed for them, and the compounds are scored and ranked. The scores of the candidate ligands can be compared with those of the test-set actives and decoys, which are also provided as part of the output.
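As an illustration of the fold-enrichment measure mentioned above, the following Python sketch uses one common definition: the fraction of actives among the top-ranked compounds divided by the fraction of actives in the whole test set. The exact definition and rounding used by the BindingDB server are not given in the text, so the function name, the rounding of the cutoff and the toy ranking are assumptions.

```python
from typing import Sequence


def fold_enrichment(ranked_is_active: Sequence[bool], top_fraction: float) -> float:
    """Fold enrichment of actives in the top `top_fraction` of a ranked list.

    `ranked_is_active` holds one flag per test-set compound, ordered from
    best score to worst; True marks a known active, False a decoy.
    """
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty ranking with at least one active")
    # Assumption: the cutoff is rounded to the nearest whole compound, minimum 1.
    n_top = max(1, round(top_fraction * n_total))
    hits = sum(ranked_is_active[:n_top])
    # (hits / n_top) / (n_actives / n_total), arranged to limit rounding error.
    return (hits * n_total) / (n_top * n_actives)


# Hypothetical test set: 100 compounds, 10 actives.
# Ranks 1-2 and 5-8 are actives (6 in the top 10); 4 more lie further down.
ranking = [False] * 100
for position in (0, 1, 4, 5, 6, 7, 30, 45, 60, 90):
    ranking[position] = True

print(fold_enrichment(ranking, 0.02))  # enrichment in the top 2%
print(fold_enrichment(ranking, 0.10))  # enrichment in the top 10%
```

In this toy ranking both top-2% compounds are actives against a 10% base rate, giving 10-fold enrichment, the level the text offers as an example of a model worth applying to new candidates.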

Support vector machine

As for the BKD method, a set of active compounds and a set of decoys are established. The user is then presented with a list of quantitative molecular descriptors that can be used for the screening process; a reasonable default set of these is suggested by the website in order to aid the user. Descriptors are computed for all the compounds with Molconn-Z (eduSoft LC), and the descriptor set is then refined to avoid using highly correlated, and therefore redundant, descriptors (21). The LibSVM software (22) is then trained with a subset of the actives and decoys, and applied to the remaining active and decoy compounds to generate training-set and test-set rankings, as previously described (21). The quality of these results is reported as enrichment factors, as for the BKD method, and the user can then upload an SDfile of compounds to be ranked with the trained SVM model.

Maximum similarity is the fastest of the three methods and thus may be most convenient for very large screening sets. The BKD method is slower, but can recover more diverse actives. The SVM method is also slower than maximum similarity, but is arguably the best at finding actives that differ significantly from the known actives used to train the algorithm.
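The idea of refining a descriptor set to avoid highly correlated descriptors can be sketched as a simple greedy filter. This is a generic illustration, not necessarily the procedure of reference (21): the 0.9 threshold, the function names and the descriptor names and values are all assumptions.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def prune_correlated(
    descriptors: Dict[str, List[float]], threshold: float = 0.9
) -> List[str]:
    """Greedy filter: keep a descriptor only if its absolute correlation with
    every descriptor already kept is below `threshold` (assumed cutoff)."""
    kept: List[str] = []
    for name, values in descriptors.items():
        if all(abs(pearson(values, descriptors[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical descriptor table: one value per compound, four compounds.
descriptors = {
    "mol_weight": [100.0, 200.0, 300.0, 400.0],
    "heavy_atoms": [7.0, 14.0, 22.0, 28.0],  # tracks mol_weight closely
    "logp": [1.5, -0.2, 3.1, 0.4],
}
print(prune_correlated(descriptors))
```

The result of a greedy filter depends on the order in which descriptors are visited, so in practice the ordering (for example, by expected relevance) is itself a design choice.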

AVAILABILITY AND CITATION

BindingDB is freely accessible at http://www.bindingdb.org, and may also be accessed by following links from compounds at PubChem. To download SDfiles, users must complete a simple registration process and agree not to republish the data without explicit permission. Users are invited to contact us through the ‘Email us’ link and to participate in the user forum. Suggestions regarding data sets to be extracted and deposited in BindingDB, and for website features, are welcomed. Works using BindingDB should cite references (2–4) in this paper.
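For readers who wish to process a downloaded annotated SDfile, the following Python sketch reads the data items attached to each record. It relies only on the general SDfile layout (data headers of the form `> <FIELD>` followed by value lines and a blank line, with `$$$$` ending each record); the field names and values in the sample are hypothetical and are not taken from an actual BindingDB download.

```python
import re
from typing import Dict, List

# Data header line, e.g. "> <Ki (nM)>"; the field name is inside the angle brackets.
_DATA_HEADER = re.compile(r"^>.*?<([^<>]+)>")


def read_sd_data_items(text: str) -> List[Dict[str, str]]:
    """Return one {field name: value} dict per record in SDfile text.
    The structure block (molfile) of each record is skipped."""
    records: List[Dict[str, str]] = []
    fields: Dict[str, List[str]] = {}
    current = None  # name of the data item being read, if any
    for line in text.splitlines():
        if line.strip() == "$$$$":  # end of record
            records.append({k: "\n".join(v) for k, v in fields.items()})
            fields, current = {}, None
            continue
        header = _DATA_HEADER.match(line)
        if header:
            current = header.group(1)
            fields[current] = []
        elif current is not None:
            if line.strip():
                fields[current].append(line)
            else:  # a blank line ends the data item
                current = None
    return records


# Hypothetical one-record sample with an empty structure block.
SAMPLE = """ligand_1
  hypothetical

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <Target Name>
Example protease

> <Ki (nM)>
12.5

$$$$
"""

for record in read_sd_data_items(SAMPLE):
    print(record)
```

Values are returned as strings; converting affinities to numbers is left to the caller, since deposited values may carry qualifiers or units.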

18 in total

1. The Protein Data Bank.

Authors: H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

2. The PDBbind database: collection of binding affinities for protein-ligand complexes with known three-dimensional structures.

Authors: Renxiao Wang; Xueliang Fang; Yipin Lu; Shaomeng Wang
Journal: J Med Chem Date: 2004-06-03 Impact factor: 7.446

3. ZINC--a free database of commercially available compounds for virtual screening.

Authors: John J Irwin; Brian K Shoichet
Journal: J Chem Inf Model Date: 2005 Jan-Feb Impact factor: 4.956

4. The PDBbind database: methodologies and updates.

Authors: Renxiao Wang; Xueliang Fang; Yipin Lu; Chao-Yie Yang; Shaomeng Wang
Journal: J Med Chem Date: 2005-06-16 Impact factor: 7.446

5. Virtual screening of molecular databases using a support vector machine.

Authors: Robert N Jorissen; Michael K Gilson
Journal: J Chem Inf Model Date: 2005 May-Jun Impact factor: 4.956

6. Binding MOAD (Mother Of All Databases).

Authors: Liegi Hu; Mark L Benson; Richard D Smith; Michael G Lerner; Heather A Carlson
Journal: Proteins Date: 2005-08-15

7. GPCRDB: an information system for G protein-coupled receptors.

Authors: F Horn; J Weare; M W Beukers; S Hörsch; A Bairoch; W Chen; O Edvardsen; F Campagne; G Vriend
Journal: Nucleic Acids Res Date: 1998-01-01 Impact factor: 16.971

8. Will a biological database be different from a biological journal?

Authors: Philip Bourne
Journal: PLoS Comput Biol Date: 2005-08 Impact factor: 4.475

9. GLIDA: GPCR-ligand database for chemical genomic drug discovery.

Authors: Yasushi Okuno; Jiyoon Yang; Kei Taneishi; Hiroaki Yabuuchi; Gozoh Tsujimoto
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971
