Literature DB >> 16381925

AffinDB: a freely accessible database of affinities for protein-ligand complexes from the PDB.

Peter Block¹, Christoph A Sotriffer, Ingo Dramburg, Gerhard Klebe.

Abstract

AffinDB is a database of affinity data for structurally resolved protein-ligand complexes from the Protein Data Bank (PDB). It is freely accessible at http://www.agklebe.de/affinity. Affinity data are collected from the scientific literature, both from primary sources describing the original experimental work of affinity determination and from secondary references which report affinity values determined by others. AffinDB currently contains over 730 affinity entries covering more than 450 different protein-ligand complexes. Besides the affinity value, PDB summary information and additional data are provided, including the experimental conditions of the affinity measurement (if available in the corresponding reference); 2D drawing, SMILES code and molecular weight of the ligand; links to other databases, and bibliographic information. AffinDB can be queried by PDB code or by any combination of affinity range, temperature and pH value of the measurement, ligand molecular weight, and publication data (author, journal and year). Search results can be saved as tabular reports in text files. The database is supposed to be a valuable resource for researchers interested in biomolecular recognition and the development of tools for correlating structural data with affinities, as needed, for example, in structure-based drug design.

Entities: Chemical Disease Gene

Mesh：

Substances：
Ligands
Proteins

Year: 2006 PMID： 16381925 PMCID： PMC1347402 DOI： 10.1093/nar/gkj039

Source DB: PubMed Journal: Nucleic Acids Res ISSN： 0305-1048 Impact factor: 16.971

INTRODUCTION

Understanding the energetics of biomolecular recognition is of paramount importance for a large variety of biomedical and biotechnological disciplines. One of the most prominent examples is given by structure-based drug design where the 3D structure of a target macromolecule (most frequently a protein) is used to identify, design or optimize small-molecule ligands, which bind tightly to the target. Obviously, such design can only be successful if the structural requirements for energetically favorable interactions and high-affinity binding are known. Much of the current knowledge has been gained from comparative analyses of different complex structures and their affinities (1,2). These analyses, however, were normally restricted to rather small sets of data, and the understanding of protein–ligand recognition is still far from being complete, as illustrated by the recurring surprises during projects of molecular design (3–6). Clearly, more data are instrumental to increase the knowledge about protein–ligand interactions and to improve not only the qualitative understanding but also the quantitative tools for estimating affinities from complex structures, such as empirical, regression-based scoring functions (7–12). Structural data of protein–ligand complexes are available to a large and rapidly increasing extent through the Protein Data Bank (PDB) (13). This database, however, is a general resource for biomacromolecular structures that had not particularly been designed for protein–ligand complexes. Accordingly, secondary databases, such as Relibase (14,15) and PDBsum (16,17), have been developed which provide more convenient access to specific information about protein–ligand complexes (e.g. search functions for ligand structures and analysis tools for interaction patterns in Relibase). Unfortunately, neither the secondary structural databases nor the PDB contain any information about the binding energetics of the corresponding complex, since this information is not required to be included upon submission of structural coordinates to the PDB. However, some databases exist that collect binding data for enzymes, receptors or protein–ligand complexes in general, such as BindingDB (18) and KiBank (19), but these, in turn, are not limited to complexes with available structure and do not provide a direct link to the available 3D structure of a given complex with measured affinity. Given the obvious need for databases that establish the missing link between structural information from the PDB and the rather sparse and widely distributed affinity data, we started to develop AffinDB, a database of affinity values collected from the scientific literature for protein–ligand complexes of known structure. Originally intended as a simple tabular collection of affinity values related to PDB codes for in-house use only, the project has grown over time, both with respect to data content and database management, such that it has ultimately been made available to the public as a potentially valuable new resource, despite the recent appearance of other databases of similar scope, most notably PDBbind (20,21) and Binding MOAD (22). In the following, we briefly describe the database architecture and content of AffinDB, as well as the data collection procedure, give a succinct introduction to possibilities for accessing the data through the user interface, and discuss differences and similarities to other databases.

METHODS

Database architecture

AffinDB is based on a MySQL (4.0.24) backend machine. The web interface is written in PHP (4.3.10). The database is designed to provide supplementary information for PDB structures of protein–ligand complexes. Accordingly, AffinDB is structured by PDB code, which is the primary reference for all the data. Basic PDB meta-information about the protein is available for every PDB structure, basic ligand information is provided for ligands with more than five non-hydrogen atoms. Ligand entries of complexes are stored only once in AffinDB, i.e. in case of multiple occurrences of the same ligand in different structures, a pointer to the reference ligand molecule is used. Affinity data and related information are always associated with a specific ligand of a specific PDB structure.

Data collection and database content

The database core is constituted by basic meta-information about all PDB structures. To obtain these data, a helper-application was generated with a Python-based Relibase toolkit and the data were retrieved from Relibase+ (14,15). A further preprocessing step served to store only ligands with more than five non-hydrogen atoms in AffinDB, using a unique and consistently created name for these molecules. The PDB meta-information provided for every entry includes the name of the protein or protein class (as given in the header information of the PDB file), the EC number (for enzymes), the protein source, the resolution of the crystal structure and the names of the authors who determined the structure. In addition, for each PDB code links to the following external databases were added: PDB (13), Relibase (14,15), MSD (23), SCOP (24) and PDBsum (16,17). The ligand entries consist of the chemical name (as provided in the PDB file), the molecular weight and the SMILES code (25) as basic information. In addition, a 2D molecule drawing of the ligand structure is included. These drawings are generated for every unique ligand with Marvin (3.5.7) (). Babel (1.6) (), Corina (3.1) (26), and in-house software was used to harmonize the format of the ligands in order to obtain best results from Marvin. This automated procedure provided correct drawings for most of the ligands. A small proportion which could either not be drawn by Marvin or gave distorted pictures had to be post-processed by hand. Titratable functional groups are always shown in their neutral state, independent of any actual protonation state. The protein and ligand information described so far is shown by AffinDB regardless whether affinity data are already available for the PDB entry or not. The main purpose of AffinDB, however, is to provide affinity information. Affinity data are exclusively extracted from the scientific literature. Both ‘primary’ and ‘secondary’ references are taken into account. A primary reference is a paper describing the original work of the affinity measurement for the corresponding protein–ligand complex. A secondary reference, instead, is any other paper that reports an affinity value for a PDB complex; this may include publications with compilations of affinity data for the development of scoring functions or similar purposes. In a secondary paper, the affinity value for a PDB complex is often only cited, without specifying further experimental details. So far, more than 740 affinity values covering over 470 PDB complexes could be collected and stored in AffinDB (cf. Discussion). The input of affinity data into AffinDB is implemented in the form of a wizard. After entering the desired PDB code, AffinDB provides a list of all the ligands of the PDB entry in combination with a 2D molecule drawing and a hint whether affinity information is already stored for the given ligand. After choosing the desired ligand, the user is requested to enter the affinity information. Upon submission of the data, simple checks of the data integrity are performed and the entry is flagged for review by the database curators. Only after a database curator has checked these data, they are released for public access in AffinDB. The binding affinity is thermodynamically quantified as a free energy of binding ΔGbind or as equilibrium constant (for association: Ka; for dissociation: Kd) for the reversible equilibrium reaction between protein P and ligand L to form the protein–ligand complex PL: P + L ↔ PL. ΔGbind and the equilibrium constants are related by the equation: ΔGbind = −RT ln Ka = RT ln Kd, where T is the temperature (in Kelvin) and R is the ideal gas constant (8.314 Jmol−1 K−1). For enzyme inhibitors, affinities are more frequently quantified in terms of parameters derived from kinetic assays. This may either be the inhibition constant Ki (which to a first approximation may be considered as a Kd for the enzyme–inhibitor complex, thus ΔGbind = RT ln Ki) or the IC50 value, which is the inhibitor concentration leading to 50% inhibition of the enzymatic activity. In AffinDB, the affinity value is stored in the same form as published in the specified reference, i.e. without any conversion of type or unit. If available, experimental uncertainties or error margins are saved as well. Along with the affinity value itself, also the experimental method and conditions of the affinity measurement are stored in AffinDB, if specified in the corresponding reference. This information is provided to the user through a separate ‘affinity information window’ (Figure 1), which can be opened by activating the ‘Details’ link next to the affinity entry in the main window (Figure 2). The method by which the affinity was determined is characterized by a keyword or a brief statement. Temperature and pH value at which the measurements were carried out are stored separately. In addition, the buffer and any other significant reagents or additives present in the solution are reported. For the literature reference itself, the name of the first author, the title of the journal, as well as volume, year and first page of the publication are stored. The reference is linked to the corresponding PubMed entry, which provides direct access to the abstract. A flag indicates whether the reference is of primary or secondary type. Finally, comments and additional valuable information regarding the method, the reference, the structure or the affinity value itself are saved in a separate data field.

Figure 1

Affinity information window for one of the affinity entries for PDB complex 1FLR.

Figure 2

Main window showing a PDB entry with affinity data in AffinDB. PDB complex 1FLR is used as an example. Five different affinity values measured in different studies and under different conditions are available for this complex. The left navigation bar provides fast access to all functionalities of AffinDB.

Database access

The database is freely accessible at . Data can be retrieved via the PDB code, by defining specific search queries using the affinity search form, or simply by browsing. Upon specifying a PDB code in the data entry field on the left navigation bar of the main window (cf. Figure 2), the summary information for the PDB entry is shown. If an affinity value for the ligand is available in the database, it is displayed below the ligand structure, along with the first author and the year of the publication which reports this value. If additional affinity values are available from other references, these are displayed as well, each in a separate line (cf. Figure 2). Further details can be requested for each affinity entry. Searching for a specific PDB entry with the affinity search form (accessible through the ‘Search’ link in the left navigation bar) yields only a result if an affinity value is already associated with the corresponding PDB entry. In the affinity search form, a variety of queries for affinity data and related information can be defined. It is possible to search for affinities of a certain range of magnitude and for measurements carried out at a specific temperature and/or pH range. Affinities for certain enzyme classes or PDB codes may be retrieved, as well as affinities for ligands of a certain molecular weight range. Also the affinity values published by a certain author or within a specified time frame can be requested, and the retrieved affinity values may be limited to those obtained from primary literature sources. AffinDB generates tabular reports for displaying affinity search results and for browsing through the database (Figure 3). The format of the tables consists of six columns providing the drawing of the ligand structure; the PDB code (linked to the summary information for the PDB entry), the affinity value in the originally reported form as well as converted to the negative base-10 logarithm [i.e. as pKi, pKd, pIC50 (relative to the standard concentration of 1 mol/l)]; the pH value of the measurement, the first author of the publication (with a link to the PubMed entry); and the year of the publication. Tables reporting search results can be saved as ‘csv’ file, which is an ASCII file with semicolon-separated columns and one affinity entry per line.

Figure 3

Tabular report, showing part of the search results produced by a query for affinity data published by a specific author (‘Kurinov’) in a primary reference.

DISCUSSION

AffinDB has been designed to provide fast and easy access to affinity data. The popular MySQL backend was chosen as database machine, since MySQL offers a speed-optimized SQL engine. Using the scripting language PHP, special care was taken to generate a clearly structured layout which enables fast and easy navigation. Since all the data can be accessed and retrieved directly via the web browser, the user does not have to install any special software to work with AffinDB. Using current PC hardware (CPU: Athlon XP 2400+), AffinDB executes search queries in <0.1 s, fairly independent of the complexity of the query. The representation of the tabular report including the 2D molecule drawings needs between 0.1 and 5 s, depending on the number of hits (5 s if all entries are retrieved). The representation of a PDB entry takes up to 0.2 s, depending on the number of affinity data available for that entry. These values reflect only a server-side benchmarking. Obviously, the real speed also depends on the client-side hardware, the Internet connection and the browser. Data collection for AffinDB is a very time-consuming process which can hardly be automated since scientifically educated readers are required to critically extract the relevant data from the scientific literature. In contrast to other databases (cf. below), we decided to include all affinity data found during literature research for a given PDB complex. Multiple affinity entries may, thus, be available for certain structures. These may reflect measurements with different methods or under different experimental conditions (e.g. PDB 1FLR; cf. Figure 1), or it may be due to additional reports from secondary references, which allows the user to trace back in which context the corresponding complex and its affinity have already been used. Only purely redundant data are not included (e.g. if the value is reported in the same paper as Kd and ΔG derived thereof). The current coverage of more than 470 PDB structures derives from a priority selection made upon constructing the database. The initial basis was formed by compilations of affinity values from secondary references concerning empirical scoring functions. Owing to discrepancies among some of the values and to obtain more detailed information, primary references were also retrieved for part of this initial set. Subsequently, the database was augmented by seeking affinity data for PDB complexes of different datasets, such as a docking test set of validated structures (27) or datasets for certain target classes (e.g. carbonic anhydrases and trypsin-like serine proteases). Furthermore, published data from our own laboratory were also directly included. AffinDB is a valuable resource for anyone interested in correlating structural data with binding energetics and complements other databases of similar subject, specifically the Ligand–Protein Database LPDB (28), the Protein–Ligand Database PLD (29), PDBbind (20,21), and—published shortly after first submission of this paper—Binding MOAD (22). LPDB is a compilation of 262 PDB complexes with affinity data. Since it also provides scoring values, docked ligand poses (‘decoys’) and ligand files setup for docking, LPDB is primarily intended to serve as a dataset for testing and developing docking and scoring methods. It does neither provide details nor references for the affinity values. The same is true for PLD, which contains 485 complexes and experimental binding energies for 344 of them. PLD can be searched by using a variety of single search criteria, but no combined search queries are possible. Similar to AffinDB it is freely accessible to anybody over the Internet, whereas PDBbind and Binding MOAD require a registration before granting academic users a free login account. The latter two databases offer by far the largest amount of affinity values, both covering well beyond 1700 complexes in their latest updates. Details about the affinity measurement and experimental conditions, however, are not included, which is an information provided by AffinDB for data retrieved from primary references. In summary, although there is certainly some overlap among the structure-affinity databases recently arosen from independent efforts, there are clear differences in focus, design and content, rendering each database on its own and in mutual combination an indispensable tool for the scientific community as long as affinities are not reported by the PDB and/or no common repository for biomolecular affinity data exists. AffinDB encourages users to contribute data and submit references to papers with affinity data for PDB complexes. After registering for upload, an input form can be accessed which facilitates the submission of all relevant data in a clear format. Data submitted by users do not directly enter the database, but must first undergo revision by the database curators. This should ensure high fidelity of the affinity data collected from literature and reported by AffinDB.

24 in total

1. The Protein Data Bank.

Authors: H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

2. Further development and validation of empirical scoring functions for structure-based binding affinity prediction.

Authors: Renxiao Wang; Luhua Lai; Shaomeng Wang
Journal: J Comput Aided Mol Des Date: 2002-01 Impact factor: 3.686

3. E-MSD: the European Bioinformatics Institute Macromolecular Structure Database.

Authors: H Boutselakis; D Dimitropoulos; J Fillon; A Golovin; K Henrick; A Hussain; J Ionides; M John; P A Keller; E Krissinel; P McNeil; A Naim; R Newman; T Oldfield; J Pineda; A Rachedi; J Copeland; A Sitnov; S Sobhany; A Suarez-Uruena; J Swaminathan; M Tagari; J Tate; S Tromm; S Velankar; W Vranken
Journal: Nucleic Acids Res Date: 2003-01-01 Impact factor: 16.971

4. The PDBbind database: collection of binding affinities for protein-ligand complexes with known three-dimensional structures.

Authors: Renxiao Wang; Xueliang Fang; Yipin Lu; Shaomeng Wang
Journal: J Med Chem Date: 2004-06-03 Impact factor: 7.446

5. The PDBbind database: methodologies and updates.

Authors: Renxiao Wang; Xueliang Fang; Yipin Lu; Chao-Yie Yang; Shaomeng Wang
Journal: J Med Chem Date: 2005-06-16 Impact factor: 7.446

6. Binding MOAD (Mother Of All Databases).

Authors: Liegi Hu; Mark L Benson; Richard D Smith; Michael G Lerner; Heather A Carlson
Journal: Proteins Date: 2005-08-15

7. Empirical scoring functions: I. The development of a fast empirical scoring function to estimate the binding affinity of ligands in receptor complexes.

Authors: M D Eldridge; C W Murray; T R Auton; G V Paolini; R P Mee
Journal: J Comput Aided Mol Des Date: 1997-09 Impact factor: 3.686

8. The development of a simple empirical scoring function to estimate the binding constant for a protein-ligand complex of known three-dimensional structure.

Authors: H J Böhm
Journal: J Comput Aided Mol Des Date: 1994-06 Impact factor: 3.686

9. (R)-3-Amidinophenylalanine-derived inhibitors of factor Xa with a novel active-site binding mode.

Authors: Markus Michael Mueller; Stefan Sperl; Jörg Stürzebecher; Wolfram Bode; Luis Moroder
Journal: Biol Chem Date: 2002 Jul-Aug Impact factor: 3.915

10. PDBsum more: new summaries and analyses of the known 3D structures of proteins and nucleic acids.

Authors: Roman A Laskowski; Victor V Chistyakov; Janet M Thornton
Journal: Nucleic Acids Res Date: 2005-01-01 Impact factor: 16.971

20 in total

1. A collaborative environment for developing and validating predictive tools for protein biophysical characteristics.

Authors: Michael A Johnston; Damien Farrell; Jens Erik Nielsen
Journal: J Comput Aided Mol Des Date: 2012-04-04 Impact factor: 3.686

2. A virtual active compound produced from the negative image of a ligand-binding pocket, and its application to in-silico drug screening.

Authors: Yoshifumi Fukunishi; Satoru Kubota; Chisato Kanai; Haruki Nakamura
Journal: J Comput Aided Mol Des Date: 2006-06-21 Impact factor: 3.686

3. Lessons for fragment library design: analysis of output from multiple screening campaigns.

Authors: I-Jen Chen; Roderick E Hubbard
Journal: J Comput Aided Mol Des Date: 2009-06-03 Impact factor: 3.686

4. An improved scoring function for suboptimal polar ligand complexes.

Authors: Giovanni Cincilla; David Vidal; Miquel Pons
Journal: J Comput Aided Mol Des Date: 2008-10-09 Impact factor: 3.686

Review 5. An Overview of the Challenges in Designing, Integrating, and Delivering BARD: A Public Chemical-Biology Resource and Query Portal for Multiple Organizations, Locations, and Disciplines.

Authors: Andrea de Souza; Joshua A Bittker; David L Lahr; Steve Brudz; Simon Chatwin; Tudor I Oprea; Anna Waller; Jeremy J Yang; Noel Southall; Rajarshi Guha; Stephan C Schürer; Uma D Vempati; Mark R Southern; Eric S Dawson; Paul A Clemons; Thomas D Y Chung
Journal: J Biomol Screen Date: 2014-01-17

6. BUDDY-system: A web site for constructing a dataset of protein pairs between ligand-bound and unbound states.

Authors: Mizuki Morita; Tohru Terada; Shugo Nakamura; Kentaro Shimizu
Journal: BMC Res Notes Date: 2011-05-22

7. A chemocentric approach to the identification of cancer targets.

Authors: Beáta Flachner; Zsolt Lörincz; Angelo Carotti; Orazio Nicolotti; Praveena Kuchipudi; Nikita Remez; Ferran Sanz; József Tóvári; Miklós J Szabó; Béla Bertók; Sándor Cseh; Jordi Mestres; György Dormán
Journal: PLoS One Date: 2012-04-25 Impact factor: 3.240