
ProNAB: database for binding affinities of protein-nucleic acid complexes and their mutants.

Kannan Harini, Ambuj Srivastava, Arulsamy Kulandaisamy, M Michael Gromiha.

Abstract

Protein–nucleic acid interactions are involved in various biological processes such as gene expression, replication, transcription, translation and packaging. The binding affinities of protein–DNA and protein–RNA complexes are important for elucidating the mechanism of protein–nucleic acid recognition. Although experimental binding affinity data are reported abundantly in the literature, no well-curated database of protein–nucleic acid binding affinities is currently available. We have developed a database, ProNAB, which contains more than 20 000 experimental data points on the binding affinities of protein–DNA and protein–RNA complexes. Each entry provides comprehensive information on the sequence and structural features of the protein, the nucleic acid and their complex, experimental conditions, thermodynamic parameters such as the dissociation constant (Kd), binding free energy (ΔG) and change in binding free energy upon mutation (ΔΔG), and literature information. ProNAB is cross-linked with GenBank, UniProt, PDB, ProThermDB, PROSITE, DisProt and PubMed. It provides a user-friendly web interface with options for searching, displaying, sorting, visualizing, downloading and uploading data. ProNAB is freely available at https://web.iitm.ac.in/bioinfo2/pronab/. It has potential applications in understanding the factors that influence affinity, developing prediction tools, studying binding affinity changes upon mutation and designing complexes with a desired affinity.
© The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2022        PMID: 34606614      PMCID: PMC8728258          DOI: 10.1093/nar/gkab848

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Protein–nucleic acid interactions play essential roles in fundamental cellular processes such as regulation of gene expression, replication, translation, DNA repair and packaging. The functions of protein–nucleic acid complexes are largely dictated by their binding affinities (1). Experimentally, the strength of protein–nucleic acid interactions is measured using isothermal titration calorimetry (ITC), surface plasmon resonance (SPR), electrophoretic mobility shift assays (EMSA) and fluorescence-based methods. Binding affinity data such as the dissociation constant (Kd) and binding free energy (ΔG) have been used successfully to understand the recognition mechanisms of protein–DNA and protein–RNA complexes (2). Further, amino acid and/or nucleotide mutations in protein–nucleic acid complexes alter their binding affinities, and some of these mutations lead to diseases including cancer and neurodegenerative disorders (3,4). Hence, the binding affinities of protein–nucleic acid complexes are essential for understanding disease-causing mechanisms and for developing disease-specific drug design strategies. For example, designing high-affinity aptamers has been reported as a promising therapeutic approach for several diseases (5–7). With advances in high-throughput experimental methods, a vast amount of thermodynamic data on protein–DNA and protein–RNA complexes has been reported in the literature. Compiling these data effectively into a public repository would help researchers gain insights into the relationships among binding affinity, structure, function and disease. Binding affinities of protein–nucleic acid complexes have been collected in a few databases, such as ProNIT (8), dbAMEPNI (9) and PDBbind (10), but these databases are limited in scope, updates and availability. ProNIT has not been updated since 2006, and dbAMEPNI contains a limited amount of data, restricted to alanine mutations. PDBbind is not specific to protein–nucleic acid complexes and contains binding affinity data only for complexes with known structures. At the same time, experimental binding affinities are needed to develop machine learning methods for predicting binding affinity (11–13) and the change in binding affinity upon mutation (14). In this study, we developed ProNAB, a comprehensive database of protein–nucleic acid binding affinities. It contains binding affinity data such as dissociation (Kd) and association (Ka) constants, enthalpy (ΔH), binding free energy (ΔG) and the change in free energy upon mutation (ΔΔG), along with sequence and structural information on proteins and nucleic acids, experimental conditions and literature information. Each entry in ProNAB provides comprehensive features derived from both protein and nucleic acid sequences and structures, and entries are cross-linked with various sequence and structure databases of proteins and nucleic acids. The web interface provides flexible options to search on different parameters, along with options for sorting and visualization. ProNAB is available at https://web.iitm.ac.in/bioinfo2/pronab/.

CONTENTS OF THE DATABASE

ProNAB provides experimentally determined binding affinity data for wild-type and mutant protein–nucleic acid complexes, collected through a detailed survey of the literature and of existing and obsolete databases. We retrieved research articles and reviews on the binding affinity of protein–nucleic acid complexes from PubMed using keyword searches combined with AND/OR operators (e.g. binding affinity, protein–DNA, protein–RNA, dissociation constant, Kd, free energy of binding, ITC, SPR). We also collected the papers cited for known three-dimensional structures of protein–nucleic acid complexes in the Protein Data Bank (PDB) and checked them for binding affinities. In addition, we checked the tables of contents of journals that publish experimental binding affinity data for protein–nucleic acid complexes (e.g. Biochemistry, Nucleic Acids Research, Journal of Molecular Biology, Journal of Biological Chemistry and PNAS). From each article, we manually curated the names of the protein, nucleic acid and complex, the experimental conditions, the measurement method, thermodynamic data, literature information and the location of the data in the article. The detailed workflow of ProNAB is shown in Figure 1. Each entry is identified by a unique entry number, and its data are organized into seven levels of information, summarized with an example in Table 1 and described after it.
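The paragraph above names the PubMed keywords but not the exact Boolean expression that was used. As a minimal sketch, assuming a hypothetical grouping (affinity and method terms OR-ed together, then AND-ed with the complex types), the query can be assembled into an NCBI E-utilities `esearch` URL. The grouping, the `retmax` value and the helper names are illustrative assumptions, not the authors' actual search.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (only the URL is built; nothing is fetched)
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def quote_term(term):
    """Quote multi-word or hyphenated terms so PubMed treats them as phrases."""
    return f'"{term}"' if (" " in term or "-" in term) else term


def boolean_query(groups):
    """OR the terms within each group, then AND the groups together."""
    clauses = ["(" + " OR ".join(quote_term(t) for t in group) + ")" for group in groups]
    return " AND ".join(clauses)


def esearch_url(term, retmax=100):
    """Build a PubMed esearch URL that returns matching PMIDs as JSON."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


# Hypothetical grouping of the keywords listed in the text
term = boolean_query([
    ["binding affinity", "dissociation constant", "Kd",
     "free energy of binding", "ITC", "SPR"],
    ["protein-DNA", "protein-RNA"],
])
url = esearch_url(term)
print(term)
print(url)
```

Passing `url` to `urllib.request.urlopen` would actually send the query; NCBI limits request rates, so large-scale use should follow the E-utilities usage guidelines.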
Figure 1. Overall workflow of the ProNAB database.

Table 1. Description of data in ProNAB with an example entry showing the binding affinity data of the ‘Cysteine–tRNA ligase’ protein–RNA complex

Description | Example
Entry ID | 12197
Protein name | Cysteine–tRNA ligase
Synonyms | Cysteinyl–tRNA synthetase; CysRS
EC number | 6.1.1.16
Protein source | Escherichia coli (strain K12)
Sequence | MLKIFNTLTRQKEEFKPIHAGEVGMYVCGITVYDLCHIGHGRTFVAFDVVARYLRFLGYKLKYVRNITDIDDKIIKRANENGESFVAMVDRMIAEMHKDFDALNILRPDMEPRATHHIAEIIELTEQLIAKGHAYVADNGDVMFDV…
Length | 461
Mass (Da) | 52,202
UniProt ID | P21888
PROSITE ID | -
DisProt ID | -
PDB of free protein | 1LI5
ASA of free protein (Å²) | 29
ProTherm ID | -
Mutation in protein | N351A
Nucleic acid name | tRNA-Cys
Nucleic acid source | Synthetic
Type of nucleic acid | RNA
Sequence | GGCGCGUUAACAAAGCGGUUAUGUAGCGGAUUGCAAAUCCGUCUAGUCCGGUUCGACUCCGGAAC…
Mutation in nucleic acid | G48C
GenBank ID | 56966181
PDB complex | 1U0B
NDB complex | PR0135
ASA of complex (Å²) | 40
Secondary structure | Coil
pH | 7.5
Temperature (K) | 298
Buffer | 20 mM Tris–HCl
Ion name | 50 mM NaCl
Method | Fluorescence
Kd wild (M) | 2.7 × 10⁻⁷
Kd mutant (M) | 8.16 × 10⁻⁶
Ka wild (M⁻¹) | 4 × 10⁶
Ka mutant (M⁻¹) | 1 × 10⁵
ΔG wild (kcal/mol) | −8.96
ΔG mutant (kcal/mol) | −6.94
ΔΔG (kcal/mol) | 2.02
ΔH wild (kcal/mol) | -
ΔH mutant (kcal/mol) | -
Stoichiometry | -
Reference | Nat Struct Mol Biol. 2004 Nov;11(11):1134–41.
Title | Shape-selective RNA recognition by cysteinyl-tRNA synthetase.
Authors | Hauenstein S, Zhang CM, Hou YM, Perona JJ
Keywords | CysRS; tRNA aminoacylation; elongation factor
PubMed | 15489861
DOI | http://dx.doi.org/10.1038/nsmb849
Location of data | Table 2; Page No.: 1138
Remarks | -
Related entries | 12194; 12195; 12196

Protein information: Protein name, name of the organism (source), sequence, structure, accession numbers of UniProt (15), PROSITE (16), ProThermDB (17), DisProt (18), enzyme commission number (19) and PDB (20), secondary structure, solvent accessibility and mutation information (wild type, single, double and multiple mutations along with mutant positions). We used the SIFTS database (21) to map residue positions between UniProt and PDB.
Nucleic acid information: Nucleic acid name, source, type of nucleic acid (DNA or RNA), GenBank ID, sequence, mutation and type of mutation (single, double or multiple) along with the mutation position.
Complex information: PDB (20) and NDB (22) codes for both wild-type and mutant structures of protein–nucleic acid complexes (if available), 3D visualization of the complex using the JSmol interface (23), and the secondary structure and solvent accessibility of the mutated residue in the complex, calculated using DSSP (24) for proteins with known three-dimensional structures.
Experimental conditions: Temperature, pH, buffer name, additives, ions and method.
Thermodynamic data: Dissociation (Kd) and association (Ka) constants, enthalpy (ΔH), free energy of binding (ΔG) and, for mutants, the change in free energy of binding (ΔΔG).
Literature information: PubMed identifier, author name(s), journal name, year of publication, location of the data, keywords and Digital Object Identifier (DOI).
Miscellaneous information: Remarks and entry numbers related to the same protein in ProNAB.
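Table 1 lists Kd, Ka, ΔG and ΔΔG side by side. A minimal Python sketch, assuming the standard relations Ka = 1/Kd, ΔG = RT ln Kd (1 M standard state) and ΔΔG = ΔG(mutant) − ΔG(wild type), reproduces the derived values of example entry 12197 from its two Kd values at 298 K. The paper does not describe how ProNAB itself computes these quantities, so this is an illustrative consistency check, not the database's procedure; the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g(kd_molar, temp_k):
    """Binding free energy dG = RT ln(Kd / 1 M) in kcal/mol; negative means favourable."""
    return R_KCAL * temp_k * math.log(kd_molar)


def delta_delta_g(kd_wild, kd_mutant, temp_k):
    """ddG = dG(mutant) - dG(wild type); a positive value means weaker binding."""
    return delta_g(kd_mutant, temp_k) - delta_g(kd_wild, temp_k)


# Kd values and temperature of example entry 12197 (Table 1)
kd_wild, kd_mutant, temp = 2.7e-7, 8.16e-6, 298.0

ka_wild, ka_mutant = 1.0 / kd_wild, 1.0 / kd_mutant   # Ka = 1/Kd, in M^-1
dg_wild = delta_g(kd_wild, temp)
dg_mutant = delta_g(kd_mutant, temp)
ddg = delta_delta_g(kd_wild, kd_mutant, temp)

print(f"Ka wild {ka_wild:.1e} M^-1, Ka mutant {ka_mutant:.1e} M^-1")  # 3.7e+06, 1.2e+05
print(f"dG wild {dg_wild:.2f}, dG mutant {dg_mutant:.2f}, ddG {ddg:.2f} kcal/mol")
# -> dG wild -8.96, dG mutant -6.94, ddG 2.02 kcal/mol
```

The computed Ka values round to the one-significant-figure entries in the table, and the positive ΔΔG indicates that the mutation weakens binding.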

DATABASE STATISTICS

ProNAB contains 20 090 entries: 14 606 for protein–DNA and 5323 for protein–RNA binding affinities, along with 161 entries for hybrid (protein–DNA–RNA) complexes. It has binding affinity information for 1027 unique nucleic acid-binding proteins from 1250 literature sources published during 1979–2021. In total, 798 unique protein–DNA and 340 unique protein–RNA complex structures are present in ProNAB. The current version contains 13 642, 12 318 and 4304 wild-type data points for proteins, DNA and RNA, respectively, and 6448, 2288 and 5323 mutation data points for proteins, DNA and RNA, respectively. Among the mutations, 76.4%, 15.4% and 8.15% are single, double and multiple mutations, respectively. Detailed statistics on wild-type and mutant data for proteins and nucleic acids are presented in Figure 2A and B. Figure 2C and D show the distribution of mutants by secondary structure and solvent accessibility, respectively. Figure 2E and F show the distribution of data by year of publication and by the experimental method used to determine the binding affinity, respectively.
Figure 2. Statistics of the ProNAB database based on the distribution of (A) wild-type and mutant data in proteins, (B) wild-type and mutant data in nucleic acids, (C) secondary structure of mutants, (D) solvent accessibility of mutants, (E) publication years and (F) methods.

LINKS TO OTHER DATABASES

ProNAB is cross-linked with different sequence, structure, stability and other relevant databases. Entries are linked with (i) UniProt, which provides sequence and functional information on the protein, (ii) PDB, which provides three-dimensional structural information, (iii) ProThermDB, to understand the stability of proteins and their mutants, (iv) PROSITE, which provides details of the motifs present in the proteins, (v) DisProt, to show the disorder of the protein, (vi) GenBank, for information on nucleic acid sequences, (vii) PDB and NDB, for the protein–nucleic acid complexes and (viii) PubMed, for literature information.
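To illustrate how an entry's identifiers map onto the external resources listed above, here is a minimal sketch that turns the IDs of example entry 12197 (Table 1) into URLs. The URL patterns are the public resolvers of UniProt, PROSITE, DisProt, RCSB PDB, NCBI Nucleotide, PubMed and doi.org, used here as an assumption; ProNAB's own link targets may differ, and the dictionary key names are hypothetical. Entries with a ‘-’ placeholder are skipped.

```python
# Public resolver URL patterns (an assumption: ProNAB's own link targets may differ)
URL_PATTERNS = {
    "uniprot": "https://www.uniprot.org/uniprotkb/{}",
    "prosite": "https://prosite.expasy.org/{}",
    "disprot": "https://disprot.org/{}",
    "pdb": "https://www.rcsb.org/structure/{}",
    "genbank": "https://www.ncbi.nlm.nih.gov/nuccore/{}",
    "pubmed": "https://pubmed.ncbi.nlm.nih.gov/{}/",
    "doi": "https://doi.org/{}",
}


def external_links(entry):
    """Map each cross-reference ID in an entry to a URL, skipping '-' placeholders."""
    links = {}
    for db, pattern in URL_PATTERNS.items():
        value = str(entry.get(db, "")).strip()
        if value and value != "-":
            links[db] = pattern.format(value)
    return links


# Identifiers copied from example entry 12197 (Table 1); the key names are hypothetical
entry_12197 = {
    "uniprot": "P21888", "prosite": "-", "disprot": "-", "pdb": "1U0B",
    "genbank": "56966181", "pubmed": "15489861", "doi": "10.1038/nsmb849",
}

for db, link in external_links(entry_12197).items():
    print(f"{db:8s} {link}")
```

Skipping placeholders rather than emitting broken links mirrors the ‘-’ cells in Table 1, where no PROSITE or DisProt record exists for this protein.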

DATA RETRIEVAL

The search and display options, together with an example of data retrieval from ProNAB, are illustrated in Figure 3. In this example, we combine several search options to retrieve the change in binding free energy upon mutation (ΔΔG) between −5 and −1 kcal/mol for single mutations in proteins, measured at 293–298 K and pH 5–7 (Figure 3A). In the display options, we select the ΔΔG, temperature and pH columns in addition to the default columns (Figure 3B), and the data are sorted by ΔΔG (Figure 3C). After the query is submitted, the results are displayed as a table (Figure 3D). The search results can also be downloaded in CSV format and parsed easily for further analysis. On the results page, each entry number links to its own entry page (Figure 3E), which contains the complete information for that entry. Structural visualization is available for each entry.
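Once search results are downloaded as CSV, the same selection can be reproduced offline. The sketch below assumes hypothetical column names (`ddG`, `Temperature`, `pH`, `Mutation_protein`) and a single-substitution notation like `N351A` (as in Table 1); check the header of an actual ProNAB export before relying on it. The demo rows are made up for illustration and are not ProNAB data, and the ascending sort order is a choice made here.

```python
import csv
import io
import re

# Hypothetical column names; check the header row of a real ProNAB CSV export.
DDG_COL = "ddG"               # change in binding free energy (kcal/mol)
TEMP_COL = "Temperature"      # K
PH_COL = "pH"
MUT_COL = "Mutation_protein"

# Assumed notation for a single substitution, e.g. "N351A" as in Table 1.
SINGLE_MUTATION = re.compile(r"^[A-Z]\d+[A-Z]$")


def to_float(value):
    """Parse a numeric cell; return None for blanks or placeholders such as '-'."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def within(value, bounds):
    """Inclusive range check that rejects missing values."""
    return value is not None and bounds[0] <= value <= bounds[1]


def select_entries(rows, ddg_range=(-5.0, -1.0), temp_range=(293.0, 298.0),
                   ph_range=(5.0, 7.0)):
    """Keep single protein mutants inside all three ranges, sorted by ddG (ascending)."""
    hits = []
    for row in rows:
        ddg = to_float(row.get(DDG_COL))
        if (SINGLE_MUTATION.match((row.get(MUT_COL) or "").strip())
                and within(ddg, ddg_range)
                and within(to_float(row.get(TEMP_COL)), temp_range)
                and within(to_float(row.get(PH_COL)), ph_range)):
            hits.append((ddg, row))
    hits.sort(key=lambda pair: pair[0])
    return [row for _, row in hits]


# Made-up rows for illustration only (not ProNAB data)
demo_csv = io.StringIO(
    "Entry_id,Mutation_protein,ddG,Temperature,pH\n"
    "A,K10A,-2.5,298,7.0\n"
    "B,K10A R12A,-3.0,298,7.0\n"
    "C,N351A,2.02,298,7.5\n"
    "D,R5E,-4.1,293,6.0\n"
    "E,Y20F,-1.5,310,6.5\n"
)
selected = select_entries(csv.DictReader(demo_csv))
print([row["Entry_id"] for row in selected])  # ['D', 'A']
```

Row B is dropped as a double mutant, C for its ΔΔG and pH, and E for its temperature; sorting on the parsed value avoids comparing the row dictionaries themselves.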
Figure 3. An example of data retrieval from the ProNAB database using different search and display options.

DATA DOWNLOAD AND UPLOAD

Users can upload newly determined experimental binding affinity data for protein–nucleic acid complexes to ProNAB. To upload data, depositors should use the ‘Data Upload’ option on the web page and supply the following information: the name of the protein–nucleic acid complex, the UniProt/PDB code and the PubMed ID or Digital Object Identifier (DOI). The entire ProNAB dataset can also be obtained by submitting a request to the corresponding author through the ‘Data Download’ option on the web page.
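Before submitting, a depositor might want to sanity-check the three required fields. The sketch below is a hypothetical client-side check, not ProNAB's actual form validation: the field names are invented, the UniProt pattern is the accession format published by UniProt, the PDB pattern covers only classic four-character codes, and the DOI pattern is a simplified approximation.

```python
import re

# UniProt accession pattern published in the UniProt documentation
UNIPROT_AC = re.compile(
    r"^([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$")
PDB_ID = re.compile(r"^[0-9][A-Za-z0-9]{3}$")   # classic four-character PDB code
PMID = re.compile(r"^[1-9][0-9]*$")             # PubMed IDs are positive integers
DOI = re.compile(r"^10\.\d{4,9}/\S+$")          # simplified DOI pattern


def check_submission(record):
    """Return a list of problems in a hypothetical upload record (empty if none)."""
    problems = []
    if not record.get("complex_name", "").strip():
        problems.append("missing complex name")
    code = record.get("uniprot_or_pdb", "").strip()
    if not (UNIPROT_AC.match(code) or PDB_ID.match(code)):
        problems.append("UniProt/PDB code not recognized")
    ref = record.get("pubmed_or_doi", "").strip()
    if not (PMID.match(ref) or DOI.match(ref)):
        problems.append("PubMed ID or DOI not recognized")
    return problems


# Identifiers taken from example entry 12197 in Table 1; field names are hypothetical
good = {"complex_name": "Cysteine-tRNA ligase / tRNA-Cys",
        "uniprot_or_pdb": "1U0B", "pubmed_or_doi": "10.1038/nsmb849"}
bad = {"complex_name": "", "uniprot_or_pdb": "XYZ", "pubmed_or_doi": "abc"}
print(check_submission(good))  # []
print(check_submission(bad))
```

Returning a list of problems instead of raising on the first one lets a form report every missing or malformed field at once.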

COMPARISON WITH EXISTING DATABASES

A detailed comparison of ProNAB with other existing databases is given in Table 2. ProNIT is no longer accessible on the web; ProNAB contains 66% more data than the last version of ProNIT, with a considerable increase in entries with known structural information. dbAMEPNI has only 578 binding affinity data points, all for alanine mutations, whereas ProNAB has 2397 entries for alanine mutations as well as binding affinity data for all other types of mutations and for wild-type complexes. PDBbind contains binding affinity data for biomolecular complexes (protein–protein, protein–ligand and protein–nucleic acid) in the PDB, i.e. only for complexes with known structures. In contrast, ProNAB is a comprehensive database for the binding affinity of protein–nucleic acid complexes that includes both sequence and structural information. ProNAB also provides features such as the change in binding affinity upon mutation and the exact location of the data in the literature, as well as options to upload new data and links for the structural visualization of complexes. In addition, ProNAB is linked to several databases such as ProThermDB, DisProt, PROSITE and GenBank.
Table 2. Comparison of ProNAB with other existing databases

Features | ProNIT | dbAMEPNI | PDBbind | ProNAB
Availability | No | Yes | Yes | Yes
Number of entries | 12 174 | 578 | 973 | 20 090
Number of unique PDB structures | 124 (108 protein–DNA, 16 protein–RNA) | 152 (101 protein–DNA, 51 protein–RNA) | 973 (670 protein–DNA, 293 protein–RNA) | 1138 (798 protein–DNA, 340 protein–RNA)
Type of mutation | All | Only alanine mutations | All | All
Change in binding affinity upon mutation (ΔΔG) | No | Yes | No | Yes
Availability of exact location of data | No | Yes | No | Yes
Structure visualization | No | No | No | Yes
Protein information and buffer conditions | Yes | No | No | Yes
Upload option (to help maintenance) | No | No | No | Yes
Literature years | 1983–2013 | 1983–2017 | 1993–2019 | 1979–2021

APPLICATIONS

ProNAB has several potential applications, including the following:
(i) Exploring the relationship between binding affinity and the structure- and sequence-based features of proteins and nucleic acids to understand the molecular mechanism of protein–nucleic acid interactions (25,26). ProNAB provides a wealth of binding affinity data for protein–nucleic acid complexes, which can be used to identify the structural and functional features that govern these affinities.
(ii) Studying the effect of mutations on the binding affinity of a complex (27,28). ProNAB provides experimentally determined binding affinities of protein–nucleic acid complexes and their mutants, which are useful both for large-scale analyses and for in-depth analysis of a specific complex.
(iii) Developing computational tools and reliable machine learning models for predicting the binding affinity of protein–nucleic acid complexes and the change in binding affinity upon mutation (11–14).
(iv) Designing DNA/RNA aptamers with a desired affinity (29,30). Using the binding affinity data of experimentally characterized DNA/RNA aptamers available in ProNAB, computational and experimental methods could be developed to design aptamers with desired affinities.
(v) Investigating the relationship between binding affinity changes and disease-causing mutations, as reported for protein–protein complexes (31).

DATA AVAILABILITY

ProNAB is available at https://web.iitm.ac.in/bioinfo2/pronab/. The database was developed using HTML, CSS, PHP, MySQL and JavaScript and supports the latest versions of major browsers such as Firefox, Chrome and Opera. The database will be maintained and updated regularly, and each update will be announced on the database homepage. Comments and suggestions are welcome and should be sent to gromiha@iitm.ac.in.
REFERENCES (partial list)

Rahul Nikam, A Kulandaisamy, K Harini, Divya Sharma, M Michael Gromiha. ProThermDB: thermodynamic database for proteins and mutants revisited after 15 years. Nucleic Acids Res. 2021-01-08.

W Kabsch, C Sander. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983-12.

Manoosak Wongphatcharachai, Ping Wang, Shinichiro Enomoto, Richard J Webby, Marie R Gramer, Alongkorn Amonsin, Srinand Sreevatsan. Neutralizing DNA aptamers against swine influenza H3N2 viruses. J Clin Microbiol. 2012-10-17.

M D Shaji Kumar, K Abdulla Bava, M Michael Gromiha, Ponraj Prabakaran, Koji Kitajima, Hatsuho Uedaira, Akinori Sarai. ProTherm and ProNIT: thermodynamic databases for proteins and protein-nucleic acid interactions. Nucleic Acids Res. 2006-01-01.

Ling Liu, Yi Xiong, Hongyun Gao, Dong-Qing Wei, Julie C Mitchell, Xiaolei Zhu. dbAMEPNI: a database of alanine mutagenic effects for protein-nucleic acid interactions. Database (Oxford). 2018-01-01.

Juwairiah Remali, Wan Mohd Aizat, Chyan Leong Ng, Yi Chieh Lim, Zeti-Azura Mohamed-Hussein, Shazrul Fazry. In silico analysis on the functional and structural impact of Rad50 mutations involved in DNA strand break repair. PeerJ. 2020-05-22.

Anne Morgat, Thierry Lombardot, Elisabeth Coudert, Kristian Axelsen, Teresa Batista Neto, Sebastien Gehant, Parit Bansal, Jerven Bolleman, Elisabeth Gasteiger, Edouard de Castro, Delphine Baratin, Monica Pozzato, Ioannis Xenarios, Sylvain Poux, Nicole Redaschi, Alan Bridge. Enzyme annotation in UniProtKB using Rhea. Bioinformatics. 2020-03-01.

Buvaneswari Coimbatore Narayanan, John Westbrook, Saheli Ghosh, Anton I Petrov, Blake Sweeney, Craig L Zirbel, Neocles B Leontis, Helen M Berman. The Nucleic Acid Database: new features and capabilities. Nucleic Acids Res. 2013-10-31.

David Jakubec, Roman A Laskowski, Jiri Vondrasek. Sequence-Specific Recognition of DNA by Proteins: Binding Motifs Discovered Using a Novel Statistical/Computational Analysis. PLoS One. 2016-07-06.

Yana Rose, Jose M Duarte, Robert Lowe, Joan Segura, Chunxiao Bi, Charmi Bhikadiya, Li Chen, Alexander S Rose, Sebastian Bittrich, Stephen K Burley, John D Westbrook. RCSB Protein Data Bank: Architectural Advances Towards Integrated Searching and Efficient Access to Macromolecular Structure Data from the PDB Archive. J Mol Biol. 2020-11-10.