
DBBP: database of binding pairs in protein-nucleic acid interactions.

Byungkyu Park, Hyungchan Kim, Kyungsook Han.   

Abstract

BACKGROUND: Interaction of proteins with other molecules plays an important role in many biological activities. As many structures of protein-DNA complexes and protein-RNA complexes have been determined in the past years, several databases have been constructed to provide structure data of the complexes. However, the information on the binding sites between proteins and nucleic acids is not readily available from the structure data since the data consists mostly of the three-dimensional coordinates of the atoms in the complexes.
RESULTS: We analyzed the huge amount of structure data for the hydrogen bonding interactions between proteins and nucleic acids and developed a database called DBBP (DataBase of Binding Pairs in protein-nucleic acid interactions, http://bclab.inha.ac.kr/dbbp). DBBP contains 44,955 hydrogen bonds (H-bonds) of protein-DNA interactions and 77,947 H-bonds of protein-RNA interactions.
CONCLUSIONS: Analysis of the huge amount of structure data of protein-nucleic acid complexes is labor-intensive, yet provides useful information for studying protein-nucleic acid interactions. DBBP provides the detailed information of hydrogen-bonding interactions between proteins and nucleic acids at various levels from the atomic level to the residue level. The binding information can be used as a valuable resource for developing a computational method aiming at predicting new binding sites in proteins or nucleic acids.

Year:  2014        PMID: 25474259      PMCID: PMC4271565          DOI: 10.1186/1471-2105-15-S15-S5

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


Background

Protein-nucleic acid interactions play an important role in many biological activities. Site-specific DNA-binding proteins, or transcription factors (TFs), play important roles in gene regulation by forming protein complexes [1]. These protein-DNA complexes may bind alone or in combination near the genes whose expression they control [2]. For example, DNA-binding proteins may regulate the expression of a target gene [1], so protein-DNA interactions are important for DNA replication, transcription and gene regulation in general. Protein-RNA interactions also play important roles in many aspects of gene expression [3]. For instance, ribonucleoprotein particles (RNPs) bind to RNA in the post-transcriptional regulation of gene expression [4], and tRNAs bind to aminoacyl-tRNA synthetases to properly translate the genetic code into amino acids [5]. RNA-binding proteins are essential for RNA splicing, metabolism, stability, localization, transport, translation, and degradation [6]. Therefore, identifying the amino acids involved in DNA/RNA binding, or the (ribo)nucleotides involved in amino acid binding, is important for understanding the mechanism of gene regulation. As the number of resolved structures of protein-DNA/RNA complexes has increased considerably over the past few years, a large amount of structure data is available in several databases [7-10]. However, data on the binding sites between proteins and nucleic acids are not readily available from the structure data, which consist mostly of the three-dimensional coordinates of the atoms in the complexes. A recent database called the Protein-RNA Interface Database (PRIDB) [9] provides information on protein-RNA interfaces by showing interacting amino acids and ribonucleotides in the primary sequences.
However, it does not identify the interacting partner of each amino acid or ribonucleotide in a protein-RNA interface. In this study we performed an extensive analysis of the structures of protein-DNA/RNA complexes and built a database called DBBP (DataBase of Binding Pairs in protein-nucleic acid interactions). The database shows hydrogen-bonding interactions between proteins and nucleic acids at an atomic level, which is not readily available in any other database, including the Protein Data Bank (PDB) [11]. The binding pairs of hydrogen bonds provided by the database will help researchers determine DNA (or RNA) binding sites in proteins and protein binding sites in DNA or RNA molecules. It can also be used as a valuable resource for developing computational methods that predict new binding sites in proteins or nucleic acids. The rest of the paper presents the structure and interface of the database.

Materials and methods

Protein-DNA/RNA complexes

The protein-DNA/RNA complexes determined by X-ray crystallography were selected from PDB. As of February, 2013 there were 2,568 protein-DNA complexes and 1,355 protein-RNA complexes in PDB. After extracting complexes with a resolution of 3.0 Å or better, 2,138 protein-DNA complexes (called the DS1 data set) and 651 protein-RNA complexes (the DS2 data set) remained.

Binding sites in protein-nucleic acid interactions

Different studies [9,12-14] have defined slightly different criteria for a binding site in protein-nucleic acid interactions. For example, in RNABindR [15,16] and BindN [17] an amino acid with an atom within 5 Å of any atom of a ribonucleotide was considered to be an RNA-binding amino acid. As the criterion for a binding site between proteins and nucleic acids, we use a hydrogen bond (H-bond), which is stricter than the distance criterion. The locations of hydrogen atoms (H) were inferred from the surrounding atoms since hydrogen atoms are invisible in purely X-ray-derived structures. H-bonds between proteins and nucleic acids were identified by finding all proximal atom pairs between H-bond donors (D) and acceptors (A) that satisfy the following geometric criteria: (1) the hydrogen-acceptor (H-A) distance <2.5 Å, (2) the donor-hydrogen-acceptor (D-H-A) angle >90°, (3) the donor-acceptor (D-A) distance <3.9 Å, and (4) the H-A-AA angle >90°, where AA is an acceptor antecedent. These are the most commonly used criteria for H-bonds. In particular, the criteria of H-A distance <2.5 Å and D-H-A angle >90° are essential for H-bonds [18]. If a protein-nucleic acid complex contained no H-bond, we eliminated it from the DS1 and DS2 data sets. As a result, we gathered 2,068 protein-DNA complexes (DS3) and 637 protein-RNA complexes (DS4). As an example, Figure 1 shows three H-bonds between Threonine (Thr224) and Cytosine (C8) in a protein-RNA complex (PDB ID: 4F3T) [19]. In protein-RNA interactions, OG1 and N of Threonine can act as hydrogen donors, and OG1 and O of Threonine can act as hydrogen acceptors. N3, N4, O2′ and O3′ of Cytosine can act as hydrogen donors, and N3, O2, O2′, O3′, O4′, O5′, OP1 and OP2 of Cytosine can act as hydrogen acceptors. In this example, Cytosine is the 8th nucleotide in RNA chain R and Threonine is the 224th amino acid in protein chain A. OG1 of Threonine donates hydrogen to O2′ of Cytosine, OG1 of Threonine donates hydrogen to O3′ of Cytosine, and O2′ of Cytosine donates hydrogen to OG1 of Threonine. Figure 2 shows the structure of the protein-RNA complex (PDB ID: 4F3T).
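The four geometric criteria listed above can be sketched as a small Python check. This is only an illustration under stated assumptions: the function names, the argument order, and the coordinates in the usage note are hypothetical, and the hydrogen position is assumed to have been inferred beforehand. It is not the HBPLUS implementation.

```python
import math

def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))

def _norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))

def distance(a, b):
    """Distance between two points, in the units of the input (Å here)."""
    return _norm(_sub(a, b))

def angle_deg(a, b, c):
    """Angle a-b-c in degrees, measured at the vertex b."""
    v1, v2 = _sub(a, b), _sub(c, b)
    cos_t = sum(x * y for x, y in zip(v1, v2)) / (_norm(v1) * _norm(v2))
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding error
    return math.degrees(math.acos(cos_t))

def is_hbond(donor, hydrogen, acceptor, acceptor_antecedent,
             max_ha=2.5, max_da=3.9, min_angle=90.0):
    """Apply the four criteria from the text:
    H-A < 2.5 Å, D-H-A > 90°, D-A < 3.9 Å, H-A-AA > 90°."""
    return (distance(hydrogen, acceptor) < max_ha
            and angle_deg(donor, hydrogen, acceptor) > min_angle
            and distance(donor, acceptor) < max_da
            and angle_deg(hydrogen, acceptor, acceptor_antecedent) > min_angle)
```

For example, with made-up coordinates (not taken from any PDB entry), `is_hbond((0, 0, 0), (1.0, 0, 0), (2.9, 0, 0), (3.5, 1.1, 0))` passes all four tests, while moving the acceptor to `(4.0, 0, 0)` fails the H-A distance test.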
Figure 1

Three H-bonds between Cytosine (C8) and Threonine (Thr224). Three H-bonds between Cytosine (C8) and Threonine (Thr224) of a protein-RNA complex (PDB ID: 4F3T). O2′ of Cytosine donates hydrogen to OG1 of Threonine. OG1 of Threonine donates hydrogen to O2′ of Cytosine and OG1 of Threonine donates hydrogen to O3′ of Cytosine.

Figure 2

The structure of a protein-RNA complex (PDB ID: 4F3T). The enlarged box shows three hydrogen bonds between Cytosine and Threonine. O2′ donates hydrogen to OG1. OG1 donates hydrogen to O2′ and O3′.


The probability of binding amino acid

Let P(+) be the probability that an amino acid is a binding site and P(−) be the probability that an amino acid is a non-binding site in protein-nucleic acid interactions (Equations 1 and 2). Then, the conditional probability P(A|+) is the probability that a binding amino acid is A. Likewise, the conditional probability P(A|−) is the probability that a non-binding amino acid is A. Equation 5 is the log-likelihood ratio of P(A|+) and P(A|−), that is, log2(P(A|+)/P(A|−)).
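As a rough illustration of the log-likelihood ratio, the sketch below estimates P(A|+) and P(A|−) as relative frequencies among binding and non-binding residues and then takes log2 of their ratio. The frequency-based estimate, the function name, and the toy input are assumptions made for this example; the numbered equations themselves are not reproduced in this text.

```python
import math
from collections import Counter

def log_likelihood_ratios(residues, labels):
    """Return {amino acid: log2(P(A|+) / P(A|-))}.

    residues: sequence of amino-acid codes.
    labels:   sequence of '+' (binding) or '-' (non-binding), one per residue.
    Amino acids missing from either class are skipped, because the ratio
    is undefined when one of the probabilities is zero.
    """
    binding = Counter(r for r, l in zip(residues, labels) if l == "+")
    nonbinding = Counter(r for r, l in zip(residues, labels) if l == "-")
    n_pos, n_neg = sum(binding.values()), sum(nonbinding.values())
    if n_pos == 0 or n_neg == 0:
        return {}
    ratios = {}
    for aa in set(binding) | set(nonbinding):
        p_pos = binding[aa] / n_pos      # estimate of P(A|+)
        p_neg = nonbinding[aa] / n_neg   # estimate of P(A|-)
        if p_pos > 0 and p_neg > 0:
            ratios[aa] = math.log2(p_pos / p_neg)
    return ratios

# Toy data (hypothetical): R is enriched among binding residues, A is depleted.
toy = log_likelihood_ratios(list("RRRAAAAK"), list("++-+---+"))
```

In the toy input, R accounts for half of the binding residues but only a quarter of the non-binding ones, so its ratio is positive, while A gets a negative ratio.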

Results and discussion

Hydrogen bonds in protein-nucleic acid interactions

We obtained H-bonds from 2,068 protein-DNA complexes (DS3) and 637 protein-RNA complexes (DS4) using HBPLUS [18,20] with the H-bond criteria described in Materials and methods (H-A distance <2.5 Å, ∠DHA >90°, D-A distance <3.9 Å). There are a total of 44,955 H-bonds in protein-DNA complexes and 77,947 H-bonds in protein-RNA complexes. Table 1 shows the number of occurrences of each amino acid atom in H-bonds. In the 44,955 H-bonds of protein-DNA complexes, there are 41,298 hydrogen donors and 3,657 hydrogen acceptors in amino acids. In the 77,947 H-bonds of protein-RNA complexes, there are 59,796 hydrogen donors and 18,151 hydrogen acceptors in amino acids. Table 2 shows the number of occurrences of each (ribo)nucleotide atom in H-bonds. In the 44,955 H-bonds of protein-DNA complexes, there are 3,657 hydrogen donors and 41,298 hydrogen acceptors in DNAs. In the 77,947 H-bonds of protein-RNA complexes, there are 18,151 hydrogen donors and 59,796 hydrogen acceptors in RNAs.
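The donor and acceptor counts quoted above follow from simple bookkeeping: every protein-nucleic acid H-bond contributes exactly one donor and one acceptor, one on each molecule. A minimal sketch is given below; the record layout and the names `tally_roles`, `REPORTED` and `totals_consistent` are hypothetical, and only the totals are taken from the text.

```python
from collections import Counter

def tally_roles(hbonds):
    """Count donor and acceptor roles per molecule type.

    hbonds: iterable of (donor_molecule, acceptor_molecule) pairs,
            each entry being 'protein' or 'nucleic'.
    """
    counts = Counter()
    for donor_mol, acceptor_mol in hbonds:
        counts[(donor_mol, "donor")] += 1
        counts[(acceptor_mol, "acceptor")] += 1
    return counts

# Totals as reported in the text (amino-acid side of each H-bond).
REPORTED = {
    "protein-DNA": {"protein_donors": 41298, "protein_acceptors": 3657, "hbonds": 44955},
    "protein-RNA": {"protein_donors": 59796, "protein_acceptors": 18151, "hbonds": 77947},
}

def totals_consistent(entry):
    """Each H-bond has the protein as either donor or acceptor, so the two must sum to the total."""
    return entry["protein_donors"] + entry["protein_acceptors"] == entry["hbonds"]

# Toy records: two bonds donated by the protein, one donated by the nucleic acid.
toy_counts = tally_roles([("protein", "nucleic"),
                          ("protein", "nucleic"),
                          ("nucleic", "protein")])
```

In the toy records, the protein donor count equals the nucleic acid acceptor count, which mirrors the symmetry between the amino acid and nucleotide totals.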
Table 1

Atoms of amino acids involved in H-bonding interactions with nucleic acids.

RNA-protein complex    DNA-protein complex
AA    Atom    Acceptor    Donor    #H-bonds    Acceptor    Donor    #H-bonds
AlaN1,0691,653674808
O567134
OXT17

ArgNH29,25222,3956,14413,705
NH17,2784,665
NE4,0112,191
N1,388606
O45599
OXT13

AsnND23,2684,95323493,119
OD1934408
N549261
O202101

AspOD21,4162,829353735
OD11,183290
O17831
N5261

CysSG237612519120215
O24
N276

GlnNE216824964,46821,5932,571
OE11,108363521
N480
O21692

GluOE21,6913,507275737
OE11,315260
O19319
N308183

GlyN1,5182,69917491,902
O1,175153
OXT6

HisNE24121,4543,591307681,254
ND15361,01415327
N10690
O6924

IleN258309433466
O4033
OXT11

LeuN507766362387
O25925

LysNZ9,8645,1456,351
N85211,436861,120
O717
OXT3

MetSD10566215147
O27613119
N278
OXT3

PheO33353942247
N206205

ProO1611612828

SerOG1,1794,6756,9971823,5334,741
N683958
O46068

ThrOG11,0584,4067,2671583,0174,252
O750132
N1,053945

TrpNE1532582358393
OXT16
O1410
N2025

TyrOH5971,9352,6821331,8002,511
O9328
N57550

ValO17432636386
N151350
OXT1
Total    18,151    59,796    77,947    3,657    41,298    44,955
Table 2

Atoms of nucleotides involved in H-bonding interactions with amino acids.

RNA-protein complex    DNA-protein complex
Nucleotide    Atom    Acceptor    Donor    #H-bonds    Acceptor    Donor    #H-bonds
AN140214022,103582310,254
N31,0717974826
N61,472621
N7505580
O2'4,2404,269
O3'1,71186361100
O4'252276
O5'110188
OP11,7544,039
OP26,0123,234

CN33354916,18912739,502
N47851,272
O22,556959
O2'2,1012,20911
O3'1,15056257139
O4'663209
O5'117118
OP15,1763,858
OP29922,558

GN154775930,350220414,864
N23,907761
N3655533992
N71,6602,238
O2'2,0472,383
O3'1,03124438157
O4'450420
O5'585197
O62,3962,272
OP110,5234,359
OP23,3303,415

U/TN31733869,3052923410,335
O21,5611,165
O2'1,3101,445
O3'1,06749351114
O41,199796
O4'166257
O5'45216
OP11,1083,548
OP27963,625

Total    59,796    18,151    77,947    41,298    3,657    44,955
If an atom of DNA acts as a hydrogen acceptor, the partner atom of the protein must be a hydrogen donor. Hence, the number of DNA acceptors (41,298) is the same as the number of protein donors (41,298), and the number of DNA donors (3,657) is the same as the number of protein acceptors (3,657). Likewise, the number of RNA acceptors (59,796) is the same as the number of protein donors (59,796) and the number of RNA donors (18,151) is the same as the number of protein acceptors (18,151). Figure 3 shows RNA-binding amino acids in protein-RNA complexes. Ala, Arg, Glu, Gly, Leu, Lys, and Val are more frequent than others in protein-RNA complexes (Figure 3A). In binding sites with RNA, Arg is the most frequently observed amino acid. Figure 3C shows the log-likelihood ratio (Equation 5) for each amino acid. Amino acids with a positive log-likelihood ratio are more likely to bind to RNA than those with a negative log-likelihood ratio. Arg has the highest log-likelihood ratio (1.59), and Val has the lowest log-likelihood ratio (-4.24). Interestingly, Ala has a negative log-likelihood ratio although it is frequently observed in protein-RNA complexes. This is because Ala is rarely observed in binding sites.
Figure 3

RNA-binding amino acids in protein-RNA complexes. (A) Amino acids in the protein-RNA complexes and RNA-binding amino acids. (B) The probability that the binding amino acid is A (P(A|+)) and the probability that non-binding amino acid is A (P(A|−)). (C) The log-likelihood ratio log2(P(A|+)/P(A|−)).

Figure 4 shows DNA-binding amino acids in protein-DNA complexes. Ala, Arg, Glu, Gly, Leu, Lys, Ser, and Val are more frequent than others in protein-DNA complexes (Figure 4A). As in protein-RNA interactions, Arg is the most frequently observed amino acid in the binding sites with DNA.
Figure 4

DNA-binding amino acids in protein-DNA complexes. (A) Amino acids in the protein-DNA complexes and DNA-binding amino acids. (B) The probability that the binding amino acid is A (P(A|+)) and the probability that non-binding amino acid is A (P(A|−)). (C) The log-likelihood ratio log2(P(A|+)/P(A|−)).


Web interface

DBBP shows binding pairs at various levels, from the atomic level to the residue level. When it shows detailed information on H-bonds, it shows the donor and acceptor of each H-bond. The same type of atom can act as a hydrogen donor or acceptor depending on the context. We generated XML files for the binding sites of protein-DNA/RNA complexes. Users of the database can access the XML file for a complex via its PDB ID. Figure 5 shows our XML schema. The BindPartner element has elements and attributes for the PDB ID, protein sequence (proSeq), protein bond (proBnd), DNA/RNA sequence (dnaSeq, rnaSeq), and DNA/RNA bond (dnaBnd, rnaBnd). The DNA/RNA and protein bond fields use '+' for a binding site and '-' for a non-binding site. The BindingSite element has the attributes PDBID, Acceptor, Acceptor chain, Acceptor index, Acceptor residue, Donor, Donor chain, Donor index, and Donor residue.
Figure 5

The XML schema of the database. XML files were generated for the binding sites in protein-DNA complexes and protein-RNA complexes via the XML schema.
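To make the schema description more concrete, the sketch below parses a small hand-written record with Python's standard `xml.etree.ElementTree` module. Everything about the record layout is an assumption made for illustration: the exact attribute spellings (for example `AcceptorChain`), the nesting of BindingSite inside BindPartner, the toy three-letter sequences, and the reading of the bond strings as one '+' or '-' per sequence position. Only the three H-bonds between Thr224 and C8 of 4F3T are taken from the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; the real DBBP files may be laid out differently.
SAMPLE = """<BindPartner PDBID="4F3T" proSeq="MKT" proBnd="--+" rnaSeq="GAC" rnaBnd="--+">
  <BindingSite PDBID="4F3T" Donor="OG1" DonorChain="A" DonorIndex="224" DonorResidue="THR"
               Acceptor="O2'" AcceptorChain="R" AcceptorIndex="8" AcceptorResidue="C"/>
  <BindingSite PDBID="4F3T" Donor="OG1" DonorChain="A" DonorIndex="224" DonorResidue="THR"
               Acceptor="O3'" AcceptorChain="R" AcceptorIndex="8" AcceptorResidue="C"/>
  <BindingSite PDBID="4F3T" Donor="O2'" DonorChain="R" DonorIndex="8" DonorResidue="C"
               Acceptor="OG1" AcceptorChain="A" AcceptorIndex="224" AcceptorResidue="THR"/>
</BindPartner>"""

def parse_binding_sites(xml_text):
    """Return one attribute dictionary per BindingSite element."""
    root = ET.fromstring(xml_text)
    return [dict(site.attrib) for site in root.iter("BindingSite")]

def binding_positions(bond_string):
    """1-based positions marked '+' in a bond string such as proBnd (assumed per-position)."""
    return [i for i, flag in enumerate(bond_string, start=1) if flag == "+"]

sites = parse_binding_sites(SAMPLE)
root = ET.fromstring(SAMPLE)
protein_sites = binding_positions(root.get("proBnd"))
```

Keeping each H-bond as a flat set of attributes makes it easy to filter by donor or acceptor residue, while the bond strings give a quick residue-level view of the same complex.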


Conclusion

From an extensive analysis of the structure data of protein-DNA/RNA complexes extracted from PDB, we have identified hydrogen bonds (H-bonds). Analysis of the large amount of structure data for H-bonds is labor-intensive, yet provides useful information for studying protein-nucleic acid interactions. The protein-DNA complexes contain 44,955 H-bonds, which have 3,657 hydrogen acceptors (HA) and 41,298 hydrogen donors (HD) in amino acids, and 41,298 HA and 3,657 HD in nucleotides. The protein-RNA complexes contain 77,947 H-bonds, which have 18,151 HA and 59,796 HD in amino acids, and 59,796 HA and 18,151 HD in nucleotides. Using the data of H-bonding interactions, we developed a database called DBBP (DataBase of Binding Pairs in protein-nucleic acid interactions). DBBP provides the detailed information of H-bonding interactions between proteins and nucleic acids at various levels. Such information is not readily available in any other databases, including PDB, but will help researchers determine DNA (or RNA) binding sites in proteins and protein binding sites in DNA or RNA molecules. It can also be used as a valuable resource for developing a computational method aiming at predicting new binding sites in proteins or nucleic acids. The database is available at http://bclab.inha.ac.kr/dbbp.

Authors' contributions

Byungkyu Park implemented the database and prepared the first draft of the manuscript. Hyungchan Kim drew the figures and helped prepare the manuscript. Kyungsook Han supervised the work and rewrote the manuscript. All authors read and approved the final manuscript.
  19 in total

1.  The Protein Data Bank.

Authors:  H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  Structure-based analysis of protein-RNA interactions using the program ENTANGLE.

Authors:  J Allers; Y Shamoo
Journal:  J Mol Biol       Date:  2001-08-03       Impact factor: 5.469

3.  Geometric criteria of hydrogen bonds in proteins and identification of "bifurcated" hydrogen bonds.

Authors:  Ivan Y Torshin; Irene T Weber; Robert W Harrison
Journal:  Protein Eng       Date:  2002-05

4.  AANT: the Amino Acid-Nucleotide Interaction Database.

Authors:  Michael M Hoffman; Maksim A Khrapov; J Colin Cox; Jianchao Yao; Lingnan Tong; Andrew D Ellington
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

5.  Prediction of RNA binding sites in proteins from amino acid sequence.

Authors:  Michael Terribilini; Jae-Hyung Lee; Changhui Yan; Robert L Jernigan; Vasant Honavar; Drena Dobbs
Journal:  RNA       Date:  2006-06-21       Impact factor: 4.942

6.  Satisfying hydrogen bonding potential in proteins.

Authors:  I K McDonald; J M Thornton
Journal:  J Mol Biol       Date:  1994-05-20       Impact factor: 5.469

Review 7.  RNA recognition by RNP proteins during RNA processing.

Authors:  G Varani; K Nagai
Journal:  Annu Rev Biophys Biomol Struct       Date:  1998

8.  BindN: a web-based tool for efficient prediction of DNA and RNA binding sites in amino acid sequences.

Authors:  Liangjiang Wang; Susan J Brown
Journal:  Nucleic Acids Res       Date:  2006-07-01       Impact factor: 16.971

9.  RNABindR: a server for analyzing and predicting RNA-binding sites in proteins.

Authors:  Michael Terribilini; Jeffry D Sander; Jae-Hyung Lee; Peter Zaback; Robert L Jernigan; Vasant Honavar; Drena Dobbs
Journal:  Nucleic Acids Res       Date:  2007-05-05       Impact factor: 16.971

10.  HITS-CLIP yields genome-wide insights into brain alternative RNA processing.

Authors:  Donny D Licatalosi; Aldo Mele; John J Fak; Jernej Ule; Melis Kayikci; Sung Wook Chi; Tyson A Clark; Anthony C Schweitzer; John E Blume; Xuning Wang; Jennifer C Darnell; Robert B Darnell
Journal:  Nature       Date:  2008-11-02       Impact factor: 49.962
