Tomás Norambuena, Francisco Melo.
Abstract
The Protein-DNA Interface database (PDIdb) is a repository of structural information on protein-DNA complexes solved by X-ray crystallography and available at the Protein Data Bank. The database includes a simple functional classification of the protein-DNA complexes that consists of three hierarchical levels: Class, Type and Subtype. This classification was defined and manually curated based on information gathered from several sources, including PDB, PubMed, CATH, SCOP and COPS. The current version of the database contains only structures with a resolution of 2.5 Å or better, for a total of 922 entries. The major aim of this database is to contribute to the understanding of the main rules that underlie the molecular recognition process between DNA and proteins. To this end, the database focuses on each specific atomic interface rather than on the separated binding partners. Therefore, each entry in this database consists of a single and independent protein-DNA interface. We hope that PDIdb will be useful to researchers working in fields such as the prediction of transcription factor binding sites in DNA, the study of specificity determinants that mediate enzyme recognition events, and the engineering and design of new DNA binding proteins with distinct binding specificity and affinity, among others. Finally, due to its friendly and easy-to-use web interface, we hope that PDIdb will also serve educational and teaching purposes.
Year: 2010 PMID: 20482798 PMCID: PMC2885377 DOI: 10.1186/1471-2105-11-262
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. PDB complexes and interfaces definition. (A) Example of a structure whose asymmetric unit contained half of the known biological unit (left). For each entry of this type, the complete biological unit was obtained from the specialized ftp site of PDB at ftp://ftp.wwpdb.org. (B) Each entry of the database consists of a single and independent protein-DNA interface, which is isolated from the whole PDB complex. Two examples that illustrate this feature are shown. (Top) 1am9, whose PDB file contains two separable complexes, each having a single protein-DNA interface; (Bottom) 1h89, which has one complex, but with two independent protein-DNA interfaces. Each interface is assigned a unique ID in the database.
Description of protein features classes and types
| Class | Type | Description |
|---|---|---|
| Enzyme | Dioxygenase | Enzyme that repairs DNA base lesions by using a direct oxidative dealkylation mechanism. |
| | Endonuclease | Restriction enzyme that cleaves DNA at specific sites. |
| | Excisionase | Enzyme that controls integrase-mediated DNA rearrangement. |
| | Glucosyltransferase | Enzyme that binds DNA at an abasic site and flips it. Glucosylation occurs on a 5-hydroxymethylcytosine in duplex DNA using UDP-glucose. |
| | Glycosylase | Enzyme involved in base excision repair, a mechanism by which damaged nucleotides in DNA are removed and replaced. It catalyses the first step in the process. |
| | Helicase | Enzyme that unwinds double helices using ATP hydrolysis. |
| | Ligase | Enzyme that recognizes nicks and states for strand closure. |
| | Methyltransferase | Enzyme responsible for the generation of the genome methylation patterns leading to gene silencing. |
| | Nuclease | Enzyme that cleaves DNA but is not classified as an Endonuclease. |
| | Photolyase | Enzyme that uses light to repair DNA having UV-induced lesions. |
| | Polymerase | Enzyme that takes nucleotides from solvent and catalyses the synthesis of a polynucleotide sequence against a nucleotide template strand using base-pairing interactions. |
| | Recombinase | Enzyme that catalyses the reciprocal exchange of DNA strands in the direct site-specific DNA recombination process. |
| | Topoisomerase | Enzyme that promotes the relaxation of DNA superhelical lesions by introducing a transient single stranded break in duplex DNA. |
| | Translocase | Enzyme that segregates dimeric circular chromosomes, formed by recombination of monomer sisters. |
| | Transposase | Enzyme that mediates transposition, a process whereby defined DNA segments move freely about the genome. |
| Structural/DNA Binding | Centromeric Protein | Protein that is part of a chromosome centromere. |
| | DNA Packaging | Protein that is part of the chromosome and packages the DNA. |
| | Maintenance/Protection | Protein involved in the protection and maintenance of the genome. |
| | DNA Bending | Protein that bends DNA with a high component of indirect readout. |
| | Repair Protein | Protein that recognizes damaged DNA and recruits other proteins or enzymes. |
| | Replication | Protein involved in the DNA replication process. |
| | Telomeric Protein | Protein that binds the telomeric regions of a chromosome, contributing to its stability. |
| | Zalpha | Protein that binds left-handed DNA. |
| Transcription Factor | Alpha Helix | Protein that interacts with DNA mainly through α-helices. |
| | Alpha/Beta | Protein that interacts with DNA through α-helices and β-strands. |
| | Beta Sheet | Protein that interacts with DNA mainly through β-sheets. |
| | Helix Turn Helix | Protein that contains the HtH motif according to the information available in PDB. It includes those proteins containing the "winged helix" domain. |
| | Ribbon/Helix/Helix | Protein that contains the RHH fold according to the information available in PDB. |
| | Zinc Coordinating | Protein that coordinates the metal in order to bind DNA. |
| | Zipper Type | Protein that contains the zipper motif, including the helix-loop-helix one. |
Figure 2. Protein-protein interaction modes with DNA. Three modes of protein-protein interaction with DNA are defined, according to the direction of the interaction relative to the axis of the DNA helix. (A) Mode 1, the direction of the protein interaction and the double helix axis are orthogonal. (B) Mode 2, the direction of the interaction is parallel to the double helix axis. (C) Mode 3, both previous modes are observed at the same time. In the Mode 3 example, the histone core shown is the only instance of this case in the current version of the database. The histone core presents a set of proteins interacting with each other, thus making up a continuous interface with DNA. Additionally, a Mode 0 has been defined for those interfaces with only one protein (not shown).
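The mode assignment described in this caption can be sketched as a small decision function. This is only an illustration: the function name and its boolean inputs are hypothetical, and how the database actually decides whether an interaction direction is orthogonal or parallel to the helix axis is not stated here.

```python
def interaction_mode(n_proteins, has_orthogonal, has_parallel):
    """Return the interaction mode (0-3) for one protein-DNA interface.

    n_proteins     -- number of proteins making up the interface
    has_orthogonal -- hypothetical flag: some protein-protein interaction
                      runs orthogonal to the double helix axis
    has_parallel   -- hypothetical flag: some protein-protein interaction
                      runs parallel to the double helix axis
    """
    if n_proteins < 1:
        raise ValueError("an interface needs at least one protein")
    if n_proteins == 1:
        return 0  # Mode 0: only one protein in the interface
    if has_orthogonal and has_parallel:
        return 3  # Mode 3: both arrangements observed at the same time
    if has_orthogonal:
        return 1  # Mode 1: interaction orthogonal to the helix axis
    if has_parallel:
        return 2  # Mode 2: interaction parallel to the helix axis
    raise ValueError("multi-protein interface without a defined direction")
```

For example, `interaction_mode(2, True, False)` gives Mode 1, while a single-protein interface gives Mode 0 regardless of the flags.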
Figure 3. DNA features. Several DNA features have been defined. (A) Double strand or single strand in the asymmetric unit. This is useful to identify those interfaces coming from the reconstruction of the biological unit. (B) Sticky ends were defined based on the specific strands and the number of free bases at their ends. (C) Presence of flipped bases. (D) Existence of nicked DNA. (E) Existence of gapped DNA. (F) Presence of modified or non-standard DNA bases. (G) Presence of opened or unpaired bases at the DNA ends. Although not depicted here, left-handed DNA conformation (Z-DNA) was also recorded.
Figure 4. Definition of effective atomic interactions. (A) To determine if the interaction between DNA atom X and protein atom Y is effective, all other atoms inside the X interacting sphere (Zi atoms) are evaluated by comparing each ωi angle (i.e. the angle between atoms X, Zi and Y) with a defined shielding angle value Ω. If all the ωi angles observed are smaller than Ω, then the interaction between X and Y is defined as effective. (B) Three-dimensional view of three example interacting spheres of X, which differ only in the value of Ω. Red balls represent those protein atoms interacting effectively with DNA atom X. Pink balls represent protein atoms not interacting effectively with X, since they are shielded by other atoms inside the interacting sphere of X, according to the Ω value used. A value of Ω = 90° commonly captures the first interacting atom shell, while with Ω = 180° all the interactions observed inside the contacting sphere are considered effective. To build this database, a value of 90° for Ω was adopted.
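The shielding test in this caption can be illustrated with a minimal sketch. It assumes that ωi is measured at the vertex Zi (the angle X-Zi-Y), that atoms are given as 3D coordinate tuples, and that `radius` stands for the size of the interacting sphere, whose value is not given here. The function names are hypothetical.

```python
import math

def angle_at(vertex, a, b):
    """Angle in degrees at `vertex` between the directions to points a and b."""
    va = [p - q for p, q in zip(a, vertex)]
    vb = [p - q for p, q in zip(b, vertex)]
    dot = sum(p * q for p, q in zip(va, vb))
    norm = math.sqrt(sum(p * p for p in va)) * math.sqrt(sum(p * p for p in vb))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def is_effective(x, y, others, radius, omega=90.0):
    """True if no atom Z inside the sphere around X shields Y.

    x, y   -- coordinates of DNA atom X and protein atom Y
    others -- coordinates of all other atoms (must exclude X and Y)
    radius -- assumed radius of the interacting sphere of X
    omega  -- shielding angle in degrees (90 was adopted for the database)
    """
    for z in others:
        if math.dist(x, z) > radius:
            continue  # Z lies outside the interacting sphere of X
        if angle_at(z, x, y) >= omega:
            return False  # the angle X-Z-Y is not smaller than omega
    return True
```

With Ω = 90°, an atom Z shields Y exactly when it lies inside the sphere whose diameter is the segment XY, which is consistent with the caption's remark that this value captures the first interacting shell.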
Figure 5. Classification of DNA atoms. All the interactions occurring in an effective interface were classified according to the chemical/structural/groove position of the DNA atoms. Pink-highlighted atoms were classified as being part of the major groove, green-highlighted atoms belong to the minor groove, blue-highlighted atoms belong to the backbone and sugar, and yellow-highlighted atoms were classified as not assigned since they are in an ambiguous location.
Definition of interaction classes and types
| Class | Type | Detail |
|---|---|---|
| CHb | 1 | DBE-PSC: NA - ND |
| | 2 | DBE-PSC: NA - OD |
| | 3 | DBE-PSC: OA - ND |
| | 4 | DBE-PSC: OA - OD |
| | 5 | DBE-PSC: ND - OA |
| | 6 | DBE-PBB: NA - ND |
| | 7 | DBE-PBB: ND - OA |
| | 8 | DBE-PBB: OA - ND |
| | 9 | DBB-PSC: OA - ND |
| | 10 | DBB-PSC: OA - OD |
| | 11 | DBB-PBB: OA - ND |
| SHb | 12 | DBB-PSC: OA - SD |
| | 13 | DBE-PSC: NA - SD |
| | 14 | DBE-PSC: OA - SD |
| | 15 | DBE-PSC: ND - SA |
| CHO | 16 | DBE-PSC: CD - OA |
| | 17 | DBE-PSB: CD - OA |
| Ion | 18 | Ionic bond: (+) ··· (-) |
| Hph | 19 | C - C |
| | 20 | Not assigned |
The nomenclature of the abbreviations used in this Table is the following: DBE, DNA Base edge; DBB, DNA Backbone; PSC, Protein Sidechain; PBB, Protein Backbone; XA, Acceptor; XD, Donor; O, Oxygen; N, Nitrogen; S, Sulphur; C, Carbon. See text for more information.
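As a rough illustration of how the table above could be used programmatically, the sketch below turns types 1 to 17 into a lookup keyed by the location and atom labels. It copies the detail strings verbatim from the table and assumes that the first atom label refers to the DNA atom and the second to the protein atom. Types 18 and 19 are left out because they do not follow the location pattern, and falling back to type 20 for unmatched pairs is also an assumption. All names are hypothetical.

```python
# (class, type, detail) rows copied verbatim from the table, types 1-17.
INTERACTION_TYPES = [
    ("CHb", 1, "DBE-PSC: NA - ND"), ("CHb", 2, "DBE-PSC: NA - OD"),
    ("CHb", 3, "DBE-PSC: OA - ND"), ("CHb", 4, "DBE-PSC: OA - OD"),
    ("CHb", 5, "DBE-PSC: ND - OA"), ("CHb", 6, "DBE-PBB: NA - ND"),
    ("CHb", 7, "DBE-PBB: ND - OA"), ("CHb", 8, "DBE-PBB: OA - ND"),
    ("CHb", 9, "DBB-PSC: OA - ND"), ("CHb", 10, "DBB-PSC: OA - OD"),
    ("CHb", 11, "DBB-PBB: OA - ND"), ("SHb", 12, "DBB-PSC: OA - SD"),
    ("SHb", 13, "DBE-PSC: NA - SD"), ("SHb", 14, "DBE-PSC: OA - SD"),
    ("SHb", 15, "DBE-PSC: ND - SA"), ("CHO", 16, "DBE-PSC: CD - OA"),
    ("CHO", 17, "DBE-PSB: CD - OA"),
]

def parse_detail(detail):
    """Split 'DBE-PSC: NA - ND' into ('DBE', 'PSC', 'NA', 'ND')."""
    location, atoms = detail.split(": ")
    dna_part, protein_part = location.split("-")
    first_atom, second_atom = atoms.split(" - ")
    return dna_part, protein_part, first_atom, second_atom

LOOKUP = {parse_detail(d): (cls, num) for cls, num, d in INTERACTION_TYPES}

def classify(dna_part, protein_part, first_atom, second_atom):
    """Return (class, type); unmatched pairs fall back to type 20 (assumed)."""
    key = (dna_part, protein_part, first_atom, second_atom)
    return LOOKUP.get(key, (None, 20))
```

For instance, `classify("DBE", "PSC", "NA", "ND")` returns `("CHb", 1)`, the first row of the table.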
Figure 6. Web user interface of PDIdb. (A) The database search engine. There are two search modes: basic and advanced. A basic search can be carried out by entering a PDB code or a keyword. The advanced search allows the user to make more complex queries through a dynamically expanding search form, by combining many subqueries that search all fields available in the database. (B) The result of the query is a table where each row shows basic information about the interface. When the user clicks on a row, it expands and a new table with detailed information and 2D and 3D molecular graphics is displayed. (C) For each interface, this information includes the protein and DNA features, as well as the detailed composition of the atomic contacts. (D) The user can inspect graphical information at both the sequence and structure level. A Jmol applet is available to explore the structure and the atomic interactions forming the interface in three-dimensional space. (E) Sequences highlighting the contacting residues are available in FASTA format for further analysis (e.g. BLAST). (F) The protein-DNA complex can also be explored by means of NUCPLOT graphs, which map onto a plane the direct or water-mediated hydrogen bonds between amino acids and nucleotides.