Marc van Dijk, Alexandre M J J Bonvin.
Abstract
We present a protein-DNA docking benchmark containing 47 unbound-unbound test cases, of which 13 are classified as easy, 22 as intermediate and 12 as difficult. The difficult cases show considerable structural rearrangement upon complex formation. DNA-specific modifications such as flipped-out bases and base modifications are included. The benchmark covers all major groups of DNA-binding proteins according to the classification of Luscombe et al., except for the zipper-type group. The variety of test cases makes this non-redundant benchmark a useful tool for the comparison and development of protein-DNA docking methods. The benchmark is freely available for download from the internet.
Year: 2008 PMID: 18583363 PMCID: PMC2504314 DOI: 10.1093/nar/gkn386
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
The protein–DNA benchmark
| Complex PDB idᵃ | Cat.ᵇ | Unbound protein PDB idᵃ | Protein description | DNA sequence 5′-3′ᶜ | Nr.ᵈ | BSA (Å²)ᵉ | Interface RMSD (Å)ᶠ | DNA RMSD (Å)ᵍ | Protein RMSD (Å)ʰ |
|---|---|---|---|---|---|---|---|---|---|
| ‘Easy’ targets | |||||||||
| 2c5r | 1 | 2bnkX | Phage PHI29 replication organizer protein P16.7 | TCCACCGG | 4 | 402 | 0.49 | 0.49 | 0.82 |
| 1pt3 (A:C:D) | 8 | 1m08X | Col-E7 nuclease domain | GCGATCGC | 2 | 730 | 1.35 | 2.09 | 1.36 |
| 1mnn | 1 | 1mn4X | Sporulation specific transcription factor NDT80 | TGCGACACAAAAACT | 2 | 1292 | 1.48 | 1.81 | 0.83 |
| 1fok | 1 | 2fokX | Restriction endonuclease FOKI | TCGGATGATAACGCTAGTCAT | 2 | 1920 | 1.53 | 2.51 | 1.09 |
| 1ksy (A:C:D:F) | 4 | 1f08X | Papillomavirus replication initiation domain E-1 | ATAATTGTTGTCAACAATTAT | 3 | 1020 | 1.58 | 2.56 | 0.52 |
| 3cro | 1 | 1zugN | Phage 434 CRO | AAGTACAAACTTTCTTGTAT | 3 | 1473 | 1.58 | 2.66 | 1.17 |
| 1emh | 8 | 1akzX | Human uracil-DNA glycosylase | TGT(P2U)ATCTTT | 2 | 869 | 1.62 | 4.53 | 1.46 |
| 1h9t | 1 | 1e2xX | FADR, fatty acid responsive transcription factor | CATCTGGTACGACCAGATC | 3 | 1622 | 1.68 | 3.88 | 0.77 |
| 1tro (A:C:I:J) | 1 | 3wrpX | TRP repressor | TGTACTAGTTAACTAGTACA | 3 | 1540 | 1.70 | 3.08 | 1.42 |
| 1by4 (A:B:E:F) | 2 | 1rxrN | Retinoid X receptor DNA binding domain | TAGGTCAAAGGTCAG | 3 | 1480 | 1.77 | 1.46 | 2.23 |
| 1hjc (A:B:C) | 5 | 1eanX | RUNX1 runt domain | GAACTCTGTGGTTGCGG | 2 | 634 | 1.80 | 2.88 | 0.97 |
| 1diz (A:E:F) | 8 | 1mpgX | | TGACATGA(NRI)TGCCT | 2 | 805 | 1.82 | 5.80 | 0.46 |
| 1rpe | 1 | 1r63N | Phage 434 repressor | ACAAACAAGATACATTGTATA | 3 | 1430 | 1.87 | 2.97 | 0.94 |
| ‘Intermediate’ targets | |||||||||
| 1vrr | 8 | 1sdoX | Restriction endonuclease BSTYI | TTATAGATCTATAA | 3 | 2098 | 2.08 | 2.11 | 2.22 |
| 1f4k | 1 | 1bm9X | Replication terminator protein | CTATGAACATAATGTTCATAG | 3 | 1741 | 2.26 | 1.94 | 2.29 |
| 1k79 (A:B:C) | 1 | 1gvjX | ETS-1 DNA binding and autoinhibitory domain | TAGTGCCGGAAATGTG | 2 | 912 | 2.37 | 3.82 | 0.80 |
| 1kc6 (A:B:E:F) | 8 | 2audX | Restriction endonuclease HINCII | CCGGTCGACCGG | 3 | 2658 | 2.38 | 4.67 | 1.38 |
| 1ea4 (D:E:F:G:W:X) | 6 | 2cpgX | Transcription repressor COPG | TAACCGTGCACTCAATGCAATC | 3 | 1473 | 2.43 | 4.48 | 0.64 |
| 1z63 (A:C:D) | 8 | 1z6aX | Sulfolobus solfataricus SWI2/SNF2 ATPase core domain | ATTGCCGAAGACGAAAAAAA | 2 | 603 | 2.51 | 2.74 | 2.27 |
| 1r4o | 2 | 1gdcN | Glucocorticoid receptor | CCAGAACATCGATGTTCTGT | 3 | 1401 | 2.61 | 3.05 | 1.91 |
| 1azp | 6 | 1sapN | Hyperthermophile chromosomal protein SAC7D | GCGATCGC | 2 | 778 | 2.70 | 3.77 | 2.76 |
| 1w0t | 1 | 1ba5N | HTRF1 DNA-binding domain | CTGTTAGGGTTAGGGTTAGA | 3 | 1545 | 2.78 | 3.20 | 2.47 |
| 1cma | 6 | 1mjkX | Methionine repressor | TTAGACGTCT | 2 | 775 | 2.81 | 2.60 | 2.05 |
| 1jj4 | 4 | 1f9fX | Papillomavirus type 18 E2 | CAACCGAATTCGGTTG | 2 | 1169 | 2.83 | 3.32 | 2.25 |
| 1vas | 8 | 1eniX | T4 pyrimidine dimer specific excision repair | ATCGCGTTGCGCT | 2 | 1445 | 3.04 | 6.99 | 1.42 |
| 4ktq | 8 | 1ktqX | DNA polymerase I | GACCACGGCGC(DOC) | 2 | 1685 | 3.23 | 3.64 | 1.97 |
| 1z9c (A:C:D) | 1 | 1z91X | Organic hydroperoxide resistance transcription regulator | TACAATTTAATTGTATACAATT TAATTGTA | 3 | 2107 | 3.24 | 4.26 | 4.18 |
| 1ddn | 1 | 2tdxX | Diphtheria TOX repressor | ATATAATTAGGATAGCTTTACC TAATTATTTTAA | 5 | 2877 | 3.26 | 7.25 | 0.50 |
| 2irf | 1 | 1irgN | Interferon Regulatory Factor 2 | AAGTGAAAGUGA | 2 | 898 | 3.35 | 2.23 | 3.83 |
| 1jt0 | 1 | 1jusX | Multidrug binding transcription factor QACR | CTTATAGACCGATCGATCGG TCTATAAG | 2 | 2484 | 3.49 | 4.58 | 3.53 |
| 1g9z | 8 | 2o7mX | I-CreI endonuclease | GCAAAACGTCGTGAGACAGTTTCG | 2 | 3255 | 3.67 | 5.02 | 4.21 |
| 1a73 | 8 | 1evxX | Intron-encoded homing endonuclease I-PPOI | TTGACTCTCTTAAGAGAGTCA | 2 | 2076 | 4.26 | 8.22 | 1.20 |
| 2fio | 4 | 2fibX | Phage PHI29 transcription regulator P4 | AAAAACGTCAACATTTTATA AAAAAGTCTTGCAAAAAGT | 2 | 1114 | 4.41 | 8.03 | 0.67 |
| 1qne (A:C:D) | 5 | 1vokX | Adenovirus major late promoter TBP | GCTATAAAAGGGCA | 2 | 1487 | 4.57 | 8.54 | 0.89 |
| 1zs4 | 1 | 1zpqX | Phage lambda CII | CCTCGTTGCGTTTGTTTGCACGAAT | 2 | 1358 | 4.71 | 2.97 | 3.77 |
| ‘Difficult’ targets | |||||||||
| 1qrv | 4 | 1hmaN | High mobility group protein D | GCGATATCGC | 3 | 1204 | 5.19 | 7.68 | 3.91 |
| 1o3t | 1 | 1g6nX | CAP-CAMP | GCTTTTTACGCTAGATCTA GCGTAAAAAGCGC | 2 | 1277 | 5.20 | 10.6 | 2.55 |
| 1b3t | 4 | 1vhiX | Epstein-Barr virus nuclear antigen-1 | GGAAGCATATGCTTCCC | 2 | 2627 | 5.32 | 3.91 | 3.53 |
| 3bam | 8 | 1bamX | Restriction endonuclease BAMHI | TATGGATCCATA | 3 | 2208 | 5.55 | 2.19 | 4.50 |
| 1rva | 8 | 1rveX | Eco RV endonuclease | AAAGATATCTTT | 2 | 2350 | 5.68 | 9.78 | 3.88 |
| 1zme | 2 | 1ajyN | Proline utilization transcription activator PUT3 | ACGGGAAGCCAACTCCGT | 2 | 1362 | 5.76 | 4.68 | 8.64 |
| 1dfm | 8 | 1es8X | Restriction endonuclease BGLII | TATTATAGATCTATAAAT | 3 | 2735 | 6.31 | 3.04 | 4.68 |
| 1bdt | 6 | 1arqN | Phage P22 Arc gene regulating protein | TATAGTAGAGTGCTTCTATCATT | 3 | 2109 | 6.45 | 4.90 | 5.20 |
| 7mht | 8 | 2hmyX | HHAI methyltransferase | GTCAGCGCATGG | 2 | 1613 | 6.71 | 2.55 | 3.84 |
| 2fl3 | 8 | 1ynmX | Restriction endonuclease HINP1I | CCAGCGCTGG | 2 | 1670 | 6.71 | 2.95 | 4.37 |
| 1eyu | 8 | 1pvuX | PVUII endonuclease | TGACCAGCTGGTCA | 2 | 2068 | 6.82 | 4.49 | 6.36 |
| 2oaa | 8 | 2oa9X | Restriction endonuclease MVAI | GGTACCTGGATG | 2 | 2009 | 8.95 | 8.15 | 8.02 |
ᵃ The RCSB PDB accession number for the structures used. Specific chains are given in parentheses. Structures for the unbound protein were solved either by X-ray crystallography (X) or by NMR spectroscopy (N).
ᵇ The classification of the protein–DNA complexes into eight groups according to the scheme of Luscombe et al. (6).
ᶜ The base sequence of the DNA in the bound complex, also used for generating the unbound DNA structure. Some sequences contain modified bases. These are: DOC (2′,3′-dideoxycytidine-5′-monophosphate), NRI (phosphoric acid mono-(4-hydroxy-pyrrolidin-3-ylmethyl) ester) and P2U (2′-deoxy-pseudouridine-5′-monophosphate).
ᵈ The number of individual biomolecules that need to be docked to reconstruct the complex.
ᵉ Buried surface area of the DNA upon complex formation, in Å².
ᶠ The RMSD (Å) from the bound form, calculated over the interface Cα and phosphate atoms of the unbound protein and DNA structures after superposition onto the reference complex.
ᵍ The RMSD (Å) from the bound form, calculated over all phosphate atoms of the unbound DNA after superposition onto the reference complex.
ʰ The RMSD (Å) from the bound form, calculated over the Cα atoms of the unbound protein after superposition onto the reference complex.
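The three RMSD footnotes describe the same operation: an optimal rigid-body superposition of unbound atoms onto the matching atoms of the reference complex, followed by a root-mean-square deviation. The notes do not name the superposition algorithm, so the sketch below assumes the standard Kabsch (SVD) method and NumPy. It also assumes the two coordinate arrays are already matched atom for atom (interface Cα/phosphate atoms for the interface RMSD, all phosphates for the DNA RMSD, all Cα atoms for the protein RMSD); atom selection and matching are not shown, and `superposed_rmsd` is an illustrative name, not part of the benchmark.

```python
import numpy as np


def superposed_rmsd(mobile: np.ndarray, reference: np.ndarray) -> float:
    """RMSD after optimal rigid superposition of `mobile` onto `reference`.

    Kabsch algorithm (assumed here, not taken from the benchmark paper).
    Both arrays have shape (N, 3) and must list the same atoms in the same order.
    """
    if mobile.shape != reference.shape or mobile.ndim != 2 or mobile.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of matched atoms")
    # Remove translation by centring both atom sets on their centroids.
    p = mobile - mobile.mean(axis=0)
    q = reference - reference.mean(axis=0)
    # The SVD of the 3x3 covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate the centred mobile atoms (row vectors) and compare with the reference.
    diff = p @ rot.T - q
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


# Usage: a rotated and translated copy of a random point cloud superposes back exactly.
rng = np.random.default_rng(0)
ref = rng.normal(size=(20, 3))
theta = 0.7
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
moved = ref @ rz.T + np.array([5.0, -2.0, 1.0])
print(superposed_rmsd(moved, ref))  # a value very close to 0.0
```

The reflection check matters because a mirror image of a non-planar atom set cannot be reached by a rotation, so it must give a non-zero RMSD rather than a spuriously perfect fit.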
Figure 1. Illustration of ‘easy’ (interface RMSD < 2.0 Å), ‘intermediate’ (2.0 Å ≤ interface RMSD < 5.0 Å) and ‘difficult’ (interface RMSD ≥ 5.0 Å) test cases from the protein–DNA benchmark. ‘Easy’ test case: the Papillomavirus replication initiation domain E-1 (PDB id 1ksy) (interface RMSD = 1.6 Å) (A). ‘Intermediate’ test case: the intron-encoded homing endonuclease I-PPOI complex (PDB id 1a73) (interface RMSD = 4.3 Å) (B). ‘Difficult’ test cases: the proline utilization transcription activator (PDB id 1zme) (interface RMSD = 5.8 Å) (C) and the PVUII endonuclease complex (PDB id 1eyu) (interface RMSD = 6.8 Å) (D). The bound form of the complex is shown in yellow and the unbound protein in blue. The bound and canonical B-form DNA structures are shown as insets to highlight the conformational changes in the DNA.
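The difficulty classes in the caption depend only on interface RMSD, so they can be reproduced from the ‘Inter.’ column of the table. A minimal Python sketch, assuming the values are transcribed by hand from that column (the names `difficulty` and `INTERFACE_RMSD` are illustrative only, not part of the benchmark download); it recovers the 13/22/12 split stated in the abstract.

```python
from collections import Counter

# Cut-offs as stated in the Figure 1 caption (Å).
EASY_MAX = 2.0          # easy: interface RMSD < 2.0
INTERMEDIATE_MAX = 5.0  # intermediate: 2.0 <= RMSD < 5.0; difficult: RMSD >= 5.0


def difficulty(interface_rmsd: float) -> str:
    """Map an interface RMSD (Å) to the benchmark's difficulty label."""
    if interface_rmsd < EASY_MAX:
        return "easy"
    if interface_rmsd < INTERMEDIATE_MAX:
        return "intermediate"
    return "difficult"


# Interface RMSD values (Å) transcribed from the 'Inter.' column of the table.
INTERFACE_RMSD = {
    # 'Easy' block
    "2c5r": 0.49, "1pt3": 1.35, "1mnn": 1.48, "1fok": 1.53, "1ksy": 1.58,
    "3cro": 1.58, "1emh": 1.62, "1h9t": 1.68, "1tro": 1.70, "1by4": 1.77,
    "1hjc": 1.80, "1diz": 1.82, "1rpe": 1.87,
    # 'Intermediate' block
    "1vrr": 2.08, "1f4k": 2.26, "1k79": 2.37, "1kc6": 2.38, "1ea4": 2.43,
    "1z63": 2.51, "1r4o": 2.61, "1azp": 2.70, "1w0t": 2.78, "1cma": 2.81,
    "1jj4": 2.83, "1vas": 3.04, "4ktq": 3.23, "1z9c": 3.24, "1ddn": 3.26,
    "2irf": 3.35, "1jt0": 3.49, "1g9z": 3.67, "1a73": 4.26, "2fio": 4.41,
    "1qne": 4.57, "1zs4": 4.71,
    # 'Difficult' block
    "1qrv": 5.19, "1o3t": 5.20, "1b3t": 5.32, "3bam": 5.55, "1rva": 5.68,
    "1zme": 5.76, "1dfm": 6.31, "1bdt": 6.45, "7mht": 6.71, "2fl3": 6.71,
    "1eyu": 6.82, "2oaa": 8.95,
}

counts = Counter(difficulty(v) for v in INTERFACE_RMSD.values())
for label in ("easy", "intermediate", "difficult"):
    print(label, counts[label])
# prints:
# easy 13
# intermediate 22
# difficult 12
```

Boundary values follow the caption: exactly 2.0 Å counts as intermediate and exactly 5.0 Å as difficult.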