Marc van Dijk, Alexandre M J J Bonvin.
Abstract
The intrinsic flexibility of DNA and the difficulty of identifying its interaction surface have long been challenges that prevented the development of efficient protein-DNA docking methods. We have previously demonstrated that our flexible, data-driven docking method HADDOCK can deal with these challenges by using custom-built DNA structural models. Here we put our method to the test on a set of 47 complexes from the protein-DNA docking benchmark. We show that HADDOCK is able to predict many of the specific DNA conformational changes required to assemble the interface(s). Our DNA analysis and modelling procedure captures the bend and twist motions occurring upon complex formation and uses these to generate custom-built DNA structural models, more closely resembling the bound form, for use in a second docking round. Across the benchmark we achieve an overall success rate of 94% for solutions of one star or higher (interface root mean square deviation ≤4 Å and fraction of native contacts >10%) according to CAPRI criteria. Our improved protocol successfully predicts even the challenging protein-DNA complexes in the benchmark. Finally, our method is the first to readily dock multiple molecules (N > 2) simultaneously, pushing the limits of what is currently achievable in the field of protein-DNA docking.
Year: 2010 PMID: 20466807 PMCID: PMC2943626 DOI: 10.1093/nar/gkq222
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Table 1. Nucleotide atom subsets used in the definition of AIRs
| DNA base | Minor groove atoms | Major groove atoms |
|---|---|---|
| Thy | H3, O2, C2′ | H3, O4, C4, C5, C6, C7′ |
| Ade | N1, N3, C2, C4′ | H61, H62, N1, N7, C5, C6, C8′ |
| Gua | H1, H21, H22, N3, C2, C4′ | H1, H21, N7, O6, C5, C6, C8′ |
| Cyt | N3, O2, C2′ | H41, H42, N3, C4, C5, C6′ |
| Non-specific backbone atoms | ||
| Sugar–phosphate backbone | C1′, C2′, O3′, O5′, P, O1P, O2P | |
Subsets are defined for atoms capable of interacting using non-bonded or hydrogen bonded interactions. Individual subsets are defined for those atoms facing the DNA major and minor groove for the four bases and for the sugar–phosphate backbone atoms.
Table 2. Definition of the AIRs based on experimental data for the six selected test-cases
| Complex | Protein | DNA | References |
|---|---|---|---|
| ‘Easy’ | | | |
| 1by4 ( | Act: (K31,R32) | Act: (T5,C6,A26,C27,C28,T29) | ( |
| | Pas: V34,A75,V76,Q77,R55,N56,Q59,R62 | | |
| 3cro ( | Act: (K29,Q31,S32,K42-P44) | Act: (C6,A7,T16-T18,C24,A25,T34-T36) | |
| | Pas: K9,T18-T20,G27,V28,Q30,Q34,I36,E37,V40,T41,R45,F46 | (T4,A5,T13,C14,T15,T22,A23,G31) | ( |
| ‘Intermediate’ | | | |
| 1azp ( | Act: W24 | Act: C2,G3 | ( |
| | Pas: K21,R25,G27,K28,K39,T40,A44,S46,E47 | | |
| 1jj4 ( | Act: (N13,K16,C17,R19-R21) | Act: (A3,C4,T30) | ( |
| | Pas: S34,T35,H37 ⇒ T26-C27 | | |
| ‘Difficult’ | | | |
| 1a74 ( | Act: (H97,N122) | Act: (T1-C7) | ( |
| | Pas: V51,G57,P58,T66,V71,H77,H100,K119 | | |
| 1zme ( | Act: (R9,R11,H12,R80,R82,H83) | Act: (C2,G3,G4,C15,C17,G18,C20,G21,G22,C33,C34,G35) | ( |
Active residues (Act) are grouped according to the available information. Continuous stretches of residues are separated by a dash. Arrows indicate active restraints for specific pairs of residues. Passive residues (Pas) are only defined for the protein. Since 1by4, 1jj4 and 1a74 are symmetrical dimers only the restraints for one subunit are shown. Base-specific restraints for 3cro, 1by4, 1jj4, 1a74 and 1zme are targeted to the atoms of the nucleotides facing the major groove and those of 1azp to those facing the minor groove (Table 1).
a Conserved residues.
b Mutagenesis data.
c Ethylation interference data.
d Methylation interference data.
e NMR native state amide hydrogen exchange.
f Raman spectroscopy.
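The residue notation used in the AIR table above (single residues separated by commas, continuous stretches written with a dash) can be expanded into explicit residue numbers. The following is a minimal sketch only: the function name `expand_residues` is hypothetical, it is not part of HADDOCK, and it returns residue numbers without residue types because the types of residues inside a stretch are not given in the table.

```python
import re


def expand_residues(spec: str) -> list[int]:
    """Expand a residue list such as '(K29,Q31,S32,K42-P44)' into residue numbers.

    Hypothetical helper for illustration; assumes every item contains a number.
    """
    numbers = []
    # Drop grouping parentheses and any whitespace left inside the list.
    cleaned = re.sub(r"[()\s]", "", spec)
    for item in filter(None, cleaned.split(",")):
        # An item is either one residue ("K29") or a dash-separated stretch ("K42-P44").
        ends = [int(re.search(r"\d+", part).group()) for part in item.split("-")]
        first, last = ends[0], ends[-1]
        numbers.extend(range(first, last + 1))
    return numbers


# Active protein residues listed for 3cro.
print(expand_residues("(K29,Q31,S32,K42-P44)"))
```

Grouping by parentheses is discarded here; a fuller treatment would keep each parenthesised group separate, since the groups reflect the source of the experimental information.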
Figure 4. Best solutions from unbound flexible docking using an ensemble of custom-built DNA structural models (blue) superimposed onto the reference structure (yellow). The complexes are grouped according to their docking difficulty (‘easy’, ‘intermediate’ and ‘difficult’) as indicated in the benchmark. The CAPRI score for each solution is indicated as one or two stars after the PDB code, together with the fraction of native contacts (a), the interface r.m.s.d (b) and the DNA r.m.s.d (c) from the reference structure. r.m.s.d values (Å) were calculated after superimposition on all heavy atoms of the selected regions of the reference complex. The figures were generated using Pymol (DeLano Scientific LLC, www.pymol.org).
Figure 1. Cumulative bar graph expressing the quality of the docking solutions according to the CAPRI star rating for all 2000 bound–bound rigid-body docking solutions. Complexes are sorted according to the total number of obtained stars. CAPRI criteria are defined as: three stars (high quality): Fnat > 0.5, l-r.m.s.d or i-r.m.s.d < 1.0 Å; two stars (medium quality): Fnat > 0.3, l-r.m.s.d < 5.0 Å or i-r.m.s.d < 2.0 Å; one star (acceptable quality): Fnat > 0.1, l-r.m.s.d < 10.0 Å or i-r.m.s.d < 4.0 Å. Fnat is the fraction of native contacts within a 5 Å cutoff.
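The CAPRI criteria in this caption can be sketched as a small classifier. This is an illustrative sketch, not the authors' code: the function names are hypothetical, the comma between the Fnat threshold and the r.m.s.d thresholds is read as a logical "and", contacts are counted between atom pairs (CAPRI assessments typically count residue pairs), and treating the 5 Å cutoff as inclusive is an assumption.

```python
from math import dist


def contacts(coords_a, coords_b, cutoff=5.0):
    """Index pairs (i, j) of atoms in A and B that lie within `cutoff` Å.

    Assumption: atom-level contacts with an inclusive cutoff.
    """
    return {
        (i, j)
        for i, a in enumerate(coords_a)
        for j, b in enumerate(coords_b)
        if dist(a, b) <= cutoff
    }


def fnat(native, model):
    """Fraction of native contacts that are reproduced in the model."""
    return len(native & model) / len(native) if native else 0.0


def capri_stars(fnat_value, l_rmsd, i_rmsd):
    """Star rating using the thresholds quoted in the Figure 1 caption."""
    if fnat_value > 0.5 and (l_rmsd < 1.0 or i_rmsd < 1.0):
        return 3  # high quality
    if fnat_value > 0.3 and (l_rmsd < 5.0 or i_rmsd < 2.0):
        return 2  # medium quality
    if fnat_value > 0.1 and (l_rmsd < 10.0 or i_rmsd < 4.0):
        return 1  # acceptable quality
    return 0  # incorrect


# Toy coordinates (Å): two native contacts, one of which survives in the model.
native = contacts([(0, 0, 0), (0, 0, 10)], [(3, 0, 0), (0, 0, 13)])
model = contacts([(0, 0, 0), (0, 0, 10)], [(3, 0, 0), (0, 0, 20)])
print(fnat(native, model), capri_stars(fnat(native, model), 0.5, 0.5))
```

The checks run from the strictest class downwards, so a solution receives the highest rating whose thresholds it satisfies.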
Figure 2. Cumulative bar graphs expressing the quality of the best 400 docking solutions according to the HADDOCK score in terms of CAPRI one-star (grey) and two-star (white) results, for the two-stage unbound–unbound protein–DNA docking using true-interface-derived restraints. Results are presented for: rigid-body docking starting from a canonical B-DNA model (A); semi-flexible refinement (B); and semi-flexible refinement using an ensemble of custom DNA 3D structural models (C). Complexes are sorted according to the total number of obtained stars in (B), reclassifying the benchmark into ‘easy’, ‘intermediate’ and ‘difficult’ categories. See caption of Figure 1 for the definition of the CAPRI criteria.
Figure 3. All heavy atom r.m.s.d values from the reference complex [(A) DNA only, (B) full complex, (C) interface] and fraction of native contacts [Fnat, (D)] for the 10 best solutions of the best cluster, both selected based on the HADDOCK score, after rigid-body docking (open squares) and semi-flexible refinement (closed circles) starting from a canonical B-DNA structural model and after semi-flexible refinement (open triangles) starting from an ensemble of custom-built DNA models.
Table 3. Performance of the two-stage docking protocol when using AIRs based on experimental information: the r.m.s.d values from the reference and fraction of native contacts for the top ten docking solutions of the top ranking cluster, both selected based on the HADDOCK score
| | Total r.m.s.d (Å) | Interface r.m.s.d (Å) | DNA r.m.s.d (Å) | Protein r.m.s.d (Å) | Fnat | CAPRI |
|---|---|---|---|---|---|---|
| ‘Easy’ | | | | | | |
| 1by4 | | | | | | |
| Bound rigid | 0.41 ± 0.08 | 0.34 ± 0.07 | 0.00 ± 0.00 | 0.38 ± 0.07 | 0.89 ± 0.02 | 0,0,10 |
| Unbound rigid | 4.33 ± 0.72 | 4.01 ± 0.53 | 1.41 ± 0.00 | 4.66 ± 0.73 | 0.11 ± 0.04 | 4,0,0 |
| Unbound flex | 6.72 ± 2.10 | 5.87 ± 1.71 | 1.90 ± 0.19 | 6.98 ± 2.21 | 0.17 ± 0.05 | 5,0,0 |
| DNA lib | 5.52 ± 2.43 | 4.91 ± 2.32 | 1.61 ± 0.14 | 5.85 ± 2.46 | 0.27 ± 0.09 | 4,3,0 |
| 3cro | | | | | | |
| Bound rigid | 0.32 ± 0.16 | 0.38 ± 0.19 | 0.00 ± 0.00 | 0.44 ± 0.22 | 0.85 ± 0.09 | 0,0,10 |
| Unbound rigid | 3.79 ± 0.60 | 3.51 ± 0.63 | 3.70 ± 0.00 | 3.50 ± 0.83 | 0.15 ± 0.05 | 10,0,0 |
| Unbound flex | 3.57 ± 0.63 | 3.29 ± 0.68 | 2.86 ± 0.30 | 3.19 ± 0.68 | 0.27 ± 0.07 | 6,2,0 |
| DNA lib | 2.89 ± 0.40 | 2.62 ± 0.73 | 2.08 ± 0.21 | 2.96 ± 0.43 | 0.40 ± 0.06 | 3,7,0 |
| ‘Intermediate’ | | | | | | |
| 1azp | | | | | | |
| Bound rigid | 0.33 ± 0.07 | 0.31 ± 0.07 | 0.00 ± 0.00 | 0.11 ± 0.00 | 0.92 ± 0.03 | 0,0,10 |
| Unbound rigid | 7.12 ± 2.06 | 7.09 ± 2.25 | 3.25 ± 0.00 | 3.58 ± 0.02 | 0.02 ± 0.02 | 0,0,0 |
| Unbound flex | 6.90 ± 2.00 | 6.68 ± 2.26 | 2.87 ± 0.32 | 3.64 ± 0.13 | 0.04 ± 0.04 | 0,0,0 |
| DNA lib | 4.56 ± 0.79 | 4.00 ± 0.45 | 1.83 ± 0.26 | 3.76 ± 0.16 | 0.10 ± 0.04 | 5,0,0 |
| 1jj4 | | | | | | |
| Bound rigid | 0.39 ± 0.10 | 0.40 ± 0.09 | 0.00 ± 0.00 | 0.10 ± 0.03 | 0.82 ± 0.07 | 0,0,10 |
| Unbound rigid | 4.23 ± 0.37 | 4.76 ± 0.48 | 3.19 ± 0.00 | 1.47 ± 0.05 | 0.09 ± 0.02 | 3,0,0 |
| Unbound flex | 4.25 ± 0.43 | 4.55 ± 0.58 | 3.19 ± 0.21 | 2.40 ± 0.02 | 0.16 ± 0.07 | 6,0,0 |
| DNA lib | 3.22 ± 0.30 | 3.62 ± 0.38 | 2.38 ± 0.14 | 2.37 ± 0.05 | 0.21 ± 0.07 | 9,1,0 |
| ‘Difficult’ | | | | | | |
| 1a74 | | | | | | |
| Bound rigid | 0.06 ± 0.01 | 0.07 ± 0.01 | 0.00 ± 0.00 | 0.01 ± 0.00 | 0.84 ± 0.01 | 0,0,10 |
| Unbound rigid | 5.43 ± 0.99 | 6.88 ± 0.97 | 7.44 ± 0.00 | 1.68 ± 0.14 | 0.04 ± 0.02 | 0,0,0 |
| Unbound flex | 4.95 ± 0.38 | 6.30 ± 0.46 | 7.12 ± 0.32 | 1.84 ± 0.14 | 0.14 ± 0.04 | 8,0,0 |
| DNA lib | 2.72 ± 0.25 | 3.37 ± 0.32 | 3.76 ± 0.19 | 1.78 ± 0.12 | 0.24 ± 0.05 | 9,1,0 |
| 1zme | | | | | | |
| Bound rigid | 0.48 ± 0.11 | 0.46 ± 0.08 | 0.00 ± 0.00 | 0.01 ± 0.00 | 0.79 ± 0.06 | 0,0,10 |
| Unbound rigid | 6.29 ± 0.64 | 5.49 ± 0.68 | 4.28 ± 0.00 | 5.67 ± 0.61 | 0.06 ± 0.03 | 0,0,0 |
| Unbound flex | 6.15 ± 0.62 | 5.29 ± 0.59 | 4.68 ± 0.33 | 5.88 ± 0.27 | 0.12 ± 0.06 | 4,0,0 |
| DNA lib | 5.27 ± 0.62 | 4.63 ± 0.80 | 3.35 ± 0.13 | 5.55 ± 0.48 | 0.15 ± 0.04 | 8,0,0 |
Average all heavy atom r.m.s.d values from the reference structure over the 10 top-ranking solutions (Å; each mean is followed by its standard deviation), calculated for:
a The entire complex.
b The interface.
c The DNA only.
d The protein only.
The r.m.s.d values are reported for: bound rigid-body docking (Bound rigid); unbound rigid-body docking (Unbound rigid) and semi-flexible refinement (Unbound flex), both starting from canonical B-DNA; and unbound semi-flexible docking using a library of custom-built DNA structural models as input (DNA lib).
e Fnat is the fraction of native contacts.
f Number of one-, two- and three-star CAPRI-ranked solutions obtained in the top 10 solutions.
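The CAPRI column of the performance table above can be summarised per protocol by counting how many of the six test cases have at least one solution of one star or better in the top 10. The sketch below only re-tabulates the counts from that column; the variable and function names are hypothetical, and this tally is an illustration rather than the success-rate definition used by the authors.

```python
# (one-star, two-star, three-star) counts in the top 10, copied from the CAPRI column.
capri_counts = {
    "Unbound rigid": {
        "1by4": (4, 0, 0), "3cro": (10, 0, 0), "1azp": (0, 0, 0),
        "1jj4": (3, 0, 0), "1a74": (0, 0, 0), "1zme": (0, 0, 0),
    },
    "Unbound flex": {
        "1by4": (5, 0, 0), "3cro": (6, 2, 0), "1azp": (0, 0, 0),
        "1jj4": (6, 0, 0), "1a74": (8, 0, 0), "1zme": (4, 0, 0),
    },
    "DNA lib": {
        "1by4": (4, 3, 0), "3cro": (3, 7, 0), "1azp": (5, 0, 0),
        "1jj4": (9, 1, 0), "1a74": (9, 1, 0), "1zme": (8, 0, 0),
    },
}


def complexes_with_hit(counts):
    """PDB codes with at least one solution of one star or better in the top 10."""
    return sorted(pdb for pdb, stars in counts.items() if sum(stars) > 0)


# Number of test cases with at least one acceptable solution, per protocol.
hits = {protocol: len(complexes_with_hit(c)) for protocol, c in capri_counts.items()}
for protocol, n in hits.items():
    print(f"{protocol}: {n}/6 complexes with an acceptable or better solution")
```

Counting complexes rather than individual solutions keeps the comparison between protocols independent of how many of the 10 solutions fall into each star class.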