Michael Garton1, Charles Laughton. 1. School of Pharmacy and Centre for Biomolecular Sciences, University of Nottingham, University Park, Nottingham NG7 2RD, UK.
Abstract
Eukaryotic chromosomes are capped by telomeres, nucleoprotein complexes that prevent chromosome end-to-end fusions and control cell ageing. Two proteins in this complex, telomere repeat binding factors (TRF1 and TRF2), specifically recognise the double-stranded TTAGGG tandem repeat sequence. TRF1 is a homodimer with roles governing DNA architecture and negatively regulating telomere length. We explore the conformational space of this protein-DNA complex using molecular dynamics and, for the first time, generate a complete model of TRF1-DNA recognition that has not been possible on the basis of crystallographic and NMR data alone. The results reconcile previous conflicting experimental models for the sequence selectivity of the recognition process, by confirming many of the findings while identifying important new interactions and behaviour. This improved characterisation also reveals extensive indirect readout, which suggests that recognition will be affected by changes to DNA helical parameters such as bending.
Eukaryotic chromosomes are capped by telomeres, nucleoprotein complexes that prevent chromosome end-to-end fusions and control cell ageing. Two proteins in this complex, telomere repeat binding factors (TRF1 and TRF2), specifically recognise the double-stranded TTAGGG tandem repeat sequence. TRF1 is a homodimer with roles governing DNA architecture and negatively regulating telomere length. We explore the conformational space of this protein-DNA complex using molecular dynamics and, for the first time, generate a complete model of TRF1-DNA recognition that has not been possible on the basis of crystallographic and NMR data alone. The results reconcile previous conflicting experimental models for the sequence selectivity of the recognition process, by confirming many of the findings while identifying important new interactions and behaviour. This improved characterisation also reveals extensive indirect readout, which suggests that recognition will be affected by changes to DNA helical parameters such as bending.
Telomeres are nucleoprotein complexes that protect the ends of linear chromosomes in eukaryotes. They perform a number of functions including the prevention of chromosome end-to-end fusions that occur when DNA ends are recognised as double-strand breaks. They also act as a buffer to solve the end replication problem, which means that their length is one determinant of cellular age. Activation of telomerase during carcinogenesis promotes telomere elongation and cell immortalisation, currently making telomeres a subject of intense interest in cancer biology. Telomeric DNA is always guanine rich but sequence varies depending on the organism. In humans, it consists of tandem TTAGGG repeats, and in double strands, these are specifically recognised by the telomere repeat binding factors TRF1 and TRF2. TRF1 is a homodimer and flexible linkers between the DNA binding and dimerisation domains imbue it with remarkable spatial flexibility. Using two Myb-like DNA binding homeodomains, this flexibility is used to direct telomere architecture. TRF1 is also a negative regulator of telomere length and recruits other shelterin components to the DNA, ostensibly via association with TIN2. The TRF1–DNA interaction has been characterised by both X-ray crystallography and NMR. In common with most examples of protein–DNA recognition, these structures reveal both specific contacts between the protein and DNA bases, and nonspecific contacts between the protein and the sugar-phosphate backbone. The former are readily related to the mechanisms through which the protein reads the DNA sequence, while the latter can only do so indirectly—through sequence-dependent distortions in the DNA backbone geometry. While in general crystallography and NMR reveal similar features, in detail the situation is less harmonious. The crystal structure shows that two base pairs in the TTAGGGTTA binding motif are not involved in any direct DNA–protein contacts, while mutagenesis finds that binding activity is lost when these bases are mutated. At the same time, bases interacting with the protein via minor-groove contacts could be mutated without an effect on binding activity. The NMR structure deposited in the form of 20 conformations is discordant with the crystal structure but better explains the mutagenesis results, hinting strongly that a better understanding of dynamic aspects of the complex may be required to fully appreciate the recognition process. Two recent molecular dynamics (MD) studies have used the TRF1–DNA complex to examine water mediation in protein–DNA complexes and also explore flexibility change driven by complex formation. However, neither of these studies focused specifically on TRF1–DNA recognition and in fact no single model that fully describes recognition is available.Here, we assess the contribution of dynamic interaction to TRF1–DNA recognition, using an extensive set of MD simulations to explore the conformational space. Based on analysis of the resulting trajectories in conjunction with previous characterisations, we propose a reconciliatory model of TRF1–telomere recognition. The model builds on many findings from the crystal structure, mutagenesis data, and NMR structure, also adding some important, completely new interactions. This model is the most complete characterisation to date of the TRF1–DNA complex and includes both hydrophobic and water-mediated interactions. Specifically, this model explains why DNA–protein recognition is compromised by a transversion in the DNA sequence at position 6 (i.e., from TTAGGGTTA to TTAGGCTTA), and why it is much less sensitive to transversion at position 8 (i.e., from TTAGGGTTA to TTAGGGTAA).
Results
Calculation of root-mean-square deviations, principal component analysis, and clustering confirm equilibration and adequate sampling of conformational space by the MD simulations
As explained in Methods, our approach yields six independent replicate simulations of the protein–DNA interface for each DNA sequence modelled (WT, G6C, and T8A). Calculating the root-mean-square deviations (RMSDs) for each replicate simulation using protein alpha carbons and DNA phosphates (with the first MD snapshot as the reference frame) provides a preliminary evaluation of simulation quality. For the wild-type system, Fig. S1 shows that site A of replicate 1 takes 25–30 ns to properly equilibrate, but the remainder are equilibrated by around 3 ns. Minimal fluctuation is observed across the trajectories, indicating stable simulations. RMSD is an inherently crude metric, and thus, to provide a more detailed analysis of the way the molecules explored their conformational space, we used principal component analysis (PCA) to isolate the first two principal motions of each protein–DNA complex. Figure 1 shows the extent of this exploration for each system. Good equilibration and reproducibility is observed for four of the six binding-site replicates. Binding site A in both replicates 1 and 2 (broken lines) shows a large conformational spread because TRF1 partially dissociates from the DNA over the course of the simulation. These replicates were removed before further analysis, on the grounds that this behaviour is likely to be an artefact of simulating the interaction of the protein with a short DNA oligomer rather than with an extended length of telomeric DNA.
Fig. 1
PCA of all six binding sites for the wild-type TRF1–DNA complex. Four out of six replicates (solid lines) are closely clustered, indicating well-reproduced MD. Site A of replicates 1 and 2 (broken lines) spread outside of the cluster because DNA binding is partially lost; data from these replicates were discarded.
The remaining trajectories were combined and clustered to identify the most popular conformations. Figure 2a–e shows representative structures from the most significantly populated clusters (a = 33.3%, b = 19.2%, c = 10.5%, d = 9.8%, e = 9.1). In fact, there is little variation between these structures with a maximum RMSD of 1.79 Å between any two. Some difference in the N-terminal arm position is observed and is discussed later. Taken together, RMSD, PCA, and clustering confirm the reliability and stability of the selected MD calculations, providing an appropriate level of confidence in proposing a more complete model of TRF1–DNA recognition. The equivalent analysis of the G6C and T8A mutant simulations is included in the Supplementary Material (Figs. S2–S5).
Fig. 2
Structures (a–e) representing (in descending order) the most highly populated conformations from cluster analysis. For clarity, the DNA bases are colour coded in accordance with contact map colouring: adenine, yellow; guanine, green; cytosine, blue; and thymine, purple.
The MD model identifies new contacts and describes a highly dynamic interface
The MD data were used to construct a map describing TRF1–telomeric DNA recognition (Fig. 3). The map details hydrogen bonds with nitrogenous bases, contacts with the sugar phosphate backbone, water-mediated interactions, and hydrophobic patches. Figures 4 and 5 show analogous maps generated in the same way from the crystal structure and NMR data. MD predictions are in much better agreement with the NMR model than the crystal structure. This is probably because both NMR and MD explicitly describe conformational variability. The agreement between the simulations and the NMR data is evident from the observation that every conformer is sampled by MD to within an RMSD of 1.4–1.6 Å (Table S1, Supplementary Material). There is no evidence that there is a correspondence between the deposited NMR structures and the centroids of the clusters identified above (Table S2, Supplementary Material); however, this could simply reflect a difference in the criteria used to select the NMR structures for deposition.
Fig. 3
Map describing every interaction between TRF1 and telomeric DNA as determined by MD. Direct readout (red line), indirect readout (black dotted line), hydrophobic patches (black broken line), and water-mediated contacts (blue line) are all delineated and show many more interactions than either the crystal or NMR structures (Table 1). Direct and indirect contacts are annotated with percentage occupancy and standard deviation.
Good sampling is further confirmed by the fact that though the MD simulations were initiated from crystal structure coordinates, many “NMR only” interactions are reproduced, notably the important Lys43–A3 and Asp44–C6 contacts. These were deemed critical for the interpretation of mutagenesis experiments but not identified in the crystal structure. Completely new interactions determined by MD further reinforce mutagenesis results, particularly interaction between Arg47 and G6, which suggest why binding activity is lost upon G6 mutation. The MD model identifies many new contacts, which cumulatively show that every base pair in the TTAGGGTTA binding motif is involved in direct (base-specific) readout. Added to this, the THR 48–A7 interaction extends the major-groove binding motif by one more base pair than previously determined. Altogether, this means a significantly greater incidence of both direct and indirect readout interactions (Table 1).
Table 1
Comparing the number of contacts for each model including crystal structure and solution NMR together with MD wild-type and mutated DNA models
Model
H-bonds to bases
H-bonds to backbone
Hydrophobic contacts
Water-mediated contacts
Crystal structure
6
6
1
10
NMR
7
8
6
NM
MD (wild type)
10
12
1
11
MD (G6C mutant)
7
7
1
7
MD (T8A mutant)
10
12
0
8
NM, not measured. MD captures a much broader set of hydrogen bonds than the other methods while indicating less involvement of the hydrophobic effect. The loss of binding engendered by G6-to-cystosine mutation is seen by a reduced interface, while the ineffectual T8-to-adenine mutation shows little reduction.
In addition to new contacts, the MD model augments its predecessors by confirming and increasing the extent of dynamic interaction at the interface. NMR suggests three oscillatory dinucleotide contacts, through which a single amino acid can interact alternately with both base pairs in a base pair step. MD confirms these and identifies an additional oscillatory contact of Arg47 with G5 and G6. This is particularly interesting as the newly found interaction between these bases is implicated in correctly orienting TRF1 binding (see the next section). The MD map also reveals that though, as the crystal structure shows, the N-terminal Arg2 can make contacts simultaneously with three bases, in reality, it is likely to be a highly dynamic interaction.Approximately half of the total hydrogen bonds identified exploit the sugar phosphate backbone, suggesting that recognition and binding affinity are dependent on DNA helix morphology. Helical parameter changes such as untwisting, bending, and kinking will disrupt the network of contacts with ramifications for the binding affinity. Indirect readout can however still contribute to the sequence specificity of protein–DNA recognition because DNA helical morphology is sequence directed as a consequence of base stacking interaction differences.In terms of binding-site hydration, this model is in agreement with very recent MD studies that have focused on the microscopic dynamic properties of water surrounding the TRF1–DNA complex. Specific, persistent sites of solvent density shown in Fig. 6a–f confirm the rigid water layer described in these studies. Red spheres represent 10 crystallographic waters identified at the interface. MD agrees remarkably well, confirming these and predicting an additional water-bridged contact between 3 residues and G6. Interestingly, though the sites of solvent density are clear and sharp, an analysis of residence times extracted from the MD data (Table 2) shows that, in all cases, waters exchange rapidly at an average rate of 21.9 different molecules per nanosecond, a value in line with other MD studies on biomolecular hydration.
Fig. 6
Map showing hydration density at the TRF1–DNA interface from six different viewpoints (a–f). Areas of intransient solvent density as determined by MD (grey mesh) are largely consistent with the crystallographic waters (red spheres). The N-terminal arm is colour coded orange; helices 2 and 3 are purple and green, respectively. On the DNA: adenine is yellow; thymine is purple; guanine is green; cytosine is blue. These correspond to the contact map colour coding.
Table 2
Dynamic characteristics of water-mediated protein–DNA interactions
Site
Between residues
Unique WAT/ns
Maximum residence time (ps)
Average residence time (ps)
1
Ser39 and T1
18.6
680
57
2
Ser39 and T2
22.6
387
45
3
Ser39 and A3
22.1
441
45
4
Lys43 and A3
20.8
336
48
5
Arg47 and Asp44 (A7)
19.3
585
52
6
Arg47 and Trp46 and Asp44 (G6)
24.0
304
42
7
A7 (Arg47 and Asp44)
25.9
337
39
8
Met41 and C5, C6
21.3
519
47
9
G6 (Arg47 and Trp46 and Asp44)
26.1
362
38
10
Asp44 (C4 and C5)
20.0
539
50
11
C4 and C5 (Asp44)
20.7
411
48
Sites are numbered as in Fig. 3. Hydration sites shown in bold were only observed in the MD simulations, all others correlate with crystallographically observed sites.
Court et al. in discussing the crystal structure did not specifically identify hydrophobic group interactions; however, in our analysis of this structure, we did find a single instance of close proximity between hydrophobic moieties in the DNA and protein, and six in the case of the NMR models. In this case, our MD supports the crystallography in identifying only one good hydrophobic contact, with the caveat that TIP3P is a simple water model and not thoroughly validated in reproducing protein hydrophobicity. Time-dependent measurements indicate that most instances of proximity between TRF1 and thymine methyl groups are transient. The relatively few NMR conformations appeared to capture these transient states.
Newly identified sequence-specific contacts with the G6/C6 base pair are vital to DNA backbone clamping
While the newly identified contacts between TRF1 and G6/C6 in the wild-type sequence make it much easier to explain why transversion at this site markedly reduces interaction, confirmation of this requires analysis of the mutant system (G6C) to ensure that no alternative, compensatory interactions are possible. This is indeed the situation. Table 1 and Fig. S6 show that though the interaction of Arg47 with the guanine base is apparently able to “follow” its transversion, the interaction of Asp44 with the cytosine counterpart cannot. In addition, many contacts to neighbouring, non-mutated base pairs in the DNA are also lost. Figure 7 shows the effect of this mutation on major-groove width, measured using Curves +. Characteristic DNA backbone clamping around the insertion helix is greatly reduced, explaining the loss of non-mutated contacts and the dramatic loss of binding activity reported from mutagenesis. DNA backbone narrowing is a classic feature of helix–turn–helix (HTH)–DNA recognition, and its impairment by mutation provides explanation for the severe binding reduction. Importantly, there is little evidence that it is changes in the intrinsic structure or elastic properties of the local DNA sequence, brought about by the change in sequence, that contribute much to this selective recognition. Thus, comparison of both mutated and wild-type free-DNA helical parameters indicates that the G6C mutation does little to change DNA morphology in the unbound state or, as evidenced by the error bars for major-groove width and helical twist, relevant elastic constants (Fig. 8). It seems therefore that recognition is “direct”—the change in TRF1 orientation engendered by Arg47 relocation causes the misalignment of other critical contacts.
Fig. 7
DNA major-groove width using the trajectories of TRF1-bound wild type and G6C DNA compared with free telomeric DNA. Sugar phosphate backbone narrowing around the HTH insertion helix is an important feature of binding, which is lost by mutation of G6 to a cytosine.
Fig. 8
Comparison of helical parameters: twist, major-groove width, minor-groove width, roll, and rise for wild type and G6C mutant DNA, both complexed and uncomplexed. Bars showing standard deviation for twist and major-groove width indicate that G6C mutation does little to alter flexibility.
N-terminal promiscuity explains tolerance to mutation at positions 8 and 9
Binding to the telomeric minor groove by TRF1 occurs via the homeodomain N-terminal arm. While the significance of this arm in binding affinity is well established, the argument that it contributes a high level of specificity is disputed here. Biophysical work has shown that mutation of T8 and A9 engenders no significant loss of binding activity. Our simulations of the T8A model show that when T8/A8 undergo transversion, the long flexible N-terminal arm is able to simply relocate with the same affinity to different thymine or adenine bases in the minor groove (S7, Supplementary Material). This promiscuity is also evident by comparison of the MD (Figs. 2 and 3), NMR (Fig. 5), and crystal models (Fig. 4), which show significant variation between models in terms of bases contacted by N-terminal Arg2. While the arm undoubtedly contributes to binding affinity, we predict that its only specific discriminatory function will be between T + A and G + C. Functionally, by retaining some protein flexibility on binding, TRF1 may mitigate the entropic penalty normally imposed by immobilisation at the DNA interface. Arm flexibility also means that any conformational adjustment in this region to optimise interactions with the local DNA sequence does not influence critical aspects of the corresponding HTH domain orientation or conformation. Table 1 shows that mutation of the minor groove (T8A) and subsequent N-terminal relocation have no effect on the overall number of hydrogen bonds involved in total recognition.
Fig. 5
Map describing the interactions between TRF1 and telomeric DNA as determined by solution NMR (Protein Data Bank entry 1IV6). Direct readout (red line), indirect readout (black dotted line), and hydrophobic patches (black broken line) are delineated. Direct and indirect contacts are annotated with their percentage occupancy and standard deviation using the 20 conformations published. With the exception of hydrophobic patches, less interaction is captured compared with MD (Table 1) and no water-mediated interaction could be analysed.
Fig. 4
Map describing the interactions between TRF1 and telomeric DNA as determined by crystallography (Protein Data Bank entry 1W0T). Direct readout (red line), indirect readout (black dotted line), hydrophobic patches (black broken line), and water-mediated contacts (blue line) are delineated, showing significantly less interaction than MD (Table 1). No occupancy information can be determined from the single conformation.
Discussion
We have used an extensive set of MD simulations to explore in detail the conformational space of telomeric DNA in complex with TRF1. A comprehensive picture of protein–DNA recognition has been produced, which includes both water-mediated interactions and hydrophobic group involvement in addition to direct protein–DNA hydrogen bonding. The simulations emphasise how individual amino acids can contribute to the specific recognition of more than one DNA base, not only in a static manner—through bifurcated interactions, but also in a dynamic manner—by oscillatory interactions.Analysis of hydration density maps, integrated over 200 ns of MD, identifies regions of intransient solvent mediation. These regions persist across a range of bound state conformations and therefore provide a different perspective to the X-ray picture of individual molecules of hydration, stabilised at a single crystallographic conformation. In view of this, it is quite remarkable that the MD predicts a pattern of water-mediated contacts that matches the crystallographic data very well; just one extra high-occupancy water is predicted, interacting with G6.Mutation of the G6/C6 base pair by transversion causes loss of the characteristic width reduction in the DNA major groove that optimises DNA–protein interactions and highlights the importance of the (MD identified) Arg47–G5/G6 dinucleotide interaction. The widespread loss of other hydrogen bonds as a consequence of this loss is consistent with experimental findings that, for this mutation, report a severe reduction in binding affinity. What we cannot fully quantitate though at this stage is whether the failure to achieve a narrow major groove is an intrinsic property of the mutated DNA sequence (“the groove fails to narrow, so good contacts cannot be formed”) or of the recognition requirements of the protein–DNA interaction (“good contacts cannot be formed, so the groove fails to narrow”).Experimental observations indicating that the interaction of the homeodomain N terminus with the telomeric DNA is actually rather non-sequence-specific are confirmed by our model. We show that N-terminal arm flexibility allows it to effectively relocate to more favourable bases in the event of minor-groove transversion. This promiscuity does not affect the rest of the binding-site orientation as damping by the flexible arm ensures that the HTH hydrogen-bonding network remains intact.More than half of the identified hydrogen bonds are contacts to the sugar phosphate backbone. Recognition of DNA by TRF1 and subsequent binding affinity are therefore likely to be sensitive to DNA morphological changes that disrupt this network. This is in line with experimental observations that TRF1 avoids telomeric T- and D-loop regions. The predominant localisation of TRF2 over TRF1 at these structures may indicate a different binding mode for TRF2 and work is ongoing to establish this.The ability of MD simulations to couple the atomic detail of crystallographic studies with the dynamic insights from NMR, and through doing so to reconcile apparently conflicting aspects of a molecular recognition problem, reinforces the view that a combination of experimental and theoretical techniques is preferred—indeed maybe necessary—to accurately characterise biological interactions.
Methods
MD calculations were performed on the TRF1–TAGGGTTA complex using Protein Data Bank file 1W0T as a starting point. The crystal structure is composed of a 19-nucleotide section of DNA with two TRF1 proteins bound to adjacent binding sites. Models were constructed for both wild-type DNA and two mutations: G6C and T8A (G6C: TTAGGG → TTAGGC, T8A: TTAGGG → TAAGGG). Mutation choices were based on apparent discrepancies between the crystal structure and mutagenesis work: G6C resulted in dramatic loss of binding activity and T8A at T8 reportedly had no effect. In both cases, the opposite effect was predicted by the crystal structure while NMR did not adequately explain such a severe drop-off in affinity. The models were prepared for MD simulation using the WHATIF web interface to build in any missing atoms and identify protonation states. They were then explicitly solvated (approximately 11,500 waters per model) in a 10-nm box of TIP3P water using TLEAP in AMBER 10. Sodium counterions were added for overall charge neutrality, and periodic boundary conditions were applied with the following box dimensions: 63 nm × 66 nm × 71 nm. Bonds to hydrogen were constrained using SHAKE to permit a 2-fs time step, and the particle mesh Ewald algorithm was used to treat long-range electrostatic interactions. The non-bonded cutoff was set at 12.0 Å. Systems were energy minimised using a combination of steepest descent and conjugate gradient methods. MD calculations were carried out with the PMEMD module of AMBER 10 in conjunction with the FF99 Barcelona force field, which is specifically customised for nucleic acids. The FF99 Stony Brook force field was used for the protein. Each system was equilibrated and heated over 100 ps to 300 K and positional restraints were gradually removed. A Berendsen thermostat and barostat was used throughout for both temperature and pressure regulation. Production MD runs of 3 × 50 ns replicates for each system were obtained from randomised starting velocities. Since each simulation features two independent copies of the protein in complex with its cognate (or mutated) DNA sequence (which we refer to as site A and site B), in this way, a total of 300 ns of conformational space exploration was obtained for each mutation. Exact protocols for the minimisations, equilibrations, heating, and production runs are delineated in the Supplementary Material. During calculations, a snapshot was saved every 2 ps. RMSDs (Fig. 1a) and PCA (Fig. 1b) were used to assess the equilibration and reproducibility of each replicate by employing an in-house code: PCAZIP. Contour map outlines were generated from two-dimensional histograms of the first two PCA projections using matplotlib. Subdivided into a 50 × 50 grid, the outline is drawn around all grid bins that have a data point occupancy ≥ 0.025% of the total points. PCA revealed instability in two of the six binding-site replicates and these were discarded before further analysis. RMS clustering of the trajectory frames was carried out using the MMTSB toolset with the radius set to 2 Å and maxerr set to 1. Bonds were identified and quantified using PTRAJ in AMBER together with VMD. Hydrogen bond length cutoff was set at 3.5 Å including water-mediated contacts, and groups participating in hydrophobic effects were defined as non-polar, non-bonded contacts < 5 Å. DNA helical parameters were derived using Curves +. PTRAJ was also used to map hydration density using the “Grid” function and the protocol for this is described in the Supplementary Material. Structural alignments were performed and RMSDs were calculated using PyMOL from selected structures representing the most highly populated clusters.
Authors: Hao Peng; Yun Zhu; Fawn Yeh; Shelley A Cole; Lyle G Best; Jue Lin; Elizabeth Blackburn; Richard B Devereux; Mary J Roman; Elisa T Lee; Barbara V Howard; Jinying Zhao Journal: Aging (Albany NY) Date: 2016-08 Impact factor: 5.682