Literature DB >> 35694853

Development of Force Field Parameters for the Simulation of Single- and Double-Stranded DNA Molecules and DNA-Protein Complexes.

Maxwell R Tucker¹, Stefano Piana¹, Dazhi Tan¹, Michael V LeVine¹, David E Shaw^1,2.

Abstract

Although molecular dynamics (MD) simulations have been used extensively to study the structural dynamics of proteins, the role of MD simulation in studies of nucleic acid based systems has been more limited. One contributing factor to this disparity is the historically lower level of accuracy of the physical models used in such simulations to describe interactions involving nucleic acids. By modifying nonbonded and torsion parameters of a force field from the Amber family of models, we recently developed force field parameters for RNA that achieve a level of accuracy comparable to that of state-of-the-art protein force fields. Here we report force field parameters for DNA, which we developed by transferring nonbonded parameters from our recently reported RNA force field and making subsequent adjustments to torsion parameters. We have also modified the backbone charges in both the RNA and DNA parameter sets to make the treatment of electrostatics compatible with our recently developed variant of the Amber protein and ion force field. We name the force field resulting from the union of these three parameter sets (the new DNA parameters, the revised RNA parameters, and the existing protein and ion parameters) DES-Amber. Extensive testing of DES-Amber indicates that it can describe the thermal stability and conformational flexibility of single- and double-stranded DNA systems with a level of accuracy comparable to or, especially for disordered systems, exceeding that of state-of-the-art nucleic acid force fields. Finally, we show that, in certain favorable cases, DES-Amber can be used for long-timescale simulations of protein-nucleic acid complexes.

Entities: Chemical

Mesh：

Substances：
Amber
Proteins
RNA
DNA

Year: 2022 PMID： 35694853 PMCID： PMC9234960 DOI： 10.1021/acs.jpcb.1c10971

Source DB: PubMed Journal: J Phys Chem B ISSN： 1520-5207 Impact factor: 3.466

Introduction

Molecular dynamics (MD) simulations have frequently been used to characterize the structural dynamics of proteins. The application of MD simulation to nucleic acid based systems, however, has received considerably less attention, and as a consequence, the energy functions and associated parameterizations (″force fields″) used in simulating nucleic acids have been subject to less development and validation than corresponding protein force fields. In recent years, however, there have been substantial improvements to both the Amber[1−4] and CHARMM[5] family of DNA force fields, and an entirely new DNA force field based on QM data has recently been proposed to complement existing models.[6] In the case of RNA, we recently revised the torsion and nonbonded parameters of an Amber RNA force field[7] with an emphasis on obtaining a force field that could, with an accuracy comparable to that of state-of-the-art protein force fields, describe both single- and double-stranded RNA structures. Here, we describe a revision of our previously published force field parameters for RNA aimed at achieving for DNA a similar level of accuracy as was obtained for simulations of single- and double-stranded RNA oligomers.[7] As RNA and DNA are chemically very similar, we postulated that previously adjusted parameters for RNA should be largely transferable to DNA with a similar effect. Indeed, we found that after transferring the modified RNA nonbonded parameters to the DNA force field, only small adjustments to a few torsion parameters were required to make the force field suitable for DNA simulations. We also developed new parameters for the magnesium ion, a cofactor shown to often be critical for the stability of tertiary structures of nucleic acids, and for structural zinc ions, which are often found in protein motifs that bind to nucleic acids. The revised force field parameters were tested by comparing results from simulations of single-stranded (ssDNA) and double-stranded DNA (dsDNA) molecules (known to form a number of different secondary structures) to experiments. We found that the structure, dynamics, and thermodynamics of these systems were described with a level of accuracy comparable to that of state-of-the-art nucleic acid force fields. In particular, we found that this force field gives an excellent description of ssDNA structures while still providing a reasonably accurate description of dsDNA systems, with the notable exception that DES-Amber underestimates the relative stability of the BI/BII conformations. A second, equally important aim of this work was to make our nucleic acid force field parameters compatible with our latest protein force field parameterization, which provides state-of-the-art accuracy.[8] In this protein force field, the effect of polarization on the dielectric environment is modeled by an empirical scaling and redistribution of the net charges for ionic residues. The same approach has been followed here to develop the DNA and RNA electrostatic parameters so that these force fields can be used in combination with the protein force field. We name the force field that includes all of these parameters (the new DNA parameters, the revised RNA parameters, and the existing protein and ion parameters) DES-Amber. (We note that, in our previous work,[8] “DES-Amber” referred to a force field consisting solely of the protein parameters.) Subsequent testing of DES-Amber in simulations of protein–DNA and protein–RNA complexes gave mixed results, yielding an accurate reproduction of the structures of some, but not all, of the complexes investigated. Similar mixed results had been previously obtained for simulations of protein–protein complexes using the DES-Amber protein parameters.[8] A reparameterization of the phosphate backbone nonbonded interactions was attempted; this parameterization, described in detail in the Supporting Information (SI), involved the optimization of the phosphate–water interaction by modifications of the Lennard–Jones (LJ) pair potential followed by torsion refitting based on NMR data from short ssDNA fragments. The distribution of charges on the O2′ hydroxyl group and the aromatic carbon–water interactions were also optimized to reproduce QM data and the osmotic coefficient of purine in water. The resulting force field, named DES-Amber 3.20 (from the value of the LJ σ parameter of the interaction between water and the phosphate oxygen), greatly improved the description of protein–nucleic acid complexes, though we could not achieve a consistent improvement across all systems. Although further parameter optimization may result in more consistent improvement, the simple, nonpolarizable functional form used to describe the nonbonded interactions may limit the ability of the force field to simultaneously describe with a high level of accuracy isolated proteins and nucleic acids, as well as protein–protein and protein–nucleic acid complexes, and it is possible that more sophisticated, polarizable force fields (which are outside the intended scope of this work) may be required to address this shortcoming.

Methods

Ab Initio Calculations

The previously introduced DES-Amber force field for RNA[7] is a modification of the Amber nucleic acid force field.[1,9] In this earlier work, nucleobase charges and LJ parameters were developed based on QM calculations of nucleobase stacking and pairing, and QM torsion scans were used to develop new torsion parameters for the backbone and glycosidic torsions. For the development of a DNA force field, nucleobase nonbonded parameters are expected to be the same as in RNA, but the χ (O4′-C1′-N9-C4 in purines and O4′-C1′-N1-C6 in pyrimidines) torsion and the ζ (C3′-O3′-P′-O5′) torsion are expected to be affected to some extent by the absence of the OH2′ group. New torsion potentials were thus determined following our previous work on RNA.[7] In vacuo potential energy surface (PES) scans of the χ torsion were performed for the four ribonucleosides. Torsion scans for the ζ torsion were performed on a model molecule mimicking the DNA backbone. QM energies were calculated at the MP2 level of theory, using local and density-fitting approximations[10] with an augmented double-zeta basis set (aug-cc-pVDZ) and the COSMO model[11] to account for solvation effects, using the MOLPRO program.[12] The torsions were scanned between −180 and 180° in 10° increments. Each structure was relaxed at the MP2 level with a number of other dihedral angles constrained to prevent the formation of intramolecular hydrogen bonds, thus ensuring the smoothness of the potential energy surfaces. Only one-dimensional scans were performed, and the coupling between dihedral angles[13] was not explicitly considered. The total number of terms, N, in the cosine expansion was four for the χ torsion and three for the ζ torsions. The new k and ϕ values are summarized in Table .

Table 1

Revised Torsion Parameters Obtained from a Fit to Torsion Energy Profiles Calculated at the MP2 Level of Theorya

angle	ϕ₁	k₁	ϕ₂	k₂	ϕ₃	k₃	ϕ₄	k₄
χ_A	–7.75	–0.5250	–4.44	1.9578	–35.39	–0.9569	–21.36	0.2058
χ_C	138.56	–1.0002	–31.94	2.8162	34.57	–1.1862	–53.36	0.7164
χ_T	–7.14	–0.1027	–2.45	2.2160	–13.96	–0.9599	–123.92	–0.2482
χ_G	6.19	–0.5093	–0.54	1.7381	–29.00	–0.8156	–2.69	0.2408
ζ	–154.60	–0.2379	–4.09	1.6200	–51.37	0.4252

Torsion angle potentials, Vϕ, are expressed as . The χ torsion angle parameters were made nucleotide-specific (subscripts A, C, T, and G), while the same set of ζ torsion angle parameters was used for all nucleotides. Units are degrees for angles and kcal mol–1 for force constants.

Assignment of Charges to the Terminal Residues

In the Amber force field, the −1.0 charge of the 3′-terminal nucleobase, which is often phosphorylated, is split between the 3′-terminal residue (−0.6921) and the 5′-terminal residue (−0.3079). This charge distribution is the result of where the residue boundaries are defined in the Amber force field and does not follow the common approach of assigning integer charges to each residue in most fixed-charge force fields for biomolecules. We maintained it in our reparameterization for consistency with the original Amber force field. We also tested a force field (which we term DES-Amber T) in which the charge of the terminal groups was adjusted to localize all of the negative charge on the 3′-terminal nucleotide. For this force field, we performed simulations of short ssDNA fragments of two to six nucleotides (see the subsection Simulated Tempering Simulations of Short ssDNA Molecules below). Although we found some differences in the conformational ensembles sampled by simulations performed using these different charge distributions, both matched the experimental J-couplings well (RMSD <1 Hz, Table S1), suggesting that both charge distributions are reasonable choices.

MD Simulations of dT40

We performed MD simulations of dT40 molecules starting from a structure in which all residues were in the B-DNA conformation. This structure was solvated in cubic 1303 Å3 water boxes containing 0.025, 0.1, 0.4, or 1.0 M NaCl. At each NaCl concentration, 100 μs of simulation at 300 K and 1 atm in the NPT ensemble was performed using the DES-Amber and the DES-Amber SF1.0 force fields. For comparison, simulations of 100 μs were also run using the Amber-bsc1 force field,[1] and simulations of 40 μs were run using the CHARMM36 force field.[5] The persistence length LP was calculated from , where, for a polymer formed by a chain of vectors of the same length L, θ is the angle between two unit vectors i and j and |i–j| is their distance in sequence. The end-to-end distance was measured and FRET intensities were calculated assuming a Förster radius of 60 Å.[14]

Simulated Tempering Simulations of Short ssDNA Molecules

We performed simulated tempering simulations of 18 ssDNA molecules ranging from two to six nucleotides in length (Table S1). Each system was generated using Maestro[15] and solvated in a cubic 503 Å3 box containing 0.1 M NaCl. Simulations were performed using the Amber-bsc1 force field[1] and the TIP3P water model,[16] the DES-Amber force field, the DES-Amber SF1.0 force field, and the DES-Amber T force field. The TIP4P-D water model[17] was used in all DES-Amber simulations. The systems were first equilibrated at 300 K for 100 ps using the Desmond software,[18] and production simulated tempering simulations[19] of 20 μs were performed in the NPT ensemble[20−22] with 100 rungs spanning the 273–410 K range using Anton.[23] NMR J-couplings were calculated for the sugar ring protons using the Karplus relation as parameterized by Haasnoot[24,25] and compared to the J-couplings measured experimentally.[26−35]

Simulated Tempering Simulations of the d(GCGAAGC) Polynucleotide

We performed simulated tempering simulations of the d(GCGAAGC) polynucleotide that experimentally folds into a hairpin structure. We solvated the experimentally determined structure (PDB entry 1kr8)[36] in a cubic 503 Å3 box of TIP4P-D water containing 1.0 M NaCl. Simulations were performed using the DES-Amber force field and the DES-Amber SF1.0 force field. The systems were first equilibrated at 300 K for 100 ps using the Desmond software, and production simulated tempering simulations of 170 μs were performed in the NPT ensemble with 100 rungs spanning the 278–410 K range using Anton. The free energy of the hairpin at 310 K was calculated as , where ϕ310K is the fraction of frames where the hairpin was folded at 310 K, computed using a dual-cutoff[37,38] of 1 and 6 Å on the heavy-atom RMSD from the experimental structure, averaged over 50 ns.

Simulated Tempering Simulations of DNA Duplex Formation

We performed simulated tempering simulations of seven complementary DNA duplexes of sequence d(CACAG), d(CGCGG), d(CGTACG), d(GAGTGAG), d(GCGC), d(GGATCC), and d(TCATGA). The double-stranded models of the seven DNA duplexes were generated with the program COOT.[39] Each duplex was solvated in a cubic 503 Å3 box containing 0.1 M salt (Table S2). Simulations were performed using the Amber-bsc1 force field with the TIP3P water model, the DES-Amber force field, and the DES-Amber SF1.0 force field. The TIP4P-D water model was used in all DES-Amber simulations. The systems were first equilibrated at 300 K for 100 ps using the Desmond software, and production simulated tempering simulations of 500 μs were performed in the NPT ensemble with 100 rungs spanning the 278–410 K range using Anton. The fraction of frames containing B-DNA at each temperature (ϕ) was computed using a dual cutoff of 2 and 10 Å on the backbone RMSD from the ideal B-DNA conformation, averaged over 50 ns. The free energy of the B-DNA duplex at 310 K was calculated as , where c is the DNA concentration in the simulation box. The ″experimental″ free energy was estimated using the nearest-neighbor model of SantaLucia[40] with [Na+] = 0.1 M.

MD Simulations of DNA Oligomers

We performed simulations of 17 different sequences of lengths ranging from 10 to 17 amino acids (Table S4). The starting structure of each simulation was taken from the experimental structures deposited in the PDB (Table S4).[41−57] Three simulations were performed for the Drew–Dickerson dodecamer (DDD) starting from three different experimental structures. These simulations gave indistinguishable results, and only one is discussed in detail. Each oligomer was solvated by a 0.1 M NaCl water solution in a cubic box with a minimum distance between periodic images of 30 Å. For each system, 50 μs of MD simulation in the NPT ensemble at 300 K was performed. Five microseconds of MD simulation was also performed using the Amber-OL15 force field[4] using theTIP3P[16] and the OPC[58] water models and the Joung–Cheatham ion parameters.[59] The base-step, base-pair, and nucleobase structural parameters of the oligomers were calculated for 5000 evenly spaced frames of each simulation using the DSSR software.[60]

MD Simulations of Noncanonical DNA Structures

We performed simulations of Z-DNA, two CEB2 and telomeric G-quadruplexes, and one DNA trimer. The initial structures were taken from PDB entries 4ocb,[61]2lpw,[62]1jrn,[63] and 1bwg.[64] Five independent simulations of Z-DNA at different salt concentrations were performed with the system solvated in 0.1, 0.5, 0.75, 1.0, or 1.5 M KCl in a 703 Å3 cubic box. The quadruplexes were solvated in 0.1 M KCl in a 603 Å3 cubic box. The trimer was solvated in 0.15 M KCl in a 803 Å3 cubic box. To equilibrate the ion distribution and allow K+ ions to migrate between the nucleobases, the telomeric G-quadruplex was equilibrated for 1 μs using position restraints on the nucleic acid heavy atoms with a force constant of 0.1 kcal mol–1 Å–1; CEB2 was equilibrated for 5 μs using position restraints on the backbone atoms with a force constant of 0.2 kcal mol–1 Å–1. Production runs of 50 μs (with no position restraints) were then performed for each system.

MD Simulations of Protein–Nucleic Acid Systems

We performed simulations of six protein–RNA and six protein–DNA complexes. The starting structure for each simulation was the experimental NMR or X-ray structure as found in the PDB: 2dgc,[65]3jxc,[66]3zhm,[67]4mhg,[68]1tro,[69]3bdn,[70]2az0,[71]4qoz,[72]1urn,[73]4qil,[74]5dea,[75] and 3k5y.[76] Protein–nucleic acid complexes were solvated in a 0.1 M NaCl water solution in a cubic box with a minimum distance between periodic images of 30 Å. Production runs of 50 μs were performed for each system.

Results

Force Field Parameterization

Development of the DES-Amber Force Field Parameters for DNA

As an initial step, we incorporated into the Amber-bsc1 DNA force field the revised nucleobase LJ and electrostatic nonbonded parameters that were derived previously for RNA.[7] For thymine, which is typically not found in RNA—as it is replaced by uracil in most cases—the Amber charges were used. We developed two force field parameter sets: one (which we name DES-Amber) with charge scaling consistent with the previously reported DES-Amber protein and ion force field parameters and a second (which we name DES-Amber SF1.0) with no charge scaling, consistent with the previously reported DES-Amber SF1.0 protein and ion force field parameters.[8] Our previously reported RNA parameter set[7] did not include charge scaling and is thus part of the DES-Amber SF1.0 force field; the revised RNA parameters reported in this work, with charge scaling applied to ionic residues, are part of the DES-Amber force field. Each force field (DES-Amber and DES-Amber SF1.0) thus contains parameters for DNA, RNA, proteins, and ions. Both DES-Amber and DES-Amber SF1.0 are based on procedures similar to those used when developing the DES-Amber protein force field,[8] with the only difference between the two parameter sets being the charge of the ionic residues, which are scaled by 0.9 in the case of DES-Amber and 1.0 (i.e., no rescaling) in the case of DES-Amber SF1.0. Previous work has shown that glycosidic torsion parameters should not be shared between DNA and RNA molecules.[9] We thus focused the torsion optimization on the χ (OS-CT-N-C) and ζ (CT-OS-P-OS) angles; all other torsion parameters and bonded terms were kept the same as the RNA parameters in DES-Amber SF1.0 derived previously.[8] The torsion parameters describing the nucleotide-specific χ angles were obtained from a fit to the torsion QM energy profiles calculated at the MP2 level using MolPro.[12,77] This level of theory has been found to accurately describe torsion angles in nucleobases,[9] and we found it to be sufficient to achieve a good level of accuracy for the parameterization of χ angles in RNA.[7] Following previous work, single nucleosides in both C3′-endo and C2′-endo conformations were used as model compounds in the generation of χ-related torsion energies.[9] Restraints were applied to the sugar hydroxyl group torsions to prevent the formation of intramolecular hydrogen bonds between the sugar and the nucleobase during geometry optimization. For the ζ angle, a deoxyribose N-methyl phosphate molecule was used as a model compound in the generation of torsion energies. The revised torsion parameters are reported in Table .

Parameterization of the Magnesium Ion

Magnesium ions are often very important for the stability of the tertiary structure of nucleic acids.[78,79] Previous studies have highlighted that, although the description of the hydrated Mg2+ ion appears to be satisfactory,[80] the original parameterizations for the Mg2+ ions in Amber[81] and CHARMM are suboptimal.[82] In both of these ion parameterizations, the ionic radius of the ion is too small, leading to the overbinding of Mg2+ and slow exchange of the water shell. These deficiencies could result in simulation inaccuracies, especially in cases where the process of ion binding to the phosphate backbone is important. Subsequent studies have highlighted the importance of developing water-specific ion parameters,[59] in particular for divalent ions.[83,84] We thus revised the nonbonded parameters of the Mg2+ ions with the aim of making them compatible with the TIP4P-D water model[17] and the DES-Amber force field. We focused our parameterization on trying to achieve a balance between the affinity of the Mg2+ ion for water and for the negatively charged groups on proteins and nucleic acids (phosphates and carboxylates) while keeping the ionic radius close to the values estimated experimentally. Although we were able to achieve a reasonable compromise between these competing objectives for the scaled-charge version of DES-Amber, we were unable to obtain a good fit for DES-Amber SF1.0, so in this force field, the original Amber parameters[81] were retained. In our optimization of parameters for Mg2+, we used Mg2+ binding to tRNAPhe as our benchmark system. This RNA molecule folds into a complex T-shaped structure stabilized by multiple Mg2+ ions.[85] In two sites, Mg2+ ions directly interact with the nucleotides (bridging the phosphate groups of two nucleotides), and in an additional three sites, the ions are completely solvated and only interact indirectly with the phosphate moieties through the hydration shell.[86,87] We expect that the ability to recapitulate both binding modes in a single simulation would provide an indication that the force field parameters achieve a reasonable balance between the affinity for water and the affinity for biomolecules. We were unable, however, to achieve this balance using standard combination rules for the LJ potential. We thus decided to first set pair-specific LJ interaction parameters between Mg2+ and water to achieve an ionic radius consistent with the experimentally determined first peak of the radial distribution function of water and Mg2+ and then scan for a suitable set of atomic LJ parameters for the Mg2+ ion that would allow us to optimally describe the different ion-interaction sites in tRNAPhe. Following previous work,[82,88] the LJ parameters for the Mg–Owat interaction were first determined by performing simulations of Mg2+ in water, with σ parameters ranging from 2.9435 to 3.097 Å and ε ranging from 0.015 to 0.045 kcal mol–1, ultimately choosing the parameters ε = 0.025914 kcal mol–1 and σ = 3.004 Å, as these values best reproduced the experimental Mg–Owat distance of 2.07–2.11 Å[89,90] and the water exchange rate of 6.7 × 105 s–1.[91] Subsequently, simulations of 50–150 μs were performed starting from a tRNAPhe structure (PDB entry 1ehz)[87] in which the six magnesium ions identified in the X-ray structure were removed and a total of 11 magnesium ions were added to the water solution together with 0.15 M KCl. Position restraints of 0.3 kcal mol–1 Å–1 were applied to the nucleic acid backbone to prevent the collapse of the structure when no structural magnesium ions were bound. During the simulations, the occupancy of the sites where Mg2+ ions were observed in the X-ray structure and the coordination sphere of the ions were monitored. We gradually increased the value of the Mg2+ σ from 2.77 Å, and for each increase, we tested values of the Mg2+ ε ranging from 0.009 to 0.002 kcal mol–1; we found, using LJ parameters for Mg2+ (for all interactions except with water) of ε = 0.003000 kcal mol–1 and σ = 3.075 Å, that after 100 μs of MD simulation starting from a tRNAPhe structure with no Mg2+ ions bound, five of the six sites were occupied by Mg2+ ions in the correct coordination pattern (Figure A,B). This suggests that this set of force field parameters results in a reasonable affinity for the phosphate backbone. In the case of the sixth site, a neighboring site was occupied instead. Finally, we verified that, following equilibration of the position of the Mg2+ ions, the tRNA structure was also stable in the absence of position restraints (Figure S12).

Figure 1

Binding of Mg2+ to tRNA and RNaseH. (A) Occupancy of the six Mg2+ sites in tRNA as a function of simulation time. After 120 μs of simulation, five of the binding sites observed experimentally[87] were occupied. (B) Mg2+ ion positions between 100 and 120 μs of simulation; 25 snapshots of the positions of the Mg2+ ions occupying the six binding sites are reported as small solid pink spheres. The positions of the crystallographic Mg2+ ions are highlighted as large transparent multicolored spheres. (C) Cα-RMSD in Å from the X-ray structure (PDB entry 1g15)[92] in a 127 μs simulation of RNaseH started from the apo enzyme. (D) Snapshots of the positions of the Mg2+ ions from the last 20 μs of simulation (small pink spheres). The positions of the crystallographic Mn2+ ions are reported as large green transparent spheres. As an additional test, we performed a simulation of the Escherichia coli ribonuclease H (RNaseH). The active site of this protein contains four negatively charged residues (Asp10, Glu48, Asp70, and Asp134) and one Mg2+ ion.[92] At high Mg2+ concentration, binding of a second Mg2+ ion inactivates the enzyme. We performed a simulation starting from the apo crystal structure (PDB entry 2rn2),[93] in which Mg2+ ions are not present, in a 0.1 M MgCl and 0.1 M NaCl water solution. Position restraints were not necessary in this case, as Mg2+ is not required for the structural stability of the RNaseH protein. Indeed, the global protein structure was remarkably stable during the 127 μs of simulation, as indicated by the average Cα root-mean-square deviation (RMSD) of 1.2 Å with respect to the holo structure (PDB entry 1g15)[92] (Figure B). After approximately 50 μs of simulation, a Mg2+ ion occupied one of the binding sites occupied by a divalent ion in the Mn2+-bound X-ray structure, and at approximately 100 μs of simulation, a second ion bound to the active site (Figure D). Both ions remained bound for the rest of the simulation, with the second ion being more loosely coordinated (Figure C), an observation that may be consistent with its lower experimentally determined affinity.[92] In our simulation setup for the RNaseH system, there were 18 Mg2+ ions in solution and 20 negatively charged residues in the protein. During the course of the simulation, Mg2+ ions transiently interacted with other protein residues, but despite the large number of possible interactions, only two additional persistent interactions were observed (with Glu6 and Glu154). The significance of these interactions is unclear, as the binding–unbinding kinetics are slow with respect to the timescale of the simulation, but the low number of alternative persistent interactions observed suggests that this set of Mg2+ parameters does not lead to strong overbinding of the magnesium ions to the negatively charged protein residues.

Single-Stranded DNA Tests

Solution Conformation of Long Single-Stranded DNA Molecules

In testing DES-Amber, our first goal was to verify that the force field is capable of reproducing the conformational properties of long, disordered single-stranded nucleic acids. Previously, we have observed that simulations of a long, disordered RNA molecule (rU40) performed with the Amber and, to a lesser extent, the CHARMM force fields resulted in highly compact conformations, at odds with experimental observations, and found that this issue was resolved in the DES-Amber SF1.0 force field.[7] For this work, we used a 40-residue poly-thymine (dT40), as our test system. Although this molecule is not of particular biological significance, it is known to be mostly disordered in solution[94,95] and thus may be particularly sensitive to force field errors. With both DES-Amber and DES-Amber SF1.0, we performed 100 μs of simulation at 300 K in 0.025, 0.1, 0.4, and 1.0 M NaCl solutions; for comparison, we also performed simulations with the Amber-bsc1[1] and CHARMM36[5] force fields. These two force fields were chosen because they are generally believed to represent the state of the art for the Amber and CHARMM family of force fields.[5,96] As a first step, for each simulation, we compared the persistence length (Lp) and the FRET intensities calculated assuming a Förster radius of 56.4 Å[14,94] (Figure A,B) to the reported experimental values.

Figure 2

Comparison of calculated and experimental (A) FRET intensities and (B) persistence lengths for dT40 as function of NaCl concentration for simulations run with CHARMM36 (C36, red), Amber-bsc1 (green), DES-Amber (blue), and DES-Amber SF1.0 (purple). Experimentally measured FRET intensities and persistence lengths[94] are reported in black. At low salt concentrations, simulations with both DES-Amber and DES-Amber SF1.0 resulted in relatively expanded conformational ensembles that are generally consistent with the FRET intensity values measured experimentally at low salt concentrations,[14] although the persistence length appears to be overestimated due to the prevalence of relatively rigid B-like structures. This effect was even more pronounced in the Amber-bsc1 simulations, where the conformational ensemble was dominated by the presence of a long B-like structure with an end-to-end distance of >120 Å and a persistence length of 85 Å. This finding is at odds with experimental observations indicating that dT40 should be somewhat disordered, with an end-to-end distance of ∼65 Å[94] and a persistence length of 30 Å.[94] This discrepancy may be the result of a slight overfitting of torsion parameters to helical structures. The FRET intensity and persistence length calculated from the CHARMM36 simulations were in excellent agreement with experiment at the lowest salt concentration investigated here (0.025 M, Figure ), but at physiological salt concentrations, the simulations resulted in FRET intensities that are much higher than the experimental values due to the strong tendency of the CHARMM36 force field to collapse into molten globule structures. In simulations at high salt concentration, the extended structures observed in DES-Amber, DES-Amber SF1.0, and Amber-bsc1 had a greater tendency to collapse, resulting in a smaller persistence length and higher FRET intensities, in fair agreement with experiment. In the case of Amber-bsc1, the collapsed species in simulation were characterized by metastable hairpin structures (as indicated by the large number of base pairs that are transiently formed, Figure S1). The significance of this finding is unclear, as FRET and SAXS data suggest that poly-dT should be largely disordered in solution,[14,94,95,97] but an atomic-level structural characterization of this system using experimental approaches is difficult.[97] Overall, these results suggest that the dimensions and flexibility of the disordered state, a quantity that can be strongly affected by force field artifacts and is difficult to predict in simulation,[7] are recapitulated reasonably well by both DES-Amber force fields. It should be noted that the DES-Amber force fields appear to underestimate the level of compaction that occurs at high salt concentration (Figure ), which results in disordered ensembles and persistence lengths that are slightly larger than the experimental estimates, possibly due to some degree of overstabilization of B-like structures.

Solution Conformation of Short Single-Stranded DNA Molecules

To test the description of local conformational fluctuations in short ssDNA, we performed simulated tempering simulations of 18 ssDNA molecules composed of two to six nucleotides (Table S1). Each system was subjected to 20 μs of simulation using DES-Amber and DES-Amber SF1.0, and we compared the resulting 3J-couplings for the sugar ring protons, calculated using the Haasnoot Karplus equation,[24,25] to a dataset of 644 3J-couplings measured using NMR spectroscopy (Figure ).[26−35] Using both revised force fields, most of the systems investigated had mean squared errors <1 Hz2, indicating good agreement between the calculated and experimental 3J-couplings (for comparison, the average root-mean-square error of Amber-bsc1 across all systems is 1.70 Hz; Table S1). The largest deviations were observed in the H1′–H2′ 3J-coupling of the first nucleobase; the calculated coupling for this pair was often larger than the experimental value, suggesting that, in simulation, the sugar ring slightly overpopulates the S conformation.[25,26] These deviations, however, are generally not much larger than the expected accuracy of the Haasnoot parameterization of the Karplus relation, and to avoid the risk of overfitting, we refrained from performing additional tuning of the force field parameters based on these data.

Figure 3

Comparison of 644 calculated and experimental J-couplings for the deoxyribose protons in 18 short ssDNA molecules, ranging from two to six nucleotides in length, in simulations using the DES-Amber (red) and DES-Amber SF1.0 (black) force fields.

Double-Stranded DNA Tests

Structure and Stability of a DNA Hairpin

Having established that DES-Amber provides a reasonable description of disordered single-stranded DNA molecules, our next goal was to test whether the force field could describe molecules that fold into a well-determined structure, such as a DNA hairpin. We performed 170 μs of simulated tempering simulations of the d(GCGAAGC) polynucleotide using DES-Amber and DES-Amber SF1.0. This sequence has been found experimentally to fold into a hairpin in solution (PDB entry 1kr8).[36] On the μs timescale in our simulations, the polynucleotide folded into structures with heavy-atom RMSDs of <1 Å from the experimentally determined structure (Figure ), suggesting a folding rate roughly 2 orders of magnitude faster than that of RNA tetraloops.[7] Similarly to what we observed for a number of RNA tetraloops, the hairpin was less stable than what is observed experimentally (the calculated ΔGfold at 310 K was 1.3 and 1.5 kcal mol–1 for DES-Amber and DES-Amber SF1.0, respectively, compared to the experimentally determined[98] ΔGfold of −3.2 kcal mol–1). This discrepancy may be the result of residual inaccuracy in the description of the nonbonded interactions.

Figure 4

(A) Folding free energy as a function of temperature for the DNA hairpin d(GCGAAGC) with the DES-Amber (red) and DES-Amber SF1.0 (black) force fields. The structure obtained by averaging the frames between 9.2 and 9.5 μs of simulation time is shown superimposed with the experimental structure. (B) Backbone RMSD trace (Å) for the first 10 μs of the simulated tempering simulation of the d(GCGAAGC) DNA sequence performed with the DES-Amber force field.

Thermodynamic Stability of DNA Duplexes

B-DNA, as the most common double helix found in nature, is arguably the most important structure that is formed by DNA molecules. As a test of the ability of the force field to correctly identify double-stranded B-DNA as the thermodynamically most stable structure, and to recapitulate the observed experimental dependence of stability on the DNA sequence, we performed 500 μs simulated tempering simulations of seven pairs of DNA sequences with lengths ranging between four and seven nucleotides. We found that the correct B-DNA structure formed reversibly (Figure A and Figures S2–S4) in simulations performed with all three force fields investigated (DES-Amber, DES-Amber SF1.0, and Amber-bsc1). The B-DNA structure was also correctly identified as the thermodynamically most stable conformation at low temperatures (Figure B,C).

Figure 5

Simulated tempering simulations of short B-DNA fragments. (A) Snapshot from the d(CGCGG) simulation superimposed on a canonical B-DNA reference structure. (B) Backbone RMSD from a canonical B-DNA structure in the simulation of d(CGCGG) performed with DES-Amber. (C) Fraction of B-DNA duplex frames as a function of temperature. (D) Calculated melting free energy at 300 K for short B-DNA duplexes from simulations performed with DES-Amber, DES-Amber SF1.0, and Amber-bsc1 compared to the melting free energy predicted using a nearest-neighbor model. Some notable differences were observed between Amber-bsc1 and the DES-Amber force fields in the non-B-DNA conformational ensembles. Whereas in DES-Amber and DES-Amber SF1.0 the two DNA oligomers were noninteracting 60–70% of the time when B-DNA was not formed, in Amber-bsc1, the two strands rarely fully separated and, when B-DNA was not formed, were bound in molten globule states with noncanonical structure 90–97% of the time (Table S3). Such states have not been reported in the literature and are probably an artifact of the tendency of this force field to form disordered structures that are overcollapsed. There was generally a reasonable correlation between the calculated relative stability and the stability predicted by a nearest-neighbor model[40] (Figure D and Table S2), although the absolute stability at 310 K was systematically underestimated (by approximately 0.2 kcal mol–1 per nucleobase for DES-Amber and 0.15 kcal mol–1 per nucleobase for DES-Amber SF1.0 and Amber-bsc1). The origin of this discrepancy is unclear. It would be possible, in principle, to achieve much better agreement with experiment for these systems through a suitable modification of torsion parameters, as is often performed in the optimization of protein force fields.[37,99,100] We decided, however, not to pursue this route, as the simulations of short ssDNA molecules indicate that there is already a strong tendency to populate B-DNA conformations, suggesting that errors in the torsion parameters may not be responsible for the behavior observed here.

Solution Structure of Long DNA Duplexes

It is important that a DNA force field have the ability to reproduce the subtle deviations from the canonical B-DNA conformation that are induced by changes to local sequence, as these differences can be relevant for function and sequence recognition. Following previous work on DNA force field development,[1] we have performed simulations of 19 dsDNA molecules containing 10 to 17 nucleobase pairs using the DES-Amber and DES-Amber SF1.0 force fields. Each simulation was run in the NPT ensemble at 300 K for 50 μs, and the structure observed in simulation was compared to the corresponding experimentally determined structure. For all but one of the systems investigated, the DNA structure remained close to the experimental structure (Table S4), although in some cases fraying of the termini was observed. The RMSDs were comparable to those observed for Amber-bsc1 (e.g., average backbone RMSD was 2.47 Å for DES-Amber and 2.49 Å for Amber-bsc1), a force field that has been shown to describe these systems with extremely high accuracy.[1,101] A somewhat larger structural deviation from the X-ray structure was observed for d(CTTTTAAAAG) using all force fields, in particular when using DES-Amber SF1.0; this is possibly because this sequence was crystallized with seven Ca2+ ions bound, and these ions were not present in the MD simulation. In simulations of d(GGATATATCC) performed using DES-Amber SF1.0, the duplex dissociated. This is not entirely unexpected given that nearest-neighbor models predict this sequence to be only marginally stable at 300 K,[102,103] and our calculations of the free energy of melting of short duplexes suggest that the DES-Amber SF1.0 force field may slightly underestimate duplex stability. Although RMSD is a useful metric for monitoring large conformational transitions such as DNA melting or the B-A transition, it is relatively insensitive to the details of the fine structure of the DNA double helix. We thus calculated average base-step and base-pair parameters, as well as structural parameters for each nucleobase. The full set of data is available in the SI. DES-Amber, DES-Amber SF 1.0, and DES-Amber 3.20 results are almost indistinguishable; this is not unexpected, as they largely share the same torsion potentials. Previous studies have indicated that force field inaccuracies can result in the overstabilization of the γ-trans conformation, which can become the dominant conformation at long timescales.[104−106] Our structural analyses indicate that in simulations performed with the DES-Amber force fields, the γ-trans conformation is observed in the terminal base pairs 10% of the time, possibly due to occasional fraying, and 1% of the time in the rest of the sequence. (The corresponding results are 8 and 4% in Amber-bsc1 and 3 and 0.4% in Amber-OL15, respectively). The deviation of the step and pair parameters from the X-ray data was calculated for all force fields (Tables S9–S14). Although it is unclear how closely the simulation of the duplex in solution should resemble the X-ray structure, we found that the parameters in simulations performed using the DES-Amber force fields deviate from the X-ray structure by about the same amount as the reference force fields (Amber-bsc1 and Amber-OL15), suggesting that the DES-Amber force fields provide a description of the average nucleobase conformation with about the same accuracy as these state-of-the-art force fields. Averages were also calculated for the helical rise and twist per base step (Table S15). We found that, for all force fields, the calculated average rise was ∼3.35, ∼0.1 Å larger than that in the X-ray structures. The average helical twist was 33° for DES-Amber and 34° for Amber-OL15 and Amber-bsc1, values that are comparable to the 35° average value in the X-ray structures. Finally, we analyzed in more detail the self-complementary Drew–Dickerson dodecamer (DDD) of sequence d(CGCGAATTCGCG), a system that has previously been very well-characterized by both simulations and experiments.[5,41−43,96,107,108] In all three force fields, simulations of DDD resulted in a structure closer to the structures determined by NMR in aqueous solution (e.g., DES-Amber average RMSD was 1.93 Å from PDB entry 1duf(41) and 2.14 Å from 1naj(42)) than to the crystal structure (e.g., DES-Amber average was RMSD 2.61 Å from PDB entry 1bna(43)) (Table S4). For the six base-step parameters (twist, roll, slide, rise, shift, and tilt),[60] we compared the values to the range of values observed in experimentally determined structures[101] (Figure ). All base-step parameters were well converged in 50 μs of simulation, with DES-Amber and DES-Amber SF1.0 giving almost identical results. Agreement with experiment for most of the base-step parameters was satisfactory, though the local description of the GC base step appeared to be inaccurate; this is consistent with the observation that the helical twist of the GC step appears to be underestimated in DES-Amber simulations (Tables S9–S11). Interestingly, simulations performed with Amber-bsc1, despite having an overall RMSD from the experimental structures (1.83 Å from 1duf) similar to that observed in simulations performed with DES-Amber (1.93 Å) (Table S4), typically yielded a better description of the local geometry of the GC base step. The CGCG region of the DDD dodecamer has been shown to exhibit unusual flexibility due to the population of a large fraction of the BII state at 300 K (34% for the GC step and 56% for the CG step).[109] The BI/BII population of these steps in the DDD simulations performed with the DES-Amber force fields was only 1% for the GC step and 3% for the CG step, whereas it was ∼30 and ∼50%, respectively, for Amber-bsc1 and Amber-OL15. More generally, we found that the BII population in the DES-Amber duplex simulations rarely exceeded a few percent, while it was almost an order of magnitude larger in simulations performed using Amber-bsc1 and Amber-OL15. This suggests that there is some imbalance in describing the relative stability of the BI and BII conformations and points to an area in which the DES-Amber force field could be improved, possibly by suitable optimization of the backbone torsions or the guanine nonbonded parameters that for this work were simply transferred, without further refinement, from the RNA parameters previously derived for the DES-Amber SF1.0 force field.

Figure 6

MD simulations of DDD. Comparison of the average six base-step parameters (twist, roll, slide, rise, shift, and tilt) measured in simulations performed with DES-Amber (green), DES-Amber SF1.0 (blue), and Amber-bsc1 (red) to the values observed in seven crystal and NMR structures deposited in the protein data bank (PDB entries 1bna, 2bna, 7bna, 9bna, 1naj, 1jgr, and 4c64)[101] (black). Error bars for the experimental values report the variance of the data. Error bars for the simulated data are estimated using block averaging of the trajectory data.

Structural Stability of Noncanonical DNA Structures

Although B-DNA is the predominant form of DNA under physiological conditions, particular sequences or conditions can stabilize alternate DNA structures. We investigated the ability of DES-Amber to recapitulate some of these structures by performing simulations of (i) Z-DNA at varying salt concentrations, (ii) two DNA G-quadruplexes, and (iii) a DNA triplex. Z-DNA is a left-handed helical form of DNA that is stabilized by GC-rich sequences containing 8-methylguanine,[110] or 5-methyl cytosine in water/methanol solution[111] or at very high salt concentrations (>3 M NaCl).[112,113] For d(CG), the Z-DNA conformation can be crystallized under physiological conditions, suggesting that it should be at least marginally stable in solution; improving the stability of this structure has been the focus of a recent Amber force field optimization.[114] We have performed 200 μs simulations starting from a d(CGCGCGCGCGCG) Z-DNA dodecamer X-ray structure (PDB entry 4ocb)[61] at salt concentrations ranging from 0.1 to 1.5 M NaCl. Simulations could not be performed at higher concentrations, as the solubility of NaCl and KCl in DES-Amber is <2 M. Simulations were fairly stable on the timescale of a few microseconds, with backbone RMSDs from the starting X-ray structure of 1–2 Å (see Figure A for a representative simulation at 0.75 M), but extensive fraying of the duplex terminal base pairs was observed on longer timescales in all simulations. In most cases, one-third to one-half of the initial Z-DNA base pairs were still formed at the end of the simulation (Figure S8), a result suggesting that Z-DNA is only partially stable under the simulation conditions over long timescales; this finding is consistent with the experimental observation that rather extreme conditions are required to stabilize Z-DNA in solution.

Figure 7

MD simulations of other forms of DNA with DES-Amber. (A) A Z-DNA dodecamer (gray) in 0.75 M KCl spontaneously transitioned to B-DNA (orange). The heavy-atom RMSDs with respect to the initial Z-DNA structure (green) and final B-DNA structure (blue) are shown on the right. (B) The CEB25 human minisatellite locus G-quadruplex, with the initial structure shown in light gray and the final structure shown in dark gray. The heavy-atom RMSDs of the loop (blue) and quadruplex (green) segments (in the reference alignment to the quadruplex core) are shown on the right. (C) The Oxytricha nova telomeric DNA G-quadruplex, with the initial structure shown in light gray and the final structure shown in dark gray and red. The heavy-atom RMSDs of the loop (blue) and quadruplex (green) segments (in the reference alignment to the quadruplex core) are shown on the right. (D) Triple-stranded DNA, with the duplex shown in red and green and the third strand in blue. The heavy-atom RMSDs of the duplex (blue) and third strand (green) (in the reference alignment of the duplex) are shown on the right. In high-resolution X-ray structures of Z-DNA, two main backbone rotamer states (ZI and ZII) have been observed. Previous studies have shown that simulations of Z-DNA in 2 M NaCl performed with several force fields of the Amber family can populate rotamer conformations (ZI′ and ZII′) not observed in the X-ray structures.[3,4] Although the solution structure of Z-DNA in 2 M NaCl is not known, as it is only marginally stable,[113] the ZI′ and ZII′ rotamers are believed to be force field artifacts.[4] We have calculated the torsion angles from the first 2 μs of the Z-DNA simulations at 1 M NaCl (Figure S21). On this timescale and at this salt concentration, the Z-DNA structure was largely preserved in the DES-Amber simulations, and the ZI (60–65%) and ZII (30–35%) rotamers dominate. We observed a small amount of the ZI′ conformation (∼5%) and none of the ZII′ conformation. Interestingly, in the 0.75 M simulation, the Z-DNA spontaneously transitioned to B-DNA without fully dissociating. The process began with terminal base-pair fraying, which led to the distortion of the full Z-DNA duplex after approximately 10 μs. The structure then began to adopt a B-DNA-like structure on one end, while a DNA-bubble-like structure formed in the center of the duplex. The gap formed by the bubble transiently opened and closed by the formation of mismatched base pairs over the course of 100 μs until the entire structure eventually underwent a cooperative transition to B-DNA at roughly 120 μs (Figure A). Another important noncanonical structure is the G-quadruplex, a conformational form of DNA commonly found in telomeric DNA. We performed 50 μs simulations of two DNA G-quadruplexes: a repeat unit of the CEB25 human minisatellite locus (PDB entry 2lpw)[62] and the Oxytricha nova telomeric DNA G-quadruplex (PDB entry 1jrn).[63] The G-quadruplex structure is generally composed of a quadruplex region stabilized by potassium ions that intercalate between four guanine bases arranged in a plane separated by loops: a single nine-nucleotide-long loop in CEB25 and two three-nucleotide-long loops in the telomeric DNA system. All crystallographic ions and water molecules were removed during simulation setup, and the systems were solvated in 0.1 M KCl. We first performed 1 μs of restrained equilibration before running the 50 μs simulations. During equilibration, potassium ions intercalated between the guanine bases, stabilizing the complex; as a result, in both simulations, the quadruplex region was fairly stable (average heavy-atom RMSDs of 0.8 and 1.7 Å, as shown in Figure B,C). The loop region in CEB25, however, reversibly sampled several different conformations, including the experimentally determined structure—a result consistent with the higher degree of flexibility observed experimentally for this region.[62] The two shorter loops in the telomeric DNA were more rigid; in all simulations, however, the ion sites at the stem–loop junction were rarely occupied, and flipping of T7 after a few microseconds of simulation induced a slight rearrangement in the loop structures, with formation of a T5–T8 and T17–T20 pair (Figure C). This conformation has been observed in previous studies using enhanced sampling methods.[115] Although accurate parameterization of torsion angles can improve the loop stability, such as in the Amber-OL15 force field,[4] it has been hypothesized that an accurate description of this loop requires a polarizable force field.[116] Finally, we performed simulations of triple-stranded DNA (also known as H-DNA or triplex DNA). Triplex DNA is a complex formed between a B-DNA duplex and a third oligonucleotide.[64] The complex is formed by Hoogsteen base pairs[117] and is stabilized under acidic conditions due to the pronation of cytosines to create a C-G*C+ triad.[117] We parameterized the protonated cytosine based on our unprotonated cytosine parameters (see Methods) and simulated the d(GACTGAGAGACGTA-TACCGTCTCTCAGTC-CTCTCT) triple-stranded DNA complex[64] for 50 μs. We find that in both DES-Amber and DES-Amber SF1.0, the complex was stable on the timescale of the simulation, with marginally higher stability in DES-Amber (Figure D). In particular, we found that even when the B-DNA duplex underwent terminal fraying, the third strand remained stable in the major groove. These results indicate that the DES-Amber force field can accurately capture the Hoogsteen base pairing necessary for stabilizing triple-stranded DNA.

Protein–Nucleic Acid Complexes

Structural Stability of Protein–DNA Complexes

Finally, we investigated the ability of DES-Amber to maintain the structure of protein–nucleic acid complexes. This is a particularly demanding test, as it requires an accurate balance of the interactions both within and between force fields for each biomolecular component.[118,119] We selected the complexes of DNA with the GNC4 leucine zipper (PDB entry 2dgc),[65] the P22 c2 repressor (PDB entry 3jxc),[66] the Cl repressor from bacteriophage TP901-1 (PDB entry 3zhm),[67] the ETS transcriptional repressor ETV6 (PDB entry 4mhg),[68] the Trp repressor (PDB entry 1tro),[69] and the lambda repressor (PDB entry 3bdn)[70] as our test systems. Each system was run at 300 K for 50 μs using DES-Amber and DES-Amber SF1.0, and the deviation from the starting structure was measured (Figure ).

Figure 8

MD simulations of six protein–DNA complexes performed with the DES-Amber force field. RMSD trace as a function of simulation time for the complex (black), the protein (red), and the DNA (blue). Each panel is labeled with the PDB entry of the starting structure. The PDB structure of the complex is displayed as an inset in each plot. The Trp repressor, GNC4 leucine zipper, and ETS transcriptional repressor ETV6 complexes were stable in both force fields throughout the 50 μs simulations. The lambda repressor–operon complex displayed a large amount of flexibility when DES-Amber SF1.0 was used, with reversible fraying at the DNA termini and reversible binding/unbinding of one of the two lambda repressor units. This complex was more stable with DES-Amber, although transient binding/unbinding was also observed in this force field. The P22 c2 repressor and Cl repressor systems were stable only for a few microseconds, after which the complexes dissociated in both force fields. The P22 c2 repressor is a protein dimer with two independent binding sites, and dimerization is associated with the bending of the DNA operon due to a local B-DNA to B′-DNA transition.[120] In the simulation, the protein–protein dimer interface broke, leading to a straightening of the DNA double helix and disruption of the original geometry of the complex. The Cl repressor is monomeric and binds selectively to a short double-stranded DNA stretch. This complex dissociated in the simulation, although both the DNA and the repressor are structurally very stable, as indicated by the small RMSDs with respect to the starting structure for these individual pieces of the complex.

Structural Stability of Protein–RNA Complexes

We also performed simulations of a set of six protein–RNA complexes with various structures and binding modes using DES-Amber; some of these were previously investigated by Krepl et al.[119] In particular, we conducted 50 μs of simulations of the complexes of dsRNA–B2 protein (2az0),[71] tertiarily structured mRNA stem–loop, stem–loop binding protein and 3′hExoribonuclease 1 (4qoz),[72] RNA hairpin–U1A protein (1urn),[73] the ROQ domain–Hmg19 stem-loop RNA (4qil),[74] RNA duplex-quadruplex junction–RGG peptide (5dea),[75] and ssRNA–Pumilio FBF-2 protein (3k5y).[76] None of the complexes dissociated, and four complexes (2az0, 4qoz, 3k5y, and 4qil) were remarkably stable for the entire 50 μs of simulation (Figure ). We did observe, however, that a few key protein–nucleic acid interactions were unstable in the simulations. In 5dea, for example, the C- and N-terminal residues of the peptide were fairly flexible and only loosely bound, a result consistent with the NMR observation that only residues 8 to 17 are essential for binding.[121] During the simulation, however, the contact between RNA and Arg10 broke; this contact is considered critical for recognition[121] and is thus not expected to be broken.

Figure 9

MD simulations of six protein–RNA complexes performed with the DES-Amber force field. RMSD trace as a function of simulation time for the complex (black), the protein (red), and the RNA (blue). Each panel is labeled with the PDB entry of the starting structure. The experimentally determined structure of the complex is displayed as an inset in each plot. Overall, the protein–nucleic acid simulations performed using DES-Amber on the 50 μs timescale resulted in stable simulations for all protein–RNA complexes considered and for about half of the protein–DNA complexes. Complexes were sometimes found to be more flexible than indicated by experiment, and certain key interactions were poorly described. It is possible that, even in such cases, the simulations may still be informative, but care should be taken in the interpretation of the results.

Reparameterization of Nucleic Acid Backbone Nonbonded Parameters to Improve the Description of Protein–Nucleic Acid Complexes

Overall, we found that the DES-Amber force fields provided a fairly accurate description of nucleic acids but that their performance was less satisfactory in simulations of certain protein–nucleic acid complexes. We notice that often, in cases where the dissociation of a complex was observed, structures of the individual components (protein and nucleic acid) were reasonably well-maintained, suggesting that the main issue is related to inaccuracies in the description of nonbonded interactions at the protein–nucleic acid interface. We speculate that the use of a water model like TIP4P-D (or OPC, which has very similar parameters[58]) that reduces artifacts related to overcompaction may magnify force field inaccuracies that in simulations performed with water models like TIP3P would normally be masked by the strong stabilization of compact structures. Simulations performed with TIP4P-D or OPC may thus be less forgiving toward inaccuracies in the macromolecular force field. In an attempt to improve the description of protein–nucleic acid complexes, we have performed a reparameterization, using experimental data (Figure S6), of the nonbonded parameters of the phosphate group in DES-Amber, which we had not previously optimized (see SI for a full description of these reparameterization and validation efforts). We did not attempt the same reparameterization for DES-Amber SF1.0. The reparameterized force field, which we term DES-Amber 3.20 (from the value in Å of the σ parameter in the optimized LJ interaction between the phosphate oxygen atoms and water[122−128]), is comparably accurate to DES-Amber in simulations of nucleic acids (see, e.g., Figures S5, S7, and S9), and we tested it in simulations of protein–nucleic acid complexes by performing 50 μs simulations of eight DNA–protein complexes (the six complexes listed above and two zinc fingers), as shown in Figure and in Figures S13–20, and six RNA–protein complexes, as shown in Figure . We found that on the 50 μs timescale investigated, all the complexes studied were stable and none dissociated (Figure ), a clear improvement compared to using DES-Amber without the reparameterization. This finding suggests that DES-Amber 3.20 may be suitable for long-timescale simulations of systems containing both proteins and nucleic acids. (The relatively large RMSD of the 3bdn complex was mostly attributable to rigid pivot motions of the two large protein domains that are not involved in DNA binding and are thus free to move in solution, but the complex itself was stable.) Although it is possible that the improved performance of DES-Amber 3.20 might be the result of having two additional parameters compared to DES-Amber (specifically, two additional, explicit LJ pairs for nucleic acid–water interactions; see SI), the improvements observed in the description of protein–nucleic acid interactions were substantial, and we recommend the use of DES-Amber 3.20 in simulations of such systems.

Figure 10

MD simulations of six protein–RNA complexes (top) and eight protein–DNA complexes (bottom) performed with the DES-Amber 3.20 (black) and DES-Amber (red). Backbone RMSD traces are reported as a function of simulation time. Flexible terminal residues and regions that were not resolved in the X-ray structures were omitted from the RMSD calculations. Each panel is labeled with the PDB entry of the starting structure.

Discussion

We have performed an optimization of the Amber DNA force field parameters using our previously reported nonbonded parameters for RNA as a starting point. We also optimized parameters for magnesium ions and structural zinc ions. The new DNA force field parameters are part of the DES-Amber force field, which also contains RNA and protein parameters, and were tested on both single- and double-stranded DNA systems. Comparison of DES-Amber simulation results to experiment indicated that, in most cases, the structure and local flexibility of both single- and double-stranded DNA systems were reproduced with a level of accuracy similar to state-of-the-art force fields, with the notable exception of the BII population, which appears to have been underestimated in the DES-Amber DNA duplex simulations. Despite the adjustment of DNA-specific parameters, we observed that the stability of the DNA B-helix was underestimated by ∼0.1–0.2 kcal mol–1 per nucleobase. While it would be possible to correct this deficiency by further optimizing torsion parameters, we decided against this strategy, as it appeared that structures consistent with the B-helix conformation were already well-populated in single-stranded DNA. We thus suspect that some error in the balance of base stacking and hydrogen bonding or backbone electrostatic interactions may be responsible. We also observed some deviation from experiment in the local structure of CG base pairs, a deficiency resulting from the strong repulsion between the two guanine nucleobases on opposing strands, which induced local and global distortions in the helix. Such an imbalance is also characteristic of simulations performed with other Amber force fields,[129,130] and it is unclear if a simple reparameterization of nonbonded interactions and torsion angles would be sufficient to address it or if a more sophisticated functional form might be required. Finally, we performed a set of simulations of protein–DNA and protein–RNA complexes, with somewhat mixed results. Some complexes appeared to be well-described, while others became unstable and dissociated during simulation. Discrepancies from experiment were also observed in previous simulations of a number of protein–protein complexes.[8] Based on the simulations presented here, this instability could not simply be ascribed to internal strain resulting from the instability of one of the biomolecular components, as in some cases of unstable complexes the isolated proteins or nucleic acids were structurally stable. To further investigate this issue, we performed an analysis of the backbone nonbonded interactions and attempted a reparameterization of the phosphate group based on experimental data (see SI). The resulting DES-Amber 3.20 force fields improved the description of the complexes while yielding results roughly equivalent to the DES-Amber force fields for systems containing nucleic acids only. To achieve these results, it was necessary to introduce two additional nonbonded pairs, which may in effect be compensating for the lack of an explicit polarization model; such adjustments may not be required in the context of a polarizable force field. Achieving an even higher degree of accuracy may require the use of more complex functional forms. Further studies will be required to fully address this issue.

108 in total

1. Structural origins of adenine-tract bending.

Authors: Andrej Barbic; Daniel P Zimmer; Donald M Crothers
Journal: Proc Natl Acad Sci U S A Date: 2003-02-13 Impact factor: 11.205

2. Base-displaced intercalated structure of the food mutagen 2-amino-3-methylimidazo[4,5-f]quinoline in the recognition sequence of the NarI restriction enzyme, a hotspot for -2 bp deletions.

Authors: Feng Wang; Nicholas E DeMuro; C Eric Elmquist; James S Stover; Carmelo J Rizzo; Michael P Stone
Journal: J Am Chem Soc Date: 2006-08-09 Impact factor: 15.419

3. Fine-Tuning of the AMBER RNA Force Field with a New Term Adjusting Interactions of Terminal Nucleotides.

Authors: Vojtěch Mlýnský; Petra Kührová; Tomáš Kühr; Michal Otyepka; Giovanni Bussi; Pavel Banáš; Jiří Šponer
Journal: J Chem Theory Comput Date: 2020-05-19 Impact factor: 6.006

4. Crystal structure reveals specific recognition of a G-quadruplex RNA by a β-turn in the RGG motif of FMRP.

Authors: Nikita Vasilyev; Anna Polonskaia; Jennifer C Darnell; Robert B Darnell; Dinshaw J Patel; Alexander Serganov
Journal: Proc Natl Acad Sci U S A Date: 2015-09-15 Impact factor: 11.205

5. Co-crystal of Escherichia coli RNase HI with Mn2+ ions reveals two divalent metals bound in the active site.

Authors: E R Goedken; S Marqusee
Journal: J Biol Chem Date: 2000-11-16 Impact factor: 5.157

6. Simulations of anionic lipid membranes: development of interaction-specific ion parameters and validation using NMR data.

Authors: Richard M Venable; Yun Luo; Klaus Gawrisch; Benoît Roux; Richard W Pastor
Journal: J Phys Chem B Date: 2013-08-22 Impact factor: 2.991

7. DNA fragment conformations. 1H-NMR and relaxation studies of 2'-deoxyadenylyl(3'-5')thymidylyl-(3'-5')deoxyguanosylyl(3'-5')thymidine, d(A-T-G-T), in neutral aqueous solution.

Authors: J M Neumann; T Huynh-Dinh; S K Kan; B Genissel; J Igolen; S Tran-Dinh
Journal: Eur J Biochem Date: 1982-01

8. Structure of a B-DNA dodecamer: conformation and dynamics.

Authors: H R Drew; R M Wing; T Takano; C Broka; S Tanaka; K Itakura; R E Dickerson
Journal: Proc Natl Acad Sci U S A Date: 1981-04 Impact factor: 11.205

9. Structure-function studies of FMRP RGG peptide recognition of an RNA duplex-quadruplex junction.

Authors: Anh Tuân Phan; Vitaly Kuryavyi; Jennifer C Darnell; Alexander Serganov; Ananya Majumdar; Serge Ilin; Tanya Raslin; Anna Polonskaia; Cynthia Chen; David Clain; Robert B Darnell; Dinshaw J Patel
Journal: Nat Struct Mol Biol Date: 2011-06-05 Impact factor: 15.369

10. Building Water Models: A Different Approach.

Authors: Saeed Izadi; Ramu Anandakrishnan; Alexey V Onufriev
Journal: J Phys Chem Lett Date: 2014-10-16 Impact factor: 6.475

1 in total

1. Molecular Dynamics Simulations of Protein RNA Complexes by Using an Advanced Electrostatic Model.

Authors: Zhifeng Jing; Pengyu Ren
Journal: J Phys Chem B Date: 2022-09-15 Impact factor: 3.466

1 in total